author Alex Auvolat <alex@adnab.me> 2023-01-30 18:00:01 +0100
committer Alex Auvolat <alex@adnab.me> 2023-01-30 18:00:01 +0100
commit 44f8b1d71abf661fb4e2a34b22c00569efc09481
tree 67a6bf20d295a9f053e26901a36f5633e260d787
parent 56384677fa70bace19a4f2b555d84de7f77339e0
Reorder reference manual section, move metrics list to there
-rw-r--r-- doc/book/cookbook/monitoring.md 276
-rw-r--r-- doc/book/reference-manual/admin-api.md 2
-rw-r--r-- doc/book/reference-manual/k2v.md 2
-rw-r--r-- doc/book/reference-manual/monitoring.md 285
-rw-r--r-- doc/book/reference-manual/s3-compatibility.md 2
5 files changed, 289 insertions, 278 deletions
diff --git a/doc/book/cookbook/monitoring.md b/doc/book/cookbook/monitoring.md
index f2240e8c..8313daa9 100644
--- a/doc/book/cookbook/monitoring.md
+++ b/doc/book/cookbook/monitoring.md
@@ -52,280 +52,6 @@ or make your own.
We detail below the list of exposed metrics and their meaning.
-
## List of exported metrics
-### Garage system metrics
-
-#### `garage_build_info` (counter)
-
-Exposes the Garage version number running on a node.
-
-```
-garage_build_info{version="1.0"} 1
-```
-
-#### `garage_replication_factor` (counter)
-
-Exposes the Garage replication factor configured on the node
-
-```
-garage_replication_factor 3
-```
-
-### Metrics of the API endpoints
-
-#### `api_admin_request_counter` (counter)
-
-Counts the number of requests to a given endpoint of the administration API. Example:
-
-```
-api_admin_request_counter{api_endpoint="Metrics"} 127041
-```
-
-#### `api_admin_request_duration` (histogram)
-
-Evaluates the duration of API calls to the various administration API endpoints. Example:
-
-```
-api_admin_request_duration_bucket{api_endpoint="Metrics",le="0.5"} 127041
-api_admin_request_duration_sum{api_endpoint="Metrics"} 605.250344830999
-api_admin_request_duration_count{api_endpoint="Metrics"} 127041
-```
-
-#### `api_s3_request_counter` (counter)
-
-Counts the number of requests to a given endpoint of the S3 API. Example:
-
-```
-api_s3_request_counter{api_endpoint="CreateMultipartUpload"} 1
-```
-
-#### `api_s3_error_counter` (counter)
-
-Counts the number of requests to a given endpoint of the S3 API that returned an error. Example:
-
-```
-api_s3_error_counter{api_endpoint="GetObject",status_code="404"} 39
-```
-
-#### `api_s3_request_duration` (histogram)
-
-Evaluates the duration of API calls to the various S3 API endpoints. Example:
-
-```
-api_s3_request_duration_bucket{api_endpoint="CreateMultipartUpload",le="0.5"} 1
-api_s3_request_duration_sum{api_endpoint="CreateMultipartUpload"} 0.046340762
-api_s3_request_duration_count{api_endpoint="CreateMultipartUpload"} 1
-```
-
-#### `api_k2v_request_counter` (counter), `api_k2v_error_counter` (counter), `api_k2v_error_duration` (histogram)
-
-Same as for S3, for the K2V API.
-
-
-### Metrics of the Web endpoint
-
-
-#### `web_request_counter` (counter)
-
-Number of requests to the web endpoint
-
-```
-web_request_counter{method="GET"} 80
-```
-
-#### `web_request_duration` (histogram)
-
-Duration of requests to the web endpoint
-
-```
-web_request_duration_bucket{method="GET",le="0.5"} 80
-web_request_duration_sum{method="GET"} 1.0528433229999998
-web_request_duration_count{method="GET"} 80
-```
-
-#### `web_error_counter` (counter)
-
-Number of requests to the web endpoint resulting in errors
-
-```
-web_error_counter{method="GET",status_code="404 Not Found"} 64
-```
-
-
-### Metrics of the data block manager
-
-#### `block_bytes_read`, `block_bytes_written` (counter)
-
-Number of bytes read from/written to disk in the data storage directory.
-
-```
-block_bytes_read 120586322022
-block_bytes_written 3386618077
-```
-
-#### `block_compression_level` (counter)
-
-Exposes the block compression level configured for the Garage node.
-
-```
-block_compression_level 3
-```
-
-#### `block_read_duration`, `block_write_duration` (histograms)
-
-Evaluates the duration of the reading/writing of individual data blocks in the data storage directory.
-
-```
-block_read_duration_bucket{le="0.5"} 169229
-block_read_duration_sum 2761.6902550310056
-block_read_duration_count 169240
-block_write_duration_bucket{le="0.5"} 3559
-block_write_duration_sum 195.59170078500006
-block_write_duration_count 3571
-```
-
-#### `block_delete_counter` (counter)
-
-Counts the number of data blocks that have been deleted from storage.
-
-```
-block_delete_counter 122
-```
-
-#### `block_resync_counter` (counter), `block_resync_duration` (histogram)
-
-Counts the number of resync operations the node has executed, and evaluates their duration.
-
-```
-block_resync_counter 308897
-block_resync_duration_bucket{le="0.5"} 308892
-block_resync_duration_sum 139.64204196100016
-block_resync_duration_count 308897
-```
-
-#### `block_resync_queue_length` (gauge)
-
-The number of block hashes currently queued for a resync.
-It is normal for this to be nonzero for long periods of time.
-
-```
-block_resync_queue_length 0
-```
-
-#### `block_resync_errored_blocks` (gauge)
-
-The number of block hashes that we were unable to resync last time we tried.
-**THIS SHOULD BE ZERO, OR FALL BACK TO ZERO RAPIDLY, IN A HEALTHY CLUSTER.**
-Persistent nonzero values indicate that some data is likely to be lost.
-
-```
-block_resync_errored_blocks 0
-```
-
-
-### Metrics related to RPCs (remote procedure calls) between nodes
-
-#### `rpc_netapp_request_counter` (counter)
-
-Number of RPC requests emitted
-
-```
-rpc_request_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 176
-```
-
-#### `rpc_netapp_error_counter` (counter)
-
-Number of communication errors (errors in the Netapp library, generally due to disconnected nodes)
-
-```
-rpc_netapp_error_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 354
-```
-
-#### `rpc_timeout_counter` (counter)
-
-Number of RPC timeouts, should be close to zero in a healthy cluster.
-
-```
-rpc_timeout_counter{from="<this node>",rpc_endpoint="garage_rpc/membership.rs/SystemRpc",to="<remote node>"} 1
-```
-
-#### `rpc_duration` (histogram)
-
-The duration of internal RPC calls between Garage nodes.
-
-```
-rpc_duration_bucket{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>",le="0.5"} 166
-rpc_duration_sum{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 35.172253716
-rpc_duration_count{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 174
-```
-
-
-### Metrics of the metadata table manager
-
-#### `table_gc_todo_queue_length` (gauge)
-
-Table garbage collector TODO queue length
-
-```
-table_gc_todo_queue_length{table_name="block_ref"} 0
-```
-
-#### `table_get_request_counter` (counter), `table_get_request_duration` (histogram)
-
-Number of get/get_range requests internally made on each table, and their duration.
-
-```
-table_get_request_counter{table_name="bucket_alias"} 315
-table_get_request_duration_bucket{table_name="bucket_alias",le="0.5"} 315
-table_get_request_duration_sum{table_name="bucket_alias"} 0.048509778000000024
-table_get_request_duration_count{table_name="bucket_alias"} 315
-```
-
-
-#### `table_put_request_counter` (counter), `table_put_request_duration` (histogram)
-
-Number of insert/insert_many requests internally made on this table, and their duration
-
-```
-table_put_request_counter{table_name="block_ref"} 677
-table_put_request_duration_bucket{table_name="block_ref",le="0.5"} 677
-table_put_request_duration_sum{table_name="block_ref"} 61.617528636
-table_put_request_duration_count{table_name="block_ref"} 677
-```
-
-#### `table_internal_delete_counter` (counter)
-
-Number of value deletions in the tree (due to GC or repartitioning)
-
-```
-table_internal_delete_counter{table_name="block_ref"} 2296
-```
-
-#### `table_internal_update_counter` (counter)
-
-Number of value updates where the value actually changes (includes creation of new key and update of existing key)
-
-```
-table_internal_update_counter{table_name="block_ref"} 5996
-```
-
-#### `table_merkle_updater_todo_queue_length` (gauge)
-
-Merkle tree updater TODO queue length (should fall to zero rapidly)
-
-```
-table_merkle_updater_todo_queue_length{table_name="block_ref"} 0
-```
-
-#### `table_sync_items_received`, `table_sync_items_sent` (counters)
-
-Number of data items sent to/received from other nodes during resync procedures
-
-```
-table_sync_items_received{from="<remote node>",table_name="bucket_v2"} 3
-table_sync_items_sent{table_name="block_ref",to="<remote node>"} 2
-```
-
-
+See our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
diff --git a/doc/book/reference-manual/admin-api.md b/doc/book/reference-manual/admin-api.md
index 0b7e2e16..363bc886 100644
--- a/doc/book/reference-manual/admin-api.md
+++ b/doc/book/reference-manual/admin-api.md
@@ -1,6 +1,6 @@
+++
title = "Administration API"
-weight = 60
+weight = 40
+++
The Garage administration API is accessible through a dedicated server whose
diff --git a/doc/book/reference-manual/k2v.md b/doc/book/reference-manual/k2v.md
index 207d056a..d40ec854 100644
--- a/doc/book/reference-manual/k2v.md
+++ b/doc/book/reference-manual/k2v.md
@@ -1,6 +1,6 @@
+++
title = "K2V"
-weight = 70
+weight = 100
+++
Starting with version 0.7.2, Garage introduces an optional feature, K2V,
diff --git a/doc/book/reference-manual/monitoring.md b/doc/book/reference-manual/monitoring.md
new file mode 100644
index 00000000..97c533d3
--- /dev/null
+++ b/doc/book/reference-manual/monitoring.md
@@ -0,0 +1,285 @@
+
++++
+title = "Monitoring"
+weight = 60
++++
+
+
+For information on setting up monitoring, see our [dedicated page](@/documentation/cookbook/monitoring.md) in the Cookbook section.
+
+## List of exported metrics
+
+### Garage system metrics
+
+#### `garage_build_info` (counter)
+
+Exposes the Garage version number running on a node.
+
+```
+garage_build_info{version="1.0"} 1
+```
+
+#### `garage_replication_factor` (counter)
+
+Exposes the Garage replication factor configured on the node
+
+```
+garage_replication_factor 3
+```
+
+### Metrics of the API endpoints
+
+#### `api_admin_request_counter` (counter)
+
+Counts the number of requests to a given endpoint of the administration API. Example:
+
+```
+api_admin_request_counter{api_endpoint="Metrics"} 127041
+```
+
+#### `api_admin_request_duration` (histogram)
+
+Evaluates the duration of API calls to the various administration API endpoints. Example:
+
+```
+api_admin_request_duration_bucket{api_endpoint="Metrics",le="0.5"} 127041
+api_admin_request_duration_sum{api_endpoint="Metrics"} 605.250344830999
+api_admin_request_duration_count{api_endpoint="Metrics"} 127041
+```
+
+#### `api_s3_request_counter` (counter)
+
+Counts the number of requests to a given endpoint of the S3 API. Example:
+
+```
+api_s3_request_counter{api_endpoint="CreateMultipartUpload"} 1
+```
+
+#### `api_s3_error_counter` (counter)
+
+Counts the number of requests to a given endpoint of the S3 API that returned an error. Example:
+
+```
+api_s3_error_counter{api_endpoint="GetObject",status_code="404"} 39
+```
+
+#### `api_s3_request_duration` (histogram)
+
+Evaluates the duration of API calls to the various S3 API endpoints. Example:
+
+```
+api_s3_request_duration_bucket{api_endpoint="CreateMultipartUpload",le="0.5"} 1
+api_s3_request_duration_sum{api_endpoint="CreateMultipartUpload"} 0.046340762
+api_s3_request_duration_count{api_endpoint="CreateMultipartUpload"} 1
+```
+
+#### `api_k2v_request_counter` (counter), `api_k2v_error_counter` (counter), `api_k2v_error_duration` (histogram)
+
+Same as for S3, for the K2V API.
+
+
+### Metrics of the Web endpoint
+
+
+#### `web_request_counter` (counter)
+
+Number of requests to the web endpoint
+
+```
+web_request_counter{method="GET"} 80
+```
+
+#### `web_request_duration` (histogram)
+
+Duration of requests to the web endpoint
+
+```
+web_request_duration_bucket{method="GET",le="0.5"} 80
+web_request_duration_sum{method="GET"} 1.0528433229999998
+web_request_duration_count{method="GET"} 80
+```
+
+#### `web_error_counter` (counter)
+
+Number of requests to the web endpoint resulting in errors
+
+```
+web_error_counter{method="GET",status_code="404 Not Found"} 64
+```
+
+
+### Metrics of the data block manager
+
+#### `block_bytes_read`, `block_bytes_written` (counter)
+
+Number of bytes read from/written to disk in the data storage directory.
+
+```
+block_bytes_read 120586322022
+block_bytes_written 3386618077
+```
+
+#### `block_compression_level` (counter)
+
+Exposes the block compression level configured for the Garage node.
+
+```
+block_compression_level 3
+```
+
+#### `block_read_duration`, `block_write_duration` (histograms)
+
+Evaluates the duration of the reading/writing of individual data blocks in the data storage directory.
+
+```
+block_read_duration_bucket{le="0.5"} 169229
+block_read_duration_sum 2761.6902550310056
+block_read_duration_count 169240
+block_write_duration_bucket{le="0.5"} 3559
+block_write_duration_sum 195.59170078500006
+block_write_duration_count 3571
+```
+
+#### `block_delete_counter` (counter)
+
+Counts the number of data blocks that have been deleted from storage.
+
+```
+block_delete_counter 122
+```
+
+#### `block_resync_counter` (counter), `block_resync_duration` (histogram)
+
+Counts the number of resync operations the node has executed, and evaluates their duration.
+
+```
+block_resync_counter 308897
+block_resync_duration_bucket{le="0.5"} 308892
+block_resync_duration_sum 139.64204196100016
+block_resync_duration_count 308897
+```
+
+#### `block_resync_queue_length` (gauge)
+
+The number of block hashes currently queued for a resync.
+It is normal for this to be nonzero for long periods of time.
+
+```
+block_resync_queue_length 0
+```
+
+#### `block_resync_errored_blocks` (gauge)
+
+The number of block hashes that we were unable to resync last time we tried.
+**THIS SHOULD BE ZERO, OR FALL BACK TO ZERO RAPIDLY, IN A HEALTHY CLUSTER.**
+Persistent nonzero values indicate that some data is likely to be lost.
+
+```
+block_resync_errored_blocks 0
+```
+
+
+### Metrics related to RPCs (remote procedure calls) between nodes
+
+#### `rpc_netapp_request_counter` (counter)
+
+Number of RPC requests emitted
+
+```
+rpc_request_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 176
+```
+
+#### `rpc_netapp_error_counter` (counter)
+
+Number of communication errors (errors in the Netapp library, generally due to disconnected nodes)
+
+```
+rpc_netapp_error_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 354
+```
+
+#### `rpc_timeout_counter` (counter)
+
+Number of RPC timeouts, should be close to zero in a healthy cluster.
+
+```
+rpc_timeout_counter{from="<this node>",rpc_endpoint="garage_rpc/membership.rs/SystemRpc",to="<remote node>"} 1
+```
+
+#### `rpc_duration` (histogram)
+
+The duration of internal RPC calls between Garage nodes.
+
+```
+rpc_duration_bucket{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>",le="0.5"} 166
+rpc_duration_sum{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 35.172253716
+rpc_duration_count{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 174
+```
+
+
+### Metrics of the metadata table manager
+
+#### `table_gc_todo_queue_length` (gauge)
+
+Table garbage collector TODO queue length
+
+```
+table_gc_todo_queue_length{table_name="block_ref"} 0
+```
+
+#### `table_get_request_counter` (counter), `table_get_request_duration` (histogram)
+
+Number of get/get_range requests internally made on each table, and their duration.
+
+```
+table_get_request_counter{table_name="bucket_alias"} 315
+table_get_request_duration_bucket{table_name="bucket_alias",le="0.5"} 315
+table_get_request_duration_sum{table_name="bucket_alias"} 0.048509778000000024
+table_get_request_duration_count{table_name="bucket_alias"} 315
+```
+
+
+#### `table_put_request_counter` (counter), `table_put_request_duration` (histogram)
+
+Number of insert/insert_many requests internally made on this table, and their duration
+
+```
+table_put_request_counter{table_name="block_ref"} 677
+table_put_request_duration_bucket{table_name="block_ref",le="0.5"} 677
+table_put_request_duration_sum{table_name="block_ref"} 61.617528636
+table_put_request_duration_count{table_name="block_ref"} 677
+```
+
+#### `table_internal_delete_counter` (counter)
+
+Number of value deletions in the tree (due to GC or repartitioning)
+
+```
+table_internal_delete_counter{table_name="block_ref"} 2296
+```
+
+#### `table_internal_update_counter` (counter)
+
+Number of value updates where the value actually changes (includes creation of new key and update of existing key)
+
+```
+table_internal_update_counter{table_name="block_ref"} 5996
+```
+
+#### `table_merkle_updater_todo_queue_length` (gauge)
+
+Merkle tree updater TODO queue length (should fall to zero rapidly)
+
+```
+table_merkle_updater_todo_queue_length{table_name="block_ref"} 0
+```
+
+#### `table_sync_items_received`, `table_sync_items_sent` (counters)
+
+Number of data items sent to/received from other nodes during resync procedures
+
+```
+table_sync_items_received{from="<remote node>",table_name="bucket_v2"} 3
+table_sync_items_sent{table_name="block_ref",to="<remote node>"} 2
+```
+
+
diff --git a/doc/book/reference-manual/s3-compatibility.md b/doc/book/reference-manual/s3-compatibility.md
index dd3492a0..15b29bd1 100644
--- a/doc/book/reference-manual/s3-compatibility.md
+++ b/doc/book/reference-manual/s3-compatibility.md
@@ -1,6 +1,6 @@
+++
title = "S3 Compatibility status"
-weight = 40
+weight = 70
+++
## DISCLAIMER