Merge pull request 'Fix unbounded buffering when one node has slower network' (#792) from fix-buffering into main

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/792
author: Alex <alex@adnab.me> 2024-03-28 12:40:27 +0000
committer: Alex <alex@adnab.me> 2024-03-28 12:40:27 +0000
commit: ecf641d88c264f7278d13a6d988288feb24a5dfe (patch)
tree: 5cd60dfa4f0d6d32a66d2e32d7912c9e289067c8 /doc/book/reference-manual/configuration.md
parent: 75cd14926d8dec8c36289197822df78391686c6a (diff)
parent: 85f580cbde4913fe8382316ff3c27b8443c61dd7 (diff)
download: garage-ecf641d88c264f7278d13a6d988288feb24a5dfe.tar.gz
garage-ecf641d88c264f7278d13a6d988288feb24a5dfe.zip
1 files changed, 33 insertions, 0 deletions
diff --git a/doc/book/reference-manual/configuration.md b/doc/book/reference-manual/configuration.md
index e6aced6d..1ac2051e 100644
--- a/doc/book/reference-manual/configuration.md
+++ b/doc/book/reference-manual/configuration.md
@@ -20,6 +20,7 @@ metadata_auto_snapshot_interval = "6h"
 db_engine = "lmdb"
 
 block_size = "1M"
+block_ram_buffer_max = "256MiB"
 
 sled_cache_capacity = "128MiB"
 sled_flush_every_ms = 2000
@@ -88,6 +89,7 @@ The following gives details about each available configuration option.
 
 Top-level configuration options:
 [`allow_world_readable_secrets`](#allow_world_readable_secrets),
+[`block_ram_buffer_max`](#block_ram_buffer_max),
 [`block_size`](#block_size),
 [`bootstrap_peers`](#bootstrap_peers),
 [`compression_level`](#compression_level),
@@ -420,6 +422,37 @@ files will remain available. This however means that chunks from existing files
 will not be deduplicated with chunks from newly uploaded files, meaning you
 might use more storage space that is optimally possible.
 
+#### `block_ram_buffer_max` (since v0.9.4) {#block_ram_buffer_max}
+
+A limit on the total size of data blocks that S3 API nodes keep in RAM while
+the blocks wait to be sent to storage nodes asynchronously.
+
+Explanation: since Garage wants to tolerate node failures, it uses quorum
+writes to send data blocks to storage nodes: try to write the block to three
+nodes, and return ok as soon as two writes complete. So even if all three nodes
+are online, the third write always completes asynchronously.  In general, there
+are not many writes to a cluster, and the third asynchronous write can
+terminate early enough so as to not cause unbounded RAM growth.  However, if
+the S3 API node is continuously receiving large quantities of data and the
+third node is never able to catch up, many data blocks will be kept buffered in
+RAM as they are awaiting transfer to the third node.
+
+The `block_ram_buffer_max` option sets a limit on the total size of buffers
+that can be kept in RAM in this process.  When the limit is reached,
+backpressure is applied to the S3 client.
+
+Note that this only counts buffers that have reached a certain stage of
+processing (received from the client + encrypted and/or compressed as
+necessary) and are ready to send to the storage nodes. Many other buffers will
+not be counted and this is not a hard limit on RAM consumption.  In particular,
+if many clients send requests simultaneously with large objects, the RAM
+consumption will always grow linearly with the number of concurrent requests,
+as each request will use a few buffers of size `block_size` for receiving and
+intermediate processing before even trying to send the data to the storage
+node.
+
+The default value is 256MiB.
+
 #### `sled_cache_capacity` {#sled_cache_capacity}
 
 This parameter can be used to tune the capacity of the cache used by
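
To illustrate the backpressure behaviour that the `block_ram_buffer_max` documentation in this diff describes, here is a minimal Python sketch of a byte budget. It is not Garage's implementation (Garage is written in Rust, and this commit page does not show its code). The `ByteBudget` class and its `acquire`/`release` methods are hypothetical names chosen for this example. The idea is that a block must reserve its size in the budget before it is buffered for asynchronous sending; once the budget is exhausted, the caller waits, which is what slows down the S3 client.

```python
import threading


class ByteBudget:
    """Hypothetical byte budget; illustrative only, not Garage's code."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes  # plays the role of block_ram_buffer_max
        self.used = 0               # bytes currently reserved by buffered blocks
        self._cond = threading.Condition()

    def acquire(self, n, timeout=None):
        """Reserve n bytes, waiting until they fit.

        Returns True once reserved, or False if the timeout expires first.
        """
        if n > self.max_bytes:
            # A single request larger than the budget could never be served.
            raise ValueError("request exceeds the whole budget")
        with self._cond:
            fits = self._cond.wait_for(
                lambda: self.used + n <= self.max_bytes, timeout
            )
            if fits:
                self.used += n
            return fits

    def release(self, n):
        """Return n bytes to the budget, e.g. after the last write completes."""
        with self._cond:
            self.used -= n
            self._cond.notify_all()


if __name__ == "__main__":
    MIB = 1024 * 1024
    budget = ByteBudget(2 * MIB)   # tiny budget so the limit is easy to hit
    budget.acquire(MIB)            # first buffered block
    budget.acquire(MIB)            # second buffered block, budget now full
    # A third block does not fit: the caller would stall here (backpressure).
    print("third block admitted:", budget.acquire(MIB, timeout=0.05))
    budget.release(MIB)            # a slow write finally completes
    print("after release:", budget.acquire(MIB, timeout=0.05))
```

As the documentation notes, a budget of this kind covers only the blocks that are ready to be sent; buffers used while receiving and processing each request are outside it, so it does not act as a hard cap on total RAM use.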