author: Mendes <mendes.oulamara@pm.me>, 2022-10-04 18:14:49 +0200
committer: Mendes <mendes.oulamara@pm.me>, 2022-10-04 18:14:49 +0200
commit: 829f815a897b04986559910bbcbf53625adcdf20
tree: 6db3c27cff2aded754a641d1f2b05c83be701267 /doc/book/design/internals.md
parent: 99f96b9564c9c841dc6c56f1255a6e70ff884d46
parent: a096ced35562bd0a8877a1ee2f755be1edafe343
Merge remote-tracking branch 'origin/main' into optimal-layout
Diffstat (limited to 'doc/book/design/internals.md')
-rw-r--r-- doc/book/design/internals.md | 43
1 files changed, 43 insertions, 0 deletions
diff --git a/doc/book/design/internals.md b/doc/book/design/internals.md
index 05d852e2..777e017d 100644
--- a/doc/book/design/internals.md
+++ b/doc/book/design/internals.md
@@ -20,6 +20,49 @@ In the meantime, you can find some information at the following links:
- [an old design draft](@/documentation/working-documents/design-draft.md)
+## Request routing logic
+
+Data retrieval requests to Garage endpoints (S3 API and websites) are resolved
+to an individual object in a bucket. Since objects are replicated to multiple nodes,
+Garage must ensure consistency before answering the request.
+
+### Using quorum to ensure consistency
+
+Garage ensures consistency by attempting to establish a quorum with the
+data nodes responsible for the object. When a majority of the data nodes
+have provided metadata on an object, Garage can then answer the request.
+
+When a request arrives, Garage performs the following actions, assuming the recommended 3 replicas:
+
+- Make a request to the two preferred nodes for object metadata
+- Try the third node if one of the two initial requests fails
+- Check that the metadata from at least 2 nodes match
+- Check that the object hasn't been marked deleted
+- Answer the request with inline data from the metadata if the object is small enough
+- Otherwise, get the data blocks from the preferred nodes and answer with the assembled object
+
+Garage dynamically determines which nodes to query based on health, preference, and
+which nodes actually host the data in question. Garage has no concept of a "primary" node, so any
+healthy node with the data can be used as long as a quorum is reached for the metadata.
+
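The quorum steps listed above can be sketched roughly as follows. This is an illustrative model only, not Garage's actual implementation: the `Metadata` class, the `quorum_read` and `answer` functions, and the `fetch` callback are hypothetical names, and nodes are queried one after another here for simplicity rather than concurrently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Metadata:
    """Hypothetical, simplified object metadata (not Garage's real schema)."""
    version: int
    deleted: bool = False
    inline_data: Optional[bytes] = None  # assumed to be set only for small objects


def quorum_read(
    nodes: Sequence[str],
    fetch: Callable[[str], Optional[Metadata]],
    quorum: int = 2,
) -> Optional[Metadata]:
    """Return the metadata that `quorum` nodes agree on, or None.

    `nodes` is ordered by preference; `fetch` returns None when a node fails.
    Later nodes are contacted only while no quorum has been reached.
    """
    votes: Counter = Counter()
    for node in nodes:
        meta = fetch(node)
        if meta is None:
            continue  # failed request: fall through to the next node
        votes[meta] += 1
        if votes[meta] >= quorum:
            return meta
    return None


def answer(meta: Optional[Metadata]) -> str:
    """Decide how to answer once the quorum read has finished."""
    if meta is None:
        return "error: no quorum"
    if meta.deleted:
        return "not found"
    if meta.inline_data is not None:
        return "inline"
    return "fetch blocks"
```

With three replicas and `quorum=2`, the third node is contacted only when one of the first two fails or, in this sketch, when their answers differ.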
+### Node health
+
+Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection
+cannot be established, or a node fails to answer a number of pings, the target node is marked as failed.
+Failed nodes are not used for quorum or other internal requests.
+
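As a rough illustration of the health rule described above, a per-peer tracker might look like the sketch below. The class name `PeerHealth`, the threshold of 3 missed pings, and the choice to count consecutive misses are all assumptions made for this example; the text only says "a number of pings".

```python
class PeerHealth:
    """Hypothetical tracker for one peer's connection and ping state."""

    def __init__(self, max_missed_pings: int = 3) -> None:
        # The threshold value is an assumption, not a documented Garage setting.
        self.max_missed_pings = max_missed_pings
        self.missed_pings = 0
        self.connected = True

    def record_connection(self, established: bool) -> None:
        """Record whether the TCP session to the peer is currently up."""
        self.connected = established

    def record_ping(self, answered: bool) -> None:
        """Reset the miss counter on an answer, otherwise increment it."""
        self.missed_pings = 0 if answered else self.missed_pings + 1

    @property
    def failed(self) -> bool:
        """A peer is failed if unreachable or if it missed too many pings."""
        return (not self.connected) or self.missed_pings >= self.max_missed_pings
```

A caller would then skip every peer whose `failed` property is true when choosing nodes for a quorum.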
+### Node preference
+
+Garage prioritizes which nodes to query according to a few criteria:
+
+- A node always prefers itself if it can answer the request
+- Then, the node prioritizes nodes in the same zone
+- Finally, the nodes with the lowest latency are prioritized
+
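One way to read the three criteria above is as a lexicographic sort key: self first, then same zone, then latency. The sketch below assumes that reading; `Candidate`, `order_by_preference`, and the field names are hypothetical and do not come from the Garage code base.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """Hypothetical description of a node that could serve a request."""
    node_id: str
    zone: str
    latency_ms: float


def order_by_preference(
    candidates: Sequence[Candidate], self_id: str, self_zone: str
) -> List[Candidate]:
    """Sort candidates: the local node, then same-zone nodes, then by latency."""
    return sorted(
        candidates,
        # False sorts before True, so matching nodes come first at each level.
        key=lambda c: (c.node_id != self_id, c.zone != self_zone, c.latency_ms),
    )
```

For example, a node `n1` in zone `dc1` would place itself first, then any other `dc1` node, and only then nodes from other zones in order of increasing latency.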
+
+For further reading on the cluster structure, see the [gateway](@/documentation/cookbook/gateways.md)
+and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
+
## Garbage collection
A faulty garbage collection procedure has been the cause of