author | Alex Auvolat <alex@adnab.me> | 2022-09-14 19:31:13 +0200 |
---|---|---|
committer | Alex Auvolat <alex@adnab.me> | 2022-09-14 19:31:13 +0200 |
commit | f6aebefcc9747bf5afad3767e9ae6f9f3aba30ae (patch) | |
tree | fa48f9ee7b5ae7e9b93df8146ede7a8536262fb2 /doc/book/design/internals.md | |
parent | 89b8087ba81c508ba382aa6c9cb6bb3afa6a43c8 (diff) | |
Some work on documentation towards v0.8
Diffstat (limited to 'doc/book/design/internals.md')
-rw-r--r-- | doc/book/design/internals.md | 43 |
1 files changed, 43 insertions, 0 deletions
diff --git a/doc/book/design/internals.md b/doc/book/design/internals.md
index 05d852e2..777e017d 100644
--- a/doc/book/design/internals.md
+++ b/doc/book/design/internals.md
@@ -20,6 +20,49 @@ In the meantime, you can find some information at the following links:
 - [an old design draft](@/documentation/working-documents/design-draft.md)

+## Request routing logic
+
+Data retrieval requests to Garage endpoints (S3 API and websites) are resolved
+to an individual object in a bucket. Since objects are replicated to multiple nodes,
+Garage must ensure consistency before answering the request.
+
+### Using quorum to ensure consistency
+
+Garage ensures consistency by attempting to establish a quorum with the
+data nodes responsible for the object. Once a majority of the data nodes
+have provided metadata on an object, Garage can answer the request.
+
+When a request arrives, Garage will, assuming the recommended 3 replicas, perform the following actions:
+
+- Make a request to the two preferred nodes for object metadata
+- Try the third node if one of the two initial requests fails
+- Check that the metadata from at least 2 nodes match
+- Check that the object hasn't been marked deleted
+- Answer the request with inline data from metadata if the object is small enough
+- Or get data blocks from the preferred nodes and answer using the assembled object
+
+Garage dynamically determines which nodes to query based on health, preference, and
+which nodes actually host the requested data. Garage has no concept of "primary", so any
+healthy node with the data can be used as long as a quorum is reached for the metadata.
+
+### Node health
+
+Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection
+cannot be established, or a node fails to answer a number of pings, the target node is marked as failed.
+Failed nodes are not used for quorum or other internal requests.
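The quorum steps described above can be illustrated with a minimal Python sketch. It is not Garage's implementation: `Metadata`, `Fetch`, and `quorum_read` are hypothetical names, the `fetch` callback stands in for an internal RPC and returns `None` for a failed or unreachable node, and nodes are queried one at a time for simplicity. As a further assumption, the sketch also falls back to the next node when the first replies disagree, not only when a request fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Metadata:
    """Hypothetical object metadata (not Garage's real schema)."""
    version: int
    deleted: bool = False


# Hypothetical RPC stand-in: returns None when the node is failed or unreachable.
Fetch = Callable[[str], Optional[Metadata]]


def quorum_read(nodes: Sequence[str], fetch: Fetch, quorum: int = 2) -> Metadata:
    """Query nodes in preference order until `quorum` of them return equal metadata.

    Raises LookupError if the agreed metadata marks the object as deleted,
    and RuntimeError if no quorum can be reached.
    """
    counts = {}  # metadata value -> number of nodes that returned it
    for node in nodes:
        meta = fetch(node)
        if meta is None:
            continue  # failed node: move on to the next preferred node
        counts[meta] = counts.get(meta, 0) + 1
        if counts[meta] >= quorum:
            if meta.deleted:
                raise LookupError("object is marked deleted")
            return meta  # later nodes are never contacted
    raise RuntimeError("metadata quorum not reached")
```

With three replicas and `quorum=2`, the third node is contacted only when the two preferred nodes do not already agree, which mirrors the fallback step in the list above. Fetching inline data or data blocks after the quorum is reached is left out of the sketch.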
+
+### Node preference
+
+Garage prioritizes which nodes to query according to a few criteria:
+
+- A node always prefers itself if it can answer the request
+- Then the node prioritizes nodes in the same zone
+- Finally, the nodes with the lowest latency are prioritized
+
+
+For further reading on the cluster structure, see the [gateway](@/documentation/cookbook/gateways.md)
+and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
+
 ## Garbage collection

 A faulty garbage collection procedure has been the cause of