From 92336619679712a0aa5cf3ea2e115c706f99ff22 Mon Sep 17 00:00:00 2001 From: Alex Auvolat Date: Wed, 14 Jun 2023 11:54:21 +0200 Subject: Add documentation on durability and repair procedures (fix #219) --- doc/book/cookbook/durability-repairs.md | 114 ++++++++++++++++++++++++++++++++ doc/book/cookbook/recovering.md | 2 +- doc/book/cookbook/upgrading.md | 2 +- 3 files changed, 116 insertions(+), 2 deletions(-) create mode 100644 doc/book/cookbook/durability-repairs.md (limited to 'doc') diff --git a/doc/book/cookbook/durability-repairs.md b/doc/book/cookbook/durability-repairs.md new file mode 100644 index 00000000..46eb25b8 --- /dev/null +++ b/doc/book/cookbook/durability-repairs.md @@ -0,0 +1,114 @@ ++++ +title = "Durability & Repairs" +weight = 50 ++++ + +To ensure the best durability of your data and to fix any inconsistencies that may +pop up in a distributed system, Garage provides a serires of repair operations. +This guide will explain the meaning of each of them and when they should be applied. + + +# General syntax of repair operations + +Repair operations described below are of the form `garage repair `. +These repairs will not launch without the `--yes` flag, which should +be added as follows: `garage repair --yes `. +By default these repair procedures will only run on the Garage node your CLI is +connecting to. To run on all nodes, add the `-a` flag as follows: +`garage repair -a --yes `. + +# Data block operations + +## Data store scrub + +Scrubbing the data store means examining each individual data block to check that +their content is correct, by verifying their hash. Any block found to be corrupted +(e.g. by bitrot or by an accidental manipulation of the datastore) will be +restored from another node that holds a valid copy. + +A scrub is run automatically by Garage every 30 days. It can also be launched +manually using `garage repair scrub start`. + +To view the status of an ongoing scrub, first find the task ID of the scrub worker +using `garage worker list`. Then, run `garage worker info ` to +view detailed runtime statistics of the scrub. To gather cluster-wide information, +this command has to be run on each individual node. + +A scrub is a very disk-intensive operation that might slow down your cluster. +You may pause an ongoing scrub using `garage repair scrub pause`, but note that +the scrub will resume automatically 24 hours later as Garage will not let your +cluster run without a regular scrub. If the scrub procedure is too intensive +for your servers and is slowing down your workload, the recommended solution +is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`. +A higher tranquility value will make Garage take longer pauses between two block +verifications. Of course, scrubbing the entire data store will also take longer. + +## Block check and resync + +In some cases, nodes hold a reference to a block but do not actually have the block +stored on disk. Conversely, they may also have on disk blocks that are not referenced +any more. To fix both cases, a block repair may be run with `garage repair blocks`. +This will scan the entire block reference counter table to check that the blocks +exist on disk, and will scan the entire disk store to check that stored blocks +are referenced. + +It is recommended to run this procedure when changing your cluster layout, +after the metadata tables have finished synchronizing between nodes +(usually a few hours after `garage layout apply`). + +## Inspecting lost blocks + +In extremely rare situations, data blocks may be unavailable from the entire cluster. +This means that even using `garage repair blocks`, some nodes may be unable +to fetch data blocks for which they hold a reference. + +These errors are stored on each node in a list of "block resync errors", i.e. +blocks for which the last resync operation failed. +This list can be inspected using `garage block list-errors`. +These errors usually fall into one of the following categories: + +1. a block is still referenced but the object was deleted, this is a case + of metadata reference inconsistency (see below for the fix) +2. a block is referenced by a non-deleted object, but could not be fetched due + to a transient error such as a network failure +3. a block is referenced by a non-deleted object, but could not be fetched due + to a permanent error such as there not being any valid copy of the block on the + entire cluster + +To help make the difference between cases 1 and cases 2 and 3, you may use the +`garage block info` command to see which objects hold a reference to each block. + +In the second case (transient errors), Garage will try to fetch the block again +after a certain time, so the error should disappear natuarlly. You can also +request Garage to try to fetch the block immediately using `garage block retry-now` +if you have fixed the transient issue. + +If you are confident that you are in the third scenario and that your data block +is definitely lost, then there is no other choice than to declare your S3 objects +as unrecoverable, and to delete them properly from the data store. This can be done +using the `garage block purge` command. + + +# Metadata operations + +## Metadata table resync + +Garage automatically resyncs all entries stored in the metadata tables every hour, +to ensure that all nodes have the most up-to-date version of all the information +they should be holding. +The resync procedure is based on a Merkle tree that allows to efficiently find +differences between nodes. + +In some special cases, e.g. before an upgrade, you might want to run a table +resync manually. This can be done using `garage repair tables`. + +## Metadata table reference fixes + +In some very rare cases where nodes are unavailable, some references between objects +are broken. For instance, if an object is deleted, the underlying versions or data +blocks may still be held by Garage. If you suspect that such corruption has occurred +in your cluster, you can run one of the following repair procedures: + +- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version +- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected) + diff --git a/doc/book/cookbook/recovering.md b/doc/book/cookbook/recovering.md index 2129a7f3..1c6a6763 100644 --- a/doc/book/cookbook/recovering.md +++ b/doc/book/cookbook/recovering.md @@ -1,6 +1,6 @@ +++ title = "Recovering from failures" -weight = 50 +weight = 60 +++ Garage is meant to work on old, second-hand hardware. diff --git a/doc/book/cookbook/upgrading.md b/doc/book/cookbook/upgrading.md index 64201994..5a2850c0 100644 --- a/doc/book/cookbook/upgrading.md +++ b/doc/book/cookbook/upgrading.md @@ -1,6 +1,6 @@ +++ title = "Upgrading Garage" -weight = 60 +weight = 70 +++ Garage is a stateful clustered application, where all nodes are communicating together and share data structures. -- cgit v1.2.3