From 5a186be363ca5225a40bb4ecffb97b342e840269 Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Wed, 14 Jun 2023 11:09:31 +0200
Subject: Doc: update goals, add docker alias

Fix #235
---
 doc/book/cookbook/real-world.md |  6 ++++++
 doc/book/design/goals.md        | 12 +++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/doc/book/cookbook/real-world.md b/doc/book/cookbook/real-world.md
index 08266b23..0b9e016c 100644
--- a/doc/book/cookbook/real-world.md
+++ b/doc/book/cookbook/real-world.md
@@ -197,6 +197,12 @@ The `garage` binary has two purposes:
 Ensure an appropriate `garage` binary (the same version as your Docker image) is available in your path.
 If your configuration file is at `/etc/garage.toml`, the `garage` binary should work with no further change.
 
+You can also use an alias as follows to use the Garage binary inside your docker container:
+
+```bash
+alias garage="docker exec -ti /garage"
+```
+
 You can test your `garage` CLI utility by running a simple command such as:
 
 ```bash
diff --git a/doc/book/design/goals.md b/doc/book/design/goals.md
index 4e390ba6..78ac7978 100644
--- a/doc/book/design/goals.md
+++ b/doc/book/design/goals.md
@@ -42,15 +42,13 @@ locations.
 They use Garage themselves for the following tasks:
 - As a [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider)
 
-- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 proxy
+- As a Nix binary cache
 
-- In the Drone continuous integration platform to store task logs
+- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 and SFTP-to-S3 proxy
 
-- As a Nix binary cache
+- As a backup target using `rclone` and `restic`
 
-- As a backup target using `rclone`
+- In the Drone continuous integration platform to store task logs
 
 The Deuxfleurs Garage cluster is a multi-site cluster currently composed of
-4 nodes in 2 physical locations. In the future it will be expanded to at
-least 3 physical locations to fully exploit Garage's potential for high
-availability.
+9 nodes in 3 physical locations.
-- 
cgit v1.2.3


From 3aadba724d0ed8e57fe98bf5f6b4cb998e0ee093 Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Wed, 14 Jun 2023 11:21:56 +0200
Subject: doc: english improvement

---
 doc/book/cookbook/upgrading.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/book/cookbook/upgrading.md b/doc/book/cookbook/upgrading.md
index 9d60a988..64201994 100644
--- a/doc/book/cookbook/upgrading.md
+++ b/doc/book/cookbook/upgrading.md
@@ -58,7 +58,7 @@ From a high level perspective, a major upgrade looks like this:
 
 ### Major upgarades with minimal downtime
 
-There is only one operation that has to be coordinated cluster-wide: the passage of one version of the internal RPC protocol to the next.
+There is only one operation that has to be coordinated cluster-wide: the switch of one version of the internal RPC protocol to the next.
 This means that an upgrade with very limited downtime can simply be performed from one major version to the next by restarting all nodes
 simultaneously in the new version.
 The downtime will simply be the time required for all nodes to stop and start again, which should be less than a minute.
-- 
cgit v1.2.3


From 92336619679712a0aa5cf3ea2e115c706f99ff22 Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Wed, 14 Jun 2023 11:54:21 +0200
Subject: Add documentation on durability and repair procedures (fix #219)

---
 doc/book/cookbook/durability-repairs.md | 114 ++++++++++++++++++++++++++++++++
 doc/book/cookbook/recovering.md         |   2 +-
 doc/book/cookbook/upgrading.md          |   2 +-
 3 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 doc/book/cookbook/durability-repairs.md

diff --git a/doc/book/cookbook/durability-repairs.md b/doc/book/cookbook/durability-repairs.md
new file mode 100644
index 00000000..46eb25b8
--- /dev/null
+++ b/doc/book/cookbook/durability-repairs.md
@@ -0,0 +1,114 @@
++++
+title = "Durability & Repairs"
+weight = 50
++++
+
+To ensure the best durability of your data and to fix any inconsistencies that may
+pop up in a distributed system, Garage provides a serires of repair operations.
+This guide will explain the meaning of each of them and when they should be applied.
+
+
+# General syntax of repair operations
+
+Repair operations described below are of the form `garage repair `.
+These repairs will not launch without the `--yes` flag, which should
+be added as follows: `garage repair --yes `.
+By default these repair procedures will only run on the Garage node your CLI is
+connecting to. To run on all nodes, add the `-a` flag as follows:
+`garage repair -a --yes `.
+
+# Data block operations
+
+## Data store scrub
+
+Scrubbing the data store means examining each individual data block to check that
+their content is correct, by verifying their hash. Any block found to be corrupted
+(e.g. by bitrot or by an accidental manipulation of the datastore) will be
+restored from another node that holds a valid copy.
+
+A scrub is run automatically by Garage every 30 days. It can also be launched
+manually using `garage repair scrub start`.
+
+To view the status of an ongoing scrub, first find the task ID of the scrub worker
+using `garage worker list`. Then, run `garage worker info ` to
+view detailed runtime statistics of the scrub. To gather cluster-wide information,
+this command has to be run on each individual node.
+
+A scrub is a very disk-intensive operation that might slow down your cluster.
+You may pause an ongoing scrub using `garage repair scrub pause`, but note that
+the scrub will resume automatically 24 hours later as Garage will not let your
+cluster run without a regular scrub. If the scrub procedure is too intensive
+for your servers and is slowing down your workload, the recommended solution
+is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`.
+A higher tranquility value will make Garage take longer pauses between two block
+verifications. Of course, scrubbing the entire data store will also take longer.
+
+## Block check and resync
+
+In some cases, nodes hold a reference to a block but do not actually have the block
+stored on disk. Conversely, they may also have on disk blocks that are not referenced
+any more. To fix both cases, a block repair may be run with `garage repair blocks`.
+This will scan the entire block reference counter table to check that the blocks
+exist on disk, and will scan the entire disk store to check that stored blocks
+are referenced.
+
+It is recommended to run this procedure when changing your cluster layout,
+after the metadata tables have finished synchronizing between nodes
+(usually a few hours after `garage layout apply`).
+
+## Inspecting lost blocks
+
+In extremely rare situations, data blocks may be unavailable from the entire cluster.
+This means that even using `garage repair blocks`, some nodes may be unable
+to fetch data blocks for which they hold a reference.
+
+These errors are stored on each node in a list of "block resync errors", i.e.
+blocks for which the last resync operation failed.
+This list can be inspected using `garage block list-errors`.
+These errors usually fall into one of the following categories:
+
+1. a block is still referenced but the object was deleted, this is a case
+   of metadata reference inconsistency (see below for the fix)
+2. a block is referenced by a non-deleted object, but could not be fetched due
+   to a transient error such as a network failure
+3. a block is referenced by a non-deleted object, but could not be fetched due
+   to a permanent error such as there not being any valid copy of the block on the
+   entire cluster
+
+To help make the difference between cases 1 and cases 2 and 3, you may use the
+`garage block info` command to see which objects hold a reference to each block.
+
+In the second case (transient errors), Garage will try to fetch the block again
+after a certain time, so the error should disappear natuarlly. You can also
+request Garage to try to fetch the block immediately using `garage block retry-now`
+if you have fixed the transient issue.
+
+If you are confident that you are in the third scenario and that your data block
+is definitely lost, then there is no other choice than to declare your S3 objects
+as unrecoverable, and to delete them properly from the data store. This can be done
+using the `garage block purge` command.
+
+
+# Metadata operations
+
+## Metadata table resync
+
+Garage automatically resyncs all entries stored in the metadata tables every hour,
+to ensure that all nodes have the most up-to-date version of all the information
+they should be holding.
+The resync procedure is based on a Merkle tree that allows to efficiently find
+differences between nodes.
+
+In some special cases, e.g. before an upgrade, you might want to run a table
+resync manually. This can be done using `garage repair tables`.
+
+## Metadata table reference fixes
+
+In some very rare cases where nodes are unavailable, some references between objects
+are broken. For instance, if an object is deleted, the underlying versions or data
+blocks may still be held by Garage. If you suspect that such corruption has occurred
+in your cluster, you can run one of the following repair procedures:
+
+- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
+- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
+
diff --git a/doc/book/cookbook/recovering.md b/doc/book/cookbook/recovering.md
index 2129a7f3..1c6a6763 100644
--- a/doc/book/cookbook/recovering.md
+++ b/doc/book/cookbook/recovering.md
@@ -1,6 +1,6 @@
 +++
 title = "Recovering from failures"
-weight = 50
+weight = 60
 +++
 
 Garage is meant to work on old, second-hand hardware.
diff --git a/doc/book/cookbook/upgrading.md b/doc/book/cookbook/upgrading.md
index 64201994..5a2850c0 100644
--- a/doc/book/cookbook/upgrading.md
+++ b/doc/book/cookbook/upgrading.md
@@ -1,6 +1,6 @@
 +++
 title = "Upgrading Garage"
-weight = 60
+weight = 70
 +++
 
 Garage is a stateful clustered application, where all nodes are communicating together and share data structures.
-- 
cgit v1.2.3


From dd7533a260291a25d69b8e7afa423df9e0d6a30c Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Wed, 14 Jun 2023 12:08:02 +0200
Subject: doc: add an operations&maintenance section and move some pages there

---
 doc/book/build/_index.md                  |   2 +-
 doc/book/connect/_index.md                |   2 +-
 doc/book/cookbook/_index.md               |   6 +-
 doc/book/cookbook/durability-repairs.md   | 114 ------------------------------
 doc/book/cookbook/real-world.md           |   2 +-
 doc/book/cookbook/recovering.md           | 110 ----------------------------
 doc/book/cookbook/upgrading.md            |  85 ----------------------
 doc/book/design/_index.md                 |   2 +-
 doc/book/design/internals.md              |   2 +-
 doc/book/development/_index.md            |   2 +-
 doc/book/operations/_index.md             |  23 ++++++
 doc/book/operations/durability-repairs.md | 114 ++++++++++++++++++++++++++++++
 doc/book/operations/layout.md             |  77 ++++++++++++++++++++
 doc/book/operations/recovering.md         | 110 ++++++++++++++++++++++++++++
 doc/book/operations/upgrading.md          |  85 ++++++++++++++++++++++
 doc/book/quick-start/_index.md            |   2 +-
 doc/book/reference-manual/_index.md       |   2 +-
 doc/book/reference-manual/features.md     |   2 +-
 doc/book/reference-manual/layout.md       |  77 --------------------
 doc/book/working-documents/_index.md      |   2 +-
 20 files changed, 420 insertions(+), 401 deletions(-)
 delete mode 100644 doc/book/cookbook/durability-repairs.md
 delete mode 100644 doc/book/cookbook/recovering.md
 delete mode 100644 doc/book/cookbook/upgrading.md
 create mode 100644 doc/book/operations/_index.md
 create mode 100644 doc/book/operations/durability-repairs.md
 create mode 100644 doc/book/operations/layout.md
 create mode 100644 doc/book/operations/recovering.md
 create mode 100644 doc/book/operations/upgrading.md
 delete mode 100644 doc/book/reference-manual/layout.md

diff --git a/doc/book/build/_index.md b/doc/book/build/_index.md
index 9bb17086..021045aa 100644
--- a/doc/book/build/_index.md
+++ b/doc/book/build/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Build your own app"
-weight = 4
+weight = 40
 sort_by = "weight"
 template = "documentation.html"
 +++
diff --git a/doc/book/connect/_index.md b/doc/book/connect/_index.md
index 93a2b87e..7d8e686c 100644
--- a/doc/book/connect/_index.md
+++ b/doc/book/connect/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Existing integrations"
-weight = 3
+weight = 30
 sort_by = "weight"
 template = "documentation.html"
 +++
diff --git a/doc/book/cookbook/_index.md b/doc/book/cookbook/_index.md
index 07bf6ebf..ff90ad52 100644
--- a/doc/book/cookbook/_index.md
+++ b/doc/book/cookbook/_index.md
@@ -1,7 +1,7 @@
 +++
 title="Cookbook"
 template = "documentation.html"
-weight = 2
+weight = 20
 sort_by = "weight"
 +++
 
@@ -37,7 +37,3 @@ This chapter could also be referred as "Tutorials" or "Best practices".
 - **[Monitoring Garage](@/documentation/cookbook/monitoring.md)** This page
   explains the Prometheus metrics available for monitoring the Garage
   cluster/nodes.
-
-- **[Recovering from failures](@/documentation/cookbook/recovering.md):** Garage's first selling point is resilience
-  to hardware failures. This section explains how to recover from such a failure in the
-  best possible way.
diff --git a/doc/book/cookbook/durability-repairs.md b/doc/book/cookbook/durability-repairs.md
deleted file mode 100644
index 46eb25b8..00000000
--- a/doc/book/cookbook/durability-repairs.md
+++ /dev/null
@@ -1,114 +0,0 @@
-+++
-title = "Durability & Repairs"
-weight = 50
-+++
-
-To ensure the best durability of your data and to fix any inconsistencies that may
-pop up in a distributed system, Garage provides a serires of repair operations.
-This guide will explain the meaning of each of them and when they should be applied.
-
-
-# General syntax of repair operations
-
-Repair operations described below are of the form `garage repair `.
-These repairs will not launch without the `--yes` flag, which should
-be added as follows: `garage repair --yes `.
-By default these repair procedures will only run on the Garage node your CLI is
-connecting to. To run on all nodes, add the `-a` flag as follows:
-`garage repair -a --yes `.
-
-# Data block operations
-
-## Data store scrub
-
-Scrubbing the data store means examining each individual data block to check that
-their content is correct, by verifying their hash. Any block found to be corrupted
-(e.g. by bitrot or by an accidental manipulation of the datastore) will be
-restored from another node that holds a valid copy.
-
-A scrub is run automatically by Garage every 30 days. It can also be launched
-manually using `garage repair scrub start`.
-
-To view the status of an ongoing scrub, first find the task ID of the scrub worker
-using `garage worker list`. Then, run `garage worker info ` to
-view detailed runtime statistics of the scrub. To gather cluster-wide information,
-this command has to be run on each individual node.
-
-A scrub is a very disk-intensive operation that might slow down your cluster.
-You may pause an ongoing scrub using `garage repair scrub pause`, but note that
-the scrub will resume automatically 24 hours later as Garage will not let your
-cluster run without a regular scrub. If the scrub procedure is too intensive
-for your servers and is slowing down your workload, the recommended solution
-is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`.
-A higher tranquility value will make Garage take longer pauses between two block
-verifications. Of course, scrubbing the entire data store will also take longer.
-
-## Block check and resync
-
-In some cases, nodes hold a reference to a block but do not actually have the block
-stored on disk. Conversely, they may also have on disk blocks that are not referenced
-any more. To fix both cases, a block repair may be run with `garage repair blocks`.
-This will scan the entire block reference counter table to check that the blocks
-exist on disk, and will scan the entire disk store to check that stored blocks
-are referenced.
-
-It is recommended to run this procedure when changing your cluster layout,
-after the metadata tables have finished synchronizing between nodes
-(usually a few hours after `garage layout apply`).
-
-## Inspecting lost blocks
-
-In extremely rare situations, data blocks may be unavailable from the entire cluster.
-This means that even using `garage repair blocks`, some nodes may be unable
-to fetch data blocks for which they hold a reference.
-
-These errors are stored on each node in a list of "block resync errors", i.e.
-blocks for which the last resync operation failed.
-This list can be inspected using `garage block list-errors`.
-These errors usually fall into one of the following categories:
-
-1. a block is still referenced but the object was deleted, this is a case
-   of metadata reference inconsistency (see below for the fix)
-2. a block is referenced by a non-deleted object, but could not be fetched due
-   to a transient error such as a network failure
-3. a block is referenced by a non-deleted object, but could not be fetched due
-   to a permanent error such as there not being any valid copy of the block on the
-   entire cluster
-
-To help make the difference between cases 1 and cases 2 and 3, you may use the
-`garage block info` command to see which objects hold a reference to each block.
-
-In the second case (transient errors), Garage will try to fetch the block again
-after a certain time, so the error should disappear natuarlly. You can also
-request Garage to try to fetch the block immediately using `garage block retry-now`
-if you have fixed the transient issue.
-
-If you are confident that you are in the third scenario and that your data block
-is definitely lost, then there is no other choice than to declare your S3 objects
-as unrecoverable, and to delete them properly from the data store. This can be done
-using the `garage block purge` command.
-
-
-# Metadata operations
-
-## Metadata table resync
-
-Garage automatically resyncs all entries stored in the metadata tables every hour,
-to ensure that all nodes have the most up-to-date version of all the information
-they should be holding.
-The resync procedure is based on a Merkle tree that allows to efficiently find
-differences between nodes.
-
-In some special cases, e.g. before an upgrade, you might want to run a table
-resync manually. This can be done using `garage repair tables`.
-
-## Metadata table reference fixes
-
-In some very rare cases where nodes are unavailable, some references between objects
-are broken. For instance, if an object is deleted, the underlying versions or data
-blocks may still be held by Garage. If you suspect that such corruption has occurred
-in your cluster, you can run one of the following repair procedures:
-
-- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
-- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
-
diff --git a/doc/book/cookbook/real-world.md b/doc/book/cookbook/real-world.md
index 0b9e016c..7061069f 100644
--- a/doc/book/cookbook/real-world.md
+++ b/doc/book/cookbook/real-world.md
@@ -345,7 +345,7 @@ garage layout apply
 ```
 
 **WARNING:** if you want to use the layout modification commands in a script,
-make sure to read [this page](@/documentation/reference-manual/layout.md) first.
+make sure to read [this page](@/documentation/operations/layout.md) first.
 
 ## Using your Garage cluster
 
diff --git a/doc/book/cookbook/recovering.md b/doc/book/cookbook/recovering.md
deleted file mode 100644
index 1c6a6763..00000000
--- a/doc/book/cookbook/recovering.md
+++ /dev/null
@@ -1,110 +0,0 @@
-+++
-title = "Recovering from failures"
-weight = 60
-+++
-
-Garage is meant to work on old, second-hand hardware.
-In particular, this makes it likely that some of your drives will fail, and some manual intervention will be needed.
-Fear not! For Garage is fully equipped to handle drive failures, in most common cases.
-
-## A note on availability of Garage
-
-With nodes dispersed in 3 zones or more, here are the guarantees Garage provides with the 3-way replication strategy (3 copies of all data, which is the recommended replication mode):
-
-- The cluster remains fully functional as long as the machines that fail are in only one zone. This includes a whole zone going down due to power/Internet outage.
-- No data is lost as long as the machines that fail are in at most two zones.
-
-Of course this only works if your Garage nodes are correctly configured to be aware of the zone in which they are located.
-Make sure this is the case using `garage status` to check on the state of your cluster's configuration.
-
-In case of temporarily disconnected nodes, Garage should automatically re-synchronize
-when the nodes come back up. This guide will deal with recovering from disk failures
-that caused the loss of the data of a node.
-
-
-## First option: removing a node
-
-If you don't have spare parts (HDD, SDD) to replace the failed component, and if there are enough remaining nodes in your cluster
-(at least 3), you can simply remove the failed node from Garage's configuration.
-Note that if you **do** intend to replace the failed parts by new ones, using this method followed by adding back the node is **not recommended** (although it should work),
-and you should instead use one of the methods detailed in the next sections.
-
-Removing a node is done with the following command:
-
-```bash
-garage layout remove
-garage layout show # review the changes you are making
-garage layout apply # once satisfied, apply the changes
-```
-
-(you can get the `node_id` of the failed node by running `garage status`)
-
-This will repartition the data and ensure that 3 copies of everything are present on the nodes that remain available.
-
-
-
-## Replacement scenario 1: only data is lost, metadata is fine
-
-The recommended deployment for Garage uses an SSD to store metadata, and an HDD to store blocks of data.
-In the case where only a single HDD crashes, the blocks of data are lost but the metadata is still fine.
-
-This is very easy to recover by setting up a new HDD to replace the failed one.
-The node does not need to be fully replaced and the configuration doesn't need to change.
-We just need to tell Garage to get back all the data blocks and store them on the new HDD.
-
-First, set up a new HDD to store Garage's data directory on the failed node, and restart Garage using
-the existing configuration. Then, run:
-
-```bash
-garage repair -a --yes blocks
-```
-
-This will re-synchronize blocks of data that are missing to the new HDD, reading them from copies located on other nodes.
-
-You can check on the advancement of this process by doing the following command:
-
-```bash
-garage stats -a
-```
-
-Look out for the following output:
-
-```
-Block manager stats:
-  resync queue length: 26541
-```
-
-This indicates that one of the Garage node is in the process of retrieving missing data from other nodes.
-This number decreases to zero when the node is fully synchronized.
-
-
-## Replacement scenario 2: metadata (and possibly data) is lost
-
-This scenario covers the case where a full node fails, i.e. both the metadata directory and
-the data directory are lost, as well as the case where only the metadata directory is lost.
-
-To replace the lost node, we will start from an empty metadata directory, which means
-Garage will generate a new node ID for the replacement node.
-We will thus need to remove the previous node ID from Garage's configuration and replace it by the ID of the new node.
-
-If your data directory is stored on a separate drive and is still fine, you can keep it, but it is not necessary to do so.
-In all cases, the data will be rebalanced and the replacement node will not store the same pieces of data
-as were originally stored on the one that failed. So if you keep the data files, the rebalancing
-might be faster but most of the pieces will be deleted anyway from the disk and replaced by other ones.
-
-First, set up a new drive to store the metadata directory for the replacement node (a SSD is recommended),
-and for the data directory if necessary. You can then start Garage on the new node.
-The restarted node should generate a new node ID, and it should be shown with `NO ROLE ASSIGNED` in `garage status`.
-The ID of the lost node should be shown in `garage status` in the section for disconnected/unavailable nodes.
-
-Then, replace the broken node by the new one, using:
-
-```bash
-garage layout assign --replace \
-    -c -z -t
-garage layout show # review the changes you are making
-garage layout apply # once satisfied, apply the changes
-```
-
-Garage will then start synchronizing all required data on the new node.
-This process can be monitored using the `garage stats -a` command.
diff --git a/doc/book/cookbook/upgrading.md b/doc/book/cookbook/upgrading.md
deleted file mode 100644
index 5a2850c0..00000000
--- a/doc/book/cookbook/upgrading.md
+++ /dev/null
@@ -1,85 +0,0 @@
-+++
-title = "Upgrading Garage"
-weight = 70
-+++
-
-Garage is a stateful clustered application, where all nodes are communicating together and share data structures.
-It makes upgrade more difficult than stateless applications so you must be more careful when upgrading.
-On a new version release, there is 2 possibilities:
-  - protocols and data structures remained the same ➡️ this is a **minor upgrade**
-  - protocols or data structures changed ➡️ this is a **major upgrade**
-
-You can quickly now what type of update you will have to operate by looking at the version identifier:
-when we require our users to do a major upgrade, we will always bump the first nonzero component of the version identifier
-(e.g. from v0.7.2 to v0.8.0).
-Conversely, for versions that only require a minor upgrade, the first nonzero component will always stay the same (e.g. from v0.8.0 to v0.8.1).
-
-Major upgrades are designed to be run only between contiguous versions.
-Example: migrations from v0.7.1 to v0.8.0 and from v0.7.0 to v0.8.2 are supported but migrations from v0.6.0 to v0.8.0 are not supported.
-
-The `garage_build_info`
-[Prometheus metric](@/documentation/reference-manual/monitoring.md) provides
-an overview for which Garage versions are currently in use within a cluster.
-
-## Minor upgrades
-
-Minor upgrades do not imply cluster downtime.
-Before upgrading, you should still read [the changelog](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases) and ideally test your deployment on a staging cluster before.
-
-When you are ready, start by checking the health of your cluster.
-You can force some checks with `garage repair`, we recommend at least running `garage repair --all-nodes --yes tables` which is very quick to run (less than a minute).
-You will see that the command correctly terminated in the logs of your daemon, or using `garage worker list` (the repair workers should be in the `Done` state).
-
-Finally, you can simply upgrade nodes one by one.
-For each node: stop it, install the new binary, edit the configuration if needed, restart it.
-
-## Major upgrades
-
-Major upgrades can be done with minimal downtime with a bit of preparation, but the simplest way is usually to put the cluster offline for the duration of the migration.
-Before upgrading, you must read [the changelog](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases) and you must test your deployment on a staging cluster before.
-
-We write guides for each major upgrade, they are stored under the "Working Documents" section of this documentation.
-
-### Major upgrades with full downtime
-
-From a high level perspective, a major upgrade looks like this:
-
-  1. Disable API access (for instance in your reverse proxy, or by commenting the corresponding section in your Garage configuration file and restarting Garage)
-  2. Check that your cluster is idle
-  3. Make sure the health of your cluster is good (see `garage repair`)
-  4. Stop the whole cluster
-  5. Back up the metadata folder of all your nodes, so that you will be able to restore it if the upgrade fails (data blocks being immutable, they should not be impacted)
-  6. Install the new binary, update the configuration
-  7. Start the whole cluster
-  8. If needed, run the corresponding migration from `garage migrate`
-  9. Make sure the health of your cluster is good
-  10. Enable API access (reverse step 1)
-  11. Monitor your cluster while load comes back, check that all your applications are happy with this new version
-
-### Major upgarades with minimal downtime
-
-There is only one operation that has to be coordinated cluster-wide: the switch of one version of the internal RPC protocol to the next.
-This means that an upgrade with very limited downtime can simply be performed from one major version to the next by restarting all nodes
-simultaneously in the new version.
-The downtime will simply be the time required for all nodes to stop and start again, which should be less than a minute.
-If all nodes fail to stop and restart simultaneously, some nodes might be temporarily shut out from the cluster as nodes using different RPC protocol
-versions are prevented to talk to one another.
-
-The entire procedure would look something like this:
-
-1. Make sure the health of your cluster is good (see `garage repair`)
-
-2. Take each node offline individually to back up its metadata folder, bring them back online once the backup is done.
-   You can do all of the nodes in a single zone at once as that won't impact global cluster availability.
-   Do not try to make a backup of the metadata folder of a running node.
-
-3. Prepare your binaries and configuration files for the new Garage version
-
-4. Restart all nodes simultaneously in the new version
-
-5. If any specific migration procedure is required, it is usually in one of the two cases:
-
-    - It can be run on online nodes after the new version has started, during regular cluster operation.
-    - it has to be run offline
-
-   For this last step, please refer to the specific documentation pertaining to the version upgrade you are doing.
diff --git a/doc/book/design/_index.md b/doc/book/design/_index.md
index 50933139..5881ab8f 100644
--- a/doc/book/design/_index.md
+++ b/doc/book/design/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Design"
-weight = 6
+weight = 70
 sort_by = "weight"
 template = "documentation.html"
 +++
diff --git a/doc/book/design/internals.md b/doc/book/design/internals.md
index 777e017d..cefb7acc 100644
--- a/doc/book/design/internals.md
+++ b/doc/book/design/internals.md
@@ -61,7 +61,7 @@ Garage prioritizes which nodes to query according to a few criteria:
 
 For further reading on the cluster structure look at the
 [gateway](@/documentation/cookbook/gateways.md)
-and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
+and [cluster layout management](@/documentation/operations/layout.md) pages.
 
 ## Garbage collection
 
diff --git a/doc/book/development/_index.md b/doc/book/development/_index.md
index 8e730bf6..2b2af0cc 100644
--- a/doc/book/development/_index.md
+++ b/doc/book/development/_index.md
@@ -1,6 +1,6 @@
 +++
 title = "Development"
-weight = 7
+weight = 80
 sort_by = "weight"
 template = "documentation.html"
 +++
diff --git a/doc/book/operations/_index.md b/doc/book/operations/_index.md
new file mode 100644
index 00000000..16d0e9d5
--- /dev/null
+++ b/doc/book/operations/_index.md
@@ -0,0 +1,23 @@
++++
+title = "Operations & Maintenance"
+weight = 50
+sort_by = "weight"
+template = "documentation.html"
++++
+
+This section contains a number of important information on how to best operate a Garage cluster,
+to ensure integrity and availability of your data:
+
+- **[Upgrading Garage](@/documentation/operations/upgrading.md):** General instructions on how to
+  upgrade your cluster from one version to the next. Instructions specific for each version upgrade
+  can bef ound in the [working documents](@/documentation/working-documents/_index.md) section.
+
+- **[Layout management](@/documentation/operations/layout.md):** Best practices for using the `garage layout`
+  commands when adding or removing nodes from your cluster.
+
+- **[Durability and repairs](@/documentation/operations/durability-repairs.md):** How to check for small things
+  that might be going wrong, and how to recover from such failures.
+
+- **[Recovering from failures](@/documentation/operations/recovering.md):** Garage's first selling point is resilience
+  to hardware failures. This section explains how to recover from such a failure in the
+  best possible way.
diff --git a/doc/book/operations/durability-repairs.md b/doc/book/operations/durability-repairs.md new file mode 100644 index 00000000..b8992f85 --- /dev/null +++ b/doc/book/operations/durability-repairs.md @@ -0,0 +1,114 @@ ++++ +title = "Durability & Repairs" +weight = 30 ++++ + +To ensure the best durability of your data and to fix any inconsistencies that may +pop up in a distributed system, Garage provides a series of repair operations. +This guide will explain the meaning of each of them and when they should be applied. + + +# General syntax of repair operations + +Repair operations described below are of the form `garage repair `. +These repairs will not launch without the `--yes` flag, which should +be added as follows: `garage repair --yes `. +By default these repair procedures will only run on the Garage node your CLI is +connecting to. To run on all nodes, add the `-a` flag as follows: +`garage repair -a --yes `. + +# Data block operations + +## Data store scrub + +Scrubbing the data store means examining each individual data block to check that +its content is correct, by verifying its hash. Any block found to be corrupted +(e.g. by bitrot or by an accidental manipulation of the datastore) will be +restored from another node that holds a valid copy. + +A scrub is run automatically by Garage every 30 days. It can also be launched +manually using `garage repair scrub start`. + +To view the status of an ongoing scrub, first find the task ID of the scrub worker +using `garage worker list`. Then, run `garage worker info ` to +view detailed runtime statistics of the scrub. To gather cluster-wide information, +this command has to be run on each individual node. + +A scrub is a very disk-intensive operation that might slow down your cluster. +You may pause an ongoing scrub using `garage repair scrub pause`, but note that +the scrub will resume automatically 24 hours later as Garage will not let your +cluster run without a regular scrub.
If the scrub procedure is too intensive +for your servers and is slowing down your workload, the recommended solution +is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`. +A higher tranquility value will make Garage take longer pauses between two block +verifications. Of course, scrubbing the entire data store will also take longer. + +## Block check and resync + +In some cases, nodes hold a reference to a block but do not actually have the block +stored on disk. Conversely, they may also have on disk blocks that are not referenced +any more. To fix both cases, a block repair may be run with `garage repair blocks`. +This will scan the entire block reference counter table to check that the blocks +exist on disk, and will scan the entire disk store to check that stored blocks +are referenced. + +It is recommended to run this procedure when changing your cluster layout, +after the metadata tables have finished synchronizing between nodes +(usually a few hours after `garage layout apply`). + +## Inspecting lost blocks + +In extremely rare situations, data blocks may be unavailable from the entire cluster. +This means that even using `garage repair blocks`, some nodes may be unable +to fetch data blocks for which they hold a reference. + +These errors are stored on each node in a list of "block resync errors", i.e. +blocks for which the last resync operation failed. +This list can be inspected using `garage block list-errors`. +These errors usually fall into one of the following categories: + +1. a block is still referenced but the object was deleted, this is a case + of metadata reference inconsistency (see below for the fix) +2. a block is referenced by a non-deleted object, but could not be fetched due + to a transient error such as a network failure +3. 
a block is referenced by a non-deleted object, but could not be fetched due + to a permanent error such as there not being any valid copy of the block on the + entire cluster + +To help distinguish case 1 from cases 2 and 3, you may use the +`garage block info` command to see which objects hold a reference to each block. + +In the second case (transient errors), Garage will try to fetch the block again +after a certain time, so the error should disappear naturally. You can also +request Garage to try to fetch the block immediately using `garage block retry-now` +if you have fixed the transient issue. + +If you are confident that you are in the third scenario and that your data block +is definitely lost, then there is no other choice than to declare your S3 objects +as unrecoverable, and to delete them properly from the data store. This can be done +using the `garage block purge` command. + + +# Metadata operations + +## Metadata table resync + +Garage automatically resyncs all entries stored in the metadata tables every hour, +to ensure that all nodes have the most up-to-date version of all the information +they should be holding. +The resync procedure is based on a Merkle tree that makes it possible to efficiently find +differences between nodes. + +In some special cases, e.g. before an upgrade, you might want to run a table +resync manually. This can be done using `garage repair tables`. + +## Metadata table reference fixes + +In some very rare cases where nodes are unavailable, some references between objects +are broken. For instance, if an object is deleted, the underlying versions or data +blocks may still be held by Garage.
If you suspect that such corruption has occurred +in your cluster, you can run one of the following repair procedures: + +- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version +- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected) + diff --git a/doc/book/operations/layout.md b/doc/book/operations/layout.md new file mode 100644 index 00000000..5e314246 --- /dev/null +++ b/doc/book/operations/layout.md @@ -0,0 +1,77 @@ ++++ +title = "Cluster layout management" +weight = 20 ++++ + +The cluster layout in Garage is a table that assigns to each node a role in +the cluster. The role of a node in Garage can either be a storage node with +a certain capacity, or a gateway node that does not store data and is only +used as an API entry point for faster cluster access. +An introduction to building cluster layouts can be found in the [production deployment](@/documentation/cookbook/real-world.md) page. + +## How cluster layouts work in Garage + +In Garage, a cluster layout is composed of the following components: + +- a table of roles assigned to nodes +- a version number + +Garage nodes will always use the cluster layout with the highest version number. + +Garage nodes also maintain and synchronize between them a set of proposed role +changes that haven't yet been applied. These changes will be applied (or +canceled) in the next version of the layout + +The following commands insert modifications to the set of proposed role changes +for the next layout version (but they do not create the new layout immediately): + +```bash +garage layout assign [...] +garage layout remove [...] 
+``` + +The following command can be used to inspect the layout that is currently set in the cluster +and the changes proposed for the next layout version, if any: + +```bash +garage layout show +``` + +The following commands create a new layout with the specified version number, +that either takes into account the proposed changes or cancels them: + +```bash +garage layout apply --version +garage layout revert --version +``` + +The version number of the new layout to create must be 1 + the version number +of the previous layout that existed in the cluster. The `apply` and `revert` +commands will fail otherwise. + +## Warnings about Garage cluster layout management + +**Warning: never make several calls to `garage layout apply` or `garage layout +revert` with the same value of the `--version` flag. Doing so can lead to the +creation of several different layouts with the same version number, in which +case your Garage cluster will become inconsistent until fixed.** If a call to +`garage layout apply` or `garage layout revert` has failed and `garage layout +show` indicates that a new layout with the given version number has not been +set in the cluster, then it is fine to call the command again with the same +version number. + +If you are using the `garage` CLI by typing individual commands in your +shell, you shouldn't have much issues as long as you run commands one after +the other and take care of checking the output of `garage layout show` +before applying any changes. + +If you are using the `garage` CLI to script layout changes, follow the following recommendations: + +- Make all of your `garage` CLI calls to the same RPC host. Do not use the + `garage` CLI to connect to individual nodes to send them each a piece of the + layout changes you are making, as the changes propagate asynchronously + between nodes and might not all be taken into account at the time when the + new layout is applied. 
+ +- **Only call `garage layout apply` once**, and call it **strictly after** all + of the `layout assign` and `layout remove` commands have returned. diff --git a/doc/book/operations/recovering.md b/doc/book/operations/recovering.md new file mode 100644 index 00000000..7a830788 --- /dev/null +++ b/doc/book/operations/recovering.md @@ -0,0 +1,110 @@ ++++ +title = "Recovering from failures" +weight = 40 ++++ + +Garage is meant to work on old, second-hand hardware. +In particular, this makes it likely that some of your drives will fail, and some manual intervention will be needed. +Fear not! For Garage is fully equipped to handle drive failures, in most common cases. + +## A note on availability of Garage + +With nodes dispersed in 3 zones or more, here are the guarantees Garage provides with the 3-way replication strategy (3 copies of all data, which is the recommended replication mode): + +- The cluster remains fully functional as long as the machines that fail are in only one zone. This includes a whole zone going down due to power/Internet outage. +- No data is lost as long as the machines that fail are in at most two zones. + +Of course this only works if your Garage nodes are correctly configured to be aware of the zone in which they are located. +Make sure this is the case using `garage status` to check on the state of your cluster's configuration. + +In case of temporarily disconnected nodes, Garage should automatically re-synchronize +when the nodes come back up. This guide will deal with recovering from disk failures +that caused the loss of the data of a node. + + +## First option: removing a node + +If you don't have spare parts (HDD, SSD) to replace the failed component, and if there are enough remaining nodes in your cluster +(at least 3), you can simply remove the failed node from Garage's configuration.
+Note that if you **do** intend to replace the failed parts by new ones, using this method followed by adding back the node is **not recommended** (although it should work), +and you should instead use one of the methods detailed in the next sections. + +Removing a node is done with the following commands: + +```bash +garage layout remove +garage layout show # review the changes you are making +garage layout apply # once satisfied, apply the changes +``` + +(you can get the `node_id` of the failed node by running `garage status`) + +This will repartition the data and ensure that 3 copies of everything are present on the nodes that remain available. + + + +## Replacement scenario 1: only data is lost, metadata is fine + +The recommended deployment for Garage uses an SSD to store metadata, and an HDD to store blocks of data. +In the case where only a single HDD crashes, the blocks of data are lost but the metadata is still fine. + +This is very easy to recover by setting up a new HDD to replace the failed one. +The node does not need to be fully replaced and the configuration doesn't need to change. +We just need to tell Garage to get back all the data blocks and store them on the new HDD. + +First, set up a new HDD to store Garage's data directory on the failed node, and restart Garage using +the existing configuration. Then, run: + +```bash +garage repair -a --yes blocks +``` + +This will re-synchronize blocks of data that are missing to the new HDD, reading them from copies located on other nodes. + +You can check on the advancement of this process by running the following command: + +```bash +garage stats -a +``` + +Look out for the following output: + +``` +Block manager stats: + resync queue length: 26541 +``` + +This indicates that one of the Garage nodes is in the process of retrieving missing data from other nodes. +This number decreases to zero when the node is fully synchronized.
+ + +## Replacement scenario 2: metadata (and possibly data) is lost + +This scenario covers the case where a full node fails, i.e. both the metadata directory and +the data directory are lost, as well as the case where only the metadata directory is lost. + +To replace the lost node, we will start from an empty metadata directory, which means +Garage will generate a new node ID for the replacement node. +We will thus need to remove the previous node ID from Garage's configuration and replace it by the ID of the new node. + +If your data directory is stored on a separate drive and is still fine, you can keep it, but it is not necessary to do so. +In all cases, the data will be rebalanced and the replacement node will not store the same pieces of data +as were originally stored on the one that failed. So if you keep the data files, the rebalancing +might be faster but most of the pieces will be deleted anyway from the disk and replaced by other ones. + +First, set up a new drive to store the metadata directory for the replacement node (a SSD is recommended), +and for the data directory if necessary. You can then start Garage on the new node. +The restarted node should generate a new node ID, and it should be shown with `NO ROLE ASSIGNED` in `garage status`. +The ID of the lost node should be shown in `garage status` in the section for disconnected/unavailable nodes. + +Then, replace the broken node by the new one, using: + +```bash +garage layout assign --replace \ + -c -z -t +garage layout show # review the changes you are making +garage layout apply # once satisfied, apply the changes +``` + +Garage will then start synchronizing all required data on the new node. +This process can be monitored using the `garage stats -a` command. 
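The monitoring step described in the `recovering.md` page above (watching the resync queue drain to zero) can be sketched as a small shell helper. This is only an illustration: `max_resync_queue_length` is a hypothetical function name, not part of Garage, and it assumes that `garage stats -a` prints lines of the form `resync queue length: N` as in the sample output quoted in the page.

```bash
# Hypothetical helper (not part of Garage): reads `garage stats -a` style
# output on stdin and prints the largest "resync queue length" value seen.
# Assumes the line format shown in the documentation sample above.
max_resync_queue_length() {
    awk '
        BEGIN { max = 0; found = 0 }
        /resync queue length:/ {
            n = $NF + 0              # the last field is the queue length
            if (n > max) max = n
            found = 1
        }
        END {
            if (found) print max
            else exit 1              # no matching line in the input
        }
    '
}
```

It would typically be used as `garage stats -a | max_resync_queue_length`, with a result of `0` suggesting that no node is still fetching missing blocks.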
diff --git a/doc/book/operations/upgrading.md b/doc/book/operations/upgrading.md new file mode 100644 index 00000000..e8919a19 --- /dev/null +++ b/doc/book/operations/upgrading.md @@ -0,0 +1,85 @@ ++++ +title = "Upgrading Garage" +weight = 10 ++++ + +Garage is a stateful clustered application, where all nodes are communicating together and share data structures. +This makes upgrades more difficult than for stateless applications, so you must be more careful when upgrading. +On a new version release, there are 2 possibilities: + - protocols and data structures remained the same ➡️ this is a **minor upgrade** + - protocols or data structures changed ➡️ this is a **major upgrade** + +You can quickly know what type of upgrade you will have to perform by looking at the version identifier: +when we require our users to do a major upgrade, we will always bump the first nonzero component of the version identifier +(e.g. from v0.7.2 to v0.8.0). +Conversely, for versions that only require a minor upgrade, the first nonzero component will always stay the same (e.g. from v0.8.0 to v0.8.1). + +Major upgrades are designed to be run only between contiguous versions. +Example: migrations from v0.7.1 to v0.8.0 and from v0.7.0 to v0.8.2 are supported but migrations from v0.6.0 to v0.8.0 are not supported. + +The `garage_build_info` +[Prometheus metric](@/documentation/reference-manual/monitoring.md) provides +an overview of which Garage versions are currently in use within a cluster. + +## Minor upgrades + +Minor upgrades do not imply cluster downtime. +Before upgrading, you should still read [the changelog](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases) and ideally test your deployment on a staging cluster. + +When you are ready, start by checking the health of your cluster. +You can force some checks with `garage repair`, we recommend at least running `garage repair --all-nodes --yes tables` which is very quick to run (less than a minute).
+You will see that the command correctly terminated in the logs of your daemon, or using `garage worker list` (the repair workers should be in the `Done` state). + +Finally, you can simply upgrade nodes one by one. +For each node: stop it, install the new binary, edit the configuration if needed, restart it. + +## Major upgrades + +Major upgrades can be done with minimal downtime with a bit of preparation, but the simplest way is usually to put the cluster offline for the duration of the migration. +Before upgrading, you must read [the changelog](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases) and you must test your deployment on a staging cluster before. + +We write guides for each major upgrade, they are stored under the "Working Documents" section of this documentation. + +### Major upgrades with full downtime + +From a high level perspective, a major upgrade looks like this: + + 1. Disable API access (for instance in your reverse proxy, or by commenting out the corresponding section in your Garage configuration file and restarting Garage) + 2. Check that your cluster is idle + 3. Make sure the health of your cluster is good (see `garage repair`) + 4. Stop the whole cluster + 5. Back up the metadata folder of all your nodes, so that you will be able to restore it if the upgrade fails (data blocks being immutable, they should not be impacted) + 6. Install the new binary, update the configuration + 7. Start the whole cluster + 8. If needed, run the corresponding migration from `garage migrate` + 9. Make sure the health of your cluster is good + 10. Enable API access (reverse step 1) + 11. Monitor your cluster while load comes back, check that all your applications are happy with this new version + +### Major upgrades with minimal downtime + +There is only one operation that has to be coordinated cluster-wide: the switch of one version of the internal RPC protocol to the next.
+This means that an upgrade with very limited downtime can simply be performed from one major version to the next by restarting all nodes +simultaneously in the new version. +The downtime will simply be the time required for all nodes to stop and start again, which should be less than a minute. +If the nodes do not all stop and restart simultaneously, some nodes might be temporarily shut out from the cluster as nodes using different RPC protocol +versions are prevented from talking to one another. + +The entire procedure would look something like this: + +1. Make sure the health of your cluster is good (see `garage repair`) + +2. Take each node offline individually to back up its metadata folder, bring them back online once the backup is done. + You can do all of the nodes in a single zone at once as that won't impact global cluster availability. + Do not try to make a backup of the metadata folder of a running node. + +3. Prepare your binaries and configuration files for the new Garage version + +4. Restart all nodes simultaneously in the new version + +5. If any specific migration procedure is required, it usually falls into one of two cases: + + - It can be run on online nodes after the new version has started, during regular cluster operation. + - It has to be run offline. + + For this last step, please refer to the specific documentation pertaining to the version upgrade you are doing.
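The version-identifier rule stated in the `upgrading.md` page above (a major upgrade bumps the first nonzero component, a minor upgrade leaves it unchanged) can be sketched in shell as follows. `upgrade_kind` is a hypothetical helper written for illustration, not a Garage command, and it only encodes that one rule: it does not check whether the two versions are contiguous, which the page also requires for major upgrades.

```bash
# Hypothetical helper (not part of Garage): prints "major" or "minor" for an
# upgrade from version $1 to version $2, using the rule that a major upgrade
# changes the first nonzero component of the version identifier.
# Note: plain POSIX sh has no local variables, so these names are global.
upgrade_kind() {
    old=${1#v}; new=${2#v}           # strip an optional leading "v"
    old_prefix=""; new_prefix=""
    while [ -n "$old" ]; do
        o=${old%%.*}; n=${new%%.*}   # current component of each version
        old_prefix="$old_prefix.$o"
        new_prefix="$new_prefix.$n"
        # Stop once the first nonzero component of the old version is reached.
        if [ "$o" != "0" ]; then break; fi
        case "$old" in *.*) old=${old#*.} ;; *) old="" ;; esac
        case "$new" in *.*) new=${new#*.} ;; *) new="" ;; esac
    done
    if [ "$old_prefix" = "$new_prefix" ]; then
        echo minor
    else
        echo major
    fi
}
```

With the examples from the page, `upgrade_kind v0.7.2 v0.8.0` reports a major upgrade and `upgrade_kind v0.8.0 v0.8.1` a minor one.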
diff --git a/doc/book/quick-start/_index.md b/doc/book/quick-start/_index.md index f01789a3..f556eaa3 100644 --- a/doc/book/quick-start/_index.md +++ b/doc/book/quick-start/_index.md @@ -1,6 +1,6 @@ +++ title = "Quick Start" -weight = 0 +weight = 10 sort_by = "weight" template = "documentation.html" +++ diff --git a/doc/book/reference-manual/_index.md b/doc/book/reference-manual/_index.md index ab1de5e6..1f360e57 100644 --- a/doc/book/reference-manual/_index.md +++ b/doc/book/reference-manual/_index.md @@ -1,6 +1,6 @@ +++ title = "Reference Manual" -weight = 5 +weight = 60 sort_by = "weight" template = "documentation.html" +++ diff --git a/doc/book/reference-manual/features.md b/doc/book/reference-manual/features.md index 550504ff..2f8e633a 100644 --- a/doc/book/reference-manual/features.md +++ b/doc/book/reference-manual/features.md @@ -35,7 +35,7 @@ This makes setting up and administering storage clusters, we hope, as easy as it A Garage cluster can very easily evolve over time, as storage nodes are added or removed. Garage will automatically rebalance data between nodes as needed to ensure the desired number of copies. -Read about cluster layout management [here](@/documentation/reference-manual/layout.md). +Read about cluster layout management [here](@/documentation/operations/layout.md). ### No RAFT slowing you down diff --git a/doc/book/reference-manual/layout.md b/doc/book/reference-manual/layout.md deleted file mode 100644 index a7d6f51f..00000000 --- a/doc/book/reference-manual/layout.md +++ /dev/null @@ -1,77 +0,0 @@ -+++ -title = "Cluster layout management" -weight = 50 -+++ - -The cluster layout in Garage is a table that assigns to each node a role in -the cluster. The role of a node in Garage can either be a storage node with -a certain capacity, or a gateway node that does not store data and is only -used as an API entry point for faster cluster access. 
-An introduction to building cluster layouts can be found in the [production deployment](@/documentation/cookbook/real-world.md) page. - -## How cluster layouts work in Garage - -In Garage, a cluster layout is composed of the following components: - -- a table of roles assigned to nodes -- a version number - -Garage nodes will always use the cluster layout with the highest version number. - -Garage nodes also maintain and synchronize between them a set of proposed role -changes that haven't yet been applied. These changes will be applied (or -canceled) in the next version of the layout - -The following commands insert modifications to the set of proposed role changes -for the next layout version (but they do not create the new layout immediately): - -```bash -garage layout assign [...] -garage layout remove [...] -``` - -The following command can be used to inspect the layout that is currently set in the cluster -and the changes proposed for the next layout version, if any: - -```bash -garage layout show -``` - -The following commands create a new layout with the specified version number, -that either takes into account the proposed changes or cancels them: - -```bash -garage layout apply --version -garage layout revert --version -``` - -The version number of the new layout to create must be 1 + the version number -of the previous layout that existed in the cluster. The `apply` and `revert` -commands will fail otherwise. - -## Warnings about Garage cluster layout management - -**Warning: never make several calls to `garage layout apply` or `garage layout -revert` with the same value of the `--version` flag. 
Doing so can lead to the -creation of several different layouts with the same version number, in which -case your Garage cluster will become inconsistent until fixed.** If a call to -`garage layout apply` or `garage layout revert` has failed and `garage layout -show` indicates that a new layout with the given version number has not been -set in the cluster, then it is fine to call the command again with the same -version number. - -If you are using the `garage` CLI by typing individual commands in your -shell, you shouldn't have much issues as long as you run commands one after -the other and take care of checking the output of `garage layout show` -before applying any changes. - -If you are using the `garage` CLI to script layout changes, follow the following recommendations: - -- Make all of your `garage` CLI calls to the same RPC host. Do not use the - `garage` CLI to connect to individual nodes to send them each a piece of the - layout changes you are making, as the changes propagate asynchronously - between nodes and might not all be taken into account at the time when the - new layout is applied. - -- **Only call `garage layout apply` once**, and call it **strictly after** all - of the `layout assign` and `layout remove` commands have returned. 
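The layout page moved by this commit states that the version passed to `garage layout apply --version` must be exactly 1 + the version of the previous layout. A script can make that rule explicit with a tiny guard such as the sketch below. `next_layout_version` is a hypothetical helper, not a Garage command, and it assumes the operator has already read the current version number from `garage layout show`.

```bash
# Hypothetical helper (not part of Garage): given the current layout version,
# prints the only version number that the next apply/revert may use.
# Rejects anything that is not a plain non-negative decimal integer.
next_layout_version() {
    case "$1" in
        ''|*[!0-9]*)
            echo "invalid layout version: $1" >&2
            return 1
            ;;
    esac
    echo $(( $1 + 1 ))
}
```

A script would then call `garage layout apply --version "$(next_layout_version "$current")"` exactly once, after all `layout assign` and `layout remove` calls have returned, where `current` is a variable the script fills in itself.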
diff --git a/doc/book/working-documents/_index.md b/doc/book/working-documents/_index.md index 8fc170b7..fe79e65d 100644 --- a/doc/book/working-documents/_index.md +++ b/doc/book/working-documents/_index.md @@ -1,6 +1,6 @@ +++ title = "Working Documents" -weight = 8 +weight = 90 sort_by = "weight" template = "documentation.html" +++ -- cgit v1.2.3 From 7169ee6ee661c84acc2847c96cc8afaebf06dc09 Mon Sep 17 00:00:00 2001 From: Alex Auvolat Date: Wed, 14 Jun 2023 12:09:29 +0200 Subject: doc: reformulate in monitoring page --- doc/book/cookbook/monitoring.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/doc/book/cookbook/monitoring.md b/doc/book/cookbook/monitoring.md index 8313daa9..b204dbbe 100644 --- a/doc/book/cookbook/monitoring.md +++ b/doc/book/cookbook/monitoring.md @@ -49,9 +49,5 @@ add the following lines in your Prometheus scrape config: To visualize the scraped data in Grafana, you can either import our [Grafana dashboard for Garage](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/branch/main/script/telemetry/grafana-garage-dashboard-prometheus.json) or make your own. -We detail below the list of exposed metrics and their meaning. - -## List of exported metrics - -See our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section. +The list of exported metrics is available on our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section. 
-- cgit v1.2.3 From 39c3738a079f2a18ee1ef378c8f67050eb2f442b Mon Sep 17 00:00:00 2001 From: Alex Auvolat Date: Wed, 14 Jun 2023 12:29:52 +0200 Subject: Add a page about encryption (fix #416) --- doc/book/cookbook/encryption.md | 105 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 105 insertions(+) create mode 100644 doc/book/cookbook/encryption.md diff --git a/doc/book/cookbook/encryption.md b/doc/book/cookbook/encryption.md new file mode 100644 index 00000000..156c54e8 --- /dev/null +++ b/doc/book/cookbook/encryption.md @@ -0,0 +1,105 @@ ++++ +title = "Encryption" +weight = 50 ++++ + +Encryption is a recurring subject when discussing Garage. +Garage does not handle data encryption by itself, but many things can +already be done with Garage's current feature set and the existing ecosystem. + +This page takes a high level approach to security in general and data encryption +in particular. + + +# Examining your need for encryption + +- Why do you want encryption in Garage? + +- What is your threat model? What do you fear? + - A stolen HDD? + - A curious administrator? + - A malicious administrator? + - A remote attacker? + - etc. + +- What services do you want to protect with encryption? + - An existing application? Which one? (e.g. Nextcloud) + - An application that you are writing + +- Any expertise you may have on the subject + +This page explains what Garage provides, and how you can improve the situation by yourself +by adding encryption at different levels. + +We would be very curious to know your needs and thoughts about ideas such as +encryption practices and things like key management, as we want Garage to be a +serious base platform for the development of secure, encrypted applications. +Do not hesitate to come talk to us if you have any thoughts or questions on the +subject. + + +# Capabilities provided by Garage + +## Traffic is encrypted between Garage nodes + +RPCs between Garage nodes are encrypted.
More specifically, contrary to many +distributed systems, it is impossible in Garage to have clear-text RPC. We +use the [kuska handshake](https://github.com/Kuska-ssb/handshake) library which +implements a protocol that has been clearly reviewed, Secure ScuttleButt's +Secret Handshake protocol. This is why setting a `rpc_secret` is mandatory, +and that's also why your nodes have super long identifiers. + +## Encrypting traffic between a Garage node and your client + +HTTP API endpoints provided by Garage are in clear text. +You have multiple options to have encryption between your client and a node: + + - Setup a reverse proxy with TLS / ACME / Let's encrypt + - Setup a Garage gateway locally, and only contact the garage daemon on `localhost` + - Only contact your Garage daemon over a secure, encrypted overlay network such as Wireguard + +## Garage stores data in plain text on the filesystem + +Garage does not handle data encryption at rest by itself, and instead delegates +to the user to add encryption, either at the storage layer (LUKS, etc) or on +the client side (or both). There are no current plans to add data encryption +directly in Garage. + +Implementing data encryption directly in Garage might make things simpler for +end users, but also raises many more questions, especially around key +management: for encryption of data, where could Garage get the encryption keys +from? If we encrypt data but keep the keys in a plaintext file next to them, +it's useless. We probably don't want to have to manage secrets in garage as it +would be very hard to do in a secure way. Maybe integrate with an external +system such as Hashicorp Vault? + + +# Adding data encryption using external tools + +## Encrypting data at rest + +Protects against the following threats: + +- Stolen HDD + +Crucially, does not protect against malicious sysadmins or remote attackers that +might gain access to your servers. + +Methods include full-disk encryption with tools such as LUKS.
+ +## Encrypting data on the client side + +Protects against the following threats: + +- An honest-but-curious administrator +- A malicious administrator that tries to corrupt your data +- A remote attacker that can read your server's data + +Implementations are very specific to the various applications. Examples: + +- Matrix: uses the OLM protocol for E2EE of user messages. Media files stored + in Matrix are probably encrypted using symmetric encryption, with a key that is + distributed in the end-to-end encrypted message that contains the link to the object. + +- Aerogramme: uses the user's password as a key to decrypt data in the user's bucket + -- cgit v1.2.3 From 120f8b3bfb61d1f38290207ac67933263cb57eeb Mon Sep 17 00:00:00 2001 From: Alex Auvolat Date: Wed, 14 Jun 2023 12:33:25 +0200 Subject: doc: better doc on systemd's DynamicUser (fix #430) --- doc/book/cookbook/systemd.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/doc/book/cookbook/systemd.md b/doc/book/cookbook/systemd.md index b271010b..c0ed7d1f 100644 --- a/doc/book/cookbook/systemd.md +++ b/doc/book/cookbook/systemd.md @@ -33,7 +33,20 @@ NoNewPrivileges=true WantedBy=multi-user.target ``` -*A note on hardening: garage will be run as a non privileged user, its user id is dynamically allocated by systemd. It cannot access (read or write) home folders (/home, /root and /run/user), the rest of the filesystem can only be read but not written, only the path seen as /var/lib/garage is writable as seen by the service (mapped to /var/lib/private/garage on your host). Additionnaly, the process can not gain new privileges over time.* +**A note on hardening:** Garage will be run as a non privileged user, its user +id is dynamically allocated by systemd (set with `DynamicUser=true`).
It cannot
+access (read or write) home folders (`/home`, `/root` and `/run/user`), the
+rest of the filesystem can only be read but not written, only the path seen as
+`/var/lib/garage` is writable as seen by the service. Additionnaly, the process
+can not gain new privileges over time.
+
+For this to work correctly, your `garage.toml` must be set with
+`metadata_dir=/var/lib/garage/meta` and `data_dir=/var/lib/garage/data`. This
+is mandatory to use the DynamicUser hardening feature of systemd, which
+autocreates these directories as virtual mapping. If the directory
+`/var/lib/garage` already exists before starting the server for the first time,
+the systemd service might not start correctly. Note that in your host
+filesystem, Garage data will be held in `/var/lib/private/garage`.

 To start the service then automatically enable it at boot:

-- 
cgit v1.2.3

From 9092c71a01311f8f7174fa03facdb4d95a7b1389 Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Wed, 14 Jun 2023 12:51:47 +0200
Subject: doc: encryption organization

---
 doc/book/cookbook/encryption.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/doc/book/cookbook/encryption.md b/doc/book/cookbook/encryption.md
index 156c54e8..8d45a0ee 100644
--- a/doc/book/cookbook/encryption.md
+++ b/doc/book/cookbook/encryption.md
@@ -49,14 +49,9 @@ implements a protocol that has been clearly reviewed, Secure ScuttleButt's
 Secret Handshake protocol. This is why setting a `rpc_secret` is mandatory,
 and that's also why your nodes have super long identifiers.

-## Encrypting traffic between a Garage node and your client
+## HTTP API endpoints provided by Garage are in clear text

-HTTP API endpoints provided by Garage are in clear text.
-You have multiple options to have encryption between your client and a node:
-
- - Setup a reverse proxy with TLS / ACME / Let's encrypt
- - Setup a Garage gateway locally, and only contact the garage daemon on `localhost`
- - Only contact your Garage daemon over a secure, encrypted overlay network such as Wireguard
+Adding TLS support built into Garage is not currently planned.

 ## Garage stores data in plain text on the filesystem

@@ -76,6 +71,14 @@ system such as Hashicorp Vault?

 # Adding data encryption using external tools

+## Encrypting traffic between a Garage node and your client
+
+You have multiple options to have encryption between your client and a node:
+
+ - Setup a reverse proxy with TLS / ACME / Let's encrypt
+ - Setup a Garage gateway locally, and only contact the garage daemon on `localhost`
+ - Only contact your Garage daemon over a secure, encrypted overlay network such as Wireguard
+
 ## Encrypting data at rest

 Protects against the following threats:
-- 
cgit v1.2.3