author Alex Auvolat <alex@adnab.me> 2024-03-15 13:16:41 +0100
committer Alex Auvolat <alex@adnab.me> 2024-03-15 13:51:31 +0100
commit 8cf3d24875d41d79ab08d637cd38d2a5b9e527dd (patch)
tree c92172dee172941c3daf32a08927f8ebab0ded9e
parent a68c37555d15bb19a10f74c7ee85485a5228ab66 (diff)
[db-snapshot] documentation for metadata db snapshots
-rw-r--r-- doc/book/cookbook/real-world.md 16
-rw-r--r-- doc/book/operations/durability-repairs.md 18
-rw-r--r-- doc/book/operations/recovering.md 54
-rw-r--r-- doc/book/operations/upgrading.md 12
-rw-r--r-- doc/book/reference-manual/configuration.md 21
5 files changed, 114 insertions, 7 deletions
diff --git a/doc/book/cookbook/real-world.md b/doc/book/cookbook/real-world.md
index 15a58b9b..9e226030 100644
--- a/doc/book/cookbook/real-world.md
+++ b/doc/book/cookbook/real-world.md
@@ -72,13 +72,14 @@ to store 2 TB of data in total.
to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).
- For the metadata storage, Garage does not do checksumming and integrity
- verification on its own. Users have reported that when using the LMDB
- database engine (the default), database files have a tendency of becoming
- corrupted after an unclean shutdown (e.g. a power outage), so you should use
- a robust filesystem such as BTRFS or ZFS for the metadata partition, and take
- regular snapshots so that you can restore to a recent known-good state in
- case of an incident. If you cannot do so, you might want to switch to Sqlite
- which is more robust.
+ verification on its own, so it is better to use a robust filesystem such as
+ BTRFS or ZFS. Users have reported that when using the LMDB database engine
+ (the default), database files tend to become corrupted after an unclean
+ shutdown (e.g. a power outage), so you should take regular snapshots to be
+ able to recover from such a situation. This can be done using Garage's
+ built-in automatic snapshotting (since v0.9.4), or by using filesystem-level
+ snapshots. If you cannot do so, you might want to switch to Sqlite, which is
+ more robust.
- LMDB is the fastest and most tested database engine, but it has the following
weaknesses: 1/ data files are not architecture-independent, you cannot simply
@@ -124,6 +125,7 @@ A valid `/etc/garage.toml` for our cluster would look as follows:
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
db_engine = "lmdb"
+metadata_auto_snapshot_interval = "6h"
replication_mode = "3"
diff --git a/doc/book/operations/durability-repairs.md b/doc/book/operations/durability-repairs.md
index f4450dae..c76dc39e 100644
--- a/doc/book/operations/durability-repairs.md
+++ b/doc/book/operations/durability-repairs.md
@@ -104,6 +104,24 @@ operation will also move out all data from locations marked as read-only.
# Metadata operations
+## Metadata snapshotting
+
+It is good practice to set up automatic snapshotting of your metadata database
+file, so that you can recover if it becomes corrupted on disk. This can
+be done at the filesystem level if you are using ZFS or BTRFS.
+
+Since v0.9.4, Garage can take snapshots of the metadata database by itself.
+This amounts to copying the database file, except that it can be done while
+Garage is running, without the risk of corruption or inconsistencies.
+Snapshots can be set up to run automatically on a schedule using
+[`metadata_auto_snapshot_interval`](@/documentation/reference-manual/configuration.md#metadata_auto_snapshot_interval).
+A snapshot can also be triggered manually using the `garage meta snapshot`
+command. Note that taking a snapshot using this method is resource-intensive, as
+it requires making a full copy of the database file, so you might prefer using
+filesystem-level snapshots if possible. To recover a corrupted node from such a
+snapshot, follow the
+[recovery instructions](@/documentation/operations/recovering.md#corrupted_meta).
+
## Metadata table resync
Garage automatically resyncs all entries stored in the metadata tables every hour,
diff --git a/doc/book/operations/recovering.md b/doc/book/operations/recovering.md
index 7a830788..6e19db0e 100644
--- a/doc/book/operations/recovering.md
+++ b/doc/book/operations/recovering.md
@@ -108,3 +108,57 @@ garage layout apply # once satisfied, apply the changes
Garage will then start synchronizing all required data on the new node.
This process can be monitored using the `garage stats -a` command.
+
+## Replacement scenario 3: corrupted metadata {#corrupted_meta}
+
+In some cases, your metadata DB file might become corrupted, for instance if
+your node suffered a power outage and did not shut down properly. In this case,
+you can recover without having to change the node ID or rebuild the cluster
+layout. This means that data blocks will not need to be shuffled around: you
+simply need to find a way to repair the metadata file. The best way is generally
+to discard the corrupted file and recover it from another source.
+
+First, locate the database file in your metadata directory. Its name
+[depends on your `db_engine`
+choice](@/documentation/reference-manual/configuration.md#db_engine). Then,
+your recovery options are as follows:
+
+- **Option 1: resyncing from other nodes.** If your cluster is replicated
+ with two or three copies, you can simply delete the database file, and Garage
+ will resync from other nodes. To do so, stop Garage, delete the database file
+ or directory, and restart Garage. Then, do a full table repair by calling
+ `garage repair -a --yes tables`. This will take some time to complete, as
+ the node will need to receive copies of the metadata tables over the
+ network.
+
+- **Option 2: restoring a snapshot taken by Garage.** Since v0.9.4, Garage can
+ [automatically take regular
+ snapshots](@/documentation/reference-manual/configuration.md#metadata_auto_snapshot_interval)
+ of your metadata DB file. Each snapshot is a file or directory located under
+ `<metadata_dir>/snapshots`, named according to the UTC time at which it
+ was taken. Stop Garage, discard the database file/directory and replace it with
+ the snapshot you want to use. For instance, in the case of LMDB:
+
+ ```bash
+ cd "$METADATA_DIR"
+ mv db.lmdb db.lmdb.bak
+ cp -r snapshots/2024-03-15T12:13:52Z db.lmdb
+ ```
+
+ And for Sqlite:
+
+ ```bash
+ cd "$METADATA_DIR"
+ mv db.sqlite db.sqlite.bak
+ cp snapshots/2024-03-15T12:13:52Z db.sqlite
+ ```
+
+ Then, restart Garage and run a full table repair by calling `garage repair -a
+ --yes tables`. This should run relatively quickly, as only the changes that
+ occurred since the snapshot was taken will need to be resynchronized. Of
+ course, if your cluster is not replicated, you will lose all changes that
+ occurred since the snapshot was taken.
+
+- **Option 3: restoring a filesystem-level snapshot.** If you are using ZFS or
+ BTRFS to snapshot your metadata partition, refer to their specific
+ documentation on rolling back or copying files from an old snapshot.
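The restore steps of Option 2 above can be sketched as a small shell function. This is only an illustration under stated assumptions, not part of Garage: the function name `restore_latest_snapshot` is hypothetical, it assumes Garage has already been stopped on the node, and it relies on snapshot names being UTC timestamps of the form shown above, so that the lexically last name is the most recent snapshot.

```shell
#!/usr/bin/env bash
# Sketch only: restore the most recent Garage-made snapshot of the metadata DB.
# Assumption: Garage is already stopped on this node.
# restore_latest_snapshot is a hypothetical helper, not a Garage command.
restore_latest_snapshot() {
  local meta_dir="$1"   # your metadata_dir, e.g. /var/lib/garage/meta
  local db_name="$2"    # db.lmdb (a directory) or db.sqlite (a file)
  local latest

  # Snapshot names are UTC timestamps (e.g. 2024-03-15T12:13:52Z), so the
  # lexically last entry is assumed to be the newest one.
  latest="$(ls -1 "$meta_dir/snapshots" 2>/dev/null | LC_ALL=C sort | tail -n 1)"
  if [ -z "$latest" ]; then
    echo "no snapshot found in $meta_dir/snapshots" >&2
    return 1
  fi

  # Refuse to overwrite an earlier backup of the database.
  if [ -e "$meta_dir/$db_name.bak" ]; then
    echo "$meta_dir/$db_name.bak already exists, move it away first" >&2
    return 1
  fi

  # Keep the corrupted database aside instead of deleting it.
  if [ -e "$meta_dir/$db_name" ]; then
    mv "$meta_dir/$db_name" "$meta_dir/$db_name.bak" || return 1
  fi

  # cp -r handles both a directory (LMDB) and a single file (Sqlite).
  cp -r "$meta_dir/snapshots/$latest" "$meta_dir/$db_name" || return 1

  # Print the name of the snapshot that was restored.
  echo "$latest"
}
```

For example, `restore_latest_snapshot /var/lib/garage/meta db.lmdb` would restore the newest snapshot for an LMDB node. After that, restart Garage and run `garage repair -a --yes tables` as described above.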
diff --git a/doc/book/operations/upgrading.md b/doc/book/operations/upgrading.md
index 6b6ea26d..c239bfe4 100644
--- a/doc/book/operations/upgrading.md
+++ b/doc/book/operations/upgrading.md
@@ -73,6 +73,18 @@ The entire procedure would look something like this:
You can do all of the nodes in a single zone at once as that won't impact global cluster availability.
Do not try to make a backup of the metadata folder of a running node.
+ **Since Garage v0.9.4,** you can use the `garage meta snapshot --all` command
+ to take a simultaneous snapshot of the metadata database files of all your
+ nodes. This avoids the tedious process of having to take them down one by
+ one before upgrading. Be aware that if automatic snapshotting is enabled,
+ Garage only keeps the last two snapshots and deletes older ones, so you might
+ want to disable automatic snapshotting in your upgraded configuration file
+ until you have confirmed that the upgrade ran successfully. In addition to
+ snapshotting the metadata databases of your nodes, you should back up at
+ least the `cluster_layout` file of one of your Garage instances (this file
+ should be the same on all nodes and you can copy it safely while Garage is
+ running).
+
3. Prepare your binaries and configuration files for the new Garage version
4. Restart all nodes simultaneously in the new version
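The pre-upgrade backup described in this hunk could be scripted along the following lines. This is a hedged sketch, not an official procedure: the helper name `backup_cluster_layout` and the backup destination are hypothetical, and the script assumes that the `cluster_layout` file sits directly inside your `metadata_dir`, which you should verify on your own nodes.

```shell
#!/usr/bin/env bash
# Sketch only: copy the cluster_layout file to a backup directory before an upgrade.
# Assumption: cluster_layout is located directly under the metadata directory.
# backup_cluster_layout is a hypothetical helper, not a Garage command.
backup_cluster_layout() {
  local meta_dir="$1"   # your metadata_dir, e.g. /var/lib/garage/meta
  local dest_dir="$2"   # where to keep the copy (hypothetical path)
  local src="$meta_dir/cluster_layout"
  local dest

  if [ ! -f "$src" ]; then
    echo "missing $src" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1

  # Suffix the copy with the current UTC time so repeated runs do not collide.
  dest="$dest_dir/cluster_layout.$(date -u +%Y%m%dT%H%M%SZ)"
  cp -p "$src" "$dest" || return 1

  # Print the path of the copy that was written.
  echo "$dest"
}

# Possible usage on one node before upgrading (paths are examples):
#   garage meta snapshot --all
#   backup_cluster_layout /var/lib/garage/meta /root/garage-upgrade-backup
```

Keeping the copy outside the metadata directory means it is left untouched if that directory later has to be restored or replaced.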
diff --git a/doc/book/reference-manual/configuration.md b/doc/book/reference-manual/configuration.md
index 8e87b7d8..de800ec0 100644
--- a/doc/book/reference-manual/configuration.md
+++ b/doc/book/reference-manual/configuration.md
@@ -15,6 +15,7 @@ data_dir = "/var/lib/garage/data"
metadata_fsync = true
data_fsync = false
disable_scrub = false
+metadata_auto_snapshot_interval = "6h"
db_engine = "lmdb"
@@ -90,6 +91,7 @@ Top-level configuration options:
[`db_engine`](#db_engine),
[`disable_scrub`](#disable_scrub),
[`lmdb_map_size`](#lmdb_map_size),
+[`metadata_auto_snapshot_interval`](#metadata_auto_snapshot_interval),
[`metadata_dir`](#metadata_dir),
[`metadata_fsync`](#metadata_fsync),
[`replication_mode`](#replication_mode),
@@ -346,6 +348,25 @@ at the cost of a moderate drop in write performance.
Similarly to `metadata_fsync`, this is likely not necessary
if geographical replication is used.
+#### `metadata_auto_snapshot_interval` (since Garage v0.9.4) {#metadata_auto_snapshot_interval}
+
+If this value is set, Garage will automatically take a snapshot of the metadata
+DB file at a regular interval and save it in the metadata directory.
+This makes it possible to recover from situations where the metadata DB file is corrupted,
+for instance after an unclean shutdown.
+See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.
+
+Garage keeps only the two most recent snapshots of the metadata DB and deletes
+older ones automatically.
+
+Note that taking a metadata snapshot is a relatively intensive operation as the
+entire data file is copied. Taking a snapshot might degrade the performance of
+the Garage node while the operation is in progress. If the cluster is under heavy
+write load when a snapshot operation is running, this might also cause the
+database file to grow in size significantly as pages cannot be recycled easily.
+For this reason, it might be better to use filesystem-level snapshots instead
+if possible.
+
#### `disable_scrub` {#disable_scrub}
By default, Garage runs a scrub of the data directory approximately once per