Merge pull request 'multi-hdd support (fix #218)' (#625) from multihdd into next

Reviewed-on: https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/625
author: Alex <alex@adnab.me> 2023-09-11 10:52:01 +0000
committer: Alex <alex@adnab.me> 2023-09-11 10:52:01 +0000
commit: 7228fbfd4f62942cf0212d838446ece5ee7f8ef2 (patch)
tree: ef742d50efb5c3d66145f8a5583e4728c483cb9f /doc/book/operations
parent: 4b4f2000f45a83b4dad3f2a8fd8392a245a30286 (diff)
parent: ba7ac52c196c452e0b09fef63862264e0c4582bb (diff)
download: garage-7228fbfd4f62942cf0212d838446ece5ee7f8ef2.tar.gz
garage-7228fbfd4f62942cf0212d838446ece5ee7f8ef2.zip
3 files changed, 112 insertions, 2 deletions
diff --git a/doc/book/operations/durability-repairs.md b/doc/book/operations/durability-repairs.md
index 498c8fda..b0d2c78a 100644
--- a/doc/book/operations/durability-repairs.md
+++ b/doc/book/operations/durability-repairs.md
@@ -91,6 +91,16 @@ is definitely lost, then there is no other choice than to declare your S3 object
 as unrecoverable, and to delete them properly from the data store. This can be done
 using the `garage block purge` command.
 
+## Rebalancing data directories
+
+In [multi-HDD setups](@/documentation/operations/multi-hdd.md), to ensure that
+data blocks are well balanced between storage locations, you may run a
+rebalance operation using `garage repair rebalance`. This is useful when
+adding storage locations or when capacities of the storage locations have been
+changed. Once this is finished, Garage will know, for each block, the single
+location where it can be stored, which can increase access speed. This
+operation will also move all data out of locations marked as read-only.
+
 
 # Metadata operations
 
@@ -114,4 +124,3 @@ in your cluster, you can run one of the following repair procedures:
 
 - `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
 - `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
-
diff --git a/doc/book/operations/multi-hdd.md b/doc/book/operations/multi-hdd.md
new file mode 100644
index 00000000..36445b0a
--- /dev/null
+++ b/doc/book/operations/multi-hdd.md
@@ -0,0 +1,101 @@
++++
+title = "Multi-HDD support"
+weight = 15
++++
+
+
+Since v0.9, Garage natively supports nodes that have several storage drives
+for storing data blocks (not for metadata storage).
+
+## Initial setup
+
+To set up a new Garage storage node with multiple HDDs,
+format and mount all your drives in different directories,
+and use a Garage configuration as follows:
+
+```toml
+data_dir = [
+    { path = "/path/to/hdd1", capacity = "2T" },
+    { path = "/path/to/hdd2", capacity = "4T" },
+]
+```
+
+Garage will automatically balance all blocks stored by the node
+among the different specified directories, proportionally to the
+specified capacities.
+
+## Updating the list of storage locations
+
+If you add new storage locations to your `data_dir`,
+Garage will not rebalance existing data between storage locations.
+Newly written blocks will be balanced proportionally to the specified capacities,
+and existing data may be moved between drives to improve balancing,
+but only opportunistically when a data block is re-written (e.g. an object
+is re-uploaded, or an object with a duplicate block is uploaded).
+
+To understand precisely what is happening, we need to dive into how Garage
+splits data among the different storage locations.
+
+First of all, Garage divides the set of all possible block hashes
+into a fixed number of slices (currently 1024), and assigns
+to each slice a primary storage location among the specified data directories.
+The number of slices having their primary location in each data directory
+is proportional to the capacity specified in the config file.
+
+When Garage receives a block to write, it will always write it in the primary
+directory of the slice that contains its hash.
+
+Now, to avoid losing existing data blocks when storage locations
+are added, Garage also keeps a list of secondary data directories
+for each of the hash slices. The secondary data directories of a slice are the
+storage locations that once were primary directories for that slice, i.e. where
+Garage knows that data blocks of that slice might be stored.
+When Garage is requested to read a certain data block,
+it will first look in the primary storage directory of its slice,
+and if it doesn't find it there it goes through all of the secondary storage
+locations until it finds it. This allows Garage to continue operating
+normally when storage locations are added, without having to shuffle
+files between drives to place them in the correct location.
+
+This relatively simple strategy works well but does not ensure that data
+is correctly balanced among drives according to their capacity.
+To rebalance data, two strategies can be used:
+
+- Lazy rebalancing: when a block is re-written (e.g. the object is re-uploaded),
+  Garage checks whether the existing copy is in the primary directory of the slice
+  or in a secondary directory. If the current copy is in a secondary directory,
+  Garage re-writes a copy in the primary directory and deletes the one from the
+  secondary directory. This might never end up rebalancing everything if there
+  are data blocks that are only read and never written.
+
+- Active rebalancing: an operator of a Garage node can explicitly launch a repair
+  procedure that rebalances the data directories, moving all blocks to their
+  primary location. Once done, all secondary locations for all hash slices are
+  removed so that they won't be checked anymore when looking for a data block.
+
+## Read-only storage locations
+
+If you would like to move all data blocks from an existing data directory to one
+or several new data directories, mark the old directory as read-only:
+
+```toml
+data_dir = [
+    { path = "/path/to/old_data", read_only = true },
+    { path = "/path/to/new_hdd1", capacity = "2T" },
+    { path = "/path/to/new_hdd2", capacity = "4T" },
+]
+```
+
+Garage will be able to read requested blocks from the read-only directory.
+Garage will also move data out of the read-only directory either progressively
+(lazy rebalancing) or if requested explicitly (active rebalancing).
+
+Once an active rebalancing has finished, your read-only directory should be empty:
+it might still contain subdirectories, but no data files. You can check that
+it contains no files using:
+
+```bash
+find /path/to/old_data -type f      # should not print anything
+```
+
+at which point it can be removed from the `data_dir` list in your config file.
diff --git a/doc/book/operations/upgrading.md b/doc/book/operations/upgrading.md
index e8919a19..9a738282 100644
--- a/doc/book/operations/upgrading.md
+++ b/doc/book/operations/upgrading.md
@@ -80,6 +80,6 @@ The entire procedure would look something like this:
 5. If any specific migration procedure is required, it is usually in one of the two cases:
 
   - It can be run on online nodes after the new version has started, during regular cluster operation.
-  - it has to be run offline
+  - It has to be run offline, in which case you will again have to take all nodes offline one after the other to run the repair.
 
    For this last step, please refer to the specific documentation pertaining to the version upgrade you are doing.
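
Note on the slice mechanism documented in `multi-hdd.md` above: the following is a minimal Python sketch of how primary directories could be assigned proportionally to capacity, and how a read could walk the primary and then the secondary locations. It is an illustration only, not Garage's implementation. The function names (`assign_primaries`, `slice_of`, `read_candidates`), the contiguous assignment of slices, the largest-remainder rounding, and the way a slice index is derived from the first two bytes of the hash are all assumptions made for the example.

```python
N_SLICES = 1024  # number of hash slices given in the documentation


def assign_primaries(capacities):
    """Return a list mapping each slice index to a primary directory.

    `capacities` maps a directory path to its capacity (any consistent unit).
    Slice counts are proportional to capacity; largest-remainder rounding
    (an assumption of this sketch) makes the counts sum to N_SLICES.
    """
    total = sum(capacities.values())
    paths = list(capacities)
    quotas = [capacities[p] * N_SLICES / total for p in paths]
    counts = [int(q) for q in quotas]
    # Hand out the slices lost to rounding, largest fractional part first.
    by_remainder = sorted(
        range(len(paths)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in by_remainder[: N_SLICES - sum(counts)]:
        counts[i] += 1
    primaries = []
    for path, count in zip(paths, counts):
        primaries.extend([path] * count)
    return primaries


def slice_of(block_hash):
    """Hypothetical mapping from a block hash (bytes) to a slice index."""
    return int.from_bytes(block_hash[:2], "big") % N_SLICES


def read_candidates(block_hash, primaries, secondaries):
    """Directories to check for a block: primary first, then secondaries.

    `secondaries` maps a slice index to the directories that used to be
    primary for that slice.
    """
    s = slice_of(block_hash)
    return [primaries[s]] + secondaries.get(s, [])
```

With the capacities from the example configuration (`2T` and `4T`, written here as the plain numbers 2 and 4), `assign_primaries({"/path/to/hdd1": 2, "/path/to/hdd2": 4})` gives 341 slices to the first directory and 683 to the second. In this model, an active rebalance corresponds to emptying `secondaries`, so that `read_candidates` returns a single directory for every block.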