author Alex Auvolat <alex@adnab.me> 2023-01-30 18:41:04 +0100
committer Alex Auvolat <alex@adnab.me> 2023-01-30 18:41:04 +0100
commit 7f715ba94fd636c5fb9d19686e5bf9f51242df06
tree 5b65f9e5a3023be3f1094ef896e206a8d1417888 /doc/book/working-documents
parent 44f8b1d71abf661fb4e2a34b22c00569efc09481
zero-downtime migration procedure
Diffstat (limited to 'doc/book/working-documents')
-rw-r--r-- doc/book/working-documents/migration-08.md | 25
1 file changed, 24 insertions, 1 deletion
diff --git a/doc/book/working-documents/migration-08.md b/doc/book/working-documents/migration-08.md
index 5f97c45b..b7c4c783 100644
--- a/doc/book/working-documents/migration-08.md
+++ b/doc/book/working-documents/migration-08.md
@@ -12,13 +12,15 @@ back up all your data before attempting it!**
Garage v0.8 introduces new data tables that allow the counting of objects in buckets in order to implement bucket quotas.
A manual migration step is required to first count objects in Garage buckets and populate these tables with accurate data.
+## Simple migration procedure (takes cluster offline for a while)
+
The migration steps are as follows:
1. Disable API and web access. Garage v0.7 does not support disabling
these endpoints, but you can, for instance, change the port number or stop your reverse proxy.
2. Do `garage repair --all-nodes --yes tables` and `garage repair --all-nodes --yes blocks`,
check the logs and check that all data seems to be synced correctly between
- nodes. If you have time, do additional checks (`scrub`, `block_refs`, etc.)
+ nodes. If you have time, do additional checks (`versions`, `block_refs`, etc.)
3. Check that queues are empty: run `garage stats` to query them or inspect metrics in the Grafana dashboard.
4. Turn off Garage v0.7
5. **Back up the metadata folder of all your nodes!** For instance, use the following command
@@ -32,3 +34,24 @@ The migration steps are as follows:
10. Your upgraded cluster should be in a working state. Re-enable API and Web
access and check that everything went well.
11. Monitor your cluster over the next few hours to check that it works well under your production load, and report any issues.
+
+## Minimal downtime migration procedure
+
+The migration to Garage v0.8 can be done with almost no downtime
+by restarting all nodes at once on the new version. The only limitation of this
+method is that bucket sizes and object counts will not be reported correctly
+until every node has run the offline migration step.
+
+The migration steps are as follows:
+
+1. Do `garage repair --all-nodes --yes tables` and `garage repair --all-nodes --yes blocks`,
+ check the logs and check that all data seems to be synced correctly between
+ nodes. If you have time, do additional checks (`versions`, `block_refs`, etc.)
+
+2. Turn off each node individually, back up its metadata folder (see above), and turn it back on. This lets you back up all nodes without impacting global cluster availability. You can process all nodes of a single zone at once, since taking a single zone offline does not impact the availability of Garage.
+
+3. Prepare your binaries and configuration files for Garage v0.8.
+
+4. Shut down all v0.7 nodes simultaneously, then restart them all simultaneously on v0.8. Use your favorite deployment tool (Ansible, Kubernetes, Nomad, etc.) to do this as quickly as possible.
+
+5. At this point, Garage will report invalid values for the size and number of objects in each bucket (most likely zero). To fix this, take each node offline individually and run the offline migration step: `garage offline-repair --yes object_counters`. Again, you can process all nodes of a single zone at once.
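Steps 2 and 5 of the minimal downtime procedure follow the same rolling pattern. Below is a minimal sketch of step 5 that processes one zone at a time, assuming a hypothetical `NODES` inventory of `host:zone` entries, SSH access to every node, and a systemd unit named `garage`; none of these come from the Garage documentation. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Minimal sketch of step 5: run the offline migration one zone at a time.
# Hypothetical assumptions (not from the Garage docs): the NODES inventory,
# SSH access to every node, and a systemd unit named "garage" (sudo may be needed).
set -euo pipefail

# Hypothetical inventory, one "host:zone" entry per node.
NODES=(node1:dc1 node2:dc1 node3:dc2 node4:dc3)

# Only print commands by default; set DRY_RUN=0 to actually run them over SSH.
run_on() {
  local host=$1
  shift
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$host: $*"
  else
    ssh -n "$host" "$@"
  fi
}

# List the distinct zones, in the order they first appear in NODES.
zones() {
  local entry
  for entry in "${NODES[@]}"; do
    echo "${entry#*:}"
  done | awk '!seen[$0]++'
}

# List the hosts of one zone, in inventory order.
hosts_in_zone() {
  local entry
  for entry in "${NODES[@]}"; do
    if [[ "${entry#*:}" == "$1" ]]; then
      echo "${entry%%:*}"
    fi
  done
}

# Stop every node of a zone, run the offline repair on each, then restart them.
repair_zone() {
  local host
  local -a hosts
  mapfile -t hosts < <(hosts_in_zone "$1")
  for host in "${hosts[@]}"; do run_on "$host" systemctl stop garage; done
  for host in "${hosts[@]}"; do
    run_on "$host" garage offline-repair --yes object_counters
  done
  for host in "${hosts[@]}"; do run_on "$host" systemctl start garage; done
}

main() {
  local zone
  local -a zone_list
  mapfile -t zone_list < <(zones)
  for zone in "${zone_list[@]}"; do
    repair_zone "$zone"
    # In practice, confirm the zone is back up (e.g. `garage status`) before the next one.
  done
}

main
```

Batching by zone follows the text's note that one zone can go offline without impacting availability; giving every node its own pseudo-zone in `NODES` turns the sketch into a strict one-node-at-a-time rollout.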