From 6790e24f5a41af121f413eb2c9052d9b2438a4a0 Mon Sep 17 00:00:00 2001 From: Alex Auvolat Date: Thu, 5 Oct 2023 15:20:48 +0200 Subject: Add migration to v0.9 guide --- doc/book/working-documents/migration-09.md | 68 ++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 doc/book/working-documents/migration-09.md diff --git a/doc/book/working-documents/migration-09.md b/doc/book/working-documents/migration-09.md new file mode 100644 index 00000000..b277bb9b --- /dev/null +++ b/doc/book/working-documents/migration-09.md @@ -0,0 +1,68 @@ ++++ +title = "Migrating from 0.8 to 0.9" +weight = 14 ++++ + +**This guide explains how to migrate to 0.9 if you have an existing 0.8 cluster. +We don't recommend trying to migrate to 0.9 directly from 0.7 or older.** + +**We make no guarantee that this migration will work perfectly: +back up all your data before attempting it!** + +The following are **breaking changes** in Garage v0.9 that require your attention when migrating: + +- LMDB is now the default metadata db engine and Sled is deprecated. If you were using Sled, make sure to specify `db_engine = "sled"` in your configuration file, or take the time to [convert your database](https://garagehq.deuxfleurs.fr/documentation/reference-manual/configuration/#db-engine-since-v0-8-0). + +- Capacity values are now in actual byte units. The translation from the old layout will assign 1 capacity = 1Gb by default, which might be wrong for your cluster. This does not cause any data to be moved around, but you might want to re-assign correct capacity values post-migration. + +- Multipart uploads that were started in Garage v0.8 will not be visible in Garage v0.9 and will have to be restarted from scratch. + +- Changes to the admin API: some `v0/` endpoints have been replaced by `v1/` counterparts with updated/uniformized syntax. All other endpoints have also moved to `v1/` by default, without syntax changes, but are still available under `v0/` for compatibility. + + +## Simple migration procedure (takes cluster offline for a while) + +The migration steps are as follows: + +1. Disable API and web access. You may do this by stopping your reverse proxy or by commenting out + the `api_bind_addr` values in your `config.toml` file. +2. Do `garage repair --all-nodes --yes tables` and `garage repair --all-nodes --yes blocks`, + check the logs and check that all data seems to be synced correctly between + nodes. If you have time, do additional checks (`versions`, `block_refs`, etc.) +3. Check that the block resync queue and Merkle queue are empty: + run `garage stats -a` to query them or inspect metrics in the Grafana dashboard. +4. Turn off Garage v0.8 +5. **Backup the metadata folder of all your nodes!** For instance, use the following command + if your metadata directory is `/var/lib/garage/meta`: `cd /var/lib/garage ; tar -acf meta-v0.8.tar.zst meta/` +6. Install Garage v0.9 +7. Update your configuration file if necessary. +8. Turn on Garage v0.9 +9. Do `garage repair --all-nodes --yes tables` and `garage repair --all-nodes --yes blocks`. + Wait for a full table sync to run. +10. Your upgraded cluster should be in a working state. Re-enable API and Web + access and check that everything went well. +11. Monitor your cluster in the next hours to see if it works well under your production load, report any issue. +12. You might want to assign correct capacity values to all your nodes. Doing so might cause data to be moved + in your cluster, which should also be monitored carefully. + +## Minimal downtime migration procedure + +The migration to Garage v0.9 can be done with almost no downtime, +by restarting all nodes at once in the new version. + +The migration steps are as follows: + +1. Do `garage repair --all-nodes --yes tables` and `garage repair --all-nodes --yes blocks`, + check the logs and check that all data seems to be synced correctly between + nodes. If you have time, do additional checks (`versions`, `block_refs`, etc.) + +2. Turn off each node individually; back up its metadata folder (see above); turn it back on again. + This will allow you to take a backup of all nodes without impacting global cluster availability. + You can do all nodes of a single zone at once as this does not impact the availability of Garage. + +3. Prepare your binaries and configuration files for Garage v0.9 + +4. Shut down all v087 nodes simultaneously, and restart them all simultaneously in v0.9. + Use your favorite deployment tool (Ansible, Kubernetes, Nomad) to achieve this as fast as possible. + +5. Proceed with steps 9-12 above. -- cgit v1.2.3