From a3871f2251ab61744ab6e0b77763207949e0dd20 Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Tue, 9 Nov 2021 12:24:04 +0100
Subject: Improve how node roles are assigned in Garage

- change the terminology: the network configuration becomes the role table,
  the configuration of a node becomes a node's role
- the modification of the role table takes place in two steps: first, changes
  are staged in a CRDT data structure. Then, once the user is happy with the
  changes, they can commit them all at once (or revert them).
- update documentation
- fix tests
- implement smarter partition assignment algorithm

This patch breaks the format of the network configuration: when migrating,
the cluster will be in a state where no roles are assigned. All roles must be
re-assigned and committed at once. This migration should not pose an issue.
---
 doc/book/src/SUMMARY.md                          |  3 +-
 doc/book/src/cookbook/gateways.md                | 10 +++-
 doc/book/src/cookbook/real_world.md              | 58 ++++++++++++-------
 doc/book/src/cookbook/recovering.md              | 18 +++---
 doc/book/src/intro.md                            | 39 +++++++------
 doc/book/src/quick_start/index.md                | 27 +++++----
 doc/book/src/reference_manual/configuration.md   |  4 +-
 doc/book/src/reference_manual/layout.md          | 74 ++++++++++++++++++++++++
 doc/book/src/working_documents/load_balancing.md |  2 +
 9 files changed, 172 insertions(+), 63 deletions(-)
 create mode 100644 doc/book/src/reference_manual/layout.md

(limited to 'doc/book/src')

diff --git a/doc/book/src/SUMMARY.md b/doc/book/src/SUMMARY.md
index 1f597954..cbf6bb70 100644
--- a/doc/book/src/SUMMARY.md
+++ b/doc/book/src/SUMMARY.md
@@ -5,12 +5,12 @@
 - [Quick start](./quick_start/index.md)
 - [Cookbook](./cookbook/index.md)
+  - [Multi-node deployment](./cookbook/real_world.md)
   - [Building from source](./cookbook/from_source.md)
   - [Integration with systemd](./cookbook/systemd.md)
   - [Gateways](./cookbook/gateways.md)
   - [Exposing buckets as websites](./cookbook/exposing_websites.md)
   - [Configuring a reverse
proxy](./cookbook/reverse_proxy.md)
-  - [Production Deployment](./cookbook/real_world.md)
   - [Recovering from failures](./cookbook/recovering.md)
 - [Integrations](./connect/index.md)
@@ -25,6 +25,7 @@
 - [Reference Manual](./reference_manual/index.md)
   - [Garage configuration file](./reference_manual/configuration.md)
+  - [Cluster layout management](./reference_manual/layout.md)
   - [Garage CLI](./reference_manual/cli.md)
   - [S3 compatibility status](./reference_manual/s3_compatibility.md)
diff --git a/doc/book/src/cookbook/gateways.md b/doc/book/src/cookbook/gateways.md
index f1ad43e4..7b286b65 100644
--- a/doc/book/src/cookbook/gateways.md
+++ b/doc/book/src/cookbook/gateways.md
@@ -21,7 +21,9 @@ Currently it will not work with minio client. Follow issue [#64](https://git.deu
 The instructions are similar to a regular node, the only option that is different is while configuring the node, you must set the `--gateway` parameter:
 ```bash
-garage node configure --gateway --tag gw1 xxxx
+garage layout assign --gateway --tag gw1
+garage layout show # review the changes you are making
+garage layout apply # once satisfied, apply the changes
 ```
 Then use `http://localhost:3900` when a S3 endpoint is required:
@@ -29,3 +31,9 @@ Then use `http://localhost:3900` when a S3 endpoint is required:
 ```bash
 aws --endpoint-url http://127.0.0.1:3900 s3 ls
 ```
+
+If a newly added gateway node seems to not be working, do a full table resync to ensure that bucket and key list are correctly propagated:
+
+```bash
+garage repair -a --yes tables
+```
diff --git a/doc/book/src/cookbook/real_world.md b/doc/book/src/cookbook/real_world.md
index 7864274c..4b3fec2b 100644
--- a/doc/book/src/cookbook/real_world.md
+++ b/doc/book/src/cookbook/real_world.md
@@ -41,15 +41,15 @@ For our example, we will suppose the following infrastructure with IPv6 connecti
 ## Get a Docker image
-Our docker image is currently named `lxpz/garage_amd64` and is stored on the [Docker
Hub](https://hub.docker.com/r/lxpz/garage_amd64/tags?page=1&ordering=last_updated).
+Our docker image is currently named `dxflrs/amd64_garage` and is stored on the [Docker Hub](https://hub.docker.com/r/dxflrs/amd64_garage/tags?page=1&ordering=last_updated).
 We encourage you to use a fixed tag (eg. `v0.4.0`) and not the `latest` tag.
 For this example, we will use the latest published version at the time of the writing which is `v0.4.0` but it's up to you
-to check [the most recent versions on the Docker Hub](https://hub.docker.com/r/lxpz/garage_amd64/tags?page=1&ordering=last_updated).
+to check [the most recent versions on the Docker Hub](https://hub.docker.com/r/dxflrs/amd64_garage/tags?page=1&ordering=last_updated).
 For example:
 ```
-sudo docker pull lxpz/garage_amd64:v0.4.0
+sudo docker pull dxflrs/amd64_garage:v0.4.0
 ```
 ## Deploying and configuring Garage
@@ -144,7 +144,7 @@ At this point, nodes are not yet talking to one another.
 Your output should therefore look like follows:
 ```
-Mercury$ garage node-id
+Mercury$ garage status
 ==== HEALTHY NODES ====
 ID Hostname Address Tag Zone Capacity
 563e1ac825ee3323… Mercury [fc00:1::1]:3901 NO ROLE ASSIGNED
@@ -157,14 +157,14 @@ When your Garage nodes first start, they will generate a local node identifier
 (based on a public/private key pair).
 To obtain the node identifier of a node, once it is generated,
-run `garage node-id`.
+run `garage node id`.
 This will print keys as follows:
 ```bash
-Mercury$ garage node-id
+Mercury$ garage node id
 563e1ac825ee3323aa441e72c26d1030d6d4414aeb3dd25287c531e7fc2bc95d@[fc00:1::1]:3901
-Venus$ garage node-id
+Venus$ garage node id
 86f0f26ae4afbd59aaf9cfb059eefac844951efd5b8caeec0d53f4ed6c85f332@[fc00:1::2]:3901
 etc.
@@ -191,20 +191,22 @@ ID Hostname Address Tag Zone Capa
 212f7572f0c89da9… Mars [fc00:F::1]:3901 NO ROLE ASSIGNED
 ```
-## Giving roles to nodes
+## Creating a cluster layout
 We will now inform Garage of the disk space available on each node of the cluster as well as the zone (e.g. datacenter) in which each machine is located.
+This information is called the **cluster layout** and consists
+of a role that is assigned to each active cluster node.
 For our example, we will suppose we have the following infrastructure (Capacity, Identifier and Zone are specific values to Garage described in the following):
 | Location | Name | Disk Space | `Capacity` | `Identifier` | `Zone` |
 |----------|---------|------------|------------|--------------|--------------|
-| Paris | Mercury | 1 To | `2` | `563e` | `par1` |
-| Paris | Venus | 2 To | `4` | `86f0` | `par1` |
-| London | Earth | 2 To | `4` | `6814` | `lon1` |
-| Brussels | Mars | 1.5 To | `3` | `212f` | `bru1` |
+| Paris | Mercury | 1 To | `10` | `563e` | `par1` |
+| Paris | Venus | 2 To | `20` | `86f0` | `par1` |
+| London | Earth | 2 To | `20` | `6814` | `lon1` |
+| Brussels | Mars | 1.5 To | `15` | `212f` | `bru1` |
 #### Node identifiers
@@ -239,13 +241,9 @@ in order to provide high availability despite failure of a zone.
 Garage reasons on an abstract metric about disk storage that is named the *capacity* of a node.
 The capacity configured in Garage must be proportional to the disk space dedicated to the node.
-Due to the way the Garage allocation algorithm works, capacity values must
-be **integers**, and must be **as small as possible**, for instance with
-1 representing the size of your smallest server.
-Here we chose that 1 unit of capacity = 0.5 To, so that we can express servers of size
-1 To and 2 To, as wel as the intermediate size 1.5 To, with the integer values 2, 4 and
-3 respectively (see table above).
+Capacity values must be **integers** but can be given any signification.
+Here we chose that 1 unit of capacity = 100 GB.
 Note that the amount of data stored by Garage on each server may not be strictly proportional to its capacity value, as Garage will priorize having 3 copies of data in different zones,
@@ -257,13 +255,29 @@ have 66% chance of being stored by Venus and 33% chance of being stored by Mercu
 Given the information above, we will configure our cluster as follow:
+```bash
+garage layout assign -z par1 -c 10 -t mercury 563e
+garage layout assign -z par1 -c 20 -t venus 86f0
+garage layout assign -z lon1 -c 20 -t earth 6814
+garage layout assign -z bru1 -c 15 -t mars 212f
 ```
-garage node configure -z par1 -c 2 -t mercury 563e
-garage node configure -z par1 -c 4 -t venus 86f0
-garage node configure -z lon1 -c 4 -t earth 6814
-garage node configure -z bru1 -c 3 -t mars 212f
+
+At this point, the changes in the cluster layout have not yet been applied.
+To show the new layout that will be applied, call:
+
+```bash
+garage layout show
 ```
+Once you are satisfied with your new layout, apply it with:
+
+```bash
+garage layout apply
+```
+
+**WARNING:** if you want to use the layout modification commands in a script,
+make sure to read [this page](/reference_manual/layout.html) first.
+
 ## Using your Garage cluster
diff --git a/doc/book/src/cookbook/recovering.md b/doc/book/src/cookbook/recovering.md
index a6f15fcb..279d574c 100644
--- a/doc/book/src/cookbook/recovering.md
+++ b/doc/book/src/cookbook/recovering.md
@@ -28,8 +28,10 @@ and you should instead use one of the methods detailed in the next sections.
 Removing a node is done with the following command:
-```
-garage node remove --yes
+```bash
+garage layout remove
+garage layout show # review the changes you are making
+garage layout apply # once satisfied, apply the changes
 ```
 (you can get the `node_id` of the failed node by running `garage status`)
@@ -50,7 +52,7 @@ We just need to tell Garage to get back all the data blocks and store them on th
 First, set up a new HDD to store Garage's data directory on the failed node, and restart Garage using the existing configuration. Then, run:
-```
+```bash
 garage repair -a --yes blocks
 ```
@@ -58,7 +60,7 @@ This will re-synchronize blocks of data that are missing to the new HDD, reading
 You can check on the advancement of this process by doing the following command:
-```
+```bash
 garage stats -a
 ```
@@ -94,9 +96,11 @@ The ID of the lost node should be shown in `garage status` in the section for di
 Then, replace the broken node by the new one, using:
-```
-garage node configure --replace \
-  -c -z -t
+```bash
+garage layout assign --replace \
+  -c -z -t
+garage layout show # review the changes you are making
+garage layout apply # once satisfied, apply the changes
 ```
 Garage will then start synchronizing all required data on the new node.
diff --git a/doc/book/src/intro.md b/doc/book/src/intro.md
index ffce8847..a54362be 100644
--- a/doc/book/src/intro.md
+++ b/doc/book/src/intro.md
@@ -18,10 +18,18 @@ This very website is hosted using Garage. In other words: the doc is the PoC!
 # The Garage Geo-Distributed Data Store
-Garage is a lightweight geo-distributed data store.
-It comes from the observation that despite numerous object stores
-many people have broken data management policies (backup/replication on a single site or none at all).
-To promote better data management policies, we focused on the following **desirable properties**:
+Garage is a lightweight geo-distributed data store that implements the
+[Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)
+object storage protocole. It enables applications to store large blobs such
+as pictures, video, images, documents, etc., in a redundant multi-node
+setting. S3 is versatile enough to also be used to publish a static
+website.
+
+Garage comes from the observation that despite the numerous existing
+implementation of object stores, many people have broken data management
+policies (backup/replication on a single site or none at all). To promote
+better data management policies, we focused on the following **desirable
+properties**:
   - **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target [hyperconverged infrastructures](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure).
   - **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures.
@@ -32,26 +40,19 @@ We also noted that the pursuit of some other goals are detrimental to our initia
 The following has been identified as **non-goals** (if these points matter to you, you should not use Garage):
   - **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only.
-  - **Feature extensiveness**: complete implementation of the S3 API or any other API to make garage a drop-in replacement is not targeted as it could lead to decisions impacting our desirable properties.
+  - **Feature extensiveness**: complete implementation of the S3 API or any other API to make Garage a drop-in replacement is not targeted as it could lead to decisions impacting our desirable properties.
   - **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication.
   - **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment.
-## Supported and planned protocols
-
-Garage speaks (or will speak) the following protocols:
-
-  - [S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) - *SUPPORTED* - Enable applications to store large blobs such as pictures, video, images, documents, etc. S3 is versatile enough to also be used to publish a static website.
-  - [IMAP](https://github.com/go-pluto/pluto) - *PLANNED* - email storage is quite complex to get good performances.
-To keep performances optimal, most IMAP servers only support on-disk storage.
-We plan to add logic to Garage to make it a viable solution for email storage.
-  - *More to come*
-
 ## Use Cases
-**[Deuxfleurs](https://deuxfleurs.fr):** Garage is used by Deuxfleurs which is a non-profit hosting organization.
-Especially, it is used to host their main website, this documentation and some of its members' blogs.
-Additionally, Garage is used as a [backend for Nextcloud](https://docs.nextcloud.com/server/20/admin_manual/configuration_files/primary_storage.html).
-Deuxfleurs also plans to use Garage as their [Matrix's media backend](https://github.com/matrix-org/synapse-s3-storage-provider) and as the backend of [OCIS](https://github.com/owncloud/ocis).
+**[Deuxfleurs](https://deuxfleurs.fr):** Garage is used by Deuxfleurs which
+is a non-profit hosting organization. Especially, it is used to host their
+main website, this documentation and some of its members' blogs.
+Deuxfleurs also uses Garage as their [Matrix's media
+backend](https://github.com/matrix-org/synapse-s3-storage-provider).
+Deuxfleurs also uses it in its continuous integration platform to store
+Drone's job logs and a Nix binary cache.
 *Are you using Garage? [Open a pull request](https://git.deuxfleurs.fr/Deuxfleurs/garage/) to add your organization here!*
diff --git a/doc/book/src/quick_start/index.md b/doc/book/src/quick_start/index.md
index 8de3fd8b..ffb3ebbe 100644
--- a/doc/book/src/quick_start/index.md
+++ b/doc/book/src/quick_start/index.md
@@ -6,22 +6,23 @@ and how to interact with it.
 Our goal is to introduce you to Garage's workflows.
 Following this guide is recommended before moving on to
-[configuring a real-world deployment](../cookbook/real_world.md).
+[configuring a multi-node cluster](../cookbook/real_world.md).
-Note that this kind of deployment should not be used in production, as it provides
-no redundancy for your data!
+Note that this kind of deployment should not be used in production,
+as it provides no redundancy for your data!
 ## Get a binary
 Download the latest Garage binary from the release pages on our repository:
-
+
 Place this binary somewhere in your `$PATH` so that you can invoke the `garage` command directly (for instance you can copy the binary in `/usr/local/bin` or in `~/.local/bin`).
 If a binary of the last version is not available for your architecture,
+or if you want a build customized for your system,
 you can [build Garage from source](../cookbook/from_source.md).
@@ -109,9 +110,9 @@ ID Hostname Address Tag Zone Capacit
 563e1ac825ee3323… linuxbox 127.0.0.1:3901 NO ROLE ASSIGNED
 ```
-## Configuring your Garage node
+## Creating a cluster layout
-Configuring the nodes in a Garage deployment means informing Garage
+Creating a cluster layout for a Garage deployment means informing Garage
 of the disk space available on each node of the cluster
 as well as the zone (e.g. datacenter) each machine is located in.
@@ -119,14 +120,18 @@ For our test deployment, we are using only one node.
The way in which we configu
 it does not matter, you can simply write:
 ```bash
-garage node configure -z dc1 -c 1
+garage layout assign -z dc1 -c 1
 ```
 where `` corresponds to the identifier of the node shown by `garage status` (first column). You can enter simply a prefix of that identifier.
-For instance here you could write just `garage node configure -z dc1 -c 1 563e`.
+For instance here you could write just `garage layout assign -z dc1 -c 1 563e`.
+The layout then has to be applied to the cluster, using:
+```bash
+garage layout apply
+```
 ## Creating buckets and keys
@@ -197,7 +202,7 @@ Now that we have a bucket and a key, we need to give permissions to the key on t
 ```
 garage bucket allow \
   --read \
-  --write
+  --write \
   nextcloud-bucket \
   --key nextcloud-app-key
 ```
@@ -270,5 +275,5 @@ The following tools can also be used to send and recieve files from/to Garage:
 - [Cyberduck](https://cyberduck.io/)
 - [`s3cmd`](https://s3tools.org/s3cmd)
-Refer to the ["configuring clients"](../cookbook/clients.md) page to learn how to configure
-these clients to interact with a Garage server.
+Refer to the ["Integrations" section](../connect/index.md) to learn how to
+configure application and command line utilities to integrate with Garage.
diff --git a/doc/book/src/reference_manual/configuration.md b/doc/book/src/reference_manual/configuration.md
index 61f7bcee..0b1e7bc7 100644
--- a/doc/book/src/reference_manual/configuration.md
+++ b/doc/book/src/reference_manual/configuration.md
@@ -133,9 +133,9 @@ These peer identifiers have the following syntax:
 In the case where `rpc_public_addr` is correctly specified in the configuration file, the full identifier of a node including IP and port can
-be obtained by running `garage node-id` and then included directly in the
+be obtained by running `garage node id` and then included directly in the
 `bootstrap_peers` list of other nodes.
Otherwise, only the node's public
-key will be returned by `garage node-id` and you will have to add the IP
+key will be returned by `garage node id` and you will have to add the IP
 yourself.
 #### `consul_host` and `consul_service_name`
diff --git a/doc/book/src/reference_manual/layout.md b/doc/book/src/reference_manual/layout.md
new file mode 100644
index 00000000..80c71d60
--- /dev/null
+++ b/doc/book/src/reference_manual/layout.md
@@ -0,0 +1,74 @@
+# Creating and updating a cluster layout
+
+The cluster layout in Garage is a table that assigns to each node a role in
+the cluster. The role of a node in Garage can either be a storage node with
+a certain capacity, or a gateway node that does not store data and is only
+used as an API entry point for faster cluster access.
+An introduction to building cluster layouts can be found in the [production deployment](/cookbook/real_world.md) page.
+
+## How cluster layouts work in Garage
+
+In Garage, a cluster layout is composed of the following components:
+
+- a table of roles assigned to nodes
+- a version number
+
+Garage nodes will always use the cluster layout with the highest version number.
+
+Garage nodes also maintain and synchronize between them a set of proposed role
+changes that haven't yet been applied. These changes will be applied (or
+canceled) in the next version of the layout
+
+The following commands insert modifications to the set of proposed role changes
+for the next layout version (but they do not create the new layout immediately):
+
+```bash
+garage layout assign [...]
+garage layout remove [...]
+```
+
+The following command can be used to inspect the layout that is currently set in the cluster
+and the changes proposed for the next layout version, if any:
+
+```bash
+garage layout show
+```
+
+The following commands create a new layout with the specified version number,
+that either takes into account the proposed changes or cancels them:
+
+```bash
+garage layout apply --version
+garage layout revert --version
+```
+
+The version number of the new layout to create must be 1 + the version number
+of the previous layout that existed in the cluster. The `apply` and `revert`
+commands will fail otherwise.
+
+## Warnings about Garage cluster layout management
+
+**Warning: never make several calls to `garage layout apply` or `garage layout
+revert` with the same value of the `--version` flag. Doing so can lead to the
+creation of several different layouts with the same version number, in which
+case your Garage cluster will become inconsistent until fixed.** If a call to
+`garage layout apply` or `garage layout revert` has failed and `garage layout
+show` indicates that a new layout with the given version number has not been
+set in the cluster, then it is fine to call the command again with the same
+version number.
+
+If you are using the `garage` CLI by typing individual commands in your
+shell, you shouldn't have much issues as long as you run commands one after
+the other and take care of checking the output of `garage layout show`
+before applying any changes.
+
+If you are using the `garage` CLI to script layout changes, follow the following recommendations:
+
+- Make all of your `garage` CLI calls to the same RPC host. Do not use the
+  `garage` CLI to connect to individual nodes to send them each a piece of the
+  layout changes you are making, as the changes propagate asynchronously
+  between nodes and might not all be taken into account at the time when the
+  new layout is applied.
+
+- **Only call `garage layout apply` once**, and call it **strictly after** all
+  of the `layout assign` and `layout remove` commands have returned.
diff --git a/doc/book/src/working_documents/load_balancing.md b/doc/book/src/working_documents/load_balancing.md
index c436fdcb..99271add 100644
--- a/doc/book/src/working_documents/load_balancing.md
+++ b/doc/book/src/working_documents/load_balancing.md
@@ -1,5 +1,7 @@
 # Load Balancing Data (planned for version 0.2)
+**This is being yet improved in release 0.5. The working document has not been updated yet, it still only applies to Garage 0.2 through 0.4.**
+
 I have conducted a quick study of different methods to load-balance data over different Garage nodes using consistent hashing.
 ## Requirements
-- 
cgit v1.2.3