Diffstat (limited to 'doc')
-rw-r--r--  doc/api/garage-admin-v0.yml | 2
-rw-r--r--  doc/book/build/_index.md | 2
-rw-r--r--  doc/book/connect/_index.md | 2
-rw-r--r--  doc/book/connect/apps/index.md | 49
-rw-r--r--  doc/book/connect/backup.md | 1
-rw-r--r--  doc/book/cookbook/_index.md | 6
-rw-r--r--  doc/book/cookbook/encryption.md | 116
-rw-r--r--  doc/book/cookbook/monitoring.md | 6
-rw-r--r--  doc/book/cookbook/real-world.md | 8
-rw-r--r--  doc/book/cookbook/reverse-proxy.md | 44
-rw-r--r--  doc/book/cookbook/systemd.md | 15
-rw-r--r--  doc/book/design/_index.md | 2
-rw-r--r--  doc/book/design/goals.md | 12
-rw-r--r--  doc/book/design/internals.md | 2
-rw-r--r--  doc/book/development/_index.md | 2
-rw-r--r--  doc/book/development/devenv.md | 2
-rw-r--r--  doc/book/operations/_index.md | 23
-rw-r--r--  doc/book/operations/durability-repairs.md | 117
-rw-r--r--  doc/book/operations/layout.md (renamed from doc/book/reference-manual/layout.md) | 2
-rw-r--r--  doc/book/operations/recovering.md (renamed from doc/book/cookbook/recovering.md) | 2
-rw-r--r--  doc/book/operations/upgrading.md (renamed from doc/book/cookbook/upgrading.md) | 4
-rw-r--r--  doc/book/quick-start/_index.md | 2
-rw-r--r--  doc/book/reference-manual/_index.md | 2
-rw-r--r--  doc/book/reference-manual/admin-api.md | 90
-rw-r--r--  doc/book/reference-manual/configuration.md | 44
-rw-r--r--  doc/book/reference-manual/features.md | 2
-rw-r--r--  doc/book/reference-manual/k2v.md | 2
-rw-r--r--  doc/book/working-documents/_index.md | 2
-rw-r--r--  doc/drafts/admin-api.md | 4
29 files changed, 522 insertions, 45 deletions
diff --git a/doc/api/garage-admin-v0.yml b/doc/api/garage-admin-v0.yml
index 51968894..83316d93 100644
--- a/doc/api/garage-admin-v0.yml
+++ b/doc/api/garage-admin-v0.yml
@@ -632,7 +632,7 @@ paths:
operationId: "UpdateBucket"
summary: "Update a bucket"
description: |
- All fields (`websiteAccess` and `quotas`) are optionnal.
+ All fields (`websiteAccess` and `quotas`) are optional.
If they are present, the corresponding modifications are applied to the bucket, otherwise nothing is changed.
In `websiteAccess`: if `enabled` is `true`, `indexDocument` must be specified.
diff --git a/doc/book/build/_index.md b/doc/book/build/_index.md
index 9bb17086..021045aa 100644
--- a/doc/book/build/_index.md
+++ b/doc/book/build/_index.md
@@ -1,6 +1,6 @@
+++
title = "Build your own app"
-weight = 4
+weight = 40
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/book/connect/_index.md b/doc/book/connect/_index.md
index 93a2b87e..7d8e686c 100644
--- a/doc/book/connect/_index.md
+++ b/doc/book/connect/_index.md
@@ -1,6 +1,6 @@
+++
title = "Existing integrations"
-weight = 3
+weight = 30
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/book/connect/apps/index.md b/doc/book/connect/apps/index.md
index 4d556ff8..83aadec2 100644
--- a/doc/book/connect/apps/index.md
+++ b/doc/book/connect/apps/index.md
@@ -11,6 +11,7 @@ In this section, we cover the following web applications:
| [Peertube](#peertube) | ✅ | Supported with the website endpoint, proxying private videos unsupported |
| [Mastodon](#mastodon) | ✅ | Natively supported |
| [Matrix](#matrix) | ✅ | Tested with `synapse-s3-storage-provider` |
+| [ejabberd](#ejabberd) | ✅ | `mod_s3_upload` |
| [Pixelfed](#pixelfed) | ❓ | Not yet tested |
| [Pleroma](#pleroma) | ❓ | Not yet tested |
| [Lemmy](#lemmy) | ✅ | Supported with pict-rs |
@@ -474,6 +475,52 @@ And add a new line. For example, to run it every 10 minutes:
*External link:* [matrix-media-repo Documentation > S3](https://docs.t2bot.io/matrix-media-repo/configuration/s3-datastore.html)
+## ejabberd
+
+ejabberd is an XMPP server implementation which, with the `mod_s3_upload`
+module in the [ejabberd-contrib](https://github.com/processone/ejabberd-contrib)
+repository, can be integrated to store chat media files in Garage.
+
+For uploads, this module leverages presigned URLs, which allow XMPP clients to
+send media directly to Garage. Receiving clients then retrieve this media
+through the [static website](@/documentation/cookbook/exposing-websites.md)
+functionality.
+
+As the data itself is publicly accessible to anyone with knowledge of the
+object URL, users are advised to use
+[E2EE](@/documentation/cookbook/encryption.md) to protect this data at rest
+from unauthorized access.
+
+Install the module with:
+
+```bash
+ejabberdctl module_install mod_s3_upload
+```
+
+Create the required key and bucket with:
+
+```bash
+garage key new --name ejabberd
+garage bucket create objects.xmpp-server.fr
+garage bucket allow objects.xmpp-server.fr --read --write --key ejabberd
+garage bucket website --allow objects.xmpp-server.fr
+```
+
+The module can then be configured with:
+
+```
+ mod_s3_upload:
+ #bucket_url: https://objects.xmpp-server.fr.my-garage-instance.mydomain.tld
+ bucket_url: https://my-garage-instance.mydomain.tld/objects.xmpp-server.fr
+ access_key_id: GK...
+ access_key_secret: ...
+ region: garage
+ download_url: https://objects.xmpp-server.fr
+```
+
+Other configuration options can be found in the
+[configuration YAML file](https://github.com/processone/ejabberd-contrib/blob/master/mod_s3_upload/conf/mod_s3_upload.yml).
+
## Pixelfed
[Pixelfed Technical Documentation > Configuration](https://docs.pixelfed.org/technical-documentation/env.html#filesystem)
@@ -539,7 +586,7 @@ secret_key = 'abcdef0123456789...'
```
PICTRS__STORE__TYPE=object_storage
-PICTRS__STORE__ENDPOINT=http:/my-garage-instance.mydomain.tld:3900
+PICTRS__STORE__ENDPOINT=http://my-garage-instance.mydomain.tld:3900
PICTRS__STORE__BUCKET_NAME=pictrs-data
PICTRS__STORE__REGION=garage
PICTRS__STORE__ACCESS_KEY=GK...
diff --git a/doc/book/connect/backup.md b/doc/book/connect/backup.md
index f51dda30..d20c3c96 100644
--- a/doc/book/connect/backup.md
+++ b/doc/book/connect/backup.md
@@ -105,6 +105,7 @@ restic restore 79766175 --target /var/lib/postgresql
Restic has way more features than the ones presented here.
You can discover all of them by accessing its documentation from the link below.
+Files on Android devices can also be backed up with [restic-android](https://github.com/lhns/restic-android).
*External links:* [Restic Documentation > Amazon S3](https://restic.readthedocs.io/en/stable/030_preparing_a_new_repo.html#amazon-s3)
diff --git a/doc/book/cookbook/_index.md b/doc/book/cookbook/_index.md
index 07bf6ebf..ff90ad52 100644
--- a/doc/book/cookbook/_index.md
+++ b/doc/book/cookbook/_index.md
@@ -1,7 +1,7 @@
+++
title="Cookbook"
template = "documentation.html"
-weight = 2
+weight = 20
sort_by = "weight"
+++
@@ -37,7 +37,3 @@ This chapter could also be referred as "Tutorials" or "Best practices".
- **[Monitoring Garage](@/documentation/cookbook/monitoring.md)** This page
explains the Prometheus metrics available for monitoring the Garage
cluster/nodes.
-
-- **[Recovering from failures](@/documentation/cookbook/recovering.md):** Garage's first selling point is resilience
- to hardware failures. This section explains how to recover from such a failure in the
- best possible way.
diff --git a/doc/book/cookbook/encryption.md b/doc/book/cookbook/encryption.md
new file mode 100644
index 00000000..21a5cbc6
--- /dev/null
+++ b/doc/book/cookbook/encryption.md
@@ -0,0 +1,116 @@
++++
+title = "Encryption"
+weight = 50
++++
+
+Encryption is a recurring subject when discussing Garage.
+Garage does not handle data encryption by itself, but many things can
+already be done with Garage's current feature set and the existing ecosystem.
+
+This page takes a high-level approach to security in general and to data
+encryption in particular.
+
+
+# Examining your need for encryption
+
+- Why do you want encryption in Garage?
+
+- What is your threat model? What are you afraid of?
+ - A stolen HDD?
+ - A curious administrator?
+ - A malicious administrator?
+ - A remote attacker?
+ - etc.
+
+- What services do you want to protect with encryption?
+ - An existing application? Which one? (e.g. Nextcloud)
+ - An application that you are writing
+
+- Any expertise you may have on the subject
+
+This page explains what Garage provides, and how you can improve the situation by yourself
+by adding encryption at different levels.
+
+We would be very curious to know your needs and thoughts about ideas such as
+encryption practices and key management, as we want Garage to be a
+serious base platform for the development of secure, encrypted applications.
+Do not hesitate to come talk to us if you have any thoughts or questions on the
+subject.
+
+
+# Capabilities provided by Garage
+
+## Traffic is encrypted between Garage nodes
+
+RPCs between Garage nodes are encrypted. More specifically, contrary to many
+distributed software systems, it is impossible in Garage to have clear-text
+RPC. We use the [kuska handshake](https://github.com/Kuska-ssb/handshake)
+library, which implements Secure Scuttlebutt's Secret Handshake protocol, a
+protocol that has been carefully reviewed. This is why setting an `rpc_secret`
+is mandatory, and it is also why your nodes have such long identifiers.
+
+## HTTP API endpoints provided by Garage are in clear text
+
+Adding built-in TLS support to Garage is not currently planned.
+
+## Garage stores data in plain text on the filesystem
+
+Garage does not handle data encryption at rest by itself, and instead delegates
+to the user to add encryption, either at the storage layer (LUKS, etc) or on
+the client side (or both). There are no current plans to add data encryption
+directly in Garage.
+
+Implementing data encryption directly in Garage might make things simpler for
+end users, but also raises many more questions, especially around key
+management: where would Garage get the encryption keys
+from? If we encrypt data but keep the keys in a plaintext file next to them,
+it's useless. We probably don't want to have to manage secrets in Garage, as it
+would be very hard to do in a secure way. Maybe it could integrate with an
+external system such as HashiCorp Vault?
+
+
+# Adding data encryption using external tools
+
+## Encrypting traffic between a Garage node and your client
+
+You have multiple options for encrypting traffic between your client and a node:
+
+ - Set up a reverse proxy with TLS / ACME / Let's Encrypt
+ - Set up a Garage gateway locally, and only contact the Garage daemon on `localhost` (see the sketch below)
+ - Only contact your Garage daemon over a secure, encrypted overlay network such as WireGuard
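+
+As a minimal sketch of the second option, assuming a gateway node running
+locally with the S3 API on the default port 3900, a client such as the AWS
+CLI would then only ever talk to `localhost`:
+
+```bash
+# All S3 traffic stays on the local machine; the gateway node then
+# forwards requests to the rest of the cluster over Garage's encrypted RPC.
+# (Credentials are assumed to be configured in your AWS CLI profile.)
+aws --endpoint-url http://localhost:3900 s3 ls s3://my-bucket
+```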
+
+## Encrypting data at rest
+
+Protects against the following threats:
+
+- Stolen HDD
+
+Crucially, this does not protect against malicious sysadmins or remote
+attackers that might gain access to your servers.
+
+Methods include full-disk encryption with tools such as LUKS.
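+
+As a hypothetical sketch (the device name and mount point are assumptions),
+a dedicated data disk could be encrypted with LUKS before pointing Garage's
+`data_dir` at it:
+
+```bash
+cryptsetup luksFormat /dev/sdb          # one-time: encrypt the raw device
+cryptsetup open /dev/sdb garage_data    # unlock the device at each boot
+mkfs.ext4 /dev/mapper/garage_data       # one-time: create a filesystem
+mount /dev/mapper/garage_data /var/lib/garage/data
+```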
+
+## Encrypting data on the client side
+
+Protects against the following threats:
+
+- An honest-but-curious administrator
+- A malicious administrator that tries to corrupt your data
+- A remote attacker that can read your server's data
+
+Implementations are very specific to the various applications. Examples:
+
+- Matrix: uses the OLM protocol for E2EE of user messages. Media files stored
+ in Matrix are probably encrypted using symmetric encryption, with a key that is
+ distributed in the end-to-end encrypted message that contains the link to the object.
+
+- XMPP: clients normally support either OMEMO / OpenPGP for the E2EE of user
+ messages. Media files are encrypted per
+ [XEP-0454](https://xmpp.org/extensions/xep-0454.html).
+
+- Aerogramme: uses the user's password as a key to decrypt data in the user's bucket
+
+- Cyberduck: comes with support for
+ [Cryptomator](https://docs.cyberduck.io/cryptomator/) which allows users to
+ create client-side vaults to encrypt files before they are uploaded to a
+ cloud storage endpoint.
diff --git a/doc/book/cookbook/monitoring.md b/doc/book/cookbook/monitoring.md
index 8313daa9..b204dbbe 100644
--- a/doc/book/cookbook/monitoring.md
+++ b/doc/book/cookbook/monitoring.md
@@ -49,9 +49,5 @@ add the following lines in your Prometheus scrape config:
To visualize the scraped data in Grafana,
you can either import our [Grafana dashboard for Garage](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/branch/main/script/telemetry/grafana-garage-dashboard-prometheus.json)
or make your own.
-We detail below the list of exposed metrics and their meaning.
-
-## List of exported metrics
-
-See our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
+The list of exported metrics is available on our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
diff --git a/doc/book/cookbook/real-world.md b/doc/book/cookbook/real-world.md
index 08266b23..7061069f 100644
--- a/doc/book/cookbook/real-world.md
+++ b/doc/book/cookbook/real-world.md
@@ -197,6 +197,12 @@ The `garage` binary has two purposes:
Ensure an appropriate `garage` binary (the same version as your Docker image) is available in your path.
If your configuration file is at `/etc/garage.toml`, the `garage` binary should work with no further change.
+You can also use an alias as follows to use the Garage binary inside your Docker container:
+
+```bash
+alias garage="docker exec -ti <container name> /garage"
+```
+
You can test your `garage` CLI utility by running a simple command such as:
```bash
@@ -339,7 +345,7 @@ garage layout apply
```
**WARNING:** if you want to use the layout modification commands in a script,
-make sure to read [this page](@/documentation/reference-manual/layout.md) first.
+make sure to read [this page](@/documentation/operations/layout.md) first.
## Using your Garage cluster
diff --git a/doc/book/cookbook/reverse-proxy.md b/doc/book/cookbook/reverse-proxy.md
index 9c833ad0..b715193e 100644
--- a/doc/book/cookbook/reverse-proxy.md
+++ b/doc/book/cookbook/reverse-proxy.md
@@ -378,6 +378,47 @@ admin.garage.tld {
But at the same time, the `reverse_proxy` is very flexible.
For a production deployment, you should [read its documentation](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy) as it supports features like DNS discovery of upstreams, load balancing with checks, streaming parameters, etc.
+### Caching
+
+Caddy can be compiled with a
+[cache plugin](https://github.com/caddyserver/cache-handler) which can be used
+to provide a hot cache at the webserver level for static websites hosted by
+Garage.
+
+This can be configured as follows:
+
+```caddy
+# Caddy global configuration section
+{
+ # Bare minimum configuration to enable cache.
+ order cache before rewrite
+
+ cache
+
+ #cache
+ # allowed_http_verbs GET
+ # default_cache_control public
+ # ttl 8h
+ #}
+}
+
+# Site specific section
+https:// {
+ cache
+
+ #cache {
+ # timeout {
+ # backend 30s
+ # }
+ #}
+
+ reverse_proxy ...
+}
+```
+
+Caching is a complicated subject, and the reader is encouraged to study the
+available options provided by the plugin.
+
### On-demand TLS
Caddy supports a technique called
@@ -428,3 +469,6 @@ https:// {
reverse_proxy localhost:3902 192.168.1.2:3902 example.tld:3902
}
```
+
+More information on how this endpoint is implemented in Garage is available
+in the [Admin API Reference](@/documentation/reference-manual/admin-api.md) page.
diff --git a/doc/book/cookbook/systemd.md b/doc/book/cookbook/systemd.md
index b271010b..c0ed7d1f 100644
--- a/doc/book/cookbook/systemd.md
+++ b/doc/book/cookbook/systemd.md
@@ -33,7 +33,20 @@ NoNewPrivileges=true
WantedBy=multi-user.target
```
-*A note on hardening: garage will be run as a non privileged user, its user id is dynamically allocated by systemd. It cannot access (read or write) home folders (/home, /root and /run/user), the rest of the filesystem can only be read but not written, only the path seen as /var/lib/garage is writable as seen by the service (mapped to /var/lib/private/garage on your host). Additionnaly, the process can not gain new privileges over time.*
+**A note on hardening:** Garage will be run as a non-privileged user, whose
+user id is dynamically allocated by systemd (set with `DynamicUser=true`). It
+cannot access (read or write) home folders (`/home`, `/root` and `/run/user`);
+the rest of the filesystem can only be read but not written. Only the path
+seen as `/var/lib/garage` by the service is writable. Additionally, the
+process cannot gain new privileges over time.
+
+For this to work correctly, your `garage.toml` must be set with
+`metadata_dir=/var/lib/garage/meta` and `data_dir=/var/lib/garage/data`. This
+is mandatory to use the `DynamicUser` hardening feature of systemd, which
+automatically creates these directories as a virtual mapping. If the directory
+`/var/lib/garage` already exists before starting the server for the first time,
+the systemd service might not start correctly. Note that in your host
+filesystem, Garage data will be held in `/var/lib/private/garage`.
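+
+A minimal sketch of the corresponding `garage.toml` entries (all other
+settings omitted):
+
+```toml
+# Both paths must live under /var/lib/garage, which systemd maps to
+# /var/lib/private/garage on the host when DynamicUser=true is set.
+metadata_dir = "/var/lib/garage/meta"
+data_dir = "/var/lib/garage/data"
+```
+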
To start the service then automatically enable it at boot:
diff --git a/doc/book/design/_index.md b/doc/book/design/_index.md
index 50933139..5881ab8f 100644
--- a/doc/book/design/_index.md
+++ b/doc/book/design/_index.md
@@ -1,6 +1,6 @@
+++
title = "Design"
-weight = 6
+weight = 70
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/book/design/goals.md b/doc/book/design/goals.md
index 4e390ba6..78ac7978 100644
--- a/doc/book/design/goals.md
+++ b/doc/book/design/goals.md
@@ -42,15 +42,13 @@ locations. They use Garage themselves for the following tasks:
- As a [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider)
-- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 proxy
+- As a Nix binary cache
-- In the Drone continuous integration platform to store task logs
+- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 and SFTP-to-S3 proxy
-- As a Nix binary cache
+- As a backup target using `rclone` and `restic`
-- As a backup target using `rclone`
+- In the Drone continuous integration platform to store task logs
The Deuxfleurs Garage cluster is a multi-site cluster currently composed of
-4 nodes in 2 physical locations. In the future it will be expanded to at
-least 3 physical locations to fully exploit Garage's potential for high
-availability.
+9 nodes in 3 physical locations.
diff --git a/doc/book/design/internals.md b/doc/book/design/internals.md
index 777e017d..cefb7acc 100644
--- a/doc/book/design/internals.md
+++ b/doc/book/design/internals.md
@@ -61,7 +61,7 @@ Garage prioritizes which nodes to query according to a few criteria:
For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md)
-and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
+and [cluster layout management](@/documentation/operations/layout.md) pages.
## Garbage collection
diff --git a/doc/book/development/_index.md b/doc/book/development/_index.md
index 8e730bf6..2b2af0cc 100644
--- a/doc/book/development/_index.md
+++ b/doc/book/development/_index.md
@@ -1,6 +1,6 @@
+++
title = "Development"
-weight = 7
+weight = 80
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/book/development/devenv.md b/doc/book/development/devenv.md
index 8d7d2e95..dd3bdec0 100644
--- a/doc/book/development/devenv.md
+++ b/doc/book/development/devenv.md
@@ -25,7 +25,7 @@ git clone https://git.deuxfleurs.fr/Deuxfleurs/garage
cd garage
```
-*Optionnaly, you can use our nix.conf file to speed up compilations:*
+*Optionally, you can use our nix.conf file to speed up compilations:*
```bash
sudo mkdir -p /etc/nix
diff --git a/doc/book/operations/_index.md b/doc/book/operations/_index.md
new file mode 100644
index 00000000..16d0e9d5
--- /dev/null
+++ b/doc/book/operations/_index.md
@@ -0,0 +1,23 @@
++++
+title = "Operations & Maintenance"
+weight = 50
+sort_by = "weight"
+template = "documentation.html"
++++
+
+This section contains important information on how to best operate a Garage
+cluster, to ensure the integrity and availability of your data:
+
+- **[Upgrading Garage](@/documentation/operations/upgrading.md):** General instructions on how to
+ upgrade your cluster from one version to the next. Instructions specific for each version upgrade
+ can be found in the [working documents](@/documentation/working-documents/_index.md) section.
+
+- **[Layout management](@/documentation/operations/layout.md):** Best practices for using the `garage layout`
+ commands when adding or removing nodes from your cluster.
+
+- **[Durability and repairs](@/documentation/operations/durability-repairs.md):** How to check for small things
+ that might be going wrong, and how to recover from such failures.
+
+- **[Recovering from failures](@/documentation/operations/recovering.md):** Garage's first selling point is resilience
+ to hardware failures. This section explains how to recover from such a failure in the
+ best possible way.
diff --git a/doc/book/operations/durability-repairs.md b/doc/book/operations/durability-repairs.md
new file mode 100644
index 00000000..498c8fda
--- /dev/null
+++ b/doc/book/operations/durability-repairs.md
@@ -0,0 +1,117 @@
++++
+title = "Durability & Repairs"
+weight = 30
++++
+
+To ensure the best durability of your data and to fix any inconsistencies that may
+pop up in a distributed system, Garage provides a series of repair operations.
+This guide will explain the meaning of each of them and when they should be applied.
+
+
+# General syntax of repair operations
+
+Repair operations described below are of the form `garage repair <repair_name>`.
+These repairs will not launch without the `--yes` flag, which should
+be added as follows: `garage repair --yes <repair_name>`.
+By default these repair procedures will only run on the Garage node your CLI is
+connecting to. To run on all nodes, add the `-a` flag as follows:
+`garage repair -a --yes <repair_name>`.
+
+# Data block operations
+
+## Data store scrub
+
+Scrubbing the data store means examining each individual data block to check that
+its content is correct, by verifying its hash. Any block found to be corrupted
+(e.g. by bitrot or by an accidental manipulation of the datastore) will be
+restored from another node that holds a valid copy.
+
+Scrubs are automatically scheduled by Garage to run every 25-35 days (the
+actual time is randomized to spread load across nodes). The next scheduled run
+can be viewed with `garage worker get`.
+
+A scrub can also be launched manually using `garage repair scrub start`.
+
+To view the status of an ongoing scrub, first find the task ID of the scrub worker
+using `garage worker list`. Then, run `garage worker info <scrub_task_id>` to
+view detailed runtime statistics of the scrub. To gather cluster-wide information,
+this command has to be run on each individual node.
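+
+For instance, on one node:
+
+```bash
+garage worker list                   # locate the scrub worker's task ID
+garage worker info <scrub_task_id>   # detailed statistics for that worker
+```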
+
+A scrub is a very disk-intensive operation that might slow down your cluster.
+You may pause an ongoing scrub using `garage repair scrub pause`, but note that
+the scrub will resume automatically 24 hours later as Garage will not let your
+cluster run without a regular scrub. If the scrub procedure is too intensive
+for your servers and is slowing down your workload, the recommended solution
+is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`.
+A higher tranquility value will make Garage take longer pauses between two block
+verifications. Of course, scrubbing the entire data store will also take longer.
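+
+For example, to raise the tranquility value (the value `2` here is only an
+illustration; pick what suits your hardware):
+
+```bash
+garage repair scrub set-tranquility 2
+```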
+
+## Block check and resync
+
+In some cases, nodes hold a reference to a block but do not actually have the block
+stored on disk. Conversely, they may also have on disk blocks that are not referenced
+any more. To fix both cases, a block repair may be run with `garage repair blocks`.
+This will scan the entire block reference counter table to check that the blocks
+exist on disk, and will scan the entire disk store to check that stored blocks
+are referenced.
+
+It is recommended to run this procedure when changing your cluster layout,
+after the metadata tables have finished synchronizing between nodes
+(usually a few hours after `garage layout apply`).
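+
+Following the general syntax described above, a cluster-wide run would be:
+
+```bash
+garage repair -a --yes blocks
+```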
+
+## Inspecting lost blocks
+
+In extremely rare situations, data blocks may be unavailable from the entire cluster.
+This means that even using `garage repair blocks`, some nodes may be unable
+to fetch data blocks for which they hold a reference.
+
+These errors are stored on each node in a list of "block resync errors", i.e.
+blocks for which the last resync operation failed.
+This list can be inspected using `garage block list-errors`.
+These errors usually fall into one of the following categories:
+
+1. a block is still referenced but the object was deleted; this is a case
+   of metadata reference inconsistency (see below for the fix)
+2. a block is referenced by a non-deleted object, but could not be fetched due
+ to a transient error such as a network failure
+3. a block is referenced by a non-deleted object, but could not be fetched due
+ to a permanent error such as there not being any valid copy of the block on the
+ entire cluster
+
+To help distinguish between case 1 and cases 2 and 3, you may use the
+`garage block info` command to see which objects hold a reference to each block.
+
+In the second case (transient errors), Garage will try to fetch the block again
+after a certain time, so the error should disappear naturally. You can also
+request Garage to try to fetch the block immediately using `garage block retry-now`
+if you have fixed the transient issue.
+
+If you are confident that you are in the third scenario and that your data block
+is definitely lost, then there is no other choice than to declare your S3 objects
+as unrecoverable, and to delete them properly from the data store. This can be done
+using the `garage block purge` command.
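+
+A sketch of the overall workflow described above (`<block_hash>` is a
+hypothetical placeholder; exact flags may vary, see `garage block --help`):
+
+```bash
+garage block list-errors              # blocks whose last resync failed
+garage block info <block_hash>        # which objects reference this block
+garage block retry-now <block_hash>   # retry after fixing a transient issue
+garage block purge <block_hash>       # last resort: drop unrecoverable objects
+```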
+
+
+# Metadata operations
+
+## Metadata table resync
+
+Garage automatically resyncs all entries stored in the metadata tables every hour,
+to ensure that all nodes have the most up-to-date version of all the information
+they should be holding.
+The resync procedure is based on a Merkle tree that makes it possible to
+efficiently find the differences between nodes.
+
+In some special cases, e.g. before an upgrade, you might want to run a table
+resync manually. This can be done using `garage repair tables`.
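+
+Following the general syntax of repair operations, this would be:
+
+```bash
+garage repair --yes tables
+```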
+
+## Metadata table reference fixes
+
+In some very rare cases where nodes are unavailable, some references between objects
+are broken. For instance, if an object is deleted, the underlying versions or data
+blocks may still be held by Garage. If you suspect that such corruption has occurred
+in your cluster, you can run one of the following repair procedures:
+
+- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
+- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
+
diff --git a/doc/book/reference-manual/layout.md b/doc/book/operations/layout.md
index a7d6f51f..5e314246 100644
--- a/doc/book/reference-manual/layout.md
+++ b/doc/book/operations/layout.md
@@ -1,6 +1,6 @@
+++
title = "Cluster layout management"
-weight = 50
+weight = 20
+++
The cluster layout in Garage is a table that assigns to each node a role in
diff --git a/doc/book/cookbook/recovering.md b/doc/book/operations/recovering.md
index 2129a7f3..7a830788 100644
--- a/doc/book/cookbook/recovering.md
+++ b/doc/book/operations/recovering.md
@@ -1,6 +1,6 @@
+++
title = "Recovering from failures"
-weight = 50
+weight = 40
+++
Garage is meant to work on old, second-hand hardware.
diff --git a/doc/book/cookbook/upgrading.md b/doc/book/operations/upgrading.md
index 9d60a988..e8919a19 100644
--- a/doc/book/cookbook/upgrading.md
+++ b/doc/book/operations/upgrading.md
@@ -1,6 +1,6 @@
+++
title = "Upgrading Garage"
-weight = 60
+weight = 10
+++
Garage is a stateful clustered application, where all nodes are communicating together and share data structures.
@@ -58,7 +58,7 @@ From a high level perspective, a major upgrade looks like this:
### Major upgrades with minimal downtime
-There is only one operation that has to be coordinated cluster-wide: the passage of one version of the internal RPC protocol to the next.
+There is only one operation that has to be coordinated cluster-wide: the switch of one version of the internal RPC protocol to the next.
This means that an upgrade with very limited downtime can simply be performed from one major version to the next by restarting all nodes
simultaneously in the new version.
The downtime will simply be the time required for all nodes to stop and start again, which should be less than a minute.
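+
+A hypothetical sketch, assuming nodes are managed with systemd and restarted
+via a parallel SSH tool (your orchestration mechanism may differ):
+
+```bash
+# Restart all nodes as close to simultaneously as possible,
+# so the RPC protocol switch happens cluster-wide at once.
+parallel-ssh -h garage-nodes.txt -i "sudo systemctl restart garage"
+```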
diff --git a/doc/book/quick-start/_index.md b/doc/book/quick-start/_index.md
index 5863c09b..46aaa9bc 100644
--- a/doc/book/quick-start/_index.md
+++ b/doc/book/quick-start/_index.md
@@ -1,6 +1,6 @@
+++
title = "Quick Start"
-weight = 0
+weight = 10
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/book/reference-manual/_index.md b/doc/book/reference-manual/_index.md
index ab1de5e6..1f360e57 100644
--- a/doc/book/reference-manual/_index.md
+++ b/doc/book/reference-manual/_index.md
@@ -1,6 +1,6 @@
+++
title = "Reference Manual"
-weight = 5
+weight = 60
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/book/reference-manual/admin-api.md b/doc/book/reference-manual/admin-api.md
index 363bc886..6932ac60 100644
--- a/doc/book/reference-manual/admin-api.md
+++ b/doc/book/reference-manual/admin-api.md
@@ -39,11 +39,95 @@ Authorization: Bearer <token>
## Administration API endpoints
-### Metrics-related endpoints
-
-#### Metrics `GET /metrics`
+### Metrics `GET /metrics`
Returns internal Garage metrics in Prometheus format.
+The metrics are directly documented when returned by the API.
+
+**Example:**
+
+```
+$ curl -i http://localhost:3903/metrics
+HTTP/1.1 200 OK
+content-type: text/plain; version=0.0.4
+content-length: 12145
+date: Tue, 08 Aug 2023 07:25:05 GMT
+
+# HELP api_admin_error_counter Number of API calls to the various Admin API endpoints that resulted in errors
+# TYPE api_admin_error_counter counter
+api_admin_error_counter{api_endpoint="CheckWebsiteEnabled",status_code="400"} 1
+api_admin_error_counter{api_endpoint="CheckWebsiteEnabled",status_code="404"} 3
+# HELP api_admin_request_counter Number of API calls to the various Admin API endpoints
+# TYPE api_admin_request_counter counter
+api_admin_request_counter{api_endpoint="CheckWebsiteEnabled"} 7
+api_admin_request_counter{api_endpoint="Health"} 3
+# HELP api_admin_request_duration Duration of API calls to the various Admin API endpoints
+...
+```
+
+### Health `GET /health`
+
+Returns `200 OK` if enough nodes are up to have a quorum (i.e. can serve
+requests), otherwise returns `503 Service Unavailable`.
+
+**Example:**
+
+```
+$ curl -i http://localhost:3903/health
+HTTP/1.1 200 OK
+content-type: text/plain
+content-length: 102
+date: Tue, 08 Aug 2023 07:22:38 GMT
+
+Garage is fully operational
+Consult the full health check API endpoint at /v0/health for more details
+```
+
+### On-demand TLS `GET /check`
+
+To prevent abuse of on-demand TLS, Caddy developers have specified an endpoint that can be queried by the reverse proxy
+to know whether a given domain is allowed to get a certificate. Garage implements this endpoint to tell whether a given domain is handled by Garage or is garbage.
+
+Garage responds with the following logic:
+ - If the domain matches the pattern `<bucket-name>.<s3_api.root_domain>`, returns 200 OK
+ - If the domain matches the pattern `<bucket-name>.<s3_web.root_domain>` and website is configured for `<bucket>`, returns 200 OK
+ - If the domain matches the pattern `<bucket-name>` and website is configured for `<bucket>`, returns 200 OK
+ - Otherwise, returns 404 Not Found, 400 Bad Request or a 5xx error.
+
+*Note 1: in path-style URL mode there is only one domain, which is not known by Garage, so path-style requests are not supported by this API endpoint.
+You must manually declare the domain in your reverse proxy. The same applies to K2V.*
+
+*Note 2: buckets in a user's namespace are not yet supported by this endpoint; this is a current limitation.*
+
+**Example:** Suppose a Garage instance configured with `s3_api.root_domain = .s3.garage.localhost` and `s3_web.root_domain = .web.garage.localhost`.
+
+With a private `media` bucket (name in the global namespace, website is disabled), the endpoint behaves as follows:
+
+```
+$ curl -so /dev/null -w "%{http_code}" http://localhost:3903/check?domain=media.s3.garage.localhost
+200
+$ curl -so /dev/null -w "%{http_code}" http://localhost:3903/check?domain=media
+400
+$ curl -so /dev/null -w "%{http_code}" http://localhost:3903/check?domain=media.web.garage.localhost
+400
+```
+
+With a public `example.com` bucket (name in the global namespace, website is activated), the endpoint behaves as follows:
+
+```
+$ curl -so /dev/null -w "%{http_code}" http://localhost:3903/check?domain=example.com.s3.garage.localhost
+200
+$ curl -so /dev/null -w "%{http_code}" http://localhost:3903/check?domain=example.com
+200
+$ curl -so /dev/null -w "%{http_code}" http://localhost:3903/check?domain=example.com.web.garage.localhost
+200
+```
+
+
+**References:**
+ - [Using On-Demand TLS](https://caddyserver.com/docs/automatic-https#using-on-demand-tls)
+ - [Add option for a backend check to approve use of on-demand TLS](https://github.com/caddyserver/caddy/pull/1939)
+ - [Serving tens of thousands of domains over HTTPS with Caddy](https://caddy.community/t/serving-tens-of-thousands-of-domains-over-https-with-caddy/11179)
### Cluster operations
diff --git a/doc/book/reference-manual/configuration.md b/doc/book/reference-manual/configuration.md
index 38062bab..b916bb61 100644
--- a/doc/book/reference-manual/configuration.md
+++ b/doc/book/reference-manual/configuration.md
@@ -35,12 +35,18 @@ bootstrap_peers = [
[consul_discovery]
+api = "catalog"
consul_http_addr = "http://127.0.0.1:8500"
service_name = "garage-daemon"
ca_cert = "/etc/consul/consul-ca.crt"
client_cert = "/etc/consul/consul-client.crt"
client_key = "/etc/consul/consul-key.crt"
+# for `agent` API mode, unset client_cert and client_key, and optionally enable `token`
+# token = "abcdef-01234-56789"
tls_skip_verify = false
+tags = [ "dns-enabled" ]
+meta = { dns-acl = "allow trusted" }
+
[kubernetes_discovery]
namespace = "garage"
@@ -201,7 +207,7 @@ Garage supports the following replication modes:
that should probably never be used.
Note that in modes `2` and `3`,
-if at least the same number of zones are available, an arbitrary number of failures in
+if at least the same number of zones are available, an arbitrary number of failures in
any given zone is tolerated as copies of data will be spread over several zones.
**Make sure `replication_mode` is the same in the configuration files of all nodes.
@@ -245,7 +251,7 @@ Values between `1` (faster compression) and `19` (smaller file) are standard com
levels for zstd. From `20` to `22`, compression levels are referred as "ultra" and must be
used with extra care as it will use lot of memory. A value of `0` will let zstd choose a
default value (currently `3`). Finally, zstd has also compression designed to be faster
-than default compression levels, they range from `-1` (smaller file) to `-99` (faster
+than default compression levels, they range from `-1` (smaller file) to `-99` (faster
compression).
If you do not specify a `compression_level` entry, Garage will set it to `1` for you. With
@@ -316,6 +322,12 @@ reached by other nodes of the cluster, which should be set in `rpc_public_addr`.
The `consul_http_addr` parameter should be set to the full HTTP(S) address of the Consul server.
+### `api`
+
+Two APIs for service registration are supported: `catalog` and `agent`. `catalog`, the default, will register a service using
+the `/v1/catalog` endpoints, enabling mTLS if `client_cert` and `client_key` are provided. The `agent` API uses the
+`/v1/agent` endpoints instead, where an optional `token` may be provided.
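+
+A minimal sketch of an `agent`-mode configuration (the token value is only an
+example):
+
+```toml
+[consul_discovery]
+api = "agent"
+consul_http_addr = "http://127.0.0.1:8500"
+service_name = "garage-daemon"
+# client_cert and client_key must be unset in agent mode
+token = "abcdef-01234-56789"
+```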
+
### `service_name`
`service_name` should be set to the service name under which Garage's
@@ -324,6 +336,7 @@ RPC ports are announced.
### `client_cert`, `client_key`
TLS client certificate and client key to use when communicating with Consul over TLS. Both are mandatory when doing so.
+Only available when `api = "catalog"`.
### `ca_cert`
@@ -334,6 +347,29 @@ TLS CA certificate to use when communicating with Consul over TLS.
Skip server hostname verification in TLS handshake.
`ca_cert` is ignored when this is set.
+### `token`
+
+Uses the provided token for communication with Consul. Only available when `api = "agent"`.
+The policy assigned to this token should at least have these rules:
+
+```hcl
+// the `service_name` specified above
+service "garage" {
+ policy = "write"
+}
+
+service_prefix "" {
+ policy = "read"
+}
+
+node_prefix "" {
+ policy = "read"
+}
+```
+
+### `tags` and `meta`
+
+Additional list of tags and map of service meta to add during service registration.
## The `[kubernetes_discovery]` section
@@ -373,7 +409,7 @@ message that redirects the client to the correct region.
### `root_domain` {#root_domain}
-The optionnal suffix to access bucket using vhost-style in addition to path-style request.
+The optional suffix for accessing buckets using vhost-style requests in addition to path-style requests.
Note path-style requests are always enabled, whether or not vhost-style is configured.
Configuring vhost-style S3 requires a wildcard DNS entry, and possibly a wildcard TLS certificate,
but might be required by software that does not support path-style requests.
@@ -396,7 +432,7 @@ This endpoint does not suport TLS: a reverse proxy should be used to provide it.
### `root_domain`
-The optionnal suffix appended to bucket names for the corresponding HTTP Host.
+The optional suffix appended to bucket names for the corresponding HTTP Host.
For instance, if `root_domain` is `web.garage.eu`, a bucket called `deuxfleurs.fr`
will be accessible either with hostname `deuxfleurs.fr.web.garage.eu`
diff --git a/doc/book/reference-manual/features.md b/doc/book/reference-manual/features.md
index 550504ff..2f8e633a 100644
--- a/doc/book/reference-manual/features.md
+++ b/doc/book/reference-manual/features.md
@@ -35,7 +35,7 @@ This makes setting up and administering storage clusters, we hope, as easy as it
A Garage cluster can very easily evolve over time, as storage nodes are added or removed.
Garage will automatically rebalance data between nodes as needed to ensure the desired number of copies.
-Read about cluster layout management [here](@/documentation/reference-manual/layout.md).
+Read about cluster layout management [here](@/documentation/operations/layout.md).
### No RAFT slowing you down
diff --git a/doc/book/reference-manual/k2v.md b/doc/book/reference-manual/k2v.md
index ed069b27..c01f641e 100644
--- a/doc/book/reference-manual/k2v.md
+++ b/doc/book/reference-manual/k2v.md
@@ -3,7 +3,7 @@ title = "K2V"
weight = 100
+++
-Starting with version 0.7.2, Garage introduces an optionnal feature, K2V,
+Starting with version 0.7.2, Garage introduces an optional feature, K2V,
which is an alternative storage API designed to help efficiently store
many small values in buckets (in opposition to S3 which is more designed
to store large blobs).
diff --git a/doc/book/working-documents/_index.md b/doc/book/working-documents/_index.md
index 8fc170b7..fe79e65d 100644
--- a/doc/book/working-documents/_index.md
+++ b/doc/book/working-documents/_index.md
@@ -1,6 +1,6 @@
+++
title = "Working Documents"
-weight = 8
+weight = 90
sort_by = "weight"
template = "documentation.html"
+++
diff --git a/doc/drafts/admin-api.md b/doc/drafts/admin-api.md
index 9a697a59..fb71dc83 100644
--- a/doc/drafts/admin-api.md
+++ b/doc/drafts/admin-api.md
@@ -453,7 +453,7 @@ Request body format:
}
```
-All fields (`name`, `allow` and `deny`) are optionnal.
+All fields (`name`, `allow` and `deny`) are optional.
If they are present, the corresponding modifications are applied to the key, otherwise nothing is changed.
The possible flags in `allow` and `deny` are: `createBucket`.
@@ -609,7 +609,7 @@ Request body format:
}
```
-All fields (`websiteAccess` and `quotas`) are optionnal.
+All fields (`websiteAccess` and `quotas`) are optional.
If they are present, the corresponding modifications are applied to the bucket, otherwise nothing is changed.
In `websiteAccess`: if `enabled` is `true`, `indexDocument` must be specified.