+++
title = "Cluster layout management"
weight = 20
+++

The cluster layout in Garage is a table that assigns a role to each node in
the cluster. A node can either be a storage node with a certain capacity, or
a gateway node that does not store data and only serves as an API entry point
for faster cluster access.
An introduction to building cluster layouts can be found in the [production deployment](@/documentation/cookbook/real-world.md) page.

## How cluster layouts work in Garage

In Garage, a cluster layout is composed of the following components:

- a table of roles assigned to nodes
- a version number

Garage nodes will always use the cluster layout with the highest version number.

Garage nodes also maintain and synchronize among themselves a set of proposed
role changes that haven't yet been applied. These changes will be applied (or
canceled) in the next version of the layout.

The following commands add modifications to the set of proposed role changes
for the next layout version (they do not create the new layout immediately):

```bash
garage layout assign [...]
garage layout remove [...]
```

The following command can be used to inspect the layout that is currently set in the cluster
and the changes proposed for the next layout version, if any:

```bash
garage layout show
```

The following commands create a new layout with the specified version number,
which either takes the proposed changes into account or cancels them:

```bash
garage layout apply --version <new_version_number>
garage layout revert --version <new_version_number>
```

The version number of the new layout must be exactly one more than the version
number of the previous layout that existed in the cluster. The `apply` and
`revert` commands will fail otherwise.
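
In a script, the next version number can be derived from the output of
`garage layout show` instead of being hard-coded. The following is a minimal
sketch, assuming that the output contains a line of the form
`Current cluster layout version: N`, as in the sample outputs later on this
page. The function name `next_layout_version` is hypothetical.

```bash
# Hypothetical helper: reads the output of `garage layout show` on stdin and
# prints the version number to pass to `garage layout apply` or `revert`.
# Assumes a line of the form "Current cluster layout version: N".
next_layout_version() {
  local current
  current=$(sed -n 's/^Current cluster layout version: \([0-9][0-9]*\)[[:space:]]*$/\1/p')
  # Fail instead of guessing if the line is missing or appears more than once.
  case "$current" in
    ''|*[!0-9]*)
      echo "could not read the current layout version" >&2
      return 1
      ;;
  esac
  echo $((current + 1))
}

# Possible usage (not run here):
#   garage layout show | next_layout_version
```

For a cluster currently at version 6, this would print `7`.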

## Warnings about Garage cluster layout management

**Warning: never make several calls to `garage layout apply` or `garage layout
revert` with the same value of the `--version` flag. Doing so can lead to the
creation of several different layouts with the same version number, in which
case your Garage cluster will become inconsistent until fixed.** If a call to
`garage layout apply` or `garage layout revert` has failed and `garage layout
show` indicates that a new layout with the given version number has not been
set in the cluster, then it is fine to call the command again with the same
version number.
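
The retry rule above can be checked mechanically before re-running a failed
command. This is a minimal sketch, assuming that `garage layout show` prints
a `Current cluster layout version: N` line as in the sample outputs on this
page. `layout_version_is_unset` is a hypothetical helper name, not part of
the `garage` CLI.

```bash
# Hypothetical helper: succeeds only if the layout version given as $1 has
# not been set yet, judging from `garage layout show` output read on stdin.
layout_version_is_unset() {
  local wanted="$1" current
  current=$(sed -n 's/^Current cluster layout version: \([0-9][0-9]*\)[[:space:]]*$/\1/p')
  case "$current" in
    ''|*[!0-9]*) return 2 ;;  # version line not found: do not retry blindly
  esac
  # Retrying is fine only while the cluster is still one version behind.
  [ "$current" -eq $((wanted - 1)) ]
}

# Possible usage (not run here):
#   garage layout show | layout_version_is_unset 7 \
#     && garage layout apply --version 7
```

If the helper fails, the safer course is to inspect `garage layout show` by
hand rather than to call `apply` or `revert` again.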

If you are using the `garage` CLI by typing individual commands in your
shell, you shouldn't have many issues as long as you run commands one after
the other and take care to check the output of `garage layout show`
before applying any changes.

If you are using the `garage` CLI to script layout changes, follow these recommendations:

- Make all of your `garage` CLI calls to the same RPC host. Do not use the
  `garage` CLI to connect to individual nodes to send them each a piece of the
  layout changes you are making, as the changes propagate asynchronously
  between nodes and might not all be taken into account at the time when the
  new layout is applied.

- **Only call `garage layout apply` once**, and call it **strictly after** all
  of the `layout assign` and `layout remove` commands have returned.
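
The two recommendations above can be sketched as a single function. This is
an illustration only: the name `stage_and_apply` and its calling convention
are hypothetical, and the sketch uses whichever RPC host the `garage` CLI is
already configured to talk to, without switching hosts between calls.

```bash
# Hypothetical wrapper: stage every role change, then apply exactly once.
# $1 is the new layout version; each remaining argument holds the arguments
# of one `garage layout assign` call as a single quoted string.
stage_and_apply() {
  local new_version="$1" args
  shift
  for args in "$@"; do
    # Calls run sequentially, never in the background, so that each one has
    # returned before the next starts. Word splitting of $args is intended.
    garage layout assign $args || return 1
  done
  # Print the staged changes so they can be reviewed in the script's log.
  garage layout show || return 1
  # Single call to apply, strictly after all assign calls have returned.
  garage layout apply --version "$new_version"
}

# Possible usage (not run here), with placeholders left unfilled:
#   stage_and_apply 7 "<arguments for node A>" "<arguments for node B>"
```

The function stops at the first failing command, so `apply` is never reached
if one of the staging calls did not succeed.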


## Understanding unexpected layout calculations


### Example 1

```
$ garage layout show
==== CURRENT CLUSTER LAYOUT ====
ID                Tags   Zone  Capacity   Usable capacity
b10c110e4e854e5a  node1  dc1   1000.0 MB  1000.0 MB (100.0%)
a235ac7695e0c54d  node2  dc2   1000.0 MB  1000.0 MB (100.0%)
62b218d848e86a64  node3  dc3   1000.0 MB  1000.0 MB (100.0%)

Zone redundancy: maximum

Current cluster layout version: 6

==== STAGED ROLE CHANGES ====
ID                Tags   Zone  Capacity
a11c7cf18af29737  node4  dc1   1000.0 MB


==== NEW CLUSTER LAYOUT AFTER APPLYING CHANGES ====
ID                Tags   Zone  Capacity   Usable capacity
b10c110e4e854e5a  node1  dc1   1000.0 MB  1000.0 MB (100.0%)
a11c7cf18af29737  node4  dc1   1000.0 MB  0 B (0.0%)
a235ac7695e0c54d  node2  dc2   1000.0 MB  1000.0 MB (100.0%)
62b218d848e86a64  node3  dc3   1000.0 MB  1000.0 MB (100.0%)

Zone redundancy: maximum

==== COMPUTATION OF A NEW PARTITION ASSIGNATION ====

Partitions are replicated 3 times on at least 3 distinct zones.

Optimal partition size:                     3.9 MB (3.9 MB in previous layout)
Usable capacity / total cluster capacity:   3.0 GB / 4.0 GB (75.0 %)
Effective capacity (replication factor 3):  1000.0 MB

A total of 0 new copies of partitions need to be transferred.

dc1                 Tags   Partitions        Capacity   Usable capacity
  b10c110e4e854e5a  node1  256 (0 new)       1000.0 MB  1000.0 MB (100.0%)
  a11c7cf18af29737  node4  0 (0 new)         1000.0 MB  0 B (0.0%)
  TOTAL                    256 (256 unique)  2.0 GB     1000.0 MB (50.0%)

dc2                 Tags   Partitions        Capacity   Usable capacity
  a235ac7695e0c54d  node2  256 (0 new)       1000.0 MB  1000.0 MB (100.0%)
  TOTAL                    256 (256 unique)  1000.0 MB  1000.0 MB (100.0%)

dc3                 Tags   Partitions        Capacity   Usable capacity
  62b218d848e86a64  node3  256 (0 new)       1000.0 MB  1000.0 MB (100.0%)
  TOTAL                    256 (256 unique)  1000.0 MB  1000.0 MB (100.0%)
```
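
The summary figures in this output can be reproduced by hand. The sketch
below uses `awk` for the arithmetic. The function name is hypothetical, and
the formulas are inferred from the numbers shown above rather than taken from
Garage's source code.

```bash
# Hypothetical helper: prints the usable-capacity ratio and the effective
# capacity from the total usable capacity, the total capacity (both in MB)
# and the replication factor.
capacity_summary() {
  awk -v usable="$1" -v total="$2" -v rf="$3" 'BEGIN {
    printf "usable/total: %.1f %%\n", 100 * usable / total
    printf "effective capacity: %.1f MB\n", usable / rf
  }'
}

# Example 1: 3000 MB usable in total out of 4000 MB of raw capacity,
# with a replication factor of 3.
capacity_summary 3000 4000 3
```

This prints `75.0 %` and `1000.0 MB`, matching the output above. The `TOTAL`
lines show that each zone holds all 256 partitions, and `dc2` and `dc3` are
already at 100% of their capacity, which is consistent with `node4` receiving
no partitions even though it adds raw capacity to `dc1`.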

### Example 2

```
==== CURRENT CLUSTER LAYOUT ====
ID                Tags   Zone  Capacity   Usable capacity
b10c110e4e854e5a  node1  dc1   1000.0 MB  500.0 MB (50.0%)
a11c7cf18af29737  node4  dc1   1000.0 MB  500.0 MB (50.0%)
a235ac7695e0c54d  node2  dc2   1000.0 MB  1000.0 MB (100.0%)
62b218d848e86a64  node3  dc3   1000.0 MB  1000.0 MB (100.0%)

Zone redundancy: maximum

Current cluster layout version: 8

==== STAGED ROLE CHANGES ====
ID                Tags   Zone  Capacity
a11c7cf18af29737  node4  dc3   1000.0 MB


==== NEW CLUSTER LAYOUT AFTER APPLYING CHANGES ====
ID                Tags   Zone  Capacity   Usable capacity
b10c110e4e854e5a  node1  dc1   1000.0 MB  1000.0 MB (100.0%)
a235ac7695e0c54d  node2  dc2   1000.0 MB  1000.0 MB (100.0%)
62b218d848e86a64  node3  dc3   1000.0 MB  753.9 MB (75.4%)
a11c7cf18af29737  node4  dc3   1000.0 MB  246.1 MB (24.6%)

Zone redundancy: maximum

==== COMPUTATION OF A NEW PARTITION ASSIGNATION ====

Partitions are replicated 3 times on at least 3 distinct zones.

Optimal partition size:                     3.9 MB (3.9 MB in previous layout)
Usable capacity / total cluster capacity:   3.0 GB / 4.0 GB (75.0 %)
Effective capacity (replication factor 3):  1000.0 MB

A total of 128 new copies of partitions need to be transferred.

dc1                 Tags   Partitions        Capacity   Usable capacity
  b10c110e4e854e5a  node1  256 (128 new)     1000.0 MB  1000.0 MB (100.0%)
  TOTAL                    256 (256 unique)  1000.0 MB  1000.0 MB (100.0%)

dc2                 Tags   Partitions        Capacity   Usable capacity
  a235ac7695e0c54d  node2  256 (0 new)       1000.0 MB  1000.0 MB (100.0%)
  TOTAL                    256 (256 unique)  1000.0 MB  1000.0 MB (100.0%)

dc3                 Tags   Partitions        Capacity   Usable capacity
  62b218d848e86a64  node3  193 (0 new)       1000.0 MB  753.9 MB (75.4%)
  a11c7cf18af29737  node4  63 (0 new)        1000.0 MB  246.1 MB (24.6%)
  TOTAL                    256 (256 unique)  2.0 GB     1000.0 MB (50.0%)
```
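
The per-node usable capacities in this output follow from the partition
counts. The sketch below checks this with `awk`, assuming that every
partition has the same size, equal to the effective capacity divided by the
total number of partitions. The function name is hypothetical, and the
formula is inferred from the figures above.

```bash
# Hypothetical helper: usable capacity in MB of a node that holds $1
# partitions, when an effective capacity of $2 MB is split into $3
# partitions of equal size.
node_usable_mb() {
  awk -v parts="$1" -v effective_mb="$2" -v total_parts="$3" \
    'BEGIN { printf "%.1f\n", parts * effective_mb / total_parts }'
}

node_usable_mb 1 1000 256     # size of one partition
node_usable_mb 193 1000 256   # node3
node_usable_mb 63 1000 256    # node4
```

The three calls print `3.9`, `753.9` and `246.1`, which match the optimal
partition size and the usable capacities reported for `node3` and `node4`.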