+++
title="Results of the community survey"
date=2024-03-12
+++

*We ran a community survey to gather feedback from Garage users and potential
users during a two-month period. One of the main objectives of
this survey was to determine expectations from the community for Garage's
upcoming v1.0 release and for future work. Read this article for a discussion
of the results.*

<!-- more -->

---

The survey collected 127 responses over a period of almost 2 months,
from the 15th of January to the 12th of March.
The first question we asked users was how they had heard of Garage:
the majority answered that they had heard of Garage through a link
aggregator or social network such as Reddit or HN. A portion of
users heard of it by word of mouth, and a significant portion also
answered "Other". Unfortunately we didn't ask respondents for details
if they selected "Other", so I'm quite curious as to what this could be.
The other choices received an almost negligible number of responses.

<center><img src="all-how-known.png" /></center>

Half of the respondents indicated that they are currently running a Garage cluster
for production data, of which a small fraction indicated running it in a commercial
setting. Another third of respondents indicated that they are currently testing Garage
or have tested it previously.

<center><img src="all-currently-admin.png" /></center>

## About currently running Garage installations

We first asked users what kind of data they were storing in Garage.
The first answer, selected by about half of the participants,
is for storing back-ups, followed closely by personal files.
Other answers follow in a roughly linearly decreasing pattern.

<center><img src="all-data-kind.png" /></center>

The majority of users are not running Garage in geodistributed mode,
but many users are also running in 2, 3 or even 4 locations.

<center><img src="all-n-zones.png" /></center>

A large majority of users are only using Garage through the S3 API.
The remaining users are mostly using a mix of S3 API and web API,
with a small number of users (5) using Garage primarily as a web server.

<center><img src="all-access-mode.png" /></center>

Regarding the size of clusters, the majority of installed clusters are less
than 1TB in size.  The others are almost all between 1TB and 10TB. 8 users
indicated that they are running clusters of more than 10TB. Two users
reported running clusters of more than 100TB, but they also indicated that they
are not currently using Garage, so I think that's the size of the data they
would like or need to store on Garage, not the actual size of an
installed cluster.  The number of objects stored in clusters is quite evenly
split between less than 10k, 10k to 100k, and more than 100k.

<center><img src="all-cluster-size.png" /></center>
<center><img src="all-cluster-object-count.png" /></center>

For about half of respondents, this means storing mostly objects of around 100MB in size.
For the others, it's mostly objects of around 10MB. This is very inexact since the
proposed answers for cluster size and object count had such large ranges.

<center><img src="all-object-size.png" /></center>
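To make the arithmetic behind these object size estimates explicit, here is a minimal sketch. The function names and the representative values (1TB, 10k and 100k objects) are my own assumptions for illustration, not figures reported by respondents. It divides a cluster size by an object count, and also shows how wide the estimate becomes when only answer ranges are known.

```python
# Illustrative only: the representative values below are assumptions,
# not figures reported by survey respondents.
TB = 10**12  # bytes, decimal units
MB = 10**6


def avg_object_size_mb(cluster_bytes, object_count):
    """Average object size in MB for a given cluster size and object count."""
    return cluster_bytes / object_count / MB


def size_range_mb(size_bucket, count_bucket):
    """Smallest and largest average object size (in MB) compatible with
    a (min, max) cluster size range and a (min, max) object count range."""
    size_lo, size_hi = size_bucket
    count_lo, count_hi = count_bucket
    return (
        avg_object_size_mb(size_lo, count_hi),
        avg_object_size_mb(size_hi, count_lo),
    )


# A 1TB cluster holding 10k objects averages 100MB per object;
# the same cluster holding 100k objects averages 10MB per object.
print(avg_object_size_mb(1 * TB, 10_000))
print(avg_object_size_mb(1 * TB, 100_000))

# With only ranges known (1TB to 10TB, 10k to 100k objects), the
# average could be anywhere between 10MB and 1000MB.
print(size_range_mb((1 * TB, 10 * TB), (10_000, 100_000)))
```

The last call illustrates why the estimate is so inexact: a single pair of answer ranges is compatible with average object sizes spanning two orders of magnitude.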

## Satisfaction regarding Garage

A majority of users reported a high degree of satisfaction with Garage.
About a quarter said that Garage has some significant flaws. A small portion
of respondents indicated that they cannot use Garage due to missing
important features or critical bugs, but still took the time to answer
the survey (thanks to them!).

<center><img src="all-satisfaction.png" /></center>

The top 3 strong points of Garage reported by its users are: good S3 compatibility
(first place, with 2/3 of respondents agreeing), good performance on small / low-power
machines, and easy setup. I'd say we are pretty much on target, as these are some of the
main objectives of Garage.

<center><img src="all-strong-points.png" /></center>

As for the most wanted features in Garage, the clear winner is a web interface
for cluster administration, with over 40% of users mentioning it.  The second most
wanted feature is support for S3 versioning, with almost 30% of answers.

<center><img src="all-wanted-features.png" /></center>

The vast majority of users reported never losing data that they stored in Garage.
Only one indicated that they lost data and that it was Garage's fault: this was
because they tried to move an LMDB database between machines with different
architectures, but the LMDB on-disk format is architecture-specific. We should
probably be clearer about this in the documentation.

<center><img src="all-lose-data.png" /></center>


# Users in a "homelab/self-hosted setting"

52 respondents indicated that they are using Garage for storing production
data in a homelab or self-hosted setting. I'd say this is the most
representative portion of Garage users, as it is its primary target.
Let's look at the answers from these users only.

## About the clusters

Personal files now take first place among the kinds of data stored on these clusters,
still closely followed by back-ups.

<center><img src="homelab-data-kind.png" /></center>

These users are mostly not using Garage in a geodistributed setting.
The distribution of answers is very similar to the overall distribution.

<center><img src="homelab-n-zones.png" /></center>

Most clusters of these users are less than 1TB in size,
and the remaining ones are mostly in the 1TB - 10TB range.
There are fewer clusters than average storing more than 100k objects in this population,
but the distribution of object sizes (not shown) is very similar to the overall distribution.

<center><img src="homelab-cluster-size.png" /></center>
<center><img src="homelab-cluster-object-count.png" /></center>

## Satisfaction regarding Garage

Homelab/self-hosting users reported a slightly higher level of satisfaction with Garage,
with almost 3/4 very satisfied.

<center><img src="homelab-satisfaction.png" /></center>

The top 3 reasons for using Garage are the same, but good performance on small
/ low-power machines now takes first place.

<center><img src="homelab-strong-points.png" /></center>

The top 2 wanted features are still the same, now with an equal number of votes.

<center><img src="homelab-wanted-features.png" /></center>

# Users in a "commercial setting"

Fewer users indicated that they are running Garage in a commercial setting,
as this concerned only 12 of the respondents to the survey.

## About the clusters

Half of users reported using Garage to store back-ups,
and almost half reported storing observability data and web app / service data.
One third selected static websites.

<center><img src="commercial-data-kind.png" /></center>

Users in a commercial setting are more consistent in their use of the
geo-distribution features offered by Garage. Only one third of users are
not running in geo-distributed mode. Another third is running Garage in 2 locations,
and the last third is running in 3 or more locations, thus benefitting from
the best resiliency properties that Garage can offer.

<center><img src="commercial-n-zones.png" /></center>

The majority of commercial deployments are storing between 1TB and 10TB of data.
About a quarter are storing more than 1 million objects.

<center><img src="commercial-cluster-size.png" /></center>
<center><img src="commercial-cluster-object-count.png" /></center>

It seems that the average object size is much smaller in this population:
the majority of answers correspond to average object sizes of less than 10MB,
and one fourth of answers correspond to objects of around 1MB.

<center><img src="commercial-object-size.png" /></center>


## Satisfaction regarding Garage

Three quarters of these users reported a high degree of satisfaction with Garage,
about the same as for homelab users.

<center><img src="commercial-satisfaction.png" /></center>

The most liked qualities of Garage are a bit different. Fewer users reported
satisfaction due to the easy setup of Garage, but more users indicated 
that the possibility of easily adding and removing nodes was important to them.
Good tolerance to offline nodes and crashes, and good performance in the face
of latency, which are the core properties that make Garage work well in
geo-distributed settings, were selected by two thirds of users, most likely
the same users who said they are running in geo-distributed mode.

<center><img src="commercial-strong-points.png" /></center>

A web interface for cluster administration is still the most wanted feature, with 40%
of votes. Then, one third voted for better monitoring and observability, and for
per-bucket levels of consistency and numbers of replicas. Only 25% voted for S3
versioning.

<center><img src="commercial-wanted-features.png" /></center>

# Users that have the biggest clusters

7 users reported running clusters storing more than 10TB of data.
About half of these users are using Garage for a homelab or self-hosted setup,
and one is in a commercial setting.

<center><img src="big-currently-admin.png" /></center>

## About the clusters

Almost all of these users are using Garage to store back-ups.
Multimedia files are the second most selected option, which
would explain why these clusters are so big.

<center><img src="big-data-kind.png" /></center>

These deployments are quite evenly split between not
being geo-replicated and being geo-replicated in 2 or 3 locations.

<center><img src="big-n-zones.png" /></center>

## Satisfaction regarding Garage

A majority of users reported a high degree of satisfaction with Garage,
but many users also reported significant flaws.

<center><img src="big-satisfaction.png" /></center>

Unsurprisingly, when clusters start becoming big enough, the most requested
improvement is better performance across the board.
Per-bucket levels of consistency and numbers of replicas were also selected
by almost half of users.

<center><img src="big-wanted-features.png" /></center>

# Users that reported that Garage had some significant flaws

Focusing on users that reported that Garage is usable for them but has "significant flaws",
the two most requested features were a web administration interface and S3 versioning.
Bucket-level ACLs (that would allow anonymous access directly from the S3 endpoint)
and performance improvements came next.

<center><img src="flaws-wanted-features.png" /></center>

Concerning users that said that Garage has critical issues that are preventing
them from using it, the "Other" option was the most selected answer for the
requested features. Licensing issues allegedly preventing commercial use were
cited by a few users (hint: it's actually a non-issue, and we will write about
this at some point), but I think most of these users have a specific
use case in mind which is not targeted by Garage. For instance, several have
indicated that they would need POSIX filesystem compatibility and/or the
possibility to use Garage as a CSI driver in Kubernetes (unfortunately, this is
mostly impossible to achieve with good performance in a geo-distributed
environment, and the principles on which Garage is based explicitly prevent it
from fulfilling this role).