path: root/content/blog/2024-03-survey/index.md
author: Alex Auvolat <alex@adnab.me>, 2024-03-14 15:43:59 +0100
committer: Alex Auvolat <alex@adnab.me>, 2024-03-14 15:43:59 +0100
commit: 2da43a24722b62c8f8f480c757327c41e5fbd407 (patch)
tree: 792306ee3c44fcf8f7d47ce06550c8537fe34377 /content/blog/2024-03-survey/index.md
parent: 049dc482cfe364d70b26b27a5812ae5a7d410b8a (diff)
first pass on survey results blog postblog-survey
Diffstat (limited to 'content/blog/2024-03-survey/index.md')
-rw-r--r-- content/blog/2024-03-survey/index.md | 261
1 files changed, 261 insertions, 0 deletions
diff --git a/content/blog/2024-03-survey/index.md b/content/blog/2024-03-survey/index.md
new file mode 100644
index 0000000..20e0284
--- /dev/null
+++ b/content/blog/2024-03-survey/index.md
@@ -0,0 +1,261 @@
++++
+title="Results of the community survey"
+date=2024-03-12
++++
+
+*We ran a community survey to gather feedback from Garage users and potential
+users during a two-month period. One of the main objectives of
+this survey was to determine expectations from the community for Garage's
+upcoming v1.0 release and for future work. Read this article for a discussion
+of the results.*
+
+<!-- more -->
+
+---
+
+The survey collected 127 responses over a period of almost 2 months,
+from the 15th of January to the 12th of March.
+The first question we asked users was how they had heard of Garage:
+the majority answered that they had heard of Garage through a link
+aggregator or social network such as Reddit or HN. A portion of
+users heard of it by word of mouth, and a significant portion also
+answered "Other". Unfortunately we didn't ask respondents for details
+if they selected "Other", so I'm quite curious as to what this could be.
+The other choices each received an almost negligible number of responses.
+
+<center><img src="all-how-known.png" /></center>
+
+Half of the respondents indicated that they are currently running a Garage cluster
+for production data, of which a small fraction indicated running it in a commercial
+setting. Another third of respondents indicated that they are currently testing Garage
+or have tested it previously.
+
+<center><img src="all-currently-admin.png" /></center>
+
+## About currently running Garage installations
+
+We first asked users what kind of data they were storing in Garage.
+The first answer, selected by about half of the participants,
+is for storing back-ups, followed closely by personal files.
+Other answers follow with a roughly linearly decreasing pattern.
+
+<center><img src="all-data-kind.png" /></center>
+
+The majority of users are not running Garage in geodistributed mode,
+but many users are also running in 2, 3 or even 4 locations.
+
+<center><img src="all-n-zones.png" /></center>
+
+A large majority of users are only using Garage through the S3 API.
+The remaining users are mostly using a mix of S3 API and web API,
+with a small number of users (5) using Garage primarily as a web server.
+
+<center><img src="all-access-mode.png" /></center>
+
+Regarding the size of clusters, the majority of installed clusters are less
+than 1TB in size. The others are almost all between 1TB and 10TB. 8 users
+indicated that they are running clusters of more than 10TB. Two users
+reported running clusters of more than 100TB, but they also indicated that they
+are not currently using Garage, so I think that's the size of the data they
+would like or need to store on Garage, not the actual size of an
+installed cluster. The number of objects stored in clusters is quite evenly
+split between less than 10k, 10k to 100k, and more than 100k.
+
+<center><img src="all-cluster-size.png" /></center>
+<center><img src="all-cluster-object-count.png" /></center>
+
+For about half of respondents, this means storing mostly objects of around 100MB in size.
+For the others, it's mostly objects of around 10MB. This is very inexact since the
+proposed answers for cluster size and object count had such large ranges.
+
+<center><img src="all-object-size.png" /></center>
+
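To illustrate why the object-size estimate above is so inexact, here is a minimal sketch of the arithmetic. It assumes decimal units (1TB = 10^12 bytes) and takes the "1TB to 10TB" and "10k to 100k objects" ranges mentioned earlier as an example; the function name is hypothetical and this is not the script used to produce the charts.

```python
import math

TB = 10**12  # assumption: decimal terabytes


def avg_object_size_bounds(size_range, count_range):
    """Return the (lowest, highest) average object size in bytes
    that is compatible with a cluster-size range and an object-count range."""
    size_lo, size_hi = size_range
    count_lo, count_hi = count_range
    # Smallest average: least data spread over the most objects, and vice versa.
    return size_lo / count_hi, size_hi / count_lo


# A cluster of 1TB to 10TB holding 10k to 100k objects.
low, high = avg_object_size_bounds((1 * TB, 10 * TB), (10_000, 100_000))
typical = math.sqrt(low * high)  # geometric midpoint of the two bounds

print(f"between {low / 1e6:.0f}MB and {high / 1e6:.0f}MB, "
      f"around {typical / 1e6:.0f}MB")
```

With these example ranges the bounds are two orders of magnitude apart, so any single "average object size" derived from the answers should be read as an order of magnitude at best.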
+## Satisfaction regarding Garage
+
+A majority of users reported a high degree of satisfaction with Garage.
+About a quarter said that Garage has some significant flaws. A small portion
+of respondents indicated that they cannot use Garage due to missing
+important features or critical bugs, but still took the time to answer
+the survey (thanks to them!).
+
+<center><img src="all-satisfaction.png" /></center>
+
+The top 3 strong points of Garage reported by its users are: good S3 compatibility
+(first place, with 2/3 of respondents agreeing), good performance on small / low-power
+machines, and easy setup. I'd say we are pretty much on target, as these are some of the
+main objectives of Garage.
+
+<center><img src="all-strong-points.png" /></center>
+
+As for most wanted features in Garage, there is a clear winner with a web interface
+for cluster administration, with over 40% of users mentioning it. The second most
+wanted feature is support for S3 versioning, with almost 30% of answers.
+
+<center><img src="all-wanted-features.png" /></center>
+
+The vast majority of users reported never losing data that they stored in Garage.
+Only one indicated that they lost data and it was Garage's fault: this was
+because they tried to move an LMDB database between machines with different
+architectures, but the LMDB on-disk format is architecture-specific. We should
+probably be clearer about this in the documentation.
+
+<center><img src="all-lose-data.png" /></center>
+
+
+# Users in a "homelab/self-hosted setting"
+
+52 respondents indicated that they are using Garage for storing production
+data in a homelab or self-hosted setting. I'd say this is the most
+representative portion of Garage users, as it is its primary target.
+Let's look at the answers from these users only.
+
+## About the clusters
+
+Personal files now take first place among the kinds of data stored on these clusters,
+still closely followed by back-ups.
+
+<center><img src="homelab-data-kind.png" /></center>
+
+These users are mostly not using Garage in a geodistributed setting.
+The distribution of answers is very similar to the overall distribution.
+
+<center><img src="homelab-n-zones.png" /></center>
+
+Most clusters of these users are less than 1TB in size,
+and the remaining ones are mostly in the 1TB - 10TB range.
+There are fewer clusters than average storing more than 100k objects in this population,
+but the distribution of object sizes (not shown) is very similar to the overall distribution.
+
+<center><img src="homelab-cluster-size.png" /></center>
+<center><img src="homelab-cluster-object-count.png" /></center>
+
+## Satisfaction regarding Garage
+
+Homelab/self-hosting users reported a slightly higher level of satisfaction with Garage,
+with almost 3/4 very satisfied.
+
+<center><img src="homelab-satisfaction.png" /></center>
+
+The top 3 reasons for using Garage are the same, but good performance on small
+/ low-power machines is now taking the first place.
+
+<center><img src="homelab-strong-points.png" /></center>
+
+The top 2 wanted features are still the same, now with an equal number of votes.
+
+<center><img src="homelab-wanted-features.png" /></center>
+
+# Users in a "commercial setting"
+
+Fewer users indicated that they are running Garage in a commercial setting,
+as this concerned only 12 of the respondents to the survey.
+
+## About the clusters
+
+Half of users reported using Garage to store back-ups,
+and almost half reported storing observability data and web app / service data.
+One third selected static websites.
+
+<center><img src="commercial-data-kind.png" /></center>
+
+Users in a commercial setting are more consistent in their use of the
+geo-distribution features offered by Garage. Only one third of users are
+not running in geo-distributed mode. Another third is running Garage in 2 locations,
+and the last third is running in 3 or more locations, thus benefitting from
+the best resiliency properties that Garage can offer.
+
+<center><img src="commercial-n-zones.png" /></center>
+
+The majority of commercial deployments are storing between 1TB and 10TB of data.
+About a quarter are storing more than 1 million objects.
+
+<center><img src="commercial-cluster-size.png" /></center>
+<center><img src="commercial-cluster-object-count.png" /></center>
+
+It seems that the average object size is much smaller in this population:
+the majority of answers correspond to average object sizes of less than 10MB,
+and one fourth of answers correspond to objects of around 1MB.
+
+<center><img src="commercial-object-size.png" /></center>
+
+
+## Satisfaction regarding Garage
+
+Three quarters of these users reported a high degree of satisfaction with Garage,
+about the same as for homelab users.
+
+<center><img src="commercial-satisfaction.png" /></center>
+
+The most liked qualities of Garage are a bit different. Fewer users reported
+satisfaction due to the easy setup of Garage, but more users indicated
+that the possibility of easily adding and removing nodes was important to them.
+Good tolerance to offline nodes and crashes, and good performance in the face
+of latency, which are the core properties that make Garage work well in
+geo-distributed settings, were selected by two thirds of users, most likely
+the same that said they are running in geo-distributed mode.
+
+<center><img src="commercial-strong-points.png" /></center>
+
+A web interface for cluster administration is still the most wanted feature, with 40%
+of votes. Then, one third voted for better monitoring and observability, and for
+per-bucket levels of consistency and numbers of replicas. Only 25% voted for S3
+versioning.
+
+<center><img src="commercial-wanted-features.png" /></center>
+
+# Users that have the biggest clusters
+
+7 users reported running clusters storing more than 10TB of data.
+About half of these users are using Garage for a homelab or self-hosted setup,
+and one is in a commercial setting.
+
+<center><img src="big-currently-admin.png" /></center>
+
+## About the clusters
+
+Almost all of these users are using Garage to store back-ups.
+Multimedia files are the second most selected option, which
+would explain why these clusters are so big.
+
+<center><img src="big-data-kind.png" /></center>
+
+These deployments are quite evenly split between not
+being geo-replicated and being geo-replicated in 2 or 3 locations.
+
+<center><img src="big-n-zones.png" /></center>
+
+## Satisfaction regarding Garage
+
+A majority of users report a high degree of satisfaction with Garage,
+but many users also reported significant flaws.
+
+<center><img src="big-satisfaction.png" /></center>
+
+Unsurprisingly, when clusters start becoming big enough, the most requested
+improvement is better performance across the board.
+Per-bucket levels of consistency and numbers of replicas were also selected
+by almost half of users.
+
+<center><img src="big-wanted-features.png" /></center>
+
+# Users that reported that Garage had some significant flaws
+
+Focusing on users that reported that Garage is usable for them but has "significant flaws",
+the two most requested features were a web administration interface and S3 versioning.
+Bucket-level ACLs (that would allow anonymous access directly from the S3 endpoint)
+and performance improvements came next.
+
+<center><img src="flaws-wanted-features.png" /></center>
+
+Concerning users that said that Garage has critical issues that are preventing
+them from using it, the "Other" option was the most selected answer for the
+requested features. Licensing issues allegedly preventing commercial use were
+cited by a few users (hint: it's actually a non-issue, and we will write about
+this at some point), but I think for most of these users, they have a specific
+use case in mind which is not targeted by Garage. For instance, several have
+indicated that they would need POSIX filesystem compatibility and/or the
+possibility to use Garage as a CSI driver in Kubernetes (unfortunately, this is
+mostly impossible to achieve with good performance in a geo-distributed
+environment, and the principles on which Garage is based explicitly prevent it
+from fulfilling this role).
+