author | Alex Auvolat <alex@adnab.me> | 2022-09-28 17:22:50 +0200 |
---|---|---|
committer | Alex Auvolat <alex@adnab.me> | 2022-09-28 17:22:50 +0200 |
commit | 7edd77d61a87e113f0d559ca984afa3db0fef2b3 (patch) | |
tree | ae0a9fdacdc4cbc8b9a49948677a2ca538c0e564 /content/blog/2022-perf | |
parent | 2ccde811e91c0aaba4ce401b63b35833f04d5367 (diff) | |
Spellcheck
Diffstat (limited to 'content/blog/2022-perf')
-rw-r--r-- | content/blog/2022-perf/index.md | 41 |
1 files changed, 21 insertions, 20 deletions
diff --git a/content/blog/2022-perf/index.md b/content/blog/2022-perf/index.md
index d020be4..39520a4 100644
--- a/content/blog/2022-perf/index.md
+++ b/content/blog/2022-perf/index.md
@@ -4,15 +4,16 @@ date=2022-09-26
 +++

-*During the past years, we have extensively analyzed possible design decisions and
-their theoretical trade-offs for Garage, especially concerning networking, data
-structures, and scheduling. Garage worked well enough for our production
+*During the past years, we have thought a lot about possible design decisions and
+their theoretical trade-offs for Garage. In particular, we pondered the impacts
+of data structures, networking methods, and scheduling algorithms.
+Garage worked well enough for our production
 cluster at Deuxfleurs, but we also knew that people started to discover some
-unexpected behaviors. We thus started a round of benchmark and performance
+unexpected behaviors. We thus started a round of benchmarks and performance
 measurements to see how Garage behaves compared to
 our expectations. This post presents some of our first results, which cover
-3 aspects of performance: efficient I/O, "myriads of objects" and resiliency,
-to reflect the high-level properties we are seeking.*
+3 aspects of performance: efficient I/O, "myriads of objects", and resiliency,
+reflecting the high-level properties we are seeking.*

 <!-- more -->

@@ -20,8 +21,8 @@ to reflect the high-level properties we are seeking.*

 ## ⚠️ Disclaimer

-The following results must be taken with a critical grain of salt due to some
-limitations that are inherent to any benchmark. We try to reference them as
+The results presented in this blog post must be taken with a critical grain of salt due to some
+limitations that are inherent to any benchmarking endeavour. We try to reference them as
 exhaustively as possible in this first section, but other limitations might exist.

 Most of our tests were made on simulated networks, which by definition cannot represent all the
@@ -91,7 +92,7 @@ The main purpose of an object storage system is to store and retrieve objects
 across the network, and the faster these two functions can be accomplished, the
 more efficient the system as a whole will be. For this analysis, we focus on 2
 aspects of performance. First, since many applications can start processing a file
-before receiving it completely, we will evaulate the Time-to-First-Byte (TTFB)
+before receiving it completely, we will evaluate the Time-to-First-Byte (TTFB)
 on GetObject requests, i.e. the duration between the moment a request is sent
 and the moment where the first bytes of the returned object are received by the
 client. Second, we will evaluate generic throughput, to understand how well
@@ -187,7 +188,7 @@ adapted for small-scale tests, and we kept only the aggregated result named
 "cluster total". The goal of this experiment is to get an idea of the cluster
 performance with a standardized and mixed workload.

-![Plot showing IO perf of Garage configs and Minio](io.png)
+![Plot showing IO performances of Garage configurations and Minio](io.png)

 Minio, our reference point, gives us the best performances in this test.
 Looking at Garage, we observe that each improvement we made has a visible
@@ -213,8 +214,8 @@ the only supported option was [sled](https://sled.rs/), but we started having
 serious issues with it - and we were not alone
 ([#284](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/284)). With Garage
 v0.8, we introduce an abstraction semantic over the features we expect from our
-database, allowing us to switch from one backend to another without touching
-the rest of our codebase. We added two additional backends: LMDB
+database, allowing us to switch from one back-end to another without touching
+the rest of our codebase. We added two additional back-ends: LMDB
 (through [heed](https://github.com/meilisearch/heed)) and SQLite
 (using [Rusqlite](https://github.com/rusqlite/rusqlite)). **Keep in mind that they
 are both experimental: contrarily to sled, we have never run them in production
@@ -281,7 +282,7 @@ use a combination of `O_DSYNC` and `fdatasync(3p)` - a derivative that ensures
 only data and not metadata is persisted on disk - in combination with
 `O_DIRECT` for direct I/O
 ([discussion](https://github.com/minio/minio/discussions/14339#discussioncomment-2200274),
-[example in minio source](https://github.com/minio/minio/blob/master/cmd/xl-storage.go#L1928-L1932)).*
+[example in Minio source](https://github.com/minio/minio/blob/master/cmd/xl-storage.go#L1928-L1932)).*

 **Storing a million objects** - Object storage systems are designed not only for
 data durability and availability but also for scalability, so naturally,
@@ -347,11 +348,11 @@ Let us now focus on Garage's metrics only to better see its specific behavior:

 ![Showing the time to send 128 batches of 8192 objects for Garage only](1million.png)

-Two effects are now more visible: 1increasing batch completion time increases with the
-number of objects in the bucket and 2. measurements are dispersed, at least
+Two effects are now more visible: 1., increasing batch completion time increases with the
+number of objects in the bucket and 2., measurements are dispersed, at least
 more than for Minio. We expect this batch completion time increase to be logarithmic,
-but we don't have enough datapoint to conclude safety: additional
-measurements are needed. Concercning the observed instability, it could
+but we don't have enough data points to conclude safety: additional
+measurements are needed. Concerning the observed instability, it could
 be a symptom of what we saw with some other experiments in this machine,
 which sometimes freezes under heavy I/O load. Such freezes could lead to
 request timeouts and failures. If this occurs on our testing computer, it will
@@ -418,14 +419,14 @@ any significant evolution from one version to another (Garage v0.7.3 and Garage
 v0.8.0 Beta 1 here). Compared to Minio, these values are either similar (for
 ListObjects and ListBuckets) or way better (for GetObject, PutObject, and
 RemoveObject). This can be easily understood by the fact that Minio has not been designed for
-environments with high latencies. Instead, it expects to run on clusters that are buil allt
-the same datacenter. In a multi-DC setup, different clusters could then possibly be interconnected them with their asynchronous
+environments with high latencies. Instead, it expects to run on clusters that are built
+in a singe data center. In a multi-DC setup, different clusters could then possibly be interconnected them with their asynchronous
 [bucket replication](https://min.io/docs/minio/linux/administration/bucket-replication.html?ref=docs-redirect)
 feature.

 *Minio also has a [multi-site active-active replication system](https://blog.min.io/minio-multi-site-active-active-replication/)
 but it is even more sensitive to latency: "Multi-site replication has increased
-latency sensitivity, as MinIO does not consider an object as replicated until
+latency sensitivity, as Minio does not consider an object as replicated until
 it has synchronized to all configured remote targets. Replication latency is
 therefore dictated by the slowest link in the replication mesh."*