+++
title = "Failure & recovery"
weight = 50
+++

# Failure impact

Failures lead to timeouts, which in turn can cause
failed requests (this is a bug if the failure is within Garage's tolerance)
and increased latency, as some requests may be retried.

How we proceed: we pause one Garage process with `kill -STOP xxx`.
The idea is to avoid closing the TCP connection, which would signal too easily
that a crash occurred. Instead, we want to simulate a network error
or an overloaded process, i.e. a 'non-collaborating' crash.
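
The pause-and-resume step can be sketched in a few lines of Python. This is only an illustration, not the script used for the benchmark: the helper name `pause_for`, the pause duration, and the sleeping child process standing in for a Garage process are all assumptions. It relies on the fact that a process stopped with `SIGSTOP` keeps its sockets open, so peers see no connection reset and can only detect the failure through timeouts.

```python
import os
import signal
import subprocess
import sys
import time


def pause_for(pid: int, seconds: float) -> None:
    """Freeze process `pid` with SIGSTOP, wait, then resume it with SIGCONT.

    Hypothetical helper: equivalent to `kill -STOP <pid>` followed later
    by `kill -CONT <pid>`.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume the process, even if the wait is interrupted.
        os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    # Stand-in for a Garage process (assumption): a child that only sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    pause_for(child.pid, 0.5)
    # The child was frozen, not killed, so it is still alive here.
    print("alive after resume:", child.poll() is None)
    child.terminate()
    child.wait()
```

In a real run, the PID would be that of the Garage process under test, and the pause would last long enough for clients and peer nodes to hit their timeouts.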


# Recovery impact