op_guide/stolon/manual_backup.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292

## Disclaimer

Do **NOT** use the following backup methods on the Stolon Cluster:
  1. copying the data directory
  2. `pg_dump`
  3. `pg_dumpall`

The first one will lead to corrupted/inconsistent files.
The second and third ones put too much pressure on the cluster.
Basically, you will destroy it, in the following ways:
  - Load will increase, requests will timeout
  - RAM will increase, the daemon will be OOM (Out Of Memory) killed by Linux
  - Potentially, the WAL log will grow a lot


## A binary backup with `pg_basebackup`

The only acceptable solution is `pg_basebackup` with **some throttling configured**.
Later, if you want a SQL dump, you can inject this binary backup on an ephemeral database you spawned solely for this purpose on a non-production machine.

First, start by fetching from Consul the identifiers of the replication account.
Do not use the root account setup in Stolon, it will not work.

First setup a SSH tunnel on your machine that bind postgresql, eg:

```bash
ssh -L 5432:psql-proxy.service.2.cluster.deuxfleurs.fr:5432 ...
```

*Later, we will use `/tmp/sql` as our working directory. Depending on your distribution, this
folder may be a `tmpfs` and thus mounted on RAM. If it is the case, choose another folder, that is not a `tmpfs`, otherwise you will fill your RAM
and fail your backup. I am using NixOS and the `/tmp` folder is a regular folder, persisted on disk, which explain why I am using it.*

Then export your password in `PGPASSWORD` and launch the backup:

```bash
export PGPASSWORD=xxx

mkdir -p /tmp/sql
cd /tmp/sql

pg_basebackup \
  --host=127.0.0.1 \
  --username=replicator \
  --pgdata=/tmp/sql \
  --format=tar \
  --wal-method=stream \
  --gzip \
  --compress=6 \
  --progress \
  --max-rate=5M
```

*Something you should now: while it seems optional, fetching the WAL is mandatory. At first, I thought it was a way to have a "more recent backup".
But after some reading, it appears that the base backup is corrupted because it is not a snapshot at all, but a copy of the postgres folder with no specific state.
The whole point of the WAL is, in fact, to fix this corrupted archive...*

*Take a cup of coffe, it will take some times...*

The result I get (the important file is `base.tar.gz`, `41921.tar.gz` will probably be missing as it is a secondary tablespace I will deactivate soon):

```
[nix-shell:/tmp/sql]$ ls
41921.tar.gz  backup_manifest  base.tar.gz  pg_wal.tar.gz
```

From now, disconnect from the production to continue your work.
You don't need it anymore and it will prevent some disaster if you fail a command.


## Importing the backup

> The backup taken with `pg_basebckup` is an exact copy of your data directory so, all you need to do to restore from that backup is to point postgres at that directory and start it up.

```bash
mkdir -p /tmp/sql/pg_data && cd /tmp/sql/pg_data
tar xzfv ../base.tar.gz
```

Now you should have something like that:

```
[nix-shell:/tmp/sql/pg_data]$ ls
backup_label      base    pg_commit_ts  pg_hba.conf    pg_logical    pg_notify    pg_serial     pg_stat      pg_subtrans  pg_twophase  pg_wal   postgresql.conf              tablespace_map
backup_label.old  global  pg_dynshmem   pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    PG_VERSION   pg_xact  stolon-temp-postgresql.conf
```

Now we will extract the WAL:

```bash
mkdir -p /tmp/sql/wal && cd /tmp/sql/wal
tar xzfv ../pg_wal.tar.gz
```

You should have something like that:

```
[nix-shell:/tmp/sql/wal]$ ls
00000003000014AF000000C9  00000003000014AF000000CA  00000003.history  archive_status
```

Now, We must edit/read some files before launching our ephemeral server:
  - Set `listen_addresses = '0.0.0.0'` in `postgresql.conf` 
  - Add `restore_command = 'cp /mnt/wal/%f %p' ` in `postgresql.conf` 
  - Check `port` in `postgresql.conf`, in our case it is `5433`.
  - Create an empty file named `recovery.signal`

*Do not create a `recovery.conf` file, it might be written on the internet but this is a deprecated method and your postgres daemon will refuse to boot if it finds one.*

*Currently, we use port 5433 in oour postgresql configuration despite 5432 being the default port. Indeed, in production, clients access the cluster transparently through the Stolon Proxy that listens on port 5432 and redirect the requests to the correct PostgreSQL instance, listening secretly on port 5433! To export our binary backup in text, we will directly query our postgres instance without passing through the proxy, which is why you must note this port.*

Now we will start our postgres container on our machine.

At the time of writing the live version is `superboum/amd64_postgres:v9`.
We must start by getting `postgres` user id. Our container are run by default with this user, so you only need to run:

```bash
docker run --rm -it superboum/amd64_postgres:v9 id
```

And we get:

```
uid=999(postgres) gid=999(postgres) groups=999(postgres),101(ssl-cert)
```

Now `chown` your `pg_data`:

```bash
chown 999:999 -R /tmp/sql/{pg_data,wal}
chmod 700 -R /tmp/sql/{pg_data,wal}
```

And finally:

```
docker run \
  --rm \
  -it \
  -p 5433:5433 \
  -v /tmp/sql/:/mnt/ \
  superboum/amd64_postgres:v9 \
  postgres -D /mnt/pg_data
```

I have the following output:

```
2022-01-28 14:46:39.750 GMT [1] LOG:  skipping missing configuration file "/mnt/pg_data/postgresql.auto.conf"
2022-01-28 14:46:39.763 UTC [1] LOG:  starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2022-01-28 14:46:39.764 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2022-01-28 14:46:39.767 UTC [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2022-01-28 14:46:39.773 UTC [7] LOG:  database system was interrupted; last known up at 2022-01-28 14:33:13 UTC
cp: cannot stat '/mnt/wal/00000004.history': No such file or directory
2022-01-28 14:46:40.318 UTC [7] LOG:  starting archive recovery
2022-01-28 14:46:40.321 UTC [7] LOG:  restored log file "00000003.history" from archive
2022-01-28 14:46:40.336 UTC [7] LOG:  restored log file "00000003000014AF000000C9" from archive
2022-01-28 14:46:41.426 UTC [7] LOG:  could not open directory "pg_tblspc/41921/PG_13_202007201": No such file or directory
2022-01-28 14:46:41.445 UTC [7] LOG:  could not open directory "pg_tblspc/41921/PG_13_202007201": No such file or directory
2022-01-28 14:46:41.457 UTC [7] LOG:  redo starts at 14AF/C9000028
2022-01-28 14:46:41.500 UTC [7] LOG:  restored log file "00000003000014AF000000CA" from archive
2022-01-28 14:46:42.461 UTC [7] LOG:  consistent recovery state reached at 14AF/CA369AB0
2022-01-28 14:46:42.461 UTC [1] LOG:  database system is ready to accept read only connections
cp: cannot stat '/mnt/wal/00000003000014AF000000CB': No such file or directory
2022-01-28 14:46:42.463 UTC [7] LOG:  redo done at 14AF/CA369AB0
2022-01-28 14:46:42.463 UTC [7] LOG:  last completed transaction was at log time 2022-01-28 14:35:04.698438+00
2022-01-28 14:46:42.480 UTC [7] LOG:  could not open directory "pg_tblspc/41921/PG_13_202007201": No such file or directory
2022-01-28 14:46:42.493 UTC [7] LOG:  restored log file "00000003000014AF000000CA" from archive
cp: cannot stat '/mnt/wal/00000004.history': No such file or directory
2022-01-28 14:46:43.462 UTC [7] LOG:  selected new timeline ID: 4
2022-01-28 14:46:44.441 UTC [7] LOG:  archive recovery complete
2022-01-28 14:46:44.444 UTC [7] LOG:  restored log file "00000003.history" from archive
2022-01-28 14:46:45.614 UTC [1] LOG:  database system is ready to accept connections
```

*Notes: the missing tablespace is a legacy tablesplace used in the past to debug Matrix. It will be removed soon, we can safely ignore it. Other errors on cp seems to be intended as postgres might want to know how far it can rewind with the WAL but I a not 100% sure.*

Your ephemeral instance should work:

```bash
export PGPASSWORD=xxx # your postgres (admin) account password

psql -h 127.0.0.1 -p 5433 -U postgres postgres
```

And your databases should appear:

```
[nix-shell:~/Documents/dev/infrastructure]$ psql -h 127.0.0.1 -p 5433 -U postgres postgres
psql (13.5, server 13.3 (Debian 13.3-1.pgdg100+1))
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 xxxxx     | xxxxx    | UTF8     | en_US.utf8 | en_US.utf8 |
 xxxxx     | xxxxx    | UTF8     | en_US.utf8 | en_US.utf8 |
 xxxxx     | xxxxx    | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 xxxx      | xxxxx    | UTF8     | en_US.utf8 | en_US.utf8 |
 xxxx      | xxxxx    | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(8 rows)
```

## Dump your ephemeral database as SQL

Now we can do a SQL export of our ephemeral database.
We use zstd to automatically compress the outputed file.
We use multiple parameters:
  - `-vv` gives use some idea on the progress
  - `-9` is a quite high value and should compress efficiently. Decrease it if your machine is low powered
  - `-T0` asks zstd to use all your cores. By default, zstd uses only one core.

```bash
pg_dumpall -h 127.0.0.1 -p 5433 -U postgres \
  | zstd -vv -9 -T0 --format=zstd > dump-`date --rfc-3339=seconds | sed 's/ /T/'`.sql.zstd
```

I get the following result:

```
[nix-shell:/tmp/sql]$ ls -lah dump*
-rw-r--r-- 1 quentin users 749M janv. 28 16:07 dump-2022-01-28T16:06:29+01:00.sql.zstd
```

Now you can stop your ephemeral server.

## Restore your SQL file

First, start a blank server:

```bash
docker run \
  --rm -it \
  --name postgres \
  -p 5433:5432 \
  superboum/amd64_postgres:v9 \
  bash -c '
    set -ex
    mkdir /tmp/psql
    initdb -D /tmp/psql --no-locale --encoding=UTF8
    echo "host all postgres 0.0.0.0/0 md5" >> /tmp/psql/pg_hba.conf
    postgres -D /tmp/psql
  '
```

Then set the same password as your prod for the `posgtgres` user (it will be required as part of the restore):

```bash
docker exec -ti postgres bash -c "echo \"ALTER USER postgres WITH PASSWORD '$PGPASSWORD';\" | psql"
echo '\l' | psql -h 127.0.0.1 -p 5433 -U postgres postgres
# the database should have no entry (except `posgtres`, `template0` and `template1`) otherwise ABORT EVERYTHING, YOU ARE ON THE WRONG DB
```

And finally, restore your SQL backup:

```bash
zstdcat -vv dump-* | \
  grep -P -v '^(CREATE|DROP) ROLE postgres;' | \
  psql -h 127.0.0.1 -p 5433 -U postgres --set ON_ERROR_STOP=on  postgres
```

*Note: we must skip CREATE/DROP ROLE postgres during the restore as it aready exists and would generate an error.
Because we want to be extra careful, we specifically asked to crash on every error and do not want to change this behavior.
So, instead, we simply remove any entry that contains the specific regex stated in the previous command.*

Check that the backup has been correctly restored.
For example:

```bash
docker exec -ti postgres psql
#then type "\l", "\c db-name", "select ..."
```

## Finally, store it safely

```bash
rsync --progress -av /tmp/sql/{*.tar.gz,backup_manifest,dump-*} backup/target
```

## Ref

https://philipmcclarence.com/backing-up-and-restoring-postgres-using-pg_basebackup/
https://www.cybertec-postgresql.com/en/pg_basebackup-creating-self-sufficient-backups/
https://www.postgresql.org/docs/14/continuous-archiving.html
https://www.postgresql.org/docs/14/backup-dump.html#BACKUP-DUMP-RESTORE
https://dba.stackexchange.com/questions/75033/how-to-restore-everything-including-postgres-role-from-pg-dumpall-backup