# Deuxfleurs on NixOS!

This repository contains code to run Deuxfleurs' infrastructure on NixOS.

It sets up the following:

- A WireGuard mesh between all nodes
- Consul, with TLS
- Nomad, with TLS


## How to welcome a new administrator

See: https://guide.deuxfleurs.fr/operations/acces/pass/

Basically:
  - The new administrator generates a GPG key and publishes it on Gitea
  - All existing administrators pull the new key and sign it
  - An existing administrator re-encrypts the keystore with this new key and pushes it
  - The new administrator clones the repo and checks that they can decrypt the secrets
  - Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick`, where `rick` is the target username


## How to create files for a new zone

*The documentation is written for the production cluster; the same applies to other clusters.*

Basically:
  - Create your `site` file in the `cluster/prod/site/` folder
  - Create your `node` files in the `cluster/prod/node/` folder
  - Add your WireGuard configuration to `cluster/prod/cluster.nix`
    - You will have to edit your NAT config manually
    - To get your node's WireGuard public key, you must run `./deploy_wg prod <node>`; see the next section for more information
  - Add your nodes to `cluster/prod/ssh_config`, which is used by the various SSH scripts
    - If you use `ssh` directly, use `ssh -F ./cluster/prod/ssh_config`
    - Add `User root` the first time, as your user will not be declared on the system yet

## How to deploy a Nix configuration on a fresh node

We suppose that the node name is `datura`.
Start by deploying one node at a time: you will have plenty of time
in your operator's life to break everything through automation.

Run:
  - `./deploy_wg prod datura` - to generate WireGuard keys
  - `./deploy_nixos prod datura` - to deploy the Nix configuration files
    - needs to be redeployed on all nodes, as the new WireGuard configuration is needed everywhere
  - `./deploy_password prod datura` - to deploy users' passwords
    - needs to be redeployed on all nodes to set up the password everywhere
  - `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI
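
The steps above can be chained in a small wrapper. This is only a sketch, assuming it is run from the repository root and that each `deploy_*` script takes a cluster name followed by a single node name, as in the commands listed above. The `run`, `deploy_fresh_node` and `redeploy_on` functions and the `DRY_RUN` variable are hypothetical helpers, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch only: chain the deployment steps for one fresh node.
# Assumes the deploy_* scripts sit in the current directory.
set -euo pipefail

# Print the command when DRY_RUN=1 (hypothetical knob), run it otherwise.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the four steps, in the documented order, for a single node.
deploy_fresh_node() {
  local cluster="$1" node="$2"
  local step
  for step in deploy_wg deploy_nixos deploy_password deploy_pki; do
    run "./$step" "$cluster" "$node"
  done
}

# Re-run one step on several nodes, one node at a time.
redeploy_on() {
  local step="$1" cluster="$2"
  shift 2
  local node
  for node in "$@"; do
    run "./$step" "$cluster" "$node"
  done
}

# Usage:
#   DRY_RUN=1 deploy_fresh_node prod datura      # preview the commands
#   deploy_fresh_node prod datura                # run them for real
#   redeploy_on deploy_nixos prod dahlia datura  # push the new config to other nodes
```

Because of `set -e`, the wrapper stops at the first failing step, so a broken WireGuard setup is noticed before the later steps are attempted.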

## How to operate a node

Edit your `~/.ssh/config` file:

```
Host dahlia
  HostName dahlia.machine.deuxfleurs.fr
  LocalForward 14646 127.0.0.1:4646
  LocalForward 8501 127.0.0.1:8501
  LocalForward 1389 bottin.service.prod.consul:389
  LocalForward 5432 psql-proxy.service.prod.consul:5432
```

Then run the TLS proxy and leave it running:

```
./tlsproxy prod
```

SSH to a production machine (e.g. dahlia) and leave it running:

```
ssh dahlia
```


Finally, you should be able to access the production Nomad and Consul by browsing:

 - Consul: http://localhost:8500
 - Nomad: http://localhost:4646
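
To check from a terminal that both endpoints answer, something like the following may help. It is a sketch, assuming the HTTP APIs are served on the same local ports as the web interfaces listed above and that `curl` is installed. It queries the `/v1/status/leader` endpoint that both Consul and Nomad expose. The `leader_url` and `check_endpoint` functions are hypothetical helpers. Depending on the ACL setup, the request may need a token, in which case the service is reported as not reachable even though the tunnel works.

```shell
#!/usr/bin/env bash
# Sketch only: probe the local Consul and Nomad endpoints
# once ./tlsproxy and the SSH session are both running.
set -euo pipefail

# Build the status URL for a given local port.
leader_url() {
  local port="$1"
  echo "http://localhost:${port}/v1/status/leader"
}

# Report whether the service behind a local port answers with a success status.
check_endpoint() {
  local name="$1" port="$2"
  if curl --silent --fail --max-time 5 "$(leader_url "$port")" >/dev/null; then
    echo "$name: reachable"
  else
    echo "$name: not reachable on port $port"
  fi
}

# Usage:
#   check_endpoint Consul 8500
#   check_endpoint Nomad 4646
```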


## Why not Ansible?

I often get asked why we don't use Ansible to deploy to remote machines, as this
would look like a typical use case.  There are many reasons, which basically
boil down to "I really don't like Ansible":

- Ansible tries to do declarative system configuration but, unlike Nix,
  doesn't do it correctly at all.  Example: in NixOS, to undo something you've
  done, just comment out the corresponding lines and redeploy.

- Ansible is massive overkill for what we're trying to do here: we're just
  copying a few small files and running some basic commands, leaving the rest
  to NixOS.

- YAML is a pain to manipulate as soon as you have more than two or three
  indentation levels.  Also, why in hell would you want to write loops and
  conditions in YAML when you could use a proper expression language?

- Ansible's vocabulary is not ours, and it imposes a rigid hierarchy of
  directories and files which I don't want.

- Ansible is probably not flexible enough to do what we want, at least not
  without getting a migraine when trying. For example, its inventory
  management is too simple to account for the heterogeneity of our cluster
  nodes while still retaining a level of organization (some configuration
  options are defined cluster-wide, some are defined for each site - physical
  location - we deploy on, and some are specific to each node).

- I never remember Ansible's command line flags.

- My distribution's package for Ansible takes almost 400MB once installed,
  WTF???  By not depending on it, we're reducing the set of tools we need to
  deploy to a bare minimum: Git, OpenSSH, OpenSSL, socat,
  [pass](https://www.passwordstore.org/) (and the Consul and Nomad binaries
  which are, I'll admit, not small).


## More

Please read `README.more.md` for more detailed information.