# Overall architecture
## Configuring the OS
This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
Most scripts are invoked with the following syntax:
- for scripts that generate secrets: `./gen_<something> <cluster_name>` to generate the secrets to be used on cluster `<cluster_name>`
- for deployment scripts:
  - `./deploy_<something> <cluster_name>` to run the deployment script on all nodes of the cluster `<cluster_name>`
  - `./deploy_<something> <cluster_name> <node1> <node2> ...` to run the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>`.
All deployment scripts can use the following parameters passed as environment variables:
- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
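The `SSH_USER` fallback described above can be pictured as a shell parameter expansion. This is only a sketch of the documented behavior, not the code used by the deployment scripts; `resolve_ssh_user` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the user that would be used for SSH.
# Uses SSH_USER when it is set and non-empty, otherwise the local username.
resolve_ssh_user() {
  printf '%s\n' "${SSH_USER:-$(id -un)}"
}

resolve_ssh_user
```

In practice the variables are simply set on the command line, for example `SSH_USER=alice ./deploy_nixos <cluster_name>`, where `alice` is a placeholder username.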
### Assumptions (how to set up your environment)
- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
the password is the same on all cluster nodes (see below for password management tools)
- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
(scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
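The locally checkable assumptions above can be verified before running any script. The following is a minimal sketch, assuming it is run from the root of this repository; `check_prerequisites` is a hypothetical helper that does not exist in the repo. It honors `PASSWORD_STORE_DIR`, the variable `pass` itself uses to relocate the store.

```shell
# Hypothetical pre-flight check: prints one line per missing prerequisite
# and returns non-zero if anything is missing.
# Assumes the current directory is the root of this repository.
check_prerequisites() {
  cluster="$1"
  store="${PASSWORD_STORE_DIR:-$HOME/.password-store}"
  status=0
  if [ ! -f "cluster/$cluster/ssh_config" ]; then
    echo "missing: cluster/$cluster/ssh_config"
    status=1
  fi
  if [ ! -d "$store/deuxfleurs" ]; then
    echo "missing: $store/deuxfleurs (clone of the secrets repository)"
    status=1
  fi
  return "$status"
}
```

For example, `check_prerequisites staging && ./deploy_nixos staging` would only start the deployment when both paths exist. It does not check SSH reachability or `wheel` membership on the nodes.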
### Deploying the NixOS configuration
The NixOS configuration makes use of a certain number of files:
- files in `nix/` that are the same for all deployments on all clusters
- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster and is copied unchanged to all cluster nodes
- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
- files in `cluster/<cluster_name>/node/` which are specific to each node
To deploy the NixOS configuration on the cluster, simply do:
```
./deploy_nixos <cluster_name>
```
or to deploy only on a single node:
```
./deploy_nixos <cluster_name> <node_name>
```
To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
### Generating and deploying a PKI for Consul and Nomad
First, if the PKI has not yet been created, create it with:
```
./gen_pki <cluster_name>
```
Then, deploy the PKI on all nodes with:
```
./deploy_pki <cluster_name>
```
Note that certificates are valid for not much more than one year: every year in January, `gen_pki` and `deploy_pki` have to be re-run to generate certificates for the new year.
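To avoid being surprised by an expired certificate, its remaining validity can be checked with `openssl`. This is a sketch under assumptions: `cert_valid_for_days` is a hypothetical helper, and the certificate path has to be supplied by you, since where the certificates end up is not described here.

```shell
# Hypothetical helper: succeeds if the certificate in file $1 will still be
# valid $2 days from now. openssl's -checkend option takes seconds.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$1" >/dev/null
}
```

A call such as `cert_valid_for_days <path_to_cert> 30 || echo "renew soon"` would then print a reminder when fewer than 30 days remain.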
### Adding administrators and password management
Administrators are defined in the `cluster.nix` file for each cluster (they could also be defined in the site-specific Nix files if necessary).
This is where their public SSH keys for remote access are put.
Administrators will also need passwords to administrate the cluster, as we are not using passwordless sudo.
To set the password for a new administrator, they must have a working `pass` installation as specified above.
They must then run:
```
./passwd <cluster_name> <user_name>
```
to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
Then, an administrator that already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
```
./deploy_passwords <cluster_name>
```
## Deploying stuff on Nomad
### Connecting to Nomad
Connect using SSH to one of the cluster nodes, forwarding port 14646 to port 4646 on localhost, and port 8501 to port 8501 on localhost.
You can for instance use an entry in your `~/.ssh/config` that looks like this:
```
Host caribou
HostName 2a01:e0a:c:a720::23
LocalForward 14646 127.0.0.1:4646
LocalForward 8501 127.0.0.1:8501
LocalForward 1389 bottin.service.staging.consul:389
```
Then, in a separate window, launch `./tlsproxy <cluster_name>`: this will
launch `socat` proxies that strip the TLS layer and allow you to simply access
Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
Nomad and `http://localhost:8500` for Consul. Keep this terminal window for as
long as you need to access Nomad and Consul on the cluster.
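Once the proxies are running, the `nomad` and `consul` command-line tools can be pointed at the local endpoints through their standard environment variables. A minimal sketch, assuming the proxies listen on the ports given above:

```shell
# Point the CLIs at the local, unencrypted proxy endpoints.
export NOMAD_ADDR="http://localhost:4646"
export CONSUL_HTTP_ADDR="http://localhost:8500"
```

With these set, commands such as `nomad node status` or `consul members` should reach the cluster through the tunnel.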
### Setting scheduler config
Some configuration options have to be tweaked in the orchestrator. Use `nomad operator scheduler set-config` to obtain the following result:
```bash
$ nomad operator scheduler get-config --json
{
"KnownLeader": true,
"LastContact": 0,
"LastIndex": 0,
"NextToken": "",
"RequestTime": 0,
"SchedulerConfig": {
"CreateIndex": 5,
"MemoryOversubscriptionEnabled": true, # << THIS
"ModifyIndex": 399239,
"PauseEvalBroker": false,
"PreemptionConfig": {
"BatchSchedulerEnabled": true, # << THIS
"ServiceSchedulerEnabled": true, # << THIS
"SysBatchSchedulerEnabled": true, # << THIS
"SystemSchedulerEnabled": true # << THIS
},
"RejectJobRegistration": false,
"SchedulerAlgorithm": "binpack"
}
}
```
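The marked options can be set in a single call. The sketch below wraps it in a hypothetical function, `enable_scheduler_options`; the flag names are those documented for recent Nomad releases, so check `nomad operator scheduler set-config -h` against your installed version before relying on them.

```shell
# Sketch: enable memory oversubscription and preemption for every
# scheduler type, matching the fields marked "THIS" above.
enable_scheduler_options() {
  nomad operator scheduler set-config \
    -memory-oversubscription=true \
    -preempt-batch-scheduler=true \
    -preempt-service-scheduler=true \
    -preempt-sysbatch-scheduler=true \
    -preempt-system-scheduler=true
}
```

After calling `enable_scheduler_options`, run `nomad operator scheduler get-config --json` again to confirm that the output matches.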
### Launching services
To launch a service, e.g. `app/core`, use `nomad plan` first:
```
cd cluster/staging/app/core/deploy
nomad plan core-system.hcl
```
If the diff looks fine, then you can run the job for real
(the index is printed in the output of `nomad plan`):
```
nomad job run -check-index NNN core-system.hcl
```
There may be several jobs in the same directory, for instance
`core-system.hcl` and `core-service.hcl`.
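If you prefer not to copy the index by hand, it can be extracted from the plan output. This is a sketch that assumes the output contains a line of the form `Job Modify Index: NNN`; `plan_index` is a hypothetical helper, not part of this repo.

```shell
# Hypothetical helper: reads `nomad plan` output on stdin and prints the
# job modify index found on the "Job Modify Index: NNN" line.
plan_index() {
  sed -n 's/^Job Modify Index: \([0-9][0-9]*\).*$/\1/p'
}

# Possible usage (requires access to the cluster):
#   nomad plan core-system.hcl | tee plan.txt      # review the diff first
#   nomad job run -check-index "$(plan_index < plan.txt)" core-system.hcl
```

Saving the plan to a file keeps the review step: the diff is still read by a human before the job is submitted.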
### Which services to launch
Stuff should be started in this order:
1. `app/core`
2. `app/frontend`
3. `app/telemetry`
4. `app/garage`
5. `app/directory`
Then, other stuff can be started in any order, e.g.:
- `app/im`
- `app/cryptpad`
- `app/drone-ci`
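The required start order can be kept in a small list so that it is not retyped from memory. The sketch below only prints the deploy directories in order; it assumes every app follows the `cluster/<cluster_name>/app/<name>/deploy` layout shown for `app/core`, which is not confirmed for the other apps. `deploy_dirs` is a hypothetical helper.

```shell
# Start order taken from the numbered list above.
ORDERED_APPS="core frontend telemetry garage directory"

# Prints the deploy directory of each app for cluster $1, in start order.
# Assumes the cluster/<cluster>/app/<name>/deploy layout for every app.
deploy_dirs() {
  for app in $ORDERED_APPS; do
    printf 'cluster/%s/app/%s/deploy\n' "$1" "$app"
  done
}

deploy_dirs staging
```

Each printed directory is then visited in turn, running `nomad plan` and `nomad job run` on its job files as described in the previous section.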