author     Alex Auvolat <alex@adnab.me>    2022-04-20 16:13:14 +0200
committer  Alex Auvolat <alex@adnab.me>    2022-04-20 16:13:14 +0200
commit     04f2bd48bb3d9a33e36409b8eddbad05e21807c1 (patch)
tree       435b58f4d69342b2b8ad86d522447551ac1f1206
parent     6c22f5fdfad8752006c2245b503313973766c31c (diff)
download   nixcfg-04f2bd48bb3d9a33e36409b8eddbad05e21807c1.tar.gz
           nixcfg-04f2bd48bb3d9a33e36409b8eddbad05e21807c1.zip
Add some readme
-rw-r--r--  README.md                   136
-rw-r--r--  cluster/prod/ssh_config       6
-rw-r--r--  cluster/staging/ssh_config    6
-rw-r--r--  ssh_known_hosts               3
4 files changed, 145 insertions(+), 6 deletions(-)
diff --git a/README.md b/README.md
index d993362..854ee41 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,142 @@ The following scripts are available here:
- `tlsproxy.sh`, a script that allows non-TLS access to the TLS-secured Consul and Nomad, by running a simple local proxy with socat
- `tlsenv.sh`, a script to be sourced (`source tlsenv.sh`) that configures the correct environment variables to use the Nomad and Consul CLI tools with TLS
+## Configuring the OS
+
+This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
+Most scripts are invoked with the following syntax:
+
+- for scripts that generate secrets: `./gen_<something> <cluster_name>` to generate the secrets to be used on cluster `<cluster_name>`
+- for deployment scripts:
+ - `./deploy_<something> <cluster_name>` to run the deployment script on all nodes of the cluster `<cluster_name>`
+ - `./deploy_<something> <cluster_name> <node1> <node2> ...` to run the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>` (see the example below).
+
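+For example, using the `staging` cluster and node names taken from
+`cluster/staging/ssh_config` in this repo, deploying to just two nodes might
+look like this (an illustrative invocation of the `deploy_nixos` script
+described below):
+
+```
+./deploy_nixos staging caribou carcajou
+```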
+
+### Assumptions (how to setup your environment)
+
+- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
+
+- your account is in the `wheel` group and you know its password (you need it to become root using `sudo`)
+
+- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
+ (scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
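+
+To check that this last assumption holds, you can list the corresponding
+subtree of your password store (a minimal sanity check; the entries under it
+depend on your cluster):
+
+```
+pass ls deuxfleurs/cluster/<cluster_name>
+```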
+
+### Deploying the NixOS configuration
+
+The NixOS configuration is built from several kinds of files:
+
+- files in `nix/` that are the same for all deployments on all clusters
+- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster and is copied identically to all of its nodes
+- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
+- files in `cluster/<cluster_name>/node/`, which are specific to each node
+
+To deploy the NixOS configuration on the cluster, simply do:
+
+```
+./deploy_nixos <cluster_name>
+```
+
+or to deploy only on a single node:
+
+```
+./deploy_nixos <cluster_name> <node_name>
+```
+
+To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
+
+**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
+
+### Deploying Wesher
+
+We use Wesher to provide an encrypted overlay network between the nodes of the cluster.
+This is useful in particular for securing services that cannot do mTLS themselves,
+but as a defense-in-depth measure, we route all traffic through Wesher even when
+TLS is configured correctly. A working Wesher installation in the cluster is
+therefore mandatory for it to run correctly.
+
+First, if no Wesher shared secret key has been generated for this cluster yet,
+generate it with:
+
+```
+./gen_wesher_key <cluster_name>
+```
+
+This key will be stored in `pass`, so you must have a working `pass` installation
+for this script to run correctly.
+
+Then, deploy the key on all nodes with:
+
+```
+./deploy_wesher_key <cluster_name>
+```
+
+This should be done after `./deploy_nixos` has run successfully on all nodes.
+You should now have a working Wesher network between all your nodes!
+
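+To sanity-check the overlay, you can SSH to a node and look at its WireGuard
+interfaces, since Wesher builds its mesh on top of WireGuard (a minimal check,
+using the `staging` cluster and the `caribou` node defined in this repo):
+
+```
+ssh -t -F cluster/staging/ssh_config caribou 'sudo wg show'
+```
+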
+**When adding a node to the cluster:** just do `./deploy_wesher_key <cluster_name> <name_of_new_node>`
+
+### Generating and deploying a PKI for Consul and Nomad
+
+The process is very similar to the one used for Wesher.
+
+First, if the PKI has not yet been created, create it with:
+
+```
+./gen_pki <cluster_name>
+```
+
+Then, deploy the PKI on all nodes with:
+
+```
+./deploy_pki <cluster_name>
+```
+
+**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
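+
+Once the PKI is deployed and the port forwards described below (in the Nomad
+section) are in place, you can inspect the certificate Consul presents on its
+TLS port as an optional sanity check (an illustrative use of `openssl`):
+
+```
+openssl s_client -connect localhost:8501 </dev/null 2>/dev/null \
+  | openssl x509 -noout -subject -issuer -dates
+```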
+
+### Adding administrators
+
+Administrators are defined in the `cluster.nix` file of each cluster (they could also be defined in the site-specific Nix files if necessary).
+This is where their public SSH keys for remote access are declared.
+
+Administrators also need a password to administer the cluster, since we do not use passwordless sudo.
+To set the password of a new administrator, they must have a working `pass` installation as described above.
+They must then run:
+
+```
+./passwd <cluster_name> <user_name>
+```
+
+to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
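+
+For example, a hypothetical administrator `jdoe` setting their password for the
+`prod` cluster would run (the user name is illustrative; `prod` is one of the
+clusters defined in this repo):
+
+```
+./passwd prod jdoe
+```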
+
+Then, an administrator who already has root access must run the following (after syncing the `pass` db) to set the password on all cluster nodes:
+
+```
+./deploy_passwords <cluster_name>
+```
+
+## Deploying stuff on Nomad
+
+### Connecting to Nomad
+
+Connect to one of the cluster nodes using SSH, forwarding local port 14646 to port 4646 on the node, and local port 8501 to port 8501 on the node.
+
+You can for instance use an entry in your `~/.ssh/config` that looks like this:
+
+```
+Host caribou
+ HostName 2a01:e0a:c:a720::23
+ LocalForward 14646 127.0.0.1:4646
+ LocalForward 8501 127.0.0.1:8501
+```
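+
+With such an entry in place, keeping an SSH session to `caribou` open is enough
+to maintain the forwards, for instance (illustrative; `-N` just means "do not
+run a remote command"):
+
+```
+ssh -N caribou
+```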
+
+Then, in a separate window, launch `./tlsproxy.sh <cluster_name>`: this will
+launch `socat` proxies that strip the TLS layer and let you access Nomad and
+Consul on the regular, unencrypted URLs: `http://localhost:4646` for Nomad and
+`http://localhost:8500` for Consul. Keep this terminal window open for as long
+as you need to access Nomad and Consul on the cluster.
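+
+A quick way to check that the proxies work is to query the leader status
+endpoint of each API (illustrative `curl` calls against the standard Nomad and
+Consul HTTP APIs):
+
+```
+curl http://localhost:4646/v1/status/leader   # Nomad
+curl http://localhost:8500/v1/status/leader   # Consul
+```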
+
+### Launching services
+
Stuff should be started in this order:
- `app/core`
diff --git a/cluster/prod/ssh_config b/cluster/prod/ssh_config
index 266d77f..cb4841f 100644
--- a/cluster/prod/ssh_config
+++ b/cluster/prod/ssh_config
@@ -1,10 +1,10 @@
UserKnownHostsFile ./ssh_known_hosts
Host concombre
- HostName 10.42.1.31
+ HostName 2a01:e0a:c:a720::31
Host courgette
- HostName 10.42.1.32
+ HostName 2a01:e0a:c:a720::32
Host celeri
- HostName 10.42.1.33
+ HostName 2a01:e0a:c:a720::33
diff --git a/cluster/staging/ssh_config b/cluster/staging/ssh_config
index 8fae8ab..9bc4e6e 100644
--- a/cluster/staging/ssh_config
+++ b/cluster/staging/ssh_config
@@ -1,13 +1,13 @@
UserKnownHostsFile ./ssh_known_hosts
Host caribou
- HostName 10.42.2.23
+ HostName 2a01:e0a:c:a720::23
Host carcajou
- HostName 10.42.2.22
+ HostName 2a01:e0a:c:a720::22
Host cariacou
- HostName 10.42.2.21
+ HostName 2a01:e0a:c:a720::21
Host spoutnik
HostName 10.42.0.2
diff --git a/ssh_known_hosts b/ssh_known_hosts
index 7e224a3..e3181cf 100644
--- a/ssh_known_hosts
+++ b/ssh_known_hosts
@@ -6,3 +6,6 @@
10.42.2.21 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPXTUrXRFhudJBESCqjHCOttzqYPyIzpPOMkI8+SwLRx
10.42.2.22 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMf/ioVSSb19Slu+HZLgKt4f1/XsL+K9uMxazSWb/+nQ
10.42.2.23 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDsYD1gNmGyb6c9wjGR6tC69fHP6+FpPHTBT6laPTHeD
+2a01:e0a:c:a720::22 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMf/ioVSSb19Slu+HZLgKt4f1/XsL+K9uMxazSWb/+nQ
+2a01:e0a:c:a720::21 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPXTUrXRFhudJBESCqjHCOttzqYPyIzpPOMkI8+SwLRx
+2a01:e0a:c:a720::23 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDsYD1gNmGyb6c9wjGR6tC69fHP6+FpPHTBT6laPTHeD