From 3e5e2d60cdac107cc996e0efe936ced8fd25c61d Mon Sep 17 00:00:00 2001
From: Alex Auvolat
Date: Thu, 22 Dec 2022 23:33:10 +0100
Subject: reorganize documentation

---
 doc/architecture.md       | 129 ++++++++++++++++++++++++++++++++
 doc/nixos-install-luks.md | 182 ++++++++++++++++++++++++++++++++++++++++++++++
 doc/nixos-install.md      | 178 ---------------------------------------------
 doc/quick-start.md        |  73 +++++++++++++++++++
 4 files changed, 384 insertions(+), 178 deletions(-)
 create mode 100644 doc/architecture.md
 create mode 100644 doc/nixos-install-luks.md
 delete mode 100644 doc/nixos-install.md
 create mode 100644 doc/quick-start.md

(limited to 'doc')

diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 0000000..8a9579f
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,129 @@
+# Additional README
+
+## Configuring the OS
+
+This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
+Most scripts are invoked with the following syntax:
+
+- for scripts that generate secrets: `./gen_ ` to generate the secrets to be used on cluster ``
+- for deployment scripts:
+  - `./deploy_ ` to run the deployment script on all nodes of the cluster ``
+  - `./deploy_ ...` to run the deployment script only on nodes `node1, node2, ...` of cluster ``.
+
+All deployment scripts can use the following parameters passed as environment variables:
+
+- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
+- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
+
+### Assumptions (how to set up your environment)
+
+- you have SSH access to all of your cluster nodes (listed in `cluster//ssh_config`)
+
+- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
+  the password is the same on all cluster nodes (see below for password management tools)
+
+- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
+  (scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster//`)
+
+### Deploying the NixOS configuration
+
+The NixOS configuration uses several kinds of files:
+
+- files in `nix/` that are the same for all deployments on all clusters
+- the file `cluster//cluster.nix`, a Nix configuration file that is specific to the cluster but is copied unchanged to all cluster nodes
+- files in `cluster//site/`, which are specific to the various sites on which Nix nodes are deployed
+- files in `cluster//node/`, which are specific to each node
+
+To deploy the NixOS configuration on the cluster, simply do:
+
+```
+./deploy_nixos
+```
+
+or to deploy only on a single node:
+
+```
+./deploy_nixos
+```
+
+To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
+
+**When adding a node to the cluster:** just do `./deploy_nixos `
+
+### Generating and deploying a PKI for Consul and Nomad
+
+This is very similar to what we do for Wesher.
+
+First, if the PKI has not yet been created, create it with:
+
+```
+./gen_pki
+```
+
+Then, deploy the PKI on all nodes with:
+
+```
+./deploy_pki
+```
+
+**When adding a node to the cluster:** just do `./deploy_pki `
+
+### Adding administrators and password management
+
+Administrators are defined in the `cluster.nix` file for each cluster (they could also be defined in the site-specific Nix files if necessary).
+This is where their public SSH keys for remote access are put.
+
+Administrators will also need passwords to administer the cluster, as we are not using passwordless sudo.
+To set the password for a new administrator, they must have a working `pass` installation as specified above.
+They must then run:
+
+```
+./passwd
+```
+
+to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
+
+Then, an administrator that already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
+
+```
+./deploy_passwords
+```
+
+## Deploying stuff on Nomad
+
+### Connecting to Nomad
+
+Connect using SSH to one of the cluster nodes, forwarding port 14646 to port 4646 on localhost, and port 8501 to port 8501 on localhost.
+
+You can for instance use an entry in your `~/.ssh/config` that looks like this:
+
+```
+Host caribou
+    HostName 2a01:e0a:c:a720::23
+    LocalForward 14646 127.0.0.1:4646
+    LocalForward 8501 127.0.0.1:8501
+    LocalForward 1389 bottin.service.staging.consul:389
+```
+
+Then, in a separate window, launch `./tlsproxy `: this will
+launch `socat` proxies that strip the TLS layer and allow you to simply access
+Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
+Nomad and `http://localhost:8500` for Consul. Keep this terminal window open for as
+long as you need to access Nomad and Consul on the cluster.
+
+### Launching services
+
+Stuff should be started in this order:
+
+1. `app/core`
+2. `app/frontend`
+3. `app/telemetry`
+4. `app/garage-staging`
+5. `app/directory`
+
+Then, other stuff can be started in any order:
+
+- `app/im` (cluster `staging` only)
+- `app/cryptpad` (cluster `prod` only)
+- `app/drone-ci`
+
diff --git a/doc/nixos-install-luks.md b/doc/nixos-install-luks.md
new file mode 100644
index 0000000..3f0feca
--- /dev/null
+++ b/doc/nixos-install-luks.md
@@ -0,0 +1,182 @@
+## Preparation
+
+Download NixOS 21.11 ISO. Burn to USB.
+
+## Booting into install environment
+
+Boot the ISO on the PC to install.
+
+Become root with `sudo su`
+
+```bash
+loadkeys fr
+setfont sun12x22
+```
+
+Do network config if necessary, see [install guide](https://nixos.org/manual/nixos/stable/index.html#sec-installation-booting-networking)
+
+## Make partitions
+
+```bash
+cgdisk /dev/sda
+```
+
+Recommended layout:
+
+```
+/dev/sda1    512M    ef00    EFI System partition
+/dev/sda2    100%    8309    Linux LUKS
+```
+
+## Setup cryptography
+
+```bash
+cryptsetup luksFormat /dev/sda2
+cryptsetup open /dev/sda2 cryptlvm
+```
+
+## Create PV, VG and LVs
+
+```bash
+pvcreate /dev/mapper/cryptlvm
+vgcreate NixosVG /dev/mapper/cryptlvm
+lvcreate -L 8G NixosVG -n swap
+lvcreate -l 100%FREE NixosVG -n root
+```
+
+## Format partitions
+
+```bash
+mkfs.fat -F 32 -n boot /dev/sda1
+mkswap /dev/NixosVG/swap
+mkfs.ext4 /dev/NixosVG/root
+```
+
+## Mount partitions
+
+```bash
+swapon /dev/NixosVG/swap
+mount /dev/NixosVG/root /mnt
+mkdir /mnt/boot
+mount /dev/sda1 /mnt/boot
+```
+
+## Generate base NixOS configuration
+
+```bash
+nixos-generate-config --root /mnt
+```
+
+## Update `hardware-configuration.nix`
+
+This section is needed:
+
+```nix
+  boot.initrd.luks.devices."cryptlvm" = {
+    device = "/dev/disk/by-uuid/";
+    allowDiscards = true;
+  };
+```
+
+And for the root filesystem, remember to add the `relatime` and `discard` options so that it looks like this:
+
+```nix
+  fileSystems."/" =
+    { device = "/dev/disk/by-uuid/<...>";
+      fsType = "ext4";
+      options = [ "relatime" "discard" ];
+    };
+```
+
+## Update `configuration.nix`
+
+Just enough so that basic tasks can be done from keyboard and remotely:
+
+- timezone
+- keyboard layout
+- font `sun12x22`
+- vim
+- non-root user
+- ssh
+- tcp port 22 in firewall
+
+## Do the installation
+
+```bash
+nixos-install
+```
+
+## First boot
+
+Reboot machine. Log in as `root`
+
+```bash
+passwd
+```
+
+If necessary, assign a static IP, e.g. `ip addr add 192.168.1.40/24 dev eno1` or similar (replace IP and device appropriately)
+
+Remotely: `ssh-copy-id @`. Check SSH access is good.
+
+## Deploy from this repo
+
+See [this documentation](quick-start.md).
+
+## Old guide
+
+It's time!
+
+**Files in this repo to create/change:**
+
+- create node `.nix` file and symlink for node `.site.nix` (create site and
+  cluster `.nix` files if necessary; use existing files of e.g. the staging
+  cluster as examples/templates)
+- make sure values are filled in correctly
+- add node to `ssh_config` with its LAN IP, as we don't have VPN at this stage
+
+**Configuration steps on the node:**
+
+```bash
+# On node being installed
+mkdir -p /var/lib/deuxfleurs/remote-unlock
+cd /var/lib/deuxfleurs/remote-unlock
+ssh-keygen -t ed25519 -N "" -f ./ssh_host_ed25519_key
+```
+
+**Try to deploy:**
+
+```bash
+# In nixcfg repository from your PC
+./deploy.sh
+```
+
+Reboot.
+
+Check remote unlocking works: `ssh -p 222 root@`
+
+## Configure wireguard
+
+```bash
+# On node being installed
+mkdir -p /var/lib/deuxfleurs/wireguard-keys
+cd /var/lib/deuxfleurs/wireguard-keys
+wg genkey | tee private | wg pubkey > public
+```
+
+Get the public key, make sure it is in `cluster.nix` so that nodes know one
+another. Also put it anywhere else it is needed, for instance in your local
+wireguard config, so that you can access the node from your PC by its wireguard address
+and not only its LAN address.
+
+Redo a deploy (`./deploy.sh `)
+
+Check VPN works. Change IP in `ssh_config` to use VPN IP instead of LAN IP (required for deploy when away from home).
+
+## Commit changes to `nixcfg` repo
+
+This is a good point to commit your new/modified `.nix` files.
+
+## Configure Nomad and Consul TLS
+
+If you are bootstrapping a new cluster, you need to run `./genpki.sh ` to
+make a TLS PKI for the Nomad+Consul cluster to work. Then redo a deploy.
diff --git a/doc/nixos-install.md b/doc/nixos-install.md
deleted file mode 100644
index 7b3d137..0000000
--- a/doc/nixos-install.md
+++ /dev/null
@@ -1,178 +0,0 @@
-## Preparation
-
-Download NixOS 21.11 ISO. Burn to USB.
-
-## Booting into install environment
-
-Boot the ISO on PC to install.
-
-Become root with `sudo su`
-
-```bash
-loadkeys fr
-setfont sun12x22
-```
-
-Do network config if necessary, see [install guide](https://nixos.org/manual/nixos/stable/index.html#sec-installation-booting-networking)
-
-## Make partitions
-
-```bash
-cgdisk /dev/sda
-```
-
-Recommended layout:
-
-```
-/dev/sda1    512M    ef00    EFI System partition
-/dev/sda2    100%    8309    Linux LUKS
-```
-
-## Setup cryptography
-
-```bash
-cryptsetup luksFormat /dev/sda2
-cryptsetup open /dev/sda2 cryptlvm
-```
-
-## Create PV, VG and LVs
-
-```bash
-pvcreate /dev/mapper/cryptlvm
-vgcreate NixosVG /dev/mapper/cryptlvm
-lvcreate -L 8G NixosVG -n swap
-lvcreate -l 100%FREE NixosVG -n root
-```
-
-## Format partitions
-
-```bash
-mkfs.fat -F 32 -n boot /dev/sda1
-mkswap /dev/NixosVG/swap
-mkfs.ext4 /dev/NixosVG/root
-```
-
-## Mount partitions
-
-```bash
-swapon /dev/NixosVG/swap
-mount /dev/NixosVG/root /mnt
-mkdir /mnt/boot
-mount /dev/sda1 /mnt/boot
-```
-
-## Generate base NixOS configuration
-
-```bash
-nixos-generate-config --root /mnt
-```
-
-## Update `hardware-configuration.nix`
-
-This section is needed:
-
-```nix
-  boot.initrd.luks.devices."cryptlvm" = {
-    device = "/dev/disk/by-uuid/";
-    allowDiscards = true;
-  };
-```
-
-And for the root filesystem, remember to add the `relatime` and `discard` options so that it looks like this:
-
-```nix
-  fileSystems."/" =
-    { device = "/dev/disk/by-uuid/<...>";
-      fsType = "ext4";
-      options = [ "relatime" "discard" ];
-    };
-```
-
-## Update `configuration.nix`
-
-Just enough so that basic tasks can be done from keyboard and remotely:
-
-- timezone
-- keyboard layout
-- font `sun12x22`
-- vim
-- non-root user
-- ssh
-- tcp port 22 in firewall
-
-## Do the installation
-
-```bash
-nixos-install
-```
-
-## First boot
-
-Reboot machine. Login as `root`
-
-```bash
-passwd
-```
-
-If necessary, assign static IP. E.g. `ip addr add 192.168.1.40/24 dev eno1` or sth (replace ip and device appropriately)
-
-Remotely: `ssh-copy-id @`. Check SSH access is good.
-
-## Deploy from this repo
-
-It's time!
-
-**Files in this repo to create/change:**
-
-- create node `.nix` file and symlink for node `.site.nix` (create site and
-  cluster `.nix` files if necessary; use existing files of e.g. the staging
-  cluster as examples/templates)
-- make sure values are filled in correctly
-- add node to `ssh_config` with it's LAN IP, we don't have VPN at this stage
-
-**Configuration steps on the node:**
-
-```bash
-# On node being installed
-mkdir -p /var/lib/deuxfleurs/remote-unlock
-cd /var/lib/deuxfleurs/remote-unlock
-ssh-keygen -t ed25519 -N "" -f ./ssh_host_ed25519_key
-```
-
-**Try to deploy:**
-
-```bash
-# In nixcfg repository from your PC
-./deploy.sh
-```
-
-Reboot.
-
-Check remote unlocking works: `ssh -p 222 root@`
-
-## Configure wireguard
-
-```bash
-# On node being installed
-mkdir -p /var/lib/deuxfleurs/wireguard-keys
-cd /var/lib/deuxfleurs/wireguard-keys
-wg genkey | tee private | wg pubkey > public
-```
-
-Get the public key, make sure it is in `cluster.nix` so that nodes know one
-another. Also put it anywhere else like in your local wireguard config for
-instance so that you can access the node from your PC by its wireguard address
-and not only its LAN address.
-
-Redo a deploy (`./deploy.sh `)
-
-Check VPN works. Change IP in `ssh_config` to use VPN IP instead of LAN IP (required for deploy when away from home).
-
-## Commit changes to `nixcfg` repo
-
-This is a good point to commit your new/modified `.nix` files.
-
-## Configure Nomad and Consul TLS
-
-If you are bootstraping a new cluster, you need to `./genpki.sh ` to
-make a TLS PKI for the Nomad+Consul cluster to work. Then redo a deploy.
diff --git a/doc/quick-start.md b/doc/quick-start.md
new file mode 100644
index 0000000..1307fde
--- /dev/null
+++ b/doc/quick-start.md
@@ -0,0 +1,73 @@
+# Quick start
+
+## How to welcome a new administrator
+
+See: https://guide.deuxfleurs.fr/operations/acces/pass/
+
+Basically:
+  - The new administrator generates a GPG key and publishes it on Gitea
+  - All existing administrators pull this key and sign it
+  - An existing administrator re-encrypts the keystore with this new key and pushes it
+  - The new administrator clones the repo and checks that they can decrypt the secrets
+  - Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick` where `rick` is the target username
+
+
+## How to create files for a new zone
+
+*The documentation is written for the production cluster; the same applies to other clusters.*
+
+Basically:
+  - Create your `site` file in the `cluster/prod/site/` folder
+  - Create your `node` files in the `cluster/prod/node/` folder
+  - Add your wireguard configuration to `cluster/prod/cluster.nix`
+    - You will have to edit your NAT config manually to bind one public IPv4 port to each node
+    - Nodes' public wireguard keys are generated during the first run of `deploy_nixos`, see below
+  - Add your nodes to `cluster/prod/ssh_config`, which is used by the various SSH scripts.
+    - If you use `ssh` directly, use `ssh -F ./cluster/prod/ssh_config`
+    - Add `User root` for the first time as your user will not be declared yet on the system
+
+## How to deploy a Nix configuration on a fresh node
+
+We suppose that the node name is `datura`.
+Start by doing the deployment one node at a time; you will have plenty of time
+in your operator's life to break everything through automation.
+
+Run:
+  - `./deploy_nixos prod datura` - to deploy the nix configuration file;
+    - a new wireguard key is printed if none had been generated before; it has to be
+      added to `cluster.nix`, and then redeployed on all nodes as the new wireguard conf is needed everywhere
+  - `./deploy_passwords prod datura` - to deploy users' passwords
+    - if a user changes their password (using `./passwd`), it needs to be redeployed on all nodes to set up the new password everywhere
+  - `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI
+
+## How to operate a node
+
+Edit your `~/.ssh/config` file:
+
+```
+Host dahlia
+    HostName dahlia.machine.deuxfleurs.fr
+    LocalForward 14646 127.0.0.1:4646
+    LocalForward 8501 127.0.0.1:8501
+    LocalForward 1389 bottin.service.prod.consul:389
+    LocalForward 5432 psql-proxy.service.prod.consul:5432
+```
+
+Then run the TLS proxy and leave it running:
+
+```
+./tlsproxy prod
+```
+
+SSH to a production machine (e.g. dahlia) and leave it running:
+
+```
+ssh dahlia
+```
+
+
+Finally, you should be able to access the production Nomad and Consul by browsing:
+
+  - Consul: http://localhost:8500
+  - Nomad: http://localhost:4646
+
--
cgit v1.2.3