Diffstat (limited to 'doc/architecture.md')
-rw-r--r-- | doc/architecture.md | 129 |
1 files changed, 129 insertions, 0 deletions
diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 0000000..8a9579f
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,129 @@
+# Additional README
+
+## Configuring the OS
+
+This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
+Most scripts are invoked with the following syntax:
+
+- for scripts that generate secrets: `./gen_<something> <cluster_name>` to generate the secrets to be used on cluster `<cluster_name>`
+- for deployment scripts:
+  - `./deploy_<something> <cluster_name>` to run the deployment script on all nodes of the cluster `<cluster_name>`
+  - `./deploy_<something> <cluster_name> <node1> <node2> ...` to run the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>`.
+
+All deployment scripts can use the following parameters passed as environment variables:
+
+- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
+- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
+
+### Assumptions (how to set up your environment)
+
+- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
+
+- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
+  the password is the same on all cluster nodes (see below for password management tools)
+
+- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
+  (scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
+
+### Deploying the NixOS configuration
+
+The NixOS configuration makes use of a certain number of files:
+
+- files in `nix/` that are the same for all deployments on all clusters
+- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster but is copied identically to all cluster nodes
+- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
+- files in `cluster/<cluster_name>/node/`, which are specific to each node
+
+To deploy the NixOS configuration on the cluster, simply do:
+
+```
+./deploy_nixos <cluster_name>
+```
+
+or to deploy only on a single node:
+
+```
+./deploy_nixos <cluster_name> <node_name>
+```
+
+To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
+
+**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
+
+### Generating and deploying a PKI for Consul and Nomad
+
+This is very similar to what we do for Wesher.
+
+First, if the PKI has not yet been created, create it with:
+
+```
+./gen_pki <cluster_name>
+```
+
+Then, deploy the PKI on all nodes with:
+
+```
+./deploy_pki <cluster_name>
+```
+
+**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
+
+### Adding administrators and password management
+
+Administrators are defined in the `cluster.nix` file for each cluster (they could also be defined in the site-specific Nix files if necessary).
+This is where their public SSH keys for remote access are put.
+
+Administrators will also need passwords to administer the cluster, as we are not using passwordless sudo.
+To set the password for a new administrator, they must have a working `pass` installation as specified above.
+They must then run:
+
+```
+./passwd <cluster_name> <user_name>
+```
+
+to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
+
+Then, an administrator who already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
+
+```
+./deploy_passwords <cluster_name>
+```
+
+## Deploying stuff on Nomad
+
+### Connecting to Nomad
+
+Connect using SSH to one of the cluster nodes, forwarding port 14646 to port 4646 on localhost, and port 8501 to port 8501 on localhost.
+
+You can for instance use an entry in your `~/.ssh/config` that looks like this:
+
+```
+Host caribou
+    HostName 2a01:e0a:c:a720::23
+    LocalForward 14646 127.0.0.1:4646
+    LocalForward 8501 127.0.0.1:8501
+    LocalForward 1389 bottin.service.staging.consul:389
+```
+
+Then, in a separate window, launch `./tlsproxy <cluster_name>`: this will
+launch `socat` proxies that strip the TLS layer and allow you to simply access
+Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
+Nomad and `http://localhost:8500` for Consul. Keep this terminal window open for as
+long as you need to access Nomad and Consul on the cluster.
+
+### Launching services
+
+Stuff should be started in this order:
+
+1. `app/core`
+2. `app/frontend`
+3. `app/telemetry`
+4. `app/garage-staging`
+5. `app/directory`
+
+Then, other stuff can be started in any order:
+
+- `app/im` (cluster `staging` only)
+- `app/cryptpad` (cluster `prod` only)
+- `app/drone-ci`
+
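
The start order listed under "Launching services" could be kept in a small dry-run script so that it is not retyped by hand. This is only a sketch: the `ORDERED` variable and the `start_order` function are hypothetical names, and the script only prints the order, because the document does not say which command or job files actually start each service.

```shell
#!/bin/sh
# Dry-run sketch: print the documented start order, one service per line.
# ORDERED and start_order are hypothetical names, not part of the repository.
set -eu

# The five services that must be started first, in the documented order.
ORDERED="app/core app/frontend app/telemetry app/garage-staging app/directory"

start_order() {
    for svc in $ORDERED; do
        # Replace this echo with the real launch command once the job
        # file layout is known (an assumption left open here).
        echo "would start: $svc"
    done
}

start_order
```

Services outside this list (`app/im`, `app/cryptpad`, `app/drone-ci`) are left out on purpose, since they can be started in any order afterwards.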