# ANSIBLE

## How to proceed

For each machine, **one by one** do:
  - Check that the cluster is healthy
    - Check garage
      - check that all nodes are online: `docker exec -ti <garage-container> /garage status`
      - check that tables are in sync: `docker exec -ti <garage-container> /garage repair --yes tables`
      - check garage logs
        - there should be no unknown errors and no resync in progress
        - the following line must appear: `INFO  garage_util::background > Worker exited: Repair worker`
    - Check that Nomad is healthy
      - `nomad server members`
      - `nomad node status`
    - Check that Consul is healthy
      - `consul members`
    - Check that Postgres is healthy
  - Run `ansible-playbook -i production.yml --limit <machine> -u <username> site.yml`
  - Run `nomad node drain -enable -force -self`
  - Reboot
  - Run `nomad node drain -self -disable`
  - Check that the cluster is healthy again (repeat the whole first step)
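
The health checks above can be grouped into a small helper, sketched below. It is only an illustration, not part of this repository: the names `run`, `check_cluster`, `repair_finished` and the `DRY_RUN` variable are hypothetical. It assumes the commands are run on a cluster node that hosts the Garage container, and it drops `-ti` from `docker exec` because a script has no interactive terminal. The Postgres check stays manual, since no command is given for it here.

```shell
#!/bin/sh
# Sketch of the cluster health check. All helper names are hypothetical.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1: name or ID of the Garage container on this machine.
# Stops at the first command that fails.
check_cluster() {
    run docker exec "$1" /garage status &&
    run docker exec "$1" /garage repair --yes tables &&
    run nomad server members &&
    run nomad node status &&
    run consul members
}

# Reads Garage logs on stdin; succeeds only if the repair worker has exited.
repair_finished() {
    grep -qF 'Worker exited: Repair worker'
}
```

For example, `DRY_RUN=0 check_cluster <garage-container>` runs the checks for real, and `docker logs <garage-container> 2>&1 | repair_finished` tests whether the expected log line has appeared. The command output still has to be read by a human: the helper only reports whether each command exited successfully.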