HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

    prodaccess # get a short-lived certificate for your use via SSO
    kubectl version
    kubectl top nodes

Every user gets a `personal-$username` namespace. Feel free to use it for your own purposes, but watch out for resource usage!
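As a rough illustration of working inside your personal namespace, the sketch below wraps `kubectl` so that it always targets `personal-$username`. The helper names `personal_ns` and `kpersonal` are hypothetical (they are not part of the hscloud tooling), and the sketch assumes your cluster username matches your local `$USER` unless you pass one explicitly.

```shell
# Hypothetical helpers, not part of the hscloud tooling.
# Assumption: the cluster username equals the local $USER unless overridden.

# Print the personal namespace name for the given (or the current) user.
personal_ns() {
    printf 'personal-%s\n' "${1:-$USER}"
}

# Run kubectl against your own personal namespace.
kpersonal() {
    kubectl -n "$(personal_ns)" "$@"
}
```

After running `prodaccess`, something like `kpersonal get pods` would then list the pods in your own namespace.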

Persistent Storage
------------------

HDDs on bc01n0{1-3}. 3TB total capacity.

The following storage classes use this cluster:

 - `waw-hdd-paranoid-1` - 3 replicas
 - `waw-hdd-redundant-1` - erasure coded 2.1
 - `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
 - `waw-hdd-redundant-1-object` - erasure coded 2.1 object store
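To make the storage classes more concrete, here is a hedged sketch of a helper that prints a PersistentVolumeClaim manifest requesting one of them. The function name `pvc_manifest`, the claim name, the `ReadWriteOnce` access mode and the size are illustrative assumptions; only the storage class names come from the list above.

```shell
# Hypothetical helper: print a PersistentVolumeClaim manifest on stdout.
# Arguments: claim name, storage class name, requested size (e.g. 1Gi).
# The access mode is an assumption; adjust it to your workload.
pvc_manifest() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  storageClassName: $2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $3
EOF
}
```

The output can be reviewed first and then piped to `kubectl`, for example `pvc_manifest mydata waw-hdd-redundant-1 1Gi | kubectl -n personal-yourname apply -f -`.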

Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To create a user, ask an admin.

PersistentVolumes currently bound to PVCs get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).

Administration
==============

Provisioning nodes
------------------

 - bring up a new node with nixos, running the configuration.nix from bootstrap (to be documented)
 - `bazel run //cluster/clustercfg nodestrap bc01nXX.hswaw.net`

Ceph - Debugging
-----------------

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
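One possible way to pull the operator logs is sketched below. The function name `rook_operator_logs` is hypothetical, and the label selector `app=rook-ceph-operator` is Rook's usual default rather than something confirmed for this cluster; check `kubectl -n ceph-rook-system get pods --show-labels` if it matches nothing.

```shell
# Sketch only. Assumption: the operator pods carry Rook's default label
# app=rook-ceph-operator; adjust the selector if this deployment differs.
# Optional argument: number of trailing log lines to show (default 100).
rook_operator_logs() {
    kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail="${1:-100}"
}
```

Calling `rook_operator_logs 500` would show the last 500 lines from the matching pods.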

A dashboard is available at https://ceph-waw2.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
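A slightly more targeted variant lets `kubectl` select the field with a JSONPath expression instead of `grep` and `awk`, which should yield the same value. The function name `dashboard_password` is hypothetical; the namespace, secret name and `password` key are taken from the command above.

```shell
# Hypothetical wrapper: print the decoded Ceph dashboard admin password.
# Secret name and key come from the command above; the secret data is
# base64-encoded, so it is decoded before printing.
dashboard_password() {
    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password \
        -o jsonpath='{.data.password}' | base64 --decode
    echo
}
```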


Ceph - Backups
--------------

Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly cronjob runs in every Ceph cluster. You can also manually trigger a run with:

    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)
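If you want to follow a manual run afterwards, it helps to keep the generated job name around. The wrapper below is a minimal sketch of that idea; the function name `trigger_benji_backup` is hypothetical, while the namespace, cronjob and job name pattern are the ones from the command above.

```shell
# Hypothetical wrapper around the command above: create a manual Benji job
# and print its name on stdout (kubectl's own output goes to stderr).
# Note: sets the shell variable `job` as a side effect.
trigger_benji_backup() {
    job="ceph-waw2-benji-manual-$(date +%s)"
    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji "$job" >&2 &&
        printf '%s\n' "$job"
}
```

A run could then be followed with `job=$(trigger_benji_backup)` and `kubectl -n ceph-waw2 logs -f "job/$job"`.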

Ceph ObjectStorage pools (RADOSGW) are _not_ backed up yet!

Ceph - Object Storage
---------------------

To create an object store user, consult the rook.io manual (https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
The user authentication secret is generated in the Ceph cluster namespace (`ceph-waw2`),
and thus may need to be manually copied into the application namespace (see the comment in
`app/registry/prod.jsonnet`).

`tools/rook-s3cmd-config` can be used to generate a test configuration file for s3cmd.
Remember to append `:default-placement` to your region name (e.g. `waw-hdd-redundant-1-object:default-placement`).
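The region suffix is easy to forget, so a tiny helper can build the string for you. This is only a sketch: the function name `rgw_region` is hypothetical, and where the resulting value goes depends on how your s3cmd configuration is set up.

```shell
# Hypothetical helper: append the :default-placement suffix to a region name.
rgw_region() {
    printf '%s:default-placement\n' "$1"
}
```

For example, `rgw_region waw-hdd-redundant-1-object` prints `waw-hdd-redundant-1-object:default-placement`.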