HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

There isn't yet a service for getting short-term user certificates. Instead, you'll have to get admin certificates:

    clustercfg admincreds $(whoami)-admin
    kubectl get nodes

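Since these are long-lived admin credentials, it can be worth checking when the certificate in your kubeconfig expires. A minimal sketch, assuming the certificate is embedded as `client-certificate-data` (adjust the `users[0]` index to the entry clustercfg created for you):

    kubectl config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' \
        | base64 --decode | openssl x509 -noout -subject -enddate
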
Provisioning nodes
------------------

 - bring up a new node with NixOS, running the `configuration.nix` from bootstrap (to be documented)
 - `clustercfg nodestrap bc01nXX.hswaw.net`

That's it!

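To double-check that the freshly bootstrapped node registered with the cluster and is Ready, something like the following should do (`bc01nXX.hswaw.net` being the same placeholder hostname as above):

    kubectl get node bc01nXX.hswaw.net -o wide
    kubectl describe node bc01nXX.hswaw.net | grep -A5 Conditions:
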
Ceph
====

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.

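For example, assuming the operator pod carries Rook's standard `app=rook-ceph-operator` label:

    kubectl -n ceph-rook-system get pods
    kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=100
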
The following Ceph clusters are available:

ceph-waw1
---------

HDDs on bc01n0{1-3}. 3TB total capacity.

The following storage classes use this cluster:

 - `waw-hdd-redundant-1` - erasure coded 2.1

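For illustration, a PersistentVolumeClaim against this class could look like the sketch below (the namespace, claim name and size are made-up values):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-data
      namespace: default
    spec:
      storageClassName: waw-hdd-redundant-1
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    EOF
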
A dashboard is available at https://ceph-waw1.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo

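Equivalently, assuming the secret stores the password under the `password` key (as the grep above suggests), a jsonpath query avoids the yaml/grep/awk round-trip:

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password \
        -o jsonpath='{.data.password}' | base64 --decode; echo
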
Known Issues
============

After running `nixos-rebuild switch` on the hosts, the shared host/container CNI plugin directory gets wiped, and pods will fail to schedule on that node (TODO(q3k): error message here). To fix this, restart the calico-node pods running on the affected nodes; they will be recreated automatically and repopulate the CNI plugin directory.

    kubectl -n kube-system get pods -o wide | grep calico-node
    kubectl -n kube-system delete pod calico-node-XXXX

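If several pods are affected, they can also be deleted per node in one go (a sketch; assumes Calico's standard `k8s-app=calico-node` pod label, with `bc01nXX.hswaw.net` standing in for the affected host):

    kubectl -n kube-system delete pod -l k8s-app=calico-node \
        --field-selector spec.nodeName=bc01nXX.hswaw.net
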