HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

There isn't yet a service for getting short-term user certificates. Instead, you'll have to get admin certificates:

    clustercfg admincreds $(whoami)-admin
    kubectl get nodes
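Assuming `clustercfg` registers the generated credentials in your local kubeconfig (which the `kubectl` call above relies on), you can inspect what was added and confirm the API server responds using plain kubectl:

    kubectl config get-contexts
    kubectl cluster-info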

Provisioning nodes
------------------

- bring up a new node with NixOS, running the configuration.nix from bootstrap (to be documented)
- `clustercfg nodestrap bc01nXX.hswaw.net`

That's it!
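As a worked example, enrolling a hypothetical new node `bc01n04.hswaw.net` (the node name is made up for illustration; the bootstrap configuration.nix step is still undocumented):

    # Bring the machine up on NixOS with the bootstrap configuration.nix first, then:
    clustercfg nodestrap bc01n04.hswaw.net
    # Confirm the node registered and eventually reports Ready:
    kubectl get nodes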

Ceph
====

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
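For instance, to tail those logs (this assumes the stock Rook label `app=rook-ceph-operator`; adjust the selector if the deployment is labelled differently):

    kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=100 -f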

The following Ceph clusters are available:

ceph-waw1
---------

HDDs on bc01n0{1-3}. 3TB total capacity.

The following storage classes use this cluster:

- `waw-hdd-redundant-1` - erasure coded 2+1 (2 data chunks, 1 parity chunk); an example claim follows below
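
A minimal PersistentVolumeClaim against this class might look as follows. This is a sketch: the claim name and size are made up for illustration, and only the storageClassName comes from this document:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-claim
    spec:
      storageClassName: waw-hdd-redundant-1
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
    EOF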

A dashboard is available at https://ceph-waw1.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
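Equivalently, a jsonpath query can pull the field directly instead of grepping the YAML (the secret stores the password under the `password` key, as the grep above indicates):

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode ; echo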

Known Issues
============

After running `nixos-rebuild switch` on the hosts, the shared host/container CNI plugin directory gets nuked, and pods will fail to schedule on that node (TODO(q3k): error message here). To fix this, restart the calico-node pods running on the affected nodes. The replacement Calico Node pods will be scheduled automatically and will fix the CNI plugin directory.

    # Find the calico-node pod running on the affected node:
    kubectl -n kube-system get pods -o wide | grep calico-node
    # Delete it; its DaemonSet will recreate it and re-populate the CNI plugin directory:
    kubectl -n kube-system delete pod calico-node-XXXX
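To block until the replacement pod reports Ready (this assumes the stock Calico label `k8s-app=calico-node`):

    kubectl -n kube-system wait --for=condition=Ready pod -l k8s-app=calico-node --timeout=120s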