HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

    prodaccess # get a short-lived certificate for your use via SSO
               # if your local username is not the same as your HSWAW SSO
               # username, pass `-username foo`
    kubectl version
    kubectl top nodes

Every user gets a `personal-$username` namespace. Feel free to use it for your own purposes, but watch out for resource usage!

    kubectl run -n personal-$username --image=alpine:latest -it foo
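
To keep an eye on what you are actually using, plain kubectl is enough (a quick sketch; `kubectl top` relies on the same metrics support as `kubectl top nodes` above):

    kubectl -n personal-$username top pods   # current CPU/memory usage of your pods
    kubectl -n personal-$username get all    # everything you have left running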

To proceed further you should be somewhat familiar with Kubernetes; otherwise the rest of the terminology might not make sense. We recommend going through the official Kubernetes tutorials.

Persistent Storage (waw2)
-------------------------

HDDs on bc01n0{1-3}. 3TB total capacity. Don't use this; this pool should go away soon (the disks are slow, the network is slow, and the RAID controllers lie). Use ceph-waw3 instead.

The following storage classes use this cluster:

- `waw-hdd-paranoid-1` - 3 replicas
- `waw-hdd-redundant-1` - erasure coded 2.1
- `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
- `waw-hdd-redundant-1-object` - erasure coded 2.1 object store

Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To create a user, ask an admin.

PersistentVolumes currently bound to PersistentVolumeClaims get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).

Persistent Storage (waw3)
-------------------------

HDDs on dcr01s2{2,4}. 40TB total capacity for now. Use this.

The following storage classes use this cluster (an example claim is sketched below):

- `waw-hdd-yolo-3` - 1 replica
- `waw-hdd-redundant-3` - 2 replicas
- `waw-hdd-redundant-3-object` - 2 replicas, object store
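
For example, a claim on `waw-hdd-redundant-3` in your personal namespace could look roughly like this (a sketch only; the claim name and size are made up):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: foo-data
      namespace: personal-$username
    spec:
      storageClassName: waw-hdd-redundant-3
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    EOF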

Rados Gateway (S3) is available at https://object.ceph-waw3.hswaw.net/. To create a user, ask an admin.

PersistentVolumes currently bound to PVCs get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).

Administration
==============

Provisioning nodes
------------------

- bring up a new node with NixOS; the configuration doesn't matter and will be nuked anyway (a full example run is sketched after this list)
- edit `cluster/nix/defs-machines.nix`
- `bazel run //cluster/clustercfg nodestrap bc01nXX.hswaw.net`
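
Putting that together, a hypothetical run for a new node (the host name `bc01n04` is made up here, and the shape of a machine entry should follow the existing ones in `defs-machines.nix`):

    $EDITOR cluster/nix/defs-machines.nix        # add an entry for the new machine
    bazel run //cluster/clustercfg nodestrap bc01n04.hswaw.net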

Ceph - Debugging
----------------

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
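
For instance, to pull up the operator's recent logs (a sketch; this assumes the operator pod carries the standard upstream `app=rook-ceph-operator` label):

    kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=100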

A dashboard is available at https://ceph-waw2.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo

Ceph - Backups
--------------

Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly cronjob runs in every Ceph cluster. You can also manually trigger a run by doing:

    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)
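
To follow such a manually triggered run (plain kubectl; the job name is whatever the command above created):

    kubectl -n ceph-waw2 get jobs                                      # find the ceph-waw2-benji-manual-... job
    kubectl -n ceph-waw2 logs job/ceph-waw2-benji-manual-<timestamp>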

Ceph ObjectStorage pools (RADOSGW) are _not_ backed up yet!

Ceph - Object Storage
---------------------

To create an object store user, consult the rook.io manual (https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
The user authentication secret is generated in the Ceph cluster namespace (`ceph-waw2`),
and thus may need to be manually copied into the application namespace (see
the comment in `app/registry/prod.jsonnet`).
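
As a rough sketch of what that manual describes, a user object could look like the following (the user name and display name are made up, and the `store` value is assumed to be the object store backing `waw-hdd-redundant-1-object`):

    kubectl apply -f - <<EOF
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStoreUser
    metadata:
      name: example-user
      namespace: ceph-waw2
    spec:
      store: waw-hdd-redundant-1-object
      displayName: "Example User"
    EOF

The resulting credentials land in a secret in `ceph-waw2`, which is the secret that then gets copied into the application namespace.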

`tools/rook-s3cmd-config` can be used to generate a test configuration file for s3cmd.
Remember to append `:default-placement` to your region name (i.e. `waw-hdd-redundant-1-object:default-placement`).
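
The result is an ordinary s3cmd configuration pointed at the gateway; hand-written, it would look roughly like this (a sketch, not the tool's verbatim output; the keys come from the object store user's secret):

    # ~/.s3cfg (sketch)
    [default]
    access_key = <ACCESS_KEY from the user's secret>
    secret_key = <SECRET_KEY from the user's secret>
    host_base = object.ceph-waw2.hswaw.net
    host_bucket = object.ceph-waw2.hswaw.net
    use_https = True
    bucket_location = waw-hdd-redundant-1-object:default-placement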