Admin documentation. For user documentation, see //cluster/doc/user.md.
Current cluster: k0.hswaw.net
ceph-waw2: HDDs on bc01n0{1-3}. 3TB total capacity. Don't use this, as this pool should go away soon (the disks are slow, the network is slow and the RAID controllers lie). Use ceph-waw3 instead.
The following storage classes use this cluster:
- waw-hdd-paranoid-1 - 3 replicas
- waw-hdd-redundant-1 - erasure coded 2.1
- waw-hdd-yolo-1 - unreplicated (you will lose your data)
- waw-hdd-redundant-1-object - erasure coded 2.1 object store

Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To create a user, ask an admin.
PersistentVolumes currently bound to PersistentVolumeClaims get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).
ceph-waw3: HDDs on dcr01s2{2,4}. 40TB total capacity for now. Use this.
The following storage classes use this cluster:
- waw-hdd-yolo-3 - 1 replica
- waw-hdd-redundant-3 - 2 replicas
- waw-hdd-redundant-3-object - 2 replicas, object store

Rados Gateway (S3) is available at https://object.ceph-waw3.hswaw.net/. To create a user, ask an admin.
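For reference, a minimal PersistentVolumeClaim sketch against the waw-hdd-redundant-3 class could look like this (the claim name, target namespace and size are illustrative, not taken from this repo):

    # Hypothetical example: request a 10Gi RBD-backed volume from ceph-waw3.
    kubectl -n example-app apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-data
    spec:
      storageClassName: waw-hdd-redundant-3
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    EOF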
PersistentVolumes currently bound to PVCs get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).
To bootstrap a new node, run:

    bazel run //cluster/clustercfg nodestrap bc01nXX.hswaw.net
We run Ceph via Rook. The Rook operator is running in the ceph-rook-system
namespace. To debug Ceph issues, start by looking at its logs.
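For example (the label selector is an assumption about how the Rook operator pod is labelled; check with kubectl get pods first):

    # Find the operator pod and tail its recent logs.
    kubectl -n ceph-rook-system get pods
    # app=rook-ceph-operator is Rook's usual label, but is an assumption here.
    kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=100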
A dashboard is available at https://ceph-waw2.hswaw.net/ and https://ceph-waw3.hswaw.net/. To get the admin password, run:
    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
    kubectl -n ceph-waw3 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly cronjob runs in every Ceph cluster. You can also manually trigger a run by doing:
    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)
    kubectl -n ceph-waw3 create job --from=cronjob/ceph-waw3-benji ceph-waw3-benji-manual-$(date +%s)
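To check on a manually triggered run, something along these lines should work (the job name below is made up; use whatever the create command above printed):

    # List Benji jobs and inspect the logs of one (hypothetical) manual run.
    kubectl -n ceph-waw3 get jobs
    kubectl -n ceph-waw3 logs job/ceph-waw3-benji-manual-1589453280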
Ceph ObjectStorage pools (RADOSGW) are not backed up yet!
To create an object store user, consult the rook.io manual (https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html). The user authentication secret is generated in the Ceph cluster namespace (ceph-waw2), so it may need to be manually copied into the application namespace (see the comment in app/registry/prod.jsonnet).
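A rough sketch of what this can look like, based on the v0.9 CephObjectStoreUser CRD linked above; the user name, target namespace, and the assumption that the object store (and hence the generated secret name) is called waw-hdd-redundant-1-object are all illustrative:

    # Hypothetical example - adjust the user name, store name and namespaces.
    kubectl apply -f - <<EOF
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStoreUser
    metadata:
      name: example-user
      namespace: ceph-waw2
    spec:
      store: waw-hdd-redundant-1-object
      displayName: "Example User"
    EOF

    # The credentials secret lands in ceph-waw2 (secret name format assumed from
    # Rook defaults); copy it into the application namespace, stripping
    # namespace-specific metadata on the way.
    kubectl -n ceph-waw2 get secret rook-ceph-object-user-waw-hdd-redundant-1-object-example-user -o json \
        | jq 'del(.metadata.namespace, .metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)' \
        | kubectl -n example-app apply -f -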
tools/rook-s3cmd-config can be used to generate a test configuration file for s3cmd. Remember to append :default-placement to your region name (i.e. waw-hdd-redundant-1-object:default-placement).
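As a quick illustration (the config path and bucket name are made up), creating a bucket with the placement-qualified region could look like:

    # Hypothetical invocation; mb creates a bucket in the given location.
    s3cmd -c path/to/generated.s3cfg \
        --bucket-location=waw-hdd-redundant-1-object:default-placement \
        mb s3://example-bucket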