HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

    prodaccess # get a short-lived certificate for your use via SSO
               # if your local username is not the same as your HSWAW SSO
               # username, pass `-username foo`
    kubectl version
    kubectl top nodes

Every user gets a `personal-$username` namespace. Feel free to use it for your own purposes, but watch out for resource usage!

    kubectl run -n personal-$username --image=alpine:latest -it foo

To proceed further, you should be somewhat familiar with Kubernetes, as otherwise the rest of the terminology might not make sense. We recommend going through the official Kubernetes tutorials.

Persistent Storage (waw2)
-------------------------

HDDs on bc01n0{1-3}. 3TB total capacity. Don't use this pool, as it should go away soon (the disks are slow, the network is slow, and the RAID controllers lie). Use ceph-waw3 instead.

The following storage classes use this cluster:

 - `waw-hdd-paranoid-1` - 3 replicas
 - `waw-hdd-redundant-1` - erasure coded 2.1
 - `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
 - `waw-hdd-redundant-1-object` - erasure coded 2.1 object store

Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To create a user, ask an admin.

PersistentVolumes currently bound to PersistentVolumeClaims get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).

Persistent Storage (waw3)
-------------------------

HDDs on dcr01s2{2,4}. 40TB total capacity for now. Use this.

The following storage classes use this cluster:

 - `waw-hdd-yolo-3` - 1 replica
 - `waw-hdd-redundant-3` - 2 replicas
 - `waw-hdd-redundant-3-object` - 2 replicas, object store

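A PersistentVolumeClaim against one of these classes is a plain Kubernetes object; here is a minimal sketch (the `foo-data` name, namespace, and size are made up for illustration):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-data
  namespace: personal-$username  # substitute your actual namespace
spec:
  storageClassName: waw-hdd-redundant-3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Once bound, such a volume falls under the automatic backup schedule described below.
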
Rados Gateway (S3) is available at https://object.ceph-waw3.hswaw.net/. To create a user, ask an admin.

PersistentVolumes currently bound to PVCs get automatically backed up (hourly for the next 48 hours, then once every 4 weeks, then once every month for a year).

Administration
==============

Provisioning nodes
------------------

 - bring up a new node with NixOS; the configuration doesn't matter and will be nuked anyway
 - edit `cluster/nix/defs-machines.nix`
 - run `bazel run //cluster/clustercfg nodestrap bc01nXX.hswaw.net`

Ceph - Debugging
----------------

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.

A dashboard is available at https://ceph-waw2.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo


Ceph - Backups
--------------

Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly cronjob runs in every Ceph cluster. You can also manually trigger a run:

    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)

Ceph ObjectStorage pools (RADOSGW) are _not_ backed up yet!

Ceph - Object Storage
---------------------

To create an object store user, consult the rook.io manual
(https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
The user authentication secret is generated in the Ceph cluster namespace
(`ceph-waw2`), and thus may need to be manually copied into the application
namespace (see the comment in `app/registry/prod.jsonnet`).

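Following that manual, a Rook v0.9 user object looks roughly like this (the `registry` user name and display name are examples, not prescriptions):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: registry
  namespace: ceph-waw2
spec:
  store: waw-hdd-redundant-1-object
  displayName: registry
```

Rook should then emit the credentials as a Secret in `ceph-waw2` (named along the lines of `rook-ceph-object-user-<store>-<user>`); this is the secret that may need copying into the application namespace.
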
`tools/rook-s3cmd-config` can be used to generate a test configuration file for s3cmd.
Remember to append `:default-placement` to your region name (i.e. `waw-hdd-redundant-1-object:default-placement`).
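
For orientation, the resulting s3cmd configuration should contain roughly the following keys (values shown for ceph-waw3 as an illustration, credentials elided):

```ini
host_base = object.ceph-waw3.hswaw.net
host_bucket = object.ceph-waw3.hswaw.net
bucket_location = waw-hdd-redundant-3-object:default-placement
access_key = ...
secret_key = ...
use_https = True
```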