Get in the Cluster, Benji!

Here we introduce benji [1], a backup system based on backy2. It lets us
back up Ceph RBD images from Rook into Wasabi, our offsite S3-compatible
storage provider.

Benji runs as a k8s CronJob, hourly at minute 42. Each run does the
following (see the sketch after this list):
 - runs benji-pvc-backup, which iterates over all PVCs in k8s and backs
   up their respective PVs to Wasabi
 - runs benji enforce, marking backups that fall outside our backup
   policy [2] as to-be-deleted
 - runs benji cleanup, to remove unneeded backups
 - runs a custom script to back up benji's sqlite3 database into Wasabi
   (unencrypted, but we're fine with that, as the metadata only contains
   image and pool names, i.e. Ceph PV and pool names)
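
In shell terms, each run boils down to roughly this (a sketch: exact
flags and the shape of the enforce invocation are assumptions, not the
actual manifest):

    benji-pvc-backup                              # back up all PVC-bound PVs to Wasabi
    benji enforce latest3,hours48,days7,months12  # flag versions outside policy [2]
    benji cleanup                                 # remove data of flagged versions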

[1] - https://benji-backup.me/index.html
[2] - latest3,hours48,days7,months12, which means: the latest 3 backups,
      then one backup per hour for the next 48 hours, then one backup per
      day for the next 7 days, then one backup per month for the next 12
      months, for a total of around 65 backups (deduplicated, of course)

We also drive-by update some docs (splitting them more cleanly into
user and admin docs).

Change-Id: Ibe0942fd38bc232399c0e1eaddade3f4c98bc6b4
diff --git a/cluster/README b/cluster/README
index ae09fc7..efff049 100644
--- a/cluster/README
+++ b/cluster/README
@@ -7,7 +7,10 @@
 ---------------------
 
     prodaccess # get a short-lived certificate for your use via SSO
-    kubectl get nodes
+    kubectl version
+    kubectl top nodes
+
+Every user gets a `personal-$username` namespace. Feel free to use it for your own purposes, but watch out for resource usage!
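+
+For example, to spin up a throwaway shell in your namespace (the image
+choice is just an example):
+
+    kubectl -n personal-$username run --rm -it test --image=busybox --restart=Never -- sh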
 
 Persistent Storage
 ------------------
@@ -21,18 +24,9 @@
  - `waw-hdd-yolo-1` - unreplicated (you _will_ lose your data)
  - `waw-hdd-redundant-1-object` - erasure coded 2.1 object store
 
-A dashboard is available at https://ceph-waw1.hswaw.net/, to get the admin password run:
+Rados Gateway (S3) is available at https://object.ceph-waw2.hswaw.net/. To create a user, ask an admin.
 
-    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
-
-Rados Gateway (S3) is available at https://object.ceph-waw1.hswaw.net/. To create
-an object store user consult rook.io manual (https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html)
-User authentication secret is generated in ceph cluster namespace (`ceph-waw1`),
-thus may need to be manually copied into application namespace. (see
-`app/registry/prod.jsonnet` comment)
-
-`tools/rook-s3cmd-config` can be used to generate test configuration file for s3cmd.
-Remember to append `:default-placement` to your region name (ie. `waw-hdd-redundant-1-object:default-placement`)
+PersistentVolumes currently bound to PVCs get automatically backed up (hourly for the next 48 hours, then daily for the next 7 days, then monthly for a year).
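+
+For example, to claim redundant storage that will be backed up on the
+above schedule (claim name and size are illustrative):
+
+    kubectl apply -f - <<EOF
+    apiVersion: v1
+    kind: PersistentVolumeClaim
+    metadata:
+      name: mydata
+    spec:
+      storageClassName: waw-hdd-redundant-1
+      accessModes:
+        - ReadWriteOnce
+      resources:
+        requests:
+          storage: 1Gi
+    EOF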
 
 Administration
 ==============
@@ -43,12 +37,33 @@
  - bring up a new node with nixos, running the configuration.nix from bootstrap (to be documented)
  - `bazel run //cluster/clustercfg:clustercfg nodestrap bc01nXX.hswaw.net`
 
-That's it!
-
-Ceph
-====
+Ceph - Debugging
+----------------
 
 We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
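+
+For example (a sketch: the operator deployment name follows Rook's
+defaults and may differ in this deployment):
+
+    kubectl -n ceph-rook-system logs deploy/rook-ceph-operator --tail=100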
 
-The following Ceph clusters are available:
+A dashboard is available at https://ceph-waw2.hswaw.net/, to get the admin password run:
+
+    kubectl -n ceph-waw2 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo
+
+
+Ceph - Backups
+--------------
+
+Kubernetes PVs backed by Ceph RBDs get backed up using Benji. An hourly CronJob runs in every Ceph cluster. You can also manually trigger a run by doing:
+
+    kubectl -n ceph-waw2 create job --from=cronjob/ceph-waw2-benji ceph-waw2-benji-manual-$(date +%s)
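+
+And then watch it complete (illustrative; the job name is the one created
+above):
+
+    kubectl -n ceph-waw2 get jobs -w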
+
+Ceph ObjectStorage pools (RADOSGW) are _not_ backed up yet!
+
+Ceph - Object Storage
+---------------------
+
+To create an object store user, consult the rook.io manual
+(https://rook.io/docs/rook/v0.9/ceph-object-store-user-crd.html).
+The user authentication secret is generated in the Ceph cluster namespace
+(`ceph-waw2`), and thus may need to be manually copied into the application
+namespace (see the comment in `app/registry/prod.jsonnet`).
+
+`tools/rook-s3cmd-config` can be used to generate a test configuration file
+for s3cmd. Remember to append `:default-placement` to your region name (e.g.
+`waw-hdd-redundant-1-object:default-placement`).
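+
+For example, once a configuration file has been generated (file and bucket
+names here are illustrative):
+
+    s3cmd -c rook-s3cmd.cfg mb s3://my-test-bucket
+    s3cmd -c rook-s3cmd.cfg put ./somefile s3://my-test-bucket
+    s3cmd -c rook-s3cmd.cfg ls s3://my-test-bucket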