HSCloud Clusters
================

Current cluster: `k0.hswaw.net`

Accessing via kubectl
---------------------

There isn't yet a service for getting short-term user certificates. Instead, you'll have to get admin certificates:

    clustercfg admincreds $(whoami)-admin
    kubectl get nodes

Provisioning nodes
------------------

- bring up a new node with NixOS, running the `configuration.nix` from bootstrap (to be documented)
- `clustercfg nodestrap bc01nXX.hswaw.net`
That's it!
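
Assuming the node name used in the `nodestrap` step above, a quick sanity check that the new node registered and became Ready (not part of the provisioning flow itself, just a verification sketch):

    kubectl get nodes -o wide | grep bc01nXX.hswaw.net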

Ceph
====

We run Ceph via Rook. The Rook operator is running in the `ceph-rook-system` namespace. To debug Ceph issues, start by looking at its logs.
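
For example, assuming the operator pods carry Rook's usual `app=rook-ceph-operator` label, the operator logs can be tailed with:

    kubectl -n ceph-rook-system logs -l app=rook-ceph-operator --tail=100 -f
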
The following Ceph clusters are available:

ceph-waw1
---------

HDDs on bc01n0{1-3}. 3TB total capacity.

The following storage classes use this cluster:

- `waw-hdd-redundant-1` - erasure coded 2.1
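
A hypothetical PVC using this storage class might look like the following sketch (the claim name and size are made up for illustration):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-data
    spec:
      storageClassName: waw-hdd-redundant-1
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    EOF
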
A dashboard is available at https://ceph-waw1.hswaw.net/. To get the admin password, run:

    kubectl -n ceph-waw1 get secret rook-ceph-dashboard-password -o yaml | grep "password:" | awk '{print $2}' | base64 --decode ; echo

Known Issues
============

After running `nixos-rebuild switch` on the hosts, the shared host/container CNI plugin directory gets nuked, and pods will fail to schedule on that node (TODO(q3k): error message here). To fix this, restart the calico-node pods running on the affected nodes. The Calico node pod will reschedule automatically and fix the CNI plugins directory.

    kubectl -n kube-system get pods -o wide | grep calico-node
    kubectl -n kube-system delete pod calico-node-XXXX
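
Assuming the stock Calico manifests, which label these pods with `k8s-app=calico-node`, the pod on an affected node can also be deleted in one step (`bc01nXX.hswaw.net` stands in for the broken node's name):

    kubectl -n kube-system delete pod -l k8s-app=calico-node --field-selector spec.nodeName=bc01nXX.hswaw.net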