k0.hswaw.net: pass metallb through Calico

Previously, we had the following setup:

                          .-----------.
                          | .....     |
                        .-----------.-|
                        | dcr01s24  | |
                      .-----------.-| |
                      | dcr01s22  | | |
                  .---|-----------| |-'
    .--------.    |   |---------. | |
    | dcsw01 | <----- | metallb | |-'
    '--------'        |---------' |
                      '-----------'

I.e., the metallb instance on each node talked BGP directly to dcsw01
to announce ExternalIPs into our L3 fabric.

Now, we rejigger the configuration to instead have Calico's BIRD
instances talk BGP to dcsw01, and have metallb talk locally to Calico.

                      .-------------------------.
                      | dcr01s24                |
                      |-------------------------|
    .--------.        |---------.   .---------. |
    | dcsw01 | <----- | Calico  |<--| metallb | |
    '--------'        |---------'   '---------' |
                      '-------------------------'

This makes Calico announce our pod/service networks into our L3 fabric!
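
For reference, the fabric-facing side of this is an ordinary Calico
global BGP peer pointing at dcsw01. The BIRD stanza that Calico renders
for such a peer looks roughly like the sketch below - the protocol
name, peer address and ASNs are placeholders, not the real k0 values:

    # Sketch of a Calico-rendered BIRD peering towards dcsw01.
    # Peer address and AS numbers are placeholders, not the real k0 values.
    protocol bgp Global_10_100_0_1 {
      description "Connection to BGP peer";
      local as 65001;                # this node's private ASN (placeholder)
      neighbor 10.100.0.1 as 65000;  # dcsw01 (placeholder address/ASN)
      import all;                    # accept routes from the fabric
      export filter calico_export_to_bgp_peers;
    }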

Calico and metallb talk to each other over 127.0.0.1 (they both run
with host networking), but that requires one side to flip to passive
mode. We chose to do that on the Calico side, by overriding its BIRD
config and special-casing any 127.0.0.1 peer to enable passive mode.
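
Rendered, the special-cased local peer ends up as something like the
stanza below. This is just a sketch - the protocol name and ASNs are
placeholders; the point is the 127.0.0.1 neighbor and 'passive on':

    # Sketch of the rendered peering between Calico's BIRD and the local
    # metallb speaker; protocol name and AS numbers are placeholders.
    protocol bgp Mesh_127_0_0_1 {
      description "metallb, local speaker";
      local as 65001;               # placeholder
      neighbor 127.0.0.1 as 65002;  # placeholder
      passive on;                   # HSCLOUD: wait for metallb to connect
      import all;                   # take metallb's ExternalIP routes
      export filter calico_export_to_bgp_peers;
    }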

We also override Calico's other BIRD template (bird_ipam.cfg) to fiddle
with the kernel programming filter (i.e. the to-kernel-routing-table
filter), where we disable programming unreachable routes. This is
because routes coming from metallb have their next hop set to
127.0.0.1, which makes BIRD mark them as unreachable. Unreachable
routes in the kernel would break local access to ExternalIPs, e.g.
registry access from containerd.
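
Rendered, the HSCLOUD change to that filter boils down to the sketch
below; the real template (in the diff that follows) additionally
handles static routes and IPIP/VXLAN pools:

    # Sketch of the kernel programming filter with the HSCLOUD change.
    filter calico_kernel_programming {
      # Routes learned from metallb carry a next hop of 127.0.0.1, which
      # BIRD resolves to RTD_UNREACHABLE; never program those into the
      # kernel.
      if ( dest = RTD_UNREACHABLE ) then {
        reject;
      }
      accept;
    }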

All routes pass through without route reflectors or a full mesh, as we
use eBGP over private ASNs in our fabric.

We also have to make Calico aware of metallb pools - otherwise, routes
announced by metallb end up being filtered by Calico.
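
This is because the export filter towards BGP peers
(calico_export_to_bgp_peers in the template below) only accepts
prefixes that fall within a known pool. With the metallb pools
registered in Calico, the rendered filter looks roughly like this -
both CIDRs are made-up examples, not the real k0 ranges:

    # Sketch of the rendered export filter; both CIDRs are made-up examples.
    filter calico_export_to_bgp_peers {
      calico_aggr();
      if ( net ~ 10.10.0.0/16 ) then {    # pod network pool (example)
        accept;
      }
      if ( net ~ 203.0.113.0/24 ) then {  # metallb ExternalIP pool (example)
        accept;
      }
      reject;
    }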

This is all mildly hacky. Here's hoping that Calico will some day gain
metallb-like functionality, i.e. IPAM for
externalIPs/LoadBalancers/...

There seems to be one problem with this change, however (I'm not fixing
it yet, as it's not critical): metallb would previously announce IPs
only from nodes actually serving a given service. Now, the Calico
internal mesh makes those routes appear from every node. This can
probably be fixed by disabling the local mesh and enabling route
reflection on dcsw01 (to recreate the mesh routing through dcsw01), or
maybe by some more hacking of the Calico BIRD config :/.

Change-Id: I3df1f6ae7fa1911dd53956ced3b073581ef0e836
diff --git a/cluster/kube/lib/calico-bird-ipam.cfg.template b/cluster/kube/lib/calico-bird-ipam.cfg.template
new file mode 100644
index 0000000..869a480
--- /dev/null
+++ b/cluster/kube/lib/calico-bird-ipam.cfg.template
@@ -0,0 +1,66 @@
+# This is forked from bird.cfg.template from calico running on k0.hswaw.net on 2020/09/21.
+# Changed vs. upstream (C-f HSCLOUD):
+#  - do not program RTD_UNREACHABLE routes into the kernel (these come from metallb, and
+#    programming them seems to break things)
+# Generated by confd
+filter calico_export_to_bgp_peers {
+  calico_aggr();
+{{- $static_key := "/staticroutes"}}
+{{- if ls $static_key}}
+
+  # Export static routes.
+  {{- range ls $static_key}}
+    {{- $parts := split . "-"}}
+    {{- $cidr := join $parts "/"}}
+  if ( net ~ {{$cidr}} ) then { accept; }
+  {{- end}}
+{{- end}}
+{{range ls "/v1/ipam/v4/pool"}}{{$data := json (getv (printf "/v1/ipam/v4/pool/%s" .))}}
+  if ( net ~ {{$data.cidr}} ) then {
+    accept;
+  }
+{{- end}}
+  reject;
+}
+
+{{$network_key := printf "/bgp/v1/host/%s/network_v4" (getenv "NODENAME")}}
+filter calico_kernel_programming {
+{{- $reject_key := "/rejectcidrs"}}
+{{- if ls $reject_key}}
+
+  if ( dest = RTD_UNREACHABLE ) then { # HSCLOUD
+    reject;
+  }
+
+  # Don't program static routes into kernel.
+  {{- range ls $reject_key}}
+    {{- $parts := split . "-"}}
+    {{- $cidr := join $parts "/"}}
+  if ( net ~ {{$cidr}} ) then { reject; }
+  {{- end}}
+
+{{- end}}
+{{- if exists $network_key}}{{$network := getv $network_key}}
+{{range ls "/v1/ipam/v4/pool"}}{{$data := json (getv (printf "/v1/ipam/v4/pool/%s" .))}}
+  if ( net ~ {{$data.cidr}} ) then {
+{{- if $data.vxlan_mode}}
+    # Don't program VXLAN routes into the kernel - these are handled by Felix.
+    reject;
+  }
+{{- else if $data.ipip_mode}}{{if eq $data.ipip_mode "cross-subnet"}}
+    if defined(bgp_next_hop) && ( bgp_next_hop ~ {{$network}} ) then
+      krt_tunnel = "";                     {{- /* Destination in ipPool, mode is cross sub-net, route from-host on subnet, do not use IPIP */}}
+    else
+      krt_tunnel = "{{$data.ipip}}";       {{- /* Destination in ipPool, mode is cross sub-net, route from-host off subnet, set the tunnel (if IPIP not enabled, value will be "") */}}
+    accept;
+  } {{- else}}
+    krt_tunnel = "{{$data.ipip}}";         {{- /* Destination in ipPool, mode not cross sub-net, set the tunnel (if IPIP not enabled, value will be "") */}}
+    accept;
+  } {{- end}} {{- else}}
+    krt_tunnel = "{{$data.ipip}}";         {{- /* Destination in ipPool, mode field is not present, set the tunnel (if IPIP not enabled, value will be "") */}}
+    accept;
+  } {{- end}}
+{{end}}
+{{- end}}{{/* End of 'exists $network_key' */}}
+  accept;                                  {{- /* Destination is not in any ipPool, accept  */}}
+}