Introduction
I wanted to set up a high-availability (HA) Kubernetes cluster on my Proxmox datacenter. I went with HA because I have more than one Proxmox server, so the idea was to spread the containers across hosts and keep my services running even if one server fails. We also use HAProxy and keepalived as an HA pair of two containers (one on each Proxmox host) so the server nodes are reachable through a single virtual IP.
I decided to go with K3s as my Kubernetes distribution because it's lightweight and easy to set up.
In K3s there are two approaches to HA: one with an external database and one with embedded etcd. I decided against an external database because I would need to set up two replicated VMs/containers on my Proxmox hosts just to avoid creating another single point of failure.
The setup I'm using, in short:
- Rocky Linux 9 LXCs on 2 Proxmox Hosts
- K3s, 3 server nodes and 3 agent nodes
- HAproxy, 2 LXCs
- no traefik, no servicelb – MetalLB instead
- HA through embedded etcd
Architecture
General preparation
In this setup I use 6 LXCs for the 6 nodes and 2 LXCs for HAProxy, which together with the virtual IP requires 9 IP addresses. We also reserve a range of 11 IPs for the load balancer, resulting in 20 IPs total.
I’m using:
| Name | IP | Host |
| --- | --- | --- |
| server-1 | 10.1.20.40 | Proxmox 1 |
| server-2 | 10.1.20.41 | Proxmox 2 |
| server-3 | 10.1.20.42 | Proxmox 2 |
| agent-1 | 10.1.20.43 | Proxmox 1 |
| agent-2 | 10.1.20.44 | Proxmox 2 |
| agent-3 | 10.1.20.45 | Proxmox 2 |
| haproxy1 | 10.1.20.46 | Proxmox 1 |
| haproxy2 | 10.1.20.47 | Proxmox 2 |
| HAproxy VIP | 10.1.20.5 | HAproxys |
| Loadbalancer IP range | 10.1.20.50-60 | K3s LB |
Make sure to install kubectl and helm on your machine! All the .yaml files I use below are in my home directory; make sure you are in the directory of the files when using kubectl later on.
Preparing LXC on Proxmox
First, we need to download the Rocky Linux image and upload it to a Proxmox storage that allows CT templates.
The download can be found here:
https://us.lxd.images.canonical.com/images/rockylinux/9/amd64/default/
The newest builds are in the dated folders. The file we need is "rootfs.tar.xz". Download it, rename it to something like rocky9.tar.xz, and upload it to your Proxmox storage.
Now we can start creating the LXCs.
Click on Create CT in Proxmox and check Advanced in the bottom:
Make sure to uncheck "Unprivileged container", enter a root password, and optionally paste a public SSH key. Set the hostname to the node name, e.g. server-1.
In the next window select your Storage with the CT Template, select the Rocky 9 template and continue.
Select the storage where the LXC disk will live and assign a disk size. I use 16 GB; adjust this to fit your storage and needs.
Next, assign CPUs – I use 4.
Then assign memory. I'm using 4 GB with 0 MB swap.
In the next 2 windows assign your network settings. Bridge, vlan-id, IP (from the table above), Gateway and DNS-Server. I also disable the Firewall here.
Once you're on the Confirm page, make sure to uncheck "Start after created"! We have to adjust some settings before booting the LXC.
In the /etc/pve/lxc directory you'll find files called XXX.conf, where XXX is the ID of a container we just created. Using your text editor of choice, edit the file for each container we created and add the following lines:
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
lxc.mount.auto: "proc:rw sys:rw"
Note: It’s important that the container is stopped when you try to edit the file, otherwise Proxmox’s network filesystem will prevent you from saving it.
In order, these options (1) disable AppArmor, (2) allow the container's cgroup to access all devices, (3) prevent dropping any capabilities for the container, and (4) mount /proc and /sys as read-write in the container.
Next, we need to publish the kernel boot configuration into the container. Normally, this isn’t needed by the container since it runs using the host’s kernel, but the Kubelet uses the configuration to determine various settings for the runtime, so we need to copy it into the container. To do this, first start the container using the Proxmox web UI, then run the following command on the Proxmox host:
pct push <container id> /boot/config-$(uname -r) /boot/config-$(uname -r)
Finally, in each of the containers, we need to make sure that /dev/kmsg exists. Kubelet uses this for some logging functions, and it doesn't exist in the containers by default. For our purposes, we'll just alias it to /dev/console. In each container, create the file /usr/local/bin/conf-kmsg.sh with the following contents:
#!/bin/sh -e
if [ ! -e /dev/kmsg ]; then
    ln -s /dev/console /dev/kmsg
fi
mount --make-rshared /
This script symlinks /dev/console as /dev/kmsg if the latter does not exist. Finally, we configure it to run on container start via a systemd one-shot service. Create the file /etc/systemd/system/conf-kmsg.service with the following contents:
[Unit]
Description=Make sure /dev/kmsg exists

[Service]
Type=simple
RemainAfterExit=yes
ExecStart=/usr/local/bin/conf-kmsg.sh
TimeoutStartSec=0

[Install]
WantedBy=default.target
Finally, enable the service by running the following:
chmod +x /usr/local/bin/conf-kmsg.sh
systemctl daemon-reload
systemctl enable --now conf-kmsg
Repeat these steps for all six containers: three times for the server nodes and three times for the agent nodes.
Setting up HAproxy LXCs
Set up two LXC containers with the Rocky 9 image as before, but skip the Kubernetes-specific tweaks after creation. You can boot the two containers right after creating them.
Do the following on both containers.
We start by installing haproxy and keepalived on the LXC.
dnf install haproxy keepalived
After this, we have to enable IP forwarding and non-local binding, so that traffic can be forwarded to the backend servers and HAProxy can bind to the VIP even while it is not active on that machine.
IP forwarding:
sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/' /etc/sysctl.conf
Bind to non-local addresses:
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
Reload sysctl settings:
sysctl -p
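To confirm both settings took effect, sysctl can read them back; after the change above each should report 1:

```shell
# Read back the two settings; after the edits above both should report 1
sysctl net.ipv4.ip_forward net.ipv4.ip_nonlocal_bind
```

If ip_forward still reports 0, the commented `#net.ipv4.ip_forward=1` line may simply not exist in /etc/sysctl.conf on your Rocky 9 image – in that case append the setting with echo, the same way as the non-local bind line, and run sysctl -p again.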
Now we edit the haproxy config file with vi:
vi /etc/haproxy/haproxy.cfg
Remove the default config and insert the following:
global
    log /dev/log local0 warning
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    log global
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kube-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kube-apiserver

backend kube-apiserver
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server kube-controller-server-1 10.1.20.40:6443 check
    server kube-controller-server-2 10.1.20.41:6443 check
    server kube-controller-server-3 10.1.20.42:6443 check
Save the file and run the following command to restart HAproxy.
systemctl restart haproxy
Make it persist through reboots:
systemctl enable haproxy
Make sure to configure the LB on the other LXC as well.
Keepalived Configuration
Keepalived must be installed on both machines, although the configuration differs slightly between them.
Run the following command to configure Keepalived.
vi /etc/keepalived/keepalived.conf
Here is the configuration (haproxy1) for your reference:
global_defs {
    notification_email {
    }
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance haproxy-vip {
    state MASTER                # MASTER on haproxy1, BACKUP on haproxy2
    priority 200                # 200 on haproxy1, 100 on haproxy2
    interface eth0              # Network interface name
    virtual_router_id 60
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    unicast_src_ip 10.1.20.46   # The IP address of this machine (haproxy1)
    unicast_peer {
        10.1.20.47              # The IP address of the peer (haproxy2)
    }
    virtual_ipaddress {
        10.1.20.5/24            # The VIP address
    }
    track_script {
        chk_haproxy
    }
}
The config on haproxy2 looks a little different:
global_defs {
    notification_email {
    }
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance haproxy-vip {
    state BACKUP                # MASTER on haproxy1, BACKUP on haproxy2
    priority 100                # 200 on haproxy1, 100 on haproxy2
    interface eth0              # Network interface name
    virtual_router_id 60
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    unicast_src_ip 10.1.20.47   # The IP address of this machine (haproxy2)
    unicast_peer {
        10.1.20.46              # The IP address of the peer (haproxy1)
    }
    virtual_ipaddress {
        10.1.20.5/24            # The VIP address
    }
    track_script {
        chk_haproxy
    }
}
Save the files and run the following command to restart keepalived.
systemctl restart keepalived
Make it persist through reboots:
systemctl enable keepalived
Setting up the Container OS & K3s
Now that we’ve got the containers up and running, we will set up Rancher K3s on them. Luckily, Rancher intentionally makes this pretty easy.
Setting up server nodes
Starting on the first server node, we run the following command to set up K3s:
curl -sfL https://get.k3s.io | sh -s - server --token=YOURTOKENHERE --tls-san dns.name.lab.local --tls-san 10.1.20.5 --cluster-init --disable servicelb --disable traefik
We run the setup with a token (just generate one or use a random string), pass --cluster-init on the FIRST NODE ONLY, and disable the default service load balancer and the Traefik ingress. --tls-san is needed so the virtual IP (and DNS name) get added to the generated certificates.
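For the token, any sufficiently random string does the job; for example:

```shell
# Generate a random 32-character hex token (any hard-to-guess string works)
openssl rand -hex 16
```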
On the next two nodes we run basically the same command, minus --cluster-init; instead we add --server to point those nodes at the first node we set up.
curl -sfL https://get.k3s.io | sh -s - server --token=YOURTOKENHERE --tls-san dns.name.lab.local --tls-san 10.1.20.5 --server https://10.1.20.40:6443 --disable servicelb --disable traefik
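As an aside, K3s can also read these options from a config file instead of CLI flags. A sketch of /etc/rancher/k3s/config.yaml for one of the additional server nodes, with the values assumed from the commands above:

```yaml
# /etc/rancher/k3s/config.yaml -- equivalent to the CLI flags above
token: YOURTOKENHERE
server: https://10.1.20.40:6443
tls-san:
  - dns.name.lab.local
  - 10.1.20.5
disable:
  - servicelb
  - traefik
```

With this file in place, the installer can be run without the extra arguments.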
Once everything is done, you can copy /etc/rancher/k3s/k3s.yaml to ~/.kube/config on your local machine.
Edit the file: the server IP in there is probably set to 127.0.0.1 – change it to 10.1.20.5 (the VIP of the HAproxy pair) and you should be able to see your new cluster using kubectl get nodes!
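Assuming the file landed at ~/.kube/config, the swap can be done in one line:

```shell
# Replace the loopback address with the HAproxy VIP in the copied kubeconfig
# (path assumed to be ~/.kube/config; adjust if yours differs)
sed -i 's/127\.0\.0\.1/10.1.20.5/g' ~/.kube/config
```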
NAME STATUS ROLES AGE VERSION
server-1 Ready control-plane,etcd,master 17h v1.27.7+k3s2
server-2 Ready control-plane,etcd,master 17h v1.27.7+k3s2
server-3 Ready control-plane,etcd,master 17h v1.27.7+k3s2
Adding the agent nodes
Now we go back to our remaining three LXCs, the agent nodes. We just have to join them to the cluster, pointing them at the virtual IP (10.1.20.5). We run the same command on all three nodes:
curl -sfL https://get.k3s.io | sh -s - agent --token=YOURTOKENHERE --server https://10.1.20.5:6443
Do this step on all nodes and finally check with kubectl get nodes:
NAME STATUS ROLES AGE VERSION
agent-1 Ready <none> 59s v1.27.7+k3s2
agent-2 Ready <none> 29s v1.27.7+k3s2
agent-3 Ready <none> 14s v1.27.7+k3s2
server-1 Ready control-plane,etcd,master 17h v1.27.7+k3s2
server-2 Ready control-plane,etcd,master 17h v1.27.7+k3s2
server-3 Ready control-plane,etcd,master 17h v1.27.7+k3s2
At this point our HA cluster is up and running! In the next step, we set up MetalLB as our load balancer to make services reachable for "end users".
Load balancer
Since K3s is fully compatible with Helm out of the box, we simply use Helm to install MetalLB.
We assign a range of IPs that the load balancer can use to expose services. In my case I use 10.1.20.50 to 10.1.20.60.
We start by adding the MetalLB repo:
helm repo add metallb https://metallb.github.io/metallb
In the next step we install it, and afterwards apply an IPAddressPool and an L2Advertisement so MetalLB knows which addresses it may hand out. Create a file metallb-values.yaml and fill it with this text:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.1.20.50-10.1.20.60
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
Next, we install MetalLB.
helm install metallb metallb/metallb --create-namespace \
--namespace metallb-system --wait
And finally, we apply the metallb-values.yaml to it.
kubectl apply -f metallb-values.yaml
Now, verify all pods are running with kubectl get pods -n metallb-system:
NAME READY STATUS RESTARTS AGE
metallb-controller-6cb58c6c9b-jtd58 1/1 Running 0 3m24s
metallb-speaker-4s862 4/4 Running 0 3m24s
metallb-speaker-lv2bf 4/4 Running 0 3m24s
metallb-speaker-m2wcz 4/4 Running 0 3m24s
metallb-speaker-rrdw5 4/4 Running 0 3m24s
metallb-speaker-tg2xr 4/4 Running 0 3m24s
metallb-speaker-vn24x 4/4 Running 0 3m24s
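If a particular service should always receive the same address from the pool, MetalLB (v0.13+) honors an annotation for that. A sketch with hypothetical service and label names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                 # hypothetical service name
  namespace: default
  annotations:
    metallb.universe.tf/loadBalancerIPs: 10.1.20.52   # must lie inside the pool
spec:
  type: LoadBalancer
  selector:
    app: my-app                    # hypothetical pod label
  ports:
  - port: 80
    targetPort: 8080
```

Without the annotation, MetalLB simply assigns the next free address from the range, as we will see below.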
Kubernetes Dashboard
To test everything, we install the Kubernetes Dashboard and set it up on our loadbalancer.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
The Dashboard is now running; we just need some more config to expose it through the load balancer.
Create a file dashboard-lb.yaml and fill it with the following code:
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard-lb
  namespace: kubernetes-dashboard
spec:
  type: LoadBalancer
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
Now apply it with kubectl apply -f dashboard-lb.yaml. After a few seconds, check which IP it got assigned with kubectl -n kubernetes-dashboard get svc:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.43.16.74 <none> 8000/TCP 2m6s
kubernetes-dashboard ClusterIP 10.43.99.34 <none> 443/TCP 2m6s
kubernetes-dashboard-lb LoadBalancer 10.43.125.0 10.1.20.50 443:30167/TCP 119s
Now we can open https://10.1.20.50 in the web browser, where we are presented with the Dashboard's login page.
As per the documentation from the Kubernetes Dashboard:
To protect your cluster data, Dashboard deploys with a minimal RBAC configuration by default. Currently, Dashboard only supports logging in with a Bearer Token. To create a token for this demo, you can follow our guide on creating a sample user.
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
Because I want to access the Dashboard frequently, I went with a long-lived bearer token. I created a file db-admin.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: v1
kind: Secret
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token
And applied it with kubectl apply -f db-admin.yaml. The next step is to create a ClusterRoleBinding, crb.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
Apply it, as always: kubectl apply -f crb.yaml
Now we extract the token with:
kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath='{.data.token}' | base64 -d
The token we get can now be used to log in to our Dashboard. Once we're in, we can check out all the nodes.
The next thing we really need is an ingress controller. K3s ships with Traefik, but we disabled it during installation because I wanted to use the NGINX ingress controller.
NGINX Ingress Controller
With the ingress controller we can assign dns names to services and expose them. Later on we can also automatically generate Certificates for those hostnames to securely use https.
We install the community ingress-nginx chart, pulling it directly from the project's chart repository via --repo (so no helm repo add is needed). We use the default settings but create a dedicated namespace:
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace
Now we create a service that maps ports 80 and 443 of NGINX to MetalLB. This way NGINX gets one of our load balancer IPs and is reachable via HTTP(S) on that IP. We create ingress-controller-lb.yaml:
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller-loadbalancer
  namespace: ingress-nginx
spec:
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  type: LoadBalancer
And apply it: kubectl apply -f ingress-controller-lb.yaml. Now we can deploy services and pods and expose them through NGINX. One final kubectl get services -n ingress-nginx shows which IP from our range the ingress controller got:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.43.147.154 10.1.20.51 80:30240/TCP,443:32350/TCP 5h31m
ingress-nginx-controller-admission ClusterIP 10.43.78.109 <none> 443/TCP 5h31m
In my example it got 10.1.20.51.
Example: deploying the uptime-kuma container
To run a container on our cluster we create a Deployment, a Service that maps its port(s), and an Ingress to expose it through NGINX:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: uptime-kuma
  namespace: uptime-kuma
spec:
  selector:
    matchLabels:
      name: uptime-kuma-nginx-backend
  template:
    metadata:
      labels:
        name: uptime-kuma-nginx-backend
    spec:
      containers:
      - name: backend
        image: louislam/uptime-kuma:1
        imagePullPolicy: Always
        ports:
        - containerPort: 3001
---
apiVersion: v1
kind: Service
metadata:
  name: uptime-kuma-nginx-service
  namespace: uptime-kuma
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 3001
  selector:
    name: uptime-kuma-nginx-backend
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma-nginx-ingress
  namespace: uptime-kuma
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  rules:
  - host: kuma.homelab.local
    http:
      paths:
      - path: /(.*)
        pathType: Prefix
        backend:
          service:
            name: uptime-kuma-nginx-service
            port:
              number: 8080
We save this as kuma-http.yaml, and before deploying it we create its namespace with kubectl create namespace uptime-kuma. Then we deploy it with kubectl apply -f kuma-http.yaml.
Since we set the host to the FQDN kuma.homelab.local, we have to make sure our DNS server points this name to 10.1.20.51. Now when we open it in the browser, we are greeted by the setup page of uptime-kuma!
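For a quick test without touching the DNS server, a hosts-file entry on the client machine works too (/etc/hosts on Linux/macOS, C:\Windows\System32\drivers\etc\hosts on Windows):

```
10.1.20.51  kuma.homelab.local
```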