github.com/benspilker/proxmox-k3s
Completing this project took a ridiculous amount of time and effort—months in the making—but we did it. If there’s one thing I’ve learned, it’s that in development, you’re never really done. There’s always room to improve, and every script evolved throughout this video series.
Let’s revisit the Proxmox host. We originally allocated 300GB of disk space, with only 200GB usable on local-lvm. After deploying 9 VMs, Kubernetes, Rancher, and a persistent Nextcloud instance, we’re using about 150GB, all on thin-provisioned disks. That relatively modest footprint is the payoff of thin provisioning: space on local-lvm is only consumed as the VMs actually write data.
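If you want to verify this on your own host, standard Proxmox tooling shows it. A minimal check, assuming the default pve/data thin pool backing local-lvm:

# Storage usage as Proxmox reports it
pvesm status
# Thin-pool usage for local-lvm; the Data% column shows how much space is actually written
lvs pve/data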
Along the way, I ran into major challenges. Early takes of this series failed because VMs didn’t have enough RAM or disk to fully deploy everything. After much trial and error, I finalized the right parameters in Script 2B.
Another hurdle: CPU cores. My physical server runs VMware and has 24 cores, but it also hosts other VMs. I gave the nested Proxmox host 8 cores, yet the 9 VMs need 22 cores in total. That oversubscription caused contention: VMs had to wait for host CPU availability, which affected Kubernetes deployment timing, especially in Script 6. For example, if the MariaDB pod wasn’t ready, Nextcloud couldn’t connect. I had to build in retry logic to handle these timing issues reliably.
In one take, I mistakenly gave the Proxmox VM only 4 cores, which made the problem worse. It became clear just how much certain pods depended on others. Even when a pod showed a “ready” state, internal components weren’t always initialized. A good example: Nextcloud couldn’t create a new database if MariaDB wasn’t fully ready—even if it appeared online. Another issue was that attempts to connect could crash the MariaDB pod under CPU pressure. So I added retry blocks and more graceful handling.
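To give a feel for what that looks like, here is a rough sketch of such a retry block. It is not the actual code from Script 6, and MARIADB_ROOT_PASSWORD is just a placeholder for however the script supplies credentials:

# Sketch: keep trying to create the Nextcloud database until MariaDB actually accepts connections
MARIA_POD=$(kubectl get pods -n nextcloud --no-headers | grep maria | awk '{print $1}' | head -n 1)
until kubectl exec -n nextcloud "$MARIA_POD" -- \
  mysql -u root -p"$MARIADB_ROOT_PASSWORD" -e "CREATE DATABASE IF NOT EXISTS nextcloud;" 2>/dev/null; do
  echo "MariaDB not ready yet, retrying in 15 seconds..."
  sleep 15
done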
To demonstrate these dependency and timing problems, I shut down all VMs with Script 2C, powered off the Proxmox host, reduced it to 4 CPUs in VMware, then powered it back on and restarted the VMs. This setup simulates high CPU contention.
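Script 2C handles the shutdown for you, but the manual equivalent on the Proxmox host would look roughly like this:

# Ask every VM on this Proxmox host to shut down cleanly via ACPI
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
  qm shutdown "$vmid"
done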
I even went further—dropping all VMs to 1 CPU each and the Proxmox host to just 2. Kubernetes barely ran, but it exposed failure points. I left it overnight to let components settle, and by morning, everything was running again. I reran Script 6 to show how the built-in fail-safes handled deployment under stress.
One major fail-safe covers creating the Nextcloud database in MariaDB and connecting to it: you can watch the script try, fail, and retry until it finally succeeds. Another handles the case where MariaDB crashes during a connection attempt and leaves odd errors behind; eventually the script recognizes a corrupted deployment from errors like “maintenance: install is not defined.”
At that point, the script destroys the failed deployment, creates a new namespace, and starts again. After the restart, the deployment completes successfully, and I’m able to log into Nextcloud.
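Conceptually, that recovery path boils down to something like the sketch below. The real detection and redeploy logic lives in Script 6; this version just assumes the standard Nextcloud image layout and matches on the error text mentioned above:

# Sketch: detect a corrupted install and start over with a fresh namespace
NEXTCLOUD_POD=$(kubectl get pods -n nextcloud --no-headers | grep -v maria | awk '{print $1}' | head -n 1)
STATUS=$(kubectl exec -n nextcloud "$NEXTCLOUD_POD" -- su -s /bin/sh -c "php occ status" www-data 2>&1 || true)
if echo "$STATUS" | grep -q "install is not defined"; then
  echo "Deployment looks corrupted, tearing it down and redeploying"
  kubectl delete namespace nextcloud
  kubectl create namespace nextcloud
  # ...then reapply the MariaDB and Nextcloud manifests the way Script 6 does
fi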
I also wanted to show manually logging into a pod. While the script handles this, it’s useful to see how. To access the Nextcloud pod shell:
# Grab the first pod in the nextcloud namespace that isn't MariaDB, then open a shell in it
POD_NAME=$(kubectl get pods -n nextcloud --no-headers | grep -v maria | awk '{print $1}' | head -n 1)
kubectl exec -it "$POD_NAME" -n nextcloud -- /bin/bash
Inside the pod, you can use ls, cd, and cat to navigate around and inspect the Nextcloud config file, and then edit it to make changes such as adding a trusted domain.
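Instead of hand-editing config.php, the occ tool can make the same change from inside the pod. For example, with a placeholder index and domain:

# The config file lives at /var/www/html/config/config.php
# Add a trusted domain via occ, run as the www-data user
su -s /bin/sh -c "php occ config:system:set trusted_domains 1 --value=nextcloud.example.lan" www-data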
This is a good segue into talking about access levels. In this scripted approach, I have full access and can use kubectl exec. But in many production environments, we don’t get node or pod-level access—especially in managed Kubernetes clusters. That’s where GitOps comes in. We’ll explore that method in the next video using a different deployment approach tailored for those environments.