Kubernetes is not the first thing that comes to mind when I think of "understanding where their code is running and what it's doing"...
Even just an “idle” Kubernetes system is a behemoth to comprehend…
Kubernetes is etcd, apiserver, and controllers. That's exactly as many components as your average MVC app. The control-loop thing is interesting, and there are a few "kinds" of resources to get used to, but why is it always presented as this insurmountable complexity?
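The control-loop idea itself fits in a handful of lines. A purely illustrative sketch in Go (made-up names, no real API machinery):

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative only: desired vs. observed replica counts for some workload.
type state struct {
	desired  int
	observed int
}

// reconcile nudges the observed state one step toward the desired state on
// each call -- the essence of every Kubernetes controller.
func reconcile(s *state) {
	switch {
	case s.observed < s.desired:
		s.observed++ // "start" a replica
		fmt.Println("started a replica, now at", s.observed)
	case s.observed > s.desired:
		s.observed-- // "stop" a replica
		fmt.Println("stopped a replica, now at", s.observed)
	default:
		fmt.Println("in sync at", s.observed)
	}
}

func main() {
	s := &state{desired: 3}
	for i := 0; i < 5; i++ {
		reconcile(s)
		time.Sleep(100 * time.Millisecond) // real controllers watch the apiserver instead of polling
	}
}
```

Everything else is layering: the apiserver and etcd store the desired state, and kubelet plus the controllers do the observing and correcting.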
I ran into a VXLAN checksum offload kernel bug once, but otherwise this thing is just solid. Sure it's a lot of YAML but I don't understand the rep.
…and containerd and csi plugins and kubelet and cni plugins and kubectl and kube-proxy and ingresses and load balancers…
Sure, at some point there are too many layers to count, but I wouldn't say any of this is "Kubernetes". What people tend to be hung up on is the difficulty of Kubernetes compared to `docker run` or `docker compose up`. That is what I am surprised about.
I never had any issue with kubelet, or kube-proxy, or CSI plugins, or CNI plugins. That is after years of running a multi-tenant cluster in a research institution. I think about those about as much as I think about ext4, runc, or GRUB.
And CNI problems are extremely normal. Pretty much anyone who didn't just use Weave Net and call it a day has had to spend quite a bit of time figuring it out. If you already know networking by heart it's obviously going to be easier, but few devs do.
You definitely can run Kubernetes without running Ceph or any storage system, and you already rely on a distributed storage system if you use the cloud whether you use Kubernetes or not. So I wouldn't count this as added complexity from Kubernetes.
If you discount issues like that, you can safely say that it's impossible to have any issues with CSI, because the issue is always going to be with one of its implementations.
That feels a little disingenuous, but maybe that's just me.
For example, would you say AWS EBS is part of Kubernetes?
But then, there's always a lot of complexity and abstraction. Certainly, most software people don't need to know everything about what a CPU is doing at the lowest levels.
I mean, in my homelab I do have Kubernetes and no LB in front, but it's a homelab, for fun and for learning K8s internals. But in a professional environment...
step one: draw a circle
step two: import the rest of the owl
Go back to good ol' corosync/pacemaker clusters with XML and custom scripts to migrate IPs and set up firewall rules (and if you have someone writing them for you, why don't you have people managing your k8s clusters?).
Or buy something from a cloud provider that "just works" and eventually go down in flames with their Indian call centers doing their best, but with limited access to engineering to understand why service X is misbehaving for you and trashing your customers' data. It's trade-offs all the way.
Do you understand you're referring to optional components and add-ons?
> and kubectl
You mean the command line interface that you optionally use if you choose to do so?
> and kube-proxy and ingresses and load balancers…
Do you understand you're referring to whole classes of applications you run on top of Kubernetes?
I get that you're trying to make a mountain out of a molehill. Just understand that you can't argue that something is complex by giving, as your best examples, a bunch of things that aren't really tied to it.
It's like trying to claim Windows is hard, and then your best example is showing a screenshot of AutoCAD.
CSI is optional: you can just not use persistent storage (use the S3 API or whatever), or declare PersistentVolumes that are bound to a single machine or a group of machines (a shared NFS mount or whatever).
I don't know how GP thinks you could run without the other bits though. You do need kubelet and a container runtime.
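To make the no-CSI option concrete: a PersistentVolume backed by a plain NFS export needs no CSI driver at all. A rough sketch using the Go API types (the server name and export path below are made up):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A PersistentVolume backed by a plain NFS export -- no CSI driver involved.
	pv := corev1.PersistentVolume{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "PersistentVolume"},
		ObjectMeta: metav1.ObjectMeta{Name: "shared-nfs"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("100Gi"),
			},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteMany},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				NFS: &corev1.NFSVolumeSource{
					Server: "nfs.example.internal", // placeholder
					Path:   "/exports/data",        // placeholder
				},
			},
		},
	}

	// Print the equivalent manifest you could also just kubectl apply.
	out, _ := yaml.Marshal(pv)
	fmt.Println(string(out))
}
```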
For some applications these people are absolutely right, but they've persuaded themselves that that means it's the best way to handle all use cases, which makes them see Kubernetes as way more complex than is necessary, rather than as a roll-your-own ECS for those who would otherwise truly need a cloud provider.
I assume everyone wants to be in control of their environment. But with so many ways to compose your infra that means a lot of different things for different people.
K8s is meant to be operated by one class of engineers and used by another. Just like you have DBAs, sysadmins, etc., maybe your devops should have more systems experience beyond Terraform.
Sir, I upvoted you for your wonderful sense of humour.
Some bash and Ansible and EC2? That is usually what Kubernetes haters suggest one does to simplify.
The main pain point I personally see is that everyone goes 'just use Kubernetes', and this is an answer, but it is not the answer. The way it steamrolls all conversations leads to a lot of the frustration around it, in my view.
I love that Kubernetes lovers tend to forget that Kubernetes is just one tool, and they believe that the only possible alternative to this coolness is sweaty sysadmins writing bash scripts in a dark room.
I thought Mesos was kinda dead nowadays, good to hear it's still kicking. Last time I used it, the networking was a bit annoying: it couldn't provide virtual network interfaces, only ports.
It seems like if you are going to operate these things, picking a solution with a huge community and in active development feels like the smart thing to do.
Nomad is very nice to use from a developer perspective, and it's nice to hear infrastructure people preferring it. From the outside, the reason people pick Kubernetes seems to be the level of control infra and security teams want over things like networking and disk.
I would argue against Kubernetes in particular situations, and even recommend Ansible in some cases where it is a better fit for the given circumstances. Do you consider me a Kubernetes hater?
Point is, Kubernetes is a great tool. In particular situations. Ansible is a great tool. In particular situations. Even bash is a great tool. In particular situations. But Kubernetes even could be the worst tool if you choose unwisely. And Kubernetes is not the ultimate infrastructure tool. There are alternatives, and there will be new ones.
Etcd is truly a horrible data store, even the creator thinks so.
For anyone unfamiliar with this, the "official limits" are here; as of 1.32 it's 5000 nodes, max 300k containers, etc.
https://kubernetes.io/docs/setup/best-practices/cluster-larg...
Maintaining a lot of clusters is super different than maintaining one cluster.
Also, please don't actually try to get near those limits; your etcd cluster will be very sad unless you're _very_ careful (think few deployments, few services, few namespaces, not using etcd events, etc.).
The department saw more need for storage than Kubernetes compute so that's what we're growing. Nowadays you can get storage machines with 1 PB in them.
The larger Supermicro or Quanta storage servers can easily handle 36 HDDs each, or even more.
So with just 16 of those with 36x24TB disks, that meets the ~14PB capacity mark (16 × 36 × 24 TB ≈ 13.8 PB), leaving 44 remaining nodes for other compute tasks, load balancing, NVMe clusters, etc.
Cluster networking can sometimes get pretty mind-bending, but honestly that's true of just containers on their own.
I think just that ability to schedule pods on its own requires about that level of complexity; you're not going to get a much simpler system if you try to implement things yourself. Most of the complexity in k8s comes from components layered on top of that core, but then again, once you start adding features, any custom solution will also grow more complex.
If there's one legitimate complaint when it comes to k8s complexity, it's the ad-hoc way annotations get used to control behaviour in a way that isn't discoverable or type-checked like API objects are, and you just have to be aware that they could exist and affect how things behave. A huge benefit of k8s for me is its built-in discoverability, and annotations hurt that quite a bit.
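The underlying issue is that annotations are just an untyped map[string]string on the object's metadata: nothing validates them, and nothing tells you which keys a given controller actually honours. A small illustrative Go sketch (the ingress name is made up; the first key is the usual ingress-nginx body-size annotation, the second is a deliberate typo):

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The API server will happily store both keys below; only documentation
	// (or source-diving) tells you the second one is a typo that the ingress
	// controller will silently ignore.
	ing := networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{
			Name: "my-ingress", // made-up name
			Annotations: map[string]string{
				"nginx.ingress.kubernetes.io/proxy-body-size": "10m",
				"nginx.ingress.kubernetes.io/proxy-bodysize":  "10m", // typo, silently ignored
			},
		},
	}

	for k, v := range ing.Annotations {
		fmt.Printf("%s = %s\n", k, v)
	}
}
```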
I would ask a different question. How many people actually need to understand implementation details of Kubernetes?
Look at any company. They pay engineers to maintain a web app/backend/mobile app. They want features to be rolled out, and they want their services to be up. At which point does anyone say "we need an expert who actually understands Kubernetes"?
I have to wonder how many people actually understand when to use K8s or Docker. Docker is not a magic bullet, and can actually be a footgun when it's not the right solution.
In the end it's a scheduler for Docker containers on a bunch of virtual or bare-metal machines. Once you get that in your head, life becomes much easier.
The only thing I'd really love to see from an ops perspective is a way to force-revive crashed containers for debugging. Yes, one shouldn't have to debug cattle, just haul the carcass off and get a new one... but I still prefer to know why the cattle died.
* Host hundreds or thousands of interacting containers across multiple teams in a sane manner
* Lets you manage and understand, to the full extent, how that is done
Of course there are tons of organizations that can (and should) easily give up one of these, but if you need both, there isn't a better choice right now.
What looks like absurd scale to one team is a regular Tuesday for another, because "scale" is completely meaningless without context. We don't balk at a single machine running dozens of processes for a single web browser, so we shouldn't balk at something running dozens of containers to do something that creates value somehow. Scale that up by the number of devs/customers and you can see how thousands or hundreds of thousands can happen easily.
Also the cloud vendors make it easy to have these problems because it's super profitable.
* H: "kubernetes [at planetary scale] is too complex"
* A: "you can run it on a toaster and it's simpler to reason about than systemd + pile of bash scripts"
* H: "what's the point of single node kubernetes? I'll just SSH in and paste my bash script and call it a day"
* A: "but how do you scale/maintain that?"
* H: "who needs that scale?"
What's a bit different is that we're creating our own products, not renting people out to others, so having a uniform hosting platform is an actual benefit.
I mean, if that's your starting point, then complexity is absolutely a given. When folks complain about the complexity of Kubernetes, they are usually complaining about the complexity relative to a project that runs a frontend, a backend, and a postgres instance...
People started using K8s for training, where you already had a network-isolated cluster. Extending the K8s+container pattern to multi-tenant environments is scary at best.
I didn't understand the following part though.
> Instead, we burned months trying (and ultimately failing) to get Nvidia’s host drivers working to map virtualized GPUs into Intel Cloud Hypervisor.
Why was this part so hard? Doing PCI passthrough with the Cloud Hypervisor (CH) is relatively common. Was it the transition from Firecracker to CH that was tricky?
Bonus points for writing a basic implementation from first principles, capturing the essence of the problem Kubernetes was really meant to solve.
The 100-page Kubernetes book, Andriy Burkov style.
https://github.com/kelseyhightower/kubernetes-the-hard-way
It probably won't answer the "why" (although any LLM can answer that nowadays), but it will definitely answer the "how".
Thanks for taking the time to share the walkthrough.
I mean an understanding from the point of view of the internals, not so much the user perspective.
What would be the point of it? Think about it:
- Kubernetes is an interface, not a specific implementation,
- the bulk of the industry has standardized on managed services, which means you have no idea what the actual internals driving your services are,
- so you read up on the exact function call that handles a specific aspect of pod autoscaling. That was a nice read. How does that make you a better engineer than those who didn't?
I just want to know how you'd implement something that would load your services and dependencies from a config file, bind them all together, distribute the load across several local VMs, and keep working if I kill a service or increase the load.
In less than 1000 lines.
Then you seem to be confused, because you're saying Kubernetes but what you're actually talking about is implementing a toy container orchestrator.
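And a toy one really is small. A crude Go sketch of just the keep-it-running part, with made-up service names and commands, no containers, no multi-node scheduling, and no config parsing:

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// A toy "orchestrator": keep N copies of each command running and restart
// them when they die. Only the reconcile-and-restart skeleton, nothing more.
type service struct {
	name     string
	command  []string
	replicas int
}

// superviseReplica restarts one copy of the service forever.
func superviseReplica(svc service, id int) {
	for {
		cmd := exec.Command(svc.command[0], svc.command[1:]...)
		log.Printf("%s[%d]: starting", svc.name, id)
		if err := cmd.Run(); err != nil {
			log.Printf("%s[%d]: exited: %v", svc.name, id, err)
		} else {
			log.Printf("%s[%d]: exited cleanly", svc.name, id)
		}
		time.Sleep(time.Second) // crude backoff before "rescheduling"
	}
}

func main() {
	// In a real version this would be parsed from a config file.
	services := []service{
		{name: "web", command: []string{"sleep", "5"}, replicas: 2},
		{name: "worker", command: []string{"sleep", "3"}, replicas: 1},
	}

	for _, svc := range services {
		for i := 0; i < svc.replicas; i++ {
			go superviseReplica(svc, i)
		}
	}
	select {} // block forever while the supervisors run
}
```

Everything past this point, packing workloads onto multiple machines, service discovery, rolling updates, storage, is where the real line count comes from.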
I really wonder why this opinion is so commonly accepted by everyone. I get that not everything needs most Kubernetes features, but it's useful. The Linux kernel is a dreadfully complex beast full of winding subsystems and screaming demons all over: eBPF, namespaces, io_uring, cgroups, SELinux, and so much more, all interacting with each other in sometimes surprising ways.
I suspect there is a decent likelihood that a lot of sysadmins have a more complete understanding of what's going on in Kubernetes than in Linux.
I think there's a degree of confusion over your understanding of what Kubernetes is.
Kubernetes is a platform to run containerized applications. Originally it started as a way to simplify the work of putting together clusters of COTS hardware, but since then its popularity has driven it to become the platform, rather than an abstraction over other platforms.
What this means is that Kubernetes is now a standard way to deploy cloud applications, regardless of complexity or scale. Kubernetes is used to deploy apps to Raspberry Pis, one-box systems running under your desk, your own workstation, one or more VMs running on random cloud providers, and AWS. That's it.
My point is that the mere notion of "a system that's actually big or complex enough to warrant using Kubernetes" is completely absurd, and communicates a high degree of complete cluelessness over the whole topic.
Do you know what's a system big enough for Kubernetes? It's a single instance of a single container. That's it. Kubernetes is a container orchestration system. You tell it to run a container, and it runs it. That's it.
See how silly it all becomes once you realize these things?
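"Run a container" really is just one small API object. A rough client-go sketch (the image and names are arbitrary examples):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the usual kubeconfig; error handling kept minimal for brevity.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The whole ask: "run this container somewhere".
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hello"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "hello", Image: "nginx:1.27"},
			},
		},
	}

	created, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod", created.Name)
}
```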
https://www.ibm.com/docs/en/cics-ts/6.x?topic=sysplex-parall...
even if it wasn't as scalable as Kube. On the other hand, a cluster of 32 CMOS mainframes could handle any commercial computing job that people were doing in the 1990s.
That's assuming you have a solid foundation in the nuts and bolts of how computers work to begin with.
If you just jumped into software development without that background, well, you're going to end up in the latter pool of developers as described by the parent comment.
Containers are inherently difficult to sum up in a sentence. Perhaps the most reasonable comparison is to liken them to a "lightweight" VM, but the reasons people use them are so drastically different from VMs at this point. The most common use case for containers is having a decent toolchain for simple, somewhat reproducible software environments. Containers are mostly a hack to get around the mess we've made in software.
A VM, in contrast, fakes the existence of an entire computer, hardware and all. That fake hardware comes with a fake disk on which you put a new root filesystem, but it also comes with a whole lot of other virtualization. In a VM, CPU instructions (eg CPUID) can get trapped and executed by the VM to fake the existence of a different processor, and things like network drivers are completely synthetic. None of that happens with containers. A VM, in turn, needs to run its own OS to manage all this fake hardware, while a container gets to piggyback on the management functions of the host and can then include a very minimal amount of stuff in its synthetic root.
I understand them better than you might think. I'm well aware of how "tasks" work in Linux specifically, and I'm pretty comfortable working directly with clone.
Your explanation is great, but I intentionally went out of my way to not explain it and instead give a simple analogy. The entire point was that it's difficult to summarize.
It came from how Docker works: when you start a new container, it runs a single process in the container, as defined in the Dockerfile.
It's a simplification of what containers are capable of and how they do what they do, but that simplification is how it got popular.
Super easy if we talk about Linux. It's a process tree being spawned inside its own set of kernel namespaces, security measures, and a cgroup to provide isolation from the rest of the system.
Once you recursively expand all the concepts, you will have multiple dense paragraphs, which don't "summarize" anything, but instead provide full explanations.
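For the Linux case, the namespace part really is a few lines of Go (Linux only, run as root; cgroups, a new root filesystem, and network namespaces are deliberately left out):

```go
//go:build linux

package main

import (
	"os"
	"os/exec"
	"syscall"
)

// Spawn a shell in its own UTS, PID and mount namespaces -- the kernel
// primitives containers are built from. A real runtime would also set up
// cgroups, a pivoted root filesystem, users, and networking; this only
// shows the "process tree in its own namespaces" part.
func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```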
If you're running one team with all services trusting each other, you don't have the problems solved by these things. Whenever you introduce a CNCF component outside core Kubernetes, invest time in understanding it and why it does what it does. Nothing is "deploy and forget"; everything needs to be regularly checked and upgraded, and when issues come up you need some architecture-level understanding of the component to troubleshoot, because there are so many moving parts.
So if I can get away with writing my own cronjob in 1000 lines rather than installing something from GitHub with a Helm chart, I will go with the former option.
(Helm is crap, though you often won't have much choice.)
But yeah, the argument could have as well just said running code on a VPS directly, because that also gives you a good deal of control.
> The other group (increasingly large) just wants to `git push` and be done with it, and they're willing to spend a lot of (usually their employer's) money to have that experience. They don't want to have to understand DNS, linux, or anything else beyond whatever framework they are using.
I'm a "full full-stack" developer because I understand what happens when you type an address into the address bar and hit Enter - the DNS request that returns a CNAME record to object storage, how it returns an SPA, the subsequent XHR requests laden with and cookies and other goodies, the three reverse proxies they have to flow through to get to before they get to one of several containers running on a fleet of VMs, the environment variable being injected by the k8s control plane from a Secret that tells the app where the Postgres instance is, the security groups that allow tcp/5432 from the node server to that instance, et cetera ad infinitum. I'm not hooking debuggers up to V8 to examine optimizations or tweaking container runtimes but I can speak intelligently to and debug every major part of a modern web app stack because I feel strongly that it's my job to be able to do so (and because I've worked places where if I didn't develop that knowledge then nobody would have).
I can attest that this type of thinking is becoming increasingly rare as our industry continues to specialize. These considerations are now often handled by "DevOps Engineers" who crank out infra and seldom write code outside of Python and bash glue scripts (which is the antithesis to what DevOps is supposed to be, but I digress). I find this unfortunate because this results in teams throwing stuff over the wall to each other which only compounds the hand-wringing when things go wrong. Perhaps this is some weird psychopathology of mine but I sleep much better at night knowing that if I'm on the hook for something I can fix it once it's out in the wild, not just when I'm writing features and debugging it locally.