Understanding Kubernetes Multi-Tenancy: Models, Challenges, and Solutions
https://www.loft.sh/blog/understanding-kubernetes-multi-tenancy-models-challenges-and-solutions
Deep Dive into Kubernetes CPU Usage, Requests, and Limits
https://john-tucker.medium.com/deep-dive-into-kubernetes-cpu-usage-requests-and-limits-57b6d0dec625
We Threw Away 13 Years of Work for EKS
https://medium.com/gumgum-tech/we-threw-away-13-years-of-work-for-eks-b0fd8f53917c
Thirteen years of running in EC2.
Thirteen years of custom AMIs. Thirteen years of deployment pipelines put together with toothpicks and bubblegum. Thirteen years of launch scripts that really-do-seem-to-be-an-anti-pattern-but-hey-at-least-they-work.
And we threw it all away to run in EKS.
This is the choice we made at GumGum in early 2023, and this blog post covers the problems that led to this insane idea, and why this idea wasn’t so insane after all.
How we avoided an outage caused by running out of IPs in EKS
https://medium.com/adevinta-tech-blog/how-we-avoided-an-outage-caused-by-running-out-of-ips-in-eks-c831ab97d0e4
Solving IP exhaustion in EKS: Avoiding a network outage by implementing custom networking
A Deep Dive into Kubernetes Validating Admission Policy: The Native Alternative to Webhooks
https://medium.com/@chetanatole99/a-deep-dive-into-kubernetes-validating-admission-policy-the-native-alternative-to-webhooks-b35df05e6a5b
The Bootc Revolution: One Build Language for VMs and Containers
https://medium.com/@josephsims1/the-bootc-revolution-one-build-language-for-vms-and-containers-48ecdf7fc7e6
updatecli
https://github.com/updatecli/updatecli
Automatically open a PR on your GitOps repository when a third-party service publishes an update.
pepr
https://github.com/defenseunicorns/pepr
Pepr is on a mission to save Kubernetes from the tyranny of YAML, intimidating glue code, bash scripts, and other makeshift solutions.
ClusterSecret
https://github.com/zakkg3/ClusterSecret
The ClusterSecret operator makes sure every matching namespace has the secret available and up to date.
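To make the idea concrete, here is a minimal sketch of the reconcile decision such an operator has to make: which namespaces still need the secret created or refreshed. It is not ClusterSecret's actual code. The function name, the parameter names, and the use of full-string regex matching are assumptions chosen for illustration, and the cluster state is passed in as plain dictionaries rather than read from the Kubernetes API.

```python
import re
from typing import Dict, Iterable, List, Mapping


def namespaces_to_sync(
    namespaces: Iterable[str],
    match_patterns: Iterable[str],
    avoid_patterns: Iterable[str],
    desired_data: Mapping[str, str],
    current: Mapping[str, Mapping[str, str]],
) -> List[str]:
    """Return the namespaces whose copy of the secret is missing or stale.

    All names here are hypothetical. `current` maps a namespace to the secret
    data it holds right now; a namespace absent from `current` has no copy.
    """
    match = [re.compile(p) for p in match_patterns]
    avoid = [re.compile(p) for p in avoid_patterns]

    pending = []
    for ns in namespaces:
        # Assumption: a pattern must match the whole namespace name.
        if not any(p.fullmatch(ns) for p in match):
            continue
        if any(p.fullmatch(ns) for p in avoid):
            continue
        # Sync when the secret is absent or its contents differ.
        if ns not in current or dict(current[ns]) != dict(desired_data):
            pending.append(ns)
    return sorted(pending)


if __name__ == "__main__":
    todo = namespaces_to_sync(
        namespaces=["default", "prod-api", "prod-db", "prod-web", "kube-system"],
        match_patterns=["prod-.*", "default"],
        avoid_patterns=["prod-web"],
        desired_data={"token": "abc"},
        current={"default": {"token": "abc"}, "prod-api": {"token": "old"}},
    )
    print(todo)
```

A real operator would run this kind of check on every namespace or secret change event and then write the secret into each returned namespace.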
ls-lint
https://github.com/loeffel-io/ls-lint
An extremely fast directory and filename linter - Bring some structure to your project filesystem
zxc
https://github.com/hail-hydrant/zxc
Terminal-based intercepting proxy written in Rust, with tmux and Vim as the user interface.
graft
https://github.com/orbitinghail/graft
Transactional page storage engine supporting lazy partial replication to the edge. Optimized for scale and cost over latency. Leverages object storage for durability.
liam
https://github.com/liam-hq/liam
Automatically generates beautiful and easy-to-read ER diagrams from your database.
How We Run Terraform At Scale
https://benchling.engineering/how-we-run-terraform-at-scale-da7bb75dc394
Managing over 165k cloud resources across hundreds of workspaces could seem daunting. But for us, it’s just another day at Benchling. Here’s how we do it.
We currently have:
- 165k cloud resources under management
- 625 Terraform workspaces
- 38 AWS accounts
- 170 engineers (40 of whom are infra specialists)
We perform:
- 225 infrastructure releases daily (terraform apply operations)
- 723 plans daily (terraform plan operations)
We’ve been successfully operating Benchling’s infrastructure release system for the past two years (spoiler: it’s Terraform Cloud). Over that time we’ve doubled our infrastructure footprint with minimal additional release overhead.
openinfraquote
https://github.com/terrateamio/openinfraquote
OpenInfraQuote is a lightweight, open-source CLI tool for estimating infrastructure costs using Terraform plan and state files. It runs locally or in CI/CD. No backend, no API keys, no external services.
Things that go wrong with disk IO
https://notes.eatonphil.com/2025-03-27-things-that-go-wrong-with-disk-io.html
There are a few interesting scenarios to keep in mind when writing applications (not just databases!) that read and write files, particularly in transactional contexts where you actually care about the integrity of the data and when you are editing data in place (versus copy-on-write for example).
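As a rough illustration of the alternative to editing data in place, the sketch below writes a new copy of a file, forces it to disk, and only then swaps it in under the target name. It is not taken from the linked post. The helper name `atomic_write` is hypothetical, and the pattern shown assumes a POSIX-like filesystem where a rename within one directory is atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` without editing it in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the OS to push the new contents to stable storage
            # before the new file becomes visible under the real name.
            os.fsync(tmp.fileno())
        # Swap the new file in; readers see the old or the new version.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temporary file and leave `path` untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory entry so the rename itself is persisted
    # (only where the platform exposes O_DIRECTORY).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "state.bin")
    atomic_write(target, b"version 1")
    atomic_write(target, b"version 2")
    with open(target, "rb") as f:
        print(f.read())
```

Even this pattern relies on `fsync` behaving as documented, which is exactly the kind of assumption the post asks readers to keep in mind.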
Hot Take: I Want Execs Closer to Incidents, Not Farther
https://uptimelabs.io/hot-take-i-want-execs-closer-to-incidents-not-farther
Improving Kubernetes-Mixin API Server Rules Consistency
https://medium.com/codex/improving-kubernetes-mixin-api-server-rules-consistency-1c0d727e8160
A journey into troubleshooting a subtle, insidious issue that can occur with Prometheus recording rules.
Simplifying Kubernetes Limits Range with sxlimits
https://startxfr.medium.com/simplifying-kubernetes-limits-range-with-sxlimits-604a96eaaf2c