DevOps&SRE Library 6412

Amazon EKS Auto Mode vs Azure AKS Automatic: The Better Managed Kubernetes Solution?

https://pixelrobots.co.uk/2024/12/amazon-eks-auto-mode-vs-azure-aks-automatic-the-better-managed-kubernetes-solution

A CNI 'Chicken-and-Egg' Dilemma: How Does Calico Assign IPs to Itself?

While researching CNI recently, I recalled an interesting issue I encountered while developing network plugins and investigating Calico: Calico assigns IP addresses to the Pods of its own components (e.g., calico-kube-controllers). How does Calico achieve this? From the installation of the Calico network plugin to assigning IPs to its own Pods, what happens at the underlying level?

This essentially poses a “chicken-and-egg” problem: running a Pod requires the CNI plugin, while the CNI plugin’s operation depends on the proper functioning of other Pods.

This analysis is based on Cilium v1.16.5, Calico v3.29.1, and Kubernetes v1.23.

https://midbai.com/en/post/cni-chicken-egg-problem

How we tested scaling to 10,000 Kubernetes clusters without missing a beat

https://www.spectrocloud.com/blog/how-we-tested-scaling-to-10-000-kubernetes-clusters-without-missing-a-beat

kro

This project aims to simplify the creation and management of complex custom resources for Kubernetes.

Kube Resource Orchestrator (kro) helps you define complex multi-resource constructs as reusable components in your applications and systems. It does this by providing a Kubernetes-native, vendor-agnostic way to define groupings of Kubernetes resources.

kro's fundamental custom resource is the ResourceGraphDefinition. A ResourceGraphDefinition defines collections of underlying Kubernetes resources. It can define any Kubernetes resources, either native or custom, and can specify the dependencies between them. This lets you define complex custom resources, and include default configurations for their use.

The kro controller will determine the dependencies between resources, establish the correct order of operations to create and configure them, and then dynamically create and manage all of the underlying resources for you.
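The ordering step described above can be illustrated with a topological sort. This is a minimal sketch of the general technique, not kro's actual implementation: the resource names and the `creation_order` helper are hypothetical, and the dependency map is written by hand rather than derived from a ResourceGraphDefinition.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resource graph: each key depends on the resources in its set.
# These names are illustrative only and are not taken from kro.
dependencies = {
    "namespace": set(),
    "configmap": {"namespace"},
    "deployment": {"namespace", "configmap"},
    "service": {"deployment"},
}


def creation_order(deps):
    """Return resource names ordered so that dependencies come first.

    Raises graphlib.CycleError if the graph contains a dependency cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = creation_order(dependencies)
print(order)
```

Deleting the resources would walk the same order in reverse. A cycle (for example, two resources that depend on each other) has no valid order, which is why the sorter raises an error instead of returning a result.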

kro is Kubernetes native and integrates seamlessly with existing tools to preserve familiar processes and interfaces.

https://github.com/kro-run/kro

The Hidden Risk of Running WordPress on Kubernetes: Debugging an Unexpected Downtime Issue

https://medium.com/1000farmacie/the-hidden-risk-of-running-wordpress-on-kubernetes-debugging-an-unexpected-downtime-issue-e810bf4fb577

Understanding the 1MB Limit of Etcd in Kubernetes: Challenges with Helm Deployments

https://logeshbalu1998.medium.com/understanding-the-1mb-limit-of-etcd-in-kubernetes-challenges-with-helm-deployments-47ef41f37e9c

kubewall

A single binary to manage multiple Kubernetes clusters.

kubewall provides a simple, rich, real-time interface to manage and investigate your clusters.

https://github.com/kubewall/kubewall

stunner

A Kubernetes media gateway for WebRTC.

https://github.com/l7mp/stunner

Terratags: Enforce Tags on your AWS Terraform configuration

https://dev.to/quixoticmonk/terratags-enforce-tags-on-your-aws-terraform-configuration-1ck5

Azure Verified Module - Azure Landing Zones

In this article, we take a look at the Azure Verified Module for Azure Landing Zones, and how we can customise deployments.

P1: https://mikeguy.co.uk/posts/azure-verified-module-landing-zones-part-1

P2: https://mikeguy.co.uk/posts/azure-verified-module-landing-zones-part-2

What I Really Mean When I Say “Good Communication” in Incident Response

https://uptimelabs.io/good-communication-in-incident-response

As a Seasoned K8s Expert: An In-Depth Analysis of the OpenAI’s Incident and Mitigation Strategies

On December 11, 2024, OpenAI experienced a major outage caused by a failure in the Kubernetes cluster control plane. For outsiders, this may simply seem like an interesting incident, but as an insider, I analyzed this failure from a technical perspective.

https://midbai.com/en/post/how-to-avoid-openai-incident

Taming the Wild West of Research Computing: How Policies Saved Us a Thousand Headaches

Harnessing the power of policy-driven governance in shared computing environments

https://alessandropomponio.medium.com/taming-the-wild-west-of-research-computing-how-policies-saved-us-a-thousand-headaches-9432558f5740

We’re leaving Kubernetes

Kubernetes seems like the obvious choice for building out remote, standardized and automated development environments. We thought so too and have spent six years invested in making the most popular cloud development environment platform at internet scale. That’s 1.5 million users, where we regularly see thousands of development environments per day. In that time, we’ve found that Kubernetes is not the right choice for building development environments.

https://www.gitpod.io/blog/we-are-leaving-kubernetes

Reducing Pod Startup Time for Java Application on EKS

https://medium.com/@balu8095/reducing-pod-startup-time-for-java-application-on-eks-a4fc80482039

How It Works — Validating Admission Policy

https://ihcsim.medium.com/how-it-works-validating-admission-policy-0664d23ce230

Istio-Proxy Chaos in the Middle of a Snowy Morning

December 4th, 2024, started as a peaceful, snowy morning. Around 8 AM, I settled into my work-from-home routine with a freshly brewed coffee. My usual workflow:

1. Check the Production dashboard to ensure everything is running smoothly — and it was.
2. Check my email and Slack to see if any team member needs help.
3. Open JIRA, pick up a task and get ready to dive into work.

There were no pressing issues to address. I opened JIRA and picked up a task to migrate one of our Infrastructure as Code repositories from Terragrunt to Terraform. Why we are doing that is a topic for another post.

No such luck! The peace and serenity didn’t last long. An alert popped up: one of our production services had gone down. What started as a calm Wednesday morning quickly turned into a troubleshooting adventure.

https://medium.com/@zehendiaries/istio-proxy-chaos-in-the-middle-of-a-snowy-morning-6fe437cf3996

Demistifying Istio Gateways

https://medium.com/@arivermar/demistifying-istio-gateways-762d37070431

ETCD Production setup with TLS

https://blog.mohsen.co/etcd-production-setup-with-openssl-2b9ecd7e00d5

Mastering Compute Efficiency: Dynamic GPU Partitioning Strategies for Kubernetes-Based ML Systems

https://yashmehra2411.medium.com/mastering-gpu-efficiency-dynamic-partitioning-strategies-for-kubernetes-based-ml-systems-75100c94112b

2025/07/08 18:01:14