DevOps&SRE Library 6526

canine

Canine is an easy-to-use, intuitive deployment platform for Kubernetes clusters.


How We Migrated 30+ Kubernetes Clusters to Terraform

https://medium.com/learnings-from-the-paas/how-we-migrated-30-kubernetes-clusters-to-terraform-cd2b1cef8b84


How We Integrated Native macOS Workloads with Kubernetes

https://medium.com/agoda-engineering/how-we-integrated-native-macos-workloads-with-kubernetes-b4d3c14881a0


Why Our Pods Were Breaking Bad (and How We Fixed Them)

In this article, we’ll walk through the process of diagnosing a memory leak, analyzing the root cause, and implementing effective solutions to mitigate its impact. We’ll explore practical steps that any application, regardless of the underlying stack or architecture, can follow to troubleshoot and optimize performance.

https://kshitij-nawandar.medium.com/why-our-pods-were-breaking-bad-and-how-we-fixed-them-b3c3e9e8003b


FacetController: How we made infrastructure changes at Lyft simple

https://eng.lyft.com/facetcontroller-how-we-made-infrastructure-changes-at-lyft-simple-dab49f5b27c7


Operational Considerations for Managing Stateful Workloads

When managing stateful workloads, whether in Kubernetes or traditional infrastructure, operational concerns like isolation, lifecycle management, security, disaster recovery, scalability, and observability take center stage. While the examples focus on AWS, PostgreSQL, and Kubernetes, the principles and best practices discussed here are broadly applicable to any environment. This article approaches these topics from an operations perspective, prioritizing reliability, maintainability, and resilience. The goal is not just to run a database, but to ensure it operates efficiently, scales properly, and remains secure in real-world conditions. We’ll explore key aspects of running stateful workloads, from managing failure domains to ensuring observability, and how these impact both operations teams and developers. Whether you’re running a database in a cloud-native setup or on bare metal, these strategies will help you build a robust, well-managed system.

https://dev.to/pampatzoglou/operational-considerations-for-managing-stateful-workloads-20c3


Can Configuration Languages (config DSLs) solve configuration complexity?

https://itnext.io/can-configuration-languages-dsls-solve-configuration-complexity-eee8f124e13a


GKE Cost Cutting — Three Key Lookout Points to view your Potential Savings

https://medium.com/google-cloud/gke-cost-cutting-three-key-lookout-points-to-view-your-potential-savings-10f271dc4fa9


How Kubernetes HPA Decides Which Pod to Terminate When Scaling Down

https://medium.com/@AlexanderObregon/how-kubernetes-hpa-decides-which-pod-to-terminate-when-scaling-down-6675ebbdf56f


Load Balancing gRPC traffic with Istio

https://dev.to/visepol/load-balancing-grpc-traffic-with-istio-1k49


Why Every Platform Engineer Should Care About Kubernetes Operators

https://www.pulumi.com/blog/why-every-platform-engineer-should-care-about-kubernetes-operators


Demystifying Swap in Kubernetes: A Handbook for DevOps Engineers

https://medium.com/@robertbotez/demystifying-swap-in-kubernetes-a-handbook-for-devops-engineers-e5ef934593e3


Argo Rollouts — Canary Deployment with Istio

https://medium.chuklee.com/argo-rollouts-canary-deployment-with-istio-b432bc141ba9


kpatch

kpatch is a Linux dynamic kernel patching infrastructure which allows you to patch a running kernel without rebooting or restarting any processes. It enables sysadmins to apply critical security patches to the kernel immediately, without having to wait for long-running tasks to complete, for users to log off, or for scheduled reboot windows. It gives more control over uptime without sacrificing security or stability.

https://github.com/dynup/kpatch


Understanding the Circuit Breaker: A Key Design Pattern for Resilient Systems

The Circuit Breaker Pattern is a key design pattern for building resilient systems by preventing cascading failures and ensuring graceful degradation.

https://dzone.com/articles/circuit-breaker-pattern-resilient-systems
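
As a rough illustration of the pattern described in this post, here is a minimal Python sketch of a circuit breaker. It is not taken from the linked article: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `failure_threshold`, `reset_timeout`, and `clock` are all hypothetical names chosen for this example. The assumed behavior is the common three-state design: closed (calls pass through), open (calls fail fast), and half-open (one trial call is allowed after a timeout).

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical minimal breaker; names and defaults are assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of hitting the struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request in `breaker.call(...)` and treat `CircuitOpenError` as the signal to serve a fallback, which is one way to get the graceful degradation the post mentions. A production breaker would also need thread safety and per-dependency state, which this sketch leaves out.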


Load Testing with Impulse at Airbnb

Comprehensive Load Testing with Load Generator, Dependency Mocker, Traffic Collector, and More

https://medium.com/airbnb-engineering/load-testing-with-impulse-at-airbnb-f466874d03d2


🤔 How do you keep a system of 1,500 microservices from falling apart under peak load? And what do you do during a DDoS attack of 1 million RPS?

The Yandex Market team has published a detailed breakdown of its reliability engineering. Inside is an honest account of how Graceful Degradation works in practice, why war rooms are needed, and how the team runs load tests directly in production.

✅ The philosophy of Graceful Degradation.
✅ Must-have architectural patterns.
✅ How processes are distributed during incidents.
✅ Load testing in production.

The article will be useful to anyone who builds and maintains high-load, distributed systems. It is a good opportunity to look under the hood of an e-commerce giant and compare its approaches with your own.

Advertisement. Advertiser: Yandex.Taxi LLC. INN 7704340310


2025/07/10 16:05:43