Awesome Platform Engineering Tools ¶

A curated list of Platform and Production Engineering tools - Maintained by Saif Rajhi

Contents¶

Awesome Platform Engineering Tools
Contents
Articles and Presentations and Books
Newsletters, Chats and Podcasts
Specifications
Reference Architecture
AI powered platform tools
Development
- Source Code Management
- Feature flags and change management
- Project Management & Issue Tracking Software
- Bug / Defect Tracking Software
- Code Editors and IDEs
Continuous Testing
Continuous Integration
- Build
- Integration
Continuous Delivery
- Deployment
- Automation and Collaboration
- Infrastructure orchestration
- Container
- Container Registry
- Container Orchestration
Continuous Monitoring
Incident Management / Incident Response / IT Alerting / On-Call
- IT Service Management
- Incident Communication
Security
Internal Developer Portal
Path to senior platform engineer handbook
Miscellaneous and Related
Notions and concepts
Stargazers over time
Licence

Articles and Presentations and Books¶

Chances are you don't need a platform team - How to minimize your platform and maximize user value by @bschaatsbergen.
How To Create A Complete Internal Developer Platform (IDP)? - It's time to build an internal developer platform (IDP) with Crossplane, Argo CD, SchemaHero, External Secrets Operator (ESO), GitHub Actions, Port, and a few others by @vfarcic.
What does it take to become a Platform Engineer?
Platform Engineering How Did We Get Here.
Can we say that Platform Engineering is DevSec(Rel)Ops?
The Practical Guide to Internal Developer Portals - The next big thing in DevOps is platform engineering, and the main tool it uses is the internal developer portal. Read this guide to understand what can be done with portals and why they matter.
Platform Engineering on Kubernetes - A book that teaches how to build custom platforms on top of Kubernetes using open-source tools such as Dapr, Knative, Argo CD and Rollouts, and Tekton. It explores the tools and techniques needed to overcome common cloud-native challenges and is suitable for readers with different levels of expertise.
Build Your IDP at Light Speed with a Platform Reference Architecture - Now organizations have a standard, proven, scalable, and repeatable pattern for internal developer platforms that’s applicable to any tooling choice.
What Is Platform Engineering? Role, Principles & Benefits.
How to Design an Internal Developer Platform.
A Platform Team Product Manager Determines DevOps Success.
Platform Engineering KPIs.
Platform Engineering: Creating your Internal Developer Platform.
The 10 Platform Engineering Tools To Use in 2022.
Platform Engineering at Palo Alto Networks.
Platform Engineering story from a CTO: WHY, WHAT, HOW.
Create Preview Environments with Terraform, GitHub Actions, and Vercel.
Guide To Internal Developer Portals.
Introducing KBOM – Kubernetes Bill of Materials.
Platform Engineering Is Not Just about the Tools - Platform engineering isn’t solely about the tools and components but also about alignment within the organization and a special focus on understanding user needs.
Platform Engineering Rules the Day - Eight Key Themes.
What is Platform Engineering and why it is important for better developer experience - Some of the core tenets of Platform Engineering.
5 myths about platform engineering: what it is and what it isn’t - Five common myths about platform engineering.
The ultimate guide to platform engineering - Stay competitive: embrace platform engineering.
Top Platform Engineering KPIs You Need to Monitor - A curated list of top platform engineering KPIs that software teams must monitor.
Can ops actually do product management - Platform engineering problems: can ops actually do product management?
Pulumi Platform Engineering - Accelerate, Scale and Secure AI Innovation with Pulumi Platform Engineering.
Insights to enable your platform engineering team to improve agility and customer focus - Platform engineering that empowers users and reduces risk.
Can Your Developers Benefit from Platform Engineering? - Will designing tools and workflows to bring self-service to software development help developers work more efficiently?
Platform Engineering: A Guide for Technical, Product, and People Leaders - This book guides you on adopting a developer-centric approach to platform engineering, understanding and building platform teams, automating infrastructure, and managing platform scalability and team dynamics. It also covers the role of a platform product manager and improving developer experience through self-service infrastructure.
Effective Platform Engineering - Learn to design and build platforms and tools that maximize developer efficiency.
The Road to Simplicity - What platform engineering can learn from automobile design.
Top 7 Platform Engineering Tools - Key platform engineering tools for developers.
Platform Engineering Podcast - This podcast caters to professionals and enthusiasts passionate about the intricacies of platform architecture, cloud operations, and scaling DevOps practices.
A brief history of Platform Engineering - How the shift to cloud native applications gave rise to a new practice called Platform engineering.
Platform Engineering Essential Tools - Key Platform Engineering tools.
Platform as a Product - Platform as a Product: What, Why, and How?
State of Platform Engineering - The 2024 State of Platform Engineering? Fledgling at Best.
4000 microservices, 8 million customers, 1 Internal Developer Platform - The story of how Sicredi, a credit union with over 8 million clients, 2,700 branches, and over 45,000 employees, embarked on building an Internal Developer Platform to serve its thousands of developers.
Platform Engineering: A Workshop to Help Map Your Strategy - Stakeholders from across an organization can decide together what their internal developer platform should do.
Combining practical Platform Engineering with Crossplane and ArgoCD - A hands-on look at tools and their integrated usage can help kickstart your knowledge of Platform Engineering patterns. It dives into two of today's top Platform Engineering frameworks, Crossplane and ArgoCD: each is a great choice on its own, but combining them unlocks a whole new level.
Platform tooling landscape - Based on hundreds of platform engineering setups in organizations of all sizes, this landscape distills best practices for designing Internal Developer Platforms.
How Platform Engineering Empowers Users and Reduces Risk? - How platform engineering reduces complexity, enhances user autonomy, and minimizes risk in modern tech ecosystems.
Why Every Platform Engineer Should Care About Kubernetes Operators (Pulumi Blog) - Platform engineers should know that a key part of a successful Kubernetes-powered platform is the use of Kubernetes Operators, which are a great way to automate operational tasks and manage the lifecycle of complex applications and services on Kubernetes.
Principal Engineer Roles Framework - The Principal Engineer Roles Framework by Mai-Lan Tomsen Bukovec outlines six distinct roles (Sponsor, Guide, Catalyst, Tie Breaker, Catcher, and Participant) to optimize impact, support skills development, and maintain clarity and alignment in fast-paced environments.

Newsletters, Chats and Podcasts¶

Platform Engineering Certification - Cloud Native Computing Foundation Expands Certification to Platform Engineering.
Platform Engineering
Internal Developer Platform
The New Stack
Resources about Internal Platform teams and products
Humanitec (Platform Engineering) Blog
InfoQ Platform Engineering Articles
Port Blog
Platform weekly
Platformengineering.org Slack
What's Platform Engineering? And How Does It Support DevOps?
The New Stack Podcast
Platform Engineering with Nicholas Eberts

Specifications¶

OAM: One Application Model - An open model for defining cloud native apps.
Argonaut - Deploy apps and infrastructure on your cloud in minutes.
devtron - An open source Internal Developer Platform for Kubernetes.
SaaS Backstage Roadie - SaaS Backstage. Simple, safe, and more powerful.
ZYMR - We excel at Platform engineering.
CTO: platform for platform teams - The platform for platform teams: easily implement your vision for the perfect developer platform without having to build everything from scratch. We’re more than just a CI/CD pipeline. We’re an intelligent automation platform for all of your development workflows.
score - One easy way to configure all your workload. Everywhere.
kubevela - Make shipping applications more enjoyable.
kusionstack - Open Tech Stack to build self-service, collaborative, reliable and sustainable Internal Developer Platform.
Cloud Native Operational Excellence (CNOE) - CNOE will enable organizations to navigate tooling sprawl and technology churn by coordinating contributions, offering tools, and providing neutral guidance on technology choices to deliver IDPs.
OpenGitOps - OpenGitOps is a set of open-source standards, best practices, and community-focused education to help organizations adopt a structured, standardized approach to implementing GitOps.
Open Platform for Enterprise AI - An ecosystem orchestration framework to integrate performant GenAI technologies & workflows leading to quicker GenAI adoption and business value.
karpor - Intelligence for Kubernetes. World's most promising Kubernetes visualization tool for developer and platform engineering teams.

Reference Architecture¶

The Reference Architecture for Agility is a technology-neutral logical architecture based on a disaggregated cloud-based model - A proven approach to helping every development organization become an integration agile organization.
CloudGeometry Reference Architecture - CloudGeometry Reference Architecture for simplifying the creation and management of DevOps and Cloud resources.
AWS Reference Architecture implementation - How to spin up your Humanitec AWS Reference Architecture.
GCP Reference Architecture implementation - How to spin up your Humanitec Google Cloud Reference Architecture Implementation.
Azure Reference Architecture implementation - How to spin up your Humanitec Azure Reference Architecture.
IBM Cloud Reference Architecture - IBM Infrastructure Automation.
Awesome Software Architecture - A curated list of resources on software architecture.

AI powered platform tools¶

InfraStack AI - AI-Powered Observability Copilot.
AI-Powered Incident Management - A solution that combines an on-call AI copilot and end-to-end automation.
Monolith - No-code AI software built for engineers.
Viktor - Implement AI in your engineering workflow.
initializ.ai - AI-Driven Unified DevSecOps Platform.

Development¶

Source Code Management¶

Git
GitHub
Gitlab
Bitbucket
Fossil
Mercurial
Perforce Helix Core
Subversion (SVN)

Feature flags and change management¶

Unleash - Open-source feature management solution built for developers.

Project Management & Issue Tracking Software¶

Bug / Defect Tracking Software¶

Code Editors and IDEs¶

Nvim - hyperextensible Vim-based text editor.

Continuous Testing¶

Continuous Integration¶

Build¶

Integration¶

Continuous Delivery¶

Deployment¶

AWS CodeDeploy
ElectricFlow
Octopus Deploy
IBM UrbanCode
DeployBot
Shippable
Codar Continuous Delivery
Wercker
Humanitec
ArgoCD
FluxCD
Jenkins X - CI/CD including everything you need to start exploring Kubernetes
Tekton
Buddy Works
werf
Google Cloud Build
Spinnaker
Kluctl - Easily handle Kubernetes deployments of any size, complexity, and across various environments using the push based CLI or pull based GitOps.
Walrus - An open-source application management platform based on IaC tools including OpenTofu, Terraform and others. It helps platform engineers build golden paths for developers and empowers developers with self-service capabilities.
dyrector.io - dyrector.io is a self-hosted continuous delivery & deployment platform with version management.
ketch - Application delivery framework that facilitates the deployment and management of applications on Kubernetes using a simple command line interface.

Automation and Collaboration¶

Digger - Infrastructure as code management platform that enables you to run OpenTofu & Terraform in your CI/CD system.
Atlantis - Open Source Terraform Pull Request Automation tool.
Env0 - Automate and Manage IaC at Scale, With Confidence
Spacelift - Spacelift is a sophisticated CI/CD platform for OpenTofu, Terraform, Terragrunt, CloudFormation, Pulumi, Kubernetes, and Ansible.
Terramate - Terramate adds powerful capabilities such as code generation, stacks, orchestration, change detection, data sharing and more to Terraform.
Terrateam - Infrastructure as Code CI/CD for GitHub
OTF - An open source alternative to Terraform Enterprise.
Hatchet - An all-in-one platform to automate, secure and monitor Terraform
GitHub Actions - Automate, customize, and execute your software development workflows right in your repository
Runme - Infrastructure Notebooks Built with Markdown. Runme is a free tool that enables Markdown files to become runnable notebooks. You can use scripts in Shell, Perl, Python, and more.
Earthly - A versatile, approachable CI/CD framework that runs every pipeline inside containers, giving you repeatable builds that you write once and run anywhere.

Infrastructure orchestration¶

Vagrant
Puppet
Chef
SaltStack
Ansible
Terraform
OpenTofu
Terragrunt - DRY and maintainable Terraform code.
Pulumi
AWS CloudFormation
Rundeck
Selefra
Scalr
Google Cloud Deployment Manager
OPS
Helm - The package manager for Kubernetes
Helmfile - Deploy Kubernetes Helm Charts
Crossplane
Packer
Kubestack
Shipyard - Ephemeral environment management platform.

Container¶

Container Registry¶

Container Orchestration¶

Continuous Monitoring¶

AWS CloudWatch
DebugBear
Prometheus
StackDriver
Sensu
Sentry
CopperEgg
Crashlytics
Kapacitor
loggly
logmatic
Logstash
MongoDB Atlas
MongoDB Cloud Manager
NewRelic
Papertrail
Pingdom
ServerDensity
Zabbix
InsightOps
AppSignal
Grafana
VictoriaMetrics
Chaos Genius
Thanos
Mimir
Hydrozen.io - Uptime monitoring & Statuspages
Steampipe.io - Universal SQL interface to any cloud API
Better Stack
Netdata
DoctorGPT - Brings GPT into production for application log error monitoring
Dynatrace
Datadog
Elastic APM
Healthchecks.io
OnlineOrNot - Uptime monitoring for websites, APIs, and cron jobs, with integrated status pages.
ELK Stack (Elasticsearch, Logstash, Kibana)
VictoriaLogs - Database for logs from VictoriaMetrics
OpenTelemetry
Fluentd (CNCF) - Unified Logging Layer
Jaeger (CNCF) - A Distributed Tracing Platform
Infracost - Cost estimates for Terraform
OpenCost - Open source cost monitoring tool for Kubernetes
Apache SkyWalking - Application Performance Monitoring
SigNoz - An open-source alternative to Datadog, NewRelic, etc.
Loki - low cost open source logging; self-hosted or SaaS
SigLens

Incident Management / Incident Response / IT Alerting / On-Call¶

Squadcast
PagerDuty
VictorOps
OpsGenie
AlertOps
Blameless
Jira Ops
OnPage
PagerTree
Cabot
AlertAgility
xMatters
Derdack Enterprise Alert
Bigpanda
OpenDuty
ngDesk
Geneos
FireHydrant
SLO exporter
SLO Calculator
Rootly - Manage incidents directly from Slack
Grafana OnCall
Keep - CLI for alerting
Better Stack
Everbridge
Moogsoft
incident.io
AlertManager
Pagerly - Manage Oncalls and Incidents on Slack

IT Service Management¶

Homer - A very simple static homepage for your server.
FreshService
ServiceNow
BMC Remedy
Jira Service Management (formerly Jira Service Desk)
Samanage
Cherwell
SysAid
ManageEngine ServiceDesk Plus
Zendesk

Incident Communication¶

Squadcast Statuspages
StatusPal - communicate incidents and maintenance effectively with a beautiful hosted status page.
Hydrozen.io Statuspages
Atlassian Statuspages
Instatus Statuspages - Quick and beautiful status page.
Cachet

Security¶

Internal Developer Portal¶

Port
Backstage Software Catalog
OpsLevel
KusionStack
KubeStack
Radius app - Open-source, cloud-native, application platform that enables developers and the operators that support them to define, deploy, and collaborate on cloud-native applications across public clouds and private infrastructure.
Mia platform - Don’t waste time setting up your platform, just push the code!
Humanitec - Powering your Internal Developer Platform.
Appvia - Increase Developer productivity with self-service.
qovery - Deliver Self-Service Infrastructure Faster.
Mogenius - The Kubernetes Operations Platform.
Nullstone - An easy-to-use developer platform that enables developers to quickly deploy any application.
Kratix - A framework for building Platform-as-a-Product.
cycloid - Platform Engineering is DevOps with an action plan.
Shipa - Shipa simplifies the way you deploy, secure, and manage applications across cloud native infrastructures by taking an application-centric approach.
Upbound - The platform for platform teams.
Kubero - A fully self-hosted Internal Developer Platform (IDP).
Roadie Internal Developer Portal - SaaS-based Internal Developer Portal.

Path to senior platform engineer handbook¶

Platform Engineer Career Path - Everything you need to know to become a senior platform engineer and beyond.
A Guide to shaping your Platform Engineer career - Platform Engineering career pathing.

Platform Engineering is a distinct and valuable career path within an organization, complementing roles like DevOps and Site Reliability Engineering (SRE). While DevOps ensures smooth software development processes and SRE ensures reliable, scalable infrastructure, Platform Engineers are responsible for building the tools, processes, and platforms on which software development and operational tasks take place.

Platform Engineering Career pathing¶

Platform Engineering Roles Summary¶

Junior Platform Engineer:
Handles routine tasks, troubleshooting, cloud configurations, and code reviews.
Skills: Basic scripting, cloud tech, containerization, Golang, Kubernetes.

Platform Engineer:
Implements features, security, system scaling, and CI/CD tools.
Skills: Advanced cloud platforms, containerization, CI/CD, scripting, Golang, Kubernetes.

Senior Platform Engineer:
Designs architecture, mentors, manages projects, ensures security.
Skills: System architecture, cloud computing, containerization, CI/CD, Golang, Kubernetes.

Lead Platform Engineer:
Leads team, manages projects, strategic decisions, stakeholder interaction.
Skills: Systems design, project management, various tech stacks, Golang, Kubernetes.

Staff Platform Engineer:
Sets technical direction, standards, leads projects, guides cloud services.
Skills: Multiple tech stacks, system design, performance, security, Golang, Kubernetes.

Principal Platform Engineer:
Technical leader, sets vision, strategy, engineering processes, cloud strategy.
Skills: Broad tech expertise, strategic planning, complex engineering processes, Golang, Kubernetes.

Platform Engineering Manager:
Oversees teams, strategic direction, performance, budgeting, alignment with business.
Skills: Technical background, budgeting, talent development, Golang, Kubernetes.

Senior Manager, Platform Engineering:
Sets organization-wide strategy, manages teams, cross-team collaboration, cloud strategy.
Skills: Broad tech knowledge, business understanding, engineering management, Golang, Kubernetes.

Director of Platform Engineering:
Sets vision, plans execution, tech decisions, team structure, budget management, strategic planning.
Skills: Vision setting, strategic planning, cloud strategy, Golang, Kubernetes.

Notions and concepts¶

Fundamentals¶

Keep it simple, stupid. You ain't gonna need it.
You should think about what to do before you do it.
You should try to talk about what you’re planning to do before you do it.
You should think about what you did after you did it.
Be prepared to throw away something you’ve done in order to do something different.
Always look for better ways of doing things.
“Good enough” isn’t good enough.

Code¶

Code is a liability, not an asset. Aim to have as little of it as possible.
Build programs out of pure functions. This saves you from spending your brain power on tracking side effects, mutated state and action at a distance.
Use a programming language with a rich type system that lets you describe the parts of your code and checks your program at compile time.
The expressivity of a programming language matters hugely. It’s not just a convenience to save keypresses, it directly influences the way in which you write code.
Choose a programming language that has a good module system, and use it. Be explicit about the public interface of a module, and ensure its internals don't leak out to client code.
Code is a living construct that is never “done”. You need to tend it like a garden, always improving and tidying it, or it withers and dies.
Have the same high standards for all the code you write, from little scripts to the inner loop of your critical system.
Write code that is exception safe and resource safe, always, even in contexts where you think it won’t matter. The code you wrote in a little ad-hoc script will inevitably find its way into more critical or long-running code.
Use the same language for the little tools and scripts in your system too. There are few good reasons to drop down into bash or Python scripts, and some considerable disadvantages.
In code, even the smallest details matter. This includes whitespace and layout!
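A minimal Python sketch of two of the points above, pure functions and resource safety. The names (`LatencySummary`, `summarize`, `load_samples`) are hypothetical and chosen only for illustration: the calculation is a pure function that is trivial to test, while the only I/O lives in a thin wrapper whose `with` block closes the file even when parsing fails partway through.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LatencySummary:
    """Immutable result type: callers cannot mutate it after it is built."""
    count: int
    max_ms: float
    mean_ms: float


def summarize(samples_ms: Sequence[float]) -> LatencySummary:
    """Pure function: the result depends only on the argument, with no I/O and no mutation."""
    if not samples_ms:
        return LatencySummary(count=0, max_ms=0.0, mean_ms=0.0)
    return LatencySummary(
        count=len(samples_ms),
        max_ms=max(samples_ms),
        mean_ms=sum(samples_ms) / len(samples_ms),
    )


def load_samples(path: str) -> List[float]:
    """Impure edge of the program: reads one number per non-blank line.

    The `with` block guarantees the file is closed even if float() raises
    ValueError on a malformed line, so the function stays resource safe.
    """
    with open(path, encoding="utf-8") as f:
        return [float(line) for line in f if line.strip()]


if __name__ == "__main__":
    # prints LatencySummary(count=3, max_ms=30.0, mean_ms=20.0)
    print(summarize([10.0, 20.0, 30.0]))
```

Only `load_samples` needs a real file to exercise; `summarize` can be checked with plain equality assertions, which is the practical payoff of keeping side effects at the edges.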

Design¶

Modelling - the act of creating models of the world - is a crucial skill, and one that’s been undervalued in recent years.
Model your domain using types.
Model your domain first, using data types and function signatures, pick implementation technologies and physical architecture later.
Implement functionality in vertical slices that span your whole system, and iterate to grow the system.
Resist the temptation to use your main domain types to describe interfaces or messages exchanged by your system. Use separate types for these, even if it entails some duplication, as these types will evolve differently over time.
Prefer immutability always. This applies to data storage as well as in-memory data structures.
When building programs that perform actions, model the actions as data, then write an interpreter that performs them. This makes your code much easier to test, monitor, debug, and refactor.
Dependency management is crucial, so do it from day one. The payoff for this mostly comes when your system is bigger, but it’s not expensive to do from the beginning and it saves massive problems later.
Avoid circular dependencies, always.
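The advice to model actions as data and then interpret them can be sketched in Python as follows. Everything here is hypothetical: the bucket actions, `plan`, `describe` and `execute` are illustrative names, and a plain set stands in for a real cloud API. The planning step is pure and returns immutable values; two separate interpreters then either render the plan for review or carry it out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Union


# Hypothetical infrastructure actions, modelled as immutable data.
@dataclass(frozen=True)
class CreateBucket:
    name: str


@dataclass(frozen=True)
class DeleteBucket:
    name: str


Action = Union[CreateBucket, DeleteBucket]


def plan(desired: Iterable[str], existing: Iterable[str]) -> List[Action]:
    """Pure planning step: decide what to do, but do nothing yet."""
    want, have = set(desired), set(existing)
    creates: List[Action] = [CreateBucket(n) for n in sorted(want - have)]
    deletes: List[Action] = [DeleteBucket(n) for n in sorted(have - want)]
    return creates + deletes


def describe(actions: Iterable[Action]) -> List[str]:
    """Interpreter 1: a dry run that only renders the plan for review or logging."""
    lines = []
    for a in actions:
        if isinstance(a, CreateBucket):
            lines.append(f"+ bucket {a.name}")
        elif isinstance(a, DeleteBucket):
            lines.append(f"- bucket {a.name}")
        else:
            raise TypeError(f"unknown action: {a!r}")
    return lines


def execute(actions: Iterable[Action], buckets: Set[str]) -> Set[str]:
    """Interpreter 2: performs the actions against an in-memory stand-in for a cloud API.

    Returns a new set instead of mutating the caller's set.
    """
    result = set(buckets)
    for a in actions:
        if isinstance(a, CreateBucket):
            result.add(a.name)
        elif isinstance(a, DeleteBucket):
            result.discard(a.name)
        else:
            raise TypeError(f"unknown action: {a!r}")
    return result
```

Because `plan` returns ordinary values, it can be tested with equality checks alone, and the same list can be logged, reviewed, or rejected before `execute` runs, similar in spirit to reviewing a Terraform plan before applying it. Adding a new action type means adding one data class and one branch per interpreter.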

Designing systems¶

A better system is often a smaller, simpler system.
To design healthy systems, divide and conquer. Split the problem into smaller parts.
Divide and conquer works recursively: divide the system into a hierarchy of simpler sub-systems and components.
Corollary: When designing a system, there are more choices than a monolith vs. a thousand “microservices”.
The interface between parts is crucial. Aim for interfaces that are as small and simple as possible.
Data dependencies are insidious. Take particular care to manage the coupling introduced by such dependencies.
Plan to evolve data definitions over time, as they will inevitably change.
Asynchronous interfaces can be useful to remove temporal coupling between parts.
Every inter-process boundary incurs a high cost: you lose type safety, and it becomes much harder to reason about failures. Only introduce such boundaries where absolutely necessary and where the benefits outweigh the cost.
Being able to tell what your system is doing is crucial, so make sure it’s observable.
Telling what your system has done in the past is even more crucial, so make sure it’s auditable.
A modern programming language is the most expressive tool we have for describing all aspects of a system.
This means: write configuration as code, unless it absolutely, definitely has to change at runtime.
Also, write the specification of the system as executable code.
And, use code to describe the infrastructure of your system, in the same language as the rest of the code. Write code that interprets the description of your system to provision actual physical infrastructure.
At the risk of repeating myself: everything is code.
Corollary: if you’re writing JSON or YAML by hand, you’re doing it wrong. These are formats for the machines, not for humans to produce and consume. (Don’t despair though: most people do this, I do too, so you’re not alone! Let's just try to aim for something better).
The physical manifestation of your system (e.g. choices of storage, messaging, RPC technology, packaging and scheduling etc) should usually be an implementation detail, not the main aspect of the system that the rest is built around.
It should be easy to change the underlying technologies (e.g. for data storage, messaging, execution environment) used by a component in your system; doing so should not affect large parts of your code base.
You should have at least two physical manifestations of your system: a fully integrated in-memory one for testing, and the real physical deployment. They should be functionally equivalent.
You should be able to run a local version of your system on a developer’s computer with a single command. With the capacity of modern computers, there is absolutely no rational reason why this isn’t feasible, even for big, complex systems.
There is a running theme here: separate the description of what a system does from how it does it. This is probably the single most important consideration when creating a system.
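A minimal Python sketch of separating what a component does from how it is stored, with two functionally equivalent manifestations. The `VersionStore` interface, both implementations and `promote` are hypothetical; SQLite (via Python's built-in `sqlite3` module) stands in for a production database such as PostgreSQL so that the example runs without a server. The domain function `promote` depends only on the interface, so swapping the storage technology does not touch it.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class VersionStore(Protocol):
    """What the domain needs from storage, with no mention of how it is stored."""

    def get(self, service: str) -> Optional[str]: ...

    def put(self, service: str, version: str) -> None: ...


class InMemoryVersionStore:
    """Fast, fully in-memory manifestation for tests and local runs."""

    def __init__(self) -> None:
        self._versions: Dict[str, str] = {}

    def get(self, service: str) -> Optional[str]:
        return self._versions.get(service)

    def put(self, service: str, version: str) -> None:
        self._versions[service] = version


class SqliteVersionStore:
    """Manifestation backed by a real SQL database (SQLite as a stand-in)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS versions ("
            "service TEXT PRIMARY KEY, version TEXT NOT NULL)"
        )

    def get(self, service: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT version FROM versions WHERE service = ?", (service,)
        ).fetchone()
        return row[0] if row else None

    def put(self, service: str, version: str) -> None:
        # Using the connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO versions (service, version) VALUES (?, ?)",
                (service, version),
            )


def promote(store: VersionStore, service: str, version: str) -> Optional[str]:
    """Domain logic: record a newly deployed version and return the previous one."""
    previous = store.get(service)
    store.put(service, version)
    return previous
```

Running the same test suite against both stores is one way to keep the in-memory and physical manifestations functionally equivalent.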

Building systems¶

For a new system, get a walking skeleton deployed to production as soon as possible.
Your master branch should always be deployable to production.
Use feature branches if you like. Modern version control tools make merging easy enough that it’s not a problem to let these be long-lived in some cases.
Ideally, deploy automatically to production on every update to master. If that’s not feasible, it should be a one-click action to perform the deployment.
Maintain a separate environment for situations when you find it useful to test code separately from production. Avoid more than one such extra environment, as this introduces overheads and cost.
Prefer feature flags and similar mechanisms to control what's enabled in production over separate test/staging environments and manual promotion of releases.
Get in the habit of deploying from master to production from the very beginning of a project. Doing this shapes both your system and how you work with it for the better.
In fact, follow all these practices from the very beginning of a new system. Retrofitting them later is much, much harder.
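Feature flags, as recommended above, can start out as a lookup plus a stable hash for gradual rollouts. This is a minimal Python sketch with hypothetical names (`Flag`, `bucket`, `is_enabled`, and the `new-checkout` and `dark-mode` flags); dedicated tools such as Unleash offer far more, and their APIs differ from this sketch. Hashing the flag name together with the user ID keeps each user's decision stable across requests while spreading users roughly evenly across buckets.

```python
import hashlib
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Flag:
    enabled: bool
    rollout_percent: int = 100  # share of users (0-100) who get the feature when enabled


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flags: Mapping[str, Flag], flag_name: str, user_id: str) -> bool:
    """Unknown or disabled flags are treated as off, so a missing entry is safe."""
    flag = flags.get(flag_name)
    if flag is None or not flag.enabled:
        return False
    return bucket(flag_name, user_id) < flag.rollout_percent


FLAGS = {
    "new-checkout": Flag(enabled=True, rollout_percent=25),
    "dark-mode": Flag(enabled=False),
}

if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, is_enabled(FLAGS, "new-checkout", user))
```

In a real system the flag values would typically come from a flag service or runtime configuration rather than a module-level constant; that part is out of scope for this sketch.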

Technology¶

Beware of hyped or fashionable technologies. The fundamentals of computer science and engineering don’t change much over time.
Keep up with latest developments in technology to see how they can help you, but be realistic about what they can do.
Choose your data storage backend according to the shape of data, types of queries needed, patterns of writes vs. reads, performance requirements, and more. Every use case is different.
That said, PostgreSQL should be your default and you should only pick something else if you have a good reason.

Stargazers over time¶

Licence¶


This work is licensed under a Creative Commons Attribution 4.0 International License.