DevOps Fundamentals

DevOps is a cultural and professional movement that emphasizes collaboration between development and operations teams, infrastructure automation, and rapid, reliable software delivery. Born from the Agile movement and refined through years of large-scale production experience, DevOps represents a fundamental shift in how organizations build, deploy, and maintain software systems.

What is DevOps

DevOps is not a single tool, a role, or a team. It is a set of practices, cultural philosophies, and technical patterns that aim to shorten the systems development lifecycle while delivering features, fixes, and updates frequently and in close alignment with business objectives. The term emerged around 2009, coined by Patrick Debois, and has since evolved into a mature discipline practiced by leading technology organizations worldwide.

At Samsung, where I led infrastructure for Knox, Pay, and SmartThings, DevOps was the difference between monthly release trains and multiple daily deployments to production. The transformation did not happen overnight; it required deliberate investment in automation, cultural change, and tooling.

The CALMS Framework

The CALMS framework provides a structured way to evaluate DevOps adoption. It began as CAMS, coined by John Willis and Damon Edwards; Jez Humble later added the L for Lean:

Letter	Principle	Description	Key Practices
C	Culture	Shared ownership, blameless postmortems, cross-functional teams	Breaking down silos, shared on-call rotations, blameless culture
A	Automation	Automating repetitive tasks to reduce human error and increase speed	CI/CD pipelines, IaC, automated testing, self-service platforms
L	Lean	Minimizing waste, optimizing flow, delivering value continuously	Value stream mapping, small batch sizes, eliminating bottlenecks
M	Measurement	Data-driven decisions through comprehensive observability	DORA metrics, SLIs/SLOs/SLAs, monitoring, distributed tracing
S	Sharing	Open knowledge transfer, shared tooling, community building	Internal wikis, demo days, open source contributions, chat ops

Tip: Culture is the hardest and most important component of CALMS. You can buy tools, but you cannot buy culture. At Samsung, the turning point was when developers began participating in production on-call rotations alongside SREs, creating genuine shared ownership.

DevOps vs. SRE: Complementary Disciplines

Site Reliability Engineering (SRE), pioneered at Google, is often discussed alongside DevOps. Rather than competing approaches, they are complementary disciplines with significant overlap:

Aspect	DevOps	SRE
Origin	Movement focused on cultural transformation	Engineering discipline with concrete practices
Primary Goal	Bridge dev and ops through collaboration	Apply software engineering to operations problems
Error Budget	Emphasizes balancing speed and stability	Formalizes error budgets as engineering contracts
Implementation	Broader cultural and process changes	Specific engineering practices and tooling
Metrics	DORA metrics (deployment frequency, lead time)	SLIs, SLOs, SLAs, error budgets, toil reduction
Role Definition	Often staffed as a role or team, but fundamentally a cultural practice	Clearly defined engineering role with coding requirements

Ben Treynor Sloss, VP of Engineering at Google, described SRE as "what happens when you ask a software engineer to design an operations team." In practice, mature organizations adopt both: DevOps as the cultural foundation and SRE as the engineering implementation.
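To make the error budget idea from the comparison table concrete, here is a minimal sketch in Python. The `ErrorBudget` type, its field names, and the "pause releases when the budget is spent" policy are illustrative assumptions, not part of any SRE library or of a specific Google practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for one SLO window (hypothetical type for illustration)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means an untouched budget; 0.0 or below means it is spent.
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / self.allowed_failures

    def can_release(self) -> bool:
        # Assumed policy: risky releases pause once the budget is exhausted.
        return self.remaining_fraction > 0.0


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(round(budget.allowed_failures), budget.can_release())
```

With a 99.9% target over one million requests, roughly 1,000 failures are permitted, so 400 failures leave about 60% of the budget. The point of the contract is that both developers and operators agree in advance on what happens when the remaining fraction reaches zero.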

DevOps Lifecycle Phases

The DevOps lifecycle is often visualized as an infinite loop with eight phases. Each phase feeds into the next, creating a continuous improvement cycle:

1. Plan

Requirements gathering, sprint planning, task tracking. Tools include Jira, Azure DevOps Boards, Linear, and Confluence. Infrastructure planning happens here too, using architecture decision records (ADRs) to document choices.

2. Code

Application development with version control, code review, and branch protection. Git is the standard VCS. Feature branching with pull requests, trunk-based development, and GitFlow are common branching strategies. Code quality gates include linting, static analysis, and peer review requirements.

3. Build

Compilation, packaging, and artifact creation. CI servers (GitHub Actions, Jenkins, CircleCI) trigger builds on every commit. Docker images are built, versioned with semantic tags, and pushed to registries (ECR, GCR, ACR, Docker Hub).

4. Test

Automated testing at multiple levels: unit tests, integration tests, end-to-end tests, security scans (SAST/DAST), and performance tests. Shift-left testing integrates quality checks into the earliest stages of the pipeline.

5. Release

Artifact promotion through environments, changelog generation, and release orchestration. Git tagging, GitHub Releases, and semantic versioning (SemVer) provide traceability from code to deployed artifact.
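As a rough illustration of how semantic versioning supports traceability, the sketch below parses a release tag and computes the next version. It handles only the MAJOR.MINOR.PATCH core (no pre-release or build metadata), and the optional leading `v` is a common Git tagging convention rather than part of SemVer itself. The `Version` class is a hypothetical helper, not a library API.

```python
import re
from typing import NamedTuple

# Core SemVer only; pre-release and build metadata are deliberately out of scope.
_CORE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, tag: str) -> "Version":
        match = _CORE.match(tag)
        if match is None:
            raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
        return cls(*(int(part) for part in match.groups()))

    def bump(self, change: str) -> "Version":
        # Breaking change -> major, compatible feature -> minor, fix -> patch.
        if change == "major":
            return Version(self.major + 1, 0, 0)
        if change == "minor":
            return Version(self.major, self.minor + 1, 0)
        if change == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("v1.4.2")
print(current.bump("minor"))
```

Because `Version` is a tuple, comparisons are numeric per component, so `1.10.0` sorts after `1.9.3`, which plain string comparison would get wrong.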

6. Deploy

Infrastructure provisioning and application deployment. Infrastructure-as-Code (Terraform, Pulumi, CloudFormation) provisions resources. Blue/green deployments, canary releases, and rolling updates minimize risk.
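The risk reduction in a canary release comes from comparing the new version against the stable one before shifting more traffic. The following sketch shows one possible decision rule; the `Sample` type, the function name, and every threshold are assumptions chosen for illustration, and real canary analysis tools typically compare several metrics, not error rate alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Traffic observed for one version during the canary window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Sample, canary: Sample,
                    min_requests: int = 500,
                    max_ratio: float = 1.5,
                    max_absolute_rate: float = 0.05) -> str:
    """Return 'wait', 'promote', or 'rollback' (illustrative thresholds)."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    if canary.error_rate > max_absolute_rate:
        return "rollback"  # unacceptable regardless of the baseline
    # Floor of 0.1% so a zero-error baseline does not make any canary error fatal.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > allowed else "promote"


stable = Sample(requests=10_000, errors=50)
print(canary_decision(stable, Sample(requests=1_000, errors=6)))
```

A deployment controller would call a rule like this repeatedly while increasing the canary's traffic share, and revert to the stable version on the first `rollback` result.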

7. Operate

Running production systems: configuration management, secrets rotation, database maintenance, capacity planning. This is where SRE practices heavily intersect with DevOps.

8. Monitor

Observability through metrics, logs, and traces. Alerting on SLO breaches, anomaly detection, and feedback into planning. Tools include Prometheus, Grafana, Datadog, New Relic, and Jaeger.
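One common way to alert on SLO breaches is to track the burn rate, meaning how fast the error budget is being consumed relative to the rate the SLO allows. The sketch below assumes request and error counts are already available for a long and a short window. The function names are hypothetical, and the default threshold of 14.4 corresponds to spending 2% of a 30-day budget within one hour, a figure used in Google's SRE guidance; treat it as a starting point rather than a rule.

```python
from typing import Tuple

Window = Tuple[int, int]  # (errors, requests) observed in a time window


def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if requests == 0:
        return 0.0
    return (errors / requests) / (1.0 - slo_target)


def should_page(long_window: Window, short_window: Window,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast.

    The long window shows the problem is significant; the short window
    shows it is still happening, which avoids paging for resolved spikes.
    """
    long_burn = burn_rate(*long_window, slo_target)
    short_burn = burn_rate(*short_window, slo_target)
    return long_burn > threshold and short_burn > threshold


# Example: 2% errors over the last hour and 3% over the last five minutes.
print(should_page((200, 10_000), (30, 1_000), slo_target=0.999))
```

A burn rate of 1.0 means the budget would last exactly the SLO window; anything sustained above that will exhaust it early.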

Lifecycle Integration: The loop closes when monitoring insights feed back into planning. A production incident may result in a Jira ticket for the next sprint. A performance regression triggers an architecture review. This feedback loop is what makes DevOps genuinely continuous.

DORA Metrics: Measuring DevOps Performance

The DevOps Research and Assessment (DORA) team, now part of Google Cloud, identified four key metrics that predict software delivery performance:

Metric	Description	Elite performers	How to measure
Deployment Frequency	How often code is deployed to production	Multiple times per day	Count of production deployments per day/week
Lead Time for Changes	Time from commit to production deployment	Less than 1 hour	Git commit timestamp to deployment completion timestamp
Mean Time to Recovery (MTTR)	Time to recover from a production failure	Less than 1 hour	Incident detection timestamp to service restoration timestamp
Change Failure Rate	Percentage of deployments causing production failures	Less than 5%	Failed deployments / total deployments

Warning: Do not optimize DORA metrics in isolation. A team could game deployment frequency by making trivial changes. The metrics must be considered together: frequent deployments with high failure rates indicate instability; low failure rates with infrequent deployments indicate excessive caution. Balance is essential.
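The "How to measure" column in the table above can be sketched in a few lines of Python. The `Deployment` record and its field names are assumptions about what a deployment log might contain. The sketch uses the median for lead time and the mean for recovery time, which is one reasonable choice among several.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime
    deployed_at: datetime
    detected_at: Optional[datetime] = None   # set if the deploy caused an incident
    restored_at: Optional[datetime] = None   # set once service was restored


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: List[Deployment], window_days: int) -> Dict[str, float]:
    """Compute the four DORA metrics together for one reporting window."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.detected_at is not None]
    recoveries = [_hours(d.restored_at - d.detected_at)
                  for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours":
            sum(recoveries) / len(recoveries) if recoveries else 0.0,
    }


t0 = datetime(2024, 1, 1, 9, 0)
log = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0 + timedelta(hours=2), t0 + timedelta(hours=3)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=2),
               detected_at=t0 + timedelta(days=1, hours=2, minutes=10),
               restored_at=t0 + timedelta(days=1, hours=2, minutes=40)),
    Deployment(t0 + timedelta(days=1, hours=4), t0 + timedelta(days=1, hours=5)),
]
print(dora_summary(log, window_days=2))
```

Returning all four values from one function is deliberate: it makes it harder to report deployment frequency without the failure rate that accompanies it.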

In my experience migrating 50+ microservices at Samsung, focusing on these four metrics provided a clear North Star. We reduced lead time from 2 weeks to 45 minutes and cut the change failure rate from 12% to 3% through investment in automated testing, canary deployments, and feature flags.

DevOps Toolchain Landscape

The DevOps toolchain is organized by lifecycle phase. The following diagram represents a production-grade toolchain as used at enterprise scale:

┌────────────┬───────────┬────────────────┬────────────┬─────────────────┬────────────────┬────────────┬────────────┐
│ Plan       │ Code      │ Build          │ Test       │ Release         │ Deploy         │ Operate    │ Monitor    │
├────────────┼───────────┼────────────────┼────────────┼─────────────────┼────────────────┼────────────┼────────────┤
│ Jira       │ GitHub    │ GitHub Actions │ SonarQube  │ GitHub Releases │ Terraform      │ Ansible    │ Datadog    │
│ Confluence │ GitLab    │ Jenkins        │ Jest/JUnit │ SemVer tags     │ AWS CodeDeploy │ Kubernetes │ Prometheus │
│ Linear     │ Bitbucket │ CircleCI       │ Cypress    │                 │ ArgoCD         │ Helm       │ Grafana    │
│ Miro       │ VS Code   │ Docker         │ Snyk       │                 │ Flux           │ Puppet     │ Jaeger     │
│            │           │                │ Trivy      │                 │                │ Chef       │ PagerDuty  │
└────────────┴───────────┴────────────────┴────────────┴─────────────────┴────────────────┴────────────┴────────────┘

Tool Selection Principles

API-first: Tools must expose APIs for automation and integration
Git-native: Configuration stored in version control for auditability and rollback
Open standards: Prefer OpenTelemetry, OCI, CNCF-graduated projects
Composability: Tools should integrate via webhooks, events, or APIs
Exit strategy: Consider migration cost before adopting any tool

DevOps Best Practices Summary

Practice	Description	Implementation
Everything as Code	Define all infrastructure and configuration in version-controlled code	Terraform, Ansible, Pulumi
Immutable Infrastructure	Never modify running servers; replace them with new versions	Container images, blue/green deployments
Shift-Left Security	Integrate security scanning into the earliest pipeline stages	Snyk, Trivy, SonarQube, Checkov
Trunk-Based Development	Short-lived branches merged to main frequently	Feature flags, mainline development
Observability by Design	Applications emit structured logs, metrics, and traces from day one	OpenTelemetry, Prometheus, structured logging
Self-Service Platforms	Developer portals enable provisioning without tickets	Backstage, Terraform modules, service catalog
Blameless Postmortems	Incident reviews focus on system improvement, not individual blame	Documented runbooks, incident tracking
Automated Rollbacks	Failed deployments automatically revert to the last known good state	Argo Rollouts, automated canary analysis
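Feature flags, listed above as the enabler for trunk-based development, let unfinished code merge to main while staying dark in production. A minimal sketch of a percentage rollout follows. The function name and the hashing scheme are assumptions for illustration; hosted flag services offer far richer targeting.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash flag and user together so each flag gets an independent bucketing.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent


# The same user always gets the same answer for a given flag and percentage.
print(is_enabled("new-checkout", "user-42", 25) == is_enabled("new-checkout", "user-42", 25))
```

Because the bucket depends only on the flag and user, raising the percentage from 10 to 50 keeps every user who already had the feature enabled, and setting it to 0 acts as a kill switch without a redeploy.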

Getting Started: Implementation Roadmap

For organizations beginning their DevOps journey, the following phased approach has proven effective across multiple enterprise migrations:

Assessment: Map current state value stream, identify bottlenecks, establish DORA baseline
Foundation: Implement version control for all code, establish CI pipelines for builds
Automation: Deploy Infrastructure-as-Code, automate testing, implement artifact management
CD Pipeline: Build deployment pipelines with environment promotion, implement blue/green or canary
Observability: Deploy monitoring, logging, and alerting; define initial SLOs
Governance: Implement policy-as-code, secrets management, compliance automation
Platform: Build internal developer platform with self-service capabilities
Optimize: Continuous improvement through DORA measurement, cost optimization, chaos engineering

Insider Advice: Do not attempt to implement all practices simultaneously. Start with version control and CI builds. The biggest mistake I have seen is teams trying to deploy Kubernetes and GitOps before they have reliable builds. Build the foundation first, then add complexity.
