Self-Service Platform
A self-service platform empowers developers to provision infrastructure, create repositories, and deploy applications without manual operations intervention.
Self-Service Platform Principles
The fundamental shift from ticket-driven operations to self-service platforms represents one of the highest-ROI transformations in platform engineering. At Samsung Research, implementing these principles reduced developer onboarding time from 4 days to under 2 hours and eliminated 80% of repetitive infrastructure tickets.
| Principle | Description | Anti-Pattern |
|---|---|---|
| Guardrails, not gates | Automated validation, rather than human review, blocks bad configurations | Requiring manager approval for dev environment access |
| GitOps-driven | All changes via version-controlled pull requests | Click-ops in cloud consoles |
| Failsafe defaults | Secure, compliant defaults applied automatically | Blank templates requiring manual security configuration |
| Observable | Every provisioning action logged and auditable | Silent automation with no audit trail |
| Composable | Users combine building blocks rather than writing custom scripts | One-off Terraform modules per service |
| Measurable | Platform success measured by developer productivity metrics | Platform team measured by tickets closed |
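The "Failsafe defaults" principle can be illustrated with a small merge function. This is a minimal sketch, not the platform's actual implementation: `SECURE_DEFAULTS` and `apply_defaults` are hypothetical names, and the default values are illustrative, although the keys mirror fields used in the request examples in this document.

```python
from copy import deepcopy

# Hypothetical secure defaults (illustrative values only).
SECURE_DEFAULTS = {
    "repository": {"visibility": "private"},
    "security": {"secretScanning": "enabled", "pushProtection": "enabled"},
    "infrastructure": {"database": {"encrypted": True}},
}


def apply_defaults(spec: dict, defaults: dict = SECURE_DEFAULTS) -> dict:
    """Return a copy of spec with any missing keys filled from defaults."""
    merged = deepcopy(spec)
    for key, default_value in defaults.items():
        if key not in merged:
            # Field omitted by the requester: apply the secure default.
            merged[key] = deepcopy(default_value)
        elif isinstance(default_value, dict) and isinstance(merged[key], dict):
            # Recurse so nested sections are filled field by field.
            merged[key] = apply_defaults(merged[key], default_value)
    return merged
```

Explicit values are preserved rather than overwritten, so an insecure choice stays visible and can be rejected by policy validation (the guardrail) instead of being silently changed.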
Architecture Components
┌───────────────────────────────────────────────────────────────────────────┐
│                              REQUEST PORTAL                               │
│ ┌───────────────────┐  ┌──────────────────┐  ┌──────────────────────────┐ │
│ │ Web UI (React)    │  │ Slack Bot        │  │ CLI Tool                 │ │
│ │                   │  │                  │  │                          │ │
│ │ - Form-based      │  │ /provision-repo  │  │ kuyaops provision repo   │ │
│ │   config wizard   │  │ /request-access  │  │ kuyaops provision infra  │ │
│ │ - Live validation │  │ /my-resources    │  │ kuyaops status           │ │
│ │ - Request history │  │                  │  │                          │ │
│ └───────────────────┘  └──────────────────┘  └──────────────────────────┘ │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      ▼
                           ┌─────────────────────┐
                           │     API Gateway     │
                           │    (AuthN/AuthZ)    │
                           └──────────┬──────────┘
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                             VALIDATION ENGINE                             │
│ ┌──────────────────────┐  ┌──────────────────┐  ┌───────────────────────┐ │
│ │ Schema Validation    │  │ Policy-as-Code   │  │ Cost Estimation       │ │
│ │ (JSON Schema)        │  │ (OPA/Rego)       │  │ (Infracost)           │ │
│ │                      │  │                  │  │                       │ │
│ │ - Required fields    │  │ - Naming conv.   │  │ - Monthly cost        │ │
│ │ - Type checking      │  │ - Tag compliance │  │ - Budget alerts       │ │
│ │ - Reference validity │  │ - Security rules │  │ - Show diff           │ │
│ └──────────────────────┘  └──────────────────┘  └───────────────────────┘ │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      ▼
                           ┌─────────────────────┐
                           │    Request Queue    │
                           │  (Redis/RabbitMQ)   │
                           └──────────┬──────────┘
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                         PROVISIONING ORCHESTRATOR                         │
│ ┌──────────────────┐  ┌────────────────────┐  ┌─────────────────────────┐ │
│ │ GitHub Provision │  │ Terraform Cloud    │  │ Secrets Manager         │ │
│ │ (Python/GitHub   │  │ (Infrastructure    │  │ (HashiCorp Vault /      │ │
│ │  API)            │  │  provisioning)     │  │  AWS Secrets Manager)   │ │
│ │                  │  │                    │  │                         │ │
│ │ - Repo creation  │  │ - VPC/ECS/RDS      │  │ - Inject CI/CD secrets  │ │
│ │ - Branch protec. │  │ - IAM roles        │  │ - Rotate credentials    │ │
│ │ - Team access    │  │ - DNS/certificates │  │ - App config injection  │ │
│ └──────────────────┘  └────────────────────┘  └─────────────────────────┘ │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      ▼
                           ┌─────────────────────┐
                           │  Notification Bus   │
                           │   (Event-driven)    │
                           └──────────┬──────────┘
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                            NOTIFICATION SYSTEM                            │
│ ┌──────────────────┐  ┌────────────────────┐  ┌─────────────────────────┐ │
│ │ Slack            │  │ Email              │  │ Audit Log (SIEM)        │ │
│ │ #provision-notif │  │ Welcome email      │  │ CloudWatch/Splunk       │ │
│ │ #security-alerts │  │ Cost alerts        │  │ SOC 2 evidence          │ │
│ └──────────────────┘  └────────────────────┘  └─────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────┘
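The flow in the diagram (validate, enqueue, provision, notify) can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: `submit`, `drain`, `request_queue`, and `events` are hypothetical names, and a `queue.Queue` replaces the Redis/RabbitMQ queue while a plain list replaces the notification bus.

```python
import queue
from typing import Callable

# In-memory stand-ins for the diagram's components (hypothetical names).
request_queue: "queue.Queue[dict]" = queue.Queue()  # Request Queue
events: list[dict] = []                             # Notification Bus


def submit(request: dict, validators: list[Callable[[dict], list[str]]]) -> list[str]:
    """Run every validator; enqueue the request only if none report errors."""
    errors = [err for validate in validators for err in validate(request)]
    if not errors:
        request_queue.put(request)
    return errors


def drain(provision: Callable[[dict], str]) -> None:
    """Orchestrator loop: provision each queued request and publish an event."""
    while not request_queue.empty():
        request = request_queue.get()
        events.append({"request": request["name"], "status": provision(request)})
```

Keeping validation in front of the queue means rejected requests never reach the orchestrator, which matches the ordering shown in the diagram.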
Repository Provisioning Workflow
1. Developer submits config YAML: The developer fills in a form or writes a YAML file describing the desired repository configuration. The portal validates the schema and provides live feedback.
2. Validation against standards: OPA/Rego policies enforce naming conventions, required tags, approved technology choices, and security baselines. Violations are rejected with specific guidance.
3. GitHub repo creation via API: The GitHub automation framework creates the repository with correct settings, branch protection, and team access.
4. Branch protection setup: Required status checks, code owner reviews, and merge restrictions are applied automatically based on repository classification (production, internal, experimental).
5. Secrets injection (Vault/AWS SM): CI/CD secrets, API keys, and service credentials are provisioned and injected into GitHub Actions secrets; they are never exposed to developers.
6. SonarQube project registration: A SonarQube project is created with quality gate rules appropriate for the technology stack and risk classification.
7. CI/CD pipeline generation: Standardized GitHub Actions workflows are generated from templates: build, test, security scan, and deploy stages.
8. Slack notification: The developer and the team channel receive confirmation with resource links, estimated costs, and next steps.
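The workflow above is an ordered sequence in which a failure should stop later steps from running. A minimal sketch of that control flow follows; `RunResult` and `run_workflow` are hypothetical names, and the step callables are placeholders rather than real GitHub, Vault, or SonarQube integrations.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunResult:
    completed: list = field(default_factory=list)  # step names that succeeded
    failed_step: Optional[str] = None               # first step that raised
    error: Optional[str] = None                     # message from that failure


def run_workflow(request: dict,
                 steps: list[tuple[str, Callable[[dict], None]]]) -> RunResult:
    """Run (name, callable) steps in order and stop at the first failure."""
    result = RunResult()
    for name, step in steps:
        try:
            step(request)
        except Exception as exc:  # record the failure instead of crashing the worker
            result.failed_step, result.error = name, str(exc)
            break
        result.completed.append(name)
    return result
```

Recording which steps completed gives the notification step enough context to report partial progress, and gives an operator a starting point for cleanup or retry.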
Complete Request Config YAML
# provisioning-request.yaml
# Complete self-service request for a new service
# Submit via: kuyaops provision apply -f provisioning-request.yaml
apiVersion: platform.kuyaops.com/v1
kind: ServiceRequest
metadata:
  name: payment-gateway-service
  requester: john.doe@company.com
  team: payments-team
  costCenter: CC-12345
  jiraTicket: PROJ-4567
spec:
  # ---------------------------------------------------------------------------
  # Repository Configuration
  # ---------------------------------------------------------------------------
  repository:
    name: payment-gateway
    description: "PCI-compliant payment processing service"
    visibility: private
    template: microservice-golang   # From service catalog
    classification: production      # Determines security controls
    settings:
      defaultBranch: main
      deleteBranchOnMerge: true
      allowSquashMerge: true
      allowRebaseMerge: false
      allowMergeCommit: false
    branchProtection:
      requiredChecks:
        - "ci/lint"
        - "ci/unit-tests"
        - "ci/integration-tests"
        - "ci/security-scan (Trivy)"
        - "ci/sonarqube"
        - "terraform/plan"
      requiredReviewers: 2
      requireCodeOwnerReview: true
      dismissStaleReviews: true
      requireLinearHistory: true
    teams:
      - name: payments-team
        permission: maintain
      - name: sre-oncall
        permission: admin
      - name: security-reviewers
        permission: push
    topics:
      - payments
      - pci-scope
      - golang
      - grpc
      - microservice
  # ---------------------------------------------------------------------------
  # Infrastructure Configuration
  # ---------------------------------------------------------------------------
  infrastructure:
    provider: aws
    region: us-east-1
    environment: production
    compute:
      type: ecs-fargate
      cpu: 1024
      memory: 2048
      minReplicas: 3
      maxReplicas: 20
    database:
      engine: postgres
      version: "15"
      instanceClass: db.r6g.large
      multiAz: true
      encrypted: true
      backupRetention: 30
    cache:
      engine: redis
      nodeType: cache.r6g.large
    networking:
      vpcId: vpc-prod-001   # Reference existing VPC
      publicEndpoints: false
      wafEnabled: true
      ddosProtection: true
    storage:
      s3Buckets:
        - name: payment-gateway-logs
          encryption: AES256
          retention: 90
  # ---------------------------------------------------------------------------
  # Security & Compliance
  # ---------------------------------------------------------------------------
  security:
    secretScanning: enabled
    pushProtection: enabled
    dependencyScanning: enabled
    codeOwners: true
    signedCommits: required
    compliance:
      soc2: true
      pciDss: true   # Triggers additional scanning and network isolation
  iam:
    roles:
      - name: payment-gateway-exec
        policies:
          - arn: arn:aws:iam::aws:policy/CloudWatchLogsFullAccess
          - arn: arn:aws:iam::123456789:policy/PaymentServiceAccess
  # ---------------------------------------------------------------------------
  # Observability
  # ---------------------------------------------------------------------------
  observability:
    datadog:
      enabled: true
      serviceName: payment-gateway
      apm: true
      logs: true
      infrastructure: true
      synthetics:
        - name: health-check
          url: /health
          frequency: 60
          locations: [aws:us-east-1, aws:us-west-2]
    alerts:
      pagerduty: payments-critical
      slack: "#payments-alerts"
    dashboards:
      - service-overview
      - payment-metrics
      - slo-compliance
  # ---------------------------------------------------------------------------
  # CI/CD Pipeline
  # ---------------------------------------------------------------------------
  pipeline:
    platform: github-actions
    stages:
      - name: build
        steps: [lint, unit-tests, build-image]
      - name: security
        steps: [trivy-scan, sonarqube, dependency-check]
      - name: integration
        steps: [integration-tests, contract-tests]
      - name: deploy
        strategy: blue-green
        environments: [staging, production]
        approval: production
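Before a request like this reaches policy evaluation, the portal checks that required fields are present. The architecture uses JSON Schema for that step; the sketch below is a standard-library stand-in that only checks presence, and the `REQUIRED_PATHS` list is an assumption chosen from fields in the example request rather than the platform's actual schema.

```python
# Hypothetical list of mandatory fields, written as dotted paths.
REQUIRED_PATHS = [
    "metadata.name",
    "metadata.team",
    "metadata.costCenter",
    "spec.repository.name",
    "spec.repository.visibility",
    "spec.infrastructure.environment",
]


def get_path(doc: dict, dotted: str):
    """Follow a dotted path through nested dicts; return None if any part is missing."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_fields(doc: dict, required: list = REQUIRED_PATHS) -> list:
    """Return the required paths that are absent or empty, in declaration order."""
    return [path for path in required if get_path(doc, path) in (None, "")]
```

Returning the full list of missing paths, rather than failing on the first one, lets the portal show all problems in a single round of feedback.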
Policy Validation with OPA/Rego
Open Policy Agent (OPA) evaluates every provisioning request against organizational policies written in Rego. Requests that violate policy are rejected with specific, actionable error messages.
# policies/provisioning.rego
package platform.provisioning

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Deny repositories without required tags
deny contains msg if {
    input.spec.repository
    # Collect the topics into a set so they can be subtracted from required_tag
    topics := {topic | some topic in input.spec.repository.topics}
    missing := required_tag - topics
    count(missing) > 0
    msg := sprintf("Repository missing required tags: %v", [missing])
}

required_tag := {"compliance-scope", "data-classification", "owner-team"}

# Deny production services without branch protection
deny contains msg if {
    input.spec.infrastructure.environment == "production"
    not input.spec.repository.branchProtection
    msg := "Production services require branch protection"
}

# Deny non-encrypted databases in PCI scope
deny contains msg if {
    input.spec.security.compliance.pciDss
    not input.spec.infrastructure.database.encrypted
    msg := "PCI DSS requires database encryption at rest"
}

# Deny public repositories for PCI-scoped services
deny contains msg if {
    input.spec.security.compliance.pciDss
    input.spec.repository.visibility == "public"
    msg := "PCI-scoped services cannot have public repositories"
}

# Deny if cost center not provided
deny contains msg if {
    not input.metadata.costCenter
    msg := "Cost center required for resource provisioning (FinOps policy)"
}

# Deny if requester not in approved team
deny contains msg if {
    team := input.metadata.team
    not team in data.approved_teams
    msg := sprintf("Team '%s' not in approved provisioning teams", [team])
}

# Deny overly permissive IAM policies
# (each policy entry in the request is an object with an `arn` field)
deny contains msg if {
    some role in input.spec.iam.roles
    some policy in role.policies
    policy.arn == "arn:aws:iam::aws:policy/AdministratorAccess"
    msg := sprintf("Role '%s' uses overly permissive AdministratorAccess policy", [role.name])
}

# Allow only if no denials
allow if {
    count(deny) == 0
}

# Approval workflow triggers
needs_approval if {
    input.spec.infrastructure.environment == "production"
}

# Large database instance classes also require approval.
# Set membership expresses "either class"; two equality checks in one body
# would be ANDed and could never both hold.
needs_approval if {
    input.spec.infrastructure.database.instanceClass in {"db.r6g.xlarge", "db.r6g.2xlarge"}
}
OPA Evaluation in Python
# validation_engine.py
from typing import Dict

import requests


class PolicyValidator:
    """Validates provisioning requests against OPA/Rego policies."""

    def __init__(self, opa_url: str = "http://localhost:8181"):
        self.opa_url = opa_url

    def validate(self, request: Dict) -> Dict:
        """
        Evaluate a provisioning request against all policies.

        Returns:
            {
                "allowed": bool,
                "violations": ["..."],
                "needs_approval": bool
            }
        """
        # Query the whole package so deny and needs_approval come back together
        response = requests.post(
            f"{self.opa_url}/v1/data/platform/provisioning",
            json={"input": request},
            timeout=10,
        )
        response.raise_for_status()

        # OPA omits undefined rules, so fall back to safe defaults
        result = response.json().get("result", {})
        denials = result.get("deny", [])
        needs_approval = result.get("needs_approval", False)

        return {
            "allowed": len(denials) == 0,
            "violations": denials,
            "needs_approval": needs_approval,
        }
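The dictionary returned by `validate` still has to be turned into a request state and a message for the developer. The sketch below shows one way to do that; the state names (`rejected`, `pending_approval`, `approved`) and the functions `decide` and `render_feedback` are assumptions, not part of the validator shown above.

```python
def decide(result: dict) -> str:
    """Map a validation result to a request state (state names are hypothetical)."""
    if not result["allowed"]:
        return "rejected"
    if result["needs_approval"]:
        return "pending_approval"
    return "approved"


def render_feedback(result: dict) -> str:
    """Build the message shown to the requester."""
    state = decide(result)
    if state == "rejected":
        # Sort so the output is stable regardless of the order OPA returns
        lines = [f"  - {violation}" for violation in sorted(result["violations"])]
        return "Request rejected:\n" + "\n".join(lines)
    if state == "pending_approval":
        return "Request passed policy checks and is waiting for approval."
    return "Request approved; provisioning will start shortly."
```

Checking `allowed` before `needs_approval` matters: a production request with violations should be rejected outright rather than sent to approvers.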
Terraform Cloud/Enterprise Integration
For infrastructure provisioning, the platform integrates with Terraform Cloud to provide policy checks, state management, and approval workflows.
# terraform/main.tf
# Infrastructure module referenced by the self-service platform
terraform {
  cloud {
    organization = "kuyaops"
    workspaces {
      tags = ["self-service"]
    }
  }
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

# Variables populated by the self-service platform API
variable "service_name" { type = string }
variable "environment" { type = string }
variable "request_id" { type = string }
variable "requester" { type = string }
variable "team" { type = string }
variable "cost_center" { type = string }

# Sizing values from the request spec. This declaration is an assumed
# addition: the module block below reads var.spec, so it must be declared.
variable "spec" {
  type    = map(number)
  default = {}
}

# Tagging standard for FinOps cost allocation
locals {
  common_tags = {
    Service     = var.service_name
    Environment = var.environment
    Team        = var.team
    CostCenter  = var.cost_center
    RequestId   = var.request_id
    ManagedBy   = "self-service-platform"
    Repository  = "github.com/j1-medilo06"
  }
}

# Resources are provisioned based on the request spec
module "ecs_service" {
  source       = "../modules/ecs-service"
  service_name = var.service_name
  environment  = var.environment
  tags         = local.common_tags

  # Auto-scaling from request spec, with fallbacks when a key is absent
  desired_count = lookup(var.spec, "min_replicas", 2)
  max_count     = lookup(var.spec, "max_replicas", 10)
}
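The comment "Variables populated by the self-service platform API" implies a mapping from the request document to Terraform input variables. A minimal sketch of that mapping follows. The function name `build_tf_vars` is hypothetical, and the document does not specify how the values reach the Terraform Cloud workspace, so this only builds the variable dictionary.

```python
import json


def build_tf_vars(request: dict, request_id: str) -> dict:
    """Translate a ServiceRequest document into Terraform input variables."""
    meta = request["metadata"]
    infra = request["spec"]["infrastructure"]
    compute = infra["compute"]
    return {
        "service_name": meta["name"],
        "environment": infra["environment"],
        "request_id": request_id,
        "requester": meta["requester"],
        "team": meta["team"],
        "cost_center": meta["costCenter"],
        # Keys match the lookup() calls in the module block
        "spec": {
            "min_replicas": compute["minReplicas"],
            "max_replicas": compute["maxReplicas"],
        },
    }


def to_tfvars_json(tf_vars: dict) -> str:
    """Serialize the variables as JSON with stable key order."""
    return json.dumps(tf_vars, indent=2, sort_keys=True)
```

Keeping the translation in one function makes the contract between the request schema (camelCase) and the Terraform variables (snake_case) explicit and easy to test.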
Service Catalog Design
The service catalog provides pre-approved, pre-configured service templates that developers can instantiate with minimal customization.
| Template | Stack | Infra | Security Level | Approval |
|---|---|---|---|---|
| microservice-golang | Go, gRPC, PostgreSQL | ECS Fargate, RDS | Standard | Auto |
| microservice-node | Node.js, REST, MongoDB | ECS Fargate, DocumentDB | Standard | Auto |
| web-frontend | React, TypeScript | S3 + CloudFront | Standard | Auto |
| data-pipeline | Python, Airflow | MWAA, S3, Athena | Standard | Manager |
| ml-inference | Python, FastAPI, Torch | SageMaker, ECR | Standard | Manager |
| pci-payment-service | Go, PostgreSQL, Vault | Isolated VPC, HSM | PCI DSS | Security + Manager |
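The Approval column can drive routing directly. The sketch below encodes that column as a lookup table and derives the approver roles for a template; `CATALOG_APPROVAL` and `required_approvers` are hypothetical names, and the lowercase role strings are an assumed convention.

```python
# Approval column of the service catalog, keyed by template name.
CATALOG_APPROVAL = {
    "microservice-golang": "Auto",
    "microservice-node": "Auto",
    "web-frontend": "Auto",
    "data-pipeline": "Manager",
    "ml-inference": "Manager",
    "pci-payment-service": "Security + Manager",
}


def required_approvers(template: str) -> list:
    """Return the approver roles for a template; an empty list means auto-approval."""
    if template not in CATALOG_APPROVAL:
        raise ValueError(f"Unknown service template: {template!r}")
    approval = CATALOG_APPROVAL[template]
    if approval == "Auto":
        return []
    # "Security + Manager" -> ["security", "manager"]
    return [part.strip().lower() for part in approval.split("+")]
```

Rejecting unknown template names keeps the catalog as the only source of pre-approved configurations.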
Approval Workflows for Production Access
# approval-workflow.yaml
# Production access requires multi-level approval
workflows:
  standard_production:
    triggers:
      - environment == "production"
      - estimated_monthly_cost > 500
    steps:
      - level: 1
        approvers: ["team-lead"]
        timeout: 24h
        auto_escalate: true
      - level: 2
        approvers: ["engineering-manager"]
        timeout: 48h
        auto_escalate: true
      - level: 3
        approvers: ["platform-team"]
        timeout: 72h
  high_security:
    triggers:
      - compliance.pciDss == true
      - compliance.soc2 == true
    steps:
      - level: 1
        approvers: ["security-team"]
        requires: ["security-review-checklist"]
      - level: 2
        approvers: ["ciso-delegate"]
        timeout: 5d
  emergency_bypass:
    triggers:
      - incident.severity == "critical"
    steps:
      - level: 0
        approvers: ["oncall-sre"]
    post_approval_review: 24h
    audit_priority: high
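The `timeout` and `auto_escalate` fields imply a schedule of escalation deadlines. The sketch below computes one under two stated assumptions that the configuration does not confirm: each level's clock starts when the previous level times out, and a step without a `timeout` has no deadline and does not advance the clock. The names `parse_timeout` and `escalation_deadlines` are hypothetical.

```python
from datetime import datetime, timedelta

# Duration suffixes used in the workflow file, mapped to timedelta keywords.
UNITS = {"h": "hours", "d": "days"}


def parse_timeout(value: str) -> timedelta:
    """Parse durations such as '24h' or '5d'."""
    number, unit = value[:-1], value[-1]
    if unit not in UNITS or not number.isdigit():
        raise ValueError(f"Unsupported timeout: {value!r}")
    return timedelta(**{UNITS[unit]: int(number)})


def escalation_deadlines(steps: list, started: datetime) -> dict:
    """Return {level: deadline or None}, chaining timeouts one after another."""
    deadlines = {}
    cursor = started
    for step in steps:
        timeout = step.get("timeout")
        if timeout is None:
            deadlines[step["level"]] = None  # assumed: no deadline for this level
            continue
        cursor = cursor + parse_timeout(timeout)
        deadlines[step["level"]] = cursor
    return deadlines
```

For the `standard_production` workflow, a request opened at midnight on 1 January would reach its level 1, 2, and 3 deadlines at midnight on 2, 4, and 7 January under these assumptions.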
Metrics and Adoption Tracking
Platform success is measured through developer productivity and operational efficiency metrics.
| Metric | Before Platform | After Platform | Target |
|---|---|---|---|
| Developer onboarding time | 2-4 days | 15 minutes | < 10 minutes |
| Repo provisioning time | 1-2 days (ticket) | 3 minutes | < 2 minutes |
| Infra provisioning time | 3-5 days | 15 minutes | < 10 minutes |
| Policy violations (production) | 12/month | 0.5/month | 0 |
| Platform adoption rate | N/A | 85% | > 90% |
| Mean time to first commit | 5 days | 2 hours | < 1 hour |
Alongside these platform metrics, track the four DORA delivery metrics and developer satisfaction:
- Deployment Frequency: How often developers deploy (should increase)
- Lead Time for Changes: Commit to production time (should decrease)
- Change Failure Rate: Percentage of deployments causing incidents (should decrease)
- Time to Recovery: Mean time to restore service (should decrease)
- Developer Satisfaction (NPS): Quarterly surveys on platform experience
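Three of the delivery metrics listed above can be computed from a simple log of deployments. This is a minimal sketch: the `Deployment` record and `summarize` function are hypothetical, and using the median for lead time is a design choice made here, not something the document prescribes.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime        # when the change was committed
    deployed_at: datetime         # when it reached production
    caused_incident: bool = False


def summarize(deployments: list, period_days: int) -> dict:
    """Compute deployment frequency, lead time, and change failure rate."""
    if not deployments or period_days <= 0:
        raise ValueError("Need at least one deployment and a positive period")
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    failures = sum(1 for d in deployments if d.caused_incident)
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": failures / len(deployments),
    }
```

The median is less sensitive than the mean to a single long-running change, which makes trends easier to read on small teams with few deployments.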
References
| Resource | Link |
|---|---|
| Open Policy Agent | openpolicyagent.org |
| Rego Language Reference | OPA Policy Language |
| Terraform Cloud | HashiCorp Docs |
| Platform Engineering Book | O'Reilly Platform Engineering |
| Team Topologies | teamtopologies.com |
| Author GitHub | github.com/j1-medilo06 |