Self-Service Platform

A self-service platform empowers developers to provision infrastructure, create repositories, and deploy applications without manual operations intervention.

Self-Service Platform Principles

Moving from ticket-driven operations to a self-service platform is one of the highest-return changes a platform team can make. At Samsung Research, applying these principles cut developer onboarding time from 4 days to under 2 hours and eliminated 80% of repetitive infrastructure tickets.

Principle	Description	Anti-Pattern
Guardrails, not gates	Automated validation, not human review, blocks bad configurations	Requiring manager approval for dev environment access
GitOps-driven	All changes via version-controlled pull requests	Click-ops in cloud consoles
Failsafe defaults	Secure, compliant defaults applied automatically	Blank templates requiring manual security configuration
Observable	Every provisioning action logged and auditable	Silent automation with no audit trail
Composable	Users combine building blocks instead of writing custom scripts	One-off Terraform modules per service
Measurable	Platform success measured by developer productivity metrics	Platform team measured by tickets closed
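The "Failsafe defaults" and "Guardrails, not gates" principles work together. Defaults fill in whatever a requester leaves out, and policy checks then reject unsafe overrides. Below is a minimal sketch of the defaulting half. It assumes requests arrive as parsed dictionaries. `SECURE_DEFAULTS` and `apply_defaults` are hypothetical names, and the default values are borrowed from the example request later on this page.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical platform defaults; values mirror the example request on this page.
SECURE_DEFAULTS: Dict[str, Any] = {
    "repository": {
        "visibility": "private",
        "settings": {"deleteBranchOnMerge": True, "allowMergeCommit": False},
    },
    "security": {
        "secretScanning": "enabled",
        "pushProtection": "enabled",
        "dependencyScanning": "enabled",
    },
}


def apply_defaults(request_spec: Dict[str, Any],
                   defaults: Dict[str, Any] = SECURE_DEFAULTS) -> Dict[str, Any]:
    """Return a new spec in which every omitted key takes its default.

    Nested dicts are merged recursively and explicit requester values win.
    Neither input is mutated.
    """
    merged = deepcopy(defaults)
    for key, value in request_spec.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_defaults(value, merged[key])
        else:
            merged[key] = deepcopy(value)
    return merged
```

In this sketch, explicit requester values win, so a requester could still ask for `visibility: public`. Catching that is the job of the policy layer, not the defaulting step.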

Architecture Components

┌─────────────────────────────────────────────────────────────────────────────┐
│                               REQUEST PORTAL                                 │
│  ┌─────────────────┐  ┌──────────────────┐  ┌──────────────────────────┐   │
│  │ Web UI (React)  │  │ Slack Bot        │  │ CLI Tool                 │   │
│  │                 │  │                  │  │                          │   │
│  │ - Form-based    │  │ /provision-repo  │  │ kuyaops provision repo   │   │
│  │   config wizard │  │ /request-access  │  │ kuyaops provision infra  │   │
│  │ - Live feedback │  │ /my-resources    │  │ kuyaops status           │   │
│  │ - Request log   │  │                  │  │                          │   │
│  └────────┬────────┘  └────────┬─────────┘  └────────────┬─────────────┘   │
└───────────┼────────────────────┼─────────────────────────┼─────────────────┘
            │                    │                         │
            └────────────────────┼─────────────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │  API Gateway        │
                    │  (AuthN/AuthZ)      │
                    └──────────┬──────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────────────────┐
│                            VALIDATION ENGINE                                 │
│                                                                              │
│  ┌──────────────────────┐  ┌──────────────────┐  ┌──────────────────────┐   │
│  │ Schema Validation    │  │ Policy-as-Code   │  │ Cost Estimation      │   │
│  │ (JSON Schema)        │  │ (OPA/Rego)       │  │ (Infracost)          │   │
│  │                      │  │                  │  │                      │   │
│  │ - Required fields    │  │ - Naming conv.   │  │ - Monthly cost       │   │
│  │ - Type checking      │  │ - Tag compliance │  │ - Budget alerts      │   │
│  │ - Reference validity │  │ - Security rules │  │ - Show diff          │   │
│  └──────────────────────┘  └──────────────────┘  └──────────────────────┘   │
└──────────────────────────────┬──────────────────────────────────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │  Request Queue      │
                    │  (Redis/RabbitMQ)   │
                    └──────────┬──────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────────────────┐
│                         PROVISIONING ORCHESTRATOR                            │
│                                                                              │
│  ┌─────────────────┐  ┌───────────────────┐  ┌────────────────────────────┐  │
│  │ GitHub Provision│  │ Terraform Cloud   │  │ Secrets Manager            │  │
│  │ (Python/GitHub  │  │ (Infrastructure   │  │ (HashiCorp Vault /         │  │
│  │  API)           │  │  provisioning)    │  │  AWS Secrets Manager)      │  │
│  │                 │  │                   │  │                            │  │
│  │ - Repo creation │  │ - VPC/ECS/RDS     │  │ - Inject CI/CD secrets     │  │
│  │ - Branch protec.│  │ - IAM roles       │  │ - Rotate credentials       │  │
│  │ - Team access   │  │ - DNS/certificates│  │ - App config injection     │  │
│  └─────────────────┘  └───────────────────┘  └────────────────────────────┘  │
└──────────────────────────────┬──────────────────────────────────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │  Notification Bus   │
                    │  (Event-driven)     │
                    └──────────┬──────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────────────────┐
│                          NOTIFICATION SYSTEM                                 │
│                                                                              │
│  ┌─────────────────┐  ┌───────────────────┐  ┌────────────────────────────┐  │
│  │ Slack           │  │ Email             │  │ Audit Log (SIEM)           │  │
│  │ #provision-notif│  │ Welcome email     │  │ CloudWatch/Splunk          │  │
│  │ #security-alerts│  │ Cost alerts       │  │ SOC 2 evidence             │  │
│  └─────────────────┘  └───────────────────┘  └────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
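The Request Queue decouples validation from provisioning, and the Notification Bus decouples provisioning from notification. The sketch below models both in-process. Python's `queue.Queue` stands in for Redis or RabbitMQ, and a simple publish/subscribe registry stands in for the event bus. `NotificationBus`, `run_orchestrator`, and the event names `provisioned` and `failed` are hypothetical.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NotificationBus:
    """In-process stand-in for the event-driven notification bus."""
    subscribers: Dict[str, List[Callable[[dict], None]]] = field(default_factory=dict)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Fan out to every handler registered for this event type.
        for handler in self.subscribers.get(event_type, []):
            handler(payload)


def run_orchestrator(request_queue: queue.Queue, bus: NotificationBus,
                     provision: Callable[[dict], dict]) -> int:
    """Drain the queue, provision each request, publish the outcome, and return the count."""
    processed = 0
    while True:
        try:
            request = request_queue.get_nowait()
        except queue.Empty:
            return processed
        try:
            resources = provision(request)
            bus.publish("provisioned", {"request": request["name"], "resources": resources})
        except Exception as exc:  # report the failure instead of crashing the worker
            bus.publish("failed", {"request": request["name"], "error": str(exc)})
        finally:
            request_queue.task_done()
        processed += 1
```

Slack, email, and the audit log subscribe independently, so adding a new sink does not touch the orchestrator. A real deployment would also need durable delivery and retries, which an in-memory queue does not provide.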

Repository Provisioning Workflow

1. Developer submits config YAML
   The developer fills out a form or writes a YAML file describing the desired repository configuration. The portal validates it against the schema and gives live feedback.
2. Validation against standards
   OPA/Rego policies enforce naming conventions, required tags, approved technology choices, and security baselines. Violations are rejected with specific guidance.
3. GitHub repo creation via API
   The GitHub automation framework creates the repository with the correct settings, branch protection, and team access.
4. Branch protection setup
   Required status checks, code owner reviews, and merge restrictions are applied automatically based on the repository classification (production, internal, or experimental).
5. Secrets injection (Vault/AWS SM)
   CI/CD secrets, API keys, and service credentials are provisioned and injected directly into GitHub Actions secrets, so they are never exposed to developers.
6. SonarQube project registration
   A SonarQube project is created with quality gate rules appropriate for the technology stack and risk classification.
7. CI/CD pipeline generation
   Standardized GitHub Actions workflows are generated from templates, with build, test, security scan, and deploy stages.
8. Slack notification
   The developer and the team channel receive a confirmation with resource links, estimated costs, and next steps.
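The workflow above is a fixed sequence in which each step depends on the previous one: secrets cannot be injected into a repository that does not exist yet. Below is a minimal step-runner sketch under that assumption. `run_workflow` and the step callables are hypothetical. Each step returns a dict of new facts, such as a repository URL, that later steps can read.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair; the function receives the shared context.
Step = Tuple[str, Callable[[dict], dict]]


def run_workflow(request: dict, steps: List[Step]) -> Dict[str, object]:
    """Run provisioning steps in order, passing accumulated context forward.

    Stops at the first failing step, so later steps never run against a
    partially created repository.
    """
    context: dict = {"request": request}
    completed: List[str] = []
    for name, step in steps:
        try:
            context.update(step(context))
        except Exception as exc:
            return {"status": "failed", "failed_step": name,
                    "error": str(exc), "completed": completed}
        completed.append(name)
    return {"status": "succeeded", "completed": completed, "context": context}
```

This runner stops at the first failure and reports which step broke, but it does not roll back completed steps. A production orchestrator would likely need compensating actions or idempotent retries.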

Complete Request Config YAML

# provisioning-request.yaml
# Complete self-service request for a new service
# Submit via: kuyaops provision apply -f provisioning-request.yaml

apiVersion: platform.kuyaops.com/v1
kind: ServiceRequest
metadata:
  name: payment-gateway-service
  requester: john.doe@company.com
  team: payments-team
  costCenter: CC-12345
  jiraTicket: PROJ-4567

spec:
  # ---------------------------------------------------------------------------
  # Repository Configuration
  # ---------------------------------------------------------------------------
  repository:
    name: payment-gateway
    description: "PCI-compliant payment processing service"
    visibility: private
    template: microservice-golang  # From service catalog
    classification: production     # Determines security controls
    
    settings:
      defaultBranch: main
      deleteBranchOnMerge: true
      allowSquashMerge: true
      allowRebaseMerge: false
      allowMergeCommit: false
      
    branchProtection:
      requiredChecks:
        - "ci/lint"
        - "ci/unit-tests"
        - "ci/integration-tests"
        - "ci/security-scan (Trivy)"
        - "ci/sonarqube"
        - "terraform/plan"
      requiredReviewers: 2
      requireCodeOwnerReview: true
      dismissStaleReviews: true
      requireLinearHistory: true
      
    teams:
      - name: payments-team
        permission: maintain
      - name: sre-oncall
        permission: admin
      - name: security-reviewers
        permission: push
        
    topics:
      - payments
      - pci-scope
      - golang
      - grpc
      - microservice
      
  # ---------------------------------------------------------------------------
  # Infrastructure Configuration
  # ---------------------------------------------------------------------------
  infrastructure:
    provider: aws
    region: us-east-1
    environment: production
    
    compute:
      type: ecs-fargate
      cpu: 1024
      memory: 2048
      minReplicas: 3
      maxReplicas: 20
      
    database:
      engine: postgres
      version: "15"
      instanceClass: db.r6g.large
      multiAz: true
      encrypted: true
      backupRetention: 30
      
    cache:
      engine: redis
      nodeType: cache.r6g.large
      
    networking:
      vpcId: vpc-prod-001  # Reference existing VPC
      publicEndpoints: false
      wafEnabled: true
      ddosProtection: true
      
    storage:
      s3Buckets:
        - name: payment-gateway-logs
          encryption: AES256
          retention: 90
          
  # ---------------------------------------------------------------------------
  # Security & Compliance
  # ---------------------------------------------------------------------------
  security:
    secretScanning: enabled
    pushProtection: enabled
    dependencyScanning: enabled
    codeOwners: true
    signedCommits: required
    
    compliance:
      soc2: true
      pciDss: true  # Triggers additional scanning and network isolation
      
    iam:
      roles:
        - name: payment-gateway-exec
          policies:
            - arn: arn:aws:iam::aws:policy/CloudWatchLogsFullAccess
            - arn: arn:aws:iam::123456789012:policy/PaymentServiceAccess
            
  # ---------------------------------------------------------------------------
  # Observability
  # ---------------------------------------------------------------------------
  observability:
    datadog:
      enabled: true
      serviceName: payment-gateway
      apm: true
      logs: true
      infrastructure: true
      synthetics:
        - name: health-check
          url: /health
          frequency: 60
          locations: [aws:us-east-1, aws:us-west-2]
          
    alerts:
      pagerduty: payments-critical
      slack: "#payments-alerts"
      
    dashboards:
      - service-overview
      - payment-metrics
      - slo-compliance
      
  # ---------------------------------------------------------------------------
  # CI/CD Pipeline
  # ---------------------------------------------------------------------------
  pipeline:
    platform: github-actions
    stages:
      - name: build
        steps: [lint, unit-tests, build-image]
      - name: security
        steps: [trivy-scan, sonarqube, dependency-check]
      - name: integration
        steps: [integration-tests, contract-tests]
      - name: deploy
        strategy: blue-green
        environments: [staging, production]
        approval: production
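Before the request reaches OPA, the Schema Validation stage checks required fields and types. The platform is described as using JSON Schema. The stdlib-only sketch below approximates a small, hypothetical subset of such a schema for the request above. It assumes the YAML has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. `REQUIRED_FIELDS`, `get_path`, and `schema_errors` are illustrative names.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical subset of the ServiceRequest schema: dotted path -> expected type.
REQUIRED_FIELDS: List[Tuple[str, type]] = [
    ("apiVersion", str),
    ("kind", str),
    ("metadata.name", str),
    ("metadata.team", str),
    ("metadata.costCenter", str),
    ("spec.repository.name", str),
    ("spec.repository.visibility", str),
    ("spec.infrastructure.compute.minReplicas", int),
    ("spec.infrastructure.compute.maxReplicas", int),
]


def get_path(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node: Any = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def schema_errors(doc: Dict[str, Any]) -> List[str]:
    """Return human-readable schema errors; an empty list means the shape is valid."""
    errors: List[str] = []
    for path, expected in REQUIRED_FIELDS:
        value = get_path(doc, path)
        if value is None:
            errors.append(f"{path}: required field is missing")
        elif type(value) is not expected:  # exact match, so True is not accepted as an int
            errors.append(f"{path}: expected {expected.__name__}, got {type(value).__name__}")
    # Cross-field check: the replica range must not be inverted.
    low = get_path(doc, "spec.infrastructure.compute.minReplicas")
    high = get_path(doc, "spec.infrastructure.compute.maxReplicas")
    if type(low) is int and type(high) is int and low > high:
        errors.append("spec.infrastructure.compute: minReplicas exceeds maxReplicas")
    return errors
```

The replica check compares two values, which plain JSON Schema cannot express, so it could equally live in Rego.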

Policy Validation with OPA/Rego

Open Policy Agent (OPA) evaluates every provisioning request against organizational policies written in Rego. Requests that violate policy are rejected with specific, actionable error messages.

# policies/provisioning.rego
package platform.provisioning

import rego.v1

# Deny repositories without required tags (tracked as repository topics)
deny contains msg if {
    input.spec.repository
    topics := {t | some t in input.spec.repository.topics}
    missing := required_tags - topics
    count(missing) > 0
    msg := sprintf("Repository missing required tags: %v", [missing])
}

required_tags := {"compliance-scope", "data-classification", "owner-team"}

# Deny production services without branch protection
deny contains msg if {
    input.spec.infrastructure.environment == "production"
    not input.spec.repository.branchProtection
    msg := "Production services require branch protection"
}

# Deny non-encrypted databases in PCI scope
deny contains msg if {
    input.spec.security.compliance.pciDss
    not input.spec.infrastructure.database.encrypted
    msg := "PCI DSS requires database encryption at rest"
}

# Deny public repositories for PCI-scoped services
deny contains msg if {
    input.spec.security.compliance.pciDss
    input.spec.repository.visibility == "public"
    msg := "PCI-scoped services cannot have public repositories"
}

# Deny if cost center not provided
deny contains msg if {
    not input.metadata.costCenter
    msg := "Cost center required for resource provisioning (FinOps policy)"
}

# Deny if requester not in approved team
deny contains msg if {
    team := input.metadata.team
    not team in data.approved_teams
    msg := sprintf("Team '%s' not in approved provisioning teams", [team])
}

# Deny overly permissive IAM policies
deny contains msg if {
    some role in input.spec.security.iam.roles
    some policy in role.policies
    policy.arn == "arn:aws:iam::aws:policy/AdministratorAccess"
    msg := sprintf("Role '%s' uses overly permissive AdministratorAccess policy", [role.name])
}

# Allow only if no denials
allow if {
    count(deny) == 0
}

# Approval workflow triggers
default needs_approval := false

needs_approval if {
    input.spec.infrastructure.environment == "production"
}

# Larger database instance classes also require approval
needs_approval if {
    input.spec.infrastructure.database.instanceClass in {"db.r6g.xlarge", "db.r6g.2xlarge"}
}

OPA Evaluation in Python

# validation_engine.py
import requests
from typing import Dict


class PolicyValidator:
    """Validates provisioning requests against OPA/Rego policies."""
    
    def __init__(self, opa_url: str = "http://localhost:8181"):
        self.opa_url = opa_url
    
    def validate(self, request: Dict) -> Dict:
        """
        Evaluate a provisioning request against all policies.
        
        Returns:
            {
                "allowed": bool,
                "violations": ["..."],
                "needs_approval": bool
            }
        """
        response = requests.post(
            f"{self.opa_url}/v1/data/platform/provisioning",
            json={"input": request},
            timeout=10
        )
        response.raise_for_status()
        
        result = response.json().get("result", {})
        
        denials = result.get("deny", [])
        needs_approval = result.get("needs_approval", False)
        
        return {
            "allowed": len(denials) == 0,
            "violations": denials,
            "needs_approval": needs_approval
        }
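The `validate()` result feeds the next stage of the workflow. Below is a minimal sketch of that routing. It assumes three outcomes: reject, hold for approval, or enqueue for provisioning. The stage labels and the helper names `route_request` and `format_violations` are hypothetical.

```python
from typing import Any, Dict, List


def route_request(decision: Dict[str, Any]) -> str:
    """Map a PolicyValidator.validate() result to a hypothetical next stage."""
    if not decision["allowed"]:
        return "rejected"          # a policy denial always wins
    if decision["needs_approval"]:
        return "pending_approval"  # hand off to the approval workflow
    return "queued"                # straight to the provisioning queue


def format_violations(violations: List[str]) -> str:
    """Render denial messages as a numbered list for Slack or CLI output."""
    return "\n".join(f"{i}. {msg}" for i, msg in enumerate(sorted(violations), start=1))
```

Checking denials first means an approver is never asked to sign off on a request that policy has already rejected.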

Terraform Cloud/Enterprise Integration

For infrastructure provisioning, the platform integrates with Terraform Cloud to provide policy checks, state management, and approval workflows.

# terraform/main.tf
# Infrastructure module referenced by the self-service platform

terraform {
  cloud {
    organization = "kuyaops"
    workspaces {
      tags = ["self-service"]
    }
  }
  
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

# Variables populated by the self-service platform API
variable "service_name" { type = string }
variable "environment" { type = string }
variable "request_id" { type = string }
variable "requester" { type = string }
variable "team" { type = string }
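variable "cost_center" { type = string }

# Sizing values from the request spec (e.g. min_replicas), read via lookup() below
variable "spec" {
  type    = map(number)
  default = {}
}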

# Tagging standard for FinOps cost allocation
locals {
  common_tags = {
    Service     = var.service_name
    Environment = var.environment
    Team        = var.team
    CostCenter  = var.cost_center
    RequestId   = var.request_id
    ManagedBy   = "self-service-platform"
    Repository  = "github.com/j1-medilo06"
  }
}

# Resources are provisioned based on the request spec
module "ecs_service" {
  source = "../modules/ecs-service"
  
  service_name = var.service_name
  environment  = var.environment
  tags         = local.common_tags
  
  # Auto-scaling from request spec
  desired_count = lookup(var.spec, "min_replicas", 2)
  max_count     = lookup(var.spec, "max_replicas", 10)
}
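The Terraform configuration expects the platform API to supply its variables, including the `spec` map read through `lookup()`. Below is a minimal sketch of that mapping from the request document. It assumes the field names of the example request YAML. The mapping itself, such as using `metadata.name` as `service_name`, is an assumption, and `to_tfvars` and `write_tfvars` are hypothetical helpers. Terraform automatically loads files whose names end in `.auto.tfvars.json`, so writing one next to the configuration is one way to pass the values. A workspace driven by the Terraform Cloud API would more likely receive them as workspace variables instead.

```python
import json
from typing import Any, Dict


def to_tfvars(request: Dict[str, Any], request_id: str) -> Dict[str, Any]:
    """Translate a ServiceRequest document into Terraform input variables."""
    meta = request["metadata"]
    infra = request["spec"]["infrastructure"]
    compute = infra.get("compute", {})
    # camelCase request keys -> snake_case keys read by lookup(var.spec, ...).
    # Omitted keys are left out so Terraform's lookup() defaults apply.
    key_map = {"minReplicas": "min_replicas", "maxReplicas": "max_replicas"}
    spec = {tf_key: compute[req_key] for req_key, tf_key in key_map.items() if req_key in compute}
    return {
        "service_name": meta["name"],
        "environment": infra["environment"],
        "request_id": request_id,
        "requester": meta["requester"],
        "team": meta["team"],
        "cost_center": meta["costCenter"],
        "spec": spec,
    }


def write_tfvars(path: str, tfvars: Dict[str, Any]) -> None:
    """Write variables as JSON; name the file *.auto.tfvars.json for auto-loading."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(tfvars, fh, indent=2)
```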

Service Catalog Design

The service catalog provides pre-approved, pre-configured service templates that developers can instantiate with minimal customization.

Template	Stack	Infra	Security Level	Approval
`microservice-golang`	Go, gRPC, PostgreSQL	ECS Fargate, RDS	Standard	Auto
`microservice-node`	Node.js, REST, MongoDB	ECS Fargate, DocumentDB	Standard	Auto
`web-frontend`	React, TypeScript	S3 + CloudFront	Standard	Auto
`data-pipeline`	Python, Airflow	MWAA, S3, Athena	Standard	Manager
`ml-inference`	Python, FastAPI, PyTorch	SageMaker, ECR	Standard	Manager
`pci-payment-service`	Go, PostgreSQL, Vault	Isolated VPC, HSM	PCI DSS	Security + Manager
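The catalog fixes the approval path for each template, so the platform can resolve approvers with a table lookup instead of per-request judgment. Below is a minimal sketch that encodes the Approval column above. `CATALOG_APPROVALS` and `required_approvers` are hypothetical names. Listing security before manager for `pci-payment-service` is an assumption, since the table does not state an order.

```python
from typing import Dict, List

# Approval column of the service catalog table; an empty list means "Auto".
# The order within a list (security before manager) is an assumption.
CATALOG_APPROVALS: Dict[str, List[str]] = {
    "microservice-golang": [],
    "microservice-node": [],
    "web-frontend": [],
    "data-pipeline": ["manager"],
    "ml-inference": ["manager"],
    "pci-payment-service": ["security", "manager"],
}


def required_approvers(template: str) -> List[str]:
    """Return the approver groups a catalog template requires."""
    if template not in CATALOG_APPROVALS:
        # Fail closed: an unknown template never falls through to auto-approval.
        raise ValueError(f"template {template!r} is not in the service catalog")
    return list(CATALOG_APPROVALS[template])  # copy so callers cannot edit the catalog
```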

Approval Workflows for Production Access

# approval-workflow.yaml
# Production access requires multi-level approval

workflows:
  standard_production:
    triggers:
      - environment == "production"
      - estimated_monthly_cost > 500
    steps:
      - level: 1
        approvers: ["team-lead"]
        timeout: 24h
        auto_escalate: true
      - level: 2
        approvers: ["engineering-manager"]
        timeout: 48h
        auto_escalate: true
      - level: 3
        approvers: ["platform-team"]
        timeout: 72h
        
  high_security:
    triggers:
      - compliance.pciDss == true
      - compliance.soc2 == true
    steps:
      - level: 1
        approvers: ["security-team"]
        requires: ["security-review-checklist"]
      - level: 2
        approvers: ["ciso-delegate"]
        timeout: 5d
        
  emergency_bypass:
    triggers:
      - incident.severity == "critical"
    steps:
      - level: 0
        approvers: ["oncall-sre"]
        post_approval_review: 24h
        audit_priority: high
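Suppose the `standard_production` timeouts run sequentially, with each level's clock starting when the previous level times out and auto-escalates. Then the last level's timeout expires at most 24h + 48h + 72h = 144 hours (6 days) after submission. Below is a minimal sketch of that calculation. `parse_duration` and `escalation_deadlines` are hypothetical helpers, and the sequential reading of `timeout` is an assumption about the workflow engine.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional

# Units used in approval-workflow.yaml (h, d), plus minutes for convenience.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse a timeout such as '24h' or '5d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def escalation_deadlines(submitted: datetime,
                         timeouts: List[Optional[str]]) -> List[Optional[datetime]]:
    """Worst-case deadline for each approval level, applied in order.

    A level with no timeout can wait indefinitely, so it and every later
    level get None.
    """
    deadlines: List[Optional[datetime]] = []
    clock: Optional[datetime] = submitted
    for timeout in timeouts:
        clock = None if clock is None or timeout is None else clock + parse_duration(timeout)
        deadlines.append(clock)
    return deadlines
```

In `high_security`, level 1 has no timeout, so no worst-case deadline exists. That may be intentional for security review, but it is worth surfacing to requesters.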

Metrics and Adoption Tracking

Platform success is measured through developer productivity and operational efficiency metrics.

Metric	Before Platform	After Platform	Target
Developer onboarding time	2–4 days	15 minutes	< 10 minutes
Repo provisioning time	1–2 days (ticket)	3 minutes	< 2 minutes
Infra provisioning time	3–5 days	15 minutes	< 10 minutes
Policy violations (production)	12/month	0.5/month	0
Platform adoption rate	N/A	85%	> 90%
Mean time to first commit	5 days	2 hours	< 1 hour

Platform Engineering KPIs (the first four are the DORA metrics):

Deployment Frequency: How often developers deploy (should increase)
Lead Time for Changes: Commit to production time (should decrease)
Change Failure Rate: Percentage of deployments causing incidents (should decrease)
Time to Recovery: Mean time to restore service (should decrease)
Developer Satisfaction (NPS): Quarterly surveys on platform experience
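The first four KPIs can be computed from deployment records. Below is a minimal sketch. It assumes each production deployment is recorded with its commit time, its deploy time, and, if it caused an incident, the time service was restored. `Deployment` and `dora_summary` are hypothetical names. Lead time uses the median so that a few long-lived branches do not skew it. Recovery time here is measured from the faulty deploy, which only approximates the incident start.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit in the change
    deployed_at: datetime                   # change reached production
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deploys: List[Deployment], window_days: int) -> dict:
    """Summarize deployment records over a reporting window of window_days."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.caused_incident]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores) if restores else None,
    }
```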

References

Resource	Link
Open Policy Agent	openpolicyagent.org
Rego Language Reference	OPA Policy Language
Terraform Cloud	HashiCorp Docs
Platform Engineering Book	O'Reilly Platform Engineering
Team Topologies	teamtopologies.com
Author GitHub	github.com/j1-medilo06