HashiCorp Vault
Vault secures, stores, and tightly controls access to tokens, passwords, certificates, encryption keys, and other sensitive secrets. This guide covers production-grade deployment, authentication, secret engines, and platform integration.
Vault Architecture
Vault operates on a client-server architecture with three core components:
| Component | Description | Responsibility |
|---|---|---|
| Vault Server | The core daemon handling API requests | Secret storage, access control, encryption, audit logging |
| Storage Backend | Encrypted data persistence layer | Stores all secrets and Vault's own data at rest |
| Vault Agent | Client-side daemon for token management | Auto-auth, caching, templating, sidecar injection |
+--------------------------------------------------------+
|                  Vault Server Cluster                  |
|   +----------+      +-----------+      +-----------+   |
|   |  Node 1  |<---->|  Node 2   |<---->|  Node 3   |   |
|   | (Active) |      | (Standby) |      | (Standby) |   |
|   +----+-----+      +-----+-----+      +-----+-----+   |
|        |                  |                  |         |
|        +------------------+------------------+         |
|                           |                            |
|                  +--------+--------+                   |
|                  |    Raft Log     |                   |
|                  |  (Integrated)   |                   |
|                  +--------+--------+                   |
|                           |                            |
|                  +--------+--------+                   |
|                  |  Local disk on  |                   |
|                  |   each node     |                   |
|                  +-----------------+                   |
+--------------------------------------------------------+
Deployment Modes
| Mode | Use Case | Storage | High Availability |
|---|---|---|---|
| Dev | Local development, testing | In-memory | No |
| Server | Production single node | File, Consul, S3, PostgreSQL | No |
| HA with Raft | Production cluster (recommended) | Integrated Raft | Yes, automatic failover |
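The "HA with Raft" row depends on quorum: a Raft cluster commits writes only while a majority of its voters can reach each other. A minimal sketch of that arithmetic (plain Python, not a Vault API) shows why three nodes is the practical minimum:

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of a Raft voter set."""
    if voters < 1:
        raise ValueError("a Raft cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """Number of voters that can fail while a majority is still reachable."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} voters: quorum {quorum_size(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Four voters tolerate the same single failure as three, which is why odd cluster sizes such as 3 or 5 are the usual choice.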
Complete Vault Server Configuration
Production HA Configuration (vault.hcl)
# /etc/vault.d/vault.hcl
# Production Vault server configuration with Raft HA
# Listener configuration with TLS
listener "tcp" {
  address         = "0.0.0.0:8200"
  tls_cert_file   = "/opt/vault/tls/vault.crt"
  tls_key_file    = "/opt/vault/tls/vault.key"
  tls_min_version = "tls13"
  # Request (but do not require) client certificates; this is Vault's default
  tls_disable_client_certs = false
}
# Integrated Raft storage (recommended for HA)
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1"
  # Raft cluster peers (3-node minimum for production)
  retry_join {
    leader_api_addr     = "https://vault-node-2.internal:8200"
    leader_ca_cert_file = "/opt/vault/tls/ca.crt"
  }
  retry_join {
    leader_api_addr     = "https://vault-node-3.internal:8200"
    leader_ca_cert_file = "/opt/vault/tls/ca.crt"
  }
  # Raft performance tuning (1 = tightest timing, intended for production)
  performance_multiplier = 1
}
# Autopilot is not set in this file; apply it at runtime once the cluster is up:
# vault operator raft autopilot set-config \
#   -cleanup-dead-servers=true -last-contact-threshold=10s \
#   -max-trailing-logs=1000 -min-quorum=3 -server-stabilization-time=10s
# Auto-unseal with AWS KMS (recommended for production)
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "arn:aws:kms:us-east-1:123456789:alias/vault-unseal"
}
# Telemetry for monitoring (hostname as a label rather than a metric-name prefix)
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
  enable_hostname_label     = true
}
# Audit devices are enabled through the CLI or API after initialization
# (there is no audit_device stanza in the server configuration):
# vault audit enable file file_path=/var/log/vault/audit.log log_raw=false hmac_accessor=true
# Cluster addressing
api_addr     = "https://vault-node-1.internal:8200"
cluster_addr = "https://vault-node-1.internal:8201"
cluster_name = "production-vault"
# Lease defaults and process settings
default_lease_ttl = "768h"  # 32 days
max_lease_ttl     = "8760h" # 1 year
# With integrated storage, HashiCorp recommends disable_mlock = true and disabling swap on the host
disable_mlock = true
ui            = true
log_level     = "info"
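Durations such as "768h" appear throughout this file. A small, hypothetical helper that converts the single-unit s/m/h forms used here into seconds makes it easy to sanity-check that default_lease_ttl stays within max_lease_ttl; Vault's own parser accepts more formats than this sketch does:

```python
import re

# Seconds per unit for the simple duration forms used in vault.hcl above
_UNITS = {"s": 1, "m": 60, "h": 3600}


def duration_to_seconds(value: str) -> int:
    """Convert '768h', '60m', '10s' or a bare integer (seconds) to seconds."""
    match = re.fullmatch(r"(\d+)([smh]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "s"]


def check_lease_ttls(default_ttl: str, max_ttl: str) -> None:
    """Raise if the default lease TTL exceeds the maximum; otherwise report both in days."""
    default_s = duration_to_seconds(default_ttl)
    max_s = duration_to_seconds(max_ttl)
    if default_s > max_s:
        raise ValueError("default_lease_ttl must not exceed max_lease_ttl")
    print(f"default={default_s // 86400} days, max={max_s // 86400} days")


if __name__ == "__main__":
    check_lease_ttls("768h", "8760h")  # prints: default=32 days, max=365 days
```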
Never use Dev Mode in Production
Vault dev mode stores everything in memory, uses a single unseal key printed to stdout, and listens on localhost only. It exists solely for local development.
Systemd Service File
# /etc/systemd/system/vault.service
[Unit]
Description=HashiCorp Vault
Documentation=https://www.vaultproject.io/docs/
Requires=network-online.target
After=network-online.target
[Service]
User=vault
Group=vault
ExecStart=/usr/local/bin/vault server -config=/etc/vault.d/vault.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
LimitMEMLOCK=infinity
# Security hardening
NoNewPrivileges=true
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/opt/vault/data /var/log/vault
AmbientCapabilities=CAP_IPC_LOCK
[Install]
WantedBy=multi-user.target
Authentication Methods
AWS IAM Authentication
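AWS IAM auth lets EC2 instances, Lambda functions, and other IAM principals authenticate without static credentials: the client signs an sts:GetCallerIdentity request with AWS Signature Version 4, and Vault forwards that request to STS to verify who the caller is.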
# 1. Enable the AWS auth method
vault auth enable aws
# 2. Configure the AWS auth method (omit access_key/secret_key if the Vault servers run with an IAM instance role)
vault write auth/aws/config/client \
access_key="$AWS_ACCESS_KEY_ID" \
secret_key="$AWS_SECRET_ACCESS_KEY" \
region="us-east-1" \
iam_server_id_header_value="vault.internal.company.com"
# 3. Create a role for EC2 instances
vault write auth/aws/role/app-server-role \
auth_type="iam" \
bound_iam_principal_arn="arn:aws:iam::123456789:role/AppServerRole" \
policies="app-server-policy" \
ttl="1h" \
max_ttl="24h"
# EC2 instance role bound in step 3. The IAM auth method needs no Vault-specific
# trust relationship: EC2 only has to be able to assume this role.
# terraform/iam-vault-auth.tf
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}
resource "aws_iam_role" "app_server" {
  name               = "AppServerRole"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}
# Instance profile that attaches the role to EC2 instances
resource "aws_iam_instance_profile" "app_server" {
  name = "AppServerRole"
  role = aws_iam_role.app_server.name
}
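On the wire, an IAM login is a POST to auth/aws/login whose fields carry the signed STS request base64-encoded. The sketch below only assembles that body from headers that have already been signed; producing the SigV4 signature (for example with botocore) is assumed and not shown, and the header values in the example are placeholders:

```python
import base64
import json

# sts:GetCallerIdentity is the request Vault replays to STS to identify the caller
STS_URL = "https://sts.amazonaws.com/"
STS_BODY = "Action=GetCallerIdentity&Version=2011-06-15"


def _b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_iam_login_payload(role: str, signed_headers: dict[str, str]) -> dict[str, str]:
    """Assemble the JSON body for POST /v1/auth/aws/login with auth_type="iam".

    signed_headers must be the headers of a SigV4-signed POST of STS_BODY to
    STS_URL. When the auth method sets iam_server_id_header_value, they must
    include X-Vault-AWS-IAM-Server-ID with that value.
    """
    # Serialize headers like Go's http.Header: header name -> list of values
    header_lists = {name: [value] for name, value in signed_headers.items()}
    return {
        "role": role,
        "iam_http_request_method": "POST",
        "iam_request_url": _b64(STS_URL),
        "iam_request_body": _b64(STS_BODY),
        "iam_request_headers": _b64(json.dumps(header_lists)),
    }


if __name__ == "__main__":
    # Placeholder values; a real login needs a genuine SigV4 Authorization header
    payload = build_iam_login_payload(
        "app-server-role",
        {
            "Authorization": "AWS4-HMAC-SHA256 Credential=...",
            "X-Vault-AWS-IAM-Server-ID": "vault.internal.company.com",
        },
    )
    print(json.dumps(payload, indent=2))
```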
Kubernetes Authentication
# 1. Enable Kubernetes auth
vault auth enable kubernetes
# 2. Configure Kubernetes auth (from Vault server)
vault write auth/kubernetes/config \
kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
kubernetes_ca_cert="@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt" \
token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
issuer="https://kubernetes.default.svc.cluster.local"
# 3. Create a role for a service account
vault write auth/kubernetes/role/app-service \
bound_service_account_names="app-sa" \
bound_service_account_namespaces="production" \
policies="app-production-policy" \
ttl="1h"
# kubernetes/service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
      annotations:
        # The Vault Agent Injector reads these annotations from the pod template
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "app-service"
        vault.hashicorp.com/agent-inject-secret-db: "database/creds/app-role"
        vault.hashicorp.com/agent-inject-template-db: |
          {{ with secret "database/creds/app-role" -}}
          export DB_USER="{{ .Data.username }}"
          export DB_PASS="{{ .Data.password }}"
          {{- end }}
        vault.hashicorp.com/agent-pre-populate: "true"
    spec:
      serviceAccountName: app-sa
      containers:
        - name: app
          image: company/app:latest
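Outside the injector, a workload can perform the same login the Agent does: read its service account token and POST it, with the role name, to auth/kubernetes/login. A minimal standard-library sketch; the Vault address is a placeholder and TLS trust for an internal CA is omitted:

```python
import json
import urllib.request
from pathlib import Path

# Default location of the pod's service account token
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def build_k8s_login_request(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Prepare POST <vault_addr>/v1/auth/kubernetes/login for a service account JWT."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def kubernetes_login(vault_addr: str, role: str) -> str:
    """Log in from inside a pod and return the Vault client token."""
    jwt = SA_TOKEN_PATH.read_text().strip()
    with urllib.request.urlopen(build_k8s_login_request(vault_addr, role, jwt)) as resp:
        return json.load(resp)["auth"]["client_token"]
```

Inside a pod bound to app-sa, `kubernetes_login("https://vault.internal:8200", "app-service")` would return a token carrying app-production-policy.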
GitHub Authentication
# Enable GitHub auth for team access
vault auth enable github
# Configure GitHub organization
vault write auth/github/config organization="mycompany" \
token_ttl="1h" \
token_max_ttl="24h"
# Map teams to Vault policies
vault write auth/github/map/teams/engineering \
value="engineering-policy,secrets-read"
vault write auth/github/map/teams/platform \
value="platform-admin-policy,secrets-admin"
# Log in with a GitHub personal access token
vault login -method=github token="$GITHUB_TOKEN"
AppRole Authentication (Machine-to-Machine)
# AppRole is designed for machine authentication
vault auth enable approle
# Create an AppRole for CI/CD pipelines
vault write auth/approle/role/ci-pipeline \
secret_id_ttl="60m" \
token_ttl="15m" \
token_max_ttl="60m" \
secret_id_num_uses=10 \
token_policies="ci-read-policy"
# Fetch RoleID (long-lived, stored in CI variables)
vault read auth/approle/role/ci-pipeline/role-id
# RoleID: 8a2d-...-b3f1
# Generate SecretID (short-lived, generated per-run)
vault write -f auth/approle/role/ci-pipeline/secret-id
# SecretID: 1f9c-...-7e2a
# Login using RoleID + SecretID
vault write auth/approle/login \
role_id="8a2d-...-b3f1" \
secret_id="1f9c-...-7e2a"
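The CLI login above is a single HTTP call. A minimal sketch of the same exchange with the standard library, assuming the default approle mount path; the response parsing is split out so it can be exercised without a live server:

```python
import json
import urllib.request


def approle_login_request(vault_addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Prepare POST <vault_addr>/v1/auth/approle/login."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode("utf-8")
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_login_response(payload: dict) -> tuple[str, int, list[str]]:
    """Extract (client_token, lease_duration in seconds, policies) from a login response."""
    auth = payload["auth"]
    return auth["client_token"], auth["lease_duration"], auth["policies"]


def approle_login(vault_addr: str, role_id: str, secret_id: str) -> str:
    """Exchange a RoleID/SecretID pair for a client token."""
    with urllib.request.urlopen(approle_login_request(vault_addr, role_id, secret_id)) as resp:
        token, _, _ = parse_login_response(json.load(resp))
    return token
```

With token_ttl="15m" on the ci-pipeline role, the returned lease_duration would be 900 seconds.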
Secret Engines
Key-Value v2 (Versioned Secrets)
# Enable KV v2 at a custom path
vault secrets enable -path=secret -version=2 kv
# Write a secret
vault kv put secret/production/api-gateway \
DATABASE_URL="postgres://prod-db.internal:5432/api" \
REDIS_URL="redis://prod-redis.internal:6379" \
STRIPE_API_KEY="sk_live_..." \
JWT_SECRET="$(openssl rand -hex 32)"
# Read a secret
vault kv get secret/production/api-gateway
# Read a specific field
vault kv get -field=DATABASE_URL secret/production/api-gateway
# Read a specific version
vault kv get -version=3 secret/production/api-gateway
# List versions
vault kv metadata get secret/production/api-gateway
# Roll back to a previous version
vault kv rollback -version=2 secret/production/api-gateway
# Delete latest version (soft delete, recoverable)
vault kv delete secret/production/api-gateway
# Undelete a soft-deleted version
vault kv undelete -versions=4 secret/production/api-gateway
# Permanently destroy a specific version (cannot be undeleted)
vault kv destroy -versions=4 secret/production/api-gateway
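Behind these commands, KV v2 inserts an operation segment (data/, metadata/, delete/, undelete/, destroy/) after the mount, which is why the policies later in this guide reference secret/data/... rather than secret/.... A small sketch of that mapping, assuming a single-segment mount such as secret:

```python
def kv2_paths(mount: str, secret_path: str) -> dict[str, str]:
    """Map a KV v2 CLI-style path to the API paths a policy must grant."""
    mount = mount.strip("/")
    secret_path = secret_path.strip("/")
    return {
        "data": f"{mount}/data/{secret_path}",          # kv get, kv put, kv delete (latest)
        "metadata": f"{mount}/metadata/{secret_path}",  # kv metadata get, kv list
        "delete": f"{mount}/delete/{secret_path}",      # kv delete -versions=N
        "undelete": f"{mount}/undelete/{secret_path}",  # kv undelete
        "destroy": f"{mount}/destroy/{secret_path}",    # kv destroy
    }


def kv2_value(response: dict, field: str) -> str:
    """KV v2 read responses nest the secret under data.data."""
    return response["data"]["data"][field]


if __name__ == "__main__":
    for op, path in kv2_paths("secret", "production/api-gateway").items():
        print(f"{op:>8}: {path}")
```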
AWS Dynamic Credentials
# Enable AWS secrets engine
vault secrets enable -path=aws aws
# Configure root IAM credentials for Vault to use
vault write aws/config/root \
access_key="$VAULT_AWS_ACCESS_KEY" \
secret_key="$VAULT_AWS_SECRET_KEY" \
region="us-east-1" \
iam_endpoint="https://iam.amazonaws.com" \
sts_endpoint="https://sts.amazonaws.com" \
max_retries=5
# Create a role that generates IAM user credentials
vault write aws/roles/ec2-power-user \
credential_type="iam_user" \
policy_document=-<<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeVolumes",
"ec2:DescribeSnapshots",
"ec2:CreateTags",
"ec2:DescribeTags"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:StartInstances",
"ec2:StopInstances",
"ec2:RebootInstances"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/Environment": "production"
}
}
}
]
}
EOF
# Create a role for STS AssumeRole (recommended)
vault write aws/roles/s3-readonly \
credential_type="assumed_role" \
role_arns="arn:aws:iam::123456789:role/VaultS3ReadOnly" \
default_sts_ttl="3600" \
max_sts_ttl="7200"
# Generate dynamic credentials
vault read aws/creds/ec2-power-user
# Key Value
# --- -----
# access_key AKIA...X7P2
# secret_key wJalrXUtnFEMI/K7MDENG/bPxRfiCY...EXAMPLE
# security_token <nil>
# lease_id aws/creds/ec2-power-user/AbCdEfG...
# lease_duration 1h
# renewable true
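Dynamic credentials arrive with a lease_id and lease_duration. One common client pattern, sketched here assuming the caller's token may update sys/leases/renew, is to renew at about two-thirds of the lease and to plan a fresh `vault read` once Vault grants less than requested (the max TTL is near). The two-thirds point is an arbitrary choice, not a Vault default:

```python
import json
import urllib.request


def seconds_until_renewal(lease_duration: int) -> int:
    """Delay before renewing: two-thirds of the lease, in whole seconds."""
    return lease_duration * 2 // 3


def renew_request(vault_addr: str, token: str, lease_id: str, increment: int) -> urllib.request.Request:
    """Prepare PUT <vault_addr>/v1/sys/leases/renew for one lease."""
    body = json.dumps({"lease_id": lease_id, "increment": increment}).encode("utf-8")
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/sys/leases/renew",
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="PUT",
    )


def hit_max_ttl(requested: int, granted: int) -> bool:
    """Vault caps renewals at the max TTL; a shorter grant means re-issue soon."""
    return granted < requested
```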
PKI Engine for Internal TLS
# Enable PKI engine
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki
# Generate root CA
vault write pki/root/generate/internal \
common_name="Company Internal Root CA" \
ttl=87600h \
key_type="rsa" \
key_bits=4096 \
ou="Security Team" \
organization="Company Inc." \
country="US" \
province="California" \
locality="San Francisco"
# Configure CA and CRL URLs
vault write pki/config/urls \
issuing_certificates="https://vault.internal:8200/v1/pki/ca" \
crl_distribution_points="https://vault.internal:8200/v1/pki/crl"
# Create a role for service certificates
vault write pki/roles/service-certs \
allowed_domains="internal.company.com,internal" \
allow_subdomains=true \
allow_bare_domains=true \
allow_localhost=true \
max_ttl="720h" \
key_type="ec" \
key_bits=256 \
require_cn=true \
client_flag=true \
server_flag=true \
key_usage="DigitalSignature,KeyAgreement,KeyEncipherment" \
ext_key_usage="ServerAuth,ClientAuth"
# Issue a certificate
vault write pki/issue/service-certs \
common_name="api.internal.company.com" \
ttl="720h" \
alt_names="api.internal" \
ip_sans="10.0.1.100,10.0.1.101"
# Response contains:
# - certificate (PEM)
# - issuing_ca (PEM)
# - private_key (PEM)
# - private_key_type: ec
# - serial_number
# - ca_chain[]
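Issuance fails when any requested name falls outside the role's domain rules. The sketch below is a simplified approximation of how allowed_domains, allow_subdomains, allow_bare_domains and allow_localhost combine for plain DNS names (Vault also has glob domains, IP and URI SAN rules, and more flags); it shows why a bare short name like api would not be accepted by the service-certs role:

```python
def name_allowed(name: str, allowed_domains: list[str], *, allow_subdomains: bool,
                 allow_bare_domains: bool, allow_localhost: bool) -> bool:
    """Simplified check of one DNS name against a PKI role's domain rules."""
    name = name.lower().rstrip(".")
    if allow_localhost and name == "localhost":
        return True
    for domain in allowed_domains:
        domain = domain.lower()
        if allow_bare_domains and name == domain:
            return True
        if allow_subdomains and name.endswith("." + domain):
            return True
    return False


# Settings copied from the service-certs role above
ROLE = dict(allow_subdomains=True, allow_bare_domains=True, allow_localhost=True)
DOMAINS = ["internal.company.com", "internal"]

if __name__ == "__main__":
    for candidate in ("api.internal.company.com", "api.internal", "internal", "api"):
        print(candidate, name_allowed(candidate, DOMAINS, **ROLE))
```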
#!/bin/bash
# rotate-service-cert.sh: Automated certificate rotation
set -euo pipefail
ROLE="service-certs"
COMMON_NAME="${1:-$(hostname -f)}"
CERT_PATH="/etc/ssl/certs"
KEY_PATH="/etc/ssl/private"
echo "Requesting new certificate for $COMMON_NAME..."
CERT_DATA=$(vault write -format=json "pki/issue/$ROLE" \
  common_name="$COMMON_NAME" \
  ttl="720h")
# Create files owner-only so the private key is never briefly world-readable
umask 077
echo "$CERT_DATA" | jq -r '.data.certificate' > "$CERT_PATH/server.crt"
echo "$CERT_DATA" | jq -r '.data.private_key' > "$KEY_PATH/server.key"
echo "$CERT_DATA" | jq -r '.data.issuing_ca' > "$CERT_PATH/ca.crt"
chmod 644 "$CERT_PATH/server.crt" "$CERT_PATH/ca.crt"
chmod 600 "$KEY_PATH/server.key"
# Reload service (e.g., nginx)
systemctl reload nginx
# .data.expiration is a Unix timestamp; GNU date converts it to a readable date
echo "Certificate rotated. Expires: $(date -d "@$(echo "$CERT_DATA" | jq -r '.data.expiration')")"
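The script rotates unconditionally. To run it from cron or a systemd timer only when needed, a caller can compare the stored expiration timestamp (the Unix time in .data.expiration) with the current time. A minimal sketch, where the 10-day threshold is an assumption rather than a Vault recommendation:

```python
import time
from typing import Optional

DAY = 86400


def needs_rotation(expiration: int, now: Optional[float] = None, min_remaining: int = 10 * DAY) -> bool:
    """True when the certificate expires within min_remaining seconds."""
    current = time.time() if now is None else now
    return expiration - current < min_remaining
```

For the 720h (30-day) certificates issued above, a 10-day threshold triggers rotation about two-thirds of the way through each certificate's lifetime.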
Policies and ACLs
HCL Policy Examples
# policies/app-production.hcl
# Application production secrets access
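# Read production secrets for this application only
path "secret/data/production/api-gateway" {
capabilities = ["read"]
}
# Read version metadata (KV v2 serves it from metadata/, not data/)
path "secret/metadata/production/api-gateway" {
capabilities = ["read"]
}
# Read database dynamic credentials
path "database/creds/api-gateway" {
capabilities = ["read"]
}
# Read AWS credentials for S3 access
path "aws/creds/s3-app-bucket" {
capabilities = ["read"]
}
# Explicit deny for sibling applications' secrets. Vault already denies by default;
# the exact-path rule above is more specific, so api-gateway reads still succeed.
path "secret/data/production/+" {
capabilities = ["deny"]
}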
# Allow transit encryption for application data
path "transit/encrypt/api-gateway" {
capabilities = ["update"]
}
path "transit/decrypt/api-gateway" {
capabilities = ["update"]
}
# policies/platform-admin.hcl
# Platform team admin-level access (NOT root!)
# Read all secrets (but not write without break-glass)
path "secret/*" {
capabilities = ["read", "list"]
}
# Manage auth methods
path "sys/auth/*" {
capabilities = ["create", "read", "update", "delete", "sudo"]
}
# Manage policies
path "sys/policies/acl/*" {
capabilities = ["create", "read", "update", "delete"]
}
# Manage secret engines
path "sys/mounts/*" {
capabilities = ["create", "read", "update", "delete"]
}
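# List enabled audit devices (this endpoint requires sudo and does not expose log contents)
path "sys/audit" {
capabilities = ["read", "sudo"]
}
# Generate root token (break-glass only, heavily monitored).
# Completing the process still requires a quorum of unseal/recovery key holders;
# mfa_methods in ACL policies is a Vault Enterprise feature.
path "sys/generate-root/*" {
capabilities = ["update"]
mfa_methods = ["totp_platform_admin"]
}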
# Write policies
vault policy write app-production policies/app-production.hcl
vault policy write platform-admin policies/platform-admin.hcl
# Print the stored policy to confirm what Vault saved
vault policy read app-production
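In policy paths, + stands for exactly one path segment and a trailing * matches any remaining suffix. The sketch below reimplements only that matching rule (not Vault's priority ordering between overlapping rules), which is handy for checking which of the rules above apply to a given request path:

```python
import re


def policy_path_matches(pattern: str, path: str) -> bool:
    """Approximate Vault ACL path matching.

    A '+' segment matches exactly one path segment; a trailing '*' matches
    any remaining characters, including further '/' separators.
    """
    glob = pattern.endswith("*")
    body = pattern[:-1] if glob else pattern
    parts = [r"[^/]+" if seg == "+" else re.escape(seg) for seg in body.split("/")]
    regex = "/".join(parts) + (".*" if glob else "")
    return re.fullmatch(regex, path) is not None


if __name__ == "__main__":
    print(policy_path_matches("secret/data/production/+", "secret/data/production/billing"))
    print(policy_path_matches("secret/*", "secret/metadata/production/api-gateway"))
```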
Vault Agent and Agent Injector
Vault Agent Configuration
# /etc/vault/agent.hcl
# Vault Agent for auto-auth and templating
vault {
address = "https://vault.internal:8200"
ca_cert = "/opt/vault/tls/ca.crt"
}
auto_auth {
method "kubernetes" {
mount_path = "auth/kubernetes"
config = {
role = "app-service"
}
}
sink "file" {
config = {
path = "/vault/.vault-token"
}
}
}
# Cache responses to reduce Vault server load
cache {
use_auto_auth_token = true
}
listener "tcp" {
address = "127.0.0.1:8100"
tls_disable = true # Localhost only
}
# Template secrets to files
template {
destination = "/opt/app/config/database.env"
perms = "0600"
command = "systemctl reload app"
contents = <<EOT
{{ with secret "database/creds/app-role" }}
DATABASE_URL=postgresql://{{ .Data.username }}:{{ .Data.password }}@db.internal:5432/app
{{ end }}
{{ with secret "secret/data/production/api-gateway" }}
REDIS_URL={{ .Data.data.REDIS_URL }}
API_KEY={{ .Data.data.STRIPE_API_KEY }}
{{ end }}
EOT
}
template {
destination = "/opt/app/certs/server.crt"
perms = "0644"
contents = <<EOT
{{ with secret "pki/issue/service-certs" "common_name=api.internal.company.com" "ttl=720h" }}
{{ .Data.certificate }}
{{ end }}
EOT
}
template {
destination = "/opt/app/certs/server.key"
perms = "0600"
contents = <<EOT
{{ with secret "pki/issue/service-certs" "common_name=api.internal.company.com" "ttl=720h" }}
{{ .Data.private_key }}
{{ end }}
EOT
}
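Because the cache stanza sets use_auto_auth_token = true, an application can also send requests to the Agent listener on 127.0.0.1:8100 without any Vault token; the Agent attaches its auto-auth token before forwarding to the server. A minimal sketch of such a read (the secret path matches the template above; omitting X-Vault-Token is deliberate):

```python
import json
import urllib.request

AGENT_ADDR = "http://127.0.0.1:8100"  # the Agent listener configured above


def agent_read_request(path: str) -> urllib.request.Request:
    """GET a secret through the local Agent; no X-Vault-Token header is set."""
    return urllib.request.Request(f"{AGENT_ADDR}/v1/{path.lstrip('/')}", method="GET")


def read_kv2_field(path: str, field: str) -> str:
    """Read one field of a KV v2 secret via the Agent proxy."""
    with urllib.request.urlopen(agent_read_request(path)) as resp:
        return json.load(resp)["data"]["data"][field]
```

For example, `read_kv2_field("secret/data/production/api-gateway", "REDIS_URL")` would return the same value the template renders into database.env.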
Terraform Vault Provider Integration
# terraform/vault-setup.tf
# Complete Vault configuration via Terraform
terraform {
required_providers {
vault = {
source = "hashicorp/vault"
version = "~> 3.23"
}
}
}
provider "vault" {
address = var.vault_address
auth_login {
path = "auth/approle/login"
parameters = {
role_id = var.vault_role_id
secret_id = var.vault_secret_id
}
}
}
# Enable KV v2 secrets engine
resource "vault_mount" "secret" {
path = "secret"
type = "kv"
options = { version = "2" }
description = "Versioned secrets store"
}
# Write a production secret
resource "vault_kv_secret_v2" "database_config" {
mount = vault_mount.secret.path
name = "production/api-gateway"
data_json = jsonencode({
DATABASE_URL = "postgresql://prod-db.internal:5432/api"
REDIS_URL = "redis://prod-redis.internal:6379"
API_KEY = var.api_key # Injected via CI/CD
})
}
# Create a policy
resource "vault_policy" "app_production" {
name = "app-production"
policy = file("${path.module}/policies/app-production.hcl")
}
# Enable Kubernetes auth
resource "vault_auth_backend" "kubernetes" {
type = "kubernetes"
}
resource "vault_kubernetes_auth_backend_config" "kubernetes" {
backend = vault_auth_backend.kubernetes.path
kubernetes_host = var.k8s_host
kubernetes_ca_cert = var.k8s_ca_cert
token_reviewer_jwt = var.k8s_token
issuer = "https://kubernetes.default.svc.cluster.local"
}
resource "vault_kubernetes_auth_backend_role" "app_service" {
backend = vault_auth_backend.kubernetes.path
role_name = "app-service"
bound_service_account_names = ["app-sa"]
bound_service_account_namespaces = ["production"]
token_ttl = 3600
token_policies = [vault_policy.app_production.name]
}
# Enable AWS secrets engine
resource "vault_aws_secret_backend" "aws" {
access_key = var.vault_aws_access_key
secret_key = var.vault_aws_secret_key
region = "us-east-1"
path = "aws"
}
resource "vault_aws_secret_backend_role" "ec2_power_user" {
backend = vault_aws_secret_backend.aws.path
name = "ec2-power-user"
credential_type = "iam_user"
policy_document = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:DescribeVolumes",
"ec2:CreateTags",
"ec2:DescribeTags"
]
Resource = "*"
}
]
})
}
Vault + GitHub Actions (OIDC Auth)
Use OIDC authentication to eliminate static tokens in GitHub Actions:
# Configure Vault for GitHub OIDC (one-time setup)
vault auth enable -path=github-actions jwt
vault write auth/github-actions/config \
oidc_discovery_url="https://token.actions.githubusercontent.com" \
bound_issuer="https://token.actions.githubusercontent.com"
# Create a role for specific repositories (map-valued fields need JSON input on stdin)
vault write auth/github-actions/role/github-ci - <<EOF
{
  "role_type": "jwt",
  "user_claim": "repository",
  "bound_audiences": ["https://github.com/mycompany"],
  "bound_claims_type": "glob",
  "bound_claims": {
    "repository": "mycompany/*",
    "ref": "refs/heads/main"
  },
  "token_ttl": "15m",
  "token_policies": ["ci-read-policy"]
}
EOF
# .github/workflows/vault-oidc-example.yml
name: Deploy with Vault Secrets
on:
  push:
    branches: [main]
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Used by the vault CLI in later steps
      VAULT_ADDR: https://vault.internal.company.com
    steps:
      - uses: actions/checkout@v4
      - name: Authenticate to Vault
        id: vault-auth
        uses: hashicorp/vault-action@v3
        with:
          method: jwt
          url: https://vault.internal.company.com
          role: github-ci
          path: github-actions
          # Must match bound_audiences on the Vault role
          jwtGithubAudience: https://github.com/mycompany
          # Exports VAULT_TOKEN to the environment of later steps
          exportToken: true
      # The next two steps assume the vault CLI and jq are available on the runner
      - name: Read database credentials
        run: |
          DB_CREDS=$(vault read -format=json database/creds/app-role)
          DB_USER=$(echo "$DB_CREDS" | jq -r '.data.username')
          DB_PASS=$(echo "$DB_CREDS" | jq -r '.data.password')
          echo "::add-mask::$DB_PASS"
          echo "DB_USER=$DB_USER" >> "$GITHUB_ENV"
          echo "DB_PASS=$DB_PASS" >> "$GITHUB_ENV"
      - name: Read application secrets
        run: |
          SECRETS=$(vault kv get -format=json secret/production/api-gateway)
          echo "REDIS_URL=$(echo "$SECRETS" | jq -r '.data.data.REDIS_URL')" >> "$GITHUB_ENV"
      - name: Deploy application
        run: |
          ./scripts/deploy.sh \
            --db-user "$DB_USER" \
            --db-pass "$DB_PASS" \
            --redis-url "$REDIS_URL"
No Static Tokens Required
With OIDC authentication, no long-lived Vault token is stored in GitHub. Each workflow run exchanges its GitHub-issued JWT for a short-lived Vault token (15-minute TTL), and Vault issues that token only if the JWT's claims match the role's bound_claims: here, any repository in the mycompany organization running on the main branch.
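The claim check can be approximated in a few lines: with bound_claims_type="glob", every bound claim must be present and match its pattern, where * matches any run of characters. The sketch below ignores refinements in Vault's matcher (such as list-valued claims), and the claim values in the example are illustrative:

```python
import re


def glob_match(pattern: str, value: str) -> bool:
    """'*' matches any run of characters (including '/'); everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def claims_satisfy(bound_claims: dict[str, str], claims: dict[str, object]) -> bool:
    """True when every bound claim is present in the JWT and matches its glob."""
    return all(
        key in claims and glob_match(pattern, str(claims[key]))
        for key, pattern in bound_claims.items()
    )


# bound_claims from the github-ci role above
BOUND = {"repository": "mycompany/*", "ref": "refs/heads/main"}

if __name__ == "__main__":
    print(claims_satisfy(BOUND, {"repository": "mycompany/api", "ref": "refs/heads/main"}))
    print(claims_satisfy(BOUND, {"repository": "mycompany/api", "ref": "refs/heads/feature"}))
```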
Unsealing Strategies
Shamir's Secret Sharing
# Initialize Vault with 5 key shares, threshold 3
vault operator init -key-shares=5 -key-threshold=3
# Output:
# Unseal Key 1: 1a2b3c4d5e6f...
# Unseal Key 2: 2b3c4d5e6f7g...
# Unseal Key 3: 3c4d5e6f7g8h...
# Unseal Key 4: 4d5e6f7g8h9i...
# Unseal Key 5: 5e6f7g8h9i0j...
# Initial Root Token: s.xxxxxxxxxxxxxx
# Distribute unseal keys to 5 separate operators
# Require 3 operators to unseal after restart
vault operator unseal 1a2b3c4d5e6f...
vault operator unseal 2b3c4d5e6f7g...
vault operator unseal 3c4d5e6f7g8h...
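The 3-of-5 threshold comes from Shamir's scheme: the secret is the constant term of a random polynomial of degree threshold - 1, each share is one point on that polynomial, and any threshold points recover the constant term by Lagrange interpolation at x = 0. The sketch below works over a prime field for readability; Vault's implementation operates byte-wise over GF(2^8), so these shares are not compatible with real unseal keys:

```python
import secrets
from itertools import combinations

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, shares: int, threshold: int) -> list[tuple[int, int]]:
    """Create `shares` points, any `threshold` of which reconstruct `secret`."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 < threshold <= shares:
        raise ValueError("need 1 < threshold <= shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates the polynomial at x
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def combine(points: list[tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    result = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        result = (result + yi * num * pow(den, -1, PRIME)) % PRIME
    return result


if __name__ == "__main__":
    key_shares = split(123456789, shares=5, threshold=3)
    for subset in combinations(key_shares, 3):
        print(subset[0][0], subset[1][0], subset[2][0], "->", combine(list(subset)))
```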
Auto-Unseal with AWS KMS
# vault.hcl: Auto-unseal configuration (recommended for production)
seal "awskms" {
region = "us-east-1"
kms_key_id = "arn:aws:kms:us-east-1:123456789:key/12345678-1234-1234-1234-123456789012"
}
# terraform/kms-vault-unseal.tf
resource "aws_kms_key" "vault_unseal" {
description = "Vault auto-unseal key"
deletion_window_in_days = 30
enable_key_rotation = true
multi_region = true
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Enable IAM User Permissions"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::123456789:root"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "Allow Vault Role"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::123456789:role/VaultServerRole"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:DescribeKey"
]
Resource = "*"
}
]
})
tags = {
Name = "vault-auto-unseal"
}
}
Monitoring Vault
Prometheus Metrics
# prometheus/vault-scrape-config.yml
scrape_configs:
  - job_name: 'vault'
    metrics_path: '/v1/sys/metrics'
    params:
      format: ['prometheus']
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/vault-ca.crt
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/vault-token
    static_configs:
      - targets: ['vault-node-1.internal:8200']
        labels:
          instance: 'vault-node-1'
      - targets: ['vault-node-2.internal:8200']
        labels:
          instance: 'vault-node-2'
      - targets: ['vault-node-3.internal:8200']
        labels:
          instance: 'vault-node-3'
# Key alerting rules for Vault (Prometheus)
# vault-alerts.yml
groups:
  - name: vault
    rules:
      - alert: VaultSealed
        expr: vault_core_unsealed == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Vault instance is sealed"
          description: "Vault {{ $labels.instance }} has been sealed for more than 1 minute"
      - alert: VaultHighLatency
        # vault_core_handle_request is a summary in milliseconds; read its 0.99 quantile directly
        expr: vault_core_handle_request{quantile="0.99"} > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Vault request latency is high"
          description: "99th percentile latency is {{ $value }}ms"
      - alert: VaultLeadershipChange
        expr: changes(vault_core_active{job="vault"}[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Vault leadership changed"
          description: "Vault cluster has experienced a leadership change"
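Load balancers and probes often complement these alerts by polling the unauthenticated /v1/sys/health endpoint, whose default HTTP status codes encode the node's state (the codes can be overridden with query parameters). A minimal classifier based on those defaults; TLS trust for an internal CA is omitted:

```python
import urllib.error
import urllib.request

# Default /v1/sys/health status codes
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary-active",
    473: "performance-standby",
    501: "not-initialized",
    503: "sealed",
}


def classify_health(status: int) -> str:
    return HEALTH_STATES.get(status, f"unknown ({status})")


def probe(vault_addr: str, timeout: float = 5.0) -> str:
    """Return the node state; urllib raises HTTPError for non-2xx codes."""
    url = f"{vault_addr.rstrip('/')}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status)
    except urllib.error.HTTPError as err:
        return classify_health(err.code)
```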
Disaster Recovery and Backup
Raft Snapshot Management
#!/bin/bash
# vault-backup.sh: Automated Raft snapshot backup
set -euo pipefail
# The vault CLI only sees these variables if they are exported
export VAULT_ADDR="https://vault.internal:8200"
VAULT_TOKEN="$(cat /etc/vault/backup-token)"
export VAULT_TOKEN
STAMP="$(date +%Y%m%d-%H%M%S)"
BACKUP_DIR="/backup/vault/$STAMP"
S3_BUCKET="s3://company-vault-backups"
RETENTION_DAYS=30
mkdir -p "$BACKUP_DIR"
# Create Raft snapshot
vault operator raft snapshot save "$BACKUP_DIR/vault.snapshot"
# Encrypt snapshot with GPG
gpg --batch --yes --encrypt --recipient vault-backup@company.com \
  --output "$BACKUP_DIR/vault-$STAMP.snapshot.gpg" \
  "$BACKUP_DIR/vault.snapshot"
# Upload to S3 under a timestamped name so earlier snapshots are not overwritten
aws s3 cp "$BACKUP_DIR/vault-$STAMP.snapshot.gpg" "$S3_BUCKET/snapshots/"
# Clean up local files
rm -rf "$BACKUP_DIR"
# Remove snapshots older than RETENTION_DAYS (GNU date; an S3 lifecycle rule is a simpler alternative)
CUTOFF="$(date -d "-${RETENTION_DAYS} days" +%Y-%m-%d)"
aws s3 ls "$S3_BUCKET/snapshots/" | \
  awk -v cutoff="$CUTOFF" '$1 < cutoff {print $4}' | \
  while read -r file; do
    aws s3 rm "$S3_BUCKET/snapshots/$file"
  done
echo "Vault backup completed at $(date)"
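Selecting which snapshots fall outside RETENTION_DAYS is easier to test as a pure function than as a shell pipeline. The sketch assumes snapshots are listed with their last-modified date (as `aws s3 ls` reports) and adds a safeguard that is not in the script: the newest snapshot is never deleted, so a stalled backup job cannot remove the last good copy:

```python
from datetime import date, timedelta


def snapshots_to_delete(objects: list[tuple[date, str]], today: date,
                        retention_days: int = 30) -> list[str]:
    """Return keys older than retention_days, never including the newest snapshot."""
    if not objects:
        return []
    cutoff = today - timedelta(days=retention_days)
    newest_key = max(objects)[1]
    return [key for modified, key in sorted(objects)
            if modified < cutoff and key != newest_key]
```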
DR Cluster Configuration
# Primary cluster: Enable DR replication (requires Vault Enterprise)
vault write -f sys/replication/dr/primary/enable
# Generate a secondary token
vault write sys/replication/dr/primary/secondary-token \
id="dr-secondary" \
ttl="24h"
# On DR cluster: Enable secondary
vault write sys/replication/dr/secondary/enable \
token="...token from primary..." \
primary_api_addr="https://vault-primary.internal:8200" \
ca_file="/opt/vault/tls/ca.crt"
# Promote DR to primary (disaster recovery)
vault write -f sys/replication/dr/secondary/promote \
dr_operation_token="..."
Related Topics
- Secrets Management: Comprehensive secrets lifecycle and rotation
- SecOps Overview: DevSecOps principles and security frameworks
- IAM & RBAC: Identity and access management patterns