Kubernetes Deployment

This document describes the Kubernetes deployment model for AgentOS, including scaling, secrets management, and operational patterns.

Deployment Architecture

Each AgentOS pod contains up to two containers:

- AgentOS Container: the main FastAPI server (ghcr.io/k8s-engineering/agno-k8s/agentos, port 8000, UID 61000)
- Git-Sync Sidecar (optional): pulls agent code from Git into a shared volume

A ClusterIP Service fronts the pods. Optionally, an Istio VirtualService routes external traffic through a gateway.

External dependencies:

- PostgreSQL (RDS): agent sessions, knowledge/RAG tables (ai schema), trace storage
- OTLP Collector: receives traces and metrics (Coralogix, Grafana, Datadog, Phoenix, etc.)

graph TB
    subgraph "Kubernetes Cluster"
        subgraph "AgentOS Pod"
            AOS["AgentOS Container :8000 · UID 61000"]
            GS["git-sync sidecar (optional)"]
            VOL[("/agents volume")]
            GS -->|atomic symlink swap| VOL
            VOL -->|reads| AOS
        end
        SVC["ClusterIP Service"]
        SVC --> AOS
        VS["Istio VirtualService (optional)"] --> SVC
        INITJOB["init-db Job Helm pre-install hook (optional)"] -->|creates DB, user, schema| PG
        ESO["ExternalSecret (optional)"] -->|populates| SEC["K8s Secret"]
        SEC -->|mounted as env| AOS
    end
    PG[("PostgreSQL")]
    AOS -->|sessions, RAG, traces| PG
    OTLP["OTLP Collector"]
    AOS -->|traces & metrics| OTLP
    GIT["GitHub Repo"]
    GIT -->|pull| GS
    GW["External Traffic"] --> VS
    style AOS fill:#4051b5,color:#fff
    style PG fill:#336791,color:#fff
    style VS fill:#466bb0,color:#fff
    style INITJOB fill:#ff9800,color:#fff
    style ESO fill:#7b1fa2,color:#fff

Feature Flags

The Helm chart uses simple boolean flags to toggle optional components:

Istio (`istio.enabled`)

When enabled, creates an Istio VirtualService that routes external traffic through an Istio gateway to the AgentOS service.

istio:
  enabled: true
  gateway: "istio-ingress/default-gateway"
  hostname: "agentos.example.com"
  timeout: "3600s"
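
As a rough illustration, the values above might render to a VirtualService along the following lines. Only the gateway, hostname, and timeout come from this document; the resource name, the destination Service name `agentos`, the Service port, and the `apiVersion` are assumptions and may differ from what the chart actually produces.

```yaml
# Hypothetical rendering; names and port are assumed, not taken from the chart.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: agentos                        # assumed resource name
spec:
  hosts:
    - "agentos.example.com"            # from istio.hostname
  gateways:
    - istio-ingress/default-gateway    # from istio.gateway, in <namespace>/<name> form
  http:
    - timeout: 3600s                   # from istio.timeout
      route:
        - destination:
            host: agentos              # assumed ClusterIP Service name
            port:
              number: 8000             # assumed Service port (matches the container port)
```

The one-hour timeout presumably exists to accommodate long-running agent requests; shorter values may cut such requests off at the gateway.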

Init Database (`initDb.enabled`)

When enabled, creates a Kubernetes Job (Helm pre-install hook) that:

- Creates the agno database if it doesn't exist
- Creates the application user with the configured password
- Creates the ai schema for PgVector/RAG tables
- Enables the vector extension
- Grants appropriate privileges

initDb:
  enabled: true
  image: postgres:17-alpine
  adminCredentials:
    secretName: "rds-credentials"
    usernameKey: "username"
    passwordKey: "password"
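
To make the hook mechanics concrete, here is a minimal sketch of what such a Job could look like. It is not the chart's actual template: the Job name, the `PGHOST` value, the hook delete policy, and the script body are assumptions, and the user creation and grant steps are left out for brevity. The image, Secret name, and Secret keys are the ones from the values above.

```yaml
# Hypothetical init-db Job; a sketch, not the chart's real template.
apiVersion: batch/v1
kind: Job
metadata:
  name: agentos-init-db                      # assumed name
  annotations:
    "helm.sh/hook": pre-install              # run before the release's resources are created
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded   # assumed policy
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: init-db
          image: postgres:17-alpine          # used here only for the psql client
          env:
            - name: PGHOST
              value: "db.example.internal"   # hypothetical database host
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: rds-credentials
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: rds-credentials
                  key: password
          command: ["sh", "-ec"]
          args:
            - |
              # CREATE DATABASE has no IF NOT EXISTS, so generate the
              # statement conditionally and run it with psql's \gexec.
              psql -d postgres -v ON_ERROR_STOP=1 <<'SQL'
              SELECT 'CREATE DATABASE agno'
              WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'agno')\gexec
              SQL
              # Schema and extension creation are idempotent as written.
              psql -d agno -v ON_ERROR_STOP=1 <<'SQL'
              CREATE SCHEMA IF NOT EXISTS ai;
              CREATE EXTENSION IF NOT EXISTS vector;
              SQL
```

Because a pre-install hook runs before the chart's regular resources exist, the admin credentials Secret referenced here has to be available in the namespace before the Job starts.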

Git-Sync (`gitSync.enabled`)

When enabled, adds the git-sync sidecar container and shared volume to the pod.

gitSync:
  enabled: true
  repo: "https://github.com/your-org/your-agents.git"
  branch: "main"
  subPath: "agents"
  period: "30s"
  auth:
    username: "x-access-token"
    secretName: "git-credentials"
    secretKey: "token"

The git-sync sidecar writes to an emptyDir shared volume. The AgentOS container reads from this volume at /agents. When git-sync detects a new commit, it creates a new worktree directory and atomically swaps the symlink. The AgentOS filesystem watcher detects this and triggers a reload.

sequenceDiagram
    participant GH as GitHub
    participant GS as git-sync
    participant Vol as emptyDir Volume
    participant W as Watcher (inotify)
    participant AOS as AgentOS
    loop Every period (default 30s)
        GS->>GH: git fetch
        alt New commit detected
            GS->>Vol: Clone to rev-XXXXX/
            GS->>Vol: ln -sfn rev-XXXXX current
            Vol-->>W: Symlink change event
            W->>W: Debounce (2s)
            W->>AOS: Trigger reload
            AOS->>AOS: discover_agents()
            AOS->>AOS: resync()
        end
    end
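
The shared-volume arrangement described above can be sketched as a pod spec fragment. This assumes git-sync v4 (its `--repo`, `--ref`, `--period`, `--root`, and `--link` flags and `GITSYNC_*` environment variables); the image tag, container names, and the read-only mount are assumptions rather than details taken from the chart.

```yaml
# Hypothetical pod spec fragment; image tag and container names are assumed.
spec:
  volumes:
    - name: agents
      emptyDir: {}                     # shared between the two containers
  containers:
    - name: agentos
      image: ghcr.io/k8s-engineering/agno-k8s/agentos
      ports:
        - containerPort: 8000
      volumeMounts:
        - name: agents
          mountPath: /agents
          readOnly: true               # assumed: AgentOS only reads agent code
    - name: git-sync
      image: registry.k8s.io/git-sync/git-sync:v4.2.3   # assumed tag
      args:
        - --repo=https://github.com/your-org/your-agents.git
        - --ref=main                   # gitSync.branch
        - --period=30s                 # gitSync.period
        - --root=/agents               # where worktrees are checked out
        - --link=current               # symlink that is swapped atomically
      env:
        - name: GITSYNC_USERNAME
          value: "x-access-token"
        - name: GITSYNC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: git-credentials
              key: token
      volumeMounts:
        - name: agents
          mountPath: /agents
```

Because the volume is an emptyDir, every pod keeps its own checkout, so replicas may briefly serve different commits until each sidecar completes its next sync.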

External Secrets (`externalSecrets.enabled`)

When enabled, creates ExternalSecret resources that fetch secrets from an external provider and populate a Kubernetes Secret.

The chart supports any provider that has an External Secrets Operator SecretStore: Keeper, AWS SSM, HashiCorp Vault, Azure Key Vault, GCP Secret Manager, etc.

externalSecrets:
  enabled: true
  targetSecretName: "agentos-secrets"
  refreshInterval: "48h"
  secretStoreRef:
    name: "keeper-store"
    kind: "SecretStore"
  secrets:
    openai_api_key:
      remoteRef:
        key: "<keeper-record-id>"
        property: "password"
    jira_token:
      remoteRef:
        key: "<keeper-record-id>"
        property: "password"
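
For orientation, the values above would correspond roughly to an ExternalSecret like the one below. The field layout follows the External Secrets Operator API; the resource name and the `apiVersion` are assumptions, and the chart's real template may differ.

```yaml
# Hypothetical rendering of the values above; resource name is assumed.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: agentos-secrets
spec:
  refreshInterval: 48h                 # how often the operator re-reads the provider
  secretStoreRef:
    name: keeper-store
    kind: SecretStore
  target:
    name: agentos-secrets              # Kubernetes Secret created by the operator
  data:
    - secretKey: openai_api_key        # key inside the Kubernetes Secret
      remoteRef:
        key: "<keeper-record-id>"
        property: password
    - secretKey: jira_token
      remoteRef:
        key: "<keeper-record-id>"
        property: password
```

With a 48-hour refresh interval, a rotated provider secret can take up to two days to reach the cluster unless the ExternalSecret is refreshed manually.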

Multi-Provider Support

The secrets map is provider-agnostic. The remoteRef fields map directly to the External Secrets Operator API. Different providers use different key formats:

| Provider | remoteRef.key format | remoteRef.property |
| --- | --- | --- |
| Keeper | Record ID (e.g. sLXJGAObJv9bZauWHdm4hA) | password, login, etc. |
| AWS SSM | Parameter path (e.g. /prod/db/password) | n/a |
| AWS Secrets Manager | Secret name (e.g. prod/myapp/secrets) | JSON key |
| HashiCorp Vault | Path (e.g. secret/data/myapp) | JSON key |
| Azure Key Vault | Secret name | n/a |
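
As an illustration of the table, the two fragments below show how the same `secrets` map might look for AWS SSM and for HashiCorp Vault. They are alternatives, each paired with a matching `secretStoreRef`, not entries to combine in one release. The entry names `db_password` and `api_token` and the Vault JSON key are hypothetical.

```yaml
# Alternative 1: AWS SSM. The key is a parameter path; no property is needed.
externalSecrets:
  secrets:
    db_password:                       # hypothetical entry name
      remoteRef:
        key: "/prod/db/password"
---
# Alternative 2: HashiCorp Vault. The key is a path; property selects a JSON key.
externalSecrets:
  secrets:
    api_token:                         # hypothetical entry name
      remoteRef:
        key: "secret/data/myapp"
        property: "api_token"          # hypothetical JSON key in the Vault secret
```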

RDS Credentials

A separate ExternalSecret can be configured for RDS admin credentials (used by the init-db job). This typically uses a different SecretStore (e.g. ClusterSecretStore backed by AWS SSM).

externalSecrets:
  rdsCredentials:
    enabled: true
    secretStoreRef:
      name: "external-secrets-store"
      kind: "ClusterSecretStore"
    target:
      name: "rds-credentials"
    remoteRef:
      usernameKey: "/prod/database/username"
      passwordKey: "/prod/database/password"

Database

AgentOS uses PostgreSQL with the pgvector extension for:

- Agent sessions and state: managed by Agno in the public schema
- Knowledge/RAG tables: PgVector embeddings in the ai schema
- Trace storage: DatabaseSpanExporter writes OTel spans for the Agno UI

Connection URL Convention

The Helm chart auto-generates PostgreSQL URLs from the database values:

{driver}://{appUser}:{password}@{host}:{port}/{name}

Three URL variants are injected as environment variables:

| Env Var | Format | Usage |
| --- | --- | --- |
| POSTGRES_URL | postgresql+psycopg://... | SQLAlchemy (with psycopg driver) |
| POSTGRES_RAW_URL | postgresql://... | Raw connections (psql, migrations) |
| PGVECTOR_DB_URL | postgresql+psycopg://... | PgVector operations |
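
As a worked example of the template, suppose the database values were appUser `agno_app`, host `db.example.internal`, port `5432`, and name `agno` (all hypothetical). The injected variables would then look roughly like this, with `<password>` standing in for the real secret value:

```yaml
# Hypothetical result for appUser=agno_app, host=db.example.internal, port=5432, name=agno.
env:
  - name: POSTGRES_URL
    value: "postgresql+psycopg://agno_app:<password>@db.example.internal:5432/agno"
  - name: POSTGRES_RAW_URL
    value: "postgresql://agno_app:<password>@db.example.internal:5432/agno"
  - name: PGVECTOR_DB_URL
    value: "postgresql+psycopg://agno_app:<password>@db.example.internal:5432/agno"
```

A password containing reserved URL characters such as `@`, `:`, or `/` must be percent-encoded to form a valid URL. Whether the chart does this automatically is not stated here, so it is worth checking with such a password before relying on it.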

Scaling

Multi-Replica

AgentOS pods are stateless. All persistent state lives in PostgreSQL. You can scale horizontally by increasing replicaCount or using a HorizontalPodAutoscaler.
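
For the autoscaler route, a minimal HorizontalPodAutoscaler might look like the sketch below. The Deployment name `agentos`, the replica bounds, and the 70% CPU target are illustrative assumptions; the chart may also expose its own autoscaling values, which this document does not describe.

```yaml
# Hypothetical HPA; target name, replica bounds, and threshold are assumed.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agentos
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agentos                      # assumed Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70       # percentage of the container's CPU request
```

CPU utilization is measured relative to the pod's CPU request, so the autoscaler only works as intended when requests are set.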

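The ClusterIP Service distributes traffic across ready replicas. With Istio enabled, requests entering through the gateway are load-balanced across the pod endpoints by the Envoy proxies.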

graph LR
    GW["Istio Gateway"] --> VS["VirtualService"]
    VS --> SVC["ClusterIP Service"]
    SVC --> P1["Pod 1 AgentOS + git-sync"]
    SVC --> P2["Pod 2 AgentOS + git-sync"]
    SVC --> P3["Pod N AgentOS + git-sync"]
    P1 --> PG[("PostgreSQL (shared state)")]
    P2 --> PG
    P3 --> PG
    style GW fill:#466bb0,color:#fff
    style PG fill:#336791,color:#fff

Important: If your agents use in-memory dedup or caching (e.g. webhook dedup windows), you will need to centralize that state in Redis or PostgreSQL for multi-replica deployments.

Resource Defaults

resources:
  requests:
    cpu: "1"
    memory: "2Gi"

Adjust based on your agent workloads. LLM-heavy agents may need more memory; CPU-bound agents may need more CPU.

Health Checks

The Helm chart configures both liveness and readiness probes against the built-in /health endpoint:

- Liveness: initialDelaySeconds: 30, periodSeconds: 15
- Readiness: initialDelaySeconds: 15, periodSeconds: 10
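
Expressed as container probe configuration, the settings above would look roughly as follows. The `httpGet` probe type and port 8000 are assumptions based on the FastAPI server described earlier; the delays and periods are the ones listed.

```yaml
# Sketch of the probe settings; httpGet on port 8000 is assumed.
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 15
  periodSeconds: 10
```

Since both probes hit the same endpoint, a failing /health first removes the pod from Service endpoints and then, if it persists, leads to a container restart.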

Security

- Non-root: All containers run as UID 61000
- Read-only filesystem: Can be enabled via securityContext.readOnlyRootFilesystem
- No privilege escalation: allowPrivilegeEscalation: false
- Dropped capabilities: All capabilities dropped
- Secrets: Never stored in values.yaml, always fetched via External Secrets Operator
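
Taken together, the container-level settings in the list above correspond to a securityContext along these lines. This is a sketch using standard Kubernetes fields, not a copy of the chart's template; `runAsNonRoot` is an assumed addition.

```yaml
# Sketch of a container securityContext matching the list above.
securityContext:
  runAsNonRoot: true                   # assumed; enforces the non-root guarantee
  runAsUser: 61000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true         # optional, per the list above
  capabilities:
    drop: ["ALL"]
```

With a read-only root filesystem, any path the application writes to (for example a temp directory) needs its own writable volume such as an emptyDir.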

Observability

Tracing

Dual-export tracing sends OTel spans to:

1. PostgreSQL for the Agno UI dashboard
2. External OTLP collector for Grafana, Datadog, Coralogix, Phoenix, etc.

graph LR
    AOS["AgentOS"] --> TP["TracerProvider"]
    TP --> DB["DB Exporter → PostgreSQL"]
    TP --> OT["OTLP Exporter → Collector"]
    DB --> UI["Agno UI"]
    OT --> GR["Grafana / Datadog / Coralogix / Phoenix"]
    style AOS fill:#4051b5,color:#fff
    style DB fill:#336791,color:#fff
    style OT fill:#f5a623,color:#fff

Configure via environment variables:

env:
  - name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
    value: "http://collector.observability.svc:4318/v1/traces"

Metrics

Application metrics are pushed via OTLP gRPC when OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is set. The /api/metrics endpoint provides a JSON debug view of current gauge values.
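
A minimal sketch of enabling metrics export follows. The collector hostname is a placeholder, and port 4317 is the conventional OTLP gRPC port rather than a value confirmed by this document.

```yaml
# Hypothetical collector address; 4317 is the conventional OTLP gRPC port.
env:
  - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
    value: "http://collector.observability.svc:4317"
```

Leaving the variable unset disables the push, in which case /api/metrics remains the only view of the current gauge values.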