Agno Architecture

This document describes the Agno AgentOS framework patterns used in this deployment and the architectural decisions behind them.

Overview

AgentOS is a FastAPI-based runtime for managing AI agents built with the Agno framework. It provides:

Agent lifecycle management (load, register, serve via API)
Session and knowledge management (PostgreSQL-backed)
MCP (Model Context Protocol) tool integration
Health checks, scheduling, and auto-discovery

This repository implements AgentOS following Agno's recommended patterns rather than the monolithic approach that often evolves in POC deployments.

What Changed from the Default AgentOS

The upstream agentos-docker-template is a minimal starter — 43-line main.py, two hardcoded agents, Docker Compose workflow. This repo restructures it for production Kubernetes while keeping the same base image and entrypoint.

graph LR
    subgraph "Default Template"
        DM["main.py 43 lines"] --> DOS["AgentOS()"]
        DOS --> DAPP["get_app()"]
        DA["agents/ hardcoded imports"] --> DOS
    end
    subgraph "K8s Version"
        KM["main.py 147 lines"] --> KBA["base_app + routers"]
        KBA --> KOS["AgentOS( base_app, lifespan)"]
        KOS --> KAPP["get_app()"]
        KAL["agent_loader dynamic discovery"] --> KOS
        KW["watcher hot-reload"] -.-> KOS
        KME["metrics OTel SDK"] -.-> KAPP
    end
    style DM fill:#e8eaf6,stroke:#3949ab
    style KM fill:#4051b5,color:#fff
    style KOS fill:#4051b5,color:#fff
    style KW fill:#ff9800,color:#fff
    style KME fill:#f5a623,color:#fff

Comparison

Aspect	Default Template	K8s Version
`main.py`	43 lines — construct `AgentOS`, call `get_app()`	147 lines — tracing, discovery, `base_app` pattern, constructor lifespan
Agent loading	Static imports baked into image	Dynamic `importlib` discovery from `/agents` volume
Hot-reload	None (container restart)	Filesystem watcher + symlink poller (seconds)
Custom routes	None	`base_app` + `APIRouter` modules (`/admin/`, `/api/`)
Tracing	DB exporter only (`tracing=True`)	Dual-export: DB + OTLP to external collector
Metrics	None	OTel SDK: counters, histograms, gauges via OTLP gRPC
Default CMD	`chill` (sleep loop)	`uvicorn app.main:app` (auto-start)
COPY strategy	`COPY . .` (entire repo + agents)	`COPY src/ .` (app only, no agents)
Dependencies	90 packages	157 packages (+observability, data processing, doc parsing)
Deployment	Single container, `docker compose up`	Multi-container pod (+ git-sync sidecar), Helm-managed
Scaling	Single instance	Horizontal replicas behind Istio/Service
Secrets	`example.env`, manual	External Secrets Operator (Keeper, AWS SSM, Vault)

New Modules

Module	Purpose
`agent_loader.py`	Dynamic agent discovery — scans `/agents`, collects `Agent` instances, handles `.env` and reloads
`watcher.py`	Filesystem watcher + git-sync symlink poller — debounced reload with route snapshot/restore
`metrics.py`	OTel metrics SDK — counters, histograms, observable gauges; OTLP gRPC push exporter
`shared.py`	Cross-router state — `agent_os` reference, webhook auth, agent lookup
`routers/admin.py`	`/admin/reload` endpoint for manual agent resync
`routers/observability.py`	`/api/metrics` debug endpoint + background DB size collector

Unchanged from Upstream

Base image (agnohq/python:3.12), non-root user (UID 61000), entrypoint script, db/session.py, core AgentOS() constructor parameters, config.yaml format, and uv pip sync installer are all identical.

Application Bootstrap Flow

flowchart LR
    A["Dual-export Tracing Setup"] --> B["Discover Agents from /agents"]
    B --> C["Create base_app + include routers"]
    C --> D["Construct AgentOS (base_app, lifespan)"]
    D --> E["agent_os.get_app()"]
    E --> F["Uvicorn serves app"]
    style A fill:#e8eaf6,stroke:#3949ab
    style D fill:#4051b5,color:#fff
    style F fill:#43a047,color:#fff

Four Pillars

1. `base_app` Pattern

AgentOS accepts a base_app parameter — a custom FastAPI application with your own routes. This cleanly separates your webhook handlers and admin endpoints from AgentOS's internal routes.

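from agno.os import AgentOS
from fastapi import FastAPI

# admin and observability are the router modules under routers/;
# agents is the list produced by the agent loader.
base_app = FastAPI(title="AgentOS")
base_app.include_router(admin.router)
base_app.include_router(observability.router)

agent_os = AgentOS(
    agents=agents,
    base_app=base_app,
    # Keep our own routes if AgentOS ever registers the same path
    on_route_conflict="preserve_base_app",
)
app = agent_os.get_app()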

Why on_route_conflict="preserve_base_app": Your routes (/api/*, /admin/*) don't currently overlap with AgentOS routes (/agents/*, /sessions/*, /knowledge/*, /health), so the conflict handler should not fire today. Setting it explicitly documents intent and protects your routes if a future AgentOS release adds an overlapping path.

graph LR
    subgraph "base_app (FastAPI)"
        R1["/admin/reload"]
        R2["/api/metrics"]
    end
    subgraph "AgentOS Routes"
        R3["/agents/*"]
        R4["/sessions/*"]
        R5["/knowledge/*"]
        R6["/health"]
    end
    BA["base_app"] -->|passed to| AOS["AgentOS(base_app=...)"]
    AOS -->|merges routes| APP["Final App"]
    style BA fill:#e8eaf6,stroke:#3949ab
    style AOS fill:#4051b5,color:#fff
    style APP fill:#43a047,color:#fff

2. Constructor Lifespan

Startup and shutdown logic is passed to the AgentOS() constructor via the lifespan parameter — not monkey-patched after construction.

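from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app_instance: FastAPI):
    # agent_os is looked up when the lifespan runs, after the constructor below
    observer = start_watcher(AGENTS_DIR, agent_os, app_instance)
    setup_otlp_metrics()
    # ... start daemon threads ...
    try:
        yield
    finally:
        # Stop the watcher even if shutdown is triggered by an error
        stop_watcher(observer)

agent_os = AgentOS(
    lifespan=lifespan,
    # ...
)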

When lifespan is passed at construction, get_app() calls _add_agent_os_to_lifespan_function(), which inspects the function signature. If your lifespan accepts an agent_os parameter, the framework injects the AgentOS instance automatically.
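The signature-inspection mechanism described above can be illustrated with a small standalone sketch. This is not Agno's implementation: bind_agent_os, plain_lifespan, and injected_lifespan are hypothetical names, and a plain dict stands in for the FastAPI app. It only shows how a wrapper can decide whether to inject an extra argument by looking at the lifespan's parameters.

```python
import inspect
from contextlib import asynccontextmanager
from functools import partial


def bind_agent_os(lifespan, agent_os):
    """Pre-bind agent_os only if the lifespan's signature declares that parameter."""
    params = inspect.signature(lifespan).parameters
    if "agent_os" in params:
        return partial(lifespan, agent_os=agent_os)
    return lifespan


@asynccontextmanager
async def plain_lifespan(app):
    # Takes only the app, so nothing is injected.
    yield


@asynccontextmanager
async def injected_lifespan(app, agent_os):
    # Hypothetical use: keep a reference so routers can reach the instance.
    app["agent_os"] = agent_os
    yield
```

Because asynccontextmanager preserves the wrapped function's signature, inspect.signature sees the original parameter names even on the decorated function.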

Lifespan composition order (from the base_app branch of get_app()):

Your lifespan (startup)
DB lifespan
MCP tools lifespan
httpx cleanup
Scheduler lifespan
yield (app is serving)
Scheduler shutdown
httpx cleanup
MCP tools cleanup
DB cleanup
Your lifespan (shutdown)

Your lifespan wraps everything — it starts before framework resources initialize and stops after they close.
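The ordering above behaves like nested context managers: the first one entered is the last one exited. A minimal sketch of that property, assuming each framework resource is modelled as its own async context manager (the names here are hypothetical and the httpx step is left out for brevity):

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []


def make_lifespan(name):
    """Build a lifespan that records when it starts and stops."""
    @asynccontextmanager
    async def lifespan(app):
        events.append(f"{name} startup")
        try:
            yield
        finally:
            events.append(f"{name} shutdown")
    return lifespan


@asynccontextmanager
async def composed(app, lifespans):
    """Enter lifespans in order; AsyncExitStack unwinds them in reverse."""
    async with AsyncExitStack() as stack:
        for lifespan in lifespans:
            await stack.enter_async_context(lifespan(app))
        yield


async def main():
    names = ["yours", "db", "mcp", "scheduler"]
    async with composed(None, [make_lifespan(n) for n in names]):
        events.append("serving")


asyncio.run(main())
```

After the run, events starts with "yours startup" and ends with "yours shutdown", with the other resources nested inside in mirror order.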

sequenceDiagram
    participant YL as Your Lifespan
    participant DB as DB Lifespan
    participant MCP as MCP Tools
    participant SCHED as Scheduler
    participant APP as App Serving
    Note over YL,APP: Startup (top to bottom)
    YL->>YL: Start watcher, metrics, workers
    DB->>DB: Init tables, connections
    MCP->>MCP: Register MCP tools
    SCHED->>SCHED: Start scheduler poller
    APP->>APP: Serving requests...
    Note over YL,APP: Shutdown (bottom to top)
    SCHED->>SCHED: Stop scheduler
    MCP->>MCP: Cleanup tools
    DB->>DB: Close connections
    YL->>YL: Stop watcher

3. Dynamic Agent Loader + Watcher

Agents are not baked into the container image. They are plain Python files delivered via git-sync and discovered at runtime using importlib.

/agents/
├── my_agent.py          # Agent module
├── another_agent.py     # Another agent
├── helpers/
│   ├── __init__.py      # Package with agents
│   └── utils.py
└── .env                 # Shared secrets (loaded before agent import)

The agent_loader.py module:

1. Loads .env from the agents directory (if present)
2. Scans for *.py files (skipping _-prefixed names)
3. Imports each as a module, collecting top-level Agent instances
4. Reloads previously-imported modules so code changes take effect
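The scan-and-import steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of agent_loader.py: the Agent class below is a stand-in so the example runs without agno installed, discover_agents and the dynamic_agents_ module prefix are hypothetical, the .env step is omitted, and only top-level files (not packages) are scanned.

```python
import importlib.util
import sys
from pathlib import Path


class Agent:
    """Stand-in for agno's Agent class (assumption, for a runnable sketch)."""

    def __init__(self, name):
        self.name = name


def discover_agents(agents_dir, agent_type=Agent):
    """Import each non-underscore *.py file and collect top-level agent_type instances."""
    found = []
    for path in sorted(Path(agents_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip private helpers such as _common.py
        module_name = f"dynamic_agents_{path.stem}"  # hypothetical naming scheme
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module
        try:
            # Executing the file again on every call is what picks up code changes.
            spec.loader.exec_module(module)
        except Exception as exc:
            # One broken agent file should not prevent the others from loading.
            sys.modules.pop(module_name, None)
            print(f"skipping {path.name}: {exc}")
            continue
        found.extend(v for v in vars(module).values() if isinstance(v, agent_type))
    return found
```

Catching exceptions per file is a design choice of this sketch: a syntax error in one agent leaves the rest of the fleet available.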

The watcher.py module uses watchdog to monitor the agents directory:

- Debounced reload (2 seconds) prevents rapid-fire reloads during git-sync writes
- Symlink poller detects git-sync worktree swaps that inotify cannot see
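Both mechanisms can be sketched without watchdog. The Debouncer class and symlink_changed function below are hypothetical names, not the ones in watcher.py; the sketch assumes a timer-based debounce and a poller that compares the resolved symlink target between polls.

```python
import os
import threading


class Debouncer:
    """Run callback once, delay seconds after the most recent trigger()."""

    def __init__(self, delay, callback):
        self._delay = delay
        self._callback = callback
        self._lock = threading.Lock()
        self._timer = None

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer event supersedes the pending reload
            self._timer = threading.Timer(self._delay, self._callback)
            self._timer.daemon = True
            self._timer.start()


def symlink_changed(link_path, last_target):
    """Return (changed, current_target) by resolving the symlink on each poll."""
    current = os.path.realpath(link_path)
    return current != last_target, current
```

A filesystem event handler would call trigger() on every event, and a polling loop would call symlink_changed() on an interval and call trigger() when the target moves, so both paths funnel into the same debounced reload.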

Route Snapshot/Restore

Critical invariant: AgentOS.resync() unconditionally clears all routes in _reprovision_routers(). Using base_app alone does NOT fix this — the route-wiping happens regardless.

The watcher snapshots custom routes before resync and restores them after, using an exclude-list of AgentOS-owned prefixes:

AGENTOS_PREFIXES = (
    "/agents/", "/sessions/", "/knowledge/",
    "/health", "/docs", "/openapi.json", "/mcp",
)

# Snapshot: everything NOT matching these prefixes is custom
custom_routes = [
    route for route in app.router.routes
    if hasattr(route, "path")
    and not any(route.path.startswith(p) for p in AGENTOS_PREFIXES)
]

This is more defensive than an include-list — new custom routes are automatically preserved without filter updates.
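The full snapshot, wipe, and restore cycle can be sketched with a stand-in route type. The Route dataclass and the two helper names are hypothetical, and clearing the list only simulates what resync() is described as doing; the real watcher operates on app.router.routes.

```python
from dataclasses import dataclass

AGENTOS_PREFIXES = (
    "/agents/", "/sessions/", "/knowledge/",
    "/health", "/docs", "/openapi.json", "/mcp",
)


@dataclass
class Route:
    """Stand-in for a FastAPI route; only the path attribute matters here."""
    path: str


def snapshot_custom_routes(routes):
    """Keep every route whose path does not start with an AgentOS-owned prefix."""
    return [
        r for r in routes
        if hasattr(r, "path") and not r.path.startswith(AGENTOS_PREFIXES)
    ]


def restore_custom_routes(routes, snapshot):
    """Re-append snapshotted routes that the resync removed, skipping duplicates."""
    existing = {getattr(r, "path", None) for r in routes}
    for route in snapshot:
        if route.path not in existing:
            routes.append(route)


routes = [Route("/agents/{id}"), Route("/admin/reload"), Route("/api/metrics"), Route("/health")]
snapshot = snapshot_custom_routes(routes)
routes.clear()                        # simulated resync: every route is wiped
routes.append(Route("/agents/{id}"))  # AgentOS re-registers its own routes
restore_custom_routes(routes, snapshot)
```

Deduplicating by path alone ignores HTTP methods, which is enough for this sketch but may be too coarse if two custom routes share a path.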

flowchart TD
    FS["Filesystem Event (symlink swap)"] --> DEB["Debounce 2 seconds"]
    DEB --> SNAP["Snapshot custom routes"]
    SNAP --> DISC["Re-discover agents importlib reload"]
    DISC --> SYNC["agent_os.resync() (wipes all routes)"]
    SYNC --> REST["Restore custom routes"]
    REST --> LIVE["New agents live"]
    style FS fill:#ff9800,color:#fff
    style SYNC fill:#d32f2f,color:#fff
    style LIVE fill:#43a047,color:#fff

4. Daemon Threads in Lifespan

Long-running persistent workers (triage queues, metrics collectors, etc.) use threading.Thread(daemon=True) started inside the lifespan, not at import time.

import threading
from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app_instance: FastAPI):
    # Workers start only when the app is serving
    threading.Thread(target=collect_db_metrics_loop, daemon=True).start()
    yield

Why daemon threads over asyncio tasks: Workers that call agent.arun() need their own event loops because MCP tools use blocking SSE transports, and running them on the main uvicorn loop would stall request handling. Each worker therefore runs in its own thread with disposable event loops.

Why not import-time: Starting workers at import time means importing main.py in tests spawns threads. Moving them to the lifespan eliminates import-time side effects.
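A minimal sketch of the thread-per-worker model with disposable event loops follows. The handle coroutine is a hypothetical stand-in for an awaitable such as agent.arun(), and the queue-based hand-off and start_worker helper are assumptions made for the example, not this deployment's worker code.

```python
import asyncio
import queue
import threading


async def handle(item):
    """Hypothetical stand-in for an awaitable agent call."""
    await asyncio.sleep(0)
    return item * 2


def worker(inbox, outbox):
    """Process jobs until a None sentinel arrives, one fresh event loop per job."""
    while True:
        item = inbox.get()
        if item is None:
            break
        # asyncio.run creates a new loop, runs the coroutine, then closes the loop,
        # so nothing here touches the server's main loop.
        outbox.put(asyncio.run(handle(item)))


def start_worker():
    """Start the worker thread; intended to be called from the lifespan, not at import."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox), daemon=True)
    thread.start()
    return thread, inbox, outbox
```

Keeping thread creation inside start_worker() means importing the module in a test spawns nothing until the function is called.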

Tracing Architecture

AgentOS uses dual-export tracing:

Database exporter — DatabaseSpanExporter writes spans to PostgreSQL for the Agno UI
OTLP exporter — sends spans to an external collector (Phoenix, Grafana, Datadog, etc.)

Both exporters are wired into the same TracerProvider via SimpleSpanProcessor, so every span goes to both destinations.

graph LR
    AGENT["Agent Run"] --> TP["TracerProvider"]
    TP --> SSP1["SimpleSpanProcessor"]
    TP --> SSP2["SimpleSpanProcessor"]
    SSP1 --> DBE["DatabaseSpanExporter (PostgreSQL → Agno UI)"]
    SSP2 --> OTLP["OTLPSpanExporter (External Collector)"]
    style AGENT fill:#4051b5,color:#fff
    style DBE fill:#336791,color:#fff
    style OTLP fill:#f5a623,color:#fff

Agent Packaging Model

No per-agent Helm charts — all agents share a single AgentOS deployment
No agent code in the container image — agents live in a separate Git repository
git-sync sidecar delivers agent code to /agents via an emptyDir shared volume
Hot-reload picks up changes within seconds of a git push

This means you can update agent behavior (prompts, tools, logic) without rebuilding or redeploying the AgentOS container.

Metrics

The metrics.py module defines pure OpenTelemetry SDK instruments:

Type	Name	Description
Counter	`agno.webhook.requests`	Total webhook requests
Counter	`agno.agent.runs`	Total agent runs
Counter	`agno.dedup.decisions`	Dedup hit/miss
Histogram	`agno.webhook.duration`	Webhook processing time
Histogram	`agno.agent.duration`	Agent run duration
UpDownCounter	`agno.agent.in_progress`	Active agent runs
Observable Gauge	`agno.agents.loaded`	Loaded agent count
Observable Gauge	`agno.queue.depth`	Queue depths
Observable Gauge	`agno.db.table.size`	Postgres table sizes
Observable Gauge	`agno.db.table.rows`	Approximate row counts

When OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is set, metrics are pushed via OTLP gRPC every 30 seconds. When unset, all instruments are no-ops with zero runtime cost.
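The environment-variable gate can be illustrated with stand-in classes. This sketch does not use the OpenTelemetry API: NoOpCounter, RecordingCounter, and make_counter are hypothetical, and only the variable name comes from the text above. It shows the shape of the pattern, where call sites always call add() and the gate decides whether anything is recorded.

```python
import os


class NoOpCounter:
    """Accepts the same calls as a real counter and does nothing."""

    def add(self, amount, attributes=None):
        pass


class RecordingCounter:
    """Stand-in for an exporting counter; keeps a running total in memory."""

    def __init__(self, name):
        self.name = name
        self.total = 0

    def add(self, amount, attributes=None):
        self.total += amount


def make_counter(name, environ=os.environ):
    """Return a recording counter only when the metrics endpoint is configured."""
    if environ.get("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT"):
        return RecordingCounter(name)
    return NoOpCounter()
```

Passing environ as a parameter keeps the gate easy to exercise in tests without mutating the process environment.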