Agno Architecture
This document describes the Agno AgentOS framework patterns used in this deployment and the architectural decisions behind them.
Overview
AgentOS is a FastAPI-based runtime for managing AI agents built with the Agno framework. It provides:
- Agent lifecycle management (load, register, serve via API)
- Session and knowledge management (PostgreSQL-backed)
- MCP (Model Context Protocol) tool integration
- Health checks, scheduling, and auto-discovery
This repository implements AgentOS following Agno's recommended patterns rather than the monolithic approach that often evolves in POC deployments.
What Changed from the Default AgentOS
The upstream agentos-docker-template is a minimal starter: a 43-line main.py, two hardcoded agents, and a Docker Compose workflow. This repo restructures it for production Kubernetes while keeping the same base image and entrypoint.
Comparison
| Aspect | Default Template | K8s Version |
|---|---|---|
| main.py | 43 lines: construct AgentOS, call get_app() | 147 lines: tracing, discovery, base_app pattern, constructor lifespan |
| Agent loading | Static imports baked into image | Dynamic importlib discovery from /agents volume |
| Hot-reload | None (container restart) | Filesystem watcher + symlink poller (seconds) |
| Custom routes | None | base_app + APIRouter modules (/admin/*, /api/*) |
| Tracing | DB exporter only (tracing=True) | Dual-export: DB + OTLP to external collector |
| Metrics | None | OTel SDK: counters, histograms, gauges via OTLP gRPC |
| Default CMD | chill (sleep loop) | uvicorn app.main:app (auto-start) |
| COPY strategy | COPY . . (entire repo + agents) | COPY src/ . (app only, no agents) |
| Dependencies | 90 packages | 157 packages (+observability, data processing, doc parsing) |
| Deployment | Single container, docker compose up | Multi-container pod (+ git-sync sidecar), Helm-managed |
| Scaling | Single instance | Horizontal replicas behind Istio/Service |
| Secrets | example.env, manual | External Secrets Operator (Keeper, AWS SSM, Vault) |
New Modules
| Module | Purpose |
|---|---|
| agent_loader.py | Dynamic agent discovery: scans /agents, collects Agent instances, handles .env and reloads |
| watcher.py | Filesystem watcher + git-sync symlink poller: debounced reload with route snapshot/restore |
| metrics.py | OTel metrics SDK: counters, histograms, observable gauges; OTLP gRPC push exporter |
| shared.py | Cross-router state: agent_os reference, webhook auth, agent lookup |
| routers/admin.py | /admin/reload endpoint for manual agent resync |
| routers/observability.py | /api/metrics debug endpoint + background DB size collector |
Unchanged from Upstream
Base image (agnohq/python:3.12), non-root user (UID 61000), entrypoint script, db/session.py, core AgentOS() constructor parameters, config.yaml format, and uv pip sync installer are all identical.
Application Bootstrap Flow
Four Pillars
1. base_app Pattern
AgentOS accepts a base_app parameter — a custom FastAPI application with your own routes. This cleanly separates your webhook handlers and admin endpoints from AgentOS's internal routes.
```python
base_app = FastAPI(title="AgentOS")
base_app.include_router(admin.router)
base_app.include_router(observability.router)

agent_os = AgentOS(
    agents=agents,
    base_app=base_app,
    on_route_conflict="preserve_base_app",
)
app = agent_os.get_app()
```
Why on_route_conflict="preserve_base_app": Your routes (/api/*, /admin/*) don't overlap with AgentOS routes (/agents/*, /sessions/*, /knowledge/*, /health), so the conflict handler rarely fires. Setting it explicitly documents intent and protects against future AgentOS releases adding overlapping routes.
2. Constructor Lifespan
Startup and shutdown logic is passed to the AgentOS() constructor via the lifespan parameter — not monkey-patched after construction.
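```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app_instance: FastAPI):
    # Startup: runs before the app begins serving
    observer = start_watcher(AGENTS_DIR, agent_os, app_instance)
    setup_otlp_metrics()
    # ... start daemon threads ...
    yield
    # Shutdown: runs after the app stops serving
    stop_watcher(observer)

agent_os = AgentOS(
    lifespan=lifespan,
    # ...
)
```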
When lifespan is passed at construction, get_app() calls _add_agent_os_to_lifespan_function() which inspects the function signature. If your lifespan accepts an agent_os parameter, the framework injects the AgentOS instance automatically.
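The signature check described above can be illustrated with the standard library alone. This is a sketch of the general idea, not Agno's internal code: `bind_agent_os`, `run_lifespan`, and the two example lifespans are hypothetical names, and the "app" and AgentOS objects are plain stand-in values.

```python
import asyncio
import inspect
from contextlib import asynccontextmanager
from functools import partial


def bind_agent_os(lifespan_fn, agent_os):
    """Pre-bind agent_os when the lifespan's signature asks for it.

    Hypothetical helper: mimics signature-based injection in general,
    not the framework's actual implementation.
    """
    params = inspect.signature(lifespan_fn).parameters
    if "agent_os" in params:
        return partial(lifespan_fn, agent_os=agent_os)
    return lifespan_fn


@asynccontextmanager
async def plain_lifespan(app):
    # Does not declare agent_os, so nothing is injected
    yield {"agent_os": None}


@asynccontextmanager
async def injected_lifespan(app, agent_os):
    # Declares agent_os, so the helper passes the instance in
    yield {"agent_os": agent_os}


async def run_lifespan(lifespan_fn, app, agent_os):
    """Enter the (possibly bound) lifespan and report what it received."""
    bound = bind_agent_os(lifespan_fn, agent_os)
    async with bound(app) as state:
        return state["agent_os"]
```

`inspect.signature` follows the `__wrapped__` attribute that `asynccontextmanager` sets, so the check sees the parameters of the original generator function rather than those of the decorator's wrapper.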
Lifespan composition order (from the base_app branch of get_app()):
- Your lifespan (startup)
- DB lifespan
- MCP tools lifespan
- httpx cleanup
- Scheduler lifespan
- yield (app is serving)
- Scheduler shutdown
- httpx cleanup
- MCP tools cleanup
- DB cleanup
- Your lifespan (shutdown)
Your lifespan wraps everything — it starts before framework resources initialize and stops after they close.
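The nesting behaviour (first to start, last to stop) can be reproduced with a few stand-in context managers. The sketch below uses `contextlib.AsyncExitStack` as one way to compose lifespans; that choice, along with `make_lifespan` and `run_composed`, is an assumption made for illustration and may differ from how get_app() composes them internally.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


def make_lifespan(name, events):
    """Build a stand-in lifespan that records when it starts and stops."""

    @asynccontextmanager
    async def lifespan():
        events.append(f"{name} startup")
        try:
            yield
        finally:
            events.append(f"{name} shutdown")

    return lifespan


async def run_composed(names):
    """Enter lifespans in order; AsyncExitStack exits them in reverse."""
    events = []
    async with AsyncExitStack() as stack:
        for name in names:
            await stack.enter_async_context(make_lifespan(name, events)())
        events.append("serving")
    return events
```

Calling `asyncio.run(run_composed(["user", "db", "mcp"]))` records the user lifespan starting first and shutting down last, which mirrors the ordering listed above.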
3. Dynamic Agent Loader + Watcher
Agents are not baked into the container image. They are plain Python files delivered via git-sync and discovered at runtime using importlib.
```text
/agents/
├── my_agent.py          # Agent module
├── another_agent.py     # Another agent
├── helpers/
│   ├── __init__.py      # Package with agents
│   └── utils.py
└── .env                 # Shared secrets (loaded before agent import)
```
The agent_loader.py module:
1. Loads .env from the agents directory (if present)
2. Scans for *.py files (skipping _-prefixed names)
3. Imports each as a module, collecting top-level Agent instances
4. Reloads previously-imported modules so code changes take effect
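Steps 2 to 4 can be sketched with `importlib.util`. This is not the repository's agent_loader.py: `discover_agents`, the `agent_type` parameter, and the `dynamic_agents_` module-name prefix are hypothetical, and .env loading (step 1) is left out to keep the example free of third-party dependencies.

```python
import importlib.util
import sys
from pathlib import Path


def discover_agents(agents_dir, agent_type):
    """Import each non-underscore *.py file and collect agent_type instances."""
    found = []
    for path in sorted(Path(agents_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip private modules such as __init__.py
        module_name = f"dynamic_agents_{path.stem}"
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        # The file is re-executed on every call, so a later call sees edits
        sys.modules[module_name] = module
        spec.loader.exec_module(module)
        found.extend(
            value for value in vars(module).values()
            if isinstance(value, agent_type)
        )
    return found
```

In the real loader `agent_type` would presumably be Agno's Agent class; passing the type as a parameter keeps the sketch runnable without the framework installed.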
The watcher.py module uses watchdog to monitor the agents directory:
- Debounced reload (2 seconds) prevents rapid-fire reloads during git-sync writes
- Symlink poller detects git-sync worktree swaps that inotify cannot see
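Both mechanisms can be approximated with the standard library. The sketch below is not the repository's watcher.py: `Debouncer` and `SymlinkPoller` are hypothetical names. It assumes a watchdog event handler would call `trigger()` on every filesystem event, and that a background loop would call `changed()` periodically on the git-sync symlink.

```python
import os
import threading


class Debouncer:
    """Run callback once, `delay` seconds after the most recent trigger()."""

    def __init__(self, delay, callback):
        self._delay = delay
        self._callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer event supersedes the pending one
            self._timer = threading.Timer(self._delay, self._callback)
            self._timer.daemon = True
            self._timer.start()


class SymlinkPoller:
    """Report when the path a symlink resolves to has changed."""

    def __init__(self, link_path):
        self._link_path = link_path
        self._last_target = os.path.realpath(link_path)

    def changed(self):
        target = os.path.realpath(self._link_path)
        if target != self._last_target:
            self._last_target = target
            return True
        return False
```

With `Debouncer(2.0, reload_agents)`, where `reload_agents` is a placeholder for the actual reload function, a burst of writes from git-sync collapses into a single reload two seconds after the last event.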
Route Snapshot/Restore
Critical invariant: AgentOS.resync() unconditionally clears all routes in _reprovision_routers(). Using base_app alone does NOT fix this — the route-wiping happens regardless.
The watcher snapshots custom routes before resync and restores them after, using an exclude-list of AgentOS-owned prefixes:
```python
AGENTOS_PREFIXES = (
    "/agents/", "/sessions/", "/knowledge/",
    "/health", "/docs", "/openapi.json", "/mcp",
)

# Snapshot: everything NOT matching these prefixes is custom
custom_routes = [
    route for route in app.router.routes
    if hasattr(route, "path")
    and not any(route.path.startswith(p) for p in AGENTOS_PREFIXES)
]
```
This is more defensive than an include-list — new custom routes are automatically preserved without filter updates.
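The restore half of the cycle might look like the sketch below. `snapshot_custom_routes` and `restore_custom_routes` are hypothetical helper names, and the functions work on any list of objects with a `path` attribute rather than on real FastAPI routes, so the actual watcher may differ in detail.

```python
AGENTOS_PREFIXES = (
    "/agents/", "/sessions/", "/knowledge/",
    "/health", "/docs", "/openapi.json", "/mcp",
)


def snapshot_custom_routes(routes):
    """Return routes whose path is not under an AgentOS-owned prefix."""
    return [
        route for route in routes
        if hasattr(route, "path")
        and not any(route.path.startswith(p) for p in AGENTOS_PREFIXES)
    ]


def restore_custom_routes(routes, snapshot):
    """Re-append snapshotted routes that are missing, modifying routes in place."""
    present = {id(route) for route in routes}
    for route in snapshot:
        if id(route) not in present:
            routes.append(route)
```

Comparing by object identity makes the restore idempotent: calling it twice after a resync does not register the same route object twice.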
4. Daemon Threads in Lifespan
Long-running persistent workers (triage queues, metrics collectors, etc.) use threading.Thread(daemon=True) started inside the lifespan, not at import time.
```python
@asynccontextmanager
async def lifespan(app_instance: FastAPI):
    # Workers start only when the app is serving
    threading.Thread(target=collect_db_metrics_loop, daemon=True).start()
    yield
```
Why daemon threads over asyncio tasks: Workers that call agent.arun() need their own event loops because MCP tools use blocking SSE transports. Running these on the main uvicorn loop would block it. The thread-per-worker model with disposable event loops is the correct pattern.
Why not import-time: Starting workers at import time means importing main.py in tests spawns threads. Moving them to the lifespan eliminates import-time side effects.
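A thread-per-worker model with disposable event loops can be sketched as follows. The names are hypothetical: `handle_item` stands in for an async call such as agent.arun(), and the queue-based hand-off is an assumption about how work reaches the worker, not a description of the repository's triage queue.

```python
import asyncio
import queue
import threading


async def handle_item(item):
    """Stand-in for an async agent call; it just echoes the item."""
    await asyncio.sleep(0)
    return f"handled {item}"


def worker_loop(inbox, outbox):
    """Drain inbox, running each item on a fresh, disposable event loop."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: stop the worker
            break
        # asyncio.run creates a new loop and closes it when the call returns
        outbox.put(asyncio.run(handle_item(item)))


def start_worker(inbox, outbox):
    """Start the worker as a daemon thread so it never blocks process exit."""
    thread = threading.Thread(
        target=worker_loop, args=(inbox, outbox), daemon=True
    )
    thread.start()
    return thread
```

Because each item runs under its own `asyncio.run()` call, a blocking operation inside the worker stalls only that thread and never the main uvicorn loop.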
Tracing Architecture
AgentOS uses dual-export tracing:
- Database exporter: DatabaseSpanExporter writes spans to PostgreSQL for the Agno UI
- OTLP exporter: sends spans to an external collector (Phoenix, Grafana, Datadog, etc.)
Both exporters are wired into the same TracerProvider via SimpleSpanProcessor, so every span goes to both destinations.
Agent Packaging Model
- No per-agent Helm charts — all agents share a single AgentOS deployment
- No agent code in the container image — agents live in a separate Git repository
- git-sync sidecar delivers agent code to /agents via an emptyDir shared volume
- Hot-reload picks up changes within seconds of a git push
This means you can update agent behavior (prompts, tools, logic) without rebuilding or redeploying the AgentOS container.
Metrics
The metrics.py module defines pure OpenTelemetry SDK instruments:
| Type | Name | Description |
|---|---|---|
| Counter | agno.webhook.requests | Total webhook requests |
| Counter | agno.agent.runs | Total agent runs |
| Counter | agno.dedup.decisions | Dedup hit/miss |
| Histogram | agno.webhook.duration | Webhook processing time |
| Histogram | agno.agent.duration | Agent run duration |
| UpDownCounter | agno.agent.in_progress | Active agent runs |
| Observable Gauge | agno.agents.loaded | Loaded agent count |
| Observable Gauge | agno.queue.depth | Queue depths |
| Observable Gauge | agno.db.table.size | Postgres table sizes |
| Observable Gauge | agno.db.table.rows | Approximate row counts |
When OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is set, metrics are pushed via OTLP gRPC every 30 seconds. When unset, all instruments are no-ops with zero runtime cost.