Deploying Gotthard - The Fission Platform AI Agent
Complete guide for deploying Gotthard, the Fission platform's AI assistant, to your Kubernetes cluster.
What You're Building
Gotthard is a production-ready AI platform assistant that:
- Answers questions about the Fission platform via Slack
- Uses LangGraph for multi-agent orchestration
- Integrates with AWS Bedrock (Claude) and MCP tools
- Provides real-time streaming responses via SSE
- Supports surveys, feedback collection, and announcements
- Runs on Kubernetes with auto-scaling
- Deploys via GitOps (ArgoCD)
Architecture Overview
┌─────────────┐
│ Slack User  │
└──────┬──────┘
       │
       ▼
┌──────────────────────┐
│ gotthard-slack-bot   │  (Slack Bolt - Python)
│ (Pod in K8s)         │  - Message routing
└──────┬───────────────┘  - Survey management
       │                  - Feedback collection
       │ SSE Stream
       ▼
┌──────────────────────┐
│ gotthard-api         │  (FastAPI + LangGraph)
│ (Pod in K8s)         │  - Multi-agent router
└──────┬───────────────┘  - Specialized agents
       │                  - Document RAG
       ▼
┌──────────────────────┐
│ AWS Bedrock Claude   │  + MCP Tools
│ (LLM Provider)       │  (GitHub, ArgoCD, etc.)
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│ PostgreSQL           │  (Memory + Surveys)
└──────────────────────┘
Repository Structure
fission/
├── services/
│   ├── gotthard-api/              # LangGraph API service
│   │   ├── app/
│   │   │   ├── agent.py           # Multi-agent workflow
│   │   │   ├── api.py             # FastAPI + SSE endpoints
│   │   │   └── mcp_servers/       # MCP integrations
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   └── docs/
│   │       ├── DEPLOYMENT.md
│   │       └── MULTI_AGENT_ARCHITECTURE.md
│   │
│   └── gotthard-slack-bot/        # Slack interface
│       ├── slack_bot/
│       │   ├── app.py             # Slack Bolt app
│       │   ├── client.py          # SSE client
│       │   ├── storage/           # Survey/feedback DB
│       │   └── utils/
│       │       ├── handlers.py    # Message routing
│       │       └── streaming.py   # SSE streaming
│       ├── Dockerfile
│       └── pyproject.toml
│
└── deploy/
    ├── gotthard/                  # Helm deployment configs
    │   ├── values.yaml
    │   └── manifests/
    └── argocd/
        └── platform-gotthard.yaml # ArgoCD application
Quick Deploy
Prerequisites
- Kubernetes cluster with:
  - PostgreSQL database (or use managed RDS)
  - AWS IRSA/Pod Identity configured for Bedrock access
  - ArgoCD installed
- Slack workspace with:
  - Slack app created
  - Bot token (xoxb-...)
  - Signing secret
- AWS credentials with:
  - Bedrock API access (Claude models)
  - IAM role for pod authentication
- Tools installed:
  - kubectl configured
  - AWS CLI configured
Step 1: Create Secrets
# Create namespace
kubectl create namespace platform-agents
# Slack credentials for slack-bot
kubectl create secret generic gotthard-slack \
--namespace platform-agents \
--from-literal=bot-token="xoxb-..." \
--from-literal=signing-secret="..."
# Database URL for both services
kubectl create secret generic gotthard-db \
--namespace platform-agents \
--from-literal=database-url="postgresql://user:pass@host:5432/gotthard"
Step 2: Deploy via ArgoCD
# Apply ArgoCD application
kubectl apply -f deploy/argocd/platform-gotthard.yaml
# Check application status, then trigger a sync
argocd app get platform-gotthard
argocd app sync platform-gotthard
# Check pods
kubectl get pods -n platform-agents -l app.kubernetes.io/name=gotthard
Step 3: Verify Deployment
# Check gotthard-api health (port-forward blocks, so run it in a separate terminal)
kubectl port-forward -n platform-agents svc/gotthard-api 8000:8000
curl http://localhost:8000/health
# Expected: {"status": "healthy"}
# Check gotthard-slack-bot logs
kubectl logs -f -n platform-agents deployment/gotthard-slack-bot
# Test in Slack
# Send a DM to your bot: "hello"
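The health check above can also be scripted, which helps right after a sync while pods are still starting. The following is a minimal sketch, assuming the port-forward from this step is running and that `/health` returns the JSON shown above; the function names and retry defaults are illustrative and not part of the Gotthard codebase.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8000/health") -> dict:
    """Fetch the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until the service reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling `wait_until_healthy()` with the defaults polls `http://localhost:8000/health` up to ten times; passing a different `fetch` callable makes the loop easy to exercise without a cluster.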
Local Development
Running gotthard-api Locally
cd services/gotthard-api
# Start PostgreSQL with pgvector
docker run -d -p 5432:5432 \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=gotthard \
pgvector/pgvector:pg17
# Configure environment
cp .env.example .env
# Edit .env with your AWS credentials and settings
# Install dependencies
uv pip install -e ".[dev]"
# Run the API
python -m app.main
# API available at http://localhost:8000
# Health check: curl http://localhost:8000/health
Running gotthard-slack-bot Locally
cd services/gotthard-slack-bot
# Configure environment
cp .env.example .env
# Edit .env with:
# - SLACK_BOT_TOKEN
# - SLACK_SIGNING_SECRET
# - GOTTHARD_API_URL=http://localhost:8000
# - DATABASE_URL
# Install dependencies
uv pip install -e ".[dev]"
# Run the bot
python -m slack_bot.app
# Bot will connect to Slack via Socket Mode
Testing the Integration
# Terminal 1: Run gotthard-api
cd services/gotthard-api
python -m app.main
# Terminal 2: Run gotthard-slack-bot
cd services/gotthard-slack-bot
python -m slack_bot.app
# Terminal 3: Test API directly
curl -X POST http://localhost:8000/api/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{
"thread_id": "test-123",
"messages": [{"role": "user", "content": "What is Fission?"}]
}'
# Or test via Slack: Send a DM to your bot
Configuration
gotthard-api Environment Variables
# AWS Bedrock
AWS_REGION=us-east-1
DEFAULT_LLM_MODEL=claude-3-5-sonnet
# Multi-agent mode (optional)
ENABLE_MULTI_AGENT=true # Enable specialized agents
# Database
DATABASE_URL=postgresql://user:pass@host:5432/gotthard
# Server
HOST=0.0.0.0
PORT=8000
ENV=production
LOG_LEVEL=info
# MCP Tools
GITHUB_TOKEN=ghp_... # For GitHub MCP server
gotthard-slack-bot Environment Variables
# Slack
SLACK_BOT_TOKEN=xoxb-...
SLACK_SIGNING_SECRET=...
# Gotthard API
GOTTHARD_API_URL=http://gotthard-api:8000
GOTTHARD_API_TIMEOUT=300
# Database (for surveys/feedback)
DATABASE_URL=postgresql://user:pass@host:5432/gotthard
# Features
ENABLE_SURVEYS=true
ENABLE_FEEDBACK=true
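To show how these variables might be consumed, here is a hedged sketch of a settings loader. The variable names come from the list above; the `SlackBotSettings` class, the choice of which variables are required, and the fallback defaults are assumptions for illustration and may not match the real slack_bot code.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("SLACK_BOT_TOKEN", "SLACK_SIGNING_SECRET", "DATABASE_URL")


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true" or "1"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SlackBotSettings:
    slack_bot_token: str
    slack_signing_secret: str
    database_url: str
    gotthard_api_url: str
    gotthard_api_timeout: int
    enable_surveys: bool
    enable_feedback: bool


def load_settings(env: Mapping[str, str] = os.environ) -> SlackBotSettings:
    """Build settings from environment variables, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return SlackBotSettings(
        slack_bot_token=env["SLACK_BOT_TOKEN"],
        slack_signing_secret=env["SLACK_SIGNING_SECRET"],
        database_url=env["DATABASE_URL"],
        # Defaults below mirror the example values and are assumptions.
        gotthard_api_url=env.get("GOTTHARD_API_URL", "http://gotthard-api:8000"),
        gotthard_api_timeout=int(env.get("GOTTHARD_API_TIMEOUT", "300")),
        enable_surveys=_as_bool(env.get("ENABLE_SURVEYS", "true")),
        enable_feedback=_as_bool(env.get("ENABLE_FEEDBACK", "true")),
    )
```

Failing at startup on a missing token tends to be easier to diagnose than a connection error later in the logs.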
Helm Values
Edit deploy/gotthard/values.yaml or values-k8s.yaml:
# gotthard-api configuration
gotthard-api:
  replicaCount: 2
  env:
    - name: ENABLE_MULTI_AGENT
      value: "true"
    - name: DEFAULT_LLM_MODEL
      value: "claude-3-5-sonnet"
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"
# gotthard-slack-bot configuration
gotthard-slack-bot:
  replicaCount: 1 # Usually 1 is enough
  env:
    - name: GOTTHARD_API_URL
      value: "http://gotthard-api:8000"
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "200m"
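When sizing the namespace, it can help to total what these values reserve. The sketch below hard-codes the replica counts and resource strings from the values above and sums them; the helper names are illustrative, and the parsers only handle the `m`, `Mi`, and `Gi` suffixes used here, not the full Kubernetes quantity grammar.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def parse_memory_mib(quantity: str) -> int:
    """Convert "512Mi" or "1Gi" to MiB (only these two suffixes are handled)."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


# Values copied from the Helm values shown above.
SERVICES = {
    "gotthard-api": {
        "replicas": 2,
        "requests": {"cpu": "250m", "memory": "512Mi"},
        "limits": {"cpu": "500m", "memory": "1Gi"},
    },
    "gotthard-slack-bot": {
        "replicas": 1,
        "requests": {"cpu": "100m", "memory": "256Mi"},
        "limits": {"cpu": "200m", "memory": "512Mi"},
    },
}


def total_resources(kind: str) -> tuple[int, int]:
    """Return (millicores, MiB) summed over all replicas for "requests" or "limits"."""
    cpu = sum(s["replicas"] * parse_cpu_millicores(s[kind]["cpu"]) for s in SERVICES.values())
    mem = sum(s["replicas"] * parse_memory_mib(s[kind]["memory"]) for s in SERVICES.values())
    return cpu, mem
```

With the values above, requests total 600 millicores and 1280 MiB, and limits total 1200 millicores and 2560 MiB, before any autoscaling adds replicas.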
Testing
Check Deployment Status
# Check all Gotthard pods
kubectl get pods -n platform-agents | grep gotthard
# Expected output:
# gotthard-api-xxx 1/1 Running
# gotthard-slack-bot-xxx 1/1 Running
# Check services
kubectl get svc -n platform-agents | grep gotthard
Test gotthard-api Directly
# Port-forward to API (run in a separate terminal; the command blocks)
kubectl port-forward -n platform-agents svc/gotthard-api 8000:8000
# Health check
curl http://localhost:8000/health
# Test chat endpoint (non-streaming)
curl -X POST http://localhost:8000/api/v1/chat \
-H "Content-Type: application/json" \
-d '{
"thread_id": "test-123",
"messages": [{"role": "user", "content": "What is Fission?"}]
}'
# Test streaming endpoint (SSE)
curl -N -X POST http://localhost:8000/api/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{
"thread_id": "test-456",
"messages": [{"role": "user", "content": "Tell me about ArgoCD"}]
}'
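The streaming endpoint replies with Server-Sent Events, so a client reads `data:` lines and treats a blank line as the end of each event. The sketch below parses that framing from any iterable of text lines. It assumes standard SSE framing only; the event names and payload format that gotthard-api actually emits are not documented here, and `iter_sse_data` is an illustrative name rather than the function in `slack_bot/client.py`.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            payload = "\n".join(data_lines)
            data_lines = []
            if payload:
                yield payload
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # One leading space after the colon is stripped.
        if field == "data":
            data_lines.append(value)
    # An event not terminated by a blank line is discarded, as in the SSE spec.
```

Feeding it the decoded lines of a streaming HTTP response yields one string per event, which the caller can then decode as JSON or plain text depending on what the API sends.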
Test via Slack
- Direct Message: Send a DM to your bot
- Ask Questions:
- Built-in Commands:
- Survey (if enabled):
  - Bot will prompt you with survey questions in DM
Monitoring
View Logs
# gotthard-api logs
kubectl logs -f -n platform-agents deployment/gotthard-api
# gotthard-slack-bot logs
kubectl logs -f -n platform-agents deployment/gotthard-slack-bot
# Filter for errors
kubectl logs -n platform-agents deployment/gotthard-api | grep ERROR
# Show structured logs with jq
kubectl logs -n platform-agents deployment/gotthard-api | jq -r 'select(.level=="ERROR")'
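If the log stream mixes JSON records with plain-text lines (startup banners, tracebacks), a strict JSON filter can abort on the first non-JSON line. The following sketch shows a tolerant filter. It assumes each structured record is a single-line JSON object with a `level` field, as the jq filter above implies; the function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_error_records(lines: Iterable[str], level: str = "ERROR") -> Iterator[dict]:
    """Yield JSON log records whose level matches, skipping non-JSON lines."""
    wanted = level.upper()
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # Plain-text line, not a structured record.
        if isinstance(record, dict) and str(record.get("level", "")).upper() == wanted:
            yield record
```

The comparison is case-insensitive, so records logged as `error` or `ERROR` both match.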
Health Checks
# API health
kubectl exec -n platform-agents deployment/gotthard-api -- \
curl -s http://localhost:8000/health
# Check pod status
kubectl get pods -n platform-agents -l app.kubernetes.io/name=gotthard-api
kubectl get pods -n platform-agents -l app.kubernetes.io/name=gotthard-slack-bot
# Check resource usage
kubectl top pods -n platform-agents | grep gotthard
Database Queries
# Port-forward to PostgreSQL (run in a separate terminal; the command blocks)
kubectl port-forward -n platform-agents svc/postgresql 5432:5432
# Connect and check
psql postgresql://postgres:pass@localhost:5432/gotthard
-- Check conversation threads (run inside the psql session)
SELECT thread_id, created_at FROM threads ORDER BY created_at DESC LIMIT 10;
-- Check survey responses
SELECT * FROM survey_responses ORDER BY created_at DESC LIMIT 10;
Troubleshooting
Issue: Bot Not Responding in Slack
Symptoms: Bot doesn't reply to messages
Diagnosis:
# Check slack-bot is running
kubectl get pods -n platform-agents | grep slack-bot
# Check logs for connection errors
kubectl logs -n platform-agents deployment/gotthard-slack-bot | tail -50
# Look for Slack connection messages
kubectl logs -n platform-agents deployment/gotthard-slack-bot | grep "socket_mode"
Solutions:
1. Verify Slack credentials are correct
2. Check bot has been added to workspace
3. Verify GOTTHARD_API_URL is accessible from slack-bot pod
4. Restart slack-bot: kubectl rollout restart deployment/gotthard-slack-bot -n platform-agents
Issue: API Returns Errors
Symptoms: 500 errors from gotthard-api
Diagnosis:
# Check API logs
kubectl logs -n platform-agents deployment/gotthard-api | grep ERROR
# Check database connection
kubectl logs -n platform-agents deployment/gotthard-api | grep "database"
# Check Bedrock access
kubectl logs -n platform-agents deployment/gotthard-api | grep "bedrock"
Solutions:
1. Verify DATABASE_URL is correct
2. Check AWS IRSA role has Bedrock permissions
3. Verify PostgreSQL is accessible
4. Check MCP server initialization in logs
Issue: Slow Responses
Symptoms: Bot takes >30 seconds to respond
Diagnosis:
# Check if multi-agent is enabled
kubectl logs -n platform-agents deployment/gotthard-api | grep "multi_agent"
# Check token usage in logs
kubectl logs -n platform-agents deployment/gotthard-api | grep "token"
# Check for tool execution times
kubectl logs -n platform-agents deployment/gotthard-api | grep "tool_executed"
Solutions:
1. Enable multi-agent mode (ENABLE_MULTI_AGENT=true) to reduce token usage
2. Use faster models like Claude Haiku for routing
3. Increase resources if CPU/memory constrained
4. Check database query performance
Issue: Database Connection Failures
Symptoms: connection refused or authentication failed
Solution:
# Verify secret exists
kubectl get secret gotthard-db -n platform-agents
# Check DATABASE_URL format
kubectl get secret gotthard-db -n platform-agents -o jsonpath='{.data.database-url}' | base64 -d
# Test connection from pod
kubectl exec -n platform-agents deployment/gotthard-api -- \
python -c "import asyncpg; import asyncio; asyncio.run(asyncpg.connect('postgresql://...'))"
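Before testing connectivity, it is worth checking that the decoded DATABASE_URL is structurally complete, since a missing database name or an unencoded special character in the password can surface as an authentication or connection error. This is a minimal sketch using only the standard library; `check_database_url` is a hypothetical helper, and the checks reflect the `postgresql://user:pass@host:5432/gotthard` shape used in this guide.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of structural problems found in a PostgreSQL URL."""
    problems: list[str] = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme or '(none)'}")
    if not parts.hostname:
        problems.append("host is missing")
    if not parts.username:
        problems.append("user is missing")
    if parts.password is None:
        problems.append("password is missing")
    try:
        parts.port  # Accessing .port raises ValueError for a non-numeric port.
    except ValueError:
        problems.append("port is not a valid number")
    if not parts.path.strip("/"):
        problems.append("database name is missing")
    return problems
```

An empty list means the URL has the expected shape; it does not prove the credentials or host are correct, which still requires the connection test above.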
Features
Multi-Agent Mode
Enable specialized agents for better performance by setting ENABLE_MULTI_AGENT=true.
Benefits:
- 30-50% reduction in token usage
- Better tool selection accuracy
- Specialized knowledge per domain
- Faster responses
Available Agents:
- github: Code, PRs, issues, workflows
- argocd: K8s deployments, applications
- backstage: Service catalog, docs
- terraform: Infrastructure modules
- aws: Costs, resources, accounts
- general: Platform questions, docs
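To illustrate the routing idea, the sketch below picks one agent per question so that only that agent's tools and context are needed. It is a deliberately simplified keyword matcher written for this guide: the real router lives in `app/agent.py`, is built on LangGraph, and may well use an LLM to classify requests. The agent names come from the list above; the keyword sets are assumptions.

```python
import re

# Hypothetical keyword sets derived from the agent descriptions above.
AGENT_KEYWORDS: dict[str, set[str]] = {
    "github": {"github", "pr", "prs", "issue", "issues", "workflow", "workflows"},
    "argocd": {"argocd", "deployment", "deployments", "application", "applications"},
    "backstage": {"backstage", "catalog"},
    "terraform": {"terraform", "module", "modules"},
    "aws": {"aws", "cost", "costs", "account", "accounts"},
}


def route(question: str) -> str:
    """Return the agent with the most keyword hits, or "general" if none match."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    best_agent, best_hits = "general", 0
    for agent, keywords in AGENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_agent, best_hits = agent, hits
    return best_agent
```

Sending each question to a single specialized agent is what keeps prompts short: the model sees one domain's tool descriptions instead of all of them.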
Surveys
Collect user feedback via automated surveys. Configure them in services/gotthard-slack-bot/slack_bot/survey/survey_config.yaml.
Announcements
Send platform announcements to Slack channels:
# Via API or admin interface
POST /api/v1/announcements
{
  "message": "Platform maintenance on Friday",
  "channels": ["#engineering", "#platform"]
}
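For scripted announcements, the request above can be built with the standard library. This sketch only constructs the request object; it assumes the endpoint accepts unauthenticated JSON on the API's base URL, which may not hold in your environment, and `build_announcement_request` is an illustrative helper name.

```python
import json
import urllib.request


def build_announcement_request(
    base_url: str, message: str, channels: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/announcements."""
    body = json.dumps({"message": message, "channels": channels}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/announcements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; keeping construction separate makes the payload easy to inspect before anything reaches Slack.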
Resources
Documentation:
- gotthard-api README
- gotthard-slack-bot README
- Multi-Agent Architecture
- Deployment Guide
Code:
- services/gotthard-api
- services/gotthard-slack-bot
External:
- LangGraph Documentation
- AWS Bedrock Documentation
- Slack Bolt Python
Cost Estimates
Monthly costs (assuming 1000 queries/day):
| Component | Cost | Notes |
|---|---|---|
| gotthard-api pods | ~$30 | 2 replicas, 0.5 CPU, 1GB RAM |
| gotthard-slack-bot pods | ~$10 | 1 replica, 0.2 CPU, 512MB RAM |
| AWS Bedrock API | ~$50 | Claude Sonnet 3.5 pricing |
| PostgreSQL RDS | ~$25 | db.t3.small |
| Total | ~$115/month | |
Optimization Tips:
- Enable multi-agent mode to reduce token usage by 30-50%
- Use Haiku for routing decisions (cheaper)
- Implement caching for frequently asked questions
- Set appropriate HPA limits to prevent over-scaling
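As a rough worked example of the first tip, the sketch below recomputes the monthly total from the table when token usage drops by a given percentage. It assumes Bedrock spend scales linearly with token usage and that the other components are unaffected, which is a simplification; the figures are the estimates from the table, not measured costs.

```python
# Monthly estimates in USD, copied from the cost table above.
MONTHLY_COSTS_USD = {
    "gotthard-api pods": 30,
    "gotthard-slack-bot pods": 10,
    "AWS Bedrock API": 50,
    "PostgreSQL RDS": 25,
}


def estimated_total(token_reduction_percent: int = 0) -> float:
    """Return the monthly total after reducing Bedrock spend by the given percent."""
    if not 0 <= token_reduction_percent <= 100:
        raise ValueError("token_reduction_percent must be between 0 and 100")
    bedrock = MONTHLY_COSTS_USD["AWS Bedrock API"]
    reduced_bedrock = bedrock * (100 - token_reduction_percent) / 100
    others = sum(v for k, v in MONTHLY_COSTS_USD.items() if k != "AWS Bedrock API")
    return others + reduced_bedrock
```

Under these assumptions, a 30% reduction brings the estimate from about $115 to about $100 per month, and a 50% reduction to about $90.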
Need Help? Ask in the #it_pts_dai_tec Slack channel or open an issue in the repository.