Deploying Gotthard - The Fission Platform AI Agent

Complete guide for deploying Gotthard, the Fission platform's AI assistant, to your Kubernetes cluster.

What You're Building

Gotthard is a production-ready AI platform assistant that:

- Answers questions about the Fission platform via Slack
- Uses LangGraph for multi-agent orchestration
- Integrates with AWS Bedrock (Claude) and MCP tools
- Provides real-time streaming responses via SSE
- Supports surveys, feedback collection, and announcements
- Runs on Kubernetes with auto-scaling
- Deploys via GitOps (ArgoCD)

Architecture Overview

┌─────────────┐
│  Slack User │
└──────┬──────┘
       │
       ▼
┌──────────────────────┐
│  gotthard-slack-bot  │  (Slack Bolt - Python)
│  (Pod in K8s)        │  - Message routing
└──────┬───────────────┘  - Survey management
       │                  - Feedback collection
       │ SSE Stream
       ▼
┌──────────────────────┐
│   gotthard-api       │  (FastAPI + LangGraph)
│   (Pod in K8s)       │  - Multi-agent router
└──────┬───────────────┘  - Specialized agents
       │                  - Document RAG
       ▼
┌──────────────────────┐
│  AWS Bedrock Claude  │  +  MCP Tools
│  (LLM Provider)      │     (GitHub, ArgoCD, etc.)
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│  PostgreSQL          │  (Memory + Surveys)
└──────────────────────┘

Repository Structure

fission/
├── services/
│   ├── gotthard-api/              # LangGraph API service
│   │   ├── app/
│   │   │   ├── agent.py          # Multi-agent workflow
│   │   │   ├── api.py            # FastAPI + SSE endpoints
│   │   │   └── mcp_servers/      # MCP integrations
│   │   ├── Dockerfile
│   │   ├── pyproject.toml
│   │   └── docs/
│   │       ├── DEPLOYMENT.md
│   │       └── MULTI_AGENT_ARCHITECTURE.md
│   │
│   └── gotthard-slack-bot/        # Slack interface
│       ├── slack_bot/
│       │   ├── app.py            # Slack Bolt app
│       │   ├── client.py         # SSE client
│       │   ├── storage/          # Survey/feedback DB
│       │   └── utils/
│       │       ├── handlers.py   # Message routing
│       │       └── streaming.py  # SSE streaming
│       ├── Dockerfile
│       └── pyproject.toml
│
└── deploy/
    ├── gotthard/                   # Helm deployment configs
    │   ├── values.yaml
    │   └── manifests/
    └── argocd/
        └── platform-gotthard.yaml  # ArgoCD application

Quick Deploy

Prerequisites

- Kubernetes cluster with:
  - PostgreSQL database (or use managed RDS)
  - AWS IRSA/Pod Identity configured for Bedrock access
  - ArgoCD installed
- Slack workspace with:
  - Slack app created
  - Bot token (xoxb-...)
  - Signing secret
- AWS credentials with:
  - Bedrock API access (Claude models)
  - IAM role for pod authentication
- Tools installed:
  - kubectl configured
  - AWS CLI configured

Step 1: Create Secrets

# Create namespace
kubectl create namespace platform-agents

# Slack credentials for slack-bot
kubectl create secret generic gotthard-slack \
  --namespace platform-agents \
  --from-literal=bot-token="xoxb-..." \
  --from-literal=signing-secret="..."

# Database URL for both services
kubectl create secret generic gotthard-db \
  --namespace platform-agents \
  --from-literal=database-url="postgresql://user:pass@host:5432/gotthard"
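
The database-url value is a standard PostgreSQL connection URI, so a password containing characters such as @, : or / needs to be percent-encoded before it goes into the secret. The following is a minimal sketch of building such a URL with the Python standard library. The helper name build_database_url, the host name, and the sample credentials are illustrative assumptions and not part of the Gotthard codebase.

```python
from urllib.parse import quote, unquote, urlsplit


def build_database_url(user: str, password: str, host: str,
                       port: int = 5432, database: str = "gotthard") -> str:
    """Return a postgresql:// URL with percent-encoded credentials.

    Hypothetical helper for illustration; not part of the Gotthard codebase.
    """
    # safe="" makes quote() also encode "/", which it leaves untouched by default.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Sample values only; substitute your own credentials and host.
url = build_database_url("gotthard", "p@ss/w:rd", "db.example.internal")
print(url)

# Round trip: the parsed password decodes back to the original string.
print(unquote(urlsplit(url).password))
```

The printed URL can then be passed to the --from-literal=database-url=... flag shown above.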

Step 2: Deploy via ArgoCD

# Apply ArgoCD application
kubectl apply -f deploy/argocd/platform-gotthard.yaml

# Trigger a sync, then check the application status
argocd app sync platform-gotthard
argocd app get platform-gotthard

# Check pods
kubectl get pods -n platform-agents -l app.kubernetes.io/name=gotthard

Step 3: Verify Deployment

# Check gotthard-api health
# (port-forward blocks, so run it in one terminal and curl in another)
kubectl port-forward -n platform-agents svc/gotthard-api 8000:8000
curl http://localhost:8000/health
# Expected: {"status": "healthy"}

# Check gotthard-slack-bot logs
kubectl logs -f -n platform-agents deployment/gotthard-slack-bot

# Test in Slack
# Send a DM to your bot: "hello"
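
The health check in Step 3 can also be scripted, which helps when a pod needs a few seconds before it answers. The sketch below assumes the /health endpoint returns the JSON body shown above. The function names, retry count, and delay are illustrative choices, not part of Gotthard.

```python
import json
import time
from typing import Callable


def is_healthy(body: str) -> bool:
    """Return True if body is a JSON object with "status" equal to "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_until_healthy(fetch: Callable[[], str], attempts: int = 5,
                       delay_seconds: float = 2.0) -> bool:
    """Call fetch() up to `attempts` times until it returns a healthy body."""
    for attempt in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except OSError:
            # Covers connection refused/reset while the pod is still starting
            # (urllib.error.URLError is a subclass of OSError).
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Example fetch function against the port-forwarded service:
#   import urllib.request
#   fetch = lambda: urllib.request.urlopen(
#       "http://localhost:8000/health", timeout=5).read().decode()
#   print(wait_until_healthy(fetch))
```

Passing the fetch function in as a parameter keeps the retry logic testable without a running cluster.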

Local Development

Running gotthard-api Locally

cd services/gotthard-api

# Start PostgreSQL with pgvector
docker run -d -p 5432:5432 \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=gotthard \
  pgvector/pgvector:pg17

# Configure environment
cp .env.example .env
# Edit .env with your AWS credentials and settings

# Install dependencies
uv pip install -e ".[dev]"

# Run the API
python -m app.main

# API available at http://localhost:8000
# Health check: curl http://localhost:8000/health

Running gotthard-slack-bot Locally

cd services/gotthard-slack-bot

# Configure environment
cp .env.example .env
# Edit .env with:
# - SLACK_BOT_TOKEN
# - SLACK_SIGNING_SECRET
# - GOTTHARD_API_URL=http://localhost:8000
# - DATABASE_URL

# Install dependencies
uv pip install -e ".[dev]"

# Run the bot
python -m slack_bot.app

# The bot connects to Slack via Socket Mode
# (Socket Mode requires an app-level token, xapp-..., in addition to the bot token)

Testing the Integration

# Terminal 1: Run gotthard-api
cd services/gotthard-api
python -m app.main

# Terminal 2: Run gotthard-slack-bot
cd services/gotthard-slack-bot
python -m slack_bot.app

# Terminal 3: Test API directly
curl -X POST http://localhost:8000/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "thread_id": "test-123",
    "messages": [{"role": "user", "content": "What is Fission?"}]
  }'

# Or test via Slack: Send a DM to your bot

Configuration

gotthard-api Environment Variables

# AWS Bedrock
AWS_REGION=us-east-1
DEFAULT_LLM_MODEL=claude-3-5-sonnet

# Multi-agent mode (optional)
ENABLE_MULTI_AGENT=true  # Enable specialized agents

# Database
DATABASE_URL=postgresql://user:pass@host:5432/gotthard

# Server
HOST=0.0.0.0
PORT=8000
ENV=production
LOG_LEVEL=info

# MCP Tools
GITHUB_TOKEN=ghp_...  # For GitHub MCP server

gotthard-slack-bot Environment Variables

# Slack
SLACK_BOT_TOKEN=xoxb-...
SLACK_SIGNING_SECRET=...

# Gotthard API
GOTTHARD_API_URL=http://gotthard-api:8000
GOTTHARD_API_TIMEOUT=300

# Database (for surveys/feedback)
DATABASE_URL=postgresql://user:pass@host:5432/gotthard

# Features
ENABLE_SURVEYS=true
ENABLE_FEEDBACK=true

Helm Values

Edit deploy/gotthard/values.yaml or values-k8s.yaml:

# gotthard-api configuration
gotthard-api:
  replicaCount: 2

  env:
    - name: ENABLE_MULTI_AGENT
      value: "true"
    - name: DEFAULT_LLM_MODEL
      value: "claude-3-5-sonnet"

  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "500m"

# gotthard-slack-bot configuration
gotthard-slack-bot:
  replicaCount: 1  # Usually 1 is enough

  env:
    - name: GOTTHARD_API_URL
      value: "http://gotthard-api:8000"

  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "200m"
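
When sizing nodes or namespace quotas, it helps to total the requests and limits across replicas. The following sketch does that arithmetic for the values above. The parsing helpers are simplified assumptions that handle only the m, Mi, and Gi suffixes used in this file.

```python
def parse_cpu(value: str) -> int:
    """Convert a Kubernetes CPU quantity ("250m" or "1") to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_mib(value: str) -> int:
    """Convert a memory quantity with a Mi or Gi suffix to MiB."""
    if value.endswith("Gi"):
        return int(float(value[:-2]) * 1024)
    if value.endswith("Mi"):
        return int(value[:-2])
    raise ValueError(f"unsupported memory unit: {value}")


# (replicas, cpu request, memory request, cpu limit, memory limit) from the values above
workloads = {
    "gotthard-api": (2, "250m", "512Mi", "500m", "1Gi"),
    "gotthard-slack-bot": (1, "100m", "256Mi", "200m", "512Mi"),
}

cpu_request = sum(n * parse_cpu(c) for n, c, _, _, _ in workloads.values())
mem_request = sum(n * parse_memory_mib(m) for n, _, m, _, _ in workloads.values())
cpu_limit = sum(n * parse_cpu(c) for n, _, _, c, _ in workloads.values())
mem_limit = sum(n * parse_memory_mib(m) for n, _, _, _, m in workloads.values())

print(f"requests: {cpu_request}m CPU, {mem_request}Mi memory")
print(f"limits:   {cpu_limit}m CPU, {mem_limit}Mi memory")
# requests: 600m CPU, 1280Mi memory
# limits:   1200m CPU, 2560Mi memory
```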

Testing

Check Deployment Status

# Check all Gotthard pods
kubectl get pods -n platform-agents | grep gotthard

# Expected output:
# gotthard-api-xxx       1/1  Running
# gotthard-slack-bot-xxx 1/1  Running

# Check services
kubectl get svc -n platform-agents | grep gotthard

Test gotthard-api Directly

# Port-forward to API (blocks; run the curl commands below in a second terminal)
kubectl port-forward -n platform-agents svc/gotthard-api 8000:8000

# Health check
curl http://localhost:8000/health

# Test chat endpoint (non-streaming)
curl -X POST http://localhost:8000/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "thread_id": "test-123",
    "messages": [{"role": "user", "content": "What is Fission?"}]
  }'

# Test streaming endpoint (SSE)
curl -N -X POST http://localhost:8000/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "thread_id": "test-456",
    "messages": [{"role": "user", "content": "Tell me about ArgoCD"}]
  }'
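
The streaming endpoint returns server-sent events, which a client has to split into individual messages. The sketch below implements the basic SSE framing rules in plain Python: data lines accumulate, a blank line ends the event, and lines starting with a colon are comments. The function name iter_sse_data and the sample payloads are assumptions. The actual event format emitted by gotthard-api is not documented here.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    Only the "data" field is handled; event, id, and retry are ignored
    in this sketch.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)


# Invented sample stream; real payloads from the API may look different.
sample = [
    ": keep-alive\n",
    'data: {"token": "Fission"}\n',
    "\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(iter_sse_data(sample))
print(events)
```

The same generator can be fed the decoded lines of a streaming HTTP response instead of the sample list.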

Test via Slack

Direct Message: Send a DM to your bot
```
hello
```

Ask Questions:

What is Fission?
How do I deploy to Kubernetes?
Show me recent PRs in the fission repo

Built-in Commands:

help
status
links
feedback This bot is awesome!

Survey (if enabled):
Bot will prompt you with survey questions in DM

Monitoring

View Logs

# gotthard-api logs
kubectl logs -f -n platform-agents deployment/gotthard-api

# gotthard-slack-bot logs
kubectl logs -f -n platform-agents deployment/gotthard-slack-bot

# Filter for errors
kubectl logs -n platform-agents deployment/gotthard-api | grep ERROR

# Show structured (JSON) error logs with jq; fromjson? skips lines that are not valid JSON
kubectl logs -n platform-agents deployment/gotthard-api | jq -R 'fromjson? | select(.level? == "ERROR")'
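
Where jq is not available, the same filtering can be done in Python. This is a minimal sketch that assumes each structured log line is a JSON object with a level field, as the jq filter above implies. The function name and the sample lines are invented for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_log_records(lines: Iterable[str], level: str = "ERROR") -> Iterator[dict]:
    """Yield JSON log records whose "level" matches, skipping non-JSON lines."""
    wanted = level.upper()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line, e.g. a startup banner or traceback
        # Compare case-insensitively, since loggers differ in how they spell levels.
        if isinstance(record, dict) and str(record.get("level", "")).upper() == wanted:
            yield record


# Invented sample lines; real log fields may differ.
sample = [
    "Starting server\n",
    '{"level": "info", "message": "ready"}\n',
    '{"level": "ERROR", "message": "example failure"}\n',
]
errors = list(iter_log_records(sample))
print(errors)
```

To use it on live logs, pipe kubectl logs into a script that passes sys.stdin to iter_log_records.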

Health Checks

# API health
kubectl exec -n platform-agents deployment/gotthard-api -- \
  curl -s http://localhost:8000/health

# Check pod status
kubectl get pods -n platform-agents -l app.kubernetes.io/name=gotthard-api
kubectl get pods -n platform-agents -l app.kubernetes.io/name=gotthard-slack-bot

# Check resource usage
kubectl top pods -n platform-agents | grep gotthard

Database Queries

# Port-forward to PostgreSQL (blocks; run psql from a second terminal)
kubectl port-forward -n platform-agents svc/postgresql 5432:5432

# Connect and check
psql postgresql://postgres:pass@localhost:5432/gotthard

-- Inside psql: check conversation threads
SELECT thread_id, created_at FROM threads ORDER BY created_at DESC LIMIT 10;

-- Inside psql: check survey responses
SELECT * FROM survey_responses ORDER BY created_at DESC LIMIT 10;

Troubleshooting

Issue: Bot Not Responding in Slack

Symptoms: Bot doesn't reply to messages

Diagnosis:

# Check slack-bot is running
kubectl get pods -n platform-agents | grep slack-bot

# Check logs for connection errors
kubectl logs -n platform-agents deployment/gotthard-slack-bot | tail -50

# Look for Slack connection messages
kubectl logs -n platform-agents deployment/gotthard-slack-bot | grep "socket_mode"

Solutions:

1. Verify Slack credentials are correct
2. Check bot has been added to workspace
3. Verify GOTTHARD_API_URL is accessible from slack-bot pod
4. Restart slack-bot: kubectl rollout restart deployment/gotthard-slack-bot -n platform-agents

Issue: API Returns Errors

Symptoms: 500 errors from gotthard-api

Diagnosis:

# Check API logs
kubectl logs -n platform-agents deployment/gotthard-api | grep ERROR

# Check database connection
kubectl logs -n platform-agents deployment/gotthard-api | grep "database"

# Check Bedrock access
kubectl logs -n platform-agents deployment/gotthard-api | grep "bedrock"

Solutions:

1. Verify DATABASE_URL is correct
2. Check AWS IRSA role has Bedrock permissions
3. Verify PostgreSQL is accessible
4. Check MCP server initialization in logs

Issue: Slow Responses

Symptoms: Bot takes >30 seconds to respond

Diagnosis:

# Check if multi-agent is enabled
kubectl logs -n platform-agents deployment/gotthard-api | grep "multi_agent"

# Check token usage in logs
kubectl logs -n platform-agents deployment/gotthard-api | grep "token"

# Check for tool execution times
kubectl logs -n platform-agents deployment/gotthard-api | grep "tool_executed"

Solutions:

1. Enable multi-agent mode (ENABLE_MULTI_AGENT=true) to reduce token usage
2. Use faster models like Claude Haiku for routing
3. Increase resources if CPU/memory constrained
4. Check database query performance

Issue: Database Connection Failures

Symptoms: connection refused or authentication failed

Solution:

# Verify secret exists
kubectl get secret gotthard-db -n platform-agents

# Check DATABASE_URL format
kubectl get secret gotthard-db -n platform-agents -o jsonpath='{.data.database-url}' | base64 -d

# Test connection from pod (uses the DATABASE_URL already set in the container)
kubectl exec -n platform-agents deployment/gotthard-api -- \
  python -c "import asyncio, asyncpg, os; asyncio.run(asyncpg.connect(os.environ['DATABASE_URL'])); print('connected')"
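
Many connection failures come from a malformed DATABASE_URL rather than from the database itself. The sketch below checks the decoded secret value for the most common format problems using only the standard library. The function name check_database_url and the specific checks are assumptions for illustration and do not replace an actual connection test.

```python
from typing import List
from urllib.parse import urlsplit


def check_database_url(url: str) -> List[str]:
    """Return a list of problems found in a PostgreSQL URL (empty if none)."""
    problems: List[str] = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username or parts.password is None:
        problems.append("missing user or password")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    try:
        parts.port  # accessing .port raises ValueError if it is not a valid number
    except ValueError:
        problems.append("invalid port")
    return problems


print(check_database_url("postgresql://user:pass@host:5432/gotthard"))
# An unencoded "/" in the password splits the URL in the wrong place.
print(check_database_url("postgresql://user:pa/ss@host:5432/gotthard"))
```

An unencoded slash in the password is reported as an invalid port because the parser treats everything after the slash as the path.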

Features

Multi-Agent Mode

Enable specialized agents for better performance:

# In Helm values or environment
env:
  - name: ENABLE_MULTI_AGENT
    value: "true"

Benefits:

- 30-50% reduction in token usage
- Better tool selection accuracy
- Specialized knowledge per domain
- Faster responses

Available Agents:

- github: Code, PRs, issues, workflows
- argocd: K8s deployments, applications
- backstage: Service catalog, docs
- terraform: Infrastructure modules
- aws: Costs, resources, accounts
- general: Platform questions, docs
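
This guide does not describe how the router chooses an agent. The toy example below only illustrates the general idea of dispatching a question to one of the agents listed above, with general as the fallback. It uses simple keyword matching, which is a stand-in technique and not Gotthard's routing logic. The keyword sets are invented.

```python
import re

# Agent names come from the list above; the keyword sets are invented for illustration.
AGENT_KEYWORDS = {
    "github": {"pr", "prs", "issue", "issues", "workflow", "repo"},
    "argocd": {"argocd", "sync", "deployment", "rollout"},
    "backstage": {"backstage", "catalog"},
    "terraform": {"terraform", "module", "modules"},
    "aws": {"aws", "cost", "costs", "account", "accounts"},
}


def route(question: str) -> str:
    """Pick the agent whose keywords overlap most with the question."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    best_agent, best_score = "general", 0
    for agent, keywords in AGENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best_agent, best_score = agent, score
    return best_agent


for q in ("Show me recent PRs in the fission repo",
          "Tell me about ArgoCD",
          "What is Fission?"):
    print(q, "->", route(q))
```

Sending each question to a single specialized agent means only that agent's tools and instructions need to be included in the prompt, which is one plausible source of the token savings mentioned above.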

Surveys

Collect user feedback via automated surveys:

# Enable in slack-bot environment
env:
  - name: ENABLE_SURVEYS
    value: "true"

Configure surveys in services/gotthard-slack-bot/slack_bot/survey/survey_config.yaml

Announcements

Send platform announcements to Slack channels:

# Via API or admin interface
POST /api/v1/announcements
{
  "message": "Platform maintenance on Friday",
  "channels": ["#engineering", "#platform"]
}
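
The request above can be built with the Python standard library. This sketch constructs the POST request without sending it. The base URL is the local port-forward address used elsewhere in this guide, the helper name is hypothetical, and no authentication header is included because this guide does not say whether the endpoint requires one.

```python
import json
import urllib.request
from typing import List


def build_announcement_request(base_url: str, message: str,
                               channels: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/announcements."""
    body = json.dumps({"message": message, "channels": channels}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/announcements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_announcement_request(
    "http://localhost:8000",
    "Platform maintenance on Friday",
    ["#engineering", "#platform"],
)
print(request.get_method(), request.full_url)

# To actually send it:
#   with urllib.request.urlopen(request, timeout=30) as response:
#       print(response.status)
```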

Resources

Documentation:

- gotthard-api README
- gotthard-slack-bot README
- Multi-Agent Architecture
- Deployment Guide

Code:

- services/gotthard-api
- services/gotthard-slack-bot

External:

- LangGraph Documentation
- AWS Bedrock Documentation
- Slack Bolt Python

Cost Estimates

Monthly costs (assuming 1000 queries/day):

| Component | Cost | Notes |
|---|---|---|
| gotthard-api pods | ~$30 | 2 replicas, 0.5 CPU, 1GB RAM |
| gotthard-slack-bot pods | ~$10 | 1 replica, 0.2 CPU, 512MB RAM |
| AWS Bedrock API | ~$50 | Claude Sonnet 3.5 pricing |
| PostgreSQL RDS | ~$25 | db.t3.small |
| Total | ~$115/month | |
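
The table's figures can be turned into a rough per-query cost. The short calculation below uses only the numbers above, plus two assumptions: a 30-day month, and Bedrock spend that scales linearly with token usage, which is what lets the 30-50% multi-agent reduction be applied directly.

```python
# Monthly figures copied from the table above (USD).
monthly_costs = {
    "gotthard-api pods": 30,
    "gotthard-slack-bot pods": 10,
    "AWS Bedrock API": 50,
    "PostgreSQL RDS": 25,
}

QUERIES_PER_DAY = 1000
DAYS_PER_MONTH = 30  # assumption; the table does not state the month length

total = sum(monthly_costs.values())
queries_per_month = QUERIES_PER_DAY * DAYS_PER_MONTH
cost_per_query = total / queries_per_month
bedrock_per_query = monthly_costs["AWS Bedrock API"] / queries_per_month

# Assumes Bedrock cost is proportional to tokens, so a 30-50% token
# reduction lowers the Bedrock line by the same fraction.
bedrock_low = monthly_costs["AWS Bedrock API"] * (1 - 0.50)
bedrock_high = monthly_costs["AWS Bedrock API"] * (1 - 0.30)

print(f"total: ${total}/month")
print(f"all-in cost per query: ${cost_per_query:.4f}")
print(f"Bedrock cost per query: ${bedrock_per_query:.4f}")
print(f"Bedrock with multi-agent mode: ${bedrock_low:.0f}-${bedrock_high:.0f}/month")
```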

Optimization Tips:

- Enable multi-agent mode to reduce token usage by 30-50%
- Use Haiku for routing decisions (cheaper)
- Implement caching for frequently asked questions
- Set appropriate HPA limits to prevent over-scaling
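
As one possible shape for the caching tip, the sketch below keeps answers in memory for a fixed time and normalizes the question so that differences in case and whitespace still hit the cache. The FaqCache class, its default one-hour lifetime, and the sample answer are illustrative assumptions. Gotthard does not necessarily ship a cache like this.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FaqCache:
    """Tiny in-memory TTL cache keyed by a normalized question (illustrative only)."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Collapse case and whitespace so trivially different phrasings share a key.
        return " ".join(question.lower().split())

    def get(self, question: str) -> Optional[str]:
        key = self._key(question)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired; force a fresh LLM call
            return None
        return answer

    def put(self, question: str, answer: str) -> None:
        self._entries[self._key(question)] = (self._clock(), answer)


cache = FaqCache(ttl_seconds=600)
cache.put("What is  Fission?", "placeholder answer")
print(cache.get("what is fission?"))
```

A cache hit avoids a Bedrock call entirely, so the saving per hit is the full per-query model cost. The trade-off is that cached answers can go stale until they expire.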

Need Help? Ask in the #it_pts_dai_tec Slack channel or open an issue in the repository.