Fission Platform Roadmap
Last Updated: 2025-10-16
Fission is Tamedia's unified Internal Developer Platform (IDP) that consolidates infrastructure, reduces complexity, and enables innovation through self-service developer capabilities. The platform provides standardized tooling, automated workflows, and AI-powered assistance for the complete software lifecycle.
Key Goals:
- Consolidation: Move legacy products from isolated AWS accounts to shared platform infrastructure
- Standardization: Provide golden paths and templates for apps and services
- Self-Service: Enable developers to deploy and operate services independently
- Innovation: Use AI agents and automation to reduce manual operational work
Platform Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ Developer Interfaces │
│ Slack • Backstage • CLI • IDE Extensions │
└─────────────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────────────┴───────────────────────────────┐
│ Fission Platform Layer │
│ Self-Service • Automation • AI Agents • Golden Paths │
└─────────────────────────────────┬───────────────────────────────┘
│
┌────────────────────┼──────────────────┐
│ │ │
┌───────▼────────┐ ┌────────▼────────┐ ┌──────▼──────┐
│ Foundation │ │ Development │ │ Operations │
│ │ │ │ │ │
│ • AWS/EKS │ │ • GitHub │ │ • Datadog │
│ • Terraform │ │ • ArgoCD │ │ • PagerDuty │
│ • Kubernetes │ │ • Backstage │ │ • OpenCost │
│ • Networking │ │ • Templates │ │ • Lacework │
└────────────────┘ └─────────────────┘ └─────────────┘
Component Categories
1. Foundation Infrastructure
Purpose: Core platform services and infrastructure
| Component | Technology | Status | Owner |
|---|---|---|---|
| Cloud Provider | AWS | ✅ Production | DAI/SRE |
| Kubernetes | Amazon EKS | ✅ Production | DAI/SRE |
| Infrastructure as Code | Terraform | ✅ Production | DAI/SRE |
| Networking | AWS VPC | ✅ Production | DAI/SRE |
| Service Mesh | AWS VPC Lattice | 📋 Proposal | DAI/SRE |
Repositories:
- tx-pts-dai/terraform-aws-kubernetes-platform - K8s platform modules
- DND-IT/template-infra - Product infrastructure template
2. Developer Tools & CI/CD
Purpose: Code management, build, and deployment automation
| Component | Technology | Status | Owner |
|---|---|---|---|
| Version Control | GitHub | ✅ Production | All Teams |
| CI/CD Pipelines | GitHub Actions | ✅ Production | All Teams |
| GitOps Deployment | ArgoCD | ✅ Production | DAI/SRE |
| Reusable Workflows | github-workflows | ✅ Production | Platform Team |
| Helm Repository | helm-charts | ✅ Production | Platform Team |
| Release Management | Changelog + Actions | 🔄 In Progress | Platform Team |
Repositories:
- DND-IT/github-workflows - Reusable CI/CD workflows
- DND-IT/helm-charts - Centralized Helm charts
- DND-IT/fission-argocd - GitOps deployment manifests
GitOps Flow:
Code Push → GitHub Actions (Build/Test) → Container Registry
↓
Update Helm Chart → ArgoCD Syncs → Deploy to Kubernetes
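The "Update Helm Chart" step in the flow above can be sketched as a small function that a CI job might run before committing to the GitOps repository. This is a minimal illustration, not the platform's actual tooling: the `image.tag` values layout, the registry URL, and the function name `bump_image_tag` are assumptions.

```python
import copy


def bump_image_tag(values: dict, new_tag: str) -> dict:
    """Return a copy of the Helm values with image.tag set to new_tag.

    The input is left untouched so the caller can diff old and new values.
    """
    updated = copy.deepcopy(values)
    # Create the "image" section if the chart values do not have one yet.
    updated.setdefault("image", {})["tag"] = new_tag
    return updated


# Hypothetical values for a service chart.
values = {
    "image": {"repository": "registry.example.com/my-service", "tag": "1.4.0"},
    "replicaCount": 2,
}
new_values = bump_image_tag(values, "1.5.0")
print(new_values["image"]["tag"])  # 1.5.0
print(values["image"]["tag"])      # 1.4.0 (original is unchanged)
```

Once the changed values are committed, ArgoCD would pick up the difference on its next sync and roll out the new image.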
3. Developer Experience & Portal
Purpose: Self-service capabilities and developer productivity
| Component | Technology | Status | Owner |
|---|---|---|---|
| Documentation | MkDocs | ✅ Production | Platform Team |
| Project Templates | CookieCutter/Copier | 🔄 Q3 2025 | Platform Team |
| Developer Portal | Backstage | 📋 Proposal | Platform Team |
| Service Catalog | Backstage Catalog | 📋 Proposal | Platform Team |
| Gotthard AI Agent | Slack Bot (AI) | 📋 Proposal | Platform Team |
| Platform CLI | Custom CLI | 📋 Proposal | Platform Team |
Template Types:
- PRD Templates - Product requirements documentation
- Service Templates - Microservice scaffolding
- Infrastructure Templates - Terraform modules
- CI/CD Templates - GitHub Actions workflows
- Post-mortem Templates - Incident reviews
Repositories:
- DND-IT/fission - Platform documentation and API
4. Observability & Monitoring
Purpose: System visibility, metrics, logs, and alerting
| Component | Technology | Status | Owner |
|---|---|---|---|
| APM & Logging | Datadog | ✅ Production | All Teams |
| Metrics | Datadog | ✅ Production | All Teams |
| Dashboards | Datadog Dashboards | ✅ Production | All Teams |
| Health Checks | HealthChecks | ✅ Production | DAI/SRE |
| External Monitoring | TBD | 🔍 Spiking | DAI/SRE |
5. Incident Management & SRE
Purpose: Incident response, on-call, and reliability
| Component | Technology | Status | Owner |
|---|---|---|---|
| Alerting | Datadog Monitors | ✅ Production | DAI/SRE |
| On-Call Management | PagerDuty | ✅ Production | DAI/SRE |
| Incident Response | PagerDuty + Slack | ✅ Production | DAI/SRE |
| Incident Agent | AI-powered automation | 📋 Q1 2026 | Platform Team |
| Post-Mortems | Confluence | 🔍 Spiking | All Teams |
Integration Flow:
6. Security & Compliance
Purpose: Security scanning, secrets management, policy enforcement
| Component | Technology | Status | Owner |
|---|---|---|---|
| Container Security | Lacework | ✅ Production | DAI/SRE |
| Vulnerability Scanning | Lacework | ✅ Production | DAI/SRE |
| Secrets Management | AWS Secrets Manager | ✅ Production | DAI/SRE |
| Runtime Security | Lacework | ✅ Production | DAI/SRE |
| Compliance Monitoring | Lacework | ✅ Production | DAI/SRE |
| Policy Enforcement | OPA / Kyverno | 📋 Q1 2026 | DAI/SRE |
Security Workflow:
7. Cost Management (FinOps)
Purpose: Cost visibility, tracking, and optimization
| Component | Technology | Status | Owner |
|---|---|---|---|
| FinOps Reporting | Custom Dashboards | ✅ Production | EIS |
| K8s Cost Allocation | OpenCost | 📋 Q1 2026 | DAI/SRE |
| Cost Containers | OpenCost Labels | 📋 Q1 2026 | DAI/SRE |
| Workload Cost Visibility | Datadog + OpenCost | 📋 Q1 2026 | DAI/SRE |
| Optimization Tools | Lacework + OpenCost | 📋 Q1 2026 | DAI/SRE |
Cost Visibility:
2026 Goals:
- Cost reduction through consolidation (Q1)
- Automated right-sizing recommendations (Q2)
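As an illustration of what an automated right-sizing recommendation could compute, the sketch below derives a CPU request from observed usage samples. The approach (95th percentile plus a fixed headroom, rounded up to 10 millicores) and all names are assumptions for illustration, not a description of how OpenCost or Datadog produce recommendations.

```python
import math


def recommend_cpu_request(usage_millicores: list[int], headroom_percent: int = 20) -> int:
    """Suggest a CPU request in millicores from observed usage samples.

    Uses the nearest-rank 95th percentile so that short spikes do not
    drive the recommendation, then adds headroom and rounds up to 10m.
    """
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile: smallest sample with >= 95% of samples at or below it.
    rank = math.ceil(95 * len(ordered) / 100)
    p95 = ordered[rank - 1]
    with_headroom = p95 * (100 + headroom_percent)  # scaled by 100
    return math.ceil(with_headroom / 1000) * 10


# Hypothetical samples: steady usage around 100m with one 900m spike.
samples = [100] * 18 + [150, 900]
print(recommend_cpu_request(samples))  # 180
```

Comparing this value with the request currently set on a workload gives a simple signal for over-provisioning.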
8. AI & Automation Platform
Purpose: AI-powered platform automation and assistance
AI Infrastructure
| Component | Technology | Status | Owner |
|---|---|---|---|
| Agent Orchestration | Kagent (K8s native) | 🔄 Q2 2026 | Platform Team |
| Agent Engine | Google ADK + LangGraph | 🔄 Q2 2026 | Platform Team |
| LLM Integration | Anthropic Claude, OpenAI | 🔄 Q2 2025 | Platform Team |
| Vector Database | PGvector | 🔄 Q2 2025 | Platform Team |
| MCP Tools | GitHub, Terraform, K8s | 🔄 Q2 2025 | Platform Team |
Platform Agents
| Agent | Purpose | Status | Interfaces |
|---|---|---|---|
| Onboarding Agent | Automate service onboarding | 🔄 Q2 2025 | Slack, Backstage |
| Release Agent | Coordinate releases with validation | 📋 Q3 2025 | Slack, ArgoCD |
| Incident Agent | Automated incident response | 📋 Q3 2025 | Slack, PagerDuty |
| Infrastructure Agent | Generate Terraform from natural-language requests | 📋 Q3 2025 | Slack, GitHub |
| Documentation Agent | Keep docs synchronized | 📋 Q3 2025 | GitHub, Backstage |
| Gotthard | General platform assistance | 🔄 Q2 2025 | Slack |
Agent Architecture:
User (Slack) → Gotthard → Routes to Agent
↓
Kagent Controller
↓
Agent Pod (ADK/LangGraph)
↓
Execute Tools (MCP)
↓ ↓ ↓
GitHub ArgoCD K8s
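The first hop in the diagram above, where Gotthard routes a Slack message to a specialized agent, can be sketched as a simple keyword router. This is only an illustrative assumption: the agent identifiers, the keyword lists, and keyword matching itself are hypothetical, and a real implementation would more likely use LLM-based intent classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    agent: str                 # hypothetical agent identifier
    keywords: tuple[str, ...]  # phrases that select this agent


# Hypothetical routing table, loosely based on the Platform Agents table.
ROUTES = (
    Route("onboarding-agent", ("onboard", "new service")),
    Route("release-agent", ("release", "deploy", "rollback")),
    Route("incident-agent", ("incident", "outage", "alert")),
    Route("infrastructure-agent", ("terraform", "bucket", "database")),
)


def route_message(text: str, default: str = "gotthard") -> str:
    """Return the first agent whose keywords appear in the message.

    Falls back to the general assistant when nothing matches.
    """
    lowered = text.lower()
    for route in ROUTES:
        if any(keyword in lowered for keyword in route.keywords):
            return route.agent
    return default


print(route_message("Please release checkout v2"))  # release-agent
print(route_message("What is Fission?"))            # gotthard
```

Routes are checked in order, so the table order decides which agent wins when a message matches more than one.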
MCP (Model Context Protocol) Integrations:
- GitHub - Repository management
- Terraform - Infrastructure documentation
- Kubernetes - Cluster operations
- ArgoCD - Deployment management
- Datadog - Observability queries
- Jira - Issue tracking
- Kargo - Release promotions
- OpenCost - Cost queries
Repositories:
- DND-IT/fission/docs/platform/agents/ - Agent CRD definitions
- DND-IT/helm-charts/kagent/ - Kagent controller chart
- DND-IT/helm-charts/platform-agents/ - Agent deployment chart
- DND-IT/fission-argocd/agents/ - Agent GitOps manifests
9. Developer Resource Lifecycle Management (DRLM)
Purpose: Self-service resource provisioning and lifecycle management
| Component | Status | Description |
|---|---|---|
| Resource Provisioning | 📋 Q3 2025 | Self-service infra requests |
| Environment Management | 🔄 Q2 2025 | Dev/staging/prod environments |
| Ephemeral Environments | 📋 Q3 2025 | PR preview environments |
| Resource Cleanup | 📋 Q3 2025 | Automated resource lifecycle |
| Environment Parity | 📋 Q3 2025 | Dev/prod consistency |
| Dev Containers | 📋 Q3 2025 | Standardized local dev |
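One way the planned resource cleanup for ephemeral environments could work is a time-to-live (TTL) check. The sketch below is an assumption for illustration only: the `PreviewEnvironment` type, the TTL field, and the environment names are hypothetical, and the roadmap does not specify the cleanup mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PreviewEnvironment:
    name: str             # hypothetical naming scheme, e.g. one per pull request
    created_at: datetime
    ttl: timedelta        # how long the environment may live


def expired_environments(envs: list[PreviewEnvironment], now: datetime) -> list[str]:
    """Return the names of environments whose TTL has elapsed at `now`."""
    return [env.name for env in envs if now - env.created_at >= env.ttl]


now = datetime(2025, 10, 16, 12, 0, tzinfo=timezone.utc)
envs = [
    PreviewEnvironment("pr-101", datetime(2025, 10, 10, 9, 0, tzinfo=timezone.utc), timedelta(days=3)),
    PreviewEnvironment("pr-202", datetime(2025, 10, 16, 9, 0, tzinfo=timezone.utc), timedelta(days=3)),
]
print(expired_environments(envs, now))  # ['pr-101']
```

Passing `now` in explicitly keeps the check deterministic and easy to test; a scheduled job would call it with the current UTC time.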
Migration Timeline & Phases
Phase 1: Foundation ✅ Complete (Q1-Q2 2025)
Achievements:
- Product accounts migrated to infra stacks
- Standard Kubernetes Deployment patterns
- VPC and networking standardized
- CI/CD pipelines with GitHub Actions
- ArgoCD for GitOps
Phase 2: Migration & Integration ⚠️ In Progress (Q3-Q4 2025)
Current Focus:
- Finalizing the move of product teams to standard Kubernetes
- Migrating legacy products to central Kubernetes
- Template library implementation
Phase 3: 🔜 Planned (Q1-Q2 2026)
Goals:
- Complete workload migration (100%)
- Decommission legacy infrastructure
- Advanced automation via AI agents
- Complete self-service capabilities
Team Ownership
DAI Team (SRE)
Responsibilities:
- AWS cloud infrastructure management
- Terraform infrastructure as code
- Kubernetes cluster operations (EKS)
- Platform reliability and performance
- Security and compliance
- Cost optimization
Platform Team (Sudo)
Responsibilities:
- Developer experience tools
- AI agents and automation
- Platform documentation
- Reusable workflows and templates
- Platform API and integrations
- Developer onboarding and support
Product Teams
Responsibilities:
- Application development and deployment
- Service-specific configurations
- Team-specific documentation
- Cost awareness and optimization
- Security best practices adoption
Key Principles
1. Self-Service First
Developers provision resources and deploy services without waiting for the platform team.
2. Golden Paths
Opinionated, well-tested patterns for common tasks, delivered through templates and automation.
3. Security by Default
Built-in compliance, secure defaults, and automated scanning.
4. Observable Everything
Comprehensive monitoring, logging, and tracing for all services.
5. Cost Conscious
Visibility and accountability for infrastructure costs at all levels.
6. GitOps Native
All platform changes are managed through Git with ArgoCD and Terraform.
7. AI-Assisted
AI agents support automation, documentation, and developer assistance.
Success Metrics
Migration Progress
- Workload migration percentage
- Legacy infrastructure decommissioned
- Teams onboarded to platform
Platform Reliability
- Platform uptime (target: 99.9%)
- Mean time to recovery (MTTR)
- Incident frequency and severity
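To make the reliability metrics above concrete, the sketch below converts an uptime target into a downtime budget and computes MTTR from incident recovery times. The function names and the sample incident durations are hypothetical; only the 99.9% target comes from this roadmap.

```python
from datetime import timedelta


def downtime_budget_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of downtime allowed in the window for a given uptime target."""
    total_minutes = window_days * 24 * 60
    return round(total_minutes * (100 - target_percent) / 100, 2)


def mean_time_to_recovery(recoveries: list[timedelta]) -> timedelta:
    """Average time from incident start to recovery."""
    if not recoveries:
        raise ValueError("need at least one incident")
    return sum(recoveries, timedelta()) / len(recoveries)


# A 99.9% target leaves about 43 minutes of downtime per 30 days.
print(downtime_budget_minutes(99.9, 30))  # 43.2

# Hypothetical recovery times for three incidents.
incidents = [timedelta(minutes=30), timedelta(minutes=45), timedelta(minutes=15)]
print(mean_time_to_recovery(incidents))   # 0:30:00
```

Comparing the summed incident durations against the budget shows how much of the month's allowance has already been used.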
Developer Experience
- Onboarding time (developer to first deploy)
- Self-service adoption rate
- Developer satisfaction scores
Cost Efficiency
- Infrastructure cost reduction
- Cost per service/team visibility
- Right-sizing implementation rate
Automation & AI
- Agent invocation frequency
- Automation success rate
- Manual intervention reduction
Status Legend
| Symbol | Status | Description |
|---|---|---|
| 🔍 | Spiking | Researching feasibility |
| 📋 | Planned | Roadmap item, not started |
| 🔄 | In Progress | Active development/deployment |
| ⚠️ | Blocked | Requires dependencies or decisions |
| ✅ | Production | Deployed and operational |
Feedback & Communication
Slack Channels:
- #TBA - General platform discussion
- #it_pts_dai_monitoring - WIP channel
This roadmap is a living document. Updates are tracked in the fission repository.