Fission Platform Roadmap

Last Updated: 2025-10-16

Fission is Tamedia's unified Internal Developer Platform (IDP) that consolidates infrastructure, reduces complexity, and enables innovation through self-service developer capabilities. The platform provides standardized tooling, automated workflows, and AI-powered assistance for the complete software lifecycle.

Key Goals:

  • Consolidation: Move from isolated AWS accounts to shared platform infrastructure for legacy products
  • Standardization: Provide golden paths and templates for apps and services
  • Self-Service: Enable developers to deploy and operate services independently
  • Innovation: Leverage AI and automation to keep things running smoothly

Platform Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                    Developer Interfaces                         │
│              Slack • Backstage • CLI • IDE Extensions           │
└─────────────────────────────────┬───────────────────────────────┘
┌─────────────────────────────────┴───────────────────────────────┐
│                     Fission Platform Layer                      │
│  Self-Service • Automation • AI Agents • Golden Paths           │
└─────────────────────────────────┬───────────────────────────────┘
             ┌────────────────────┼──────────────────┐
             │                    │                  │
     ┌───────▼────────┐  ┌────────▼────────┐  ┌──────▼──────┐
     │   Foundation   │  │   Development   │  │  Operations │
     │                │  │                 │  │             │
     │ • AWS/EKS      │  │ • GitHub        │  │ • Datadog   │
     │ • Terraform    │  │ • ArgoCD        │  │ • PagerDuty │
     │ • Kubernetes   │  │ • Backstage     │  │ • OpenCost  │
     │ • Networking   │  │ • Templates     │  │ • Lacework  │
     └────────────────┘  └─────────────────┘  └─────────────┘

Component Categories

1. Foundation Infrastructure

Purpose: Core platform services and infrastructure

Component | Technology | Status | Owner
Cloud Provider | AWS | ✅ Production | DAI/SRE
Kubernetes | Amazon EKS | ✅ Production | DAI/SRE
Infrastructure as Code | Terraform | ✅ Production | DAI/SRE
Networking | AWS VPC | ✅ Production | DAI/SRE
Service Mesh | AWS VPC Lattice | 📋 Proposal | DAI/SRE

Repositories:

  • tx-pts-dai/terraform-aws-kubernetes-platform - K8s platform modules
  • DND-IT/template-infra - Product infrastructure template

2. Developer Tools & CI/CD

Purpose: Code management, build, and deployment automation

Component | Technology | Status | Owner
Version Control | GitHub | ✅ Production | All Teams
CI/CD Pipelines | GitHub Actions | ✅ Production | All Teams
GitOps Deployment | ArgoCD | ✅ Production | DAI/SRE
Reusable Workflows | github-workflows | ✅ Production | Platform Team
Helm Repository | helm-charts | ✅ Production | Platform Team
Release Management | Changelog + Actions | 🔄 In Progress | Platform Team

Repositories:

  • DND-IT/github-workflows - Reusable CI/CD workflows
  • DND-IT/helm-charts - Centralized Helm charts
  • DND-IT/fission-argocd - GitOps deployment manifests

GitOps Flow:

Code Push → GitHub Actions (Build/Test) → Container Registry →
Update Helm Chart → ArgoCD Syncs → Deploy to Kubernetes
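
The "Update Helm Chart" step in the flow above usually amounts to changing an image tag that ArgoCD then syncs. The following is a minimal sketch of that step, assuming a values layout with `image.repository` and `image.tag`; the layout, the function name `bump_image_tag`, and the registry URL are illustrative assumptions, not taken from the DND-IT/helm-charts repository.

```python
from copy import deepcopy


def bump_image_tag(values: dict, new_tag: str) -> dict:
    """Return a copy of a Helm values mapping with image.tag set to new_tag.

    The image.tag layout is an assumption; real charts may nest it differently.
    """
    updated = deepcopy(values)  # leave the caller's mapping untouched
    updated.setdefault("image", {})["tag"] = new_tag
    return updated


# Hypothetical values for a service; the registry URL is a placeholder.
values = {"image": {"repository": "registry.example.com/my-service", "tag": "1.4.0"}}
bumped = bump_image_tag(values, "1.5.0")
print(bumped["image"]["tag"])
```

In a GitOps setup the updated values would be committed to Git rather than applied directly, so that ArgoCD remains the only component that changes the cluster.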


3. Developer Experience & Portal

Purpose: Self-service capabilities and developer productivity

Component | Technology | Status | Owner
Documentation | MkDocs | ✅ Production | Platform Team
Project Templates | CookieCutter/Copier | 🔄 Q3 2025 | Platform Team
Developer Portal | Backstage | 📋 Proposal | Platform Team
Service Catalog | Backstage Catalog | 📋 Proposal | Platform Team
Gotthard AI Agent | Slack Bot (AI) | 📋 Proposal | Platform Team
Platform CLI | Custom CLI | 📋 Proposal | Platform Team

Template Types:

  • PRD Templates - Product requirements documentation
  • Service Templates - Microservice scaffolding
  • Infrastructure Templates - Terraform modules
  • CI/CD Templates - GitHub Actions workflows
  • Post-mortem Templates - Incident reviews

Repositories:

  • DND-IT/fission - Platform documentation and API

4. Observability & Monitoring

Purpose: System visibility, metrics, logs, and alerting

Component | Technology | Status | Owner
APM & Logging | Datadog | ✅ Production | All Teams
Metrics | Datadog | ✅ Production | All Teams
Dashboards | Datadog Dashboards | ✅ Production | All Teams
Health Checks | HealthChecks | ✅ Production | DAI/SRE
External Monitoring | ??? | 🔍 Spike | DAI/SRE

5. Incident Management & SRE

Purpose: Incident response, on-call, and reliability

Component | Technology | Status | Owner
Alerting | Datadog Monitors | ✅ Production | DAI/SRE
On-Call Management | PagerDuty | ✅ Production | DAI/SRE
Incident Response | PagerDuty + Slack | ✅ Production | DAI/SRE
Incident Agent | AI-powered automation | 📋 Q1 2026 | Platform Team
Post-Mortems | Confluence | 🔍 Spike | All Teams

Integration Flow:

Alert → Datadog → PagerDuty → On-Call Engineer
                      ↓
                  Slack Channel
                      ↓
            Incident Agent (Future)
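
The routing in the flow above can be sketched as a small decision function. This is an illustration only: the severity levels, the `Alert` type, and the rule that only critical alerts page the on-call engineer are assumptions, not the platform's actual Datadog or PagerDuty configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    severity: str  # hypothetical levels: "critical", "warning", "info"


def route_alert(alert: Alert) -> list[str]:
    """Pick notification targets for an alert (illustrative rules only)."""
    targets = ["slack"]  # assumption: every alert is posted to a Slack channel
    if alert.severity == "critical":
        targets.insert(0, "pagerduty")  # assumption: only critical alerts page on-call
    return targets


print(route_alert(Alert(service="checkout", severity="critical")))
print(route_alert(Alert(service="checkout", severity="warning")))
```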


6. Security & Compliance

Purpose: Security scanning, secrets management, policy enforcement

Component | Technology | Status | Owner
Container Security | Lacework | ✅ Production | DAI/SRE
Vulnerability Scanning | Lacework | ✅ Production | DAI/SRE
Secrets Management | AWS Secrets Manager | ✅ Production | DAI/SRE
Runtime Security | Lacework | ✅ Production | DAI/SRE
Compliance Monitoring | Lacework | ✅ Production | DAI/SRE
Policy Enforcement | OPA / Kyverno | 📋 Q1 2026 | DAI/SRE

Security Workflow:

Code Push → Container Build → Lacework Scan → Vulnerability Report →
Block/Warn/Pass → Deploy
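
The Block/Warn/Pass decision in the workflow above can be sketched as a function over vulnerability counts. The severity names and thresholds below are hypothetical and do not reflect Lacework's report format or any policy the platform actually enforces.

```python
def gate_decision(findings: dict[str, int]) -> str:
    """Map vulnerability counts by severity to 'block', 'warn', or 'pass'.

    Thresholds are illustrative: any critical finding blocks the deploy,
    any high finding produces a warning, everything else passes.
    """
    if findings.get("critical", 0) > 0:
        return "block"
    if findings.get("high", 0) > 0:
        return "warn"
    return "pass"


print(gate_decision({"critical": 1, "high": 3}))
print(gate_decision({"high": 2, "medium": 5}))
print(gate_decision({"low": 4}))
```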


7. Cost Management (FinOps)

Purpose: Cost visibility, tracking, and optimization

Component | Technology | Status | Owner
FinOps Reporting | Custom Dashboards | ✅ Production | EIS
K8s Cost Allocation | OpenCost | 📋 Q1 2026 | DAI/SRE
Cost Containers | OpenCost Labels | 📋 Q1 2026 | DAI/SRE
Workload Cost Visibility | Datadog + OpenCost | 📋 Q1 2026 | DAI/SRE
Optimization Tools | Lacework + OpenCost | 📋 Q1 2026 | DAI/SRE

Cost Visibility:

Kubernetes Resources → OpenCost → Cost Allocation by:
                                   • Team
                                   • Service
                                   • Environment
                                   • Namespace
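
Allocation by the four dimensions above is essentially a group-by over per-workload cost records. The sketch below assumes each record carries `team`, `service`, `environment`, and `namespace` keys plus a `cost` value; the record shape and sample figures are invented for illustration and are not OpenCost's API response format.

```python
from collections import defaultdict


def allocate_costs(records: list[dict], dimension: str) -> dict[str, float]:
    """Sum the cost of each record under the value it has for one dimension."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        # Records without the label are grouped under "unallocated".
        totals[record.get(dimension, "unallocated")] += record["cost"]
    return dict(totals)


# Invented sample data; costs are in an arbitrary currency unit.
records = [
    {"team": "news", "service": "api", "environment": "prod", "namespace": "news-prod", "cost": 12.5},
    {"team": "news", "service": "web", "environment": "prod", "namespace": "news-prod", "cost": 7.5},
    {"team": "sports", "service": "api", "environment": "dev", "namespace": "sports-dev", "cost": 3.0},
]
by_team = allocate_costs(records, "team")
by_environment = allocate_costs(records, "environment")
print(by_team)
print(by_environment)
```

Grouping unlabeled workloads under a single bucket makes missing labels visible in the report instead of silently dropping their cost.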

2026 Goals:

  • Cost reduction through consolidation (Q1)
  • Automated right-sizing recommendations (Q2)

8. AI & Automation Platform

Purpose: AI-powered platform automation and assistance

AI Infrastructure

Component | Technology | Status | Owner
Agent Orchestration | Kagent (K8s native) | 🔄 Q2 2026 | Platform Team
Agent Engine | Google ADK + LangGraph | 🔄 Q2 2026 | Platform Team
LLM Integration | Anthropic Claude, OpenAI | 🔄 Q2 2025 | Platform Team
Vector Database | PGvector | 🔄 Q2 2025 | Platform Team
MCP Tools | GitHub, Terraform, K8s | 🔄 Q2 2025 | Platform Team

Platform Agents

Agent | Purpose | Status | Interfaces
Onboarding Agent | Automate service onboarding | 🔄 Q2 2025 | Slack, Backstage
Release Agent | Coordinate releases with validation | 📋 Q3 2025 | Slack, ArgoCD
Incident Agent | Automated incident response | 📋 Q3 2025 | Slack, PagerDuty
Infrastructure Agent | Generate Terraform from natural-language requests | 📋 Q3 2025 | Slack, GitHub
Documentation Agent | Keep docs synchronized | 📋 Q3 2025 | GitHub, Backstage
Gotthard | General platform assistance | 🔄 Q2 2025 | Slack

Agent Architecture:

User (Slack) → Gotthard → Routes to Agent
                              ↓
                       Kagent Controller
                              ↓
                       Agent Pod (ADK/LangGraph)
                              ↓
                       Execute Tools (MCP)
                            ↓    ↓    ↓
                       GitHub  ArgoCD  K8s
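
The "Routes to Agent" step above can be illustrated with a simple keyword router. This is a deliberately naive sketch: the keyword table, the agent identifiers, and the fallback to Gotthard are assumptions, and a real router would more likely rely on an LLM or an intent classifier than on substring matching.

```python
# Hypothetical keyword-to-agent table; identifiers loosely follow the Platform Agents list.
ROUTES = {
    "onboard": "onboarding-agent",
    "release": "release-agent",
    "incident": "incident-agent",
    "terraform": "infrastructure-agent",
    "docs": "documentation-agent",
}


def route_request(message: str) -> str:
    """Return the agent for the first matching keyword, else fall back to Gotthard."""
    text = message.lower()
    for keyword, agent in ROUTES.items():
        if keyword in text:
            return agent
    return "gotthard"  # assumption: Gotthard answers anything it cannot route


print(route_request("Please release checkout v2 to prod"))
print(route_request("How do I get started?"))
```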

MCP (Model Context Protocol) Integrations:

  • GitHub - Repository management
  • Terraform - Infrastructure documentation
  • Kubernetes - Cluster operations
  • ArgoCD - Deployment management
  • Datadog - Observability queries
  • Jira - Issue tracking
  • Kargo - Release promotions
  • OpenCost - Cost queries

Repositories:

  • DND-IT/fission/docs/platform/agents/ - Agent CRD definitions
  • DND-IT/helm-charts/kagent/ - Kagent controller chart
  • DND-IT/helm-charts/platform-agents/ - Agent deployment chart
  • DND-IT/fission-argocd/agents/ - Agent GitOps manifests

9. Developer Resource Lifecycle Management (DRLM)

Purpose: Self-service resource provisioning and lifecycle management

Component | Status | Description
Resource Provisioning | 📋 Q3 2025 | Self-service infra requests
Environment Management | 🔄 Q2 2025 | Dev/staging/prod environments
Ephemeral Environments | 📋 Q3 2025 | PR preview environments
Resource Cleanup | 📋 Q3 2025 | Automated resource lifecycle
Environment Parity | 📋 Q3 2025 | Dev/prod consistency
Dev Containers | 📋 Q3 2025 | Standardized local dev
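
Automated cleanup of ephemeral environments is often driven by a time-to-live (TTL). The sketch below selects environments older than a given TTL; the record shape, the environment names, and the three-day TTL are hypothetical, since the roadmap does not specify a cleanup policy.

```python
from datetime import datetime, timedelta, timezone


def expired_environments(environments: list[dict], now: datetime, ttl: timedelta) -> list[str]:
    """Return names of environments whose age exceeds ttl (illustrative policy)."""
    return [env["name"] for env in environments if now - env["created_at"] > ttl]


# Fixed reference time so the example is reproducible.
now = datetime(2025, 10, 16, 12, 0, tzinfo=timezone.utc)
environments = [
    {"name": "pr-101", "created_at": now - timedelta(days=5)},
    {"name": "pr-102", "created_at": now - timedelta(hours=6)},
]
stale = expired_environments(environments, now, ttl=timedelta(days=3))
print(stale)
```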

Migration Timeline & Phases

Phase 1: Foundation ✅ Complete (Q1-Q2 2025)

Achievements:

  • Product accounts migrated to infra stacks
  • Standard Kubernetes Deployment patterns
  • VPC and networking standardized
  • CI/CD pipelines with GitHub Actions
  • ArgoCD for GitOps

Phase 2: Migration & Integration 🔄 In Progress (Q3-Q4 2025)

Current Focus:

  • Finalizing the move of product teams to standard Kubernetes
  • Migrating legacy products to central Kubernetes
  • Template library implementation

Phase 3: 🔜 Planned (Q1-Q2 2026)

Goals:

  • Complete workload migration (100%)
  • Decommission legacy infrastructure
  • Advanced automation via AI agents
  • Complete self-service capabilities

Team Ownership

DAI Team (SRE)

Responsibilities:

  • AWS cloud infrastructure management
  • Terraform infrastructure as code
  • Kubernetes cluster operations (EKS)
  • Platform reliability and performance
  • Security and compliance
  • Cost optimization

Platform Team (Sudo)

Responsibilities:

  • Developer experience tools
  • AI agents and automation
  • Platform documentation
  • Reusable workflows and templates
  • Platform API and integrations
  • Developer onboarding and support

Product Teams

Responsibilities:

  • Application development and deployment
  • Service-specific configurations
  • Team-specific documentation
  • Cost awareness and optimization
  • Security best practices adoption

Key Principles

1. Self-Service First

Developers provision resources and deploy services without waiting for the platform team

2. Golden Paths

Opinionated, well-tested patterns for common tasks via templates and automation

3. Security by Default

Built-in compliance, secure defaults, automated scanning

4. Observable Everything

Comprehensive monitoring, logging, tracing for all services

5. Cost Conscious

Visibility and accountability for infrastructure costs at all levels

6. GitOps Native

All platform changes managed through Git with ArgoCD and Terraform

7. AI-Assisted

Leverage AI agents for automation, documentation, and developer assistance


Success Metrics

Migration Progress

  • Workload migration percentage
  • Legacy infrastructure decommissioned
  • Teams onboarded to platform

Platform Reliability

  • Platform uptime (target: 99.9%)
  • Mean time to recovery (MTTR)
  • Incident frequency and severity

Developer Experience

  • Onboarding time (developer to first deploy)
  • Self-service adoption rate
  • Developer satisfaction scores

Cost Efficiency

  • Infrastructure cost reduction
  • Cost per service/team visibility
  • Right-sizing implementation rate

Automation & AI

  • Agent invocation frequency
  • Automation success rate
  • Manual intervention reduction

Status Legend

Symbol | Status | Description
🔍 | Spiking | Researching feasibility
📋 | Planned | Roadmap item, not started
🔄 | In Progress | Active development/deployment
⚠️ | Blocked | Requires dependencies or decisions
✅ | Production | Deployed and operational

Feedback & Communication

Slack Channels:


This roadmap is a living document. Updates are tracked in the fission repository.