alfredtm

GitOps Squared

March 23, 2026

I’ve been thinking about a problem with GitOps that I haven’t seen great answers to. I’m not sure I have one either, but I want to write the idea down.

GitOps works great, until other people show up

If you work on a platform team, you’re provisioning infrastructure for other people. The amount you manage grows every year. Your team does not. Add AI to the picture (agents spinning up resources, engineers shipping faster) and the gap gets even larger.

xychart-beta
    x-axis "Year" [2020, 2021, 2022, 2023, 2024, 2025, 2026]
    y-axis "Scale"
    line "Infrastructure (with AI)" [10, 18, 30, 48, 85, 150, 250]
    line "Infrastructure" [10, 18, 30, 48, 70, 100, 135]
    line "Platform team" [5, 6, 7, 8, 8, 9, 10]

ClickOps doesn’t scale here. So you adopt infrastructure as code (IaC) and GitOps: push manifests to Git, a sync tool like Argo CD picks them up, and your infrastructure converges to match.

Git is stored state, Kubernetes is desired state, and whatever gets provisioned is actual state.
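To make "converges to match" concrete, here is a minimal sketch of one reconciliation pass. It is not how Argo CD or any other sync tool is implemented; the `plan` and `reconcile` names and the dictionary-based state are assumptions made purely for illustration.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Work out what must change so that actual state matches desired state."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan to `actual` in place and return the steps that were taken."""
    steps = plan(desired, actual)
    for name in steps["create"] + steps["update"]:
        actual[name] = dict(desired[name])  # copy so the two states stay independent
    for name in steps["delete"]:
        del actual[name]
    return steps
```

Running `reconcile` a second time with the same inputs returns three empty lists. That idempotence is the property the whole GitOps model leans on.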

graph LR
    STORED[Stored State] -->|sync| DESIRED[Desired State] -->|provisions| S[State]
    style STORED fill:#fff3cd,stroke:#ffc107,color:#000
    style DESIRED fill:#d4edda,stroke:#28a745,color:#000
    style S fill:#f8d7da,stroke:#dc3545,color:#000

This works beautifully if your team is the only user. But platform teams rarely operate alone.

Some engineers love GitOps; others want to manage firewall rules through a portal. Some customers want a Terraform provider, others want a Kubernetes operator, and many want nothing to do with IaC.

So you build multiple interfaces: a UI, a CLI, a Kubernetes operator, maybe a Terraform provider. Now you have multiple paths into the system and no single source of truth.

graph LR
    U[User] --> GO[GitOps] --> GIT[Git] --> K8S[K8s] --> S1[Something]
    U --> CO[ClickOps] --> FE[Frontend] --> BE[Backend] --> S2[Something]
    U --> TF[Terraform] --> BE
    U --> CLI2[CLI] --> BE
    style GIT fill:#fff3cd,stroke:#ffc107,color:#000
    style K8S fill:#d4edda,stroke:#28a745,color:#000
    style S1 fill:#f8d7da,stroke:#dc3545,color:#000
    style S2 fill:#f8d7da,stroke:#dc3545,color:#000

Everything funnels through Git

The natural reaction is to force everything through Git. Instead of provisioning directly, every interface writes to a repository through a Git proxy. Git becomes the system of record, and the cluster reconciles from there.

graph LR
    GO[GitOps] --> GIT[Git]
    CO[ClickOps] --> FE[Frontend] --> BE[Backend] --> GP[Git Proxy] --> GIT
    TF[Terraform] --> BE
    CLI2[CLI] --> BE
    GIT --> K8S[K8s] --> S[Something]
    style GIT fill:#fff3cd,stroke:#ffc107,color:#000
    style K8S fill:#d4edda,stroke:#28a745,color:#000
    style S fill:#f8d7da,stroke:#dc3545,color:#000

Then the requirements keep expanding. External customers want their own repositories and pipelines, not to be forced into an existing GitOps flow that might not work for them. The platform team does not want external users in its repositories or its clusters. So a decision is made to move everything behind the Git proxy.

Eventually everything funnels through the same backend. Internal users, external users, GitOps systems, and AI agents all rely on the platform to create commits, resolve conflicts, and manage repository state.

graph LR
    CO[ClickOps] --> FE[Frontend] --> BE[Backend] --> GP[Git Proxy] --> GIT[Git] --> K8S[K8s] --> S[Something]
    EGO[Ext-GitOps] --> EG[Ext-Git] --> EK[Ext-K8s] --> BE
    IGO[GitOps] --> IG[Int-Git] --> IK[Int-K8s] --> BE
    AI[AI Agent] -->|API / MCP / CLI| BE
    style GIT fill:#fff3cd,stroke:#ffc107,color:#000
    style EG fill:#fff3cd,stroke:#ffc107,color:#000
    style IG fill:#fff3cd,stroke:#ffc107,color:#000
    style K8S fill:#d4edda,stroke:#28a745,color:#000
    style EK fill:#d4edda,stroke:#28a745,color:#000
    style IK fill:#d4edda,stroke:#28a745,color:#000
    style S fill:#f8d7da,stroke:#dc3545,color:#000
    style AI fill:#e8daef,stroke:#8e44ad,color:#000

At this point, the problem is clear. Git was designed for humans. Now the platform is using it as a high-concurrency machine interface. Commits, merges, and conflicts become operational concerns.
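A toy model shows why concurrency hurts. Assume a branch that only accepts fast-forward pushes, and a set of writers who all fetch the same head, push, and retry on rejection. The `Branch` class and `simulate` function below are hypothetical, and the lockstep racing is a deliberately pessimistic assumption. Real clients with jitter and backoff will do better.

```python
class Branch:
    """A branch whose head only moves by fast-forward, like a push without force."""

    def __init__(self) -> None:
        self.head = 0

    def push(self, parent: int) -> bool:
        if parent != self.head:
            return False  # rejected: the writer's commit is based on a stale head
        self.head += 1
        return True


def simulate(writers: int) -> int:
    """Count push attempts until every writer has landed exactly one commit."""
    branch = Branch()
    pending = writers
    attempts = 0
    while pending > 0:
        base = branch.head  # every pending writer fetches the same head
        landed = 0
        for _ in range(pending):
            attempts += 1
            if branch.push(base):
                landed += 1
        pending -= landed  # rejected writers must rebase and try again
    return attempts
```

In this model only one push lands per round, so `simulate(n)` grows as n(n+1)/2. Ten writers need 55 attempts and a hundred need 5050, before counting the cost of each rebase.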

The problem is not GitOps. The problem is using Git as the system boundary.

What if state were stored as artifacts instead?

Instead of Git, use something designed for automation: an OCI registry.

graph LR
    subgraph UserSpace
        TF[Terraform] --> OCI[OCI Proxy]
        OP[K8s Operator] --> OCI
        CLI[CLI] --> OCI
        MCP[MCP] --> OCI
        API[API] --> OCI
        UI[UI] --> OCI
    end
    OCI --> REG[OCI Registry]
    subgraph SystemSpace
        REG --> SYNC[Sync] --> SK[K8s] --> OP2[Operator] --> S[Something]
    end
    style REG fill:#fff3cd,stroke:#ffc107,color:#000
    style SK fill:#d4edda,stroke:#28a745,color:#000
    style S fill:#f8d7da,stroke:#dc3545,color:#000

On one side is UserSpace. The platform offers whatever interface makes sense: Terraform, Kubernetes operators, CLI, MCP, REST, or a UI. Teams choose the workflow that fits them.

The platform does not need to enforce a single way of working. You can still provide golden paths through a UI, a CLI, or curated Terraform modules. If teams choose to follow them, great. If they build their own workflows, that is fine too.

The system only cares about the intent that comes out of those interactions. The contract is the artifact, not the interface. In many cases, platforms should standardize outcomes, not how people get there.

For example:

POST /api/v1/resources
{
  "kind": "VirtualMachine",
  "name": "my-app-vm",
  "spec": {
    "size": "medium",
    "image": "ubuntu-24.04",
    "region": "eu-north-1"
  }
}

The OCI proxy converts that intent into a versioned artifact:

apiVersion: platform.example.com/v1
kind: VirtualMachine
metadata:
  name: my-app-vm
  namespace: team-alpha
spec:
  size: medium
  image: ubuntu-24.04
  region: eu-north-1

Stored as: registry.example.com/team-alpha/my-app-vm:v1
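The conversion itself could be as small as the sketch below. The `build_artifact` function and its return shape are assumptions for illustration, and the payload is serialized as canonical JSON rather than YAML so the example needs only the standard library. It computes a `sha256:` content digest in the form OCI registries use, but it does not actually push anything.

```python
import hashlib
import json

REGISTRY = "registry.example.com"        # host from the example above
API_VERSION = "platform.example.com/v1"  # apiVersion from the example above


def build_artifact(tenant: str, intent: dict, version: int) -> dict:
    """Turn an API request body into a manifest, a digest, and a reference."""
    manifest = {
        "apiVersion": API_VERSION,
        "kind": intent["kind"],
        "metadata": {"name": intent["name"], "namespace": tenant},
        "spec": intent["spec"],
    }
    # Sorted keys and fixed separators make the bytes, and so the digest, stable.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return {
        "reference": f"{REGISTRY}/{tenant}/{intent['name']}:v{version}",
        "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
        "manifest": manifest,
        "payload": payload,
    }
```

Because the digest is derived from canonical bytes, two requests that express the same intent with different key ordering produce the same artifact content.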

Users do not interact with the registry directly. They just declare intent. The proxy API handles everything in between: validation, tenancy, concurrency. These are normal API concerns, not new problems.
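As a rough illustration of those three concerns in one place, here is a sketch of the proxy's write path. Everything in it is hypothetical: the `IntentProxy` class, the allow-list of kinds, and the in-memory version counter stand in for whatever a real service would use.

```python
import threading

ALLOWED_KINDS = {"VirtualMachine"}  # hypothetical allow-list of resource kinds


class IntentProxy:
    """Validates intent, enforces tenancy, and hands out version tags safely."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions = {}  # (tenant, name) -> latest version number

    def submit(self, caller_tenant: str, target_tenant: str, intent: dict) -> str:
        # Tenancy: callers may only write into their own namespace.
        if caller_tenant != target_tenant:
            raise PermissionError(f"{caller_tenant} may not write to {target_tenant}")
        # Validation: reject anything the platform does not know how to build.
        if intent.get("kind") not in ALLOWED_KINDS:
            raise ValueError(f"unsupported kind: {intent.get('kind')!r}")
        if not intent.get("name"):
            raise ValueError("name is required")
        # Concurrency: allocate the next tag under a lock so no two writers collide.
        with self._lock:
            key = (target_tenant, intent["name"])
            version = self._versions.get(key, 0) + 1
            self._versions[key] = version
        return f"{target_tenant}/{intent['name']}:v{version}"
```

Concurrent writers never see a conflict here. Each one simply receives the next tag, which is the kind of behavior that is awkward to get from a shared Git branch.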

On the other side is SystemSpace. This is where GitOps still exists, but fully controlled by the platform. The registry is stored state, a cluster syncs from it as desired state, and operators reconcile the actual infrastructure. Teams that want to run their own GitOps can still do that on the UserSpace side. The platform just doesn’t require it.

There is a reconciliation loop on both sides of the boundary. User intent converges into artifacts. The system converges artifacts into real resources. That is GitOps squared: not “GitOps but more,” but two independent reconciliation loops with an artifact contract between them.
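The two loops can be sketched as two functions that share nothing except the registry. This is a simplification under stated assumptions: `publish` and `sync` are hypothetical names, the registry is a plain dictionary of version lists, and deletion is left out.

```python
def publish(registry: dict, name: str, spec: dict) -> int:
    """User-side loop: append a new version only when the intent has changed."""
    versions = registry.setdefault(name, [])
    if not versions or versions[-1] != spec:
        versions.append(dict(spec))
    return len(versions)  # current version number, starting at 1


def sync(registry: dict, resources: dict) -> list:
    """System-side loop: make each resource match its latest artifact."""
    changed = []
    for name, versions in registry.items():
        latest = versions[-1]
        if resources.get(name) != latest:
            resources[name] = dict(latest)
            changed.append(name)
    return changed
```

Neither function calls the other. `publish` can run from a UI, a Terraform provider, or an agent, and `sync` can be replaced without any of them noticing, as long as the registry format stays the same.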

UserSpace and SystemSpace evolve separately. You can swap sync tools or migrate clusters without touching user-facing interfaces, and add new interfaces without changing how the system reconciles infrastructure.

I don’t know if this works

This is a thought experiment. I have not built it at scale. I put together a proof of concept using a Go API server, a Zot OCI registry, and Flux to close the loop. But a POC is not production.

Plenty of organizations run GitOps at scale today. Queue-based commit systems, branch-per-tenant, or a mutex around writes can solve the concurrency problem without rearchitecting anything. If Git works for you, keep using it.
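For comparison, a queue-based commit system can be sketched in a few lines. The `CommitQueue` class below is hypothetical and keeps its "repository" as an in-memory list. The point is only that a single worker applies changes in order, so every commit is based on the current head and nothing is ever rejected.

```python
import queue
import threading


class CommitQueue:
    """Serializes writes: many submitters, one worker that creates commits."""

    def __init__(self) -> None:
        self.log = []  # list of (parent_index, change); parent is -1 for the first
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, change: str) -> None:
        self._q.put(change)  # safe to call from any thread

    def _run(self) -> None:
        while True:
            change = self._q.get()
            parent = len(self.log) - 1  # always the current head
            self.log.append((parent, change))
            self._q.task_done()

    def drain(self) -> None:
        """Block until every submitted change has been committed."""
        self._q.join()
```

The trade-off is that the queue becomes a component the platform has to run and monitor, and submitters learn the outcome asynchronously.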

What I like is the boundary. Nobody commits to someone else’s repository. The system does not care how intent was produced. Humans, pipelines, and AI agents all interact through the same contract. Whether that cleaner separation is worth the extra moving parts, I honestly do not know.

If you have tried something similar, or see reasons this would fail, I would love to hear about it.