Kubernetes Deployment

Cliq (Agent Teams for Cursor CLI) can run on Kubernetes for horizontally scaled production deployments. The official Helm chart under helm/cliq/ packages the server workload, shared workspace storage, RBAC for pipeline Jobs, optional ingress, and HPA-friendly defaults.

Prerequisites

  • Kubernetes 1.26 or newer
  • Helm 3.x
  • PostgreSQL — reachable from the cluster (managed or in-cluster); Cliq stores task and A2A state there
  • ReadWriteMany persistent storage — for the workspace PVC (for example AWS EFS via the EFS CSI driver, or NFS on premises). Server pods and pipeline Job pods must mount the same volume concurrently
  • Container registry access — pull images from GitHub Container Registry (ghcr.io). Defaults: ghcr.io/elanamir/cliq-server and ghcr.io/elanamir/cliq-runner. Use imagePullSecrets on the service accounts if your cluster requires authenticated pulls

Architecture

  • Server pods expose the HTTP API (including A2A), handle incoming requests, and create Kubernetes Jobs to run pipelines. The server image (cliq-server) includes both the Docker CLI and kubectl for managing Job lifecycles.
  • Job pods use the runner image (cliq-runner) and execute cliq run --headless --force --no-docker. The --no-docker flag forces the local executor so pipelines run directly inside the Job pod without attempting nested K8s or Docker scheduling.
  • PostgreSQL holds durable state so server replicas stay stateless relative to task and coordination data.

Both server and Job pods use an init container that copies settings.json from a read-only ConfigMap into a writable emptyDir volume mounted as CLIQ_HOME. This allows the main container to write teams, database files, and other state without hitting a read-only filesystem.
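Expressed as a pod-spec fragment, the pattern looks roughly like this (container names, the busybox image, and mount paths are illustrative, not copied from the chart's rendered manifest):

```yaml
# Sketch of the init-container pattern described above; names and paths
# are placeholders, not the chart's actual manifest.
volumes:
  - name: cliq-config          # read-only ConfigMap holding settings.json
    configMap:
      name: cliq-settings
  - name: cliq-home            # writable scratch space mounted as CLIQ_HOME
    emptyDir: {}
initContainers:
  - name: copy-settings
    image: busybox
    command: ["sh", "-c", "cp /config/settings.json /cliq-home/settings.json"]
    volumeMounts:
      - name: cliq-config
        mountPath: /config
      - name: cliq-home
        mountPath: /cliq-home
containers:
  - name: server
    image: ghcr.io/elanamir/cliq-server
    env:
      - name: CLIQ_HOME
        value: /cliq-home      # main container writes teams and state here
    volumeMounts:
      - name: cliq-home
        mountPath: /cliq-home
```

Because the copy happens into an emptyDir, the writable state is per-pod and ephemeral; durable state belongs on the workspace PVC and in PostgreSQL.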

Treat the PVC and database as the two persistence contracts for a correct deployment.

Install the chart from the repository root (adjust namespace and values for your environment):

```sh
helm install cliq ./helm/cliq \
  --namespace cliq --create-namespace \
  --set database.url="postgres://..." \
  --set secrets.cursorApiKey="crsr_..." \
  --set ingress.enabled=false
```

Expose the service locally and confirm the A2A agent card responds:

```sh
kubectl port-forward -n cliq svc/cliq-server 4100:4100
```

```sh
curl -sS http://127.0.0.1:4100/.well-known/agent-card.json | head
```

You should see JSON describing the agent card. For full production access, use an Ingress or in-cluster clients instead of port-forward.

For every Helm value (replicas, images, pipeline.*, autoscaling, ingress, resources, service accounts), see helm/cliq/README.md.

Cliq’s Helm chart provisions a PersistentVolumeClaim for workspaces. The backing StorageClass must support ReadWriteMany so multiple server replicas and many concurrent Jobs can use the same claim.

AWS (EFS)
Install the AWS EFS CSI driver, create an EFS file system and mount targets, then define a StorageClass that uses the EFS CSI provisioner. Set kubernetes.storageClass in the chart values to that class.
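Such a StorageClass might look like the following; the class name is arbitrary and the fileSystemId is a placeholder for your own EFS file system:

```yaml
# Example StorageClass for the AWS EFS CSI driver; fileSystemId and the
# class name are placeholders for your environment.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-rwx
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap          # dynamic provisioning via EFS access points
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
```

You would then install the chart with kubernetes.storageClass set to efs-rwx.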

On-premises (NFS)
Use an NFS-backed StorageClass, or a pre-provisioned PV/PVC pair with accessModes: ReadWriteMany, and point the chart either at the existing claim or at a matching storage class and size, as documented in the README.
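A pre-provisioned pair could be sketched like this; the server address, export path, names, and size are placeholders:

```yaml
# Example static NFS PV/PVC pair with ReadWriteMany; all values are
# placeholders for your environment.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cliq-workspace-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal
    path: /exports/cliq
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cliq-workspace
  namespace: cliq
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""              # bind to the pre-provisioned PV, not a class
  volumeName: cliq-workspace-pv
  resources:
    requests:
      storage: 50Gi
```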

Local testing (kind / minikube)
For single-node test clusters that don’t support ReadWriteMany, set kubernetes.storageAccessMode=ReadWriteOnce.

If PVCs stay Pending, verify the storage class, provisioner health, and that the requested access mode is supported.

External managed PostgreSQL (RDS, Cloud SQL, Azure Database, etc.) is the usual production choice. Create a database and user, allow network access from the cluster (security groups, authorized networks, or private connectivity), then set:

```yaml
database:
  url: "postgres://USER:PASSWORD@HOST:5432/DATABASE"
```

Pass the same string with --set database.url=... or through a values file. If your provider requires TLS, append the connection parameters your driver supports (for example sslmode=require for PostgreSQL).
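A values file carrying the same setting might look like this (the sslmode parameter is an example for drivers that accept libpq-style options):

```yaml
# values-db.yaml — pass with: helm upgrade --install cliq ./helm/cliq -f values-db.yaml
database:
  url: "postgres://USER:PASSWORD@HOST:5432/DATABASE?sslmode=require"
```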

In-cluster PostgreSQL
You can run Postgres with a chart such as Bitnami’s, then point database.url at the resulting service DNS name:

```sh
helm install cliq-db oci://registry-1.docker.io/bitnamicharts/postgresql \
  --namespace cliq \
  --set auth.database=cliq \
  --set auth.username=cliq \
  --set auth.password=your-secure-password
```

Construct database.url from the release notes Bitnami prints (service name and port, typically 5432). For production, pin versions, enable backups, and prefer managed databases when possible.
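With the release above, the in-cluster service is conventionally named RELEASE-postgresql, so the resulting value might look like the following (verify the exact service name against the notes the chart prints):

```yaml
# Assumes the Bitnami release "cliq-db" in namespace "cliq" from the
# command above; confirm the service name in the chart's release notes.
database:
  url: "postgres://cliq:your-secure-password@cliq-db-postgresql.cliq.svc.cluster.local:5432/cliq"
```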

Server pods — The chart can install a HorizontalPodAutoscaler when autoscaling.enabled is true (default). Scaling is CPU-based (targetCPUUtilizationPercentage). Tune minReplicas, maxReplicas, and server resources so the HPA has a meaningful signal.
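A sketch of the relevant values follows. The autoscaling key names come from the text above; the resources nesting is an assumption, so check helm/cliq/README.md for the authoritative layout:

```yaml
# Illustrative scaling values; verify key layout against the chart README.
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
resources:
  requests:
    cpu: 500m          # CPU requests give the HPA a meaningful utilization baseline
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```

Without CPU requests on the server container, CPU-based utilization has no denominator and the HPA cannot scale sensibly.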

Nodes and Jobs — Pipeline work runs as Jobs. Under load, the cluster needs enough CPU and memory for both server pods and Job pods. Use Cluster Autoscaler, Karpenter, or your cloud’s node autoscaler so new nodes appear when Jobs cannot schedule.

Concurrency cap — pipeline.maxConcurrent limits how many pipeline Jobs Cliq will run at once, which protects the cluster and the shared workspace from unbounded parallelism. Align it with node capacity and the per-Job pipeline.memory / pipeline.cpus settings.
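For example (values illustrative; key names taken from the text above):

```yaml
# With these settings, peak pipeline demand is 4 Jobs × 1 CPU × 2Gi,
# which must fit on your nodes alongside the server pods.
pipeline:
  maxConcurrent: 4     # at most 4 pipeline Jobs run concurrently
  cpus: "1"            # per-Job CPU
  memory: 2Gi          # per-Job memory
```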

Ingress with TLS — Set ingress.enabled=true, ingress.className (for example nginx or your AWS Load Balancer Controller class), and ingress.host. Enable ingress.tls for TLS on the chart’s Ingress, or terminate TLS at the load balancer using ingress.annotations appropriate to your controller (ALB, GCE, and so on).

cert-manager — For automatic certificates, install cert-manager, create an Issuer or ClusterIssuer, and add the usual annotations on the Ingress so a Certificate is issued for ingress.host. Exact annotations depend on your issuer type (ACME HTTP-01, DNS-01).

A2A and public URL — If push or mesh features need a stable external URL, set a2a.publicUrl in values to the HTTPS base URL clients use to reach the service.
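Putting the three pieces together, a values fragment might look like this. The key nesting is an assumption to be checked against the chart README; cert-manager.io/cluster-issuer is the standard cert-manager annotation and presumes a ClusterIssuer named letsencrypt-prod exists:

```yaml
# Illustrative ingress + A2A values; host, issuer, and key layout are
# assumptions for your environment.
ingress:
  enabled: true
  className: nginx
  host: cliq.example.com
  tls: true
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
a2a:
  publicUrl: "https://cliq.example.com"
```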

HTTP probes — Server containers expose:

  • /healthz — liveness
  • /readyz — readiness (checks database connectivity)

Kubernetes uses these automatically from the chart’s Deployment. Failed readiness removes the pod from Service endpoints; failed liveness restarts the container.
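The probe wiring corresponds to something like the following (timings and thresholds are illustrative; the port matches the 4100 used elsewhere in this guide):

```yaml
# Sketch of the probe configuration the chart wires up; exact timings
# may differ in the rendered Deployment.
livenessProbe:
  httpGet:
    path: /healthz
    port: 4100
  periodSeconds: 10
  failureThreshold: 3      # three consecutive failures restart the container
readinessProbe:
  httpGet:
    path: /readyz          # also exercises database connectivity
    port: 4100
  periodSeconds: 5
```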

Pipeline Jobs — Watch Job lifecycle in the release namespace (replace cliq if you used another name):

```sh
kubectl get jobs -n cliq -w
```

Combine with kubectl describe job, kubectl logs, and server logs for end-to-end pipeline debugging.

The chart creates a Role granting the server ServiceAccount the following permissions:

| API Group | Resources | Verbs |
| --- | --- | --- |
| batch | jobs | create, get, list, watch, delete |
| "" (core) | pods, pods/log | get, list, watch |
| "" (core) | pods/exec | create |
| "" (core) | namespaces, persistentvolumeclaims, secrets, serviceaccounts | get |

The get permissions on core resources are used during server startup to validate that required K8s resources (namespace, PVC, secret, service account) exist before accepting traffic.
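Written out as a Role, the table is equivalent to the rules below (the Role name and namespace are chosen by the chart; they are shown here only to make the permissions concrete):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cliq-server      # illustrative; the chart picks the actual name
  namespace: cliq
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["namespaces", "persistentvolumeclaims", "secrets", "serviceaccounts"]
    verbs: ["get"]
```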

The Kubernetes chart deploys the cliq server and pipeline Jobs. If your teams use human gates, the HUG server must be deployed separately — either as its own Deployment in the cluster or on a standalone host. It needs PostgreSQL and network reachability from both cliq instances and human reviewers’ browsers. See HUG Server for details.

| Symptom | What to check |
| --- | --- |
| Job stuck Pending | kubectl describe pod for the Job's pod: insufficient CPU/memory on nodes, pod affinity, or PVC not bound (storage class / RWX). Ensure the workspace PVC is Bound and schedulable volumes exist. |
| Server cannot reach Kubernetes API or create Jobs | RBAC: the server ServiceAccount must be bound to the chart's Role. Confirm the pod uses the expected service account and namespace. Verify kubectl is available in the server image. |
| Pipeline fails immediately | CURSOR_API_KEY: verify the chart secret (from secrets.cursorApiKey) and that the key is valid. Check server and Job pod env and logs for auth errors. |
| Job pod stuck pulling image | For local test clusters (kind), load images with kind load docker-image. For production, verify registry credentials and imagePullSecrets. Job pods use imagePullPolicy: IfNotPresent. |
| Multi-replica issues with files | Confirm the workspace volume is ReadWriteMany and shared, not RWO per replica. |

For chart parameters and defaults, see helm/cliq/README.md.