Deployment and Infrastructure

Kubernetes deployment, autoscaling, health probes, zone-aware topology, multi-region failover, and manifest generation in Aether.

Aether runs world-server pods on Kubernetes with custom metric-based scaling, zone-aware scheduling, and multi-region routing. The aether-deploy crate generates deterministic deployment manifests, scaling configurations, and topology rules as plain YAML for consumption by kubectl apply or GitOps pipelines.

Key Concepts

  • Manifest generation -- Deterministic K8s YAML from Rust structs, with no hand-editing required.
  • Player-count scaling -- Custom HPA metrics based on player count rather than CPU utilization.
  • Zone-aware scheduling -- Pod affinity rules co-locate players in the same availability zone.
  • Health probes -- Liveness and readiness probes that distinguish "process alive" from "ready for players."
  • Multi-region routing -- Directs players to the closest healthy region with automatic failover.
  • Persistent storage -- PVC management for Write-Ahead Log (WAL) survival across pod restarts.

Architecture

The deployment system generates configuration artifacts consumed by CI/CD and Kubernetes:

DeploymentConfig (Rust struct)
       |
       v
  Manifest Generator
       |
  +----+--------+--------+--------+
  |    |        |        |        |
  v    v        v        v        v
YAML  Scaling  Topology Probes  Region
Manifest Rules  Rules   Config  Routing

All modules produce in-memory structs that serialize to YAML via serde and serde_yaml. There is no Kubernetes client SDK dependency -- manifests are generated as plain YAML strings.

Manifest Generation

The manifest generator produces complete Kubernetes resource definitions:

use aether_deploy::{DeploymentConfig, WorkloadKind, ResourceRequirements, PvcConfig};

let config = DeploymentConfig {
    name: "world-server".to_string(),
    namespace: "aether-prod".to_string(),
    image: "aether/world-server:v1.2.0".to_string(),
    replicas: 3,
    kind: WorkloadKind::StatefulSet, // durable worlds with WAL
    resources: ResourceRequirements {
        cpu_request: "500m".to_string(),
        cpu_limit: "2000m".to_string(),
        memory_request: "1Gi".to_string(),
        memory_limit: "4Gi".to_string(),
    },
    pvc: Some(PvcConfig {
        storage_class: "ssd".to_string(),
        size: "10Gi".to_string(),
        mount_path: "/data/wal".to_string(),
    }),
    // ... probes, topology, scaling config
};

let yaml: String = config.render_yaml();
// Produces a complete K8s StatefulSet manifest
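To make the output concrete, here is an illustrative, abbreviated excerpt of the kind of StatefulSet manifest render_yaml() would emit for the config above (the actual output also includes selectors, labels, the service name, and probe stanzas):

```yaml
# Illustrative excerpt only -- field values mirror the DeploymentConfig above.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: world-server
  namespace: aether-prod
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: world-server
          image: aether/world-server:v1.2.0
          resources:
            requests: { cpu: 500m, memory: 1Gi }
            limits: { cpu: 2000m, memory: 4Gi }
          volumeMounts:
            - name: wal
              mountPath: /data/wal
  volumeClaimTemplates:
    - metadata:
        name: wal
      spec:
        storageClassName: ssd
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 10Gi
```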

Workload kind selection:

Kind         When to Use                              Storage
-----------  ---------------------------------------  ---------------------
StatefulSet  Durable worlds needing WAL persistence   PVC attached
Deployment   Stateless gateway / matchmaking pods     No persistent storage

When pvc is Some, the generator produces a StatefulSet with a volumeClaimTemplate. When pvc is None, a Deployment is generated instead.
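The Some/None rule above is simple enough to sketch in a few lines of standalone Rust. The names here are illustrative, not the crate's actual API:

```rust
// Sketch of the workload-kind rule: a PVC config implies a StatefulSet
// (with a volumeClaimTemplate); no PVC implies a stateless Deployment.
#[derive(Debug, PartialEq)]
enum WorkloadKind {
    StatefulSet,
    Deployment,
}

struct PvcConfig {
    storage_class: String,
    size: String,
}

fn select_workload_kind(pvc: &Option<PvcConfig>) -> WorkloadKind {
    match pvc {
        Some(_) => WorkloadKind::StatefulSet,
        None => WorkloadKind::Deployment,
    }
}

fn main() {
    let pvc = Some(PvcConfig {
        storage_class: "ssd".into(),
        size: "10Gi".into(),
    });
    assert_eq!(select_workload_kind(&pvc), WorkloadKind::StatefulSet);
    assert_eq!(select_workload_kind(&None), WorkloadKind::Deployment);
}
```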

Player-Count Autoscaling

CPU utilization is a poor proxy for VR world capacity. A world with 100 idle players uses little CPU but still needs memory and bandwidth headroom. Aether uses player-count-based scaling:

use aether_deploy::{ScalingConfig, ScalingDecision};

let config = ScalingConfig {
    target_players_per_pod: 50,
    min_replicas: 2,
    max_replicas: 20,
    scale_up_cooldown_secs: 60,
    scale_down_cooldown_secs: 300,
};

let decision: ScalingDecision = config.compute_desired_replicas(
    current_players,   // total active players
    current_replicas,  // current pod count
);

match decision {
    ScalingDecision::ScaleUp { target_replicas } => {
        // Scale up to accommodate more players
    }
    ScalingDecision::ScaleDown { target_replicas } => {
        // Scale down to save resources
    }
    ScalingDecision::Hold => {
        // No change needed, or cooldown still active
    }
}

The scaling algorithm:

  1. Compute the load ratio: current_players / current_replicas.
  2. If load ratio > target_players_per_pod and cooldown has elapsed: scale up to ceil(current_players / target_players_per_pod).
  3. If load ratio < target_players_per_pod * 0.5 and cooldown has elapsed: scale down to max(ceil(current_players / target_players_per_pod), min_replicas).
  4. Otherwise: hold.
  5. All results are clamped to [min_replicas, max_replicas].

Scale-down uses a longer cooldown (default 5 minutes) than scale-up (default 1 minute) to avoid flapping.
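The steps above can be sketched as a standalone function (cooldown tracking omitted for brevity; names are illustrative, not the crate's actual API):

```rust
// Sketch of the scaling rules: scale up when players-per-pod exceeds the
// target, scale down when load falls below half the target, and clamp the
// result to [min_replicas, max_replicas].
#[derive(Debug, PartialEq)]
enum ScalingDecision {
    ScaleUp { target_replicas: u32 },
    ScaleDown { target_replicas: u32 },
    Hold,
}

fn compute_desired_replicas(
    current_players: u32,
    current_replicas: u32,
    target_per_pod: u32,
    min_replicas: u32,
    max_replicas: u32,
) -> ScalingDecision {
    let load = current_players as f64 / current_replicas as f64;
    // Replicas needed to hit the per-pod target, clamped to the allowed range.
    let needed = ((current_players as f64 / target_per_pod as f64).ceil() as u32)
        .clamp(min_replicas, max_replicas);
    if load > target_per_pod as f64 && needed > current_replicas {
        ScalingDecision::ScaleUp { target_replicas: needed }
    } else if load < target_per_pod as f64 * 0.5 && needed < current_replicas {
        ScalingDecision::ScaleDown { target_replicas: needed }
    } else {
        ScalingDecision::Hold
    }
}

fn main() {
    // 180 players on 3 pods at 50/pod: load 60 > 50, so scale up to ceil(180/50) = 4.
    assert_eq!(
        compute_desired_replicas(180, 3, 50, 2, 20),
        ScalingDecision::ScaleUp { target_replicas: 4 }
    );
    // 40 players on 4 pods: load 10 < 25, so scale down to max(ceil(40/50), 2) = 2.
    assert_eq!(
        compute_desired_replicas(40, 4, 50, 2, 20),
        ScalingDecision::ScaleDown { target_replicas: 2 }
    );
}
```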

Custom HPA Metrics

The scaling configuration renders as HPA annotations on the generated manifest. Custom metrics are defined as:

use aether_deploy::CustomMetric;

let metric = CustomMetric {
    name: "aether_world_player_count".to_string(),
    target_value: 50,
    metric_type: "Pods".to_string(),
};

These metrics are consumed by a custom metrics adapter (e.g., Prometheus Adapter) that exposes player counts from the world server telemetry.
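For reference, a standard autoscaling/v2 HPA Pods-metric stanza that a Prometheus Adapter setup could satisfy would look like the following (illustrative; field values mirror the CustomMetric above):

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: aether_world_player_count
      target:
        type: AverageValue
        averageValue: "50"
```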

Zone-Aware Topology

Pod affinity and anti-affinity rules control scheduling to optimize latency and cost:

use aether_deploy::{TopologyConfig, AffinityRule};

let topology = TopologyConfig {
    spread_across_zones: true,
    topology_key: "topology.kubernetes.io/zone".to_string(),
    preferred_colocation: vec![
        AffinityRule {
            service: "voice-relay".to_string(),
            weight: 80,
        },
    ],
};

The topology generator produces:

  • Pod anti-affinity with topologyKey: topology.kubernetes.io/zone to spread world-server pods across availability zones, reducing blast radius.
  • Preferred pod affinity to co-locate related services (e.g., world servers near voice relay pods) when possible.
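An illustrative affinity stanza for the TopologyConfig above might render as follows (label selectors are assumptions; the actual labels depend on the generated manifest):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: world-server
          topologyKey: topology.kubernetes.io/zone
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: voice-relay
          topologyKey: topology.kubernetes.io/zone
```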

Health Probes

Health probes distinguish between process liveness and readiness to accept players:

use aether_deploy::{ProbeConfig, Probe};

let probes = ProbeConfig {
    liveness: Probe {
        path: "/healthz".to_string(),
        port: 8080,
        initial_delay_secs: 10,
        period_secs: 15,
        failure_threshold: 3,
    },
    readiness: Probe {
        path: "/readyz".to_string(),
        port: 8080,
        initial_delay_secs: 5,
        period_secs: 10,
        failure_threshold: 2,
    },
};

Probe      Purpose                    Failure Action
---------  -------------------------  ----------------------------------
Liveness   "Is the process alive?"    Pod is restarted
Readiness  "Can it accept players?"   Pod removed from service endpoints

A world server might be alive (liveness passes) but not ready (loading world state from WAL). The readiness probe ensures players are only routed to fully initialized pods.
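The ProbeConfig above maps directly onto standard Kubernetes container probe fields; the generated stanza would look roughly like this (illustrative excerpt):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 2
```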

Multi-Region Routing

The region router directs players to the closest healthy region:

use aether_deploy::{Region, RegionRoutingConfig, RoutingDecision};

let config = RegionRoutingConfig {
    regions: vec![
        Region { code: "us-east-1".to_string(), latency_ms: 20, healthy: true },
        Region { code: "eu-west-1".to_string(), latency_ms: 90, healthy: true },
        Region { code: "ap-south-1".to_string(), latency_ms: 150, healthy: false },
    ],
};

let decision: RoutingDecision = config.route_player("us-east-1");

match decision {
    RoutingDecision::Primary { region } => {
        // Route to the closest healthy region
    }
    RoutingDecision::Failover { region, reason } => {
        // Primary unhealthy, routing to next-closest
    }
}

Routing logic:

  1. Find the player's preferred region (based on geographic proximity or explicit selection).
  2. If the preferred region is healthy, route there.
  3. If unhealthy, select the next-closest healthy region as failover.
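The routing steps above can be sketched as a standalone function. This is a simplified illustration (the failover reason is omitted); names are not the crate's actual API:

```rust
// Sketch of region routing: prefer the requested region when healthy,
// otherwise fail over to the lowest-latency healthy region.
#[derive(Debug, PartialEq)]
enum RoutingDecision {
    Primary { region: String },
    Failover { region: String },
    Unavailable,
}

struct Region {
    code: String,
    latency_ms: u32,
    healthy: bool,
}

fn route_player(regions: &[Region], preferred: &str) -> RoutingDecision {
    if regions.iter().any(|r| r.code == preferred && r.healthy) {
        return RoutingDecision::Primary { region: preferred.to_string() };
    }
    // Preferred region is down or unknown: pick the closest healthy region.
    regions
        .iter()
        .filter(|r| r.healthy)
        .min_by_key(|r| r.latency_ms)
        .map(|r| RoutingDecision::Failover { region: r.code.clone() })
        .unwrap_or(RoutingDecision::Unavailable)
}

fn main() {
    let regions = vec![
        Region { code: "us-east-1".into(), latency_ms: 20, healthy: false },
        Region { code: "eu-west-1".into(), latency_ms: 90, healthy: true },
    ];
    // Preferred region is unhealthy, so the next-closest healthy region wins.
    assert_eq!(
        route_player(&regions, "us-east-1"),
        RoutingDecision::Failover { region: "eu-west-1".to_string() }
    );
}
```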

Infrastructure Topology

The aether-deploy crate models the full deployment topology including regions, datacenters, and data-plane components: single-primary or sharded PostgreSQL with Patroni failover, Redis cluster for session caching, NATS JetStream with supercluster for cross-region messaging, and S3-compatible object storage with CDN edge distribution.

Database failover follows Patroni-style leader election with configurable loop wait, TTL, retry timeout, and maximum WAL lag on failover (default 1MB).
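For comparison, the equivalent knobs in Patroni's own DCS configuration are ttl, loop_wait, retry_timeout, and maximum_lag_on_failover (shown here with Patroni's stock defaults, matching the 1MB lag limit described above):

```yaml
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576  # bytes of WAL lag a replica may have and still be promoted
```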

Deployment Workflow

A typical workflow: define DeploymentConfig -> call render_yaml() -> commit to GitOps repo -> ArgoCD/Flux applies manifests -> HPA monitors custom player-count metrics -> pods scale on demand -> health probes gate traffic -> region routing directs players to nearest healthy cluster.