Feature Overview

Immutable Infrastructure support provides cluster and cloud infrastructure management for enterprise-grade Kubernetes deployments. It relies on automation and infrastructure-as-code principles to keep infrastructure reliable, scalable, and maintainable.

Cluster Management

Our platform manages the full Kubernetes cluster lifecycle on an immutable OS, which keeps deployments consistent and reproducible across environments.

Cluster Creation

  • Immutable OS Support: Create clusters using immutable OS patterns for enhanced security and consistency
  • Automated Compute Provisioning: Automatic provisioning of compute instances with pre-configured specifications
  • Bootstrap Automation: Automated cluster bootstrapping with minimal manual intervention

Cluster Deletion

  • Complete Resource Cleanup: Deleting a cluster removes all resources associated with it
  • Provider Resource Release: Proper deallocation of provider resources to prevent orphaned instances

Cluster Scaling

  • Horizontal Scaling: Add or remove worker nodes to meet workload demands
  • Automated Compute Management: Automatic creation and release of compute instances during scaling operations
  • Zero-Downtime Scaling: Add or remove nodes without interrupting running services

Cluster Upgrades

  • Kubernetes Version Management: Seamless upgrades to newer Kubernetes versions

Supported Infrastructure Providers

Our platform follows a pluggable provider model aligned with Cluster API infrastructure providers. It is designed for multiple infrastructure platforms. Today, the DCS infrastructure provider is supported, with additional providers in progress.

  • Provider-Agnostic Design: Core workflows are consistent across providers
  • Current Support: DCS infrastructure provider
  • Roadmap: Additional providers are being added
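
The pluggable provider model described above can be pictured with a small sketch. Everything in it is hypothetical and for illustration only: the InfrastructureProvider interface, the InstanceSpec fields, and the InMemoryProvider stand-in are assumptions, not the platform's or Cluster API's actual types. The point is only that the core workflow calls the same methods whichever provider is plugged in.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class InstanceSpec:
    """Hypothetical, provider-neutral description of one compute instance."""
    name: str
    cpu_cores: int
    memory_gib: int


class InfrastructureProvider(Protocol):
    """Hypothetical interface that each provider plugin would implement."""

    def create_instance(self, spec: InstanceSpec) -> str: ...

    def delete_instance(self, instance_id: str) -> None: ...


@dataclass
class InMemoryProvider:
    """Stand-in provider for illustration; it only tracks instances in a dict."""
    prefix: str
    instances: dict[str, InstanceSpec] = field(default_factory=dict)

    def create_instance(self, spec: InstanceSpec) -> str:
        instance_id = f"{self.prefix}-{spec.name}"
        self.instances[instance_id] = spec
        return instance_id

    def delete_instance(self, instance_id: str) -> None:
        # Deleting an unknown ID is a no-op, so cleanup can be retried safely.
        self.instances.pop(instance_id, None)


def provision_workers(provider: InfrastructureProvider, count: int) -> list[str]:
    """Core workflow: the same code runs against any provider implementation."""
    return [
        provider.create_instance(
            InstanceSpec(name=f"worker-{i}", cpu_cores=4, memory_gib=16)
        )
        for i in range(count)
    ]


provider = InMemoryProvider(prefix="demo")
worker_ids = provision_workers(provider, 2)
print(worker_ids)
```

A real provider would replace InMemoryProvider with calls to its own infrastructure API while provision_workers stays unchanged.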

Compute Resource Management

The platform manages the virtual machine lifecycle and exposes configuration options for tuning resource utilization and performance.

Compute Lifecycle Operations

  • Create Compute Instances: Provision instances with customizable specifications and configurations
  • Delete Compute Instances: Secure deletion with proper resource cleanup

Compute Configuration Options

  • Instance/Flavor Selection: Choose from predefined instance types or flavors optimized for different workloads
  • Size Customization: Flexible sizing options from small development instances to large production workloads
  • Resource Allocation: Precise control over CPU, memory, and storage allocation
  • Network Configuration: Advanced networking options including custom subnets and security groups
  • Storage Options: Multiple storage types and classes (for example: SSD, HDD, NVMe) for different performance requirements

Use Cases

Use Case 1: Highly Available Control Plane

Scenario: Deploy a production cluster with a highly available control plane to ensure cluster stability.

Implementation:

  • Deploy a 3-node control plane with automatic failover
  • Control plane nodes are distributed across different availability zones (when supported by the infrastructure)
  • A load balancer distributes API server traffic across the control plane nodes
  • Automatic recovery of failed control plane components

Benefits:

  • No single point of failure in the control plane
  • Cluster remains operational even if one control plane node fails
  • Automatic recovery reduces manual intervention
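
Why a 3-node control plane survives the loss of one node comes down to majority (quorum) arithmetic. The sketch below assumes each control plane node hosts one voting member of the cluster datastore (for example, etcd co-located with the control plane), which is a common layout but is not stated in this document. The function names are illustrative.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a voting group with `members` members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


for size in (1, 3, 5):
    print(
        f"{size} control plane node(s): quorum {quorum_size(size)}, "
        f"tolerates {tolerated_failures(size)} failure(s)"
    )
```

Under this model an even-sized control plane gains nothing: four members tolerate one failure, the same as three, which is why odd sizes are the usual choice.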

Use Case 2: Horizontal Scaling for Workload Demands

Scenario: Respond to increased application load by adding worker nodes, then scale down when demand decreases.

Implementation:

  • Adjust the replicas field in the MachineDeployment resource
  • Cluster API automatically provisions new nodes based on the Machine Template
  • New nodes automatically join the cluster and become ready for workloads
  • When scaling down, nodes are drained and deleted gracefully
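
As a minimal sketch of the first step, the snippet below builds the JSON body of a merge patch that changes spec.replicas on a MachineDeployment. The helper name replicas_patch and its minimum parameter are assumptions made for illustration; only the spec.replicas field comes from the text above.

```python
import json


def replicas_patch(current: int, delta: int, minimum: int = 0) -> dict:
    """Build a merge-patch body that moves the replica count by `delta`.

    The result is clamped so it never drops below `minimum`.
    """
    target = max(minimum, current + delta)
    return {"spec": {"replicas": target}}


# Scale out from 3 to 5 workers, then back in by 4 with a floor of 2.
scale_out = replicas_patch(current=3, delta=2)
scale_in = replicas_patch(current=5, delta=-4, minimum=2)

print(json.dumps(scale_out))
print(json.dumps(scale_in))
```

The printed JSON is the kind of body that could be submitted as a merge patch against the MachineDeployment, for example through kubectl patch with --type merge; provisioning and draining then happen as described in the steps above.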

Benefits:

  • Respond to workload changes in minutes, not hours
  • Automated scaling reduces operational overhead

Use Case 3: Rolling Upgrades with Zero Downtime

Scenario: Upgrade the Kubernetes version or VM template without disrupting running applications.

Implementation:

  • Change the Kubernetes version, or reference an updated Machine Template, in the control plane resource or the MachineDeployment
  • Cluster API performs a rolling upgrade: creates new nodes, waits for them to be ready, then deletes old nodes
  • The maxSurge and maxUnavailable parameters control how many extra nodes may be created and how many nodes may be unavailable at any point during the upgrade
  • Old nodes are drained automatically, and their pods are rescheduled on new nodes
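
To make the effect of maxSurge and maxUnavailable concrete, here is a simplified simulation of a rolling upgrade. It is a model under stated assumptions, not the Cluster API controller's actual algorithm: new nodes are treated as ready as soon as they are created, and each loop iteration adds as many new nodes and removes as many old nodes as the two limits allow. The function name is illustrative.

```python
def rolling_upgrade_steps(
    replicas: int, max_surge: int, max_unavailable: int
) -> list[tuple[int, int]]:
    """Return the (old_nodes, new_nodes) counts after each upgrade step."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: total nodes may not exceed replicas + max_surge,
        # and at most `replicas` new nodes are ever needed.
        can_add = min(replicas + max_surge - (old + new), replicas - new)
        new += can_add

        # Scale down: nodes in service may not drop below
        # replicas - max_unavailable (new nodes count as ready here).
        can_remove = min(old, (old + new) - (replicas - max_unavailable))
        old -= can_remove

        steps.append((old, new))
    return steps


# Surge-first: capacity never drops below the desired 3 nodes.
print(rolling_upgrade_steps(replicas=3, max_surge=1, max_unavailable=0))

# No surge: capacity dips to 2 nodes while each replacement is created.
print(rolling_upgrade_steps(replicas=3, max_surge=0, max_unavailable=1))
```

With max_surge=1 and max_unavailable=0 the model replaces one node at a time without ever running fewer than three nodes, which is the configuration that best matches the zero-downtime goal of this use case.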

Benefits:

  • Zero-downtime upgrades for mission-critical applications
  • Gradual rollout allows for early problem detection
  • Easy rollback if issues are discovered
  • No manual node re-provisioning required

Use Case 4: Multiple Node Pool Management

Scenario: Run different types of workloads on dedicated node pools with different configurations.

Implementation:

  • Create multiple MachineDeployments, each with different Machine Templates
  • Configure different resource allocations (CPU, memory, storage) per pool
  • Use node labels and taints to control workload placement
  • Scale each pool independently based on workload requirements

Example Pools:

  • General Purpose Pool: Balanced CPU/memory for typical workloads
  • Compute-Optimized Pool: High CPU for batch processing or build workloads
  • Memory-Optimized Pool: High memory for databases or caching
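
How labels and taints steer workloads onto these pools can be sketched with a simplified placement check. This is a reduced model of Kubernetes scheduling, not the scheduler itself: it only handles exact-match node selectors and taints that a workload either tolerates by key and value or does not. The label key pool, the taint key dedicated, and all class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    name: str
    labels: dict[str, str]
    # Each taint is a (key, value) pair that repels non-tolerating workloads.
    taints: set[tuple[str, str]] = field(default_factory=set)


@dataclass
class Workload:
    name: str
    node_selector: dict[str, str] = field(default_factory=dict)
    tolerations: set[tuple[str, str]] = field(default_factory=set)


def can_schedule(workload: Workload, pool: NodePool) -> bool:
    """True if the pool matches the selector and all its taints are tolerated."""
    labels_match = all(
        pool.labels.get(key) == value
        for key, value in workload.node_selector.items()
    )
    taints_tolerated = pool.taints <= workload.tolerations
    return labels_match and taints_tolerated


pools = [
    NodePool("general", labels={"pool": "general"}),
    NodePool("compute", labels={"pool": "compute"}, taints={("dedicated", "compute")}),
    NodePool("memory", labels={"pool": "memory"}, taints={("dedicated", "memory")}),
]

web = Workload("web")  # no selector, no tolerations
batch = Workload(
    "batch",
    node_selector={"pool": "compute"},
    tolerations={("dedicated", "compute")},
)

for workload in (web, batch):
    eligible = [pool.name for pool in pools if can_schedule(workload, pool)]
    print(workload.name, "->", eligible)
```

In this model the taints keep ordinary workloads such as web off the specialized pools, while the selector plus toleration pins batch to the compute-optimized pool.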

Benefits:

  • Optimize resource allocation for different workload types
  • Isolate workloads for security and performance
  • Independent scaling per workload type
  • Cost optimization through right-sized resources