Node Management
This document explains how to manage worker nodes using Cluster API Machine resources.
TOC
- Prerequisites
- Overview
- Worker Node Deployment
  - Step 1: Configure IP-Hostname Pool
  - Step 2: Configure Machine Template
  - Step 3: Configure Bootstrap Template
  - Step 4: Configure Machine Deployment
- Node Management Operations
  - Scaling Worker Nodes
    - Adding Worker Nodes
    - Removing Worker Nodes
    - Deleting Specific Nodes
  - Upgrading Machine Infrastructure
  - Updating Bootstrap Templates
  - Upgrading Kubernetes Version
Prerequisites
Important Prerequisites
- The control plane must be deployed before performing node operations. See Create Cluster for setup instructions.
- Ensure you have proper access to the DCS platform and required permissions.
Configuration Guidelines
When working with the configurations in this document:
- Only modify values enclosed in `<>` brackets
- Replace placeholder values with your environment-specific settings
- Preserve all other default configurations unless explicitly required
Overview
Worker nodes are managed through Cluster API Machine resources, providing declarative and automated node lifecycle management. The deployment process involves:
- IP-Hostname Pool Configuration - Network settings for worker nodes
- Machine Template Setup - VM specifications
- Bootstrap Configuration - Node initialization and join settings
- Machine Deployment - Orchestration of node creation and management
Worker Node Deployment
Step 1: Configure IP-Hostname Pool
The IP-Hostname Pool defines the network configuration for worker node virtual machines. You must plan and configure the IP addresses, hostnames, DNS servers, and other network parameters before deployment.
Pool Size Requirement
The pool must include at least as many entries as the number of worker nodes you plan to deploy. Insufficient entries will prevent node deployment.
Example:
Create a DCSIpHostnamePool named <worker-iphostname-pool-name>:
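The exact schema of `DCSIpHostnamePool` depends on your DCS provider's CRD, so the sketch below is illustrative only: the API group/version and field names such as `entries`, `dnsServers`, and `gateway` are assumptions to be checked against your provider's reference before applying.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
kind: DCSIpHostnamePool
metadata:
  name: <worker-iphostname-pool-name>
  namespace: <cluster-namespace>
spec:
  # Illustrative fields only; verify against your provider's CRD.
  gateway: 192.168.10.1
  netmask: 255.255.255.0
  dnsServers:
    - 192.168.10.2
  entries:                       # at least one entry per planned worker node
    - ipAddress: 192.168.10.21
      hostname: worker-01
    - ipAddress: 192.168.10.22
      hostname: worker-02
    - ipAddress: 192.168.10.23
      hostname: worker-03
```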
Step 2: Configure Machine Template
The DCSMachineTemplate defines the specifications for worker node virtual machines, including VM templates, compute resources, storage configuration, and network settings.
Required Disk Configurations
The following disk mount points are mandatory. Do not remove them:
- System volume (`systemVolume: true`)
- `/var/lib/kubelet` - Kubelet data directory
- `/var/lib/containerd` - Container runtime data
- `/var/cpaas` - Platform-specific data
You may add additional disks, but these essential configurations must be preserved.
Example:
Create a DCSMachineTemplate named <worker-dcs-machine-template-name>:
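As with the pool, the `DCSMachineTemplate` schema is provider-specific; the sketch below only illustrates the shape of such a template. Field names (`vmTemplateName`, `numCPUs`, `memoryMiB`, `disks`, etc.) are assumptions, while the four disk mounts shown are the mandatory ones listed above.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
kind: DCSMachineTemplate
metadata:
  name: <worker-dcs-machine-template-name>
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      # Illustrative fields; verify names against your provider's CRD.
      vmTemplateName: <vm-template-name>
      numCPUs: 4
      memoryMiB: 8192
      ipHostnamePoolRef:
        name: <worker-iphostname-pool-name>
      disks:
        - sizeGiB: 100
          systemVolume: true               # required: system volume
        - sizeGiB: 100
          mountPath: /var/lib/kubelet      # required: kubelet data
        - sizeGiB: 100
          mountPath: /var/lib/containerd   # required: container runtime data
        - sizeGiB: 100
          mountPath: /var/cpaas            # required: platform data
```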
Step 3: Configure Bootstrap Template
The KubeadmConfigTemplate defines the bootstrap configuration for worker nodes, including user accounts, SSH keys, system files, and kubeadm join settings.
Template Optimization
The template includes pre-optimized configurations for security and performance. Modify only the parameters that require customization for your environment.
Example:
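A minimal sketch of a worker `KubeadmConfigTemplate`, using the standard Cluster API bootstrap API; the user account, SSH key, and kubelet arguments shown are placeholders, not required values:

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <worker-kubeadm-config-template-name>
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          # Example-only kubelet arguments; adjust for your environment.
          kubeletExtraArgs:
            node-labels: node-role.kubernetes.io/worker=
      users:
        - name: <admin-user>
          sshAuthorizedKeys:
            - <ssh-public-key>
          sudo: ALL=(ALL) NOPASSWD:ALL
```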
Step 4: Configure Machine Deployment
The MachineDeployment orchestrates the creation and management of worker nodes by referencing the previously configured DCSMachineTemplate and KubeadmConfigTemplate resources. It manages the desired number of nodes and handles rolling updates.
Example:
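A minimal `MachineDeployment` sketch that ties the previous resources together; all `<>` values are placeholders for your environment:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <worker-machine-deployment-name>
  namespace: <cluster-namespace>
spec:
  clusterName: <cluster-name>
  replicas: 3
  template:
    spec:
      clusterName: <cluster-name>
      version: <kubernetes-version>        # e.g. v1.28.x
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <worker-kubeadm-config-template-name>
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
        kind: DCSMachineTemplate
        name: <worker-dcs-machine-template-name>
```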
Node Management Operations
This section covers common operational tasks for managing worker nodes, including scaling, updates, upgrades, and template modifications.
Cluster API Framework
Node management operations are based on the Cluster API framework. For detailed information, refer to the official Cluster API documentation.
Scaling Worker Nodes
Worker node scaling allows you to adjust cluster capacity based on workload demands. The Cluster API manages the node lifecycle automatically through the MachineDeployment resource.
Adding Worker Nodes
Increase the number of worker nodes to handle increased workload or add new capacity.
Use Case: Scale up cluster to add more compute resources
Prerequisites:
- Verify the IP pool has sufficient available IP addresses for new nodes
- Ensure the DCS platform has adequate resources to provision new VMs
Procedure:
1. Check Current Node Status

   View the current machines in the cluster:
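For example, against the management cluster (the namespace is a placeholder):

```shell
kubectl get machines -n <cluster-namespace>
```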
2. Verify IP Pool Capacity

   Before scaling, ensure the IP pool has enough available entries. Check that the pool contains at least as many entries as the desired replica count.
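Assuming the pool is queryable by its kind name (adjust to your provider's CRD), the entries can be inspected with:

```shell
kubectl get dcsiphostnamepool <worker-iphostname-pool-name> \
  -n <cluster-namespace> -o yaml
```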
WARNING: IP Pool Requirement
If the IP pool has insufficient entries, add more IP entries to the pool before scaling. Refer to the IP Pool Configuration section for guidance on adding entries.
3. Scale Up the MachineDeployment

   Update the `replicas` field to the desired number of nodes. Example: scale from 3 to 5 nodes:
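In upstream Cluster API, `MachineDeployment` exposes the scale subresource, so either `kubectl edit` or `kubectl scale` works; for example:

```shell
kubectl scale machinedeployment <worker-machine-deployment-name> \
  -n <cluster-namespace> --replicas=5
```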
4. Monitor the Scaling Progress

   Watch the machine creation process:
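For example:

```shell
kubectl get machines -n <cluster-namespace> -w
```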
The Cluster API controller will automatically create new machines based on the MachineDeployment template.
5. Verify Nodes Joined the Cluster

   Switch to the target cluster context and verify the new nodes:
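Assuming you have a kubeconfig for the target (workload) cluster:

```shell
kubectl --kubeconfig <target-cluster-kubeconfig> get nodes
```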
   The new nodes should appear in the list and transition to `Ready` status.
Rolling Update Behavior
When scaling up, new nodes are created immediately without affecting existing nodes. This ensures zero-downtime scaling.
Removing Worker Nodes
Decrease the number of worker nodes to reduce cluster capacity or remove underutilized resources.
Use Case: Scale down cluster to reduce costs or adjust to reduced workload
Procedure:
1. Identify Nodes to Remove

   View the current machines in the MachineDeployment:
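In upstream Cluster API, Machines created by a MachineDeployment carry the `cluster.x-k8s.io/deployment-name` label, which can be used to filter:

```shell
kubectl get machines -n <cluster-namespace> \
  -l cluster.x-k8s.io/deployment-name=<worker-machine-deployment-name>
```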
2. Scale Down the MachineDeployment

   Update the `replicas` field to reduce the node count. Example: scale from 5 to 3 nodes:
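For example, using the scale subresource:

```shell
kubectl scale machinedeployment <worker-machine-deployment-name> \
  -n <cluster-namespace> --replicas=3
```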
3. Monitor the Removal Progress

   Watch the machine deletion process:
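For example:

```shell
kubectl get machines -n <cluster-namespace> -w
```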
The Cluster API controller will:
- Drain the selected nodes (evict pods if possible)
- Delete the underlying VMs from the DCS platform
- Remove the machine resources
4. Verify Nodes Removed

   Switch to the target cluster context:
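Assuming a kubeconfig for the target cluster:

```shell
kubectl --kubeconfig <target-cluster-kubeconfig> get nodes
```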
The removed nodes should no longer appear in the list.
Data Loss Warning
Scaling down removes nodes and their associated disks. Ensure:
- Workloads can tolerate node loss through proper replication
- No critical data is stored only on the nodes being removed
- Applications are designed for horizontal scaling
Deleting Specific Nodes
Remove a specific unhealthy or problematic node while maintaining the replica count.
Use Case: Replace a single unhealthy node without scaling the entire deployment
Procedure:
1. Identify the Unhealthy Machine

   Find the machine corresponding to the unhealthy node:
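The NODENAME column of `kubectl get machines` maps each Machine to its cluster node:

```shell
kubectl get machines -n <cluster-namespace> -o wide
```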
2. Annotate the Machine for Deletion

   Mark the machine for deletion:
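In upstream Cluster API, the `cluster.x-k8s.io/delete-machine` annotation marks a Machine for priority deletion on the next scale-down; depending on your platform, deleting the Machine object directly (`kubectl delete machine <machine-name>`) also triggers an immediate replacement by the owning MachineSet:

```shell
kubectl annotate machine <machine-name> -n <cluster-namespace> \
  cluster.x-k8s.io/delete-machine="yes"
```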
3. Wait for Machine Replacement

   The Cluster API controller will:
- Delete the annotated machine
- Create a new machine to maintain the desired replica count
- The new machine will automatically join the cluster
4. Monitor the Replacement
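Watch the old Machine disappear and its replacement come up:

```shell
kubectl get machines -n <cluster-namespace> -w
```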
5. Verify Node Replacement

   The new node should appear and transition to `Ready` status.
Automatic Replacement
If you only annotate a machine for deletion without changing the replica count, the MachineDeployment automatically creates a replacement machine to maintain the desired state.
Upgrading Machine Infrastructure
To upgrade worker machine specifications (CPU, memory, disk, VM template), follow these steps:
1. Create New Machine Template
   - Copy the existing `DCSMachineTemplate` referenced by your `MachineDeployment`
   - Modify the required values (CPU, memory, disk, VM template, etc.)
   - Give the new template a unique name
   - Apply the new `DCSMachineTemplate` to the cluster
2. Update Machine Deployment
   - Modify the `MachineDeployment` resource
   - Update the `spec.template.spec.infrastructureRef.name` field to reference the new template
   - Apply the changes
3. Rolling Update
   - The system will automatically trigger a rolling update
   - Worker nodes will be replaced with the new specifications
   - Monitor the update progress through the MachineDeployment status
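The MachineDeployment change in step 2 amounts to repointing `infrastructureRef` at the new template; a sketch of the relevant fragment, with placeholder names:

```yaml
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
        kind: DCSMachineTemplate
        name: <new-worker-dcs-machine-template-name>          # previously the old template
```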
Updating Bootstrap Templates
Bootstrap templates (KubeadmConfigTemplate) are used by MachineDeployment and MachineSet resources. Changes to existing templates do not automatically trigger rollouts of existing machines; only new machines use the updated template.
Update Process:
1. Export Existing Template
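For example, export the current template to a file (names are placeholders):

```shell
kubectl get kubeadmconfigtemplate <worker-kubeadm-config-template-name> \
  -n <cluster-namespace> -o yaml > exported-template.yaml
```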
2. Modify Configuration
   - Update the desired fields in the exported YAML
   - Change the `metadata.name` to a new unique name
   - Remove extraneous metadata fields (`resourceVersion`, `uid`, `creationTimestamp`, etc.)
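One way to script the cleanup is sketched below. The sample manifest is hypothetical (in practice you would start from the YAML exported with `kubectl get ... -o yaml`), and the `sed` line-matching is simplistic, so review the result before applying:

```shell
# Hypothetical exported template; normally produced by `kubectl get ... -o yaml`.
cat <<'EOF' > exported-template.yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: worker-bootstrap-v1
  namespace: default
  resourceVersion: "123456"
  uid: 0a1b2c3d-4e5f-6789-abcd-ef0123456789
  creationTimestamp: "2024-01-01T00:00:00Z"
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: node-role.kubernetes.io/worker=
EOF

# Drop server-managed metadata and give the copy a new, unique name.
sed -e '/resourceVersion:/d' -e '/uid:/d' -e '/creationTimestamp:/d' \
    -e 's/name: worker-bootstrap-v1/name: worker-bootstrap-v2/' \
    exported-template.yaml > new-template.yaml

# Confirm the renamed template.
grep 'name:' new-template.yaml
```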
3. Create New Template
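Apply the cleaned file (the filename is a placeholder):

```shell
kubectl apply -f new-template.yaml
```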
4. Update MachineDeployment
   - Modify the MachineDeployment resource
   - Update `spec.template.spec.bootstrap.configRef.name` to reference the new template
   - Apply the changes to trigger a rolling update
Template Rollout Behavior
Existing machines continue using the old bootstrap configuration. Only newly created machines (during scaling or rolling updates) will use the updated template.
Upgrading Kubernetes Version
Kubernetes version upgrades require coordinated updates to both the MachineDeployment and the underlying VM template to ensure compatibility.
Upgrade Process:
1. Update Machine Template
   - Create a new `DCSMachineTemplate` with an updated `vmTemplateName` that supports the target Kubernetes version
   - Ensure the VM template includes the correct Kubernetes binaries and dependencies
2. Update MachineDeployment
   - Modify the `MachineDeployment` resource with the following changes:
     - Update `spec.template.spec.version` to the target Kubernetes version
     - Update `spec.template.spec.infrastructureRef.name` to reference the new machine template
     - Optionally update `spec.template.spec.bootstrap.configRef.name` if bootstrap configuration changes are needed
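The combined change in step 2 can be sketched as the following MachineDeployment fragment, with placeholder names:

```yaml
spec:
  template:
    spec:
      version: <target-kubernetes-version>                    # e.g. v1.29.x
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
        kind: DCSMachineTemplate
        name: <new-worker-dcs-machine-template-name>
```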
3. Monitor Upgrade
   - The system will perform a rolling upgrade of worker nodes
   - Verify that new nodes join the cluster with the correct Kubernetes version
   - Monitor cluster health throughout the upgrade process
Version Compatibility
Ensure the VM template's Kubernetes version matches the version specified in the MachineDeployment. Mismatched versions will cause node join failures.