Node Management
This document explains how to manage worker nodes using Cluster API Machine resources.
TOC
- Prerequisites
- Overview
- Worker Node Deployment
  - Step 1: Configure IP-Hostname Pool
  - Step 2: Configure Machine Template
  - Step 3: Configure Bootstrap Template
  - Step 4: Configure Machine Deployment
- Node Management Operations
  - Scaling Worker Nodes
    - Adding Worker Nodes
    - Removing Worker Nodes
    - Deleting Specific Nodes
  - Upgrading Machine Infrastructure
  - Updating Bootstrap Templates
  - Upgrading Kubernetes Version
Prerequisites
Important Prerequisites
- The control plane must be deployed before performing node operations. See Create Cluster for setup instructions.
- Ensure you have proper access to the DCS platform and required permissions.
Configuration Guidelines
When working with the configurations in this document:
- Only modify values enclosed in `<>` brackets
- Replace placeholder values with your environment-specific settings
- Preserve all other default configurations unless explicitly required
Overview
Worker nodes are managed through Cluster API Machine resources, providing declarative and automated node lifecycle management. The deployment process involves:
- IP-Hostname Pool Configuration - Network settings for worker nodes
- Machine Template Setup - VM specifications
- Bootstrap Configuration - Node initialization and join settings
- Machine Deployment - Orchestration of node creation and management
Worker Node Deployment
Step 1: Configure IP-Hostname Pool
The IP-Hostname Pool defines the network configuration for worker node virtual machines. You must plan and configure the IP addresses, hostnames, DNS servers, and other network parameters before deployment.
Pool Size Requirement
The pool must include at least as many entries as the number of worker nodes you plan to deploy. Insufficient entries will prevent node deployment.
Example:
Create a DCSIpHostnamePool named <worker-iphostname-pool-name>:
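The exact schema of `DCSIpHostnamePool` depends on your DCS provider's CRD, so the sketch below is illustrative only: the API group/version and field names such as `entries`, `dnsServers`, and `gateway` are assumptions to be checked against your provider's reference before applying.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
kind: DCSIpHostnamePool
metadata:
  name: <worker-iphostname-pool-name>
  namespace: <cluster-namespace>
spec:
  # Illustrative fields only; verify against your provider's CRD.
  gateway: 192.168.10.1
  netmask: 255.255.255.0
  dnsServers:
    - 192.168.10.2
  entries:                       # at least one entry per planned worker node
    - ipAddress: 192.168.10.21
      hostname: worker-01
    - ipAddress: 192.168.10.22
      hostname: worker-02
    - ipAddress: 192.168.10.23
      hostname: worker-03
```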
Step 2: Configure Machine Template
The DCSMachineTemplate defines the specifications for worker node virtual machines, including VM templates, compute resources, storage configuration, and network settings.
Required Disk Configurations
The following disk mount points are mandatory. Do not remove them:
- System volume (`systemVolume: true`)
- `/var/lib/kubelet` - Kubelet data directory
- `/var/lib/containerd` - Container runtime data
- `/var/cpaas` - Platform-specific data
You may add additional disks, but these essential configurations must be preserved.
Example:
Create a DCSMachineTemplate named <worker-dcs-machine-template-name>:
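As with the pool, the `DCSMachineTemplate` schema is provider-specific; the sketch below only illustrates the shape of such a template. Field names (`vmTemplateName`, `numCPUs`, `memoryMiB`, `disks`, etc.) are assumptions, while the four disk mounts shown are the mandatory ones listed above.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
kind: DCSMachineTemplate
metadata:
  name: <worker-dcs-machine-template-name>
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      # Illustrative fields; verify names against your provider's CRD.
      vmTemplateName: <vm-template-name>
      numCPUs: 4
      memoryMiB: 8192
      ipHostnamePoolRef:
        name: <worker-iphostname-pool-name>
      disks:
        - sizeGiB: 100
          systemVolume: true               # required: system volume
        - sizeGiB: 100
          mountPath: /var/lib/kubelet      # required: kubelet data
        - sizeGiB: 100
          mountPath: /var/lib/containerd   # required: container runtime data
        - sizeGiB: 100
          mountPath: /var/cpaas            # required: platform data
```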
Step 3: Configure Bootstrap Template
The KubeadmConfigTemplate defines the bootstrap configuration for worker nodes, including user accounts, SSH keys, system files, and kubeadm join settings.
Template Optimization
The template includes pre-optimized configurations for security and performance. Modify only the parameters that require customization for your environment.
Example:
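A minimal sketch of a worker `KubeadmConfigTemplate`, using the standard Cluster API bootstrap API; the user account, SSH key, and kubelet arguments shown are placeholders, not required values:

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <worker-kubeadm-config-template-name>
  namespace: <cluster-namespace>
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          # Example-only kubelet arguments; adjust for your environment.
          kubeletExtraArgs:
            node-labels: node-role.kubernetes.io/worker=
      users:
        - name: <admin-user>
          sshAuthorizedKeys:
            - <ssh-public-key>
          sudo: ALL=(ALL) NOPASSWD:ALL
```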
Step 4: Configure Machine Deployment
The MachineDeployment orchestrates the creation and management of worker nodes by referencing the previously configured DCSMachineTemplate and KubeadmConfigTemplate resources. It manages the desired number of nodes and handles rolling updates.
Example:
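A minimal `MachineDeployment` sketch that ties the previous resources together; all `<>` values are placeholders for your environment:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <worker-machine-deployment-name>
  namespace: <cluster-namespace>
spec:
  clusterName: <cluster-name>
  replicas: 3
  template:
    spec:
      clusterName: <cluster-name>
      version: <kubernetes-version>        # e.g. v1.28.x
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <worker-kubeadm-config-template-name>
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
        kind: DCSMachineTemplate
        name: <worker-dcs-machine-template-name>
```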
Node Management Operations
This section covers common operational tasks for managing worker nodes, including scaling, updates, upgrades, and template modifications.
Cluster API Framework
Node management operations are based on the Cluster API framework. For detailed information, refer to the official Cluster API documentation.
Scaling Worker Nodes
Worker node scaling allows you to adjust cluster capacity based on workload demands. The Cluster API manages the node lifecycle automatically through the MachineDeployment resource.
Adding Worker Nodes
Increase the number of worker nodes to handle increased workload or add new capacity.
Use Case: Scale up cluster to add more compute resources
Prerequisites:
- Verify the IP pool has sufficient available IP addresses for new nodes
- Ensure the DCS platform has adequate resources to provision new VMs
Procedure:
1. Check Current Node Status

   View the current machines in the cluster:
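For example, against the management cluster (the namespace is a placeholder):

```shell
kubectl get machines -n <cluster-namespace>
```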
2. Verify IP Pool Capacity

   Before scaling, ensure the IP pool has enough available entries. Check that the pool contains at least as many entries as the desired replica count.
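Assuming the pool is queryable by its kind name (adjust to your provider's CRD), the entries can be inspected with:

```shell
kubectl get dcsiphostnamepool <worker-iphostname-pool-name> \
  -n <cluster-namespace> -o yaml
```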
WARNING: IP Pool Requirement
If the IP pool has insufficient entries, add more IP entries to the pool before scaling. Refer to the IP Pool Configuration section for guidance on adding entries.
3. Scale Up the MachineDeployment

   Update the `replicas` field to the desired number of nodes. Example: scale from 3 to 5 nodes:
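In upstream Cluster API, `MachineDeployment` exposes the scale subresource, so either `kubectl edit` or `kubectl scale` works; for example:

```shell
kubectl scale machinedeployment <worker-machine-deployment-name> \
  -n <cluster-namespace> --replicas=5
```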
4. Monitor the Scaling Progress

   Watch the machine creation process:
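For example:

```shell
kubectl get machines -n <cluster-namespace> -w
```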
The Cluster API controller will automatically create new machines based on the MachineDeployment template.
5. Verify Nodes Joined the Cluster

   Switch to the target cluster context and verify the new nodes:
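Assuming you have a kubeconfig for the target (workload) cluster:

```shell
kubectl --kubeconfig <target-cluster-kubeconfig> get nodes
```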
   The new nodes should appear in the list and transition to `Ready` status.
Rolling Update Behavior
When scaling up, new nodes are created immediately without affecting existing nodes. This ensures zero-downtime scaling.
Removing Worker Nodes
Decrease the number of worker nodes to reduce cluster capacity or remove underutilized resources.
Use Case: Scale down cluster to reduce costs or adjust to reduced workload
Procedure:
1. Identify Nodes to Remove

   View the current machines in the MachineDeployment:
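In upstream Cluster API, Machines created by a MachineDeployment carry the `cluster.x-k8s.io/deployment-name` label, which can be used to filter:

```shell
kubectl get machines -n <cluster-namespace> \
  -l cluster.x-k8s.io/deployment-name=<worker-machine-deployment-name>
```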
2. Scale Down the MachineDeployment

   Update the `replicas` field to reduce the node count. Example: scale from 5 to 3 nodes:
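For example, using the scale subresource:

```shell
kubectl scale machinedeployment <worker-machine-deployment-name> \
  -n <cluster-namespace> --replicas=3
```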
3. Monitor the Removal Progress

   Watch the machine deletion process:
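For example:

```shell
kubectl get machines -n <cluster-namespace> -w
```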
The Cluster API controller will:
- Drain the selected nodes (evict pods if possible)
- Delete the underlying VMs from the DCS platform
- Remove the machine resources
4. Verify Nodes Removed

   Switch to the target cluster context:
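Assuming a kubeconfig for the target cluster:

```shell
kubectl --kubeconfig <target-cluster-kubeconfig> get nodes
```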
The removed nodes should no longer appear in the list.
Data Loss Warning
Scaling down removes nodes and their associated disks. Ensure:
- Workloads can tolerate node loss through proper replication
- No critical data is stored only on the nodes being removed
- Applications are designed for horizontal scaling
Deleting Specific Nodes
Remove a specific unhealthy or problematic node while maintaining the replica count.
Use Case: Replace a single unhealthy node without scaling the entire deployment
Procedure:
1. Identify the Unhealthy Machine

   Find the machine corresponding to the unhealthy node:
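The NODENAME column of `kubectl get machines` maps each Machine to its cluster node:

```shell
kubectl get machines -n <cluster-namespace> -o wide
```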
2. Annotate the Machine for Deletion

   Mark the machine for deletion:
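In upstream Cluster API, the `cluster.x-k8s.io/delete-machine` annotation marks a Machine for priority deletion on the next scale-down; depending on your platform, deleting the Machine object directly (`kubectl delete machine <machine-name>`) also triggers an immediate replacement by the owning MachineSet:

```shell
kubectl annotate machine <machine-name> -n <cluster-namespace> \
  cluster.x-k8s.io/delete-machine="yes"
```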
3. Wait for Machine Replacement

   The Cluster API controller will:
- Delete the annotated machine
- Create a new machine to maintain the desired replica count
- The new machine will automatically join the cluster
4. Monitor the Replacement
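Watch the old Machine disappear and its replacement come up:

```shell
kubectl get machines -n <cluster-namespace> -w
```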
5. Verify Node Replacement

   The new node should appear and transition to `Ready` status.
Automatic Replacement
If you only annotate a machine for deletion without changing the replica count, the MachineDeployment automatically creates a replacement machine to maintain the desired state.
Upgrading Machine Infrastructure
To upgrade worker machine specifications (CPU, memory, disk, VM template), follow these steps:
1. Create New Machine Template
   - Copy the existing `DCSMachineTemplate` referenced by your `MachineDeployment`
   - Modify the required values (CPU, memory, disk, VM template, etc.)
   - Give the new template a unique name
   - Apply the new `DCSMachineTemplate` to the cluster
2. Update Machine Deployment
   - Modify the `MachineDeployment` resource
   - Update the `spec.template.spec.infrastructureRef.name` field to reference the new template
   - Apply the changes
3. Rolling Update
   - The system will automatically trigger a rolling update
   - Worker nodes will be replaced with the new specifications
   - Monitor the update progress through the MachineDeployment status
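The MachineDeployment change in step 2 amounts to repointing `infrastructureRef` at the new template; a sketch of the relevant fragment, with placeholder names:

```yaml
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
        kind: DCSMachineTemplate
        name: <new-worker-dcs-machine-template-name>          # previously the old template
```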
Updating Bootstrap Templates
Bootstrap templates (KubeadmConfigTemplate) are used by MachineDeployment and MachineSet resources. Changes to existing templates do not automatically trigger rollouts of existing machines; only new machines use the updated template.
Update Process:
1. Export Existing Template
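For example, export the current template to a file (names are placeholders):

```shell
kubectl get kubeadmconfigtemplate <worker-kubeadm-config-template-name> \
  -n <cluster-namespace> -o yaml > exported-template.yaml
```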
2. Modify Configuration
   - Update the desired fields in the exported YAML
   - Change the `metadata.name` to a new unique name
   - Remove extraneous metadata fields (`resourceVersion`, `uid`, `creationTimestamp`, etc.)
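One way to script the cleanup is sketched below. The sample manifest is hypothetical (in practice you would start from the YAML exported with `kubectl get ... -o yaml`), and the `sed` line-matching is simplistic, so review the result before applying:

```shell
# Hypothetical exported template; normally produced by `kubectl get ... -o yaml`.
cat <<'EOF' > exported-template.yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: worker-bootstrap-v1
  namespace: default
  resourceVersion: "123456"
  uid: 0a1b2c3d-4e5f-6789-abcd-ef0123456789
  creationTimestamp: "2024-01-01T00:00:00Z"
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: node-role.kubernetes.io/worker=
EOF

# Drop server-managed metadata and give the copy a new, unique name.
sed -e '/resourceVersion:/d' -e '/uid:/d' -e '/creationTimestamp:/d' \
    -e 's/name: worker-bootstrap-v1/name: worker-bootstrap-v2/' \
    exported-template.yaml > new-template.yaml

# Confirm the renamed template.
grep 'name:' new-template.yaml
```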
3. Create New Template
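Apply the cleaned file (the filename is a placeholder):

```shell
kubectl apply -f new-template.yaml
```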
4. Update MachineDeployment
   - Modify the MachineDeployment resource
   - Update `spec.template.spec.bootstrap.configRef.name` to reference the new template
   - Apply the changes to trigger a rolling update
Template Rollout Behavior
Existing machines continue using the old bootstrap configuration. Only newly created machines (during scaling or rolling updates) will use the updated template.
Upgrading Kubernetes Version
Kubernetes version upgrades require coordinated updates to both the MachineDeployment and the underlying VM template to ensure compatibility.
Upgrade Process:
1. Update Machine Template
   - Create a new `DCSMachineTemplate` with an updated `vmTemplateName` that supports the target Kubernetes version
   - Ensure the VM template includes the correct Kubernetes binaries and dependencies
2. Update MachineDeployment
   - Modify the `MachineDeployment` resource with the following changes:
     - Update `spec.template.spec.version` to the target Kubernetes version
     - Update `spec.template.spec.infrastructureRef.name` to reference the new machine template
     - Optionally update `spec.template.spec.bootstrap.configRef.name` if bootstrap configuration changes are needed
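The combined change in step 2 can be sketched as the following MachineDeployment fragment, with placeholder names:

```yaml
spec:
  template:
    spec:
      version: <target-kubernetes-version>                    # e.g. v1.29.x
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1   # assumed group/version
        kind: DCSMachineTemplate
        name: <new-worker-dcs-machine-template-name>
```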
3. Monitor Upgrade
   - The system will perform a rolling upgrade of worker nodes
   - Verify that new nodes join the cluster with the correct Kubernetes version
   - Monitor cluster health throughout the upgrade process
Version Compatibility
Ensure the VM template's Kubernetes version matches the version specified in the MachineDeployment. Mismatched versions will cause node join failures.