October 17, 2024 · HPC, OpenCHAMI, Booting · by Alex Lovell-Troy · 3 minute read
In previous posts, we covered how we set up OpenCHAMI and interacted with it via the API and CLI. Now, let’s dive into one of the most critical aspects of managing a large HPC cluster—booting nodes efficiently and reliably.
Like many HPC systems, the nodes in the Badger cluster are diskless. Each boot relies on loading a remote filesystem image into memory. The image is built to include everything needed for the node to operate, while any filesystem changes during runtime are saved to an overlayfs layer, which also lives in memory.
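Conceptually, the in-memory overlay is assembled along these lines. This is a simplified sketch of what dracut's dmsquash-live module does at boot, not the actual module code; the mount points are illustrative and running it by hand requires root:

```shell
# Sketch only: mirrors the overlay assembly dracut performs at boot.
# Paths are illustrative assumptions.

# 1. The downloaded squashfs image becomes the read-only lower layer.
mount -o ro,loop /run/initramfs/squashfs.img /run/rootfsbase

# 2. A tmpfs provides the writable upper layer and workdir, both in RAM.
mount -t tmpfs tmpfs /run/overlayfs
mkdir -p /run/overlayfs/upper /run/overlayfs/work

# 3. overlayfs merges them; all runtime writes land in memory
#    and disappear on reboot.
mount -t overlay overlay \
      -o lowerdir=/run/rootfsbase,upperdir=/run/overlayfs/upper,workdir=/run/overlayfs/work \
      /sysroot
```

Because the upper layer is a tmpfs, every node boots from an identical, pristine image each time.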
OpenCHAMI itself doesn’t include tooling to build, store, and serve system images. In keeping with our core principle of modularity, each site has its own preferred OS and image build pipeline. And, since OpenCHAMI doesn’t have custom software that must be installed in the system image, any Linux operating system should work. OpenCHAMI references existing kernels, ramdisks, and system images through URLs in boot parameters.
At LANL, we use Buildah and containers to create images and then share them via Quay. For automation, we use GitLab runners to trigger a new image build on each commit to our Git repository.
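As an illustration, the CI side of that automation could look like the following. This is a hypothetical sketch, not our actual pipeline; the job name, registry path, and variables are assumptions, and the runner needs sufficient privileges to run buildah:

```yaml
# Hypothetical .gitlab-ci.yml fragment; names and paths are assumptions.
build-image:
  stage: build
  image: quay.io/buildah/stable
  script:
    - buildah login -u "$REGISTRY_USER" -p "$REGISTRY_PASSWORD" quay.io
    - ./build-image.sh          # wraps the buildah commands shown below
    - buildah push test-image:v1 quay.io/my-org/test-image:v1
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```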
CNAME=$(buildah from scratch)
MNAME=$(buildah mount $CNAME)
dnf groupinstall --installroot=$MNAME minimal-install
dnf install --installroot=$MNAME some-other-list-of-packages
dnf install --installroot=$MNAME kernel dracut-live fuse-overlayfs
buildah run --tty $CNAME bash -c ' \
    dracut \
        --add "dmsquash-live livenet network-manager" \
        --kver $(basename /lib/modules/*) \
        -N \
        -f \
        --logfile /tmp/dracut.log 2>/dev/null \
'
buildah commit $CNAME test-image:v1
buildah push test-image:v1 registry.local/test-image:v1
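The boot parameters below expect the kernel, initramfs, and root filesystem to be reachable over plain HTTP. One way to get from the committed image to that layout is sketched here; the directory layout, the mksquashfs step, and the server choice are assumptions, not part of the OpenCHAMI tooling:

```shell
# Sketch: extract boot artifacts from the committed image and serve them.
# Paths and filenames are assumptions matching the bss.yaml below.
CNAME=$(buildah from test-image:v1)
MNAME=$(buildah mount $CNAME)

mkdir -p /srv/boot/alma
cp $MNAME/boot/vmlinuz-*   /srv/boot/alma/vmlinuz
cp $MNAME/boot/initramfs-* /srv/boot/alma/initramfs.img

# Pack the whole root filesystem into a squashfs for root=live:...
mksquashfs $MNAME /srv/boot/alma/rootfs -noappend

buildah umount $CNAME
buildah rm $CNAME

# Any static HTTP server works; port 8080 matches the boot parameters.
# e.g.: python3 -m http.server 8080 --directory /srv/boot
```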
Here’s how a node’s boot process is configured in OpenCHAMI using ochami-cli:
# bss.yaml
macs:
- AA:BB:CC:DD:EE:FF
initrd: 'http://192.168.1.253:8080/alma/initramfs.img'
kernel: 'http://192.168.1.253:8080/alma/vmlinuz'
params: 'nomodeset ro ip=dhcp selinux=0 console=ttyS0,115200 ip6=off ochami_ci_url=http://10.1.0.3:8081/cloud-init/ ochami_ci_url_secure=http://10.1.0.3:8081/cloud-init-secure/ network-config=disabled rd.shell root=live:http://192.168.1.253:8080/alma/rootfs'
Our kernel command line has a few unique items. The root=live: specification indicates that Linux will download the filesystem image and make it an overlayfs layer for the new root.

To populate BSS with ochami-cli:
ochami-cli bss --add-bootparams --payload bss.yaml
And to view the new data:
ochami-cli bss --get-bootparams
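Under the hood, ochami-cli is talking to the Boot Script Service (BSS) over HTTP, so the same data can be inspected directly. This sketch assumes BSS exposes its API under /boot/v1; the hostname, port, and token are illustrative:

```shell
# Illustrative: host, port, and token are assumptions.
curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
     http://bss.local:27778/boot/v1/bootparameters

# The iPXE boot script a node with this MAC would receive:
curl -s "http://bss.local:27778/boot/v1/bootscript?mac=AA:BB:CC:DD:EE:FF"
```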
In this post, we explored how OpenCHAMI orchestrates the boot process for diskless HPC nodes, leveraging remote filesystem images and modular tools like Buildah for creating and managing system images. By maintaining flexibility in image creation and boot configurations, OpenCHAMI allows sites to use their preferred operating systems and infrastructure. With a focus on efficiency and scalability, the system simplifies booting large clusters by integrating seamlessly with existing tools and workflows. As we continue this series, we’ll dive deeper into deployment workflows and how OpenCHAMI can streamline HPC operations across a wide range of environments.
Stay tuned for the final part in our series!