Deploying OpenCHAMI: A Hands-On Guide to Setting Up and Running Your Cluster
This blog post is an abridged version of the training we give to internal sysadmins at LANL. It guides you through the whole process of building and deploying OpenCHAMI on a set of small teaching clusters that we maintain for that purpose. For more details and example image configurations, visit our repo at github.com/OpenCHAMI/mini-bootcamp.
Prerequisites
To get started, you’ll need:
A Linux OS installed on your machine (we assume you’ve done this).
Basic Configuration Management knowledge—while this guide covers essential configs, we won’t dive into full system deployment.
Cluster Images—OpenCHAMI doesn’t come with image-build tools, so we’ll work through building images locally.
1. Initial Package Installations
Install necessary packages for OpenCHAMI deployment:
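A minimal sketch, assuming a RHEL-family head node (Rocky or RHEL 9); the exact package list depends on your site, but these cover the tooling used throughout this guide:

```bash
sudo dnf install -y git ansible-core podman dnsmasq ipmitool
```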
2. Configure Cluster Hosts
Edit your /etc/hosts file to include entries for your cluster. For example:
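Something along these lines, with hypothetical addresses and shortnames; list the head node, the compute nodes, and their BMCs:

```
# Example /etc/hosts entries (addresses and shortnames are placeholders)
172.16.0.254   st-head        # head/management node
172.16.0.1     st-cn01        # compute node 1
172.16.0.2     st-cn02        # compute node 2
172.16.100.1   st-cn01-bmc    # BMC for compute node 1
172.16.100.2   st-cn02-bmc    # BMC for compute node 2
```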
3. Set Up Power Management
Install powerman and conman for node power and console management:
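On RHEL-family systems both are available from EPEL:

```bash
sudo dnf install -y powerman conman
```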
Configure Powerman: Add device and node info to /etc/powerman/powerman.conf using your shortnames.
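A sketch using the ipmipower backend, assuming IPMI-capable BMCs and the placeholder hostnames from the /etc/hosts example above; the user and password are placeholders too:

```
# /etc/powerman/powerman.conf (hostnames and credentials are placeholders)
include "/etc/powerman/ipmipower.dev"

device "ipmi0" "ipmipower" "/usr/sbin/ipmipower -h st-cn[01-02]-bmc -u admin -p changeme |&"
node "st-cn[01-02]" "ipmi0" "st-cn[01-02]-bmc"
```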
Start Powerman:
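Enable the service and confirm it can reach your BMCs:

```bash
sudo systemctl enable --now powerman
pm -q    # query the power state of every configured node
```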
4. Building a Test Image
Use buildah to create a lightweight test image.
Install buildah:
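On the head node:

```bash
sudo dnf install -y buildah
```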
Build the base image:
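We start from a stock distribution base; Rocky Linux 9 here is just an assumption, so substitute whatever matches your site:

```bash
CNAME=$(buildah from docker.io/library/rockylinux:9)
```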
Set up the kernel and dependencies:
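The package set below is illustrative; the essentials are a kernel, dracut with its network/live modules, and the services a diskless node needs at boot:

```bash
buildah run $CNAME -- dnf install -y kernel dracut-live dracut-network \
    nfs-utils openssh-server cloud-init
```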
Rebuild initrd:
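Regenerate the initrd inside the image so it contains the modules needed to fetch and mount the image over the network at boot (the module names assume dracut's livenet/network support installed above):

```bash
KVER=$(buildah run $CNAME -- ls /lib/modules)
buildah run $CNAME -- dracut --force --kver "$KVER" --add "livenet network"
```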
Save the image:
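Commit the working container to a local image; pushing it to a registry is optional, and the registry URL below is a placeholder:

```bash
buildah commit $CNAME test-node:v1
# buildah push test-node:v1 docker://registry.example.com/ochami/test-node:v1
```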
5. Microservices Overview
OpenCHAMI relies on several key microservices:
SMD (State Management Database): Stores system hardware data.
BSS (Boot Script Service): Provides iPXE boot scripts to nodes.
Cloud-init: Customized for OpenCHAMI to configure nodes during boot.
TPM-manager: Issues JWTs for secure configurations.
6. Set Up OpenCHAMI with Ansible
Clone the deployment recipes repository:
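From the head node:

```bash
git clone https://github.com/OpenCHAMI/deployment-recipes.git
```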
Go to the LANL podman-quadlets recipe:
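The path below assumes the repository's current layout:

```bash
cd deployment-recipes/lanl/podman-quadlets
```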
Inventory Setup: Edit the inventory/01-ochami file to specify your hostname.
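A sketch of the inventory file; the group name follows the group_vars/ochami directory and the hostname is a placeholder:

```ini
# inventory/01-ochami
[ochami]
st-head
```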
Cluster Configurations: Update inventory/group_vars/ochami/cluster.yaml with your cluster name and shortname.
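Something along these lines; the exact variable names come from the recipe, so check its README, but the idea is a long cluster name plus the short prefix used in node names (the values here are placeholders):

```yaml
# inventory/group_vars/ochami/cluster.yaml (illustrative keys and values)
cluster_name: softturtle
cluster_shortname: st
```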
SSH Key Pair: Generate an SSH key pair and add the public key to inventory/group_vars/ochami/cluster.yaml under cluster_boot_ssh_pub_key.
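For example, generate a dedicated key pair and paste the public half into cluster.yaml:

```bash
ssh-keygen -t ed25519 -f ~/.ssh/ochami-cluster -N ""
cat ~/.ssh/ochami-cluster.pub   # goes under cluster_boot_ssh_pub_key
```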
Run the Playbook:
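The playbook name here is an assumption; check the recipe's README for the actual entry point:

```bash
ansible-playbook -i inventory ochami_playbook.yaml
```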
7. Testing OpenCHAMI Services
After rebooting, run the full playbook again (the same ansible-playbook command as above).
Check that the expected containers are running:
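Container names vary with the recipe, but you should see SMD, BSS, cloud-init, and their supporting services listed:

```bash
sudo podman ps --format "{{.Names}}\t{{.Status}}"
```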
8. Setting Up Cloud-init and BSS
Verify Services: Ensure SMD, BSS, and cloud-init are populated correctly.
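One way to spot-check is to query the services directly; the ports and paths below are the services' defaults and assume no API gateway in front, so adjust them to match your deployment:

```bash
curl -s http://localhost:27779/hsm/v2/State/Components | jq .   # SMD: discovered hardware
curl -s http://localhost:27778/boot/v1/bootparameters  | jq .   # BSS: per-node boot parameters
```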
Boot Nodes: Start and monitor node boots using pm and conman commands.
Logs for Debugging: Open additional terminal windows to monitor logs for DHCP, BSS, and cloud-init.
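A typical boot-and-watch session looks like this; the node and container names are placeholders carried over from earlier in this guide:

```bash
pm -1 st-cn01        # power the node on ('-0' powers it off)
conman st-cn01       # attach to its console; detach with '&.'

# In separate terminals, follow the relevant logs while the node boots:
sudo journalctl -u dnsmasq -f       # DHCP/TFTP (if dnsmasq provides them on your head node)
sudo podman logs -f bss             # boot script requests
sudo podman logs -f cloud-init      # cloud-init data requests
```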
9. Building and Using Images
For more complex deployments, use the image-builder tool to build layered images.
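image-builder drives layered builds from a config file; see its repository for the real interface. Purely to illustrate the layering idea, here is the same pattern in plain buildah, starting from the test image built in section 4 (package names are illustrative):

```bash
CNAME=$(buildah from localhost/test-node:v1)       # reuse the base layer
buildah run $CNAME -- dnf install -y slurm-slurmd  # add compute-specific bits as a new layer
buildah commit $CNAME compute-node:v1
```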
10. Advanced Topics: Security and Automation
Two-Step Cloud-init: Set up secure configurations by adding a second layer of cloud-init for sensitive data.
JWT for Secure Data: Use tpm-manager to handle secure data distribution to nodes.
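As a rough sketch of the idea only (the endpoint, port, and token path below are placeholders, not the actual API): the node presents the JWT it was issued when it requests the protected, second-stage cloud-init data, and the server answers only requests carrying a valid token.

```bash
TOKEN=$(cat /path/to/node-jwt)    # hypothetical location of the token delivered to the node
curl -s -H "Authorization: Bearer $TOKEN" \
    https://st-head:8443/cloud-init-secure/st-cn01/user-data
```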
Conclusion
By now, you should have a fully deployed OpenCHAMI environment, equipped with essential microservices and custom-built images, ready to scale. As a final step, consider adding further integrations like Slurm for job scheduling and network-mounted filesystems for additional storage solutions.