In the background, VESSL Clusters leverages GPU-accelerated Docker containers and Kubernetes pods. It abstracts the complex compute backends and system details of Kubernetes-backed GPU infrastructure into an easy-to-use web interface and simple CLI commands. Data scientists and machine learning researchers without a software or DevOps background can use VESSL’s single-line curl command to set up and configure on-premise GPU servers for ML.
VESSL’s cluster integration is composed of four primitives.
- VESSL API Server — Enables communication between the user and the GPU clusters, through which users can launch containerized ML workloads.
- VESSL Cluster Agent — Sends information about the clusters and workloads running on the cluster such as the node specifications and model metrics.
- Control plane node — Acts as the cluster-wide control tower and orchestrates subsidiary worker nodes.
- Worker nodes — Run specified ML workloads based on the runtime spec and environment received from the control plane node.
Integrating more powerful, multi-node GPU clusters for your team is as easy as integrating your personal laptop. To make the process easier, we’ve prepared a single-line curl command that installs all the binaries and dependencies on your server.
1. Install dependencies
You can install all the dependencies required for cluster integration using a single-line curl command. The command
- Installs Docker if it’s not already installed.
- Installs and configures the NVIDIA container runtime.
- Installs k0s, a lightweight Kubernetes distribution, and designates and configures a control plane node.
- Generates a token and a command for connecting worker nodes to the control plane node configured above.
If you wish to dedicate the control plane node to admin and monitoring purposes, without running any ML workloads on it, add the --taint-controller flag at the end of the command.
curl -sSLf https://install.dev.vssl.ai | sudo bash -s -- --role=controller
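If you opted for a dedicated control plane as described above, the install command with the taint flag appended would look like this (a sketch; the flag name is taken from the note above):

```shell
# Dedicated control plane: taint the controller so ML workloads
# are never scheduled on it, keeping it for admin and monitoring only
curl -sSLf https://install.dev.vssl.ai | sudo bash -s -- --role=controller --taint-controller
```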
Once all the dependencies are installed, the command returns a follow-up command with a token. You can use this to add worker nodes to the control plane. If you don’t want to add an additional worker node, you can skip to the next step.
curl -sSLf https://install.dev.vssl.ai | sudo bash -s -- --role worker --token '[TOKEN_HERE]'
You can confirm that your control plane and worker nodes have been successfully configured with the following command.
sudo k0s kubectl get nodes
If you encounter an error while installing with the magic script, please try a manual installation.
2. Install VESSL agent
First, make sure that you place the kubeconfig in your home directory.
chmod +r /var/lib/k0s/pki/admin.conf
mkdir -p ~/.kube
cp /var/lib/k0s/pki/admin.conf ~/.kube/config
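As a quick sanity check, you can confirm that the copied kubeconfig reaches the cluster (assuming a standalone kubectl is available; k0s’s embedded kubectl shown earlier works the same way):

```shell
# List nodes through the kubeconfig copied above;
# success confirms the API server is reachable from this config
kubectl --kubeconfig ~/.kube/config get nodes
```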
You are now ready to integrate the Kubernetes cluster with VESSL. Make sure you have VESSL Client installed on the server and configured for your organization.
pip install vessl --upgrade
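Configuring the client for your organization might look like the following sketch; `vessl configure` is assumed here to be the subcommand that signs you in and selects a default organization:

```shell
# Install or upgrade the VESSL client
pip install vessl --upgrade
# Assumed subcommand: prompts for login and a default organization
vessl configure
```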
The following single-line command connects your Kubernetes-backed GPU cluster to VESSL. Note the --mode=multi flag, which specifies a multi-node cluster integration.
vessl cluster create --name='[CLUSTER_NAME_HERE]' --mode=multi
By this point, you have successfully completed the integration.
You can use the following VESSL CLI command or visit Clusters to confirm your integration.
vessl cluster list
Destroy and delete the cluster
In order to destroy a cluster created by VESSL, follow these steps:
sudo k0s stop
sudo k0s reset
To complete the deletion, you may need to reboot your machine.
After destroying a cluster, you can delete it from the cluster page.