Constructing a Run

VESSL’s Web Console lets you set up a new machine learning run in a few steps. There are two primary ways to create a run in the Web Console: from scratch or from a template.

Create a Run from scratch

Metadata

Metadata annotates a run with additional contextual information: name, description, and tags. Note that the name is a required field and that tags are unique within the project.

Resources

You can create a run on either VESSL’s managed cluster or your custom cluster. Start by selecting a cluster.

Cluster & Resource

  • VESSL managed cluster

  • Custom cluster

Once you select VESSL’s managed cluster, you can view the list of available resources in the dropdown menu.


You also have the option to use spot instances.

Run on spot instance

Handling spot interruption and checkpointing to preserve your work.
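
As an illustration of the checkpointing idea, the sketch below saves progress to a file after every step and resumes from it after an interruption. The JSON state layout and the helper names are hypothetical; a real training job would store model and optimizer state with its framework's own tools and write to a volume that outlives the instance.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so an interruption never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path, total_steps):
    """Resume from the last completed step and checkpoint after each new one."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        # ... one unit of training work would go here ...
        state["step"] = step + 1
        save_checkpoint(path, state)
    return state
```

If the instance is reclaimed partway through, rerunning `train` with the same path picks up at the last completed step instead of starting over.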

Check out the full list of resource types and corresponding prices:

Billing information

Calculating fees according to the time and type of computational resources consumed.

Container image

The container image is typically the Docker image used for the run. The image contains all the dependencies and the environment needed to execute your machine learning model. You can use either a VESSL-managed image or your own custom image.

  • VESSL managed image

  • Custom image

Managed images are wrapper images built on top of NVIDIA GPU Cloud (NGC) images, providing an optimized environment for GPU-accelerated applications and workflows.

Task

Volumes

The volumes configuration controls how data flows into and out of the run container. Three primary volume operations (import, mount, and export) determine how data is accessed and transferred.

  • Import

  • Mount

  • Export

During the import operation, the specified data is downloaded into the run container. This is particularly useful when the container requires local access to certain data before or during execution.

  1. Code: Source code required for the run.
  2. Dataset: The dataset registered in VESSL Dataset.
  3. Model: Pre-trained ML checkpoints registered in VESSL Model Registry.
  4. VESSL Artifact: Storage managed within VESSL. You can use it as a backup volume.
  5. Object Storage: Data stored in a generic object storage.
  6. Files: Uploaded local files.

Backup and Restore Data

Run, Backup, Repeat: GPU-powered JupyterLab with VESSL Artifact

By understanding and correctly configuring these volume options, you can create a flexible and efficient data flow strategy for your VESSL Runs.

Start commands

Start commands are a collection of commands that specify how a container should begin execution after it is initialized. These commands can be grouped into two categories.


  1. Commands that pair a working directory with the command to run in the container.
  2. A wait command to introduce a delay before or between command execution.

The start command can be empty to signify an interactive run where the user is expected to manually execute commands within the container.
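
One way to picture the two categories is as an ordered list of steps, where each step is either a (working directory, command) pair or a wait. The sketch below models that in Python and renders the list as a shell script. The class names, the example paths, and the use of `sleep` for the wait step are assumptions for illustration, not a description of how VESSL executes commands internally.

```python
import shlex
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Command:
    workdir: str  # directory to run the command from
    command: str  # shell command to execute


@dataclass
class Wait:
    seconds: int  # delay before the next step


def render_script(steps: List[Union[Command, Wait]]) -> str:
    """Render the steps, in order, as lines of a shell script."""
    lines = []
    for step in steps:
        if isinstance(step, Wait):
            lines.append(f"sleep {step.seconds}")
        else:
            # Use a subshell so the directory change does not leak into later steps.
            lines.append(f"(cd {shlex.quote(step.workdir)} && {step.command})")
    return "\n".join(lines)


script = render_script([
    Command(workdir="/code", command="pip install -r requirements.txt"),
    Wait(seconds=10),
    Command(workdir="/code", command="python main.py"),
])
print(script)
```

An empty list renders to an empty script, which corresponds to the interactive case where you run commands by hand.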

Interactive

Interactive specifies whether the container allows interactive communication with the user. It is particularly useful for debugging, data analysis, or running services that require user interaction. By default, an interactive run supports JupyterLab and SSH. Both Max runtime and Jupyter idle timeout help you manage resource usage and costs. You can also expose multiple types of custom services via specified ports.

Port

Port configuration is a list of maps that specifies the ports a particular application or service should expose. Each map in the list defines the attributes of one port, such as its number, name, and type.
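
To make the shape concrete, here is a minimal sketch of a port list as plain Python data, with a small validation helper. The key names (`name`, `type`, `port`), the allowed types, and the example services are assumptions for illustration; check the console for the exact options it offers.

```python
ALLOWED_TYPES = {"http", "tcp"}  # assumed set of port types, for illustration only


def validate_ports(ports):
    """Check that each port map has a unique name, a known type, and a valid number."""
    seen_names = set()
    seen_numbers = set()
    for entry in ports:
        name, kind, number = entry["name"], entry["type"], entry["port"]
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"{name}: unknown port type {kind!r}")
        if not isinstance(number, int) or not 1 <= number <= 65535:
            raise ValueError(f"{name}: port number {number!r} is out of range")
        if name in seen_names or number in seen_numbers:
            raise ValueError(f"{name}: duplicate port name or number")
        seen_names.add(name)
        seen_numbers.add(number)
    return ports


# Hypothetical services exposed by a run.
ports = validate_ports([
    {"name": "tensorboard", "type": "http", "port": 6006},
    {"name": "api", "type": "http", "port": 8000},
])
```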

Variables

Environment variables

You can set environment variables as key-value pairs. A typical machine learning run will include hyperparameters such as learning_rate and optimizer. You can also use them at runtime by passing them to the start command as follows.

python main.py \
  --learning_rate $learning_rate \
  --optimizer $optimizer
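
On the receiving side, main.py has to accept these flags. The following is a minimal sketch, assuming the script uses Python's standard `argparse` module and that the `learning_rate` and `optimizer` environment variables are visible to the process. The helper name `parse_hyperparameters` and the literal fallback values are illustrative, not part of VESSL.

```python
import argparse
import os


def parse_hyperparameters(argv=None, env=None):
    """Read hyperparameters from CLI flags, falling back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Example training entry point")
    # Defaults come from the environment variables configured for the run;
    # the literal fallbacks ("0.001", "adam") are illustrative assumptions.
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=float(env.get("learning_rate", "0.001")),
    )
    parser.add_argument(
        "--optimizer",
        type=str,
        default=env.get("optimizer", "adam"),
    )
    return parser.parse_args(argv)
```

Calling `parse_hyperparameters()` with no arguments reads the process's own command line, so a flag passed in the start command takes precedence over the environment variable of the same name.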

If you have sensitive information, such as API keys or passwords, that you need to include in your environment, you can mark these variables as secrets. Their values will never be shown in the UI, which adds an extra layer of security.

Advanced Settings


Service account name

A service account is a type of non-human account that, in Kubernetes, provides a distinct identity in a cluster. It is useful for implementing identity-based security policies. Create one in your Kubernetes cluster and specify its name here.

Termination protection

Checking the termination protection option puts the run in an idle state once it finishes, so you can still access the container of the finished run.

Create a Run from template

You can start a new run from a pre-configured template as a baseline. Instead of setting up each parameter and configuration from scratch, you use a template that already has the essential settings defined. This can significantly speed up the deployment and testing phases of your projects by reusing configurations that are known to work well for specific use cases.


Templates typically come in YAML format. You can further customize them to fit your specific requirements, which makes them a versatile tool for repetitive or complex tasks.

For more advanced configurations and examples, you can visit VESSL Hub. The hub offers a variety of YAML examples that you can use as references.
