You can integrate an AWS cluster in your AWS account and connect it to VESSL.

Integrating with VESSL creates the following resources in your AWS account:

  • S3 Bucket: A bucket for storing configuration, state, and data.
  • EKS Cluster: An AWS-managed Kubernetes cluster for running ML workloads.
  • EKS Node Groups: Autoscaling groups for selected resource types.

Step-by-Step Guide

1. Install Terraform and AWS CLI

VESSL uses Terraform to create an EKS cluster and EKS node groups, and to install the required Kubernetes components.
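Before moving on, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch, not part of VESSL's tooling; the helper name check_tools is hypothetical and introduced only for illustration.

```shell
# Report which of the given command-line tools are available on PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() {
  _missing=0
  for _tool in "$@"; do
    if command -v "$_tool" >/dev/null 2>&1; then
      echo "found: $_tool"
    else
      echo "missing: $_tool"
      _missing=1
    fi
  done
  return "$_missing"
}

# Terraform and the AWS CLI are the two tools this guide relies on.
check_tools terraform aws || echo "Install the missing tools before continuing." >&2
```

Once both tools are found, running aws sts get-caller-identity shows which AWS account and identity the CLI is currently using, which is a quick way to confirm that resources will be created in the account you expect.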

2. Configure the cluster

First, clone VESSL’s cloud integration Terraform code from GitHub.

git clone https://github.com/vessl-ai/vessl-cloud-integration
cd vessl-cloud-integration/examples/aws-eks-existing-vpc

Using the VESSL CLI, you can configure the Terraform variables and the Terraform backend.

pip install vessl
vessl cluster create-config aws

The command generates two config files and a node group definition file, both in your current directory and in the S3 bucket.

  1. terraform.tfbackend: This file configures Terraform’s backend storage.
  2. terraform.tfvars: This file specifies the variables for your cluster configuration.
  3. nodes.tf: This file defines the node groups for your resource types.
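Since the Terraform steps that follow depend on these three files, a quick presence check can catch a run from the wrong directory. This is a sketch only; the function name check_generated_files is hypothetical, and the file names are the ones listed above.

```shell
# List any of the three generated files that are absent from a directory.
# Returns 0 when all are present, 1 otherwise.
check_generated_files() {
  _dir="${1:-.}"
  _status=0
  for _file in terraform.tfbackend terraform.tfvars nodes.tf; do
    if [ ! -f "$_dir/$_file" ]; then
      echo "missing: $_file"
      _status=1
    fi
  done
  return "$_status"
}

# Check the current directory; print a note if anything is absent.
check_generated_files . || echo "Some generated files are missing from this directory." >&2
```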

3. Apply Terraform

Initialize your Terraform state:

terraform init -backend-config="terraform.tfbackend"

Applying the Terraform configuration creates the actual resources:

terraform apply -var-file="terraform.tfvars"
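If you would rather review the changes before anything is created, one option is to save a plan and then apply exactly that plan. The sketch below uses Terraform's standard plan -out flag; the function names, the TERRAFORM override variable, and the plan file name tfplan are assumptions made for illustration and are not part of VESSL's workflow.

```shell
# Command used to invoke Terraform; can be overridden, e.g. for a dry run.
TERRAFORM="${TERRAFORM:-terraform}"

# Write a plan for the given variable file to a plan file.
tf_plan() {
  _var_file="${1:-terraform.tfvars}"
  _plan_file="${2:-tfplan}"
  "$TERRAFORM" plan -var-file="$_var_file" -out="$_plan_file"
}

# Apply a previously saved plan. The variables are already recorded
# in the plan file, so no -var-file flag is passed here.
tf_apply_plan() {
  _plan_file="${1:-tfplan}"
  "$TERRAFORM" apply "$_plan_file"
}
```

A typical sequence would be to run tf_plan, read through the listed resources, and then run tf_apply_plan. Applying a saved plan does not prompt for confirmation, so the review has to happen between the two calls.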

The installation takes about 20 to 30 minutes. Keep your internet connection active until it finishes.

Once the cluster is installed, you can find it on the cluster page.

Destroy and delete the cluster

To destroy all resources created by VESSL, including the cluster, run:

terraform destroy -var-file="terraform.tfvars"

If the config files are missing from your local machine, you can download them again and start from scratch.

git clone https://github.com/vessl-ai/vessl-cloud-integration
cd vessl-cloud-integration/examples/aws-eks-existing-vpc
vessl cluster get-config [cluster_name]
terraform init -backend-config="terraform.tfbackend"
terraform destroy -var-file="terraform.tfvars"

After destroying a cluster, you can delete it from the cluster page.