Local - Kind
In this quickstart guide, you will install Substratus into a Kubernetes cluster running on your local machine and deploy an open-source LLM.
Required Tools
Make sure you have the following tools installed and up to date.
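Every step below uses docker, kind, and kubectl, and the GPU path also uses helm. A minimal shell check (a sketch, not part of the official install) confirms they are all on your PATH:
# Check that the tools used later in this guide are installed.
for tool in docker kind kubectl helm; do
  command -v "$tool" >/dev/null 2>&1 && echo "found: $tool" || echo "MISSING: $tool"
done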
Setup
Larger models require a GPU, and Kind can be configured to use one. If you have a GPU available on your machine, it's recommended to follow the GPU instructions.
- CPU
- GPU
Create a local Kubernetes cluster using Kind.
kind create cluster --name substratus --config - <<EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30080
    hostPort: 30080
EOF
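To confirm the cluster came up, you can point kubectl at the kind-substratus context that Kind creates for a cluster named substratus:
kubectl cluster-info --context kind-substratus
kubectl get nodes   # the substratus-control-plane node should report Ready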
Install the NVIDIA container toolkit for Docker by following the NVIDIA Install Guide.
The steps below are all required as part of a workaround for Kind to support GPUs. You can read more about the workaround in the Kind PR for GPU support.
Configure nvidia as the default runtime for Docker:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
Restart docker daemon:
sudo systemctl restart docker
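You can verify that nvidia is now Docker's default runtime before moving on (field name per docker info's Go template output):
docker info --format '{{.DefaultRuntime}}'   # should print: nvidia
If it prints something else, re-run the nvidia-ctk command above.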
Change accept-nvidia-visible-devices-as-volume-mounts to true in /etc/nvidia-container-runtime/config.toml:
sudo sed -i '/accept-nvidia-visible-devices-as-volume-mounts/c\accept-nvidia-visible-devices-as-volume-mounts = true' \
  /etc/nvidia-container-runtime/config.toml
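You can confirm the setting took effect before continuing:
grep accept-nvidia-visible-devices-as-volume-mounts /etc/nvidia-container-runtime/config.toml
# expected: accept-nvidia-visible-devices-as-volume-mounts = true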
Create the Kind cluster:
kind create cluster --name substratus --config - <<EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  image: kindest/node:v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72
  extraPortMappings:
  - containerPort: 30080
    hostPort: 30080
  # required for GPU workaround
  extraMounts:
  - hostPath: /dev/null
    containerPath: /var/run/nvidia-container-devices/all
EOF
Create the required symlink inside the Kind container (a workaround for an issue with the NVIDIA operator):
docker exec -ti substratus-control-plane ln -s /sbin/ldconfig /sbin/ldconfig.real
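To double-check that the symlink exists inside the node container:
docker exec -ti substratus-control-plane ls -l /sbin/ldconfig.real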
Install the NVIDIA GPU operator:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || true
helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator --set driver.enabled=false
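Once the operator's pods settle, the Kind node should advertise an allocatable GPU. One way to check (the node name follows Kind's <cluster>-control-plane convention):
kubectl get pods -n gpu-operator
kubectl get node substratus-control-plane \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'   # should print a count of 1 or more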
Install Substratus.
kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/install/kind/manifests.yaml
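To verify the install, you can check that the Substratus CRDs were registered (the substratus.ai API group is the one used by the manifests below):
kubectl get crds | grep substratus.ai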
Deploy LLM
Now that Substratus is running, let's deploy an open-source LLM.
- CPU
- GPU
Running in CPU mode only supports smaller models. You can use the (relatively tiny) Facebook OPT 125M model (125 million parameters) from HuggingFace.
apiVersion: substratus.ai/v1
kind: Model
metadata:
  namespace: default
  name: facebook-opt-125m
spec:
  image: substratusai/model-loader-huggingface
  params:
    name: facebook/opt-125m
kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/examples/facebook-opt-125m/base-model.yaml
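Since Model is a regular custom resource, you can follow the import with standard kubectl commands, for example:
kubectl get models -w                       # watch until the model finishes importing
kubectl describe model facebook-opt-125m    # inspect events if it seems stuck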
The model is now being downloaded from HuggingFace into local storage. You can apply the following Server manifest to deploy the Model once it is imported.
apiVersion: substratus.ai/v1
kind: Server
metadata:
  name: facebook-opt-125m
spec:
  image: substratusai/model-server-basaran
  model:
    name: facebook-opt-125m
kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/examples/facebook-opt-125m/base-server.yaml
Running Kind with GPU support allows you to use bigger models such as Falcon 7B, which has 7 billion parameters. The falcon-7b-instruct model requires about 3.5 GB of GPU memory when served in 4-bit mode.
apiVersion: substratus.ai/v1
kind: Model
metadata:
  name: falcon-7b-instruct
spec:
  image: substratusai/model-loader-huggingface
  params:
    name: tiiuae/falcon-7b-instruct
kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/examples/falcon-7b-instruct/base-model.yaml
The model is now being downloaded from HuggingFace into local storage. You can apply the following Server manifest to deploy the Model once it is imported.
apiVersion: substratus.ai/v1
kind: Server
metadata:
  name: falcon-7b-instruct
spec:
  image: substratusai/model-server-basaran
  model:
    name: falcon-7b-instruct
  env:
    MODEL_LOAD_IN_8BIT: "true"
  resources:
    gpu:
      count: 1
kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/examples/falcon-7b-instruct/server-4bit-any-gpu.yaml
You can check on the progress of both the Model import and the Server using the following command.
kubectl get ai
When the Server reports a Ready status, proceed to the next section to test it out.
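If you prefer to block until the Server is ready rather than polling, kubectl wait can target the Ready condition (assuming the Server exposes it as a standard status condition; substitute falcon-7b-instruct on the GPU path):
kubectl wait --for=condition=Ready server/facebook-opt-125m --timeout=10m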
Talk to your LLM!
- CPU
- GPU
The 125 million parameters of the CPU example's model are not much in the world of LLMs. Expect some wacky answers to your prompts!
In order to access the model for exploratory purposes, forward ports from within the cluster to your local machine.
kubectl port-forward service/facebook-opt-125m-server 8080:8080
In order to access the model for exploratory purposes, forward ports from within the cluster to your local machine.
kubectl port-forward service/falcon-7b-instruct-server 8080:8080
All Substratus Servers ship with an API and an interactive frontend. Open up your browser to http://localhost:8080/ and talk to your model! Alternatively, request text generation via the OpenAI-compatible HTTP API:
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook-opt-125m",
    "prompt": "Who was the first president of the United States? ",
    "max_tokens": 10
  }'
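Since the API is OpenAI-compatible, the generated text comes back under choices[0].text in the JSON response. With jq installed you can extract it directly:
curl -s http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook-opt-125m", "prompt": "Who was the first president of the United States? ", "max_tokens": 10}' \
  | jq -r '.choices[0].text'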
The process serving the model can be stopped by deleting the Server object that was applied earlier.
kubectl delete server facebook-opt-125m
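If you also want to remove the downloaded weights, the Model object can be deleted the same way:
kubectl delete model facebook-opt-125m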
Cleanup
Delete the local cluster.
kind delete cluster --name substratus
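If you followed the GPU path, you may also want to revert the host-level Docker changes made during setup (same file and flag as above; restart Docker afterwards):
sudo sed -i '/accept-nvidia-visible-devices-as-volume-mounts/c\accept-nvidia-visible-devices-as-volume-mounts = false' \
  /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker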
If you are interested in continuing your journey through Substratus, take a look at the guides to learn how to fine-tune models with your own dataset and much more!
To learn more about how Substratus works, check out the Overview page.