Local - Kind

In this quickstart guide, you will install Substratus into a Kubernetes cluster running on your local machine and deploy an Open Source LLM.

Required Tools

Make sure you have the following tools installed and up to date: Docker, kind, and kubectl.

Setup

Larger models require a GPU, and Kind can work with GPUs. If you have a GPU available on your machine, it's recommended to use it.

Create a local Kubernetes cluster using Kind.

kind create cluster --name substratus --config - <<EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30080
    hostPort: 30080
EOF

Install Substratus.

kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/install/kind/manifests.yaml

Deploy LLM

Now that Substratus is running, let's deploy an Open Source LLM.

Running in CPU-only mode supports only smaller models. You can use the (relatively tiny) Facebook OPT 125M model (125 million parameters) from HuggingFace.

apiVersion: substratus.ai/v1
kind: Model
metadata:
  namespace: default
  name: facebook-opt-125m
spec:
  image: substratusai/model-loader-huggingface
  params:
    name: facebook/opt-125m

kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/examples/facebook-opt-125m/base-model.yaml
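The same Model shape can point at other HuggingFace repositories. As a sketch (assuming the huggingface loader accepts any public repo id via `params.name`; `facebook/opt-350m` is only an illustrative substitution, not an example shipped with Substratus):

```yaml
apiVersion: substratus.ai/v1
kind: Model
metadata:
  namespace: default
  name: facebook-opt-350m       # illustrative: a slightly larger OPT variant
spec:
  image: substratusai/model-loader-huggingface
  params:
    name: facebook/opt-350m     # assumed: any public HuggingFace repo id
```

Keep in mind that CPU-only mode supports only smaller models, so stay in the hundreds-of-millions-of-parameters range.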

The model is now being downloaded from HuggingFace into local storage. You can apply the following Server manifest to deploy the Model once it is imported.

apiVersion: substratus.ai/v1
kind: Server
metadata:
  name: facebook-opt-125m
spec:
  image: substratusai/model-server-basaran
  model:
    name: facebook-opt-125m

kubectl apply -f https://raw.githubusercontent.com/substratusai/substratus/main/examples/facebook-opt-125m/base-server.yaml

You can check on the progress of both processes using the following command.

kubectl get ai

When the Server reports a Ready status, proceed to the next section to test it out.

Talk to your LLM!

note

The 125 million parameters used in the CPU example are not much in the world of LLMs. Expect some wacky answers to your prompts!

In order to access the model for exploratory purposes, forward ports from within the cluster to your local machine.

kubectl port-forward service/facebook-opt-125m-server 8080:8080

All Substratus Servers ship with an API and an interactive frontend. Open your browser to http://localhost:8080/ and talk to your model! Alternatively, request text generation via the OpenAI-compatible HTTP API:

curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook-opt-125m",
    "prompt": "Who was the first president of the United States? ",
    "max_tokens": 10
  }'
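The response follows the OpenAI completions schema, with the generated text nested under `choices`. As a quick sketch of pulling that text out of the JSON (using a hypothetical sample response, since the real output varies from run to run):

```shell
# Hypothetical sample response in the OpenAI completions shape; a real
# response's "text" field will differ from run to run.
response='{"object":"text_completion","choices":[{"index":0,"text":" George Washington."}]}'

# Extract the generated text using Python's stdlib json module
# (no extra tools like jq required).
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["text"])'
```

In a script you would pipe the `curl` output into the same extraction command instead of the sample variable.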

The process serving the model can be stopped by deleting the Server object that was applied earlier.

kubectl delete server facebook-opt-125m

Cleanup

Delete the local cluster.

kind delete cluster --name substratus

If you are interested in continuing your journey through Substratus, take a look at the guides to learn how to finetune models with your own dataset and much more!

To learn more about how Substratus works, check out the Overview page.