
GCP - Google Cloud

In this quickstart guide, you will create a Kubernetes cluster on Google Cloud and install Substratus. By the end, you will have the open source Falcon 7B Instruct model deployed and ready to interact with.


Support for AWS (GitHub Issue #12) and Azure (GitHub Issue #63) is planned. Give those issues a thumbs up if you would like to see them prioritized.

Required Tools

Make sure you have the following tools installed and up to date.


You will need a Google Cloud Platform project with billing enabled.

Set your current project to the project you want to use for Substratus:

gcloud config set project <your-project-id>
export PROJECT_ID=$(gcloud config get project)
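Because the project needs billing enabled, it can help to check that before running the setup script. The following is a minimal sketch, assuming the gcloud billing projects describe command and its billingEnabled field. The billing_enabled function name is hypothetical, and older gcloud releases may only offer this command under gcloud beta billing.

```shell
# Hypothetical helper: succeed only if billing is enabled on the given project.
# Relies on `gcloud billing projects describe`; older gcloud releases may need
# `gcloud beta billing projects describe` instead.
billing_enabled() {
  status=$(gcloud billing projects describe "$1" --format='value(billingEnabled)') || return 1
  # gcloud prints the boolean as True/False; accept lowercase just in case.
  case "$status" in
    True | true) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: billing_enabled "$PROJECT_ID" || echo "Enable billing for $PROJECT_ID first" >&2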

Use the convenience script to create a GKE cluster with its supporting infrastructure (buckets, service accounts, image registries) and install the Substratus operator:

bash <(curl

Once the script finishes, kubectl should point at the new Substratus cluster, and the Substratus operator is installed in the substratus namespace.
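To confirm that kubectl points at the new cluster and that the operator is present, you can inspect the current kubectl context and the pods in the substratus namespace. This is a minimal sketch: check_substratus_install is a hypothetical wrapper around two standard kubectl commands, and it assumes the operator runs as pods in that namespace.

```shell
# Hypothetical helper: show which cluster kubectl targets and list the pods
# in the namespace the setup script installs the operator into.
check_substratus_install() {
  # Fails (non-zero) if no kubectl context is configured.
  ctx=$(kubectl config current-context) || return 1
  echo "Current kubectl context: ${ctx}"
  kubectl get pods --namespace substratus
}
```

The context name should refer to the GKE cluster the script just created, and the pod listing should show the operator in a Running state.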

Deploy LLM

The following Model object will import the medium-sized Falcon 7B Instruct model (7 billion parameters) from HuggingFace.

apiVersion: substratus.ai/v1
kind: Model
metadata:
  name: falcon-7b-instruct
spec:
  image: substratusai/model-loader-huggingface
  params:
    name: tiiuae/falcon-7b-instruct
kubectl apply -f

The model is now being downloaded from HuggingFace into the Substratus GCS bucket, which takes about 5 minutes. In the meantime, you can apply the following Server object; it starts serving the Model once loading completes.

apiVersion: substratus.ai/v1
kind: Server
metadata:
  name: falcon-7b-instruct
spec:
  image: substratusai/model-server-basaran
  model:
    name: falcon-7b-instruct
  resources:
    gpu:
      type: nvidia-l4
      count: 1
kubectl apply -f

You can check on the progress of both processes using the following command.

kubectl get ai

When the Server reports a Ready status, proceed to the next section to test it out.
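Instead of re-running kubectl get ai by hand, you can poll until the Server is ready. This is a sketch under an assumption: it expects the Server to expose a boolean at .status.ready, so inspect the object with kubectl get server falcon-7b-instruct -o yaml to confirm that field exists before relying on it. The wait_for_ready name is hypothetical.

```shell
# Hypothetical helper: poll a Substratus object until it reports ready.
# ASSUMPTION: the object exposes a boolean at .status.ready.
# Arguments: kind, name, max attempts (default 60), seconds between attempts (default 10).
wait_for_ready() {
  kind=$1
  name=$2
  tries=${3:-60}
  interval=${4:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    ready=$(kubectl get "$kind" "$name" -o jsonpath='{.status.ready}' 2>/dev/null)
    if [ "$ready" = "true" ]; then
      echo "$kind/$name is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $kind/$name" >&2
  return 1
}
```

For example, wait_for_ready server falcon-7b-instruct 90 10 polls every 10 seconds for roughly 15 minutes before giving up.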

Talk to your LLM!

To access the model for exploration, forward a port from the cluster to your local machine:

kubectl port-forward service/falcon-7b-instruct-server 8080:8080
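kubectl port-forward keeps running in the foreground, so the usual approach is a second terminal. If you would rather stay in one shell, a minimal sketch runs the same command in the background and stops it when the shell exits. The start_port_forward and PORT_FORWARD_PID names are hypothetical, and the EXIT trap replaces any EXIT trap already set in that shell.

```shell
# Hypothetical helper: run the port-forward in the background and record its PID.
start_port_forward() {
  kubectl port-forward service/falcon-7b-instruct-server 8080:8080 >/dev/null 2>&1 &
  PORT_FORWARD_PID=$!
  # Stop the background port-forward when this shell exits.
  trap 'kill "$PORT_FORWARD_PID" 2>/dev/null' EXIT
}
```

After calling start_port_forward, give the tunnel a few seconds to come up before sending requests; kill "$PORT_FORWARD_PID" stops it early.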

All Substratus Servers ship with an API and an interactive frontend. Open http://localhost:8080/ in your browser and talk to your model! Alternatively, request text generation through the OpenAI-compatible HTTP API:

curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "falcon-7b-instruct",
    "prompt": "Who was the first president of the United States? ",
    "max_tokens": 10
  }'
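Writing the JSON body by hand gets error-prone once prompts contain quotes. This sketch wraps the same request in two hypothetical shell functions: completion_body builds the body, escaping backslashes and double quotes in the prompt (it assumes the prompt contains no newlines or other control characters), and llm_complete pipes that body to curl, which reads it from stdin via --data @-. The function is deliberately not named complete, which would shadow the bash builtin of that name.

```shell
# Hypothetical helper: print an OpenAI-style completions request body.
# Arguments: model, prompt, max_tokens (default 10).
# Escapes backslashes first, then double quotes; assumes no newlines in the prompt.
completion_body() {
  escaped=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %s}\n' "$1" "$escaped" "${3:-10}"
}

# Hypothetical helper: send a prompt to the port-forwarded Server.
# Arguments: prompt, max_tokens (default 10).
llm_complete() {
  completion_body falcon-7b-instruct "$1" "${2:-10}" |
    curl -sS http://localhost:8080/v1/completions \
      -H "Content-Type: application/json" \
      --data @-
}
```

For example: llm_complete "Who was the first president of the United States? " 10. If jq is installed, appending | jq -r '.choices[0].text' prints only the generated text, assuming the response follows the OpenAI completions format.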

To stop serving the model, delete the Server object you applied earlier:

kubectl delete server falcon-7b-instruct


Delete all GCP infrastructure.

bash <(curl

If you are interested in continuing your journey through Substratus, take a look at the guides to learn how to finetune models with your own dataset and much more!

To learn more about how Substratus works, check out the Overview page.