
2 posts tagged with "k8s"


· 3 min read
Mistral 7B on K8s with Helm

Learn how to use the text-generation-inference (TGI) Helm Chart to quickly deploy Mistral 7B Instruct on your K8s cluster.

Add the Substratus.ai Helm repo:

helm repo add substratusai https://substratusai.github.io/helm

This command adds a new Helm repository, making the text-generation-inference Helm chart available for installation.

Create a configuration file named values.yaml. This file will contain the necessary settings for your deployment. Here’s an example of what the content should look like:

model: mistralai/Mistral-7B-Instruct-v0.1
# resources: # optional, override if you need more than 1 GPU
#   limits:
#     nvidia.com/gpu: 1
# nodeSelector: # optional, can be used to target specific GPUs
#   cloud.google.com/gke-accelerator: nvidia-l4

In this configuration file, you are specifying the model to be deployed and optionally setting resource limits or targeting specific nodes based on your requirements.
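For example, if you want the pod to request two GPUs and schedule onto L4 nodes, the uncommented override would look like this (a sketch based on the commented template above; adjust the GPU count and accelerator label to your cluster):

```yaml
model: mistralai/Mistral-7B-Instruct-v0.1
resources:
  limits:
    nvidia.com/gpu: 2
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-l4
```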

With your configuration file ready, you can now deploy Mistral 7B Instruct using Helm:

helm install mistral-7b-instruct substratusai/text-generation-inference \
-f values.yaml

This command initiates the deployment, creating a Kubernetes Deployment and Service based on the settings defined in your values.yaml file.

After initiating the deployment, it's important to ensure that everything is running as expected. Run the following command to get detailed information about the newly created pod:

kubectl describe pod -l app.kubernetes.io/instance=mistral-7b-instruct

This will display various details about the pod, helping you to confirm that it has been successfully created and is in the right state. Note that depending on your cluster's setup, you might need to wait for the cluster autoscaler to provision additional resources if necessary.

Once the pod is running, check the logs to ensure that the model is initializing properly:

kubectl logs -f -l app.kubernetes.io/instance=mistral-7b-instruct

The server first downloads the model weights, and after a few minutes you should see a message that looks like this:

Invalid hostname, defaulting to 0.0.0.0

This is expected and means it's now serving on host 0.0.0.0.

By default, the model is only accessible within the Kubernetes cluster. To access it from your local machine, set up a port forward:

kubectl port-forward deployments/mistral-7b-instruct-text-generation-inference 8080:8080

This command maps port 8080 on your local machine to port 8080 on the deployed pod, allowing you to interact with the model directly.

With the service exposed, you can now run inference tasks. To explore the available API endpoints and their usage, visit the TGI API documentation at http://localhost:8080/docs.

Here’s an example of how to use curl to run an inference task:

curl 127.0.0.1:8080/generate -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- << 'EOF' | jq -r '.generated_text'
{
  "inputs": "<s>[INST] Write a K8s YAML file to create a pod that deploys nginx[/INST]",
  "parameters": {"max_new_tokens": 400}
}
EOF

In this example, we are instructing the model to generate a Kubernetes YAML file for deploying an Nginx pod. The prompt includes specific tokens that the Mistral 7B Instruct model recognizes, ensuring accurate and context-aware responses.

The prompt we are using starts with the <s> token, which indicates the beginning of a sequence. The [INST] token tells Mistral 7B Instruct that what follows is an instruction. The Mistral 7B Instruct model was fine-tuned with this prompt template, so it's important to reuse that same template.
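Rather than hand-writing the template each time, you can build the prompt and request body programmatically. A minimal sketch in Python (the helper function name is my own; only the <s>/[INST] template and the /generate payload shape come from the example above):

```python
import json

def mistral_instruct_prompt(instruction: str) -> str:
    # Wrap the instruction in the template Mistral 7B Instruct was fine-tuned on:
    # <s> marks the beginning of a sequence, [INST]...[/INST] delimits the instruction.
    return f"<s>[INST] {instruction}[/INST]"

prompt = mistral_instruct_prompt("Write a K8s YAML file to create a pod that deploys nginx")
payload = json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 400}})
print(payload)
# POST this payload to http://localhost:8080/generate with
# Content-Type: application/json (the same request the curl example makes).
```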

The response is quite impressive: it returned a valid K8s YAML manifest along with instructions on how to apply it.

Need help? Want to see other models or other serving frameworks?
Join our Discord and ask me directly:


· 3 min read
kubectl notebook

Excited to announce the K8s YAML dataset containing 276,520 valid K8s YAML files.

HuggingFace Dataset: https://huggingface.co/datasets/substratusai/the-stack-yaml-k8s
Source code: https://github.com/substratusai/the-stack-yaml-k8s

Why?

  • This dataset can be used to fine-tune an LLM directly
  • New datasets can be created from this dataset, such as a K8s instruct dataset (coming soon!)
  • What's your use case?

How?

Getting a large number of K8s YAML manifests wasn't easy. My initial approach was to scrape the YAML example files from the Kubernetes website; however, quantity was the issue, since I could only scrape about ~250 YAML examples that way.

Luckily, I came across the-stack dataset which is a cleaned dataset of code on GitHub. The dataset is nicely structured by language and I noticed that yaml was one of the languages in the dataset.

Install libraries used in this blog post:

pip3 install datasets kubernetes-validate

Let's load the the-stack dataset but only the YAML files (takes about 200GB of disk space):

from datasets import load_dataset
ds = load_dataset("bigcode/the-stack", data_dir="data/yaml", split="train")

Once loaded there are 13,439,939 YAML files in ds.

You can check the content of one of the files:

print(ds[0]["content"])

You probably noticed that this ain't a K8s YAML file, so next we need to filter these 13 million YAML files and only keep the ones that contain valid K8s YAML.

The approach I took was to use the kubernetes-validate OSS library. It turned out that YAML parsing was too slow, so I added a 10x speed improvement by eagerly returning early whenever neither "kind" nor "Kind" appears as a substring in the YAML file.

Here is the validate function that takes the yaml_content as a string and returns if the content was valid K8s YAML or not:

import kubernetes_validate
import yaml

def validate(yaml_content: str) -> bool:
    try:
        # Speed optimization to return early without having to load YAML
        if "kind" not in yaml_content and "Kind" not in yaml_content:
            return False
        data = yaml.safe_load(yaml_content)
        kubernetes_validate.validate(data, '1.22', strict=True)
        return True
    except Exception:
        return False

validate(ds[0]["content"])

Now all that's needed is to filter out all YAML files that aren't valid:

import os

valid_k8s = ds.filter(lambda batch: [validate(x) for x in batch["content"]],
                      num_proc=os.cpu_count(), batched=True)

There were 276,520 YAML files left in valid_k8s. You can print one again to see:

print(valid_k8s[0]["content"])
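As a quick sanity check on how selective the filter was, the two counts from the run above work out to roughly 2% of the YAML files surviving:

```python
total = 13_439_939  # YAML files in the-stack's yaml subset
valid = 276_520     # files that passed kubernetes-validate
retention = valid / total * 100
print(f"{retention:.2f}% of the YAML files were valid K8s manifests")
# prints: 2.06% of the YAML files were valid K8s manifests
```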

You can upload the dataset back to HuggingFace by running:

valid_k8s.push_to_hub("substratusai/the-stack-yaml-k8s")

What's next?

Creating a new dataset called K8s Instruct that also provides a prompt for each YAML file.

Support the project by adding a star on GitHub! ❤️