
Cerebras LLM Provider

The Cerebras provider connects Beluga AI to Cerebras’ inference platform, which uses wafer-scale engine (WSE) hardware for extremely fast inference. Cerebras exposes an OpenAI-compatible API, so this provider supports all standard features including streaming, tool calling, and structured output.

Choose Cerebras when inference speed is the priority for Llama models. Its wafer-scale hardware delivers high tokens-per-second throughput, which suits latency-critical applications and high-volume processing with open-source models.
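
To make "OpenAI-compatible" concrete, the sketch below builds (but does not send) the kind of HTTP request such an API expects. It is not the provider's implementation: the `chatMessage` and `chatRequest` types and the `newChatRequest` helper are hypothetical names introduced for illustration, and the `/chat/completions` path and bearer-token header are assumed from the OpenAI chat completions convention.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest are hypothetical local types that model only
// a small part of an OpenAI-style chat completion request.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream,omitempty"`
}

// newChatRequest builds a POST request against baseURL + "/chat/completions"
// with a bearer token. The request is returned unsent.
func newChatRequest(baseURL, apiKey string, body chatRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/chat/completions", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("https://api.cerebras.ai/v1", "csk-placeholder", chatRequest{
		Model: "llama-3.3-70b",
		Messages: []chatMessage{
			{Role: "system", Content: "You are a helpful assistant."},
			{Role: "user", Content: "What is the capital of France?"},
		},
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

In normal use the provider handles this wire format for you; the sketch only shows why the same feature set (streaming, tool calling, structured output) carries over from other OpenAI-compatible backends.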

Installation:

go get github.com/lookatitude/beluga-ai/llm/providers/cerebras

Configuration:

| Field | Required | Default | Description |
|---|---|---|---|
| Model | Yes | | Model ID (e.g. "llama-3.3-70b") |
| APIKey | Yes | | Cerebras API key (csk-...) |
| BaseURL | No | https://api.cerebras.ai/v1 | Override API endpoint |
| Timeout | No | 30s | Request timeout |

Environment variables:

| Variable | Maps to |
|---|---|
| CEREBRAS_API_KEY | APIKey |
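
The defaults and required fields listed above can be sketched as a small resolution step. This is an illustration only: `providerConfig` and `resolve` are hypothetical local names, not the library's `config.ProviderConfig` or its loading logic, and the fallback from an empty `APIKey` to `CEREBRAS_API_KEY` is an assumption about how the mapping might be applied.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// providerConfig is a hypothetical local mirror of the documented fields.
type providerConfig struct {
	Model   string
	APIKey  string
	BaseURL string
	Timeout time.Duration
}

// resolve fills in the documented defaults and rejects missing required
// fields. getenv is injected so the function can be exercised without
// touching the process environment.
func resolve(cfg providerConfig, getenv func(string) string) (providerConfig, error) {
	if cfg.APIKey == "" {
		// Assumed behavior: fall back to the environment variable.
		cfg.APIKey = getenv("CEREBRAS_API_KEY")
	}
	if cfg.Model == "" {
		return cfg, errors.New("cerebras: Model is required")
	}
	if cfg.APIKey == "" {
		return cfg, errors.New("cerebras: APIKey is required (set CEREBRAS_API_KEY)")
	}
	if cfg.BaseURL == "" {
		cfg.BaseURL = "https://api.cerebras.ai/v1"
	}
	if cfg.Timeout == 0 {
		cfg.Timeout = 30 * time.Second
	}
	return cfg, nil
}

func main() {
	cfg, err := resolve(providerConfig{Model: "llama-3.3-70b"}, os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.BaseURL, cfg.Timeout)
}
```

Validating the configuration before the first request surfaces a missing key at startup instead of on the first call.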

Basic usage:

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/config"
	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"

	// Blank import so the "cerebras" provider is available to llm.New.
	_ "github.com/lookatitude/beluga-ai/llm/providers/cerebras"
)

func main() {
	// Create the model by provider name.
	model, err := llm.New("cerebras", config.ProviderConfig{
		Model:  "llama-3.3-70b",
		APIKey: os.Getenv("CEREBRAS_API_KEY"),
	})
	if err != nil {
		log.Fatal(err)
	}

	msgs := []schema.Message{
		schema.NewSystemMessage("You are a helpful assistant."),
		schema.NewHumanMessage("What is the capital of France?"),
	}

	// Send the conversation and print the full response.
	resp, err := model.Generate(context.Background(), msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Text())
}

Streaming:

// Print each chunk as it arrives; stop on the first error.
for chunk, err := range model.Stream(context.Background(), msgs) {
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(chunk.Delta)
}
fmt.Println()
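
For readers curious about what a stream looks like on the wire, here is a minimal sketch that parses OpenAI-style server-sent events and concatenates the text deltas. It assumes the endpoint emits `data: {...}` lines terminated by `data: [DONE]`, as OpenAI-compatible APIs conventionally do; `streamChunk`, `collectDeltas`, `sampleStream`, and `demo` are hypothetical names, and the provider's own parsing may differ.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamChunk models only the fields of an OpenAI-style chunk used here.
type streamChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// collectDeltas reads server-sent events from r and joins the content
// deltas until it sees the "[DONE]" sentinel or the input ends.
func collectDeltas(r io.Reader) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		data, ok := strings.CutPrefix(line, "data:")
		if !ok {
			continue // skip blank lines and other SSE fields
		}
		data = strings.TrimSpace(data)
		if data == "[DONE]" {
			break
		}
		var chunk streamChunk
		if err := json.Unmarshal([]byte(data), &chunk); err != nil {
			return out.String(), fmt.Errorf("decode chunk: %w", err)
		}
		for _, c := range chunk.Choices {
			out.WriteString(c.Delta.Content)
		}
	}
	return out.String(), sc.Err()
}

// sampleStream is made-up input in the assumed wire format.
const sampleStream = `data: {"choices":[{"delta":{"content":"Par"}}]}

data: {"choices":[{"delta":{"content":"is."}}]}

data: [DONE]
`

// demo runs collectDeltas over the sample input.
func demo() (string, error) {
	return collectDeltas(strings.NewReader(sampleStream))
}

func main() {
	text, err := demo()
	if err != nil {
		fmt.Println("stream error:", err)
		return
	}
	fmt.Println(text)
}
```

The `model.Stream` iterator hides this framing, so application code only deals with chunk values and errors.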
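
Tool calling:

// tools holds the tool definitions to expose to the model and ctx is a
// context.Context; both are defined elsewhere.
modelWithTools := model.BindTools(tools)
resp, err := modelWithTools.Generate(ctx, msgs, llm.WithToolChoice(llm.ToolChoiceAuto))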

Structured output:

// Ask the model to reply with a JSON object.
resp, err := model.Generate(ctx, msgs,
	llm.WithResponseFormat(llm.ResponseFormat{Type: "json_object"}),
)
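
A JSON-object response still has to be decoded and checked by the caller. The sketch below shows one way to do that with `encoding/json`, using a literal string as a stand-in for `resp.Text()`. The `capitalAnswer` shape and `parseAnswer` helper are hypothetical; in OpenAI-style APIs the `json_object` mode conventionally promises valid JSON but not a particular schema, so the field check is done by hand here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capitalAnswer is a hypothetical shape that the prompt would ask the
// model to produce.
type capitalAnswer struct {
	Country string `json:"country"`
	Capital string `json:"capital"`
}

// parseAnswer decodes the model's text and verifies the field we rely on.
func parseAnswer(text string) (capitalAnswer, error) {
	var a capitalAnswer
	if err := json.Unmarshal([]byte(text), &a); err != nil {
		return a, fmt.Errorf("model returned invalid JSON: %w", err)
	}
	if a.Capital == "" {
		return a, fmt.Errorf("model response is missing %q", "capital")
	}
	return a, nil
}

func main() {
	// Stand-in for resp.Text(); a real response comes from model.Generate.
	text := `{"country": "France", "capital": "Paris"}`

	a, err := parseAnswer(text)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s -> %s\n", a.Country, a.Capital)
}
```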

Generation options:

// Options apply to this call only.
resp, err := model.Generate(ctx, msgs,
	llm.WithTemperature(0.7),
	llm.WithMaxTokens(2048),
	llm.WithTopP(0.9),
	llm.WithStopSequences("END"),
)
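
The `With...` arguments follow Go's functional options pattern. The sketch below shows how that pattern generally works, using hypothetical local stand-ins (`genOptions`, `option`, `withTemperature`, and so on); it does not reproduce the `llm` package's actual types or defaults.

```go
package main

import "fmt"

// genOptions is a hypothetical bag of per-call settings.
type genOptions struct {
	Temperature float64
	MaxTokens   int
	TopP        float64
	Stop        []string
}

// option mutates a genOptions value; each With* helper returns one.
type option func(*genOptions)

func withTemperature(t float64) option { return func(o *genOptions) { o.Temperature = t } }
func withMaxTokens(n int) option       { return func(o *genOptions) { o.MaxTokens = n } }
func withTopP(p float64) option        { return func(o *genOptions) { o.TopP = p } }

func withStopSequences(s ...string) option {
	return func(o *genOptions) { o.Stop = append(o.Stop, s...) }
}

// applyOptions starts from zero values and applies each option in order,
// so later options override earlier ones.
func applyOptions(opts ...option) genOptions {
	var o genOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := applyOptions(
		withTemperature(0.7),
		withMaxTokens(2048),
		withTopP(0.9),
		withStopSequences("END"),
	)
	fmt.Printf("%+v\n", o)
}
```

The pattern keeps the `Generate` signature stable: new settings can be added as new option functions without breaking existing callers.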

Error handling:

resp, err := model.Generate(ctx, msgs)
if err != nil {
	log.Fatal(err)
}

Direct construction:

import "github.com/lookatitude/beluga-ai/llm/providers/cerebras"

// Construct the provider directly instead of looking it up by name.
model, err := cerebras.New(config.ProviderConfig{
	Model:  "llama-3.3-70b",
	APIKey: os.Getenv("CEREBRAS_API_KEY"),
})

Available models:

| Model ID | Description |
|---|---|
| llama-3.3-70b | Llama 3.3 70B, highest quality available |
| llama-3.1-8b | Llama 3.1 8B, ultra-fast small model |

Refer to Cerebras’ documentation for the latest model list.