# Groq LLM Provider
The Groq provider connects Beluga AI to Groq’s inference platform, which uses custom LPU (Language Processing Unit) hardware for extremely fast token generation. Groq exposes an OpenAI-compatible API, so this provider supports all standard features including streaming, tool calling, and structured output.
Choose Groq when inference latency is your primary concern. Groq’s LPU hardware delivers some of the fastest token generation available, making it well suited for interactive applications, real-time agents, and latency-sensitive pipelines. Groq hosts popular open-source models, including Llama and Mixtral.
## Installation

```sh
go get github.com/lookatitude/beluga-ai/llm/providers/groq
```

## Configuration
| Field | Required | Default | Description |
|---|---|---|---|
| `Model` | Yes | — | Model ID (e.g. `"llama-3.3-70b-versatile"`) |
| `APIKey` | Yes | — | Groq API key (`gsk_...`) |
| `BaseURL` | No | `https://api.groq.com/openai/v1` | Override the API endpoint |
| `Timeout` | No | `30s` | Request timeout |
Environment variables:

| Variable | Maps to |
|---|---|
| `GROQ_API_KEY` | `APIKey` |
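Putting the optional fields together, as a sketch; the Go type of `Timeout` is assumed here to be a `time.Duration`, based on the `30s` default:

```go
cfg := config.ProviderConfig{
	Model:   "llama-3.3-70b-versatile",
	APIKey:  os.Getenv("GROQ_API_KEY"),
	BaseURL: "https://api.groq.com/openai/v1", // the default; change only for proxies or gateways
	Timeout: 60 * time.Second,                 // assumed time.Duration; defaults to 30s
}
model, err := llm.New("groq", cfg)
```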
## Basic Usage

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/config"
	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"

	// The blank import registers the Groq provider with llm.New.
	_ "github.com/lookatitude/beluga-ai/llm/providers/groq"
)

func main() {
	model, err := llm.New("groq", config.ProviderConfig{
		Model:  "llama-3.3-70b-versatile",
		APIKey: os.Getenv("GROQ_API_KEY"),
	})
	if err != nil {
		log.Fatal(err)
	}

	msgs := []schema.Message{
		schema.NewSystemMessage("You are a helpful assistant."),
		schema.NewHumanMessage("What is the capital of France?"),
	}

	resp, err := model.Generate(context.Background(), msgs)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(resp.Text())
}
```

## Streaming
```go
for chunk, err := range model.Stream(context.Background(), msgs) {
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(chunk.Delta)
}
fmt.Println()
```
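If you also need the full response after streaming finishes, you can accumulate the deltas as they arrive. A minimal sketch using the standard library’s `strings.Builder` and the same `chunk.Delta` field as above:

```go
var sb strings.Builder
for chunk, err := range model.Stream(context.Background(), msgs) {
	if err != nil {
		log.Fatal(err)
	}
	sb.WriteString(chunk.Delta) // collect each delta
	fmt.Print(chunk.Delta)      // while echoing it live
}
fmt.Println()
full := sb.String() // complete response text once the stream ends
_ = full
```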
## Advanced Features

### Tool Calling
```go
tools := []schema.ToolDefinition{
	{
		Name:        "get_weather",
		Description: "Get current weather for a location",
		InputSchema: map[string]any{
			"type": "object",
			"properties": map[string]any{
				"location": map[string]any{
					"type":        "string",
					"description": "City name",
				},
			},
			"required": []any{"location"},
		},
	},
}
```
```go
modelWithTools := model.BindTools(tools)
resp, err := modelWithTools.Generate(ctx, msgs, llm.WithToolChoice(llm.ToolChoiceAuto))
```
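When the model decides to call a tool, the call’s name and arguments come back on the response. The sketch below assumes a hypothetical `resp.ToolCalls()` accessor with `Name` and `Arguments` fields, which is the typical shape for OpenAI-compatible providers but is not confirmed Beluga AI API; check the `schema` package for the actual types:

```go
// Hypothetical sketch: ToolCalls(), Name, and Arguments are assumed names.
for _, call := range resp.ToolCalls() {
	if call.Name == "get_weather" {
		var args struct {
			Location string `json:"location"`
		}
		// Tool arguments arrive as a JSON string in OpenAI-compatible APIs.
		if err := json.Unmarshal([]byte(call.Arguments), &args); err != nil {
			log.Fatal(err)
		}
		fmt.Println("weather requested for", args.Location)
	}
}
```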
### Structured Output

```go
resp, err := model.Generate(ctx, msgs,
	llm.WithResponseFormat(llm.ResponseFormat{Type: "json_object"}),
)
```
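In `json_object` mode the model is constrained to emit valid JSON, which you can decode with `encoding/json`. As with other OpenAI-compatible APIs, JSON mode generally expects the prompt to mention JSON explicitly. A sketch, reusing the `resp.Text()` accessor from Basic Usage with an illustrative target struct:

```go
var out struct {
	Capital string `json:"capital"`
}
if err := json.Unmarshal([]byte(resp.Text()), &out); err != nil {
	log.Fatal(err) // malformed output or mismatched schema
}
fmt.Println(out.Capital)
```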
### Generation Options

```go
resp, err := model.Generate(ctx, msgs,
	llm.WithTemperature(0.7),
	llm.WithMaxTokens(2048),
	llm.WithTopP(0.9),
	llm.WithStopSequences("END"),
)
```
## Error Handling

```go
resp, err := model.Generate(ctx, msgs)
if err != nil {
	log.Fatal(err)
}
```
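Groq enforces per-minute rate limits, so transient failures are often worth retrying. The loop below is an illustrative backoff sketch built only from the standard library; the retry policy is an example, not a Beluga AI feature:

```go
// Retry up to three times with exponential backoff (1s, 2s, 4s).
var lastErr error
for attempt := 0; attempt < 3; attempt++ {
	resp, err := model.Generate(ctx, msgs)
	if err == nil {
		fmt.Println(resp.Text())
		return
	}
	lastErr = err
	time.Sleep(time.Duration(1<<attempt) * time.Second)
}
log.Fatal(lastErr)
```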
## Direct Construction

```go
import "github.com/lookatitude/beluga-ai/llm/providers/groq"

model, err := groq.New(config.ProviderConfig{
	Model:  "llama-3.3-70b-versatile",
	APIKey: os.Getenv("GROQ_API_KEY"),
})
```
## Available Models

| Model ID | Description |
|---|---|
| `llama-3.3-70b-versatile` | Llama 3.3 70B — best quality on Groq |
| `llama-3.1-8b-instant` | Llama 3.1 8B — fastest, lowest latency |
| `mixtral-8x7b-32768` | Mixtral 8x7B — 32K context, MoE model |
| `gemma2-9b-it` | Gemma 2 9B — compact, instruction-tuned |
Refer to Groq’s documentation for the latest model list.