Pixtral Vision (Mistral AI)
Many AI applications need to process visual content — extracting data from screenshots, analyzing product images, reading documents, or understanding diagrams. Pixtral is Mistral AI’s vision-language model that handles these tasks through the same API used for text-only Mistral models, making it straightforward to add visual understanding to existing Mistral-based workflows.
Choose Pixtral when you need vision capabilities from a European-hosted provider (relevant for GDPR compliance) or when you are already using Mistral for text generation and want to keep your provider stack consistent.
Overview
Pixtral provides vision-language capabilities through the Mistral AI API. When integrated with Beluga AI, it enables:
- Visual question answering — ask questions about image content
- Image captioning — generate descriptions from images
- Scene analysis — detailed analysis of complex visual scenes
- Document understanding — extract information from documents and screenshots
Pixtral uses the same Mistral API as other Mistral models, with multimodal input support through content parts.
Prerequisites
- Go 1.23 or later
- A Mistral AI API key (obtain from mistral.ai)
- Beluga AI framework installed
Installation
The Mistral provider is included in Beluga AI. No additional dependencies are required beyond the framework itself.
Set your API key:
```bash
export MISTRAL_API_KEY="your-api-key"
```

Configuration
Basic Setup
Create a Mistral provider configured for the Pixtral model:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/llm/providers/mistral"
	"github.com/lookatitude/beluga-ai/schema"
)

func main() {
	ctx := context.Background()

	config := &mistral.Config{
		APIKey: os.Getenv("MISTRAL_API_KEY"),
		Model:  "pixtral-12b",
	}

	provider, err := mistral.New(config)
	if err != nil {
		log.Fatalf("failed to create Pixtral provider: %v", err)
	}

	// A user message combines a text prompt with the raw image bytes.
	messages := []schema.Message{
		&schema.UserMessage{
			Content: []schema.ContentPart{
				schema.TextPart{Text: "What is in this image?"},
				schema.ImagePart{Data: []byte{ /* image data */ }, MIMEType: "image/png"},
			},
		},
	}

	response, err := provider.Generate(ctx, messages)
	if err != nil {
		log.Fatalf("generation failed: %v", err)
	}

	fmt.Printf("Response: %s\n", response.Content)
}
```

Verify the setup:

```bash
export MISTRAL_API_KEY="your-api-key"
go run main.go
```

Configuration Reference
| Option | Description | Default | Required |
|---|---|---|---|
| `APIKey` | Mistral AI API key | - | Yes |
| `Model` | Pixtral model name | `pixtral-12b` | No |
| `Timeout` | Request timeout | 30s | No |
| `MaxImageSize` | Maximum image size in bytes | 20MB | No |
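The `MaxImageSize` limit can also be checked on the client before a request is built. The following is a minimal sketch that uses only the standard library. It assumes the 20MB default means 20 × 1024 × 1024 bytes, and `defaultMaxImageSize` and `checkImageSize` are hypothetical names that are not part of Beluga AI.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxImageSize mirrors the documented 20MB default,
// read here as 20 MiB (an assumption).
const defaultMaxImageSize = 20 * 1024 * 1024

// checkImageSize is a hypothetical helper, not a Beluga AI function.
// It rejects empty payloads and payloads larger than limit bytes.
func checkImageSize(data []byte, limit int) error {
	switch {
	case len(data) == 0:
		return errors.New("image is empty")
	case len(data) > limit:
		return fmt.Errorf("image is %d bytes, limit is %d bytes", len(data), limit)
	default:
		return nil
	}
}

func main() {
	img := make([]byte, 4096) // stand-in for real image bytes
	if err := checkImageSize(img, defaultMaxImageSize); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("image size OK")
}
```

Running a check like this before constructing the `schema.ImagePart` avoids uploading a payload that would be rejected anyway.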
Vision-Language Tasks
Pixtral supports several vision-language task patterns:
```go
import "github.com/lookatitude/beluga-ai/schema"

// imageData holds the encoded image bytes, loaded elsewhere.

// Image captioning (image only, no text prompt)
messages := []schema.Message{
	&schema.UserMessage{
		Content: []schema.ContentPart{
			schema.ImagePart{Data: imageData, MIMEType: "image/png"},
		},
	},
}

// Visual question answering
messages = []schema.Message{
	&schema.UserMessage{
		Content: []schema.ContentPart{
			schema.TextPart{Text: "What color is the car in this image?"},
			schema.ImagePart{Data: imageData, MIMEType: "image/jpeg"},
		},
	},
}

// Detailed scene analysis
messages = []schema.Message{
	&schema.UserMessage{
		Content: []schema.ContentPart{
			schema.TextPart{Text: "Analyze this image and describe the scene in detail."},
			schema.ImagePart{Data: imageData, MIMEType: "image/png"},
		},
	},
}
```

Using Pixtral with Agents
Integrate Pixtral as the backing model for a Beluga AI agent that can process images:
```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/agent"
	"github.com/lookatitude/beluga-ai/llm/providers/mistral"
)

func main() {
	ctx := context.Background()

	config := &mistral.Config{
		APIKey: os.Getenv("MISTRAL_API_KEY"),
		Model:  "pixtral-12b",
	}

	model, err := mistral.New(config)
	if err != nil {
		log.Fatalf("failed to create model: %v", err)
	}

	a, err := agent.New(
		agent.WithName("vision-agent"),
		agent.WithModel(model),
		agent.WithInstructions("You are an image analysis assistant."),
	)
	if err != nil {
		log.Fatalf("failed to create agent: %v", err)
	}

	// Agent can now process multimodal inputs
	_ = a
	_ = ctx
}
```

Advanced Topics
OpenTelemetry Instrumentation
Add tracing to Pixtral calls:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/lookatitude/beluga-ai/llm/providers/mistral"
	"github.com/lookatitude/beluga-ai/schema"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

func main() {
	// Errors are returned from run so that its deferred span.End() executes;
	// calling log.Fatalf inside run would exit before the span is ended.
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	config := &mistral.Config{
		APIKey: os.Getenv("MISTRAL_API_KEY"),
		Model:  "pixtral-12b",
	}

	// Without a configured TracerProvider, otel.Tracer returns a no-op tracer.
	tracer := otel.Tracer("beluga.llm.pixtral")
	ctx, span := tracer.Start(ctx, "pixtral.generate",
		trace.WithAttributes(
			attribute.String("gen_ai.system", "mistral"),
			attribute.String("gen_ai.request.model", config.Model),
		),
	)
	defer span.End()

	provider, err := mistral.New(config)
	if err != nil {
		span.RecordError(err)
		return fmt.Errorf("failed to create provider: %w", err)
	}

	imageData, err := loadImage("image.png")
	if err != nil {
		span.RecordError(err)
		return err
	}

	messages := []schema.Message{
		&schema.UserMessage{
			Content: []schema.ContentPart{
				schema.TextPart{Text: "Describe this image in detail."},
				schema.ImagePart{Data: imageData, MIMEType: "image/png"},
			},
		},
	}

	response, err := provider.Generate(ctx, messages)
	if err != nil {
		span.RecordError(err)
		return fmt.Errorf("generation failed: %w", err)
	}

	// Token counts use the gen_ai.usage.* names from the OpenTelemetry
	// GenAI semantic conventions.
	span.SetAttributes(
		attribute.Int("gen_ai.usage.input_tokens", response.Usage.InputTokens),
		attribute.Int("gen_ai.usage.output_tokens", response.Usage.OutputTokens),
	)

	fmt.Printf("Response: %s\n", response.Content)
	return nil
}

// loadImage reads the image file at path into memory.
func loadImage(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to load image: %w", err)
	}
	return data, nil
}
```

Production Considerations
When deploying Pixtral integrations to production:
- Image optimization: Resize and compress images before sending to reduce latency and cost. Pixtral supports PNG, JPEG, and WebP formats.
- Cost management: Monitor API usage through the Mistral AI dashboard. Image inputs consume more tokens than text.
- Size limits: Respect the maximum image size (20MB default). Implement client-side validation before sending requests.
- Format validation: Verify image format compatibility before sending. Unsupported formats return errors.
- Error handling: Use Beluga AI’s `resilience` package for automatic retry on transient API failures.
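The retry behaviour mentioned in the last bullet can be illustrated with the standard library alone. The sketch below does not use or reproduce the API of Beluga AI’s `resilience` package. `retry` and `runFlaky` are hypothetical names, and the doubling delay is one common backoff choice rather than a documented default.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry is a hypothetical helper, not the Beluga AI resilience API.
// It calls fn up to attempts times, doubling the delay after each failure,
// and stops early if ctx is cancelled while waiting.
func retry(ctx context.Context, attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no need to wait after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// runFlaky drives retry with an operation that fails `failures` times
// and then succeeds. It stands in for a call such as provider.Generate.
func runFlaky(failures, attempts int) (calls int, err error) {
	err = retry(context.Background(), attempts, time.Millisecond, func() error {
		calls++
		if calls <= failures {
			return errors.New("simulated transient failure")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := runFlaky(2, 4)
	fmt.Println("calls:", calls, "err:", err) // prints: calls: 3 err: <nil>
}
```

A production version would retry only errors that are actually transient, such as timeouts or rate-limit responses, and would fail fast on errors like an invalid API key.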
Troubleshooting
“API key invalid”
The API key is incorrect or has been revoked.

```bash
# Verify the key is set
echo $MISTRAL_API_KEY

# Re-set from the Mistral AI dashboard
export MISTRAL_API_KEY="your-new-api-key"
```

“Image format not supported”
The provided image format is not recognized by Pixtral. Supported formats are PNG, JPEG, and WebP. Convert the image before sending:
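One way to do the conversion is with Go’s standard `image` packages. This is a minimal sketch, not part of Beluga AI: `ensureSupported` and `sampleGIF` are hypothetical names, the MIME type is sniffed with `http.DetectContentType`, and any input that is not already PNG, JPEG, or WebP is re-encoded as PNG. Only formats with a registered decoder (here GIF, JPEG, and PNG) can be converted; anything else returns an error.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/gif"    // also registers the GIF decoder
	_ "image/jpeg" // registers the JPEG decoder
	"image/png"    // also registers the PNG decoder
	"net/http"
)

// supported lists the MIME types named as accepted in this section.
var supported = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/webp": true,
}

// ensureSupported is a hypothetical helper. It returns data unchanged when
// its sniffed MIME type is supported, and otherwise re-encodes it as PNG.
func ensureSupported(data []byte) ([]byte, string, error) {
	mime := http.DetectContentType(data)
	if supported[mime] {
		return data, mime, nil
	}
	img, _, err := image.Decode(bytes.NewReader(data))
	if err != nil {
		return nil, "", fmt.Errorf("cannot decode %s input: %w", mime, err)
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, "", fmt.Errorf("cannot encode PNG: %w", err)
	}
	return buf.Bytes(), "image/png", nil
}

// sampleGIF builds a tiny in-memory GIF to act as an unsupported input.
func sampleGIF() []byte {
	pal := color.Palette{color.Black, color.White}
	img := image.NewPaletted(image.Rect(0, 0, 2, 2), pal)
	var buf bytes.Buffer
	if err := gif.Encode(&buf, img, nil); err != nil {
		panic(err) // encoding an in-memory image is not expected to fail
	}
	return buf.Bytes()
}

func main() {
	out, mime, err := ensureSupported(sampleGIF())
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Printf("sending %d bytes as %s\n", len(out), mime)
}
```

The returned bytes and MIME type can then be passed to `schema.ImagePart` as `Data` and `MIMEType`.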
```go
// Ensure images are in a supported format before creating the content part
// Supported: image/png, image/jpeg, image/webp
```

Related Resources
- Google Vertex AI Vision: Google Vertex AI multimodal integration
- LLM Providers: All supported LLM providers
- Monitoring: Observability and tracing setup