Testing Guide — Beluga AI
Testing is a critical part of the Beluga AI development process. With 2,885 tests across 157 packages, the test suite serves as both a safety net and a specification — tests document how each interface behaves, what edge cases are handled, and how components interact. This guide covers how to run tests, write new ones, and follow the testing conventions used throughout the project.
Running Tests
| Command | Description |
|---|---|
| `make test` | Run all unit tests |
| `make test-verbose` | Run unit tests with verbose output |
| `make integration-test` | Run integration tests (requires external services) |
| `make coverage` | Generate an HTML coverage report |
| `make bench` | Run benchmarks |
| `make fuzz` | Run fuzz tests |
You can also run tests for a specific package:
```bash
go test ./llm/...
go test -v ./agent/... -run TestAgentHandoff
```

Unit Test Conventions
File Placement
Test files live alongside the source code they test. This co-location makes it easy to find the tests for any source file and keeps them in view when the source changes:
```text
llm/
├── router.go
├── router_test.go
├── structured.go
└── structured_test.go
```

Table-Driven Tests
Table-driven tests are the preferred pattern throughout Beluga AI. They make it easy to add new cases, provide clear failure messages that identify which specific case failed, and separate test data from test logic. When you need to test a function with multiple inputs, a table-driven test is almost always the right approach:
```go
import "testing"

// TestValidateTemperature checks validateTemperature (defined in the package
// under test) against valid and out-of-range inputs.
func TestValidateTemperature(t *testing.T) {
	tests := []struct {
		name    string
		input   float64
		wantErr bool
	}{
		{name: "valid zero", input: 0.0, wantErr: false},
		{name: "valid mid", input: 1.0, wantErr: false},
		{name: "valid max", input: 2.0, wantErr: false},
		{name: "negative", input: -0.1, wantErr: true},
		{name: "too high", input: 2.1, wantErr: true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validateTemperature(tt.input)
			if (err != nil) != tt.wantErr {
				t.Errorf("validateTemperature(%v) error = %v, wantErr %v", tt.input, err, tt.wantErr)
			}
		})
	}
}
```
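Subtests
Use `t.Run()` to group related checks into named subtests. Each subtest passes or fails independently and can be run on its own, for example with `go test -run '^TestAgent$/^Stream$'`: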
```go
func TestAgent(t *testing.T) {
	t.Run("Run", func(t *testing.T) { /* ... */ })
	t.Run("Stream", func(t *testing.T) { /* ... */ })
	t.Run("Handoff", func(t *testing.T) { /* ... */ })
}
```

Using Mocks
Every public interface in Beluga AI has a corresponding mock in `internal/testutil/`. Use these shared mocks in your tests instead of creating ad-hoc implementations. This keeps test behavior consistent across the codebase and avoids having to keep multiple mock implementations in sync with interface changes:
```go
import (
	"context"
	"testing"

	"github.com/lookatitude/beluga-ai/agent"
	"github.com/lookatitude/beluga-ai/internal/testutil"
	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"
)

func TestAgentWithMockModel(t *testing.T) {
	// The mock returns a fixed assistant message for every Generate call.
	mock := &testutil.MockChatModel{
		GenerateFunc: func(ctx context.Context, msgs []schema.Message, opts ...llm.Option) (*schema.Message, error) {
			return &schema.Message{
				Role:    schema.RoleAssistant,
				Content: "Hello from mock!",
			}, nil
		},
	}

	a := agent.New("test-agent", agent.WithModel(mock))
	result, err := a.Run(context.Background(), "Hi")
	if err != nil {
		t.Fatal(err)
	}
	if result != "Hello from mock!" {
		t.Errorf("unexpected result: %s", result)
	}
}
```
Integration Tests
Integration tests interact with external services (databases, APIs, and so on) and are kept separate from unit tests with a build tag. This separation keeps `make test` fast and free of external infrastructure, while `make integration-test` provides full end-to-end validation when needed.
Build Tag
Add the following build constraint at the top of integration test files, followed by a blank line before the `package` clause:
```go
//go:build integration

package llm_test
```

Running
```bash
# Run all integration tests
make integration-test

# Run integration tests for a specific package
go test -tags=integration ./llm/providers/openai/...
```

Integration tests run in CI on every PR and may require environment variables for API keys and service URLs. Check each provider’s test file for the configuration it requires.
Fuzz Testing
Fuzz tests help find edge cases and unexpected inputs that table-driven tests might miss. They are especially valuable for parsing functions, serialization logic, and anything that processes untrusted input. Use Go's built-in fuzzing support (available since Go 1.18):
```bash
# Run all fuzz tests
make fuzz

# Run a specific fuzz test (-fuzz accepts exactly one package)
go test -fuzz=FuzzParseMessage ./schema
```

Naming Convention
Fuzz test functions must be named with the `Fuzz` prefix, take a single `*testing.F` argument, and live in a `_test.go` file:
```go
import (
	"encoding/json"
	"testing"

	"github.com/lookatitude/beluga-ai/schema"
)

func FuzzParseMessage(f *testing.F) {
	// Seed corpus: one known-valid message.
	f.Add([]byte(`{"role":"user","content":"hello"}`))

	f.Fuzz(func(t *testing.T, data []byte) {
		msg, err := schema.ParseMessage(data)
		if err != nil {
			return // invalid input is fine
		}
		// If parsing succeeds, re-marshaling should not fail.
		if _, err := json.Marshal(msg); err != nil {
			t.Errorf("re-marshal failed: %v", err)
		}
	})
}
```
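The fuzz body above only checks that re-marshaling succeeds. A stronger property is that decoding, re-encoding, and decoding again yields the same value. The sketch below applies that property to a hypothetical two-field `message` type using only `encoding/json`. It does not model `schema.ParseMessage`, whose exact behavior this guide does not specify.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// message is a hypothetical stand-in for schema.Message with two string fields.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// checkRoundTrip decodes data, re-encodes it, decodes again, and reports an
// error if the two decoded values differ. Inputs that fail the first decode
// are treated as uninteresting and return nil.
func checkRoundTrip(data []byte) error {
	var first message
	if err := json.Unmarshal(data, &first); err != nil {
		return nil
	}
	encoded, err := json.Marshal(first)
	if err != nil {
		return fmt.Errorf("marshal: %w", err)
	}
	var second message
	if err := json.Unmarshal(encoded, &second); err != nil {
		return fmt.Errorf("unmarshal of re-encoded value: %w", err)
	}
	if first != second {
		return fmt.Errorf("round trip changed value: %+v != %+v", first, second)
	}
	return nil
}

// FuzzMessageRoundTrip wires the property into Go's fuzzing engine.
func FuzzMessageRoundTrip(f *testing.F) {
	f.Add([]byte(`{"role":"user","content":"hello"}`))
	f.Fuzz(func(t *testing.T, data []byte) {
		if err := checkRoundTrip(data); err != nil {
			t.Error(err)
		}
	})
}
```

Keeping the property in a plain function such as `checkRoundTrip` also makes it callable from ordinary unit tests.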
Benchmarks
Write benchmarks for hot paths such as streaming, tool execution, and retrieval. Benchmarks are especially important for the `core/` stream utilities and the agent executor loop, where per-event overhead directly affects latency:
```bash
# Run all benchmarks
make bench

# Run benchmarks for a specific package
go test -bench=. -benchmem ./core/...
```

Writing Benchmarks
```go
import (
	"context"
	"testing"
)

func BenchmarkStreamProcessing(b *testing.B) {
	// setupTestAgent is a test helper (not shown) that returns a ready-to-use agent.
	a := setupTestAgent()
	ctx := context.Background()

	b.ResetTimer() // exclude setup cost from the measurement
	for i := 0; i < b.N; i++ {
		for event, err := range a.Stream(ctx, "test input") {
			if err != nil {
				b.Fatal(err)
			}
			_ = event
		}
	}
}
```
Code Coverage
Aim for high test coverage, especially for core packages and public APIs. Coverage reports help identify untested code paths, but coverage alone does not guarantee correctness. Focus on meaningful assertions that verify behavior, not just line execution:
```bash
# Generate coverage report
make coverage

# Convert the coverage profile to HTML (writes coverage.html; open it in a browser)
go tool cover -html=coverage.out -o coverage.html
```

Coverage reports are generated in CI and included in PR checks.
Testing Streaming Code
Testing `iter.Seq2`-based streaming code is easiest when you first collect the events into a slice and then make assertions. The `collectEvents` helper below is the standard pattern used throughout the test suite. It drains the push-based iterator into a slice that you can check with ordinary Go testing tools:
```go
import (
	"context"
	"iter"
	"testing"

	"github.com/lookatitude/beluga-ai/schema"
)

// collectEvents drains seq into a slice. On error it returns the events
// received so far together with the error.
func collectEvents(seq iter.Seq2[schema.Event, error]) ([]schema.Event, error) {
	var events []schema.Event
	for event, err := range seq {
		if err != nil {
			return events, err
		}
		events = append(events, event)
	}
	return events, nil
}

func TestAgentStream(t *testing.T) {
	a := setupTestAgent()
	events, err := collectEvents(a.Stream(context.Background(), "Hello"))
	if err != nil {
		t.Fatal(err)
	}
	if len(events) == 0 {
		t.Error("expected at least one event")
	}
}
```
CI Checks
All tests run automatically on every pull request via GitHub Actions. The CI pipeline runs:
- Lint: `go vet` and `golangci-lint` (13 linters, including gosec, staticcheck, and errcheck)
- Build: `go build ./...` and `go mod tidy` verification
- Unit tests: `go test -race` with coverage reporting
- Integration tests: `go test -race -tags integration`
- Security scans: Snyk (dependency vulnerabilities), Trivy (filesystem scanning), govulncheck (Go vulnerability database), gosec (static security analysis), Gitleaks (secret detection), go-licenses (license compliance)
- SonarCloud: code quality, duplication detection, and maintainability analysis
- Greptile: AI-powered code review via GitHub App (automatic on every PR)
All checks must pass before a PR can be merged. See the Pull Request Process for details.