RAGAS Evaluation Provider

The RAGAS provider connects Beluga AI’s evaluation framework to a RAGAS server instance. It implements the eval.Metric interface with RAG-specific evaluation metrics such as faithfulness, answer relevancy, context precision, and context recall.

Choose RAGAS when you are evaluating RAG pipelines and need metrics that specifically measure retrieval quality and answer groundedness. RAGAS provides four complementary metrics (faithfulness, answer relevancy, context precision, context recall) designed for end-to-end RAG assessment. For general LLM evaluation beyond RAG, consider DeepEval or Braintrust.

```sh
go get github.com/lookatitude/beluga-ai/eval/providers/ragas
```
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `WithMetricName(name)` | `string` | `"faithfulness"` | Metric to evaluate |
| `WithBaseURL(url)` | `string` | `http://localhost:8080` | RAGAS server endpoint |
| `WithAPIKey(key)` | `string` | (none) | Optional bearer token for authentication |
| `WithTimeout(d)` | `time.Duration` | `30s` | HTTP request timeout |
| Metric Name | Description |
| --- | --- |
| `faithfulness` | Measures whether the answer is grounded in the provided context |
| `answer_relevancy` | Measures how relevant the answer is to the question |
| `context_precision` | Measures how much of the retrieved context is relevant to the question (signal-to-noise of retrieval) |
| `context_recall` | Measures whether the retrieved context covers all the information needed to produce the ground-truth answer |
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/lookatitude/beluga-ai/eval"
	"github.com/lookatitude/beluga-ai/eval/providers/ragas"
	"github.com/lookatitude/beluga-ai/schema"
)

func main() {
	metric, err := ragas.New(
		ragas.WithMetricName("faithfulness"),
		ragas.WithBaseURL("http://localhost:8080"),
	)
	if err != nil {
		log.Fatal(err)
	}

	sample := eval.EvalSample{
		Input:          "What is photosynthesis?",
		Output:         "Photosynthesis converts sunlight into chemical energy in plants.",
		ExpectedOutput: "Photosynthesis is the process by which plants convert light energy into glucose.",
		RetrievedDocs: []schema.Document{
			{Content: "Photosynthesis is a process used by plants to convert light energy into chemical energy."},
			{Content: "The process occurs primarily in the leaves of plants using chlorophyll."},
		},
	}

	score, err := metric.Score(context.Background(), sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %.3f\n", metric.Name(), score)
	// Output: ragas_faithfulness: 0.920
}
```

Use RAGAS metrics with the evaluation runner for batch evaluation:

```go
faithfulness, err := ragas.New(
	ragas.WithMetricName("faithfulness"),
	ragas.WithBaseURL("http://localhost:8080"),
)
if err != nil {
	log.Fatal(err)
}

relevancy, err := ragas.New(
	ragas.WithMetricName("answer_relevancy"),
	ragas.WithBaseURL("http://localhost:8080"),
)
if err != nil {
	log.Fatal(err)
}

runner := eval.NewRunner(
	eval.WithMetrics(faithfulness, relevancy),
	eval.WithDataset(samples),
	eval.WithParallel(4),
	eval.WithTimeout(5*time.Minute),
)

report, err := runner.Run(context.Background())
if err != nil {
	log.Fatal(err)
}

fmt.Printf("Faithfulness: %.3f\n", report.Metrics["ragas_faithfulness"])
fmt.Printf("Answer relevancy: %.3f\n", report.Metrics["ragas_answer_relevancy"])
```

Combine multiple RAGAS metrics for a comprehensive RAG pipeline assessment:

```go
metricNames := []string{"faithfulness", "answer_relevancy", "context_precision", "context_recall"}

var metrics []eval.Metric
for _, name := range metricNames {
	m, err := ragas.New(
		ragas.WithMetricName(name),
		ragas.WithBaseURL("http://localhost:8080"),
	)
	if err != nil {
		log.Fatal(err)
	}
	metrics = append(metrics, m)
}

runner := eval.NewRunner(
	eval.WithMetrics(metrics...),
	eval.WithDataset(samples),
	eval.WithParallel(4),
)

report, err := runner.Run(ctx)
if err != nil {
	log.Fatal(err)
}

for name, score := range report.Metrics {
	fmt.Printf("%s: %.3f\n", name, score)
}
```

RAGAS uses RAG-specific terminology. The provider automatically maps `EvalSample` fields to RAGAS conventions:

| EvalSample Field | RAGAS Field | Description |
| --- | --- | --- |
| `Input` | `question` | The user's query |
| `Output` | `answer` | The generated response |
| `ExpectedOutput` | `ground_truth` | The reference answer |
| `RetrievedDocs` | `contexts` | Array of context document contents |

For RAGAS servers that require authentication, provide an API key:

```go
metric, err := ragas.New(
	ragas.WithMetricName("faithfulness"),
	ragas.WithBaseURL("https://ragas.example.com"),
	ragas.WithAPIKey(os.Getenv("RAGAS_API_KEY")),
)
```

The API key is sent as a bearer token in the Authorization header.

RAGAS metrics are prefixed with `ragas_` to distinguish them from metrics from other providers. For example, a metric configured with `WithMetricName("faithfulness")` reports its name as `ragas_faithfulness`.

```go
score, err := metric.Score(ctx, sample)
if err != nil {
	// Errors include HTTP failures, invalid metric names, and server-side errors.
	log.Printf("RAGAS scoring failed: %v", err)
}
```

Scores are clamped to the [0.0, 1.0] range. If the API returns a score outside this range, it is automatically normalized.