Cloud Storage Document Loader

The Cloud Storage loader implements the loader.DocumentLoader interface for loading files from cloud object storage services. It detects the cloud provider automatically by URL prefix (s3://, gs://, az://) and fetches content via direct HTTP calls.

Choose Cloud Storage when your documents are stored in AWS S3, Google Cloud Storage, or Azure Blob Storage. Because the provider is detected per URL, a single loader instance handles all three. To load from SaaS knowledge bases instead, consider Confluence, Notion, or Google Drive.

Install the provider package:

go get github.com/lookatitude/beluga-ai/rag/loader/providers/cloudstorage

A minimal program that loads one object from S3:

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/lookatitude/beluga-ai/config"
	"github.com/lookatitude/beluga-ai/rag/loader"
	// Blank import so the provider is available under the name "cloudstorage".
	_ "github.com/lookatitude/beluga-ai/rag/loader/providers/cloudstorage"
)

func main() {
	// Credentials are read from the environment rather than hard-coded.
	l, err := loader.New("cloudstorage", config.ProviderConfig{
		APIKey: os.Getenv("AWS_ACCESS_KEY_ID"),
		Options: map[string]any{
			"secret_key": os.Getenv("AWS_SECRET_ACCESS_KEY"),
			"region":     "us-east-1",
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	docs, err := l.Load(context.Background(), "s3://my-bucket/documents/report.txt")
	if err != nil {
		log.Fatal(err)
	}
	// Guard against an empty result before indexing into it.
	if len(docs) == 0 {
		log.Fatal("no documents returned")
	}
	fmt.Printf("Content: %s\n", docs[0].Content)
}
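
The loader accepts the following configuration parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| APIKey | string | "" | Access key or bearer token for authentication |
| Timeout | time.Duration | 60s | HTTP request timeout |
| Options["secret_key"] | string | "" | Secret key (used with S3) |
| Options["region"] | string | us-east-1 | AWS region for S3 bucket URLs |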
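
Because Options is a map[string]any, values such as "region" are untyped until they are read. The sketch below shows one way a string option with a default could be resolved. The helper name stringOption is hypothetical, and falling back to the default on a missing, empty, or non-string value is an assumption of this sketch, not documented loader behavior.

```go
package main

import "fmt"

// stringOption is a hypothetical helper (not part of the loader API).
// It returns opts[key] when that value is a non-empty string, otherwise def.
func stringOption(opts map[string]any, key, def string) string {
	if v, ok := opts[key].(string); ok && v != "" {
		return v
	}
	return def
}

func main() {
	// Only the secret key is set here, so the region falls back to the
	// documented default, us-east-1.
	opts := map[string]any{"secret_key": "example-secret"}

	region := stringOption(opts, "region", "us-east-1")
	secret := stringOption(opts, "secret_key", "")

	fmt.Println("region:", region)
	fmt.Println("secret key set:", secret != "")
}
```

A practical consequence is that option values should be passed as plain strings, since a value of another type may not be recognized.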

The loader determines the cloud provider from the URL prefix:

| Prefix | Provider | URL Format |
|---|---|---|
| s3:// | AWS S3 | s3://bucket/key |
| gs:// | Google Cloud Storage | gs://bucket/object |
| az:// | Azure Blob Storage | az://container/blob |

AWS S3 (with a loader configured with an access key and secret key):

docs, err := l.Load(ctx, "s3://my-bucket/data/document.pdf")

Google Cloud Storage:

l, err := loader.New("cloudstorage", config.ProviderConfig{
	APIKey: os.Getenv("GCS_ACCESS_TOKEN"),
})
if err != nil {
	log.Fatal(err)
}
docs, err := l.Load(ctx, "gs://my-bucket/reports/summary.txt")

Azure Blob Storage:

l, err := loader.New("cloudstorage", config.ProviderConfig{
	APIKey: os.Getenv("AZURE_STORAGE_TOKEN"),
})
if err != nil {
	log.Fatal(err)
}
docs, err := l.Load(ctx, "az://my-container/files/data.csv")
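
The prefix table can be read as a simple parsing rule: the scheme selects the provider, the first path segment is the bucket or container, and the rest is the key. The following is a minimal sketch of that rule using only the standard library. The names objectRef and parseCloudURI are hypothetical, and the loader's own validation and error messages may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// objectRef is a hypothetical type for this sketch; it is not part of the loader API.
type objectRef struct {
	Provider string // "s3", "gcs", or "azure"
	Bucket   string // bucket or container name
	Key      string // object key or blob path
}

// parseCloudURI splits a URI of the form scheme://bucket/key into its parts.
func parseCloudURI(uri string) (objectRef, error) {
	// Map each documented URL prefix to its provider name.
	providers := map[string]string{"s3": "s3", "gs": "gcs", "az": "azure"}

	scheme, rest, ok := strings.Cut(uri, "://")
	if !ok {
		return objectRef{}, fmt.Errorf("missing scheme in %q", uri)
	}
	provider, known := providers[scheme]
	if !known {
		return objectRef{}, fmt.Errorf("unsupported scheme %q", scheme)
	}
	// Everything before the first slash is the bucket; the remainder is the key.
	bucket, key, ok := strings.Cut(rest, "/")
	if !ok || bucket == "" || key == "" {
		return objectRef{}, fmt.Errorf("expected %s://bucket/key, got %q", scheme, uri)
	}
	return objectRef{Provider: provider, Bucket: bucket, Key: key}, nil
}

func main() {
	uris := []string{
		"s3://my-bucket/data/document.pdf",
		"gs://my-bucket/reports/summary.txt",
		"az://my-container/files/data.csv",
		"ftp://host/file.txt", // rejected: not one of the three prefixes
	}
	for _, uri := range uris {
		ref, err := parseCloudURI(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("provider=%s bucket=%s key=%s\n", ref.Provider, ref.Bucket, ref.Key)
	}
}
```

Validating URIs this way before calling Load can surface malformed inputs early, without a network round trip.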

Each loaded document includes the following metadata fields:

| Field | Type | Description |
|---|---|---|
| source | string | Original cloud storage URI |
| loader | string | Always "cloudstorage" |
| provider | string | Cloud provider: "s3", "gcs", or "azure" |
| bucket | string | Bucket or container name |
| key | string | Object key or blob path |
| filename | string | Extracted filename from the key |
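
To make the fields concrete, the sketch below builds the metadata one would expect for s3://my-bucket/documents/report.txt. It is an illustration only: expectedMetadata is a hypothetical helper, the map[string]any representation is an assumption, and using the last path segment of the key as the filename is a guess at how extraction works.

```go
package main

import (
	"fmt"
	"path"
)

// expectedMetadata is a hypothetical helper that mirrors the documented
// metadata fields. It does not call the loader.
func expectedMetadata(uri, provider, bucket, key string) map[string]any {
	return map[string]any{
		"source":   uri,
		"loader":   "cloudstorage", // documented as a constant value
		"provider": provider,
		"bucket":   bucket,
		"key":      key,
		// Assumption: the filename is the final segment of the key.
		"filename": path.Base(key),
	}
}

func main() {
	md := expectedMetadata(
		"s3://my-bucket/documents/report.txt",
		"s3", "my-bucket", "documents/report.txt",
	)
	// Print in the same order as the field table for easy comparison.
	for _, field := range []string{"source", "loader", "provider", "bucket", "key", "filename"} {
		fmt.Printf("%s = %v\n", field, md[field])
	}
}
```

Fields such as provider and bucket are convenient filters when documents from several buckets end up in the same downstream index.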

The loader returns descriptive errors for invalid URLs and failed requests:

docs, err := l.Load(ctx, "s3://my-bucket/path/to/file.txt")
if err != nil {
	// Possible errors:
	//   - "cloudstorage: invalid S3 URL ..." (malformed URL)
	//   - "cloudstorage: fetch ... failed (status 403): ..." (auth failure)
	//   - "cloudstorage: fetch ...: context deadline exceeded" (timeout)
	log.Fatal(err)
}
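
Callers that want to react differently to each failure, for example retrying timeouts but not auth failures, can classify the error first. The sketch below is one possible approach, not the library's API: classifyLoadError and sampleErrors are hypothetical, the sample errors only imitate the documented message shapes, and whether the loader wraps context errors so that errors.Is works is an assumption, which is why the sketch also falls back to matching the message text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// classifyLoadError is a hypothetical helper that buckets an error from Load
// into a coarse category based on the documented message shapes.
func classifyLoadError(err error) string {
	if err == nil {
		return "ok"
	}
	msg := err.Error()
	switch {
	// Prefer errors.Is; fall back to the text in case the error is not wrapped.
	case errors.Is(err, context.DeadlineExceeded),
		strings.Contains(msg, "context deadline exceeded"):
		return "timeout"
	case strings.Contains(msg, "(status 403)"):
		return "auth"
	case strings.HasPrefix(msg, "cloudstorage: invalid"):
		return "invalid-url"
	default:
		return "other"
	}
}

// sample pairs a stand-in error with the category it should map to.
type sample struct {
	err  error
	want string
}

// sampleErrors returns stand-ins shaped like the documented messages.
// The "..." parts are kept literally because the real values are not known.
func sampleErrors() []sample {
	return []sample{
		{errors.New("cloudstorage: invalid S3 URL ..."), "invalid-url"},
		{errors.New("cloudstorage: fetch ... failed (status 403): ..."), "auth"},
		{fmt.Errorf("cloudstorage: fetch ...: %w", context.DeadlineExceeded), "timeout"},
		{errors.New("connection reset"), "other"},
	}
}

func main() {
	for _, s := range sampleErrors() {
		fmt.Printf("%-12s <- %v\n", classifyLoadError(s.err), s.err)
	}
}
```

Matching on message text is brittle across library versions, so if the loader exposes typed or sentinel errors, those are the better basis for classification.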