Google Drive Document Loader
Many teams store knowledge in Google Drive: meeting notes in Docs, data in Sheets, presentations in Slides, and reference PDFs. Making this content searchable by AI agents requires loading it into a RAG pipeline. This loader handles the complexities of the Drive API, including Google Workspace format exports (Docs, Sheets, and Slides cannot be downloaded directly; they must be exported to text formats) and pagination for large folders.
Choose this approach when your organization’s knowledge lives in Google Drive and you want agents to answer questions using that content.
Overview
The Google Drive loader uses the Drive API to list and download files from a specified folder. It handles Google Workspace formats (Docs, Sheets, Slides) by exporting them to text, and downloads regular files directly. The loader implements patterns compatible with Beluga AI’s DocumentLoader interface from the rag/loader package.
Prerequisites
- Go 1.23 or later
- Beluga AI framework installed
- Google Cloud project with the Drive API enabled
- OAuth 2.0 credentials or a service account key
Installation
Install the Google API client library:

    go get google.golang.org/api/drive/v3
    go get google.golang.org/api/option

Setting Up Credentials
- Open the Google Cloud Console
- Create a project or select an existing one
- Enable the Google Drive API under APIs & Services
- Create credentials:
  - For server-side access: create a Service Account and download the JSON key
  - For user-delegated access: create OAuth 2.0 Client ID credentials
- Set the credentials path:

      export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

The DocumentLoader Interface
The loader satisfies the DocumentLoader interface:

    // From github.com/lookatitude/beluga-ai/rag/loader
    type DocumentLoader interface {
        Load(ctx context.Context, source string) ([]schema.Document, error)
    }

For Google Drive, the source parameter is interpreted as a folder ID.
Basic Drive Loader
Build a loader that fetches all files from a Google Drive folder:

    package main

    import (
        "context"
        "fmt"
        "io"
        "log"
        "os"
        "strings"

        "github.com/lookatitude/beluga-ai/schema"
        "google.golang.org/api/drive/v3"
        "google.golang.org/api/option"
    )

    // DriveLoader loads documents from a Google Drive folder.
    type DriveLoader struct {
        service *drive.Service
    }

    // NewDriveLoader creates a new Google Drive loader using the credentials
    // file specified by the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    func NewDriveLoader(ctx context.Context) (*DriveLoader, error) {
        credsFile := os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")
        if credsFile == "" {
            return nil, fmt.Errorf("GOOGLE_APPLICATION_CREDENTIALS not set")
        }

        svc, err := drive.NewService(ctx, option.WithCredentialsFile(credsFile))
        if err != nil {
            return nil, fmt.Errorf("create drive service: %w", err)
        }

        return &DriveLoader{service: svc}, nil
    }

    // Load fetches non-trashed files from the given folder ID and returns
    // them as documents. It reads only the first page of results; see
    // "Pagination for Large Folders" for folders with many files.
    func (l *DriveLoader) Load(ctx context.Context, folderID string) ([]schema.Document, error) {
        query := fmt.Sprintf("'%s' in parents and trashed=false", folderID)
        files, err := l.service.Files.List().
            Q(query).
            Fields("files(id, name, mimeType)").
            Context(ctx).
            Do()
        if err != nil {
            return nil, fmt.Errorf("list files: %w", err)
        }

        var docs []schema.Document
        for _, file := range files.Files {
            content, err := l.downloadFile(ctx, file.Id, file.MimeType)
            if err != nil {
                // A single unreadable file should not abort the whole load.
                log.Printf("skipping %s: %v", file.Name, err)
                continue
            }

            docs = append(docs, schema.Document{
                Content: content,
                Metadata: map[string]any{
                    "source":    fmt.Sprintf("drive://%s", file.Id),
                    "name":      file.Name,
                    "mime_type": file.MimeType,
                },
            })
        }

        return docs, nil
    }

Handling Google Workspace File Types
Google Workspace files (Docs, Sheets, Slides) cannot be downloaded directly. They must be exported to a supported format:

    // downloadFile retrieves the content of a Drive file. Google Workspace
    // files are exported to text; regular files are downloaded directly.
    func (l *DriveLoader) downloadFile(ctx context.Context, fileID, mimeType string) (string, error) {
        switch mimeType {
        case "application/vnd.google-apps.document":
            return l.export(ctx, fileID, "text/plain")
        case "application/vnd.google-apps.spreadsheet":
            // CSV export covers only the first sheet of a spreadsheet.
            return l.export(ctx, fileID, "text/csv")
        case "application/vnd.google-apps.presentation":
            return l.export(ctx, fileID, "text/plain")
        default:
            return l.download(ctx, fileID)
        }
    }

    // export converts a Workspace file to exportMIME and returns the result.
    func (l *DriveLoader) export(ctx context.Context, fileID, exportMIME string) (string, error) {
        resp, err := l.service.Files.Export(fileID, exportMIME).Context(ctx).Download()
        if err != nil {
            return "", fmt.Errorf("export %s: %w", fileID, err)
        }
        defer resp.Body.Close()
        return readAll(resp.Body)
    }

    // download fetches the raw bytes of a regular (non-Workspace) file.
    func (l *DriveLoader) download(ctx context.Context, fileID string) (string, error) {
        resp, err := l.service.Files.Get(fileID).Context(ctx).Download()
        if err != nil {
            return "", fmt.Errorf("download %s: %w", fileID, err)
        }
        defer resp.Body.Close()
        return readAll(resp.Body)
    }

    // readAll drains r into a string.
    func readAll(r io.Reader) (string, error) {
        var b strings.Builder
        if _, err := io.Copy(&b, r); err != nil {
            return "", err
        }
        return b.String(), nil
    }

| Workspace Type | MIME Type | Export Format |
|---|---|---|
| Google Docs | application/vnd.google-apps.document | text/plain |
| Google Sheets | application/vnd.google-apps.spreadsheet | text/csv |
| Google Slides | application/vnd.google-apps.presentation | text/plain |
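The table above can also be kept as data rather than a switch statement, which leaves room for per-type overrides. The following is a sketch under that assumption: workspaceExports and exportMIMEFor are hypothetical names, not part of Beluga AI or the Drive client, and the text/html override is only an illustration.

```go
package main

import "fmt"

// workspaceExports (hypothetical) maps Google Workspace MIME types to the
// export MIME type used when fetching them as text. The entries mirror the
// table above.
var workspaceExports = map[string]string{
	"application/vnd.google-apps.document":     "text/plain",
	"application/vnd.google-apps.spreadsheet":  "text/csv",
	"application/vnd.google-apps.presentation": "text/plain",
}

// exportMIMEFor (hypothetical) reports the export format for a Drive MIME
// type, preferring an entry in overrides when one exists. The second return
// value is false for regular files, which take the direct download path.
func exportMIMEFor(mimeType string, overrides map[string]string) (string, bool) {
	if m, ok := overrides[mimeType]; ok {
		return m, true
	}
	m, ok := workspaceExports[mimeType]
	return m, ok
}

func main() {
	// Default: Sheets export as CSV.
	m, _ := exportMIMEFor("application/vnd.google-apps.spreadsheet", nil)
	fmt.Println(m) // prints text/csv

	// Illustrative override: export Docs as HTML instead of plain text.
	overrides := map[string]string{"application/vnd.google-apps.document": "text/html"}
	m, _ = exportMIMEFor("application/vnd.google-apps.document", overrides)
	fmt.Println(m) // prints text/html

	// Regular files have no export format.
	_, ok := exportMIMEFor("application/pdf", nil)
	fmt.Println(ok) // prints false
}
```

A lookup like this could replace the switch in downloadFile: call export when the second return value is true and download otherwise.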
Complete Example

    package main

    import (
        "context"
        "fmt"
        "log"
        "os"
    )

    func main() {
        ctx := context.Background()

        loader, err := NewDriveLoader(ctx)
        if err != nil {
            log.Fatal(err)
        }

        folderID := os.Getenv("GOOGLE_DRIVE_FOLDER_ID")
        if folderID == "" {
            log.Fatal("GOOGLE_DRIVE_FOLDER_ID not set")
        }

        docs, err := loader.Load(ctx, folderID)
        if err != nil {
            log.Fatal(err)
        }

        for _, doc := range docs {
            fmt.Printf("Loaded: %s (%d bytes)\n", doc.Metadata["name"], len(doc.Content))
        }
    }

Advanced Topics
Observability with OpenTelemetry
Add tracing to the loader for production monitoring:

    import (
        "context"

        "github.com/lookatitude/beluga-ai/schema"
        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/codes"
        "go.opentelemetry.io/otel/trace"
    )

    // TracedDriveLoader wraps DriveLoader and records a span for each Load call.
    type TracedDriveLoader struct {
        *DriveLoader
        tracer trace.Tracer
    }

    func NewTracedDriveLoader(ctx context.Context) (*TracedDriveLoader, error) {
        base, err := NewDriveLoader(ctx)
        if err != nil {
            return nil, err
        }
        return &TracedDriveLoader{
            DriveLoader: base,
            tracer:      otel.Tracer("beluga.loader.drive"),
        }, nil
    }

    func (l *TracedDriveLoader) Load(ctx context.Context, folderID string) ([]schema.Document, error) {
        ctx, span := l.tracer.Start(ctx, "drive.load")
        defer span.End()

        span.SetAttributes(attribute.String("drive.folder_id", folderID))

        docs, err := l.DriveLoader.Load(ctx, folderID)
        if err != nil {
            // RecordError adds an event only; set the status so the span is
            // marked as failed.
            span.RecordError(err)
            span.SetStatus(codes.Error, err.Error())
            return nil, err
        }

        span.SetAttributes(attribute.Int("drive.document_count", len(docs)))
        return docs, nil
    }

Pagination for Large Folders
By default, the Drive API returns up to 100 files per page (the pageSize parameter accepts values up to 1,000). Handle pagination for folders with many files:

    // LoadAll fetches every non-trashed file in the folder, following
    // nextPageToken until the listing is exhausted.
    func (l *DriveLoader) LoadAll(ctx context.Context, folderID string) ([]schema.Document, error) {
        query := fmt.Sprintf("'%s' in parents and trashed=false", folderID)
        var docs []schema.Document
        pageToken := ""

        for {
            call := l.service.Files.List().
                Q(query).
                Fields("nextPageToken, files(id, name, mimeType)").
                PageSize(100).
                Context(ctx)

            if pageToken != "" {
                call = call.PageToken(pageToken)
            }

            result, err := call.Do()
            if err != nil {
                return nil, fmt.Errorf("list page: %w", err)
            }

            for _, file := range result.Files {
                content, err := l.downloadFile(ctx, file.Id, file.MimeType)
                if err != nil {
                    log.Printf("skipping %s: %v", file.Name, err)
                    continue
                }
                docs = append(docs, schema.Document{
                    Content: content,
                    Metadata: map[string]any{
                        "source":    fmt.Sprintf("drive://%s", file.Id),
                        "name":      file.Name,
                        "mime_type": file.MimeType,
                    },
                })
            }

            if result.NextPageToken == "" {
                break
            }
            pageToken = result.NextPageToken
        }

        return docs, nil
    }

Configuration
| Option | Description | Default | Required |
|---|---|---|---|
| GOOGLE_APPLICATION_CREDENTIALS | Path to service account or OAuth credentials JSON | - | Yes |
| FolderID | Google Drive folder ID (from the folder URL) | - | Yes |
| Export format | MIME type for exporting Workspace files | text/plain (text/csv for Sheets) | No |
Troubleshooting
“Google Drive API has not been used in project”: Enable the Drive API in the Google Cloud Console under APIs & Services > Library.
“The caller does not have permission”: Verify that the service account or OAuth user has access to the target folder. For service accounts, share the folder with the service account email address.
“File not exportable”: Only Google Workspace files support the Export endpoint. Regular files (PDFs, images, etc.) must use the direct download path.
Production Considerations
- Use service accounts for server-side access; OAuth 2.0 is better suited for user-facing applications
- Request the minimal OAuth scope needed: https://www.googleapis.com/auth/drive.readonly
- Handle Drive API rate limits (currently 12,000 queries per minute per project) with exponential backoff
- Implement incremental sync using modifiedTime filters to avoid re-processing unchanged files
- For large folders, use pagination and process files in batches
- Store processed file IDs to enable deduplication across runs
Related Resources
- Document Loaders: All document loader integrations
- S3 Event-Driven Loader: AWS S3 document loading
- Embedding Providers: Generating embeddings for loaded documents