Skip to content
Docs

GitHub Document Loader

The GitHub loader implements the loader.DocumentLoader interface for loading files from GitHub repositories. It uses the GitHub Contents API to fetch file content, with support for branch/tag/SHA references.

Choose GitHub when you need to load source code, documentation, or configuration files from GitHub repositories into a RAG pipeline. The loader supports branch/tag/SHA references and works with both GitHub.com and GitHub Enterprise. For loading from other platforms, consider Confluence or Notion.

Terminal window
go get github.com/lookatitude/beluga-ai/rag/loader/providers/github
package main
import (
"context"
"fmt"
"log"
"os"
"github.com/lookatitude/beluga-ai/config"
"github.com/lookatitude/beluga-ai/rag/loader"
_ "github.com/lookatitude/beluga-ai/rag/loader/providers/github"
)
func main() {
l, err := loader.New("github", config.ProviderConfig{
APIKey: os.Getenv("GITHUB_TOKEN"),
})
if err != nil {
log.Fatal(err)
}
docs, err := l.Load(context.Background(), "lookatitude/beluga-ai/README.md")
if err != nil {
log.Fatal(err)
}
fmt.Printf("SHA: %s\n", docs[0].Metadata["sha"])
fmt.Printf("Content: %s\n", docs[0].Content)
}
ParameterTypeDefaultDescription
APIKeystring""GitHub personal access token (optional for public repos)
BaseURLstringhttps://api.github.comGitHub API base URL (use for GitHub Enterprise)
Timeouttime.Duration30sHTTP request timeout
Options["ref"]string""Git ref (branch, tag, or commit SHA) to load from

The source parameter uses the format owner/repo/path/to/file:

docs, err := l.Load(ctx, "lookatitude/beluga-ai/docs/concepts.md")

The source must contain at least three path segments: owner, repository, and file path.

Use Options["ref"] to load from a specific branch, tag, or commit:

l, err := loader.New("github", config.ProviderConfig{
APIKey: os.Getenv("GITHUB_TOKEN"),
Options: map[string]any{
"ref": "v2.0.0",
},
})
docs, err := l.Load(ctx, "lookatitude/beluga-ai/README.md")

Point to your GitHub Enterprise instance using BaseURL:

l, err := loader.New("github", config.ProviderConfig{
APIKey: os.Getenv("GHE_TOKEN"),
BaseURL: "https://github.corp.example.com/api/v3",
})
FieldTypeDescription
sourcestringOriginal owner/repo/path string
loaderstringAlways "github"
pathstringFile path within the repository
shastringGit blob SHA
sizeintFile size in bytes
html_urlstringGitHub web URL for the file
download_urlstringDirect download URL

For public repositories, the APIKey is optional. However, unauthenticated requests are subject to lower rate limits (60 requests per hour vs. 5,000 authenticated):

l, err := loader.New("github", config.ProviderConfig{})
docs, err := l.Load(ctx, "golang/go/README.md")
docs, err := l.Load(ctx, "owner/repo/file.go")
if err != nil {
// Possible errors:
// - "github: source is required" (empty source)
// - "github: source must be in format owner/repo/path" (invalid format)
// - "github: ... is a dir, not a file" (source points to directory)
// - "github: fetch ...: ..." (API error, 404 or 403)
log.Fatal(err)
}