Docling Document Loader

The Docling loader implements the loader.DocumentLoader interface using the IBM Docling API to convert documents (PDFs, DOCX, images, and more) into structured text. Docling extracts text, tables, and layout information and returns the content as Markdown or plain text.

Choose Docling when you need structured document conversion that preserves tables and layout as Markdown. It can be self-hosted via Docker, which keeps your documents on your own infrastructure. For a broader range of file types with element-level extraction, consider Unstructured. For web scraping, consider Firecrawl.

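go get github.com/lookatitude/beluga-ai/rag/loader/providers/docling

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/lookatitude/beluga-ai/config"
	"github.com/lookatitude/beluga-ai/rag/loader"
	// Blank import so the provider is available under the name "docling".
	_ "github.com/lookatitude/beluga-ai/rag/loader/providers/docling"
)

func main() {
	// Create a loader that talks to the Docling server at BaseURL.
	l, err := loader.New("docling", config.ProviderConfig{
		BaseURL: "http://localhost:5001",
	})
	if err != nil {
		log.Fatal(err)
	}

	// Convert a local file; HTTP/HTTPS URLs are accepted as well.
	docs, err := l.Load(context.Background(), "/path/to/document.pdf")
	if err != nil {
		log.Fatal(err)
	}

	// Load returns no documents when the converted content is empty,
	// so check the length before indexing.
	if len(docs) == 0 {
		log.Fatal("docling returned no content")
	}
	fmt.Printf("Content: %s\n", docs[0].Content)
}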

| Parameter | Type | Default | Description |
|---|---|---|---|
| BaseURL | string | http://localhost:5001 | Docling API endpoint |
| APIKey | string | "" | Optional Bearer token for authentication |
| Timeout | time.Duration | 0 (no timeout) | HTTP request timeout |
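
The APIKey and Timeout options map naturally onto Go's net/http primitives. The following is a minimal sketch of that mapping, not the provider's actual code: `settings`, `newClient`, and `newRequest` are hypothetical names, and the `/convert` path is a placeholder. It illustrates two points: a zero `http.Client.Timeout` means no timeout, and a Bearer token travels in the `Authorization` header only when a key is set.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// settings is a hypothetical stand-in for the three options listed above.
type settings struct {
	BaseURL string
	APIKey  string
	Timeout time.Duration
}

// newClient builds an HTTP client. A zero Timeout leaves http.Client
// without a deadline, which matches the "0 (no timeout)" default.
func newClient(s settings) *http.Client {
	return &http.Client{Timeout: s.Timeout}
}

// newRequest creates a POST request against BaseURL and attaches the
// Bearer token only when an API key is configured.
func newRequest(s settings, path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, s.BaseURL+path, nil)
	if err != nil {
		return nil, err
	}
	if s.APIKey != "" {
		req.Header.Set("Authorization", "Bearer "+s.APIKey)
	}
	return req, nil
}

func main() {
	s := settings{
		BaseURL: "http://localhost:5001",
		APIKey:  "secret",
		Timeout: 30 * time.Second,
	}
	req, err := newRequest(s, "/convert") // placeholder path, not a real endpoint
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("Authorization"))
	fmt.Println(newClient(s).Timeout)
}
```

For conversions of large documents, an explicit Timeout is usually safer than the default, since a stalled server would otherwise block the call indefinitely.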

The loader accepts two types of sources:

File paths are uploaded to the Docling API as multipart form data:

docs, err := l.Load(ctx, "/path/to/document.pdf")

HTTP/HTTPS URLs are passed to the Docling API as a JSON body for server-side download:

docs, err := l.Load(ctx, "https://example.com/report.pdf")
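
The distinction between the two source types can be sketched as a simple prefix check. This is an assumption about how such a dispatch might look, not the loader's confirmed logic; `classifySource` and the `sourceKind` type are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sourceKind describes how a source would be sent to the Docling API.
type sourceKind string

const (
	kindFile sourceKind = "file" // uploaded as multipart form data
	kindURL  sourceKind = "url"  // sent in a JSON body for server-side download
)

// classifySource is a hypothetical helper. It treats anything that starts
// with http:// or https:// as a URL and everything else as a file path.
func classifySource(source string) (sourceKind, error) {
	if source == "" {
		return "", errors.New("docling: source is required")
	}
	if strings.HasPrefix(source, "http://") || strings.HasPrefix(source, "https://") {
		return kindURL, nil
	}
	return kindFile, nil
}

func main() {
	sources := []string{"/path/to/document.pdf", "https://example.com/report.pdf"}
	for _, src := range sources {
		kind, err := classifySource(src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", src, kind)
	}
}
```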

The Docling API returns both Markdown and plain text representations. The loader prefers Markdown content when available, falling back to plain text:

  1. Markdown content (md_content) is used if present
  2. Plain text content (text_content) is used as fallback
  3. If both are empty, nil is returned (no documents)
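
The preference order above can be sketched as a small function. The `document` type and `selectContent` are hypothetical stand-ins for illustration; only the ordering (Markdown, then plain text, then nil) comes from the description above.

```go
package main

import "fmt"

// document is a minimal stand-in for the loader's document type.
type document struct {
	Content string
}

// selectContent applies the preference order: Markdown first, plain text
// as a fallback, and nil when both are empty.
func selectContent(mdContent, textContent string) []document {
	switch {
	case mdContent != "":
		return []document{{Content: mdContent}}
	case textContent != "":
		return []document{{Content: textContent}}
	default:
		return nil
	}
}

func main() {
	fmt.Println(len(selectContent("# Title", "Title"))) // Markdown wins
	fmt.Println(len(selectContent("", "Title")))        // plain-text fallback
	fmt.Println(len(selectContent("", "")))             // no documents
}
```

Because an empty conversion yields no documents, callers should check `len(docs)` before indexing `docs[0]`.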

| Field | Type | Description |
|---|---|---|
| source | string | Original file path or URL |
| format | string | Always "docling" |
| loader | string | Always "docling" |

Docling supports a wide range of document formats including:

  • PDF documents
  • Microsoft Word (DOCX)
  • Microsoft PowerPoint (PPTX)
  • Images (PNG, JPG, TIFF)
  • HTML pages

Refer to the Docling documentation for the complete list of supported formats.
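
When loading many files, it can help to skip obviously unsupported ones before calling the API. The sketch below filters by file extension using only the formats named above. The helper `isLikelySupported` is hypothetical, the `.jpeg`, `.tif`, and `.htm` spellings are assumptions, and the set is not exhaustive, so the Docling documentation remains the authority.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// likelySupported holds extensions for the formats listed above. The
// .jpeg, .tif, and .htm variants are assumptions added for convenience.
var likelySupported = map[string]bool{
	".pdf": true, ".docx": true, ".pptx": true,
	".png": true, ".jpg": true, ".jpeg": true,
	".tiff": true, ".tif": true,
	".html": true, ".htm": true,
}

// isLikelySupported reports whether the path's extension is in the set,
// ignoring case.
func isLikelySupported(path string) bool {
	return likelySupported[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"report.PDF", "slides.pptx", "notes.txt"} {
		fmt.Printf("%s: %v\n", p, isLikelySupported(p))
	}
}
```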

Docling can be run as a local service using Docker:

docker run -p 5001:5001 ds4sd/docling-serve

Once running, configure the loader to point to your local instance:

l, err := loader.New("docling", config.ProviderConfig{
	BaseURL: "http://localhost:5001",
})
if err != nil {
	log.Fatal(err)
}

docs, err := l.Load(ctx, "/path/to/document.pdf")
if err != nil {
	// Possible errors:
	// - "docling: source is required" (empty source)
	// - "docling: open file: ..." (local file not found)
	// - "docling: API error (status 422): ..." (unsupported format)
	// - "docling: request: ..." (connection failure)
	log.Fatal(err)
}
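
If callers need to react differently to each failure, the messages listed above can be mapped to coarse categories. This is only a sketch: `classifyMessage` and `classifyError` are hypothetical helpers, the category names are invented for illustration, and matching on message text is brittle. If the package exposes typed or sentinel errors, prefer `errors.Is` or `errors.As` instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// classifyMessage maps an error message to a coarse category based on the
// message fragments documented above.
func classifyMessage(msg string) string {
	switch {
	case strings.Contains(msg, "source is required"):
		return "invalid-input"
	case strings.Contains(msg, "open file:"):
		return "file"
	case strings.Contains(msg, "API error"):
		return "api"
	case strings.Contains(msg, "request:"):
		return "connection"
	default:
		return "unknown"
	}
}

// classifyError wraps classifyMessage and treats a nil error as "ok".
func classifyError(err error) string {
	if err == nil {
		return "ok"
	}
	return classifyMessage(err.Error())
}

func main() {
	// Sample message; the text after the "request:" prefix is made up.
	err := errors.New("docling: request: connection refused")
	fmt.Println(classifyError(err))
}
```

A "connection" result is the one category where a retry with backoff is likely to help; the others point to a problem with the input or the document itself.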