Many teams treat dbt documentation as an optional afterthought, something generated at the end of a sprint or manually viewed in a dev environment. At Datum Labs, we think differently. We believe documentation should be automated, immutable and integrated into your CI/CD pipeline, without depending on warehouse access.
One key area we have optimized is generating dbt documentation as part of CI/CD workflows, without requiring access to a live database. This blog walks through how we use Docker to generate and serve dbt docs statically, elegantly, and securely.
Why Automate dbt Docs in CI?
Traditionally, generating dbt docs required access to a data warehouse. While that approach works for local environments or ad-hoc builds, it introduces unnecessary overhead and risk in continuous integration pipelines.
Our approach eliminates that dependency, enabling:
- Faster builds: No round-trip to the data warehouse.
- Safer pipelines: No credentials or secrets needed.
- More portability: Generates static HTML docs that can be hosted anywhere.
- Better CI integration: Works seamlessly with platforms like GitHub Actions or GitLab CI.
How We Generate dbt Docs Without a Database
By using a multi-stage Dockerfile, we can build dbt docs in isolation, completely detached from your production environment. Here's how we accomplish that.
Step-by-Step Breakdown of the Dockerfile
# Stage 1: build the dbt docs with no warehouse connection
FROM python:3.10-slim as builder

# git is needed for dbt deps; install dbt-core plus the adapter for your warehouse
RUN apt-get update && apt-get install -y git && \
    pip install dbt-core dbt-clickhouse

WORKDIR /app
COPY ./dbt/dbt /app/project
WORKDIR /app/project

# Drop any real profiles.yml that came with the project
RUN rm -f /app/project/profiles.yml

# Write a dummy profile so dbt can parse the project without real credentials
RUN mkdir -p /root/.dbt && echo "\
airclick_dbt:\n\
  target: dev\n\
  outputs:\n\
    dev:\n\
      type: clickhouse\n\
      schema: default\n\
      user: none\n\
      password: none\n\
      host: localhost\n\
      port: 9000\n\
      database: default\n\
" > /root/.dbt/profiles.yml

# Generate the static docs; --empty-catalog and --no-compile avoid any warehouse queries
RUN dbt deps && dbt parse && \
    dbt docs generate --no-compile --empty-catalog --profiles-dir /root/.dbt

# Stage 2: serve the generated site with nginx
FROM nginx:alpine
COPY --from=builder /app/project/target /usr/share/nginx/html
EXPOSE 80
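Before wiring this into CI, it is worth sanity-checking the image locally. The commands below are a minimal sketch; they assume the Dockerfile above is saved at the repository root as Dockerfile and that port 8080 is free on your machine.

# Build both stages; the docs are generated in the builder stage.
docker build -t dbt-docs .
# Serve the static site on port 8080, then open http://localhost:8080 in a browser.
docker run --rm -p 8080:80 dbt-docs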
Key Benefits of This Docker Setup
| Step | Main Objective | Implementation Notes |
| --- | --- | --- |
| 1. Base Image | Use a minimal Python 3.10 base | Keeps the image lean and CI-friendly |
| 2. Install Dependencies | Install git, dbt-core, and the adapter | Swap the adapter as needed for your warehouse (see the sketch below this table) |
| 3. Copy Project | Bring in your dbt project | Assumes the ./dbt/dbt folder structure |
| 4. Remove Profiles | Prevent using real configs | Enhances security |
| 5. Add Dummy Profile | Avoid connecting to the warehouse | Works even in air-gapped CI |
| 6. Generate Docs | Build docs statically | Uses --empty-catalog and --no-compile |
| 7. Serve via Nginx | Use a lightweight static file server | No Python runtime required |
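Step 2 is the only place the warehouse flavour matters. If you are not on ClickHouse, swap the adapter package in the pip install line and the type field in the dummy profile; everything else stays the same. As a rough sketch (package names follow dbt's dbt-<adapter> convention; check each adapter's docs for supported versions):

# Pick the adapter that matches your warehouse; only one is needed.
pip install dbt-core dbt-snowflake   # Snowflake (profile type: snowflake)
pip install dbt-core dbt-bigquery    # BigQuery  (profile type: bigquery)
pip install dbt-core dbt-postgres    # Postgres  (profile type: postgres)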
Why This Approach Matters
1. Decouples Documentation from Warehouse Access
With the --empty-catalog flag, dbt skips querying the warehouse for catalog metadata. This lets us produce documentation that focuses on model structure, descriptions, and lineage, without requiring actual data.
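In practice, dbt docs generate writes the static site into the project's target directory, and with --empty-catalog the catalog.json it emits is simply empty rather than built from warehouse metadata. A quick check inside the builder stage (a sketch, run from the project directory) looks like this:

# Generate the docs without opening a warehouse connection.
dbt docs generate --no-compile --empty-catalog --profiles-dir /root/.dbt
# target/ now holds everything nginx needs to serve the site statically.
ls target/index.html target/manifest.json target/catalog.json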
2. Promotes Immutable, Reviewable Documentation
When a developer pushes a change that adds a new model, it's reflected in the docs immediately as part of the PR review cycle. Documentation becomes code-native, version-controlled and continuously updated.
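A simple way to enforce this is to build the docs image in the pull request pipeline as well, so a model that no longer parses or a broken ref fails the check before merge. A minimal sketch (the Dockerfile path is a placeholder, as in the workflow further below):

# Run in the PR pipeline: if dbt deps, parse, or docs generate fail,
# the image build fails and the pull request check goes red.
docker build -t dbt-docs-pr -f path/to/your/Dockerfile .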
3. CI-Ready and DevOps-Friendly
This method fits naturally into Git-based workflows. Whether you are using GitHub Actions, GitLab or another CI provider, you can run this Dockerfile with confidence and predictability: no secrets, no special networking, no service dependencies.
Automating dbt Documentation with CI/CD
Once the dbt docs site is successfully containerized, the next logical step is automation. Relying on manual builds or ad-hoc local commands introduces friction and inconsistency, especially in team environments.
To ensure that documentation remains accurate and up to date, it should be rebuilt and deployed automatically as part of your CI/CD process. This not only improves reliability but also enforces documentation as a first-class citizen in your development workflow.
One common approach is to use GitHub Actions to trigger the documentation build and deployment pipeline every time a change is pushed to your production branch.
Implementing GitHub Actions for dbt Docs Deployment
Below is an example of a GitHub Actions workflow that automates the following:
- Checks out your dbt project from the repository
- Builds the Docker image that generates the documentation
- Extracts the static HTML files from the container
- Deploys the documentation to GitHub Pages
This ensures that your documentation is regenerated and published automatically on every relevant update, without requiring access to the data warehouse.
name: Build dbt Docs

on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Build dbt docs Docker image
        run: |
          docker build -t dbt-docs -f path/to/your/Dockerfile .

      - name: Export generated site
        run: |
          container_id=$(docker create dbt-docs)
          docker cp "$container_id":/usr/share/nginx/html ./dbt-docs-site
          docker rm "$container_id"

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./dbt-docs-site
Note: This workflow assumes that your Dockerfile outputs the dbt docs site to /usr/share/nginx/html (as the Dockerfile above does) and that GitHub Pages is enabled for the repository.
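Once the workflow has run, a quick smoke test confirms the site is actually reachable. The URL below is a placeholder for your repository's GitHub Pages address:

# Replace the placeholder with your Pages URL; -f makes curl fail on HTTP errors.
curl -sSf -o /dev/null https://your-org.github.io/your-repo/ && echo "dbt docs are live"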
Why This Workflow Matters
Automating dbt documentation serves several purposes:
- Consistency: Ensures documentation is always aligned with the latest project changes
- Security: Avoids warehouse exposure during documentation generation
- Transparency: Provides a clear audit trail through your version control system
- Scalability: Enables team-wide adoption without adding manual steps
By integrating dbt documentation generation into your CI/CD pipeline, you embed data documentation into the core of your development lifecycle, making it easier to maintain, audit, and scale.
Bring Stability and Speed to dbt Documentation
At Datum Labs, we believe documentation should evolve alongside your code, not lag behind it. By removing the dependency on a live warehouse and embedding dbt docs generation into CI/CD workflows, we have built a solution that is reliable, secure and scalable across teams.
This approach ensures your data documentation is always current, version-controlled and reviewable, with no compromises on speed or auditability.
It’s not just about automating docs. It’s about strengthening the foundation of your data operations.