As your dbt project grows, you’ll inevitably hit the wall: a monolithic project with hundreds of models, slow parse times, tangled CI pipelines, and multiple teams stepping on each other’s feet. The natural next question is: should you split this into multiple projects?

And if you do, how do you handle cross-project dependencies without paying for dbt Cloud?

This article walks through the mono repo vs. multi repo decision, then dives into dbt-loom, the open-source plugin that brings dbt Mesh-style cross-project references to dbt Core users. Everything shown here is backed by a working demo repository using dbt Core, DuckDB, and uv (no dbt Cloud account required).

I’ve made a repository to illustrate what’s described in this article; you can find it here: https://github.com/p-munhoz/dbt-loom-multi-project-demo


The Problem with Monolithic dbt Projects

When a dbt project is small, everything in one repo is perfectly fine. A single dbt run rebuilds the world, lineage is complete, and every model is discoverable by the whole team.

But as the project scales (more teams, more domains, more models) the monolith starts to crack:

  • Slow parse times. dbt must parse the project on every invocation, and parse time grows with model count. Partial parsing helps, but only so much, even when you only touched two files.
  • Long CI builds. Every PR triggers a full pipeline. Slim CI helps, but the blast radius of any change is hard to control.
  • Ownership conflicts. The finance team shouldn’t need to negotiate with the marketing team to merge a model rename.
  • Cognitive overload. Finding the “right” model becomes harder when everything lives in one giant DAG.

At some point, splitting into smaller, domain-oriented dbt projects becomes the natural solution. This is exactly what dbt Mesh promotes and what dbt-loom makes accessible without a dbt Cloud subscription.


Mono Repo vs. Multi Repo: What Are We Actually Deciding?

Before diving into tooling, it’s worth being precise about what this choice entails.

A mono repo contains multiple dbt projects in a single Git repository. Teams work in subdirectories. This keeps tooling simple (one CI/CD pipeline, shared linting configs) while enabling project isolation. This is the layout of the demo in this article: core_project, finance_project, and marketing_project all live under a shared projects/ directory.

A multi repo gives each dbt project its own Git repository. Teams are fully autonomous: they have their own CI, their own deployment schedules, their own dependency graph. This is the microservices model applied to data.

The key point: dbt-loom works for both. The mechanism is the same whether the upstream manifest lives at a relative path next door or in an S3 bucket from a completely separate repo. The only thing that changes is where you point the config.

Here’s a practical comparison to guide the decision:

|                        | Mono Repo           | Multi Repo                   |
|------------------------|---------------------|------------------------------|
| Coordination overhead  | Low                 | Higher                       |
| Team autonomy          | Medium              | High                         |
| CI/CD complexity       | Simpler             | More pipelines to manage     |
| Cross-project refs     | Easier (same repo)  | Requires manifest publishing |
| Access control (Git)   | Harder to isolate   | Native per-repo              |
| Onboarding new teams   | Easy                | More setup                   |

Neither is universally better. Your choice should map to your org structure, not the other way around.


What Is dbt Mesh and Why It’s Gated

dbt Labs introduced dbt Mesh as their native answer to multi-project collaboration. The core idea: each dbt project exposes a set of public models that downstream projects can reference with a scoped ref() call. The metadata plumbing between projects is handled by dbt Cloud’s metadata service.

# dependencies.yml - dbt Cloud only
projects:
  - name: core_project

-- downstream model referencing an upstream public model
select * from {{ ref('core_project', 'dim_customers') }}

This is elegant. But it requires dbt Cloud and enterprise pricing that puts it out of reach for many teams. As Nicholas Yager, the author of dbt-loom, stated in the dbt-core GitHub discussions when Labs announced cross-project refs would not be open-sourced: “As a disappointed community member, I have created dbt-loom, an Apache 2-licensed Python package that enables cross-project references in dbt-core.” It quickly became the go-to open-source bridge.


dbt-loom: How It Works Under the Hood

dbt-loom is built on dbt Core’s plugin system, a mechanism introduced in dbt Core v1.6 that lets you hook into dbt’s internal node resolution process.

Specifically, dbt-loom implements the get_nodes hook. Every time dbt resolves nodes in your DAG, the plugin intercepts that call, reads a manifest.json from an upstream project, extracts all public models, and injects them as external nodes into your current project’s graph.

The key insight: you don’t need the upstream project’s source code. You only need its manifest.json, the artifact dbt generates on every run describing every node in the project. This is the same contract dbt Cloud uses with its metadata service. dbt-loom replicates it for dbt Core.
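Conceptually, the manifest-reading step can be sketched in a few lines. This is a simplified stand-in, not dbt-loom’s actual code, and the manifest below is a trimmed-down toy rather than a real `target/manifest.json` (which you would load with `json.load`):

```python
def public_models(manifest: dict) -> dict:
    """Return the public models from a dbt manifest, keyed by model name.

    Simplified illustration of what dbt-loom does with the upstream
    manifest; the real plugin carries over many more node attributes.
    """
    return {
        node["name"]: node
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model" and node.get("access") == "public"
    }

# Trimmed-down stand-in for core_project/target/manifest.json:
manifest = {
    "nodes": {
        "model.core_project.dim_customers": {
            "name": "dim_customers",
            "resource_type": "model",
            "access": "public",
            "schema": "core",
        },
        "model.core_project.stg_customers": {
            "name": "stg_customers",
            "resource_type": "model",
            "access": "protected",
            "schema": "core",
        },
    }
}

print(list(public_models(manifest)))  # → ['dim_customers']
```

Only the `public` model survives the filter; protected and private nodes never reach the downstream graph.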

                    dbt run
  core_project  ──────────────▶  target/manifest.json
  dim_customers                        │
  access: public                       │  file / S3 / GCS / Azure
                                       ▼
                            dbt-loom reads manifest,
  finance_project  ◀───────  injects dim_customers
  ref('core_project',        as external node
      'dim_customers')
                            dbt-loom reads manifest,
  marketing_project  ◀─────  injects dim_customers
  ref('core_project',        as external node
      'dim_customers')

Example of fan-out pattern with dbt-loom

This is the fan-out pattern: one upstream project exposes a public model, multiple downstream teams consume it independently. Each downstream project only needs the manifest (not the source code, not access to the upstream repo).


The Demo Setup

The demo repository has this layout:

dbt-loom-multi-project-demo/
├── profiles/
│   └── profiles.yml              # env-var-driven DuckDB path for all projects
├── projects/
│   ├── core_project/             # upstream: exposes dim_customers as public
│   ├── finance_project/          # downstream: revenue by customer
│   └── marketing_project/        # downstream: clicks by customer
├── warehouse/
│   └── analytics.duckdb          # shared DuckDB file
├── scripts/
│   ├── run_dbt_demo.sh           # one-shot end-to-end script
│   ├── mark_external_nodes_for_docs.py
│   └── flag_external_nodes.py
├── .github/workflows/
│   └── dbt-mesh-demo.yml         # CI showing two-job deployment order
└── Makefile                      # per-project targets with enforced ordering

All three projects point to the same DuckDB file in separate schemas (core, finance, marketing), simulating a shared warehouse. Dependencies are managed with uv:

# pyproject.toml
[project]
dependencies = [
    "dbt-core",
    "dbt-duckdb",
    "dbt-loom",
]

Install the environment with uv sync.

The DuckDB path and manifest path are driven by environment variables, so the repo works out of the box without editing any config files:

# profiles/profiles.yml
core_project:
  outputs:
    dev:
      type: duckdb
      path: "{{ env_var('DBT_LOOM_WAREHOUSE_PATH', 'warehouse/analytics.duckdb') }}"
      schema: core
      threads: 4
  target: dev

The same env_var() pattern applies to finance_project and marketing_project profiles.


Setting Up dbt-loom Step by Step

Schema of how dbt-loom resolves cross-project refs

1. Declare public models in the upstream project

In core_project/models/schema.yml, mark the models you want to expose downstream:

models:
  - name: dim_customers
    description: Public customer dimension exposed to downstream projects.
    access: public
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: integer
        tests:
          - not_null
          - unique
      - name: customer_name
        data_type: varchar
      - name: customer_tier
        data_type: varchar

Two things matter here. access: public is the signal to dbt-loom that this model should be injected into downstream projects. contract: enforced: true makes the column definitions a hard contract: dbt will fail the run if the actual output doesn’t match. Tests on customer_id (not_null, unique) are also defined at the source, not delegated to consumers. This is the right default for anything you expose across team boundaries.

The model itself uses explicit casts to satisfy the contract:

-- core_project/models/dim_customers.sql
select
  cast(customer_id  as integer) as customer_id,
  cast(customer_name as varchar) as customer_name,
  cast(customer_tier as varchar) as customer_tier
from {{ ref('customers') }}

2. Run the upstream project to produce the manifest

export DBT_PROFILES_DIR=$(pwd)/profiles
export DBT_LOOM_WAREHOUSE_PATH=$(pwd)/warehouse/analytics.duckdb
export CORE_PROJECT_MANIFEST_PATH=$(pwd)/projects/core_project/target/manifest.json
 
cd projects/core_project
dbt seed   # loads customers.csv → core.customers
dbt run    # builds core.dim_customers, writes target/manifest.json
dbt test   # validates not_null + unique on customer_id

After this step, core_project/target/manifest.json contains the full node metadata for dim_customers, including its access: public declaration and enforced contract columns. This file is what dbt-loom will read.

3. Configure dbt-loom in each downstream project

Create dbt_loom.config.yml at the root of each downstream project:

# finance_project/dbt_loom.config.yml
manifests:
  - name: core_project
    type: file
    config:
      path: $CORE_PROJECT_MANIFEST_PATH

The same config goes into marketing_project/dbt_loom.config.yml. The $CORE_PROJECT_MANIFEST_PATH env var is resolved at runtime by dbt-loom; you set it once and all downstream projects pick it up.

Two things worth knowing here:

Naming gotcha: The config file must be named dbt_loom.config.yml, with an underscore (even though the package itself is called dbt-loom); that is the filename dbt-loom looks for by default. You can override it with the DBT_LOOM_CONFIG environment variable.

No plugin registration needed: There is no plugins: block to add to dbt_project.yml. dbt-loom auto-discovers the config file when it’s present in the project directory. Your dbt_project.yml stays completely clean:

# finance_project/dbt_project.yml
name: finance_project
version: '1.0.0'
config-version: 2
profile: finance_project
 
model-paths: ["models"]
seed-paths: ["seeds"]
 
models:
  finance_project:
    +materialized: table

4. Reference upstream public models in downstream projects

The finance project builds revenue by joining orders against the upstream customer dimension:

-- finance_project/models/fct_revenue_by_customer.sql
with customers as (
  select * from {{ ref('core_project', 'dim_customers') }}
),
orders as (
  select
    cast(order_id    as integer) as order_id,
    cast(customer_id as integer) as customer_id,
    cast(amount      as double)  as amount
  from {{ ref('orders') }}
)
select
  c.customer_id,
  c.customer_name,
  c.customer_tier,
  sum(o.amount) as total_revenue
from orders o
join customers c on o.customer_id = c.customer_id
group by 1, 2, 3
order by 1

The marketing project does the same with campaign click events:

-- marketing_project/models/fct_clicks_by_customer.sql
with customers as (
  select * from {{ ref('core_project', 'dim_customers') }}
),
events as (
  select
    cast(event_id    as integer) as event_id,
    cast(customer_id as integer) as customer_id,
    cast(clicks      as integer) as clicks
  from {{ ref('campaign_events') }}
)
select
  c.customer_id,
  c.customer_name,
  c.customer_tier,
  sum(e.clicks) as total_clicks
from events e
join customers c on e.customer_id = c.customer_id
group by 1, 2, 3
order by 1

Both use the exact same ref('core_project', 'dim_customers') syntax. When dbt resolves this, dbt-loom intercepts the node lookup, finds dim_customers in the loaded manifest, and injects it as an external node. dbt then knows its physical location (core.dim_customers in DuckDB) and builds the join correctly without either downstream project having access to core_project’s source code.
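The resolution step can be sketched as a lookup table. This is a deliberately simplified illustration of the idea, not dbt-loom’s actual implementation; the relation values below mirror the demo’s DuckDB layout:

```python
# External nodes injected from the upstream manifest map a
# (project, model) pair to a physical relation in the warehouse.
external_nodes = {
    ("core_project", "dim_customers"): {
        "schema": "core",
        "identifier": "dim_customers",
    },
}

def resolve_ref(project: str, model: str) -> str:
    """Resolve a two-argument ref() to a schema-qualified relation name."""
    node = external_nodes.get((project, model))
    if node is None:
        raise KeyError(f"ref('{project}', '{model}') not found in injected nodes")
    return f"{node['schema']}.{node['identifier']}"

print(resolve_ref("core_project", "dim_customers"))  # → core.dim_customers
```

When dbt compiles the downstream SQL, the jinja ref() call is replaced by exactly this kind of relation name, which is why the join works without the upstream source code.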

5. Run the downstream projects

cd projects/finance_project
dbt seed   # loads orders.csv → finance.orders
dbt run    # builds finance.fct_revenue_by_customer
dbt test   # validates not_null on customer_id
 
cd ../marketing_project
dbt seed   # loads campaign_events.csv → marketing.campaign_events
dbt run    # builds marketing.fct_clicks_by_customer
dbt test   # validates not_null on customer_id

Expected outputs:

finance.fct_revenue_by_customer

| customer_id | customer_name | customer_tier | total_revenue |
|-------------|---------------|---------------|---------------|
| 1           | Alice         | Gold          | 200.0         |
| 2           | Bob           | Silver        | 40.0          |
| 3           | Carol         | Gold          | 220.0         |
| 4           | David         | Bronze        | 15.25         |

marketing.fct_clicks_by_customer

| customer_id | customer_name | customer_tier | total_clicks |
|-------------|---------------|---------------|--------------|
| 1           | Alice         | Gold          | 7            |
| 2           | Bob           | Silver        | 4            |
| 3           | Carol         | Gold          | 10           |
| 4           | David         | Bronze        | 1            |

Or run everything in one shot:

make demo
# or: scripts/run_dbt_demo.sh

The Makefile also exposes per-project targets with enforced ordering; make finance automatically runs make core first:

finance: core   # depends on core target
    ...
 
marketing: core  # depends on core target
    ...

This enforces the correct execution order locally without having to remember it manually.


Manifest Sources: Beyond Local Files

The demo uses a local file path via $CORE_PROJECT_MANIFEST_PATH, which is the right choice for a mono repo. For production multi-repo setups, you publish the manifest to object storage after each upstream CI run and point the config there instead.

AWS S3:

manifests:
  - name: core_project
    type: s3
    config:
      bucket_name: your-dbt-artifacts-bucket
      object_name: core_project/manifest.json

Install the S3 extra first: pip install "dbt-loom[s3]".

Google Cloud Storage:

manifests:
  - name: core_project
    type: gcs
    config:
      project_id: your-gcp-project-id
      bucket_name: your-dbt-artifacts-bucket
      object_name: core_project/manifest.json

Install the GCS extra first: pip install "dbt-loom[gcs]".

Azure Blob Storage:

manifests:
  - name: core_project
    type: azure
    config:
      container_name: dbt-artifacts
      blob_name: core_project/manifest.json

dbt Cloud (hybrid setup):

manifests:
  - name: core_project
    type: dbt_cloud
    config:
      account_id: 12345
      job_id: 67890
      api_endpoint: https://cloud.getdbt.com

dbt-loom also supports gzipped manifests natively: suffix the object name with .gz and it decompresses automatically:

config:
  object_name: core_project/manifest.json.gz
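Mechanically, this gzip support amounts to a simple compress/decompress roundtrip. The sketch below shows the idea: we write a (toy) compressed manifest the way an upstream CI job might, then read it back the way a consumer would. dbt-loom handles this transparently; you never write this code yourself:

```python
import gzip
import json
import pathlib
import tempfile

# Toy manifest standing in for a real target/manifest.json
manifest = {"nodes": {}, "metadata": {"project_name": "core_project"}}

tmp = pathlib.Path(tempfile.mkdtemp())
path = tmp / "manifest.json.gz"

# Upstream side: compress the manifest before publishing it
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(manifest, f)

# Downstream side: decompress on read (what the .gz suffix triggers)
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["metadata"]["project_name"])  # → core_project
```

Compressing the manifest is worth doing in larger projects; a manifest with hundreds of nodes shrinks substantially under gzip, which speeds up every downstream fetch.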

Deployment: Enforcing Execution Order in CI

The critical operational constraint with dbt-loom is manifest freshness and execution order. The downstream projects must run after the upstream one has completed and published its manifest. This is obvious with one downstream consumer, but becomes a real coordination problem when you have multiple: both finance_project and marketing_project depend on core_project, so core_project must always go first.

GitHub Actions example with dbt-loom dependencies

The GitHub Actions workflow in the demo makes this explicit with two jobs:

jobs:
  core_project:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - name: Build core project
        env:
          DBT_PROFILES_DIR: ${{ github.workspace }}/profiles
          DBT_LOOM_WAREHOUSE_PATH: ${{ github.workspace }}/warehouse/analytics.duckdb
        run: |
          cd projects/core_project
          ../../.venv/bin/dbt seed
          ../../.venv/bin/dbt run
          ../../.venv/bin/dbt test
      - name: Publish manifest artifact
        uses: actions/upload-artifact@v4
        with:
          name: core-manifest
          path: projects/core_project/target/manifest.json
 
  downstream_projects:
    runs-on: ubuntu-latest
    needs: core_project          # ← enforces execution order
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - name: Download core manifest
        uses: actions/download-artifact@v4
        with:
          name: core-manifest
          path: projects/core_project/target
      - name: Build finance project
        env:
          DBT_PROFILES_DIR: ${{ github.workspace }}/profiles
          DBT_LOOM_WAREHOUSE_PATH: ${{ github.workspace }}/warehouse/analytics.duckdb
          CORE_PROJECT_MANIFEST_PATH: ${{ github.workspace }}/projects/core_project/target/manifest.json
        run: |
          cd projects/finance_project
          ../../.venv/bin/dbt seed && ../../.venv/bin/dbt run && ../../.venv/bin/dbt test
      - name: Build marketing project
        env:
          DBT_PROFILES_DIR: ${{ github.workspace }}/profiles
          DBT_LOOM_WAREHOUSE_PATH: ${{ github.workspace }}/warehouse/analytics.duckdb
          CORE_PROJECT_MANIFEST_PATH: ${{ github.workspace }}/projects/core_project/target/manifest.json
        run: |
          cd projects/marketing_project
          ../../.venv/bin/dbt seed && ../../.venv/bin/dbt run && ../../.venv/bin/dbt test

The needs: core_project dependency is what enforces the pipeline order. The manifest is transferred between jobs as a GitHub Actions artifact.

One DuckDB-specific caveat worth understanding. In this demo, the downstream job also needs the DuckDB warehouse file to exist, because core.dim_customers is a table in that file and the downstream models join against it. In a real production setup with a cloud warehouse (Snowflake, BigQuery, Redshift, etc.), this problem doesn’t exist: the upstream project builds its tables into the warehouse, and any downstream project connecting to the same warehouse can query them. DuckDB is a local file, so you’d additionally need to pass the warehouse file as an artifact between CI jobs. The manifest-only transfer is the real pattern; DuckDB just adds a file-system wrinkle.
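If you do need to pass the DuckDB file between jobs, it can travel the same way the manifest does. This is a hypothetical addition to the workflow, not part of the demo; the artifact name core-warehouse is an assumption:

```yaml
# In the core_project job, after dbt run:
- name: Publish warehouse artifact
  uses: actions/upload-artifact@v4
  with:
    name: core-warehouse          # assumed name, not in the demo
    path: warehouse/analytics.duckdb

# In the downstream job, before dbt run:
- name: Download warehouse
  uses: actions/download-artifact@v4
  with:
    name: core-warehouse
    path: warehouse
```

Again, this is only needed because DuckDB is file-based; with a shared cloud warehouse, only the manifest artifact matters.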

With Dagster across code locations, you’d model dim_customers as an asset dependency that fct_revenue_by_customer and fct_clicks_by_customer explicitly declare; Dagster then enforces run order and propagates failure signals across projects. With Airflow, an ExternalTaskSensor or TriggerDagRunOperator covers the same need.


The Documentation Limitation and a Workaround

This is the roughest edge of dbt-loom, and it’s worth being direct about it.

When you run dbt docs generate in a downstream project, dbt-loom injects external nodes into the graph but they appear with minimal metadata. In the dbt docs UI, dim_customers shows up as an external model without its description, column docs, or any indication of where it comes from. For a small team this is manageable. At scale it becomes confusing.

The demo includes two Python scripts that address this directly.

mark_external_nodes_for_docs.py patches the target/manifest.json after dbt docs generate runs. It finds every node whose package_name differs from the current project, prefixes its name with [EXTERNAL], adds a [External node injected by dbt-loom] marker to its description, and sets meta.loom_external: true. The result is that external nodes are visually distinguishable in the lineage UI:

dbt docs generate
python scripts/mark_external_nodes_for_docs.py
# → patches finance_project/target/manifest.json in place
# → dim_customers is now named "[EXTERNAL] dim_customers"
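Based on that description, the core of the patch might look like the sketch below. This is an illustration assumed from the behavior described above, not the actual script from the repo, and the toy manifest contains only the fields the sketch touches:

```python
import copy

def mark_external_nodes(manifest: dict, current_project: str) -> dict:
    """Prefix and flag nodes injected from other packages so they
    stand out in the dbt docs lineage UI (sketch of the idea behind
    mark_external_nodes_for_docs.py)."""
    patched = copy.deepcopy(manifest)
    for node in patched["nodes"].values():
        if node["package_name"] != current_project:
            node["name"] = f"[EXTERNAL] {node['name']}"
            node["description"] = (
                "[External node injected by dbt-loom] " + node.get("description", "")
            )
            node.setdefault("meta", {})["loom_external"] = True
    return patched

# Toy manifest: one local node, one node injected from core_project
manifest = {
    "nodes": {
        "model.core_project.dim_customers": {
            "package_name": "core_project",
            "name": "dim_customers",
            "description": "Public customer dimension.",
        },
        "model.finance_project.fct_revenue_by_customer": {
            "package_name": "finance_project",
            "name": "fct_revenue_by_customer",
            "description": "",
        },
    }
}

patched = mark_external_nodes(manifest, current_project="finance_project")
print(patched["nodes"]["model.core_project.dim_customers"]["name"])
# → [EXTERNAL] dim_customers
```

The package_name comparison is the whole trick: any node whose package differs from the current project must have been injected, so it gets marked while local nodes pass through untouched.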

flag_external_nodes.py generates a post-docs audit report in both JSON and Markdown formats:

python scripts/flag_external_nodes.py
# → writes target/external_nodes_report.json
# → writes target/external_nodes_report.md

The Markdown report looks like:

# External Nodes Report: finance_project

Total external nodes: 1

## By Resource Type
- `model`: 1

## By Package
- `core_project`: 1

## Nodes
| Package       | Type    | Name           | Unique ID                          |
|---------------|---------|----------------|------------------------------------|
| core_project  | model   | dim_customers  | model.core_project.dim_customers   |

The script also accepts a --fail-on-external flag, which exits with a non-zero status if any external nodes are detected. This is useful as a governance-focused CI check: for example, asserting that a project has no unexpected external dependencies beyond what’s been explicitly approved.
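The logic behind such a check is small enough to sketch. The function names below are assumptions for illustration, not the actual script (which also writes the JSON and Markdown reports):

```python
def count_external_nodes(manifest: dict, current_project: str) -> int:
    """Count nodes whose package differs from the current project."""
    return sum(
        1
        for node in manifest["nodes"].values()
        if node["package_name"] != current_project
    )

def audit(manifest: dict, current_project: str, fail_on_external: bool) -> int:
    """Return a CI-style exit code: non-zero fails the job."""
    n = count_external_nodes(manifest, current_project)
    print(f"external nodes: {n}")
    return 1 if (fail_on_external and n > 0) else 0

# Toy manifest with one injected node from core_project
manifest = {
    "nodes": {
        "model.core_project.dim_customers": {"package_name": "core_project"},
        "model.finance_project.fct_revenue": {"package_name": "finance_project"},
    }
}

exit_code = audit(manifest, "finance_project", fail_on_external=True)
print(exit_code)  # → 1
```

Wired into CI, a non-zero exit code blocks the merge until the external dependency is either removed or explicitly approved.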

These scripts are workarounds, not solutions. The underlying limitation is a known constraint of dbt’s plugin API: injected nodes are not full ManifestNode objects, so dbt docs generate can’t enrich them the way it would native models. This may improve as the plugin API matures.


Caveats to Know Before Adopting

Plugin API is still beta. The dbt plugin system arrived in v1.6 and is still evolving. Pin your dbt and dbt-loom versions together and test upgrades carefully before rolling them out to production.

Manifest freshness is your responsibility. There’s no background metadata service. If the upstream project’s schema changes but the manifest isn’t republished, downstream projects silently build on stale information. Make manifest publishing a required, non-optional CI step.

Mark manifests as optional during bootstrapping. When standing up a new downstream project before the upstream has ever run, avoid hard failures:

manifests:
  - name: core_project
    type: file
    config:
      path: $CORE_PROJECT_MANIFEST_PATH
    optional: true

Exclude upstream packages from injection. If your upstream project uses dbt_project_evaluator or similar, you don’t want those nodes leaking into your downstream graph:

manifests:
  - name: core_project
    type: s3
    config:
      bucket_name: your-bucket
      object_name: manifest.json
    excluded_packages:
      - dbt_project_evaluator

When to Use dbt-loom vs. When to Consider dbt Cloud Mesh

Use dbt-loom if:

  • You’re on dbt Core and want multi-project references without a Cloud migration
  • Your team is comfortable owning manifest publishing in CI
  • You have a small-to-medium number of upstream projects (roughly 1–4)
  • You want to validate dbt Mesh patterns before committing to them at scale

Consider dbt Cloud Mesh if:

  • You need first-class cross-project lineage in a shared UI
  • You want environment-aware ref resolution (staging refs staging, prod refs prod) handled automatically
  • You have many interdependent projects and manifest management is becoming a coordination burden
  • You need model versioning and access policy governance at scale

Wrapping Up

The mono repo vs. multi repo question isn’t primarily a tooling question; it’s a question about team ownership boundaries. If your data teams are becoming meaningfully independent, splitting dbt projects along domain lines is a sound architectural move regardless of which repo strategy you adopt.

dbt-loom makes that move accessible without a Cloud contract. The mechanism is clean: upstream projects run and publish a manifest.json, downstream projects consume it and reference public models with the exact same ref() syntax dbt Cloud Mesh uses. No source code sharing, no tight coupling, just metadata.

The fan-out pattern in the demo (one core_project feeding both finance_project and marketing_project) is the topology you’ll encounter most often in practice. Get the execution order right (upstream first, always), enforce public model contracts, keep your manifests fresh, and you have a working open-source data mesh.

The docs limitation is real but manageable with the patching scripts in the demo. If rich cross-project lineage in a shared UI is a hard requirement for your team, that’s probably the clearest signal to look at dbt Cloud Mesh seriously.