# DataLex project layout
Every DataLex project is a directory tree with one YAML file per object. The
file's `kind:` key dispatches it to the right parser. This page is the
reference for what each `kind:` looks like and how the loader discovers the files.
> **Migrating from v1/v2 `*.model.yaml`?** Use `datalex datalex migrate to-datalex-layout path/to/legacy.model.yaml` to explode a legacy file into this layout. See also `archive/yaml-spec-v2.md`.
## Shape of a project

```text
my-project/
  datalex.yaml                        # kind: project
  models/
    conceptual/customer.yaml          # kind: entity, layer: conceptual
    logical/customer.yaml             # kind: entity, layer: logical
    physical/postgres/customer.yaml   # kind: entity, layer: physical
  sources/
    jaffle_shop_raw.yaml              # kind: source (imported via dbt sync)
  models/dbt/
    stg_customers.yaml                # kind: model (imported via dbt sync)
  glossary/
    customer.yaml                     # kind: term
  domains/
    sales.yaml                        # kind: domain
  policies/
    require_owner.yaml                # kind: policy
  .datalex/
    snippets/audit_columns.yaml       # kind: snippet
    lock.yaml                         # package lockfile (from resolve)
```

All paths are discoverable via globs configured in `datalex.yaml`; the defaults
match the tree above.
## `kind: project` — the manifest

`datalex.yaml` at the root declares the project and optional globs/imports.

```yaml
kind: project
name: my_project
version: '1'
dialects: [postgres, snowflake]
default_dialect: postgres

# Optional globs (defaults shown)
# models: models/**/*.yaml
# sources: sources/**/*.yaml
# glossary: glossary/**/*.yaml
# snippets: .datalex/snippets/**/*.yaml
# policies: policies/**/*.yaml
# diagrams: datalex/diagrams/**/*.yaml

imports:
  - package: acme/warehouse-core@1.4.0
    git: https://github.com/acme/warehouse-core.git
    ref: v1.4.0
    alias: wc
```

Schema: `datalex_core/_schemas/datalex/project.schema.json`.
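The globs above are defaults, so a manifest only needs to list the ones it changes. The following is a minimal sketch of that merge. It assumes overrides are top-level keys named as in the commented block. `DEFAULT_GLOBS` and `effective_globs` are hypothetical helpers for illustration, not DataLex's API.

```python
from typing import Any

# Hypothetical table mirroring the commented defaults in datalex.yaml.
DEFAULT_GLOBS: dict[str, str] = {
    "models": "models/**/*.yaml",
    "sources": "sources/**/*.yaml",
    "glossary": "glossary/**/*.yaml",
    "snippets": ".datalex/snippets/**/*.yaml",
    "policies": "policies/**/*.yaml",
    "diagrams": "datalex/diagrams/**/*.yaml",
}


def effective_globs(manifest: dict[str, Any]) -> dict[str, str]:
    """Return the default globs with any top-level string overrides from the manifest applied."""
    globs = dict(DEFAULT_GLOBS)
    for key in DEFAULT_GLOBS:
        value = manifest.get(key)
        if isinstance(value, str):  # ignore absent or non-string values
            globs[key] = value
    return globs
```

For example, a manifest containing `models: warehouse/**/*.yaml` changes only the `models` glob. The other five keep their defaults.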
## `kind: entity` — tables and views in three layers

Entities come in three `layer:` values (`conceptual`, `logical`, `physical`).
You can use each layer on its own or link them into a traceable
conceptual → logical → physical chain.
```yaml
kind: entity
layer: physical
dialect: postgres
name: customer
physical_name: dim_customer   # optional override for DDL
logical: customer             # optional back-reference to logical layer
description: One row per customer.
owner: growth
domain: sales
tags: [core, pii]
columns:
  - name: id
    type: bigint
    constraints: [{type: primary_key}]
  - name: email
    type: string(255)
    nullable: false
    sensitivity: pii
  - name: home_region_id
    type: int
    references:
      entity: region
      column: id
indexes:
  - name: idx_customer_email
    columns: [email]
    unique: true
```
Notable fields:

- `previous_name:` tracks renames explicitly; `datalex datalex diff` prefers explicit renames over heuristics.
- `physical:` on a column sets per-dialect type overrides:

  ```yaml
  - name: body
    type: string
    physical:
      snowflake: { type: VARCHAR(16777216) }
      postgres: { type: text }
  ```

- `raw_ddl:` preserves vendor-specific hints that emitters can't round-trip.
- `meta.datalex.*` is an emitter-owned namespace; user `meta` fields anywhere else are preserved across import and emit.

Schema: `datalex_core/_schemas/datalex/entity.schema.json`.
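With `physical:` overrides, picking the type an emitter writes for one dialect is a lookup with a fallback. The sketch below assumes a dialect without an entry under `physical:` falls back to the portable `type`. `column_type` is a hypothetical helper.

```python
from typing import Any


def column_type(column: dict[str, Any], dialect: str) -> str:
    """Return the column's type for `dialect`: the `physical:` override if present, else `type`."""
    # Assumption: a missing or empty override means "use the portable type".
    overrides = column.get("physical") or {}
    return (overrides.get(dialect) or {}).get("type", column["type"])
```

For the `body` column above, `column_type(body, "snowflake")` yields `VARCHAR(16777216)`. A dialect with no override, such as BigQuery, would get the portable `string`.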
## `kind: source` — external data (dbt sources)

One file per dbt source group, with nested tables:

```yaml
kind: source
name: jaffle_shop_raw
database: warehouse
schema: main
tables:
  - name: raw_customers
    description: Raw customer feed.
    columns:
      - name: id
        type: bigint
        nullable: false
        description: Primary key.
      - name: email
        type: string
    meta:
      datalex:
        dbt:
          unique_id: source.jaffle_shop.jaffle_shop_raw.raw_customers
```

Populated by `datalex datalex dbt sync`; emitted back to `sources.yml` by
`datalex datalex dbt emit`.

Schema: `datalex_core/_schemas/datalex/source.schema.json`.
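The following illustrates how a source file might map back onto dbt's `sources.yml` shape (`version: 2` plus a `sources:` list). It is a simplified sketch, not what `dbt emit` actually does. `to_dbt_sources` is hypothetical. It renames `type` to dbt's `data_type` column key, and it drops `nullable` and `meta.datalex`, which the real emitter may handle differently.

```python
from typing import Any


def to_dbt_sources(source: dict[str, Any]) -> dict[str, Any]:
    """Build a dict shaped like dbt's sources.yml from one `kind: source` document."""
    tables = []
    for table in source.get("tables", []):
        out_table: dict[str, Any] = {"name": table["name"]}
        if "description" in table:
            out_table["description"] = table["description"]
        columns = []
        for col in table.get("columns", []):
            out_col: dict[str, Any] = {"name": col["name"]}
            if "description" in col:
                out_col["description"] = col["description"]
            if "type" in col:
                out_col["data_type"] = col["type"]  # dbt's column key for the type
            columns.append(out_col)
        if columns:
            out_table["columns"] = columns
        tables.append(out_table)  # DataLex-owned meta is not carried over in this sketch
    entry: dict[str, Any] = {"name": source["name"], "tables": tables}
    for key in ("database", "schema"):
        if key in source:
            entry[key] = source[key]
    return {"version": 2, "sources": [entry]}
```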
## `kind: model` — derived tables (dbt models)

```yaml
kind: model
name: stg_customers
materialization: view
description: Staged customers, one row per customer.
depends_on:
  - source: {source: jaffle_shop_raw, name: raw_customers}
columns:
  - name: customer_id
    type: bigint
    description: Unique customer identifier.
    tests: [unique, not_null]
```

The model emits with `contract.enforced: true` when its DataLex columns carry
`data_type`, so `dbt parse` passes without edits.

Schema: `datalex_core/_schemas/datalex/model.schema.json`.
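dbt's enforced model contracts require a `data_type` on every column. A reasonable reading of "carry `data_type`" is therefore "every column is typed". The following sketches that decision under that assumption. `to_dbt_model` is hypothetical, and the real emitter may map more fields.

```python
from typing import Any


def to_dbt_model(model: dict[str, Any]) -> dict[str, Any]:
    """Sketch of one schema.yml model entry; the contract is enforced only if every column is typed."""
    out_cols = []
    for col in model.get("columns", []):
        out_col: dict[str, Any] = {"name": col["name"]}
        if "type" in col:
            out_col["data_type"] = col["type"]
        if "description" in col:
            out_col["description"] = col["description"]
        if "tests" in col:
            out_col["tests"] = list(col["tests"])
        out_cols.append(out_col)
    config: dict[str, Any] = {}
    if "materialization" in model:
        config["materialized"] = model["materialization"]
    # Assumption: enforce only when there are columns and all of them have a type.
    if out_cols and all("data_type" in c for c in out_cols):
        config["contract"] = {"enforced": True}
    entry: dict[str, Any] = {"name": model["name"], "config": config, "columns": out_cols}
    if "description" in model:
        entry["description"] = model["description"]
    return entry
```

Leaving the contract off when any column is untyped avoids emitting a contract that `dbt parse` would reject.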
## `kind: term` — glossary entries

```yaml
kind: term
name: customer
definition: A person or organization that has placed at least one order.
synonyms: [buyer, account]
steward: growth
```

Columns reference terms via `terms: [term:customer]`. Terms are loaded
independently of entities, so you can build the glossary incrementally.
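Because terms load independently, a column can point at a term that doesn't exist yet. The sketch below collects such dangling references instead of failing on them. Whether DataLex itself warns or errors here is not specified on this page. `unresolved_terms` is a hypothetical helper.

```python
from typing import Any

TERM_PREFIX = "term:"


def unresolved_terms(columns: list[dict[str, Any]], glossary: set[str]) -> list[str]:
    """List `term:<name>` references on columns that have no matching glossary entry yet."""
    missing = []
    for col in columns:
        for ref in col.get("terms", []):
            if not ref.startswith(TERM_PREFIX):
                raise ValueError(f"column {col['name']!r}: expected 'term:<name>', got {ref!r}")
            name = ref[len(TERM_PREFIX):]
            if name not in glossary:
                missing.append(name)
    return missing
```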
## `kind: domain` — subject-area grouping

```yaml
kind: domain
name: sales
description: Everything orders, revenue, and pipeline.
entities: [customer, order, invoice]
color: "#3b82f6"
```

Domains drive grouping in the UI and per-domain batch exports.
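Domain membership can be stated in two places: a domain's `entities:` list and an entity's own `domain:` field. The following is a sketch of building per-domain export batches. It assumes both sources count and that an entity naming an undeclared domain is simply skipped. `batches_by_domain` is hypothetical.

```python
from typing import Any


def batches_by_domain(domains: list[dict[str, Any]],
                      entities: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group entity names per domain, merging each domain's list with entities' `domain:` fields."""
    batches: dict[str, list[str]] = {d["name"]: list(d.get("entities", [])) for d in domains}
    for entity in entities:
        domain = entity.get("domain")
        # Entities pointing at a domain with no domain file are ignored in this sketch.
        if domain in batches and entity["name"] not in batches[domain]:
            batches[domain].append(entity["name"])
    return batches
```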
## `kind: policy` — governance rules

```yaml
kind: policy
name: require_owner
rule: require_owner
severity: error
applies_to: [entity]
```

The validator enforces policies during `datalex datalex validate`. See
governance-policy-spec in the archive for rule semantics; they are still
accurate (the wrapper changed, the rules didn't).
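The following illustrates what a `require_owner` check might do. It assumes the rule flags any document whose `kind` appears in `applies_to` and which has no non-empty `owner`. The archived spec remains the authority on the real semantics. `Finding` and `check_require_owner` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Finding:
    policy: str
    severity: str
    target: str
    message: str


def check_require_owner(policy: dict[str, Any], docs: list[dict[str, Any]]) -> list[Finding]:
    """Report every in-scope document that lacks an owner, at the policy's severity."""
    findings: list[Finding] = []
    for doc in docs:
        if doc.get("kind") not in policy.get("applies_to", []):
            continue  # policy does not apply to this kind
        if not doc.get("owner"):
            findings.append(Finding(policy["name"], policy["severity"], doc["name"],
                                    f"{doc['kind']} {doc['name']!r} has no owner"))
    return findings
```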
## `kind: snippet` — reusable fragments

```yaml
kind: snippet
name: audit_columns
description: Standard created_at / updated_at columns.
targets: [entity]
apply:
  columns:
    - name: created_at
      type: timestamp
      default: now()
    - name: updated_at
      type: timestamp
```

Entities opt in by adding `- use: audit_columns` under `columns:`. Preview the
expanded output with `datalex datalex expand <root>`.

Schema: `datalex_core/_schemas/datalex/snippet.schema.json`.
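Expansion can be pictured as splicing the snippet's columns in place of the `use:` entry. The following is a minimal sketch. It assumes the snippet's columns land at the position of the `use:` entry and that a snippet must list `entity` in `targets`. `expand_snippets` is hypothetical and is not the implementation behind `expand`.

```python
import copy
from typing import Any


def expand_snippets(entity: dict[str, Any],
                    snippets: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of `entity` with each `- use: <snippet>` column replaced by the snippet's columns."""
    columns: list[dict[str, Any]] = []
    for col in entity.get("columns", []):
        name = col.get("use")
        if name is None:
            columns.append(col)  # ordinary column, kept as-is
            continue
        snippet = snippets.get(name)
        if snippet is None:
            raise ValueError(f"{entity['name']}: unknown snippet {name!r}")
        if "entity" not in snippet.get("targets", []):
            raise ValueError(f"snippet {name!r} does not target entities")
        # Deep-copy so edits to the expanded entity never leak back into the snippet.
        columns.extend(copy.deepcopy(c) for c in snippet.get("apply", {}).get("columns", []))
    expanded = dict(entity)
    expanded["columns"] = columns
    return expanded
```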
## `kind: diagram` — ER diagram composition (v0.3+)

A diagram file composes an ER view from any number of referenced entity or
model files. Entity definitions stay in their source `.model.yaml` or dbt
`schema.yml`; the diagram stores only references and canvas positions, so
moving a node in one diagram never touches another.
```yaml
kind: diagram
name: customer_360
title: Customer 360
description: Customer + orders overview.
entities:
  - file: models/staging/schema.yml   # path relative to project root
    entity: stg_customers             # entity name within that file
    x: 60
    y: 60
  - file: models/marts/dim_customers.yml
    entity: dim_customers
    x: 360
    y: 60
    width: 280
edges_overrides: []                   # optional: hide/relabel inferred edges
viz:
  layoutMode: elk
  groupBySubjectArea: false
```

Create a diagram by clicking **New Diagram** in the Explorer, or write the
file by hand. Default location: `datalex/diagrams/<slug>.diagram.yaml`
(discoverable via the `diagrams:` glob in `datalex.yaml`).

Schema: `datalex_core/_schemas/datalex/diagram.schema.json`.
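A diagram holds only `(file, entity)` references, so anything that renders it has to look each one up in its file. The following is a sketch of grouping references per file so each file is read once, and of spotting references that don't resolve. How DataLex reports a dangling reference is not specified here. `refs_by_file` and `missing_refs` are hypothetical.

```python
from collections import defaultdict
from typing import Any


def refs_by_file(diagram: dict[str, Any]) -> dict[str, list[str]]:
    """Group a diagram's (file, entity) references so each referenced file is opened once."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for node in diagram.get("entities", []):
        grouped[node["file"]].append(node["entity"])
    return dict(grouped)


def missing_refs(diagram: dict[str, Any],
                 entities_in_file: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return references whose entity is not defined in the named file."""
    return [(f, e)
            for f, names in refs_by_file(diagram).items()
            for e in names
            if e not in entities_in_file.get(f, set())]
```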
## How the loader works

- Reads `datalex.yaml` for the project manifest and glob overrides.
- Resolves imports (see Cross-repo imports below).
- Walks each configured glob, streaming one file at a time (no whole-project `yaml.safe_load`).
- Dispatches on `kind:` to the per-kind parser/validator.
- Caches parsed docs by `sha256(content)` under `build/.cache/` or `~/.datalex/cache/`, so unchanged files don't re-parse on the next run.

Errors carry the file, line, column, and a `suggested_fix` where the
parser can produce one.
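The following is a simplified model of the walk, dispatch and cache steps above, not DataLex's loader. `load_project` and the `handlers` table are hypothetical. The cache is an in-memory dict standing in for `build/.cache/`. The YAML parser is passed in as `parse` (for example PyYAML's `yaml.safe_load`), so the sketch itself needs only the standard library.

```python
import hashlib
from pathlib import Path
from typing import Any, Callable, Optional


def load_project(root: Path,
                 globs: dict[str, str],
                 parse: Callable[[str], Any],
                 handlers: dict[str, Callable[[dict[str, Any]], Any]],
                 cache: Optional[dict[str, Any]] = None) -> list[Any]:
    """Walk each glob one file at a time, dispatch on `kind:`, and reuse cached parses by content hash."""
    cache = {} if cache is None else cache
    results: list[Any] = []
    seen: set[Path] = set()
    for pattern in globs.values():
        for path in sorted(Path(root).glob(pattern)):
            if path in seen or not path.is_file():
                continue  # a file matched by two globs is loaded once
            seen.add(path)
            text = path.read_text(encoding="utf-8")
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if key not in cache:
                cache[key] = parse(text)  # only new or changed content is parsed
            doc = cache[key]
            kind = doc.get("kind") if isinstance(doc, dict) else None
            handler = handlers.get(kind)
            if handler is None:
                raise ValueError(f"{path}: no parser for kind {kind!r}")
            results.append(handler(doc))
    return results
```

Keying the cache on a hash of the content rather than on path or mtime means an identical file is a hit wherever it lives. Any edit, however small, forces a re-parse.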
## Cross-repo imports

`imports:` in `datalex.yaml` lets one project consume another:

```yaml
imports:
  - package: acme/warehouse-core@1.4.0
    git: https://github.com/acme/warehouse-core.git
    ref: v1.4.0
    alias: wc
  - package: local/shared
    path: ../shared-models
```

`datalex datalex packages resolve` fetches and caches packages (under `~/.datalex/packages/`
by default), then writes a content-hashed lockfile at `.datalex/lock.yaml`.
Later loads reject drift unless you re-run `resolve` with `--update`.
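This page does not define what "content-hashed" covers. The following sketch assumes a single sha256 over every file's relative path and bytes. The real lockfile format may differ. `tree_digest` and `check_drift` are hypothetical, and the `update` flag only mimics the role of `--update`.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """sha256 over every file under `root`: relative path and bytes, length-prefixed, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        for chunk in (path.relative_to(root).as_posix().encode("utf-8"), path.read_bytes()):
            h.update(len(chunk).to_bytes(8, "big"))  # length prefix keeps path/content boundaries unambiguous
            h.update(chunk)
    return h.hexdigest()


def check_drift(root: Path, locked_digest: str, update: bool = False) -> str:
    """Raise if the package under `root` no longer matches its locked digest, unless `update` is set."""
    current = tree_digest(root)
    if current != locked_digest and not update:
        raise RuntimeError(f"{root}: contents drifted from the lockfile; re-resolve with --update")
    return current
```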
Imported entities are namespaced under their alias: `@wc.shared_dim`
resolves the imported `shared_dim` entity without colliding with a local
entity of the same name.
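Alias namespacing comes down to checking for the `@` prefix before the lookup. The following is a minimal sketch. It assumes a bare name always means a local entity and `@alias.name` always means an import. `resolve_ref` is hypothetical.

```python
from typing import Any


def resolve_ref(ref: str, local: dict[str, Any], imported: dict[str, dict[str, Any]]) -> Any:
    """Resolve `name` against local entities, or `@alias.name` against the aliased import."""
    if ref.startswith("@"):
        alias, sep, name = ref[1:].partition(".")
        if not sep or not name:
            raise ValueError(f"malformed import reference {ref!r}; expected '@alias.name'")
        try:
            return imported[alias][name]  # unknown alias or name both raise KeyError
        except KeyError:
            raise LookupError(f"{ref!r} not found in import {alias!r}") from None
    if ref not in local:
        raise LookupError(f"no local entity named {ref!r}")
    return local[ref]
```

Because the two namespaces are separate dicts, a local `shared_dim` and `@wc.shared_dim` can coexist.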
## Conventions

- Names match `^[a-z][a-z0-9_]*$` (snake_case identifiers).
- Tags match `^[a-z][a-z0-9-]*$` (kebab-case allowed).
- `meta.datalex.*` is owned by DataLex emitters/importers; never write into it by hand. Any other `meta.*` is yours and survives round-trip.
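The two patterns translate directly into checks. The following is a minimal sketch, with `naming_errors` as a hypothetical helper. It uses `re.fullmatch` because in Python `$` also matches just before a trailing newline, so `re.match` would accept `"customer\n"`.

```python
import re
from typing import Any

NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # snake_case identifiers
TAG_RE = re.compile(r"^[a-z][a-z0-9-]*$")   # lowercase, hyphens allowed


def naming_errors(doc: dict[str, Any]) -> list[str]:
    """Check a document's name, its columns' names, and its tags against the conventions."""
    errors: list[str] = []
    names = [doc.get("name", "")] + [c["name"] for c in doc.get("columns", []) if "name" in c]
    for name in names:
        if not NAME_RE.fullmatch(name):
            errors.append(f"name {name!r} is not snake_case")
    for tag in doc.get("tags", []):
        if not TAG_RE.fullmatch(tag):
            errors.append(f"tag {tag!r} does not match ^[a-z][a-z0-9-]*$")
    return errors
```

For the example entity above, `naming_errors` returns an empty list: `customer`, `id`, `email` and `home_region_id` are snake_case, and `core` and `pii` are valid tags.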
## See also
- Tutorial: dbt sync in 5 minutes
- CLI cheat sheet
- Architecture
- JSON Schemas — machine-readable reference