Adding Your First Data Source: A Step-by-Step Guide

Follow along as we walk through adding a new data source to the Nexus Model Architecture. From metadata configuration to validation, the whole process takes 3-5 days.


Why This Process Is Different

Traditional data warehouse implementations require weeks of custom development for each new source. The Nexus Model Architecture flips this: configuration over code.

In this guide, we'll walk through the complete process—from assessment to production deployment.

%%{init: {'theme':'base'}}%%
flowchart TD
    A["fa:fa-clipboard-list Stage 1: Preparation"] --> B["fa:fa-cogs Stage 2: Metadata Config"]
    B --> C["fa:fa-project-diagram Stage 3: ETL Config"]
    C --> D["fa:fa-tasks Stage 4: Validation"]
    D --> E["fa:fa-rocket Production"]
    subgraph A
        A1("Assess Source")
        A2("Load to Stage")
        A3("Build History")
    end
    subgraph B
        B1("Add SourceSystem")
        B2("Init Mappings")
        B3("Populate Flow")
    end
    subgraph C
        C1("Create Dataflow")
        C2("Create Pipeline")
    end
    subgraph D
        D1("Validate Data")
        D2("Review Fallout")
    end
    style E fill:#dcfce7,stroke:#10b981

Stage 1: Preparation (0.5-1 day)

Step 1: Assess the Data Source

Before diving in, answer these key questions:

  • Business purpose: What insights will this provide?
  • Data type: Dimension, fact, EAV properties, or bridge table?
  • Volume: Current row count and growth rate
  • Dependencies: Does it reference other entities?

Step 2: Prepare Raw Data in Stage Layer

Load your source data into the appropriate Stage database:

  • DS_Stage: For common/reference data
  • DS_Stage_[SourceName]: For specific source systems

Ensure the table has a unique identifier column (primary key or unique index).
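For example, a minimal stage table might look like this. The database, table, and column names are illustrative, not prescribed by the framework; only the unique identifier requirement is:

```sql
-- Hypothetical example: a stage table for order data in a
-- source-specific Stage database. All names are illustrative.
CREATE TABLE [DS_Stage_YourSource].[dbo].[YourTable] (
    OrderId      INT           NOT NULL,  -- unique identifier (required)
    CustomerId   INT           NULL,
    OrderDate    DATE          NULL,
    TotalAmount  DECIMAL(18,2) NULL,
    CONSTRAINT PK_YourTable PRIMARY KEY (OrderId)
);
```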

Step 3: Set Up History Tracking

Enable change data capture using the framework's built-in procedure:

EXEC [dbo].[BuildHistoryTracking] '[dbo].[YourSourceTable]'

This creates a History table that tracks all changes to your source data.
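Assuming the procedure follows the common convention of a companion table with a History suffix and validity columns, you can confirm tracking is working with a quick query. The table name and audit columns below are assumptions; check what the procedure actually created in your environment:

```sql
-- Hypothetical check: inspect the most recent change records.
-- Table name and ValidFrom column are assumptions based on
-- common change-tracking conventions.
SELECT TOP (10) *
FROM [dbo].[YourSourceTable_History]
ORDER BY ValidFrom DESC;
```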

Stage 2: Metadata Configuration (0.5-1 day)

Step 1: Add SourceSystem Entry

This is where the magic happens. One INSERT statement defines your entire integration:

INSERT INTO Reference.SourceSystem (
    SourceDescription,
    SourceDatabase,
    SourceTable,
    TargetSchema,
    TargetTable,
    SourceIdentifierColumn,
    TargetIdentifierColumn,
    ValidFromDate,
    ValidToDate
) VALUES (
    'Your Source Name',
    'DS_Stage_YourSource',
    'YourTable',
    'Sales',  -- or Product, Contact, etc.
    'Order',  -- target table name
    'OrderId',
    'OrderId',
    '2024-01-01',
    '9999-12-31'
)

Step 2: Initialize Column Mappings

Run the initialization procedure to auto-discover columns:

EXEC [Reference].[InitSourceColumnMapping]

Then manually map source columns to target columns in the SourceColumnMapping table.
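A manual mapping entry might look like the following sketch. The column names of SourceColumnMapping are assumptions for illustration; check the actual table definition before running anything:

```sql
-- Hypothetical example: point a discovered source column
-- at its target column. Column names are assumptions.
UPDATE Reference.SourceColumnMapping
SET TargetColumn = 'OrderTotal'
WHERE SourceSystemId = 42               -- your SourceSystemId
  AND SourceColumn   = 'TotalAmount';
```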

Step 3: Populate Flow Table

Run the standard stored procedure to prepare incremental batches:

EXEC [dbo].[ReadSourceTable_NewRecords] @SourceSystemId = [YourId]
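After the procedure runs, it is worth sanity-checking that new records landed in the Flow table. The Flow table name comes from the framework; the BatchId column is an assumption for illustration:

```sql
-- Hypothetical check: count pending records per batch.
SELECT BatchId, COUNT(*) AS PendingRecords
FROM [dbo].[Flow]
WHERE SourceSystemId = 42   -- your SourceSystemId
GROUP BY BatchId;
```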

Stage 3: ETL Configuration (1-2 days)

Create Dataflow

In Azure Data Factory (or SSIS), create a dataflow that:

  • Reads from the Flow table
  • Performs FK lookups
  • Validates business rules
  • Splits valid records to DWH, invalid to Fallout
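If you want to prototype the routing logic in SQL before building the dataflow, the valid/invalid split can be sketched like this. All table and column names here are illustrative, not the framework's actual schema:

```sql
-- Hypothetical sketch of the dataflow's routing logic:
-- records whose FK lookup fails go to Fallout, the rest to the DWH.
INSERT INTO [Fallout].[Order] (OrderId, FalloutReason)
SELECT f.OrderId, 'Unknown CustomerId'
FROM [dbo].[Flow] f
LEFT JOIN [Contact].[Customer] c ON c.CustomerId = f.CustomerId
WHERE c.CustomerId IS NULL;

INSERT INTO [Sales].[Order] (OrderId, CustomerId, OrderDate, TotalAmount)
SELECT f.OrderId, f.CustomerId, f.OrderDate, f.TotalAmount
FROM [dbo].[Flow] f
INNER JOIN [Contact].[Customer] c ON c.CustomerId = f.CustomerId;
```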

Create Pipeline

Orchestrate the process:

  1. Execute ReadSourceTable_NewRecords
  2. Execute dataflow
  3. Log completion

Stage 4: Validation & Deployment (0.5-1 day)

Validate Data Quality

  • Check row counts match expectations
  • Review NULL values
  • Verify no duplicates
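These checks translate into straightforward queries. The target table matches the SourceSystem entry from Stage 2; the stage table and business columns are illustrative:

```sql
-- Row count: stage vs. target
SELECT
    (SELECT COUNT(*) FROM [DS_Stage_YourSource].[dbo].[YourTable]) AS StageRows,
    (SELECT COUNT(*) FROM [Sales].[Order])                         AS TargetRows;

-- NULLs in a key business column (column name is illustrative)
SELECT COUNT(*) AS NullAmounts
FROM [Sales].[Order]
WHERE TotalAmount IS NULL;

-- Duplicate identifiers
SELECT OrderId, COUNT(*) AS Cnt
FROM [Sales].[Order]
GROUP BY OrderId
HAVING COUNT(*) > 1;
```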

Review Fallout

Check the Fallout table for any data quality issues and resolve them.
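A grouped query gives a quick overview of what fell out and why. The Fallout table comes from the framework; the FalloutReason column is an assumption for illustration:

```sql
-- Hypothetical summary: fallout records grouped by reason.
SELECT FalloutReason, COUNT(*) AS RecordCount
FROM [Fallout].[Order]
GROUP BY FalloutReason
ORDER BY RecordCount DESC;
```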

Schedule Production Pipeline

Configure the pipeline to run on your desired schedule (daily, hourly, etc.) and set up alerts.

That's It!

In 3-5 days, you've integrated a new data source—without writing custom ETL code. The framework handles the heavy lifting; you configure the metadata.

For detailed technical documentation, see our Resources page.

Ready to Transform Your ETL Processes?

Discover how metadata-driven ETL can accelerate your data integration projects.