Adding Your First Data Source: A Step-by-Step Guide

Follow along as we walk through adding a new data source to the Nexus Model Architecture. From metadata configuration to validation, the whole process takes 3-5 days.


Why This Process Is Different

Traditional data warehouse implementations require weeks of custom development for each new source. The Nexus Model Architecture flips this: configuration over code.

In this guide, we'll walk through the complete process—from assessment to production deployment.

%%{init: {'theme':'base'}}%%
flowchart TD
    A["fa:fa-clipboard-list Stage 1: Preparation"] --> B["fa:fa-cogs Stage 2: Metadata Config"]
    B --> C["fa:fa-project-diagram Stage 3: ETL Config"]
    C --> D["fa:fa-tasks Stage 4: Validation"]
    D --> E["fa:fa-rocket Production"]
    subgraph A
        A1("Assess Source")
        A2("Load to Stage")
        A3("Build History")
    end
    subgraph B
        B1("Add SourceSystem")
        B2("Init Mappings")
        B3("Populate Flow")
    end
    subgraph C
        C1("Create Dataflow")
        C2("Create Pipeline")
    end
    subgraph D
        D1("Validate Data")
        D2("Review Fallout")
    end
    style E fill:#dcfce7,stroke:#10b981

Stage 1: Preparation (0.5-1 day)

Step 1: Assess the Data Source

Before diving in, answer these key questions:

  • Business purpose: What insights will this provide?
  • Data type: Dimension, fact, EAV properties, or bridge table?
  • Volume: Current row count and growth rate
  • Dependencies: Does it reference other entities?

Step 2: Prepare Raw Data in Stage Layer

Load your source data into the appropriate Stage database:

  • DS_Stage: For common/reference data
  • DS_Stage_[SourceName]: For specific source systems

Ensure the table has a unique identifier column (primary key or unique index).
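For example, a minimal stage table might look like this. The database, table, and column names are illustrative, not prescribed by the framework; only the unique identifier requirement is:

```sql
-- Hypothetical example: a stage table for order data in a
-- source-specific Stage database. All names are illustrative.
CREATE TABLE [DS_Stage_YourSource].[dbo].[YourTable] (
    OrderId      INT           NOT NULL,  -- unique identifier (required)
    CustomerId   INT           NULL,
    OrderDate    DATE          NULL,
    TotalAmount  DECIMAL(18,2) NULL,
    CONSTRAINT PK_YourTable PRIMARY KEY (OrderId)
);
```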

Step 3: Set Up History Tracking

Enable change data capture using the framework's built-in procedure:

EXEC [dbo].[BuildHistoryTracking] '[dbo].[YourSourceTable]'

This creates a History table that tracks all changes to your source data.
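Assuming the procedure follows the common convention of a companion table with a History suffix and validity columns, you can confirm tracking is working with a quick query. The table name and audit columns below are assumptions; check what the procedure actually created in your environment:

```sql
-- Hypothetical check: inspect the most recent change records.
-- Table name and ValidFrom column are assumptions based on
-- common change-tracking conventions.
SELECT TOP (10) *
FROM [dbo].[YourSourceTable_History]
ORDER BY ValidFrom DESC;
```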

Stage 2: Metadata Configuration (0.5-1 day)

Step 1: Add SourceSystem Entry

This is where the magic happens. One INSERT statement defines your entire integration:

INSERT INTO Reference.SourceSystem (
    SourceDescription,
    SourceDatabase,
    SourceTable,
    TargetSchema,
    TargetTable,
    SourceIdentifierColumn,
    TargetIdentifierColumn,
    ValidFromDate,
    ValidToDate
) VALUES (
    'Your Source Name',
    'DS_Stage_YourSource',
    'YourTable',
    'Sales',  -- or Product, Contact, etc.
    'Order',  -- target table name
    'OrderId',
    'OrderId',
    '2024-01-01',
    '9999-12-31'
)

Step 2: Initialize Column Mappings

Run the initialization procedure to auto-discover columns:

EXEC [Reference].[InitSourceColumnMapping]

Then manually map source columns to target columns in the SourceColumnMapping table.
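A manual mapping entry might look like the following sketch. The column names of SourceColumnMapping are assumptions for illustration; check the actual table definition before running anything:

```sql
-- Hypothetical example: point a discovered source column
-- at its target column. Column names are assumptions.
UPDATE Reference.SourceColumnMapping
SET TargetColumn = 'OrderTotal'
WHERE SourceSystemId = 42               -- your SourceSystemId
  AND SourceColumn   = 'TotalAmount';
```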

Step 3: Populate Flow Table

Run the standard stored procedure to prepare incremental batches:

EXEC [dbo].[ReadSourceTable_NewRecords] @SourceSystemId = [YourId]
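After the procedure runs, it is worth sanity-checking that new records landed in the Flow table. The Flow table name comes from the framework; the BatchId column is an assumption for illustration:

```sql
-- Hypothetical check: count pending records per batch.
SELECT BatchId, COUNT(*) AS PendingRecords
FROM [dbo].[Flow]
WHERE SourceSystemId = 42   -- your SourceSystemId
GROUP BY BatchId;
```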

Stage 3: ETL Configuration (1-2 days)

Create Dataflow

In Azure Data Factory (or SSIS), create a dataflow that:

  • Reads from the Flow table
  • Performs FK lookups
  • Validates business rules
  • Splits valid records to DWH, invalid to Fallout
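If you want to prototype the routing logic in SQL before building the dataflow, the valid/invalid split can be sketched like this. All table and column names here are illustrative, not the framework's actual schema:

```sql
-- Hypothetical sketch of the dataflow's routing logic:
-- records whose FK lookup fails go to Fallout, the rest to the DWH.
INSERT INTO [Fallout].[Order] (OrderId, FalloutReason)
SELECT f.OrderId, 'Unknown CustomerId'
FROM [dbo].[Flow] f
LEFT JOIN [Contact].[Customer] c ON c.CustomerId = f.CustomerId
WHERE c.CustomerId IS NULL;

INSERT INTO [Sales].[Order] (OrderId, CustomerId, OrderDate, TotalAmount)
SELECT f.OrderId, f.CustomerId, f.OrderDate, f.TotalAmount
FROM [dbo].[Flow] f
INNER JOIN [Contact].[Customer] c ON c.CustomerId = f.CustomerId;
```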

Create Pipeline

Orchestrate the process:

  1. Execute ReadSourceTable_NewRecords
  2. Execute dataflow
  3. Log completion

Stage 4: Validation & Deployment (0.5-1 day)

Validate Data Quality

  • Check row counts match expectations
  • Review NULL values
  • Verify no duplicates
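These checks translate into straightforward queries. The target table matches the SourceSystem entry from Stage 2; the stage table and business columns are illustrative:

```sql
-- Row count: stage vs. target
SELECT
    (SELECT COUNT(*) FROM [DS_Stage_YourSource].[dbo].[YourTable]) AS StageRows,
    (SELECT COUNT(*) FROM [Sales].[Order])                         AS TargetRows;

-- NULLs in a key business column (column name is illustrative)
SELECT COUNT(*) AS NullAmounts
FROM [Sales].[Order]
WHERE TotalAmount IS NULL;

-- Duplicate identifiers
SELECT OrderId, COUNT(*) AS Cnt
FROM [Sales].[Order]
GROUP BY OrderId
HAVING COUNT(*) > 1;
```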

Review Fallout

Check the Fallout table for any data quality issues and resolve them.
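A grouped query gives a quick overview of what fell out and why. The Fallout table comes from the framework; the FalloutReason column is an assumption for illustration:

```sql
-- Hypothetical summary: fallout records grouped by reason.
SELECT FalloutReason, COUNT(*) AS RecordCount
FROM [Fallout].[Order]
GROUP BY FalloutReason
ORDER BY RecordCount DESC;
```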

Schedule Production Pipeline

Configure the pipeline to run on your desired schedule (daily, hourly, etc.) and set up alerts.

That's It!

In 3-5 days, you've integrated a new data source—without writing custom ETL code. The framework handles the heavy lifting; you configure the metadata.

For detailed technical documentation, see our Resources page.

Ready to Transform Your ETL Processes?

Discover how metadata-driven ETL can accelerate your data integration projects.