Adding Your First Data Source: A Step-by-Step Guide
Follow along as we walk through adding a new data source to the Nexus Model Architecture. From metadata configuration to validation, see how it's done in 3-5 days.
Why This Process Is Different
Traditional data warehouse implementations require weeks of custom development for each new source. The Nexus Model Architecture flips this: configuration over code.
In this guide, we'll walk through the complete process—from assessment to production deployment.
Stage 1: Preparation (0.5-1 day)
Step 1: Assess the Data Source
Before diving in, answer these key questions:
- Business purpose: What insights will this provide?
- Data type: Dimension, fact, EAV properties, or bridge table?
- Volume: Current row count and growth rate
- Dependencies: Does it reference other entities?
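For the volume question, a quick query against the source gives you a baseline; the table and audit column names below are placeholders:

```sql
-- Baseline volume check; [dbo].[YourSourceTable] and CreatedDate are placeholder names.
SELECT
    COUNT(*)         AS CurrentRowCount,
    MIN(CreatedDate) AS EarliestRecord,  -- assumes an audit timestamp column exists
    MAX(CreatedDate) AS LatestRecord
FROM [dbo].[YourSourceTable];
```

Comparing EarliestRecord and LatestRecord against the row count gives a rough growth rate per day.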
Step 2: Prepare Raw Data in Stage Layer
Load your source data into the appropriate Stage database:
- DS_Stage: For common/reference data
- DS_Stage_[SourceName]: For specific source systems
Ensure the table has a unique identifier column (primary key or unique index).
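If you're not sure the identifier column is truly unique, a duplicate check like this (table and column names are illustrative) will tell you before the framework does:

```sql
-- Any rows returned here are duplicate identifiers that must be resolved before loading.
SELECT OrderId, COUNT(*) AS DuplicateCount
FROM [DS_Stage_YourSource].[dbo].[YourTable]
GROUP BY OrderId
HAVING COUNT(*) > 1;
```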
Step 3: Set Up History Tracking
Enable change data capture using the framework's built-in procedure:
EXEC [dbo].[BuildHistoryTracking] '[dbo].[YourSourceTable]'
This creates a History table that tracks all changes to your source data.
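The exact shape of the generated History table depends on the framework, but a quick sanity check that changes are being captured might look like this (the `_History` suffix and ValidFromDate column are assumptions):

```sql
-- Assumes the framework names the companion table [dbo].[YourSourceTable_History]
-- and stamps each version with a ValidFromDate; adjust to your conventions.
SELECT TOP 10 *
FROM [dbo].[YourSourceTable_History]
ORDER BY ValidFromDate DESC;
```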
Stage 2: Metadata Configuration (0.5-1 day)
Step 1: Add SourceSystem Entry
This is where the magic happens. One INSERT statement defines your entire integration:
INSERT INTO Reference.SourceSystem (
    SourceDescription,
    SourceDatabase,
    SourceTable,
    TargetSchema,
    TargetTable,
    SourceIdentifierColumn,
    TargetIdentifierColumn,
    ValidFromDate,
    ValidToDate
) VALUES (
    'Your Source Name',
    'DS_Stage_YourSource',
    'YourTable',
    'Sales',       -- or Product, Contact, etc.
    'Order',       -- target table name
    'OrderId',
    'OrderId',
    '2024-01-01',
    '9999-12-31'
)
Step 2: Initialize Column Mappings
Run the initialization procedure to auto-discover columns:
EXEC [Reference].[InitSourceColumnMapping]
Then manually map source columns to target columns in the SourceColumnMapping table.
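The columns of SourceColumnMapping are framework-specific, but a manual mapping typically amounts to an update like this (SourceSystemId, column names, and values are illustrative):

```sql
-- Illustrative only: adjust the column names to your SourceColumnMapping schema.
UPDATE Reference.SourceColumnMapping
SET TargetColumn = 'OrderDate'
WHERE SourceSystemId = 42          -- the id from your SourceSystem entry
  AND SourceColumn   = 'order_dt'; -- raw column name in the staged table
```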
Step 3: Populate Flow Table
Run the standard stored procedure to prepare incremental batches:
EXEC [dbo].[ReadSourceTable_NewRecords] @SourceSystemId = [YourId]
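You can then spot-check the batch the procedure prepared. The Flow table name and filter column below are assumptions about the framework's conventions:

```sql
-- Assumes a per-target Flow table; adjust the schema/table name to your framework.
SELECT TOP 100 *
FROM [Flow].[Order]
WHERE SourceSystemId = 42;  -- your SourceSystemId
```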
Stage 3: ETL Configuration (1-2 days)
Create Dataflow
In Azure Data Factory (or SSIS), create a dataflow that:
- Reads from the Flow table
- Performs FK lookups
- Validates business rules
- Splits valid records to DWH, invalid to Fallout
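Conceptually, the lookup-validate-split steps do the equivalent of the following T-SQL; your dataflow expresses this visually, and all table, column, and rule names here are placeholders:

```sql
-- Valid rows: the FK lookup succeeds and the business rule passes.
INSERT INTO [Sales].[Order] (OrderId, CustomerKey, OrderDate, Amount)
SELECT f.OrderId, c.CustomerKey, f.OrderDate, f.Amount
FROM [Flow].[Order] AS f
JOIN [Contact].[Customer] AS c
    ON c.CustomerId = f.CustomerId        -- FK lookup
WHERE f.Amount >= 0;                      -- example business rule

-- Invalid rows go to Fallout with a reason, so nothing is silently dropped.
INSERT INTO [Fallout].[Order] (OrderId, FalloutReason)
SELECT f.OrderId,
       CASE WHEN c.CustomerKey IS NULL THEN 'Unknown customer'
            ELSE 'Negative amount' END
FROM [Flow].[Order] AS f
LEFT JOIN [Contact].[Customer] AS c
    ON c.CustomerId = f.CustomerId
WHERE c.CustomerKey IS NULL OR f.Amount < 0;
```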
Create Pipeline
Orchestrate the process:
- Execute ReadSourceTable_NewRecords
- Execute dataflow
- Log completion
Stage 4: Validation & Deployment (0.5-1 day)
Validate Data Quality
- Check that row counts match expectations
- Review NULL values in columns that should always be populated
- Verify there are no duplicates on the business key
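Each of these checks is a short query; the table and column names continue the placeholder examples used earlier:

```sql
-- 1. Row counts: staged vs. loaded (a large gap means records fell out).
SELECT (SELECT COUNT(*) FROM [DS_Stage_YourSource].[dbo].[YourTable]) AS Staged,
       (SELECT COUNT(*) FROM [Sales].[Order])                         AS Loaded;

-- 2. NULLs in a column that should always be populated.
SELECT COUNT(*) AS NullOrderDates
FROM [Sales].[Order]
WHERE OrderDate IS NULL;

-- 3. Duplicates on the business key.
SELECT OrderId, COUNT(*) AS Copies
FROM [Sales].[Order]
GROUP BY OrderId
HAVING COUNT(*) > 1;
```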
Review Fallout
Check the Fallout table for any data quality issues and resolve them.
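A simple grouping query (Fallout table name and columns assumed) surfaces the most common rejection reasons first:

```sql
-- Group rejected records by reason so the biggest data quality issues stand out.
SELECT FalloutReason, COUNT(*) AS RecordCount
FROM [Fallout].[Order]
GROUP BY FalloutReason
ORDER BY RecordCount DESC;
```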
Schedule Production Pipeline
Configure the pipeline to run on your desired schedule (daily, hourly, etc.) and set up alerts.
That's It!
In 3-5 days, you've integrated a new data source—without writing custom ETL code. The framework handles the heavy lifting; you configure the metadata.
For detailed technical documentation, see our Resources page.
Ready to Transform Your ETL Processes?
Discover how metadata-driven ETL can accelerate your data integration projects.