Data Source Diff

Overview

Data Source Diff allows you to compare entire data sources or specific tables between different database instances. This is particularly useful for validating data migrations or ensuring consistency across environments.

Key Features

Bulk Comparison: Compare multiple tables in one operation
Filtering Options: Apply filters to focus on specific subsets of data
Summary Diff: Get a high-level summary of differences between source and target
Meta Diff: Compare table structures and metadata across data sources
Data Diff: Verify row-level data consistency between source and target
Export Diff Report: Export results as PDF/CSV/JSON for sharing or further analysis

How to Use SmartDiff Data Source Diff

Step 1: Access SmartDiff

Select Smart Diff from the left panel
Click Workflow to view workflow history

Step 2: Create New Diff Workflow

Click CREATE DIFF
By default, DATA SOURCE DIFF is selected
Choose Source and Target data sources (can be the same or different)
Click Next

Step 3: Select Items

On this page, select the items to compare. The following options are available:

Feature	Description
AUTO MAP	Automatically maps items based on similarity. Low-similarity items are skipped.
Transform & Filtering	Apply transformations or exclude specific columns (e.g., exclude timestamp columns that are expected to differ, so they do not show up as differences).
SmartDiff Configuration	Configure diff settings: enable or disable the summary diff, choose the summary type (count or range), set the batch size, choose parallel or sequential execution, and define the storage location (S3 or Azure Blob).
Filter Columns	Exclude unnecessary columns or define custom keys for comparison.
SCHEDULE (Coming Soon)	Schedule diffs to run at specific times to reduce system load.
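
To illustrate what similarity-based mapping with a skip threshold can look like, here is a minimal Python sketch. It is not SmartDiff's actual AUTO MAP algorithm, which this page does not describe. The function name `auto_map`, the 0.7 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def auto_map(source_items, target_items, threshold=0.7):
    """Pair each source item with its most similar unused target item.

    Hypothetical helper: items whose best similarity score is below
    `threshold` are left unmapped (skipped).
    """
    mapping = {}
    unused = set(target_items)
    for src in source_items:
        best, best_score = None, 0.0
        # Sort for deterministic tie-breaking between equally similar names.
        for tgt in sorted(unused):
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best, best_score = tgt, score
        if best is not None and best_score >= threshold:
            mapping[src] = best
            unused.discard(best)
    return mapping


# Illustrative table names only.
source_tables = ["customers", "orders", "audit_log"]
target_tables = ["Customers", "orders_v2", "payments"]
mapped = auto_map(source_tables, target_tables)
print(mapped)
```

In this example `audit_log` has no sufficiently similar counterpart, so it is skipped and would need to be mapped manually.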

Once ready:

Review items from both source and target (schemas, tables, etc.)
Map corresponding items
Click Next

Step 4: Configure Key Column

note

If a primary key exists, this step is skipped.

Select a key column used to match records between source and target (e.g., the ID column for cost tables)
Click Proceed
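
A key column is only useful for matching records if its values are present and unique. The sketch below shows one way to sanity-check a candidate column before selecting it. The helper `is_valid_key` and the sample rows are hypothetical, and whether SmartDiff performs a similar check is not stated on this page.

```python
def is_valid_key(rows, key_column):
    """Return True if every row has a non-null, unique value in `key_column`."""
    seen = set()
    for row in rows:
        value = row.get(key_column)
        if value is None or value in seen:
            return False
        seen.add(value)
    return True


# Illustrative rows from a cost table.
cost_rows = [
    {"id": 1, "cost": 10},
    {"id": 2, "cost": 10},
    {"id": 3, "cost": 15},
]
print(is_valid_key(cost_rows, "id"))    # True: all ids are distinct
print(is_valid_key(cost_rows, "cost"))  # False: the value 10 repeats
```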

Step 5: Review Results

tip

The diff runs asynchronously, so you don't need to wait. Check the results once the process finishes.

View live progress with completion percentage
Click View Diff to explore detailed reports

Step 6: Analyze Detailed Report

Diff Overview

Shows high-level statistics:

Diff Columns: Number of columns with differences
Diff Rows: Number of rows with differences
Same Rows: Number of identical rows
Rows in Source: Total rows in source
Rows in Target: Total rows in target
Missing Rows in Target: Rows present in source but missing in target
New Rows in Target: Rows present in target but not in source
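
The statistics above can be reproduced on a small scale to see how they relate to each other. The following Python sketch assumes each data source is loaded as a dictionary mapping a key value to a row (a dictionary of column values). The function `diff_overview` and the sample data are illustrative, not part of SmartDiff.

```python
def diff_overview(source, target):
    """Compute overview statistics for two keyed row sets.

    `source` and `target` map key -> {column: value}.
    """
    shared = source.keys() & target.keys()
    diff_columns = set()
    diff_rows = 0
    for key in shared:
        s, t = source[key], target[key]
        # Columns whose values differ for this key.
        changed = {c for c in s.keys() | t.keys() if s.get(c) != t.get(c)}
        if changed:
            diff_rows += 1
            diff_columns |= changed
    return {
        "Diff Columns": len(diff_columns),
        "Diff Rows": diff_rows,
        "Same Rows": len(shared) - diff_rows,
        "Rows in Source": len(source),
        "Rows in Target": len(target),
        "Missing Rows in Target": len(source.keys() - target.keys()),
        "New Rows in Target": len(target.keys() - source.keys()),
    }


source_rows = {
    1: {"name": "Ann", "cost": 10},
    2: {"name": "Bob", "cost": 20},
    3: {"name": "Cy", "cost": 30},
}
target_rows = {
    1: {"name": "Ann", "cost": 10},
    2: {"name": "Bob", "cost": 25},  # cost changed
    4: {"name": "Dee", "cost": 40},  # only in target
}
overview = diff_overview(source_rows, target_rows)
for label, value in overview.items():
    print(f"{label}: {value}")
```

Here key 2 differs in one column, key 1 is identical, key 3 is missing in the target, and key 4 is new in the target.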

Summary View

High-level count differences without full row-by-row comparison:

Option	Description
ALL DATA	Shows differences for all columns
ONLY DIFF	Shows only columns with differences
BY SOURCE	Compare frequency counts from source to target
BY TARGET	Compare frequency counts from target to source
Graphs	Visual differences based on column type. Numeric: range-based counts. Date: monthly frequency comparisons. String/Other: frequency of unique items.
Export	Export summary results as CSV/PDF
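
As a rough illustration of count-based and range-based summaries, the sketch below compares value frequencies between two columns and buckets numeric values into fixed-width ranges. It assumes that BY SOURCE means "start from the values seen in the source" and BY TARGET the reverse. The function names and the bucket width are hypothetical.

```python
from collections import Counter


def frequency_diff(source_values, target_values, by="source"):
    """Return {value: (source_count, target_count)} where the counts differ.

    `by` selects which side's distinct values are used as the starting point.
    """
    src, tgt = Counter(source_values), Counter(target_values)
    base = src if by == "source" else tgt
    return {v: (src[v], tgt[v]) for v in base if src[v] != tgt[v]}


def range_counts(values, bucket_size):
    """Count numeric values per range, keyed by the lower bound of each range."""
    return Counter((v // bucket_size) * bucket_size for v in values)


source_status = ["paid", "paid", "open", "void"]
target_status = ["paid", "open", "open"]

by_source = frequency_diff(source_status, target_status, by="source")
by_target = frequency_diff(source_status, target_status, by="target")
print(by_source)
print(by_target)

buckets = range_counts([5, 12, 18, 25], bucket_size=10)
print(sorted(buckets.items()))
```

The value `void` appears only in the source, so it shows up in the BY SOURCE result but not in the BY TARGET result.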

Meta Diff

Compares metadata (schema and column properties):

Column Name: Lists all compared columns
Property Name: Metadata property (e.g., datatype, length)
Source Value / Target Value: Shows values from each system, highlighting differences in red (source) and green (target)
Export: Save metadata diff as CSV/PDF
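
The layout of a metadata diff (column name, property name, source value, target value) can be sketched as follows. The schema dictionaries and the `meta_diff` function are assumptions for illustration. Real column metadata would come from the database catalog.

```python
def meta_diff(source_schema, target_schema):
    """Return (column, property, source_value, target_value) for each mismatch.

    Each schema maps column name -> {property name: value}.
    A column or property missing on one side is reported with None.
    """
    rows = []
    for column in sorted(source_schema.keys() | target_schema.keys()):
        s = source_schema.get(column, {})
        t = target_schema.get(column, {})
        for prop in sorted(s.keys() | t.keys()):
            if s.get(prop) != t.get(prop):
                rows.append((column, prop, s.get(prop), t.get(prop)))
    return rows


source_schema = {
    "id": {"datatype": "int", "length": 4},
    "name": {"datatype": "varchar", "length": 50},
}
target_schema = {
    "id": {"datatype": "bigint", "length": 8},
    "name": {"datatype": "varchar", "length": 50},
}
meta_rows = meta_diff(source_schema, target_schema)
for row in meta_rows:
    print(row)
```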

Data Diff

Row-level data comparison with cluster-based grouping:

Feature	Description
Clusters	Rows grouped into clusters (sorted by most differences)
ONLY DIFF	Show only rows with differences
ALL DATA	Show all rows
Side by Side View	Compare source vs target in table format (color-coded: red = source diff, green = target diff)
Inline View	Highlight inline differences between values
Columns to Hide	Hide non-relevant columns
Export	Export data-level diffs as CSV/PDF
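
This page does not specify how clusters are formed. One plausible reading, used in the sketch below purely as an assumption, is that rows sharing the same set of differing columns are grouped together, and that groups are listed from most to fewest rows. The function `cluster_diffs` and the sample data are hypothetical.

```python
from collections import defaultdict


def cluster_diffs(source, target):
    """Group differing rows by which columns differ, largest group first.

    `source` and `target` map key -> {column: value}. Only keys present
    on both sides are considered. Returns [(columns, [keys]), ...].
    """
    clusters = defaultdict(list)
    for key in sorted(source.keys() & target.keys()):
        s, t = source[key], target[key]
        changed = tuple(
            sorted(c for c in s.keys() | t.keys() if s.get(c) != t.get(c))
        )
        if changed:
            clusters[changed].append(key)
    return sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)


src = {
    1: {"cost": 10, "city": "Oslo"},
    2: {"cost": 20, "city": "Rome"},
    3: {"cost": 30, "city": "Lima"},
    4: {"cost": 40, "city": "Pune"},
}
tgt = {
    1: {"cost": 11, "city": "Oslo"},
    2: {"cost": 21, "city": "Rome"},
    3: {"cost": 30, "city": "Lyon"},
    4: {"cost": 40, "city": "Pune"},
}
clusters = cluster_diffs(src, tgt)
for columns, keys in clusters:
    print(columns, keys)
```

Grouping this way means a systematic issue, such as every `cost` value being off by one, appears as a single large cluster instead of many unrelated rows.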

Comparison Modes

Full Comparison: Compare all rows and columns
Sample Comparison: Compare a representative sample (faster, less resource-intensive)
Key-based Comparison: Compare based on primary/custom keys
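
The trade-off between full and sample comparison can be seen in a small sketch: a sample checks only a fraction of the keys, so it is cheaper but can miss differences outside the sample. The code below uses simple random sampling with a fixed seed. The sampling strategy SmartDiff actually uses is not documented here, and the names `sample_keys` and `compare_sample` are hypothetical.

```python
import random


def sample_keys(source, fraction, seed=0):
    """Pick a reproducible random subset of the source keys."""
    keys = sorted(source)
    if not keys:
        return []
    size = min(len(keys), max(1, round(len(keys) * fraction)))
    return random.Random(seed).sample(keys, size)


def compare_sample(source, target, fraction=0.1, seed=0):
    """Return the sampled keys whose rows differ or are missing in the target."""
    return [
        key
        for key in sample_keys(source, fraction, seed)
        if source[key] != target.get(key)
    ]


full_source = {i: {"cost": i * 2} for i in range(100)}
identical_target = {i: {"cost": i * 2} for i in range(100)}
shifted_target = {i: {"cost": i * 2 + 1} for i in range(100)}

print(len(sample_keys(full_source, 0.1)))                      # 10 keys checked
print(compare_sample(full_source, identical_target))           # []
print(len(compare_sample(full_source, shifted_target, 0.1)))   # 10
```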

Best Practices

tip

Schedule comparisons during off-peak hours for large datasets
Use filtering to reduce load and focus on critical data
Save configurations for recurring validations
Review summary results before checking detailed diffs

Troubleshooting

warning

Connection Timeouts: Check network/database connectivity
Permission Issues: Ensure read access to required tables
Performance Issues: Use sampling for very large tables
