Skip to main content

Data Transformation

Overview

Data Transformation in SmartDiff allows you to normalize data before comparison using transformation expressions. This is useful when source and target data have formatting differences that are not actual data discrepancies.

Common scenarios include:

  • Case differences: Source has 'EDA-24424', Target has 'eda-24424'
  • Leading zeros: Source has '131234', Target has '00000131234'
  • Whitespace: Extra spaces in source or target values
  • Date formats: Different timestamp representations

How to Add Transformation Rules

Step 1: Select Source and Target Objects

Select the source and target data sources and objects you want to compare.

Select Source and Target Objects


Step 2: Open Filters and Transformation Panel

Click on the Filters and Transformation option to access transformation settings.

Click on Filters and Transformation


Step 3: Add New Transformation Rule

Click the + icon in the Transform section to add a new transformation rule.

Click on Transform Plus Icon


Step 4: Configure Transformation Rule

Configure your transformation rule by:

  • Entering a name for the transformation
  • Selecting the column to apply the transformation
  • Choosing the transformation type or entering a custom expression

Add Transformation Rule


Step 5: Verify with Sample Data

After adding the rule, verify the transformation using sample data preview. This shows how the transformation will be applied before running the actual diff.

Verify Sample Data


Step 6: Add Additional Rules

You can add multiple transformation rules as needed. Each rule can target different columns or apply different transformations.

Add Another Rule


Expression Syntax

Pandas Compatible Expressions

All transformation expressions must be pandas compatible. You can write any complex expression that pandas supports, including chained operations, custom functions, and advanced string manipulations. For the full list of available string methods and operations, refer to the official pandas documentation.

Transformation expressions use the following placeholders:

PlaceholderDescription
{{src_data}}Source DataFrame - use this to transform source data
{{tgt_data}}Target DataFrame - use this to transform target data
{{COLUMN_NAME}}Actual column name (e.g., {{BSTNK}}, {{KUNNR}})

Basic Expression Format

{{src_data}}['{{COLUMN_NAME}}'] = {{src_data}}['{{COLUMN_NAME}}'].transformation()
{{tgt_data}}['{{COLUMN_NAME}}'] = {{tgt_data}}['{{COLUMN_NAME}}'].transformation()

Real-World Examples

Convert to uppercase and remove trailing hyphen:

{{src_data}}['{{BSTNK}}'] = {{src_data}}['{{BSTNK}}'].astype(str).str.upper().str.rstrip('-')

Remove leading zeros from customer number:

{{src_data}}['{{KUNNR}}'] = {{src_data}}['{{KUNNR}}'].astype(str).str.lstrip('0')

Common Transformations

Case Transformation

Convert to Lowercase

Converts values to lowercase to match target format.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.lower()

Example: 'EDA-24424''eda-24424'


Convert to Uppercase

Converts values to uppercase to match source format.

Expression:

{{tgt_data}}['{{COLUMN_NAME}}']={{tgt_data}}['{{COLUMN_NAME}}'].astype(str).str.upper()

Example: 'eda-24424''EDA-24424'


Leading Zeros

Remove Leading Zeros

Strips leading zeros from values.

Expression:

{{tgt_data}}['{{COLUMN_NAME}}']={{tgt_data}}['{{COLUMN_NAME}}'].astype(str).str.lstrip('0')

Example: '00000131234''131234'


Pad with Leading Zeros

Pads values with leading zeros to a specific length.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.zfill(11)

Example: '131234''00000131234'


Whitespace Handling

Trim Whitespace

Removes leading and trailing whitespace.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.strip()

Example: ' hello ''hello'


Special Characters

Remove Special Characters

Removes all special characters, keeping only alphanumeric and spaces.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace(r'[^\w\s]', '', regex=True)

Example: 'EDA-24424!''EDA24424'


Remove Double Quotes

Removes double quote characters.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace('"', '', regex=False)

Example: '"value"''value'


Substring Operations

Keep First N Characters

Keeps only the first N characters of the value.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[:10]

Example: 'EDA-24424-EXTRA''EDA-24424-'


Keep Last N Characters

Keeps only the last N characters of the value.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[-5:]

Example: 'EDA-24424''24424'


Remove First N Characters

Removes the first N characters from the value.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[4:]

Example: 'EDA-24424''24424'


Date/Time Transformations

Truncate Time (Keep Date Only)

Removes time portion, keeping only the date (first 10 characters).

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[:10]

Example: '2024-01-15 10:30:00''2024-01-15'


Replace/Substitute

Replace Specific Value

Replaces a specific string with another value.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace('NULL', '', regex=False)

Remove All Digits

Removes all numeric digits from the value.

Expression:

{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace(r'\d+', '', regex=True)

Example: 'EDA-24424''EDA-'


Quick Reference Table

TransformationMethodExample InputExample Output
Lowercase.str.lower()'EDA-24424''eda-24424'
Uppercase.str.upper()'eda-24424''EDA-24424'
Title Case.str.title()'hello world''Hello World'
Strip whitespace.str.strip()' hello ''hello'
Strip left.str.lstrip()' hello''hello'
Strip right.str.rstrip()'hello ''hello'
Remove leading zeros.str.lstrip('0')'00000131234''131234'
Pad with zeros.str.zfill(N)'131234' (N=11)'00000131234'
First N chars.str[:N]'hello' (N=3)'hel'
Last N chars.str[-N:]'hello' (N=3)'llo'
Remove first N.str[N:]'hello' (N=2)'llo'
Remove last N.str[:-N]'hello' (N=2)'hel'
Replace literal.str.replace('a', 'b', regex=False)'aaa''bbb'
Replace regex.str.replace(r'\d+', '', regex=True)'abc123''abc'

Built-in Ignore Options

SmartDiff also provides built-in ignore options that can be enabled without custom expressions:

  • Ignore Leading Zeros: Automatically strips leading zeros from both source and target before comparison
  • Ignore Double Quotes: Removes double quote characters before comparison

These can be enabled via diff settings without writing custom transformation expressions.


Best Practices

tip
  1. Always use .astype(str) before string operations to handle mixed data types
  2. Test expressions using the sample data preview before applying to the full diff
  3. Use regex=False for literal string replacements (faster performance)
  4. Use regex=True for pattern-based replacements
  5. Escape backslashes in JSON: use \\d instead of \d

Troubleshooting

warning
  • Expression errors: Check that placeholders are correctly formatted with double braces {{}}
  • No transformation applied: Verify the column name matches exactly (case-sensitive)
  • Unexpected results: Use sample data preview to validate before running the full diff