Data Transformation
Overview
Data Transformation in SmartDiff allows you to normalize data before comparison using transformation expressions. This is useful when source and target data have formatting differences that are not actual data discrepancies.
Common scenarios include:
- Case differences: Source has
'EDA-24424', Target has'eda-24424' - Leading zeros: Source has
'131234', Target has'00000131234' - Whitespace: Extra spaces in source or target values
- Date formats: Different timestamp representations
How to Add Transformation Rules
Step 1: Select Source and Target Objects
Select the source and target data sources and objects you want to compare.

Step 2: Open Filters and Transformation Panel
Click on the Filters and Transformation option to access transformation settings.

Step 3: Add New Transformation Rule
Click the + icon in the Transform section to add a new transformation rule.
![]()
Step 4: Configure Transformation Rule
Configure your transformation rule by:
- Entering a name for the transformation
- Selecting the column to apply the transformation
- Choosing the transformation type or entering a custom expression

Step 5: Verify with Sample Data
After adding the rule, verify the transformation using sample data preview. This shows how the transformation will be applied before running the actual diff.

Step 6: Add Additional Rules
You can add multiple transformation rules as needed. Each rule can target different columns or apply different transformations.

Expression Syntax
All transformation expressions must be pandas compatible. You can write any complex expression that pandas supports, including chained operations, custom functions, and advanced string manipulations. For the full list of available string methods and operations, refer to the official pandas documentation.
Transformation expressions use the following placeholders:
| Placeholder | Description |
|---|---|
{{src_data}} | Source DataFrame - use this to transform source data |
{{tgt_data}} | Target DataFrame - use this to transform target data |
{{COLUMN_NAME}} | Actual column name (e.g., {{BSTNK}}, {{KUNNR}}) |
Basic Expression Format
{{src_data}}['{{COLUMN_NAME}}'] = {{src_data}}['{{COLUMN_NAME}}'].transformation()
{{tgt_data}}['{{COLUMN_NAME}}'] = {{tgt_data}}['{{COLUMN_NAME}}'].transformation()
Real-World Examples
Convert to uppercase and remove trailing hyphen:
{{src_data}}['{{BSTNK}}'] = {{src_data}}['{{BSTNK}}'].astype(str).str.upper().str.rstrip('-')
Remove leading zeros from customer number:
{{src_data}}['{{KUNNR}}'] = {{src_data}}['{{KUNNR}}'].astype(str).str.lstrip('0')
Common Transformations
Case Transformation
Convert to Lowercase
Converts values to lowercase to match target format.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.lower()
Example: 'EDA-24424' → 'eda-24424'
Convert to Uppercase
Converts values to uppercase to match source format.
Expression:
{{tgt_data}}['{{COLUMN_NAME}}']={{tgt_data}}['{{COLUMN_NAME}}'].astype(str).str.upper()
Example: 'eda-24424' → 'EDA-24424'
Leading Zeros
Remove Leading Zeros
Strips leading zeros from values.
Expression:
{{tgt_data}}['{{COLUMN_NAME}}']={{tgt_data}}['{{COLUMN_NAME}}'].astype(str).str.lstrip('0')
Example: '00000131234' → '131234'
Pad with Leading Zeros
Pads values with leading zeros to a specific length.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.zfill(11)
Example: '131234' → '00000131234'
Whitespace Handling
Trim Whitespace
Removes leading and trailing whitespace.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.strip()
Example: ' hello ' → 'hello'
Special Characters
Remove Special Characters
Removes all special characters, keeping only alphanumeric and spaces.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace(r'[^\w\s]', '', regex=True)
Example: 'EDA-24424!' → 'EDA24424'
Remove Double Quotes
Removes double quote characters.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace('"', '', regex=False)
Example: '"value"' → 'value'
Substring Operations
Keep First N Characters
Keeps only the first N characters of the value.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[:10]
Example: 'EDA-24424-EXTRA' → 'EDA-24424-'
Keep Last N Characters
Keeps only the last N characters of the value.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[-5:]
Example: 'EDA-24424' → '24424'
Remove First N Characters
Removes the first N characters from the value.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[4:]
Example: 'EDA-24424' → '24424'
Date/Time Transformations
Truncate Time (Keep Date Only)
Removes time portion, keeping only the date (first 10 characters).
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str[:10]
Example: '2024-01-15 10:30:00' → '2024-01-15'
Replace/Substitute
Replace Specific Value
Replaces a specific string with another value.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace('NULL', '', regex=False)
Remove All Digits
Removes all numeric digits from the value.
Expression:
{{src_data}}['{{COLUMN_NAME}}']={{src_data}}['{{COLUMN_NAME}}'].astype(str).str.replace(r'\d+', '', regex=True)
Example: 'EDA-24424' → 'EDA-'
Quick Reference Table
| Transformation | Method | Example Input | Example Output |
|---|---|---|---|
| Lowercase | .str.lower() | 'EDA-24424' | 'eda-24424' |
| Uppercase | .str.upper() | 'eda-24424' | 'EDA-24424' |
| Title Case | .str.title() | 'hello world' | 'Hello World' |
| Strip whitespace | .str.strip() | ' hello ' | 'hello' |
| Strip left | .str.lstrip() | ' hello' | 'hello' |
| Strip right | .str.rstrip() | 'hello ' | 'hello' |
| Remove leading zeros | .str.lstrip('0') | '00000131234' | '131234' |
| Pad with zeros | .str.zfill(N) | '131234' (N=11) | '00000131234' |
| First N chars | .str[:N] | 'hello' (N=3) | 'hel' |
| Last N chars | .str[-N:] | 'hello' (N=3) | 'llo' |
| Remove first N | .str[N:] | 'hello' (N=2) | 'llo' |
| Remove last N | .str[:-N] | 'hello' (N=2) | 'hel' |
| Replace literal | .str.replace('a', 'b', regex=False) | 'aaa' | 'bbb' |
| Replace regex | .str.replace(r'\d+', '', regex=True) | 'abc123' | 'abc' |
Built-in Ignore Options
SmartDiff also provides built-in ignore options that can be enabled without custom expressions:
- Ignore Leading Zeros: Automatically strips leading zeros from both source and target before comparison
- Ignore Double Quotes: Removes double quote characters before comparison
These can be enabled via diff settings without writing custom transformation expressions.
Best Practices
- Always use
.astype(str)before string operations to handle mixed data types - Test expressions using the sample data preview before applying to the full diff
- Use
regex=Falsefor literal string replacements (faster performance) - Use
regex=Truefor pattern-based replacements - Escape backslashes in JSON: use
\\dinstead of\d
Troubleshooting
- Expression errors: Check that placeholders are correctly formatted with double braces
{{}} - No transformation applied: Verify the column name matches exactly (case-sensitive)
- Unexpected results: Use sample data preview to validate before running the full diff