File Diff

Overview

File Diff enables you to compare data from different file formats, making it easy to validate data exports, ETL processes, or data migrations.

Supported formats include CSV, Excel, JSON, and XML, as well as columnar and fixed-width formats; the full list follows.


Supported File Formats

  • CSV/TSV: Comma/Tab-separated values
  • Parquet/Avro/ORC: Columnar storage file formats
  • Excel: .xlsx and .xls formats
  • JSON: JavaScript Object Notation
  • XML: eXtensible Markup Language
  • Fixed Width: Fixed-width text files
  • Custom Delimited: User-specified delimiters

Key Features

  • Multiple Format Support: Compare files in different formats
  • Schema Mapping: Map columns/fields between different file structures
  • Large File Handling: Efficient processing of large files
  • Character Encoding Support: Automatic detection and handling of different encodings
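
To illustrate what schema mapping means in practice, here is a minimal Python sketch that renames columns from one file to match the other before comparing rows. It is not File Diff's implementation; the column names, the `COLUMN_MAP` dictionary, and the `read_rows` helper are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical mapping from the source file's column names to the target's.
COLUMN_MAP = {"cust_id": "customer_id", "amt": "amount"}

def read_rows(text, column_map=None):
    """Parse CSV text into dicts, renaming columns through column_map."""
    column_map = column_map or {}
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column_map.get(name, name): value for name, value in row.items()}
        for row in reader
    ]

# Made-up sample data: same records, different column names.
source = "cust_id,amt\n1,10.50\n2,7.25\n"
target = "customer_id,amount\n1,10.50\n2,7.30\n"

source_rows = read_rows(source, COLUMN_MAP)
target_rows = read_rows(target)

# Pairs of rows that still differ once the mapping is applied,
# compared position by position.
differences = [(s, t) for s, t in zip(source_rows, target_rows) if s != t]
```

Without the mapping every row would be reported as different, because the keys would not line up. With it, only the row whose amount actually changed remains in `differences`.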

Advanced Options

  • Ignore header rows
  • Case-sensitive comparison
  • Trim whitespace
  • Handle null/empty values
  • Custom date/time formats
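
The effect of several of these options can be pictured as a normalization step applied to each value before comparison. The sketch below is an assumption about how such options typically behave, not a description of File Diff's internals; the `normalize` function and its parameter names are hypothetical.

```python
from datetime import datetime

def normalize(value, *, case_sensitive=True, trim=True,
              null_tokens=("", "NULL", "null"), date_format=None):
    """Return a canonical form of a cell value for comparison."""
    if value is None:
        return None
    if trim:
        value = value.strip()
    # Treat empty strings and null markers as the same missing value.
    if value in null_tokens:
        return None
    if date_format is not None:
        try:
            # Convert dates in the custom format to ISO 8601.
            return datetime.strptime(value, date_format).date().isoformat()
        except ValueError:
            pass  # Not a date in this format; compare it as text.
    if not case_sensitive:
        value = value.casefold()
    return value

# Two spellings of the same date compare equal after normalization.
left = normalize(" 03/01/2024 ", date_format="%d/%m/%Y")
right = normalize("2024-01-03", date_format="%Y-%m-%d")
```

Normalizing both sides first keeps the comparison itself a plain equality check, so each option only changes how values are prepared.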

File-Specific Guides

  • CSV File Diff – Step-by-step guide for comparing CSV (and other table-structured) files
  • XML File Diff – Step-by-step guide for comparing XML (and other hierarchical) files

Best Practices

tip
  • Use consistent file encodings (UTF-8 recommended)
  • Include headers in your files for better column identification
  • For large files, consider splitting them into smaller chunks
  • Save comparison configurations for recurring validations
  • Review the summary statistics before analyzing detailed differences
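
If you split a large CSV file into chunks yourself, each chunk should repeat the header row so that columns can still be identified. The following Python sketch shows one way to do that with the standard library; the `iter_chunks` helper and the sample data are hypothetical, and the chunk size is an arbitrary choice.

```python
import csv
import io
from itertools import islice

def iter_chunks(lines, rows_per_chunk):
    """Yield lists of rows; every chunk starts with the header row."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # Empty input: nothing to yield.
    while True:
        rows = list(islice(reader, rows_per_chunk))
        if not rows:
            break
        yield [header] + rows

# Made-up sample: three data rows split into chunks of two.
data = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
chunks = list(iter_chunks(data, 2))
```

Because the reader is consumed lazily, only one chunk is held in memory at a time when you iterate over `iter_chunks` directly instead of building a list.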

Troubleshooting

warning
  • File Format Issues: Ensure files are not corrupted and match the specified format
  • Encoding Problems: If you see special character issues, verify the file encoding
  • Column Mismatches: Verify that column names and data types match between files
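
Two of these checks can be run outside the tool before you upload files. The Python sketch below tests whether a file's bytes decode as UTF-8 and lists column names that appear in only one file. Both helper functions are hypothetical, and they check names only, not data types.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def header_mismatches(left, right):
    """Report column names that appear in only one of the two headers."""
    left_set, right_set = set(left), set(right)
    return {
        "only_in_left": sorted(left_set - right_set),
        "only_in_right": sorted(right_set - left_set),
    }

# A Latin-1 encoded "café" is not valid UTF-8, a common cause of
# garbled special characters.
latin1_ok = is_valid_utf8("café".encode("latin-1"))
utf8_ok = is_valid_utf8("café".encode("utf-8"))

report = header_mismatches(["id", "name", "email"], ["id", "name", "phone"])
```

A successful UTF-8 decode does not prove the file was written as UTF-8, but a failure is a reliable sign that it was not.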