File Diff
Overview
File Diff compares data across files, including files in different formats, so you can validate data exports, ETL output, and data migrations.
Supported formats include CSV, Excel, JSON, and XML, along with the other formats listed in the next section.
Supported File Formats
- CSV/TSV: Comma/Tab-separated values
- Parquet/Avro/ORC: Binary big-data file formats (Parquet and ORC are columnar; Avro is row-oriented)
- Excel: .xlsx and .xls formats
- JSON: JavaScript Object Notation
- XML: eXtensible Markup Language
- Fixed Width: Fixed-width text files
- Custom Delimited: User-specified delimiters
Key Features
- Multiple Format Support: Compare files in different formats
- Schema Mapping: Map columns/fields between different file structures
- Large File Handling: Efficient processing of large files
- Character Encoding Support: Automatic detection and handling of different encodings
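To make the schema-mapping idea concrete, here is a minimal Python sketch of comparing a CSV export against a JSON export of the same records. It is not File Diff's own API: the function names, the sample data, and the field names (`customer_id`, `full_name`) are all hypothetical, and values are compared as strings, which is a simplifying assumption.

```python
import csv
import io
import json


def load_csv(text):
    """Parse CSV text into a list of dict rows keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of objects into a list of dict rows."""
    return json.loads(text)


def apply_mapping(rows, mapping):
    """Rename fields (source name -> shared name) and drop unmapped ones.

    Values are converted to strings so CSV text and JSON numbers compare equal.
    """
    return [
        {mapping[name]: str(value) for name, value in row.items() if name in mapping}
        for row in rows
    ]


def diff_by_key(left_rows, right_rows, key):
    """Return (only_left, only_right, changed) key lists for two row sets."""
    left = {row[key]: row for row in left_rows}
    right = {row[key]: row for row in right_rows}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    changed = sorted(k for k in left.keys() & right.keys() if left[k] != right[k])
    return only_left, only_right, changed


# Hypothetical sample data: the same customers exported as CSV and as JSON.
CSV_TEXT = "id,name\n1,Ann\n2,Bob\n3,Cy\n"
JSON_TEXT = (
    '[{"customer_id": 1, "full_name": "Ann"},'
    ' {"customer_id": 2, "full_name": "Rob"},'
    ' {"customer_id": 4, "full_name": "Dee"}]'
)

left = apply_mapping(load_csv(CSV_TEXT), {"id": "id", "name": "name"})
right = apply_mapping(load_json(JSON_TEXT), {"customer_id": "id", "full_name": "name"})
print(diff_by_key(left, right, "id"))  # (['3'], ['4'], ['2'])
```

Mapping both sides onto a shared set of field names before comparing keeps the diff logic independent of either file's layout.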
Advanced Options
- Ignore header rows
- Case-sensitive comparison
- Trim whitespace
- Handle null/empty values
- Custom date/time formats
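The effect of these options can be illustrated with a small value-normalization routine applied to each cell before comparison. This is only a sketch of the general technique, not how File Diff implements it; the `normalize` function, its parameter names, and the list of null tokens are assumptions made for the example.

```python
from datetime import datetime


def normalize(value, *, case_sensitive=True, trim=True,
              null_tokens=("", "NULL", "null"), date_format=None):
    """Return a comparable form of a cell value, or None for null/empty."""
    if value is None:
        return None
    text = str(value)
    if trim:
        text = text.strip()
    if text in null_tokens:
        return None  # treat empty strings and null markers alike
    if date_format is not None:
        try:
            # Re-express dates in ISO form so different formats compare equal.
            return datetime.strptime(text, date_format).date().isoformat()
        except ValueError:
            pass  # not a date in this format; compare as plain text
    if not case_sensitive:
        text = text.casefold()
    return text


def values_equal(left, right, left_opts=None, right_opts=None):
    """Compare two cells after normalizing each with its own options."""
    return normalize(left, **(left_opts or {})) == normalize(right, **(right_opts or {}))


print(values_equal("  Alice ", "alice",
                   {"case_sensitive": False}, {"case_sensitive": False}))  # True
print(values_equal("05/01/2024", "2024-01-05",
                   {"date_format": "%d/%m/%Y"}, {"date_format": "%Y-%m-%d"}))  # True
print(values_equal("NULL", ""))  # True
```

Passing separate options for each side reflects the case where the two files use different date formats or null conventions.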
File-Specific Guides
- CSV File Diff – Step-by-step guide for comparing CSV (and other table-structured) files
- XML File Diff – Step-by-step guide for comparing XML (and other hierarchical) files
Best Practices
- Use consistent file encodings (UTF-8 recommended)
- Include headers in your files for better column identification
- For large files, consider splitting them into smaller chunks
- Save comparison configurations for recurring validations
- Review the summary statistics before analyzing detailed differences
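If you split a large delimited file yourself, each chunk should repeat the header row so that columns can still be identified. The following Python sketch shows one way to do that with the standard library; the `iter_chunks` helper and the sample data are hypothetical and are not part of File Diff.

```python
import csv
import io
import itertools


def iter_chunks(lines, chunk_size):
    """Yield (header, rows) pairs with at most chunk_size data rows each.

    `lines` is any iterable of text lines, such as a file opened with
    newline="" and an explicit encoding.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    while True:
        rows = list(itertools.islice(reader, chunk_size))
        if not rows:
            break
        yield header, rows


# Hypothetical sample: five data rows split into chunks of two.
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")
for header, rows in iter_chunks(sample, chunk_size=2):
    print(header, len(rows))
```

Chunks from two files only line up for comparison if both files are ordered the same way, for example sorted by the same key column, so sorting before splitting is usually worthwhile.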
Troubleshooting
- File Format Issues: Ensure files are not corrupted and match the specified format
- Encoding Problems: If special characters appear garbled or replaced, verify that each file is actually saved in the encoding you expect
- Column Mismatches: Verify that column names and data types match between files
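Two of these checks are easy to run outside the tool before starting a comparison. The Python sketch below tests whether raw bytes are valid UTF-8 and lists column names present in only one header. Both helper names are hypothetical, and the header check is deliberately strict (names differing only in case count as a mismatch).

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def column_mismatches(left_header, right_header):
    """List column names that appear in only one of the two headers."""
    left, right = set(left_header), set(right_header)
    return {
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }


# "café" saved as Latin-1 is not valid UTF-8: the byte at offset 3 fails.
print(first_invalid_utf8_offset("café".encode("utf-8")))    # None
print(first_invalid_utf8_offset("café".encode("latin-1")))  # 3
print(column_mismatches(["id", "name"], ["id", "Name"]))
```

A non-None offset tells you roughly where in the file to look, which helps when only a few records were written in a different encoding.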