Data Quality
Overview
The Data Quality Hub (DQ HUB) in DataDios empowers you to define, manage, and monitor data quality rules across multiple data sources, including PostgreSQL, SQL Server, and Oracle. This guide walks you through creating data sources, configuring connections, building rules, and validating data quality results.
Rule Creation Approaches
DataDios offers two powerful approaches for creating data quality rules:
- Manual Rule Creation (covered in this guide)
  - Full control over SQL logic and rule parameters
  - Best for custom business logic and specific validation scenarios
  - Ideal when you need precise control over rule behavior
- AI-Powered Rule Generation (see Generate DQ Rules)
  - Automatically generates rules from database metadata
  - Accelerates onboarding of new databases
  - Best for quick setup and standard data quality checks
For new databases, consider using AI-Powered Rule Generation first to create baseline rules, then customize or add manual rules for specific business requirements.
Step 1: Create a DQ HUB Data Source
Navigation:
Data Source → Sources (from the left sidebar menu)
Action:
Click the "CREATE SOURCE" button in the top-right corner.
What you’ll see:
- Main content area listing Data Sources (columns: NAME, OWNERSHIP, TYPE, ACTIONS)
- "No Record found" message if no sources exist yet
- Action buttons: TIMELINE SUMMARY, IMPORT, CREATE SOURCE
Step 2: Initial Setup for DQ HUB Source
Dialog: The "Create Data Sources" modal opens.
Fields to fill:
- DS Type – Choose from dropdown (e.g., DQ HUB, PostgreSQL Database, etc.)
- Name – Enter a descriptive name for your data source
Actions:
- CREATE → Save configuration
- TEST CONNECTION → Validate settings before saving
Options available:
- Form / JSON toggle for configuration input
Supported Data Sources for DQ HUB:
- PostgreSQL Database
- SQL Server Database
- Oracle Database
At least one supported data source is required to create data quality rules.
Step 3: PostgreSQL Database Configuration
Connection Parameters include:
Required Fields:
- Host → Database server address (e.g., localhost)
- Port → Database port number (e.g., 5432)
- Username → Database user (e.g., postgres)
- Password → Database password
- Database → Target database name (e.g., QuickStart_DQ)
- Schema → Database schema (e.g., public)
Optional Fields:
- Schedule_sync → Frequency of data sync (Daily, Weekly, etc.)
- Secret_name → Credential management key
Actions:
- CREATE → Save the configuration
- TEST CONNECTION → Verify connectivity
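Using the Form / JSON toggle mentioned in Step 2, the same connection can be expressed as JSON. The sketch below is illustrative only: the field names mirror the form labels above, but the exact JSON keys DataDios expects may differ, and the secret name shown is a placeholder.

```json
{
  "ds_type": "PostgreSQL Database",
  "name": "QuickStart_DQ_Source",
  "host": "localhost",
  "port": 5432,
  "username": "postgres",
  "password": "<your-password>",
  "database": "QuickStart_DQ",
  "schema": "public",
  "schedule_sync": "Daily",
  "secret_name": "pg-quickstart-credentials"
}
```

Prefer Secret_name over an inline password when your environment supports credential management.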
Step 4: View Created Data Sources
What you’ll see:
- Hierarchical tree view showing:
- Database sources (with Database label)
- DQ HUB sources (with DQ HUB label)
- Source ownership & type (PostgreSQL, SQL Server, etc.)
Available Actions:
- Edit → Modify source configuration
- Delete → Remove a source
- View Details → Access metadata
- Expand/Collapse → Navigate hierarchy
Step 5: Create Data Quality Rules
Navigation:
Click a DQ HUB source → Rule Creation
Rule Example (SQL):

```sql
SELECT FIRST_NAME
FROM public."dq_customer"
WHERE PRIMARY_EMAIL IS NULL;
```
Rule Configuration Fields:
| Field | Description | Example |
|---|---|---|
| Database_name | Select source database | QuickStart_DQ |
| Schema_name | Choose schema | public |
| Table_name | Select target table | dq_customer |
| Column_name | Choose column to validate | PRIMARY_EMAIL |
| Rule_value | SQL query for validation | SELECT FIRST_NAME FROM public."dq_customer" WHERE PRIMARY_EMAIL IS NULL; |
| Rule_desc | Rule description | Check missing emails |
| Severity | Impact level | Low, Medium, High |
| Scoring_type | Scoring method | Percent(%), Count |
| Points | Rule weightage | 10 |
| Preview_rows | Rows to preview | 50 |
| Run_time_limit | Max execution time | 60 seconds |
| dataset | API_HUB / custom dataset | API_HUB |
| Rule_name | Descriptive name | Null Email Check |
| Rule_type | Native SQL / Custom Logic | Native SQL |
| Category | Business classification | Customer Data |
| Dimension | Quality aspect | Completeness |
| Rule_level | Scope | Table Level |
Actions:
- CREATE → Save rule
- VALIDATE → Test SQL syntax
- CANCEL → Discard changes
- CREATE & SCHEDULE → Save + schedule automatic runs
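Conceptually, a rule's Rule_value query returns the rows that fail validation, and the Scoring_type then turns those failures into a score. The sketch below illustrates that relationship; it is not DataDios code. It uses Python's sqlite3 as a self-contained stand-in for the QuickStart_DQ PostgreSQL database (sqlite has no schemas, so the `public.` prefix is dropped), with hypothetical sample rows.

```python
import sqlite3

# Stand-in for the QuickStart_DQ database; table and column names follow Step 5.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dq_customer (
        FIRST_NAME    TEXT,
        PRIMARY_EMAIL TEXT,
        WORK_PHONE    TEXT
    )
""")
conn.executemany(
    "INSERT INTO dq_customer VALUES (?, ?, ?)",
    [
        ("Alice", "alice@example.com", "555-0100"),
        ("Bob",   None,                "555-0101"),  # fails: missing email
        ("Carol", "carol@example.com", None),
        ("Dave",  None,                None),        # fails: missing email
    ],
)

# Rule_value from Step 5: rows this query returns are validation failures.
rule_value = "SELECT FIRST_NAME FROM dq_customer WHERE PRIMARY_EMAIL IS NULL"
failed = conn.execute(rule_value).fetchall()
total = conn.execute("SELECT COUNT(*) FROM dq_customer").fetchone()[0]

# Scoring_type = Count reports the raw failure count;
# Scoring_type = Percent(%) reports failures as a share of all rows.
count_score = len(failed)
percent_score = 100.0 * len(failed) / total
print(count_score, percent_score)  # 2 failing rows out of 4 -> 2, 50.0
```

Writing the rule so that returned rows mean failures is what lets the same query drive both the failing-row preview (Preview_rows) and the score.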
Step 6: Manage Created Rules
What you’ll see:
- Hierarchical rule list: Domain → Source → Schema → Table → Column
- Each rule with quick actions
Available Actions:
- Edit → Modify rule
- Delete → Remove rule
- View Details → See rule metadata
- Expand/Collapse → Navigate hierarchy
- Run Now → Execute rule immediately
Step 7: Rule Details View
Information displayed:
- General Info → RULE_ID, RULE_NAME, RULE_TYPE, RULE_VALUE, DS_TYPE, DS_NAME
- Data Source Info → DATABASE, SCHEMA, TABLE, COLUMN
- Metadata → SEVERITY, SCORING_TYPE, POINTS, PREVIEW_ROWS, RUN_TIME_LIMIT, DATASET, RULE_DESC
- Classification → CATEGORY, DIMENSION, RULE_LEVEL
- Audit Info → CREATION_TIME, UPDATED_TIME
Rule Details also include:
- Execution history
- Validation results
Step 8: View Validation Results
Run Details Tab:
- Sample data rows failing validation
- Example: records with NULL PRIMARY_EMAIL values
- Columns shown (e.g., FIRST_NAME, WORK_PHONE)
- Pagination for large datasets
Insights Provided:
- Detect nulls or invalid values in key fields
- Track completeness & accuracy issues
- Review historical trends in data quality
Demo Video
Watch the walkthrough video below to see how to set up and manage DQ HUB:
Navigation Tips
- Use left sidebar for quick navigation
- Follow the breadcrumb trail to track progress
- Action buttons are always in the top-right corner
- Use search to quickly find sources/rules
- Tree view provides structured navigation
Troubleshooting
- Connection Issues → Verify host, port, credentials
- Permission Errors → Ensure database user has correct privileges
- Query Failures → Validate the SQL syntax in Rule_value
Next Steps
Now that you understand manual data quality rule creation, explore these advanced features:
- Generate DQ Rules (AI-Powered) - Automatically generate rules from database metadata
- Scheduling - Automate rule execution with scheduled workflows
- Alerting - Configure notifications for data quality violations
- Dashboards - Visualize data quality trends and metrics
For bulk rule creation across multiple tables, consider using the AI-Powered Rule Generation feature to save time.