Data Quality

Overview

The Data Quality Hub (DQ HUB) in DataDios lets you define, manage, and monitor data quality rules across multiple data sources, including PostgreSQL, SQL Server, and Oracle. This guide walks you through creating data sources, configuring connections, building rules, and reviewing validation results.

Rule Creation Approaches

DataDios offers two powerful approaches for creating data quality rules:

  1. Manual Rule Creation (covered in this guide)

    • Full control over SQL logic and rule parameters
    • Best for custom business logic and specific validation scenarios
    • Ideal when you need precise control over rule behavior
  2. AI-Powered Rule Generation (see Generate DQ Rules)

    • Automatically generates rules from database metadata
    • Accelerates onboarding of new databases
    • Best for quick setup and standard data quality checks
Tip: For new databases, consider using AI-Powered Rule Generation first to create baseline rules, then customize or add manual rules for specific business requirements.


Step 1: Create a DQ HUB Data Source

Navigation:
Data Source → Sources (from the left sidebar menu)

Action:
Click the "CREATE SOURCE" button in the top-right corner.

What you’ll see:

  • Main content area listing Data Sources (columns: NAME, OWNERSHIP, TYPE, ACTIONS)
  • "No Record found" message if no sources exist yet
  • Action buttons: TIMELINE SUMMARY, IMPORT, CREATE SOURCE

Step 2: Initial Setup for DQ HUB Source

Dialog: The "Create Data Sources" modal opens.

Fields to fill:

  1. DS Type – Choose from the dropdown (e.g., DQ HUB, PostgreSQL Database)
  2. Name – Enter a descriptive name for your data source

Actions:

  • CREATE → Save configuration
  • TEST CONNECTION → Validate settings before saving

Options available:

  • Form / JSON toggle for configuration input
  • Supported Data Sources for DQ HUB:
    • PostgreSQL Database
    • SQL Server Database
    • Oracle Database
Note: At least one supported data source is required before you can create data quality rules.


Step 3: PostgreSQL Database Configuration

Connection Parameters include:

Required Fields:

  • Host → Database server address (e.g., localhost)
  • Port → Database port number (e.g., 5432)
  • Username → Database user (e.g., postgres)
  • Password → Database password
  • Database → Target database name (e.g., QuickStart_DQ)
  • Schema → Database schema (e.g., public)

Optional Fields:

  • Schedule_sync → Frequency of data sync (Daily, Weekly, etc.)
  • Secret_name → Credential management key
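If you switch the Form / JSON toggle to JSON, the same connection can be supplied as a single document. A minimal sketch, assuming the JSON keys mirror the form labels above (the exact key names may differ in your DataDios version):

```json
{
  "host": "localhost",
  "port": 5432,
  "username": "postgres",
  "password": "********",
  "database": "QuickStart_DQ",
  "schema": "public",
  "schedule_sync": "Daily",
  "secret_name": "dq_postgres_creds"
}
```

Use TEST CONNECTION after pasting the JSON to confirm the values resolve before saving.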

Actions:

  • CREATE → Save the configuration
  • TEST CONNECTION → Verify connectivity

Step 4: View Created Data Sources

What you’ll see:

  • Hierarchical tree view showing:
    • Database sources (with Database label)
    • DQ HUB sources (with DQ HUB label)
    • Source ownership & type (PostgreSQL, SQL Server, etc.)

Available Actions:

  • Edit → Modify source configuration
  • Delete → Remove a source
  • View Details → Access metadata
  • Expand/Collapse → Navigate hierarchy

Step 5: Create Data Quality Rules

Navigation:
Click a DQ HUB source → Rule Creation

Rule Example (SQL):

-- Return the rows that fail the check: customers missing an email address
SELECT FIRST_NAME
FROM public."dq_customer"
WHERE PRIMARY_EMAIL IS NULL;

Rule Configuration Fields:

| Field | Description | Example |
|---|---|---|
| Database_name | Select source database | QuickStart_DQ |
| Schema_name | Choose schema | public |
| Table_name | Select target table | dq_customer |
| Column_name | Choose column to validate | PRIMARY_EMAIL |
| Rule_value | SQL query for validation | SELECT FIRST_NAME FROM public."dq_customer" WHERE PRIMARY_EMAIL IS NULL; |
| Rule_desc | Rule description | Check missing emails |
| Severity | Impact level | Low, Medium, High |
| Scoring_type | Scoring method | Percent(%), Count |
| Points | Rule weightage | 10 |
| Preview_rows | Rows to preview | 50 |
| Run_time_limit | Max execution time | 60 seconds |
| dataset | API_HUB / custom dataset | API_HUB |
| Rule_name | Descriptive name | Null Email Check |
| Rule_type | Native SQL / Custom Logic | Native SQL |
| Category | Business classification | Customer Data |
| Dimension | Quality aspect | Completeness |
| Rule_level | Scope | Table Level |
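The Rule_value field always holds a query that returns the failing rows. As a second sketch, here is what a Validity-dimension rule against the same sample table could look like; the PostgreSQL regex pattern is illustrative, not a DataDios default:

```sql
-- Validity check: rows where PRIMARY_EMAIL is present but not email-shaped
SELECT FIRST_NAME, PRIMARY_EMAIL
FROM public."dq_customer"
WHERE PRIMARY_EMAIL IS NOT NULL
  AND PRIMARY_EMAIL !~ '^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$';
```

For such a rule you would set Dimension to Validity and keep the rest of the configuration (severity, scoring, points) as your policy requires.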

Actions:

  • CREATE → Save rule
  • VALIDATE → Test SQL syntax
  • CANCEL → Discard changes
  • CREATE & SCHEDULE → Save + schedule automatic runs

Step 6: Manage Created Rules

What you’ll see:

  • Hierarchical rule list: Domain → Source → Schema → Table → Column
  • Each rule with quick actions

Available Actions:

  • Edit → Modify rule
  • Delete → Remove rule
  • View Details → See rule metadata
  • Expand/Collapse → Navigate hierarchy
  • Run Now → Execute rule immediately

Step 7: Rule Details View

Information displayed:

  • General Info → RULE_ID, RULE_NAME, RULE_TYPE, RULE_VALUE, DS_TYPE, DS_NAME
  • Data Source Info → DATABASE, SCHEMA, TABLE, COLUMN
  • Metadata → SEVERITY, SCORING_TYPE, POINTS, PREVIEW_ROWS, RUN_TIME_LIMIT, DATASET, RULE_DESC
  • Classification → CATEGORY, DIMENSION, RULE_LEVEL
  • Audit Info → CREATION_TIME, UPDATED_TIME

Rule Details also include:

  • Execution history
  • Validation results

Step 8: View Validation Results

Run Details Tab:

  • Sample data rows failing validation
  • Example: Records with NULL PRIMARY_EMAIL values
  • Columns shown (e.g., FIRST_NAME, WORK_PHONE)
  • Pagination for large datasets

Insights Provided:

  • Detect nulls or invalid values in key fields
  • Track completeness & accuracy issues
  • Review historical trends in data quality
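To relate these insights to the Percent(%) scoring type, you can compute the completeness of the example column directly; a hedged sketch against the sample dq_customer table (assuming percent rules score roughly this way):

```sql
-- Completeness: percentage of rows with a non-null PRIMARY_EMAIL
SELECT 100.0 * COUNT(PRIMARY_EMAIL) / NULLIF(COUNT(*), 0) AS completeness_pct
FROM public."dq_customer";
```

COUNT(column) skips NULLs while COUNT(*) counts all rows, so the ratio is the non-null share; NULLIF guards against division by zero on an empty table.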

Demo Video

Watch the walkthrough video below for setting up and managing DQ HUB:


Navigation Tips

  1. Use the left sidebar for quick navigation
  2. Follow the breadcrumb trail to track progress
  3. Action buttons are always in the top-right corner
  4. Use search to quickly find sources/rules
  5. Tree view provides structured navigation

Troubleshooting

  • Connection Issues → Verify host, port, credentials
  • Permission Errors → Ensure database user has correct privileges
  • Query Failures → Validate SQL syntax in rule_value
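For permission errors on PostgreSQL, the connection user needs read access to every schema and table referenced by your rules. A hedged sketch of the minimum grants (dq_reader is a placeholder role name, not a DataDios default):

```sql
-- Read-only access sufficient to run SELECT-based DQ rules
GRANT CONNECT ON DATABASE "QuickStart_DQ" TO dq_reader;
GRANT USAGE ON SCHEMA public TO dq_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO dq_reader;
```

Equivalent read-only grants exist for SQL Server and Oracle; consult your DBA for the appropriate roles there.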

Next Steps

Now that you understand manual data quality rule creation, explore these advanced features:

  • Generate DQ Rules (AI-Powered) - Automatically generate rules from database metadata
  • Scheduling - Automate rule execution with scheduled workflows
  • Alerting - Configure notifications for data quality violations
  • Dashboards - Visualize data quality trends and metrics
Info: For bulk rule creation across multiple tables, consider using the AI-Powered Rule Generation feature to save time.