Check Type |
Description |
How to Perform / Example |
Row
Count Validation |
Verify
the total number of rows in source and target tables match exactly. |
SELECT COUNT(*) FROM source_table; SELECT COUNT(*) FROM
target_table; |
Data
Completeness |
Ensure
all mandatory fields are migrated with no missing/null data unless allowed. |
Check
for nulls in mandatory columns after migration. |
Data
Type Validation |
Confirm
data types are preserved or correctly transformed in the target system. |
Compare
schema definitions before and after migration. |
Data
Accuracy |
Validate
that data values match between source and target for key fields. |
Sample
or full data comparison using checksums or hash functions. |
Duplicate
Records Check |
Ensure
duplicates are not introduced or lost during migration. |
Use
distinct counts or unique constraints validation. |
Referential
Integrity |
Verify
foreign key relationships remain intact post-migration. |
Check
counts of joined tables and constraint enforcement. |
Data
Transformation Validation |
If
transformation logic is applied during migration, validate the correctness. |
Compare
transformed data against expected output. |
Null /
Default Value Checks |
Check
if nulls and default values in source are correctly handled in target. |
Validate
null counts and default value assignments. |
Date
and Time Validation |
Confirm
dates/timestamps are correctly migrated and timezone handled. |
Compare
min/max dates and specific edge cases. |
Special
Characters and Encoding |
Verify
no data corruption occurs in text fields (e.g., encoding issues). |
Validate
text fields for special chars or encoding mismatches. |
Index
and Constraint Validation |
Ensure
indexes and constraints (unique, primary keys) are correctly applied. |
Compare
constraints and test queries on target system. |
Checksum
/ Hash Validation |
Use
checksums or hashing to compare entire rows or tables for integrity. |
Generate
hash on source and target datasets and compare. |
Sampling
Validation |
For
large datasets, validate by random sampling or critical segments. |
Query
sample rows from source and target for manual comparison. |
Audit
Trail Verification |
Check
logs or audit data for records processed and any migration errors. |
Review
migration logs and error reports. |
Performance
Metrics |
Measure
migration speed and validate no data truncation due to performance. |
Monitor
ETL performance and error rates. |
Data Validation Checks During Migration
(Interview Flashcard Version)
Check | What & Why | How to Verify (Quick Example) |
---|---|---|
1. Row Count | Ensure total rows match exactly. | SELECT COUNT(*) FROM source; & target; |
2. Null Check | Mandatory fields must NOT be null. | SELECT COUNT(*) WHERE col IS NULL; |
3. Duplicate Check | No duplicate records introduced. | GROUP BY key HAVING COUNT(*) > 1 |
4. Data Type Validation | Data types preserved or mapped right. | Compare schemas before and after migration. |
5. Data Accuracy | Data values unchanged or correctly transformed. | Sample data or use checksum/hash on rows. |
6. Referential Integrity | Foreign keys remain intact. | Left join child to parent, find orphans. |
7. Date/Time Validation | Dates/timestamps migrated correctly. | Check min/max dates, timezone handling. |
8. Encoding Check | No text corruption (special chars). | Review sample text fields for anomalies. |
9. Checksum Validation | Verify row-level data integrity. | MD5/SHA hash of concatenated columns. |
10. Sample Spot Check | Manual data spot verification. | Random sample queries and manual review. |
Bonus Tips for Interview:
-
Always automate these checks in your ETL pipeline.
-
Use logging and reports for audit and error tracking.
-
Perform incremental validation if data migration happens in phases.
-
Explain that data validation ensures no data loss or corruption during migration — vital for business trust and decision-making.
Sample Scripts for Key Validations
1. Row Count Validation (SQL)
2. Null Check on Mandatory Column (SQL)
3. Duplicate Check (SQL)
4. Checksum Validation (SQL)
Generate row-wise checksum/hash and compare counts of distinct hashes:
Compare outputs manually or export for diff tools.
5. Referential Integrity Check (SQL)
6. Sample Random Data Validation (SQL)
Python Script Snippet for Automated Validation (Example with Snowflake Connector)
No comments:
Post a Comment