Thursday, June 5, 2025

Migrating customer data from Oracle (on-prem) to Snowflake Cloud.

 Migrating customer data from Oracle (on-prem) to Snowflake Cloud.

🎯 Objectives:

  1. Validate customer data is loaded accurately.

  2. Apply transformation logic for formatting.

  3. Enforce business rules such as KYC flag checks.

✅ Validation Approach:

StepTaskValidation Logic
1️⃣Source to Target MappingChecked that customer_id, name, dob matched between Oracle and Snowflake. Used SQL joins to compare values.
2️⃣Data Type & LengthEnsured VARCHAR(100) from Oracle fits Snowflake STRING. Identified potential truncation in address field.
3️⃣TransformationTransformed full_name = CONCAT(first_name, ' ', last_name). Validated sample rows.
4️⃣Business RulesEnsured IF kyc_status = 'Y' THEN is_active = true. Built SQL CASE statements to validate rule execution.
5️⃣Aggregate ChecksChecked total active customers in source = target. Also checked by region & segment.
6️⃣Exception HandlingFlagged records with null DOBs or duplicate PANs. Reported to data team for remediation.

πŸ›  Tools Used:

  • SQL (Oracle and Snowflake)

  • Python with pandas (for sampling and automated comparison)

  • Excel (for reporting mismatches)

  • Jira (to raise data quality issues)


πŸ“„ PDF Download (Optional)

Would you like this flashcard and case study saved as a PDF for offline prep or sharing? I can generate it and give you a download link. Just confirm ✅

Also, I can share Python + SQL scripts used for sample validation tasks if needed.


✅ Here is your sample Python validation script summary:

This Python script:

  • Validates source-to-target ID mapping.

  • Verifies transformation logic for full_name = first_name + last_name.

  • Confirms business rule: if kyc_status = 'Y', then is_active = True.

✅ Results:


{ 'Customer ID Mapping Valid': True, 'Full Name Transformation Valid': True, 'KYC Status Rule Valid': True }

Let me know if you'd like:

  • A downloadable .py script,

  • Equivalent SQL validation queries,

  • Or an extended version for larger datasets or file-based validations (CSV/Excel/SQL DBs).


Here’s a sample SQL script to automate common source-to-target validation checks for a data migration or ETL process — tailored for an interview scenario (especially relevant for Snowflake, Oracle, or other relational systems):

Scenario: Migrating customer data from Oracle (source) to Snowflake (target)

🧾 Assumptions

  • source.customer_table and target.customer_table exist

  • Key fields: customer_id, first_name, last_name, dob, kyc_status, is_active


πŸ›  1. Row Count Validation

-- Total row count match
SELECT (SELECT COUNT(*) FROM source.customer_table) AS source_count, (SELECT COUNT(*) FROM target.customer_table) AS target_count;

🧩 2. Source to Target ID Mapping Check


-- Find mismatched customer_ids SELECT s.customer_id FROM source.customer_table s LEFT JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE t.cust_id IS NULL;

🧠 3. Full Name Transformation Check


-- Compare derived full_name from source vs. stored full_name in target SELECT s.customer_id, s.first_name || ' ' || s.last_name AS expected_full_name, t.full_name FROM source.customer_table s JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE s.first_name || ' ' || s.last_name <> t.full_name;

4. Business Rule Check – KYC Status to is_active Mapping


-- Validate business rule: if kyc_status = 'Y', then is_active should be TRUE SELECT s.customer_id, s.kyc_status, t.is_active FROM source.customer_table s JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE (s.kyc_status = 'Y' AND t.is_active <> TRUE) OR (s.kyc_status <> 'Y' AND t.is_active <> FALSE);

πŸ“‹ 5. Data Quality Checks – NULLs in Critical Fields


-- Check for NULLs in critical fields SELECT * FROM target.customer_table WHERE cust_id IS NULL OR full_name IS NULL OR dob IS NULL;

🎯 Summary for Interview Use:

"I use SQL automation scripts for validating key checks during data migration. These include row counts, field-level mapping, transformation logic like full_name generation, and business rules like KYC checks. Any mismatches are flagged and documented for remediation before sign-off."


Would you like this saved as a .sql or .txt file for download? Or want to add more checks like duplicate detection or date format validations?


It seems like I can’t do more advanced data analysis right now. Please try again later.

In the meantime, you can manually copy the following SQL script into your own .sql file:


-- 1. Row Count Validation SELECT (SELECT COUNT(*) FROM source.customer_table) AS source_count, (SELECT COUNT(*) FROM target.customer_table) AS target_count; -- 2. Source to Target ID Mapping Check SELECT s.customer_id FROM source.customer_table s LEFT JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE t.cust_id IS NULL; -- 3. Full Name Transformation Check SELECT s.customer_id, s.first_name || ' ' || s.last_name AS expected_full_name, t.full_name FROM source.customer_table s JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE s.first_name || ' ' || s.last_name <> t.full_name; -- 4. Business Rule Check – KYC Status to is_active Mapping SELECT s.customer_id, s.kyc_status, t.is_active FROM source.customer_table s JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE (s.kyc_status = 'Y' AND t.is_active <> TRUE) OR (s.kyc_status <> 'Y' AND t.is_active <> FALSE); -- 5. Data Quality Checks – NULLs in Critical Fields SELECT * FROM target.customer_table WHERE cust_id IS NULL OR full_name IS NULL OR dob IS NULL;


Absolutely! The main objective of this SQL script is to validate the integrity and accuracy of data during a migration or ETL process — especially when moving data from a source system (like Oracle) to a target system (like Snowflake).

Here’s a breakdown of each section and its purpose:


πŸ” 1. Row Count Validation

SELECT
(SELECT COUNT(*) FROM source.customer_table) AS source_count, (SELECT COUNT(*) FROM target.customer_table) AS target_count;

Purpose:
To check if the total number of records in the source and target tables are the same.
✅ This ensures no records were lost or duplicated during migration.


πŸ” 2. Source to Target ID Mapping Check

SELECT s.customer_id
FROM source.customer_table s LEFT JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE t.cust_id IS NULL;

Purpose:
To identify customer IDs that exist in the source but are missing in the target.
✅ This validates the mapping and completeness of primary keys (like customer_idcust_id).


3. Full Name Transformation Check

SELECT s.customer_id, s.first_name || ' ' || s.last_name AS expected_full_name, t.full_name
FROM source.customer_table s JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE s.first_name || ' ' || s.last_name <> t.full_name;

Purpose:
To ensure that the transformation logic applied during ETL (like combining first_name and last_name into full_name) is working as expected.
✅ This validates data transformation logic.


4. Business Rule Check – KYC Status Logic

SELECT s.customer_id, s.kyc_status, t.is_active
FROM source.customer_table s JOIN target.customer_table t ON s.customer_id = t.cust_id WHERE (s.kyc_status = 'Y' AND t.is_active <> TRUE) OR (s.kyc_status <> 'Y' AND t.is_active <> FALSE);

Purpose:
To validate that a business rule is correctly implemented:

If kyc_status = 'Y', then the customer must be marked as is_active = TRUE.

✅ This ensures that critical business rules are preserved in the migrated data.


🧼 5. Data Quality Checks – NULLs in Critical Fields

SELECT *
FROM target.customer_table WHERE cust_id IS NULL OR full_name IS NULL OR dob IS NULL;

Purpose:
To identify missing critical data (e.g., customer ID, name, or DOB) in the target system.
✅ This is essential for data quality assurance.


🎯 Main Objective Summary (For Interviews)

“This script ensures that data is not only moved from source to target accurately but also follows the transformation logic, business rules, and data quality standards. It's a critical part of the post-migration validation process in any data project.”

No comments:

Post a Comment

Python using AI

  Python using AI - Prompts & Codes Tools useful for Python + AI ChatGPT - https://chatgpt.com/ Claude AI - https://claude.ai/new ...