Identify the Fact Table(s)
-
Stores measurable events or metrics (e.g.,
sales_amount
,units_sold
,support_case_duration
).
Example: FactSales
DateKey | ProductKey | CustomerKey | StoreKey | UnitsSold | Revenue |
---|
Identify Dimension Tables
-
Contain descriptive attributes used for filtering, grouping, and labeling.
Common dimensions:
-
-
DimDate
-
DimProduct
-
DimCustomer
-
DimEmployee
-
DimRegion
-
A fact table contains measurable, quantitative data like sales amount or order count, while dimension tables store descriptive attributes such as product details, customer info, or time periods.
The goal is to create a structure that supports fast query performance and is easy for business users to understand. Typically, I use star schemas for simplicity or snowflake schemas when normalization is needed.
For example, in a sales data warehouse, I might create a
FactSales
table that stores metrics like revenue and quantity, linked to dimensions like DimDate
, DimProduct
, and DimCustomer
.I also consider slowly changing dimensions (SCDs) — like tracking historical changes in customer address — and ensure conformed dimensions are reused across different subject areas to maintain consistency.
Overall, dimensional modeling helps bridge the gap between complex source systems and user-friendly reporting tools like Power BI"
“In one of my recent projects, I built a sales analytics solution for a retail company using dimensional modeling to structure the data warehouse. The goal was to create a central source of truth for sales metrics across multiple regions and product lines.”
1. Business Understanding:
“We met with stakeholders to identify the key metrics they cared about, such as total sales, units sold, gross margin, and average order value, along with slicing by date, product, customer, and location.”
2. Defined the Grain:
“I set the grain of the fact table at the daily transaction level — one row per product per transaction per store.”
3. Designed the Star Schema:
Fact Table: FactSales
-
DateKey
,ProductKey
,CustomerKey
,StoreKey
-
UnitsSold
,SalesAmount
,DiscountAmount
,GrossMargin
Dimension Tables:
-
DimDate
– day, week, month, quarter, year -
DimProduct
– name, category, subcategory, brand -
DimCustomer
– name, segment, loyalty status, gender (with SCD Type 2 for loyalty changes) -
DimStore
– region, city, store size -
DimEmployee
– salesperson, hire date
4. ETL Process:
“I built ETL pipelines using SQL and dbt to extract raw data, apply business rules, manage SCDs, and load into the dimensional model. For example, customer loyalty status was tracked using a Slowly Changing Dimension Type 2 approach.”
5. Reporting Layer:
“I used Power BI to build dashboards where users could filter sales by region, product, or store, and drill down from year to day level. Performance was optimized with aggregate tables for monthly sales.”
No comments:
Post a Comment