If you have followed some of the earlier posts, you would remember that a data mart is created as a star schema through a process known as dimensional modeling. In this post we will create a dimensional model for Sales data mart at a hypothetical retailer.
Note: To go through the examples here, you would need MySQL database and DB Designer for database modeling. You can either install DecisionStudio Professional (download from sourceforge), which has both MySQL and DB Designer along with many other analytics goodies, or else you can install them individually from MySQL website and DB Designer website. You would also need to download the sample foodmart database available along with DecisionStudio Professional.
Now let’s assume you are an IT person at FoodMart (a hypothetical retailer) who has decided to build a sales data mart as the first step in rolling out comprehensive analytics. In discussions with the sales department you have figured that the no. of units sold, dollar amount of sales, and the number of unique customers in a segment are the main metrics they look at. Digging deeper you figure that the sales guys are likely to want analysis by product, product category/class, brand, store location (city, state, region, country, …), customer demographics, and also by individual promotions and promotion categories. It may not be explicitly mentioned, but the metrics would also be analyzed by time (day, week, month, quarter, …)
Now that you have figured out the business metrics to be measured, this gives you the facts you would need in the data mart ‘fact table’ for calculating them. Similarly you have figured out the potential segments for analysis, and that gives you the ‘dimensions’ for analysis. The ‘fact table’ linked to the ‘dimension tables’ makes up the ’star-schema’ (because of the star-like structure), also known as the data mart.
With this information in place, we have the high level Dimensional Model for Sales.
Sales_Fact_1998 is the main fact table that has sales information by store/location, product, time, customer, and promotion. Correspondingly there are 5 dimension tables joined to the fact table through foreign keys in the star-schema.
The dimension tables in turn have detailed data that can now be used for defining ad-hoc analysis segments. For example, we can put demographic filters on the customer dimension (say age<30, married, college-educated), choose specific product class(es) in the product table (say Dairy Products), specify a limited time period, and then get our metrics calculated for the ad-hoc segment.
The image below shows the detailed information available in the dimension tables for defining ad-hoc segments.
You can download the data model here, and then open the saved model using DB Designer in DecisionStudio Professional (Start -> Program Files -> DecisionStudio Professional -> Data Analyst -> DB Designer Workbench). You can see other tables in the FoodMart database by scrolling around on the canvas (scroller in top-right corner).
Do note that our dimensional model for sales covers only a small relevant set of tables from the entire FoodMart database. You can load the entire downloaded FoodMart data into MySQL as outlined here, and can query on the data using Query Browser (Start -> Program Files -> DecisionStudio Professional -> Data Analyst -> MySQL Query Browser).