Welcome

Welcome to Aaditya Incorporated's Blog. This blog is meant to share the ideas and technical knowledge which can be useful to the software developement community.

Wednesday, 19 October 2011

Sales Data Mart – Dimensional Model for Retail


If you have followed some of the earlier posts, you would remember that a data mart is created as a star schema through a process known as dimensional modeling. In this post we will create a dimensional model for Sales data mart at a hypothetical retailer.
Note: To go through the examples here, you would need MySQL database and DB Designer for database modeling. You can either install DecisionStudio Professional (download from sourceforge), which has both MySQL and DB Designer along with many other analytics goodies, or else you can install them individually from MySQL website and DB Designer website. You would also need to download the sample foodmart database available along with DecisionStudio Professional.
Now let’s assume you are an IT person at FoodMart (a hypothetical retailer) who has decided to build a sales data mart as the first step in rolling out comprehensive analytics. In discussions with the sales department you have figured that the no. of units sold, dollar amount of sales, and the number of unique customers in a segment are the main metrics they look at. Digging deeper you figure that the sales guys are likely to want analysis by product, product category/class, brand, store location (city, state, region, country, …), customer demographics, and also by individual promotions and promotion categories. It may not be explicitly mentioned, but the metrics would also be analyzed by time (day, week, month, quarter, …)
Now that you have figured out the business metrics to be measured, this gives you the facts you would need in the data mart ‘fact table’ for calculating them. Similarly you have figured out the potential segments for analysis, and that gives you the ‘dimensions’ for analysis. The ‘fact table’ linked to the ‘dimension tables’ makes up the ’star-schema’ (because of the star-like structure), also known as the data mart.
With this information in place, we have the high level Dimensional Model for Sales.





Sales_Fact_1998 is the main fact table that has sales information by store/location, product, time, customer, and promotion. Correspondingly there are 5 dimension tables joined to the fact table through foreign keys in the star-schema.
The dimension tables in turn have detailed data that can now be used for defining ad-hoc analysis segments. For example, we can put demographic filters on the customer dimension (say age<30, married, college-educated), choose specific product class(es) in the product table (say Dairy Products), specify a limited time period, and then get our metrics calculated for the ad-hoc segment.
The image below shows the detailed information available in the dimension tables for defining ad-hoc segments.


You can download the data model here, and then open the saved model using DB Designer in DecisionStudio Professional (Start -> Program Files -> DecisionStudio Professional -> Data Analyst -> DB Designer Workbench). You can see other tables in the FoodMart database by scrolling around on the canvas (scroller in top-right corner).
Do note that our dimensional model for sales covers only a small relevant set of tables from the entire FoodMart database. You can load the entire downloaded FoodMart data into MySQL as outlined here, and can query on the data using Query Browser (Start -> Program Files -> DecisionStudio Professional -> Data Analyst -> MySQL Query Browser).

Build Your Own Data Warehouse - I: Background and Homework

This is really nice article. I found it very useful.


Let's say you are a technologist at FoodMart, a (fictitious) retailer that sells various grocery products in a chain of stores across North America. The FoodMart IT Department currently churns out multiple standard reports every day by firing SQL queries on the application databases, and also serves ad-hoc report and analysis requests from users.

However, this approach is becoming unmaintainable due to the sheer number of distinct reports and SQLs out there, and also slows down your applications. Deep down you believe that there must be some sanity out of this madness.

After doing a little bit of reading, you think that a separate well planned data warehouse could help by taking the reporting and analysis load off the transactional apps. And you would also love to have a reporting platform to help efficiently handle the growing number of reports and analysis requests your department is flooded with. More specifically, you think that an OLAP system could help you out by delivering upon the most common analysis requests of your users.

So how do you go about doing this?

Since you already know that your solution has two distinct pieces (Data Warehouse and OLAP Reporting) we can address the two separately. First let's concern ourselves with the Data Warehousing portion as this is going to be the foundation of your entire solution. In fact this is the critical back-end piece that most people overlook which then undermines the entire BI initiative. (Don't worry about the OLAP bit yet, we would come back to it in another post)

Data Mart vs Data Warehouse
There are two broad schools of Data Warehousing that take opposite approaches to the process. There is Bill Inmon who propounds the top down approach where all the organizational data sources get mapped and built into a massive data warehouse that then serves user's data and analysis needs. Then there is Ralph Kimball who recommends starting small with a subject oriented Data Mart to start delivering end-user value in a small iteration, and then building more Data Marts that together form the Data Warehouse. (For more information see Data Mart vs Data Warehouse - The Great Debate)

Since you may not have very strong buy-in of FoodMart top management at this stage (after all this is still your personal project), you cannot really hope to get the Inmon-style Data Warehouse built in a single shot. So we'll go with the Kimball approach and begin with a Data Mart that hopefully will deliver enough value to generate management buy-in and to get you funding. Then you can add more Data Marts in due course and build your entire Data Warehouse in an iterative ground-up manner.

Identifying the Subject
A Data Mart is subject oriented, which means that it is designed to handle data for one specific aspect of the business. This means that you cannot hope to have a 'single Data Mart' for handling requirements from Sales, Ops, Marketing, Customer Service, etc. Serving the needs of all these user groups is the end-vision, not the starting point.

Now you have to prioritise and choose the first business aspect you are going to attack. Most likely you would choose this to be Sales. Some reasons for this choice include: that's where a lot of requests would be coming in from, the data might be more easily available than others, everyone likes to see the Sales and revenue figures. Last, but not the least, if you can get the sales department to back your pet project from the outset, you are well on your way.

What information would the sales people want?
Before you start ringing up your contacts in the sales department, do some homework in figuring out what their needs could be. If you have done this homework, you are more likely to get constructive inputs from the sales guys. And it's not all that difficult really... A good way to begin is to list down all the kinds of current and anticipated reporting/analysis requests from the sales guys. Do get it reviewed from some sales guys, but make sure that they understand that this is just brainstorming and not a commitment to delivery yet.

Measures and Dimensions
Looking at your list you would soon realise that most of the requests involve measurement of specific business events for various contexts. Consider a single sale of grocery worth $10 at a FoodMart supermarket somwhere. The measurable business event is the sale worth $10. And then there is a lot of contextual information about the sales event. Some examples: Who bought this? Where did this sale happen (country/region/state/city)? When did this happen (year/month/quarter/week/day)? what product was sold (food/beverage/product group/subgroup/brand/SKU)? What kind of store?

A Measure is any business metric that you want to calculate and report on. It can range from simple stuff like Dollar Sales (total sales revenue), Unit Sales (no of units for a particular product sold), Profits, Current Inventory Level, etc. to more complicated computed fields like Average Cost, Average Profit per Unit, % Margin, etc.

Dimensions provide the "context" for measurement and help subset the database to get to the specific information you are interested in. You may want to subset by Geography, Time, Product/Service Offering, Customer Demographics, etc.

Armed with these concepts, you would now see that most of your list involves calculating (and delivering as reports) certain specific Measures across specific Dimensions (that would appear as row or columns in the report).

Now we are in a position to start defining the measures and dimensions for the Sales Data Mart at FoodMart, and that is precisely what we will do in the next post. In the meantime, do make sure you refresh your understanding of dimensional modeling from a previous post (
Dimensional Modeling... or, Why your App is NOT your Data Warehouse)