Welcome

Welcome to Aaditya Incorporated's Blog. This blog is meant to share the ideas and technical knowledge which can be useful to the software developement community.

Wednesday, 19 October 2011

Build Your Own Data Warehouse - I: Background and Homework

This is really nice article. I found it very useful.


Let's say you are a technologist at FoodMart, a (fictitious) retailer that sells various grocery products in a chain of stores across North America. The FoodMart IT Department currently churns out multiple standard reports every day by firing SQL queries on the application databases, and also serves ad-hoc report and analysis requests from users.

However, this approach is becoming unmaintainable due to the sheer number of distinct reports and SQLs out there, and also slows down your applications. Deep down you believe that there must be some sanity out of this madness.

After doing a little bit of reading, you think that a separate well planned data warehouse could help by taking the reporting and analysis load off the transactional apps. And you would also love to have a reporting platform to help efficiently handle the growing number of reports and analysis requests your department is flooded with. More specifically, you think that an OLAP system could help you out by delivering upon the most common analysis requests of your users.

So how do you go about doing this?

Since you already know that your solution has two distinct pieces (Data Warehouse and OLAP Reporting) we can address the two separately. First let's concern ourselves with the Data Warehousing portion as this is going to be the foundation of your entire solution. In fact this is the critical back-end piece that most people overlook which then undermines the entire BI initiative. (Don't worry about the OLAP bit yet, we would come back to it in another post)

Data Mart vs Data Warehouse
There are two broad schools of Data Warehousing that take opposite approaches to the process. There is Bill Inmon who propounds the top down approach where all the organizational data sources get mapped and built into a massive data warehouse that then serves user's data and analysis needs. Then there is Ralph Kimball who recommends starting small with a subject oriented Data Mart to start delivering end-user value in a small iteration, and then building more Data Marts that together form the Data Warehouse. (For more information see Data Mart vs Data Warehouse - The Great Debate)

Since you may not have very strong buy-in of FoodMart top management at this stage (after all this is still your personal project), you cannot really hope to get the Inmon-style Data Warehouse built in a single shot. So we'll go with the Kimball approach and begin with a Data Mart that hopefully will deliver enough value to generate management buy-in and to get you funding. Then you can add more Data Marts in due course and build your entire Data Warehouse in an iterative ground-up manner.

Identifying the Subject
A Data Mart is subject oriented, which means that it is designed to handle data for one specific aspect of the business. This means that you cannot hope to have a 'single Data Mart' for handling requirements from Sales, Ops, Marketing, Customer Service, etc. Serving the needs of all these user groups is the end-vision, not the starting point.

Now you have to prioritise and choose the first business aspect you are going to attack. Most likely you would choose this to be Sales. Some reasons for this choice include: that's where a lot of requests would be coming in from, the data might be more easily available than others, everyone likes to see the Sales and revenue figures. Last, but not the least, if you can get the sales department to back your pet project from the outset, you are well on your way.

What information would the sales people want?
Before you start ringing up your contacts in the sales department, do some homework in figuring out what their needs could be. If you have done this homework, you are more likely to get constructive inputs from the sales guys. And it's not all that difficult really... A good way to begin is to list down all the kinds of current and anticipated reporting/analysis requests from the sales guys. Do get it reviewed from some sales guys, but make sure that they understand that this is just brainstorming and not a commitment to delivery yet.

Measures and Dimensions
Looking at your list you would soon realise that most of the requests involve measurement of specific business events for various contexts. Consider a single sale of grocery worth $10 at a FoodMart supermarket somwhere. The measurable business event is the sale worth $10. And then there is a lot of contextual information about the sales event. Some examples: Who bought this? Where did this sale happen (country/region/state/city)? When did this happen (year/month/quarter/week/day)? what product was sold (food/beverage/product group/subgroup/brand/SKU)? What kind of store?

A Measure is any business metric that you want to calculate and report on. It can range from simple stuff like Dollar Sales (total sales revenue), Unit Sales (no of units for a particular product sold), Profits, Current Inventory Level, etc. to more complicated computed fields like Average Cost, Average Profit per Unit, % Margin, etc.

Dimensions provide the "context" for measurement and help subset the database to get to the specific information you are interested in. You may want to subset by Geography, Time, Product/Service Offering, Customer Demographics, etc.

Armed with these concepts, you would now see that most of your list involves calculating (and delivering as reports) certain specific Measures across specific Dimensions (that would appear as row or columns in the report).

Now we are in a position to start defining the measures and dimensions for the Sales Data Mart at FoodMart, and that is precisely what we will do in the next post. In the meantime, do make sure you refresh your understanding of dimensional modeling from a previous post (
Dimensional Modeling... or, Why your App is NOT your Data Warehouse)

No comments:

Post a Comment

Note: only a member of this blog may post a comment.