Data Warehousing for Dummies PDF: What is a data warehouse?
If you haven’t already, you should check out my “data warehousing for dummies PDF”. Last week I started a new series on this blog exploring big data solutions for small business. My first step is creating a data warehouse. This week I want to dig a little deeper into what a data warehouse actually is.
As I build out and iterate on my own data warehousing for dummies PDF, I’ve been reading the official For Dummies book, and I don’t recommend it. It is written for an audience working in a larger organization, not a small business. For example, it says, “You can’t easily find an organization right now that doesn’t have at least one data warehousing initiative underway, on the drawing board, or in production. Everyone wants to consume data - which leads directly to the need for a data warehouse!” Meh.
I don’t think data drives a lot of activity in a small business (but we are trying to change that, right?).
What Is a Data Warehouse?
Before I started reading up on it, I thought of a data warehouse as a sort of central core, like the nucleus of a cell. It turns out it’s more like a Costco. A Costco . . . for data. If you ask 100 consultants to “define a data warehouse in 20 words or less,” 95 of them will drop jargon like subject-oriented, time-variant, and read-only. The other five are MBAs who will talk about “improving corporate data-driven decision making through timely access to information.”
A data warehouse is a home for high-value data. This is high-quality, refined information that has gone through a 25-point quality inspection and an upsell for a warranty you don’t really need.
What data should you stock?
A data warehouse is a retail store, but this store has shelves stocked with data. Not raw data. Not partially processed data. This data is clean, refined, packaged and marketed for use. It’s helpful to think of data as falling into three major categories:
Run the business (raw) data: the data used to take customer orders and manage finances. This is raw material.
Integrate the business (intermediate) data: a master list of customers, or other tools built to improve data quality and keep two or more corporate apps in sync.
Monitor the business (retail) data: used for reporting and decision support, like a financial dashboard. The data is clean so users can understand progress and evaluate cause-and-effect relationships.
Data assets are built from raw material (run the business data) and refined into higher-quality data products that help you integrate and monitor the business.
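To make the three categories concrete, here is a minimal Python sketch of raw data being refined into an integration product and then a monitoring product. It is an illustration, not a real warehouse pipeline: the sample orders, the function names `build_customer_master` and `revenue_by_customer`, and the rule of matching customers by trimmed, case-folded name are all hypothetical assumptions.

```python
from collections import defaultdict
from decimal import Decimal

# "Run the business" (raw) data: hypothetical order records as an
# operational app might store them, messy customer names included.
raw_orders = [
    {"order_id": 1, "customer": " Acme Corp ", "amount": "120.50"},
    {"order_id": 2, "customer": "acme corp", "amount": "80.00"},
    {"order_id": 3, "customer": "Blue Bottle", "amount": "42.25"},
]


def build_customer_master(orders):
    """Integrate the business: map each customer key to one display name.

    Assumption: customers match on trimmed, case-folded names. Real
    master-data tools use far more robust matching rules.
    """
    master = {}
    for order in orders:
        key = order["customer"].strip().casefold()
        # Keep the first cleaned spelling seen as the canonical name.
        master.setdefault(key, order["customer"].strip())
    return master


def revenue_by_customer(orders, master):
    """Monitor the business: total revenue per canonical customer."""
    totals = defaultdict(Decimal)  # Decimal() starts at 0
    for order in orders:
        key = order["customer"].strip().casefold()
        totals[master[key]] += Decimal(order["amount"])
    return dict(totals)


master = build_customer_master(raw_orders)
report = revenue_by_customer(raw_orders, master)
print(report)
```

The two messy spellings of “Acme Corp” collapse into one customer only because the master list exists; without that intermediate step, the dashboard numbers would be split across duplicates.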
How do you manufacture data assets?
Data warehousing is the coordinated, architected, and periodic copying of data from various sources, both inside and outside the enterprise, into an environment optimized for analytical and informational processing. It generally follows these five steps:
Select a focus area for tracking and reporting.
Identify a group of business users as subject matter experts.
List the types of information that would make the data warehouse useful to them.
Identify where to get this information.
The data warehouse team builds extraction programs to collect the data, a process known as ETL (extract, transform, load).
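Step five is the most technical one, so here is a minimal ETL sketch in Python using only the standard library (`csv` and `sqlite3`). It assumes a hypothetical CSV export of orders as the source and an in-memory SQLite database standing in for the warehouse; the table name `fact_orders` and the two quality checks are illustrative choices, not a prescribed design.

```python
import csv
import io
import sqlite3

# Extract source: a hypothetical CSV export embedded as a string.
# In practice this would come from an app export, API, or database.
SOURCE_CSV = """order_id,order_date,customer,amount
1,2019-03-01,Acme Corp,120.50
2,2019-03-01,,80.00
3,2019-03-02,Blue Bottle,not-a-number
4,2019-03-02,Blue Bottle,42.25
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean rows and drop any that fail basic quality checks."""
    clean = []
    for row in rows:
        customer = row["customer"].strip()
        if not customer:
            continue  # reject rows with no customer
        try:
            # Store money as integer cents to avoid float drift in sums.
            amount_cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # reject rows with an unparseable amount
        clean.append((int(row["order_id"]), row["order_date"], customer, amount_cents))
    return clean


def load(conn, rows):
    """Load: write clean rows into a warehouse table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fact_orders (
               order_id INTEGER PRIMARY KEY,
               order_date TEXT NOT NULL,
               customer TEXT NOT NULL,
               amount_cents INTEGER NOT NULL)"""
    )
    # INSERT OR REPLACE keyed on order_id makes repeated loads idempotent.
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

daily = conn.execute(
    "SELECT order_date, SUM(amount_cents) FROM fact_orders "
    "GROUP BY order_date ORDER BY order_date"
).fetchall()
print(daily)
```

Because the load is keyed on `order_id`, running the same job again on the next scheduled copy replaces existing rows instead of duplicating them, which matters when the warehouse is refreshed periodically.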
First Steps?
This is where the official For Dummies book really falls apart. It’s good reference material, but it doesn’t offer anything upfront to help you get started. The first major decision you need to make is where you are going to build your data warehouse. The three leading options are:
AWS
Google Cloud
Microsoft Azure
You also have the option of using other platforms that work with the big three; Qubole is one. In my data warehousing for dummies PDF, I’ll help you take this first step as well as those that follow.