Sunday, 1 February 2015

Data Warehousing


What is data warehouse

  • Is a Read only database which stored data from Transactional database.
  • Corporate information repository.
  • Source for intelligent reporting.
  • Collection of data to assist Decision Support System.

What is Data-warehousing

Data warehousing is not single s/w or technology. Is a criteria of five different processes done in principled way.
  • Data Transformation: In DTP the data is extracted from OLTP or any other sources.
  • Data Standardization: Data will moved form different sources. So it may have different format & style. Need to be standardization before going to DWH.
  • Data Filtering: It process of extracting required data from OLTP. Entire data notrequired from Source, hence data filtering process conducted.
  • Data Cleaning: Filtering ,Merging and translating.
  • Data Aggregation: Is a final stage of building DWH. Its a data Summarization.

Expectation from DWH

  • Support database design for analytical  task.
  • Provide user friendliness.
  • Provide easiness to even the non IT User.
  • Maintain current & Historical data for analysis.

Goal of DWH

  • Understand the business trends & make better forecasting decision. 
  • Data warehouse provide consistent data.
  • Data warehouse can performed data analysis Quickly.
  • Data warehouse is nearer to end user’s to define data with ease of access to information.
What DWH should really Do?
  • Provide integrated view of total enterprise.
  • Provide current and historical data easily for decision making process.
  • Information more consistent & accurate with quality.
  • Its interactive source of Strategic information.

Different type of data warehouses

There are three types of data warehouses:

  • Enterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout the enterprise.
  • ODS(Operational Data Store) - This has a broad enterprise wide scope, but unlike the real Enterprise data warehouse, data is refreshed in near real time and used for routine business activity.
  • Data Mart - Datamart is a subset of data warehouse and it supports a particular region,business unit or business function.

Different Type of approaches

Top down approach: This approach invent by Bill Immon. First we need to implement the Enterprise data warehouse by extracting the data from individual departments and from the Enterprise data warehouse develop subject oriented databases called as “Data Marts”.
Advantages

  • Provide enterprise view of data.
  • Provide single & centralize storage of data.
  • It can be associate with centralize rules & privileges.
Disadvantage

  • Longer time to built.
  • More risk of failure.
  • Its high costly.
Bottom Up approach: This approach invent by Ralph Kimball. “First we need to develop subject oriented database called as “Data Marts” then integrate all the Data Marts to develop the Enterprise data warehouse.

Advantages

  • Faster & easier in implementation.
  • Low Risk of failure.
Disadvantages

  • Risk of redundant information.
  • Chance of inconsistent data may increase.
  • Number of Data mart increases then may be unmanageable.
  • Central administration difficult.

What is Dimension Modeling

  • Dimension modeling is a process of designing the data warehouse.
  • Dimension data modeling contain one or more dimension table and fact table.
  • Dimensional Models are designed for reading, summarizing and analyzing numeric information.
  • In dimension modeling Dimension integrate through the concept of Star Join with Fact table.
  • In dimensional modeling Data Warehouse is designing with the following type of schemas

  1. Star Schema
  2. Snowflake Schema
  3. Galaxy Schema

Star Schema

  • A star Schema is database design which contain centrally located Fact table which is surrounded by dimension table.
  • The star schema architecture is the simplest data warehouse schema.
  • In Star Schema dimension tables are denormalized and fact table is normalized.

Snowflake Schema
  • The snowflake schema is represented by centralized fact table which are connected to multiple dimension table.
  • The snowflake schema is a more complex data warehouse model than a star schema.
  • In Snowflake Denormalized Dimensions tables can be splited into two or more normalized dimensions is called snowflake schema.


Galaxy Schema

  • In Galaxy Schema multiple fact tables that share many dimension tables. 
  • This Schema is viewed as Collection of Stars hence called Galaxy Schema or Fact Constellation.


Components of Dimension Modeling

Following are the component in dimension modeling

  1. Dimension Tables.
  2. Fact tables
Dimension Tables
  • Dimension table contain textual description of the business.
  • Dimension table in real time contain a very large number of column.
  • The number of Records or Rows managed by Dimension table are very less.
  • Dimension table  stored record related to that particular dimension & no measures are part of it.
  • Dimension table also called as Lookup or Reference tables.

Different type of dimension

Slowly Changing Dimensions
Attribute that change over time are called Slowly Changing Attribute  and a dimension containing such an attribute is Slowly Changing Dimension. There are three type of SCD.
  • Slowly Changing Dimensions Type 1:In This approach we simply overwrite the original information with new information. In this approach the history is not kept OR maintain.
  • Slowly Changing Dimensions Type 2:In this approach a new record is added to the table to represent the new information. In this approach previous and new record available in database. The new recored is maintain with its own primary key.
  • Slowly Changing Dimensions Type 3:In this approach there will be two column. One column indicate the original value and other column indicate current value.
Junk Dimensions
A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Grouping of random flags and text attribute in a dimension and moving tem to separate sub dimension is known as Junk Dimension.

Conformed Dimensions
A Dimension that is used with multiple fact tables is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Organization Dimension
In a relational data model, for normalization purposes, corporate office lookup, region lookup, branch lookup, and employee lookups are not merged as a single table. In a dimensional data modeling(star schema), these tables would be merged as a single table called Organization Dimension for performance and slicing data.

Degenerate Dimensions
A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table.
Example: A transactional code in a fact table.

Inferred Dimension
While loading fact records, a dimension record may not yet be ready. One solution is to generate a surrogate key with null for all other attribute. This should technically be called an inferred member, but is often called as inferred dimension.

Shrunk Dimension
A shrunk dimension is a sub set of another dimension.
For Example: A month dimension would be a shrunken dimension of the Date dimension. The Month dimension could be connected to a forecast fact table whose grain is at the monthly level.

Role Playing Dimensions
A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.

Static Dimensions
Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually.
For example --Time dimension.

Rapidly Changing Dimension 


A dimension attribute that change frequently is a rapidly changing attribute. If you don’t need to track the changes, the rapidly changing attribute is no problem, but if you do need to track the changes, using a slandered Slowly Changing Dimension technique can result in a huge inflation of the size of dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. The new dimension is Rapidly Changing Dimension.

Fact Tables

  • Fact table are the primary table in Dimensional Modeling.
  • Fact table contain Numerical Performance Measurement of business.
  • Fact table contain less number of column and large number of records.
Fact table consist of two types of columns.

  1.  Column containing facts.
  2.  Column that are foreign key to dimension tables.

Different Type of Facts

Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

Factless Fact : It is possible to have a fact table that contains no measures or facts. These tables are called as Factless Fact tables

Share:

Total Pageviews

Lables

Powered by Blogger.

Followers