What is data warehouse
- Is a Read only database which stored data from Transactional database.
- Corporate information repository.
- Source for intelligent reporting.
- Collection of data to assist Decision Support System.
What is Data-warehousing
Data warehousing is not single s/w or technology. Is a criteria of five different processes done in principled way.- Data Transformation: In DTP the data is extracted from OLTP or
any other sources.
- Data Standardization: Data will moved form different sources. So
it may have different format & style. Need to be standardization before
going to DWH.
- Data Filtering: It process of extracting required data from
OLTP. Entire data notrequired from Source, hence data filtering process conducted.
- Data Cleaning: Filtering ,Merging and translating.
- Data Aggregation: Is a final stage of building DWH. Its a data Summarization.
Expectation from DWH
- Support database design for analytical task.
- Provide user friendliness.
- Provide easiness to even the non IT User.
- Maintain current & Historical data for analysis.
Goal of DWH
- Understand the business trends & make better forecasting decision.
- Data warehouse provide consistent data.
- Data warehouse can performed data analysis Quickly.
- Data warehouse is nearer to end user’s to define data with ease of access to information.
- Provide integrated view of total enterprise.
- Provide current and historical data easily for decision making process.
- Information more consistent & accurate with quality.
- Its interactive source of Strategic information.
Different type of data warehouses
There are three types of data warehouses:- Enterprise Data
Warehouse - An enterprise
data warehouse provides a central database for decision support throughout the enterprise.
- ODS(Operational Data
Store) - This has a broad
enterprise wide scope, but unlike the real Enterprise data warehouse, data is refreshed in near real time and used
for routine business activity.
- Data Mart - Datamart is a subset of data warehouse and it supports a particular region,business unit or business function.
Different Type of approaches
Top down approach: This approach invent
by Bill Immon. First we need to implement the Enterprise data
warehouse by extracting the data from individual departments and from the
Enterprise data warehouse develop subject oriented databases called as “Data
Marts”.
Advantages
- Provide enterprise view of data.
- Provide single & centralize storage of data.
- It can be associate with centralize rules & privileges.
Disadvantage
- Longer time to built.
- More risk of failure.
- Its high costly.
Bottom Up approach: This approach invent
by Ralph Kimball. “First we need to develop subject oriented
database called as “Data Marts” then integrate all the Data Marts to develop
the Enterprise data warehouse.
Advantages
- Faster & easier in implementation.
- Low Risk of failure.
Disadvantages
- Risk of redundant information.
- Chance of inconsistent data may increase.
- Number of Data mart increases then may be unmanageable.
- Central administration difficult.
What is Dimension Modeling
- Dimension modeling is a process of designing the data warehouse.
- Dimension data modeling contain one or more dimension table and fact table.
- Dimensional Models are designed for reading, summarizing and analyzing numeric information.
- In dimension modeling Dimension integrate through the concept of Star Join with Fact table.
- In dimensional modeling Data Warehouse is designing with the following type of schemas
- Star Schema
- Snowflake Schema
- Galaxy Schema
Star Schema
- A star Schema is database design which contain centrally located Fact table which is surrounded by dimension table.
- The star schema architecture is the simplest data warehouse schema.
- In Star Schema dimension tables are denormalized and fact table is normalized.
Snowflake Schema
- The snowflake schema is represented by centralized fact table which are connected to multiple dimension table.
- The snowflake schema is a more complex data warehouse model than a star schema.
- In Snowflake Denormalized Dimensions tables can be splited into two or more normalized dimensions is called snowflake schema.
Galaxy Schema
- In Galaxy Schema multiple fact tables that share many dimension tables.
- This Schema is viewed as Collection of Stars hence called Galaxy Schema or Fact Constellation.
Components of Dimension Modeling
Following are the
component in dimension modeling
- Dimension Tables.
- Fact tables
Dimension Tables
- Dimension table contain textual description of the business.
- Dimension table in real time contain a very large number of column.
- The number of Records or Rows managed by Dimension table are very less.
- Dimension table stored record related to that particular dimension & no measures are part of it.
- Dimension table also called as Lookup or Reference tables.
Different type of dimension
Slowly Changing
Dimensions
Attribute that change
over time are called Slowly Changing Attribute and a dimension
containing such an attribute is Slowly Changing Dimension. There are three type
of SCD.
- Slowly Changing Dimensions Type 1:In This approach we simply overwrite the original information with new information. In this approach the history is not kept OR maintain.
- Slowly Changing Dimensions Type 2:In this approach a new record is added to the table to represent the new information. In this approach previous and new record available in database. The new recored is maintain with its own primary key.
- Slowly Changing Dimensions Type 3:In this approach there will be two column. One column indicate the original value and other column indicate current value.
A junk dimension is a
single table with a combination of different and unrelated attributes to avoid
having a large number of foreign keys in the fact table. Grouping of random
flags and text attribute in a dimension and moving tem to separate sub
dimension is known as Junk Dimension.
Conformed Dimensions
A Dimension that
is used with multiple fact tables is called a conformed dimension. A conformed
dimension may be used with multiple fact tables in a single database, or across
multiple data marts or data warehouses.
Organization
Dimension
In a relational data
model, for normalization purposes, corporate office lookup, region lookup,
branch lookup, and employee lookups are not merged as a single table. In a
dimensional data modeling(star schema), these tables would be merged as a
single table called Organization Dimension for performance and
slicing data.
Degenerate
Dimensions
A degenerate dimension
is when the dimension attribute is stored as part of fact table, and not in a
separate dimension table.
Example: A
transactional code in a fact table.
Inferred Dimension
While loading fact
records, a dimension record may not yet be ready. One solution is to generate a
surrogate key with null for all other attribute. This should technically be
called an inferred member, but is often called as inferred dimension.
Shrunk Dimension
A shrunk dimension is
a sub set of another dimension.
For Example: A month
dimension would be a shrunken dimension of the Date dimension. The Month
dimension could be connected to a forecast fact table whose grain is at the
monthly level.
Role Playing
Dimensions
A role-playing
dimension is one where the same dimension key — along with its associated
attributes — can be joined to more than one foreign key in the fact table. For
example, a fact table may include foreign keys for both Ship Date and Delivery
Date. But the same date dimension attributes apply to each foreign key, so you
can join the same dimension table to both foreign keys. Here the date dimension
is taking multiple roles to map ship date as well as delivery date, and hence
the name of Role Playing dimension.
Static Dimensions
Static dimensions are
not extracted from the original data source, but are created within the context
of the data warehouse. A static dimension can be loaded manually.
For example --Time
dimension.
Rapidly Changing
Dimension
A dimension
attribute that change frequently is a rapidly changing attribute. If you don’t
need to track the changes, the rapidly changing attribute is no problem, but if
you do need to track the changes, using a slandered Slowly Changing Dimension
technique can result in a huge inflation of the size of dimension. One solution
is to move the attribute to its own dimension, with a separate foreign key in
the fact table. The new dimension is Rapidly Changing Dimension.
Fact Tables
- Fact table are the primary table in Dimensional Modeling.
- Fact table contain Numerical Performance Measurement of business.
- Fact table contain less number of column and large number of records.
Fact table consist of
two types of columns.
- Column containing facts.
- Column that are foreign key to dimension tables.
Different Type of Facts
Additive: Additive facts are facts that can be summed
up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be
summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be
summed up for any of the dimensions present in the fact table.
Factless Fact : It is possible to have a fact table
that contains no measures or facts. These tables are called as Factless Fact
tables