What is Staging area why we need it in DWH?
If target and source databases are different and target table volume is high it contains some millions of records in this scenario without staging table we need to design your informatica using look up to find out whether the record exists or not in the target table since target has huge volumes so its costly to create cache it will hit the performance.
If we create staging tables in the target database we can simply do outer join in the source qualifier to determine insert/update this approach will give you good performance.
It will avoid full table scan to determine insert/updates on target.And also we can create index on staging tables since these tables were designed for specific application it will not impact to any other schemas/users.
While processing flat files to data warehousing we can perform cleansing.
Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency.
- Since it is one-to-one mapping from ODS to staging we do truncate and reload.
- We can create indexes in the staging state, to perform our source qualifier best.
- If we have the staging area no need to relay on the informatics transformation to known whether the record exists or not.
Weeding out unnecessary or unwanted things (characters and spaces etc) from incoming data to make it more meaningful and informative
Data can be gathered from heterogeneous systems and put together
Data scrubbing is the process of fixing or eliminating individual pieces of data that are incorrect, incomplete or duplicated before the data is passed to end user.
Data scrubbing is aimed at more than eliminating errors and redundancy. The goal is also to bring consistency to various data sets that may have been created with different, incompatible business rules.