Home3

Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

Home2

Home

Monday, April 18, 2016

What is ETL?. & Its Advantages

ETL, or Extract, Transform and Load, eases the combination of heterogeneous sources into a unified central repository. Usually this repository is a data warehouse or mart which will support enterprise business intelligence.

Extract - read data from multiple source systems into a single format. This process extracts the data from each native system and saves it to one target location. That source data may be any number of database formats, flat files, or document repositories. Usually, the goal is to extract the entire unmodified source system data, though certain checks and filters may be performed here to ensure the data meets an expected layout or to selectively remove data (e.g. potentially confidential information).

Transform - in this step, the data from the various systems is made consistent and linked. Some of the key operations here are:
Standardization - data is mapped to a consistent set of lookup values (e.g. US, USA, United States and blank/null - all mapped to the standard ISO country code)
Cleansing - perform validity checks and either remove or modify problem data
Surrogate keys - new key values applied to similar data from different source systems prevent key collisions in the future and provide a cross reference across these systems
Transposing - organizes data to optimize reporting. Many source systems are optimized for transactional performance but the warehouse will be primarily used for reporting. Often this involves denormalizing and re-organizing into a dimensional model.

Load - the transformed data is now written out to a warehouse/mart. The load process will usually preserve prior data. In some instances existing warehouse data is never removed, just marked as inactive. This provides full auditing and supports historical reporting.




No comments:

Post a Comment