Month: June 2014
About Source to Target (S2T) Document or Mapping Document
S2T document is the bible of any ETL Projects. This will be designed by the Data Modeler’s using the FSD’s (Functional Specification Documents).
Data Modeler’s are interested in
- the Source Data’s nature
- Source Data is expected as it is mentioned in FSD
- Are the source is sending the correct data type as they promised?
- All the Date and Time fields are expected
- Columns that are participating in Referential Integrity are falling under the data set given in FSD
- Source Table and Target Table nature – like Set, Multiset etc.,
- Is the Source is Time Variant or Non Time Variant
- What all are the Key columns that makes the Record unique from the source?
- What all are the value columns that triggers the delta from the Source?
- Logical connections of the Source Tables participating in ETL Loads
- Logical connections of the Target Tables participating in ETL Loads
- Physical connections of the Source Tables participating in ETL Loads
- Physical connections of the Target Tables participating in ETL Loads
Most of the S2T’s are written in Excel spread sheet. Where you will find many columns, each column is important for the ETL Load to be designed.
Major components of an S2T,
- Version Number – To maintain and track the ongoing changes happening in the S2T are tracked
using the Version Numbers.
SourceDatabase– Source Database name or names will be mentioned in this space
Source Column– All the Source columns from respective Source Database are mentioned in this space (these columns will undergo the transformations if required.
Extraction Criteria– As I already mentioned, we are not going to pull all the data from the source before transformation however we need to pull the data that we required for the transformation and this will be mentioned in this space. All the Joins and Unions will be done here so as a Tester we need to understand and analysis this area with more care.
Example – Now I wanted to load Customers details and their balance from Savings account. Customers Details will be fetched from Cust_Detl table and the Balance from ACCT_BALANCE table, so you need to perform join in between. This extracting filters only Customers Savings account. So I consider this is my Data extraction criteria.
Filter Criteria– After we extracted the data from the source we need to filter the data if required for few or more target tables and this will be covered in this space.
Example – I have extracted 100 records from the source which is linked to Savings account and to ACCT_OVER_LIMIT , we just want to load , customer’s who’s acct balance is more than $100, so Data Modelers, will use the filter rule just below to the Record extraction criteria.
Target Database – Target Database name or names will be mentioned in this space
Target Column – All the Target columns from respective Target Database are mentioned in this space (these columns will undergo the transformations if required.
Key or Value Columns–Next to the Target Column names you could see Key / Value. Key column means the column that makes the record unique and Value means, what makes the record time variant.
Comments and Version Changes – This space will explain us what was in the S2T before and what changed now. Comments will tell us more about the transformation.
What we need to look in the S2T?
As soon as you get the S2T, please query your Staging Source tables and check the data that you have got will satisfy your transformation rule. S2T’s will be written with the SIT phase data, and the rules mentioned might change as soon as we get the UAT Phase data. So to achieve the good quality of testing we always interested in UAT data.
Example: Suppose if Data modeler mentioned a column as Key column (which should not change consecutive loads). But you have noticed that column values are changing from source on consecutive loads then you should notify the data modeler and the source system , either one should be correct and have to make the changes from their end.
Most of us will confuse the below transformation logic
- Difference between IN and NOT IN operator
Example: If the source columns are NOT NULL BLANK SPACE 0 and DO NOT LOAD the record. In this case, we might look for the record in the target when it has Unknown values.
- Joins – Even if the Data modelers mentioned the Joins in the Extraction rules we need to ensure the joins are working with the source data provide. This should be done because the data modelers will not experiment the transformation rules on the source data given.
- Reference tables – In every S2T you will have separate spread sheets that will have reference table names and the data needs to be inserted and the Insert scripts prepared by the Data Modelers.
Thanks for reading – Comments are Expected 🙂
Cheers – Asik