About Source to Target (S2T) Document or Mapping Document

Posted on

About Source to Target (S2T) Document or Mapping Document

S2T document is the bible of any ETL Projects. This will be designed by the Data Modeler’s using the FSD’s (Functional Specification Documents).

Data Modeler’s are interested in

  • the Source Data’s nature
  • Source Data is expected as it is mentioned in FSD
  • Are the source is sending the correct data type as they promised?
  • All the Date and Time fields are expected
  • Columns that are participating in Referential Integrity are falling under the data set given in FSD
  • Source Table and Target Table nature – like Set, Multiset etc.,
  • Is the Source is Time Variant or Non Time Variant
  • What all are the Key columns that makes the Record unique from the source?
  • What all are the value columns that triggers the delta from the Source?
  • Logical connections of the Source Tables participating in ETL Loads
  • Logical connections of the Target Tables participating in ETL Loads
  • Physical connections of the Source Tables participating in ETL Loads
  • Physical connections of the Target Tables participating in ETL Loads

Most of the S2T’s are written in Excel spread sheet. Where you will find many columns, each column is important for the ETL Load to be designed.

Major components of an S2T,

  1. Version Number – To maintain and track the ongoing changes happening in the S2T are tracked

using the Version Numbers.

 SourceDatabase– Source Database name or names will be mentioned in this space

 Source Column– All the Source columns from respective Source Database are mentioned in this space (these columns will undergo the transformations if required.

 Extraction Criteria– As I already mentioned, we are not going to pull all the data from the source before transformation however we need to pull the data that we required for the transformation and this will be mentioned in this space. All the Joins and Unions will be done here so as a Tester we need to understand and analysis this area with more care.

 Example – Now I wanted to load Customers details and their balance from Savings account. Customers Details will be fetched from Cust_Detl table and the Balance from ACCT_BALANCE table, so you need to perform join in between. This extracting filters only Customers Savings account. So I consider this is my Data extraction criteria.

 Filter Criteria– After we extracted the data from the source we need to filter the data if required for few or more target tables and this will be covered in this space.

 Example – I have extracted 100 records from the source which is linked to Savings account and to ACCT_OVER_LIMIT , we just want to load , customer’s who’s acct balance is more than $100, so Data Modelers, will use the filter rule just below to the Record extraction criteria.

 Target Database – Target Database name or names will be mentioned in this space

 Target Column – All the Target columns from respective Target Database are mentioned in this space (these columns will undergo the transformations if required.

 Key or Value Columns–Next to the Target Column names you could see Key / Value. Key column means the column that makes the record unique and Value means, what makes the record time variant.

Comments and Version Changes – This space will explain us what was in the S2T before and what changed now. Comments will tell us more about the transformation.

What we need to look in the S2T?

As soon as you get the S2T, please query your Staging Source tables and check the data that you have got will satisfy your transformation rule. S2T’s will be written with the SIT phase data, and the rules mentioned might change as soon as we get the UAT Phase data. So to achieve the good quality of testing we always interested in UAT data.

Example: Suppose if Data modeler mentioned a column as Key column (which should not change consecutive loads). But you have noticed that column values are changing from source on consecutive loads then you should notify the data modeler and the source system , either one should be correct and have to make the changes from their end.

Most of us will confuse the below transformation logic

  1. Difference between IN and NOT IN operator

Example: If the source columns are NOT NULL BLANK SPACE 0 and DO NOT LOAD the record. In this case, we might look for the record in the target when it has Unknown values.

  1. Joins – Even if the Data modelers mentioned the Joins in the Extraction rules we need to ensure the joins are working with the source data provide. This should be done because the data modelers will not experiment the transformation rules on the source data given.
  2. Reference tables – In every S2T you will have separate spread sheets that will have reference table names and the data needs to be inserted and the Insert scripts prepared by the Data Modelers.

 Thanks for reading – Comments are Expected 🙂

Cheers – Asik

Advertisements

2 thoughts on “About Source to Target (S2T) Document or Mapping Document

    Tony Chang said:
    April 26, 2015 at 3:16 am

    The article is very helpful and informative especially on the breakdown over each of the critical components to complete a successful S2T data mapping exercise.

    As you mentioned that most of the S2T’s are written in Excel spreadsheets, do you find it effective, or it is used simply because there isn’t any alternate productive tool?

    I often found data professionals frustrated choosing the following when it comes to data mapping exercise:
    1) Excel (not scalable – painful in version controls, scattered copies and team collaboration, etc.); or
    2) Heavy & costly tool-set (etl/migration execution focused instead & big learning curve)

    Much appreciated if you can share some of your thoughts!

    AnalytiXDS said:
    December 8, 2015 at 11:16 am

    ETL, QA, profiling and requirements phases all have their tools, but strangely there are hardly any tools when it comes to pre-ETL data spec/mapping phase of large implementations. Though Excel is extensively used in data mapping phase but it is prone to errors

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s