Process Control & its importance in ETL Testing

Posted on Updated on

Hi All,

Being a gap given today my blog about Process Control and its importance in ETL Testing.

process control

Everything in the world should be controlled by something, like how a mom controls her kids ūüôā Why we need to control the ETL process? and why it does matter for the Testers?

If we do not control the ETL process then we might not do the Correct Inserts and Updates to the target system. The code may be worn out and it behaves crazy. And if something goes wrong then you cant find the root cause of the issue. If you would have lost few records in ETL process without controlling the process then those records will be unknown for ever. Data fix is easy if something goes wrong. We can easily control the source behavior by controlling the ETL Process.

The extract , load and transfer mechanism is controlled by few attributes.

1. Load Date

2. Load Time

4. Process Identifiers 

5. Record Active Date

6. Record Expiry Date

Testers should verify these attributes because

1. Are the data loaded for today’s run?

2.Are the data updated correctly?

3.Are we loading the correct data from source system?

4.Are we maintaining the historical data?

5.Are the Insert and Update process is happening correctly?

6. Are we doing the correct Catch up for the delta records

 (Catch up means, I have inserted a record today and its delta record is coming one week later, this record should expire the          existing record and should insert the new record.)

7. Are the reconciliation is achieved in both Insert and Update scenarios?

In ETL process, the first step is loading into Intermediate Tables – just dumping all the source data into a table for further processing. You cant keep the data in the Intermediate tables for ever, because of the Database size. So you need to delete the data. So you can Truncate or Purge the data.

Truncate – Every time delete and load the data into intermediate tables.

Purge – Keep few days of data into Intermediate tables and delete the older data from Intermediate tables. This will be useful for capturing the historical data to do the catch up.

Here testers need to verify Truncating of records are functioning or not using the Date Columns in the Intermediate tables and Purging of records are happening for given data intervals using the Date and Time columns.

So now we are extracted the Source data and its controlled for Transforming – and we do not have any Delta Logic’s in the Intermediate tables.

Coming to the Warehouse tables, we have Inserts, Updates and Logical Deletes , let me explain you how these process are controlled and how testers needs to verify the same.

Again let me take my Face Book account for better understanding.

Inserts and Process Control

Today I ¬†have created a profile using the Face Book sign up page !! and when Face Book engineer load my data into their ware house they attach today’s date as my Effective Date that is 21/12/2013 and Expiry Date as HIGH DATE (coz they don’t know when I will deactivate the account) and they will give me a Process Active Identifier (to denote it is a Insert or update record) for inserting the record and as the record is active so they will not give Process Expiry¬†Identifier¬†.

> Testers needs to verify Date columns and the Identifiers are for the current day load or not.

Updates and Process Control

The very next data I have updated my Display Name, now my record is going for Delta,

what will happen to existing record’s¬†Effective Date,Effective Date,Process Active Identifier¬†,Process Expiry Identifier ?

and

what will happen to new record’s¬†Effective Date,Effective Date,Process Active Identifier¬†,Process Expiry Identifier ?

Above questions are based on our requirements. So Testers should verify the transformation for these Process Control values are achieved as per the Source to Target specifications.

Logical Deletes and Process Control

On the third day I’m deactivating the Face Book account. Now my record in the warehouse should be deactivate in the case

what will happen to columns Effective Date,Effective Date,Process Active Identifier ,Process Expiry Identifier for my delete record?

Again testers should verify the correct transformation is achieved for the Deleted records as per the Transformation rules.

By ensuring this Process Control attributes we can ensure that

“Are we inserting the correct records into warehouse”

“Are we updating the correct records”

“Are we logically delete the correct records”

“Catch of the records are achieved”

“Maintaining the historical data and so on :-)”

Cheers

Asik ūüôā

Advertisements

5 thoughts on “Process Control & its importance in ETL Testing

    Radhakrishna said:
    February 26, 2014 at 12:52 pm

    it’s very nice presentation asik…. Can you please give me an example project on banking domain?

    email: passion2conquer@gmail.com

      asikfromindia responded:
      March 6, 2014 at 10:03 am

      Thanks Radha…yeah sure I I’ll write a post for banking domain !!!!

    Anthony said:
    August 27, 2015 at 1:39 pm

    Hi, Can you please give me an example project to my id antolarain1981@gmail.com

    Anthony said:
    August 27, 2015 at 1:44 pm

    I have a question, when you say “Are we updating the correct records‚ÄĚ i believe every update to OLTP will be first Extracted and transformed then it will be tested in Target DB is it correct?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s