Month: December 2013

Do you know how to refer a friend for a job in your company? Then you know Referential Integrity and ETL Testing


Hi All,

Learn Referential Integrity by Referring your friend for a Job in your company.


Whatever the domain, and whatever the data you are going to load into the warehouse, you will have reference tables created and used for loading the data.

Why do we need reference tables?

Suppose your company is conducting interviews for the skill sets mentioned below:

1) Software Testing Manual

2) Automation – QTP

3) Automation – Selenium

4) Automation – J-Unit

5) Data warehouse Testing

As usual, you will have an online application to enter all the details related to the candidates. When it comes to skills, you will have a drop-down list showing all the skill sets.

Without using Reference Tables :

I refer 5 candidates and their data is loaded into the warehouse so that emails can be sent to the right candidates. The very next day the company wants a few more skill sets, so the front-end drop-down box has to be updated to reflect the new skill sets, and people start applying for the added skill sets.

With Reference Tables :

Now I am going to create a reference table that holds all the skill sets my company has (Java, C, C++ and all types of testing). In the reference table, each skill set is given a code. Now I receive only the codes from the source system; the load process looks up each code against the skill sets in the reference table and selects the corresponding skill set.

Ask yourself which one is easier! I can't go and modify the front-end application every time; instead, I just insert a new record into the reference table and make it available for our data loads.

If the job is not listed or not matched, we need to send a default message: 'Better Luck Next Time' is the default value, along with an entry in the error table 🙂
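As a rough illustration (the table and column names here, applicant_stg and skill_ref, are hypothetical), the load query can resolve the skill code through the reference table and fall back to the default value when nothing matches:

SELECT a.applicant_id,
       a.skill_code,
       COALESCE(r.skill_name, 'Better Luck Next Time') AS skill_name
FROM   applicant_stg a
LEFT JOIN skill_ref r
       ON TRIM(a.skill_code) = r.skill_code;

Rows where skill_name falls back to the default are the ones that should also be written to the error table.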

Tips and Techniques 

Reference tables are created to populate the product hierarchy, multiple statuses as per the source data, multiple departments as per the source data, interest calculations, etc.

Banking Example :

For example, the source will not send the product-level hierarchy, so our data modellers create MAP and TYPE tables and store the product hierarchy data there. The source sends only the product number; our code searches for the product code in the MAP table and retrieves whatever product-level data the target tables require. If the product number is not in the MAP table, we write the record into the ERROR table and assign a default value.

When verifying the reference data, testers should concentrate on:
a. What is the data type of the source column that is used to look up the MAP or TYPE tables? Design the SQL both with and without the TRIM function: if the developer didn't use TRIM when designing the SQL, there is a high possibility of a space in the source column, and the lookup will not succeed.

LEFT JOIN MAP_TABLE m
       ON TRIM(s.srce_column) = m.map_column
      AND (date check on the MAP table)

b. If the source column that is used for the lookup is time variant (a delta column), then check that the delta is triggered by looking up the current value from the source.

c. If the lookup is not successful, check the ERROR table for an entry related to the column.

d. Testers do not have any ownership of the MAP or TYPE tables; they should not do any INSERTs or UPDATEs on them.

e. Testers should be very conscious of the data that is inserted into the MAP or TYPE tables.
Example:

MAP

Please refer to the above image for a better understanding of reference tables 🙂

Cheers – Asik


Process Control & its importance in ETL Testing


Hi All,

After a short gap, today my blog is about Process Control and its importance in ETL Testing.


Everything in the world should be controlled by something, like how a mom controls her kids 🙂 Why do we need to control the ETL process? And why does it matter for testers?

If we do not control the ETL process, we might not do the correct inserts and updates to the target system. The code may behave unpredictably, and if something goes wrong you cannot find the root cause of the issue. If you lose a few records in an uncontrolled ETL process, those records remain unknown forever. With process control, a data fix is easy if something goes wrong, and we can easily keep the source behaviour in check by controlling the ETL process.

The extract, transform and load mechanism is controlled by a few attributes:

1. Load Date

2. Load Time

3. Process Identifiers

4. Record Active Date

5. Record Expiry Date

Testers should verify these attributes to answer questions like:

1. Is the data loaded for today's run?

2. Is the data updated correctly?

3. Are we loading the correct data from the source system?

4. Are we maintaining the historical data?

5. Are the insert and update processes happening correctly?

6. Are we doing the correct catch-up for the delta records?

(Catch-up means: I inserted a record today and its delta record arrives one week later; this delta should expire the existing record and insert the new record.)

7. Is reconciliation achieved in both insert and update scenarios?

In the ETL process, the first step is loading into intermediate tables: just dumping all the source data into a table for further processing. You can't keep the data in the intermediate tables forever because of the database size, so you need to delete it, either by truncating or by purging the data.

Truncate – delete and reload the data in the intermediate tables on every run.

Purge – keep a few days of data in the intermediate tables and delete the older data. This is useful for keeping the historical data needed to do the catch-up.

Here testers need to verify, using the date columns in the intermediate tables, that truncation of records is working, and, using the date and time columns, that purging of records happens at the given retention intervals.
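A minimal sketch of such a check, assuming a hypothetical intermediate table CUST_INT with a LOAD_DT column and a 7-day purge window (your table names and retention period will differ):

-- Truncate-and-load table: expect zero rows older than today
SELECT COUNT(*) FROM CUST_INT WHERE LOAD_DT <> CURRENT_DATE;

-- Purged table: expect zero rows outside the retention window (Teradata date arithmetic)
SELECT COUNT(*) FROM CUST_INT WHERE LOAD_DT < CURRENT_DATE - 7;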

So now we have extracted the source data and it is controlled for transformation, and we do not have any delta logic in the intermediate tables.

Coming to the warehouse tables, we have inserts, updates and logical deletes. Let me explain how these processes are controlled and how testers need to verify them.

Again, let me use my Facebook account for a better understanding.

Inserts and Process Control

Today I created a profile using the Facebook sign-up page! When the Facebook engineers load my data into their warehouse, they attach today's date as my Effective Date, that is 21/12/2013, and the Expiry Date as HIGH DATE (because they don't know when I will deactivate the account). They also give the record a Process Active Identifier (to denote whether it is an insert or an update record); since the record is active, they will not give it a Process Expiry Identifier.

> Testers need to verify whether the date columns and the identifiers correspond to the current day's load.
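For example, a query along these lines can confirm it (CUST_DIM, EFF_DT, EXP_DT and PROC_ACTIVE_ID are hypothetical names; use the ones from your data model):

-- Today's inserts should be effective today, open-ended and flagged active
SELECT COUNT(*)
FROM   CUST_DIM
WHERE  EFF_DT = CURRENT_DATE
  AND  (EXP_DT <> DATE '9999-12-31' OR PROC_ACTIVE_ID IS NULL);   -- expect 0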

Updates and Process Control

The very next day I updated my display name, so now my record goes through the delta process.

What will happen to the existing record's Effective Date, Expiry Date, Process Active Identifier and Process Expiry Identifier?

and

What will happen to the new record's Effective Date, Expiry Date, Process Active Identifier and Process Expiry Identifier?

The answers to the above questions depend on our requirements. Testers should verify that the transformations for these process control values are achieved as per the Source to Target specifications.

Logical Deletes and Process Control

On the third day I deactivate my Facebook account. Now my record in the warehouse should be deactivated. In this case,

what will happen to the Effective Date, Expiry Date, Process Active Identifier and Process Expiry Identifier columns for my deleted record?

Again, testers should verify that the correct transformation is achieved for the deleted records as per the transformation rules.
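One simple check (again using the hypothetical CUST_DIM table from above) is that after the delta run each business key still has exactly one active, open-ended version:

SELECT CUST_ID, COUNT(*)
FROM   CUST_DIM
WHERE  EXP_DT = DATE '9999-12-31'
GROUP BY CUST_ID
HAVING COUNT(*) > 1;   -- expect no rows: only one active version per key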

By verifying these process control attributes we can ensure that:

“Are we inserting the correct records into the warehouse?”

“Are we updating the correct records?”

“Are we logically deleting the correct records?”

“Is catch-up of the records achieved?”

“Are we maintaining the historical data?” and so on 🙂

Cheers

Asik 🙂

Chapter 14 – A Sample ETL Project – Customer Management in a Hospital


Hi All,

Have a pleasant week ahead!

In this post let me explain a Project Structure.


I have a multi-specialty hospital with branches across the world. My hospital is famous for vaccination. Patients who come to my hospital anywhere in the globe are given a user card with which they can access any of my hospitals in the world.

Business Need

Currently we maintain all customers in one database. There are heaps and heaps of customers related to my hospital, so I decided to split up the customers based on country and load them into the corresponding country tables. Elaborate details of this requirement will be given in the Business Requirement Specification.

Design Need 

To pull the customers per country, a designer should know all the places where the customer data is available. The data extraction will be done by our source system: they will pull all the relevant customer data and give us a data file.

In Design Documents you will have

Source File Specification 

1)   File Name Specification – name string, extension of the files
2)   Date and Time Format of the File – YYYYMMDD, HHMMSSTT or any other format
3)   Control File Specification – name string, extension of the files
4)   Header Record Layout – |H|Customer_Records|20131012|1245872|
5)   Detail Record Layout – |D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
6)   Trailer Record Layout – |T|10|

The detail record layout tells you what data you are getting from the source, its data type, whether it is mandatory or not, and the length of each column.

File Position | Column Name         | Field Length | Data Type | Mandatory (Y/N) | Key Column
1             | Customer Name       | 255          | VARCHAR   | Y               | Y
2             | Customer ID         | 18           | VARCHAR   | Y               | N
3             | Customer Open Date  | 8            | DATE      | Y               | N
4             | Last Consulted Date | 8            | DATE      | N               | N
5             | Vaccination Type    | 5            | CHAR      | N               | N
6             | Doctor Consulted    | 255          | CHAR      | N               | N
7             | State               | 5            | CHAR      | N               | N
8             | Country             | 5            | CHAR      | N               | N
9             | Post Code           | 5            | INT       | N               | N
10            | Date of Birth       | 8            | DATE      | N               | N
11            | Active Customer     | 1            | CHAR      | N               | N

Please refer to my post – Source File Validation – to learn how to validate the above details.

The sample file format will be

|H|Customer_Records|20131012|1245872|
|D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Alex|123457|20101012|20121013|MVD|Sanjay1|SA|USA|06031987|A
|D|John|123458|20101012|20121013|MVD|Sanjay|TN|IND|06031987|A
|D|Mathew|123459|20101012|20121013|MVD|Sanjay|WAS|PHIL|06031987|A
|D|Matt|12345|20101012|20121013|MVD|Sanjay|BOS|NYC|06031987|A
|D|Jacob|1256|20101012|20121013|MVD|Sanjay|VIC|AU|06031987|A
|D|Arun|1456|20101012|20121013|MVD|Sanjay|QA|AU|06031987|A
|D|Wague|23456|20101012|20121013|MVD|Sanjay|AOL|CAN|06031987|A
|D|Steve|126|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Neil|156|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|T|10|

Now, using the ETL process, we load the data into the staging tables. The intermediate tables will look like the below.

Staging Table Load 

Please check my ETL Process post for how the data is loaded and what we need to verify in this step:

Name   | Cust_ID | Open_Dt  | Consul_Dt | VAC_ID | DR_Name | State | Country | DOB     | FLAG
Asik   | 123456  | 20101012 | 20121013  | MVD    | Sanjay  | NSW   | AU      | 6031987 | A
Alex   | 123457  | 20101012 | 20121013  | MVD    | Sanjay1 | SA    | USA     | 6031987 | A
John   | 123458  | 20101012 | 20121013  | MVD    | Sanjay  | TN    | IND     | 6031987 | A
Mathew | 123459  | 20101012 | 20121013  | MVD    | Sanjay  | WAS   | PHIL    | 6031987 | A
Matt   | 12345   | 20101012 | 20121013  | MVD    | Sanjay  | BOS   | NYC     | 6031987 | A
Jacob  | 1256    | 20101012 | 20121013  | MVD    | Sanjay  | VIC   | AU      | 6031987 | A
Arun   | 1456    | 20101012 | 20121013  | MVD    | Sanjay  | QA    | AU      | 6031987 | A
Wague  | 23456   | 20101012 | 20121013  | MVD    | Sanjay  | AOL   | CAN     | 6031987 | A
Steve  | 126     | 20101012 | 20121013  | MVD    | Sanjay  | NSW   | AU      | 6031987 | A
Neil   | 156     | 20101012 | 20121013  | MVD    | Sanjay  | NSW   | AU      | 6031987 | A

Now my staging data is ready, and we need to load it into the corresponding target tables.

As a project, we will test the data load from file to staging as Phase I and from staging to the target tables as Phase II.

In this project we do not have any complex transformation rules, so we will insert the records from the staging tables into the corresponding target tables.

Like

All customers related to India will go to Table_India, and so on. In my next blog I will come up with a few more complex examples for target table loads.

The Source to Target document will look like this:

Customers from India will be loaded by the below logic:

S.No | Source Column       | Source Table | Transformation Rule                            | Target Table | Key/Value | Target Column
1    | Customer Name       | CUST_STG     | If source value is NULL do not load, else load | CUST_IND     | Key       | CUST_NAME
2    | Customer ID         | CUST_STG     | If source value is NULL do not load, else load | CUST_IND     | Key       | CUST_ID
3    | Customer Open Date  | CUST_STG     | Assign source value                            | CUST_IND     | Value     | CUST_OP_DT
4    | Last Consulted Date | CUST_STG     | Assign source value                            | CUST_IND     | Value     | CUST_CONS_DT
5    | Vaccination Type    | CUST_STG     | Assign source value                            | CUST_IND     |           | CUST_VAC_TYPE
6    | Doctor Consulted    | CUST_STG     | Assign source value                            | CUST_IND     |           | CUST_DR_CONS
7    | State               | CUST_STG     | If source value is NULL do not load, else load | CUST_IND     | Value     | CUST_STATE
8    | Country             | CUST_STG     | If source value is NULL do not load, else load | CUST_IND     | Value     | CUST_COUNTRY
9    | Post Code           | CUST_STG     | If source value is NULL do not load, else load | CUST_IND     |           | CUST_POS_C
10   | Date of Birth       | CUST_STG     | Assign source value                            | CUST_IND     |           | CUST_DOB
11   | Active Customer     | CUST_STG     | Assign source value                            | CUST_IND     |           | CUST_STATUS

So, based on the above rules, we will load the data into the CUST_IND table.
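A minimal sketch of that Phase II load, using the staging columns shown above and the target names from the S2T (the post code column is left out because it is not in the staging sample):

INSERT INTO CUST_IND
       (CUST_NAME, CUST_ID, CUST_OP_DT, CUST_CONS_DT, CUST_VAC_TYPE,
        CUST_DR_CONS, CUST_STATE, CUST_COUNTRY, CUST_DOB, CUST_STATUS)
SELECT  Name, Cust_ID, Open_Dt, Consul_Dt, VAC_ID,
        DR_Name, State, Country, DOB, FLAG
FROM    CUST_STG
WHERE   Country = 'IND'
  AND   Name    IS NOT NULL      -- key columns: do not load if NULL
  AND   Cust_ID IS NOT NULL;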

Cheers

Asik

Chapter 13 – Unknown Verification in ETL Testing using an Online Job Application


Hi All,

Hope you are enjoying your weekend. Have a wonderful weekend ahead !!!

Today I am going to explain how we need to take care of NULLs, blanks, spaces and 0s from the source system. These NULLs, blanks, spaces and 0s are extracted from the upstream applications and given to us to load into the warehouse. But our data modelers may or may not want them in the warehouse, as they do not give any meaning to the business.

Image

Origination of NULLs, Blanks, Spaces and 0s in the Source System

I hope everyone has used online applications for a job search. You would have seen a ( * ) symbol against most of the fields, meaning you need to enter some value.

1. In some fields, you can't enter numeric values.

2. In some fields, you can only enter numeric values.

3. In some fields, dates must be entered in the given format.

4. In some fields, you need to give your contact number as per the country standards.

5. In some fields, you need to give your correct email ID.

Like the above, you will have many checkpoints in an online application. Say, for example, 2000 people applied for a Test Engineer job in my company, and my online application has a few free-text fields where an applicant can enter whatever they want. Among them, a few gave:

1. Spaces in the Applicant Name field (by mistake)

2. 0s in the Experience column (because they are freshers)

3. Combinations of spaces and 0s in the Phone Number field

4. No value at all in the date fields.

My source system reads this online webpage, pulls the data, stores it in its database, and then extracts the data as files.

We will get the file as

D|Asik|8|06031987|9880112345|IND|PERSON@GMAIL.COM  РValid Record

D||8|06031987|9880112345|IND|PERSON@GMAIL.COM – Invalid record (the name is missing)

When we load these records into our staging tables, we do not reject these NULLs, blanks, 0s and spaces, as the staging table data types are VARCHAR (which can hold all types of data).

Our Source to Target specification will take care of these unknowns. For example:

a) If the source field_1 is NULL then populate Null in the corresponding target column_1

b) If the source field_2 is NULL then do not load the record

c) If the source field_3 is 0 then populate Null in the corresponding target column_3

d) If the source field_4 is 0 then do not load the record

e) If the source field_5 is BLANK then populate Null in the corresponding target column_5

f) If the source field_6 is BLANK then do not load the record

g) If the source field_7 is Space then populate Null in the corresponding target column_7

h) If the source field_8 is Space  then do not load the record

This verification can be done for each column separately, or collectively for one column, like:

If Column_1 from source is NULL, BLANK, SPACE, 0 then populate NULL to the corresponding target column_1

or

If Column_1 from source is NULL, BLANK, SPACE, 0 then Do Not load the record into ware house.

Whatever the circumstances, our ETL code should stop unknown records from loading into the warehouse.
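A minimal post-load check for this, assuming a hypothetical target table APPLICANT_TGT with a mandatory APPLICANT_NAME column (any row returned is a defect):

SELECT *
FROM   APPLICANT_TGT
WHERE  APPLICANT_NAME IS NULL
   OR  TRIM(APPLICANT_NAME) = ''
   OR  TRIM(APPLICANT_NAME) = '0';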

Most S2T extraction criteria are designed to stop records from entering the warehouse if the (mentioned) column values are any of NULL, BLANK or SPACE. The source system does not have any validations to stop unknowns when it extracts from its upstream. These unknown checks depend closely on the data types of the mentioned columns, so when a tester designs the SQL to test these scenarios, the data type of each column should be considered.

1. Integer: Teradata accepts only NULLs and 0s in INTEGER columns. If we insert a BLANK or SPACE, it will be stored as 0.

2. Decimal: Teradata accepts only NULLs and 0.00 in DECIMAL columns. If we insert a BLANK or SPACE, it will be stored as 0.00.

3. CHAR: NULL, BLANK, SPACE and 0 are all possible in CHAR columns, so we need to test all the unknown possibilities for CHAR columns.

4. VARCHAR: NULL, BLANK, SPACE and 0 are all possible in VARCHAR columns, so we need to test all the unknown possibilities for VARCHAR columns.

When we test VARCHAR columns we should be very careful, because:

Source record from the file:

Scenario_01 – (0 with a space in a VARCHAR column)

|D| Asik| 51302138| 0|

These values will be loaded with the spaces into the staging tables. If the tester designed the SQL as (WHERE Column <> 0), this record will not be stopped! It should be (WHERE TRIM(Column) <> 0). If you find any issues like this, please raise the concern to the developer and ask them to add TRIM for VARCHAR columns.

Scenario_02 – (multiple 0000s in VARCHAR or CHAR columns)

|D| Asik| 51302138| 00|      -> CHAR with 00
|D| Asik| 51302138| 000000|  -> VARCHAR with 000000
|D| Asik| 51302138| 0|       -> CHAR or VARCHAR with 0

The source can send 0, 00 or 000000 to VARCHAR or CHAR columns, but the meaning is the same: it is 0. If the tester designed the SQL as (WHERE Column <> 0), only the third record will be stopped, whereas the other two records will be loaded into the target tables.

If you find any issues like this, please raise the concern to the developer to trim the leading 0s and then check <> 0.
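A sketch of the kind of query that covers both scenarios, assuming a hypothetical staging table APP_STG with a VARCHAR column EXPERIENCE (trimming spaces and then leading zeros makes ' 0', '00' and '000000' all look the same):

SELECT *
FROM   APP_STG
WHERE  EXPERIENCE IS NULL
   OR  TRIM(LEADING '0' FROM TRIM(EXPERIENCE)) = '';   -- value was only spaces and/or zeros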

Hope you all will stop unknown records from getting into the warehouse.

Cheers Asik

My next post will be a sample project!

Chapter 12 – Know about Data Types and ETL Testing


Hi All,

Have you created, updated and deleted a Facebook account to learn about DWH concepts? In this post, let me explain the necessity and importance of data types in ETL testing.


We will start with some familiar examples:

Can you fill 10 liters of water into a 5-liter container?

"No, the container can hold only 5 liters of water; if you fill more than its capacity, it will burst :-("

Can you use salt instead of sugar to make tea?

"No, then everyone will stop drinking tea :-("

Can we name a kid using numbers?

"No, if we used numbers, how many duplicate persons would exist in this world? Just imagine if I was named 10215!"

Can anyone have their bank balance as a whole number?

"No, because the money you spend is a fractional amount! You can't have $5 all the time; it would be $5.28 most of the time."

Can you have a mobile number of more than 10 digits? Or can your mobile number contain alphabets?

"No, because the mobile number length is pre-defined and the number can't contain alphabets."

Like the above examples, our source files, source tables and target tables are constructed with limitations. You can't keep whatever data you want; you can only keep the data that the system can accept.

Every programming language has data types, and most of you reading this post know the basics of data types:

INTEGER, CHAR, VARCHAR, DECIMAL, FLOAT, etc.

Most of the time, the problems developers and testers encounter because of data types in the data warehouse world are:

1. The correct data type is not chosen in the source tables.

2. The correct length of data is not received from the source system in the source file.

3. The source is not giving the values as per the data types mentioned.

4. Data gets truncated when loading it into the target tables.

5. The amount column precision is not populated correctly, as Teradata rounds the value off.

6. Regarding dates, the source sends them as VARCHAR, but when we load them into the target tables we keep them as DATE in the required format.

The data type and its length are defined in the table's DDL (Data Definition Language). If you want to know a table's properties, please use the queries below:

a) "SHOW TABLE Database.Table_Name" – this will give you the data types, data lengths, NOT NULL / NULL constraints and primary key definitions.

b) "HELP TABLE Database.Table_Name" – this will give you the overall details of the table.

As a tester, what do we need to verify?

Again as I said,

Check that the data matches the data type mentioned in the spec.

Check whether any data truncation happened when the source data was loaded into the staging tables.

Check that the data in the staging tables is as per the staging table DDL.

Check that the target table columns are loaded as per the target table DDL.

If it is a VARCHAR column from the source, then please take care of spaces, invalid characters, etc. right from the source up to the staging tables, because DataStage will not accept special characters.

If it is a DECIMAL column, make sure the precision is carried through to the target load.

If it is an INTEGER column, make sure you do not get blanks and spaces from the source.

If it is a CHAR column, check the length of the characters that fit into it.

Like the above, we can add many more scenarios for data type verification.
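For instance, a truncation check can compare the staging data against the target DDL length; here CUST_STG.Name and a VARCHAR(255) target column are assumed purely for illustration:

-- Any row returned is longer than the target column allows and would be
-- truncated (or rejected) during the target load
SELECT Name, CHARACTER_LENGTH(Name) AS name_len
FROM   CUST_STG
WHERE  CHARACTER_LENGTH(Name) > 255;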

Hope this blog helps you understand the importance of data types in ETL testing.

See you in my next post

Cheers – Asik

Chapter 11 – Data Load (Insert, Update, Delete) Concepts by Creating a Facebook Account, Updating Your Existing Account and Deleting an Account


Hi

Hope you all enjoyed the Tom and Jerry show!

In this blog I am going to explain the importance of knowing inserts, updates and deletes, using a Facebook account.


Insert – I am a new user, and I sign up on Facebook using my email ID XYZ@gmail.com; my account gets created. This is inserting a new account into Facebook.

Duplicate РNow I am an existing user of the Face book if I try sign up again using XYZ.gmail.com then Face Book will throw an Error says  the user already in the Face Book РThis is a Duplicate record for the key XYZ.gmail.com

Update – When I created my Facebook account I kept my user name as 'aashiq amrita', but my friends told me to change the display name, so I changed it to 'Asik Ali'; now my name is changed. This is updating an existing record.

Copy – When I created my Facebook account I entered my date of birth as 06/03/1987, and later I tried to modify it with the same value '06-03-1987'; nothing is changed. This is a copy record of my existing data.

Delete records (logically) – When I had a fight with my girlfriend, I went to my Facebook account and deactivated it (but I can reopen it later). This is a logical delete.

As testers, we need to ensure all the above-mentioned logic works when the ETL jobs are executed.

Copy records and duplicate records are different, and we need to understand when each will happen.

Initial Day Load (Day_01) – the insert jobs will extract all the required data from the source and load it into the warehouse. The Day_01 load will not eliminate duplicates unless this is mentioned as part of a filter rule. (Like creating a Facebook account.)

Delta Load (Day_02) – in the delta load, the insert jobs as well as the delta jobs run in parallel.

a) Inserts will be done when the corresponding record is not in the target tables.

b) Deltas will be done when the record exists in both source and target and there is a change in a value column.

Delta jobs are nothing but comparing what is coming in against what is already there. If you have copy or duplicate records in this update scenario, what will happen?

Copy in updating a record:

Copy records can come from the source or can already exist in the target table.

Say I send an update for one field of my existing Facebook account from two different browsers: try it and see what happens!

Duplicate in updating a record:

By mistake, Facebook allowed me to create two accounts, and now I am sending an update for one of them: so which of my accounts will be updated?

Logically deleting a record:

Like deactivating a Facebook account, you can expire a target record without updating it; this is done based on the source data instructions.

When do you say a record is active? In most table designs we will have an expiry date column; if it holds the high date 9999-12-31, then it is an active record.

When do you say a record is expired? In most table designs we will have an expiry date column; if it is not 9999-12-31, then it is an expired record.

What happens to the existing record when you do an update? While updating a record, the process inserts the updated record with an expiry date of 9999-12-31 and expires the existing record with the date on which it was updated.

What happens to the existing record when you delete it? Deleting a record does not insert a new record; it expires the existing record by setting the expiry date to the date on which it is to be logically deleted.
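A minimal sketch of this expire-and-insert behaviour, assuming a hypothetical PROFILE_DIM table keyed by EMAIL_ID with EFF_DT and EXP_DT columns:

-- Step 1: expire the current active version
UPDATE PROFILE_DIM
SET    EXP_DT = CURRENT_DATE
WHERE  EMAIL_ID = 'XYZ@gmail.com'
  AND  EXP_DT = DATE '9999-12-31';

-- Step 2: insert the new version as the open-ended active record
INSERT INTO PROFILE_DIM (EMAIL_ID, DISPLAY_NAME, EFF_DT, EXP_DT)
VALUES ('XYZ@gmail.com', 'Asik Ali', CURRENT_DATE, DATE '9999-12-31');

For a logical delete, only step 1 runs, with the expiry date set to the deletion date.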

Learn inserts, updates and deletes using your Facebook account.

Cheers

Asik

Chapter 10 – A Healthy Developer and Tester Environment, Tom and Jerry Style


Hi All,

I hope you guys are tired of reading stories behind Business Intelligence testing! To help us relax, Tom and Jerry are going to play the developer and tester roles!

Please have a look and enjoy yourself !!!

Project Starts Here 

Image

 

Project Discussion 

Image

 

Development phase – Coding and Unit testing

Image

Image

Image

Testing Phase

Image

The code deployment !

Image

Defect reported by Tester 

Image

 

Development Lead talks to Developer about Defect

Image

 

Developer is telling it is not a defect

Image

Tester is explaining it is a Defect

Image

 

Developer fixed the Defect :

Image

 

The defect is fixed and retested 

Image

 

Developer and Tester have gone for Holiday after the deployment 

Image

After they come back, they find out that the project is running successfully in production

Image

 

Hope you enjoyed the show – catch you all Jerrys in my next blog.