
Chapter 14 – A Sample ETL Project – Customer Management in a Hospital


Hi All,

Have a pleasant week ahead !!!

In this post, let me explain a sample project structure.


I have a Multi-Specialty Hospital with branches across the world. My hospital is famous for Vaccination. Patients who come to my hospital anywhere in the world are given a User Card, with which they can access any of my hospitals.

Business Need

Currently we maintain all customers in one Database, and there are heaps and heaps of customers related to my hospital. So I decided to split up the customers based on country and load them into corresponding country tables. Elaborate details of this requirement will be given in the Business Requirement Specification.

Design Need 

To pull the customers per Country, a designer should know all the places where the Customer Data is available. The data extraction will be done by our Source System: they will pull all the relevant customer data and give us a Data file.

In the Design Documents you will have:

Source File Specification 

1)   File Name Specification – name string, extension of the files
2)   Date and Time Format of the File – YYYYMMDD, HHMMSSTT or any other format
3)   Control File Specification – name string, extension of the files
4)   Header Record Layout – |H|Customer_Records|20131012|1245872|
5)   Detail Record Layout – |D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
6)   Trailer Record Layout – |T|10|

The Detail Record layout tells you what data you are getting from the source, its data type, whether it is mandatory or not, and the length of each column (a machine-readable version of this spec is sketched after the table below).

File Position | Column Name | Field Length | Data Type | Mandatory (Y/N) | Key Column
1 | Customer Name | 255 | VARCHAR | Y | Y
2 | Customer ID | 18 | VARCHAR | Y | N
3 | Customer Open Date | 8 | DATE | Y | N
4 | Last Consulted Date | 8 | DATE | N | N
5 | Vaccination Type | 5 | CHAR | N | N
6 | Doctor Consulted | 255 | CHAR | N | N
7 | State | 5 | CHAR | N | N
8 | Country | 5 | CHAR | N | N
9 | Post Code | 5 | INT | N | N
10 | Date of Birth | 8 | DATE | N | N
11 | Active Customer | 1 | CHAR | N | N
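
To make the spec testable, it helps to capture this table in a machine-readable form. Here is a minimal sketch in Python (purely illustrative; the structure and names are mine, not part of the project's real code) of the field specification above:

```python
# Hypothetical, machine-readable form of the field specification above.
# Each entry: (position, column name, max length, data type, mandatory, key)
FIELD_SPEC = [
    (1,  "Customer Name",       255, "VARCHAR", True,  True),
    (2,  "Customer ID",         18,  "VARCHAR", True,  False),
    (3,  "Customer Open Date",  8,   "DATE",    True,  False),
    (4,  "Last Consulted Date", 8,   "DATE",    False, False),
    (5,  "Vaccination Type",    5,   "CHAR",    False, False),
    (6,  "Doctor Consulted",    255, "CHAR",    False, False),
    (7,  "State",               5,   "CHAR",    False, False),
    (8,  "Country",             5,   "CHAR",    False, False),
    (9,  "Post Code",           5,   "INT",     False, False),
    (10, "Date of Birth",       8,   "DATE",    False, False),
    (11, "Active Customer",     1,   "CHAR",    False, False),
]
```

Validation scripts can then loop over this structure instead of hard-coding a check per column.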


Please refer to my post – Source File Validation – to learn how to validate the above details.

The sample file will look like this:

|H|Customer_Records|20131012|1245872|
|D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Alex|123457|20101012|20121013|MVD|Sanjay1|SA|USA|06031987|A
|D|John|123458|20101012|20121013|MVD|Sanjay|TN|IND|06031987|A
|D|Mathew|123459|20101012|20121013|MVD|Sanjay|WAS|PHIL|06031987|A
|D|Matt|12345|20101012|20121013|MVD|Sanjay|BOS|NYC|06031987|A
|D|Jacob|1256|20101012|20121013|MVD|Sanjay|VIC|AU|06031987|A
|D|Arun|1456|20101012|20121013|MVD|Sanjay|QA|AU|06031987|A
|D|Wague|23456|20101012|20121013|MVD|Sanjay|AOL|CAN|06031987|A
|D|Steve|126|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Neil|156|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|T|10|
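
To see how the three sections hang together, here is a minimal parsing sketch. It assumes the pipe-delimited layout shown above; the file name customer_records.dat is made up for illustration.

```python
# Minimal sketch: split a pipe-delimited source file into its Header,
# Detail and Trailer sections and verify the trailer record count.
# "customer_records.dat" is a made-up name; use the name from the spec.

def read_source_file(path):
    header, details, trailer = None, [], None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = line.strip("|").split("|")
            if fields[0] == "H":
                header = fields
            elif fields[0] == "D":
                details.append(fields)
            elif fields[0] == "T":
                trailer = fields
    return header, details, trailer

header, details, trailer = read_source_file("customer_records.dat")
expected = int(trailer[1])  # |T|10| -> 10 detail records expected
assert len(details) == expected, f"trailer says {expected}, file has {len(details)}"
```

If the trailer count and the number of detail records disagree, that is a defect to raise against the source system.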

Now, using the ETL process, we have loaded the data into Staging Tables. The intermediate tables will look like below.

Staging Table Load 


Please check my ETL Process post to see how the data is loaded and what we need to verify at this step:

Name | Cust_ID | Open_Dt | Consul_Dt | VAC_ID | DR_Name | State | Country | DOB | FLAG
Asik | 123456 | 20101012 | 20121013 | MVD | Sanjay | NSW | AU | 06031987 | A
Alex | 123457 | 20101012 | 20121013 | MVD | Sanjay1 | SA | USA | 06031987 | A
John | 123458 | 20101012 | 20121013 | MVD | Sanjay | TN | IND | 06031987 | A
Mathew | 123459 | 20101012 | 20121013 | MVD | Sanjay | WAS | PHIL | 06031987 | A
Matt | 12345 | 20101012 | 20121013 | MVD | Sanjay | BOS | NYC | 06031987 | A
Jacob | 1256 | 20101012 | 20121013 | MVD | Sanjay | VIC | AU | 06031987 | A
Arun | 1456 | 20101012 | 20121013 | MVD | Sanjay | QA | AU | 06031987 | A
Wague | 23456 | 20101012 | 20121013 | MVD | Sanjay | AOL | CAN | 06031987 | A
Steve | 126 | 20101012 | 20121013 | MVD | Sanjay | NSW | AU | 06031987 | A
Neil | 156 | 20101012 | 20121013 | MVD | Sanjay | NSW | AU | 06031987 | A

Now my Staging Data is ready, and we need to load it into the corresponding Target Tables.

As a project, we will test the data load from file to Staging as Phase I, and from Staging to Target tables as Phase II.

In this project we do not have any transformation rules, so we will insert the records from the Staging tables into the corresponding target tables as-is.

For example:

All customers related to India will go to the India table (CUST_IND), and so on. In my next blog, let me come up with a few more complex examples for the Target tables load.

The Source to Target Document will look like this:

Customers from India will be loaded by the below logic:

S.No | Source Column | Source Table | Transformation Rule | Target Table | Key / Value | Target Column
1 | Customer Name | CUST_STG | If Source value is Null, do not load; else load | CUST_IND | Key | CUST_NAME
2 | Customer ID | CUST_STG | If Source value is Null, do not load; else load | CUST_IND | Key | CUST_ID
3 | Customer Open Date | CUST_STG | Assign Source Value | CUST_IND | Value | CUST_OP_DT
4 | Last Consulted Date | CUST_STG | Assign Source Value | CUST_IND | Value | CUST_CONS_DT
5 | Vaccination Type | CUST_STG | Assign Source Value | CUST_IND | Value | CUST_VAC_TYPE
6 | Doctor Consulted | CUST_STG | Assign Source Value | CUST_IND | Value | CUST_DR_CONS
7 | State | CUST_STG | If Source value is Null, do not load; else load | CUST_IND | Value | CUST_STATE
8 | Country | CUST_STG | If Source value is Null, do not load; else load | CUST_IND | Value | CUST_COUNTRY
9 | Post Code | CUST_STG | If Source value is Null, do not load; else load | CUST_IND | Value | CUST_POS_C
10 | Date of Birth | CUST_STG | Assign Source Value | CUST_IND | Value | CUST_DOB
11 | Active Customer | CUST_STG | Assign Source Value | CUST_IND | Value | CUST_STATUS

So, based on the above rules, we will load the data into the CUST_IND table.
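
Since there are no transformation rules, the load logic is just a routing rule. Here is a minimal sketch of it (assuming each staging row comes as a Python dict keyed by the staging column names; the country-to-table mapping shown is illustrative only):

```python
# Minimal sketch of the staging-to-target routing rule above.
# Assumes each staging row is a dict keyed by the staging column names;
# the country-to-table mapping here is illustrative only.

TARGET_TABLES = {"IND": "CUST_IND", "AU": "CUST_AU", "USA": "CUST_USA"}

def route_to_target(staging_rows):
    loads = {table: [] for table in TARGET_TABLES.values()}
    rejects = []
    for row in staging_rows:
        # Key columns must not be NULL: "If Source value is Null, do not load"
        if not row.get("Name") or not row.get("Cust_ID"):
            rejects.append(row)
            continue
        table = TARGET_TABLES.get(row.get("Country"))
        if table is None:
            rejects.append(row)  # country with no target table yet
        else:
            loads[table].append(row)  # e.g. Country = IND -> CUST_IND
    return loads, rejects
```

Rows with NULL key columns are rejected rather than loaded, matching the "If Source value is Null, do not load" rule in the S2T above.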

Cheers

Asik


Chapter 4 – Source Data – Static Verification


Hi Guys,

So we are ready with Test Environments, Test Plans and Test Cases to begin Static Testing.

When we design our test cases, we need to write test cases and test conditions for everything from Source validation right through to Reports testing.


1. Test cases to validate the Source Data.

What do you want to do with the source data? Why is it important? Why do we need to know the importance of the source data?

Let me explain with an everyday scenario!!!

You want to purchase a Mobile, so what do you do? You start analyzing the market through different channels: [Suggestions from friends], [Surfing the Internet], [Checking newspaper Advertisements], [Checking @ Outlets]. You will not get the same details from all of these channels, so you note down all the data retrieved from the different channels and choose what you need !!!

Like the above scenario, a customer who approaches an Organization will go through multiple channels and share information many times, but in the end, what you need for your Reports will be Designed, Developed, Tested and Extracted by Us 🙂

As far as I know, if you are 90% OK with the source data, then there is a high possibility of not finding even a single defect at any of the test phases, apart from careless mistakes or code defects introduced by developers. Defects leaked by Developers can be fixed easily, because the ETL code is under our control. If a defect is leaked because of the data, then the impact on the project is huge, because we need to go back to the source system and ask them to fix it !!!!

What do you need to verify in the Source data?

Just think of yourself as an intermediary between a seller (the Source) and a buyer (the Target). Do you buy a defective product from your seller? Does your buyer buy defective products from you? No 😦 So please ensure you are buying a non-defective product and selling a 90% defect-free product to your customer.

Coming to Business Intelligence 😉

1. The data we need will be sent to our address (the Unix Landing Directory). The tester should verify the data has arrived at the correct address (Landing Directory).

2. Our source system will tell us in what format they are sending the files. The tester should verify the name of the file is exactly what the source told us.

3. The date component in the file name is critical for us; this is the date when the file was extracted for us.

4. The source data will be sent to us in different formats, mostly .DAT (data) or .DLY (daily) files.

5. Along with the DATA files they will send a CTL file (an empty file) to confirm all the files arrived – an acknowledgement copy 🙂 (Checks 1 to 5 are sketched as a script just after this list.)

6. Now you have validated the address (Landing Directory) and the file names.

7. Now open the file; you will see Numeric, Alpha-Numeric and Alpha characters separated by [,] or [|] – you need to verify how the fields are separated.

8. Most of the files will have 3 sections – Header, Detail and Trailer records. Testers should verify these sections are received as per the specification.

9. As a tester you should be interested in how the data is organized: is it as per the Specification? Does the order of the columns match the order given in the Specification? Do the data types of the columns match the data types mentioned in the Specification?
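
Here is the promised sketch of checks 1 to 5, assuming a hypothetical landing directory and file names (yours will come from the Spec):

```python
# Minimal sketch of checks 1-5: the data file and its CTL acknowledgement
# have landed in the expected directory under the expected names.
# The landing path and file names below are made up for illustration.
import os

LANDING_DIR = "/data/landing"           # hypothetical Unix landing directory
DATA_FILE   = "CUSTOMER_20131012.DAT"   # hypothetical name per the spec
CTL_FILE    = "CUSTOMER_20131012.CTL"   # empty acknowledgement file

assert os.path.exists(os.path.join(LANDING_DIR, DATA_FILE)), "data file missing"
assert os.path.exists(os.path.join(LANDING_DIR, CTL_FILE)), "CTL file missing"
```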

Order of the Columns:

In the spec you have the order |Name|Place|Number|DOB| but in the file you are getting |Name|DOB|Number|Place| – Woohoo, it's a Defect 🙂

Number of Columns:

In the spec you have the order |Name|Place|Number|DOB| but in the file you are getting only |Name|DOB|Place| – Woohoo, it's a Defect 🙂

Data Type of the Columns:

In the spec you have |Name as VARCHAR|Place as VARCHAR|Number as INTEGER|DOB as DATE| but in the file you are getting |Name as VARCHAR|Place as VARCHAR|Number as CHAR|DOB as DATE| – Woohoo, it's a Defect 🙂

File name specification:

What is the file name, how should it be named, what is the extension of the file (.DAT), and what is the business time stamp?
The file name should be as per the Spec, because developers mention the exact file name in their DataStage File Watcher job. If the file name is not as per the Spec, the job fails due to the mismatch. The time stamp field (Date + Time) should also be validated: for example, if the Spec says the Date should be DDMMYYYY and the source sends the date as MMDDYYYY, then our DataStage job fails due to an Invalid Date.
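
The date-format part of this check is easy to script. A minimal sketch, assuming a hypothetical file name pattern NAME_DDMMYYYY.DAT:

```python
# Minimal sketch: validate the date component of a file name against the
# format given in the Spec. File name pattern NAME_DDMMYYYY.DAT is assumed.
from datetime import datetime

def file_date_is_valid(file_name, date_format="%d%m%Y"):
    # "CUSTOMER_03122013.DAT" -> "03122013"
    date_part = file_name.split("_")[1].split(".")[0]
    try:
        datetime.strptime(date_part, date_format)
        return True
    except ValueError:
        return False
```

Note that a swapped MMDDYYYY value is only caught when the month part falls out of range (e.g. 15), so treat this as a first-line check, not a complete one.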

File landing location:

Each project has a specific location where the Source System X-Coms the files. Testers need to verify the file has arrived at the correct location.

(Note: If the tester is not able to view the file due to its size, they can view it using Unix commands in a PuTTY session.)

File Header specification:

In general, every file must have a Header, and the header record will contain the date when the file was processed. The testers should be aware of the difference between the date in the external file name and the date in the Header record.
i) Date in the external file name: the date when the file was X-Com'd.
ii) Date in the Header record: the date when the file was processed.
Suppose the SAP system extracts the file on 3/12/2013 and X-Coms it on 4/12/2013; then the external file name carries 4/12/2013 and the Header date is 3/12/2013. The Header is very important because the Developer will use the Header date as the EFFECTIVE Date for Data loads.
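
A minimal sketch of extracting both dates for comparison (the file name pattern is assumed as before; the header layout follows the |H|...| example from my sample file post):

```python
# Minimal sketch: pull the X-Com date from the external file name and the
# processing date from the header record, so both can be verified.
# File name pattern and header layout are assumed as discussed above.
from datetime import datetime

def extract_dates(file_name, header_fields):
    # e.g. "CUSTOMER_04122013.DAT" -> 04/12/2013, the X-Com date
    file_date = datetime.strptime(file_name.split("_")[1][:8], "%d%m%Y")
    # e.g. ["H", "Customer_Records", "20131012", "1245872"] -> processing date
    header_date = datetime.strptime(header_fields[2], "%Y%m%d")
    return file_date, header_date  # header date drives the EFFECTIVE date
```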

File detail records specification:

The detail records section of the file contains the actual data that needs to be loaded into the Warehouse. Detail records are generally separated by a pipe delimiter, though this may change from source to source. Testers should be very conscious about the detail records.

File Position | Column Name | Key Column | Nullable (Y/N) | Data Type | Field Length
1 | User ID | Y | N | INTEGER | 255
2 | Dept ID | Y | N | INTEGER | 255
3 | Name | N | Y | VARCHAR | 255
4 | Contact Number | N | Y | INTEGER | 18
5 | Address | N | Y | VARCHAR | 255
6 | DOB | N | Y | DATE | 8

The testers should verify:

1. The columns are in the correct order as per the Spec.

2. If columns are marked as Key, then the source should send us only one record for that key combination; if we find more than one record after loading into the Staging tables, raise a defect.

3. If a column is marked as Not Null in the spec, then we should not receive NULLs (||) from the source; if we do get NULLs, our DataStage job fails due to NULLs present in Not Null columns, so raise a defect to the Source system.

4. Data type verification is critical: if a column is defined as INT, then the source should send INT only; if not, please raise a defect to the source system.

5. Field length should be as per the spec. If the source sends less than what the spec says, that is OK; if it exceeds the spec, please raise a defect to the source system, because it will fail our DataStage load jobs. (Checks 2, 3 and 5 are sketched as a script just after this list.)
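
Here is the promised sketch of checks 2, 3 and 5, assuming pipe-delimited detail rows already split into fields in the order of the spec table above:

```python
# Minimal sketch of checks 2, 3 and 5: duplicate key combinations, NULLs
# (empty fields) in not-null columns, and over-length fields. Assumes each
# row is already split into fields in the order of the spec table above.

# (position, column name, key column, nullable, max length)
SPEC = [
    (1, "User ID",        True,  False, 255),
    (2, "Dept ID",        True,  False, 255),
    (3, "Name",           False, True,  255),
    (4, "Contact Number", False, True,  18),
    (5, "Address",        False, True,  255),
    (6, "DOB",            False, True,  8),
]

def validate_details(rows):
    defects, seen_keys = [], set()
    for n, fields in enumerate(rows, start=1):
        key = tuple(v for v, s in zip(fields, SPEC) if s[2])
        if key in seen_keys:
            defects.append(f"row {n}: duplicate key combination {key}")
        seen_keys.add(key)
        for value, (_, name, _, nullable, max_len) in zip(fields, SPEC):
            if not nullable and value == "":
                defects.append(f"row {n}: NULL in not-null column {name}")
            if len(value) > max_len:
                defects.append(f"row {n}: {name} exceeds length {max_len}")
    return defects
```

Every entry in the returned list is a defect to raise against the source system.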

File Trailer record specification:

The Trailer record will tell you how many records are in the file, and the count may include the Header and Trailer. If the file is related to transactional data, then we might also have a Hash total count column.
The tester should use the Trailer record count to verify the reconciliation between the source file and the staging table data.
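
That reconciliation fits in a few lines. A minimal sketch, assuming the trailer count covers detail records only (adjust if your source includes the header and trailer in the count):

```python
# Minimal sketch: reconcile the trailer count against the detail records in
# the file and the rows loaded into staging. Assumes the trailer counts
# detail records only; adjust if it includes the header and trailer rows.

def reconcile(trailer_count, detail_rows, staging_row_count):
    assert len(detail_rows) == trailer_count, "source file vs trailer mismatch"
    assert staging_row_count == trailer_count, "staging vs trailer mismatch"
```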

So now we have validated our source 🙂 In my next blog we are going to see how to deal with the Requirement documents for execution.

Thanks – Asik