Have a please week ahead !!!
In this post let me explain a Project Structure.
I have a Multi-Specialty Hospital located across the world. My hospital is famous for Vaccination. Patients who comes to my hospital across globe will be given a User Card with which they can access any of my hospital in the world.
Now we maintain all customers in one Database. There are heaps and heaps of customers related to my hospital.So I decided to split up the customers based on the country and load them into corresponding country tables. Elaborated details of this requirements will be mentioned in the Business Requirement Specification.
To pull the customers as per Country, a designer should know what are all the places the Customer Data is available. So the data extracting will be done by our Source System. They will pull the all the relevant customer data and will give us a Data file.
In Design Documents you will have
Source File Specification
1) File Name Specification – Name String , Extension of the files
2) Date and Time format of the File – YYYYMMDD, HHMMSSTT or any other format
3) Control File Specification – Name String , Extension of the files
5) Header Records Layout – |H|Customer_Records|20131012|1245872|
6) Details Record Layout – |D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
7) Trailer Record Layout – |T|10|
Detail Records will tell you what data you are getting from source, what data type, is it mandatory or not and the length of the column.
|File Position||Column_Name||Filed Length||Data Type||Mandatory (Y/N)||Key Column|
|3||Customer Open Date||8||DATE||Y||N|
|4||Last Consulted Date||8||DATE||N||N|
|10||Date of Birth||8||DATE||N||N|
Please refer my Post – Source File Validation to know how validate the above details.
The sample file format will be
Now using the ETL process now we loaded the data into Staging Tables. Intermediate tables will look like below
Staging Table Load
Now my Staging Data is ready and we need to load them into corresponding Target Tables.
As a project we will test data load from file to Staging as Phase-I and Staging to Target tables to Phase II.
In this project we are not having any transformation rules , so we will insert the records from Staging tables to corresponding target tables.
All customers related to India will go to Table_India and so on. In my next blog let me come up with few more complex examples for Target tables load.
The Source to Target Document will looks like
|Customers From India will be loaded by below logic|
|S.No||Source Column||Source Table||Transformation Rule||Target Table||KEY / Value||Target Column|
|1||Customer Name||CUST_STG||If Source value is Null Do not Load Else load||CUST_IND||Key||CUST_NAME|
|2||Customer ID||CUST_STG||If Source value is Null Do not Load Else load||CUST_IND||Key||CUST_ID|
|3||Customer Open Date||CUST_STG||Assign Source Value||CUST_IND||Value||CUST_OP_DT|
|4||Last Consulted Date||CUST_STG||Assign Source Value||CUST_IND||Value||CUST_CONS_DT|
|5||Vacination Type||CUST_STG||Assign Source Value||CUST_IND||CUST_VAC_TYPE|
|6||Doctor Consulted||CUST_STG||Assign Source Value||CUST_IND||CUST_DR_CONS|
|7||State||CUST_STG||If Source value is Null Do not Load Else load||CUST_IND||Value||CUST_STATE|
|8||Country||CUST_STG||If Source value is Null Do not Load Else load||CUST_IND||Value||CUST_COUNTRY|
|9||Post Code||CUST_STG||If Source value is Null Do not Load Else load||CUST_IND||CUST_POS_C|
|10||Date of Birth||CUST_STG||Assign Source Value||CUST_IND||CUST_DOB|
|11||Active Customer||CUST_STG||Assign Source Value||CUST_IND||CUST_STATUS|
So based on above rule will load the data into IND_CUST tables.
Hi Guys ,
So we are ready with Test Environments, Test Plans, Test Cases to begin with Static Testing.
When we design our test cases we need to write test cases and test conditions right from Source validation till the Reports testing.
1. Test cases to validated the Source Data .
What you wanted to do with source data ? Why it is important ? Why do we need to know the importance of the source Data ?
Let me explain you with normal Scenario!!!
You wanted to purchase a Mobile so what you will do , start analyzing the markets with different channels like [Suggestions from friends] ,[Surfing in Internet], [Checking in news Papers Advertisements], [Checking @ Outlets]. In this you will not get the same details from all of these channels. So we will note down all the data retrieved from different channels and choose what you need !!!
Like the above scenario, a customer who approaches an Organization will go through multiple channels and share multiple information but at the end what you need for your Reports will be Designed , Developed , Tested and Extracted by Us 🙂
As far as I know, if you are 90% OK with the source data then there are high possibility of not finding even a single defect @ any of the test phases unless care less mistakes or code defects found by developers. The defects leaked by Developers can be easily fixed because ETL code is under our control. If a defect is leaked because of the data then it is a huge impact to the projects because we need to go back to the source system and inform them to fix !!!!
What all you need to verify in Source data ?
Just think you are intermediate between a seller (Source) and a target (Seller) . Do you buy defective product from your buyer? Does your buyer buy defective products from you ? No 😦 so please ensure you are buying a non defective product and a selling a 90% defect free product to your customer.
Coming to Business Intelligence 😉
1.The data that we need will be sent to our address (Unix Landing Directory) tester should verify the data has arrived to the correct address (Landing Directory).
2.Our source system will tell us on what Format they are sending the files. So the tester should verify the Name of the file is as what source said to us.
3.Date component in the file name is critical for us, this date is the date when the file is extracted for us.
4.The Source data will be sent to use is different formats , mostly it will be .DAT (data) or .DLY (daily) files.
5.Along with DATA files they will send CTL (Empty file) to make sure all the files arrived -as Acknowledgement copy 🙂
6.Now you validated the address (Landing Directory) and File names.
7. Now open the file you will see Numbers, Alpha-Numeric, Alpha characters separated by [,] or [|] – you need to verify how its separated
8.Most of the files will have 3 sections – Header , Detail and Trailer records.Testers should verify these sections are retrieved as per the specifications.
9.As a tester you should be interested on the data how it is Organized as per the Specification? the order of the column is matching with the order given in Specification ? Data types of the columns are matching with the data types mention in the Specification?
Order of the column:
In the spec you have this Order |Name|Place|Number|DOB| but in the file you are getting as |Name|DOB|Number|Place| —Wohooo its a Defect 🙂
Number of the Columns :
In the spec you have this Order |Name|Place|Number|DOB| but in the file you are getting as |Name|DOB|Place| —Wohooo its a Defect 🙂
Data Type of the Columns:
In the spec you have this Order |Name as VARCHAR |Place as VARCHAR |Number as INTEGER |DOB as DATE | but in the file you are getting as |Name as VARCHAR |Place as VARCHAR |Number as CHAR|DOB as DATE | Wohooo its a Defect 🙂
File name specification :
What are the file name, how it should be named, and extension of the file (.DAT), business time stamp?
File name should be as per the Spec, because developers mention the exact file name in their Data Stage File watched job. If the file name is not as per the Spec then the Job fail due to the mismatch. The time stamp filed (Date + Time) should be validated, example the Spec says the Date should be DDMMYYYY if source is sending the date as MMDDYYYY then our Data Stage job fails due to Invalid Date.
File landing location :
Each project has specific location to X-Com the files from Source System. Testers need to verify the file is arrived
to the correct location.
(Note: If tester cannot able to view the file due to File size, he can view the file using Unix Commands in PUTTY
File Header specification :
In general every file must have Header and the header record will contain when the file is processed. The testers
should aware of the Date in External file name and the date in the Header details.
i) Date in External file name: The date when the file is X-Com’d
ii) Date in Header record: The date when the file is processed.
Suppose if SAP system is extracting the file on 3/12/2013 and X-Com’ing on 4/12/2013, then External file name is
4/12/2013 and Header date is 3/12/2013. The Header is very important because Developer will use the Header
date as EFFECTIVE Date for Data loads.
File detailed records specification:
Detailed records section of the file contains the actual data that need to be loaded into Warehouse. Detail records
are separated by Pipe delimiter in general and it may change from source to source. Testers should be very
conscious about detailed records.
|File Position||Column Name||Key Column||Null / Not Null||Data Type||Filed Length|
The testers should verify:
1.The columns are in the correct order as per the Spec
2.If the columns are mentioned as Key then source should send us only one record for that combination, if
find more than one record after loading into the Staging tables then raise defect
3.If the columns are mentioned as Not Null in the spec then we should not raise Nulls (||) from the source,
suppose if we get NULLs , our Data Stage job fails due to NULLs presents in Not Null Columns , now
raise defect to Source system
4.Data type verification is the critical stuff, if a column is defined as INT then the source should send INT only
if not please raise defect to source system.
5. Filed length should be as per the spec, if the source sends less then what spec says then its OK, if it
exceeds then please raise defects to source system, because it will fail our Data Stage load jobs.
File Trailer record specification:
Trailers records will tell you how many records in the file. And the count could be including Header and Trailer. If
the file is related to transactional data then we might have a Hash total count column.
The tester should use the Trailer record count to verify the reconciliation between source file to staging table data.
So now we have validated our source 🙂 @ my next blog we are going to see how to deal with the Requirement documents for execution.
Thanks – Asik