Test Data Management
Hope all are doing great….it’s very long time since I posted through my blog… my sincere apologies 😦
Test Data availability is the one of and most significant issue that will lead to Schedule slippage in DWH projects.
So the Testers and Test Managers should know what all are requirements for the test data and they need to define the test data strategy.
The Test Data Management should cover the below points:
- Subset of PROD data catered for Testing Requirements
Suppose if you are working in an Enhancement projects then you can avail the existing data that is already loaded into warehouse. In this instance you can’t use the data that’s pulled directly from Production as it’s sensitive to the Bank and its Customers.
- Mask the data as per regulatory and data privacy requirements
All the customer information related information should be masked. It is difficult to test the Masked data, suppose if the Name column should accept 20 (CHAR) and if the data is masked then we cannot do the BVA for the masked column.
- Fabricate Test Data in case of un availability
Your source system cannot provide you the data for all your Testing Scenarios. Lack of Test data is the major issue in the Data warehouse projects. A tester should analyze the system and need to create the test data in the form of Insert Scripts. The test data should be extracted correctly by the query before its loaded into the Target tables.
Typical Challenges involved in Test Data Management:
- Delay in Schedule
The test data from Upstream should be their UAT else our testing will not effective, Suppose if we are in SIT and our Up Stream have not Completed UAT then our Schedule will be postponed as unavailable of UAT tested data from Up Stream.
- Issues to cover Business Cases
Test Data for few Business Scenarios cannot be produced in Up Stream or through Data fabricating. Take this example, reopening of an Account, a Customer was having an account and he closed the same few years before and his Records are Kept Active in ACCT_STATUS Closed Records. When the Customer comes to Bank and the Bank will somehow Identify his existing account and will try to Reopen across all the tables to stop not having one more record in the warehouse, for this kind of scenarios it’s difficult to manufacture the data in a given time.
- Data Privacy & Compliance
Banks are curious about the Customer Data because of the Data theft in any means. So when we will get the data from PROD for our Regression Testing, most of the Customer related data will be masked as per Compliance policy. So we cannot produce the data or test those scenarios in our Environment.
- Dependency of SME’s
SME’s availability is a biggest issue in Data warehouse world.
System Architect – He is accountable for the System Design based on the Business Requirements. He will be closely working with Data Modeler (Data Scientist) to design the Source to Target. System Architect needs to fit the current requirement into the Existing or he should create an Application Landscape for the new requirement.
Data Modeler – He/She is accountable for designing the S2T. Data Modeler should know the Physical and Logical Design of the System and Database. For any Data warehouse project, the S2T is the important document.
ETL Developers – He/She should be accountable for High Level and Low Level design of the Requirements into Code. An ETL developer should be capable enough to design the ETL Code without compromising the performance of the Extraction and Lodging mechanism. He/She should know what kind of Transformation mechanism should be designed for the particular requirement.
Test Analyst / Test Managers – Test Managers should foresee all the technical and business requirements can be tested in given time frame, because the Test Data and System Behaviors might be changing which might cause the Schedule slippage.
- Availability of Adequate Data
We should start our System Testing with the Source System’s UAT Tested Data. If in case due to some issue if the Source System is not completed their System Testing, and they can provide only their System Tested data, then these data is not sufficient to continue our System Testing, so we need to fabricate the Test Data in order to kick off out Testing.
- Volume of Test Data
The Data threshold requirement will be declared by the System Owners. Whereas we cannot create that declared volume of data in our environment and test the performance beyond the threshold limit and below the threshold limit, so there might be Spool Space issues when we go to Production.
- Multiple Source & format of data
In case of Multiple Source Systems are participating in ETL, then it’s difficult to get the Data from different source systems on the same time to being our Testing, in this case again we need to create the Mock Test files to being our testing.
8. Version Control of Data
Versioning of Data is very difficult in the Data warehouse world whereas we can see the history of the data using few housekeeping columns (EFFT_D, EXPY_D). But the data loads and extracts should be versioned to avoid the confusions.
Chapter 8 – Business Intelligence – Do you know how to Make Cofee? Then you know (E)xtract T(ransform) & L(oad)
[I am using Civil Engineers and Coffee making scenarios for Better Understanding for Beginners]
Most widely spoken concept in Business Intelligence world is ETL – Extract Transform and Load. The persons who are in IT field already they call it as ETL Architecture 🙂 .
ETL Architecture :
Who use the word Architecture most commonly ? – Civil Engineers. They say Building Architecture so when you are constructing something you are using the word Architecture. You cant construct anything without prior planning. So a Civil engineer will draw a diagram that will tell how the building going to look like.
Likewise in Business Intelligence world, we have Solution Designers (Civil Engineers) who designs the ETL System and they are also called System Architectures!!!
So now you understood why we are calling it as ETL System Architect 🙂
Now let me explain how Extract Transform and Load process works.
How to make Coffee?
Asik in the below image is preparing coffee for his wife 😦 .
In the first step, he is preparing decoction (extracting the essence of the coffee from Beans).
In the second step, he is transforming it to Coffee by adding correct proportion Milk,Sugar
In the third Step, he loaded the Coffee into Container for Drinking
After drinking the coffee (testing) he found (defects) he needs to add more sugar 😦 he added and ready for (reporting) studying.
Now how ETL works ?
Developers involved in designing the Extract Transform Load Jobs. What is a Job? to make it understand ! we can say it is a Function.
These jobs will be written using Unix Scripting.
Extract Job (decoction ready) will bring all the data into Staging tables – testers needs to verify all the data is moved from file to Staging table
Transform Job (added milk and sugar) will apply rules and filter the records , alters the records as per documents – testers needs to verify all the data transformed as expected
Load Jobs (coffee ready ) once the transform jobs ran successful then Load jobs will load the data into Intermediate tables (will be used for delta check in Day_02) – testers needs to verify the the transformed data is loaded into target tables
1.In Load job we have Insert as well as Delta
Hope you guys understood the ETL Logic!!!
In my next blog, I will explain about INSERTS , UPDATES , Logically DELETES