Database

Data warehouse – Test Data Management

Posted on Updated on

Test Data Management

Hi All,

Hope you are all doing great… it has been a very long time since I posted on my blog… my sincere apologies 😦

Happy Reading……:-)


Test data availability is one of the most significant issues that lead to schedule slippage in DWH projects.

So Testers and Test Managers should know what the requirements for the test data are, and they need to define a test data strategy.

The Test Data Management should cover the below points:

  1. Subset of PROD data catered for Testing Requirements

Suppose you are working on an Enhancement project; then you can use the existing data that is already loaded into the warehouse. Even in this instance you can’t use data pulled directly from Production, as it is sensitive to the Bank and its Customers.

  2. Mask the data as per regulatory and data privacy requirements

All customer-related information should be masked. Masked data is difficult to test: suppose the Name column should accept 20 characters (CHAR), and the data is masked; then we cannot do Boundary Value Analysis (BVA) on the masked column.
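One common way around this, sketched below, is format-preserving masking: each character is replaced with a random character of the same class, so the masked value keeps its original length and a length-based BVA check on a CHAR(20) column is still meaningful. The function name and rules here are illustrative, not taken from any specific masking tool.

```python
# Sketch of format-preserving masking: the masked value keeps the
# original length and character classes, so boundary value analysis
# on a CHAR(20) Name column is still possible after masking.
import random
import string

def mask_preserving_length(value: str) -> str:
    """Replace each character with a random one of the same class,
    keeping the overall length intact."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append(random.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(random.choice(string.ascii_lowercase))
        elif ch.isdigit():
            out.append(random.choice(string.digits))
        else:
            out.append(ch)  # keep spaces/punctuation as-is
    return "".join(out)

masked = mask_preserving_length("John Smith")
assert len(masked) == len("John Smith")  # length-based BVA still valid
```

With a mask like this, a tester can still verify that a 20-character name survives the load and a 21-character name is rejected, without ever seeing the real customer name.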

  3. Fabricate Test Data in case of unavailability

Your source system cannot provide data for all your testing scenarios, and lack of test data is a major issue in Data warehouse projects. A tester should analyse the system and create the test data in the form of Insert scripts. The test data should be extracted correctly by the query before it is loaded into the Target tables.
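The fabrication step above can be sketched as follows, using an in-memory SQLite database as a stand-in for the warehouse; the table and column names are invented for illustration.

```python
# Minimal sketch: fabricating test data as INSERT scripts and
# verifying the rows land correctly, using in-memory SQLite as a
# stand-in for the warehouse. Table/column names are illustrative.
import sqlite3

rows = [
    ("C001", "OPEN"),
    ("C002", "CLOSED"),   # covers a closed-account scenario
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCT_STG (ACCT_ID TEXT, ACCT_STATUS TEXT)")

# Generate and run the insert script a tester would hand over
inserts = [
    f"INSERT INTO ACCT_STG VALUES ('{acct}', '{status}')"
    for acct, status in rows
]
for stmt in inserts:
    conn.execute(stmt)

# Confirm the extract query picks up the fabricated data
count = conn.execute(
    "SELECT COUNT(*) FROM ACCT_STG WHERE ACCT_STATUS = 'CLOSED'"
).fetchone()[0]
print(count)  # one fabricated closed account
```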

Typical Challenges involved in Test Data Management:

  1. Delay in Schedule

The test data from the Upstream system should be UAT-tested, else our testing will not be effective. Suppose we are in SIT and our Upstream has not completed UAT; then our schedule will be postponed, because UAT-tested data from Upstream is unavailable.

  2. Issues to cover Business Cases

Test data for a few business scenarios cannot be produced Upstream or through data fabrication. Take the reopening of an account as an example: a customer had an account, closed it a few years ago, and his records are kept active with ACCT_STATUS marked Closed. When the customer comes back, the bank will identify his existing account and reopen it across all the tables, so as not to create one more record in the warehouse. For this kind of scenario it is difficult to manufacture the data in the given time.

  3. Data Privacy & Compliance

Banks are cautious about customer data because of the risk of data theft in any form. So when we get data from PROD for our regression testing, most of the customer-related data will be masked as per the compliance policy, and we cannot reproduce that data or test those scenarios in our environment.

  4. Dependency on SMEs

SME availability is one of the biggest issues in the Data warehouse world.

System Architect – accountable for the system design based on the business requirements. He will work closely with the Data Modeler to design the Source-to-Target (S2T). The System Architect needs to fit the current requirement into the existing application landscape, or create a new landscape for the new requirement.

Data Modeler – accountable for designing the S2T. The Data Modeler should know the physical and logical design of the system and the database. For any Data warehouse project, the S2T is the most important document.

ETL Developers – accountable for translating the High Level and Low Level design of the requirements into code. An ETL developer should be capable of designing the ETL code without compromising the performance of the extraction and loading mechanism, and should know what kind of transformation mechanism should be designed for a particular requirement.

Test Analysts / Test Managers – Test Managers should foresee whether all the technical and business requirements can be tested in the given time frame, because the test data and system behaviour may keep changing, which can cause schedule slippage.

  5. Availability of Adequate Data

We should start our System Testing with the Source System’s UAT-tested data. If, due to some issue, the Source System has not completed its own testing and can provide only System-Tested data, that data is not sufficient to continue our System Testing, so we need to fabricate test data in order to kick off our testing.

  6. Volume of Test Data

The data volume threshold will be declared by the System Owners. However, we cannot create that declared volume of data in our environment and test the performance both above and below the threshold limit, so there might be Spool Space issues when we go to Production.

  7. Multiple Sources & formats of data

When multiple Source Systems participate in the ETL, it is difficult to get the data from the different source systems at the same time to begin our testing; in this case we again need to create mock test files to begin our testing.

  8. Version Control of Data

Versioning of data is very difficult in the Data warehouse world, although we can see the history of the data using a few housekeeping columns (EFFT_D, EXPY_D). The data loads and extracts should nevertheless be versioned to avoid confusion.
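As a small illustration of those housekeeping columns, the sketch below (SQLite as a stand-in, invented table and column names) shows how an open-ended expiry date identifies the currently active version of a record while older versions remain queryable as history.

```python
# Sketch of how EFFT_D / EXPY_D housekeeping columns expose row
# history: the active version of a record is the one whose expiry
# date is the open-ended high date. SQLite stands in for the DWH.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUST_HIST (
        CUST_ID TEXT, CUST_NAME TEXT, EFFT_D TEXT, EXPY_D TEXT
    )
""")
conn.executemany(
    "INSERT INTO CUST_HIST VALUES (?, ?, ?, ?)",
    [
        ("C1", "Asik",  "2013-01-01", "2013-06-30"),  # expired version
        ("C1", "Asiq",  "2013-07-01", "9999-12-31"),  # active version
    ],
)

active = conn.execute("""
    SELECT CUST_NAME FROM CUST_HIST
    WHERE CUST_ID = 'C1' AND EXPY_D = '9999-12-31'
""").fetchone()[0]
print(active)  # the currently effective name
```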

Cheers,

Asik

ETL Testing Roll Back post – Ask yourself and get ready to move to ETL Testing world


Hello Folks

It has been a long time since I posted something on my blog; I’m really sorry for that 😦


I met a person by chance in a public place; we were discussing ETL testing and other things, and I told him I write blogs about ETL Testing. He was surprised and said that he is a visitor to my blog. I am happy that I met a stranger who reads and benefits from my blog.

I have got a few requests: ‘Can you please share some Interview Q&A?’

I replied,

“If you have hands-on experience, and you have mentioned your skills and responsibilities in your CV, then why would you want Q&A for ETL Testing?”

Folks who are trying to enter the ETL Testing domain think that knowing how to write SQL is enough to clear the interview and get the job. Folks, that’s not how it works. I request you to remember that ETL testing is not a different entity; your job is testing, whatever the technology.

If you are attending an interview, please go through your CV and be strong on what you have mentioned. If you are asked about something beyond what you have mentioned, then please raise your hand: ‘I’m sorry, I haven’t worked on it.’

OK, let me come to the subject I wanted to discuss here!!!

Here I’m going to ask questions of myself to test my technical capability; if I can win against myself then I can win anywhere in the globe, right guys?

So try to ask as many questions as possible of yourself!!!!

Q1

INTRV: Asik, Can you please let us know your roles and responsibilities in your previous project?

CAND: Hello, I’m from here and blaw, blaw, blaw………

Q2

INTRV: Asik, How you can scale yourself in writing in SQL from 1-5?

CAND:  [—-Fill your Best Answer here——]

Q3

INTRV: Asik, What all are the tools that you used in your ETL Projects?

CAND: [—-Fill your Best Answer here——]

Q4

INTRV: Asik, Can you please explain me the ETL architecture?

CAND: [—-Fill your Best Answer here——]

Q5

INTRV: What all are the Joins that you have worked so far? And can you please write down joins using two tables T1 and T2?

CAND: [—-Fill your Best Answer here——]

Q6

INTRV: Asik, what is the difference between Inner join and Outer Join?

CAND: [—-Fill your Best Answer here——]

Q7

INTRV: Asik, can you please explain me about the significance of Data Types in ETL Testing?

CAND: [—-Fill your Best Answer here——]

Q8

INTRV: Asik, I have a column as VARCHAR from source, but the corresponding target column is INTEGER. How would you write a query to check the target column?

CAND: [—-Fill your Best Answer here——]

Q9

INTRV: Asik, what is the significance of the Staging tables in the ETL loads?

CAND: [—-Fill your Best Answer here——]

Q10

INTRV: Asik, what is the difference between, copy records and duplicate records?

CAND: [—-Fill your Best Answer here——]

Q11

INTRV: Asik, what is the significance of Oracle Control tables in ETL Testing?

CAND: [—-Fill your Best Answer here——]

Q12

INTRV: Asik, what does it mean if you get a NULL, Blank, Space or 0 from Source?

CAND: [—-Fill your Best Answer here——]

Q13

INTRV: Asik, I have a VARCHAR column from source; this column can allow 0. Suppose the source is sending 0000 instead of 0, and the rule is that we should not allow this record into the target table. Will our SQL stop this record if we write the condition as [Column <> 0]?

CAND: [—-Fill your Best Answer here——]

Q14

INTRV: Asik, what is the difference between the Primary Index and Primary Key?

CAND: [—-Fill your Best Answer here——]

Q15

INTRV: Asik, what could be consequence if the reconciliation failed in the Day_01 load or Day_02 load?

CAND: [—-Fill your Best Answer here——]

Q16

INTRV: Asik, Why you wanted to test the DDL of any table?

CAND: [—-Fill your Best Answer here——]

Q17

INTRV: Asik, what is the difference between Data Definition Language and View Definition?

CAND: [—-Fill your Best Answer here——]

Q18

INTRV: Asik, as a tester what all are the access you required? And do you know the SQL query syntax to get the access?

CAND: [—-Fill your Best Answer here——]

 Q19

INTRV: Asik, what is the difference between DB level access, Table level access and view level access?

CAND: [—-Fill your Best Answer here——]

Q20

INTRV: Asik, What all are the steps you take to validate the Source Files? What is significance of Source File Names, Header, Trailer and Detailed Records?

CAND: [—-Fill your Best Answer here——]

Q21

INTRV: Asik, if you are not able to open a file by double-clicking due to the file size, how will you access the file and look at the records?

CAND: [—-Fill your Best Answer here——]

Q22

INTRV: Asik, what is the highest-severity defect that you have raised so far in any ETL project, and why did you classify it as Sev-01?

CAND: [—-Fill your Best Answer here——]

Q22

INTRV: Asik, what is the highest-priority defect that you have raised so far in any ETL project, and why did you classify it with that priority?

CAND: [—-Fill your Best Answer here——]

Q23

INTRV: Asik, Explain about Inserts, Updates, Logical Deletes in any given ETL Projects?

CAND: [—-Fill your Best Answer here——]

Q24

INTRV: Asik, Why and what is necessity of Reference tables in any ETL Projects?

CAND: [—-Fill your Best Answer here——]

 Q25

INTRV: Asik, Can you modify Reference table data if you have Update access to the reference tables?

CAND: [—-Fill your Best Answer here——]

Q26

INTRV: Asik, What all are the Steps by which you can achieve the test data for a given scenarios?

CAND: [—-Fill your Best Answer here——]

Q27

INTRV: Asik, What is the difference between the Time Variant and Non time variant tables in any ETL Testing projects and how you will test it?

CAND: [—-Fill your Best Answer here——]

Q28

INTRV: Asik, can you delete a table or drop a table? What happens if we drop a table?

CAND: [—-Fill your Best Answer here——]

Q29

INTRV: Asik, What is the significance of Views in any ETL testing projects?

CAND: [—-Fill your Best Answer here——]

Q30

INTRV: Asik, can a table contain more than one Primary Key? And can a table contain more than one Primary Index?

CAND: [—-Fill your Best Answer here——]

Q31

INTRV: Asik, the DBA has granted Select access to a DB; does it mean that you have Select access to all the tables that sit in that DB?

CAND: [—-Fill your Best Answer here——]

 Q31

INTRV: Asik, if the target column is Integer (30) can you load a Varchar (30) into it?

CAND: [—-Fill your Best Answer here——]

Q32

INTRV: Asik, target Date column is constrained as DDMMYYYY but from source we are getting it as MMDDYYYY? What will happen to load job?

CAND: [—-Fill your Best Answer here——]

Q34

INTRV: Asik, What does it mean by Full-Snap shot and historical records?

CAND: [—-Fill your Best Answer here——]

Q35

INTRV: Asik, What does it mean Catch up logic involved in loads and how you will test it?

CAND: [—-Fill your Best Answer here——]

Q36

INTRV: Asik, What is the difference between Load projects and Extract projects?

CAND: [—-Fill your Best Answer here——]

Q37

INTRV: Asik, if the target column is defined as Decimal (4, 3), can you load the values as Decimal (4, 5)? Or can you load the values as Decimal (4, 3)

CAND: [—-Fill your Best Answer here——]

Q37

INTRV: Asik, What is the difference between Union and Union All? And where we want to use Union and where we want to use Union All?

CAND: [—-Fill your Best Answer here——]

 Q38

INTRV: Asik, What is the use of Cross Join? And what circumstances you use the Cross Join?

CAND: [—-Fill your Best Answer here——]

Q39

INTRV: Asik, what are the AutoSys commands to trigger a job, abort a job, and hold a job?

CAND: [—-Fill your Best Answer here——]

Q40

INTRV: Asik, What is the significance of Group by functions in SQL and where we will use the Group by functions?

CAND: [—-Fill your Best Answer here——]

Q41

INTRV: Asik, How you can find the duplicates in the Staging tables?

CAND: [—-Fill your Best Answer here——]

Q42

INTRV: Asik, What is the difference between SET table and Multi Set tables? From where we can get these details?

CAND: [—-Fill your Best Answer here——]

Q43

INTRV: Asik, Can one View table refer another view table to create a new view table?

CAND: [—-Fill your Best Answer here——]


 Q45

INTRV: Asik, what are the common SQL syntax errors you have encountered, and how did you resolve them?

CAND: [—-Fill your Best Answer here——]

Q46

INTRV: Asik, Do you have any idea what is Data Modeling?

CAND: [—-Fill your Best Answer here——]

Q47

INTRV: Asik, What is the difference between the Physical Design and Logical Design?

CAND: [—-Fill your Best Answer here——]

Q48

INTRV: Asik, What all are the Items available in a generic S2T?

CAND: [—-Fill your Best Answer here——]

Q49

INTRV: Asik, what does it mean if a column is mentioned as Value column in the S2T?

CAND: [—-Fill your Best Answer here——]

Q50

INTRV: Asik, Have I answered all of my basic questions to go to advanced questions?

CAND: I am still thinking and trying to answer a few questions 🙂

What are Quality Assurance & Quality Control? And who are the PQA, CC, and DPA in QA???


Hi All,

Quality Assurance and Quality Control


 

As Test Professionals we should know about QA and QC. What we do is actually Quality Control, while the people involved in bringing in CMMI and ISO standards do Quality Assurance. Quality Assurance relates to maintaining the process in any given Project or Programme. Quality Control is nothing but delivering the right product by verifying that the requirements are covered and working as expected.

We know there are multiple levels of testing in the testing world, such as System Test, Integration Test etc., and methodologies by which we execute them, like Waterfall, Iterative etc.

Quality Assurance:

Let me explain what I know of Quality Assurance:

There are three different roles responsible for assuring the process in any project.

  1. PQA – Project Quality Analyst
  2. CC – Configuration Controller
  3. DPA – Defect Prevention Analyst

Project Quality Analyst – PQA role

A person in this role needs to know the pre-defined industry standards of the Organization.

PQA’s Roles and Responsibilities

  1. Document naming conventions are met as per the industry standard
  2. Names of who prepared, reviewed, and approved the deliverables are recorded
  3. Reviews have happened across all the customer-facing documents
  4. Review defects are found, fixed, verified and captured for metrics
  5. Checking whether all the deliverables are kept in a common place where the stakeholders can access them
  6. Checking that all the necessary deliverables are prepared by the project team
  7. Checking that the actual project delivery date and the date mentioned in the documents are the same
  8. Checking that the stakeholders’ names and document owners’ names are mentioned correctly in all customer deliverables
  9. Differentiating the customer-facing deliverables and internal-audit-specific deliverables as per the industry standards
  10. Verifying that the Entry Criteria and Exit Criteria of every level of the SDLC are met, and collecting the proofs for the same
  11. PQAs will be involved in all levels of the SDLC

  1. Business Analyst teams will have separate sets of deliverables like Business Requirement documents, Business Understanding Documents, Requirement Traceability documents etc.
  2. Development teams will have separate sets of deliverables like High Level Design, Low Level Design, Functional Specifications etc.
  3. Testing teams will have separate sets of documents like Test Plans, Test Conditions

The PQA should validate all the documents that are supposed to be delivered to the clients and maintained for internal audits.

CC – Configuration Controller

The Configuration Controller controls the versions and the placement of the documents in tools like VSS (Microsoft Visual SourceSafe) or Documentum.

Configuration Controller Roles and Responsibilities

  1. CCs are responsible for creating the folder structures in VSS or Documentum

For example, in any project the following folders will be created to maintain the project deliverables:

  1. Project Kick off
  2. Minutes of Meeting
  3. Requirements
  4. Review Log
  5. Development
      5.1. High Level Design
      5.2. Low Level Design
      5.3. Issue Log
      5.4. Emails
  6. Testing
      6.1. Unit Testing
      6.2. System Testing
      6.3. System Integration Testing
      6.4. User Acceptance Testing
  7. Production Support

CC’s will have the Admin rights to Grant and Revoke access to folders.

Developers should not have access to the folders related to Testing and vice versa

  2. CCs will maintain the check-in and check-out of the documents that go into VSS
  3. CCs will ensure the relevant documents are kept in the corresponding folders in VSS

DPA – Defect Prevention Analyst

Defect Prevention Analysts maintain the defects across the SDLC. For any raised defect, the workflow should be maintained and proper comments should be given when it is created. All high-severity defects should be fixed before moving from one phase to the next, to begin with.

As testers, when we raise defects we need to concentrate on applying the Defect Cause and Defect Type in the Defect Management tool. This helps the DPAs classify the defects and come up with prevention tips.

Defect Cause – what is the root cause of the defect? That is:

  • Is the defect caused by the Upstream Source Data or Test Data?
  • Is the defect caused by Incomplete or Missing requirements?
  • Is the defect caused by an Inconsistent Requirement?
  • Is the defect caused by a Code discrepancy?
  • If you find any anomalies in any documents, then raise the defects against the Artefacts
  • If any of your defects leads to changes in the requirement, then please raise it as a Change Request – a CR can be on the actual Business Requirement or on Design Changes

Defect Type – classification of the defect. That is:

  • Is the defect related to a Data Error?
  • Is the defect related to Application Code?
  • Is the defect related to a Work Request?
  • Is the defect related to a Change Request?
  • Is the defect related to Deployment?
  • Is the defect related to a Specification?
  • Is the defect related to an Artefact?
  • Is the defect related to Production?
  • Is the defect related to Process?
  • Is the defect related to Environment?
  • Is the defect related to Requirements?
  • Is the defect related to Reviews?

The DPA’s most prominent work is to prepare the CAPA – “Corrective Action and Preventive Action”.

DPA roles and Responsibilities

  1. DPAs collect the metrics related to defects periodically – weekly, monthly or ad hoc
  2. DPAs group the defects by defect classification – in a given period, how many defects were raised in Reviews, Code, Requirement Changes – and collect the causes of the defects
  3. Then, using the metrics retrieved from the Defect Management tool, they build a Fishbone diagram by filling in the necessary details
  4. Using statistical tools like Minitab, they calculate the Defect Density of the defects from each phase
  5. Then they create prevention actions for the areas where the Defect Density is above the threshold limit

Suppose your organization has a Defect Density threshold of 0.5, and the density of defects under the Review defect type is more than 0.5; then they will ask the reviewers to do their work better, to minimise review defects at all levels of the SDLC.
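The check above can be sketched in a few lines. The 0.5 threshold is the figure used in this post; the defect counts and the size measure (e.g. thousand lines of code reviewed) are invented for illustration, since the real denominator depends on the organisation’s metric definition.

```python
# Illustrative defect-density calculation per phase against an
# organisational threshold of 0.5. The counts and the size measure
# are assumptions for the example, not real project numbers.
defects = {"Review": 12, "Code": 4, "Requirement Change": 2}
size_units = 20  # e.g. thousand lines of code reviewed (assumed)

THRESHOLD = 0.5
for phase, count in defects.items():
    density = count / size_units
    flag = "ABOVE threshold - needs prevention action" if density > THRESHOLD else "ok"
    print(f"{phase}: {density:.2f} ({flag})")
```

Here the Review phase lands at 0.60, above the 0.5 limit, so it is the one the DPA would target with a prevention action.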

Cheers – Asik

Importance of Non Functional Testing in Data warehouse


Hi All

In this post I would like to share my knowledge in Non Functional Testing in Data warehouse testing.


There are different types of non-functional testing that we do in the testing world; some of them are:

  1. Baseline testing
  2. Compatibility testing
  3. Compliance testing
  4. Documentation testing
  5. Endurance testing
  6. Load testing
  7. Localization testing and Internationalization testing
  8. Performance testing
  9. Recovery testing
  10. Resilience testing
  11. Security testing
  12. Scalability testing
  13. Stress testing
  14. Usability testing
  15. Volume testing

To me, Non Functional testing is something that does not directly add business value; it deals with the environment. When we extract data from heterogeneous source systems, we need to think about handling the following.

Verifying the volume of the data

No business can guarantee the volume of data they will send; they can only say approximately. Our code should be capable of pulling the maximum number of records the source system can send at any point in time. To manage the volume of data, Teradata has features called MultiLoad (MLoad) and TPump. When developers design the system, they fix a limit by which data will be loaded into the warehouse.

Example:

  • MLoad – if we get a data file with 100 records or more, the records will be loaded by the MLoad functionality
  • TPump – if we get a data file with fewer than 100 records, the records will be loaded by TPump

What we need to test here is: send a file with 100 records and check that the records are loaded by MLoad. This can be verified using the Load Job Names.
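The routing rule described above boils down to a single threshold check, sketched here; the 100-record cut-off is the figure from this example, while real limits come from the system design.

```python
# Sketch of the volume-based routing rule: large files go to the
# bulk loader (MLoad), small trickle files go to TPump. The 100-row
# cut-off is taken from the example above, not a Teradata default.
def pick_load_utility(record_count: int) -> str:
    return "MLOAD" if record_count >= 100 else "TPUMP"

print(pick_load_utility(100))  # MLOAD
print(pick_load_utility(37))   # TPUMP
```

A tester exercising this would send one file on each side of the boundary (99 and 100 records) and confirm via the load job names which utility picked the file up.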

Verifying Date and Time of the Data file arrival to the Unix Landing directory

Most companies do not function on weekends and public holidays, so our source systems will not send any transactional data on those days. Because of this, developers will design their jobs to archive any files arriving on these days.

Normally, Monday’s transactional data will come to us for loading early on Tuesday morning, and so on until Friday’s transactional data hits us early on Saturday morning.

We as testers need to verify these schedules are working as per the specification. This can be achieved by:

  • Sending a file on a weekend and checking the file is archived
  • Sending a file on a public holiday and checking the file is archived
  • Verifying Monday’s transactional data is received on Tuesday morning, through to Friday’s on Saturday morning

Verifying Purging and Truncate Loads

I have already mentioned about Purging and Truncate loads in my earlier blogs.

Purging – the AutoSys jobs will purge the data, leaving only the required data in the staging table. Suppose I have loaded the 10th, 11th and 12th of January data into the staging table; when I load the 13th of January data, the 10th of January data will be purged.

Truncate – simply load Day_01 data; when you load Day_02 data, the Day_01 data will be deleted.

We as testers need to verify the truncation and purging happen as per the design requirement.
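The purge example above (loading the 13th purges the 10th) can be sketched as a rolling three-day window; SQLite stands in for the warehouse and the table and column names are invented.

```python
# Sketch of the purge rule: keep a rolling three-day window in the
# staging table, so loading the 13th purges the 10th. SQLite is a
# stand-in; table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TXN_STG (LOAD_DT TEXT, AMT REAL)")
for d in ("2013-01-10", "2013-01-11", "2013-01-12", "2013-01-13"):
    conn.execute("INSERT INTO TXN_STG VALUES (?, 100.0)", (d,))

# Purge everything older than the newest three load dates
conn.execute("""
    DELETE FROM TXN_STG WHERE LOAD_DT NOT IN (
        SELECT DISTINCT LOAD_DT FROM TXN_STG
        ORDER BY LOAD_DT DESC LIMIT 3
    )
""")
remaining = [r[0] for r in conn.execute(
    "SELECT DISTINCT LOAD_DT FROM TXN_STG ORDER BY LOAD_DT")]
print(remaining)  # the 10th has been purged
```

A tester would run the load for the 13th and then confirm, with a query like the final SELECT, that only the expected window of dates survives.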

Verifying File Watcher Script

There will be a File Watcher script that looks for files until they arrive in the Unix landing directory. The source system promises to send the Day_01 file on 10-01-2013, so we set that date in the File Watcher script. If the source system sends the records on 10-01-2013, the File Watcher reads the date from the file header; if both match, it processes the file into the staging table. If the source system fails to send the data on 11-01-2013, our File Watcher job looks for the file for the given time interval on 11-01-2013, and if it has not arrived, an automated email is sent to the concerned source system saying the file has not arrived.

So we as testers need to verify the File Watcher job is working as expected.
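A minimal sketch of such a watcher is below, written in Python for illustration (real implementations are usually shell scripts scheduled by AutoSys); the paths, file names and polling values are assumptions.

```python
# Minimal sketch of a file watcher: poll the landing directory for
# the expected file, and flag a miss after the agreed interval.
# Paths, names and timings here are illustrative assumptions.
import os
import time

def watch_for_file(landing_dir: str, expected_name: str,
                   poll_secs: int = 60, max_polls: int = 5) -> bool:
    for _ in range(max_polls):
        if os.path.exists(os.path.join(landing_dir, expected_name)):
            return True          # file arrived - trigger the load
        time.sleep(poll_secs)
    return False                 # send the "file not arrived" alert

# e.g. watch_for_file("/data/landing", "CUST_20130110.dat")
```

Testing it means both the positive path (drop the file, see the load trigger) and the negative path (withhold the file, see the alert go out after the interval).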

Cheers – Asik.

My Imagination – How Scientists and Software Testers are related to each other?


Hi All,

Hope you guys had a wonderful X-Mas holiday :-).


As we all know, all inventions are made by asking more and more questions.

If Newton had not asked ‘Why did the apple fall from the tree?’, we would not have the theory of gravity.

If Archimedes had not asked ‘Why did water spill out of the tub while I was bathing?’, we would not have Archimedes’ principle.

Likewise, if a tester asks questions of the application, like Why or How, and finds an answer that does not match what it is supposed to be, then IT IS A Discovery. So we are Scientists 🙂

As far as I am concerned, testing an application by asking questions about the application I am testing is the perfect way of finding software anomalies.

Each functional defect that we stop through our testing just saves production fixes, but if we stop a business gap, it saves the whole business need. So before kicking off the testing, make sure we know the business needs.

You will get a Functional Defect if you ask how things are working: the expected behaviour will already be mentioned in the functional specifications, so any functionality missed in the developed code can be found.

You will get a Requirement / Specification / Design defect if you ask why things are working: you will need to check the Business Requirements, and if you feel something is wrong, you will encounter anomalies that may come from any of the specifications related to that particular work request.

In this post I want to explain how important domain knowledge is for testers.

Domains like Banking, Health Care, Manufacturing, Insurance etc., All these domains are closely related to us.

To modernize these functional areas

> Business people will write specs to cater to the business needs, as Business Requirement Documents.

> Considering the business specifications, Solution Designers will prepare Functional Specification Documents.

> We testers and developers will refer to the above documents to develop and test the application.

How can you learn the business easily?

If you are working in the Banking domain and you have Loan functionality to be tested on your client’s website, then please create a loan application in real time with your bank, or with some other bank’s online application.

If you are working in the Health Care domain and you have Inventory functionality to be tested, then go to a nearby chemist (medical shop), pick some medicines here and there, go for billing, and watch how the shopkeeper handles your goods.

Like the two examples above, whatever business you are testing, please imagine that YOU ARE ALSO GOING TO USE THIS PRODUCT. Would you accept a defective product from your manufacturer? ‘No’, right? Then your testing will be perfect.

I hope all the readers know about Validation and Verification ?

Let me tell you what I think about it,

Validation means, we need to verify all the documents that related to the given functionality are acceptable and valid.

Verification means that the code written from the validated specifications is verified by us.

In the Data warehouse world, the specification documents need to be validated, because even a simple mistake will create a huge problem at the end.

For an Example,

In the warehouse we keep amount columns in negative numbers (the bank owes us) as well as positive numbers (we owe the bank).

Business Need – All the transactions of the day to be extracted

For extracts, if the specification document asks us to pull the records where Balance > 0, then you will get only the customers who owe money to the bank.

So even a Single Symbol matters a lot !!! Before we start the Verification we need to Validate first!!!
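The point about a single symbol can be demonstrated concretely; the sketch below uses SQLite and invented balances to show how the Balance > 0 filter silently drops every customer the bank owes money to.

```python
# The "Balance > 0" mistake demonstrated: with negative balances
# meaning "bank owes us", the filter silently drops those customers.
# SQLite and the sample balances are for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TXN (CUST TEXT, BALANCE REAL)")
conn.executemany("INSERT INTO TXN VALUES (?, ?)",
                 [("A", -250.0), ("B", 400.0), ("C", -10.0)])

wrong = conn.execute("SELECT COUNT(*) FROM TXN WHERE BALANCE > 0").fetchone()[0]
right = conn.execute("SELECT COUNT(*) FROM TXN").fetchone()[0]
print(wrong, right)  # only 1 of 3 records extracted under the bad spec
```

Validating the spec first would have caught that “all transactions of the day” cannot be satisfied by a one-sided balance filter.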

Lets Discover along with Finding defects 🙂

Cheers – Asik

Chapter 14 – A Sample ETL Project – Customer Management in a Hospital


Hi All,

Have a pleasant week ahead !!!

In this post let me explain a Project Structure.


I have a Multi-Specialty Hospital with branches located across the world. My hospital is famous for vaccination. Patients who come to my hospital anywhere in the globe are given a User Card with which they can access any of my hospitals in the world.

Business Need

Now we maintain all customers in one database, and there are heaps and heaps of customers related to my hospital. So I decided to split up the customers based on country and load them into the corresponding country tables. Elaborated details of this requirement will be mentioned in the Business Requirement Specification.

Design Need 

To pull the customers per country, a designer should know all the places the customer data is available. The data extraction will be done by our Source System: they will pull all the relevant customer data and give us a data file.

In Design Documents you will have

Source File Specification 

1)   File Name Specification – name string, extension of the files
2)   Date and Time format of the file – YYYYMMDD, HHMMSSTT or any other format
3)   Control File Specification – name string, extension of the files
4)   Header Record Layout – |H|Customer_Records|20131012|1245872|
5)   Detail Record Layout – |D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
6)   Trailer Record Layout – |T|10|

The Detail Record layout will tell you what data you are getting from the source, its data type, whether it is mandatory or not, and the length of each column.

| File Position | Column Name         | Field Length | Data Type | Mandatory (Y/N) | Key Column |
| 1             | Customer Name       | 255          | VARCHAR   | Y               | Y          |
| 2             | Customer ID         | 18           | VARCHAR   | Y               | N          |
| 3             | Customer Open Date  | 8            | DATE      | Y               | N          |
| 4             | Last Consulted Date | 8            | DATE      | N               | N          |
| 5             | Vaccination Type    | 5            | CHAR      | N               | N          |
| 6             | Doctor Consulted    | 255          | CHAR      | N               | N          |
| 7             | State               | 5            | CHAR      | N               | N          |
| 8             | Country             | 5            | CHAR      | N               | N          |
| 9             | Post Code           | 5            | INT       | N               | N          |
| 10            | Date of Birth       | 8            | DATE      | N               | N          |
| 11            | Active Customer     | 1            | CHAR      | N               | N          |


Please refer to my post ‘Source File Validation’ to learn how to validate the above details.

The sample file format will be

|H|Customer_Records|20131012|1245872|
|D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Alex|123457|20101012|20121013|MVD|Sanjay1|SA|USA|06031987|A
|D|John|123458|20101012|20121013|MVD|Sanjay|TN|IND|06031987|A
|D|Mathew|123459|20101012|20121013|MVD|Sanjay|WAS|PHIL|06031987|A
|D|Matt|12345|20101012|20121013|MVD|Sanjay|BOS|NYC|06031987|A
|D|Jacob|1256|20101012|20121013|MVD|Sanjay|VIC|AU|06031987|A
|D|Arun|1456|20101012|20121013|MVD|Sanjay|QA|AU|06031987|A
|D|Wague|23456|20101012|20121013|MVD|Sanjay|AOL|CAN|06031987|A
|D|Steve|126|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Neil|156|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|T|10|
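A first structural check on a file like the one above is that the trailer count matches the number of detail records. The sketch below uses a shortened two-record version of the sample file; the split positions follow the pipe-delimited layout shown.

```python
# Sketch of trailer-count validation for a pipe-delimited file like
# the sample above: the trailer count must equal the number of
# detail (|D|) records. A shortened two-record file is used here.
sample = """|H|Customer_Records|20131012|1245872|
|D|Asik|123456|20101012|20121013|MVD|Sanjay|NSW|AU|06031987|A
|D|Alex|123457|20101012|20121013|MVD|Sanjay1|SA|USA|06031987|A
|T|2|"""

lines = sample.strip().splitlines()
header, trailer = lines[0], lines[-1]
details = [l for l in lines if l.startswith("|D|")]

trailer_count = int(trailer.split("|")[2])
assert header.split("|")[1] == "H"
assert trailer_count == len(details), "trailer count mismatch"
print("file structure OK:", trailer_count, "detail records")
```

The same loop naturally extends to the other file checks mentioned earlier: the header date, the file name string, and the mandatory columns in each detail record.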

Using the ETL process, we have now loaded the data into the staging tables. The intermediate tables will look like below.

Staging Table Load 


Please check my ‘ETL Process’ post for how the data is loaded and what we need to verify at this step:

| Name   | Cust_ID | Open_Dt  | Consul_Dt | VAC_ID | DR_Name | State | Country | DOB      | FLAG |
| Asik   | 123456  | 20101012 | 20121013  | MVD    | Sanjay  | NSW   | AU      | 06031987 | A    |
| Alex   | 123457  | 20101012 | 20121013  | MVD    | Sanjay1 | SA    | USA     | 06031987 | A    |
| John   | 123458  | 20101012 | 20121013  | MVD    | Sanjay  | TN    | IND     | 06031987 | A    |
| Mathew | 123459  | 20101012 | 20121013  | MVD    | Sanjay  | WAS   | PHIL    | 06031987 | A    |
| Matt   | 12345   | 20101012 | 20121013  | MVD    | Sanjay  | BOS   | NYC     | 06031987 | A    |
| Jacob  | 1256    | 20101012 | 20121013  | MVD    | Sanjay  | VIC   | AU      | 06031987 | A    |
| Arun   | 1456    | 20101012 | 20121013  | MVD    | Sanjay  | QA    | AU      | 06031987 | A    |
| Wague  | 23456   | 20101012 | 20121013  | MVD    | Sanjay  | AOL   | CAN     | 06031987 | A    |
| Steve  | 126     | 20101012 | 20121013  | MVD    | Sanjay  | NSW   | AU      | 06031987 | A    |
| Neil   | 156     | 20101012 | 20121013  | MVD    | Sanjay  | NSW   | AU      | 06031987 | A    |

Now the staging data is ready, and we need to load it into the corresponding target tables.

As a project, we will test the data load from file to staging as Phase I, and from staging to the target tables as Phase II.

In this project we do not have any complex transformation rules, so we will simply insert the records from the staging tables into the corresponding target tables.

For example, all customers related to India will go to Table_India, and so on. In my next blog I will come up with a few more complex examples of target table loads.

The Source to Target document will look like this.

Customers from India will be loaded using the logic below:
S.No | Source Column       | Source Table | Transformation Rule                             | Target Table | Key/Value | Target Column
1    | Customer Name       | CUST_STG     | If source value is null, do not load; else load | CUST_IND     | Key       | CUST_NAME
2    | Customer ID         | CUST_STG     | If source value is null, do not load; else load | CUST_IND     | Key       | CUST_ID
3    | Customer Open Date  | CUST_STG     | Assign source value                             | CUST_IND     | Value     | CUST_OP_DT
4    | Last Consulted Date | CUST_STG     | Assign source value                             | CUST_IND     | Value     | CUST_CONS_DT
5    | Vaccination Type    | CUST_STG     | Assign source value                             | CUST_IND     | Value     | CUST_VAC_TYPE
6    | Doctor Consulted    | CUST_STG     | Assign source value                             | CUST_IND     | Value     | CUST_DR_CONS
7    | State               | CUST_STG     | If source value is null, do not load; else load | CUST_IND     | Value     | CUST_STATE
8    | Country             | CUST_STG     | If source value is null, do not load; else load | CUST_IND     | Value     | CUST_COUNTRY
9    | Post Code           | CUST_STG     | If source value is null, do not load; else load | CUST_IND     | Value     | CUST_POS_C
10   | Date of Birth       | CUST_STG     | Assign source value                             | CUST_IND     | Value     | CUST_DOB
11   | Active Customer     | CUST_STG     | Assign source value                             | CUST_IND     | Value     | CUST_STATUS
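A simplified reading of those rules can be sketched as below: records for India are routed to the target, and a record whose key columns arrive empty is skipped. The column names are illustrative (taken from the staging and mapping layouts in this post, with the obvious spellings), and the function name is my own.

```python
# Hypothetical sketch of the staging-to-target rules: route Indian customers
# to CUST_IND, skipping records whose key columns are null/empty.
def load_cust_ind(staging_rows):
    target = []
    for row in staging_rows:
        if row["Country"] != "IND":
            continue                      # this target table holds Indian customers only
        if not row["Name"] or not row["Cust_ID"]:
            continue                      # key columns must not be null
        target.append({
            "CUST_NAME": row["Name"],
            "CUST_ID": row["Cust_ID"],
            "CUST_OP_DT": row["Open_Dt"],
            "CUST_CONS_DT": row["Consul_Dt"],
            "CUST_VAC_TYPE": row["VAC_ID"],
            "CUST_DR_CONS": row["DR_Name"],
            "CUST_STATE": row["State"],
            "CUST_COUNTRY": row["Country"],
            "CUST_DOB": row["DOB"],
            "CUST_STATUS": row["FLAG"],
        })
    return target
```

As a tester, I would write a query (or a check like this) against both staging and target, and compare the counts and column values row by row.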

So, based on the above rules, we will load the data into the CUST_IND table.

Cheers

Asik

Chapter 5 – Know about the Database (Home), Views, Tables (Rooms), and Reference Tables (Guest House) in Business Intelligence

Hi Guys,

Hope you had a wonderful weekend !!!

In this blog I will share my personal thoughts on what a Database, Views, Tables, and Reference Data are.

A Database is something like a house, Tables are like rooms, and Reference Tables are like a guest house 🙂

Say I bought a house. I tell my mom and dad, "Please take the room upstairs"; my sisters share the room near the kitchen; my brothers share the room near the balcony.

Like the example above, an organization will decide to keep its data in one Database (home) but in different Tables (rooms), and when it wants to keep data that changes as per customer recommendation, it will capture that in Reference Tables (guest house).

Why do we need Views?

Again I will take the home as an example: when you invite your friends to your house, you do not describe how the house is physically built; you just give them your home address, right? It is far easier to share the address than to explain how the building was constructed. Likewise, the data is loaded into physical tables, and developers simply write a SELECT query on those tables; that query is used to create the view.
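As a tiny illustration of the idea, here is a sketch using Python's built-in sqlite3 module. The table and column names are made up for the example; the point is that the view stores only the SELECT query (the "address"), not the data itself.

```python
# Minimal illustration of a view, using an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_ind (cust_name TEXT, cust_state TEXT)")
conn.execute("INSERT INTO cust_ind VALUES ('John', 'TN'), ('Arun', 'KL')")

# The view is just a stored SELECT over the physical table.
conn.execute("CREATE VIEW v_cust_ind AS SELECT cust_name FROM cust_ind")
print(conn.execute("SELECT * FROM v_cust_ind").fetchall())
# → [('John',), ('Arun',)]
```

Anyone querying `v_cust_ind` sees the data without needing to know how the underlying table is physically built.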

Primary Key and Foreign Key relationship:

Each room in your house has a logical reference to the house, and the rooms and the people living in them can be found through that reference. Likewise, in a database, all the tables are logically and physically connected by a column, so all the tables can communicate with each other using this Primary Key.

I will give you one more example:

I wanted to create an account with Citi Bank, so I gave my name, DOB, address, the amount to be deposited, my mobile number, my telephone number, my email id, and the relevant documents.

Account Created 

> After some days I went to the bank and applied for a credit card!!! Now they ask only for my account number!! Why?

I got my Credit Card!!! Wohoooo 😀

> After some days I got a call from the bank offering me a life insurance policy – I accepted it. Now what?

I have Insurance from Citi Bank 

> After some days my wife was urging me to buy a house 😦 so I checked Citi Bank Home Loans online and submitted all the documents along with my account number. After the checks, they disbursed the loan amount. Now what?

I have bought a house with a loan from my bank

If you look at the above scenario, the bank needs to capture all the data related to my account!!! So what are they going to do?

First step: create a Database called ACCOUNTS.

And now create the tables!!! Is only one table enough to accommodate all the data related to me? Ghrrrr, no!!!

They will create multiple tables, one for each type of data!!

Table_1 – Primary Details of my account

Table_2 – Address Details of my account

Table_3 – Contact Channels (Mobile, Email, Home Phone)

Table_4 – My account Balance Details

Table_5 – My Credit Card Details

Table_6 – My Home Loan Details

Now all these 6 tables will have ACCOUNT NUMBER as the primary key. If a bank employee wants to see my home loan balance and my credit card balance, he will join Table_5 and Table_6 using my account number 🙂
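That join can be sketched like this, again with sqlite3; the table names follow the example above, while the column names and balances are made up for illustration.

```python
# Hypothetical sketch: joining the credit card and home loan tables on the
# shared account number (the primary key in this example).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_5 (acct_no INTEGER, cc_balance REAL)")    # credit card details
conn.execute("CREATE TABLE table_6 (acct_no INTEGER, loan_balance REAL)")  # home loan details
conn.execute("INSERT INTO table_5 VALUES (1001, 850.0)")
conn.execute("INSERT INTO table_6 VALUES (1001, 250000.0)")

# ACCOUNT NUMBER is the column that connects the two tables.
row = conn.execute("""
    SELECT t5.cc_balance, t6.loan_balance
    FROM table_5 t5
    JOIN table_6 t6 ON t5.acct_no = t6.acct_no
    WHERE t5.acct_no = 1001
""").fetchone()
print(row)  # → (850.0, 250000.0)
```

One account number, two tables, one joined answer: that is the primary key doing its job.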

I hope you now understand how the Primary Key works 🙂

Reference Tables:

Using my account details, I will explain why Citi Bank wants to create Reference Tables.

When I applied for the credit card, they asked about my annual income. I said it's AUD 100K, and they punched the value into their system. After some days I got a letter from the bank saying I got 85K as my credit limit. I wondered how they arrived at this number when they worked out the credit limit for my application. Here is where the Reference Tables come into the picture 🙂

When my salary is >= 100K, the system refers to the Credit Limit reference table and fixes the credit limit at 85K. Had I said 50K, my credit limit would have been limited to 30K 😦
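The lookup above amounts to a salary band mapping to a fixed limit. Here is a minimal sketch of that reference-table idea, using the two bands mentioned in the text; the structure and names are my own illustration.

```python
# Hypothetical sketch of a credit-limit reference table: each entry maps a
# minimum annual salary to the credit limit granted for that band.
CREDIT_LIMIT_REF = [
    # (minimum annual salary, credit limit) -- highest band first
    (100_000, 85_000),
    (50_000, 30_000),
]

def credit_limit(salary):
    """Return the credit limit for the highest band the salary reaches."""
    for min_salary, limit in CREDIT_LIMIT_REF:
        if salary >= min_salary:
            return limit
    return 0  # below every band: no credit limit granted
```

Because the bands live in a reference table rather than in code, the bank can change its lending policy by updating a few rows instead of redeploying the system.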

Hope you guys understood the concepts of Database, Tables, Views, and Reference Tables 🙂

See you guys in my next blog, about Shake Down Testing 🙂

Cheers – Asik