Hope you are enjoying your weekend. Have a wonderful weekend ahead !!!
Today I am going to explain how we need to take care of Nulls Blanks Spaces 0’s from Source System. These Nulls Blanks Spaces 0’s are extracted from the upstream applications and given to us to load it into Ware house. But our Data Modelers may or may not want them to put into ware house as it will not give meaning to the business.
Origination of Nulls Blanks Spaces 0s from Source System
I hope everyone used Online applications for your job search. You would have seen a ( * ) symbol in most of the field , so you need to enter some value.
1.In some field , you cant enter Numeric values
2.In some field , you can only enter Numeric Values
3.In some field , dates as per the given format
4.In some fields , you need to give your contact number as per the country standards
5.In some fields , you need to give your correct Email Id.
Like above you will be having so many check points for an online application. Say for an example, 2000 people applied for a Test Engineer job in my Company. And my Online Application has few Free Text field, applicant can enter what ever wanted to. In that few of them gave
1. Spaces in the Applicant Name (by mistake)
2. 0’s in the Experience column (because he is fresher)
3. Combinations of Spaces and 0s in the Phone Number field
4.They didn’t mention date fields.
My source system refers this Online webpage and pulls data and stores into their database. And will extract the data as files.
We will get the file as
D|Asik|8|06031987|9880112345|IND|PERSON@GMAIL.COM – Valid Record
D||8|06031987|9880112345|IND|PERSON@GMAIL.COM – In Valid Record
We load these records into our source tables we will not consider these NULLs Blanks 0s and Spaces as the Staging table data types are VarChar (it can have all types of data).
Our Source to Target specification will be taking care of these Unknowns.
a) If the source field_1 is NULL then populate Null in the corresponding target column_1
b) If the source field_2 is NULL then do not load the record
c) If the source field_3 is 0 then populate Null in the corresponding target column_3
d) If the source field_4 is 0 then do not load the record
e) If the source field_5 is BLANK then populate Null in the corresponding target column_5
f) If the source field_6 is BLANK then do not load the record
g) If the source field_7 is Space then populate Null in the corresponding target column_7
h) If the source field_8 is Space then do not load the record
This verification can be done for each columns separately or collectively to one column like,
If Column_1 from source is NULL, BLANK, SPACE, 0 then populate NULL to the corresponding target column_1
If Column_1 from source is NULL, BLANK, SPACE, 0 then Do Not load the record into ware house.
Whatever the circumstances our ETL code should stop Unknown records loading into warehouse.
Most of the S2T extraction criteria designed to stop the records into Warehouse if (mentioned) column values are any of NULL BLANK SPACE. Source system does not have any validations to stop Unknowns When they extract from their upstream. These unknown checks are closely dependent on Data Types of the mentioned columns.When tester designs the SQL, to test these scenarios, he should consider the Data Type of the columns.
1. Integer: Teradata accepts only NULLs and 0s for INTEGER columns. If we insert BLANK or SPACE it will take it as 0s only
2. Decimal: Teradata accepts only NULLs and 0.00s for Decimal columns. If we insert BLANK or SPACE it will take it as 0.0s only
3. CHAR: All NULL BLANK SPACE 0 can be possible for CHAR columns, so we need to test all the Unknown possibilities for CHAR columns
4. VARCHAR: All NULL BLANK SPACE 0 can be possible for VARCHAR columns, so we need to test all the
Unknown possibilities for VARCHAR columns
When we test VARCHAR column we should be very conscious because:
Source Record from file,
Scenario_01 – (0s with Space in VARCHAR columns)
|D| Asik| 51302138| 0|
These values will be loaded with spaces into staging tables. The tester designed SQL (where Column<>0) then this
record will not be stopped! So it should be (where TRIM(Column)<>0).
If you find any issues like this please raise the concern to Developer and ask them to add TRIM for
Scenario_02 (multiple 0000s in VRACHAR or CHAR columns)
|D| Asik| 51302138| 00| –> CHAR with 00
|D| Asik| 51302138|000000| –> VARCHAR with 000000
|D| Asik| 51302138|0| –>CHAR or VARCHAR with 0
Source can send 0, 00, 00000 to VARCHAR or CHAR columns, but the meaning is same its 0. If the tester designed SQL (where Column<>0) then only the 3rd record will be stopped whereas other two records will be loaded to in Target tables.
If you find any issues like this then please raise concerns to developer to add TRIM Leading 0 and check <> 0.
Hope you all will Stop Unknown records into my warehouse.
My Next Post is will be a Sample Project !!!!