Problem Description:


During fixed-width file ingestion, if the data contains the string value 'NULL', it is ingested as NULL data in Hive.


Root cause:

Infoworks EDO2 (Infoworks Enterprise Data Operations and Orchestration) treats the string value NULL in the source file as NULL data and ingests it as such in Hive, unless it is explicitly configured to treat that value as a string.


Solution:


Infoworks does not have an advanced configuration to handle this scenario. The issue is similar to the one described in the KB article below.


https://support.infoworks.io/support/solutions/articles/14000105960-string-value-null-present-in-csv-file-will-be-converted-to-null-data-in-hive


To resolve this issue, set the custom property below under YARN > Configs > Advanced > Custom yarn-site.

csv.null.string=%&*&#$#


On the YARN side, the default value of this configuration is NULL, so whenever the string NULL is encountered, YARN replaces it with NULL data. This is what causes the issue described above.


Adding the above property and setting its value to something other than NULL (in this case %&*&#$#) means the string value NULL is no longer treated as the null marker, so Infoworks EDO2 ingests it as a string in Hive.
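The effect of changing the null marker can be illustrated with a small Python sketch. This is not Infoworks or YARN code; the function parse_fields and its null_token parameter are hypothetical names used only to show why a field equal to the configured marker becomes null while any other value stays a string.

```python
from typing import List, Optional


def parse_fields(fields: List[str], null_token: str = "NULL") -> List[Optional[str]]:
    """Hypothetical parser: a field equal to null_token becomes None (null data)."""
    return [None if field == null_token else field for field in fields]


row = ["101", "NULL", "Alice"]

# With the default marker, the string "NULL" is turned into null data.
default_result = parse_fields(row)

# With an unlikely marker, the string "NULL" is kept as an ordinary string.
custom_result = parse_fields(row, null_token="%&*&#$#")

print(default_result)
print(custom_result)
```

In this sketch, only the choice of marker changes between the two calls; the input row is the same.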


Note: Make sure that your fixed-width data does not contain the string %&*&#$#, because any field matching it would be ingested as NULL data.
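One way to check the data before ingestion is to scan it for the marker. The Python sketch below is only an illustration under assumptions: the helper lines_containing and the sample records are hypothetical, and a real check would read the lines from your own fixed-width file instead of the in-memory list.

```python
from typing import Iterable, List

# The marker configured in csv.null.string in this article.
SENTINEL = "%&*&#$#"


def lines_containing(lines: Iterable[str], token: str = SENTINEL) -> List[int]:
    """Return the 1-based numbers of the lines that contain the token."""
    return [number for number, line in enumerate(lines, start=1) if token in line]


# Hypothetical fixed-width sample records, used here in place of a real file.
sample = [
    "0001NULL      Alice",
    "0002%&*&#$#   Bob",
]

hits = lines_containing(sample)
print(hits)
```

An open file object can be passed in place of the sample list, since the function accepts any iterable of lines. An empty result suggests the marker is safe to use for that file.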


Applicable Infoworks EDO2 Versions:


v2.3.x, v2.4.x, v2.5.x, v2.6.x, v2.7.x, v2.8.x