Problem Description

The pipeline build job completes successfully but processes 0 records in Infoworks DataFoundry 3.1.1-EMR.

Root cause: This issue occurs in Infoworks DataFoundry 3.1.1-EMR when the EMR cluster is deployed with Spark 2.3.x or higher. These Spark versions cannot read the external Hive table (the source table ingested through the ingestion process).

Solution: Perform the steps below, then rerun the pipeline build job.

a) Log in to the Infoworks DataFoundry edge node as the Infoworks user.

b) Go to $IW_HOME/conf directory.

c) Open the file for editing: vi dt_spark_defaults.conf

d) Add the properties below, save the file, and rerun the pipeline job.

spark.sql.hive.convertMetastoreParquet false

spark.sql.hive.convertMetastoreOrc false
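
Steps b) through d) can also be scripted instead of editing the file by hand. The following is a minimal sketch, not an Infoworks-supplied tool: the helper names ensure_prop and apply_fix are hypothetical, and it assumes dt_spark_defaults.conf uses the whitespace-separated "key value" layout shown above. It appends each property only if the key is not already present, so it is safe to run more than once.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Infoworks DataFoundry).

# ensure_prop FILE KEY VALUE
# Appends "KEY VALUE" to FILE unless a line whose first field is KEY exists.
ensure_prop() {
  local file="$1" key="$2" value="$3"
  if awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
    echo "already set: $key"
  else
    # If the file is non-empty and does not end with a newline, add one first
    # so the new property does not get glued onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
      echo >> "$file"
    fi
    printf '%s %s\n' "$key" "$value" >> "$file"
    echo "added: $key"
  fi
}

# apply_fix FILE
# Backs up FILE (if it exists) and adds the two properties from step d).
apply_fix() {
  local conf="$1"
  if [ -f "$conf" ]; then
    cp -p "$conf" "$conf.bak"
  fi
  ensure_prop "$conf" spark.sql.hive.convertMetastoreParquet false
  ensure_prop "$conf" spark.sql.hive.convertMetastoreOrc false
}
```

To use it, run apply_fix "$IW_HOME/conf/dt_spark_defaults.conf" as the Infoworks user on the edge node, then rerun the pipeline build job. If a key is already present with a different value, the sketch leaves that line untouched, so check the file manually in that case.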

Applicable Infoworks DataFoundry Versions: