Problem Description:


The pipeline build job fails during Map operator initialization with the following error.


Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:262)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded



Root cause:


Map join is a Hive feature that speeds up queries. It loads a table into memory so that the join can be performed within a mapper, without a separate Map/Reduce step. If queries frequently join against small tables, map joins speed up query execution.



If three or more tables are involved in the join conditions and these tables hold a large amount of data, Map operator initialization can fail with an OutOfMemoryError (OOM), because the tables are loaded into memory during the map task.


Solution:


Set the following advanced configuration under Pipeline > Advanced configuration, then run the job again.


key: df_batch_hive_settings

value: hive.auto.convert.join=false


With this configuration set to false, Hive does not convert joins to map-side joins. The joins run as regular Map/Reduce (reduce-side) joins instead, so the tables are not loaded into the memory of the map task, which resolves the issue.
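
To check the effect of the setting outside the pipeline, the same property can be set at session level in a Hive client such as Beeline. The following is a minimal sketch only: the table and column names (orders, customers, products) are hypothetical and are not taken from any actual pipeline.

```sql
-- Disable automatic conversion of common joins to map joins for this session.
SET hive.auto.convert.join=false;

-- Hypothetical three-table join (table and column names are illustrative).
-- EXPLAIN prints the query plan without running the query.
EXPLAIN
SELECT o.order_id, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;
```

With the property set to false, the plan printed by EXPLAIN should no longer contain a Map Join Operator for these joins.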


Applicable Infoworks version:


IWX EDO2 2.4.x, 2.5.x, 2.6.x, 2.7.x, 2.8.x, 2.9.x