Problem Description:


A pipeline build job in IWX installed on EMR, with Spark as the execution engine, takes an unusually long time (more than 8 hours) even when processing only about a million records.


Root Cause:


This issue can occur when Spark dynamic allocation of executors is enabled on the EMR cluster (spark.dynamicAllocation.enabled set to true) and the Spark job requests far more executors than it actually needs.
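To see why the executor target shrinks step by step, here is a simplified model of how Spark's ExecutorAllocationManager sizes its target: it divides the number of running and pending tasks by the tasks one executor can run at once (spark.executor.cores / spark.task.cpus), scaled by spark.dynamicAllocation.executorAllocationRatio, and lowers the target, never below spark.dynamicAllocation.minExecutors, whenever that number drops below the current target. This is a sketch paraphrasing Spark's logic, not Spark's code; details vary by Spark version, ramp-up is omitted, and the function names and the task counts in the demo are hypothetical.

```python
import math


def max_executors_needed(running_or_pending_tasks: int, executor_cores: int,
                         task_cpus: int = 1, allocation_ratio: float = 1.0) -> int:
    """Executors needed to run every running or pending task at the same time."""
    # Tasks one executor can run concurrently: spark.executor.cores / spark.task.cpus.
    tasks_per_executor = executor_cores // task_cpus
    return math.ceil(running_or_pending_tasks * allocation_ratio / tasks_per_executor)


def next_target(current_target: int, running_or_pending_tasks: int, executor_cores: int,
                min_executors: int = 0, task_cpus: int = 1,
                allocation_ratio: float = 1.0) -> int:
    """Lower the target when fewer executors are needed, never below min_executors.

    Ramp-up (requesting more executors while tasks back up) is deliberately omitted.
    """
    needed = max_executors_needed(running_or_pending_tasks, executor_cores,
                                  task_cpus, allocation_ratio)
    if needed < current_target:
        return max(needed, min_executors)
    return current_target


if __name__ == "__main__":
    target = 429
    # Hypothetical backlog shrinking as tasks finish, with 4 cores per executor.
    for remaining_tasks in (1712, 1705, 1700, 1690):
        new_target = next_target(target, remaining_tasks, executor_cores=4)
        if new_target < target:
            print(f"Lowering target to {new_target} (previously {target})")
        target = new_target
```

Because the target tracks the remaining task count, it falls by one or two executors each time a few tasks complete, which is the pattern visible in the log excerpt in this article.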

As a result, the job runs much longer than expected while Spark gradually lowers its target number of executors. DEBUG messages like the following appear many times in the corresponding YARN log:



21/03/19 05:37:49 DEBUG ExecutorAllocationManager: Lowering target number of executors to 428 (previously 429) because not all requested executors are actually needed
21/03/19 05:37:49 DEBUG ExecutorAllocationManager: Lowering target number of executors to 427 (previously 428) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 425 (previously 427) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 423 (previously 425) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 422 (previously 423) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 421 (previously 422) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 420 (previously 421) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 419 (previously 420) because not all requested executors are actually needed
21/03/19 05:37:50 DEBUG ExecutorAllocationManager: Lowering target number of executors to 418 (previously 419) because not all requested executors are actually needed
21/03/19 05:37:51 DEBUG ExecutorAllocationManager: Lowering target number of executors to 416 (previously 418) because not all requested executors are actually needed
21/03/19 05:37:51 DEBUG ExecutorAllocationManager: Lowering target number of executors to 415 (previously 416) because not all requested executors are actually needed
21/03/19 05:37:51 DEBUG ExecutorAllocationManager: Lowering target number of executors to 414 (previously 415) because not all requested executors are actually needed
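To check whether a slow run shows this pattern, a small script can scan the YARN log for these messages and report how far the executor target fell. This is a minimal sketch, assuming the message wording matches the excerpt above; the helper names `executor_target_drops` and `summarize` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the ExecutorAllocationManager DEBUG message quoted in this article.
# Assumption: your Spark version logs the same wording.
LOWERING_RE = re.compile(
    r"Lowering target number of executors to (\d+) \(previously (\d+)\)"
)


def executor_target_drops(log_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (previous_target, new_target) for every 'Lowering target' message."""
    drops: List[Tuple[int, int]] = []
    for line in log_lines:
        match = LOWERING_RE.search(line)
        if match:
            new_target, previous = int(match.group(1)), int(match.group(2))
            drops.append((previous, new_target))
    return drops


def summarize(log_lines: Iterable[str]) -> str:
    """One-line summary: number of reductions plus the first and last target seen."""
    drops = executor_target_drops(log_lines)
    if not drops:
        return "no executor-target reductions found"
    return (f"{len(drops)} reductions: target went from "
            f"{drops[0][0]} to {drops[-1][1]}")


if __name__ == "__main__":
    sample = [
        "21/03/19 05:37:49 DEBUG ExecutorAllocationManager: Lowering target number "
        "of executors to 428 (previously 429) because not all requested executors "
        "are actually needed",
        "21/03/19 05:37:51 DEBUG ExecutorAllocationManager: Lowering target number "
        "of executors to 414 (previously 415) because not all requested executors "
        "are actually needed",
    ]
    print(summarize(sample))  # 2 reductions: target went from 429 to 414
```

To scan a real log, save the output of `yarn logs -applicationId <application-id>` to a file and pass the open file object to `summarize`; file objects yield one line at a time. Because these are DEBUG-level messages, they only appear if debug logging is enabled for the job.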




Solution:


Disable dynamic allocation of executors for this particular pipeline, and set fixed minimum, maximum, and initial executor counts for the job instead of letting Spark allocate executors dynamically.

Set the following parameters in the Pipeline > Settings > Advanced Configurations section:

key: spark.dynamicAllocation.minExecutors
value: 5
key: spark.dynamicAllocation.maxExecutors
value: 5
key: spark.dynamicAllocation.initialExecutors
value: 5
key: spark.driver.memory
value: 16g
key: spark.dynamicAllocation.enabled
value: false
key: spark.driver.cores
value: 4
key: spark.executor.cores
value: 4
key: spark.executor.memory
value: 8g
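If you want to reproduce the same settings in a standalone `spark-submit` test outside Infoworks, the sketch below renders them as `--conf key=value` arguments and estimates the cores and memory the driver plus a given number of executors would request. This is an illustration, assuming a plain `spark-submit` run; how IWX passes Advanced Configurations to Spark is not covered here, and the helper names are hypothetical. Note that Spark's configuration documentation describes the `spark.dynamicAllocation.*` bounds as applying when dynamic allocation is enabled; with it disabled, the static executor count comes from `spark.executor.instances`, so it is worth confirming the effective executor count in the Spark UI.

```python
from typing import Dict, List

# The Advanced Configurations settings listed above.
PIPELINE_SPARK_CONF: Dict[str, str] = {
    "spark.dynamicAllocation.minExecutors": "5",
    "spark.dynamicAllocation.maxExecutors": "5",
    "spark.dynamicAllocation.initialExecutors": "5",
    "spark.driver.memory": "16g",
    "spark.dynamicAllocation.enabled": "false",
    "spark.driver.cores": "4",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit '--conf key=value' argument pairs."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def gib(size: str) -> int:
    """Parse a whole-number size with a 'g' suffix (e.g. '8g') into GiB."""
    if not size.lower().endswith("g"):
        raise ValueError(f"expected a size like '8g', got {size!r}")
    return int(size[:-1])


def footprint(conf: Dict[str, str], num_executors: int) -> Dict[str, int]:
    """Approximate cores and heap memory for the driver plus executors.

    spark.driver.cores only takes effect in cluster deploy mode, and YARN
    memory overhead is ignored, so real container sizes will be larger.
    """
    cores = int(conf["spark.driver.cores"]) + num_executors * int(conf["spark.executor.cores"])
    memory = gib(conf["spark.driver.memory"]) + num_executors * gib(conf["spark.executor.memory"])
    return {"cores": cores, "memory_gib": memory}


if __name__ == "__main__":
    print(" ".join(to_submit_args(PIPELINE_SPARK_CONF)))
    print(footprint(PIPELINE_SPARK_CONF, num_executors=5))  # {'cores': 24, 'memory_gib': 56}
```

With five executors this works out to 4 + 5 × 4 = 24 cores and 16 + 5 × 8 = 56 GiB of heap before overhead, which is a quick way to check that the cluster can hold the job before adjusting the values.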


Note: After the first run with these parameters in place, you can adjust the values up or down slightly for subsequent runs based on the observed execution time.


Applicable Infoworks Versions:

IWX on EMR v3.1.x