Target Audience: Hadoop Admins/Infoworks Admins


How to Set the Spark Configurations for Pipelines

Infoworks Version: 2.4.2 and Above


Please follow the steps below:

1. Log in to the Infoworks edge node.


2. Source the Infoworks environment and stop the df service:

source $IW_HOME/bin/env.sh

cd $IW_HOME/bin/

./stop.sh df


3. Locate the Spark configuration file used for batch pipelines:

cat $IW_HOME/conf/conf.properties | grep df_spark_configfile_batch

The output will look similar to the following:

df_spark_configfile_batch=/opt/infoworks/conf/df_spark_defaults.conf


4. vi /opt/infoworks/conf/df_spark_defaults.conf

Comment out all of the file's existing contents by prefixing each line with #.
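
If you prefer to do this from the command line instead of vi, here is a minimal sketch (assuming GNU sed; it backs up the file first so you can roll back):

# keep a backup copy of the original file
cp /opt/infoworks/conf/df_spark_defaults.conf /opt/infoworks/conf/df_spark_defaults.conf.bak

# prefix every line with # to comment it out
sed -i 's/^/#/' /opt/infoworks/conf/df_spark_defaults.conf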


5. Copy the contents of the file /etc/spark2/configs.spark-config.txt to the beginning of the file /opt/infoworks/conf/df_spark_defaults.conf (a combined sketch for steps 5 and 6 follows step 6).


6. Copy the contents of the file $IW_HOME/conf/sparkconf/df_spark_pdp.conf to the beginning of the file /opt/infoworks/conf/df_spark_defaults.conf.
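
There is no in-place prepend in the shell, so a temporary file is the usual approach. Below is a minimal sketch covering steps 5 and 6 together (note that performing step 6 after step 5 leaves the df_spark_pdp.conf contents at the very top; $IW_HOME must already be set, for example by sourcing env.sh as in step 2):

# concatenate in final order: pdp conf, spark config, then the commented-out originals
cat $IW_HOME/conf/sparkconf/df_spark_pdp.conf \
    /etc/spark2/configs.spark-config.txt \
    /opt/infoworks/conf/df_spark_defaults.conf > /tmp/df_spark_defaults.conf.new

# replace the original file with the merged result
mv /tmp/df_spark_defaults.conf.new /opt/infoworks/conf/df_spark_defaults.conf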


7. Add the following properties to /opt/infoworks/conf/df_spark_defaults.conf if they are not already present:

spark.master yarn

spark.yarn.jars local:/opt/hdp/spark/spark-2.1.0/jars/* [adjust based on the installed Spark path]
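
To make this step idempotent, you can append each property only when no active (uncommented) entry for it exists. A minimal sketch (the spark.yarn.jars value is just the example path from above; adjust it to your installation):

CONF=/opt/infoworks/conf/df_spark_defaults.conf

# append spark.master only if no uncommented entry exists
grep -q '^spark\.master' "$CONF" || echo 'spark.master yarn' >> "$CONF"

# append spark.yarn.jars only if no uncommented entry exists
grep -q '^spark\.yarn\.jars' "$CONF" || echo 'spark.yarn.jars local:/opt/hdp/spark/spark-2.1.0/jars/*' >> "$CONF"

You can then sanity-check the merged file with:

grep -E '^spark\.(master|yarn\.jars)' "$CONF"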


8. Restart the df service:

source $IW_HOME/bin/env.sh

cd $IW_HOME/bin/

./start.sh df
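
To confirm that the df service came back up, a generic process check can help (the grep pattern below is an assumption; match it to whatever the df process command line contains on your installation):

# assumption: the df service process mentions 'infoworks' and 'df' in its command line
ps -ef | grep -i infoworks | grep -i df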


After completing the above steps, Infoworks pipelines can use Spark as the execution engine.


Thanks,

Sri