Description


After installing the Spark clients on the Infoworks edge node, the below configurations must be set to submit Spark pipelines. These configurations are specific to MapR clusters.


1) Copy the contents of /opt/mapr/spark/spark-<version>/conf/spark-defaults.conf to $IW_HOME/conf/df_spark_defaults.conf

2) Add the below configurations to the $IW_HOME/conf/conf.properties file:

df_spark_configfile=/opt/infoworks/conf/df_spark_defaults.conf
df_spark_configfile_batch=/opt/infoworks/conf/df_spark_defaults.conf
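The step above can be scripted so that the properties are added only if they are missing. This is a minimal sketch; /tmp/iw_demo is a stand-in for your real Infoworks home (e.g. /opt/infoworks).

```shell
# Sketch: add the two df_spark_configfile properties if not already present.
# /tmp/iw_demo is a demo stand-in for the real $IW_HOME.
IW_HOME=/tmp/iw_demo
mkdir -p "$IW_HOME/conf"
CONF="$IW_HOME/conf/conf.properties"
touch "$CONF"
for prop in df_spark_configfile df_spark_configfile_batch; do
  # Append the property only if no line already defines it
  grep -q "^${prop}=" "$CONF" || \
    echo "${prop}=${IW_HOME}/conf/df_spark_defaults.conf" >> "$CONF"
done
```

Guarding with grep keeps the script idempotent, so re-running it does not duplicate the properties.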


3) Add the below configuration to the $IW_HOME/bin/env.sh file:

export SPARK_HOME=/opt/mapr/spark/spark-<version>/
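This export can also be added idempotently. A minimal sketch, assuming Spark 2.2.1 for illustration and /tmp/iw_demo as a stand-in for the real $IW_HOME:

```shell
# Sketch: append SPARK_HOME to env.sh if it is not already exported.
# spark-2.2.1 is illustrative; substitute your installed Spark version.
IW_HOME=/tmp/iw_demo
mkdir -p "$IW_HOME/bin"
ENV_SH="$IW_HOME/bin/env.sh"
touch "$ENV_SH"
grep -q '^export SPARK_HOME=' "$ENV_SH" || \
  echo 'export SPARK_HOME=/opt/mapr/spark/spark-2.2.1/' >> "$ENV_SH"
```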


4) Add the below IWX-related configurations to df_spark_defaults.conf:

spark.master yarn
spark.sql.hive.convertMetastoreParquet false
spark.mapreduce.input.fileinputformat.input.dir.recursive true
spark.hive.mapred.supports.subdirectories true
spark.mapred.input.dir.recursive true
spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive true
spark.sql.crossJoin.enabled true


Comment out the below properties if they are present in $IW_HOME/conf/df_spark_defaults.conf:

#spark.dynamicAllocation.enabled true
#spark.shuffle.service.enabled true
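Commenting out those two properties can be done with sed. A minimal sketch; the file path here is a demo stand-in (the real file is $IW_HOME/conf/df_spark_defaults.conf), and the printf lines only seed the demo file:

```shell
# Sketch: comment out dynamic allocation settings in df_spark_defaults.conf.
# Demo file path; point DEFAULTS at $IW_HOME/conf/df_spark_defaults.conf in practice.
DEFAULTS=/tmp/iw_demo_df_spark_defaults.conf
printf '%s\n' 'spark.dynamicAllocation.enabled true' \
              'spark.shuffle.service.enabled true' > "$DEFAULTS"
# Prefix '#' to each property line that starts uncommented
sed -i -e 's/^spark\.dynamicAllocation\.enabled/#&/' \
       -e 's/^spark\.shuffle\.service\.enabled/#&/' "$DEFAULTS"
```

The `^` anchor means lines that are already commented are left untouched.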



5) Prepend the path /opt/mapr/spark/spark-<version>/conf to the df_batch_classpath and df_tomcat_classpath properties.


For instance:


df_batch_classpath=/opt/mapr/spark/spark-2.2.1/conf/:/opt/infoworks/lib/extras/dt/*:/opt/infoworks/df/udfs/*:/opt/infoworks/df/apache-tomcat-8.0.33/lib/*:/opt/infoworks/bin/df-commons.jar:/opt/infoworks/bin/tools.jar:/opt/infoworks/lib/spark-jackson/*:/opt/infoworks/lib/df/*:/opt/infoworks/lib/mongodb/mongo-java-driver-3.8.0.jar:/opt/infoworks/lib/shared/*:/opt/infoworks/platform/bin/notification-common.jar:/opt/infoworks/platform/bin/platform-common.jar:/opt/mapr/spark/spark-2.2.1/jars/*:/opt/mapr/hive/hive-2.1/lib/*:/opt/mapr/hive/hive-2.1/conf/

df_tomcat_classpath=/opt/mapr/spark/spark-2.2.1/conf/:/opt/infoworks/lib/extras/dt/*:/opt/infoworks/df/udfs/*:/opt/infoworks/df/apache-tomcat-8.0.33/lib/*:/opt/infoworks/bin/df-commons.jar:/opt/infoworks/bin/tools.jar:/opt/infoworks/lib/spark-jackson/*:/opt/infoworks/lib/df/*:/opt/infoworks/lib/mongodb/mongo-java-driver-3.8.0.jar:/opt/infoworks/lib/shared/*:/opt/infoworks/platform/bin/notification-common.jar:/opt/infoworks/platform/bin/platform-common.jar:/opt/mapr/spark/spark-2.2.1/jars/*:/opt/mapr/hive/hive-2.1/lib/*:/opt/mapr/hive/hive-2.1/conf/
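The prepend can be sketched with sed as below. The file path and the single-entry classpath values are demo stand-ins (the real properties live in the Infoworks configuration file and carry the full classpaths shown above); spark-2.2.1 is illustrative:

```shell
# Sketch: prepend the Spark conf directory to both classpath properties.
# Demo file with shortened classpaths; use your real config file in practice.
CONF=/tmp/iw_demo_classpath.properties
printf '%s\n' 'df_batch_classpath=/opt/infoworks/lib/df/*' \
              'df_tomcat_classpath=/opt/infoworks/lib/df/*' > "$CONF"
SPARK_CONF=/opt/mapr/spark/spark-2.2.1/conf/
# Insert the Spark conf dir right after each "key=" prefix
sed -i -e "s|^df_batch_classpath=|df_batch_classpath=${SPARK_CONF}:|" \
       -e "s|^df_tomcat_classpath=|df_tomcat_classpath=${SPARK_CONF}:|" "$CONF"
```

Using `|` as the sed delimiter avoids having to escape the slashes in the paths.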



6) cd $IW_HOME/bin

7) source env.sh

8) ./stop.sh hangman && ./start.sh hangman


Applicable Versions:


Spark: v2.0 and higher versions are currently supported.

IWX Data Foundry: v2.4.x, v2.5.x, v2.6.x, v2.7.x