In Infoworks 3.1.2, when a user runs a pipeline build with Spark as the execution engine, a Spark job is submitted to perform the transformations on the data. Users may want to configure and control these applications by passing certain Spark configurations. Infoworks allows users to set these configurations at three levels, each with a different scope.
1. System level configuration: Users can add or edit Spark configurations in:
{IW_HOME}/conf/dt_spark_defaults.conf (3.1.2 and above)
{IW_HOME}/conf/df_spark_defaults.conf (3.1 and below)
Any configuration added to this file is applied to every Spark job launched by Infoworks pipelines.
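For example, a system-level file might contain entries such as the following. The file uses the standard Spark properties format (one whitespace-separated key/value pair per line); the property names are standard Spark settings, and the values shown are illustrative only, not recommendations:

```
# {IW_HOME}/conf/dt_spark_defaults.conf  (illustrative values)
spark.yarn.queue        default
spark.executor.memory   2g
spark.executor.cores    2
```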
2. Domain level configuration: Users can add or edit Spark configurations for a specific domain.
Example:
For 3.1.2 and above:
key: spark.yarn.queue
value: newqueue
For 3.1 and below:
key: df_batch_sparkapp_settings
value: spark.yarn.queue=newqueue;spark.executor.memory=2g;
These configurations override the system-level configurations.
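In the 3.1-and-below format, several settings are packed into a single semicolon-separated string. A minimal sketch of how such a value can be split into individual Spark properties (the helper name is hypothetical and for illustration only; Infoworks performs this parsing internally when it reads df_batch_sparkapp_settings):

```python
def parse_sparkapp_settings(value: str) -> dict:
    """Split a semicolon-separated 'key=value' string into a dict.

    Hypothetical helper for illustration; not an Infoworks API.
    """
    settings = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:  # tolerate the trailing semicolon
            continue
        key, _, val = pair.partition("=")
        settings[key.strip()] = val.strip()
    return settings

print(parse_sparkapp_settings(
    "spark.yarn.queue=newqueue;spark.executor.memory=2g;"
))
# {'spark.yarn.queue': 'newqueue', 'spark.executor.memory': '2g'}
```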
3. Pipeline level configuration: Users can add or edit Spark configurations for a specific pipeline.
Example:
For 3.1.2 and above:
key: spark.yarn.queue
value: newqueue
For 3.1 and below:
key: df_batch_sparkapp_settings
value: spark.yarn.queue=newqueue;spark.executor.memory=2g;
These configurations override both the system-level and domain-level configurations.
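The precedence across the three levels can be sketched as a simple dictionary merge, where the more specific level overwrites the broader one. The dictionaries below are illustrative examples, not actual Infoworks internals:

```python
# Illustrative configurations at each scope; keys are standard Spark
# properties, values are examples only.
system_conf   = {"spark.yarn.queue": "default", "spark.executor.memory": "1g"}
domain_conf   = {"spark.yarn.queue": "newqueue"}
pipeline_conf = {"spark.executor.memory": "2g"}

# Pipeline level overrides domain level, which overrides system level.
effective_conf = {**system_conf, **domain_conf, **pipeline_conf}
print(effective_conf)
# {'spark.yarn.queue': 'newqueue', 'spark.executor.memory': '2g'}
```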
Application versions: 3.1.x, 3.3.x
Spark configurations on Infoworks 3.1.x and 3.3.x
Modified on: Sun, 21 Mar, 2021 at 1:05 PM