In Infoworks 3.1.2, when a user runs a pipeline build with Spark as the execution engine, a Spark job is submitted to perform the transformations on the data. Users may want to configure and control these applications by passing certain Spark configurations. Infoworks allows users to set these configurations at three levels, each with a different scope.
1. System level configuration: Users can add or edit Spark configurations in:
{IW_HOME}/conf/dt_spark_defaults.conf (3.1.2 and above)
{IW_HOME}/conf/df_spark_defaults.conf (3.1 and below)
Any configuration added to this file is applied to every Spark job launched by Infoworks pipelines.
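For example, a system-level file might contain entries such as the following. The file uses the standard Spark properties format (one whitespace-separated key/value pair per line); the property names are standard Spark settings, and the values shown are illustrative only, not recommendations:

```
# {IW_HOME}/conf/dt_spark_defaults.conf  (illustrative values)
spark.yarn.queue        default
spark.executor.memory   2g
spark.executor.cores    2
```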
2. Domain level configuration: Users can add or edit Spark configurations for a specific domain.
Example:
For 3.1.2 and above:
key: spark.yarn.queue
value: newqueue
For 3.1 and below:
key: df_batch_sparkapp_settings
value: spark.yarn.queue=newqueue;spark.executor.memory=2g;
These configurations override the system-level configurations.
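In the 3.1-and-below format, several settings are packed into a single semicolon-separated string. A minimal sketch of how such a value can be split into individual Spark properties (the helper name is hypothetical and for illustration only; Infoworks performs this parsing internally when it reads df_batch_sparkapp_settings):

```python
def parse_sparkapp_settings(value: str) -> dict:
    """Split a semicolon-separated 'key=value' string into a dict.

    Hypothetical helper for illustration; not an Infoworks API.
    """
    settings = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:  # tolerate the trailing semicolon
            continue
        key, _, val = pair.partition("=")
        settings[key.strip()] = val.strip()
    return settings

print(parse_sparkapp_settings(
    "spark.yarn.queue=newqueue;spark.executor.memory=2g;"
))
# {'spark.yarn.queue': 'newqueue', 'spark.executor.memory': '2g'}
```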
3. Pipeline level configuration: Users can add or edit Spark configurations for a specific pipeline.
Example:
For 3.1.2 and above:
key: spark.yarn.queue
value: newqueue
For 3.1 and below:
key: df_batch_sparkapp_settings
value: spark.yarn.queue=newqueue;spark.executor.memory=2g;
These configurations override both the system-level and domain-level configurations.
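The precedence across the three levels can be sketched as a simple dictionary merge, where the more specific level overwrites the broader one. The dictionaries below are illustrative examples, not actual Infoworks internals:

```python
# Illustrative configurations at each scope; keys are standard Spark
# properties, values are examples only.
system_conf   = {"spark.yarn.queue": "default", "spark.executor.memory": "1g"}
domain_conf   = {"spark.yarn.queue": "newqueue"}
pipeline_conf = {"spark.executor.memory": "2g"}

# Pipeline level overrides domain level, which overrides system level.
effective_conf = {**system_conf, **domain_conf, **pipeline_conf}
print(effective_conf)
# {'spark.yarn.queue': 'newqueue', 'spark.executor.memory': '2g'}
```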
Application versions: 3.1.x, 3.3.x
Spark configurations on Infoworks 3.1.x and 3.3.x
Modified on: Sun, 21 Mar, 2021 at 1:05 PM