Databricks jobs, for ingestion, submitted by infoworks DF can get timed out if it runs more than a configurable limit. The messages in job log which indicate the timeout are the following


[INFO] 2020-10-20 10:39:44,500 [pool-1-thread-1] io.infoworks.platform.databricks.util.MetaDBUtil:114 :: Marked databricks run '8231' as 'TIMEDOUT'
[INFO] 2020-10-20 10:39:44,501 [pool-1-thread-1] io.infoworks.platform.databricks.monitor.DatabricksJobMonitor:83 :: runId:8231, State: {"life_cycle_state":"TERMINATING","result_state":"TIMEDOUT","state_message":"Terminating the spark cluster"}

In this scenario, one can increase the parallelism and try to run the jobs below the timeout or increase the timeout value itself or even a mix of both. 

Instructions to optimise job performance

  • Step 1: Select Split By Column
  • Step 2: Increase the number of max connections to number of distinct splits
  • Step 3: Increase number of max allowed workers in the cluster template. 

Step 1: Select Split By Column


Step 2: Increase the number of max connections to number of distinct splits

  • Increase the max connections to the source in the below setting 

     

    For example, if the split by column splits the tables into 24 splits, you can give the max connections as 24 so that each connection can bring the data from each split. 


Step 3: Increase number of max allowed workers in the cluster template. 

  • Increase number of workers to the cluster template you are using for ingestion.


    For example, if you have 24 splits/connections, the to enable full parallelism you need 24 vcpus which is equal to 3 Standard_DS13_v2 nodes. This is to ensure that all the data coming from 24 parallel connections is processed in parallel by serving each connection with a vpcu. 

Alternatively you can also increase the timeout value to a higher value


The current timeout be seen from the following file and can be edited in the same file to a higher value 


cat /opt/infoworks/conf/databricks_ingestion_defaults.json | grep timeout_seconds

The below property can be set to a higher value (example: 20 hrs)
"timeout_seconds": 72000


Applicable versions: 3.2, 4.2, 4.2.0.1