I would like to use a custom autoscaling policy for my Dataproc cluster for ephemeral jobs, or I would like to use secondary worker nodes for the Dataproc cluster.


Infoworks provides a pre-ingestion job hook that runs a bash script before the ingestion job begins.

The steps below use the pre-ingestion job hook to replace the default autoscaling policy with a user-defined custom autoscaling policy.


1. Create a custom autoscaling policy in the GCP console and note the autoscaling policy ID.

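2. Create a bash script like the one below:

if ! grep -q interactive "/proc/sys/kernel/hostname"; then
 master_node=$(cat /proc/sys/kernel/hostname)
 cluster_name=${master_node%-m}
 gcloud dataproc clusters update "$cluster_name" \
  --autoscaling-policy=<autoscaling_policy_id> \
  --region=<region>
 echo "Autoscaling Policy Updated"
else
 echo "Interactive Cluster, not updating autoscaling policy"
fi

The cluster_name line assumes that the master node's hostname is the cluster name followed by -m; adjust it if your clusters are named differently.

Replace <autoscaling_policy_id> with your actual autoscaling policy ID from step 1.

Replace <region> with your actual region for the Dataproc Cluster.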

3. Create a pre-ingestion job hook and upload the bash script.

4. Add the ingestion hook to the Infoworks source where you would like to use the custom autoscaling policy.
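
If you prefer the command line to the GCP console for step 1, the policy can be defined in a YAML file and imported with gcloud. The following is a minimal sketch, not an Infoworks-provided script: the policy ID, region, file location and all numeric values are illustrative assumptions, and the DRY_RUN switch is a hypothetical convenience that only prints the command unless you set DRY_RUN=0. The secondaryWorkerConfig section is what lets the policy scale secondary worker nodes.

```shell
#!/usr/bin/env bash
# Sketch: write an autoscaling policy to YAML and import it with gcloud.
# POLICY_ID, REGION, POLICY_FILE and every number below are assumptions.
POLICY_ID="${POLICY_ID:-my-custom-policy}"
REGION="${REGION:-us-central1}"
POLICY_FILE="${POLICY_FILE:-${TMPDIR:-/tmp}/autoscaling-policy.yaml}"

# Primary workers stay fixed; secondary workers scale up to maxInstances.
cat > "$POLICY_FILE" <<'EOF'
workerConfig:
  minInstances: 2
  maxInstances: 2
secondaryWorkerConfig:
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 4m
  yarnConfig:
    scaleUpFactor: 0.05
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 1h
EOF

# Prints the import command by default; runs it only when DRY_RUN=0.
import_policy() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gcloud dataproc autoscaling-policies import $POLICY_ID --source=$POLICY_FILE --region=$REGION"
  else
    gcloud dataproc autoscaling-policies import "$POLICY_ID" \
      --source="$POLICY_FILE" --region="$REGION"
  fi
}

import_policy
```

The policy ID passed to the import command is the value to use in the hook script from step 2.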


Note:

1. The above script updates the autoscaling policy only for ephemeral clusters; it skips any cluster whose hostname contains "interactive".

2. A pre-ingestion job hook applies to all tables in the source and cannot be applied to individual tables.
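
The two checks the hook depends on (is this an interactive cluster, and what is the cluster name) can be tried out without touching a real cluster. The sketch below is illustrative only: the function names are hypothetical, and it assumes master hostnames end in -m, or in -m-<index> on high-availability clusters. It prints what the hook would do instead of calling gcloud.

```shell
#!/usr/bin/env bash
# Sketch of the hook's two checks as small functions (names are hypothetical).
# Assumption: master hostnames end in "-m" or "-m-<index>".

# Strip the master suffix to recover the cluster name.
cluster_name_from_host() {
  printf '%s\n' "$1" | sed -E 's/-m(-[0-9]+)?$//'
}

# Succeeds when the hostname marks an interactive (non-ephemeral) cluster.
is_interactive_host() {
  case "$1" in
    *interactive*) return 0 ;;
    *) return 1 ;;
  esac
}

# Use the first argument if given, otherwise the local hostname.
host="${1:-$(cat /proc/sys/kernel/hostname 2>/dev/null || hostname)}"
if is_interactive_host "$host"; then
  echo "Interactive Cluster, not updating autoscaling policy"
else
  echo "Would update cluster: $(cluster_name_from_host "$host")"
fi
```

Running the script with a sample hostname as its first argument, for example one ending in -m, shows which branch the hook would take for that cluster.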

Affects Version:

Infoworks 5.0, 5.1.X