Problem:


I would like to use a custom autoscaling policy for the ephemeral Dataproc clusters that run my jobs, or I would like to use secondary worker nodes on the Dataproc cluster.



Solution:


Infoworks provides a pre-ingestion job hook that can be used to run a bash script before an ingestion job begins.

In the steps below, we leverage the pre-ingestion job hook to replace the default autoscaling policy with a user-defined custom autoscaling policy.



Steps:


1. Create a custom autoscaling policy in the GCP console and take note of the autoscaling policy ID

   https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/autoscaling
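
Optionally, the policy can also be created from the command line instead of the console. The snippet below is a minimal sketch using gcloud; the policy ID (my-custom-autoscaling-policy), region, and instance limits are example values, and the secondaryWorkerConfig section is where secondary worker nodes would be allowed to scale.

# Minimal sketch: define and import a custom autoscaling policy with gcloud.
# The policy ID, region, and instance counts below are example values.
cat > autoscaling-policy.yaml <<'EOF'
workerConfig:
  minInstances: 2
  maxInstances: 10
secondaryWorkerConfig:
  # Allow the cluster to scale out with secondary worker nodes
  minInstances: 0
  maxInstances: 20
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 0.5
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 1h
EOF

gcloud dataproc autoscaling-policies import my-custom-autoscaling-policy \
    --source=autoscaling-policy.yaml \
    --region=us-central1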


2. Create a bash script like the one below:


#!/bin/bash
# Skip interactive (persistent) clusters; only ephemeral job clusters are updated.
if ! grep -q interactive "/proc/sys/kernel/hostname"
then
 # Dataproc master hostnames end in "-m"; strip the suffix to get the cluster name
 master_node=$(cat /proc/sys/kernel/hostname)
 cluster_name=${master_node::-2}
 gcloud dataproc clusters update "$cluster_name" \
    --autoscaling-policy=autoscale-015a243a33b20d5eba4e5e98 \
    --region=us-central1
 echo "Autoscaling Policy Updated"
else
 echo "Interactive Cluster, not updating autoscaling policy"
fi
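
The cluster name derivation relies on the Dataproc convention that the master node's hostname is the cluster name followed by "-m" (standard, non-HA clusters); ${master_node::-2} simply strips that suffix. A quick illustration with an example hostname:

master_node="iw-ephemeral-abc123-m"   # example master node hostname
echo "${master_node::-2}"             # prints: iw-ephemeral-abc123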



--autoscaling-policy=autoscale-015a243a33b20d5eba4e5e98

Replace with the actual autoscaling policy ID from step 1.


--region=us-central1

Replace with the actual region of your Dataproc cluster.
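
To confirm that the hook took effect, you can describe the ephemeral cluster while the job is running and check which policy is attached. A minimal sketch; the cluster name and region are example values:

gcloud dataproc clusters describe iw-ephemeral-abc123 \
    --region=us-central1 \
    --format="value(config.autoscalingConfig.policyUri)"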



3. Create a pre-ingestion job hook and upload the bash script.

https://docs.infoworks.io/infoworks-5.1.2/admin-and-operations/extensions#managing-job-hooks


4. Add the pre-ingestion job hook to the Infoworks source where you would like to use the custom autoscaling policy.

     https://docs.infoworks.io/infoworks-5.1.2/admin-and-operations/extensions#using-a-job-hook


Note:

1. The above script updates the autoscaling policy only for ephemeral clusters; interactive clusters are left unchanged.

2. A pre-ingestion job hook applies to all tables in the source and cannot be applied to individual tables.



Affects Version:


Infoworks 5.0, 5.1.X