Problem:
I would like to add labels to the Dataproc clusters launched by Infoworks ingestion jobs.
Solution:
Infoworks provides a pre-ingestion job hook that runs a bash script before the ingestion job begins.
The steps below use the pre-ingestion job hook to add labels to Dataproc clusters once they are launched.
Steps:
1. Create a bash script as below
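#!/bin/bash
# Label only ephemeral clusters; skip hosts whose name contains "interactive".
if ! grep -q interactive "/proc/sys/kernel/hostname"
then
  # Drop the last two characters of the master node hostname (for example "-m") to get the cluster name.
  master_node=$(cat /proc/sys/kernel/hostname)
  cluster_name=${master_node::-2}
  gcloud dataproc clusters update "$cluster_name" --update-labels env=prod,source=csv --region=us-central1
  echo "Added labels to DataProc Cluster"
else
  echo "Interactive Cluster, not adding labels"
fi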
--update-labels env=prod,source=csv
Replace with your own comma-separated key=value labels.
--region=us-central1
Replace with the actual region of your Dataproc cluster.
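To make the two values easier to change, the labels and region can be kept in variables and the command assembled from them. The following is only a minimal sketch: the function names cluster_name_from_host and build_label_command, the variables LABELS and REGION, and the sample hostname my-cluster-m are hypothetical and not part of Infoworks or gcloud. It prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of Infoworks or gcloud.

# Labels and region to apply (replace with your own values).
LABELS="env=prod,source=csv"
REGION="us-central1"

# Strip the last two characters (for example "-m") from a master node hostname.
# ${1%??} removes a two-character suffix, like ${master_node::-2} in the script above.
cluster_name_from_host() {
  printf '%s\n' "${1%??}"
}

# Print the gcloud command for a given cluster name, label list, and region.
build_label_command() {
  printf 'gcloud dataproc clusters update %s --update-labels %s --region=%s\n' "$1" "$2" "$3"
}

# Dry run with an assumed sample hostname.
build_label_command "$(cluster_name_from_host "my-cluster-m")" "$LABELS" "$REGION"
```

In the actual hook script, the sample hostname would be replaced with the contents of /proc/sys/kernel/hostname, and the printed command would be executed rather than echoed.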
Note:
1. The above script updates labels only for ephemeral clusters launched by ingestion jobs; interactive clusters are skipped.
2. A pre-ingestion job hook applies to all tables in the source and cannot be applied to individual tables.
Affects Version:
Infoworks 5.0, 5.1.X