Problem:


I would like to add labels to the Dataproc clusters launched by Infoworks ingestion jobs.



Solution:


Infoworks provides a pre-ingestion job hook that can be used to run a bash script before the ingestion job begins.

The steps below use the pre-ingestion job hook to add labels to Dataproc clusters once they are launched.


Steps:


1. Create a bash script as shown below:


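#!/bin/bash
# Skip interactive clusters; label only ephemeral clusters launched by ingestion jobs.
if ! grep -q interactive "/proc/sys/kernel/hostname"
then
 # Strip the trailing 2 characters (the "-m" suffix of a Dataproc master hostname) to get the cluster name.
 master_node=$(cat /proc/sys/kernel/hostname)
 cluster_name=${master_node::-2}
 gcloud dataproc clusters update "$cluster_name" --update-labels env=prod,source=csv --region=us-central1
 echo "Added labels to DataProc Cluster"
else
 echo "Interactive Cluster, not adding labels"
fi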


--update-labels env=prod,source=csv

Replace with your own label key=value pairs, separated by commas


--region=us-central1

Replace with the actual region of your Dataproc cluster
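
The script above derives the cluster name by dropping the last two characters of the hostname, which matches a master node named <cluster>-m. On a high-availability Dataproc cluster the master nodes are named <cluster>-m-0, <cluster>-m-1 and so on, so a fixed two-character trim would produce the wrong name there. The following is a minimal sketch of a more tolerant derivation, assuming those naming conventions apply to your clusters; the function name cluster_name_from_host is hypothetical and not part of Infoworks or gcloud.

```shell
#!/bin/bash
# Hypothetical helper: derive the Dataproc cluster name from a master hostname.
# Assumes masters are named "<cluster>-m" (standard) or "<cluster>-m-<digit>" (high availability).
cluster_name_from_host() {
  local host="$1"
  case "$host" in
    *-m-[0-9]) echo "${host%-m-[0-9]}" ;;  # HA master: strip "-m-<digit>"
    *-m)       echo "${host%-m}" ;;        # single master: strip "-m"
    *)         echo "$host" ;;             # unknown pattern: return unchanged
  esac
}
```

In the hook script it could replace the fixed trim, for example: cluster_name=$(cluster_name_from_host "$(cat /proc/sys/kernel/hostname)")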



Note:

1. The above script updates labels only for ephemeral clusters launched by ingestion jobs; interactive clusters are skipped.

2. A pre-ingestion job hook applies to all tables in the source and cannot be applied to individual tables.
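
To confirm that the hook actually added a label, the cluster can be inspected with gcloud while the ingestion job is still running (an ephemeral cluster is no longer available once it has been deleted). The following is a minimal sketch, assuming gcloud is installed and authenticated on the machine where it runs; the function name label_matches is hypothetical, and the cluster name in the usage line is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: succeed if label KEY on CLUSTER in REGION equals EXPECTED.
label_matches() {
  local cluster="$1" region="$2" key="$3" expected="$4"
  local actual
  # Read a single label value from the cluster description.
  actual=$(gcloud dataproc clusters describe "$cluster" \
    --region="$region" --format="value(labels.${key})") || return 1
  [ "$actual" = "$expected" ]
}
```

Example usage: label_matches my-cluster us-central1 env prod && echo "Label env=prod is set"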



Affects Version:


Infoworks 5.0, 5.1.X