The script below monitors active EMR job drivers whose names start with "ClusterJobDriver_". At any given time, only one "ClusterJobDriver_" step should be in the RUNNING state.
The script counts the active "ClusterJobDriver_" steps and sends a notification by email or to an AWS SNS topic when the count is greater than 1.
- Configure the AWS region and the EMR cluster ID in the script.
- The script assumes the AWS CLI is already set up with the required authentication and that jq is installed.
- All job drivers launched for ingestion or pipeline jobs have names that start with "ClusterJobDriver_".
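The prerequisites above can be checked before the monitor runs. The following is a minimal sketch, not part of the original script: `require_cmd` and `preflight` are hypothetical helper names, and the check only confirms that the tools resolve on the current PATH, not that credentials are valid.

```shell
#!/bin/bash
# require_cmd NAME: succeed if NAME resolves to a command
# (a binary on PATH, a shell builtin, or a function).
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# preflight NAME...: report every missing tool on stderr and
# return non-zero if at least one of them is missing.
preflight() {
  local missing tool
  missing=0
  for tool in "$@"; do
    if ! require_cmd "$tool"; then
      echo "Missing required command: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The monitoring script relies on the AWS CLI and jq.
if preflight aws jq; then
  echo "Prerequisites found"
else
  echo "Install the missing tools before running the monitor" >&2
fi

# To confirm that credentials are configured, one could additionally run:
#   aws sts get-caller-identity --region "$AWS_REGION"
```

Running this once on the host that will schedule the monitor helps separate setup problems from genuine "more than one job driver" alerts.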
#!/bin/bash

# Set the AWS region and EMR cluster ID
AWS_REGION="your_aws_region"
EMR_CLUSTER_ID="your_emr_cluster_id"

# Get the running steps on the EMR cluster whose names start with "ClusterJobDriver_".
# The matches are collected into a JSON array so that jq can count them.
running_steps=$(aws emr list-steps --cluster-id "$EMR_CLUSTER_ID" --region "$AWS_REGION" --output json |
  jq '[.Steps[] | select(.Name | startswith("ClusterJobDriver_")) | select(.Status.State == "RUNNING")]')

# Count the number of running job drivers
running_count=$(echo "$running_steps" | jq 'length')

# Display the count of running job drivers
echo "Running Job Drivers Count: $running_count"

# Check if the running count is more than 1
if [[ "$running_count" -gt 1 ]]; then
  # Alert subject and body
  alert_subject="More than 1 job driver found to be in running state on Cluster $EMR_CLUSTER_ID"
  alert_body="Please investigate the running steps on EMR Cluster $EMR_CLUSTER_ID. There are $running_count job drivers currently running."

  # Uncomment the following lines to send email using mailx
  # ALERT_EMAIL="your_alert_email"
  # echo "$alert_body" | mailx -s "$alert_subject" "$ALERT_EMAIL"

  # Uncomment the following lines to send an AWS SNS notification
  # sns_topic_arn="your_sns_topic_arn"
  # aws sns publish --region "$AWS_REGION" --topic-arn "$sns_topic_arn" --subject "$alert_subject" --message "$alert_body"
fi
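If the AWS CLI call fails, the count can come back empty, and the "more than 1" comparison then silently evaluates to false. The sketch below shows one possible guard, assuming a hypothetical helper named `should_alert` that is not part of the original script. It separates "within the limit" from "count could not be determined" through its exit status.

```shell
#!/bin/bash
# should_alert COUNT [THRESHOLD]
# Exit status: 0 = COUNT exceeds THRESHOLD (alert),
#              1 = COUNT is within the limit,
#              2 = COUNT or THRESHOLD is not a non-negative integer.
# THRESHOLD defaults to 1, matching the "only one running driver" rule.
should_alert() {
  local count threshold
  count="$1"
  threshold="${2:-1}"
  case "$count" in
    ''|*[!0-9]*) return 2 ;;
  esac
  case "$threshold" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$count" -gt "$threshold" ]
}

# Example with a made-up count; in the monitor this would be $running_count.
running_count=2
rc=0
should_alert "$running_count" || rc=$?
case "$rc" in
  0) echo "ALERT: $running_count job drivers running" ;;
  1) echo "OK: $running_count job driver(s) running" ;;
  *) echo "Could not determine the running count" >&2 ;;
esac
```

Treating status 2 as its own case lets the monitor send a separate "monitoring failed" notification, so that an outage of the AWS CLI call is not mistaken for a healthy cluster.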