PROBLEM DESCRIPTION


Batch jobs such as ingestion, pipeline build, or workflow jobs might fail with the error shown below. The error is recorded in hangman.log, which is located in /opt/infoworks/logs/hangman.


When a user submits a job, the job is blocked with the following message: "There are no more executors available at this time".
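
To confirm that a job was blocked for this reason, the log directory can be searched for the message. The following is a minimal sketch, not an Infoworks utility: the function name count_executor_errors is hypothetical, and the default directory is the one given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Infoworks): count how many log lines
# contain the "no more executors" message.
# Argument 1 (optional): log directory; defaults to the path from this article.
count_executor_errors() {
    local log_dir="${1:-/opt/infoworks/logs/hangman}"
    # -r: search the directory recursively
    # -h: do not prefix matches with file names
    # -F: treat the pattern as a fixed string, not a regular expression
    grep -rhF "There are no more executors available at this time" "$log_dir" 2>/dev/null | wc -l
}
```

Running count_executor_errors with no argument searches /opt/infoworks/logs/hangman. A count greater than zero suggests that at least one job was blocked because no executor was free.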


By default, Hangman can run 20 jobs in parallel (configurable via conf.properties), so this message is expected once all 20 executors are in use.


TROUBLESHOOTING STEPS / SOLUTIONS


1. In the Infoworks UI, navigate to Admin > Job Queue and check whether there are 20 jobs in the queue. If there are, the message is valid, and users need to wait for running jobs to finish or cancel some jobs.


If there are fewer than 20 jobs in the queue, proceed to step 2.


2. Check whether any Infoworks job processes (jars) are still running on the server even though the job status has been marked as completed or failed.

You can do this by running the following command on the edge node:

ps aux | grep <job-Id>


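The result lists every process whose command line contains the job ID (the grep command itself may also appear in the output). The second column of each line is the process ID (PID).

Scan through the list to see whether the job is actually still running. If the job is marked as completed/failed but its process is still present, terminate that process using the command below, where <PID> is the process ID taken from the ps output:

kill -15 <PID>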


[Internally, when Hangman sees that the process has terminated, it frees up the executor, thereby letting other jobs run.]
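
The lookup and termination in step 2 can be combined into a small script. This is a sketch under stated assumptions, not an Infoworks-provided tool: the function names filter_job_pids and terminate_job are hypothetical, and it assumes the job ID appears in the command line of the leftover process, as the ps command in step 2 implies.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Infoworks).

# Read "PID COMMAND..." lines on stdin and print the PID of every line
# that contains the given job ID.
filter_job_pids() {
    local job_id="$1"
    [ -n "$job_id" ] || { echo "usage: filter_job_pids <job-id>" >&2; return 1; }
    # The job ID is passed through the environment rather than as an awk
    # argument, so the awk process does not match itself in the ps output.
    JOB_ID="$job_id" awk 'index($0, ENVIRON["JOB_ID"]) { print $1 }'
}

# Send SIGTERM (signal 15) to every process whose command line contains
# the given job ID.
terminate_job() {
    local job_id="$1" pid
    # "-o pid= -o args=" prints the PID and full command line without headers.
    for pid in $(ps -e -o pid= -o args= | filter_job_pids "$job_id"); do
        echo "Sending SIGTERM to PID $pid"
        kill -15 "$pid"
    done
}
```

For example, terminate_job <job-Id> sends signal 15 to each matching process. Confirm first that the job is really marked as completed/failed, because any process whose command line contains the ID is terminated, including a wrapper script that was started with the job ID as an argument.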