Problem Statement: A shell script or Python command run from a bash node in an Infoworks workflow fails, even though the same script or command succeeds when run directly on the Infoworks edge node.

Sample Error:

9-23 15:44:21,949] {bash_operator.py:110} INFO -   File "/home/infoworks-user/harshit_test/get_count.py", line 133, in main

[2021-09-23 15:44:21,949] {bash_operator.py:110} INFO -     raise Exception(snowflake_harsh_df['msg'])

[2021-09-23 15:44:21,949] {bash_operator.py:110} INFO - Exception: SF_API_FTCH_PD_ERR: Error fetching harsh dataframe, 177962: 177962: Optional dependency: 'pyarrow' is not installed, please see the following link for install instructions: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html#installation

[2021-09-23 15:44:22,126] {bash_operator.py:110} INFO - Hive Count is: 19415

[2021-09-23 15:44:22,126] {bash_operator.py:110} INFO - Snowtable count is:

[2021-09-23 15:44:22,126] {bash_operator.py:110} INFO - Failure

[2021-09-23 15:44:22,126] {bash_operator.py:114} INFO - Command exited with return code 1



Root cause: 


a) In the scenario above, get_count.py is a custom script that depends on the Python library pyarrow. The script was developed against the Python installed under /usr/bin, and the pip (Python's package manager) belonging to that Python was used to install its dependencies.


b) When a Python command or script runs from a bash node, Infoworks sources the environment file $IW_HOME/bin/env.sh. As a result, the Python shipped with Infoworks is picked up instead of the Python installed on the Infoworks edge node.


c) Because pyarrow is installed for the edge node's Python but not for the Python shipped with Infoworks, the command fails with the error shown above.
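
To confirm which interpreter a bash node actually picks up, a small diagnostic script can be run once from the bash node and once from the edge node. The sketch below is illustrative only: the function name describe_interpreter is hypothetical, and the module list is assumed to contain just pyarrow, as in the sample error.

```python
import importlib.util
import sys


def describe_interpreter(required_modules):
    """Return the running interpreter's path and the required modules it cannot find."""
    missing = [
        name for name in required_modules
        if importlib.util.find_spec(name) is None  # None means the module is not installed
    ]
    return {"executable": sys.executable, "missing": missing}


if __name__ == "__main__":
    # "pyarrow" is the dependency named in the sample error; adjust the list as needed.
    report = describe_interpreter(["pyarrow"])
    print("Interpreter:", report["executable"])
    print("Missing modules:", ", ".join(report["missing"]) or "none")
```

If the two runs print different interpreter paths, and pyarrow is reported missing only on the bash node, the cause described above applies.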


Note: All Infoworks versions ship with a specific Python ($IW_HOME/resources/python36) that Infoworks services require. Using this Python to develop custom scripts or to install Python dependencies is not recommended.




Solution:

Provide the absolute path to the python binary in the shell script or command when running it from a bash node within a workflow.


For instance:


Use the command below, which gives the absolute path to python:


/usr/bin/python script.py


Do not use the command below, because it lets the Python shipped with Infoworks be picked up when the script runs from a bash node in a workflow:


python script.py
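
As an optional safeguard, a custom script can check at startup whether it was launched by the Python shipped with Infoworks and stop with a clear message, rather than failing later on a missing dependency. This is a minimal sketch, not an Infoworks feature: the function name is_infoworks_python is hypothetical, and it assumes that IW_HOME is set in the bash node's environment and that the shipped Python lives under $IW_HOME/resources/python36, as stated in the note above.

```python
import os
import sys


def is_infoworks_python(executable, iw_home):
    """Return True if the interpreter path lies under the Infoworks-shipped Python directory."""
    if not iw_home:
        return False  # IW_HOME is not set, so there is nothing to compare against
    # Assumed location of the shipped Python, taken from the note in this article.
    bundled_root = os.path.realpath(os.path.join(iw_home, "resources", "python36"))
    resolved = os.path.realpath(executable)
    return resolved.startswith(bundled_root + os.sep)


if __name__ == "__main__":
    if is_infoworks_python(sys.executable, os.environ.get("IW_HOME")):
        sys.stderr.write(
            "This script is running under the Python shipped with Infoworks (%s). "
            "Call it with the absolute path to the intended Python instead.\n"
            % sys.executable
        )
        sys.exit(1)
```

Placing this check at the top of a script such as get_count.py makes the workflow log state the cause directly.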


Applicable Infoworks versions: 
All Versions.