Description: We observed the meteor timeout errors shown below in $IW_HOME/logs/rest-api/iw-rest-api.log in the Macy's environment, where the UI service (a Node application) was consuming high CPU.

This is an issue with the UI cluster package when it serves multiple user requests.

2020-10-15 08:37:15 | ERROR | meteor_rest.py | 114 | Login request timed out, Ensure that Meteor server is up and running.
2020-10-15 10:29:05 | ERROR | meteor_rest.py | 114 | Login request timed out, Ensure that Meteor server is up and running.
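
To see how often the timeout occurs, the log can be summarized per hour. The function below is a minimal sketch, not part of the patch: the name count_meteor_timeouts_by_hour is hypothetical, and it assumes every log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the two lines above.

```shell
# Count "Login request timed out" errors per hour in a REST API log.
# Usage: count_meteor_timeouts_by_hour [logfile]
# The function name is hypothetical; the default path is the log named above.
count_meteor_timeouts_by_hour() {
  local log_file="${1:-$IW_HOME/logs/rest-api/iw-rest-api.log}"
  # Keep only the timeout lines, cut each down to "YYYY-MM-DD HH",
  # then count identical date/hour prefixes.
  grep -F 'Login request timed out' "$log_file" \
    | cut -c1-13 \
    | sort \
    | uniq -c \
    | awk '{print $2, $3, $1}'   # prints: date hour count
}
```

Calling count_meteor_timeouts_by_hour with no argument reads the default log under $IW_HOME. Hours with many timeouts are the ones to compare against UI CPU usage.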


Steps: Perform the below steps to apply a patch to mitigate this issue.


1) Make sure the Mongo index exists on the jobs collection. Use the commands below in the mongo shell to create the index and then list the indexes on the collection.

db.getCollection('jobs').createIndex({entityId: 1, buildNumber: 1})
db.getCollection('jobs').getIndexes()

2) Stop the UI process ($IW_HOME/bin/stop.sh ui)

3) Back up apricot-meteor to apricot-meteor.bak

4) Download the apricot-meteor tarball

5) Untar the apricot-meteor tarball

6) Start the UI process


Note: 

  • In a Service Recovery setup, only steps 3, 4, and 5 apply to the secondary and tertiary nodes.
  • For future debugging, users can set the ENABLE_DETAIL_LOGGER variable to true in the admin configuration section.


Example:

$IW_HOME/bin/stop.sh ui
mv $IW_HOME/apricot-meteor $IW_HOME/apricot-meteor.bak
cd $IW_HOME
wget https://infoworks-setup.s3.amazonaws.com/3.1/macys-ha/apricot-meteor-3.1.2-fixes.tar.gz
tar -xvf apricot-meteor-3.1.2-fixes.tar.gz
$IW_HOME/bin/start.sh ui
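
One caveat with the mv step: if apricot-meteor.bak already exists as a directory (for example, from an earlier patch attempt), mv places apricot-meteor inside it instead of replacing it. A guarded version of the backup step could look like the sketch below. The function name backup_meteor is hypothetical and is not part of the Infoworks scripts.

```shell
# Move apricot-meteor aside, refusing to touch an existing backup.
# Usage: backup_meteor [iw_home]   (defaults to $IW_HOME)
# Hypothetical helper; only the directory names come from the steps above.
backup_meteor() {
  local iw_home="${1:-$IW_HOME}"
  local src="$iw_home/apricot-meteor"
  local bak="$iw_home/apricot-meteor.bak"
  if [ ! -d "$src" ]; then
    echo "nothing to back up: $src not found" >&2
    return 1
  fi
  if [ -e "$bak" ]; then
    echo "backup already exists: $bak" >&2
    return 1
  fi
  mv "$src" "$bak"
}
```

If the function returns a non-zero status, inspect the existing backup before downloading and extracting the tarball.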

       

Brief info on the meteor timeout issue:

In Infoworks versions prior to 4.4, all REST API calls were handled by the meteor layer. When multiple API requests reach the meteor layer in parallel, some of them are lost because of the high load on meteor (during that period, meteor timeout errors appear in the REST API log).


The permanent fix for this issue is the REST API re-architecture, in which requests are handled separately and no longer reach the meteor layer as they did in older versions. To mitigate the timeout issue in older versions, apply the fix described above (it is specific to 3.1.2-Macys) and restart the REST API and UI services periodically.
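
The periodic restart can be scripted. The function below is a minimal sketch under stated assumptions: restart_iw_services is a hypothetical name, and the only service argument confirmed in this article is ui. The argument that stop.sh and start.sh expect for the REST API service is not given here, so check the scripts in $IW_HOME/bin before adding it.

```shell
# Restart the given Infoworks services one at a time via stop.sh and start.sh.
# Usage: restart_iw_services <iw_home> <service>...
# Hypothetical helper; "ui" is the only service name shown in this article.
restart_iw_services() {
  local iw_home="$1"
  shift
  local svc
  for svc in "$@"; do
    # Abort on the first failure so a scheduler can report it.
    "$iw_home/bin/stop.sh" "$svc" || return 1
    "$iw_home/bin/start.sh" "$svc" || return 1
  done
}
```

For example, restart_iw_services "$IW_HOME" ui restarts only the UI. The function can be placed in a script and run from a scheduler such as cron during a low-traffic window. How often to restart depends on the load and is not specified here.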