Job-Specific Logs for Infoworks DataFoundry v2.x, v3.x (GCP, HDI, CDH, Azure, EMR)


Ingestion


How to collect and analyze the Ingestion Job logs in Infoworks DataFoundry v2.x, v3.x (GCP, HDI, CDH, Azure, EMR)


An ingestion job in Infoworks DataFoundry v2.x, v3.x (Azure, HDI, CDH, GCP, EMR) submits a map-reduce job to the customer's Hadoop cluster to extract the data from the source system and ingest it into Hive.

If the ingestion job fails, collect the logs below.


  1. Click on the Download logs option to collect the Ingestion job logs.




  2. Unzip the downloaded file and check the .log and .out files for errors. Search for the keyword “ERROR”, examine the stack trace, and read roughly 10 lines before the ERROR message to understand the flow of the job, for example with the search below.
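For example, a quick way to pull the ERROR lines together with 10 lines of leading context (a sketch, assuming the log follows the job_<job_id>.log naming used in the downloaded zip):

                       # show each ERROR along with the 10 lines that precede it
                       grep -n -B 10 "ERROR" job_<job_id>.log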


  3. You can also check the source configuration details in the job_object.json file, present in the same zip archive; a quick way to view it follows.
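For example, assuming Python is available on the machine where you unzipped the logs, you can pretty-print the file:

                       # pretty-print the source configuration captured for the job
                       python -m json.tool job_object.json | less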


  4. If you do not see any ERRORs in the .out or .log files, click the MR Jobs tab in the UI and check whether the submitted map-reduce job has failed.


  5. If the status under the MR Jobs tab does not show 100% completion for the Map or Reduce phase, the map-reduce job might have failed.


  6. You can get the corresponding map-reduce application ID from the MR Jobs section (for example, 1603205573328_0076), or open the ingestion job_<id>.log file and look for the message below.



[pool-6-thread-1] org.apache.hadoop.mapreduce.Job:1294 :: The url to track the job: http://cs-internal-headnode.c.gcp-cs-shared-resources.internal:8088/proxy/application_1603205573328_0076/
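If the log is large, a quick way to pull out this line and the application ID (a sketch using the same job_<id>.log naming) is:

                       # find the tracking URL line and extract the application ID
                       grep "The url to track the job" job_<id>.log
                       grep -o "application_[0-9]*_[0-9]*" job_<id>.log | sort -u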


  7. Log in to the Hadoop Resource Manager, look for the above application ID, open the logs, and look for error messages.

  8. You can also collect the same map-reduce application log from the Infoworks DataFoundry edge node by running the command below as the user who starts the Infoworks services.

             

                       yarn logs -applicationId <application_id>


For example:


yarn logs -applicationId application_1603205573328_0076 > application_log.txt


Share the application_log.txt


Refer to the article below on how to analyze the logs to find the status of the map/reduce tasks, job status transitions, Java exception messages, and any informational or warning messages by tracing the application syslogs.


http://hadooptutorial.info/tracing-logs-through-yarn-web-ui/


  9. Take a look at the articles and solutions in the Infoworks knowledge base (link below). New articles are added every week, and you can search for error messages and find solutions.

             You can search for a solution before you open a ticket.

            https://infoworks.freshdesk.com/a/solutions

 

 

Pipeline build


How to collect and analyze the Pipeline build Job logs in Infoworks DataFoundry v2.x, v3.x (GCP, HDI, CDH, Azure, EMR)


Infoworks DataFoundry pipeline build jobs in v2.x, v3.x (Azure, HDI, CDH, GCP, EMR) may run with Hive or Spark as the underlying execution engine.


If the pipeline build fails, collect and analyze the logs below.


a) Download the pipeline build job log from the Infoworks DataFoundry web UI as shown below.




Hive Execution Engine


b) If the execution engine is Hive and the pipeline build job fails while running a Tez job, look for the message below in the pipeline build job_<job_id>.log file.


UUID:dfdb15a4-2bf9-4e3f-a002-a192c0781fee  INFO  : Status: Running (Executing on YARN cluster with App id application_1603205573328_1198)
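As with ingestion, you can pull the Tez application ID out of a large log with a quick search (a sketch, assuming the job_<job_id>.log naming):

                       # extract the YARN application ID used by the Hive/Tez job
                       grep "Executing on YARN cluster with App id" job_<job_id>.log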



  1. Log in to the Hadoop Resource Manager, look for the above application ID, open the logs, and look for error messages.

  2. You can also collect the same application log from the Infoworks DataFoundry edge node by running the command below as the user who starts the Infoworks services.

             

                       yarn logs -applicationId <application_id>


For example:


yarn logs -applicationId application_1603205573328_1198 > application_log.txt

Share the application_log.txt

Refer to the article below for more information on how to analyze a Tez application.


https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App


Spark Execution Engine


  1. If the execution engine for the pipeline build job is Spark and the underlying Spark job is failing, get the Spark application log from the Resource Manager page, or run the command below from the Infoworks edge node to get the Spark application ID.



              yarn application -list -appStates ALL | grep <pipeline_build_job_id>


  2. You would see output similar to the following.


application_1600952462285_19507 DF_InteractiveApp_b2538de45dc7ddc81f1d8cce  SPARK    ceinfoworks default                FINISHED  SUCCEEDED 100%  cs-internal-workernode3.c.gcp-cs-shared-resources.internal:18081/history/application_1600952462285_19507/1


  3. Take the application ID and run the yarn command below to get the Spark application log.


              yarn logs -applicationId application_1600952462285_19507
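As with the earlier examples, you can redirect the output to a file so it is easy to share, for example:

              yarn logs -applicationId application_1600952462285_19507 > spark_application_log.txt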



  4. Take a look at the articles and solutions in the Infoworks knowledge base (link below). New articles are added every week, and you can search for error messages and find solutions.

             You can search for a solution before you open a ticket.

             https://infoworks.freshdesk.com/a/solutions


Pipeline Interactive/sample data generation failure


Whenever you run a preview data job, or if you see any errors on the Pipeline Editor page while creating a pipeline in editor mode, an interactive job is executed in the background. You can collect its logs from the locations below on the DataFoundry edge node.


i) $IW_HOME/logs/df/job_<job_id>


Note: The sample data generation job will not have a job ID shown in the UI; check the latest log file generated in the above directory (see the sketch after the locations below).


ii) $IW_HOME/logs/dt/interactive.out (if it exists)
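Since the sample data generation job does not surface a job ID, a simple way to spot the most recently written log (a sketch, assuming the default $IW_HOME layout above) is:

                       # list the interactive/DT logs by modification time, newest first
                       ls -lt $IW_HOME/logs/df/ | head
                       ls -lt $IW_HOME/logs/dt/ | head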


Workflow Failure logs


a) Whenever a workflow run fails at a particular node, select the failed run, go to the failed node, click View Task Logs, copy the content into a text file, and check for ERROR messages.



b) If an ingest node or a pipeline build node fails during job execution, collect the job logs along with the task log by clicking View Job Logs as shown below.



This takes you to the ingestion/pipeline job log page; click Download to collect the job logs.



c) Share the logs below from the DataFoundry edge node along with the above logs (see the collection sketch after the list).


$IW_HOME/orchestrator-engine/airflow-scheduler.log

$IW_HOME/orchestrator-engine/airflow-webserver.err

$IW_HOME/orchestrator-engine/airflow-webserver.out

$IW_HOME/orchestrator-engine/airflow-worker.out

$IW_HOME/orchestrator-engine/airflow-scheduler.out

$IW_HOME/orchestrator-engine/airflow-worker.err

$IW_HOME/orchestrator-engine/airflow-scheduler.err
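One way to bundle these for sharing (a sketch; adjust for any files that do not exist on your node) is:

              # collect the Airflow scheduler, webserver, and worker logs into one archive
              tar czf orchestrator_logs.tar.gz \
                  $IW_HOME/orchestrator-engine/airflow-scheduler.{log,out,err} \
                  $IW_HOME/orchestrator-engine/airflow-webserver.{out,err} \
                  $IW_HOME/orchestrator-engine/airflow-worker.{out,err}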


Cube build jobs


i) Download the cube build job logs from the UI using the Download Logs option as shown below.




OR


Collect the job log from the location below on the edge node.


$IW_HOME/logs/job/job_<job_id>


Along with the job log, share the logs below (see the collection sketch after the list).


ii) $IW_HOME/cube-engine/logs/iw_cube.log

iii) $IW_HOME/cube-engine/logs/iw_cube.out

iv) $IW_HOME/logs/cube-engine/access-server.log
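One way to bundle these for sharing (a sketch; substitute the actual job ID and skip any files that are missing) is:

              # collect the cube build job log and the cube-engine service logs
              tar czf cube_logs.tar.gz \
                  $IW_HOME/logs/job/job_<job_id> \
                  $IW_HOME/cube-engine/logs/iw_cube.log \
                  $IW_HOME/cube-engine/logs/iw_cube.out \
                  $IW_HOME/logs/cube-engine/access-server.log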


Scheduled Jobs Not Getting Triggered in Infoworks DataFoundry


i) The platform service is responsible for running scheduled jobs (workflows, ingestion, and pipeline builds).


ii) If the scheduled workflows or the ingestion and pipeline jobs are not running on time, check the log below.


$IW_HOME/platform/logs/scheduler-service.log


iii) Look for messages like the one shown below to check whether the workflows are getting triggered.


[QuartzScheduler_Worker-1] job.IWSingleJob:19 : Executing command: /opt/infoworks/apricot-meteor/infoworks_python/infoworks/cli/infoworks.sh workflow-execute --workflow-action run --workflow Ingest_Tables --domain gd_domain -at AU66mMKxL9fEwHEUe9cMnlK9RTDNLkIfp7vcYdT9cSs=
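For example, a quick check for recent trigger activity (assuming the default log path above):

                       # show the most recent commands issued by the scheduler service
                       grep "Executing command" $IW_HOME/platform/logs/scheduler-service.log | tail -n 20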


Infoworks DataFoundry Service logs

For DataFoundry Versions v2.4.x, v2.5.x, v2.6.x


Locations for all the Service Logs in IWX DataFoundry



Service info for UI: $IW_HOME/logs/apricot/apricot-out.log

Service info for Governor: $IW_HOME/logs/governor/governor.log

Service info for Hangman: $IW_HOME/logs/hangman/hangman.log

Service info for Rest API: $IW_HOME/logs/rest-api/iw-rest-api.log

Service info for Scheduler Service: $IW_HOME/RestAPI/apache-tomcat-7.0.63/logs/catalina.out

Service info for Data Transformation: $IW_HOME/df/apache-tomcat-8.0.33/logs/catalina.out

Service info for Cube Engine: $IW_HOME/cube-engine/logs/kylin.log

Service info for Monitoring Service: $IW_HOME/logs/monitor/monitor.log

Service info for Postgres: $IW_HOME/logs/pgsql.log

Service info for Orchestrator Web Server: $IW_HOME/logs/orchestrator/orchestrator.log

Service info for Orchestrator Engine Webserver: $IW_HOME/orchestrator-engine/airflow-webserver.log

Service info for Orchestrator Engine Scheduler: $IW_HOME/orchestrator-engine/airflow-scheduler.log

Service info for Orchestrator Engine Worker: $IW_HOME/orchestrator-engine/airflow-worker.log

Service info for RabbitMQ: $IW_HOME/logs/rabbit*.log

Service info for Nginx: $IW_HOME/resources/nginx-portable/logs/*.log

Service info for MongoDB: $IW_HOME/logs/mongod.log




For DataFoundry Versions v2.7.x, v2.8.x, v2.9.x, v3.x, v4.x, v5.x


Locations for all the Service Logs in IWX DataFoundry


Starting from IWX DataFoundry v2.7.x, the following new IWX services have been added:


Platform

Configuration service

Notification Service

Ingestion Service (Only for DataFoundry on Databricks v3.x,v4.x)


Service info for UI: $IW_HOME/logs/apricot/apricot-out.log

Service info for Governor: $IW_HOME/logs/governor/governor.log

Service info for Hangman: $IW_HOME/logs/hangman/hangman.log

Service info for Rest API: $IW_HOME/logs/rest-api/iw-rest-api.log

Service info for DT (Data Transformation): $IW_HOME/logs/dt/interactive.out

Service info for Monitoring Service: $IW_HOME/logs/monitor/monitor.log

Service info for Postgres: $IW_HOME/logs/pgsql.log

Service info for Orchestrator Web Server: $IW_HOME/logs/orchestrator/orchestrator.log

Service info for Orchestrator Engine Web Server: $IW_HOME/orchestrator-engine/airflow-webserver.log

Service info for Orchestrator Engine Scheduler: $IW_HOME/orchestrator-engine/airflow-scheduler.log

Service info for Orchestrator Engine Worker: $IW_HOME/orchestrator-engine/airflow-worker.log

Service info for RabbitMQ: $IW_HOME/logs/rabbit*.log

Service info for Nginx: $IW_HOME/resources/nginx-portable/logs/*.log

Service info for MongoDB: $IW_HOME/logs/mongod.log

Service info for Platform Services: $IW_HOME/logs/platform/platform-server.log

Service info for Notification User Consumer: $IW_HOME/logs/platform/user-consumer.log

Service info for Notification Artifact Consumer: $IW_HOME/logs/platform/artifact-consumer.log

Service info for Configuration Service: $IW_HOME/logs/platform/config-service.log

Service info for Ingestion Service (Only for DataFoundry on Databricks v3.x, v4.x): $IW_HOME/logs/ingestion/ingestion.log


How to Collect/Analyze the Infoworks DataFoundry Export Job logs

Export to a delimited file

For export jobs, the job can be divided into two stages, and the exact cause of a failure can usually be located based on the stage in which it occurred.


Pre-MR job stage: If no MR job was launched by DataFoundry, as shown in the figure below, the job failed in this stage.
 

  1. In this case, download the logs and search the job_<jobid>.log file for messages with the [ERROR] tag to find the exact cause of the failure (see the search sketch after this list).

  2. If job_<jobid>.log is empty or uninformative, search the job_<jobid>.out file for error messages or exceptions.
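A sketch of the searches from steps 1 and 2, using the same job_<jobid> naming as the downloaded logs:

                       # tagged errors in the job log
                       grep -n "\[ERROR\]" job_<jobid>.log
                       # fall back to the .out file if the .log is empty or uninformative
                       grep -n -i -E "error|exception" job_<jobid>.out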


MR job submission and post-submission stage: If MR jobs were launched by DataFoundry, as shown in the figure below, the job failed in this stage.



  1. In this case, reach out to the Hadoop admin to download the MR job logs for the job ID shown in the Identifier column of the above image. You can also get the corresponding map-reduce application ID from the job_<id>.log file by searching for the message below.


 The URL to track the job: http://cs-internal-headnode.c.gcp-cs-shared-resources.internal:8088/proxy/application_1603205573328_0076/

  2. After this, look for messages with the ERROR tag, following the steps mentioned in stage 1, to find the exact cause of the failure.

  3. Once the error message is found, search for the keywords in the Infoworks Freshdesk knowledge base to see if it has been addressed before.



Export to BigQuery: Refer to the doc below to collect the BigQuery audit log.

https://support.infoworks.io/a/solutions/articles/14000100766



How to collect and analyze the TPT logs for the Teradata TPT ingestion jobs in DataFoundry


Here is a knowledge base article on how to collect the TPT logs for a Teradata ingestion job:

https://infoworks.freshdesk.com/a/solutions/articles/14000105966