Job Specific Logs for Infoworks DataFoundry on v2.x,v3.x (GCP,HDI,CDH,AZURE,EMR)
Ingestion
How to collect and analyze the Ingestion Job logs in Infoworks DataFoundry v2.x,v3.x (GCP,HDI,CDH,AZURE,EMR)
An ingestion job in Infoworks DataFoundry versions v2.x,v3.x (Azure, HDI, CDH, GCP, EMR) submits a map-reduce job on the customer's Hadoop cluster to extract the data from the source system and ingest it into Hive.
If the ingestion job fails, collect the below logs.
Click on the Download logs option to collect the Ingestion job logs.
Unzip the downloaded zip file and open the Infoworks job log. Look for any related ERRORs in the .log and .out files: search for the keyword “ERROR” and examine the stack trace. Also review about 10 lines before the ERROR message to understand the flow of the job.
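For example, assuming the unzipped directory contains the job_<job_id>.log and job_<job_id>.out files, you can search for ERROR along with the preceding lines of context:
grep -n -B 10 "ERROR" job_<job_id>.log job_<job_id>.out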
You can also check the source configuration details in the job_object.json file, which is present in the same zip directory.
If you do not see any ERRORs in the .out or .log files, click the MR Jobs tab in the UI and check whether the submitted Map-Reduce job has failed.
If the status under the MR Jobs tab appears as shown below and does not show 100% completion for the Map or Reduce phase, the map-reduce job might have failed.
You can get the corresponding map-reduce application id from the MR Jobs section (1603205573328_0076), or open the ingestion job_<id>.log file and look for the below message in the log.
[pool-6-thread-1] org.apache.hadoop.mapreduce.Job:1294 :: The url to track the job: http://cs-internal-headnode.c.gcp-cs-shared-resources.internal:8088/proxy/application_1603205573328_0076/
Log in to the Hadoop Resource Manager, search for the above application id, open the logs, and check for error messages.
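If the YARN client is configured on the Infoworks edge node, you can also check the application's final status without opening the Resource Manager UI, for example:
yarn application -status application_1603205573328_0076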
You can also collect the same map-reduce application job log from the Infoworks DataFoundry edge node by running the below command as the user who starts Infoworks services.
yarn logs -applicationId <application_id>
For example:
yarn logs -applicationId application_1603205573328_0076 > application_log.txt
Share the application_log.txt file.
Refer to the below article on how to analyze the logs to find out the status of the map/reduce tasks, job status transitions, java exception messages, and any kind of informational or warning messages by tracing syslogs of applications.
http://hadooptutorial.info/tracing-logs-through-yarn-web-ui/
Take a look at the articles and solutions in the IW knowledge base (link below). New articles are added every week, and you can search for error messages to find solutions.
Search for a solution before you open a ticket.
https://infoworks.freshdesk.com/a/solutions
Pipeline build
How to collect and analyze the Pipeline build Job logs in Infoworks DataFoundry v2.x,v3.x (GCP,HDI,CDH,AZURE,EMR)
Infoworks DataFoundry pipeline build jobs might run with Hive or Spark as the underlying execution engine in v2.x,v3.x (Azure, HDI, CDH, GCP, EMR).
If the pipeline build fails, collect and analyze the below logs.
a) Download the pipeline build job log from Infoworks DataFoundry web UI as shown below.
Hive Execution Engine
b) If the execution engine is Hive and the pipeline build job fails while running a Tez job, look for the below message in the pipeline build job_<job_id>.log file.
UUID:dfdb15a4-2bf9-4e3f-a002-a192c0781fee INFO : Status: Running (Executing on YARN cluster with App id application_1603205573328_1198)
Log in to the Hadoop Resource Manager, search for the above application id, open the logs, and check for error messages.
You can also collect the same application job log from the Infoworks DataFoundry edge node by running the below command as the user who starts Infoworks services.
yarn logs -applicationId <application_id>
For example:
yarn logs -applicationId application_1603205573328_1198 > application_log.txt
Share the application_log.txt file.
Refer to the below article for more information on how to analyze the Tez application.
https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App
Spark Execution Engine
If the execution engine for the pipeline build job is Spark and the underlying Spark job is failing, get the Spark application job log from the Resource Manager page, or run the below command from the Infoworks Edge node to get the Spark application id.
yarn application -list -appStates ALL | grep <pipeline_build_job_id>
You would see the output as shown below.
application_1600952462285_19507 DF_InteractiveApp_b2538de45dc7ddc81f1d8cce SPARK ceinfoworks default FINISHED SUCCEEDED 100% cs-internal-workernode3.c.gcp-cs-shared-resources.internal:18081/history/application_1600952462285_19507/1
Get the application_id and run the below yarn command to get the Spark application job log.
yarn logs -applicationId application_1600952462285_19507
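If you prefer to do this in one step, the two commands can be combined. This is only a sketch: it assumes the application id is the first column of the yarn application -list output (as in the example above) and that a single line matches, and the output file name spark_application_log.txt is just an example.
app_id=$(yarn application -list -appStates ALL | grep <pipeline_build_job_id> | awk '{print $1}' | head -n 1)
yarn logs -applicationId "$app_id" > spark_application_log.txt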
Take a look at the articles and solutions in the IW knowledge base (link below). New articles are added every week, and you can search for error messages to find solutions.
Search for a solution before you open a ticket.
https://infoworks.freshdesk.com/a/solutions
Pipeline Interactive/sample data generation failure
Whenever you run a preview data job, or when you see errors on the Pipeline Editor page while creating a pipeline in editor mode, an interactive job is executed in the background. You can collect its logs from the below locations on the DataFoundry Edge node.
i) $IW_HOME/logs/df/job_<job_id>
Note: The sample data generation job will not have a job id shown in the UI. Check the latest log file generated in the above directory.
ii) $IW_HOME/logs/dt/<interactive.out> (If it exists)
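Because the sample data generation job does not show a job id in the UI, one way to locate the most recent interactive job log is to sort the log directory by modification time (a sketch, assuming $IW_HOME is set in your shell):
ls -lt $IW_HOME/logs/df/ | head -n 5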
Workflow Failure logs
a) Whenever a workflow run fails at a particular node, select the failed run, go to the failed node, click on View Task Logs, copy the content into a text file, and check for ERROR messages.
b) If an ingest node or a pipeline build node fails during the job execution, also collect the job logs along with the task log by clicking on View Job Logs as shown below.
This takes you to the ingestion/pipeline job log page; click Download to collect the job logs.
c) Along with the above logs, share the below logs from the DataFoundry Edge node (a sample command to bundle them follows the list).
$IW_HOME/orchestrator-engine/airflow-scheduler.log
$IW_HOME/orchestrator-engine/airflow-webserver.err
$IW_HOME/orchestrator-engine/airflow-webserver.out
$IW_HOME/orchestrator-engine/airflow-worker.out
$IW_HOME/orchestrator-engine/airflow-scheduler.out
$IW_HOME/orchestrator-engine/airflow-worker.err
$IW_HOME/orchestrator-engine/airflow-scheduler.err
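To share all of the above orchestrator logs at once, you can bundle them into a single archive. This is only a sketch, and the archive path /tmp/orchestrator_logs.tar.gz is just an example:
tar -czf /tmp/orchestrator_logs.tar.gz -C $IW_HOME/orchestrator-engine airflow-scheduler.log airflow-scheduler.out airflow-scheduler.err airflow-webserver.out airflow-webserver.err airflow-worker.out airflow-worker.err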
Cube build jobs
i) Download the cube build job logs from the UI using the Download Logs option as shown below.
OR
Collect the job log from the below location on the Edge node.
$IW_HOME/logs/job/job_<job_id>
Along with the job log, share the below logs.
ii) $IW_HOME/cube-engine/logs/iw_cube.log
iii) $IW_HOME/cube-engine/logs/iw_cube.out
iv) $IW_HOME/logs/cube-engine/access-server.log
Scheduled Jobs not getting triggered in Infoworks DataFoundry
i) The Platform service is responsible for running the scheduled jobs (Workflows, Ingestion, Pipeline build).
ii) If the scheduled workflows or the Ingestion & Pipeline jobs are not running on time, check the below log.
$IW_HOME/platform/logs/scheduler-service.log
iii) Look for messages as shown below to check if the workflows are getting triggered.
[QuartzScheduler_Worker-1] job.IWSingleJob:19 : Executing command: /opt/infoworks/apricot-meteor/infoworks_python/infoworks/cli/infoworks.sh workflow-execute --workflow-action run --workflow Ingest_Tables --domain gd_domain -at AU66mMKxL9fEwHEUe9cMnlK9RTDNLkIfp7vcYdT9cSs=
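For a quick check from the edge node, you can filter the scheduler log for these trigger messages (assuming $IW_HOME is set in your shell):
grep "Executing command" $IW_HOME/platform/logs/scheduler-service.log | tail -n 20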
Infoworks DataFoundry Service logs
For Data Foundry Versions v2.4.x, v2.5.x, v2.6.x
Locations for all the Service Logs in IWX DataFoundry
For Data Foundry Versions v2.7.x,v2.8.x,v2.9.x,v3.x,v4.x,v5.x
Locations for all the Service Logs in IWX DataFoundry
Starting from IWX DataFoundry v2.7.x, the following new IWX services have been added:
Platform
Configuration service
Notification Service
Ingestion Service (Only for DataFoundry on Databricks v3.x,v4.x)
How to Collect/Analyze the Infoworks DataFoundry Export Job logs
Export to a delimited file
For export jobs, the job can be divided into two stages, and the exact cause of a failure can usually be located based on the stage in which the job failed.
Pre-MR job stage: If no MR job was launched by DataFoundry, as shown in the below figure, the job failed in this stage.
In this case, download the logs and search the job_<jobid>.log file for messages with the tag [ERROR] to find the exact cause of the failure.
If job_<jobid>.log is empty or uninformative, search the job_<jobid>.out file for error messages or exceptions.
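For example, assuming the downloaded log files follow the job_<jobid> naming shown above, you can locate the tagged errors and any exceptions with:
grep -n "\[ERROR\]" job_<jobid>.log
grep -n -iE "error|exception" job_<jobid>.out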
MR-Job submission and post-submission stage: If MR jobs were launched by DataFoundry, as shown in the below figure, the job failed in this stage.
In this case, reach out to the Hadoop admin to download the MR job logs for the job id shown in the Identifier column of the above image. You can also get the corresponding map-reduce application id from the job_<id>.log file by searching for the below message in the log.
The URL to track the job: http://cs-internal-headnode.c.gcp-cs-shared-resources.internal:8088/proxy/application_1603205573328_0076/
After this, look for messages with the tag ERROR, following the steps mentioned in stage 1, to find the exact cause of the failure.
Once the error message is found, search for its keywords in the Infoworks Freshdesk Knowledge base to see if it has been addressed before.
Export to BigQuery: Refer to the below doc to collect the BigQuery audit log:
https://support.infoworks.io/a/solutions/articles/14000100766
How to collect and analyze the TPT logs for the Teradata TPT ingestion jobs in DataFoundry
Here is a Knowledge Base article on how to collect the TPT logs for a Teradata ingestion job:
https://infoworks.freshdesk.com/a/solutions/articles/14000105966