Job-Specific Logs for Infoworks DataFoundry v3.x, v4.x, v5.x on Databricks, GCP, and EMR


Ingestion

How to collect and analyze the ingestion job logs in Infoworks DataFoundry v3.x, v4.x, v5.x on Databricks, GCP, and EMR


Infoworks DataFoundry on Databricks submits a Spark job during the ingestion process. If the ingestion job fails, the failure can occur at either of two points:


  1. At the Infoworks end, while dispatching the Spark job (the Infoworks driver program)


  2. In the Spark job itself, after it has been dispatched to the cluster


Steps to collect and analyze the job logs


i) Download the ingestion job log from the Infoworks DataFoundry web UI as shown below.


Note: The same job log can also be found on the Infoworks edge node at the location mentioned below.


$IW_HOME/logs/job/job_<job_id>
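If you are working directly on the edge node, a minimal Python sketch such as the one below lists the .log and .out files in that job directory and prints the tail of each. The job ID and the default IW_HOME path used here are placeholder assumptions, not values from your installation.

import os
from pathlib import Path

# Assumed default install path and a hypothetical job id; adjust both for your environment.
iw_home = Path(os.environ.get("IW_HOME", "/opt/infoworks"))
job_id = "12345"
job_dir = iw_home / "logs" / "job" / f"job_{job_id}"

# Print the last 20 lines of every .log/.out file in the job directory.
for log_file in sorted(job_dir.glob("*")):
    if log_file.suffix in (".log", ".out"):
        print(f"===== {log_file.name} =====")
        lines = log_file.read_text(errors="replace").splitlines()
        print("\n".join(lines[-20:]))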




ii) If the failure is in the Spark job, click the Download icon below the Status tab to get the Databricks job log.


iii) Open the Infoworks job log and look for related errors in the .log and .out files. Search for the keyword “ERROR” and examine the stack trace, along with roughly 10 lines before the ERROR message, to understand the flow of the job. A scripted version of this search is sketched below.
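If you prefer to script the search, here is a minimal Python sketch that scans a downloaded log for the keyword “ERROR” and prints roughly 10 preceding lines of context for each match. The file name is a placeholder for whichever .log or .out file you downloaded.

from collections import deque

LOG_FILE = "job_12345.log"   # placeholder: point this at the downloaded .log or .out file
KEYWORD = "ERROR"
CONTEXT_LINES = 10           # number of preceding lines to show for each match

def scan_log(path: str) -> None:
    context = deque(maxlen=CONTEXT_LINES)   # rolling window of the most recent lines
    with open(path, errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            if KEYWORD in line:
                print(f"----- {KEYWORD} at line {line_number} -----")
                print("".join(context) + line, end="")
            context.append(line)

scan_log(LOG_FILE)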


iv) Similarly, open the downloaded Databricks job log and search for the keyword “ERROR”.


v) Check the articles and solutions in the Infoworks knowledge base at the link below; new articles are added every week. You can search for the error message and often find a solution before opening a ticket.


https://infoworks.freshdesk.com/a/solutions


Pipeline build

How to collect and analyze the pipeline build job logs in Infoworks DataFoundry v3.x, v4.x, v5.x on Databricks


Infoworks DataFoundry on Databricks submits a Spark job during the pipeline build. If the job fails, the failure can occur at either of two points:


  1. At the Infoworks end, while dispatching the Spark job (the Infoworks driver program)


  2. In the Spark job itself, after it has been dispatched to the cluster


i) Download the pipeline build job log from the UI as shown below.


The same pipeline build job log can also be found on the edge node at $IW_HOME/logs/job/job_<job_id>.


ii) If the failure is in the Spark job, click the Download icon below the Status tab to get the Databricks job log.


iii) Open the Infoworks job log and look for related errors in the .log and .out files. Search for the keyword “ERROR” and examine the stack trace, along with roughly 10 lines before the ERROR message, to understand the flow of the job.


iv) Similarly, open the downloaded Databricks job log and search for the keyword “ERROR”.


v) Interactive jobs (such as sample data generation, browsing the source schema to onboard more tables, and file preview jobs for file-based sources) log their messages in $IW_HOME/logs/ingestion/ingest.log and $IW_HOME/logs/ingestion/ingest.out.


The above-mentioned jobs use the interactive cluster that you set up after the Infoworks install. If any of these jobs fail, look for the messages below in the interactive log to get the cluster ID and the run ID, then go to the respective cloud environment (GCP, EMR, or Databricks on AWS/Azure) to retrieve the corresponding cluster job log. A sketch for extracting these IDs follows the log excerpt below.


Submitting an application with the following configuration:

{

  "existing_cluster_id" : "0928-075015-nits385",



[INFO] 2021-10-07 09:41:39,800 [pool-8-thread-1] io.infoworks.platform.job.dispatcher.core.cluster.impl.databricks.DatabricksInteractiveCluster:88 :: Started databricks run with id: 34783
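Based on the log lines shown above, a minimal Python sketch like the following can pull the existing_cluster_id and the Databricks run ID out of the interactive log so you can look up the run in your workspace. The log path is a placeholder: point it at the interactive log location on your edge node.

import re

LOG_PATH = "interactive.log"   # placeholder: use the interactive log path on your edge node

cluster_pattern = re.compile(r'"existing_cluster_id"\s*:\s*"([^"]+)"')
run_pattern = re.compile(r"Started databricks run with id:\s*(\d+)")

# Scan the log once and print every cluster id and run id that appears.
with open(LOG_PATH, errors="replace") as handle:
    for line in handle:
        match = cluster_pattern.search(line)
        if match:
            print("cluster id:", match.group(1))
        match = run_pattern.search(line)
        if match:
            print("run id:", match.group(1))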


vi) Check the articles and solutions in the Infoworks knowledge base at the link below; new articles are added every week. You can search for the error message and often find a solution before opening a ticket.


https://infoworks.freshdesk.com/a/solutions


Pipeline Interactive/sample data generation failure


Whenever you run a preview data job, or when you see errors on the Pipeline Editor page while creating a pipeline in editor mode, an interactive job is executed in the background. You can collect its logs from the locations below on the DataFoundry edge node.


i) $IW_HOME/logs/df/job_<job_id>


Note: Sample data generation does not show a job ID in the UI, so check the most recently generated log file in the above directory (see the sketch after this list).


ii) $IW_HOME/logs/dt/interactive.out (if it exists)
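Because the sample data job has no job ID in the UI, a small Python sketch like the one below can locate the most recently written job log under $IW_HOME/logs/df/. The default IW_HOME path is an assumption; adjust it for your installation.

import os
from pathlib import Path

# Assumed default install path; adjust IW_HOME for your environment.
iw_home = Path(os.environ.get("IW_HOME", "/opt/infoworks"))
df_logs = iw_home / "logs" / "df"

# Pick the job_<id> entry that was modified most recently.
latest = max(df_logs.glob("job_*"), key=lambda p: p.stat().st_mtime, default=None)
print("Most recent interactive/sample data log:", latest)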


Workflow Failure logs


a) Whenever a workflow run fails at a particular node, select the failed run, go to the failed node, click View Task Logs, copy the content into a text file, and check it for ERROR messages.



b) If an ingest node or a pipeline build node fails during job execution, also collect the job logs along with the task log by clicking View Job Logs as shown below.



This takes you to the ingestion/pipeline job log page, where you can click Download to collect the job logs.



c) Share the following logs from the DataFoundry edge node along with the logs above (a sketch for bundling them follows the list).


$IW_HOME/orchestrator-engine/airflow-scheduler.log

$IW_HOME/orchestrator-engine/airflow-webserver.err

$IW_HOME/orchestrator-engine/airflow-webserver.out

$IW_HOME/orchestrator-engine/airflow-worker.out

$IW_HOME/orchestrator-engine/airflow-scheduler.out

$IW_HOME/orchestrator-engine/airflow-worker.err

$IW_HOME/orchestrator-engine/airflow-scheduler.err
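To make it easier to attach these files to a ticket, a minimal Python sketch along the following lines can bundle whatever orchestrator logs exist on the node into a single archive. The default IW_HOME path and the archive name are assumptions.

import os
import tarfile
from pathlib import Path

# Assumed default install path; adjust IW_HOME for your environment.
iw_home = Path(os.environ.get("IW_HOME", "/opt/infoworks"))
engine_dir = iw_home / "orchestrator-engine"

log_names = [
    "airflow-scheduler.log", "airflow-scheduler.out", "airflow-scheduler.err",
    "airflow-webserver.err", "airflow-webserver.out",
    "airflow-worker.out", "airflow-worker.err",
]

# Add every log that exists on this node to orchestrator-logs.tar.gz.
with tarfile.open("orchestrator-logs.tar.gz", "w:gz") as bundle:
    for name in log_names:
        path = engine_dir / name
        if path.exists():
            bundle.add(path, arcname=name)

print("Wrote orchestrator-logs.tar.gz")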



Locations for all the Service Logs in IWX DataFoundry on Databricks


Starting with IWX DataFoundry v2.7.x, the following new IWX services have been added:


Platform

Configuration service

Notification Service

Ingestion Service (Only for DataFoundry on Databricks v3.x,v4.x)


Service logs and their locations:


Service info for UI: $IW_HOME/logs/apricot/apricot-out.log

Service info for Governor: $IW_HOME/logs/governor/governor.log

Service info for Hangman: $IW_HOME/logs/hangman/hangman.log

Service info for Rest API: $IW_HOME/logs/rest-api/iw-rest-api.log

Service info for DT (Data Transformation): $IW_HOME/logs/dt/interactive.out

Service info for Monitoring Service: $IW_HOME/logs/monitor/monitor.log

Service info for Postgres: $IW_HOME/logs/pgsql.log

Service info for Orchestrator Web Server: $IW_HOME/logs/orchestrator/orchestrator.log

Service info for Orchestrator Engine Web Server: $IW_HOME/orchestrator-engine/airflow-webserver.log

Service info for Orchestrator Engine Scheduler: $IW_HOME/orchestrator-engine/airflow-scheduler.log

Service info for Orchestrator Engine Worker: $IW_HOME/orchestrator-engine/airflow-worker.log

Service info for RabbitMQ: $IW_HOME/logs/rabbit*.log

Service info for Nginx: $IW_HOME/resources/Nginx-portable/logs/*.log

Service info for MongoDB: $IW_HOME/logs/mongod.log

Service info for Platform Services: $IW_HOME/logs/platform/platform-server.log

Service info for Notification User Consumer: $IW_HOME/logs/platform/user-consumer.log

Service info for Notification Artifact Consumer: $IW_HOME/logs/platform/artifact-consumer.log

Service info for Configuration Service: $IW_HOME/logs/platform/config-service.log

Service info for Ingestion Service (only for DataFoundry on Databricks v3.x, v4.x): $IW_HOME/logs/ingestion/ingestion.log
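As a quick check before gathering files, a Python sketch like the one below walks a subset of the locations in the list above, reports which logs exist on the edge node, and shows when each was last written. The default IW_HOME path is an assumption; adjust it for your installation.

import glob
import os
import time

# Assumed default install path; adjust IW_HOME for your environment.
iw_home = os.environ.get("IW_HOME", "/opt/infoworks")

# A subset of the service log locations listed above.
SERVICE_LOGS = {
    "UI": "logs/apricot/apricot-out.log",
    "Governor": "logs/governor/governor.log",
    "Rest API": "logs/rest-api/iw-rest-api.log",
    "DT (interactive)": "logs/dt/interactive.out",
    "Monitoring": "logs/monitor/monitor.log",
    "Postgres": "logs/pgsql.log",
    "RabbitMQ": "logs/rabbit*.log",          # wildcard: may match several files
    "MongoDB": "logs/mongod.log",
    "Platform Services": "logs/platform/platform-server.log",
}

for service, relative in SERVICE_LOGS.items():
    pattern = os.path.join(iw_home, relative)
    matches = glob.glob(pattern)
    if not matches:
        print(f"{service:20s} MISSING  ({pattern})")
        continue
    for path in matches:
        modified = time.strftime("%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(path)))
        print(f"{service:20s} {path}  (last written {modified})")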