Problem Description :

The CSV ingestion job fails with the following error in the ingestion job log:

[ERROR] 2020-05-19 12:03:58,169 [pool-6-thread-4] :: Exception occured : org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://<path>
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(
    at infoworks.discovery.filecrawler.generic.csv.inputformat.FlatFileToHiveInputFormat.getSplits(
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(
    at org.apache.hadoop.mapreduce.Job$
    at org.apache.hadoop.mapreduce.Job$
    at Method)


Root cause :

This issue occurs when the CSV files do not exist in the local source directories on the Infoworks DataFoundry edge node, so the ingestion job finds no input path to read.

Solution :

Place the CSV files in the configured source directory on the edge node and rerun the ingestion job.
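Before rerunning the job, it can help to confirm that CSV files are actually present in the source directory on the edge node. The sketch below is a minimal, hedged example: SRC_DIR is a placeholder for whatever local path is configured for the source in DataFoundry (here a temporary directory with a sample file is created purely for illustration).

```shell
#!/bin/sh
# Placeholder for the local source directory configured for the CSV source.
# In a real check, set this to the actual path on the edge node.
SRC_DIR=$(mktemp -d)

# Simulate a CSV landing in the source directory (illustration only).
touch "$SRC_DIR/sample.csv"

# Count CSV files directly inside the source directory.
csv_count=$(find "$SRC_DIR" -maxdepth 1 -name '*.csv' | wc -l | tr -d ' ')

if [ "$csv_count" -ge 1 ]; then
    echo "OK: $csv_count CSV file(s) found in $SRC_DIR; safe to rerun the job."
else
    echo "WARNING: no CSV files in $SRC_DIR; the ingestion job will fail again."
fi
```

If the count is zero, copy the expected CSV files into the directory first; only then will rerunning the ingestion job succeed.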

Applicable Infoworks versions : 

IWX 2.8.x, 2.9.x, 3.1.x