Problem Description :

The CSV ingestion job fails with the following error in the ingestion job log:


[ERROR] 2020-05-19 12:03:58,169 [pool-6-thread-4] infoworks.tools.hadoop.mapreduce.IWJob:133 :: Exception occured : org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://<path>

    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)

    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)

    at infoworks.discovery.filecrawler.generic.csv.inputformat.FlatFileToHiveInputFormat.getSplits(FlatFileToHiveInputFormat.java:208)

    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)

    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)

    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)

    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)

    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:422)


Root cause :

This issue occurs when the CSV files are missing from the local source directories on the Infoworks DataFoundry edge node, so the job finds no input path to read.


Solution :

Place the CSV files in the configured source directory on the edge node and rerun the job.
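Before rerunning, it can help to confirm that the CSV files are actually present in the source directory. The sketch below is a generic shell helper for that check; the function name and directory paths are illustrative and not part of Infoworks itself.

```shell
#!/bin/sh
# check_csvs DIR: report how many CSV files exist in DIR.
# Returns non-zero when no CSV files are found, so it can gate a job rerun.
check_csvs() {
    dir="$1"
    # Count matching files; suppress the error ls prints when the glob matches nothing.
    count=$(ls "$dir"/*.csv 2>/dev/null | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "No CSV files in $dir; place the source files before rerunning the ingestion job"
        return 1
    fi
    echo "Found $count CSV file(s) in $dir"
}
```

Example usage on the edge node (replace the path with your configured source base path):

```shell
check_csvs /opt/infoworks/source/csv_demo && echo "OK to rerun the ingestion job"
```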


Applicable Infoworks versions : 

IWX 2.8.x, 2.9.x, 3.1.x