Problem Description:


A pipeline running in cluster mode with Spark as the execution engine fails with the error below in the job log.



Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1064141559-10.45.32.95-1593507852692:blk_1073765565_24742 file=/user/infoworks-user/dt/dt_path_checksum.txt
    at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1053)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Reader.read(Reader.java:140)


Root Cause:


This issue occurs when the Hadoop cluster is on a multi-homed network and the related configuration is incorrect. The NameNode returns the internal IP address of the DataNode holding the block, and the HDFS client cannot reach that address, so the block read fails with BlockMissingException even though the block itself is intact.
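
A quick way to check whether the block is actually healthy (i.e., the failure is network-side rather than genuine block loss) is to run fsck against the affected file; the path below is taken from the error above:

    hdfs fsck /user/infoworks-user/dt/dt_path_checksum.txt -files -blocks -locations

If fsck reports the file as HEALTHY but the reported block locations are internal IP addresses that are not routable from the client, the multi-homed misconfiguration described above is the likely cause.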


Solution:


Set dfs.client.use.datanode.hostname=true in the custom HDFS site properties on the cluster, restart the HDFS service, and then rerun the Spark pipeline. This should resolve the issue.
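
For reference, the property goes into hdfs-site.xml (typically through the cluster manager's custom/safety-valve HDFS configuration). A minimal sketch of the entry:

    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>

With this set, the HDFS client connects to DataNodes by hostname instead of the IP address returned by the NameNode, allowing DNS to resolve to an address the client can actually reach. If an HDFS restart is not immediately possible, the same client-side behavior can usually be enabled for a single job through Spark's Hadoop configuration passthrough, e.g. --conf spark.hadoop.dfs.client.use.datanode.hostname=true on spark-submit.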


Applicable Infoworks DataFoundry Versions:

v2.8.x, v2.9.x, v3.x