Problem Description:


A CDC job for an Oracle source fails with the following socket read timeout error in the job log:


AttemptID:attempt_1522528327194_0004_m_000018_1 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Socket read timed out
	at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
	at infoworks.tools.filecrawler.generic.inputformat.oracle.LogBasedCDCInputFormat.setConf(LogBasedCDCInputFormat.java:169)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Socket read timed out
	at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
	at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)


Root Cause:


This issue occurs when the mappers that execute the Oracle query to read from the temporary table time out. Look for a SELECT query similar to the one below in the job log file and verify whether it is the statement that is timing out.


SELECT MIN(scn), MAX(scn) FROM (select * from (select rs_id,scn,table_name,op as operation,CAST(cdctimestampcol as TIMESTAMP) as cdcdate,CAST(cdctimestampcol as TIMESTAMP) as cdctimestamp,status,dbms_xmlgen.convert(xmlagg(XMLELEMENT(E,sql_redo||'') order by ROWNUM ASC).extract('//text()').getclobval() ,1) as sql_redo,seg_owner,ssn from ( SELECT rs_id,scn,table_name,operation as op,CAST(tscolumn as TIMESTAMP) as cdctimestampcol,status,sql_redo,seg_owner,ssn FROM "INFOWORKS".LOG_d74829bbcc301fff9e03ef0c where seg_owner = 'HOLDINGDBO' AND table_name = 'POSITION_DETAIL' AND CAST(tscolumn as TIMESTAMP) <= to_timestamp('2018-03-30 22:59:43','YYYY-MM-DD HH24:MI:SS.FF')) GROUP BY rs_id,scn,table_name,op,cdctimestampcol,status,seg_owner,ssn) log_table where   (1 = 1) ) t1
[INFO] 2018-03-31 17:44:24,973 [pool-5-thread-1] org.apache.hadoop.mapreduce.JobSubmitter:198 :: number of splits:30
[INFO] 2018-03-31 17:44:25,370 [pool-5-thread-1] org.apache.hadoop.mapreduce.JobSubmitter:287 :: Submitting tokens for job: job_1522528327194_0004
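To check how long the range query itself takes, you can run a simplified version of it manually against the source database, for example in SQL*Plus with timing enabled. The sketch below is an assumption for illustration only: it reuses the temporary table name and predicate values from the log excerpt above, which will differ for each run.

```sql
-- Illustrative only: simplified form of the MIN/MAX SCN range query
-- from the job log. Table name and predicate values are taken from
-- the excerpt above and will be different in your environment.
SET TIMING ON

SELECT MIN(scn), MAX(scn)
  FROM "INFOWORKS".LOG_d74829bbcc301fff9e03ef0c
 WHERE seg_owner  = 'HOLDINGDBO'
   AND table_name = 'POSITION_DETAIL'
   AND CAST(tscolumn AS TIMESTAMP) <=
       TO_TIMESTAMP('2018-03-30 22:59:43', 'YYYY-MM-DD HH24:MI:SS.FF');
```

If this query takes longer than the JDBC read timeout configured for the mappers, the job will fail with the socket read timeout shown above.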


Solution: Add the configuration entry create_index_on_temp_table with the value true at the Source level (Source Configurations) and rerun the job. This creates an index on the temporary table, which speeds up the range query and should resolve the socket read timeout issue.
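Conceptually, the setting causes an index to be created on the columns used in the WHERE clause of the range query. The statement below is only an illustrative sketch: the actual index name and column list chosen by create_index_on_temp_table are internal to Infoworks.

```sql
-- Illustrative only: an index on the filter columns of the range query
-- (seg_owner, table_name, tscolumn). The real index created by
-- create_index_on_temp_table is Infoworks-internal and may differ.
CREATE INDEX log_tmp_idx
    ON "INFOWORKS".LOG_d74829bbcc301fff9e03ef0c (seg_owner, table_name, tscolumn);
```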


Version of Infoworks: 2.3.3