Problem Description:


In IWX 4.4.0.1, ingestion of a CSV source whose files are located on an SFTP server fails when the target is a Google Cloud Storage (gs://) bucket. The job fails with the error "io.infoworks.saas.ingestion.commons.exception.IWException: io.infoworks.saas.ingestion.commons.exception.IWException: Error while writing data". A sample stack trace is shown below:


21/04/06 23:46:42 ERROR io.infoworks.connectors.files.csv.datacrawler.CSVDataCrawler: io.infoworks.saas.ingestion.commons.exception.IWException: io.infoworks.saas.ingestion.commons.exception.IWException: Error while writing data to gs://gcp-pres-demo-staging//iw/sources/member_360_org_b_gcp_schema/temp//opt/infoworks/sources/member_360/member360_org_b.csv
    at io.infoworks.saas.ingestion.commons.sftpclient.SFTPClient.copyFilesMethod(SFTPClient.java:161)
    at io.infoworks.saas.ingestion.core.connectors.file.datacrawler.AbstractFileDataCrawler.copyFilesFromSFTPServer(AbstractFileDataCrawler.java:178)
    at io.infoworks.saas.ingestion.core.connectors.file.datacrawler.AbstractFileDataCrawler.getFileDetailsMap(AbstractFileDataCrawler.java:162)
    at io.infoworks.connectors.files.csv.datacrawler.CSVDataCrawler.runJob(CSVDataCrawler.java:106)
    at io.infoworks.saas.ingestion.core.datacrawler.Crawler.crawl(Crawler.java:58)
    at io.infoworks.saas.ingestion.core.datacrawler.Crawler.runCrawlJob(Crawler.java:85)
    at io.infoworks.saas.ingestion.core.datacrawler.DistJobsDriver.submitCrawlJob(DistJobsDriver.java:82)
    at io.infoworks.saas.ingestion.core.datacrawler.DistJobsDriver.runJobs(DistJobsDriver.java:43)
    at io.infoworks.saas.ingestion.core.datacrawler.DistJobsDriver.main(DistJobsDriver.java:26)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.infoworks.saas.ingestion.commons.exception.IWException: Error while writing data to gs://gcp-pres-demo-staging//iw/sources/member_360_org_b_gcp_schema/temp//opt/infoworks/sources/member_360/member360_org_b.csv


Root cause:


This is a known limitation in Infoworks 4.4.0.1 that is fixed in later releases.


Solution:


This issue is fixed in later releases, and a patch is available for IWX v4.4.0.1. To apply the patch, follow these steps:

  • Back up core.jar and commons.jar from /opt/infoworks/lib/ingestion/core/.
  • Replace core.jar and commons.jar in /opt/infoworks/lib/ingestion/core/ with the jars attached to this article.
  • Change the permissions of both jars to 755 (run these commands from /opt/infoworks/lib/ingestion/core/):
                  chmod 755 core.jar
                  chmod 755 commons.jar
  • Rerun the ingestion job.
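The backup, replace, and chmod steps above can be scripted roughly as follows. This is a minimal POSIX shell sketch, not an official Infoworks tool: the function name `apply_iwx_patch`, the assumption that the attached jars have first been copied into a local directory (`PATCH_DIR`), and the default backup location under `$HOME` are hypothetical choices for illustration. It checks all input files before modifying anything and keeps the backups outside /opt/infoworks/lib/ingestion/core/ so no extra files sit next to the live jars.

```shell
#!/bin/sh
# Minimal sketch (not an official Infoworks tool) of the patch steps above.
# Hypothetical helper; arguments:
#   $1 PATCH_DIR   directory holding the patched core.jar and commons.jar (assumption)
#   $2 LIB_DIR     defaults to /opt/infoworks/lib/ingestion/core
#   $3 BACKUP_DIR  defaults to a timestamped directory under $HOME (assumption)
# The body runs in a subshell "( ... )" so "exit" only leaves the function.
apply_iwx_patch() (
  patch_dir="${1:?usage: apply_iwx_patch PATCH_DIR [LIB_DIR] [BACKUP_DIR]}"
  lib_dir="${2:-/opt/infoworks/lib/ingestion/core}"
  backup_dir="${3:-${HOME:-.}/iwx_jar_backup_$(date +%Y%m%d%H%M%S)}"

  # Check every input first so a missing file leaves LIB_DIR untouched.
  for jar in core.jar commons.jar; do
    for f in "$patch_dir/$jar" "$lib_dir/$jar"; do
      if [ ! -f "$f" ]; then
        echo "missing file: $f" >&2
        exit 1
      fi
    done
  done

  mkdir -p "$backup_dir" || exit 1
  for jar in core.jar commons.jar; do
    # Step 1: back up the current jar (-p keeps its mode and timestamps).
    cp -p "$lib_dir/$jar" "$backup_dir/$jar" || exit 1
    # Step 2: overwrite it with the patched jar.
    cp "$patch_dir/$jar" "$lib_dir/$jar" || exit 1
    # Step 3: set permissions to 755, matching the chmod commands above.
    chmod 755 "$lib_dir/$jar" || exit 1
  done
  echo "Patched jars installed in $lib_dir; originals saved in $backup_dir"
)
```

For example, after copying the attached jars to a directory such as /tmp/iwx_patch (a hypothetical location), run `apply_iwx_patch /tmp/iwx_patch` as a user with write access to /opt/infoworks/lib/ingestion/core/, then rerun the ingestion job. To roll back, copy core.jar and commons.jar from the reported backup directory into /opt/infoworks/lib/ingestion/core/.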


Applicable IWX versions:

IWX 4.4.0.1