PROBLEM DESCRIPTION


Refer to this article if an ingestion job fails during the crawling stage with the following error:

[ERROR] 2018-03-20 21:23:09,120 [main] infoworks.discovery.dbcrawler.rdbms.CrawlController:750 :: Error while creating crawl Threads.
java.lang.NoClassDefFoundError: parquet/hadoop/api/ReadSupport
    at infoworks.tools.format.FormatUtilsFactory.getFormatUtil(FormatUtilsFactory.java:48)
    at infoworks.tools.format.FormatUtilsFactory.getFormatUtil(FormatUtilsFactory.java:23)
    at infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread.<init>(CrawlWorkerThread.java:152)
    at infoworks.discovery.dbcrawler.rdbms.CrawlController.getCrawlWorkerThread(CrawlController.java:1285)
    at infoworks.discovery.dbcrawler.rdbms.CrawlController.crawlTable(CrawlController.java:1097)
    at infoworks.discovery.dbcrawler.rdbms.CrawlController.importDB(CrawlController.java:706)
    at infoworks.discovery.dbcrawler.rdbms.CrawlController.doCrawl(CrawlController.java:148)
    at infoworks.discovery.main.Main.startCrawl(Main.java:62)
    at infoworks.discovery.main.Main.main(Main.java:27)


CAUSE


This issue occurs when the Parquet-related JAR files are missing from the iw_jobs_classpath property.
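A NoClassDefFoundError for parquet/hadoop/api/ReadSupport typically means the JVM could not find parquet/hadoop/api/ReadSupport.class in any entry of the classpath. As an optional check before editing the configuration, the Python sketch below reads the active iw_jobs_classpath value and lists which JARs or directories on it contain that class. It assumes conf.properties uses plain key=value lines with no spaces around "="; the helper names active_property, expand_classpath and jars_providing are hypothetical and not part of Infoworks.

```python
import glob
import os
import zipfile

# Class file named in the NoClassDefFoundError (package path plus ".class").
MISSING_CLASS = "parquet/hadoop/api/ReadSupport.class"


def active_property(conf_path, key):
    """Return the value of the first uncommented `key=value` line, or None."""
    with open(conf_path) as conf:
        for line in conf:
            stripped = line.strip()
            # Commented lines start with '#', so they never match here.
            if stripped.startswith(key + "="):
                return stripped[len(key) + 1:]
    return None


def expand_classpath(classpath):
    """Split a colon-separated classpath into individual paths.

    An entry ending in '/*' is expanded to the .jar files directly inside
    that directory, similar to how the JVM treats classpath wildcards.
    """
    paths = []
    for entry in classpath.split(":"):
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("/*"):
            paths.extend(sorted(glob.glob(os.path.join(entry[:-2], "*.jar"))))
        else:
            paths.append(entry)
    return paths


def jars_providing(classpath, class_file=MISSING_CLASS):
    """Return the classpath entries (JARs or directories) containing class_file."""
    found = []
    for path in expand_classpath(classpath):
        if os.path.isdir(path):
            if os.path.isfile(os.path.join(path, class_file)):
                found.append(path)
        elif path.endswith(".jar") and os.path.isfile(path):
            try:
                with zipfile.ZipFile(path) as jar:
                    if class_file in jar.namelist():
                        found.append(path)
            except zipfile.BadZipFile:
                pass  # skip files that are not valid JAR/ZIP archives
    return found
```

For example, assuming $IW_HOME is /opt/infoworks, an empty result from jars_providing(active_property("/opt/infoworks/conf/conf.properties", "iw_jobs_classpath")) is consistent with the cause described above.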


WORKAROUND/RESOLUTION


Perform the following steps to resolve the issue:

  1. Go to $IW_HOME/conf ($IW_HOME refers to the Infoworks home directory).

  2. Open the conf.properties file and comment out the active iw_jobs_classpath entry.

  3. Uncomment the Parquet-specific iw_jobs_classpath entry, which is located under the following commented section:

        #The following commented iw_jobs_classpath is for parquet support
        #If enabled please ensure that only one configuration with the key iw_jobs_classpath is enabled

        #Ensure that all the paths in the classpath are correct including the path for hive jars
        #Also ensure any changes made to the original iw_jobs_classpath is copied here as well

  4. Add the following entries to the iw_jobs_classpath value you uncommented in step 3 (separate them from the existing value with a colon), and then run the ingestion job again:

        /opt/infoworks/lib/parquet-support/*:/usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar
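Steps 2 to 4 can also be scripted. The Python sketch below is one possible approach, assuming that each active entry is a single iw_jobs_classpath=... line and that the Parquet entry is the first #iw_jobs_classpath=... line after the marker comment quoted in step 3. Check your conf.properties against these assumptions before using it; enable_parquet_classpath and rewrite_conf are hypothetical helper names, not Infoworks tools.

```python
import shutil

KEY = "iw_jobs_classpath"
# Marker comment quoted in step 3.
MARKER = "#The following commented iw_jobs_classpath is for parquet support"
# Entries from step 4; adjust them if your installation paths differ.
EXTRA_ENTRIES = ("/opt/infoworks/lib/parquet-support/*:"
                 "/usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar")


def enable_parquet_classpath(lines, extra=EXTRA_ENTRIES):
    """Return a copy of conf.properties lines with the Parquet classpath enabled.

    Assumption (hypothetical layout): the Parquet entry is the first
    '#iw_jobs_classpath=...' line that follows MARKER.
    """
    marker_index = next(
        (i for i, line in enumerate(lines) if line.strip() == MARKER), None)
    if marker_index is None:
        raise ValueError("Parquet marker comment not found")
    parquet_index = next(
        (i for i in range(marker_index + 1, len(lines))
         if lines[i].strip().startswith("#" + KEY + "=")), None)
    if parquet_index is None:
        raise ValueError("Commented Parquet iw_jobs_classpath entry not found")

    result = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if i == parquet_index:
            # Steps 3 and 4: uncomment the entry and append the extra paths.
            base = stripped[1:].rstrip(":")
            separator = "" if base.endswith("=") else ":"
            result.append(base + separator + extra)
        elif stripped.startswith(KEY + "="):
            # Step 2: comment out every currently active entry, so only the
            # Parquet entry stays enabled.
            result.append("#" + stripped)
        else:
            result.append(line)
    return result


def rewrite_conf(path):
    """Save a backup as <path>.bak, then write the edited conf.properties."""
    with open(path) as conf:
        lines = conf.read().splitlines()
    updated = enable_parquet_classpath(lines)
    shutil.copyfile(path, path + ".bak")
    with open(path, "w") as conf:
        conf.write("\n".join(updated) + "\n")
```

Keeping the edit in a pure function over a list of lines makes the result easy to review (for example, by diffing the file against its .bak copy) before running the ingestion job again.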



APPLIES TO VERSION


IWX 2.3.0