Problem Description 


ORC tables are not visible under the Sources tab when creating pipelines with Spark as the execution engine.


Cause


This is caused by a compatibility limitation between ORC and Spark pipelines, which is fixed in Infoworks ADE v2.5.4.



WORKAROUND / RESOLUTION


Starting from Infoworks v2.5.4, ORC tables are supported in Spark pipelines. Newly ingested tables need no changes, but older tables that were already crawled (tables ingested in the past) must be migrated.


A script, support_spark_pipelines.sh, is available in the $IW_HOME/bin/migration_support_spark_pipelines/ folder.


The script takes two or more arguments (an auth token followed by the IDs of the tables to migrate) and converts those tables to the new Hive table structure required to build Spark pipelines.


Steps to run the script are provided below.


1. Log in to the Infoworks edge node.


2. Navigate to the $IW_HOME/bin folder and source the env.sh file.

    cd $IW_HOME/bin

    source env.sh


3. Run the migration script, which is located in the $IW_HOME/bin/migration_support_spark_pipelines/ folder, as shown below. The script takes two or more parameters, depending on how many tables need to be migrated.

   

    ./support_spark_pipelines.sh <auth_token> <table_id> <table_id>


For example,


    ./support_spark_pipelines.sh <auth_token> yshs7hd82bd92vdjbdhxsd 73bdhw4hcbswhbd6nbwc


where,

  • <auth_token> is the auth token of the user migrating the tables to the new Hive table structure. This auth token must be URL encodable.
  • yshs7hd82bd92vdjbdhxsd is the ID of the first table to be migrated.
  • 73bdhw4hcbswhbd6nbwc is the ID of the second table to be migrated.
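
When many tables need to be migrated, it can help to assemble and review the command before running it. The following is a minimal sketch, not part of Infoworks: build_migration_cmd is a hypothetical helper and MY_TOKEN is a placeholder for a real auth token. It only prints the command (a dry run) and enforces the minimum of two arguments described above.

```shell
# Hypothetical helper (not shipped with Infoworks): prints the migration
# command for an auth token and one or more table IDs without running it.
build_migration_cmd() {
  # The script needs at least two arguments: the auth token and one table ID.
  if [ "$#" -lt 2 ]; then
    echo "usage: build_migration_cmd <auth_token> <table_id> [<table_id> ...]" >&2
    return 1
  fi
  # "$*" joins the token and all table IDs with single spaces.
  printf '%s\n' "./support_spark_pipelines.sh $*"
}

# Dry run with a placeholder token and the two example table IDs.
build_migration_cmd MY_TOKEN yshs7hd82bd92vdjbdhxsd 73bdhw4hcbswhbd6nbwc
```

Once the printed command looks correct, it can be copied and run from the folder that contains the script.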


4. Restart the Data Transformation service after performing the above steps.
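
Before starting the procedure, a quick pre-flight check can confirm that the files named in this article are present. This is a sketch under stated assumptions: check_migration_prereqs is a hypothetical function, not an Infoworks utility, and it only tests for the paths mentioned above ($IW_HOME/bin/env.sh and the migration script folder).

```shell
# Hypothetical pre-flight check (not shipped with Infoworks).
# $1: the Infoworks home directory, normally the value of $IW_HOME.
check_migration_prereqs() {
  iw_home="$1"
  script_path="$iw_home/bin/migration_support_spark_pipelines/support_spark_pipelines.sh"
  if [ -z "$iw_home" ]; then
    echo "IW_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$iw_home/bin/env.sh" ]; then
    echo "missing $iw_home/bin/env.sh" >&2
    return 1
  fi
  if [ ! -f "$script_path" ]; then
    echo "missing $script_path" >&2
    return 1
  fi
  echo "prerequisites found"
}

# Report instead of aborting if something is missing.
check_migration_prereqs "${IW_HOME:-}" || echo "fix the issues above before migrating"
```

If the migration script is reported missing, the installation may be older than v2.5.4.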


APPLIES TO


Versions prior to Infoworks ADE v2.5.4