ORC tables are not visible under the Sources tab when creating pipelines with Spark as the execution engine.
This is due to a compatibility limitation between ORC tables and Spark pipelines, which is fixed in Infoworks ADE v2.5.4.
Starting from Infoworks v2.5.4, support for ORC tables has been added to Spark pipelines. Newly ingested tables require no changes, but tables that were already crawled (that is, tables ingested before the upgrade) must be migrated.
A script, support_spark_pipelines.sh, is available in the $IW_HOME/bin/migration_support_spark_pipelines/ folder.
This script takes two or more arguments and converts the specified tables to the new Hive table structure required to build Spark pipelines.
Steps to run the script are provided below.
1. Log in to the Infoworks edge node.
2. Navigate to the $IW_HOME/bin folder and source the env.sh file.
3. Run the migration script as shown below. The script accepts two or more parameters, depending on the number of tables to be migrated.
./support_spark_pipelines.sh <auth_token> <table_id> <table_id>
./support_spark_pipelines.sh <auth_token> yshs7hd82bd92vdjbdhxsd 73bdhw4hcbswhbd6nbwc
- <auth_token> is the auth token of the user performing the migration to the new Hive table structure. This auth token must be URL-encodable.
- yshs7hd82bd92vdjbdhxsd is the ID of the first table to be migrated.
- 73bdhw4hcbswhbd6nbwc is the ID of the second table to be migrated.
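Since the auth token must be URL-encodable, it can help to URL-encode it explicitly before passing it to the script. The sketch below is one way to do this using Python's standard library on the edge node; the token value and table IDs are placeholders, and the availability of python3 is an assumption.

```shell
#!/bin/sh
# Placeholder raw auth token -- replace with the actual token of the migrating user.
AUTH_TOKEN='abc+def/ghi=='

# URL-encode the token (assumption: python3 is available on the edge node).
# safe='' ensures characters like '/', '+', and '=' are all percent-encoded.
ENCODED_TOKEN=$(python3 -c "import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=''))" "$AUTH_TOKEN")

echo "$ENCODED_TOKEN"   # abc%2Bdef%2Fghi%3D%3D

# The encoded token can then be passed to the migration script, e.g.:
# ./support_spark_pipelines.sh "$ENCODED_TOKEN" yshs7hd82bd92vdjbdhxsd 73bdhw4hcbswhbd6nbwc
```

If the raw token already contains only URL-safe characters, the encoded value is identical to the original and can be passed as-is.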
4. Restart the Data Transformation service after performing the above steps.
Versions prior to Infoworks ADE v2.5.4