As an example below are the instructions to ingest a file with file name appended with a timestamp and use the Datafoundry pipeline to create a table with a new column by extracting timestamp from the file name.
Step 1: Create a Structured File source, configure source, and perform an initialize and ingest.
Step 2: Create a new pipeline
Step 3: Open Pipeline editor, drag the ingested data lake table ,two ‘derive’ nodes and a ‘target’ node and connect all of them as below
Step 4: Double click on the table and click ‘Inputs’ tab
Step 5: Select ‘ZIW_FILENAME’ and click ‘Include Columns’ and close the table window.
‘ZIM_FILENAME’ is a metadata column created during initialize and ingest
Assuming the source file is in the format: data_20200724-060102.csv we will do the next derivations using Infoworks derive expression.
Step 6: Double click on the first ‘Derive node’, click ‘Add Derivation’ and enter column name , example ‘timestamp_on_file’ and expression as substr(ZIW_FILENAME,-19)
Step 7: Double click on the first ‘Derive node’, click ‘Add Derivation’ and enter column name, ex ‘timestamp and expression as from_unixtime(unix_timestamp(timestamp_on_file,'yyyyMMdd-hhmmss'))
Step 8:Double click on Target and click edit properties to configure target table properties.
On the ‘INPUT’ tab, you may remove any extra columns that you want to drop that were created on step 6 and step 7
Step 9: Build the pipeline
On each transformation node, we can use the ‘Preview Data’ tab to preview the output from the transformation even before building the pipeline.