As an example, below are the instructions to ingest a file whose name is appended with a timestamp, and to use a DataFoundry pipeline to create a table with a new column by extracting the timestamp from the file name.


Step 1: Create a Structured File source, configure the source, and perform an initialize and ingest.


Ref: https://docs2x.infoworks.io/datafoundry-3.1/data-ingestion/file-ingestion-process


Step 2: Create a new pipeline 


Ref: https://docs2x.infoworks.io/datafoundry-3.1/data-transformation/creating-pipeline


Step 3: Open the pipeline editor, drag in the ingested data lake table, two 'Derive' nodes, and a 'Target' node, and connect them in sequence: table → first derive → second derive → target.



Step 4: Double click the table node and click the 'Inputs' tab.


Step 5: Select 'ZIW_FILENAME', click 'Include Columns', and close the table window.

'ZIW_FILENAME' is a metadata column, created during initialize and ingest, that holds the name of the source file each record was loaded from.


Assuming the source file name is in the format data_20200724-060102.csv, the next two steps create the derivations using Infoworks derive expressions.
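
For this sample name, the character counts are: 'data_' (5 characters), the timestamp '20200724-060102' (15 characters), and the extension '.csv' (4 characters). The last 19 characters of the file name therefore hold the timestamp plus the extension, which is why Step 6 below uses a substring offset of -19.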


Step 6: Double click the first 'Derive' node, click 'Add Derivation', and enter a column name (for example, 'timestamp_on_file') and the expression substr(ZIW_FILENAME,-19)

Click Save
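
Because the derive expression uses standard Spark SQL functions (an assumption based on the functions shown here, not something stated in the referenced documentation), the Step 6 expression can be sanity-checked outside DataFoundry by running the equivalent Spark SQL against the sample file name:

    SELECT substr('data_20200724-060102.csv', -19) AS timestamp_on_file;
    -- returns '20200724-060102.csv', the last 19 characters of the file name

Note that the '.csv' suffix is still present at this point; it is handled by the parsing in Step 7.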


Step 7: Double click the second 'Derive' node, click 'Add Derivation', and enter a column name (for example, 'timestamp') and the expression from_unixtime(unix_timestamp(timestamp_on_file,'yyyyMMdd-hhmmss'))

Click Save
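
The Step 7 expression can be checked the same way, supplying the Step 6 output as a literal. This is a minimal sketch assuming the expression is evaluated with Spark's legacy SimpleDateFormat-based parser (as in Spark 2.x), which matches the pattern at the start of the string and ignores the trailing '.csv':

    SELECT from_unixtime(unix_timestamp('20200724-060102.csv', 'yyyyMMdd-hhmmss')) AS derived_timestamp;
    -- returns '2020-07-24 06:01:02', rendered in the session time zone

If a stricter parser in your environment returns NULL because of the trailing text, the extension can be stripped first, for example with substr(ZIW_FILENAME,-19,15) in Step 6; 'HH' (24-hour) is also a safer hour pattern than 'hh' (12-hour) for times later than 12:59:59.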


Step 8: Double click the 'Target' node and click 'Edit Properties' to configure the target table properties.

On the 'INPUT' tab, you can remove any extra columns created in steps 6 and 7 that you do not want in the target table (for example, the intermediate 'timestamp_on_file' column).


Step 9: Build the pipeline 


On each transformation node, we can use the ‘Preview Data’ tab to preview the output from the transformation even before building the pipeline.



Applicable Infoworks DataFoundry Versions:

v2.x, v3.x