How to extract filename or part of file name using Data Transformation? : Infoworks

As an example below are the instructions to ingest a file with file name appended with a timestamp and use the Datafoundry pipeline to create a table with a new column by extracting timestamp from the file name.

Step 1: Create a Structured File source, configure source, and perform an initialize and ingest.

Ref: https://docs2x.infoworks.io/datafoundry-3.1/data-ingestion/file-ingestion-process

Step 2: Create a new pipeline

Ref: https://docs2x.infoworks.io/datafoundry-3.1/data-transformation/creating-pipeline

Step 3: Open Pipeline editor, drag the ingested data lake table ,two ‘derive’ nodes and a ‘target’ node and connect all of them as below

Step 4: Double click on the table and click ‘Inputs’ tab

Step 5: Select ‘ZIW_FILENAME’ and click ‘Include Columns’ and close the table window.

‘ZIM_FILENAME’ is a metadata column created during initialize and ingest

Assuming the source file is in the format: data_20200724-060102.csv we will do the next derivations using Infoworks derive expression.

Step 6: Double click on the first ‘Derive node’, click ‘Add Derivation’ and enter column name , example ‘timestamp_on_file’ and expression as substr(ZIW_FILENAME,-19)

Click Save

Step 7: Double click on the first ‘Derive node’, click ‘Add Derivation’ and enter column name, ex ‘timestamp and expression as from_unixtime(unix_timestamp(timestamp_on_file,'yyyyMMdd-hhmmss'))

Click Save

Step 8:Double click on Target and click edit properties to configure target table properties.

On the ‘INPUT’ tab, you may remove any extra columns that you want to drop that were created on step 6 and step 7

Step 9: Build the pipeline

On each transformation node, we can use the ‘Preview Data’ tab to preview the output from the transformation even before building the pipeline.

Applicable Infoworks DataFoundry Versions:

v.2.x,3.x

How to extract filename or part of file name using Data Transformation? Print

Related Articles