Description:


Windows Azure Storage Blob (WASB) is a file system driver implemented as an extension on top of the HDFS APIs, and it behaves in many ways like HDFS.


Perform the following steps to ingest CSV files located in private Azure storage containers.



a) Configure the Azure Blob Storage settings as follows:

  • In the Source Settings page, under the Source Configuration section, select the host type as From Hadoop Cluster.
  • Prefix the Source Base Path with the wasb protocol and container address as follows: wasb://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/
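The base path composition above can be sketched as a small shell snippet; the container and storage account names below are placeholders, not real values — substitute your own:

```shell
# Placeholder values -- replace with your container and storage account names.
CONTAINER_NAME="mycontainer"
STORAGE_ACCOUNT_NAME="mystorageaccount"

# The Source Base Path is the wasb protocol plus the container address.
SOURCE_BASE_PATH="wasb://${CONTAINER_NAME}@${STORAGE_ACCOUNT_NAME}.blob.core.windows.net/"
echo "${SOURCE_BASE_PATH}"
```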




b) If the source container's access type is set to Private in Azure, the following configuration must be added in the custom core-site.xml file:

  • Key: fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
  • Value: Container access key (available in the Access keys section of the Azure storage account).
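As a sketch, the resulting property in the custom core-site.xml would look like the following; `<STORAGE_ACCOUNT_NAME>` follows the document's placeholder convention, and the value shown is a dummy key:

```xml
<property>
  <name>fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net</name>
  <!-- Dummy placeholder: paste the access key copied from the Azure portal -->
  <value>YOUR_STORAGE_ACCOUNT_ACCESS_KEY</value>
</property>
```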


c) Log in to the Infoworks edge node and look for the wasb entries in the core-site.xml file present in the Hadoop client configuration directory.


d) grep wasb /etc/hadoop/conf/core-site.xml

<name>fs.AbstractFileSystem.wasb.impl</name>
<name>fs.AbstractFileSystem.wasbs.impl</name>
<value>wasb://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net</value>
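For reference, the hadoop-azure module typically maps these property names to its WASB implementation classes; the values below are the standard ones shipped with Hadoop, though your cluster's core-site.xml may differ:

```xml
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
<property>
  <name>fs.AbstractFileSystem.wasbs.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasbs</value>
</property>
```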

e) Running the following command as the infoworks user lists all the files and folders:

 hdfs dfs -ls wasb://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net

drwxr-xr-x   - infoworks-user supergroup          0 2019-08-06 05:16 

wasb://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/user/infoworks-user/temp


For example, a directory named csv has been created, and a test.csv file is present at the following HDFS location:

wasb://<CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/user/infoworks-user/temp/csv/test.csv
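The staging of that sample file can be sketched as follows; the container and account names are placeholders, and the hdfs commands (shown as comments) require a configured Hadoop client on the edge node:

```shell
# Placeholder values -- replace with your container and storage account names.
CONTAINER_NAME="mycontainer"
STORAGE_ACCOUNT_NAME="mystorageaccount"

# Target directory for the sample CSV, following the layout above.
CSV_DIR="wasb://${CONTAINER_NAME}@${STORAGE_ACCOUNT_NAME}.blob.core.windows.net/user/infoworks-user/temp/csv"
echo "${CSV_DIR}/test.csv"

# On the edge node, as the infoworks user:
#   hdfs dfs -mkdir -p "${CSV_DIR}"
#   hdfs dfs -put test.csv "${CSV_DIR}/"
```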

f) After selecting the From Hadoop Cluster option, provide the Source Base Path. This must be the wasb location in the Azure file system.

g) Under the File Mappings tab, add the corresponding entry. Since the test.csv file is present under the csv directory inside temp, the table name must be provided as csv. Refer to https://docs.infoworks.io/datafoundry-2.7.2/structured-file-ingestion#azure-blob-storage-ingestion for details on providing the table mappings.




h) Save the entry.
i) Go to Tables > select the table > Actions > Recrawl the Metadata.


j) After the metadata is crawled, create a table group and then ingest the tables.



Applicable Versions:

IWX Data Foundry v2.7.x