Problem Description:

The metadata crawl job for Cosmos DB will not fetch some columns during the metadata crawl if the column has null values for the first 100 records.

For instance, if the column acceptDate has null values for the first 100 records in CosmosDB, then this column will not be picked up during the metadata crawl job log and it would be missing in the table schema.

Enable DEBUG logging to check what columns are getting picked up during the metadata crawl from the Metadata crawl job log by performing the steps mentioned here.

:"select \"id\" , \"_attachments\" , \"_etag\" , \"_rid\" , \"_self\" , \"_ts\" , \"applyDate\" , \"createDate\" , \"jobId\" , \"sourceJobId\" , \"type\" , \"typeId\" , \"updateDate\" , \"userId\" from \"enterpriseapis\".\"logs\"","is_jtds_driver":false,"full_load_performed":false,"meta_crawl_performed":true,"origTableName":"logs","schemaNameAtSource":"enterpriseapis","target_schema_name":"cosmos_db","_id":{"$oid":"5fabea66b43495139ac1196f"},"state":"blank","rowCount":0,"table":"logs"},"query":{"_id":{"$oid":"5fabea66b43495139ac1196f"}},"options":{"multi":false},"collection":"tables"}

The acceptDate column is missing in the select query mentioned in the above DEBUG log message.

Root cause: This is according to the design behavior of the CData JDBC connector. By default, the CData driver will scan for the first 100 records in CosmosDB and if all the values are null for any column in that result set, it will not fetch that column during the metadata crawl.

Solution: CData connector has a config to increase this scan range limit. Set the config RowScanDepth (The maximum number of rows to scan to look for the columns available in a table) to a higher value and append it to the Source connection JDBC URL in Infow0rand then run the metadata crawl job again.

For example:


This should resolve the issue.

Applicable Infoworks Versions:

IWX v4.x,v5.x