Problem Description:


Teradata TPT full load jobs fail with the following exception, raised by Azure Storage, when Infoworks runs on the Azure platform.


[INFO] 2019-08-15 02:57:02,111 [pool-4-thread-1] infoworks.discovery.utils.TPTScriptGenerator:486 :: Error Output:

FSDataOutputStream#close error:
java.io.IOException: The block list may not contain more than 50,000 blocks. Please see the cause for further information.
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:778)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.close(BlobOutputStreamInternal.java:327)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
    at 


Root cause:


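This exception is raised by Azure Storage, not by Infoworks. During TPT ingestion, the Teradata Parallel Transporter (TPT) utility extracts the data from Teradata and writes it to a CSV file. When that CSV file is written to Azure Blob Storage, it is stored as a block blob and uploaded in fixed-size blocks, and a block blob can contain at most 50,000 blocks. With the 4 MiB block size that the Azure Storage client typically uses for this kind of streamed write, 50,000 blocks hold 200,000 MiB (about 195 GiB). A CSV file larger than about 200GB therefore needs more than 50,000 blocks, and the upload fails with the exception above. This is a limitation on the Azure side: a single file larger than about 200GB cannot be uploaded this way.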
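The 200GB ceiling is simply the 50,000-block limit multiplied by the size of each block. The sketch below shows that arithmetic, assuming a 4 MiB block size (an assumption; check the block size your own storage client is configured with). The function names are hypothetical and exist only to illustrate the calculation.

```python
# Hypothetical helpers that illustrate the block-count arithmetic only.
MAX_COMMITTED_BLOCKS = 50_000            # block blob limit documented by Azure
ASSUMED_BLOCK_SIZE = 4 * 1024 * 1024     # assumption: 4 MiB per uploaded block


def max_blob_bytes(block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Largest file that fits in one block blob at this block size."""
    return MAX_COMMITTED_BLOCKS * block_size


def blocks_needed(file_bytes: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Number of blocks needed to upload file_bytes (ceiling division)."""
    return -(-file_bytes // block_size)


def fits_in_one_blob(file_bytes: int, block_size: int = ASSUMED_BLOCK_SIZE) -> bool:
    """True if the file can be committed without exceeding 50,000 blocks."""
    return blocks_needed(file_bytes, block_size) <= MAX_COMMITTED_BLOCKS


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(max_blob_bytes() / GiB)        # 195.3125
    print(fits_in_one_blob(150 * GiB))   # True  (38,400 blocks)
    print(fits_in_one_blob(250 * GiB))   # False (64,000 blocks)
```

A larger block size raises the ceiling proportionally, which is why the limit is expressed in blocks rather than bytes.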


To avoid this, split the output into multiple files, for example by increasing the number of TPT writers, so that no single file exceeds 200GB.


For reference, a block blob can include a maximum of 50,000 committed blocks. Blocks uploaded with Put Block remain uncommitted until a Put Block List call commits them, at which point they become the content of the blob. A blob can have a maximum of 100,000 uncommitted blocks at any given time. If this maximum count is exceeded, the service returns status code 409 (RequestEntityTooLargeBlockCountExceedsLimit).
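The two limits in the paragraph above apply at different stages: the 100,000 limit to staged (uncommitted) blocks and the 50,000 limit to the committed block list. The following is a hypothetical in-memory model, not the Azure SDK, that mimics only those counting rules. It simplifies the real service, which also lets a block list reference already-committed blocks and requires block IDs of equal length.

```python
from __future__ import annotations

# Hypothetical in-memory model of the documented block blob limits.
# It is NOT the Azure SDK; it only mimics the counting rules.
MAX_COMMITTED_BLOCKS = 50_000
MAX_UNCOMMITTED_BLOCKS = 100_000


class BlockCountError(Exception):
    """Raised when a block-count limit would be exceeded."""


class BlockBlobModel:
    def __init__(self) -> None:
        self.uncommitted: set[str] = set()  # blocks staged with "Put Block"
        self.committed: list[str] = []      # current blob content, in order

    def put_block(self, block_id: str) -> None:
        # Staging a block only adds it to the uncommitted set.
        if block_id not in self.uncommitted and len(self.uncommitted) >= MAX_UNCOMMITTED_BLOCKS:
            raise BlockCountError("409 RequestEntityTooLargeBlockCountExceedsLimit")
        self.uncommitted.add(block_id)

    def put_block_list(self, block_ids: list[str]) -> None:
        # Committing makes the listed blocks the content of the blob.
        if len(block_ids) > MAX_COMMITTED_BLOCKS:
            raise BlockCountError("The block list may not contain more than 50,000 blocks.")
        unknown = [b for b in block_ids if b not in self.uncommitted]
        if unknown:
            raise KeyError(f"blocks were never staged: {unknown[:3]}")
        self.committed = list(block_ids)
        # Simplification: staged blocks left out of the list are discarded.
        self.uncommitted.clear()


if __name__ == "__main__":
    blob = BlockBlobModel()
    ids = [f"{i:06d}" for i in range(MAX_COMMITTED_BLOCKS + 1)]
    for block_id in ids:
        blob.put_block(block_id)
    try:
        blob.put_block_list(ids)                     # one block too many
    except BlockCountError as err:
        print("commit rejected:", err)
    blob.put_block_list(ids[:MAX_COMMITTED_BLOCKS])  # exactly at the limit
    print(len(blob.committed))                       # 50000
```

Staging 50,001 blocks succeeds because it stays under the uncommitted limit; only the commit fails, which matches where the exception in the log is raised (on close, when the block list is committed).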


Solution:


Perform the following steps to resolve this issue.


If a file generated by TPT is larger than 200GB, split the output into smaller files. To do this, increase the number of TPT writers at the table level from the default of 5 to 15, so that the extracted data is written to more, smaller files.

After increasing the number of TPT writers to 15 at the table level, run the TPT job again. This should resolve the issue.
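To estimate how many writers a table needs, one can assume that TPT spreads the extracted rows roughly evenly across its writers (an assumption; real file sizes depend on the data), so each file is about the total size divided by the writer count. The sketch below uses that assumption plus a 4 MiB block size; the helper names are hypothetical and are not Infoworks or TPT settings.

```python
# Hypothetical sizing helpers; not Infoworks or TPT settings.
MAX_COMMITTED_BLOCKS = 50_000
ASSUMED_BLOCK_SIZE = 4 * 1024 * 1024                     # assumption: 4 MiB blocks
LIMIT_BYTES = MAX_COMMITTED_BLOCKS * ASSUMED_BLOCK_SIZE  # 200,000 MiB, about 195 GiB


def per_file_bytes(total_bytes: int, writers: int) -> int:
    """Approximate size of each output file, assuming an even split across writers."""
    return -(-total_bytes // writers)  # ceiling division


def min_writers(total_bytes: int, limit: int = LIMIT_BYTES) -> int:
    """Smallest writer count that keeps every evenly split file within the limit."""
    return max(1, -(-total_bytes // limit))


if __name__ == "__main__":
    TiB = 1024 ** 4
    for writers in (5, 15):
        size = per_file_bytes(TiB, writers)
        print(writers, "writers:", round(size / 1024 ** 3, 1), "GiB per file,",
              "OK" if size <= LIMIT_BYTES else "too large")
    print("minimum writers for 1 TiB:", min_writers(TiB))  # 6
```

With the default of 5 writers, a 1 TiB table gives files of roughly 205 GiB each, above the cap, while 15 writers bring each file to roughly 68 GiB. If the data does not split evenly, some files will be larger than this estimate, so leaving headroom above the computed minimum is a reasonable choice.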





Reference Links


https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://docs.microsoft.com/en-us/rest/api/storageservices/get-block-list

https://docs.microsoft.com/en-us/rest/api/storageservices/put-block

https://github.com/Azure/azure-storage-python/issues/346

https://dzone.com/articles/comprehensive-comparison-0


Applicable Infoworks EDO2 Versions:


v2.8.x