Character encoding issues can cause errors during data ingestion from an IBM DB2 database. These errors typically occur when data in a character column does not conform to the expected character encoding or code page. The ingestion job fails with an error similar to the following:

[jcc][t4][1065][12306][XXX.XXX.XXX] Caught See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null

Root Cause

The underlying cause of these errors is that the IBM Data Server Java Common Client (JCC) driver throws an exception when it encounters data in a character column that does not adhere to the expected character encoding or code page.


JCC Configuration Property Adjustment:

To handle invalid data more leniently, configure the JCC driver with the db2.jcc.charsetDecoderEncoder=3 property. With this property set, the driver replaces invalid data sequences with the Unicode REPLACEMENT CHARACTER (U+FFFD) instead of throwing an exception, so data ingestion continues without interruption. Because the affected characters are replaced with U+FFFD, the original bytes in those positions are not preserved in the ingested data.
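The difference between the two behaviors can be illustrated with a minimal Python sketch. This is only an analogy: Python's built-in UTF-8 codec stands in for the driver's character converters, and the sample bytes and the function names decode_strict and decode_lenient are hypothetical. It does not call the JCC driver.

```python
# Illustrative analogy only: Python's UTF-8 codec stands in for the
# driver's character conversion. Names and sample data are hypothetical.

# 0xE9 is "é" in ISO-8859-1 but is not a valid sequence in UTF-8 here.
raw = b"caf\xe9 latte"


def decode_strict(data: bytes) -> str:
    """Raise an error on invalid input (comparable to the default failure)."""
    return data.decode("utf-8", errors="strict")


def decode_lenient(data: bytes) -> str:
    """Substitute U+FFFD for invalid input (comparable to the lenient setting)."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(raw)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

print(repr(decode_lenient(raw)))
```

In the lenient case the value is still readable, but the invalid byte is replaced by U+FFFD and cannot be recovered from the decoded string.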

To apply this setting, open the Compute cluster template and add the following advanced configuration:

Key: iw_environment_cluster_spark_config

Value: spark.executor.extraJavaOptions=-Ddb2.jcc.charsetDecoderEncoder=3;spark.driver.extraJavaOptions=-Ddb2.jcc.charsetDecoderEncoder=3;

Please note that only users with an 'Infoworks admin role' have permission to add this configuration.
After applying this change, rerun the data ingestion job on a cluster that uses this configuration and verify that the job completes without the error.
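
When checking the result of the rerun, it can help to see which values were affected by the substitution. The following is a minimal sketch, assuming the ingested rows are available as Python dictionaries; the function name find_replaced_values and the sample rows are hypothetical and are not part of Infoworks or the JCC driver.

```python
# Hypothetical helper: locate string values that contain U+FFFD, which
# indicates that an invalid sequence was substituted during ingestion.

REPLACEMENT_CHAR = "\ufffd"


def find_replaced_values(rows):
    """Return (row_index, column_name) pairs whose string value contains U+FFFD."""
    hits = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str) and REPLACEMENT_CHAR in value:
                hits.append((index, column))
    return hits


# Made-up sample data for illustration.
sample_rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Ren\ufffd"},
]

print(find_replaced_values(sample_rows))
```

Rows flagged this way point to source records in DB2 whose character data may need to be corrected at the source.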

Anirudh Chekuri

Infoworks Support Team