Problem
Character encoding issues can cause errors during data ingestion from an IBM DB2 database. These errors typically occur when data in a character column does not conform to the expected character encoding or code page. A typical error message looks like this:
com.ibm.db2.jcc.am.SqlException: [jcc][t4][1065][12306][XXX.XXX.XXX] Caught java.io.CharConversionException. See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null
Root Cause
The IBM Data Server Java Common Client (JCC) driver throws an exception when it encounters data in a character column that does not conform to the expected character encoding or code page.
Resolution
JCC Configuration Property Adjustment:
To handle invalid data more leniently, configure the JCC driver with the db2.jcc.charsetDecoderEncoder=3 property. When this property is set, the JCC driver replaces invalid data sequences with the Unicode REPLACEMENT CHARACTER (U+FFFD) instead of raising an exception, so data ingestion can continue without interruption.
To apply this configuration, open the Compute cluster template and add the following advanced configuration:
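The difference between strict decoding and replacement decoding can be illustrated with a minimal sketch. It uses Python's built-in codecs as a stand-in for the JCC driver's decoder, so it only shows the general behavior and is not the driver's actual implementation. The sample bytes and the function names decode_strict and decode_lenient are made up for illustration.

```python
# Illustration only: Python's codecs stand in for the JCC driver's decoder.
# The byte 0xE9 is "e with acute accent" in ISO-8859-1 but is not valid UTF-8 here.
raw = b"caf\xe9 menu"


def decode_strict(data: bytes) -> str:
    """Analogous to the failing case: invalid bytes raise an error."""
    return data.decode("utf-8", errors="strict")


def decode_lenient(data: bytes) -> str:
    """Analogous to charsetDecoderEncoder=3: invalid bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(raw)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# The invalid byte is replaced, the rest of the value is preserved.
print(repr(decode_lenient(raw)))
```

Note that replacement is lossy: the original byte cannot be recovered from U+FFFD, so the source data may still need to be corrected if the exact characters matter.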
Key: iw_environment_cluster_spark_config
Value: spark.executor.extraJavaOptions=-Ddb2.jcc.charsetDecoderEncoder=3;spark.driver.extraJavaOptions=-Ddb2.jcc.charsetDecoderEncoder=3;
Please note that only users with an 'Infoworks admin role' have permission to add this configuration.
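Because the value is a single semicolon-separated string, a typo is easy to miss. The following is a rough sketch of a sanity check that can be run before pasting the value into the template. It assumes the key=value;key=value; layout shown above, and the helper names parse_spark_config and missing_flag are hypothetical, not part of Infoworks or Spark.

```python
FLAG = "-Ddb2.jcc.charsetDecoderEncoder=3"
REQUIRED_KEYS = (
    "spark.executor.extraJavaOptions",
    "spark.driver.extraJavaOptions",
)


def parse_spark_config(value: str) -> dict:
    """Split 'key=value;key=value;' into a dict, splitting each entry on its first '='."""
    entries = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate the trailing semicolon
        key, sep, val = entry.partition("=")
        if not sep:
            raise ValueError(f"entry has no '=': {entry!r}")
        entries[key.strip()] = val.strip()
    return entries


def missing_flag(value: str) -> list:
    """Return the required Spark keys whose JVM options do not include the flag."""
    entries = parse_spark_config(value)
    return [k for k in REQUIRED_KEYS if FLAG not in entries.get(k, "").split()]


config_value = (
    "spark.executor.extraJavaOptions=-Ddb2.jcc.charsetDecoderEncoder=3;"
    "spark.driver.extraJavaOptions=-Ddb2.jcc.charsetDecoderEncoder=3;"
)
print(missing_flag(config_value))  # an empty list means both keys carry the flag
```

An empty result means both the executor and the driver options include the property. Any key listed in the result is missing it.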
After applying this change, rerun the data ingestion job on a cluster that uses this configuration and monitor the outcome.
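One way to monitor the outcome is to check how many ingested values now contain U+FFFD, since each one marks data that was invalid at the source. The sketch below assumes the ingested rows are available as dictionaries of column name to value. How the rows are read from the target table depends on your environment, and the function name count_replacements and the sample rows are hypothetical.

```python
REPLACEMENT = "\ufffd"  # Unicode REPLACEMENT CHARACTER


def count_replacements(rows):
    """Count, per column, the rows whose string value contains U+FFFD."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str) and REPLACEMENT in value:
                counts[column] = counts.get(column, 0) + 1
    return counts


# Hypothetical sample rows standing in for ingested data.
sample = [
    {"id": 1, "name": "caf\ufffd"},
    {"id": 2, "name": "tea"},
    {"id": 3, "name": "\ufffd\ufffd", "city": "K\ufffdln"},
]
print(count_replacements(sample))
```

Columns with a non-zero count are candidates for cleanup in the source DB2 table.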
Anirudh Chekuri
Infoworks Support Team