Problem Description:

Export to SQL server is failing with java.sql.BatchUpdateException: The conversion of a datetime2 data type to a datetime data type resulted in an out-of-range value exception and the sample stack trace looks like below,

21/04/09 18:00:51 ERROR DefaultTaskDagExecutorImpl: Exception: Job aborted due to stage failure: Task 2 in stage 10.0 failed 4 times, most recent failure: Lost task 2.3 in stage 10.0 (TID 223, 172.25.178.21, executor 2): java.sql.BatchUpdateException: The conversion of a datetime2 data type to a datetime data type resulted in an out-of-range value.
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:672)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:839)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:951)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:951)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2284)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2284)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
    at org.apache.spark.scheduler.Task.run(Task.scala:112)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Root cause:

This issue occurs when invalid date is being inserted/updated into a SQL Server Datetime field. The bad record may have a date field with the value, for instance, 0/31/0001. Such data cannot be handled by SQL Server in the Datetime field. The issue comes from how the SQL Server database handles the two data types (Datetime and Datetime2). Microsoft SQL Server treats both the data types differently. In Microsoft SQL, Datetime supports 1753/1/1 to 9999/12/31, while Datetime2 supports 0001/1/1 to 9999/12/31.


Solution:

To resolve this issue, rectify the data (Date) issues on the delta lake record so that the date within the valid range is updated in the Microsoft SQL Datetime field. Or we can use extra transformation nodes to filter out those data.


Applicable IWX versions:

IWX 4.X