Problem Description:

Renaming columns and then running the pipeline results in a schema mismatch exception. A sample error message is shown below.

21/07/15 17:37:56 ERROR SparkBatchExecutor: Exception A schema mismatch detected when writing to the Delta table.
To enable schema migration, please set:
'.option("mergeSchema", "true")'.
Table schema:
root
-- customer_id: long (nullable = true)
-- order_id: integer (nullable = true)
-- order_date: timestamp (nullable = true)
-- ship_date: timestamp (nullable = true)
-- card_number: long (nullable = true)
-- card_type: string (nullable = true)
-- product_id: integer (nullable = true)
-- quantity: integer (nullable = true)
-- cost: double (nullable = true)
-- item_total: double (nullable = true)
-- seqval: long (nullable = true)
-- start_lsn: timestamp (nullable = true)
-- order_year: integer (nullable = true)
-- ziw_target_timestamp: timestamp (nullable = true)
-- ziw_status_flag: string (nullable = true)
-- ziw_effective_start_timestamp: timestamp (nullable = true)
-- ziw_effective_end_timestamp: timestamp (nullable = true)
-- ziw_active: boolean (nullable = true)
-- ziw_is_deleted: string (nullable = true)
-- ziw_row_id: string (nullable = true)
Data schema:
root
-- customer_id: long (nullable = true)
-- order_id: integer (nullable = true)
-- order_date: timestamp (nullable = true)
-- ship_date: timestamp (nullable = true)
-- card_number: long (nullable = true)
-- cardtype: string (nullable = true)
-- product_id: integer (nullable = true)
-- quantity: integer (nullable = true)
-- cost: double (nullable = true)
-- item_total: double (nullable = true)
-- seqval: long (nullable = true)
-- start_lsn: timestamp (nullable = true)
-- ziw_target_timestamp: timestamp (nullable = true)
-- ziw_status_flag: string (nullable = true)
-- ziw_effective_start_timestamp: timestamp (nullable = true)
-- ziw_effective_end_timestamp: timestamp (nullable = true)
-- ziw_active: boolean (nullable = true)
-- ziw_is_deleted: string (nullable = true)
-- ziw_row_id: string (nullable = true)


Root cause:

The schema mismatch error occurs because the columns in the data being written differ from the columns of the already ingested Delta table. In the sample above, the table schema contains card_type and order_year, while the data schema contains cardtype and has no order_year column. In version 4.4 or below, this error also occurs for overwrite pipelines; this is a bug that has been fixed in version 5.0.
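
To find which columns differ, the "Table schema" and "Data schema" blocks from the error message can be compared programmatically. The following is a minimal sketch in plain Python, not part of the product: the helper names parse_schema and diff_schemas are hypothetical, and the two inputs are shortened excerpts of the schemas in the sample error above.

```python
def parse_schema(text):
    """Parse lines like '-- name: type (nullable = true)' into {name: type}."""
    columns = {}
    for line in text.splitlines():
        # Drop the tree prefix ('--' or ' |--') that Spark prints before each column.
        line = line.strip().lstrip("|- ").strip()
        if ":" not in line:
            continue  # skips the 'root' line and blank lines
        name, rest = line.split(":", 1)
        columns[name.strip()] = rest.split("(")[0].strip()
    return columns


def diff_schemas(table, data):
    """Return (columns only in table, columns only in data, columns whose type differs)."""
    only_in_table = [c for c in table if c not in data]
    only_in_data = [c for c in data if c not in table]
    retyped = [c for c in table if c in data and table[c] != data[c]]
    return only_in_table, only_in_data, retyped


# Shortened excerpts of the two schemas from the sample error message.
table_schema = """root
-- card_number: long (nullable = true)
-- card_type: string (nullable = true)
-- start_lsn: timestamp (nullable = true)
-- order_year: integer (nullable = true)
"""
data_schema = """root
-- card_number: long (nullable = true)
-- cardtype: string (nullable = true)
-- start_lsn: timestamp (nullable = true)
"""

only_in_table, only_in_data, retyped = diff_schemas(
    parse_schema(table_schema), parse_schema(data_schema)
)
print("Only in table schema:", only_in_table)
print("Only in data schema:", only_in_data)
print("Type changed:", retyped)
```

Pasting the full schema blocks from the log into table_schema and data_schema gives the complete list of renamed, dropped, or retyped columns.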


Solution:

In version 4.4 or below, this issue can be worked around by removing the _delta_log directory from the target path. The subsequent pipeline build then succeeds.

dbfs rm -r dbfs:/iw/test_spark_pl_7451/full_0/_delta_log


Applicable IWX versions:

IWX 4.X