Problem Description
Incremental ingestion fails at the merge phase if the CDC data has multiple records for the given natural key with the below error.
20/06/05 05:41:30 ERROR Merger: java.lang.UnsupportedOperationException: Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.
20/06/05 05:41:30 ERROR DistJobsDriver: io.infoworks.saas.ingestion.commons.exception.IWException: java.lang.UnsupportedOperationException: Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.
at io.infoworks.saas.ingestion.jobs.merger.Merger.databricksMerge(Merger.java:147)
at io.infoworks.saas.ingestion.jobs.merger.Merger.runMerge(Merger.java:217)
at io.infoworks.saas.ingestion.jobs.merger.Merger.runJob(Merger.java:204)
at io.infoworks.saas.ingestion.jobs.DistJobsDriver.submitMergeJob(DistJobsDriver.java:87)
at io.infoworks.saas.ingestion.jobs.DistJobsDriver.runJobs(DistJobsDriver.java:51)
at io.infoworks.saas.ingestion.jobs.DistJobsDriver.main(DistJobsDriver.java:28)
at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command--1:1)
at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw$$iw$$iw.<init>(command--1:44)
at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw$$iw.<init>(command--1:46)
at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw.<init>(command--1:48)
at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw.<init>(command--1:50)
Cause
This happens if the CDC job brings multiple records for the given natural key.
Solution
If the CDC table has the duplicate records then set a configuration at a table level to run dedupe on the CDC table before the merge. This will dedupe the CDC table using the natural key and ZIW_TARGET_TIMESTAMP column. Set the below key-value pair at the table level.
Key : should_run_dedupe Value: true
Applicable Infoworks versions
Infoworks Datafoundry v3.x on Databricks