Problem Description


Incremental ingestion fails at the merge phase if the CDC data has multiple records for the given natural key with the below error.


20/06/05 05:41:30 ERROR Merger: java.lang.UnsupportedOperationException: Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.

20/06/05 05:41:30 ERROR DistJobsDriver: io.infoworks.saas.ingestion.commons.exception.IWException: java.lang.UnsupportedOperationException: Cannot perform MERGE as multiple source rows matched and attempted to update the same target row in the Delta table.

    at io.infoworks.saas.ingestion.jobs.merger.Merger.databricksMerge(Merger.java:147)

    at io.infoworks.saas.ingestion.jobs.merger.Merger.runMerge(Merger.java:217)

    at io.infoworks.saas.ingestion.jobs.merger.Merger.runJob(Merger.java:204)

    at io.infoworks.saas.ingestion.jobs.DistJobsDriver.submitMergeJob(DistJobsDriver.java:87)

    at io.infoworks.saas.ingestion.jobs.DistJobsDriver.runJobs(DistJobsDriver.java:51)

    at io.infoworks.saas.ingestion.jobs.DistJobsDriver.main(DistJobsDriver.java:28)

    at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command--1:1)

    at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw$$iw$$iw.<init>(command--1:44)

    at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw$$iw.<init>(command--1:46)

    at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw$$iw.<init>(command--1:48)

    at line06d343ff07b84216ae75d4f415b410f925.$read$$iw$$iw.<init>(command--1:50)


Cause


This happens if the CDC job brings multiple records for the given natural key.


Solution


If the CDC table has the duplicate records then set a configuration at a table level to run dedupe on the CDC table before the merge. This will dedupe the CDC table using the natural key and ZIW_TARGET_TIMESTAMP column. Set the below key-value pair at the table level.


Key  :   should_run_dedupe
Value:   true



Applicable Infoworks versions

Infoworks Datafoundry v3.x on Databricks