What is the configuration to set the comparator to fetch CDC records? : Infoworks

In the on-premises offering (versions 2.X-3.X):

The USE_GTE_FOR_CDC configuration controls which comparator is used to fetch CDC records, so it can be set to match the use case.

true: CDC records are fetched using the >= comparator. This is the default behavior and must be used for merge use cases, because it brings in the data from the last batch again. Re-fetching is needed when some data for the last batch or timestamp was still being populated in the source system when the ingestion job finished.
false: CDC records are fetched using the > comparator. Use this for append mode scenarios where the data for the last batch or timestamp in the source system is fully populated and the user does not want the old data fetched again.
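
The difference between the two settings can be illustrated with a minimal Python sketch. This is not Infoworks code: the function fetch_cdc, the use_gte flag, and the batch_id column are hypothetical names chosen only to model how a >= or > comparison against the last ingested value selects rows.

```python
from operator import ge, gt


def fetch_cdc(rows, watermark, use_gte=True):
    """Return the rows whose batch_id passes the comparison with the watermark.

    use_gte=True models the >= comparator, use_gte=False models the > comparator.
    (Hypothetical helper, for illustration only.)
    """
    compare = ge if use_gte else gt
    return [row for row in rows if compare(row["batch_id"], watermark)]


# Toy source table; batch_id stands in for the batch ID or timestamp column.
source = [
    {"id": 1, "batch_id": 10},
    {"id": 2, "batch_id": 11},
    {"id": 3, "batch_id": 12},
]
last_ingested = 11  # highest batch_id seen by the previous ingestion job

with_gte = fetch_cdc(source, last_ingested, use_gte=True)
with_gt = fetch_cdc(source, last_ingested, use_gte=False)

print([r["id"] for r in with_gte])  # [2, 3]: batch 11 is fetched again
print([r["id"] for r in with_gt])   # [3]: only rows newer than batch 11
```

With >=, row 2 from the last batch comes back a second time, which a merge can absorb because it updates the existing row by key. In append mode the same row would be written twice, which is why > is the setting described for that case.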

In the Databricks offering (version 4.X onwards):

The cdc_comparator_key configuration controls which comparator is used to fetch CDC records, so it can be set to match the use case.

>=: CDC records are fetched using the >= comparator. This is the default value of the cdc_comparator_key configuration and must be used for merge use cases, because it brings in the data from the last batch again. Re-fetching is needed when some data for the last batch or timestamp was still being populated in the source system when the ingestion job finished.
>: CDC records are fetched using the > comparator. Use this value for append mode scenarios where the data for the last batch or timestamp in the source system is fully populated and the user does not want the old data fetched again.

NOTE: Data might be lost when the > comparator is used. If records with the same batch ID are still being inserted in the source system while the ingestion job is running, any records with that batch ID that are inserted just after the job has run will be missed by the next CDC job.
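
The scenario in this note can be reproduced with a small simulation. The sketch below is a simplified model, not the product's implementation: run_cdc_job, simulate, and the batch_id column are hypothetical, and the target is modeled as a dictionary keyed by id so that re-fetched rows overwrite themselves the way a merge would.

```python
def run_cdc_job(source, watermark, comparator):
    """Fetch rows past the watermark; return (fetched_rows, new_watermark)."""
    if comparator == ">=":
        fetched = [r for r in source if r["batch_id"] >= watermark]
    elif comparator == ">":
        fetched = [r for r in source if r["batch_id"] > watermark]
    else:
        raise ValueError(f"unsupported comparator: {comparator}")
    # The next job starts from the highest batch_id seen so far.
    new_watermark = max((r["batch_id"] for r in fetched), default=watermark)
    return fetched, new_watermark


def simulate(comparator):
    """Run two CDC jobs with a late record arriving between them."""
    source = [{"id": 1, "batch_id": 5}]
    target = {}  # keyed by id, so a re-fetched row is merged, not duplicated

    # First job runs while batch 5 is still being written to the source.
    fetched, watermark = run_cdc_job(source, 0, comparator)
    target.update({r["id"]: r for r in fetched})

    # A late record with the same batch ID lands after the job has finished.
    source.append({"id": 2, "batch_id": 5})

    # Second job starts from the stored watermark (5).
    fetched, watermark = run_cdc_job(source, watermark, comparator)
    target.update({r["id"]: r for r in fetched})
    return sorted(target)


print(simulate(">="))  # [1, 2]: the late record is picked up
print(simulate(">"))   # [1]: the late record with batch_id 5 is never fetched
```

Under these assumptions, the > comparator skips the late record because its batch ID equals the stored watermark, while >= re-reads batch 5 and recovers it.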
