Problem Description:


During a CDC merge job, the total record count of a table is sometimes reduced, and messages like the following appear in the log file.


[INFO] 2019-05-20 04:32:45,807 [pool-246-thread-1] infoworks.discovery.incremental.MergeDirectorySwitch:175 :: Incrementing current mongo rowcount for table citi_eomm_scmphsx_tbl by -224744

[WARN] 2019-05-20 04:32:45,813 [pool-246-thread-1] infoworks.tools.utils.IWUtil:1120 :: Updating row count for table 62065ae2faba27d768435474 citi_eomm_scmphsx_tbl.

[INFO] 2019-05-20 04:32:45,813 [pool-246-thread-1] infoworks.tools.utils.IWUtil:1139 :: Current number of records for this table is 4539603

[INFO] 2019-05-20 04:32:45,813 [pool-246-thread-1] infoworks.tools.utils.IWUtil:1140 :: Merged -224744 new records

[INFO] 2019-05-20 04:32:45,813 [pool-246-thread-1] infoworks.tools.utils.IWUtil:1142 :: Updating current records to -224744 + 4539603 = 4314859



In this example the cdc_merge job reported -224744 new records. Existing records were removed during the merge, so the total count dropped from 4539603 to 4314859.


Cause: 


This issue occurs if the natural key provided for the table is not unique.
For example, after Initialize and Ingest there might be 10 records with the same ziw_row_id (this value is generated by hashing the values of the natural key column(s)).

During CDC, if a single incoming record has the same ziw_row_id, the merge deduplicates on ziw_row_id and the final data keeps only one record, the latest. The 10 earlier records are replaced by that one record, so the count drops by 9 and a negative number is displayed in the logs.
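The effect can be reproduced with a minimal Python sketch. This is only an illustration, not the Infoworks implementation: the row_id and merge functions, the column names, and the use of MD5 are all assumptions made for the example (the actual hash used for ziw_row_id is not stated here).

```python
import hashlib

def row_id(record, natural_key):
    """Hypothetical stand-in for ziw_row_id: a hash of the natural key values."""
    joined = "|".join(str(record[col]) for col in natural_key)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def merge(existing, incoming, natural_key):
    """Keep one record per row id; later records overwrite earlier ones."""
    latest = {}
    for record in existing + incoming:
        latest[row_id(record, natural_key)] = record
    return list(latest.values())

# Assumed natural key that is NOT unique: 10 existing rows share cust_id 1.
natural_key = ["cust_id"]
existing = [{"cust_id": 1, "seq": i} for i in range(10)]
existing.append({"cust_id": 2, "seq": 0})

# A single CDC record arrives for cust_id 1.
incoming = [{"cust_id": 1, "seq": 99}]

merged = merge(existing, incoming, natural_key)
delta = len(merged) - len(existing)
print(f"before={len(existing)} after={len(merged)} delta={delta}")
```

With 11 existing records and one incoming record, the merged result holds 2 records, so the reported change is -9, which mirrors the negative "new records" value in the log.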


Solution:


a) Configure a column, or a combination of columns, that uniquely identifies each record as the natural key in the table configuration, and then run Initialize and Ingest again.
b) Subsequent CDC loads after this will not have this issue.
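Before changing the natural key, it can help to confirm that the candidate columns are actually unique on a sample of the source data. The following Python sketch shows one way to do that. The duplicate_keys helper and the column names are hypothetical and are not part of the product.

```python
from collections import Counter

def duplicate_keys(records, natural_key):
    """Return each natural key value that appears more than once, with its count."""
    counts = Counter(tuple(r[col] for col in natural_key) for r in records)
    return {key: n for key, n in counts.items() if n > 1}

# Assumed sample rows exported from the source table.
records = [
    {"cust_id": 1, "order_id": 100},
    {"cust_id": 1, "order_id": 101},
    {"cust_id": 2, "order_id": 100},
]

single = duplicate_keys(records, ["cust_id"])
composite = duplicate_keys(records, ["cust_id", "order_id"])
print("cust_id alone:", single)
print("cust_id + order_id:", composite)
```

An empty result means the candidate key is unique on the sample. Here cust_id alone is duplicated, while the combination of cust_id and order_id is not.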


Applicable IWX Versions:


2.4.*, 2.5.*, 2.6.*, 2.7.*