Frequently when importing / scrubbing data within CRM, identifying duplicates to remove or ‘flag’ may be needed. It’s helpful to have a query that will return all the duplicates in the database –but exclude a single ‘surviving’ record. (i.e. Return all-but-one of the duplicated records.)
This query is one of my go-to query patterns – I rarely, if ever, use it as-is, but will modify it to fit the particular entity / scenario I need to investigate.
The returned record set might used as the source for a Scribe Insight job to either ‘flag’ or inactivate or delete the duplicates depending on the situation. – Of course you’ll need to take into account the record’s relationship to other entities etc.
This example query assumes the duplicated contacts were entered more than 3 milliseconds apart in SQL. If your data was migrated from another system or entered as a result of a large batch and you need finer control, consider using the 'versionnumber' field as a unique timestamp field. (The VERSIONNUMBER field is a unique value that gets incremented as records are updated – it can be very useful.)
Another variations might be that you want to change MAX to MIN – and adjust the JOIN in the query to return all but the most recently created or updated record.
The key is that this is a pattern you can build on and adapt to your particular situation.
( Also, by blogging it here, I no longer have to search my laptop for copies of it when I need a example for someone. )
Title: Find Duplicate Contact to Remove (Example)
Author: Scott Sewell, Customer Effective
Description: Find the most recently created contacts that duplicates an existing contact based on fullname and emailaddress1. The Subquery finds the Name/Email and CreatedOn date of the most recently created duplicates. e.g. if there are multiple contact records with the same name and emailaddress1, return the date of all but the oldest one. */
SELECT FC.contactid ,FC.fullname ,FC.emailaddress1 FROM dbo.FilteredContact AS FC JOIN ( SELECT FC2.fullname ,FC2.emailaddress1 ,MIN(FC2.createdon) MIN_CREATED_ON FROM dbo.FilteredContact AS FC2 WHERE FC2.emailaddress1 IS NOT NULL AND FC2.statecode = 0 GROUP BY FC2.fullname ,FC2.emailaddress1 HAVING COUNT(FC2.contactid) >= 2 ) AS FC_DUPE ON FC_DUPE.MIN_CREATED_ON < FC.createdon AND FC.statecode = 0 AND FC.fullname = FC_DUPE.fullname AND FC.emailaddress1 = FC_DUPE.emailaddress1