The folks over at Digg recently announced several major updates to the site’s duplicate (“dupe”) detection technology. The changes are designed to better identify and eliminate duplicate story submissions.
According to Digg, the most common duplicate issue is the same story from the same site being submitted under different URLs. The Digg R&D team has developed a new document similarity algorithm to identify these submissions more reliably.
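Digg has not published how its document similarity algorithm works. Below is a minimal Python sketch of one common, generic technique for this problem: word shingling with Jaccard similarity, which compares the overlap of short word sequences in two page bodies. It is an illustration only, not Digg’s method. The function names, the shingle size `w=3`, and the `0.8` threshold are hypothetical choices.

```python
import re


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of w-word shingles (contiguous word runs) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short texts: treat the whole word list as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_probable_dupe(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Hypothetical check: flag two page bodies whose shingle overlap is high."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

For example, the same article fetched from a canonical URL and from a tracking-parameter URL would usually share nearly all of its shingles, even if one copy carries an extra “Read more” line, so the pair would be flagged despite the different URLs.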
The other common duplicate issue is the same or a similar story being submitted from two different sites. This type of dupe is harder to identify, but Digg says its improved technology matches stories with similar titles and descriptions with much higher accuracy than before.
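Digg has not described how it compares titles and descriptions. As a rough illustration of the general idea, this Python sketch uses a standard swapped-in technique, cosine similarity over bag-of-words term counts, to score how close two stories’ titles and descriptions are. The function names and the `0.6` threshold are hypothetical, and a production system would likely add stemming, stop-word removal, or weighting such as TF-IDF.

```python
import math
import re
from collections import Counter


def term_vector(title: str, description: str) -> Counter:
    """Bag-of-words term counts over a story's title and description."""
    return Counter(re.findall(r"\w+", f"{title} {description}".lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so unshared words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_stories(story_a: tuple[str, str], story_b: tuple[str, str],
                    threshold: float = 0.6) -> bool:
    """Hypothetical check on (title, description) pairs from two sites."""
    return cosine_similarity(term_vector(*story_a), term_vector(*story_b)) >= threshold
```

Two write-ups of the same news from different sites tend to reuse many of the same words, so their vectors point in similar directions even when the wording differs slightly; unrelated stories share few terms and score near zero.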
While these updates will mainly work behind the scenes, some changes will be immediately noticeable to users. The most important is that the duplicate check now runs immediately after the URL is entered, so users no longer need to write a description before finding out whether the story has already been submitted. Additionally, the lag time involved in checking for dupes has been reduced, which should make the submission process faster and easier.