Submitted by admin on Thu, 2011/11/17 - 18:06.
If you’ve used Google Refine (http://code.google.com/p/google-refine/), you know how useful its clustering algorithms are for finding and merging alternative representations of the same thing, e.g. “Gödel, Escher, Bach”, “Godel, Escher, Bach” (accents), “Gödel Escher Bach” (punctuation), “gödel, escher, bach” (case).
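To make the idea concrete, here is a minimal Python sketch of key-based clustering, loosely modeled on the "fingerprint" key-collision approach that Google Refine offers. It is an approximation for illustration, not Refine's actual code: the function names `fingerprint` and `cluster` are my own, and the exact normalization steps (lowercasing, stripping accents, dropping ASCII punctuation, sorting unique tokens) are assumptions about what a reasonable key looks like.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that trivially different spellings collide."""
    # Ignore surrounding whitespace and case differences.
    text = value.strip().lower()
    # Decompose accented characters, then drop the combining marks (ö -> o).
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove ASCII punctuation (an assumption: other punctuation is kept).
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Sort the unique tokens so that word order does not matter either.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with several members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


titles = [
    "Gödel, Escher, Bach",
    "Godel, Escher, Bach",
    "Gödel Escher Bach",
    "gödel, escher, bach",
    "The Art of Computer Programming",
]
clusters = cluster(titles)
```

With the sample list above, the four spellings of the same title share a single key and end up in one cluster, while the unrelated title is left alone. Deciding which member of a cluster becomes the canonical form is a separate step that this sketch does not cover.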
In my open data and data journalism projects, I perform similar data consolidation and reconciliation steps to avoid unwanted duplicates.