Disambiguation of References to Individuals

We study the problem of disambiguating references to named people in web data. Each name spotted online is shared by several hundred people on average, and teasing apart these references is critical for a new family of person-aware analytical applications. We present and evaluate algorithms for this problem. Our results indicate that 25% of personal references can be disambiguated with precision in excess of 95%, but that disambiguating larger fractions causes a significant decline in precision.

By: Levon Lloyd; Varun Bhagwan; Daniel F. Gruhl; Andrew Tomkins

Published in: RJ10364 in 2005


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.