2017 Volume 32
ORIGINAL RESEARCH   Open Access    

A survey of author name disambiguation techniques: 2010–2016

  • Abstract: Author name ambiguity in citations badly degrades the content and service quality of digital libraries, and it is considered one of the hardest problems facing digital library researchers. Several techniques have been proposed in the literature to address it. In this paper, we review recently presented author name disambiguation techniques and outline challenges and future research directions. We analyze recent advances in the field and classify the techniques as supervised, unsupervised, semi-supervised, graph-based or heuristic-based, according to the problem formulation each mainly uses. A few earlier surveys have reviewed author name disambiguation techniques, but they highlighted only the methodology adopted and did not critically review the shortcomings. This survey provides a detailed review of the techniques available in the literature, compares them at an abstract level and discusses their limitations.
  • Cite this article

    Ijaz Hussain, Sohail Asghar. 2017. A survey of author name disambiguation techniques: 2010–2016. The Knowledge Engineering Review 32(1), doi: 10.1017/S0269888917000182


    • The first author is partially supported by a grant of the Higher Education Commission (HEC), Pakistan.

    • http://dblp.uni-trier.de

    • http://www.medline.com

    • http://citeseerx.ist.psu.edu

    • http://arxiv.org

    • http://academic.research.microsoft.com

    • http://scholar.google.com.pk

    • http://www.lbd.dcc.ufmg.br/bdbcomp

    • Indexed by Google Scholar on October 1st, 2016

    • http://dblp.uni-trier.de

    • https://aminer.org/

    • http://www.lbd.dcc.ufmg.br/bdbcomp

    • https://www.dagstuhl.de/ueber-dagstuhl/projekte/autoren-disambiguierung

    • http://www.nature.com/news/physics-paper-sets-record-with-more-than-5-000-authors-1.17567

    • © Cambridge University Press, 2017