2017 Volume 32
RESEARCH ARTICLE   Open Access    

The state of the art in semantic relatedness: a framework for comparison

  • Abstract: Semantic relatedness (SR) is a form of measurement that quantitatively identifies the relationship between two words or concepts based on the similarity or closeness of their meaning. In recent years, there have been noteworthy efforts to compute SR between pairs of words or concepts by exploiting various knowledge resources, such as linguistically structured resources (e.g. WordNet) and collaboratively developed knowledge bases (e.g. Wikipedia), among others. The existing approaches rely on different methods for utilizing these knowledge resources, for instance, methods that depend on the path between two words or on a vector representation of the word descriptions. The purpose of this paper is to review and present the state of the art in SR research through a hierarchical framework. The dimensions of the proposed framework cover three main aspects of SR approaches: the resources they rely on, the computational methods applied to the resources to develop a relatedness metric, and the evaluation models used to measure their effectiveness. We have selected 14 representative SR approaches to analyze using our framework. We compare and critically review each of them along the dimensions of our framework, thereby identifying the strengths and weaknesses of each approach. In addition, we provide guidelines for researchers and practitioners on how to select the most relevant SR method for their purpose. Finally, based on the comparative analysis of the reviewed relatedness measures, we identify existing challenges and potentially valuable future research directions in this domain.
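The two families of methods named in the abstract, path-based and vector-based, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy taxonomy in `EDGES`, the `1 / (1 + length)` scoring, and the bag-of-words vectors are hypothetical stand-ins, not the resources or formulas of any approach reviewed in the paper.

```python
from collections import Counter, deque
from math import sqrt

# Hypothetical toy taxonomy (undirected edges); not taken from WordNet.
EDGES = [
    ("animal", "mammal"), ("mammal", "dog"), ("mammal", "cat"),
    ("animal", "bird"), ("bird", "sparrow"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, source, target):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_relatedness(graph, a, b):
    """Path-based score (assumed form): 1 / (1 + shortest path length)."""
    length = path_length(graph, a, b)
    return 0.0 if length is None else 1.0 / (1.0 + length)


def cosine_relatedness(desc_a, desc_b):
    """Vector-based score: cosine between word-count vectors of two descriptions."""
    va = Counter(desc_a.lower().split())
    vb = Counter(desc_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(path_relatedness(graph, "dog", "cat"))
    print(path_relatedness(graph, "dog", "sparrow"))
    print(cosine_relatedness("a domesticated animal that barks",
                             "a domesticated animal that purrs"))
```

In this sketch the path-based score depends only on the structure of the resource, while the vector-based score depends only on the text that describes each word, which is the distinction the abstract draws between the two kinds of method.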
  • Cite this article

    Yue Feng, Ebrahim Bagheri, Faezeh Ensan, Jelena Jovanovic. 2017. The state of the art in semantic relatedness: a framework for comparison. The Knowledge Engineering Review 32(1), doi: 10.1017/S0269888917000029


    • The first two authors acknowledge funding from the Natural Sciences and Engineering Research Council of Canada.

    • While acknowledging the differences, we use the terms ‘words’, ‘concepts’, ‘terms’ and ‘entities’ interchangeably in this paper.

    • http://www.worldwidewebsize.com/

    • © Cambridge University Press, 2017