University Bretagne Sud, IRISA Lab, Vannes, France. Email: ahmad.issa-alaa-eddine@univ-ubs.fr, giuseppe.berio@univ-ubs.fr, nicolas.bechet@irisa.fr
LINA, University of Nantes, France. Email: mounira.harzallah@univ-nantes.fr
Lebanese University, Lebanon. Email: ahmad.faour@ul.edu.lb
2021 Volume 36
RESEARCH ARTICLE   Open Access    

A 3-phase approach based on sequential mining and dependency parsing for enhancing hypernym patterns performance

  • Abstract: Patterns have been extensively used to extract hypernym relations from texts. The most popular patterns are Hearst’s patterns, formulated as regular expressions mainly based on lexical information. Experiments have reported good precision and low recall for such patterns. Thus, several approaches have been developed for improving recall. While these approaches perform better in terms of recall, it remains quite difficult to further increase recall without degrading precision. In this paper, we propose a novel 3-phase approach based on sequential pattern mining to improve pattern-based approaches in terms of both precision and recall by (i) using a rich pattern representation based on grammatical dependencies, (ii) discovering new hypernym patterns, and (iii) extending hypernym patterns with anti-hypernym patterns to prune wrongly extracted hypernym relations. The results obtained by performing experiments on three corpora confirm that, using our approach, we are able to learn sequential patterns and combine them to outperform existing hypernym patterns in terms of precision and recall. The comparison to unsupervised distributional baselines for hypernym detection shows that, as expected, our approach yields much better performance. When compared to supervised distributional baselines for hypernym detection, our approach can be shown to be complementary and much less tightly coupled to training datasets and corpora.
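
    As an illustration of what a lexical, regular-expression Hearst-style pattern can look like, the following is a minimal Python sketch. It is not taken from the article: the pattern `SUCH_AS`, the function name `extract_hypernym_pairs`, and the restriction to single-word noun phrases are assumptions made for this example only.

    ```python
    import re

    # Toy lexical pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
    # Assumption: noun phrases are single words; a real system would rely on
    # chunking or dependency parses instead of raw tokens.
    SUCH_AS = re.compile(
        r"(?P<hyper>\w+),? such as "
        r"(?P<hypos>\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
    )

    def extract_hypernym_pairs(sentence):
        """Return (hyponym, hypernym) pairs matched by the toy 'such as' pattern."""
        pairs = []
        for match in SUCH_AS.finditer(sentence):
            hypernym = match.group("hyper")
            # Split the enumerated hyponyms on commas and on "and"/"or".
            hyponyms = re.split(r",? (?:and|or) |, ", match.group("hypos"))
            pairs.extend((hypo, hypernym) for hypo in hyponyms if hypo)
        return pairs

    print(extract_hypernym_pairs("He plays instruments such as guitar, piano and drums."))
    # A sentence expressing the same relation with other wording is missed,
    # which is one way a purely lexical pattern can lose recall.
    print(extract_hypernym_pairs("A guitar is a kind of instrument."))
    ```

    The second call returns an empty list, since the sentence does not contain the literal cue "such as"; this is only meant to hint at why such patterns tend to trade recall for precision.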
  • Agrawal, R. & Srikant, R. 1995. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, ICDE 1995, IEEE Computer Society, 3–14, http://dl.acm.org/citation.cfm?id=645480.655281

    Aldine, A. I. A., Harzallah, M., Berio, G., Béchet, N. & Faour, A. 2018. Redefining Hearst patterns by using dependency relations. In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, INSTICC, SciTePress, 148–155, doi: 10.5220/0006962201480155

    Baroni, M., Bernardi, R., Do, N. Q. & Chieh Shan, C. 2012. Entailment above the word level in distributional semantics. In EACL, 23–32.

    Béchet, N., Cellier, P., Charnois, T. & Crémilleux, B. 2012. Sequential pattern mining to discover relations between genes and rare diseases. In IEEE Int. Symp. on Computer-Based Medical Systems (CBMS), 1–6.

    Béchet, N., Cellier, P., Charnois, T. & Crémilleux, B. 2015. Sequence mining under multiple constraints. In Proceedings of the 30th Annual ACM Symposium on Applied Computing, SAC 2015, ACM, 908–914, doi: 10.1145/2695664.2695889, http://doi.acm.org/10.1145/2695664.2695889.

    Buitelaar, P., Cimiano, P. & Magnini, B. 2005. Ontology learning from text: An overview. In Ontology Learning from Text: Methods, Applications and Evaluation, 3–12.

    Camacho-Collados, J., Delli Bovi, C., Espinosa-Anke, L., Oramas, S., Pasini, T., Santus, E., Shwartz, V., Navigli, R. & Saggion, H. 2018. SemEval-2018 Task 9: Hypernym discovery. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), Association for Computational Linguistics.

    Cellier, P., Charnois, T. & Plantevit, M. 2010. Sequential patterns to discover and characterise biological relations. In Computational Linguistics and Intelligent Text Processing, Gelbukh, A. (ed). Springer Berlin Heidelberg, 537–548.

    Chandramouli, K., Kliegr, T., Nemrava, J., Svatek, V. & Izquierdo, E. 2008. Query refinement and user relevance feedback for contextualized image retrieval. In 2008 5th International Conference on Visual Information Engineering (VIE 2008), 453–458.

    Cui, H., Kan, M. Y. & Chua, T. S. 2007. Soft pattern matching models for definitional question answering. ACM Transactions on Information Systems 25, 8.

    Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding.

    Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. MIT Press.

    Gomez-Pérez, A. & Manzano-Mancho, D. 2004. An overview of methods and tools for ontology learning from texts. The Knowledge Engineering Review 19(3), 187–212. doi: 10.1017/S0269888905000251.

    Hearst, M. A. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics, 539–545.

    Hearst, M. A. 1998. Automated Discovery of WordNet Relations. WordNet: An Electronic Lexical Database, 131–152.

    Jacques, M. P. & Aussenac-Gilles, N. 2006. Variabilité des performances des outils de TAL et genre textuel. Cas des patrons lexico-syntaxiques 47, 11–32.

    Klein, D. & Manning, C. D. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL 2003, Association for Computational Linguistics, 423–430, doi: 10.3115/1075096.1075150, https://doi.org/10.3115/1075096.1075150.

    Kotlerman, L., Dagan, I., Szpektor, I. & Zhitomirsky-Geffet, M. 2010. Directional distributional similarity for lexical inference. NLE, 359–389.

    Levy, O., Remus, S., Biemann, C. & Dagan, I. 2015. Do supervised distributional methods really learn lexical inference relations? In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 970–976. doi: 10.3115/v1/N15-1098, https://www.aclweb.org/anthology/N15-1098.

    Lin, D. 2003. Dependency-based evaluation of Minipar. Treebanks - Building and Using Parsed Corpora, 317–329.

    Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS, 3111–3119.

    Mirkin, S., Dagan, I. & Geffet, M. 2006. Integrating pattern-based and distributional similarity methods for lexical entailment acquisition. In COLING and ACL, 579–586.

    Nguyen, D. P. T., Matsuo, Y. & Ishizuka, M. 2007. Exploiting syntactic and semantic information for relation extraction from Wikipedia. In IJCAI07-TextLinkWS.

    Orna-Montesinos, C. 2011. Words & Patterns: Lexico-Grammatical Patterns and Semantic Relations in Domain-Specific Discourses, 24.

    Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. & Hsu, M. C. 2001. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In International Conference on Data Engineering, 215–224.

    Pennington, J., Socher, R. & Manning, C. D. 2014. GloVe: Global vectors for word representation. In EMNLP, 1532–1543.

    Ponzetto, S. P. & Strube, M. 2011. Taxonomy induction based on a collaboratively built knowledge repository. Artificial Intelligence 175(9), 1737–1756, https://doi.org/10.1016/j.artint.2011.01.003, http://www.sciencedirect.com/science/article/pii/S000437021100004X

    Roller, S., Kiela, D. & Nickel, M. 2018. Hearst patterns revisited: Automatic hypernym detection from large text corpora. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 358–363, http://aclweb.org/anthology/P18-2057.

    Sang, E. T. K. & Hofmann, K. 2009. Lexical patterns or dependency patterns: Which is better for hypernym extraction? In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, CoNLL 2009, Association for Computational Linguistics, 174–182.

    Seitner, J., Bizer, C., Eckert, K., Faralli, S., Meusel, R., Paulheim, H. & Ponzetto, S. P. 2016. A large database of hypernymy relations extracted from the web. In LREC.

    Sheena, N., Jasmine, S. M. & Joseph, S. 2016. Automatic extraction of hypernym and meronym relations in English sentences using dependency parser. In Procedia Computer Science, 539–546.

    Shwartz, V., Goldberg, Y. & Dagan, I. 2016. Improving hypernymy detection with an integrated path-based and distributional method. CoRR abs/1603.06076, http://arxiv.org/abs/1603.06076

    Shwartz, V., Santus, E. & Schlechtweg, D. 2017. Hypernyms under siege: Linguistically-motivated artillery for hypernymy detection. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Association for Computational Linguistics, 65–75, https://www.aclweb.org/anthology/E17-1007

    Snow, R., Jurafsky, D. & Ng, A. 2005. Learning Syntactic Patterns for Automatic Hypernym Discovery. MIT Press, 1297–1304.

    Srikant, R. & Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 1996, Springer-Verlag, 3–17, http://dl.acm.org/citation.cfm?id=645337.650382

    Wang, J. & Han, J. 2004. BIDE: Efficient mining of frequent closed sequences. In Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, IEEE Computer Society, 79, http://dl.acm.org/citation.cfm?id=977401.978142

    Weeds, J. & Weir, D. 2003. A general framework for distributional similarity. In EMNLP, 81–88.

    Yan, X., Han, J. & Afshar, R. 2003. CloSpan: Mining closed sequential patterns in large datasets. In SDM, 166–177.

    Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. & Le, Q. V. 2020. XLNet: Generalized autoregressive pretraining for language understanding.

    Yu, C., Han, J., Wang, P., Song, Y., Zhang, H., Ng, W. & Shi, S. 2020. When Hearst is not enough: Improving hypernymy detection from corpus with distributional models.

    Zhang, E. & Zhang, Y. 2009. Average Precision, Springer US, 192–193. doi: 10.1007/978-0-387-39940-9_482, https://doi.org/10.1007/978-0-387-39940-9_482

    Zheng, W., Cheng, H., Yu, J. X., Zou, L. & Zhao, K. 2019. Interactive natural language question answering over knowledge graphs. Information Sciences 481, 141–159, doi: 10.1016/j.ins.2018.12.032, https://www.sciencedirect.com/science/article/pii/S0020025518309848

  • Cite this article

    Ahmad Issa Alaa Aldine, Mounira Harzallah, Giuseppe Berio, Nicolas Béchet, Ahmad Faour. 2021. A 3-phase approach based on sequential mining and dependency parsing for enhancing hypernym patterns performance. The Knowledge Engineering Review 36(1), doi: 10.1017/S0269888921000126


    • In Snow et al. (2005), paths are derived from the Minipar parser (Lin 2003), which is a shallow parser, while in Sheena et al. (2016), paths are derived from the Stanford parser (Klein & Manning 2003), which is a dependency parser.

    • For instance, in the noun phrases ‘musical instrument’ and ‘the title of book’, ‘instrument’ and ‘title’ are the respective headwords.

    • Blank cells correspond to patterns that are not learned from the corpus.

    • Word2Vec is available in the Gensim Python library.

    • The threshold of 20 was chosen after tests failed to extract good patterns from TS with fewer than 20 sentences; additionally, a small number of sentences can be analyzed manually if needed.

    • © The Author(s), 2021. Published by Cambridge University Press on behalf of Asian Journal of Law and Society.