Lero–Science Foundation Ireland Research Centre for Software, Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland
NatPro Center, School of Pharmacy and Pharmaceutical Sciences, Trinity College Dublin, Dublin 2, Ireland
2023 Volume 38
RESEARCH ARTICLE   Open Access    

Using active learning and an agent-based system to perform interactive knowledge extraction based on the COVID-19 corpus

  • Corresponding author: Yao Yao; Email: Yao.Yao@ncirl.ie
  • Abstract: Efficient knowledge extraction from Big Data is a challenging problem. Recognizing relevant concepts in unannotated data while considering both context and domain knowledge is critical to successful knowledge extraction. In this research, we present a novel platform, Active Learning Integrated with Knowledge Extraction (ALIKE), that overcomes the challenges of context awareness and concept extraction, which have impeded knowledge extraction in Big Data. We propose a method to extract related concepts from unorganized data with different contexts using multiple agents, synergy, reinforcement learning, and active learning. We test ALIKE on the datasets of the COVID-19 Open Research Dataset Challenge. The experimental results suggest that the ALIKE platform can distinguish inherent concepts from different papers more efficiently than a non-agent-based method (without active learning) and that our proposed approach has a better chance of addressing the challenges of knowledge extraction with heterogeneous datasets. Moreover, the techniques used in ALIKE are transferable to any domain with multidisciplinary activity.
  • Cite this article

    Yao Yao, Junying Liu, Conor Ryan. 2023. Using active learning and an agent-based system to perform interactive knowledge extraction based on the COVID-19 corpus. The Knowledge Engineering Review 38(1), doi: 10.1017/S0269888923000085


    • This work was supported by Science Foundation Ireland grant 13/RC/2094_P2 and co-funded under the European Regional Development Fund through the Southern & Eastern Regional Operational Programme to Lero, the Science Foundation Ireland Research Centre for Software (www.lero.ie). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 754489. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.