Instituto Gulbenkian de Ciência, Oeiras, Portugal, e-mail: dfaria@igc.gulbenkian.pt
INESC-ID, Lisboa, Portugal
Department of Computer Science, Università degli Studi di Milano, Milan, Italy, e-mails: alfio.ferrara@unimi.it, stefano.montanelli@unimi.it
Data Science Research Center, Università degli Studi di Milano, Milan, Italy
City, University of London, London, UK, e-mail: ernesto.jimenez-ruiz@city.ac.uk
Department of Informatics, University of Oslo, Oslo, Norway, e-mail: ernestoj@ifi.uio.no
Lasige, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal, e-mail: clpesquita@fc.ul.pt
2020 Volume 35
RESEARCH ARTICLE   Open Access    

Crowd-assessing quality in uncertain data linking datasets

  • Abstract: The quality of a dataset used for evaluating data linking methods, techniques, and tools depends on the availability of a set of mappings, called the reference alignment, that is known to be correct. In particular, it is crucial that mappings represent relations between pairs of entities that are indeed similar because they denote the same object. Since the reliability of mappings is decisive for a fair evaluation of automatic linking methods and tools, we call this property of mappings mapping fairness. In this article, we propose a crowd-based approach, called Crowd Quality (CQ), for assessing the quality of data linking datasets by measuring the fairness of the mappings in the reference alignment. Moreover, we present a real experiment in which we evaluate two state-of-the-art data linking tools before and after the refinement of the reference alignment based on the CQ approach, in order to show the benefits of the crowd assessment of mapping fairness.
  • Achichi, M., Cheatham, M., Dragisic, Z., Euzenat, J., Faria, D., Ferrara, A., Flouris, G., Fundulaki, I., Harrow, I., Ivanova, V., Jiménez-Ruiz, E., Kuss, E., Lambrix, P., Leopold, H., Li, H., Meilicke, C., Montanelli, S., Pesquita, C., Saveta, T., Shvaiko, P., Splendiani, A., Stuckenschmidt, H., Todorov, K., Trojahn dos Santos, C. & Zamazal, O. 2016. Results of the ontology alignment evaluation initiative 2016. In 11th International Workshop on Ontology Matching (OM 2016), Kobe, Japan, 73–129. CEUR-WS.org.

    Acosta, M., Zaveri, A., Simperl, E., Kontokostas, D., Auer, S. & Lehmann, J. 2013. Crowdsourcing linked data quality assessment. In Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, 260–276.

    Algergawy, A., Cheatham, M., Faria, D., Ferrara, A., Fundulaki, I., Harrow, I., Hertling, S., Jiménez-Ruiz, E., Karam, N., Khiat, N., Lambrix, P., Li, H., Montanelli, S., Paulheim, H., Pesquita, C., Saveta, T., Schmidt, D., Shvaiko, P., Splendiani, A., Thiéblin, E., Trojahn dos Santos, C., Vatascinová, J., Zamazal, O. & Zhou, L. 2018. Results of the ontology alignment evaluation initiative 2018. In 13th International Workshop on Ontology Matching (OM 2018), Monterey, CA, USA, 76–116. CEUR-WS.org.

    Bozzon, A., Brambilla, M., Ceri, S. & Mauri, A. 2013. Reactive crowdsourcing. In Proceedings of the 22nd International World Wide Web Conference (WWW 2013), Rio de Janeiro, Brazil, 153–164.

    Carmines, E. G. & Zeller, R. A. 1979. Reliability and Validity Assessment, 17. Sage Publications.

    Castano, S., Ferrara, A., Genta, L. & Montanelli, S. 2016. Combining crowd consensus and user trustworthiness for managing collective tasks. Future Generation Computer Systems 54.

    Castano, S., Ferrara, A. & Montanelli, S. 2015. A multi-dimensional approach to crowd-consensus modeling and evaluation. In Proceedings of the 34th International Conference on Conceptual Modeling (ER 2015), Stockholm, Sweden.

    Cheatham, M. & Hitzler, P. 2014. Conference v2.0: an uncertain version of the OAEI conference benchmark. In Proceedings of the 13th International Semantic Web Conference, Riva del Garda, Italy, 33–48.

    Cruz, I. F., Loprete, F., Palmonari, M., Stroe, C. & Taheri, A. 2014. Pay-as-you-go multi-user feedback model for ontology matching. In Proceedings of the 19th International Conference on Knowledge Engineering and Knowledge Management, Linköping, Sweden, 80–96.

    Cuenca Grau, B., Dragisic, Z., Eckert, K., Euzenat, J., Ferrara, A., Granada, R., Ivanova, V., Jiménez-Ruiz, E., Kempf, A. O., Lambrix, P., Nikolov, A., Paulheim, H., Ritze, D., Scharffe, F., Shvaiko, P., Trojahn dos Santos, C. & Zamazal, O. 2013. Results of the ontology alignment evaluation initiative 2013. In 8th International Workshop on Ontology Matching (OM 2013), Sydney, Australia, 61–100. CEUR-WS.org.

    Dragisic, Z., Ivanova, V., Lambrix, P., Faria, D., Jiménez-Ruiz, E. & Pesquita, C. 2016. User validation in ontology alignment. In Proceedings of the 15th International Semantic Web Conference, Kobe, Japan.

    Estellés-Arolas, E. & Guevara, F. G. L. 2012. Towards an integrated crowdsourcing definition. Journal of Information Science 38(2), 189–200.

    Euzenat, J., Rosoiu, M. & dos Santos, C. T. 2013. Ontology matching benchmarks: generation, stability, and discriminability. Journal of Web Semantics 21, 30–48.

    Euzenat, J. & Shvaiko, P. 2013. Ontology Matching, 2nd edition. Springer.

    Euzenat, J. & Shvaiko, P. 2007. Ontology Matching, 18. Springer.

    Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I. F. & Couto, F. M. 2013. The AgreementMakerLight ontology matching system. In OTM Conferences - ODBASE, 527–541.

    Ferrara, A., Montanelli, S., Noessner, J. & Stuckenschmidt, H. 2011. Benchmarking matching applications on the semantic web. In Extended Semantic Web Conference. Springer, 108–122.

    Galton, F. 1907. One vote, one value. Nature 75, 414.

    Genta, L., Ferrara, A. & Montanelli, S. 2017. Consensus-based techniques for range-task resolution in crowdsourcing systems. In Proceedings of the 7th EDBT International Workshop on Linked Web Data Management, Venice, Italy.

    Howe, J. 2006. The rise of crowdsourcing. Wired Magazine 14(6), 1–4.

    Jiménez-Ruiz, E. & Cuenca Grau, B. 2011. LogMap: logic-based and scalable ontology matching. In Proceedings of the 10th International Semantic Web Conference, Bonn, Germany, 273–288.

    Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I. & Berlanga, R. 2011. Logic-based assessment of the compatibility of UMLS ontology sources. Journal of Biomedical Semantics 2.

    Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y. & Horrocks, I. 2012a. Large-scale interactive ontology matching: algorithms and implementation. In European Conference on Artificial Intelligence (ECAI), 444–449.

    Jiménez-Ruiz, E., Grau, B. C., Horrocks, I. et al. 2012b. Exploiting the UMLS metathesaurus in the ontology alignment evaluation initiative. In 2nd International Workshop on Exploiting Large Knowledge Repositories (E-LKR). CEUR-WS.org.

    Li, H., Dragisic, Z., Faria, D., Ivanova, V., Jiménez-Ruiz, E., Lambrix, P. & Pesquita, C. 2019. User validation in ontology alignment: functional assessment and impact. Knowledge Engineering Review 34, e15.

    Malone, T. W., Laubacher, R. & Dellarocas, C. 2010. The collective intelligence genome. IEEE Engineering Management Review 38(3).

    Mortensen, J. M. 2013. Crowdsourcing ontology verification. In Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, 448–455.

    Ngomo, A.-C. N. & Auer, S. 2011. Limes: a time-efficient approach for large-scale link discovery on the web of data. In 22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain.

    Noronha, J., Hysen, E., Zhang, H. & Gajos, K. Z. 2011. Platemate: crowdsourcing nutritional analysis from food photographs. In Proceedings of the 24th Symposium on User Interface Software and Technology, Santa Barbara, CA, USA, 1–12.

    Noy, N. F., Mortensen, J., Musen, M. A. & Alexander, P. R. 2013. Mechanical turk as an ontology engineer?: using microtasks as a component of an ontology-engineering workflow. In Proceedings of the 5th ACM Web Science Conference, Paris, France, 262–271.

    Paulheim, H., Hertling, S. & Ritze, D. 2013. Towards evaluating interactive ontology matching tools. In Proceedings of the 10th Extended Semantic Web Conference, Montpellier, France, 31–45.

    Röder, M., Saveta, T., Fundulaki, I. & Ngomo, A.-C. N. 2017. Hobbit link discovery benchmarks. In 12th International Workshop on Ontology Matching (OM 2017), Vienna, Austria.

    Sarasua, C., Simperl, E. & Noy, N. F. 2012. CrowdMap: crowdsourcing ontology alignment with microtasks. In Proceedings of the 11th International Semantic Web Conference, Boston, MA, USA, 525–541.

    Saveta, T., Daskalaki, E., Flouris, G., Fundulaki, I., Herschel, M. & Ngonga Ngomo, A.-C. 2015. Pushing the limits of instance matching systems: a semantics-aware benchmark for linked data. In Proceedings of the 24th International Conference on World Wide Web, ACM, 105–106.

    Thaler, S., Simperl, E. P. B. & Siorpaes, K. 2011. SpotTheLink: a game for ontology alignment. In Proceedings of the 6th Conference on Professional Knowledge Management: From Knowledge to Action, Innsbruck, Austria, 246–253.

    Van Dusen, D. A., Chase, C. & Wise, J. A. 2016. System and method for fuzzy concept mapping, voting ontology crowd sourcing, and technology prediction. US Patent 9461876.

    Volz, J., Bizer, C., Gaedke, M. & Kobilarov, G. 2009. Silk: a link discovery framework for the web of data. In International Workshop on Linked Data on the Web (LDOW2009), Madrid, Spain. CEUR-WS.org.

  • Cite this article

    Daniel Faria, Alfio Ferrara, Ernesto Jiménez-Ruiz, Stefano Montanelli, Catia Pesquita. 2020. Crowd-assessing quality in uncertain data linking datasets. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000363


    • Daniel Faria was funded by the EC H2020 grant 676559 ELIXIR-EXCELERATE and the Portuguese FCT grants 22231 BioData.pt (co-financed by FEDER) and UIDB/50021/2020 (to INESC-ID). Catia Pesquita was supported by FCT through the LaSIGE research unit (ref. UIDB/00408/2020 and ref. UIDP/00408/2020) and project SMILAX ref. PTDC/EEI-ESS/4633/2014. Ernesto Jiménez-Ruiz was supported by the AIDA project, The Alan Turing Institute under the EPSRC grant EP/N510129/1 and the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889).

    • Throughout this article, we use ontology matching, or simply matching, to denote the matching of either Tbox concepts/properties or Abox individuals, and ontology to denote either the Tbox or the Abox. We use instance matching when referring specifically to the matching of Abox individuals.

    • http://oaei.ontologymatching.org.

    • The running example is based on an ontology Abox describing about 1,600 board games. Data have been retrieved from the BoardGameGeek (BGG) website (https://boardgamegeek.com).

    • The refinement threshold th is a parameter that is expected to be set by the designer of the evaluation process. Higher values of th produce a simpler challenge for matching tools, in that only highly fair mappings are preserved. In this paper, we run an extensive experimental evaluation of the impact of different values of th on the evaluation process (see Section 5).

    • The criteria used for assigning tasks to workers are outside the scope of this work; they depend on the specific task routing policies enforced by the crowdsourcing platform where the campaign is hosted.

    • For the sake of clarity, the slider lets crowd workers specify an answer in the range [0, 10], which is then rescaled to the range [0, 1] when the answer is inserted in A.

    • Further details about Argo and related crowdsourcing techniques for consensus evaluation are provided in Castano et al. (2016).

    • The value of the threshold $th_{cv}$ has been determined on the basis of experimental observations to achieve the best trade-off between the number of committed tasks (i.e., tasks with successful consensus evaluation) and the number of worker answers to consider in the consensus group (see the discussion on the ma techniques provided in Section 4.2).

    • http://islab.di.unimi.it/iimb/

    • © The Author(s), 2020. Published by Cambridge University Press.
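
The notes above on the slider range and on the refinement threshold th can be illustrated with a minimal Python sketch. It is not the CQ implementation: the function names, the mapping identifiers, the use of a plain mean to aggregate worker answers into a fairness score, and the choice of "greater than or equal to th" as the retention rule are all assumptions made for illustration. The article itself relies on consensus evaluation, which this sketch does not reproduce.

```python
from statistics import mean


def rescale_answer(slider_value: float) -> float:
    """Map a slider answer in [0, 10] to the range [0, 1]."""
    if not 0 <= slider_value <= 10:
        raise ValueError("slider value must lie in [0, 10]")
    return slider_value / 10.0


def refine_alignment(fairness: dict, th: float) -> set:
    """Keep only the mappings whose fairness score is at least th (assumed rule)."""
    return {mapping for mapping, score in fairness.items() if score >= th}


# Hypothetical worker answers (slider values) for three hypothetical mappings.
answers = {
    ("game:catan", "bgg:13"): [9, 10, 8],
    ("game:risk", "bgg:181"): [6, 5, 7],
    ("game:chess", "bgg:171"): [2, 1, 3],
}

# Assumed aggregation: the fairness of a mapping is the mean rescaled answer.
fairness = {
    mapping: mean(rescale_answer(value) for value in values)
    for mapping, values in answers.items()
}

# A higher threshold preserves fewer mappings in the refined alignment.
strict = refine_alignment(fairness, th=0.8)
lenient = refine_alignment(fairness, th=0.5)
```

With these made-up answers, the stricter threshold keeps a single mapping while the more lenient one keeps two, which is the sense in which higher values of th preserve only highly fair mappings.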