2015 Volume 30
RESEARCH ARTICLE   Open Access    

Federated query processing on linked data: a qualitative survey and open challenges

  • Abstract: A large number of data providers publish and connect their structured data on the Web as linked data, turning the Web of data into a global data space. In this paper, we first give an overview of the query processing approaches used in this interlinked and distributed environment, and then focus on federated query processing on linked data. We provide detailed insight into the data source selection, join methods and query optimization methods of existing query federation engines. Furthermore, we present a qualitative comparison of these engines and a complementary comparison of the metrics measured for each engine, with the aim of pointing out the major strengths of each one. Finally, we discuss the major challenges of federated query processing on linked data.
  • Cite this article

    Damla Oguz, Belgin Ergenc, Shaoyi Yin, Oguz Dikenelli, Abdelkader Hameurlain. 2015. Federated query processing on linked data: a qualitative survey and open challenges. The Knowledge Engineering Review 30(5): 545−563. doi: 10.1017/S0269888915000107


The Knowledge Engineering Review, 2015, 30(5): 545−563


    • This work is partially supported by The Scientific and Technological Research Council of Turkey (TUBITAK).

    • http://code.google.com/p/sparql-aderis/

    • http://www.w3.org/TR/rdf-sparql-query/#alternatives

    • http://www.w3.org/TR/sparql11-query/#expressions

    • http://www.w3.org/TR/sparql11-query/

    • http://www.w3.org/TR/2013/REC-sparql11-service-description-20130321/

    • http://www.w3.org/TR/sparql11-query/#inline-data

    • © Cambridge University Press, 2015