Search
2016 Volume 31
Article Contents
RESEARCH ARTICLE   Open Access    

Optimizing large knowledge networks in spatial computers

More Information
  • Abstract: This paper presents a novel concept of a Spatially Aware Graph Store, which realizes a Graph Store on top of a spatial computer architecture to manage graphs in one, two or three physical dimensions. In this environment, the physical distance between graph nodes strongly affects graph traversal performance. Consequently, a Spatially Aware Graph Store needs to minimize these distances to operate efficiently. We show that this minimization can be achieved in two ways. First, by increasing the dimensionality of the spatial computer and second by applying optimization methods. For the latter, this work introduces a novel Mid Point Optimization method to quickly optimize large real-world knowledge networks by rearranging nodes in a way that distances between linked nodes are reduced. In addition, a Local Optimization method is subsequently applied to refine the result. Finally, the Node Decomposition method is presented that splits nodes with many edges into several smaller nodes to achieve a further reduction of distances between linked nodes.Our results show that the overall distances between nodes can be reduced by three orders of magnitude for 3D in comparison to one-dimensional (1D) Spatially Aware Graph Stores. The suggested Mid Point Optimization method achieves a reduction by another order of magnitude. In a 3D spatial computer, Local Optimization is capable of reducing distances by another 20%. However, in 1D and 2D spatial computers it becomes a prohibitive time consuming method. Finally, the Node Decomposition enables an additional distance reduction by 40% in Scale Free Graph Data sets.
  • 加载中
  • Abadi, D. J., Marcus, A., Madden, S. R. & Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases, 411–422. VLDB Endowment.

    Google Scholar

    Abou-Rjeili A. & Karypis G. 2006. Multilevel algorithms for partitioning power-law graphs. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, IPDPS’06, 10pp. IEEE.

    Google Scholar

    Binna, R. et al. (2010). SpiderStore: exploiting main memory for efficient RDF graph representation and fast querying. In Proceedings of the Workshop on Semantic Data Management (SemData). VLDB.

    Google Scholar

    Brandt A., Ron D. & Amit D. 1986. Multi-level approaches to discrete-state and stochastic problems. Multigrid Methods II 1228, 65–98.

    Google Scholar

    Broekstra J., Kampman A. & van Harmelen F. 2002. Sesame: a generic architecture for storing and querying RDF and RDF schema. In Semantic Web – ISWC’02, 2342, 54–68. Aidministrator Nederland BV.

    Google Scholar

    Butera B. 2002. Programming a paintable computer. PhD thesis, Massachusetts Institute of Technology.

    Google Scholar

    DeHon A., Giavitto J.-L. & Gruau F. 2007. Executive Report. In: Dagstuhl Seminar Proceedings, 06361—Computing Media and Languages for Space-Oriented Computation.

    Google Scholar

    Delorimier M. & Kapre N. 2006. GraphStep: a system architecture for sparse-graph algorithms. In Field-Programmable Custom Computing Machines, 2006, FCCM’06, 14th Annual IEEE Symposium on IEEE, 143–151.

    Google Scholar

    Delorimier, M., Kapre, N., Mehta, N. & Dehon, A. (2011). Spatial hardware implementation for sparse graph algorithms in GraphStep. ACM Transactions on Autonomous and Adaptive Systems, 6. 1-20.

    Google Scholar

    Erling O. 2008. Towards web scale RDF. In The 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS).

    Google Scholar

    Erling O. & Mikhailov I. 2009. RDF support in the Virtuoso DBMS. Networked Knowledge – Networked Media. Studies in Computational Intelligence. 221, 7–24.

    Google Scholar

    Fishburn P., Tetali P. & Winkler P. 2000. Optimal linear arrangement of arectangular grid. Discrete Mathematics 213, 123–139.

    Google Scholar

    Garey M. R. & Johnson D. S. 1979. Computers and Intractability; A Guide to theTheory of NP-Completeness. W. H. Freeman & Co.

    Google Scholar

    Harel D. & Koren Y. 2004. Graph drawing by high-dimensional embedding. Journal of Graph Algorithms and Applications 8, 195–214.

    Google Scholar

    Harris S., Lamb N. & Shadbolt N. 2009. 4store: the design and implementation of a clustered RDF store. In 5th International Workshop on Scalable Semantic Web KnowledgeBase Systems (SSWS2009), 94–109.

    Google Scholar

    Harth, A., Umbrich, J., Hogan, A. & Decker, S. (2007). Yars2: a federated repository for searching and querying graphstructured data. In Proceedings of the 6th International Semantic Web Conference—ISWC 06, 211–224.

    Google Scholar

    Hu Y. 2006. Efficient, high-quality force-directed graph drawing. The Mathematica Journal 10, 37–71.

    Google Scholar

    Huang J., Abadi D. J. & Ren K. 2011. Scalable SPARQL querying of large RDF graphs. In Proceedings of the VLDB Endowment, 4, 1123–1134.

    Google Scholar

    Husain M. & McGlothlin J. 2011. Heuristics-based query processing for large RDF graphs using cloud computing. IEEE Transactions on Knowledge and Data Engineering 23, 1312–1327.

    Google Scholar

    Janik M. & Kochut K. 2005. BRAHMS: a work bench RDF store and high performance memory system for Semantic Association Discovery. In Fourth International Semantic Web Conference, 431–445.

    Google Scholar

    Johnson, D. S., Aragon, C. R., Mcgeoch, L. A. & Schevon, C. (1989). Optimization by simulated annealing: an experimental evaluation; part I, graph partitioning. Operations Research 37, 865-892.

    Google Scholar

    Koren Y. & David H. 2002. A multi-scale algorithm for the linear arrangement problem. In Graph-Theoretic Concepts in Computer Science, 296–309.

    Google Scholar

    Mondal J. & Deshpande A. 2012. Managing large dynamic graphs efficiently. In Proceedings of the 2012 International Conference on Management of Data—SIGMOD 12, 145–156.

    Google Scholar

    Neumann T. & Weikum G. 2008. RDF-3X: a RISC-style engine for RDF. In Proceedings of the VLDB Endowment 1, 647–-659.

    Google Scholar

    Nguyen, Q., Eades, P., Hong, S. & Huang, W. (2011). Large crossing angles in circular layouts. In Proceedings of the18th International Conference on Graph Drawing, 397–399.

    Google Scholar

    Ousterhout, J., Agrawal, P., Erickson, D., Kozyrakis, C., Leverich, J., Mazières, D., Mitra, S., Narayanan, A., Ongaro, D., Parulkar, G., Rosenblum, M., Rumble, S., Stratmann, E. & Stutsman, R. (2011). The case for RAM Cloud. Communications of the ACM 54, 121.

    Google Scholar

    Petit J. 2003. Experiments on the minimum linear arrangement problem. Journal of Experimental Algorithmics 8, 112–128.

    Google Scholar

    Rodriguez-Tello E., Hao J.-K. & Torres-Jimenez J. 2006. A refined evaluation function for the MinLA problem. In MICAI2006: Advances in Artificial Intelligence, 392–403.

    Google Scholar

    Rodriguez-Tello E., Hao J.-K. & Torres-Jimenez J. 2008. An effective two-stage simulated annealing algorithm for the minimum linear arrangement problem. Computers & Operations Research 35, 3331–3346.

    Google Scholar

    Safro I., Ron D. & Brandt A. 2006. Graph minimum linear arrangement by multilevel weighted edge contractions. Journal of Algorithms 60, 24–41.

    Google Scholar

    Safro I., Ron D. & Brandt A. 2009. Multilevel algorithms for linear ordering problems. Journal of Experimental Algorithmics 13, 1.4:1–1.4:20.

    Google Scholar

    Shao B., Wang H. & Li Y. 2012. The Trinity Graph Engine. Technical Report, Microsoft Research.

    Google Scholar

    Shvachko K. & Kuang H. 2010. The hadoop distributed file system. In IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 1–10.

    Google Scholar

    Walshaw C. 2003. A multilevel algorithm for force-directed graph drawing. Journal of Graph Algorithms and Applications 7, 253–285.

    Google Scholar

    Wilkinson, K., Sayers, C., Kuno, H. & Reynolds, D. (2003). Efficient RDF storage and retrieval in Jena2. In Proceedings of the 1st International Workshop on Semantic Web and Databases, 35–43.

    Google Scholar

    Wulf W. A. & McKee S. A. 1995. Hitting the memory wall: implications of the obvious. ACMSIGARCH Computer Architecture News 23, 20–24.

    Google Scholar

    Zeng, K., Yang, J., Wang, H., Shao, B. & Wang, Z. (2013). A distributed graph engine for web scale RDF data. In Proceedings of the 39th International Conference on Very Large Data Bases, 265–276.

    Google Scholar

  • Cite this article

    Dominic Pacher, Robert Binna, Günther Specht. 2016. Optimizing large knowledge networks in spatial computers. The Knowledge Engineering Review 31(4)367−390, doi: 10.1017/S0269888916000187
    Dominic Pacher, Robert Binna, Günther Specht. 2016. Optimizing large knowledge networks in spatial computers. The Knowledge Engineering Review 31(4)367−390, doi: 10.1017/S0269888916000187

Article Metrics

Article views(26) PDF downloads(4)

Other Articles By Authors

RESEARCH ARTICLE   Open Access    

Optimizing large knowledge networks in spatial computers

The Knowledge Engineering Review  31 2016, 31(4): 367−390  |  Cite this article

Abstract: Abstract: This paper presents a novel concept of a Spatially Aware Graph Store, which realizes a Graph Store on top of a spatial computer architecture to manage graphs in one, two or three physical dimensions. In this environment, the physical distance between graph nodes strongly affects graph traversal performance. Consequently, a Spatially Aware Graph Store needs to minimize these distances to operate efficiently. We show that this minimization can be achieved in two ways. First, by increasing the dimensionality of the spatial computer and second by applying optimization methods. For the latter, this work introduces a novel Mid Point Optimization method to quickly optimize large real-world knowledge networks by rearranging nodes in a way that distances between linked nodes are reduced. In addition, a Local Optimization method is subsequently applied to refine the result. Finally, the Node Decomposition method is presented that splits nodes with many edges into several smaller nodes to achieve a further reduction of distances between linked nodes.Our results show that the overall distances between nodes can be reduced by three orders of magnitude for 3D in comparison to one-dimensional (1D) Spatially Aware Graph Stores. The suggested Mid Point Optimization method achieves a reduction by another order of magnitude. In a 3D spatial computer, Local Optimization is capable of reducing distances by another 20%. However, in 1D and 2D spatial computers it becomes a prohibitive time consuming method. Finally, the Node Decomposition enables an additional distance reduction by 40% in Scale Free Graph Data sets.

    • Dominic Pacher was supported by a scholarship from the University of Innsbruck (Doktoratsstipendium aus der Nachwuchsförderung, MIP8/2012/1).

    • Freie Universität Berlin, The Linking Open Data cloud diagram, http://lod-cloud.net/.

    • Stanford University, Stanford Large Network Data set Collection, http://snap.stanford.edu/data/.

    • The DBpedia Data set, http://wiki.dbpedia.org.

    • Yago2s: a high-quality knowledge base, http://www.mpi-inf.mpg.de/yago-naga/yago/.

    • Supplementary Materials, http://tinyurl.com/njnj8hf.

    • © Cambridge University Press, 2016 2016Cambridge University Press
References (37)
  • About this article
    Cite this article
    Dominic Pacher, Robert Binna, Günther Specht. 2016. Optimizing large knowledge networks in spatial computers. The Knowledge Engineering Review 31(4)367−390, doi: 10.1017/S0269888916000187
    Dominic Pacher, Robert Binna, Günther Specht. 2016. Optimizing large knowledge networks in spatial computers. The Knowledge Engineering Review 31(4)367−390, doi: 10.1017/S0269888916000187
  • Catalog

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return