Search
2013 Volume 28
Article Contents
RESEARCH ARTICLE   Open Access    

A survey of frequent subgraph mining algorithms

More Information
  • Abstract: Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
  • 加载中
  • Abello A., Resende M. G. C., Sundarsky S.2002. Massive quasi-clique detection. In Proceedings of the 5th Latin America Symposium on Theoretical Informatics, Cancun, Mexico, 598–612.

    Google Scholar

    Agrawal R., Srikant R.1994. Fast algorithm for mining association rules. In Proceedings of the 20th International Conference on Very Large Databases(VLDB). Morgan Kaufmann, 487–499.

    Google Scholar

    Agrawal R. C., Aggarwal C. C., Prasad V. V. V.2001. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing61(3), 350–371.

    Google Scholar

    Alm E., Arkin A. P.2003. Biological Networks. Current Opinion in Structural Biology13(2), 193–202.

    Google Scholar

    Asai T., Abe K., Kawasoe S., Arimura H., Satamoto H., Arikawa S.2002. Efficient substructure discovery from large semi-structured data. In Proceedings of the 2nd SIAM International Conference on Data Mining, Fukuoka, Japan, 158–174.

    Google Scholar

    Asai T., Arimura H., Uno T., Nakano S.2003. Discovering frequent substructures in large unordered trees. In Proceedings of the 6th International Conference on Discovery Science, Fukuoka, Japan, 47–61.

    Google Scholar

    Bayardo R. J.Jr1998. Efficiently Mining Long Patterns from Databases. In Proceedings of the 1998 International Conference on Management of Data, 85–93.

    Google Scholar

    Bomze I. M., Budinich M., Pardalos P. M., Pelillo M.1999. The maximum clique problem. In Handbook of Combinatorial Optimization, Do, D-Z. & Pardalos, P. M. (eds). Kluwer Academic Publishers, 4, 1–74.

    Google Scholar

    Borgelt C., Berthold M.2002. Mining molecular fragments: finding relevant substructures of molecules. In Proceedings of International Conference on Data Mining, 211–218.

    Google Scholar

    Borgwardt K. M., Kriegel H. P.2005. Shortest-path kernels on graphs. In Proceedings of the 2005 International Conference on Data Mining, 74–81.

    Google Scholar

    Brin S., Page L.1998. The anatomy of a large-scale hyper-textual web search engine. In Proceedings of the 7th International World Wide Web Conference, 107–117.

    Google Scholar

    Bunke H., Allerman G.1983. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters1(4), 245–253.

    Google Scholar

    Calders T., Ramon J., van Dyck D.2008. Anti-monotonic overlap-graph support measures. In Proceedings of the Eighth IEEE International Conference on Data Mining, 73–82.

    Google Scholar

    Chakrabarti S., Dom B., Gibson D., Kleinberg J., Kumar R., Raghavan P., Rajagopaln S., Tomkins A.1999. Mining the link structure of the world wide web. IEEE Computer32(8), 60–67.

    Google Scholar

    Chen M. S., Han J., Yu P. S.1996. Data mining: an overview from database perspective. IEEE Transaction on Knowledge and Data Engineering8, 866–883.

    Google Scholar

    Chen C., Yan X., Zhu F., Han J.2007a. gApprox: mining frequent approximate patterns from a massive network. In Proceedings of the 7th IEEE International Conference on Data Mining, 445–450.

    Google Scholar

    Chen C., Yan X., Yu P. S., Han J., Zhang D., Gu X.2007b. Towards graph containment search and indexing. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07), 926–937.

    Google Scholar

    Chen C., Lin C. X., Yan X., Han J.2008. On effective presentation of graph patterns: a structural representative approach. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, 299–308.

    Google Scholar

    Chi Y., Yang Y., Xia Y., Muntz R. R.2003. Indexing and mining free trees. In Proceedings of the 2003 IEEE International Conference on Data Mining, 509–512.

    Google Scholar

    Chi Y., Nijssen S., Muntz R., Kok J.2004. Frequent subtree mining – an overview. Fundamenta Informaticae, Special Issue on Graph and Tree Mining66(1–2), 161–198.

    Google Scholar

    Chi Y., Yang Y., Xia Y., Muntz R. R.2004a. HybridTreeMiner: an efficient algorithm for mining frequent rooted trees and trees using canonical forms. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management, 11–20.

    Google Scholar

    Chi Y., Yang Y., Xia Y., Muntz R. R.2004b. CMTreeMiner: mining both closed and maximal frequent subtrees. In Proceedings of the 8th Pacific Asia Conference on Knowledge Discovery and Data Mining, 63–73.

    Google Scholar

    Chi Y., Yang Y., Xia Y., Muntz R. R.2005. Canonical forms for labelled trees and their applications in frequent subtree mining. Journal of Knowledge and Information Systems8(2), 203–234.

    Google Scholar

    Christmas W. J., Kittler J., Petrou M.1995. Structural matching in computer vision using probabilistic relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence17(8), 749–764.

    Google Scholar

    Chung J. C.1987. (n2.5) time algorithm for subgraph homeomorphism problem on trees. Journal of Algorithms8, 106–112.

    Google Scholar

    Conte D., Foggia F., Sansone C., Vento M.2004. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence18(3), 265–298.

    Google Scholar

    Cook D. J., Holder L. B.1994. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research1, 231–255.

    Google Scholar

    Cook D. J., Holder L. B.2000. Graph-based data mining. IEEE Intelligent Systems15(2), 32–41.

    Google Scholar

    Cordella L. P., Foggia P., Sansone C., Vento M.2001. An improved algorithm for matching large graphs. In Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representation in Pattern Recognition, 149–159.

    Google Scholar

    Cordella L. P., Foggia P., Sansone C., Tortorella F., Vento M.1998. Graph Matching: a fast algorithm and its evaluation. In Proceedings of the 14th Conference on Pattern Recognition, 1582–1584.

    Google Scholar

    Cui J. H., Kim J., Maggiorini D., Boussetta K., Gerla M.2005. Aggregated multicast – a comparative study. Cluster Computing8(1), 15–26.

    Google Scholar

    Deshpande M., Kuramochi M., Wale N., Karypis G.2005. Frequent sub-structure-based approach for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering17(8), 1036–1050.

    Google Scholar

    Fan W., Zhang K., Cheng H., Gao J., Yan X., Han J., Yu P. S., Verscheure O.2008. Direct mining of discriminative and essential frequent patterns via model-based search tree. In Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, 230–238.

    Google Scholar

    Fatta G. D., Berthold M. R.2005. High performance subgraph mining in molecular compounds. In Proceedings of the 2005 International Conference on High Performance Computing and Communications (HPCC'05), 866–877.

    Google Scholar

    Fischer I., Meinl T.2004. Graph based molecular data mining – an overview. In Proceedings of the 2004 IEEE International Conference on Systems,Man and Cybernetics, 4578–4582.

    Google Scholar

    Flake G., Tarjan R., Tsioutsiouliklis K.2004. Graph clustering and minimum cut trees. Internet Mathematics1, 385–408.

    Google Scholar

    Fortin S.1996. The Graph Isomorphism Problem Technical report, no. TR06-20, The University of Alberta.

    Google Scholar

    Foggia P., Genna R., Vento M.2001. A performance comparison of five algorithms for graph isomorphism. In Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representation in Pattern Recognition, 188–199.

    Google Scholar

    Garey M. R., Johnson D. S.1979. Computers and intractability – a guide to the theory of NP-completeness. W.H. Freeman and Company.

    Google Scholar

    Gärtner T., Flach P., Wrobel S.2003. On graph kernels: hardness results and efficient alternatives. In Proceedings of the 16th Annual Conference on Learning Theory (COLT'03), 129–143.

    Google Scholar

    Getoor L., Diehl C.2005. Link mining: a survey. ACM SIGKDD Explorations Newsletter7(2), 3–12.

    Google Scholar

    Gibbons A.1985. Algorithmic Graph Theory. Cambridge University Press.

    Google Scholar

    Greco G., Guzzo A., Manco G., Pontieri L., Saccá D.2005. Mining Constrained Graphs: the case of workflow systems, constraint based mining and inductive databases, Lecture Notes in Computer Science, 155–171. Springer.

    Google Scholar

    Gudes E., Shimony S. E., Vanetik N.2006. Discovering frequent graph patterns using disjoint paths. IEEE Transaction on Knowledge and Data Engineering18(11), 1441–1456.

    Google Scholar

    Gutin G.2004. 5.3 Independent sets and cliques. In Handbook of Graph Theory, Discrete Mathematics & Its Applications, Gross, J. L. & Yellin, J. (eds). CRC Press, 389–402.

    Google Scholar

    Han J., Kamber M.2006. Data Mining Concepts and Techniques, 2nd edition. Morgan Kaufmann.

    Google Scholar

    Han J., Pei J., Yin Y.2000. Mining frequent patterns without candidate generation. In Proceedings of ACM SIGMOD International Conference on Management of Data, 1–12.

    Google Scholar

    Han J., Cheng H., Xin D., Yan X.2007. Frequent pattern mining: current status and future directions. Journal of Data Mining and Knowledge Discovery15(1), 55–86.

    Google Scholar

    Harary F., Ross I. C.1957. A procedure for clique detection using the group matrix. Sociometry20(3), 205–215.

    Google Scholar

    Hein J., Jiang T., Wang L., Zhang K.1996. On the complexity of comparing evolutionary trees. Discrete Applied Mathematics71(1–3), 153–169.

    Google Scholar

    Hido S., Kawano H.2005. AMIOT:induced ordered tree mining in tree-structured databases. In Proceedings of the 5th IEEE International Conference on Data Mining, 170–177.

    Google Scholar

    Hopcroft J. E., Tarjan R. E.1972. Isomorphism of planar graphs. In Complexity of Computer Computations, Miller, R. E. & Thatcher, J. W. (eds). IBM Research Symposia Series Plenum Press, 131–152.

    Google Scholar

    Hu H., Yan X., Huang Y., Han J., Zhou X.2005. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics21(1), 213–221.

    Google Scholar

    Huan J., Wang W., Prins J.2003. Efficient mining of frequent subgraph in the presence of isomorphism. In Proceedings of the 2003 International Conference on Data Mining, 549–552.

    Google Scholar

    Huan J., Wang W., Prins J., Yang J.2004. SPIN: mining maximal frequent subgraphs from graph databases. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 581–586.

    Google Scholar

    Huang X., Lai W.2006. Clustering graphs for visualization via node similarities. Visual Language and Computing17, 225–253.

    Google Scholar

    Inokuchi A., Washio T., Motoda H.2000. An Apriori-based algorithm for mining frequent substructures from graph data. In Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, 13–23.

    Google Scholar

    Inokuchi A., Washio T., Motoda H.2003. Complete mining of frequent patterns from graphs: mining graph data. Journal of Machine Learning50(3), 321–354.

    Google Scholar

    Inokuchi A., Washio T., Nishimura K., Motoda H.2002. A Fast Algorithm for Mining Frequent Connected Subgraphs. Technical report RT0448, IBM Research, Tokyo Research Laboratory, Japan.

    Google Scholar

    Jahn K., Kramer S.2005. Optimizing gSpan for molecular datasets. In Proceedings of the 3rd International Workshop on Mining Graphs, Trees and Sequences, 509–523.

    Google Scholar

    Kashima H., Tsuda K., Inokuchi A.2003. Marginalized kernels between labelled graphs. In Proceedings of the 20th International Conference on Machine Learning (ICML'03), 321–328.

    Google Scholar

    Ke Y., Cheng J.,, Ng W.2007. Correlated search in graph databases. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 390–399.

    Google Scholar

    Ke Y., Cheng J., Yu J.2009. Efficient discovery of frequent correlated subgraph pairs. In Proceedings of the 9th IEEE International Conference on Data Mining, 239–248.

    Google Scholar

    Kelley B., Sharan R., Karp R., Sittler E., Root D., Stockwell B., Tdeker T.2003. Conserved pathways within bacteria and yeast as revealed by Global Protein Network alignment. In Proceedings of the National Academy of Science of the United States of America (PNAS'03)100(20), 11394–11399.

    Google Scholar

    Kleinberg J. M.1998. Authoritative sources in a hyper-linked environment. In Proceedings of ACM-SIAM Symposium Discrete Algorithms, 668–677.

    Google Scholar

    Kosala R., Blockeel H.2000. Web mining research: a survey. ACM SIGKDD Explorations Newsletter2(1), 1–15.

    Google Scholar

    Koyutürk M., Grama A., Szpankowski W.2004. An efficient algorithm for detecting frequent subgraphs in biological networks. Journal of Bioinformatics20(1), 200–207.

    Google Scholar

    Kudo T., Maeda E., Matsumoto Y.2004. An application to boosting to graph classification. In Proceedings of the 8th Annual Conference on Neural Information Processing Systems, 729–736.

    Google Scholar

    Kuramochi M., Karypis G.2001. Frequent subgraph discovery. In Proceedings of the International Conference on Data Mining, 313–320.

    Google Scholar

    Kuramochi M., Karypis G.2002. Discovering frequent geometric subgraphs. In Proceedings of the IEEE International Conference on Data Mining, 258–265.

    Google Scholar

    Kuramochi M., Karypis G.2004a.An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering16(9), 1038–1051.

    Google Scholar

    Kuramochi M., Karypis G.2004b. GREW-A scalable frequent subgraph discovery algorithm. In Proceedings of the 4th IEEE International Conference on Data Mining, 439–442.

    Google Scholar

    Kuramochi M., Karypis G.2004c. Finding frequent patterns in a large sparse graph. In Proceedings of the SIAM International Conference on Data Mining, 345–356.

    Google Scholar

    Kuramochi M., Karypis G.2005. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery11(3), 243–271.

    Google Scholar

    Liu B.2008. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer.

    Google Scholar

    Liu T. L., Geiger D.1999. Approximate tree matching and shape similarity. In Proceedings of 7th International Conference on Computer Vision, 456–462.

    Google Scholar

    Matula D. W.1978. Subtree isomorphism in (n5/2)Annals of Discrete Mathematics2, 91–106.

    Google Scholar

    McKay B. D.1981. Practical graph isomorphism. Congressus Numerantium30, 45–87.

    Google Scholar

    Messmer B. T., Bunke H.1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Transaction on Pattern Analysis and Machine Intelligence20(5), 493–504.

    Google Scholar

    Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U.2002. Network motifs: simple building blocks of complex networks. Science298(5594), 824–827.

    Google Scholar

    Miyazaki T.1997. The complexity of McKay's canonical labelling algorithm, Groups and Computation II, DIMACS Series Discrete Mathematics Theoretical Computer Science, American Mathematical Society, 28, 239–256.

    Google Scholar

    Newman M. E. J.2004. Detecting community structure in networks. The European Physical Journal B – Condensed Matter and Complex Systems38(2), 321–330.

    Google Scholar

    Nijssen S., Kok J. N.2003. Efficient discovery of frequent unordered trees. In Proceedings of the 1st International Workshop on Mining Graphs, Trees and Sequences, 55–64.

    Google Scholar

    Nijssen S., Kok J. N.2004. A quickstart in frequent structure mining can make a difference. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 647–652.

    Google Scholar

    Ozaki T., Ohkawa T.2008. Mining correlated subgraphs in graph databases. In Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'08), 272–283.

    Google Scholar

    Paul S.1998. Multicasting on the Internet and Its Applications. Kluwer Academic Publishers.

    Google Scholar

    Pei J., Jiang D., Zhang A.2005. On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, USA, 228–238.

    Google Scholar

    Pei J., Han J., Mortazavi-Asl B., Pinto H., Chen Q., Dayal U., Hsu M. C.2001. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of 12th IEEE International Conference on Data Engineering (ICDE 01), Heidelberg, Germany, 215–224.

    Google Scholar

    Preiss B. R.1998. Data Structures and Algorithms with Object-Oriented Design Patterns in C++. Wiley.

    Google Scholar

    Prüfer H.1918. Neuer beweis eines satzes über permutationen. Archiv für Mathematik und Physik27, 742–744.

    Google Scholar

    Read R. C., Corneil D. G.1977. The graph isomorph disease. Journal of Graph Theory1, 339–363.

    Google Scholar

    Rückert U., Kramer S.2004. Frequent free tree discovery in graph data. In Proceedings of Special Track on Data Mining, ACM Symposium on Applied Computing, 564–570.

    Google Scholar

    Schmidt D. C., Druffel L. E.1976. A fast backtracking algorithm to test directed graphs for isomorphism using distance matrices. Journal of the ACM23(3), 433–445.

    Google Scholar

    Schreiber F., Schwöbbermeyer H.2005. Frequency concepts and pattern detection for the analysis of motifs in networks. Transactions on Computational Systems Biology3, 89–104.

    Google Scholar

    Shamir R., Tsur D.1999. Faster subtree isomorphism. Journal of Algorithms33(2), 267–280.

    Google Scholar

    Shapiro L. G., Haralick R. M.1981. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence3, 504–519.

    Google Scholar

    Shasha D., Wang J. T. L., Giugno R.2002. Algorithms and applications of tree and graph searching. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles on Database Systems, 39–52.

    Google Scholar

    Shasha D., Wang J., Zhang S.2004. Unordered tree mining with applications to phylogeny. In Proceedings of the 20th International Conference on Data Engineering (ICDE 04), 708–719.

    Google Scholar

    Sharan R., Suthram S., Kelley R., Kuhn T., McCuine S., Uetz P., Sittler T., Karp R., Ideker T.2005. Conserved patterns of protein interaction in multiple species. In Proceedings of the National Academy of Science of the United States of America (PNAS'05), 102(6), 1974–1979.

    Google Scholar

    Tan H., Dillon T. S., Feng L., Chang E., Hadzic F.2005. X3-Miner: mining patterns from XML database. In Proceedings of the 6th International Data Mining, 287–297.

    Google Scholar

    Tan H., Dillon T. S., Hadzic F., Chang E., Feng L.2006. IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 450–461.

    Google Scholar

    Tatikonda S., Parthasarathy S., Kurc T.2006. Trips and Tides: new algorithms for tree mining. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, 455–464.

    Google Scholar

    Termier A., Rousset M. C., Sebag M.2002. Treefinder: a first step towards XML data mining. In Proceedings of the 2002 IEEE International Conference on Data Mining, 450–457.

    Google Scholar

    Thomas L. T., Valluri S. R., Karlapalem K.2006. MARGIN: maximal frequent subgraph mining. In Proceedings of the 6th International Conference on Data Mining (ICDM 06), Hong Kong, 1097–1101.

    Google Scholar

    Ullmann J. R.1976. An algorithm for subgraph isomorphism. Journal of the ACM23(1), 31–42.

    Google Scholar

    Valiente G.2002. Algorithms on Trees and Graphs. Springer.

    Google Scholar

    Vanetik N.2002. Discovery of Frequent Patterns in Semi-structured Data. Department of Computer Science, Ben Gurion University.

    Google Scholar

    Vanetik N., Gudes E., Shimony S. E.2002. Computing frequent graph patterns from semi-structured data. In Proceedings of the 2nd International Conference on Data Mining, 458–465.

    Google Scholar

    Vanetik N., Shimony S. E., Gudes E.2006. Support measures for graph data. Journal of Data Mining and Knowledge Discovery13(2), 243–260.

    Google Scholar

    Wang C., Hong M., Pei J., Zhou H., Wang W., Shi B.2004a. Efficient pattern-growth methods for frequent tree pattern mining. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 441–451.

    Google Scholar

    Wang C., Wang W., Pei J., Zhu Y., Shi B.2004b. Scalable mining of large disk-based graph databases. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 316–325.

    Google Scholar

    Wang J., Zeng Z., Zhou L.2006. CLAN: an algorithm for mining closed cliques from large dense graph databases. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, USA, 797–802.

    Google Scholar

    Washo T., Motoda H.2003. State of the art of graph-based data mining. SIGKDD Explorations5, 59–68.

    Google Scholar

    West D. B.2000. Introduction to Graph Theory, 2nd edition. Prentice Hall.

    Google Scholar

    Wörlein M., Meinl T., Fischer I., Philippsen M.2005. A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM and Gaston. In Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, 392–404.

    Google Scholar

    Xin D., Cheng H., Yan X., Han J.2006. Extracting redundancy aware top K patterns. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 444–453.

    Google Scholar

    Yan X., Han J. W.2002. gSpan: graph-based substructure pattern mining. In Proceedings of the International Conference on Data Mining, 721–724.

    Google Scholar

    Yan X., Han J.2003. CloseGraph: mining closed frequent graph patterns. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., USA, 286–295.

    Google Scholar

    Yan X., Yu P. S., Han J.2004. Graph Indexing: a frequent structure-based approach. In Proceedings of ACM-SIGMOD International Conference on Management of Data, Paris, France, 335–346.

    Google Scholar

    Yan X., Zhou X., Han J.2005a. Mining closed relational graphs with connectivity constraints. In Proceeding of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, 324–333.

    Google Scholar

    Yan X., Yu P. S., Han J.2005b. Sub-structure similarity search in graph databases. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 766–777.

    Google Scholar

    Yan X., Zhu F., Han J., Yu P. S.2006. Searching substructures with superimposed distance. In Proceedings of the 22nd International Conference on Data Engineering, 88–97.

    Google Scholar

    Yan X., Cheng H., Han J., Yu P. S.2008. Mining significant graph patterns by leap search. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, 433–444.

    Google Scholar

    Zaki M. J.2002. Efficiently Mining Frequent Trees in a Forest. In Proceedings of the SIGKDD 2002. ACM, 71–80.

    Google Scholar

    Zaki M. J.2005a.Efficiently mining frequent embedded unordered trees. Fundamenta Informaticae66(1–2), 33–52.

    Google Scholar

    Zaki M. J.2005b.Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering17(8), 1021–1035.

    Google Scholar

    Zaki M. J., Aggarwal C. C.2003. XRules: an effective structural classifier for XML data. In Proceedings of the 2003 International Conference on Knowledge Discovery and Data Mining, 316–325.

    Google Scholar

    Zeng Z., Wang J., Zhou L., Karypis G.2006. Coherent closed quasi-clique discovery from large dense graph databases. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, USA, 797–802.

    Google Scholar

    Zhang S., Wang J. T. L.2006. Mining frequent agreement subtrees in phylogenetic databases. In Proceedings of the 6th SIAM International Conference on Data Mining, 222–233.

    Google Scholar

    Zhang S., Yang J.2008. RAM: Randomized Approximate Graph Mining. In Proceedings of the 20th International Conference on Scientific and Statistical Database Management, 187–203.

    Google Scholar

    Zhao P., Yu J.2006. Fast frequent free tree mining in graph databases. In Proceedings of the 6th IEEE International Conference on Data Mining Workshop, 315–319.

    Google Scholar

    Zhao P., Yu J.2007. Mining closed frequent free trees in graph databases. In Proceedings of the 12th International Conference on Database Systems for Advanced Applications, Thailand, 91–102.

    Google Scholar

    Zhu F., Yan X., Han J., Yu P. S.2007. gPrune: a constraint pushing framework for graph pattern mining. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 388–400.

    Google Scholar

  • Cite this article

    Chuntao Jiang, Frans Coenen, Michele Zito. 2013. A survey of frequent subgraph mining algorithms. The Knowledge Engineering Review 28(1)75−105, doi: 10.1017/S0269888912000331
    Chuntao Jiang, Frans Coenen, Michele Zito. 2013. A survey of frequent subgraph mining algorithms. The Knowledge Engineering Review 28(1)75−105, doi: 10.1017/S0269888912000331

Article Metrics

Article views(17) PDF downloads(276)

Other Articles By Authors

RESEARCH ARTICLE   Open Access    

A survey of frequent subgraph mining algorithms

The Knowledge Engineering Review  28 2013, 28(1): 75−105  |  Cite this article

Abstract: Abstract: Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.

    • If a graph is frequent, then all of its subgraphs will also be frequent.

    • A simple graph is an unweighted and undirected graph with no loops and no multiple links between any two distinct nodes (Gibbons, 1985; West, 2000).

    • The length of a path is equivalent to the number of edges in the path.

    • A labelled ordered tree, in graph theory, is also called a rooted plane tree (West, 2000).

    • A pre-order traversal is where a sequence of operations are performed recursively as follows: visit the root first, and then do a pre-order traversal of each of the subtrees of the root one-by-one in the order given (Preiss, 1998).

    • k refers to the expansion unit for growing the candidate subtrees, which can be expressed in terms of vertexes, edges.

    • In Zaki (2002), two k-subtrees T1, T2 are in the same ‘prefix’ equivalence class if and only if they share the same encoding up to the (k − 1)-th vertex.

    • The left-most leaf of the tree is the first leaf vertex in the DFS traversal of that tree (Hido & Kawano, 2005).

    • IP multicast: a method for building multicast trees at the Internet Protocol layer so as to send packets to multiple receivers in a single transmission (Paul, 1998).

    • The level of embedding is defined as ‘the length of path between two vertexes that form an ancestor–descendant relationship’ (Tan et al., 2006).

    • The prüfer sequence (Prüfer, 1918) of a labelled tree on n vertexes is a unique (n − 2) length sequence that can be formed by an iterative algorithm (Tatikonda et al., 2006).

    • The path from the root to the left-most leaf is the left-most path (Tatikonda et al., 2006).

    • Two embeddings in a graph G are vertex-disjoint, if they do not share any vertexes in G.

    • Automorphism is a graph isomorphism to itself via a non-identity mapping.

    • In the candidate generation phase, a core is a common (k − 1) subgraph shared by two frequent k subgraphs. Two frequent k subgraphs are eligible for joining only if they contain the same core (Kuramochi & Karypis, 2001).

    • An edge-angle list is a multi-set where each element represents the angle formed by two distinct edges sharing the same end points.

    • An overlap graph for a given pattern with a set of all embeddings (occurrences) is a constructed graph where each vertex represents a non-identical embedding of the pattern; two vertexes are connected if the corresponding embeddings overlap (Kuramochi & Karypis, 2005).

    • Network Motifs are defined as ‘patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks’ (Milo et al., 2002).

    • The minimum degree of a pattern g is the minimum of the degree of v, for all vV(g) (Yan et al., 2005a).

    • A canonical spanning tree of a graph is defined as the lexicographically maximal spanning tree of the graph (Huan et al., 2004).

    • A cut between two nodes in the lattice is defined as an ordered pair (p, c), where node p represents the parent of node c and p is infrequent while c is frequent (Thomas et al., 2006).

    • Let V(g) denote the set of vertexes in a graph g, a subset sV(g) is a clique if the subgraph induced on s is a complete graph.

    • A γ-quasi-clique(0 ⩽ γ ⩽ 1) is a k subgraph (k ⩾ 1), g, where ∀vV(g), $$\[--><$> degree{\rm{(}}v{\rm{)}}\,{\geqslant}\,{\lceil}\gamma {\rm{(}}k\,{\rm{ - }}\,{\rm{1)}}{\rceil} <$><!--$$ (Zeng et al., 2006).

    • Copyright © Cambridge University Press 20122012Cambridge University Press
References (133)
  • About this article
    Cite this article
    Chuntao Jiang, Frans Coenen, Michele Zito. 2013. A survey of frequent subgraph mining algorithms. The Knowledge Engineering Review 28(1)75−105, doi: 10.1017/S0269888912000331
    Chuntao Jiang, Frans Coenen, Michele Zito. 2013. A survey of frequent subgraph mining algorithms. The Knowledge Engineering Review 28(1)75−105, doi: 10.1017/S0269888912000331
  • Catalog

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return