|
Achard F., Vaysseix G. & Barillot E.2001. XML, bioinformatics and data integration. Bioinformatics17(1), 115–125.
Google Scholar
|
|
Aggarwal C. C., Ta N., Wang J., Feng J. & Zaki M.2007. XProj: a framework for projected structural clustering of XML documents. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’07, 46–55.
Google Scholar
|
|
Aggarwal C. C. & Zhai C. (eds) 2012. Mining Text Data, Springer.
Google Scholar
|
|
Aïtelhadj A., Boughanem M., Mezghiche M. & Souam F.2012. Using structural similarity for clustering XML documents. Knowledge and Information Systems32(1), 109–139.
Google Scholar
|
|
Algergawy A., Mesiti M., Nayak R. & Saake G.2011. XML data clustering: an overview. ACM Computing Surveys43(4), 25:1–25:41.
Google Scholar
|
|
Alishahi M., Naghibzadeh M. & Aski B. S.2010. Tag name structure-based clustering of XML documents. International Journal of Electrical and Computer Engineering2(1), 119–126.
Google Scholar
|
|
Andreopoulos B., An A., Wang X. & Schroeder M.2009. A roadmap of clustering algorithms: finding a match for a biomedical application. Briefings in Bioinformatics10(3), 297–314.
Google Scholar
|
|
Ankerst M., Breunig M. M., Kriegel H.-P. & Sander J.1999. OPTICS: ordering points to identify the clustering structure. SIGMOD Record28(2), 49–60.
Google Scholar
|
|
Antonellis P., Makris C. & Tsirakis N.2008. XEdge: clustering homogeneous and heterogeneous XML documents using edge summaries. In Proceedings of the 2008 International Conference on Advances in Computer, SAC’08, 1081–1088.
Google Scholar
|
|
Ausbrooks R., Buswell S., Carlisle D., Chavchanidze G., Dalmas S., Devitt S., Diaz A., Dooley S., Hunter R., Ion P., Kohlhase M., Lazrek A., Libbrecht P., Miller B., Miner R., (deceased), Rowley C., Sargent M., Smith B., Soiffer N., Sutor R., & Watt S.2014. Mathematical Markup Language (MathML), Version 3.0, 2nd Edition. In Carlisle, D., Ion, P. & Miner, R. (eds). W3C, http://www.w3.org/TR/MathML3/.
Google Scholar
|
|
Baeza-Yates R. A. & Ribeiro-Neto B. A.1999. Modern Information Retrieval, ACM Press/Addison-Wesley.
Google Scholar
|
|
Bennett C. H., Gacs P., Li M., Vitanyi P. M. B. & Zurek W. H.1998. Information distance. IEEE Transactions on Information Theory44(4), 1407–1423.
Google Scholar
|
|
Bertino E., Guerrini G. & Mesiti M.2008. Measuring the structural similarity among XML documents and DTDs. Journal of Intelligent Information Systems30(1), 55–92.
Google Scholar
|
|
Brzezinski D., Lesniewska A., Morzy T. & Piernik M.2011. XCleaner: a new method for clustering XML documents by structure. Control and Cybernetics40(3), 877–891.
Google Scholar
|
|
Buttler D.2004. A short survey of document structure similarity algorithms. In Proceedings of the 5th International Conference on Internet Computing, 3–9.
Google Scholar
|
|
Candillier L., Tellier I. & Torre F.2006. Transforming XML trees for efficient classification and clustering. In Proceedings of the 4th International Conference of the Initiative for the Evaluation of XML Retrieval, INEX’05, 469–480.
Google Scholar
|
|
Chawathe S. S.1999. Comparing hierarchical data in external memory. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB’99, 90–101.
Google Scholar
|
|
Chawathe S. S., Rajaraman A., Garcia-Molina H. & Widom J.1996. Change detection in hierarchically structured information. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, SIGMOD ’96, 493–504.
Google Scholar
|
|
Costa G., Manco G., Ortale R. & Tagarelli A.2004. A tree-based approach to clustering XML documents by structure. In Proccedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD’04, 137–148.
Google Scholar
|
|
Cover T. M. & Thomas J. A.1991. Elements of Information Theory, Wiley-Interscience.
Google Scholar
|
|
Crescenzi V., Mecca G. & Merialdo P.2001. RoadRunner: towards automatic data extraction from large web sites. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB’01, 109–118.
Google Scholar
|
|
Crescenzi V., Merialdo P. & Missier P.2005. Clustering web pages based on their structure. Data & Knowledge Engineering54(3), 279–299.
Google Scholar
|
|
Dalamagas T., Cheng T., Winkel K.-J. & Sellis T.2006. A methodology for clustering XML documents by structure. Information Systems31(3), 187–228.
Google Scholar
|
|
Dalamagas T., Cheng T., Winkel K.-J. & Sellis T. K.2004. Clustering XML documents by structure. In Proceedings of the 3rd Hellenic Conference on Artificial Intelligence, SETN’04, 112–121.
Google Scholar
|
|
Dempster A. P., Laird N. M. & Rubin D. B.1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological)39(1), 1–38.
Google Scholar
|
|
Dolin R. H., Alschuler L., Boyer S., Beebe C., Behlen F. M., Biron P. V. & Shvo A. S.2006. HL7 clinical document architecture, release 2. Journal of the American Medical Informatics Association13(1), 30–39.
Google Scholar
|
|
Doucet A. & Lehtonen M.2006. Unsupervised classification of text-centric XML document collections. In Proceedings of the 5th International Conference of the Initiative for the Evaluation of XML Retrieval, INEX’06, 497–509.
Google Scholar
|
|
Ester M., Kriegel H.-P., Sander J. & Xu X.1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, KDD’96, 226–231.
Google Scholar
|
|
Flesca S., Manco G., Masciari E., Pontieri L. & Pugliese A.2002. Detecting structural similarities between XML documents. In Proceedings of the 5th International Workshop on the Web and Databases, WebDB’02, 55–60.
Google Scholar
|
|
Flesca S., Manco G., Masciari E., Pontieri L. & Pugliese A.2005. Fast detection of XML structural similarity. IEEE Transactions on Knowledge and Data Engineering17(2), 160–175.
Google Scholar
|
|
Goldman R. & Widom J.1997. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Technical report no. 1997-50, Stanford InfoLab.
Google Scholar
|
|
Guerrini G., Mesiti M. & Sanz I.2007. An overview of similarity measures for clustering XML documents. In Web Data Management Practices: Emerging Techniques and Technologies, chapter 3, Vakali, A. & Pallis, G. (eds)., 56--78. Idea Group Inc. (IGI).
Google Scholar
|
|
Guha S., Rastogi R. & Shim K.1998. CURE: an efficient clustering algorithm for large databases. SIGMOD Record27(2), 73–84.
Google Scholar
|
|
Guha S., Rastogi R. & Shim K.2000. ROCK: a robust clustering algorithm for categorical attributes. Information Systems25(5), 345–366.
Google Scholar
|
|
Hadzic F., Hecker M. & Tagarelli A.2011. XML document clustering using structure preserving flat representation of XML content and structure. In Proceedings of the 7th International Conference on Advanced Data Mining and Applications, ADMA’11, 403–416.
Google Scholar
|
|
Hagenbuchner M., Sperduti A., Tsoi A. C., Trentini F., Scarselli F. & Gori M.2006. Clustering XML documents using self-organizing maps for structures. In Proceedings of the 4th International Conference of the Initiative for the Evaluation of XML Retrieval, INEX’05, 481–496.
Google Scholar
|
|
Han J.2005. Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc.
Google Scholar
|
|
Hartigan J. A. & Wong M. A.1979. A K-means clustering algorithm. Journal of the Royal Statistical Society Series C Applied Statistics28, 100–108.
Google Scholar
|
|
Helmer S.2007. Measuring the structural similarity of semistructured documents using entropy. In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB’07, VLDB Endowment, 1022–1032.
Google Scholar
|
|
Helmer S., Augsten N. & Böhlen M.2012. Measuring structural similarity of semistructured data based on information-theoretic approaches. VLDB Journal21(5), 677–702.
Google Scholar
|
|
Hubert L. J. & Levin J. R.1976. A general statistical framework for accessing categorical. Psychological Bulletin83, 1072–1082.
Google Scholar
|
|
Husek D., Pokorny J., Rezankova H. & Snasel V.2007. Data clustering: from documents to the web. In Web Data Management Practices: Emerging Techniques and Technologies, chapter 1, Vakali, A. & Pallis, G. (eds)., 1--33. Idea Group Inc. (IGI).
Google Scholar
|
|
Hwang J. H. & Ryu K. H.2004. A new XML clustering for structural retrieval. In Proceedings of the 23rd International Conference on Conceptual Modeling, ER’04, 377–387.
Google Scholar
|
|
Hwang J. H. & Ryu K. H.2010. A weighted common structure based clustering technique for XML documents. Journal of Systems and Software83(7), 1267–1274.
Google Scholar
|
|
Jain A. K., Murty M. N. & Flynn P. J.1999. Data clustering: a review. ACM Computing Surveys31(3), 264–323.
Google Scholar
|
|
Johnson S.1967. Hierarchical clustering schemes. Psychometrika32, 241–254.
Google Scholar
|
|
Kohonen T.1989. Self-Organization and Associative Memory, 3rd edition. Springer-Verlag New York Inc.
Google Scholar
|
|
Kutty S., Nayak R. & Li Y.2010. Utilising semantic tags in XML clustering. In Proceedings of the 8th International Conference on Initiative for the Evaluation of XML Retrieval, INEX’09, 416–425.
Google Scholar
|
|
Kutty S., Nayak R. & Li Y.2011. XML documents clustering using a tensor space model. In Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD’11, 488–499.
Google Scholar
|
|
Kutty S., Tran T., Nayak R. & Li Y.2007. Clustering XML documents using closed frequent subtrees: a structural similarity approach. In Proceedings of the 5th International Conference on Initiative for the Evaluation of XML Retrieval, INEX’06, 183–194.
Google Scholar
|
|
Kutty S., Tran T., Nayak R. & Li Y.2008. Clustering XML documents using frequent subtrees. In Proceedings of the 6th International Conference on Initiative for the Evaluation of XML Retrieval, INEX’07, 436–445.
Google Scholar
|
|
Lee M. L., Yang L. H., Hsu W. & Yang X.2002. XClust: clustering XML schemas for effective integration. In Proceedings of the 11th International Conference on Information and Knowledge Management, CIKM’02, 292–299.
Google Scholar
|
|
Lesniewska A.2009. Clustering XML documents by structure. In 13th East-European Conference Advances on Databases and Information Systems’, ADBIS’09, 238–246.
Google Scholar
|
|
Lesniewska A. & Primke K.2008. Finding Features on the Basis of the Structure of XML Documents. Technical Report RA 16/08, Poznan University of Technology.
Google Scholar
|
|
Leung H.-P., Chung K. F.-L., fai Chan S. C. & Luk R. W. R.2005. XML document clustering using common XPath. In International Workshop on Challenges in Web Information Retrieval and Integration, WIRI’05, 91–96.
Google Scholar
|
|
Li J., Liu J., Liu C., Wang G., Yu J. X. & Yangt C.2007. Computing structural similarity of source XML schemas against domain XML schema. In Proceedings of the 19th Australasian Database Conference, ADC’08, 155–164.
Google Scholar
|
|
Lian W., Cheung D. W.-l., Mamoulis N. & Yiu S.-M.2004. An efficient and scalable algorithm for clustering XML documents by structure. IEEE Transactions on Information Theory16(1), 82–96.
Google Scholar
|
|
Moon T. K.1996. The expectation-maximization algorithm. IEEE Signal Processing Magazine13(6), 47–60.
Google Scholar
|
|
Murray-Rust P. & Rzepa H.1995. Chemical Markup Language, http://www.xml-cml.org/.
Google Scholar
|
|
Nayak R.2008. Fast and effective clustering of XML data using structural information. Knowledge and Information Systems14(2), 197–215.
Google Scholar
|
|
Nayak R. & Iryadi W.2006. XMine: a methodology for mining XML structure. In Proceedings of the 8th Asia-Pacific Web Conference on Frontiers of WWW Research and Development, APWeb’06, 786–792.
Google Scholar
|
|
Nierman A. & Jagadish H. V.2002. Evaluating structural similarity in XML documents. In Proceedings of the 5th International Workshop on the Web and Databases, WebDB’02, 61–66.
Google Scholar
|
|
Pawlik M. & Augsten N.2011. RTED: a robust algorithm for the tree edit distance. The Proceedings of the VLDB Endowment5(4), 334–345.
Google Scholar
|
|
Piernik M., Brzezinski D. & Morzy T.2014. Clustering XML documents by patterns. Knowledge and Information Systems, Technical report number: RA-12/2013. Poznan University of Technology. (review in progress).
Google Scholar
|
|
Pospech T.2009. GML – Geography Markup Language, GRIN Verlag.
Google Scholar
|
|
Rafiei D., Moise D. L. & Sun D.2006. Finding syntactic similarities between XML documents. In Proceedings of the 17th International Conference on Database and Expert Systems Applications, DEXA’06, 512–516.
Google Scholar
|
|
Rand W. M.1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association66(336), 846–850.
Google Scholar
|
|
Rousseeuw P.1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics20(1), 53–65.
Google Scholar
|
|
Sankoff D. & Kruskal J. B.2000. Time Warps, String Edits, and Macromolecules, Cambridge University Press.
Google Scholar
|
|
Selkow S.1977. The tree-to-tree editing problem. Information Processing Letters6(6), 184–186.
Google Scholar
|
|
Sokal R. R. & Rohlf F. J.1962. The comparison of dendrograms by objective methods. Taxon11(2), 33–40.
Google Scholar
|
|
Somervuo P. & Kohonen T.2000. Clustering and visualization of large protein sequence databases by means of an extension on the self-organizing map. In Proceedings of the 3rd International Conference on Discovery Science, DS’00, 76–85.
Google Scholar
|
|
Tagarelli A. & Greco S.2010. Semantic clustering of XML documents. ACM Transactions on Information Systems28(1), 3:1–3:56.
Google Scholar
|
|
Tai K.-C.1979. The tree-to-tree correction problem. The Journal of the ACM26(3), 422–433.
Google Scholar
|
|
Tan P.-N., Steinbach M. & Kumar V.2005. Introduction to Data Mining, Addison-Wesley.
Google Scholar
|
|
Tekli J. & Chbeir R.2012. A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics. The Journal of Web Semantics11, 14–40.
Google Scholar
|
|
Tekli J., Chbeir R. & Yetongnon K.2009. Survey: an overview on XML similarity: background, current trends and future directions. Computer Science Review3(3), 151–173.
Google Scholar
|
|
Theobald M.2003. Exploiting structure, annotation, and ontological knowledge for automatic classification of XML data. In Proceedings of the International Workshop on the Web and Databases, WebDB’03, 1–6.
Google Scholar
|
|
Tran T., Nayak R. & Bruza P.2007. Document clustering using incremental and pairwise approaches. In Proceedings of the 5th International Conference on Initiative for the Evaluation of XML Retrieval, INEX’06, 4862, 222–233.
Google Scholar
|
|
Tran T., Nayak R. & Bruza P.2008. Combining structure and content similarities for XML document clustering. In Proceedings of the 7th Australasian Data Mining Conference, AusDM’08, 219–226.
Google Scholar
|
|
Vakali A., Pallis G. & Angelis L.2007. Clustering web information services. In Web Data Management Practices: Emerging Techniques and Technologies, chapter 2, Vakali, A. & Pallis, G. (eds)., 34--55. Idea Group Inc. (IGI).
Google Scholar
|
|
Vercoustre A.-M., Fegas M., Gul S. & Lechevallier Y.2006. A flexible structured-based representation for XML document mining. In Proceedings of the 4th International Conference on Initiative for the Evaluation of XML Retrieval, INEX’05, 443–457.
Google Scholar
|
|
Viyanon W., Madria S. K. & Bhowmick S. S.2008. XML data integration based on content and structure similarity using keys. In Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE, OTM’08, 484–493.
Google Scholar
|
|
Wang T., Liu D.-X., Lin X.-Z., Sun W. & Ahmad G.2006. Clustering large scale of XML documents. In Proceedings of the 1st International Conference on Advances in Grid and Pervasive Computing, GPC’06, 447–455.
Google Scholar
|
|
Wernecke J.2008. The KML Handbook: Geographic Visualization for the Web, 1st edition. Addison-Wesley Professional.
Google Scholar
|
|
Westbrook J., Ito N., Nakamura H., Henrick K. & Berman H. M.2005. PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics21(7), 988–992.
Google Scholar
|
|
Wilson R., Cobb M., McCreedy F., Ladner R., Olivier D., Lovitt T., Shaw K., Petry F. & Abdelguerfi M.2003. Geographical data interchange using XML-enabled technology within the GIDB system. In XML Data Management: Native XML and XML-Enabled Database Systems, Chaudhri, A. B., Rashid, A. & Zicari, R. (eds). Addison Wesley, 354–374.
Google Scholar
|
|
Xu R. & Wunsch I.2005. Survey of clustering algorithms. IEEE Transactions on Neural Networks16(3), 645–678.
Google Scholar
|
|
Yang R., Kalnis P. & Tung A. K. H.2005. Similarity evaluation on tree-structured data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD’05, 754–765.
Google Scholar
|
|
Yang Y., Guan X. & You J.2002. CLOPE: a fast and effective clustering algorithm for transactional data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 682–687.
Google Scholar
|
|
Yoon J. P., Raghavan V., Chakilam V. & Kerschberg L.2001. BitCube: a three-dimensional bitmap indexing for XML documents. Journal of Intelligent Information Systems17(2–3), 241–254.
Google Scholar
|
|
Zaki M. J.2002. Efficiently mining frequent trees in a forest. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 71–80.
Google Scholar
|
|
Zaki M. J. & Aggarwal C. C.2006. XRules: an effective algorithm for structural classification of XML data. Machine Learning62(1–2), 137–170.
Google Scholar
|
|
Zhang K. & Shasha D.1989. Simple fast algorithms for the editing distance between trees and related problems. The SIAM Journal on Computing18(6), 1245–1262.
Google Scholar
|
|
Zhao Y. & Karypis G.2002. Evaluation of hierarchical clustering algorithms for document datasets. In Proceedings of the 11th ACM International Conference on Information and Knowledge Management, CIKM’02, 515–524.
Google Scholar
|
|
Zheng Y.-T., Li Y., Zha Z.-J. & Chua T.-S.2011. Mining travel patterns from GPS-tagged photos. In Proceedings of the 17th International Multimedia Modeling Conference, MMM’11, 262–272.
Google Scholar
|
|
Zhu Y.-W., Ji G.-L. & Sun Q.-H.2010. Clustering GML documents using maximal frequent induced subtrees. In Proceedings of the 7th International Conference on Fuzzy Systems and Knowledge Discover, FSKD’10, 5, 2265–2269.
Google Scholar
|