, 100864 China e-mail: lzwang@ceode.ac.cn, cz.xu@siat.ac.cn, nikolaos@siat.ac.cn
King Abdulaziz University, 21589 Saudi Arabia e-mail: asalzahrani@kau.edu.sa
Abstract: In this survey, we review text mining techniques for discovering textual patterns in social networking sites. Social network applications let people interact through chat, comments, and discussion boards, which leads to mutual learning and the sharing of valuable knowledge. Data in social networking websites is inherently unstructured and fuzzy. In everyday conversations, people pay little attention to spelling and grammatical accuracy, which can introduce lexical, syntactic, and semantic ambiguities. Analyzing and extracting information patterns from such data sets is therefore more complex. Several surveys have analyzed methods for information extraction. Most of them emphasize the application of text mining techniques to unstructured data sets that reside in text documents, but do not specifically target the data sets in social networking websites. This survey attempts to provide a thorough understanding of text mining techniques and of their application in social networking websites. It investigates recent advances in text analysis and covers two basic approaches to text mining, classification and clustering, which are widely used to explore the unstructured text available on the Web.
Rizwana Irfan, Christine K. King, Daniel Grages, Sam Ewen, Samee U. Khan, Sajjad A. Madani, Joanna Kolodziej, Lizhe Wang, Dan Chen, Ammar Rayes, Nikolaos Tziritas, Cheng-Zhong Xu, Albert Y. Zomaya, Ahmed Saeed Alzahrani, Hongxiang Li. 2015. A survey on text mining in social networks. The Knowledge Engineering Review 30(2), 157–170. doi: 10.1017/S0269888914000277
Acknowledgments
We are grateful to Juan Li, Matthew Warner, and Daniel Grages for their feedback on a draft of this survey. Samee U. Khan's work was partly supported by the Young International Scientist Fellowship of the Chinese Academy of Sciences (Grant No. 2011Y2GA01).