Chinese Academy of Sciences, 100864 China, e-mail: lzwang@ceode.ac.cn, cz.xu@siat.ac.cn, nikolaos@siat.ac.cn
King Abdulaziz University, 21589 Saudi Arabia, e-mail: asalzahrani@kau.edu.sa
2015 Volume 30
RESEARCH ARTICLE   Open Access    

A survey on text mining in social networks

  • Abstract: In this survey, we review text mining techniques for discovering textual patterns in social networking sites. Social network applications, such as chat, comments, and discussion boards, create opportunities for interaction among people, leading to mutual learning and the sharing of valuable knowledge. Data in social networking websites is inherently unstructured and fuzzy. In everyday conversations, people pay little attention to spelling and grammatical accuracy, which may lead to different types of ambiguity: lexical, syntactic, and semantic. Therefore, analyzing and extracting information patterns from such data sets is more complex. Several surveys have analyzed different methods for information extraction. Most of them emphasize the application of text mining techniques to unstructured data sets that reside in text documents, but do not specifically target the data sets in social networking websites. This survey attempts to provide a thorough understanding of different text mining techniques and of their application in social networking websites. It investigates recent advances in text analysis and covers two basic approaches to text mining, classification and clustering, which are widely used for exploring the unstructured text available on the Web.
  • Cite this article

    Rizwana Irfan, Christine K. King, Daniel Grages, Sam Ewen, Samee U. Khan, Sajjad A. Madani, Joanna Kolodziej, Lizhe Wang, Dan Chen, Ammar Rayes, Nikolaos Tziritas, Cheng-Zhong Xu, Albert Y. Zomaya, Ahmed Saeed Alzahrani, Hongxiang Li. 2015. A survey on text mining in social networks. The Knowledge Engineering Review 30(2), 157−170, doi: 10.1017/S0269888914000277


The Knowledge Engineering Review, 2015, 30(2): 157−170

    • We are grateful to Juan Li, Matthew Warner, and Daniel Grages for their feedback on a draft of this survey. Samee U. Khan's work was partly supported by the Young International Scientist Fellowship of the Chinese Academy of Sciences (Grant No. 2011Y2GA01).

    • © Cambridge University Press, 2015