A survey on text mining in social networks

Rizwana Irfan; Christine K. King; Daniel Grages; Sam Ewen; Samee U. Khan; Sajjad A. Madani; Joanna Kolodziej; Lizhe Wang; Dan Chen; Ammar Rayes; Nikolaos Tziritas; Cheng-Zhong Xu; Albert Y. Zomaya; Ahmed Saeed Alzahrani; Hongxiang Li

doi:10.1017/S0269888914000277

2015 Volume 30

RESEARCH ARTICLE Open Access

1. Fargo
2. 44000 Islamabad
3. 30001 Cracow
4. Chinese Academy of Sciences, 100864 China; e-mail: lzwang@ceode.ac.cn, cz.xu@siat.ac.cn, nikolaos@siat.ac.cn
5. 430000 Wuhan
6. San Jose
7. 2006 NSW
8. King Abdulaziz University, 21589 Saudi Arabia; e-mail: asalzahrani@kau.edu.sa
9. 40292 KY

Published online: 25 March 2015
The Knowledge Engineering Review 30, Article number: 10.1017/S0269888914000277 (2015)

Abstract

In this survey, we review text mining techniques for discovering textual patterns on social networking sites. Social network applications create opportunities for interaction among people through channels such as chat, comments, and discussion boards, leading to mutual learning and the sharing of valuable knowledge. Data on social networking websites is inherently unstructured and fuzzy. In everyday conversation, people pay little attention to spelling and accurate grammatical construction, which can lead to lexical, syntactic, and semantic ambiguities. Analyzing and extracting information patterns from such data sets is therefore more complex. Several surveys have analyzed different methods for information extraction. Most of them emphasize the application of text mining techniques to unstructured data sets that reside in text documents, but do not specifically target data sets from social networking websites. This survey attempts to provide a thorough understanding of different text mining techniques and of their application to social networking websites. It investigates recent advances in text analysis and covers two basic approaches to text mining, classification and clustering, which are widely used to explore the unstructured text available on the Web.
Rights and permissions
© Cambridge University Press, 2015

About this article

Cite this article

Rizwana Irfan, Christine K. King, Daniel Grages, Sam Ewen, Samee U. Khan, Sajjad A. Madani, Joanna Kolodziej, Lizhe Wang, Dan Chen, Ammar Rayes, Nikolaos Tziritas, Cheng-Zhong Xu, Albert Y. Zomaya, Ahmed Saeed Alzahrani, Hongxiang Li. 2015. A survey on text mining in social networks. The Knowledge Engineering Review. 30:277 doi: 10.1017/S0269888914000277
