2017 Volume 32
RESEARCH ARTICLE   Open Access    

Emerging approaches in literature-based discovery: techniques and performance review

  • Abstract: Literature-based discovery systems aim to discover valuable latent connections between previously disparate research areas. This is achieved by analyzing the contents of their respective literatures with the help of various intelligent computational techniques. In this paper, we review the progress of literature-based discovery research, focusing on the technical features of these systems and evaluating their performance. Present literature-based discovery techniques can be divided into two general approaches: the traditional approach and the emerging approach. The traditional approach, which dominates the current research landscape, consists mainly of techniques that rely on lexical statistics, knowledge-based methods and visualization methods to address literature-based discovery problems. On the other hand, we have also observed new trends and unprecedented paradigm shifts arising within the emerging literature-based discovery approach. These trends are likely to shape the future trajectory of next-generation literature-based discovery systems.

  • Cite this article

    Yakub Sebastian, Eu-Gene Siew, Sylvester O. Orimaye. 2017. Emerging approaches in literature-based discovery: techniques and performance review. The Knowledge Engineering Review 32(1), doi: 10.1017/S0269888917000042

    • The first author would like to thank the School of Information Technology, Monash University Malaysia for supporting this research through the Monash Higher Degree Research Scholarship. The authors would also like to thank the two anonymous reviewers and the editor for providing valuable feedback on the initial manuscript of this paper.

    • http://string-db.org/

    • http://www.omim.org/

    • An n-gram is a contiguous sequence of n terms or n words in a text.

    • Stemmed words are words whose inflections have been removed by a stemming algorithm so that only their base or root forms are retained. The goal is to reduce the level of noise in the text. Refer to http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html for more details.

    • http://metamap.nlm.nih.gov/

    • http://skr3.nlm.nih.gov/SemMedDB/

    • http://knoesis-hpco.cs.wright.edu/obvio/

    • http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview

    • http://thomsonreuters.com/en/products-services/scholarly-scientific-research/scholarly-search-and-discovery.html

    • http://www.ncbi.nlm.nih.gov/pmc/

    • http://www.drugbank.ca/

    • http://www.genenames.org/

    • http://ctdbase.org/

    • http://string-db.org/

    • http://www.ncbi.nlm.nih.gov/gene

    • http://thomsonreuters.com/en/products-services/scholarly-scientific-research/scholarly-search-and-discovery/web-of-science.html

    • https://spark.apache.org/graphx/

    • https://networkx.github.io/

    • http://snap.stanford.edu/

    • http://prl.aps.org/50years/milestones

    • http://semrep.nlm.nih.gov/

    • http://arrowsmith.psych.uic.edu/arrowsmith_uic/index.html

    • http://www.ncbi.nlm.nih.gov/pubmed

    • © Cambridge University Press, 2017