2016 Volume 31
RESEARCH ARTICLE   Open Access    

Scientific Knowledge Engineering: a conceptual delineation and overview of the state of the art

  • Abstract: As a community work, scientific contributions are usually built incrementally, involving some transformation, expansion or refutation of existing conceptual and propositional networks. As the body of knowledge increases, scientists concentrate more effort on ensuring that new hypotheses and observations are needed and consistent with previous findings. In this paper, we characterize Knowledge Engineering as an important groundwork for structuring scientific knowledge. We argue that knowledge-based computational infrastructures can support researchers in organizing and making explicit the main aspects needed to make inferences or extract conclusions from an existing body of knowledge. This view is also built comparatively, by contrasting it with alternatives for manipulating scientific knowledge, namely data-intensive approaches and the computational discovery of scientific knowledge. The current state of the art is presented with 22 knowledge representations and computational infrastructure implementations, with their main relevant properties analyzed and compared. Based on this review and on the theoretical foundations of Knowledge Engineering, a high-level step-by-step approach for specifying and constructing scientific computational environments is described. The paper concludes by indicating paths for further development of the view initiated here, especially related to the technical specificities that originate from applying Knowledge Engineering to scientific knowledge.
  • Atkins D., Best D., Briss P. A., Eccles M., Falck-Ytter Y., Flottorp S., Guyatt G. H., Harbour R. T., Haugh M. C., Henry D., Hill S., Jaeschke R., Leng G., Liberati A., Magrini N., Mason J., Middleton P., Mrukowicz J., O’Connell D., Oxman A. D., Phillips B., Schünemann H. J., Edejer T. T.-T., Varonen H., Vist G. E., Williams J. W. Jr & Zaza S., GRADE Working Group 2004. Grading quality of evidence and strength of recommendations. BMJ 328(7454), 1490.

    Bairoch A. 2009. The future of annotation/biocuration. Nature Precedings.

    Barga R. & Gannon D. 2007. Scientific versus business workflows. In Workflows for e-Science, Taylor I. J., Deelman E., Gannon D. B. & Shields M. (eds). Springer, 9–16.

    Bauer-Mehren A., Furlong L. I. & Sanz F. 2009. Pathway databases and tools for their exploitation: benefits, current limitations and challenges. Molecular Systems Biology 5(290).

    Bechhofer S., Buchan I., De Roure D., Missier P., Ainsworth J., Bhagat J., Couch P., Cruickshank D., Delderfield M., Dunlop I., Gamble M., Michaelides D., Owen S., Newman D., Sufi S. & Goble C. 2013. Why linked data is not enough for scientists. Future Generation Computer Systems 29(2), 599–611.

    Biolchini J., Mian P., Natali A. & Travassos G. H. 2005. Systematic review in software engineering. Technical report No. RT-ES 679/05, Federal University of Rio de Janeiro (UFRJ/COPPE).

    Booth A. 2011. Evidence-based practice: triumph of style over substance? Health Information & Libraries Journal 28(3), 237–241.

    Budgen D., Turner M., Brereton P. & Kitchenham B. 2008. Using mapping studies in software engineering. In Proceedings of PPIG Psychology of Programming Interest Group, 195–204. Lancaster University.

    Bunge M. 2004. How does it work? The search for explanatory mechanisms. Philosophy of the Social Sciences 34(2), 182–210.

    Bylander T. & Chandrasekaran B. 1987. Generic tasks for knowledge-based reasoning: the ‘right’ level of abstraction for knowledge acquisition. International Journal of Man-Machine Studies 26(2), 231–243.

    Callahan A., Dumontier M. & Shah N. H. 2011. HyQue: evaluating hypotheses using semantic web technologies. Journal of Biomedical Semantics 2(2), 1–17.

    Chua C. E. H., Storey V. C. & Chiang R. H. 2012. Deriving knowledge representation guidelines by analyzing knowledge engineer behavior. Decision Support Systems 54(1), 304–315.

    Cohen A. M. & Hersh W. R. 2005. A survey of current work in biomedical text mining. Briefings in Bioinformatics 6(1), 57–71.

    Cooper H. M., Hedges L. V. & Valentine J. C. 2009. The Handbook of Research Synthesis and Meta-Analysis. Russell Sage Foundation.

    Craver C. F. & Darden L. 2005. Introduction. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences 36(2), 233–244.

    da Cruz S., Campos M. & Mattoso M. 2009. Towards a taxonomy of provenance in scientific workflow management systems. In 2009 World Conference on Services – I, 259–266.

    Deelman E., Gannon D., Shields M. & Taylor I. 2009. Workflows and e-science: an overview of workflow system features and capabilities. Future Generation Computer Systems 25(5), 528–540.

    Dennis C. 2002. Biology databases: information overload. Nature 417(6884), 14.

    Dibble D. & Bostrom R. P. 1987. Managing expert systems projects: factors critical for successful implementation. In Proceedings of the Conference on the 1987 ACM SIGBDP-SIGCPR Conference, SIGCPR’87, 96–128. ACM.

    Dinakarpandian D., Lee Y., Vishwanath K. & Lingambhotla R. 2006. MachineProse: an ontological framework for scientific assertions. Journal of the American Medical Informatics Association 13(2), 220–232.

    Dixon-Woods M., Agarwal S., Jones D., Young B. & Sutton A. 2005. Synthesising qualitative and quantitative evidence: a review of possible methods. Journal of Health Services Research & Policy 10(1), 45–53.

    Dung P. M. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence 77(2), 321–357.

    Dyba T., Dingsoyr T. & Hanssen G. 2007. Applying systematic reviews to diverse study types: an experience report. In International Symposium on Empirical Software Engineering and Measurement, 225–234.

    Džeroski S., Langley P. & Todorovski L. 2007. Computational discovery of scientific knowledge. In Computational Discovery of Scientific Knowledge, Džeroski S. & Todorovski L. (eds), Lecture Notes in Computer Science 4660, 1–14. Springer.

    Easterbrook S., Singer J., Storey M.-A. & Damian D. 2008. Selecting empirical methods for software engineering research. In Guide to Advanced Empirical Software Engineering, Shull F., Singer J. & Sjøberg D. I. K. (eds). Springer, 285–311.

    Eriksson H. 1992. A survey of knowledge acquisition techniques and tools and their relationship to software engineering. Journal of Systems and Software 19(1), 97–107.

    Fayyad U. & Stolorz P. 1997. Data mining and KDD: promise and challenges. Future Generation Computer Systems 13(2–3), 99–115.

    Fellers J. 1987. Key factors in knowledge acquisition. SIGCPR Computer Personnel 11(1), 10–24.

    Fiore S. & Aloisio G. 2011. Special section: data management for eScience. Future Generation Computer Systems 27(3), 290–291.

    Forbus K. D. & DeKleer J. 1993. Building Problem Solvers. MIT Press.

    Ford K. M. 1993. Knowledge Acquisition as Modeling. Wiley.

    Freiling M., Alexander J., Messick S., Rehfuss S. & Shulman S. 1985. Starting a knowledge engineering project: a step-by-step approach. AI Magazine 6(3), 150.

    Goertz G. & Mahoney J. 2012. A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton University Press.

    Hars A. 2001. Designing scientific knowledge infrastructures: the contribution of epistemology. Information Systems Frontiers 3(1), 63–73.

    Hey T. & Trefethen A. 2003. The data deluge: an e-science perspective. In Grid Computing, Berman F., Fox G. & Hey T. (eds). John Wiley & Sons Ltd, 809–824.

    Hunter A. & Liu W. 2010. A survey of formalisms for representing and reasoning with scientific knowledge. The Knowledge Engineering Review 25(2), 199–222.

    Hunter J. 2008. Scientific publication packages—a selective approach to the communication and archival of scientific output. International Journal of Digital Curation 1(1), 33–52.

    Ivarsson M. & Gorschek T. 2012. Tool support for disseminating and improving development practices. Software Quality Journal 20(1), 173–199.

    Khatri P., Sirota M. & Butte A. J. 2012. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Computational Biology 8(2), e1002375.

    Kiritchenko S., Bruijn B. D., Carini S., Martin J. & Sim I. 2010. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Medical Informatics and Decision Making 10(1), 56.

    Kitchenham B. & Charters S. 2007. Guidelines for performing systematic literature reviews in software engineering. Technical Report No. EBSE 2007-001, Keele University and Durham University Joint Report.

    Langley P. 1987. Scientific Discovery: Computational Explorations of the Creative Processes. MIT Press.

    Langley P., Zytkow J. M., Bradshaw G. L. & Simon H. A. 1983. Three facets of scientific discovery. In Proceedings of the Eighth International Joint Conference on Artificial Intelligence – Volume 1, IJCAI’83, 465–468. Morgan Kaufmann Publishers, Inc.

    Lenat D. B. & Feigenbaum E. A. 1991. On the thresholds of knowledge. Artificial Intelligence 47(1–3), 185–250.

    Lewis-Beck M., Bryman A. & Liao T. F. 2004. Encyclopedia of Social Science Research Methods. SAGE Publications, Inc.

    Lin C., Lu S., Fei X., Chebotko A., Pai D., Lai Z., Fotouhi F. & Hua J. 2009. A reference architecture for scientific workflow management systems and the VIEW SOA solution. IEEE Transactions on Services Computing 2(1), 79–92.

    Lord P., Macdonald A., Lyon L. & Giaretta D. 2004. From data deluge to data curation. In Proceedings of the 3rd UK e-Science All Hands Meeting, 371–375.

    Maccagnan A., Riva M., Feltrin E., Simionati B., Vardanega T., Valle G. & Cannata N. 2010. Combining ontologies and workflows to design formal protocols for biological laboratories. Automated Experimentation 2(1), 1–14.

    Martinez-Fernandez S., Santos P., Ayala C., Franch X. & Travassos G. 2015. Aggregating empirical evidence about the benefits and drawbacks of software reference architectures. In 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 1–10.

    McDermott J. 1988. Preliminary steps toward a taxonomy of problem-solving methods. In Automating Knowledge Acquisition for Expert Systems, number 57 in The Kluwer International Series in Engineering and Computer Science, Marcus S. (ed.). Springer, 225–256.

    Mons B. 2005. Which gene did you mean? BMC Bioinformatics 6(1), 142.

    Mons B. & Velterop J. 2009. Nano-publication in the e-science era. In Workshop on Semantic Web Applications in Scientific Discourse.

    Moody D. 2009. The ‘physics’ of notations: toward a scientific basis for constructing visual notations in software engineering. IEEE Transactions on Software Engineering 35(6), 756–779.

    Motta E., Rajan T. & Eisenstadt M. 1990. Knowledge acquisition as a process of model refinement. Knowledge Acquisition 2(1), 21–49.

    Newman H. B., Ellisman M. H. & Orcutt J. A. 2003. Data-intensive e-science frontier research. Communications of the ACM 46(11), 68–77.

    Noblit G. W. & Hare R. D. 1988. Meta-Ethnography: Synthesizing Qualitative Studies. SAGE.

    Novère N. L., Hucka M., Mi H., Moodie S., Schreiber F., Sorokin A., Demir E., Wegner K., Aladjem M. I., Wimalaratne S. M., Bergman F. T., Gauges R., Ghazal P., Kawaji H., Li L., Matsuoka Y., Villéger A., Boyd S. E., Calzone L., Courtot M., Dogrusoz U., Freeman T. C., Funahashi A., Ghosh S., Jouraku A., Kim S., Kolpakov F., Luna A., Sahle S., Schmidt E., Watterson S., Wu G., Goryanin I., Kell D. B., Sander C., Sauro H., Snoep J. L., Kohn K. & Kitano H. 2009. The systems biology graphical notation. Nature Biotechnology 27(8), 735–741.

    Petersen K., Feldt R., Mujtaba S. & Mattsson M. 2008. Systematic mapping studies in software engineering. In Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, EASE’08, 68–77. British Computer Society.

    Plant R. T. 1991. Rigorous approach to the development of knowledge-based systems. Knowledge-Based Systems 4(4), 186–196.

    Rainer A., Jagielska D. & Hall T. 2005. Software engineering practice versus evidence-based software engineering research. In Proceedings of the 2005 Workshop on Realising Evidence-Based Software Engineering, REBSE’05, 1–5. ACM.

    Rook F. & Croghan J. 1989. The knowledge acquisition activity matrix: a systems engineering conceptual framework. IEEE Transactions on Systems, Man and Cybernetics 19(3), 586–597.

    Rzhetsky A., Iossifov I., Koike T., Krauthammer M., Kra P., Morris M., Yu H., Duboué P. A., Weng W., Wilbur W. J., Hatzivassiloglou V. & Friedman C. 2004. GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data. Journal of Biomedical Informatics 37(1), 43–53.

    Sackett D. L., Rosenberg W. M., Gray J. A., Haynes R. B. & Richardson W. S. 1996. Evidence based medicine: what it is and what it isn’t. BMJ 312(7023), 71–72.

    Sanders T. J. M., Spooren W. P. M. & Noordman L. G. M. 1993. Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics 4(2), 93–134.

    Santos P. & Travassos G. 2013. On the representation and aggregation of evidence in software engineering: a theory and belief-based perspective. Electronic Notes in Theoretical Computer Science 292, 95–118.

    Santos P. & Travassos G. 2015. Aggregating empirical evidence about the benefits and drawbacks of software reference architectures. In International Symposium on Empirical Software Engineering and Measurement (in press).

    Santos P. S., Nascimento I. & Travassos G. H. 2015. A computational infrastructure for research synthesis in software engineering. In XVIII Ibero-American Conference on Software Engineering, 309–322. URP, SPC, UCSP, UCSP.

    Schreiber G. 2000. Knowledge Engineering and Management: The CommonKADS Methodology. MIT Press.

    Shafer G. 1976. A Mathematical Theory of Evidence. Princeton University Press.

    Shotton D. 2009. Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing 22(2), 85–94.

    Shrager J. 1990. Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann Publisher.

    Shull F., Feldmann R. & Shaw M. 2006. Building decision support in an imperfect world. In International Symposium on Empirical Software Engineering ISESE, 33–35.

    Shull F., Singer J. & Sjøberg D. I. K. 2007. Guide to Advanced Empirical Software Engineering, 2008 edition. Springer.

    Simon H. A. 1977. Scientific discovery and the psychology of problem solving. In Models of Discovery, Number 54 in Boston Studies in the Philosophy of Science, Simon H. A. (ed.). Springer, 286–303.

    Sjøberg D. I. K., Dybå T., Anda B. C. D. & Hannay J. E. 2008. Building theories in software engineering. In Guide to Advanced Empirical Software Engineering, Shull F., Singer J. & Sjøberg D. I. K. (eds). Springer, 312–336.

    Slater T., Bouton C. & Huang E. S. 2008. Beyond data integration. Drug Discovery Today 13(13–14), 584–589.

    Stock K., Robertson A., Reitsma F., Stojanovic T., Bishr M., Medyckyj-Scott D. & Ortmann J. 2009. eScience for sea science: a semantic scientific knowledge infrastructure for marine scientists. In Fifth IEEE International Conference on e-Science. e-Science’09, 110–117.

    Studer R., Benjamins V. & Fensel D. 1998. Knowledge engineering: principles and methods. Data & Knowledge Engineering 25(1–2), 161–197.

    Travassos G., Santos P., Neto P. & Biolchini J. 2008. An environment to support large scale experimentation in software engineering. In 13th IEEE International Conference on Engineering of Complex Computer Systems, 2008. ICECCS 2008, 193–202.

    Valdés-Pérez R. E. 1996. Computer science research on scientific discovery. The Knowledge Engineering Review 11(1), 57–66.

    Vorms M. 2011. Representing with imaginary models: formats matter. Studies in History and Philosophy of Science Part A 42(2), 287–295.

    Wallace D. & Fujii R. 1989. Software verification and validation: an overview. IEEE Software 6(3), 10–17.

    Wielinga B., Schreiber A. & Breuker J. 1992. KADS: a modelling approach to knowledge engineering. Knowledge Acquisition 4(1), 5–53.

  • Cite this article

    Paulo Sérgio M. Dos Santos, Guilherme H. Travassos. 2016. Scientific Knowledge Engineering: a conceptual delineation and overview of the state of the art. The Knowledge Engineering Review 31(2), 167−199, doi: 10.1017/S0269888916000011



The Knowledge Engineering Review 2016, 31(2): 167−199


    • This research is supported by CNPq (Brazilian Research Council) under grant 305929/2014-3. Prof. Travassos is a CNPq researcher.

    • We use the term computational infrastructure instead of knowledge-based system or expert system, as we believe that it better represents the application of KE to the scientific domain and draws attention to the fact that this kind of system is not intended to replace the scientist's expertise but to boost it. It is also consonant with Hars (2001), who uses the term Scientific Knowledge Infrastructure.

    • The alternative paradigm is the transfer process paradigm (Studer et al., 1998).

    • To gauge the precision achievable with a more structured search string, we compiled one a posteriori, that is, after we identified the papers in this section. The string captures two dimensions, namely ‘KE’ and ‘scientific knowledge’. To retrieve all the papers presented in this section, the search string ended up with the following terms: (‘scientific knowledge’ OR ‘science knowledge’ OR ‘science information’ OR ‘scientific information’ OR ‘pathway’ OR ‘scientific paper’ OR ‘scientific publication’ OR ‘scientific assertion’ OR ‘scientific discourse’ OR ‘supplementary nature scholarly discourse’ OR ‘scientific statements’ OR ‘scholarly communication’ OR ‘mechanism knowledge’ OR ‘evidence based’ OR ‘scholarly argumentation’ OR ‘scientific argumentation’ OR ‘scientific contributions’ OR ‘scientific claim’ OR ‘evidence representation’ OR ‘research proposal’ OR ‘scientific theory’) AND (‘KE’ OR ‘knowledge management’ OR ‘expert system’ OR ‘knowledge-based system’ OR ‘computational infrastructure’ OR ‘ontology’ OR ‘inference’ OR ‘knowledge representation’ OR ‘semantic Web’ OR ‘RDF’ OR ‘argument system’ OR ‘discourse representation’ OR ‘research system’ OR ‘belief functions’ OR ‘decision support system’). With these terms, more than 13 000 papers were returned, which clearly shows an absence of consensus in terminology but also indicates the diversity of SKE applications.

    • Description logic is a subset of the language of classical logic. Ontologies are commonly used in conjunction with description logics, which allow logical reasoning based on monadic and binary predicates to represent relations such as sub-concepts, unions and intersections.

    • References: (i) Dinakarpandian et al. (2006), Ciccarese et al. (2008), de Waard et al. (2009), Kraines & Guo (2011); (ii) Sharma et al. (2010); (iii) Groza et al. (2007), Pike & Gahegan (2007), Brodaric et al. (2008), Groth et al. (2010), Clare et al. (2011), Marcondes (2011), Bölling et al. (2014), Ekaputra et al. (2014); (v) de Waard & Schneider (2012), Santos & Travassos (2013); (vi) Boyce et al. (2007), Croft et al. (2011), Hunter & Williams (2012), van Valkenhoef et al. (2013); (vii) Mancini & Buckingham Shum (2006), Kuhn et al. (2013); (viii) Russ et al. (2011).

    • It is important to recall that, in SKE, modelling activities occur at two moments. One is the modelling (or ‘translation’) of scientific results into knowledge representations. The other is the modelling (or design) of the knowledge representation itself as an operational model with a particular behaviour, given a set of specific conditions.

    • Neo4j database: http://www.neo4j.org/
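
The a posteriori search string described in the notes above has a simple structure: the terms within each dimension are joined with OR, and the two dimensions (‘scientific knowledge’ and ‘KE’) are joined with AND. The following is a minimal sketch of that structure, not the authors' tooling. The function names are hypothetical, the term lists are abbreviated to three entries each, and straight quotes stand in for the typographic quotes used in the note; the exact syntax accepted by any given digital library is not specified in the source.

```python
from typing import Iterable, Sequence


def or_group(terms: Iterable[str]) -> str:
    """Join the terms of one dimension with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f"'{term}'" for term in terms) + ")"


def build_search_string(*dimensions: Sequence[str]) -> str:
    """Join the OR-groups of all dimensions with AND."""
    return " AND ".join(or_group(dimension) for dimension in dimensions)


# Abbreviated, illustrative term lists; the complete lists are given in the note.
SCIENTIFIC_KNOWLEDGE_TERMS = [
    "scientific knowledge",
    "scientific assertion",
    "evidence based",
]
KE_TERMS = ["KE", "ontology", "knowledge representation"]

query = build_search_string(SCIENTIFIC_KNOWLEDGE_TERMS, KE_TERMS)
print(query)
```

Keeping each dimension as a separate list makes it easy to add or drop a synonym and regenerate the string, which matters when, as the note reports, broad term lists return a very large number of papers.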

    • © Cambridge University Press, 2016