2010 Volume 25
RESEARCH ARTICLE   Open Access    

A survey of data mining and knowledge discovery process models and methodologies

The Knowledge Engineering Review 25 (2010) | doi: 10.1017/S0269888910000032

Abstract: Many data mining and knowledge discovery methodologies and process models have been developed to date, with varying degrees of success. In this paper, we describe the data mining and knowledge discovery methodologies and process models most used (in industrial and academic projects) and most cited (in the scientific literature), providing an overview of their evolution over the history of data mining and knowledge discovery and establishing the state of the art in this topic. For each approach, we provide a brief description of the proposed knowledge discovery in databases (KDD) process and discuss its special features and its main advantages and disadvantages. In addition, we provide an overall comparison of all the data mining approaches presented, focusing on the steps and tasks into which each approach divides the whole KDD process. As a result of the comparison, we propose a new data mining and knowledge discovery process, named the refined data mining process, for developing any kind of data mining and knowledge discovery project. The refined data mining process is built on specific steps taken from the analyzed approaches.

    • This work has been partially funded by the project no. TIN 2008-05924/TIN of the Ministry of Science and Innovation of Spain.

    • De facto standards are those that have come into existence without any formal plan by any of the standards organizations. Rather, they develop through the industry’s acceptance of a specific vendor’s standard, which is placed in the public domain (de facto is Latin for “from the fact”) (Gallo & Hancock, 2001).

    • BI is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions (SearchDataManagement.com, 2008). Data mining is an important component of BI (Cabena et al., 1997).

    • Although authors refer to CRISP-DM as a process model, it is really an instanced process model, because it establishes a waterfall life cycle (CRISP-DM states which tasks have to be carried out to successfully complete a data mining project, and in what order). Therefore, it should not be considered a process model. Nor should it be considered a pure methodology, because it does not describe how to perform all the tasks. It can be considered a mix of the two.

    • Copyright © Cambridge University Press 2010
References (78)
  • About this article
    Cite this article
    Gonzalo Mariscal, Óscar Marbán, Covadonga Fernández. 2010. A survey of data mining and knowledge discovery process models and methodologies. The Knowledge Engineering Review. 25:32 doi: 10.1017/S0269888910000032