2013 Volume 28
RESEARCH ARTICLE   Open Access    

Using automated planning for improving data mining processes

More Information
  • Abstract: This paper presents a distributed architecture for automating data mining (DM) processes using standard languages. DM is a difficult task that relies on an exploratory, analytic process of handling large quantities of data in order to discover meaningful patterns. The increasing heterogeneity and complexity of available data require expert knowledge on how to combine the multiple, alternative DM tasks that process the data. Here, we describe DM tasks in terms of automated planning, which allows us to automate the construction of the DM knowledge flow. The work is based on standards defined in both the DM and automated-planning communities. Thus, we use PMML (Predictive Model Markup Language) to describe DM tasks. From the PMML, a problem description in PDDL (Planning Domain Definition Language) can be generated, so any current planning system can be used to generate a plan. This plan is, in turn, translated into a DM workflow description in the Knowledge Flow for Machine Learning format (the Knowledge Flow file format of the WEKA (Waikato Environment for Knowledge Analysis) tool), so that the plan or DM workflow can be executed in WEKA.
  • Cite this article

    Susana Fernández, Tomás de la Rosa, Fernando Fernández, Rubén Suárez, Javier Ortiz, Daniel Borrajo, David Manzano. 2013. Using automated planning for improving data mining processes. The Knowledge Engineering Review 28(2): 157−173. doi: 10.1017/S0269888912000409


The Knowledge Engineering Review, 2013, 28(2): 157−173

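As a rough illustration of the PMML-to-PDDL step the abstract describes, the Python sketch below reads the data fields of a simplified PMML-like fragment and emits a toy PDDL problem. It is not the authors' translator: the fragment omits the XML namespace that real PMML files declare, and the function name, the domain name `data-mining`, the type `field`, and the predicates `is-field`, `continuous`, `categorical`, `target` and `model-evaluated` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified PMML-like fragment (assumption: no XML namespace, only a
# DataDictionary). Real PMML documents carry a namespace and much more detail.
PMML_SNIPPET = """
<PMML>
  <DataDictionary>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="class" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def pmml_to_pddl_problem(pmml_text, target, problem="dm-task", domain="data-mining"):
    """Build a toy PDDL problem string from the data fields of a PMML-like text.

    All predicate, type, and domain names are hypothetical placeholders,
    not the vocabulary used in the paper.
    """
    root = ET.fromstring(pmml_text.strip())
    # Collect (name, optype) pairs for every declared data field.
    fields = [(f.get("name"), f.get("optype")) for f in root.iter("DataField")]
    names = [name for name, _ in fields]
    if target not in names:
        raise ValueError(f"unknown target field: {target}")

    # Initial state: one fact per field, one fact per field kind, plus the target.
    init = [f"(is-field {name})" for name in names]
    init += [f"({optype} {name})" for name, optype in fields]
    init.append(f"(target {target})")
    init_block = "\n    ".join(init)

    return (
        f"(define (problem {problem})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {' '.join(names)} - field)\n"
        f"  (:init\n    {init_block})\n"
        f"  (:goal (model-evaluated {target})))\n"
    )


if __name__ == "__main__":
    print(pmml_to_pddl_problem(PMML_SNIPPET, target="class"))
```

A planner would additionally need a matching PDDL domain file whose actions model the available DM operations; that part, and the translation of the resulting plan into a WEKA Knowledge Flow file, are outside this sketch.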

    • This work has been partially supported by the Spanish MICINN under projects TIN2008-06701-C03-03, TRA-2009-008, the regional projects CCG08-UC3M/TIC-4141, and the Automated User Knowledge Building (AUKB) project funded by Ericsson Research Spain.

    • This has to do mainly with how the heuristic is computed.

    • http://archive.ics.uci.edu/ml/datasets.html

    • Copyright © Cambridge University Press 2013