1996 Volume 11
RESEARCH ARTICLE   Open Access    

An integrated approach for different attribute types in nearest neighbour classification

  • Abstract: The basic nearest neighbour algorithm works by storing the training instances and classifying a new case by predicting that it has the same class as its nearest stored instance. Measuring the distance between instances requires a distance metric. When all attributes have numeric values, the conventional nearest neighbour method treats examples as points in feature space and uses Euclidean distance as the metric. In tasks with only nominal attributes, the simple “overlap” metric is usually used. To handle classification tasks with mixed attribute types, the two metrics are simply combined. Work by researchers in the machine learning field has shown that this approach performs poorly. This paper studies a more recently developed distance metric and shows that it is capable of measuring the importance of different attributes. Combined with discretisation of numeric-valued attributes, this metric provides an integrated way of dealing with problem domains containing mixtures of attribute types. Through detailed analyses, the paper aims to provide further insight into nearest neighbour classification techniques and to promote wider use of this type of classification algorithm.
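    As a concrete illustration of the conventional approach the abstract describes — Euclidean terms for numeric attributes combined with 0/1 “overlap” terms for nominal ones — here is a minimal 1-NN sketch. This is illustrative code, not taken from the paper; all names and the toy data are hypothetical.

    ```python
    import math

    def mixed_distance(a, b, numeric):
        """Conventional combined metric: squared-difference terms for
        numeric attributes, 0/1 overlap terms for nominal attributes."""
        total = 0.0
        for x, y, is_num in zip(a, b, numeric):
            if is_num:
                total += (x - y) ** 2
            else:
                total += 0.0 if x == y else 1.0
        return math.sqrt(total)

    def nn_classify(train, query, numeric):
        """1-NN: predict the class of the nearest stored instance."""
        nearest = min(train, key=lambda inst: mixed_distance(inst[0], query, numeric))
        return nearest[1]

    # Toy data: (attributes, class); attribute 0 is numeric, attribute 1 nominal.
    numeric = [True, False]
    train = [((1.0, "red"), "A"), ((5.0, "blue"), "B"), ((1.2, "blue"), "A")]
    print(nn_classify(train, (1.1, "red"), numeric))  # -> A
    ```

    Because the numeric terms are unnormalised, attributes with large ranges can swamp the nominal 0/1 terms — one reason this simple combination performs poorly, as the abstract notes.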
  • References (31)

    Breiman L, Friedman JH, Olshen RA and Stone CJ, 1984. Classification and Regression Trees, Wadsworth.

    Catlett J, 1991. “On changing continuous attributes into ordered discrete attributes”. In: Kodratoff Y (ed.), Proceedings of the European Working Session on Learning.

    Cost S and Salzberg S, 1993. “A weighted nearest neighbour algorithm for learning with symbolic features”. Machine Learning 10, 57–78.

    Dasarathy BV, 1991. Nearest Neighbour (NN) Norms: NN Pattern Classification Techniques. IEEE Press.

    Devijver PA and Kittler J, 1980. “On the edited nearest neighbour rule”. In: Proceedings of the Fifth International Conference on Pattern Recognition, 72–80.

    Fayyad UM and Irani KB, 1993. “Multi-interval discretization of continuous valued attributes for classification learning”. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1022–1027, Morgan Kaufmann.

    Fix E and Hodges JL, 1951. “Discriminatory analysis—nonparametric discrimination: consistency properties”. Project 21-49-004, Report No. 4, USAF School of Aviation Medicine, Randolph Field, TX, 261–279.

    Fix E and Hodges JL, 1952. “Discriminatory analysis—nonparametric discrimination: small sample performance”. Project 21-49-004, Report No. 11, USAF School of Aviation Medicine, Randolph Field, TX, 280–322.

    Hart PE, 1968. “The condensed nearest neighbour rule”. IEEE Transactions on Information Theory IT-14(3).

    Kerber R, 1992. “ChiMerge: discretization of numeric attributes”. In: Proceedings of the Tenth National Conference on Artificial Intelligence, 123–128, AAAI Press/MIT Press.

    Kononenko I, Bratko I and Roskar E, 1984. “Experiments in automatic learning of medical diagnostic rules”. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia.

    Kononenko I, 1993. “Inductive and Bayesian learning in medical diagnosis”. Applied Artificial Intelligence 7, 317–337.

    Liu WZ and White AP, 1991. “A review of inductive learning”. In: Graham IM and Milne RW (eds.), Research and Development in Expert Systems VIII, 112–126, Cambridge University Press.

    Liu WZ and White AP, 1994. “The importance of attribute selection measures in decision tree induction”. Machine Learning 15, 25–41.

    Liu WZ and White AP, 1995. “A comparison of nearest neighbour and tree-based discriminant analysis”. Journal of Statistical Computation and Simulation 53, 41–50.

    Quinlan JR, 1986. “Induction of decision trees”. Machine Learning 1, 81–106.

    Quinlan JR, 1988. “Decision trees and multi-valued attributes”. Machine Intelligence 11, 305–318.

    Quinlan JR and Rivest RL, 1989. “Inferring decision trees using the minimum description length principle”. Information and Computation 80, 227–248.

    Rachlin J, Kasif S, Salzberg S and Aha D, 1994. “Towards a better understanding of memory-based and Bayesian classifiers”. In: Proceedings of the Eleventh International Conference on Machine Learning, 242–250, New Brunswick, NJ.

    Salzberg S, 1989. “Nested hyper-rectangles for exemplar-based learning”. In: Jantke KP (ed.), Analogical and Inductive Inference: International Workshop AII'89, Springer-Verlag.

    Salzberg S, 1990. Learning with Nested Generalized Exemplars, Kluwer Academic.

    Salzberg S, 1991. “A nested hyper-rectangle learning method”. Machine Learning 6(3), 251–276.

    Stanfill C and Waltz D, 1986. “Towards memory-based reasoning”. Communications of the ACM 29(12), 1213–1228.

    Swonger CW, 1972. “Sample set condensation for a condensed nearest neighbour decision rule for pattern recognition”. In: Watanabe S (ed.), Frontiers of Pattern Recognition, 511–519.

    Ting KM, 1994. “Discretisation of continuous-valued attributes and instance-based learning”. Technical Report 491, Basser Department of Computer Science, University of Sydney.

    Tomek I, 1976. “An experiment with the edited nearest neighbour rule”. IEEE Transactions on Systems, Man and Cybernetics SMC-6(6), 448–452.

    Van de Merckt T, 1993. “Decision trees in numerical attributes spaces”. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1016–1021, Morgan Kaufmann.

    White AP, 1987. “Probabilistic induction by dynamic path generation in virtual trees”. In: Bramer MA (ed.), Research and Development in Expert Systems III, 35–46, Cambridge University Press.

    White AP and Liu WZ, 1990. “Probabilistic induction by dynamic path generation for continuous attributes”. In: Addis TR and Muir RM (eds.), Research and Development in Expert Systems VII, 285–296, Cambridge University Press.

    White AP and Liu WZ, 1993. “Fairness of attribute selection in probabilistic induction”. In: Bramer MA and Milne RW (eds.), Research and Development in Expert Systems IX, 209–224, Cambridge University Press.

    White AP and Liu WZ, 1994. “Bias in information-based measures in decision tree induction”. Machine Learning 15, 321–329.

  • Cite this article

    W. Z. Liu. 1996. An integrated approach for different attribute types in nearest neighbour classification. The Knowledge Engineering Review. 11:6 doi: 10.1017/S0269888900007906

    • Copyright © 1996 Cambridge University Press