1996 Volume 11
RESEARCH ARTICLE   Open Access    

An integrated approach for different attribute types in nearest neighbour classification

  • Abstract: The basic nearest neighbour algorithm works by storing the training instances and classifying a new case by predicting that it has the same class as its nearest stored instance. Measuring the distance between instances requires a distance metric. When all attributes have numeric values, the conventional nearest neighbour method treats examples as points in feature space and uses Euclidean distance as the metric. In tasks with only nominal attributes, the simple “overlap” metric is usually used. To handle classification tasks with mixed attribute types, the two metrics are simply combined. Work by researchers in the machine learning field has shown that this approach performs poorly. This paper studies a more recently developed distance metric and shows that it is capable of measuring the importance of different attributes. Combined with discretisation of numeric-valued attributes, this metric provides an integrated way of dealing with problem domains containing mixtures of attribute types. Through detailed analyses, the paper aims to provide further insight into nearest neighbour classification techniques and to promote wider use of this type of classification algorithm.
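    As a concrete illustration of the conventional approach the abstract describes — Euclidean terms for numeric attributes combined with 0/1 “overlap” terms for nominal ones — here is a minimal 1-NN sketch. This is illustrative code, not taken from the paper; all names and the toy data are hypothetical.

    ```python
    import math

    def mixed_distance(a, b, numeric):
        """Conventional combined metric: squared-difference terms for
        numeric attributes, 0/1 overlap terms for nominal attributes."""
        total = 0.0
        for x, y, is_num in zip(a, b, numeric):
            if is_num:
                total += (x - y) ** 2
            else:
                total += 0.0 if x == y else 1.0
        return math.sqrt(total)

    def nn_classify(train, query, numeric):
        """1-NN: predict the class of the nearest stored instance."""
        nearest = min(train, key=lambda inst: mixed_distance(inst[0], query, numeric))
        return nearest[1]

    # Toy data: (attributes, class); attribute 0 is numeric, attribute 1 nominal.
    numeric = [True, False]
    train = [((1.0, "red"), "A"), ((5.0, "blue"), "B"), ((1.2, "blue"), "A")]
    print(nn_classify(train, (1.1, "red"), numeric))  # -> A
    ```

    Because the numeric terms are unnormalised, attributes with large ranges can swamp the nominal 0/1 terms — one reason this simple combination performs poorly, as the abstract notes.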
  • References (31)

    Breiman L, Friedman JH, Olshen RA and Stone CJ, 1984. Classification and Regression Trees, Wadsworth.

    Catlett J, 1991. “On changing continuous attributes into ordered discrete attributes”. In: Kodratoff Y (ed.), Proceedings of the European Working Session on Learning.

    Cost S and Salzberg S, 1993. “A weighted nearest neighbour algorithm for learning with symbolic features”. Machine Learning 10, 57–78.

    Dasarathy BV, 1991. Nearest Neighbour (NN) Norms: NN Pattern Classification Techniques. IEEE Press.

    Devijver PA and Kittler J, 1980. “On the edited nearest neighbour rule”. In: Proceedings of the Fifth International Conference on Pattern Recognition, 72–80.

    Fayyad UM and Irani KB, 1993. “Multi-interval discretization of continuous valued attributes for classification learning”. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1022–1027, Morgan Kaufmann.

    Fix E and Hodges JL, 1951. “Discriminatory analysis—nonparametric discrimination: consistency properties”. Project 21-49-004, Report No. 4, USAF School of Aviation Medicine, Randolph Field, TX, 261–279.

    Fix E and Hodges JL, 1952. “Discriminatory analysis—nonparametric discrimination: small sample performance”. Project 21-49-004, Report No. 11, USAF School of Aviation Medicine, Randolph Field, TX, 280–322.

    Hart PE, 1968. “The condensed nearest neighbour rule”. IEEE Transactions on Information Theory IT-14(3).

    Kerber R, 1992. “ChiMerge: discretization of numeric attributes”. In: Proceedings of the Tenth National Conference on Artificial Intelligence, 123–128, AAAI Press/MIT Press.

    Kononenko I, Bratko I and Roskar E, 1984. “Experiments in automatic learning of medical diagnostic rules”. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia.

    Kononenko I, 1993. “Inductive and Bayesian learning in medical diagnosis”. Applied Artificial Intelligence 7, 317–337.

    Liu WZ and White AP, 1991. “A review of inductive learning”. In: Graham IM and Milne RW (eds.), Research and Development in Expert Systems VIII, 112–126, Cambridge University Press.

    Liu WZ and White AP, 1994. “The importance of attribute selection measures in decision tree induction”. Machine Learning 15, 25–41.

    Liu WZ and White AP, 1995. “A comparison of nearest neighbour and tree-based discriminant analysis”. Journal of Statistical Computation and Simulation 53, 41–50.

    Quinlan JR, 1986. “Induction of decision trees”. Machine Learning 1, 81–106.

    Quinlan JR, 1988. “Decision trees and multi-valued attributes”. Machine Intelligence 11, 305–318.

    Quinlan JR and Rivest RL, 1989. “Inferring decision trees using the minimum description length principle”. Information and Computation 80, 227–248.

    Rachlin J, Kasif S, Salzberg S and Aha D, 1994. “Towards a better understanding of memory-based and Bayesian classifiers”. In: Proceedings of the Eleventh International Conference on Machine Learning, 242–250, New Brunswick, NJ.

    Salzberg S, 1989. “Nested hyper-rectangles for exemplar-based learning”. In: Jantke KP (ed.), Analogical and Inductive Inference: International Workshop AII'89, Springer-Verlag.

    Salzberg S, 1990. Learning with Nested Generalized Exemplars, Kluwer Academic.

    Salzberg S, 1991. “A nested hyper-rectangle learning method”. Machine Learning 6(3), 251–276.

    Stanfill C and Waltz D, 1986. “Towards memory-based reasoning”. Communications of the ACM 29(12), 1213–1228.

    Swonger CW, 1972. “Sample set condensation for a condensed nearest neighbour decision rule for pattern recognition”. In: Watanabe S (ed.), Frontiers of Pattern Recognition, 511–519.

    Ting KM, 1994. “Discretisation of continuous-valued attributes and instance-based learning”. Technical Report 491, Basser Department of Computer Science, University of Sydney.

    Tomek I, 1976. “An experiment with the edited nearest neighbour rule”. IEEE Transactions on Systems, Man and Cybernetics SMC-6(6), 448–452.

    Van de Merckt T, 1993. “Decision trees in numerical attributes spaces”. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1016–1021, Morgan Kaufmann.

    White AP, 1987. “Probabilistic induction by dynamic path generation in virtual trees”. In: Bramer MA (ed.), Research and Development in Expert Systems III, 35–46, Cambridge University Press.

    White AP and Liu WZ, 1990. “Probabilistic induction by dynamic path generation for continuous attributes”. In: Addis TR and Muir RM (eds.), Research and Development in Expert Systems VII, 285–296, Cambridge University Press.

    White AP and Liu WZ, 1993. “Fairness of attribute selection in probabilistic induction”. In: Bramer MA and Milne RW (eds.), Research and Development in Expert Systems IX, 209–224, Cambridge University Press.

    White AP and Liu WZ, 1994. “Bias in information-based measures in decision tree induction”. Machine Learning 15, 321–329.

  • Cite this article

    W. Z. Liu. 1996. An integrated approach for different attribute types in nearest neighbour classification. The Knowledge Engineering Review. 11:6 doi: 10.1017/S0269888900007906

    • Copyright © 1996 Cambridge University Press