Document Classification by Machine: Theory and Practice

Publication Date

1994


Document Type

Conference Proceeding


Louise Guthrie, Elbert Walker, and Joe Guthrie. 1994. Document classification by machine: theory and practice. In Proceedings of the 15th conference on Computational linguistics - Volume 2 (COLING '94), Vol. 2. Association for Computational Linguistics, Stroudsburg, PA, USA, 1059-1063. DOI: https://doi.org/10.3115/991250.991322


In this note, we present results concerning the theory and practice of determining for a given document which of several categories it best fits. We describe a mathematical model of classification schemes and the one scheme which can be proved optimal among all those based on word frequencies. Finally, we report the results of an experiment which illustrates the efficacy of this classification method.
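As a rough illustration of what a word-frequency-based classification scheme can look like, the sketch below scores a document against each category by the log-likelihood of its words under that category's word distribution and picks the best-scoring category. This is an assumption-laden example, not the scheme from the paper: the multinomial-style model, the add-one smoothing, the whitespace tokenizer, the toy training texts, and the names `train`, `classify`, and `TRAINING` are all hypothetical choices made here for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training data: a few short texts per category.
TRAINING = {
    "sports": ["the team won the match", "goal scored in the match"],
    "finance": ["the bank raised interest rates", "stock market rates fell"],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(docs_by_category, smoothing=1.0):
    """Estimate smoothed per-category word probabilities from training texts."""
    vocab = {
        word
        for docs in docs_by_category.values()
        for doc in docs
        for word in tokenize(doc)
    }
    model = {}
    for category, docs in docs_by_category.items():
        counts = Counter(word for doc in docs for word in tokenize(doc))
        # Additive smoothing keeps every vocabulary word's probability nonzero.
        total = sum(counts.values()) + smoothing * len(vocab)
        model[category] = {
            word: (counts[word] + smoothing) / total for word in vocab
        }
    return model


def classify(model, text):
    """Return the category whose word distribution gives the highest log-likelihood."""
    words = tokenize(text)
    scores = {}
    for category, probs in model.items():
        # Words never seen in training are skipped (an assumption of this sketch).
        scores[category] = sum(math.log(probs[w]) for w in words if w in probs)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify(model, "the match ended with a late goal"))
```

Working in log space avoids numerical underflow when documents are long; a real experiment would need far more training text per category than this toy data provides.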