A study and comparison of data clustering techniques

Anurag Anand, University of Texas at El Paso


This graduate thesis studies and compares several classification techniques applied to manufacturing data. Data mining is the extraction of useful information from very large data sets: it uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. Data mining has several applications in industry. Many organizations use it to help manage all phases of the customer life cycle, including acquiring new customers, increasing revenue from existing customers, and retaining good customers. Telecommunications and credit card companies are among the leaders in applying data mining to detect fraudulent use of their services. Insurance companies and stock exchanges are also interested in applying this technology to reduce fraud.

This research explores various methods for classifying data and compares their results, with the aim of providing a tool for selecting an effective data classification method for a specific application. The four clustering techniques compared are average linkage, K-means, Ward's method, and Bayesian classification. Their performance is evaluated using two different performance measures, and conclusions are drawn on the usability and effectiveness of each of the four methods. This research also aims to test the application of data clustering in commercial and industrial fields for knowledge discovery and quality control.

Subject Area

Engineering, Industrial | Information Science | Computer Science

Recommended Citation

Anand, Anurag, "A study and comparison of data clustering techniques" (2003). ETD Collection for University of Texas, El Paso. AAIEP10510.