Cluster Analysis in Data Mining

Posted on January 16, 2019 in Data Warehousing and Mining


Cluster analysis in data mining is the classification of objects into different groups, or the partitioning of a dataset into subsets (clusters).

In other words, similar objects are grouped into the same cluster, while dissimilar objects are placed in different clusters.

While doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign labels to the groups.

In manufacturing, cluster analysis is applied to production flow analysis charts to determine feasible groups of processes and their respective families of parts. Unlike classification techniques, which tend to be descriptive, it uses algorithms to study the similarities between objects quantitatively.

We may define cluster analysis in data mining as the science of classifying objects based on whether they possess or lack defined characteristics.

The technique provides a way to study the similarities among a diverse population of objects in a quantitative manner.

Cluster analysis methods in data mining:-

Partitioning method
Hierarchical method
Density-based method
Grid-based method
Model-based method
Constraint-based method

1. Partitioning method:-

Suppose we are given a database of ‘n’ objects; the partitioning method constructs ‘k’ partitions of the data.

Each partition represents a cluster, and k ≤ n.

Each group contains at least one object.

Each object must belong to exactly one group.
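The text does not name a particular partitioning algorithm. k-means is one widely used example, and the minimal sketch below assumes it, together with 2-D points, Euclidean distance, and the simplification that the first k points serve as the initial centroids. The function name `k_means` is hypothetical. Each pass assigns every object to exactly one group (its nearest centroid) and then moves each centroid to the mean of its members.

```python
import math

def k_means(points, k, max_iter=100):
    """Partition `points` (a list of (x, y) tuples) into k clusters.

    Returns a list of cluster labels, one per point, in the range 0..k-1.
    Requires k <= len(points), matching the k <= n condition above.
    """
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= n")
    # Simplified initialisation (an assumption): the first k points are the centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each object joins exactly one group, the nearest centroid's.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing, so the partition is stable
        labels = new_labels
    return labels
```

For example, `k_means([(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)], 2)` places the first three points in one cluster and the last three in the other. Production implementations usually choose the initial centroids more carefully (for instance with k-means++) and run several restarts.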

2. Hierarchical methods:-

As its name suggests, this method creates a hierarchical decomposition (a tree) of the given set of data objects.

We can classify hierarchical methods on the basis of how the hierarchical decomposition is formed.

There are two approaches:

Agglomerative approach
Divisive approach


a. Agglomerative approach:-

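The agglomerative approach is also known as the bottom-up approach. It starts with each object forming a separate group and then successively merges the objects or groups that are closest to one another, until all of the groups are merged into one or a termination condition holds.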
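The bottom-up procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming single linkage (the distance between two groups is the distance between their closest members) and a hypothetical stopping rule of "stop when `num_clusters` groups remain"; it favours clarity over speed.

```python
import math
from itertools import combinations

def agglomerative(points, num_clusters):
    """Bottom-up (agglomerative) clustering with single linkage.

    Starts with every object in its own group and repeatedly merges the
    two closest groups until `num_clusters` groups remain.
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]  # one group per object

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > max(num_clusters, 1):
        # Find the pair of groups that are closest to each other (i < j).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        # Merge group j into group i; this merge is never revisited.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters
```

Other linkage rules (complete or average linkage) only change the `linkage` function; the merge loop stays the same.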

b. Divisive approach:-

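The divisive approach is also called the top-down approach.

It starts with all the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters, until each object is in its own cluster or a termination condition holds.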
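A top-down counterpart is sketched below. The splitting rule is an assumption made for illustration only: split the cluster with the largest diameter, using its two most distant members as seeds for the two halves. Established divisive algorithms such as DIANA use more elaborate rules. The function and parameter names are hypothetical.

```python
import math
from itertools import combinations

def divisive(points, num_clusters):
    """Top-down (divisive) clustering.

    Starts with all objects in one cluster and repeatedly splits the widest
    cluster until `num_clusters` clusters exist (or every object is alone).
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [list(range(len(points)))]  # all objects in the same cluster

    def diameter(cluster):
        # Largest distance between any two members.
        return max(math.dist(points[i], points[j]) for i, j in combinations(cluster, 2))

    while len(clusters) < min(num_clusters, len(points)):
        # Only clusters with two or more members can be split.
        widest = max((c for c in clusters if len(c) > 1), key=diameter)
        # The two most distant members seed the two halves.
        a, b = max(combinations(widest, 2),
                   key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))
        # Seed b always goes right; every other member joins the nearer seed.
        left = [i for i in widest
                if i != b and math.dist(points[i], points[a]) <= math.dist(points[i], points[b])]
        right = [i for i in widest if i not in left]
        # Replace the widest cluster by its two halves; the split is never undone.
        clusters.remove(widest)
        clusters += [left, right]
    return clusters
```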

Hierarchical methods are rigid: once a merge or split is done, it can never be undone.

3. Density-based method:-

The basic idea is to keep growing a given cluster as long as the density (the number of objects) in its neighbourhood exceeds some threshold; that is, for each data point within a given cluster, the neighbourhood of a given radius has to contain at least a minimum number of points.
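The growth rule just described is the idea behind DBSCAN, a well-known density-based algorithm. The minimal sketch below follows that style; `eps` (the neighbourhood radius) and `min_pts` (the minimum number of points, counting the point itself) are this sketch's parameter names, and the brute-force neighbour search is a simplification suitable only for small inputs.

```python
import math

def dbscan(points, eps, min_pts):
    """Density-based clustering in the style of DBSCAN.

    A cluster keeps growing while the eps-neighbourhood of its core points
    contains at least `min_pts` points (the point itself included).
    Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
    """
    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                # j is also a core point, so keep growing the cluster from it.
                queue.extend(j_neighbours)
    return labels
```

Unlike the partitioning sketch, the number of clusters is not given in advance; it falls out of the density threshold, and isolated points are reported as noise.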

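4. Grid-based method:-

The object space is quantized into a finite number of cells that form a grid structure, and all clustering operations are performed on this grid.

Advantage:- Fast processing time, which typically depends on the number of cells in each dimension of the quantized space rather than on the number of data objects.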
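A minimal grid-based sketch for 2-D points is shown below. It illustrates the general idea rather than any specific published algorithm (such as STING or CLIQUE); `cell_size` and `density_threshold` are hypothetical parameters. After one pass that maps points to cells, all remaining work is done on cells.

```python
from collections import Counter

def grid_cluster(points, cell_size, density_threshold):
    """Grid-based clustering sketch for 2-D points.

    1. Quantize the space into square cells of side `cell_size`.
    2. Keep the cells holding at least `density_threshold` points.
    3. Join dense cells that touch (including diagonally) into clusters.
    Returns a list of clusters, each a sorted list of (col, row) cell indices.
    """
    # Step 1: a single pass over the points maps each one to its cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    # Step 2: from here on only cells are examined, not individual points.
    dense = {cell for cell, n in counts.items() if n >= density_threshold}
    clusters, seen = [], set()
    # Step 3: flood-fill across neighbouring dense cells.
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            cx, cy = stack.pop()
            component.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(sorted(component))
    return clusters
```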

5. Model-based method:-

This method locates clusters by constructing a density function that reflects the spatial distribution of the data points.

It also provides a way to determine the number of clusters automatically based on standard statistics, taking outliers or noise into account.
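The text does not name a model, so the sketch below assumes one common choice: a Gaussian mixture fitted with the expectation-maximization (EM) algorithm, where the fitted mixture plays the role of the density function and each component is a cluster. The number of clusters is then picked with the Bayesian information criterion (BIC), one standard statistic for this purpose. It is 1-D only, and the function names, initialisation, iteration count, and variance floor are illustrative assumptions; the floor is a crude guard, and degenerate fits with many components can still distort the BIC.

```python
import math

def _pdf(x, mu, var):
    # Normal density with mean mu and variance var.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    xs = sorted(data)
    mean = sum(xs) / n
    # Initialisation: spread the k means over the sorted data, give every
    # component the overall variance and an equal weight.
    means = [xs[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    variances = [sum((x - mean) ** 2 for x in xs) / n or 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * _pdf(x, mu, var) for w, mu, var in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp) or 1e-300
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # crude floor against collapsing components
    log_lik = sum(
        math.log(sum(w * _pdf(x, mu, var)
                     for w, mu, var in zip(weights, means, variances)) or 1e-300)
        for x in data)
    return weights, means, variances, log_lik

def bic(log_lik, k, n):
    # A 1-D mixture has k means, k variances and k - 1 free weights.
    return (3 * k - 1) * math.log(n) - 2 * log_lik

def choose_k(data, max_k=4):
    """Pick the number of clusters (1..max_k) with the lowest BIC."""
    return min(range(1, max_k + 1),
               key=lambda k: bic(fit_gmm_1d(data, k)[3], k, len(data)))
```

Lower BIC is better: the `-2 * log_lik` term rewards a good fit, while the parameter term penalizes extra clusters, which is how the number of clusters is chosen automatically.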

6. Constraint-based method:-

In this method, clustering is performed by incorporating user-specified or application-oriented constraints.

A constraint refers to the user's expectations or to the properties of the desired clustering results.

Constraints can be specified by the user according to the application's requirements.
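One common form of user constraint is pairwise: must-link (two objects must share a cluster) and cannot-link (two objects must not). The text does not prescribe a technique, so this minimal sketch assumes those pairwise constraints and enforces them inside single-linkage agglomerative clustering. The function and parameter names are hypothetical, and conflicting constraints (the same pair in both lists) are not detected.

```python
import math
from itertools import combinations

def constrained_agglomerative(points, num_clusters, must_link=(), cannot_link=()):
    """Single-linkage agglomerative clustering that respects pairwise constraints.

    must_link:   pairs of point indices that must end up in the same cluster.
    cannot_link: pairs of point indices that must never share a cluster.
    Stops early if every remaining merge would violate a constraint.
    """
    clusters = [{i} for i in range(len(points))]

    def find(i):
        return next(c for c in clusters if i in c)

    # Must-link constraints are satisfied up front by merging the pairs.
    for a, b in must_link:
        ca, cb = find(a), find(b)
        if ca is not cb:
            ca |= cb
            clusters.remove(cb)

    def allowed(c1, c2):
        # A merge is forbidden if it would join a cannot-link pair.
        return not any((a in c1 and b in c2) or (a in c2 and b in c1)
                       for a, b in cannot_link)

    def linkage(c1, c2):
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > num_clusters:
        candidates = [(linkage(c1, c2), i, j)
                      for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
                      if allowed(c1, c2)]
        if not candidates:
            break
        _, i, j = min(candidates)  # closest allowed pair
        clusters[i] |= clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)
```

With points `[(0, 0), (1, 0), (5, 0), (6, 0)]` and two clusters, the unconstrained result pairs the two left points and the two right points; adding the cannot-link pair `(0, 1)` forces point 1 to join the right-hand group instead.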

Three stages of cluster analysis in data mining:-

Clustering consists of the following three stages:

Preparing a part-operation matrix. This shows whether certain features (such as a keyway on a shaft) are present in or absent from each part.

Computing a similarity coefficient matrix. This is based on the extent to which the parts share common characteristics. Here, the coefficient has a value of one (1) when two parts are identical and zero (0) when they have nothing in common.

Performing the cluster analysis. The similarity between each pair of objects is examined, and groups of objects are formed such that, within each group, the objects are similar to each other according to a previously formulated set of rules.
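The three stages can be sketched end to end in Python. Everything here is an assumption for illustration: the parts and features are made-up data, the Jaccard coefficient stands in for the unnamed similarity coefficient (it is 1 for identical parts and 0 for parts with nothing in common), and the grouping rule (join two parts when their similarity reaches `threshold`) is one simple single-linkage choice.

```python
from itertools import combinations

# Stage 1: presence/absence matrix. 1 = feature present, 0 = absent.
# Part names and features are hypothetical illustrative data.
FEATURES = ["keyway", "thread", "drilled_hole", "chamfer"]
MATRIX = {
    "shaft_A": [1, 1, 0, 1],
    "shaft_B": [1, 1, 0, 0],
    "plate_C": [0, 0, 1, 1],
    "plate_D": [0, 0, 1, 1],
}

def jaccard(u, v):
    """Shared features divided by features present in either part."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 1.0

def similarity_matrix(matrix):
    # Stage 2: similarity coefficient for every pair of parts.
    return {(p, q): jaccard(matrix[p], matrix[q]) for p, q in combinations(matrix, 2)}

def group_parts(matrix, threshold):
    # Stage 3: put two parts in the same group when their similarity
    # reaches `threshold`; groups linked through a shared part are merged.
    groups = {p: {p} for p in matrix}
    for (p, q), s in similarity_matrix(matrix).items():
        if s >= threshold and groups[p] is not groups[q]:
            merged = groups[p] | groups[q]
            for part in merged:
                groups[part] = merged
    unique = {id(g): g for g in groups.values()}
    return sorted(sorted(g) for g in unique.values())
```

With a threshold of 0.5, the two shafts (similarity 2/3) form one group and the two identical plates (similarity 1) form another, while shaft_A and the plates (similarity 0.25) stay apart.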