Classification Vs. Clustering in ML

Sandaminimadhushika
Jul 15, 2021

Intro to ML

Machine learning is an application of artificial intelligence (AI) that gives a system the ability to learn and improve automatically from experience, without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and observations and then use them for training.

The process begins with observations or data, such as examples or direct experience, which the system uses to look for patterns and make better decisions in the future. The main goal is to let the computer learn automatically and adjust its actions accordingly, without human intervention or guidance.

Supervised Vs. Unsupervised Learning

In supervised learning, the goal is to predict outcomes for new data, and you know in advance what type of results to expect. In unsupervised learning, the goal is to gain insights from large volumes of new data; the algorithm itself identifies what is different or interesting in the dataset.
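To make the contrast concrete, here is a minimal sketch in plain Python. The data and function names (`predict_1nn`, `split_at_largest_gap`) are hypothetical, chosen only for illustration. The supervised half learns from labeled examples and predicts the label of a new point (1-nearest neighbour). The unsupervised half receives the same kind of inputs with no labels and finds structure on its own by splitting one-dimensional values at their widest gap, which is what single-linkage clustering does when cut at two clusters.

```python
# Supervised: every training example comes with a known label (hypothetical toy data).
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]

def predict_1nn(x, examples):
    """Supervised: return the label of the closest labeled example (1-nearest neighbour)."""
    _, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
    return nearest_label

# Unsupervised: the same kind of inputs, but no labels at all.
unlabeled = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

def split_at_largest_gap(values):
    """Unsupervised: split sorted 1-D values into two groups at the widest gap
    (equivalent to single-linkage clustering cut at two clusters)."""
    xs = sorted(values)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return xs[:cut], xs[cut:]

print(predict_1nn(7.2, labeled))        # large
print(split_at_largest_gap(unlabeled))  # ([1.0, 1.5, 2.0], [8.0, 8.5, 9.0])
```

Note that the supervised function needs the `"small"`/`"large"` answers up front, while the unsupervised one only returns groups; it cannot name them.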


Intro to Classification and Clustering

In supervised learning, we are given a dataset for which we already know what the correct output should look like, and we know there is a relationship between the input and the output. Supervised learning tasks are subdivided into "regression" and "classification" problems: regression predicts a continuous value, while classification predicts a discrete class. Classification therefore falls under supervised learning.
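The split between regression and classification can be sketched with one hypothetical input (hours studied) and two different targets. The data, the names, and the choice of a nearest-centroid classifier are assumptions made for illustration only. Ordinary least squares returns a continuous score, while the classifier returns one of a fixed set of labels.

```python
from statistics import mean

# Hypothetical toy data: one input feature, two kinds of target.
hours = [1, 2, 3, 4, 5]
scores = [50, 60, 70, 80, 90]                       # continuous target -> regression
outcome = ["fail", "fail", "pass", "pass", "pass"]  # discrete target -> classification

def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def fit_centroids(xs, labels):
    """Classification: nearest-centroid classifier, storing the mean input of each class."""
    return {c: mean(x for x, lab in zip(xs, labels) if lab == c) for c in set(labels)}

def classify(x, centroids):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

a, b = fit_line(hours, scores)
print(a + b * 3.5)                                  # a number: 75.0
print(classify(4.5, fit_centroids(hours, outcome)))  # a label: pass
```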

Unsupervised learning lets us approach problems with little or no idea of what our results should look like. We can derive structure from data even when we don't know the effect of the variables, for example by clustering the data based on relationships among the variables. Clustering therefore falls under unsupervised learning.

Supervised Learning → Classification

Unsupervised Learning → Clustering

What is Classification?

Classification is used in supervised learning. It is the process of assigning each input to one of a set of predefined class labels. Because the labels are known in advance, the labeled data are split into training and test sets: the model is built on the training set and verified on the test set. Classification is more complicated than clustering. Many classification algorithms exist; logistic regression, the naive Bayes classifier, and the support vector machine are some of them.
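Below is a minimal sketch of the train/verify workflow, using logistic regression (one of the algorithms named above) written from scratch with batch gradient descent. The one-feature data set, the learning rate, the epoch count, and the helper names are all hypothetical choices for illustration; a real project would typically use a tested library rather than this hand-rolled version.

```python
import math

# Hypothetical 1-D data as (feature, class label), already split into train and test sets.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test = [(1.5, 0), (2.5, 0), (7.5, 1), (8.5, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, lr=0.5, epochs=1000):
    """Logistic regression trained by batch gradient descent on the log-loss."""
    n = len(data)
    center = sum(x for x, _ in data) / n  # center the feature using training data only
    w = b = 0.0
    for _ in range(epochs):
        # (prediction error, centered feature) for every training example
        errors = [(sigmoid(w * (x - center) + b) - y, x - center) for x, y in data]
        w -= lr * sum(e * xc for e, xc in errors) / n
        b -= lr * sum(e for e, _ in errors) / n
    return w, b, center

def predict(model, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    w, b, center = model
    return 1 if sigmoid(w * (x - center) + b) >= 0.5 else 0

model = fit_logistic(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The test set is never used during fitting; it only measures how well the trained model generalizes, which is the "verify" step described above.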


When a patient comes in with a tumor, we have to predict whether the tumor is malignant or benign. Classification can be used for this type of prediction.
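As an illustration of the tumor example, here is a sketch of a Gaussian naive Bayes classifier (another algorithm listed above) in plain Python. The two features ("size in mm" and an "irregularity score") and every number are invented toy values, not clinical data, and the function names are hypothetical. The model stores a prior plus a per-feature mean and variance for each class, then picks the class with the highest log posterior, treating features as independent (the "naive" assumption).

```python
import math
from collections import defaultdict

# Invented toy measurements (size_mm, irregularity_score); NOT clinical data.
training = [
    ((10.0, 1.0), "benign"), ((12.0, 1.2), "benign"),
    ((11.0, 0.8), "benign"), ((13.0, 1.1), "benign"),
    ((30.0, 3.0), "malignant"), ((28.0, 2.7), "malignant"),
    ((35.0, 3.4), "malignant"), ((32.0, 3.1), "malignant"),
]

def fit_gaussian_nb(data):
    """Per class: store the prior and a mean/variance for every feature."""
    by_class = defaultdict(list)
    for features, label in data:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # small constant keeps the variance positive
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(data), means, variances)
    return model

def predict_nb(model, features):
    """Return the class with the highest log prior + summed Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

model = fit_gaussian_nb(training)
print(predict_nb(model, (14.0, 1.3)))
print(predict_nb(model, (29.0, 2.9)))
```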

What is Clustering?

Clustering is used in unsupervised learning. It is the process of grouping instances based on their similarity, without the help of predefined class labels, so there is no need for labeled training and test sets. Compared with classification, it is less difficult. Many clustering algorithms exist; the k-means, fuzzy c-means, and Gaussian mixture (EM) clustering algorithms are some of them.
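The following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, stopping when the centroids no longer change. Initializing with the first `k` points is a simplifying assumption made here for determinism; practical implementations usually use random or k-means++ initialization. The point data are hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # new centroid = mean of assigned points (keep the old one if a cluster is empty)
        new_centroids = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D points; no class labels are given.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Unlike the classifiers above, nothing here is trained against known answers; the two groups emerge purely from distances between points.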


Dividing consumers into groups for market segmentation is an example of clustering. Consumers in a particular group are similar to each other with respect to some predefined set of characteristics. If two consumers differ in those characteristics, they are placed in different groups.
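For the segmentation example, a sketch of fuzzy c-means (also listed above) shows a variation: each customer gets a degree of membership in every segment, and taking the largest membership recovers the hard grouping just described. The customer features (annual spend in thousands, visits per month), the fuzziness exponent `m=2`, and seeding the centers with the first `c` customers are all assumptions made for this illustration.

```python
import math

def fuzzy_c_means(points, c, m=2.0, iterations=100, tol=1e-6):
    """Fuzzy c-means: alternate soft membership updates and weighted center updates."""
    centers = [list(p) for p in points[:c]]  # assumption: first c points as initial centers
    dims = len(points[0])
    for _ in range(iterations):
        # membership u[i] = 1 / sum_j (d_i / d_j)^(2/(m-1)); distances clamped away from zero
        memberships = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            memberships.append(
                [1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c)) for i in range(c)]
            )
        # each center is the mean of all points weighted by membership^m
        new_centers = []
        for i in range(c):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[k] for w, p in zip(weights, points)) / total for k in range(dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Hypothetical customers: (annual spend in $k, visits per month).
customers = [(2, 1), (20, 8), (3, 2), (2, 2), (22, 9), (21, 7)]
centers, memberships = fuzzy_c_means(customers, c=2)
for customer, u in zip(customers, memberships):
    print(customer, [round(x, 3) for x in u], "-> segment", u.index(max(u)))
```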
