Classification is one of the simplest forms of Machine Learning (ML), yet one of the most widely used. It is easy to learn, and it can be applied to many problems with few or no changes to the underlying architecture. Today we will learn about this useful model within the wide spectrum that is Machine Learning. Along with Regression, it is often called the Hello World of ML because it is one of the first models a beginner learns in their ML career. Today you are going to start it too!
Hey guys! This is Manas from csopensource.com, “Your one stop destination for everything computer science”. I am sorry that this post has come so late; I was very busy with school and have been learning about blockchain technology and the Solidity programming language. I will continue to post, but not as frequently as before. Sorry for the inconvenience! Anyway, back to the topic: we will learn quite a bit about Classification and also do a practical exercise with Python 3.
What is Classification?
Classification is the act of finding a boundary that separates two or more distinct groups of data. In two dimensions this boundary can be a line; in higher dimensions the flat equivalent is called a hyperplane. Once the algorithm has found the boundary, any new point (data) that is introduced is assigned to one of the groups (classes) depending on which side of the boundary it falls. This approach works best in scenarios where you want to categorize data according to clearly distinct features.
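To make the idea of a separating line concrete, here is a minimal sketch in Python. The weights w, the offset b, and the group names are hypothetical values chosen for illustration; a real classifier would learn them from training data.

```python
# A hand-written linear decision rule in two dimensions.
# The line is defined by w[0]*x + w[1]*y + b = 0.
# NOTE: w, b and the group names are made-up illustration values,
# not parameters learned from any dataset.

def classify_point(point, w=(1.0, -1.0), b=0.0):
    """Return the group for a 2-D point based on its side of the line."""
    x, y = point
    score = w[0] * x + w[1] * y + b
    return "group A" if score > 0 else "group B"


print(classify_point((3.0, 1.0)))  # below the line y = x -> group A
print(classify_point((1.0, 3.0)))  # above the line y = x -> group B
```

With these default values the line is y = x, so the rule only checks whether a point lies below or above it.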
Suppose you have 2 fruits, let’s say an apple and an orange. You know how to tell them apart, but does the computer? Of course not! Let us separate the fruits based on their features (data): an apple has a smooth texture, but an orange usually has a bumpy texture. Based on this, we can predict which group a fruit belongs to. This is the act of classification in a nutshell.
DEFINITION OF CLASSIFICATION [WIKIPEDIA]:
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
- Features: Features are the characteristics of an object that we measure; in short, they are the input data. For a fruit classifier, the weight of the fruit and its texture are the features.
- Labels: Labels are used only in supervised forms of ML. A label is the known answer attached to an example. Suppose we have a fruit that weighs 110 grams and has a smooth texture, and it is an apple; we give the model the label ‘Apple’ for that example. This helps in the training process, where the model learns the connection between the features and the labels.
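As a rough illustration of how features and labels fit together, the sketch below stores a few fruits as (weight, texture) pairs with matching labels, then classifies a new fruit by copying the label of its closest training example. Only the 110 gram smooth apple comes from the text above; the other measurements, the 0/1 encoding of texture, and the function name predict_fruit are assumptions made for the example.

```python
import math

# Each feature tuple is (weight in grams, texture), with texture encoded as
# 0 = smooth and 1 = bumpy. Apart from the 110 g smooth apple, the values
# are hypothetical illustration data.
features = [(110, 0), (120, 0), (150, 1), (160, 1)]
labels = ["Apple", "Apple", "Orange", "Orange"]


def predict_fruit(new_fruit):
    """Return the label of the training example nearest to new_fruit."""
    distances = [math.dist(new_fruit, known) for known in features]
    nearest_index = distances.index(min(distances))
    return labels[nearest_index]


print(predict_fruit((115, 0)))  # a light, smooth fruit -> Apple
print(predict_fruit((155, 1)))  # a heavier, bumpy fruit -> Orange
```

Because weight is measured in grams and texture is only 0 or 1, weight dominates the distance here; in practice features are often rescaled so that each one contributes comparably.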
In short, Classification is the process of predicting which class a given piece of data belongs to. This ability is vital in the ML space, so classification finds applications in many areas, such as:
- Disease prediction: breast cancer prediction, heart disease prediction, etc.
- Video grouping on YouTube: once a creator uploads their video, YouTube categorizes it into its specific genre using tags and specific Search Engine Optimization techniques.
- Image Classification
- Voice Classification
- E-mail spam filtering
This is just the tip of the iceberg. Classification is much more than this, but we have now covered the core fundamentals of the classification model. All the fields under ML are vast and constantly under development, so covering everything in one article is not feasible. Anyway, what good is learning about a model without implementing it in code? So let us do that right now! We are going to write a simple classifier using Python 3 to solve the Iris Flower Classification Problem! The task is to determine which species of the Iris family a given flower belongs to, using its measurements. I will be providing a small snippet of code showing the implementation of the Iris classification problem solved in Python 3.
There are a few things to do before starting:
- Make sure you have a working installation of Python 3 on your system.
- Install the required modules using pip, with these commands in cmd:
- pip install numpy
- pip install pandas
- pip install scikit-learn
- Download the dataset. Link Here.
- Right-click the ‘iris.data’ file on the webpage, click ‘Save link as’, and save it to your working directory.
Now, you are ready to go! So let us dive into the world of Machine Learning in 3…2…1
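A minimal sketch of such a classifier is given here, assuming the downloaded iris.data file is a headerless CSV with four measurements followed by the species name. The column names, the function name predict_species, and the n_neighbors parameter are choices made for this sketch, so it is not necessarily line-for-line the script discussed in this post.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Column names are an assumption: the raw iris.data file has no header row.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
COLUMNS = FEATURES + ["species"]


def predict_species(source, test_array, n_neighbors=5):
    """Train a k-nearest-neighbours model on the CSV and classify one flower.

    source: path to iris.data, or a file-like object with the same layout.
    test_array: four measurements in the order given by FEATURES.
    """
    data = pd.read_csv(source, header=None, names=COLUMNS)
    X = data[FEATURES].to_numpy()   # features: the four measurements
    y = data["species"].to_numpy()  # labels: the species names

    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X, y)

    # predict() expects a 2-D array: one row per flower to classify.
    return str(model.predict([test_array])[0])
```

With the file saved in your working directory, calling print(predict_species("iris.data", test_array)) with a test_array of four measurements trains on the full dataset and prints the predicted species.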
Here, we can see that our model has predicted that the flower with the attributes given in the test_array is of the Iris-setosa species. This is a very simple problem, which is why I referred to it as the ‘Hello World’ of Classification. Can you believe it? We have written our first classifier in just 16 lines of Python. That is insane!
Classification is just the grouping of data according to the features it possesses. We coded a simple classification problem in just 16 lines of Python! We used the neighbors module of scikit-learn, which includes the classifier KNeighborsClassifier(). However, this is not the only classifier that exists. Others such as svm.SVC() and DecisionTreeClassifier() are also available, but we chose KNeighborsClassifier() because it is very simple and convenient to use. In short, Classification is one of the easiest forms of an ML model and is a stepping stone towards bigger and more complex models such as Restricted Boltzmann Machines and autoencoders.
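If you want to try the other classifiers mentioned above, the sketch below shows one possible way to swap them in, since they share the same fit and score methods. It assumes the copy of the Iris data bundled with scikit-learn rather than the downloaded file, and the 70/30 train/test split and random_state values are arbitrary choices for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: use the Iris data bundled with scikit-learn so the example
# runs without the downloaded iris.data file.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

classifiers = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVC": SVC(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

accuracies = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)                     # same training call for each
    accuracies[name] = model.score(X_test, y_test)  # fraction classified correctly
    print(f"{name}: {accuracies[name]:.2f}")
```

Each model is scored on flowers it did not see during training, which gives a fairer picture than checking a single hand-picked example.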
Hope you enjoyed this small introduction and tutorial for Classification. Sorry for the long delay! Thank you so much for reading this and have a nice day 🙂