Simple Tutorial on SVM and Parameter Tuning in Python and R

Introduction

Data classification is a very important task in machine learning. Support Vector Machines (SVMs) are widely applied in the fields of pattern classification and nonlinear regression. The original form of the SVM algorithm was introduced by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963. Since then, SVMs have evolved tremendously and are now used successfully in many real-world problems such as text (and hypertext) categorization, image classification, bioinformatics (protein classification, cancer classification), handwritten character recognition, etc.

Table of Contents

  1. What is a Support Vector Machine?
  2. How does it work?
  3. Derivation of SVM Equations
  4. Pros and Cons of SVMs
  5. Python and R implementation

What is a Support Vector Machine (SVM)?

A Support Vector Machine is a supervised machine learning algorithm which can be used for both classification and regression problems. It follows a technique called the kernel trick to transform the data and based on these transformations, it finds an optimal boundary between the possible outputs.

In simple words, it does some extremely complex data transformations to figure out how to separate the data based on the labels or outputs defined. We will be looking only at the SVM classification algorithm in this article.

[Figure: Support Vector Machine classification]

How does it work?

The main idea is to identify the optimal separating hyperplane which maximizes the margin of the training data. Let us understand this objective term by term.

What is a separating hyperplane?

We can see that it is possible to separate the data given in the plot above. For instance, we can draw a line in which all the points above the line are green and the ones below the line are red. Such a line is said to be a separating hyperplane.

Now, the obvious confusion: why is it called a hyperplane if it is a line?

In the diagram above, we have considered the simplest of examples, i.e., the dataset lies in the 2-dimensional plane (R²). But the support vector machine can work for a general n-dimensional dataset too. And in the case of higher dimensions, the hyperplane is the generalization of a plane.

More formally, it is an (n−1)-dimensional subspace of an n-dimensional Euclidean space. So for a

  • 1D dataset, a single point represents the hyperplane.
  • 2D dataset, a line is a hyperplane.
  • 3D dataset, a plane is a hyperplane.
  • And in the higher dimension, it is called a hyperplane.

We have said that the objective of an SVM is to find the optimal separating hyperplane. When is a separating hyperplane said to be optimal?

The fact that there exists a hyperplane separating the dataset doesn’t mean that it is the best one.

Let us understand the optimal hyperplane through a set of diagrams.

  1. Multiple hyperplanes
    There are multiple hyperplanes, but which one of them is a separating hyperplane? It can easily be seen that line B is the one that best separates the two classes.
[Figure: multiple hyperplanes]
  2. Multiple separating hyperplanes
    There can be multiple separating hyperplanes as well. How do we find the optimal one? Intuitively, if we select a hyperplane that is close to the data points of one class, then it might not generalize well. So the aim is to choose the hyperplane that is as far as possible from the data points of each category.
[Figure: multiple separating hyperplanes]
  3. In the diagram above, the hyperplane that meets the specified criteria for the optimal hyperplane is B.

Therefore, maximizing the distance between the nearest points of each class and the hyperplane would result in an optimal separating hyperplane. This distance is called the margin.

The goal of SVMs is to find the optimal hyperplane because it not only classifies the existing dataset but also helps predict the class of the unseen data. And the optimal hyperplane is the one which has the biggest margin.

Optimal hyperplane SVM

Mathematical Setup

Now that we have understood the basic setup of this algorithm, let us dive straight into the mathematical technicalities of SVMs.

I will be assuming you are familiar with basic mathematical concepts such as vectors, vector arithmetic (addition, subtraction, dot product), and orthogonal projection. Some of these concepts can also be found in the article Prerequisites of linear algebra for machine learning.

Equation of Hyperplane

You must have come across the equation of a straight line as y = mx + c, where m is the slope and c is the y-intercept of the line.

The generalized equation of a hyperplane is as follows:

wᵀx = 0

Here, w and x are vectors, and wᵀx represents the dot product of the two. The vector w is often called the weight vector.

Consider the equation of the line as y − mx − c = 0. In this case,

w = (−c, −m, 1)ᵀ and x = (1, x, y)ᵀ

wᵀx = −c × 1 − m × x + 1 × y = y − mx − c = 0

It is just two different ways of representing the same thing. So why do we use wᵀx = 0? Simply because it is easier to deal with this representation in the case of higher-dimensional datasets, and because w represents the vector which is normal to the hyperplane. This property will be useful once we start computing the distance from a point to the hyperplane.
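
To make this concrete, here is a small NumPy sketch (a toy example built around the line y = 2x + 1): it checks that a point satisfies wᵀx = 0 in the augmented representation above and uses the normal part of w to compute the distance from a point to the hyperplane:

import numpy as np

# For y = 2x + 1, i.e., y - 2x - 1 = 0: w = (-c, -m, 1) with m = 2, c = 1
w = np.array([-1.0, -2.0, 1.0])
p = np.array([1.0, 3.0, 7.0])       # the point (3, 7), augmented with a leading 1
print(w @ p)                        # 0.0, so (3, 7) lies on the line

# The distance from a point to the hyperplane uses only the normal part (-m, 1)
q = np.array([1.0, 0.0, 0.0])       # the origin, augmented
print(abs(w @ q) / np.linalg.norm(w[1:]))  # |y - mx - c| / ||(-m, 1)|| = 1/sqrt(5)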


Understanding the constraints

The training data in our classification problem is of the form {(x1, y1), (x2, y2), …, (xn, yn)} ∈ Rⁿ × {−1, 1}. That is, the training dataset consists of pairs of xi, an n-dimensional feature vector, and yi, the label of xi. yi = 1 implies that the sample with feature vector xi belongs to class 1, and yi = −1 implies that the sample belongs to class −1.

In a classification problem, we thus try to find a function y = f(x): Rⁿ ⟶ {−1, 1}. f(x) learns from the training dataset and then applies its knowledge to classify unseen data.

An infinite number of such functions f(x) can exist, so we have to restrict the class of functions that we are dealing with. In the case of SVMs, this class of functions is that of the hyperplane, represented as wᵀx = 0.

It can also be represented as w·x + b = 0, where w ∈ Rⁿ and b ∈ R.

This divides the input space into two parts, one containing vectors of class −1 and the other containing vectors of class +1.

For the rest of this article, we will consider 2-dimensional vectors. Let H0 be a hyperplane separating the dataset and satisfying the following:

w·x + b = 0

Along with H0, we can select two other hyperplanes H1 and H2 such that they also separate the data and have the following equations:

w·x + b = δ and w·x + b = −δ

This makes H0 equidistant from H1 as well as H2.

The variable δ is not necessary, so we can set δ = 1 to simplify the problem: w·x + b = 1 and w·x + b = −1.

Next, we want to ensure that there is no point between them. So for this, we will select only those hyperplanes which satisfy the following constraints:

For every vector xi, either:

  1. w·xi + b ≤ −1 for xi having the class −1, or
  2. w·xi + b ≥ 1 for xi having the class 1

[Figure: the margin hyperplanes and the constraints]

Combining the constraints

Both the constraints stated above can be combined into a single constraint.

Constraint 1:

For xi having the class −1: w·xi + b ≤ −1.
Multiplying both sides by yi (which is always −1 for this equation) flips the inequality:
yi(w·xi + b) ≥ yi(−1), which implies yi(w·xi + b) ≥ 1 for xi having the class −1.

Constraint 2: yi = 1

yi(w·xi + b) ≥ 1 for xi having the class 1

Combining both of the above equations, we get yi(w·xi + b) ≥ 1 for all 1 ≤ i ≤ n

This leads to a unique constraint instead of two which are mathematically equivalent. The combined new constraint also has the same effect, i.e., no points between the two hyperplanes.
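
To see the combined constraint in action, here is a quick NumPy check with a hypothetical hyperplane w = (1, −1), b = 0 and four toy points (both the hyperplane and the points are made up for illustration); every margin yi(w·xi + b) comes out ≥ 1, i.e., no point falls between the two hyperplanes:

import numpy as np

w, b = np.array([1.0, -1.0]), 0.0     # a hypothetical separating hyperplane
X = np.array([[2.0, 0.5], [3.0, 1.0], [0.5, 2.0], [1.0, 3.0]])
y = np.array([1, 1, -1, -1])

margins = y * (X @ w + b)             # yi(w.xi + b) for every point
print(margins)                        # [1.5 2.  1.5 2. ]
print(np.all(margins >= 1))           # True: the combined constraint holds for all i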

Maximize the margin

For the sake of simplicity, we will skip the derivation of the formula for calculating the margin, m, which is

m = 2 / ||w||

The only variable in this formula is w, which is inversely proportional to m; hence, to maximize the margin, we have to minimize ||w||. This leads to the following optimization problem:

Minimize over (w, b): ||w||² / 2, subject to yi(w·xi + b) ≥ 1 for any i = 1, …, n

The above is the case when our data is linearly separable. There are many cases where the data cannot be perfectly classified through linear separation. In such cases, the Support Vector Machine looks for the hyperplane that maximizes the margin and minimizes the misclassifications.

For this, we introduce the slack variable, ζi, which allows some objects to fall off the margin but penalizes them.

[Figure: slack variables]

In this scenario, the algorithm tries to keep the slack variables at zero while maximizing the margin. However, it minimizes the sum of the distances of the misclassified points from the margin hyperplanes, not the number of misclassifications.

The constraints now change to yi(w·xi + b) ≥ 1 − ζi for all 1 ≤ i ≤ n, ζi ≥ 0

and the optimization problem changes to

Minimize over (w, b): ||w||² / 2 + C ∑i ζi, subject to yi(w·xi + b) ≥ 1 − ζi for any i = 1, …, n

Here, the parameter C is the regularization parameter that controls the trade-off between the slack-variable penalty (misclassifications) and the width of the margin.

  • A small C makes the constraints easy to ignore, which leads to a large margin.
  • A large C makes the constraints hard to ignore, which leads to a small margin.
  • For C = ∞, all the constraints are enforced.
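
A rough way to observe this trade-off in practice is the sketch below (illustrative only; the synthetic blob parameters are arbitrary). With a linear kernel, a smaller C gives a wider margin, so more training points end up on or inside the margin as support vectors:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping clusters of labeled points
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)

for C in (0.01, 1, 100):
    model = SVC(kernel="linear", C=C).fit(X, y)
    # Number of support vectors per class; this typically shrinks as C grows
    print(C, model.n_support_)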

The easiest way to separate two classes of data is a line in the case of 2D data and a plane in the case of 3D data. But it is not always possible to use lines or planes, and one requires a nonlinear region to separate these classes. Support Vector Machines handle such situations by using a kernel function which maps the data to a different space where a linear hyperplane can be used to separate classes. This is known as the kernel trick, where the kernel function transforms the data into a higher-dimensional feature space so that a linear separation becomes possible.

[Figure: the kernel trick]
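
As a minimal illustration of such a mapping (a toy example with a hand-picked feature map, rather than a kernel chosen by the SVM), 1-dimensional data that no single threshold can separate becomes linearly separable after mapping x to (x, x²):

import numpy as np

# Class -1 sits near the origin, class +1 farther out on both sides
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

phi = np.c_[x, x ** 2]   # phi(x) = (x, x^2)
print(phi)               # in this 2-D space, the horizontal line x2 = 2 separates the classes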

If ϕ is the mapping function which maps xi to ϕ(xi), the constraints change to yi(w·ϕ(xi) + b) ≥ 1 − ζi for all 1 ≤ i ≤ n, ζi ≥ 0

And the optimization problem is

Minimize over (w, b): ||w||² / 2 + C ∑i ζi, subject to yi(w·ϕ(xi) + b) ≥ 1 − ζi for all 1 ≤ i ≤ n, ζi ≥ 0

We will not get into the solution of these optimization problems. The most common method used to solve these optimization problems is Convex Optimization.

Pros and Cons of Support Vector Machines

Every classification algorithm has its own advantages and disadvantages that come into play depending on the dataset being analyzed. Some of the advantages of SVMs are as follows:

  • The very nature of the convex optimization method ensures guaranteed optimality: the solution is guaranteed to be a global minimum, not a local minimum.
  • SVM is an algorithm suitable for both linearly and nonlinearly separable data (using the kernel trick). The only thing to do is to come up with the regularization term, C.
  • SVMs work well on small as well as high-dimensional data spaces. They are effective for high-dimensional datasets because the complexity of the trained classifier is characterized by the number of support vectors rather than by the dimensionality of the data. Even if all other training examples are removed and the training is repeated, we will get the same optimal separating hyperplane.
  • SVMs can work effectively on smaller training datasets as they do not rely on the entire data.

Disadvantages of SVMs are as follows:

  • They are not suitable for larger datasets because the training time with SVMs can be high and much more computationally intensive.
  • They are less effective on noisier datasets with overlapping classes.

SVM with Python and R

Let us look at the libraries and functions used to implement SVM in Python and R.

Python Implementation

The most widely used library for implementing machine learning algorithms in Python is scikit-learn. The class used for SVM classification in scikit-learn is svm.SVC().

sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto')

Parameters are as follows:

  • C: It is the regularization parameter, C, of the error term.
  • kernel: It specifies the kernel type to be used in the algorithm. It can be ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’, or a callable. The default value is ‘rbf’.
  • degree: It is the degree of the polynomial kernel function (‘poly’) and is ignored by all other kernels. The default value is 3.
  • gamma: It is the kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’. If gamma is ‘auto’, then 1/n_features will be used instead.

There are many advanced parameters too, which I have not discussed here. You can check them out here.

https://gist.github.com/HackerEarthBlog/07492b3da67a2eb0ee8308da60bf40d9
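
For reference, here is a minimal sketch of a basic svm.SVC() fit (the iris dataset is used as a stand-in; the embedded gist above may differ):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# The parameters shown are the defaults discussed above
clf = SVC(C=1.0, kernel="rbf", degree=3, gamma="auto")
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out test set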

One can tune the SVM by changing the parameters C, γ, and the kernel function. The function for tuning the parameters available in scikit-learn is called GridSearchCV().

sklearn.model_selection.GridSearchCV(estimator, param_grid)

Parameters of this function are defined as:

  • estimator: It is the estimator object, which is svm.SVC() in our case.
  • param_grid: It is a dictionary or list with parameter names (strings) as keys and lists of parameter settings to try as values.

To know more about the other parameters of GridSearchCV(), click here.

https://gist.github.com/HackerEarthBlog/a84a446810494d4ca0c178e864ab2391
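
Similarly, here is a minimal sketch of parameter tuning with GridSearchCV() (the candidate values are illustrative; the embedded gist above may differ):

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = datasets.load_iris()

# Candidate values to try for each parameter (illustrative choices)
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}

# GridSearchCV tries every combination using cross-validation
search = GridSearchCV(SVC(), param_grid)
search.fit(iris.data, iris.target)
print(search.best_params_, search.best_score_)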

In the above code, the parameters we have considered for tuning are kernel, C, and gamma. The candidate values from which the best value is chosen are the ones listed in brackets. Here we have given only a few values to consider, but a whole range of values can be given for tuning; it will, however, take longer to execute.

R Implementation

The package that we will use for implementing the SVM algorithm in R is e1071. The function used will be svm().

https://gist.github.com/HackerEarthBlog/0336338c5d93dc3d724a8edb67ad0a05

Summary

In this article, I have gone through a very basic explanation of the SVM classification algorithm. I have left out a few mathematical complications, such as calculating distances and solving the optimization problem. But I hope this gives you enough know-how about how a machine learning algorithm like SVM can be modified based on the type of dataset provided.

Introduction to Naive Bayes Classification Algorithm in Python and R

Let's say you are given a fruit which is yellow, sweet, and long, and you have to determine the class to which it belongs.

Step 2: Draw the likelihood table for the features against the classes.
Name     Yellow                       Sweet     Long      Total
Mango    350/800 = P(Mango|Yellow)    450/850   0/400     650/1200 = P(Mango)
Banana   400/800                      300/850   350/400   400/1200
Others   50/800                       100/850   50/400    150/1200
Total    800 = P(Yellow)              850       400       1200
Step 3: Calculate the conditional probabilities for all the classes, i.e., the following in our example: [latex]P(\text{Mango}|\text{Yellow, Sweet, Long})[/latex], [latex]P(\text{Banana}|\text{Yellow, Sweet, Long})[/latex], and [latex]P(\text{Others}|\text{Yellow, Sweet, Long})[/latex], each proportional to [latex]P(C_i)\prod_j P(x_j|C_i)[/latex].
Step 4: Calculate [latex]\displaystyle\max_{i}{P(C_i|x_1, x_2,\ldots, x_n)}[/latex]. In our example, the maximum probability is for the class banana; therefore, the fruit which is long, sweet, and yellow is a banana by the Naive Bayes algorithm. In a nutshell, we say that a new element will belong to the class which has the maximum conditional probability described above.
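
To make the counting explicit, here is a small Python sketch of the same calculation (using the counts behind the likelihood table above; since the evidence terms P(x1)…P(xn) are the same for every class, they are dropped when picking the maximum):

# Counts from the likelihood table: per class, how many fruits had each feature
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 150},
}
total = 1200

def posterior_score(cls, features):
    # P(C) * product over features of P(x_j|C), with P(x_j|C) estimated
    # as count(x_j and C) / count(C)
    prior = counts[cls]["Total"] / total
    likelihood = 1.0
    for f in features:
        likelihood *= counts[cls][f] / counts[cls]["Total"]
    return prior * likelihood

scores = {c: posterior_score(c, ["Yellow", "Sweet", "Long"]) for c in counts}
print(max(scores, key=scores.get))  # Banana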

Variations of the Naive Bayes algorithm

There are multiple variations of the Naive Bayes algorithm depending on the distribution of [latex]P(x_j|C_i)[/latex]. Three of the commonly used variations are
  1. Gaussian: The Gaussian Naive Bayes algorithm assumes distribution of features to be Gaussian or normal, i.e.,
    [latex]\displaystyle P(x_j|C_i)=\frac{1}{\sqrt{2\pi\sigma_{C_i}^2}}\exp{\left(-\frac{(x_j-\mu_{C_i})^2}{2\sigma_{C_i}^2}\right)}[/latex]
    Read more about it here.
  2. Multinomial: The Multinomial Naive Bayes algorithm is used when the data is distributed multinomially, i.e., multiple occurrences matter a lot. You can read more here.
  3. Bernoulli: The Bernoulli algorithm is used when the features in the data set are binary-valued. It is helpful in spam filtration and adult content detection techniques. For more details, click here.

Pros and Cons of Naive Bayes algorithm

Every coin has two sides. So does the Naive Bayes algorithm. It has advantages as well as disadvantages, and they are listed below:

Pros

  • It is a relatively easy algorithm to build and understand.
  • It is faster to predict classes using this algorithm than many other classification algorithms.
  • It can be easily trained using a small data set.

Cons

  • If a given class and a feature have 0 frequency, then the conditional probability estimate for that category will come out as 0. This problem is known as the "Zero Conditional Probability Problem." This is a problem because it wipes out all the information in other probabilities too. There are several sample correction techniques to fix this problem such as "Laplacian Correction."
  • Another disadvantage is the very strong assumption of independence among the features given the class. It is nearly impossible to find such data sets in real life.

Naive Bayes with Python and R

Let us see how we can build the basic model using the Naive Bayes algorithm in R and in Python.

R Code

To start training a Naive Bayes classifier in R, we need to load the e1071 package.
library(e1071)
To split the data set into training and test data we will use the caTools package.
library(caTools)

The predefined function used for the implementation of Naive Bayes in R is called naiveBayes(). There are only a few parameters that are of use:
naiveBayes(formula, data, laplace = 0, subset, na.action = na.pass)
  • formula: The traditional formula [latex]Y\sim X_1+X_2+\ldots+X_n[/latex]
  • data: The data frame containing numeric or factor variables
  • laplace: Provides a smoothing effect
  • subset: Helps in using only a selection subset of the data based on some Boolean filter
  • na.action: Helps in determining what is to be done when a missing value in the data set is encountered
Let us take the example of the iris data set.
> library(e1071)
> library(caTools)

> data(iris)

# By using sample.split() we create a vector with values TRUE and FALSE; by setting
# SplitRatio to 0.7, we split the original iris dataset of 150 rows into 70% training
# and 30% testing data.
> iris$spl = sample.split(iris, SplitRatio = 0.7)

> train = subset(iris, iris$spl == TRUE)   # the subset of iris for which spl == TRUE
> test = subset(iris, iris$spl == FALSE)

> nB_model <- naiveBayes(train[, 1:4], train[, 5])

> table(predict(nB_model, test[, -5]), test[, 5])   # returns the confusion matrix

             setosa versicolor virginica
  setosa         17          0         0
  versicolor      0         17         2
  virginica       0          0        14

Python Code

We will use the Python library scikit-learn to build the Naive Bayes algorithm.
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.naive_bayes import MultinomialNB
>>> from sklearn import datasets
>>> from sklearn.metrics import confusion_matrix
>>> from sklearn.model_selection import train_test_split

>>> iris = datasets.load_iris()
>>> X = iris.data
>>> y = iris.target

# Split the data into a training set and a test set
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

>>> gnb = GaussianNB()
>>> mnb = MultinomialNB()

>>> y_pred_gnb = gnb.fit(X_train, y_train).predict(X_test)
>>> cnf_matrix_gnb = confusion_matrix(y_test, y_pred_gnb)

>>> print(cnf_matrix_gnb)
[[16  0  0]
 [ 0 18  0]
 [ 0  0 11]]

>>> y_pred_mnb = mnb.fit(X_train, y_train).predict(X_test)
>>> cnf_matrix_mnb = confusion_matrix(y_test, y_pred_mnb)

>>> print(cnf_matrix_mnb)
[[16  0  0]
 [ 0  0 18]
 [ 0  0 11]]

Applications

The Naive Bayes algorithm is used in multiple real-life scenarios such as
  1. Text classification: It is used as a probabilistic learning method for text classification. The Naive Bayes classifier is one of the most successful algorithms for classifying text documents, i.e., deciding whether a text document belongs to one or more categories (classes).
  2. Spam filtration: It is an example of text classification. This has become a popular mechanism to distinguish spam email from legitimate email. Several modern email services implement Bayesian spam filtering.
    Many server-side email filters, such as DSPAM, SpamBayes, SpamAssassin, Bogofilter, and ASSP, use this technique.
  3. Sentiment Analysis: It can be used to analyze the tone of tweets, comments, and reviews—whether they are negative, positive or neutral.
  4. Recommendation System: The Naive Bayes algorithm in combination with collaborative filtering is used to build hybrid recommendation systems which help in predicting if a user would like a given resource or not.

Conclusion

This article is a simple explanation of the Naive Bayes classification algorithm, with an easy-to-understand example and a few technicalities. Despite all the complicated math, the implementation of the Naive Bayes algorithm involves simply counting the number of objects with specific features and classes. Once these numbers are obtained, it is very simple to calculate the probabilities and arrive at a conclusion. I hope you are now familiar with this machine learning concept that you most likely had heard of before.

Top 17 Competitive Data Scientists From India on Kaggle

"Data Scientist: Sexiest Job of the 21st century" – Harvard Business Review, 2012

In more recent times, Glassdoor named it the "best job of the year" for 2016.

Where did the title "Data Scientist" come from?

The title was coined in 2008 by Dr. Dhanurjay Patil, Chief Data Scientist at the White House Office of Science and Technology Policy, and Jeff Hammerbacher, Chief Scientist at Cloudera. Since then, data science has evolved from a niche field to a core business function and one of the most sought-after careers globally.

For those aspiring to become data scientists in India, here's a curated list of top-ranking professionals on Kaggle, highlighting their education, roles, and key achievements.

Bishwarup Bhattacharjee (Current Rank: 9)

Bishwarup Bhattacharjee

Bishwarup holds a bachelor's degree in Statistics from the University of Calcutta and is currently a Senior Business Analyst at VMware. He has previously consulted for multiple companies and co-founded Alphinite Analytics.

Achievements:

  1. 1st place – Allstate Claims Severity Challenge
  2. 2nd place – BNP Paribas Cardif Claim Challenge
  3. 4th place – Santander Customer Satisfaction Challenge

Abhishek Thakur (Current Rank: 19)

Abhishek Thakur

Abhishek holds a B.Tech. from NIT Surat and a Master's from the University of Bonn. He has served in several data science roles and is now Principal Data Scientist at ProductsUp.

Achievements:

  1. Gold medal – Springleaf Marketing Response Challenge (Rank 2)
  2. Rank 3 – How Much Did It Rain? Challenge
  3. Rank 3 – Otto Group Product Classification Challenge

Sudalai Rajkumar (SRK) (Current Rank: 26)

Sudalai Rajkumar

SRK, Lead Data Scientist at Freshdesk, is an experienced problem solver with a background in engineering and analytics. He’s also a top solver on CrowdANALYTIX.

Achievements:

  1. Gold medal – How Much Did It Rain? Challenge (Rank 2)
  2. Rank 5 – EEG Detection Challenge
  3. Rank 6 – Taxi Trajectory Prediction Challenge

Thakur Raj Anand (Current Rank: 45)

Thakur Raj Anand

Thakur is a Data Scientist at DataRobot. He completed his post-graduation in Applied Quantitative Finance from Madras School Of Economics and has held various roles in analytics and consulting.

Achievements:

  1. Rank 6 – Bosch Production Line Performance Challenge
  2. Gold medal – Homesite Quote Conversion Challenge
  3. Rank 9 – Avito Duplicate Ads Detection Challenge

These accomplished professionals serve as an inspiration for aspiring data scientists in India and beyond. Their contributions reflect the power of data-driven problem solving and continuous learning in a competitive global landscape.

Explaining The Basics of Machine Learning, Algorithms and Applications

“Data is abundant and cheap but knowledge is scarce and expensive.”

In the last few years, the sources of data capture have evolved overwhelmingly. Companies no longer limit themselves to surveys, questionnaires, and other traditional forms of data collection. Smartphones, online browsing activity, drones, and cameras are the modern data collection devices. And, believe me, that data is enormous.

There is no way a human can look at such huge amounts of data and make sense of it. Even if it were possible, the process would be highly prone to error. Is there a way out? Yes. Machine learning has enabled humans to make intelligent real-life decisions with relatively fewer errors.

Have a look at the exciting ~4-minute video below. It gives an idea of how machine learning is making computers, and many of the things we use like maps, search, video recommendations, translations, etc., better.

At the end of this article, you will be familiar with the basic concepts of machine learning, the types of machine learning, its applications, and a lot more. Let us begin by addressing the elephant in the room.

What is Machine Learning (ML)?

Search engines (Google, Bing, DuckDuckGo) have become the new knowledge discovery platforms. They have answers (probably accurate) to almost every silly question you can think of. But how did they become so intelligent? Think about it!

In the meantime, let us first look at a few definitions of machine learning. The term “machine learning” was coined by Arthur Samuel in 1959. According to him,

+ "Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed."

Tom M. Mitchell provided a more formal definition, which says,

+ "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."

In simple words, machine learning is a set of techniques used to program computers to make decisions automatically. How does it make decisions? It makes decisions by detecting (or learning) patterns in past data and generalizing them to future data. These decisions can take different forms, such as predictions of house prices, the weather, or customer behavior, or classifications, like whether a spoken word in a recording is "world" or whether a photograph contains a face. To enhance the process of detecting these patterns and improving decision-making, one can make use of data simulation.

An ideal example of the practical use of machine learning is email spam filters. Services like Google, Yahoo, and Hotmail use machine learning to detect whether an email is spam or not. Furthermore, there are numerous other applications as well, which we'll look at later in this article.

+ “True loneliness is when you don’t even receive spam emails.”

What are the different types of ML algorithms?

There are so many types of ML algorithms and techniques that one can easily get lost. Therefore, for better understanding, they have been divided into 3 major categories. The following is a list of the different categories and types of machine learning algorithms:

[Figure: types of machine learning]

1. Supervised Learning

It is one of the most commonly used types of machine learning algorithms. In these algorithms, we have input and output variables, and the algorithm generates a function that predicts the output based on the given input variables. It is called 'supervised' because the algorithm learns in a supervised fashion, with a given target variable. This learning process iterates over the training data until the model achieves an acceptable level of performance. Supervised learning problems can be further divided into two parts:

  • Regression: A supervised problem is said to be a regression problem when the output variable is a continuous value such as “weight”, “height”, or “dollars.”
  • Classification: It is said to be a classification problem when the output variable is a discrete value (or category) such as “male” and “female” or “disease” and “no disease.”

A real-life application of supervised machine learning is the recommendation systems used by Amazon, Google, Facebook, Netflix, YouTube, etc. Another example of supervised machine learning is fraud detection. Let's say a sample of records is collected and manually classified as “fraudulent” or “non-fraudulent”. These manually classified records are then used to train a supervised machine learning algorithm, which can later be used to predict fraud, as the sketch below illustrates. Some examples of supervised algorithms include Linear Regression, Decision Trees, Random Forest, k-nearest neighbours, SVM, Gradient Boosting Machines (GBM), Neural Networks, etc.
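
To make the idea concrete, here is a minimal supervised-classification sketch (illustrative; scikit-learn and the iris dataset stand in for any labeled data):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: inputs X with known outputs y
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Learn a function from the labeled training data...
model = DecisionTreeClassifier().fit(X_train, y_train)

# ...then predict the output for unseen inputs
print(model.predict(X_test[:5]))
print(model.score(X_test, y_test))  # accuracy against the known labels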

2. Unsupervised Learning

In unsupervised machine learning algorithms, we only have input data and no corresponding output variable. The aim of these algorithms is to model the underlying structure or distribution of the dataset so that we can learn more about the data. They are called 'unsupervised' because, unlike in supervised learning, there is no teacher and there are no correct answers; the algorithms are left to their own devices to discover and present the structure in the data. Like supervised learning problems, unsupervised learning problems can also be divided into two groups, namely cluster analysis and association.

  • Cluster analysis: A cluster analysis problem is one where we want to discover the inherent groupings in the data.
  • Association: An association rule learning problem is one where we want to discover interesting relationships between variables in the dataset.

In marketing, unsupervised machine learning algorithms can be used to segment customers according to their similarities, which in turn helps with targeted marketing; the sketch below shows the idea. Some examples of unsupervised learning algorithms are k-means clustering, hierarchical clustering, PCA, the Apriori algorithm, etc.
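
As a concrete sketch (illustrative; k-means on synthetic data), note that the algorithm receives only the inputs and discovers the groupings on its own:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: inputs only, no target variable
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # discovered group assignment for each point
print(kmeans.cluster_centers_)   # the three cluster centers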

3. Reinforcement Learning

In reinforcement learning, the machine is trained to act given an observation or to make specific decisions; it learns by interacting with an environment. The machine learns from the repercussions of its actions rather than from being explicitly taught. It is essentially trial-and-error learning, where the machine selects its actions on the basis of its past experiences and new choices, and tries to capture the best possible knowledge to make accurate decisions. An example of a reinforcement learning formulation is the Markov Decision Process.
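
As a toy sketch of this trial-and-error process (an epsilon-greedy bandit, much simpler than a full Markov Decision Process), the learner below discovers which of two actions pays off more purely from the rewards its own actions produce:

import random

true_reward_prob = [0.3, 0.7]          # hidden from the learner
estimates, counts = [0.0, 0.0], [0, 0]

for step in range(1000):
    # Explore a random action occasionally; otherwise exploit the best estimate
    if random.random() < 0.1:
        a = random.randrange(2)
    else:
        a = estimates.index(max(estimates))
    reward = 1 if random.random() < true_reward_prob[a] else 0
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]   # running average

print(estimates)   # approaches [0.3, 0.7], so action 1 is learned to be better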

In a nutshell, there are three different ways in which a machine can learn. Imagine yourself to be a machine. Suppose in an exam you are provided with an answer sheet where you can see the answers after your calculations. Now, if the answer is correct you will do the same calculations for that particular type of question. This is when it is said that you have learned through supervised learning.

Imagine a situation where you are not provided with the answer sheet and have to figure out on your own whether your answers are correct or not. You may end up giving wrong answers to most questions in the beginning, but, eventually, you will learn how to answer correctly. This would be called unsupervised learning.

Consider the third case where a teacher is standing next to you in the exam hall and looking at your answers as you write. Whenever you write a correct answer, she says “good” and whenever you write a wrong answer, she says “very bad,” and based on the remarks she gives, you try to improve (i.e., score the maximum possible in the exam). This is called reinforcement learning.

What are some real-life applications of machine learning?

There are numerous applications of machine learning. Here is a list of a few of them:

  1. Weather forecast: ML is applied to weather forecasting software to improve the quality of the forecasts.
  2. Malware stop/Anti-virus: With an increasing number of malicious files every day, it is getting impossible for humans and many security solutions to keep up, and hence, machine learning and deep learning are important. ML helps in training anti-virus software so that they can predict better.
  3. Anti-spam: We have already discussed this use case of ML. ML algorithms help spam filters better differentiate spam emails from legitimate ones.
  4. Google Search: Google's amazingly relevant search results are another application of ML, which we have already talked about.
  5. Game playing: There can be two ways in which ML can be implemented in games, i.e., during the design phase and during runtime.
    • Designing phase: In this phase, the learning is applied before the game is rolled out. One example could be LiveMove/LiveAI products from AiLive, which are the ML tools that recognize motion or controller inputs and convert them to gameplay actions.
    • Runtime: In this phase, learning is applied during runtime and fitted to a particular player or game session. Forza Motorsports is one such example where an artificial driver can be trained on the basis of one's own style.
  6. Face detection/Face recognition: ML can be used in mobile cameras, laptops, etc. for face detection and recognition. For instance, cameras can now snap a photo automatically whenever someone smiles, much more accurately, because of advancements in machine learning algorithms.
  7. Speech recognition: Speech recognition systems have improved significantly because of machine learning. For example, look at Google Now.

  8. Genetics: Clustering algorithms in machine learning can be used to find genes that are associated with a particular disease. For instance, Medecision, a health management company, used a machine learning platform to gain a better understanding of diabetic patients who are at risk.

There are numerous other applications such as image classification, smart cars, improved cyber security, and many more.

How can you start with machine learning?

There are several free open courses available online where you can start learning at your own pace:

  1. Coursera courses
    • Machine Learning created by Stanford University and taught by Andrew Ng: This course provides an introduction to machine learning, data mining, and statistical pattern recognition. Click here
    • Practical Machine Learning created by Johns Hopkins University and taught by Jeff Leek, Roger D. Peng, and Brian Caffo: This course covers the basic components of applying and building prediction functions with an emphasis on practical applications.
  2. Udacity Courses
    • It is a graduate-level course that covers the area of Artificial Intelligence concerned with programs that modify and improve the performance through experiences. Click here
    • Introduction to machine learning taught by Katie Malone and Sebastian Thrun: Click here
  3. edX courses
    • Principles of Machine Learning taught by Dr. Steve Elston and Cynthia Rudin: Click here
    • Machine Learning taught by Professor John W. Paisley: Click here

You can also check out the detailed list of free courses on machine learning and artificial intelligence. To conclude, machine learning is not rocket science (though it is used in rocket science). This article is meant for people who have probably heard about machine learning but don’t know what it is. This post just gives a basic understanding for a beginner. For more detailed articles, you can go here.

7 Powerful Programming Languages For Doing Machine Learning

Introduction

There exists a world for Machine Learning beyond R and Python!

Machine Learning is a product of statistics, mathematics, and computer science. As a practice, it has grown phenomenally in the last few years. It has empowered companies to build products like recommendation engines, self-driving cars, etc., which were beyond imagination until a few years ago. In addition, ML algorithms have also given a massive boost to big data analysis.

But, how is ML making all these accomplishments?

After realizing the sheer power of machine learning, many people and companies have invested their time and resources in creating a supportive ML environment. That's why we come across several open source projects these days.

You have a great opportunity right now to make the most of machine learning. No longer do you need to write endless code to implement machine learning algorithms; some good people have already done the dirty work. Yes, they've made libraries. Your launchpad is set.

In this article, you'll learn about top programming languages which are being used worldwide to create machine learning models/products.

Why are libraries useful?

A library is defined as a collection of non-volatile, pre-compiled code. Libraries are often used by programs to develop software.

Libraries tend to be relatively stable and free of bugs. Using appropriate libraries reduces the amount of code that has to be written, and fewer lines of code generally mean fewer opportunities for bugs. Therefore, in most cases, it is better to use a library than to write your own code.

Library routines can be implemented more efficiently than the code we would write ourselves, which is why people rely on libraries in the field of machine learning.

Correctness is as important as efficiency in machine learning. We can never be sure an algorithm has been implemented perfectly just from reading the original research paper, even twice. An open source library contains all the minute details that are dropped from the scientific literature.


7 Programming Languages for Machine Learning

Python

Python is an old and very popular language designed in 1991 by Guido van Rossum. It is open source and is used for web and Internet development (with frameworks such as Django, Flask, etc.), scientific and numeric computing (with the help of libraries such as NumPy, SciPy, etc.), software development, and much more.

Let us now look at a few libraries in Python for machine learning:

  1. Scikit-learn

    It was started in 2007 by David Cournapeau as a Google Summer of Code project. Later in 2007, Matthieu Brucher started to work on this project as a part of his thesis. In 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, and Vincent Michel of INRIA took the leadership of the project. The first edition was released on February 1, 2010. It is built on libraries such as NumPy, SciPy, and Matplotlib.

    Features:

    1. It is open source and commercially usable.
    2. It integrates a wide range of machine learning algorithms for medium-scale supervised and unsupervised problems.
    3. It provides a uniform interface for training and using models.
    4. It also provides a set of tools for chaining, evaluating, and tuning model hyperparameters.
    5. It also supports libraries for data transformation steps such as cleaning data and reducing, expanding, or generating feature representations.
    6. In cases where the number of examples/features or the speed at which it is to be processed is challenging, scikit-learn has a number of options that we can consider when scaling the system.
    7. It has a detailed user guide and documentation.

    A few companies that use scikit-learn are Spotify, Evernote, Inria, and Betaworks.
    Official website: Click here

  2. TensorFlow

    It was initially released on November 9, 2015, by the Google Brain Team. It is a machine learning library written in Python and C++.

    Features:

    1. It is an open source software library for machine intelligence.
    2. It is very flexible in that it is not just a rigid neural network library. We can construct graphs and write inner loops that drive computation.
    3. It can run on GPUs, CPUs, desktop, server, or mobile computing platforms.
    4. It connects research and production.
    5. It supports automatic differentiation which is very helpful in gradient-based machine learning algorithms.
    6. It has multiple language options. It comes with an easy to use Python interface and a C++ interface to build and execute computational graphs.
    7. It has detailed tutorials and documentation.

    It is used by companies like Google, DeepMind, Mi, Twitter, Dropbox, eBay, Uber, etc.
    Official Website: Click here

  3. Theano

    It is an open source Python library that was built at the Université de Montréal by a machine learning group. Theano is named after the Greek mathematician, who may have been Pythagoras' wife. It is tightly integrated with NumPy.

    Features:

    1. It enables us to define, optimize, and evaluate mathematical expressions including the multi-dimensional arrays which can be difficult in many other libraries.
    2. It combines aspects of an optimizing compiler with aspects of a computer algebra system.
    3. It can optimize execution speeds, that is, it uses g++ or nvcc to compile parts of the expression graph which run faster than pure Python.
    4. It can automatically build symbolic graphs for computing gradients. It also has the ability to recognize some numerically unstable expressions.
    5. It has tons of tutorials and a great documentation.

    A few companies that use Theano are Facebook, Oracle, Google, and Parallel Dots.
    Official Website: Click here

  4. Caffe

    Caffe is a framework for machine learning in vision applications. It was created by Yangqing Jia during his PhD at UC Berkeley and was developed by the Berkeley Vision and Learning Center.

    Features:

    1. It is an open source library.
    2. It has got an extensive architecture which encourages innovation and application.
    3. It has extensible code which encourages development.
    4. It is quite fast. It takes 1 ms/image for inference and 4 ms/image for learning. They say "We believe that Caffe is the fastest ConvNet implementation available."
    5. It has a huge community.

    It is used by companies such as Flickr, Yahoo, and Adobe.
    Official Website: Click here

  5. GraphLab Create

    GraphLab Create is a Python package that was started by Prof. Carlos Guestrin of Carnegie Mellon University in 2009. It is now known as Turi and was known as Dato before that. GraphLab Create is commercial software that comes with a free one-year subscription (for academic use only). It allows one to perform end-to-end large-scale data analysis and data product development.

    Features:

    1. It provides an interactive GUI which allows one to explore tabular data, summary plots, and statistics.
    2. It includes several toolkits for quick prototyping with fast and scalable algorithms.
    3. It places data and computation using sophisticated new algorithms which makes it scalable.
    4. It has a detailed user guide.

    Official Website: Click here

There are numerous other notable Python libraries for machine learning such as Pattern, NuPIC, PythonXY, Nilearn, Statsmodels, Lasagne, etc.

R

R is a programming language and environment built for statistical computing and graphics. It was designed by Robert Gentleman and Ross Ihaka in August 1993. It provides a wide variety of statistical and graphical techniques such as linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. It is free software.

Following are a few packages in R for machine learning:

  1. Caret

    The caret package (short for Classification And REgression Training) was written by Max Kuhn. Its development started in 2005. It was later made open source and uploaded to CRAN. It is a set of functions that attempts to unify the process of predictive analysis.

    Features:

    1. It contains tools for data splitting, pre-processing, feature selection, model tuning using resampling, variable importance estimation, etc.
    2. It provides a simple and common interface for many machine learning algorithms such as linear regression, neural networks, and SVMs.
    3. It is easy and simple to learn. Also, there are a lot of useful resources and a good tutorial.

    Official Website: Click here

  2. MLR

    It stands for Machine Learning in R. It was written by Bernd Bischl. It provides a common interface for machine learning tasks such as classification, regression, cluster analysis, and survival analysis in R.

    Features:

    1. It is possible to fit, predict, evaluate and resample models with only one interface.
    2. It enables easy hyperparameter tuning using different optimization strategies.
    3. It involves built-in parallelization.
    4. It includes filter and wrapper methods for feature selection.

    Official Website: Click here

  3. h2o

    It is the R interface for H2O. It was written by Spencer Aiello, Tom Kraljevic, and Petr Maj, with contributions from the H2O.ai team. H2O makes it easy to apply machine learning and predictive analytics to solve the most challenging business problems. The h2o package provides R scripting functionality for H2O.

    Features:

    1. It is an open source math engine for Big Data.
    2. It computes parallel distributed machine learning algorithms such as generalized linear models, gradient boosting machines, random forests, and neural networks within various cluster environments.
    3. It provides functions for building GLM, K-means, Naive Bayes, Principal Components Analysis, Principal Components Regression, etc.
    4. It can be installed as a standalone or on top of an existing Hadoop installation.

    Official Website: Click here

Other packages in R that are worth considering for machine learning are e1071, rpart, nnet, and randomForest.

Golang

Go is a programming language that was initially developed at Google by Robert Griesemer, Rob Pike, and Ken Thompson in 2007. It was announced in November 2009 and is used in some of Google's production systems.

It is a statically typed language with a syntax similar to C, and it provides a rich standard library. It is easy to use, but the code compiles to a binary that runs almost as fast as C, so it can be considered for tasks dealing with large volumes of data.

Below is a list of libraries in Golang which are useful for data science and related fields:

  1. GoLearn

    GoLearn describes itself as a "batteries included" machine learning library for Go. The aim is simplicity paired with customizability.

    Features:

    1. It implements the scikit-learn interface of Fit/Predict.
    2. It also includes helper functions for data, like cross-validation, and train and test splitting.
    3. It supports performing matrix-like operations on data instances and passing them to estimators.
    4. GoLearn has support for linear and logistic regression, neural networks, K-nearest neighbor, etc.

    Official Website: Click here

  2. Gorgonia

    Gorgonia is a library in Go that helps facilitate machine learning. Its idea is quite similar to TensorFlow and Theano. It is low-level but has high goals.

    Features:

    1. It eases the process of writing and evaluating mathematical equations involving multidimensional arrays.
    2. It can perform automatic differentiation, symbolic differentiation, gradient descent optimizations, and numerical stabilization.
    3. It provides many functions which help in creating neural networks conveniently.
    4. It is fast in comparison to TensorFlow and Theano.

    Official website: Click here

  3. Goml

    goml is a machine learning library written entirely in Golang. It lets developers include machine learning in their applications.

    Features:

    1. It includes comprehensive tests and extensive documentation.
    2. It has clean, expressive, and modular source code.
    3. It currently supports models such as generalized linear models, clustering, text classification, and perceptron (only in online option).

    Official Website: Click here

There are other libraries too that can be considered for machine learning such as gobrain, goglaib, gago, etc.

Java

Java is a general-purpose computer programming language. Work on it was initiated by James Gosling, Mike Sheridan, and Patrick Naughton in June 1991. The first implementation, Java 1.0, was released in 1995 by Sun Microsystems.

Some libraries in Java for machine learning are:

  1. WEKA

    It stands for Waikato Environment for Knowledge Analysis. It was created by the machine learning group at the University of Waikato. It is a library with a collection of machine learning algorithms for data mining tasks. These algorithms can either be applied directly to a dataset or called from our own Java code.

    Features:

    1. It is an open source library.
    2. It contains tools for data pre-processing and data visualization.
    3. It also contains tools for classification, regression, clustering, and association rule.
    4. It is also well suited for creating new machine learning schemes.

    Official Website: Click here

  2. JDMP

    It stands for Java Data Mining Package. It is a Java library for data analysis and machine learning. Its contributors are Holger Arndt, Markus Bundschus, and Andreas Nägele. It treats every type of data as a matrix.

    Features:

    1. It is an open source Java library.
    2. It facilitates access to data sources and machine learning algorithms and provides visualization modules also.
    3. It provides an easy interface for data sets and algorithms.
    4. It is fast and can handle huge (terabyte-sized) datasets.

    Official Website: Click here

  3. MLlib (Spark)

    MLlib is a machine learning library for Apache Spark. It can be used in Java, Python, R, and Scala. It aims at making practical machine learning scalable and easy.

    Features:

    1. It contains many common machine learning algorithms such as classification, regression, clustering, and collaborative filtering.
    2. It contains utilities such as feature transformation and ML pipeline construction.
    3. It includes tools such as model evaluation and hyperparameter tuning.
    4. It also includes utilities such as distributed linear algebra, statistics, data handling, etc.
    5. It has a vast user guide.

    It is used by Oracle.
    Official Website: Click here

Other libraries: Java-ML, JSAT

C++

Bjarne Stroustrup began working on "C with Classes", the predecessor to C++, in 1979. "C with Classes" was renamed C++ in 1983. It is a general-purpose programming language with imperative, object-oriented, and generic programming features, and it also provides facilities for low-level memory manipulation.

  1. mlpack

    mlpack is a machine learning library in C++ which emphasizes scalability, speed, and ease of use. Initially, it was produced by the FASTLab at Georgia Tech. mlpack was presented at the BigLearning workshop of NIPS 2011 and later published in the Journal of Machine Learning Research.

    Features:

    1. An important feature of mlpack is the scalability of the machine learning algorithms it implements, which is achieved mostly through the use of C++.
    2. It allows kernel functions and arbitrary distance metrics for all its methods.
    3. It has high-quality documentation available.

    Official Website: Click here

  2. Shark

    Shark is a C++ machine learning library written by Christian Igel, Verena Heidrich-Meisner, and Tobias Glasmachers. It serves as a powerful toolbox for research as well as real-world applications. It depends on Boost and CMake.

    Features:

    1. It is an open source library.
    2. It provides a balance between flexibility, ease of use, and computational efficiency.
    3. It provides tools for various machine learning techniques such as LDA, linear regression, PCA, clustering, neural networks, etc.

    Official Website: Click here

  3. Shogun

    It is a machine learning toolbox whose development was initiated in 1999 by Soeren Sonnenburg and Gunnar Raetsch.

    Features:

    1. It can be used through a unified interface from multiple languages such as C++, Python, Octave, R, Java, Lua, C#, Ruby, etc.
    2. It enables an easy combination of multiple data representations, algorithm classes, and general purpose tools.
    3. It spans the whole space of machine learning methods including classical (such as regression, dimensionality reduction, clustering) as well as more advanced methods (such as metric, multi-task, structured output, and online learning).

    Official Website: Click here

Other libraries: Dlib-ml, MLC++

Julia

Julia is a high-performance dynamic programming language designed by Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman. It first appeared in 2012. The Julia developer community is contributing a number of external packages through Julia's built-in package manager at a rapid pace.

  1. ScikitLearn.jl

    The scikit-learn Python library is a very popular library among machine learning researchers and data scientists. ScikitLearn.jl brings the capabilities of scikit-learn to Julia. Its primary goal is to integrate Julia- and Python-defined models together into the scikit-learn framework.

    Features:

    1. It offers around 150 Julia and Python models that can be accessed through a uniform interface.
    2. ScikitLearn.jl provides two types: Pipelines and Feature Unions for data preprocessing and transformation.
    3. It offers a possibility to combine features from DataFrames.
    4. It provides features to find the best set of hyperparameters.
    5. It has a fairly detailed manual and a number of examples.

    Official Website: Click here

  2. MachineLearning.jl

    It is a library that aims to be a general-purpose machine learning library for Julia with a number of support tools and algorithms.

    Features:

    1. It includes functionality for splitting datasets into training dataset and test dataset and performing cross-validation.
    2. It also includes a lot of algorithms such as decision tree classifier, random forest classifier, basic neural network, etc.

    Official Website: Click here

  3. MLBase.jl

    It is said to be "a swiss knife for machine learning". It is a Julia package which provides useful tools for machine learning applications.

    Features:

    1. It provides many functions for data preprocessing such as data repetition and label processing.
    2. It supports tools such as classification performance, hit rate, etc. for evaluating the performance of a machine learning algorithm.
    3. It implements a variety of cross validation schemes such as k-fold, leave-one-out cross validation, etc.
    4. It has good documentation, and there are a lot of code examples for its tools.

    Official Website: Click here

Scala

Scala is another general-purpose programming language. It was designed by Martin Odersky and first appeared on January 20, 2004. The word Scala is a portmanteau of "scalable" and "language", signifying that it is designed to grow with the demands of its users. It runs on the JVM, hence Java and Scala stacks can be mixed. Scala is used in data science.

Here's a list of a few libraries in Scala that can be used for machine learning.

  1. ScalaNLP

    ScalaNLP is a suite of machine learning, numerical computing, and natural language processing libraries. It includes libraries like Breeze and Epic.

    • Breeze: It is a set of libraries for machine learning and numerical computing.
    • Epic: It is a natural language processing and prediction library written in Scala.

    Official Website: Click here

This is not an exhaustive list. There are various other languages such as SAS and MATLAB where one can perform machine learning.

5 Free Python IDEs for Machine Learning

Integrated Development Environment (IDE)

An integrated development environment (IDE) is an application that provides programmers and developers with the basic tools to write and test software. In general, an IDE consists of an editor, a compiler (or interpreter), and a debugger, all accessed through a graphical user interface (GUI).

According to Wikipedia, “Python is a widely used high-level, general-purpose, interpreted, dynamic programming language.” Python is a fairly old and very popular language. It is open source and is used for web and Internet development (with frameworks such as Django, Flask, etc.), scientific and numeric computing (with the help of libraries such as NumPy, SciPy, etc.), software development, and much more.

Text editors alone are not enough for building large systems, which require integrating many modules and libraries; for that, a good IDE is needed.

Here is a list of some Python IDEs, with their features, to help you choose a suitable IDE for your machine learning work.

Jupyter/IPython Notebook

Project Jupyter started as a derivative of IPython in 2014 to support scientific computing and interactive data science across all programming languages.

The IPython project states that “IPython 3.x was the last monolithic release of IPython. As of IPython 4.0, the language-agnostic parts of the project: the notebook format, message protocol, qtconsole, notebook web application, etc. have moved to new projects under the name Jupyter. IPython itself is focused on interactive Python, part of which is providing a Python kernel for Jupyter.”

Jupyter consists of three components: the notebook web application, kernels, and notebook documents.

Some of its key features are the following:
  1. It is open source.
  2. It supports over 40 programming languages, including those popular for data science such as Python, R, Scala, and Julia.
  3. It allows one to create and share documents containing equations, visualizations, and, most importantly, live code.
  4. Code can produce rich output such as images, videos, and LaTeX, and interactive widgets can be used to visualize and manipulate data in real time (a minimal widget sketch follows this list).
  5. It offers big data integration: one can use tools such as Apache Spark from Python, R, and Scala, and explore the same data with libraries such as pandas, scikit-learn, ggplot2, and dplyr.
  6. Markdown cells let one keep commentary, logic, and thought process inside the notebook itself rather than only in source-code comments.
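
As a small taste of the interactive widgets mentioned in point 4, here is a minimal ipywidgets sketch; the function and slider range are illustrative, and the cell must be run inside a Jupyter notebook.

    # Running this in a notebook cell renders a slider; moving it
    # re-runs the function and displays the result below the cell.
    from ipywidgets import interact

    def square(x):
        return x * x

    interact(square, x=(0, 10))  # an integer slider from 0 to 10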

Typical uses of the Jupyter notebook include data cleaning, data transformation, statistical modelling, and machine learning.

Among the features most relevant to machine learning is its integration with libraries such as Matplotlib, NumPy, and pandas. Another major feature of the Jupyter notebook is that it can display plots produced as the output of running code cells, as the sketch below shows.
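
For example, a cell like the following minimal sketch renders its Matplotlib figure directly beneath the cell; the %matplotlib inline magic is notebook-specific, and the plotted data is illustrative.

    # In a notebook cell: the figure appears inline as the cell's output.
    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 2 * np.pi, 100)
    plt.plot(x, np.sin(x))
    plt.title("A plot rendered as cell output")
    plt.show()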

It is currently used by popular companies such as Google, Microsoft, and IBM, as well as by educational institutions such as UC Berkeley and Michigan State University.

Free download: Click here.


PyCharm

PyCharm is a Python IDE developed by JetBrains, a software company based in Prague, Czech Republic. Its beta version was released in July 2010 and version 1.0 came three months later in October 2010.

PyCharm is a full-featured, professional Python IDE that comes in two versions: PyCharm Community Edition, which is free, and the much more advanced PyCharm Professional Edition, which comes with a 30-day free trial.

The fact that PyCharm is used by many big companies, such as HP, Pinterest, Twitter, Symantec, and Groupon, speaks to its popularity.

Some of its key features are the following:
  1. It includes intelligent code completion for classes, objects, and keywords; auto-indentation and code formatting; and customizable code snippets.
  2. It shows on-the-fly error highlighting (errors are displayed as you type). It also includes PEP 8 checks for Python that help in writing neat, consistent code (a small example follows this list).
  3. It supports fast and safe refactoring.
  4. It includes a debugger for Python and JavaScript with a graphical UI. One can create and run tests with a GUI-based test runner and coding assistance.
  5. It has a quick documentation/definition view, where one can see the documentation or an object's definition in place without losing context. Also, the documentation provided by JetBrains (here) is comprehensive, with video tutorials.
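
To illustrate the PEP 8 checks mentioned in point 2, here is a contrived before-and-after example of the kind of style issue such inspections flag:

    # Before: flagged by PEP 8 checks for missing whitespace after the
    # comma (E231) and around the "+" operator (E225).
    def add(a,b):
        return a+b

    # After: the spacing that the inspection suggests.
    def add(a, b):
        return a + b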

The most important feature that makes it a good fit for machine learning is its support for libraries such as scikit-learn, Matplotlib, NumPy, and pandas.

Features such as Matplotlib's interactive mode work in both the Python console and the debugger console, so one can plot, manage, and explore graphs in real time.

Also, one can define different environments (Python 2.7, Python 3.5, or virtual environments) on a per-project basis.

Free download: Click here

Spyder

Spyder stands for Scientific PYthon Development EnviRonment. Spyder’s original author is Pierre Raybaut, and it was officially released on October 18, 2009. Spyder is written in Python.

Some of its key features are the following:
  1. It is open source.
  2. Its editor supports code introspection and analysis, code completion, horizontal and vertical splitting, and go-to-definition.
  3. It comes with Python and IPython consoles and a workspace, and it supports on-the-fly analysis, i.e., errors are displayed as soon as you type.
  4. It has a documentation viewer that shows documentation for classes or functions called in either the editor or the console.
  5. It also has a variable explorer, where one can browse and edit the variables (NumPy arrays, for example) created during the execution of a file, through a graphical user interface.

It integrates NumPy, SciPy, Matplotlib, and other scientific libraries, and it is at its best when used as an interactive console for building and testing numeric and scientific applications and scripts that rely on those libraries.

Apart from this, it is a simple and lightweight application that is easy to install and has very detailed documentation.

Rodeo

Rodeo is a Python IDE built expressly for doing machine learning and data science in Python. It was developed by Yhat and uses the IPython kernel.

Some of its key features are the following:
  1. It makes it easy to explore, compare, and interact with data frames and plots.
  2. The Rodeo text editor comes with auto-completion, syntax highlighting, and built-in IPython support, which makes writing code faster.
  3. Rodeo comes integrated with Python tutorials and includes cheat sheets for quick reference.

It is useful for researchers and scientists who are used to working in R and the RStudio IDE.

It has many features in common with Spyder, but it lacks others, such as code analysis and PEP 8 checking. As it is fairly new, Rodeo may well add such features in the future.

Free download: Click here.

Geany

Geany is a Python IDE originally written by Enrico Tröger in C and C++. It was initially released on October 19, 2005. It is a small and lightweight IDE (14 MB on Windows), yet it offers many of the capabilities of heavier IDEs.

Some of its key features are the following:
  1. Its editor supports syntax highlighting and line numbering.
  2. It also offers auto-completion and auto-closing of braces and of HTML and XML tags.
  3. It includes code folding and code navigation.
  4. It provides a build system to compile and execute code with the help of external commands.

Free download: Click here.

For those who are familiar with RStudio and want options in Python: RStudio added editor support for Python, XML, YAML, SQL, and shell scripts in version 0.98.932, released on June 18, 2014, although its Python support is limited compared with its support for R.

This is not an exhaustive list. There are other Python IDEs, such as PyDev, Eric, and Wing. To learn more about them, you can visit the Python wiki page here.