Machine Learning
k-nearest neighbor algorithm for supervised learning in Python
So far we have been working through the prerequisites for Machine Learning and have covered a few of the important concepts required to solve Machine Learning problems. If you missed those posts, or are not sure about the various techniques for solving machine learning problems, I recommend going through Machine Learning Concepts. Here we will focus on Supervised Machine Learning. Let's get started. Supervised Learning is the learning where the value or result that we want…
Boosting Algorithms explained in detail
In the post Random Forests Explained in detail, we discussed Random Forest, which uses the technique of Bagging to create an ensemble of decision trees. In this post, we will discuss Boosting techniques, which also create an ensemble, and look at two popular algorithms: Adaptive Boosting (AdaBoost) and Gradient Boosting. Introduction to Boosting: The basic idea of Boosting is to combine a lot of weak learners to get a strong learner, where a weak learner…
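To make the idea of combining weak learners concrete, here is a minimal AdaBoost-style sketch in plain Python. It is only an illustration under assumptions: the weak learners are one-dimensional threshold "stumps", and the toy data and the function names (`best_stump`, `adaboost`, `predict`) are made up for this example rather than taken from the post.

```python
import math

def stump_predict(x, threshold, polarity):
    """A weak learner: predicts `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Fit `rounds` stumps, re-weighting the training points after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's say in the final vote
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correctly classified ones lose weight
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Strong learner: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single threshold separates the two classes
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
print(predictions)
```

On this toy data the best single stump still misclassifies two of the six points, while the weighted vote of three stumps fits all six.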
Distributional Semantics: Techniques to represent words as vectors
The distributional hypothesis states that the context words around a given ambiguous word determine its correct meaning. In simple terms, the meaning of a word can be determined from the context (the neighboring words) in which it is used. There are multiple techniques for inferring the meaning of a word from its context. In this post, we will cover one of the most popular techniques, word embeddings, but before going there we will…
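The notion that neighboring words carry a word's meaning can be sketched by counting which words appear near each other. The snippet below is a rough illustration, not the method used in the post: the two-sentence corpus, the window size, and the helper name `cooccurrence_counts` are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """For each word, count the words that appear within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

# Toy corpus (made up): "bank" is ambiguous, and its neighbors differ by sense
corpus = [
    "she sat on the river bank",
    "he deposited cash at the bank",
]

contexts = cooccurrence_counts(corpus, window=2)
print(contexts["bank"])
```

Each row of these counts is a crude vector for a word. Word embeddings can be thought of as denser, learned versions of the same kind of context information.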
Multivariate Linear Regression detailed explanation
In the previous post, Simple Linear Regression detailed Explanation, we saw how to apply Linear Regression to a problem statement with only one independent variable. However, in real-world scenarios, many independent variables contribute to predicting the target variable. Here, I will demonstrate using the Boston dataset from the sklearn library. Let's create the Boston dataset and split it into training and test datasets for evaluation. The…
Data Visualization using Bokeh package in Python
In this post, we will look at techniques for visualizing data using the Bokeh package. We will use the house property sales dataset from Kaggle. The data has been loaded into the housePropertyDataset variable: import pandas as pd; housePropertyDataset = pd.read_csv('house_property_sales.csv') Let's get started. Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile…
Importing data using Python
In today's world, a lot of data is generated by various devices. The format of the data varies from flat files to tabular structures. In this post, we will look at Python packages for importing data. We will cover techniques to import the following file types: flat files (.txt, .csv), pickled files, Excel files, SAS files, Stata files, HDF5 files, .mat files, relational databases, and data read from the web. Let's get started…
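As a small taste of the first two items in that list, here is a sketch that reads a CSV flat file and round-trips a pickled object using only the standard library. The CSV contents and the helper name `read_csv_rows` are invented for illustration, and an in-memory string stands in for a file on disk.

```python
import csv
import io
import pickle

# In-memory stand-in for a .csv flat file (contents are made up)
csv_text = "name,price\nhouse_a,250000\nhouse_b,310000\n"

def read_csv_rows(file_obj):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(file_obj))

rows = read_csv_rows(io.StringIO(csv_text))

# Pickle round trip: serialize a Python object to bytes, then load it back
payload = pickle.dumps(rows)
restored = pickle.loads(payload)

print(rows[0]["name"], restored == rows)
```

With a real file, the same helper would be called as `read_csv_rows(open(path, newline=""))`, ideally inside a `with` block. Note that `csv` returns every value as a string, so numeric columns need an explicit conversion.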
Improving the Supervised Learning Model using Python
Typically in Machine Learning, we build a model and then use multiple techniques to measure its accuracy. But the question that might come to mind is: "The accuracy of the model doesn't match expectations. What can I do to improve it?" Valid thought. In this post, we will look at a few best practices for building models. In other words, we will look at techniques to tune the model. Let'…
Hypothesis Testing explained using practical example
There are a lot of articles and blogs on hypothesis testing. But I still feel the need to write one because, when I was learning Hypothesis Testing, I failed to find one that intuitively shows, with real-world examples, how Hypothesis Testing can be used. Inspired by the book "Naked Statistics: Stripping the Dread from the Data" by Charles Wheelan, this is my attempt to explain hypothesis testing. A hypothesis test evaluates two mutually exclusive statements about a population to…
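The two mutually exclusive statements are usually called the null and the alternative hypothesis. As one possible illustration, the sketch below runs a two-sided one-sample z-test with the standard library. The numbers (a claimed population mean of 100, a known standard deviation of 15, and a sample of 36 with mean 106) and the function name `z_test_two_sided` are assumptions made up for this example, not data from the post.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(sample_mean, null_mean, sigma, n):
    """Return (z, p_value) for H0: population mean == null_mean, with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a result at least this extreme in either tail if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers, chosen only to make the arithmetic easy to follow
z, p_value = z_test_two_sided(sample_mean=106, null_mean=100, sigma=15, n=36)

alpha = 0.05  # significance level picked for the example
reject_null = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_null)
```

Here the sample mean sits 2.4 standard errors above the claimed mean, so at the 5% level we would reject the null hypothesis in favor of the alternative.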
Simple Linear Regression detailed Explanation
In this post, we will start learning about the Linear Regression algorithm. We are starting with Linear Regression because it is an easy-to-understand yet very powerful algorithm for solving machine learning problems. We will work hands-on as we proceed through the post. I will use the Boston house price dataset from sklearn to explain Linear Regression. Before we start, let's import the data and create the data frame. import pandas as pd from sklearn.datasets import…