Machine Learning

Simple Linear Regression Detailed Explanation
In this post, we will start learning about the Linear Regression algorithm. We begin with Linear Regression because it is easy to understand yet very powerful for solving machine learning problems. We will work hands-on as we proceed through the post. I will be using the Boston house price dataset from sklearn to explain Linear Regression. Before we start, let's import the data and create the data frame. import pandas as pd from sklearn.datasets import…

Unlocking the Power of Natural Language Processing: An Introduction and Pipeline Framework for Solving NLP Tasks
In this post on Natural Language Processing, we will look at what NLP is and the various techniques used in the process. Several Python packages support NLP and other text-related operations, including NLTK, spaCy, TextBlob, Pattern, gensim, MITIE, guess_language, the Python wrapper for Stanford CoreNLP, the Python wrapper for Berkeley Parser, readability-lxml, and BeautifulSoup. In this post, we will focus on the NLTK and spaCy packages. Let's get started. Regular…

Part 2: Guide to build accurate Time Series Forecasting models - Auto Regressive Models
In the post Part 1: Guide to build accurate Time Series Forecasting models, we discussed the different smoothing models for forecasting with a time series dataset. This post continues from there and covers another family of models used for time series forecasting: autoregressive (AR) models. If you are not familiar with how time series models work or want to revise the different smoothing models such as simple moving average, simple exponential smoothing, Holt's…

Part 1: Guide to build accurate Time Series Forecasting models
Time series forecasting is a statistical technique that predicts future values based on past observations. Unlike other forms of data analysis, time series forecasting involves analyzing data ordered in time. This means that each observation in the dataset is associated with a specific point in time, recorded at intervals such as hourly, daily, weekly, monthly, or yearly. The primary goal of time series forecasting is to identify patterns in the data and use those patterns to predict future values of the…

Decoding the Magic of Probabilistic Graphical Models: A Comprehensive Guide to Understanding and Applying PGMs in Data Science
Probabilistic Graphical Models (PGMs) belong to the generative class of models, which can model all the relationships between the variables and answer queries even when only partial data is provided. They provide a flexible and intuitive framework for representing probabilistic relationships between the variables in a dataset. PGMs come in two main flavors: Bayesian Networks and Markov Networks. Bayesian Networks are directed acyclic graphs that represent the probabilistic dependencies…

Principal Component Analysis Detailed Explanation
Principal Component Analysis (PCA) is a powerful statistical technique for dimensionality reduction and data visualization. PCA allows us to transform high-dimensional data into a lower-dimensional space while retaining most of the original variance in the data. This makes it easier to visualize, analyze and model complex datasets. However, understanding the mathematical concepts and intuition behind PCA can be challenging for beginners. This post will explain Principal Component Analysis,…

Unlocking the Power of Naive Bayes Algorithm for Text Classification
Naïve Bayes is a popular algorithm that is widely used for text classification problems. This post will look at example text-related problems and show how the Naïve Bayes algorithm can help solve them. In general, Naïve Bayes performs well in text classification applications. This article assumes that you have a basic knowledge of how the Naïve Bayes algorithm works. If you don't, or want to revise the concept, I recommend you go through the Naïve Bayes Algorithm…

Naïve Bayes Algorithm Detailed Explanation
In this post, I will be talking about the Naïve Bayes algorithm, which is a popular machine learning algorithm used for text-based tasks. Naïve Bayes is a probabilistic classifier that returns the probability of a test point belonging to a class rather than the label of the test point. It is a Bayesian method that combines evidence from data (summarized by the likelihood) with initial beliefs (known as a prior distribution) to produce a posterior probability distribution of the unknown quantity…

Understanding Popular Statistical Tests To Perform Hypothesis Testing Is Not Difficult At All!
When you perform an analysis on a sample, you only get the statistics of the sample. You want to make claims about the entire population using sample statistics. But remember that these are just claims, so you can’t be sure if they’re true. This kind of claim or assumption is called a hypothesis. For example, your hypothesis might be that the average lead content in a food product is less than 2.5 ppm, or the average time to resolve a query at a call center is 6 minutes. Whatever your…