Machine Learning

Part 1: A Guide to Building Accurate Time Series Forecasting Models
Time series forecasting is a statistical technique that predicts future values over time based on past observations. Unlike other forms of data analysis, time series forecasting involves analyzing data ordered in time. This means that each observation in the dataset is associated with a specific point in time, such as hourly, daily, weekly, monthly, or yearly. The primary goal of time series forecasting is to identify patterns in the data and use those patterns to predict future values of the…
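As a minimal sketch of the idea of predicting future values from past observations, here is a simple moving-average forecast. This is an illustrative baseline, not the method from the post, and the monthly series below is hypothetical:

```python
# Minimal sketch: forecast the next value of a time-ordered series as the
# mean of its most recent observations (a simple moving average).
def moving_average_forecast(series, window=3):
    """Predict the next value from the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

monthly_sales = [112, 118, 132, 129, 121, 135]  # hypothetical, ordered in time
next_month = moving_average_forecast(monthly_sales, window=3)
```

Real forecasting models (ARIMA, exponential smoothing, etc.) additionally account for trend and seasonality, which this baseline ignores.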

Decoding the Magic of Probabilistic Graphical Models: A Comprehensive Guide to Understanding and Applying PGMs in Data Science
Probabilistic Graphical Models belong to the generative class of models: they model the relationships between variables and can answer queries even when only partial data is provided. They provide a flexible and intuitive framework for representing probabilistic relationships between the variables present in a dataset. PGMs come in two main flavors: Bayesian Networks and Markov Networks. Bayesian Networks are directed acyclic graphs that represent the probabilistic dependencies…
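To make "answering a query from partial data" concrete, here is a minimal two-node Bayesian Network (Rain → WetGrass) with hypothetical probabilities, queried by enumeration:

```python
# Minimal sketch of a Bayesian Network query: given only the evidence
# WetGrass=True, infer P(Rain=True) via Bayes' rule. All numbers hypothetical.
P_rain = {True: 0.2, False: 0.8}                 # prior P(Rain)
P_wet_given_rain = {True: 0.9, False: 0.1}       # CPT: P(WetGrass=True | Rain)

def posterior_rain_given_wet():
    """P(Rain=True | WetGrass=True) by enumerating the joint distribution."""
    joint = {r: P_rain[r] * P_wet_given_rain[r] for r in (True, False)}
    evidence = sum(joint.values())               # P(WetGrass=True)
    return joint[True] / evidence
```

Larger networks use the same principle, with more efficient inference algorithms (variable elimination, belief propagation) replacing brute-force enumeration.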

Principal Component Analysis Detailed Explanation
Principal Component Analysis (PCA) is a powerful statistical technique for dimensionality reduction and data visualization. PCA allows us to transform high-dimensional data into a lower-dimensional space while retaining most of the original variance in the data. This makes it easier to visualize, analyze and model complex datasets. However, understanding the mathematical concepts and intuition behind PCA can be challenging for beginners. This post will explain Principal Component Analysis,…
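As a small sketch of the core computation, here is PCA's first principal component for 2-D data, found as the leading eigenvector of the covariance matrix (closed form for the 2x2 case). The data points are hypothetical:

```python
import math

# Minimal sketch: the direction of maximum variance (first principal
# component) of 2-D points, via the covariance matrix's leading eigenvector.
# Assumes the features are correlated (off-diagonal covariance nonzero).
def first_principal_component(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # leading eigenvalue of the 2x2 symmetric matrix
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam = tr / 2 + math.sqrt(tr ** 2 / 4 - det)
    # corresponding eigenvector, normalized to unit length
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm
```

Projecting each centered point onto this direction gives the 1-D representation that retains the most variance.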

Unlocking the Power of Naive Bayes Algorithm for Text Classification
Naïve Bayes is a popular algorithm that is widely used for text classification problems. This post will look at example text-related problems and show how the Naïve Bayes algorithm can help solve them. In general, Naïve Bayes performs well in text classification applications. This article assumes that you have a basic knowledge of how the Naïve Bayes algorithm works. If you don't know or want to revise the concept, I recommend you go through the Naïve Bayes Algorithm…
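A minimal sketch of multinomial Naïve Bayes on a tiny, hypothetical spam/ham corpus, with Laplace smoothing, shows how the algorithm applies to text:

```python
from collections import Counter
import math

# Hypothetical training corpus: (document, label) pairs.
train = [("win money now", "spam"), ("meeting at noon", "ham"),
         ("win a free prize", "spam"), ("lunch at noon today", "ham")]

def fit(data):
    """Count word occurrences per class and documents per class."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in data:
        docs[label] += 1
        counts[label].update(text.split())
    vocab = set(w for c in counts.values() for w in c)
    return counts, docs, vocab

def predict(text, counts, docs, vocab):
    """Pick the class with the highest log posterior (Laplace-smoothed)."""
    total_docs = sum(docs.values())
    scores = {}
    for label in counts:
        score = math.log(docs[label] / total_docs)      # log prior
        n = sum(counts[label].values())
        for w in text.split():
            score += math.log((counts[label][w] + 1) / (n + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

counts, docs, vocab = fit(train)
label = predict("win free money", counts, docs, vocab)  # -> "spam"
```

Working in log space avoids numerical underflow when documents contain many words.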

Naïve Bayes Algorithm Detailed Explanation
In this post, I will be talking about the Naïve Bayes algorithm, a popular machine learning algorithm used for text-based tasks. Naïve Bayes is a probabilistic classifier that returns the probability of a test point belonging to a class rather than just the label of the test point. It is a Bayesian method that combines evidence from data (summarized by the likelihood) with initial beliefs (known as a prior distribution) to produce a posterior probability distribution of the unknown quantity…
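The prior-times-likelihood update described above can be sketched in a few lines, with hypothetical numbers for the prior and the likelihoods:

```python
# Minimal sketch: combine a prior P(class) with a likelihood
# P(features | class) and normalize to get the posterior P(class | features).
def posterior(priors, likelihoods):
    unnorm = {c: priors[c] * likelihoods[c] for c in priors}
    z = sum(unnorm.values())           # normalizing constant (the evidence)
    return {c: p / z for c, p in unnorm.items()}

# Hypothetical numbers: spam is rarer a priori, but the observed features
# are four times as likely under the spam class.
post = posterior({"spam": 0.3, "ham": 0.7}, {"spam": 0.04, "ham": 0.01})
```

Note that the output is a full probability distribution over classes, not just a hard label, which is exactly the property the post highlights.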

Explaining ML model results: cumulative gains and lift curves are much more intuitive than the ROC curve
The cumulative response curve (also known as cumulative gains in Data Science) and the lift curve are two different visuals that provide an effective mechanism for measuring the performance of a predictive classification model. They are generally preferred when the costs and benefits discussed in the Using an expected value for the designed Machine Learning solutions post are difficult to calculate, but the target variable/class mix is unlikely to change (i.e. the class balance distributions…
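A minimal sketch of how a single point on these curves is computed, using hypothetical model scores and labels:

```python
# Minimal sketch: cumulative gain and lift at one targeting depth.
# Gain = share of all positives captured in the top `fraction` of instances
# ranked by model score; lift = gain relative to random targeting.
def cumulative_gain_and_lift(scores, labels, fraction):
    ranked = [y for _, y in sorted(zip(scores, labels), reverse=True)]
    k = max(1, int(round(fraction * len(ranked))))
    gain = sum(ranked[:k]) / sum(labels)
    lift = gain / fraction
    return gain, lift
```

Evaluating this at every depth from 0 to 1 traces out the full gains curve; a lift of 2.5 at 30% depth, for instance, means targeting by model score finds positives 2.5 times faster than random selection.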

Interpretation of the ROC curve explained with the help of an example
Consider a scenario where we have trained a classification machine learning model, using logistic regression, to predict whether each of 15 given users will read this post. The trained model outputs a probability for each user, and we want to evaluate the model's accuracy before deploying it to the production environment. A few ways to evaluate the machine learning model are to check the accuracy, confusion matrix,…
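Each point on a ROC curve is a (false positive rate, true positive rate) pair at one probability threshold. A minimal sketch of that computation, with hypothetical probabilities and labels:

```python
# Minimal sketch: TPR and FPR at one classification threshold, i.e. a
# single point on the ROC curve. Probabilities and labels are hypothetical.
def roc_point(probs, labels, threshold):
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg  # (TPR, FPR)
```

Sweeping the threshold from 1 down to 0 and plotting FPR against TPR produces the full curve; a model whose curve hugs the top-left corner separates readers from non-readers well.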

Using an expected value for the designed Machine Learning solutions
How can we use expected value to drive decisions from a designed Machine Learning solution? We generally think about how statistics can be used in Machine Learning. The truth is that Machine Learning is built on top of statistics, but we have reached a stage where libraries are available that hide the complex statistics from the Data Scientist. Today, we will try to answer the question of whether expected value helps in driving better decisions from the results that the…
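As a minimal sketch of turning a model's probability into a decision, here is an expected-value calculation with a hypothetical cost-benefit setup (profit if a targeted user converts, a fixed cost of contacting them either way):

```python
# Minimal sketch: expected profit of acting on a model's predicted
# probability. The benefit and cost figures are hypothetical.
def expected_profit(p_convert, benefit=100.0, cost=5.0):
    """E[profit] = p * (benefit - cost) + (1 - p) * (-cost)."""
    return p_convert * (benefit - cost) + (1 - p_convert) * (-cost)

# Decision rule: act only when the expected profit is positive.
should_target = expected_profit(0.08) > 0
```

This is why expected value matters: even a low conversion probability (8% here) can justify targeting a user when the benefit sufficiently outweighs the cost.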

Support Vector Machines Detailed Explanation
There are a lot of algorithms, like logistic regression and Naive Bayes, that are used to solve classification problems. Though these algorithms are popular and used across the industry, they struggle with complex classification tasks like image classification, voice detection, etc. Support Vector Machines (SVMs) are capable of dealing with quite complex problems where models like logistic regression mostly fail. A few of the properties and use cases of the SVM…
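At prediction time, a linear SVM reduces to the sign of a weighted sum. A minimal sketch with hypothetical weights (as if learned elsewhere):

```python
# Minimal sketch: the linear SVM decision rule sign(w . x + b).
# The weight vector and bias below are hypothetical, standing in for
# values learned by training. Kernels extend this same rule to the
# non-linear problems mentioned above.
def svm_predict(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

Training chooses w and b to maximize the margin between the two classes, which is what distinguishes an SVM from other linear classifiers using the same decision rule.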