Machine Learning
Unlocking the Power of Natural Language Processing: An Introduction and Pipeline Framework for Solving NLP Tasks
In this post on Natural Language Processing, we will understand what NLP is and the various techniques used in the process. Python provides various packages for NLP and other text-related operations, such as NLTK, spaCy, TextBlob, Pattern, gensim, MITIE, guess_language, a Python wrapper for Stanford CoreNLP, a Python wrapper for the Berkeley Parser, readability-lxml, BeautifulSoup, etc. In this post, we will focus on the NLTK and spaCy packages. Let's get started. Regular…
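To give a flavor of the preprocessing that packages like NLTK and spaCy automate, here is a minimal standard-library sketch of two common first steps, tokenization and stop-word removal. The tokenizer, the tiny stop-word list, and the sentence are all made up for illustration; real code would use `nltk.word_tokenize` or a spaCy pipeline.

```python
import re

# Tiny illustrative stop-word list (NLTK ships a much fuller one)
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "in"}

def tokenize(text):
    """Lowercase and split text into word tokens; a simplified
    stand-in for nltk.word_tokenize / spaCy's tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def preprocess(text):
    """Tokenize and drop stop words."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]

tokens = preprocess("NLP is the art of teaching machines the structure of language.")
```

Library tokenizers also handle punctuation, contractions, and language-specific rules that this sketch ignores.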
Part 2: Guide to build accurate Time Series Forecasting models - Auto Regressive Models
In the post Part 1: Guide to build accurate Time Series Forecasting models, we discussed the different smoothing models for forecasting with a time series dataset. This post is the continuation of the previous one, where we will talk about another family of models used for time series forecasting, called autoregressive models (AR models). If you are not aware of how time series models work or want to revise different smoothing models such as the simple moving average, simple exponential smoothing, Holt's…
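As a quick taste of the AR family, here is a minimal sketch of fitting an AR(1) model, y_t = c + phi * y_{t-1} + e_t, by ordinary least squares using only the standard library. The series is a made-up noiseless AR(1) process; in practice you would use something like statsmodels' `AutoReg`.

```python
def fit_ar1(series):
    """Estimate c and phi of y_t = c + phi * y_{t-1} by OLS."""
    x = series[:-1]          # lagged values y_{t-1}
    y = series[1:]           # current values y_t
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var          # autoregressive coefficient
    c = my - phi * mx        # intercept
    return c, phi

def forecast(c, phi, last_value, steps):
    """Iterate the fitted recursion forward to forecast future values."""
    preds, y = [], last_value
    for _ in range(steps):
        y = c + phi * y
        preds.append(y)
    return preds

# Illustrative noiseless AR(1) series with c = 1.0, phi = 0.5
data = [10.0]
for _ in range(30):
    data.append(1.0 + 0.5 * data[-1])

c, phi = fit_ar1(data)
preds = forecast(c, phi, data[-1], 3)
```

Because the toy series has no noise, the fit recovers the true parameters almost exactly; real series need diagnostics (stationarity checks, lag selection) covered in the post.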
Part 1: Guide to build accurate Time Series Forecasting models
Time series forecasting is a statistical technique that predicts future values over time based on past observations. Unlike other forms of data analysis, time series forecasting involves analyzing data ordered in time. This means that each observation in the dataset is associated with a specific point in time, such as hourly, daily, weekly, monthly, or yearly. The primary goal of time series forecasting is to identify patterns in the data and use those patterns to predict future values of the…
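One of the simplest smoothing models covered in this family is the simple moving average, which replaces each point with the mean of its last few observations. A minimal sketch with a made-up daily sales series:

```python
def simple_moving_average(series, window):
    """Smooth a series by averaging each point with its window-1 predecessors."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# Made-up daily sales figures for illustration
sales = [10, 12, 11, 13, 15, 14, 16]
sma3 = simple_moving_average(sales, 3)
```

Note the smoothed series is shorter than the input: the first `window - 1` points have no full window of history.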
Decoding the Magic of Probabilistic Graphical Models: A Comprehensive Guide to Understanding and Applying PGMs in Data Science
Probabilistic Graphical Models (PGMs) belong to the generative class of models: they model the relationships between variables and can answer queries even when only partial data is provided. They provide a flexible and intuitive framework for representing probabilistic relationships between the variables present in a dataset. PGMs come in two main flavors: Bayesian Networks and Markov Networks. Bayesian Networks are directed acyclic graphs that represent the probabilistic dependencies…
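To make the "answer queries from partial data" idea concrete, here is a minimal sketch of inference by enumeration in a two-node Bayesian Network, Rain -> WetGrass. The probability tables are made up for illustration; real work would use a PGM library such as pgmpy.

```python
# Conditional probability tables (invented numbers)
P_rain = {True: 0.2, False: 0.8}                 # P(Rain)
P_wet_given_rain = {True: 0.9, False: 0.1}       # P(WetGrass=True | Rain)

# Query: P(Rain=True | WetGrass=True) via Bayes' rule.
# Joint factorizes along the graph: P(Rain, Wet) = P(Rain) * P(Wet | Rain)
numerator = P_rain[True] * P_wet_given_rain[True]
evidence = sum(P_rain[r] * P_wet_given_rain[r] for r in (True, False))
posterior = numerator / evidence
```

Observing wet grass raises the probability of rain from the 0.2 prior to roughly 0.69; larger networks do the same computation over bigger factorized joints.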
Principal Component Analysis Detailed Explanation
Principal Component Analysis (PCA) is a powerful statistical technique for dimensionality reduction and data visualization. PCA allows us to transform high-dimensional data into a lower-dimensional space while retaining most of the original variance in the data. This makes it easier to visualize, analyze and model complex datasets. However, understanding the mathematical concepts and intuition behind PCA can be challenging for beginners. This post will explain Principal Component Analysis,…
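The core mechanics of PCA fit in a few lines for 2-D data: center the points, build the covariance matrix, and take its leading eigenvector as the first principal component. This stdlib-only sketch uses the closed form for a symmetric 2x2 eigenproblem and a made-up dataset; real code would use numpy or `sklearn.decomposition.PCA`.

```python
import math

# Small made-up 2-D dataset
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mx = sum(p[0] for p in data) / n
my = sum(p[1] for p in data) / n
centered = [(x - mx, y - my) for x, y in data]

# Sample covariance matrix [[sxx, sxy], [sxy, syy]] (n-1 divisor)
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Largest eigenvalue of a symmetric 2x2 matrix (closed form)
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = (tr + math.sqrt(tr * tr - 4 * det)) / 2

# Corresponding eigenvector (b, lam - a), normalized to unit length
vx, vy = sxy, lam - sxx
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

explained = lam / tr   # fraction of variance captured by the first component
```

For this toy dataset the first component captures well over 90% of the variance, which is exactly the "keep most of the variance in fewer dimensions" promise described above.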
Unlocking the Power of Naive Bayes Algorithm for Text Classification
Naïve Bayes is a popular algorithm that is widely used in text classification problems. This post will look at example text-related problems and show how the Naïve Bayes algorithm can help solve such problem statements. In general, Naïve Bayes performs well in text classification applications. This article assumes that you have a basic knowledge of how the Naïve Bayes algorithm works. If you don't know the algorithm or want to revise the concept, I recommend you go through the Naïve Bayes Algorithm…
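As a taste of what the post covers, here is a from-scratch sketch of multinomial Naïve Bayes for text, with add-one (Laplace) smoothing. The four training documents and their spam/ham labels are invented; in practice you would use scikit-learn's `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up labeled corpus
train = [("free money win prize", "spam"),
         ("win free lottery prize now", "spam"),
         ("meeting schedule for project", "ham"),
         ("project report due tomorrow", "ham")]

class_docs = defaultdict(int)       # documents per class (for the prior)
word_counts = defaultdict(Counter)  # word frequencies per class (likelihood)
vocab = set()
for text, label in train:
    class_docs[label] += 1
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def predict(text):
    """Pick the class with the highest log prior + log likelihood score."""
    scores = {}
    for label in class_docs:
        score = math.log(class_docs[label] / len(train))   # log prior
        total = sum(word_counts[label].values())
        for word in text.split():
            # add-one smoothing so unseen words never zero out a class
            score += math.log((word_counts[label][word] + 1) /
                              (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Working in log space avoids underflow from multiplying many small probabilities, which is the standard trick in real implementations too.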
Naïve Bayes Algorithm Detailed Explanation
In this post, I will be talking about the Naïve Bayes algorithm, which is a popular machine learning algorithm used for text-based tasks. Naïve Bayes is a probabilistic classifier that returns the probability of a test point belonging to a class rather than just the label of the test point. It is a Bayesian method that combines evidence from data (summarized by the likelihood) with initial beliefs (known as a prior distribution) to produce a posterior probability distribution of the unknown quantity…
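The prior-times-likelihood update described above can be shown with a single Bayes'-rule computation. The numbers below (a condition with 1% prevalence and an imperfect test) are made up purely to illustrate how evidence shifts a prior into a posterior:

```python
# Made-up numbers for a Bayes' rule illustration
prior = 0.01            # P(condition): initial belief
sensitivity = 0.95      # P(positive | condition): likelihood of the evidence
false_positive = 0.05   # P(positive | no condition)

# Total probability of seeing a positive result
evidence = prior * sensitivity + (1 - prior) * false_positive

# Posterior: P(condition | positive)
posterior = prior * sensitivity / evidence
```

Even a fairly accurate test only lifts the posterior to about 16% here, because the 1% prior is so low; Naïve Bayes classifiers perform exactly this kind of update, once per feature.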
Understanding Popular Statistical Tests To Perform Hypothesis Testing Is Not Difficult At All!
When you perform an analysis on a sample, you only get the statistics of the sample. You want to make claims about the entire population using sample statistics. But remember that these are just claims, so you can’t be sure if they’re true. This kind of claim or assumption is called a hypothesis. For example, your hypothesis might be that the average lead content in a food product is less than 2.5 ppm, or the average time to resolve a query at a call center is 6 minutes. Whatever your…
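Taking the lead-content hypothesis above, a one-sample t-test is the usual way to check such a claim from a sample. This sketch computes the t statistic by hand with an invented sample of eight measurements; in practice you would use `scipy.stats.ttest_1samp`.

```python
import math

# Made-up lead measurements (ppm) from a sample of the food product
sample = [2.1, 2.4, 2.3, 2.2, 2.6, 2.0, 2.3, 2.4]
mu0 = 2.5   # hypothesized population mean under H0

n = len(sample)
mean = sum(sample) / n
# sample standard deviation (n-1 divisor)
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
# t statistic: how many standard errors the sample mean sits from mu0
t_stat = (mean - mu0) / (sd / math.sqrt(n))
```

A strongly negative t statistic is evidence that the true mean sits below 2.5 ppm; the post explains how to turn this into a p-value and a decision.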
Explaining ML model results using Cumulative gains and lift curves instead of the ROC curve is much more intuitive
The cumulative response curve (also known as the cumulative gains curve in data science) and the lift curve are two different visuals that provide an effective mechanism to measure the accuracy of a predictive classification model. They are generally preferred when the costs and benefits discussed in the post Using an expected value for the designed Machine Learning solutions are difficult to calculate, but the target variable/class mix is unlikely to change (i.e. the class balance distributions…
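Both curves come from the same simple computation: rank instances by model score, then ask what fraction of all positives falls in each top slice. A minimal sketch with made-up scores and labels:

```python
# (predicted_score, actual_label) pairs from a hypothetical classifier
scored = [(0.95, 1), (0.90, 1), (0.85, 0), (0.80, 1), (0.70, 0),
          (0.60, 1), (0.50, 0), (0.40, 0), (0.30, 0), (0.20, 0)]
scored.sort(key=lambda p: p[0], reverse=True)   # best-scored instances first

total_pos = sum(label for _, label in scored)
n = len(scored)

def cumulative_gain(top_fraction):
    """Fraction of all positives captured in the top slice of instances."""
    k = round(top_fraction * n)
    captured = sum(label for _, label in scored[:k])
    return captured / total_pos

def lift(top_fraction):
    """Gain relative to random targeting (random captures top_fraction)."""
    return cumulative_gain(top_fraction) / top_fraction

gain_20 = cumulative_gain(0.2)   # gains-curve point at the top 20%
lift_20 = lift(0.2)              # lift-curve point at the top 20%
```

Here targeting only the top 20% of instances captures half of all positives, a lift of 2.5 over random; plotting these values across all cut-offs traces the two curves.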