Machine Learning
Interpreting the ROC Curve with the help of an example
Consider a scenario where we have trained a logistic regression classifier to predict whether each of 15 given users will read this post. The trained model outputs a probability for each user, and we want to evaluate the model before deploying it to the production environment. A few ways to evaluate the model are to check the accuracy, confusion matrix,…
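The scenario above can be illustrated with a minimal sketch in plain Python. The 15 labels and probabilities below are hypothetical values made up for illustration, not data from the post, and the helper names `roc_points` and `auc` are assumptions of this sketch rather than any library's API.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs,
    one per distinct score threshold, starting from (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a user is predicted "will read"
    # when the predicted probability is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data for 15 users: 1 = read the post, 0 = did not read it.
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical probabilities, as a logistic regression model might output.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60,
          0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.20]

curve = roc_points(labels, scores)
print("ROC points:", curve)
print("AUC:", auc(curve))
```

An AUC near 1 suggests the model ranks readers above non-readers most of the time, while a value near 0.5 is roughly what random guessing would give.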
Using expected value in designed Machine Learning solutions
How can we use expected value to drive decisions from a designed Machine Learning solution? We often think about how statistics can be used in Machine Learning. The truth is that Machine Learning is built on top of statistics, but we have reached a stage where libraries hide the complex statistics from the Data Scientist. Today, we will try to answer the question of whether expected value helps in driving better decisions from the results that the…
Support Vector Machines Detailed Explanation
Many algorithms, such as logistic regression and Naive Bayes, are used to solve classification problems. Though these algorithms are popular and used across the industry, they often struggle with complex classification tasks like image classification and voice detection. Support Vector Machines (SVMs) are capable of dealing with quite complex problems where models like logistic regression mostly fail. A few of the properties and use cases of the SVM…
Confused between Covariance and Correlation? Me too.
There are two key concepts in statistics, Covariance and Correlation, but what are they about? Most importantly, are they the same, or do they differ? Are they related to each other? What are the different types of correlation? These are the questions I had when I started learning statistics. In this post, I have attempted to answer them. Let's go straight into it and try to understand Covariance and Correlation.…
SQLAlchemy - ORM for Python
In this post, we will discuss the SQLAlchemy package, which is used for connecting to relational databases such as SQLite, Postgres, MySQL, and many more. Some of you might wonder why we need a package to connect to a database when we can connect to it directly. SQLAlchemy is an ORM that simplifies our development effort and is very useful. You will understand the advantage of using SQLAlchemy by the end of this post. ORM (Object-Relational Mapping) is always useful and…
XGBoost Detailed Explanation
In the previous post, Boosting Algorithms explained in detail, we discussed boosting algorithms and how they work. In this post, let's talk about one more popular algorithm in the boosting category: XGBoost. XGBoost (Extreme Gradient Boosting) is one of the most popular Gradient Boosting algorithms. It is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. NOTE: This Gradient Boosting tree algorithm is the same as the…
A complete guide to the Probability Distribution
We are all aware that probability is a measure of the likelihood of an event occurring in an experiment. A probability ranges from 0, indicating an impossible event, to 1, indicating a certain event. In our post, Does probability really help businesses?, we discussed how statistics can help us solve problems and make profitable decisions. Next, let's talk about how probability distributions can help us further demystify statistics.…
Build flexible and accurate clusters with Gaussian Mixture Models
In the post, Unsupervised Learning k-means clustering algorithm in Python, we discussed clustering and covered k-means, which is an unsupervised algorithm. In this post, we will look at Gaussian Mixture Models (GMM), another algorithm used to solve clustering problems. We will also talk about the limitations of the k-means algorithm and how GMM can help to resolve them. Like k-means, GMM is also categorized as an unsupervised algorithm but…
Lasso and Ridge Regression Detailed Explanation
In Linear Regression, we saw that the complexity of the model is not controlled. Linear Regression only tries to minimize the error (e.g. MSE) and may produce arbitrarily large coefficients. The model we develop should be as simple as possible, but not simpler. Regularization is a process used to create an optimally complex model, i.e. a model that is as simple as possible while still performing well on the training data. As we can see from the diagram shown above our model…
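To make the idea of regularization concrete, here is a minimal sketch for the simplest possible case: a single feature with no intercept. The data points, the penalty strength `alpha`, and the function names are hypothetical choices for illustration, and the penalty forms noted in the comments are one common convention, not necessarily the one used in the post.

```python
def ols_slope(xs, ys):
    """Plain least squares: minimizes 0.5 * sum((y - w*x)^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_slope(xs, ys, alpha):
    """Ridge: minimizes 0.5 * sum((y - w*x)^2) + 0.5 * alpha * w^2.
    The L2 penalty shrinks the coefficient toward zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def lasso_slope(xs, ys, alpha):
    """Lasso: minimizes 0.5 * sum((y - w*x)^2) + alpha * |w|.
    The L1 penalty soft-thresholds the coefficient, and a large
    enough alpha sets it exactly to zero."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxy > alpha:
        return (sxy - alpha) / sxx
    if sxy < -alpha:
        return (sxy + alpha) / sxx
    return 0.0


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

print("OLS slope:  ", ols_slope(xs, ys))
print("Ridge slope:", ridge_slope(xs, ys, alpha=10.0))
print("Lasso slope:", lasso_slope(xs, ys, alpha=10.0))
print("Lasso slope with a large alpha:", lasso_slope(xs, ys, alpha=100.0))
```

In this sketch both penalties pull the coefficient below the unregularized value, and only the Lasso penalty can drive it all the way to zero.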