Let's understand the role of statistical significance tests, and in what situations you would want to use one in the context of a machine learning project workflow.
🔹Role of Significance Test in Machine Learning
We come across claims all the time. In newspapers and across the media, you will see claims being made about almost anything. Even in our housing prices problem, you might observe certain things in the data and make a claim. One such claim could be: housing prices in locality A are greater than in locality B.
But suppose you are making this claim just by looking at the data. How are you making it? You could simply take the mean price of all the houses in locality A and the mean price of all the houses in locality B.
And if the mean price of A is greater than that of B, you claim that the mean price of houses in A is greater than in B. Based purely on the sample data you have at hand, this might seem like a meaningful claim to make.
But is this sufficient to substantiate the claim? Is the difference statistically significant? What do I mean by that? Let's take this case. Say we have data for locality A and data for locality B.
Let's suppose we took a sample: the sample size n is, say, 21 houses for locality A and 24 houses for locality B. The mean price for the 21 houses in A is, let's say, $110,000, whereas it is $108,000 for the 24 houses in B. Based on this data, we are claiming that the mean price in locality A is greater than in locality B. Now, look at the difference between $110,000 and $108,000.
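To make the question concrete, here is a minimal sketch that plugs the numbers above into Welch's t-statistic, one common way to compare two sample means. The means and sample sizes come from the example; the standard deviation of $15,000 per locality is purely an assumption for illustration, since the example does not give one, and `welch_t` is a hypothetical helper name.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t-statistic and its approximate degrees of freedom."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, df

# Means and sample sizes are from the example above.
# ASSUMED_SD is NOT from the example; it is an illustrative guess.
ASSUMED_SD = 15_000
t_stat, df = welch_t(110_000, ASSUMED_SD, 21, 108_000, ASSUMED_SD, 24)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Under that assumed spread, the t-statistic comes out around 0.45, far below the roughly 2.0 needed for significance at the 5% level with about 42 degrees of freedom. With a much smaller spread the same $2,000 gap could be significant, which is exactly why the means alone cannot settle the question.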
Is this difference significant enough to make such a claim? That is one thing to think about. The second thing is: is the sample size sufficient to make this claim? Suppose we sample nine more houses in locality A, for a total of 30 houses. The mean for the 30 houses could change to something like $101,000.
If the remaining nine houses that we surveyed have lower prices, there is a very good chance that 110k turns into 108k. Likewise, if you do some more sampling for locality B, you might find that the mean house price changes to, say, 115k. This is also very much possible.
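A small simulation can show how much a sample mean moves when nine more houses are added. This is only a sketch: the prices are randomly generated, and the population mean of $108,000 and spread of $15,000 are assumptions made up for the illustration, as is the helper name `draw_prices`.

```python
import random
import statistics

def draw_prices(n, rng, mean=108_000, sd=15_000):
    """Draw n hypothetical house prices from an assumed normal population."""
    return [rng.gauss(mean, sd) for _ in range(n)]

rng = random.Random(42)           # fixed seed so the run is repeatable

first_21 = draw_prices(21, rng)   # the original sample of 21 houses
extra_9 = draw_prices(9, rng)     # nine more houses surveyed later
all_30 = first_21 + extra_9

mean_21 = statistics.mean(first_21)
mean_30 = statistics.mean(all_30)

print(f"mean of first 21 houses: {mean_21:,.0f}")
print(f"mean of all 30 houses:   {mean_30:,.0f}")
print(f"shift from 9 new houses: {mean_30 - mean_21:,.0f}")
```

Changing the seed gives a different shift each time, which is the point: the sample mean is itself a random quantity, so a gap of a couple of thousand dollars between two samples can appear or disappear by chance.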
So taking the mean of the data sample you have might not always be sufficient to conclude on that claim. Before making a claim, especially if you are going to present it to a larger audience or make business decisions based on it, you need to substantiate it with a suitable statistical significance test.
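As one example of such a test, here is a sketch of a two-sided permutation test for the difference in means, written with the standard library only. It is just one of several suitable tests, not necessarily the one used in this course. The price data is synthetic, and the population parameters and the function name `permutation_p_value` are assumptions for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(sample_a, sample_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Shuffles the locality labels many times and counts how often a
    difference at least as large as the observed one arises by chance.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Synthetic prices: the means and spread below are assumptions.
data_rng = random.Random(1)
locality_a = [data_rng.gauss(110_000, 15_000) for _ in range(21)]
locality_b = [data_rng.gauss(108_000, 15_000) for _ in range(24)]

p_value = permutation_p_value(locality_a, locality_b)
print(f"p-value: {p_value:.3f}")
```

A small p-value (conventionally below 0.05) would suggest the gap between the localities is unlikely to be due to chance alone; a large one means the data at hand does not support the claim.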
That is always standard practice before you publish. Now, when do you use a statistical significance test in the context of a machine learning project workflow? It is actually a form of hypothesis testing, and it is used in the Discover phase, where you do the exploratory data analysis. This is a step you typically perform after you have collected the data and organized it.
After you have prepared your data, you do the exploratory data analysis. This happens just after you have collected and prepared the data, and just before you go into model building.
Before you enter model building, you want to find all the insights and information you need about the data set.
That is when you want to perform these tests, before you make any solid claim about the insights you have found in your data.