next video: video 13.2. multiple regression introduction
prior video: video 12.4. correlation vs causation
closed captioning text:
Now, I will talk about a statistical test that is very closely related to correlation, but you can use it for prediction. Overall, the strategy is called "regression." We are going to start with "simple regression," but after we are done with simple regression, we will talk about what is called "multiple regression." 
Multiple regression is one of the most useful statistical techniques there is. It is used in the business world, it is used by scientists, it is a really powerful technique, but we will start out with the simple version. You use simple regression when you have one predictor variable and one outcome variable. The thing that makes it simple is that you have one predictor variable. If you had more than one predictor, you would be doing multiple regression. The outcome variable should be interval or ratio scale of measurement. The predictor variable does not need to be; it can actually be any scale of measurement, and it would work out okay. 
For example, we might think that how much students study will predict exam grades. So, students who study more hours for an exam would get better grades. The null hypothesis here would be that there is nothing happening. In that case, hours studying is unrelated to (or does not predict) exam grades. I will write: studying does not predict exam grades. Usually I would have written this null hypothesis with population parameter symbols, but since I have not introduced enough material for regression yet, we will come back to what these null hypotheses look like in a little bit. 
Let's consider an example. Let's say we have asked students, ten of them, how many hours they studied for the exam. So we will put that on the bottom and call it X. This will be the hours studied, the number of hours they studied for the exam, and we will call this Y, the outcome, which is exam scores. Let's say our results fit our intuitions, where the more students studied, the better exam grades they got. So there are ten students: one, two, three, four, five, six, seven, eight, nine, ten. This is a student who studied a lot. Let's say the exam scores go from 0 to 100%, so up here is good; this is about 100 here. We will say the number of hours goes from 0 to 10. This is a student who studied, I don't know, nine and a half hours, and got a 99, or 100%. They did really well on the exam. This person down here studied maybe one hour and failed the exam. 
So far this looks a lot like correlation, and we could actually calculate the correlation coefficient of this data, but now we want to switch from just telling whether there is a relationship between the two to using regression. Regression is going to do a little bit more than correlation can, because it is going to allow us to predict people's scores on exams. If this is what our data look like, and we run our data through our statistical program, what it is going to do is tell us the line that best fits through all of that data. This is the "best fit line", also known as the "regression line". Once we have this line, if an eleventh student shows up and they have studied for 5 hours, then given the data that we had from our first 10 students, we can predict that student is going to get about, maybe, an 80%. That is how regression is going to allow us to predict future observations based on some previous observations that we already have. 
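The idea of fitting a best-fit line and then predicting a new student's score can be sketched in a few lines of Python. The hours/scores numbers below are made up for illustration (they are not the exact points sketched on the board), and the fit uses the ordinary least-squares formulas:

```python
# Least-squares fit of a line through (hours studied, exam score) pairs.
# The data are hypothetical, chosen so that more studying goes with higher scores.
hours  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5]
scores = [45, 52, 60, 65, 72, 78, 83, 88, 95, 99]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) \
    / sum((x - mean_x) ** 2 for x in hours)
# intercept a = y_bar - b * x_bar
a = mean_y - b * mean_x

# Predict the exam score for an eleventh student who studied 5 hours.
y_hat = a + b * 5
print(f"predicted score after 5 hours: {y_hat:.1f}%")
```

A statistical program does exactly this behind the scenes: it finds the one line that minimizes the squared vertical distances to all the data points, then uses that line to predict new observations.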
You probably remember from high school the equation Y = mX + B. This is the equation for a line, and in regression we have a line. Statisticians actually use this same formula, but of course, to make things more complicated, they re-label everything. Back in the day, m was the slope. Remember, that was the "rise over the run." The rise is how much the change is this way, for each change going across. That was the slope, and B was the intercept. That is where the line crossed over here. Right, but now that we are in statistics (I think that might have been algebra), what we are going to do is switch this up a little bit, but the ideas are exactly the same. We call this Y-hat, cute little hat there. That means it is a predicted value rather than an observed value. It is going to equal "a + bX". So "a" is now the intercept, and "b" is now the slope. But otherwise they are exactly the same as they used to be. So this is still rise over run, and this is still the intercept at the vertical axis. So whenever you do a regression analysis, the result is one of these prediction equations.
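The relabeled prediction equation can be written directly as a tiny function. The intercept of 40 and slope of 6 below are hypothetical numbers chosen just to show the mechanics:

```python
def predict(a, b, x):
    """Regression prediction equation: y-hat = a + b*x,
    where a is the intercept (where the line crosses the vertical axis)
    and b is the slope ("rise over run")."""
    return a + b * x

# With a hypothetical intercept of 40 and a slope of 6 points per hour,
# a student who studies 5 hours is predicted to score:
print(predict(40, 6, 5))  # 70
```

Reading the slope in plain language: each extra hour of studying is predicted to add 6 points to the exam score, starting from a baseline of 40 for a student who does not study at all.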
[closed captioning text continued in the comments]