Regression Towards the Mean is a statistical phenomenon well known to researchers. The average person would not be expected to know about it, and researchers often treat familiarity with it as a sign of how well informed someone is about statistics. Knowing about Regression Towards the Mean changes the way you interpret data.
Regression Towards the Mean occurs where individual measures that are in the upper or lower tails of a bell curve show up closer to the mean (i.e. average) when retested. This often occurs in achievement tests. Vendors of educational programs know this and will frequently guarantee that if you use their program on your lowest scoring students they will show a large gain in scores when retested. Regression Towards the Mean is the real guarantee. Conversely, your highest scoring students will tend to show lower scores when retested.
Test scores are taken by the average person to be an accurate measurement of knowledge. But to researchers a test score is called an "observed" score, not to be confused with the "true" score. The "true" score is a hypothetical quantity: the knowledge actually possessed by the test taker. The observed score is what a test produces, and it is likely not an accurate measurement of that knowledge.
The knowledge actually possessed by any test taker is a granular subset of what was taught. Students don't remember everything they were taught. They remember bits and pieces of often disconnected things they were taught. The questions that make up a test are another granular subset of what was taught (or at least should be!) because tests cannot ask about everything that was taught.
Tests ask questions about a subset of facts that were presumed to be taught and these generally are not the most important things that are taught. If tests asked questions about the most important things, then nearly all of the students would answer them correctly and people would call the tests "too easy." So tests are designed to ask about more peripheral things that will produce a bell curve of results.
So the "observed" test score represents where the granular subset of what the test taker knows matches the granular subset that the test asks, plus another subset that results from guesses where no match was found. There is no way to tell whether questions answered correctly were actually known or guessed at. There is no way to tell whether questions answered incorrectly were misunderstood, were missed because of absences or inattentiveness, or covered material that was never taught.
What we really want to know is the size of the student's granular subset of knowledge: what we call the "true" score. We can't actually know the "true" score. We can only measure an observed score that represents the extent to which what was asked matched what was known, plus a guess factor. What is important to recognize is that each student's observed score is not a direct measure of knowledge, but rather a chance intersection between what they know and what was asked, plus a guess factor.
Therefore, all observed scores represent the "true" score plus or minus the combined chances of the intersection between what was known and what was tested plus guessing. Most observed scores will be clustered around the mean in a bell-shaped pattern with fewer scores farther from the mean. Every true score will be mismeasured by some combination of chance from intersection and guessing resulting in observed scores.
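The relationship described above is usually written in the notation of classical test theory. The symbols are not in the original text and are introduced here only for illustration: X is the observed score, T the true score, and E the combined chance component (intersection plus guessing). The variance line assumes that T and E are uncorrelated.

```latex
\begin{align*}
X &= T + E, & \mathbb{E}[E] &= 0 \\
\operatorname{Var}(X) &= \operatorname{Var}(T) + \operatorname{Var}(E) \\
\rho &= \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}
\end{align*}
```

Here \rho is the test's reliability: the share of the spread in observed scores that comes from real differences in knowledge rather than from chance.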
The true score may be above or below the observed score, but at the two extremes one direction is far more likely. Very low observed scores probably have true scores above them, simply because there are few students with even lower true scores whom chance could have moved upward. Thus most of the observed low scores likely belong to higher true scores that were lowered by chance. Similarly, very high observed scores probably have true scores below them, simply because there are few true scores that are higher still.
Regression Towards the Mean simply means that when you retest students, the farther a score was from the mean, the less likely it is that the same combination of chance will move it by the same amount again. Thus on any retest, the test scores of students who were at the extremes tend to be closer to the mean ("regress towards the mean") purely by chance, while a few students with previously observed scores closer to the mean will now appear at the extremes. This is important to know whenever you retest students who have previously scored high or low on a test.
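The claim about the tails can be checked with a small simulation. This is a minimal sketch under assumed numbers, not a model of any real test: true scores are drawn from a bell curve with mean 70 and standard deviation 10, and the chance component is drawn with standard deviation 6. All names and constants are hypothetical.

```python
import random

random.seed(0)

N = 20_000
TRUE_MEAN, TRUE_SD = 70.0, 10.0  # assumed distribution of true scores
ERROR_SD = 6.0                   # assumed size of the chance component

true_scores = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
observed = [t + random.gauss(0.0, ERROR_SD) for t in true_scores]

# Pair each observed score with its true score and sort by observed score.
pairs = sorted(zip(observed, true_scores))
k = N // 20  # size of each 5 percent tail
bottom, top = pairs[:k], pairs[-k:]

# Share of the lowest observed scores whose true score is higher than observed.
share_low_underrated = sum(t > o for o, t in bottom) / k
# Share of the highest observed scores whose true score is lower than observed.
share_high_overrated = sum(t < o for o, t in top) / k

print(f"lowest 5%:  {share_low_underrated:.0%} have a true score above the observed score")
print(f"highest 5%: {share_high_overrated:.0%} have a true score below the observed score")
```

Under these assumptions a clear majority of the students in each tail were pushed there by chance, in the direction the text describes.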
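A test/retest simulation shows the effect directly. The sketch below assumes that each student's knowledge is unchanged between the two tests and that only the chance component is redrawn. The distributions and constants are illustrative assumptions, not values from any real assessment.

```python
import random
from statistics import mean

random.seed(1)

N = 20_000
# Assumed bell curve of true scores (mean 70, standard deviation 10).
true_scores = [random.gauss(70.0, 10.0) for _ in range(N)]
# Same knowledge on both occasions; only the chance component differs.
test1 = [t + random.gauss(0.0, 6.0) for t in true_scores]
test2 = [t + random.gauss(0.0, 6.0) for t in true_scores]

# Rank students by their first score and pick three groups of equal size.
order = sorted(range(N), key=lambda i: test1[i])
k = N // 10
groups = {
    "bottom 10%": order[:k],
    "middle 10%": order[N // 2 - k // 2 : N // 2 + k // 2],
    "top 10%": order[-k:],
}

# Average change from the first test to the retest for each group.
avg_change = {
    name: mean(test2[i] - test1[i] for i in members)
    for name, members in groups.items()
}

for name, change in avg_change.items():
    print(f"{name}: average change on retest = {change:+.1f} points")
```

Nothing was taught between the two tests in this sketch, yet the bottom group gains and the top group loses, while the middle group barely moves.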
It is the reason that proper research protocols do not utilize a simple test/retest program, but instead randomly assign students to an "experimental" group and a "control" group to see if there is any difference in the scores of these groups on the retest. Thus, to keep Regression Towards the Mean from being mistaken for a program effect, you divide students into two groups by random assignment and use the vendor's program on one group and your usual procedures on the other ("control") group, then compare the results. Both groups regress towards the mean by about the same amount, so any remaining difference between them can be attributed to the program.
In education we rarely assign students to teachers randomly. As a consequence there can be subtle differences that influence which students are assigned to which teachers. Where this becomes crucial is with the assignment of lower scoring students.
Students who do well in a subject tend to have a broad enough granular subset of knowledge to match well with nearly any subset asked by the test. The choice of questions asked tends to be much less important. In addition, when these students guess it frequently is between only two possible answers rather than not having a clue among four or five choices. The high end students have a much higher probability of the test questions corresponding to what they know. They thus make fewer guesses, and when they do guess they have roughly a 50 percent chance of being correct.
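The value of a control group can be illustrated with a simulation in which the vendor's program is assumed to do nothing at all (`PROGRAM_EFFECT = 0.0`). This is a sketch under assumed distributions; every name and constant in it is hypothetical.

```python
import random
from statistics import mean

random.seed(2)

N = 40_000
PROGRAM_EFFECT = 0.0  # assumption: the program adds nothing to true scores

true_scores = [random.gauss(70.0, 10.0) for _ in range(N)]
pretest = [t + random.gauss(0.0, 6.0) for t in true_scores]

# Select the lowest-scoring 20 percent, as a vendor guarantee might require.
low = sorted(range(N), key=lambda i: pretest[i])[: N // 5]

# Random assignment: shuffle, then split into two equal halves.
random.shuffle(low)
half = len(low) // 2
treated, control = low[:half], low[half:]


def retest(i, boost):
    """Retest score: same true score, optional program boost, fresh chance."""
    return true_scores[i] + boost + random.gauss(0.0, 6.0)


gain_treated = mean(retest(i, PROGRAM_EFFECT) - pretest[i] for i in treated)
gain_control = mean(retest(i, 0.0) - pretest[i] for i in control)
estimated_effect = gain_treated - gain_control

print(f"gain with program:    {gain_treated:+.1f}")
print(f"gain without program: {gain_control:+.1f}")
print(f"estimated effect:     {estimated_effect:+.1f}")
```

A simple test/retest design would report only the first line and credit the program with the gain. The comparison with the control group shows that, in this sketch, the gain appears with or without the program.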
The students with scores at the lower end of the bell curve could have their strengths completely missed by the choice of questions asked, and when they guess they have only a 25 percent chance of being correct. The observed scores of those with lesser knowledge will depend very much on chance overlaps between their knowledge subset and that of the test, and their guessing success could vary widely by chance.
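The arithmetic of guessing can be made concrete with a binomial model. The two students below are hypothetical: on a 40-question test, the stronger student is assumed to guess on 5 questions with a 50 percent chance each, and the weaker student on 25 questions with a 25 percent chance each. These counts are illustrative assumptions, not data.

```python
from math import sqrt


def guess_stats(n_guessed, p_correct):
    """Mean and standard deviation of points earned by guessing (binomial model)."""
    mean_points = n_guessed * p_correct
    sd_points = sqrt(n_guessed * p_correct * (1 - p_correct))
    return mean_points, sd_points


# Hypothetical stronger student: few guesses, each between two options.
strong_mean, strong_sd = guess_stats(n_guessed=5, p_correct=0.5)
# Hypothetical weaker student: many guesses, each among four options.
weak_mean, weak_sd = guess_stats(n_guessed=25, p_correct=0.25)

print(f"stronger student: {strong_mean:.2f} points from guessing, sd {strong_sd:.2f}")
print(f"weaker student:   {weak_mean:.2f} points from guessing, sd {weak_sd:.2f}")
```

In this model the wider swing for the weaker student comes mainly from the number of questions guessed, since far more of that student's score is left to chance.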
As a consequence, students with scores at the top end of the bell curve are less influenced by Regression Towards the Mean than students with low scores. A student with a high observed score probably has a nearby true score and thus we are seeing a fairly accurate measure of the student's knowledge. High scoring students will tend to have lower scores on a retest due to Regression Towards the Mean, but not dramatically lower. Thus testing high scoring students for scholarships and admission to programs can be expected to be fairly accurate.
But students with low observed scores probably have true scores that are much closer to the mean. A large combined chance of intersection and guessing was likely what produced their low score, so a retest would produce large score gains. In the retest it is unlikely that the same large combination of chance would recur, and thus a higher test score closer to the mean would result.
Regression Towards the Mean is also one reason why simple "value added" measures can be problematic. When you are talking about students whose base scores were in the upper or lower ranges, you can expect that Regression Towards the Mean will show up in the next test score. But the next test scores will also have some students who scored close to the mean on the base test now having a large score gain or decline from large chance events that put them in the tail. Your overall bell curve may look the same on both tests, but different students will appear in the tails from test to test.
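The turnover in the tails can also be sketched in a simulation. As before, the distributions are assumptions chosen for illustration, and true scores are held fixed between the base test and the follow-up so that any movement comes from chance alone.

```python
import random
from statistics import stdev

random.seed(3)

N = 20_000
true_scores = [random.gauss(70.0, 10.0) for _ in range(N)]
base = [t + random.gauss(0.0, 6.0) for t in true_scores]
follow_up = [t + random.gauss(0.0, 6.0) for t in true_scores]


def bottom_decile(scores):
    """Indices of the students in the lowest 10 percent of the given scores."""
    k = len(scores) // 10
    return set(sorted(range(len(scores)), key=lambda i: scores[i])[:k])


low_base = bottom_decile(base)
low_follow = bottom_decile(follow_up)

# Share of the base-test bottom decile that is still in the bottom decile later.
repeat_share = len(low_base & low_follow) / len(low_base)
spread_base, spread_follow = stdev(base), stdev(follow_up)

print(f"spread of scores: base {spread_base:.1f}, follow-up {spread_follow:.1f}")
print(f"students in the bottom 10% on both tests: {repeat_share:.0%}")
```

The two score distributions have nearly the same spread, yet a substantial share of the bottom group is made up of different students each time. A simple gain score computed for either group would mix that chance movement with any real change.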
In other words, the scores that students typically receive on tests represent gross approximations of their actual ability.