In the simple experiment, you randomly assign participants to two groups. Nothing can make your two groups identical, but random assignment ensures that the groups do not differ in any systematic way. Any differences between the groups at the end of the experiment should be due to only two things:
1. The one systematic difference between the groups (the treatment [independent variable]), and
2. Unsystematic, chance differences (random error).
We can use statistics to factor out the effects of chance because, in the long run, chance conforms to certain rules. By taking advantage of what statisticians have learned about those rules, we can determine whether the difference between the groups is too large to be due to chance alone.
The logic statistics uses is similar to the logic you would use to decide whether a coin was biased. If a coin came up "heads" 10 out of 10 times, you would say the coin was biased because that is very different from getting 5 "heads" and 5 "tails." If another coin came up "heads" 600 out of 1,000 times, you would also say that coin was biased because, with so many flips, 60% seems reliably different from 50%. However, if a coin came up "heads" 6 out of 10 times, you would not claim it was biased: Although it came up "heads" 60% of the time, with such a small sample size, 60% does not seem reliably different from 50%. So, you are probably not surprised that, to determine whether the treatment had an effect, statistics looks at two things you would look at when deciding whether a coin was biased: (1) how big the difference was between the groups (the bigger the difference, the less likely it is to be due to chance alone) and (2) how many participants were in each group (if there are only a few participants, then even a fairly large difference between the groups could easily be due to chance).
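If you want to check that reasoning yourself, here is a minimal sketch (assuming Python with the scipy library; the numbers are the coin results above) that asks how likely each result would be if the coin were actually fair:

```python
# Sketch: how surprising is each coin result if the coin is actually fair?
# binomtest gives the two-sided probability of a result at least this
# extreme when the true chance of "heads" is .5.
from scipy.stats import binomtest

for heads, flips in [(10, 10), (6, 10), (600, 1000)]:
    p = binomtest(heads, flips, p=0.5).pvalue
    print(f"{heads}/{flips} heads: p = {p:.4f}")

# 10/10 heads and 600/1000 heads are very unlikely for a fair coin
# (p well below .01), but 6/10 heads is quite likely (p around .75),
# so only the first two results give us grounds to call the coin biased.
```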
When comparing two groups, statistics also looks at a third thing: how much variability there is within the groups. In some studies, all the participants in the control group will have similar scores--and all the experimental participants will also score very similarly. For example, all the control group participants may score around 5 and all the experimental participants may score around 10. In other studies, the scores within each group may vary wildly. The cause of variation within a group is random error. This is most obvious when you look at a control group: If scores vary within the control group, that variation cannot be due to the treatment because none of those participants is getting the treatment. Instead, the variation within the group is due to uncontrolled factors--factors that, thanks to random assignment, are random error. Note also that differences among scores within the treatment group can't be due to the treatment because those participants are all getting the same treatment. That is, if one participant getting the treatment scores differently from another participant getting the same treatment, we can't say that the treatment caused the two participants to behave differently.
If there are big differences among scores within a group, random error (e.g., individual differences between participants, lack of standardization, unreliable measures) is having a big effect on individual scores. If random error is having a big effect on individual scores, could random error alone be causing a relatively big difference between the average score of our control group and the average score of our experimental group? To find out, statistical tests combine their estimate of how much random error is affecting individual scores (an estimate based on within-group differences) with the number of scores that make up each average (the more scores making up an average, the less that average will be affected by random error). After getting an estimate of how much the two averages could differ by chance alone, the statistical test compares that estimate to the observed difference between the means. Then, a decision is made about whether the difference between the means is statistically significant--that is, reliable and unlikely to be due to chance alone.
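To see the role of within-group variability concretely, here is a small sketch (again assuming Python with scipy; the scores are hypothetical). Both comparisons have the same 5-point difference between the group means, but only the comparison with little within-group variability comes out statistically significant:

```python
# Sketch: same difference between means, different within-group variability.
# ttest_ind estimates how big a difference between two group means could be
# due to chance alone, given the within-group variability and the number of
# scores in each group, and compares that estimate to the observed difference.
from scipy.stats import ttest_ind

# Little within-group variability: control scores cluster around 5,
# experimental scores cluster around 10.
control_low = [4, 5, 5, 6, 5, 5]
treat_low   = [9, 10, 10, 11, 10, 10]

# Lots of within-group variability: same group means (5 and 10),
# but the scores within each group vary wildly.
control_high = [1, 12, 0, 9, 3, 5]
treat_high   = [2, 19, 4, 16, 1, 18]

for label, control, treat in [("low variability", control_low, treat_low),
                              ("high variability", control_high, treat_high)]:
    result = ttest_ind(treat, control)
    print(f"{label}: t = {result.statistic:.2f}, p = {result.pvalue:.3f}")

# With little within-group variability, the 5-point difference is far too big
# to blame on chance alone (p < .05). With lots of within-group variability,
# the same 5-point difference could easily be due to chance alone (p > .05).
```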
To be more specific, you know that your experimental and control groups may differ for two reasons: the treatment and random error.
Now that we know how random error could affect a sample mean, we need to consider what could cause two sample means from the same population to differ from each other. Even if the treatment has no effect, the control group mean might differ from the experimental group mean for three reasons:
To be statistically significant, the actual difference between the means must be bigger than the standard error of the difference. How much bigger? That will depend on your significance level and the number of participants in your experiment. Usually, the difference between the means must be about twice as big as the standard error of the difference. To get the exact value, you need to know the degrees of freedom (the number of participants minus 2) and then look at a t table, such as Table 1 in Appendix F of your text (see page 680). After calculating t and comparing it to the value in the t table, we decide whether our results are statistically significant.
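The following sketch walks through that arithmetic step by step on hypothetical scores (assuming Python with scipy). Instead of looking up the critical value in Table 1 of Appendix F, it asks scipy's t distribution for the same number:

```python
# Sketch of the t test's arithmetic on hypothetical scores. The t table
# lookup is replaced by scipy's t distribution, but the logic is the same:
# is the difference between the means big enough, relative to the standard
# error of the difference, to be called statistically significant?
from math import sqrt
from scipy.stats import t as t_dist

control      = [4, 5, 5, 6, 5, 5]
experimental = [9, 10, 10, 11, 10, 10]

n1, n2 = len(experimental), len(control)
mean_diff = sum(experimental) / n1 - sum(control) / n2

def ss(scores):
    """Sum of squared deviations of scores from their own group mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

# Pooled estimate of how much random error affects individual scores,
# turned into the standard error of the difference between the means.
pooled_var = (ss(experimental) + ss(control)) / (n1 + n2 - 2)
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))

t_obs = mean_diff / se_diff
df = n1 + n2 - 2                          # number of participants minus 2
t_crit = t_dist.ppf(1 - 0.05 / 2, df)     # two-tailed critical t at the .05 level

print(f"t = {t_obs:.2f}, critical t({df}) = {t_crit:.2f}")
print("statistically significant" if abs(t_obs) > t_crit else "not statistically significant")
```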
Unfortunately, the decision from a statistical test may be wrong. One problem is that a Type 1 error can be made--deciding that a difference between our groups is due to the treatment when the difference is due entirely to chance. Fortunately, we can decide what risk of a Type 1 error we are going to take. If we choose a .05 significance level, there is only a 5% risk of making a Type 1 error; if we choose a .01 significance level, there is only a 1% risk of making a Type 1 error.
You might wonder what is really happening when we choose a smaller risk of making a Type 1 error. What we're doing is requiring a bigger difference between our groups before we declare that difference "statistically significant." Thus, a difference between the experimental group and the control group that would have been big enough to be statistically significant at the .05 level might not be big enough to be significant at the .01 level.
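You can see that trade-off by comparing the critical values of t at the two significance levels (a sketch assuming Python with scipy and, for illustration, 10 degrees of freedom):

```python
# Sketch: a smaller risk of a Type 1 error demands a bigger t (and therefore
# a bigger difference between the means) before we call the result significant.
from scipy.stats import t as t_dist

df = 10                                     # e.g., 12 participants minus 2
for alpha in (0.05, 0.01):
    t_crit = t_dist.ppf(1 - alpha / 2, df)  # two-tailed critical value
    print(f"significance level {alpha}: need |t| > {t_crit:.2f}")

# significance level 0.05: need |t| > 2.23
# significance level 0.01: need |t| > 3.17
# A difference with t between 2.23 and 3.17 would be significant at the .05
# level but not at the .01 level.
```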
You might wonder why we don't just set our risk of making a Type 1 error at a very low level, such as .001. The problem is that by setting our risk of a Type 1 (false alarm) error very low, we increase our risk of a Type 2 error--failing to find a real treatment effect. Usually, your risk of making a Type 2 error is much greater than your risk of making a Type 1 error. You can reduce your risk of making a Type 2 error by:
You have seen that we have to use statistics to determine whether the difference between our groups is too big to be due to chance alone. More importantly, you have seen that statistics affects how we design and conduct our studies. Because of statistics, we should