In the simple experiment, you **randomly assign** participants to two groups. Nothing could make your two groups identical, but random assignment makes sure that the groups do not differ in any systematic way. The differences between the groups at the end of the experiment should be due to only two things:

1. the one systematic difference between the groups (the treatment, that is, the independent variable) and

2. unsystematic, chance differences (random error).

We can use statistics to factor out the effects of chance because chance conforms to certain rules, most of the time. By taking advantage of what statisticians have learned about those rules, we can determine whether the difference between the groups is too large to be due to chance alone.

The logic statistics uses is similar to the logic you would use to decide whether a coin was biased. If a coin came up "heads" 10 out of 10 times, you would say the coin was biased because that is very different from getting 5 "heads" and 5 "tails." If another coin came up "heads" 600 out of 1,000 times, you would also say that coin was biased because, with so many flips, 60% seems reliably different from 50%. However, if a coin came up "heads" 6 out of 10 times, you would not claim it was biased: Although it came up "heads" 60% of the time, with such a small sample size, 60% does not seem reliably different from 50%. So, you are probably not surprised that, to determine whether the treatment had an effect, statistics looks at two things that you would look at when deciding whether a coin was biased: (1) how big the difference was between the groups (the bigger the difference, the less likely it is to be due to chance alone) and (2) how many participants were in each group (if there are only a few participants, then even a fairly large difference between the groups could easily be due to chance).
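To make the coin logic concrete, here is a minimal sketch that works out, for a fair coin, the probability of a result at least as far from 50% "heads" as the one observed. The function name `two_sided_p` is hypothetical (it is not from your text), and the sketch assumes an exact binomial calculation rather than any particular test your text may describe.

```python
from math import comb

def two_sided_p(heads: int, flips: int) -> float:
    """Probability that a FAIR coin gives a result at least as far
    from 50% heads (in either direction) as the observed result."""
    observed_distance = abs(heads - flips / 2)
    # Count every possible outcome that is at least as extreme as ours.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_distance
    )
    # Each of the 2**flips sequences is equally likely for a fair coin.
    return extreme_outcomes / 2 ** flips

for heads, flips in [(10, 10), (600, 1000), (6, 10)]:
    print(f"{heads}/{flips} heads: p = {two_sided_p(heads, flips):.4g}")
```

The 6 out of 10 case gives a probability of about .75, so a fair coin would produce a result that lopsided most of the time. The 10 out of 10 and 600 out of 1,000 cases both give probabilities far below .05, even though 600 out of 1,000 is the same 60% as 6 out of 10.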

When comparing two groups, statistics also looks at a third thing: how much
variability there is within the groups. In some studies, all the participants in
the control group will have similar scores--and all the experimental
participants will also score very similarly. For example, all the control group
participants may score around 5 and all the experimental participants may score
around 10. In other studies, the scores within each group may vary wildly. The
cause of variation **within** a group is **random error**. This is most obvious when you
look at a control group: If scores vary within the control group, it cannot be
due to the treatment because none of the participants are getting the treatment.
The variation within the group is due to uncontrolled factors. These
uncontrolled factors are, thanks to random assignment, random error. Note also
that differences among scores within the treatment group can't be due to the
treatment because those participants are all getting the same treatment. That
is, if one participant getting the treatment scores differently from another
participant getting the same treatment, we can't say that the treatment caused
the two participants to behave differently.

If there are big differences between scores within a group, random error (e.g.,
individual differences between participants, lack of standardization, unreliable
measures) is having a big effect on individual scores. If random error is having
a big effect on individual scores, could random error alone be causing a
relatively big difference between the **average** score of our control group
and the average score of our experimental group? To find out, statistical tests
combine their estimate of how much random error is affecting individual scores
(by looking at within-group differences) with how many scores make up the
average score (the more scores making up an average, the less the average will
be affected by random error). After getting an estimate of how much the two
averages could differ by chance alone, the statistical test compares that
estimate to the observed difference between the means. Then, the decision is
made about whether the difference between means is statistically significant:
reliable, unlikely to be due to chance alone.
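The way those two pieces of information combine can be sketched in a few lines. This is only an illustration under simplifying assumptions: both groups are assumed to have the same number of participants and the same within-group standard deviation, the function name `standard_error_of_difference` is hypothetical, and the numbers are made up.

```python
from math import sqrt

def standard_error_of_difference(sd_within: float, n_per_group: int) -> float:
    """Rough size of the difference between two group means that random
    error alone could produce (equal group sizes and spreads assumed)."""
    # Random error in ONE group's average shrinks as more scores are averaged.
    se_of_one_mean = sd_within / sqrt(n_per_group)
    # Both averages contain random error, so both contribute to the difference.
    return sqrt(se_of_one_mean ** 2 + se_of_one_mean ** 2)

for sd_within in (1.0, 5.0):      # little vs. much within-group variability
    for n_per_group in (10, 100):  # few vs. many scores per average
        se = standard_error_of_difference(sd_within, n_per_group)
        print(f"sd = {sd_within}, n = {n_per_group}: chance difference ~ {se:.2f}")
```

More variability within the groups makes the estimate larger, and more participants per group makes it smaller, which is why the same observed difference can be reliable in one study and not in another.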

To be more specific, you know that your experimental and control groups may differ for two reasons:

- the independent variable
- unsystematic, chance differences (random error)

You also know that

- the only reason control group participants differ from each other is random error
- the only reason experimental group participants differ from each other is random error

Now that we know how random error could affect a sample mean, we need to consider what could cause two sample means from the same population to differ from each other. Even if the treatment has no effect, the control group mean might differ from the experimental group mean for three reasons:

- random error has thrown off the control group mean,
- random error has thrown off the experimental group mean, or
- random error has thrown off both the experimental and the control group mean.

To be statistically significant, the actual difference between the means must
be bigger than the standard error of the difference. How much bigger?
That will depend on your significance level and the number of participants
in your experiment. Usually, the difference between the means must be about
twice as big as the standard error of the difference. To get the exact value,
you need to know the degrees of freedom (number of participants minus 2) and
then look at a *t* table, such as Table 1 in Appendix F of your text (see
page 680). After calculating *t* (the difference between the means divided by
the standard error of the difference) and comparing our *t* to the value in
the *t* table, we decide whether our results are statistically significant.
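As a rough illustration of that calculation, the sketch below computes *t* for two small groups. It assumes the common pooled-variance formula for two independent groups, which may differ in detail from the formula in your text. The scores are invented, the function name `independent_t` is hypothetical, and 2.101 is the standard two-tailed .05 value for 18 degrees of freedom (check it against your own *t* table).

```python
from math import sqrt
from statistics import mean, variance

def independent_t(control, experimental):
    """Return (t, degrees of freedom, standard error of the difference)
    using the pooled-variance formula for two independent groups."""
    n_c, n_e = len(control), len(experimental)
    df = n_c + n_e - 2  # number of participants minus 2
    # Pool the two within-group variances to estimate random error.
    pooled_variance = (
        (n_c - 1) * variance(control) + (n_e - 1) * variance(experimental)
    ) / df
    se_difference = sqrt(pooled_variance * (1 / n_c + 1 / n_e))
    t = (mean(experimental) - mean(control)) / se_difference
    return t, df, se_difference

# Invented scores: control group near 5, experimental group near 10.
control = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 4.0, 6.0]
experimental = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0, 9.0, 11.0]

CRITICAL_T_DF18 = 2.101  # two-tailed, .05 level, 18 df (standard table value)

t, df, se_difference = independent_t(control, experimental)
print(f"t({df}) = {t:.2f}, standard error of the difference = {se_difference:.2f}")
print("statistically significant" if abs(t) > CRITICAL_T_DF18 else "not significant")
```

Because the scores within each invented group are so similar, the standard error of the difference is small, and the 5-point difference between the means is many times larger than it.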

Unfortunately, the decision from a statistical test may be wrong. One problem is that a Type 1 error can be made--deciding that a difference between our groups is due to the treatment, when the difference is due entirely to chance. Fortunately, we can decide what risk of a Type 1 error we are going to take. If we choose a .05 significance level, then, when the treatment really has no effect, there is only a 5% risk of making a Type 1 error. If we choose a .01 significance level, there is only a 1% risk of making a Type 1 error.
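You can check the 5% figure with a small simulation. The sketch below runs many imaginary experiments in which the treatment has no effect at all and counts how often the *t* test still declares the difference significant. Everything here is an assumption made for illustration: the population (normally distributed scores with a mean of 50 and a standard deviation of 10), the 10 participants per group, the pooled-variance *t* formula, and the hypothetical function names. The value 2.101 is the standard two-tailed .05 critical value for 18 degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean, variance

CRITICAL_T_DF18 = 2.101  # two-tailed, .05 level, 18 df (standard table value)

def t_statistic(group_a, group_b):
    """Pooled-variance t for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_variance * (1 / n_a + 1 / n_b))

def type1_error_rate(n_experiments=4000, n_per_group=10, seed=1):
    """Proportion of no-effect experiments that come out 'significant'."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_experiments):
        # Both groups come from the SAME population: the treatment does nothing.
        control = [rng.gauss(50.0, 10.0) for _ in range(n_per_group)]
        experimental = [rng.gauss(50.0, 10.0) for _ in range(n_per_group)]
        if abs(t_statistic(control, experimental)) > CRITICAL_T_DF18:
            false_alarms += 1
    return false_alarms / n_experiments

print(f"Proportion of false alarms: {type1_error_rate():.3f}")
```

The proportion should land close to .05. It will not be exactly .05 because the simulation itself is subject to chance.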

You might wonder what is really happening when we are choosing a smaller risk of making a Type 1 error. What we're doing is requiring a bigger difference between our groups before we declare it to be "statistically significant." Thus, a difference between the experimental group and control group that would have been big enough to be statistically significant at the .05 level might not be big enough to be significant at the .01 level.

You might wonder why we don't just set our risk of making a Type 1 error at a very low level, such as .001. The problem is that by setting our risk of a Type 1 (false alarm) error very low, we increase our risk of a Type 2 error--failing to find a real treatment effect. Usually, your risk of making a Type 2 error is much greater than the risk of making a Type 1 error. You can reduce your risk of making a Type 2 error by:

- Having many participants
- Standardizing your procedures
- Carefully coding your data
- Using a reliable dependent measure
- Using homogeneous participants
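A simulation can show why these steps help. In the sketch below, the treatment really does add 5 points to every score, and we count how often the *t* test misses that effect. All of the specifics are assumptions chosen for illustration: normally distributed scores, the 5-point effect, the standard deviations, the group sizes, the pooled-variance *t* formula, and the hypothetical function names. The critical values are the standard two-tailed .05 table values for 18 and 60 degrees of freedom.

```python
import random
from math import sqrt
from statistics import mean, variance

# Two-tailed .05 critical values of t, keyed by degrees of freedom.
CRITICAL_T = {18: 2.101, 60: 2.000}

def t_statistic(group_a, group_b):
    """Pooled-variance t for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_variance * (1 / n_a + 1 / n_b))

def type2_error_rate(n_per_group, sd, effect=5.0, n_experiments=2000, seed=1):
    """Proportion of experiments that fail to detect a REAL treatment effect."""
    rng = random.Random(seed)
    critical_t = CRITICAL_T[2 * n_per_group - 2]
    misses = 0
    for _ in range(n_experiments):
        control = [rng.gauss(50.0, sd) for _ in range(n_per_group)]
        # The treatment genuinely raises scores by `effect` points.
        experimental = [rng.gauss(50.0 + effect, sd) for _ in range(n_per_group)]
        if abs(t_statistic(control, experimental)) <= critical_t:
            misses += 1
    return misses / n_experiments

baseline = type2_error_rate(n_per_group=10, sd=10.0)
more_participants = type2_error_rate(n_per_group=31, sd=10.0)
less_random_error = type2_error_rate(n_per_group=10, sd=5.0)

print(f"10 per group, sd 10: missed the effect {baseline:.0%} of the time")
print(f"31 per group, sd 10: missed the effect {more_participants:.0%} of the time")
print(f"10 per group, sd 5:  missed the effect {less_random_error:.0%} of the time")
```

Having more participants lowers the miss rate. So does shrinking the within-group standard deviation, which is what standardized procedures, careful coding, reliable measures, and homogeneous participants are meant to do.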

You have seen that we have to use statistics to determine whether the difference between our groups is too big to be due to chance. More importantly, you have seen that statistics affects how we design and conduct our study. Because of statistics, we should

- randomly assign participants,
- maintain independence, often either by running participants individually or, if participants are run in groups, by making sure that not all the participants in a group receive the same treatment,
- use reliable measures, standardized procedures, and many participants to reduce the risk of a Type 2 error, and
- **not** make predictions about treatments having no effect or having the same effect.
