The Field Experiment

Copyright 2000-2021 Mark L. Mitchell & Janina M. Jolley

What allows a randomized laboratory experiment to establish causality? It is not a sterile lab filled with fancy equipment. Indeed, it is not the lab at all: it is random assignment.

You do not need a lab to randomly assign participants to groups. Therefore, if you do not want to do your experiment in a lab, you could conduct a field experiment: an experiment performed in a natural setting.
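Because random assignment, not the lab, is what matters, you can carry it out anywhere you can keep a list of participants. The Python sketch below shows one simple way: shuffle the list, then deal participants out to the conditions in turn. The function name, the condition labels, and the "shoppers" are hypothetical, made up for illustration. It is a minimal sketch, not a procedure taken from any particular study.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Randomly assign individual participants to conditions.

    Shuffles a copy of the participant list, then deals participants out
    to the conditions in turn, so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


# Hypothetical example: ten shoppers approached at a mall.
shoppers = [f"shopper_{i}" for i in range(1, 11)]
assignment = randomly_assign(shoppers, ["request", "no_request"], seed=1)
for condition, members in assignment.items():
    print(condition, members)
```

Fixing the seed makes the assignment reproducible, so you can document exactly who was assigned where. Leaving it as None gives a fresh random assignment each time.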


Advantages of Doing the Field Experiment

Why would you want to leave the comfort of the lab to do a field experiment? There are four major reasons for leaving the lab:

1. the desire to generalize your results to different settings,

2. the desire to generalize your results to a different group of people,

3. the desire to ensure that participants are reacting to the treatment rather than feigning the reaction they think will please you, and

4. the desire for more power to detect a treatment's effect.

External Validity

First, you might want to generalize your results beyond the laboratory setting. The controlled, isolated lab is a far cry from the chaotic, crowded world that we live in. Consequently, some people question whether an effect found in a lab would hold in a real-world setting. The field experiment lets you find out.

Second, you might want to generalize your results to people other than those who volunteer to be in psychology experiments. In most lab experiments, participants are students in introductory psychology courses. These students are probably not "typical" of the average person. In the field experiment, on the other hand, your participants can be anyone – even real people!


Construct Validity

Third, you might want to avoid lab experiments because volunteers for these experiments know they are in an experiment. Because they know the treatments aren't "real", their responses may be more of an act than an honest reaction to the treatment. Thus, rather than reacting to the treatment as they naturally would, they may act the way they think you want them to act. In other words, they may act to confirm your hypothesis.

To field experiment participants, on the other hand, the treatment is real. These participants aren't trying to confirm your hypothesis. In fact, they do not even know you're doing an experiment on them. Because of their naiveté, they are more likely to give natural responses.



Power

Fourth, you might leave the lab because you do not have enough volunteer participants. As you may remember from Chapter 6, the more participants you have, the more able you are to find significant effects. When confronted with having only a few people who might agree to come to the lab for your experiment, and a world of potential participants waiting for you outside the lab, you may decide to go where the participants are.
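To see why more participants help, the sketch below estimates the power of a two-group comparison for a moderate effect (Cohen's d = 0.5) at several sample sizes. It uses a normal approximation rather than the t distribution, and the function name and the chosen numbers are illustrative assumptions; a real power analysis would use dedicated software or tables, so treat these values as rough.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison.

    effect_size is Cohen's d (mean difference divided by the standard
    deviation). Uses the normal distribution instead of t and ignores the
    tiny chance of a significant result in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit)


for n in (10, 20, 64, 100):
    print(f"{n:>3} per group: power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, power climbs from roughly .35 with 20 participants per group to roughly .81 with 64 per group.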


Limitations of the Field Experiment

Although the field experiment may give you more power, more construct validity, and greater external validity, it is not an automatic cure-all. In fact, a field experiment may lack external, construct, and internal validity. Furthermore, field experiments may lack power, be unethical, and demand more time and energy than you would ever suspect.


Is it Ethical?

The first problem to consider is an ethical one. According to the American Psychological Association's Code of Ethics (1989), all participants for an experiment should be volunteers. Not only should participants be volunteers, but you should get their informed consent prior to their participation. That is, participants should have a good idea of what is going to happen to them and should give their written permission before the experiment begins. Furthermore, after the experiment, participants should be debriefed: informed about what they have just done and why.

These ethical guidelines may conflict with your research goals. You may not want to use volunteers because volunteers are atypical, but the guidelines suggest that you use volunteers. If you were forced to use volunteers, you might not want to tell them what the experiment was about so they wouldn't play along with your study. However, the guidelines recommend that human participants know what they are volunteering for. Finally, you may not want to debrief your participants for fear that your participants might tell other potential participants about the study. The ethical guidelines, on the other hand, recommend that you debrief participants so that participants get some benefit from their participation and so that you can remove any harm you may have inadvertently caused.

What's the solution to these thorny ethical issues? Unfortunately, there are no easy answers. Your desire for valid information must be weighed against participants' rights to privacy. Since you may not be able to fairly weigh participants' rights against your desires, you should consult informed individuals (e.g., your research design professor) before doing a field experiment. In addition to consulting with your professor, you may also have to get your experiment approved by your institution's ethics committee.

Perhaps the easiest way to deal with ethical problems is to avoid violating the guidelines. For example, you might do a field experiment, but ask for volunteers, obtain their informed consent, and debrief them. Under these conditions, you have lost some advantages of field experimentation, but you may still get participants who are more "typical" than laboratory participants, and you do get to see whether your results generalize to a real-world setting.

A more controversial approach is to perform a field experiment on unsuspecting volunteers while they think they are waiting for an experiment to begin. For example, Latane (1968) had participants witness a theft while they were in a waiting room ostensibly waiting to start a laboratory experiment.

The "experiment in the waiting room" is a compromise between ethical principles and research goals. To meet the ethical guideline calling for volunteer participants, you give up the chance to get participants who are more like "real people" than volunteers are, and you give up the chance to overcome a shortage of volunteer participants.

To meet the research goal of seeing whether the effect would occur outside the lab and seeing whether the effect would occur with naive participants, you violate the ethical guidelines for informed consent. Because participants signed up for one experiment, but ended up in another study, this kind of study raises serious ethical questions.


External Validity Is Not Guaranteed

If you think that by doing a field experiment you will get participants who represent the average person, you may be disappointed. In the "waiting room" study we just described, the participants are the same college sophomores who would participate in a lab study. Even doing an experiment in the field (e.g., in a shopping mall) won't ensure that your participants will represent "the average person." In fact, in many field experiments, you may not know whom your participants represent. Sometimes, the only thing you can say is that participants "represented people who used the telephone booth at the Tarfield Mall between 2:00 p.m. and 4:00 p.m. during March 1994." Consequently, you may not be surprised by what Dipboye and Flanagan (1979) found when they examined published research in industrial psychology: field research typically dealt with a rather narrow range of participants and generally had no more external validity than lab studies.


Construct Validity Is Not Guaranteed

Similarly, if you want to study naive participants, the field experiment may let you down. Former participants or talkative bystanders may talk about the experiment to potential participants and ruin everyone's naiveté. To illustrate this point, consider a field experiment conducted by Shaffer, Rogel, and Hendrick (1975) in the Kent State University Library. A confederate of the researchers sat down at a table occupied by a naive participant. After several minutes of studying, the confederate walked away from the table, leaving behind several personal items. Sometimes the confederate asked the naive participant to watch his belongings (request condition); other times he said nothing (no-request condition). Shortly after the confederate left the table, a "thief" appeared, went through the confederate's belongings, discovered a wallet, and quickly walked away with it. The dependent variable was whether participants tried to stop the thief. Results showed that 64% of the participants in the request condition tried to stop the thief, compared to only 14% in the no-request condition.

Imagine you were one of Shaffer et al.'s participants. After watching a thief steal a man's wallet, and perhaps after trying to foil the theft, would you tell anyone about it? Let's say you tell a friend about the incident. One night, she goes to the library to study. Shortly after she sits down, she finds herself approached by the same "victim" you described, and then she witnesses the very crime you told her about. Not only has she lost her naiveté, but when she tells her friends about this, the whole school will know about the experiment.

Or, put yourself in the place of a curious bystander, say the reference librarian. You are working at the reference desk and out of the corner of your eye you observe two students sitting at a table. One student gets up and walks away leaving behind his books and several personal items. You go about your work. But then you notice a different man go up to the pile of belongings, rummage through them, pocket a wallet, and walk hurriedly away. What would you do? As a responsible employee, you would try to stop the thief. At the very least, you would report the incident to the authorities. The campus police arrive to get your statement, perhaps even to make an arrest. To stop the police investigation, the researcher explains that it's only an experiment. Students in the library strain to overhear the conversation with the police, and students question you endlessly about the incident. Soon, everyone on campus knows about the experiment.

Thus, unless you take appropriate precautions, a field experiment may end up having no more construct validity than a laboratory study. Therefore, if you were doing Shaffer et al.'s study, you would try to collect all the data in one night to reduce the chances of participants talking to potential participants. Furthermore, to reduce the chance of bystanders destroying participants' naiveté, you might inform the library staff about the experiment.

Internal Validity Is Not Guaranteed

A carelessly conducted field experiment may lack not only external and construct validity but also internal validity. Although all field experiments should have internal validity, some do not, either because participants were not randomly assigned to groups or because of mortality: participants dropping out of the study.


Failure to Randomly Assign

All the designs we have discussed so far rely on independent random assignment for their internal validity. Unfortunately, random assignment is much more difficult in the field than in the laboratory. Random assignment is especially difficult when you are manipulating an important, real life treatment. Often real world participants and their representatives do not believe that people should be randomly assigned to important treatments. Instead, they believe that people should be able to choose their own treatment.

To imagine the difficulties of random assignment in the field, suppose you wanted to study the effects of television violence on children's behavior. You approach parents and tell them that you want some children to watch certain non-violent television programs (e.g., Mr. Rogers, Sesame Street) and other children to watch violent television programs, such as "TV wrestling", boxing, and violent action/adventure shows. You may find that few parents will let you randomly assign their child to either condition. If you say, "I want to be able to assign your child to either one of these conditions," many parents will object. Some will say, "You can show my child Sesame Street, but you're not going to make my kid watch violence and trash!" Other parents will say, "You can make my kid watch wrestling. I watch it all the time anyway. But not those other shows. They're on the same time as my shows. You're not going to make me sit around and watch childish junk!" However, the hassles with the parents may be nothing compared with the hassles of getting the children themselves to agree to random assignment.

Yet, with enough persistence (and enough money), you could probably get people to agree to random assignment. But once you've done that, you face a huge problem: How do you know that participants will watch the television shows you assigned? You cannot go to everyone's house. You cannot trust young children to carry out your instructions. You cannot trust parents to supervise the children because they may be busy with other tasks. Therefore, the prospect of using random assignment to determine children's television diets seems intimidating.

In fact, the idea of randomly assigning children to television viewing seems so intimidating that most investigators researching the effects of TV have avoided field experiments. This is unfortunate because such experiments would provide the strongest evidence about the effects of viewing violent television shows.

Have these researchers given up too soon? Cook and Campbell (1976) claim that researchers often give up on random assignment faster than they should. Cook and Campbell argue that random assignment can often be used in the field – if the researcher is creative.

In the case of researching the impact of television on children's behavior, researchers may have given up too soon. Perhaps researchers should approach a nursery school. If the nursery school would cooperate and get informed consent from the parents and children, the television-viewing could take place at the school as part of the children's ordinary routine. In this way, you would know that participants were getting the treatment they were assigned to.



Mortality

Unfortunately, even after you assign your participants to condition, they may not stay assigned. Mortality may raise its ugly head. That is, participants may drop out of your experiment before you collect the dependent measure. For example, suppose that you are doing the television violence experiment with nursery school children. As the study progresses, you find that participants are dropping out of the violent television condition (perhaps the kids are getting too violent or the parents are having second thoughts). However, participants are not dropping out of the non-violent condition. The fact that participants in one group are more likely to quit than participants in the other group threatens the study's internal validity. That is, if the violent television group is more aggressive, we cannot say whether this is due to the less aggressive children dropping out of the violent television group or to television violence causing children to be aggressive.
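A small simulation can make the mortality problem concrete. The sketch below assumes hypothetical aggression scores and a made-up dropout rule (the least aggressive quarter of the violent-television group quits). Television has no effect at all in this simulation, yet the children who remain in the violent-television group score higher than the children in the non-violent group.

```python
import random
from statistics import mean

rng = random.Random(2021)

# Hypothetical aggression scores (mean 50, SD 10). Television has NO effect
# here: both groups are drawn from exactly the same distribution.
violent_tv = [rng.gauss(50, 10) for _ in range(1000)]
nonviolent_tv = [rng.gauss(50, 10) for _ in range(1000)]

# Hypothetical differential mortality: the least aggressive quarter of the
# violent-TV group drops out before the dependent measure is collected.
cutoff = sorted(violent_tv)[len(violent_tv) // 4]
violent_tv_completers = [score for score in violent_tv if score >= cutoff]

print(f"violent TV, everyone assigned: {mean(violent_tv):.1f}")
print(f"violent TV, completers only:   {mean(violent_tv_completers):.1f}")
print(f"non-violent TV:                {mean(nonviolent_tv):.1f}")
```

If you could measure everyone originally assigned, the two groups would look essentially alike; the apparent "effect" among completers comes entirely from who dropped out.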

Usually, losing more participants from one group than the other has one of two causes. First, the treatment is too intense. In such cases, the treatment should be toned down or eliminated. To use a manipulation that leads to such a high drop-out rate is often unethical. To take an extreme case of using an unethical level of treatment, suppose the television these children were watching was X-rated violence. In that case, mortality from the treatment group would be high (although we would hope that an ethics committee would prevent such a study from being conducted). Second, mortality from the treatment group will tend to be higher than from the control group if the control group is left alone to engage in their normal activities. For example, if the experimental group were to watch a prescribed set of programs at home whereas the control group was simply allowed to do whatever they normally did at home, mortality would be higher in the experimental group. Therefore, the control group should always get some kind of treatment, even a placebo treatment.

Power May Be Inadequate

Not only is it easier to create an internally valid experiment in the lab than in the field, but it is also easier to create a powerful experiment in the lab than in the field. In the lab, you can have impressive power by reducing random error and by using sensitive dependent measures. By leaving the lab, you may lose your ability to reduce random error and to use sensitive measures.


Random Error

In the laboratory, you can reduce random error by minimizing the degree to which irrelevant variables vary. You can reduce unwanted variation due to individual differences by using a homogeneous group of participants. You can reduce unwanted variation in the environment by running participants under identical conditions. You can reduce unwanted variation due to participants being distracted by putting participants in a soundproof, simple, virtually distraction-free environment. You can reduce unwanted variation in your procedures by rigidly standardizing your experiment. Thus, if you do your study in the laboratory, you can use many tactics to stop irrelevant variables from varying.

By leaving the lab, you may lose your ability to stop these variables from fluctuating. Sometimes you willingly give up the opportunity to control these variables so that you can generalize your results to the real world. For example, you may do a field experiment to get access to a heterogeneous group of participants. The advantage of having a wide range of participants is that you can generalize your results to a wide range of people. The disadvantage is that you are giving individual differences a chance to account for a sizable difference between your groups. Therefore, your treatment's effect might be obscured by these individual differences.
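The cost of heterogeneous participants can be put in rough numbers. The sketch below assumes a hypothetical treatment that raises scores by 5 points and compares a lab study, where homogeneous participants keep the standard deviation at 10, with a field study where individual differences double it to 20. It uses a normal approximation to power, so the values are approximate, and all numbers are illustrative.

```python
from statistics import NormalDist


def power_for_difference(mean_difference, sd, n_per_group, alpha=0.05):
    """Approximate (normal-theory) power of a two-tailed, two-group comparison."""
    z = NormalDist()
    d = mean_difference / sd  # same raw effect, smaller d when SD is larger
    return z.cdf(d * (n_per_group / 2) ** 0.5 - z.inv_cdf(1 - alpha / 2))


# Hypothetical: the treatment raises scores by 5 points in both settings.
lab = power_for_difference(mean_difference=5, sd=10, n_per_group=64)    # homogeneous
field = power_for_difference(mean_difference=5, sd=20, n_per_group=64)  # heterogeneous
bigger_field = power_for_difference(mean_difference=5, sd=20, n_per_group=256)
print(f"lab: {lab:.2f}  field: {field:.2f}  field with 4x participants: {bigger_field:.2f}")
```

With 64 participants per group, power falls from about .81 in the lab to about .29 in the field. Because doubling the standard deviation halves the effect size, and sample size enters under a square root, you would need about four times as many field participants (256 per group) to get back to the lab's power.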

Sometimes, however, you unwillingly give up the ability to control irrelevant variables. For instance, you always want to standardize your procedures. However, it's hard to follow the same procedure every time if you have to

1. conduct your study on the run (perhaps even approaching participants and saying, "Excuse me, may I talk to you for a moment?"),

2. work without the benefit of equipment, and

3. work in a noisy, crowded environment.

Furthermore, even if you succeed in administering the treatment in the same, standard way, your participants may fail to perceive the treatment in the same, standard way. That is, distractions in the environment may prevent all your participants from attending to your entire manipulation. Indeed, a manipulation that is overpowering in the lab may seem almost invisible when taken to the field.


Insensitive Measures

You have seen that, in the field, you cannot always administer your manipulations with the same degree of standardization as you could in the lab. Because your manipulations are less standardized and less effective, your experiment is less powerful. Unfortunately, the same factors that impede your ability to administer your manipulations may also hurt your ability to use sensitive, powerful measures. To illustrate, let's say you are interested in whether getting an unexpected gift will increase happiness. In the lab, you would probably measure happiness by having participants rate how happy they are on a one-to-seven scale. Even if you were to use a more indirect behavioral indicator of happiness, such as helping, you would measure helping with a high degree of precision. For example, you would measure either exactly how long it took participants to help a person or how much they helped the person.

In the field, measuring happiness is much more difficult. You probably won't be able to have participants fill out a rating scale. Therefore, you will probably have to use a less sensitive behavioral measure, such as helping. Furthermore, you may not even be able to measure helping with any degree of precision. Unlike in the lab, you cannot merely sit in your chair, gaze through a one-way mirror, and record how much or how long participants help. Instead, you must inconspicuously peer around the corner, filter out the dog barking, the traffic sounds, and other people to see whether your participants help. Under these conditions, you're lucky to see whether participants help – much less to see how much they help. Thus, in the field, you may be too busy to collect anything other than dichotomous (two-valued) variables. Clearly, asking whether someone responded gives you less information than asking how long it took the person to respond or to what extent the person responded.
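The sketch below, using made-up helping times for two hypothetical groups, shows how much a dichotomous measure can hide. In both groups, seven of eight participants help, so the "did they help?" measure shows no difference, even though the gift group helped for far longer.

```python
from statistics import mean

# Hypothetical seconds each participant spent helping (0 = did not help).
gift_group = [0, 40, 55, 60, 75, 90, 120, 150]
no_gift_group = [0, 5, 6, 8, 10, 12, 15, 20]


def proportion_helped(seconds):
    """Dichotomous measure: the share of participants who helped at all."""
    return sum(1 for s in seconds if s > 0) / len(seconds)


for name, group in [("gift", gift_group), ("no gift", no_gift_group)]:
    print(f"{name}: helped = {proportion_helped(group):.3f}, "
          f"mean seconds helping = {mean(group):.1f}")
```

Recording only whether each person helped throws away exactly the information that distinguishes these two groups.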

But you do not have to settle for less sensitive measures when you go to the field. One way to get more sensitive measures is to use a second experimenter who does nothing but record data. This leaves you free to put quarters (unexpected gifts) in phone booths, hide until a participant finds one, and make yourself a convenient person "in need" for your unsuspecting participants to demonstrate their good will. This second experimenter could observe and record things like how quickly participants responded and to what extent they responded. If you do not have a second experimenter, let equipment such as videotape cameras, tape recorders, and stopwatches do the recording for you.

For example, Milgram, Bickman, and Berkowitz (1969) had confederates look up at a tall building. Their independent variable was how many confederates looked up at the building. Their dependent measure was the proportion of people walking by who also looked up. Actually, the confederates were looking up at a videotape camera. After the experiment was over, Milgram et al. were able to count the number of people looking up by replaying the videotape.

TABLE G.1: Pros and Cons of Field Experiments

Power
Pro: Power may be enhanced by access to many participants.
Con: The increase in participants may be more than negated by the inability to control irrelevant variables and the inability to use the most sensitive of measures.

Internal validity
Pro: Like the randomized lab experiment, the field experiment has internal validity because participants are randomly assigned.
Con: Random assignment is sometimes difficult in the field.
Con: Mortality may harm internal validity.

External validity
Pro: External validity may be enhanced by studying a wide variety of settings and participants.
Con: Often, field experiments do not study a wider range of participants than lab experiments do.
Con: It may take more effort to do a field experiment than a lab experiment.
Con: Ethical questions may arise, especially in terms of informed consent.

Construct validity
Pro: Construct validity may be enhanced by using participants who are not playing the role of participant.
Con: Construct validity may be harmed if people learn of the study.
Con: Not telling participants about the study raises ethical questions.


Special Problems with Field Experiments that Use Intact Groups

Some field experimenters try to regain the power lost due to having high levels of random error and insensitive measures by using a large number of participants. To get large numbers of participants, some researchers do field experiments on intact groups. For example, they might use a few large classes or a work group.


Failure to Establish and Maintain Independence

Unfortunately it's hard to independently assign participants from intact groups, and once they're assigned, it's hard to maintain independence. For example, suppose a nursery school was willing to help you out with your study on the effects of watching prosocial television. Then you would have a large, convenient sample. However, there might be a catch: The nursery school might insist that you keep the classes intact. Thus, although you might want to assign each student independently, you may have to assign one class to one condition and another class to the other condition. Consequently, no matter how many people are in your study, you only have two independent units – the two classes. Because any two classes will obviously differ from one another in many ways, your experimental and control groups would be very different before the experiment began.

Even if you are able to independently assign participants, you may be unable to maintain independence because participants interact with one another, thereby influencing each other's responses. If the children in each class influence each other, you do not have independent responses from 60 individuals (two classes of 30). Instead, you have responses from two mobs. For example, suppose there is one very aggressive child in the control group. As any teacher knows, one misbehaving child can cause virtually everyone in the group to misbehave.

Violation of independence, whether due to faulty assignment or failure to maintain independence of responses, can have one of two consequences: bad and worse. The worse consequence happens if the researcher does not realize that independence has been violated. In that case, she would conduct her statistical tests as if she had more independent units than she has. She would think that since each group is made up of 30 randomly assigned participants, the groups should be fairly equivalent. She would believe that since she has so many independent units, chance differences between groups should be minimal. However, since, in reality, she has only 2 independent units, chance could easily be responsible for substantial differences between groups. Therefore, she is very likely to misinterpret a difference that is due to chance as a treatment effect.

The bad consequence occurs if the researcher realizes that she has only two independent units. In that case, the good news is that since she realizes that even large differences might be due to chance, she probably won't mistake chance differences for treatment differences. However, the bad news is that since she realizes that even large differences may be due to chance, she will tend to dismiss real treatment effects as being due to chance. In other words, her study will be powerless.
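The worse consequence, mistaking chance for a treatment effect, can be demonstrated with a simulation. The sketch below assumes two hypothetical classes of 30, one class per condition, and no treatment effect. Every child in a class shares a random "class effect" (standing in for things like one aggressive child setting the tone), and the analysis wrongly treats the 60 children as independent. The class-effect size of 0.5 and the simple z test are illustrative assumptions, not values from any study.

```python
import random
from statistics import NormalDist


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def naive_false_positive_rate(class_sd, n_per_class=30, reps=2000, seed=0):
    """Share of simulated no-effect studies that come out 'significant'.

    Each study has one intact class per condition and NO treatment effect.
    Every child in a class shares a random class effect (SD = class_sd) on
    top of individual noise (SD = 1). The analysis wrongly treats all 60
    children as independent and uses a two-tailed z test at alpha = .05.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    false_positives = 0
    for _ in range(reps):
        classes = []
        for _ in range(2):  # one class per condition
            class_effect = rng.gauss(0, class_sd)
            classes.append([class_effect + rng.gauss(0, 1) for _ in range(n_per_class)])
        a, b = classes
        diff = sum(a) / len(a) - sum(b) / len(b)
        se = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
        if abs(diff) / se > z_crit:
            false_positives += 1
    return false_positives / reps


print("children truly independent:  ", naive_false_positive_rate(class_sd=0.0))
print("children share a class effect:", naive_false_positive_rate(class_sd=0.5))
```

When the children really are independent (class_sd=0.0), only about 5% of the simulated no-effect studies reach significance, as they should. Once the children share a class effect, far more of them do; under these assumptions, roughly half.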


Threats to Construct Validity

You can remedy the problem of too few independent units by using more classes. For example, you might have 10 classes in one group and 10 classes in the other group. However, violation of independence is only one problem with using intact groups. Using intact groups exposes your study to three serious threats to construct validity: demoralization, compensation, and diffusion of treatment.



Demoralization

Your study's construct validity starts to erode the moment the classes talk to each other and find out about their differential treatment. Do not be surprised if the no-television group becomes demoralized. They may vent their frustration about missing out on television by being violent. As a result, the television-watching group may be better behaved, even though watching television did not improve their behavior. In this case, it's not that watching television reduces violence; it's that feeling deprived increases violence.



Compensation

On the other hand, upon learning of the experimental group's good fortune, the no-television class might compensate. That is, they might pull together and behave as well as they could so that they would be allowed to watch television. As a result of their efforts, the no-television group might behave better than the television group. Again, you would see a difference between your groups, but the difference wouldn't be due to the effects of watching television.


Diffusion of Treatment

Finally, you might not observe any effect for treatment because of diffusion of treatment: Both your groups are getting the treatment. In your television study, members of the no-television class might be watching television. For example, their teacher may succumb to their begging to "watch television like the other class" and thus borrow the television from the other teacher. Or, if the classes are held in the same room, pupils in the no-television group might watch or overhear the other class's television shows. Consequently, the impact of the television shows would diffuse to the no-television group.

TABLE G.2: Dealing with Problems Caused by Studying Intact Groups

Problem: Groups are not independent.
Partial remedies: Use many groups. In analyzing data, do not consider each participant as an individual unit. Instead, consider each group as a single unit. Thus, for the purposes of analysis, rather than having 300 participants, you may only have 10 classes.

Problem: Demoralization (no-treatment group being depressed that they were not in the treatment group)
Partial remedies: Use placebo treatments. Minimize opportunities to talk by doing the experiment in a short time span and by using groups that do not come into contact with one another.

Problem: Compensation (no-treatment group working harder to make up for being denied the treatment)
Partial remedies: Use placebo treatments. Minimize opportunities to talk by doing the experiment in a short time span and by using groups that do not come into contact with one another.

Problem: Diffusion of treatment (no-treatment group getting access to the treatment)
Partial remedies: Use placebo treatments. Minimize opportunities to talk by doing the experiment in a short time span and by using groups that do not come into contact with one another.
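The first remedy in Table G.2, treating each group as a single unit, can also be sketched in code. The simulation below assumes 10 hypothetical classes of 30 children per condition, a shared "class effect" that makes children in the same class resemble one another, and no treatment effect. Each class mean counts as one score, and an ordinary pooled t test compares the 20 class means (df = 18, two-tailed critical value 2.101). The class-effect size and other numbers are illustrative assumptions.

```python
import random
from statistics import mean, variance

T_CRIT_DF18 = 2.101  # two-tailed .05 critical value of t with df = 10 + 10 - 2


def class_level_false_positive_rate(class_sd=0.5, n_per_class=30, reps=1000, seed=0):
    """Share of simulated no-effect studies that come out 'significant'.

    Each condition has 10 intact classes; children in a class share a random
    class effect (SD = class_sd) plus individual noise (SD = 1). There is NO
    treatment effect. Each class mean counts as ONE unit, and a pooled t test
    compares the 10 class means in one condition with the 10 in the other.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(reps):
        conditions = []
        for _ in range(2):
            class_means = []
            for _ in range(10):  # 10 classes per condition
                class_effect = rng.gauss(0, class_sd)
                scores = [class_effect + rng.gauss(0, 1) for _ in range(n_per_class)]
                class_means.append(sum(scores) / len(scores))
            conditions.append(class_means)
        a, b = conditions
        pooled_var = (variance(a) + variance(b)) / 2  # equal numbers of classes
        t = (mean(a) - mean(b)) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
        if abs(t) > T_CRIT_DF18:
            false_positives += 1
    return false_positives / reps


print("false-positive rate, classes as units:", class_level_false_positive_rate())
```

Because classes, not children, are the units, the false-positive rate should stay near the nominal 5% even though children within a class resemble one another. The price is fewer units: 20 class means carry far less information than 600 truly independent children would.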



Minimizing Threats to Construct Validity

How can you minimize demoralization, compensation, and diffusion of treatment? The steps to take are obvious once you realize that these threats usually result from participants finding out that their treatments differ. With this in mind, the first step is to make your conditions resemble one another as much as possible. Never use a treatment group and a no-treatment group. Instead, use a treatment group and a placebo treatment group, or two different kinds of treatment.

In the television study, you could have one group watch one kind of television program (e.g., violent) while the other watched another kind of program (e.g., non-violent). Or, you could be even sneakier and show both groups the same shows – the only difference is that in one condition you have edited out some of the violence. In this way, participants would not notice that their conditions differ.

The second step is to give participants fewer opportunities to talk. For example, shorten the time between giving the treatment and collecting the dependent measure. Obviously, the longer the time between the introduction of treatment and collecting the dependent measure, the more likely the groups are to talk. Therefore, you might conduct the entire study in one day rather than having it last for several months.

If you want to look at long-term effects of treatment, you could reduce opportunities for participants to talk to one another by using participants who will not run into one another. Thus, in the television study, rather than assigning different classes in the same school to different conditions, you could assign different schools to different conditions. The chances of a toddler from Busy Bee Day Care comparing curricula with a child from Lazy Larry's Day Camp are remote.