


Research Design Explained (7th edition)









2.      Operational definitions help psychology

a.       be objective because they provide objective definitions of terms.

b.      make testable statements because they allow us to define our concepts in specific and observable terms.

c.       be public because operational definitions define terms in publicly observable ways. These “recipes” can be shared.

d.      be productive because the operational definition provides a recipe that other scientists can follow to repeat and build on the original researcher’s study.


4.      Match the following to the qualities of science.

_a_ testable                a. learning from mistakes

_e_ skeptical               b. “show us the evidence”

_b_ open-minded             c. avoid bias

_c_ objective               d. publishing studies

_d_ public                  e. question authority

_f_ productive              f. science works





6. The characteristic of science that is threatened is finding general rules. One implication is that psychology’s inability to predict individual behavior perfectly does not make it less of a science. When physicists have to predict specific real-life behavior (where a drop in the ocean will be), they don’t do very well either.


8. Astrology fails on all three counts.

a. It often makes statements that are too vague to be testable.

b. It is not productive: It hasn’t changed in 2,000 years.

c. Astrologers do not seek objective evidence of their accuracy. They prefer to talk only about their successes (perhaps because their track record is not very good).


10. Psychoanalysis is attacked for

        a. not being testable because it makes after-the-fact interpretations rather than predictions

        b. not producing observable evidence (if the unconscious can’t be observed), which, in turn, leads to it being untestable as well as unproductive.

        c. not being productive–if the effectiveness of psychoanalysis has not improved






2.      Match the threat to the type of validity

_a_ construct validity      a. poor measure

_c_ external validity       b. treatment and no-treatment groups were unequal before the study began

_b_ internal validity       c. small, biased sample of participants



4.      You could argue that it is unethical to give patients an unproven treatment. People are coming to (and paying) the therapist for help. If the therapist's “help” has not been tested, it may produce real harm--or prevent the patient from getting treatment that has been proven to be effective. Some would argue that if you are using an unproven treatment, you should at least tell clients that the treatment is unproven and you should not charge them for such treatments. Conversely, you could argue that it is ethical to withhold a treatment that is believed to work because you do not yet know whether it works. That is, you should not give the treatment until you are sure that it does indeed work.




a.       To make an informed decision about the ethics of a research project, we must be able to weigh the pros and the cons. If we cannot accurately assess the harm that might result from a study, we cannot make an intelligent decision about whether a study should be done.  Thus, we may approve research that causes great harm. It could be argued that, rather than taking that risk, we should not do research.

b.      Because we are so ignorant of human behavior, we need to do research so we can become less ignorant.  Our ignorance and lack of understanding results in harm every single day. We have a duty to become less ignorant, more helpful, and more understanding of ourselves and of others.

c.       The following principles appear to have been violated.

i.        Participants did not fully understand the amount of shock that they would be asked to administer.

ii.      Participants did not feel free to quit the study at any point.

iii.    Milgram did not seek permission from an institutional review board (such committees did not exist back then).

d.      The following principles appear to have been violated.

i.        The “prisoners” didn't feel free to quit the study at any point.

ii.      The investigators failed to anticipate all possible risks to the participants.










2.      According to dissonance theory, the following factors might moderate the effects of writing a counterattitudinal essay: the extent to which you believe that others will know you wrote the essay, the extent to which you believe that your essay will influence others, the extent to which you believe that you voluntarily chose to write the essay.




4.      Find a research article that tests a hypothesis derived from theory.  Give the citation for the article and describe the main findings.

No set answer.



6.      Design a study to improve the construct validity of the study reported in Appendix D.

No set answer.


8.      The study reported in Appendix B finds a relationship between variables.  Design a study to map out the functional relationship between those two variables.

No set answer.



10.  In terms of the null hypothesis, what's wrong with the following research conclusions:

a.       There is no difference in outcome among the different psychological therapies.

b.      Viewing television violence is not related to aggression.

c.       There are no gender differences in emotional responsiveness.

All these conclusions are based on assuming that the null hypothesis can be proven. It cannot. Failing to find a difference or an effect does not mean there is no difference or no effect. It just means you failed to find a difference.





There are no set answers to any of these exercises.






2.      List four basic tactics for reducing the possibility of subject bias.

The four basic tactics are:

a.      Not letting participants know that you are observing them.

b.      Letting participants know that you are observing them, but not letting them know what particular behavior you are observing.

c.       Letting participants know what behavior you are interested in, but not letting them know what the behavior really measures.

d.      Choosing a response that most participants couldn't or wouldn't change


4.      What is the discriminant validity?  Why is it necessary?

Discriminant validity involves showing that your measure does not correlate too highly with something it shouldn’t. Discriminant validity and convergent validity work together. Convergent validity is necessary to know that your measure correlates with what it should. Discriminant validity is necessary to show that your measure does not correlate too highly with what it should not. To take but one example, suppose that you had a measure of a person’s weight that was based on the length of their arms. Such a measure would correlate with other measures of weight (convergent validity), but it would correlate even more highly with measures of height. Thus, the measure would not have discriminant validity relative to height. In short, you could easily have an invalid measure that correlates with what it’s supposed to measure (has convergent validity), but correlates even more highly with a related construct (lacks discriminant validity).


6.      What is content validity?  For what measures is it most important?

Content validity is the extent to which a measure represents a balanced and adequate sampling of relevant dimensions, knowledge, and skills.  It is most important for classroom tests and other tests of knowledge and skill.




8.      Think of a construct that you would like to measure.

a.       Name that construct—No one right answer

b.      Define that construct

Definition should be drawn from a dictionary, psychological dictionary, or theory

c.       Locate two published measures of that concept (see Web Appendix B).

No one right answer.

d.      Develop a measure of that construct.

e.       What could you do to improve or evaluate your measure’s reliability?

·         use machines to record behavior

·         simplify the observer's task

·         train and motivate observers

·         provide clear-cut guidelines on scoring

·         re-check observer's ratings

·         standardize the way the measure is administered

·         calculate a test-retest reliability coefficient

f.       If you had a year to try to validate your measure, how would you go about it? (Hint: Refer to the different kinds of validities discussed in this chapter.)

Validation strategies would include

·         Assessing measure's reliability

·         Assessing convergent validity

·         Assessing discriminant validity

·         Assessing content validity

g.       How vulnerable is your measure to subject and observer bias? Why? Can you change your measure to make it more resistant to these threats?

To make the measure less vulnerable to subject bias

Prevent participants from knowing what behavior is being observed by

·               observing them in a “non-research” setting

·               using unobtrusive observation

·               using unobtrusive measures

·               using unexpected measures

Prevent participants from knowing what concept you are trying to measure by

·               using disguised measures

·               overwhelming participants with measures

Use behaviors that participants won't readily change by using

·               physiological measures

·               important behavior

To make the measure less vulnerable to observer bias

·            Don't use human observers--use machines instead.

·            If you must use human observers, make them “blind”

·            Reduce memory biases by permanently recording the behavior

·            Re-check observer's ratings

·            Clearly define the rating categories

·            Train and motivate raters

·            Use only the raters who were successful during training



10.  Think of a factor that you would like to manipulate.

a.       Define this factor as specifically as you can.

No one correct answer.

b.      Find one example of this factor being manipulated in a published study. Write down the reference citation for that source.

No one correct answer.

c.       How would you manipulate that factor? Why?

Answer should focus on

·         standardization

·         reducing experimenter bias

·         reducing subject biases, including the use of a placebo treatment

·         consistency with theoretical definitions of the construct

·         evidence that the manipulation is effective, such as the results of manipulation checks from other studies

d.      How could you perform a manipulation check on the factor you want to manipulate? Would it be useful to perform a manipulation check? Why or why not?

There is no one answer to how to perform the manipulation check. However, there are clearer answers to the next two questions. Generally, it is a good idea to perform a manipulation check because one should not simply assume that a manipulation was interpreted the way that we wanted it to be interpreted. The manipulation check provides evidence that the treatment is valid (if it is) and may tell you where your study went wrong (if the treatment manipulation is not valid). Thus, if the study doesn't support the hypothesis, the manipulation check may help in determining whether it was the hypothesis or the manipulation that was faulty. 






2.      List the scales of measurement in order from least to most accurate and informative.

Nominal, ordinal, interval, and ratio.



4.      Assume that facial tension is a measure of thinking.

a.       How would you measure facial tension?

Facial tension could be measured as the number of lines a person gets on his/her face during times of stress or by measuring electrical activity of facial muscles.

b.      What scale of measurement is it on?  Why?

You might assume that it is an ordinal scale (more tension means more thinking). Certainly, it would not be safe to assume that you had a ratio scale (twice as much tension means twice as much thinking) and it would probably not be safe to assume that you had an interval scale.

c.       How sensitive do you think this measure should be?  Why?

The measure of lines on the face might not be sensitive (there would be a small range of scores and some random observer error). However, the measure of electrical activity in the facial muscles might be extremely sensitive.



6.      In an ideal world, car gas gauges would be on what scale of measurement? Why?

Ratio. You would want the best measurement possible. It would be nice to know that if you registered having half a tank you really had half a tank.


8.      Find or invent a measure.

a.       Describe the measure.

No one correct answer.

b.      Discuss how you could improve its sensitivity.

No one correct answer.

c.       What kind of data (nominal, ordinal, interval, or ratio), do you think that measure would produce? Why?

No one correct answer.






2.      Steinberg & Dornbusch (1991) also reported that the correlation between hours of employment and interest in school was statistically significant. Specifically, they reported that r(3,989)= -.06, p<.001. [Note that the r (3,989) means that they had 3,989 participants in their study.] Interpret this finding.

The more hours students worked, the less likely they were to be interested in school. However, this effect was extremely small and is only significant because the researchers, by using almost 4,000 participants had an extremely powerful design. The effect is so small that for practical purposes it is meaningless. Put another way, the coefficient of determination is only .0036, meaning that the relationship explains almost none (0.0036 is not that far from 0.00) of the variation in interest in school.
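The arithmetic behind this interpretation can be sketched as follows. This is only a minimal illustration: it assumes that the 3,989 in r(3,989) can be treated as the degrees of freedom for the test, and that 3.291 is the two-tailed critical value for p < .001 when the sample is this large. The function name is ours, not the original researchers'.

```python
import math

def describe_correlation(r, df):
    """Return the coefficient of determination and the t statistic for r."""
    r_squared = r ** 2                               # proportion of variance explained
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)    # t test for a correlation
    return r_squared, t

# Values reported by Steinberg & Dornbusch (1991); df treated as 3,989 (assumption).
r_squared, t = describe_correlation(-0.06, 3989)

# Assumed two-tailed critical value for p < .001 with a very large df.
CRITICAL_T = 3.291

print(f"r squared = {r_squared:.4f}")
print(f"t = {t:.2f}, significant at .001: {abs(t) > CRITICAL_T}")
```

The same tiny r would not come close to significance with a small sample, which is why the size of the relationship and its statistical significance have to be judged separately.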



4.      In the same study, sex was coded as 1= male, 2= female. The correlation between sex and aerobic fitness was -.58,  which was statistically significant at the p<.01 level.

a.       In this study, were men or women more fit?

Men were more aerobically fit.

b.      What would the correlation have been if sex had been coded as 1= female and 2= male?

+.58. Reversing the coding reverses the sign of the correlation but not its size.

c.       From the information we have given you, can you conclude that one gender tends to be more aerobically fit than the other? Why or why not?

No, because you do not know if the sample of men and women were a representative random sample of all men and women.


6.      A physician looked at 26 instances of crib death in a certain town. The physician found that some of these deaths were due to parents suffocating their children. As a result, the physician concluded that most crib deaths in this country are not due to problems in brain development, but to parental abuse and neglect. What problems do you have with the physician's conclusions?

First, the physician generalized from a small and limited sample to the entire country. Second, the physician made an inference about the percentage of instances in the larger population (that most crib deaths are due to parental neglect) without doing any statistical test of this assertion.





1-3. Open-ended.











• Easy to answer

• Easily and objectively scored


• High reliability

• Participants  may dislike

• Participants’ viewpoints may not be represented

• Provides only ordinal data

• Deprives study of power because (1) measure is insensitive and (2) may require use of less powerful statistical techniques.


• Easy to answer

• Easily and objectively  scored

• High reliability

• Sensitive

• Provide interval data

• Can be analyzed with powerful statistical tests

• Potential for summating scores

• Participants may resist




• Allows participants freedom to respond as they choose

• Good for exploratory research

• Time-consuming to answer

• Time-consuming to score

• Hard to score objectively.


5-8. Open-ended.




10.  Why might having participants sign informed consent forms make the study less ethical?

If the survey is anonymous, innocuous, and doesn’t elicit sensitive information from participants, informed consent is not required. Filling out the informed consent form might make the study less ethical by making the participants’ involvement in the study less confidential and more time-consuming without providing any benefits to the participant.








2. In all of the following cases, the researcher wants to make cause-effect statements. What threats to internal validity is the researcher apparently overlooking?

a. Employees are interviewed on job satisfaction. Bosses undergo a three-week training program. When employees are re-interviewed, dissatisfaction seems to be even higher. Therefore, the researcher concludes that the training program caused further employee dissatisfaction.

History--other events besides the training program have happened in the past three weeks. For example, layoffs or salary cuts could have occurred.

Instrumentation--The interviewer may have developed more rapport and been more direct in the second interview. Thus, the second time around, the measure was not the same and not administered the same way.

b. After completing a voluntary workshop on improving the company's image, workers are surveyed. Workers who attended the workshop are now more committed than those in the "no-treatment" group who did not attend the workshop. Researcher's conclusion: The workshop made workers more committed.

Obvious selection problem--even before the workshop, volunteers were probably more committed than non-volunteers.

c. After a 6-month training program, employee productivity improves. Conclusion: Training program caused increased productivity.

Maturation: New workers might have naturally improved their skills over that period.

History: Other events (a new incentive system, a better supervisor, better technology) that happened over the last six months could be responsible for the rise in productivity.

Regression: Would be a likely problem if training was instituted because productivity was at an all-time low.

Mortality: Poorer workers may have left the company.

d. Morale is at an all-time low. As a result, the company hires a "humor consultant." A month later, workers are surveyed and morale has improved. Conclusion: The consultant improved morale.

Regression is the most likely suspect.

Also likely are:

Mortality (unhappy people leaving)

History (management making other changes)


e. Two groups of workers are matched on commitment to the company. One group is asked to attend a two-week workshop on improving the company’s image, the other is the no-treatment group. Workers who complete the workshop are more committed than those in the “no-treatment” group. Researcher’s conclusion: The workshop made workers more committed.

Selection (not all workers who are asked will go) and mortality (people dropping out) are prime suspects.



4. How could a quack psychologist or doctor take advantage of regression toward the mean to make it appear that certain phony treatments actually worked?

If the quack takes people who are feeling unusually bad, those people will tend to improve on their own. That is, they will naturally rebound to their normal levels of health or happiness and the quack can take the credit.
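A small simulation can make this rebound visible. The sketch below rests on assumptions of our own: each person's mood on a given day is a stable level plus random day-to-day fluctuation, the numbers (mean 50, standard deviation 10) are arbitrary, and nobody receives any treatment at all.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical model: a stable "true" level for each person.
people = [random.gauss(50, 10) for _ in range(10_000)]

def observed(true_level):
    """What we measure on a given day: true level plus random fluctuation."""
    return true_level + random.gauss(0, 10)

before = [observed(p) for p in people]
after = [observed(p) for p in people]   # measured again later; nothing was done

# The quack "treats" only the people who felt worst at the first measurement.
cutoff = sorted(before)[len(before) // 10]          # roughly the bottom 10 percent
clients = [i for i, score in enumerate(before) if score <= cutoff]

mean_before = sum(before[i] for i in clients) / len(clients)
mean_after = sum(after[i] for i in clients) / len(clients)

print(f"clients before: {mean_before:.1f}")
print(f"clients after:  {mean_after:.1f}")
```

Because the clients were chosen for extreme scores, their second scores drift back toward the overall average even though the "treatment" did nothing.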



6. Suppose a memory researcher administers a memory test to a group of residents at a nursing home. He finds a group of grade school students that score the same as the older patients on  the memory pretest. He then administers an experimental memory drug to the older patients. A year later, he gives both groups a posttest.

If the researcher finds that the older patients now have a worse memory than the grade school students, what can the researcher conclude? Why?

Nothing--the results could be due to a selection by maturation interaction due to the school children's memories improving and the older patients' memories naturally staying the same or declining slightly.

If the researcher finds that the older patients now have a better memory than the grade school students, what can the researcher conclude? Why?

The researcher might have an easier time concluding that the drug improves memory because the difference is opposite of what would be expected on the basis of selection by maturation interactions. However, history effects are still possible (if other interventions are going on at the nursing home) and regression might be possible (if the children selected had unusually high scores for their grade level).



8. What is the difference between testing and instrumentation?

The difference between testing and instrumentation is that in testing participants may remember things from the previous test and therefore score higher, whereas in instrumentation, the actual measuring instrument changes or the way it is administered changes.



10. What is the difference between internal and external validity?

Internal validity refers to whether you can make the statement that, in a given study, with these participants, the treatment caused an effect.

External validity refers to whether you can  generalize what you discovered in a particular study to other people, situations, and times.






2. Participants are randomly assigned to meditation or no meditation condition. The meditation group meditates three times a week. The meditation group reports being significantly more relaxed than the no meditation group.

a. Why might the results of this experiment be less clear-cut than they may first appear?

There is a construct validity problem. The meditation group may feel more relaxed because of a placebo effect or they may simply report being more relaxed because they think that is what the experimenter wants them to say. In addition, it may be that the tense people dropped out of the experimental group because they were unable or unwilling to keep to the schedule of meditating three times a week.

b. How would you improve this experiment?

The experiment could be improved by improving the control group. For example, the control group might be assigned to keep to a schedule where they would listen to classical music three times a week. Alternatively, they might be asked to keep to a schedule where they would have “quiet time” three times a week.



4. A training program significantly improves worker performance. What should you know before advising a company to invest in such a training program?

You should know  how big the difference was. A statistically significant difference may not be big enough to be worth paying for.



6. Students were randomly assigned to two different strategies of studying for an exam. One group used visual imagery, the other group was told to study their normal way.

The visual imagery group scored 88% on the test, compared with 76% for the control group. This difference was not significant.

a. What, if anything, can the experimenter conclude?

Nothing--null results are inconclusive.

b. If the difference had been significant, what would you have concluded? What changes in the study would have made it easier to be sure of your conclusions?

Imagery seems to improve recall. We would be more confident of our conclusions if they hadn't used an “empty control group.” Ideally, the control group would have gotten some placebo-type treatment (a lecture on the importance of studying). 

c. "To be sure that they are studying the way they  should, why don't you have the imagery people form one study group and have the control group form another study group." Is this good advice? Why or why not?

This is bad advice because members of the same study group would influence one another, so their scores would no longer be independent.

d. "Just get a random sample of students who typically use imagery and compare them to a sample of students who don't use imagery. That will do the same thing as random assignment" Is this good advice? Why or why not?

This is bad advice. Random sampling is very different from random assignment. People who typically use imagery may differ from people who don't typically use imagery in a wide variety of ways. They are probably more visual thinkers and may do better in art, architecture, geometry, and chemistry than people who do not typically use imagery.




8. Gerald's dependent measure is the order in which people turned in their exam (1st, 2nd, 3rd, etc.). Can Gerald use a t test on this data? Why or why not? What would you advise Gerald to do in future studies?

Gerald should not use a t test because he has  ordinal data. Because he has ordinal data, computing means for the control group and the experimental group (a first step in doing a t test) would be misleading. Next time, Gerald should record at what time people turned in their exam. Then, Gerald would have data that were at least interval.



10. Are the results of  experiment A or experiment B more likely to be significant? Why?


























































Experiment B’s results are more likely to be statistically significant because it studied more participants. Having more participants allows random error more opportunities to balance out. Consequently, with more participants, a moderate difference between the groups is less likely to be due to chance alone. When we do the calculations, we find that for Experiment A t = 1.225, which is not significant, and that for Experiment B, t = 2.449, which is significant.
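To see why the number of participants matters, here is a minimal sketch of the t calculation for two equal-sized groups. The summary statistics are hypothetical and chosen only to keep the arithmetic simple; they are not taken from the exercise, whose data table is not reproduced here. The function name is also ours.

```python
import math

def independent_t(mean_difference, sd, n_per_group):
    """t for two equal-sized groups that share the same standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error

# Hypothetical summary statistics (not the data from the exercise).
small = independent_t(mean_difference=1.0, sd=1.0, n_per_group=3)
large = independent_t(mean_difference=1.0, sd=1.0, n_per_group=12)

print(f"3 per group:  t = {small:.3f}")
print(f"12 per group: t = {large:.3f}")
print(f"ratio = {large / small:.1f}")
```

With the mean difference and standard deviation held constant, quadrupling the number of participants per group doubles t, because the standard error shrinks with the square root of the sample size.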





2. Suppose people living in homes for the elderly were randomly assigned to two groups:  a no treatment group and a transcendental meditation (TM) group.  Transcendental meditation involves more than sitting with eyes closed.  The technique involves both a "mantra, or meaningless sound selected for its value in facilitating or settling down process and a specific procedure for using it mentally without effort again to facilitate transcending" (Alexander, Langer, Newman, Chandler, & Davies, 1989).  Thus, the TM group was given instruction in how to perform the technique, then "they met with their instructors 1/2 hour each week to verify that they were meditating correctly and regularly.  They were to practice their program 20 minutes twice daily (morning and afternoon) sitting comfortably in their own room with eyes closed and using a timepiece to ensure correct length of practice."  (Alexander, Langer, Newman, Chandler, & Davies). Suppose that the TM group performed significantly better than other groups on a mental health measure.

a. Could the researcher conclude that it was the transcendental meditation that caused the effect?

No, because the control group was an empty control group.

b. What besides the specific aspects of TM could cause the difference between the two groups?

The extra attention the TM group received, the structure of a routine that was imposed on the TM group, as well as the fact that those who weren't able to learn the TM technique or who didn't continue to apply the technique would be dropped from the study. Thus, people may be dropping out of the experimental group, but not out of the control group.

c. What control groups would you add?

A group that had to undergo some training (e.g., critical thinking) and would have to practice what they had learned twice a day and meet with their instructors once a week.

d. Suppose you added these control groups and then got a significant F for the treatment variable? What could you conclude? Why?

Conclusion: At least one of the groups differs from the others. In other words, at least one of the treatments had an effect. However, we would not be able to say which groups differed from each other until we did a post hoc test.



4. Assume a researcher is looking at the relationship between caffeine consumption and sense of humor.

a. How many levels of caffeine should the researcher use? Why?

At least three because the relationship might be nonlinear. For example, people might have little sense of humor with no caffeine (they're not awake) and little with an extreme amount of caffeine (they are too hyped up and irritable), but a good sense of humor under moderate levels of caffeine. Using three or more levels of  caffeine would allow us to detect some nonlinear trends and help us make predictions about the effects of levels of caffeine that we had not directly tested.

b. What levels would you choose? Why?

Three to four levels: a no caffeine group, a low caffeine group, a moderate caffeine group, and a high caffeine group. Make sure that the amounts of caffeine are evenly spaced (e.g., 0 mg, 20 mg, 40 mg, 60 mg) so that trend analyses can be performed.

c. If a graph of the data suggests a curvilinear relationship, can the researcher assume that the functional relationship between the independent and dependent variable is curvilinear? Why or why not?

No. The researcher must do a post hoc trend analysis to make sure the observed pattern is reliable.

d. Suppose the researcher used the following four levels of caffeine: 0 mg., 20 mg., 25 mg., 26 mg. Can the researcher do a trend analysis? Why or why not?

No—the levels are not evenly or proportionately spaced.

e. Suppose the researcher ranked participants based on their sense of humor. That is, the person that laughed least got a score of "1", the person who laughed second least got a "2", etc.  Can the researcher use this data to do a trend analysis? Why or why not?

No—you need at least interval scale measurement to do a trend analysis. Ranked data is only ordinal. 

f. If a researcher used 4 levels of caffeine, how many trends can the researcher look for?                  

3 (one less than the number of levels)

What is the treatment's degrees of freedom?

3 (also one less than the number of levels)

g. If the researcher used 3 levels of caffeine and 30 participants, what are the degrees of freedom for the treatment?

2 (one less than the number of levels)

the degrees of freedom for the error term?

27 (the number of participants minus the number of levels)

h. Suppose the F is 3.34  Referring to the degrees of freedom you obtained in your answer to "g" (above) and  to the table E-3, are the results statistically significant?

No--if the significance rule is that p < .05

Can the researcher look for linear and quadratic trends?

No—if the results are not statistically significant, then the researcher cannot look for trends.
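The degrees-of-freedom rules used in parts f through h can be sketched as below. The function name is hypothetical, and the critical value of 3.35 for F(2, 27) at the .05 level is our reading of a standard F table rather than a value quoted in this answer key.

```python
def anova_degrees_of_freedom(levels, participants):
    """Between-subjects one-way ANOVA: df for treatment and for error."""
    df_treatment = levels - 1              # one less than the number of levels
    df_error = participants - levels       # participants minus number of levels
    return df_treatment, df_error

df_treatment, df_error = anova_degrees_of_freedom(levels=3, participants=30)

# Assumed critical value of F(2, 27) at p < .05, read from a standard F table.
CRITICAL_F = 3.35
obtained_f = 3.34

print(f"df treatment = {df_treatment}, df error = {df_error}")
print(f"significant at .05: {obtained_f > CRITICAL_F}")
```

An F of 3.34 falls just short of the assumed critical value, which is why the answer to part h is "no" under the p < .05 rule.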



6. A friend gives you the following Fs and significance levels. On what basis, would you want these Fs (or significance levels) re-checked?

a. F (2, 63)=.10, not significant

Even when the treatment has no effect, F's are rarely close to zero. Instead, they are usually closer to 1.00. After all, if there is no treatment effect, then, at a conceptual level, you are dividing an estimate of error variance by another estimate of the same error variance. Dividing anything by itself should result in a number close to 1.

b. F (3, 85) = -1.70, not significant

F’s can’t be negative. You are dividing a squared term by another squared term.

c. F (1, 120)= 52.8, not significant

Such a large F with so many degrees of freedom would have to be significant. Indeed, according to the F table in Appendix E, the critical value of F(1,120) is 3.92.

d. F (5, 70) = 1.00, significant

F's close to one are rarely significant. An F of one is expected even when there is absolutely no effect. Indeed, the lowest critical value of F on the entire F table in Appendix E is 1.46, and that's for an F with 30 and an infinite number of degrees of freedom.
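Two of the points above (F cannot be negative, and F tends to be near 1 when there is no treatment effect) can be checked with a small simulation. This is a sketch under our own assumptions: three groups of 22 scores each, all drawn from the same normal population (mean 100, standard deviation 15), which gives the 2 and 63 degrees of freedom from part a.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def f_ratio(groups):
    """One-way between-subjects F: MS between groups / MS within groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Both sums of squares are sums of squared terms, so neither can be negative.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# 2,000 simulated studies in which the treatment has no effect at all.
fs = [
    f_ratio([[random.gauss(100, 15) for _ in range(22)] for _ in range(3)])
    for _ in range(2000)
]

mean_f = sum(fs) / len(fs)
print(f"mean F = {mean_f:.2f}")
print(f"smallest F = {min(fs):.4f}")
```

Across many such no-effect studies the average F sits close to 1, and no F is ever below zero.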



8. Complete the following table.













Error (E)








SS Total=













2. Can you have an interaction without a main effect?

Yes. Having a main effect has no impact on whether you will have an interaction.
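A crossover pattern shows how this can happen. The cell means below are hypothetical, and the sketch only describes the pattern in the means; it says nothing about whether the effects would be statistically significant.

```python
# Hypothetical 2 x 2 cell means showing a crossover interaction.
# Rows are levels of factor A; columns are levels of factor B.
cell_means = [
    [10, 20],   # A1: B1, B2
    [20, 10],   # A2: B1, B2
]

# Main effect of A: difference between the row averages.
main_effect_a = sum(cell_means[1]) / 2 - sum(cell_means[0]) / 2

# Main effect of B: difference between the column averages.
column_b1 = (cell_means[0][0] + cell_means[1][0]) / 2
column_b2 = (cell_means[0][1] + cell_means[1][1]) / 2
main_effect_b = column_b2 - column_b1

# Interaction: does the effect of B differ across the levels of A?
effect_of_b_at_a1 = cell_means[0][1] - cell_means[0][0]
effect_of_b_at_a2 = cell_means[1][1] - cell_means[1][0]
interaction = effect_of_b_at_a2 - effect_of_b_at_a1

print(main_effect_a, main_effect_b, interaction)
```

Both row averages and both column averages equal 15, so neither factor has an overall effect, yet the effect of B reverses direction from one level of A to the other.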




4. Describe the pattern of results in the following table in terms of main effects and interactions. Assume that all differences are statistically significant.


Status of Speaker

Rate of Speech

Low Status

Hi Status








Attitude Change

Main effect for status, main effect for rate of speech, and an interaction.




6. The following table is an ANOVA summary table of a study looking at the effects of similarity and attractiveness on liking. Complete the table. Then, answer these three questions.



a. How many participants were used in the study?


b. How many levels of similarity were used?


c. How many levels of attractiveness were used?









[ANOVA summary table not recovered. Surviving row labels: Similarity (S); Attractiveness (A); S X A interaction.]

















8. A lab experiment on motivation yielded the following results:




No financial bonus, no encouragement


No financial bonus, encouragement


Financial bonus, no encouragement


Financial bonus, encouragement




a.       Make a 2 X 2 table of these data.



No encouragement


No financial bonus







b.      Graph these data.


c.       Describe the results in terms of main effects and interactions.     

Bonus main effect; Encouragement main effect; Interaction between bonus and encouragement.

d.      What is your interpretation of the findings?

One interpretation is that you can use either bonuses or encouragement, but there is no need to do both. However, it is possible that this ordinal interaction is due to a ceiling effect and that a better measure of productivity might find that encouragement combined with bonuses is better than either encouragement or bonuses alone.





10. Suppose a researcher wanted to know whether lecturing was more effective than group discussion for teaching basic facts.  Therefore, the researcher did a study and obtained the following results:


[ANOVA summary table not recovered. Surviving labels: Source of Variance; Teaching (T); Introversion/Extroversion (I); T X I interaction.]











a. What does the interaction seem to indicate?

The effectiveness of the teaching styles differs depending on whether introverts or extroverts are being taught. Without seeing the means, it is dangerous to speculate, but if one had to guess, one might say that introverts responded better to the lecture method whereas extroverts responded better to the group discussion method.

b. Even if there had been no interaction between teaching and introversion-extroversion, would there be any value in including the introversion-extroversion variable? Explain.

Yes, because including it lets us find out whether the effectiveness of a teaching style is moderated by introversion.

c. What, if anything, can you conclude about the effects of introversion on learning?

Nothing. Introversion was measured rather than manipulated, so it is not an experimental factor.







2.      A researcher uses a simple between-subjects experiment involving ten participants to examine the effects of memory strategy (repetition versus imagery) on memory. 

a.       Do you think the researcher will find a significant effect? Why or why not?

No. With only ten participants, the study would have too little power.

b.      What design would you recommend?

A counterbalanced design so that the researcher could have the power of a within-subjects design and yet control for order effects.

c.       If the researcher had used a matched pairs study involving 10 participants, would the study have more power? Why? How many degrees of freedom would the researcher have? What type of matching task would you suggest? Why?

Yes. The design should have more power because random error due to individual differences would be reduced, thereby making the treatment effect easier to detect.

Only 4 (one less than the number of pairs; 10 participants form 5 pairs).

A reliable, sensitive, valid memory test that would be similar to the memory test used in the real study. Ideally, we would use a test that correlated highly with the real measure. We would use such a task because we do not have to worry about deception and because it is most likely to give us accurately matched pairs (a real concern when we only have five pairs).
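The degrees-of-freedom trade-off in this answer can be worked through in a few lines. This is a minimal sketch assuming the usual rules for a two-group between-subjects t test (participants minus number of groups) and a matched-pairs t test (pairs minus one); the function names are hypothetical.

```python
def df_between_subjects(n_participants, n_groups=2):
    """Degrees of freedom for a simple between-subjects comparison."""
    return n_participants - n_groups


def df_matched_pairs(n_participants):
    """Degrees of freedom for a matched-pairs comparison (pairs minus one)."""
    n_pairs = n_participants // 2
    return n_pairs - 1


print(df_between_subjects(10))  # simple between-subjects experiment
print(df_matched_pairs(10))     # matched-pairs design with the same 10 people
```

So matching costs degrees of freedom (4 rather than 8 with ten participants), which is why the matching task needs to reduce error variance enough to make up for that loss.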


4.      What problems would there be in using a within-subjects design to study the "humor-perseverance" study (discussed in question 3)? Would a counterbalanced design solve these problems?

Participants would probably figure out what the study was about, thus hurting construct validity. Also, participants might be more frustrated during the second exposure to the frustrating task (a practice effect). In addition, there might be an interesting carry-over effect of humor for participants receiving the humor/no-humor sequence: Irritability in the “no humor” condition might be due to “coming down” from laughing (if one buys opponent process theory).

A counterbalanced design would not completely solve these problems. However, it might be able to balance out and measure these effects. Thus, the design might let you know whether these factors were problems.


6.      Two researchers hypothesize that spatial problems will be solved more quickly when the problems are presented to participants' left visual fields than when stimuli are presented to participants' right visual fields (because messages seen in the left visual field go directly to the right brain, which is often assumed to be better at processing spatial information).  Conversely, they believe verbal tasks will be performed more quickly when stimuli are presented to participants' right visual fields than when the tasks are presented to participants' left visual fields.  What design would you recommend?  Why?

A within-subjects design or a counterbalanced design because

o        the differences looked for are probably fractions of a second, so you need a powerful design that will reduce error variance and allow you to get many observations.

o        the hypothesis is not so intuitive that participants are likely to guess it and play along. Therefore, sensitization is not a big problem.

o        a few warm-up trials could minimize practice effects, and keeping the study short would minimize fatigue effects (especially since the task is so simple). In addition, we could use counterbalancing to balance out practice and fatigue effects.
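One simple way to counterbalance is to cycle participants through every possible order of the conditions. The sketch below assumes just two blocked conditions ("left" and "right" visual field) and a hypothetical helper named `counterbalanced_orders`; a real study of this kind might instead randomize the visual field from trial to trial.

```python
from itertools import cycle, permutations


def counterbalanced_orders(conditions, n_participants):
    """Assign each participant one order, cycling through all possible orders."""
    all_orders = list(permutations(conditions))
    return [order for order, _ in zip(cycle(all_orders), range(n_participants))]


orders = counterbalanced_orders(["left", "right"], 8)
for participant, order in enumerate(orders, start=1):
    print(participant, order)
```

When the number of participants is a multiple of the number of orders, each order is used equally often, so practice and fatigue effects are spread evenly across conditions.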


8.      You want to determine whether caffeine, a snack, or a brief walk has the most beneficial effect on mood. What design would you use? Why? How?

A between-subjects design would probably be best to avoid problems with (a) the order effects that would affect within-subjects designs and (b) catching on to the hypothesis (sensitization) that would affect both matched-pairs and within-subjects designs. This would be done simply by randomly assigning participants to groups. If you did not want to use a pure between-subjects design, you could use a mixed design in which the within-subjects variable would be before vs. after the treatment and the between-subjects variable would be caffeine vs. snack vs. the brief walk. In that case, you would be looking for a significant interaction between trials (before or after) and the treatment variable.
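Random assignment to the three groups could be carried out along the following lines. This is a sketch only: the participant IDs, the group size of ten, and the function name `randomly_assign` are assumptions made for illustration.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


participant_ids = list(range(1, 31))  # 30 hypothetical participants
groups = randomly_assign(participant_ids, ["caffeine", "snack", "walk"], seed=42)
for condition, members in groups.items():
    print(condition, sorted(members))
```

Shuffling first and then dealing participants out in turn keeps the groups equal in size while still leaving group membership to chance.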


10.  A researcher wants to know whether music lessons increase scores on IQ subtests and whether music lessons have more of an effect on some subtests (e.g., more of an effect on math than on vocabulary) than others.

a.       Would you make music lessons a between or within subjects factor? Why?

Between-subjects. It varies between subjects in real life, and there might be substantial carryover effects.

b.      Would you make subtests a between or within subjects factor? Why?

A within-subjects factor. There is little concern about order effects and it would give the study much more power.

c.       If the researcher did an analysis of variance (ANOVA) on the data, the researcher would obtain three effects. Name those three effects.

A between-subjects main effect for music lessons, a within-subjects main effect for subtests, and an interaction between subtests and music lessons.

d.      What effect would the researcher look for to determine whether music lessons increase scores on IQ subtests?

The between-subjects main effect of music lessons.

e.       What effect would the researcher look for to determine whether music lessons have more of an effect on math subtests than on vocabulary subtests?

The interaction between music lessons and subtests.






2. If the study does not manipulate the treatment, which requirement of establishing causality will be difficult to meet?

Temporal precedence



4. Compare and contrast how single-subject experiments and randomized experiments account for non-treatment factors.


Single-n experiments

1. Eliminate between-subjects variables by studying a single subject.

2. Control relevant environmental factors and demonstrate control of extraneous variables by establishing a stable baseline.

Randomized experiments

1. Use independent random assignment to be sure that irrelevant variables vary randomly rather than systematically.

2. Use tests of statistical significance to see if it is unlikely that random factors could account for the differences.







6. How do the A-B design and the pretest-posttest design differ in terms of

a. Procedure?

The pretest-posttest design uses more participants, does not attempt to develop a stable baseline, and usually exerts less control over non-treatment variables.

b. Internal validity?

Because the pretest-posttest researcher has not established a stable baseline and does not exert as much control over extraneous variables, the pretest-posttest design has less internal validity than the A-B design.



8. Design a quasi-experiment that looks at the effects of a course on simulating parenthood, including an assignment that involves taking care of an egg, on changing the expectations of junior-high school students about parenting. What kind of design would you use? Why?

A randomized experiment would probably be the best choice because it is (a) feasible and (b) would have internal validity. The next best choice would probably be a time-series design with a control group because the control group might be able to rule out some of the history effects. A time-series design without a control group would be better than a pretest-posttest design because it could better estimate the effects of maturation. However, a pretest-posttest design would be better than a nonequivalent control group design because the nonequivalent control group is so vulnerable to selection.



10. According to one study, holding students back a grade harmed students. The evidence: students who had been held back a grade did much worse in school than students who had not been held back.

a. Does this evidence prove that holding students back harms their performance? Why or why not?

No—there is a strong possibility that those who were held back differ in certain ways from those who were not held back.

b. If you were a researcher hired by the Dept. of Education to test the assertion that holding students back harms them, which of the designs in this chapter would you use? Why?

A time-series design would be inadequate because dropping out could reflect some historical force (e.g., better employment opportunities). A nonequivalent control group design would not be adequate because the groups are different to begin with. Therefore, you should use a two-group time-series design. To make your “held back” and “not held back” groups as equivalent as possible, you might

o        attempt to match on key variables, such as IQ and attendance.

o        hope that you could find a district where students were held back according to some rule (e.g., scored below 50% on a standardized test). Then, you might compare those who were just above the cut-off (50-51%) to those who were just below (49-50%).

o        hope that different districts had different cut-off points so that you could compare 50% scorers who were held back against 50% scorers who advanced.
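The cut-off comparison suggested above can be sketched as follows. Everything specific here is an assumption for illustration: the scores and outcomes are made up, the rule that students scoring below the cut-off are held back is hypothetical, and `near_cutoff_comparison` is not a function from any library.

```python
from statistics import fmean


def near_cutoff_comparison(students, cutoff=50.0, band=1.0):
    """Compare later outcomes for students just below vs. just above a cut-off.

    `students` is a list of (test_score, later_outcome) pairs. The rule assumed
    here (hypothetical) is that students scoring below `cutoff` are held back.
    """
    held_back = [outcome for score, outcome in students
                 if cutoff - band <= score < cutoff]
    advanced = [outcome for score, outcome in students
                if cutoff <= score <= cutoff + band]
    if not held_back or not advanced:
        raise ValueError("Need at least one student on each side of the cut-off.")
    return fmean(held_back), fmean(advanced)


# Made-up scores and later outcomes, for illustration only.
example_students = [(49.2, 70), (49.8, 74), (50.1, 73), (50.9, 75),
                    (30.0, 40), (90.0, 95)]
held_back_mean, advanced_mean = near_cutoff_comparison(example_students)
print(held_back_mean, advanced_mean)
```

Restricting the comparison to a narrow band around the cut-off drops students who scored far from it (such as the 30% and 90% scorers here), so the two groups being compared should be similar in everything except whether they were held back.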

