1. Why is bias considered more serious than random error?
Hints:
They are both errors, but how do they differ? Which
one introduces systematic error that could bias a study’s results? Which one
averages out to zero? Which one poisons validity and
which one dilutes it? The answers to these and other
questions are on pages 147-150 of your text.
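If it helps to see the difference in action, here is a minimal simulation sketch (it is not from the text). It assumes a hypothetical true score of 100, normally distributed random error, and a constant bias of 5 points; every name and number in it is made up for illustration. Averaging many measurements shrinks the random error toward zero but leaves the bias untouched.

```python
import random
import statistics


def simulate_measurements(true_score, n, random_sd, bias, seed=0):
    """Return n observed scores: true score + constant bias + random noise."""
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0, random_sd) for _ in range(n)]


TRUE_SCORE = 100.0  # hypothetical true value (an assumption for this sketch)

# Random error only: each score is noisy, but the errors tend to cancel out.
noisy = simulate_measurements(TRUE_SCORE, n=10_000, random_sd=10.0, bias=0.0)

# Same noise plus a constant bias: averaging does not remove the bias.
biased = simulate_measurements(TRUE_SCORE, n=10_000, random_sd=10.0, bias=5.0)

noisy_mean_error = statistics.mean(noisy) - TRUE_SCORE
biased_mean_error = statistics.mean(biased) - TRUE_SCORE

print(f"Mean error with random error only: {noisy_mean_error:.2f}")
print(f"Mean error with bias added:        {biased_mean_error:.2f}")
```

With enough measurements, the first mean error should land near 0 and the second near the size of the bias.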
2. What are the two primary types of subject bias?
The two types of subject bias are social desirability bias and obeying demand characteristics.
What are the differences between these two sources?
With social desirability bias, participants try to make themselves look good. With obeying demand characteristics, participants try to make the researcher look good by giving the researcher results that will support the hypothesis.
3. Suppose a “social intelligence” test in a popular magazine had high internal consistency. What would that mean?
Hints: If a participant missed one question, would that person tend to miss most of the other questions? If a person got a certain question right, would that person tend to get most of the other questions right? If each question was considered a judge of social intelligence, would the “judges” agree with each other? The answers to these and other questions are on pages 171-173.
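One common index of internal consistency is Cronbach's alpha. The sketch below is only an illustration, not the text's procedure. It uses the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The right/wrong (1/0) answers are made up.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per person)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose so that each tuple holds every participant's score on one item.
    items = list(zip(*responses))
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: people who get one item right tend to get the others right.
consistent = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

# Hypothetical data: getting one item right says little about the other items.
inconsistent = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

print(f"alpha when the 'judges' agree:    {cronbach_alpha(consistent):.2f}")
print(f"alpha when the 'judges' disagree: {cronbach_alpha(inconsistent):.2f}")
```

A high alpha says only that the items agree with each other. It does not tell you what they agree on.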
Why would you still want to see whether the test had discriminant validity?
Hint: What, other than social intelligence, might the test be measuring? If you can’t think of anything, re-read pages 180-183.
How would you do a study to determine whether the test had discriminant validity?
Hints: What tests besides the “social intelligence” test would you administer? What would it mean if scores on the social intelligence test correlated highly with scores on those other tests? What would it mean if scores on the social intelligence test did not correlate highly with the scores on the other tests? If you need more help, re-read pages 180-183.
4. Given that IQ tests are not perfectly reliable, why would it be irresponsible to tell someone his or her score on an IQ test?
People tend to think that their IQ is exactly the same as their test score. Thus, if told their IQ score was 97, they would tend to think of their IQ as being exactly 97. However, scores have random error. Thus, someone who scored 97 one day might score 105 the next.
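To put a rough number on that random error, you can use the standard error of measurement: SEM = SD * sqrt(1 - reliability). The sketch below is illustrative only. The SD of 15 is the usual IQ scaling, but the reliability of .90 is an assumed value, not one taken from the text, and the band around the observed score is only an approximation.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); reliability must be between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate band (observed +/- z * SEM) around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Assumed values for illustration: SD = 15, reliability = .90.
low, high = score_band(observed=97, sd=15, reliability=0.90)
print(f"An observed score of 97 is roughly consistent with {low:.0f} to {high:.0f}")
```

Under these assumptions, a retest score of 105 falls inside the band, so the jump from 97 to 105 in the answer above could be due to random error alone.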
5. What is content validity?
Hint: Content validity is defined and explained on page 176.
How does it differ from internal consistency?
Hint: Which one requires statistical evidence such as average inter-item correlation, Cronbach’s alpha, odd-even correlations, and Kuder-Richardson coefficients to establish that all the items on a scale seem to be measuring the same thing? Which one requires expert judgment that the items are consistent with the concept’s definition? Which is concerned with showing that the items represent a fair sample of the construct’s key aspects, and which is concerned that all the items are measuring one thing? If you are not sure about your answers to these questions, refer to pages 176-179 of the text.
For what
measures is it most important?
Hint:
See paragraph 5 on page 176.
6.
Swann and Rentfrow (2001) wanted to develop a test “that measures the
extent to which people respond to others quickly and effusively.” In their view, high scorers would tend to blurt out their
thoughts to others immediately and low scorers would be slow to respond.
a. How would you use the known-groups technique to get evidence of
your measure’s construct validity?
You could see whether car salespeople scored higher than
librarians.
b. What measures would you correlate with your scale to make the case for your measure’s discriminant validity?
Extraversion and social desirability.
Why?
Extraversion: Your claim is that your measure is doing something other than measuring outgoingness. Social desirability: It is usually good to show that you are not just measuring a response bias.
In what range would the correlation coefficients between those measures and your measure have to be to provide evidence of discriminant validity? Why?
For extraversion, you would be satisfied with a correlation between .3
and .7. You expect that the trait would be related to
extraversion. Thus, you would expect your measure to
correlate with a measure of extraversion, but you would certainly want it to be
below .8—otherwise, it may just be a measure of extraversion.
For social desirability, you would like a correlation around 0 (in the
-.2 to +.2 range) because you do not think that your
trait is related to social desirability.
If your trait is not related to social desirability, your measure of
that trait should not be related to social desirability.
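The range checks described above can be sketched in a few lines. This is a minimal illustration rather than a prescribed analysis: the scores are invented for eight hypothetical participants, and the function and variable names are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for r to be defined")
    return cov / math.sqrt(ss_x * ss_y)


def in_range(r, low, high):
    """True if the correlation falls inside the range you would accept."""
    return low <= r <= high


# Invented scores for eight hypothetical participants.
blurting = [10, 12, 15, 18, 20, 22, 25, 28]
extraversion = [30, 22, 35, 25, 40, 28, 33, 38]
social_desirability = [14, 18, 12, 17, 15, 13, 18, 14]

r_extraversion = pearson_r(blurting, extraversion)
r_social_desirability = pearson_r(blurting, social_desirability)

# Related to extraversion, but not so strongly that it is just extraversion.
print(f"extraversion: r = {r_extraversion:.2f}, "
      f"within .3 to .7? {in_range(r_extraversion, 0.3, 0.7)}")

# Essentially unrelated to social desirability.
print(f"social desirability: r = {r_social_desirability:.2f}, "
      f"within -.2 to +.2? {in_range(r_social_desirability, -0.2, 0.2)}")
```

With a real sample, you would also want enough participants for the correlations to be stable; eight is used here only to keep the example short.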
c. To provide evidence of convergent validity, you
could correlate scores on your measure with a behavior typical of people who
blurt out their thoughts. What behavior would you
choose? Why?
Interrupting
others, talking during movies, or responding to rude behavior—because people
who blurt out their thoughts might not be able to help themselves from
interrupting others, talking during movies, or responding to rude behavior.
7. A researcher wants to measure "aggressive tendencies." The researcher is considering two choices: a paper-and-pencil test of aggressive impulses or observation of actual aggression.
a. What problems might there be with observing participants' aggressive behavior?
Hints: In Box 5.3, consider points 1b and 7. To see how to solve these problems, refer to Table 5.1.
b. What would probably be the most serious threat to the validity of a paper-and-pencil test of aggression?
Hint: See pages 155-160.
What information about the test would suggest that the test is a good instrument?
Hint: See p. 184: Both Figure 5.7 and Table 5.4 are helpful.
8. Think of a construct that you would like to measure.
a. Name that construct.
No one right answer.
b. Define that construct.
The definition should be drawn from a dictionary, psychological dictionary, or theory.
c. Locate two published measures of that concept (see Web Appendix B).
No one right answer.
d. Develop a measure of that construct.
e. What could you do to improve or evaluate your measure’s reliability?
· use machines to record behavior
· simplify the observer's task
· train and motivate observers
· provide clear-cut guidelines on scoring
· re-check observer's ratings
· standardize the way the measure is administered
· calculate a test-retest reliability coefficient
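As an illustration of the last item in the list, a test-retest reliability coefficient is simply the correlation between scores from two administrations of the same measure. The sketch below assumes scores are stored by a hypothetical participant ID and uses only participants who completed both sessions. The data and names are invented.

```python
import math


def retest_reliability(time1, time2):
    """Pearson r between two sessions, using participants present in both."""
    shared = sorted(set(time1) & set(time2))
    if len(shared) < 2:
        raise ValueError("need at least two participants tested twice")
    x = [time1[p] for p in shared]
    y = [time2[p] for p in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for r to be defined")
    return cov / math.sqrt(ss_x * ss_y)


# Invented scores; "p5" missed the first session and is dropped automatically.
first_session = {"p1": 10, "p2": 14, "p3": 8, "p4": 12}
second_session = {"p1": 12, "p2": 13, "p3": 9, "p4": 12, "p5": 7}

r = retest_reliability(first_session, second_session)
print(f"test-retest r = {r:.2f}")
```

The closer the coefficient is to 1, the less the scores are being pushed around by random error from one occasion to the next.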
f. If you had a year to try to validate your measure, how would you go about it? (Hint: Refer to the different kinds of validities discussed in this chapter.)
Validation strategies would include
· Assessing the measure's reliability
· Assessing convergent validity
· Assessing discriminant validity
· Assessing content validity
g. How vulnerable is your measure to subject and observer bias? Why? Can you change your measure to make it more resistant to these threats?
To make the measure less vulnerable to subject bias
Prevent participants from knowing what behavior is being observed by
· observing them in a “non-research” setting
· using unobtrusive observation
· using unobtrusive measures
· using unexpected measures
Prevent participants from knowing what concept you are trying to measure by
· using disguised measures
· overwhelming participants with measures
Use behaviors that participants won't readily change by using
· physiological measures
· important behavior
To make the measure less vulnerable to observer bias
· Don't use human observers; use machines instead.
· If you must use human observers, make them “blind.”
· Reduce memory biases by permanently recording the behavior
· Re-check observer's ratings
· Clearly define the rating categories
· Train and motivate raters
· Use only the raters who were successful during training
9. What problems do you see with measuring "athletic ability" as 40-yard dash speed? What steps would you take to improve this measure? (Hint: Think about solving the problems of bias and reliability.)
Hints:
10. Think of a factor that you would like to manipulate.
a. Define this factor as specifically as you can.
No one correct answer.
b. Find one example of this factor being manipulated in a published study. Write down the reference citation for that source.
No one correct answer.
c. Would you use an environmental or instructional manipulation? Why?
No one correct answer.
d. How would you manipulate that factor? Why?
Answer should focus on
· standardization
· reducing experimenter bias
· reducing subject biases, including the use of a placebo treatment
· consistency with theoretical definitions of the construct
· evidence that the manipulation is effective, such as the results of manipulation checks from other studies
e. How could you perform a manipulation check on the factor you want to manipulate? Would it be useful to perform a manipulation check? Why or why not?
There is no one answer to how to perform the manipulation check. However, there are clearer answers to the next two questions. Generally, it is a good idea to perform a manipulation check because you should not simply assume that participants interpreted the manipulation the way you wanted them to. The manipulation check provides evidence that the treatment is valid (if it is) and may tell you where your study went wrong (if the treatment manipulation is not valid). Thus, if the study doesn't support the hypothesis, the manipulation check may help in determining whether it was the hypothesis or the manipulation that was faulty.