Chapter 5 Glossary
operational definition: a clear, specific, publicly observable way to measure or manipulate a variable; a "recipe" for how you are going to measure or manipulate your factors. You hope that your operational definition will be unbiased, reliable, and valid. (p. 144)
bias: systematic errors that can push the scores in a given direction. Bias may lead to "finding" the results that the researcher expected or wanted. (p. 147)
observer bias: bias created by the observer seeing what the observer expects to see, or selectively remembering/counting/looking for data that support the observer's point of view (also known as scorer bias). (p. 150)
blind (masked), blind observer: an observer who is unaware of the participant's characteristics and situation. Using blind observers reduces observer bias. (p. 152)
random error of measurement: inconsistent, unsystematic errors of measurement. Carelessness on the part of the researcher administering the measure, on the part of the participant taking the test, and on the part of the individual scoring the measure can cause random error. (p. 147)
reliable, reliability: the extent to which a measure produces stable, consistent scores. Measures are able to produce such stable scores if they are not strongly influenced by random error. A measure can be reliable, but not valid (e.g., it could be measuring the wrong thing reliably). However, if a measure is not reliable, it cannot be valid (if it is measuring a stable trait, it must produce stable scores; if your friend's height is stable but your measurements vary by an average of a foot, your average measurement must be off by at least a foot). (p. 161)
test-retest reliability: a way of assessing the total amount of random error in a measure by administering the measure to participants at two different times and then correlating their results. Ideally, you would have high test-retest reliability (above .80), showing that participants got about the same score the first time they were tested as they got when they were tested the second time (the retest). Low test-retest reliability could be due to observers/raters inconsistently scoring the behavior, researchers not keeping the testing instructions and conditions consistent, or the measure being affected by random factors (e.g., questions being so unclear that participants guess at them, the connection between the heart rate monitor and the skin being inconsistent, etc.). Low test-retest reliability leads to low validity. If you have low test-retest reliability, you might compute interobserver reliabilities (to see whether the problem is due to the observer) and measures of internal consistency (to see whether the problem is due to some of your questions). (p. 163)
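The computation behind a test-retest coefficient is simply a correlation between the two sets of scores. Here is a minimal Python sketch using invented scores; none of the numbers come from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores from the same six participants, tested twice
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 12]

print(f"test-retest reliability r = {pearson_r(time_1, time_2):.2f}")
# Values above .80 are conventionally treated as high reliability.
```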
interobserver (interjudge) agreement: the percentage of times the raters agree. For many measures (e.g., objective measures such as multiple-choice tests), interobserver agreement would not be calculated because it would be assumed to be 100%. If interobserver agreement is low, you probably want to take steps to make the measure more objective. (p. 166)
interobserver reliability: like interobserver agreement, interobserver reliability is an index of the degree to which different raters rate the same behavior similarly. Low interobserver reliability probably means that random observer error is making the measure unreliable. (p. 166)
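Both indexes are easy to compute. The following Python sketch uses hypothetical codings and ratings, and uses statistics.correlation (available in Python 3.10+) for the Pearson correlation.

```python
from statistics import correlation  # Python 3.10+

# Interobserver agreement: percentage of observations on which two
# raters assign the same category (hypothetical codings).
rater_a = ["hit", "miss", "hit", "hit", "miss", "hit"]
rater_b = ["hit", "miss", "hit", "miss", "miss", "hit"]
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
print(f"interobserver agreement = {100 * agreements / len(rater_a):.0f}%")

# Interobserver reliability: Pearson correlation between two raters'
# numeric ratings of the same behaviors (hypothetical ratings).
ratings_a = [4, 2, 5, 3, 1, 4]
ratings_b = [5, 2, 4, 3, 2, 4]
print(f"interobserver reliability r = {correlation(ratings_a, ratings_b):.2f}")
```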
internal consistency: the degree to which all the items on a measure correlate with each other. If you have high internal consistency, all the questions seem to be measuring the same thing. If, on the other hand, answers to some questions are inconsistent with answers to other questions, this inconsistency may be due to some answers being (1) strongly influenced by random error or being (2) influenced by different constructs. Internal consistency can be estimated through average correlations, split-half reliability coefficients, and Cronbach's alpha. Do not confuse internal consistency with internal validity: They have nothing to do with each other. (p. 169)
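As a concrete illustration of one internal consistency index, here is a Python sketch of Cronbach's alpha computed from a made-up data set in which each row holds one participant's answers to four items.

```python
from statistics import pvariance

# Rows = participants, columns = the four items on the measure
answers = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]

k = len(answers[0])  # number of items
item_variances = sum(pvariance(item) for item in zip(*answers))
total_variance = pvariance([sum(row) for row in answers])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")  # closer to 1 = more consistent
```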
subject (participant) biases: ways the participant can bias the results. The two main subject biases are (1) trying to help the researcher out by giving answers that will support the hypothesis, and (2) trying to make a good impression by giving the socially desirable response. (p. 155)
social desirability bias: participants acting in ways that make them look good. (p. 158)
demand characteristics: aspects of the study that allow the participant to figure out how the researcher wants that participant to behave. (p. 156)
unobtrusive measurement: recording a particular behavior without the participant knowing you are measuring that behavior. Unobtrusive measurement reduces subject biases such as social desirability bias and obeying demand characteristics. (p. 157)
instructional manipulation: manipulating the variable by giving written or oral instructions. For example, some participants might be asked to repeat a list of words, whereas other participants might be asked to think about personal experiences they have had with each word. (p. 188)
environmental manipulation: a manipulation that involves changing the participant's environment rather than giving the participant different instructions. For example, some participants might hear happy music whereas others would hear sad music. (p. 189)
stooges (confederates): people who seem (to the real participants) to be participants, but who are actually the researcher's assistants. (p. 190)
construct validity: the degree to which an operational definition reflects the concept that it claims to reflect (thus, the focus of this chapter was on construct validity). Establishing content, convergent, and discriminant validity are all methods of arguing that your measure has construct validity. Do not confuse construct validity with content validity. (p. 176)
content validity: the extent to which a measure is thought to represent a balanced and adequate sampling of relevant dimensions, knowledge, and skills. (p. 176)
convergent validity: making the case for your measure's construct validity by demonstrating that it correlates with what it should: other measures, manipulations, or correlates of the construct. (p. 179)
known-groups technique: a convergent validity tactic that involves seeing whether groups known to differ on a characteristic differ on a measure of that characteristic (e.g., if your measure of Christian beliefs is valid, Christian ministers should differ from atheists on that measure). (p. 179)
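A minimal Python sketch of the known-groups logic, using invented scores for the two groups named in the example above:

```python
from statistics import mean, stdev

# Hypothetical scores on the Christian-beliefs measure
ministers = [48, 45, 50, 47, 49]  # group known to be high on the construct
atheists = [12, 18, 15, 10, 14]   # group known to be low on the construct

print(f"ministers: M = {mean(ministers):.1f}, SD = {stdev(ministers):.1f}")
print(f"atheists:  M = {mean(atheists):.1f}, SD = {stdev(atheists):.1f}")
# A large, reliable gap between the known groups supports the measure's
# validity; no gap would cast doubt on it.
```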
discriminant validity: the extent to which the measure does not correlate strongly with measures of constructs other than the one you claim to be measuring. If you were trying to measure generosity, you might use discriminant validity to show that you are not measuring an unrelated construct (e.g., social desirability) or a related one (e.g., empathy). (p. 180)
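The following Python sketch contrasts convergent and discriminant evidence for the hypothetical generosity measure; all scores are invented, and the social desirability column is constructed to be nearly uncorrelated with the measure.

```python
from statistics import correlation  # Python 3.10+

generosity = [10, 14, 8, 16, 12, 15, 9, 13]             # the new measure
donations = [5, 8, 3, 9, 6, 8, 4, 7]                    # a known correlate
social_desirability = [15, 16, 14, 15, 17, 14, 16, 15]  # a different construct

# Convergent validity: the measure should correlate with what it should.
print(f"r(generosity, donations) = "
      f"{correlation(generosity, donations):.2f}")

# Discriminant validity: it should not correlate strongly with measures
# of other constructs.
print(f"r(generosity, social desirability) = "
      f"{correlation(generosity, social_desirability):.2f}")
```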
experimenter (researcher) bias: bias introduced by the researcher treating the groups differently, such as being more attentive to participants in the treatment group or giving different nonverbal cues to treatment group participants than to other participants. (p. 186)
standardization: treating each participant in the same (standard) way. Standardization should reduce experimenter bias. (p. 186)
manipulation check: a question or set of questions designed to determine whether participants perceived the manipulation in the way that the researcher intended. For example, if you were manipulating the attractiveness of the experimenter, you might have participants in both the unattractive experimenter and attractive experimenter conditions rate the attractiveness of the experimenter to see if you had successfully manipulated participants' perceptions of the experimenter's attractiveness. (p. 187)
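A small Python sketch of how such a check might be analyzed, using hypothetical ratings for the attractiveness example above:

```python
from statistics import mean

# Hypothetical 1-9 attractiveness ratings from each condition
ratings = {
    "attractive experimenter": [8, 7, 9, 8, 7],
    "unattractive experimenter": [3, 4, 2, 3, 4],
}

for condition, scores in ratings.items():
    print(f"{condition}: mean rating = {mean(scores):.1f}")
# A clear difference between the condition means suggests the manipulation
# changed perceptions as intended.
```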
placebo treatment: a "fake" treatment; a treatment that the researcher knows has no real effect. To reduce the impact of subject (participant) bias, the group getting the real treatment is compared to a group getting a placebo treatment, rather than to a group that knows it is not getting a treatment. (p. 186)