Judgment and Decision Making, Vol. 2, No. 5, October 2007, pp. 317–325

A note on determining the number of cues used in judgment analysis studies: The issue of type II error

Jason W. Beckstead*
University of South Florida College of Nursing

Abstract

Many judgment analysis studies employ multiple regression procedures to estimate the importance of cues. Some studies test the significance of regression coefficients in order to decide whether or not specific cues are attended to by the judge or decision maker. This practice is dubious because it ignores type II error. The purposes of this note are (1) to draw attention to this issue, specifically as it appears in studies of self-insight, (2) to illustrate the problem with examples from the judgment literature, and (3) to provide a simple method for calculating post-hoc power in regression analyses in order to facilitate the reporting of type II errors when regression models are used.

Keywords: judgment analysis, self-insight, multiple regression, post-hoc power.

* Address: Jason W. Beckstead, University of South Florida College of Nursing, 12901 Bruce B. Downs Boulevard MDC22, Tampa, Florida 33612. Email: jbeckste@health.usf.edu

1 Introduction

For decades judgment analysts have successfully used multiple regression to model the organizing cognitive principles underlying many types of judgments in a variety of contexts (see Brehmer & Brehmer, 1988; Cooksey, 1996; Dhami, et al., 2004, for reviews). Most often these models depict the individual judge or decision maker as combining multiple differentially weighted pieces of information (cues) in a compensatory manner to arrive at a judgment. Further, these analyses portray those who have acquired expertise on a judgment task as applying their judgment model or “policy” with regular, although less than perfect, consistency. The ability of linear regression models to accurately reproduce such expert judgments under various conditions has been discussed in detail (e.g., Dawes, 1979; Dawes & Corrigan, 1974; Einhorn & Hogarth, 1975). If one accepts the proposition that people’s judgments can be modeled as though they are multiple regression equations, questions arise such as: 1) How many of the available cues does the individual use? and 2) How should the number of cues used be determined?

Too many researchers blindly apply statistical significance tests to inform them, in a kind of deterministic manner, whether judges did or did not attend to specific cues. If the t-test calculated on a cue’s weight is significant, then the cue is counted as being attended to by the judge. Relying on p values in this way is a problem because these values are affected by the number of cues and number of cases presented to the judge during the task and by how well the overall regression equation fits the total set of responses.

This issue is discussed in this note, which is organized as follows: First, examples from the judgment literature are reviewed to illustrate the existence of the problem. Second, notation commonly used by judgment analysts when describing regression procedures is introduced. Third, using this notation, a method for calculating the post-hoc power of t-tests on regression coefficients based on the noncentral t distribution is described. Fourth, this method is applied to estimate the number of cases necessary for statistical significance in order to illustrate how the investigator’s conclusions about the number of cues attended to in a judgment task should be informed by considerations of type II error. Finally, an SPSS program for performing the calculations is described and provided in the Appendix.

2 Some examples in the judgment literature

Although it is reasonable to conclude that a “significant” cue is important to the judge and reliably used as he or she makes judgments, the converse does not follow. When a cue’s weight (regression coefficient, standardized regression coefficient, or squared semipartial correlation) is not significant, it does not necessarily mean that the cue is unimportant; there may simply be insufficient statistical power to produce a significant test result. Determining the number of cues to which an individual attends is an important issue from both practical and theoretical viewpoints. In a practical sense, informing poorly performing
judges that they should attend to more (or different) cues than they apparently do can improve their accuracy (see Balzer, et al., 1989, for review of cognitive feedback). Theories of cognitive functioning have long considered determining the amount of information we process to be a relevant question (e.g., Gigerenzer & Goldstein, 1996; Hammond, 1966; Miller 1956).

In the typical judgment analysis the problem of type II error is overlooked. I know of no studies in the judgment analysis literature that report the power of the significance tests on cue weights when these tests are relied upon to determine the number of cues being used by a judge. While an exhaustive review of the empirical literature is beyond the scope of this note, a few examples are presented to illustrate the problem.

Phelps and Shanteau (1978) purportedly determined the number of cues used by expert livestock judges in making decisions using two different experimental (“controlled” and “naturalistic”) designs. The same seven livestock judges rated the breeding quality of gilts (female breeding pigs) in two completely within-subject experiments. The controlled design used a partial factorial design in which each judge made 128 judgments of gilts described on 11 orthogonal cues. The naturalistic design used eight photographs of gilts. In this experiment the judges first rated the breeding quality of the gilt in each photo and then rated each photo on the same 11 cues used in the controlled design. This procedure was repeated, resulting in a total of 16 judgments per judge. The authors then used significance tests to determine whether specific cues were being used by each judge in the two experiments. An important finding was that the judges used far more cues (mean = 10.1) in the controlled design than they did in the naturalistic design (mean = 0.9). The relevant data are summarized in Table 1. Using the F statistics reported in their Tables 1 and 2 to calculate estimates of effect sizes ($\eta^2$) reveals some paradoxical results; many of the cues showed stronger relationships to judgments in the naturalistic design. Because of the lower statistical power in the naturalistic design (the controlled design presented 128 cases whereas the naturalistic design presented only 16) fewer cues were counted as significant and it was concluded that less information was being used by all judges under the naturalistic design.

When comparing the results of the two experiments the authors attributed the difference in the amount of information used by the experts to the stimulus configuration, “...the source of the discrepancy seems to be in the intercorrelations among the characteristics and not in the statistical analysis” (Phelps & Shanteau, 1978, p. 218). Although Phelps and Shanteau pointed out that the F statistics they report could easily be expressed as estimates of effect size they did not do so. If they had, they may have come to a different conclusion about the influences of naturalistic and controlled cue configurations in their judgment tasks.

One area of research particularly sensitive to the problem at hand is the study of self-insight into decisions. The assessment of self-insight in social judgment studies has traditionally compared statistical weights (derived via regression equations) with subjective weights. A widely accepted finding is that people have relatively poor insight into their judgment policies (see Brehmer & Brehmer, 1988; Harries, et al., 2000; Slovic & Lichtenstein, 1971, for reviews). In most studies assessing insight, judges are required to produce subjective weights (e.g., distributing 100 points among the cues). “It was the comparison of statistical and subjective weights that produced the greatest evidence for the general lack of self-insight” (Reilly, 1996, p. 214). Another robust finding from this literature is that people report using more cues than are revealed by regression models. “A cue is considered used if its standardized regression coefficient is significant” (Harries, et al., 2000, p. 461).

Two influential studies on insight by Reilly and Doherty (1989, 1992) asked student judges to recognize their judgment policies among those from several other judges. In the first study seven of eleven judges were able to identify their own policies. In contrasting this finding to previous studies the authors noted “These data reflect an astonishing degree of insight” (Reilly & Doherty, 1989, p. 125). In the second study the number of cues and the stimulus configuration were manipulated. Overall, 35 of 77 judges were able to identify their own policies. The authors reconciled this encouraging finding with the prevailing literature on methodologic grounds, arguing that the lack of insight shown in previous studies might be related to people’s inability to articulate their policies. “There is the distinct possibility that while people have reasonable self-insight on judgment tasks, they do not know how to express that insight. Or pointing the finger the other way round, while people do have insight we do not know how to measure it” (1992, p. 305).

In both these studies, when judges were presented with policies, each judge’s set of cue weights (squared semipartial correlations in this case) was rescaled to sum to 100, and importantly, cues which did not account for significant (p < .01) variance were represented as zeros. The authors noted the majority of judges (in both studies) indicated that they had relied on the presence or absence of zeros as part of the search strategy used to recognize their own policies. The use of significance tests to assign specific cues a rescaled value of zero in these studies is problematic for two reasons. First, the power of a significance test on a squared semipartial correlation in multiple regression is affected by the value of the multiple $R^2$. As $R^2$ increases, smaller weights are more likely to be significant. Second, the power of these significance tests
is affected by the number of predictors in the regression equation. The net result was that the criterion used to assign zero to a specific cue was not constant across judges. Only when all judges are presented with the same number of cues and all have equal values of $R^2$ for their resultant policy equations could the criterion be consistently applied.

Table 1: Summary of results from Phelps and Shanteau (1978) with addition of effect size estimates.

        No. of significant cues     Median $\eta^2$             No. cues with larger
Judge   Controlled  Naturalistic    Controlled  Naturalistic    $\eta^2$ in naturalistic
1       10          2               0.205       0.365           5
2       9           0               0.321       0.310           7
3       10          0               0.158       0.024           3
4       9           3               0.264       0.333           8
5       11          1               0.177       0.200           5
6       11          0               0.376       0.167           2
7       11          0               0.162       0.184           5

To illustrate, Reilly and Doherty (1989) presented 160 cases containing 19 cues to each judge. Consider two judges with different values of $R^2$ based on 18 of the cues, say .90 and .50. The minimum detectable effect (i.e., smallest weight that the 19th cue could take and still be significant) for the first judge is .008 but .039 for the second judge. The same problem exists in the 1992 study that used 100 cases and is compounded by the fact that the authors manipulated the number of cues presented to the judges; half the sample rated cases described by six cues and the other half rated cases described by twelve cues. In the recognition portion of both studies the useful pattern of zeros in the cue profiles was an artifact introduced arbitrarily by the use of significance tests. Had the authors used p < .05 rather than p < .01 to assign zeros, their conclusions about insight might have been astonishingly different.

Harries et al. (2000, Study 1), examining the prescription decisions of a sample of 32 physicians, replicated the finding that people are able to select (recognize) their policies among those from several others. This study followed up on the participants in a decision making task (Evans, et al., 1995) in which 100 cases constructed from 13 cues were judged and regression analysis was used to derive decision policies. Judges also provided subjective cue weights, first indicating the direction (sign) of influence, then rating how much (0–10 scale) the cue had bearing on their decisions. When comparing tacit to stated policies (i.e., regression weights to subjective weights) Harries et al. (2000) described a “triangular pattern of self-insight”: a) cues that had significant weights were the ones that the judge indicated he or she used, b) where the judge indicated that a cue was not important it did not have a significant weight, and c) there were cues that the judge indicated were important but which did not have significant weights. The authors’ choice of p value for determining whether a cue was attended to in the tacit policies had influence on all three sides of this triangular pattern.

Approximately 10 months following the decision-making task, participants were presented with sets of decision policies in the form of bar charts rather than tables of numbers. Cues with statistically significant weights were presented as darker bars. With only four cues having significant effects on decisions (Harries et al., 2000, p. 457), it is possible that physicians used the presence or absence of lighter bars in the same way that Reilly and Doherty’s students made use of zeros in their recognition strategies. Had more cues been classified and presented as significant, the policy recognition task might have proved more difficult.

Other examples exist in the applied medical judgment literature. Gillis et al. (1981) relied extensively on p values of beta weights for describing the judgment policies of 26 psychiatrists making decisions to prescribe haloperidol based on 8 symptoms (see their Table 4). Averaged across judges, the number of cues used was 2.4, 1.9, or 1.0 depending on the p value employed (.05, .01, or .001, respectively). Had the investigators chosen to compare the number of cues used with self-reported usage, which of the three p values ought they have relied upon? Had the investigators rescaled and presented policies to participants for recognition (via Reilly & Doherty), their choice of p value could have affected the difficulty of the recognition task.

More recently, in a judgment analysis of 20 prescribing decisions made by 40 physicians and four medical guideline experts, Smith et al. (2003) reported “The number of significant cues . . . varied between doctors, ranging from
0 to 5” (p. 57), and among the experts “The mean number of significant cues was 1.25” (p. 58). It is noteworthy that this study presented doctors with a relatively small number of cases thus leaving open the meaning of “significant.” Had Smith et al. presented more than 20 cases, they may have concluded (based on p values) that doctors and guideline experts attended to more information when making prescribing decisions.

Other models of judgment, known as “fast and frugal heuristics” have recently been proposed as alternatives to regression models (see Gigerenzer, 2004; Gigerenzer & Kurzenhäuser, 2005; Gigerenzer, et al., 1999). A hallmark of fast and frugal models is that they are purported to rely on far fewer cues than do judgment models described by regression procedures. When comparing these classes of models, the number of cues the judge uses is one way of differentiating the psychological plausibility of these models (see Gigerenzer, 2004). Studies comparing regression models with fast and frugal models have implied that significance testing is the method of determining the number of cues used despite the fact that the developers of these methods (e.g., Stewart, 1988) made no such claim and currently advise against it (Stewart, personal communication, July 2, 2007).

In a study comparing regression with fast and frugal heuristics, Dhami and Harries (2001) fitted both types of models to 100 decisions made by medical practitioners. They report that the number of cues attended to was significantly greater when modeled by regression than by the matching heuristic. According to the regression models the average number of cues used was 3.13 and the average for the fast and frugal models was 1.22. “In the regression model a cue was classified as being used if its Wald statistic was significant (p < .05) . . . ” (Dhami & Harries, 2001, p. 19). In the heuristic model, the number of cues used was determined by the percentage of cases correctly predicted by the model; significance tests were not used. At issue is not the fact that different criteria were used to count the cues used under the two types of models (although this is a problem when evaluating their results), but rather, that the authors relied on a significance test known for some time to be dubious¹, and their choice of p value for counting cues may have biased their data to favor the psychological plausibility of the fast and frugal model. Had they used p < .01 rather than p < .05, the average number of cues used according to the regression procedure would presumably have been lower, and perhaps not different than the average found for the matching heuristic.

¹ Hauck and Donner (1977) found that the Wald test behaves in an aberrant manner. Jennings (1986) has also questioned the adequacy of the Wald test for making statistical inferences. Hosmer and Lemeshow (2000) recommend using the likelihood-ratio test instead.

In the last few paragraphs, examples from the literature have been presented that highlight the problems associated with using significance tests to determine the number of cues used in judgment tasks. Tests of significance on regression coefficients or $R^2$ are really not very enlightening for distinguishing the “best” judgment model from among a set of competing models. The true test of which model (among a set of contenders) is the best is the ability of the equation to predict the judgments made in some future sample of cases, the data from which were not used to estimate the regression equation. The remaining sections of this note formally present the regression model as used in judgment analysis and discuss a method for assessing the power of significance tests so as to provide more information to judgment analysts who use them.²

² The utility of statistical significance and hypothesis testing as a general approach has been questioned by researchers in the social sciences (e.g., Armstrong, 2007; Nickerson, 2000; Rozeboom, 1960). I believe that many of us are likely to continue to rely on this approach for some time. It is therefore important that we fully understand the assumptions, mechanics, and limitations of this approach.

3 Notation

Following Cooksey (1996), let the k cues be denoted by subscripted X’s (e.g., $X_1$ to $X_k$). In a given judgment analysis a series of m profiles or cases is constructed where each case is comprised of k cues. The judge or subject makes m responses $Y_s$ to these cases. The resulting multiple regression equation representing the subject’s judgment policy is of the general form

$$Y_s = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e \tag{1}$$

where $b_0$ represents the regression constant and the remaining $b_i$ represent regression coefficients for each cue where each coefficient indicates the amount by which the prediction of $Y_s$ would change if its associated cue value changed by one unit while holding all other cue values constant, and e represents residual or unmodeled influences.

Tests of significance may be employed to assess the null hypothesis that the value of $b_i$ in the population is zero, thus $H_0$: $b_i = 0$ against the alternative $H_1$: $b_i \neq 0$. The ratio $b_i / SE_{b_i}$ is distributed as a t statistic with degrees of freedom (df) = $m - k - 1$. The $SE_{b_i}$ is found as

$$SE_{b_i} = \frac{sd_{Y_s}}{sd_{X_i}} \sqrt{\frac{1 - R^2_{Y_s}}{m - k - 1}} \times \sqrt{\frac{1}{1 - R^2_{X_i}}} \tag{2}$$

where $sd_{Y_s}$ and $sd_{X_i}$ are, respectively, the standard deviations for the judgments and for the ith cue’s values; $R^2_{Y_s}$ is the squared multiple correlation for the judgment equation; and $R^2_{X_i}$ is the squared multiple correlation
from a regression analysis predicting the ith cue’s values from the values of the remaining $k - 1$ cues. In standard multiple regression it can be shown that the significance test of $b_i$ ($t = b_i / SE_{b_i}$) is equivalent to testing significance of the standardized regression coefficient $\beta_i$ and the squared semipartial correlation associated with $X_i$ (see Pedhazur, 1997). This is fortunate because most commercially available statistics packages routinely print values for $SE_{b_i}$ but not for $SE_{\beta_i}$.³

³ The method presented here is also directly applicable to standardized regression coefficients when their corresponding standard errors are available.

4 Post-hoc power analysis on t-test of regression coefficients

Having analyzed data from a judgment analysis using multiple regression it is rather simple to calculate the statistical power associated with the t-test of each regression coefficient. All that is needed from the analysis is the observed value of t, its df, and the a priori specified value of $\alpha$. To obtain the power of the t-test that $H_0$: $b_i = 0$ for $\alpha = .05$, one may employ the noncentral distribution of the t statistic (see Winer et al., 1991, pp. 863–865), here denoted $t'$, which is actually a family of distributions defined by df and a noncentrality parameter $\delta$, hence $t'(df; \delta)$. In the present context $\delta = b_i / SE_{b_i}$. The power of the t-test on the regression coefficient may then be determined as

$$\mathrm{Prob}\left(t' > t_{df,\,1-\alpha/2} \mid \delta = b_i / SE_{b_i}\right) = 1 - \mathrm{Prob}(\text{type II error}) \tag{3}$$

Thus the probability that the noncentral t will be greater than the critical value of t, given the observed value of $t = b_i / SE_{b_i}$, is equal to the power of the test that $H_0$: $b_i = 0$ for $\alpha = .05$. For example, consider the following result from an illustrative judgment analysis involving k = 6 cues and m = 30 cases provided by Cooksey (1996, p. 175). The unstandardized regression coefficient for a particular cue is b = 0.267 ($\beta = .295$), its standard error is 0.146, thus t = 0.267/0.146 = 1.829. The critical value for t with df = 30 - 6 - 1 = 23, and $\alpha = .05$ for a two-tailed test is 2.069; consequently the null hypothesis is not rejected and it might be concluded that this cue is unimportant to the judge. Using the information from this significance test and the noncentral distribution of t (df = 23; $\delta = 1.829$) we find that the probability of type II error = .582, and thus the power to reject the null is only .418. To claim that this cue is “unimportant to the judge,” or “is not being attended to by the judge” does not seem justifiable in light of the rather high probability of type II error.

5 Estimating the number of cases necessary for significant t-test of regression coefficients

Faced with such a nonsignificant result, as in the example presented above, the judgment analyst may wish to know the extent to which this outcome was related to the study design. In particular, how was the nonsignificant t-test of the cue weight affected by his or her decision to present m cases to the judge instead of some larger number m*? To address this question we must first clarify the types of the stimuli used in judgment studies.

Brunswik (1955) argued for preserving the substantive properties (content) of the environment to which the investigator wishes to generalize in the stimuli presented during the experimental task. Hammond (1966), in attempting to overcome the difficulties inherent in such representative designs, distinguished between “substantive” and “formal” sampling of stimuli. Formal stimulus sampling concerns the relationships among environmental stimuli (with content ignored). The following discussion is limited to studies employing formal stimulus sampling. When taking the formal approach to stimulus sampling, the investigator’s focus is on maintaining the statistical characteristics of the task environment (e.g., k, $sd_{X_i}$ and $R^2_{X_i}$) in the sample of stimuli presented to the participant. These characteristics of the environment may be summarized as a covariance matrix, $\Sigma$. If the investigator obtains a sample of m stimuli from the environment, the covariance matrix $S_m$ may be computed from the sample and compared with $\Sigma$. The basic assumption of formal stimulus sampling may then be stated as $S_m \approx \Sigma$. Whether probability or nonprobability sampling is used, it is possible for the investigator to construct an alternative set of m* cases such that $S_{m^*} = S_m$. Under the condition that $S_{m^*} = S_m \approx \Sigma$, it is possible to estimate $SE_{b_i^*}$, the standard error of the regression coefficient based on the larger sample of cases m*. Inspection of Eq. (2) reveals that $SE_{b_i}$ becomes smaller as the number of cases m becomes larger. Holding all other terms in Eq. (2) constant, $SE_{b_i^*}$ may be found as

$$SE_{b_i^*} = \frac{SE_{b_i}}{\sqrt{\dfrac{m^* - k - 1}{m - k - 1}}} \tag{4}$$

Substituting $SE_{b_i^*}$ in place of $SE_{b_i}$ when calculating the t-test on $b_i$ allows us to estimate the impact of increasing m to m* on type I error in the same judgment analysis. Making the same substitution in Eq. (3) allows us to estimate the impact of this change on type II error and power.

Stewart (1988) has discussed the relationships among k, $R^2_{X_i}$, and m and recommends m = 50 as a minimum for reliable estimates of cue weights when k ranges from
4 to 10 and $R^2_{X_i} = 0$. He points out that as the intercorrelations among the cues increase the number of cases will need to be increased in order to maintain reasonably small values of $SE_{b_i}$. Of course the investigator’s choice of m should also be influenced by his or her sense of subject burden. Stewart notes from empirical evidence that most judges can deal with making between “40 to 75 judgments in an hour, but the number varies with the judge and the task” (Stewart, 1988, p. 46). In discussing the design of judgment analysis studies Cooksey (1996) has suggested that the optimal number of cases may be closer to 80 or 90. Reilly and Doherty (1992) reported the average time for 77 judges to complete 100 12-cue cases was 1.25 hours. In a recent study by Beckstead and Stamp (2007) 15 judges took on average 32 minutes (range 20–47) to respond to 80 cases constructed from 8 cues.

For the example given in the previous section, if the investigator had used m* = 40, rather than m = 30, Eq. (4) indicates that $SE_{b_i^*}$ would have been 0.122 and the resulting value for the t-test would have been 2.191 with p = .036. The point here is that had the investigator presented 10 more cases (sampled from the same population), he or she might have come to a different conclusion about the number of cues attended to by this judge.

6 An SPSS program for calculating post-hoc power in regression analysis

The calculations for determining post-hoc power for tests of regression coefficients as used in judgment analysis studies and estimating $SE_{b_i^*}$ are straightforward and based on statistical theory; however, detailed tables of noncentral t distributions are hard to come by. The author has written an SPSS program for performing these calculations that is provided in the Appendix. To illustrate the program, consider another cue taken from the same example found in Cooksey (1996, p. 175) where b = -0.423, $SE_b$ = 0.386, and k = 6 for m = 30. Inserting these values into the program and specifying that the number of cases increase to 90 by increments of 10 produces the result shown in Table 2.

As m* increases, the estimated values of $SE_{b^*}$ decrease and the values of the t-statistic increase. Accord-

7 Summary and recommendations

In this note the issue of type II error has been raised in the context of determining whether or not a cue is important to a judge in judgment analysis studies. Some of the potential pitfalls of relying on significance tests to determine cue utilization have been pointed out and a simple method for calculating post-hoc power of such tests has been presented. A short computer program has been provided to facilitate these analyses and encourage the calculation (and reporting) of statistical power when judgment analysts rely on significance tests to inform them as to the number of cues attended to in judgment tasks.

As a tool for understanding the individual’s cognitive functioning, regression analysis has proved to be quite useful to judgment researchers for over 40 years. In this role I believe that its true value lies in its descriptive, not its inferential, facility. Like any good tool, if we are to continue our reliance upon it we must ensure that it is in proper working order and not misuse it.

There are alternative models of judgment being advocated (e.g., probabilistic models proposed by Gigerenzer and colleagues) that do not fall prey to the problems associated with regression analysis. However, as judgment researchers develop, test, and apply these models, questions about the amount of information (i.e., the number of

Table 2: Illustration of the influence of the number of cases m* on t-tests of regression coefficient

m*    $SE_{b^*}$   t-test   p-value
40    0.322        1.313    .198
50    0.282        1.498    .141
60    0.254        1.664    .102
70    0.233        1.814    .074
80    0.217        1.952    .055
90    0.203        2.082    .040

Note: The regression coefficient b = -0.423 and $SE_b$ = 0.386 for m = 30. The t-test of this coefficient was t = -1.096, p = .284; post-hoc power of the t-test is given by Eq. (3) as .182. Due to the negative sign of the regression coefficient, resulting t-test values are negative; the sign has been omitted for clarity of presentation.
ing to these estimates, the t-test on this cue would have
cues) individuals use when forming judgments and mak-
been signi?cant had approximately 85 cases been used in
ing decisions are bound to arise. The strongest evidence
the judgment task. The program can be “rerun” specify-
for the veracity of any judgment model is its ability to
ing a smaller increment in order to re?ne this estimate.
predict the outcomes of future decisions.
The results provided by such an analysis could also be
The practice of post-hoc power calculations as an aid
very useful in the planning of subsequent judgment stud-
in the interpretation of nonsigni?cant experimental re-
ies.
sults is not without its critics (e.g., Hoenig & Heisey,

2001; Nakagawa & Foster, 2004). Hypothesis testing is easily misunderstood, but when applied with good judgment it can be an effective aid to the interpretation of experimental data (Nickerson, 2000). Higher observed power does not imply stronger evidence for a null hypothesis that is not rejected (see Hoenig & Heisey, 2001, for discussion of the power approach paradox). Some researchers have argued for abandoning the use of hypothesis testing altogether and relying instead on the confidence interval estimation approach (Armstrong, 2007; Rozeboom, 1960). I tend to agree with Gigerenzer and colleagues who put it succinctly, “As long as decisions based on conventional levels of significance are given top priority . . . theoretical conclusions based on significance or nonsignificance remain unsatisfactory without knowledge about power” (Sedlmeier & Gigerenzer, 1989, p. 315).

References

Armstrong, J. S. (2007). Significance tests harm progress in forecasting. International Journal of Forecasting, 23, 321–327.
Balzer, W. K., Doherty, M. E., & O’Connor, R. Jr. (1989). Effects of cognitive feedback on performance. Psychological Bulletin, 106, 410–433.
Beckstead, J. W., & Stamp, K. D. (2007). Understanding how nurse practitioners estimate patients’ risk for coronary heart disease: A judgment analysis. Journal of Advanced Nursing, 60, 436–446.
Brehmer, A., & Brehmer, B. (1988). What have we learned about human judgment from thirty years of policy capturing? In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT view (pp. 75–114). Amsterdam: Elsevier Science Publishers.
Brunswik, E. (1955). Representative design and probabilistic theory in functional psychology. Psychological Review, 62, 193–217.
Cooksey, R. W. (1996). Judgment analysis: Theory, methods, and applications. San Francisco: Academic Press.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571–582.
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95–106.
Dhami, M. K., Hertwig, R., & Hoffrage, U. (2004). The role of representative design in an ecological approach to cognition. Psychological Bulletin, 130, 959–988.
Dhami, M. K., & Harries, C. (2001). Fast and frugal versus regression models of human judgment. Thinking and Reasoning, 7, 5–27.
Einhorn, H. J., & Hogarth, R. M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171–192.
Evans, J. St. B. T., Harries, C., Dennis, I., & Dean, J. (1995). General practitioners’ tacit and stated policies in the prescription of lipid lowering agents. British Journal of General Practice, 45, 15–18.
Gigerenzer, G. (2004). Fast and frugal heuristics: The tools of bounded rationality. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 62–88). Oxford: Blackwell Publishing.
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669.
Gigerenzer, G., & Kurzenhäuser, S. (2005). Fast and frugal heuristics in medical decision making. In R. Bibace, J. D. Laird, K. D. Noller, & J. Valsiner (Eds.), Science and Medicine in Dialogue: Thinking through Particulars and Universals (pp. 3–15). Westport, CT: Praeger.
Gigerenzer, G., Todd, P. M., & the ABC Research Group (Eds.) (1999). Fast and frugal heuristics: The adaptive toolbox.
Gillis, J. S., Lipkin, J. O., & Moran, T. J. (1981). Drug therapy decisions. Journal of Nervous and Mental Disease, 169, 439–437.
Hammond, K. R. (1966). Probabilistic functionalism: Egon Brunswik’s integration of the history, theory, and method of psychology. In K. R. Hammond (Ed.), The psychology of Egon Brunswik (pp. 15–80). New York: Holt, Rinehart & Winston.
Harries, C., Evans, J. St. B. T., & Dennis, I. (2000). Measuring doctors’ self-insight into their treatment decisions. Applied Cognitive Psychology, 14, 455–477.
Hauck, W. W., & Donner, A. (1977). Wald’s test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 82, 1110–1117.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. American Statistician, 55, 19–24.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression, 2nd ed. New York: John Wiley & Sons, Inc.
Jennings, D. E. (1986). Judging inference adequacy in logistic regression. Journal of the American Statistical Association, 81, 987–990.
Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Nakagawa, S., & Foster, T. M. (2004). The case against retrospective statistical power analyses with an introduction to power analysis. Acta Ethologica, 7, 103–108.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction, 3rd ed. Fort Worth: Harcourt Brace College Publishers.
Phelps, R. H., & Shanteau, J. (1978). Livestock judges: How much information can an expert use? Organizational Behavior and Human Performance, 21, 209–219.
Reilly, B. A. (1996). Self-insight, other-insight, and their relation to interpersonal conflict. Thinking and Reasoning, 2, 213–222.
Reilly, B. A., & Doherty, M. E. (1989). A note on the assessment of self-insight in judgment research. Organizational Behavior and Human Decision Processes, 44, 123–131.
Reilly, B. A., & Doherty, M. E. (1992). The assessment of self-insight in judgment policies. Organizational Behavior and Human Decision Processes, 53, 285–309.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Slovic, P., & Lichtenstein, S. (1971). Comparison of Bayesian and regression approaches to the study of information processing in judgment. Organizational Behavior and Human Performance, 6, 649–744.
Smith, L., Gilhooly, K., & Walker, A. (2003). Factors influencing prescribing decisions in the treatment of depression: A Social Judgment Theory approach. Applied Cognitive Psychology, 17, 51–63.
Stewart, T. R. (1988). Judgment analysis: Procedures. In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT view (pp. 41–74). Amsterdam: Elsevier Science Publishers.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design, 3rd ed. New York: McGraw-Hill, Inc.

Appendix
The following is an SPSS program to calculate post-hoc power of t-tests on regression coefficients and to estimate the sample size needed for significance of such tests. After typing the commands into a syntax window and supplying information specific to your analysis, simply run the program to obtain results similar to those found in Table 2.
**------------------------------------------------------------------------------------.
**ENTER NECESSARY INFORMATION FROM MULTIPLE REGRESSION ANALYSIS HERE*.
DEFINE @STUFF ().
COMPUTE b = -0.423        /*unstandardized regression coefficient */.
COMPUTE SEb = 0.386       /*standard error of regression coefficient */.
COMPUTE k = 6             /*number of predictors in regression equation */.
COMPUTE N = 30            /*number of observations or cases */.
COMPUTE alpha = .05       /*type I error criterion */.
COMPUTE maxN = 90         /*maximum value of N for table of estimates */.
COMPUTE incN = 10         /*increment in N for table of estimates */.
!ENDDEFINE .
**------------------------------------------------------------------------------------.
**CALCULATING POST-HOC POWER for t-TEST of REGRESSION COEFFICIENT.
NEW FILE.
INPUT PROGRAM.
@STUFF.
COMPUTE t = ABS(b/SEb)                     /*confirming t-test on b found in reg output */.
COMPUTE df = N-k-1                         /*degrees of freedom for t-test on b */.
COMPUTE tcrit = IDF.T(1-(alpha/2),df)      /*critical value of t for desired alpha */.
COMPUTE t_prob = 2*(1-CDF.T(t,df))         /*this is obs p value for t-test on b */.
COMPUTE Power = 1-NCDF.T(tcrit,df,t)       /*post-hoc power for obs t-test on b */.
END CASE.
END FILE.
END INPUT PROGRAM.
FORMAT N k DF (F3.0) t_prob t b SEb Power (F8.3).
LIST b SEb t k N t_prob power.
**ESTIMATING SAMPLE SIZE NECESSARY FOR t-TEST OF b TO BE SIGNIFICANT.
NEW FILE.
INPUT PROGRAM.
@STUFF.
LOOP newN = N+incN TO maxN BY incN.
COMPUTE SEbStar = SEb/SQRT((newN-k-1)/(N-k-1))        /*est of SEb under new N */.
COMPUTE tcritN = IDF.T(1-(alpha/2),newN-k-1)          /*crit t value for desired alpha */.
COMPUTE tstar = ABS(b/SEbStar)                        /*est of t under new N */.
COMPUTE t_probN = 2*(1-CDF.T(tstar,newN-k-1))         /*est of p-value for tstar */.
COMPUTE powerN = 1-NCDF.T(tcritN,newN-k-1,tstar)      /*estd power of test under new N */.
END CASE.
LEAVE b SEb k N alpha.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMAT newN (F5.0) SEbStar powerN tstar t_probN b (F5.3).
LIST newN b SEbStar tstar t_probN powerN.
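
For readers without access to SPSS, the arithmetic of the program above can be sketched in Python using only the standard library. This is an illustrative sketch, not the author's code. The function names are hypothetical. The central and noncentral t probabilities are obtained by simple numerical integration (Simpson's rule) rather than by SPSS's IDF.T, CDF.T, and NCDF.T functions, and the noncentral t routine assumes more than 2 degrees of freedom.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_cdf(x, df):
    """P(T <= x) for a central t variable, integrating its density over [0, |x|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda u: math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))
    half = simpson(pdf, 0.0, abs(x))
    return 0.5 + half if x >= 0 else 0.5 - half

def t_quantile(p, df):
    """Quantile of the central t distribution by bisection; assumes 0.5 < p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def nct_cdf(x, df, nc):
    """P(T' <= x) for a noncentral t variable with noncentrality nc.

    Uses T' = (Z + nc) / sqrt(V / df) with V ~ chi-square(df), so that
    P(T' <= x) = E[Phi(x * sqrt(V / df) - nc)]. Assumes df > 2.
    """
    log_norm = -(df / 2) * math.log(2) - math.lgamma(df / 2)
    def integrand(v):
        if v <= 0.0:
            return 0.0
        chi2_pdf = math.exp(log_norm + (df / 2 - 1) * math.log(v) - v / 2)
        z = x * math.sqrt(v / df) - nc
        return 0.5 * (1 + math.erf(z / math.sqrt(2))) * chi2_pdf
    upper = df + 12 * math.sqrt(2 * df) + 50  # far into the chi-square tail
    return simpson(integrand, 0.0, upper, n=4000)

def post_hoc_power(b, se_b, k, m, alpha=0.05):
    """Mirrors the SPSS expression Power = 1 - NCDF.T(tcrit, df, t)."""
    df = m - k - 1
    t_obs = abs(b / se_b)
    t_crit = t_quantile(1 - alpha / 2, df)
    return 1 - nct_cdf(t_crit, df, t_obs)

def estimate_for_new_m(b, se_b, k, m, new_m):
    """Mirrors the SPSS loop body: rescaled SE, t, and two-sided p for new_m cases."""
    df_new = new_m - k - 1
    se_star = se_b / math.sqrt(df_new / (m - k - 1))
    t_star = abs(b / se_star)
    p_star = 2 * (1 - t_cdf(t_star, df_new))
    return se_star, t_star, p_star

# Inputs quoted in Section 6 (Cooksey, 1996, p. 175).
b, se_b, k, m = -0.423, 0.386, 6, 30
print("power at m = 30:", round(post_hoc_power(b, se_b, k, m), 3))
for new_m in range(40, 91, 10):
    se_star, t_star, p_star = estimate_for_new_m(b, se_b, k, m, new_m)
    print(new_m, round(se_star, 3), round(t_star, 3), round(p_star, 3))
```

With the inputs from Section 6, the printed SEb*, t, and p columns should agree with Table 2 to rounding, and the power for m = 30 should agree with the value given in the note to that table. The integration settings are chosen for the degrees of freedom in this example and are not meant as a general-purpose routine.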
