For the past 10 years,
intense scientific controversy has engulfed one of psychology’s most
widely used assessment methods, Exner’s Comprehensive System for the
Rorschach (CS) (Exner, 2003). Heated debates and critical articles
concerning the CS have appeared in more than a dozen scholarly
journals, including the Journal of Personality Assessment,
Psychological Assessment, Assessment, the Journal of Clinical
Psychology, Clinical Psychology Review, Clinical Psychology: Science
and Practice, and Professional Psychology: Research and
Practice.
No longer confined to the pages of scholarly
journals, the controversy over the Exner CS Rorschach has spread to
the national press. The New York Times (Goode, 2001, 2004), Los
Angeles Times (Mestel, 2003), and Scientific American (Lilienfeld,
Wood, & Garb, 2001; see also Lilienfeld, Wood, & Garb, 2000)
have carried feature articles on the debate. A scathing critique in
the New York Review of Books (Crews, 2004/2005) recently called on
psychologists to abandon clinical use of the test.
During the past year, proponents of the Exner
system have attempted to dampen the controversy. Irving Weiner,
President of the Board of Trustees of the Society for Personality
Assessment (SPA), published an article in the Spring issue of the
Independent Practitioner (Weiner, 2005) that categorically rejected
all criticisms of the CS. A few months afterward the SPA Board
released a White Paper for psychologists, attorneys, and judges that
broadly endorsed use of the test in clinics and courtrooms (Board of
Trustees of the Society for Personality Assessment, 2005, hereafter
cited as Board of Trustees, 2005).
The unyielding stance adopted by Weiner and other
proponents of the CS is exemplified by the closing sentence of his
Independent Practitioner piece:
Although some critics have questioned the
psychometric soundness and legal suitability of Rorschach
assessment, their criticisms lack any solid conceptual or empirical
basis. (Weiner, 2005, p. 82) (emphasis added)
Is it true, as Weiner asserts, that criticisms of
the CS lack any rational basis? If so, why have editors and
reviewers at more than a dozen respected journals allowed them to be
published? Or instead, is Weiner’s absolute rejection of all
criticism a warning sign of serious problems within the Rorschach
community? Have he and other defenders of the Exner system lapsed
into a siege mentality, so that even legitimate criticisms are
rejected?
The easiest course for clinicians would be simply
to accept Weiner’s (2005) blanket assurance that all is well with
the Exner system. If they already use the CS Rorschach, they could
continue to do so without feeling uncertainty or doubt. However,
most clinicians will probably conclude that the easiest course is
not the best one, and that they have a responsibility to understand
the controversy and stay abreast of research findings. Only by doing
so can they provide the best possible service to their clients.
In the present article, we provide evidence for
five conclusions regarding the Rorschach controversy: (1) The Exner
norms are in error and seriously overpathologize adults and
children; (2) Meta-analyses indicate that at least some Rorschach
scores are valid; (3) Twenty CS Rorschach scores are valid; (4) The
remaining 160 CS scores lack demonstrated validity; and (5) About
25% of CS scores lack adequate scoring reliability for clinical
work. Afterwards, we discuss the White Paper issued by Weiner and
the other members of the SPA Board of Trustees (2005).
The CS Norms Are Seriously in Error
In 1999 and 2000, a group of respected Rorschach
experts -- Thomas Shaffer, Philip Erdberg, John Haroian, and Mel
Hamel -- reported several of the most important studies on the Exner
system to appear in the past 25 years. Two of these studies were
published in the Journal of Personality Assessment, which certainly
cannot be accused of being an anti-Rorschach journal (indeed, its
past and present editors have all been ardent proponents of the
Rorschach). In the first of these studies (Shaffer, Erdberg, &
Haroian, 1999), the researchers administered the CS Rorschach, the
WAIS-R, and the MMPI-2 to 123 nonpatient adults living in the
community. Most of these participants were volunteers who donated
blood at a blood bank and then gave their time to be tested by the
research team. According to the WAIS-R and MMPI-2, the group was
average or even slightly above-average compared with other
Americans.
In only one respect did these apparently typical
Americans stand out: When compared with the Exner norms, their
Rorschach scores indicated that most of the individuals in the study
were seriously disturbed. For example, about 1 in 6 of the
participants scored in the pathological range on the CS
Schizophrenia Index. Their Distorted Form Quality scores were so
high that half would be considered thought-disordered. Nearly a
third gave a Reflection response, a supposed indicator of
pathological narcissism.
The results from the Shaffer et al. (1999) study
were replicated in an international project led by Erdberg and
Shaffer. Researchers administered the CS Rorschach to 2,125
nonpatient adults in 9 countries besides the U.S. (see summary in
Wood, Nezworski, et al., 2001b, p. 400; but see Meyer, 2001). The
normative results from these countries were substantially different
from the Exner norms, but similar to the numbers for nonpatient
Americans reported by Shaffer et al. (1999). The international
studies confirmed that the Shaffer et al. findings were not a
fluke.
In 2000, the same group of scholars published an
additional study, this time of 100 preadolescent children with no
known history of mental health problems (Hamel, Shaffer, &
Erdberg, 2000). The children were above-average in psychological
adjustment according to a well-validated measure, the Conners Parent
Rating Scale-93 (Conners, 1989). Yet when these children’s Rorschach
scores were compared with the CS norms, the results were even more
troubling than in the study of adults. More than 60% of the children
scored in the pathological range on the Schizophrenia Index. More
than 50% had Form Quality scores that indicated thought disorder.
Nearly half scored in the “depressed” range on the CS Depression
Index. Hamel and his colleagues wrote:
If we were writing a Rorschach-based,
collective psychological evaluation for this sample, the clinical
descriptors would command attention. In the main, these children may
be described as grossly misperceiving and misinterpreting their
surroundings and having unconventional ideation and significant
cognitive impairment. Their distortion of reality and faulty
reasoning approach psychosis.... They apparently suffer from an
affective disorder that includes many of the markers found in
clinical depression. Equally puzzling is that the previous
Comprehensive System descriptors are incongruent with all other
information known to this study about these children. (p. 291)
The findings of Shaffer et al. (1999) and Hamel et
al. (2000) attracted widespread interest among Rorschach scholars.
Why did apparently normal adults and children appear seriously
disturbed when compared with the Exner norms? Was something amiss
with the norms? To explore these questions, we conducted a search of
the scientific literature from 1974 to 1999 and identified 32
additional studies that had administered the Exner Rorschach to
nonpatient American adults. When we combined the numbers across
studies, the results were very similar to those reported by Shaffer
and his colleagues. That is, the apparently normal individuals in
these 32 studies appeared “sick” when compared with the Exner norms.
In an article based on these findings (Wood, Nezworski, et al.,
2001a), we concluded that the Exner norms do not accurately
represent American adults, and that use of the norms tends to make
clients appear much more disturbed than they really are.
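The mechanism behind this kind of overpathologizing can be illustrated with a minimal sketch. It assumes, purely for illustration, that a score is normally distributed and that a "pathological" cutoff is set 1.5 standard deviations above the normative mean. The function name `flagged_fraction`, the cutoff, and every number below are hypothetical; none is taken from the CS or from the studies cited here.

```python
from statistics import NormalDist


def flagged_fraction(norm_mean, norm_sd, true_mean, true_sd, z_cutoff=1.5):
    """Fraction of a healthy population scoring above a norm-derived cutoff.

    norm_mean, norm_sd: values published in the (possibly inaccurate) norms.
    true_mean, true_sd: actual distribution among healthy individuals.
    All inputs are hypothetical illustration values.
    """
    cutoff = norm_mean + z_cutoff * norm_sd
    return 1.0 - NormalDist(true_mean, true_sd).cdf(cutoff)


# Case 1: the norms describe the healthy population accurately.
accurate = flagged_fraction(norm_mean=0.0, norm_sd=1.0, true_mean=0.0, true_sd=1.0)

# Case 2: the published norm mean sits 1.5 SD below the true healthy mean.
shifted = flagged_fraction(norm_mean=0.0, norm_sd=1.0, true_mean=1.5, true_sd=1.0)

print(f"accurate norms: {accurate:.1%} of healthy people flagged")
print(f"shifted norms:  {shifted:.1%} of healthy people flagged")
```

Under these assumptions, accurate norms flag roughly 7% of healthy individuals, whereas norms that sit 1.5 standard deviations too low flag about half of them, even though nothing about the individuals has changed.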
Table 1 lists 27 “problem scores” in the Exner
system, based on the findings of Shaffer et al. (1999) and our own
article. These scores, if used with the Exner norms, have a
substantial probability of “overpathologizing” patients. That is,
these scores will tend to make adults and children appear
psychologically disturbed when in fact they are not. We strongly
recommend that clinical psychologists avoid using the Exner norms
when interpreting patients’ Rorschachs. Although Exner has reported
that he is in the process of developing new norms, preliminary
reports (Exner, 2002; see also Luxenberg & Levin, 2004, p. 195)
indicate that his numbers are still highly discrepant from those of
virtually all other researchers. Thus, clinicians who use the new CS
norms will still run a serious risk of overpathologizing
patients.
Table 1: 27 CS Scores With Inaccurate
Norms That Are Likely to Misidentify Normal Individuals as
Disturbed.
In his Independent Practitioner article, Weiner
(2005) discussed our summary of 32 Rorschach studies (which he
dismissed), but inexplicably neglected to mention the studies by
Shaffer et al. (1999) and Hamel et al. (2000) and the findings of
the international project led by Shaffer and Erdberg. Only by
ignoring these substantial scientific findings could Weiner conclude
that criticisms of the norms “lack any solid conceptual or empirical
basis.”
Weiner’s failure to address the weighty research
evidence is deeply troubling, because use of the Exner norms in
clinical practice has a high potential for harming patients. As
already noted, the CS interpretive rules based on these norms
misclassify about half of children as thought-disordered,
and about half as depressed. Also pertinent is a study by Mittman
(1983; see summary in Exner, 1991, pp. 432-433). Mittman found that
when psychologists trained by the Rorschach Workshops classified
patients based on the Rorschach CS, they misidentified more than 75%
of normal individuals as psychiatrically disturbed. The incorrect
diagnoses most likely to be assigned were depression, other mood
disorders, and personality disorders.
Clinicians who use the CS face disquieting
questions. Is it acceptable to use a test that misclassifies most
normal adults and children as seriously disturbed? What are the
ethical and legal implications for practitioners who ignore the
scientific evidence and continue to use the CS norms and decision
rules in clinical and forensic settings? Remarkably, Weiner, Exner,
and other leading advocates of the CS provide no guidance on these
issues. Instead, they deny that a problem even exists (Exner, 2001;
Meyer, 2001; Weiner, 2005). In our opinion, covering up the problems
with the CS norms does a disservice to clinicians and their
clients.
Results from Meta-analyses
In his Independent Practitioner article, Weiner
(2005) described a well-known meta-analysis by Hiller, Rosenthal,
Bornstein, Berry, and Brunell-Neuleib (1999) that found the overall
or “global” validity of the Rorschach and MMPI to be approximately
equal. Weiner charged that “Rorschach critics customarily ignore the
Hiller data....” (p. 78)
We were deeply puzzled to read Weiner’s (2005)
accusation. Who, we wondered, are these mysterious critics who
“customarily ignore” the Hiller et al. (1999) data? Weiner’s
allegation lacked clarifying details: He did not identify the
critics by name or cite their publications to substantiate his
accusation. One thing is certain, however: Weiner could not
reasonably have been referring to the authors of the present
article, because we have repeatedly discussed Hiller’s data in our
published works. Had Weiner consulted our book on the Rorschach
published two years before his article (Wood, Nezworski, Lilienfeld,
& Garb, 2003), he would have found the following paragraph:
In one meta-analysis, Harvard graduate
student Jordan Hiller and his colleagues combined the results from
30 Rorschach articles randomly selected from the published
literature. The topics of these articles were extremely diverse. For
instance, three articles examined the correlation of Form Quality
scores with learning disabilities. Another examined the correlation
of the Rorschach Prognostic Rating Scale with patients’ improvement
after psychotherapy. When the results from these and the remaining
articles were combined, the average correlation was .29. When
results from thirty MMPI articles were similarly combined, the
average correlation was .30. Although the MMPI showed a slight
advantage over the Rorschach, this difference was not statistically
significant. (pp. 252-253)
In our opinion, meta-analyses by Hiller et al.
(1999) and other scholars provide compelling evidence that some
Rorschach scores are valid. In these meta-analyses, validity
coefficients across different Rorschach scores average about
.30. Similar values are obtained for the MMPI.
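How a "global" validity coefficient is obtained by pooling correlations can be sketched as follows. This is an illustration only: the text does not specify the averaging procedure used in the meta-analyses, so the sketch shows two common options, a plain arithmetic mean and a mean taken through Fisher's z-transformation. The per-study correlations are hypothetical placeholders, not data from Hiller et al. (1999).

```python
import math


def arithmetic_mean_r(correlations):
    """Unweighted arithmetic mean of study-level correlations."""
    return sum(correlations) / len(correlations)


def fisher_mean_r(correlations):
    """Unweighted mean computed on Fisher z values, then back-transformed."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical validity coefficients from five studies of one test.
study_rs = [0.10, 0.25, 0.30, 0.35, 0.50]

print(f"arithmetic mean r: {arithmetic_mean_r(study_rs):.3f}")
print(f"Fisher-z mean r:   {fisher_mean_r(study_rs):.3f}")
```

Either way, a pooled coefficient of this kind averages over heterogeneous scores and criteria, so it cannot show which individual scores are valid.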
Our concern, however, is that while some Rorschach
scores are valid, most scores commonly used in clinical and forensic
work are not. Rorschach advocates often claim that we believe all
Rorschach scores are invalid, but this is simply untrue. In
the next two sections, we will discuss the validity of individual
scores from the Exner CS Rorschach.
20 CS Rorschach Scores Are Valid
In What’s Right With the Rorschach? (Wood,
Nezworski, & Garb, 2003), and What’s Wrong With the Rorschach?
(Wood, Nezworski, Lilienfeld, & Garb, 2003), we identified many
Rorschach scores whose validity has been well established by
research. Twenty of these scores are part of the Exner system and
will be briefly described here.
First, the inkblot responses of patients with
schizophrenia and bipolar disorder often exhibit poor form quality
(see review by Frank, 1990). That is, the images reported by these
patients often do not fit the shape of the blots. The most prominent
measures of form quality in the Exner system are Conventional Form
(X+%), Distorted Form (X-%), Form Appropriate Extended (XA%), and
the good and poor Human Representational Variables (GHR and
PHR).
Second, the inkblot responses of patients with
schizophrenia and schizotypal personality disorder, and patients in
the manic phase of bipolar disorder, are often characterized by
thought disorder, that is, by disorganized cognitions and
peculiarities of language (for reviews, see Aronow & Reznikoff,
1976; Kleiger, 1999). The two most important measures of thought
disorder in the Exner system are the Weighted Sum of 6 Special
Scores (WSum6) and Level 2 scores.
Third, the Exner system (2001) includes three
global indexes that combine measures of poor form quality with
measures of thought disorder: the Schizophrenia Index (SCZI), the
Perceptual Thinking Index (PTI), and the Ego Impairment Index (EII).
These three indexes are highly correlated with each other and
essentially redundant (Smith, 2001). Patients with schizophrenia and
other psychotic conditions receive high scores on all three.
Fourth, numerous CS scores are correlated with IQ
(for a review, see Wood, Krishnamurthy, & Archer, 2003).
Moderate correlations with IQ, ranging from .30 to .40, have been
found for Developmental Quality (DQ+) and Organizational Activity
(Zf), scores that reflect the degree to which a patient has
synthesized the diverse parts of each blot into a unified image.
Form Quality scores (X+%, X-%, XA%), the total number of responses
(R), Human responses, Human Movement responses (M), Whole responses,
Blends, Lambda, and F% (a variant of Lambda) are also correlated
with IQ.
Table 2 lists the 20 CS scores with well-demonstrated
validity for the purposes described here. These scores
appear to account for most of the positive findings in global
meta-analyses of the Exner system. Thus, both our own literature
reviews and the global meta-analyses point to the same conclusion:
these 20 scores are the “keepers” in the Exner System.
Table 2: 20 Comprehensive System Scores With Demonstrated Validity
Related to Thought Disorder, Psychotic Disorders, Schizotypal Personality Disorder, and Borderline Personality Disorder:
    Form Quality (low X+%, F+%, XA%; high X-%, M-)
    Deviant Verbalizations (WSum6)
    Good Human Responses (GHR)
    Poor Human Responses (PHR)
    Schizophrenia Index (SCZI)
    Perceptual Thinking Index (PTI)
    Ego Impairment Index (EII)
Related to Intelligence:
    Number of responses (R)
    Organizational activity (Zf, DQ+, W)
    Complexity (low Lambda, F%; high Blends/R)
    Form Quality (high X+%, F+%, XA%; low X-%)
    Human figures (Human responses, M)
Ironically, more than half of the 20 scores in
Table 2 also appear in Table 1 because their norms are inaccurate.
Thus, even though these scores are correlated with important
clinical phenomena, they will tend to yield seriously misleading
results if used with the Exner norms or interpretive rules.
Furthermore, as discussed later in this article, some scores in
Table 2 (SCZI, PTI, X-%) have poor scoring reliability. Thus, of the
20 scores in Table 2, fewer than half are suitable for clinical or
forensic use at the present time.
160 CS Scores Lack Demonstrated Validity
The CS Rorschach currently includes more than 180
scores. According to Exner (2003), these scores are correlated with
a wide variety of psychiatric diagnoses and symptoms, including
depression, anxiety, stress reactions, narcissism, dependency,
social withdrawal, suspiciousness, and impulsiveness. However, in a
detailed review of the scientific literature published in the
Journal of Clinical Psychology, we found that (except for the 20
scores in Table 2) the Exner CS Rorschach has little or no validity
for identifying psychopathological diagnoses or symptoms:
The Rorschach has not shown a
well-demonstrated relationship to Major Depressive Disorder,
Posttraumatic Stress Disorder (PTSD), anxiety disorders other than
PTSD, Dissociative Identity Disorder, Dependent, Narcissistic, or
Antisocial Personality Disorders, Conduct Disorder, or psychopathy.
(Wood, Lilienfeld, Garb, & Nezworski, 2000, p. 395)
In fact, fewer than 15% of CS scores have a
well-demonstrated relationship to psychopathological diagnoses or
symptoms.
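The "fewer than 15%" figure follows from the counts given in this article. A minimal check, treating 180 as the total number of CS scores (the text says "more than 180," so the computed share is an upper bound):

```python
valid_scores = 20   # scores listed in Table 2
total_scores = 180  # lower bound on the number of CS scores, per the text

# Share of CS scores with well-demonstrated validity (upper bound).
valid_share = valid_scores / total_scores
unvalidated_scores = total_scores - valid_scores

print(f"valid share: {valid_share:.1%}")
print(f"scores lacking demonstrated validity: {unvalidated_scores}")
```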
In his Independent Practitioner piece,
Weiner (2005) reached much different conclusions. He claimed that CS
scores are related to a wide array of psychiatric diagnoses and
symptoms. In language that struck us as elusive and inconsistent, he
argued that although the Rorschach “is not a diagnostic test,” it
can be used for “differential diagnosis” of many psychiatric
conditions (p. 76). Weiner’s claim that the Rorschach can be broadly
used for differential diagnoses would be justified if the CS had
shown adequate convergent and discriminant validity for many
psychiatric conditions. However, as we have indicated, the research
literature shows just the opposite.
Why did Weiner (2005) arrive at conclusions mainly
contradictory to our own? The reason seems to be that his standards
of evidence were much different from ours. The authors of the
present article believe that a test score should be used clinically
and forensically only if it has been well-validated in sound,
consistent, and independently replicated scientific research (Garb,
Wood, Lilienfeld, & Nezworski, 2005; Wood, Nezworski, &
Stejskal, 1996). In contrast, although Weiner accepts the general
and uncontestable idea that psychologists should “attend” to
scientific studies (Board of Trustees, 2005), he has not endorsed
the more specific principle that Rorschach scores should be used
only if they have been well validated in scientific studies.
In his Independent Practitioner piece
Weiner described a host of specific clinical inferences that could
supposedly be drawn from Rorschach protocols. However, he neglected
to note that the substantial majority of these inferences were
unsupported by research. Many of his paragraphs on the clinical
interpretation of the CS were conspicuously devoid of any citations
to scientific studies. In fact, he repeatedly argued for usage of CS
scores that have failed in research.
For example, in his article, Weiner (2005, p. 76)
held forth Exner’s Depression Index (DEPI) as a valid measure of
“substantial emotional turmoil,” even though this claim is directly
contradicted by the scientific literature. According to published
reviews (Jorgensen, Andersen, & Dam, 2000; Wood, Nezworski,
Lilienfeld, & Garb, 2003, p. 245), fourteen studies have
examined the relationship of DEPI scores to diagnoses of depression:
Eleven reported negative results, two reported mixed findings, and
only one yielded unmixed positive results. In his discussion of the
DEPI, Weiner failed to mention these reviews and their negative
conclusions.
As a second example, Weiner (2005, p. 76) claimed
that the Egocentricity Index (EGOI) is related to “negative
self-attitudes.” However, he failed to cite the only published
literature review on this topic, which concluded that the EGOI lacks
demonstrated validity as a measure of negative self-attitudes or of
any other psychological characteristic (Nezworski & Wood,
1995).
A third example is slightly more complicated but
instructive. Weiner (2005, p. 80) claimed that several CS scores,
including WSumC, Lambda, and Afr, are related to post-traumatic
stress disorder (PTSD) and can be applied by psychologists in
personal injury lawsuits. In support of this claim, he cited only
one source, namely a literature review by Luxenberg and Levin
(2004).
In 2000, the authors of the present article
published a full research review on PTSD and the Rorschach (Wood,
Lilienfeld, et al., 2000). It discussed approximately twice as many
relevant studies as Luxenberg and Levin (2004), including both the
studies that they cited and many that they omitted. Our review
appeared in the Journal of Clinical Psychology, a peer-reviewed
publication, whereas theirs was a book chapter. Based on a thorough
review of all the relevant studies -- not just the incomplete subset
discussed by Luxenberg and Levin -- we concluded that PTSD is
unrelated to WSumC, Lambda, Afr, or any other CS score. When our
review was published, it was accompanied by a Comment from Weiner
(2000). We are puzzled, therefore, that Weiner’s Independent
Practitioner piece failed to mention this review, and instead
focused exclusively on the incomplete summary by Luxenberg and
Levin.
As can be seen, Weiner’s (2005) Independent Practitioner
article repeatedly affirmed the validity of CS scores for which
research evidence is overwhelmingly negative. Furthermore, he
consistently omitted published reviews that contradicted his
conclusions. It is hardly surprising, therefore, that his
conclusions differed radically from ours.
Many CS Scores Lack Adequate Scoring Reliability
For many years, psychologists accepted Exner’s
(1978, p. 14; 1986, p. 23) claim that all CS scores have a scoring
reliability (i.e., interrater reliability) of .85 or higher.
However, recent studies have revealed this claim to be incorrect.
For example, Acklin, McDowell, Verschell, & Chan (2000) found
that for 89 CS scores, reliabilities (intraclass correlation
coefficients) ranged from .16 to 1.00, with a median of .83. A
recent study by McGrath et al. (2005) found that for 69 scores,
reliabilities ranged from .58 to .99, with a median of .89. Studies
by Meyer, Hilsenroth, et al. (2002) and Viglione and Taylor (2003)
presented generally higher figures, but their methodology and
statistical analyses were problematic (for a critique, see Wood,
Nezworski, Lilienfeld, & Garb, 2003, pp. 231-234,
366-367).
Is the interrater reliability of CS scores acceptable?
This question has three answers, depending on which standards are
applied. First, do CS scores meet the high standards set by the
Wechsler IQ subtests, whose minimum interrater reliability is .90
(Wechsler, 1997)? The best studies on the Rorschach (Acklin et al.,
2000; McGrath et al., 2005) indicate that only 50% of CS scores meet
this stringent standard.
Second, do CS scores meet traditional reliability
standards for tests used in clinical practice? Because scores with
reliability below .80 contain substantial error, experts in
psychological assessment often recommend that only tests above this
level of reliability should be used for clinical decision making
(Nunnally & Bernstein, 1994; but see Cicchetti, 1994). According
to the Acklin and McGrath studies, about 75% of CS scores meet this
traditional standard of .80 reliability.
Third and finally, do CS scores meet recommended
standards for tests used in research? Psychometric experts generally
recommend that the minimum reliability of test scores used in
research should be about .60 (Shrout, 1998). All but a few CS scores
meet this minimal standard.
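The three standards just described can be expressed as a simple tiered check. The sketch below uses the thresholds stated in the text (.90, .80, and .60); the function names and the example coefficients are hypothetical and do not report the reliability of any particular CS score.

```python
# Thresholds taken from the three standards discussed above.
STANDARDS = [
    ("Wechsler-level (.90)", 0.90),
    ("traditional clinical (.80)", 0.80),
    ("research minimum (.60)", 0.60),
]


def standards_met(reliability):
    """Return the names of every standard an interrater reliability meets."""
    if reliability > 1.0:
        raise ValueError("a reliability coefficient cannot exceed 1.0")
    return [name for name, threshold in STANDARDS if reliability >= threshold]


def suitable_for_clinical_use(reliability):
    """Apply the traditional .80 rule for clinical decision making."""
    return reliability >= 0.80


# Hypothetical coefficients chosen to fall in each band.
for example in (0.95, 0.85, 0.75, 0.55):
    print(example, standards_met(example), suitable_for_clinical_use(example))
```

A score with a coefficient in the .70s, for instance, meets only the research minimum and fails the clinical rule.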
In his Independent Practitioner piece,
Weiner (2005, pp. 78-79) asserted that “recent research leaves
little doubt that adequately trained examiners can achieve
substantial reliability in their coding of Rorschach responses.” We
are in 75% agreement with this assertion. That is, research has
clearly shown that approximately 75% of CS scores meet traditional
standards of reliability for clinical use. However, the same
research indicates that approximately 25% of CS scores do not meet
these standards. For example, the interrater reliabilities of the
SCZI, PTI, DEPI and WSum6 appear to be in the .70s, and the
reliability of Level 2 scores in the .60s. Other scores with
reliability below .80 include X-%, XA%, the D Score, Adjusted D, the
Sum of Vista responses (Sum V), the Sum of Diffuse Shading responses
(Sum Y), Food responses, and the ratios FC:CF+C and a:p. Although
these scores possess adequate reliability for research applications,
their use in clinical practice is likely to yield unacceptable error
rates.
The SPA White Paper on the Status of the
Rorschach
Having reviewed the most important issues in the
controversy over Exner’s CS Rorschach, we turn now to the recent SPA
White Paper mentioned earlier, which is entitled “The Status of the
Rorschach in Clinical and Forensic Practice: An Official Statement
by the Board of Trustees of the Society for Personality Assessment”
(Board of Trustees, 2005). In the following sections we summarize
the White Paper’s most important points and then offer our
comments.
Summary of the SPA White Paper
The SPA White Paper states that it is intended not
only for psychologists, but also for “attorneys, judges, and
administrators” (p. 219) and that it represents a response to the
controversy that has arisen during the past 10 years:
We are concerned that the Rorschach
controversy of the past several years has placed clinical and
forensic psychologists in a conflicted position, whether they can
continue to use the Rorschach in practice. (p. 219)
After reviewing the controversy’s history, the
White Paper presents a “summary of scientific evidence” (p. 219)
that relies heavily on articles by Meyer, Finn, et al. (2001) and
Meyer and Archer (2001), and on the meta-analysis by Hiller et al.
(1999) discussed earlier. The Paper concludes that “the Rorschach
possesses adequate psychometric properties” (p. 220).
The Paper states that “the Rorschach is like other
tests for which research supports their general validity -- all have
purposes for which they are more or less valid” (p. 220). However,
the Paper identifies only one Rorschach score -- the DEPI -- as
invalid.
The Paper fails to mention that there is any
controversy concerning the norms of the Exner CS. However, two
passages in the Paper make strong assertions regarding validity.
First, the Paper echoes the conclusion of Meyer, Finn, et al. (2001)
that psychological tests are as valid as medical tests:
... psychological assessment instruments
perform as effectively as measures in a variety of other health
services areas, such as electrocardiograms, mammography, magnetic
resonance imaging (MRI), dental radiographs, Papanicolaou (Pap)
smears, Positron Emission Tomography (PET) scans, and serum
cholesterol level testing. (p. 219)
Second, the Paper argues that the Rorschach is as
valid as other psychological tests:
The Rorschach possesses documented
reliability and validity similar to other generally accepted test
instruments used in the assessment of personality and
psychopathology.... (p. 221)
Together, these two statements logically imply --
without saying so directly -- that the Rorschach is as valid as
medical tests such as electrocardiograms, mammography, MRIs, and PET
scans.
In a section on ethical practices (pp. 220-221),
the White Paper offers several recommendations for professional use
of the Rorschach: (1) Administration and scoring should be
standardized; (2) Rorschach results should be integrated with
relevant information from interviews and other tests; (3) Clinicians
should “attend to the research literature to ensure Rorschach
inferences are consistent with the evidence;” and (4) Clinicians
should not use “Rorschach findings alone” to identify childhood
sexual abuse.
In closing, the White Paper concludes that “the
Rorschach meets the variety of legal tests for admissibility” in
courts, and that “its responsible use in personality assessment is
appropriate and justified” (p. 221).
Comments on the SPA White Paper
At a time when the Exner CS Rorschach is embroiled
in intense controversy, the SPA Board of Trustees (2005) has chosen
to issue an exceptionally strong and highly partisan endorsement of
the test. Considering the problems with the CS identified in this
article, we believe the Board would have been better advised to
issue a much more cautionary and scientifically balanced statement
about the test. We offer six comments on the SPA White Paper.
1. The SPA White Paper contains
several statements that are uncontroversial. The White
Paper makes several statements that virtually all clinical
psychologists can accept without reservation: (a) There is no doubt
that intense controversy surrounds the Exner CS Rorschach. (b) All
well-trained clinical psychologists recognize that neither the
Rorschach nor any other test should be interpreted in isolation from
other relevant information. (c) Reasonable clinicians can agree that
“Rorschach findings alone” (p. 220) should not be used to diagnose
childhood sexual abuse. However, it is somewhat surprising that the
SPA Board failed to provide a stronger recommendation. In our
opinion, Rorschach findings should not be used at all to diagnose
sexual abuse, given the absence of demonstrated validity for this
purpose (Garb, Wood, & Nezworski, 2000; Lilienfeld et al., 2000;
Meyer & Archer, 2001).
2. The SPA White Paper fails to
mention the controversy concerning the CS norms or acknowledge its
clinical and forensic implications. A central issue
concerning the psychometric properties of the CS Rorschach is
conspicuously omitted from the SPA White Paper. Specifically, the
Paper nowhere mentions the fierce public controversy surrounding the
Exner norms. In fact, the word “norms” appears nowhere in the text
or footnotes of the SPA White Paper.
It is difficult to understand how the SPA White
Paper can claim to provide “a summary of the issues” concerning the
Rorschach (p. 219), or pronounce that “the Rorschach possesses
adequate psychometric properties” (p. 220), while completely
ignoring the problems with the CS norms. As noted, these normative
problems bear serious implications for clinical and forensic
applications of the test, and have been documented by several
independent lines of evidence. We believe that the SPA White Paper
should have forthrightly acknowledged this evidence and warned
clinicians, attorneys, and judges that the Exner norms for the
Rorschach are highly controversial, that they tend to yield a high
error rate, and that the test misidentifies more than half of adults
and children as seriously disturbed.
3. The SPA White Paper broadly
endorses “Rorschach validity,” but fails to warn that the large
majority of CS scores lack demonstrated validity. As
we have noted, approximately 160 of the 180 scores in the Exner
system lack a well-demonstrated relationship to psychological
disorders, symptoms, or personality characteristics. Of these 160
scores, the SPA White Paper explicitly identifies only one -- the
DEPI -- as lacking in validity.
We are troubled that the SPA Board singled out the
DEPI but neglected to mention the other CS scores that also lack
demonstrated validity. In making broad claims that the Rorschach is
as valid as other psychological tests, why didn’t the SPA Board
caution clinical psychologists, attorneys, and judges that this
claim applies to only a small minority of CS scores?
Instead of forthrightly acknowledging that most
Rorschach scores lack independent, replicated evidence for validity,
the White Paper offers the well-worn platitude that the validity of
all tests can vary: “The Rorschach is like other tests... all have
purposes for which they are more or less valid” (p. 220). However,
this psychometric truism falls far short of the more specific
warning that would have been appropriate. The White
Paper should have explicitly acknowledged that although some CS
scores have well-demonstrated validity for their intended purposes,
the large majority do not.
4. The SPA White Paper broadly
endorses “Rorschach reliability,” but fails to warn users that about
25% of CS scores fail to meet traditional clinical standards for
scoring reliability. Our comments regarding CS
validity also apply to interrater reliability. In making broad
claims that the Rorschach is as reliable as other psychological
tests, the SPA Board should have cautioned attorneys and judges that
this claim does not apply to about 25% of scores in the Exner
system, including the SCZI, the PTI, WSum6, Level 2 scores, X-%,
XA%, the D Score, and Adjusted D.
5. The SPA White Paper fails to
endorse the principle that test scores should be used only for
purposes for which they have well-demonstrated
validity. Like most clinical psychologists, we believe
that test scores -- including Rorschach scores -- should be used
clinically and forensically only for purposes for which they have
been well validated by research. The SPA Board failed to endorse
this principle. Instead, it endorsed a substantially diluted
standard lacking any real bite: Clinicians should “attend” to the
scientific literature to “ensure Rorschach inferences are consistent
with the evidence” (p. 220). To understand what the Board meant by
“attending,” we need look no further than the Independent
Practitioner piece by Weiner (2005), the Board president. After
“attending” to the scientific literature, Weiner promoted the use of
several CS scores that research has shown to be invalid.
6. The SPA White Paper wrongly
suggests that the Rorschach is as valid as mammography, MRIs, and
PET scans. Perhaps nothing in the White Paper is so
disturbing as its suggestion that the validity of the Rorschach is
equal to that of mammography, MRIs, and PET scans. As noted, the
Paper argues that (1) psychological tests are as valid as these
medical imaging techniques, and (2) the Rorschach is as valid as
other psychological tests. Although the Paper stops short of stating
that the Rorschach is as valid as mammography and MRIs, the logical
implications of the syllogism are clear: Validity of medical imaging
techniques = validity of psychological tests = validity of the
Rorschach.
In the 1940s, the founder of SPA, Bruno Klopfer
(1940, p. 26), likened the Rorschach to an X-ray. It is unfortunate
that in the 21st century the SPA Board continues to encourage
similar hyperbole. As justification for these overblown claims, the
Board cites an article published in American Psychologist by
Rorschach proponent Gregory Meyer (current editor of the Journal of
Personality Assessment) and his colleagues (Meyer, Finn, et al.,
2001). Because the conclusions of that article have been criticized
elsewhere (Garb, Klein, & Grove, 2002; Hunsley, 2002; but see
Meyer, Finn, et al., 2002), we will not present a detailed critique
here. Instead, we will use a single example to show why flattering
comparisons between the CS Rorschach and mammography are highly
misleading.
The Meyer, Finn, et al. (2001) article reported
that the correlation of mammogram screening results with a diagnosis
of breast cancer within one year was r=.32. In comparison, the
article reported the global validity of the Rorschach (including the
Exner Rorschach) as r=.35. As can be seen, the validity coefficient
for the Rorschach was slightly higher than that for mammography. By
comparing such coefficients, Meyer et al. arrived at their
conclusion that the Rorschach and other psychological tests are at
least as valid as mammography.
However, a more detailed analysis of these
findings, comparing the validity of the SCZI with the validity of
mammograms, leads to much different conclusions. No CS score is
better validated by research than the SCZI. It has consistently
shown a positive correlation with diagnoses of schizophrenia. In
studies reported by researchers other than Exner, the sensitivity of
the SCZI has been found to be about .71 (that is, it correctly
identifies 71% of patients with schizophrenia) and its specificity
in clinical populations is about .76 (that is, it correctly
classifies 76% of patients without schizophrenia) (Jorgensen et al.,
2000). The base rate of schizophrenia in these studies has generally
been about .40. The correlation of the SCZI with diagnoses of
schizophrenia under these constraints (sensitivity = .71,
specificity = .76, base rate = .40) is .47, which, as might be
expected, is higher than the average validity of Rorschach scores.
In the mammography meta-analysis (Mushlin,
Kouides, & Shapiro, 1998) reported by Meyer, Finn, et al.
(2001), the sensitivity of mammography was found to be .91 (that is,
it correctly identifies 91% of patients with breast cancer) and the
specificity .95 (that is, it correctly classifies 95% of patients
without breast cancer). The base rate of breast cancer was 0.6%,
that is, slightly more than one-half of one percent. The correlation
of mammography results with subsequent diagnoses of breast cancer
was .30 (slightly lower than the figure reported by Meyer, Finn, et
al., 2001).
If we were to focus uncritically on validity
coefficients, as Meyer, Finn, et al. (2001) did, we might
erroneously conclude that the SCZI is superior to mammograms.
However, alert readers will have noticed that the sensitivity and
specificity of mammography screening (.91 and .95) are substantially
higher than the sensitivity and specificity of the SCZI (.71 and
.76). Why, then, is the validity coefficient of mammography lower
than that of the SCZI? The reason is straightforward: The base rate
of breast cancer in the mammography studies was considerably lower
(about ½ of 1%) than the base rate of schizophrenia in the SCZI
studies (40%). When base rates are low, validity coefficients decrease
substantially, even for tests with extremely high sensitivity and
specificity (Meehl & Rosen, 1955).
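The dependence on base rate can be made explicit with a short derivation. This is a sketch under the assumption that the validity coefficients discussed here are phi coefficients, that is, Pearson correlations between a dichotomous test result and a dichotomous diagnosis. The symbols p (base rate), Se (sensitivity), Sp (specificity), and q (proportion of all cases testing positive) are our own notation for this illustration.

```latex
\begin{align*}
q    &= p\,\mathrm{Se} + (1-p)(1-\mathrm{Sp}) \\[4pt]
\phi &= \frac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}
             {\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})
                    (\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}} \\[4pt]
     &= (\mathrm{Se} + \mathrm{Sp} - 1)\,
        \sqrt{\frac{p\,(1-p)}{q\,(1-q)}}
\end{align*}
% with TP = p Se, FN = p(1 - Se), FP = (1 - p)(1 - Sp), TN = (1 - p) Sp
```

For fixed sensitivity and specificity, the factor under the square root shrinks toward zero as p approaches zero, so the coefficient falls even though the accuracy of the test itself is unchanged.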
In the meta-analysis reported by Meyer, Finn, et
al. (2001), mammography was used as a screening device for breast
cancer. The validity of mammography was low (.30) because the base
rate of cancer was low (0.6%) among women in the study. However, if
mammography had been studied in samples with a base rate of 40%, as
in the SCZI example, its validity would have been .87 --
substantially higher than the validity of the SCZI or any other
Rorschach score.
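Readers who wish to check these figures can do so with a few lines of code. The following is a minimal sketch, assuming that the validity coefficients are phi coefficients computed from a 2 x 2 table of test result by diagnosis; the function name phi_coefficient and the scenario labels are ours, introduced only for illustration. The inputs are the sensitivities, specificities, and base rates given in the text.

```python
from math import sqrt


def phi_coefficient(sensitivity, specificity, base_rate):
    """Phi (Pearson r for two dichotomies) implied by a test's
    sensitivity and specificity at a given base rate."""
    # Cell proportions of the 2 x 2 table (test result by diagnosis).
    tp = base_rate * sensitivity              # true positives
    fn = base_rate * (1 - sensitivity)        # false negatives
    fp = (1 - base_rate) * (1 - specificity)  # false positives
    tn = (1 - base_rate) * specificity        # true negatives

    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator


# (sensitivity, specificity, base rate) as given in the text.
scenarios = {
    "SCZI, base rate 40%": (0.71, 0.76, 0.40),
    "Mammography, base rate 0.6%": (0.91, 0.95, 0.006),
    "Mammography, base rate 40%": (0.91, 0.95, 0.40),
}

for label, (se, sp, br) in scenarios.items():
    print(f"{label}: r = {phi_coefficient(se, sp, br):.2f}")
```

Under these assumptions the three coefficients come out close to the values of .47, .30, and .87 cited above; any small differences in the second decimal place presumably reflect rounding of the inputs. The ordering is the point: the same mammography sensitivity and specificity yield the lowest coefficient at a 0.6% base rate and the highest at a 40% base rate.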
Considering the foregoing analysis, can we
conclude -- as the Meyer study and the SPA White Paper suggest --
that the Rorschach is as valid as mammograms? Obviously not.
Mammograms are substantially more sensitive and specific than the
best validated score that the Exner Rorschach has to offer. The
validity coefficients cited by Meyer, Finn, et al. (2001) and the
SPA Board present a highly misleading picture of the relative
effectiveness of the Rorschach and mammograms. In our opinion, the
SPA White Paper is wrong to suggest otherwise. We strongly urge the
SPA Board to retract its claims and inform psychologists, attorneys,
and judges that the CS Rorschach is not even remotely as valid as
medical imaging techniques.
Conclusions
In this article, we have summarized the central
issues in the decade-long controversy concerning Exner’s CS
Rorschach. We have also responded to Weiner’s (2005) Independent
Practitioner piece and evaluated the recent SPA White Paper on the
Rorschach. Having studied Weiner’s work for many years, we were not
surprised that he categorically brushed aside all criticisms of the
Exner system. However, given that the White Paper was issued by a
group of scholars who were well aware of criticisms of the test, we
had hoped for a less partisan and more scientifically balanced
response to the current Rorschach controversy.
In the White Paper, the SPA Board of Trustees had
a valuable opportunity to place the scientific status of the
Rorschach on a firm new footing and set high standards for use of
the test. First, the Board could have shown responsiveness to the
scientific community by inviting input and participation from
Rorschach scholars who represent the full breadth of scientific
opinions regarding the test. Instead, the composition of the SPA
Board was exceedingly one-sided, with one ardent proponent of the CS
(Weiner) as President, another (Meyer) as an ex-officio member, and
two close associates of CS proponents as members (Fowler, Mihura).
Critics of the Exner system were not invited to participate. The
Board’s failure to ensure a balanced representation of scientific
viewpoints may largely account for the White Paper’s one-sided
conclusions regarding the Rorschach.
Second, the Board could have displayed its
commitment to high scientific standards for clinical practice by
recommending that psychological test scores -- including Rorschach
scores -- should be used only for purposes for which they have been
scientifically validated. Instead, the Board chose to support a weak
and ambiguous standard lacking in substance: that psychologists
should “attend” to research.
Third and finally, the Board had an opportunity to
demonstrate that the Society for Personality Assessment has left behind its
scientifically disreputable past, when SPA founder Bruno Klopfer and
his followers promoted the Rorschach as the psychological equivalent
of an X-ray. Instead, the Board chose to update and embellish
Klopfer’s thoroughly discredited claims by suggesting that the
Rorschach is as effective as modern radiographic techniques:
mammograms, MRIs, and PET scans.
We sincerely regret that the SPA Board missed
these opportunities to rehabilitate the Rorschach’s reputation in
the broader scientific community. Rorschach proponents sometimes
express surprise, dismay, or irritation that their inkblots excite
strong controversy among research-oriented clinical psychologists.
However, the reason is not hard to find: From the time of Lee J.
Cronbach, Hans Eysenck, and Joseph Zubin up until the present,
legitimate scientific criticisms of the test have been adamantly
resisted or ignored by Rorschach proponents (for a historical
summary, see Wood, Nezworski, Lilienfeld, & Garb, 2003). So long
as proponents continue to dismiss legitimate criticism, ignore
negative research, and present the test as the psychological
equivalent of medical imaging techniques, the Rorschach is destined
to remain on the fringes of psychological science.
References
Acklin, M. W., McDowell, C. J., Verschell, M. S.,
& Chan, D. (2000). Interobserver agreement, intraobserver
reliability, and the Rorschach Comprehensive System. Journal of
Personality Assessment, 74, 15-47.
Aronow, E., & Reznikoff, M. (1976). Rorschach
content interpretation. New York: Grune & Stratton.
Board of Trustees of the Society for Personality
Assessment (2005). The status of the Rorschach in clinical and
forensic practice: An official statement by the Board of Trustees of
the Society for Personality Assessment. Journal of Personality
Assessment, 85, 219-237.
Cicchetti, D. V. (1994). Guidelines, criteria,
and rules of thumb for evaluating normed and standardized assessment
instruments in psychology. Psychological Assessment, 6,
284-290.
Conners, K. (1989). Manual for Conners’ rating
scales. North Tonawanda, NY: Multi Health Systems.
Crews, F. C. (2004). Out, damned blot! New York
Review of Books, 51(12), 22-25. Reprinted in: J. Weiner (Ed.)
(2005), The best American science and nature writing 2005 (pp.
29-40). Boston: Houghton Mifflin.
Exner, J. E. (1978). The Rorschach: A
Comprehensive System: Vol. 2. Current research and advanced
interpretation. New York: Wiley.
Exner, J. E. (1986). The Rorschach: A
Comprehensive System: Vol. 1. Basic foundations (2nd ed.). New
York: Wiley.
Exner, J. E. (1991). The Rorschach: A
Comprehensive System: Vol. 2. Interpretation (2nd ed.). New
York: Wiley.
Exner, J. E. (2001). A comment on “The
misperception of psychopathology: Problems with the norms of the
Comprehensive System for the Rorschach.” Clinical Psychology:
Science and Practice, 8, 386-388.
Exner, J. E. (2002). A new nonpatient sample for
the Rorschach Comprehensive System: A progress report. Journal
of Personality Assessment, 78, 391-404.
Exner, J. E. (2003). The Rorschach: A
comprehensive system: Vol. 1: Basic foundations and principles of
interpretation (4th ed.). Hoboken, NJ: Wiley.
Frank, G. (1990). Research on the clinical
usefulness of the Rorschach: I. The diagnosis of schizophrenia.
Perceptual and Motor Skills, 71, 573-578.
Garb, H. N., Klein, D. F., & Grove, W. M.
(2002). Comparison of medical and psychological tests. American
Psychologist, 57, 137-138.
Garb, H. N., Wood, J. M., Lilienfeld, S. O.,
& Nezworski, M. T. (2005). Roots of the Rorschach controversy.
Clinical Psychology Review, 25, 97-118.
Garb, H. N., Wood, J. M., & Nezworski, M. T.
(2000). Projective techniques and the detection of child sexual
abuse. Child Maltreatment, 5, 161-168.
Goode, E. (2001). What’s in an inkblot? Some say,
not much. The New York Times, February 20, D1.
Goode, E. (2004). Defying psychiatric wisdom,
these skeptics say ‘Prove it.’ The New York Times, March 9,
D1, D6.
Hamel, M., Shaffer, T. W., & Erdberg, P.
(2000). A study of nonpatient preadolescent Rorschach protocols.
Journal of Personality Assessment, 75, 280-294.
Hiller, J. B., Rosenthal, R., Bornstein, R. F.,
Berry, D. T. R., & Brunell-Neuleib, S. (1999). A comparative
meta-analysis of Rorschach and MMPI validity. Psychological
Assessment, 11, 278-296.
Hunsley, J. (2002). Psychological testing and
psychological assessment: A closer examination. American
Psychologist, 57, 139-140.
Jorgensen, K., Andersen, T. J., & Dam, H.
(2000). The diagnostic efficiency of the Rorschach depression index
and the schizophrenia index: A review. Assessment, 7,
259-280.
Kleiger, J. H. (1999). Disordered thinking
and the Rorschach. Hillsdale, NJ: Analytic Press.
Klopfer, B. (1940). Personality aspects revealed
by the Rorschach method. Rorschach Research Exchange, 4,
26-29.
Lilienfeld, S. O., Wood, J. M., & Garb, H. N.
(2000). The scientific status of projective techniques.
Psychological Science in the Public Interest, 1, 27-66.
Lilienfeld, S. O., Wood, J. M., & Garb, H. N.
(2001). What’s wrong with this picture? Scientific American, 284
(5), 80-87.
Luxenberg, T., & Levin, P. (2004). The role
of the Rorschach in the assessment of trauma. In J. P. Wilson &
T. M. Keane (Eds.), Assessing psychological trauma and PTSD
(2nd ed., pp. 190-225).
McGrath, R. E., Pogge, D. L., Stokes, J. M.,
Cragnolino, A., Zaccario, M., Hayman, J., Piacentini, T., &
Wayland-Smith, D. (2005). Field reliability of Comprehensive System
scoring in an adolescent inpatient sample. Assessment, 12,
199-209.
Meehl, P. E., & Rosen, A. (1955). Antecedent
probability and the efficiency of psychometric signs, patterns, or
cutting scores. Psychological Bulletin, 52, 194-216.
Mestel, R. (2003). Rorschach tested. Blot out the
famous method? Los Angeles Times, May 19, F-1.
Meyer, G. J. (2001). Evidence to correct
misperceptions about Rorschach norms. Clinical Psychology:
Science and Practice, 8, 389-396.
Meyer, G. J., & Archer, R. P. (2001). The
hard science of Rorschach research: What do we know and where do we
go? Psychological Assessment, 13, 486-502.
Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G.
G., Moreland, K. L., Dies, R. R., Eisman, E. J., Kubiszyn, T. W.,
& Reed, G. M. (2001). Psychological testing and psychological
assessment: A review of evidence and issues. American
Psychologist, 56, 128-165.
Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G.
G., Dies, R. R., Eisman, E. J., Kubiszyn, T. W., & Reed, G. M.
(2002). Amplifying issues related to psychological testing and
assessment. American Psychologist, 57, 140-141.
Meyer, G. J., Hilsenroth, M. J., Baxter, D.,
Exner, J. E., Fowler, J. C., Piers, C. C., & Resnick, J. (2002).
An examination of interrater reliability for scoring the Rorschach
Comprehensive System in eight data sets. Journal of Personality
Assessment, 78, 219-274.
Mittman, B. L. (1983). Judges’ ability to
diagnose schizophrenia on the Rorschach: The effect of malingering.
Unpublished doctoral dissertation. Long Island University.
Mushlin, A. I., Kouides, R. W., & Shapiro, D.
E. (1998). Estimating the accuracy of screening mammography: A
meta-analysis. American Journal of Preventive Medicine, 14,
143-153.
Nezworski, M. T., & Wood, J. M. (1995).
Narcissism in the Comprehensive System for the Rorschach.
Clinical Psychology: Science and Practice, 2, 179-199.
Nunnally, J. C., & Bernstein, I. H. (1994).
Psychometric theory (3rd ed.). New York: McGraw-Hill.
Shaffer, T. W., Erdberg, P., & Haroian, J.
(1999). Current nonpatient data for the Rorschach, WAIS-R, and
MMPI-2. Journal of Personality Assessment, 73, 305-316.
Shrout, P. E. (1998). Measurement reliability and
agreement in psychiatry. Statistical methods in medical
research, 7, 301-317.
Smith, S. R. (2001). Multimethod assessment of
child and adolescent psychopathology: An examination of behavior
ratings, self-report, and the Rorschach inkblot method. Unpublished
doctoral dissertation. University of Arkansas.
Viglione, D. J., & Taylor, N. (2003).
Empirical support for interrater reliability of Rorschach
Comprehensive System coding. Journal of Clinical Psychology,
59, 111-121.
Wechsler, D. (1997). WAIS-III administration and
scoring manual. San Antonio, TX: The Psychological Corporation.
Weiner, I. B. (2000). Using the Rorschach
properly in practice and research. Journal of Clinical
Psychology, 56, 435-438.
Weiner, I. B. (2005). The utility of Rorschach
assessment in clinical and forensic practice. Independent
Practitioner, 25, 76-83.
Wood, J. M., Krishnamurthy, R., & Archer, R.
P. (2003). Three factors of the Comprehensive System for the
Rorschach and their relationship to Wechsler IQ scores in an
adolescent sample. Assessment, 10, 259-265.
Wood, J. M., Lilienfeld, S. O., Garb, H. N.,
& Nezworski, M. T. (2000). The Rorschach Test in clinical
diagnosis: A critical review, with a backward look at Garfield
(1947). Journal of Clinical Psychology, 56, 395-430.
Wood, J. M., Nezworski, M. T., Garb, H. N., &
Lilienfeld, S. O. (2001a). The misperception of psychopathology:
Problems with the norms of the Comprehensive System for the
Rorschach. Clinical Psychology: Science and Practice, 8,
350-373.
Wood, J. M., Nezworski, M. T., Garb, H. N.,
& Lilienfeld, S. O. (2001b). Problems with the norms of the
Comprehensive System for the Rorschach: Methodological and
conceptual considerations. Clinical Psychology: Science and
Practice, 8, 397-402.
Wood, J. M., Nezworski, M. T., & Garb, H. N.
(2003). What’s right with the Rorschach? Scientific Review of
Mental Health Practice, 2, 142-146.
Wood, J. M., Nezworski, M. T., Lilienfeld, S. O.,
& Garb, H. N. (2003). What’s wrong with the Rorschach? Science
confronts the controversial inkblot test. San Francisco:
Jossey-Bass.
Wood, J. M., Nezworski, M. T., & Stejskal, W.
J. (1996). The Comprehensive System for the Rorschach: A critical
examination. Psychological Science, 7, 3-10.
Author Note
James M. Wood, Department of Psychology,
University of Texas at El Paso.
M. Teresa Nezworski, Department of Psychology,
University of Texas at Dallas.
Howard N. Garb, Psychology Research Service,
Wilford Hall Medical Center, Lackland Air Force Base.
Scott O. Lilienfeld, Department of Psychology,
Emory University.
Correspondence concerning this article should be
addressed to James M. Wood, Department of Psychology, University of
Texas at El Paso, El Paso, Texas, 79968. E-mail: jawood@utep.edu
The views expressed in this article are those of
the authors and are not the official policy of the Department of
Defense or the United States Air Force.