Controversy Over Exner’s Comprehensive System for the Rorschach


Independent Practitioner/Spring 2006

Practitioner Information


The Controversy Over Exner’s Comprehensive System for the Rorschach: The Critics Speak

James M. Wood, M. Teresa Nezworski, Howard N. Garb, Scott O. Lilienfeld



For the past 10 years, intense scientific controversy has engulfed one of psychology’s most widely used assessment methods, Exner’s Comprehensive System for the Rorschach (CS) (Exner, 2003). Heated debates and critical articles concerning the CS have appeared in more than a dozen scholarly journals, including the Journal of Personality Assessment, Psychological Assessment, Assessment, the Journal of Clinical Psychology, Clinical Psychology Review, Clinical Psychology: Science and Practice, and Professional Psychology: Research and Practice.

No longer confined to the pages of scholarly journals, the controversy over the Exner CS Rorschach has spread to the national press. The New York Times (Goode, 2001, 2004), Los Angeles Times (Mestel, 2003), and Scientific American (Lilienfeld, Wood, & Garb, 2001; see also Lilienfeld, Wood, & Garb, 2000) have carried feature articles on the debate. A scathing critique in the New York Review of Books (Crews, 2004/2005) recently called on psychologists to abandon clinical use of the test.

During the past year, proponents of the Exner system have attempted to dampen the controversy. Irving Weiner, President of the Board of Trustees of the Society for Personality Assessment (SPA), published an article in the Spring issue of the Independent Practitioner (Weiner, 2005) that categorically rejected all criticisms of the CS. A few months afterward, the SPA Board released a White Paper for psychologists, attorneys, and judges that broadly endorsed use of the test in clinics and courtrooms (Board of Trustees of the Society for Personality Assessment, 2005, hereafter cited as Board of Trustees, 2005).

The unyielding stance adopted by Weiner and other proponents of the CS is exemplified by the closing sentence of his Independent Practitioner piece:

Although some critics have questioned the psychometric soundness and legal suitability of Rorschach assessment, their criticisms lack any solid conceptual or empirical basis. (Weiner, 2005, p. 82) (emphasis added)

Is it true, as Weiner asserts, that criticisms of the CS lack any rational basis? If so, why have editors and reviewers at more than a dozen respected journals allowed them to be published? Or instead, is Weiner’s absolute rejection of all criticism a warning sign of serious problems within the Rorschach community? Have he and other defenders of the Exner system lapsed into a siege mentality, so that even legitimate criticisms are rejected?

The easiest course for clinicians would be simply to accept Weiner’s (2005) blanket assurance that all is well with the Exner system. If they already use the CS Rorschach, they could continue to do so without feeling uncertainty or doubt. However, most clinicians will probably conclude that the easiest course is not the best one, and that they have a responsibility to understand the controversy and stay abreast of research findings. Only by doing so can they provide the best possible service to their clients.

In the present article, we provide evidence for five conclusions regarding the Rorschach controversy: (1) The Exner norms are in error and seriously overpathologize adults and children; (2) Meta-analyses indicate that at least some Rorschach scores are valid; (3) Twenty CS Rorschach scores are valid; (4) The remaining 160 CS scores lack demonstrated validity; and (5) About 25% of CS scores lack adequate scoring reliability for clinical work. Afterwards, we discuss the White Paper issued by Weiner and the other members of the SPA Board of Trustees (2005).

The CS Norms Are Seriously in Error

In 1999 and 2000, a group of respected Rorschach experts -- Thomas Shaffer, Philip Erdberg, John Haroian, and Mel Hamel -- reported several of the most important studies on the Exner system to appear in the past 25 years. Two of these studies were published in the Journal of Personality Assessment, which certainly cannot be accused of being an anti-Rorschach journal (indeed, its past and present editors have all been ardent proponents of the Rorschach). In the first of these studies (Shaffer, Erdberg, & Haroian, 1999), the researchers administered the CS Rorschach, the WAIS-R, and the MMPI-2 to 123 nonpatient adults living in the community. Most of these participants were volunteers who donated blood at a blood bank and then gave their time to be tested by the research team. According to the WAIS-R and MMPI-2, the group was average or even slightly above-average compared with other Americans.

In only one respect did these apparently typical Americans stand out: When compared with the Exner norms, their Rorschach scores indicated that most of the individuals in the study were seriously disturbed. For example, about 1 in 6 of the participants scored in the pathological range on the CS Schizophrenia Index. Their Distorted Form Quality scores were so high that half would be considered thought-disordered. Nearly a third gave a Reflection response, a supposed indicator of pathological narcissism.

The results from the Shaffer et al. (1999) study were replicated in an international project led by Erdberg and Shaffer. Researchers administered the CS Rorschach to 2,125 nonpatient adults in 9 countries besides the U.S. (see summary in Wood, Nezworski, et al., 2001b, p. 400; but see Meyer, 2001). The normative results from these countries were substantially different from the Exner norms, but similar to the numbers for nonpatient Americans reported by Shaffer et al. (1999). The international studies confirmed that the Shaffer et al. findings were not a fluke.

In 2000, the same group of scholars published an additional study, this time of 100 preadolescent children with no known history of mental health problems (Hamel, Shaffer, & Erdberg, 2000). The children were above-average in psychological adjustment according to a well-validated measure, the Conners Parent Rating Scale-93 (Conners, 1989). Yet when these children’s Rorschach scores were compared with the CS norms, the results were even more troubling than in the study of adults. More than 60% of the children scored in the pathological range on the Schizophrenia Index. More than 50% had Form Quality scores that indicated thought disorder. Nearly half scored in the “depressed” range on the CS Depression Index. Hamel and his colleagues wrote:

If we were writing a Rorschach-based, collective psychological evaluation for this sample, the clinical descriptors would command attention. In the main, these children may be described as grossly misperceiving and misinterpreting their surroundings and having unconventional ideation and significant cognitive impairment. Their distortion of reality and faulty reasoning approach psychosis.... They apparently suffer from an affective disorder that includes many of the markers found in clinical depression. Equally puzzling is that the previous Comprehensive System descriptors are incongruent with all other information known to this study about these children. (p. 291)

The findings of Shaffer et al. (1999) and Hamel et al. (2000) attracted widespread interest among Rorschach scholars. Why did apparently normal adults and children appear seriously disturbed when compared with the Exner norms? Was something amiss with the norms? To explore these questions, we conducted a search of the scientific literature from 1974 to 1999 and identified 32 additional studies that had administered the Exner Rorschach to nonpatient American adults. When we combined the numbers across studies, the results were very similar to those reported by Shaffer and his colleagues. That is, the apparently normal individuals in these 32 studies appeared “sick” when compared with the Exner norms. In an article based on these findings (Wood, Nezworski, et al., 2001a), we concluded that the Exner norms do not accurately represent American adults, and that use of the norms tends to make clients appear much more disturbed than they really are.
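The mechanism behind this overpathologizing can be illustrated with a minimal sketch. It assumes, for illustration only, that a score is normally distributed and that a clinical cutoff sits at a fixed percentile of the reference norms. Actual CS indexes such as the SCZI combine several criteria, so this is not Exner’s procedure. The function name `flag_rate` and every number below are hypothetical and are not taken from the Exner norms or from the studies cited above. The point is only that when the reference norms misstate where nonpatients actually fall, the share of healthy people flagged rises well above the intended rate.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def flag_rate(norm_mean, norm_sd, cutoff_pct, true_mean, true_sd):
    """Hypothetical illustration: share of a nonpatient population that
    crosses a cutoff placed at the `cutoff_pct` percentile of a reference norm.

    Assumes scores are normally distributed both in the reference norm and
    in the actual nonpatient population.
    """
    # Cutoff derived from the (possibly inaccurate) reference norms.
    cutoff = NormalDist(norm_mean, norm_sd).inv_cdf(cutoff_pct)
    # Proportion of the actual nonpatient population scoring above that cutoff.
    return 1 - NormalDist(true_mean, true_sd).cdf(cutoff)


if __name__ == "__main__":
    # Norms accurate: about 5% of nonpatients are flagged, as intended.
    print(round(flag_rate(0, 1, 0.95, 0.0, 1), 3))
    # Hypothetical case: real nonpatients score 1 SD higher than the norms assume.
    print(round(flag_rate(0, 1, 0.95, 1.0, 1), 3))
```

Under these assumptions, a one-standard-deviation gap between the norms and the real nonpatient population raises the false-positive rate from the nominal 5% to roughly a quarter of healthy respondents, and a gap equal to the cutoff distance (about 1.645 SD) flags half of them.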

Table 1 lists 27 “problem scores” in the Exner system, based on the findings of Shaffer et al. (1999) and our own article. These scores, if used with the Exner norms, have a substantial probability of “overpathologizing” patients. That is, these scores will tend to make adults and children appear psychologically disturbed when in fact they are not. We strongly recommend that clinical psychologists avoid using the Exner norms when interpreting patients’ Rorschachs. Although Exner has reported that he is in the process of developing new norms, preliminary reports (Exner, 2002; see also Luxenberg & Levin, 2004, p. 195) indicate that his numbers are still highly discrepant from those of virtually all other researchers. Thus, clinicians who use the new CS norms will still run a serious risk of overpathologizing patients.

Table 1: 27 CS Scores With Inaccurate Norms That Are Likely to Misidentify Normal Individuals as Disturbed.

In his Independent Practitioner article, Weiner (2005) discussed our summary of 32 Rorschach studies (which he dismissed), but inexplicably neglected to mention the studies by Shaffer et al. (1999) and Hamel et al. (2000) and the findings of the international project led by Shaffer and Erdberg. Only by ignoring these substantial scientific findings could Weiner conclude that criticisms of the norms “lack any solid conceptual or empirical basis.”

Weiner’s failure to address the weighty research evidence is deeply troubling, because use of the Exner norms in clinical practice has a high potential for harming patients. As already noted, the CS interpretive rules based on these norms misclassify about half of children as thought disordered, and about half as depressed. Also pertinent is a study by Mittman (1983; see summary in Exner, 1991, pp. 432-433). Mittman found that when psychologists trained by the Rorschach Workshops classified patients based on the Rorschach CS, they misidentified more than 75% of normal individuals as psychiatrically disturbed. The incorrect diagnoses most likely to be assigned were depression, other mood disorders, and personality disorders.

Clinicians who use the CS face disquieting questions. Is it acceptable to use a test that misclassifies most normal adults and children as seriously disturbed? What are the ethical and legal implications for practitioners who ignore the scientific evidence and continue to use the CS norms and decision rules in clinical and forensic settings? Remarkably, Weiner, Exner, and other leading advocates of the CS provide no guidance on these issues. Instead, they deny that a problem even exists (Exner, 2001; Meyer, 2001; Weiner, 2005). In our opinion, covering up the problems with the CS norms does a disservice to clinicians and their clients.

Results from Meta-analyses

In his Independent Practitioner article, Weiner (2005) described a well-known meta-analysis by Hiller, Rosenthal, Bornstein, Berry, and Brunell-Neuleib (1999) that found the overall or “global” validity of the Rorschach and MMPI to be approximately equal. Weiner charged that “Rorschach critics customarily ignore the Hiller data....” (p. 78).

We were deeply puzzled to read Weiner’s (2005) accusation. Who, we wondered, are these mysterious critics who “customarily ignore” the Hiller et al. (1999) data? Weiner’s allegation lacked clarifying details: He did not identify the critics by name or cite their publications to substantiate his accusation. One thing is certain, however: Weiner could not reasonably have been referring to the authors of the present article, because we have repeatedly discussed Hiller’s data in our published works. Had Weiner consulted our book on the Rorschach published two years before his article (Wood, Nezworski, Lilienfeld, & Garb, 2003), he would have found the following paragraph:

In one meta-analysis, Harvard graduate student Jordan Hiller and his colleagues combined the results from 30 Rorschach articles randomly selected from the published literature. The topics of these articles were extremely diverse. For instance, three articles examined the correlation of Form Quality scores with learning disabilities. Another examined the correlation of the Rorschach Prognostic Rating Scale with patients’ improvement after psychotherapy. When the results from these and the remaining articles were combined, the average correlation was .26. When results from thirty MMPI articles were similarly combined, the average correlation was .37. Although the MMPI showed a slight advantage over the Rorschach, this difference was not statistically significant. (pp. 252-253)

In our opinion, meta-analyses by Hiller et al. (1999) and other scholars provide compelling evidence that some Rorschach scores are valid. In these meta-analyses, validity coefficients across different Rorschach scores average about .30. Similar values are obtained for the MMPI.

Our concern, however, is that while some Rorschach scores are valid, most scores commonly used in clinical and forensic work are not. Rorschach advocates often claim that we believe all Rorschach scores are invalid, but this is simply untrue. In the next two sections, we will discuss the validity of individual scores from the Exner CS Rorschach.

20 CS Rorschach Scores Are Valid

In What’s Right With the Rorschach? (Wood, Nezworski, & Garb, 2003) and What’s Wrong With the Rorschach? (Wood, Nezworski, Lilienfeld, & Garb, 2003), we identified many Rorschach scores whose validity has been well established by research. Twenty of these scores are part of the Exner system and will be briefly described here.

First, the inkblot responses of patients with schizophrenia and bipolar disorder often exhibit poor form quality (see review by Frank, 1990). That is, the images reported by these patients often do not fit the shape of the blots. The most prominent measures of form quality in the Exner system are Conventional Form (X+%), Distorted Form (X-%), Form Appropriate Extended (XA%), and the good and poor Human Representational Variables (GHR and PHR).

Second, the inkblot responses of patients with schizophrenia and schizotypal personality disorder, and patients in the manic phase of bipolar disorder, are often characterized by thought disorder, that is, by disorganized cognitions and peculiarities of language (for reviews, see Aronow & Reznikoff, 1976; Kleiger, 1999). The two most important measures of thought disorder in the Exner system are the Weighted Sum of 6 Special Scores (WSum6) and Level 2 scores.

Third, the Exner system (Exner, 2001) includes three global indexes that combine measures of poor form quality with measures of thought disorder: the Schizophrenia Index (SCZI), the Perceptual Thinking Index (PTI), and the Ego Impairment Index (EII). These three indexes are highly correlated with each other and essentially redundant (Smith, 2001). Patients with schizophrenia and other psychotic conditions receive high scores on all three.

Fourth, numerous CS scores are correlated with IQ (for a review, see Wood, Krishnamurthy, & Archer, 2003). Moderate correlations with IQ, ranging from .30 to .40, have been found for Developmental Quality (DQ+) and Organizational Activity (Zf), scores that reflect the degree to which a patient has synthesized the diverse parts of each blot into a unified image. Form Quality scores (X+%, X-%, XA%), the total number of responses (R), Human responses, Human Movement responses (M), Whole responses, Blends, Lambda, and F% (a variant of Lambda) are also correlated with IQ.

Table 2 lists the 20 CS scores with well-demonstrated validity for the purposes described here. These scores appear to account for most of the positive findings in global meta-analyses of the Exner system. Thus, both our own literature reviews and the global meta-analyses point to the same conclusion: these 20 scores are the “keepers” in the Exner system.

Table 2: 20 Comprehensive System Scores With Demonstrated Validity

Related to Thought Disorder, Psychotic Disorders, Schizotypal Personality Disorder, and Borderline Personality Disorder:
Form Quality (low X+%, F+%, XA%; high X-%, M-)
Deviant Verbalizations (WSum6)
Good Human Responses (GHR)
Poor Human Responses (PHR)
Schizophrenia Index (SCZI)
Perceptual Thinking Index (PTI)
Ego Impairment Index (EII)

Related to Intelligence:
Number of responses (R)
Organizational activity (Zf, DQ+, W)
Complexity (low Lambda, F%; high Blends/R)
Form Quality (high X+%, F+%, XA%; low X-%)
Human figures (Human responses, M)

Ironically, more than half of the 20 scores in Table 2 also appear in Table 1 because their norms are inaccurate. Thus, even though these scores are correlated with important clinical phenomena, they will tend to yield seriously misleading results if used with the Exner norms or interpretive rules. Furthermore, as discussed later in this article, some scores in Table 2 (SCZI, PTI, X-%) have poor scoring reliability. Thus, of the 20 scores in Table 2, fewer than half are suitable for clinical or forensic use at the present time.

160 CS Scores Lack Demonstrated Validity

The CS Rorschach currently includes more than 180 scores. According to Exner (2003), these scores are correlated with a wide variety of psychiatric diagnoses and symptoms, including depression, anxiety, stress reactions, narcissism, dependency, social withdrawal, suspiciousness, and impulsiveness. However, in a detailed review of the scientific literature published in the Journal of Clinical Psychology, we found that (except for the 20 scores in Table 2) the Exner CS Rorschach has little or no validity for identifying psychopathological diagnoses or symptoms:

The Rorschach has not shown a well-demonstrated relationship to Major Depressive Disorder, Posttraumatic Stress Disorder (PTSD), anxiety disorders other than PTSD, Dissociative Identity Disorder, Dependent, Narcissistic, or Antisocial Personality Disorders, Conduct Disorder, or psychopathy. (Wood, Lilienfeld, Garb, & Nezworski, 2000, p. 395)

In fact, fewer than 15% of CS scores have a well-demonstrated relationship to psychopathological diagnoses or symptoms.

In his Independent Practitioner piece, Weiner (2005) reached much different conclusions. He claimed that CS scores are related to a wide array of psychiatric diagnoses and symptoms. In language that struck us as elusive and inconsistent, he argued that although the Rorschach “is not a diagnostic test,” it can be used for “differential diagnosis” of many psychiatric conditions (p. 76). Weiner’s claim that the Rorschach can be broadly used for differential diagnoses would be justified if the CS had shown adequate convergent and discriminant validity for many psychiatric conditions. However, as we have indicated, the research literature shows just the opposite.

Why did Weiner (2005) arrive at conclusions largely contradictory to our own? The reason seems to be that his standards of evidence were much different from ours. The authors of the present article believe that a test score should be used clinically and forensically only if it has been well-validated in sound, consistent, and independently replicated scientific research (Garb, Wood, Lilienfeld, & Nezworski, 2005; Wood, Nezworski, & Stejskal, 1996). In contrast, although Weiner accepts the general and uncontestable idea that psychologists should “attend” to scientific studies (Board of Trustees, 2005), he has not endorsed the more specific principle that Rorschach scores should be used only if they have been well validated in scientific studies.

In his Independent Practitioner piece, Weiner described a host of specific clinical inferences that could supposedly be drawn from Rorschach protocols. However, he neglected to note that the substantial majority of these inferences were unsupported by research. Many of his paragraphs on the clinical interpretation of the CS were conspicuously devoid of any citations to scientific studies. In fact, he repeatedly argued for usage of CS scores that have failed in research.

For example, in his article, Weiner (2005, p. 76) held forth Exner’s Depression Index (DEPI) as a valid measure of “substantial emotional turmoil,” even though this claim is directly contradicted by the scientific literature. According to published reviews (Jorgensen, Andersen, & Dam, 2000; Wood, Nezworski, Lilienfeld, & Garb, 2003, p. 245), fourteen studies have examined the relationship of DEPI scores to diagnoses of depression: Eleven reported negative results, two reported mixed findings, and only one yielded unmixed positive results. In his discussion of the DEPI, Weiner failed to mention these reviews and their negative conclusions.

As a second example, Weiner (2005, p. 76) claimed that the Egocentricity Index (EGOI) is related to “negative self-attitudes.” However, he failed to cite the only published literature review on this topic, which concluded that the EGOI lacks demonstrated validity as a measure of negative self-attitudes or of any other psychological characteristic (Nezworski & Wood, 1995).

A third example is slightly more complicated but instructive. Weiner (2005, p. 80) claimed that several CS scores, including WSumC, Lambda, and Afr, are related to post-traumatic stress disorder (PTSD) and can be applied by psychologists in personal injury lawsuits. In support of this claim, he cited only one source, namely a literature review by Luxenberg and Levin (2004).

In 2000, the authors of the present article published a full research review on PTSD and the Rorschach (Wood, Lilienfeld, et al., 2000). It discussed approximately twice as many relevant studies as Luxenberg and Levin (2004), including both the studies that they cited and many that they omitted. Our review appeared in the Journal of Clinical Psychology, a peer-reviewed publication, whereas theirs was a book chapter. Based on a thorough review of all the relevant studies -- not just the incomplete subset discussed by Luxenberg and Levin -- we concluded that PTSD is unrelated to WSumC, Lambda, Afr, or any other CS score. When our review was published, it was accompanied by a Comment from Weiner (2000). We are puzzled, therefore, that Weiner’s Independent Practitioner piece failed to mention this review, and instead focused exclusively on the incomplete summary by Luxenberg and Levin.

As can be seen, Weiner’s (2005) Independent Practitioner article repeatedly affirmed the validity of CS scores for which research evidence is overwhelmingly negative. Furthermore, he consistently omitted published reviews that contradicted his conclusions. It is hardly surprising, therefore, that his conclusions differed radically from ours.

Many CS Scores Lack Adequate Scoring Reliability

For many years, psychologists accepted Exner’s (1978, p. 14; 1986, p. 23) claim that all CS scores have a scoring reliability (i.e., interrater reliability) of .85 or higher. However, recent studies have revealed this claim to be incorrect. For example, Acklin, McDowell, Verschell, and Chan (2000) found that for 89 CS scores, reliabilities (intraclass correlation coefficients) ranged from .16 to 1.00, with a median of .83. A recent study by McGrath et al. (2005) found that for 69 scores, reliabilities ranged from .58 to .99, with a median of .89. Studies by Meyer, Hilsenroth, et al. (2002) and Viglione and Taylor (2003) presented generally higher figures, but their methodology and statistical analyses were problematic (for a critique, see Wood, Nezworski, Lilienfeld, & Garb, 2003, pp. 231-234, 366-367).

Is the interrater reliability of CS scores acceptable? This question has three answers, depending on which standards are applied. First, do CS scores meet the high standards set by the Wechsler IQ subtests, whose minimum interrater reliability is .90 (Wechsler, 1997)? The best studies on the Rorschach (Acklin et al., 2000; McGrath et al., 2005) indicate that only 50% of CS scores meet this stringent standard.

Second, do CS scores meet traditional reliability standards for tests used in clinical practice? Because scores with reliability below .80 contain substantial error, experts in psychological assessment often recommend that only tests above this level of reliability should be used for clinical decision making (Nunnally & Bernstein, 1994; but see Cicchetti, 1994). According to the Acklin and McGrath studies, about 75% of CS scores meet this traditional standard of .80 reliability.

Third and finally, do CS scores meet recommended standards for tests used in research? Psychometric experts generally recommend that the minimum reliability of test scores used in research should be about .60 (Shrout, 1998). All but a few CS scores meet this minimal standard.
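The three standards just described amount to a simple tiering rule. The sketch below assumes the thresholds exactly as stated in the text (.90, .80, and about .60) and treats a coefficient that sits exactly at a threshold as meeting it. The function names are hypothetical. The example coefficients are only the range endpoints and medians quoted from Acklin et al. (2000) and McGrath et al. (2005), plus 0.75 as an assumed value for a score falling in the .70s; the sketch does not reproduce either study’s per-score data.

```python
from __future__ import annotations

# Thresholds as stated in the text: Wechsler subtest minimum (.90),
# traditional clinical standard (.80), and research minimum (about .60).
WECHSLER_MIN = 0.90
CLINICAL_MIN = 0.80
RESEARCH_MIN = 0.60


def reliability_tier(icc: float) -> str:
    """Return the strictest of the three standards an interrater ICC meets."""
    if icc >= WECHSLER_MIN:
        return "wechsler"
    if icc >= CLINICAL_MIN:
        return "clinical"
    if icc >= RESEARCH_MIN:
        return "research"
    return "inadequate"


def share_meeting(iccs: list[float], threshold: float) -> float:
    """Fraction of scores whose ICC meets or exceeds `threshold`."""
    if not iccs:
        raise ValueError("need at least one reliability coefficient")
    return sum(icc >= threshold for icc in iccs) / len(iccs)


if __name__ == "__main__":
    # Illustrative values only: Acklin et al. endpoints (.16, 1.00) and median (.83),
    # McGrath et al. median (.89), and an assumed .75 for a score in the .70s.
    examples = [0.16, 0.83, 0.89, 1.00, 0.75]
    for icc in examples:
        print(icc, reliability_tier(icc))
    print(share_meeting(examples, CLINICAL_MIN))
```

Tiering a few summary values says nothing about how many scores fall at each level. The percentages in the text (about 50% at .90, about 75% at .80) come from the per-score results in those studies, which would have to be passed to `share_meeting` to reproduce them.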

In his Independent Practitioner piece, Weiner (2005, pp. 78-79) asserted that “recent research leaves little doubt that adequately trained examiners can achieve substantial reliability in their coding of Rorschach responses.” We are in 75% agreement with this assertion. That is, research has clearly shown that approximately 75% of CS scores meet traditional standards of reliability for clinical use. However, the same research indicates that approximately 25% of CS scores do not meet these standards. For example, the interrater reliabilities of the SCZI, PTI, DEPI and WSum6 appear to be in the .70s, and the reliability of Level 2 scores in the .60s. Other scores with reliability below .80 include X-%, XA%, the D Score, Adjusted D, the Sum of Vista responses (Sum V), the Sum of Diffuse Shading responses (Sum Y), Food responses, and the ratios FC:CF+C and a:p. Although these scores possess adequate reliability for research applications, their use in clinical practice is likely to yield unacceptable error rates.

The SPA White Paper on the Status of the Rorschach

Having reviewed the most important issues in the controversy over Exner’s CS Rorschach, we turn now to the recent SPA White Paper mentioned earlier, which is entitled “The Status of the Rorschach in Clinical and Forensic Practice: An Official Statement by the Board of Trustees of the Society for Personality Assessment” (Board of Trustees, 2005). In the following sections we summarize the White Paper’s most important points and then offer our comments.

Summary of the SPA White Paper

The SPA White Paper states that it is intended not only for psychologists, but also for “attorneys, judges, and administrators” (p. 219) and that it represents a response to the controversy that has arisen during the past 10 years:

We are concerned that the Rorschach controversy of the past several years has placed clinical and forensic psychologists in a conflicted position, whether they can continue to use the Rorschach in practice. (p. 219)

After reviewing the controversy’s history, the White Paper presents a “summary of scientific evidence” (p. 219) that relies heavily on articles by Meyer, Finn, et al. (2001) and Meyer and Archer (2001), and on the meta-analysis by Hiller et al. (1999) discussed earlier. The Paper concludes that “the Rorschach possesses adequate psychometric properties” (p. 220).

The Paper states that “the Rorschach is like other tests for which research supports their general validity -- all have purposes for which they are more or less valid” (p. 220). However, the Paper identifies only one Rorschach score -- the DEPI -- as invalid.

The Paper does not mention any controversy concerning the norms of the Exner CS. It does, however, make strong assertions regarding validity in two passages. First, the Paper echoes the conclusion of Meyer, Finn, et al. (2001) that psychological tests are as valid as medical tests:

... psychological assessment instruments perform as effectively as measures in a variety of other health services areas, such as electrocardiograms, mammography, magnetic resonance imaging (MRI), dental radiographs, Papanicolaou (Pap) smears, Positron Emission Tomography (PET) scans, and serum cholesterol level testing. (p. 219)

Second, the Paper argues that the Rorschach is as valid as other psychological tests:

The Rorschach possesses documented reliability and validity similar to other generally accepted test instruments used in the assessment of personality and psychopathology.... (p. 221)

Together, these two statements logically imply -- without saying so directly -- that the Rorschach is as valid as medical tests such as electrocardiograms, mammography, MRIs, and PET scans.

In a section on ethical practices (pp. 220-221), the White Paper offers several recommendations for professional use of the Rorschach: (1) Administration and scoring should be standardized; (2) Rorschach results should be integrated with relevant information from interviews and other tests; (3) Clinicians should “attend to the research literature to ensure Rorschach inferences are consistent with the evidence;” and (4) Clinicians should not use “Rorschach findings alone” to identify childhood sexual abuse.

In closing, the White Paper concludes that “the Rorschach meets the variety of legal tests for admissibility” in courts, and that “its responsible use in personality assessment is appropriate and justified” (p. 221).

Comments on the SPA White Paper

At a time when the Exner CS Rorschach is embroiled in intense controversy, the SPA Board of Trustees (2005) has chosen to issue an exceptionally strong and highly partisan endorsement of the test. Considering the problems with the CS identified in this article, we believe the Board would have been better advised to issue a much more cautionary and scientifically balanced statement about the test. We offer six comments on the SPA White Paper.

1. The SPA White Paper contains several statements that are uncontroversial. The White Paper makes several statements that virtually all clinical psychologists can accept without reservation: (a) There is no doubt that intense controversy surrounds the Exner CS Rorschach. (b) All well-trained clinical psychologists recognize that neither the Rorschach nor any other test should be interpreted in isolation from other relevant information. (c) Reasonable clinicians can agree that “Rorschach findings alone” (p. 220) should not be used to diagnose childhood sexual abuse. However, it is somewhat surprising that the SPA Board failed to provide a stronger recommendation. In our opinion, Rorschach findings should not be used at all to diagnose sexual abuse, given the absence of demonstrated validity for this purpose (Garb, Wood, & Nezworski, 2000; Lilienfeld et al., 2000; Meyer & Archer, 2001).

2. The SPA White Paper fails to mention the controversy concerning the CS norms or acknowledge its clinical and forensic implications. A central issue concerning the psychometric properties of the CS Rorschach is conspicuously omitted from the SPA White Paper. Specifically, the Paper nowhere mentions the fierce public controversy surrounding the Exner norms. In fact, the word “norms” appears nowhere in the text or footnotes of the SPA White Paper.

It is difficult to understand how the SPA White Paper can claim to provide “a summary of the issues” concerning the Rorschach (p. 219), or pronounce that “the Rorschach possesses adequate psychometric properties” (p. 220), while completely ignoring the problems with the CS norms. As noted, these normative problems have serious implications for clinical and forensic applications of the test and have been documented by several independent lines of evidence. We believe that the SPA White Paper should have forthrightly acknowledged this evidence and warned clinicians, attorneys, and judges that the Exner norms for the Rorschach are highly controversial, that they tend to yield a high error rate, and that, when these norms are used, the test misidentifies more than half of adults and children as seriously disturbed.

3. The SPA White Paper broadly endorses “Rorschach validity,” but fails to warn that the large majority of CS scores lack demonstrated validity. As we have noted, approximately 160 of the 180 scores in the Exner system lack a well-demonstrated relationship to psychological disorders, symptoms, or personality characteristics. Of these 160 scores, the SPA White Paper explicitly identifies only one, the DEPI, as lacking in validity.

We are troubled that the SPA Board singled out the DEPI but neglected to mention the other CS scores that also lack demonstrated validity. In making broad claims that the Rorschach is as valid as other psychological tests, why didn’t the SPA Board caution clinical psychologists, attorneys, and judges that this claim applies to only a small minority of CS scores?

Instead of forthrightly acknowledging that most Rorschach scores lack independent, replicated evidence for validity, the White Paper offers the well-worn platitude that the validity of all tests can vary: “The Rorschach is like other tests... all have purposes for which they are more or less valid” (p. 220). However, this psychometric truism falls far short of the more specific warning that would have been appropriate: the White Paper should have acknowledged that although some CS scores have well-demonstrated validity for their intended purposes, the large majority do not.

4. The SPA White Paper broadly endorses “Rorschach reliability,” but fails to warn users that about 25% of CS scores do not meet traditional clinical standards for scoring reliability. Our comments regarding CS validity also apply to interrater reliability. In making broad claims that the Rorschach is as reliable as other psychological tests, the SPA Board should have cautioned attorneys and judges that this claim does not apply to about 25% of scores in the Exner system, including the SCZI, the PTI, WSum6, Level 2 scores, X-%, XA%, the D Score, and Adjusted D.

5. The SPA White Paper fails to endorse the principle that test scores should be used only for purposes for which they have well-demonstrated validity. Like most clinical psychologists, we believe that test scores, including Rorschach scores, should be used clinically and forensically only for purposes for which they have been well validated by research. The SPA Board failed to endorse this principle. Instead, it endorsed a substantially diluted standard lacking any real bite: Clinicians should “attend” to the scientific literature to “ensure Rorschach inferences are consistent with the evidence” (p. 220). To understand what the Board meant by “attending,” we need look no further than the Independent Practitioner piece by the Board president, Weiner (2005). After “attending” to the scientific literature, Weiner promoted the use of several CS scores that research has shown to be invalid.

6. The SPA White Paper wrongly suggests that the Rorschach is as valid as mammography, MRIs, and PET scans. Perhaps nothing in the White Paper is so disturbing as its suggestion that the validity of the Rorschach is equal to that of mammography, MRIs, and PET scans. As noted, the Paper argues that (1) psychological tests are as valid as these medical imaging techniques, and (2) the Rorschach is as valid as other psychological tests. Although the Paper stops short of stating that the Rorschach is as valid as mammography and MRIs, the logical implications of the syllogism are clear: Validity of medical imaging techniques = validity of psychological tests = validity of the Rorschach.

In the 1940s, the founder of SPA, Bruno Klopfer (1940, p. 26), likened the Rorschach to an X-ray. It is unfortunate that in the 21st century the SPA Board continues to encourage similar hyperbole. As justification for these overblown claims, the Board cites an article published in American Psychologist by Rorschach proponent Gregory Meyer (current editor of the Journal of Personality Assessment) and his colleagues (Meyer, Finn, et al., 2001). Because the conclusions of that article have been criticized elsewhere (Garb, Klein, & Grove, 2002; Hunsley, 2002; but see Meyer, Finn, et al., 2002), we will not present a detailed critique here. Instead, we will use a single example to show why flattering comparisons between the CS Rorschach and mammography are highly misleading.

The Meyer, Finn, et al. (2001) article reported that the correlation of mammogram screening results with a diagnosis of breast cancer within one year was r = .32. In comparison, the article reported the global validity of the Rorschach (including the Exner Rorschach) as r = .35. As can be seen, the validity coefficient for the Rorschach was slightly higher than that for mammography. By comparing such coefficients, Meyer et al. arrived at their conclusion that the Rorschach and other psychological tests are at least as valid as mammography.

However, a more detailed analysis of these findings, comparing the validity of the SCZI with that of mammograms, leads to a very different conclusion. No CS score is better validated by research than the SCZI, which has consistently shown a positive correlation with diagnoses of schizophrenia. In studies reported by researchers other than Exner, the sensitivity of the SCZI has been found to be about .71 (that is, it correctly identifies 71% of patients with schizophrenia), and its specificity in clinical populations is about .76 (that is, it correctly classifies 76% of patients without schizophrenia) (Jorgensen et al., 2000). The base rate of schizophrenia in these studies has generally been about .40. Under these conditions (sensitivity = .71, specificity = .76, base rate = .40), the correlation of the SCZI with diagnoses of schizophrenia is .47, which, as might be expected, is higher than the global validity reported for the Rorschach (r = .35).
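Assuming the correlation in question is the phi coefficient (Pearson r computed on the 2 × 2 table of SCZI result by diagnosis), the .47 figure can be reconstructed from the three quantities just given. In the worked equation below, Se, Sp, and p denote sensitivity, specificity, and base rate, and Q (our notation, not the article's) is the proportion of patients the test classifies as positive:

```latex
\begin{aligned}
Q &= p\,\mathrm{Se} + (1-p)(1-\mathrm{Sp}),\\
r_\phi &= (\mathrm{Se} + \mathrm{Sp} - 1)\sqrt{\frac{p(1-p)}{Q(1-Q)}}.\\[4pt]
\text{SCZI: } Q &= .40(.71) + .60(.24) = .428,\\
r_\phi &= (.71 + .76 - 1)\sqrt{\frac{.40 \times .60}{.428 \times .572}}
        = .47\sqrt{\frac{.240}{.2448}} \approx .47 \times .990 \approx .47.
\end{aligned}
```

Because Q depends on p, the same sensitivity and specificity produce different correlations at different base rates.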

In the mammography meta-analysis (Mushlin, Kouides, & Shapiro, 1998) reported by Meyer, Finn, et al. (2001), the sensitivity of mammography was found to be .91 (that is, it correctly identifies 91% of patients with breast cancer) and the specificity .95 (that is, it correctly classifies 95% of patients without breast cancer). The base rate of breast cancer was 0.6%, that is, slightly more than one-half of one percent. The correlation of mammography results with subsequent diagnoses of breast cancer was .30 (slightly lower than the figure reported by Meyer, Finn, et al., 2001).

If we were to focus uncritically on validity coefficients, as Meyer, Finn, et al. (2001) did, we might erroneously conclude that the SCZI is superior to mammograms. However, alert readers will have noticed that the sensitivity and specificity of mammography screening (.91 and .95) are substantially higher than the sensitivity and specificity of the SCZI (.71 and .76). Why, then, is the validity coefficient of mammography lower than that of the SCZI? The reason is straightforward: The base rate of breast cancer in the mammography studies was considerably lower (about ½ of 1%) than the base rate of schizophrenia in the SCZI studies (40%). When base rates are low, validity coefficients decrease substantially, even for tests with extremely high sensitivity and specificity (Meehl & Rosen, 1955).

In the meta-analysis reported by Meyer, Finn, et al. (2001), mammography was used as a screening device for breast cancer. The validity of mammography was low (.30) because the base rate of cancer was low (0.6%) among women in the study. However, if mammography had been studied in samples with a base rate of 40%, as in the SCZI example, its validity would have been .87, substantially higher than the validity of the SCZI or any other Rorschach score.
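Assuming the validity coefficients in this comparison are phi coefficients (Pearson r on a 2 × 2 table of test result by diagnosis), a short Python sketch can recompute all three figures from sensitivity, specificity, and base rate alone. The function name `phi_from_accuracy` is hypothetical, chosen for illustration. With the rounded inputs quoted in the text, the script gives about .47 for the SCZI, .29 for mammography at a 0.6% base rate, and .86 for mammography at a 40% base rate; the small gaps from the .30 and .87 reported above presumably reflect rounding in the published sensitivity, specificity, and base-rate values.

```python
import math


def phi_from_accuracy(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Return the phi coefficient (Pearson r on a 2x2 table) implied by a
    test's sensitivity, its specificity, and the base rate of the condition.
    Hypothetical helper for illustration."""
    # Cell proportions of the 2x2 table of test result by diagnosis; they sum to 1.
    true_pos = base_rate * sensitivity
    false_neg = base_rate * (1 - sensitivity)
    false_pos = (1 - base_rate) * (1 - specificity)
    true_neg = (1 - base_rate) * specificity

    # Phi = (ad - bc) / sqrt(product of the four marginal totals).
    numerator = true_pos * true_neg - false_pos * false_neg
    denominator = math.sqrt(
        (true_pos + false_neg)      # proportion with the condition (the base rate)
        * (false_pos + true_neg)    # proportion without the condition
        * (true_pos + false_pos)    # proportion the test calls positive
        * (false_neg + true_neg)    # proportion the test calls negative
    )
    return numerator / denominator


# (label, sensitivity, specificity, base rate) as quoted in the text.
CASES = [
    ("SCZI, base rate 40%", 0.71, 0.76, 0.40),
    ("Mammography, base rate 0.6%", 0.91, 0.95, 0.006),
    ("Mammography, hypothetical base rate 40%", 0.91, 0.95, 0.40),
]

for label, se, sp, p in CASES:
    print(f"{label}: r = {phi_from_accuracy(se, sp, p):.2f}")
```

Holding sensitivity and specificity fixed and changing only the base rate roughly triples the mammography coefficient, which is the base-rate dependence the authors attribute to Meehl and Rosen (1955).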

Considering the foregoing analysis, can we conclude, as the Meyer study and the SPA White Paper suggest, that the Rorschach is as valid as mammograms? Obviously not. Mammograms are substantially more sensitive and specific than the best-validated score that the Exner Rorschach has to offer. The validity coefficients cited by Meyer, Finn, et al. (2001) and the SPA Board present a highly misleading picture of the relative effectiveness of the Rorschach and mammograms. In our opinion, the SPA White Paper is wrong to suggest otherwise. We strongly urge the SPA Board to retract its claims and inform psychologists, attorneys, and judges that the CS Rorschach is not even remotely as valid as medical imaging techniques.

Conclusions

In this article, we have summarized the central issues in the decade-long controversy concerning Exner’s CS Rorschach. We have also responded to Weiner’s (2005) Independent Practitioner piece and evaluated the recent SPA White Paper on the Rorschach. Having studied Weiner’s work for many years, we were not surprised that he categorically brushed aside all criticisms of the Exner system. However, given that the White Paper was issued by a group of scholars who were well aware of criticisms of the test, we had hoped for a less partisan and more scientifically balanced response to the current Rorschach controversy.

In the White Paper, the SPA Board of Trustees had a valuable opportunity to place the scientific status of the Rorschach on a firm new footing and set high standards for use of the test. First, the Board could have shown responsiveness to the scientific community by inviting input and participation from Rorschach scholars who represent the full breadth of scientific opinions regarding the test. Instead, the composition of the SPA Board was exceedingly one-sided, with one ardent proponent of the CS (Weiner) as President, another (Meyer) as an ex-officio member, and two close associates of CS proponents as members (Fowler, Mihura). Critics of the Exner system were not invited to participate. The Board’s failure to ensure a balanced representation of scientific viewpoints may largely account for the White Paper’s one-sided conclusions regarding the Rorschach.

Second, the Board could have displayed its commitment to high scientific standards for clinical practice by recommending that psychological test scores, including Rorschach scores, be used only for purposes for which they have been scientifically validated. Instead, the Board chose to support a weak and ambiguous standard lacking in substance: That psychologists should “attend” to research.

Third and finally, the Board had an opportunity to demonstrate that the Society for Personality Assessment has left behind its scientifically disreputable past, when SPA founder Bruno Klopfer and his followers promoted the Rorschach as the psychological equivalent of an X-ray. Instead, the Board chose to update and embellish Klopfer’s thoroughly discredited claims by suggesting that the Rorschach is as effective as modern radiographic techniques: mammograms, MRIs, and PET scans.

We sincerely regret that the SPA Board missed these opportunities to rehabilitate the Rorschach’s reputation in the broader scientific community. Rorschach proponents sometimes express surprise, dismay, or irritation that their inkblots excite strong controversy among research-oriented clinical psychologists. However, the reason is not hard to find: From the time of Lee J. Cronbach, Hans Eysenck, and Joseph Zubin up until the present, legitimate scientific criticisms of the test have been adamantly resisted or ignored by Rorschach proponents (for a historical summary, see Wood, Nezworski, Lilienfeld, & Garb, 2003). So long as proponents continue to dismiss legitimate criticism, ignore negative research, and present the test as the psychological equivalent of medical imaging techniques, the Rorschach is destined to remain on the fringes of psychological science.

References

Acklin, M. W., McDowell, C. J., Verschell, M. S., & Chan, D. (2000). Interobserver agreement, intraobserver reliability, and the Rorschach Comprehensive System. Journal of Personality Assessment, 74, 15-47.

Aronow, E., & Reznikoff, M. (1976). Rorschach content interpretation. New York: Grune & Stratton.

Board of Trustees of the Society for Personality Assessment (2005). The status of the Rorschach in clinical and forensic practice: An official statement by the Board of Trustees of the Society for Personality Assessment. Journal of Personality Assessment, 85, 219-237.

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284-290.

Conners, K. (1989). Manual for Conners’ rating scales. North Tonawanda, NY: Multi Health Systems.

Crews, F. C. (2004). Out, damned blot! New York Review of Books, 51(12), 22-25. Reprinted in: J. Weiner (Ed.) (2005), The best American science and nature writing 2005 (pp. 29-40). Boston: Houghton Mifflin.

Exner, J. E. (1978). The Rorschach: A Comprehensive System: Vol. 2. Current research and advanced interpretation. New York: Wiley.

Exner, J. E. (1986). The Rorschach: A Comprehensive System: Vol. 1. Basic foundations (2nd ed.). New York: Wiley.

Exner, J. E. (1991). The Rorschach: A Comprehensive System: Vol. 2. Interpretation (2nd ed.). New York: Wiley.

Exner, J. E. (2001). A comment on “The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach.” Clinical Psychology: Science and Practice, 8, 386-388.

Exner, J. E. (2002). A new nonpatient sample for the Rorschach Comprehensive System: A progress report. Journal of Personality Assessment, 78, 391-404.

Exner, J. E. (2003). The Rorschach: A Comprehensive System: Vol. 1. Basic foundations and principles of interpretation (4th ed.). Hoboken, NJ: Wiley.

Frank, G. (1990). Research on the clinical usefulness of the Rorschach: I. The diagnosis of schizophrenia. Perceptual and Motor Skills, 71, 573-578.

Garb, H. N., Klein, D. F., & Grove, W. M. (2002). Comparison of medical and psychological tests. American Psychologist, 57, 137-138.

Garb, H. N., Wood, J. M., Lilienfeld, S. O., & Nezworski, M. T. (2005). Roots of the Rorschach controversy. Clinical Psychology Review, 25, 97-118.

Garb, H. N., Wood, J. M., & Nezworski, M. T. (2000). Projective techniques and the detection of child sexual abuse. Child Maltreatment, 5, 161-168.

Goode, E. (2001). What’s in an inkblot? Some say, not much. The New York Times, February 20, D1.

Goode, E. (2004). Defying psychiatric wisdom, these skeptics say ‘Prove it.’ The New York Times, March 9, D1, D6.

Hamel, M., Shaffer, T. W., & Erdberg, P. (2000). A study of nonpatient preadolescent Rorschach protocols. Journal of Personality Assessment, 75, 280-294.

Hiller, J. B., Rosenthal, R., Bornstein, R. F., Berry, D. T. R., & Brunell-Neuleib, S. (1999). A comparative meta-analysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278-296.

Hunsley, J. (2002). Psychological testing and psychological assessment: A closer examination. American Psychologist, 57, 139-140.

Jorgensen, K., Andersen, T. J., & Dam, H. (2000). The diagnostic efficiency of the Rorschach depression index and the schizophrenia index: A review. Assessment, 7, 259-280.

Kleiger, J. H. (1999). Disordered thinking and the Rorschach. Hillsdale, NJ: Analytic Press.

Klopfer, B. (1940). Personality aspects revealed by the Rorschach method. Rorschach Research Exchange, 4, 26-29.

Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1, 27-66.

Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2001). What’s wrong with this picture? Scientific American, 284(5), 80-87.

Luxenberg, T., & Levin, P. (2004). The role of the Rorschach in the assessment of trauma. In J. P. Wilson & T. M. Keane (Eds.), Assessing psychological trauma and PTSD (2nd ed., pp. 190-225).

McGrath, R. E., Pogge, D. L., Stokes, J. M., Cragnolino, A., Zaccario, M., Hayman, J., Piacentini, T., & Wayland-Smith, D. (2005). Field reliability of Comprehensive System scoring in an adolescent inpatient sample. Assessment, 12, 199-209.

Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.

Mestel, R. (2003). Rorschach tested. Blot out the famous method? Los Angeles Times, May 19, F-1.

Meyer, G. J. (2001). Evidence to correct misperceptions about Rorschach norms. Clinical Psychology: Science and Practice, 8, 389-396.

Meyer, G. J., & Archer, R. P. (2001). The hard science of Rorschach research: What do we know and where do we go? Psychological Assessment, 13, 486-502.

Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., Eisman, E. J., Kubiszyn, T. W., & Reed, G. M. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56, 128-165.

Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Dies, R. R., Eisman, E. J., Kubiszyn, T. W., & Reed, G. M. (2002). Amplifying issues related to psychological testing and assessment. American Psychologist, 57, 140-141.

Meyer, G. J., Hilsenroth, M. J., Baxter, D., Exner, J. E., Fowler, J. C., Piers, C. C., & Resnick, J. (2002). An examination of interrater reliability for scoring the Rorschach Comprehensive System in eight data sets. Journal of Personality Assessment, 78, 219-274.

Mittman, B. L. (1983). Judges’ ability to diagnose schizophrenia on the Rorschach: The effect of malingering. Unpublished doctoral dissertation. Long Island University.

Mushlin, A. I., Kouides, R. W., & Shapiro, D. E. (1998). Estimating the accuracy of screening mammography: A meta-analysis. American Journal of Preventive Medicine, 14, 143-153.

Nezworski, M. T., & Wood, J. M. (1995). Narcissism in the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 2, 179-199.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Shaffer, T. W., Erdberg, P., & Haroian, J. (1999). Current nonpatient data for the Rorschach, WAIS-R, and MMPI-2. Journal of Personality Assessment, 73, 305-316.

Shrout, P. E. (1998). Measurement reliability and agreement in psychiatry. Statistical Methods in Medical Research, 7, 301-317.

Smith, S. R. (2001). Multimethod assessment of child and adolescent psychopathology: An examination of behavior ratings, self-report, and the Rorschach inkblot method. Unpublished doctoral dissertation. University of Arkansas.

Viglione, D. J., & Taylor, N. (2003). Empirical support for interrater reliability of Rorschach Comprehensive System coding. Journal of Clinical Psychology, 59, 111-121.

Wechsler, D. (1997). WAIS-III administration and scoring manual. San Antonio, TX: The Psychological Corporation.

Weiner, I. B. (2000). Using the Rorschach properly in practice and research. Journal of Clinical Psychology, 56, 435-438.

Weiner, I. B. (2005). The utility of Rorschach assessment in clinical and forensic practice. Independent Practitioner, 25, 76-83.

Wood, J. M., Krishnamurthy, R., & Archer, R. P. (2003). Three factors of the Comprehensive System for the Rorschach and their relationship to Wechsler IQ scores in an adolescent sample. Assessment, 10, 259-265.

Wood, J. M., Lilienfeld, S. O., Garb, H. N., & Nezworski, M. T. (2000). The Rorschach Test in clinical diagnosis: A critical review, with a backward look at Garfield (1947). Journal of Clinical Psychology, 56, 395-430.

Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001a). The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 8, 350-373.

Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001b). Problems with the norms of the Comprehensive System for the Rorschach: Methodological and conceptual considerations. Clinical Psychology: Science and Practice, 8, 397-402.

Wood, J. M., Nezworski, M. T., & Garb, H. N. (2003). What’s right with the Rorschach? Scientific Review of Mental Health Practice, 2, 142-146.

Wood, J. M., Nezworski, M. T., Lilienfeld, S.O., & Garb, H. N. (2003). What’s wrong with the Rorschach? Science confronts the controversial inkblot test. San Francisco: Jossey-Bass.

Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7, 3-10.

Author Note

James M. Wood, Department of Psychology, University of Texas at El Paso
M. Teresa Nezworski, Department of Psychology, University of Texas at Dallas
Howard N. Garb, Psychology Research Service, Wilford Hall Medical Center, Lackland Air Force Base
Scott O. Lilienfeld, Department of Psychology, Emory University

Correspondence concerning this article should be addressed to James M. Wood, Department of Psychology, University of Texas at El Paso, El Paso, Texas, 79968. E-mail: jawood@utep.edu

The views expressed in this article are those of the authors and are not the official policy of the Department of Defense or the United States Air Force.
