I am, admittedly, taking on a broad topic today, and I suspect that I will come up short on some of my explanations. Still, this morning I feel compelled to write about a topic that is central to the way scientists conceptualize every aspect of clinical psychology: evidence. This is a topic that I cover fairly early each semester when I teach Abnormal Psychology to undergraduates. As I explain the ways in which we approach clinical psychology as a science, students invariably express surprise, as this is not the way that many of them had thought about therapy or mental illness in the past. Similarly, as I converse with readers on Psychotherapy Brown Bag and on social media websites (e.g., Twitter - user name @PsychBrownBag), I have often encountered alternative perspectives that involve concern regarding the nature, reliability, and validity of evidence. In order to extend these conversations and help clarify what we mean when we discuss concepts such as empirically supported treatments, I would like to take this opportunity to explain what we mean by "evidence" and why so many of us believe it is absolutely essential in the field of clinical psychology - not only for researchers, but also for therapists, teachers, and clients.
Vague, huh? I told you. Let me clarify this by providing specific examples of what does and does not constitute evidence as we conceptualize it in science. If we were interested in comparing the degree to which two different types of treatments effectively reduce symptoms of depression, we would need evidence that provided us with information regarding clients' initial levels of depression as well as subsequent measures of depressive symptoms taken during the course of treatment and after termination (ideally including several long-term follow-up measurements that help us determine if the benefits of each treatment are maintained). Evidence, in this case, would be data accumulated through the administration of measures of depressive symptoms. We would assess the relative value of each treatment by examining whether there are any statistically significant differences between the two groups in terms of how much their symptoms improved during treatment, how long such benefits were maintained, and so on.
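To make the comparison above a little more concrete, here is a minimal sketch in Python of how symptom improvement in two treatment groups might be compared. The scores are invented purely for illustration, the function names are my own, and the sketch stops at a Welch's t statistic and a standardized effect size (Cohen's d); a real trial would also report a p-value, confidence intervals, and follow-up data.

```python
from math import sqrt
from statistics import mean, variance


def improvement(pre, post):
    """Per-client symptom reduction (positive = fewer symptoms after treatment)."""
    return [before - after for before, after in zip(pre, post)]


def welch_t(x, y):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))
    return (mean(x) - mean(y)) / pooled_sd


# Hypothetical depression scores (higher = more severe), invented for illustration.
pre_a, post_a = [30, 28, 25, 32, 27], [14, 15, 12, 18, 13]
pre_b, post_b = [29, 31, 26, 27, 30], [22, 25, 20, 21, 24]

gain_a = improvement(pre_a, post_a)
gain_b = improvement(pre_b, post_b)

print("Mean improvement, treatment A:", mean(gain_a))
print("Mean improvement, treatment B:", mean(gain_b))
print("Welch's t:", round(welch_t(gain_a, gain_b), 2))
print("Cohen's d:", round(cohens_d(gain_a, gain_b), 2))
```

The point of the sketch is only that the comparison rests on measured scores collected before and after treatment, not on anyone's impression of which group seemed to do better.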
Depressive symptoms can be measured in a variety of ways (see our assessment tools page for detailed descriptions of various measures). There are structured clinical interviews (e.g., SCID), in which the clinician or researcher asks an individual questions about his or her depressive symptoms and scores the answers to those questions based upon a protocol. There are also self-report measures (e.g., Beck Depression Inventory - 2) that ask the client to rate the degree to which descriptions of depressive symptoms have applied to them over a discrete period of time. When such measures are designed, they are compared to other forms of measurement to ensure that they actually accomplish what they are said to accomplish (construct validity) and predict the types of things they should theoretically predict (predictive validity). If they do not stack up to these requirements, they are revised or scrapped altogether. In other words, they are not simply measures of depression because we say so, but rather because they reliably produce results.
Ideally, research involves the administration of multiple forms of measurement by several individuals at multiple points in time. Doing so decreases the chances that results are due to bias, systematic weaknesses in the form of measurement, or the timing of measurement (e.g., measuring depressive symptoms on a particularly bad day might lead to exaggerated results). Such approaches, however, are often impractical, leaving a single individual to simply choose the method of measurement that has been most effectively used in that particular circumstance. Quite obviously, this is imperfect; however, for reasons I will discuss at great length later, such an approach is by far the most valid and reliable one available.
So what does not constitute evidence as defined above? Many things, actually, but I will just cover a couple. One example would be most projective tests. Projective tests ask individuals to complete specific tasks (e.g., interpret ambiguous stimuli, report the first word that comes into their head in response to a prompt), the results of which are "interpreted" by a clinician. Why does this not constitute evidence? For one thing, such tests are generally highly unreliable - one person's interpretation is unlikely to match another person's interpretation. If we can't agree on what the results say, the results are rendered meaningless. Additionally, whereas measures like the SCID and the BDI-II have been shown on many occasions to predict important variables (e.g., suicidal ideation, response to therapy), projective tests are rarely subjected to such investigation. In other words, the value of such measures is based purely upon the opinion of the administrator. Imagine if your doctor were attempting to determine whether or not you had a physical ailment. Would you be okay with that doctor simply basing his or her answer upon a feeling, or would you prefer that the answer be based upon some form of systematic examination capable of producing unambiguous results that are easily evaluated by multiple people without disagreement?
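The reliability problem described above can be quantified. One common index of agreement between two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance alone. The sketch below is a minimal Python version; the ratings are hypothetical and the category labels are my own, not drawn from any actual test.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of cases.")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label for every case
    return (observed - expected) / (1 - expected)


# Hypothetical interpretations of the same eight responses by two clinicians.
clinician_1 = ["dep", "dep", "none", "none", "dep", "none", "dep", "none"]
clinician_2 = ["dep", "none", "none", "dep", "dep", "none", "none", "none"]

kappa = cohens_kappa(clinician_1, clinician_2)
print("Cohen's kappa:", kappa)
```

A kappa near 1 means the raters agree far more often than chance would predict; a kappa near 0 means their agreement is about what you would get if each were labeling cases at random according to their own habits.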
Perhaps the most important question underlying the debate about the importance of evidence in clinical psychology is this: which is better, clinical intuition or actuarial data? Fortunately, this question has been addressed many times by people far more accomplished than I am (see Dawes, Faust, & Meehl, 1989 and Grove & Meehl, 1996 for great examples of this), so I will address this and related questions through the lens of their work rather than basing my answers purely upon my own.
Clinical intuition involves subjective interpretations on the part of the clinician regarding any number of variables (e.g., the effectiveness of treatment, suicide risk status, diagnostic status). Judgments are often based upon past experiences and mental shortcuts. Such an approach is not only flawed, but actually unethical and dangerous at times. Attempting to evaluate important mental health variables based upon past experiences invariably leads to inaccurate conclusions due to deficits in memory, the primacy and recency effects, and our own tendency to dismiss evidence that does not support our beliefs and cling to evidence that does.
Actuarial data, on the other hand, involves the use of quantitative information that, based upon results from prior individuals, can be used to make predictions. In other words, we can use results from a structured interview to predict the best treatment approach based upon studies demonstrating that individuals with a particular diagnosis respond best to a specific treatment. Alternatively, we can use scores on a measure of impulsivity to predict an adolescent's vulnerability to engaging in particular types of destructive behaviors. In order to qualify as actuarial data, the meaning of particular results must be predetermined (i.e., not subject to bias) and based upon empirical relations.
There are a variety of reasons why, in most circumstances, actuarial data is preferable to clinical intuition. I will attempt to cover several of these.
Absolutely. However, if we don't have any way of reliably differentiating between those who fit the general pattern and those who do not, we are simply guessing. Numerous studies have shown that, when we simply guess (even if our guess is based upon years of experience), we perform significantly worse than we do when we use data to drive our decisions (e.g., Dawes, Faust, & Meehl, 1989; Grove & Meehl, 1996). Additionally, there are very simple ways to use data to determine who might differ from the norm. Such analyses are called moderation analyses, and they provide us with reliable information that indicates whether a particular variable strengthens or weakens the relationship between two other variables (e.g., does an individual's ethnic background impact whether or not a specific treatment is useful in treating a specific disorder), so there's no need to rely on guesswork in our efforts to separate individuals from group results!
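For readers who have not seen a moderation analysis, here is a minimal sketch of the idea in Python. In the simplest case (two subgroups, treated versus untreated), the moderation effect is a "difference of differences": the treatment effect in one subgroup minus the treatment effect in the other, which is what the interaction term in a regression with two dummy-coded predictors estimates. The data, field names, and subgroup labels below are all hypothetical, and a real analysis would also test whether the contrast is statistically significant.

```python
from statistics import mean


def treatment_effect(rows, subgroup):
    """Mean improvement of treated minus untreated clients within one subgroup."""
    treated = [r["improvement"] for r in rows
               if r["subgroup"] == subgroup and r["treated"]]
    control = [r["improvement"] for r in rows
               if r["subgroup"] == subgroup and not r["treated"]]
    return mean(treated) - mean(control)


def interaction_contrast(rows, subgroup_1, subgroup_0):
    """Difference between the two subgroups' treatment effects (moderation)."""
    return treatment_effect(rows, subgroup_1) - treatment_effect(rows, subgroup_0)


# Hypothetical symptom-improvement scores, invented for illustration.
clients = [
    {"subgroup": "A", "treated": True,  "improvement": 12},
    {"subgroup": "A", "treated": True,  "improvement": 14},
    {"subgroup": "A", "treated": False, "improvement": 4},
    {"subgroup": "A", "treated": False, "improvement": 6},
    {"subgroup": "B", "treated": True,  "improvement": 7},
    {"subgroup": "B", "treated": True,  "improvement": 9},
    {"subgroup": "B", "treated": False, "improvement": 5},
    {"subgroup": "B", "treated": False, "improvement": 7},
]

print("Treatment effect in subgroup A:", treatment_effect(clients, "A"))
print("Treatment effect in subgroup B:", treatment_effect(clients, "B"))
print("Moderation (A minus B):", interaction_contrast(clients, "A", "B"))
```

If the contrast is reliably different from zero, subgroup membership moderates the treatment's effect, and that is a data-based way of identifying who departs from the group result.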
How is this relevant to the topic at hand? Consider suicide risk assessment. If we hear a variety of information regarding an individual's risk for suicide - some protective factors and some risk factors - we are left with a decision to make: to what degree is this particular client at imminent risk for a suicide attempt? If we are simply using our intuition, we'll attend to some of this information as particularly important, dismiss other pieces of information, and arrive at a conclusion that "feels right" to us. Here's the thing though...this is truly a matter of life and death. As it turns out, there is an immense amount of research regarding the degree to which particular variables contribute to suicide risk. By no means is such research perfect or representative of the reality of all individuals; however, there are clear empirically based guidelines upon which to base an evaluation of an individual's risk for suicide, and such methods help clinicians to reliably understand to what degree a particular variable (e.g., past number of suicide attempts) impacts current risk (Joiner, Walker, Rudd, & Jobes, 1999).
So here again, in a moment of crisis, would you rather your risk levels and subsequent therapy be based upon systematic data or an individual's opinion?
Mike - "Raise your hand if you have ever thought about a friend with whom you haven't spoken in a long time and then, immediately after that thought, the friend called you."
Everybody raises their hand, many of them offering comments about how amazing that experience had been, as though their thoughts made it happen. I follow that question up with:
Mike - "Raise your hand if you have ever thought about a friend with whom you haven't spoken in a long time and then....well....they didn't call."
Everybody again raises their hand, this time chuckling.
Oddmatch demonstrates our tendency to make errors when we attempt to understand the infinite number of stimuli around us in any given moment. We simply cannot do it without help. Clinically, this becomes an issue because our tendency to see false connections results in the perpetuation of misinformation. For example, many clinicians maintain the belief that individuals who were sexually abused as children are likely to become sexual abusers as adults when, in fact, data has repeatedly shown that childhood sexual abuse victims are at most equally likely and perhaps slightly less likely than the rest of the population to become sexual abusers as adults. Here again, we need evidence - data - to help us accurately understand clinical psychology. Otherwise our biases, no matter how well intentioned, will color our interpretations and decrease the quality of care available to our clients.
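The classroom exercise with the friend's phone call can be written out as a simple two-by-two table. We vividly remember one cell (I thought of the friend and the friend called) and ignore the other three. The Python sketch below uses counts I made up for illustration; the function and parameter names are likewise my own. It compares how often a call follows a thought with how often a call happens without one, and computes the phi coefficient as a measure of association.

```python
from math import sqrt


def call_rates(thought_call, thought_no_call, no_thought_call, no_thought_no_call):
    """Probability of a call on days with a thought vs. days without one."""
    p_given_thought = thought_call / (thought_call + thought_no_call)
    p_given_no_thought = no_thought_call / (no_thought_call + no_thought_no_call)
    return p_given_thought, p_given_no_thought


def phi_coefficient(a, b, c, d):
    """Association in a 2x2 table laid out as [[a, b], [c, d]]; 0 = none."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical counts of days, invented for illustration.
# Rows: thought of friend / did not. Columns: friend called / did not.
counts = (2, 198, 8, 792)

p_thought, p_no_thought = call_rates(*counts)
print("P(call | thought of friend):", p_thought)
print("P(call | no thought):", p_no_thought)
print("Phi coefficient:", phi_coefficient(*counts))
```

With these made-up numbers, the two memorable days on which a thought was followed by a call tell us nothing: the friend is exactly as likely to call on days when we never thought of them. Only the full table, not the memorable cell, can show whether a connection exists.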
So intuition is useless? Doesn't this take all of the creativity out of thinking?! Can it all be about numbers?
In this sense, science becomes a series of competing theories, each of which builds upon the past and corrects a variety of prior errors. No theory is perfect, most if not all are eventually overturned by others, and progress continues. Our progress, however, is marked by the evidence supporting our claims, not by the strength of our beliefs in our cause without reflection upon the evidence for its validity.
The evidence for the utility of CBT in treating a variety of mental illnesses is not perfect, but it is far superior to the dearth of evidence supporting many other approaches. Despite persistent attempts by some to spin the positive results for CBT as a reflection of the inability of science to measure important variables in clinical psychology, this line of reasoning remains completely unfounded. It remains a mystery to me how positive findings for CBT relative to other approaches can be seen as a weakness of science and CBT and a strength for other therapeutic approaches. Sometimes, when something fails, it is because it was not up to the task rather than representing a flawed approach to measurement.
Additionally, it remains unclear how empirical data - which is described clearly in all studies and easily replicated by skeptics should they choose to test their doubts rather than simply proclaim them to be gospel - could be more subjective than the vague interpretations of those who choose to believe that alternative approaches are better but that the superiority of such approaches cannot be confirmed by any form of measurement (unless they find one that supports their claim, in which case that measurement is valid but all others remain unimportant). Statistical analyses are spelled out in every journal article and, as such, any "subjectivity" can be clearly pointed out and corrected by the skeptic in a future study. The idea that statistics can simply be made to say whatever the researcher wants them to say is a talking point that has been repeated often enough that many people believe it despite a complete lack of support for the claim.
Clinical psychology is about accurately understanding mental illness and providing the best possible care to clients. It is not about satisfying the intellectual needs of therapists or researchers. Whether or not I understand a particular therapeutic approach, if data repeatedly demonstrates that it is a superior approach, I will advocate for its use in the proper circumstance. Doing anything different takes the focus off of the clients and places it on those who are supposed to provide help to people in need. That, to me, seems like the height of irresponsibility and an approach to business unacceptable in any other sector of professional or personal life.
Mike Anestis is a doctoral candidate in the clinical psychology department at Florida State University.



