by Michael D. Anestis, M.S.
As readers of PBB have likely come to realize over the past year, Joye and I believe it is extremely important to fight against misinformation. Unfortunately, a lot of bad research gets published - sometimes in really strong journals - and that research is often then publicized as accurate and factual. On a number of occasions, we have covered this issue, writing articles discussing the important flaws in certain research, particularly when that research has become a popular talking point (click here, here, and here for examples of this). In 2008, Leichsenring and Rabung published a meta-analysis in the highly influential Journal of the American Medical Association (JAMA) in which they claimed to demonstrate that long-term psychotherapy - defined as at least one year or 50 sessions of psychotherapy - is more effective than short-term psychotherapy. This study became a popular piece of evidence for individuals who already believed this to be the case, and was cited in a number of journal articles, including Shedler's (2010) piece, which was the subject of one of the articles linked to above.
As you might guess from the opening of this article, it turns out that the Leichsenring and Rabung (2008) article was full of substantial flaws that completely negate the conclusions they drew. In a paper just published in Psychotherapy and Psychosomatics by Sunil Bhar, Brett Thombs, Monica Pignotti, Marielle Bassel, Lisa Jewett, PBB guest contributor Jim Coyne, and Aaron Beck, these flaws were discussed in great detail. I have had to sit on these results for several months now as the paper awaited publication, so I am excited to finally have the opportunity to write about them!
In their meta-analysis, Leichsenring and Rabung (2008) analyzed 8 studies comparing long-term psychodynamic psychotherapy (LTPP) to a variety of other interventions for a number of diagnoses and concluded that LTPP was "significantly superior to shorter-term methods of psychotherapy with regard to overall outcome, target problems, and personality functioning" (p. 1563). If the data supported this claim, it would be a stunning reversal of clinical research conducted over the past several decades. The data did not, however, do anything of the sort. To explain my point, I'll briefly summarize each of the areas addressed by Bhar et al. (2010).
Calculation errors
The biggest issue with the Leichsenring and Rabung (2008) meta-analysis is that they ran the wrong analysis, which created faulty results. Bhar and colleagues (2010) explained that the authors, in calculating effect sizes (remember, effect sizes are a measure of how large a finding is), used the wrong conversion formula. The formula they used is intended for converting between-group point biserial correlations to standardized difference effect sizes, but the authors applied it to within-group effect sizes. Now, obviously this is a fairly obscure statistical reference, but let me explain the consequences: even though no single study in the analysis demonstrated an overall standardized mean difference greater than 1.45, the combined effect size was calculated as 1.8. A combined estimate is a kind of average of the individual studies, so it should never be larger than the largest of them. Additionally, because of this error, they generated a between-group effect size of 6.9, which means that 93% of the variance was explained. That is essentially impossible. These miscalculations would be equivalent to earning a C on every exam you take during a semester and then concluding that your average was a B+. What does this mean? It means that they concluded that LTPP drastically outperformed control conditions when, in reality, their estimates severely overstated the case.
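To make the arithmetic concrete, here is a minimal sketch in Python. It is not the calculation from either paper. It uses the standard textbook conversion from a between-group standardized mean difference to a point biserial correlation, which assumes two groups of equal size, and the per-study effect sizes and weights are made-up numbers chosen only so that the largest one is 1.45.

```python
import math

def d_to_r(d):
    """Convert a between-group standardized mean difference (Cohen's d)
    to a point biserial correlation, assuming two equal-sized groups."""
    return d / math.sqrt(d * d + 4.0)

def weighted_mean(values, weights):
    """Weighted average, the basic building block of a pooled effect size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# The implausible between-group effect size reported in the meta-analysis.
r = d_to_r(6.9)
variance_explained = r * r
print(f"d = 6.9 corresponds to r = {r:.2f}, "
      f"or about {variance_explained:.0%} of the variance")

# Hypothetical per-study effects and weights (illustration only, not real data).
study_effects = [0.30, 0.60, 0.90, 1.20, 1.45]
study_weights = [20, 25, 15, 30, 18]
pooled = weighted_mean(study_effects, study_weights)
print(f"largest single study: {max(study_effects):.2f}, pooled: {pooled:.2f}")
```

Under these assumptions the variance explained lands in the low 90s; small differences from the figure quoted above can come from rounding or from different assumptions about group sizes. The second half shows the "C on every exam" point: whatever positive weights you pick, a weighted average cannot exceed the largest value that went into it.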
Issues with the comparisons
There are actually several issues here, one of which we've discussed a number of times before. First of all, the authors compared studies in which participants were being treated for a wide variety of conditions and combined those results into one outcome. In other words, some of the clients were being treated for anorexia nervosa, others for borderline personality disorder, some for "neurosis," and others for a now defunct diagnosis: self-defeating personality disorder. Asking whether one treatment is better than another for everything is a broad and essentially useless question. There is an abundance of research indicating that particular treatments are better than others on average for particular conditions. When you combine the results of treatments for a number of diagnoses together, you are glossing over those results and essentially combining apples and oranges and, perhaps not shockingly, coming up with results that cannot be meaningfully interpreted.
In addition to comparing studies measuring the treatment of different diagnoses, Leichsenring and Rabung (2008) also combined completely different treatments into single groups. The general comparison in their study was LTPP versus short-term psychotherapy. Included in the short-term psychotherapy group were:
- Waitlist control condition (i.e., no treatment at all!!!)
- Nutritional counseling
- Standard psychiatric care
- Low contact routine treatment
- Treatment as usual in the community
- Referral to alcohol rehabilitation
- Provision of a therapist phone number
Looking over that list, do you think that represents a strong example of what typically occurs in empirically supported short-term psychotherapy?
There were only two examples in which LTPP was compared to an empirically supported treatment. In one, LTPP was compared to dialectical behavior therapy (DBT) for borderline personality disorder (BPD) and, in the other, LTPP was compared to family-based treatment for anorexia nervosa. LTPP did not outperform either treatment. In other words, adding a huge number of sessions and a large amount of time did not result in any benefits (although it almost certainly cost substantially more money to the client). The only time LTPP outperformed another form of therapy in any of the trials, it was being compared to no therapy at all or an unvalidated treatment. Those are hardly compelling results.
The final issue with the comparisons in the Leichsenring and Rabung (2008) study was that they were severely underpowered. Leichsenring and Rabung (2008) argued that publication bias was not an issue because the correlations between effect size and sample size were non-significant. Because only 8 studies were used, however, a significant correlation was nearly impossible to find and, as such, the absence of one is essentially meaningless. Analyses have indicated that, in order for a treatment study to be able to actually answer the questions it asks, a minimum of 50 participants needs to be in each treatment group. In the studies analyzed by Leichsenring and Rabung (2008), there were anywhere from 15 to 30 participants in each group.
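For readers who like to see numbers, the following is a rough sketch of both power problems, using only the Python standard library. Everything in it is an approximation of my own choosing rather than anything taken from Bhar et al. (2010): the power calculation uses a normal approximation to a two-group comparison, the "medium" effect of d = 0.5 is an assumed value, and the correlation threshold uses the Fisher z approximation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate two-sided power to detect a true standardized mean
    difference d with n_per_group participants in each of two groups
    (normal approximation, not an exact t-test calculation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = d * math.sqrt(n_per_group / 2)
    return (STD_NORMAL.cdf(noncentrality - z_crit)
            + STD_NORMAL.cdf(-noncentrality - z_crit))

def min_significant_r(n_studies, alpha=0.05):
    """Smallest absolute correlation that reaches two-sided significance
    with n_studies data points, using the Fisher z approximation."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n_studies - 3))

# Assumed "medium" effect size, for illustration only.
assumed_d = 0.5
for n in (15, 30, 50):
    print(f"n = {n} per group: power is roughly {approx_power(assumed_d, n):.2f}")

print(f"With 8 studies, |r| must exceed about {min_significant_r(8):.2f} "
      "to be statistically significant")
```

Under these assumptions, power climbs steadily as group size goes from 15 to 50, and a correlation across only 8 studies has to be very large before it registers as significant, which is why failing to find one says so little.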
The issue of power and publication bias is a tricky but important one. Think of it this way: journals don't tend to publish results that are not statistically significant. Because a small sample size requires a HUGE effect in order to be statistically significant, only these extreme examples end up being published. As such, results become artificially inflated.
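A toy simulation can illustrate how this inflation happens. This is a sketch under assumptions I have made up for the example: a true effect of d = 0.3, 20 participants per group, 2,000 simulated studies, and a crude rule that a study is "published" only if a z-approximation to the group comparison is significant in the treatment's favor. None of these numbers come from the papers discussed here.

```python
import math
import random
from statistics import mean, stdev

def simulate_study(true_d, n_per_group, rng):
    """Simulate one two-group study and return (observed d, approximate z)."""
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    pooled_sd = math.sqrt((stdev(treated) ** 2 + stdev(control) ** 2) / 2)
    observed_d = (mean(treated) - mean(control)) / pooled_sd
    # Normal approximation to the test statistic for equal group sizes.
    z = observed_d * math.sqrt(n_per_group / 2)
    return observed_d, z

def run_simulation(true_d=0.3, n_per_group=20, n_studies=2000, seed=0):
    """Return (mean d of all studies, mean d of 'published' studies,
    number of 'published' studies)."""
    rng = random.Random(seed)
    results = [simulate_study(true_d, n_per_group, rng) for _ in range(n_studies)]
    all_d = [d for d, _ in results]
    # Hypothetical publication filter: only significant positive findings.
    published_d = [d for d, z in results if z > 1.96]
    return mean(all_d), mean(published_d), len(published_d)

all_mean, published_mean, n_published = run_simulation()
print(f"mean effect across all simulated studies: {all_mean:.2f}")
print(f"mean effect across 'published' studies:   {published_mean:.2f} "
      f"({n_published} studies)")
```

With groups this small, a study can only clear the significance bar if its observed effect is far above the true one, so the "published" average ends up well above the value that was built into the simulation.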
Potential bias in studies
Bhar et al. (2010) then shifted their focus to the lack of reasonable assessments of bias in the studies included in the Leichsenring and Rabung (2008) analysis. The authors concluded that few of the studies used appropriate safeguards to ensure that participants were properly randomized, that randomization sequencing was concealed from people making assessments, that assessors were blind to condition post-treatment, that missing data were analyzed appropriately, and that the authors of the original studies actually reported all of the relevant outcomes.
Additionally, there was great variety in the number and frequency of treatment sessions and the presence of medication augmentation, making it very difficult to make valid comparisons. None of the studies included in the Leichsenring and Rabung (2008) study properly assessed treatment integrity, meaning we have no idea to what degree therapists actually administered the treatments as they are designed.
Summary
Ultimately, what Bhar and colleagues (2010) found was that Leichsenring and Rabung (2008) used too few studies, that those studies were methodologically weak, that diagnoses and treatments were combined into groups that made no sense, that some short-term psychotherapies actually did not involve any therapy, and perhaps worst of all, that the analyses they ran were incorrect, leading to impossible results completely unrepresentative of reality.
A number of thoughts jump out at me when I think about this issue. First of all, how did the Leichsenring and Rabung (2008) study get published in JAMA? Secondly, what can be done to keep people from simply accepting the results of meta-analyses as though they are representations of fact rather than studies with at least as many biases and flaws as any single study? The bottom line is that research simply does not support the claim that LTPP is more effective than short-term psychotherapy (and those distinctions aren't very useful anyway). Meta-analyses like this that make broad claims based upon weak studies and miscalculations are a real problem unless readers are willing to go to the original studies and double check the claims being made by authors. That, unfortunately, is not a realistic expectation.
************
Mike Anestis is a doctoral candidate in the clinical psychology department at Florida State University and an incoming resident at the University of Mississippi Medical Center.





