Imagine you are the editor of, or reviewer for, a very prestigious scientific journal, and you receive a paper about the efficacy of a novel drug for, say, cancer or HIV. You know that current drugs only work for about 1 out of 3 patients, so there are certainly large incentives to develop new drugs. The authors of the submitted study base their drug on a biological pathway which they believe has something to do with the disorder, and tested the drug in patients.
Specifically, they …
- … tested 24 patients, treating 18 of them with the drug and 6 with placebo;
- … used 4 outcome measures to assess treatment efficacy, and found that patients did better than placebo on only 2 of these 4 measures;
- … did not compare the efficacy of the novel drug against well-established drugs;
- … tested the biological pathway through which the drug was developed to work, and found that it does not;
- … claim that the drug “shows potential as a treatment”.
What would your response as editor or reviewer be? 24 patients seems unreasonably small for conclusions about treatment efficacy. And if your drug does not work on the biological pathway you developed it to work on, that’s an interesting negative result, but does not imply the drug has potential. And testing a new drug without comparing it to an already established drug seems, at best, extremely odd (yet surprisingly common in the particular field the paper was published in). Justifying a novel medication over all the existing ones means you have to show it outperforms them, right?
Fava et al. 2015
Now, exactly this study was published yesterday by Fava et al. (2015) in Molecular Psychiatry, entitled: “A Phase 1B, randomized, double blind, placebo controlled, multiple-dose escalation study of NSI-189 phosphate, a neurogenic compound, in depressed patients”. Molecular Psychiatry is one of the most important psychiatric journals, ranked 1/140 in terms of impact factor.
To summarize the results: they tested a novel antidepressant drug against placebo in depressed patients. It outperformed placebo on 2 of the 4 dependent variables measured, and it was not compared against standard antidepressant treatment. The drug also does not actually increase hippocampal volume, which it was developed to do. Keep in mind that if you set your alpha level to 5%, testing 4 instead of 1 dependent variables for treatment efficacy increases the probability of wrongly rejecting at least one true null hypothesis from 5% to 18.5% (1 − 0.95^4 ≈ 0.185).
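The inflation from testing several outcome measures is easy to verify; here is a minimal sketch of the arithmetic, using the study's numbers:

```python
# Familywise error rate: the chance of at least one false positive
# when testing k independent outcome measures, each at alpha = 0.05.
alpha = 0.05
k = 4  # number of outcome measures in the study
fwer = 1 - (1 - alpha) ** k
print(f"Probability of at least one false positive: {fwer:.1%}")  # 18.5%
```

This assumes the four measures are independent; in practice depression outcome scales are correlated, so the true inflation lies somewhere between 5% and 18.5%, but it is inflated either way.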
I disagree that these results are “promising”. Nonetheless, the American Psychiatric Association wrote a newsletter about the paper entitled:
“Novel Compound Found to Be Safe, Effective at Reducing Symptoms of MDD”.
Oh come on … there is very little evidence that the drug is effective, seeing that it outperformed placebo on only 2 of 4 outcome measures. Moreover, the study encompassed only 24 patients, which makes such a general conclusion extremely questionable at best. And safe? Imagine, for the sake of the argument, that the drug actually kills 1 in 100 patients (this is not the case, but entertain the thought for a second). There is only about a 16.5% chance (1 − 0.99^18) of picking up on such a hypothetical side effect when you treat only 18 patients, which makes the conclusion that the drug is safe absolutely not ok. The study also lasted only four weeks, which is too short to pick up on many adverse effects.
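The arithmetic behind this hypothetical safety calculation (the 1-in-100 rate is purely illustrative, as noted above):

```python
# Probability of observing at least one case of a side effect
# that occurs in 1 of 100 patients, given n treated patients.
p_side_effect = 0.01  # hypothetical rate, for the sake of argument
n_treated = 18        # patients who actually received the drug
p_detect = 1 - (1 - p_side_effect) ** n_treated
print(f"Chance of seeing the side effect at least once: {p_detect:.1%}")  # 16.5%
```

In other words, with 18 patients, a side effect this serious would go completely unobserved in more than 4 out of 5 such trials.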
Reduced hippocampal volume
It is also crucial to point out that the evidence for reduced hippocampal volume in depressed patients compared to controls is minuscule in the first place – an assumption that is the very basis of the paper by Fava et al. In a recent paper (incidentally, also published in Molecular Psychiatry), we showed that hippocampal differences between groups are so minuscule that knowing a person’s hippocampus size does not tell you whether the person is depressed or not: the discrimination accuracy is ~52.6%, and a discrimination accuracy of 50% means you cannot distinguish the groups at all. You can think of this as analogous to two groups that differ in height: if you compare tall men and small girls regarding their weight, not only will you find a large mean difference, you will also find a high discrimination accuracy (observing that a specific person has a low weight tells you nearly certainly that this person is a small girl, and not a tall man). Here, in contrast, depressed patients and healthy controls are nearly identical in terms of their hippocampal volumes, leading to a prediction accuracy not different from chance.
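To get a feel for how little a ~52.6% discrimination accuracy means, here is a sketch: for two equal-variance normal groups separated by a standardized mean difference d, the best achievable discrimination (the AUC) is Φ(d/√2). The d ≈ 0.09 below is an illustrative assumption chosen to land near 52.6%, not a value reported in the paper:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def auc_from_cohens_d(d: float) -> float:
    """Best-case discrimination (AUC) for two equal-variance
    normal groups whose means differ by d standard deviations."""
    return normal_cdf(d / sqrt(2))

# d = 0.09 is an illustrative assumption, not the paper's figure.
print(round(auc_from_cohens_d(0.09), 3))  # ~0.525, barely above chance
# Contrast with the tall-men-vs-small-girls example (a huge d):
print(round(auc_from_cohens_d(3.0), 3))   # ~0.98, nearly perfect
```

The point of the contrast: a tiny d yields an accuracy indistinguishable from coin-flipping, while the height/weight example corresponds to a very large d and near-perfect classification.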
In the paper, we also discuss why hippocampal volume is likely not related to depression at all (after controlling for confounds and disease specificity), and why, even if there were consistent differences between depressed patients and controls, there is no causal evidence that reduced hippocampal volume leads to depression – causal evidence that would be required to justify developing a drug meant to treat depression by increasing hippocampal volume.
Psychiatry is desperate, and understandably so, especially when it comes to depression research. After half a century of drug development, the best antidepressants only marginally outperform placebos. After more than three decades of biomarker research, not a single biomarker has been robustly associated with depression diagnosis.
But this despair does not justify lowering scientific standards regarding what we consider safe and effective drugs.
Fava, M., Johe, K., Ereshefsky, L., Gertsik, L. G., English, B. A., Bilello, J. A., … Freeman, M. P. (2015). A Phase 1B, randomized, double blind, placebo controlled, multiple-dose escalation study of NSI-189 phosphate, a neurogenic compound, in depressed patients. Molecular Psychiatry. doi:10.1038/mp.2015.178. (URL)