A new study on candidate genes for depression was just published in the American Journal of Psychiatry by Border et al, entitled “No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples”.
Below, I will discuss the background and context of the study, followed by study design and results.
One of the most costly enterprises in psychiatry in the last 2 decades has been the genotyping of millions of participants in the hope of identifying biological markers for common mental illnesses such as major depression, with the expectation that such markers would lead to actionable outcomes that ultimately benefit patients.
Two common types of studies that have been conducted are studies on candidate genes (or candidate gene-by-environment interactions), and genome-wide association studies (GWAS). Candidate gene studies investigate associations between pre-specified genes (often chosen based on theory) and an outcome such as depression, while GWAS scan common genetic variants across the entire genome for associations with that outcome. You can compare this a bit to neuroimaging, where some have studied regions of interest (ROI), asking for example whether we see activity in the amygdala, whereas other, more data-driven studies have investigated the whole brain.
Border et al. 2019
With this background in mind, let’s take a closer look at the study. Border et al. 2019 sought to investigate whether candidate genes or candidate gene-by-environment interactions predict major depression. The latter is relevant because there has been much work claiming that a certain genetic polymorphism related to serotonin expression does not by itself lead to higher depression rates, but interacts with life stress in a way that leads to higher depression rates.
Three features of the study stand out. First, the authors preregistered their analyses, greatly reducing researcher degrees of freedom and flexibility in the analyses, which dramatically increases my confidence in the veracity of the results. Second, it is the largest and most comprehensive study on the topic so far, and the authors employed a multiverse approach to data analysis. That is, they included: numerous samples; numerous different phenotypic definitions of depression (e.g. current depression, lifetime depression); numerous candidate genes and interactions identified in the prior literature; numerous analytic strategies; and numerous moderators (yes, the “hidden moderator” argument has also been raised for failed replication attempts in depression candidate gene studies). Third, they used a liberal significance threshold for detecting effects.
The authors identified very few effects, all of them very weak, and none of them explained any appreciable amount of variance or generalized consistently across datasets, analytic methods, or phenotypic definitions of depression. Border et al. 2019 conclude in the abstract:
Results: No clear evidence was found for any candidate gene polymorphism associations with depression phenotypes or any polymorphism-by-environment moderator effects. As a set, depression candidate genes were no more associated with depression phenotypes than noncandidate genes. The authors demonstrate that phenotypic measurement error is unlikely to account for these null findings.
Conclusions: The study results do not support previous depression candidate gene findings, in which large genetic effects are frequently reported in samples orders of magnitude smaller than those examined here. Instead, the results suggest that early hypotheses about depression candidate genes were incorrect and that the large number of associations reported in the depression candidate gene literature are likely to be false positives.
There were also no candidate gene-by-environment interactions, but the well-known environmental main effects were highly significant. As the authors state: “experiencing childhood trauma increased odds for estimated lifetime depression diagnosis by a factor of 1.655 (z=32.048, p=2.333×10^-225) and experiencing a traumatic event in the past 2 years increased incidence rate of current depression severity index by a factor of 1.431 (z=27.004, p=1.323×10^-160)”.
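To get a feel for how extreme these numbers are, the quoted p-values can be approximately recovered from the quoted z statistics. The sketch below assumes the p-values are two-sided tail areas of a standard normal distribution, which the quote does not state explicitly; the function name is my own and purely illustrative.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null.

    Assumption (not stated in the quote): the reported p-values are
    two-sided normal tail areas. erfc is used instead of 1 - cdf so that
    extremely small tail areas are not rounded to zero.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# z statistics as quoted from Border et al. 2019
for label, z in [("childhood trauma", 32.048),
                 ("recent traumatic event", 27.004)]:
    print(f"{label}: z = {z}, p = {two_sided_p_from_z(z):.3e}")
```

Because the z statistics are rounded to three decimals, the recomputed values match the quoted p-values only approximately, but they land on the same orders of magnitude (10^-225 and 10^-160).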
This is a strong null-finding. Border et al. 2019 also provide a lower bound on the number of depression candidate gene studies conducted in recent decades, per candidate gene.
The authors conclude that “it is time for depression research to abandon historical candidate gene and candidate gene-by-environment interaction hypotheses”. Note that calls for caution regarding candidate gene research in depression were published over a decade ago; I hope the overwhelming evidence of the present paper settles this debate.
Now let’s move on from candidate gene studies for depression to GWAS. Something I found odd during my PhD and post-doc was that papers kept failing to detect genetic markers for depression, but authors often did not interpret these null-findings as null-findings. This abstract is from Hek et al. 2013, in what was considered a very large sample in 2013, and what would still be considered a huge sample for detecting any effect of clinical relevance in other areas of science that involve human subjects.
The abstract is similar to others in the field, and remarkable in the sense that it interprets the absence of evidence as tentative evidence for genetic effects, given larger samples. The question I have always asked myself is what sample size would be sufficient to conclude that there is no effect. Imagine the fifth social psychology study with 35,000 participants concluding that there is no effect of the manipulation on the outcome, with researchers asking for money to conduct a study in 50,000 participants.
But history proved the authors right, and recent GWAS did identify genetic markers for depression. This did not come as a surprise to many, given that any trait can be linked to genetic markers if the sample is large enough. In addition, whether we are picking up genetic signal for the trait we are interested in (depression) or for some correlate that we are not primarily interested in (such as a comorbid disorder) is largely unclear at present, although there is a lot of recent work on trying to disentangle such effects.
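To make the link between effect size and required sample size concrete, here is a rough back-of-the-envelope power calculation. It is a sketch under assumptions that are mine and not taken from any of the cited papers: a single variant tested with a two-sided z test on a quantitative trait, the conventional genome-wide significance threshold of 5×10^-8, 80% power, and hypothetical values for the fraction of trait variance the variant explains.

```python
import math
from statistics import NormalDist


def approx_required_n(variance_explained: float,
                      alpha: float = 5e-8,
                      power: float = 0.80) -> int:
    """Approximate sample size needed to detect one variant.

    Uses the normal approximation
        N ~ (z_alpha + z_power)^2 * (1 - r2) / r2,
    where r2 is the fraction of trait variance explained by the variant.
    The default alpha is the conventional genome-wide threshold
    (an assumption here, not a value from the blog post).
    """
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must be strictly between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)              # quantile for target power
    r2 = variance_explained
    return math.ceil((z_alpha + z_power) ** 2 * (1.0 - r2) / r2)


# Hypothetical effect sizes, chosen only for illustration
for r2 in (0.01, 0.001, 0.0001):
    print(f"variance explained = {r2:.2%}: N of roughly {approx_required_n(r2):,}")
```

Under these assumptions the required sample grows roughly in inverse proportion to the variance explained, from a few thousand participants for a variant explaining 1% of the variance to several hundred thousand for one explaining 0.01%. Case-control designs for a binary diagnosis need a different formula, so treat this only as an order-of-magnitude illustration.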
Overall, identified genetic effects for depression have remained small. The first study identifying a replicated marker for depression across two datasets was by Cai et al. 2015. Nature published an editorial together with the study, in which the genetic markers were called “robust genetic links to depression”. The editorial speculated that “findings could guide biologists to new drugs, and could one day be used to aid diagnosis”, suggested that results “may serve as a framework for future attempts to collect data from tens of thousands of people”, and that biological “could be investigated as drug targets, and for their potential to make diagnosis of depression more definitive”.
Together with Sophie van der Sluis & Angelique Cramer, I drew somewhat more skeptical inferences than the editorial; here is a brief excerpt:
The laudable effort of the CONVERGE consortium to ensure genetically and phenotypically homogenous samples confirms the elusiveness of the genetics of MDD. Hailing the results as robust insights into the biology of depression detracts from the true scientific relevance of the study: genetic effects for MDD are, even in large homogenous samples, small and do not generalize. Given the hitherto negative results of genetic MDD studies, slogging along on this current road of ever-larger samples and discovering at best small effects is not an alluring prospect, especially so considering that these effects are likely not specific to MDD. Instead, we suggest revising complex psychiatric phenotypes such as MDD that were transferred unquestioningly from psychiatry to genetics. Incorporating recently proposed network models, symptom- rather than syndrome-level analyses, and the development of new instruments that tap variation along the entire continuum (i.e., in both “cases” and “controls”) offer promising ways forward.
Another point we made is that if you compare depressed participants vs. controls and identify genetic differences, it is unclear whether they are specific to depression — and most ‘depression markers’ identified in the literature so far turned out not to be specific to depression (details here).
Note that this is not a criticism of the paper by Cai et al. 2015, or of other researchers in the field. I have had the privilege of working with some of the people involved in these large-scale collaborative efforts, all of whom care just as much as I do about scientific rigor and improving the well-being of patients. The criticism was directed at the over-interpretation and the hype surrounding biological findings that explain orders of magnitude less variance than well-known and highly replicated environmental and social correlates of depression (see this for more details). Also keep in mind that this is from 2015, and there has been a lot of modeling progress in recent years (check out this 2-minute explanation on Mendelian Randomization). The most recent GWAS I am aware of features over 2.3 million combined participants, but explained variance remains very low (to the degree that explained variance, prediction accuracy or similar metrics are not provided in the abstract of the paper).
Overall, the fact is that one of the largest, most cost-intensive efforts to improve prognosis, prevention, and treatment for major depression has not led to any appreciable benefits to patients. This is not the fault of researchers working in the area, obviously. And I’m not pointing fingers: my own work has also not led to improvements in patient care. In fact, few would argue that we can provide more efficacious evidence-based treatments today than 3 decades ago, which I find incredibly frustrating, and I am sure many others share this sentiment.
But I think this is a good moment to pause and think about ways forward. Kevin Mitchell recently published a blog post on “Life after GWAS”, which summarizes the current state of research well:
One thing that GWAS for psychiatric disorders have not been that useful for is elucidating the underlying biology – or at least not in the way it was hoped. One of the driving motivations of GWAS was the idea that associated variants would implicate specific genes or biochemical pathways in the pathogenesis of psychiatric disease, possibly even providing direct molecular targets for new therapeutics. This has turned out not to be the case.
Unlike a condition like cancer, which really reflects an altered state of gene expression at a cellular level, psychiatric conditions like schizophrenia do not. They reflect altered states of distributed brain circuits and systems. Genetic variation may cause such a state to emerge, but there may be nothing in the molecular functions of the affected genes that specifically relates to that state in any acute or on-going fashion.
The predictive precision of polygenic scores will always be low for individuals – this is a limit in principle, not just in practice. […] So, for psychiatric conditions, polygenic scores will likely be useful for some kinds of research, but less so for clinical purposes.
Personally, I have never been surprised by the fact that we require very large samples to identify small genetic effects, because I don’t think depression is a good scientific phenotype to study. It is highly heterogeneous, measurement is all over the place, there are numerous etiologies leading up to depression, inter-rater reliability is lower than for most other mental disorders, and dozens of depression subtypes have been proposed and discarded over the last century. Lack of reliability and validity of depression diagnosis is widely acknowledged today, including by leading organizations such as the NIMH. At this stage, I strongly believe that depression does not lend itself well to the identification of underlying biological markers — depressed patients are too different from each other. Note that this is my personal opinion, and others have a somewhat more optimistic view of the literature (e.g. see the paper by Ormel et al. 2019 published just a few weeks ago on challenges of depression GWAS studies; thanks for alerting me to the paper in the comments, Raquel).
Going forward, it might be helpful to try to decompose depression into more reliable and valid phenotypes, such as symptoms. As explained elsewhere in detail, depression symptoms like concentration problems, fatigue, sadness, and weight loss are not just interchangeable indicators of a disorder: they have differential relations with many important outcomes, and summing them up to one score likely leads to considerable problems in terms of validity and reliability.
[Update1: thanks for the critical evaluations of the blog, which I have updated to more clearly separate out candidate gene studies and GWAS studies]
[Update2: Slate Star Codex also wrote a blog post about the candidate gene literature following the Border paper.]