I recently stumbled across the paper “A metastructural model of mental disorders and pathological personality traits”, authored by Aidan Wright and Leonard Simms in 2015. I enjoyed reading it: it’s a strong methodological paper, using state-of-the-art exploratory structural equation models (ESEM). It would have been a pleasure for me to review this paper: a very clear recommendation to publish.
However, the paper has a few features that I personally consider to be issues, and that are representative of the psychopathology structural equation modeling (SEM) literature in general. Therefore, I will use it as an example to write a more general critique. I’ll try to use non-technical language so that readers who know neither SEM nor psychopathology research well can follow.
It’s also very important to me that this is not misunderstood as criticism of the authors. I very much like their work (in fact, we’re speaking in the same symposium at a conference later this year), and it could really be any other paper as well.
Wright & Simms look at the general structure of psychopathology, which has been described in the recent literature by 3 factors: internalizing, externalizing, and psychoticism. You can understand these as 3 general domains of psychopathology into which all other disorders fit. The authors analyze psychopathology data that also includes personality traits. They identify 5 factors: the prior 3 general domains, along with 2 novel factors.
So what’s the problem? First, the data actually tell us there may be 7 factors and not 5.
The authors then choose 5 factors based on theory, because the 3 general domains from prior research emerge in the 5-factor model (in addition to 2 novel factors), while this is not the case in the 7-factor model. But we should discuss how much leeway one has when picking a model at least partially based on theory and later concluding that prior theory was confirmed in the data. Note that Wright & Simms discuss this all very openly, and follow standard guidelines in the field here. It is absolutely possible that I would have favored the 5-factor solution over the 7-factor solution as well. The difference in fit is small, it makes sense to go with prior findings, and sometimes latent classes or factors can be over-extracted for some weird statistical reasons. But extracting factors at least partly based on theory, and at the same time concluding that prior research findings were replicated, seems somewhat problematic.
Second, Wright & Simms (again, like nearly all other papers) implicitly assume these 5 factors cause their data. I should be more precise here: I don’t know how Wright & Simms interpret the data, but the models they use certainly assume that factors cause (covariation among) data. But what is the externalizing factor? To what does it correspond in the world? And how does it cause the covariance among symptoms?
The interaction of the above two points is the real issue, and I will summarize below why it may stall progress towards gaining insights into mental disorders.
Let us look at the general problem first. Psychopathology data consists of many variables. If we interview a patient using a structured clinical interview that encompasses some symptoms for every mental disorder, we can easily end up with more than 100 variables that are often correlated with each other in complex ways. To make it simple, let’s assume the first 20 of 100 variables we measure are highly inter-correlated (meaning they often co-occur) and forget the other 80 for now.
One way to deal with such data is to use dimension reduction techniques that simplify the data. Principal Components Analysis (PCA) is a common method for this. A component will be constructed for items that strongly correlate with each other, in our case for the first 20 symptoms, and this offers a way to reduce 100 questions to a few components. These components represent what we call “formative” latent variables (we form the latent variable): we made them up to simplify data; they do not represent real things in the Universe. They are just descriptions to make our interpretation easier. The 20 symptoms often co-occur in patients because … we don’t know why. PCA does not provide an answer. And if a PCA identifies 7 components in our data, but 5 make our interpretation much easier, it can be justified to take 5 instead. Causally speaking, the variables determine the components (we measure many variables and then reduce these to components; if the variables change, so do the components).
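To make the formative direction concrete, here is a minimal sketch in Python with NumPy. It is an illustration under made-up assumptions, not an analysis from the paper: the sample size, the correlation of 0.5 among the first 20 variables, and all variable names are hypothetical. The data are drawn directly from a correlation matrix, so nothing is claimed about why the symptoms correlate.

```python
import numpy as np

# Hypothetical setup: 100 symptom variables, the first 20 inter-correlated.
N_PATIENTS, N_VARS, N_RELATED = 2000, 100, 20
RHO = 0.5  # assumed correlation among the first 20 symptoms

rng = np.random.default_rng(0)

# Target correlation matrix: a correlated block of 20, everything else unrelated.
corr_true = np.eye(N_VARS)
corr_true[:N_RELATED, :N_RELATED] = RHO
np.fill_diagonal(corr_true, 1.0)

# Draw data from that correlation structure (no causal story attached).
data = rng.multivariate_normal(np.zeros(N_VARS), corr_true, size=N_PATIENTS)

# PCA via eigendecomposition of the sample correlation matrix.
sample_corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(sample_corr)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]               # sort components largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The first component is a weighted sum of the observed variables:
# the variables determine the component, not the other way around.
weights = eigvecs[:, 0]
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
component_scores = standardized @ weights

print("largest eigenvalue:", round(float(eigvals[0]), 2))
print("mean |weight|, first 20 variables:", round(float(np.abs(weights[:N_RELATED]).mean()), 3))
print("mean |weight|, other 80 variables:", round(float(np.abs(weights[N_RELATED:]).mean()), 3))
```

The first component puts nearly all of its weight on the 20 correlated variables and summarizes them in one score per patient. It describes the co-occurrence; it says nothing about where the co-occurrence comes from.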
The second approach, the one the authors use and that is very common in the field, is some form of factor analysis (FA). Specifically, the authors use ESEM, which is similar to confirmatory factor analysis (CFA) except that all items are allowed to load on all factors. Here, one searches for underlying constructs that explain the covariance among items, so-called reflective latent variables (the symptoms are reflections of the latent variable). In our case, we would extract a factor that explains the covariance among the first 20 items, and the conceptualization is that this factor causes the 20 variables. In other words, patients often have the first 20 symptoms at the same time because a disorder causes them all. This approach differs radically from the first, where the variables cause the factor: here, the factor causes the variables. If the underlying disorder changes, so do its symptoms.
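The reflective direction can be sketched the same way. The simulation below is a toy single-factor model, not the ESEM used by the authors, and the loading of 0.7 and the sample size are hypothetical. Each symptom is generated as loading × factor + unique noise, so the factor is literally the cause of the covariance.

```python
import numpy as np

# Hypothetical single-factor model: one latent "disorder" causes 20 symptoms.
N_PATIENTS, N_SYMPTOMS = 5000, 20
loadings = np.full(N_SYMPTOMS, 0.7)  # assumed loadings, identical for simplicity

rng = np.random.default_rng(1)

# The latent variable comes first; symptoms are generated from it.
factor = rng.normal(size=N_PATIENTS)
unique_sd = np.sqrt(1.0 - loadings**2)  # keeps each symptom at unit variance
noise = rng.normal(scale=unique_sd, size=(N_PATIENTS, N_SYMPTOMS))
symptoms = factor[:, None] * loadings + noise

# Model-implied correlations: loading_i * loading_j off the diagonal.
implied_corr = np.outer(loadings, loadings)
np.fill_diagonal(implied_corr, 1.0)

sample_corr = np.corrcoef(symptoms, rowvar=False)
max_gap = np.abs(sample_corr - implied_corr).max()

# Once the factor's contribution is removed, the symptoms no longer correlate:
# in this model, the factor accounts for all of their covariance.
residuals = symptoms - factor[:, None] * loadings
residual_corr = np.corrcoef(residuals, rowvar=False)
off_diagonal = residual_corr[~np.eye(N_SYMPTOMS, dtype=bool)]
max_residual_corr = np.abs(off_diagonal).max()

print("largest gap between sample and implied correlation:", round(float(max_gap), 3))
print("largest residual correlation:", round(float(max_residual_corr), 3))
```

In simulated data we know the factor exists because we put it there. With real symptom data we only observe the correlations, and the model assumes this generating story rather than demonstrating it.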
Wright & Simms 2015
In their paper, the authors use the second approach, an adjusted form of a factor analysis. They set out to find underlying constructs in psychopathology data (or so I conclude based on the models chosen), and identify 5 factors: 3 well-established factors that have been found a number of times, and 2 that describe the additional personality disorder data they added to their analyses.
Now, one major concern of mine is that Wright & Simms actually identify 7 factors that explain the data best, not 5. Going “against” the data and with a priori hypotheses is difficult to justify, especially if you then conclude that you confirmed prior results.
The authors state:
“Thus, although the best model fit was obtained for a seven-factor solution, factors that emerged after the five-factor solution were either highly specific or suggestive of overfactoring.”
But this amounts to saying “we hypothesized 5 factors, we found 7, and we choose 5 because 7 is more than we expected” (overfactoring). This leads to the second issue. Choosing 5 over 7 is not a problem if such factors are understood as descriptions of psychopathology, as formative latent variables, as simplifications of data. If we want to simplify our data, we are free to interpret and extract what helps us make sense of the data, especially if there is a strong prior literature making sense of the world around us. But when the goal is to discover underlying constructs that explain the covariance in data, I would like to see the field as a whole start a proper discussion about how closely we should stick to our data. The search for these underlying causal latents is also reflected in the language:
“These results reveal evidence for a psychopathology metastructure”
“A small number of broad dimensions (…) can account for much of the observed covariation among common mental disorders.”
“Three of the spectra found here, internalizing, externalizing and thought disorder/psychosis, have been well replicated in a number of samples”.
Psychopathological symptoms are inter-related in complex ways. What I am missing is a discussion about the assumption that identified underlying constructs cause the covariance among symptoms. Besides, what does it actually mean that the internalizing factor causes the internalizing symptoms? What is the internalizing factor in the world? These are important questions if we want to move the field of psychopathology research forward.
The two approaches described above are not everything the field has to offer. An alternative notion emerged a few years ago that also explains why some symptoms co-occur more than others. It does not posit mystic latent entities that everybody assumes but nobody talks about. In this view, the covariance among symptoms is not due to a common cause, but due to mutual interactions in a dynamic complex system. Depression is a good example, where certain symptoms seem to occur together quite often. From the network perspective, insomnia causes fatigue, which triggers psychomotor and concentration problems, instead of a latent factor depression (whatever that is) causing these symptoms. I am not suggesting the network approach as a superior solution without difficulties; I mention it merely for the sake of having all options that explain symptom covariance on the table.
While it is unclear if the network approach provides a good model for psychopathology – the approach is not without considerable issues – I would love to see the SEM literature in psychopathology start discussing what the latent variables they model actually are and mean.
For a recent paper, we used ESEM just as Wright & Simms did, and stated how we interpret these models:
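A toy simulation shows that direct symptom-to-symptom effects are enough to produce co-occurrence. This is a deliberately simplified sketch, not an established network model: the causal chain (insomnia → fatigue → concentration problems) follows the example above, while the effect size of 0.6 and the sample size are hypothetical.

```python
import numpy as np

# Hypothetical causal chain with no latent variable anywhere:
# insomnia -> fatigue -> concentration problems.
N_PATIENTS = 5000
EFFECT = 0.6  # assumed strength of each direct symptom-to-symptom effect

rng = np.random.default_rng(2)

insomnia = rng.normal(size=N_PATIENTS)
fatigue = EFFECT * insomnia + rng.normal(size=N_PATIENTS)
concentration = EFFECT * fatigue + rng.normal(size=N_PATIENTS)

# Rows are variables here, which is np.corrcoef's default layout.
corr = np.corrcoef([insomnia, fatigue, concentration])

print("insomnia ~ fatigue:       ", round(float(corr[0, 1]), 2))
print("fatigue ~ concentration:  ", round(float(corr[1, 2]), 2))
print("insomnia ~ concentration: ", round(float(corr[0, 2]), 2))
```

All three symptoms end up positively correlated, and a factor model fitted to such data could happily extract a “depression” factor, even though none was used to generate the data.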
“We do not advance ESEM as a plausible description of the data or data-generating mechanisms— the models exclusively function as the most feasible way to test measurement invariance in a dataset in which a dramatic change of the factor structure can be expected from prior findings. We thus use ESEM to achieve a baseline model that is as neutral as possible, because of the difficulties to justify a specific a priori [CFA] model based on the literature as discussed above. ESEM gives the model all possibilities to fit the data, but we do not mean to say that the resulting model is a reasonable model, or will replicate in this specific form in other datasets.”
Authors, reviewers, editors: if both your model and your interpretation of psychopathology require reflective latent variables to exist as real things in the universe that cause the covariance among your data, I would be happy to see a discussion about what these are. I am not saying they do not exist, but it is absolutely unclear to me what they should be, and we’ve been modeling these things for more than half a century now.
Wright, A. G. C., & Simms, L. J. (2015). A metastructural model of mental disorders and pathological personality traits. Psychological Medicine, 45(11), 2309–19. doi:10.1017/S0033291715000252