New paper on stability and accuracy of psychological networks

Our paper “Estimating psychological networks and their accuracy: a tutorial paper” was published in Behavioral Research Methods as open access paper! You can find the full paper here, and the supplementary materials here.

In this paper, we raise the possible danger of an upcoming replicability crisis consistent with the rest of psychology, given that we are using highly data-driven network models, and provide you with opportunities to investigate the stability of your network analysis results in more detail.

Power issues in network analysis

Imagine we know the true network structure of a network with 8 nodes, and it looks like this (left side network, right side centrality values). All edges are equally strong, and that means that all centrality estimates are also equally strong.


And now imagine we take this true network, and simulate data from it for n=500. This gives us a dataset with 8 variables and 500 people, and now we estimate the network structure in this dataset, along with the centrality values.


As you can clearly see, neither edges nor centrality estimates are equally strong anymore, and if we were to write a paper on this dataset, we would (falsely) conclude that B-C is the strongest edge, and that B has the highest centrality. Because we simulated the data from a true model, we know all that all edges and centrality estimates are equally strong.

The issue is more pronounced with (a) fewer participants, (b) more nodes, or (c) both. The reason for this is that the current state-of-the-art network models we use in cross-sectional data, regularized partial correlation networks, require a very large amount of parameters to estimate.

I will to introduce two methods briefly that allow you to get to the bottom of this problem in your data — our paper introduces several more.

Edge weights accuracy

First, we want to look into the question how accurate we estimate edge weights of a network. For this purpose, we take a freely available dataset (N=359), and estimate a regularized partial correlation network in 17 PTSD symptoms, which looks like this:


Using a single line of code in our new R-package bootnet to estimate edge-weights accuracy, you can now estimate the accuracy of the edge weights in the network, leading to the following plot:

Edge weights accuracy

On the y-axis you have all edges in the network (I omitted the labels of all edges here, otherwise the graph gets very convoluted), ordered from the highest edge (top) to the lowest edge (bottom). Red dots are the edge weights of the network, and the grey area indicates the 95% CI around the edge weights. The more power you have to estimate the network (the fewer nodes / the more participants), the more reliable your edges will be estimated, the smaller the CIs around your edges will be. In our case, the CIs of most edges overlap, which means that the visualization of our network above is quite misleading! While certain edges look stronger than some weaker edges, they are actually not different from each other because their 95% CIs overlap. It’s a bit like having a group of people with a weight of 70kg (CI 65-76kg), and another group with 75kg (CI 70-80kg): these 2 point estimates do not differ from each other because their 95% CIs overlap. In sum, with 17 nodes we would prefer to have many more participants than the 359 we have here.

If we now want to look whether specific edges are different from each other, we can run the edge-weights different test in bootnet:

edge weights diff test

Here you can see all edges on both x-axis and y-xis, and in the diagonal the value of the edge weights; black boxes indicate significant difference between 2 edges. After looking at the substantial overlap of CIs above, it is not surprising that most edges do not differ from each other. (Note that this test does not control for multiple testing; more about this in the paper.

Centrality stability

The second big question we can try to answer is the stability of centrality. Let’s look at the centrality estimates of our network first, and focus on strength centrality, which is the absolute sum of edges that connect a given node to other nodes.


As you can see, node 17 has the highest strength centrality, but is the node actually more central than node 16 which has the second highest strength?

Instead of bootstrapping CIs around the centrality estimates (which are unbootstrappable, see supplementary of the paper), we look into the stability of centrality. The idea comes from Costenbader & Valente 2003 and is straightforward: we have a specific order of centrality estimates (17 is the most central one, then comes 16, then 3, etc), and now we delete people from our dataset, construct a new network, estimate centrality again, and do this many thousand times. If the order of centrality in our full dataset is very similar to the order of centrality in a dataset in which we dropped 50% of our participants, it means that the order of centrality is stable. If deleting only 10% of the our participants, on the other hands, leads to a fundamentally different order of centrality estimates (e.g. 17 is the least central now), this does not speak to the stability of centrality estimates.

Centrality stability

You see that strength centrality (blue) is more stable than betweenness or closeness, which is what we have often found when using bootnet in numerous datasets now. In our case, the correlation between the order of strength centrality in our full dataset with a dataset in which we (2000 times) sampled 50% of the participants is above .75, which isn’t too bad. In the paper, we introduce a centrality stability coefficient which tells you more about the stability of your centrality estimates.

Finally, you can also check whether centrality estimates differ from each other, using the centrality difference test; node 17 is not more central than node 16. (Similar to the edge-weights difference test, this test does not control for multiple testing.)



  • “I have a network paper and didn’t investigate the robustness of my network!” – Welcome to the club :). You can start doing it from now on!
  • “Do I have to use bootnet?” Hell no, if you have other / better ideas, you should definitely do that instead. bootnet is very new – the paper is not yet published – and there are probably many better ways to look into the robustness of network models. But it is a great start, and I’m very happy that Sacha Epskamp put so much work into developing the package and writing the paper about replicability with Denny Borsboom and me. For me personally, bootstrapping helped me to avoid drawing wrong conclusions from my data: if your most central node does not significantly differ from many other nodes in terms of centrality, you really don’t want to write up the symptom as being the most central, and if your visually strongest edge overlaps in CI with most other edges, you don’t want to conclude it is the most central edge.
  • “But Eiko, what about time-series networks? bootnet only covers cross-sectional networks at present!” – Yes, we all need to write motivating emails full of beautiful and funny gifs to Sacha to shift his priorities! But seriously, we’re working on it. And please leave him alone, ok? ;)


The usage of psychological networks that conceptualize psychological behavior as a complex interplay of psychological and other components has gained increasing popularity in various fields of psychology. While prior publications have tackled the topics of estimating and interpreting such networks, little work has been conducted to check how accurate (i.e., prone to sampling variation) networks are estimated, and how stable (i.e., interpretation remains similar with less observations) inferences from the network structure (such as centrality indices) are. In this tutorial paper, we aim to introduce the reader to this field and tackle the problem of accuracy under sampling variation. We first introduce the current state-of-the-art of network estimation. Second, we provide a rationale why researchers should investigate the accuracy of psychological networks. Third, we describe how bootstrap routines can be used to (A) assess the accuracy of estimated network connections, (B) investigate the stability of centrality indices, and (C) test whether network connections and centrality estimates for different variables differ from each other. We introduce two novel statistical methods: for (B) the correlation stability coefficient, and for (C) the bootstrapped difference test for edge-weights and centrality indices. We conducted and present simulation studies to assess the performance of both methods. Finally, we developed the free R-package bootnet that allows for estimating psychological networks in a generalized framework in addition to the proposed bootstrap methods. We showcase bootnet in a tutorial, accompanied by R syntax, in which we analyze a dataset of 359 women with posttraumatic stress disorder available online.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.