Large language models (LLMs) like ChatGPT or DeepSeek essentially function via probabilistic next-element prediction. That is, they predict the next word in a sequence based on statistical patterns learned from huge datasets. After waking up from a pretty bizarre dream this morning, I started thinking about the odd similarities between LLM confabulations and dreams, both of which appear to produce locally coherent but globally bizarre output.
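To make that concrete, here is a minimal toy sketch of next-word prediction in Python. The vocabulary and the scores are invented purely for illustration; a real LLM computes such scores with a large neural network conditioned on the whole context, over tens of thousands of tokens.

```python
import math
import random

# Toy next-word predictor. Vocabulary and scores are made up for
# illustration; they do not come from any real model.
vocab = ["the", "cat", "sat", "on", "mat", "moon"]

def softmax(scores):
    # Turn raw scores into a probability distribution.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word(scores):
    # Sample one word according to its probability.
    return random.choices(vocab, weights=softmax(scores), k=1)[0]

# Hypothetical scores after the context "the cat sat on the":
# "mat" is the most likely continuation, but "moon" keeps a nonzero
# probability, so sampling will occasionally pick the stranger word.
print(next_word([0.1, 0.2, 0.1, 0.1, 3.0, 1.0]))
```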
LLM fever dreams in 2023
When these models came out, they regularly confabulated. In January 2023, I wrote a short blog post on asking ChatGPT to provide some scientific references for me. It did, producing plausible references with plausible journal and author names, but these references did not exist. They could have, though; some of them were pretty close to the real targets. But they didn’t.
Interestingly, when I confronted ChatGPT with the possibility that these references weren’t real, it doubled down and insisted that they very much were.
LLMs have improved a great deal in the 18 months since then, but they still confabulate. And those examples of confabulation have helped me a little in understanding how LLMs generate words. This week, for example, I asked ChatGPT to create statistical syntax for a figure, which it did, choosing three colors. I then asked why it chose these colors, and it gave a fairly surprising answer: because the colors matched the psychological constructs in the plot.
When confronted, unlike 18 months ago, it freely admitted that this just wasn’t true. Some progress, I guess.
LLMs and dreams: prioritizing local over global coherence
LLMs try to predict the next word in a sentence based on learned patterns, and this can be done in many different ways. Some output preserves local coherence: the next word commonly follows the previous one. Some output preserves global coherence: the word makes sense in the sentence or story as a whole. When LLMs confabulate, they appear to maintain local plausibility at the expense of global consistency.
- Both LLMs and dreams appear to perform sequence prediction based to some degree on prior experience or information.
- They are both trained on data, which can lead to some form of novel recombination. In a way, we could say that both systems hallucinate plausible sequences from past information.
- Both systems confabulate, and maybe the reason is that they both produce locally coherent outputs that are at times globally odd. It is rarely the case that the very next step in a dream is, by itself, bizarre: it is, I would argue, usually locally coherent. It is the whole, seen from a distance, that can be bizarre (see the toy sketch after this list).
- I’m neither a neuroimaging nor an LLM expert, but my understanding is that human dreaming is at least thought to involve spreading activation across neural networks during sleep, especially during REM. This isn’t entirely different from how LLMs activate words or tokens based on contextual embeddings and associative similarity.
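To illustrate the local-versus-global point, here is a deliberately crude toy sketch: a first-order Markov chain in which every individual transition is plausible, but nothing keeps track of the story as a whole. The word transitions are invented, and this is of course not how a real LLM (or a brain) works; it is just a caricature of local coherence without global coherence.

```python
import random

# Toy first-order Markov chain: each word only "knows" the word right
# before it, so every single transition looks fine in isolation, but the
# chain has no memory of the story as a whole. Transitions are invented.
transitions = {
    "I":       ["opened", "walked"],
    "opened":  ["the"],
    "walked":  ["into"],
    "into":    ["the"],
    "the":     ["door", "kitchen", "ocean", "exam"],
    "door":    ["and"],
    "kitchen": ["and"],
    "ocean":   ["and"],
    "exam":    ["and"],
    "and":     ["I"],
}

def generate(start="I", length=12):
    words = [start]
    for _ in range(length):
        words.append(random.choice(transitions[words[-1]]))
    return " ".join(words)

# Each adjacent word pair reads fine on its own, but the whole output can
# drift into dream-like nonsense, e.g. "I opened the ocean and I walked
# into the exam and I opened the door ..."
print(generate())
```

Every step is locally plausible; it is only when you read the whole output that it starts to feel like a dream.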
A short, random blog post in the hope of starting a discussion on this. I would love to learn more about the neuroscience of dreams, in case there is robust work on the topic.
Interesting comparison, makes sense to me.
However, I think that nothing prevents LLMs from moving beyond local coherence. It makes sense that local coherence is prioritized over global coherence, because otherwise we could not decipher anything, and scaling up the context requires more computational power. (Excellent explanation here: https://www.youtube.com/watch?v=eMlx5fFNoYc). But I think LLMs have already gotten a lot better at this. For example, students who use GPT regularly during the semester can just put in the query “I now have a statistics exam coming up, you know the topics, provide me with some MC questions”. So there is good global coherence here.
Meanwhile, in dreams, I feel like there is little incentive toward global coherence. Maybe that differs between people, and maybe it can change with training (?), but my dreams only make local sense :(
(I also don’t know anything about the neuroscience of dreams…)
Just for fun and curiosity, I went back to your prior LLM query post/rant and reran it. Without doing a deep analysis, at a glance the response seems REALLY impressive and the references are real; it even has links to click through to the original papers, and as a follow-up I asked it to summarize the papers in a table, which it did. It’s a genuinely useful tool for research now, in my opinion, although you still need to fact-check it. The improvement since the sketchy early days is undeniably impressive. See:
https://grok.com/share/bGVnYWN5_6a55782a-b1e7-4d7d-87ff-7e15cef94c17
Nice idea to rerun the query, massive difference for sure — being able to access the internet and search directly helps with queries like this.