Using GPT-3 to search for scientific “references”

I have been playing around with GPT-3 and its chatbot in recent weeks, and found it fascinating. GPT-3 (Generative Pre-trained Transformer 3) is a deep learning language model developed by OpenAI that produces human-like text.

Some amazing use cases have already been explored. Here is an example where Denny Borsboom interrogated GPT-3 about the assumptions of statistical tests, in which the AI showed real potential as a statistical advisor (click the image for the full conversation) and was even able to provide R code:

And Michel Nivard wrote a brief function that uses GPT-3 to comment code for you:
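Nivard's function was written in R; as a rough illustration of the idea, here is a minimal Python sketch, assuming the pre-1.0 openai package, the legacy text-davinci-003 completion model, and an API key already configured (the function name and prompt wording are my own):

```python
import openai  # pre-1.0 openai package; assumes openai.api_key is set

def comment_code(code: str) -> str:
    """Ask GPT-3 to add explanatory comments to a snippet of code."""
    prompt = (
        "Add brief explanatory comments to the following code. "
        "Return only the commented code.\n\n" + code
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # legacy GPT-3 completion model
        prompt=prompt,
        max_tokens=512,
        temperature=0,  # keep the output as deterministic as possible
    )
    return response["choices"][0]["text"].strip()
```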


Using GPT-3 to search scientific references and information

So today, when revising a paper, I thought I would give it a try at hunting down some references. Specifically, I asked GPT-3 to find references for depression prevalence rates in young adults.
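I used the chat interface, but the same request could be posed through the API. A hedged sketch, under the same assumptions as above (pre-1.0 openai package, text-davinci-003) and with a reconstructed prompt:

```python
import openai  # pre-1.0 openai package; assumes openai.api_key is set

prompt = (
    "Can you find scientific references for depression prevalence rates "
    "in young adults? For each reference, give the authors, year, title, "
    "and the prevalence estimate reported."
)
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=512,
)
print(response["choices"][0]["text"])
```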

I was so excited I sent these screenshots to a bunch of people, followed by “ok let’s check if these papers exist”.

  1. “Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys” by Kessler et al. (2007): This study estimates the prevalence of depression in young adults (18-29 years old) to be around 7% based on data from 21 countries. A study with this title exists, but it is from 2004 and by Demyttenaere et al., not Kessler. They looked at 14 countries, not 21, and I could find no information on age groups in the paper.
  2. “Depression and anxiety disorders in young adults: Prevalence, risk factors, and treatment-seeking” by Wittchen et al. (2011): This study estimates the prevalence of depression in young adults (18-29 years old) to be around 10.9% based on data from the World Health Organization World Mental Health Survey. No such paper exists; I did find a 2010 paper by Wittchen entitled “The size and burden of mental disorders and other disorders of the brain in Europe 2010”.
  3. “The prevalence and risk factors of depression in a sample of young adults: A cross-sectional study” by Wang et al. (2016): This study estimates the prevalence of depression in young adults (18-30 years old) to be around 17.3% based on data from China. No such paper exists. I did find, however, similar titles by Wang et al., such as “Prevalence and risk factors for depressive and anxiety symptoms in middle-aged Chinese women: a community-based cross-sectional study” (2022) and “The prevalence and risk factors for depressive symptoms in frontline nurses under COVID-19 pandemic based on a large cross-sectional study using the propensity score-matched method” (2021).
  4. “Depression in young adults: Prevalence, incidence, course, and risk factors” by Lynskey et al. (2000): This study estimates the prevalence of depression in young adults (18-24 years old) to be around 8.3% based on data from Australia. This paper does not exist; I did find a 2000 paper by a different author, “Natural course of adolescent major depressive disorder in a community sample: predictors of recurrence in young adults”.
  5. “The prevalence and correlates of DSM-IV major depressive disorder in the national comorbidity survey replication” by Kessler et al. (2005): This study estimates the prevalence of depression in young adults (18-29 years old) to be around 10.5% based on data from the United States. There are a number of famous papers by Kessler around that time on the topic, but this paper does not exist, although several papers have somewhat similar titles, such as Wilhelm et al.’s 2003 “Prevalence and correlates of DSM-IV major depression in an Australian national survey”.
  6. “The prevalence and correlates of DSM-5 major depressive disorder in the Canadian Community Health Survey-Mental Health” by Asmundson et al. (2015): This study estimates the prevalence of depression in young adults (18-34 years old) to be around 15.8% based on data from Canada. Nope; Gordon Asmundson has written many papers on Canadian mental health, and he could well have written this one. But he did not.

Next step: perhaps adding journal names will help me find the papers?

It does not help, because the papers don't exist, although they plausibly could. An actual Wittchen 2010 paper, “The size and burden of mental disorders and other disorders of the brain in Europe 2010”, is possibly the reason the AI thinks the Journal of Neural Transmission could have been the right journal for the fake 2011 Wittchen paper. That's the only silly journal that stands out: not because it does not exist, but because the paper wouldn't be a good fit for it.

So, flabbergasted, I started a heart-to-heart with GPT-3: are these references made up, make-believe, or real?

Ooooookay.

I also asked it for DOIs for the papers, and it provided the DOI 10.1001/jama.297.24.2651 for the Kessler et al. 2007 paper; neither the paper nor the DOI exists.
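Incidentally, this kind of fabrication is easy to catch programmatically: Crossref's public REST API returns metadata for registered DOIs and a 404 for unknown ones. A minimal sketch (the helper name is mine, it requires the requests package, and Crossref covers most, but not all, journal DOIs):

```python
import requests

def doi_lookup(doi: str):
    """Return basic Crossref metadata for a DOI, or None if it is not registered."""
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if r.status_code != 200:  # Crossref answers 404 for unknown DOIs
        return None
    msg = r.json()["message"]
    return {
        "title": msg.get("title"),     # list of title strings
        "issued": msg.get("issued"),   # publication date parts
        "authors": msg.get("author"),  # list of author records
    }

# The DOI GPT-3 invented for the "Kessler et al. 2007" paper;
# since the DOI is not registered, this should print None:
print(doi_lookup("10.1001/jama.297.24.2651"))
```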

Concluding thoughts

I find it fascinating how some information provided by GPT-3 can be highly accurate, such as the discussion of the assumptions of statistical tests I provided above, while other information is made up. Note that the fabricated content wasn't all bad: the prevalence estimates were roughly what you find in actual papers, and the author names, journals, and publication years (all within the last 20 years) mostly made sense too. But the papers were never written, and the information provided is not from these papers. This may call for clearer disclaimers on the GPT-3 website.

On the bright side, this may offer some kind of solution to a problem I've been thinking about for a while now: student homework. GPT-3 writes absolutely fantastic essays based on prompts that are, from what I can tell, indistinguishable from actual essays, as long as the essays are short. A new preprint looked into this question and compared 50 real paper abstracts with 50 abstracts written by GPT-3 based only on the title and journal of each paper.

“When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, but that the generated abstracts were vaguer and had a formulaic feel to the writing.”

I wish there were a tool, perhaps an AI tool, to help us distinguish real from fake abstracts. Hold on ... why don't we ask GPT-3 to write an abstract for us?
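Here is a sketch of that request, under the same assumptions as the earlier snippets (pre-1.0 openai package, text-davinci-003; the prompt wording is mine):

```python
import openai  # pre-1.0 openai package; assumes openai.api_key is set

def fake_abstract(title: str, journal: str) -> str:
    """Ask GPT-3 to write an abstract given only a title and a journal."""
    prompt = (
        f"Write a scientific abstract for a paper titled '{title}', "
        f"published in the journal {journal}."
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=300,
    )
    return response["choices"][0]["text"].strip()
```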

And now we can check what it thinks about the abstract:

Oh well.

7 thoughts on “Using GPT-3 to search for scientific ‘references’”

  1. Pingback: Fabrication and errors in the bibliographic citations generated by … – Nature.com – Auto Robot Demo

  2. Pingback: Fabrication and errors in the bibliographic citations generated by … – Nature.com – Play With Chat GTP

  3. Gordo

    Yes, by design it makes things up based on an internal algorithm that scores on plausibility. While it seems ludicrous that it would make up specific citations with authors, titles, and even DOIs, this is what happens when you apply a minimalist code base to a system that is supposed to try to answer any question. I remember reading a funny quote from a Google executive who basically said the reason they hadn't released their version to the public yet was partly the liability of a system that just makes things up and is always confident about its made-up answers.
    More here: https://teche.mq.edu.au/2023/02/why-does-chatgpt-generate-fake-references/
    I'm sure future versions will be a lot better when it comes to providing references. You would think that, at the very least, it should be able to quickly determine whether a DOI is legit and retrieve the correct title, date, and authors associated with it.

  4. Yidian

    I found it quite funny that ChatGPT told you the abstract it wrote seconds ago is a legit scientific abstract. It seems to have a different definition of ‘real scientific abstract’: as long as the abstract follows a certain structure and provides related information, it counts.

    I tried to ask ChatGPT: Can you tell me if this paper is published:
    “Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys” by Kessler et al. (2007): This study estimates the prevalence of depression in young adults (18-29 years old) to be around 7% based on data from 21 countries.

    And the answer makes more sense now:
    It is not clear if the paper “Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys” by Kessler et al. (2007) exists. The paper “Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys” by Demyttenaere et al. (2004) does exist, but it does not contain the information you mentioned about the prevalence of depression in young adults or the number of countries studied. It would be best to double-check the citation and ensure that you have the correct title and authors for the paper you are trying to reference.

    1. Eiko Post author

      Apparently, your GPT-3 is smarter than mine! I asked it the same question you did, and it tells me the paper does exist, and doubles down on a citation and DOI that do not exist.

      1. Yidian

        It gives me the same answers as yours now, even with my exact original question. It seems to have been ‘properly’ trained by us.

    2. Covert Scientist

      This cracked me up. It would appear psychiatrists have been using ChatGPT to write patient information leaflets for years now.

