
AI and Text Mining for Searching and Screening the Literature

This guide is intended to provide an overview of the definition and application of text mining in search strategy development and study selection; it includes a list of tools and resources that librarians or other motivated searchers may wish to try.

ChatGPT, Copilot, and other AI chatbots

It is tempting to ask large language model (LLM)-based AI chatbots, like Copilot (officially supported when signed in through McGill) or ChatGPT/Gemini/DeepSeek and others, to develop database search strategies using your review question and some other contextual information (e.g., that you are conducting a systematic review, that you are searching PubMed) as a prompt.  

ChatGPT can admittedly be very useful for identifying synonyms for a given concept: it may suggest terms that authors use for that concept that you had not thought of including in your search, although it also typically hallucinates irrelevant terms. For an example, see this prompt produced by the University of Toronto’s Health Sciences Library.

We do not recommend that searchers use LLMs to develop a database search strategy. Do not use them unless you are capable of reviewing, editing, and enhancing what they produce. That requires an understanding of the basics of structuring searches properly with, for example, parentheses and Boolean operators, as well as knowledge of how to search specific databases, including:

  • How to break down your question into the concepts that should reasonably be included in your search

    • AI chatbots are often too restrictive or fail to break down the question properly 

    • They may generate a search approach that seems perfectly reasonable to non-expert searchers but that does something like include a concept that is too limiting

      • E.g., we have seen them include the concept of placebos AND randomized controlled trials (RCTs) when breaking down a question into concepts and search terms to identify RCTs

  • How to identify/find actual subject headings (e.g., MeSH terms in MEDLINE)

    • AI chatbots are prone to making up subject headings; you need to know what subject headings are and how they work before trusting a chatbot with this task 

  • How to use truncation

    • AI chatbots may fail to apply truncation where it is appropriate, or may truncate words incorrectly

  • How to choose which fields to search

    • AI chatbots may use overly restrictive or overly broad field searching, or invent their own fields 

  • How to use proximity operators correctly

    • AI chatbots may be incapable of applying these appropriately 
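
As a rough illustration of the structure described above, the following Python sketch assembles a PubMed-style search: terms for one concept are combined with OR inside parentheses, and the concept blocks are then combined with AND. The function names are hypothetical, and the topic (hypertension and exercise) is an assumption chosen only for illustration. It uses PubMed's [tiab] (title/abstract) and [mh] (MeSH Terms) field tags and the asterisk for truncation; any subject heading should still be confirmed in the MeSH database, and a search built this way needs the same expert review as any other.

```python
def build_concept(free_text, subject_headings=(), field="tiab"):
    """Combine all terms for ONE concept with OR, wrapped in parentheses."""
    parts = []
    for heading in subject_headings:
        # Subject headings must be looked up in the database's thesaurus;
        # never trust a heading that has not been verified there.
        parts.append(f'"{heading}"[mh]')
    for term in free_text:
        # Quote multi-word phrases; single words (with or without a
        # truncation asterisk) are left unquoted.
        text = f'"{term}"' if " " in term else term
        parts.append(f"{text}[{field}]")
    return "(" + " OR ".join(parts) + ")"


def build_search(concept_blocks):
    """Combine the concept blocks with AND."""
    return " AND ".join(concept_blocks)


def parentheses_balanced(query):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Illustrative concepts only (an assumption, not a validated strategy).
hypertension = build_concept(
    ["hypertens*", "high blood pressure"], subject_headings=["Hypertension"]
)
exercise = build_concept(
    ["exercis*", "physical activity"], subject_headings=["Exercise"]
)
query = build_search([hypertension, exercise])
print(query)
```

The parentheses_balanced helper is a quick sanity check that can also be run on a chatbot-generated search string before pasting it into a database; it catches only unmatched parentheses, not invented headings, fields, or poorly chosen concepts.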

For some examples of AI chatbot search suggestions gone awry, see: 

Using ChatGPT for PubMed Searches: Be Smart! 

More AI Tools: Using Gemini for PubMed Searches 

An additional, major caveat is that these systems are creative: they may generate lists of suggested references that appear correct based on the authors or journals cited, but they are highly prone to hallucinations and often generate references that do not exist at all! Databases like MEDLINE, CINAHL, Embase, PubMed, Cochrane, and others listed here should still be the first place to search for peer-reviewed literature. Large language models do not include many high-quality journals in their data corpora (i.e., the data "behind the scenes" from which the model derives its information) because these are proprietary products that can only be accessed with subscriptions (many of which McGill students, faculty, and staff have access to because we pay for them!). Therefore, tempting as it may seem, AI chatbots DO NOT and CANNOT replace good old-fashioned database searching.

Source: https://x.com/shadbush/status/1616205520859238463 

Librarian

Sabine Calleja
she/her/elle
Contact:
Schulich Library of Physical Sciences, Life Sciences, and Engineering

Liaison Librarian

Genevieve Gore
she/her
Liaison Librarian, Schulich Library of Physical Sciences, Life Sciences, and Engineering
Bluesky: @genski.bsky.social
Contact:
Macdonald Stewart Library Building
809 Sherbrooke St W
Montreal QC H3A 0C1
