
AI and Text Mining for Searching and Screening the Literature

This guide provides an overview of the definition and application of text mining in search strategy development and study selection. It includes a list of tools and resources that librarians or other motivated searchers may wish to try.

Text mining tools for searching

Stansfield, O'Mara-Eves, and Thomas (2017) report five ways in which text mining tools can assist in search strategy development:

  1. Improving the precision of searches (i.e., the proportion of retrieved records that are relevant), for example, by identifying more precise phrasal terms instead of using single-word terms in a search
  2. Improving the sensitivity of searches (i.e., the number of relevant studies retrieved by the search divided by the total number of relevant studies in the database) by identifying additional search terms (validating this requires the development of a gold standard/quasi-gold standard/reference set, which is often used when developing search filters or hedges)
  3. Assisting in the translation of search strategies from one database and/or platform to another
  4. Searching and screening within an integrated system
  5. Developing objectively derived search strategies
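
The precision and sensitivity definitions in items 1 and 2 can be made concrete with a small calculation. The following is a minimal sketch, not a method from Stansfield et al.: the record IDs are hypothetical, and the reference set is assumed to stand in for "all relevant studies in the database."

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Proportion of retrieved records that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def sensitivity(retrieved: set[str], relevant: set[str]) -> float:
    """Proportion of relevant records that the search retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical record IDs, for illustration only.
reference_set = {"r1", "r2", "r3", "r4"}  # known relevant records
search_results = {"r1", "r2", "r3", "x1", "x2", "x3", "x4", "x5"}

print(f"precision:   {precision(search_results, reference_set):.3f}")    # 0.375
print(f"sensitivity: {sensitivity(search_results, reference_set):.3f}")  # 0.750
```

Precision calculated this way is only approximate: relevant records that are missing from the reference set are counted as irrelevant.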

Using text mining techniques to increase the objectivity of search strategies requires a more sophisticated use of tools, which librarians or other searchers may or may not be prepared to implement. For example, deciding on cutoffs for high-frequency terms and calculating those frequencies requires a fairly large set of relevant references (which can be drawn, for example, from the included studies of relevant systematic reviews) as well as a population set of random records. The population set makes it possible to test whether a term is high frequency across documents in general (for example, words that are frequent because of common check tags such as 'human') or in the relevant documents only.
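
The comparison between a relevant set and a population set can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used by any particular tool: the function names, the two cutoffs (`min_relevant_share`, `min_ratio`), and the toy records are all hypothetical, and both sets are assumed to be non-empty.

```python
import re
from collections import Counter


def document_frequency(records: list[str]) -> Counter:
    """Count, for each word, how many records contain it at least once."""
    counts = Counter()
    for text in records:
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return counts


def distinctive_terms(relevant, population, min_relevant_share=0.5, min_ratio=2.0):
    """Terms frequent in the relevant set but not frequent in general."""
    rel_df = document_frequency(relevant)
    pop_df = document_frequency(population)
    keep = []
    for term, n_rel in rel_df.items():
        rel_share = n_rel / len(relevant)
        pop_share = pop_df[term] / len(population)
        if rel_share < min_relevant_share:
            continue  # not high frequency in the relevant set
        # Keep terms absent from the population set, or clearly over-represented.
        if pop_share == 0 or rel_share / pop_share >= min_ratio:
            keep.append(term)
    return sorted(keep)


# Toy records, for illustration only.
relevant = [
    "Text mining for search strategy development in humans",
    "Text mining tools to screen citations in humans",
    "Humans and text mining in systematic reviews",
]
population = [
    "Blood pressure control in humans",
    "Dietary sodium intake in humans",
    "Text of the new clinical guideline for humans",
    "Mining safety regulations and injury rates in humans",
]

print(distinctive_terms(relevant, population))  # ['mining', 'text']
```

Here "humans" appears in every relevant record but is dropped because it is equally common in the population set, which mirrors the check tag example above.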

Text mining, like data science in general, also involves a great deal of preprocessing, which tools may or may not handle. Preprocessing includes data cleaning and normalization techniques such as:

  • Changing all characters to lower case
  • Removing punctuation
  • Stripping whitespace
  • Removing numbers
  • Removing stopwords
  • Stemming
  • Lemmatization

Some of the tools listed allow for customization of these procedures, while some are preconfigured. Programming tools such as the tm package in R or quanteda allow for much more flexibility than some of the tools covered here, but they are also much more difficult to use if one is not accustomed to programming.
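
For readers curious about what these preprocessing steps look like in code, here is a minimal sketch using only the Python standard library. It is an illustration, not how tm, quanteda, or any listed tool works: the stopword list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer. Lemmatization is omitted because it requires a dictionary of word forms.

```python
import re
import string

# Small illustrative stopword list; real tools ship much longer ones.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "for", "with"}

# Map every punctuation character to a space so "machine-learning" splits in two.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def naive_stem(word: str) -> str:
    """Crude suffix stripping (an assumption for illustration, not Porter stemming)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    text = text.lower()                    # change to lower case
    text = text.translate(PUNCT_TO_SPACE)  # remove punctuation
    text = re.sub(r"\d+", "", text)        # remove numbers
    tokens = text.split()                  # split() also strips extra whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    return [naive_stem(t) for t in tokens]              # stem


print(preprocess("  Screening 2,500 Citations: the effects of machine-learning tools!  "))
# ['screen', 'citation', 'effect', 'machine', 'learn', 'tool']
```

The order of the steps matters: for example, removing punctuation outright (rather than replacing it with spaces) would fuse "machine-learning" into a single token.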

Tools for search strategy development

Digital Evidence Synthesis Tool (DEST) Evaluations - Examines automation tools in evidence synthesis, including tools available for the searching stage

  • Use the filters in the left column to limit to, e.g., Evidence synthesis stage: Searching/Deduplication, then click the "List records" button at the top of the column
  • Focuses on tools for health and climate change syntheses, but the tools are often field-agnostic or applicable more generally

Facilitating search strategy translation

These tools should be used with caution. They may apply the correct syntax when translating a strategy to another database or interface, making it seem that the subject headings have also been mapped correctly, when in fact they have changed only the syntax and have not adjusted the subject headings to the corresponding vocabulary (e.g., when translating from PubMed or Ovid MEDLINE to Embase, they will continue to use MeSH terms instead of Emtree terms). They are useful if you understand the fundamentals of searching within the applicable databases and how the databases/platforms work, but the translated searches will still require reviewing and editing.
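
The distinction between translating syntax and translating vocabulary can be shown with a deliberately simplified sketch. The function below is hypothetical and does not reproduce any listed tool: it assumes one search term per line and handles only two PubMed field tags, `[Mesh]` and `[tiab]`, rewriting them in Ovid syntax. It flags subject heading lines because the heading itself is carried over unchanged.

```python
import re


def pubmed_to_ovid_syntax(line: str) -> tuple[str, bool]:
    """Rewrite one PubMed search line in Ovid syntax.

    Returns the rewritten line and a flag that is True when the line is a
    subject heading whose vocabulary has NOT been remapped.
    """
    line = line.strip()
    mesh = re.fullmatch(r'"(.+)"\[mesh\]', line, flags=re.IGNORECASE)
    if mesh:
        # Syntax changes, but the heading is still the MeSH term.
        return f"exp {mesh.group(1)}/", True
    tiab = re.fullmatch(r"(.+)\[tiab\]", line, flags=re.IGNORECASE)
    if tiab:
        return f"{tiab.group(1)}.ti,ab.", False
    return line, False  # unrecognized lines are passed through untouched


for pubmed_line in ['"Neoplasms"[Mesh]', "tumor*[tiab]"]:
    ovid_line, needs_review = pubmed_to_ovid_syntax(pubmed_line)
    note = "  <- check heading against the target vocabulary" if needs_review else ""
    print(f"{pubmed_line:22} -> {ovid_line}{note}")
```

The flagged line is valid Ovid syntax, which is exactly why the problem is easy to miss: if the strategy is run in Embase, the heading still has to be looked up in Emtree and replaced by hand.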

Tools to identify authors based on PubMed records

Tools to identify high-frequency subject headings

References on AI and text mining in search strategy development

Useful repository of articles:

  • Bond, M., Finnerty, A., O'Mara-Eves, A., O'Driscoll, P., Thomas, J., Minx, J., Callaghan, M., & Scheelbeek, P. (2024). Digital Evidence Synthesis Tool Evaluations. EPPI Visualiser database.
    • In left column, under "Evidence Synthesis Stage": Select "Searching/Deduplicating" > Click "List Records"

Selected articles:

Liaison Librarian

Genevieve Gore
she/her
Liaison Librarian, Schulich Library of Physical Sciences, Life Sciences, and Engineering
Bluesky: @genski.bsky.social
Contact:
Macdonald Stewart Library Building
809 Sherbrooke St W
Montreal QC H3A 0C1

Librarian

Sabine Calleja
she/her/elle
Contact:
Schulich Library of Physical Sciences, Life Sciences, and Engineering

Online training: AI in searching
