AI and text mining for searching and screening the literature

This guide provides an overview of what text mining is and how it is applied in search strategy development and study selection. It includes a list of tools and resources that librarians and other motivated searchers may wish to try.

What this guide covers

This guide is intended to provide quick access for expert searchers to tools using AI -- or, more specifically, machine learning -- and text mining, in the context of searching and screening the literature. It is not an exhaustive guide but focuses on resources that do not require programming skills or the assistance of computer scientists, computational linguists, or statisticians, i.e., tools that librarians and other searchers would be comfortable using. It also includes some guidance on more advanced tools.

What is machine learning in AI?

According to a response generated by ChatGPT 3.5, machine learning is a subfield of artificial intelligence that focuses on developing algorithms and statistical models that enable computer systems to improve their performance on a task through experience, without being explicitly programmed.

What is text mining?

Text mining is the process of distilling actionable insights from, and identifying structure within, unstructured text (Kwartler 2017). It draws on statistics, computational linguistics, computer science, information studies, and other disciplines. It is most useful for large collections of unstructured text that would be impractical to analyze manually, for example when screening tens of thousands of records in a knowledge synthesis. It can also be applied to search strategy development, and to corpora composed of structured data such as bibliographic records.

How are machine learning and text mining useful in searching and study selection?

Here we will go over some of the ways in which machine learning and text mining have been used in knowledge syntheses, from the identification of relevant articles to the development of methodological search filters to other steps in the process such as screening/study selection.

What about Boolean logic?

Machine learning and text mining are being used to enhance traditional search techniques as well as to overcome their limitations. Librarians teach Boolean logic extensively in advanced searching workshops. Although Boolean searching is still important, other information retrieval methods like similarity queries and automated categorization (clustering) are showing up in the literature, and as a result useful tools are cropping up as well. It's important that searchers understand how information retrieval methods are evolving, and how they can help advance our own professional practices and those of our users.
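
To make the contrast between Boolean searching and a similarity query concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular database or tool works: the records, the seed text, and the function names are all hypothetical, and the similarity measure (cosine similarity on raw word counts) is just one simple choice among many.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def boolean_and(query_terms, text):
    """Boolean AND: True only if every query term occurs in the text."""
    tokens = set(tokenize(text))
    return all(term in tokens for term in query_terms)

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical records, invented for illustration only.
records = {
    "rec1": "Exercise therapy for chronic low back pain in adults",
    "rec2": "Physical activity interventions for persistent lumbar pain",
    "rec3": "Machine learning for screening citations in systematic reviews",
}
seed = "Exercise interventions for chronic lumbar pain"

# Boolean search: exercise AND lumbar (exact word matches only).
boolean_hits = [rid for rid, text in records.items()
                if boolean_and(["exercise", "lumbar"], text)]

# Similarity query: rank every record by how closely it resembles the seed.
ranked = sorted(records,
                key=lambda rid: cosine_similarity(seed, records[rid]),
                reverse=True)

print("Boolean hits:", boolean_hits)
print("Ranked by similarity:", ranked)
```

In this toy example the Boolean query retrieves nothing, because no single record contains both words, while the similarity query still ranks the two back pain records above the unrelated one. Boolean searching gives an exact, reproducible set; a similarity query gives a ranked list with no built-in cutoff.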

Machine learning and text mining in knowledge synthesis

Main steps in knowledge synthesis in which machine learning and text mining are being used:

  • Identification of seed articles
  • Search strategy development
  • Study selection (also known as study eligibility or screening)
  • Data extraction/abstraction
    • To validate data extracted by humans
    • This step is currently outside the scope of this guide
  • Critical appraisal
    • To validate appraisals by humans
    • This step is currently outside the scope of this guide
  • Summarizing
  • Review updating

Situating machine learning and text mining in search strategy development

Why text mining? Aren't librarians and other advanced searchers already text miners?

  • Systematic reviews aim for objectivity, transparency, and methodological rigor
  • Search strategies should also aim for objectivity, transparency, and methodological rigor
  • Librarians are great at transparency and at conveying the importance of methodological rigor and the avoidance of bias in knowledge syntheses, but what about objectivity and methodological rigor in our search strategies?
    • How do we select the terms included in our search strategies?
    • Answer: Usually in collaboration with domain experts, by examining the search strategies used in well-documented systematic reviews (e.g., Cochrane reviews), and by examining relevant studies and manually "mining" their bibliographic records for terms
  • Can we be more objective in our selection of search terms?
    • We can try (many of us already do)
  • Can we be more exhaustive in our selection of search terms?
    • Machine learning techniques may be useful for further developing our searches, for example, by using prompts to elicit synonyms from large language models such as ChatGPT

Text mining techniques

Examples of text mining techniques

  • Term extraction (for search strategy development)
    • Word/multi-word frequency analysis (e.g., EndNote, PubReminer, Systematic Review Accelerator)
    • Concordance/collocation (e.g., AntConc)
    • Automatic term recognition (e.g., TerMine)
      • In the field of natural language processing, this denotes the computer-mediated extraction of terms from a corpus using linguistic processing; also called phrase mining
  • Query expansion (for search strategy development)
  • Text categorization/classification/clustering using machine learning (for search strategy development, screening)
    • SVM...

The tools in this guide focus mainly on term extraction, although the tools for study selection/screening also frequently use machine learning for text categorization/classification/clustering.
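
As a rough illustration of word and multi-word frequency analysis, the sketch below counts single words and two-word phrases across the titles of a few known relevant records. It is a simplified approximation of the idea, not the method used by any of the tools named above: the titles, the short stopword list, and the function names are hypothetical.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; real tools use much longer lists.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequencies(texts, n=1):
    """Count n-word terms across texts, skipping any that contain a stopword."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOPWORDS for tok in gram):
                counts[" ".join(gram)] += 1
    return counts

# Hypothetical titles of known relevant ("seed") records.
seed_titles = [
    "Exercise therapy for chronic low back pain",
    "Yoga for chronic low back pain in adults",
    "Exercise therapy and education for back pain",
]

single_words = term_frequencies(seed_titles, n=1)
two_word_terms = term_frequencies(seed_titles, n=2)

# Most frequent candidates first; these are suggestions to review, not a finished strategy.
print(single_words.most_common(5))
print(two_word_terms.most_common(5))
```

Frequent words and phrases such as "back pain" or "exercise therapy" become candidate search terms. A searcher would still need to judge each candidate, add synonyms and subject headings, and test the terms in the database.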

Liaison Librarian

Genevieve Gore
she/her
Liaison Librarian, Schulich Library of Physical Sciences, Life Sciences, and Engineering

McGill Library