Text mining for searching and screening the literature

This guide gives an overview of what text mining is and how it is applied in search strategy development and study selection. It includes a list of tools and resources that librarians or other motivated searchers may wish to try.

What this guide covers

This guide is intended to provide quick access for expert searchers to text mining tools that are being used in the context of searching and screening the literature. It is not an exhaustive guide but focuses on resources that do not require programming skills or the assistance of computer scientists, computational linguists, or statisticians, i.e., tools that librarians and other searchers would be comfortable using. It also includes some guidance on more advanced tools.

What is text mining?

Text mining is the process of distilling actionable insights from, or identifying structure within, unstructured text (Kwartler 2017). It draws on statistics, computational linguistics, computer science, information studies, and other disciplines. It is most useful for large collections of unstructured textual data where manual analysis would not scale, for example when screening tens of thousands of records in a knowledge synthesis. It can also be applied in contexts such as search strategy development, and it can be used with structured data (like bibliographic records) as well.

How is text mining useful in searching and study selection?

Here we will go over some of the ways in which text mining has been used in knowledge syntheses, from the development of methodological search filters to objective search strategy development to other steps in the process such as screening/study selection.

What about Boolean logic?

Text mining and machine learning are being used to enhance traditional search techniques as well as to overcome their limitations. Librarians teach Boolean logic extensively in advanced searching workshops. Although Boolean searching is still important, other information retrieval methods like similarity queries and automated categorization (clustering) are showing up in the literature, and as a result useful tools are cropping up as well. It's important that searchers understand how information retrieval methods are evolving, and how they can help advance our own professional practices and those of our users.
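To make the contrast with Boolean searching concrete, here is a minimal sketch of a similarity query. It is an illustration only, not how any particular database or tool works: it assumes records are plain strings, uses simple word counts, and ranks records by cosine similarity to the query. The sample records and the function names (`tokenize`, `cosine`, `rank_by_similarity`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, records):
    """Return (score, record) pairs, most similar to the query first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(r))), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up titles, for illustration only.
records = [
    "Exercise therapy for chronic low back pain in adults",
    "Yoga and exercise for back pain: a randomized trial",
    "Antibiotic prophylaxis in dental surgery",
]

ranked = rank_by_similarity("exercise for low back pain", records)
for score, record in ranked:
    print(f"{score:.2f}  {record}")
```

A strict Boolean AND of every query word would retrieve only the first record, because the second lacks "low". The similarity query instead returns every record with a score, so partially matching records are ranked rather than dropped.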

Text mining in knowledge synthesis

Main steps in knowledge synthesis in which text mining is being used:

  • Search strategy development
  • Study selection (also known as study eligibility or screening)
  • Data extraction/abstraction
    • To validate data extracted by humans
    • This step is currently outside the scope of this guide
  • Critical appraisal
    • To validate appraisals by humans
    • This step is currently outside the scope of this guide
  • Review updating

Situating text mining in search strategy development

Why text mining? Aren't librarians and other advanced searchers already text miners?

  • Systematic reviews aim for objectivity, transparency, methodological rigor
  • Search strategies should also aim for objectivity, transparency, methodological rigor
  • Librarians are great at transparency and at conveying the importance of methodological rigor and the avoidance of bias in knowledge syntheses, but what about objectivity and methodological rigor in our search strategies?
    • How do we select the terms included in our search strategies?
    • Answer: Usually in collaboration with domain experts, by examining the search strategies used in well-documented systematic reviews (e.g., Cochrane reviews), and by examining relevant studies and manually "mining" their bibliographic records for terms
  • Can we be more objective in our selection of search terms?
    • We can try (many of us already do)

Text mining techniques

Examples of text mining techniques

  • Term extraction (for search strategy development)
    • Word/multi-word frequency analysis (e.g., EndNote, PubReMiner, Systematic Review Accelerator)
    • Concordance/collocation (e.g., AntConc)
    • Automatic term recognition (e.g., TerMine)
      • In the field of natural language processing, this denotes the computer-mediated extraction of terms from a corpus, using linguistic processing; also called phrase mining
  • Query expansion (for search strategy development)
  • Text categorization/classification/clustering using machine learning (for search strategy development, screening)
    • SVM...
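As an illustration of the first technique in the list above, word and multi-word frequency analysis can be sketched in a few lines. This is a simplified example under stated assumptions, not a description of how EndNote or any other listed tool works: the records are made-up titles, the stop word list is a tiny illustrative one, and the function name `term_frequencies` is hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real tools use much longer ones.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(records, n=1):
    """Count n-word terms across records.

    Terms that start or end with a stop word are skipped. Because
    tokenization discards punctuation, a term may span a comma or colon.
    """
    counts = Counter()
    for record in records:
        tokens = tokenize(record)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


# Made-up titles of known relevant studies, for illustration only.
records = [
    "Exercise therapy for chronic low back pain",
    "Low back pain in older adults: exercise therapy versus usual care",
    "Manual therapy and exercise for neck pain",
]

words = term_frequencies(records, n=1)
phrases = term_frequencies(records, n=2)

print(words.most_common(5))
print(phrases.most_common(5))
```

Frequent single words and two-word phrases in a set of known relevant records are candidate search terms. A searcher would still review them before adding any to a strategy.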

The tools in this guide focus mainly on term extraction, although the tools for study selection/screening also frequently use machine learning for text categorization/classification/clustering.
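To give a sense of what machine learning for screening involves, the sketch below trains a very small text classifier on records that a human has already labelled and then predicts a label for a new record. Note the assumptions: it uses multinomial naive Bayes with add-one smoothing (chosen because it fits in a few lines of standard-library Python), not an SVM and not the method of any specific screening tool. The class name `NaiveBayesScreener`, the labels, and the training records are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesScreener:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in any training record.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_scores(self, text):
        """Return an unnormalized log probability for each class."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][token] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up records with human screening decisions, for illustration only.
train_texts = [
    "Randomized controlled trial of exercise for low back pain",
    "Exercise therapy versus usual care: a randomized trial",
    "Case report of a rare dental infection",
    "Editorial on hospital funding policy",
]
train_labels = ["include", "include", "exclude", "exclude"]

screener = NaiveBayesScreener().fit(train_texts, train_labels)
print(screener.predict("A randomized trial of yoga for back pain"))
print(screener.predict("Editorial on dental policy"))
```

Four training records are far too few for real use. In practice such a classifier would be trained on hundreds or thousands of screened records, and its predictions would typically be used to prioritize records for human review rather than to replace it.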

Librarian

Genevieve Gore
Contact:
Schulich Library of Physical Sciences, Life Sciences, & Engineering
(Office temporarily located in the McLennan Library Building, Room M6-63)
514.398.3472
Skype contact: genatlibrary

McGill Library