Text mining for searching and screening the literature

This guide provides an overview of the definition and application of text mining in search strategy development and study selection. It includes a list of tools and resources that librarians or other motivated searchers may wish to try.

Using TerMine

Using TerMine to find multi-word terms

Available from http://www.nactem.ac.uk/software/termine/

TerMine is a useful tool for drawing out high-frequency multi-word terms from a corpus. However, it treats the corpus as a single document and does not take into account patterns across the individual documents (i.e., bibliographic records). As a result, it is not possible to know whether a term is highly frequent in only one record or common across many records in the corpus.
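
To illustrate the distinction between overall frequency and frequency across records, here is a minimal Python sketch that counts both for a handful of terms. It is not part of TerMine; the function name `term_and_record_frequency`, the sample records, and the naive substring matching are all assumptions made for illustration.

```python
from collections import Counter

def term_and_record_frequency(records, terms):
    """For each term, return (total occurrences, number of records containing it).

    Matching is a naive case-insensitive substring count (an assumption for
    this sketch); it does no tokenization or part-of-speech tagging.
    """
    total = Counter()
    in_records = Counter()
    for text in records:
        lowered = text.lower()
        for term in terms:
            n = lowered.count(term.lower())
            if n:
                total[term] += n
                in_records[term] += 1
    return {t: (total[t], in_records[t]) for t in terms}

# Hypothetical records standing in for exported titles/abstracts.
records = [
    "Text mining for systematic reviews. Text mining tools support text mining workflows.",
    "Search strategy development for systematic reviews.",
    "Study selection in systematic reviews.",
]

freq = term_and_record_frequency(records, ["text mining", "systematic reviews"])
for term, (total, n_records) in freq.items():
    print(f"{term}: {total} occurrences in {n_records} of {len(records)} records")
```

In this made-up corpus both terms occur three times overall, but "text mining" is concentrated in a single record while "systematic reviews" appears in all three. A tool that treats the corpus as one document cannot tell these two cases apart.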

  • TerMine integrates an automatic term recognition algorithm using C-values (a method combining linguistic and statistical analyses) and AcroMine acronym recognition (which uses an acronym dictionary generated from MEDLINE)
    • Based on natural language processing techniques
  • Includes an option to select a part-of-speech (POS) tagger: GENIA Tagger for biomedical texts or Tree Tagger for generic texts
  • TerMine is available through a free web demonstration or for download upon request; it is also built into the EPPI-Reviewer software, which is paid
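
To give a feel for the statistical side of the C-value, the sketch below implements the C-value formula as it is commonly stated in the term-recognition literature: a candidate's frequency, reduced by the average frequency of the longer candidates that contain it, multiplied by the base-2 logarithm of its length in words. This is an illustrative sketch only. It assumes the candidate terms have already been extracted (the linguistic step is skipped), the function name `c_values` and the sample frequencies are hypothetical, and TerMine's actual implementation may differ.

```python
import math

def c_values(freq):
    """Compute a C-value for each candidate term.

    freq maps a candidate term (a tuple of words) to its frequency in the corpus.
    """
    def contains(longer, shorter):
        # True if `shorter` appears as a contiguous word sequence inside `longer`.
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
        )

    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidates in which this term is nested.
        nesting = [freq[other] for other in freq if contains(other, term)]
        if nesting:
            f = f - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores

# Hypothetical candidate frequencies, for illustration only.
freq = {
    ("systematic", "review"): 10,
    ("systematic", "review", "search"): 4,
    ("rapid", "systematic", "review"): 2,
}
scores = c_values(freq)
for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(" ".join(term), round(score, 2))
```

Here "systematic review" occurs 10 times but is nested in two longer candidates with an average frequency of 3, so its adjusted frequency is 7. Note that in this basic form a single-word candidate would score 0, because the logarithm of 1 is 0, which is consistent with the tool's focus on multi-word terms.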

Suggestions for using TerMine to identify multi-word terms

  • Collect the bibliographic records you would like to analyze in an EndNote library (or other citation software)
  • Using EndNote: Create an output style including the record fields you would like to analyze for high-frequency terms (e.g., title, abstract, keywords); export the records using said output style to a text file
  • From the TerMine Web Demonstration interface:
    • Choose the text file you saved using the Local text file option
    • Select GENIA Tagger version 2.1 for biomedical records
    • Click Analyze
    • If the number of terms on the resulting page is too high, change the C-value threshold to highlight only terms with a given C-value or higher
    • Select in table to display the list of terms by C-value, in descending order
    • Copy the terms
    • In Excel: use Paste Special and select Text to maintain the table format
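
As an alternative to Excel for the last steps, the copied table can also be filtered with a short script. The sketch below is hedged: it assumes the pasted text is tab-separated with three columns (rank, term, score) and an optional header row, which may not match the exact layout TerMine produces. The function name `filter_terms` and the sample values are hypothetical.

```python
import csv
import io

def filter_terms(pasted_table, threshold):
    """Return (term, score) pairs with score >= threshold, highest score first.

    Assumes tab-separated rows of rank, term, score (an assumption about the
    pasted layout); rows that do not fit, such as a header, are skipped.
    """
    rows = csv.reader(io.StringIO(pasted_table), delimiter="\t")
    kept = []
    for row in rows:
        if len(row) < 3:
            continue
        try:
            score = float(row[2])
        except ValueError:
            continue  # non-numeric score, e.g. the header row
        if score >= threshold:
            kept.append((row[1].strip(), score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Made-up example values, not real TerMine output.
pasted = (
    "Rank\tTerm\tScore\n"
    "1\tsystematic review\t12.5\n"
    "2\ttext mining\t8.0\n"
    "3\tsearch strategy\t2.0\n"
)
top_terms = filter_terms(pasted, threshold=5)
for term, score in top_terms:
    print(f"{score}\t{term}")
```

Raising or lowering `threshold` plays the same role as changing the C-value threshold in the web interface: it narrows or widens the list of terms to review.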

Liaison Librarian

Genevieve Gore
Liaison Librarian, Schulich Library of Physical Sciences, Life Sciences, and Engineering