Text mining for searching and screening the literature

This guide provides an overview of the definition and application of text mining in search strategy development and study selection. It includes a list of tools and resources that librarians and other motivated searchers may wish to try.

Text mining tools for searching


Stansfield, O'Mara-Eves, and Thomas (2017) report five ways in which text mining tools can assist in search strategy development:

  1. Improving the precision of searches (i.e., the proportion of retrieved records that are relevant), for example, by identifying more precise phrasal terms instead of using single-word terms in a search
  2. Improving the sensitivity of searches (i.e., the proportion of relevant studies retrieved by the search over the total number of relevant studies in the database) by identifying additional search terms (validating this requires the development of a gold standard/quasi-gold standard/reference set, which is often used when developing search filters or hedges)
  3. Assisting in the translation of search strategies from one database and/or platform to another
  4. Searching and screening within an integrated system
  5. Developing objectively derived search strategies
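
To make the precision and sensitivity definitions above concrete, here is a minimal sketch that computes both for a hypothetical search result. All of the record counts are invented for illustration only:

```python
# Hypothetical counts for illustration only
relevant_retrieved = 40   # relevant records found by the search
total_retrieved = 500     # all records the search returned
total_relevant = 50       # all relevant records in the database (gold standard)

precision = relevant_retrieved / total_retrieved    # proportion of retrieved that are relevant
sensitivity = relevant_retrieved / total_relevant   # proportion of relevant that were retrieved

print(f"precision = {precision:.2f}")    # 0.08
print(f"sensitivity = {sensitivity:.2f}")  # 0.80
```

As the toy numbers show, a search can be highly sensitive while remaining quite imprecise, which is the usual trade-off in systematic review searching.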

Using text mining techniques to increase the objectivity of search strategies requires a more sophisticated use of tools, which librarians or other searchers may or may not be prepared to implement. Deciding on cutoffs for high-frequency terms, for example, and calculating those frequencies require a reasonably large set of relevant references (which can be derived, for example, from the included studies of relevant systematic reviews). They also require a population set of random records against which to test whether a term is high-frequency across documents in general (as with words that are frequent because of common check tags such as 'human') or only in the relevant documents.
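The frequency comparison just described can be sketched as follows. This is a toy illustration, not a production method: the two document sets contain a handful of invented titles, and the 2/3 cutoff is an arbitrary choice standing in for the frequency thresholds a real analysis would have to justify:

```python
from collections import Counter

# Invented toy corpora: titles from a relevant (gold standard) set
# and from a random population set drawn from the same database.
relevant_docs = [
    "handwashing intervention reduces infection",
    "handwashing compliance in humans",
    "infection control and handwashing",
]
population_docs = [
    "cardiac outcomes in humans",
    "survey of nursing staff",
    "infection rates in humans",
]

def doc_frequency(docs):
    """Proportion of documents containing each term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.split()))
    return {term: n / len(docs) for term, n in counts.items()}

rel_freq = doc_frequency(relevant_docs)
pop_freq = doc_frequency(population_docs)

# Candidate search terms: frequent in the relevant set but not in
# the population set (cutoffs are arbitrary choices for this sketch).
candidates = {t for t, f in rel_freq.items()
              if f >= 2 / 3 and pop_freq.get(t, 0) < 2 / 3}
print(sorted(candidates))  # ['handwashing', 'infection']
```

Note how a term like 'humans', which is frequent in both sets (a check-tag effect), is filtered out, while terms specific to the relevant set survive as candidates.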

Text mining, like data science in general, also involves a great deal of preprocessing, which tools may or may not handle. Preprocessing includes data cleaning and normalization techniques such as:

  • Changing all characters to lower case
  • Removing punctuation
  • Stripping whitespace
  • Removing numbers
  • Removing stopwords
  • Stemming
  • Lemmatization
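
The steps above can be sketched in a few lines of standard-library Python. This is a minimal stand-in, not a real pipeline: the stopword list is a tiny placeholder, the suffix-stripping "stemmer" is a crude substitute for a proper Porter/Snowball stemmer, and lemmatization is omitted because it requires a dictionary or language model:

```python
import re
import string

# A small stopword list for illustration; real tools ship much larger lists.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "for", "to"}

def preprocess(text):
    text = text.lower()                                               # lower-case
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"\d+", "", text)                                   # remove numbers
    tokens = text.split()                                             # also strips extra whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]                # remove stopwords
    # Naive suffix stripping as a stand-in for real stemming.
    tokens = [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
    return tokens

print(preprocess("Screening 1,200 Records: The effects of handwashing."))
# ['screen', 'record', 'effect', 'handwash']
```

Dedicated packages handle these steps far more carefully (multi-language stopword lists, proper stemmers, tokenizers aware of hyphens and abbreviations), which is one reason the choice of tool matters.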

Some of the tools listed allow for customization of these procedures, while others are preconfigured. Programming tools such as the tm and quanteda packages in R allow for much more flexibility than some of the tools covered here, but they are also much more difficult to use if one is not accustomed to programming.

Tools for search strategy development


Facilitating search strategy translation


These tools should be used with caution. They may apply the correct syntax to translate a strategy to another database/interface and make it seem that the subject headings have also been mapped correctly, when in fact they only change the syntax and do not adjust the subject headings to the corresponding vocabulary (e.g., when translating from PubMed or Ovid MEDLINE to Embase, they will continue to use MeSH terms instead of Emtree terms). They are useful if you understand the fundamentals of searching in the applicable databases and how the databases/platforms work, but the translated searches will still require reviewing and editing.

Tools to identify experts


Tools to identify high-frequency subject headings


References on text mining in search strategy development


Suggested references

Librarian

Genevieve Gore
Contact:
Schulich Library of Science & Engineering
514.398.3472