
Text mining for searching and screening the literature

This guide is intended to provide an overview of the definition and application of text mining in search strategy development and study selection; it includes a list of tools and resources that librarians or other motivated searchers may wish to try

Using AntConc

Using AntConc to find term frequencies

Available from

AntConc is useful for finding clusters (frequency patterns of word sequences) and n-grams (sequences of n words within your corpus or document). These are particularly helpful once you have established high-frequency words for a search strategy and need to increase the precision of your search, either by searching for phrases that contain those words or by identifying good collocates for adjacency searching.
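To make the idea of an n-gram count concrete, here is a minimal Python sketch. It is not how AntConc itself works internally; the tokenizer, the function names and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (simple, assumed rule)."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def ngram_counts(text, n):
    """Count every run of n consecutive tokens in the text.

    Note: this simple version ignores sentence boundaries, so an n-gram
    can span the end of one sentence and the start of the next.
    """
    tokens = tokenize(text)
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Hypothetical sample text, for illustration only.
sample = (
    "Text mining supports search strategy development. "
    "Text mining also supports study selection."
)
bigrams = ngram_counts(sample, 2)
print(bigrams.most_common(3))
```

In this sample, "text mining" is the only two-word sequence that occurs twice, which is the kind of recurring phrase you might then test as a phrase search.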

AntConc is best used once you have established the high-frequency words that you would like to add to your strategy. Tools such as PubReMiner and Systematic Review Accelerator's Word Frequency Analysis are easy to use for that purpose: they take into account both the occurrence of words and the number of records in which those words appear, so they could be used before performing the analysis in AntConc (in fact, the Systematic Review Accelerator also identifies n-grams). The corpus or file containing relevant bibliographic records can then be opened in AntConc for text mining. Some authors suggest analyzing titles and abstracts separately and setting different cutoffs for inclusion (less strict for titles, stricter for abstracts). Stopword lists and lemma lists can also be added to the tool, for example PubMed's list of 132 stopwords. Many such lists are available for reuse on the internet, and the choice depends on the context of the search.
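The distinction between how often a word occurs and how many records it appears in, along with separate cutoffs for titles and abstracts, can be sketched in Python as follows. This is an illustration only, not the method used by any of the tools named above. The stopword set is a tiny made-up sample (not PubMed's list), and the records and the cutoff values of 0.5 and 0.6 are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; an assumption, not PubMed's actual list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


def term_stats(texts):
    """Return (total occurrences per term, number of records per term)."""
    occurrences = Counter()
    record_counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        occurrences.update(tokens)
        record_counts.update(set(tokens))  # count each term once per record
    return occurrences, record_counts


def frequent_terms(texts, min_share):
    """Terms appearing in at least min_share of the records, sorted."""
    _, record_counts = term_stats(texts)
    total = len(texts)
    return sorted(t for t, c in record_counts.items() if c / total >= min_share)


# Hypothetical records, for illustration only.
records = [
    {"title": "Text mining for systematic reviews",
     "abstract": "We used text mining to screen records for a systematic review."},
    {"title": "Screening tools in systematic reviews",
     "abstract": "Screening tools reduce the workload of review teams."},
    {"title": "Search strategy development with text mining",
     "abstract": "Term frequency analysis informed the search strategy."},
]

titles = [r["title"] for r in records]
abstracts = [r["abstract"] for r in records]

# Assumed cutoffs: less strict for titles, stricter for abstracts.
title_terms = frequent_terms(titles, 0.5)
abstract_terms = frequent_terms(abstracts, 0.6)
print(title_terms)
print(abstract_terms)
```

Because this sketch does no lemmatization, "review" and "reviews" are counted as different terms, which is the sort of problem a lemma list is meant to address.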


One issue that can be overcome with some scripting is that groups of bibliographic records are usually exported in a single file, whereas ideally each record should be imported into AntConc as its own document (one per record) within a corpus. I have not been able to figure out how to do this yet.
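One possible approach to the splitting problem is sketched here in Python. It assumes that records in the exported file are separated by blank lines, which will not hold for every export format. The field tags, file names and sample records are hypothetical, and the script has not been tested against real exports.

```python
import re
import tempfile
from pathlib import Path


def split_records(export_text):
    """Split an export into records, assuming blank lines separate them."""
    chunks = re.split(r"\n\s*\n", export_text.strip())
    return [c.strip() for c in chunks if c.strip()]


def write_corpus(records, out_dir):
    """Write each record to its own numbered text file; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, record in enumerate(records, start=1):
        path = out / f"record_{i:04d}.txt"
        path.write_text(record + "\n", encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical two-record export, for illustration only.
sample_export = """PMID- 1
TI  - First hypothetical title
AB  - First hypothetical abstract.

PMID- 2
TI  - Second hypothetical title
AB  - Second hypothetical abstract.
"""

# A temporary folder keeps the demo self-contained; a real run would
# point write_corpus at a permanent folder instead.
with tempfile.TemporaryDirectory() as tmp:
    written = write_corpus(split_records(sample_export), tmp)
    written_names = [p.name for p in written]
    first_text = written[0].read_text(encoding="utf-8")

print(written_names)
```

The resulting folder of one-record files could then, in principle, be opened in AntConc as a corpus. Formats that mark the end of a record with a tag rather than a blank line would need a different splitting rule.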

AntConc video tutorials

AntConc tutorials by the software's creator, Laurence Anthony

Tutorial 1: Getting started
Tutorial 6: Clusters tool
Tutorial 7: N-grams tool

Liaison Librarian

Genevieve Gore
Liaison Librarian, Schulich Library of Physical Sciences, Life Sciences, and Engineering

McGill Library