Guides: AI and Text Mining for Searching and Screening the Literature: AntConc

Using AntConc

Using AntConc to find term frequencies

Available from http://www.laurenceanthony.net/software/antconc/

AntConc is a useful tool for finding clusters (frequently occurring word sequences) or n-grams (sequences of n words within your corpus or document). This is particularly useful once you have established high-frequency words for a search strategy and need to increase the precision of your search, either by searching for phrases that contain those words or by identifying good collocates for adjacency searching.
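To make the idea of n-grams and clusters concrete, the following is a minimal Python sketch of counting word sequences that contain a chosen high-frequency word. It is not how AntConc itself is implemented; the function names (tokenize, ngram_counts), the tokenization rule, and the sample titles are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple, assumed rule)."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def ngram_counts(texts, n, must_contain=None):
    """Count n-word sequences across texts, optionally keeping only those
    that include the word given in must_contain."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if must_contain is None or must_contain in gram:
                counts[" ".join(gram)] += 1
    return counts


# Made-up example titles, for illustration only.
records = [
    "Cognitive behavioural therapy for chronic pain in adults",
    "Group cognitive behavioural therapy versus usual care",
    "Behavioural therapy and exercise for chronic pain",
]

# Two-word sequences that contain the high-frequency word "therapy".
for phrase, count in ngram_counts(records, 2, must_contain="therapy").most_common(3):
    print(count, phrase)
```

Phrases that recur across many records are candidates for phrase searching, while words that repeatedly appear next to the target word are candidates for adjacency searching.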

It is best used once you have established the high-frequency words that you would like to add to your strategy. Tools such as PubReMiner and the Systematic Review Accelerator's Word Frequency Analysis are easy to use for that purpose, because they take into account not only how often words occur but also the number of records in which they appear. They can therefore be used before performing the analysis in AntConc (in fact, the Systematic Review Accelerator also identifies n-grams). The corpus or file containing relevant bibliographic records can then be opened in AntConc for text mining. Some authors suggest analyzing titles and abstracts separately and setting different cutoffs for inclusion (less strict for titles, stricter for abstracts). Stopword lists and lemma lists can also be added to the tool, for example PubMed's list of 132 stopwords. Many such lists are available online for reuse, and the choice depends on the context of the search.
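The distinction between how often a word occurs and how many records it appears in can be sketched in a few lines of Python. This is an illustrative approximation only, not the method used by PubReMiner or the Systematic Review Accelerator. The stopword set is a tiny placeholder rather than PubMed's list, the cutoff values are arbitrary, and the function names and sample records are assumptions.

```python
from collections import Counter
import re

# Tiny placeholder stopword list (an assumption); replace with a fuller list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple, assumed rule)."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def term_and_record_frequency(texts, stopwords=STOPWORDS):
    """Return (total occurrences per term, number of records containing each term)."""
    term_freq = Counter()
    record_freq = Counter()
    for text in texts:
        tokens = [t for t in tokenize(text) if t not in stopwords]
        term_freq.update(tokens)         # every occurrence counts
        record_freq.update(set(tokens))  # each record counts a term once
    return term_freq, record_freq


def frequent_terms(texts, min_share, stopwords=STOPWORDS):
    """Return terms that appear in at least min_share of the records."""
    _, record_freq = term_and_record_frequency(texts, stopwords)
    threshold = min_share * len(texts)
    return sorted(t for t, n in record_freq.items() if n >= threshold)


# Made-up titles and abstracts, for illustration only.
titles = [
    "Exercise therapy for chronic pain",
    "Chronic pain management in adults",
    "Exercise and sleep in adults",
]
abstracts = [
    "Pain was reduced after exercise. Pain scores improved.",
    "Adults with chronic pain reported less pain.",
    "Sleep improved after exercise in adults.",
]

# Arbitrary cutoffs: less strict for titles, stricter for abstracts.
title_terms = frequent_terms(titles, min_share=0.5)
abstract_terms = frequent_terms(abstracts, min_share=0.6)
print(title_terms)
print(abstract_terms)
```

Counting each record only once per term keeps a single long abstract that repeats a word many times from dominating the candidate list.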

Note

One issue, which can be overcome with some scripting, is that groups of bibliographic records are usually exported as a single file, whereas ideally each bibliographic record should be imported into AntConc as its own document (one per record) within a corpus.
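The kind of scripting meant here might look like the following Python sketch. It assumes the records were exported in RIS format, where each field line starts with a two-character tag followed by two spaces and a hyphen, and each record ends with an ER line. It also assumes that titles are tagged TI or T1 and abstracts AB or N2, which varies between databases. The function names and the output file naming are hypothetical. Other export formats would need a different splitting rule.

```python
import re
from pathlib import Path

# A RIS field line: two-character tag, two spaces, hyphen, optional space, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def split_ris_records(ris_text):
    """Yield one dict per record, mapping RIS tags to lists of values."""
    record, last_tag = {}, None
    for line in ris_text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.groups()
            if tag == "ER":  # end-of-record marker
                if record:
                    yield record
                record, last_tag = {}, None
            else:
                record.setdefault(tag, []).append(value.strip())
                last_tag = tag
        elif line.strip() and last_tag:
            # Untagged line: treat it as a continuation of the previous field.
            record[last_tag][-1] += " " + line.strip()
    if record:  # the file did not end with an ER line
        yield record


def write_corpus(ris_text, out_dir, tags=("TI", "T1", "AB", "N2")):
    """Write the chosen fields of each record to its own text file."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for number, record in enumerate(split_ris_records(ris_text), start=1):
        parts = [value for tag in tags for value in record.get(tag, [])]
        target = out_path / f"record_{number:04d}.txt"
        target.write_text("\n".join(parts) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

A call such as write_corpus(Path("export.ris").read_text(encoding="utf-8"), "corpus") would produce a folder of text files, one per record, that can be opened in AntConc as a corpus; the file name export.ris is a placeholder. Passing tags=("TI", "T1") or tags=("AB", "N2") would build separate title and abstract corpora.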