Using TerMine to find multi-word terms
Available from http://www.nactem.ac.uk/software/termine/
TerMine is a useful tool for drawing out high-frequency multi-word terms from a corpus. However, it treats the corpus as a single document and does not take into account patterns across the individual documents (here, bibliographic records). As a result, it is not possible to tell whether a term is highly frequent in only one record or common across many records in the corpus.
- TerMine integrates an automatic term recognition algorithm based on C-values (a method combining linguistic and statistical analyses) with AcroMine acronym recognition (which uses an acronym dictionary generated from MEDLINE)
- Based on natural language processing techniques
- Includes an option to select a part-of-speech (POS) tagger: GENIA Tagger for biomedical texts or TreeTagger for generic texts
- TerMine is available through a free web demonstration or for download upon request; it is also built into the EPPI-Reviewer software, which is paid
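To make the C-value idea more concrete, the following is a minimal sketch of the statistical part of the score as it is commonly published: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by the base-2 logarithm of its length in words. This is not TerMine's own code. It leaves out the linguistic filtering step entirely, and the names c_values and is_nested, as well as the sample frequencies, are hypothetical.

```python
import math
from collections import defaultdict


def is_nested(short, long):
    """Return True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freq):
    """Simplified C-value score for each candidate term.

    `freq` maps a candidate term (a tuple of words) to its corpus frequency.
    Assumption: the candidates have already passed a linguistic filter.
    """
    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a, f in freq.items():
        longer = containers[a]
        if longer:
            # Subtract the mean frequency of the containing candidates.
            f = f - sum(freq[b] for b in longer) / len(longer)
        scores[a] = math.log2(len(a)) * f
    return scores


# Hypothetical frequencies, for illustration only.
sample = {
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 7,
}
scores = c_values(sample)
```

In this sketch, "cell carcinoma" is penalized because most of its occurrences come from the longer term "basal cell carcinoma", which is the behavior that lets C-value favor complete multi-word terms over their fragments.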
Suggestions for using TerMine to identify multi-word terms
- Collect the bibliographic records you would like to analyze in an EndNote library (or other citation software)
- Using EndNote: Create an output style that includes the record fields you would like to analyze for high-frequency terms (e.g., title, abstract, keywords), then export the records to a text file using that output style
- From the TerMine Web Demonstration interface:
  - Using the Local text file option, choose the text file you saved
  - Select GENIA Tagger version 2.1 for biomedical records
  - Click Analyze
  - On the resulting page, if the number of terms is too high, change the C-value threshold to highlight only terms with a given C-value or higher
  - Select "in table" to display the list of terms by C-value, in descending order
  - Copy the terms
- In Excel: Use Paste Special and select Text to preserve the table format
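Because TerMine treats the corpus as a single document, one possible follow-up to the steps above is to count, outside TerMine, how many individual records mention each extracted term. The sketch below assumes that the exported text file separates records with blank lines, which depends on the output style you created, and that terms appear in the records exactly as TerMine lists them. If TerMine normalizes word forms, exact matching may undercount. The function names split_records and record_frequency and the sample text are hypothetical.

```python
import re
from collections import Counter


def split_records(text):
    """Split exported text into records.

    Assumption: records are separated by one or more blank lines.
    """
    return [rec for rec in re.split(r"\n\s*\n", text.strip()) if rec]


def record_frequency(terms, records):
    """Count, for each term, the number of records that mention it at least once."""
    lowered = [rec.lower() for rec in records]
    counts = Counter()
    for term in terms:
        # Whole-word, case-insensitive match of the full multi-word term.
        pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        counts[term] = sum(1 for rec in lowered if pattern.search(rec))
    return counts


# Hypothetical export and term list, for illustration only.
export = """Basal cell carcinoma in adults. Basal cell carcinoma treatment.

Treatment of melanoma

basal cell carcinoma screening"""

records = split_records(export)
counts = record_frequency(["basal cell carcinoma", "melanoma"], records)
```

Comparing these record counts with the C-value ranking helps distinguish a term that is concentrated in one record from a term that is common across the corpus.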