Text mining for searching and screening the literature

This guide provides an overview of the definition and application of text mining in search strategy development and study selection. It includes a list of tools and resources that librarians and other motivated searchers may wish to try.

Text mining tools for searching


Stansfield, O'Mara-Eves, and Thomas (2017) report five ways in which text mining tools can assist in search strategy development:

  1. Improving the precision of searches (i.e., the proportion of retrieved records that are relevant), for example, by identifying more precise phrasal terms instead of using single-word terms in a search
  2. Improving the sensitivity of searches (i.e., the proportion of relevant studies retrieved by the search over the total number of relevant studies in the database) by identifying additional search terms (validating this requires the development of a gold standard/quasi-gold standard/reference set, which is often used when developing search filters or hedges)
  3. Assisting in the translation of search strategies from one database and/or platform to another
  4. Searching and screening within an integrated system
  5. Developing objectively derived search strategies
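
To make the precision and sensitivity definitions above concrete, here is a minimal sketch that computes both for a hypothetical search result. All of the record counts are invented for illustration only:

```python
# Hypothetical counts for illustration only
relevant_retrieved = 40   # relevant records found by the search
total_retrieved = 500     # all records the search returned
total_relevant = 50       # all relevant records in the database (gold standard)

precision = relevant_retrieved / total_retrieved    # proportion of retrieved that are relevant
sensitivity = relevant_retrieved / total_relevant   # proportion of relevant that were retrieved

print(f"precision = {precision:.2f}")    # 0.08
print(f"sensitivity = {sensitivity:.2f}")  # 0.80
```

As the toy numbers show, a search can be highly sensitive while remaining quite imprecise, which is the usual trade-off in systematic review searching.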

Using text mining techniques to increase the objectivity of search strategies requires a more sophisticated use of tools, which librarians or other searchers may or may not be prepared to implement. Deciding on cutoffs for high-frequency terms, for example, and calculating those frequencies require a reasonably large set of relevant references (which can be derived, for example, from the included studies of relevant systematic reviews). They also require a population set of random records against which to test whether a term is high-frequency across documents in general (as with words that are frequent because of common check tags such as 'human') or only in the relevant documents.
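The frequency comparison just described can be sketched as follows. This is a toy illustration, not a production method: the two document sets contain a handful of invented titles, and the 2/3 cutoff is an arbitrary choice standing in for the frequency thresholds a real analysis would have to justify:

```python
from collections import Counter

# Invented toy corpora: titles from a relevant (gold standard) set
# and from a random population set drawn from the same database.
relevant_docs = [
    "handwashing intervention reduces infection",
    "handwashing compliance in humans",
    "infection control and handwashing",
]
population_docs = [
    "cardiac outcomes in humans",
    "survey of nursing staff",
    "infection rates in humans",
]

def doc_frequency(docs):
    """Proportion of documents containing each term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.split()))
    return {term: n / len(docs) for term, n in counts.items()}

rel_freq = doc_frequency(relevant_docs)
pop_freq = doc_frequency(population_docs)

# Candidate search terms: frequent in the relevant set but not in
# the population set (cutoffs are arbitrary choices for this sketch).
candidates = {t for t, f in rel_freq.items()
              if f >= 2 / 3 and pop_freq.get(t, 0) < 2 / 3}
print(sorted(candidates))  # ['handwashing', 'infection']
```

Note how a term like 'humans', which is frequent in both sets (a check-tag effect), is filtered out, while terms specific to the relevant set survive as candidates.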

Text mining, like data science in general, also involves a great deal of preprocessing, which tools may or may not handle. Preprocessing includes data cleaning and normalization techniques such as:

  • Changing all characters to lower case
  • Removing punctuation
  • Stripping whitespace
  • Removing numbers
  • Removing stopwords
  • Stemming
  • Lemmatization
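
The steps above can be sketched in a few lines of standard-library Python. This is a minimal stand-in, not a real pipeline: the stopword list is a tiny placeholder, the suffix-stripping "stemmer" is a crude substitute for a proper Porter/Snowball stemmer, and lemmatization is omitted because it requires a dictionary or language model:

```python
import re
import string

# A small stopword list for illustration; real tools ship much larger lists.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "for", "to"}

def preprocess(text):
    text = text.lower()                                               # lower-case
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"\d+", "", text)                                   # remove numbers
    tokens = text.split()                                             # also strips extra whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]                # remove stopwords
    # Naive suffix stripping as a stand-in for real stemming.
    tokens = [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
    return tokens

print(preprocess("Screening 1,200 Records: The effects of handwashing."))
# ['screen', 'record', 'effect', 'handwash']
```

Dedicated packages handle these steps far more carefully (multi-language stopword lists, proper stemmers, tokenizers aware of hyphens and abbreviations), which is one reason the choice of tool matters.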

Some of the tools listed allow for customization of these procedures, while others are preconfigured. Programming tools such as the tm and quanteda packages in R allow for much more flexibility than some of the tools covered here, but they are also much more difficult to use if one is not accustomed to programming.

Tools for search strategy development


Facilitating search strategy translation


These tools should be used with caution. They may apply the correct syntax to translate a strategy to another database/interface and make it seem that the subject headings have also been mapped correctly, when in fact they only change the syntax and do not adjust the subject headings to the corresponding vocabulary (e.g., when translating from PubMed or Ovid MEDLINE to Embase, they will continue to use MeSH terms instead of Emtree terms). They are useful if you understand the fundamentals of searching in the applicable databases and how the databases/platforms work, but the translated searches will still require reviewing and editing.

Tools to identify experts


Tools to identify high-frequency subject headings


References on text mining in search strategy development


Suggested references

Librarian

Genevieve Gore
Contact:
Schulich Library of Science & Engineering
514.398.3472