This guide gives expert searchers quick access to tools that use AI (more specifically, machine learning) and text mining for searching and screening the literature. It is not exhaustive: it focuses on resources that do not require programming skills or the assistance of computer scientists, computational linguists, or statisticians, i.e., tools that librarians and other searchers would be comfortable using. It also includes some guidance on more advanced tools.
According to a response generated by ChatGPT 3.5, machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and statistical models enabling computer systems to improve their performance on a task through experience, without being explicitly programmed.
Text mining is the process of distilling actionable insights from, or identifying structure within, unstructured text (Kwartler 2017). It draws on statistics, computational linguistics, computer science, information studies, and other disciplines. It is most useful for large collections of unstructured text where manual analysis does not scale, for example when screening tens of thousands of records in a knowledge synthesis. It can also be applied in contexts such as search strategy development, and to corpora composed of structured data (like bibliographic records).
Here we will go over some of the ways in which machine learning and text mining have been used in knowledge syntheses, from the identification of relevant articles to the development of methodological search filters to other steps in the process such as screening/study selection.
Machine learning and text mining are being used to enhance traditional search techniques as well as to overcome their limitations. Librarians teach Boolean logic extensively in advanced searching workshops. Although Boolean searching is still important, other information retrieval methods like similarity queries and automated categorization (clustering) are showing up in the literature, and as a result useful tools are cropping up as well. It's important that searchers understand how information retrieval methods are evolving, and how they can help advance our own professional practices and those of our users.
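For readers curious about how a similarity query differs from a Boolean search, here is a minimal Python sketch. It is an illustration only, not how any particular tool in this guide works: the function names (`vectorize`, `cosine`, `rank_by_similarity`) and the sample records are hypothetical, and real tools typically use more refined weighting than raw word counts. Instead of matching records against AND/OR/NOT conditions, it ranks every record by how closely its wording resembles a seed text.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity(seed, records):
    """Return (score, record) pairs, most similar to the seed first."""
    seed_vec = vectorize(seed)
    scored = [(cosine(seed_vec, vectorize(r)), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample records, made up for illustration.
records = [
    "text mining tools for screening records",
    "boolean logic workshop",
    "screening records",
]
ranked = rank_by_similarity("text mining for screening", records)
for score, record in ranked:
    print(f"{score:.2f}  {record}")
```

Unlike a Boolean query, every record receives a score, so a record that shares only some of the seed's vocabulary is ranked lower rather than excluded outright.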
Why text mining? Aren't librarians and other advanced searchers already text miners?
The tools covered here focus mainly on term extraction, although tools for study selection/screening also frequently use machine learning for text categorization, classification, or clustering.
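As a rough illustration of what term extraction involves, the following Python sketch counts how many records each word appears in and lists the most common ones as candidate search terms. It is a simplified sketch under stated assumptions: the stopword list, the function name `extract_terms`, and the sample records are all hypothetical, and the tools in this guide generally use more sophisticated methods (phrase detection, statistical weighting, controlled vocabulary mapping).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "for", "to", "with", "on"}


def extract_terms(records, top_n=5):
    """Return the top_n words ranked by the number of records containing them."""
    counts = Counter()
    for text in records:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Use a set so each word counts once per record (document frequency).
        counts.update({t for t in tokens if t not in STOPWORDS and len(t) > 2})
    return counts.most_common(top_n)


# Hypothetical sample titles, made up for illustration.
sample_titles = [
    "Machine learning for screening citations in systematic reviews",
    "Text mining to support search strategy development",
    "Screening citations with text mining in systematic reviews",
]
top_terms = extract_terms(sample_titles, top_n=6)
for term, n_records in top_terms:
    print(term, n_records)
```

Words that recur across a set of known relevant records are candidates for inclusion in a search strategy; a searcher would still review them for relevance before adding them.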