This guide gives expert searchers quick access to text mining tools used for searching and screening the literature. It is not exhaustive: it focuses on resources that do not require programming skills or the assistance of computer scientists, computational linguists, or statisticians, i.e., tools that librarians and other searchers would be comfortable using. It also includes some guidance on more advanced tools.
Text mining is the process of distilling actionable insights from unstructured text, or identifying structure within it (Kwartler 2017). It draws on statistics, computational linguistics, computer science, information studies, and other disciplines. It is most useful for large collections of unstructured textual data where manual analysis does not scale, for example when screening tens of thousands of records in a knowledge synthesis. It can also be applied in contexts such as search strategy development, and to structured data such as bibliographic records.
Here we will go over some of the ways in which text mining has been used in knowledge syntheses: developing methodological search filters, developing search strategies objectively, and supporting other steps in the process such as screening/study selection.
Text mining and machine learning are being used both to enhance traditional search techniques and to overcome their limitations. Librarians teach Boolean logic extensively in advanced searching workshops. Although Boolean searching is still important, other information retrieval methods such as similarity queries and automated categorization (clustering) are appearing in the literature, and useful tools built on them are emerging as a result. It's important that searchers understand how information retrieval methods are evolving, and how these methods can help advance our own professional practices and those of our users.
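For readers curious about what a similarity query does differently from a Boolean search, the following is a minimal sketch in Python. It is not the method of any particular tool: the function names (vectorize, cosine_similarity, rank_by_similarity) and the sample records are hypothetical, and it assumes a simple word-count representation compared with cosine similarity. Instead of including or excluding records, it scores every record against the query and ranks them.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, records):
    """Score every record against the query; most similar first."""
    query_vector = vectorize(query)
    scored = [(cosine_similarity(query_vector, vectorize(r)), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical records, for illustration only.
records = [
    "Exercise therapy for chronic low back pain",
    "Statin use and cardiovascular risk",
    "Back pain in older adults",
]
ranked = rank_by_similarity("exercise for back pain", records)
for score, record in ranked:
    print(f"{score:.2f}  {record}")
```

A Boolean query such as exercise AND "back pain" would retrieve only the first record; the similarity query also surfaces the partially matching third record, ranked lower. Real tools typically use more refined term weighting than raw word counts.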
Why text mining? Aren't librarians and other advanced searchers already text miners?
The tools covered here focus mainly on term extraction, but tools for study selection/screening also frequently use machine learning for text categorization, classification, or clustering.
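To make term extraction concrete, here is a minimal sketch in Python, assuming the simplest possible approach: counting how many records each word appears in. The function names (tokenize, extract_terms), the short stopword list, and the sample records are all hypothetical and chosen for illustration; the tools in this guide generally do considerably more (phrases, controlled vocabulary, statistical weighting).

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real tools use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def extract_terms(records, top_n=5):
    """Return the top_n terms by number of records they appear in.

    Ties are broken alphabetically so the output is reproducible.
    """
    doc_freq = Counter()
    for record in records:
        doc_freq.update(set(tokenize(record)))  # count each term once per record
    ordered = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# Hypothetical titles standing in for a set of known relevant records.
records = [
    "Exercise therapy for chronic low back pain",
    "Yoga and exercise in the management of back pain",
    "Cognitive behavioural therapy for chronic pain",
]
terms = extract_terms(records, top_n=3)
print(terms)  # [('pain', 3), ('back', 2), ('chronic', 2)]
```

Terms that occur in many of the known relevant records are candidates for a search strategy; a searcher would still review them for relevance before adding them.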