Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action" (GALE, 2017). It is an important new area for academic researchers, largely because these processes can reveal patterns and trends and support new conclusions.
McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.
Lexos is a web-based tool designed for transforming, analyzing, and visualizing small to medium text collections, and is especially useful with ancient languages and languages that do not employ the Latin alphabet.
The HathiTrust Research Center enables computational analysis of the HathiTrust corpus. It is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with HathiTrust, to help meet the technical challenges researchers face when dealing with massive amounts of digital text. It develops cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge.
HTRC Analytics is the primary site for interacting with HTRC. It provides access to HTRC worksets and off-the-shelf algorithms to analyze them. It also contains a dashboard where researchers can create a secure computing environment, called a Data Capsule. McGill Login required.
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Juxta is an open-source tool for comparing and collating multiple witnesses to a single textual work. Originally designed to help scholars and editors examine the history of a text from manuscript to print versions, this desktop application offers a number of possibilities for humanities computing and textual scholarship.
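For readers unfamiliar with collation, the idea of comparing two witnesses can be illustrated with a minimal sketch. It does not reproduce Juxta's own method. It uses the `difflib` module from the Python standard library to report word-level differences, and the function name `collate` and the two sample sentences are hypothetical, chosen only for illustration.

```python
import difflib


def collate(witness_a: str, witness_b: str) -> list[tuple[str, str, str]]:
    """Return word-level variants between two witnesses.

    Each variant is an (operation, text_in_a, text_in_b) tuple, where
    operation is "replace", "delete", or "insert".
    """
    words_a = witness_a.split()
    words_b = witness_b.split()
    # autojunk=False disables difflib's heuristic for long sequences,
    # so frequently repeated words are still compared.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    variants = []
    for op, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if op != "equal":
            variants.append(
                (op, " ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return variants


# Hypothetical sample witnesses, not taken from any real edition.
manuscript = "The quick brown fox jumps over the lazy dog"
print_edition = "The quick red fox leaps over the lazy dog"

for variant in collate(manuscript, print_edition):
    print(variant)
```

With these two sample sentences, the sketch reports two substitutions: "brown" becomes "red", and "jumps" becomes "leaps". Real collation work usually also involves normalizing spelling and punctuation and handling more than two witnesses, which this sketch leaves out.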