Sources of microdata, aggregate data, and statistics; data visualization; text and data mining; R programming; SPSS, SAS, and Stata

Text and Data Mining

Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action" (GALE, 2017). Text and data mining is emerging as an important new area for academic researchers, largely because these processes can reveal patterns and trends and support new conclusions.
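As a minimal illustration of the "detecting patterns" idea, the Python sketch below counts word frequencies across a tiny corpus. It does not attempt entity or relationship recognition, which requires dedicated natural language processing tools. The function name `top_terms`, the simple regular-expression tokenizer, the stopword list, and the sample sentences are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def top_terms(documents, n=5, stopwords=frozenset()):
    """Return the n most frequent lowercase word tokens across all documents.

    Hypothetical helper: a deliberately simple sketch, not a production tokenizer.
    """
    counts = Counter()
    for doc in documents:
        # Lowercase the text and keep runs of ASCII letters as tokens.
        tokens = re.findall(r"[a-z]+", doc.lower())
        # Skip common function words so content words stand out.
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Made-up sample corpus for illustration only.
corpus = [
    "The library licenses text corpora for researchers.",
    "Researchers mine text corpora to find patterns.",
    "Patterns in text can suggest new conclusions.",
]
print(top_terms(corpus, n=3, stopwords={"the", "for", "to", "in", "can"}))
```

On a real collection, the same counting step would typically follow a cleaning stage (removing markup, normalizing spelling, handling non-English text), and frequency counts are only a starting point before more substantive analysis.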

McGill Library can facilitate access to text corpora for McGill researchers. Assistance can include helping you locate textual data sources, negotiating access to textual collections for text mining, and, in some cases, purchasing or licensing data. We can also help you find and use tools for managing and analyzing textual data.

Free Digital Text Corpora

Large digital archives and publishers are increasingly making substantial text corpora available for researchers to mine. Here is a select list of sources that make text corpora freely available.

Text Corpora with Data Mining Rights Licensed to McGill

Whenever possible, McGill Library works with vendors to include text and data mining rights in future agreements, and it can help negotiate access for specific projects. Some of the licensed databases for which McGill has negotiated text and data mining rights are listed below.

GALE Databases:

Artemis Primary Sources

Contemporary Authors

Dictionary of Literary Biography Complete Online

Eighteenth Century Collections Online

Gale Virtual Reference Library


If you would like access to any of these collections for text mining purposes, please contact:

Tools for Mining Text Corpora

Tools and Tutorials for Mining Social Media

Tools and Tutorials for Cleaning, Visualizing and Analyzing Textual Data
