Text Data Mining (TDM)

A research guide to help you identify public and licensed text sources for text data mining, as well as tools for text analysis.

Text Corpora with Data Mining Rights Licensed to McGill

  • Digital Scholar Lab (Gale) [Staff pick]: a cloud-based research and learning platform that lets students and researchers apply natural language processing tools to raw text data (OCR text) from Gale's primary source collections in a single platform. Built-in tools include topic modeling, clustering, n-grams, named entity recognition (NER), part-of-speech tagging, and sentiment analysis. McGill's subscription includes 38 Gale databases.
  • HathiTrust Research Center gives access to Extracted Features, an unrestricted dataset of metadata and word counts for each page in the HathiTrust Digital Library; Text Analysis Algorithms, web-based, click-and-run tools that perform computational text analysis on worksets; and Data Capsules, secure virtual environments for non-consumptive text analysis where researchers can implement their own data analysis and visualization tools.
  • JSTOR - Data for Research (DfR) provides datasets of content on JSTOR for use in research and teaching. Researchers may use DfR to define and submit their desired dataset to be automatically processed. Data available through the service includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets are produced at no cost to researchers and may include data for up to 25,000 documents.
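
Several of the sources above deliver word counts rather than full text. As a rough illustration of what can be done with such data, the Python sketch below sums per-page word counts into document-level totals. The `pages` structure, the `STOPWORDS` set, and the `total_counts` function are hypothetical and chosen only for illustration; the actual files from HathiTrust or JSTOR use their own formats and would need to be parsed first.

```python
from collections import Counter

# Hypothetical per-page word counts. Real Extracted Features or DfR files
# have their own formats and must be parsed into a shape like this first.
pages = [
    {"page": 1, "counts": {"library": 3, "text": 2, "the": 10}},
    {"page": 2, "counts": {"text": 4, "mining": 5, "the": 7}},
]

# Illustrative stopword set, not a standard list.
STOPWORDS = {"the", "a", "of", "and"}


def total_counts(pages, stopwords=frozenset()):
    """Sum per-page word counts into one document-level Counter."""
    totals = Counter()
    for page in pages:
        for word, n in page["counts"].items():
            w = word.lower()
            if w not in stopwords:
                totals[w] += n
    return totals


totals = total_counts(pages, STOPWORDS)
print(totals.most_common(2))  # → [('text', 6), ('mining', 5)]
```

Because only counts are needed, this kind of analysis can often be done without access to the underlying full text.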

McGill Library works with vendors whenever possible to include text and data mining rights in future agreements, and can help negotiate access for specific projects (at no cost or at a moderate cost to the researcher). Please contact us if you need assistance obtaining text data mining rights for additional databases (e.g. Oxford Resources or ProQuest databases). Consult our Databases A-Z list to see all resources for which negotiating text-mining rights is possible.


Marcela Isuster
Humanities and Social Sciences Library
Bibliothèque des sciences humaines et sociales
3459 McTavish
Montreal, Quebec
H3A 0C9
(514) 398 - 4729
