Text Data Mining (TDM)

A research guide to help you identify public and licensed text sources for text data mining, as well as tools for text analysis.
Further to Quebec government and university directives, McGill campuses are currently closed. As a result, all library branches are closed until further notice. Library staff will continue to provide virtual reference during service hours. I will also be available via email and virtual consultation. For more information about the closure, please visit our COVID-19 | Library Closure & Service Disruptions FAQ.

Text Corpora with Data Mining Rights Licensed to McGill

  • Digital Scholar Lab (Gale), a staff pick: a cloud-based research and learning platform that lets students and researchers apply natural language processing tools to raw text data (OCR text) from Gale's primary source collections in one place. It includes built-in tools for topic modeling, clustering, n-grams, named entity recognition (NER), part-of-speech tagging, and sentiment analysis. The McGill subscription includes 38 Gale databases.
  • HathiTrust Research Center gives access to three services: Extracted Features, an unrestricted dataset of metadata and word counts for each page in the HathiTrust Digital Library; Text Analysis Algorithms, web-based, click-and-run tools that perform computational text analysis on worksets; and Data Capsules, secure virtual environments for non-consumptive text analysis, where researchers can implement their own data analysis and visualization tools.
  • JSTOR - Data for Research (DfR) provides datasets of content on JSTOR for use in research and teaching. Researchers may use DfR to define and submit their desired dataset to be automatically processed. Data available through the service includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets are produced at no cost to researchers and may include data for up to 25,000 documents.
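
For readers new to the terms used above, the following is a minimal sketch of what "word counts" and "n-grams" mean in practice. It is a toy illustration on a made-up sentence, not a description of how JSTOR, HathiTrust, or Gale build their datasets; the function names `tokenize` and `ngram_counts` and the sample text are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    # Assumption: a word is a run of letters or apostrophes; real services
    # may tokenize differently.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 gives plain word counts)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical sample text, used only for illustration.
sample = "Text mining turns text into data, and data into evidence."
tokens = tokenize(sample)

word_counts = ngram_counts(tokens, 1)    # unigrams, i.e. word counts
bigram_counts = ngram_counts(tokens, 2)  # pairs of adjacent words

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Datasets of this kind report counts rather than readable full text, which is why they are often described as supporting non-consumptive analysis.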

McGill Library works with vendors whenever possible to include text and data mining rights in future agreements and can help negotiate access for specific projects (at no cost or at moderate cost to the researcher). Please contact us if you need help obtaining text data mining rights for additional databases (e.g., Oxford Resources or ProQuest databases). Consult our Databases A-Z list to see all resources where negotiating text-mining rights is possible.

Librarian

Svetlana Kochkina
Contact:
McLennan-Redpath Library Complex
3459 McTavish Street
Montreal, Quebec H3A 0C9
514-398-7224

Librarian

Eamon Duffy
Contact:
Humanities and Social Sciences Library
3459, rue McTavish
Montréal (Québec) H3A 0C9
514-398-4697
