
Text Data Mining (TDM)

A research guide to help you identify public and licensed text sources for text data mining, as well as tools for text analysis.

Text Corpora with Data Mining Rights Licensed to McGill

  • Digital Scholar Lab (Gale) is a cloud-based research and learning platform that lets students and researchers apply natural language processing tools to raw (OCR) text from Gale's primary source collections in a single environment. Its built-in tools cover topic modeling, clustering, n-grams, named entity recognition (NER), part-of-speech tagging, and sentiment analysis. McGill's subscription includes 38 Gale databases.
  • HathiTrust Research Center provides access to Extracted Features, an unrestricted dataset of metadata and word counts for each page in the HathiTrust Digital Library; Text Analysis Algorithms, web-based, click-and-run tools that perform computational text analysis on worksets; and Data Capsules, secure virtual environments for non-consumptive text analysis in which researchers can run their own analysis and visualization tools.
  • JSTOR Text Analysis Support facilitates text analysis and digital humanities research by providing full-text datasets of journals, books, research reports, and pamphlets on JSTOR. Text analysis (also known as text analytics or text mining) is the process of using technology to find insights, trends, and patterns in text data in order to create new information.
  • IEEE Xplore API Platform and Metadata API provides metadata for IEEE Xplore articles.
  • Elsevier allows text mining of its content through its APIs. Anyone can obtain an API key and use the APIs free of charge; however, full API access is granted only to clients that subscribe to the corresponding Elsevier product. Clients without subscriptions can access limited basic metadata for most publications and citation records, as well as basic search functionality. Content that Elsevier publishes under open access licenses is fully available.
  • Wiley allows users to perform text and data mining on subscribed content for non-commercial purposes at no extra cost, under license or in accordance with statutory rights under applicable legislation.
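For researchers trying the Elsevier route, the Python sketch below shows the general shape of an API request: the personal API key travels in an HTTP header, and the search terms go in the query string. It is a minimal sketch that assumes the Scopus Search API endpoint and the `X-ELS-APIKey` header described in Elsevier's developer documentation. The function name `build_scopus_search` and the example query are hypothetical. The sketch only builds the request and does not send it. Sending it requires a valid key, and how complete the results are depends on subscription status, as described in the Elsevier entry.

```python
from urllib.parse import urlencode

# Assumed endpoint: Elsevier's Scopus Search API. Other Elsevier APIs
# (for example ScienceDirect search) use different paths under api.elsevier.com.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_scopus_search(api_key: str, query: str, count: int = 25) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: return the URL and HTTP headers for a Scopus search.

    Nothing is sent here, so the function works without network access.
    """
    # urlencode percent-encodes the query (spaces become '+', quotes become %22).
    params = urlencode({"query": query, "count": count})
    headers = {
        "X-ELS-APIKey": api_key,       # personal key from the Elsevier developer portal
        "Accept": "application/json",  # request JSON rather than XML
    }
    return f"{SCOPUS_SEARCH_URL}?{params}", headers


url, headers = build_scopus_search("YOUR_API_KEY", 'TITLE-ABS-KEY("text mining")')
# To send the request (needs network access and a valid key):
# import json, urllib.request
# req = urllib.request.Request(url, headers=headers)
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```

Keeping request construction separate from sending makes it easy to check the query before spending any of the key's request quota.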

McGill Library works with vendors whenever possible to include text and data mining rights in future agreements and can help negotiate access for specific projects (at no cost or at a moderate cost to the researcher). Please contact us if you need help obtaining text and data mining rights for additional databases (e.g., Oxford resources). Consult our Databases A-Z list to see all resources for which text-mining rights can be negotiated.


Librarian

Marcela Isuster
Contact:
Humanities and Social Sciences Library
Bibliothèque des sciences humaines et sociales
3459 McTavish
Montreal, Quebec
H3A 0C9
(514) 398-4729

Text and Data Mining

Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action" (Gale, 2017). It is an important and relatively new area for academic researchers, largely because these processes can detect patterns and trends and support new conclusions.

McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.
