Text Data Mining (TDM)

A research guide to help you identify public and licensed text sources for text data mining, as well as tools for text analysis.

Text Corpora with Data Mining Rights Licensed to McGill

  • ProQuest's TDM Studio enables researchers to text mine large volumes of published content from the millions of news articles, scholarly publications, and other publications that McGill University licenses through ProQuest. TDM Studio provides a cloud-based environment in which researchers can run queries, build datasets, and analyze the text of publications, either by writing data analysis scripts in R or Python or by using the predefined data visualizations. Contact us if you'd like to use this service.
  • Digital Scholar Lab (Gale) is a cloud-based research and learning platform that allows students and researchers to apply natural language processing tools to raw text data (OCR text) from Gale's primary source collections. Its built-in tools cover topic modeling, clustering, n-grams, named entity recognition (NER), part-of-speech tagging, and sentiment analysis. The McGill subscription includes 38 Gale databases.
  • HathiTrust Research Center gives access to three services: Extracted Features, an unrestricted dataset of metadata and word counts for each page in the HathiTrust Digital Library; Text Analysis Algorithms, web-based, click-and-run tools that perform computational text analysis on worksets; and Data Capsules, secure virtual environments for non-consumptive text analysis, where researchers can implement their own data analysis and visualization tools.
  • Constellate is a text analytics platform aimed at teaching and enabling a generation of researchers to text mine. Two of ITHAKA’s services, JSTOR and Portico, were the initial sources of content for the platform, which now also includes Chronicling America, collections from Documenting the American South, the South Asia Open Archives, and Independent Voices from Reveal Digital. Users can do three things with Constellate: teach and learn text analytics, build datasets from across multiple content sources, and visualize and analyze their datasets.
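
For readers new to the terms used in the list above, the following is a minimal sketch of two of the simplest analyses it mentions: word counts and n-grams. It is an illustration only, written in plain Python with the standard library; it is not the code that any of the platforms listed here run. The sample sentence and the function names `tokenize` and `ngram_counts` are hypothetical and chosen for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens (an n-gram)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical sample; in practice this would be a page of licensed or OCR text.
sample = (
    "Text mining finds patterns in text. "
    "Text mining scales to large collections."
)

tokens = tokenize(sample)
word_counts = Counter(tokens)      # frequency of each word
bigrams = ngram_counts(tokens, 2)  # frequency of each two-word sequence

print(word_counts.most_common(3))
print(bigrams.most_common(2))
```

The same counting logic extends to longer sequences by changing `n`. Real corpora usually need more careful tokenization (punctuation, hyphenation, OCR errors) than this regular expression provides.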

McGill Library works with vendors whenever possible to include text and data mining rights in future agreements and can help negotiate access for specific projects (at no cost or at a moderate cost to the researcher). Please contact us if you need assistance obtaining text and data mining rights for additional databases (e.g., Oxford Resources). Consult our Databases A-Z list to see all resources where negotiating text-mining rights is possible.

Librarian

Marcela Isuster
Contact:
Humanities and Social Sciences Library
Bibliothèque des sciences humaines et sociales
3459 McTavish
Montreal, Quebec
H3A 0C9
(514) 398-4729

Text and Data Mining

Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action" (GALE, 2017). It is an important new area for academic researchers, largely because these processes can reveal patterns and trends and support new conclusions.

McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.
