Guides: Text Data Mining (TDM): Text Corpora with Data Mining Rights Licensed to McGill

Text Corpora with Data Mining Rights Licensed to McGill

ProQuest's TDM Studio enables researchers to text mine large volumes of published content from the millions of news articles, scholarly publications, and other publications that McGill University licenses through ProQuest. TDM Studio provides a cloud-based environment in which researchers can execute queries, develop datasets, and analyze the text of publications, either by writing data analysis scripts in R or Python or by interacting with the pre-defined data visualizations. You will need to create an account with your McGill email address before logging in for the first time.
Digital Scholar Lab (Gale) is a cloud-based research and learning platform that allows students and researchers to apply natural language processing tools to raw text data (OCR text) from Gale's primary source collections in a single research platform. It contains built-in tools for topic modeling, clustering, n-grams, named entity recognition (NER), part-of-speech tagging, and sentiment analysis. McGill's subscription includes 38 Gale databases.

HathiTrust Research Center gives access to three services: Extracted Features, an unrestricted dataset of metadata and word counts for each page in the HathiTrust Digital Library; Text Analysis Algorithms, web-based, click-and-run tools that perform computational text analysis on worksets; and Data Capsules, secure virtual environments for non-consumptive text analysis, where researchers can implement their own data analysis and visualization tools.

Constellate is a text analytics platform aimed at teaching and enabling a generation of researchers to text mine. Two of ITHAKA’s services, JSTOR and Portico, are the initial sources of content for the platform, which now also includes Chronicling America, collections from Documenting the American South, the South Asia Open Archives, and Independent Voices from Reveal Digital. Constellate provides value to users in three core areas: they can teach and learn text analytics, build datasets from across multiple content sources, and visualize and analyze their datasets.

IEEE Xplore API Platform and Metadata API provides metadata for IEEE Xplore articles.
Elsevier allows text mining of their content through the use of their API. Anyone can obtain an API Key and use the APIs free of charge. However, full API access is only granted to clients that have subscriptions to the corresponding Elsevier product. Clients without subscriptions have access to limited basic metadata for most publications and citation records, as well as to basic search functionality. Content published by Elsevier under Open Access licenses is fully available.
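
As a rough illustration of how a word-count dataset such as the Extracted Features described above can still support analysis without access to the full text, the sketch below sums page-level word counts into volume-level totals. The `pages` structure and the `volume_counts` helper are hypothetical and invented for this example; they do not reproduce the actual HathiTrust file format or any HathiTrust tool.

```python
from collections import Counter

# Hypothetical page-level word counts for one volume.
# This is an illustrative structure, NOT the real Extracted Features format.
pages = [
    {"page": 1, "counts": {"whale": 3, "sea": 2, "ship": 1}},
    {"page": 2, "counts": {"whale": 1, "captain": 2, "sea": 1}},
]

def volume_counts(pages):
    """Sum the per-page word counts into a single Counter for the volume."""
    total = Counter()
    for page in pages:
        total.update(page["counts"])
    return total

totals = volume_counts(pages)
print(totals.most_common(2))  # [('whale', 4), ('sea', 3)]
```

Because only counts are used, this kind of analysis never needs to reconstruct the readable text of a page.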

McGill Library works with vendors whenever possible to include text and data mining rights in future agreements and can help negotiate access for specific projects, at no cost or at a moderate cost to the researcher. Please contact us if you need assistance obtaining text and data mining rights for additional databases (e.g. Oxford Resources). Consult our Databases A-Z list to see all resources where negotiating text-mining rights is possible.

Other guides

See the following related guides for:

Network Analysis & Data Visualization for Humanities and Social Sciences
Creating digital books and exhibits as well as interactive maps and timelines: Digital and Multimedia Publishing guide
GIS software & tools: Maps and Geospatial Data guide
Statistical data, software, & visualisation: Numeric Data guide

Librarian

Marcela Isuster

Contact:

Humanities and Social Sciences Library
Bibliothèque des sciences humaines et sociales
3459 McTavish
Montreal, Quebec
H3A 0C9

(514) 398 - 4729

Subjects: Digital humanities, Digital scholarship, Hispanic studies, Italian language and literature, Library and information studies

Text and Data Mining

Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action" (Gale, 2017). It is an important and relatively new area for academic researchers, largely because these processes can reveal patterns and trends across large collections and support new conclusions.
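
To make the idea of detecting patterns concrete, here is a minimal sketch of one of the simplest text mining techniques: counting n-grams (runs of n consecutive words). The sample sentence and the `tokenize` and `ngrams` helpers are assumptions made up for this illustration, and the tokenizer is deliberately naive. It is not how any of the licensed platforms implement their tools.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up sample text for illustration only.
sample = "The library licenses text corpora. The library negotiates text mining rights."

bigram_counts = Counter(ngrams(tokenize(sample), 2))
print(bigram_counts.most_common(1))  # [(('the', 'library'), 2)]
```

Note that this naive version ignores sentence boundaries, so a bigram such as ("corpora", "the") is counted even though the two words belong to different sentences. Real projects usually add sentence splitting and stop-word handling.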

McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.