Created in the course of the Text Creation Partnership project undertaken by the University of Michigan Library, the Bodleian Libraries at the University of Oxford, ProQuest, and the Council on Library and Information Resources. The Phase I content is available for access, distribution, use, or reuse by anyone; the restrictions on content created during Phase II will be removed on or about January 1, 2021.
Sample corpora assembled from Project Gutenberg by students in Alan Liu's English 197 course (Fall 2014 at UC Santa Barbara). They can be particularly useful for assignments and individual students' projects:
ARTFL: Public Databases expansive collection of French-language resources in the humanities and other fields from the 17th to 20th centuries
All of PLOS More than 200,000 fully Open Access research articles available for text data mining. The corpus of articles and metadata can be accessed via the PLOS API or directly downloaded as a zipped file.
HathiTrust Digital Library provides long-term preservation and access services for public domain and in-copyright content from a variety of sources, including Google, the Internet Archive, Microsoft, and in-house partner institution initiatives.
Internet Archive Books includes plain-text ["full text"] access to 20,000,000 books, issues of magazines, periodicals, etc.
Project Gutenberg out-of-copyright works that can be downloaded individually as plain text, with some limited automated access
Text and data mining refers to the processes by which "text or datasets are crawled by software that recognizes entities, relationships, and action." -- GALE, 2017. Text and data mining is an important new area for academic researchers, largely because these processes can reveal patterns and trends and support new conclusions.
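As a rough illustration of what "detecting patterns" can mean at the smallest scale, the sketch below counts the most frequent terms in a short passage using only the Python standard library. The function name `top_terms`, the tiny stopword list, and the sample sentence are all hypothetical choices made for this example; real projects typically rely on fuller tokenization and much larger corpora.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = frozenset({"the", "a", "of", "and", "to", "in"})

def top_terms(text, n=3, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword terms in text."""
    # Lowercase the text and keep runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Made-up sample passage, not drawn from any corpus listed in this guide.
sample = "The whale and the sea. The whale dives in the sea; the ship follows the whale."
print(top_terms(sample, 2))  # [('whale', 3), ('sea', 2)]
```

The same counting approach could be applied to a plain-text file downloaded from one of the collections listed above, though each source has its own terms of use and access methods.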
McGill Library can facilitate access to text corpora for McGill researchers. Assistance can entail helping you locate textual data sources, negotiate access to textual collections for text mining, and, in some cases, purchase or license data. We can also help you find and use tools for managing and analyzing textual data.
See the following related guides for:
Network Analysis & Data Visualization for Humanities and Social Sciences
Creating digital books and exhibits as well as interactive maps and timelines: Digital and Multimedia Publishing guide
GIS software & tools: Maps and Geospatial Data guide