Created in the course of the Text Creation Partnership project undertaken by the University of Michigan Library, the Bodleian Libraries at the University of Oxford, ProQuest, and the Council on Library and Information Resources. Phase I content is available for access, distribution, use, or reuse by anyone; restrictions on Phase II content are scheduled to be removed on or about January 1, 2021.
Sample corpora assembled from Project Gutenberg by students in Alan Liu's English 197 course (Fall 2014, UC Santa Barbara). They can be particularly useful for assignments and individual students' projects:
ARTFL: Public Databases offers an expansive collection of French-language resources in the humanities and other fields from the 17th to the 20th centuries.
All of PLOS: more than 200,000 fully Open Access research articles available for text and data mining. The corpus of articles and metadata can be accessed via the PLOS API or downloaded directly as a zipped file.
HathiTrust Digital Library provides long-term preservation and access services for public domain and in-copyright content from a variety of sources, including Google, the Internet Archive, Microsoft, and in-house partner institution initiatives.
Internet Archive Books includes plain-text ("full text") access to 20,000,000 books, magazine issues, periodicals, and other items.