Linguistic Corpora

  • These corpora have many different uses, such as finding the frequency of words, phrases, and collocates, and examining language variation, language change, and historical dialects.

  • The Corpus of Contemporary American English (COCA) is a freely available corpus of English. It was developed by Mark Davies, Professor of Linguistics at Brigham Young University. The corpus contains more than one billion words of text (25+ million words for each year from 1990 to 2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages.

  • CHILDES (Child Language Data Exchange System)
  • Russian National Corpus: a corpus of the modern Russian language incorporating over 300 million words
  • British National Corpus (BNC): a 100-million-word collection of samples of written and spoken language from a wide range of sources
  • The New York Times Annotated Corpus

Liaison Librarian

Veronica Bergsten
Humanities and Social Sciences Library
Bibliothèque des sciences humaines et sociales
3459 McTavish
Montreal, Quebec
H3A 0C9
