

Linguistic Corpora

  • Corpora have many uses, such as finding the frequency of words, phrases, and collocates, and examining language variation and change, including historical dialects.

  • The Corpus of Contemporary American English (COCA) is a freely available corpus of English, developed by Mark Davies, Professor of Linguistics at Brigham Young University. It contains more than one billion words of text (25+ million words for each year from 1990 to 2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages.

  • CHILDES (Child Language Data Exchange System)
  • Russian National Corpus: a corpus of the modern Russian language incorporating over 300 million words
  • British National Corpus (BNC): a 100-million-word collection of samples of written and spoken language from a wide range of sources
  • The New York Times Annotated Corpus
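
The frequency and collocate counts mentioned above can be illustrated with a minimal Python sketch. It runs on a made-up sample sentence, not on text from any of the corpora listed here, and the function names (tokenize, word_frequencies, collocates) are hypothetical helpers chosen for illustration. Real corpus interfaces generally offer more refined measures than these raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def collocates(tokens, node, window=2):
    """Count words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Toy sample text (an assumption for illustration, not corpus data).
sample = (
    "Strong tea is popular. She drinks strong tea daily, "
    "but he prefers strong coffee."
)
tokens = tokenize(sample)
freq = word_frequencies(tokens)
near_strong = collocates(tokens, "strong", window=1)

print(freq.most_common(3))
print(near_strong.most_common(3))
```

Raw co-occurrence counts like these favour common words, so collocation studies usually rank candidates with an association measure rather than frequency alone.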

Linguistics Librarian

Tatiana Bedjanian
McLennan-Redpath Library Complex, 3459 McTavish Street
Montreal, Quebec H3A 0C9

McGill Library