
Linguistics

Linguistic Corpora

  • English-Corpora.org: a collection of corpora with many uses, such as finding the frequency of words, phrases, and collocates, and studying language variation, language change, and historical dialects.

  • The Corpus of Contemporary American English (COCA) is a freely available corpus of English developed by Mark Davies, Professor of Linguistics at Brigham Young University. It contains more than one billion words of text (25+ million words for each year from 1990 to 2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages.

  • CHILDES (Child Language Data Exchange System)
     
  • Russian National Corpus: a corpus of modern Russian containing over 300 million words
     
  • British National Corpus (BNC): a 100-million-word collection of samples of written and spoken language from a wide range of sources
     
  • The New York Times Annotated Corpus

Liaison Librarian

Veronica Bergsten
she/her
Contact:
Humanities and Social Sciences Library
Bibliothèque des sciences humaines et sociales
3459 McTavish
Montreal, Quebec
H3A 0C9
514-396-2067
