Linguistic Corpora
- English-Corpora.org: The corpora have many different uses, such as finding the frequency of words, phrases, and collocates, and looking at language variation and change, including historical and dialectal variation.
- The Corpus of Contemporary American English is a freely available corpus of English. It was developed by Mark Davies, Professor of Linguistics at Brigham Young University. The corpus contains more than one billion words of text (25+ million words for each year from 1990 to 2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages.
- CHILDES: Child Language Data Exchange System
- Russian National Corpus: a corpus of the modern Russian language incorporating over 300 million words
- British National Corpus (BNC): a 100-million-word collection of samples of written and spoken language from a wide range of sources
- The New York Times Annotated Corpus