Corpus Linguistics

The word "corpus", derived from the Latin word meaning "body", may be used to refer to any text in written or spoken form. In modern linguistics, however, the term refers to a large collection of texts, presented in machine-readable form, that represents a sample of a particular variety or use of a language (or of several languages).

Corpus linguistics is a branch of linguistics that analyzes language using a large collection of naturally occurring texts, known as a corpus. It complements, rather than replaces, traditional approaches to linguistic analysis. Much of its power comes from computers, which make it practical to search, count, and compare patterns across very large bodies of text.
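As a rough illustration of what computer-aided analysis involves, here is a minimal Python sketch of two basic corpus routines: a word-frequency count and a keyword-in-context (KWIC) concordance, which lists every occurrence of a word with a few words of context on either side. The function names `tokenize`, `word_frequencies` and `kwic`, the simple regular-expression tokenizer, and the sample text are all hypothetical choices for this example; real corpus tools use far more careful tokenization and often add annotation such as part-of-speech tags.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase the text and
    # keep runs of letters and apostrophes, dropping punctuation.
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    # Count how often each token occurs in the (tiny) corpus.
    return Counter(tokens)


def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines with `window` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


sample = ("The corpus is large. A corpus can contain spoken or written texts, "
          "and each corpus represents a variety of language.")
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(2))  # [('corpus', 3), ('a', 2)]
for line in kwic(tokens, "corpus"):
    print(line)
```

On this toy sample the concordance prints three lines, the first being `the [corpus] is large a`. The same two routines scale unchanged to millions of words, which is exactly where manual analysis stops being practical.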

There are many different kinds of corpora. They can contain written or spoken (transcribed) language, modern or old texts, and texts from one language or several languages. The texts can be whole books, newspapers, journals, speeches, etc., or they can be extracts of varying length. Both the kinds of texts included and the way they are combined vary between corpora and corpus types.

"Corpus studies boomed from 1980 onwards, as corpora, techniques and new arguments in favour of the use of corpora became more apparent. Currently this boom continues--and both of the 'schools' of corpus linguistics are growing . . .. Corpus linguistics is maturing methodologically and the range of languages addressed by corpus linguists is growing annually."
(Tony McEnery and Andrew Wilson, Corpus Linguistics, Edinburgh University Press, 2001)

"To make good use of corpus resources a teacher needs a modest orientation to the routines involved in retrieving information from the corpus, and--most importantly--training and experience in how to evaluate that information."
(John McHardy Sinclair, How to Use Corpora in Language Teaching, John Benjamins, 2004)