Introduction to Corpus Linguistics| Features/Principles of Corpus | Urdu and Hindi | Notes PDF |

Corpus Linguistics
Corpus linguistics investigates language on the basis of electronically stored samples of naturally occurring language.
A ‘corpus’ is a collection of such language samples, stored in a principled way in order to address linguistic questions.
What is corpus linguistics?
Corpus linguistics is a field which focuses upon a set of procedures, or methods, for studying language. We can take a corpus-based approach to many areas of linguistics. Importantly, the development of corpus linguistics has also spawned new theories of language – theories which draw their inspiration from attested language use and the findings drawn from it.

But corpus linguistics is not a monolithic, consensually agreed set of methods and procedures. It is in fact a heterogeneous field – although there are some basic generalisations that we can make.

A concordance in the AntConc tool

The main features of corpus linguistics
Research in corpus linguistics deals with a set of machine-readable texts that is deemed an appropriate basis on which to study a particular research question. The set of texts, or corpus, is usually of a size which defies analysis by hand and eye alone within any reasonable timeframe. For this reason, corpora are invariably exploited using software search tools. Concordancers allow users to look at words in context. Other tools produce frequency data, for example a word frequency list, which lists all the words appearing in a corpus and specifies how many times each one occurs in that corpus. Concordances and frequency data exemplify, respectively, the two forms of analysis, qualitative and quantitative, that are equally important to corpus linguistics.
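To make the two kinds of output concrete, here is a minimal Python sketch of a word frequency list and a key-word-in-context concordance. It is only an illustration under simplifying assumptions: the tokenizer is deliberately crude, the sample text is invented, and the function names (tokenize, frequency_list, concordance) are hypothetical rather than taken from AntConc or any other tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, node, width=3):
    """Return one line per hit, showing `width` tokens on either side of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Toy sample standing in for a corpus (invented for illustration).
sample = "The big dog chased the cat. The cat saw a large dog and a big bird."
tokens = tokenize(sample)

print(frequency_list(tokens)[:3])      # quantitative view: the most frequent words
for line in concordance(tokens, "big"):
    print(line)                        # qualitative view: each hit in its context
```

The frequency list supports the quantitative side of the analysis, while reading the concordance lines one by one is the starting point for qualitative interpretation.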

Different types of corpus study
The following features effectively distinguish different types of studies in corpus linguistics:

Mode of communication;
Corpus-based versus corpus-driven linguistics;
Data collection regimes;
The use of annotated versus unannotated corpora;
Multilingual versus monolingual corpora.

Quantitative and Qualitative Analyses
"Quantitative techniques are essential for corpus-based studies. For example, if you wanted to compare the language use of patterns for the words big and large, you would need to know how many times each word occurs in the corpus, how many different words co-occur with each of these adjectives (the collocations), and how common each of those collocations is. These are all quantitative measurements....

"A crucial part of the corpus-based approach is going beyond the quantitative patterns to propose functional interpretations explaining why the patterns exist. As a result, a large amount of effort in corpus-based studies is devoted to explaining and exemplifying quantitative patterns."
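The three measurements named in the quotation (how often each word occurs, how many different words co-occur with it, and how common each collocation is) can be sketched in a few lines of Python. This is a rough illustration only: the sample sentence is invented, the helper right_collocates is a hypothetical name, and a collocate is simplified here to the single word immediately following the adjective.

```python
import re
from collections import Counter


def right_collocates(tokens, node):
    """Count the word immediately following each occurrence of `node`."""
    return Counter(
        tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == node
    )


# Invented sample text standing in for a corpus.
sample = ("a big mistake and a large number of people made a big deal "
          "about a large amount of money and a big mistake")
tokens = re.findall(r"[a-z]+", sample.lower())

for adjective in ("big", "large"):
    collocates = right_collocates(tokens, adjective)
    print(
        adjective,
        tokens.count(adjective),    # how many times the word occurs
        len(collocates),            # how many different words follow it
        collocates.most_common(),   # how common each collocation is
    )
```

The counts themselves are only the quantitative step; explaining why, say, one adjective favours quantity nouns is the functional interpretation the second paragraph of the quotation calls for.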


[...] linguistic interaction. Since a corpus is capable of representing potentially unlimited selections of text, it may be defined acrostically from the letters used to compose the term in the following way (Dash 2005: 4):
C : Compatible to both man and computer,
O : Operational in research and application,
R : Representative of a language or a variety,
P : Processable by both man and machine,
U : Unlimited in the amount of data and samples, and
S : Systematic both in formation and representation.
