Annotation of Corpora| Linguistics & Non-Linguistics Annotation| Maxims of Annotation| Urdu/Hindi

Annotation
Advantages of Annotation
How is Annotation achieved?
Linguistic & Non-Linguistic Annotation
Maxims of Annotation
Corpus annotation is the practice of adding interpretative linguistic information to a corpus. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which words in a text belong. This is so-called part-of-speech tagging (or POS tagging), and can be useful, for example, in distinguishing words which have the same spelling, but different meanings or pronunciation. If a word in a text is spelt present, it may be a noun (= 'gift'), a verb (= 'give someone a present') or an adjective (= 'not absent'). The meanings of these same-looking words are very different, and also there is a difference of pronunciation, since the verb present has stress on the final syllable. Using one simple method of representing the POS tags — attaching tags to words by an underscore symbol — these three words may be annotated as follows:

present_NN1 (singular common noun)
present_VVB (base form of a lexical verb)
present_JJ (general adjective)

What different kinds of annotation are there?
Apart from part-of-speech (POS) tagging, there are other types of annotation, corresponding to different levels of linguistic analysis of a corpus or text — for example:

phonetic annotation
e.g. adding information about how a word in a spoken corpus was pronounced.
prosodic annotation
e.g. adding information, again in a spoken corpus, about prosodic features such as stress, intonation and pauses.
syntactic annotation
e.g. adding information about how a given sentence is parsed, in terms of syntactic analysis into units such as phrases and clauses.
semantic annotation
e.g. adding information about the semantic category of words — the noun cricket as a term for a sport and as a term for an insect belong to different semantic categories, although there is no difference in spelling or pronunciation.
pragmatic annotation
e.g. adding information about the kinds of speech act (or dialogue act) that occur in a spoken dialogue — thus the utterance okay on different occasions may be an acknowledgement, a request for feedback, an acceptance, or a pragmatic marker initiating a new phase of discussion.
discourse annotation
e.g. adding information about anaphoric links in a text, for example connecting the pronoun them and its antecedent the horses in: I'll saddle the horses and bring them round. [an example from the Brown corpus]
stylistic annotation
e.g. adding information about speech and thought presentation (direct speech, indirect speech, free indirect thought, etc.)
lexical annotation
adding the identity of the lemma of each word form in a text — i.e. the base form of the word, such as would occur as its headword in a dictionary (e.g. lying has the lemma LIE).
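
One way to picture how several of these layers can sit on the same text is a small Python sketch in which each token carries optional lemma and semantic fields, and an anaphoric link is stored separately as a pointer between token positions. The class names Token and AnaphoricLink, the field names, and the helper antecedent_text are assumptions made for illustration only; real corpora use their own annotation schemes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Token:
    form: str                       # the word as it appears in the text
    lemma: Optional[str] = None     # lexical annotation (base form)
    semantic: Optional[str] = None  # semantic annotation (category label)


@dataclass
class AnaphoricLink:
    anaphor: int                # index of the pronoun token
    antecedent: Tuple[int, int]  # (start, end) token span, end exclusive


def antecedent_text(link, tokens):
    """Return the word forms of the antecedent span as one string."""
    start, end = link.antecedent
    return " ".join(t.form for t in tokens[start:end])


# Lexical and semantic layers, using the examples given in the text.
lying = Token(form="lying", lemma="LIE")
cricket_sport = Token(form="cricket", semantic="sport")
cricket_insect = Token(form="cricket", semantic="insect")

# Discourse layer: link "them" back to "the horses".
words = "I'll saddle the horses and bring them round".split()
tokens = [Token(form=w) for w in words]
link = AnaphoricLink(anaphor=6, antecedent=(2, 4))
```

Keeping the link as a separate record, rather than writing it into the tokens, means a discourse layer can be added or revised without touching the lexical and semantic layers.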
