Journal Slovo a slovesnost

Syntaktická proměna Českého akademického korpusu

Barbora Hladká, Zdeňka Urešová, Alla Bémová

[Discussion]


The syntactic transformation of the Czech Academic Corpus

A B S T R A C T
The idea of the Czech Academic Corpus (CAC) was conceived in 1971 at the Department of Mathematical Linguistics of the Czech Language Institute. By the mid-1980s, a total of 540,000 words had been manually annotated for morphology and syntax. After the Prague Dependency Treebank (PDT), the largest annotated treebank of written Czech texts, was built, the conversion of the CAC to the PDT format began. The main goal was to make the CAC and the PDT compatible and thereby enable the integration of the CAC into the PDT. The second version of the CAC is therefore a complete conversion of both the internal format and the annotation schemes. The conversion of the syntactic annotation began three years after the syntactic annotation of the PDT was finished. This situation is exceptional because, to our knowledge, there is no other language for which such a significant amount of data has been annotated in two successive projects. This article summarizes the experience gained during the conversion of the CAC syntactic annotation.

Key words: corpus, syntactic annotation, annotation guidelines, annotation checking
Keywords (Czech): korpus, syntaktická anotace, pokyny k anotaci, oprava anotací

This article is available online in the CEEOL database.

Ústav formální a aplikované lingvistiky MFF UK
Malostranské náměstí 25, 118 00 Praha 1
hladka@ufal.mff.cuni.cz
uresova@ufal.mff.cuni.cz

Slovo a slovesnost, volume 72 (2011), number 4, pp. 268-286
