Michal Křen, Martina Waclawičová
Comparison of spoken corpora: really just a matter of perspective?
Abstract
Recently, more attention has been paid to the issues of corpus design and representativeness. These issues are especially important for general-purpose language corpora such as the spoken corpora developed within the framework of the Czech National Corpus. This text is a response to Jan Chromý’s paper “Comparison of spoken corpora from a sociolinguistic perspective” (Slovo a slovesnost 78, 2017: 145–158), in which the author compares the general-purpose spoken corpus ORAL2013 with his own dataset collected for the SAUP project. We argue that some of his claims are not justified by the findings presented in the paper and that his understanding of the concept of representativeness is rather misleading. Therefore, we aim to clarify some fundamental design decisions adopted for the compilation of ORAL2013 by responding to the specific objections raised by Chromý. We also point out some methodological and reasoning inconsistencies in his paper.
Key words: corpus design, spoken Czech, metadata, regional coverage, representativeness
Klíčová slova: výstavba jazykového korpusu, mluvená čeština, metadata, regionální pokrytí, reprezentativnost
This article is available online in the CEEOL database.
Ústav Českého národního korpusu FF UK
Panská 7, 110 00 Praha 1
Slovo a slovesnost, volume 80 (2019), number 2, pp. 128–139