Semantic cataracts in text structures
ABSTRACT
Assume a semantic space demarcated by the set of lexical units occurring in a text. Their transition into word forms is a dynamic semantic process connected with the collocation of the units in text segments. The aim is to find semantic constructs and constituents and to apply Menzerath-Altmann’s law, which defines the relationship between language constructs and their constituents, to the semantic level of text. Any text is expected to show a characteristic arrangement of lexical units in their contextual bonds. This can be explained as a feature of semantic dynamism and possibly also of human thinking. Metaphorically, we can call it a semantic cataract.
SUMMARY
The search for text structures is an important subject of text linguistics. This problem has recently appeared to be solvable with the help of Menzerath-Altmann’s law. The formal structure of this law is expressed by formula (1). Its basic notions are the language construct and its size (x), the language constituent, expressed by its mean size (y), and the relationship between them (given by the parameters A and b). This is a general law valid for all language levels, including the semantic level of text. In this way, any text (in a natural language, with an uninterrupted sequence of sentences) actually becomes something like a language unit.
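Formula (1) itself is not reproduced in this summary. As a point of orientation only, Menzerath-Altmann’s law is commonly written in the literature in the truncated power form below, using the symbols x, y, A and b named above. The sign convention (b written as a negative exponent) is an assumption and may differ from the one used in the article.

```latex
% Assumed truncated form of Menzerath-Altmann's law (not quoted from the article):
% x = size of the construct, y = mean size of its constituents,
% A, b = parameters, with b < 0 so that y decreases as x grows.
y = A \, x^{b}, \qquad A > 0, \; b < 0
```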
The standard way of detecting this law at the semantic level is documented in Table 1, where z is the number of lexical units, each of which is the base of a semantic construct of the text. The size of a construct equals the number of text segments containing the given lexical unit; the size of a semantic construct is thus (usually approximately) identified with frequency, i.e. x ≡ f. These segments are the constituents of the construct, and their average size is expressed in number of words.
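The counting procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Table 1: it assumes that the text has already been split into segments and reduced to lexical units, and the function names and the toy data are hypothetical.

```python
from collections import defaultdict

def semantic_constructs(segments):
    """Map each lexical unit to (x, y).

    x = construct size: number of segments containing the unit
    y = mean constituent size: mean length, in words, of those segments
    """
    sizes = defaultdict(list)
    for seg in segments:
        for unit in set(seg):            # a segment counts once per unit
            sizes[unit].append(len(seg))
    return {u: (len(s), sum(s) / len(s)) for u, s in sizes.items()}

def table_by_construct_size(constructs):
    """Group units by x; each row is (x, z, mean y), z = number of units."""
    groups = defaultdict(list)
    for x, y in constructs.values():
        groups[x].append(y)
    return [(x, len(ys), sum(ys) / len(ys)) for x, ys in sorted(groups.items())]

# Toy data (invented): each inner list is one segment of lexical units.
segments = [
    ["dog", "bark", "night"],
    ["dog", "sleep"],
    ["cat", "sleep", "night", "long"],
]
constructs = semantic_constructs(segments)
rows = table_by_construct_size(constructs)
for x, z, y in rows:
    print(x, z, round(y, 2))
```

Each printed row has the shape of a row in a table of x, z and mean y; a real text would of course need far more segments before any Menzerathian tendency could show.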
Another way of treating this structure considers each lexical unit i occurring in a text. The semantic, or contextual, weight w_i of a unit is defined in formula (2), where S_i is the sum of the sizes of the segments containing unit i and f_i is the frequency of that unit in the text. If the values w_i are treated as a function of f_i, i.e. as w_i(f_i), and the word frequencies are arranged in Zipfian order (by descending frequency), the result is an L-shaped curve for f_i together with a “cataract curve” for w_i(f_i); see Figure 1, which concerns the same Czech text as Table 1.
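A rough sketch of how the two curves could be computed follows. Formula (2) is not reproduced in this summary, so the way S_i and f_i are combined is left as a parameter; the default S_i / f_i is only a placeholder assumption, and the function name and toy data are hypothetical.

```python
from collections import Counter

def contextual_weights(segments, weight_fn=lambda S, f: S / f):
    """Return [(rank, unit, f_i, w_i)] in Zipfian order (descending f_i).

    weight_fn stands in for formula (2), which is not given here;
    the default S_i / f_i is an assumption, not the article's definition.
    """
    # f_i: number of occurrences of the unit in the whole text
    freq = Counter(unit for seg in segments for unit in seg)
    # S_i: summed size (in words) of the segments containing the unit
    seg_sum = Counter()
    for seg in segments:
        for unit in set(seg):
            seg_sum[unit] += len(seg)
    # Ties in frequency are broken alphabetically to keep the order stable.
    order = sorted(freq, key=lambda u: (-freq[u], u))
    return [(rank, u, freq[u], weight_fn(seg_sum[u], freq[u]))
            for rank, u in enumerate(order, start=1)]

# Toy data (invented): each inner list is one segment of lexical units.
segments = [
    ["dog", "bark", "night"],
    ["dog", "sleep"],
    ["cat", "sleep", "night", "long"],
]
ranked = contextual_weights(segments)
for rank, unit, f, w in ranked:
    print(rank, unit, f, w)
```

Plotting f_i against rank would give the frequency curve, and plotting w_i against the same rank would give the weight curve; which shape the latter takes depends on the actual formula (2).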
This typical arrangement of lexical units was observed in different texts and different languages, as is documented by the Turkish text in Figure 2. Both texts, Czech and Turkish, are items from literature (a short story and a chapter of a novel), but similar structures can be found in different kinds of texts.
Each “wave” of the semantic cataract represents a distribution of values that can be characterized by three variables: max w_i(f_i), a mean value <w_i(f_i)> and min w_i(f_i), or simply max w_i, <w_i> and min w_i. Each wave, however, seems to hang on the first of these variables. Its relevance for the whole structure is shown by the fact that the points of max w_i (the peaks of the waves) lie on a Menzerathian curve. This curve corresponds to a transformed formula (1) in which y is replaced by max w_i. This new curve evidently captures the basic idea of Menzerath-Altmann’s law better: the greater the construct, the smaller the constituent.
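The fitting step can be illustrated as follows. The sketch assumes that each wave consists of the units sharing one frequency value, that the curve has the power form max w_i = A f^b, and that a least-squares fit in log-log coordinates is an acceptable way to estimate A and b; none of this is taken from the article, and the data are synthetic.

```python
import math
from collections import defaultdict

def wave_peaks(freq_weight_pairs):
    """Keep, for each frequency f, the largest weight: the peak of its wave.

    Weights are assumed to be positive.
    """
    peaks = defaultdict(float)
    for f, w in freq_weight_pairs:
        peaks[f] = max(peaks[f], w)
    return sorted(peaks.items())

def fit_power_law(points):
    """Least-squares fit of log y = log A + b * log x; returns (A, b)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    n = len(points)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    return math.exp(my - b * mx), b

# Synthetic (f_i, w_i) pairs, constructed so the peaks follow 20 * f ** -0.5.
pairs = [(1, 20.0), (1, 12.0), (4, 10.0), (4, 7.5), (16, 5.0), (16, 3.0)]
peaks = wave_peaks(pairs)
A, b = fit_power_law(peaks)
print(peaks, round(A, 3), round(b, 3))
```

A negative fitted b expresses the tendency stated above: the greater the construct, the smaller the peak weight.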
Any text is a complex system; the descriptive variables and curves are therefore subject to random fluctuations. However, it is reasonable to expect that some typical deviations from the ideal forms of the characteristic curves will be found in the future and linked to pathological digressions proper to certain brain functions.
Key words: language construct and constituent, Menzerath-Altmann’s law, text segment, semantic construct, semantic (contextual) weight of lexical units
Key words (in Czech): jazykový konstrukt a konstituent, Menzerathův-Altmannův zákon, textový segment, sémantický konstrukt, sémantická (kontextuální) váha lexikálních jednotek
This article is available online in the CEEOL database.
Orientální ústav AV ČR, v. v. i.
Pod vodárenskou věží 4, 182 08 Praha 8