Language Differences

© Wayne Paul Amsbury, 31 May 2002

 

The set of all possible digs of language fragments is intended to be comprehensive in the sense that any feature of any language can be incorporated into an appropriate one of them as needed. Some constraints on the graphs are imposed by the model, as above; others are characteristic of particular languages and may be formalized by rules of grammar, parameters, and the like.

A landig is a dig that is acceptable (grammatically correct, or in use) in a particular language. Thus there is a set of English landigs and a set of Mohawk landigs; the two sets certainly differ, in their schemas as well as in their vocabulary.

Using an example from Baker [pages 33-34], consider the English sentence: John hit Mary. The word body of this sentence is the fragment with the (incomplete) dig below. (Recognition of proper names by capitalization, and agreement in person, are omitted here for the sake of focus.)

subject  →  action  →  object
   ↓          ↓          ↓
  name       verb       name
   ↓          ↓          ↓
  John   →   hit    →   Mary
   ↕          ↕          ↕
subject  ↔    ┴     ↔  object

 

The parts-of-speech agreement between hit and both John and Mary is a consequence of the rule path subject → action → object. This is really two loops: one generated when the verb is encountered, the other when the object is encountered.
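As a minimal sketch (not the author's formalism), the English rule path can be read as a required category pattern on a three-word clause, so that each noun's role comes from where it stands relative to the verb rather than from any marking on the noun. The lexicon `CATEGORY`, the list `RULE_PATH` and the function `assign_roles_english` are hypothetical names introduced for illustration.

```python
# Toy lexicon (hypothetical): maps each word to its part of speech.
CATEGORY = {"John": "name", "Mary": "name", "hit": "verb"}

# The English rule path subject -> action -> object, expressed as the
# category expected at each position of a simple transitive clause.
RULE_PATH = [("subject", "name"), ("action", "verb"), ("object", "name")]


def assign_roles_english(tokens):
    """Return {role: word} if the tokens follow the rule path, else None.

    Roles come from position on the rule path, i.e. from where each noun
    stands relative to the verb, not from suffixes on the nouns.
    """
    if len(tokens) != len(RULE_PATH):
        return None
    roles = {}
    for word, (role, expected) in zip(tokens, RULE_PATH):
        if CATEGORY.get(word) != expected:
            return None  # agreement with the rule path fails
        roles[role] = word
    return roles


print(assign_roles_english(["John", "hit", "Mary"]))
# {'subject': 'John', 'action': 'hit', 'object': 'Mary'}
print(assign_roles_english(["John", "Mary", "hit"]))
# None
```

Swapping the two names swaps the roles, because only position along the rule path decides them.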

The English sentence has a Japanese transliteration, John-ga Mary-o butta, which has a word body dig of the form:

name    suffix   name   suffix   verb
 ↓        ↓       ↓       ↓       ↓
John  →  -ga  →  Mary  →  -o  →  butta
 ↑        ↓       ↑       ↓
subject  ←┘     object   ←┘

There is no need for the subject and object to be determined by agreement; they are determined by the suffixes. The suffixes can also determine an alternate structure: John-o Mary-ga butta, in contrast, translates as Mary hit John and has the word body dig:

name   suffix   name    suffix   verb
 ↓       ↓       ↓        ↓       ↓
John  →  -o  →  Mary  →  -ga  →  butta
 ↑       ↓       ↑        ↓
object  ←┘    subject    ←┘
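By contrast, a Japanese-style fragment can be decoded from its suffixes alone, whatever the order of the nouns. The sketch below assumes a hypothetical token stream in which each noun is immediately followed by its case suffix and the verb comes last; `CASE_SUFFIX` covers only the two suffixes in the examples, and `assign_roles_japanese` is an illustrative name.

```python
# Hypothetical case-suffix table covering only the two suffixes used above.
CASE_SUFFIX = {"-ga": "subject", "-o": "object"}


def assign_roles_japanese(tokens):
    """Assign roles from case suffixes in a toy, verb-final token stream.

    Assumes the pattern noun suffix noun suffix ... verb: every noun is
    immediately followed by its suffix, and the last token is the verb.
    """
    roles = {"action": tokens[-1]}
    nouns = tokens[:-1:2]      # tokens 0, 2, ... before the verb
    suffixes = tokens[1:-1:2]  # tokens 1, 3, ... before the verb
    for noun, suffix in zip(nouns, suffixes):
        roles[CASE_SUFFIX[suffix]] = noun
    return roles


print(assign_roles_japanese(["John", "-ga", "Mary", "-o", "butta"]))
# {'action': 'butta', 'subject': 'John', 'object': 'Mary'}
print(assign_roles_japanese(["John", "-o", "Mary", "-ga", "butta"]))
# {'action': 'butta', 'object': 'John', 'subject': 'Mary'}
```

Exchanging the suffixes exchanges the roles while the nouns and the verb stay where they are, which is the contrast between the two Japanese digs.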

 

Neither of the Japanese digs above can be an English landig, and the English dig cannot be a Japanese landig. Aside from the difference in object-verb order, the feedback loop in the English fragment that determines the subject/object category of each noun comes from the verb by way of a rule path, not from suffixes on the nouns.

Consider the Japanese example John-o Mary-ga butta above in terms of the search process: John-o may be recognized, and known to be an object, before Mary-ga is even encountered, though perhaps not, because John is not a common Japanese name. It is certainly true that in Japanese a word that is an object is known to be so before the verb is encountered; in English it is not.
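The timing point can be made concrete with a toy incremental reader that records, for each noun, the index of the token at which its role becomes known. The rules are simplifying assumptions, not a model taken from the source: in English a noun's role is settled no earlier than the verb (the rule path runs through it), while in Japanese it is settled when the noun's case suffix arrives. Name-recognition delays, such as the one noted for John, are ignored, and `role_known_at` is a hypothetical name.

```python
def role_known_at(tokens, verb, language):
    """Return ({noun: index where its role is settled}, index of the verb).

    Hypothetical rules for illustration only:
      english  - each noun's role is fixed by its position relative to the
                 verb, so it is settled at the later of the noun's own
                 index and the verb's index;
      japanese - a case suffix (a token starting with "-") settles the
                 role of the noun immediately before it.
    """
    verb_at = tokens.index(verb)
    known = {}
    for i, tok in enumerate(tokens):
        if language == "english" and tok != verb:
            known[tok] = max(i, verb_at)
        elif language == "japanese" and tok.startswith("-"):
            known[tokens[i - 1]] = i
    return known, verb_at


print(role_known_at(["John", "-o", "Mary", "-ga", "butta"], "butta", "japanese"))
# ({'John': 1, 'Mary': 3}, 4)
print(role_known_at(["John", "hit", "Mary"], "hit", "english"))
# ({'John': 1, 'Mary': 2}, 1)
```

In the Japanese stream the object John is settled at token 1, before the verb at token 4; in the English stream the object Mary follows the verb and so cannot be settled before it.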

Embedded tokens such as prefixes and suffixes can affect the category of words other than their own. Baker [page 9] provides a Navaho example that is paraphrased as: boy girl yi-saw, which corresponds to the English, boy saw the girl. It has a word body dig of the following form:

                     prefix     verb
                        ↓         ↓
boy    →    girl   →   yi-   →   saw
 ↑           ↑          ↓
subject ↔ object        ↓
 └ ← ← ← ← ← ← ← ← ← ← ←┘

In contrast, boy girl bi-saw has the English equivalent: girl saw the boy and a word body dig of the form:

                     prefix     verb
                        ↓         ↓
boy    →    girl   →   bi-   →   saw
 ↑           ↑          ↓
object ↔ subject        ↓
             ↑          ↓
             └ ← ← ← ← ←┘

Consider the Navaho example paraphrased as: boy girl yi-saw above. It is intuitively clear that, at least for the spoken stream, the first noun is usually resolved as a specific name before the prefix yi- is even encountered, and yet the sense of the fragment cannot be fully understood until some mental analog mig of the fragment dig is determined. Nonetheless, the first noun may be slow in coming, or even irresolvable, and the general structure may be determined either first or (apparently) at the same time.
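The two Navaho examples can be sketched the same way. Using the English glosses from the paraphrase (saw stands in for the actual Navaho verb), the prefix arrives after both nouns yet decides which of them is the subject. The rule below is an assumption drawn only from these two examples, and `assign_roles_navaho` is a hypothetical name.

```python
def assign_roles_navaho(tokens):
    """Toy model of the paraphrased pattern: noun noun prefix-verb.

    Assumed from the two examples only: with yi- the first noun is the
    subject; with bi- the second noun is the subject.
    """
    first, second, verb = tokens
    prefix, _, stem = verb.partition("-")
    if prefix == "yi":
        subject, obj = first, second
    elif prefix == "bi":
        subject, obj = second, first
    else:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return {"subject": subject, "action": stem, "object": obj}


print(assign_roles_navaho(["boy", "girl", "yi-saw"]))
# {'subject': 'boy', 'action': 'saw', 'object': 'girl'}
print(assign_roles_navaho(["boy", "girl", "bi-saw"]))
# {'subject': 'girl', 'action': 'saw', 'object': 'boy'}
```

A linear reader of this kind must hold both nouns with undetermined roles until the prefix is read, which corresponds to the loop drawn from the prefix back to the nouns in the Navaho digs.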

Some languages have multiple suffixes or prefixes and some place multiple verbs adjacent to each other. Some have verbs split into two widely separated pieces; some pack the equivalent of an English sentence into a "word."

It is clear that such constructions are governed by grammatical rules, and that they affect the search for an interpretation, if only because the relevant tokens are encountered in a specified sequence. Beyond sequencing, however, the rules are a major control on the construction process.

The overview of all such constructions is that fragment tokens occur in a stream, some tokens affect the class of other tokens, and some categories are constrained to precede or follow others by the rules of the language of the fragment. There appears to be no such structure or rule that cannot be modeled by a landig.

There are several (imprecise) attributes of languages that may be used for comparison:

         The expressiveness of a language is related to its flexibility, to the number of choices that it makes available in both vocabulary and schemas. The creation and the borrowing of words enhance expressiveness.

         The efficiency of a language is related to the limits placed on both vocabulary and schemas. The overloading of terms and a complex grammar diminish efficiency, a simple grammar enhances it.

         The power of a language lies in how it balances these two opposing factors, the balance providing its approximation of some optimum. Clearly, every individual has their own point in the space of possible language balances.

         The speed of a language is the relative number of steps with which equivalent fragments can be processed.

It will be argued below that speed is not simply a token count. By one measure, the processing speed of the Japanese and English equivalents above is the same. By another, the tokens of English do more work; English is relatively terse.

Parameters. Some aspects of landigs in a particular language are invariant; they are characteristic features of that language. In particular, the parameters of Baker are constraints on landigs. In a subject-before-verb landig, there is no directed path from a verb to its subject. In a verb-before-subject landig, the opposite is true. According to Baker, the landigs in these two categories are determined by a parameter, and they cannot be mapped to each other within a given language while retaining parts of speech. It is highly improbable for both constructs to be common in the same language.
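The directed-path condition can be checked mechanically. The sketch below keeps only the token-order links of a word body (each word points to the next, as in the → rows of the digs) and runs a breadth-first search; ignoring the category and role nodes is a simplifying assumption, the helper names `has_path` and `chain` are hypothetical, and the verb-before-subject order is a toy example rather than a sentence from a particular language.

```python
from collections import deque


def has_path(edges, start, goal):
    """Breadth-first search for a directed path from start to goal."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def chain(*words):
    """Token-order links of a word body: each word points to the next."""
    return list(zip(words, words[1:]))


svo = chain("John", "hit", "Mary")   # subject-before-verb
vso = chain("hit", "John", "Mary")   # verb-before-subject (toy order)
print(has_path(svo, "hit", "John"))  # False
print(has_path(vso, "hit", "John"))  # True
```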

Parameters differ from the lexical elements of language in that they do not correspond to structural nodes or paths; they limit either the choice of structure nodes or the direction of links between nodes. Typically parameters partition the space of potential languages into two or more subsets, but there may be statistical overlap in some cases. To generalize:

         Parameter effect. A parameter is a selection rule that limits the landigs of a language to a proper subset of those possible.
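As a minimal illustration of the parameter effect, take the possible landigs to be just the six orderings of subject (S), verb (V) and object (O), and write a hypothetical subject-before-verb setting as a predicate. Reducing landigs to bare word orders is an assumption made only for this sketch.

```python
from itertools import permutations

# Every ordering of subject (S), verb (V) and object (O): the "possible" set.
ALL_ORDERS = {"".join(p) for p in permutations("SVO")}


def subject_before_verb(order):
    """A hypothetical parameter setting, written as a predicate on orders."""
    return order.index("S") < order.index("V")


allowed = {o for o in ALL_ORDERS if subject_before_verb(o)}
print(sorted(allowed))       # ['OSV', 'SOV', 'SVO']
print(allowed < ALL_ORDERS)  # True: a proper subset
```

The opposite, verb-before-subject setting keeps exactly the other three orders (VSO, VOS, OVS), so the two settings partition the six orders.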

Some of the parameters that are recognized by linguists are easily illustrated.

Null-subject Parameter. In some languages every tensed clause must have an overt subject, in others no overt subject is required.

Following Baker [pages 35-44], in order to be terse about the weather, one may say:

It is raining. (English)

Il pleut. (French)

Llueve. (Spanish)

Piove. (Italian)

In English, She will come. In Italian, Verrà, which also serves for He will come.

This is a choice between the rule path: verb and the more complete path: subject → verb. It is not at all clear that one system has advantages over the other, since one requires the look-up of a subject and the other relies on context that may or may not be immediately obvious. What is clear is that the language process is easier if only one of these systems is used.
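The choice can be sketched as a boolean parameter on a clause checker. The lexicon, the hyphenated single-token verbs such as is-raining, and the function `clause_ok` are hypothetical simplifications; real tensed clauses have far more structure.

```python
# Hypothetical toy lexicon: each token is tagged subject or verb.
LEXICON = {
    "it": "subject",
    "she": "subject",
    "is-raining": "verb",
    "will-come": "verb",
    "llueve": "verb",
    "verrà": "verb",
}


def clause_ok(tokens, null_subject):
    """Accept a tensed clause under a toy null-subject parameter.

    The rule path subject -> verb is always accepted; the bare rule path
    verb is accepted only when null_subject is True.
    """
    cats = [LEXICON[t] for t in tokens]
    if cats == ["subject", "verb"]:
        return True
    return null_subject and cats == ["verb"]


print(clause_ok(["it", "is-raining"], null_subject=False))  # True  (English)
print(clause_ok(["is-raining"], null_subject=False))        # False (English)
print(clause_ok(["llueve"], null_subject=True))             # True  (Spanish)
print(clause_ok(["verrà"], null_subject=True))              # True  (Italian)
```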

Head-Directionality Parameter. In some languages heads follow phrases in forming larger phrases, in others heads precede phrases during accretion.

Baker provides a table [page 60] that shows a number of specialized rules which are inextricably linked to this parameter. Its basic form is that an encompassing phrase is formed either from the rule path: head → phrase or from the path: phrase → head. In a head-first language one says He looked left, not the equivalent of He left looked.

A head-first language has prepositions (literally pre-positions), whereas a head-last language has postpositions. Baker [page 61] gives an example from the Sioux language Lakhota that transliterates into: John letter that bed the under found instead of: John found the letter under the bed. It should be obvious that dealing with mixed use of both rule paths would be very confusing, but there is no apparent absolute advantage of one system over the other.
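The head-directionality parameter can be sketched as a single switch in a tree linearizer. The phrase tree below is a simplified, hypothetical analysis of found the letter under the bed (determiners treated as heads, the subject John kept outside the switched phrases); with head_first=False it reproduces the word order of Baker's Lakhota transliteration, with the in place of that. `Phrase` and `linearize` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phrase:
    """A head word plus the phrases it combines with (its complements)."""
    head: str
    complements: List["Phrase"] = field(default_factory=list)


def linearize(p: Phrase, head_first: bool) -> List[str]:
    """Spell out a phrase, placing each head before or after its complements."""
    rest: List[str] = []
    for c in p.complements:
        rest.extend(linearize(c, head_first))
    return [p.head] + rest if head_first else rest + [p.head]


# found [the [letter]] [under [the [bed]]]
vp = Phrase("found", [
    Phrase("the", [Phrase("letter")]),
    Phrase("under", [Phrase("the", [Phrase("bed")])]),
])

print(" ".join(["John"] + linearize(vp, head_first=True)))
# John found the letter under the bed
print(" ".join(["John"] + linearize(vp, head_first=False)))
# John letter the bed the under found
```

Because one flag governs every phrase, a single setting yields a consistent order throughout; mixing the two would require deciding head placement phrase by phrase.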

Polysynthesis Parameter. Verbs must include some expression of each of the main participants in the event described by the verb (the subject, object, and indirect object).

Languages that obey this rule tend to have complex words packed with information that would be spread out over a sentence in languages that do not, such as English. Baker [Chapter 4] explores some of both the similarities and differences between Mohawk and English, and some of his examples are necessarily borrowed here.

Prefixes in Mohawk act as pronouns, denoting subject, direct object, or indirect object. Furthermore, they denote combinations of them, 58 in all. For instance, the one-word sentence Shakonuhwe’s (He likes her.) differs from Shakwanuhwe’s (He likes us.) only in the prefix. As a search process, these prefixes provide information about the subject and the object of the verb prior to, and independent of, the verb encounter, at the cost of sorting through the 58 possible pairings.
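The search advantage can be sketched as a prefix lookup that yields subject and object before the verb stem is examined. Only the two prefixes from the examples above are listed (segmented as shako- and shakwa- before the stem nuhwe’s); the table, the longest-match rule, and `decode` are illustrative assumptions, not a description of Mohawk morphology.

```python
# Two of the 58 prefix combinations, taken from the examples in the text.
PREFIXES = {
    "shako": ("he", "her"),
    "shakwa": ("he", "us"),
}


def decode(word):
    """Split a word into (subject, object, verb stem) by prefix lookup.

    The longest prefix is tried first, so a prefix that happens to begin
    another cannot shadow it. Subject and object are fixed as soon as the
    prefix matches, before the stem is looked at.
    """
    lowered = word.lower()
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix):
            subject, obj = PREFIXES[prefix]
            return subject, obj, lowered[len(prefix):]
    raise ValueError(f"no known prefix in {word!r}")


print(decode("Shakonuhwe’s"))   # ('he', 'her', 'nuhwe’s')
print(decode("Shakwanuhwe’s"))  # ('he', 'us', 'nuhwe’s')
```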

Two more complex examples demonstrate how digs relate to such constructions:

     pair prefix    verb    article
          ↓           ↓        ↓
subject↔┐ ↓ ┌↔↔↔↔↔↔↔↔↔↓↔↔↔↔↔↔↔↔↓↔↔↔↔↔ object
 ↓      ↕ ↓ ↕         ↓        ↓         ↓
Sak  →  shako-  →  nuhwe’s  →  ne  →  owira’a
Sak     he/her      likes      the     baby

 

This has a simpler form in which a noun form for baby is also incorporated into the verb:

 

  subject      object  sound     verb
  ↓     ↕        ↓       ↓         ↓
Sak  →  ra-  →  wir-  →  a-  →  nuhwe’s
Sak     he      baby             likes

Given equivalent vocabularies, there is no apparent advantage in adopting polysynthesis or not. However, there is a subtle difference between English and Mohawk that is associated with the search but not with the lexical structure. The verb likes could be cuddles, dresses, holds, or something else. Taking the extreme of a linear search, something about the subject and object, at least the gender of the latter, is known in Mohawk prior to the encounter with the verb, but this will not happen in English.

 
