The Metasearch Process

               © Wayne P. Amsbury 31 May 2002

 

It is clear that language fragments cannot always be processed as a whole: there are just too many patterns to be considered for this to be a viable thesis. The size of a natural language is open-ended; it is re-created from moment to moment. Even from the limited number of (English) examples above, it is also clear that tokens are not always fully understood in token order. Nor is it feasible to always predict which token in a fragment will be understood first.

Simple English examples, common knowledge, and folk wisdom bolster these conclusions when they are considered in light of digs, migs, and search spaces. Even so, metasearches fall within a spectrum, from linear to parallel, with the interweaving of component searches falling between these extremes.

There are certainly English sentences that might be interpreted token by token. A rather famous example is: Let them eat cake. (Surely the last word was a surprise.) Many readers appear to engulf words in chunks, moving eye focus from spot to spot, and so could possibly be interpreting a chunk at a time. Both of these extremes are useful models of language processes at times, but the examples of feedback in previous sections make them unusable for most interpretation.

Generally there are several constraints in effect on any fragment.

Consider the recognition of ’Twas brillig. The translation of ’Twas into It was and the structure schema noun → verb → adjective are not easily separated. For someone who has not seen this fragment before, the schema would surely be complete before brillig was identified as a new word, but not before brillig was recognized as an adjective. However, someone unfamiliar with the contraction ’Twas in ’Twas sunny might first derive verb → adjective before expanding ’Twas into It was, and only then arrive at an understanding of the full schema. Much of this was implicit in Lewis Carroll's creation of Jabberwocky.

The search process is not a circle, but rather a spiraling iteration. Intuitively, it is carried out for every token, and the final results are consistent across all tokens.
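Here is a minimal sketch of that spiral, under heavy simplifying assumptions. Each token carries a set of candidate readings, and every pass revisits every token. A reading survives only while some reading of each neighboring token supports it. The loop stops when a whole pass changes nothing, which is one way to read "consistent across all tokens." The reading labels, the `SUPPORTS` table, and the function `spiral` are hypothetical. The example borrows the Joan looked happy fragment discussed under Verbs.

```python
# Hypothetical sketch of the "spiraling iteration": a fixpoint loop over tokens.
# The readings and the SUPPORTS pairs are illustrative assumptions.

# (reading of a token, reading of the token after it) pairs allowed to co-occur.
SUPPORTS = {
    ("noun", "linking"), ("noun", "action"),
    ("linking", "adjective"),   # a linking verb takes an adjective complement
    ("action", "adverb"),       # an action verb may be modified by an adverb
}


def spiral(candidates):
    """Prune readings until a full pass over every token changes nothing."""
    cands = [set(c) for c in candidates]
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for i in range(len(cands)):
            keep = set(cands[i])
            if i > 0:   # keep readings that some reading of the left neighbor supports
                keep = {r for r in keep
                        if any((left, r) in SUPPORTS for left in cands[i - 1])}
            if i + 1 < len(cands):   # ...and that some reading of the right neighbor supports
                keep = {r for r in keep
                        if any((r, right) in SUPPORTS for right in cands[i + 1])}
            if keep != cands[i]:
                cands[i] = keep
                changed = True
    return cands, passes


# "Joan looked happy": looked stays ambiguous until happy is known to be an adjective.
readings, passes = spiral([{"noun"}, {"linking", "action"}, {"adjective"}])
print(readings, passes)
```

Here the second pass only confirms that nothing else changes. In a longer fragment, pruning late in one pass can trigger further pruning earlier in the next pass, and that is where the spiral metaphor fits.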

It is intuitively clear that in a complex sentence, or even a fairly simple fragment, the component fragments are being resolved concurrently with the whole. The fragments must form a cohesive whole, some of them are encountered and their processing begun before the last of them, and all must be resolved into cohesive supporting units before the whole is understood.

There are also choices of constraint that may be applied to a token, depending upon its evolving role during the search for an interpretation. This is easily demonstrated for both nouns and verbs.

Nouns. During interpretation, the recognition that a word is a noun of a fragment restricts it to a limited number of roles in the fragment, and thus limits it to participation in only a subset of rules of the language.

If a word is recognized as falling into the category noun, then it lies in the intersection of categories such as subject and object. These categories force a selection of the lexical rules that can include the word, such as verb before object in English. If such a rule is violated by the token order, then the word falls into some other category, with some other set of rules that must apply.

An alternate category has implications for the comprehension of a word previously thought to be a noun.
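As a toy illustration, assume a tiny hypothetical lexicon and a single clause pattern. A word's preferred category can then be tried first and abandoned when the token order violates the rules that category selects. The sentence The old man the boats is a standard garden-path example, not one from this text. At first, man is taken as a noun and no reading of the fragment survives. Man then falls into the verb category, which in turn forces old to be reread as a noun. `LEXICON`, `CLAUSE`, and `interpret` are hypothetical names, and a regular expression over category labels stands in for the lexical rules.

```python
import itertools
import re

# Hypothetical lexicon: candidate categories per word, preferred reading first.
LEXICON = {
    "the": ["det"],
    "old": ["adj", "noun"],
    "man": ["noun", "verb"],
    "boats": ["noun"],
}

# Toy clause rule (an assumption): subject noun phrase, verb, optional object
# noun phrase. The object slot encodes "verb before object".
CLAUSE = re.compile(r"det( adj)* noun verb( det( adj)* noun)?")


def interpret(words):
    """Try category assignments in order of preference; return the first that fits."""
    options = [LEXICON[w] for w in words]
    for cats in itertools.product(*options):   # all-preferred readings come first
        if CLAUSE.fullmatch(" ".join(cats)):
            return list(cats)
    return None   # no reading satisfies the rules


print(interpret("the old man the boats".split()))   # ['det', 'noun', 'verb', 'det', 'noun']
```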

A noun phrase must be fully resolved in order to play its role as a noun in the rules governing a containing fragment. Such a phrase may have the very same structure as the containing fragment. It is nested and the processes involved are iterated, or recursive.

(The technical difference is that an iteration may be identical at each pass, with its exit controlled from outside the repeated step, typically by a loop condition. A recursive procedure calls itself, but each call must change something, usually by working on a smaller piece of the problem, so that a base case is eventually reached and the procedure can exit.)
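The difference can be shown on a nested noun phrase. This sketch assumes a hypothetical representation in which a phrase is either a bare head noun or a pair (head, modifier) whose modifier is itself a noun phrase. Both procedures collect the head noun at each level of nesting. The recursive version calls itself on a strictly smaller phrase until it reaches the base case. The iterative version repeats an identical step and exits when the loop test outside that step fails.

```python
# Hypothetical representation: a noun phrase is a bare head noun (str) or a
# (head, modifier) pair whose modifier is itself a noun phrase.
phrase = ("dog", ("cat", "rat"))   # the dog [that chased the cat [that ate the rat]]


def heads_recursive(np):
    """Recursion: the procedure calls itself on a strictly smaller phrase."""
    if isinstance(np, str):          # base case: a bare noun ends the recursion
        return [np]
    head, modifier = np
    return [head] + heads_recursive(modifier)


def heads_iterative(np):
    """Iteration: an identical step repeats; the loop test outside it controls exit."""
    heads = []
    while not isinstance(np, str):   # exit is decided here, not inside the step
        head, np = np
        heads.append(head)
    heads.append(np)
    return heads


print(heads_recursive(phrase))   # ['dog', 'cat', 'rat']
print(heads_iterative(phrase))   # ['dog', 'cat', 'rat']
```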

Verbs. Consider the fragment Joan looked happy. This has a linking verb, looked, with the meaning appeared to be and where happy modifies Joan as an adjective. In contrast the fragment Joan looked happily (at me) uses looked as an action verb, where happily is an adverb modifying looked. It is not possible to recognize the role of the verb or the overall structure of the fragment until the part of speech of happy or happily is recognized. Ignoring the past tense of looked and some other details, these can be placed in (incomplete) digs as follows:

subject  →  link     →  complement
   ↓         ↓              ↓
 noun       verb        adjective
   ↓         ↓              ↓
 Joan   →  looked  →      happy
   ↑         ↕              ↕  ↓
          linking ↔↔↔↔↔↔↔↔↔↔┘  ↓
   └←←←←←←←←←←←←←←←←←←←←←←←←←←┘

subject  →  action   →  →  →  object
   ↓         ↓
 noun       verb        adverb ←┐
   ↓         ↓             ↓    ↑
 Joan   →  looked  →    happi → -ly
             ↕            ↕
          action ↔↔↔↔↔↔↔↔↔┘

Reading comprehension may or may not occur in chunks, but in light of the dig model of the data stream, this raises the question: How are the chunks processed? If they are really comprehended all at once, then the tokens are processed in parallel, which is a very restrictive form of concurrency indeed. In the fragments diagrammed above, it is possible that both happy and happily are recognized as units. However, the suffix -ly is common to many words; it is a strong clue, built into the language for some purpose.
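How much work the suffix can do is easy to sketch. Assume, for illustration only, that a complement ending in -ly is an adverb and anything else is an adjective. The role of looked then follows directly from that one clue. The functions below are hypothetical, and the suffix rule is deliberately naive. Friendly and lovely end in -ly yet are adjectives, which is one reason such a clue would be weighed against other evidence rather than trusted alone.

```python
# Hypothetical sketch: the -ly suffix as the clue that settles the role of "looked".
# Naive on purpose: "friendly" and "lovely" end in -ly but are adjectives.

def part_of_speech(word: str) -> str:
    """Guess adverb vs adjective from the -ly suffix alone."""
    return "adverb" if word.endswith("ly") else "adjective"


def role_of_looked(complement: str) -> str:
    """A linking verb takes an adjective complement; an adverb marks an action verb."""
    return "linking" if part_of_speech(complement) == "adjective" else "action"


for sentence in ("Joan looked happy", "Joan looked happily"):
    subject, verb, complement = sentence.split()
    print(f"{sentence}: {verb} -> {role_of_looked(complement)}, "
          f"{complement} -> {part_of_speech(complement)}")
```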

Principle: Every element of language serves a purpose. It may increase precision, it may increase expressive power, it may be borrowed from another language, but these purposes are balanced against speed in comprehension and construction.

Inefficiencies are eroded; they are washed away by an enormous flood of human language. Argot and formality and hyperbole have their place as statements of community, but they are eddies in the stream.

It is possible that pattern recognition predominates in language, embodied in neural nets (real neural nets), but because individual tokens play the roles they do, this surely occurs largely at the token level.

Speech is noticeably a real-time stream, and comprehending it surely involves caching (or parking) several components while awaiting some crucial aspect that must be searched to gain a global understanding of the fragment. How well this is done is determined by how many things one can remember. The entire process, though, is also affected by how many things one can do at once: the number of concurrent searches one can pursue bears on the speed of retrieval.
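Parking can be sketched with a bounded buffer. Purely for illustration, assume two things. First, each incoming token is already flagged either as a component to park or as the crucial element that resolves whatever is parked. Second, memory holds at most `capacity` components, so a full buffer forgets its oldest entry. `comprehend`, the flags, and the capacity values are hypothetical. `collections.deque` with `maxlen` provides the forgetting behavior.

```python
from collections import deque


def comprehend(stream, capacity):
    """Park components until a resolving token arrives; a full buffer forgets its oldest entry."""
    parked = deque(maxlen=capacity)   # appending to a full deque drops its leftmost item
    resolved = []
    for token, resolves in stream:
        if resolves:
            resolved.append((token, list(parked)))   # the crucial token binds what is parked
            parked.clear()
        else:
            parked.append(token)
    return resolved


# Hypothetical flags: the verb is the crucial element awaited by the words before it.
stream = [("the", False), ("old", False), ("grey", False), ("dog", False), ("barked", True)]
print(comprehend(stream, capacity=5))   # everything parked is still available
print(comprehend(stream, capacity=3))   # "the" has been forgotten
```

With a smaller capacity, context is lost before the verb arrives. This is one concrete reading of "how many things one can remember."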

There appears to be a lack of explicit support for alternate interpretations of token sequences in some languages. They lack mechanisms such as connectives and articles and the like, and this is surely related to the ability to deal with abstractions. It is possible that the structure of a language affects not only the speed with which it is comprehended, but the degree of abstraction that the language provides as an exercise for the brain. In any case, all of that is inherently expressed in the set of digs available in a particular language.

Individual concurrent searches in a metasearch may involve the structural variations of dig schemas, prefix and suffix spaces, parts of speech, semantic context, person, tense, word order, word size, word stems, word meaning, and many other things. Sometimes rather subtle effects are involved, and it is not clear what they all are.

Consider the distinctions among Stove baking, Stover baking, and Stover sweltering, and how you determine their meaning, to the extent possible. The middle example has two interpretations, just like time flies, but one is much less likely than the other.

The language processes of both construction and interpretation at their normal speed appear to be more or less instantaneous, but construction often involves trial and error, a sifting of possibilities, a convergence on meaning, and the settlement on a comfortable model. In a word, design. These processes surely do include the metasearch we are promoting as a tool and as a model of the essence of language. By separating parts of the process, the model makes it clear how amazing it is that we do these things instantaneously.

The root from which a search begins may be any token of a dig. The results of one component search may constrain the other searches, making them more efficient by establishing context. (Consider personal names in sentences.) Restated, the dig of a fragment does not determine the flow of a search process during either its construction or its interpretation; the dig is an artifact of the final search result, not a search script.
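The claim that a search may start from any token can be checked on a toy case. The sketch below assumes hypothetical compatibility pairs between adjacent readings. It propagates constraints outward from a chosen root with a worklist, in the spirit of arc-consistency propagation, and revisits a token whenever its readings shrink. Whichever token of Joan looked happily serves as the root, the search settles on the same readings. The final dig therefore records the result rather than the route taken. `SUPPORTS` and `propagate` are illustrative names, not part of the dig model as stated.

```python
from collections import deque

# Illustrative (left reading, right reading) pairs for adjacent tokens.
SUPPORTS = {("noun", "linking"), ("noun", "action"),
            ("linking", "adjective"), ("action", "adverb")}


def propagate(candidates, root):
    """Propagate constraints outward from `root` until no token's readings change."""
    cands = [set(c) for c in candidates]
    work, seen = deque([root]), {root}
    while work:
        i = work.popleft()
        for j in (i - 1, i + 1):              # the neighbors of token i
            if not 0 <= j < len(cands):
                continue
            if j < i:   # left neighbor: reading r must precede some reading of token i
                keep = {r for r in cands[j] if any((r, s) in SUPPORTS for s in cands[i])}
            else:       # right neighbor: reading r must follow some reading of token i
                keep = {r for r in cands[j] if any((s, r) in SUPPORTS for s in cands[i])}
            if keep != cands[j] or j not in seen:
                cands[j] = keep                # shrink j's readings (or visit it the first time)
                seen.add(j)
                work.append(j)                 # j's neighbors must be rechecked against it
    return cands


# "Joan looked happily": start the search from each token in turn.
fragment = [{"noun"}, {"linking", "action"}, {"adverb"}]
for root in range(len(fragment)):
    print(root, propagate(fragment, root))
```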

It should be noted that searches of large data spaces such as the Internet are gravitating in the direction of metasearch. Metasearches of the billions of nodes of the Internet have become the rage amongst computer scientists, and with reason. The size and complexity of the search space are not yet comparable to those of the brain, but simple searches have become burdensome to the point of inanity. There is a movement as well toward a more mathematical discipline of data mining in large data sets. The computerized data spaces are apparently trivial when compared to those of the human brain, but at some point this will no longer be true, if it is not already. Nothing in computing practice is remotely in competition with the mind for natural language recognition, let alone the speed of it. The search art is still gaining in sophistication at a rapid pace, but it is surely not yet on the most efficient track.

Computational linguistics may be considered an application of computer science, which has yet to explore concurrent searching in much detail. Certainly the launching of multiple Web crawlers onto the Internet is concurrent searching, and so is an application of parallel (or distributed) processors in a search. What seems to occur in language processes, however, is the dynamic interaction between searches, the leakage from one search component to another. The component searches are tightly coupled. The effective mechanisms for coupling in natural language are still mysterious, but they surely come under the heading of message passing of some kind, perhaps pattern passing. It is not clear what messages are passed in language or how the coupling occurs.
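The text leaves the coupling mechanism open, so the following is only one hypothetical form it could take. Two component searches are written as Python generators, run by a round-robin scheduler, and coupled through a shared message queue. The morphology search leaks its partial result (the part of speech of the complement) into the queue. The role search parks until that message arrives. Every name here is an illustrative assumption.

```python
from collections import deque


def morphology_search(word, outbox):
    """Component search: inspect the suffix and leak the finding to other searches."""
    yield                                    # let other searches run first
    pos = "adverb" if word.endswith("ly") else "adjective"
    outbox.append(("complement", pos))       # message passed to the coupled search
    yield


def role_search(roles, inbox, result):
    """Component search: park until a message settles the verb's role."""
    while not inbox:
        yield                                # nothing to go on yet; yield control
    _, pos = inbox.popleft()
    wanted = "linking" if pos == "adjective" else "action"
    result.extend(r for r in roles if r == wanted)
    yield


def metasearch(complement):
    """Run both searches round-robin, coupled through one shared queue."""
    channel, result = deque(), []
    searches = deque([role_search(["linking", "action"], channel, result),
                      morphology_search(complement, channel)])
    while searches:
        search = searches.popleft()
        try:
            next(search)                     # advance one step
            searches.append(search)          # then requeue it
        except StopIteration:
            pass                             # this component search has finished
    return result


print(metasearch("happy"), metasearch("happily"))
```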
