Originally a piece in ScientificBlogging, September 28, 2009…
You open your dictionary to figure out what your friend meant by ‘nasute,’ only to find that the definition is “A wittol, or jemadar; bannocked in an emunctory fashion.” What good is this dictionary, you wonder, if it only refers me to other words I don’t know? And worse still, the definitions of some of these words refer back to ‘nasute,’ the word you didn’t know in the first place! Even if your attempt to learn what ‘nasute’ means is not infected by circularity, you face a quick explosion of words to look up: the words in the definition, the words in each of these definitions, and so on. The dictionary appears, then, to be a terribly messy tangled web.
In reality, however, dictionaries aren’t quite that worthless. …and the definition of ‘nasute’ above is, thankfully, fiction. The standard line for why dictionaries are useful is that their typical users already know the meanings of most of the words in them, and so a disorderly definition doesn’t send them on an exploding wild goose chase.
Dictionaries would, however, be only a source of frustration for a person who knows none of the vocabulary. And, therefore, dictionaries – and the lexicon of language they attempt to record – can’t be our mental lexicon. If a word is in our mental lexicon, then we know what it means. And if we know what it means, then our brain is able to unpack its meaning in terms it understands. The brain is not sent on a wild goose chase like the one fated for a Zulu native handed the Oxford English Dictionary.
Compared to the disheveled dictionary, the mental lexicon is much more carefully designed. The mental lexicon is hierarchical, having at its foundation a small number – perhaps around 50 – of “semantic primes” (or fundamental atoms of meaning) that are combined to mentally define all our mental concepts, something the linguist Anna Wierzbicka has argued. And our internal lexicon has a number of hierarchical levels, analogous to the multiple levels in the visual hierarchy or auditory hierarchy.
The “visual meaning” of a complex concept – e.g., the look of a cow – gets built out of a large combination of fundamental visual atoms, e.g., oriented contours and colored patches. In the same way, the (semantic) meaning of the concept of a cow gets built out of a large combination of fundamental semantic atoms, e.g., words like ‘you’, ‘body’, ‘some’, ‘good’, ‘want’, ‘now’, ‘here’, ‘above’, ‘maybe’, and ‘more’. In both sensory and semantic hierarchies, the small number of bottom-level primes are combined to build a larger set of more complex things, and these, in turn, are used to build still more complex ones, and so on. For vision, sample objects at increasingly complex levels include contours, junctions, junction-combinations, and objects. For the lexicon, examples of increasing complexity are ‘object’, ‘living thing’, ‘animal’, ‘vertebrate’, ‘mammal’, ‘artiodactyl’, and ‘cow’.
In our natural state, the mental lexicon we end up with would depend upon our experiences. No rabbits in your locale, no lexical entry in the head for rabbits. And the same is true for vision. The neural hierarchy was designed to be somewhat flexible, but designed to do its lexical work hierarchically, in an efficient fashion, no matter the specific lexicon that would fill it.
There is quite a difference, then, between the disordered, knotted dictionary and our orderly, heavily optimized, hierarchical mental lexicon. The vocabulary of a language – shaped by cultural selection, with a structure that dictionaries partially measure – does not seem to have harnessed the lexical expectations of our brain.
However, are dictionaries really so tangled? Back in 2005 while working at Caltech, I began to wonder. Dictionaries surely do have some messiness, because they’re built by real people from real messy data about the use of words: so some circularities may occasionally get thrown in by accident. But my bet was that the signature, efficient hierarchical structure of our inner lexicon should be in the dictionary, if only we looked carefully for it. Language would work best if the public vocabulary were organized in such a way that it would naturally fit the shape of our lexical brain, and I suspected cultural selection over time should have figured this out. …that it should have given us a dictionary shaped like the brain: a braintionary.
So I set out on a search for these signature hierarchical structures in the dictionary. A search to find the hidden brain in the dictionary. In particular, I asked whether the dictionary is hierarchically organized in such a way that it minimizes the total size of the dictionary needed to define everything it must.
To grasp the problem, a starting point is to realize that there is more than one way to build a hierarchical dictionary. One could use the most fundamental words to define all the other words in the dictionary, so that there would be just two hierarchical levels: the small set of fundamental (or atomic) words, and the set of everything else. Alternatively, dictionaries could use the most fundamental words to define an intermediate level of words, and in turn use these words to define the rest. That would make three levels, and, clearly, greater numbers of levels are possible.
My main theoretical observation was that having just the right number of hierarchical levels can greatly reduce the overall size of the dictionary. A dictionary with just two hierarchical levels, for example, would have to be more than three times larger than an optimal one that uses around seven levels.
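To make the trade-off concrete, here is a minimal toy sketch. It is not the model from the paper, and every name and number in it is an assumption made for illustration. It supposes that each word is defined by an unordered set of distinct words from the level just below, that definitions are as short as they can be while still giving every word its own combination, and that level sizes grow geometrically from 50 primes up to 100,000 target words. The "size" of the dictionary is then the total number of words used across all definitions.

```python
from math import comb


def min_definition_length(pool_size, words_needed):
    """Smallest d such that picking d distinct words from a pool of
    `pool_size` yields at least `words_needed` different combinations."""
    d = 1
    while comb(pool_size, d) < words_needed:
        d += 1
        if d > pool_size:
            raise ValueError("pool is too small to define that many words")
    return d


def level_sizes(primes, targets, levels):
    """Assumed layout: level sizes spaced geometrically from the number
    of primes (bottom level) to the number of target words (top level)."""
    ratio = (targets / primes) ** (1 / (levels - 1))
    sizes = [round(primes * ratio ** i) for i in range(levels)]
    sizes[-1] = targets  # guard against floating-point rounding
    return sizes


def total_definition_size(primes, targets, levels):
    """Total words used in all definitions: for each level above the
    primes, (number of words at that level) x (definition length)."""
    sizes = level_sizes(primes, targets, levels)
    total = 0
    for below, here in zip(sizes, sizes[1:]):
        total += here * min_definition_length(below, here)
    return total


if __name__ == "__main__":
    # Hypothetical figures: 50 primes, 100,000 words to be defined.
    for levels in range(2, 9):
        print(levels, total_definition_size(50, 100_000, levels))
```

With these made-up figures, the two-level layout needs four-word definitions and a total of 400,000 definition words, while a three-level layout gets by with 206,708, roughly half. Adding further levels in this toy setup makes the total creep back up, because every intermediate word needs a definition of its own. The specific numbers are artifacts of the assumptions and do not reproduce the paper's figures; the point is only that too few levels force long definitions, too many add overhead, and an intermediate number of levels minimizes the total.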
Via measurements from and analysis of WordNet and the Oxford English Dictionary, in a paper I published in the Journal of Cognitive Systems Research I provided evidence that actual dictionaries have approximately the optimal number of hierarchical levels. I discovered that dictionaries do have the structure expected for the brain’s lexical hierarchy. Dictionaries are braintionaries, designed by culture to have the structure our brains like, maximizing our vocabulary capabilities for our fixed ape brains.
What this means is that language has culturally evolved over the centuries and millennia not only to have the words we need, but also to have an overall organization – in terms of how words get their meanings from other words – that helps minimize the overall size of the dictionary. …and simultaneously helps us efficiently encode the lexicon in our heads.
The journal article itself is linked here.