Noam Chomsky wrote:
1. Colorless green ideas sleep furiously.
2. Furiously sleep ideas green colorless.
It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally “remote” from English. Yet (1), though nonsensical, is grammatical, while (2) is not grammatical.
Noam Chomsky, Syntactic Structures (1957) p. 15.
These sentences, famous among linguists at least, were deliberately constructed to convey no meaning: each word clashes with the one before it. Green is the logical opposite of colorless. Ideas are not animate and so do not sleep. Sleeping is a passive activity, so the adverb furiously is the opposite of that passivity. The second sentence is produced by reversing the order of the words. Chomsky says sentence (1) is grammatical and (2) is not. That may seem to make hardly better sense than the sentences themselves! Yet a native speaker of English “feels” that (2) is more “wrong” than (1). A non-native speaker of English may be better able to say why this is so.
Chomsky’s point (one of them) is that language distinguishes between the correct relationships among sentence elements (parts of speech, abbreviated POS) and the referential meaning that the individual words point to. Put another way, one can distinguish between syntax and semantics. The semantics of clauses is a further kind of meaning, in addition to lexical and referential meaning.
How is it that sentence (1) is syntactically permitted (grammatical) and sentence (2) is not? Syntax trees help us to see the answer to this question. Syntax is about the names, relationships and functions of the parts of speech in a clause. The classic method for representing syntactic structure is a syntax “tree”. Here is Chomsky’s sentence (1) in tree form:
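As a rough text stand-in for a tree diagram, sentence (1) can be written as a nested structure. This is a minimal sketch under one conventional phrase-structure analysis; the node labels (S, NP, VP, Adj, N, V, Adv) and the helper names `leaves` and `bracketed` are assumptions made for illustration, not taken from Chomsky's text.

```python
# One conventional analysis of sentence (1), written as nested tuples:
# (label, child, child, ...) for phrases and (label, word) for leaves.
# The labels are illustrative assumptions.
TREE = (
    "S",
    ("NP", ("Adj", "colorless"), ("Adj", "green"), ("N", "ideas")),
    ("VP", ("V", "sleep"), ("Adv", "furiously")),
)


def is_leaf(node):
    """A leaf is a (part-of-speech, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def bracketed(node):
    """Render the tree in labeled-bracket notation."""
    if is_leaf(node):
        return f"[{node[0]} {node[1]}]"
    return f"[{node[0]} " + " ".join(bracketed(c) for c in node[1:]) + "]"


print(" ".join(leaves(TREE)))
print(bracketed(TREE))
```

Reading the leaves from left to right gives back the sentence, while the brackets show that the two adjectives and the noun group together as one unit (the noun phrase) before combining with the verb phrase.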
When sentence (2) reverses the words, Furiously sleep ideas green colorless, the noun ideas now comes before the two adjectives green colorless. This word order is absolutely grammatical — for Hebrew! But for English, such a word order is incorrect.
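The word-order point can be sketched as a toy check on part-of-speech sequences. This assumes a hypothetical five-word lexicon and a single made-up rule (any number of adjectives, then a noun, a verb, and an optional adverb); it is far too small to count as a grammar of English, but it shows how a sentence can pass or fail on form alone, with no appeal to meaning.

```python
import re

# Hypothetical toy lexicon: each word is given exactly one part of speech.
LEXICON = {
    "colorless": "Adj",
    "green": "Adj",
    "ideas": "N",
    "sleep": "V",
    "furiously": "Adv",
}

# Toy rule for this one clause type only: Adj* N V (Adv)
PATTERN = re.compile(r"(Adj )*N V( Adv)?")


def pos_tags(sentence):
    """Map each word of the sentence to its part of speech."""
    words = sentence.lower().rstrip(".").split()
    return [LEXICON[word] for word in words]


def fits_toy_rule(sentence):
    """True if the POS sequence matches the toy word-order rule."""
    return PATTERN.fullmatch(" ".join(pos_tags(sentence))) is not None


print(pos_tags("Colorless green ideas sleep furiously."))
print(fits_toy_rule("Colorless green ideas sleep furiously."))
print(fits_toy_rule("Furiously sleep ideas green colorless."))
```

Both sentences use exactly the same words, so the lexicon treats them identically; only the order of the part-of-speech tags separates them.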
Another useful way of representing syntactic information is a matrix, such as the Attribute Value Matrix (AVM) used by Head-driven Phrase Structure Grammar (HPSG) and similar constraint-based, unification-based linguistic theories.
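To give a feel for the idea, an AVM can be approximated as a nested dictionary of attributes and values, and unification as a merge that fails on conflicting values. This is a minimal sketch: the attribute names (`CAT`, `AGR`, `NUM`, `PER`) and the function `unify` are assumptions chosen for illustration, and the sketch leaves out structure sharing (reentrancy), which real HPSG feature structures depend on.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic string values).

    Returns the merged structure, or None if the two structures conflict.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # conflicting values under the same attribute
                result[key] = merged
            else:
                result[key] = b_value
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical entry for the noun "ideas".
ideas = {"CAT": "noun", "AGR": {"NUM": "pl", "PER": "3"}}

# Hypothetical, simplified constraints a verb might place on its subject.
wants_plural_subject = {"CAT": "noun", "AGR": {"NUM": "pl"}}
wants_singular_subject = {"CAT": "noun", "AGR": {"NUM": "sg"}}

print(unify(ideas, wants_plural_subject))
print(unify(ideas, wants_singular_subject))
```

The first call succeeds because the two structures agree wherever they overlap; the second returns `None` because the number values clash.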
Still another way to represent syntax is by using directed graphs. They are the most free-form of the various visual representations of syntax. Directed graphs have interesting mathematical properties that allow for computational generation and manipulation as well as representation in databases. You will read more about directed graphs in future posts.
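As a small illustration, sentence (1) can be stored as a list of labeled edges running from a head word to a word that depends on it. This is a sketch under a dependency-style analysis; the relation labels (`nsubj`, `amod`, `advmod`) follow a common naming convention and, like the helper names, are assumptions rather than part of the original discussion.

```python
from collections import defaultdict

# Each edge is (head, relation, dependent). The labels are illustrative.
EDGES = [
    ("sleep", "nsubj", "ideas"),
    ("ideas", "amod", "colorless"),
    ("ideas", "amod", "green"),
    ("sleep", "advmod", "furiously"),
]


def build_graph(edges):
    """Group the outgoing edges of each head word into an adjacency list."""
    graph = defaultdict(list)
    for head, relation, dependent in edges:
        graph[head].append((relation, dependent))
    return graph


def find_root(edges):
    """The root is the one word that is a head but never a dependent."""
    heads = {head for head, _, _ in edges}
    dependents = {dependent for _, _, dependent in edges}
    (root,) = heads - dependents
    return root


graph = build_graph(EDGES)
print(find_root(EDGES))
print(graph["ideas"])
```

Because each edge is just a triple, the same list could be stored one row per edge in a table, which is part of what makes graphs convenient for databases.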
The latter two methods are more current in computational linguistics and natural language processing. And that brings us to our next question: how is linguistic information, especially syntax information, best represented in a database?