Tag: semantics

Mapping the territory. Part 1.

I was updating one of the many bibliographies I maintain (in BibTeX, of course!) the other day. One of the tasks when making a new entry in a bibliography is to choose keywords for that entry. Keywords allow the bibliography management software I use (BibDesk on the Mac, JabRef on Linux, although this may change once the Zotero standalone application comes out of alpha) to organize the entries into meaningful groupings. The list of keywords has grown in an ad hoc fashion. I never really gave it much thought, assuming the choice of terms was self-evident.

As I was choosing a keyword for a certain entry, I found myself unable to do so. None of the keywords in my list really fit; yes, I could add more than one keyword, but that did not really work, either. It needed…more than one word. In a sudden inspiration, I looked up the book’s Library of Congress subject classification. Ah! Just what I needed: “Hebrew language -- discourse analysis -- Congresses”. Here are three terms, associated in a hierarchical manner. The moral of this tale is to abandon keywords and use LOC subjects, no? Of course, if my entry had been a journal article, I would have had to choose the subject myself; I could not have relied upon a library science professional to do the classification for me. Nevertheless, a definite step forward. Or so it seemed.

I proudly mentioned my brilliant idea to my friend, Patrick Durusau, who, among the other hats he wears, is the convener of one of the ISO Topic Maps working groups. His reaction: “Well, yes, that works, sort of. Why not create a topic map of your discipline and use that instead of keywords for bibliography subject association? It’s more general, can be more accurate and flexible than LOC subjects, and is useful for a lot more than just bibliography.”

Duh. I knew that. It would have come to me eventually. But how to bell the cat? I’m not sure yet. I’m working on it. More anon.


21st century study of religious texts

A picture is worth a thousand words. So a concrete example that you can not only see, but also play with, is worth ten thousand words.

The Quranic Arabic Corpus incorporates much of my vision for the study of the Bible in the third millennium of our civilization. For “under the hood” details, see the description of the research of Kais Dukes, who is (of all things!) a VP of Merrill Lynch.

Three elements of this project are morphological annotation, a syntax treebank and a semantic ontology. All three are combined into a web user interface in such a way that collaboration is possible. Members of the general public interested in the Quran itself can browse the original Arabic text, and dive into morphology, syntax and semantics as desired. Scholars can work on the actual analysis simply by logging in.

This model of linguistic annotation of a corpus can easily be extended to include bibliography, web resources, and archaeological and historical data. The possibilities are endless.

One extension ought to be the ability to add user annotations that are stored locally on the visitor’s own computer but integrate seamlessly with the website.

I noticed one feature that is lacking: the ability to run complex searches using the morphological, syntactic and semantic annotations. There is a search box for simple text queries, but a more sophisticated search engine would greatly enhance the value of this remarkable resource.

Meaningful meaninglessness

Noam Chomsky wrote:

1.  Colorless green ideas sleep furiously.
2.  Furiously sleep ideas green colorless.

It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally “remote” from English. Yet (1), though nonsensical, is grammatical, while (2) is not grammatical.

Noam Chomsky, Syntactic Structures (1957), p. 15.

These sentences (famous among linguists, at least) were constructed deliberately to convey no meaning, by choosing for each word something opposed to the word before it. Green is the logical opposite of colorless. Ideas are not animate and so do not sleep. Sleep is a passive action, and so the adverb furiously is the opposite of passivity. The second sentence is produced by reversing the order of the words. Chomsky says sentence (2) is not grammatical and (1) is. That hardly makes better sense than the sentences themselves! Yet a native speaker of English “feels” that (2) is more “wrong” than (1). A non-native speaker of English may be better able to say why this is so.

Chomsky’s point (one of them) is that language distinguishes between the correct relationships among sentence elements (parts of speech, abbreviated POS) and the referential meaning that the individual parts point to. Put another way, one can distinguish between syntax and semantics. The semantics of clauses is a further kind of meaning, in addition to lexical and referential meaning.

How is it that sentence (1) is syntactically permitted (grammatical) and sentence (2) is not? Syntax trees help us to see the answer to this question. Syntax is about the names, relationships and functions of the parts of speech in a clause. The classic method for representing syntactic structure is the syntax “tree”. Here is Chomsky’s sentence (1) in tree form:

image:cgisf-tgg.png

When sentence (2) reverses the words, Furiously sleep ideas green colorless, the noun ideas now comes before the two adjectives green colorless. This word order is absolutely grammatical in Hebrew! But in English, such a word order is incorrect.
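The contrast can be made concrete with a few lines of Python. This is only a toy sketch under assumptions of my own: the lexicon, the tag names (Adj, N, V, Adv) and the three rules in the docstring are illustrative. They are not Chomsky's grammar, and they cover nothing beyond these five words.

```python
# Toy lexicon: each word mapped to one part-of-speech tag (an assumption for this sketch).
LEXICON = {
    "colorless": "Adj",
    "green": "Adj",
    "ideas": "N",
    "sleep": "V",
    "furiously": "Adv",
}


def parse(sentence):
    """Return a syntax tree as nested tuples, or None if the sentence is rejected.

    Hypothetical toy rules:
        S  -> NP VP
        NP -> Adj* N
        VP -> V (Adv)
    """
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(word) for word in words]
    if None in tags:
        return None  # a word the toy lexicon does not know

    i = 0
    # NP: zero or more adjectives followed by exactly one noun
    np = []
    while i < len(tags) and tags[i] == "Adj":
        np.append(("Adj", words[i]))
        i += 1
    if i >= len(tags) or tags[i] != "N":
        return None
    np.append(("N", words[i]))
    i += 1

    # VP: one verb, optionally followed by one adverb
    if i >= len(tags) or tags[i] != "V":
        return None
    vp = [("V", words[i])]
    i += 1
    if i < len(tags) and tags[i] == "Adv":
        vp.append(("Adv", words[i]))
        i += 1

    if i != len(tags):
        return None  # leftover words the rules cannot place
    return ("S", ("NP", *np), ("VP", *vp))


tree_1 = parse("Colorless green ideas sleep furiously.")
tree_2 = parse("Furiously sleep ideas green colorless.")
```

With these rules, `tree_1` is a nested tuple with an NP and a VP branch, while `tree_2` is `None`: the reversed sentence starts with an adverb where the rules require a noun phrase. Note that the parser never asks whether green ideas make sense, which is exactly the separation of syntax from semantics.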

A matrix is another useful way of representing syntactic information. One example is the Attribute Value Matrix (AVM) used by HPSG and similar constraint-based unification grammars.
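To give a feel for what an AVM and unification look like in practice, here is a minimal sketch that treats an AVM as a nested Python dictionary. The feature names (CAT, AGR, NUM, PER) and their values are assumptions chosen for illustration, and real HPSG feature structures are typed and allow shared (reentrant) values, which this sketch does not model.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts with atomic leaves.

    Returns the merged structure, or None when some feature carries
    conflicting values in the two inputs.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # conflict somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical AVM for the noun "ideas".
ideas = {"CAT": "noun", "AGR": {"NUM": "pl"}}

# Hypothetical constraints a verb might place on its subject.
plural_subject = {"CAT": "noun", "AGR": {"NUM": "pl", "PER": "3"}}
singular_subject = {"CAT": "noun", "AGR": {"NUM": "sg", "PER": "3"}}

ok = unify(ideas, plural_subject)
clash = unify(ideas, singular_subject)
```

Here `ok` combines the information from both structures, and `clash` is `None` because the NUM values disagree. Unification grammars apply this one operation again and again to decide whether the pieces of a clause are compatible.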

Still another way to represent syntax is by using directed graphs. They are the most free-form of the various visual representations of syntax. Directed graphs have interesting mathematical properties that allow for computational generation and manipulation, as well as representation in databases. You will read more about directed graphs in future posts.
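As a small taste, sentence (1) can be written as a list of labelled, directed edges. The sketch below is an assumption-laden illustration: the head-to-dependent analysis and the relation names ("subject", "modifier") are my own choices for the example, not a standard annotation scheme.

```python
from collections import defaultdict

# Labelled directed edges (head, relation, dependent) for sentence (1).
# The analysis and the relation names are illustrative assumptions.
EDGES = [
    ("sleep", "subject", "ideas"),
    ("sleep", "modifier", "furiously"),
    ("ideas", "modifier", "green"),
    ("ideas", "modifier", "colorless"),
]


def build_graph(edges):
    """Turn a list of edges into an adjacency list keyed by head word."""
    graph = defaultdict(list)
    for head, relation, dependent in edges:
        graph[head].append((relation, dependent))
    return graph


def descendants(graph, node):
    """Collect every word reachable from node by following edges."""
    found = []
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for _relation, dependent in graph.get(current, []):
            if dependent not in seen:  # guards against cycles
                seen.add(dependent)
                found.append(dependent)
                stack.append(dependent)
    return found


graph = build_graph(EDGES)
under_sleep = descendants(graph, "sleep")
under_ideas = descendants(graph, "ideas")
```

Because each edge is just a triple, the same list could be stored one row per edge in a database table, which is part of what makes graphs attractive for computation.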

The latter two methods are more current in computational linguistics and natural language processing. And that brings us to our next question: how is linguistic information, especially syntactic information, best represented in a database?

What do you mean by that?!

This is the first of a series of shorter posts on the nature of meaning and its relationship to reality and to linguistic theory and analysis. The posts will be short, and there will be a series of them, because I am groping toward I know not what.

If the sign is a “form-meaning” construct (a particular meaning associated with a span of sound), then we already know what physical sound is. We can observe it and note its characteristics. But what about “meaning”? What does meaning mean? What does one mean by meaning?

The first thing we need to pin down is what we are talking about. The study of meaning is called semantics. The word semantics is used in a number of different ways, and often incorrectly. What I’m interested in is how humans communicate and how we can confidently measure and talk about those communications.

One kind of meaning is “lexical”, that is, the “dictionary” or “vocabulary” type of meaning. I think of this kind of meaning as “referential”. A signifier (sound pattern) points to, or refers to, a signified (entity). So “tree” refers to an object in our environment. But the signified can be many other types of “things”. It can be an abstract quality like color or gender. It can be a relation between other signs, such as a temporal, spatial or logical relation.

It’s amazing to think of the load of meaning with which we burden a word! Let’s take one word, toll, and see what it can mean.

  • For native English speakers, perhaps the first image that comes to mind is the toll one pays to use a road. It is a noun.
  • But Ernest Hemingway would be disappointed with you, for he used toll as a verb, meaning “to summon”.
  • If you were in Hungary, those around you would think of a pen! No, not a pen where one keeps animals, but the writing instrument!
  • In Germany, toll is an adjective that has the same connotation as “Cool! Awesome! Neat!” in English.

This example tells us two things: (1) the association of meaning with form is relatively arbitrary; it is not absolutely arbitrary, because (2) the association of meaning with form is context bound. That is, a form can change meaning without notice if the environment changes.
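One way to picture a context-bound sign is as a lookup in which the form alone is not a sufficient key. The sketch below is only an illustration built from the four senses listed above; the table layout and the function name `meanings` are assumptions of mine, and reducing "context" to language plus part of speech is a deliberate simplification.

```python
# Each entry pairs one form with a meaning that holds only in a given context.
# Context is reduced here to (language, part of speech), a simplifying assumption.
SIGNS = {
    ("toll", "English", "noun"): "fee paid to use a road",
    ("toll", "English", "verb"): "to summon",
    ("toll", "Hungarian", "noun"): "pen (writing instrument)",
    ("toll", "German", "adjective"): "cool, awesome, neat",
}


def meanings(form, language=None):
    """Return the meanings of a form, optionally narrowed to one language."""
    return [
        meaning
        for (entry_form, entry_language, _pos), meaning in SIGNS.items()
        if entry_form == form and (language is None or entry_language == language)
    ]


all_senses = meanings("toll")
german_senses = meanings("toll", "German")
```

Asked about the bare form, the table returns four candidates; only when the environment is supplied does the answer narrow down, which is the sense in which the sign is context bound.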

If this is the situation with individual words, what about phrases and clauses? Are they signs, groups of signs or what? What is syntax and why do linguists distinguish between syntax and semantics? This is the subject of my next post.