Tag: programming

User Interface Design Patterns

When one works in an area, whether in the humanities or in building construction, one begins to recognize patterns in how problems are solved. Typical solutions accrue as a body of knowledge and are passed on to new practitioners.

In computer science this has been happening for a decade or more. “Design patterns”, general and reusable solutions to recurring software design problems, usually described in terms of the data structures involved and the operations that manipulate them, are becoming more and more widely known and well understood. For example, there is the “factory” pattern: an object whose job is to make other objects (“widgets”) of programmer-defined types, so that the code requesting a widget does not have to name its concrete type. This is a common task, so common that it has been done many times. The general principles of how to construct a factory are described independently of any particular software platform or environment.
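To make the factory idea concrete, here is a minimal Python sketch of one common, registry-style way to write it. Python is used only for illustration, since the pattern itself is platform-neutral, and the names `Widget`, `Button`, `TextBox`, and `WidgetFactory` are hypothetical.

```python
from abc import ABC, abstractmethod


class Widget(ABC):
    """Common interface for every product the factory can build."""

    @abstractmethod
    def render(self) -> str:
        ...


class Button(Widget):
    def render(self) -> str:
        return "[Button]"


class TextBox(Widget):
    def render(self) -> str:
        return "[TextBox]"


class WidgetFactory:
    """Maps a kind name to a concrete Widget class.

    Callers ask for a widget by name and never refer to Button or TextBox
    directly, so new widget types can be registered without changing them.
    """

    def __init__(self) -> None:
        self._registry = {}

    def register(self, kind: str, cls: type) -> None:
        self._registry[kind] = cls

    def create(self, kind: str) -> Widget:
        cls = self._registry.get(kind)
        if cls is None:
            raise ValueError(f"unknown widget kind: {kind!r}")
        return cls()


factory = WidgetFactory()
factory.register("button", Button)
factory.register("textbox", TextBox)

print(factory.create("button").render())   # [Button]
print(factory.create("textbox").render())  # [TextBox]
```

The design choice here is that the calling code depends only on the `Widget` interface and a string key; adding a new widget type means writing one class and one `register` call.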

The idea of design patterns can be extended, and the folks at Endeca have done just that for user interfaces (UI) with the Endeca User Interface Design Pattern Library. There is no reason to reinvent the wheel; this library deals with common tasks or problems in programming a UI, e.g., search, faceted navigation, and information discovery. There are other UI design pattern libraries out there, e.g., Patternry.
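Faceted navigation, one of the patterns mentioned here, rests on a simple operation: filter a collection by the facet values the user has selected, then count the values of each facet among the remaining items so the interface can show how many results each further choice would yield. The Python sketch below assumes hypothetical in-memory records and a hypothetical `facet_counts` function; it is not taken from the Endeca library.

```python
from collections import Counter

# Hypothetical records; each carries the facet attributes a user can filter on.
RECORDS = [
    {"id": 1, "language": "Hebrew", "genre": "poetry"},
    {"id": 2, "language": "Hebrew", "genre": "narrative"},
    {"id": 3, "language": "Aramaic", "genre": "narrative"},
    {"id": 4, "language": "Greek", "genre": "narrative"},
    {"id": 5, "language": "Hebrew", "genre": "narrative"},
]


def facet_counts(records, selected, facets):
    """Return the records matching every selected facet value, plus,
    for each facet, a count of its values among those records."""
    matching = [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]
    counts = {
        facet: Counter(r[facet] for r in matching if facet in r)
        for facet in facets
    }
    return matching, counts


matching, counts = facet_counts(RECORDS, {"language": "Hebrew"}, ["language", "genre"])
print([r["id"] for r in matching])  # [1, 2, 5]
print(counts["genre"])              # Counter({'narrative': 2, 'poetry': 1})
```

A real system would compute these counts with an index rather than a linear scan, but the user-facing behavior is the same.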

Why my interest in this? Because Patrick Durusau and I are experimenting with new ways of interacting with text, using the rabbinic Miqra’ot Gedolot (the Rabbinic Bible; something like a medieval Jewish “study Bible”) as a point of departure for design concepts. We are playing around with various ways of mapping rabbinic ideas of text study onto modern UI concepts. Maybe we will come up with a design pattern library for the study of biblical and other ancient texts!

Tools for linguistic research

Steve DeRose pointed me to this webpage by Bill Poser, a linguist who uses the computer in sophisticated ways. This page of resources is not about Computational Linguistics, which is a specific discipline. Rather, think “general computer resources”, or “how I can use the general computing power of my desktop to do linguistics”.

Besides using the tools available to any sophisticated computer user, a linguist must collect data and massage it into many different forms so that other tools can be applied to it. Perhaps the most important category of tools for the linguist is text manipulation. For me personally, the most powerful tool I ever discovered was regular expressions. “Regexes”, as they’re familiarly known, are compact patterns that describe sets of character strings, however complex. Scripts and programs can use them to recognize segments of input text, which can then be manipulated to produce the desired output. The Poser webpage provides an excellent set of links to resources and tutorials.
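As a small illustration of the “recognize, then manipulate” workflow, the Python sketch below uses the standard `re` module on text in a hypothetical word/TAG format (e.g. `dog/NN`). The format, the `TAGGED_TOKEN` pattern, and the two helper functions are assumptions made for the example.

```python
import re

# A token written as word/TAG, e.g. "dog/NN" or "his/PRP$".
# Group 1 captures the word, group 2 the tag.
TAGGED_TOKEN = re.compile(r"(\S+)/([A-Z$]+)")


def tagged_to_pairs(line: str) -> list:
    """Recognize every word/TAG token and return (word, tag) tuples."""
    return TAGGED_TOKEN.findall(line)


def retag_as_brackets(line: str) -> str:
    """Rewrite every word/TAG token as [TAG word], leaving other text alone."""
    return TAGGED_TOKEN.sub(r"[\2 \1]", line)


sample = "the/DT dog/NN barks/VBZ"
print(tagged_to_pairs(sample))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
print(retag_as_brackets(sample))
# [DT the] [NN dog] [VBZ barks]
```

The same compiled pattern serves both purposes: `findall` extracts the recognized segments, and `sub` transforms them in place using backreferences.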

This page covers many other linguistic topics as well. While surveying the entire website, I ran across an excellent list of “Recommended Reading”: books for the linguist who wants to leverage the computer in his or her work. I own or have read nearly all of these. Highly recommended.

For any researcher in the humanities, there is no excuse not to have mastered the subset of these resources appropriate to his or her subject of study. I have no patience or sympathy for scholars who master all kinds of arcana and yet object to learning how to use the computer properly because it is too “difficult”. It’s not too difficult. Nor does one need formal training. One only needs motivation.

An immodest postscript

I was pleasantly surprised to see listed on this page my 2008 review, in the journal Language Documentation and Conservation, of the database engine Emdros, a program optimized for annotated text.

Unsupervised Part-Of-Speech Tagger

Another linguistic analysis tool has come to my attention: a “State-Of-The-Art Unsupervised Part-Of-Speech Tagger”.

In recent years computational linguistics has used the enormous volume of verbiage on the Internet to tackle the problems of analyzing natural language statistically. Using probabilities calculated for a language from billions of sentences, a program is “trained” to see patterns and to assign each word its likeliest part of speech (noun, verb, adjective, etc.) from the context.
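The idea of choosing “the likeliest part of speech from the context” can be illustrated with the decoding step of a Hidden Markov Model, the Viterbi algorithm. The sketch below uses a hypothetical three-tag model with hand-picked probabilities. It does not show training: an unsupervised tagger has to estimate these probabilities from untagged text, commonly with an expectation-maximization procedure such as Baum-Welch. It is also not the Clojure program discussed here.

```python
import math

# Hypothetical toy model: three tags and hand-picked probabilities.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # TRANS[previous][next] = P(next tag | previous tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {  # EMIT[tag][word] = P(word | tag)
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"barks": 0.5, "walk": 0.5},
}
UNSEEN = 1e-6  # small floor for words a tag never emits in the toy model


def emit(tag, word):
    return EMIT[tag].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[t] = (log probability, tag path) of the best path ending in tag t
    best = {
        t: (math.log(START[t]) + math.log(emit(t, words[0])), [t]) for t in TAGS
    }
    for word in words[1:]:
        new_best = {}
        for t in TAGS:
            # Best predecessor p for tag t, scored by path so far plus transition.
            score, path = max(
                (best[p][0] + math.log(TRANS[p][t]), best[p][1]) for p in TAGS
            )
            new_best[t] = (score + math.log(emit(t, word)), path + [t])
        best = new_best
    return max(best.values())[1]


print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
print(viterbi(["a", "walk"]))            # ['DET', 'NOUN']
print(viterbi(["dog", "walk"]))          # ['NOUN', 'VERB']
```

Note how the ambiguous word “walk” is tagged as a noun after “a” but as a verb after “dog”: the context, through the transition probabilities, decides. Log probabilities are summed rather than multiplying raw probabilities, to avoid numeric underflow on long sentences.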

Clever and profound, yes. Complicated? Not really. This program consists of just 300 lines of Clojure code. (Clojure is a modern dialect of Lisp, “Lisp reloaded”, implemented on the Java Virtual Machine. It is a functional programming language, and it simplifies multi-threaded programming.)

Reading the follow-up blog post explaining the algorithm in detail, I found myself wondering whether a Hidden Markov Model is suited to analyzing ancient texts. In particular, I wonder about the comparatively small number of observations usually available. A probability model works best with a “large” set of observations, and there are “only” 480,446 morphemes in 23,213 verses in the Hebrew Bible as represented by the Leningrad Codex.
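To put the worry about sample size into rough numbers, the sketch below computes morphemes per verse from the counts quoted above and, for a hypothetical tag set of 50 tags, how many training tokens would be available per tag-transition probability in a bigram versus a trigram HMM. The tag-set size is an assumption, and emission probabilities (which grow with vocabulary size as well) are ignored.

```python
# Counts quoted in the post for the Hebrew Bible (Leningrad Codex).
MORPHEMES = 480_446
VERSES = 23_213

# Hypothetical tag-set size; real morphological tag sets vary widely.
NUM_TAGS = 50


def observations_per_parameter(tokens: int, num_tags: int, order: int) -> float:
    """Average training tokens per tag-transition probability.

    A tag n-gram HMM of the given order conditions each tag on the previous
    (order - 1) tags, so it has num_tags ** order transition probabilities.
    Emission probabilities are not counted here.
    """
    return tokens / num_tags ** order


print(round(MORPHEMES / VERSES, 1))  # 20.7
for order in (2, 3):
    print(order, round(observations_per_parameter(MORPHEMES, NUM_TAGS, order), 1))
# 2 192.2
# 3 3.8
```

Under these assumptions, a bigram model would see on average a couple of hundred tokens per transition probability, while a trigram model would see fewer than four, which is the kind of sparsity that makes probability estimates unreliable.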

Some would say such programs are of limited value for ancient texts, since manually analyzing a closed corpus is a finite task of “reasonable” cost. On the other hand, a program tags the text more consistently, and regenerating the entire database costs very little.

Comments?

Programming Tools for Computational Linguistics

The problem with computational linguistics is that it is, well, so arcane. There are plenty of books and web resources that teach the theory and principles. But what is often missing is a fully functional program that actually carries out the desired tasks. I have found two such resources, one thanks to Patrick Durusau.

The Natural Language Toolkit (NLTK) is implemented in Python. It is a set of libraries and programs illustrating all aspects of computational linguistics, including empirical linguistics (my primary interest), cognitive science, artificial intelligence, information retrieval, and machine learning. It was developed for classroom use and has a free, downloadable textbook describing the features of computational linguistics as implemented in the NLTK.

Another similar toolkit is LingPipe, which is implemented in Java. I just discovered this one and have not yet spent any time with it. I confess it is less attractive to me because I’m not a Java programmer, and the NLTK has nifty graphical interfaces that demonstrate its tools. It would be useful to compare their features in a future post.