Archive for February, 2006

Emdros preview 1.2.0.pre195 released

Tuesday, February 21st, 2006

I’ve released Emdros preview 1.2.0.pre195. Perhaps the most important addition in this release is a Penn Treebank importer. This importer has been in the works for a long time: I wrote the first prototype in Python about a year ago. The new implementation is written in C++, and seems to be robust. It imports the BLLIP corpus without a hiccup, as well as the TIGER corpus in Penn format. As I said in the previous post, if anyone tests it on “the real thing” (i.e., the Penn Treebank), please let me know whether it works. Thanks.

Other goodies in the new release include:

  1. NOTEXIST, as described earlier on this blog.
  2. Export to Annotation Graph XML format was added.
  3. A few bugfixes.
  4. Mac OS X is now a supported platform.
  5. A new Chunking Tool was added as an example.
  6. And more…

Enjoy!

Ulrik

Penn Treebank importer for Emdros

Sunday, February 19th, 2006

Over the weekend, I’ve implemented an importer for the Penn Treebank format. All it does is read Penn Treebank data and transform it to CREATE OBJECT statements (well, CREATE OBJECTS WITH OBJECT TYPE statements, actually). This MQL can then be imported into Emdros via the mql(1) program.

The importer works fine on both the BLLIP corpus and the Penn version of the TIGER Corpus. I haven’t had a chance to test it with “the real thing” (aka the Penn Treebank) yet. If anyone has access to “the real thing” and wants to test the importer on that corpus, please drop me a line.

The importer recognizes and resolves coreference links, and it splits labels such as “NP-SUBJ” into a “type” (NP) and a “function” (SUBJ). Even unparsed (but POS-tagged) sentences are imported correctly.
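
To give a rough idea of what the label splitting involves, here is a minimal sketch in Python. It is not the importer's actual code (that is written in C++). The names split_label and read_tree are made up for this illustration, and the sketch makes some simplifying assumptions: every terminal, including traces, counts as one word, and a purely numeric suffix such as the “1” in “NP-SBJ-1” is treated as a coreference index.

```python
import re

def split_label(label):
    """Split a label such as 'NP-SUBJ-1' into (type, function, index)."""
    # Labels like -NONE- or -LRB- start with a hyphen and are kept whole.
    if label.startswith("-"):
        return label, "", None
    parts = label.split("-")
    index = None
    functions = []
    for part in parts[1:]:
        if part.isdigit():
            index = int(part)        # assumed to be a coreference index
        else:
            functions.append(part)   # e.g. SUBJ, LOC, TMP
    return parts[0], "-".join(functions), index

def read_tree(text):
    """Return (type, function, index, first_monad, last_monad) per node."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    nodes = []
    pos = 0      # current position in the token list
    monad = 0    # number of words seen so far

    def read_node():
        nonlocal pos, monad
        pos += 1                     # skip "("
        label = ""
        if tokens[pos] != "(":       # the outermost bracket may be unlabeled
            label = tokens[pos]
            pos += 1
        first = monad + 1
        if tokens[pos] == "(":
            while tokens[pos] == "(":
                read_node()          # nonterminal: read all children
        else:
            pos += 1                 # terminal: consume the word itself
            monad += 1
        pos += 1                     # skip ")"
        nodes.append(split_label(label) + (first, monad))

    read_node()
    return nodes

if __name__ == "__main__":
    for node in read_tree("(S (NP-SUBJ (DT The) (NN dog)) (VP (VBZ barks)))"):
        print(node)
```

Each tuple carries the first and last word position (monad) that the node covers, which is the kind of information an object in Emdros needs. Turning the tuples into MQL statements is left out of the sketch.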

This will make its debut in the next public release after 1.2.0.pre191.

Emdros examples galore

Sunday, February 19th, 2006

One of the things that keeps users from evaluating Emdros is the paucity of easily available example databases.

I’ve done something about that particular problem today, by releasing more than 10 Bibles in various languages, as MQL files that can be easily imported into Emdros.

Downloads are here.

If you want any other examples, please let me know via the contact information available at the Emdros website.

Enjoy!

Ulrik