The Emdros blog is back

January 3rd, 2012

The Emdros blog is kindly hosted by the J. Alan Groves Center for Advanced Biblical Research. The Groves Center suffered a hardware outage in late 2011, bringing this blog down.

Thanks to the hard work of Dr. Kirk Lowery, the blog is now back. Thanks, Kirk!

More news coming. Stay tuned!

Ulrik

Emdros 3.3.0 released

July 4th, 2011

I have released Emdros version 3.3.0 over at SourceForge.Net.

http://emdros.org/download.html

Please note that the Full Text Search implementation and its indexing method are subject to change, as this feature is still experimental.

Enjoy!

Ulrik Sandborg-Petersen

 

Controlling containment in topographic MQL

February 14th, 2011

I have just finished adding a new feature to the topographic part of the MQL query language.

Hitherto, the only relation one could specify for containment between an inner object block and the outer container was “part_of”, and it was always relative to the containing substrate.

In plain English, that meant that the inner object’s monad set had to be a subset of the outer object’s monad set, or (if the inner block was at the outermost level) a subset of the monad set given in the IN clause after SELECT ALL OBJECTS.

Now, you can specify these four relations:

  • part_of(substrate) // The default
  • part_of(universe) // To disregard gaps in the substrate
  • overlap(substrate)
  • overlap(universe)

The overlap relation means: the inner object must have a non-empty intersection with (i.e., share at least one monad with) the outer substrate or universe.

This makes it possible to specify things like this:

SELECT ALL OBJECTS
IN Aramaic_monads // Pre-defined monad set
WHERE
// This means that we want all clauses which share at least one monad
// with the Aramaic_monads monad set
[Clause overlap(substrate)
   // This finds all phrases inside the left and right boundaries of
   // the outer clause, regardless of any gaps in the clause.
   [Phrase part_of(universe)
   ]
]
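To make the four relations concrete, here is a minimal C++ sketch of the set tests they imply. It assumes a monad set can be modelled as a sorted std::set<long>, and it reads "universe" as every monad from the first to the last monad of the substrate, gaps included, which is what the comments in the query above describe. The function names are hypothetical and are not part of the Emdros API.

```cpp
#include <set>

// A monad set, modelled here (as an assumption) as a sorted set of monads.
using MonadSet = std::set<long>;

// part_of(substrate): every inner monad is also in the outer set.
bool part_of_substrate(const MonadSet& inner, const MonadSet& outer) {
    for (long m : inner) {
        if (outer.count(m) == 0) return false;
    }
    return true;
}

// part_of(universe): every inner monad lies between the first and last
// monads of the outer set, so gaps in the outer set are ignored.
bool part_of_universe(const MonadSet& inner, const MonadSet& outer) {
    if (inner.empty()) return true;
    if (outer.empty()) return false;
    return *inner.begin() >= *outer.begin() && *inner.rbegin() <= *outer.rbegin();
}

// overlap(substrate): the two sets share at least one monad.
bool overlap_substrate(const MonadSet& inner, const MonadSet& outer) {
    for (long m : inner) {
        if (outer.count(m) != 0) return true;
    }
    return false;
}

// overlap(universe): at least one inner monad lies between the first and
// last monads of the outer set.
bool overlap_universe(const MonadSet& inner, const MonadSet& outer) {
    if (outer.empty()) return false;
    auto it = inner.lower_bound(*outer.begin());  // first inner monad >= outer start
    return it != inner.end() && *it <= *outer.rbegin();
}
```

With a clause occupying monads {1, 2, 3, 7, 8} (a gap at 4 to 6) and a phrase at {4, 5}, the phrase fails part_of(substrate) but passes part_of(universe), which is exactly the "regardless of any gaps" behaviour the example query relies on.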

This will appear in the next public release after 3.2.0.

If anyone is interested in trying this out, please let me know.

Ulrik

Full Text Search implemented in Emdros

October 30th, 2010

I’ve finished the implementation, tuning, and testing of Full Text Search (FTS) for Emdros.

The implementation is part of the libharvest library, and is written in C++ like the rest of Emdros.

I implemented the basic idea in Python first, then reimplemented it in C++. Python is so malleable that it is ideal for this sort of prototyping work.

The Full Text Search has a lot of features, including:

  • Index “documents”, which must exist as object types.
  • Index documents based on “indexed object types” (e.g., token) and one indexed feature of the indexed object type.
  • Search within “documents”.
  • Chainable filters that modify token strings before being indexed, e.g., to weed out stop-words, or to strip, lower-case, or otherwise alter the token strings.
  • Tokenization of the query string by splitting on spaces.
  • Optional application of the chainable filters to the query-terms after tokenization, so as to be more likely to match the indexed feature.
  • Google-like “quoted strings” that require the query terms inside them to be adjacent.
  • More than one “quoted string” allowed in the query-string.
  • Return results as a list of three-tuples (document-first-monad, document-last-monad, first-search-term-first-monad).
  • Return results as customizable snippets of real tokens, with optional highlighting of query terms.
  • Command-line tools for both indexing and searching.
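Two of the features above, chainable token filters and splitting a query into plain terms and “quoted strings”, can be sketched roughly as follows. This is not the libharvest code: the names TokenFilter, apply_filters, and parse_query are hypothetical, and a filter returning an empty string is assumed to mean “drop this token”.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <set>
#include <string>
#include <vector>

// A filter maps a token to a new token; an empty result drops the token.
using TokenFilter = std::function<std::string(const std::string&)>;

TokenFilter lowercase_filter() {
    return [](const std::string& s) {
        std::string out = s;
        std::transform(out.begin(), out.end(), out.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return out;
    };
}

TokenFilter stopword_filter(std::set<std::string> stop) {
    return [stop](const std::string& s) { return stop.count(s) ? std::string() : s; };
}

// Run a token through the chain, stopping as soon as a filter drops it.
std::string apply_filters(const std::vector<TokenFilter>& chain, std::string token) {
    for (const auto& f : chain) {
        token = f(token);
        if (token.empty()) break;
    }
    return token;
}

// Split a query on spaces. Terms inside "..." form one phrase whose terms
// must be adjacent; every other term becomes a one-term phrase.
std::vector<std::vector<std::string>> parse_query(const std::string& query,
                                                  const std::vector<TokenFilter>& chain) {
    std::vector<std::vector<std::string>> phrases;
    std::vector<std::string> current;  // terms of the quoted string being read
    bool in_quotes = false;
    std::string word;
    auto flush_word = [&]() {
        std::string t = apply_filters(chain, word);
        word.clear();
        if (t.empty()) return;
        if (in_quotes) current.push_back(t);
        else phrases.push_back({t});
    };
    for (char c : query) {
        if (c == '"') {
            flush_word();
            if (in_quotes && !current.empty()) phrases.push_back(current);
            current.clear();
            in_quotes = !in_quotes;
        } else if (c == ' ') {
            flush_word();
        } else {
            word += c;
        }
    }
    flush_word();
    if (!current.empty()) phrases.push_back(current);  // unterminated quote
    return phrases;
}
```

Passing an empty filter chain corresponds to not applying the filters to the query terms; passing the same chain used at indexing time makes the terms more likely to match the indexed feature.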

This will appear in the next public release of Emdros.

Interested parties should contact me via email to get the latest sources.

Enjoy!

Ulrik

Linguistic Tree Constructor 3.0.4 released (with an Easter egg)

September 18th, 2010

I’ve released Linguistic Tree Constructor (LTC) version 3.0.4 over at http://ltc.sourceforge.net …

The significance for this blog is that:

  1. LTC uses Emdros
  2. The latest release of LTC has the latest Emdros sources for 3.2.1.pre02 as an Easter egg inside.

Go grab the sources of LTC if you want to see what I’m up to for the next version of Emdros, then look at the ChangeLog.

Enjoy!

Ulrik Sandborg-Petersen

Bit Packed Table backend with encryption

August 4th, 2010

In March 2010 (3rd and 9th), I wrote on this blog about a new backend for Emdros under development, called the “Bit Packed Table” (BPT) backend. It is a high-performance, read-only database engine, based on “bit packed tables” and custom-tailored to the EMdF model. It outperforms even SQLite in terms of raw querying speed by about 30% on average.
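The BPT engine itself is not public, so the following is only a generic illustration of what “bit packing” usually means: storing each integer in a fixed number of bits rather than in a full machine word. The class name and layout are assumptions for illustration, not the BPT file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores unsigned integers using `width` bits each, packed into 64-bit words.
class BitPackedArray {
public:
    explicit BitPackedArray(unsigned width) : width_(width) {
        assert(width_ >= 1 && width_ <= 64);
    }

    void push_back(std::uint64_t value) {
        assert(width_ == 64 || value < (std::uint64_t{1} << width_));
        std::uint64_t bit = static_cast<std::uint64_t>(size_) * width_;
        std::size_t word = static_cast<std::size_t>(bit / 64);
        unsigned offset = static_cast<unsigned>(bit % 64);
        std::size_t needed = static_cast<std::size_t>((bit + width_ + 63) / 64);
        if (words_.size() < needed) words_.resize(needed, 0);
        words_[word] |= value << offset;
        // The value straddles two words: put its high bits in the next one.
        if (offset + width_ > 64) words_[word + 1] |= value >> (64 - offset);
        ++size_;
    }

    std::uint64_t get(std::size_t i) const {
        std::uint64_t bit = static_cast<std::uint64_t>(i) * width_;
        std::size_t word = static_cast<std::size_t>(bit / 64);
        unsigned offset = static_cast<unsigned>(bit % 64);
        std::uint64_t v = words_[word] >> offset;
        if (offset + width_ > 64) v |= words_[word + 1] << (64 - offset);
        return width_ == 64 ? v : v & ((std::uint64_t{1} << width_) - 1);
    }

    std::size_t size() const { return size_; }
    std::size_t bytes_used() const { return words_.size() * sizeof(std::uint64_t); }

private:
    unsigned width_;
    std::size_t size_ = 0;
    std::vector<std::uint64_t> words_;
};
```

With a width of 17 bits, 1000 values fit in 266 words (2128 bytes) instead of the 8000 bytes that 1000 plain 64-bit integers would take; smaller tables mean less I/O per query, which is one common reason such layouts are fast to read.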

I have recently made the BPT engine almost feature-complete, including adding an encryption layer. The encryption isn’t strong, but it does the job of keeping prying eyes out of your data.

I have added BPT to two of my Emdros-based software projects, using it as the only backend in both, and both of them deliver content to the user through a thin shell on top of Emdros. It works fine, and the speed increase over SQLite 3 is especially noticeable — pieces of content that used to take 1.5 seconds to load now leap onto the screen.

I said the BPT engine is almost feature-complete. The only thing missing, in fact, is support for stored monad sets. That is, monad sets that don’t have any object data associated with them, but which can be used for delimiting a query. I will add this feature in due course.

The BPT engine isn’t Open Source, and won’t be for the foreseeable future. If you are interested in licensing the engine, please drop me an email.

Enjoy!

Ulrik

Emdros 3.2.0 released

July 4th, 2010

I’ve released Emdros 3.2.0 over at SourceForge.net.

http://emdros.org/download.html

The release notes appear below.

Please let me know via the usual avenues whether anything is amiss.

Enjoy!

Ulrik

- *** Version 3.2.0 ***

As usual, binaries are available for Mac OS X, Windows(R), and Fedora
(13).

The Windows binaries have support for MySQL, SQLite 2, and SQLite 3.
They are built with Visual Studio Express 2010.

The Mac OS X binaries are Universal binaries running on Mac OS X 10.4
(Tiger), 10.5 (Leopard), and 10.6 (Snow Leopard).  They do not have
support for either MySQL or PostgreSQL; only SQLite 2 and SQLite 3 are
supported in the Mac OS X binaries.  You can compile the sources with
support for MySQL yourself, though, and possibly also PostgreSQL.

The Fedora binaries come with support for PostgreSQL, MySQL, SQLite 2,
and SQLite 3.

This release has the following changes over 3.1.1:

- A new backend was created, called the BPT engine.  It is
proprietary, and thus not Open Source, at the moment (sorry).
Interested licensors can contact me at ulrikp – at – emdros |dot|
org for questions about this new engine.

- SQLite3 was upgraded to version 3.6.17

- PCRE was upgraded to version 8.01. The license is still BSD.

- The TIGERXML importer is now more lenient towards the XML being
imported.

- The Emdros Query Tool now implements an XML_Output_Style.  See the
User’s Guide for the Emdros Query Tool for how to use it.  WARNING:
The output is still subject to change!

- The Emdros Query Tool (GUI version) can now create PNG files right
from the command line.  See the man page for eqtu.

- Assorted changes to the harvest library.  Note that the harvest
library is not stable yet; all APIs are subject to change as I
experiment with the best way of doing this important task.

- A topographic query can be stopped by setting the following bool to
false:

MQLExecEnv::m_bContinueExecution.

- Assorted changes to the horizontal tree and vertical tree layout
engines.
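The MQLExecEnv::m_bContinueExecution flag mentioned above acts as a cooperative stop signal: the query loop checks it between units of work and stops once it becomes false. The sketch below shows that general pattern in plain C++, with a hypothetical ExecEnv struct standing in for MQLExecEnv. It is not Emdros code, and its use of std::atomic<bool> is an assumption made so that another thread (say, a GUI's Stop button) can clear the flag safely.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an execution environment carrying a stop flag.
struct ExecEnv {
    std::atomic<bool> continue_execution{true};
};

// Processes candidates one at a time, checking the flag before each step.
// Returns how many candidates were processed before a stop was requested.
std::size_t run_query(ExecEnv& env, const std::vector<int>& candidates,
                      std::vector<int>& matches) {
    std::size_t processed = 0;
    for (int c : candidates) {
        if (!env.continue_execution.load()) break;  // stop requested
        if (c % 2 == 0) matches.push_back(c);       // placeholder "match" test
        ++processed;
    }
    return processed;
}
```

Because the check happens only between steps, a stop request takes effect at the next check rather than instantly, and whatever matches were found so far remain available to the caller.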

Enjoy!

Ulrik Sandborg-Petersen

Linguistic Tree Constructor — 25000 downloads passed

June 28th, 2010

One of my Open Source “successes”, Linguistic Tree Constructor, has passed 25000 downloads over at SourceForge.net.

Linguistic Tree Constructor (LTC) is a tool for building linguistic syntax trees in no time flat, using your mouse. Its main strength is quick annotation of large amounts of text, i.e., production of syntactic databases. It is based on Emdros for much of its implementation.

You can see the stats, or download for Mac OS X, Windows, and Linux, from the kindly folk at SourceForge.Net.

Enjoy!

Ulrik

Getting data out of Emdros (the easy way)

April 12th, 2010

Since the last Emdros release (August 2009), I’ve been busy building up an infrastructure around Emdros which should make it easier to use.

One of those efforts has involved what I call “the harvesting library”. Basically, it’s a part of Emdros that runs on top of the core Emdros services, and whose primary goal in life is to make it incredibly easy to extract information from almost any Emdros database. Not only that, but the harvesting library also has some nifty ways of turning that extracted information into HTML, XML, JSON, or whatever you like.

The way it works is this: you write a “stylesheet”, which is a specially structured data file in JSON. JSON is a very small language, and is very easy to learn. You feed this stylesheet to the harvesting library, which interprets the JSON structure and goes to work extracting the desired information from the Emdros database at hand. This extraction process is driven by the JSON data file, and is extremely simple to set up. Once the information has been extracted, the harvesting library can optionally transform it according to the rules you’ve written in another part of the same JSON structure. The output could be HTML, XHTML, RTF, YAML, JSON, or whatever you want.

What this amounts to is that you can store in an Emdros database, not only “what” you want to store (the data), but also “how” you want to extract it, in what order you want to “assemble” the information, and how you want to “present” it (using HTML, RTF, or another presentation language). You just store the JSON script in an EMdF object that your application knows how to find in your database. When you want to use the JSON, you grab the EMdF object, extract the JSON, and pass it to the harvesting library, along with information about which monad set to harvest, and out comes your nifty, formatted HTML, XML, or whatever it is your stylesheet produces.
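The split between extracted data and a stored presentation rule can be illustrated with a toy renderer. In the harvesting library the rules are written in JSON; in this hypothetical C++ sketch they are a plain map from object type to a template in which {name} is replaced by the value of the feature called name. None of these types or names come from the harvesting library.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// An extracted object: its type plus feature-name -> value pairs.
struct HarvestedObject {
    std::string type;
    std::map<std::string, std::string> features;
};

// Hypothetical "stylesheet": one output template per object type.
using Stylesheet = std::map<std::string, std::string>;

// Renders objects in order; objects whose type has no rule are skipped.
std::string render(const Stylesheet& sheet, const std::vector<HarvestedObject>& objects) {
    std::string out;
    for (const auto& obj : objects) {
        auto rule = sheet.find(obj.type);
        if (rule == sheet.end()) continue;
        const std::string& tmpl = rule->second;
        for (std::size_t i = 0; i < tmpl.size(); ++i) {
            if (tmpl[i] == '{') {
                std::size_t close = tmpl.find('}', i);
                if (close != std::string::npos) {
                    // Substitute the named feature (nothing if it is missing).
                    auto f = obj.features.find(tmpl.substr(i + 1, close - i - 1));
                    if (f != obj.features.end()) out += f->second;
                    i = close;
                    continue;
                }
            }
            out += tmpl[i];
        }
    }
    return out;
}
```

Because the stylesheet is just data, it can live in the database next to the text it describes, and changing the output format means changing the stored rules rather than recompiling the application.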

This will appear in the next public release of Emdros. Interested parties are, as always, welcome to contact me (http://emdros.org/contact.html) to get preview code.

Ulrik

SQLite 3 on Mac OS X

March 10th, 2010

I was doing some speed tests of the BPT engine on Mac OS X Tiger, which ships with SQLite 3.1.3.  I accidentally built Emdros against the SQLite 3 that ships with Tiger, and found that GET OBJECTS HAVING MONADS IN was painfully slow. It took up to 10 minutes to run a basic query.

I found out that SQLite 3.1.3 doesn’t have some optimizations for column_name BETWEEN X AND Y which are present in later versions of SQLite 3.  GET OBJECTS HAVING MONADS IN makes heavy use of precisely this construct.

So I changed all cases of this construct to the idiom “column_name >= X AND column_name <= Y”. This has no noticeable effect on speed on Linux, but gives a large speed increase on Mac OS X.

Still, BPT beats SQLite 3 by a wide margin: 270 seconds for SQLite 3 versus 214 seconds for BPT to run all test queries in one of my test suites (124 queries against a syntactic database of 1.4 million syntactic objects). That’s a reduction in running time of about 20% on this particular combination of hardware and Mac OS X 10.4. (The hardware is a 2007 Mac Mini, Intel Core Duo 1.6GHz, 1GB RAM.)
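The 20% figure is the reduction in total running time; expressed as a speed-up factor, the same two measurements mean BPT ran roughly 1.26 times as fast. A small C++ check of both numbers:

```cpp
// Relative reduction in running time: (old - new) / old.
double time_reduction(double old_seconds, double new_seconds) {
    return (old_seconds - new_seconds) / old_seconds;
}

// Speed-up factor: how many times faster the new run is.
double speedup(double old_seconds, double new_seconds) {
    return old_seconds / new_seconds;
}
```

For 270 and 214 seconds these give about 0.207 (a 20.7% time reduction) and about 1.262 (a 26% higher throughput), so which percentage is quoted depends on which of the two is meant.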

Ulrik