Archive for the ‘EMdF’ Category

On wrap blocks, marks, and other goodies

Friday, May 4th, 2007

A lot has happened to the Emdros sources since the latest public release, which was 1.2.0.pre242. Here is a short list of the most important changes:

  • You can now specify “marks” on an object block or (opt) gap block. This means that you can pass arbitrary identifiers back to the layer above Emdros for each object block or (opt) gap block, like so: [Clause`yellow [Phrase`red]]. The Emdros engine doesn’t interpret these “marks”; it just passes them along in the MatchedObject. This is an enhancement of the idea of “focus”.
  • A “[wrap]” block has been added (see below).
  • TIGER-XML import capabilities have been added. The implementation covers most of the features of the TIGER-XML spec, but things like corpora in separate files have been left out for now.
  • An almost-finished “configuration wizard” for the Emdros Query Tool has been added. It doesn’t actually do any configuration yet, but the GUI is almost finished.
  • A lot of regression tests were added, bringing the total of queries run up to around 300.
  • Lots of small and not-so-small bugfixes.

As for the “wrap” block, it works like a grouping of block_strings. This means that you can specify, at any level in the query, that a certain string of blocks (possibly several strings of blocks with OR in between) must occur, while still being able to specify blocks to come before and/or after the wrap block. The wrap block behaves as if there is an implicit power block (..) before the first block inside the wrap block. This can be circumvented by specifying “first” on the first object block inside the wrap block.

The wrap block is useful mostly if you want to put ORs in between strings of blocks in the middle of a string of blocks. Otherwise, if you don’t use ORs, it degenerates to the case where the wrap block wasn’t there, with the caveat that there is an implicit power block at the start (unless you specify “first”).

It isn’t quite as simple as a “grouping of block_strings”; that is, you shouldn’t think of it as mere parentheses around an OR-separated string of block_strings. The reason is that it is a full “blocks” construct that is inside, and thus there is an implicit power block at the start, as mentioned.

The wrap block doesn’t, as in Crist-Jan Doedens’ book, result in an intermediate pow_m object. Instead, the wrap block computes the straws inside of it as if the Substrate and Universe started at whatever monad we came to with the previous block (+ 1) and extended until the end of the surrounding Substrate. If there was no previous block, then the Universe and Substrate are identical to those of the context in which the wrap block is embedded. Once the straws arising from the “blocks” inside the wrap block have been computed, each of them is used as a basis for computing straws from whatever comes before and after the wrap block. Thus each straw arising from the wrap block is used at the level in which the wrap block is embedded, as if the wrap block were a single block resulting in straws each with a single matched object (but really there may be more than one matched object in each straw, since the wrap block can contain arbitrary block_strings).
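
To make the starting point concrete, here is a rough Python sketch of the range of monads in which the inside of a wrap block is matched. It is only an illustration, not the Emdros implementation: the function name and parameters are made up, and it simplifies the Substrate to one contiguous range of monads, whereas a real Substrate is a set of monads that may have gaps.

```python
def wrap_monad_range(surrounding_first, surrounding_last, previous_last=None):
    """Return (first, last) monad available to the inside of a wrap block.

    Hypothetical helper: the surrounding Substrate is simplified to the
    contiguous range surrounding_first..surrounding_last.
    previous_last is the last monad of the block matched just before the
    wrap block, or None if the wrap block has no previous block.
    Returns None if no monads are left for the wrap block.
    """
    if previous_last is None:
        # No previous block: same range as the surrounding context.
        return (surrounding_first, surrounding_last)
    first = previous_last + 1
    if first > surrounding_last:
        return None
    return (first, surrounding_last)


# A wrap block inside a Clause spanning monads 1..100, after a Word
# that ended at monad 12, is matched within monads 13..100.
print(wrap_monad_range(1, 100, previous_last=12))
```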

For example:

[Clause [Word] [wrap [Phrase][Phrase] OR [Clause first]] [Word]]

This could be translated to this query:

[Clause [Word]..[Phrase][Phrase][Word] OR [Word][Clause][Word]]

This second query would yield identical results, except that the ordering of the straws in the sheaf would be different.

Note how the Clause inside the wrap block has been anchored at the start of the wrap block with the “first” keyword, and so, in the translated query, there is no power block (..) between the first [Word] block and the [Clause] block.
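
The translation above is mechanical enough to sketch in a few lines of Python. This is only a string-level illustration of the rewriting rule, not anything Emdros does internally; the function name and the way the blocks are represented (plain strings, plus a flag saying whether the alternative is anchored with “first”) are my own assumptions.

```python
def expand_wrap(before, alternatives, after):
    """Rewrite 'before [wrap alt1 OR alt2 ...] after' without a wrap block.

    before, after: lists of block strings surrounding the wrap block.
    alternatives:  list of (blocks, anchored) pairs, where 'anchored' is
                   True if the first block was marked with "first".
    Each alternative gets an implicit power block (..) in front of it
    unless it is anchored.
    """
    strings = []
    for blocks, anchored in alternatives:
        power = "" if anchored else ".."
        strings.append("".join(before) + power + "".join(blocks) + "".join(after))
    return " OR ".join(strings)


# The example from above: [wrap [Phrase][Phrase] OR [Clause first]]
inner = expand_wrap(
    before=["[Word]"],
    alternatives=[(["[Phrase]", "[Phrase]"], False), (["[Clause]"], True)],
    after=["[Word]"],
)
print("[Clause " + inner + "]")
# prints: [Clause [Word]..[Phrase][Phrase][Word] OR [Word][Clause][Word]]
```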

I have no schedule for when this is going to be available to the general public. If you really, really want to try it, drop me a note.

Until then,

Enjoy!

Ulrik

Regression-testing Emdros

Wednesday, January 17th, 2007

It suddenly dawned on my otherwise benighted cogitation-device a few days ago that a good answer to the problem of adding more regression tests to Emdros would be to have a file in which MQL queries and their expected output were placed together.

Last night I designed a simple little file format which fits this need.  It is processed with a small Python script to a C++ header file containing an array of QueryAnswer objects. A QueryAnswer is an object that holds both a query and its expected answer, along with a few booleans such as whether a compiler error is expected, and whether to create a new database before this query starts.
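
To give an idea of the approach, here is a minimal Python sketch of such a converter. Everything specific in it is an assumption made for illustration: the file layout (records separated by a “====” line, query and expected answer separated by a “--” line, an optional “#flags:” first line), the field order in the generated initializers, and the array name. The actual format and the actual QueryAnswer class in Emdros may well look different.

```python
from dataclasses import dataclass


@dataclass
class QueryAnswer:
    query: str
    answer: str
    expect_compiler_error: bool = False
    new_database: bool = False


def parse_tests(text):
    """Parse the (hypothetical) test file format into QueryAnswer records."""
    tests = []
    for record in text.split("\n====\n"):
        record = record.strip("\n")
        if not record:
            continue
        head, _, answer = record.partition("\n--\n")
        lines = head.split("\n")
        flags = set()
        if lines and lines[0].startswith("#flags:"):
            flags = set(lines[0][len("#flags:"):].split())
            lines = lines[1:]
        tests.append(QueryAnswer("\n".join(lines), answer,
                                 "compiler_error" in flags, "new_db" in flags))
    return tests


def c_string(s):
    """Quote s as a C/C++ string literal (handles \\, " and newlines only)."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_header(tests):
    """Emit a C++ header fragment with one initializer per test."""
    out = ["static const QueryAnswer regression_tests[] = {"]
    for t in tests:
        out.append("    { %s, %s, %s, %s }," % (
            c_string(t.query), c_string(t.answer),
            str(t.expect_compiler_error).lower(),
            str(t.new_database).lower()))
    out.append("};")
    return "\n".join(out) + "\n"


# Placeholder input; the answers are dummies, not real Emdros output.
SAMPLE = """#flags: new_db
CREATE DATABASE 'test' GO
--
<expected output 1>
====
SELECT ALL OBJECTS WHERE [Word] GO
--
<expected output 2>
"""

tests = parse_tests(SAMPLE)
header = to_header(tests)
print(header)
```

Keeping the query and its expected output side by side in one text file is what makes adding a test cheap: a new test is just a new record, and the C++ side only needs to be recompiled.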

I then proceeded to add more than 135 regression tests. As a result, I caught five obscure bugs in Emdros. Most of the bugs are so obscure that few users are likely to have run into them. The exception is one that involved a segfault on queries of the type GET OBJECTS HAVING MONADS IN [myObjectType GET ALL], the operative clause being “GET ALL”.

Most of the MQL engine is now tested with the regression test machinery, including most error messages. The machinery even checks that the expected error message is emitted in case an error is expected. I should say, though, that the statement that “most” of the MQL engine is tested remains a conjecture. Hence, I look forward to running gcov to see what the real percentage of coverage is.
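
The check itself is simple in principle. Here is a small Python sketch of the idea; the real machinery is C++, and the names below (Expected, check, fake_engine) as well as the error text are invented purely for illustration.

```python
from collections import namedtuple

# Hypothetical record: the query, its expected output (or expected error
# message), and whether an error is expected.
Expected = namedtuple("Expected", "query answer expect_error")


def check(expected, run_query):
    """Return True if running the query gives exactly what was expected.

    run_query is a stand-in callable returning (succeeded, output_text).
    When an error is expected, the query must fail AND the error message
    must match; otherwise it must succeed with matching output.
    """
    succeeded, output = run_query(expected.query)
    if expected.expect_error:
        return (not succeeded) and output == expected.answer
    return succeeded and output == expected.answer


def fake_engine(query):
    # Toy stand-in for the MQL engine: queries without GO "fail".
    if "GO" not in query:
        return (False, "made-up error text")
    return (True, "ok")


print(check(Expected("SELECT GO", "ok", False), fake_engine))
print(check(Expected("SELECT", "made-up error text", True), fake_engine))
```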