Archive for the ‘Meta’ Category

Compiling wxWidgets 2.8.12 on Mac OS X 10.6 (Snow Leopard) for use with Mac OS X 10.4 and later

Saturday, October 20th, 2012

I’ve successfully compiled Universal binaries of wxWidgets 2.8.12 on Mac OS X 10.6 (Snow Leopard) that work with Mac OS X 10.4 and later.

Here’s how:

  1. Unpack the sources of wxMac-2.8.12.tar.gz
  2. cd wxMac-2.8.12
  3. mkdir macosx
  4. cd macosx
  5. ../configure --enable-unicode --disable-shared --prefix=/Users/ulrikp/opt/wxMac-2.8.12-10.4-Unicode-noshared --with-macosx-sdk=/Developer/SDKs/MacOSX10.4u.sdk --with-macosx-version-min=10.4 --enable-universal_binary CC=gcc-4.0 CXX=g++-4.0 LD=g++-4.0
  6. make -j 2 all
  7. make install

The crucial part is in step #5. You can, and should, change the path given to the --prefix switch to match your username.

The part that threw me off was that you have to switch the C and C++ compiler away from the default, to gcc-4.0 and g++-4.0. This is documented here.

Step #6 has the switch -j 2. This makes “make” use two processes at once, whenever it can. If you’ve got more horsepower than I do, you can up this to 4 or 8, or whatever is appropriate for your processor count.

After step #7, you can do this:

export PATH=/Users/ulrikp/opt/wxMac-2.8.12-10.4-Unicode-noshared/bin:$PATH

Then any configure script which uses wx-config to determine how to use wxWidgets will pick up the version of wx-config which we’ve just compiled.

Remember, though, to do

CC=gcc-4.0 CXX=g++-4.0 LD=g++-4.0

as well, when compiling/configuring your program.

Ulrik

 

Harvesting revisited

Monday, April 16th, 2012

I’ve spent some time writing about how to harvest objects to produce documents. The result is some documentation of a yet-to-be-implemented “Render2” library. It is basically a description of some languages which are at once more powerful and yet also simpler than the RenderObjects and RenderXML library languages.

Once implemented, the Render2 code will:

  • Be easier to use than RenderObjects and RenderXML
  • Be more powerful than RenderObjects and RenderXML
  • Be more easily extensible than RenderObjects and RenderXML

The idea is still the same as in RenderObjects and RenderXML:

  • “Stylesheets” tell the Render2 engine what to do when encountering an object in the database (when retrieving), or what to do with XML elements (when parsing XML).
  • These “Stylesheets” basically tell what to do at the start and/or end of an object or XML element.
  • The “Stylesheets” are ordered in a tree, with inheritance semantics between them.
  • “What to do” at the start/end of an object / XML element is expressed in a second language, called a “template language”. The template language is quite powerful (both for the old RenderObjects/RenderXML library and the new Render2 library), and has support for things like variables, lists, counters, etc.

What’s new in the Render2 library includes:

  • “RenderObjects2” stylesheets can inherit from other “RenderObjects” stylesheets. This is not just for RenderXML stylesheets any more.
  • The new template language is more regular, with fewer idiosyncrasies, and more expressive power. This expressive power comes in part from the new concept of “pockets” (see below).
  • The new template language introduces the idea of functions. A number of built-in functions will be provided. I am debating with myself whether to include a small scripting language in which the user can express functions themselves. We’ll see.
  • The new template language introduces the idea of expressions, which can be used in such places as “if” templates, and in parameters to function-calls.
  • The new stylesheet language (in which the template language is embedded) has a very, very simple grammar which fits in about 12 grammar-rules in Extended Backus-Naur Form. This alone should make it easier to use than the current JSON-embedded stylesheet language. The simple grammar makes it very, very easy to remember how to create a stylesheet, with very few “what you don’t know will hurt you” surprises.
  • The new stylesheet language introduces the idea of strings that are """triple-quoted""". This idea has been stolen from Python. The idea is to be able to use "single quotes" and newlines within """triple-"quote" strings""" without needing to escape them with backslashes. This should not only make the new stylesheets easier to use in practice (because of fewer backslashes); it should also make them more beautiful.
  • The new Stylesheet language uses the idea of “packet” to encompass all the different kinds of things you put into a stylesheet. Basically, a stylesheet unit is an ordered list of “packets”, where each packet has a packet name and a packet class (telling us how to use it), and a packet always belongs to exactly one stylesheet. Internally, a packet is no more, no less than an ordered list of key/value pairs. (This ordered list of key/value pairs may turn into a map/dictionary, but that is not part of the syntax, only part of the semantics).
  • The old RenderObjects/RenderXML stylesheets had the disadvantage that it was sometimes difficult to see which stylesheet we were currently looking at, since the stylesheet name was only mentioned once, at the top of the stylesheet. The new stylesheet language repeats the stylesheet name for every “packet”, making it easier to orient oneself in the stylesheet unit file.
  • The C++ API to the Render2 library has been greatly simplified as compared to the RenderObjects/RenderXML library.
  • The Render2 library takes a Set of Monads, not a range of monads, when needing to retrieve objects. This generalization makes it much more powerful than the old RenderObjects/RenderXML library.
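
The “packet” idea from the list above can be sketched in a few lines of Python. This is only an illustration of the data model as described (an ordered list of key/value pairs plus a name, a class and an owning stylesheet); the class and field names are hypothetical and are not taken from the Render2 documentation or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Packet:
    """Hypothetical model of one packet in a stylesheet unit."""
    stylesheet: str    # the one stylesheet this packet belongs to
    packet_class: str  # tells the engine how to use the packet
    name: str          # the packet name
    # Syntactically, a packet is just an ordered list of key/value pairs.
    pairs: List[Tuple[str, str]] = field(default_factory=list)

    def as_dict(self) -> Dict[str, str]:
        # Semantically, the pairs may be read as a map; a later key wins here
        # (that tie-breaking rule is an assumption of this sketch).
        return dict(self.pairs)


# A stylesheet unit is then an ordered list of packets, each naming its stylesheet.
unit = [
    Packet("base", "element", "verse", [("start", "<v>"), ("end", "</v>")]),
    Packet("base", "element", "chapter", [("start", "<c>"), ("end", "</c>")]),
]
```

Repeating the stylesheet name on every packet is what lets a reader see, anywhere in the file, which stylesheet a packet belongs to.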

The idea of “pockets” has been introduced. A “pocket” is a map/dictionary which maps strings to lists of strings. In addition, each pocket has a name which is a C identifier. The idea that one can redirect the output to a pocket, and that one can refer to the list of strings in a pocket by pocket-name coupled with pocket-key, has turned out to be quite powerful and general, supporting within one data-structure such diverse concepts as: variables, counters, integer-arithmetic, lists, and the “pockets” themselves, which can be used to output stuff “later” in the document than otherwise would have been the case.
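
To make the “pocket” idea more concrete, here is a minimal Python sketch of a pocket as a named map from strings to lists of strings. The class and method names (Pocket, append, set, increment) are hypothetical and chosen only for illustration; the Render2 template language may express these operations quite differently.

```python
import re
from collections import defaultdict

# A pocket name must be a C identifier.
_C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class Pocket:
    """Hypothetical pocket: maps string keys to lists of strings."""

    def __init__(self, name):
        if not _C_IDENTIFIER.fullmatch(name):
            raise ValueError("pocket name must be a C identifier: %r" % name)
        self.name = name
        self._entries = defaultdict(list)

    def append(self, key, text):
        # Redirected output: collect text now, emit it later in the document.
        self._entries[key].append(text)

    def get(self, key):
        # Refer to the list of strings by pocket name plus pocket key.
        return list(self._entries.get(key, []))

    def set(self, key, value):
        # A variable is simply a one-element list.
        self._entries[key] = [str(value)]

    def increment(self, key, step=1):
        # A counter is a variable holding an integer written as a string.
        values = self._entries.get(key)
        current = int(values[-1]) if values else 0
        self.set(key, current + step)
        return current + step


footnotes = Pocket("footnotes")
footnotes.append("body", "1. First note")
footnotes.append("body", "2. Second note")
footnotes.increment("count")
footnotes.increment("count")
```

The same structure serves as a list (the “body” key), a counter (the “count” key) and a place to park output for later.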

Interested parties are welcome to ask for the documentation. The documentation is still a work-in-progress, but implementation will hopefully start soon.

Ulrik

Linguistic Tree Constructor 3.0.4 released (with an Easter egg)

Saturday, September 18th, 2010

I’ve released Linguistic Tree Constructor (LTC) version 3.0.4 over at http://ltc.sourceforge.net …

The significance for this blog is that:

  1. LTC uses Emdros
  2. The latest release of LTC has the latest Emdros sources for 3.2.1.pre02 as an Easter egg inside.

Go grab the sources of LTC if you want to see what I’m up to for the next version of Emdros, then look at the ChangeLog.

Enjoy!

Ulrik Sandborg-Petersen

Linguistic Tree Constructor — 25000 downloads passed

Monday, June 28th, 2010

One of my Open Source “successes”, Linguistic Tree Constructor, has passed 25000 downloads over at SourceForge.net.

Linguistic Tree Constructor (LTC) is a tool for building linguistic syntax trees in no time flat, using your mouse. Its main strength is quick annotation of large amounts of text, i.e., production of syntactic databases. It is based on Emdros for much of its implementation.

You can see the stats, or download for Mac OS X, Windows, and Linux, courtesy of the kind folk at SourceForge.Net.

Enjoy!

Ulrik

Tree-displays and Emdros queries

Tuesday, November 10th, 2009

For some time, I have been having ideas for how to make a tree-based topographic query editor.  Today I’ve been working my way towards the preliminaries for an implementation of those ideas. That is, I have been working on getting the tree displays that are in the wx/htreecanvas.cpp source code file to look much better.  That’s the first step.

The ideas can be laid out as follows:

  1. Have an interactive query-editor in which the query looks like a linguistic tree, with the root (Query) at the top and its branches (going downwards) going to nodes that represent object blocks, power blocks, gap blocks, optional gap blocks, groupings, etc.
  2. For object blocks, the main node name should be the object type. Below the object type name is shown the node number (e.g., “Clause 1”) (this becomes an object reference declaration, e.g., “AS Clause1”). Any repetition (kleene star) gets shown below the node number, e.g., “*{1-3}” or simply “*”. Then, if there are any feature-restrictions, they get to be shown below it, probably as a subtree of nodes where each AND becomes two new nodes (with OR being the node name), and each OR becomes another line in a stacked box of disjoined terms.
  3. For power blocks, the node name is simply “…” and below it we find any monad-restrictions spelled out (e.g., “< 5”).
  4. For gap blocks, the node name is simply “gap”.
  5. For optional gap blocks, the node name is simply “optional gap”.
  6. For groupings, the node name is simply “group”.  Any repetition (kleene star) gets shown below the “group” name, e.g., “*{1-3}” or simply “*”.
  7. For OR between strings of blocks, the node name is simply “OR”.
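
As a rough illustration of the node naming rules in the list above, the following Python sketch models a query tree and computes the label lines each node would show. Everything here (QueryNode, label_lines, outline) is a hypothetical name invented for this sketch, feature-restrictions are left out, and nothing is taken from the Emdros sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryNode:
    kind: str                      # "query", "object", "power", "gap", "optional gap", "group" or "OR"
    object_type: str = ""          # only used by object blocks
    number: Optional[int] = None   # node number of an object block
    repetition: str = ""           # e.g. "*{1-3}" or "*"
    monad_restriction: str = ""    # e.g. "< 5", only used by power blocks
    children: List["QueryNode"] = field(default_factory=list)

    def label_lines(self) -> List[str]:
        """Return the lines shown in the node, top to bottom."""
        if self.kind == "query":
            lines = ["Query"]
        elif self.kind == "object":
            lines = [self.object_type]
            if self.number is not None:
                lines.append("%s %d" % (self.object_type, self.number))
        elif self.kind == "power":
            lines = ["…"]
            if self.monad_restriction:
                lines.append(self.monad_restriction)
        elif self.kind in ("gap", "optional gap", "group", "OR"):
            lines = [self.kind]
        else:
            raise ValueError("unknown node kind: %r" % self.kind)
        # Only object blocks and groupings can carry a repetition.
        if self.repetition and self.kind in ("object", "group"):
            lines.append(self.repetition)
        return lines


def outline(node: QueryNode, depth: int = 0) -> List[str]:
    """Flatten the tree into indented text lines, root first."""
    lines = ["  " * depth + " / ".join(node.label_lines())]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


tree = QueryNode("query", children=[
    QueryNode("object", object_type="Clause", number=1, repetition="*{1-3}"),
    QueryNode("power", monad_restriction="< 5"),
    QueryNode("gap"),
])
print("\n".join(outline(tree)))
```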

There should be a palette from which to choose these node types.

Each node should result in a side-panel which shows the options for this node-type.

If it is an object block:

  • All features, together with the possible values (e.g., for enumerations, a checklistbox; for strings, a text control, etc.)
  • A way of saying that such and such a feature is equal to (less than, greater than, different from, etc.) some other feature of some other, named node in the tree.
  • Whether the object block should NOTEXIST or not.
  • Whether the object block should be FIRST, LAST, or FIRST AND LAST within the context.
  • Any repetition (Kleene Star).

If it is a power block:

  • Any monad-restrictions.

If it is a gap or optional gap block:

  • Nothing to select.

If it is a grouping:

  • Any repetition (Kleene Star).

If it is an OR between strings of blocks:

  • Nothing to select.

There should be ways of moving nodes around, and copying and pasting subtrees.

The tree should probably have slanted lines rather than lines that are perpendicular.

The tree should not have the leaf nodes in a straight line at the bottom, but should have the leaf nodes that are at the same level be horizontally laid out in a straight line.
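
The layout rule just described (nodes at the same level share a horizontal line, rather than all leaves being pushed to the bottom) amounts to grouping nodes by depth. A tiny Python sketch, assuming a tree given as nested (label, children) tuples, which is a representation invented for this example:

```python
def rows_by_depth(tree, depth=0, rows=None):
    """Group node labels by depth; each depth is one horizontal row."""
    if rows is None:
        rows = {}
    label, children = tree
    rows.setdefault(depth, []).append(label)
    for child in children:
        rows_by_depth(child, depth + 1, rows)
    return rows


query = ("Query", [
    ("Clause", [("Phrase", []), ("Word", [])]),
    ("gap", []),
])
rows = rows_by_depth(query)
```

Here the leaf “gap” lands on the same row as “Clause”, one level below the root, instead of being drawn at the bottom next to “Phrase” and “Word”.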

The above is a general overview of what it should look and feel like. Much inspiration was taken from the way that Logos Bible Software does it. They, however, have a tree which grows from the left and goes right.

I do believe that the above will make it easier for the user to understand what is going on.

The above is only a sketch, with lots of details to be filled out.

But it’s a start.

Ulrik

Emdros Query Tool: New Harvesting algorithms

Wednesday, March 19th, 2008

Since its inception by Hendrik Jan Bosman many years ago, the Emdros Query Tool has only had one harvesting algorithm. Well, until today, that is. Now it has four, including the old one.

The overall harvesting algorithm is:

  1. Execute the query. This results in a sheaf.
  2. Traverse the sheaf and gather a list of “hits”: One monad set for each “hit”.
  3. Traverse the sheaf and gather the big-union of the sets of monads in all matched objects whose “Focus” boolean is true. This is called the “sheaf focus monad set”.
  4. Get a set of raster monad ranges based on the list of “hits”. A “raster monad range” determines how much context to show around a set of monads corresponding to a “hit”. See below for how it is calculated.
  5. Get all “data units” and their features, based on the set of monads being the big-union of all raster monad ranges. A “data unit” is an object type whose objects must be shown for any given hit. Typical data units include “Word”, “Phrase”, “Clause”, “Sentence”, etc. These are retrieved using the MQL statement called “GET OBJECTS HAVING MONADS IN”.
  6. Traverse the list of “hit” monad sets. For each monad set, calculate one “solution” to be: (i) The “hit” set of monads; (ii) The set of monads arising from taking all of the raster units that overlap with a stretch of monads in the “hit” set of monads. This is called the “raster monad set” for this solution; (iii) All data unit objects which have monad sets which overlap with the “raster monad set”; (iv) A “focus set of monads”, which is the intersection of the “raster monad set” and the “sheaf focus monad set”.
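
Step #6 is essentially set arithmetic, so it can be sketched with plain Python sets. This is not the Emdros Query Tool’s code: the function name, the tuple layout for data units and the dictionary it returns are all assumptions made for the illustration, and monads are modelled as plain integers.

```python
def build_solution(hit, raster_ranges, data_units, sheaf_focus):
    """Compute one "solution" for one "hit" monad set.

    hit           -- set of monads for this hit
    raster_ranges -- list of (first, last) monad ranges, inclusive
    data_units    -- list of (object_type, monad_set) pairs
    sheaf_focus   -- the "sheaf focus monad set"
    """
    # (ii) Union of all raster ranges that overlap with the hit.
    raster = set()
    for first, last in raster_ranges:
        monads = set(range(first, last + 1))
        if monads & hit:
            raster |= monads
    # (iii) Data unit objects whose monad sets overlap the raster monad set.
    units = [(otype, monads) for otype, monads in data_units if monads & raster]
    # (iv) Focus set: intersection of raster monad set and sheaf focus monad set.
    focus = raster & sheaf_focus
    return {"hit": hit, "raster": raster, "units": units, "focus": focus}


solution = build_solution(
    hit={5, 6},
    raster_ranges=[(1, 4), (5, 8), (9, 12)],
    data_units=[("Word", {3}), ("Word", {6}), ("Phrase", {4, 5})],
    sheaf_focus={6, 10},
)
```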

There are two changes to the harvesting algorithm which I have made today. The first relates to step #2 (gathering “hit” monad sets), and the second relates to step #4 (gathering raster monad ranges).

For the first change, there are now four ways to gather “hit” monad sets, as opposed to only one before today:

  • “outermost”: This is the old one which was already there. It simply traverses the sheaf, and for each outermost straw, it calculates one set of monads being the big-union of the monad sets of all matched objects which are direct children of each outermost straw. Naturally, this can get unwieldy if the outermost block is, say, a “book”.
  • “focus”: This calculates one “hit” monad set for each matched object whose “focus” boolean is “true”. The “hit” monad set is simply the monad set of the matched object.
  • “innermost”: This calculates one “hit” for each straw which satisfies the condition that all its children are terminals in the sheaf tree, i.e., none of the children have an inner sheaf. The “hit” is simply the big-union of the monad sets of all matched objects in such straws.
  • “innermost_focus”: Like innermost, but only does the big-union of the monad sets of those matched objects in the straw whose focus boolean is “true”.
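
The four strategies above can be sketched over a simplified model of a sheaf: a sheaf is a list of straws, a straw is a list of matched objects, and a matched object may carry an inner sheaf. The Python below is only an illustration under those assumptions; MatchedObject, iter_straws and gather_hits are hypothetical names, not the Emdros C++ API. Two details are guesses of this sketch: an empty inner sheaf is treated like a missing one, and “innermost_focus” skips straws that contain no focus objects.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class MatchedObject:
    monads: Set[int]
    focus: bool = False
    inner: Optional[List[List["MatchedObject"]]] = None  # inner sheaf, if any


def iter_straws(sheaf):
    """Yield every straw in the sheaf tree, outermost first."""
    for straw in sheaf:
        yield straw
        for mo in straw:
            if mo.inner:
                yield from iter_straws(mo.inner)


def big_union(objects):
    result = set()
    for mo in objects:
        result |= mo.monads
    return result


def gather_hits(sheaf, strategy):
    """Return a list of "hit" monad sets for the given strategy."""
    if strategy == "outermost":
        # One hit per outermost straw: union over its direct children.
        return [big_union(straw) for straw in sheaf]
    if strategy == "focus":
        # One hit per matched object whose focus boolean is true.
        return [set(mo.monads) for straw in iter_straws(sheaf)
                for mo in straw if mo.focus]
    if strategy in ("innermost", "innermost_focus"):
        hits = []
        for straw in iter_straws(sheaf):
            if any(mo.inner for mo in straw):
                continue  # not innermost: some child has an inner sheaf
            chosen = straw if strategy == "innermost" else [mo for mo in straw if mo.focus]
            hit = big_union(chosen)
            if hit:  # assumption: empty hits are dropped
                hits.append(hit)
        return hits
    raise ValueError("unknown strategy: %r" % strategy)


w1, w2, w3 = MatchedObject({1}), MatchedObject({2}, focus=True), MatchedObject({3})
clause = MatchedObject({1, 2, 3}, inner=[[w1, w2], [w3]])
sheaf = [[clause]]
```

With a single clause as the outermost block, “outermost” yields one large hit, while “innermost” yields one hit per word-level straw, which is why the latter suits concordance views.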

The “innermost” and “innermost_focus” algorithms are especially well suited to making concordance-views (which I’ll hopefully blog about at some point).

The second change is to step #4, which calculates the raster monad ranges. The old way was to be given an object type (a “raster unit”) whose objects would determine the context range of monads. This was done with GET OBJECTS HAVING MONADS IN, using the big-union of all “hit” monad sets, and using the “raster unit” object type as the object type to GET. This method is still available.

The new way, however, specifies two context sizes: “raster_context_before” and “raster_context_after”. These are two independent, positive integers which determine the raster context ranges. The algorithm is to traverse the list of “hit” monad sets, and for each monad set, take the first monad minus “raster_context_before” as the first monad of the range, and take the last monad plus “raster_context_after” as the last monad of the range. Again, this is especially useful for concordance-type views.

This will appear in the next public release after 3.0.1.
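
The context-based calculation of raster monad ranges described above is short enough to sketch in Python. The function name is hypothetical, and clamping the first monad so that it never drops below 1 is an assumption of this sketch rather than something stated about the Emdros Query Tool.

```python
def raster_ranges_from_context(hits, raster_context_before, raster_context_after, min_monad=1):
    """Return one (first, last) raster monad range per "hit" monad set."""
    ranges = []
    for hit in hits:
        # First monad of the hit, minus the context before (clamped: an assumption).
        first = max(min_monad, min(hit) - raster_context_before)
        # Last monad of the hit, plus the context after.
        last = max(hit) + raster_context_after
        ranges.append((first, last))
    return ranges


ranges = raster_ranges_from_context([{10, 11, 15}, {3}], 5, 2)
```

For a concordance view, each range then gives a fixed window of monads on either side of the hit, independent of where clause or sentence boundaries fall.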

As always, if anyone is interested in having a preview, please contact me.

Until then,

Ulrik

Emdros downloads approaching 16000

Sunday, July 15th, 2007

Within the next 24 hours, I expect that the 16000th copy of Emdros and related files will be downloaded from SourceForge.Net. This does not include those copies that may have been downloaded from elsewhere.

Emdros was first released to the public on October 11, 2001, as version 1.0.3. Since then, around 45 releases have been made public. One Linux distribution (the Russian “Alt Linux”) has picked it up and included it in their portfolio of packages. Two companies have bought licenses, and incorporated it into their software, so that Emdros may be in use by thousands of people every day. At least four academic settings have used Emdros for meeting their own needs, including IRIT in Toulouse, France, who are using Emdros as the foundation for a concordancer used by linguists in their research. Several individuals have been very kind, and have written to me with requests for help and enhancements, and some have even contributed bugfixes. Emdros has taken me to two countries to meet with people who were interested in using Emdros, and my work on Emdros has led to several new friends, some of whom I have not yet met face to face, but only via the Internet. So, I have been truly blessed by the Lord in his making me able to produce Emdros.

Update: It has already happened, as of around 09:15 GMT, on 2007-07-15.