Merging LIFT dictionary files

Author John Hatton | 21.01.2010 | Category WeSay

If you aren’t yet using the new collaboration features of WeSay, you may have multiple versions of your dictionary out there.  Here are a few notes on ways to get them together.

The simplest case is where the users have been working on completely different sets of words, with no overlap. That is, they each started with completely empty dictionaries, which have never once been merged together.  In this specific case, you can merge them by hand.  Do that by opening each .lift file in a text editor and copying all the entry chunks (the <entry> elements) of one file in next to the entry chunks of the other file.  Then open the result in WeSay to make sure you didn’t mess the LIFT file up.
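If you would rather not copy and paste by hand, the same no-overlap merge can be sketched in a few lines of Python. This is only a rough sketch, not a WeSay tool: the function name merge_lift_entries is made up here, and it assumes each file has a <lift> root whose <entry> children carry a guid or id attribute. It keeps the first file's header and ignores the second's, and it stops if it finds the same entry in both files, because that is the case where you should use FLEx instead.

```python
import xml.etree.ElementTree as ET


def merge_lift_entries(base_xml: str, other_xml: str) -> str:
    """Append every <entry> of other_xml to base_xml (hypothetical helper).

    Only safe for the no-overlap case: raises ValueError if an entry
    with the same guid/id appears in both documents.
    """
    base_root = ET.fromstring(base_xml)
    other_root = ET.fromstring(other_xml)

    def key(entry):
        # Prefer the guid; fall back to the id attribute.
        return entry.get("guid") or entry.get("id")

    seen = {key(e) for e in base_root.findall("entry")}
    seen.discard(None)

    for entry in other_root.findall("entry"):
        k = key(entry)
        if k is not None and k in seen:
            raise ValueError(f"entry {k!r} is in both files; merge with FLEx instead")
        if k is not None:
            seen.add(k)
        base_root.append(entry)

    return ET.tostring(base_root, encoding="unicode")


# Tiny made-up example documents.
a = '<lift version="0.13"><entry id="cat"/></lift>'
b = '<lift version="0.13"><entry id="dog"/></lift>'
merged = merge_lift_entries(a, b)
```

As with the by-hand approach, open the merged file in WeSay afterwards to check that it still loads.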

In the more general case, you will want to merge them together using FieldWorks Language Explorer (FLEx).  To do that, follow these steps:

1) Create a new project using FLEx.

2) Import each .lift file into the project, one at a time, until you have a nice combined dictionary.

If getting/installing/using FLEx seems like too much, you can always just write to the WeSay email list and ask someone to do the merge for you.

New Writing System UI

Author John Hatton | 04.01.2010 | Category Palaso Library

This posting will be of interest only to developers currently using, or considering using, our .NET Palaso Library, which provides components to do many common language software tasks.  This post will look at just one of those: setting up a set of Writing Systems.

Well, each year ‘round this time I take a break from my normal obligations and do something interesting, or learn something new. Alas, this year, I did neither.  Instead, I squandered the time doing some long “unfinished business”.  A couple of years ago, we added to Palaso some pretty nice support for LDML, the standard XML format for writing systems. Writing systems? BORING. I know, I know.  Anyhow, the GUI that we had was just the minimum, and not helpful enough to actually ship in our flagship product, WeSay.  So WeSay limps along on an older, pre-LDML system.  Early in 2009 I did a UI design of what we really need, but, alas, no one actually implemented it.  My teammates mentioned they were a bit peeved at me for not letting that bare-bones GUI ship.

The only interesting thing to me about writing systems, especially user interfaces for them, is that we keep finding it so hard to get them right!  I’ve seen half a dozen attempts in the last 10 years, just within the confines of SIL & friends.  Here’s why, in my opinion:  The vast majority of users and languages have pretty simple needs in this area.  The rest, well, they’re pretty complicated (like the dictionary we worked with in South East Asia which has to handle scripts of Thailand, Burma, China, a Romanization, and IPA).  Yet all the UIs we’ve done have catered to both of these equally, and so most people were blocked: they needed to call in more geeky help to get past this part of setting up their software. The key insight to fixing this, I think, is that people in typical situations will be shocked by the complexity needed in the non-typical situations.  And those who have it hard, well, they’ll expect things to be non-trivial.  So this latest attempt uses progressive disclosure to keep things simple for most people.

In this post, I’m not going to talk about a lot of the really cool, problem-solving parts of this system which aren’t new: the non-UI parts, the ones my colleagues actually give a hoot about. I don’t think we ever did blog about them, though, so tag, you’re it, Cambell 😉

The Writing System Repository Pane

Here’s the equivalent in the latest Palaso system:


There are two main parts to this design. First, the tree on the left organizes your repository in a hopefully easy to understand way, while at the same time giving you shortcuts to what you most likely want to do next.  That is, it:

  • Shows you the writing systems in your repository, grouped by language.
  • Makes suggestions about other writing systems you may wish to add to existing languages (e.g. IPA).
  • Makes suggestions about other languages you may be working on, based on what your Operating System thinks (here, Icelandic and Arabic on my machine).

As a developer, you can control which kinds of suggestions make sense for your application, to keep things simple.  For WeSay, for example, we offer voice writing systems, but phonetic transcription and dialects are pretty unlikely. So we’d set up the Suggestor accordingly:


You might have noticed that there are no suggestions under English. Again, this is to keep things simple for the majority of users. We do that by specifying:


Note, even without those suggestions, someone could still make multiple Writing Systems for English easily enough, if they need to.
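To make the idea of configurable suggestions a bit more concrete, here is a toy model in Python. It is not the Palaso API and not the code from the application: SuggestionPolicy, suggestions_for, and all the flag names are invented for illustration. It only shows the shape of the idea, which is that the application switches suggestion kinds on or off and can suppress suggestions entirely for some languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestionPolicy:
    """Hypothetical stand-in for a suggestor's settings (not a Palaso type)."""
    suggest_voice: bool = True
    suggest_ipa: bool = True
    suggest_dialect: bool = True
    # Languages that should never get suggestions underneath them.
    suppress_for: frozenset = frozenset()


def suggestions_for(language_code: str, policy: SuggestionPolicy) -> list:
    """Return human-readable suggestions to list under one language."""
    if language_code in policy.suppress_for:
        return []
    suggestions = []
    if policy.suggest_voice:
        suggestions.append(f"voice writing system for {language_code}")
    if policy.suggest_ipa:
        suggestions.append(f"IPA transcription of {language_code}")
    if policy.suggest_dialect:
        suggestions.append(f"dialect of {language_code}")
    return suggestions


# A WeSay-like setup as described above: voice only, nothing under English.
wesay_like = SuggestionPolicy(
    suggest_voice=True,
    suggest_ipa=False,
    suggest_dialect=False,
    suppress_for=frozenset({"en"}),
)

under_aari = suggestions_for("aiw", wesay_like)    # "aiw" assumed to be Aari's code
under_english = suggestions_for("en", wesay_like)
```

Suppressing suggestions only trims the tree; in this model, as in the real UI, nothing stops a user from adding another English writing system by hand.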

The Identifiers Tab

The second leg of the design is an Identifiers Tab which stays as simple as possible.  As you know, there’s more to life than the good ‘ol “Ethnologue Code” (nowadays ISO 639-3).  In addition to that, we need to help people come up with a proper RFC5646 identifier, including handling situations common in linguistics which aren’t spelled out by that standard.  This is the job of the Identifiers tab.

For the simple (and normal) case, we don’t need to say any more than what the language is.  Notice how in the image above, the Identifiers tab has just two controls.

After adding a simple writing system for a language, the next most common thing for users to need is a way to write the language phonetically or phonemically, using the International Phonetic Alphabet.  Clicking the provided button just under the Aari label gives:


Notice in the upper left, this new writing system has been grouped underneath the plain ‘ol Aari one.  And on the right, notice that the “Special” combo now says “IPA Transcription”.  If we want to specify phonetic vs. phonemic, we can do that with the “Purpose” combo.

If we need a new dialect, clicking the provided button brings up a dialog asking us to type in the name of the dialect.  I entered “Foo”, and now we get:


Notice, with the “Special” combo set to “Script/Variant/Region”, we get more control over the writing system and its RFC5646 identifier (displayed in the upper right).
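For developers who haven’t stared at these identifiers before, here is a small Python sketch of how the pieces fit together. It is not Palaso code; build_language_tag is a made-up helper. It follows the subtag order and the customary casing from RFC 5646 (language, then script, then region, then variant, then private use after an "x"). The "fonipa" variant is the registered subtag for IPA. Two things here are assumptions: that "aiw" is the ISO 639-3 code for Aari, and that a dialect like “Foo” would go in the private-use part, which may not be what the UI above actually generates.

```python
def build_language_tag(language, script="", region="", variant="", private_use=()):
    """Join subtags in RFC 5646 order with conventional casing (hypothetical helper)."""
    subtags = [language.lower()]            # e.g. "aiw"
    if script:
        subtags.append(script.title())      # e.g. "Latn"
    if region:
        subtags.append(region.upper())      # e.g. "ET"
    if variant:
        subtags.append(variant.lower())     # e.g. "fonipa"
    if private_use:
        subtags.append("x")                 # private-use singleton
        subtags.extend(p.lower() for p in private_use)
    return "-".join(subtags)


plain = build_language_tag("aiw")                         # "aiw"
ipa = build_language_tag("aiw", variant="fonipa")         # "aiw-fonipa"
dialect = build_language_tag("aiw", private_use=["Foo"])  # "aiw-x-foo"
full = build_language_tag("aiw", script="latn", region="et", variant="fonipa")
```

The sketch does no validation at all (subtag lengths, registered values, and so on), which is exactly the kind of thing the Identifiers tab is there to take off the user’s hands.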

Ok, that’s mostly what I wanted to show.  When you click “Add Language”, of course, you get a searchable list of known languages. And under that More button in the lower left, you get these rarely needed commands:


There’s still some work that could be done (notice Region doesn’t offer a list), but my New Year’s break is over, so that’s it for now.  The Palaso Library lives in Mercurial, at