WeSay Stable for Linux

Author cambell | 10.01.2013 | Category Uncategorized

WeSay is now available for Linux in two flavours.  The wesay-stable package will install the latest 0.6 stable series, whereas the wesay package will install the latest 0.7 development series.

Full installation instructions are available on projects.palaso.org.


Using the Internet to collaborate on a dictionary

Author John Hatton | 21.06.2010 | Category WeSay

Get yourself a LanguageDepot.org Account

Go to Language Depot and create yourself an account.

Please don’t use the same password you use for anything important… WeSay is NOT going to be careful about keeping your password well-hidden.

Get a LanguageDepot.org Project for the Language

Write to [email protected].  Please provide the following information:

  • The name of the account you created in the previous step
  • The name of the project.  Normally, the name of the language works well for this.
  • The ISO 639-3 code for the language.  Easiest way to find that is via the Ethnologue.

We will do three things:

1) Create the language project

2) Give you manager permissions on that project.  With those permissions, you will be able to assign additional contributors to the project, and turn features of the web site on and off.

3) Create a contributor account named “____Contributor” (with your language code where the blanks are).

People you have not added to the project will not be able to access your data.  However, we wouldn’t pretend to promise any real “security”.  If you need that, it’s perfectly ok to use a Mercurial server somewhere else… you aren’t tied to LanguageDepot.org.

Unless you tell us otherwise, we’ll assume it is ok for us to occasionally look at the files in your repository project for the purpose of fixing a problem for you or seeing how the collaboration features are being used in real projects.

Make sure you back up the WeSay project before embarking on any major change like this.

Get the Data Together

It’s important that there is a single, up-to-date copy of the dictionary when you first put it up on Language Depot.  If only a single person is currently working on the dictionary, you need to get a copy of their project and then delete the project from their computer.  That does two good things: it ensures they don’t keep working on the old copy, and it ensures that they will be using the proper version of the project later.

If there are multiple copies of the dictionary out there, you need to do that for each one of them.  That is, get the project, remove it from their computer.  You have an extra step in this case, which is to merge the entries together.  Read these instructions on merging LIFT files.

Get the Most Recent Version of WeSay

The stuff shown here requires version 0.7 of WeSay, or greater.  Get the latest on the WeSay Downloads Page.

Push the project up to LanguageDepot

Ok, once you have a single dictionary folder with the whole team’s data, it’s time to do the initial push up to LanguageDepot.

First, run the Configuration Tool, and Open your project.

Go to “Actions”, and scroll until you see Send/Receive:

Click that button with the two arrows, and you should see:

Now click “Set Up”, and fill in the account details from the email you received from us.

Click “OK”.  Now, the Internet button becomes available, labeled with the name of the server you will be synchronizing with.

Notice that the “Set Up” button disappeared. This is intentional.  We want to decrease the chance that a user will accidentally mess up his/her ability to do a send/receive by messing up their account information. Therefore, once set up, WeSay hides that Set Up button.  Notice that hovering over the button reveals the trick for getting it back, if you really need to change your account settings:

Click “Internet”, and if all your account settings are correct, your project will be pushed up to the LanguageDepot server.

Pull the project down to the computers of your colleagues

Run the WeSay Configuration Tool, and click “Get From Internet”:

As before, we enter the account information:

And click Download.

Pull the project down to the computers of the rest of the team

Now do the same for each member of the team.  It’s fine to reuse the same “___Contributor” account for each of them… just make sure that their Windows/Linux account names are unique, as that is what will be used to keep track of who did what when you look at the project history.

Begin Collaborating

You’ll notice a Send/Receive button now shows up on the dashboard:

Clicking it there will show a dialog like this:

Clicking “Internet” starts the synchronization:

Note: when WeSay detects that some changes were pulled down from the internet, it closes down and restarts itself so that it has a nice clean start with the new data.

Using Collaboration Notes

Author cambell | 15.06.2010 | Category WeSay

Collaboration is more than just sharing changes.  It turns out that as soon as you start working with others using Send/Receive, you find you want to write notes to them.  So we added the ability to attach questions to lexical entries, and to carry on conversations about the question.  Future versions will add other kinds of notes, and allow them to be attached to particular fields, not just whole entries.

In the following screenshot, notice the circled buttons.  We’d click the left-most to add a new question to this entry.  The other button represents a previous, unresolved question. A click on it brings up a dialog box in which we can read and answer the question.

Ok, but how do you find which entries have unresolved notes?  WeSay 0.7 also introduces the Notes Browser, which lets you find and interact with notes from all over the dictionary:

In addition to Questions, WeSay currently supports just one other kind of note, the Merge Conflict.  These are created by the automatic merger when two team members edit the same field at the same time. Unlike traditional version control systems, Chorus (the engine we’ve written to do all this) doesn’t stop the merge and force you to deal with the problem immediately.  Instead, it makes its best guess as to what to do, then creates a Merge Conflict Note which the team can read and deal with when it is ready.
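To picture what "best guess plus a note" means, here is a minimal sketch of a three-way merge of a single field. It is not Chorus code: the `FieldMergeSketch` class, the `Merge` signature and the rule that "our" value wins on a true conflict are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FieldMergeSketch
{
    // Three-way merge of one field. `ancestor` is the value both team members
    // started from. The merge never stops: on a true conflict it keeps `ours`
    // (an arbitrary choice for this sketch) and records a note instead.
    public static string Merge(string ancestor, string ours, string theirs,
                               IList<string> conflictNotes)
    {
        if (ours == theirs) return ours;       // same edit, or no edit at all
        if (ours == ancestor) return theirs;   // only they changed the field
        if (theirs == ancestor) return ours;   // only we changed the field

        conflictNotes.Add("Both edited this field: kept \"" + ours +
                          "\", set aside \"" + theirs + "\".");
        return ours;
    }

    public static void Main()
    {
        var notes = new List<string>();
        string merged = Merge("dog", "hound", "puppy", notes);
        Console.WriteLine(merged);
        foreach (string note in notes) Console.WriteLine(note);
    }
}
```

The point of the design is that the returned value is always usable, so Send/Receive can finish, while the note carries enough information for the team to revisit the decision later.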

If you don’t like what the merger did, go to the entry and make whatever changes are necessary. Then click the “resolved” box to show that this has been dealt with. Or if you need to discuss what to do with a teammate, add a new message to the note. In the following screenshot, I’ve highlighted the hyperlink at the top, and the Resolved box at the bottom.

Viewing the History Of Changes

Author cambell | 12.06.2010 | Category WeSay

Like most things in WeSay, you, the Advisor, need to turn on the collaboration features which are appropriate for your project.   There are two optional tasks you can enable if you want:

These show up on the dashboard under the “review” section:

The history shows you all the changes the team has made:

Those familiar with WeSay will notice that the history screen is more complex than what we expect many WeSay users to handle.  Use your own discretion. It may be that you, the advisor, will want this enabled in your configuration, but not in that of the rest of the team.

WeSay Dictionary Collaboration

Author John Hatton | 09.06.2010 | Category WeSay

Since we first introduced WeSay a couple years ago, we’ve heard one request over and over: “Make it so that multiple people can collaborate on the dictionary”.  We’ve now delivered on that, and you can try it out by downloading the latest 0.7 Development Release.

In the next series of posts, I’ll walk through the basics of setting up your dictionary project so that team members can synchronize their work with each other.  We’ll see how to do this using a USB flash drive, server on a local network, or the internet.  We’ll look at two optional features: viewing the complete history of who did what, and having chat-like convers


XML indentation in .Net

Author Tim | 22.02.2010 | Category Uncategorized

Upon integration of Chorus into WeSay and more specifically in using a DVCS (specifically Mercurial) to manage WeSay's XML encoded .lift file, the exact format of that file has taken on a new importance. Because Mercurial (and Chorus at this point) uses a standard line diffing tool to express the difference between two revisions, line breaks, indentations and other white space have suddenly become an issue where they normally are not in XML documents.
As it turns out, formatting XML in .NET is not entirely trivial. Though XmlWriter and XmlReader, as well as their respective XmlWriterSettings and XmlReaderSettings, have various switches for enabling and disabling indentation and line breaks on attributes, these are bound together by subtle interactions which I hope to clarify in this post.

First some background:
The indentation and whitespace in a given XML file can be of interest for at least two reasons:
– line diffing tools typically care about whitespace, and the name itself bears witness to the importance of newlines.
– readability. It's much easier for a human being to read a nicely formatted XML file.
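To make the first point concrete, here is a small sketch that measures how much text a naive line-based comparison reports as changed after one attribute value is edited. It has nothing to do with the real Mercurial or Chorus diff code; the `ChangedLines` helper and the sample XML are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineDiffDemo
{
    // Returns the lines of `after` that do not occur in `before`.
    // A real diff tool is smarter, but its unit of change is still the line.
    public static List<string> ChangedLines(string before, string after)
    {
        var oldLines = new HashSet<string>(before.Split('\n'));
        return after.Split('\n').Where(line => !oldLines.Contains(line)).ToList();
    }

    public static void Main()
    {
        // The same document, once on a single line and once with one element per line.
        string flatOld = "<root><a id=\"1\"/><b id=\"2\"/><c id=\"3\"/></root>";
        string flatNew = flatOld.Replace("id=\"2\"", "id=\"20\"");
        string prettyOld = "<root>\n  <a id=\"1\"/>\n  <b id=\"2\"/>\n  <c id=\"3\"/>\n</root>";
        string prettyNew = prettyOld.Replace("id=\"2\"", "id=\"20\"");

        // Flat file: the single changed line is the entire document.
        Console.WriteLine(ChangedLines(flatOld, flatNew).Sum(l => l.Length));
        // Indented file: only the one short line holding <b> is reported.
        Console.WriteLine(ChangedLines(prettyOld, prettyNew).Sum(l => l.Length));
    }
}
```

With a real dictionary the difference is far more dramatic: a one-line .lift file means every edit rewrites the whole file as far as the diff is concerned.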

In WeSay the .lift file is frequently created from two separate files: first, a valid .lift file, and second, an XML fragment file. Each time an entry is added or modified these two files are merged to form the new .lift file. For this reason we are interested in the interaction between an XmlReader and an XmlWriter that outputs said reader's data.

As an example we will use some very simple XML rather than an actual lift entry, as that construct is unnecessarily complex for this discussion.
Here are the two source files:
File 1:


File 2:


Here is our envisioned result:


All of these files have been written with indentation and new lines for each element and attribute. This makes for ok readability and should keep our diff files nice and small.

The first thing we are going to try is to see what happens when we start with completely unformatted input files, default readers and a default writer:
File 1:

File 2:

And the code goes something like this:
// Requires: using System.Xml;
XmlReaderSettings readerSettings = new XmlReaderSettings
{
    ConformanceLevel = ConformanceLevel.Fragment
};

XmlWriterSettings writerSettings = new XmlWriterSettings
{
    ConformanceLevel = ConformanceLevel.Document
};

XmlReader reader = XmlReader.Create(stream0, readerSettings);
XmlWriter writer = XmlWriter.Create(stream1, writerSettings);

// Copy every node from the reader to the writer.
while (!reader.EOF)
{
    writer.WriteNode(reader, true);
}

With these settings the resulting file looks like this:

Just one long line… pretty much the worst case possible for a line diffing tool and for reading. So let's spruce it up a bit and add some formatting to the writer by changing the writer settings a bit:

XmlWriterSettings writerSettings = new XmlWriterSettings
{
    Indent = true,
    NewLineOnAttributes = true,
    ConformanceLevel = ConformanceLevel.Document
};

Resulting file:


*sigh*… beautiful.
But being geeks we just can't help but fix something that ain't broke. So in spite of this beautiful result we now want to try and fix the source file. This isn't entirely unreasonable considering you may want to look at the source file while debugging and it would be nice if it were a bit more legible. So just for kicks, let's see what happens when we put a single line break in the source file.. say after the element.

File 1:

Resulting File:


?!!? What happened?! Not only do we have a line break after the element, but also the element and the closing element are not indented!!
This brings us to our first interesting observation: Whitespace in a source document causes the writer to ignore its Indent property until the containing element of the whitespace (in our case ) is closed. This is true of whitespace such as “spaces” as well. Here is the resulting file if I substitute the newline of our last example with a simple space:


Interestingly, you'll notice that this is not the case for the NewLineOnAttributes property of the XmlWriterSettings. This is even more interesting when you consider that this property is ignored UNLESS the Indent property is TRUE. Here it is straight from the horse's mouth (i.e. MSDN): “This setting has no effect when the Indent property value is false.”
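The dependency between the two properties is easy to check in isolation. The following is a minimal sketch that writes one element with two attributes under different settings; the class name, element name and attribute values are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeNewlineDemo
{
    // Writes a single <entry> element with two attributes using the given
    // settings and returns the resulting text.
    public static string Write(bool indent, bool newLineOnAttributes)
    {
        var settings = new XmlWriterSettings
        {
            Indent = indent,
            NewLineOnAttributes = newLineOnAttributes,
            OmitXmlDeclaration = true
        };
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("entry");
            writer.WriteAttributeString("id", "e1");
            writer.WriteAttributeString("lang", "en");
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Indent off: NewLineOnAttributes has no effect, the element stays on one line.
        Console.WriteLine(Write(indent: false, newLineOnAttributes: true));
        // Indent on: each attribute is written on its own line.
        Console.WriteLine(Write(indent: true, newLineOnAttributes: true));
    }
}
```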

Ok.. so we've established that whitespace is an issue. The easiest way to get around this is to instruct the reader to ignore whitespace so that the writer doesn't get too clever on us:

XmlReaderSettings readerSettings = new XmlReaderSettings
{
    IgnoreWhitespace = true,
    ConformanceLevel = ConformanceLevel.Fragment
};
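For anyone who wants to reproduce the effect, here is a self-contained sketch of the same reader-to-writer copy that works on strings instead of file streams. The `Copy` helper and the sample XML are stand-ins invented for this example, not the files used in the post.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceCopyDemo
{
    // Copies XML through XmlReader -> XmlWriter with Indent switched on,
    // optionally telling the reader to drop insignificant whitespace.
    public static string Copy(string xml, bool ignoreWhitespace)
    {
        var readerSettings = new XmlReaderSettings
        {
            IgnoreWhitespace = ignoreWhitespace,
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            OmitXmlDeclaration = true,
            ConformanceLevel = ConformanceLevel.Document
        };
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // One stray line break right after the opening root tag.
        string source = "<root>\n<child><leaf /></child></root>";
        Console.WriteLine(Copy(source, ignoreWhitespace: false));
        Console.WriteLine("----");
        Console.WriteLine(Copy(source, ignoreWhitespace: true));
    }
}
```

Comparing the two printed results side by side shows whether the writer's indentation survived the stray line break.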

So now we are back on track and looking good! To celebrate, let's tell the world how happy we are! Let's write a string into our first file that will proclaim our joy! Of course we will do this without spaces.. just in case.

File 1:


Resulting file:


Arrrgh! It did it again!!! So here is observation number two: A text node in a source document causes the writer to ignore its Indent property until the containing element of the text node (in our case ) is closed.

Finally, WeSay uses an XPathNavigator in some places, and in the course of my testing I noticed that XmlWriter.WriteNode() behaves slightly differently when it is passed an XPathNavigator rather than an XmlReader. Specifically, it seems to always ignore whitespace. So passing an XmlReader (with IgnoreWhitespace = false) to WriteNode for the first file and an XPathDocument for the second file where the files look like this:
File 1:

File 2:

Results in a final file looking like this:


Here's an outline of the code:

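XmlReaderSettings readerSettings = new XmlReaderSettings
{
    ConformanceLevel = ConformanceLevel.Fragment
};

XmlWriterSettings writerSettings = new XmlWriterSettings
{
    ConformanceLevel = ConformanceLevel.Document
};

XmlReader reader = XmlReader.Create(stream0, readerSettings);
// Assumption: reader2 is meant to read File 2 (the outline shows stream0 for both readers).
XmlReader reader2 = XmlReader.Create(stream0, readerSettings);
XmlDocument document = new XmlDocument();
// Assumed step, not shown in the outline: fill the document from reader2.
document.Load(reader2);
XmlWriter writer = XmlWriter.Create(stream1, writerSettings);

// File 1 goes through the XmlReader overload...
while (!reader.EOF)
{
    writer.WriteNode(reader, true);
}
// ...and File 2 through the XPathNavigator overload.
writer.WriteNode(document.CreateNavigator(), true);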

Note that this is the case even when you create an XmlDocument from an XmlReader with IgnoreWhitespace = false.

So that about wraps it up. This was not meant to be an exhaustive study of all the XmlReader/XmlWriter/XmlDocument/XmlWriterSettings/XmlReaderSettings/XPathNavigator interactions, so if you find anything else unusual, or find that I grossly misrepresented something, please feel free to let me know!

Merging LIFT dictionary files

Author John Hatton | 21.01.2010 | Category WeSay

If you aren’t yet using the new collaboration features of WeSay, you may have multiple versions of your dictionary out there.  Here are a few notes on ways to get them together.

The simplest case is where the users have been working on completely different sets of words, with no overlap. That is, they each started with completely empty dictionaries, which have never once been merged together.  In this specific case, you can merge them by hand.  Do that by opening each .lift file and copying all the chunks of one file in next to the chunks of the other file.    Open in WeSay to make sure you didn’t mess the lift file up.
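If copying by hand feels risky, the same "no overlap" merge can be scripted. The sketch below is not part of WeSay; it assumes the entries are `entry` elements directly under the root element and that each carries an `id` attribute, and the `LiftAppend` class name is made up. It refuses to continue if an id shows up in both files, because that means the simple case does not apply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LiftAppend
{
    // Appends copies of the <entry> elements of `other` to `target`.
    // Throws if an entry id occurs in both documents, since the dictionaries
    // then overlap and need a real merge instead.
    public static void AppendEntries(XDocument target, XDocument other)
    {
        var knownIds = new HashSet<string>(
            target.Root.Elements("entry").Select(e => (string)e.Attribute("id")));

        foreach (XElement entry in other.Root.Elements("entry"))
        {
            string id = (string)entry.Attribute("id");
            if (id != null && knownIds.Contains(id))
                throw new InvalidOperationException("Entry found in both files: " + id);
            target.Root.Add(new XElement(entry)); // copy, leaving `other` untouched
        }
    }

    public static void Main(string[] args)
    {
        // Usage: LiftAppend first.lift second.lift combined.lift
        XDocument target = XDocument.Load(args[0]);
        AppendEntries(target, XDocument.Load(args[1]));
        target.Save(args[2]);
    }
}
```

As with the manual approach, open the combined file in WeSay afterwards to confirm it still loads.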

In the more general case, you will want to merge them together using FieldWorks Language Explorer (FLEx).  To do that, follow these steps:

1) Create a new project using FLEx.

2) Import each .lift file into the project, one at a time, until you have a nice combined dictionary.

If getting/installing/using FLEx seems like too much, you can always just ask for someone to do this for you.  Write to the WeSay email list and ask someone to do the merge for you.

New Writing System UI

Author John Hatton | 04.01.2010 | Category Palaso Library

This posting will be of interest only to developers currently using, or considering using, our .NET Palaso Library, which provides components to do many common language software tasks.  This post will look at just one of those: setting up a set of Writing Systems.

Well, each year ‘round this time I take a break from my normal obligations and do something interesting, or learn something new. Alas, this year, I did neither.  Instead, I squandered the time doing some long “unfinished business”.  A couple of years ago, we added to Palaso some pretty nice support for LDML, the standard XML format for writing systems. Writing systems? BORING. I know, I know.  Anyhow, the GUI that we had was just the minimum, and not helpful enough to actually ship in our flagship product, WeSay.  So WeSay limps along on an older, pre-LDML system.  Early in 2009 I did a UI design of what we really need, but, alas, no one actually implemented it.  My teammates mentioned they were a bit peeved at me for not letting that bare-bones GUI ship.

The only interesting thing to me about writing systems, especially user interfaces for them, is that we keep finding it so hard to get them right!  I’ve seen half a dozen attempts in the last 10 years, just within the confines of SIL & friends.  Here’s why, in my opinion:  The vast majority of users and languages have pretty simple needs in this area.  The rest, well, they’re pretty complicated (like the dictionary we worked with in South East Asia which has to handle scripts of Thailand, Burma, China, a Romanization, and IPA).  Yet all the UIs we’ve done have catered to both of these equally, and so most people were blocked; they needed to call in more geeky help to get past this part of setting up their software. The key insight to fixing this, I think, is that people in typical situations will be shocked by the complexity needed in the non-typical situations.  And those who have it hard, well, they’ll expect things to be non-trivial.  So this latest attempt uses progressive disclosure to keep things simple for most people.

In this post, I’m not going to talk about a lot of the really cool, problem-solving parts of this system which aren’t new. The non-UI parts, the ones my colleagues actually give a hoot about. I don’t think we ever did blog about them, though, so tag, you’re it, Cambell 😉

The Writing System Repository Pane

Here’s the equivalent in the latest Palaso system:


There are two main parts to this design. First, the tree on the left organizes your repository in a hopefully easy to understand way, while at the same time giving you shortcuts to what you most likely want to do next.  That is, it:

  • Shows you the writing systems in your repository, grouped by language
  • Makes suggestions about other writing systems you may wish to add to existing languages (e.g. IPA).
  • Makes suggestions about other languages you may be working on, based on what your Operating System thinks (here, Icelandic and Arabic on my machine).

As a developer, you can control which kinds of suggestions make sense for your application, to keep things simple.  For WeSay, for example, we offer voice writing systems, but phonetic transcription and dialects are pretty unlikely. So we’d set up the Suggestor accordingly:


You might have noticed that there are no suggestions under English. Again, this is to keep things simple for the majority of users. We do that by specifying:


Note, even without those suggestions, someone could still make multiple Writing Systems for English easily enough, if they need to.

The Identifiers Tab

The second leg of the design is an Identifiers Tab which stays as simple as possible.  As you know, there’s more to life than the good ‘ol “Ethnologue Code” (nowadays ISO 639-3).  In addition to that, we need to help people come up with a proper RFC5646 identifier, including handling situations common in linguistics which aren’t spelled out by that standard.  This is the job of the Identifiers tab.

For the simple (and normal) case, we don’t need to say any more than what the language is.  Notice how in the image above, the Identifiers tab has just two controls.

After adding a simple writing system for a language, the next most common thing for users to need is a way to write the language phonetically or phonemically, using the International Phonetic Alphabet.  Clicking the provided button just under the Aari label gives:


Notice in the upper left, this new writing system has been grouped underneath the plain ‘ol Aari one.  And on the right, notice that the “Special” combo now says “IPA Transcription”.  If we want to specify phonetic vs. phonemic, we can do that with the “Purpose” combo.

If we need a new dialect, clicking the provided button brings up a dialog asking us to type in the name of the dialect.  I entered “Foo”, and now we get:


Notice, with the “Special” combo set to “Script/Variant/Region”, we get more control over the Writing System and its RFC5646 identifier (displayed in the upper right).
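For developers who haven't looked at RFC5646 closely, the identifier is just a list of subtags joined by hyphens in a fixed order. Here is a rough sketch of that ordering. It is not the Palaso API: the `LanguageTagSketch` class is invented, "aiw" is assumed to be the ISO 639-3 code for Aari, and putting a dialect name in a private-use subtag is only one possible choice, not necessarily what the dialog does.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageTagSketch
{
    // Joins subtags in the order RFC 5646 requires:
    // language, script, region, variant, then private use ("x-...").
    public static string Build(string language, string script = null,
                               string region = null, string variant = null,
                               string privateUse = null)
    {
        if (string.IsNullOrEmpty(language))
            throw new ArgumentException("A language subtag is required.", nameof(language));

        var parts = new List<string> { language };
        if (!string.IsNullOrEmpty(script)) parts.Add(script);
        if (!string.IsNullOrEmpty(region)) parts.Add(region);
        if (!string.IsNullOrEmpty(variant)) parts.Add(variant);
        if (!string.IsNullOrEmpty(privateUse)) parts.Add("x-" + privateUse);
        return string.Join("-", parts);
    }

    public static void Main()
    {
        Console.WriteLine(Build("aiw"));                     // aiw
        Console.WriteLine(Build("aiw", variant: "fonipa"));  // aiw-fonipa
        Console.WriteLine(Build("aiw", privateUse: "foo"));  // aiw-x-foo
    }
}
```

"fonipa" is the registered variant subtag for IPA transcription, which is why it shows up for the phonetic writing system.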

Ok, that’s mostly what I wanted to show.  When you click “Add Language”, of course, you get a searchable list of known languages. And under that More button in the lower left, you get these rarely needed commands:


There’s still some work that could be done (notice Region doesn’t offer a list), but my New Year’s break is over, so that’s it for now.  The Palaso Library lives in Mercurial, at http://projects.palaso.org/projects/show/palaso.


Using SOLID to convert to one-ps-per-sense format

Author cambell | 27.06.2009 | Category Solid

This week saw another colleague here in Papua New Guinea deciding to move his Toolbox dictionary to FLEx.  He had a fair number of entries like this:

\lx ba
\ps n
\sn 1
\ge brother-in-law
\de brother-in-law: reciprocal term between wife's father's brother's children and father's brother's daughter's husband
\sn 2
\ge brother-in-law
\de brother-in-law: reciprocal term between sister's husband and wife's siblings
\sn 3
\ge in-law
\de reciprocal term between father's sister's husband and wife's brother's child
\dt 18/Sep/2000

These are all nouns, and he identifies this just once, at the top of the entry.  That counts as good MDF.  The problem is, FLEx import doesn’t handle this situation.  In fact, a recent import I did left us with over 60 cases where the \ps was turned into its own sense, followed by all the actual senses which were left with no grammatical category. Neither I nor my colleague caught this until she had already been working in WeSay on the new data for too long to go back and repeat the import. Yuck.

To prevent this, we need to move that \ps down to under the \sn, and then copy it for all remaining senses which lack a \ps.  As of SOLID version 0.9.319, we can now do this:
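In outline, the transformation amounts to the following. This is a sketch of the idea only, not SOLID's implementation: the `PsPerSense` class is hypothetical, it works on one entry at a time, and it assumes every line starts with its marker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PsPerSense
{
    // "\ps n" -> "\ps"; lines that do not start with a backslash have no marker.
    private static string Marker(string line)
    {
        if (line.Length == 0 || line[0] != '\\') return "";
        int end = line.IndexOfAny(new[] { ' ', '\t' });
        return end < 0 ? line : line.Substring(0, end);
    }

    // Moves an entry-level \ps (one that appears before the first \sn) under
    // every \sn that has no \ps of its own. Other entries are returned unchanged.
    public static List<string> Fix(IReadOnlyList<string> entry)
    {
        int firstSn = -1;
        for (int i = 0; i < entry.Count && firstSn < 0; i++)
            if (Marker(entry[i]) == "\\sn") firstSn = i;
        if (firstSn < 0) return entry.ToList();

        int psIndex = -1;
        for (int i = 0; i < firstSn && psIndex < 0; i++)
            if (Marker(entry[i]) == "\\ps") psIndex = i;
        if (psIndex < 0) return entry.ToList();

        var result = new List<string>();
        for (int i = 0; i < entry.Count; i++)
        {
            if (i == psIndex) continue;              // drop the entry-level \ps
            result.Add(entry[i]);
            if (Marker(entry[i]) != "\\sn") continue;

            // Does this sense already carry its own \ps before the next \sn?
            bool hasOwnPs = false;
            for (int j = i + 1; j < entry.Count && Marker(entry[j]) != "\\sn"; j++)
                if (Marker(entry[j]) == "\\ps") hasOwnPs = true;
            if (!hasOwnPs) result.Add(entry[psIndex]);
        }
        return result;
    }

    public static void Main()
    {
        var entry = new[] { "\\lx ba", "\\ps n", "\\sn 1", "\\ge brother-in-law",
                            "\\sn 2", "\\ge in-law" };
        foreach (string line in Fix(entry)) Console.WriteLine(line);
    }
}
```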


As with all these quick fixes, I use TortoiseHg (Mercurial) so that I can look at exactly what changed, verifying that nothing was messed up. Here’s what the quick fix did to the above record, as seen from TortoiseHg’s Commit tool:


More control over “missing info” tasks

Author John Hatton | 22.06.2009 | Category WeSay

WeSay has always had Tasks which would show you just the words that needed some more information in a particular field.  However, the selection of which entries to show was pretty blunt:  if the field had an empty slot in any of its multiple writing systems, the task would show that entry.  This meant that you couldn’t easily set up WeSay for a user who, for example, just wanted to add vernacular definitions where English ones had already been entered. 

In another case, we might want to set a user up to add voice recordings of example sentences. But the task should only show example sentences where someone had previously entered in the example text.

The latest development release (0.5 build 2000) addresses this.  When you first create a project, tasks are configured to have the same behavior as before: an entry will be chosen if *any* of the writing systems assigned to that field are empty.

You can now limit the task to filling in the vernacular (gaw, in this example):

In addition, we can limit the task to only those entries where some other writing system has already been filled in:
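Put as a rule, an entry is selected when one of the writing systems to fill in is empty and, if any prerequisite writing systems are configured, at least one of them already has text. The sketch below expresses that as a predicate. It is not WeSay's code: the `MissingInfoFilter` class and its parameters are invented, and "at least one prerequisite" (rather than all of them) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingInfoFilter
{
    // `forms` maps writing-system id -> text for one field of one entry.
    // Returns true if any writing system in `toFill` is empty and, when
    // `required` is non-empty, at least one required writing system has text.
    public static bool ShouldShow(IDictionary<string, string> forms,
                                  IEnumerable<string> toFill,
                                  IEnumerable<string> required)
    {
        Func<string, bool> hasText = ws =>
        {
            string text;
            return forms.TryGetValue(ws, out text) && !string.IsNullOrWhiteSpace(text);
        };

        bool somethingMissing = toFill.Any(ws => !hasText(ws));
        bool prerequisiteMet = !required.Any() || required.Any(ws => hasText(ws));
        return somethingMissing && prerequisiteMet;
    }

    public static void Main()
    {
        var definition = new Dictionary<string, string> { ["en"] = "brother-in-law" };
        // Vernacular (gaw) is missing and English is present: the entry is shown.
        Console.WriteLine(ShouldShow(definition, new[] { "gaw" }, new[] { "en" }));
        // No English yet, so the vernacular task skips this entry.
        var empty = new Dictionary<string, string>();
        Console.WriteLine(ShouldShow(empty, new[] { "gaw" }, new[] { "en" }));
    }
}
```

Passing an empty `required` list gives back the old behavior for the chosen writing systems: any empty slot selects the entry.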

Thanks, Mark, for taking the time to submit this request.  We would appreciate any feedback you can give us on this feature.  Does it work well for you?

The obvious next step would be to add a way to make duplicates of some tasks, so that you could have both an “Add Examples” and an “Add Example Recordings” task.  This is now possible by editing the .wesayconfig file in a text editor.  If you want to know how, send me a message (hattonjohn at gmail).  That will tell me how much demand there is for it, and if there’s enough, we’ll make it easier.