Welcome to palaso.org
Website of the Payap Language Software Development Group
Merging LIFT dictionary files
Author John Hatton | 21.01.2010 | Category WeSay
If you aren’t yet using the new collaboration features of WeSay, you may have multiple versions of your dictionary out there. Here are a few notes on ways to get them together.
The simplest case is where the users have been working on completely different sets of words, with no overlap. That is, they each started with completely empty dictionaries, which have never once been merged together. In this specific case, you can merge them by hand. Do that by opening each .lift file and copying all the <entry>…</entry> chunks of one file in next to the <entry>…</entry> chunks of the other file. Then open the merged file in WeSay to make sure you didn’t mess the LIFT file up.
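For anyone comfortable with a little scripting, the hand merge above can also be done with a short script. This is a minimal Python sketch, not a WeSay feature: it assumes both files are ordinary LIFT files whose `<entry>` elements sit directly under the root `<lift>` element, and, like the hand merge, it is only safe when the two dictionaries share no entries. The function names are hypothetical. Python's ElementTree drops XML comments, so keep a backup of the originals.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Union

XmlSource = Union[str, bytes]


def merge_lift_entries(base_xml: XmlSource, other_xml: XmlSource) -> str:
    """Return base_xml with every top-level <entry> from other_xml appended."""
    base_root = ET.fromstring(base_xml)
    other_root = ET.fromstring(other_xml)
    # Only <entry> elements are copied; the other file's <header> is ignored.
    for entry in other_root.findall("entry"):
        base_root.append(entry)
    return ET.tostring(base_root, encoding="unicode")


def merge_lift_files(base_path: str, other_path: str, out_path: str) -> None:
    """File-level wrapper: writes the merged dictionary to out_path as UTF-8."""
    merged = merge_lift_entries(
        Path(base_path).read_bytes(),   # bytes, so any encoding declaration is honored
        Path(other_path).read_bytes(),
    )
    Path(out_path).write_text(
        '<?xml version="1.0" encoding="utf-8"?>\n' + merged, encoding="utf-8"
    )
```

As with the hand merge, open the output in WeSay before trusting it.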
In the more general case, you will want to merge them together using FieldWorks Language Explorer (FLEx). To do that, follow these steps:
1) Create a new project using FLEx.
2) Import each .lift file into the project, one at a time, until you have a nice combined dictionary.
If getting, installing, and using FLEx seems like too much, you can always ask someone to do it for you: write to the WeSay email list and ask for help with the merge.
Testing the New WeSay Collaboration Features
Author John Hatton | 21.12.2009 | Category WeSay
Since we first introduced WeSay a couple of years ago, we’ve heard one request over and over: “Make it so that multiple people can collaborate on the dictionary”. The need for this is actually greater than it was for ShoeBox, ToolBox, Lingualinks, or FieldWorks. These other programs were primarily for linguists, and linguists working on smaller languages rarely have the luxury of several partner linguists. In contrast, it’s very likely that multiple members of the language community will want to contribute to a WeSay project. In addition, WeSay is designed for collaboration between an advisor and a native speaker, who will almost always have different computers and live in different places. All that to say, we’ve known this was important for years, but it’s quite a challenge to make the collaboration powerful enough and yet simple enough for WeSay users. Building this system, known as Chorus, has consumed the lion’s share of our new-feature time over the last year.
There’s another reason why we’ve been slow to put this feature out, and why we still (as of December 2009) are only slowly, cautiously introducing it. It turns out that when we add a system which hooks all the team members together, we’ve just dramatically increased the potential for the entire team to suffer from one person’s mishap. For example, imagine that one user deletes or renames a critically important file. If that change is transmitted to the rest of the team, none of them will be able to keep working. Some of these problems have proved hard to predict ahead of time. As we identify such scenarios, we attempt to add protection against them.
We understand that people are tired of waiting. And we could use more feedback to get this ready for Prime Time. So I’m writing this post for those of you who are willing to spend some time finding out if WeSay’s collaboration features are solid enough for your team. If you find that they are, then great, you can start benefiting from them. If you find that more work is needed, and you tell us about it, then we’ll be able to better prioritize our efforts to get you what you need.
Note: this blog entry is a work in progress… I expect to improve it over the coming weeks, as people try it out and give me feedback.
Eventually, getting started with WeSay and Language Depot needs to be a lot simpler than it is today. Please don’t hesitate to ask for further help.
Get a LanguageDepot.org Account for the Manager
Go to Language Depot and create yourself an account.
Please don’t use the same password you use for anything important… WeSay is NOT going to be careful about keeping your password well-hidden.
Get a LanguageDepot.org Project for the Language
Write to admin@languagedepot.org. Please provide the following information:
- The name of the account you created in the previous step
- The name of the project. Normally, the name of the language works well for this.
- The ISO 639-3 code for the language. The easiest way to find it is to search Google for “ethnologue thelanguagename”.
You will be the initial “manager” of the repository. You will be able to assign contributors to the project, and turn features of the web site on and off.
Get LanguageDepot.org Accounts for the contributors
Create accounts for the contributors just like you did for yourself. Then go to the settings for the project and assign each of those people as contributors to the project.
Get the Data Together
It’s important that there is a single, up-to-date copy of the dictionary when you first put it up on Language Depot. If there is currently only a single person working on the dictionary, you need to get their project and delete it from their computer. That does two good things: it ensures they don’t keep working on it, and it ensures that they will be using the proper version of the project later.
If there are multiple copies of the dictionary out there, you need to do that for each one of them. That is, get each project and remove it from its owner’s computer. You have an extra step in this case, which is to merge the entries together. The simplest case is where the users have been working on completely different sets of words, with no overlap. That is, they each started with completely empty dictionaries, which have never once been merged together. In this specific case, you can merge them by hand. Do that by opening each .lift file and copying all the <entry>…</entry> chunks of one file in next to the <entry>…</entry> chunks of the other file. Then open the merged file in WeSay to make sure you didn’t mess the LIFT file up.
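Before doing a hand merge, it’s worth confirming the “no overlap” assumption. Here is a rough Python sketch (the helper names are hypothetical) that lists the entry `id`s and lexeme forms two LIFT files have in common, reading forms from LIFT’s `lexical-unit/form/text` elements. Shared `id`s suggest the files came from a common ancestor; shared forms suggest two people worked on the same word. Either one is a sign to use the FLEx route instead. Empty results are encouraging but not proof, since the same word could be spelled differently in each file.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple


def entry_keys(lift_xml: str) -> Tuple[Set[str], Set[str]]:
    """Collect entry ids and lexeme-form texts from one LIFT document."""
    root = ET.fromstring(lift_xml)
    ids, forms = set(), set()
    for entry in root.findall("entry"):
        if entry.get("id"):
            ids.add(entry.get("id"))
        # Lexeme forms live in <lexical-unit><form lang="..."><text>...</text></form>.
        for text in entry.findall("lexical-unit/form/text"):
            if text.text and text.text.strip():
                forms.add(text.text.strip())
    return ids, forms


def find_overlap(xml_a: str, xml_b: str) -> Tuple[Set[str], Set[str]]:
    """Return (shared ids, shared lexeme forms) between two LIFT documents."""
    ids_a, forms_a = entry_keys(xml_a)
    ids_b, forms_b = entry_keys(xml_b)
    return ids_a & ids_b, forms_a & forms_b
```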
In the more general case, you will want to merge them together using FieldWorks Language Explorer (FLEx). To do that, follow these steps:
1) Create a new project using FLEx.
2) Import each .lift file into the project, one at a time, until you have a nice combined dictionary.
3) Export the dictionary as a .lift file. That file needs to be placed in the WeSay project you will be pushing up to Language Depot.
Get the Most Recent Version of WeSay
As of December 2009, these features are too new for us to support a large number of people using them. Therefore, you have to go “to the factory” to get it. To get instructions on where to get it, simply email me (hattonjohn on gmail).
Install TortoiseHg
WeSay uses the Mercurial version control system to send data around and keep a history of it. For Windows, the easiest way to get Mercurial is from a free package named “TortoiseHg”, which you can download from the TortoiseHg website.
Push the project up to LanguageDepot
Ok, once you have a single dictionary folder with the whole team’s data, it’s time to do the initial push up to LanguageDepot. At this point, WeSay can’t do this initial push for you. The easiest way is to open a good old “command prompt” window. If you don’t know how to do that, you’ve probably already enlisted a more technical person in previous steps. Get them to do this step too.
First, make sure Mercurial (hg) is properly installed. Type “hg version” and press Enter. You should see a short version message:
Now, change to the directory that contains the WeSay dictionary.
Next, push the project up to Language Depot. Here, instead of “wagi”, you’d type the ISO 639-3 language code you used when setting up the project.
You will be asked to enter the user name and password you used in setting up your Language Depot account.
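Since the screenshots of these commands aren’t reproduced here, this is a sketch of the same steps driven from Python. It is an illustration under assumptions: the push address `http://hg-public.languagedepot.org/<code>` is inferred from the “Get From Internet” address given later in this post and may not be exactly what Language Depot expects for pushing, and the dictionary folder is assumed to already be a Mercurial repository. Leaving the credentials out of the URL lets hg prompt for them, as described above. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List

# Assumed push address pattern, based on the "Get From Internet" address later in this post.
LANGUAGE_DEPOT_BASE = "http://hg-public.languagedepot.org"


def push_commands(iso_code: str) -> List[List[str]]:
    """The two hg invocations from the steps above: check the install, then push."""
    return [
        ["hg", "version"],
        # No credentials in the URL, so hg prompts for the user name and password.
        ["hg", "push", f"{LANGUAGE_DEPOT_BASE}/{iso_code}"],
    ]


def push_project(project_dir: str, iso_code: str) -> None:
    """Run the commands from inside the WeSay project folder (like `cd` at the prompt)."""
    if shutil.which("hg") is None:
        raise RuntimeError("Mercurial (hg) is not on the PATH; install TortoiseHg first.")
    for cmd in push_commands(iso_code):
        subprocess.run(cmd, cwd=project_dir, check=True)  # stop at the first failure
```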
Pull the project down to your computer
I recommend that you now pull the project right back to your computer as if you had never had it. So if the project is already right where you want it, move it, rename it, or back it up and delete it. Now, run the WeSay Configuration Tool, and click “Get From Internet”:
Here comes the most un-cooked part, where we make you type a whole bunch of stuff into one big scary URL:
So, if your account on Language Depot is “lucy_loo” with password “v9isual9”, and the ISO code for the language is “foo”, then you’d enter:
http://lucy_loo:v9isual9@hg-public.languagedepot.org/foo
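If a user name or password contains characters such as `@`, `:` or `/`, typing it straight into that address will confuse the URL parser. Below is a small Python sketch (the helper name is hypothetical) that builds the address with percent-encoding. Whether the WeSay download dialog accepts percent-encoded credentials is an assumption; the simpler fix is to choose a Language Depot password without those characters.

```python
from urllib.parse import quote


def clone_url(user: str, password: str, iso_code: str,
              host: str = "hg-public.languagedepot.org") -> str:
    """Build http://user:password@host/code, as typed into "Get From Internet"."""
    # safe="" makes quote() encode '/' too, so every reserved character is escaped.
    return "http://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, quote(iso_code, safe="")
    )
```

For the example account above, `clone_url("lucy_loo", "v9isual9", "foo")` gives exactly the address shown.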
Give the project a name, and then click “Download”.
Turn on Collaboration
Ok, it’s dumb that this isn’t automatic yet, but you need to enable Send/Receive:
This will cause a button to show up on the dashboard:
Clicking it there (or even while you’re still in the WeSay Configuration tool) will show a dialog like this (sorry about that obvious bug in the text at the bottom):
Clicking “Internet” starts the synchronization:
Note: when WeSay detects that some changes were pulled down from the internet, it closes down and restarts itself so that it has a nice clean start with the new data. We plan to make this unnecessary in some future version.
Pull the project down to the computers of the rest of the team
Now do the same for each member of the team, as you get Language Depot accounts for them. If you have limited internet connectivity, you can speed things up by instead using a USB flash drive.
Like most things in WeSay, you, the Advisor, need to turn on the collaboration features which are appropriate for your project. Some of this could eventually be done automatically, but for now, you need to do this for each user (or do it for one and then copy their “.userConfig” file for everyone else, renaming each copy to match each person’s user name).
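The copy-and-rename step can be scripted. This Python sketch assumes each user’s settings live in a file named `<username>.userConfig` in the project folder, which follows the description above but is not confirmed; adjust `suffix` to match what you actually find in your project. The function name is hypothetical, and it overwrites any existing file for the listed users.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def copy_user_config(project_dir: str, template_user: str, other_users: Sequence[str],
                     suffix: str = ".userConfig") -> List[Path]:
    """Copy <template_user><suffix> to <user><suffix> for each other user."""
    project = Path(project_dir)
    template = project / f"{template_user}{suffix}"
    if not template.is_file():
        raise FileNotFoundError(template)
    created = []
    for user in other_users:
        target = project / f"{user}{suffix}"
        shutil.copyfile(template, target)  # overwrites any existing file
        created.append(target)
    return created
```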
“Advanced History”
There are two optional tasks you can enable if you want:
These show up on the dashboard under the “review” section:
The history shows you all the changes the team has made:
It’s labeled “advanced history” because, frankly, this is more complex than what we expect many WeSay users to handle. Use your own discretion. It may be that you, the advisor, will want this enabled in your configuration, but not in that of the rest of the team.
The Notes Browser
Collaboration is more than just sharing changes. It turns out that as soon as you start working with others using Send/Receive, you find you want to write notes to them. So we added the ability to attach questions to lexical entries, and to carry on conversations about the question. Future versions will add other kinds of notes, and allow them to be attached to particular fields, not just whole entries.
In the following screenshot, notice the circled buttons. We’d click the left-most to add a new question to this entry. The other button represents a previous, unresolved question. A click on it brings up a dialog box in which we can read and answer the question.
Ok, but how do you find which entries have unresolved notes? WeSay 0.7 also introduces the Notes Browser, which lets you find and interact with notes from all over the dictionary:
In addition to Questions, WeSay currently supports just one other kind of note, the Merge Conflict. These are created by the automatic merger when two team members edit the same field at the same time. Unlike traditional version control systems, Chorus (the engine we’ve written to do all this) doesn’t stop the merge and force you to deal with the problem immediately. Instead, it makes its best guess as to what to do, then creates a Merge Conflict Note which the team can read and deal with when it is ready.
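To make the “best guess, then leave a note” idea concrete, here is a toy three-way merge of a single field in Python. It is not Chorus’s code: the rule that our side wins a true conflict is an arbitrary assumption for illustration, and `ConflictNote` is a hypothetical stand-in for a Merge Conflict Note.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConflictNote:
    """Hypothetical stand-in for a Merge Conflict Note."""
    field: str
    kept: str
    lost: str


def merge_field(field: str, ancestor: str, ours: str, theirs: str,
                notes: List[ConflictNote]) -> str:
    """Three-way merge of one field value; never stops, records a note on conflict."""
    if ours == theirs:          # both made the same change (or no change)
        return ours
    if ours == ancestor:        # only they changed it
        return theirs
    if theirs == ancestor:      # only we changed it
        return ours
    # Both changed it differently: make a best guess (here: keep ours) and leave a note.
    notes.append(ConflictNote(field=field, kept=ours, lost=theirs))
    return ours
```

The point of the design is that the merge always completes; the team deals with the note whenever it is ready.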
If you don’t like what the merger did, go to the entry and make whatever changes are necessary. Then click the “resolved” box to show that this has been dealt with. Or if you need to discuss what to do with a teammate, add a new message to the note. In the following screenshot, I’ve highlighted the hyperlink at the top, and the Resolved box at the bottom.
Ok, sorry, I know that’s a lot to absorb in one sitting. When we’re beyond this alpha testing period, I imagine I’ll reintroduce all of these pieces in greater detail (though still lacking pedagogical skill :-) ).
WeSay Stable for Linux
Author cambell | 25.08.2009 | Category Linux, WeSay
WeSay is now available for Linux in two flavours. The wesay-stable package will install the latest 0.6 stable series, whereas the wesay package will install the latest 0.7 development series.
Full installation instructions are available on projects.palaso.org.
More control over “missing info” tasks
Author John Hatton | 22.06.2009 | Category WeSay
WeSay has always had Tasks which would show you just the words that needed some more information in a particular field. However, the selection of which entries to show was pretty blunt: if the field had an empty slot in any of its multiple writing systems, the task would show that entry. This meant that you couldn’t easily set up WeSay for a user who, for example, just wanted to add vernacular definitions where English ones had already been entered.
In another case, we might want to set a user up to add voice recordings of example sentences. But the task should only show example sentences where someone had previously entered the example text.
The latest development release (0.5 build 2000) addresses this. When you first create a project, tasks are configured to have the same behavior as before: an entry will be chosen if *any* of the writing systems assigned to that field are empty.
You can now limit the task to filling in the vernacular (gaw, in this example):
In addition, we can limit the task to only those entries where some other writing system has already been filled in:
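The selection rule described above can be sketched as a small predicate. In this Python sketch (hypothetical names), `values` maps writing-system codes to one field’s text for one entry; `target_ws` lists the writing systems the task should fill in, and `require_filled` lists ones that must already have content. Treating “already filled in” as “every listed writing system has text” is an assumption; with a single required writing system, as in the example, the distinction doesn’t matter.

```python
from typing import Dict, Sequence


def needs_work(values: Dict[str, str], target_ws: Sequence[str],
               require_filled: Sequence[str] = ()) -> bool:
    """Should this entry appear in a 'missing info' task for one field?"""
    def is_empty(ws: str) -> bool:
        return not values.get(ws, "").strip()

    # Optional filter: skip entries where a required writing system is still empty.
    if any(is_empty(ws) for ws in require_filled):
        return False
    # With target_ws = every writing system of the field, this is the old behavior:
    # show the entry if any of them is empty.
    return any(is_empty(ws) for ws in target_ws)
```

For example, `needs_work(definition, ["gaw"], ["en"])` selects entries that have an English definition but no gaw one.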
Thanks, Mark, for taking the time to submit this request. We would appreciate any feedback you can give us on this feature. Does it work well for you?
The obvious next step would be to add a way to make duplicates of some tasks, so that you could have both an “Add Examples” and an “Add Example Recordings” task. This is now possible by editing the .wesayconfig file in a text editor. If you want to know how, send me a message (hattonjohn at gmail). That will tell me how much demand there is for it, and if there’s enough, we’ll make it easier.
WeSay, Palaso, and Linux
Author cambell | 29.05.2009 | Category OurWord, Linux, WeSay
Feeling in a somewhat reflective mood, I thought I’d write about the current state of our work with Linux, and have a look at where we hope to go in the next few months.
Over the last few months (er, year) we’ve been working on getting our applications to run on the Ubuntu Linux platform. We are hoping that this will enable our applications to be widely used on the new breed of netbooks and other lower-powered machines. The main work has centered on the Mono project, which enables .NET WinForms applications (most commonly a Windows technology) to run on Linux. So, where are we now?
- The http://packages.palaso.org repository has been launched to make our software available to you all.
- WeSay 0.4 (stable) is available from packages.palaso.org. Install details are on the WeSay Wiki on projects.palaso.org. This is an early release and has a number of issues that we know about. Have a look at the WeSay bug tracker for further information.
- OurWord is available from packages.palaso.org. Install details are on the OurWord Wiki on projects.palaso.org. Again this is an early release, there are a number of visual issues which are being tracked on the OurWord bug tracker.
- Automatic USB stick detection now works. That means in WeSay, you can click backup and expect it to wait for you to put a USB stick into your computer and the backup will ‘just work’.
There are a few significant issues that we’ve noticed and we’re working on fixing these up over the next month or two.
- Keyboard switching doesn’t work. In WeSay we make extensive use of keyboard switching. If you’re editing a field that’s, say, IPA, the keyboard should be set to IPA for you. When you tab to another field with a different writing system, the keyboard should change automatically. We are fixing this by contributing to the SCIM project, extending its capabilities to support our Keyboard Switching component. We expect to have this available very soon.
- Non-Roman script rendering using Mono doesn’t work. We’re fixing this in Mono by rewriting large chunks of the Text Control. This will take some time to filter through and become available, perhaps some time in the next three months.
- The WeSay development branch (0.5.x) doesn’t currently work with Mono. We’re working on bringing that up to date so that we can release on Windows and Ubuntu simultaneously.
Once we’ve finished these issues, we’ll be back to focusing on core features of our products rather than enabling Linux to run them. At that time we’ll leave the improvement of Mono and Linux to others.
Well, enough reflecting. Time to go get a coffee and carry on with the work.
Further details can be found on our various projects at http://projects.palaso.org
Art Of Reading comes to WeSay
Author John Hatton | 08.04.2009 | Category WeSay
Illustrations always cheer up an otherwise drab dictionary. Until now, you had to put in a lot of work to find or create illustrations, get the rights to them, and hook them into your dictionary. With the latest release of our 0.5 line (build 1917), adding illustrations is a lot more fun.
First, let’s have a visual tour of the new feature. Then I’ll explain about “Art Of Reading”.
To add an illustration, you just go to the Picture field and click “Search Gallery…”:
(Notice that the old “Choose Image File…” is still there. Eventually, we should make that button hidden by default, as it violates our desire to not send the WeSay user into the confusing depths of the file system.)
WeSay looks at the English meaning and uses it to search the gallery for matching illustrations:
If that doesn’t show you the pictures you’re looking for, you can change the search terms and try again. Once you find the one you want, you double click on it. This closes the dialog and inserts the illustration:
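Conceptually, the gallery search is a keyword match between the words of the English meaning and the index that ships with the images. The Python sketch below is a guess at the general idea, not WeSay’s actual matching code: the index format (file name mapped to a set of keywords), the file names, and the rule of skipping words shorter than three letters are all assumptions.

```python
import re
from typing import Dict, List, Set


def search_terms(meaning: str) -> Set[str]:
    """Lower-case words from the English meaning; words under 3 letters are dropped."""
    return {w for w in re.findall(r"[a-z]+", meaning.lower()) if len(w) >= 3}


def search_gallery(index: Dict[str, Set[str]], meaning: str) -> List[str]:
    """Image files ranked by how many search terms match their keywords."""
    terms = search_terms(meaning)
    scored = [(len(terms & keywords), name) for name, keywords in index.items()]
    # Best matches first; ties broken alphabetically; non-matches left out.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]
```

Changing the search terms in the dialog corresponds to calling `search_gallery` again with different text.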
Art of Reading 2.0 is a CD put out by SIL; ask around, someone near you may already have it. If not, you can order it here. From that page:
International Illustrations is the second artwork CD-ROM produced by the International Literacy Department of SIL International. This expanded, enhanced collection is the follow-up to Art of Reading 1.0 and contains over 11,000 indexed images collected from SIL and national artists around the world. Searchable by keywords.
Black and white line drawings (in compressed TIF format for Windows and Mac) are suitable for use in a wide variety of literacy materials, newsletters, bulletin board displays, and other cultural awareness materials.
Images come from Brazil, Cambodia, Cameroon, Canada, Colombia, D.R. of Congo, India, Indonesia, Kenya, Mexico, Nigeria, Papua New Guinea, Peru, Philippines, Senegal, Sudan, Thailand, USA
I’m told there’s a version 3 in the works, which will be a DVD with even more illustrations and an Indonesian index. If/when anyone produces indices for French, Spanish, Portuguese, etc., tell us and we’ll add them to WeSay, too (I suspect you could use a computer translator to generate something useful, quickly). Note that some of the description of how the package works doesn’t apply to WeSay users: the included software is too unwieldy for a WeSay audience, so WeSay bypasses it, keeping the process as simple as you see above.
Get the latest WeSay here. As always, we rely on your feedback here on the blog, on the Google Group, or (if you have a problem) via email: issues at wesay.org.
Thanks to René van den Berg for inspiring this new feature.
Technical Details
- Instead of leaving the CD in the machine, I recommend you copy it to the user’s hard drive. On Windows, WeSay will look for it at “C:\Art of Reading”. You don’t need anything but the “images” folder, which would be at “C:\Art Of Reading\Images”. When we ship this feature for Linux (soon), we’ll update this page with the corresponding location there.
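A sketch of that lookup in Python, assuming the layout described above (an `images` folder inside `C:\Art Of Reading`). It matches the folder name case-insensitively, since Windows ignores case but other systems do not. The function name is hypothetical, and this is not how WeSay itself locates the folder.

```python
from pathlib import Path
from typing import Optional

DEFAULT_ROOT = r"C:\Art Of Reading"  # Windows location given above


def find_images_folder(root: str = DEFAULT_ROOT) -> Optional[Path]:
    """Return the 'images' subfolder of root (any capitalization), or None."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    for child in root_path.iterdir():
        if child.is_dir() and child.name.lower() == "images":
            return child
    return None
```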
- I do not know if WeSay’s index of this package works with Art Of Reading 1.0. It works with 2.0, and should work fine for the forthcoming 3.0.
- If you have a different image library which you think would be widely used, let us know. It would be great to have one which could be downloaded for free.
- This feature is the latest (and last) major addition to WeSay 0.5, our “development” release. Projects created or edited in 0.5 cannot be opened with WeSay 0.4, our “stable” release as of the date of this posting.
WeSay: What’s it all about
Author cambell | 13.03.2009 | Category Dictionary, WeSay
To celebrate the opening of the new Linguistics Institute at Payap University, the Palaso team made a video. It’s a fun introduction to WeSay, our dictionary-making software.
Collecting audio with WeSay
Author John Hatton | 07.01.2009 | Category WeSay
For a long time, I’ve had the crazy idea that audio should be just another kind of “writing system”. I’m happy to say that now, crazy or not, you can set up a project like this:
Notice the circles there? They’re trying to be unobtrusive. When you move the mouse near, they light up:
As long as you hold the mouse button down, your voice is recorded. Like a walkie-talkie. Then, the symbol changes:
and when you go near that, you have a play and a delete button:
How is this useful? For one thing, as electronic dictionaries become more common, wouldn’t it be nice to hear the word or example sentence? The recordings might also help with language learning: the sound files and the dictionary file could be used together to create listening exercises.
How to set it up
In the configuration tool, go to Writing Systems and make a new one called “voice”. Then set the “is audio” switch to “true”.
Now go to the Fields section, and tick “voice” next to any field to which you want to add this voice capability.
Note, you could have multiple voice writing systems, carrying different accents, genders, whatever.
Please, let us know if you have a chance to play around with this, and any experiences you have using it with a native speaker.
This is currently available in our 0.5 line, for Windows only. Linux could follow soon, especially if we hear from you.
Technical Details
All sounds are saved as .wav files under a new “audio” subfolder of your WeSay project. Their names are a bit unwieldy at the moment, largely to keep the code simple while this is still a proof of concept. Files are named with the form of the word plus a time stamp, so that multiple recordings of the same word (or homograph) don’t step on each other.
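Here is one way such “form + time stamp” names could be produced, as a Python sketch. The exact format WeSay uses isn’t given above, so the separator, the timestamp layout, and the replacement of characters Windows doesn’t allow in file names are all assumptions, and `audio_file_path` is a hypothetical name.

```python
import re
from datetime import datetime
from pathlib import Path


def audio_file_path(project_dir: str, word_form: str, when: datetime) -> Path:
    """<project>/audio/<form>-<timestamp>.wav, unique per recording time."""
    # Replace characters Windows forbids in file names (and whitespace) with '_'.
    safe_form = re.sub(r'[\\/:*?"<>|\s]+', "_", word_form).strip("_") or "untitled"
    # Microsecond resolution so two quick recordings of one word still differ.
    stamp = when.strftime("%Y%m%d-%H%M%S-%f")
    return Path(project_dir) / "audio" / f"{safe_form}-{stamp}.wav"
```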
Single Click Printing
Author John Hatton | 07.01.2009 | Category WeSay
In my last post, I mentioned that three levels of printing WeSay dictionaries are taking shape:
- Useful for everyday WeSay users, with no training.
- Good enough for final publication of many projects, with a little training or computer savvy.
- Powerful enough for any project, perhaps needing a specialist.
In that post, we covered #2, at least for Windows users. Now I’m pleased to announce a big step towards #1:
Click this, and a few moments later your PDF reader (e.g. Acrobat) opens with a dictionary:
Our aims for this feature are limited:
1) provide a Linux (as well as Windows) way to get simple printouts. (Lexique Pro is Windows only).
2) provide a very simple way to get a printout when no computer-savvy advisor is available to run a more extensive set of applications (like Lexique Pro + Microsoft Word).
Currently, the fields that this outputs are limited to:
- Headword (from Lexeme Form and Citation Form). Headwords in multiple writing systems are supported.
- Definition
- Part of Speech
- Top level senses (not sub senses)
- Example sentences, and translations of them
- Illustrations, auto-captioned to the headword of the entry
- Cross references
We can easily add to the capabilities here, at your request. But we may be resistant to any enhancements which involve wizards, dialogs, etc. For that kind of control, you really need to use Lexique Pro, FLEx, or MDF. In other words, the request “I need to get borrowed words” would be implemented quickly, whereas “I want control over the placement of the illustrations” would not.
Future Work
Depending on feedback from you, gentle reader, we could do more interesting things here. These include
- automatically ordering pages for booklet printing
- a title page
- a section of words categorized by semantic-domain
- a reversal section
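Of these, booklet ordering is the most mechanical. A common approach for saddle-stitched booklets (sheets folded in half and stapled in the middle) pads the page count to a multiple of four and pairs the outermost remaining pages on each side of each sheet. This Python sketch shows that standard imposition; it is not something WeSay currently does, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Side = Tuple[Optional[int], Optional[int]]


def booklet_order(page_count: int) -> List[Side]:
    """(left, right) page numbers for each printed side, sheet by sheet.

    Sides alternate front, back, front, back...; None marks a blank padding page.
    """
    n = page_count + (-page_count) % 4   # round up to a multiple of 4
    sides: List[Side] = []
    for sheet in range(n // 4):
        front = (n - 2 * sheet, 2 * sheet + 1)
        back = (2 * sheet + 2, n - 2 * sheet - 1)
        sides.extend([front, back])
    return [tuple(p if p <= page_count else None for p in side) for side in sides]
```

For an 8-page dictionary, the first sheet carries pages 8 and 1 on the front and 2 and 7 on the back.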
Technical details
As with Lexique Pro export, WeSay begins by producing a PLIFT file, which is a simplified copy of your LIFT dictionary file. It then converts this to HTML (as used by web pages), and produces style sheets (industry-standard CSS3 ones). Finally, it uses a terrific page-layout engine named PrinceXml to produce the PDF. The stylesheets are:
- autoLayout.css
- autoFonts.css
- customLayout.css
- customFonts.css
If you are so inclined, you can edit the two “custom” ones. This has the effect of overriding the styles in the “auto” ones. In this way, the technical user has full control. You can also set up the dictionary the way you want using FieldWorks Language Explorer’s dictionary export function, which gives you extensive control over many aspects of the layout. WeSay’s HTML uses the same style names as FLEx, so you can grab the CSS that FLEx creates and use that as your “customLayout.css” when using WeSay. If you do any of this kind of thing, please let us know. We really need to know what people are using, and what they aren’t.
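The “custom overrides auto” behavior relies on the CSS cascade: when two rules are equally specific, the one that comes later wins. The Python sketch below models that with a toy `cascade` function (it ignores specificity and `!important`) and shows the link order it assumes. Both the order of the `<link>` tags and the `.headword` class name are assumptions for illustration, not a description of WeSay’s generated HTML.

```python
from html import escape
from typing import Dict, Sequence

# The four stylesheets named above, in the order a page might link them.
STYLESHEETS = ("autoLayout.css", "autoFonts.css", "customLayout.css", "customFonts.css")

Sheet = Dict[str, Dict[str, str]]  # selector -> {property: value}


def stylesheet_links(sheets: Sequence[str] = STYLESHEETS) -> str:
    """<link> tags in cascade order: later sheets win ties in specificity."""
    return "\n".join(f'<link rel="stylesheet" href="{escape(s)}">' for s in sheets)


def cascade(*sheets: Sheet) -> Sheet:
    """Toy model for same-selector rules: later sheets override earlier ones."""
    result: Sheet = {}
    for sheet in sheets:
        for selector, props in sheet.items():
            result.setdefault(selector, {}).update(props)  # copies; inputs untouched
    return result
```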
Have you read this far? Leave a comment. I’m not clear if folks in the language documentation community actually read blogs.
Open In Lexique Pro
Author John Hatton | 06.01.2009 | Category WeSay
WeSay really wants to focus on gathering data. It really doesn’t want to become a full-powered dictionary layout system. Ideally, there would be an invisible, friction-free means of getting a simple dictionary printout at the click of a button, and a customized one with a couple of clicks. And perhaps a third, ultra-flexible, standards-based, high-end dictionary publisher where that is called for.
So we’d have
- Useful for everyday WeSay users, with no training.
- Good enough for final publication of many projects, with a little training or computer savvy.
- Powerful enough for any project, perhaps needing a specialist.
These three scenarios are all now in the works from various SIL software teams, and I’ll blog about them as they become available to WeSay users.
Today, I’m pleased to update you on #2, the growing interoperability of WeSay and Lexique Pro. Lexique Pro is a highly regarded, free dictionary tool for MS Windows. From the LP web site:
Lexique Pro is an interactive lexicon viewer and editor, with hyperlinks between entries, category views, dictionary reversal, search, and export tools. It’s designed to display your data in a user-friendly format so you can distribute it to others.
Starting with version 3, LP can directly read the LIFT-standard xml files which WeSay uses. No need to go through the “standard format” or “MDF” first.
Starting with version 0.5 (our current development track), opening your dictionary with Lexique Pro got really easy:
You can get the latest WeSay here. Lexique Pro 3.0 is currently a beta, available here. With the 31 Oct beta of LP, at least, there are a number of things still to be worked out, but I expect we’ll see first-rate LIFT-based printing in LP this year.
Technical details
When you click this button, WeSay actually writes out a modified form of your LIFT file to the “exports” subdirectory of your WeSay project. While it is still compliant LIFT, some preprocessing is done to help printing programs show the right things. For example, homograph numbers are computed, headwords calculated, and any fields you have turned off for the current user are stripped from this file. We refer to this kind of file as “PLIFT”, for “publication” LIFT.
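Two of those preprocessing steps are easy to sketch. In this Python illustration (hypothetical helper names), the headword is taken from the citation form when there is one and otherwise from the lexeme form, which is an assumption about the precedence, and homograph numbers are assigned in file order to headwords that occur more than once, with 0 meaning “not a homograph”.

```python
from collections import defaultdict
from typing import Dict, List


def headword(lexeme_form: str, citation_form: str = "") -> str:
    """Citation form if present, else lexeme form (assumed precedence)."""
    return citation_form.strip() or lexeme_form.strip()


def homograph_numbers(headwords: List[str]) -> List[int]:
    """1, 2, 3... for repeated headwords in file order; 0 for unique ones."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, hw in enumerate(headwords):
        positions[hw].append(i)
    numbers = [0] * len(headwords)
    for indexes in positions.values():
        if len(indexes) > 1:
            for n, i in enumerate(indexes, start=1):
                numbers[i] = n
    return numbers
```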