Welcome to palaso.org
Website of the Payap Language Software Development Group
New Writing System UI
Author John Hatton | 04.01.2010 | Category Palaso Library, Developers
This posting will be of interest only to developers currently using, or considering using, our .NET Palaso Library, which provides components for many common language software tasks. This post looks at just one of those: setting up a set of Writing Systems.
Well, each year ‘round this time I take a break from my normal obligations and do something interesting, or learn something new. Alas, this year, I did neither. Instead, I squandered the time doing some long “unfinished business”. A couple of years ago, we added to Palaso some pretty nice support for LDML, the standard XML format for writing systems. Writing systems? BORING. I know, I know. Anyhow, the GUI that we had was just the minimum, and not helpful enough to actually ship in our flagship product, WeSay. So WeSay limps along on an older, pre-LDML system. Early in 2009 I did a UI design of what we really need, but, alas, no one actually implemented it. My teammates mentioned they were a bit peeved at me for not letting that bare-bones GUI ship.
The only interesting thing to me about writing systems, especially user interfaces for them, is that we keep finding it so hard to get them right! I’ve seen half a dozen attempts in the last 10 years, just within the confines of SIL & friends. Here’s why, in my opinion: The vast majority of users and languages have pretty simple needs in this area. The rest, well, they’re pretty complicated (like the dictionary we worked with in South East Asia which has to handle scripts of Thailand, Burma, China, a Romanization, and IPA). Yet all the UI’s we’ve done have catered to both of these equally, and so most people were blocked: they needed to call in more geeky help to get past this part of setting up their software. The key insight to fixing this, I think, is that people in typical situations will be shocked by the complexity needed in the non-typical situations. And those who have it hard, well, they’ll expect things to be non-trivial. So this latest attempt uses progressive disclosure to keep things simple for most people.
In this post, I’m not going to talk about a lot of the really cool, problem-solving parts of this system which aren’t new: the non-UI parts, the ones my colleagues actually give a hoot about. I don’t think we ever did blog about them, though, so tag, you’re it, Cambell.
The Writing System Repository Pane
Here’s the equivalent in the latest Palaso system:
There are two main parts to this design. First, the tree on the left organizes your repository in a hopefully easy to understand way, while at the same time giving you shortcuts to what you most likely want to do next. That is, it:
- Shows you the writing systems in your repository, grouped by language
- Makes suggestions about other writing systems you may wish to add to existing languages (e.g. IPA).
- Makes suggestions about other languages you may be working on, based on what your Operating System thinks (here, Icelandic and Arabic on my machine).
As a developer, you can control which kinds of suggestions make sense for your application, to keep things simple. For WeSay, for example, we offer voice writing systems, but phonetic transcription and dialects are pretty unlikely. So we’d set up the Suggestor accordingly:
You might have noticed that there are no suggestions under English. Again, this is to keep things simple for the majority of users. We do that by specifying:
Note, even without those suggestions, someone could still make multiple Writing Systems for English easily enough, if they need to.
The Identifiers Tab
The second leg of the design is an Identifiers Tab which stays as simple as possible. As you know, there’s more to life than the good ‘ol “Ethnologue Code” (nowadays ISO 639-3). In addition to that, we need to help people come up with a proper RFC5646 identifier, including handling situations common in linguistics which aren’t spelled out by that standard. This is the job of the Identifiers tab.
For the simple (and normal) case, we don’t need to say any more than what the language is. Notice how in the image above, the Identifiers tab has just two controls.
After adding a simple writing system for a language, the next most common thing for users to need is a way to write the language phonetically or phonemically, using the International Phonetic Alphabet. Clicking the provided button just under the Aari label gives:
Notice in the upper left, this new writing system has been grouped underneath the plain ‘ol Aari one. And on the right, notice that the “Special” combo now says “IPA Transcription”. If we want to specify phonetic vs. phonemic, we can do that with the “Purpose” combo.
If we need a new dialect, clicking the provided button brings up a dialog asking us to type in the name of the dialect. I entered “Foo”, and now we get:
Notice, with the “Special” combo set to “Script/Variant/Region”, we get more control over the writing system and its RFC5646 identifier (displayed in the upper right).
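To make the identifier side of this concrete, here is a minimal sketch of how the pieces shown on the Identifiers tab could be joined into an RFC5646 tag. This is not Palaso's API: the function name and parameters are made up for illustration, "aiw" is assumed to be the ISO 639-3 code for Aari, and putting the dialect name into a private-use subtag is only one possible convention. The only things taken from the standard are the subtag order (language, script, region, variant, then private use after "x") and the registered "fonipa" variant for IPA.

```python
def build_language_tag(language, script=None, region=None, variant=None, private_use=()):
    """Join subtags in the order RFC 5646 requires:
    language-script-region-variant-x-privateuse.

    Hypothetical helper for illustration; it does no validation.
    """
    parts = [language.lower()]
    if script:
        parts.append(script.title())   # conventional casing, e.g. "Latn"
    if region:
        parts.append(region.upper())   # conventional casing, e.g. "ET"
    if variant:
        parts.append(variant.lower())  # e.g. "fonipa" for IPA
    if private_use:
        parts.append("x")              # everything after "x" is private use
        parts.extend(p.lower() for p in private_use)
    return "-".join(parts)


print(build_language_tag("aiw"))                        # plain writing system
print(build_language_tag("aiw", variant="fonipa"))      # IPA transcription
print(build_language_tag("aiw", private_use=["foo"]))   # a dialect named "Foo"
```

The simple case needs only the language, which is why the tab can get away with two controls until you ask for more.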
Ok, that’s mostly what I wanted to show. When you click “Add Language”, of course, you get a searchable list of known languages. And under that More button in the lower left, you get these rarely needed commands:
There’s still some work that could be done (notice Region doesn’t offer a list), but my New Year’s break is over, so that’s it for now. The Palaso Library lives in Mercurial, at http://projects.palaso.org/projects/show/palaso.
jh
Git notes
Author Eric Albright | 23.06.2008 | Category Developers, WeSay
Now that I’ve used git for a couple weeks, I thought I’d make a few notes of commands I’ve found helpful.
To make a local branch for development
git checkout -b name_of_new_branch
To commit changes to the local repository (although I usually use the visual gitk for this)
git commit -a
To commit changes back to subversion
git svn dcommit
To uncommit
git reset HEAD~1
To keep a local branch up to date with subversion (use git stash to hide away local uncommitted changes for later)
git svn rebase
To move the master branch up to trunk
git checkout master
git svn rebase
To handle conflicts with merge
git mergetool path_to_file_needing_merge
or
git mergetool -t toolname path_to_file_needing_merge
To remove untracked files (like the temps that get created during a merge resolve)
git clean -n (to see what it would do)
git clean -f -d (-d if you want to remove untracked directories as well)
Git, Subversion and a CRLF mess
Author Eric Albright | 23.06.2008 | Category Developers, WeSay
When initializing from WeSay’s Subversion repository (git svn init -t tags -b branches -T trunk http://www.wesay.org/code/WeSay), I was told that a ton of files had changed. It turns out that on Windows, git has core.autocrlf = true by default, which is a good thing. But git-svn apparently doesn’t take this into account, and if you have CRLFs stored in the svn repository, they will be pushed into the git repository as well. So for now we have a repository that has CRLFs in it instead of just LFs which get translated depending on the platform. Setting core.autocrlf to false and then doing a hard reset will make this work for now, although not as nicely as we would like. (git config core.autocrlf false; git reset --hard)
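If you want to see for yourself whether stored content carries CRLFs, a few lines of Python are enough. This is only an illustrative sketch (the function names are mine, and it is not how git itself does the conversion): it counts the two kinds of line ending in a byte string and shows the LF-only form that an autocrlf-style conversion would store.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF and bare-LF line endings in raw file content."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains one LF
    return {"crlf": crlf, "lf": lf}


def normalize_to_lf(data: bytes) -> bytes:
    """Return the content with every CRLF replaced by a single LF."""
    return data.replace(b"\r\n", b"\n")


sample = b"first\r\nsecond\r\nthird\n"
print(count_line_endings(sample))
print(count_line_endings(normalize_to_lf(sample)))
```

A file that reports a non-zero "crlf" count after coming through git-svn is one that will show up as modified when core.autocrlf is true.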
Merging with git
Author Eric Albright | 12.06.2008 | Category Developers, WeSay
Git still doesn’t have good Unicode support, so to merge Unicode files that git has labeled binary, I wanted to use a visual merger. Finally figured out how to do it: add the following lines to config:
[merge]
    tool = tortoise
[mergetool "tortoise"]
    cmd = \"TortoiseMerge.exe\" /base:\"$BASE\" /theirs:\"$REMOTE\" /mine:\"$LOCAL\" /merged:\"$MERGED\"
[mergetool "p4"]
    cmd = \"p4merge.exe\" \"$BASE\" \"$REMOTE\" \"$LOCAL\" \"$MERGED\"
If you don’t have TortoiseMerge.exe in your path then you can replace that with the full path (c:/Program Files/TortoiseSVN/bin/TortoiseMerge.exe).
Upgrading user settings in C#
Author Tim | 10.06.2008 | Category Developers, WeSay
In the course of development we found it necessary to migrate an old user setting into a new one and to then remove it. This brought with it a few problems which I hope to shed some light on below.
In order to get the value of the old setting we used the Properties.Settings.GetPreviousVersion() method. Initially we were getting a SettingsPropertyNotFoundException although the setting was verifiably present in the user.config file. As it turns out, we had removed the property from the Settings designer, which removed the property from the Properties.Settings class. In order for settings to be found, they have to have a property that is tagged with the [UserScopedSettingAttribute] attribute. This tells the GetPreviousVersion() method to look for the setting in user.config. So far so good…
At this point however, the base.Upgrade() method is called to move old settings into the new file. This causes the old, unwanted setting to be moved in right along with all the old settings that we want to keep around. In order to avoid this behavior the [NoSettingsVersionUpgrade] attribute must also be used for the unwanted Property.
public override void Upgrade()
{
    string lastConfigFilePath = (string) GetPreviousVersion("LastConfigFilePath");
    base.Upgrade(); // bring forward our properties that are the
                    // same (but also will bring forward LastConfigFilePath)
}

[UserScopedSettingAttribute]
[DebuggerNonUserCode]
[DefaultSettingValueAttribute("")]
[Obsolete("Please use MruConfigFilePaths instead")]
[NoSettingsVersionUpgrade]
public string LastConfigFilePath
{
    get
    {
        throw new NotSupportedException("LastConfigFilePath is obsolete");
    }
    set
    {
        throw new NotSupportedException("LastConfigFilePath is obsolete");
    }
}
An enchant provider for LIFT
Author Eric Albright | 13.05.2008 | Category Developers, WeSay
We wanted to allow users to edit their dictionary and use that same dictionary for spell checking. Since WeSay uses LIFT as the file format for the dictionary and keeps that file up to date, all we needed was an enchant provider that can read LIFT files.
I took the spell checking engine I had written a while back, Ascens, and refactored it so that it could read files of various formats. Currently it supports line based and XML based formats. For line based formats, the words are entered one per line. For XML based formats, an XPath expression determines what text from within the file should be selected to constitute correctly spelled words.
Ascens looks for a settings file with the same name as the language identifier that is passed to enchant. Within the settings file, the location of the dictionary and the type of the dictionary are specified. If the type is xml then the xpath expression should be defined.
The following is an example settings file for Ascens referring to a LIFT file:
# This is the settings file for Ascens
[Dictionary]
# Type is either xml or line
# for xml you also need to set the XPath
#Type=line
Type=xml
# path to the dictionary
# (can be absolute or relative to the directory that this file is in)
#Path=c:\documents and settings\user\my documents\dictionaries\fr_FR.dic
#Path=fr_FR.dic
Path=..\..\..\My Documents\WeSay\French\French.lift
# XPath gives the Xpath that selects the words to be used as dictionary
# it must all be on a single line
XPath=//entry[not(citation-form/form[@lang='fr'])]/lexical-unit/form[@lang='fr']/text | //entry/citation-form/form[@lang='fr']/text
# this xpath selects the forms with the language id of 'fr' from the
# citation form when there is one and from the lexical unit when
# there is no citation form (it will not select both)
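To show what that XPath ends up selecting, here is a small Python sketch. It is not how Ascens is implemented: it reads a settings file of the shape above with configparser, then reproduces the XPath's logic by hand, because the standard library's ElementTree does not support not() or the | union. The sample XML is a made-up, heavily simplified LIFT-like fragment shaped to match the element names used in the XPath.

```python
import configparser
import xml.etree.ElementTree as ET

SETTINGS = r"""
[Dictionary]
Type=xml
Path=French.lift
XPath=//entry[not(citation-form/form[@lang='fr'])]/lexical-unit/form[@lang='fr']/text | //entry/citation-form/form[@lang='fr']/text
"""

# Hypothetical sample data; real LIFT files carry much more than this.
LIFT = """
<lift>
  <entry>
    <lexical-unit><form lang="fr"><text>chevaux</text></form></lexical-unit>
    <citation-form><form lang="fr"><text>cheval</text></form></citation-form>
  </entry>
  <entry>
    <lexical-unit><form lang="fr"><text>maison</text></form></lexical-unit>
  </entry>
</lift>
"""


def read_settings(text):
    """Parse the [Dictionary] section; option names come back lowercased."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["Dictionary"])


def select_words(lift_xml, lang):
    """Take the citation form when an entry has one, else the lexical unit."""
    words = []
    for entry in ET.fromstring(lift_xml).iter("entry"):
        citation = entry.findall(f"citation-form/form[@lang='{lang}']/text")
        chosen = citation or entry.findall(f"lexical-unit/form[@lang='{lang}']/text")
        words.extend(node.text for node in chosen)
    return words


settings = read_settings(SETTINGS)
print(settings["type"], settings["path"])
print(select_words(LIFT, "fr"))
```

The first entry contributes only its citation form and the second only its lexical unit, which is the "it will not select both" behavior described in the comments of the settings file.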
Enchant looks for user Ascens settings files in the following locations:
- The ascens subdirectory of the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- %APPDATA%\enchant\ascens, where %APPDATA% is shorthand for the C:\Users\<username>\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\<username>\Application Data\ folder (Windows XP/2000).
- The enchant\ascens subdirectory of the directory value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Home_Dir, if there is one.
- %USERPROFILE%\enchant\ascens, where %USERPROFILE% is shorthand for the C:\Users\<username> folder (Windows Vista) or the C:\Documents and Settings\<username> folder (Windows XP/2000).
Enchant looks for shared Ascens settings files in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\ascens\Data_Dir, if there is one.
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\ascens\Data_Dir, if there is one.
- <enchant>\share\enchant\ascens, where <enchant> is the location of libenchant.dll.
WeSay Tests on Mono Status
Author Eric Albright | 22.04.2008 | Category Developers, WeSay
One step toward getting WeSay to run on the OLPC is to verify that it can run with Mono.
We already reported all the System.Windows.Forms bugs that we could find by running MWF on Windows as documented here. The next step has been to run all the tests under Mono. As you can see from the diagram (that actually lives on our whiteboard) at left, we have found and fixed and reported quite a few bugs that have made the number of failing tests plummet. We’re still not there yet, but I’m making good progress.
Formatting dictionaries with CSS
Author Eric Albright | 26.02.2008 | Category Typesetting, Dictionary, Developers
In evaluating CSS as a stylesheet language for formatting dictionaries, I started putting PrinceXML through its paces. I tried what I considered to be the hardest dictionary layout, and while I think I have matched many of the features, the sidenotes are just not going to happen without specialized support for them in CSS. (The closest I could get was a float, but of course if you have more than one within a line, they just write on top of each other.) That result is here. I then switched to a more typical layout, which had no problems at all. That result is here. You can get all the files to reproduce this exercise here.
Types of style
There are really a number of items which contribute to the style of a dictionary:
- Selection of fields
- Order of fields
- Textual markup - characters or text that is added before, after, or around items to distinguish a field from surrounding text
- Character styles - font changes
- Paragraph styles
- Page layout - columns
CSS3 Selectors
Another interesting behavior of CSS 3 is that you cannot select the first element having a class containing the word ‘pronunciation’:
.pronunciation:first-of-type
You can only use the :first-of-type selector to select the first element with a particular name, so general div and span elements with class attributes would have to be converted to named XML elements instead. There is a way around this, given that our document will be generated from another format, and that is to actually add the classes first-of-type and last-of-type. Then the data becomes:
<span class="pronunciation first-of-type">...</span><span class="pronunciation">...</span><span class="pronunciation last-of-type">...</span>
<span class="pronunciation first-of-type last-of-type">...</span>
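Since the markup is generated anyway, adding those classes is only a few lines in whatever tool does the generating. The following is a minimal sketch in Python under that assumption; the function name is hypothetical and not part of any existing exporter.

```python
from html import escape


def pronunciation_spans(values):
    """Emit one span per value, marking the first and last explicitly."""
    spans = []
    last = len(values) - 1
    for i, value in enumerate(values):
        classes = ["pronunciation"]
        if i == 0:
            classes.append("first-of-type")
        if i == last:
            classes.append("last-of-type")
        class_attr = " ".join(classes)
        spans.append(f'<span class="{class_attr}">{escape(value)}</span>')
    return "".join(spans)


print(pronunciation_spans(["a", "b", "c"]))
print(pronunciation_spans(["only"]))
```

A single pronunciation gets both classes, matching the second line of the sample data, so the stylesheet can then use .pronunciation.first-of-type and .pronunciation.last-of-type as ordinary class selectors.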
Column-span
The only other problem I ran into was that Prince does not yet support the column-span property. This ended up not being a big problem since I just wanted the heading to span both columns and was able to work around this by making the first page of the section have a 12cm top margin and floating the heading into this space.
Configuring where Enchant looks for files
Author albright | 22.02.2008 | Category Spelling, Developers
So far, I have covered how to get started using Enchant and how to set up dictionaries. This post will cover more advanced concepts that let an application developer or a user take more control over Enchant.
Where Enchant looks for providers
Enchant looks for which providers are available when the enchant_broker_init function is called.
Providers can be installed on the machine for all users to use on the system or can be installed for only one user. If Enchant finds a particular provider as a system provider and as a user provider, the user provider is used.
Enchant looks for system providers in the following locations:
- The value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Module_Dir, if any
- Otherwise, the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Config\Module_Dir, if any
- Otherwise, in %enchant%\lib\enchant, where %enchant% is the location of libenchant.dll.
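All of the lookups in this post follow the same "first one that is set wins" pattern. Here is a small sketch of that pattern for the system provider directory. It is an illustration, not Enchant's actual code: the registry is modeled as a plain dictionary so the example runs anywhere (on Windows the values would come from the real registry), and the function names are made up.

```python
def first_configured(candidates):
    """Return the first candidate that is set (not None and not empty)."""
    for value in candidates:
        if value:
            return value
    return None


def system_provider_dir(registry, enchant_dir):
    """Apply the lookup order for system providers described above.

    `registry` is a dict standing in for the Windows registry;
    `enchant_dir` is the directory that contains libenchant.dll.
    """
    return first_configured([
        registry.get(r"HKEY_CURRENT_USER\Software\Enchant\Config\Module_Dir"),
        registry.get(r"HKEY_LOCAL_MACHINE\Software\Enchant\Config\Module_Dir"),
        enchant_dir + r"\lib\enchant",
    ])


machine_only = {r"HKEY_LOCAL_MACHINE\Software\Enchant\Config\Module_Dir": r"D:\providers"}
print(system_provider_dir({}, r"C:\enchant"))
print(system_provider_dir(machine_only, r"C:\enchant"))
```

The same helper covers the dictionary and Aspell lookups further down; only the list of candidates changes.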
The provider location for the user is determined by:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant, where %APPDATA% is shorthand for the C:\Users\<username>\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\<username>\Application Data\ folder (Windows XP/2000).
How Enchant decides which provider to load for a given language
The provider that is used for a given language is determined by the provider ordering. This can be set programmatically by using the enchant_broker_set_ordering function. Enchant initializes the ordering by looking in the enchant.ordering file. There is a system ordering file as well as a user ordering file. A user entry overrides a system entry.
Enchant looks for the system enchant.ordering file in the following locations:
- The value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Config\Data_Dir, if any
- Otherwise, in %enchant%\share\enchant, where %enchant% is the location of libenchant.dll.
Enchant looks for the user enchant.ordering file in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant, where %APPDATA% is shorthand for the C:\Users\<username>\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\<username>\Application Data\ folder (Windows XP/2000).
If Enchant doesn’t find any ordering files and the ordering is not overridden programmatically, then the ordering is system dependent (but I think that means providers will be ordered alphabetically by filename).
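As a rough sketch of "a user entry overrides a system entry", the following Python merges two ordering files. The line format used here (a language tag, a colon, then a comma-separated list of provider names, with * as the default entry) is my assumption and is not spelled out in this post; skipping blank lines and lines starting with # is also an assumption. The function names are made up for illustration.

```python
def parse_ordering(text):
    """Parse lines of the assumed form `language:provider,provider`."""
    ordering = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and comments are ignored
        language, _, providers = line.partition(":")
        ordering[language.strip()] = [p.strip() for p in providers.split(",") if p.strip()]
    return ordering


def effective_ordering(system_text, user_text):
    """Start from the system entries, then let user entries replace them."""
    merged = parse_ordering(system_text)
    merged.update(parse_ordering(user_text))
    return merged


system_file = "*:aspell,myspell,ispell\nen_US:aspell"
user_file = "en_US:myspell,aspell"
print(effective_ordering(system_file, user_file))
```

Entries are replaced whole, per language: the user's en_US line wins, while the system default for every other language is left alone.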
Where Enchant looks for Ispell dictionaries
Enchant looks for user Ispell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Config\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant\ispell, where %APPDATA% is shorthand for the C:\Users\<username>\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\<username>\Application Data\ folder (Windows XP/2000).
Enchant looks for system Ispell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Ispell\Data_Dir, if there is one.
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Ispell\Data_Dir, if there is one.
- Otherwise, in %enchant%\share\enchant\ispell, where %enchant% is the location of libenchant.dll.
Where Enchant looks for MySpell dictionaries
Enchant looks for user MySpell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Myspell\Data_Dir, if there is one.
- Otherwise, in %APPDATA%\enchant\myspell, where %APPDATA% is shorthand for the C:\Users\<username>\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\<username>\Application Data\ folder (Windows XP/2000).
Enchant looks for system MySpell dictionaries in the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Myspell\Data_Dir, if there is one.
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Myspell\Data_Dir, if there is one.
- Otherwise, in %enchant%\share\enchant\myspell, where %enchant% is the location of libenchant.dll.
Where Enchant looks for the Aspell library
Enchant looks for the aspell-15.dll using the following locations:
- Using the value found in the registry at HKEY_CURRENT_USER\Software\Enchant\Aspell\Module, if there is one (this value should include the filename and not just the path).
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Enchant\Aspell\Module, if there is one (this value should include the filename and not just the path).
- Otherwise, using the value found in the registry at HKEY_LOCAL_MACHINE\Software\Aspell\Path, if there is one, as the path to find aspell-15.dll (this is set by the Aspell installer for Windows).
- Otherwise, in the same directory as libenchant_aspell.dll.
- Otherwise, it uses the normal Windows search strategy, which includes looking in the path.
Setting up dictionaries for Enchant
Author albright | 21.02.2008 | Category Spelling, Developers
In my last post, I gave some tips for getting started with Enchant but you really can’t get anywhere until you have properly configured the providers and installed some dictionaries.
ASpell
The ASpell provider for Enchant requires aspell-15.dll. The easiest way to get started with ASpell is to use the installer for ASpell and for dictionaries.
- Be sure you have the ASpell provider, libenchant_aspell.dll (you can list it with enchant-lsmod).
- Download the installer and run it to install ASpell.
- Download a dictionary installer from here and run the installer.
- Verify that it has been installed correctly by running enchant-lsmod.exe -list-dicts. You should see something like: en_US (aspell) but with the language code for the language you installed instead of en_US.
- You can also test it using enchant -d en_US -a (again using the language code for the language you installed). Then you can type words which are or aren’t in the dictionary and see suggestions when they aren’t.
It is possible to use ASpell by including the aspell-15.dll in the same directory as libenchant_aspell.dll or it can be somewhere in the path. If you install aspell using the Windows installer, it will write a registry entry that points to where it was installed and Enchant will use that to find the dependency.
MySpell/Hunspell (OpenOffice format)
Enchant doesn’t require any additional dependencies other than the MySpell provider for MySpell dictionaries but it does require you to copy the dictionary files to the right place.
- Be sure you have the MySpell provider, libenchant_myspell.dll (you can list it with enchant-lsmod).
- Download a dictionary that you want: You can get any of the dictionaries from OpenOffice.org.
- Unzip (or otherwise uncompress the package) and copy the contents into %APPDATA%\enchant\myspell (you may need to create the enchant and myspell directories the first time). %APPDATA% is shorthand for the C:\Users\<username>\AppData\Roaming\ folder (Windows Vista) or the C:\Documents and Settings\<username>\Application Data\ folder (Windows XP/2000). But you can type %APPDATA% in the Explorer address bar and it will go to the right place.
- Verify that it has been installed correctly by running enchant-lsmod.exe -list-dicts. You should see something like: en_US (myspell) but with the language code for the language you installed instead of en_US.
- You can also test it using enchant -d en_US -a (again using the language code for the language you installed). Then you can type words which are or aren’t in the dictionary and see suggestions when they aren’t.
Note: if you install MySpell and ASpell dictionaries for the same language, the ASpell dictionaries will be used instead of the MySpell dictionaries (this can be changed but I’ll leave that for another post)
If you are feeling really adventurous and would like to create your own, you can see the directions here.
ISpell
Enchant’s Ispell provider also doesn’t have any dependencies (the dictionaries are read directly by Enchant).
- Be sure you have the ISpell provider, libenchant_ispell.dll (you can list it with enchant-lsmod).
- Download a dictionary from here (at the bottom of the page).
- Unzip (or otherwise uncompress the package) and copy the contents into %APPDATA%\enchant\ispell (you may need to create the enchant and ispell directories the first time).
- Verify that it has been installed correctly by running enchant-lsmod.exe -list-dicts. You should see something like: en_US (ispell) but with the language code for the language you installed instead of en_US.
- You can also test it using enchant -d en_US -a (again using the language code for the language you installed). Then you can type words which are or aren’t in the dictionary and see suggestions when they aren’t.
Empty dictionaries
An easy way to get spell checking for a language that doesn’t have a dictionary is to create an empty MySpell dictionary. First, decide on the language code to be used. (You should use the ISO 639 code or the IETF language tag; for our example we will use qaa, the first of the private use language codes.) There are two files that are required: the affix file, qaa.aff, and the dictionary file, qaa.dic. They should both be put in %APPDATA%\enchant\myspell.
The qaa.aff file should contain the following line:
SET UTF-8
The qaa.dic file should contain the following line (it’s a zero, the number of items in the dictionary):
0
Of course, you won’t have any items in your empty dictionary so all the words will be marked as misspelled. As you add items to the dictionary using Enchant, the words will be stored in %APPDATA%\enchant\qaa.dic.
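If you need to do this for more than one language, the two files are easy to script. Here is a minimal Python sketch that writes exactly the two one-line files described above; the function name is made up, and the demonstration writes into a temporary directory rather than your real %APPDATA% so that it is safe to run as is.

```python
import os
import tempfile


def create_empty_myspell_dictionary(directory, language_code):
    """Write <code>.aff and <code>.dic for an empty MySpell dictionary."""
    os.makedirs(directory, exist_ok=True)
    aff_path = os.path.join(directory, language_code + ".aff")
    dic_path = os.path.join(directory, language_code + ".dic")
    with open(aff_path, "w", encoding="utf-8", newline="\n") as aff:
        aff.write("SET UTF-8\n")  # the affix file only declares the encoding
    with open(dic_path, "w", encoding="utf-8", newline="\n") as dic:
        dic.write("0\n")          # zero: the number of words that follow
    return aff_path, dic_path


# Demonstration in a throwaway directory laid out like enchant\myspell.
demo_dir = os.path.join(tempfile.mkdtemp(), "enchant", "myspell")
aff_path, dic_path = create_empty_myspell_dictionary(demo_dir, "qaa")
print(aff_path)
print(dic_path)
```

To install for real on Windows, pass os.path.join(os.environ["APPDATA"], "enchant", "myspell") as the directory.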