Using SOLID to convert to one-ps-per-sense format

Author: cambell

This week another colleague here in Papua New Guinea decided to move his Toolbox dictionary to FLEx. He had a fair number of entries like this:

\lx ba
\ps n
\sn 1
\ge brother-in-law
\de brother-in-law: reciprocal term between wife's father's brother's children and father's brother's daughter's husband
\sn 2
\ge brother-in-law
\de brother-in-law: reciprocal term between sister's husband and wife's siblings
\sn 3
\ge in-law
\de reciprocal term between father's sister's husband and wife's brother's child
\dt 18/Sep/2000

These are all nouns, and he identifies this just once, at the top of the entry. That counts as good MDF. The problem is that FLEx import doesn’t handle this situation. In fact, a recent import I did for another colleague left us with over 60 cases where the \ps was turned into its own sense, followed by all the actual senses, which were left with no grammatical category. Neither of us caught this until she had already been working in WeSay on the new data for too long to go back and repeat the import. Yuck.

To prevent this, we need to move that \ps down to just below the first \sn, and then copy it to every remaining sense that lacks a \ps. As of SOLID version 0.9.319, we can now do this:

[Screenshot: the SOLID quickfix]

As with all these quickfixes, I use TortoiseHg (Mercurial) so that I can look at exactly what changed and verify that nothing was messed up. Here’s what the quickfix did to the above record, as seen in TortoiseHg’s Commit tool:

[Screenshot: TortoiseHg Commit tool showing the changes to the record, 2009-06-27]
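For anyone curious about what this transformation does to a record, here is a minimal Python sketch of the same idea. It is not SOLID's code. It assumes one field per line (no wrapped continuation lines), a single record as input, and that only a \ps sitting before the first \sn should be pushed down. The helper names `split_fields` and `push_ps_down` are hypothetical.

```python
# Hypothetical sketch (not SOLID's implementation): push an entry-level \ps
# down to just below each \sn, i.e. the MDF Alternate ordering.
# Assumes one field per line, with no wrapped continuation lines.

def split_fields(record_text):
    """Parse one Toolbox/MDF record into (marker, value) pairs."""
    fields = []
    for line in record_text.strip().splitlines():
        marker, _, value = line.partition(" ")
        fields.append((marker, value))
    return fields


def push_ps_down(fields):
    """Move the \\ps found before the first \\sn to just below each \\sn
    that does not already contain its own \\ps."""
    sn_positions = [i for i, (m, _) in enumerate(fields) if m == "\\sn"]
    if not sn_positions:
        return list(fields)  # no senses: nothing to push down
    head = fields[:sn_positions[0]]
    ps_values = [v for m, v in head if m == "\\ps"]
    if not ps_values:
        return list(fields)  # no entry-level \ps: leave the record alone
    ps = ps_values[0]
    result = [f for f in head if f[0] != "\\ps"]
    # Each sense block runs from one \sn up to (not including) the next;
    # trailing fields such as \dt stay at the end of the last block.
    bounds = sn_positions + [len(fields)]
    for start, end in zip(bounds, bounds[1:]):
        block = fields[start:end]
        result.append(block[0])  # the \sn field itself
        if not any(m == "\\ps" for m, _ in block):
            result.append(("\\ps", ps))
        result.extend(block[1:])
    return result


record = r"""
\lx ba
\ps n
\sn 1
\ge brother-in-law
\sn 2
\ge brother-in-law
\sn 3
\ge in-law
\dt 18/Sep/2000
"""

for marker, value in push_ps_down(split_fields(record)):
    print(marker, value)
```

Senses that already carry their own \ps are left untouched, and the order of all other fields is preserved, so a Mercurial diff shows only the removed and copied \ps lines.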

Reader's Comments

  1. languist

    Hi John,

    This latest post of yours (from the 27th) introduces
    Thanks for another very useful Quick Fix, though I won’t quite be able to use it until it supports the standard hierarchy. Currently it assumes that the user is prepared to convert from standard to alternate, which seems unlikely to me.

    I imagine the FLEx importer would be happy with either hierarchy as long as the \ps and \sn fields are paired up nicely together, though I’ve not tested this thoroughly.

    What if, instead, this Quick Fix first checked whether “MDF Unicode” or “MDF Alternate Unicode” was selected when the file was opened, and then:
    – If standard, copy \ps down to just above each \sn.
    – If alternate, copy \ps down to just below each \sn.
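    The proposal above might look something like the following Python sketch. It is hypothetical and not anything SOLID provides: `distribute_ps` and its `hierarchy` parameter are invented for illustration, with "standard" and "alternate" standing in for the “MDF Unicode” and “MDF Alternate Unicode” settings. It assumes the entry is already split into (marker, value) pairs and has exactly one \ps, placed before the first \sn.

```python
# Hypothetical sketch of the proposal: copy a single entry-level \ps to just
# above each \sn (standard MDF hierarchy) or just below it (MDF Alternate).
# Entries with no \sn, or with anything other than exactly one \ps placed
# before the first \sn, are returned unchanged.

def distribute_ps(fields, hierarchy):
    if hierarchy not in ("standard", "alternate"):
        raise ValueError("hierarchy must be 'standard' or 'alternate'")
    ps_indices = [i for i, (m, _) in enumerate(fields) if m == "\\ps"]
    sn_indices = [i for i, (m, _) in enumerate(fields) if m == "\\sn"]
    if len(ps_indices) != 1 or not sn_indices or ps_indices[0] > sn_indices[0]:
        return list(fields)
    ps_field = fields[ps_indices[0]]
    result = []
    for i, field in enumerate(fields):
        if i == ps_indices[0]:
            continue  # drop the original entry-level \ps
        if field[0] != "\\sn":
            result.append(field)
        elif hierarchy == "standard":
            result.extend([ps_field, field])  # \ps just above \sn
        else:
            result.extend([field, ps_field])  # \ps just below \sn
    return result


entry = [("\\lx", "ba"), ("\\ps", "n"), ("\\sn", "1"), ("\\ge", "brother-in-law"),
         ("\\sn", "2"), ("\\ge", "in-law")]
for hierarchy in ("standard", "alternate"):
    print(hierarchy + ":", " ".join(m for m, _ in distribute_ps(entry, hierarchy)))
# prints:
# standard: \lx \ps \sn \ge \ps \sn \ge
# alternate: \lx \sn \ps \ge \sn \ps \ge
```

    In the standard case the first copy lands exactly where the original \ps was, so only the second and later senses change.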

    The current wording, “Push \ps down to subsequent \sn’s”, is ambiguous and therefore perfect. 🙂

    Personally, I think that the alternative hierarchy is more intuitive and better matches FLEx. But from a theoretical standpoint, some linguists would disallow having a single sense with multiple parts of speech. They’d say that multiple POS’s must be multiple senses. Of course, FLEx enforces this one-to-one relationship, but MDF Alternate cannot.

    Anyways, kudos again for posting this stuff publicly to make it easier to dialog about it in a way others can benefit from.

    -Languist