Hunspell Wrapper

If you haven’t heard of it, Hunspell is the open source spell checking engine used in OpenOffice. Having used it successfully in personal projects for some years now, I’ve written (or rather, rewritten) a simple wrapper class. In doing so, my aim was twofold: to properly support dictionaries with foreign code pages (most standard Hunspell dictionaries not being Unicode), and to work seamlessly across both Delphi 2006/7 and the Unicode versions of Delphi (2009 and later). You can download the result here – the ZIP includes a demo together with a prebuilt Hunspell DLL, compiled in VC++ 2008 Express using v1.2.8 of the Hunspell sources.

For the sake of consistency with the other code I’ve put up, I’ve given my wrapper an MPL licence, though anyone interested can basically do what they want with it so long as they don’t pretend the original code was their own. Hunspell itself is available under the LGPL though (it is tri-licensed under the GPL, LGPL and MPL), and the licence on common dictionaries can vary, so you might want to check things out yourself before deciding to use Hunspell for any serious development.

15 thoughts on “Hunspell Wrapper”

  1. Hi!

    I’m stuck trying to build HunSpell under Windows/VS. Could you please help me with this? Could you send me the HunSpell sources with a Visual Studio project file?

    Alex

    • Hi Alex,

      Is there a reason you can’t just use the precompiled DLL I included in the ZIP? Anyhow, I’ve just put the basic steps into a blogpost here.

  2. Pingback: Compiling a Hunspell DLL, step by step « Delphi Haven

  3. Hello,

    I have been using the Hunspell Wrapper for a while now and it works great. But I found out that it sometimes accepts words as spelt correctly although they are not.

    Example: “Feštsetzung” (with a caron over the “s”) is not a German word (whereas “Festsetzung” is), but if I check it against the German dictionary, IsSpeltCorrectly says that it is spelt correctly.

    Within the Wrapper, the Unicode variant of the function IsSpeltCorrectly is executed. That function calls UnicodeStringToDLLString, and that function transforms the string “Feštsetzung” to “Festsetzung” via WideCharToMultiByte, which is then of course considered to be spelt correctly.

    I understand that this has to be done because the DLL functions work with Ansi strings and code pages. FCodePage of the Hunspell component is set to ISOLatin1 (28591), as the German dictionary’s encoding is set to ISO8859-1.

    I am not exactly sure what is going wrong here. Any ideas?

    Best regards,

    Christian

    • Hi Christian —

      The issue was a combination of the fact that the hatted character can’t be represented in a Latin 1 codepage, and WideCharToMultiByte by default aims for ‘best fit’ (see http://blogs.msdn.com/b/michkap/archive/2005/02/13/371895.aspx). Having played around with it though, I’ve made amendments that should fix the problem – check out the revised CodeCentral entry.

      With respect to your last point, the fact that there is more than one variant of the ‘Latin 1’ (alias ‘Western European’) codepage isn’t actually an issue, since they only differ with respect to control characters; forcing conversions between them for textual data just kills performance for no practical gain.

      Anyhow, as I said, have a go with my revised version, and let me know whether it works as expected now.
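
The difference between a ‘best fit’ and a strict conversion can be sketched outside Delphi. The Python below is only an illustration of the idea, not the wrapper’s code: `best_fit_encode` and `strict_encode` are hypothetical names, and stripping diacritics via Unicode decomposition is a stand-in for what WideCharToMultiByte does by default, not a reimplementation of its mapping tables.

```python
import unicodedata


def best_fit_encode(word, encoding="iso-8859-1"):
    """Lossy stand-in for 'best fit': characters the code page lacks are
    decomposed, and whatever still cannot be encoded is silently dropped."""
    out = bytearray()
    for ch in word:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # e.g. 's with caron' decomposes to 's' + combining caron;
            # the combining mark is then discarded.
            decomposed = unicodedata.normalize("NFKD", ch)
            out += decomposed.encode(encoding, errors="ignore")
    return bytes(out)


def strict_encode(word, encoding="iso-8859-1"):
    """Return the encoded bytes, or None if any character is not
    representable in the dictionary's code page."""
    try:
        return word.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    word = "Fe\u0161tsetzung"
    print(best_fit_encode(word))  # the caron is lost, so the word looks valid
    print(strict_encode(word))    # None: treat the word as misspelt
```

With the strict variant, a word that cannot be represented in the dictionary’s code page can be rejected before it ever reaches the spell checker.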

      • Thanks very much, Chris, this fast response is very much appreciated.

        On a side note: I am developing that particular project with Delphi 6, so a bit outdated. I had to adjust some of your code, but both the old and your new version compile and run fine after my adjustments.

        I hate to bother you any further, but I ran into another problem: in my code, words are checked against different dictionaries. Some of them are UTF-8 encoded. In your function UnicodeToDLLString you use WideCharToMultiByte. In the new version of the code you set the parameters dwFlags and lpUsedDefaultChar.
        In my experience, the result of that function (stored in AnsiLen) is now always zero, and AnsiBuffer always remains unchanged. The subsequent SetString call then sets Result to an empty string.

        Within IsSpeltCorrectly, that empty result of UnicodeToDLLString is fed to Hunspell_spell, which always returns true.

        Thus checking any word against a UTF-8 encoded dictionary always results in true.

        Do you think that this could be resolved easily?

        Best regards,

        Christian

        • Hi Christian
          Is there any possibility of getting a copy of your D6 implementation please? I’m guessing this would be an easier place to start than Chris’s standard release.
          Regards
          Dave Sellers

          • Actually, ignore previous comment, I’ve just downloaded the ‘vanilla’ version and found it supports D7.
            Thanks
            Dave

      • My immediate thought is just to disable passing the extra parameters in the case of UTF-8 dictionaries. Could you email me an example dictionary to make it easier to test against? My email address is cc, my surname, @gmail.com.
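
For context, WideCharToMultiByte rejects a non-zero dwFlags (other than WC_ERR_INVALID_CHARS) and a non-NULL lpUsedDefaultChar when the code page is CP_UTF8 or CP_UTF7, which is consistent with the zero return value described above. A rough sketch of the suggested fix follows, written in Python purely to show the decision logic. The function names are hypothetical, and the use of WC_NO_BEST_FIT_CHARS for non-Unicode code pages is an assumption about the revised wrapper, not something confirmed in this thread.

```python
# Win32 constants, reproduced here for illustration.
CP_UTF7 = 65000
CP_UTF8 = 65001
WC_NO_BEST_FIT_CHARS = 0x0400


def conversion_args(code_page):
    """Return (dwFlags, ask_for_used_default_char) for a code page.

    The UTF code pages can represent every character, so the extra
    parameters are both unnecessary and not accepted for them.
    """
    if code_page in (CP_UTF7, CP_UTF8):
        return 0, False
    # Assumption: strict conversion is requested for legacy code pages.
    return WC_NO_BEST_FIT_CHARS, True


def is_spelt_correctly(word, encode, spell):
    """Guard against a failed conversion instead of spell checking ''.

    `encode` and `spell` are hypothetical callbacks standing in for the
    wrapper's string conversion and the Hunspell_spell call.
    """
    encoded = encode(word)
    if not encoded:
        return False  # conversion failed: never report the word as correct
    return spell(encoded)
```

The second function addresses the symptom independently of the first: even if a conversion fails for some other reason, an empty string is never passed on to the spell checker.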

  4. Hi Christian,
    I’ve successfully tried the Hunspell Wrapper, and it works well.

    I have a question: is it possible to find the original word in the dictionary?

    For example, the dictionary entry “wife” is also loaded as “wives”. If I write “wives”, is there a method to retrieve the original “wife”? Any ideas?

    Best regards,

    Gianfranco.

    • Christian…? Anyhow, if I understand you correctly, you want to just call the Stem method. E.g., if I run the demo I included, select the Test tab, type wives then click the Stem button, wife is returned.

    • Hunspell itself is tri-licensed LGPL, GPL and MPL; the licensing issues you may have come with the dictionaries, not the core library.
