Hunspell Wrapper

If you haven’t heard of it, Hunspell is the open source spell checking engine used in OpenOffice. Having used it successfully in personal projects for some years now, I’ve written (or rather, re-written) a simple wrapper class. In doing so, my aim was twofold: to properly support dictionaries with foreign code pages (most standard Hunspell dictionaries not being Unicode), and to work seamlessly across both Delphi 2006/7 and the Unicode Delphis. You can download the result here – the ZIP includes a demo together with a prebuilt Hunspell DLL, compiled in VC++ 2008 Express using v1.2.8 of the Hunspell sources.

For the sake of consistency with the other code I’ve put up, I’ve given my wrapper an MPL licence, though anyone interested can basically do what they want with it so long as they don’t pretend the original code was their own. Hunspell itself is LGPL though, and the licence on common dictionaries can vary, so you might want to check things out yourself before deciding to use Hunspell for any serious development.

15 thoughts on “Hunspell Wrapper”

Alex says: 4 February, 2010 at 10.06 am

Hi!

I’m stuck trying to build HunSpell under Windows/VS. Could you please help me with this? Could you send me the HunSpell sources with a Visual Studio project file?

Alex

CR says: 6 February, 2010 at 1.18 pm

Hi Alex,

Is there a reason you can’t just use the precompiled DLL I included in the ZIP? Anyhow, I’ve just put the basic steps into a blogpost here.

Pingback: Compiling a Hunspell DLL, step by step « Delphi Haven

Christian says: 14 March, 2011 at 4.51 pm

Hello,

I have been using the Hunspell Wrapper for a while now and it works great. But I found out that it sometimes accepts words as spelt correctly although they are not.

Example: “Feštsetzung” (with a caron over the “s”) is not a German word (whereas “Festsetzung” is), but if I check it against the German dictionary, IsSpeltCorrectly says that it is spelt correctly.

Within the Wrapper, the Unicode variant of the function IsSpeltCorrectly is executed. That function calls UnicodeStringToDLLString, and that function transforms the string “Feštsetzung” to “Festsetzung” via WideCharToMultiByte, which then of course is considered as being spelt correctly.

I understand that this has to be done because the DLL functions work with Ansi strings and code pages. FCodePage of the Hunspell component is set to ISOLatin1 (28591), as the German dictionary’s encoding is set to ISO8859-1.

I am not exactly sure what is going wrong here. Any ideas?

Best regards,

Christian

CR says: 14 March, 2011 at 9.45 pm

Hi Christian —

The issue was a combination of the fact that the hatted character can’t be represented in a Latin 1 codepage, and WideCharToMultiByte by default aims for ‘best fit’ (see http://blogs.msdn.com/b/michkap/archive/2005/02/13/371895.aspx). Having played around with it though, I’ve made amendments that should fix the problem – check out the revised CodeCentral entry.

With respect to your last point, the fact there is more than one variant of the ‘Latin 1’ (alias ‘Western European’) codepage isn’t actually an issue, since they only differ WRT control characters — forcing conversions between them for textual data just kills performance for no practical gain.

Anyhow, as I said, have a go with my revised version, and let me know whether it works as expected now.
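
As an illustration of the difference between ‘best fit’ and strict conversion, here is a minimal Python sketch (not the wrapper’s Delphi code). The helper names `to_dictionary_bytes` and `best_fit_like` are hypothetical, and the decomposition-based fallback only roughly imitates what a best-fit mapping does; the real Windows tables differ.

```python
import unicodedata
from typing import Optional


def to_dictionary_bytes(word: str, encoding: str) -> Optional[bytes]:
    """Strictly encode `word`; return None if any character has no exact mapping."""
    try:
        return word.encode(encoding, errors="strict")
    except UnicodeEncodeError:
        return None


def best_fit_like(word: str, encoding: str) -> bytes:
    """Rough imitation of a lossy 'best fit' conversion (assumption: not the
    actual Windows mapping tables). Characters that cannot be encoded are
    replaced by their base letter with any combining marks stripped."""
    out = []
    for ch in word:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out).encode(encoding, errors="replace")


# The lossy path silently turns the misspelt word into a valid one...
lossy = best_fit_like("Feštsetzung", "iso-8859-1")
# ...whereas the strict path reports that the word cannot be represented.
strict = to_dictionary_bytes("Feštsetzung", "iso-8859-1")
```

With the strict variant, a word that cannot be represented in the dictionary’s code page can simply be treated as misspelt, since no entry in that dictionary could contain the character.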

- Christian says: 15 March, 2011 at 12.00 pm
  
  Thanks very much, Chris, this fast response is very much appreciated.
  
  On a side note: I am developing that particular project with Delphi 6, so a bit outdated. I had to adjust some of your code, but both the old and your new version compile and run fine after my adjustments.
  
  I hate to bother you any further, but I ran into another problem: in my code, words are checked against different dictionaries, some of which are UTF-8 encoded. In your function UnicodeToDLLString you use WideCharToMultiByte, and in the new version of the code you set the parameters dwFlags and lpUsedDefaultChar.
  What I find is that the result of that function (stored in AnsiLen) is now always zero, and the AnsiBuffer always remains unchanged. The following SetString command then sets Result to an empty string.
  
  Within IsSpeltCorrectly, that empty result of UnicodeToDLLString is fed to Hunspell_spell which always returns true.
  
  Thus checking any word against a utf-8 encoded dictionary always results in true.
  
  Do you think that this could be resolved easily?
  
  Best regards,
  
  Christian
  
  - Dave Sellers says: 20 December, 2012 at 5.19 pm
    
    Hi Christian
    Is there any possibility of getting a copy of your D6 implementation please? I’m guessing this would be an easier place to start than Chris’s standard release.
    Regards
    Dave Sellers
    
    - Dave Sellers says: 20 December, 2012 at 5.55 pm
      
      Actually, ignore previous comment, I’ve just downloaded the ‘vanilla’ version and found it supports D7.
      Thanks
      Dave
- CR says: 15 March, 2011 at 2.21 pm
  
  My immediate thought is just to disable passing the extra parameters in the case of UTF-8 dictionaries. Could you email me an example dictionary to make it easier to test against? My email address is cc, my surname, @gmail.com.
  
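
On the UTF-8 case in the thread above: Microsoft’s documentation for WideCharToMultiByte states that with CP_UTF8, dwFlags must be 0 or WC_ERR_INVALID_CHARS, and lpDefaultChar and lpUsedDefaultChar must both be NULL; otherwise the call fails. That is consistent with the zero length reported. Whatever the cause, a failed conversion should probably never reach the spell checker as an empty string. A minimal Python sketch of such a guard follows, in which `spell` is a hypothetical stand-in for the engine’s check and `is_spelt_correctly` is only loosely modelled on the wrapper’s method, not taken from it:

```python
from typing import Callable


def is_spelt_correctly(word: str, encoding: str,
                       spell: Callable[[bytes], bool]) -> bool:
    """Check `word` against a dictionary stored in `encoding`.

    `spell` is a hypothetical callback standing in for the spell engine.
    A word that cannot be encoded is reported as misspelt instead of being
    passed on as an empty string.
    """
    try:
        encoded = word.encode(encoding, errors="strict")
    except UnicodeEncodeError:
        return False  # no entry in this dictionary can contain the character
    return spell(encoded)


# Fake engine for demonstration only; like the behaviour described in the
# thread, it accepts the empty string.
KNOWN = {b"Festsetzung", "Größe".encode("utf-8")}


def fake_spell(data: bytes) -> bool:
    return data == b"" or data in KNOWN


latin1_bad = is_spelt_correctly("Feštsetzung", "iso-8859-1", fake_spell)
utf8_good = is_spelt_correctly("Größe", "utf-8", fake_spell)
utf8_bad = is_spelt_correctly("Feštsetzung", "utf-8", fake_spell)
```

For a UTF-8 dictionary every word can be encoded, so the strictness check costs nothing there; only the legacy code pages ever take the early `False` path.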

Gianfranco De Villa says: 9 September, 2011 at 10.25 am

Hi Christian,
I’ve successfully tried the Hunspell Wrapper, and it works well.

So I have a request: is it possible to find out the original word in the dictionary?

For example: the dictionary word “wife” is also loaded as “wives”. If I type “wives”, is there a method to retrieve the original “wife”? Any ideas?

Best regards,

Gianfranco.

Chris Rolliston says: 9 September, 2011 at 8.09 pm

Christian…? Anyhow, if I understand you correctly, you want to just call the Stem method. E.g., if I run the demo I included, select the Test tab, type wives then click the Stem button, wife is returned.

AzzaAzza69 says: 28 August, 2013 at 4.28 pm

Nice clean code but which unit should I be using…the CCR.Hunspell or the HunSpell?

Chris Rolliston says: 1 September, 2013 at 11.46 pm

CCR.Hunspell.pas if Delphi 7 or greater – the other one was for Delphi 6.

saroteck says: 28 December, 2013 at 5.15 am

Is it allowed to sell an “APP” on the Ubuntu Store which uses Hunspell?

Best regards

Chris Rolliston says: 28 December, 2013 at 7.10 pm

Hunspell itself is tri-licenced LGPL, GPL and MPL; the licencing issues you may have come with the dictionaries, not the core library.
