Using Hunspell — a code page-aware wrapper

Yes, it’s been done before – indeed, I’ve used Brian Moelke’s simple Hunspell wrapper from a few years back myself – but a few posts on the Embarcadero forums in the past couple of months have prompted me to write up my own.

Basically, if you haven’t heard of it, Hunspell is the open source spell checking engine used in OpenOffice, and very good it is too, at least for English – I’ve found it much better than Ispell, for example, in terms of both speed and the quality of its suggestions.

The Hunspell source itself can be downloaded from SourceForge here – you’ll need a C++ compiler to build a DLL from it (VC++ Express is fine for this purpose, and so might C++Builder – I don’t know). Calling a resulting DLL is then fairly straightforward, though one slightly tricky thing – and where my own code has its main reason for being – is in using dictionaries with foreign code pages, such as a Greek dictionary on an English system. The difficulty here is that while Hunspell itself supports UTF-8 encoded dictionaries, most actually-existing ones have an ANSI encoding – and the strings you pass to the Hunspell engine must have the encoding of the dictionary being used, the engine itself doing no conversions. In light of that, my wrapper transparently does any needed conversions for you, with the key methods having Ansi and Unicode overloads when compiling in Delphi 2006 or 2007. Moreover, I’ve also tried to write the source in a D2009+ friendly manner too.
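
As a rough illustration of the conversion problem (this is not the wrapper's actual code, just a minimal sketch assuming Delphi 2009 or later): a dictionary's AFF file normally declares its encoding on a SET line, so one approach is to read that line, map the name to a Windows code page, and encode strings with TEncoding before handing them to the engine. The names ReadAffEncodingName, AffEncodingToCodePage and EncodeForDictionary are made up for the example, and the name-to-code-page table is deliberately incomplete.

```pascal
program DictEncodingSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Returns the encoding name declared on the AFF file's SET line
// (e.g. 'ISO8859-7'), or '' if no SET line is present.
function ReadAffEncodingName(AffLines: TStrings): string;
var
  I: Integer;
  Line: string;
begin
  Result := '';
  for I := 0 to AffLines.Count - 1 do
  begin
    Line := Trim(AffLines[I]);
    if SameText(Copy(Line, 1, 4), 'SET ') then
    begin
      Result := Trim(Copy(Line, 5, MaxInt));
      Exit;
    end;
  end;
end;

// Maps a few common AFF encoding names to Windows code page numbers.
// Hypothetical helper: deliberately incomplete, returns 0 if unknown.
function AffEncodingToCodePage(const EncName: string): Integer;
begin
  if SameText(EncName, 'UTF-8') then Result := 65001
  else if SameText(EncName, 'ISO8859-1') then Result := 28591
  else if SameText(EncName, 'ISO8859-2') then Result := 28592
  else if SameText(EncName, 'ISO8859-7') then Result := 28597
  else if SameText(EncName, 'ISO8859-15') then Result := 28605
  else if SameText(EncName, 'KOI8-R') then Result := 20866
  else if SameText(EncName, 'microsoft-cp1251') then Result := 1251
  else Result := 0;
end;

// Converts a Unicode string to raw bytes in the given code page
// (no terminating null is appended).
function EncodeForDictionary(const S: UnicodeString;
  CodePage: Integer): TBytes;
var
  Encoding: TEncoding;
begin
  Encoding := TEncoding.GetEncoding(CodePage);
  try
    Result := Encoding.GetBytes(S);
  finally
    Encoding.Free; // GetEncoding hands back an instance we own
  end;
end;

var
  Aff: TStringList;
  EncName: string;
  CodePage: Integer;
  Bytes: TBytes;
begin
  Aff := TStringList.Create;
  try
    // Stand-in for the top of a Greek AFF file
    Aff.Add('# hypothetical sample affix file');
    Aff.Add('SET ISO8859-7');
    EncName := ReadAffEncodingName(Aff);
    CodePage := AffEncodingToCodePage(EncName);
    WriteLn(EncName, ' -> ', CodePage);
    // Greek alpha and beta, written as character codes so that the
    // source file's own encoding does not matter
    Bytes := EncodeForDictionary(#$03B1#$03B2, CodePage);
    WriteLn(Length(Bytes), ' byte(s) to pass to the engine');
  finally
    Aff.Free;
  end;
end.
```

Results coming back from the engine (suggestions, say) would need the reverse trip, which TEncoding.GetString on the same encoding object should cover.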

It may well turn out that no one but me finds it useful, but anyhow, it’s available here if you’re interested. The ZIP includes a demo app (as one might expect), together with a prebuilt Hunspell DLL compiled from the current-at-my-time-of-typing version of the Hunspell source, namely v1.2.8.


7 thoughts on “Using Hunspell — a code page-aware wrapper”

  1. Thanks for this. I don’t know how configurable this is, but I will give it a go and try to implement spell checking for a non-English language.

    • Check out the demo and the readme – you just need to download the appropriate AFF and DIC pair and pass the filename to THunspell.LoadDictionary. One thing I will say is that performance can be slow when using non-‘Western’ (Latin-1) dictionaries – for example, the Greek dictionary you can download from the OpenOffice Hunspell site is over ten times the size of the American English dictionary, and on my slow old laptop, quite probably ten times as slow, if not more so. I’ve also found that big dictionaries can cause a noticeable delay when closing an app when you have run it through the debugger — things are fine when you run without the debugger though.

  2. Thank you for your wrapper. I have used it in an app and have a suggestion and a question. I added a mousedown event in the memo so that the caret is moved whether the left or right mouse button is pressed. This eliminates the need to first left click and then right click on a misspelled word.

    I added another menu item, ‘Add To Dictionary’, at the end of the popup menu of suggested words. I then check whether that was the menu item selected, and if so I call getWordAtCaret and pass the result to the add function. This works as long as I don’t close the app. Is there a way to save the dictionary with the added words? I couldn’t find a hunspell.saveDictionary or other such method.

    I have to admit that I haven’t looked at all of the code, but, after adding the new word to the dictionary, what is the best way to re-spellcheck the memo so that the red squiggle disappears?

    TIA

    RT

    • Ralph —

      The Hunspell API that my wrapper uses doesn’t expose a method like ‘saveDictionary’. The simplest workaround would be to just maintain a TStringList of custom words, which you can then load when the Hunspell engine is initialised, and save out again when your application closes (and possibly at some other stage too, e.g. when your app loses focus) —

      type
        TMyForm = class(TForm)
        ...
        strict private
          FCustomDictionary: TStringList;
          FCustomDictionaryChanged: Boolean;
          FCustomDictionaryPath: string;
          FHunspell: THunspell;
          procedure FlushCustomDictionary;
        end;
      
      ...
      
      procedure TMyForm.FormCreate(Sender: TObject);
      var
        S: string;
      begin
        FHunspell := THunspell.Create;
        FHunspell.LoadDictionary('en_GB');
        FCustomDictionary := TStringList.Create;
        FCustomDictionary.Duplicates := dupIgnore;
        FCustomDictionary.Sorted := True;
        FCustomDictionaryPath := SettingsPath + 'Custom.dic';
        if FileExists(FCustomDictionaryPath) then
          FCustomDictionary.LoadFromFile(FCustomDictionaryPath);
        for S in FCustomDictionary do
          FHunspell.AddCustomWord(S);
      end;
      
      procedure TMyForm.FormDestroy(Sender: TObject);
      begin
        FlushCustomDictionary;
        FCustomDictionary.Free;
        FHunspell.Free;
      end;
      
      procedure TMyForm.FlushCustomDictionary;
      begin
        if not FCustomDictionaryChanged then Exit;
        FCustomDictionary.SaveToFile(FCustomDictionaryPath);
        //if D2009+, best save to a Unicode format instead:
        //FCustomDictionary.SaveToFile(FCustomDictionaryPath,
        //    TEncoding.Unicode);
        //only clear the flag once the save has succeeded
        FCustomDictionaryChanged := False;
      end;
      
      procedure TMyForm.itmAddToDictionaryExecute(Sender: TObject);
      var
        S: string;
      begin
        S := Trim(Memo.SelText);
        if (S = '') or (FCustomDictionary.IndexOf(S) >= 0) then Exit;
        FHunspell.AddCustomWord(S);
        FCustomDictionary.Add(S);
        FCustomDictionaryChanged := True;
      end;
      

      Hope that helps.

    • “what is the best way to re-spellcheck the memo so that the red squiggle disappears”

      That would be a question for the author(s) of the memo control you’re using (no standard one comes with red squiggles).

  3. Did a little digging and found that the ‘.dic’ file is just a sorted list of words, with the first entry being the total number of entries. I like your approach and will see whether I should use that or just read, add, sort and write the dictionary file on close. Seems like six of one and half a dozen of the other.

    The benefit of your approach is that you could keep your ‘custom’ words for each application and not mess up the global list.

    Regards,

    RT
