In my previous post I wondered out loud what was the best way to index my book (this will be for the print edition by the way, which I’m definitely going to do now). The consensus seemed to be ‘mark the index entries manually’, and once I actually started the process, it hasn’t been too hard at all – the UI for marking index entries in Word is a bit crap, being both fiddly and incomplete (it doesn’t expose the ability to mark entries for a specific index), but then I’m not using Word to actually write the text anyhow, so that didn’t matter.

Nonetheless, I do have an indexing query I’d like to get some opinions about: in indexing a chapter on multithreading, should terms like ‘atomicity’ be indexed independently under ‘atomicity’, as second-level entries under ‘multithreading’ (‘multithreading – atomicity’), or both?

Superficially, the issue is similar to indexing a topic like trapping exceptions in secondary threads. In that particular case, I’m marking two entries, ‘exceptions – multithreading’ and ‘multithreading – exceptions’. However, that is because it is clearly a second-level topic which is equally so under both ‘multithreading’ and ‘exceptions’. ‘Atomicity’, in contrast, is either a second-level topic under ‘multithreading’ or its own top-level item (or so I think). Any thoughts…?


9 thoughts on “Indexing

  1. Generally speaking, an index should contain as many entries as you can possibly imagine people wanting to look up. If you are in doubt, you can be pretty sure that your reader will be too, so (in general) always include it both ways.

    • Thanks. The main counter-thoughts are, I think –
      1. Less deliberate duplication means less unintentional lack of duplication, and therefore, potentially greater consistency; and consistency is a virtue.
      2. As it is, the index is getting quite large due to the T prefix convention (e.g., ‘TStream’ and ‘streams’ both point to the same pages). (OTOH, size probably shouldn’t be seen as a problem, especially as I’m intending to keep the table of contents relatively succinct – I dislike fat TOCs as a reader.)
      3. The indices in O’Reilly books don’t appear to do it (or at least, not much), and O’Reilly books are my ‘gold standard’ – my regulative ideal, to use a Kantian analogy.

  2. Word UI and performance issues when you reach some hundredths of pages is pretty disappointing.
    That’s why we developed and use our SynProject open source tool, which is able to create the most complex documentation from a text file in a wiki-like syntax, with a dedicated editor.
    It handles source code highlighting (pascal/c/c#/txt/xml/dfm…), tables in a simple wiki syntax, references to units, classes or methods, picture indexes (including an easy GraphViz tool), and keyword indexes in the middle of the text (like “This is a @*keyword@”).
    For huge content, it makes large-scale document refactoring easy and offers direct navigation.
    It uses Word only for the rendering step, just before publication.
    See the mORMot documentation.
    We use SynProject with a project orientation (with specs/risks/design/tests… documents), but you can use it to create “normal” one-way documents, like a book.
    It is an Open Source project (written in Delphi 6/7) so you can even contribute to it!

    I do not like multi-level indexes. Finding a particular term in one sounds hard to me: if even the writer (i.e. you) has doubts about where a term should be placed, think how much worse it will be for the poor reader!
    So my advice is just to index simple words as they appear in the body text, with a highlighted “main” reference for each keyword (in SynProject, you can write “@**keyword@” to indicate that this paragraph/page defines it more fully, and its page number will be shown in bold and underlined in the keyword index).

    • “Word UI and performance issues when you reach some hundredths of pages is pretty disappointing.”

      I recently found a 32 bit version of WinWord 6 on an MS FTP site (the 6 meant ‘of Word’, not ‘of Word for Windows’), and you know what? It handles my big RTF file with ease. Unfortunately, its age means it doesn’t support Unicode, linked PNG images or VBA, all of which I use heavily (the fact it can’t show the images is, admittedly, part of the reason it’s so quick I guess!).

      I’ve actually written the book in my own outliner application, based on a rich edit 3 wrapper – in the text, things like index entries are entered using custom markup, which is then transformed into proper RTF codes later on. While the rich edit control (or at least, the riched20.dll version) can’t handle big documents display-wise, it has no problem if you need to process one non-visually, e.g. with the control parented to a hidden form.

      “It uses Word only for the rendering step, just before publication.”

      That’s pretty much how I’m using Word.
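The ‘custom markup transformed into RTF codes’ step described in this reply can be illustrated with a minimal sketch. The author's actual markup is not shown anywhere in the thread, so the `{{idx:main>sub}}` syntax and the function names below are made up for illustration. The output assumes RTF's `\xe` index-entry destination with `\:` as the sub-entry separator; the hidden-text `\v` wrapper mirrors what Word tends to write, but the exact form is worth checking against a file Word has saved.

```python
import re

# Hypothetical inline markup: {{idx:main}} or {{idx:main>sub}}.
# This syntax is invented for the sketch; it is not the author's real markup.
INDEX_MARK = re.compile(r"\{\{idx:([^}>]+)(?:>([^}]+))?\}\}")


def rtf_escape(text):
    """Escape RTF control characters; write non-ASCII as \\uN? sequences."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit;
            # the trailing '?' is the fallback for non-Unicode readers.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % code)
    return "".join(out)


def to_rtf_index(match):
    """Turn one marker into an RTF index entry (assumed \\xe form)."""
    main, sub = match.group(1), match.group(2)
    entry = rtf_escape(main.strip())
    if sub:
        entry += "\\:" + rtf_escape(sub.strip())  # \: marks a sub-entry
    return "{\\xe {\\v " + entry + "}}"


def convert(text):
    """Replace every marker; the rest of the text is left untouched."""
    return INDEX_MARK.sub(to_rtf_index, text)


print(convert("Atomic writes{{idx:multithreading>atomicity}} matter."))
```

Only the marker text is escaped here; the surrounding text is assumed to be valid RTF already.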

  3. They should be indexed both ways because you have to provide for the most probable ways (plural) people will be searching.

    BTW This WordPress login to reply to your posts does not really work, we have to jump through too many hoops (especially if something is broken with the hoops). I had already prepared a very extensive answer to your first ‘indexing’ post a couple of weeks ago, but gave up because of login problems.

    • Thanks – another vote for both then!

      “This WordPress login to reply to your posts does not really work”

      You shouldn’t have to log on (that’s for me). Can you try it again without doing so?

  4. I think the answer to this is to use a simple text format with an easy-to-parse markup. For final publishing I strongly favour LaTeX (with BibTeX if you need to include external references, the `glossaries` package if you need glossaries, and `makeidx` for your indexes).

    • Genuine question – what advantages do you see LaTeX as holding over a more mainstream word processing format like RTF, in league with Word’s programmability? I don’t really see any myself – RTF supports (for example) styles and index markers, has a well-established specification, and has excellent support on both Windows and OS X. More generally, while I neither use nor desire a completely WYSIWYG approach, I find seeing basic formatting as I write helpful (think of Word’s ‘draft’ view with non-printing characters and field codes showing, and the ‘document map’ visible).

      • I worked with TeX and LaTeX for several projects.
        What makes the difference from RTF/Word is definitely the rendered layout.
        If you want to have a professional and fully automated page composition, LaTeX is the reference. It knows about all publishing expectations (like “grey” of text, orphans, and so on).
        If you want a perfect automated rendering, use LaTeX. It is easy to use with some GUI tools available.
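For anyone taking the `makeidx` route suggested in comment 4, the ‘index it both ways’ advice can be sketched as a small generator. In makeindex syntax `!` separates entry levels, and the characters `!`, `@`, `|` and `"` are quoted with a leading `"`. The helper names `latex_index` and `double_post` are hypothetical, and this is only an illustration of the idea rather than anyone's actual workflow.

```python
def latex_index(*levels):
    """Build a makeidx \\index command; '!' separates entry levels."""
    def quote(term):
        # makeindex treats ! @ | and " as special; quote each with a leading ".
        return "".join('"' + c if c in '!@|"' else c for c in term)
    return "\\index{" + "!".join(quote(t) for t in levels) + "}"


def double_post(a, b):
    """Emit a two-level entry in both orders, e.g. a-b and b-a."""
    return latex_index(a, b) + latex_index(b, a)


# A top-level entry plus a second-level one for the same term.
print(latex_index("atomicity") + latex_index("multithreading", "atomicity"))
# The same topic filed under both headings.
print(double_post("exceptions", "multithreading"))
```

Generating both orders from a single call is one way to avoid the ‘unintentional lack of duplication’ raised in the reply to comment 1.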
