Naïvely Latin-1

Browsing DelphiFeeds.com just now, I saw a post at the top that makes an error (or at least veers towards one) that I find weirdly irritating: conflating ASCII with the Windows Latin-1 code page. Let’s get some things clear:

  • Latin-1 covers more than just modern English.
  • Latin-1 is itself a full 8-bit code page, in the sense that it has code points with values greater than 127.
  • Many of these code points are needed even for basic English. (That they typically turn up in French loan words such as ‘café’ and ‘naïve’ is neither here nor there.)
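To make the second and third points concrete, here is a minimal sketch in Python rather than Delphi, purely so that it runs anywhere. It encodes a few everyday English words with Python’s cp1252 codec (its name for Windows-1252) and lists every byte above 127. The word list and the helper names high_bytes and fits_in_ascii are illustrative assumptions, not anything from Delphi’s RTL.

```python
# Illustrative sketch: which bytes does Windows-1252 ("Windows Latin-1")
# need for ordinary English words borrowed from French?

def high_bytes(word, codepage="cp1252"):
    """Return the byte values above 127 produced by encoding `word`."""
    return [b for b in word.encode(codepage) if b > 127]

def fits_in_ascii(word):
    """True if `word` can be represented in 7-bit ASCII at all."""
    try:
        word.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for w in ["cafe", "café", "naïve", "façade"]:
    # Print the word, whether ASCII can hold it, and its high bytes in hex.
    print(w, fits_in_ascii(w), [hex(b) for b in high_bytes(w)])
```

Here é, ï and ç come out as 0xE9, 0xEF and 0xE7, all above 127, and none of the accented words fits in ASCII. ISO-8859-1 assigns these three letters the same values as Windows-1252, so the point holds for either flavour of ‘Latin-1’.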

Consequently, for ‘human-readable’ case conversions, Delphi programmers should always have used AnsiUpperCase/AnsiLowerCase rather than plain UpperCase/LowerCase, even when only English was (legitimately) assumed. Just try calling UpperCase('café'): since UpperCase only converts the letters a to z, you get 'CAFé' back.
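For anyone without a Delphi 2007 compiler to hand, the following Python sketch imitates the difference. The function ascii_only_upper is a hypothetical stand-in for SysUtils.UpperCase, which only maps a to z; Python’s built-in str.upper stands in for AnsiUpperCase, with the caveat that it applies Unicode’s default case mapping rather than the current Windows locale.

```python
def ascii_only_upper(s):
    """Hypothetical stand-in for Delphi's UpperCase: only 'a'..'z' change."""
    # 'a'..'z' sit exactly 32 code points above 'A'..'Z'; everything else,
    # including accented letters such as 'é', is passed through untouched.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)

word = "café"
print(ascii_only_upper(word))  # CAFé  (the é is left alone)
print(word.upper())            # CAFÉ  (full case mapping)
```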


12 thoughts on “Naïvely Latin-1”

  1. You can easily find out who wrongly used UpperCase instead of AnsiUpperCase by searching the web for people who complain that their beloved UpperCase doesn’t handle Unicode code points.

  2. CR: I wrote that blog post, and I really wonder about your post. Who is confusing Latin-1 and ASCII? I don’t mention either of them, and never meant to refer to either.

    • Hi Lars. You wrote:

      The default 8-bit character set in Windows is not Windows-1252 in countries like Greece, Hungary, Russia, Japan, China etc. These countries use letters that need values >=128 for their encoding, or sometimes multiple bytes.

      This implies Windows-1252 does not contain code points with ordinal values over 127. That is false, as you probably know well, though it would be an understandable mistake to make if one were to conflate Windows-1252 (‘Windows Latin-1’ as I put it) with ASCII.

      The rest of your post was then ambiguous as to whether a D2-D2007 app that worked in a purely English context should ever have used the AnsiXXX functions over their non-prefixed equivalents. The way you seemed to put it, whether to use UpperCase and LowerCase or something cleverer is simply a matter of ‘internationalization’, which is not the case – AnsiXXX were needed even for English.

      • Hi CR

        I did not mention this as a contrast to anything, and therefore there is no “implying” part.

        I have noticed that a lot of U.S. programmers find it annoying that they cannot use UpperCase() for ordinary Unicode text, because they used it in Delphi 2007 and earlier. This implies that they never needed to uppercase any letters other than a-z, which is also why many applications that lack internationalization convert “café” to uppercase “CAFé”.

        My blog post mainly tried to explain to those people that it is wrong to use UpperCase() for ordinary text, and “café” is a good example of why. Good apps in Delphi 2007 and earlier use AnsiUpperCase(). I wish I had thought of that word before writing the blog post 🙂

        Lars.

        • In Delphi 2009/2010, UpperCase/LowerCase (and some others) have an extra parameter.
          The funny part is that this parameter just redirects to the AnsiXXX functions (which don’t work on AnsiString but on UnicodeString; what a mess, no?)

          So UpperCase ends up calling AnsiUpperCase, which works on wide characters instead… 🙂

          I never used the UpperCase function anyway, and in OnKeyPress events I sometimes need to call CharUpperBuff(@Key, 1), which is the same Windows API function that AnsiUpperCase uses…

          • In Delphi 2009/2010, UpperCase/LowerCase (and some others) have an extra parameter.

            Right. They’re in D2006/7 too.

            The funny part is that this parameter just redirects to the AnsiXXX functions (which don’t work on AnsiString but on UnicodeString; what a mess, no?)

            Well no, it’s not a mess if backwards compatibility is important. I’d agree the AnsiXXX functions should be marked deprecated now, though; in fact, they probably should have been once UpperCase/LowerCase got their optional parameter.

  3. I started both answers at the same time, but should have refreshed before posting them. And even then it still wouldn’t have been transactional 🙂

    Few people really know about all these character sets. Hopefully posts like yours and Lars’s will improve that.

    –jeroen

  4. Chris, in French the proper spelling of “café” in capital letters is actually “CAFE”, not “CAFÉ”. Even trickier, “CAFE” should become “café” in lower case, as the locals would do automatically.
    This is for typographical and historical reasons.
    Keeping accents on capital letters is more common on the letter E, since it can help disambiguation and computerized typography made it easier, but really no one would hand-write my first name FRANÇOIS instead of FRANCOIS, nor Francois instead of François.
    By the way, these capital letters ÀÂÇÈÉÊËÎÙ are not even on the keyboard.

      • In Denmark, you would uppercase ‘café’ to ‘CAFÉ’. Uppercasing and lowercasing depend on the selected locale, not on the letters; it’s just like sorting. For instance, the Danish locale specifies ‘AB’ < ‘AC’ < ‘AA’, because ‘AA’ is treated as a substitute for the letter ‘Å’, which is last in the alphabet. Also, ‘V’ = ‘W’ when sorting, so that ‘WA’ < ‘VB’ < ‘WC’ < ‘VD’, etc.
