Things that make you go ‘urgh’…

What’s the flaw in this test code?

program Project1;

{$APPTYPE CONSOLE}

var
  Arr1, Arr2: array of array of Integer;
  I, J: Integer;
begin
  SetLength(Arr1, 5, 5);
  for I := 0 to 4 do
    for J := 0 to 4 do
      Arr1[I, J] := I * J;
  Arr2 := Copy(Arr1);
  for I := 0 to 4 do
    for J := 0 to 4 do
      if Arr2[I, J] <> Arr1[I, J] then
      begin
        WriteLn('Nope');
        Break;
      end;
  Write('Press ENTER to exit...');
  ReadLn;
end.

I’ve just checked, and FPC lovingly duplicates the behaviour I’m thinking of too. The flaw in question was immediately spotted by Rudy Velthuis, so credit to him.

[And no, I’m not saying avoid dynamic arrays like the plague. The fact the baby’s being kept in doesn’t mean you can’t complain about the murkiness of the water.]

Dynamic arrays — pure reference types, except when they’re not

A reasonable way to understand the semantics of dynamic arrays in Delphi is to recall the sort of code you might have used as a substitute before they were introduced in Delphi 4. Assuming only a single dimension to keep things simple, stage one would be to declare a dummy static array type, together with a corresponding pointer type:

type
  PRectArray = ^TRectArray;
  TRectArray = array[0..$FFFFF] of TRect;

Allocation and reallocation may then be done using the appropriately named ReallocMem routine. Strictly speaking, you can use GetMem and FreeMem as well, though since ReallocMem can do both initial allocation and final deallocation, there’s no need. Just remember to initialise the variable to nil if it is declared locally in a routine, and always call ReallocMem with a size of 0 at the end to free the array: […]
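
The excerpt breaks off before its own listing, so what follows is not the author’s code, only a minimal sketch of the pattern just described. It reuses the PRectArray and TRectArray declarations from above; the program name, the Demo procedure and the element values are made up for illustration, and the Types unit is assumed to be where TRect and Rect live (newer Delphi versions call it System.Types).

```pascal
program EmulatedDynArray;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  Types; // TRect and Rect (assumed unit name; System.Types in newer Delphi)

type
  PRectArray = ^TRectArray;
  TRectArray = array[0..$FFFFF] of TRect;

procedure Demo; // hypothetical routine, for illustration only
var
  Rects: PRectArray;
  I: Integer;
begin
  Rects := nil; // a local pointer holds garbage until set; nil makes ReallocMem allocate
  try
    ReallocMem(Rects, 4 * SizeOf(TRect)); // initial allocation: room for 4 elements
    for I := 0 to 3 do
      Rects^[I] := Rect(I, I, I + 10, I + 10);

    ReallocMem(Rects, 8 * SizeOf(TRect)); // grow to 8; the first 4 elements are preserved
    for I := 4 to 7 do
      Rects^[I] := Rect(0, 0, I, I);      // new elements are uninitialised until assigned

    WriteLn(Rects^[1].Right);  // 11
    WriteLn(Rects^[7].Bottom); // 7
  finally
    ReallocMem(Rects, 0); // size 0 frees the block and resets Rects to nil
  end;
end;

begin
  Demo;
end.
```

Note that nothing here tracks the element count or checks indexes against it; the dummy array bounds only exist to keep the compiler happy, which is exactly the bookkeeping real dynamic arrays took over.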

Are dynamic arrays in Delphi half-baked?

In a nice little series on ‘Delphi in a Unicode world’, written and published around the time of Delphi 2009’s release, Nick Hodges has this to say about using strings as binary buffers:

A common idiom is to use a string as a data buffer. It’s common because it’s been easy — manipulating strings is generally pretty straight forward. However, existing code that does this will almost certainly need to be adjusted given the fact that string now is a UnicodeString.

There are a couple of ways to deal with code that uses a string as a data buffer. The first is to simply declare the variable being used as a data buffer as an AnsiString instead of string […] The second and preferred way dealing with this situation [, however,] is to convert your buffer from a string type to an array of bytes, or TBytes. TBytes is designed specifically for this purpose, and works as you likely were using the string type previously.

Now, I’m totally at one with those who think misusing the string type for binary buffers was a silly thing to do. Nevertheless, to say TBytes was ‘designed specifically for this purpose’ is equally silly in my view: being a simple typedef for a dynamic array of bytes that was only added in D2007 (dynamic arrays themselves having been added way back in D4), it patently wasn’t.

More to the point, despite having an implementation that redeployed that of the original AnsiString type for more general purposes, dynamic arrays at large — and thus, TBytes specifically — suffer from various key shortcomings in comparison:

  1. No copy-on-write semantics. The fact that dynamic arrays and strings share key RTL functions (Copy, Length and SetLength) frequently leads me to forget this, as well as the fact that dynamic arrays aren’t in fact pure reference types in use.
  2. The equals (=) and not equals (<>) operators compare references rather than data. (Note how the string type is simply more flexible here, since you can just cast to Pointer if you do want to compare string references.)
  3. You can’t use the addition (+) operator. For sure, using this in a tight loop is highly inefficient, but if it’s so terrible in principle, why allow it for strings? [Edit: before you get the wrong idea, see my response to Luigi Sandon (‘LDS’) in the comments.]
  4. You cannot assign an array constant to a dynamic array. Cf. how there isn’t a practical distinction between string constants and string variables — they’re all just ‘strings’, and even under the hood, a string constant is just a string with a dummy reference count.
  5. No copy-on-write semantics means you lose much of the const-ness of constant parameters and read-only properties. Basically, the consumers of an object can change the elements of a read-only dynamic array property where they can’t change the characters of a read-only string property. Admittedly, the loss of the const-ness of constant parameters is much alleviated by the open array syntax (though let’s not dilute this by encouraging the use of parameters declared as TBytes rather than ‘const array of Byte’, eh?).* Nonetheless, it is still an unfortunate side effect of dynamic arrays not being implemented as quasi-value types, à la AnsiString and UnicodeString.
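
Points 1 and 2 are easy to see side by side. The following is a small illustrative sketch, not anything from Hodges’ article; the program and variable names are made up, and it assumes a Delphi-compatible compiler.

```pascal
program DynArrayVsString;

{$APPTYPE CONSOLE}

var
  S1, S2: AnsiString;
  A1, A2, A3: array of Byte;
begin
  // Strings: assignment shares the data, but the first write makes a private copy
  S1 := 'abc';
  S2 := S1;
  S2[1] := 'X';
  WriteLn(S1);      // abc (S1 is untouched)

  // Dynamic arrays: assignment shares the data, and writes go straight through
  SetLength(A1, 3);
  A1[0] := 1;
  A2 := A1;
  A2[0] := 99;
  WriteLn(A1[0]);   // 99 (A1 and A2 refer to the same array)

  // Copy produces a separate array with identical contents,
  // yet = only compares the references
  A3 := Copy(A1);
  WriteLn(A3 = A1); // FALSE (same contents, different arrays)
  WriteLn(A2 = A1); // TRUE (same array)
end.
```

To get the string-like behaviour for arrays you have to call Copy (or SetLength) yourself before writing, and compare contents element by element or with something like CompareMem.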

In my view, it is these features that make manipulating strings ‘pretty straight forward’ and, moreover, not prone to bugs caused by not fully understanding the type’s internal semantics. The fact that dynamic arrays do not have them, then, makes the idea of TBytes being some sort of genuine substitute for the misused old AnsiString quite false. That said, one particular issue with dynamic arrays especially gets my goat, but I’ll leave elucidating that to another time…

* Thus:

procedure Test(const Arg1: TBytes; const Arg2: array of Byte);
begin
  Arg1[0] := 99; //compiles!
  Arg2[0] := 99; //doesn't compile
end;
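
The snippet above covers constant parameters; the read-only property half of point 5 can be sketched in the same spirit. THolder and its members are hypothetical names invented for this illustration, and TBytes is assumed to come from SysUtils.

```pascal
program ReadOnlyPropLeak;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils; // TBytes

type
  THolder = class // hypothetical class, for illustration only
  private
    FData: TBytes;
    FName: AnsiString;
  public
    constructor Create;
    property Data: TBytes read FData;     // read-only on paper
    property Name: AnsiString read FName; // read-only in practice too
  end;

constructor THolder.Create;
begin
  inherited Create;
  SetLength(FData, 1);
  FData[0] := 1;
  FName := 'abc';
end;

var
  H: THolder;
  B: TBytes;
  S: AnsiString;
begin
  H := THolder.Create;
  try
    B := H.Data;        // B now refers to the object's own array
    B[0] := 99;         // no copy is made, so this writes into FData
    WriteLn(H.Data[0]); // 99

    S := H.Name;        // S shares the object's string data...
    S[1] := 'X';        // ...until the write, which copies first
    WriteLn(H.Name);    // abc
  finally
    H.Free;
  end;
end.
```

The usual workaround is for the property getter to return Copy(FData), at the price of an allocation on every read.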