In a nice little series on ‘Delphi in a Unicode world’ written and published around the time of Delphi 2009’s release, Nick Hodges writes on the topic of using strings as binary buffers as thus:
A common idiom is to use a string as a data buffer. It’s common because it’s been easy — manipulating strings is generally pretty straight forward. However, existing code that does this will almost certainly need to be adjusted given the fact that string now is a UnicodeString.
There are a couple of ways to deal with code that uses a string as a data buffer. The first is to simply declare the variable being used as a data buffer as an AnsiString instead of string […] The second and preferred way dealing with this situation [, however,] is to convert your buffer from a string type to an array of bytes, or TBytes. TBytes is designed specifically for this purpose, and works as you likely were using the string type previously.
Now, I’m totally at one with those who think misusing the string type for binary buffers was a silly thing to do. Nevertheless, to say TBytes was ‘designed specifically for this purpose’ is equally as silly in my view, since in being a simple typedef for a dynamic array of bytes that was only added in D2007 (dynamic arrays themselves being added way back in D4), it patently wasn’t.
More to the point, despite having an implementation that redeployed that of the original AnsiString type for more general purposes, dynamic arrays at large — and thus, TBytes specifically — suffer from various key shortcomings in comparison:
- No copy-on-write semantics. The fact that dynamic arrays and strings share key RTL functions (Copy, Length and SetLength) frequently leads me to forget this, as well as the fact that dynamic arrays aren’t in fact pure reference types in use.
- The equals (=) and not equals (<>) operators compare references rather than data. (Note how the string type is simply more flexible here, since you can just cast to Pointer if you do want to compare string references.)
- You can’t use the addition (+) operator. For sure, using this in a light loop is highly inefficient — but if it’s so terrible in principle, why allow it for strings? [Edit: before you get the wrong idea, see my response to Luigi Sandon — ‘LDS’ — in the comments.]
- You cannot assign an array constant to a dynamic array. Cf. how there isn’t a practical distinction between string constants and string variables — they’re all just ‘strings’, and even under the hood, a string constant is just a string with a dummy reference count.
- No copy-on-write semantics means you lose much of the const-ness of constant paramaters and read-only properties — basically, the consumers of an object can change the elements of a read-only dynamic array property where they can’t change the characters of a read-only string property. Admittedly, the loss of the const-ness of constant parameters is much alleviated by the open array syntax (though let’s not dilute this by encouraging the use of paramaters declared as TBytes rather than ‘const array of Byte’, eh?).* Nonetheless, it is still an unfortunate side effect of dynamic arrays not being implemented as quasi-value types, à la AnsiString and UnicodeString.
In my view, it is these features that make manipulating strings ‘pretty straight forward’, and moreover, not prone to bugs through not fully understanding the type’s internal semantics. The fact that dynamic arrays do not have them, then, makes the idea of TBytes being some sort of genuine substitute for the misused old AnsiString quite false. That said, one particular issue with dynamic arrays especially gets my beef, but I’ll leave elucidating that to another time…
procedure Test(const Arg1: TBytes; const Arg2: array of Byte);
Arg1 := 99; //compiles!
Arg2 := 99; //doesn't compile