Back in September 2012, a post appeared in non-tech reporting Delphi’s PCRE wrapper to be many, many times slower than Python’s. With sample code attached, the problem was undeniable, though the immediate cause was soon identified: the Delphi wrapper’s failure to include a ‘don’t validate the UTF-8’ flag (PCRE was traditionally a UTF-8 based library, so the Delphi wrapper was using UTF8String). Putting everyone’s work together, I posted a QC report. Soon after I did that, an even better solution was noted: have the wrapper wrap a newer version of PCRE that supports UTF-16 internally (i.e. Delphi’s native string encoding), and so avoid UTF-8 roundtrips entirely.
To be fair to Embarcadero, the second solution might have been considered a bit problematic in practice, given PCRE’s UTF-16 mode was only six months old at that point, and using it may have been tricky for OS X. This is because on that platform, the Delphi wrapper uses the system PCRE dylib rather than statically linking equivalent C object files, since DCCOSX only consumes object files produced by the Windows C++Builder compiler (or at least, only did when I last looked into the matter). On the other hand, the additional flag fix involves adding just a couple of lines… so perhaps it could be implemented fairly quickly?
Alas, it hasn’t been implemented as yet. Oh well – I can see that shipping iOS and Android support was a much bigger fish to fry. Does this mean the unit in question hasn’t been touched at all? Oh no: it has been extensively fiddled about with, because the UTF8String type was removed from the so-called ‘nextgen’ (i.e., LLVM-based) compilers. As such, an elegant UTF8String interface has been replaced with an ordinary string one that now has to use the ugly ‘marshaller’ API and TBytes internally. Even worse, it now includes pearls like the following:
    function CopyBytes(const S: TBytes; Index, Count: Integer): TBytes;
    var
      Len, I: Integer;
    begin
      Len := Length(S);
      if Len = 0 then
        Result := TEncoding.UTF8.GetBytes('')
      else
      begin
        if Index < 0 then
          Index := 0
        else if Index > Len then
          Count := 0;
        Len := Len - Index;
        if Count <= 0 then
          Result := TEncoding.UTF8.GetBytes('')
        else
        begin
          if Count > Len then
            Count := Len;
          SetLength(Result, Count);
          for I := 0 to Count - 1 do
            Result[I] := S[Index + I];
        end;
      end;
    end;
If you’re reading this and thinking ‘oh no – it looks like the Move procedure has been removed!’, don’t worry, because it hasn’t. Likewise, Delphi hasn’t suddenly gone all Java-esque and dropped the equation of an empty dynamic array with a nil one – i.e., this code:
    Result := TEncoding.UTF8.GetBytes('')
really is what it seems, namely an obscure way of assigning nil that, if you step through it, passes through several method calls and if tests to do the deed.
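For comparison, here is a minimal sketch of how the same helper might look with a plain nil assignment and a single Move call. This is my own illustration, not anything from the RTL: the name CopyBytesWithMove and the small console program around it are hypothetical, and I have only assumed that the clamping rules of the original (negative Index treated as 0, Count trimmed to what is available) are worth preserving.

```delphi
program CopyBytesSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical rewrite, not the RTL's code: keeps the clamping rules of the
// original CopyBytes, but returns nil for the empty case and copies with Move.
function CopyBytesWithMove(const S: TBytes; Index, Count: Integer): TBytes;
var
  Len: Integer;
begin
  Result := nil;                      // an empty dynamic array is just nil
  Len := Length(S);
  if Index < 0 then
    Index := 0;
  if (Index >= Len) or (Count <= 0) then
    Exit;                             // nothing to copy
  if Count > Len - Index then
    Count := Len - Index;             // trim to the bytes actually available
  SetLength(Result, Count);
  Move(S[Index], Result[0], Count);   // one block copy instead of a loop
end;

var
  Source, Part: TBytes;
begin
  Source := TBytes.Create(10, 20, 30, 40, 50);

  // Copy three bytes starting at index 1
  Part := CopyBytesWithMove(Source, 1, 3);
  WriteLn(Length(Part), ': ', Part[0], ' ', Part[1], ' ', Part[2]);

  // An out-of-range Index yields an empty (nil) array
  Part := CopyBytesWithMove(Source, 9, 3);
  WriteLn(Length(Part) = 0);
end.
```

It is also worth remembering that the standard Copy function accepts dynamic arrays as well as strings, so something along the lines of Copy(S, Index, Count) may well cover most of the helper on its own; I haven't checked how it treats every edge case the original handles, hence the explicit clamping in the sketch.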