TStringList.Assign… Bug, Feature, or Documentation Error?

I was recently revisiting my improved implementation of TStringList (see http://www.riversoftavg.com/efficiencytlistntstringlist.htm).  Now this code is very old, but its point about the efficiencies to be gained when copying sorted TStringLists applies.  Unfortunately, my stringlist, and Delphi’s TStringList, has a bug.

Suppose you have 2 TStringLists and the following code:

Strings1.Assign(Strings2);
if not Strings1.Equals(Strings2) then
  raise EListError.Create('Copy unsuccessful');

Do you think the exception will ever occur?  If you are like me, you would say no, and you would be wrong.

Here is the TStringList.Assign method:

procedure TStringList.Assign(Source: TPersistent);
begin
 inherited Assign(Source);
 if Source is TStringList then
 begin
   FCaseSensitive := TStringList(Source).FCaseSensitive;
   FDuplicates := TStringList(Source).FDuplicates;
   FSorted := TStringList(Source).FSorted;
 end;
end;

The inherited Assign clears the TStrings, copies some properties over, and then copies the strings using AddStrings.  The problem is subtle, but the error is that TStringList sets the Self properties after the strings have been copied (To be precise, the trouble is that the Self TStringList does not set the properties properly before copying the strings.  But more on that in a moment).

If Self has its Sorted property set to True, things start to take a wrong turn.  Basically, 3 errors can occur:

  • If the Source TStringList is not sorted, the 2 TStringLists are guaranteed to not be equal, e.g.,
Source := TStringList.Create;
Source.Add('Hello');
Source.Add('World');
Source.Add('Hello');
Dest := TStringList.Create;
Dest.Sorted := True;
Dest.Assign(Source);
if not Dest.Equals(Source) then raise exception.create('oops');
  • If the Source TString is sorted but the CaseSensitive property is not the same, the 2 TStringLists could possibly not be equal as they sort the strings differently. 
  • If the Source TStringList is sorted but contains duplicates, the 2 TStringLists could still not be equal.  The default value for the Duplicates property is dupIgnore, which means that the Self TStringList could be just throwing those duplicates away.  So not only are they not in the same order, but the 2 TStringLists don’t even contain the same number of items!

So how should the Assign method look?  The first thought is to put the TStringList property assigns before the inherited call, e.g.,

procedure TStringList.Assign(Source: TPersistent);
begin
 if Source is TStringList then
 begin
   FCaseSensitive := TStringList(Source).FCaseSensitive;
   FDuplicates := TStringList(Source).FDuplicates;
   FSorted := TStringList(Source).FSorted;
 end;
 inherited Assign(Source);
end;

This is better, but there is still a bug.  The problem is that even if the Source TStringList is Sorted and Duplicates = dupIgnore, there still may be duplicates in the Source TStringList!  Setting the Duplicates property does not actually do anything – it only works when strings are added.  So the Source TStringList could have added duplicate strings and then set its Duplicates property to dupIgnore or dupError afterwards.

The solution is to temporarily set the Self TStringList to unsorted and Duplicates to dupAccept.  Copy the strings and then copy the actual property values.  So, our final code looks like this:

procedure TStringList.Assign(Source: TPersistent);
begin
 if Source is TStringList then
 begin
   FDuplicates := dupAccept;
   FSorted := False;
 end;
 inherited Assign(Source);
 if Source is TStringList then
 begin
   FCaseSensitive := TStringList(Source).FCaseSensitive;
   FDuplicates := TStringList(Source).FDuplicates;
   FSorted := TStringList(Source).FSorted;
 end;
end;

One thing I am unsure about.  What should be done if the Source is a TStrings but is not implemented by a TStringList?  In that case, should the strings be sorted or not as they are added to Self?  My decision was that the Self TStringList should keep its own property values, even if that means the 2 objects, Self and Source, will not be equal.

Postscript

So will this “bug” ever be fixed?  My instinct says that it won’t.  This behavior has been around way too long and has become the expected behavior.  After all, the current behavior is a quick way to strip duplicate strings from a TStrings.  Changing the behavior now may break user’s code.  So that is why the title of this post asks is it a bug, is it a feature, or is it just a documentation error?

Happy CodeSmithing!