How to track down collation bugs

by Michael S. Kaplan, published on 2005/10/05 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/10/05/477108.aspx


Every once in a while, it is nice to be able to blog about something I am working on while I am working on it. Over the past few days I have been clearing out a bunch of small bugs in collation edge cases, the earliest of which was reported back in 2001.

It occurred to me that there is a pattern I can use to narrow down where in the code the problem might be before I even look at the code. This is a good thing not because I am lazy but because there is a lot of code there....

So I thought it would be nice to lay out a lot of these issues in a post.

If this does not seem interesting to you, then you can probably move on at this point. :-)

Anyway, without further ado, the LIST -- worded as if one had a specific problem with sorting results and wanted to figure out the cause:

  1. Does it happen only in a specific software product or products (e.g. Word 2000, Excel 2002, Access (any version), SQL Server)?
  2. Does it happen only in Windows?
  3. Does it happen only in Explorer (the Windows Shell)?
  4. Does it happen if you switch string1 and string2?
  5. Does it happen only with certain locales?
  6. Does it happen only when you pass -1 for the string length parameters?
  7. Does it happen only when you pass explicit lengths for the string length parameters?
  8. Does it happen only with the Korean locale?
  9. Does it happen only with Jamo, Kana, Extension A, or other specific interesting subranges of Unicode?
  10. Does it happen only when particular NORM_IGNORE* flags are passed?
  11. Does it happen only when SORT_STRINGSORT is passed?
  12. Does something different happen if you use LCMapString on each string to get two sort keys and then compare them, instead?
  13. Does something different happen if you perform the same comparison on an older or newer version of Windows?
  14. Does something different happen if you make the same comparisons in the .NET Framework?
  15. Does it happen only with certain kinds of characters?
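
Two of the questions above (4 and 12) lend themselves to an automated check. What follows is only a rough sketch of that idea, not the code being debugged here: it uses Python's locale.strcoll and locale.strxfrm as portable stand-ins for a direct comparison and for sort keys, rather than the Windows CompareString and LCMapString calls, and the helper names sign and check_pair are made up for illustration.

```python
import locale


def sign(n):
    """Collapse any comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def check_pair(a, b, compare=locale.strcoll, sort_key=locale.strxfrm):
    """Return descriptions of any inconsistencies found for one pair of strings.

    compare and sort_key are assumptions standing in for whatever comparison
    function and sort key generator are actually under suspicion.
    """
    problems = []
    forward = sign(compare(a, b))
    backward = sign(compare(b, a))

    # Question 4: swapping the operands should flip the sign of the result.
    if forward != -backward:
        problems.append(
            f"asymmetric: compare(a, b)={forward}, compare(b, a)={backward}"
        )

    # Question 12: comparing sort keys should agree with comparing the strings.
    key_a, key_b = sort_key(a), sort_key(b)
    via_keys = (key_a > key_b) - (key_a < key_b)
    if via_keys != forward:
        problems.append(f"sort keys disagree: direct={forward}, keys={via_keys}")

    return problems


if __name__ == "__main__":
    # Question 5: repeat the run after locale.setlocale(locale.LC_COLLATE, ...)
    # to see whether a problem shows up only under certain locales.
    for a, b in [("apple", "Apple"), ("co-op", "coop"), ("a", "b")]:
        print(repr(a), repr(b), check_pair(a, b) or "consistent")
```

An empty list for a pair means the two checks agree. A non-empty one says which of the two questions to chase first, which is the point of the list: narrowing down where to look before reading any code.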

You can probably think of some specific issues that each of these problems might imply....

Anyway, back to bugs!

 

This post brought to you by "∂" (U+2202, a.k.a. PARTIAL DIFFERENTIAL)



