by Michael S. Kaplan, published on 2007/04/25 06:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/04/25/2268592.aspx
OK, I previously talked about the problem with Ordinal comparisons and one of the more uncool suggested ways around the problem.
So what is a potential better way to approach the problem?
Well, if we take the simplified model of the Ordinal comparison where every code point has an equal weight and focus on the fact that the real flaw is in the order, what is the best way to proceed?
Well, the entire table could be built up by ordering every single code point by its sort key, breaking ties with the code point's numeric value.
And then giving each of those code points a weight from 0x0000 to 0xFFFF, beginning to end.
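The two steps above (sort every code point by sort key with the code point value as tie-breaker, then number the results 0x0000 through 0xFFFF) can be sketched roughly like this. It is a minimal sketch under assumptions: `build_weight_table` and its `sort_key` parameter are hypothetical names, and each of the 65,536 values is treated as a UTF-16 code unit (surrogates included). On Windows the real sort keys would come from something like LCMapStringEx with the LCMAP_SORTKEY flag; the case-folding key in the usage line is only a toy stand-in for it.

```python
from typing import Callable, List


def build_weight_table(sort_key: Callable[[str], tuple]) -> List[int]:
    """Map each of the 65,536 UTF-16 code units to a weight 0x0000..0xFFFF.

    sort_key is a hypothetical stand-in for a real linguistic sort key.
    """
    # Order by sort key first; equal keys fall back to the numeric code point.
    ordered = sorted(range(0x10000), key=lambda cp: (sort_key(chr(cp)), cp))
    weights = [0] * 0x10000
    # Hand out weights from beginning to end of the ordered list.
    for weight, cp in enumerate(ordered):
        weights[cp] = weight
    return weights


# Toy key (an assumption, not a real collation): case-folded code points.
weights = build_weight_table(lambda s: tuple(ord(c) for c in s.casefold()))
```

With this toy key "a" gets a lower weight than "Z", the opposite of their ordinal order, and "A" lands immediately before "a" because of the code point tie-breaker. Rerunning the same function against updated sort keys is all it takes to regenerate the table for a new version.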
With each new version of the sorting tables (like when new Unicode versions come out and the new characters are added), this process can be repeated.
In fact, why wait for Microsoft to do this?
For the cost of 256K (the amount of space that 2^16 DWORDs, one 4-byte entry per code point, will take up), you could implement this yourself! :-)
In fact (were I interviewing candidates at this moment!) I think this would make a fun interview question, focusing on not just building the table but designing the interface to use it.
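As for the interface, one possible shape is a comparer that walks two strings code unit by code unit, just like an ordinal comparison, but compares the looked-up weights instead of the raw values. This is a hypothetical sketch, not an existing API: the class name `OrdinalWeightComparer` and its `compare` method are made up for illustration, and the weight table is assumed to have been built elsewhere by the sort-key ordering described above.

```python
import struct
from array import array
from typing import Sequence, Tuple


class OrdinalWeightComparer:
    """Hypothetical interface: ordinal-style comparison through a weight table."""

    def __init__(self, weights: Sequence[int]) -> None:
        if len(weights) != 0x10000:
            raise ValueError("expected one weight per UTF-16 code unit")
        # 'I' is a 4-byte unsigned int on common platforms, so 65,536 entries
        # come to roughly the 256K of DWORDs mentioned above.
        self._weights = array("I", weights)

    @staticmethod
    def _code_units(s: str) -> Tuple[int, ...]:
        # UTF-16LE turns supplementary characters into surrogate pairs, matching
        # the 0x0000..0xFFFF model; surrogatepass tolerates lone surrogates.
        data = s.encode("utf-16-le", "surrogatepass")
        return struct.unpack(f"<{len(data) // 2}H", data)

    def compare(self, a: str, b: str) -> int:
        """Return -1, 0 or 1, comparing weights instead of raw code units."""
        w = self._weights
        ua, ub = self._code_units(a), self._code_units(b)
        for x, y in zip(ua, ub):
            wx, wy = w[x], w[y]
            if wx != wy:
                return -1 if wx < wy else 1
        # Every shared position matched, so the shorter string sorts first.
        return (len(ua) > len(ub)) - (len(ua) < len(ub))
```

Passing the identity table `range(0x10000)` makes `compare` behave exactly like a plain UTF-16 ordinal comparison, which is a handy sanity check; swapping in a table built from sort keys changes only the order, not the speed or the code-unit-at-a-time structure.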
I'd probably have to think this through a bit more first, and I'll most likely be thinking about something else entirely by then anyway. But if you ever found yourself dismayed by how stupid the results of an ordinal comparison seem, then you could code this idea up pretty quickly.
If you go to the next Unicode Conference, you could show me what you came up with and impress me...
This post brought to you by ○ (U+FFEE, a.k.a. HALFWIDTH WHITE CIRCLE)