Sort keys - answers to several questions

by Michael S. Kaplan, published on 2011/12/08 07:01 -05:00, original URI:

In a comment on a blog I wrote nearly 7 years ago (How do sort keys work?), Alexander Savin asked another bunch of questions:

If I need to generate a sort key for the string I have to call LCMapString twice:

bufferSize = LCMapStringW(..., 0);

...Allocate buffer...

LCMapStringW(..., bufferSize);

Is there a performance flaw in this approach? Doesn't this imply that LCMapString does a double work? Does LCMapString need to scan an input string to determine the sort key size?

If it does, I would very much prefer having an ability to pass, in a single call, the pre-allocated buffer size and having LCMapString return the required buffer size if the input one appears to be not large enough.

What about the following approach (if I don't need a sort key itself but only its hash code)?

...Allocate 512 bytes on stack...

sortKeySize = LCMapStringW(..., 512);

if (sortKeySize == 0)

{

sortKeySize = LCMapStringW(..., 0);

...Allocate buffer...

LCMapStringW(..., sortKeySize);

}

...Calc hash...

It does a triple work in the worst case but should work faster in most cases. Shouldn't it?

Also, having to estimate the worst sort key size, considering an input string has the length of N, is it true that its sort key is no longer than (2*N + 1)*4?

And if IgnoreCase is specified, for example, will the [all Case weights] section be empty?

Michael, it would be very helpful if you could clarify all this? Thanks.

Answering each question in order:

  1. I wouldn't say flaw so much as penalty.
  2. It doesn't just imply it, it *does* double work.
  3. Yes, it does.
  4. This is not provided.
  5. Maybe -- it depends on the strings in your scenario.
  6. Just about, yes. Though that is subject to change, and may have changed since I left the team that owns it.
  7. Yes.

Hey, that was easy. :-)
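
For anyone who wants to try the 512-byte idea from the question, here is a minimal sketch in C of the "stack buffer first, measure only on failure" strategy. To stay runnable anywhere it does not call LCMapStringW itself: `sortkey_fn`, `toy_sortkey`, `estimated_max_key_bytes`, `fnv1a` and `hash_sortkey` are all hypothetical names made up for this illustration, and the toy mapper only imitates the return-value contract described in the question (a destination size of 0 asks for the required size, a return value of 0 means the call failed). The size helper just encodes the (2*N + 1)*4 estimate from the question, which per answer 6 may not hold forever, so treat it as a sizing hint and nothing more.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type, NOT a Win32 declaration. It copies the
   size contract from the question: destSize == 0 returns the required
   byte count, a too-small buffer returns 0, success returns the number
   of bytes written. */
typedef int (*sortkey_fn)(const char *src, unsigned char *dest, int destSize);

/* Toy stand-in so the sketch runs anywhere: the "key" is the ASCII
   upper-cased input plus a 0x00 terminator. Real sort keys are far
   richer; this only reproduces the return-value behaviour. */
static int toy_sortkey(const char *src, unsigned char *dest, int destSize)
{
    int needed = (int)strlen(src) + 1;
    if (destSize == 0) return needed;        /* measuring call */
    if (destSize < needed) return 0;         /* buffer too small */
    for (int i = 0; i < needed - 1; i++) {
        unsigned char c = (unsigned char)src[i];
        dest[i] = (c >= 'a' && c <= 'z') ? (unsigned char)(c - 32) : c;
    }
    dest[needed - 1] = 0;
    return needed;
}

/* The estimate quoted in the question, in bytes, for n input characters.
   An assumption that may change, so never skip the size check because of it. */
static size_t estimated_max_key_bytes(size_t n)
{
    return (2 * n + 1) * 4;
}

/* FNV-1a over the key bytes; any byte-wise hash would do here. */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* One pass in the common case, three passes in the worst case.
   Returns 0 on success and stores the hash in *out, -1 on failure. */
static int hash_sortkey(sortkey_fn map, const char *src, uint32_t *out)
{
    unsigned char stackBuf[512];
    unsigned char *key = stackBuf;
    int size = map(src, stackBuf, (int)sizeof stackBuf);

    if (size == 0) {
        /* Assumed to mean "buffer too small"; real code would check the
           error code before falling back. */
        int needed = map(src, NULL, 0);      /* second pass: measure */
        if (needed <= 0) return -1;
        key = malloc((size_t)needed);
        if (key == NULL) return -1;
        size = map(src, key, needed);        /* third pass: fill */
        if (size == 0) {
            free(key);
            return -1;
        }
    }

    *out = fnv1a(key, (size_t)size);
    if (key != stackBuf) free(key);
    return 0;
}
```

On Windows the callback would presumably be a thin wrapper around LCMapStringW with LCMAP_SORTKEY, and a return of 0 should be confirmed as ERROR_INSUFFICIENT_BUFFER via GetLastError before retrying. Whether the stack buffer pays off depends on how often your strings produce keys longer than 512 bytes, which is the "maybe" in answer 5.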
