by Michael S. Kaplan, published on 2007/09/23 06:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/09/23/5071210.aspx
As I mentioned back in How do I feel about lstrcmpi? I think it blows...., the Mac CFString stuff has some fascinating issues related to collation that I thought I'd chat about, with me owning a MacBook Pro and with Microsoft making Silverlight run on it and all. :-)
Here are the CFString methods of interest that I am thinking about:
Searching Strings
CFStringCreateArrayWithFindResults
CFStringFind
CFStringFindCharacterFromSet
CFStringFindWithOptions
CFStringGetLineBounds
Comparing Strings
CFStringCompare
CFStringCompareWithOptions
CFStringHasPrefix
CFStringHasSuffix
First of all, do you see how they split between comparing and searching? In our FindNLSString/FindNLSStringEx functions, since prefix and suffix matching are a special case of a find operation, they are bundled together with the find, too. Though I can see the argument for the split in this way too....
But what actually interested me the most was the list of String Comparison Flags that all of these functions seem to take:
typedef enum {
    kCFCompareCaseInsensitive = 1,
    kCFCompareBackwards = 4,
    kCFCompareAnchored = 8,
    kCFCompareNonliteral = 16,
    kCFCompareLocalized = 32,
    kCFCompareNumerically = 64
} CFStringCompareFlags;
There seem to be some odd interactions between some of the flags and some of the methods, which makes me suspect that not all of them work together in every case. In other cases the combinations seem redundant (for example, what's the point of CFStringHasPrefix or CFStringHasSuffix when the find functions already take the kCFCompareAnchored flag? Is there some other way to be a prefix or a suffix that isn't "anchored" by their definition?).
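One way to picture that overlap is a find routine where "anchored" means "only try the first candidate position" and "backwards" means "start from the end". The sketch below is plain C over ASCII bytes, not CoreFoundation: find_with_options, has_prefix, has_suffix and the FIND_* flags are hypothetical names, with the flag values borrowed from the enum above purely for illustration. It assumes byte-for-byte matching and ignores case, localization and everything else the real functions have to worry about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags, modeled on kCFCompareBackwards and kCFCompareAnchored. */
enum { FIND_BACKWARDS = 4, FIND_ANCHORED = 8 };

/* Hypothetical find: returns true on a match and, if pos is non-NULL,
   stores the byte offset of the match in *pos. */
bool find_with_options(const char *s, const char *needle,
                       unsigned flags, size_t *pos)
{
    assert(s != NULL && needle != NULL);
    size_t n = strlen(s), m = strlen(needle);
    if (m > n) return false;

    size_t last = n - m;                        /* highest valid start offset */
    bool backwards = (flags & FIND_BACKWARDS) != 0;
    bool anchored  = (flags & FIND_ANCHORED) != 0;

    /* Anchored: test only the first candidate (start or end of s). */
    size_t tries = anchored ? 1 : last + 1;
    for (size_t k = 0; k < tries; k++) {
        size_t i = backwards ? last - k : k;
        if (memcmp(s + i, needle, m) == 0) {
            if (pos != NULL) *pos = i;
            return true;
        }
    }
    return false;
}

/* Prefix test: an anchored find from the front. */
bool has_prefix(const char *s, const char *prefix)
{
    return find_with_options(s, prefix, FIND_ANCHORED, NULL);
}

/* Suffix test: an anchored find from the back. */
bool has_suffix(const char *s, const char *suffix)
{
    return find_with_options(s, suffix, FIND_ANCHORED | FIND_BACKWARDS, NULL);
}
```

Seen this way, the prefix and suffix helpers add no new capability; they just save the caller from spelling out the flags each time.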
And some of the definitions seemed off, like their analogue for StrCmpLogicalW:
kCFCompareNumerically
Specifies that represented numeric values should be used as the basis for comparison and not the actual character values.
For example, “version 2” is less than “version 2.5”. Does not work if kCFCompareLocalized is specified on systems before 10.3.
There is a version where "2" > "2.5"? Maybe they meant one where "version 10" is greater than "version 2" here, like sorting digits as numbers is meant to do? Or did they also extend it to decimals here as well (in which case the example is still dumb but the functionality is pretty damn cool and I'd love to know how well it works and what else it can do)?
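To make the "digits as numbers" reading concrete, here is a minimal sketch in plain C. It is not how CFString does it (I have no idea what they do internally); compare_numerically is a hypothetical helper that assumes ASCII input and treats each run of decimal digits as one non-negative integer, which is roughly the StrCmpLogicalW idea. It knows nothing about decimal points.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not a CoreFoundation function. Returns <0, 0 or >0.
   Runs of ASCII digits are compared by integer value; every other
   character is compared by its byte value. */
int compare_numerically(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* Skip leading zeros so "007" and "7" have the same value. */
            while (*a == '0') a++;
            while (*b == '0') b++;

            /* Measure what is left of each digit run. */
            size_t la = 0, lb = 0;
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;

            /* More significant digits means a larger number. */
            if (la != lb) return la < lb ? -1 : 1;

            /* Same length: the first differing digit decides. */
            for (size_t i = 0; i < la; i++) {
                if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
            }
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* Whichever string still has characters left sorts later. */
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}
```

With this sketch, "version 2" sorts before "version 10", which a plain byte comparison gets backwards. "version 2" also sorts before "version 2.5", but only because it is a prefix, so that example does not really exercise the numeric part. And since each digit run is its own integer, "version 2.5" sorts before "version 2.10", which is exactly where a decimal-aware implementation would have to differ.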
Plus some of the encoding stuff looked unusual, I thought that might be worth a look too.
Anyway, after I saw it all, I realized the documentation was really insufficient, and I wanted a Sorting the Mac all Out blog to read so I could find out what was really up with these methods. I couldn't find one in my cursory search, so at some point I may have to start digging in and playing with stuff (unless someone knows of such a blog, of course!).
This post brought to you by ⓐ (U+24d0, a.k.a. CIRCLED LATIN SMALL LETTER A)
Mihai on 24 Sep 2007 3:55 AM:
<<Plus some of the encoding stuff looked unusual, I thought that might be worth a look too.>>
This is the part that I don't like at all about CFString: the stuff inside can have various encodings, with little control. The way you put something in with an encoding, and take it out in others, sometimes it can fail (if a conversion happens), sometimes might succeed (if there is no conversion or the conversion succeeds), some encoding can be internal, some cannot.
It just feels "unclean" and "hacked" together.
Rosyna on 24 Sep 2007 4:50 AM:
CFStringHasPrefix and CFStringHasSuffix are simply convenience functions. All they actually do is call:
return CFStringFindWithOptions(string, prefix, CFRangeMake(0, CFStringGetLength(string)), kCFCompareAnchored, NULL);
return CFStringFindWithOptions(string, suffix, CFRangeMake(0, CFStringGetLength(string)), kCFCompareAnchored|kCFCompareBackwards, NULL);
However, for such simple operations as finding a prefix/suffix, passing all those args is tedious and makes code less readable.
Also, try comparing 2 to 2.0 to 2.00 to 2.5 to 2.50 numerically.
And if you look at the CFString.h header, it lists which options work with which functions (since some can obviously not be used well together). 2.5 is a number too.
Of course, all of this CFString stuff is completely open source, but I don't recommend anyone look at the source code as it will make your head asplode.
Note that all of this stuff is toll-free bridged with NSString so you might find some of the documentation in NSString.
Wevah on 24 Sep 2007 4:51 AM:
CFStringHasPrefix/CFStringHasSuffix are just convenience functions that call CFStringFindWithOptions with the kCFCompareAnchored specified (and kCFCompareBackwards in the case of CFStringHasSuffix).
Rosyna on 24 Sep 2007 8:26 AM:
Mihai, this is a design goal of CFString. It's an opaque data type, it is not a bucket of bits. It can contain multiple encodings in one string as well. If you want an explicit encoding, you ask for it. If a character cannot be rendered in an encoding (like a CJK character when you want ASCII) then it will, of course, be a lossy conversion.
The internal encoding of CFString means diddly crap.
Do you have a very specific example that fails when it should not?