What Unicode version do you support?
by Michael S. Kaplan, published on 2005/12/23 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/23/506887.aspx
When I was in my mid-20s, I lived in Columbus, Ohio. Living next door to me was a nice couple (Robert and Wendy) who were trying to start a family, and they were really having a tough time with it.
(I promise there is a point to this particular recollection!)
After a lot of effort and clinic visits and so forth (details are of course not relevant here), they finally managed it; she was pregnant.
Any time people asked Wendy "Are you having a boy or a girl?", something that was reportedly happening a lot, her answer was invariably "Yes, I hope so! Having a boy or a girl would be great!".
It has been many years since that time, but let me tell you that I think about Wendy and her answer any time someone asks me the question:
What version of Unicode does MS [Windows|.NET|SQL Server|Office|Bob] support?
I think that Wendy, if she is reading this right now, might be proud to hear my new answer to that eternal (or should I say infernal?) question:
The version released by The Unicode Consortium.
Because there really is no definitive answer to this very non-specific question. The answer always depends entirely on the [usually one] specific issue that the person asking is looking for the answer to. For example:
- If they are looking for conformance to a particular part of the standard such as normalization or UTF-8, then there may or may not be a specific answer, and we seldom put numbers on versions we support for that very reason (Unicode and our products are on 20 or more very different shipping cycles).
- If they are looking for UCA support, the answer is that Microsoft does not use the Unicode Collation Algorithm, so they are definitely asking the wrong question (though the UCA has made some changes to be a little more like us).
- If they want to know whether ________ is supported (they fill in the blank with the language or script of choice) then the answer is that every version of Unicode allows an implementation to support a subset, which means supporting a version does not mean supporting any particular character or characters -- they should ask whether the language or script is supported.
- If they want to know whether ________ is supported (they fill in the blank with a particular character) then that subset answer I just pointed out applies, as does the fact that it really depends on what they mean -- do they mean in fonts, in Unicode properties, in collation, in fallback/linking/shaping, or what?
- If they are looking to know about our Bidi support, then they should try to understand that Microsoft's support of bidirectional scripts in products predates UAX #9 (The Bidirectional Algorithm), and that in truth UAX #9 has been moving toward our implementation rather than the other way around!
- If they want to know about Unicode properties, then shucks, I don't know what to tell you -- depends on product version. Though Whidbey is 4.1 not 3.2 and Vista hasn't shipped yet but the latest CTP is 4.1.
- If they actually have no idea what they mean but are trying to fill in a space in a line item on a form that leaves a space for a number then they can make one up, since there is no answer anyway....
- (I could go on but you get the point, I think!)
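To make the list above a little more concrete, here is a minimal Python sketch that asks separate questions about one character instead of asking for a single version number. Python is not something this post discusses; it is only used here as a convenient runtime, and the helper name `describe_support` is hypothetical. `unicodedata.unidata_version` reports which version of the Unicode Character Database this particular runtime uses for properties, which is a different fact from whether the character survives a UTF-8 round trip.

```python
import unicodedata


def describe_support(ch):
    """Collect several independent 'is it supported?' facts for one character.

    Hypothetical helper for illustration only; each entry answers a
    different question, and none of them is "the" Unicode version.
    """
    return {
        # Which code point are we talking about?
        "code_point": f"U+{ord(ch):04X}",
        # Version of the Unicode Character Database bundled with this runtime.
        "ucd_version": unicodedata.unidata_version,
        # Does the runtime's property data know a name for this character?
        "has_name": unicodedata.name(ch, None) is not None,
        # General category according to the runtime's property data.
        "category": unicodedata.category(ch),
        # Encoding conformance is a separate matter from property data.
        "utf8_round_trip": ch.encode("utf-8").decode("utf-8") == ch,
    }


report = describe_support("\U0001D356")
for key, value in report.items():
    print(f"{key}: {value}")
```

The round trip can succeed on a runtime whose property data is older than the character, which is the sense in which one version number does not answer the question.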
So the polite answer in the end is IT DEPENDS ON WHAT YOU MEAN. CAN YOU ELABORATE A BIT?
But for now, I am going to stick with my new answer.
Perhaps it is ornery.
But I think Wendy would want it this way.... :-)
This post brought to you by "𝍖" (U+1d356, a.k.a. TETRAGRAM FOR FOSTERING)
# Roman Belenov on 23 Dec 2005 4:17 AM:
It's a silly question, but -
Can you recommend some font that includes today's featured code point ?
# Michael S. Kaplan on 23 Dec 2005 10:27 AM:
Not a silly question at all, Roman!
If you click on the fileformat.info link for the code point, there is a link on that page with font(s) that support the character....
# Chris on 23 Dec 2005 1:04 PM:
I am so glad you posted this. I recently worked on a Unicode enablement project where the QA manager demanded to know what version of Unicode we would support. Frankly I was at a loss for an answer! Anyway Mr. QA Manager went ahead and decided that it was X.X version and that's what it said in the test plan.
# Roman Belenov on 26 Dec 2005 4:05 AM:
# Nick Lamb on 28 Dec 2005 2:20 PM:
Although agreeing with the general line of argument that it's meaningless to ask such a vague question, it seems to me that Windows -- at least in the Japanese and Korean locales -- can't conform to any version of Unicode because they all require ASCII and they all include a normative requirement that characters appear (if at all) as described in the standard... No exceptions for mistakes made in the 1980s or 1990s by enthusiastic but misguided programmers from the Raymond Chen school.
# Michael S. Kaplan on 28 Dec 2005 3:35 PM:
I am not sure anyone in Unicode really does consider what Windows does with the path separator to be a Unicode conformance issue, as fate would have it....
# Nick Lamb on 28 Dec 2005 4:40 PM:
That's an odd answer, care to elaborate?
# Michael S. Kaplan on 28 Dec 2005 7:55 PM:
As a standard, Unicode rules are pretty clear, but they are more willing to take into account the things that customers actually do while using the standard. So what Microsoft does with the reverse solidus for backcompat with filesystems that have been working continuously since before Unicode was even started is not the sort of thing that the folks who maintain the standard get snippy about.
# Blake on 11 Dec 2008 11:15 AM:
Is there a specific version of Unicode which SQL Server 2000 complies with, with regard to languages or Unicode scripts?
I guess the better question is what languages are capable of being encoded in UCS-2 using SQL Server 2000. If the answer is not available, are you able to provide a list of Unicode scripts which are supported?
# Michael S. Kaplan on 11 Dec 2008 11:22 AM:
You can STORE data of any version, even ones that don't exist yet. Beyond storage, it depends on the operation....
# Michael S. Kaplan on 11 Dec 2008 9:35 PM:
Sorry Blake, I accidentally deleted your reply (some overactive spam filtering). The part I managed to save:
"Thanks for the quick response. I'm interested in performing string manipulation, searching, ordering etc. on stored text data in Unicode. I understand that UCS-2 is used as the encoding which doesn't support surrogate pairs rather a subset of the full"
This is a common misconception. EVERY version of Unicode has supported the notion of not mucking with unassigned characters; from the standpoint of SQL Server, these are just things you can store. Now perhaps they aren't legal as identifiers and such. The data will be just fine in any Unicode column...
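As a rough illustration of the storage point (plain Python, not SQL Server, and the helper `utf16_units` is a hypothetical name), the sketch below shows that a supplementary character such as U+1D356 is just two 16-bit code units in UTF-16, and that a code point the runtime's own character database considers unassigned still round-trips through 16-bit storage unchanged.

```python
import unicodedata


def utf16_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A supplementary-plane character occupies two code units (a surrogate pair).
tetragram = "\U0001D356"
units = utf16_units(tetragram)
print([f"{u:04X}" for u in units])

# Pick the first BMP code point that this runtime's database calls unassigned
# (general category "Cn"); which one that is depends on the database version.
unassigned = next(
    chr(cp) for cp in range(0x10000)
    if unicodedata.category(chr(cp)) == "Cn"
)

# Storing and retrieving needs no knowledge of what the code points mean.
stored = (tetragram + unassigned).encode("utf-16-le")
restored = stored.decode("utf-16-le")
print(restored == tetragram + unassigned)
```

Operations that interpret the text, such as ordering or case mapping, are where the data behind the operation starts to matter; storage alone does not care.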