by Michael S. Kaplan, published on 2005/12/21 04:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/21/506248.aspx
I have often talked about how awesome I think the MSDN Product Feedback Center is. But I do have to admit that sometimes bugs can kind of get lost....
As an example, Ben Monroe sent me the following via the contact link:
Greetings. I enjoy reading your blog. I find many of the issues to be quite interesting. Thanks for your contributions.
I regularly tested and continue to test VS 2005. I came across an issue with surrogate handling. I reported it, but there hasn't been a response in 17 months. Perhaps it has been overlooked. If you have the time or interest, would you please review it or make sure that another capable person sees it?
"Backspace of Unicode characters above U+FFFF results in unpaired surrogates"
Bug ID: FDBK10993
(http://lab.msdn.microsoft.com/ProductFeedback/viewFeedback.aspx?FeedbackId=ffbcc915-2c0d-4f83-9b3c-107a090a4bf3)
Thanks.
Allow me to apologize, Ben.
No one meant to be unresponsive, but this particular bug had been reported several different times, by many different people (I saw over a dozen bug reports put in across several product bug databases!). Unfortunately, at some point the bug report was resolved as a duplicate of a duplicate and so on, and the follow-up information was not transferred to the active bug report (so the old bug report kept getting your updates, but no one knew to look at it anymore since it was not active).
Again, I am sorry about that.
For the primary bug itself, the actual problem is an issue in the core OS control (you can reproduce the issue in Notepad, for example) and the managed controls are really wrappers around the core controls in this case. So the active bug in the "Whidbey" bug database was resolved as an 'external issue' while the bug report was put into the Windows bug database.
Believe it or not, the behavior is mostly intentional!
You see, in situations where one is typing digraphs or other compounds that are considered to be a single character to the user even when they are not produced by a single keystroke on the keyboard, having the DELETE key get rid of the entire text element while the BACKSPACE key only removes the last piece typed is an intentional decision. It allows you to avoid the frustration of having to retype 2-4 or more characters if you only mis-typed the last one.
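To make the distinction concrete, here is a minimal sketch in Python. It is not how the Windows edit control is implemented; the function names `backspace` and `delete_at` are hypothetical, and treating "a base character plus any following characters with a non-zero combining class" as a text element is a simplifying assumption (real text element boundaries are more involved).

```python
import unicodedata


def backspace(text: str) -> str:
    """Remove only the last code point, i.e. the most recently typed piece."""
    return text[:-1]


def delete_at(text: str, index: int) -> str:
    """Remove the whole text element that starts at `index`.

    Assumption for this sketch: a text element is one character plus any
    directly following characters whose canonical combining class is non-zero.
    """
    end = index + 1
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    return text[:index] + text[end:]


# "e" followed by COMBINING ACUTE ACCENT and COMBINING DIAERESIS
typed = "e\u0301\u0308"
after_backspace = backspace(typed)        # only the last mark is removed
after_delete = delete_at(typed + "x", 0)  # the whole element is removed
```

With this split, someone who mis-typed only the final mark can press BACKSPACE once and retype that mark, rather than rebuilding the whole sequence.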
However, while this is a reasonable plan of action in most cases, it is obviously not so reasonable in the case of a supplementary character, since it is unlikely that a keyboard has the high and low surrogate code units on individual keys. In this case it would probably make more sense to delete both the high and low surrogate rather than just the trailing low surrogate, especially considering that there really is no recovery other than hitting BACKSPACE again to remove the orphaned high surrogate. And all of that is ignoring the very real concerns that Ben pointed out about having unpaired surrogates in data streams, a situation that is definitely best avoided.
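The difference between the two behaviors can be sketched over a buffer of UTF-16 code units. This is only an illustration of the idea under stated assumptions, not the actual control code or the actual fix; the helper names (`to_utf16_units`, `naive_backspace`, `surrogate_aware_backspace`) are made up for the example.

```python
from typing import List


def to_utf16_units(text: str) -> List[int]:
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def naive_backspace(units: List[int]) -> List[int]:
    """Drop one code unit; this can leave an unpaired high surrogate behind."""
    return units[:-1]


def surrogate_aware_backspace(units: List[int]) -> List[int]:
    """Drop a trailing surrogate pair as a whole, otherwise one code unit."""
    if (len(units) >= 2
            and is_low_surrogate(units[-1])
            and is_high_surrogate(units[-2])):
        return units[:-2]
    return units[:-1]


# "a" followed by U+10080, which is D800 DC80 in UTF-16
units = to_utf16_units("a\U00010080")
broken = naive_backspace(units)            # ends in a lone D800
clean = surrogate_aware_backspace(units)   # only the "a" remains
```

Note that the surrogate-aware version still removes a single code unit in every other case, so the per-piece BACKSPACE behavior for non-supplementary characters is unchanged.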
The bug in the Windows database was reported fixed a few months ago, but was never marked as resolved (and does not appear to be fixed in the latest build I have installed on my machine, from a few days ago, so it is probably not fixed in the December CTP build, either).
I will keep an eye on this bug in any case, not only to make sure it is fixed but also to make sure the desired behavior for the non-supplementary character scenario is not unintentionally broken.
In any case, I am sorry that letting you know kind of fell through the cracks. The MSDN Product Feedback Center is still a very awesome resource, and at the very least the issue itself was still being tracked. We've got your back on this one!
This post brought to you by "𐂀" (U+10080, a.k.a. LINEAR B IDEOGRAM B100 MAN)
Pavel Radzivilovsky on 6 Dec 2009 12:55 AM:
Then you guys mixed two unrelated things together.
Deleting part of a character by backspace is legitimate and may have a meaning.
It should have nothing to do with how many bytes it takes in UTF-16 or any other encoding of Unicode.
Especially given that UTF-16 will hopefully not survive in Windows - the UI should be designed regardless of underlying text encoding.
Michael S. Kaplan on 6 Dec 2009 7:26 AM:
UTF-16 is not going away, so your premise is somewhat faulty, Pavel.
referenced by
2012/04/27 Should considering UTF-16 be harmful be considered harmful?
2008/10/06 UCS-2 to UTF-16, Part 4: Talking about the ask
2006/08/11 Are ligatures supposed to be thought of as 'single characters' ?
2006/06/22 Things I [don't] like about blogging
2006/06/21 Give me a break [Char] !
2006/02/17 What do you get when you combine a base character with a buttload of diacritics?
2005/12/30 More on cursor movement