What do you get when you combine a base character with a buttload of diacritics?
by Michael S. Kaplan, published on 2006/02/17 04:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/17/533929.aspx
The other day I was looking at a particular bug repro (it was actually that BACKSPACE vs. DELETE bug I have mentioned before, if you were curious).
Anyway, I decided to take the letter a and put as many different diacritics on it as I could. Here it is:
And here are the code points:
0061 0300 0301 0302 0303 0304 0305 0306 0307 0308 0309 030a 030b 030c 030d 030e 030f 0310 0311 0312 0313 0314 0315 0316 0317 0318 0319 031a 031b 031c 031d 031e 031f 0320 0321 0322 0323 0324 0325 0326 0327 0328 0329 032a 032b 032c 032d 032e 032f 032f 0330 0331 0332 0333 0334 0335 0336 0337 0338 0339 033a 033b 033c 033d 033e 033f 0340 0341 0342 0343 0344 0345 0346 0347 0348 0349 034a 034b 034c 034d 034e 0360 0361
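If you want to build the same string yourself, here is a minimal Python sketch. It assumes the list above is simply every code point from U+0300 through U+034E followed by U+0360 and U+0361, and it ignores the repeated 032f. The variable names are mine, not anything from the original repro.

```python
import unicodedata

# Assumption: the marks are the contiguous run U+0300..U+034E plus U+0360 and U+0361.
# The repeated 032f in the list above is not reproduced here.
mark_code_points = list(range(0x0300, 0x034F)) + [0x0360, 0x0361]

base = "\u0061"  # LATIN SMALL LETTER A
text = base + "".join(chr(cp) for cp in mark_code_points)

print(len(text))               # 82 code points: one base letter plus 81 marks
print(unicodedata.name(base))  # LATIN SMALL LETTER A

# Every character after the base is a nonspacing mark (general category Mn).
print(all(unicodedata.category(ch) == "Mn" for ch in text[1:]))  # True

# Canonical combining classes hint at where each mark attaches
# (230 is "above", 220 is "below"); print a few to get a feel for it.
for ch in text[1:6]:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)} {unicodedata.name(ch)}")
```

Paste the resulting string into an editor and try a few fonts; how the marks end up positioned is entirely up to the font and the shaping engine.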
Here is how it looked in Notepad:
Scary, huh? Somewhere under that mess is a letter a. Tahoma does have a specific way of dealing with those diacritics that just seems to not do so well in outrageous situations, huh? :-)
Then, on the advice of someone on the typography team, I tried Segoe UI, one of the new Vista fonts. The results were a little different:
Doesn't it seem like some of them are missing?
So I talked to Judy (Safran-Aasen) and Simon (Daniels) and they suggested I remember that just because I can't see something doesn't mean that it isn't there. They suggested I try looking at it in Word while increasing the before and after point size per line. So I did and suddenly saw what they were expecting (here it is with both fonts, side by side):
It is very difficult to explain why I think this is so cool. You may just have to accept that there are two kinds of people -- people who think it is cool and people who think those other folks need counseling.
Or maybe it is that there are 10 kinds of people (those who understand binary and those who don't).
Obviously there is no perfect solution to this sort of unreal situation, but I think the stacking behavior may be much cooler for a whole bunch of normal cases.
Now what these weird cases do for sorting is a different story, one that I will talk about another time....
This post brought to you by "a" (U+0061, LATIN SMALL LETTER A)
# Serge Wautier on 17 Feb 2006 4:27 AM:
I never figured one could create smileys using diacritics only :-)
Speaking of these 10 categories, I guess you fall in the one where people can't make a difference between Halloween and Christmas because Dec 25 and Oct 31 are no different...
# check on 17 Feb 2006 4:28 AM:
# aidan_walsh on 17 Feb 2006 4:30 AM:
"What do you get when you combine a base character with a buttload of diacritics?"
Notepad getting burninated!
# Rosyna on 17 Feb 2006 5:04 AM:
Is there a smiley face hidden in there...?
# Michael S. Kaplan on 17 Feb 2006 8:39 AM:
Not intentionally -- but if you stack the right diacritics....
The Unicode version of ASCII art!!!! :-)
# Ben Cooke on 17 Feb 2006 1:25 PM:
# Mihai on 17 Feb 2006 2:23 PM:
This is 100% useless and I don't think there is anything reasonable in the whole mess. Nobody should expect such a thing to render or sort “properly” (whatever that means).
I think software should try to solve real needs first.
At the same time I think it is cool :-) Probably in the same way some people throw paint on a wall and call it art. It is random, illogical, unexpected.
# Michael S. Kaplan on 17 Feb 2006 4:24 PM:
Is the extreme case useless? Of course it is. But is the technology useful for handling cases that are more extreme than what we might currently do but still within user requirements? Definitely!
The fact that the very extreme case does something is a side effect of the generic framework that is put in place....
# Michael S. Kaplan on 17 Feb 2006 4:27 PM:
I agree, that is amazing. :-)
# Maurits [MSFT] on 17 Feb 2006 5:20 PM:
... one massive "CharNext(ch) - ch" delta.
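To put a rough number on that delta, here is a hedged sketch in Python. It is not how CharNext is implemented; the hypothetical helper cluster_length just counts a base character plus every combining mark (general category M*) that follows it, which is a simplification of real text segmentation.

```python
import unicodedata

def cluster_length(text: str, start: int = 0) -> int:
    """Count code points from text[start] through the combining marks that follow it.

    Simplified assumption: a cluster is one character plus any run of
    characters whose general category starts with "M" (Mn, Mc, Me).
    """
    if start >= len(text):
        return 0
    end = start + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end - start

# Same assumed string as the post: "a" plus U+0300..U+034E, U+0360, U+0361.
stacked = "a" + "".join(chr(cp) for cp in range(0x0300, 0x034F)) + "\u0360\u0361"

print(cluster_length(stacked))  # 82
print(cluster_length("ab"))     # 1
```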
# Michael S. Kaplan on 17 Feb 2006 10:48 PM:
Great answer to that riddle. :-)
# Vorn on 18 Feb 2006 4:20 AM:
...I need therapy.
# Ben Cooke on 18 Feb 2006 6:30 AM:
Further to my previous post containing a big mess of diacritics, I notice that my browser highlights only the base characters when I select that text and not the diacritics. I wonder why that is?
# Si on 18 Feb 2006 9:55 AM:
Should just add that the OpenType Layout tables that do this, 'mark to base' and 'mark to mark' glyph positioning, are also in the Windows Vista versions of Tahoma, Microsoft Sans Serif, Arial and Times New Roman. Also the marks are not just automatically centered above or below the base characters, over 80,000 combinations (per font file) were visually proofed and many combinations manually positioned to get the best results.
# Michael Dunn_ on 18 Feb 2006 9:57 PM:
I would imagine that the stacking behavior would be preferable, just thinking of cases like Vietnamese and IPA where it's common to have 2 modifiers on a letter.
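For a concrete look at the two-modifier case, this small Python sketch decomposes one Vietnamese letter with canonical decomposition (NFD). The choice of U+1EBF as the example is mine; it only illustrates that a single letter can carry two stacked marks.

```python
import unicodedata

precomposed = "\u1ebf"  # LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE
decomposed = unicodedata.normalize("NFD", precomposed)

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0065 LATIN SMALL LETTER E
# U+0302 COMBINING CIRCUMFLEX ACCENT
# U+0301 COMBINING ACUTE ACCENT
```

When the text arrives in decomposed form like this, the font has to place the acute relative to the circumflex rather than directly on the base letter.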