Best (Better than nothing) fit mappings, unleashed, #1

by Michael S. Kaplan, published on 2005/02/15 16:47 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/02/15/373713.aspx


The other day, in "If the shoe [best-]fits....", I talked about best fit mappings in Windows code pages. I thought it might be fun to try to unravel them a bit to see what was in there.

Note that these were all found using code that anyone can write -- just map every character in Unicode into a particular code page and save all of the ones that do not end up as the default character. There is a method to the madness (or is that madness to the method?). In the table below, only code points in cp1252 that have best fit mappings are shown. The first code point on each line is always the real two-way mapping; the others are one-way only.
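The "map everything and keep what does not hit the default character" idea can be sketched roughly as follows. This is only an illustration, not the code used for this post: `collect_best_fits`, `encode_one`, `decode_one` and `windows_encode_one` are made-up names, the scan is limited to the BMP, and the Windows hook assumes it runs on Windows, where it calls the real `WideCharToMultiByte` API through `ctypes` with `dwFlags` set to 0 so that best fit stays enabled.

```python
from collections import defaultdict


def collect_best_fits(encode_one, decode_one, code_points=range(0x10000)):
    """Return {byte: [round_trip_cp, best_fit_cp, ...]} for one code page.

    encode_one(ch)   -> byte value, or None if the default character was used.
    decode_one(byte) -> the character that byte decodes to (the two-way mapping).
    Both callables are hypothetical hooks supplied by the caller.
    """
    groups = defaultdict(list)
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not characters
        byte = encode_one(chr(cp))
        if byte is not None:
            groups[byte].append(cp)

    table = {}
    for byte, cps in groups.items():
        if len(cps) < 2:
            continue  # only the round-trip mapping, nothing best-fit to report
        real = ord(decode_one(byte))
        # Round-trip code point first, one-way best fits after it.
        table[byte] = sorted(cps, key=lambda cp: (cp != real, cp))
    return table


def windows_encode_one(ch, code_page=1252):
    """Windows-only hook (assumption: a single BMP character is passed in)."""
    import ctypes
    from ctypes import wintypes

    out = ctypes.create_string_buffer(4)
    used_default = wintypes.BOOL(False)
    # dwFlags = 0 leaves best-fit behavior on; lpDefaultChar = None uses
    # the code page's own default character.
    n = ctypes.windll.kernel32.WideCharToMultiByte(
        code_page, 0, ch, 1, out, len(out), None, ctypes.byref(used_default))
    if n != 1 or used_default.value:
        return None
    return out.raw[0]
```

On Windows, something like `collect_best_fits(windows_encode_one, lambda b: bytes([b]).decode("cp1252", errors="replace"))` should produce lines comparable to the table, with Python's own cp1252 codec standing in (as an assumption) for the two-way mapping.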

My favorite one is marked with RED. Full width ones are in DARK GREEN.

" " 0020 2000 2001 2002 2003 2004 2005 2006 3000
"!" 0021 01c3 ff01
""" 0022 02ba 030e ff02
"#" 0023 ff03
"$" 0024 ff04
"%" 0025 066a ff05
"&" 0026 ff06
"'" 0027 02b9 02bc 02c8 2032 ff07
"(" 0028 2320 ff08
")" 0029 2321 ff09
"*" 002a 2217 ff0a
"+" 002b 250c 2510 2514 2518 251c 253c 2552 2553 2554 2555 2556 2557 2558 2559 255a 255b 255c 255d 256a 256b 256c ff0b
"," 002c ff0c
"-" 002d 2010 2011 2212 2500 252c 2534 2550 2564 2565 2566 2567 2568 2569 ff0d
"." 002e ff0e
"/" 002f 2044 2215 ff0f
"0" 0030 2080 ff10
"1" 0031 2081 ff11
"2" 0032 2082 ff12
"3" 0033 2083 ff13
"4" 0034 2074 2084 ff14
"5" 0035 2075 2085 ff15
"6" 0036 2076 2086 ff16
"7" 0037 2077 2087 ff17
"8" 0038 2078 2088 221e ff18
"9" 0039 2089 ff19
"@" 0040 ff20
"A" 0041 0100 0102 0104 01cd 01de ff21
"a" 0061 0101 0103 0105 01ce 01df 03b1 ff41
"B" 0042 212c ff22
"b" 0062 0180 ff42
"C" 0043 0106 0108 010a 010c 2102 212d ff23
"c" 0063 0107 0109 010b 010d ff43
"D" 0044 010e ff24
"d" 0064 010f 0111 03b4 ff44
"E" 0045 0112 0114 0116 0118 011a 2107 2130 ff25
"e" 0065 0113 0115 0117 0119 011b 03b5 212e 212f ff45
"F" 0046 03a6 2131 ff26
"f" 0066 03c6 ff46
"G" 0047 011c 011e 0120 0122 01e4 01e6 0393 ff27
"g" 0067 011d 011f 0121 0123 01e5 01e7 0261 210a ff47
"H" 0048 0124 0126 210b 210c 210d ff28
"h" 0068 0125 0127 04bb 210e ff48
"I" 0049 0128 012a 012c 012e 0130 0197 01cf 2110 2111 ff29
"i" 0069 0129 012b 012d 012f 0131 01d0 ff49
"J" 004a 0134 ff2a
"j" 006a 0135 01f0 ff4a
"K" 004b 0136 01e8 212a ff2b
"k" 006b 0137 01e9 ff4b
"L" 004c 0139 013b 013d 0141 2112 ff2c
"l" 006c 013a 013c 013e 0142 019a 2113 ff4c
"M" 004d 2133 ff2d
"m" 006d ff4d
"N" 004e 0143 0145 0147 2115 ff2e
"n" 006e 0144 0146 0148 207f 2229 ff4e
"O" 004f 014c 014e 0150 019f 01a0 01d1 01ea 01ec 03a9 ff2f
"o" 006f 014d 014f 0151 01a1 01d2 01eb 01ed 2134 ff4f
"P" 0050 20a7 2118 2119 ff30
"p" 0070 03c0 ff50
"Q" 0051 211a ff31
"q" 0071 ff51
"R" 0052 0154 0156 0158 211b 211c 211d ff32
"r" 0072 0155 0157 0159 ff52
"S" 0053 015a 015c 015e 03a3 ff33
"s" 0073 015b 015d 015f 03c3 ff53
"T" 0054 0162 0164 0166 01ae 0398 ff34
"t" 0074 0163 0165 0167 01ab 03c4 ff54
"U" 0055 0168 016a 016c 016e 0170 0172 01af 01d3 01d5 01d7 01d9 01db ff35
"u" 0075 0169 016b 016d 016f 0171 0173 01b0 01d4 01d6 01d8 01da 01dc ff55
"V" 0056 ff36
"v" 0076 221a ff56
"W" 0057 0174 ff37
"w" 0077 0175 ff57
"X" 0058 ff38
"x" 0078 ff58
"Y" 0059 0176 ff39
"y" 0079 0177 ff59
"Z" 005a 0179 017b 2124 2128 ff3a
"z" 007a 017a 017c 01b6 ff5a
"{" 007b ff5b
"|" 007c 01c0 2223 2758 ff5c
"}" 007d ff5d
"~" 007e 0303 223c ff5e
"ƒ" 0192 0191
"¢" 00a2 20a1
"£" 00a3 20a4
"¤" 00a4 263c
"¦" 00a6 2302 2502 2524 2551 255e 255f 2560 2561 2562 2563 2588 258c 2590 2591 2592 2593 25a0
"¨" 00a8 0308
"«" 00ab 226a 300a
"¬" 00ac 2310
"¯" 00af 02c9 0304 0305 2580
"°" 00b0 02da 030a 2070 2218
"±" 00b1 2213
"´" 00b4 02ca 0301
"µ" 00b5 03bc
"·" 00b7 2024 2219 22c5 30fb
"¸" 00b8 0327
"»" 00bb 226b 300b
"Å" 00c5 212b
"Ð" 00d0 0110 0189
"Ø" 00d8 2205
"ß" 00df 03b2

All together, it looks something like this:

!ǃ!"ʺ̎"##$$%٪%&&'ʹʼˈ′'(⌠()⌡)*∗*+┌┐└┘├┼╒╓╔╕╖╗╘╙╚╛╜╝╪╫╬+,,-‐‑−─┬┴═╤╥
╦╧╨╩-../⁄∕/0₀01₁12₂23₃34⁴₄45⁵₅56⁶₆67⁷₇78⁸₈∞89₉9@@AĀĂĄǍǞAaāăąǎǟαaBℬBbƀbCĆĈĊČ
ℂℭCcćĉċčcDĎDdďđδdEĒĔĖĘĚℇℰEeēĕėęěε℮ℯeFΦℱFfφfGĜĞĠĢǤǦΓGgĝğġģǥǧɡℊgHĤĦℋℌℍHhĥħһℎhIĨĪĬĮİƗǏℐℑI
iĩīĭįıǐiJĴJjĵǰjKĶǨKKkķǩkLĹĻĽŁℒLlĺļľłƚℓlMℳMmmNŃŅŇℕNnńņňⁿ∩nOŌŎŐƟƠǑǪǬΩOoōŏőơǒǫǭℴoP₧℘ℙP
pπpQℚQqqRŔŖŘℛℜℝRrŕŗřrSŚŜŞΣSsśŝşσsTŢŤŦƮΘTtţťŧƫτtUŨŪŬŮŰŲƯǓǕǗǙǛUuũūŭůűųưǔǖǘǚǜuVVv√vWŴW
wŵwXXxxYŶYyŷyZŹŻℤℨZzźżƶz{{|ǀ∣❘|}}~̃∼~ƒƑ¢₡£₤¤☼¦⌂│┤║╞╟╠╡╢╣█▌▐░▒▓■¨̈«≪《¬⌐¯ˉ̄̅
▀°˚̊⁰∘±∓´ˊ́µμ·․∙⋅・¸̧»≫》ÅÅÐĐƉØ∅ßβ

Fascinating. I guess they really are better than

!??"???#?$?%??&?'?????(??)??*??+??????????????????????,?-??????????????.?/???0??1??2??3??4???5?
??6???7???8????9??@?A??????a???????B??b??C???????c?????D??d????E????????e?????????F???f??G??
??????g?????????H??????h?????I??????????i???????J??j???K????k???L??????l???????M??m?N?????n???
???O??????????o?????????P????p??Q??q?R???????r????S?????s?????T??????t??????U?????????????u??
???????????V?v??W??w??X?x?Y??y??Z?????z????{?|????}?~???ƒ?¢?£?¤?¦?????????????????¨?«??¬
?¯????°????±?´??µ?·????¸?»??Å?Ð??Ø?ß

aren't they? :-)
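For comparison, the question-mark rendition can be approximated with any converter that has no best fit tables. The sketch below uses Python's built-in cp1252 codec, which (as far as I know) only carries the two-way mappings, as a stand-in; `without_best_fit` is a made-up helper name, and the sample is the "A" line from the table.

```python
def without_best_fit(text, encoding="cp1252"):
    """Round-trip text through a codec with no best-fit table.

    Characters the code page cannot represent come back as "?".
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# "A" followed by its one-way best fits from the table:
# 0100 0102 0104 01cd 01de ff21
sample = "A\u0100\u0102\u0104\u01cd\u01de\uff21"
print(without_best_fit(sample))  # A??????
```

On Windows itself, passing `WC_NO_BEST_FIT_CHARS` to `WideCharToMultiByte` is the documented way to ask for this behavior.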


# Philip Newton on 16 Feb 2005 1:23 AM:

I find it hard to believe that superscript-6 (U+2076) is best-fit mapped to lowercase v (U+0076); is the number 2076 off, perhaps?

Also, you have acute accent in there twice: once between florin sign and cent sign, and once between plus-minus and micron sign.

# Michael Kaplan on 16 Feb 2005 1:59 AM:

Oops.. good catch! I took that bogus entry and the dupe entry out.

Thanks. :-)

referenced by

2012/02/20 Where short file names can fail

2008/05/08 In hindsight, they may have BEST FIT these files where the sun never shines

2006/02/14 Every character has a story #18: U+06cc and U+064a (ARABIC LETTER FARSI YEH and ARABIC LETTER YEH)

2005/10/29 Why an 8-hour day seems to take forever

2005/04/18 A few of the gotchas of WideCharToMultiByte
