“The ‘Smart Quotes’ are leaking all over the Internet!”

by Michael S. Kaplan, published on 2013/04/03 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2013/04/03/10407114.aspx


I've talked about Word's "Smart Quotes" many times over the years, in blogs like this one and this one and this one and this one and this one.

Oh, and then there's this other one.

I like that last one best since Gwyneth from Office gave me that cool table from some version of Word or another.

In fact, I like it so much that I'll show it again here:

| Symbol | Unicode Value | Language |
| --- | --- | --- |
| 'O' | U+0027 | Danish, Dutch, English, Finnish, Norwegian, Swedish |
| "O" | U+0022 | Danish, Dutch, English, Finnish, Norwegian, Swedish |
| ''O'' | U+0027 | Danish, Dutch, English, Finnish, Norwegian, Swedish |
| ‘O’ | U+2018, U+2019 | Dutch, English, Italian, Norwegian, Portuguese, Spanish |
| ‛O’ | U+201B, U+2019 | Dutch, English, Italian, Spanish |
| ’O’ | U+2019 | Danish, Finnish, Hungarian, Norwegian, Swedish |
| ‚O‘ | U+201A, U+2018 | Bulgarian, Czech, German, Icelandic, Lettish, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Ukrainian |
| ‚O’ | U+201A, U+2019 | Afrikaans, Dutch |
| ‛O‚ | U+201B, U+201A | Greek, Italian, Turkish |
| “O” | U+201C, U+201D | Dutch, English, Italian, Portuguese, Spanish, Turkish |
| ‟O” | U+201F, U+201D | Dutch, English, Italian, Portuguese, Spanish, Turkish |
| „O“ | U+201E, U+201C | Bulgarian, Czech, German, Icelandic, Lettish, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Sorbish |
| „O” | U+201E, U+201D | Afrikaans, Danish, Dutch, Hungarian, Polish, Russian |
| ”O” | U+201D | Danish, Finnish, Norwegian, Swedish |
| “O„ | U+201C, U+201E | Greek, Italian, Turkish |
| ‹ O › | U+2039, U+203A | Albanian, Byelorussian, Estonian, French, Greek, Italian, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Turkish |
| ›O‹ | U+203A, U+2039 | Danish, Polish, Serbian, Slovak, Slovenian |
| ›O› | U+203A | Finnish, Swedish |
| « O » | U+00AB, U+00BB | Albanian, Byelorussian, Dutch, Estonian, French, Greek, Italian, Lettish, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian |
| »O« | U+00BB, U+00AB | Croatian, Danish, German, Hungarian, Polish, Serbian, Slovak, Slovenian |
| »O» | U+00BB | Finnish, Swedish |
| 〝O〟 | U+301D, U+301F | East Asian |
| 〞O〟 | U+301E, U+301F | East Asian |
| 「O」 | U+300C, U+300D | East Asian |
| 『O』 | U+300E, U+300F | East Asian |
| ׳O׳ | U+05F3 | Hebrew |
| ״O״ | U+05F4 | Hebrew |

But it isn't just Word.

PowerPoint does something like this from their own list.

OneNote does some of this too.

And Publisher.

Outlook picks it up from Word, of course.

There are probably others.

I think there might be a shared Office service in the latest version that some (but not all!) of them use.

And after all these years it has leaked elsewhere.

Like to blogging software and apps.

Just the other day I was asked to comment on a Performance Counter which, when localized into French, turned an ordinary U+0027 to a U+2019.

They definitely preferred the way it looked, but I was reflecting on the fact that none of our keyboards have these fancy quotes.

What if some developer writes a script comparing the name? Better use the ID value, just to be safe!
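To make the worry concrete, here is a minimal Python sketch of such a script. The counter name is a made-up stand-in (not the actual counter I was asked about), and `fold_apostrophes` is a hypothetical helper, not anything that ships with Windows.

```python
import unicodedata

# Hypothetical counter name, made up for illustration only.
typed = "Nombre d'erreurs"           # U+0027, as typed on an ordinary keyboard
localized = "Nombre d\u2019erreurs"  # U+2019, as it might appear after localization

# The two strings look nearly identical but do not compare equal.
print(typed == localized)  # False

# Unicode normalization does not help: U+2019 has no decomposition to U+0027.
print(unicodedata.normalize("NFKC", localized) == typed)  # False


def fold_apostrophes(name: str) -> str:
    """Hypothetical helper: map U+2019 back to U+0027 before comparing."""
    return name.replace("\u2019", "'")


print(fold_apostrophes(localized) == typed)  # True
```

Folding characters like this is only a workaround, and it has to be repeated in every script that does the comparison, which is exactly why the ID is the safer thing to compare.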

I'm afraid to ask whether

  1. they did the work in Word, or
  2. they did the work in some other Office app that does this, or
  3. they used some non-Microsoft Suite that does it, or
  4. their localization tools did this, or
  5. they used MSKLC and added the smart quotes to their own keyboard.

Only one answer (#5) would make me blush and smile; all the rest would just make me a little bit sad.

I shudder to think of how many will mangle this blog's title....

“The ‘Smart Quotes’ are leaking all over the Internet!”


Doug Ewell on 3 Apr 2013 7:30 AM:

I don't know what that bogus double-apostrophes version ''O'' is doing on the chart, but it can't be any worse than the horrible ``O'' hack that Unix/Linux fanboys use, seemingly to prove they're not Microsoft. Sometimes I'd almost rather prefer boxes or question marks.

I did add the directional quotes (among much else) to my own keyboard using MSKLC.

Heath Stewart on 3 Apr 2013 9:44 AM:

It's also fun when writing code snippets or command lines for users to run. As you pointed out, a lot of blogging and other WYSIWYG editors use smart quotes or EM dashes and I probably spend more time scrubbing snippets or command lines from those than writing the meat of them. If I don't, it never fails that someone copies and pastes and the command fails. That gets masked in cmd.exe, where smart quotes - when copied and pasted into the console buffer - look like regular quotes with fixed width fonts. That's tripped me up a few times with cryptic error messages. Someone needs to give smart quotes a V8 and end the madness!
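A scrubbing pass of the kind described above might look like the following Python sketch. The mapping table, the `scrub` function, and the `tool.exe` command line are all hypothetical, and turning an em dash back into `--` is an assumption about what the editor autocorrected in the first place.

```python
# Map "smart" punctuation back to the ASCII a shell expects.
# The set of characters covered here is illustrative, not exhaustive.
SCRUB_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash (assumed to be an autocorrected "--")
})


def scrub(snippet: str) -> str:
    """Return the snippet with smart punctuation replaced by ASCII."""
    return snippet.translate(SCRUB_TABLE)


# Hypothetical command line as it might come out of a WYSIWYG editor.
pasted = "tool.exe \u2014verbose \u201Cmy file.txt\u201D"
print(scrub(pasted))  # tool.exe --verbose "my file.txt"
```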

Joshua on 3 Apr 2013 12:59 PM:

> horrible ``O'' hack

Try it with the old fonts. ` and ' are mirrors of each other.

Azarien on 4 Apr 2013 3:09 AM:

I use a 3rd party keyboard layout with various smart quotes: «» „” “” ‚’ ‘’ (and the ones I'm using most of the time are these: „”)

Max Battcher on 4 Apr 2013 7:44 AM:

Yes, the Colemak keyboard layout has a bunch of the “smart” quotes directly as AltGr combinations. Then some fun things in Windows-only (currently) that use the AltGr+\ dead key, like ellipsis at AltGr+\.… It's a good way to test Unicode support in random apps when you can quickly type things like that.

Also, I know of Python and Ruby libraries that tackle smart quote automation. (I use a Python library on my own blog, for example, because they do look nice in reading long articles. My blog posts are, for the most part, stored with straight quotes and the library replaces them on output.)
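As a rough illustration of the replace-on-output idea, here is a minimal Python sketch. It is not the API of any particular library; `educate_quotes` is a hypothetical name, and the rule it uses (a quote at the start of the text, or after whitespace or an opening bracket, is an opening quote) is a simplifying assumption.

```python
import re


def educate_quotes(text: str) -> str:
    """Replace straight quotes with directional ones using a naive rule."""
    # Double quotes: opening if at the start or after whitespace/bracket.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201C", text)
    text = text.replace('"', "\u201D")
    # Single quotes: same rule, and also opening right after an opening “.
    text = re.sub(r"(^|[\s(\[{\u201C])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text


stored = "\"The 'Smart Quotes' are leaking all over the Internet!\""
print(educate_quotes(stored))
# “The ‘Smart Quotes’ are leaking all over the Internet!”
```

A rule this simple gets some cases wrong. For example, it turns the apostrophe in "the '80s" into an opening quote (U+2018) rather than U+2019.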

Samuel A. Falvo II on 4 Apr 2013 8:51 AM:

Doug, the double-tick quotes predate Windows by almost two decades, and possibly longer.  Not only are they characters that exist on the common, every-day keyboard, they are representable in plain-text documents without the aid of GUIs or special readers (e.g., WordPad, Word, HTML graphics engines, etc.), and when fed through typesetting systems such as TeX and any of its macro packages like LaTeX, they get converted to the proper opening and closing quotation marks automatically.

So, this has nothing whatsoever to do with ``proving we're different from Microsoft,'' and everything to do with historical convention.  If anything, it was Microsoft that wanted to be different by coming up with the SmartQuote feature, with the full knowledge that an established, cross-platform convention already existed.  And, as an early Windows user myself, I distinctly remember numerous documents using the convention before SmartQuotes was pushed as a notable feature of Word.

Doug Ewell on 5 Apr 2013 10:13 AM:

I remember the paired double-tick quoting convention, and the fonts that made them look halfway decent, from back in the ’80s. They were an attempt to implement directional quotes in an ASCII-only world, similar to conventions like “nai:ve” and “cafe'” (I remember seeing both of these). And as Samuel said, they could be automatically converted for more capable systems. But they always looked at least a bit awkward in monospaced fonts (which were typical for that era), and downright crappy when the glyph for ' is not a mirror of `.

Latin-1 added ï and é, and other supra-ASCII character sets like Windows-1252 (and of course Unicode) added “ and ”, but that is little help to users with ASCII-only keyboards. That’s why automatic conversion features like “smart quotes” were created, and they are really no different from the TeX feature Samuel mentioned. Users who have typed `` and '' for decades probably won’t relearn their habit, but smart quotes could easily be applied to those sequences as well. (Users who are *just now* learning to type `` and '' are, I still contend, waging a protest of some sort.)

People don’t like smart quotes because the conversion algorithm from straight quotes isn’t perfect in every case, and can’t be, and because some time ago, some tools that applied smart quotes also happened to declare the wrong character set. Document-creation tools that labeled Windows-1252 text as ISO 8859-1 became legendary and infamous; the reputation lasted long after the actual bugs were fixed. Readers that assume any non-ASCII text must be 8859-1 are their own problem.

I long for the day when ordinary users do think of Unicode text as “plain-text documents,” not something arcane that requires “GUIs or special readers.”
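The character-set mislabeling mentioned above can be reproduced in a few lines of Python. This is only a sketch of the failure mode, using Python's built-in `cp1252` and `latin-1` codecs. It does not model any specific tool.

```python
text = "\u201Csmart\u201D"   # “smart” with directional double quotes

# Windows-1252 stores the two quotes as the single bytes 0x93 and 0x94.
raw = text.encode("cp1252")
print(raw)                   # b'\x93smart\x94'

# A reader that trusts an ISO 8859-1 label maps those bytes to the
# C1 control characters U+0093 and U+0094 instead of quotation marks.
wrong = raw.decode("latin-1")
print(repr(wrong))           # '\x93smart\x94'

# Decoding with the character set that was actually used restores the text.
right = raw.decode("cp1252")
print(right == text)         # True
```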

Kwpolska on 5 Apr 2013 10:56 AM:

> They definitely preferred the way it looked, but I was reflecting on the fact that none of our keyboards have these fancy quotes.

Does having them in my fancily-customized keyboard layout based on the X11 pl layout (already having “‘”’ — I added „‚ in terms of quotes) count?

