If they aren't in Unicode, what could they be in? And you can [smart-]quote me on that!

by Michael S. Kaplan, published on 2008/03/12 07:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/03/12/8163426.aspx

Please read the disclaimer; content of Michael Kaplan's blog not approved by Microsoft!

The other day Joe Clark asked me via the Contact link:

Does Word still use non-Unicode characters for smart quotes?

Now Joe is someone I might be talking about soon, as I have some more thoughts about subtitling and closed captioning and he is quite a master about those things and more, as you can see from this site of his. So I do tend to pay attention when he asks me questions....

He included a link with the source of his concerns, a thread on smart quotes and how much some people hate them, right here:

I've submitted articles to magazines where they say in advance, no smart quotes.
posted by dances_with_sneetches at 4:28 PM on March 2

I've submitted articles to magazines where they say in advance, no smart quotes.
That's because Word's Smart Quotes are Windows-specific characters and drive some publishing typesetting software nuts, especially on the Mac. If you look at the printed mag I'll bet you hojillions they're printing typographer's quotes.
posted by bonaldi at 4:49 PM on March 2

Now everyone knows I am no fan of smart quotes in Word (ref: here and here, for example).

But to be honest, in a weird burst of provincialism, all of the so-called ANSI code pages have the following entries in them (whether the languages/scripts that use the code pages need them or not, perhaps even for the sake of Word):

0x91   <--->   0x2018     (Left Single Quotation Mark)
0x92   <--->   0x2019     (Right Single Quotation Mark)
0x93   <--->   0x201c     (Left Double Quotation Mark)
0x94   <--->   0x201d     (Right Double Quotation Mark)

Even so, you'll notice that these are all mappings to Unicode.

That's right, Unicode.

So every single character in there is gonna be Unicode whenever the document is (which is often).
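For anyone who wants to check those four mappings, here is a minimal Python sketch. It assumes Python's built-in `cp1252` codec is a fair stand-in for a Windows "ANSI" code page; the name `smart_quote_mappings` is just an illustrative helper, not anything Word or Windows uses.

```python
# Bytes 0x91-0x94 as they would appear in a Windows "ANSI" code page.
SMART_QUOTE_BYTES = bytes([0x91, 0x92, 0x93, 0x94])


def smart_quote_mappings(codepage="cp1252"):
    """Return (byte value, Unicode code point, character) for each smart-quote byte."""
    text = SMART_QUOTE_BYTES.decode(codepage)
    return [(b, ord(ch), ch) for b, ch in zip(SMART_QUOTE_BYTES, text)]


for byte, codepoint, ch in smart_quote_mappings():
    # Show the hex code point and its decimal value side by side.
    print(f"0x{byte:02X}   <--->   U+{codepoint:04X}   ({codepoint} decimal)   {ch}")
```

Passing another single-byte Windows code page name, such as `"cp1251"`, should give the same four code points.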

Those publishers who complain?

For the most part they complain because they use programs like QuarkXPress that don't support Unicode, and sometimes (reportedly) fonts that don't support these characters well.

Perhaps if they would get on the ball and support a Unicode version of a program and a font for their publishing, everything would be good. :-)



# Mike Dimmick on 12 Mar 2008 10:34 AM:

To be fair, those byte-oriented characters are from Windows-1252 (typically) which is a Microsoft extension to ISO-8859-1, and some software in the past hasn't understood that there's a difference. (Internet Explorer was traditionally happy to accept 0x92 = U+2019 in a document declared as iso-8859-1, and some versions of Outlook would generate it that way.) 0x80-0x9F are C1 control codes in 8859-1.

Legacy Mac apps would normally be expecting MacRoman (http://en.wikipedia.org/wiki/MacRoman) where 0x92 = U+00ED. MacRoman has got typographer's quotes but they're at 0xD2 - 0xD5 (and the double quotes appear before the single ones). Given the breadth of different byte-oriented codepages, I would have expected QuarkXPress to be able to handle Windows-1252, though. Perhaps it's a case of operator error, not setting an option correctly for importing a Windows file?
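The code-page differences described in the comments above can be seen by decoding the same byte under each encoding. This is a rough sketch using Python's built-in `cp1252`, `latin-1` and `mac_roman` codecs as stand-ins for Windows-1252, ISO-8859-1 and MacRoman; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(byte_value, encoding):
    """Decode one byte and return its code point and Unicode character name."""
    ch = bytes([byte_value]).decode(encoding)
    # C1 control characters have no Unicode name, so supply a fallback label.
    return f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>")


# The same byte, 0x92, read three different ways.
for enc in ("cp1252", "latin-1", "mac_roman"):
    print(enc, describe(0x92, enc))

# MacRoman keeps its typographer's quotes at 0xD2-0xD5, double quotes first.
mac_quotes = bytes([0xD2, 0xD3, 0xD4, 0xD5]).decode("mac_roman")
print([f"U+{ord(ch):04X}" for ch in mac_quotes])
```

A file written as Windows-1252 but read as either of the other two encodings would lose its curly quotes, which fits the import-option theory.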

# Dean Harding on 12 Mar 2008 6:06 PM:

Perhaps it's because the typesetters like doing their own quotes, rather than relying on Word's implementation?

# Jennifer Thomas on 5 Jan 2009 1:12 PM:

I'm using quotes as delimiters in VBA and could not figure out where the heck they got those 8220 8221 codes - thanks for clearing up the mystery!


