Dumb quotes... or maybe they are just smart-ass quotes

by Michael S. Kaplan, published on 2007/02/20 03:31 -08:00, original URI: http://blogs.msdn.com/michkap/archive/2007/02/20/1724946.aspx


(I think I mentioned 'Smart Quotes' previously, in passing.)

If I had a dime for every time someone having trouble getting the Regional and Language Options unattend setting to work posted, as the command line they were running, something like this:

control intl.cpl,, /f:”filename.txt”

Then I'd have to worry about the tax bracket I was going to be put into....

In case you can't see the problem, it is pretty obvious if you blow up the text some:

control intl.cpl,, /f:”filename.txt”

At some point the person was looking at instructions in documentation or in email written by a copy of Outlook that has Word set as its mail editor.

It replaced the regular ASCII quotes with so-called "smart" quotes, which can turn " (U+0022, a.k.a. QUOTATION MARK) into something else such as ” (U+201D, a.k.a. RIGHT DOUBLE QUOTATION MARK). Which of course the command prompt will not recognize.
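
A minimal sketch of undoing that substitution before pasting a command line into a console. The helper name `dumb_down` and the choice of which marks to map are assumptions made for illustration; nothing like this ships with Windows or Office.

```python
# Hypothetical helper: map a few typographic quotes back to ASCII.
# The mapping is deliberately small; it is not a complete list.
SMART_TO_ASCII = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201A": "'",  # SINGLE LOW-9 QUOTATION MARK
    "\u201B": "'",  # SINGLE HIGH-REVERSED-9 QUOTATION MARK
    "\u201C": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u201E": '"',  # DOUBLE LOW-9 QUOTATION MARK
    "\u201F": '"',  # DOUBLE HIGH-REVERSED-9 QUOTATION MARK
}
_TABLE = str.maketrans(SMART_TO_ASCII)


def dumb_down(command_line: str) -> str:
    """Return command_line with the quotes above replaced by ' or "."""
    return command_line.translate(_TABLE)


pasted = "control intl.cpl,, /f:\u201Dfilename.txt\u201D"
print(dumb_down(pasted))  # control intl.cpl,, /f:"filename.txt"
```

Note that a blanket replacement like this would also flatten quotes that were meant to be typographic, so it only makes sense for text known to be a command line.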

Man I hate that feature. Not because it isn't useful, because it can be. But it is not quite smart enough of a feature to know when it isn't helpful!

Anyway, colleague Gwyneth Marshall provided me with a list that some version of Office uses for quotes used in different languages:

| Symbol | Unicode Value | Language |
| --- | --- | --- |
| 'O' | U+0027 | Danish, Dutch, English, Finnish, Norwegian, Swedish |
| "O" | U+0022 | Danish, Dutch, English, Finnish, Norwegian, Swedish |
| ''O'' | U+0027 | Danish, Dutch, English, Finnish, Norwegian, Swedish |
| ‘O’ | U+2018, U+2019 | Dutch, English, Italian, Norwegian, Portuguese, Spanish |
| ‛O’ | U+201B, U+2019 | Dutch, English, Italian, Spanish |
| ’O’ | U+2019 | Danish, Finnish, Hungarian, Norwegian, Swedish |
| ‚O‘ | U+201A, U+2018 | Bulgarian, Czech, German, Icelandic, Lettish, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Ukrainian |
| ‚O’ | U+201A, U+2019 | Afrikaans, Dutch |
| ‛O‚ | U+201B, U+201A | Greek, Italian, Turkish |
| “O” | U+201C, U+201D | Dutch, English, Italian, Portuguese, Spanish, Turkish |
| ‟O” | U+201F, U+201D | Dutch, English, Italian, Portuguese, Spanish, Turkish |
| „O“ | U+201E, U+201C | Bulgarian, Czech, German, Icelandic, Lettish, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Sorbish |
| „O” | U+201E, U+201D | Afrikaans, Danish, Dutch, Hungarian, Polish, Russian |
| ”O” | U+201D | Danish, Finnish, Norwegian, Swedish |
| “O„ | U+201C, U+201E | Greek, Italian, Turkish |
| ‹ O › | U+2039, U+203A | Albanian, Byelorussian, Estonian, French, Greek, Italian, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Turkish |
| ›O‹ | U+203A, U+2039 | Danish, Polish, Serbian, Slovak, Slovenian |
| ›O› | U+203A | Finnish, Swedish |
| « O » | U+00AB, U+00BB | Albanian, Byelorussian, Dutch, Estonian, French, Greek, Italian, Lettish, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian |
| »O« | U+00BB, U+00AB | Croatian, Danish, German, Hungarian, Polish, Serbian, Slovak, Slovenian |
| »O» | U+00BB | Finnish, Swedish |
| 〝O〟 | U+301D, U+301F | East Asian |
| 〞O〟 | U+301E, U+301F | East Asian |
| 「O」 | U+300C, U+300D | East Asian |
| 『O』 | U+300E, U+300F | East Asian |
| ׳O׳ | U+05F3 | Hebrew |
| ״O״ | U+05F4 | Hebrew |
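
Rows in a list like this are easy to get slightly wrong, so here is a small sketch of how a symbol could be checked against its listed code points using Python's standard `unicodedata` module. The `rows` sample and the `code_points` helper are illustrative assumptions, not anything Office uses.

```python
import unicodedata

# A few (symbol, listed code points) pairs in the style of the list above.
rows = [
    ("\u201CO\u201D", ["U+201C", "U+201D"]),
    ("\u201EO\u201D", ["U+201E", "U+201D"]),
    ("\u00AB O \u00BB", ["U+00AB", "U+00BB"]),
]


def code_points(symbol: str) -> list:
    """List the distinct quote code points in symbol, in order of first use.

    The placeholder letter O and any spaces are skipped.
    """
    seen = []
    for ch in symbol:
        if ch in ("O", " "):
            continue
        cp = f"U+{ord(ch):04X}"
        if cp not in seen:
            seen.append(cp)
    return seen


for symbol, listed in rows:
    actual = code_points(symbol)
    names = [unicodedata.name(chr(int(cp[2:], 16))) for cp in actual]
    status = "ok" if actual == listed else "MISMATCH"
    print(symbol, actual, names, status)
```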
 

Now if only Office 14/Word 14 can be made smart enough to detect the cases where the feature is not needed, it could save everyone a lot of grief!

We're all sick of the smartass/dumbass aspects of this particular feature. :-)

 

This post brought to you by ” (U+201D, a.k.a. RIGHT DOUBLE QUOTATION MARK)


# Mike Dimmick on Tuesday, February 20, 2007 7:56 AM:

It's a Do-What-I-Mean feature and as such is pretty intractable - it will always do something wrong. Likewise AutoFormat.

Just recently I answered a question on the WiX mailing list in which I posted a command line. The Word email editor autocorrected my hyphens (to indicate a command line switch) to en-dashes. The original poster replied saying it wasn't working. Thankfully they posted the command-line output - where it was clear that the en-dashes had been replaced by question marks (and the error message was something like 'cannot find file ?ext').

You just have to remember to hit Undo if you spot Word making the change!
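
One plausible mechanism for the question marks described above, sketched under a stated assumption: the text passed through a legacy console code page that has no en dash. Code page 437 is used here only as an example; the comment does not say which code page or tool was involved.

```python
# Assumption: a legacy OEM code page (cp437 here) sits between the
# pasted text and the program, and unencodable characters become '?'.
typed = "-ext"               # intended: HYPHEN-MINUS (U+002D)
autocorrected = "\u2013ext"  # received: EN DASH (U+2013)


def to_console(text: str, codepage: str = "cp437") -> str:
    """Round-trip text through codepage, replacing unencodable characters with '?'."""
    return text.encode(codepage, errors="replace").decode(codepage)


print(to_console(typed))          # -ext
print(to_console(autocorrected))  # ?ext
```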

# Björn on Tuesday, February 20, 2007 9:09 AM:

Another reason why text only emails are the best!

I finish this rather useless comment with a cheerful "Yay ASCII!" :)

# Christoph Päper on Tuesday, February 20, 2007 9:47 AM:

The last time I checked (which was a while ago), Word was not able to autoconvert ' to single quotes low-9 + high-6 (‚…‘, German etc.) correctly, because ' could also mean the apostrophe ’. This does not matter in English, because the apostrophe and the closing quotation mark are the same glyph.

Anyway, “smart quotes” were the wrong approach in the first place, one should have altered keyboard layouts instead (or at least additionally).

# Bertilo Wennergren on Tuesday, February 20, 2007 9:55 AM:

Great table!

But I think you have some typos at the end.

"U+301F" should probably be "U+301E".

And for some incomprehensible reason "U+301D" displays as something that is definitely not a quotation mark in my browser, although it should be. Perhaps my fonts are borked...

# Jeffrey L. Whitledge on Tuesday, February 20, 2007 2:48 PM:

I never use Word as an editor because it is always messing with what I type. I HATE it when it tries to figure out what I mean instead of just doing what I actually told it.

Because of that, I’m constantly typing ALT-0146, ALT-0147, ALT-0148, etc. to get the right quote marks on everything—and the correct dashes!

What I want to know is, why doesn’t the command-line parser know how to handle surrounding quotation marks? I think COPY “FILENAME.TXT” should be a legal command.
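
For reference, the Alt codes mentioned in this comment correspond to byte values in the Windows ANSI code page. The sketch below assumes that code page is Windows-1252 (the usual Western European one); the helper name `alt_code_char` is made up for illustration.

```python
def alt_code_char(code: int, codepage: str = "cp1252") -> str:
    """Return the character a byte value decodes to in the given code page."""
    return bytes([code]).decode(codepage)


for code in (146, 147, 148, 150, 151):
    ch = alt_code_char(code)
    print(f"Alt+0{code} -> {ch} (U+{ord(ch):04X})")
```

Under that assumption, 0146, 0147 and 0148 give U+2019, U+201C and U+201D, and 0150 and 0151 give the en dash and em dash.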

# Maurits on Tuesday, February 20, 2007 4:40 PM:

In Office 2007 the ability to turn Word off as the email editor goes away.

Plain text emails are just as susceptible to AutoFormat as HTML email.

> why doesn’t the command-line parser know how to handle surrounding quotation marks?

Please, no... the solution to DWIM is not more DWIM.

# Centaur on Wednesday, February 21, 2007 4:13 AM:

Christoph Päper wrote:

> Anyway, “smart quotes” were the wrong approach in the first
> place, one should have altered keyboard layouts instead (or at
> least additionally).

+1.

Michael S. Kaplan wrote:

> Now if only Office 14/Word 14 can be made smart enough to
> detect the cases where the feature is not needed

Easy. Quotes must be left “as is” inside <code>, <kbd> and <samp> elements. And maybe also <pre>. In Word, this maps to predefined “HTML Code”, “HTML Keyboard”, “HTML Sample” and “HTML Preformatted” styles. Marking up samples and code as such is then responsibility of authors.
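
A rough sketch of the rule proposed here, applied to an HTML string rather than to Word styles. The function names are hypothetical, the regular expression ignores attributes and nesting, and `curl` is only a stand-in for a smart-quote pass, not Word's algorithm.

```python
import re

# Elements whose content is passed through untouched.
VERBATIM = re.compile(r"<(code|kbd|samp|pre)>.*?</\1>", re.DOTALL)


def curl(text: str) -> str:
    """Stand-in smart-quote pass: alternate opening and closing double quotes."""
    opening = True
    out = []
    for ch in text:
        if ch == '"':
            out.append("\u201C" if opening else "\u201D")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)


def smarten_outside_code(html: str) -> str:
    """Apply curl() everywhere except inside code, kbd, samp and pre elements."""
    out, pos = [], 0
    for match in VERBATIM.finditer(html):
        out.append(curl(html[pos:match.start()]))  # prose before the element
        out.append(match.group(0))                 # the element, unchanged
        pos = match.end()
    out.append(curl(html[pos:]))
    return "".join(out)


print(smarten_outside_code('Run "this": <code>copy "a.txt" b</code>'))
```

The quotes around `this` are curled while the ones inside the `code` element stay as typed. As the sketch makes plain, everything depends on the author having marked the code up in the first place.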

# Michael S. Kaplan on Wednesday, February 21, 2007 4:26 AM:

I want something smarter than that, though -- because not everyone obsessively marks styles, especially in EMAIL! :-)

# RubenP on Wednesday, February 21, 2007 11:33 AM:

I think you've got Dutch mentioned a few times too many. That is, I've never seen ‛O’ U+201B + U+2019 or ‟O” U+201F + U+201D.

And I think ,O’ U+201A + U+2019 never existed either, since Dutch only recently started using single quotation marks under the influence of American typography (IIRC). So I believe this is a hypercorrect form of ‘O’ U+2018 + U+2019, under the influence of „O” U+201E + U+201D, which itself has become extinct under the weight of American typographical conventions (such as these „smart” quotes and IJ „autocorrecting” to Ij).

# Maurits on Wednesday, February 21, 2007 11:35 AM:

How about moving the "smart quote" feature to the rendering level?  Store quotes as " in the bit stream, and have Word display them as “ and ” at render time as it sees fit.

But when shoving to the clipboard, make a faithful copy of the bit stream.
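
The render-time idea can be sketched as follows: the stored text keeps U+0022, and only the display string gets curly quotes. The opening/closing rule used here is a naive assumption made for illustration, not a description of how Word decides.

```python
def render_quotes(stored: str) -> str:
    """Return a display copy of stored with curly double quotes.

    Naive rule (an assumption): a " at the start of the text, or after
    whitespace or an opening bracket, opens; any other " closes.
    """
    out = []
    prev = " "  # treat the start of the text like whitespace
    for ch in stored:
        if ch == '"':
            opening = prev.isspace() or prev in "([{"
            out.append("\u201C" if opening else "\u201D")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


stored = 'She said "hello" and left.'
shown = render_quotes(stored)  # what the user would see
clipboard = stored             # what a copy operation would hand out
print(shown)
print(clipboard)
```

With this particular rule, `/f:"filename.txt"` renders with two right double quotes, because the first quote follows a colon rather than a space. That happens to match the broken command line at the top of the post, though it says nothing about which rule Word actually applies.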

# Mihai on Wednesday, February 21, 2007 12:00 PM:

It seems that the Romanian Academy decided that they have to mess-up with the quotes (and other things), so you will have to update the Romanian entries.

The new Romanian keyboards in Vista know about it (I did not, until I filed a bug and it was classified as "not a bug" :-)

So, you might also want to let the Office team know :-)

The external quotes are 99-down 99-up, and the internal ones are the French chevrons, but with no spaces:

http://www.secarica.ro/misc/InstitLingvTastatura-20031008.pdf

-- Original quote -----------------------------

1., 2. în limba română, semnele citării sau ghilimelele sunt, aşa cum se indică în unica lucrare normativă actualmente în vigoare în acest domeniu (Academia Română, Institutul de Lingvistică „Iorgu Iordan”, Îndreptar ortografic, ortoepic şi de punctuaţie, ediţia a V-a, Univers Enciclopedic, Bucureşti, 1995), „...” [deci 99 jos, 99 sus] ori «...» (p. 79), cu precizarea că ghilimelele «...» se aşază în interiorul textului cuprins între ghilimelele „...” (p. 81);

== Translation (aprox, not pro :-) ======

1., 2. in the Romanian language the quotes are (as specified by the only normative document currently valid in the field, the Romanian Academy, the Institute of Linguistics „Iorgu Iordan”, Guide to orthography, orthoepy, and punctuation, 5th edition, published by Univers Enciclopedic, Bucharest, 1995), „...” [99 down, 99 up] or «...» (p. 79), with the specification that the quotes «...» are to be used inside the text contained already between the quotes „...” (p. 81);

-------------------------------

It is reflected in the Romanian keyboard standard SR 13392:2004, and ISO/IEC 8859-16:2001

(both based on SR 13411:1999)

# Mihai on Wednesday, February 21, 2007 12:05 PM:

BTW "smart" quotes: MS Word, up to version 97 (I think) was doing something really-really dumb with the quotes.

When typing on a French system in a French document you got the chevrons (correct). But if the file was saved as RTF, and opened on an English system, the quotes were displayed as English quotes.

The dumb thing was not the lack of an RTF tag for opening and closing quotes, but the fact that Word used the user locale instead of the paragraph language to determine the look.

# Mihai on Wednesday, February 21, 2007 12:09 PM:

<<Quotes must be left “as is” inside <code>, <kbd> and <samp> elements. And maybe also <pre>. In Word, this maps to predefined “HTML Code”, “HTML Keyboard”, “HTML Sample” and “HTML Preformatted” styles. Marking up samples and code as such is then responsibility of authors.>>

Might be good, but not too useful.

Because tagging with styles is done after the text is done, but the quotes are converted when you type them.

Now imagine the functionality. You type, and type some quotes, and they are converted to "smart" ones. Then you select the text area and mark it as code. Now what? If you convert all the selected quotes to "dumb" quotes, I will again cry foul! Because I can give a string example with smart quotes in code:

ids_Help "Click “Help” for more info"

So, this is not such a simple decision.

# RubenP on Wednesday, February 21, 2007 12:33 PM:

@Maurits: I think that's an even worse idea, seeing the current 'intelligence' Word seems to apply. I'm constantly fighting Word to get it to display 's, 't and 'n right (with a 9 rather than a 6; it always ends up as ‘S rather than ’s), and attr=”O” instead of attr=“O” or attr="O" looks really awful when you're using a font like Consolas, etc., etc. Now, you can still Ctrl+Z your way out.

# Mihai on Wednesday, February 21, 2007 2:05 PM:

The Romanian quotes in this table are not correct (any more).

This changed about 8 years ago (the members of the Romanian Academy must justify their salaries :-)

The new Romanian keyboards in Vista already provide these new quotes. It might be time for the Office team to update its tables too.

===============================

Ok, the external quotes are 99-down 99-up, and the internal ones are the French chevrons, but with no spaces:

http://www.secarica.ro/misc/InstitLingvTastatura-20031008.pdf

-- Original quote from the PDF above ---------------------------

1., 2. în limba română, semnele citării sau ghilimelele sunt, aşa cum se indică în unica lucrare normativă actualmente în vigoare în acest domeniu (Academia Română, Institutul de Lingvistică „Iorgu Iordan”, Îndreptar ortografic, ortoepic şi de punctuaţie, ediţia a V-a, Univers Enciclopedic, Bucureşti, 1995), „...” [deci 99 jos, 99 sus] ori «...» (p. 79), cu precizarea că ghilimelele «...» se aşază în interiorul textului cuprins între ghilimelele „...” (p. 81);

== Translation (aprox, not pro :-) ======

1., 2. in the Romanian language the quotes are (as specified by the only normative document currently valid in the field, the Romanian Academy, the Institute of Linguistics „Iorgu Iordan”, Guide to orthography, orthoepy, and punctuation, 5th edition, published by Univers Enciclopedic, Bucharest, 1995), „...” [99 down, 99 up] or «...» (p. 79), with the specification that the quotes «...» are to be used inside the text contained already between the quotes „...” (p. 81);

---------------------------------------------------

It is reflected in the Romanian keyboard standard SR 13392:2004, and ISO/IEC 8859-16:2001

(both based on SR 13411:1999)

# Centaur on Wednesday, February 21, 2007 4:35 PM:

> I want something smarter than that, though -- because not
> everyone obsessively marks styles, especially in EMAIL

For plain text mail, just treat it as one big <pre>, no quote replacements. Yes, typographically correct quotes are important, but computers are not smart enough to detect where the conversion is or is not needed. I presume if people had “” on their keyboards they might actually learn to use them, but, unfortunately, most only have ' and ".

# Mihai on Wednesday, February 21, 2007 7:00 PM:

I have posted twice a message about the fact that the Romanian quotes are incorrect in the posted table.

Is this censorship, or is there something wrong with my message?

:-)

# Michael S. Kaplan on Wednesday, February 21, 2007 7:08 PM:

They were auto-detected as spam, sorry....

# Cristian Secară on Thursday, February 22, 2007 6:26 AM:

Still about Romanian:

,O‘ is never used

„O“ is wrong, should be „O”

‹ O › is never used

« O » should be «O» (used when nested quotes are required, i.e. „blah «blah» blah”)

I find strange the "some version of Office" statement, since AFAIK the autocorrect feature of MS Office always uses the correct official quotes (99 down / 99 up) since long time ago, even before the recent (2003) Romanian Academy clarification.

Some further references:

The shape (not the codes !) of the Romanian quotes have been clarified (not changed !) in 2003.

The official reference is this (scanned documents; intro + decision):

http://www.secarica.ro/misc/InstitLingvTastatura-20031008-1.tif (.tif, 1,81 MB)

http://www.secarica.ro/misc/InstitLingvTastatura-20031008-2.tif (.tif, 3,66 MB)

The physical written document is stored at the Institute for Cultural Memory ( http://www.cimec.ro/e_default.htm )

Additionally, some sort of tutorial on this subject can be found at the Romanian Academy's official site, on the very first page, under a link that says "Punctuaţia limbii române. I. Ghilimelele" (or direct link http://www.academiaromana.ro/com2006/doc/ghilimele.doc )

The Romanian translation for "quotes" is "ghilimele". They are described as 99 down (on opening) / 99 up (on closing). The Unicode correspondence for this description is U+201E / U+201D.

Cristi
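
The convention described in this comment (outer quotes U+201E / U+201D, inner quotes «» without spaces) can be written down as a tiny sketch. The function name `quote_ro` and its `nested` parameter are hypothetical.

```python
OUTER = ("\u201E", "\u201D")  # 99 down on opening, 99 up on closing
INNER = ("\u00AB", "\u00BB")  # chevrons, with no inner spaces


def quote_ro(text: str, nested: bool = False) -> str:
    """Wrap text in Romanian quotation marks; nested=True uses the inner pair."""
    opening, closing = INNER if nested else OUTER
    return f"{opening}{text}{closing}"


inner = quote_ro("blah", nested=True)
print(quote_ro(f"blah {inner} blah"))
```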

# Mihai on Thursday, February 22, 2007 12:00 PM:

<<They were auto-detected as spam, sorry....>>

So you decided to pass both, although they are almost identical :-)

# Michael S. Kaplan on Thursday, February 22, 2007 12:27 PM:

That ALMOST bit is what did it, I did not want to have to choose....

# Mihai on Friday, February 23, 2007 12:00 PM:

<<That ALMOST bit is what did it, I did not want to have to choose....>>

Well, it was *almost* because I don't save the messages I submit.

So after I submitted the first one and it did not make it, I had no way to resubmit and had to "recreate" it.

The differences were not relevant (in my mind at least :-)

Whatever, it does not matter.

Thanks.

# Michael S. Kaplan on Friday, February 23, 2007 12:37 PM:

Cristi kinda disagreed anyway, so it's all good. :-)


referenced by

2013/04/03 “The ‘Smart Quotes’ are leaking all over the Internet!”

2011/06/13 “Word isn't always ‘smart’.” You can quote me on that (since I said it in English)...

2009/09/11 Microsoft is a company in Redmond, WA, in the USA

2008/03/12 If they aren't in Unicode, what could they be in? And you can [smart-]quote me on that!

2008/01/30 A more usable Dutch keyboard that works properly?

2007/10/09 If you are going to take it so literally, you may want to give 'em a quote or two

2007/03/22 Dumb ellipses?
