by Michael S. Kaplan, published on 2006/06/14 11:47 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/14/631016.aspx
Larry Osterman pointed me at an article entitled "How to break Windows Notepad", which makes for an interesting experiment:
Here's how to do it:
1. Open up Notepad (not WordPad, not Word or any other word processor).
2. Type in this sentence exactly (without quotes): "this app can break"
3. Save the file to your hard drive.
4. Close Notepad.
5. Open the saved file by double clicking it.
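The file those steps leave on disk can also be produced from a script, which makes it easy to look at the raw bytes. This is a minimal sketch, assuming Notepad saves the sentence in the default ANSI encoding with no byte order mark and no trailing newline; the file name `break.txt` is a hypothetical choice:

```python
from pathlib import Path

# Hypothetical file name; any .txt file that opens in Notepad would do.
path = Path("break.txt")

# The sentence is pure ASCII, so its ANSI bytes are just these 18 ASCII bytes:
# no byte order mark in front and no CR/LF at the end (assumed Notepad behavior).
data = "this app can break".encode("ascii")
path.write_bytes(data)

print(len(data))               # 18: an even byte count
print(path.read_bytes()[:2])   # b'th': the file starts with text, not a BOM
```

Because 18 is even, the file can also be split cleanly into nine 16-bit units, which is what sets up the misreading described below.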
Instead of seeing your sentence, you should see a series of squares. For whatever reason, Notepad can't figure out what to do with that series of characters and breaks.
Now if you have East Asian language support installed, instead of seeing squares (NULL glyphs), you will see:
桴獩愠灰挠湡戠敲歡
And if you look at the code points under those characters, you will likely see what happened:
Ah, each byte is a letter, and each pair of letters, read together as a single 16-bit code unit, just so happens to line up with a CJK ideograph!
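The pairing can be checked by reading the same bytes as UTF-16 little-endian, the form Windows uses for "Unicode" text. This minimal Python sketch prints which two letters end up in each code unit; the second letter of each pair becomes the high byte:

```python
# The 18 bytes of the saved sentence (ASCII, no byte order mark).
data = b"this app can break"

# Misread as UTF-16LE: every two bytes become one 16-bit code unit.
misread = data.decode("utf-16-le")

for i, ch in enumerate(misread):
    pair = data[2 * i : 2 * i + 2]
    # e.g. 't' (0x74) then 'h' (0x68) gives the code unit 0x6874
    print(f"{pair.decode('ascii')!r} -> U+{ord(ch):04X}")
```

The first line printed is `'th' -> U+6874`, the same ideograph this post is brought to you by, and every one of the nine code units lands in the CJK Unified Ideographs block.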
I have talked recently about the encoding detection mechanisms that Notepad uses, and this is another example of the problem. It is an especially fun one since the repro steps are so amusing (in fact, the only improvement would be text insulting Microsoft or one of its rivals, which Notepad would then appear to censor in an example of a big bad monopoly, etc.!).
I have pointed out in the past that I do not like the IsTextUnicode function, and I suppose this could be considered a good reason why (IsTextUnicode returns TRUE here, which is why Notepad guesses as it does).
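The reason any detector has to guess here is that the bytes are valid either way. The sketch below is only an illustration in Python, not a model of IsTextUnicode's actual statistics: it assumes, as I understand Notepad's behavior, that a byte order mark (BOM) is honored before any guessing happens, and shows that without one both readings succeed, while with one nothing is left to guess.

```python
import codecs

data = b"this app can break"

# Without a BOM, the same 18 bytes decode cleanly both ways, so a
# BOM-less detector can only weigh which reading seems more plausible.
as_ascii = data.decode("ascii")       # the sentence the user typed
as_utf16 = data.decode("utf-16-le")   # nine CJK ideographs

# With a UTF-16LE BOM in front, the "utf-16" codec reads the mark,
# picks the byte order from it, and strips it: no guessing needed.
marked = codecs.BOM_UTF16_LE + "this app can break".encode("utf-16-le")
print(marked.decode("utf-16"))        # this app can break
```

A plain ANSI file has no mark of its own, which is why short all-ASCII files like this one are the cases where a heuristic such as IsTextUnicode gets a vote at all.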
This post brought to you by 桴 (U+6874, a CJK ideograph)
# TP on 14 Jun 2006 12:07 PM:
# Lionel Fourquaux on 14 Jun 2006 12:47 PM:
# Maurits on 14 Jun 2006 3:30 PM:
# Maurits on 14 Jun 2006 3:34 PM:
# Michael S. Kaplan on 14 Jun 2006 3:47 PM:
# Michael S. Kaplan on 14 Jun 2006 3:49 PM:
# Maurits [MSFT] on 14 Jun 2006 3:52 PM:
# Maurits [MSFT] on 14 Jun 2006 3:54 PM:
# Maurits [MSFT] on 14 Jun 2006 4:04 PM:
# Maurits [MSFT] on 14 Jun 2006 5:12 PM:
# Maurits [MSFT] on 14 Jun 2006 6:30 PM:
# Dean Harding on 14 Jun 2006 7:33 PM:
# Maurits [MSFT] on 14 Jun 2006 8:02 PM:
# Michael S. Kaplan on 15 Jun 2006 12:11 AM:
# Michael S. Kaplan on 15 Jun 2006 12:14 AM:
# Michael S. Kaplan on 15 Jun 2006 12:19 AM:
# Michael S. Kaplan on 15 Jun 2006 12:20 AM:
# Pavanaja U B on 15 Jun 2006 2:30 AM:
# Michael S. Kaplan on 15 Jun 2006 9:15 AM:
# dragonfrog on 15 Jun 2006 11:48 AM:
# borky on 15 Jun 2006 12:55 PM:
# q^-o|o-^p on 17 Jun 2006 8:38 PM:
# A fish called blue on 19 Jun 2006 3:28 AM:
# Rajesh Shenoy on 19 Jun 2006 10:24 AM:
# east on 19 Jun 2006 4:18 PM:
# grsws on 19 Jun 2006 4:19 PM:
# Anuj on 22 Jun 2006 10:43 PM:
# Mircea on 28 Jun 2006 7:56 PM:
# Skews Me on 3 Jul 2006 3:56 PM:
# Michael S. Kaplan on 3 Jul 2006 4:53 PM:
# Sanjay Vyas on 11 Jul 2006 6:23 AM:
# proxy on 2 Aug 2006 12:57 PM:
# South Korean Man on 10 Nov 2007 7:10 AM:
Wow, I'm from South Korea.
IT'S A SURPRISE,
because Internet chat can
OKAY?
I'M 15 YEARS OLD
NAME: Kim Dong UK
LIVE: SEOUL
THANK YOU VERY MUCH
# Morbo on 3 Dec 2007 10:45 AM:
A different one:
Now if you type a newline, all the CR/LF pairs are rendered as square blocks.
# Erzengel on 24 Mar 2008 7:12 PM:
I can't repro in Vista. Fixed?
# Michael S. Kaplan on 25 Mar 2008 3:01 AM:
Funny you should ask. I'll bet you have SP1 installed! :-)
# Sameera R. on 19 Apr 2010 2:38 AM:
It doesn't happen in Win7.
Check it.
# Michael S. Kaplan on 19 Apr 2010 6:59 AM:
Check what?
If you read the comment just before yours, it points to a blog post I wrote that explains how and where this was "fixed". Perhaps I should suggest you check *that*? :-)
referenced by
2010/08/14 (It wasn't me)
2010/04/01 Is the text in XKCD broken?
2008/03/25 Bush might've still hid the facts, but he can't hide them from Vista SP1/Server 2008 Notepad!
2008/03/24 Unicode not being the default is slower and leads to bugs; maybe it ought to change?
2007/12/11 How to get yourself imprisoned [by/for talking about Unicode]
2007/04/22 The Notepad encoding detection issues keep coming up
2006/12/23 Do not adjust your browser, a.k.a. sometimes two wrongs DO make a right, a.k.a. dumb quotes
2006/08/02 Hang on just a [Hansel]Minute!
2006/07/11 More on that which breaks Windows Notepad
2006/07/04 Behind Norman's 'Who needs Unicode?' post
2006/06/22 Things I [don't] like about blogging