UTF-8 default isn't in the latest Notepad, either

by Michael S. Kaplan, published on 2011/02/15 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/02/15/10129419.aspx

Allow me to intentionally misquote a West Wing episode I enjoyed:

Every once in a while, every once in a while, there's a bug report with an absolute right and an absolute wrong, but those reports almost always include blue screens of death. Other than that, there aren't very many un-nuanced bug reports in writing a blog that's way too big for ten words. I'm the author of Sorting it all Out, not the author of the readers who agree with me.

That was fun. :-)

It is true that I often start blogs with "simple questions" which turn out to have complicated answers.

And when the answers are simple they usually are simple in a bad way: like the word NO.

This is one of those blogs....

The question, as you have probably gathered, was simple:

Customer wants to automatically use UTF-8 when saving files with Notepad instead of ANSI by default.

The answer is indeed that no, it isn't possible. This default is hard-coded into Notepad.

They made the decision in 1993 when Notepad was added to NT 3.1, and have stuck to their guns -- even after UTF-8 support was added in 1998-1999.


Now as a workaround, you could try the following:

  1. Create your own app to start every text file rather than starting them in Notepad directly;
  2. Add the UTF-8 BOM to this otherwise empty file you create;
  3. Hope they never start a new file in Notepad or fail to use your "Wolves' Highway" application.

But in the end, there is no way to keep Chloe on the Wolves' Highway. That might be why she was shot and killed by ranchers.

And why users will seldom follow the directions here, either....
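For the curious, the workaround steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `create_utf8_bom_file` and `open_in_notepad` are made up for this example, the launcher assumes a Windows machine with `notepad.exe` on the PATH, and the whole idea relies on Notepad noticing the BOM when it opens the file and keeping that encoding when it saves.

```python
import codecs
import subprocess
from pathlib import Path


def create_utf8_bom_file(path):
    """Create an otherwise empty file that contains only the UTF-8 BOM.

    Hypothetical helper for this sketch. Mode "xb" makes the call fail
    with FileExistsError rather than overwrite an existing file.
    """
    target = Path(path)
    with open(target, "xb") as new_file:
        new_file.write(codecs.BOM_UTF8)  # the three bytes EF BB BF
    return target


def open_in_notepad(path):
    """Hand the file to Notepad.

    Assumption: Windows, with notepad.exe reachable on the PATH.
    """
    return subprocess.Popen(["notepad.exe", str(path)])
```

Calling `open_in_notepad(create_utf8_bom_file("notes.txt"))` would cover steps 1 and 2. Step 3 is still left to hope.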

Hiroshi on 15 Feb 2011 2:55 PM:

This is a large inconvenience.

I use Notepad pretty much daily, and most of my data is not in English.

I manually need to select the UTF-8 encoding.

Also, I often need to work with CSV data exports from customers in multiple countries in multiple languages.

Very often there is "mojibake" since the data is saved in the local codepage. Very few are knowledgeable enough to understand how to save in a Unicode encoding, or even understand why they should. I need to switch the non-Unicode setting for my entire system several times a week to view and repair.

Hopefully this can be improved in a future Windows.

Also, at that time, please consider renaming "Unicode" and "Unicode big endian" to "UTF-16 LE" and "UTF-16 BE", respectively.

Yuhong Bao on 15 Feb 2011 5:27 PM:

"I need to switch the non-Unicode setting for my entire system several times a week to view and repair."

I suggested adding a codepage number field in the Open dialog before so that the ACP won't have to be switched to do this.

Michael S. Kaplan on 15 Feb 2011 8:02 PM:

The question was how to change the default to UTF-8 without changing it in the dialog; your suggestion would not have helped even if it had been done. :-)

Michael S. Kaplan on 15 Feb 2011 8:05 PM:


If you have characters outside the ACP, you are warned that data is not going to be saved properly.

We cannot rename the entries; most users do not understand the other terms, and the few who do have no problems understanding the current behavior....

Cheong on 15 Feb 2011 9:43 PM:

I'm with YuHong Bao's suggestion even if it doesn't solve this particular problem.

The files littered across my hard disk are in ANSI (Big5, GB2312, SJIS), Unicode, and UTF-8. Each time I want to open one, if it's in one of the ANSI code pages, I usually need to fire up a web browser, switch to the correct code page, fire up Notepad, and then copy and paste everything there. After I finish editing, I save it as UTF-8.

It'd be great if I could switch the code page used to open a file in Notepad when I want to.

Michael S. Kaplan on 15 Feb 2011 10:07 PM:

I understand. But let's try and keep this blog ON TOPIC. I'll talk about that issue another day....

John Cowan on 16 Feb 2011 9:22 AM:

Notepad is a joke.  There are plenty of real plain-text editors, for every value of "real" from TextPad to Emacs.

Michael S. Kaplan on 16 Feb 2011 12:12 PM:

It is the joke people keep wanting to tell/use, though....

Peter on 16 Feb 2011 12:21 PM:

@John Cowan: Maybe I got you wrong, but TextPad is also a good example of bad Unicode support. For example, it cannot process files with multiple scripts.

GunSmoker on 21 May 2011 7:12 PM:

BTW, any chance for this to change? It's 2011, not 1995 after all. Where can we vote for this? :)

Well, if not default out-of-the-box - at least an option to do this. It should help a lot in multi-language world...

Michael S. Kaplan on 22 May 2011 2:52 AM:

I doubt it will ever change.

modernshoppers on 1 Jul 2013 12:35 AM:

There is a simple fixer, to set Unicode(UTF8) as default for notepad -