I want the lines to BREAK, dammit!

by Michael S. Kaplan, published on 2006/09/10 14:32 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/09/10/748785.aspx

Moz asks, in the Suggestion Box:

I'm wondering when (if?) MS programs like notepad will start supporting the UTF-16 line and paragraph terminators ($2028 and $2029). It's really annoying that I can't easily view files that include them, especially as those are the default for some other apps I use. The conversion to CRLF is a pain to do while debugging, and equally painful to shell to after running something. It makes people think *my* apps are broken.

This is of course an extension of the issue I first discussed back in May of 2005, in the post "When will this line end? And how?".

In that case I was talking about how:

Now if you add U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) to the mix, the problem only gets harder, not easier, since Notepad does not handle either of them.

Let us take a simple test string containing the following stream of characters:

0041 2028 0042 2029 0043 000d 0044 000a 0045 000d 000a 0046

While Notepad won't do anything useful with anything but the last one (the CR LF pair), WordPad, Word, and Visual Studio (2003 and 2005) all handle every one of these line breaks properly, even in a plain text file.
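To make "handle properly" a bit more concrete: a reader that honors every break in the test string above has to treat CR LF as one break while still accepting a lone CR or a lone LF, plus U+2028 and U+2029. Below is a minimal C++ sketch of that logic over UTF-16 text. The function name `split_lines` is hypothetical, this is not how WordPad, Word or Visual Studio actually implement it, and it deliberately ignores other Unicode breaks such as U+000B, U+000C and U+0085.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-16 text into lines, treating each of these
// as exactly one line break:
//   CR LF (as a pair), CR alone, LF alone,
//   U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR.
// Other Unicode breaks (U+000B, U+000C, U+0085) are not recognized here.
std::vector<std::u16string> split_lines(const std::u16string& text) {
    std::vector<std::u16string> lines;
    std::u16string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char16_t c = text[i];
        if (c == u'\r') {
            // Swallow the LF of a CR LF pair so the pair counts as one break.
            if (i + 1 < text.size() && text[i + 1] == u'\n') ++i;
            lines.push_back(current);
            current.clear();
        } else if (c == u'\n' || c == u'\u2028' || c == u'\u2029') {
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    lines.push_back(current);  // text after the last break (may be empty)
    return lines;
}
```

Fed the test string as `u"A\u2028B\u2029C\rD\nE\r\nF"`, it returns six lines, `A` through `F`. Checking for the LF right after a CR is the part that keeps CR LF from being counted as two breaks.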

So to the question of when MS programs will handle this myriad of ways to say NEW LINE PLEASE, the answer is about 4-7 years ago, depending on the program you are using. :-)

And as I said before, I fixed much of this bug in the Access Text Import Wizard, though I did not pick up these two new characters. Maybe someone on the Access Development Team could extend my fix to handle the additional line break characters? :-)


This post brought to you by U+2028 and U+2029, a.k.a. LINE SEPARATOR and PARAGRAPH SEPARATOR

# Aaron Ballman on 10 Sep 2006 3:17 PM:

I wouldn't go claiming Visual Studio "handles" things too loudly as the definition of handle can vary slightly.  ;-)  One major annoyance that I run into a lot with VS 2005 is that my source code sometimes comes to me from VCS with Mac line endings (U+000a), and Visual Studio does not handle it terribly well.  Sure, the file is displayed just fine.  But try compiling any C++ code with Mac line endings.  The compiler helpfully tells you that you have Mac line endings and it can only grok Windows and Unix line endings and will you please convert the file.  If the compiler already has the logic to handle CRLF or just CR... how hard is it to make it handle just LF?

# Michael S. Kaplan on 10 Sep 2006 3:46 PM:

Heh heh heh... fair enough, Aaron. Though I was limiting myself to the display issue since that is all I tested. :-)

Given how much work it is to verify compiler features and changes and fixes, I would never presume that such a change would be easy....

# pdw on 10 Sep 2006 3:54 PM:

You're confused here: Unix uses LF line endings, Macs (used to?) use CR line endings.

# Michael S. Kaplan on 10 Sep 2006 4:55 PM:

Hey pdw,

You're right. What the hell kind of weird conceptual dyslexia causes THAT kind of mistake? :-)

In any case, corrected now. Thanks....

# Aaron Ballman on 11 Sep 2006 1:39 PM:

I thought those constants looked strange to me.  Heh.  

And regardless of the difficulty level of the fix, one would think that just plain sensibility would say "hmm... perhaps this should work."  There's nothing more frustrating than the computer telling you "Hi!  I noticed that you're doing this.  Could you please do that instead?  I'd rather not do it for you..."  For instance, if the compiler can't grok the Mac line ending, then warn me of the issue, save out a version of the file to temp with Windows line endings and compile using that.  This way I can at least get work done.

Alas, I suspect I'm preaching to the choir though.  ;-)
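Aaron's workaround (write a temporary copy with Windows line endings, then compile that) comes down to a normalization pass like the one below. This is a sketch under stated assumptions: the source is ASCII or UTF-8, where the CR and LF bytes never appear inside a multibyte sequence, so a byte-by-byte scan is safe; `to_crlf` is a hypothetical name; and reading the file, writing the temp copy and invoking the compiler are all left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rewrite every line ending in `text` as CR LF.
// An existing CR LF pair is kept as one break; a lone CR (classic Mac)
// or a lone LF (Unix) is expanded to CR LF.
// Assumes single-byte or UTF-8 text; it would corrupt UTF-16 data.
std::string to_crlf(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\r') {
            // Already a CR LF pair: skip the LF so it is not doubled.
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            out += "\r\n";
        } else if (c == '\n') {
            out += "\r\n";
        } else {
            out += c;
        }
    }
    return out;
}
```

Testing for CR LF before treating a CR as a lone break is what keeps files that already use Windows line endings from coming out with doubled breaks.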

# elpepe on 4 Oct 2006 2:06 PM:

Hi, I am dealing with something like this: I use Visual C++ 2005 Express Edition to write a text (ASCII) file. I put a "\n" at the end of each string I consider a line. Notepad does not recognize them and shows me a strange character. EditPlus (for example) does recognize them and interprets them as "new line" characters. What am I doing wrong? I want Notepad to show them as "new line" characters too...

# Michael S. Kaplan on 4 Oct 2006 2:56 PM:

Notepad won't do it, sorry.

WordPad will, as will Word. And IE.....
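For elpepe's case: Notepad (at least the version discussed here) only breaks lines on the CR LF pair, so a lone LF shows up as a strange character. With the Visual C++ runtime, a "\n" written to a stream opened in text mode is translated to CR LF on the way out, so a file that ends up with bare LFs was most likely opened in binary mode or written through some other API. One way to sidestep the question is to open the file in binary mode and write CR LF explicitly, as in this sketch; `write_crlf_lines` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write each string as one line terminated by CR LF.
// The stream is opened in binary mode so the C++ runtime does no newline
// translation; the "\r\n" bytes land in the file exactly as written,
// on Windows and elsewhere alike.
bool write_crlf_lines(const std::string& path,
                      const std::vector<std::string>& lines) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;  // could not open the file
    for (const std::string& line : lines) {
        out << line << "\r\n";
    }
    out.close();             // flush and close so write errors show up here
    return !out.fail();
}
```

Writing the terminator explicitly in binary mode means the file contents do not depend on which mode someone later picks when opening the stream.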
