The new compiler error C4819

by Michael S. Kaplan, published on 2005/01/05 21:07 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/01/05/347394.aspx


I was looking at Elyasse's Weblog and was reminded of one of the coolest feature entries in Whidbey.

I think I have been waiting roughly 112 versions of the Microsoft compilers for this. Well, probably not that many but it does feel like that....

New in Whidbey! From the help:

C4819 occurs when an ANSI source file is compiled on a system with a codepage that cannot represent all characters in the file.

To resolve C4819, save the file in Unicode format.

This is incredibly cool.... :-)
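As a rough illustration of when the diagnostic can come up at all: a file made only of 7-bit ASCII bytes is representable in every Windows ANSI codepage, so only bytes at or above 0x80 are candidates for C4819. The sketch below is not how the compiler detects the problem; it is a hypothetical helper (the name `first_non_ascii` is made up here) that finds the first such byte in a buffer you have already read from a source file.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any compiler or SDK.
   Returns the index of the first byte outside 7-bit ASCII,
   or -1 if the whole buffer is plain ASCII. */
long first_non_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] >= 0x80) {
            return (long)i;   /* candidate for a codepage mismatch */
        }
    }
    return -1;                /* pure ASCII: safe in any ANSI codepage */
}
```

A result of -1 means the file cannot trigger the warning; any other result only marks a spot worth checking, since whether the byte is valid depends on the system codepage.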

 

This post is sponsored by "©" (a.k.a. U+00A9, COPYRIGHT SIGN)


# Anon on 8 Jan 2005 4:21 PM:

"C4819 occurs when a non-ANSI compliant compiler ignores clause 2.1 (Phases of Translation) in the C++ standard and does not map physical source characters to the basic source character set"

# Michael Kaplan on 8 Jan 2005 5:35 PM:

Answer to the question this post implied....

# Anon on 9 Jan 2005 8:03 AM:

Ah, it generates this when it can't perform the mapping because it has encountered a nonsense byte? That's fine, but I still think "save the file as unicode" is duff advice considering all the legacy source control systems out there...

I'm now curious. What did the compiler do prior to Whidbey? Assume a particular codepage?

# Michael Kaplan on 9 Jan 2005 9:38 AM:

Well, the advice is for *new* code, not legacy.

Legacy systems would always save the file as ANSI and so the characters in question would be converted to question marks (same as in notepad when you try to save as ANSI).

# Vladislav Vaintroub on 6 May 2008 6:05 PM:

Too much i18n does not seem good for the compiler.

Michael, with all due respect, I cannot share your view of this "incredibly cool" feature. I think it is incredibly uncool.

The bad thing about this warning, which can turn into an error as in

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=341454

is that C strings, the null-terminated arrays of bytes, do not carry any encoding information per se, i.e. they are supposed to be treated as opaque arrays of bytes. Now, I have a perfectly valid C file containing only ASCII, except for UTF-8 bytes in strings (UTF-8 for a good reason: I intend to edit this file in a UTF-8 editor). And such a file will now break with an incomprehensible message on Whidbey on Japanese Windows.

The connect bug is now resolved with Won't Fix, so I can not even hope that this will be fixed with the next version of the compiler.

Alternatives for me?

1) Documentation and support say: add a BOM to the file. No way, then it will break on older compilers and on non-Microsoft compilers.

2) #pragma setlocale? Does not work.

3) Convert strings to their hex byte-array form, something like

char foo[] = {0xba,0xad,0xf0,0x0d,0x00};

Will work, will look ugly, and I'll have to forget about editing this file in my wonderful UTF-8-capable editor, the VS2005 IDE.

Or forget about getting this file compiled on Japanese Windows. It is not important *for me* anyway.  This compiler works quite well on latin1 territories:)
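A variant of alternative 3 that keeps the data as a string literal might look like the sketch below. It assumes the goal is simply a source file made only of ASCII bytes, so the system codepage no longer matters; the identifier `japan_utf8` and the sample text (U+65E5 U+672C) are illustrative and not taken from the file in question.

```c
#include <string.h>

/* U+65E5 U+672C encoded as UTF-8, spelled with hex escapes so the
   source file itself contains nothing but ASCII bytes. */
const char japan_utf8[] = "\xE6\x97\xA5" "\xE6\x9C\xAC";

/* Length in bytes (not characters) of the UTF-8 data. */
size_t japan_utf8_len(void)
{
    return strlen(japan_utf8);
}
```

The literal is split into two adjacent pieces on purpose: a hex escape consumes every following hex digit, so text such as `"\xACab"` would be read as one oversized escape unless the literal is broken after the escape.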

# Michael S. Kaplan on 6 May 2008 6:37 PM:

Since the BOM does exist, you could also petition the other compilers to start recognizing it, too. I'm sorry, but I agree with Jonathan Caves on this issue -- use the BOM and you are golden.
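For anyone wanting to check which files already carry the marker, here is a minimal sketch. The UTF-8 BOM is the three bytes EF BB BF at the very start of the file; the function name `has_utf8_bom` is hypothetical, and the caller is assumed to have read the first bytes of the file into a buffer.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the buffer starts with the
   UTF-8 byte order mark (EF BB BF), 0 otherwise. */
int has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3
        && buf[0] == 0xEF
        && buf[1] == 0xBB
        && buf[2] == 0xBF;
}
```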


referenced by

2010/06/05 Not all developers speak English. Dammit.

2008/05/19 Everyone seems averse to the BOM these days; Should we blame TSA? :-)

2005/12/09 More on the C4819 error

2005/11/23 100% roundtrip ASCII? 100% roundtrip ANSI?

2005/10/28 It isn't Unicode, it's Double Secret Unicode!

2005/01/08 How does it detect invalid characters?
