No data loss (as long as it is Unicode data)

by Michael S. Kaplan, published on 2005/05/18 09:30 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/18/419112.aspx


A few days ago, I made some SQL Server developers nervous when I explained that Not all SQL Server collations are created equal.

But there is one point I must emphasize here.

There is no data loss in moving between collations, as long as you use Unicode data. Even if a code point has no weight, it is still in the data stream. The code point keeps whatever display characteristics are associated with it, and therefore keeps whatever influence it had (other than the fact that it cannot be used in searches).
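
As a rough illustration of that point, here is a toy model in Python. It is not SQL Server's actual collation algorithm; the WEIGHTS table and the sort_key function are hypothetical names made up for this sketch. A code point that is missing from the weight table is skipped during comparison, yet it stays in the stored string.

```python
# Toy weight table: a hypothetical collation that only knows a, b and c.
WEIGHTS = {"a": 1, "b": 2, "c": 3}


def sort_key(s):
    """Build a comparison key, skipping code points that have no weight."""
    return [WEIGHTS[ch] for ch in s if ch in WEIGHTS]


# The stored value contains U+01B7, which has no weight in this toy table.
stored = "ab\u01b7c"

# For comparison purposes the unweighted code point is invisible ...
same_key = sort_key(stored) == sort_key("abc")

# ... but the stored data is untouched: the code point is still there.
still_present = "\u01b7" in stored
```

In this sketch both same_key and still_present come out true: the two strings compare as equal, which is why the code point is no help in a search, but nothing has been removed from the data.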

Of course, if you do not have Unicode data, then you have lots of potential data corruption/data loss issues to contend with.

Corruption will often be an appearance issue, except for those times when there are unassigned code points in one code page but not another, or when moving between DBCS code pages that consider certain sequences illegal based on lead byte/trail byte semantics.
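
The failure modes described above can be sketched with Python's built-in codecs. This is an illustration under assumptions, not a description of what SQL Server does internally: the code pages chosen (1252, 1251, 932) and the can_decode helper are picked only for this example, and Python's code page tables may differ in detail from the ones Windows uses.

```python
text = "\u01b7"  # LATIN CAPITAL LETTER EZH

# Unicode encodings round-trip without loss.
for enc in ("utf-8", "utf-16-le"):
    assert text.encode(enc).decode(enc) == text

# Data loss: code page 1252 has no slot for U+01B7, so a lossy
# conversion has to substitute a replacement character.
lossy = text.encode("cp1252", errors="replace")

# Appearance issue: the same byte is a different character
# depending on which code page is used to interpret it.
raw = b"\xe9"
as_1252 = raw.decode("cp1252")  # Latin small letter e with acute
as_1251 = raw.decode("cp1251")  # Cyrillic small letter short i


def can_decode(data, codepage):
    """Hypothetical helper: True if the bytes are valid in the code page."""
    try:
        data.decode(codepage)
        return True
    except UnicodeDecodeError:
        return False


# Unassigned code point: byte 0x81 is assigned in Python's 1251 table
# but not in its 1252 table.
unassigned = can_decode(b"\x81", "cp1251") and not can_decode(b"\x81", "cp1252")

# DBCS semantics: 0x83 is a complete character in 1252, but in 932 it is
# a lead byte that is illegal without a trail byte after it.
lone_lead = can_decode(b"\x83", "cp1252") and not can_decode(b"\x83", "cp932")
```

Only the first loop, the one that stays in Unicode, is guaranteed to give back what was put in. Every other case depends on which code page sits on each side of the conversion.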

All the more reason to stick with Unicode here, right? :-)

 

This post brought to you by "Ʒ" (U+01b7, a.k.a. LATIN CAPITAL LETTER EZH)



