Case/kana/accent/width sensitive SQL Server, for testing

by Michael S. Kaplan, published on 2005/05/11 02:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/11/416293.aspx


A few minutes ago, I posted about Achieving case insensitivity, to deal with how best to make things case sensitive in an expression in SQL Server.

Now looking beyond the issue with that one expression, and to the wider issue of collations and what to use, for just a moment....

(This post contains several topics I will be covering in my TechEd 2005 talks in Orlando and Amsterdam -- lets hope TechEdBloggers picks it up, they missed this other one!)

Remember those old posts of mine (like They ask me "why is my Korean text in random order?" and A few of the gotchas of CompareString) where I implied (nicely) that calling CompareString with the NORM_IGNORENONSPACE flag is like making an official request of Windows to destroy important Korean data distinctions in collation? Well, an "accent insensitive" server is the SQL Server equivalent of that flag.

And of course all of the strange issues that the second post above point out about kana, width, and case insensitivity via NORM_IGNOREKANA, NORM_IGNOREWIDTH, and NORM_IGNORECASE? The ones that can potentially affect the Japan, Japan/Korea, and whole world in unanticipated ways, respectively? Those are the same as the respective collation types for SQL Server, too.

But let's ignore all that for a moment. Not everyone is working on international scenarios, and I am not quite naive enough to believe that they are. :-)

However, the truth is that you never know what the client is going to do with their server. And if you write an application that is fast and loose with the rules of case sensitivity in column names and it breaks because the customer installed it on their server which is case sensitive, you may be really unhappy with their reaction.

Even the Access team had a problem with this back in 1999 in their NorthwindCS.sql file that installed a SQL Server version of their NorthWind sample database -- of course they did manage to find that bug before too many people noticed it, and a lot of testers suddenly realized that case sensitivity is a veritable bug farm on a project that thinks it is ready to go, but has not yet tried it. :-)

I have talked to folks on the SQL Server test team in the past and have been told that more tests are run on case sensitive servers than not, for the same reason. In the end, installing things to not be insensitive to these distinctions will almost universally find more bugs than it hides. Even if your final production work will usually not be set up that way, a lot of the testing during the early phases will help shake out those bugs.

And that way if the client does run that way, you do not have to be embarrassed about the mixed use of case in your own queries and stored procedures!

 

This post brought to you by "₨" (U+20a8, a.k.a. RUPEE SIGN)


# Dean Harding on 11 May 2005 12:51 AM:

You're on fire tonight, Michael! This must be a record of some kind :)

I never really understood why SQL Server let you set different collations on the master database anyway. I guess it makes sense from the point of view that it lets users set it to whatever they want (obviously!), but generally users don't care what sorting their column names and stuff get, that's the realm of the programmer... and do case-sensitive stored procedures really help anyone?

# Michael S. Kaplan on 11 May 2005 1:24 AM:

Well, it was not really something I woke up to start doing yesterday morning -- I just kept running across interesting points all day. :-)

But as to the master having user settable collations, there is some benefit in terms of people wanting to query the system tables and see things the way they expect to see them. I think that is how most people use it, without really thinking about it.

But there are interesting consequences of the way many different factors in the model intersect (including this one!) -- I will cover that at TechEd and probably in a post some time soon. :-)

# Ambarish Sridharanarayanan on 11 May 2005 6:10 PM:

Interesting that the Tahoma font (on Windows Server 2003) misrepresents the Rupee Sign (U+20a8) using a combination of 'R' and 'p'; as you can see from trying it out on Lucida Sans Unicode and Arial Unicode MS or from a web search for "rupee Rs.", it should use a combination of 'R' and 's'. Where do I file a bug-report?

# Michael S. Kaplan on 15 May 2005 10:36 AM:

Ambarish,

I will forward the report on to the people in typography....

# Michael S. Kaplan on 16 May 2005 11:14 AM:

Ambarish -- I have info from a reliable source that this problem has been noticed and will be addressed in future versions of the font.

But I appreciate the report, in any case. Bugs found through blogging rock! :-)

referenced by

2007/12/06 In SQL Server, A-Z, A-z, a-Z, and a-z may not mean the same thing!

2007/02/16 The rumor was that SQL Server passwords are case insensitive?

2006/10/30 If you are more sensitive, you'll pick up on more problems

2006/05/30 Avoiding SQL Server collation woes in the TempDB

2005/05/15 Not all SQL Server collations are created equal

go to newer or older post, or back to index or month or day