Extending collation support in SQL Server and Jet, Part 1 (the broad strokes)

by Michael S. Kaplan, published on 2005/09/14 03:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/14/464780.aspx


Prior posts in the series:

Extending collation support in SQL Server and Jet, Part 0 (HISTORY)

What makes this problem so much easier in the soon to be released Yukon (SQL Server 2005) is that several different technologies are coming together that, when combined together, make it all easier. You can work without them (I will explain how in future posts) but it is never so easy as when it all comes together....

Those features are:

  1. Binary collations in SQL Server (these have been available for many versions)
  2. The binary and varbinary datatypes in SQL Server (these have also been available for many versions)
  3. Windows-only CultureInfo objects (available in Whidbey)
  4. SQLCLR integration features, allowing one to create functions in .NET that can be called from stored procedures (available in Yukon)

I will explain each of them individually.

Binary collations in SQL Server

One of the most important characteristics of some of the new locales in Windows is that there are no collation weights for many of them (a-la-the jury will give this string no weight). When every row of data in a particular column is equal to every other row, it can be quite un-nerving (for obvious reasons). But using a binary collation means you can not only work around this problem, but that you can do so in a very fast way. Not much that needs to be said about this one, other than that. You may never even need to use this collation, but you may. It is worth doing.

As for which one to choose -- now as I have mentioned before in SQL Server has its own version of .NET "ordinal" comparisons, the specific collation language choice only matters for the sake of legacy data that you may need to convert into or out of Unicode. And most of these new locales have characters that are not on any code page. So it truly does not matter which you choose here in most cases. When it does matter for your application, just be sure to choose keeping that conversion requirement in mind.

The binary and varbinary datatypes in SQL Server (these have also been available for many versions)

One of the things that we will be doing to make sure that your application can work as quickly as if collations for these other locales were built into SQL Server is to build indexes on these columns. Stored in these columns will be sort keys that you will generate any time a new string value is inserted or the string value in question is changed. You can then build an index on this binary column and sort by it or use it to search for information (sorting is easy but the search can be intriugingly tricky so I will be talking about some of the more obscure details of using the index if you need to do so another day).

For now, just keep in mind that you will be adding one of these columns for every text column that might contain strings that will require the custom collation. You redoing mostly what SQL Server does anyway, but you are doing it with much more knowledgable functions, for these new languages....

Windows only CultureInfo objects (available in Whidbey)

In the manner of great timing, the article Cathy and I wrote for the October 2005 MSDN Magazine is available today! I'll excerpt from the article to explain Windows Only CultureInfo objects.

I do want to take a moment to call out the efforts of the Software Design Engineer who did the bulk of the work to support these special culture types: Tarek Mahmoud Sayed. His efforts here not only make the story of running the .NET Framework on platforms that are newer and have more features than it possible, but they make that support seamless and as easy to use as any of the built-in cultures that have always been available. There are lazy developers (like me) who can blog and come up with clever ideas now and again, and there are incredibly smart and hardworking developers like Tarek who make me happy to go to work each day, to one of the smartest development teams around. Thanks, Tarek!

So, without further adieu, here is that excerpt:

Cultures in Windows and the .NET Framework

The .NET Framework was released at an interesting point in the history of Windows: after Windows XP was released, but before Windows Server™ 2003. As a result, the list of cultures available in the .NET Framework matched the locales included in Windows XP (and provided a superset of the locales included in previous versions of Windows). Developers didn't have to consider the consequences of new locales on Windows. There would be no issue with how these new Windows locales would interact with a version of the .NET Framework that tries to, for example, base its culture settings on the choices available in the operating system. The .NET Framework has always maintained its own data so that it could return the same results on all possible platforms, and until Windows XP SP2, this had never caused any difficulties.

The globalization development team had to address this problem, however, after Windows XP SP2 shipped with 25 new locales. For a listing of the new locales for SP2 only, see New Locale and Language Features in Windows XP. Imagine our surprise when one of our testers discovered that you could not even start a managed application when installing an early build of SP2 and using one of those new locales as the default user locale! This was clearly an issue we needed to address immediately in earlier versions of the .NET Framework, and fix more fully in the .NET Framework 2.0.

Future Windows service packs may include additional locales. Windows Vista™ (formerly codenamed "Longhorn") is expected to ship with additional locales above and beyond what have been supported to date; so that presents a very possible situation where an installed version of Windows could include locales that are not recognized cultures in the .NET Framework. Therefore, it's imperative that the .NET Framework gracefully handle Windows locales in a managed environment. Figure 3 shows Francois Liger's Culture Explorer (available for download from www.gotdotnet.com), which illustrates how the .NET Framework 2.0 picks up the new locales in Windows Vista through the Windows-only cultures.

The .NET Framework can now handle previously unidentified Windows locales by using the Win32® API to synthesize a CultureInfo object any time a locale supported in Windows has no corresponding culture in the .NET Framework. These cultures can be created either by name or by LCID, just like any other culture. The following code illustrates how to create a culture by name (new cultures on Windows XP SP2 include mt-MT, bs-BA-Latn, smn-FI, smj-NO, smj-SE, sms-FI, sma-NO, sma-SE, quz-BO, quz-EC, quz-PE, ml-IN, bn-IN, cy-GB, and more):

' Visual Basic 
For Each ci As CultureInfo In CultureInfo.GetCultures( _
        CultureTypes.WindowsOnlyCultures)
    Console.WriteLine(ci.Name)
Next

// C#
foreach(CultureInfo culture in CultureInfo.GetCultures(
        CultureTypes.WindowsOnlyCultures))
{
    Console.WriteLine(ci.Name);
}

This is obviously a break from the typical practice in the .NET Framework of giving the same results independent of the platform. However, given the choice between failing completely and succeeding when there is a way to retrieve the data, the option of handling Windows-only cultures successfully provides a better solution for developers who expect some type of culture data returned by the .NET Framework for these Windows-only locales.

You'll notice in the previous code snippet that the CultureInfo.GetCultures static method was used to retrieve a collection of Windows-only cultures. While GetCultures and the CultureTypes enumeration existed in previous versions of the Framework, the .NET Framework 2.0 rounds out the enumeration with more options in order to provide better support for custom and replacement cultures. One of these new values is WindowsOnlyCultures. Figure 4 provides a comparison of the various culture types.

I'll show that great screenshot of the Culture Explorer here, everyone love good art:

How we plan to use these objects is hopefully obvious -- if you get the CultureInfo in this way, any sort key we create by using CultureInfo.CompareInfo.GetSortKey, any comparison done via CultureInfo.CompareInfo.Compare, will all use the support built into Window for the language.

SQLCLR integration features, allowing one to create functions in .NET that can be called from stored procedures (available in Yukon)

In the same way that we effortlessly added Oriya support to the Culture Explorer by just sitting on a Windows Vista box and the or-IN (Oriya - India) culture that was made available thereby, you can effortlessly add support for an Oriya collation if your SQL Server 2005 indexes are being built on a Windows Vista machine!

I will definitely be covering the details of doing this in future articles, but the basic model should be fairly obvious -- a managed assembly containing some basic procedures for creating indexes and making comparisons, and some triggers for inserts and changes to the text columns that will in turn update the indexes. You can of course sort the data via that index column. Easy!

Most search operations can be done by using the original binary index described above, and a future column will describe how to do it with the homemade index (that part is a bit harder but if I provide the code it will hopefully be easy enough to use).

But the most important thing to keep in mind that this integrated solution gives you the power of SQL Server 2005, extended to the 25 locales added in the first ELK release, the 11 added in the second ELK release, or the many being added to Vista (you can see the beta if you have it, or you can sneak a peek at that screen shot to see some of the culture names in the list of cultures in the dialog.

In a word, amazing.

Future posts in this series can be put into two categories:

Feel free to let me know if you are impressed.:-)

 

This post brought to you by "" (U+0b23, ORIYA LETTER NNA)


# Michael S. Kaplan on 14 Sep 2005 3:51 PM:

Hmmmm... either the throngs of people who are excited by this concept have fainted due to the rush, or this is actually more boring than watching flies screw and I have misjudged the importance of the feature. :-)

Or maybe they are all at the PDC right now....

I will keep posting about it though, just in case!

referenced by

2006/09/01 Cue the smarter version of GetDateFormat... ok, it's a wrap!

2005/10/21 Extending collation support in SQL Server and Jet, Part 4 (What about Jet?)

2005/10/09 Extending collation support in SQL Server and Jet, Part 3 (THAT CLASS)

2005/09/25 Extending collation support in SQL Server and Jet, Part 2.1 (is this on?)

2005/09/18 Extending collation support in SQL Server and Jet, Part 2 (generating sort keys)

go to newer or older post, or back to index or month or day