Making SQL Server index usage a bit more deterministic

by Michael S. Kaplan, published on 2005/05/06 17:06 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/06/415199.aspx

Chris McKenzie blogged about a situation that I will be talking about in my upcoming TechEd 2005 presentations (particularly DBA319!), so I thought I might talk about the issue a bit here.

In his post Note to Self, Chris said:

Don't put ORDER BY Statements in Views.

Delay sorting until you're actually about to use the data.

If you end up sorting differently than the default sort specified by the ORDER BY in the view, it comes as an expensive performance hit. Witness a stored procedure that was breaking a unit test due to a SqlClient.SqlTimeout Exception (took 2.5 minutes to run). Removing the ORDER BY statements from the underlying views brought execution time down to 6 seconds.

Luckily it is not that as bad as all that (or it does not have to be). Though I actually said as much in my International Features in SQL Server 2000 white paper, which was reviewed and approved by the SQL Server International team (and several people on the Engine and at least one architect!):

Issues with defining collation at the column level

How often would you have a database that needs one sort order (for example, Latin1_General) and a column that needs a different one (for example Greek)? Sometimes, this may be crucial, but in other cases if the data in your database does not use a single collation, then it is probably multilingual data that may need to be sorted according to more than one collation. Being able to define multiple collations, each of which can be indexed, allows you to access the Greek data by specifying the Greek collation, and to have this query be an indexed search.

That last clause "and to have this query be an indexed search" is the crux of the matter. In the example provided earlier, using a COLLATE expression in the ORDER BY clause of a query gives you the functionality; however, this will not be an indexed ordering, so it will be slower for large datasets. As it stands, column-level collation makes sense only if you do not have monolingual data in a column, or if you denormalize your database to store different languages in different columns.

It was later pointed out to me that there was a way to make this work in Shiloh (SQL Server 2000), a way which gets much better in Yukon (SQL Server 2005). You can basically create multi-lingual indexes!

It works like this:

Say you have a Table tbl with column c with collation of Latin1_General_CI_AS

select * from tbl
order by c COLLATE Japanese_CI_AS

No Japanese index exists here -- so everything done with this uery will be slower.

create table tbl (c nvarchar(100), c2 as c COLLATE Japanese_CI_AS)
create index itc2 on tbl(c2)

The original query can now use the Japanese index!

Now, In SQL Server 2000, you must explicitly use that field c2 to get the index to work.

But in SQL Server 2005, the work was done in the engine to allow deterministic index usage that would not treat all indexes as "equal" and arbitrarily pick one. If an index exists on a column that matches the characteristice of the query, then the QPE will make intelligent use of that index.

Pretty cool trick, if you ask me. I wish I had known about in several of the applications I did, and before I wrote the white paper!

I will be delving into this sort of item and more in Orlando and in Amsterdam, so if you are going to TechEd 2005 then be sure to sign up for DBA 319: Best Practices for Search in Multilingual Data Sets (Using SQL Server Collation for Sorting and Indexing).

This post brought to you by "§" (U+00a7, a.k.a. SECTION SIGN)

# Chris McKenzie on 6 May 2005 7:00 PM:

Thanks for posting this. I solved my earler problem by delaying sorting until the stored procedure. I identified the problem as being that the ORDER BY clause in my stored proc was sorting the data on a different set of columns than the default sort specified by the underlying view. Are you saying that I could also have improved performance by adding indexes to the columns or column-groups that I will most often be sorting on? I may have to compare and benchmark the two approaches.

This was a very interesting blog. Thanks again.

# Michael S. Kaplan on 6 May 2005 8:23 PM:

Yes, that is what I am saying. :-)

And of course in SQLS 2005 the syntax is more natural since you never have to think about the indexes after you have created them; they will just work....

Glad you liked it! I'll be doing more SQL server posts as I get closer and closer to TechEd.

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

referenced by

2008/08/26 Making SQL Server operations slower (without explicitly trying)

2007/09/18 A&P of Sort Keys, part 8 (aka You can often think of ignoring weights as a form of ignorance)

2006/06/07 Performance issues with language specific sorts?

2006/01/29 Handling multilingual data in SQL Server

2006/01/01 Sorting multilingual data

2005/05/15 Not all SQL Server collations are created equal

2005/05/11 Case/kana/accent/width sensitive SQL Server, for testing

2005/05/11 Achieving case insensitivity

go to newer or older post, or back to index or month or day