Thus the problems resist solution, and the workarounds are often inadequate

by Michael S. Kaplan, published on 2010/03/15 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/03/15/9974423.aspx


Language can be hard.

Sure, there are really complex issues I have talked about before in languages like Hungarian (ref: Double compressions -- Hungarian goulash? and Hungarian is even more complicated than I thought). I mean, the need to consider collation differently depending on whether one is asking the "which comes first?" question or the "are they equal?" question is a hard problem, made harder by the fact that even when there are two functions and a good technical reason to call the right one, engineers often don't do it (as I point out here, for example).
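To make the "two questions, two functions" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the expansion table is a toy loosely modeled on Hungarian doubled digraphs, the names toy_sort_key, comes_first and are_equal are hypothetical, and none of it is how any real collation library does the job.

```python
import unicodedata

# Hypothetical, heavily simplified expansion table: for ORDERING purposes
# a doubled digraph/trigraph is expanded to its two full letters.
# Real Hungarian collation needs far more than this (assumption).
TOY_EXPANSIONS = {"ddzs": "dzsdzs", "ssz": "szsz"}


def toy_sort_key(text: str) -> str:
    """Key used only to answer "which comes first?"."""
    key = unicodedata.normalize("NFC", text).casefold()
    for short, expanded in TOY_EXPANSIONS.items():
        key = key.replace(short, expanded)
    return key


def comes_first(a: str, b: str) -> bool:
    """Ordering question: does a sort strictly before b?"""
    return toy_sort_key(a) < toy_sort_key(b)


def are_equal(a: str, b: str) -> bool:
    """Equality question: same text, ignoring only Unicode normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

With this sketch, "ddzs" and "dzsdzs" get the same sort key, so neither comes first, yet are_equal reports them as different strings. Code that reuses the ordering answer to decide equality would quietly merge the two, which is the kind of wrong call described above.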

The fact that they don't have a function anyway may seem bad.

But on the other hand?

In the end, it really speaks against making something possible if you have good reason to believe it won't be utilized.

And there is even more to it than that.

Even as language shifts in its expected usage, software that keeps on top of the issues does little more than wait to flip a switch.

I mean, whether one thinks about the Norwegian change that has already happened, no longer treating aa as å (ref: here), or the Swedish change that will happen, differentiating w and v (ref: here), all this amounts to is waiting until the people annoyed with us for doing some old out-of-date thing outnumber the people annoyed with us for doing some new-fangled thing. Basically, waiting for the tipping point, and hoping that not too many people get annoyed along the way as the static snapshots of "what we do" interact with people who do stuff at different times, in different ways.
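The "flip a switch" idea can be sketched as a sort key with a single flag. This is only an illustration under stated assumptions: the function name swedish_sort_key, the w_is_distinct flag and the simplified alphabet are all hypothetical, and the key handles only lowercase-able words made of the letters listed.

```python
# Simplified Swedish alphabet (assumption: no other characters appear).
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def swedish_sort_key(word: str, w_is_distinct: bool) -> tuple:
    """Toy two-level sort key; the flag is the 'switch' being flipped."""
    lowered = word.lower()
    if w_is_distinct:
        # Newer convention: w is its own letter between v and x.
        primary = tuple(ALPHABET.index(ch) for ch in lowered)
        secondary = tuple(0 for _ in lowered)
    else:
        # Older convention: w sorts as v; it only breaks ties afterwards.
        primary = tuple(ALPHABET.index("v" if ch == "w" else ch) for ch in lowered)
        secondary = tuple(1 if ch == "w" else 0 for ch in lowered)
    return (primary, secondary)


words = ["vik", "wagner", "xylofon", "valp"]
old_order = sorted(words, key=lambda w: swedish_sort_key(w, False))
new_order = sorted(words, key=lambda w: swedish_sort_key(w, True))
print(old_order)  # ['wagner', 'valp', 'vik', 'xylofon']
print(new_order)  # ['valp', 'vik', 'wagner', 'xylofon']
```

The code change is trivial; the hard part is deciding when to flip the flag, since every list sorted under the old snapshot now disagrees with every list sorted under the new one.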

But even with that, people usually want more.

Consider the following; each of these is something I have been asked for multiple times:

Now that last one just came in this past weekend, in regard to SQL Server. And yes, it is bound up with more than the technical issues I mention in What the %#$* is wrong with German sorting?, which are sometimes just a side effect of the logical way to implement a technical solution, and which scandalize interested language experts (though generally not computational linguists) whenever they hear about them.

Instead, these are actual linguistic issues: some words need to be treated differently, and there is no algorithmic way to discern the differences.

Now each of the above problems can be solved with engineering effort that may include enhanced algorithms or may include knowledge of specific words/letters or may include dictionary lookups or may just include bug fixes/mea culpas.

Though for the most part the team that owns globalization services considers all of this to be out of scope for "low level" functions.

And of course they include no "higher level functions".

Now any of these issues can be hacked up by anyone, but if the team that does all the work with linguistic fidelity doesn't do it, how likely is it that some other team will do better here, especially when the only satisfactory way to fix many of these problems from an engineering standpoint is to do so inside the same kinds of algorithms that those lower level functions perform?

And especially when there aren't oodles of teams just jumping at the chance to do that kind of work if the team that traditionally is supposed to care so much says meh.

Thus the problems resist solution, and the workarounds are often inadequate....

All of which ignores the fact that even when cooler, more useful functions and methods and flags exist, people don't know to call them anyway!

