How bad does data have to be before it is wrong? And how long does it have to be wrong before it is right?

by Michael S. Kaplan, published on 2008/08/06 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/08/06/8836106.aspx


There are not very many times that a feature within NLS can make a person psychotic.

Though of course by making such a claim one implies that there are in fact such cases, no matter how rare they may be.

This post will be about one of them....

It is about the TransliteratedFrench and TransliteratedEnglish calendars in Windows.

In order to properly tear them apart, first we'll write some code to enumerate the information in them.

Note that the tortuous method of getting to the data is not my idea. :-)

Here is the code:

namespace PsychoticCalendars {
    using System;
    using System.Globalization;
    class PsychoticCalendars {
        [STAThread]
        static void Main(string[] args) {
            CultureInfo[] rgci = {new CultureInfo("en-US"), new CultureInfo("fr-FR"), new CultureInfo("ar-IQ")};
            foreach(CultureInfo ci in rgci) {
                foreach(Calendar cal in ci.OptionalCalendars) {
                    if(cal is GregorianCalendar) {
                        Console.WriteLine("{0}\t{1} ({2})", ci.Name, cal, ((GregorianCalendar)cal).CalendarType);
                        ci.DateTimeFormat.Calendar = cal;
                        Console.Write('\t');
                        for(int i = 1; i <= 12; i++) {
                            Console.Write(ci.DateTimeFormat.GetMonthName(i) + "  ");
                        }
                        Console.WriteLine();
                        Console.Write('\t');
                        for(int i = 1; i <= 12; i++) {
                            Console.Write(ci.DateTimeFormat.GetAbbreviatedMonthName(i) + "  ");
                        }
                        Console.WriteLine();
                        Console.Write('\t');
                        for(DayOfWeek d = DayOfWeek.Sunday; d <= DayOfWeek.Saturday; d++) {
                            Console.Write(ci.DateTimeFormat.GetDayName(d) + "  ");
                        }
                        Console.WriteLine();
                        Console.Write('\t');
                        for(DayOfWeek d = DayOfWeek.Sunday; d <= DayOfWeek.Saturday; d++) {
                            Console.Write(ci.DateTimeFormat.GetAbbreviatedDayName(d) + "  ");
                        }
                        Console.WriteLine();
                        Console.Write('\t');
                        for(DayOfWeek d = DayOfWeek.Sunday; d <= DayOfWeek.Saturday; d++) {
                            Console.Write(ci.DateTimeFormat.GetShortestDayName(d) + "  ");
                        }
                        Console.WriteLine("\r\n");
                    }
                }
            }
        }
    }
}

First, before you run this code, you will want to run chcp 1256 or chcp 65001 since those are two of the only code pages that will be able to contain the French and Arabic letters that will be needed here.
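
If you would rather not depend on chcp, one alternative is to ask for UTF-8 from inside the program. The sketch below is not part of the original code; it only assumes that Console.OutputEncoding can be set on your runtime, and whether the letters actually render still depends on the console font.

```csharp
using System;
using System.Text;

static class ConsoleEncodingSketch {
    public static void Main() {
        // Request UTF-8 output, roughly what "chcp 65001" does for the session.
        Console.OutputEncoding = Encoding.UTF8;

        // One French and one Arabic month name taken from the output in this post.
        Console.WriteLine("février  آذار");
    }
}
```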

Okay, now here is the output....

en-US   System.Globalization.GregorianCalendar (Localized)
        January  February  March  April  May  June  July  August  September  October  November  December
        Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
        Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
        Sun  Mon  Tue  Wed  Thu  Fri  Sat
        Su  Mo  Tu  We  Th  Fr  Sa

en-US   System.Globalization.GregorianCalendar (USEnglish)
        January  February  March  April  May  June  July  August  September  October  November  December
        Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
        Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
        Sun  Mon  Tue  Wed  Thu  Fri  Sat
        Su  Mo  Tu  We  Th  Fr  Sa

fr-FR   System.Globalization.GregorianCalendar (Localized)
        janvier  février  mars  avril  mai  juin  juillet  août  septembre  octobre  novembre  décembre
        janv.  févr.  mars  avr.  mai  juin  juil.  août  sept.  oct.  nov.  déc.
        dimanche  lundi  mardi  mercredi  jeudi  vendredi  samedi
        dim.  lun.  mar.  mer.  jeu.  ven.  sam.
        di  lu  ma  me  je  ve  sa

ar-IQ   System.Globalization.GregorianCalendar (Localized)
        كانون الثاني  شباط  آذار  نيسان  أيار  حزيران  تموز  آب  أيلول  تشرين الأول  تشرين الثاني  كانون الأول
        كانون الثاني  شباط  آذار  نيسان  أيار  حزيران  تموز  آب  أيلول  تشرين الأول  تشرين الثاني  كانون الأول
        الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
        الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
        أ  ا  ث  أ  خ  ج  س

ar-IQ   System.Globalization.GregorianCalendar (USEnglish)
        January  February  March  April  May  June  July  August  September  October  November  December
        Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
        Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
        Sun  Mon  Tue  Wed  Thu  Fri  Sat
        Su  Mo  Tu  We  Th  Fr  Sa

ar-IQ   System.Globalization.GregorianCalendar (MiddleEastFrench)
        janvier  février  mars  avril  mai  juin  juillet  août  septembre  octobre  novembre  décembre
        janv.  févr.  mars  avr.  mai  juin  juil.  août  sept.  oct.  nov.  déc.
        dimanche  lundi  mardi  mercredi  jeudi  vendredi  samedi
        dim.  lun.  mar.  mer.  jeu.  ven.  sam.
        أ  ا  ث  أ  خ  ج  س

ar-IQ   System.Globalization.GregorianCalendar (TransliteratedEnglish)
        يناير  فبراير  مارس  ابريل  مايو  يونيو  يوليو  اغسطس  سبتمبر  اكتوبر  نوفمبر  ديسمبر
        يناير  فبراير  مارس  ابريل  مايو  يونيو  يوليو  اغسطس  سبتمبر  اكتوبر  نوفمبر  ديسمبر
        الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
        الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
        أ  ا  ث  أ  خ  ج  س

ar-IQ   System.Globalization.GregorianCalendar (TransliteratedFrench)
        جانفييه  فيفرييه  مارس  أفريل  مي  جوان  جوييه  أوت  سبتمبر  اكتوبر  نوفمبر  ديسمبر
        جانفييه  فيفرييه  مارس  أفريل  مي  جوان  جوييه  أوت  سبتمبر  اكتوبر  نوفمبر  ديسمبر
        الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
        الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
        أ  ا  ث  أ  خ  ج  س

Okay, so here we go.

Let's pick them apart, after getting some advice from the NLS "Calendar Girl" Shelby who first pointed out one of the problems here (and though we both turned out to be mistaken as to the cause, that is only because both of us attributed more smarts to the actual process!).

First of all, there is the fact that the shortest day names for the MiddleEastFrench calendar, rather than matching the French Gregorian localized calendar like all of the rest of the data does, match the Arabic Gregorian localized calendar.

Thus instead of 

di  lu  ma  me  je  ve  sa

we have

 أ  ا  ث  أ  خ  ج  س

which, for those who don't know Arabic, is

ALEF WITH HAMZA ABOVE, ALEF, THEH, ALEF WITH HAMZA ABOVE, KHAH, JEEM, SEEN

Okay.

Now there is also the fact that the shortest day names for the TransliteratedEnglish and TransliteratedFrench calendars are also identical to these. Note from the above that they are in no way transliterations for either the English or French Gregorian calendars.

That seems like kind of a problem too.

But don't worry too much -- it turns out that the day names and abbreviated day names for the TransliteratedEnglish and TransliteratedFrench calendars are also identical to the Arabic Gregorian localized calendar.

And are also in no way transliterations.
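
If you would rather have the machine do the comparing than eyeball the output, here is a rough sketch. The DayNamesFor helper is my own invention for illustration, and the result depends on the calendar data your Windows or .NET version ships; a runtime that does not offer these calendars for ar-IQ will throw when the calendar is assigned.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class DayNameComparison {
    // Hypothetical helper: the full day names ar-IQ reports for one Gregorian calendar type.
    public static string[] DayNamesFor(GregorianCalendarTypes type) {
        // A fresh CultureInfo each time, so the calendar assignment does not leak.
        DateTimeFormatInfo dtfi = new CultureInfo("ar-IQ").DateTimeFormat;
        dtfi.Calendar = new GregorianCalendar(type);
        return dtfi.DayNames;
    }

    public static void Main() {
        string[] localized = DayNamesFor(GregorianCalendarTypes.Localized);
        GregorianCalendarTypes[] suspects = {
            GregorianCalendarTypes.TransliteratedEnglish,
            GregorianCalendarTypes.TransliteratedFrench
        };
        foreach (GregorianCalendarTypes type in suspects) {
            // True means the "transliterated" day names are just the Arabic ones.
            bool same = DayNamesFor(type).SequenceEqual(localized);
            Console.WriteLine("{0}: day names identical to Localized = {1}", type, same);
        }
    }
}
```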

In case you don't believe me I'll take one and prove it. Wednesday is:

الاربعاء

which is

ALEF, LAM, ALEF, REH, BEH, AIN, ALEF, HAMZA

which is obviously not a transliteration for either Wednesday or mercredi.
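
For anyone who wants to check the letters without reading Arabic, a small sketch that dumps the code points of a string follows. The Describe helper is hypothetical, and it walks UTF-16 code units, which is enough here because every Arabic letter in this post is in the Basic Multilingual Plane. The eight letters listed above are U+0627, U+0644, U+0627, U+0631, U+0628, U+0639, U+0627 and U+0621.

```csharp
using System;

static class CodePoints {
    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces.
    public static string Describe(string s) {
        string[] parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++) {
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main() {
        // The Wednesday day name copied from the output above.
        Console.WriteLine(Describe("الاربعاء"));
    }
}
```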

Month names fare a bit better, though -- they do look like transliterations. Thus

سبتمبر

is

SEEN BEH TEH MEEM BEH REH

which is a fair transliteration for September, just as

فيفرييه

is

FEH YEH FEH REH YEH YEH HEH

which is kind of a transliteration for février.

Though of course in both the TransliteratedEnglish and TransliteratedFrench calendars, the abbreviated month names, rather than being transliterations of the English and French calendars, are identical to their non-abbreviated cousins.

At this point, it is fair to say that of the data in these three Gregorian calendars:

60% of it is just wrong, wrong, wrong in any conventional sense of how a reasonable person would expect these calendars to work.

If you ignore the shortest day name stuff (which was added fairly recently) then only 50% of it is wrong.
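
One way to arrive at these figures is sketched below. The counting scheme is an assumption on my part: it treats each calendar as five name lists (month, abbreviated month, day, abbreviated day, shortest day) and weights each list equally. The same counting gives the 75-80% range for the two transliterated calendars on their own.

```csharp
using System;

static class WrongDataTally {
    // Whole-number percentage of name lists that are wrong.
    public static int Percent(int wrong, int total) {
        return wrong * 100 / total;
    }

    public static void Main() {
        // MiddleEastFrench: 1 wrong list (shortest day names).
        // TransliteratedEnglish and TransliteratedFrench: 4 wrong lists each
        // (everything except the full month names).
        Console.WriteLine(Percent(1 + 4 + 4, 3 * 5)); // 60: all three calendars
        Console.WriteLine(Percent(0 + 3 + 3, 3 * 4)); // 50: ignoring shortest day names
        Console.WriteLine(Percent(4 + 4, 2 * 5));     // 80: transliterated calendars only
        Console.WriteLine(Percent(3 + 3, 2 * 4));     // 75: same, ignoring shortest day names
    }
}
```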

But this data is not newly wrong -- it has been wrong for as long as these calendars have existed -- in Windows 95, I think?

It would be easy to claim that this is really a fallback system kind of thing -- you know, data was not there so it is falling back to data elsewhere.

I could make such a claim right now credibly based on the situation.

There would be one problem with this claim, though.

The fact that I would be full of crap if I made it. :-)

This data is stored as is in the data and has been for as long as the data has been there.

These calendars are just wrong and weird and odd and strange and they are mostly not transliterations in any sense.

From a quality of data standpoint, in fact, I would tentatively suggest that we are currently hip deep in the low-point of NLS right now.

Transliterationally speaking, that is.

So I'll put forward the two questions again:

How bad does data have to be before it is wrong?

Will 50-60% do it? How about the 75-80% of the transliterated calendars only?

And how long does it have to be wrong before it is right?

Is over a decade long enough that this is not just okay? Or does fixing the worst of it make sense at some point in the future?


This blog brought to you by ج (U+062c, aka ARABIC LETTER JEEM)


John Cowan on 6 Aug 2008 12:44 PM:

But you can't *change* it, because Someone Out There is undoubtedly *depending* on that very trash!

Michael S. Kaplan on 6 Aug 2008 3:38 PM:

That might be true for some of it, but other parts are likely fixable. There is not always a requirement to stay broken. :-)

