by Michael S. Kaplan, published on 2005/02/01 00:03 -08:00, original URI: http://blogs.msdn.com/michkap/archive/2005/02/01/364376.aspx
A little over a week ago, when I was mentioning that "In Tamil -- sometimes, they are digits; other times, just numbers," Scott Hanselman suggested "That would ROCK if you would do Ethiopic sometime." Well, rock on, Scott -- today is the day.
For the record, I am not an expert in these things, just a geek who finds alternate number systems to be really interesting (whether Roman numerals, Tamil numbers, or Ethiopic numbers).
Ready? Here we go....
Factoid -- there is no Ethiopic zero. There are some numbers that have zeros in them (10, 20, 30, etc.) but no zero. It makes the number system quite fascinating.
We'll start with a small quote from the Unicode Standard on the subject, found in Chapter 12, Section 1 (available for viewing online in PDF format, here):
Numbers. Ethiopic digit glyphs are derived from the Greek alphabet, possibly borrowed from Coptic letterforms. In modern use, European digits are often used. The Ethiopic number system does not use a zero, nor is it based on digital-positional notation. A number is denoted as a sequence of powers of 100, each preceded by a coefficient (2 through 99). In each term of the series, the power 100^n is indicated by n HUNDRED characters (merged to a digraph when n = 2). The coefficient is indicated by a tens digit and a ones digit, either of which is absent if its value is zero.
For example, the number 2345 is represented by
2,345 = (20 + 3)*100^1 + (40 + 5)*100^0
= 20 3 100 40 5
= TWENTY THREE HUNDRED FORTY FIVE
= 1373 136b 137b 1375 136d
= ፳፫፻፵፭
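To check that arithmetic by machine, here is a minimal Python sketch. It assumes the standard library's unicodedata module carries numeric values for these characters; the variable names are mine, not the Standard's.

```python
import unicodedata

# Code points from the Unicode Standard example for 2345.
code_points = [0x1373, 0x136B, 0x137B, 0x1375, 0x136D]
text = "".join(chr(cp) for cp in code_points)

# unicodedata.numeric() returns a float; every value here is a whole number.
values = [int(unicodedata.numeric(ch)) for ch in text]

# (20 + 3) * 100 + (40 + 5), exactly as the example spells it out.
twenty, three, hundred, forty, five = values
total = (twenty + three) * hundred + (forty + five)

print(text, values, total)
```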
If you are like me then your eyes may have crossed when you read this, even though the example seemed clear enough. Maybe they should have put in a bigger example....
Personally, I find Daniel's Ethiopic Number Algorithm #4 to be much clearer from a conceptual standpoint. If you prefer something a bit more cerebral with code samples, then you can look at http://www.geez.org/Numerals/ for a slightly different algorithm (using the same number, I suspect a shared source, maybe? <grin>). The page even has links to demonstrations of the algorithm in Perl, C, Java, and C#.
So let us take the resulting number that both sites talk about (፯፻፷፭፼፵፫፻፳፩) and try to convert it back from Ethiopic to our familiar European (Western Arabic) digits:
= ፯፻፷፭፼፵፫፻፳፩
= 136f 137b 1377 136d 137c 1375 136b 137b 1373 1369
= DIGIT SEVEN; NUMBER HUNDRED; NUMBER SIXTY; DIGIT FIVE; NUMBER TEN THOUSAND; NUMBER FORTY; DIGIT THREE; NUMBER HUNDRED; NUMBER TWENTY; DIGIT ONE
(I removed the word ETHIOPIC from each character name to allow more to fit per line)
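The two listings above (code points, then shortened names) can be reproduced with a few lines of Python. This is only a sketch; describe is a hypothetical helper name I made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (hex code points, short character names) for an Ethiopic string."""
    code_points = [format(ord(ch), "04x") for ch in text]
    # Drop the leading word ETHIOPIC so more names fit per line.
    names = [unicodedata.name(ch).replace("ETHIOPIC ", "", 1) for ch in text]
    return code_points, names

code_points, names = describe("፯፻፷፭፼፵፫፻፳፩")
print(" ".join(code_points))
print("; ".join(names))
```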
At this point, even knowing what the number is, the words on the site ("Conversion from Ethiopic numerals into western form is trivial") do not seem quite as true, do they? :-)
It actually is easy, though; it just looks hard. Keep in mind the "sentinels" that ETHIOPIC NUMBER HUNDRED and ETHIOPIC NUMBER TEN THOUSAND represent (with up to two characters in each group between them), and we have:
= DIGIT SEVEN; NUMBER HUNDRED;
NUMBER SIXTY; DIGIT FIVE; NUMBER TEN THOUSAND;
NUMBER FORTY; DIGIT THREE; NUMBER HUNDRED;
NUMBER TWENTY; DIGIT ONE
Notice how the sentinels keep swapping between the TEN THOUSAND and the HUNDRED? Interesting...
Picking at the pieces:
= 7
65
43
21
or more conventionally
= 7654321
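The "split at the sentinels, then read off the pairs" step can be sketched in Python as below. The function names are hypothetical. The sketch also assumes that every two-digit group is actually written out, which happens to be true for this number but is not true for every Ethiopic number.

```python
import unicodedata

HUNDRED = "\u137b"       # ETHIOPIC NUMBER HUNDRED
TEN_THOUSAND = "\u137c"  # ETHIOPIC NUMBER TEN THOUSAND

def split_groups(text):
    """Split on the two sentinels and sum the tens and ones in each group."""
    groups, current = [], 0
    for ch in text:
        if ch in (HUNDRED, TEN_THOUSAND):
            groups.append(current)
            current = 0
        else:
            current += int(unicodedata.numeric(ch))
    groups.append(current)  # whatever follows the last sentinel
    return groups

def fold_groups(groups):
    """Treat the groups as base-100 'digits', most significant first."""
    total = 0
    for group in groups:
        total = total * 100 + group
    return total

groups = split_groups("፯፻፷፭፼፵፫፻፳፩")
print(groups, fold_groups(groups))
```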
Not too hard, right? Let's try another one:
= ፳፩፼፳፰፻፷፯፼፶፫፻፱
= 1373 1369 137c 1373 1370 137b 1377 136f 137c 1376 136b 137b 1371
= NUMBER TWENTY; DIGIT ONE; NUMBER TEN THOUSAND; NUMBER TWENTY; DIGIT EIGHT; NUMBER HUNDRED; NUMBER SIXTY; DIGIT SEVEN; NUMBER TEN THOUSAND; NUMBER FIFTY; DIGIT THREE; NUMBER HUNDRED; DIGIT NINE
A little harder this time, but let's do the grouping where those grouping sentinels are and see what we have:
= NUMBER TWENTY; DIGIT ONE; NUMBER TEN THOUSAND;
NUMBER TWENTY; DIGIT EIGHT; NUMBER HUNDRED;
NUMBER SIXTY; DIGIT SEVEN; NUMBER TEN THOUSAND;
NUMBER FIFTY; DIGIT THREE; NUMBER HUNDRED;
DIGIT NINE
We seem to be missing a digit right before that nine -- what happened to two numbers in each group? Ah, that's easy -- look at the sentinel! A zero goes there. So we have:
= 21
28
67
53
09
And as Tommy Tutone knows, Jenny's New York phone number is indeed 212-867-5309.
Ok, one more that shows a bit more of that missing zero stuff:
= ፶፻፭፼፭
= NUMBER FIFTY; NUMBER HUNDRED; DIGIT FIVE; NUMBER TEN THOUSAND; DIGIT FIVE
Ooh, a tough one. I'll insert some fake zeros in where they seem to belong based on those sentinels:
= NUMBER FIFTY; NUMBER HUNDRED;
DIGIT ZERO; DIGIT FIVE; NUMBER TEN THOUSAND;
DIGIT ZERO; DIGIT ZERO; NUMBER HUNDRED;
DIGIT ZERO; DIGIT FIVE
So we have:
= 50
05
00
05
Or more conventionally 50,050,005.
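Rather than literally inserting fake zeros, one way to get the same answers is to let the sentinels do the multiplying. HUNDRED scales the coefficient just before it by 100, and TEN THOUSAND scales everything seen so far by 10,000. That reading of the sentinels is my assumption, not something taken from either site, and ethiopic_to_int is a hypothetical name. It is a sketch that only checks the character range, not whether the sequence is well formed.

```python
import unicodedata

HUNDRED = "\u137b"       # ETHIOPIC NUMBER HUNDRED
TEN_THOUSAND = "\u137c"  # ETHIOPIC NUMBER TEN THOUSAND

def ethiopic_to_int(text):
    """Convert an Ethiopic numeral string to an int (illustrative sketch)."""
    total = 0    # everything already scaled by a TEN THOUSAND
    segment = 0  # hundreds seen since the last TEN THOUSAND
    pending = 0  # tens + ones not yet followed by a sentinel
    for ch in text:
        if not "\u1369" <= ch <= "\u137c":
            raise ValueError(f"not an Ethiopic number character: {ch!r}")
        if ch == HUNDRED:
            # A bare HUNDRED with no coefficient counts as 1 * 100.
            segment += (pending or 1) * 100
            pending = 0
        elif ch == TEN_THOUSAND:
            segment += pending
            if segment == 0 and total == 0:
                segment = 1  # a leading bare TEN THOUSAND means 1 * 10000
            total = (total + segment) * 10000
            segment = pending = 0
        else:
            pending += int(unicodedata.numeric(ch))
    return total + segment + pending

for sample in ("፯፻፷፭፼፵፫፻፳፩", "፳፩፼፳፰፻፷፯፼፶፫፻፱", "፶፻፭፼፭"):
    print(sample, ethiopic_to_int(sample))
```

A skipped group (like the missing HUNDRED group in the last sample) needs no special case here, because a group that was never written simply contributes nothing.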
Now of course I am not saying that you would write code that is quite this silly. But it is reasonably straightforward to write an algorithm that can handle these numbers. A bit more background required than I would try to give for an interview question (though someone who could understand it in such a short time and come up with a good answer might have impressed me).
Anyone want to take a stab at it? :-)
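For the other direction, here is one hedged stab in Python. It is a sketch, not a port of the code on either site. The function name int_to_ethiopic is hypothetical, and the rules for dropping an implicit ONE and for skipping an empty HUNDRED group are my assumptions about the convention. It does reproduce the numbers used in this post.

```python
HUNDRED = "\u137b"       # ETHIOPIC NUMBER HUNDRED
TEN_THOUSAND = "\u137c"  # ETHIOPIC NUMBER TEN THOUSAND

def int_to_ethiopic(number):
    """Write a positive int with Ethiopic number characters (illustrative sketch)."""
    if number < 1:
        raise ValueError("there is no Ethiopic zero (or negative number)")
    # Base-100 groups, least significant first.
    groups = []
    while number:
        number, group = divmod(number, 100)
        groups.append(group)
    last = len(groups) - 1
    parts = []
    for pos in range(last, -1, -1):
        group = groups[pos]
        tens, ones = divmod(group, 10)
        # U+1372.. are TEN..NINETY, U+1369.. are ONE..NINE.
        digits = chr(0x1372 + tens - 1) if tens else ""
        digits += chr(0x1369 + ones - 1) if ones else ""
        # Sentinels alternate from the right: none, HUNDRED, TEN THOUSAND, ...
        if pos == 0:
            sentinel = ""
        elif pos % 2:
            sentinel = HUNDRED
        else:
            sentinel = TEN_THOUSAND
        if group == 0 and sentinel == HUNDRED:
            continue  # assumption: an empty HUNDRED group is not written at all
        if group == 1 and (sentinel == HUNDRED or (pos == last and sentinel)):
            digits = ""  # assumption: a coefficient of one is left implicit
        parts.append(digits + sentinel)
    return "".join(parts)

for sample in (2345, 7654321, 2128675309, 50050005):
    print(sample, int_to_ethiopic(sample))
```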
Side note #1 -- the Unicode Technical Committee voted in UTC#98 to change the general category of the ETHIOPIC DIGITS from Nd (Number, Digit) to No (Number, Other) due in large part to the fact that the Ethiopic numbers are not generally used as digits. This change was effective as of Unicode 4.0.1. As such, the update will not be seen in Windows until Longhorn or in the .NET Framework until the version after Whidbey.
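To see which category a given runtime reports, a quick Python check looks like this. The result depends on the Unicode data bundled with the interpreter; any current Python ships data much newer than Unicode 4.0.1, so I would expect No throughout.

```python
import unicodedata

# U+1369..U+1371 are ETHIOPIC DIGIT ONE..NINE; U+1372..U+137C are the
# tens, NUMBER HUNDRED and NUMBER TEN THOUSAND.
categories = {unicodedata.category(chr(cp)) for cp in range(0x1369, 0x137D)}
print(categories)

# Characters outside Nd are not treated as decimal digits by str.isdecimal().
one = "\u1369"  # ETHIOPIC DIGIT ONE
print(one.isdecimal())
```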
Side note #2 -- Ethiopic is in the category of scripts I defined in "The jury will give this string no weight" (a fact that will not be changing until, coincidentally, around the same time -- Longhorn and the .NET Framework in the version after Whidbey).
This post brought to you by "፼" (U+137c, a.k.a. ETHIOPIC NUMBER TEN THOUSAND)