Why that is positively Ethiopic!

by Michael S. Kaplan, published on 2005/02/01 00:03 -08:00, original URI: http://blogs.msdn.com/michkap/archive/2005/02/01/364376.aspx


A little over a week ago, when I was mentioning that In Tamil -- sometimes, they are digits; other times, just numbers, Scott Hanselman suggested "That would ROCK if you would do Ethiopic sometime." Well, rock on Scott -- today is the day.

For the record, I am not an expert in these things, just a geek who finds alternate number systems to be really interesting (whether Roman numerals, Tamil numbers, or Ethiopic numbers).

Ready? Here we go....

Factoid -- there is no Ethiopic zero. There are some numbers that have zeros in them (10, 20, 30, etc.) but no zero. It makes the number system quite fascinating.

We'll start with a small quote from the Unicode Standard on the subject, found in Chapter 12, Section 1 (available for viewing online in PDF format):

Numbers. Ethiopic digit glyphs are derived from the Greek alphabet, possibly borrowed from Coptic letterforms. In modern use, European digits are often used. The Ethiopic number system does not use a zero, nor is it based on digital-positional notation. A number is denoted as a sequence of powers of 100, each preceded by a coefficient (2 through 99). In each term of the series, the power 100^n is indicated by n HUNDRED characters (merged to a digraph when n = 2). The coefficient is indicated by a tens digit and a ones digit, either of which is absent if its value is zero.

For example, the number 2345 is represented by

2,345 = (20 + 3)*100^1 + (40 + 5)*100^0
      = 20 3 100 40 5
      = TWENTY THREE HUNDRED FORTY FIVE
      = 1373 136b 137b 1375 136d 
      = ፳፫፻፵፭
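
The arithmetic in that example can be checked with a minimal Python sketch. It assumes only that Python's standard `unicodedata` module reports the numeric value recorded for each Ethiopic character; the variable names are just for illustration.

```python
import unicodedata

# The example string from the quote: TWENTY THREE HUNDRED FORTY FIVE
text = "፳፫፻፵፭"

# Numeric value of each character, as recorded in the Unicode character database
values = [int(unicodedata.numeric(ch)) for ch in text]

# (20 + 3) * 100^1 + (40 + 5) * 100^0
total = (values[0] + values[1]) * values[2] + (values[3] + values[4])

print(values)  # [20, 3, 100, 40, 5]
print(total)   # 2345
```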

If you are like me then your eyes may have crossed when you read this, even though the example seemed clear enough. Maybe they should have put in a bigger example....

Personally, I find Daniel's Ethiopic Number Algorithm #4 to be much clearer from a conceptual standpoint. If you prefer something a bit more cerebral with code samples, then you can look at http://www.geez.org/Numerals/ for a slightly different algorithm (using the same number, I suspect a shared source, maybe? <grin>). The page even has links to demonstrations of the algorithm in Perl, C, Java, and C#.
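
For a rough feel of that direction (integer to Ethiopic), here is a minimal Python sketch. It is not the geez.org code and not Daniel's algorithm. The function name `to_ethiopic` is hypothetical, and the rules it follows are assumptions: split the number into two-digit groups from the right, close odd-numbered groups with HUNDRED and even-numbered groups (other than the last) with TEN THOUSAND, drop the HUNDRED for an empty group, and leave a coefficient of one unwritten on the leading group and before HUNDRED.

```python
HUNDRED, TEN_THOUSAND = "\u137b", "\u137c"
# U+1369..U+1371 are the ones (1-9); U+1372..U+137A are the tens (10-90)
ONES = [""] + [chr(0x1369 + i) for i in range(9)]
TENS = [""] + [chr(0x1372 + i) for i in range(9)]

def to_ethiopic(n: int) -> str:
    """Sketch: convert a positive integer to Ethiopic numerals."""
    if n < 1:
        raise ValueError("there is no Ethiopic zero")
    if n == 1:
        return ONES[1]
    # Two-digit groups, least significant first: 2345 -> [45, 23]
    groups = []
    while n:
        n, group = divmod(n, 100)
        groups.append(group)
    top = len(groups) - 1
    pieces = []
    for i in range(top, -1, -1):
        g = groups[i]
        # A coefficient of one stays implicit on the leading group and before HUNDRED
        hide_one = g == 1 and (i == top or i % 2 == 1)
        digits = "" if hide_one else TENS[g // 10] + ONES[g % 10]
        if i % 2 == 1:
            sep = HUNDRED if g else ""   # no HUNDRED for an empty group
        elif i > 0:
            sep = TEN_THOUSAND           # written even when the group is empty
        else:
            sep = ""
        pieces.append(digits + sep)
    return "".join(pieces)

print(to_ethiopic(2345))  # ፳፫፻፵፭
```

Under those assumptions the sketch reproduces the 2345 example from the Unicode quote; it has not been checked against the reference implementations linked from that page.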

So let us take the resulting number that both sites talk about (፯፻፷፭፼፵፫፻፳፩) and try to convert it back from Ethiopic to our familiar Arabic-Indic digits:

= ፯፻፷፭፼፵፫፻፳፩

= 136f 137b 1377 136d 137c 1375 136b 137b 1373 1369

= DIGIT SEVEN; NUMBER HUNDRED; NUMBER SIXTY; DIGIT FIVE; NUMBER TEN THOUSAND; NUMBER FORTY; DIGIT THREE; NUMBER HUNDRED; NUMBER TWENTY; DIGIT ONE

(I removed the word ETHIOPIC from each character name to allow more to fit per line)

At this point, even knowing what the number is, the words on the site ("Conversion from Ethiopic numerals into western form is trivial") do not seem quite as true, do they? :-)

It actually is easy, though; it just looks hard. Keep in mind those "sentinels" that ETHIOPIC NUMBER HUNDRED and ETHIOPIC NUMBER TEN THOUSAND represent (with up to two digits in each group between them), and we have:

= DIGIT SEVEN; NUMBER HUNDRED;
      NUMBER SIXTY; DIGIT FIVE; NUMBER TEN THOUSAND;
      NUMBER FORTY; DIGIT THREE; NUMBER HUNDRED;
      NUMBER TWENTY; DIGIT ONE

Notice how the sentinels keep swapping between the TEN THOUSAND and the HUNDRED? Interesting...

Picking at the pieces:


= 07
      65
      43
      21

or more conventionally

7654321
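
The grouping step above can be sketched in a few lines of Python. This is a deliberately naive version: it assumes every two-digit group is present and that no coefficient of one has been left unwritten, and the name `split_at_sentinels` is made up for the example.

```python
import unicodedata

def split_at_sentinels(text):
    """Return the two-digit groups of an Ethiopic number, assuming none is missing."""
    groups, pair = [], 0
    for ch in text:
        value = int(unicodedata.numeric(ch))
        if value >= 100:       # HUNDRED or TEN THOUSAND closes the current group
            groups.append(pair)
            pair = 0
        else:                  # a tens or ones character adds to the current group
            pair += value
    groups.append(pair)        # the last group has no sentinel after it
    return groups

groups = split_at_sentinels("፯፻፷፭፼፵፫፻፳፩")
print(groups)                                      # [7, 65, 43, 21]
print(int("".join(f"{g:02d}" for g in groups)))    # 7654321
```

Writing each group with two digits is what keeps a lone ones character (such as a group worth 9) from collapsing into the wrong column.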

Not too hard, right? Let's try another one:

= ፳፩፼፳፰፻፷፯፼፶፫፻፱

= 1373 1369 137c 1373 1370 137b 1377 136f 137c 1376 136b 137b 1371

= NUMBER TWENTY; DIGIT ONE; NUMBER TEN THOUSAND; NUMBER TWENTY; DIGIT EIGHT; NUMBER HUNDRED; NUMBER SIXTY; DIGIT SEVEN; NUMBER TEN THOUSAND; NUMBER FIFTY; DIGIT THREE; NUMBER HUNDRED; DIGIT NINE

A little harder this time, but let's do the grouping where those grouping sentinels are and see what we have:

= NUMBER TWENTY; DIGIT ONE; NUMBER TEN THOUSAND;
    NUMBER TWENTY; DIGIT EIGHT; NUMBER HUNDRED;
    NUMBER SIXTY; DIGIT SEVEN; NUMBER TEN THOUSAND;
    NUMBER FIFTY; DIGIT THREE; NUMBER HUNDRED;
    DIGIT NINE

We seem to be missing a digit right before that nine -- what happened to two digits in each group? Ah, that's easy -- look at the sentinel! A zero goes there. So we have:

= 21 
    28 
    67 
    53 
    09

And as Tommy Tutone knows, Jenny's New York phone number is indeed 212-867-5309.

Ok, one more that shows a bit more of that missing zero stuff:

= ፶፻፭፼፭

= 1376 137b 136d 137c 136d

= NUMBER FIFTY; NUMBER HUNDRED; DIGIT FIVE; NUMBER TEN THOUSAND; DIGIT FIVE

Ooh, a tough one. I'll insert some fake zeros in where they seem to belong based on those sentinels:

= NUMBER FIFTY; NUMBER HUNDRED; 
    DIGIT ZERO; DIGIT FIVE; NUMBER TEN THOUSAND;
    DIGIT ZERO; DIGIT ZERO; NUMBER HUNDRED;
    DIGIT ZERO; DIGIT FIVE

So we have:

= 50 
    05
    00
    05

Or more conventionally 50,050,005.

Now of course I am not saying that you would write code that is quite this silly. But it is reasonably straightforward to write an algorithm that can handle these numbers. A bit more background required than I would try to give for an interview question (though someone who could understand it in such a short time and come up with a good answer might have impressed me).
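
To make that concrete, here is a minimal Python sketch of one possible approach; it is an illustration under stated assumptions, not the code from either site. It treats each TEN THOUSAND as a separator between chunks worth less than 10000, and each HUNDRED as splitting a chunk into a high pair and a low pair, so missing groups come out as zero without any fake digits. The names `pair_value` and `from_ethiopic` are hypothetical, and the input is assumed to be well formed.

```python
import unicodedata

HUNDRED, TEN_THOUSAND = "\u137b", "\u137c"

def pair_value(chars):
    """Sum a tens character and/or a ones character; empty text gives 0."""
    return sum(int(unicodedata.numeric(ch)) for ch in chars)

def from_ethiopic(text):
    """Sketch: convert a well-formed Ethiopic number to an int."""
    chunks = text.split(TEN_THOUSAND)    # each chunk is worth less than 10000
    total = 0
    for i, chunk in enumerate(chunks):
        if HUNDRED in chunk:
            # Assumes at most one HUNDRED per chunk
            high, low = chunk.split(HUNDRED)
            # A bare HUNDRED means 1 * 100: the coefficient one is not written
            value = (pair_value(high) or 1) * 100 + pair_value(low)
        else:
            value = pair_value(chunk)
            # A leading bare TEN THOUSAND likewise means 1 * 10000
            if i == 0 and len(chunks) > 1 and value == 0:
                value = 1
        total = total * 10000 + value
    return total

print(from_ethiopic("፶፻፭፼፭"))  # 50050005
```

It makes no attempt to reject malformed input (two HUNDREDs in one chunk simply raise an error), so there is still plenty left to do.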

Anyone want to take a stab at it? :-)

Side note #1 -- the Unicode Technical Committee voted in UTC#98 to change the general category of the ETHIOPIC DIGITS from Nd (Number, Digit) to No (Number, Other) due in large part to the fact that the Ethiopic numbers are not generally used as digits. This change was effective as of Unicode 4.0.1. As such, the update will not be seen in Windows until Longhorn or in the .NET Framework until the version after Whidbey.

Side note #2 -- Ethiopic is in the category of scripts I defined in "The jury will give this string no weight" (a fact that will not be changing until coincidentally around the same time -- Longhorn and the .NET Framework in the version after Whidbey).

 

This post brought to you by "፼" (U+137c, a.k.a. ETHIOPIC NUMBER TEN THOUSAND)


# Igor Tandetnik on Tuesday, February 01, 2005 9:45 AM:

How do you represent a number with four or more consecutive zeros? Say, 1000001 or 10000000001. Do you alternate HUNDRED and TEN THOUSAND characters, with nothing between them, for each group of two zeros?

# Michael Kaplan on Tuesday, February 01, 2005 9:54 AM:

Well, here are a few more to show the pattern:

1000001 ፻፼፩
10000000001 ፻፼፼፩
100000000000001 ፻፼፼፼፩

:-)

# Ahadu on Tuesday, February 01, 2005 10:02 AM:

1000001 => ፻፼፩
10000000001 => ፻፼፼፩

See:

http://geez.org/Numerals/NumberSamples.html

sample numeral conversions from the sources.

# Michael Kaplan on Tuesday, February 01, 2005 10:11 AM:

Or you can do what I did.... I took the C# source and compiled it. The code creates that NumberSamples.html file, and you can add to it whatever numbers you like. :-)

# Michael Kaplan on Tuesday, February 01, 2005 11:27 AM:

Darn, I figured thousands of page views would find one person who wanted to give it a shot. :-)

# Dean Harding on Tuesday, February 01, 2005 2:55 PM:

I never understood how they represent the answer to "5 - 5" in these number systems... I guess they hadn't invented subtraction when they came up with it. :p~

Besides, it's really only mathematicians who care about "0" - you don't really see it in every day life, do you?

# Michael Kaplan on Tuesday, February 01, 2005 3:31 PM:

Well, I see zero all the time.... and so do Swedes (it's on every elevator).

# Dean Harding on Tuesday, February 01, 2005 4:47 PM:

Sure, but if we didn't have a zero to begin with, then they'd probably put "G" on there (or whatever the first letter of the Swedish word for "ground" is).

I guess my point was that if we hadn't "invented" zero so that subtraction could be properly defined, then we'd probably never need one (e.g. instead of saying "$0 deposit" in an ad for a car, you'd say "no deposit" or whatever.)

Mind you, I only really started thinking about this when I posted my first post, so maybe there *are* plenty of reasons for a zero outside of mathematics - I just can't think of one right now (that is, where you can't replace the zero by something equally meaningful...)

As an example, my local IP address is 10.0.0.45. Those two zeros could just be left blank and you'd have "10. . .45" which is equally unambiguous.

Anyway, the Ethiopians, Romans and Sri Lankans seemed to get along fine without them. Maybe my point is irrelevant, I dunno, but it's interesting nonetheless.

# Michael Kaplan on Tuesday, February 01, 2005 4:50 PM:

Definitely interesting -- the whole area fascinates me. :-)

Though it was too bad no one decided to code up the Ethiopian to Arabic-Indic solution....

# Marcel on Tuesday, February 01, 2005 6:09 PM:

Okay, all I know about this numeric system I have from this article and the linked algorithm, but don't you leave out some "power characters"?
Like in "7654321" shouldn't it be "DIGIT SEVEN; NUMBER HUNDRED; *NUMBER TEN THOUSAND;*" etc.?

# Dean Harding on Tuesday, February 01, 2005 6:40 PM:

Just another reason I love reading your blog - there's always something that gets me thinking about things I would have never considered before. I mean, why would it have otherwise occurred to me that you can get along without a character for zero anyway?

OK, you've convinced me: I'm at work now (different timezones and all) but when I get home, I'll see if I can't write a little Ethiopian to Arabic-Indic converter :)

# Michael Kaplan on Tuesday, February 01, 2005 8:47 PM:

Well, let's see -- the number would be ፯፻፷፭፼፵፫፻፳፩.

That's:

DIGIT SEVEN; NUMBER HUNDRED;
NUMBER SIXTY; DIGIT FIVE; NUMBER TEN THOUSAND;
NUMBER FORTY; DIGIT THREE; NUMBER HUNDRED;
NUMBER TWENTY; DIGIT ONE

So you started ok, but you forgot the two numbers between the HUNDRED and the TEN THOUSAND....

:-)

# Marcel on Wednesday, February 02, 2005 2:41 AM:

No, I mean if you look at the linked "algorithm 4" it should be
DIGIT SEVEN; NUMBER HUNDRED; NUMBER TEN THOUSAND;
NUMBER SIXTY; DIGIT FIVE; NUMBER TEN THOUSAND;
NUMBER FORTY; DIGIT THREE; NUMBER HUNDRED;
NUMBER TWENTY; DIGIT ONE

The seven is 7*10^6 or 7*10^(2+4) after all.

# Dean Harding on Wednesday, February 02, 2005 4:14 AM:

OK, I took up your challenge :)

You can see a screenshot of my app here: http://www.codeka.com/tmp/ethiopian.png

And you can download the C# source + binary here: http://www.codeka.com/tmp/ethiopian.zip

Simply enter an arabic-indic number in the bottom text box, click convert and it'll output an ethiopic number (the code for that is "borrowed" from that web site). If you then cut'n'paste that ethiopic number into the top text box and click that convert button, it'll convert it back to the familiar arabic-indic form. (That's the code I wrote).

I didn't write any automated tests, but you can manually test by typing a number into the arabic-indic box and clicking "Test" - this'll convert to Ethiopic then back again - you'll have to eyeball the result to make sure it's right.

It's probably not that lenient with respect to invalid ethiopic numbers, but it works OK for normalized numbers.

The code for doing the conversion is not that hard, but I won't bother explaining it here, since you should just be able to look at it and see (probably better to follow it through with the debugger, it's not commented very well, heh).

# Michael Kaplan on Wednesday, February 02, 2005 6:09 AM:

Very cool, Dean -- you took up the challenge and got the job done!

Under the circumstances, I would recommend adding a comment to the top with a copyright and a little info, adding an attribution to the other site for "algorithmic inspiration in one direction" and then posting it in a more permanent place than a "tmp" directory. Code like this ought not to be lost. :-)

# Dean Harding on Wednesday, February 02, 2005 11:59 AM:

Well, I've been meaning to set up a blog of my own for quite a while now, so this is a good excuse to get off my butt and do it over the weekend. Once I do that, I'll fix up the code and post it in a more permanent place.

# Michael Kaplan on Wednesday, February 02, 2005 12:20 PM:

Well, make sure it is a place that will let you host files and such.... some blogs do and some don't. :-)

This one for example does not -- but I had other sites I could use to offload content to when I have to show pictures or whatever....

# Ahadu on Wednesday, February 02, 2005 8:51 PM:

Marcel, you correctly perceived a discrepancy with Algorithm #4: it produces different results than the source code. #4 was snared by a pitfall with the power expansions.

The geez.org article is really an algorithm #6 by the same author and supersedes anything appearing earlier. The article first appeared in Multilingual magazine in 2000 and has been used in Mozilla since 2001 or so. A prose form of the algorithm can be found in the CSS3 list module proposal. Lots of numeral algorithms there actually.

# Michael Kaplan on Wednesday, February 02, 2005 9:13 PM:

Interesting! Too bad things are not labeled a bit more explicitly....

I'll think of algorithm #4 like the Bohr model of the atom -- even though it's not accurate, it does help explain some of the concepts. :-)

# Marcel on Thursday, February 03, 2005 6:38 AM:

Ah, I haven't had a look at the second article because the first one was already simple enough to understand.
I also found that it fit the Unicode quote ("the power 100^n is indicated by n HUNDRED characters"), the only clue to the contrary now being that they're speaking of a digraph when 'n' is exactly two and not for every 2 'n's.

Ah well, it's not like I think I'm ever going to need it ;-) But interesting nonetheless.

# Dean Harding on Tuesday, February 08, 2005 2:38 AM:

I finally set up my blog and posted a slightly newer version of my app. The main difference is a slight restructuring of the class that does the conversion, and some copyright notices :)

Have a look here: http://www.codeka.com/blogs/index.php/dean/2005/02/04/parsing_ethiopian_numbers

I'd eventually like to make it handle stuff like roman numerals and other kinds of number systems...

:-)

# Michael Kaplan on Tuesday, February 08, 2005 7:55 AM:

Very cool. :-)

referenced by

2010/02/20 The road not traveled (or, more to the point, the road not built) for Amharic

2010/02/03 Read my LIP[s]: It rhymes with አማርኛ (Amharic) !

2008/10/21 Behond the Table Driven Text Service, Part 14 (Don't expect too much from numbers)

2007/04/14 Rhymes with Amharic #4 (a.k.a. we're all [sub]set so turning out the lights and going to [em]bed!)

2007/02/14 Nothing seems to be parsing the crap out of *this* number

2006/10/02 What would it mean to internationalize StrCmpLogicalW?
