Pluralization(s) can be singularly difficult

by Michael S. Kaplan, published on 2007/07/24 03:15 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/07/24/4022881.aspx


A tribute to plurals, with fondest memories of the first comedian I ever enjoyed, Allan Sherman (original inspiration of Weird Al Yankovic for those who don't know the name):

One Hippopotami

One hippopotami cannot get on a bus,
Because one hippopotami is two hippopotamus.
And if you have two goose, that makes one geese.
A pair of mouse is mice. A pair of moose is meese.

A paranoia is a bunch of mental blocks.
And when Ben Casey meets Kildare, that's called a paradox.
When two minks fall in love, with all their heart and soul,
You'll find the plural of two minks is one mink stole.

Singulars and plurals are so different, bless my soul.
Has it ever occurred to you that the plural of "half" is "whole"?

A bunch of tooth is teeth. A group of foot is feet.
And two canaries make a pair--they call it a parakeet.
A paramecium is not a pair.
A parallelogram is just a crazy square.

Nobody knows just what a paraphernalia is.
And what is half a pair of scissors, but a single sciz?
With someone you adore, if you should find romance,
You'll pant, and pant once more, and that's a pair of pants

Pluralization is hard.

Even in English you need a huge dictionary with all of the weird and interesting exception cases (once you convince yourself that sticking an "s" on the end of every word won't do it).
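To make the "s plus a dictionary" idea concrete, here is a minimal sketch in Python. The function name `pluralize`, the `IRREGULAR` table, and the handful of suffix rules are all assumptions made up for illustration; a real pluralizer would need a far larger exception list than this.

```python
# Hypothetical, deliberately tiny exception table; a real one would be huge.
IRREGULAR = {
    "goose": "geese",
    "mouse": "mice",
    "tooth": "teeth",
    "foot": "feet",
    "moose": "moose",  # unchanged in the plural, whatever the song says
    "half": "halves",
}


def pluralize(noun: str) -> str:
    """Return a plural for a lowercase English noun (illustrative rules only)."""
    # Exceptions always win over the suffix rules.
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    # Sibilant endings take "es": bus -> buses, box -> boxes.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + "y" becomes "ies": canary -> canaries (but day -> days).
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    # The naive default.
    return noun + "s"
```

Even this toy version needs three rules and a lookup table before it gets "bus", "canary", and "goose" right, and it still knows nothing about loan words or uncountable nouns.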

It came up again in multiple comments by Centaur on "In a much better position to handle inserts", like this one and this other one.

And his examples were not really overstatements, believe it or not.

We'll start with the obligatory Wikipedia link on pluralization, which will help to scratch the surface enough to make one realize what one has just gotten oneself into.

Languages like Spanish are considered pretty simple, but an explanation of their plural rules can still fill a page.

English is reasonably simple with its cases for one, many, and uncountable (where the uncountables are usually in singular form, and zero items take the plural form). But all of the rules with subject/verb agreement are the main force behind me not paying attention in English class as a youth and being much more interested in the weird rules of other languages than the rules of my own. You could almost blame my linguistic notions on the crazy orgy of inconsistencies embodied in my native tongue.

And then in French things are pretty simple (ref: here, here, and here). Then again, those pages list exceptions up the wazoo. Oh, and zero items take a singular form, which also sounds weird to me, though someone from France would find the converse to be true.
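The zero case is easy to show in code. The following Python sketch covers only the count-based choice described above (zero is plural in English, singular in French); the function names and the two-letter language codes are assumptions for illustration, and none of the exceptions those pages list are handled.

```python
def uses_plural(count: int, language: str) -> bool:
    """Decide whether a non-negative count takes the plural form."""
    if language == "en":
        # English: only exactly one item is singular, so zero is plural.
        return count != 1
    if language == "fr":
        # French: zero and one are both singular.
        return count not in (0, 1)
    raise ValueError(f"no rule for language {language!r}")


def choose_form(count: int, singular: str, plural: str, language: str) -> str:
    """Format a count with the noun form that the language's rule selects."""
    noun = plural if uses_plural(count, language) else singular
    return f"{count} {noun}"
```

With these rules, `choose_form(0, "file", "files", "en")` gives "0 files" while `choose_form(0, "fichier", "fichiers", "fr")` gives "0 fichier", which is exactly the difference that sounds weird from either side.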

Then there is Hebrew, whose uncountable words tend to take plural form. Oh, and they add gender to the mix as many others do, each with a different suffix. Then there are those bisexual words like "one" which have both a masculine and feminine suffix form. And some words that are feminine yet take a masculine plural suffix.

Most Indic languages have singular, dual, and plural forms, though Hindi only has singular and plural while Sanskrit has the dual form too.

Lots of other Indo-European languages also have a dual form.

Polish has singular and plural like most of them, but then it also has a paucal form for when the last digit is 2, 3, or 4 (not including 12, 13, or 14).
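As a rough sketch of that Polish rule in Python (the function names are hypothetical, and the noun "plik", meaning file, with its three forms is my own example rather than anything from the sources above):

```python
def polish_category(n: int) -> str:
    """Classify a non-negative integer as singular, paucal, or plural."""
    if n == 1:
        return "singular"
    # Last digit 2, 3, or 4, except for the teens 12, 13, and 14.
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "paucal"
    return "plural"


# Example noun (an assumption for illustration): "plik" = file.
FORMS = {"singular": "plik", "paucal": "pliki", "plural": "plików"}


def count_files(n: int) -> str:
    """Format a count with the matching form of the example noun."""
    return f"{n} {FORMS[polish_category(n)]}"
```

So 2, 3, 4, 22, and 104 take "pliki", while 0, 5, 12, 13, 14, and 112 take "plików". Two string resources per message are clearly not enough here.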

Persian (or is it Farsi? Or maybe not!) has many rules a lot like English, other than the influx of Arabic loan words that come with their plurals and make up a lot of exceptions -- which, come to think of it, is also a lot like English. Though with different loan words (and of course the different script).

And Slovenian has a special purpose "dual" that is used for two, and more generally for numbers whose last two digits are 02 (2, 102, 202, and so on).

To put it in programming terms a bit, Jeff Boulter has talked about it in his "5 way(s) to pluralize", and I just noticed that Tom White, who also quoted a bit of that Allan Sherman song, has even made a plea for people with knowledge of other languages to get involved with Java solutions here.

But C# is out there too -- see dmitryr's Simple English Noun Pluralizer in C#, for example, which has a couple of great comments that delve into additional exceptions and other languages.

Or fun ones like Bradley Tetzlaff's C# 2.0 Ninety Nine Bottles of Beer Example, which shows a very important practical implementation. :-)

Even my own "IStemmer'ed the tide" talks about how stemming is involved with pluralization (among other things).

The rules are complex enough that getting even one language perfectly right is hard, so the prospect of doing lots of languages is staggering.

Definitely a hard problem to consider. I think I'll leave that one alone, myself, and just try and stem some tides (leaving the stemmering to others!).

 

This post brought to you by S (U+0053, a.k.a. LATIN CAPITAL LETTER S)


# Mihai on 24 Jul 2007 12:33 PM:

Let's add Romanian, and count horses (works for everything else, including files and folders, but horses are shorter :-):

0 cai // plural
1 cal // singular
2 cai
3 cai
...
19 cai
20 de cai // !!!
...
99 de cai
100 cai // !!!
...
10119 cai
10120 de cai

So the rule is: if n % 100 >= 20, you add "de".

People are used to "de" missing in computer UI, because computers are stupid, but you don't get away with it as a human :-)
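Taking the rule exactly as stated in this comment, a computer could do better with very little code. This is only a sketch in Python: the function name is made up, and it encodes nothing beyond the n % 100 >= 20 test and the sample counts listed above.

```python
def count_horses(n: int) -> str:
    """Format a non-negative count of horses using the rule from the comment."""
    if n == 1:
        return "1 cal"          # singular
    if n % 100 >= 20:
        return f"{n} de cai"    # the last two digits are 20..99, so add "de"
    return f"{n} cai"           # plain plural, including zero
```

Run over the counts in the list above, this reproduces each line: "19 cai", "20 de cai", "100 cai", "10119 cai", "10120 de cai".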


referenced by

2007/08/13 Some documentations is having troubles with theirs pluralizations syntaxes
