Every character has a story #8: U+03c2 (GREEK SMALL LETTER FINAL SIGMA)

by Michael S. Kaplan, published on 2005/03/31 02:48 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/31/404052.aspx


GREEK SMALL LETTER FINAL SIGMA is the sort of character that only ever gets to have the last word.

The character (ς) only ever gets used when it is the last character in the word; otherwise you are supposed to use U+03c3 (σ, a.k.a. GREEK SMALL LETTER SIGMA).

Some of the backstory about its presence in Unicode is explained on this site:

The final sigma is a positional variant of sigma (U+03C3 Greek Small Letter Sigma, σ), such as also occurs in Hebrew and Arabic. It might legitimately be questioned whether Unicode needed a separate codepoint for the two lowercase sigmas; and indeed, Beta Code has done without the differentiation. The use of distinct codepoints in the legacy scheme of Latin-7 has decided the matter, however.

That site also talks a bit about the history of the actual letter's use in the language.

U+03c2 gets some prominent mentions in the Unicode FAQ section on character properties, case mappings, and names:

Q: Do the case mappings in Unicode allow a round-trip?

A: No, there are instances where two characters map to the same result. For example, both a sigma and a final sigma uppercase to a capital sigma.

and

Q: Near the end of the SpecialCasing.txt, there are the two lines on SIGMA that look weird to me. Can you explain them:
# 03C3; 03C2; 03A3; 03A3; FINAL; # GREEK SMALL LETTER SIGMA
# 03C2; 03C3; 03A3; 03A3; NON_FINAL; # GREEK SMALL LETTER FINAL SIGMA

A: Both of these are conditional (column 5); that is, in normal Greek text a 03C3 (non-final sigma) should be written as 03C2 (final sigma) if it is at the end of a word, and a 03C2 (final sigma) should be written as a 03C3 (non-final sigma) if it is not at the end of a word. That's what these two lines would mean if they were uncommented. However, they are commented, just for that reason: the SpecialCasing file is not intended to normalize the appearance of a small sigma.
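To make the first FAQ answer concrete, here is a minimal Python sketch of the lost round-trip. The constant names and the `round_trips` helper are my own, purely for illustration, and the behavior shown is that of CPython 3's built-in `str.upper()` and `str.lower()`; other platforms may differ.

```python
SIGMA = "\u03c3"          # σ GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"    # ς GREEK SMALL LETTER FINAL SIGMA
CAPITAL_SIGMA = "\u03a3"  # Σ GREEK CAPITAL LETTER SIGMA


def round_trips(text):
    """Hypothetical helper: does uppercasing then lowercasing restore text?"""
    return text.upper().lower() == text


# Both lowercase sigmas map to the same capital letter.
print(SIGMA.upper() == CAPITAL_SIGMA)        # True
print(FINAL_SIGMA.upper() == CAPITAL_SIGMA)  # True

# An isolated final sigma therefore cannot survive the round trip:
# the capital sigma lowercases to the ordinary sigma.
print(round_trips(SIGMA))        # True
print(round_trips(FINAL_SIGMA))  # False
```

Note that CPython's `lower()` is context-sensitive for capital sigma, so a whole word such as "\u039f\u0394\u039f\u03a3" lowercases with a final sigma at the end. That is casing behavior, not the sigma normalization the commented-out SpecialCasing lines describe.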

There was also an interesting thread in the Unicode List about this character back in early 2001:

Carl W. Brown:

Is it final when followed by a hyphen or combining diacritical mark?  Can you have a final sigma in the middle of a word?

Patrick T. Rourke:

Don't know what the Unicode rules are, but the answer is no.  The final sigma form is not used if the sigma is in a medial position in the word but at the end of the line (e.g., when it occurs at the point of hyphenation in a hyphenated word at line end).  Also, there is no reason why a consonant other than rho should be followed by a combining diacritical mark, except say an underdot for use in papyrological or epigraphical texts.

The upper case sigma is the same regardless of position; there is no differentiation between upper case final sigma and upper case initial/medial sigma.

If a font uses the lunate sigma for the initial/medial form, it must use it for the final form as well, and vice versa.

Nick Nicholas:

On the latter, yes in some 19th century typographical traditions, where the final sigma is used to differentiate the prefix pros- from pro-; e.g. you'll see Lambros in his _Neos Hellenomnemon_ journal write, say, PRO*S*ABBATON = pro-sabbaton, but PRO*@*AGW = pros-agw. (Sorry about non-Unicode; I'm on a Mac and have left my lookup-list at the office.) This tradition has not been maintained, and I don't think it was ever mainstream in Western Europe. I think I've also seen it done with other such prefixes, like eis-.

Diacritics following a final sigma would only occur in Modern Greek dialectology --- e.g. hacek used to denote that the sigma is pronounced as "sh". (Epigraphists and papyrologists too, I suppose, though they'd tend more to the lunate sigma anyway.) In that case, yes, the final sigma remains final. Before a hyphen, on the other hand, it would clearly remain medial, unless you're pulling the 19th century pros- prefix trick.

Carl W. Brown riposted to Nick:

Nick,

If you have a lowercase sigma in the middle of the word followed by a diacritic is it final;

sigma, hacek, some other letter.

Carl W. Brown then tried to restate the question he was trying to ask:

Maybe it might be clearer to ask if there are any cases where you use the final sigma form where it is not the last letter in a word.  Modern Greek only.

Lucas Pietsch responded to Patrick:

Just one addition: You do get a final sigma before explicit (hard) hyphens, i.e. u+2010 and other kinds of dashes, as opposed to (soft) line-breaking hyphens (u+00AD).

I guess explicit hyphenation isn't likely to occur in typesetting of Ancient Greek, but it does occur in Modern Greek, in noun compounds of the type κράτος-μέλος. The Unicode rules will handle this correctly, as far as I can see.

Michael Everson proved he'd be an ace in the "Scripts" category of the Trivial Pursuit game :-)  :

Sigma with caron is used at least in dictionaries of the Tsakonian dialect.

Nick Nicholas disagreed with Carl W. Brown's suggestion:

>If you have a lowercase sigma in the middle of the word followed by a diacritic is it final;

>sigma, hacek, some other letter.

No, sir. And medial sigma-diacritic is far more frequent than a sigma having a diacritic word-finally.

Nick Nicholas also responded to the restating of the question by Carl W. Brown:

What I described in my first paragraph is the only such instance I'm aware of (the 19th texts I have in mind were editions of Byzantine texts, but I think the editor was generalising it in his orthography, and was not the only one to do so). It has never been mainstream practice. You'll see a lot of stigmas as sigma-tau ligatures up to the nineteenth century, and being printed as final sigmas; but they're stigmas nonetheless, not sigmas.

Oh, just remembered: the phonetic Greek alphabet used in the Soviet Union in the '30s for Pontic and Mariupolitan Greek uses the final sigma universally (and doubles it for "sh".) Again, not mainstream, and any such texts that have been reprinted in Greek academia have been reprinted in conventional orthography. (The Mariupolitans are now using Cyrillic; the ex-Soviet Pontians are mostly migrating to Greece, and I don't know if they're still writing their dialect.)

Carl W. Brown responded to this fuller response from Nick Nicholas:

It looks like the Unicode TR 21 special casing rules for the Greek final sigma are not quite right.

The final sigma in modern Greek should only be used at the end of a word including the case where separate words are joined with hard hyphens.  If it is followed by a character such as a combining mark or soft hyphen you must continue scanning to see what follows.  If it is followed by a letter then it is not final.

A simpler test might be to see if a letter or a spacing character or hard hyphen is found first.  If it is a letter then it is not a final sigma.
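The scanning test Carl describes can be sketched in a few lines of Python. This is only my reading of his rule, not the Unicode Final_Sigma condition and not anything from the thread itself: the function names are hypothetical, "letter" is approximated with `str.isalpha()`, and the sketch only looks forward, so a sigma with nothing after it is always treated as final.

```python
import unicodedata

SIGMA = "\u03c3"        # σ
FINAL_SIGMA = "\u03c2"  # ς
SOFT_HYPHEN = "\u00ad"


def is_skippable(ch):
    """Combining marks (categories Mn, Mc, Me) and soft hyphens are skipped."""
    return unicodedata.category(ch).startswith("M") or ch == SOFT_HYPHEN


def sigma_is_final(text, i):
    """Scan forward from the sigma at text[i]; a letter found first means non-final."""
    for ch in text[i + 1:]:
        if is_skippable(ch):
            continue  # keep scanning to see what really follows
        return not ch.isalpha()
    return True  # ran off the end of the text


def normalize_sigmas(text):
    """Rewrite every lowercase sigma to the form the scanning rule calls for."""
    out = []
    for i, ch in enumerate(text):
        if ch in (SIGMA, FINAL_SIGMA):
            ch = FINAL_SIGMA if sigma_is_final(text, i) else SIGMA
        out.append(ch)
    return "".join(out)


# Hard hyphen between two words: both sigmas end up final.
print(normalize_sigmas("κράτοσ-μέλοσ"))
```

Under this sketch a sigma followed by a caron and then a letter stays medial, while a sigma followed by a caron at the end of the word becomes final, which matches what Nick Nicholas described for the dialectological cases.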

Nick Nicholas responded to this suggestion promptly:

Which is what we do at the TLG with Beta code (whose S is both medial or final); in fact, Beta code conflates hard hyphens and dashes anyway, considering the (em) dash, without space, punctuation.

If the Unicode rules are wrong, well, I hope those that can fix them are tuned in. :-)

Mark Davis then jumped in, to respond to the bug report:

Yes, that was filed as a bug, and will be fixed the next time we update the case mappings. We are right in the middle of the Unicode 3.1 release, so that will be coming sometime later.

I will talk about this character on Windows in a different post....

 

This post brought to you by "ς" and "σ" (U+03c2 and U+03c3, a.k.a. GREEK SMALL LETTER FINAL SIGMA and GREEK SMALL LETTER SIGMA)
Both of whom are eager to read that upcoming post about their status on Windows!


# Sebastian Redl on 31 Mar 2005 10:21 AM:

Old German script has the same thing. The old s in the middle of the word looks like an f, but without the horizontal line up there (more precisely, it only extends in one direction). At the end of the word, it's written like a modern s.

referenced by

2007/06/12 The difference between 'Dangeous Characters' and 'Dangerous Minds' is the lack of Michelle Pfeiffer

2006/08/18 Sometimes, uppercasing sucks

2005/04/04 When casing does not need to roundtrip in .NET
