Every character has a story #13: U+0241 and U+0294 (upper and lower case glottal stops)

by Michael S. Kaplan, published on 2005/08/17 08:32 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/08/17/452603.aspx

It started just the other day when John Jenkins asked on the core Unicode mailing list:

Now that we have an uppercase glottal stop, any recommendations as to how it should look in a font? Both the uc and lc glottal stops occupy the full space from baseline to cap height, or so I've always understood it…

To which Peter Constable suggested:

He he he... This is the same matter I raised several months ago.

A slight clarification to what you wrote: the orthographic bicameral glottal stops used by some Athabaskan languages are x height and cap height. The caseless glottal stop used in phonetic transcription is cap height -- identical to the orthographic uppercase.

Well, the matter has temporarily gotten slightly more complex: last week, UTC approved the addition of LATIN SMALL LETTER GLOTTAL STOP as the x-height case pair to 0241 (tentatively assigned to 0242). If this is accepted in WG2, then it would likely go in amendment 3, meaning post the next version of Unicode. So, if you are creating fonts *now* to support TUS4.1, you can choose between:

- having identical glyphs for 0294 and 0241, anticipating a later addition of the lowercase pair to 0241;
- you can innovate a distinction between glyphs for 0294 and 0241 (e.g. make 0241 slightly wider); or,
- don't support 0241 until you can also support 0242.

Another logical option consistent with TUS4.1 would be to make 0294 x height, but I'd strongly advise against that, as it likely won't be consistent with a future version of the standard, and would not work for phonetic transcription, which will likely be relevant to a larger number of Apple customers than the orthographic usage.

Ken Whistler's response was perhaps a little more cynical, though accurate as usual!

Ah, John, you arrive right on time with the first set of misconceptions about what is going on.

What happened last week is that a *lowercase* glottal stop was added, invalidating the relationship between the Unicode 4.1 *uppercase* glottal stop and the erstwhile glottal stop, and returning the erstwhile glottal stop to glorious *un*cased status. (See my data file message from yesterday.)

So now we need *3* glyphs for glottal stops.

First we have the *real* glottal stop, U+0294, used in most orthographies without case.

It started out as a tiny hook, grew to the top half of a question mark to accommodate linguists filing the dots off their typewriters to be able to type the thing. It grew further, under the auspices of the IPA, into a taller and taller glyph, in a largely vain attempt to convince Europeans that "nothing" could be a *real* letter -- merely by making it so big they could no longer ignore its appearance in text.

Then the Chipewyan, Dogrib, and Slavey communities in NW Canada, aided and abetted by linguists who should have known better, invented a case pair for glottal stop. And because IPA had turned the thing into a monumental cap form in their effort to get people to take it seriously, the Dene decided, quite reasonably, that that monstrosity *was* a capital letter, and so invented a tinier version to be their normal glottal stop -- the lowercase one in running text.

In actual samples of Chipewyan and Dogrib texts using the case pair, the distinction is basically between a cap-height capital glottal stop and an x-height small glottal stop, otherwise of the same shape. I don't know of any systematic way to distinguish their cap-height capital glottal stop from what we now have as a caseless U+0294 glottal stop -- because, frankly, I don't think that issue ever occurred to the people who were using it and creating fonts for it.

Michael Everson has stepped in, in L2/05-194, with an attempt to make a 3-way glyphic distinction. But I consider the shapes for U+0241 and U+0242 (the uc/lc pair) to be typographic fantasizing. We now have a LATIN CAPITAL LETTER GLOTTAL STOP that has grown "fatter" in another vain attempt to convince people it is a *real* letter and to be systematically distinguished from an ordinary glottal stop.

Oh well. Take L2/05-194 for what it is worth. That is the font that is likely to go into the book to confuse future generations further. Now, basically because typographic inflation and casepair invention has still not convinced Europeans that glottal stops are real -- we have embarked down the road of multiplying the encoding of them. Perhaps by the time we have encoded 7 more glottal stops (in addition to U+0241, U+0242, U+0294, U+02BC, U+02BE, U+02C0, and U+097D) we'll finally manage to convince people to take it seriously. :-(
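(Editorial aside, not part of the thread.) The arrangement Ken describes, with U+0241 and U+0242 as a case pair and U+0294 returned to uncased status, can be checked against the Unicode Character Database. Below is a minimal sketch using Python's standard `unicodedata` module. It assumes a Python 3 interpreter, whose bundled database is a post-5.0 version, so it shows the final arrangement rather than the Unicode 4.1 data, in which U+0241 lowercased to U+0294. The helper name `describe` is illustrative only.

```python
import unicodedata

# The three code points discussed in the thread.
GLOTTAL_STOPS = ["\u0241", "\u0242", "\u0294"]


def describe(ch: str) -> dict:
    """Return the UCD name, general category and simple case mappings for ch."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # e.g. Lu, Ll, or Lo
        "upper": f"U+{ord(ch.upper()):04X}",   # each maps to a single character here
        "lower": f"U+{ord(ch.lower()):04X}",
    }


if __name__ == "__main__":
    # The database version depends on the Python release in use.
    print("UCD version:", unicodedata.unidata_version)
    for ch in GLOTTAL_STOPS:
        print(describe(ch))
```

With any Python 3 database, U+0241 (Lu) and U+0242 (Ll) map to each other, while U+0294 reports category Lo and maps only to itself.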

Ken further responded, to Peter's post:

> If this is accepted in WG2, then it would likely go in amendment 3,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It *must* go in Amendment 2. The whole point of the rush on their lowercase pairs for uppercase characters is because the door slams shut on casefolding stability issues as of Unicode 5.0.

> meaning post the next version of Unicode.
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

meaning Unicode 5.0.

> Another logical option consistent with TUS4.1 would be to make 0294 x height, but I'd strongly advise against that,

As would I. You might as well go with the fantasy, Big 'N Tall uppercase glottal stop to convince people for TUS 4.1 that it is the appropriate uppercase for U+0294, and then leave Ol' Fat Boy there in your font when you add the x-height U+0242 for Unicode 5.0 as the actual lowercase for it.

--Ken

Oh, and since you asked -- no, I'm not bitter at all about this travesty. ;-)
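(Editorial aside, not part of the thread.) The lockdown Ken and Michael mention concerns case folding, the mapping used for caseless matching. In Unicode 4.1 data, U+0241 lowercased (and so case-folded) to U+0294, meaning the two matched caselessly; adding U+0242 changed that result, which is why it had to land before the stability guarantee began with Unicode 5.0. A minimal sketch, assuming Python 3 (post-5.0 data), shows the resulting behavior with `str.casefold`. The helpers `caseless_key` and `caseless_equal` are hypothetical, and NFKC followed by case folding is a simplification, not the full caseless-matching algorithm from the standard.

```python
import unicodedata


def caseless_key(s: str) -> str:
    """Simplified caseless-matching key: NFKC-normalize, then apply full case folding.

    Illustrative approximation only; not the complete compatibility caseless
    match defined in the Unicode Standard.
    """
    return unicodedata.normalize("NFKC", s).casefold()


def caseless_equal(a: str, b: str) -> bool:
    """True if a and b have the same simplified caseless key."""
    return caseless_key(a) == caseless_key(b)


if __name__ == "__main__":
    # With post-5.0 data the capital folds to the new small letter...
    print(caseless_equal("\u0241", "\u0242"))
    # ...and no longer matches the now-uncased U+0294 (it did under 4.1 data).
    print(caseless_equal("\u0241", "\u0294"))
```

Any stored index built from 4.1-era folding would have matched U+0241 with U+0294; the same strings folded with later data do not, which is exactly the kind of change a stability policy is meant to rule out.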

Michael Everson agreed with this last assessment:

> then it would likely go in amendment 3, meaning post the next version of Unicode.

No, it has to go into the FPDAM2 because of the case-folding stability lockdown.

>- you can innovate a distinction between glyphs for 0294 and 0241 (e.g. make 0241 slightly wider); or,

I do this. The width of the footstem for 0241 should be the width of a capital I; the width of 0294 should be the width of a small i.

Though he did differ from some of Ken's opinions:

>returning the erstwhile glottal stop to glorious *un*cased status.

Which is the way it is used in IPA and in other Canadian orthographies: as an uncasing character ignored e.g. in title casing.

>...convince Europeans that "nothing" could be a *real* letter -- merely by making it so big they could no longer ignore its appearance in text.

I don't know why you are picking on Europeans, Ken.

>Then the Chipewyan, Dogrib, and Slavey communities in NW Canada, aided and abetted by linguists who should have known better, invented a case pair for glottal stop.

Case-pairing is a perfectly natural thing for people to want to do, and it is no surprise that people did this, and I for one don't find it as distasteful as Ken does.

>And because IPA had turned the thing into a monumental cap form in their effort to get people to take it seriously, the Dene decided, quite reasonably, that that monstrosity *was* a capital letter, and so invented a tinier version to be their normal glottal stop -- the lowercase one in running text.

But of course the UTC decided that 0294 could not be turned from Ll to Lu, which is why 0241 was added as Lu, but this caused the problems that L2/05-194 identifies and addresses.

>Michael Everson has stepped in, in L2/05-194, with an attempt to make a 3-way glyphic distinction. But I consider the shapes for U+0241 and U+0242 (the uc/lc pair) to be typographic fantasizing. We now have a LATIN CAPITAL LETTER GLOTTAL STOP that has grown "fatter" in another vain attempt to convince people it is a *real* letter and to be systematically distinguished from an ordinary glottal stop.

Capital letters should have the same vertical weights. I, T, Y, and the capital glottal have stems which should be cap width. 0294 should have the same width as l and i. This is pretty easy, really.

And of course Michael had to respond to Ken's postscript:

>Oh, and since you asked -- no, I'm not bitter at all about this travesty. ;-)

We've only ended up encoding what SIL asked for in the first instance, based on what was Peter's correct analysis. What was accepted by UTC then was an overunification, and that's had to be corrected now. Sorry if that smarts.

John Jenkins responded to part of Ken's post, adding the ideographic point of view to all of this:

> ...we'll finally manage to convince people to take it seriously. :-(

Nah, we'll have to wait until there is an isomorphism between glottal stops, middle dots, turtles, and grass radicals. :-)

Ken had to respond to Michael Everson's post about Europeans (just as Michael had to respond to Ken's!):

> I don't know why you are picking on Europeans, Ken.

Because it was the Spanish, English, Portuguese, French, and the Dutch, with their impoverished, Latin-based writing systems, who colonized the Americas, Michael.

At least if the Arabs had colonized the Americas instead of the Western Europeans, they would have recognized (and written) a glottal stop when they heard one. On the other hand, then we'd probably be arguing about how to encode the 7 dots above and below the LAM needed for writing an ejective lateral affricate in these languages. :-(

Rick McGowan had some fun with John's foray into ideographs:

> Nah, we'll have to wait until there is an isomorphism between glottal stops, middle dots, turtles, and grass radicals. :-)

Oh! oh! Look what I just found... The rare "Spotted Vegetarian Turtle Stop"...

[MK -- picture not posted here in the blog, but it was pretty funny]

To which Benson Margulies responded:

Hmm. A mark turtle stop?

Michael Everson responded to this one effectively:

>Hmm. A mark turtle stop?

A mo' tur'le sto'.

Michael then responded to Ken's European qualification:

>Because it was the Spanish, English, Portuguese, French, and the Dutch, with their impoverished, Latin-based writing systems, who colonized the Americas, Michael.

Pity the Finns didn't take UPA with them.

Peter responded to the theorizing about colonizing of the Americas with an interesting thought...

Hmmm... There are Korean linguists who think that Hangul would do at least as good a job as Latin as a script that can be adapted to any transcription or orthographic need. So, what if the Koreans had colonized the Americas? I guess we'd be re-opening debates about Hangul encoding models.

To which Michael Everson responded:

Yeah... we only have three.

Thomas Milo had a little fun with Peter's notion, too...

What if...

https://www.winston.nl/sitenieuw/artrooms/ar203.html
https://www.winston.nl/sitenieuw/artrooms/poitiers.pdf

A MILLENNIUM BUG

Almost two millennia ago the Romans conquered the Netherlands. Today the Dutch still use Latin letters to write their language. Down with perennial spelling reforms. Just change the bloody script! That would have happened anyway, if Charles Martel at Poitiers AD 732 had lost the battle against the Muslim raider Abd-ar-Rahman ibn-Abdullah al-Ghafiqi. As a result, many European languages, including our own, might have been written with Arabic letters ever since and look today like medieval Spanish or 19th century Bosnian Croato-Serbian.

What have we learned? Well, that the glottal stop is harder than we thought it would be. For me, a native speaker of no language that has a glottal stop and someone who thinks of it as looking like punctuation, it is even harder to imagine. Like a lowercase question mark or something. I guess that's why my thoughts of having any linguistic aptitude are just delusions....

If nothing else the whole conversation shows that the internal list can be just as off the wall as the external one can be! :-)

# Daniel Garlans on 17 Aug 2005 9:09 AM:

Is a glottal stop the "click" that you see in certain African languages, or is it that one where you sorta swallow in the middle of the word?

# Michael S. Kaplan on 17 Aug 2005 10:33 AM:

Someone just pointed out an "example" in English (where glottal stops are not phonemic) -- if you say "uh oh" then between the two syllables is one (and if it is at the beginning of the sentence then before the word is one, too).

But I have trouble conceiving of a case pair of them!

# Stuart Ballard on 17 Aug 2005 12:00 PM:

Now I want to see the Spotted Vegetarian Turtle Stop image! ;)

As far as the case pair is concerned, presumably if you're saying "Uh oh" at the beginning of a sentence you'd want to write it (for the sake of argument, since I have no "glottal stop" key on my keyboard, I'm pretending it's a G instead) as "Guh goh". An uppercase glottal stop for the one at the beginning of the sentence, and a lowercase one for the one that isn't.

Does that make sense or am I just demonstrating my ignorance?

# Ben Bryant on 17 Aug 2005 12:39 PM:

Is that the same as Hawaii (the dash in Hawa-ee)?

# Ben Bryant on 17 Aug 2005 12:58 PM:

There is a dialect (dialect is probably too strong a word) in parts (or a stratum?) of the U.S., in which people do not use "an" in front of a noun beginning with a vowel, but instead of the n they use a glottal stop. E.g. "a airplane."

# silverpie on 17 Aug 2005 1:44 PM:

And as if that weren't confusing enough, Hawaiian uses the opening-single-quote-like U+02BB (or the same-glyphed and better-supported U+2018) as its glottal stop (presumably to distinguish it from the apostrophe--the two can appear together if a word with a stop is put in English possessive form). As for casing, it's ignored, and if it begins a word, the second letter is capitalized to get titlecase.
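(Editorial aside.) The titlecasing behavior described here falls out of treating the glottal stop mark as an uncased character: it passes through unchanged, and the next cased letter is still treated as the start of the word. Python's `str.title` happens to work this way (a character is titlecased when the one before it is not cased), so a quick sketch, assuming Python 3 data, can contrast the uncased ʻokina (U+02BB) and caseless U+0294 with the cased U+0242. The sample words are made up for illustration, and `str.title` is a simple heuristic, not the full Unicode word-boundary algorithm (it mishandles ordinary apostrophes, for instance).

```python
import unicodedata

# Hypothetical sample words: first character is a glottal stop mark.
SAMPLES = {
    "okina": "\u02bbokina",      # MODIFIER LETTER TURNED COMMA (uncased) + "okina"
    "caseless stop": "\u0294a",  # LATIN LETTER GLOTTAL STOP (uncased) + "a"
    "small stop": "\u0242a",     # LATIN SMALL LETTER GLOTTAL STOP (cased) + "a"
}

for label, word in SAMPLES.items():
    first = word[0]
    titled = word.title()  # uncased first char is left alone; next letter gets titlecased
    print(f"{label:14} {unicodedata.category(first)}  {word!r} -> {titled!r}")
```

The first two words keep their leading mark and capitalize the following letter, while the cased U+0242 is itself mapped to U+0241 and the rest of the word stays lowercase.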

# Elsebeth on 17 Aug 2005 3:31 PM:

Hmm, interesting with the reference to Europeans. Danish actually uses glottal stops, we just don't use them in our written language. A little trick to make it more difficult for foreigners to learn the language, I guess. We have quite a few word pairs where the only difference in spoken language is the glottal stop ("beans" vs. "peasants", for instance).

# Michael S. Kaplan on 18 Aug 2005 3:41 AM:

Stuart, you are not showing ignorance -- you are recognizing that something is going on here, and trying to identify it! :-)

# Jim Hughes on 18 Aug 2005 11:54 AM:

A glottal stop is a common nuance in spoken English in the South East of England, being frequently used in the Estuary English (http://www.phon.ucl.ac.uk/home/estuary/home.htm) and Cockney dialects.

Amusingly the word glottal would contain a glottal stop in these dialects, as in Glo'all.

# alanjmcf on 19 Aug 2005 11:05 AM:

And coincidentally, you can hear a programme broadcast this week discussing it at http://www.bbc.co.uk/radio4/factual/word4word.shtml: "Programme Three: London and the World?", "[...] look at the features of Estuary that have found a place here - TH-fronting, glottal stops and such lexical items as the near national standard terms 'knackered', [...]"

# Stuart Ballard on 19 Aug 2005 11:08 AM:

Still looking for the Spotted Vegetarian Turtle Stop image... Google only finds this post, though...

Jim, good point - I'm originally *from* the SE of England and it never occurred to me :)

# Michael S. Kaplan on 19 Aug 2005 1:20 PM:

Hi Stuart -- we'll see. :-)


referenced by

2008/02/29 Newer, stronger, more case pair stability! The world's first 5.1 million dollar character encoding standard!
