by Michael S. Kaplan, published on 2012/03/23 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/03/23/10286854.aspx
The other day Karl Williamson asked:
This is a graphic symbol for the control character U+0007. Since Unicode 6.0 used the name "BELL" for another character, this character is now misnamed, and would likely confuse someone not familiar with the history.
I think something should be done. One possibility is to add a correction or alternate name for this in NameAliases.txt, like "SYMBOL FOR ALERT". Another possibility is to add a notation in NamesList.txt.
Shriramana Sharma offered:
If anyone is looking at the chart of http://www.unicode.org/charts/PDF/U2400.pdf and sees the preceding and following characters I don't think there is any chance of anyone getting confused.
Have you also seen 237E "BELL SYMBOL"? Apparently this was a compatibility character for the same purpose.
Wow now we have:
2407;SYMBOL FOR BELL;So;0;ON;;;;;N;GRAPHIC FOR BELL;;;;
As Ken (W) said usually new names are only checked by an automated algorithm to not clash with existing characters and not manually, we have nice alternatives now...
So anyone want to jingle those bells?!
That one made me smile, and made me think the matter was [more or less] resolved.
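For anyone who wants to poke at the names in question, here is a minimal Python sketch. It splits the UnicodeData.txt record quoted above into a few of its semicolon-separated fields and asks the standard `unicodedata` module what each of the various bells is called. The helper names `parse_record` and `describe` and the `<no formal name>` placeholder are my own inventions for illustration. The U+1F514 lookup assumes a Python build whose character database is Unicode 6.0 or later.

```python
import unicodedata

# The UnicodeData.txt record quoted above, verbatim.
RECORD = "2407;SYMBOL FOR BELL;So;0;ON;;;;;N;GRAPHIC FOR BELL;;;;"


def parse_record(line):
    """Split one UnicodeData.txt line and keep the fields discussed here."""
    fields = line.split(";")
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "general_category": fields[2],
        "bidi_class": fields[4],
        "unicode_1_name": fields[10],  # the old Unicode 1.0 name
    }


def describe(code_point):
    """Return 'U+XXXX NAME', with a placeholder for characters without a name."""
    # Control characters such as U+0007 have no formal character name,
    # so unicodedata.name() needs a default to avoid raising ValueError.
    name = unicodedata.name(chr(code_point), "<no formal name>")
    return "U+{:04X} {}".format(code_point, name)


if __name__ == "__main__":
    print(parse_record(RECORD))
    for cp in (0x0007, 0x237E, 0x2407, 0x1F514):
        print(describe(cp))
```

The control character itself comes back with no formal name at all, which is part of why the name BELL was free for an automated clash check to hand to another character.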
But when John M. Fiscella jumped in, I changed my mind:
For those people who may have been born after the disappearance of teletype machines from this planet, the U+0007 bell was used to make a teletype machine ring a bell (or make some other noise) for signaling purposes.
This brings up an interesting point: although Unicode was primarily invented to represent text in multiple languages, the utility of having some ("control"?) characters whose sole purpose is to activate a sound when embedded in text seems like a good and useful idea. This, in a sense, can already be done by embedding a WAV file in a document, but not to any degree of convenience that a single character actor could produce. Now I know this sounds like anathema to suggest that Unicode include characters used for other than language or graphical text, but give this idea some consideration. A very short block of characters in the BMP reserved for activating sounds (which could be used for a number of purposes) would be useful.
That is definitely not the direction I imagined Unicode wanting to go.
We need an officer to weigh in.
Asmus Freytag? Raise deflector shields, and set phasers on stun!
On 3/4/2012 11:44 PM, John M. Fiscella wrote:
> For those people who may have been born after the disappearance of
> teletype machines from this planet, the U+0007 bell was used to make a
> teletype machine ring a bell (or make some other noise) for signaling
> purposes.
> This brings up an interesting point: although Unicode was primarily
> invented to represent text in multiple languages, the utility of
> having some ("control"?) characters...
... lies in compatibility.
ISO 10646 treats control codes as outside its scope, in effect you could think of the ISO model of character encoding as one where control codes essentially represent markup.
And the idea for control characters comes from device control, such as ringing a bell, or moving the carriage, or feeding paper.
Just as different devices implement different functions, the code points were supposed to be mere placeholders for some external protocol (by default ISO/IEC 6429) that would define their meaning.
However, Unicode departed from this approach, because by the late 80's it had become clear that some of these controls were no longer primarily used for device control, but represented logical operators on text - related to those characters that Unicode now calls "format controls".
When "plain text" files contain tabs and line feeds, you can't efficiently define plain text algorithms, such as the bidi algorithm, without insisting that they be mapped to specific code points. And for all practical purposes, when people needed a TAB character, they used U+0009, etc.
For the remaining code points in the control code range, even Unicode concedes that their particular semantics are outside the scope of the standard. Doing so has the great advantage that legacy data streams (which mixed device control and text, or even assigned graphic characters to control codes) could be mapped to Unicode with only sender and ultimate receiver needing to agree on these detailed semantics - any intermediary could just pass through or "blindly" convert things to and from 8-bit data streams.
The practical purpose for this balanced treatment has diminished over time, as other protocols have taken over and terminal devices are no longer as important. Instead, the use of a tiny subset of these characters as part of plain text is now dominant.
Using one of those legacy device control characters that's essentially fallen out of use as "precedent" to suggest entirely new avenues of coding is a little far-fetched:
> whose sole purpose is to activate a sound when embedded in text seems
> like a good and useful idea. This, in a sense, can already be done by
> embedding a WAV file in a document, but not to any degree of convenience
> that a single character actor could produce. Now I know this sounds
> like anathema to suggest that Unicode include characters used for
> other than language or graphical text, but give this idea some
> consideration. A very short block of characters in the BMP reserved
> for activating sounds (which could be used for a number of purposes)
> would be useful.
What can I say?
I guess Every Control Character Has a Story, too! :-)
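And since the character that started all this is a graphic symbol for U+0007, here is a small Python sketch of what such symbols are good for: making control characters visible when text has to be displayed. It relies on the layout of the Control Pictures block, where U+2400 through U+241F line up with U+0000 through U+001F and U+2421 is SYMBOL FOR DELETE. The function name `visible_controls` is hypothetical, and this is just one way a display layer might do it, not a description of how any particular product does.

```python
def visible_controls(text):
    """Replace C0 controls and DEL with their Control Pictures symbols."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # U+0000..U+001F map onto U+2400..U+241F, so BEL becomes U+2407.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # DEL sits outside the C0 range and has its own symbol, U+2421.
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(visible_controls("ring\x07 my\tbell\n"))
```

Note that this only changes what gets shown. The underlying text still contains the original control characters, whatever the sender and receiver have agreed they mean.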
Joshua on 23 Mar 2012 1:09 PM:
And 7 is still ALARM/BELL, but it's nice to have a glyph for it when you need to render text containing it in a UI.
Peter Krefting on 26 Mar 2012 12:24 AM:
> Every character has a story #37: U+2047 (♩ You Can Ring My [SYMBOL FOR] BELL ♩)
Uhm, U+2047 is ⁇ (DOUBLE QUESTION MARK), you're looking for U+2407…
Michael S. Kaplan on 26 Mar 2012 4:52 AM:
Didn't I say that? :-)