Po vs. So? Sm vs. So? General Category, generally speaking

by Michael S. Kaplan, published on 2011/07/27 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/07/27/10190214.aspx


Over in the Suggestion Box, Van asked:

New possibility for the Every Character has a Story series. Fantasai wrote in to the Unicode list asking about Gc of various characters, but one group really popped out to me, which is Music Sharp in Misc. Symbols, and two of the white triangles in Geometric Shapes are all Gc=Sm, while Music Flat and the corresponding black triangles are Gc=So. Just a thought.

Ah, the Unicode List.

Let me quote Fantasai's full message:

So I've been doing some very close reading of the Po and So categories, and there are a few things that aren't making sense to me. I was wondering why is:

   - per cent, per mille, per ten thousand classified as Other Punctuation (Po) not as Other Symbol (So)?
       http://www.fileformat.info/info/unicode/char/0025/
       http://www.fileformat.info/info/unicode/char/2030/
       http://www.fileformat.info/info/unicode/char/2031/
       http://www.fileformat.info/info/unicode/char/066a/
       http://www.fileformat.info/info/unicode/char/0609/
       http://www.fileformat.info/info/unicode/char/060a/
       http://www.fileformat.info/info/unicode/char/fe6a/
       http://www.fileformat.info/info/unicode/char/ff05/

   - number sign, ampersand, and commercial at classified as Other Punctuation (Po), not Other Symbol (So)?
       http://www.fileformat.info/info/unicode/char/0023/
       http://www.fileformat.info/info/unicode/char/0026/
       http://www.fileformat.info/info/unicode/char/0040/
       http://www.fileformat.info/info/unicode/char/fe5f/
       http://www.fileformat.info/info/unicode/char/fe60/
       http://www.fileformat.info/info/unicode/char/fe6b/
       http://www.fileformat.info/info/unicode/char/ff03/
       http://www.fileformat.info/info/unicode/char/ff06/
       http://www.fileformat.info/info/unicode/char/ff20/

   These characters each symbolize a concept, and are not used as punctuation (except maybe in URLs, but that shouldn't count). So why are they punctuation and not symbols?

   - music sharp classified as Mathematical Symbol (Sm) not Other Symbol (So), while music flat is So, not Sm?
       http://www.fileformat.info/info/unicode/char/266f/

   - certain white triangles are classified as Mathematical Symbol (Sm) while their corresponding black triangles are classified as Other Symbol (So)?
       http://www.fileformat.info/info/unicode/char/25b7/
       http://www.fileformat.info/info/unicode/char/25b8/
       http://www.fileformat.info/info/unicode/char/25c1/
       http://www.fileformat.info/info/unicode/char/25c2/

   These just seem really inconsistent...

~fantasai
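
For anyone who wants to check the values in that message rather than click through each link, here is a minimal sketch using Python's standard `unicodedata` module. The list holds the code points from the message plus U+266D (music flat), which is added here only for comparison. The helper name `general_categories` is made up for illustration, and the results reflect whichever version of the Unicode Character Database ships with your Python build.

```python
import unicodedata

# Code points from the message above; U+266D (music flat) is added for comparison.
CODE_POINTS = [
    0x0025, 0x2030, 0x2031, 0x066A, 0x0609, 0x060A, 0xFE6A, 0xFF05,  # per cent etc.
    0x0023, 0x0026, 0x0040, 0xFE5F, 0xFE60, 0xFE6B, 0xFF03, 0xFF06, 0xFF20,
    0x266D, 0x266F,                  # music flat, music sharp
    0x25B7, 0x25B8, 0x25C1, 0x25C2,  # white and black pointing triangles
]

def general_categories(code_points):
    """Map each code point to its two-letter General Category value."""
    return {cp: unicodedata.category(chr(cp)) for cp in code_points}

if __name__ == "__main__":
    for cp, gc in general_categories(CODE_POINTS).items():
        # Fall back to a placeholder if the bundled database has no name.
        name = unicodedata.name(chr(cp), "<unnamed>")
        print(f"U+{cp:04X}  {gc}  {name}")
```

Running it prints one line per character, which makes the Po, Sm and So assignments easy to compare side by side.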

Well, let me start from the top.

All of the items that are Po are correct, by the way I learned them: they are punctuation, not symbols.

Perhaps the way I learned these things is flawed, and I won't defend my education on this point.

But I will point out that obviously some other people might have learned the same way....

As for the various musical symbols, such twiddling to try to build up consistencies can be interesting, of course. But to consider it useful and productive, the difference between the two has to be significant in some way for the processes that make use of them.

You know, like the kind of differences I mentioned in Why are there MODIFIER LETTERS that are not in the Letter, Modifier category?, looking at Sk vs. Lm differences.

Such an argument would likely have to be made here, too....
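
To make "significant for the processes that make use of them" a little more concrete, here is a hedged sketch of two toy processes that branch on General Category. Neither is taken from any real product; the function names `strip_punctuation` and `math_symbols` are hypothetical. The only point is that such code treats Po, Sm and So characters differently.

```python
import unicodedata

def strip_punctuation(text):
    """Drop every character whose General Category is a punctuation class (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )

def math_symbols(text):
    """Keep only the characters whose General Category is Sm."""
    return "".join(ch for ch in text if unicodedata.category(ch) == "Sm")

if __name__ == "__main__":
    # '%' and '#' are Po, so they are dropped; the musical symbols survive.
    print(strip_punctuation("50% \u266f \u266d #1"))
    # Music sharp and the white triangle are Sm; music flat and the
    # small black triangle are So, so they are filtered out.
    print(math_symbols("\u266f\u266d\u25b7\u25b8+"))
```

A process like the first one cares about Po vs. So, and one like the second cares about Sm vs. So; an argument for changing a category would need to show that some real process is hurt by the current value.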

But here it looks like some effort was put into assigning So to the symbols that people thought might have other uses, and Sm to the characters used only in math.

None of it looks inconsistent in any real sense. It just looks like an effort at categorization that perhaps not everyone agrees with.

Van's question I kind of put into that same category....

