by Michael S. Kaplan, published on 2006/02/09 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/09/528337.aspx

Regular reader Maurits asked in the Suggestion Box:

What kind of name for a character is this??


Well, first (and most importantly), I should point out that the name of the Unicode character U+534D is not that string -- this is just a CJK Unified Ideograph.

(look out, that is a big download!)
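For anyone who wants to check the name without the big download, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration. It shows that both characters get the algorithmic name every CJK Unified Ideograph gets, not a descriptive one.

```python
import unicodedata


def describe(ch):
    """Return (code point label, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# The two characters discussed in this post.
for ch in ("\u534d", "\u5350"):
    print(*describe(ch))
```

The name is derived from the code point, which is why the string Maurits saw cannot be the character name.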

Though if you look at the Unihan database for the information on the character, you will see where that string comes from. It is in the kDefinition field:

An English definition for this character. Definitions are for modern written Chinese and are usually (but not always) the same as the definition in other Chinese dialects or non-Chinese languages. In some cases, synonyms are indicated. Fuller variant information can be found using the various variant fields. • Definitions specific to non-Chinese languages or Chinese dialects other than modern Mandarin are marked, e.g., (Cant.) or (J). • Major definitions are separated by semicolons, and minor definitions by commas. Any valid Unicode character (except for tab, double-quote, and any line break character) may be used within the definition field.
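As a rough sketch of how the semicolon and comma convention described above could be read programmatically, the Python below parses one record and splits its definition. It assumes the tab-separated `U+XXXX<TAB>field<TAB>value` layout of the Unihan data files; the function names are hypothetical, and the sample value is placeholder text rather than a real Unihan entry. The split is naive and would mishandle commas inside parenthesized notes.

```python
def parse_unihan_line(line):
    """Split one tab-separated record into (code point, field name, value)."""
    codepoint, field, value = line.rstrip("\n").split("\t", 2)
    # Drop the leading "U+" and read the rest as hexadecimal.
    return int(codepoint[2:], 16), field, value


def split_definition(value):
    """Return a list of major definitions, each a list of minor definitions."""
    return [
        [minor.strip() for minor in major.split(",") if minor.strip()]
        for major in value.split(";")
        if major.strip()
    ]


# Placeholder record for illustration only, NOT copied from the Unihan database.
sample = "U+534D\tkDefinition\tmajor one, minor variant; major two"

codepoint, field, value = parse_unihan_line(sample)
if field == "kDefinition":
    print(f"U+{codepoint:04X}", split_definition(value))
```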

The reverse character (U+5350) is also in Unicode and in Unihan, with the same definition.

Thomas Chan suggested:

The entry for U+534D in the _Hanyu Da Zidian_, vol. 1, p. 51 (as indicated in unihan.txt) includes a quote that it was originally not a Han character, "wan ben fei zi ...", suggesting that it now is.  There are also serifs shown in that dictionary and the _Kangxi Zidian_ for both characters.

Couldn't the above two characters be considered a "CJK" or "IDEOGRAPHIC" version (like the spaces, zero, punctuation, brackets, etc. in the "CJK Symbols and Punctuation" block)?

Andrew C. West posted some of the background information on these characters:

If memory serves me, the swastika was formally designated a Chinese ideograph by the redoubtable Empress Wu of the Tang dynasty during the late 7th century. Empress Wu had a penchant for creating new ideographs, and decreed that the Buddhist swastika symbol should henceforth be considered a Chinese ideograph to be pronounced WAN4 (a deliberate homophone for U+842C "10,000"). This is why, unexpectedly to some, the swastika symbols are found in the CJK Ideograph block rather than elsewhere.

Incidentally, U+534D and U+5350 are rarely used within running text in Chinese. In the decorative arts the swastika motif is generally described as WAN4ZI4 <842C, 5B57> "WAN ideograph", as in the word WAN4ZI4JIN1 <842C, 5B57, 5DFE>, a type of turban with a swastika decoration that was the height of fashion during the Ming dynasty.

The truth is that these characters are much older than any of the offensive things that were later done under their banner, and certainly in the correct context and situations, it would be a real problem not to include them.

Now if it were up to me, I would do something about that definition string in Unihan, whether it is in the dictionary or not, but that is a different story entirely....


This post brought to you by "卍" and "卐" (U+534d and U+5350, two CJK Unified Ideographs)

# Andrew West on 9 Feb 2006 8:34 AM:

There are some people who would like to see the swastika symbols encoded separately for use as general symbols, and leave U+534D and U+5350 for Han usage only -- which I think is probably unnecessary. Nevertheless, the current situation is not entirely satisfactory, as the script property of U+534D and U+5350 is "Han", but the Unicode Standard explicitly states that these characters may be used in other scripts (see remarks in the Tibetan chapter in TUS 4.0 p.257), which suggests that these two Han ideographs should perhaps be given a script property of "Common".


# Michael S. Kaplan on 9 Feb 2006 8:45 AM:

Hi Andrew,

Yes, but I can see why they might be hesitant to take that particular step....

# Andrew West on 9 Feb 2006 9:07 AM:

I can understand why the UTC and WG2 would be reluctant to encode new swastika symbols (just remember the fuss about "unacceptable symbols" accidentally included in the Bookshelf Symbol 7 font), but I'm not sure what objections there may be to changing the script property of U+534D and U+5350 to Common.

# Michael S. Kaplan on 9 Feb 2006 9:14 AM:

Well, one can argue now that they are not really intended for the use that offends, but if they were "Common" there is no way to know who will pick up on that fact and use in places beyond Tibetan.

You could consider it a limitation in the script designations (where the only choice is for one or all, and the need for just two or three is not really met).

# Chris Fynn on 15 Dec 2009 10:24 PM:

Of course, in Unicode 5.2 we now have U+0FD5 and U+0FD6 - and, if you want them with polka-dots, U+0FD7 and U+0FD8.

