by Michael S. Kaplan, published on 2005/06/26 01:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/06/25/432723.aspx
The Win32 DrawText function and its more full-featured cousin, DrawTextEx, have been around for a long time. They both have a simple stated set of purposes:
...draws formatted text in the specified rectangle. It formats the text according to the specified method (expanding tabs, justifying characters, breaking lines, and so forth).
Now I have talked about word breaking in the past, and obviously they are related (where else would you break lines but on valid word breaks?). But the DrawText/DrawTextEx functions are from an earlier time -- a time before complex scripts, or good integration of Unicode character properties, or the real existence of mature Unicode character properties.
But let's take a look at its offerings, via the various flags you can specify that affect the word break behavior:
DT_WORDBREAK - Breaks words. Lines are automatically broken between words if a word extends past the edge of the rectangle specified by the lprc parameter. A carriage return-line feed sequence also breaks the line.
DT_NOFULLWIDTHCHARBREAK - Prevents a line break at a DBCS (double-wide character string), so that the line-breaking rule is equivalent to SBCS strings. For example, this can be used in Korean windows, for more readability of icon labels. This value has no effect unless DT_WORDBREAK is specified.
Huh?
What the first one is trying to say is that by default, the text will just keep going, and when the border is reached it will start a new line, possibly breaking right in the middle of a word. But if you pass the DT_WORDBREAK flag, then you are saying to make the breaks at the boundaries of words in the text. Which is pretty much what people expect (and what controls like EDIT already do themselves).
The second flag was added after many user complaints about the Windows 95/NT 4.0 behavior, which treats the position after each CJK ideograph as a potential word break opportunity. This new flag says to treat CJK the same way everything else is treated -- look for the spaces as the word break opportunities.
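To make the difference between the two behaviors concrete, here is a small Python model of the rule as described above. It is only a sketch under loud assumptions, not what GDI actually does: the helper names (`is_cjk_ideograph`, `break_opportunities`, `wrap`) are made up for illustration, every character is assumed to be one cell wide, only U+0020 counts as a space, and "CJK ideograph" is approximated by two Unicode blocks. The `no_fullwidth_char_break` parameter stands in for passing DT_NOFULLWIDTHCHARBREAK together with DT_WORDBREAK.

```python
def is_cjk_ideograph(ch):
    """Rough approximation (an assumption of this sketch): CJK Unified
    Ideographs plus Extension A. The real GDI character set is not modeled."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def break_opportunities(text, no_fullwidth_char_break=False):
    """Return the indexes at which a new line may start.

    Modeled rule: a break is allowed after a space, and (unless the
    DT_NOFULLWIDTHCHARBREAK-style switch is on) after each CJK ideograph.
    """
    breaks = []
    for i in range(1, len(text)):
        prev = text[i - 1]
        if prev == " ":
            breaks.append(i)
        elif is_cjk_ideograph(prev) and not no_fullwidth_char_break:
            breaks.append(i)
    return breaks


def wrap(text, width, no_fullwidth_char_break=False):
    """Greedy wrap to `width` cells, breaking only at modeled opportunities.

    If no opportunity fits, the line overflows to the next opportunity
    (or to the end of the text) instead of splitting a word.
    """
    allowed = break_opportunities(text, no_fullwidth_char_break)
    lines = []
    start = 0
    while len(text) - start > width:
        # Opportunities that keep the line (minus trailing spaces) in width.
        fits = [i for i in allowed
                if i > start and len(text[start:i].rstrip(" ")) <= width]
        later = [i for i in allowed if i > start]
        if fits:
            cut = max(fits)
        elif later:
            cut = min(later)
        else:
            cut = len(text)
        lines.append(text[start:cut].rstrip(" "))
        start = cut
    if start < len(text):
        lines.append(text[start:])
    return lines


if __name__ == "__main__":
    sample = "東京都 渋谷区"
    print(wrap(sample, 2))                                # break after any ideograph
    print(wrap(sample, 2, no_fullwidth_char_break=True))  # break only at the space
```

In this model, the narrow width chops the sample text into short pieces when every ideograph is a break opportunity, while with the switch on the two space-separated runs stay intact and simply overflow the width, which is the readability trade-off the flag was added to offer.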
Of course, you might expect more than just U+0020 to be handled when I say space. But most of the characters you would expect on such a list are not there.
Interestingly, all of the following are also looked at as word breaking opportunities in East Asian text:
Obviously the functionality in DrawText and DrawTextEx is not quite up to Uniscribe standards when it comes to complex scripts. But you know how I feel about NLS API behavior changes? Well, this is core GDI behavior, and both they and MS Typography have to worry about even the most minute changes in the behavior of their functions once something has shipped. Because you never know who is relying on it. A small change in word break behavior could make the page count of a document double or worse, so even sensible changes can only be made via new flags (or in the case of complex scripts via new functions).
This post brought to you by " " (U+0020, a.k.a. SPACE)