Line breaking, according to DrawText

by Michael S. Kaplan, published on 2005/06/26 01:01 -04:00, original URI:

The Win32 DrawText function and its more full-featured cousin, DrawTextEx, have been around for a long time. They both have a simple stated set of purposes:

...draws formatted text in the specified rectangle. It formats the text according to the specified method (expanding tabs, justifying characters, breaking lines, and so forth).

Now I have talked about word breaking in the past, and obviously the two are related (where else would you break lines but on valid word breaks?). But the DrawText/DrawTextEx functions are from an earlier time -- a time before complex scripts, before good integration of Unicode character properties, or even the real existence of mature Unicode character properties.

But let's take a look at its offerings, via the various flags you can specify that affect the word break behavior:

DT_WORDBREAK - Breaks words. Lines are automatically broken between words if a word extends past the edge of the rectangle specified by the lprc parameter. A carriage return-line feed sequence also breaks the line.

DT_NOFULLWIDTHCHARBREAK - Prevents a line break at a DBCS (double-wide character string), so that the line-breaking rule is equivalent to SBCS strings. For example, this can be used in Korean windows, for more readability of icon labels. This value has no effect unless DT_WORDBREAK is specified.


What the first one is trying to say is that by default, the text will just keep going, and when the border is reached it will start a new line, possibly breaking right in the middle of a word. But if you pass the DT_WORDBREAK flag, then you are saying to make the breaks at the boundaries of words in the text. Which is pretty much what people expect (and what controls like EDIT already do themselves).
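To make the difference concrete, here is a rough model of the two behaviors as described above. It is a sketch, not DrawText's actual algorithm: it assumes every character is one cell wide, treats U+0020 as the only break opportunity, and the names model_line_length and model_count_lines are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model only (an assumption, not the GDI implementation):
   every character is one cell wide and U+0020 is the only break
   opportunity. */

/* Returns how many characters of `text` are consumed by one line that is
   `width` cells wide. With word_break set, the line ends at the last space
   that still lets the preceding text fit, and that space is consumed too.
   Without it, the line is cut after exactly `width` characters, even in
   the middle of a word. */
size_t model_line_length(const char *text, size_t width, int word_break)
{
    assert(text != NULL && width > 0);
    size_t len = strlen(text);
    if (len <= width)
        return len;                 /* the rest fits on this line */
    if (!word_break)
        return width;               /* hard cut at the border */
    for (size_t i = width; i > 0; --i) {
        if (text[i] == ' ')
            return i + 1;           /* text[0..i) fits; skip the space */
    }
    return width;                   /* no space found: fall back to a hard cut */
}

/* Counts the lines needed to lay out all of `text` under the same model. */
size_t model_count_lines(const char *text, size_t width, int word_break)
{
    size_t lines = 0;
    while (*text != '\0') {
        text += model_line_length(text, width, word_break);
        ++lines;
    }
    return lines;
}
```

Calling model_count_lines("the quick brown fox", 8, 1) walks the lines "the", "quick", "brown", "fox", while passing 0 for word_break cuts the same string into "the quic", "k brown ", "fox". Even in a toy like this, the line count changes with the break rule.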

The second flag was added after many user complaints about the Windows 95/NT 4.0 behavior, which treats the position after each CJK ideograph as a potential word break opportunity. This new flag says to treat CJK the same way everything else is treated -- look for the spaces as the word break opportunities.
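The effect of the second flag can be sketched as a predicate over code points. Again, this is only a model of the description above and not how GDI actually decides: the MODEL_NOFULLWIDTHCHARBREAK constant and both function names are hypothetical, and "CJK ideograph" is assumed to mean just the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real flag; the value is arbitrary here. */
enum { MODEL_NOFULLWIDTHCHARBREAK = 1 };

/* Assumption: only the CJK Unified Ideographs block counts as CJK. */
int model_is_cjk_ideograph(uint32_t cp)
{
    return cp >= 0x4E00 && cp <= 0x9FFF;
}

/* Nonzero if the model allows a line break right after code point `cp`. */
int model_can_break_after(uint32_t cp, unsigned flags)
{
    if (cp == 0x0020)
        return 1;                   /* a space is always an opportunity */
    if (model_is_cjk_ideograph(cp))
        return (flags & MODEL_NOFULLWIDTHCHARBREAK) == 0;
    return 0;
}

/* Counts the break opportunities in a run of `len` code points. */
size_t model_count_break_opportunities(const uint32_t *text, size_t len,
                                       unsigned flags)
{
    assert(text != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; ++i)
        count += model_can_break_after(text[i], flags) ? 1u : 0u;
    return count;
}
```

For a run of three ideographs followed by a space and a Latin letter, the model reports four opportunities without the flag and only one (the space) with it, which is the difference in behavior the flag was added to provide.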

Of course, you might expect more than just U+0020 to be handled when I say space. But most of the characters you would expect on such a list are not there.

Interestingly, all of the following are also treated as word breaking opportunities in East Asian text:

Obviously the functionality in DrawText and DrawTextEx is not quite up to Uniscribe standards when it comes to complex scripts. But you know how I feel about NLS API behavior changes? Well, this is core GDI behavior, and both they and MS Typography have to worry about even the most minute changes in the behavior of their functions once something has shipped. Because you never know who is relying on it. A small change in word break behavior could make the page count of a document double or worse, so even sensible changes can only be made via new flags (or, in the case of complex scripts, via new functions).


This post brought to you by " " (U+0020, a.k.a. SPACE)

# TheMuuj on 25 Jun 2005 11:38 PM:

Note to self: Don't use DrawText.

As far as I know, there are new classes in Whidbey for drawing text with GDI (as a result of GDI+'s questionable screen rendering in some cases). Are these based on DrawText?

Is there any word or rumors on a managed wrapper for Uniscribe?

P/Invoke may be okay for now, but more and more applications will run with ClickOnce, and security will become more of an issue.

So maybe we need System.Drawing.Text.Complex.dll.

Just a thought. Hopefully WinFX will fix a lot of these brick walls in .NET and we (non-MS developers) won't be using Interop for much.

# Michael S. Kaplan on 26 Jun 2005 10:34 AM:

Well, I don't want to distort any truths here.... the main point of the new methods is to pick up the updates to the shaping engines in Uniscribe, which can handle more languages than GDI+.

I'll be talking about the new methods more another day, though.

referenced by

2005/06/27 A quick look at Whidbey's TextRenderer
