How long is it in the console?

by Michael S. Kaplan, published on 2007/04/23 16:51 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/04/23/2250029.aspx


The width of a string in the console can be tricky to calculate. So earlier today when Eugene asked:

What is the right way to figure out how many columns on a console will be needed to display a Unicode string? I believe wcwidth does that for some Unix systems. What is the right way to do it on Windows?

I have to admit there is no perfect answer.

I mean first of all there are the issues I talked about in Fonts that are 'fixed-width' even if they do not claim to be, and that post even has a partial solution to the question which simply uses the results of the code page to determine whether a character takes up one space or two.

But this is not a solution for arbitrary Unicode strings, since for example there are plenty of ideographs that cannot be found in one of those DBCS code pages.

There are also several other flaws in this approach I cited from Buck Hodges's post, such as the fact that code pages like 932 contain several characters like U+0410 (CYRILLIC CAPITAL LETTER A) that are actually double byte in the code page even though they will only take up a single column in the console.

So as a method it is only useful for a small range of characters (arguably the ones most commonly used, of course, but even still...).
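That code page heuristic, and the U+0410 flaw just mentioned, can be sketched in a few lines of Python (a rough illustration only; the function name is mine, and real console behavior depends on the font and the console's own tables):

```python
def columns_via_codepage(s: str, codepage: str = "cp932") -> int:
    """Estimate console columns by DBCS byte count: 1 byte -> 1 column,
    2 bytes -> 2 columns. Characters missing from the code page are
    guessed at 1 column here, which is itself one of the method's flaws."""
    total = 0
    for ch in s:
        try:
            total += len(ch.encode(codepage))
        except UnicodeEncodeError:
            total += 1  # not in the code page; just guess a single column
    return total

print(columns_via_codepage("abc"))        # ASCII: 3 columns
print(columns_via_codepage("\u65e5\u672c"))  # two ideographs: 4 columns
print(columns_via_codepage("\u0410"))     # double byte in cp932, yet only 1 console column
```

The last line is the flaw in action: CYRILLIC CAPITAL LETTER A costs two bytes in code page 932 but only one column on screen.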

So we will set aside the code page method, given that the question was asking about arbitrary Unicode strings.

One could perhaps veer into the typographic method of using a fixed width font (and simply dividing the total width by the single character width), though then one has to worry about the problems pointed out in The font width is broken? Well, fix it! and Where are the IE plain text fonts? and What would a 'Kartika Fixed' font for Malayalam DO, exactly?.

In the end, this too would limit the number of characters that can be supported to a subset -- though in this case a slightly more acceptable subset, since it is the same one defined by the console itself (neither can handle complex scripts properly!).

To be honest, this is not a case where UNIX is superior via its wcwidth call, since it too has the same limitations and does not return the actual console column widths in these same cases in any implementation I have seen.
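For what it's worth, the wcwidth approach roughly amounts to classifying each character by its Unicode East Asian Width property. Here is a simplified Python approximation (my own sketch, not any real wcwidth implementation; genuine ones also handle control characters and many more edge cases):

```python
import unicodedata

def wcwidth_approx(ch: str) -> int:
    """Rough wcwidth: 0 for combining marks, 2 for East Asian
    Wide/Fullwidth characters, 1 for everything else."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def wcswidth_approx(s: str) -> int:
    """Sum of per-character widths for a whole string."""
    return sum(wcwidth_approx(ch) for ch in s)

print(wcswidth_approx("abc"))           # 3
print(wcswidth_approx("\u65e5\u672c"))  # 4 (two Wide ideographs)
print(wcswidth_approx("\u0410"))        # 1 (gets this case right)
print(wcswidth_approx("e\u0301"))       # 1 (combining acute adds nothing)
```

Note that this classification says nothing about whether the console can actually shape the text -- which is exactly the shared limitation described above.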

It just happens to be a hard problem!

In the end you may just have to pick your poison, and choose which limitation you want to (or are able to) live with. Well, that or stop using the console. :-) 

 

This post brought to you by А (U+0410 a.k.a. CYRILLIC CAPITAL LETTER A)


# Ben Karas on 23 Apr 2007 6:40 PM:

It sounds to me like they want to know how many columns (out of the default 80) would get used up during a printf() to a command console.  Character widths don't have anything to do with that question.  

Would it work if you call WideCharToMultiByte passing CP_OEM?

# Michael S. Kaplan on 23 Apr 2007 6:46 PM:

Well, character widths do enter into it with trying to figure out what will happen in the console -- what happens if the output codepage of the console is changed to 65001? You will get back 1, 2, 3, and sometimes even 4 bytes per character when the console will never use more than 1 or 2 columns per character....
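The byte-count/column-count mismatch under code page 65001 (UTF-8) is easy to demonstrate (a Python illustration; UTF-8 byte counts are the same on any platform):

```python
# Each sample takes 1, 2, 3, and 4 bytes in UTF-8 respectively,
# yet none of them would occupy more than 2 console columns.
samples = ["A", "\u0410", "\u65e5", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} UTF-8 bytes")
```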

The first solution solves the problem if the string is in the right subset. Otherwise it just gets harder. :-(

# Evan Lavelle on 12 Feb 2010 3:09 AM:

The question specifically asked about Unicode strings, and 'columns' on a 'console'. If there is no MS-specific answer (and I haven't found it yet) then the answer is to use Markus Kuhn's wc[s]width implementations at http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c.

