The MSL8 project? Cool!

by Michael S. Kaplan, published on 2007/02/23 11:10 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/02/23/1748028.aspx


Bart mentions over in the Suggestion Box:

Continuing on the 'Wednesday, October 11, 2006 8:05 AM Michael S. Kaplan, Why can't the CP_ACP be UTF-8?' topic.

We are currently looking into making something like unicows.dll.

Except that it intercepts calls to the A functions, converts from UTF-8 instead of CP_ACP, and calls the W functions (and back, of course).

This is probably less work than converting our apps to use the W functions. (Third-party components we can't replace are the biggest problem; that, and Delphi isn't strong on the WChar side.)

Yep, Bart was following on to this post. :-)

As projects go, the one he is talking about sounds like a pretty interesting one. And not just because it is a much cooler version of the MSLU project from my past, though I will admit that is a not insignificant part of it.

It is just amazing to think about in terms of how much easier it would be to accomplish, using the same automated means as the very first build of MSLU (the one that had none of the hundreds of special cases needed later to try to work around various Win9x bugs). I'd be able to focus on the more compelling side of the user messaging story, the sort of thing that there just wasn't time to do with unicows.dll.
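To make the "automated means" idea concrete, here is a minimal sketch in Python (not the language of MSLU or of Bart's project) of generating an A-style entry point mechanically from a W-style function. The names `utf8_thunk`, `char_upper_w`, and `char_upper_a` are hypothetical, and the stand-in "W function" is ordinary Python rather than a real Win32 call; the only point is the shape of the thunk: decode UTF-8 on the way in, call the wide version, encode UTF-8 on the way out.

```python
import functools


def utf8_thunk(wide_func):
    """Build an 'A'-style entry point that speaks UTF-8 around a 'W'-style function."""
    @functools.wraps(wide_func)
    def narrow_func(*args):
        # Inbound: decode every byte-string argument as UTF-8 (not CP_ACP).
        wide_args = [a.decode("utf-8") if isinstance(a, bytes) else a for a in args]
        result = wide_func(*wide_args)
        # Outbound: encode a string result back to UTF-8 for the caller.
        return result.encode("utf-8") if isinstance(result, str) else result
    return narrow_func


# Hypothetical stand-in for a W function; a real shim would forward to the OS.
def char_upper_w(text):
    return text.upper()


# The 'A' flavor is generated, not hand-written.
char_upper_a = utf8_thunk(char_upper_w)
```

Because the wrapper is produced by one generic function, the same treatment could in principle be applied across a whole table of entry points, which is roughly why the first, special-case-free build of such a layer is cheap. Buffer-size negotiation, structures with embedded strings, and window messages are not covered by this sketch.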

I doubt I'd be able to convince anyone at Microsoft to sponsor it (not that I wouldn't be willing to try!), though in terms of how much work it would require, it would cost significantly less than trying to get the ANSI Win32 API to support UTF-8, a perennial topic of discussion it seems at times....

Assuming it wouldn't have to be written in Delphi of course, maybe I should ask Bart if he is hiring and take a short leave of absence? :-)

 

This post brought to you by 8 (U+0038, a.k.a. DIGIT EIGHT)


Ben Bryant on 23 Feb 2007 1:22 PM:

I think this is an interesting idea. As it stands, you cannot reliably put UTF-8 strings into Delphi Strings or AnsiStrings if any of the core code will be utilizing Win32 or library functions. This would conceivably allow that; however, a lot of code that assumes ANSI (a maximum of 2 bytes per character, as well as lead/trail byte stuff) must be addressed, such as string functions like EndsWith, NextChar, etc., so I still have questions about the strategy.

Michael, I assume you are saying that you wouldn't need the Win9x bug workarounds because you would not be supporting Win9x, not because UTF-8 would inherently solve those issues.
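Ben's point about lead/trail byte assumptions can be illustrated with a small sketch. Code written for a double-byte ANSI code page expects a character to be one or two bytes, while a UTF-8 sequence can be up to four. The Python below is a hypothetical stand-in for a UTF-8-aware NextChar (the names `utf8_char_len`, `next_char`, and `split_chars` are made up for illustration, and this is not Delphi's implementation). It reads the sequence length from the lead byte and does not validate the trail bytes.

```python
def utf8_char_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4  # longer than any DBCS character
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead)


def next_char(buf, index):
    """Return the index of the character following the one that starts at index."""
    # Clamp so a truncated final sequence cannot step past the buffer.
    return min(index + utf8_char_len(buf[index]), len(buf))


def split_chars(buf):
    """Walk a UTF-8 byte string one character at a time."""
    chars = []
    i = 0
    while i < len(buf):
        j = next_char(buf, i)
        chars.append(buf[i:j])
        i = j
    return chars
```

For example, `split_chars("a€".encode("utf-8"))` yields a one-byte piece followed by a three-byte piece, a case that a two-bytes-at-most stepping routine would split incorrectly.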

Michael S. Kaplan on 23 Feb 2007 1:54 PM:

Correct, Ben -- one of the biggest pain points in the project was Win9x bugs and behavior and debugging challenges....

Rob Kennedy on 24 Feb 2007 1:55 AM:

Bart might be interested in my MSLU units for Delphi. One is just a set of functions that directly call the unicows.dll functions (no different from calling any other DLL), but the other is one that patches the import table to redirect all calls the program makes, even from areas of the program you don't have source-code access to.

http://www.cs.wisc.edu/~rkennedy/mslu

Bart on 26 Feb 2007 5:35 AM:

I got a proof of concept working already, although some parts are truncated and RichEdits seem to be a real challenge.

Intercepting all the WndProcs flying about required some runtime code generation, but that seems to be working fine.

Ben, besides redirecting the Windows unit functions, I will also redirect stuff like NextChar where needed.

Rob, thanks, I'll take a look :)

Currently I'm looking at what kind of code is at the current implementation, and then either changing the jump address or overwriting it with a jmp.

Michael, sorry, it is all in Delphi :) and I can't hire you either :) I was hoping more for the reverse.



referenced by

2011/06/22 There and Back Again (aka ACP --> UTF-8 --> ACP)
