Not everyone completely and totally loves pseudo-localization

by Michael S. Kaplan, published on 2012/06/06 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/06/06/10315337.aspx


I've talked about pseudo-localization a few times before.

Often when I have screenshots of Windows they contain either pseudo-localized or pseudo-enhanced text.

I thought I'd take a moment to explain how pseudo works.

And why so many people run with it on their selfhost machines.

And most importantly, why there are a few senior developers and testers who are serious selfhosters but who don't run it unless they absolutely have to....

Technically, you can put me in that category; while I often selfhost pseudo-mirrored, I avoid regular pseudo (unless it's pseudo-enhanced!)....

The main principle behind pseudo-localization is to localize absolutely everything that can be localized via an automated method.

In this way, anyone can easily notice when text can't be localized due to it not being properly exposed to localization.

And also, if there were supposed to be restrictions on how text was localized to avoid breaking code -- for example with font face names -- and those restrictions were not properly set, that could be detected as well.

Good engineering, through and through.
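To make the detection idea concrete, here is a minimal sketch of the kind of check a person (or a script) might apply to a pseudo-localized build. It assumes a simple heuristic: a UI string that contains letters but is still entirely ASCII was probably never exposed to localization. The function names and the sample strings are hypothetical, not part of any Windows tooling.

```python
def looks_unlocalized(text):
    """Heuristic: the string has letters, yet every character is plain ASCII."""
    has_letter = any(ch.isalpha() for ch in text)
    return has_letter and text.isascii()


def find_unlocalized(ui_strings):
    """Return the strings that pseudo-localization apparently never touched."""
    return [s for s in ui_strings if looks_unlocalized(s)]


# Hypothetical strings scraped from a pseudo-localized build.
sample = ["Śàvè Àś...", "Cancel", "100%", "Òpèn"]
print(find_unlocalized(sample))  # ['Cancel']
```

Strings that are meant to stay untouched, such as font face names, would show up here too, so a real check would likely need an allow list for them.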

So why would some senior developers and testers prefer to not run on pseudo-localized builds?

Well....

Let's look at how the character substitution works.

There are several different tables like the one below that show how to map A-Z and a-z to different lookalike characters (ones that look similar enough to still be read!):

 

Letter Lowercase Uppercase
a/A ªàáâãäåāăąǻάαад ÀÁÂÃÄÅĀĂĄǺΆΑ∆ΔΛАД
b/B þвьБъ ÞΒßβБВЪЬ
c/C ¢çćĉċčсς ÇĆĈĊČС€
d/D ðďđδ ÐĎĐ
e/E èéêëēĕėęěέεеёє℮зэ ÈÉÊËĒĔĖĘĚΈΕΣЕ∑ЁЄЗЄЭ
f/F ƒ
g/G ĝğġģ ĜĞĠĢ
h/H ĥħнћђ ĤĦΉΗН
i/I ìíîïĩīĭįίιϊіїΐ ÌÍÎĨĪĬĮİΊΪІЇ
j/J ĵј ĴЈ
k/K ķĸκкќ ĶΚЌК
l/L ĺļľŀłℓ £ĹĻĽĿŁ₤
m/M ΜМм
n/N ийлⁿпπήηńņňʼnŋñ ÑŃŅŇŊΝИЙЛП∏
o/O òóôõöøōŏőοσόоǿθб ÒÓÔÕÖØŌŎŐǾΌΘΟΦΩОФΩΏ
p/P þφρр ÞΡР
q/Q    
r/R ŕŗřяѓґгř ŔŖŘЯΓЃҐГ
s/S §śŝşѕš ŚŜŞЅŠ
t/T ţťŧτт ŢŤŦΤТ
u/U µùúûüũūŭůűųΰυϋύцμџ ÙÚÛÜŨŪŬŮŰŲЏЦ
v/V ν  
w/W ŵωώшщẁẃẅ ŴШЩẀẂẄ
x/X ×хж ΧχХЖ
y/Y ýÿŷγУўỳу ¥ÝŶΎΥΫỲЎ
z/Z źżž ŹŻΖŽ
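As a rough illustration of how a table like this could be applied, the sketch below swaps each mapped letter for one of its lookalikes. It uses only a small subset of the table, and the seeded random choice is an assumption made for the example; nothing here says how the real tool picks among candidates. The name pseudo_localize is hypothetical.

```python
import random

# A small subset of the substitution table above. Letters without an
# entry (and all non-letters) pass through unchanged.
SUBSTITUTIONS = {
    "a": "àáâãäå",
    "e": "èéêë",
    "i": "ìíîï",
    "o": "òóôõö",
    "u": "ùúûü",
    "s": "śŝşš",
    "A": "ÀÁÂÃÄÅ",
    "E": "ÈÉÊË",
    "O": "ÒÓÔÕÖ",
    "S": "ŚŜŞŠ",
}


def pseudo_localize(text, seed=0):
    """Replace each mapped letter with a randomly chosen lookalike."""
    rng = random.Random(seed)  # seeded so the same input gives the same output
    out = []
    for ch in text:
        candidates = SUBSTITUTIONS.get(ch)
        out.append(rng.choice(candidates) if candidates else ch)
    return "".join(out)


print(pseudo_localize("Save As..."))
```

Every candidate in this subset is still a letter of the same case as the one it replaces, which keeps the output readable and the string the same length.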

Now most of these mappings are fine and dandy.

Seriously.

But in some cases, the mapping will change the Unicode General_Category or other key ways that platforms, applications, components, or algorithms use to interact with text.

Perhaps it will change how it is ordered or collated.

Or how it breaks on words or lines.

Or whether text is to be ignored.

Or not ignored!
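A quick way to see this is to compare the General_Category of a letter with that of its substitute. The sketch below uses Python's standard unicodedata module on a few symbols from the table above; the exact code points are my assumption about which characters the table contains, and category_changes is a hypothetical helper.

```python
import re
import unicodedata

# Code points assumed to match symbols that appear in the table above.
MAPPINGS = [
    ("y", "\u00a5"),  # YEN SIGN
    ("A", "\u2206"),  # INCREMENT
    ("C", "\u20ac"),  # EURO SIGN
    ("L", "\u00a3"),  # POUND SIGN
    ("E", "\u2211"),  # N-ARY SUMMATION
    ("c", "\u00a2"),  # CENT SIGN
]


def category_changes(mappings):
    """Return (src, dst, src_category, dst_category) where the category differs."""
    changed = []
    for src, dst in mappings:
        src_cat = unicodedata.category(src)
        dst_cat = unicodedata.category(dst)
        if src_cat != dst_cat:
            changed.append((src, dst, src_cat, dst_cat))
    return changed


for src, dst, src_cat, dst_cat in category_changes(MAPPINGS):
    print(f"{src} ({src_cat}) --> {dst} ({dst_cat})")

# A currency sign is not a word character, so a simple tokenizer splits on it.
print(re.findall(r"\w+", "pa\u00a5ment"))  # ['pa', 'ment']
```

Each of these pairs turns a letter (Lu or Ll) into a currency symbol (Sc) or a math symbol (Sm), so code that asks "is this a letter?" gets a different answer after pseudo-localization than it did before.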

If you are a developer or tester and you want to focus your time and effort on valid bugs and not on noise due to bugs in pseudo, then having mappings like

y --> ¥

or

A --> ∆

or

C --> €

or

L --> ₤

or

E --> ∑

or

O --> Ω

or

L --> £

or

a --> ª

or

c --> ¢

or any of the others can be challenging....

Because it can be a challenge to be happy when the code you write or the code you test has to deal with that sort of thing....

So I don't blame people who would rather selfhost on pseudo-mirrored builds.

Or true localized builds.

Some of those mappings can be a bit too clever to be smart, if you know what I mean!


John Cowan on 6 Jun 2012 10:16 AM:

Yes, that's obviously bogus.  Letters should be mapped to other letters, and in the correct case, too.

Joshua on 6 Jun 2012 11:12 AM:

We have our own pseudo that does "String" -> "{String}". Unfortunately it has its own defect of not detecting somebody doing string building before the localization call.

Simon on 6 Jun 2012 12:05 PM:

Why are there so many substitutions to different characters with identical glyphs (at least in Verdana)?  The Greek and Cyrillic characters seem to be the biggest offenders here.  Did things get too hard to read with all of the glyphs altered?

Michael S. Kaplan on 6 Jun 2012 5:04 PM:

Nah, but then we seldom use Verdana in Windows UI (I'm tempted to say never, but I've been burned by such broad statements in the past!); Segoe UI is much more common....

Simon on 7 Jun 2012 8:31 AM:

I picked on Verdana because that's what the table was rendered in.  Even in Segoe UI, I don't see a difference between "MAIN SCOPE" and "ΜАІΝ ЅСΟРΕ".  Admittedly, that's a contrived example and you'd have to be pretty unlucky to get those particular substitutions.  If the objective is to visually identify unlocalized strings, why allow homoglyphs at all?

kinokijuf on 8 Jun 2012 10:50 AM:

Is there a pseudo MUI pack available for Windows?

Michael S. Kaplan on 8 Jun 2012 3:08 PM:

Not at the moment, but discussion is ongoing....

Alex Cohn on 4 Feb 2013 1:33 AM:

It's such a pity that the Google Translator Toolkit does not support φ§єцđø as one of the target languages. Apart from that, it's a nice online engine, and it understands quite a few document formats and a zillion languages.

