There's no "I" in IDN, part 7: IDN comes to AdWords

by Michael S. Kaplan, published on 2011/07/28 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/07/28/10190701.aspx



Now I'll admit it has been a long time coming, but it is now here....

As mentioned by Lisa Shieh in IDN now supported by Google AdWords:

If you write your ads in any language other than English, you may have wished that you could make your display URL consistent with the rest of your ad by showing your internationalized domain name. Today, we’re announcing that we now support non-ASCII characters, including non-Latin characters and Latin characters with accents and diacritics, in display and destination URLs.
 
When you create an ad in AdWords, you can now enter Unicode characters in the display and destination URL fields. To ensure that users will be able to reach your site, we’ll verify that the URL works properly in both Unicode and Punycode.
 
We also want to make sure that users are not shown URLs in any language other than their own, so we’ll render the display URL in Unicode characters only if its language matches the user’s Google interface language. In all other cases, it’ll be shown in Punycode.
 
Remember that all of our AdWords policies regarding display and destination URLs still apply. In particular, the domain of the display and destination URLs must match, so if you use non-ASCII characters in the display URL, make sure to do the same in the destination URL.
 
Additionally, if you include multi-byte characters, please note that the character limits may vary. For more information, please visit our Help Center.

Now those last two links are pretty important ones.

The first one says:

How Much Text Can I have In My Ads?

Ads can show, including spaces, 25 characters for the title, 70 characters for the ad text, and 35 characters for a display URL (or approximately 17 for languages that use non-ASCII (multi-byte) characters).

On Google, text ads are displayed on four lines: a title, two lines of ad text (each with 35 characters), and a URL line. However, the format may differ on Google partner sites.

Some Eastern European and Asian countries also support longer text ads containing up to 30 characters in the title and 76 characters in the ad text.

I. Ad Text

If your ad text contains any wide characters, such as certain capital letters and punctuation marks, fewer characters may fit on the line. The system will notify you if you exceed a character limit. Also, some of Google's syndication partners may not display non-standard characters if you include them in your ad.

If you create text ads using non-Latin characters, please be aware that the character limit may vary. Ads in languages with non-Latin (double-byte) characters, such as Chinese, Japanese, and Korean, can contain the following number of characters, including spaces: 12 characters in the title, 17 characters in each line of ad text, and 17 characters in the display URL. Countries that support longer text ads have higher double-byte character limits.

II. Display URL

Google can only display up to 35 characters of your display URL, due to limited space. If your display URL is longer than 35 characters, it will appear shortened when your ad is displayed. WAP mobile ads can show up to 20 characters in a display URL, so any longer domain will be truncated to fit within those limits. For non-ASCII (multi-byte) languages such as Japanese or Korean, the width of these characters can vary, so the display URL might be shortened if it’s longer than 17 characters.

If your display URL is longer than 35 characters (or 20, for WAP mobile ads), you may consider using a shortened version of your URL, such as your homepage. Please be sure that your display URL accurately represents your destination URL, the page within your site to which users are taken via your ad. The display URL should have the same domain (such as example.com) as your landing page.

Also, please note that your display URL must be an actual web address, appearing in the form of a valid URL. It must include the extension (such as .com, .net, or .org). It does not need to include the prefix (such as http:// or www).

Since your ad space is limited, try to create compelling and targeted ad text that is highly relevant to the products or services you're promoting. You can optimize your ad text to create the most effective ads.
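To get a feel for where that "approximately 17" might come from, here is a rough Python sketch. It assumes (and this is only an assumption, since the help text does not spell out the rule) that wide and fullwidth characters cost two display units each against a 35-unit budget. The names display_units, fits, and DISPLAY_URL_BUDGET are made up for illustration.

```python
import unicodedata

# Assumed budget: 35 display units, taken from the quoted help text.
DISPLAY_URL_BUDGET = 35


def display_units(text: str) -> int:
    """Count wide and fullwidth characters as 2 units, everything else as 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def fits(text: str, budget: int = DISPLAY_URL_BUDGET) -> bool:
    """True if the text stays within the (assumed) display budget."""
    return display_units(text) <= budget


print(display_units("example.com"))       # eleven narrow characters
print(display_units("例え.テスト"))        # five wide characters plus one dot
print(fits("あ" * 17), fits("あ" * 18))    # 34 units fit, 36 do not
```

Under that assumption 17 wide characters fit and 18 do not, which lines up with the quoted limit, though the real counter may well use different rules.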

And the second one? It says:

You can use non-ASCII (multi-byte) characters (such as those used in Japanese, Korean, and Chinese) in your URLs, but note that some of these characters need nearly twice the display space as single-byte characters. So, the exact number of characters you can use in a destination URL might be less than the character limit shown in the preview counter. To mitigate against URL spoofing, non-ASCII characters will be displayed only when the user’s interface language matches the characters in the visible URL. In all other cases, the URL will render as ASCII Punycode. For example, if your Google interface language isn't in a language that uses Cyrillic characters (e.g. Russian), these characters won't render (e.g. http://пример.испытание will display as http://xn--e1a...).

Aha, very informative! And there is now a little insight into why it took so long for us to see this....

Google, like Microsoft, is a big place, and it takes time to get every team interested in something new, no matter how important you might think it is....

Given how long URLs were limited to just LDH (letters A-Z, digits 0-9, hyphen), it's easy to see how any given technology might have such limitations in its own DNA, and how reluctant its owners would be to make changes that could lead to service degradation or customer confusion.
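Punycode is what lets an IDN squeeze through that LDH-only pipe, and the round trip is easy to try. The sketch below leans on Python's built-in idna codec, which implements the older IDNA 2003 rules, so treat it as an approximation and not as a description of what AdWords does internally. The helper names (is_ldh, to_punycode, to_unicode, same_domain) are hypothetical.

```python
import re

# One DNS label made only of letters, digits, and hyphens (LDH).
LDH_LABEL = re.compile(r"[A-Za-z0-9-]+")


def is_ldh(hostname: str) -> bool:
    """True if every dot-separated label uses only LDH characters."""
    return all(LDH_LABEL.fullmatch(label) for label in hostname.split("."))


def to_punycode(hostname: str) -> str:
    """Unicode hostname -> ASCII (xn--) form, via the IDNA 2003 codec."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """ASCII (xn--) hostname -> Unicode form."""
    return hostname.encode("ascii").decode("idna")


def same_domain(display_host: str, destination_host: str) -> bool:
    """Compare two hostnames by ASCII form, so either spelling matches."""
    return to_punycode(display_host).lower() == to_punycode(destination_host).lower()


host = "пример.испытание"
ascii_host = to_punycode(host)
print(is_ldh(host), is_ldh(ascii_host))   # the Unicode form is not LDH, the xn-- form is
print(ascii_host)
print(to_unicode(ascii_host) == host)     # the conversion is reversible
print(same_domain(host, ascii_host))      # both spellings name the same domain
```

Comparing hosts by their ASCII form is one plausible way to apply a "display and destination domains must match" rule when one URL is typed in Unicode and the other in Punycode.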

Overall, I think it's good that AdWords has taken these steps.

Though I will feel better when the very natural annoyance with provincial assumptions like "each person knows only one language" also penetrates the AdWords team and they take the next step -- like finding a more intelligible way to show IDNs that are in a different script.

The current solution is a great rudimentary first step, but it can't be the last one. Showing Punycode so readily is never the best answer, so hopefully that is a temporary plan (this one, moderated by UI language, has some obvious flaws in it)....
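To make the flaw concrete, here is a minimal Python sketch of the rule as the help text describes it. Everything in it is an assumption made for illustration: the tiny UI_LANGUAGE_SCRIPT table, the crude script detection based on the first word of each character's Unicode name, and the function names. It is not how AdWords actually decides.

```python
import unicodedata

# Hypothetical, tiny mapping from a UI language to the script it is written in.
UI_LANGUAGE_SCRIPT = {"en": "LATIN", "ru": "CYRILLIC", "el": "GREEK"}


def scripts_in(hostname: str) -> set:
    """Rough script detection: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in hostname
        if ch.isalpha()
    }


def rendered_host(hostname: str, ui_language: str) -> str:
    """Show Unicode only when every letter matches the UI language's script."""
    if scripts_in(hostname) <= {UI_LANGUAGE_SCRIPT.get(ui_language)}:
        return hostname
    # Otherwise fall back to the ASCII (xn--) form.
    return hostname.encode("idna").decode("ascii")


print(rendered_host("пример.испытание", "ru"))   # stays in Cyrillic
print(rendered_host("пример.испытание", "en"))   # falls back to the xn-- form
```

The only input about the reader is a single ui_language value, so someone who reads Russian perfectly well but runs an English interface gets the xn-- form anyway.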


Joshua on 3 Aug 2011 3:10 PM:

Showing Punycode prevents some very nasty attacks, like, say, Greek glyphs matching English glyphs, resulting in a URL appearing to go where it does not go.

Michael S. Kaplan on 3 Aug 2011 3:35 PM:

It is not a good solution since text that cannot be read cannot be judged. The final solution will need to be smarter than this.


referenced by

2013/10/17 There's no "I" in IDN, part 19: There's no "I" in IPv6, either!

2013/10/08 There's no "I" in IDN, part 18: There isn't even an "I" in John C. Klensin's name!

2013/09/13 There's no "I" in IDN, part 17: EAI made it to China, and everybody knows it!

2013/04/19 There's no "I" in IDN, part 16: It's a good thing they decided to call it EAI!

2012/10/12 There's no "I" in IDN, part 15: Still no 'I' in EAI.... but we could use an US sometime soon!

2012/08/08 There's no "I" in IDN, part 14: It turns out there's no "I" in IE, either

2012/05/18 There's no "I" in IDN, part 13: Desktop and Managed and Metro; oh my!

2012/02/27 There's no "I" in IDN, part 12: Emoji + IDN == U+1F4A9 (PILE OF POO)

2011/10/25 There's no "I" in IDN, part 11: There's no place like ::1, not even 127.0.0.1!

2011/09/21 There's no "I" in IDN, part 10: Who needs IDN support? How much? When? (Part 2)

2011/09/16 There's no "I" in IDN, part 9: Who needs IDN support? How much? When? (Part 1)

2011/08/12 There's no "I" in IDN part 8: Punycode don't do the PUA
