Two guys walked into a bar, but the bar was broken

by Michael S. Kaplan, published on 2006/02/24 01:01 -08:00, original URI: http://blogs.msdn.com/michkap/archive/2006/02/24/538496.aspx


It was over a year ago that I pointed out in the post Keyboards: hardware vs. software how disconnected our team (which owns most of the keyboard layouts) and the hardware team (which owns most of the actual keyboard hardware) were.

And how impressive it was that we managed to be in sync so often, given that disconnect.

But it is possible I may live the rest of my life without being able to understand why almost every keyboard layout has a key which, when typed, will produce | (U+007c, a.k.a. VERTICAL LINE) yet printed upon the face of the key is ¦ (U+00a6, a.k.a. BROKEN BAR).

What's up with that?

It turns out that every single-byte Windows code page other than 874 supports U+00a6, and every single Windows code page bar none (pardon the pun) supports U+007c.
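The code page claim can be checked, roughly, with the codec tables that ship with Python. This is only a sketch: it assumes Python's `cp874` and `cp1250` through `cp1258` codecs agree with the actual Windows tables for these two characters, and the helper names (`can_encode`, `pages_missing`) are made up for illustration.

```python
# Single-byte Windows code pages, using Python's codec names (an assumption:
# Python's tables are taken to match the Windows ones for these characters).
SINGLE_BYTE_WINDOWS_CODE_PAGES = ["cp874"] + [f"cp{n}" for n in range(1250, 1259)]


def can_encode(char, codec):
    """Return True if the codec has a byte for this character."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def pages_missing(char):
    """List the code pages above that cannot represent the character."""
    return [cp for cp in SINGLE_BYTE_WINDOWS_CODE_PAGES if not can_encode(char, cp)]


print("missing BROKEN BAR:   ", pages_missing("\u00a6"))
print("missing VERTICAL LINE:", pages_missing("\u007c"))
```

With Python's tables, only `cp874` turns up in the first list, and the second list is empty.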

And just about every font that has one has the other.

Even though in most cases (to get back to keyboards) almost every keyboard prints one on the face of a key but the matching layout has the other input.

So why this disconnect?

And more importantly, why does it persist?

And most important of all, why don't people complain? In either direction?

I suspect it is because no one really cares.

Or maybe it is just that two guys can walk into a bar. Even if it looks like it is broken. Since it turns out they may still be serving drinks....

 

This post brought to you by "|" (U+007c, a.k.a. VERTICAL LINE)


# abhinaba on Friday, February 24, 2006 4:12 AM:

I'm using the Microsoft Natural Multi-media keyboard and that doesn't have the broken bar http://www.microsoft.com/hardware/mouseandkeyboard/productdetails.aspx?pid=019

# Gabe on Friday, February 24, 2006 5:31 AM:

On the console's raster fonts (and Terminal) U+7c is still a broken bar, so I think that this is historical.

It seems the bar was first broken in 1967 and later repaired in 1977, but not all devices changed along with it. I would assume that IBM decided the bar should be broken, so it made the original PC with a broken bar on the keyboard and in the graphics card fonts. Later on when Microsoft started implementing fonts of course they implemented the ANSI standard solid bar, but nobody ever thought to change the keyboard.

At some point I guess ISO decided that the broken bar was still useful (backcompat maybe?), so they assigned it a code point in the upper half of the charset.

# Mike on Friday, February 24, 2006 6:45 AM:

I suspect that people don't even understand that there's a difference - they just see it as two different ways of printing the same character.  Like 0 with a dot, a dash or empty.

FWIW the random, cheap Dell keyboard I've got in front of me has both ¦ and | and they both print the right characters.

# Ben Cooke on Friday, February 24, 2006 7:13 AM:

It's a solid bar on my old, clunky, UK keyboard!

I suspect that one reason why no-one complains is that only geeks ever type that symbol, and geeks don't generally look at the keyboard while typing! I certainly didn't know what was printed on that key until you prompted me to look.

Of course, on a UK keyboard the | symbol is on an extra key between the Z and the left shift key. A key which doesn't exist at all on a US keyboard! We also have a second extra key between the " (which is really the @ key!) and the enter key, which has on it # and ~. I've often wondered why the UK keyboard is so different from the US one, since the only extra symbol you get is the British pound sign in the place of # on the 3 key.

My laptop has a US keyboard, and I initially tried to use the UK keymap on it despite the lying key captions, but I soon found out that I had four symbols I could no longer type due to the missing keys. Quite frustrating.

# teambanana on Friday, February 24, 2006 9:36 AM:

On my keyboard when I press '|' I get '¦' and vice versa.  And the only way that I can get '¦' involves use of the 'Alt Gr' key. Luckily I don't think I've had any cause to use it in the last 5 years!

# pcooper on Friday, February 24, 2006 10:33 AM:

I always thought of them as being the same character, like the many forms of writing 0. I didn't realize that they were both in character sets. When would one use one versus the other? I'm not aware of their uses in typography, just their use for piping in shells and their use in ASCII art.

# Chris L on Friday, February 24, 2006 11:10 AM:

When I was learning MS-DOS, I was really confused when the manual told me to type the "pipe" key, which looked like a "vertical line", and I could not find it.

# Maurits on Friday, February 24, 2006 11:39 AM:

I would guess the reason the keyboard shows a broken version is to avoid confusing the "|" key with the "I" key.  In fact, on my Microsoft Natural Keyboard, the "I" on the "I" key and the "|" on the "|" key look absolutely identical.

Which brings up another question.  Why are the letter marks on the keyboard capital, but when you press them, the letter that shows up on the screen is lowercase?  (Unless you have Caps Lock on, of course.)

# Maurits on Friday, February 24, 2006 12:23 PM:

Oh boy...
http://www.fileformat.info/info/unicode/char/007c/index.htm 
http://www.fileformat.info/info/unicode/char/00a6/index.htm 
http://www.fileformat.info/info/unicode/char/01c0/index.htm 
http://www.fileformat.info/info/unicode/char/05c0/index.htm 
http://www.fileformat.info/info/unicode/char/2223/index.htm 
http://www.fileformat.info/info/unicode/char/2758/index.htm 

# Maurits on Friday, February 24, 2006 12:25 PM:

Not to mention
http://www.fileformat.info/info/unicode/char/0049/index.htm
http://www.fileformat.info/info/unicode/char/006c/index.htm

# Michael S. Kaplan on Friday, February 24, 2006 1:16 PM:

Uppercase letters on the key faces even though the default keystrokes will be lowercase is something inherited from typewriters....

# Maurits on Friday, February 24, 2006 2:15 PM:

Whee...

Il|¦ǀ׀∣❘

Watch out for that RTL character in there.
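For anyone who cannot tell the eight characters apart by eye, Python's standard `unicodedata` module will name them. A small sketch: the string is written with escapes so the right-to-left character does not reorder the source, and `describe_lookalikes` is just an illustrative helper name.

```python
import unicodedata

# The eight look-alikes from the comment above, written as escapes:
# I, l, |, broken bar, then U+01C0, U+05C0, U+2223 and U+2758.
LOOKALIKES = "\u0049\u006c\u007c\u00a6\u01c0\u05c0\u2223\u2758"


def describe_lookalikes(text=LOOKALIKES):
    """Return (code point, Unicode name, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in text
    ]


for code_point, name, bidi in describe_lookalikes():
    print(code_point, name, bidi)
```

The character whose bidi class prints as `R` is the right-to-left one (U+05C0, from the Hebrew block).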

# Vorn on Friday, February 24, 2006 5:39 PM:

my mac keyboard has an unbroken bar.

Vorn

# orcmid on Friday, February 24, 2006 6:01 PM:

Gabe was the closest on this one.   And in a way, it was IBM's fault, but not the way you might think.

As I recall, the original ASCII specification, ANS X3.4-1968 had the broken bar, and so the keyboard committee (a different group) used it when the X4.22 and X4.23 keyboard standards were created.  (I'd go look for it but I don't want to move off this page.)

The problem had to do with reserving the stylization of "!" being as "|" (for logical-or symbol) and of "^" as the logical not symbol, "¬".  To allow for that, there could be no vertical bar already in the code and so position 7/12 got the broken bar.

I have this vague recollection that an objection from IBM was involved.  There could have also been some concern that the 7/12 position was available for customization in international usage.

In the X3.4-1977 revision, it was observed that those stylizations never caught on and the idea was removed.  Also, IBM hadn't implemented ASCII very much anyhow.  (IBM's move to ASCII didn't start in earnest until introduction of the PC and Microsoft may deserve some credit here.)  The vertical bar was restored to the 7/12 position in line with the International Reference version of ISO 646-1973.  It appears that the keyboard standard "got stuck"
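For readers not used to the column/row notation in this comment: a position such as 7/12 means column 7, row 12 of the 7-bit code table, which works out to code value 7 × 16 + 12 = 0x7C. A minimal sketch of that arithmetic follows; the function names are hypothetical, chosen only for this example.

```python
def position_to_code(column, row):
    """Convert a column/row position (e.g. 7/12) to a 7-bit code value."""
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0-7 and row must be 0-15")
    return column * 16 + row


def position_to_char(column, row):
    """Return the character that modern ASCII assigns to that position."""
    return chr(position_to_code(column, row))


for column, row in [(7, 12), (2, 1), (5, 14), (7, 14)]:
    code = position_to_code(column, row)
    print(f"{column}/{row} = 0x{code:02X} = {position_to_char(column, row)!r}")
```

In today's ASCII this prints `|` for 7/12, `!` for 2/1, `^` for 5/14 and `~` for 7/14, the four positions that matter in this story.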

# orcmid on Friday, February 24, 2006 7:07 PM:

It's all coming back to me (and thank heavens for digital libraries that go way back).

The first ASCII (ANS X3.4-1963) was defined in a 7-bit code, but there were 29 undefined code points.  http://doi.acm.org/10.1145/366707.367524

The lower-case letters were not defined yet, nor were any of the special characters in the same sticks as the lower case letters.  There were also 4 control-code positions in the top (7/12-7/15) positions.  Keep in mind that having a 4-bit subset and a good 6-bit (64-character) subset was important at that time.  Also think EBCDIC 64-character subsets with "|" and "¬" already tucked in there.  Think PL/I programming.

When the filled-out code was being proposed and brought out for public comment, the situation was rather different.  The vertical bar was proposed for 7/14, there was a "¬" in 7/12 (but called overbar with a hook for readability), the tilde was in 6/12 and caret was in 6/14.  There were some pretty amazing intermediate stages, documented in http://doi.acm.org/10.1145/363831.363839

In the rearrangement before X3.4-1968 was completed and agreed to, the back-slash arrived and the organization became what it is now.  The vertical bar became broken so that a 64-point subset could have vertical bar as a substitute for ! as lobbied for by IBM.  In ISO 646-1973 the tilde disappeared and the overbar (without the hook) ended up at code point 7/14.  I don't think I ever saw that used, but I can't verify that X3.4-1968 went directly to tilde at 7/14.

It helps to remember that while all of this was being figured out, most computer memory organizations and printer/display capabilities made 6-bit character codes the norm.  There were few peripheral units that provided for more characters than that and I never saw a punched card with lower-case codes on it, though I'm sure there were some.  

Although EBCDIC had been introduced (along with System/360's 8-bit bytes), it was a sparse 8-bit code and the telecommunication folk were having none of it.  It took minicomputers and teletype terminals to bring ASCII into serious use for computing.    

# John Elliott on Saturday, February 25, 2006 7:31 AM:

My recollection is that on Win95, the default UK options in Setup would select a UK keyboard and codepage 850; then in DOS windows the |\ key next to Shift produced character 0xDD (U+00A6, broken bar) not 0x7C (U+007C, vertical bar). And anyone who used it certainly did care, because it meant that pipe operators didn't work and you couldn't write C programs in a DOS IDE.
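The byte values in code page 850 can be looked up with Python's bundled `cp850` codec. A sketch, assuming that codec matches what a Win95 DOS window actually used; `describe_byte` is an illustrative helper, not part of any library.

```python
import unicodedata


def describe_byte(byte_value, codec="cp850"):
    """Decode one byte in the given code page and name the resulting character."""
    char = bytes([byte_value]).decode(codec)
    return f"0x{byte_value:02X} -> U+{ord(char):04X} {unicodedata.name(char)}"


# Which byte does each bar occupy in code page 850?
for bar in ("\u007c", "\u00a6"):
    encoded = bar.encode("cp850")
    print(describe_byte(encoded[0]))
```

A shell or compiler only treats byte 0x7C as the pipe or the C `|` operator, so any other byte reaching the program is just an unknown character to it.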

# Ben Cooke on Saturday, February 25, 2006 8:58 AM:

The question about the upper-case keycaps has reminded me that at my primary school (I'm not sure what the US equivalent of that is, but it's for children aged five to ten years) they had BBC Micros with sticky labels on all of the alphabetical keys and the lowercase letters written on. I guess this was because the younger children hadn't learned about uppercase characters yet.

Now that computers are highly prevalent in schools, I wonder if they go to the effort of getting in lowercase-captioned keyboards or if they now just expect the kids to deal with it. After all, my old primary school — now largely refurbished and currently host to the next generation of my family — has several rooms devoted to computers, whereas in my day we had just roughly one per class of 30 children and they were generally just used to play educational computer games once in a blue moon. (I was "computer monitor" when I was seven or eight years old! Gotta start young!)

# Gabe on Saturday, February 25, 2006 4:55 PM:

My understanding of the 1967 decision was that PL/I users needed a vertical bar, and IBM's user group wanted it to be among the upper case letters because not everybody had lower case at the time. Since PL/I didn't use the exclamation mark, and it is essentially a vertical line, they said that you could make your exclamation mark look like a vertical bar. Of course then people with full 7-bit charsets would still have the real vertical bar character and the only way to visually tell them apart would be to have a hole in the U+7C character.

Of course this was stupid and was fixed 10 years later, but IBM apparently never got the memo.

# Martin Kochanski on Monday, February 27, 2006 4:27 PM:

If you download a search from Dialog, | is used to mark end-of-field and end-of-record, and ¦ is used to mark end-of-paragraph within a field.

# Maurits on Tuesday, February 28, 2006 4:19 PM:

> Uppercase letters ... inherited from typewriters.

This then defers the question to "why are typewriter keys marked in upper-case even though (usually) you get a lower-case letter when you press them?"

I was watching "Bumping into Broadway"
http://www.imdb.com/title/tt0009973/

... and one of the early scenes shows a typewriter with SIX ROWS of keys.

The three top rows are the capital letters, marked as such:
QWERTYUIOP
ASDFGHJKL
ZXCVBNM

and the bottom three rows are the lower case letters, also marked as such:
qwertyuiop
asdfghjkl
zxcvbnm

Also, the top three rows were black keys with white markings, and the bottom three were white keys with black markings.

The movie was made in 1919, but the typewriter was very likely old even then... Harold Lloyd's character has some technical troubles and it ends up getting thrown out of a window.

# Maurits on Tuesday, February 28, 2006 5:41 PM:

It's all here:
http://www.mytypewriter.com/generic.html?pid=21

The first page has:
"The Sholes and Gliden model, wrote capitals only, is the first for introducing the QWERTYT keyboard, which is still used in computer keyboard of today."

The third page has:
"Caligraph No. 1  was the second typewriter appeared on the US market in 1880 (shown on the right.) Its No. 2 model had a giant keyboard that featured both lower and upper cases rather than the shift key used on double-case machines from Remington."

# Maurits on Tuesday, February 28, 2006 8:31 PM:

It looked basically like this:
http://staff.xu.edu/~polt/typewriters/caligraph.html

The author of that page mentions that the capital letter Q is missing.  But the typewriter in the movie definitely had a capital Q.

referenced by

2013/09/24 Keyboards: hardware vs.software, Redux
