The 'grammar' of identifiers

by Michael S. Kaplan, published on 2005/06/28 15:01 -04:00, original URI:

Ok, it is time for one of my periodic delusional episodes (you know, those delusions of linguistic aptitude I have from time to time).

(this post pre-recorded, a little blog experimentation!)

Now there is disambiguation, a word which may have already existed (it did, according to a colleague who is in fact a linguist). But it was spontaneously reinvented by a program manager presenting at a developer conference, trying to describe the process by which identifiers in VBA are resolved. There had been a party that went late into the night on the evening prior, and he was tired. Maybe even a little hungover. He was explaining that if a name has not been bound to anything yet, what it is meant to refer to is ambiguous. This set the stage for his next words -- that VBA had to look through the references to disambiguate the name.

He thus introduced the word into the cosmic consciousness of VBA developers.

But I was going to talk about identifiers.

When I had to write the managed version of IsNLSDefinedString for Whidbey, what to call it was an interesting question. I suspect that largely on the basis of no one else really caring what it was called, no one objected when I dubbed it IsSortable. I actually had more than one person ask me later if sortable is a real word (to which I of course responded that English is a productive language, yada, yada, yada). On the longstanding precedent of the Server 2003 IsNLSDefinedString function, unpaired surrogates and private use characters (in addition to weightless strings) will cause both functions to return FALSE. And while several people have asked why (since both do have some kind of sorting weight), people have stuck to their guns on this one -- there is no useful cross-machine usage for either unpaired surrogates or for private use area characters in identifiers like machine names.
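To make the rule above concrete, here is a minimal sketch in Python. It is not the NLS or .NET implementation: the function name is_sortable_sketch is hypothetical, and it approximates "defined" using Unicode general categories rather than real sorting weights, so weightless characters are not modeled at all. Treating the empty string and unassigned code points as unsortable is likewise an assumption of the sketch.

```python
import unicodedata

# General categories this sketch treats as "unsortable".
# This is an assumption for illustration, not the actual NLS tables:
#   Cs = surrogate (in a Python str these can only be unpaired)
#   Co = private use
#   Cn = unassigned or noncharacter
_UNSORTABLE_CATEGORIES = {"Cs", "Co", "Cn"}


def is_sortable_sketch(text: str) -> bool:
    """Hypothetical approximation of the check described in the post."""
    # Assumption: an empty string has nothing meaningful to sort.
    if not text:
        return False
    # Every character must fall outside the unsortable categories.
    return all(
        unicodedata.category(ch) not in _UNSORTABLE_CATEGORIES
        for ch in text
    )
```

With this sketch, an ordinary machine name such as "server01" passes, while a string containing a lone surrogate ("\ud800") or a private use character ("\ue000") does not.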

It may seem somewhat pretentious; it may even be a little pretentious. But using such characters is just not a good idea, and maybe by having a method that calls itself IsSortable, people can be steered away from using these things in machine names and identifiers in programming languages and such. The former might be possible (Active Directory uses us for collation after all), but the latter is of course a pipe dream, since programming languages that allow atrocities like this will not even blink before allowing these "unsortable" characters in identifiers.

But is there something wrong with using IsSortable here? It is not like the naysayers who questioned its validity as a word had a better name they could suggest. And the method is referring to strings being used in collation operations, which do prefer meaningful strings anyway. Maybe IsDefined would have been better, but people seemed reluctant to have too many new concepts added. If people were to ask what the consequences of being undefined were, the answer would be that you could not sort them effectively -- so we'd have to explain what it meant to be sortable, anyway. So the current plan has fewer concepts to explain. :-)

Now yesterday, I was talking about the TextRenderer class. If one has the job of rendering, I suppose one is a renderer. And if one is rendering text, one is a text renderer. And English is a productive language, yada, yada, yada. But is it a word one would usually use in this context? It seems like it is more common to put the word Render in a method. And obviously when one looks at the methods on the class, it has two actions -- measuring and drawing. It is kind of a stretch to say they both fall under the category of the act of rendering. So what makes this usage seem okay? Or maybe nothing does and people just shake their heads.

Which gets me back to what you could (loosely) call the subject of this post.

The grammar of identifiers is a sparse one, meant to be the consistent application of a limited number of concepts. If I were creating a property on System.String for this, it would just be String.Sortable or String.Defined. But any time they have to be methods (like when they take parameters), they have an Is prefix, like all the char.Is* methods. Maybe calling a class TextRenderer feels weird to some because classes are supposed to be "pure" nouns, and not just the noun forms of verbs. Or maybe it just feels weird since the scope of what the class represents is not fully covered by the name. All as if we are trying to create an actual language that one could use to communicate the concept of the program.

Of course to non-programmers it may appear that programmers are talking like people with developmental disabilities, and some linguists may even balk at the idea of calling a programming language an actual language, but in truth one can communicate some very complex ideas. And every time I write a program in an object oriented language, I am extending the language. Well, maybe I am not unless I am adding something to the BCL, which in my case actually can happen.

But that raises another interesting idea -- when I create methods and properties in a new program, am I creating a dialect? One that you only speak if the appropriate terms are in scope for you? And if that is true, what does it mean to be the author of a much-used library -- are you the programming equivalent of Académie française, only much more effective since you control the language in a much more literal sense than our colleagues in France ever could?

Of course, as Raymond Chen pointed out, sample code often tries harder to be multilingual, for obvious reasons. :-)

Are we creating language here? Or are my delusions of linguistic aptitude confusing me?

I wonder if anyone has ever studied this before, on the linguistic side.


This post brought to you by "Ӓ" (U+04d2, CYRILLIC CAPITAL LETTER A WITH DIAERESIS)

# Eric Lippert on 28 Jun 2005 7:27 PM:

A few random comments:

Coming up with good identifiers is tricky. Just yesterday in fact I had to put the smack down on someone who wanted to call a function LifetimeTokenRevoke, to, of course, revoke the lifetime token.

For Yoda, this function is? Verb at the end, programmers like? Up with these shenanigans put not I will.

Lisp and Scheme have the benefit of being able to use punctuation in identifiers -- if you have a predicate, you can actually call it "EvenNumber?" which pretty strongly calls out that it's providing the answer to a question.

And finally, I was in a dance class a few years ago, and I asked what lead disambiguated two very similar moves so that the follow could tell the difference. I got teased for weeks for using a ten dollar word. I honestly had no idea that "disambiguated" was a fancy pants word -- I mean, it's perfectly straightforward. You've got an ambiguous situation, you disambiguate it, what's the big deal? :)

# Dean Harding on 28 Jun 2005 7:35 PM:

> Maybe calling a class TextRenderer feels weird to some because classes are supposed to be
> "pure" nouns, and not just the noun forms of verbs.

I'd say that for classes that only have static methods, like the TextRenderer class, the "class names must be nouns" rule doesn't hold as much. It's only when you actually have an instance of something that it needs to be described by a noun. Anyway, I think the act of creating an instance of a class would turn it into a noun anyway...

Also, I absolutely agree that writing a library (or any program) means you're creating a new dialect. Whenever you join a new project, it always takes a few days (or weeks!) to learn all the lingo...

# Michael S. Kaplan on 29 Jun 2005 6:07 PM:

Hey Eric and Dean,

Yeah, it is looking more and more like an actual language issue all the time. Lisp is a good example of how it stretched almost to the point of being obvious.

No linguists have taken the bait yet. I honestly do wonder if anyone has studied this conceptually before. Maybe I should write a scholarly paper on the subject!

referenced by

2008/08/12 Unbloggableable: an inconceivableable term that makes me a little uncomfortableable

2007/01/10 Two things that suck about CurrentUICulture, (Part 2, aka On judging a book by its cover)

2006/09/06 IsSortable() == false? Well, sometimes it may be lying....
