Ottinger's Rules for Variable and Class Naming
Tim Ottinger
Brad Appleton took the time to convert the original textual version to hypertext, and I thank him for this encouragement.
This paper grew out of some postings made on usenet, specifically comp.object, in 1997. There was some response, and so it is presented here in its entirety, enhanced a bit.
There is often a great deal of artificial learning curve added to a project by decree or by accident. This has the opposite effect; it increases ramp-up time and can hurt the new developer's time-to-first-contribution considerably. And not only the first contribution, but the next several.
The goal of this rule set is to help avoid creating one type of artificial learning curve, that of deciphering or memorizing strange names.
The rules were developed in group discussions, largely by examining poor names and dissecting them to determine the cause of their "badness".
A company I know has genymdhms (generation date, year, month,
day, hour, minute and second) so they walked around saying "gen why
emm dee aich emm ess". I have an annoying habit of pronouncing everything
as-written, so I started saying "gen-yah-mudda-hims". It later was
being called this by a host of designers and analysts, and we still sounded
silly. But we were in on the joke, so it was fun. Fun or not, don't do
that.
It would have been so much better if it had been called generation_timestamp.
"Hey, Mikey, take a look at this record! The generation timestamp is
tomorrow! How can that be?"
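As a minimal sketch of the difference, here are both names side by side in C++. The record types and the helper function are hypothetical, invented only for illustration; nothing here comes from the company's actual code.

```cpp
#include <ctime>

// Hypothetical record using the encoded, unpronounceable field name.
struct RecordWithEncodedName {
    std::time_t genymdhms;  // you have to spell this out letter by letter
};

// The same record with a name you can say in a hallway conversation.
struct Record {
    std::time_t generation_timestamp;
};

// Hypothetical helper: true when the record claims to have been
// generated later than `now`.
inline bool is_generated_in_future(const Record& record, std::time_t now) {
    return record.generation_timestamp > now;
}
```

The second version lets Mikey's question map directly onto the code: the generation timestamp is in the future.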
When you worked in name-length-challenged programs, you probably violated this rule with impunity and regret. Fortran forced it by basing type on the first letter, making the first letter a `code' for the type. Hungarian has taken this to a whole new level.
We've all seen bizarre encoded naming standards for files, producing
(real name) cccoproi.sc and SRD2T3. This is an artificially-created
naming standard in the modern world of long filenames, though it had its
time.
This isn't intended as an attack on Hungarian notation out of malice
toward Microsoft or Windows. It's a simple rule of simplifying and clarifying
names. HN was pretty important back when everything was an integer handle
or a long pointer, but in C++ we have (and should have) a much richer type
system. We don't need HN any more. Besides, encoded names are seldom
pronounceable (see #1).
Of course, you can get used to anything, but why create an artificial learning curve for new hires? Avoid this if you can avoid it.
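To make the point about encodings concrete, here is a small C++ sketch. Both structs are hypothetical and hold the same data; the prefixes in the first are my own guess at a typical encoded style, not taken from any real standard.

```cpp
#include <string>

// Encoded style: each prefix repeats what the declaration already states.
struct CustomerEncoded {
    std::string strName;        // "str" restates std::string
    long        lBalanceCents;  // "l" restates long
};

// Plain style: the type system carries the type, the name carries the meaning.
struct Customer {
    std::string name;
    long        balance_cents;
};
```

If the balance later becomes a richer type, the plain names stay correct, while every encoded prefix has to be renamed or left to mislead.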
I've seen other similar cutesy namings fail.
Given the choice, choose clarity over entertainment value. It's a good practice.
For instance, it's confusing to have fetch, retrieve and get as same-acting methods of different classes. How do you remember which method name goes with which class? Sadly, you often have to remember who wrote the library in order to remember which term was used. Otherwise, you spend an awful lot of time browsing through headers and previous code samples. This is a considerably worse practice than the use of encodings.
Likewise, it's confusing to have a controller and a manager and a driver in the same process. What is the essential difference between a DeviceManager and a ProtocolController? Why are both not controllers, or both not managers? The name leads you to expect two objects that have very different type as well as having different classes.
We can take advantage of this to create consistent interfaces and simplify learning dramatically.
This is the inverse of the previous rule. When you use different terms, it leads one to think that there are different types underlying them. If I use DeviceManager and ProtocolManager, it leads one to expect the two to have very similar interfaces. If I can call DeviceManager::add(), I should be able to call ProtocolManager::add(). Why? Because the name created an association between the two. I expect to see *Manager::add() now.
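A minimal sketch of what that expectation looks like in C++. The class bodies are hypothetical; only the names DeviceManager, ProtocolManager and add() come from the text above, and count() is my own addition so the effect of add() can be observed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Both classes are called "Manager", so both offer the same add()/count() pair.
class DeviceManager {
public:
    void add(const std::string& device_name) { devices_.push_back(device_name); }
    std::size_t count() const { return devices_.size(); }

private:
    std::vector<std::string> devices_;
};

class ProtocolManager {
public:
    void add(const std::string& protocol_name) { protocols_.push_back(protocol_name); }
    std::size_t count() const { return protocols_.size(); }

private:
    std::vector<std::string> protocols_;
};
```

Someone who has learned one of these classes can guess the interface of the other, which is the whole payoff of sharing the word.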
If you use the same word, but you have very different interfaces, this isn't a total evil (see #12), but it does cause some confusion. If your system or your module is small enough, or your controls rigorous enough to prevent synonyms, then that's great.
If you're learning a framework, though, you need to be most careful not to be fooled by synonyms. While you should be able to count on the names denoting type, you frequently cannot.
Remember also that it's not polite at all to have the same name in two scopes.
There are some methods (commonly called "accessors") which calculate and/or return a value. These can and probably should have noun names. This way accessing a person's first name can read like:
string x = person.name();
Other methods (sometimes called "manipulators", but not so commonly anymore) cause something to happen. These should have verb or verb-phrase names. This way, changing a name would read like:
fred.changeNameTo("mike")
As a class designer, does this sound boringly unimportant? If so, then
go write code that uses your classes. The best way to test an interface
is to use it and look for ugly, contrived, or confusing text. This really
helps.
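Here is a small sketch of a class that supports both of the calls shown above. The Person class itself is hypothetical; only name() and changeNameTo() are taken from the examples.

```cpp
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Accessor: a noun. It returns a value and changes nothing.
    const std::string& name() const { return name_; }

    // Manipulator: a verb phrase. It causes something to happen.
    void changeNameTo(const std::string& new_name) { name_ = new_name; }

private:
    std::string name_;
};
```

Reading the calling code aloud is the test: "fred, change name to mike" and "x is the person's name" both sound like what they do.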
Yeah, it's a bit heretical, but you don't want your developers having to run back and forth to the customer asking what every name means if they already know the concept by a different name.
We're talking about code here, so you're more likely to have your code maintained by a CS major or informed programmer than by a domain expert with no programming background. End users of a system very seldom read the code, but the maintainers have to.
In analysis, of course, this is the superior rule to [Use Solution Domain Names], because the end-user is the target audience.
There are some unfortunate examples for this. One of them is Microsoft's choice to call the things that walk through a list Enumerators instead of Iterators. This is sad because the term iterator is in common use in software circles and was completely appropriate to the domain (see Pick One) and also because the term enumeration typically has a very different meaning (see Multiple Meanings). Between the two, most developers have to translate enumerator to iterator mentally as the conversations about such things go on.
This problem generally arises from a choice to use neither problem domain terms nor solution domain terms.
For this reason, most important names should be in a glossary or should be explained in comments at least. Even if they're parameters or local variables. Even if they're inside the static member of a class, unless the term is completely in harmony with all of these naming rules.
Likewise don't refer to a grouping of accounts as an AccountList unless it's actually a list. A list means something to CS people. It denotes a certain type of data structure. If the container isn't a list, you've disinformed the programmer who has to maintain your code. AccountGroup or BunchOfAccounts would have been better.
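As a sketch, assume the accounts are really kept in a set keyed by a string id (that storage choice and the member functions are my assumptions, made only for illustration). The name AccountGroup then promises nothing the class does not deliver:

```cpp
#include <cstddef>
#include <set>
#include <string>

class AccountGroup {
public:
    void add(const std::string& account_id) { ids_.insert(account_id); }

    bool contains(const std::string& account_id) const {
        return ids_.count(account_id) > 0;
    }

    std::size_t size() const { return ids_.size(); }

private:
    // A set, not a list: no duplicates, no insertion order.
    std::set<std::string> ids_;
};
```

Had this been called AccountList, a maintainer could reasonably expect duplicates and insertion order to be preserved, and would be wrong on both counts.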
The absolute worst example of this would be the use of lower-case L or uppercase O as variable names, especially in combination. The problem, of course, is in code where such things as this occur:
int a = l; if ( O = l ) a = O1; else l = 0;
You think that I made this one up, right? Sorry. I've examined code this year (1997) where such things were abundant. It's a great technique for shrouding your code.
When I complained, one author told me that I should use a different font so that the differences were more obvious. I think that the problem could be more easily and finally corrected by search-and-replace than by publishing a requirement that all future readers choose Font X.
The term `tree' needs some disambiguation, for example if the application is a forestry application. You may have syntax trees, red-black or b-trees, and also elms, oaks, and pines. The word `tree' is a good word, and is not to be avoided, but it must be placed in context every place it is used.
If you review a program or enter into a conversation where the word "tree" could mean either, and you aren't sure, then the author (speaker) will have to clarify.
Likewise, say you invented a `Mailing Address' class in GSD's accounting module, and you named it AccountAddress. Later, you need a mailing address for your customers. Do you use `AccountAddress'?
In both these cases, the naming reveals an earlier short-sightedness regarding reuse. It shows that there was a failing at the design level to look for common classes across an application.
Sadly, this is the standard being used by many Java authors. Even in C++, this is becoming increasingly common. We need language support for this type of work. I've not had too much trouble with it in Python, but I'm watching out. You should also.
The names `accountAddress' and `customerAddress' are fine names for instances of the class.
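A sketch of that arrangement in C++: one general Address class, with the context carried by the instance names. The fields and the two owning structs are hypothetical, chosen only to show where each name belongs.

```cpp
#include <string>

// One general-purpose class, named for what it is.
struct Address {
    std::string street;
    std::string city;
    std::string postal_code;
};

// The context lives in the instance names, not in the class name.
struct Account {
    Address accountAddress;
};

struct Customer {
    std::string name;
    Address     customerAddress;
};
```

Both modules reuse the same class, and nobody has to ask whether a customer may legally hold an AccountAddress.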
Imagine that you have a Product class. If you have another called ProductInfo or ProductData, you have failed to make the names different. Info and Data are like "stuff": basically meaningless. Likewise, using the words Class or Object in an OO system is so much noise; can you imagine having CustomerObject and Customer as two different class names?
MoneyAmount is no better than `money'. CustomerInfo is no better than Customer. The word `variable' should never appear in a variable name. The word `table' should never appear in a table name. How is NameString better than Name? Would a Name ever be a floating point number? Probably not. If so, it breaks an earlier rule about disinformation.
There is an application I know of where this is illustrated. I've changed the name of the thing we're getting to protect the guilty, but the exact form of the error is:
getSomething();
getSomethings();
getSomethingInfo();
The second tells you there are many of these things. The first lets you know you'll get one, but which? The third tells you nothing more than the first, but the compiler (and hopefully the author) can tell them apart. You are going to have to work harder.
Try to disambiguate in such a way that the reader knows what the different versions offer her, instead of merely that they're different.
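One way this might look, as a sketch only. Since the real name was changed to protect the guilty, everything here is hypothetical: I've assumed the "something" is an account, and the three replacement names are my own suggestions, not the application's.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Account {
    std::string id;
    long        balance_cents;
};

class AccountStore {
public:
    explicit AccountStore(std::vector<Account> accounts)
        : accounts_(std::move(accounts)) {}

    // Instead of getSomething(): says which one you get.
    // Precondition: the store is not empty.
    const Account& firstAccount() const { return accounts_.front(); }

    // Instead of getSomethings(): says that you get every one.
    const std::vector<Account>& allAccounts() const { return accounts_; }

    // Instead of getSomethingInfo(): says what the "info" actually is.
    std::string describeAccount(const Account& account) const {
        return account.id + ": " + std::to_string(account.balance_cents);
    }

private:
    std::vector<Account> accounts_;
};
```

Each name now tells the reader what she gets back, not merely that the three functions differ.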
Follow some of these rules, and see if you don't improve the readability of your code. If you are maintaining someone else's code, make changes to resolve these problems. It will pay off in the long run.