Ottinger's Rules for Variable and Class Naming

Tim Ottinger

Brad Appleton took the time to convert the original textual version to hypertext, and I thank him for this encouragement.

This is an archived copy; you might prefer the original (and maintained) version.

This paper grew out of some postings made on Usenet, specifically comp.object, in 1997. There was some response, and so it is presented here in its entirety, enhanced a bit.



  1. Use Pronounceable names
  2. Avoid Encodings
  3. Don't be too cute
  4. Most meanings have multiple words. Pick ONE
  5. Most words have multiple meanings
  6. Nouns and Verb Phrases
  7. Use Solution Domain Names
  8. Also Use Problem Domain Names
  9. Avoid Mental Mapping
  10. Nothing is intuitive
  11. Avoid Disinformation
  12. Names are only Meaningful in Context
  13. Don't add Artificial Context
  14. No Disambiguation without Differentiation

Introduction

When a new developer joins a project which is already in progress, there is a steep learning curve. If the new developer already knows the methodology and programming language, some of this is reduced. If the new developer already knows the problem domain fairly well, this also shortens the ramp-up time.

There is often a great deal of artificial curve which is added to a project by decree or by accident. This has the opposite effect; it increases ramp-up time and can hurt the new developer's time-to-first-contribution considerably. And not only the first contribution, but the next several.

The goal of this rule set is to help avoid creating one type of artificial learning curve: that of deciphering or memorizing strange names.

The rules were developed in group discussions, largely by examining poor names and dissecting them to determine the cause of their "badness".

    Use Pronounceable names

    If you can't pronounce it, you can't discuss it without sounding like an idiot. "Well, over here on the bee cee arr three cee enn tee we have a pee ess zee kyew int, see?"

    A company I know has genymdhms (generation date, year, month, day, hour, minute and second), so they walked around saying "gen why emm dee aich emm ess". I have an annoying habit of pronouncing everything as written, so I started saying "gen-yah-mudda-hims". It later was being called this by a host of designers and analysts, and we still sounded silly. But we were in on the joke, so it was fun. Fun or not, don't do that.
     
    It would have been so much better if it had been called generation_timestamp. "Hey, Mikey, take a look at this record! The generation timestamp is tomorrow! How can that be?"
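
    As a minimal sketch of the difference, here are the two spellings side by side. Only genymdhms and generation_timestamp come from the story above; the struct names and the helper function are hypothetical, invented for illustration.

```cpp
#include <ctime>

// The original spelling: every reader has to decode it letter by letter.
struct EncodedRecord {
    std::time_t genymdhms;
};

// The pronounceable spelling: you can say it in a hallway conversation.
struct GenerationRecord {
    std::time_t generation_timestamp;
};

// Hypothetical helper for the "the generation timestamp is tomorrow!" case:
// true when the record claims to have been generated after `now`.
bool is_generated_in_future(const GenerationRecord& record, std::time_t now) {
    return record.generation_timestamp > now;
}
```

    The second struct costs a few more keystrokes and nothing else.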

    Avoid Encodings

    Encoded names require deciphering. This is true for Hungarian and other `type-encoded' or otherwise encoded variable names. To allow any encoded prefixes or suffixes in code is suspect, but to require it seems irresponsible inasmuch as it requires each new employee to learn an encoding "language" in addition to learning the (usually considerable) body of code that they'll be working in.

    When you worked in name-length-challenged programs, you probably violated this rule with impunity and regret. Fortran forced it by basing type on the first letter, making the first letter a `code' for the type. Hungarian has taken this to a whole new level.

    We've all seen bizarre encoded naming standards for files, producing (real names) cccoproi.sc and SRD2T3. This is an artificially-created naming standard in the modern world of long filenames, though it had its time.

    This isn't intended as an attack on Hungarian notation out of malice toward Microsoft or Windows. It's a simple rule of simplifying and clarifying names. HN was pretty important back when everything was an integer handle or a long pointer, but in C++ we have (and should have) a much richer type system. We don't need HN any more. Besides, encoded names are seldom pronounceable (see Use Pronounceable names).

    Of course, you can get used to anything, but why create an artificial learning curve for new hires? Avoid this if you can avoid it.
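
    To make this concrete, here is a small sketch of the same function written twice. The prefixes in the first version are made up in the spirit of Hungarian notation rather than taken from any real standard, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encoded style: the prefixes repeat what the declared types already say.
std::size_t nCountNonEmpty(const std::vector<std::string>& vecstrNames) {
    std::size_t nCount = 0;
    for (const std::string& strName : vecstrNames) {
        if (!strName.empty()) ++nCount;
    }
    return nCount;
}

// Plain style: the type system carries the type, the name carries the meaning.
std::size_t count_non_empty(const std::vector<std::string>& names) {
    std::size_t count = 0;
    for (const std::string& name : names) {
        if (!name.empty()) ++count;
    }
    return count;
}
```

    The compiler checks the types in both versions. Only the second one can be read aloud.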

    Don't be too cute

    If the names are too clever, they will be memorable only to people who share your sense of humor and remember the joke. Will the people coming after you really remember what HolyHandGrenade is supposed to do in your program? Sure, it's cute, but maybe in this case ListItemRemover might be a better name. I've seen Monty Python and the Holy Grail, but it may take me a while to realize what you are meaning to do.

    I've seen other similar cutesy namings fail.

    Given the choice, choose clarity over entertainment value. It's a good practice.

    Most meanings have multiple words. Pick ONE

    Pick one word for one abstract function and stick with it.  I hear that the Eiffel libraries excel at this, and I know that the C++ STL is very consistent. Sometimes the names seem a little odd (like pop_front for a list), but being consistent will reduce the overall learning curve for the whole library.

    For instance, it's confusing to have fetch, retrieve and get as same-acting methods of the different classes. How do you remember which method name goes with which class? Sadly, you often have to remember who wrote the library in order to remember which term was used. Otherwise, you spend an awful lot of time browsing through headers and previous code samples. This is a considerably worse practice than the use of encodings.

    Likewise, it's confusing to have a controller and a manager and a driver in the same process. What is the essential difference between a DeviceManager and a ProtocolController? Why are both not controllers, or both not managers? The name leads you to expect two objects that have very different type as well as having different classes.

    We can take advantage of this to create consistent interfaces and simplify learning dramatically.
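
    A rough sketch of what "pick one" looks like in practice follows. The two classes are hypothetical; the point is only that both use get for "look up by key", so nobody has to remember whether this one was the fetch class or the retrieve class.

```cpp
#include <map>
#include <string>

// Hypothetical class: "get" means "look up by key" here...
class CustomerDirectory {
public:
    void add(int id, const std::string& name) { names_[id] = name; }
    // Returns the stored name, or an empty string when the id is unknown.
    std::string get(int id) const {
        const auto found = names_.find(id);
        return found == names_.end() ? std::string() : found->second;
    }
private:
    std::map<int, std::string> names_;
};

// ...and "get" means exactly the same thing here.
class PriceTable {
public:
    void add(int id, double price) { prices_[id] = price; }
    // Returns the stored price, or 0.0 when the id is unknown.
    double get(int id) const {
        const auto found = prices_.find(id);
        return found == prices_.end() ? 0.0 : found->second;
    }
private:
    std::map<int, double> prices_;
};
```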

    Most words have multiple meanings

    Don't use the same word for two purposes, if you can at all avoid it.

    This is the inverse of the previous rule. When you use different terms, it leads one to think that there are different types underlying them. If I use DeviceManager and ProtocolManager, it leads one to expect the two to have very similar interfaces. If I can call DeviceManager::add(), I should be able to call ProtocolManager::add(). Why? Because the name created an association between the two. I expect to see *Manager::add() now.
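
    Here is one way that expectation might pay off in code. DeviceManager, ProtocolManager and add() are the names used above; count() and the add_all helper are assumptions added only so the sketch has something to show.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class DeviceManager {
public:
    void add(const std::string& device) { devices_.push_back(device); }
    std::size_t count() const { return devices_.size(); }
private:
    std::vector<std::string> devices_;
};

class ProtocolManager {
public:
    void add(const std::string& protocol) { protocols_.push_back(protocol); }
    std::size_t count() const { return protocols_.size(); }
private:
    std::vector<std::string> protocols_;
};

// Hypothetical helper: because every *Manager keeps the promise its name
// makes, one function can serve them all.
template <typename Manager>
void add_all(Manager& manager, const std::vector<std::string>& names) {
    for (const std::string& name : names) {
        manager.add(name);
    }
}
```

    If ProtocolManager had called the same operation registerProtocol(), the helper would not compile for it, and the reader's expectation would have been broken long before the compiler noticed.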

    If you use the same word, but you have very different interfaces, this isn't a total evil (see Names are only Meaningful in Context), but it does cause some confusion. If your system or your module is small enough, or your controls rigorous enough to prevent synonyms, then that's great.

    If you're learning a framework, though, you need to be most careful not to be fooled by synonyms. While you should be able to count on the names denoting type, you frequently cannot.

    Remember also that it's not polite at all to have the same name in two scopes.

    Nouns and Verb Phrases

    Classes and objects should have noun or noun phrase names.

    There are some methods (commonly called "accessors") which calculate and/or return a value. These can and probably should have noun names. This way accessing a person's first name can read like:

            string x = person.name();

    Other methods (sometimes called "manipulators", but not so commonly anymore) cause something to happen. These should have verb or verb-phrase names. This way, changing a name would read like:

            fred.changeNameTo("mike");

    As a class designer, does this sound boringly unimportant? If so, then go write code that uses your classes. The best way to test an interface is to use it and look for ugly, contrived, or confusing text. This really helps.
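
    A minimal class that supports both of the calls shown above might look like the following. The constructor and the private member are assumptions; only name() and changeNameTo() appear in the text.

```cpp
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Accessor: a noun, because it answers "what is it?"
    const std::string& name() const { return name_; }

    // Manipulator: a verb phrase, because it makes something happen.
    void changeNameTo(const std::string& newName) { name_ = newName; }

private:
    std::string name_;
};
```

    Reading the calling code aloud is a decent test: "fred, change name to mike" is a sentence, and so is "x is the person's name".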

    Use Solution Domain Names

    Go ahead, use computer science (CS) terms, algorithm names, pattern names, math terms, etc.

    Yeah, it's a bit heretical, but you don't want your developers having to run back and forth to the customer asking what every name means if they already know the concept by a different name.

    We're talking about code here, so you're more likely to have your code maintained by a CS major or informed programmer than by a domain expert with no programming background. End users of a system very seldom read the code, but the maintainers have to.

    Also Use Problem Domain Names

    When there is no `programmer-ese' for what you're doing, use the name from the problem domain. At least the programmer who maintains your code can ask his boss what it means.

    In analysis, of course, this is the superior rule to Use Solution Domain Names, because the end-user is the target audience.

    Avoid Mental Mapping

    Readers shouldn't have to mentally translate your names into other names they already know.

    There are some unfortunate examples of this. One of them is Microsoft's choice to call the things that walk through a list Enumerators instead of Iterators. This is sad because the term iterator is in common use in software circles and was completely appropriate to the domain (see Pick ONE), and also because the term enumeration typically has a very different meaning (see Most words have multiple meanings). Between the two, most developers have to translate enumerator to iterator mentally as the conversations about such things go on.

    This problem generally arises from a choice to use neither problem domain terms nor solution domain terms.

    Nothing is intuitive

    Sadly, and in contradiction to the above, all names require some mental mapping, since this is the nature of language. If you use a term which might not be known to your audience, you must map it to the concept you'd like it to represent.

    For this reason, most important names should be in a glossary or should be explained in comments at least. Even if they're parameters or local variables. Even if they're inside the static member of a class, unless the term is completely in harmony with all of these naming rules.

    Avoid Disinformation

    Avoid words which already mean something else. For example, "hp", "aix", and "sco" would be horrible variable names because they are the names of Unix platforms or variants. Even if you are coding a hypotenuse and "hp" looks like a good abbreviation, it violates too many rules and also is disinformative.

    Likewise don't refer to a grouping of accounts as an AccountList unless it's actually a list. A list means something to CS people. It denotes a certain type of data structure. If the container isn't a list, you've disinformed the programmer who has to maintain your code. AccountGroup or BunchOfAccounts would have been better.
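
    As a sketch, suppose the container is really a duplicate-free set of account identifiers. AccountGroup is the name suggested above; the member functions and the choice of std::set are assumptions made for the example.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Not a list: insertion order is not kept and duplicates are dropped,
// so the name promises only "a group of accounts".
class AccountGroup {
public:
    void add(const std::string& accountId) { ids_.insert(accountId); }
    bool contains(const std::string& accountId) const {
        return ids_.count(accountId) > 0;
    }
    std::size_t size() const { return ids_.size(); }
private:
    std::set<std::string> ids_;
};
```

    Had this been called AccountList, a maintainer could reasonably expect to add the same account twice and get it back twice, in order. The name would have lied.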

    The absolute worst example of this would be the use of lowercase L or uppercase O as variable names, especially in combination. The problem, of course, is in code where such things as this occur:

        int a = l;
        if ( O = l )
            a = O1;
        else
            l = 0;

    You think that I made this one up, right? Sorry. I've examined code this year (1997) where such things were abundant. It's a great technique for shrouding your code.

    When I complained, one author told me that I should use a different font so that the differences were more obvious. I think that the problem could be more easily and finally corrected by search-and-replace than by publishing a requirement that all future readers choose Font X.

    Names are only Meaningful in Context

    There are a few names which are meaningful in and of themselves. Most, however, are not. Instead, you need to place names in context for your reader by enclosing them in classes, well-named functions, or comments.

    The term `tree' needs some disambiguation if, for example, the application is a forestry application. You may have syntax trees, red-black or b-trees, and also elms, oaks, and pines. The word `tree' is a good word, and is not to be avoided, but it must be placed in context every place it is used.

    If you review a program or enter into a conversation where the word "tree" could mean either, and you aren't sure, then the author (speaker) will have to clarify.
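
    One way to supply that context in C++ is a namespace. This is only a sketch: the namespaces, fields and the describe function are all hypothetical, chosen to show the word `tree' living comfortably in two contexts at once.

```cpp
#include <string>

namespace forestry {
// Here a tree is a plant.
struct Tree {
    std::string species;
    double height_in_meters;
};
}  // namespace forestry

namespace parsing {
// Here a tree is a data structure.
struct Tree {
    std::string root_token;
};
}  // namespace parsing

// The qualified parameter type tells the reader which tree is meant.
std::string describe(const forestry::Tree& tree) {
    return tree.species + " tree";
}
```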

    Don't add Artificial Context

    In an imaginary application called "Gas Station Deluxe", it is a bad idea to prefix every class with `GSD' if there is a chance that the class might later be used in "Inventory Manager" (at which time the prefix becomes meaningless).

    Likewise, say you invented a `Mailing Address' class in GSD's accounting module, and you named it AccountAddress. Later, you need a mailing address for your customers. Do you use `AccountAddress'?

    In both these cases, the naming reveals an earlier short-sightedness regarding reuse. It shows that there was a failing at the design level to look for common classes across an application.

    Sadly, this is the standard being used by many Java authors. Even in C++, this is becoming increasingly common. We need language support for this type of work. I've not had too much trouble with it in Python, but I'm watching out. You should also.

    The names `accountAddress' and `customerAddress' are fine names for instances of the class.
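
    A small sketch of the alternative: name the class for what it is, and let the instances carry the context. The fields and the format_label function are assumptions for illustration; only the class idea and the two instance names come from the text above.

```cpp
#include <string>

// No GSD prefix and no "Account" in the type: it is just a mailing address.
struct MailingAddress {
    std::string street;
    std::string city;
    std::string postal_code;
};

// Works for any mailing address, whoever it belongs to.
std::string format_label(const MailingAddress& address) {
    return address.street + ", " + address.city + " " + address.postal_code;
}
```

    With this in place, accountAddress and customerAddress are simply two variables of type MailingAddress, and the class can move to "Inventory Manager" without being renamed.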

    No Disambiguation without Differentiation

    This is a problem that usually arises from writing code solely for the compiler/interpreter. You can't have the same name referring to two things in the same scope, so you change one of them. Well, that's better than misspelling one (I've seen code that looks like this was intentional, and correcting the spelling prevented compiles due to symbol clashes), but there should be some fundamental change in name that makes it clear that they are different.

    Imagine that you have a Product class. If you have another called ProductInfo or ProductData, you have failed to make the names different. Info and Data are like "stuff": basically meaningless. Likewise, using the words Class or Object in an OO system is so much noise; can you imagine having CustomerObject and Customer as two different class names?

    MoneyAmount is no better than `money'. CustomerInfo is no better than Customer. The word `variable' should never appear in a variable name. The word `table' should never appear in a table name. How is NameString better than Name? Would a Name ever be a floating point number? Probably not. If so, it breaks an earlier rule about disinformation.

    There is an application I know of where this is illustrated. I've changed the name of the thing we're getting to protect the guilty, but the exact form of the error is:

             getSomething();
             getSomethings();
             getSomethingInfo();

    The second tells you there are many of these things. The first lets you know you'll get one, but which? The third tells you nothing more than the first, but the compiler (and hopefully the author) can tell them apart. You are going to have to work harder.

    Try to disambiguate in such a way that the reader knows what the different versions offer her, instead of merely that they're different.
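
    As a rough illustration, here is what three differentiated names might look like. Everything in it is hypothetical (the real application stays anonymous): an order book stands in for the "something", and each method name says which thing, or which things, you will get.

```cpp
#include <map>
#include <string>
#include <vector>

struct Order {
    int id;
    std::string item;
    bool shipped;
};

class OrderBook {
public:
    void add(const Order& order) { orders_[order.id] = order; }

    // One order, and the name says which one. Throws std::out_of_range
    // (from std::map::at) when the id is unknown.
    const Order& orderWithId(int id) const { return orders_.at(id); }

    // Many orders, and the name says which many.
    std::vector<Order> unshippedOrders() const {
        std::vector<Order> result;
        for (const auto& entry : orders_) {
            if (!entry.second.shipped) result.push_back(entry.second);
        }
        return result;
    }

    // Not "Info": the name says what you will learn.
    std::string shippingStatusOf(int id) const {
        return orders_.at(id).shipped ? "shipped" : "pending";
    }

private:
    std::map<int, Order> orders_;
};
```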

Final Words ...

The hardest thing about choosing good names is that it requires good descriptive skills and a shared cultural background. This is a teaching issue, rather than a technical, business, or management issue. As a result many people in this field don't do it very well.

Follow some of these rules, and see if you don't improve the readability of your code. If you are maintaining someone else's code, make changes to resolve these problems. It will pay off in the long run.




Object Mentor
Archived by Chris Lott.