Archive for the ‘code review’ Category

Code Review #8: When to comment

Monday, March 16th, 2009

The last ? in a series of posts about commenting. See “the why”. See “not commenting is career threatening”. And the comment that started this off!

This post should have really been the second one I wrote. The first post was about who a developer needs to keep happy – the client or the manager.

But commenting everything is both hard, excessive, and wasteful of time. In “the why” post, I cover what should be in the comments – implicitly I was limiting comments to just the places where knowing “the why” is important.

However, still the question is “when” should a developer comment?

Here is my rule of thumb:

  • Any developer had to spend time figuring out what the code did. Refactor it and if still not clear comment the code with questions and assumptions. (Just in case the refactoring introduced a bug or caused an existing bug to appear.)
  • Another developer asks the implementing developer (or the current “owner”/responsible developer) about the code, interfaces, package. The developer asking the question copy the explanation into the code, cleaning up the chat log/email response and adding further edits and explanation as needed.
  • The code is not going to have further development for a while. Comment the code enough so that when development is resumed, the restart is faster.
  • The code is being handed off from one developer to another.
  • Key interfaces that are used by multiple groups.

And managers give developers time to wrap up and add these comments.

Conversely, what is not worth commenting:

  • Code that is actively, daily, being changed rearchitected. Every developer already knows what the comments would say and the structure is changing fast enough that the comments would become outdated.
  • Code that does not impact data integrity and end-user experience in a horrid way.
  • Utility classes/ methods.

Why and when to wrap external library classes

Monday, March 16th, 2009

At some point, every developer starts using an external library. They then have to decide if that external library should be wrapped in their own custom interfaces and classes.

If the external library would be pervasively imported in throughout the code then the answer is empathically YES — the external library should be wrapped.

At Amplafi, hibernate is just such a library. Without wrapping every time the hibernate library changed its structure or behavior, the change would have a large impact. However, by wrapping the libraries hibernate became loosely coupled.

This also enabled us to solve other more major issues with Hibernate Query and Session.

Specifically:

  • control when a transaction could be committed. We wanted to count how many times a tx was “started” and only commit when the tx was “ended” the same number of times it was started. Useful for code that doesn’t know if it needs to start a transaction. Now any code that needs a tx just “starts” one and ends it when done.
  • Performance metrics gathering.
  • Delaying starting the transaction until it is known that something will actually be done.
    More gentle behavior for query.uniqueResult() ( default Hibernate behavior is to throw an exception if there is more than one row returned. Amplafi’s wrapper logs the non-uniqueness + ids of the extra rows. )

How we did this:

  1. Create an interface (AmplafiQuery) that extends Query
  2. Create a class (AmplafiQueryImpl) that extends AmplafiQuery and wraps a org.hibernate.Query
  3. Create a Txmanager that returns a Tx.
  4. Tx has the various createQuery methods and returns AmplafiQuery

We also took the opportunity to add to AmplafiQuery:

  • a “asList()” that is a generic enabled version of Query.list()
  • a “unique()” that is a generic enabled version of Query.uniqueResult()

Interfaces vs. abstract classes

Saturday, March 14th, 2009

Sigh … some people just don’t get it…. :-P

Interfaces rock! Below is my comment from stackoverflow.com, a question about how to handle the “interfaces v. abstract classes” interview question in an interview.

  1. First, the “only one super class” answer is lame. Anyone who gave me that answer in an interview would be quickly countered with “C++ existed before Java and C++ had multiple super classes. Why do you think James Gosling only allowed one superclass for Java?”

    Understand the philosophy behind your answer otherwise you are toast (at least if I interview you.)

  2. Second, interfaces have multiple advantages over abstract classes, especially when designing interfaces. The biggest one is not having a particular class structure imposed on the caller of a method. There is nothing worse than trying to use a method call that demands a particular class structure. It is painful and awkward. Using an interface *anything* can be passed to the method with a minimum of expectations.

    Example:

    public void foo(Hashtable bar);

    vs.

    public void foo(Map bar);

    For the former, the caller will always be taking their existing data structure and slamming it into a new Hashtable.

  3. Third, interfaces allow public methods in the concrete class implementers to be “private”. If the method is not declared in the interface then the method cannot be used (or misused) by classes that have no business using the method. Which brings me to point 4….
  4. Fourth, Interfaces represent a minimal contract between the implementing class and the caller. This minimal contract specifies exactly *how* the concrete implementer expects to be used and no more. The calling class is not allowed to use any other method not specified by the “contract” of the interface. The interface name in use also flavors the developer’s expectation of how they should be using the object. If a developer is passed a

    public interface FragmentVisitor {
    public void visit(Node node);
    }

    The developer knows that the only method they can call is the visit method. They don’t get distracted by the bright shiny methods in the concrete class that they shouldn’t mess with.

  5. Lastly, abstract classes have many methods that are really only present for the subclasses to be using. So abstract classes tend to look a little like a mess to the outside developer, there is no guidance on which methods are intended to be used by outside code.

    Yes of course some such methods can be made protected. However, sadly protected methods are also visible to other classes in the same package. And if an abstract class’ method implements an interface the method must be public.

    However using interfaces all this innards that are hanging out when looking at the abstract super class or the concrete class are safely tucked away.

Yes I know that of course the developer may use some “special” knowledge to cast an object to another broader interface or the concrete class itself. But such a cast violates the expected contract, and the developer should be slapped with a salmon.

Code Review #7 – Comment the “why” not the “what”

Wednesday, March 4th, 2009

[This post continues the response to Mike.]

Clean “good” code is good but not enough. Code needs comments — but the right kind of comments.

“What” comments are useless and the most quickly out-dated. An example of a what comment is: “add the suffix path to the end”.

However, the post “Thoughts on how to evaluate code” was very specific referring to “why” comments.

Code no matter how “clean” can not self-document. Excellent comments are “why” comments. “Why” comments talk about :

  1. the assumptions / program state the code is expecting. (For example, the user is logged in, premium membership, or admin privileges )
  2. other code affected if this code is changed (For example, setting up a language default for the session, a default that another piece of code is expecting to be set.)
  3. if something is being done in a non-standard way – why it is being done that way (for example, iterating through a loop from high index to low-index rather than the standard low to high)
  4. should the code in question be used as an example pattern of how to do similar operations elsewhere in the code – or should a developer instead use a different pattern elsewhere.
  5. list the assumptions or conditions that require this implementation (with timestamp) so that it is easy to spot code that is present to handle conditions that no longer exist. For example, a comment like this: “12/20/2007 – yahoo’s open id implementation is using the openid 0.8-proposedspec.” Its pretty likely that the the spec has changed as has yahoo’s implementation. This wasn’t a HACK but knowing that reason behind the code is invaluable.
  6. Future work planned.
  7. Alternative solutions that were rejected (and why).
  8. Performance considerations – this code might be special because it executes millions of times a second. So there is good reasons for some “ugliness”.

The developer, at the moment they are writing the code, must be able to explain why they are writing the code a certain way. It is a red flag to any developer if they cannot explain the why. They should double-check to make sure they understand the requirements. Chances are they don’t and they are doing the wrong thing.

I started requiring developers to clearly document the “why” in their code. I discovered developers that were doing things “wrong” because they were unaware of the correct solution. I discovered that I was not as clear in conveying the requirements as I really thought. Without the developer’s “why” comment, it is easy to get frustrated with developers for the wrong reasons.

Parting thought

Why should the developers that come after the original developer have to spend any time ( and money! ) figuring out the previous work?

You may be the next developer.

Not commenting code is dangerous to your career

Wednesday, March 4th, 2009

There is this myth that code can be self-documenting and that comments are not necessary in good code. Michael recently comment on an earlier blog post advocating the idea of self-documenting code.

“Self-documenting” code is a career-damaging concept, because:

  • Your “manager” (person who decides you stay employed) is an “idiot”,
  • Your co-workers are “idiots”,
  • You are an “idiot”
  • The product manager is an “idiot”
  • Your client is an “idiot”

Your “manager” (person who decides you stay employed) is an “idiot”

That’s me. The guy who has 6+ people to manage has to figure out business plans and a much of non-technical stuff that has no doubt resulted in permanent brain damage.

Your manager is code reviewing all the changes by his direct reports. A task that takes a few minutes every day… unless there are no comments. I know that your manager should be able to immediately see what the code does. But remember your manager is an “idiot”. Unfortunately for you, he signs your paycheck.

Also unfortunately for you, if your manager doesn’t immediately understand your change — you are an “idiot” in his eyes.

You have also just turned a 15-minute task into a 2-hour task. He didn’t really want to be home with his family.

Your co-workers are idiots

Lets agree, your code was stellar but then your co-workers came in and mucked it up. They did a stupid refactoring and broke everything.

The code was “obvious” but not to them.

You are an idiot

Your manager returns from a two-week vacation and is code reviewing 2 weeks of changes. Your manager is asking about a “odd” change that you did 14 days ago.

  • Can you explain it clearly?
  • Do you remember your thinking at the time?
  • Alternative solutions that you tried and failed?

The product manager is an “idiot”

  1. Your stellar (comment-free) code is deployed into production.
  2. The product manager changes the product definition and a new feature is born.
  3. One of your (“idiot”) co-workers makes a change (but not to your code).
  4. The tests pass and 2am this Friday, the code is to be deployed
  5. You go out drinking.
  6. Your (“idiot”) manager deploys the code to production and it breaks.
  7. Unfortunately, it breaks in your code.
  8. Your manager looks at your code and can’t figure out why you wrote the code a certain way.
  9. Its 3am now. He calls you up. Are you going to remember why and are you going to be able to help him?

Lets pretend that the test coverage is better and the problem is discovered before deployment. But your change was made 3 months ago, are you going to be able to explain the thought process for the change?

Your client is an “idiot”

You are independent and you take on clients. Your new client is a non-technical person who needs your awesome coding skills. Now you are in trouble. A non-technical product manager and a non-technical manager (they are paying you!).

But your code is self-documenting right?

Not to them. The product is late. The doubts grow in their head. They ask a friend to look at your code. Their friend (“another idiot!”) says the code looks like garbage and he cannot tell why you wrote the code a certain way. The client starts to push back on paying your invoices. Things get ugly.

Code Review #6 – ‘Too smart’ aka scared of being dumb

Tuesday, May 6th, 2008

One of the biggest failing junior developers have is that they are too ’smart’. ‘Too smart’??? How can someone be ‘too smart’? Actually pretty easily.

‘Too smart’ is when the person spent hours looking at a problem. And the next day she/he

  • realizes the specs would require violating the speed-of-light;
  • or has the product manager come up to them and say, “Oh yeah, there was a typo in the spec and it should have said ‘… NOT …’”;
  • or the product manager/tech lead (when asked) says, “that’s not what I meant”;
  • or the product manager/tech lead (when asked) says, “I didn’t consider that case” (or don’t worry about that case now);
  • or is told by a fellow developer, “oh I ran into that problem last week and on the 3rd page of the documentation, a workaround solution to that problem is described.”;
  • or discovers they were working on an old version of the code and upgrading to the latest version of code or libraries, solves the problem;
  • or finally another developer looks at the problem and points out a simple condition that has been reversed;
  • or…

In all cases, the person (doesn’t have to be a developer) violated the 20-minute rule. Quite simply the 20-minute rule says

“If you are stuck for more than 20 minutes, and you are no closer to understanding the source of the problem – then ask for help”

Yes, it could be something ’stupid’ but it could also be something else. Sometimes you will be told to keep on looking because the person in question wants you to learn something. But sometimes there is a fundamental misunderstanding.

So to make sure that you are not asking for help too easily there is the corollary to the 20-minute rule, the 10-minute review rule.

Spend 10 minutes reviewing, how you got to this situation.

  • Describe why the previous code is written the way it was. Notice this is not a ‘what’ question. So do not answer ‘what the previous code was doing?’ Answer why did the previous developer did the code this way – do you know how the code is being used? Are you breaking those assumptions?
  • Describe why the previous code is now incorrect. List the changed assumptions or requirements.
  • Describe what your solution is going to accomplish.
  • Describe why your solution is the minimal solution.
  • Describe why your solution is the correct choice.
  • Describe the single problem that if solved would make your solution work correctly.

At this point, one of these choices is the path that needs to be taken:

  • Fundamental misunderstanding about existing code. You realized your initial assumptions were wrong and that your solution is the wrong path. Solution: STOP and look at reevaluate problem discarding old assumptions.
  • Missing information. The problem was not specified clearly enough. There are multiple possible problems that are being asked to be solved. Solution: get clarification. Do not proceed until the problem has been clarified.
  • Reinventing wheel. Has this problem already been solved somewhere else in the code? If so, refactor so the already created/tested solution can be used.
  • Rathole. A simple problem is becoming more and more complex. Are you fixing the problem of the problem of the problem that you started off solving? Are you making changes all over the place? If you are in a rathole, this is a sure sign of Doing-the-wrong-thing. Solution: STOP!
  • I don’t know If you don’t know, then you need to ask for help. Solution: Swallow pride and ask for help.

Code Review #5 – splurge on reporting configuration errors

Tuesday, October 30th, 2007

Configuration problems when deploying a new build are high on the “high anxiety” list. The pressure is high to hurry up and get the build deployed. Often times, the deploy happens late at night when the deployer really just wants to go to bed.

The number one way a developer can help out in this situation is very good error messages. If an exception occurs as a result of a misconfiguration: be verbose. Be overly verbose. The sanity you save may be your own.

This error message does not cut it:

Caused by: java.io.IOException: Couldn’t initialize working directory.
at com.amplafi.core.iomanagement.FtpManagerImpl.initializeWorkingDirectory(FtpManagerImpl.java:83)
at com.amplafi.core.iomanagement.FtpManagerImpl.initializeService(FtpManagerImpl.java:50)
… 16 more
Caused by: java.io.IOException: Couldn’t create directory: ftp-working
at com.amplafi.core.iomanagement.FtpManagerImpl.initializeDirectory(FtpManagerImpl.java:61)
at com.amplafi.core.iomanagement.FtpManagerImpl.initializeWorkingDirectory(FtpManagerImpl.java:81)

The error message should have at the minimum contained the following pieces of information:

  • The directory name as supplied in the configuration
  • The full absolute path to that directory (not the same thing as the configuration name if the name is a relative, not absolute path)
  • The reason for the problem (was the parent directory not writable, something already there, disk full, etc.)
  • A suggestion on how to correct the problem

Code Review #4: Always read the documentation/code – a.k.a. java.net.URL is evil

Sunday, September 9th, 2007

The Setup
Before I plunge into my rant, lets review a little-ole documentation. Under java.lang.Object, for equals() we have this:

It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.

Over in java.util.Collection land, there is this little bit of documentation, highlighting the importance of overriding the default implementations of equals() and hashCode() if one wants to play nice in a Collection:

Many methods in Collections Framework interfaces are defined in terms of the equals method. This specification should not be construed to imply that invoking Collection.contains with a non-null argument o will cause o.equals(e) to be invoked for any element e. Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two elements.

So for instances of a class to function well as a key in a java.util.Map or be placed into a java.util.Set, the class should override the standard equals() and hashCode() provided by Object. Because of this universal need, it is a reasonable expectation that implementors of equals() and hashCode() pay attention to performance. For example, the programmers who wrote java.lang.String did a fairly simple implementation of hashCode() and then cached the result so the calculation was not repeated for a given String object.

The resulting ubiquity of overridden equals() and hashCode() results in a certain set of expectations:

  1. if the object doesn’t change the results of equals()/hashCode() shouldn’t change (otherwise Set collections will break)
  2. equals()/hashCode() should be fast and in the case of hashCode – for potentially expensive hashCode() operations the results should be cached.
  3. Immutables (such as URL) should be good candidates to be keys in a java.util.Map

The Takedown
So I was completely blindside by the brain-dead, high school, freshman implementation of java.net.URL.

The only thing I will say positive about the java.net.URL implementation equals()/hashCode() is this: buried deep, deep in the documentation, the high school students who wrote the implementation do casually tell you that they are going to screw you over.

In java.net.URL.equals() (only if you go look at the detailed documentation) do you see this:

Compares this URL for equality with another object.

If the given object is not a URL then this method immediately returns false.

Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.

Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; (emphasis mine) else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.

Since hosts comparison requires name resolution (emphasis mine), this operation is a blocking operation.

Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP. (emphasis mine)

Translation: We are going to screw you and you will enjoy it.

The Scream
How does this screw you?

  1. Two fundamental operations equals()/hashCode() are now ridiculously expensive since they involve a DNS lookup to see if they resolve to the same ip address. So now an operation that is optimize in other classes is outrageously expensive (milliseconds long) for a very fundamental internet class.
  2. URL looks to be an immutable. Immutable objects should mean that the identity: x.equals(x) is true. But this isn’t the case for URL. If you serialize a URL, the resolved hostaddress (it’s transient) is lost. So serialize/deserialize a URL. Wait a little bit. Add a little bit of the Dynamic DNS magic. Presto “http://google.com” will not equal the deserialized version of itself
  3. It is simple wrong. Take the case of a hosting service that hosts two different sites on the same server. The DNS lookup resolves the 2 domains to the same physical ip address. URL (with an “assist” from URLStreamHandler, InetAddress, and our very expensive DNS lookup) compares URLs based on the ip address not the entered hostname. As a result, http://my-porn-site.com could ‘equal’ http://jesus-has-saved-my-soul.org. I think someone might disagree with this assessment, don’t you?

The Takeaway
So after these very expensive operations URL.equals()/hashCode() is broken. The implementers of URL clearly decided that they were going to be ‘clever’ and for all their ‘cleverness’ the only thing they managed to do is screw users of this class. Sun should just admit the error of its ways and just reimplement URL.equals() (with hashCode() being the equivalent).

public boolean equals(Object o) {
    if( !(o instanceof URL)) {
        return false;
    } else if ( o == this ) {
        return true;
    } else {
        return ((URL)o).toString().equalsIgnoreCase(this.toString());
    }
}

The old functionality should be quietly pushed to a uselessEquals() method some place.

The Net
Performance problems and bugs can be introduced from the wildest and least expected locations. Always recheck your assumptions.

“When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.”
-Sherlock Holmes.

Code Review #3: Use the debugger to check “working” code

Friday, September 7th, 2007

There is a bug (or two) in this code:

    protected void compareStreams(InputStream is, InputStream isFromGet)
            throws IOException {
        byte[] newBuf = new byte[4096];
        byte[] oldBuf = new byte[4096];

        int len = 0;
        while ((len = is.read(newBuf)) != -1) {
            boolean flag = false;
            if (isFromGet.read(oldBuf) == len) {
                flag = true;
                newBuf.equals(oldBuf);
            }
            if (!flag) {
                throw new IOException("New file differs from old one.");
            }
        }
        isFromGet.read(newBuf);
        if (isFromGet.read(newBuf) != -1) {
            throw new IOException("New file differs from old one.");
        }
    }

Did you spot any problems? Well maybe you didn't spot any right away. But use a debugger, and the bugs would quickly become obvious.

  1. Look at :
                if (isFromGet.read(oldBuf) == len) {
                    flag = true;
                    newBuf.equals(oldBuf);
                }
    

    The results check newBuf.equals(oldBuf) is completely ignored. So what happens if 2 streams of the same length but with different contents are compared? This method says they are the same!

  2. But even if the check was flag = newBuf.equals(oldBuf), it would still be wrong! Using an IDE (or a debugger) quickly discovers that newBuf.equals():

        public boolean equals(Object obj) {
    	return (this == obj);
        }
    

    This is not the same as comparing the contents of the arrays. The correct call is Arrays.equals(newBuf, oldBuf). And of course after each read the arrays in question should be 0-filled to avoid issues with partially buffers.

  3. Next, what happens if the streams are read at different rates? newBuf and oldBuf will miscompared simply because one stream didn't have data immediately available to it.
  4. Finally,

            isFromGet.read(newBuf);
            if (isFromGet.read(newBuf) != -1) {
                throw new IOException("New file differs from old one.");
            }
    

    What is that first line doing in there!? This allows 'isFromGet' stream to be up to 4096 bytes larger than the 'is' stream!

Yecks! So few lines so many major bugs.

Do a line-by-line code review with a debugger, and discover hidden bugs such as these in "perfect" code.

But finally, haven gotten this far... use the (debugged) work of others...

org.apache.commons.io.IOUtils.contentEquals(InputStream, InputStream) is good and correct. Use that.

Code Review #2: “It ain’t stored until the database says it is”

Monday, September 3rd, 2007

This was a recent comment on a bug report:

EventImpl stores start and end dates as two strings (milliseconds presentation of date). Not clearly understand how information can be lost.

What is wrong with this statement?

Well nothing is “stored” until there is a persistent representation of the data that will outlast the program being stopped. Period.

Putting some value in an instance’s field does not qualify. Unless and until some storage mechanism (database, serialization to a file, etc.) puts it some place permanent – it is not “stored”.

The developer can only say the value is being stored if:

  1. there is an active transaction;
  2. the object being modified is mapped to the database tables.
  3. the field being modified is mapped to some column within that database
  4. the specific object instance being modified is actually participating in the active transaction. There could be an active transaction that this object is not part of!
  5. the transaction commits without errors

For the particular bug report, neither the field nor the object was persisted by the database. Therefore nothing was stored.