Tuesday, April 10, 2012

Patches with more than one fix will no longer be tolerated

Patches with more than one fix in them are a bad development practice and need to be stopped.

I was recently looking at Git commit in an open source project.  I was pretty sure part of the commit was wrong, but it was doing a bunch of other stuff which might have negated the issue I spotted.  I wasted fifteen minutes before I realized that the other stuff was actually actually a separate fix and the patch contained two different fixes.  The fixes touched the same code chunk and the commit message made it sound like both changes were effected by the same bug, but that was not the case.  I am pretty sure that fix number one is wrong, but because it is all munged in with fix number two I can't prove it to the original authors without investing significantly more time.  The commit could have been small and easy to talk about in a bug report, but now it will require several hours with multiple patches before it will be resolved.

I have seen this problem before and even given five minute lightning talks about the evil of multiple fixes in one commit, but it is clearly time to put the reasoning down in writing to help stop this behavior on a larger scale.

What is so wrong with a commit that fixes multiple different things?
  • It hurts the reviewer: When reviewing a patch that contains two separate bug fixes the reviewer have to mentally figure out which changes go to which fix. This makes reviewing harder and take longer than it has to be and mistakes are more likely to happen.  The worst mistake is if the reviewer only verifies one fix and not the other.
  • It hurts yourself: the first reviewer of any patch is yourself. The more complicated it is the higher the likeness you will miss something and look foolish later when you have to get patch #2 reviewed.
  • It hurts every future readers: In the future when someone else looks at that patch for whatever reason they will be in the same boat as the reviewer and have to take the extra time and effort and can make the same mistakes.  Except this time they don't have you to ask questions and probably wont lookup the bug in the bug database and will miss that discussion too further increasing the likely hood that they interoperate the patch incorrectly.
  • It hurts the project: It is rare for a patch to be perfect the first time around and the bigger they are the higher the likeliness that they will have a few rounds of tweaking before they can go in. If the patch has two separate fixes, one fix will usually be done before the other, but by tying them together in one commit you are denying the project the finished fix until the second one is ready. This might be an hour or a day, but it could be months.
  • It hurts you later if you they are at all wrong: Patches are not always perfect. Sometimes they are flat out wrong and need to be reverted. With a commit with two fixes you can not just do a "git revert [sha]", you have to dissect the patch and revert only half of it. Once again you have to mentally figure out what part of the patch goes to which fix. And because we are not perfect (hey we had to revert something remember) there is a chance that you will revert not all of the correct bits or revert something that didn't apply.  I have never seen someone that split a revert test that the other half they left behind still works now that I think of it!  Worst case scenario the person reverting the commit doesn't realize the commit is two fixes and reverts the whole patch assuming it only was a single fix and no one spots it until the regression bug report from the user rolls in.
Disclaimer: I have seen all of the above happen with my own eyes multiple times. These are not made up scenarios, they really happened, wastes my time and decreased my productivity.

This practice is typically found in cases where both fixes are in the same file.  With revision control systems like SVN or Perforce splitting multiple fixes in a single file was a non trivial task.  Typically you grudging commit files in your working tree with all of the changes.  But when using Git the situation is different.

If you are working in a file and notice something else there that needs to be changed you have options:

git branch - Branching in Git is cheap, make a branch for your existing code or commit and start a new branch for the new fix.

But if that is too much work, you can:

git stash - Git stash will store your changes in a hidden branch while you make the other fix in its own commit and then use Git stash apply your changes and continue your work.

But if that is still too much work, you can:

git add -p or git add -i - When it comes time to make the commits rather than adding the whole file to be committed, Git add -p will prompt you one by one through each chunk in the file to find out if it should be added to the staging.  Only add the relevant parts to a fix, commit and then do the same to the other fix.  A more detailed explanation with examples can be found on this blog entry on git add -p or in the git add man page.  (article bonus tip: git checkout -p exists and does exactly what you think it does)

Splitting commits

If you already have committed a patch and only realize after the fact that it should have been two Git is still there for you. If it was the last commit you can git reset HEAD^ and then use git commit -p to add them as two commits (You can even use git commit -c [sha] so you don't have to re-type the half of the commit message that matters). And if it isn't the top commit that is okay too, you can perform an interactive rebase and mark the commit to be edited. When the rebase reaches that commit, use git reset HEAD^, add the multiple new commits, and finish the rebase.  The git rebase man page has detailed instructions on how to split commits.

Why not a hook?

When making a commit message if you find yourself writing multiple paragraphs about different topics, listing multiple different bug id's, using words such as "while I was in this file I ..." or even just the word "Also" it is a pretty good hint that you probably want to step back and split your commit into two. Because Git is a distributed revision control system the infrastructure to support server side push hooks exists on your desktop too.   To use hooks and automate the process of spotting commits that should be split of you can whip up a one liner commit-msg hook that greps for the word "Also" and outputs a warning.  I have even tossed together a slightly more fancy version for the git-hooks project.

So now you have been warned.  Creating patches with multiple fixes is a bad practice that wastes time and can cause errors.  And if you are asked to review a commit that should be two, reject it and have them split it.  Patches with more than one fix will no longer be tolerated.

Popular Posts