Sunday, December 15, 2013

Tragedy of the commons in software

Unowned resources that are shared in software seem to inevitably end up disorganized, and unfortunately there is no easy solution.

A few instances of this I have seen include:
  • Shared libraries
  • Shared revision control repositories
  • Shared databases
  • Shared folders
  • Shared log files
  • Shared settings
When different projects use the same shared resource they often have different needs, goals, rules and cadences.  The resources themselves usually don't provide a way to split things up cleanly, and one project ends up spilling over into another.  Two simple examples: in a source code repository, one group might name their branches release/minor_release while another group follows build/*; in a shared library, one project might declare static objects that eat up RAM, harming another group that is trying to reduce memory usage.
  The inevitable cleaning up of the shared resource grows into a monumental and bureaucratic task.  Even what seems like a simple question, such as who maintains or owns something, can become a large undertaking, and at least once the answer turned out to be someone no longer with the project (and, by the way, the thing was no longer used).
  Because the resource is shared by many different users it already takes up a decent amount of "stuff" (RAM, disk space, bandwidth).  There is an admin team in charge of making sure more "stuff" is added when needed, and the users inevitably take advantage of this.  In one extreme example, a user decided to check a Visual Studio install into the revision control system (deep within the source tree, too).  From their perspective they didn't feel the pain of everyone else suddenly being burdened with an additional 10GB+.
  Some projects run very lean and clean.  They have very strict rules about how things should work and be stored, but many more don't, and as time marches on and new projects are added they end up with cruft all over the place and dependencies across what should have been the project divides.  This abuse of the shared resource ends up hurting all of the projects.  Rules are put in place, and even when there is a good reason they are difficult to change.
  What seems inevitable is that slowly some projects break off and use a different resource, hopefully one that is not shared, and in the process the old shared resource gets less attention and is unlikely to ever recover as more and more of it becomes unmaintained.  Sadly it is often not one big thing, but many small problems that users put up with until one day they realize that abandoning everything will give them a significant boost.
  The only solution to this problem I am aware of is first to recognize the problem early, and second to have a steward: someone whose job it is to respond rapidly to problems and anticipate new ones before users start to leave.  The steward's job includes occasionally striking down long-held rules about the resource when they are found to be harmful, and being the one who causes all the projects pain by forcing a migration.  It is only through these actions that the shared resource can maintain its viability in the long run.

Wednesday, April 03, 2013

Code analysis reporting tools don't work

Code analysis tools are good at highlighting code defects and technical debt, but it is how and when the issues are presented to the developer that determines how effective the tool will be at making the code better.  Tools that only generate reports nightly will be orders of magnitude less effective than tools that inform developers of errors before a change is put into the repository.

A few weeks ago I played with a code analysis tool that generates a website showing errors it found in a codebase.  Like most reporting tools, this one was made to run on a nightly cron job to generate its reports.  Reflecting on my career, I have never seen tools of this type produce more than a small improvement in a project.  After introduction there are a few developers who strive to keep the areas they maintain clean, and an even smaller pocket of developers who use the tools to raise the quality of their code to the next level, but they are the exception and not the norm.  A scenario I have seen several times over my career is a project that had tools to automatically run unit tests at night.  With this in place you would expect failures to be fixed the next day, but often I saw the failures continue for weeks or months, only fixed right before a release.  Once the commit is in the repository the developer moves on to another task and considers it done.  You could almost call it a law: before a developer gets a commit into the repository they are willing to move the moon to make the patch right, but after it is in the repository the patch will have to destroy the moon before they will think about revisiting it, and even then they will ask if you want to fix it so they don't have to.  This means that code analysis reporting tools are able to make only a small impact, nowhere near the desired result.

After pondering why the reporting tools do so poorly and how they could be improved to make a bigger impact, I finally figured out what was really nagging at me: these tools were created because our existing processes are failing.  If we could catch the issues sooner it would both be cheaper to fix them and eliminate a whole class of time wasters.  While you could think about new developer training, better code reviews, mentoring, etc., all of which can be improved, a simpler solution would be to move the tools closer to the time when the change is made.

In 2007 I started a project that included local commit hooks with Git.  Any time I had something that could be automated, it was added as a hook.  When you modified file foo.cpp it would run foo's unit tests, code style checking, project building, XML validation and more.  This idea was wildly successful, and there were only a few times (~six?) in the lifetime of the project that the main branch failed to build on one of the three OSes or had failing unit tests.  More importantly, the quality of the code was kept extremely high throughout the project's lifetime.  When working in the much larger WebKit project, once you put up a patch for review on the project's Bugzilla a bot would automatically grab the patch and run a gauntlet of tests against it, adding a comment to the patch when it was done.  Often it finished before the human reviewer even had a chance to look at the patch.  These bots would catch the same technical debt problems as the reporting tools, but because the issues were presented at review time they were cleaned up right then and there, when it was cheap and easy to do.  Automatically reviewing patches after they are made, but before they go into the main repository, is a very successful way to prevent problems from ever appearing in the code base.
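A commit hook of this kind can be sketched in a few lines.  The script below is a hypothetical illustration, not the actual hooks from that project: the check commands and paths (pytest, flake8, tests/, src/) are placeholders for whatever a project's real test suite, style checker, build, and validation steps are.

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit script.  The commands below are
# placeholders; a real hook would run the project's own unit tests,
# style checking, build, XML validation, and so on.
import subprocess
import sys

CHECKS = [
    ("unit tests", [sys.executable, "-m", "pytest", "tests/"]),
    ("style check", [sys.executable, "-m", "flake8", "src/"]),
]

def run_checks(checks):
    """Run each (name, command) pair; return the names of failed checks."""
    failures = []
    for name, cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            failures.append(name)
    return failures

# Git invokes the hook as a plain executable; the usual entry point is:
#     if __name__ == "__main__":
#         failed = run_checks(CHECKS)
#         for name in failed:
#             print("pre-commit: %s failed" % name)
#         sys.exit(1 if failed else 0)   # a non-zero exit aborts the commit
```

Dropping a script like this into .git/hooks/pre-commit (and marking it executable) makes git run it before every commit and abort the commit whenever a check fails, which is exactly the "before it gets into the repository" feedback loop described above.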

But why stop at commit time?  Many editors have built-in warnings, from code style to verification of code parsing.  A lot has been written about LLVM's improved compiler warnings, and even John Carmack has written about how powerful turning on /analyze is for providing static code analysis at compile time.  Much more could be done in this area to find and present issues to developers as soon as they create them, or even in real time.
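As a toy illustration of the kind of check an editor could run on every save, the sketch below uses Python's standard-library ast module to flag bare `except:` clauses.  The specific rule is just an example I picked, not a check from any particular tool.

```python
import ast

def find_bare_excepts(source):
    """Return the line numbers of bare `except:` clauses in source.

    A bare except silently swallows every exception, including
    KeyboardInterrupt, so many style checkers flag it.
    """
    tree = ast.parse(source)
    return [node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]

# Example: the bare except sits on line 3 of this snippet.
snippet = "try:\n    risky()\nexcept:\n    pass\n"
print(find_bare_excepts(snippet))  # → [3]
```

An editor integration would run a checker like this against the buffer and underline the offending line immediately, delivering the same finding a nightly report would otherwise surface days later.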

Code analysis reporting tools will always be useful because they provide a view into legacy code, but for new code, projects using error reporting before commit time, with hooks, bots, and editor integration, will be able to actually prevent technical debt and do more for quality than nightly reports ever could.