Code analysis tools are good at highlighting code defects and technical debt, but it is when the issues are presented to the developer that determines how effective the tool will be at making the code better. Tools that only generate nightly reports will be orders of magnitude less effective than tools that inform developers of errors before a change is put into the repository.
A few weeks ago I played with a code analysis tool that generates a website showing the errors it found in a codebase. Like most reporting tools, this one was made to run as a nightly cron job to generate its reports. Reflecting on my career, I have never seen tools of this type produce more than a small improvement in a project. After introduction there are a few developers who strive to keep the area they maintain clean, and even smaller pockets of developers who use the tools to raise the quality of their code to the next level, but they are the exception and not the norm.
A scenario I have seen several times over my career is a project that had tools to automatically run unit tests at night. With this in place you would expect failures to be fixed the next day, but often I saw the failures continue for weeks or months and only get fixed right before a release. Once the commit is in the repository the developer moves on to another task and considers it done. You could almost call it a law: before a developer gets a commit into the repository they are willing to move the moon to make the patch right, but after it is in the repository the patch will have to destroy the moon before they will think about revisiting it, and even then they will ask if you want to fix it so they don't have to. This means that code analysis reporting tools are able to make only a small impact, nowhere near the desired result.
After pondering why the reporting tools do so poorly and how they could be improved to make a bigger impact, I finally figured out what was really nagging at me: these tools were created because our existing processes are failing. If we could catch the issues sooner it would both be cheaper to fix them and eliminate a whole class of wasted time. While you could think about new developer training, better code reviews, mentoring, etc., all of which can be improved, a simpler solution would be to move the tools' abilities closer to the time when the change is made.
In 2007 I started a project that included local commit hooks with Git. Anytime I had something that could be automated, it was added as a hook. When you modified the file foo.cpp it would run foo's unit tests, code style checking, project building, XML validation and more. This idea was wildly successful, and there were only a few times (~six?) in the lifetime of the project that the main branch failed to build on one of the three OSes or had failing unit tests. More importantly, the quality of the code was kept extremely high throughout the project's lifetime. In the much larger WebKit project, when you put up a patch for review on the project's Bugzilla, a bot would automatically grab the patch and run a gauntlet of tests against it, adding a comment to the patch when it was done. Often it was done before the human reviewer even had a chance to look at the patch. These bots would catch the same technical debt problems as the report tools, but because the results were presented at the time of review they would be cleaned up right then and there, when it was cheap and easy to do. Automatically reviewing patches after they are made but before they go into the main repository is a very successful way to prevent problems from ever appearing in the code base.
But why stop at commit time? Many editors have built-in warnings, from code style to verification that the code parses. A lot has been written about LLVM's improved compiler warnings, and even John Carmack has written about how powerful turning on /analyze is for providing static code analysis at compile time. Much more could be done in this area to find and present issues to the developer as soon as they create them, or even in real time.
Code analysis reporting tools will always be useful because they can provide a view into legacy code, but for new code, projects that report errors before commit time with hooks, bots, and editor integration will be able to actually prevent technical debt and do more for quality than nightly reports ever could.
Meta Magic
Benjamin Meyer's software development blog
Wednesday, April 03, 2013
Wednesday, August 22, 2012
The minimal amount of data needed to lock in users
I recently upgraded to OS X Mountain Lion only to find that RSS support wasn't just moved out of Mail, but out of Safari too. RSS bookmarks were the only reason I was still using Safari on a daily basis so this removal is forcing me to migrate them somewhere else and in the process stopping my daily usage of Safari.
Stepping back, I realized how crazy it was that I was using Safari to read RSS. For the last five years I have been working on WebKit and browsers, and three of those years (until RIM legal killed it) were spent making my very own browser called Arora. And yet through all those years I still kept using Safari because the switching cost of the RSS feeds was "too high" (I had a Mac around with Safari, so why not just keep using it...). I even started hacking on a desktop RSS reader at one point. RSS feeds are not locked into Safari: the Export Bookmarks action is right there in the File menu*, and Safari doesn't keep feed data for more than about a month, so it wasn't even the RSS history I cared about, just the URLs.
Here is a case of the bare minimum of data lock-in, and yet it was able to keep a user who writes browsers (including RSS feed plugins) and has used a different OS as his primary desktop for years. In the past when I thought about data lock-in I thought about databases, custom scripts, and iCloud, but with this I realize that the bar is much lower. It wasn't until they forcefully took away the feature that I sat up in a daze wondering what I was doing and went looking for an alternative, and in the process I am going to abandon the application entirely.
Now imagine you are a Windows user and suddenly all your apps don't work on the new Metro ARM-based laptops. It is probably the needed kick in the pants to sit up and go check out what those Macs, web applications, and iPads are all about. Scary stuff for Microsoft.
* You would think that, with Safari RSS users suddenly losing their RSS feeds, apps like NetNewsWire would provide a bookmarks import, but oddly they don't (as of yesterday, when I checked the current version).
Wednesday, May 23, 2012
When publishing onto two platforms one will end up being the "lesser" of the two.
When a company produces a product for multiple platforms, invariably one of the platforms is the primary platform. This can take on a number of forms, such as:
- Releasing to one platform first.
- Releasing updates only to one platform.
- Releasing a reduced feature set for the later platforms.
- Releasing a product for a later platform that, while it works, doesn't fit in or follow that platform's UI guidelines.
- The primary platform is stable while the secondary ones have bugs/crash.
Some big examples:
- Video drivers: Windows XP vs. Linux
- Flash: Windows vs. Linux
- Video games: PS3/360 vs. the Wii
- Mobile apps: iPhone vs. Android
- Git: Linux vs. Windows
- Books: physical vs. ebook
- DVDs: U.S. vs. Australia
There are many reasons why this happens: management believes that the primary platform will make more money, the company (or its developers) has more experience with the primary platform, or even something as silly as the CEO getting the primary platform for Christmas and mandating that it be the primary. The secondary platforms are seen as nice to have and a possible extra source of revenue, but it would be foolish to think that they will have the same quality or features as the primary platform.
If the company has any hard times they will kill the secondary platform first. If the product is ever killed it will almost always first be killed on the secondary platforms.
This is often a frustrating thing for the consumer as they typically can't do much about it, but at least realizing that you are on a secondary platform can help you schedule extra testing time and lower your expectations about what you will get.
The one nice thing is that there is an opportunity once you realize that product X is the future, that its primary platform is not your platform of choice, and that you believe your platform is the future. The Wii can't run 360 games, but it does have a set of games that take advantage of its hardware and can't run on the 360. Ebooks coming from publishers won't replace traditional books, but a company that creates a reading product that targets tablets first and physical books second will come to dominate ebooks.
Look around at the tools and products that you use. What is their primary platform? Is that your platform?
Tuesday, April 10, 2012
Patches with more than one fix will no longer be tolerated
Patches with more than one fix in them are a bad development practice and need to be stopped.
I was recently looking at a Git commit in an open source project. I was pretty sure part of the commit was wrong, but it was doing a bunch of other stuff which might have negated the issue I spotted. I wasted fifteen minutes before I realized that the other stuff was actually a separate fix and the patch contained two different fixes. The fixes touched the same code chunk and the commit message made it sound like both changes were caused by the same bug, but that was not the case. I am pretty sure that fix number one is wrong, but because it is all munged in with fix number two I can't prove it to the original authors without investing significantly more time. The commit could have been small and easy to talk about in a bug report, but now it will require several hours with multiple patches before it will be resolved.
I have seen this problem before and even given five minute lightning talks about the evil of multiple fixes in one commit, but it is clearly time to put the reasoning down in writing to help stop this behavior on a larger scale.
What is so wrong with a commit that fixes multiple different things?
- It hurts the reviewer: When reviewing a patch that contains two separate bug fixes, the reviewer has to mentally figure out which changes go with which fix. This makes reviewing harder and slower than it has to be, and mistakes are more likely to happen. The worst mistake is if the reviewer only verifies one fix and not the other.
- It hurts yourself: The first reviewer of any patch is yourself. The more complicated it is, the higher the likelihood that you will miss something and look foolish later when you have to get patch #2 reviewed.
- It hurts every future reader: In the future when someone else looks at that patch for whatever reason, they will be in the same boat as the reviewer: they have to take the extra time and effort and can make the same mistakes. Except this time they don't have you to ask questions, and they probably won't look up the bug in the bug database, so they will miss that discussion too, further increasing the likelihood that they interpret the patch incorrectly.
- It hurts the project: It is rare for a patch to be perfect the first time around, and the bigger a patch is, the higher the likelihood that it will need a few rounds of tweaking before it can go in. If the patch has two separate fixes, one fix will usually be done before the other, but by tying them together in one commit you are denying the project the finished fix until the second one is ready. This might be an hour or a day, but it could be months.
- It hurts you later if they are at all wrong: Patches are not always perfect. Sometimes they are flat out wrong and need to be reverted. With a commit that has two fixes you cannot just do a "git revert [sha]"; you have to dissect the patch and revert only half of it. Once again you have to mentally figure out which part of the patch goes with which fix. And because we are not perfect (hey, we had to revert something, remember) there is a chance that you will not revert all of the correct bits, or that you will revert something that didn't apply. Now that I think of it, I have never seen someone who split a revert test that the half they left behind still works! In the worst case, the person reverting the commit doesn't realize the commit is two fixes and reverts the whole patch assuming it was only a single fix, and no one spots it until the regression bug report from a user rolls in.
This practice is typically found in cases where both fixes are in the same file. With revision control systems like SVN or Perforce, splitting multiple fixes in a single file was a non-trivial task. Typically you grudgingly committed the files in your working tree with all of the changes. But when using Git the situation is different.
If you are working in a file and notice something else there that needs to be changed you have options:
git branch - Branching in Git is cheap. Make a branch for your existing code or commit, and start a new branch for the new fix.
But if that is too much work, you can:
git stash - git stash will store your changes in a hidden branch while you make the other fix in its own commit; then use git stash apply to restore your changes and continue your work.
But if that is still too much work, you can:
git add -p or git add -i - When it comes time to make the commits, rather than adding the whole file to be committed, git add -p will prompt you one by one through each chunk in the file to find out if it should be added to the staging area. Only add the parts relevant to one fix, commit, and then do the same for the other fix. A more detailed explanation with examples can be found on this blog entry on git add -p or in the git add man page. (Article bonus tip: git checkout -p exists and does exactly what you think it does.)
Splitting commits
If you have already committed a patch and only realize after the fact that it should have been two, Git is still there for you. If it was the last commit, you can git reset HEAD^ and then use git commit -p to add the changes as two commits. (You can even use git commit -c [sha] so you don't have to retype the half of the commit message that matters.) And if it isn't the top commit, that is okay too: you can perform an interactive rebase and mark the commit to be edited. When the rebase reaches that commit, use git reset HEAD^, add the multiple new commits, and finish the rebase. The git rebase man page has detailed instructions on how to split commits.
Why not a hook?
When writing a commit message, if you find yourself writing multiple paragraphs about different topics, listing multiple different bug IDs, using words such as "while I was in this file I ..." or even just the word "Also", it is a pretty good hint that you probably want to step back and split your commit into two. Because Git is a distributed revision control system, the infrastructure to support server-side push hooks exists on your desktop too. To use hooks to automate the process of spotting commits that should be split, you can whip up a one-liner commit-msg hook that greps for the word "Also" and outputs a warning. I have even tossed together a slightly fancier version for the git-hooks project.
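As a rough illustration of that check, here is a minimal sketch in C++ (a hook can be any executable, not just a shell script). The function names containsWord and looksLikeTwoFixes are made up for this example and are not taken from the git-hooks project. It only looks for the standalone word "also", ignoring case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive search for `word` as a whole word inside `text`.
// Hypothetical helper for this sketch.
bool containsWord(const std::string &text, const std::string &word)
{
    assert(!word.empty());
    auto lower = [](char ch) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    };
    auto isLetter = [](char ch) {
        return std::isalpha(static_cast<unsigned char>(ch)) != 0;
    };

    for (std::size_t i = 0; i + word.size() <= text.size(); ++i) {
        bool match = true;
        for (std::size_t j = 0; j < word.size(); ++j) {
            if (lower(text[i + j]) != lower(word[j])) {
                match = false;
                break;
            }
        }
        if (!match)
            continue;
        // Require non-letters on both sides so "Calsoft" does not count.
        const std::size_t end = i + word.size();
        const bool startOk = (i == 0) || !isLetter(text[i - 1]);
        const bool endOk = (end == text.size()) || !isLetter(text[end]);
        if (startOk && endOk)
            return true;
    }
    return false;
}

// True when the commit message hints that two fixes were squeezed together.
bool looksLikeTwoFixes(const std::string &commitMessage)
{
    return containsWord(commitMessage, "also");
}
```

A real commit-msg hook would read the message from the file whose path Git passes as the first argument, call a check like this, and print a warning when it returns true.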
So now you have been warned. Creating patches with multiple fixes is a bad practice that wastes time and can cause errors. And if you are asked to review a commit that should be two, reject it and have them split it. Patches with more than one fix will no longer be tolerated.
Saturday, January 21, 2012
Parallelizing sequential work in Amdahl's law
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. When a sequential fraction of the program is the act of splitting up the data, you can remove this sequential work by parallelizing the splitting: pick split points pseudo-randomly and toss out duplicates later in the join step. In this scenario you trade extra computing power for faster results.
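To make the limit concrete, here is a small sketch of the standard Amdahl's law formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of processors. The function name amdahlSpeedup is just a label for this example.

```cpp
#include <cassert>

// Predicted speedup on `processors` cores when `parallelFraction`
// (between 0 and 1) of the run time can be parallelized.
double amdahlSpeedup(double parallelFraction, int processors)
{
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(processors >= 1);
    const double sequentialFraction = 1.0 - parallelFraction;
    return 1.0 / (sequentialFraction + parallelFraction / processors);
}
```

With made-up numbers: if 10% of the run is a sequential split step, 16 processors give a speedup of 6.4. If parallelizing the split shrinks the sequential part to 1%, the same 16 processors give about 13.9, which is the kind of gain the trade above is after.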
Wednesday, November 02, 2011
Using collaborative diffusion rather than path finding for the Google AI ant challenge.
For the 2011 Google Ants AI Challenge, rather than doing the typical solution of choosing a direction for each ant based upon the shortest path to some goal, I used a diffusion-based approach which was simpler, faster to code and resulted in some nice emergent behavior with very little work.
A while back I read the paper Collaborative Diffusion: Programming Antiobjects. The 2011 Ants AI Challenge seemed like a perfect test for it. Rather than having each ant, hill, water square, etc. be its own object, the map is the only object. The map computes the diffusion values for agents (such as food and searching) on each square and then decides where each ant should go, not based upon any path finding algorithm, but simply based upon the diffusion values surrounding the ant's square.
Each square on the board has several agents or diffused values (food, explore, hill, etc). On every turn the program would loop through the map setting or diffusing all of the squares and then loop through the ants using the diffused values at the ant's square and some very basic logic to decide which way the ant should move.
When running it against the test bots, even the simplest version would show neat emergent behavior. In a maze, when two ants follow each other down a hall and reach a fork, they will naturally go separate ways. And when there are five bad guys against one good guy, the diffusion value won't say to attack until you have six guys.
The two simplest things ants want to do are get food and explore. To make the ants go after food we first have to diffuse the food value and then have the ants go to the neighboring square with the highest FOOD value.
A snippet of slightly modified code from my diffusion function, which is run on each square on the map:

    void Map::diffusion(int r, int c)
    {
        std::vector<int> goalsToDiffuse; // goal ids (element type assumed to be int)
        Square *square = &grid[r][c];

        // water blocks everything
        if (square->isWater) {
            square->agents[EXPLORE] = 0;
            square->agents[FOOD] = 0;
            return;
        }

        // FOOD
        if (square->isFood) {
            square->agents[FOOD] = INT_MAX;
        } else {
            goalsToDiffuse.push_back(FOOD);
        }

        // EXPLORE
        if (!square->isVisible) {
            square->agents[EXPLORE] = INT_MAX - ((200 - square->lastSeen) * 300);
        } else {
            goalsToDiffuse.push_back(EXPLORE);
        }

        ...
        foreach (goal, goalsToDiffuse) {
            double up = upSquare->agents[goal];
            ...
            square->agents[goal] = 0.25 * (up + down + left + right);
        }
    }

To diffuse food: when a square is water, set the food diffused value to 0 so ants never want to go into the water when looking for food; otherwise, when a square has food, set the food diffused value to a large value. This large value is the final goal for the ants. Lastly, if the square is neither water nor food, we simply set its diffused value to the average of the diffusion values of the neighboring squares.
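The three rules for food can be tried out in isolation with a sketch like the following. It is not the bot's actual code: Cell, Grid, kFoodSource and diffuseFood are names invented for this example, it tracks only the food goal, it assumes the map wraps around at the edges, and it writes each pass into a copy of the grid so the result does not depend on the order in which squares are visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-in for a map square: one goal (food) only.
struct Cell {
    bool isWater = false;
    bool isFood = false;
    double food = 0.0; // diffused food value
};

using Grid = std::vector<std::vector<Cell>>;

const double kFoodSource = 1000.0; // assumed "large value" for food squares

// One diffusion pass over a rectangular grid; returns the updated grid.
Grid diffuseFood(const Grid &grid)
{
    assert(!grid.empty() && !grid.front().empty());
    const std::size_t rows = grid.size();
    const std::size_t cols = grid.front().size();

    Grid next = grid;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            Cell &cell = next[r][c];
            if (cell.isWater) {
                cell.food = 0.0;         // water blocks everything
            } else if (cell.isFood) {
                cell.food = kFoodSource; // the goal itself
            } else {
                // Average of the four neighbors, wrapping at the edges
                // (an assumption of this sketch).
                const double up = grid[(r + rows - 1) % rows][c].food;
                const double down = grid[(r + 1) % rows][c].food;
                const double left = grid[r][(c + cols - 1) % cols].food;
                const double right = grid[r][(c + 1) % cols].food;
                cell.food = 0.25 * (up + down + left + right);
            }
        }
    }
    return next;
}
```

Calling diffuseFood once per turn lets the food value spread one square further each time, and water squares stay at 0 so the value has to flow around them.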
To diffuse exploration I fold the information about when we last saw the square into the diffusion value, so my ants always go to the part of the map that has been least explored and where they can potentially find hills. This keeps them from bouncing back and forth, so they are always moving forward.
After diffusing you want to output the moves. This is where the real logic comes into play. My first version was similar to the following pseudo code:
    foreach (ant, myants) {
        food = valueOfFoodAt(ant);
        if (food != 0)
            goal = FOOD;
        else
            goal = EXPLORE;
        outputMove(ant, goal);
    }

outputMove will choose the neighboring square (N, S, E, W) with the highest goal value. In effect the ants will move to the food/explore/hill/etc. via the shortest route.
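The "pick the best neighbor" step could look roughly like this. It is a sketch under assumptions, not the bot's outputMove: bestDirection is a made-up name, the diffused values for one goal are passed in as a plain rectangular grid of doubles, the map is assumed to wrap at the edges, and ties go to the first direction in N, S, E, W order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns 'N', 'S', 'E' or 'W': the neighbor of (r, c) with the highest value.
char bestDirection(const std::vector<std::vector<double>> &values,
                   std::size_t r, std::size_t c)
{
    assert(!values.empty() && r < values.size() && c < values[r].size());
    const std::size_t rows = values.size();
    const std::size_t cols = values[r].size();

    const double candidates[4] = {
        values[(r + rows - 1) % rows][c], // N
        values[(r + 1) % rows][c],        // S
        values[r][(c + 1) % cols],        // E
        values[r][(c + cols - 1) % cols], // W
    };
    const char names[4] = {'N', 'S', 'E', 'W'};

    int best = 0;
    for (int i = 1; i < 4; ++i) {
        if (candidates[i] > candidates[best])
            best = i;
    }
    return names[best];
}
```

Because each ant only compares four numbers, the per-ant cost is constant; all the real work happens in the diffusion pass over the map.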
The results are fun to watch. With just the above code the ants will explore maps of any shape, be it mazes or open lands, eating up food, and as more and more ants are born they will naturally spread out across the board, waiting for new food to appear and always moving to the squares that were seen least recently. Adding a HILL goal to the above diffusion would be similar to FOOD, and the logic for the ant movement is as simple as giving it a higher precedence than FOOD. And once the diffused HILL value reaches the ants, again the map doesn't matter: be it a maze or open land, the ants will march to the hill via the shortest route.
The diffusion approach to the ant problem was simple to code (no shortest path algorithms) and I was able to get the first version up and working in about an hour. The runtime is O(rows*cols) (notice it isn't dependent upon the number of ants), the CPU time is extremely small compared to other solutions, and the memory overhead is pretty much nonexistent compared to other solutions.
Given both the ease with which this is implemented and the small number of lines, it would be nice if a basic diffusion-based solution with goals for food, explore and hill were included with the stock Python bots that the aichallenge package includes. It would be both a great starting point for making more advanced diffusion bots and a stronger test bot, but one that still uses very little CPU.
Anti-objects are a neat concept that I don't hear too much about and something I wish I had explored sooner. If I were introducing someone to programming, the ants problem combined with a diffusion solution would definitely be a small-effort, huge-reward way to go. The simplicity of the algorithm combined with the low memory and CPU overhead was also an aspect that I had not looked at before and could probably be used to make some great cheap behavior for AI in games, demos and other places.
Tuesday, October 18, 2011
Qt on Blackberry
Today it was announced that Qt will be included in the BlackBerry native SDK so you can put your Qt apps on BlackBerry devices.