Sunday, November 02, 2008

The status of Chromium on Linux

When Chromium was first announced at the beginning of September I was very surprised that it was a Windows-only application, given that WebKit is very much cross platform. For the past few weeks I have been spending a little bit of time here and there hacking on the source code and thought I would write an update for those who are interested in the status of the native port of Chromium on Linux.

On day one you could check out the source code on Linux and you could build some things. Of course all that you were building were some object files, nothing more; not even WebKit was being built. There was no test application, no linking, and no Chrome.

From what I can tell nearly all of the development for Chrome was done on Windows in Visual Studio. There is even a C# tool that can be found in the sources. This led to a situation where the normal course of action when something didn't build on Linux was to just disable it. So by the time the release was made nearly nothing was being built. I am also pretty sure that the Chromium port was entirely different than the Android port.

Starting with the glue directory I went file by file fixing the compiler errors. Developed on Windows, there were fun fixes such as the Windows "String.h" include that was not used, but caused build breakage on non-Windows platforms. Many patches later a lot more now builds on Linux. Linux is part of the build farm, so as each file was fixed and enabled it became one more file that Windows developers could not break or their change would be reverted. The Chromium developers had also been syncing up with WebKit trunk, and so the Chromium fork of WebKit also built. Attempting to link a test application we found only 500 missing symbols. Tracking these down one by one and either porting, fixing, or adding a stub, the test application finally linked about a week ago. It can't do anything and will crash, but that is where we are. (Update: as Evan pointed out, as of Friday it can render google.com.)

Beyond a WebKit test application, Chrome on Linux is still far away from happening. There are still a large number of basic components such as fonts, clipboard, plugins, and basic widget handling that have to be implemented. The test application is only a test for WebKit. In the next month or so you should be able to see WebKit rendering, and at some point next year a beta of Chrome on Linux will be running, but it won't be tomorrow or next week.

Sharing some of the same code, OS X is in the same boat, though a little better off because it can re-use the OS X WebKit code; it can currently render pages, but can't do things like click on links.

Things are not as bad as they sound. The Chromium team is a good team that is moving forward quickly. As they integrate more with upstream WebKit, more of the Linux port will work. They are also re-writing chunks of their code to be more cross platform friendly. And now that OS X is kind of working and Linux links, there is a lot more interest in putting developer time into the Linux port to get things done.

An important lesson that Google should take away from this is the importance of developing cross platform software on all platforms at the same time. Assigning just one guy to the Linux port six months ago would have significantly improved the source on day 1. Having the build bot catch errors early, such as using a Windows-only function rather than a C function, pays off in developing cross platform code. The sad thing is that any company that does cross platform development could have told them this without them having to learn it the hard way.

If you are interested in helping with Chromium and want to hack in git rather than on svn, an official mirror that is updated every other minute is now on github:
http://github.com/chromium/chromium/tree/master
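For anyone who just wants a local checkout to poke at, something along these lines should work. This is only a sketch; it assumes GitHub's usual read-only git protocol URL for the mirror, not official instructions:

# Clone the read-only GitHub mirror rather than using svn.
git clone git://github.com/chromium/chromium.git
cd chromium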

I have written up getting started instructions and created a script to upload (and update) patches from git here:
http://github.com/icefox/chromium_tools/tree/master

Thanks to the guys on #chromium-dev who are very patient and helpful.

Friday, October 17, 2008

Git Hooks

Git hooks are scripts that are run by Git before or after certain commands. Because the hooks are run locally and not on the server, you have a lot of freedom to write more in-depth scripts. The scripts live in the .git/hooks directory, and Git comes with some sample scripts that can be enabled by removing the .sample suffix and making them executable.
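For example, enabling the bundled sample pre-commit hook is just a rename and a chmod (paths assume a standard clone):

# Enable the sample pre-commit hook that ships with Git.
mv .git/hooks/pre-commit.sample .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit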

A nice example is the included pre-commit script which checks for trailing whitespace in the patch and exits with 1 if any are found. If a script exits with non-zero then the command is aborted.

Below is a simple, but very useful script. All it does is run make in your current directory if a Makefile exists. If make fails then it returns non-zero, preventing the commit from occurring. While this script is crude and verbose, it works in preventing commits that break the build. Compared to a server side script which has to build the entire project from scratch on a potentially slow machine, there should be nothing to build (you did run make before committing, right?) so it should return very quickly. With this one script you can prevent most build errors from getting pushed up to the central repository and causing problems for other developers. One-liner "fixes" that break the build can be a thing of the past.

pre-commit_make

#!/bin/sh
# Pre-commit hook: refuse the commit if the project no longer builds.
echo "--Attempting to build--"
if [ -f Makefile ] ; then
    make --quiet
    if [ $? != 0 ] ; then
        echo "--Build failure--"
        exit 1
    fi
fi
echo "--Build passed--"
And really you can simplify it down to just this:
#!/bin/sh
[ ! -f Makefile ] || make


Files in the .git/hooks directory are not part of the repository and so they are not tracked. A workaround is to have a git_hooks directory at the top of your repository, as is done in Arora, and symlink .git/hooks to git_hooks whenever you clone. This way the hooks will be part of the project, under revision control, and accessible to everyone.
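A rough sketch of that setup after cloning, assuming the tracked hooks live in a git_hooks directory at the top of the repository:

# Replace the default hooks directory with a symlink to the tracked one.
rm -rf .git/hooks
ln -s ../git_hooks .git/hooks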

When using git commit if you want to skip the hooks you can pass the --no-verify option.

Although many hooks you see are written in shell script, they can be in any language and don't even need to be scripts at all, just an executable in the hooks directory.

An important thing to remember when writing hooks is that the patch and the list of patched files are available. A script that validates XML shouldn't do anything if the change doesn't modify XML files. 'git diff-index --name-only HEAD' is one simple way to get the list of all the changed files. Using this you can make sure that the hooks will always run quickly and only test/check what is needed.
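As a sketch of that idea, here is what a pre-commit hook that only validates changed XML files might look like. It assumes xmllint (from libxml2) is installed, and adds --cached so only the staged files are checked:

#!/bin/sh
# Validate only the XML files that are part of this commit.
status=0
for file in $(git diff-index --cached --name-only HEAD) ; do
    case "$file" in
        *.xml)
            if ! xmllint --noout "$file" ; then
                echo "XML validation failed: $file"
                status=1
            fi
            ;;
    esac
done
exit $status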

There are a number of different hooks that git provides access to. For a complete list and details about what each of them is for, check out the Git hooks documentation.

Here are some hooks that I have either thought about doing, have already done, or found via Google:

pre-commit
- Check that the project compiles
- Check for compiler warnings
- Check that the auto tests for the files that are being modified still pass.
- Check that the performance tests for the files that are being modified still pass.
- Check that the code in the patch conforms to a code style
- Validate XML files
- Spell check user strings, documentation, etc
- Check for binary incompatibility
- Try to build a release package
- Check that any public API is documented and has no errors
- Verify any new files have a copyright header
- Check that all public strings are properly wrapped so they can be translated
- Check for calls to functions that are known to be deprecated or inefficient.
- Check for swear words
- Reject commits made between 4am and 7am with a note to go to bed.
- Automatically remove whitespace
- Run RSpec tests before allowing a commit
- Some rake testing

commit-msg
- Spell check the commit message
- Verify an agreed upon commit message format is used.
- Prevent business announcements or key words from leaking out in a message.

pre-rebase
- Useful for preventing an accidental rebase from occurring on a master branch that everyone shares.

post-commit
- Send out an e-mail to the mailing list
- Build a package
- Build/Update API documentation for the website
- Update/close bugs in a bug tracking system
- Spawn off builds on other operating systems

post-receive
- Update tickets in Lighthouse

I have mostly played around with the user side where each commit is one patch and life is pretty simple, but on the server a push can contain a number of commits and the scripts are only called once. Check out the docs and the scripts in the git source code contrib/hooks directory for more information on how to extract each commit.
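As a minimal sketch of the server side, a post-receive hook reads one "old new refname" line per updated ref on stdin and can walk the pushed commits itself (this ignores details like brand new refs, where the old revision is all zeros):

#!/bin/sh
# Walk every commit contained in a push, one updated ref at a time.
while read oldrev newrev refname ; do
    for commit in $(git rev-list "$oldrev..$newrev") ; do
        # Do whatever per-commit work is needed, e.g. mail the diff.
        git show --stat "$commit"
    done
done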

Taking advantage of the hooks system one can easily add checks for all sorts of things that can be automated, resulting in source that always builds and is of higher quality, more consistent, and has fewer embarrassing errors. The ease with which anyone can add hooks will hopefully cause more and more projects to utilize this built-in part of git.

If you have written an interesting hook feel free to post a link to it in the comments for others to check out.

Hook photo taken by *L*u*z*a*

Tuesday, October 07, 2008

LSDVD

LSDVD was the name of a successful project to create a fully functional software DVD player for Linux back in 1999.

At the start of my freshman year at RIT in 1999 a guy by the name of Gad gave a presentation about the DVD format and his project to create a software DVD player for Linux. This was all back before DeCSS and nearly every DVD-related tool that exists today, and really before Linux was used by many as a desktop machine. I didn't even own a DVD reader for my computer, but it sounded like an exciting project so I picked up a 2X DVD drive for $109 and ordered my first DVD online.

I got involved in the project by working on the graphical front-end for the application. Over time I worked on many areas of the project, but I was in charge of the front-end from then on. We worked on and off on the project while going to school and through the winter, but the project was going very slowly. RIT runs on the quarter system, so in the spring we took time off and rented a house north of RIT in Greece so we could work on the project full time. Day in and day out we worked on the project. Moving into the house with only a few hundred dollars to my name, I slept on a couch in the basement and ate what canned food and Ramen noodles I had. We bought wood from Home Depot and built ourselves some desks for our computers and placed them in the living room. The house was pretty much empty other than a kitchen table and a mattress in one of the rooms. We would wake up, sit at the computer and code till it was time to sleep. Highlights included when someone would come over and we would have a real cooked meal (and watching Office Space).

Before DeCSS there was very little selection of DVDs to test with because we could only read DVDs that were not encrypted and were region free. This turned out to be mostly old gore flicks. It was a nice welcome when we could finally access our DVD collection to test against. The other two guys in the group were Paul and Dave. Dave was a crazy hacker who was in charge of the video decoding. Paul was in charge of the project. He set us up with the CVS server, managed the build script, all the small things that a project needs to survive, and of course huge chunks of the DVD player itself.

When we first got the MPEG2 decoder written it was still slow, so it would queue twenty seconds of frames and then beep several times before playing them. We had a computer in the corner of the room playing Enemy of the State. Every five minutes it would beep and we would all run over to watch the video play. It took at least a day or so to get through the movie.

At each stage there was a fun magical moment as it came together for the first time. Getting audio working for the first time. Getting the audio and video sync working, subtitles, menus, and more. And of course we had to do testing too :)

Gad was looking for a company to support us and luckily one decided to buy us. Two months after we had moved into the house in Greece, we were acquired by a company in California. The four of us had never officially formed a company in that short time, and so in trade for the code we got a small sign-on bonus. Part of the agreement was that we had to live in California (which they paid for) and we were all excited about that. Flying out to California we spent the summer working on the DVD player. I have many good memories from that trip. We got to see a bit of California and went to a number of trade shows. I really grew as a programmer on this project, learning quite a bit from the other three, especially Paul who designed the system. It was my first real large-scale project. Returning to Rochester that fall we continued to work on the player and got it to almost release status, but alas the project was killed in April 2001 due to financial problems within the company. It was less than a year later when the company was sold. The only place the application exists now is a small server sitting somewhere in the new company with our CVS code repository on it.

LSDVD included the following features:
* Almost perfect (within 1 frame) audio/video synchronization.
* Ran smoothly on a Celeron300a.
* Full menu support including subpictures.
* Graphical frontend.
* Didn't need to mount the media before playing.
* Worked on SMP machines.
* Fully supported AC3/PCM audio.
* All of the features that you would expect a stand alone DVD player to have such as seamless branches.

Whenever I see how little the software DVD players on Linux can still do and how much CPU they require, I always have to smile.

Below is the one screenshot I still have of the application, which was our Easter egg and team photo.

My Kipling hacker bag

In March of 1999 the Belgian bag manufacturing company Kipling created a contest to promote their new line of "Hacker" clothes. The first 99 people to guess the login and password to their contest site got a "Hacker" bag. This was an interesting enough story to get posted on http://www.slashdot.org, which is where I saw it. I began to work on the problem and with the help of several others eventually got a bag.

The way that you found out if your login and password was correct was by a JavaScript program embedded in the web page. Having done some JavaScript the previous year I figured that it would be pretty easy to modify the page and JavaScript code to go through every possible combination until the winner was found.

Pretty soon I knew that the JavaScript engines of the day would be way too slow and a C/C++ version was needed. Later that afternoon when I got a version working I posted it on the discussion forums on Slashdot. Several other people were interested and began to help improve the application. A whole week with little sleep went by while we spent time optimizing the code, making sure the code was creating correct output, calculating how many years it would take to complete, and chatting on IRC.

By the middle of the week not much had changed. On the website they hinted that the username was 16 characters long and the password was 4 characters long. The backpack that you would win was called "Host" and so we were almost sure that was the password. We planned a couple of wild guesses that might give us the answer for the login. The first idea was that the subscript on the packs was edgyhipunique or something like that. Although it was 16 characters, it was not the solution. So we went further. Next we tried all combinations of the first 16 letters of this sentence. All combinations were checked, but nothing was found there either.

As we were continuing to make speed improvements a web site was set up to track which blocks people were checking so that people wouldn't check the same block twice. In all, over 500 people were helping out in checking for the answer.

In the end someone went to one of Kipling's stores and figured out that the serial number of one of the bags ("mailbomb"), appended with 001, was in fact the login.
Login: 9840112000309001
Password: host
In the end the brute force solution didn't find the login and password, but it was fun and I submitted the answer anyway. To my surprise, a few months later I got a bag in the mail. While the bag was neat, it was the thrill of hacking on the code that I enjoyed much more.

I found the source tarball on a backup CD and have put it up online if anyone cares to look. It includes some binaries which probably don't work anymore and some messing around with the JavaScript too. Looking at the code I am horrified at it, but I was very happy that when I ran "./a.out -login 9840112000309001" it was quickly able to find the winning combination.

Saturday, September 27, 2008

Javascript speed, the browser wars, and the death of IE6

Over the past year the Firefox, WebKit, and now Chrome teams have been going back and forth with faster and faster Javascript engines. Shortly after each update a number of blogs run all the engines through the benchmarks. When Internet Explorer is included it is always in last place. Not just by a little bit, but by a significant amount. It is so slow that it is often just left out of the graph altogether. The IE team is working on it and the latest beta of IE 8 is three times faster than IE7, but it is still three times slower than Firefox 3.0.1. And since that article, Firefox and everyone else have gotten even faster engines, not just by a little, but by a significant amount. This isn't about IE8 or IE7, but about IE6 and its market share. IE6 currently has around 35% market share, which is a huge number of users to fight for. The Javascript war is all about making those AJAX and Javascript-heavy applications run faster. With developers using fast computers with Firefox, one can see what is going to happen when someone on an 800MHz computer and IE6 tries to load the site. Already I have heard of developers migrating users off IE to Firefox because it would take the user several minutes just to load their Javascript-heavy site. In the next year as developers take advantage of the new speed this huge swath of users will find themselves feeling more and more pain. Developers might try speeding up a thing or two, but the speed difference between the engines will only allow a developer to do so much and many developers will simply ignore the problem. Sure users can still *use* the site if they are willing to wait, but eventually they will hear about how their friend doesn't have the same problem with Firefox/Safari/Chrome/Arora and upgrade. While tabs and add-ons have caused a downswing in IE's market share, it might be Javascript performance that will be the tipping point for the end of IE6. When Firefox got tabs you could still browse the web with IE6, but when Firefox gets a much faster Javascript engine you might not be able to browse the web with IE6.

Monday, September 15, 2008

Adobe owns the web and they don't even know it.

For the last ten months I have been hacking on a little cross-platform WebKit-based browser called Arora. Users try it out and are mostly happy, but nearly all of them end up asking me the same thing:
How do I get flash to work?
Arora uses Qt, and the current version (4.4) does not ship with support for Netscape plugins. After the release of Qt 4.4 this feature was added to QtWebKit and will be part of Qt 4.5 this winter. To get it today all you have to do is compile Arora against WebKit trunk. Every few days someone pops into the Arora IRC channel asking how they can get plugins. They say "plugins", but they really mean flash, and by flash they really mean video. Every day users are willing to download a development version of WebKit, build it, and then build Arora just so they can get their YouTube fix. Users! These are people who are normally not willing to do anything, yet they are willing to go through all that pain on their own. Big red alarms should be going off at the W3C that a binary plugin is required for browsing these days. Arora has pretty clear documentation on how to get plugins working so I can only imagine how many users went through the hassle and didn't have to ask for a pointer to the instructions. There was even a developer who, after putting Arora on an ARM device, asked how to get Flash working. After discovering that he could not just copy the flash libraries to his device I got the distinct impression that no matter how good Arora was, if he couldn't get flash it wouldn't be good enough. The majority of flash on the web for me seems to be 1) videos 2) ads 3) movie websites. I even had a user tell me that Arora was fantastic and that he would use it every day, but only once the Qt snapshot had flash. In the last three years flash has gone from just another plugin to the only browser plugin that matters. I myself never installed flash on Linux until two years ago. This is of course all because of video and specifically YouTube.

But here is where I am puzzled. Adobe is acting like they deserve to be the standard rather than doing what they should be doing, which is scrambling to make sure they keep their foothold in the browser. Today flash is installed on nearly every browser, but "almost all" should not be good enough. They should have binaries for every platform under the sun, a QA team like no other, and paid developers on every open source browser to make sure the integration is perfect. Adobe should make sure that there is no geek out there who is fed up with flash and willing to dedicate their time to hacking on video in HTML 5.

Flash as a product is pretty bad. It frequently causes the browser to crash, and Konqueror (and now Chrome) even runs flash in its own process so that when it crashes it doesn't take down your whole browser. Flash ads frequently cause your CPU to spike and cause your page loading to freeze. As a user experience they are pretty bad to boot, and until recently Google couldn't even index them. Adobe regularly makes releases that break browsers and only releases binaries for an insanely small set of platforms. Want to run 64-bit Linux, or any BSD? No flash for you.

So what might the future hold? It was pretty clear from the Chrome presentation and literature that Google dislikes flash's presence in their browser. It cuts through their security model and in general doesn't play well with the goals of their project. But, just like me, they probably found that their initial user base didn't care about their design goals, but just wanted to see videos of ninja cats. Video through HTML5 is real and coming. It will be in Firefox and WebKit browsers very soon. Up until now I hadn't thought much about this because of the YouTube problem. Even if IE supported HTML5 video, why would YouTube go through the hassle of upgrading? With Google's Chrome there is finally a very compelling reason for Google to apply internal pressure to get support for HTML5 video in YouTube. Once YouTube can stream those videos then Chrome could ship with plugins off by default (or disabled until you click on them in some fancy Javascript way), creating a more stable, faster, and more secure browsing experience. And what about those flash ads? Google uses text ads so they would be hurting competitors to boot. Once YouTube switches, the other video sites will follow suit and copy this feature, crushing flash's primary reason for existing. Of course only time will tell what really happens with YouTube.

Adobe has bet big on Flash and I mean huge. Six years ago they were a desktop company producing desktop software. But the past few years they have changed. They bought Macromedia and along with flash got their board of directors. For them flash, flex and the web is the future. They want to be the Microsoft of the web. They are directly competing with every cross platform toolkit out there in an attempt to create the new way that desktop and web software is created. They are riding the wave of flash being installed and enabled on your browser, and if that starts to change they will not be happy. If the golden goose (video) stops laying eggs before some killer flex apps are released there will be problems. Who knows how unhappy they were when the iPhone was released without flash and a dedicated YouTube application sat on the main screen. Perhaps they wanted too much money? There was a video on YouTube a short while back where one of the Adobe sales guys was giving a presentation at Google about flex. There were around ten people in the audience and near the end of the presentation he started saying
Just between you and me Adobe plans to ... platform .. web .. embedded .. flex
My jaw hit the floor when he did that and he was no doubt reprimanded if not fired when he returned to Adobe for the plans he spilled*.

No doubt Adobe will try to fight, maybe create their own YouTube clone, maybe fight with patents to keep HTML5 video off the web. Already they have announced that they will be removing the license fees on the next major releases of Adobe Flash Player for devices. You can read it for yourself in the Adobe quarterly report where they expect to get more money from "an increased demand for tooling products, server technologies, hosted services and applications". Now what happens if people disable flash, developers don't use it, and device manufacturers no longer want it? If YouTube even hints at HTML5 video I would short their stock.

*This has happened several times on Google tech talks that I have seen, it is amazing what people will say when they think they are only speaking to ten people and not the web.

Thursday, September 11, 2008

Usable Linux on the laptop?

Four years ago I picked up a used G4 Apple laptop as a second machine. I kept OS X on it and happily used it. The reason I picked up an Apple was that for years I had tried to use Linux on a laptop, but never been able to very well. When I got the Apple I discovered some 'amazing' properties that it had. At any point in time I could close the lid and it would go to sleep. If the battery got too low it would suspend to disk. Open up the lid and it would wake up, and in just moments you are back working. Not only that, but it would automatically find wifi and connect to it even if you were not in the same location. So what if the thread performance of the OS X kernel wasn't that fantastic, it worked really well with the hardware to give me, the user, a fantastic experience and I still got all my unix tools. Two years ago I picked up a MacBook and to add to the fun I can stretch it to five hours on battery if I want to. Every once in a while someone would ask me why I don't run Linux and after telling them the above they try to convince me that it is much better now. I shrug my shoulders and say I am happy so I see no compelling reason to switch to Linux on my laptop (my desktop is Linux).

At my new job I got a laptop and was excited to find out I was getting a Lenovo, one of the laptops that had support for Linux, formerly part of IBM and a good rugged laptop to boot. Putting the latest Ubuntu on there I figured was the perfect combination. Ubuntu had been talking about putting themselves on laptops so I assumed that it would all work. After having it for a week here are my notes:

100 Minutes - Not sure if this is Lenovo's fault or Linux's, but I get 100 minutes off the battery with light usage. Compile WebKit or something big and continuous and the battery is shot. You plug in your laptop when it has 20% left, and usually it starts out at 80% just from being carried to work in my bag. So if I am real lucky I get less than fifty minutes before it starts warning me to plug in the laptop. You could say this is really a portable workstation, but my MacBook also has a dual core and 2GB of RAM and it does much better. I am leaning toward blaming Linux because I couldn't believe a laptop manufacturer would actually release a non-gamer laptop with only 1 1/2 hours of battery life.

Maybe I have become spoiled, but when I close the lid I expect the laptop to go to sleep. Closing the lid on the Lenovo doesn't do anything other than turn off the screen. The first day I put it in my backpack only to find my backpack getting *very* hot by the time I got to MIT, and to top it off the computer ran out of power before the next day, losing my state. With the screen off and the graphics hardware asleep I can't imagine what is causing the heat buildup with the desktop idle.

Suspend or hibernate. Ask anyone on the street to tell you the difference and they will tell you to go away because they don't know. Laptops shouldn't bother asking you. By default they should sleep, and suspend to disk only when the battery is nearly gone.

In the event that the laptop sleeps for a very long time I would expect that the shiny Lenovo would suspend to disk when the battery is very low to prevent me from losing my state, but nope, Linux is happy to lose it.

When I do get it to suspend I would expect that the resume would be snappy. The first few times I assumed it wasn't working and hard rebooted. Turns out it takes nearly ten seconds to resume under Linux. Resume from sleep on OS X is near instant, maybe half a second.

On the plus side wifi in Linux now works...

Wednesday, August 20, 2008

Pick the transformer

I finally got around to making a fun little website where you are shown two classic Transformer toys from the 80's and you have to pick which one you get to keep and which one ends up getting traded away. The site is called Pick the transformer. It keeps track of the votes and is able to rank all of the toys and show some basic stats on the votes. It is fun to play with and easy to find yourself clicking and clicking.

Back in 2000 a site called amihotornot.com (now hotornot.com) was launched. The site showed the viewer a photo of a girl and you got to rank her 1-10. While not a perfect system it was simple enough to get its internet moment of fame. A little while later another site called pickthehottie.com was put up that showed two images and you had to select which girl you found more attractive. Girls submitted their own image because they wanted to see where they rank and guys, well they want to look at girls. The sites have expanded since then to have all sorts of options and extras, but at the core it is just trying to rank photos. The algorithms behind the two sites are very different and I believe that pickthehottie's is much better than hotornot.com's. On hotornot you are forced to rank a photo from 1-10 with no guidelines as to how to make judgments. While it is simple enough to say that photo X is near the top if most votes put it at a 10, you can't know with very good accuracy where the photo ranks in relation to the other photos that are mostly 10 (or another way, who is the hottest?). This is further problematic when you consider that most photos will cluster around several numbers such as 1, 5, 7, and 10. To rank anything from 1-10 you need to first get a mental baseline of what the two extremes of 1 and 10 are. The first few images you have to vote on are very speculative and it isn't until you have seen a number of images that you can consistently rank them. But even once you are consistent you must constantly try to compare the current image with all previous images you have seen, which can cause a 7 to become a 9 or a 5 over time as you get more and more input or just change your opinion. Contrasting this with pickthehottie I think that many of those problems go away. Given the choice between two images users only have to pick the one that they like more. This is usually so easy that users don't even think about why they think one photo is hotter, they just know it. It is only when faced with the choice between two very similar photos that the user would even have to make a hard choice. Behind the scenes every photo contains two lists of photos that are hotter or cooler than the current image. Just based upon the percentage of the two votes you could generate a ranking for each photo. Taking it one step further you can fine-tune the ranking by only counting the votes that were placed when comparing two photos of similar percentages. This was a fun little two day project that came together very quickly and is fun to use.

Friday, August 15, 2008

Three years in Europe with Trolltech

After three years at Trolltech today was a sad day as it was my last day. When my wife and I originally decided to move to Norway we wanted to stay for three years and it is satisfying that we were able to reach that goal. The time here has been wonderful, sometimes stressful, but something I will always remember. I thought it would be worth going back and putting together a journal entry of projects I was involved in and photos I took.

In January 2005 my wife and I visited Oslo for an interview at Trolltech. The day before my interview my wife and I walked to the address to make sure I could find the building. There we discovered that Trolltech was located in a somewhat rundown building, but the next day when I went inside I found it to be much nicer and more importantly filled with interesting people working on captivating projects. I took the job and six months later we moved to Oslo. It turned out we arrived in Oslo on the day of the big Qt 4.0.0 party; unfortunately I didn't know about this and we stayed at the hotel. There were around thirty developers, and Trolltech was still small enough that the computer at your desk had an actual internet IP. Being new to Trolltech and Qt4 I figured I would be assigned a minor task in some rarely used bit of code, but after getting set up I was asked to add a feature to QWidget and my first commit was adding support for freedesktop.org startup notifications. After using Qt for eight years, the idea of adding code, a feature no less, to QWidget was amazing.


The first three months we stayed at the Trolltech Markvien apartment which is just a ten minute walk from the office. When we flew to Norway we came with just four suitcases. There was no internet at the apartment and, as my blogs from back then can attest, I took advantage of the Trolltech library and read through a number of the books. Every weekend for the first two months Jen and I went to one of the tourist attractions around Oslo because we knew that if we didn't do them then we wouldn't at all. We went to the ski jump, the art museum, the viking museum, the fjord, the statue park, and many more places. Moving to Oslo we sold both our cars and were now walking everywhere. Along with visiting attractions we frequently would just go off walking around Oslo to see what we could find. There is a path next to a river that flows through Oslo with waterfalls that we would walk down to get to the center of town. It was a very different experience than we were used to back in the states, but the biggest shocker has to be women on bikes. It is socially acceptable for everyday adults to ride bikes, even a woman in a skirt, which was something you would never see in the states.

On the first day of work I asked Marius (another developer at Trolltech) where the grocery store was and he told me it was across the street on the way to the apartment. I was very skeptical of this as I had walked to work that morning and hadn't seen any grocery store. That evening on the way home I went into the place where he said it was to discover a store about the size of a seven eleven. This, I was very quickly to discover, was the average size of a grocery store in Oslo. It has most everything you need, just not fifty choices of everything. Rather than one massive store per town there are many little stores all over that are within walking distance. It makes a lot of sense when you think about it, but coming from Super Stop-n-Shop's it was a bit of a culture shock. Of course everything tastes different too, but no doubt after living here for so long, when I return I will fondly miss food that I could only get here.

A month after arriving I went to my first aKademy in Spain. We went for the full ten days and in typical American fashion Jen and I both brought a suitcase, while everyone else from Trolltech only had backpacks. A good lesson to learn for all of our future trips in Europe. As it was my first aKademy I finally got to meet everyone who I had interacted with online and on the mailing lists. I had a blast going to all of the presentations, discussing endless ideas and possibilities and even getting in a bit of hacking. And after many years I finally got a stuffed Konqi which has sat happily in my office. We made grand plans thinking KDE4 could be out in less than a year, but little did we realize how wrong we would be. For me, post aKademy I spent a bunch of time cleaning up kdelibs and in particular reducing KApplication to the point where I was able to run a KDE application that used QApplication.

Just a few months later on December 19th, Qt 4.1.0 was released. The biggest addition for me was qtestlib. The super cool little test framework that had been part of Qt-addons was now part of Qt. During aKademy that year it was decided that including it in Qt would add a lot of value and reduce the duplicated effort that had begun within KDE to make their own test framework. I am very big on testing and ended up writing a tool to generate autotest stubs for you. All you had to do was pass in your class header and out came a file, just waiting for you to flesh out, that tests your entire class. And when you think you are done writing tests you run it through valgrind and feed the callgrind file into the new code coverage tool I wrote to discover just how much of your class you didn't test. And for anyone making a custom QAbstractItemModel I wrote the Modeltest to help you make your model better and more bug free. One thing I always wanted to see was releasing the Qt autotests with Qt, which might (unless it doesn't) happen later this year.

During this time Trolltech was growing and in December 2005 we moved to a brand new building with plenty of empty space for us to grow. Marius and I traded a view of a waterfall for a view of a hill, both better than any office view I have had in the states. Over the next year the development team at Trolltech doubled as we expanded. The new office was big and much more in line with Trolltech becoming a company rather than a group of hackers, but that didn't stop us from having a paper airplane contest off our 6th floor walkway.

In June of 2006 Trolltech sponsored a cabin trip for the KDE developers to hack on KDE4. Developers from all over the world came to a little town in Norway and spent a week hacking in a very big cabin. Since the last aKademy KDE4 had deteriorated and was in need of some work. The first two days were spent fixing up KDE left and right, and it was there that I was finally able to see and run KDE4 for the first time. It was also there that QtWebKit was born. It was amazing to watch as the guys discussed the future of KHTML, grabbed the WebKit source and hacked on it until at 2am on Zack's screen there was the Google homepage. One of those moments you can't plan for or buy tickets to. Meanwhile I was fixing something in the KDE about dialog; depressed at the comparison, I went to bed. While we were there the world cup was going on, Trolltech IPO'd, and the various outdoor meetings we had were just fantastic. For me personally this cabin trip ranks at the top of my entire time in Europe.

A few short months later Qt 4.2.0 was released on October 10th 2006. That date had been set in stone in time for Dev Days, and after the fact everyone agreed that it was pushed out too early, a mistake we have tried not to repeat since, pushing back releases by a few weeks when we knew that the quality would be significantly better. Unsurprisingly Qt 4.2.1 followed very shortly after. At some point after 4.1.0 (I don't recall when exactly) Mathias gave one of his fantastic speeches which determined one of the big directions of 4.2: better integration with the desktops. For my part I wrote QDesktopServices, but there were many more such as QCleanlooksStyle, QNetworkInterface, QSystemTrayIcon, QKeySequence's standard shortcuts, integration with Glib, and more. Another new class I got involved with (and wrote a few older versions of) was QTimeLine, something I have had lots of fun using ever since it was created. 4.2 also saw the introduction of the eye candy changelog.

Around this time I got involved with the Netflix prize and ended up making a framework that used Qt and included a little app that you could use to view the entire database. While making the tool I improved Qt so that QTableView could display all 101 million rows of data. I didn't win the prize, but I had a lot of fun, learned a lot, and if I ever get access to a farm of computers I know exactly what I am going to mess around with. And to top it off, every few months in the mail I get a thank-you book or DVD from my Amazon wishlist which is very cool.

For the first time since moving to Oslo, Jen and I returned to the states for Christmas. After living in Norway for a year and a half we could both speak very bad Norwegian. With no real Norwegians around we happily talked to each other using our horrible pronunciations and bad sentence structure. We could understand each other and we were happy. While in the states I got a MacBook and have had it automatically take two or three photos every day for the past two years. Looking through the photos there are many photos of people, books, and other images of everyday life that I would normally not have taken photos of. It is nice to have captured memories that would otherwise have been lost.

Returning to Oslo I formed a group and wrote up a proposal to get a Wii for Trolltech. A month or so later a box arrived with a Wii and extra controllers which was promptly set up (the Wii being somewhat easier to obtain in Europe, we only had to wait a month). We have had tennis tournaments and many after-lunch Bomberman93 (an absolutely fantastic five person, ten minute, easy to learn, multiplayer game) sessions ever since.

Looking back I was surprised to discover that I have been using Git for over a year and a half now. One of the first things I used it for was to port an annoyingly addictive Qt 2.x game called Anigma to Qt 4.x. The big thing in my life was that I cut my hair which had been very long. The first few weeks were very freaky and I kept reaching back to move it out of the way or to play with my phantom hair.

On my creative Fridays I had been hacking on a little class called QColumnView, a nice little class that I am proud of that got into Qt 4.3, which was released June 1st 2007. Also in 4.3 I wrote a new QFileDialog. The majority of the work was in the model, but I also tried out a new look and feel for the dialog that was a hybrid of OS X's and Vista's with animations and other 'improvements', which everyone completely hated and which got torn to shreds online, so I reverted the interface back to the 4.2 one. The biggest thing for me in the 4.3 release was QtScript, the kick ass, fast scripting engine that I had been messing around with for several months and used to play around with genetic algorithms. I even wrote a QScript profiling tool. Overall Qt 4.3 was a good solid release with more improvements than features, I would say.

In the fall of 2007 I finally sat down and started to learn Lisp, something I had always wanted to do but just never made the time for. Unfortunately a much more exciting project was just around the corner that took my free time away...

Shortly before Christmas, while at lunch one day I had a conversation with Simon and Lars about the QtWebKit module which had just been merged into Qt. I had been asking if there were plans to make a little browser that could be used to help test the module. In WebKit there is the QtLauncher, but it was more for developers and not something that can be used for day-to-day browsing. They were also interested in such an application and I offered to help make one. All through Christmas I hacked, and my last blog entry of 2007 gave a hint about what I was working on and what would become the demo browser. Starting with the cookie jar I took each feature one by one, fleshing them out and only integrating them when there was a manual test and matching autotests. I spent all of my free time hacking on the browser and then part of my time at work tracking down and fixing bugs in QtWebKit and the new networking code. I was ruthless with any segfault I found and by the time 4.4.0 was released QtWebKit was pretty stable. One of the very first QtWebKit bugs I fixed was adding support for tooltips so xkcd's tooltips would work. Unlike Firefox 2, which would cut off long tooltips, QtWebKit tooltips could word wrap. Just like my first commit to QWidget in Qt, the feeling of committing a patch to WebKit was really cool. In February Trolltech had its annual ski trip in the mountains. Before we left I printed out all of the code for the demo browser and with the help of other developers did a complete code review on the bus and in the cabins. This helped improve the code for the announcement of the little project. One day I was asked if I would be interested in writing a QtWebKit article for Linux Journal. Having never been published in a magazine before I jumped at the opportunity and a few months later got three Linux Journal magazines in the mail; it was really neat to see an article I wrote in print. Shortly before Qt 4.4.0 was released I forked the demo browser and created Arora. Although many people had been interested in seeing it forked much sooner, I didn't want to do that until 4.4.0 was released so I could get as much feedback as possible from everyone trying out the Qt 4.4.0 beta.

Around this time it was announced that Nokia was going to buy Trolltech. For me the biggest bummer had to be when I saw them taking down the Trolltech sign off the building the night before we became Nokia. Overall Nokia has not caused any major changes of doom as some feared, and I expect Qt will continue to improve, grow and be even more open in the future. Trolltech has a lot of smart people in it and they won't let anything happen to Qt.

On May 2nd Qt 4.4.0 was released. This was while I was in Italy, and as Qt got lots of press the demo browser also got mentioned in various places (with lots of questions ending up in my inbox), but I was unaware of all of this until I got back as I didn't think the 4.4.0 packages were going to be released until the end of the month. I had a few stress filled days catching up before things returned to normal. Other than the demo browser I had also gotten in QSystemSemaphore, QSharedMemory, QLocalServer & QLocalSocket, and the QFileSystemModel. But my favorite addition has to be QFile::map, something I have implemented several times on personal projects. Qt 4.4.0 also had a number of brand new features and I think that this release was the best .0 that Trolltech has had to date. One feature of Qt 4.4.0 that I think is a hidden gem is QtConcurrent. I have used QtConcurrent in some personal projects since it was first announced and the ease with which it lets me use both of my cores to double my speed still amazes me. With the possibility of having six and eight cores very soon the speedup in performance will not be insignificant.

I am leaving before 4.5 is released, but I have been able to squeeze in some stuff including the new QTabBar and QTabWidget improvements and implementing a cache for the http backend in QNetworkAccessManager. I am looking forward to the 4.5 release.

Around this time Jen and I knew that we would be returning to the states and so we made trips to Italy and the camping trip to Trolltunga. While living in Europe I got the opportunity to visit Spain, Germany, France, Ireland, Scotland, England, and Italy, and Jen visited even more. Because they were spaced out we took the time in each place rather than rushing from one to the next during a month-long Europe vacation. I have tried to limit my blog to technical articles, while Jen's blog contains many more of our everyday adventures from visiting cities to laughing about the food we would find.

While visiting Europe can be a lot of fun, the act of living here has opened my eyes to all sorts of things. The idea that we could get by without a car is one we could not imagine three years ago and yet somehow it works out just fine. With so many people from work walking to get there, it is no surprise that on an evening stroll or walk downtown you will run into someone you know. The closeness of the community is very nice. Many of the guys at Trolltech have stories from when they were in the Army; unlike in the states where you choose to sign up, here they don't really have a choice, and although I knew that some countries did that, it didn't really hit me what that means until talking to many of them.
Another difference was the banking system, which in Norway is far ahead of the states. No one has a checkbook and my bank didn't even accept checks, all your bills are paid online, and if you want to send money to a friend it is easy to do online. Even taxes are calculated for you by the government and if there are no errors you can send an SMS to confirm the form and you are done.
The metric system, 24 hour clocks, A4 paper, currencies, and other things you don't think about when visiting, but you have to deal with when moving were just a few other things we had to adjust to. At least with the 24 hour clock I plan on continuing to use that and getting a 24 hour alarm clock in the states so I will never set the alarm for 7pm when I meant 7am. Overall living in Europe has been a fantastic once in a lifetime experience and I am glad that I did it.

While I was at Trolltech they managed to put out consistent releases with new features and improvements. From the time I started to my last day they strove to be open: from making the Windows version of Qt GPL, to the launch of labs.trolltech.com, creative Fridays, and involvement in many open source projects such as WebKit, Git and KDE, they have raised the bar for what a company can do. But none of this would have been possible without the people behind the scenes. Above everything else, coming to Europe and Trolltech has been about the people I have met and worked with. Trolltech is filled with some of the smartest and most interesting folks I have met, who enjoy what they do and it shows. They are friendly, welcome you into their home and will help you out when you are completely confused.

Even though I am leaving Trolltech and returning to the U.S. I plan on continuing to use Qt, and probably submit a patch or two. Arora has really taken off and has an active set of users and developers and I am really looking forward to what it will become. Hopefully I will return to Europe to attend aKademy and get to see everyone again.

And to wrap it all up I thought it would be fitting to have the Qt4 dance.

Thursday, August 07, 2008

Code Review Should Be Free

Depending on the definition "code review" can mean a wide variety of things such as formal code review or automated code analysis. For this article I am talking about when a developer has a patch to some code base and would like someone else to review it before it goes into the main repository.

Just the thought of having a code review will frequently cause developers to spend extra time making sure a patch is as good as it can be. Code reviews typically catch bugs, design flaws, missed edge cases, inconsistent or confusing code style, and more.

There are several core ideas for code reviews; the more of them that are followed, the more successful the technique will be.
- Review of the code by one or more persons before committing it to the central repository.
- Record comments about a patch so suggestions are not forgotten as a patch changes.
- Patches should be easy to review and test.
- The developer shouldn't feel like the code review is holding them back or that they are waiting for a review before they can continue working.
- Reviews should integrate easily with the existing work flow.

With only these five requirements it shouldn't be too hard to do. So what are the most common ways that developers do code reviews?

Email pass-around

The most common form of code review I have seen is the email mailing-list of commits. It is so common that even when you find any other technique you will often still find this one running too. This is mostly because it is "the example" of how to add a hook to your revision control system and is usually extremely easy to set up, if not done for free. It works by having a hook in the primary code repository that sends an e-mail to a mailing-list with the diff of every commit. This of course fails the first and most important requirement, which is to get the review before committing. The idea is that everyone reads the commits they are interested in and replies to the mailing-list when they spot something wrong, which another commit then fixes. In practice most commits are not reviewed and most people read only a few random commits before marking them all as read. This of course only gets worse as the project grows and the volume of mail on this mailing-list increases to the point where no one reviews. To top it off, when you see some code that works but has a few confusing variable names you won't say anything. In fact it takes quite a bit for anyone to actually hit reply and comment on a patch: code that doesn't compile, a possible security bug, the introduction of bad public API, or some other major and obvious bug. And this works both ways; the committers know that they can get away with it so they don't make their commits as polished as they could.

Cons: By the time the e-mail goes out to the mailing list the patch is already committed into the repository. Once the code is in the repository it is much harder to ask someone to make smaller fixes and it can even be taken as an insult. In practice there is an extremely low volume of actual reviews that are done.

Pros: Having nothing to do with code review, this method is often best for occasionally letting you see what is going on elsewhere in the project. In fact it has spawned a whole sub-world that digests commits and writes up reports, such as the KDE commit digest.

Over the shoulder

Less common in the open source world (and in the corporate world) is the in person code review. This is where you have someone physically next to you while you explain the patch. While better it has its own drawbacks. When you go over to do a code review for someone else they have usually been sitting around for five minutes waiting for a code review and quickly walk you through the code. Because you are face to face, you can often ask any question and quickly determine if the developer had explored all the options you can think of. You can ask if the unit tests have run, if there is an example, etc. At the end of the day the results vary wildly with some reviews being in depth while others being just a quick glance. When this is the primary code review method and the person that would normally review some code is not available (vacation, sick, out of office) the commit might go in without any review.

Pros: Face to face reviews can be very quick and can quickly turn into pair programming. The process of explaining the problem and patch can often cause the developer to think of edge cases themselves to test. Unrelated to the review itself, this is also one way to teach someone about a piece of code.

Cons: Many times the reviewer can feel rushed as they try to digest code that the developer quickly explained. The developer tells the reviewer the patch is correct even before the reviewer has reviewed the code. The developer who is best suited to review the patch might not be around, and a developer who does not know the code and is not as able to properly review it is drafted into doing it.

E-Mail diff

Bits of all of the above: a diff is passed around through e-mail, which can spawn a thread of comments. The biggest downside is that there is a higher cost to entry and people are less likely to send smaller or simpler patches via e-mail. Often the only patches that are sent via e-mail are larger ones that go through a number of revisions, and yet the code being e-mailed around isn't in a revision control system, losing out on the ability to see the diff between versions. Because the developer resends the entire patch every time they make a change, the more revisions that are sent in a thread the less actual reviewing will be done.

PasteBin

In the past few years I have seen the rise of paste websites such as http://paste.lisp.org/ that provide a simple way to paste some text. Usually no login is required and it takes only seconds to do. Developers will generate a diff and paste it onto these sites and then ask for a review over IRC or another chat system. There are even command line tools to make pasting code straight onto the website even easier, such as 'svn diff | lpaste'. Pasting, while cheap and easy, has drawbacks such as requiring the other person to be there right then, and the paste is often missing full context such as what was just above the diff. Because pasting is usually done in conjunction with chat, the paste sites don't provide a way to add comments to a patch.

Pros: Very easy to do. Can be done with someone who is remote. As it is just text it is not constrained by any system or process. Sometimes also used as a quick way to send patches or other files rather than e-mail. Because you are also chatting via IRC you know the patch is being reviewed right there and then, versus e-mail where it could take a week.

Cons: The patch is frequently only read and almost never applied to a working repository to check that it even compiles. It is just another tool with absolutely no integration with the other tools you use, such as the source repository. Feedback is always given through another channel such as IRC. It requires live interaction between both parties, and it requires manually pushing the diff up to the website in the first place to get a review. Many paste sites expire pastes after a month, and sometimes developers set the expiration to one day, causing frustration the next day when the reviewer tries to access it.

WebTool

In the past year or two several code review web tools have popped up. http://www.review-board.org/ and Rietveld are the ones that have gotten some blog press. These tools all try to improve upon the paste idea by managing patches: you can add new patches, request reviews from developers, add comments to patches, and update the patch. When I first heard of them I thought it was fantastic and played around with one, but ended up not using it for very long. Fundamentally it requires a lot of work on the part of the developer. For every patch there is a lot of manual work to be done. Even when the patch is approved you still have to submit it and then mark it as committed in the tool. Not to mention you need yet another place to watch for updates to your patches and for patches you need to review.

When testing out the systems I only ever used one patch, but in production I had many patches, some of them on the same file. This caused problems as patches were approved, other patches got stale, and the tool would get confused. Fundamentally a review system needs the foundations of a revision control system to work properly, and it needs to be closely tied into the revision control system that is used. Most of the review sites out there are little more than paste projects that got scaled up, and they will let you down. To further illustrate this: you can patch your patches. This causes the review system to keep what is effectively a branch on top of the current tree that you can view. It eventually breaks by not behaving the way your real revision control system does. The most common case I saw was where I wrote patch A, someone else submitted B, and then I updated A and uploaded it again; Rietveld would freak out and break. And because the review systems are so big they eventually go down for a day here or there. When all of your patches are up on the system and it goes down, you can't do anything either. Just as pasting got popular because it was so very easy, a web tool probably won't get very popular because it is just too much work.

Are there any better options?

So after that whirlwind tour of what is currently in use out there, I want to present something I have been using for the past few months that takes advantage of distributed revision control and a little-known feature of GitHub.

One behavior I have witnessed is that once a developer pastes a patch they won't do anything until they get a review. They won't commit until they get a review, and they don't want to work on the next task until they commit the first. Nice catch-22. Using a distributed revision control system such as Git (or any system where branches are cheap), a developer can commit to a branch, post the patch for review, and continue hacking in another branch, giving the reviewer(s) as much time as they want to review the code. Patches sitting in their own branch can be worked on and revised for as long as needed without the pressure to merge them 'right now' that often appears when a system does not use branches.

So just switching to Git solves one annoying code review problem, but can it do more? Several months ago I started using GitHub for hosting Arora. Every developer has a fork of the main repository (including me). When a branch (often just one change) is ready it is pushed up to GitHub and a pull request is made. When I get a pull request I can view the patch, which is the same as the e-mail and pastebin approaches, but because the patch is in the source repository I can do all of the following very cheaply:

- Review the patch in the context of the entire file. I can glance at just the patch, view the entire file, its history, other branches, and, well, anything I want because I have the full repository right there.
- Review the changelog entry to make sure it is following the project format, spelling, etc.
- Check out the code and try it myself. The number of times I have actually downloaded and applied a patch from a paste site is very small, maybe twice a year. But with Arora, almost every time I have fetched the repository, built the branch, played with the new feature or bug fix, run the auto tests, etc. It is *very* cheap and easy. When I get a pull request from user foo, I run git fetch foo to grab their changes, and because the patch is in its own branch, compiling and running it is as easy as checking out that branch. No patching required.

Another nice aspect is that when viewing the change on GitHub you can click on any line number and add a comment right there. Add as many comments as you want to the different issues you spot. When the developer commits the next version you don't have to review the whole patch again; you only have to review the patch to the patch to make sure that what you commented on was fixed. Because the branch is a real Git branch and not some files on a website review tool, it handles patches to patches and rebases against upstream just fine and never has issues.

Unlike Review Board there is no extra work involved to review code. I get code review features for free on top of my normal everyday committing and pushing. When I get a pull request that is perfectly fine I can just merge it and reply back with a "Thanks!". But if I want to go in depth with a review, leave comments, and have them upload a new version, I can. If I want multiple reviews of one of my changes I can easily get that with multiple pull requests (with a comment saying it is just for review, not merging). To get reviews of my own code I already push up my branches, so it is nothing more than going to GitHub and clicking the pull request button.

Reviewing with GitHub isn't perfect. I would like to have an RSS feed of all the comments I have left and another of all the comments others have left on my repository. Currently they are simply placed in your news feed. But what is in place today works well enough and is, in my opinion, far beyond the other code review options.

With distributed revision control systems and sites like GitHub or Gitorious, code review is such a simple add-on to already proven workflows that it's essentially free.

Photo by Sebastian Bergmann under the CC license.

Wednesday, July 30, 2008

An adventure to Trolltunga

A little over a year ago on reddit I saw a picture of a guy sitting on the edge of a rock. Not just any rock, but a sliver of rock that juts out of the vertical side of a mountain with a 350 meter drop. In the comments I discovered that the photo was of Trolltunga, which is right nearby in Odda, Norway. Along with some other trolls from Trolltech we rented a car and drove to Odda for a little camping trip. The drive took around five hours, but we were in for a treat as the two-lane highway through Norway is filled with fantastic scenery. The highway went through many of the mountains and there was even a corkscrew tunnel. About half way through the trip you come out of a tunnel up at the top of the mountains, dotted with snow. We stopped there, had a snack, and took photos before continuing on.





To go with the mountains were numerous waterfalls.



Once we reached Odda we started our journey at the funicular railway, which climbs 960 meters up the side of the mountain at a 42 degree slope. According to our information the train was not running this summer, so we would have to take a very steep path up the side of the mountain with full backpacks of camping gear. This exhausting start to our trip took a little over two hours.





But it was rewarding to reach the top and after a short rest ...



We continued up the path for another hour and finally made camp at a small clearing by a stream. While making dinner we had an absolutely fantastic view of the sunset over the snow-topped mountains.



Being so far north, the sun set for only a few hours, and in the morning after packing up camp we started off on the trip to Trolltunga. On the way:

We walked through snow ...



We saw crystal clear ponds formed from melting snow ...



We passed streams and waterfalls of every shape and size ...



and walked along the edge of the fjord.




After hiking through the mountains for about four hours we arrived at Trolltunga!







After taking many pictures and having lunch we headed back the way we came. Along the way we met someone who told us that the railway was in fact running, but only twice a day, at ten and six. We hurried back and four hours later arrived just in time for a ride down the insanely steep mountain. Riding in the front of the car was like a roller coaster ride where you go off the edge, except it took ten minutes to get down.



In all we walked around twelve kilometers over the two days. Originally this trip was all about seeing one little site, but it turned into a full adventure with all sorts of things to see at every step of the way. If you ever get the opportunity to go to Trolltunga I would highly recommend it; it is without a doubt one of the highlights of my time in Europe.

Update: After submitting this to reddit and getting a bunch of votes I got a reddit header.

Saturday, July 12, 2008

Puzzles, Algorithms, and Qt::Concurrent

ITA Software, a very cool Lisp company, has for years put programming puzzles up on their website. Although the puzzles are for hiring, they are still interesting and every once in a while I take a stab at one of them. One that recently caught my eye was this Sling Blade Runner puzzle. Although it is not in the archive of retired puzzles, the underlying problem is NP-hard, so I am not worried about showing off my answer as everyone's solution will no doubt be different. I wrote several different solutions, including a genetic algorithm that was turning out fantastic results but taking days to run, versus the solution I present here which takes only minutes. On to the actual problem:
"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"

Use the following listing of movie titles: MOVIES.LST. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.

The problem description itself is a little vague, leading to different goals depending upon how you read it. I chose to count the length by the number of movie titles used rather than words or characters.

The first part of the problem is just reading in the data (the data file has its own issues for you to solve). Just from googling you will find several blogs where the discussion is mostly about reading in the data and running a depth-first search. This finds chain lengths of around 220 very quickly, and while nice, it doesn't attack the real algorithmic problem.

Once you read in the data, at the core you end up with a list of nodes. Each node contains a list of nodes that it points to (and optionally a list of nodes that point to it). In other words, your challenge, if you accept it, is to find the longest path in a directed graph.
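
As a rough illustration (the names here are my own simplified sketch, not necessarily what is in the repository), the data boils down to something like this:

#include <QList>
#include <QString>

// Simplified sketch of the graph: one node per movie title, with an edge to
// every title whose beginning overlaps this title's end.
struct Node {
    QString title;
    QList<Node*> edges;     // titles this title can chain into
};

typedef QList<Node*> Chain; // a path through the graph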

My solution is built upon two algorithms. The first algorithm takes a starting node and produces a list of long paths. It won't guarantee the return of the longest path for a source node, but it will return very long paths in a very short amount of time. The pseudo-code version of the first algorithm (called findLongChain in the code):

set1 is populated with a source node
while (set1 is not empty)
    for each node in set1 all paths are followed.
        If the path to the next node is longer than the
        currently known length to that node, it is
        stored and that node is added to set2.
    set1 = set2
Using just this algorithm a respectable result can be found.
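
For the curious, here is a rough C++ sketch of that loop, assuming the simplified Node and Chain types from above. It only returns the single longest chain it finds; the real findLongChain in the repository differs in its details and also keeps the other long paths around for the second pass.

#include <QHash>
#include <QSet>

// Keeps, for every reachable node, the longest chain found so far that ends
// at that node, and repeatedly relaxes the outgoing edges.
Chain findLongChain(Node *source)
{
    QHash<Node*, Chain> best;       // longest known chain ending at each node
    QSet<Node*> set1;
    set1.insert(source);
    best[source] = Chain() << source;

    Chain longest = best[source];
    while (!set1.isEmpty()) {
        QSet<Node*> set2;
        foreach (Node *node, set1) {
            Chain chain = best[node];
            foreach (Node *next, node->edges) {
                if (chain.contains(next))               // a title may only be used once
                    continue;
                if (chain.count() + 1 > best[next].count()) {
                    best[next] = Chain(chain) << next;  // remember the longer chain
                    set2.insert(next);
                    if (best[next].count() > longest.count())
                        longest = best[next];
                }
            }
        }
        set1 = set2;
    }
    return longest;
}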

These long paths can be sent through a second algorithm which attempts to increase a chain's length by finding locally longer paths using a limited-depth search. The pseudo-code version of the second algorithm (called expandChain in the code):

for (i = 0; i < chain.length; ++i)
    Starting at i, do a depth-first search with a max depth
    of X, where X is a relatively small value, to find a longer
    path than the current chain[i] -> chain[i + X]. If a longer
    path or subpath is found, replace it in the chain.
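
And a rough C++ sketch of that idea, again using the simplified types from above and glossing over the details of the real implementation:

// Depth-limited search for the longest path from 'from' to 'target' that
// avoids the nodes in 'used'; returns an empty chain if no path is found.
Chain longestDetour(Node *from, Node *target, int depth, QSet<Node*> used)
{
    if (from == target)
        return Chain() << from;
    if (depth == 0)
        return Chain();
    used.insert(from);
    Chain best;
    foreach (Node *next, from->edges) {
        if (used.contains(next) && next != target)
            continue;
        Chain rest = longestDetour(next, target, depth - 1, used);
        if (!rest.isEmpty() && rest.count() + 1 > best.count())
            best = Chain() << from << rest;
    }
    return best;
}

Chain expandChain(Chain chain, int X)
{
    for (int i = 0; i + X < chain.count(); ++i) {
        // Titles outside the segment being replaced stay off limits.
        QSet<Node*> used = QSet<Node*>::fromList(chain);
        for (int j = i; j <= i + X; ++j)
            used.remove(chain.at(j));
        // Search a little deeper than the segment so a longer detour can fit.
        Chain detour = longestDetour(chain.at(i), chain.at(i + X), X + 3, used);
        if (detour.count() > X + 1)                      // found something longer
            chain = chain.mid(0, i) + detour + chain.mid(i + X + 1);
    }
    return chain;
}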

Once the graph is constructed, each node is passed to findLongChain, which runs its best chain through expandChain before returning it. After every node has been run through findLongChain, the root node of the best chain is run through findLongChain one more time, but this time all of the best paths are run through expandChain to see if there is anything longer to find.

Because findLongChain is completely stand-alone I was able to take code that was originally something like this:

Chain longChain;
for (int i = 0; i < todo.count(); ++i) {
    Chain chain = findLongChain(todo[i]);
    if (chain.count() > longChain.count())
        longChain = chain;
}

and change it to use QtConcurrent's mappedReduced function to take full advantage of my dual and quad core computers. The ease with which QtConcurrent lets you scale your solution is fantastic.

Chain longChain = QtConcurrent::mappedReduced(todo, findLongChain, reduce);
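
The reduce function referenced there is tiny; it just keeps the longest chain seen so far. A sketch of it, following the shape QtConcurrent::mappedReduced expects for its reduce step:

// Reduce step: QtConcurrent calls this with the running result and each chain
// produced by findLongChain; we keep whichever chain is longest.
void reduce(Chain &longest, const Chain &chain)
{
    if (chain.count() > longest.count())
        longest = chain;
}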

While hacking on my solution I came across the website of Jeremy Faller, who had also been looking at the problem and was able to produce good results (our approaches were completely different and we didn't see each other's code until after we had finished). I took his longest chain of 318 and ran it through my expandChain function to get a chain of length 326. A very respectable result found by the combination of our two programs.

The source code for my solution can be found on GitHub in my sling blade runner repository.

After compiling, run './sbr'; it takes around ten minutes on a quad core (subsequent runs take around five minutes). It will print out the best solution that was found, which, unless you change the code, has a length of 312. If you uncomment the DEBUG define at the top of main.cpp, the progress through the nodes will be printed, as well as the best chain as it is found.

Hacking on this problem has been a lot of fun. Because there is no one right solution I was free to try out all sorts of different approaches and spend time reading up on papers on graphs (the ones I didn't have to pay for). I also spent some time messing around with QtConcurrent and seeing how fast I could get it to run. While working on the problem it was fun and motivating to be able to pass the longest chains everyone else had found (the ones I found via Google). The best score found by my code is 313, but I am more happy about the combined length of 326.
