Thursday, October 02, 2014

So you want to build a Git server?

So I heard you are thinking about creating a Git server.  Trying to cash in on the massive rise of Git, perhaps?  Maybe it will be online only like GitHub, or perhaps it will be for customers to install like Gitorious, or perhaps it will be mobile only!  You of course have a reason why your Git server is going to be better and take the market by storm.  Having messed around with Git servers for almost seven years, I have installed many different types many times and even started working on one myself, called GitHaven, many years ago that I had to abandon.  But my loss is your gain!  Below is the essential wisdom I have gained that hopefully you can use to make your Git server better.

Beyond that one killer feature that started you down the road of making a Git server, you will need to implement a fair number of other things, which fall mostly under three categories: backend, frontend, and bonus.

The backend is the bits that accept connections for the git fetch/push commands and provide an API into the data and system.  It does authentication and authorization.  It is of course completely separate from the frontend for many good architectural reasons.  After the basics you can talk about multiple authentication schemes, hooking into LDAP, scaling SSH, and providing many different rich commands that users can use.

The frontend these days includes at least a web frontend.  At the core this is all about either browsing files from a branch/tag/etc. or viewing a log of commits.  Beyond basic viewing the sky is the limit: search, viewing the log through any number of filters such as author, diff tools, etc.  After that comes basic web administration (user accounts, user settings, SSH key management, Gravatar, password reset, etc.), and the same goes for the various repository settings.  Then you hit upon richer version control features like forking, pull requests, and comments.  Note that even forking and pull requests impose a workflow upon your customers, and the richer the tools you create, the smaller the audience that will use them, so choose wisely.

Bonus covers everything not related to Git, typically much more project oriented.  This includes stuff like wikis, issue trackers, release files, and my personal favorite, the equivalent of GitHub Pages.

I have mentioned just a few of the features that a Git server could have and there are countless more, but what is truly important for a Git server to have?  What makes a Git server?  Is it a built-in wiki?  Are customers buying your Git server because of the wiki and the fact that they get repository management as a bonus?  Okay, maybe it isn't a wiki.  What about issue trackers?  Yours is probably bolted on and nowhere near as powerful as something like Jira, so perhaps not.  Again, it might get used, but they won't be buying the product because of it.  So maybe picking on everything in Bonus is easy.

Maybe a better question to ask would be: which feature, if it were missing, would have the customer seriously consider switching to another Git server because it stopped them from doing their job?  What if you provided no web frontend at all and users could only interact with the server through API calls or SSH, like Gitolite?  Sure, many customers might be annoyed, but they could get the job done knowing they could clone, work, and share changes.  One reason the frontend is separated from the backend is that even if the frontend goes down for some reason, developers can still "do work".  So what is the backend providing that is so central to a Git server?

  Mentioned above are authentication and authorization.  If the Git server didn't have those built into its core, would that matter?  Take authentication: if a customer requires that you hook into LDAP and your server doesn't support it, they will look elsewhere.  On the web this may mean authenticating through a 3rd party.  At the bare minimum, if anyone could do anything, almost no one would want to use it.

  What about authorization?  Authorization is the ability to say yes or no to what a user wants to do.  At the bare minimum this is just a check to see whether they are authorized, and if it returns true they can do whatever they want.  The average depth of permissions most Git servers have is that each repository can have a list of users that can push to the repository.  A pretty basic safeguard is blocking force pushes.  A good example of it going missing is Stash, Atlassian's Git server, which out of the box doesn't provide you any way to prevent users from force pushing to a branch.  In the age of automated deployments an accidental push could result in downtime, maybe data loss, or worse if the wrong branch is force pushed to.  Stash leaves it to the 3rd party market (only a few clicks away) to provide a plugin which prevents force pushing (on every branch, no configuration), so it isn't as bad as it first sounds.  On the other end of the spectrum is Gitolite, which lets you create user groups, project permissions, repository permissions, branch permissions, cascading permissions, read, write, force push, create, and more.  It even lets you set permissions based upon which file you are modifying; it is very rich and powerful.  What sound like edge use cases, such as "a build server should only be allowed to create tags that start with release-*" or "the encrypted password file should only be modified by one single user", are very common.
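To make the branch-level idea concrete, here is a minimal sketch in Python of a rule check that covers the two cases above: nobody force pushes to master, and the build server may only create release-* tags.  The Rule fields, the action names, and the "first matching rule wins, otherwise deny" policy are all assumptions made up for this illustration; this is not Gitolite's configuration format or how any particular server does it.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    user: str         # a user name, or "*" for anyone (hypothetical convention)
    action: str       # e.g. "write", "create", "force-push" (hypothetical names)
    ref_pattern: str  # glob matched against the full ref name
    allow: bool


def is_authorized(rules, user, action, ref):
    """Return the verdict of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if (rule.user in (user, "*")
                and rule.action == action
                and fnmatchcase(ref, rule.ref_pattern)):
            return rule.allow
    return False


# Example rule set; order matters because the first match wins.
RULES = [
    Rule("buildbot", "create", "refs/tags/release-*", True),
    Rule("*", "force-push", "refs/heads/master", False),
    Rule("alice", "force-push", "refs/heads/*", True),
    Rule("*", "write", "refs/heads/*", True),
]

if __name__ == "__main__":
    print(is_authorized(RULES, "buildbot", "create", "refs/tags/release-1.0"))
    print(is_authorized(RULES, "alice", "force-push", "refs/heads/master"))
```

Denying by default is the design choice that matters here: a rule somebody forgot to write results in a rejected push, not an accidental force push to the branch that deploys to production.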

  Very closely related to authentication and authorization is having an audit trail.  At some point something is going to be configured incorrectly and someone is going to do something bad and the user will demand the logs so they can find out what went wrong (by whom), undo the behavior, and prevent it from happening again (such as someone force pushing to master).  If you don't have logs and can't prevent force pushing to master they might look at you funny and then start looking for a new place to host their code.
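As a rough sketch of what such a log entry could hold, the Python below builds one JSON line per ref update, recording who did what to which ref and whether it was allowed.  The field names and the one-JSON-object-per-line format are assumptions for illustration only, not the format of any existing Git server.

```python
import json
from datetime import datetime, timezone


def audit_record(user, action, repo, ref, old_sha, new_sha, allowed, when=None):
    """Build one audit log line (JSON) describing a single ref update."""
    if when is None:
        when = datetime.now(timezone.utc)
    entry = {
        "time": when.isoformat(),
        "user": user,
        "action": action,    # hypothetical action name, e.g. "force-push"
        "repo": repo,
        "ref": ref,
        "old": old_sha,      # what the ref pointed at before the update
        "new": new_sha,      # what the user asked it to point at
        "allowed": allowed,  # the authorization verdict
    }
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    print(audit_record("bob", "force-push", "web/site", "refs/heads/master",
                       "a" * 40, "b" * 40, allowed=False))
```

Keeping the old value of the ref in the record is what makes the "undo" part possible: if a bad force push did get through, the log tells you which commit master pointed at before it happened.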

  The third thing that the core of any Git server does is provide the ability for users to add new commands.  This is central to Git itself with its hooks system.  The classic server-side hook is email, where users can configure the hook to send out email to some mailing list whenever new commits are added to a branch.  Providing a way to add new hooks cleanly into the system is definitely something you want to do, but there is one hook that should be included from day one: the WebHook.  The web hook is a very simple hook that users can configure that says "when something changes in a repository, send a POST to some URL".  This URL points to something that the customer owns, so it is something they can get working in no time flat with no access to the Git server.  They don't need to learn your API, create a hook in your language of choice, or do anything else that is a hassle (such as putting in a change request with the IT admin to install a plugin).  They pull up their favorite web interface and make their own tool that runs on their box.  The best part is that because it doesn't run on the server, security isn't an issue and it can't take down the server no matter how many web hooks are enabled.
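The sending side of a web hook can be very small.  Below is a minimal sketch in Python using only the standard library; the payload fields (repository, ref, before, after) and the example URL are assumptions for illustration, not a payload format any real server promises.

```python
import json
import urllib.request


def build_webhook_request(url, repo, ref, old_sha, new_sha):
    """Build the HTTP POST that tells the customer's URL about a ref update."""
    payload = {
        "repository": repo,  # hypothetical field names
        "ref": ref,
        "before": old_sha,
        "after": new_sha,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deliver(request, timeout=5.0):
    """Send the request; report failure instead of raising into the push path."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


if __name__ == "__main__":
    # Placeholder URL; in practice this is whatever the user configured.
    req = build_webhook_request("https://example.invalid/hook", "web/site",
                                "refs/heads/master", "a" * 40, "b" * 40)
    print(deliver(req))
```

The timeout and the swallowed errors are the point of the sketch: a slow or broken receiver on the customer's side should cost the server one failed delivery and nothing more.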

So what is at the heart of a Git server?  Authentication, authorization, and extensibility.  Whatever type of product you create, you need to absolutely own those because they are central to the success of your product.  You need to own the code rather than leave it to a 3rd party, and these three should be the fundamental building blocks of the product.  Maybe the rich extension system you have planned isn't ready yet, but the WebHook should be there.  Maybe the permissions model is in place but the frontend doesn't expose it yet; a check box to block force pushing to master should still be there on day one, proving it works.  Out of the box, a lack of features in these three areas is the biggest reason customers will turn away, not because they disagree with some design choice, but because they flat out can't use your product.  On the flip side, if you fully support these three you will find customers migrating to your product.

Extra: Git users love to use hard drive space, especially enterprise customers.  They will migrate an old CVS repository into a single 10GB Git repository without thinking much about it.  The disk usage will only grow, and the system must proactively monitor and support this.  Using authorization to limit disk space, notifying users when a push isn't normal (say, someone accidentally committed the object files), and letting users see their own disk usage are one approach, but they only slow the growth.  Supporting scaling of both the number and the total size of repositories is an absolute must.  Even with a small number of users, they will find a way to use all of the available space.
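A rough sketch of the "limit and notify" part in Python is below.  The quota, the idea of comparing against a typical push size, and the factor of 20 are all made-up assumptions to show the shape of the check, not recommended values.

```python
def review_push(repo_bytes, push_bytes, quota_bytes, typical_push_bytes,
                unusual_factor=20):
    """Classify an incoming push as "reject", "warn", or "ok" by its size."""
    # Over quota: refuse the push outright.
    if repo_bytes + push_bytes > quota_bytes:
        return "reject"
    # Far larger than this repository's usual push: accept it, but tell the user.
    if typical_push_bytes > 0 and push_bytes > unusual_factor * typical_push_bytes:
        return "warn"
    return "ok"


if __name__ == "__main__":
    MB = 1024 ** 2
    GB = 1024 ** 3
    print(review_push(1 * GB, 500 * MB, 10 * GB, 1 * MB))
```

As the paragraph above says, a check like this only slows the growth; it does not replace storage that can actually scale.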
