Why squashing commits is bad
I recently had a discussion with a colleague about why squashing commits in Git pull requests, or in any source control for that matter, is bad. The discussion started with a suggestion that developers (at work) should follow a specific commit message format, so that release notes could be generated automatically from the commits in a given release. For this to produce release notes that look sensible, there have to be very few commits per feature or bug fix, which in turn means squashing developers' commits. While I don't object to a standardised commit message and automated release notes (in fact I'm a massive supporter of DevOps and build/task automation), I believe that squashing commits is a bad idea. Here's why.
Small commits, often
Developers are encouraged (and I certainly encourage other developers) to commit often, typically every 30-60 minutes if not more frequently. I know this can sometimes be difficult to stick to and can depend on the type of work, but mostly I find that 30-60 minutes isn't unreasonable. In fact in an ideal world, particularly when refactoring, this time frame should be smaller.
Smaller, more frequent commits not only encourage running tests often but also allow rolling back to a known point very easily without losing large amounts of work. They help break a problem down into multiple chunks (you can't reason about the changes in a commit if it's just a pile of unrelated changes), and they can also stop you heading off on a complete tangent.
Looking at a developer's commits also gives a good insight into the approach and the thought process of the developer at the time they wrote the code. This is incredibly useful if a bug needs to be tracked down and fixed at a later date.
History is invaluable
Now, if we squash commits so that a single commit makes up a feature, we've lost all of the information we added previously. You condense all of the information in, let's say, 100 commits into a single commit that talks about the feature and the high-level changes required to implement it. You have no insight into the approach, the thought process, the reasoning.
Source control is a tool to track changes to source code (and in this I include tests, configuration and build code), not features - that's what an issue tracker is for. So what if there are commits that have been reverted? So what if there are commits where the developer went down the wrong path and then backtracked? This all adds to the story of developing the feature and provides a tonne of information if a feature needs to be changed or debugged at a later date.
Now, I do agree with automating the release note process. This is where the issue tracker comes in. That is what it's there for: to track issues. Issues should be labelled and categorised accordingly so they can be used to form the release notes. In the BladeRunnerJS project we use the GitHub API to request all of the issues for a given milestone (a specific release version) and use the issue tags to decide whether they should form part of the release notes.
If your issue tracker doesn't have this kind of API? Get a new one! Any issue tracker worth using will have an API that can be used in this way. Whatever you do (please) don't squash commits in order to form sensible release notes from commits.
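To make the milestone-and-label idea concrete, here is a minimal sketch in Python. It is not the BladeRunnerJS tooling, just an illustration under a few assumptions: the label names `feature` and `bug`, the section headings, and both function names are hypothetical, and the fetch helper reads only the first page of results from GitHub's REST endpoint for listing repository issues.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical mapping from issue label to release-note heading.
# Issues carrying none of these labels are left out of the notes.
SECTIONS = {"feature": "Features", "bug": "Bug fixes"}


def fetch_milestone_issues(owner, repo, milestone_number):
    """Fetch closed issues for one milestone (first page only, unauthenticated)."""
    query = urllib.parse.urlencode(
        {"milestone": milestone_number, "state": "closed", "per_page": 100}
    )
    url = f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def build_release_notes(issues, sections=SECTIONS):
    """Group issues under headings by label and return plain-text release notes."""
    grouped = {heading: [] for heading in sections.values()}
    for issue in issues:
        # The issues endpoint also returns pull requests; skip them.
        if "pull_request" in issue:
            continue
        labels = {label["name"] for label in issue.get("labels", [])}
        # File the issue under the first matching section only.
        for label, heading in sections.items():
            if label in labels:
                grouped[heading].append(f"- {issue['title']} (#{issue['number']})")
                break
    lines = []
    for heading, entries in grouped.items():
        if entries:
            lines.append(heading)
            lines.extend(entries)
            lines.append("")
    return "\n".join(lines).strip()
```

Calling `build_release_notes(fetch_milestone_issues(owner, repo, number))` with your own repository details would then produce the notes straight from the issue tracker, with the commit history left exactly as the developers wrote it. A real script would also need authentication and pagination for larger milestones.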
Commits provide a step-by-step record of the changes made, a very technical view of a feature's composition. They should be used for this and only this. An issue tracker is the tool for tracking features, bugs, and high-level information - just use it properly!