branch14log

Find hot spots with Git

10 Aug 2012

A couple of months ago, an article on Google's engineering blog discussed bug prediction as a way to sustain the quality of a code base. Part of that discussion covered hot spots, which Google defines as sections of a code base that get touched a lot, because those are likely not well designed and thus prone to bugs. For a recent code review I wanted to start with those hot spots, but first I had to figure out how to find them in a code base stored in a Git repo. After some research I found that, besides a Groovy script and a Ruby project, there is a Unix way to detect hot spots in any Git repo:
git log --pretty=%H --name-status \
  | grep '^M' | sort | uniq -c | sort -n
Here is the output for the Ruby project mentioned earlier. The number in the first column tells you in how many commits the file was modified. (The M is a leftover from Git's original output and stands for 'modified'.)
[...]
6 M test/hotspots/store_test.rb
7 M CHANGELOG.markdown
8 M lib/hotspots.rb
8 M lib/hotspots/repository/parser/git.rb
8 M test/hotspots/repository/parser/git_test.rb
10 M hotspot.rb
10 M lib/hotspots/repository/driver/git.rb
12 M hotspots
12 M README.markdown
14 M lib/hotspots/options_parser.rb
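If you want to check what those counts mean, here is a small sketch that builds a throwaway repository with a known history and runs the same pipeline on it. The file names a.txt and b.txt are made up for the demo. Note that the commit that adds a file reports it with status A, so it is dropped by the grep and does not count.

```shell
#!/bin/sh
# Sketch: a throwaway repo with a known history (hypothetical files a.txt, b.txt)
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name  "Demo"
git config user.email "demo@example.com"

# Commit 1: add both files (status A, filtered out by grep '^M')
echo v1 > a.txt
echo v1 > b.txt
git add a.txt b.txt
git commit -q -m "add a and b"

# Modify a.txt in three separate commits and b.txt in one
for v in v2 v3 v4; do
  echo "$v" > a.txt
  git commit -q -am "change a to $v"
done
echo v2 > b.txt
git commit -q -am "change b"

# The pipeline from above
git log --pretty=%H --name-status \
  | grep '^M' | sort | uniq -c | sort -n
```

a.txt should come out with a count of 3 and b.txt with 1, and thanks to the final sort -n the busiest file ends up at the bottom.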
You could put this in an alias and use it as is, but it is just as easy to turn it into a shell function. That way you can pass any option through to 'git log', for example '--since "30 days ago"'.
hotspots () {
  # "$@" forwards each argument intact, so "30 days ago" stays a single value
  git log --pretty=%H --name-status "$@" \
    | grep '^M' | sort | uniq -c | sort -n
}
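A sketch of the function in use, runnable on its own against a small throwaway repository (the files lib/core.rb and README are made up for the demo). The arguments are forwarded as "$@": with an unquoted $*, '--since "30 days ago"' would be split into four words and git would try to read "days" and "ago" as revisions.

```shell
#!/bin/sh
# Sketch: the hotspots function with quoted arguments forwarded via "$@"
set -e

hotspots () {
  # "$@" keeps "30 days ago" together as one argument to git log
  git log --pretty=%H --name-status "$@" \
    | grep '^M' | sort | uniq -c | sort -n
}

# Hypothetical demo repo: two files, each modified once
cd "$(mktemp -d)"
git init -q
git config user.name  "Demo"
git config user.email "demo@example.com"
mkdir lib
echo v1 > lib/core.rb
echo v1 > README
git add . && git commit -q -m "initial"
echo v2 > lib/core.rb
echo v2 > README
git commit -q -am "touch both"

# Only commits from the last 30 days (here: all of them)
hotspots --since "30 days ago"

# Only files below lib/; "--" separates revisions from paths
hotspots -- lib/
```

The first call lists both files with a count of 1; the second one only lib/core.rb.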
Have fun with your hot spots ;)