Find hot spots with Git
A couple of months ago an article
on the Google engineering blog discussed bug prediction as a
measure to sustain the quality of a code base. Part of that discussion
covered hot spots, which Google defines as sections of a code base
that get touched a lot, because those are likely not well designed
and thus prone to bugs.
For a recent code review I wanted to start with those hot spots. But
first I had to figure out how to find the hot spots of a code base
stored in a Git repo. After some research I found that, besides a
Groovy script and a
Ruby project, there is a Unix way to detect hot spots in any Git
repo.
Here is an example for the Ruby project mentioned earlier. The number in the
first column tells you in how many commits the file has been
modified. (The M is just a remnant of the original output
from Git and stands for 'modified'.)
Have fun with your hot spots ;)
git log --pretty=%H --name-status \
| grep '^M' | sort | uniq -c | sort -n
[...]
 6 M test/hotspots/store_test.rb
 7 M CHANGELOG.markdown
 8 M lib/hotspots.rb
 8 M lib/hotspots/repository/parser/git.rb
 8 M test/hotspots/repository/parser/git_test.rb
10 M hotspot.rb
10 M lib/hotspots/repository/driver/git.rb
12 M hotspots
12 M README.markdown
14 M lib/hotspots/options_parser.rb

Now you could put this in an alias and use it as is, but you could also easily refine it into a shell function. That way you can even pass any switch to 'git log', for example '--since "30 days ago"'.
hotspots () {
  # "$@" keeps quoted arguments such as "30 days ago" intact; unquoted $* would split them
  git log --pretty=%H --name-status "$@" \
    | grep '^M' | sort | uniq -c | sort -n
}
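
To make the switch passing a bit more concrete, here is a minimal sketch of how the function might be used. It repeats the function so the snippet is self-contained, and it quotes the arguments as "$@" so that a value like "30 days ago" reaches 'git log' as one word. The helper top_hotspots is my own hypothetical addition for illustration; it is not part of the original snippet.

```shell
# Count how often each file shows up as modified in the log.
# Any extra arguments are handed to 'git log' unchanged.
hotspots () {
  git log --pretty=%H --name-status "$@" \
    | grep '^M' | sort | uniq -c | sort -n
}

# Hypothetical helper: keep only the N most frequently modified files.
# The first argument is N, everything after it goes to 'git log'.
top_hotspots () {
  count=$1
  shift
  # The list is sorted ascending, so the hottest files are at the end.
  hotspots "$@" | tail -n "$count"
}

# Example invocations, to be run inside a Git working tree:
#   hotspots --since "30 days ago"
#   top_hotspots 10 --since "30 days ago"
```

Because the result is sorted in ascending order, the hottest files end up at the bottom of the list, which is why the helper uses tail rather than head.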