Sunday, February 26, 2012

Bluffing Line Count and why it's still useful

I'll take it as a given that every developer knows that line count is a problematic metric. You know you got problems when someone with power thinks that "Homer did twice as much work because he wrote twice as many lines". Why is line count so unreliable? Here are some reasons to get started:
1.       Significant lines vs. raw, i.e. whitespace, comments, brackets on new-line
2.       New code (creates lots of it) vs. production bug update (5 hours to fix 1 line)
3.       Generated files (proxies, designer) or massive code generation
4.       Copying 1000 lines from open-source online (algorithm works, so low risk, but you've added a lot of code)
5.       Complexity of code, i.e. a 20-line algorithm may be more work than 1000 line of casual UI and data hookup.
However, if you're trying to coordinate more than 10 developers, and you have no other metric, line count still has some value because it quickly tells you something is going on. (i.e. the "something is better than nothing" philosophy.) You've got to look at the trends, not the absolute values.
·         It's useful to know if your team's average developer produces 500 lines of code (LOC) per week (of course this varies from team to team), then seeing someone produce either 50 or 5000 should catch your attention. Sure, there may be a good reason, but you at least want to be aware of what that reason is. Is the guy generating 5000 massively copying and pasting code, re-inventing the wheel for quick-to-write utility code, or using a passive code-generator instead of your team's ORM framework? Is the guy only doing 50 not checking anything in, and waiting to surprise the team with 4 weeks of work the day before code freeze for one "glorious" check-in?
·         Line count is ubiquitous and everyone can understand it.
·         Line count is very cheap to calculate; many tools can provide this.
·         Line count is the basis for two more relevant metrics: code churn, which tells you how many lines per file is changing per changeset (and hence per developer), and code duplication (I personally love Simian for this).
·         You can also write reports splitting line count by file name to see the ratio of business, entity, data-access, unit test, UI, etc… For example, is someone checking in 1000 lines of business logic, but with zero unit tests? It's something worth investigating.
You cannot reduce an art like code craftsmanship to auto-generated metrics. But the metrics do offer clues to what is going on.  It's good to be aware, but never judge a developer on metrics alone.