Sunday, May 30, 2010

Three cautions with mocking frameworks

[This was originally posted at]

I'm a big fan of unit testing. I think in many cases, it's faster to develop with unit tests than without.

Perhaps the biggest problem for writing unit tests is how to handle dependencies - especially in legacy code. For example, say you have a method that calls the database or file system. How do you write a unit test for such a method?

One approach is dependency injection - where you supply the dependency to the method through some seam (for example, passing it in as a parameter or instantiating it from a config file). This is powerful, but could require rewriting the code you want to test.
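As a rough sketch of that idea (not code from any real project), the example below injects a data source through a constructor parameter. The names IOrderSource, OrderReport, and FakeOrderSource are hypothetical and exist only for illustration; in real code the production implementation of the interface would call the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical seam: anything that can load the order totals for a customer.
public interface IOrderSource
{
    IList<decimal> LoadOrderTotals(int customerId);
}

// The code under test depends on the interface, not on a concrete database class.
public class OrderReport
{
    private readonly IOrderSource _source;

    // The dependency is injected here, so a test can pass in a fake.
    public OrderReport(IOrderSource source)
    {
        if (source == null) throw new ArgumentNullException("source");
        _source = source;
    }

    // Pure in-memory logic: sums whatever the source returns.
    public decimal GetTotalSpent(int customerId)
    {
        decimal sum = 0m;
        foreach (decimal total in _source.LoadOrderTotals(customerId))
            sum += total;
        return sum;
    }
}

// Hand-written fake used only by tests; returns canned in-memory data.
public class FakeOrderSource : IOrderSource
{
    public IList<decimal> LoadOrderTotals(int customerId)
    {
        return new List<decimal> { 10.50m, 4.50m };
    }
}
```

A test can then call new OrderReport(new FakeOrderSource()).GetTotalSpent(42) and check that the result is 15.00m, without a database anywhere in sight. The cost, as noted above, is that existing code may need to be reworked to accept the dependency.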

Another approach is using a mock or "isolation" framework, like TypeMock or Rhino Mocks. TypeMock lets you isolate an embedded method call and replace it with something else (the "mock"). For example, you could replace a database call with a mock method that simply returns an object for your test. This is powerful; it changes the rules of the game. It's a great way to help a team adopt unit testing, because it guarantees that they always have a way to test even the difficult code. However, as Spider-Man taught us, "With great power comes great responsibility". TypeMock is fire. A developer can do amazing things with it, but they can also burn themselves. If abused, TypeMock could:

  1. Enable developers to continue to write "spaghetti" code. You can write the most tangled, dependent code ever (with no seams), the kind of thing that would get zero test coverage, and TypeMock will rescue it. One of the key points of unit testing is that by writing testable code, you are writing fundamentally better code.
  2. Allow developers to get high test coverage by simply mocking every line. The problem is that if everything is mocked, then there's nothing real left that is actually tested.
  3. Make it harder to refactor because the method is no longer encapsulated. For example, say a spaghetti method has a call to a private database method, so the developer uses TypeMock to mock out that private call. Later, a developer refactors that code by simply renaming the private method (or splitting a big private method into two smaller ones). That breaks the related unit tests. This is the opposite of what you want - encapsulated code means you can change the private implementation without breaking anything, and unit tests are supposed to give you confidence when refactoring.

TypeMock can work magic, but it must be used properly.

Monday, May 24, 2010

Developer balance of power

[This was originally posted at]

You need people to get the project done, but people are inevitably error-prone. Just as governments have a "separation of powers", software projects can also benefit from such separation. As a general rule, for production code, the same person should not both:
  • Code and Review - The reviewer checks the code quality. (It's too easy to be biased toward your own code, or to give it a free pass.)
  • Develop and Test - The tester checks the developer. (The dev already thinks their code works fine.)
  • Build and Deploy - Having someone else deploy what the developer built encourages an easier, more objective deployment, and helps invalidate the "it works on my machine" excuse.

Sunday, May 23, 2010

Why it is faster to develop with unit tests

[This was originally posted at]

I keep hinting at this with various blog posts over the years, so I wanted to just come out and openly defend it.

It is faster for the average developer to develop non-dependent C# code with unit tests than without.

By non-dependent, I mean code that doesn't have external dependencies, such as the database, UI controls, FTP servers, and the like. Tests that touch those dependencies (UI, functional, and integration tests) are tricky and expensive, and I fully empathize with projects that punt on them. But code like algorithms, validation, and data manipulation can often be refactored to in-memory C# methods.

Let's walk through a practical example. Say you have an aspx page that collects user input, loads a bunch of data, and eventually manipulates that data with some C# method (like getting the top N items from an array):

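    public static T[] SelectTopN<T>(T[] aObj, int intTop)
    {
      // Nothing to select from.
      if (aObj == null)
        return null;
      // Fewer items than requested (or a non-positive count): return the input unchanged.
      if (aObj.Length <= intTop || aObj.Length == 0 || intTop <= 0)
        return aObj;

      // Do real work: copy the first intTop items into a new array.
      T[] aNew = new T[intTop];
      for (int i = 0; i < intTop; i++)
        aNew[i] = aObj[i];

      return aNew;
    }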

This is the kind of low-hanging fruit, obvious method that should absolutely be tested. Yet many devs don't unit test it. Yes it looks simple, but there's actually a lot of real code that can easily be refactored to this type of testable method (and it's usually this type of method that has some sort of "silly" error). There's a lot that could go wrong: null inputs, boundary cases for the length of the array, bad indexes on an array, mapping values to the new array. Sure, real code would be more complicated, which just reinforces the need for unit testing even more so.

Here's the thing - the first time the average developer writes a new method like that, they will miss something. Somehow, the dev needs to test it.

So how does the average programmer test it? By setting up the database, starting the app, and navigating 5 levels deep. Oops, missed the null check; try again. That's 3 minutes wasted. Oops, had an off-by-one in the loop; try again. 6 minutes wasted. Finally, set everything back up, test the feature, score! 15 minutes later, the dev has verified that a positive flow works. The dev is busy and under pressure, got that task done, so they move on. 4 weeks later (after the dev has forgotten everything), QA comes and says "somewhere there's a bug", and the dev spends an hour tracking it down. It turns out the dev didn't handle the case where the array is shorter than the "Select Top N" count, so the method throws an out-of-range exception. Then the dev makes the fix, hopes they didn't break anything else, and waits a day (?) for that change to be deployed to QA so a test engineer can verify it. Have mercy if that algorithm was a 200-line spaghetti code mess ("there wasn't time to code it right"), and it's like a Rubik's Cube where every change fixes one side only to break another. Have even more mercy if the error is caught in production - not QA.

Unit testing is faster because it stubs out context. Instead of taking 60 seconds (or 5 minutes, or an hour of hunting a month later!) to set up data and step through the app, you just jump straight to the method. Because unit tests are so cheap, the dev can quickly try all the boundary conditions (beyond the few positive flows that the application normally runs when the dev is testing their code). This means that QA and Prod are much less likely to find a boundary condition that the dev just "didn't have time" to check for. And because unit tests are run continually throughout the day, the CI build quickly detects when someone else's change breaks the code.
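To make that concrete, here is a minimal sketch of boundary tests for the SelectTopN method above, assuming MSTest. The ArrayUtilities wrapper class and the test names are hypothetical, and the method is repeated (with braces and the generic parameter added) only so the example compiles on its own.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical home for the method from the walkthrough above.
public static class ArrayUtilities
{
    public static T[] SelectTopN<T>(T[] aObj, int intTop)
    {
        if (aObj == null)
            return null;
        if (aObj.Length <= intTop || aObj.Length == 0 || intTop <= 0)
            return aObj;

        T[] aNew = new T[intTop];
        for (int i = 0; i < intTop; i++)
            aNew[i] = aObj[i];
        return aNew;
    }
}

[TestClass]
public class SelectTopNTests
{
    [TestMethod]
    public void NullInput_ReturnsNull()
    {
        Assert.IsNull(ArrayUtilities.SelectTopN<int>(null, 3));
    }

    [TestMethod]
    public void TopSmallerThanLength_ReturnsFirstNItems()
    {
        int[] result = ArrayUtilities.SelectTopN(new[] { 10, 20, 30, 40 }, 2);
        CollectionAssert.AreEqual(new[] { 10, 20 }, result);
    }

    [TestMethod]
    public void TopLargerThanLength_ReturnsOriginalArray()
    {
        // The case QA found a month later in the story above.
        int[] input = { 10, 20 };
        Assert.AreSame(input, ArrayUtilities.SelectTopN(input, 5));
    }

    [TestMethod]
    public void EmptyArray_ReturnsSameEmptyArray()
    {
        int[] input = new int[0];
        Assert.AreSame(input, ArrayUtilities.SelectTopN(input, 3));
    }

    [TestMethod]
    public void NonPositiveTop_ReturnsOriginalArray()
    {
        // Documents the behavior of the method as written, not a recommendation.
        int[] input = { 10, 20 };
        Assert.AreSame(input, ArrayUtilities.SelectTopN(input, 0));
    }
}
```

Each test runs in memory without a database or a running app, which is the whole point: every boundary case costs one short method instead of another trip through the UI.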

Especially with MSTest or NUnit, unit test tools are effectively free. Setting up the test harness project takes 60 seconds. Even if you start with only 5% code coverage for just the public static utility methods - it's still 5% better than nothing.

Of course, the "trick" is to write your code such that more and more of it can be unit tested. That's why dependency injection, mocking, or even refactoring to public static helper utilities is so helpful.

Over the last 5 years, I've heard a lot of objections to testing even simple C# methods, but I don't think skipping the tests saves time:

Objection: You're writing more code, so it's actually slower.
Rebuttal: It's not typing that takes the time, but thinking.

Objection: I can test it faster by just running the app.
Rebuttal: What does "it" really mean? You're not testing the whole app, just a handful of common scenarios (and you're only running the app occasionally on your machine, as opposed to unit tests that run every day on a verified build server).

Objection: "Unit testing" is one more burden for developers to learn, which slows us down.
Rebuttal: Unit tests are ultimately just a class library in the language of your choice. They're not a new tool or a new language. The only "learning curve" is that conceptually it requires you to write code that can be instantiated and run in a test method - i.e. you need to dogfood your own code, which a dev should be prepared to do anyway.

Objection: My code is already perfect, there are no bugs, so there is no need to write such unit tests.
Rebuttal: Ok, so you know your code is perfect (for the sake of argument) - but how will you prove that to others? And how will you "protect" your perfect code from changes caused by other developers? If a dev is smart enough to write perfect code the first time, then the extra time needed to write the unit test will be trivial.

Objection: If I write unit tests, when I change my code, then I need to go update all my tests.
Rebuttal: Having that safety net of tests for when you do change your code is one of the benefits of unit tests. Play out the scene - you change your code, 7 tests fail - that has highlighted what would likely break in production. Better to find out about breaking changes from your unit tests than from angry customers.

Objection: Devs will just write bad tests that don't actually add value, so it's a waste of time.
Rebuttal: Any tool can be abused. Use common-sense criteria to help devs write good tests - like code coverage and checking boundary cases.

Objection: I just don't have time.
Rebuttal: For non-dependent C# code, you don't have time not to. Here's the thing - the code has got to be tested anyway. What is your faster alternative to prove that the code indeed works, and that it wasn't broken by someone else after you moved on?


Wednesday, May 19, 2010

Using SMO to automate SQL tasks

[This was originally posted at]

I used to shell out to the console with osql, or split scripts on the "GO" keywords and use ADO.NET to execute each batch as a non-query.

And years ago, I guess that was fine.

But SQL Server Management Objects (SMO) are phenomenal. They just work. With a few lines in C#, you can create or kill a database, install schema scripts, enumerate the objects in a database, and much more. You also get exceptions thrown if there is an error (much easier than parsing the osql output from the command line).
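For a sense of what "a few lines" looks like, here is a rough sketch, assuming the SMO assemblies are referenced and a SQL Server instance is reachable with Windows authentication. The SmoDemo class, the RebuildDatabase method, and its parameters are hypothetical names for illustration; check the SMO documentation for the version you have installed before relying on any of it.

```csharp
using System;
using Microsoft.SqlServer.Management.Smo;

public static class SmoDemo
{
    // Drops and recreates a database, runs a schema script, and lists its tables.
    public static void RebuildDatabase(string serverName, string databaseName, string schemaScript)
    {
        // Connects with Windows authentication by default.
        Server server = new Server(serverName);

        try
        {
            // Kill the existing database (closing open connections) if it is there.
            if (server.Databases.Contains(databaseName))
                server.KillDatabase(databaseName);

            Database db = new Database(server, databaseName);
            db.Create();

            // SMO handles the "GO" batch separators, so the script can be
            // passed as a whole instead of being split by hand.
            db.ExecuteNonQuery(schemaScript);

            // Enumerate what the script created.
            db.Tables.Refresh();
            foreach (Table table in db.Tables)
                Console.WriteLine(table.Schema + "." + table.Name);
        }
        catch (FailedOperationException ex)
        {
            // Failures arrive as exceptions rather than as console output to parse.
            Console.Error.WriteLine("SMO operation failed: " + ex.Message);
            throw;
        }
    }
}
```

The contrast with the osql approach is mostly in the error handling: a failed script stops the method with an exception instead of leaving a log to scan.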

Good stuff.

Wednesday, May 12, 2010

Why are developer estimates almost always short?

[This was originally posted at]

We constantly need to make estimates in software engineering. But, the ironic thing is not that our estimates are off, but rather that they're almost always short. There's a one-way bias.

I think this is because:

  • It always looks simple in our heads. Devs are overly optimistic and plan for the best-case scenario, and hence get caught off-guard. I mean, who would ever guess that it takes 5 hours to fix a single line of code?
  • Scope creep - The business just keeps piling on more requests (and it's hard to say "no" to the business).
  • Strong bias from managers - Most managers want the lower estimate because, regardless of whether the estimate is actually accurate, it's easier to sell to their own boss or clients. For example, in consulting it's easier to give a lower estimate to "get your foot in the door", and then gradually try to pile on more services.
  • Devs try to impress - I can't help but wonder if many devs try to impress their managers by providing a lower estimate: "Sure Mr. Burns, I can get that 5 day task done tomorrow (because I'm a rock star developer... please appreciate me)".
  • It's almost expected - At least in my experience, it almost seems like (unfortunately) the industry has resigned itself to low-ball developer estimates. So the managers always just double (if the estimate is from a senior dev) or triple (if the estimate is from a junior dev).

The best solution I can think of: read and study Steve McConnell's phenomenal book, Software Estimation: Demystifying the Black Art.

Also see: How not to estimate, Playing estimates off each other.

Monday, May 3, 2010

Civil Engineering and Writing New Code

[This was originally posted at]

I have a special appreciation for civil engineering - I just find the bridges, highways, dams, and sky-scrapers beautiful in a way that an engineer can appreciate.

In America, before all the infrastructure was built, one might say that there was a "golden age" of civil engineering. With literally millions of square miles of country, there were seemingly endless opportunities to build new structures. And at the time that these structures needed building (before computers, flight, or bio-engineering), civil engineering was arguably one of the most advanced fields of its day. You put these circumstances together: lots of new projects that require advanced technology - and you've got an engineer's dream.

Of course, over two hundred years, enough roads and bridges and buildings were built to fulfill much of the country's need. There were a couple spurts in between - building the highways, the major airports, an occasional monument, and more recently the cell phone transmission towers. However, in general, the civil engineering mindset went from "new creation" to "infrastructure maintenance".

At least from my perspective, the same life-cycle appears to be happening with software engineering. Even back in the 80's, just finding a machine, learning the syntax and writing a "program" was a new thrill. However, especially with the internet (just google the syntax and copy code-snippets), better hardware (you don't need an algorithm-genius to make an application perform), mass availability of machines, outsourcing (huge pool of developers), standard platforms and components that encourage reusability instead of re-inventing the wheel, and simply enough years passing - almost every established company has some sort of core IT infrastructure in place. Back in the late 90's, major companies had huge booms to implement their ERP, email, and websites (made lots of consultants very rich) - but now those expensive systems are in place. Sure, there's always work to do, like integrating components, migrating code, consolidating applications, extending functionality of existing apps, and maintaining existing code. There's still new development, but it seems much scarcer than 10 years ago. The cowboy days of just being thrilled to write a program seem to have passed.

Similar to how civil engineering has filled the country's physical infrastructure, software engineering has filled much of the country's IT infrastructure - and therefore in both cases much of the work currently being done is maintenance. America doesn't make many Hoover Dams or Golden Gate Bridges anymore - but there's always annual road re-surfacing. Same concept for developers. (This means that there's tons of legacy code out there, which is why every developer should read the phenomenal book Working Effectively with Legacy Code.)

Some developers view this in a pessimistic light, as if "the good old days" have passed us by. However, I'm an optimist, and there's much reason to believe that there are still many innovative and new software development efforts ahead.

  • There are continually newer technologies - This provides a business incentive to rebuild older systems. Web systems replaced many fat clients. But now web 2.0 is replacing many existing web systems, and mobile apps may replace those, and there will be something after that (what if voice recognition takes off and makes most UI obsolete?).
  • Much room for innovation - The nature of IT (with the low barrier to entry, the ability to cheaply experiment, and building projects out of "thought" stuff) allows for massive innovation, unlike any other field I can think of. Innovation means hopes for a better, more profitable system, and therefore business sponsors to fund new development.
  • Software applications have a short lifespan - Most software projects are replaced within 5-10 years, so there is continually new work. (A good bridge or building could last for a hundred years.)
  • Programming is a foundational skill that can assist you with everything else - Because almost every desk job in any industry uses MS Office apps (or the equivalent), databases, and web sites, the ability to write even simple programs can assist with those jobs. For example, writing an Excel macro or finding results with a SQL query may let you get a task done much quicker.

So while on one hand there's definitely more maintenance work for all the legacy systems, as the field of software engineering matures, I think there's still a lot to be optimistic about for new development opportunities.