Friday, December 30, 2011

The benefits of “check in early and often”

I am a huge advocate of checking in early and often. I’ve seen many a project get burnt by the developer who saves 3 weeks of work for a single “glorious” check-in.
I favor frequent check-ins for several reasons:
  1. Cheaper integration. Someone once said “Integration is pay me now or pay me later”, and I find it much easier to pay now. Especially with automated builds and continuous integration, it’s much easier to check in 10 little changes than 1 big change (sometimes I think of it like holding my breath: doing it for 30 seconds, ten times, is easier than holding it for 5 minutes straight). Why? Because with bigger changes, you inevitably get farther out of sync – especially on critical shared files – and there’s more to forget.
  2. More objective measure of what you really have: Code that isn’t checked in, that just works on a developer’s machine, doesn’t really exist. They might as well say “it works in my head”. Once you actually get the code past a build server’s policy, then we can see what’s really there.
  3. Earlier Detection: We all know that the sooner you catch a bug or a design flaw, the cheaper it is to fix. I’d rather developers check in code early so we can quickly detect problems (“why are there 5,000 lines but no tests?”)
  4. More Modular: Checking in 10 chunks of code, where each one works, implies more granular and modular code. I.e. code that can at least be split into multiple check-ins is more modular than code that can’t be split at all.
Of course there are always exceptions (you do a massive refactoring, etc…), but those should be the exception, not the rule.
Most of the time, in my experience, large check-ins by developers mean something bad – spaghetti code, tightly-coupled code, code that was trying to hide under the radar until right before the deadline so the developer can say “oops, I just don’t have time to change it”, or something like that. Think of it like this: there is zero benefit in having to wait one month to see what a developer is doing, but there is real benefit to detecting code early, so risk-reward wise it’s better to check in early.
Note that for these purposes, a shelveset is not the same as a check-in. Shelvesets are private, and hence deliberately avoid the benefits listed above (which some say is a feature). For example, you most likely don’t have builds on a private shelveset. For a developer to say “I put my 20,000 lines in a shelveset” is misguided – use a branch instead if you need to.
So how do you encourage checking in early and often?
You could write a whole chapter on this, but here's a short answer: You can explain the benefits so some developers are internally motivated, or you can make it official policy so that other developers are externally “motivated”. You can leverage the TFS Code Churn tables to automatically monitor activity, or even just view check-ins in Team Explorer, to see how often a developer checks in and how much code has changed. If a developer or contractor insists that they need to wait 1 month to check in their code when “it’s ready”, you’ve got problems, much like if a developer insisted they didn’t need to follow any other policy or good practice.
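For the monitoring idea, here's a rough sketch of pulling one developer's check-in history via the TFS 2008 client API (the server URL, team project path, and user name are placeholders, and the exact QueryHistory overload may vary by TFS version):

using System;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;

class CheckinReport
{
    static void Main()
    {
        // Connect to TFS and grab the version control service.
        TeamFoundationServer tfs = TeamFoundationServerFactory.GetServer("http://myTfsServer:8080");
        VersionControlServer vcs = (VersionControlServer)tfs.GetService(typeof(VersionControlServer));

        // The last 50 check-ins by one developer under a team project.
        foreach (Changeset cs in vcs.QueryHistory(
            "$/MyTeamProject", VersionSpec.Latest, 0, RecursionType.Full,
            @"DOMAIN\jsmith", null, VersionSpec.Latest, 50,
            false /* includeChanges */, true /* slotMode */))
        {
            Console.WriteLine("{0}  changeset {1}: {2}", cs.CreationDate, cs.ChangesetId, cs.Comment);
        }
    }
}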

Monday, December 5, 2011

Is development for sissies?

I was reading the book "Tales to make boys out of men" (I have two young, ferocious gorilla boys). It was filled with adventurous stories of courage and valor – men who fought battles in the jungle or trekked through the frost-bitten Antarctic. Then here am I, a software engineer, essentially doing a "desk job" in an air-conditioned office with free coffee.
Sometimes I see people who divide jobs into two categories: "tough-guy" jobs like fighter pilot, football player, or astronaut, and "sissy" jobs like software engineer sitting behind a desk. What do I tell my impressionable kids?
I see it like this. "Tough-guy" jobs are honorable, and you certainly need them. But don't dismiss a "desk job" as being a sissy. Many IT engineers need to work with the most ferocious, dangerous, lethal, destructive animal on the planet – other people.  People inevitably have competing demands and interests, there are ruthless sharks out there, and any job that must constantly deal with people cannot, by definition, be a sissy job.
Second, developers also must work with the most uncaring and cold-hearted beast ever to exist – the compiler. The compiler doesn't care if you've had a bad day, if your code should work, or if you've spent a hundred hours on a 5 minute task. It has no grace. Such an inhuman vacuum is not the field of sissies.
Furthermore, other people are depending on the IT engineer's work. You could have a million customers using your financial application, or a billion dollars of revenue flowing through your processing system. Hackers attack your system every day. And the system has got to work. To have that sort of responsibility is not sissy-like.
Lastly, software engineering is so complex that you inevitably make mistakes (sometimes really big ones) – and then need to own up to them. That takes courage.
Ok, it's still not Rambo, but software engineering is not for the weak.

Friday, December 2, 2011

10 Reasons why the build works locally but fails on the build server

This is a braindump:
1.       Developer did not check all the files in, or the developer doesn't have the latest files (sometimes TFS hiccups when getting the latest DLLs).
2.       Different modes (release vs. debug) – either #if DEBUG blocks (see the sketch after this list), or the project is unmarked in Configuration Manager.
3.       Different bin structure – each project gets its own (default for Visual Studio) vs. a single shared bin for all (default for TFS). This is especially common when different versions of the same assembly are referenced in multiple projects in the same solution.
4.       Different platform/configuration
5.       The build is running other steps (perhaps a packaging step or command-line unit tests).
6.       Different bitness – say the developer workstation is 64-bit but the build server is 32-bit, and some extra step breaks because of this.
7.       Rebuild vs. build – the developer isn't running a rebuild, so a DLL that actually fails to compile still exists on the dev machine from some earlier build, while the build server (building from scratch) fails.
8.       Workspace mapping is incorrect – TFS not getting all the files it needs
9.       Unit test code coverage – Visual Studio (at least 2008) can be very brittle when running command-line unit tests and code coverage.
10.   Treat warnings as compile errors – depending on your process, the build server may fail on these, but Visual Studio may only flag you with a warning (which the dev ignores).
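To make reason #2 concrete, here's a minimal sketch (the class names are made up) of how an #if DEBUG block compiles cleanly on a dev box in Debug mode but breaks the server's Release build:

using System;

public static class Diagnostics
{
#if DEBUG
    // This method only exists in Debug builds.
    public static void DumpState(string info)
    {
        Console.WriteLine(info);
    }
#endif
}

public class OrderProcessor
{
    public void Process()
    {
        // Compiles fine locally in Debug, but the server's Release build
        // fails: DumpState doesn't exist when DEBUG isn't defined.
        Diagnostics.DumpState("processing order");
    }
}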

Tuesday, November 29, 2011

Why I'm liking Pluralsight

My department had scheduled to send each of us to training. We had different training classes, and the specific vendor for my class needed to cancel. That left me short notice to squeeze in different training by year end. So, being creative, I got an online subscription to Pluralsight instead of the traditional training.
Pluralsight is a set of online .NET videos created by industry experts. Each video offers 2-4 hours' worth of PowerPoint slides and code demos. It's worked out very well. What I'm liking so far:
·         Different medium - After 10 linear feet of books, I like the different medium. Hearing someone's voice seems to trigger a different part of the brain for remembering, and seeing the demo from end-to-end has obvious benefits over isolated screenshots in a book or article.
·         It's on-demand – It's hard to make it to physical events. I like the inherent benefit of on-demand training, where I can listen on my schedule (by which I mean everyone else's schedule - my kid's sleeping schedule, my company's work schedule, etc…)
·         Professional content - There are tons of free videos online, but these are often like reactionary scraps. To break the ice with a new technology, it helps to have a systematic 2-hour block that goes from end-to-end.
·         Track progress – Some personality types won't care about this, but I like how it tracks completion progress through courses. It's almost like finishing levels of a video game.
·         Coordinated – I don't need 10 videos all telling different or rehashed angles of the same thing (which is often what I'd find in a google search) – rather I need one good video that nails it, or a collection of videos that each explains their specific part well.
·         Continually Improving – They seem to come out with a few new "courses" every week.
It's getting to the point where rather than watch my favorite sitcom, I watch the next Pluralsight video.

Sunday, November 27, 2011

Measure what you actually care about

Our three kids are currently 2, 4, and 6. We are starting to potty train the youngest. She's a cute thing, but you can imagine it's always a trying experience. Because I'm very anti-ivory-tower, and think the best developers are the ones grounded in the practicality of everyday life (such as potty-training a two-year-old), I can't help but think how this relates to software engineering.
Here's how: we found ourselves rewarding our daughter every time she successfully went potty. It sounded reasonable, but we realized that it's actually misleading – we're rewarding the wrong thing. What we really want is not a two-year-old that goes potty every 20 minutes in order to earn her chocolate chip, but rather a two-year-old that remains dry. Even two-year-olds can figure out how to game the system.
This sort of misguided measurement is what often occurs in demoralized IT shops. For example, the main compensation is based on the number of bugs fixed or the number of UI screens created (because it's easy to measure), but what they actually care about is increased functionality or quality. The irony is that this often encourages the exact opposite of what the boss really wants. Just like I don't want a two-year-old going "tinkle" every 20 minutes, I don't want developers gaming the system by fixing large quantities of irrelevant or duplicate bugs.
This blog post doesn't have a quick answer; I mainly just wanted to write about my daughter's potty-training adventures while she was taking a nap. But a quick approach is to focus on what you actually care about (say, quality), and then work backwards, thinking "what would high quality look like?" – fewer production complaints, less support time, less application downtime, less developer time spent fixing bugs, etc… Then focus on how to measure those things.

Wednesday, August 3, 2011

Whatever requirements we're given tomorrow, we got to get that done

I've seen a thousand hacks justified with "We got to get it done". You know the drill – copy and paste 200 lines of code, hard-code data that should be configurable, skip any automated testing, etc… Such hacks come at the expense of future flexibility (i.e. good design).
However, ironically, given the continual feature change, scope creep, and unknowns in software development, the real goal becomes "Whatever requirements we're given tomorrow, we got to get that done."
This second goal, the more realistic one for long-term departments, carries completely opposite connotations from the first. Instead of cranking out a feature now with no concern for maintenance costs or flexibility tomorrow, developers need to prepare – i.e. ensure they have automation, builds, reuse, etc…
Besides technical debt, the other problem I have with the "just get it done now" crowd is the false sense of nobility. Often these devs insist that they're doing a good thing (putting out a fire), but really it's just punting the problem down the road for someone else to pay while they boast how quickly they've solved it.

Wednesday, July 27, 2011

Why you’re in trouble if you rely on 30-page SOPs

Every organization wants to have its development processes documented into Standard Operating Procedures (SOPs) for the obvious reasons – faster onboarding, standardization, auditing, etc… The Holy Grail is the potential to hire a bunch of contractors (or outsource), tell them to read a novel’s worth of documentation, and have them fully up to speed a week later. SOPs also imply that the team knows what it is doing and has a plan, which is one indicator of a mature organization. SOPs are also a prerequisite for outsourcing, an appealing option for large organizations.
While documentation has its benefits, you can’t rely solely on large documents to communicate process and onboard new people. Here are at least four common problems that will result in a frustrated and confused team:
  1. The doc itself will be wrong (or outdated), such as skipping steps or assuming institutional knowledge. This is especially common if you hire outside consultants (with no institutional knowledge of your systems) to document your process.
  2. It’s easier to bluff a doc – a busy tech writer will hurry the doc, thinking it looks done ("I’ve written 50 pages!"), but the content won’t be correct or specific enough.
  3. Screens will vary (example: the software is upgraded or the OS doesn’t match).
  4. People simply won’t read the docs; they’ll skim and miss details.
Several ways to communicate an SOP instead of just a 30-page MS Word doc:
  • Favor automation over documentation where possible. The best document is an automated script. A script is usually kept up to date (compared to a doc) because developers need the script to work. It’s also much faster (and less error-prone) for a new guy to kick off the script than to tediously step through 20 pages of instructions.
  • Lower the cost of documenting by leveraging a wiki. Developers are far more likely to update or correct a wiki than a big MS Word doc on SharePoint.
  • Favor simplifying the process so the doc itself is simpler.

Monday, July 25, 2011

If you're going to fail, fail big – example with dirty towels

After showering the other day, I needed a towel to dry off.  I saw a bunch of towels in a nearby laundry basket, but wasn't sure if they were dirty laundry going downstairs, or clean laundry coming upstairs. I was too lazy to walk down the hall to get a towel from the closet that I knew would be clean, and the towels in the basket looked clean, so I used the ones from the basket because they were immediately available. But it didn't feel "right". I started feeling bits of lint and loose hair on me, so I did the most thorough test I could find – I phoned my wife and asked her if the towels in the laundry basket were clean or dirty. She informed me that they were "of course" dirty towels used as floor mats.

Like every other aspect of normal life, I see this as directly analogous to software coding. The dirty towel is like a defect. If it had been obviously dirty, I never would have used it, hence saving myself the grossness of drying off with a floor mat. Applied to software: better to have a defect explode in your face so that you're forced to fix it, as opposed to a "half-bug" that continually bites you.

Wednesday, July 20, 2011

Ghosts and Time Bombs

Having bugs that are reproducible on your local machine is a luxury. Enterprise production apps are often void of such luxuries. Indeed, often the reason a bug gets past developers, code reviews, QA, UAT, regression, and every other quality control measure is that it is not acting in an obviously deterministic way. Two common types of such bugs are "Time Bombs" and "Ghosts".

A time bomb works perfectly, only to explode at some point in the future because it depends on some external dependency that eventually changes. These are usually deterministic, and can be reproduced if you know what you're looking for, but it's very hard to trace the exact cause. The temptation with the time bomb is that it's working perfectly right now, and everyone is always so busy, so they move on.
Examples of time bombs are:
·         Dependency on the clock – Y2K was the most famous case. Other examples include code that doesn't account for the new year (say it sorts by month, and doesn't realize that January 2012 is greater than December 2011), or storing total milliseconds as an Int32 (which overflows after about a month) – see the sketch after this list.
·         Growing data – Say your system logs to a database table, and it works perfectly on local and even QA tests where the database is constantly refreshed. But then in 6 months (after the developers have rolled off the project and no one even knows about the log) the log table becomes so bloated that performance slows to a crawl and all the connections time out.
·         Memory leak – Similar to the growing data.
·         Service contract changes or expires – In today's interconnected systems, it is common to have external data dependencies from third parties. What if a business owner or manager forgets to renew these contracts, or the schema of the contract changes, and hence the service constantly fails? Even worse – say you shell out to a third-party tool (with System.Diagnostics.Process, hiding the window so there's no popup) that displays a visual EULA after such an expiration, and all you see is a process that appears frozen because it's waiting for the (hidden) EULA?
·         Expiring Cache – What if you store critical data in the cache on startup, but that data eventually expires without any renewal policy and the app crashes without it?
·         Rare events with big impact – What if there's an annual refresh of an external data table? I've seen apps that work perfectly in prod for 8 months, processing some external file, and then unexpectedly explode because they're given an "annual refresh" file that is either too big, or has a different schema.
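As a minimal sketch of the clock examples above (the dates and names are made up), comparing only month numbers, or storing total milliseconds in an Int32, works fine for weeks before exploding:

using System;

class TimeBombDemo
{
    // Bug: comparing only month numbers says December 2011 (12) is
    // "later" than January 2012 (1).
    static bool IsLaterBroken(DateTime a, DateTime b) { return a.Month > b.Month; }

    // Fix: compare the full dates.
    static bool IsLater(DateTime a, DateTime b) { return a > b; }

    static void Main()
    {
        DateTime dec2011 = new DateTime(2011, 12, 1);
        DateTime jan2012 = new DateTime(2012, 1, 1);
        Console.WriteLine(IsLaterBroken(dec2011, jan2012)); // True -- wrong
        Console.WriteLine(IsLater(dec2011, jan2012));       // False -- correct

        // Int32 milliseconds: Int32.MaxValue ms is about 24.9 days, so a
        // counter like Environment.TickCount wraps negative after roughly
        // a month of uptime.
        int uptimeMs = Environment.TickCount;
        Console.WriteLine(uptimeMs);
    }
}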

General ways to test for time bombs:
·         Leave the app running for days on end.
·         Forcibly kill the cache in the middle of running – will it recover?
·         Do load testing on the database tables.
·         Make sure you have archive and cleanup routines.
·         Set the system clock to various values.
·         Test for annual events.

Ghosts are bugs where the code works perfectly in every environment you can control, but randomly fails in environments you can't. You can't reproduce them, and you don't have access to the environment, so it's ugly. Ghosts are tempting to ignore because they seem to go away. The problem is that if you can't control the ghost, then you can't control your own application, and that looks really bad to senior management. Examples of ghosts include:

·         Concurrency, threading, and deadlocks – Because 99% of devs test their code as a single user stepping through the debugger, they'll almost never see concurrency issues.
·         Environmental issues – For unknown reasons, the network has hiccups (limited bandwidth, so sometimes your app gets kicked out), or the database occasionally runs significantly slower, causing your performance-tuned application to time out.
·         Another process overwrites your data – Enterprise apps are not a closed system – there could be other services, database triggers, or batch jobs randomly interfering with your data.
·         Hardware failures – What if the network is temporarily down, or the load balancer has a routing error (it was manually configured wrong during the last deploy?), or a disk is corrupt?
·         Different OS or Windows updates – Sometimes devs create (and debug) on one OS version, but the app actually runs on another. This is especially common with client apps, where you could create it on Windows 7 Professional but it runs on Windows Vista. Throw in service packs and even Windows updates, and there can be a lot of subtle differences.
·         Load balancing – What if you have a web farm with 10 servers, and 9 work perfectly, but the last one is broken (or deployed to incorrectly)? The app appears to work perfectly 90% of the time. Realistically, say it's a compound issue where that server only fails 10% of the time – then your app appears to work 99% of the time.
·         Tedious logic with too many inputs – A complex HR application could have hundreds of test cases that work perfectly, but say everyone missed the obscure case that only occurs when a twice-terminated user logs in and tries to change their email address.

General ways to test for ghosts:
·         Increase load and force concurrency (you can easily use a homegrown threading tool to make many web service or database calls at once, forcing a concurrency test – see the sketch after this list).
·         Simulate hardware failures – unplug a network cable or temporarily turn off IIS in your QA environment. Does the app recover?
·         Allow developers some means for QA and Prod debug access – if you can finally reproduce that bug in prod (and nowhere else), the cheapest solution is to allow devs some means to troubleshoot it there. Perhaps they need to sit with a support specialist to use their security access, but find a way.
·         Have tracers and profilers on all servers, especially web and database servers.
·         Have a diagnostic check for your own app. How do you know your app is healthy? Perhaps a tool that pings every web service (on each machine in the web farm), or ensures each stored proc is correctly installed?
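For the first bullet above, a homegrown threading tool can be as small as this sketch (CallServiceUnderTest is a placeholder for your own web service or stored-proc call; Parallel.For requires .NET 4):

using System;
using System.Threading.Tasks;

class ConcurrencyHammer
{
    static void Main()
    {
        // Fire 50 simultaneous calls at the same endpoint to flush out
        // deadlocks and race conditions that a single-user session
        // stepping through the debugger will never hit.
        Parallel.For(0, 50, i =>
        {
            try
            {
                CallServiceUnderTest(i);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Call {0} failed: {1}", i, ex.Message);
            }
        });
    }

    static void CallServiceUnderTest(int i)
    {
        // Placeholder: invoke the web service or database call under test.
    }
}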

Monday, July 18, 2011

The exponential learning curve from studying off-hours

You work a full, hard day, so why bother "studying" off-hours? Because it has an exponential reward. The trick is to differentiate between daily "grunt" work that just takes time without improving you as a developer, vs. "learning" work, such as experimenting with new technology, patterns, or tools.

I've seen endless resumes where the candidate says "I have 5 years experience in technology X", but it's really 1 year of experience repeated 5 times. They've sadly spent their career doing repetitious work, and have nothing new to show for it.

Here's a simplistic case: say you spend 9 hours a day ("45" is the new "40") at work, but 8 hours of that is grunt work – data access plumbing, fixing a logic bug, sitting through yet another meeting, filling out a timesheet – and you've snuck away only 1 hour to research some new data-performance prototype. That means about 90% of your day is grunt work and about 10% is advancing your career. If you do 1 hour of self-study at night, it's not that you go from 9 hours to 10 hours, but rather from 1 hour to 2 hours of "cool" self-study time, i.e. that extra hour gives your self-study a 100% return.

Because I love Excel, here's a chart. "Work hours" is split between "Grunt hours" and "self-study hours". Hence an extra hour or two at night could double your learning curve.
I realize this is very black and white, and life is grey (for example, our jobs aren't neatly divided into "grunt" and "self-study", and 1 hour of self-study may be split into a dozen 5-minute Google queries throughout the day), but the general idea still holds.


Work hours | Grunt hours | Self-study hours | Self-study percent | Overtime self-study | Self-study increase
9          | 9           | 0                | 0%                 | 1                   | Infinite
9          | 8           | 1                | 11%                | 1                   | 100%
9          | 8           | 1                | 11%                | 2                   | 200%
9          | 7           | 2                | 22%                | 1                   | 50%
9          | 7           | 2                | 22%                | 2                   | 100%


Related to this is the Upward Spiral – that the extra hour of overtime helps you learn a technique that makes your day job much faster. For example, say you spend 4 hours a day writing plumbing code to filter C# objects, but then you study LINQ during the evenings and now can do a 4-hour job in 30 minutes. That theoretically gives you a "surplus" of 3.5 hours. Of course that time gets snatched up by other things, but in general it's reasonable to reinvest part of that time in yet further self-study, i.e. compound interest for your development career.
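As a toy illustration of that LINQ point (the Order class and the 1000 threshold are made up):

using System.Collections.Generic;
using System.Linq;

class Order { public decimal Total { get; set; } }

class FilterDemo
{
    static void Main()
    {
        var orders = new List<Order>
        {
            new Order { Total = 500 },
            new Order { Total = 2000 }
        };

        // The hand-rolled plumbing loop:
        var highValue = new List<Order>();
        foreach (Order order in orders)
        {
            if (order.Total > 1000)
                highValue.Add(order);
        }

        // The same filter as one line of LINQ:
        var highValueLinq = orders.Where(o => o.Total > 1000).ToList();
    }
}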

Friday, July 15, 2011

Presentation: An Introduction to Practical Unit Testing

I had the opportunity to speak at the LCNUG yesterday about one of my favorite topics - unit testing.

For convenience, here are some of the links from the PowerPoint:

Database "Unit Testing"
IOC

Tuesday, July 12, 2011

Upcoming talk: July 14 - An introduction to practical unit testing

[UPDATE] - slides at: http://www.timstall.com/2011/07/presentation-introduction-to-practical.html

I'll be presenting at the Lake County .Net Users Group this Thursday, July 14, on An Introduction to Practical Unit Testing.
Unit testing is one of those buzzwords that every developer hears about, but relatively few projects actually do it in a way that adds value. Many developers view unit tests as some "tax" imposed by management. This session will show how to start using unit tests to add immediate value, as well as dispel several common myths and abuses of unit testing. It will explain how unit tests fit with other types of automated tests (integration, functional, UI, performance), as well as how unit tests are but one technique in a developer's tool belt to craft better code.
This is intended as a basic 101-level session.

Tuesday, June 28, 2011

Query files in the TFS VersionControl database

TFS provides an API so that C# can programmatically query source control. However, even with LINQ, that can become tedious coding. TFS also provides a TfsVersionControl database that you can query directly with SQL. This has power.
Why use the undocumented TfsVersionControl database when you're "encouraged" to use the TfsWarehouse?
  1. The transaction databases (TfsBuild, TfsVersionControl, TfsIntegration) are real-time, so you don't need to wait 30 minutes to 2 hours for a refresh.
  2. Not all the info is migrated to the TfsWarehouse (or at least, I can't find it in any documentation). For example, the warehouse has a File table, but it doesn't contain all versioned files (such as binaries, images, etc...)
  3. The TFS warehouse may be corrupted (the process to sync it may be down)
Here's a simple (TFS 2008) query to get you started. It returns the version, the full path, the file name, and the CreationDate (when the file was checked in). It's based on versioned items, so you can query history (you may also get duplicates, so you'd need to filter those out).
select
    V.VersionFrom,
    V.FullPath,
    L.CreationDate,
    Replace(V.ChildItem, '\', '') as [FileName],
    V.*, L.*
from tbl_Version V (nolock)
inner join tbl_File L (nolock) on V.FileId = L.FileId
where V.ParentPath = '$\MyTeamProject\Folder\SubFolder\'
order by V.VersionFrom desc
Note that TFS by default stores paths in a different format, so you may need to convert:
·         '/' becomes \
·         '_' becomes >
·         '-' becomes " (double quote)
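Based on those substitutions, a small C# helper to convert a normal server path into the stored format might look like this (a sketch only – verify the encoding against your own tbl_Version data):

using System;

static class TfsPathHelper
{
    // Apply the substitutions above: '/' -> '\', '_' -> '>', '-' -> '"'.
    public static string ToStoredPath(string serverPath)
    {
        return serverPath.Replace("/", "\\")
                         .Replace("_", ">")
                         .Replace("-", "\"");
    }

    static void Main()
    {
        // "$/MyTeamProject/Folder/SubFolder/" -> "$\MyTeamProject\Folder\SubFolder\"
        Console.WriteLine(ToStoredPath("$/MyTeamProject/Folder/SubFolder/"));
    }
}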