Agile, waterfall, Brooks' Law, and 94% failure rates -- there's lots to learn from HealthCare.gov troubles

Oct 22, 2013

The troubles with the Obamacare website offer this side benefit: consumers are learning a lot about the debate between "waterfall vs. agile" software development. During the next several weeks, we'll probably learn about Brooks' Law, too. I wish we were learning the word “pretotype.” And perhaps, we'll learn about The Standish Group, too.

If you hate government boondoggles, you'll probably learn to hate these terms. But they offer some really handy life lessons, too.

In short, here goes:

Waterfall: Building a piece of software one step at a time, from full specs to full code to full test. It's one-way, analog, rigid and old-fashioned.

Agile: Building software in a series of sprints that let you adjust along the way.

Brooks' Law: When a product is crashing, burning and delayed, throwing more people at it only makes it crash harder, burn faster, and be more delayed.

The Standish Group: A research firm that collects data on software project failures. Hint: Success is rare.

Let me explain each in more detail.

Waterfall vs. Agile

Readers of The Plateau Effect, our recent book about getting stuck, know a lot about waterfall vs. agile. In the book, we profile Agile's “spiritual” leader, Kent Beck. He once described Agile as analogous to driving a car: Even on a straight road, you can't simply set the steering wheel and forget it. You have to constantly make small course corrections along the way. Kent rallied a group of programmers back in 2001 to sign the Agile Manifesto. They committed to break free of the rigid engineering spec process, and instead build and test incrementally throughout the creation of new software.

By all accounts, HealthCare.gov was trapped, and perhaps doomed, by a waterfall process. We shouldn't blame the programmers: government bidding, procurement and oversight procedures practically demand step-by-step waterfall project management -- spec, code, test, release, fail, blame.

Old-fashioned waterfall software development often used "milestones" as management goals. But milestones can be too grandiose, and too set in stone, such as "Launch website Oct. 1." Agile uses checkpoints -- as frequently as every week -- to see whether or not a project is on target. Runners might think of this concept as "splits" -- if you are trying to run a marathon in three hours, you want to know you are sticking to about a 7-minute-per-mile pace. You don't want to find out at mile 20 that the clock is at 2:50 and ticking.

That's what it looks like HealthCare.gov ran into.
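For the curious, the marathon "splits" idea can be sketched in a few lines of Python. This is only an illustration of checking progress at each checkpoint instead of waiting for the finish line. The function names and the mile-5 checkpoint are made up for the example; the three-hour goal and the 2:50-at-mile-20 scenario come from the analogy above.

```python
MARATHON_MILES = 26.2


def required_pace(goal_minutes, miles=MARATHON_MILES):
    """Minutes per mile needed to finish in goal_minutes."""
    return goal_minutes / miles


def projected_finish(elapsed_minutes, miles_done, miles=MARATHON_MILES):
    """Projected finish time, in minutes, if the average pace so far holds."""
    return elapsed_minutes / miles_done * miles


goal = 180  # a three-hour marathon, in minutes

# Hypothetical early checkpoint: 36 minutes elapsed at mile 5.
early = projected_finish(36, 5)

# The scenario from the text: the clock reads 2:50 (170 minutes) at mile 20.
late = projected_finish(170, 20)

print(f"Required pace: {required_pace(goal):.2f} min/mile")
for label, projection in (("mile 5", early), ("mile 20", late)):
    status = "on target" if projection <= goal else "behind"
    print(f"At {label}: projected finish {projection:.0f} min ({status})")
```

The point of the early check is that a small gap at mile 5 can still be closed. The same gap discovered at mile 20 cannot.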

Pretotyping

There clearly wasn't enough time for a beta test -- in fact, we are all living the beta test right now. There's no downplaying the incredible complexity of what Obamacare and HealthCare.gov are trying to do: Verify identities, test income, make dozens of computer systems play nice, all accessible on dozens of consumer platforms and devices. There are probably millions of decision trees that should have been tested, but weren't. That's why it's a shame there wasn't more pretotyping. You've heard of prototyping, of course -- creating an almost-working version of a gadget so consumers can try it out. Pretotyping involves creation of much more rudimentary testable devices that can head off later problems much earlier in the process. The classic pretotype was used to test the initial Palm Pilot. Developers drew buttons on a piece of wood and asked people to pretend to use it so they could understand true consumer behavior before spending millions on engineering. The concept works well in web development, too. We've seen no evidence that this early kind of testing was attempted.

An analyst I spoke to back in August was very concerned with the lack of testing taking place.

"All these exchanges are going live with not enough testing," he told me back then. "I've not seen an IT project where the stakes are so high like this where there is such a disconnect in the level of confidence (of the developers) and the reality."

Brooks' Law

You've probably experienced Brooks' Law at your workplace. Throwing more people at a job often does more harm than good. Digital projects are not like assembly line projects; when Ford added a third shift in 1914, the automaker was able to make more cars. Today, adding a third shift often means more people have to explain more things to more people. There's a related principle I like to describe: Meeting Law. Meetings take longer with every additional person you invite. That's why meetings are often ineffective. They are supposed to be efficient one-to-many communication, but often, managers find that talking to employees individually can be faster.

Brooks' Law, named after IBM programmer Fred Brooks, was explained in Brooks’ book “The Mythical Man-Month.” It states simply: “Adding manpower to a late software project makes it later.”

This is why Health and Human Services' pledge to hire the best and brightest worries me. Even the best and brightest will only interfere with the programmers and testers trying to fix what's broken right now, because the people doing the work will have to stop working and explain the software conventions and architecture to the newbies, slowing down emergency repairs. But not to worry: the best and brightest are busy inventing the next Google or Facebook right now; they're not going to drop what they are doing and fix HealthCare.gov.
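One way to see why "more people explaining more things to more people" gets expensive so quickly is to count the possible one-on-one conversations on a team. Brooks makes this point in his book: if everyone has to coordinate with everyone else, the number of pairs grows as n(n-1)/2. Here is a small Python sketch of that arithmetic. The team sizes are hypothetical and say nothing about the actual HealthCare.gov staffing.

```python
def communication_paths(team_size):
    """Number of distinct pairs of people on a team of the given size."""
    return team_size * (team_size - 1) // 2


# Hypothetical team sizes, chosen only to show how fast the count grows.
for size in (5, 10, 20):
    print(f"{size} people -> {communication_paths(size)} possible conversations")
```

Doubling a team from 10 to 20 people roughly quadruples the number of pairs, from 45 to 190, which is the overhead the newcomers bring with them.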

The Standish Group

Finally, it's almost a certainty that the architects of HealthCare.gov and the state exchanges were naive and overly optimistic. The odds say so. It's important to understand that virtually all big software projects are late, or fail. There are some brilliant stats on this in a Computerworld story published on Monday. Standish tracked 3,555 big-ticket projects from 2003 to 2012, and told Computerworld that only 6.4 percent were successful. Another 52 percent were over budget, late, or fell short of expectations. And 41 percent were abandoned.

"They didn't have a chance in hell," Jim Johnson, founder and chairman of Standish, said to Computerworld. "There was no way they were going to get this right."

So the numbers suggest the odds are 4 in 10 that when "the best and brightest" start digging into HealthCare.gov, they will decide to scrap what's there and begin work on Obamacare 2.0. In fact, the aforementioned analyst, who requested anonymity, predicted that very outcome in August. It's often easier to write code from scratch than to fix a badly broken codebase.

"They will find out a lot of stuff in first 90 days, before they move into Exchange 2.0 and can move to a higher-level fix," he said.

If that’s true, the best course of action is to abandon ship, roll back the web release, and try more limited releases the next time around – begin with one state, or one type of consumer. It would be embarrassing, but it would also be efficient. Sadly, we’re not talking about an embarrassed company here; we’d be talking about an embarrassed political party. A rollback, politically, is probably out of the question.

The Red Tape Chronicles