Wednesday, 25 January 2017

Why does TDD work for you?

Test-driven development (TDD) is big these days. It is often recommended as a solution for a wide range of problems.

However, from an engineering point of view, it puzzles many developers for two reasons:
  1. The "write test + refactor till pass" approach looks incredibly anti-engineering. If civil engineers used that approach for bridge construction, or car designers for their cars, they would be reshaping their bridges or cars at very high cost, and the result would be a patched-up mess with no well-thought-out architecture. The "refactor till pass" guideline is often taken as a mandate to forget architectural design and do whatever is necessary to comply with the test; in other words, the test, rather than the user, sets the requirement. In that situation, how can we guarantee good "ilities" in the outcome, i.e. a final result that is not only correct but also extensible, robust, easy to use, reliable, safe, secure, and so on? Ensuring those qualities is usually the job of architecture.
  2. Testing cannot guarantee that a system works; it can only show that it doesn't. In other words, testing may show you that a system contains defects when it fails a test, but a system that passes all its tests is not necessarily safer than one that fails them: test coverage, test quality and other factors are crucial here. The false sense of safety that an "all green" outcome produces in many people has been reported in the civil and aerospace industries as extremely dangerous, because it may be interpreted as "the system is fine" when it really means "the system is as good as our testing strategy". And often the testing strategy itself is never checked. Who tests the tests?
In summary, developers are more concerned about the "driven" bit in TDD than about the "test" bit. Testing is perfectly OK; what they don't get is driving the design with it.
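
To make the second concern concrete, here is a minimal, invented sketch in Python (the function and its intended behaviour are made up for illustration): a small helper passes its entire "all green" test suite, yet a defect survives because the suite never exercises the zero divisor.

```python
def safe_ratio(a, b):
    """Return a / b; intended to return 0.0 when b is 0 (but it doesn't)."""
    return a / b  # defect: the b == 0 case was forgotten entirely

# The "all green" test suite: every assertion passes...
assert safe_ratio(6, 3) == 2.0
assert safe_ratio(1, 4) == 0.25
assert safe_ratio(0, 5) == 0.0

# ...yet the untested eventuality still blows up:
try:
    safe_ratio(1, 0)
    survived = True
except ZeroDivisionError:
    survived = False

print("suite is green, yet safe_ratio(1, 0) raises:", not survived)
```

The suite is only as good as the inputs it thinks to try, which is exactly what "the system is as good as our testing strategy" means.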


Why Does TDD Work?


Bridges, cars, and other physical designs are nowhere near as malleable as software. This is an important distinction, and it means that comparisons between software and traditional engineering aren't always relevant. What works for bridges might not work for software, and vice versa.

Before giving my two cents on why TDD works, I want to highlight one misconception here.

In software design, the design is very close to the product. In civil engineering and architecture, the design is decoupled from the actual product: blueprints hold the design, which is then materialized into the finished product, and the two are separated by huge amounts of time and effort.

TDD is testing the design. But every car design and building design is also tested. Construction techniques are first calculated, then tested at a smaller scale, then at a larger scale, before being used in a real building. When H-beams were invented, for example, rest assured that their load-bearing behaviour was tried and tried again before anyone actually built the first bridge with them.

Designs of cars are also tested, by building prototypes, and yes, certainly by adjusting things that are not exactly right until the result lives up to expectations. Part of this process is slower, though, because as noted above you can't mess around much with a physical product. But every redesign of a car draws on lessons learned from earlier ones, and every building has about a thousand years of fundamentals behind it about the importance of space, light, insulation, strength, and so on. Details are changed and improved, both in existing buildings and in the designs of newer ones.

Also, parts are tested. Perhaps not in exactly the same style as software, but mechanical parts (wheels, igniters, cables) are usually measured and put under stress to check that the sizes are correct, that no abnormalities are to be seen, and so on. They might be X-rayed or laser-measured; builders tap bricks to spot broken ones; parts might be tested in some configuration or other, or a limited sample drawn from a large batch to really put it to the test.

Those are all things you can put in place with TDD.
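
As a hypothetical illustration of what "putting a part under stress" looks like in software terms (the component and its rated range are invented for this sketch), a unit test can hammer one component across normal, boundary, and abnormal loads before it is assembled into the larger system:

```python
def clamp(value, low, high):
    """Keep a reading within its rated range, like a part rated for a load."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

# "Stress test": sweep the part well beyond its rated range, the way a
# sampled brick or beam is pushed past normal loads before assembly.
cases = 0
for v in range(-1000, 1001):
    out = clamp(v, -100, 100)
    assert -100 <= out <= 100      # never outside the rated range
    if -100 <= v <= 100:
        assert out == v            # undistorted under normal load
    cases += 1

print("clamp survived", cases, "load cases")
```

The point is not this particular function but the habit: exercise each part in isolation, at and beyond its limits, before trusting it in the assembly.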

And indeed, testing is no guarantee. Programs crash, cars break down, and buildings start doing funny things when the wind blows. But... 'safety' is not a boolean question. Even if you can't ever cover everything, covering, say, 99% of the eventualities is better than covering only 50%. Skipping the tests and then finding out, just as you put up your main structure, that the steel hasn't settled well and is so brittle it breaks at the first smack of a hammer is a plain waste of money. The fact that other concerns might still hurt the building does not make it any less foolish to let an easily preventable flaw bring down your design.

As to the practice of TDD, it is a matter of balancing costs: the cost of doing it one way (for example, not testing and then picking up the pieces later) versus the cost of doing it another way. It is always a balance. But do not think that other design processes do not have testing, and test-driven design, in place.
 

Wednesday, 4 January 2017

Delivering Large Faultless IT Projects


After watching National Geographic's MegaStructures series, I was surprised how fast large projects are completed. Once the preliminary work (design, specifications, etc.) is done on paper, the realization itself of a huge project takes just a few years, or sometimes a few months.

For example, the Airbus A380 was "formally launched on Dec. 19, 2000", and by early March 2005 the aircraft was already being tested. The same goes for huge oil tankers, skyscrapers, etc.

Comparing this to the delays in the software industry, I can't help wondering why most IT projects are so slow; or, more precisely, why they cannot be as fast and faultless at the same scale, given enough people.

Projects such as the Airbus A380 present both:

  • Major unforeseen risks: while this is not the first aircraft ever built, it still pushes the limits of the technology, and things which worked well for smaller airliners may not work for the larger one due to physical constraints. In the same way, it uses new technologies that had not been used before because, for example, they were not available in 1969 when the Boeing 747 was designed.
  • Risks related to human resources and management in general: people quitting in the middle of the project, inability to reach a person because she's on vacation, ordinary human errors, etc.

With those risks, people still complete projects like those large airliners in a very short period of time, and despite the delivery delays, those projects are still hugely successful and of high quality.

When it comes to software development, projects are hardly ever as large and complicated as an airliner (both technically and in terms of management), and they carry somewhat fewer unforeseen risks from the real world.

Still, most IT projects are slow and late, and adding more developers is not a solution: going from a team of ten developers to two thousand will sometimes allow the project to be delivered faster, sometimes not, and sometimes will only harm the project and increase the risk of not finishing it at all.

Those that are delivered often contain a lot of bugs, requiring consecutive service packs and regular updates (imagine "installing updates" on every Airbus A380 twice per week to patch bugs in the original product and prevent the aircraft from crashing).

How can such differences be explained? Is it exclusively because the software development industry is too young to manage thousands of people on a single project, and so deliver large-scale, nearly faultless products very fast?

What Does the Software Industry Lack?



Ed Yourdon's Death March touches upon a number of these meta type questions.

In general, the software industry lacks much of the following, and that gets in the way of large projects.
  • Standardization and work item breakdown.
    • This has certainly gotten better, but the design constructs still aren't there to break up a big system. In some ways, the software field can't even agree on what's needed for a given project, much less on how to break things down into components.
    • Aerospace, building construction, automotive, etc. all have very component-driven architectures with reasonably tight interfaces that allow fully parallel development. Software still allows too much bleed-through between the corresponding areas.
  • A large body of successful, similar projects. The A380 wasn't the first big airplane that Airbus built. There are a lot of large software applications out there, but many of them have suffered dramatically in some aspect or the other and wouldn't come close to being called "successful."
  • A large body of designers and builders who have worked on a number of similar and successful projects. Related to the successful project issue, not having the human talent who has been there, done that makes things very difficult from a repeatability point of view.
  • "Never" building the same thing twice. In many ways, an airplane is like any other airplane. It's got wings, engines, seats, etc. Large software projects rarely repeat themselves. Each OS kernel is significantly different. Look at the disparity in file systems. And for that matter, how many truly unique OSs are there? The big ones become clones of a base item at some point: AIX, Solaris and HP-UX all hark back to AT&T System V; Windows has had an incredible amount of drag-forward through each iteration; Linux variants generally all go back to the same core that Linus started. I bring it up because the variants tend to propagate faster than the truly unique, proprietary OSs.
  • Really bad project estimation. Since the repeatability factor is so low, it's difficult to project how large something will end up being and how long it will take to build. Given that project managers and management can't put their hands on the code and actually see what is being done, unrealistic expectations regarding timelines get generated.
  • QA / QC is not emphasized as heavily as it could or should be for larger projects. This goes back to having looser interfaces between components, and not having rigid specifications for how components should work. That looseness allows for unintended consequences and for bugs to creep in.
  • Consistently measurable qualifications. Generally, people speak of the number of years they've worked in X language or in programming. Time served is being used as a substitute for caliber or quality of skill. As has been mentioned many times before, interviewing and finding good programming talent is hard. Part of the problem is that the definition of "good" remains very subjective.
I don't mean to be all negative, and I think the software industry has made significant strides from where we've been. There are organizations working to standardize on "baseline" knowledge for software engineers. There is certainly room for improvement, but I think the industry has come a long way in a reasonably short period of time.