Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Testing & QA

What Test Engineers do at Google: Building Test Infrastructure

Google Testing Blog - Fri, 11/18/2016 - 18:13
Author: Jochen Wuttke

In a recent post, we broadly talked about What Test Engineers do at Google. In this post, I talk about one aspect of the work TEs may do: building and improving test infrastructure to make engineers more productive.

Refurbishing legacy systems makes new tools necessary
A few years ago, I joined an engineering team that was working on replacing a legacy system with a new implementation. Because building the replacement would take several years, we had to keep the legacy system operational and even add features, while building the replacement so there would be no impact on our external users.

The legacy system was so complex and brittle that the engineers spent most of their time triaging and fixing bugs and flaky tests, but had little time to implement new features. The goal for the rewrite was to learn from the legacy system and to build something that was easier to maintain and extend. As the team's TE, my job was to understand what caused the high maintenance cost and how to improve on it. I found two main causes:
  • Tight coupling and insufficient abstraction made unit testing very hard, and as a consequence, a lot of end-to-end tests served as functional tests of that code.
  • The infrastructure used for the end-to-end tests had no good way to create and inject fakes or mocks for these services. As a result, the tests had to run the large number of servers for all these external dependencies. This led to very large and brittle tests that our existing test execution infrastructure was not able to handle reliably.
Exploring solutions
At first, I explored if I could split the large tests into smaller ones that would test specific functionality and depend on fewer external services. This proved impossible, because of the poorly structured legacy code. Making this approach work would have required refactoring the entire system and its dependencies, not just the parts my team owned.

In my second approach, I also focussed on large tests and tried to mock services that were not required for the functionality under test. This also proved very difficult, because dependencies changed often and individual dependencies were hard to trace in a graph of over 200 services. Ultimately, this approach just shifted the required effort from maintaining test code to maintaining test dependencies and mocks.

My third and final approach, illustrated in the figure below, made small tests more powerful. In the typical end-to-end test we faced, the client made RPCcalls to several services, which in turn made RPC calls to other services. Together the client and the transitive closure over all backend services formed a large graph (not tree!) of dependencies, which all had to be up and running for the end-to-end test. The new model changes how we test client and service integration. Instead of running the client on inputs that will somehow trigger RPC calls, we write unit tests for the code making method calls to the RPC stub. The stub itself is mocked with a common mocking framework like Mockito in Java. For each such test, a second test verifies that the data used to drive that mock "makes sense" to the actual service. This is also done with a unit test, where a replay client uses the same data the RPC mock uses to call the RPC handler method of the service.

This pattern of integration testing applies to any RPC call, so the RPC calls made by a backend server to another backend can be tested just as well as front-end client calls. When we apply this approach consistently, we benefit from smaller tests that still test correct integration behavior, and make sure that the behavior we are testing is "real".

To arrive at this solution, I had to build, evaluate, and discard several prototypes. While it took a day to build a proof-of-concept for this approach, it took me and another engineer a year to implement a finished tool developers could use.

The engineers embraced the new solution very quickly when they saw that the new framework removes large amounts of boilerplate code from their tests. To further drive its adoption, I organized multi-day events with the engineering team where we focussed on migrating test cases. It took a few months to migrate all existing unit tests to the new framework, close gaps in coverage, and create the new tests that validate the mocks. Once we converted about 80% of the tests, we started comparing the efficacy of the new tests and the existing end-to-end tests.

The results are very good:
  • The new tests are as effective in finding bugs as the end-to-end tests are.
  • The new tests run in about 3 minutes instead of 30 minutes for the end-to-end tests.
  • The client side tests are 0% flaky. The verification tests are usually less flaky than the end-to-end tests, and never more.
Additionally, the new tests are unit tests, so you can run them in your IDE and step through them to debug. These results allowed us to run the end-to-end tests very rarely, only to detect misconfigurations of the interacting services, but not as functional tests.

Building and improving test infrastructure to help engineers be more productive is one of the many things test engineers do at Google. Running this project from requirements gathering all the way to a finished product gave me the opportunity to design and implement several prototypes, drive the full implementation of one solution, lead engineering teams to adoption of the new framework, and integrate feedback from engineers and actual measurements into the continuous refinement of the tool.
Categories: Testing & QA

Is Your Organization Killing Your Software?

From the Editor of Methods & Tools - Thu, 11/17/2016 - 16:19
When asked “What is your architecture?” most people immediately respond with how their software is laid out and what their plans are for improving parts of it. Rarely does anybody really think through their team and organizational architecture, and even more rarely do people understand how that may fundamentally impact how the software gets written […]

GTAC 2016 Registration Deadline Extended

Google Testing Blog - Tue, 11/15/2016 - 21:09
by Sonal Shah on behalf of the GTAC Committee

Our goal in organizing GTAC each year is to make it a first-class conference, dedicated to presenting leading edge industry practices. The quality of submissions we've received for GTAC 2016 so far has been overwhelming. In order to include the best talks possible, we are extending the deadline for speaker and attendee submissions by 15 days. The new timelines are as follows:

June 1, 2016 June 15, 2016 - Last day for speaker, attendee and diversity scholarship submissions.
June 15, 2016 July 15, 2016 - Attendees and scholarship awardees will be notified of selection/rejection/waitlist status. Those on the waitlist will be notified as space becomes available.
August 15, 2016 August 29, 2016 - Selected speakers will be notified.

To register, please fill out this form.
To apply for diversity scholarship, please fill out this form.

The GTAC website has a list of frequently asked questions. Please do not hesitate to contact if you still have any questions.

Categories: Testing & QA

GTAC 2016 Registration Deadline Extended

Google Testing Blog - Tue, 11/15/2016 - 21:09
by Sonal Shah on behalf of the GTAC Committee

Our goal in organizing GTAC each year is to make it a first-class conference, dedicated to presenting leading edge industry practices. The quality of submissions we've received for GTAC 2016 so far has been overwhelming. In order to include the best talks possible, we are extending the deadline for speaker and attendee submissions by 15 days. The new timelines are as follows:

June 1, 2016 June 15, 2016 - Last day for speaker, attendee and diversity scholarship submissions.
June 15, 2016 July 15, 2016 - Attendees and scholarship awardees will be notified of selection/rejection/waitlist status. Those on the waitlist will be notified as space becomes available.
August 15, 2016 August 29, 2016 - Selected speakers will be notified.

To register, please fill out this form.
To apply for diversity scholarship, please fill out this form.

The GTAC website has a list of frequently asked questions. Please do not hesitate to contact if you still have any questions.

Categories: Testing & QA

Hackable Projects - Pillar 2: Debuggability

Google Testing Blog - Thu, 11/10/2016 - 20:34
By: Patrik Höglund

This is the second article in our series on Hackability; also see the first article.

“Deep into that darkness peering, long I stood there, wondering, fearing, doubting, dreaming dreams no mortal ever dared to dream before.” -- Edgar Allan Poe

Debuggability can mean being able to use a debugger, but here we’re interested in a broader meaning. Debuggability means being able to easily find what’s wrong with a piece of software, whether it’s through logs, statistics or debugger tools. Debuggability doesn’t happen by accident: you need to design it into your product. The amount of work it takes will vary depending on your product, programming language(s) and development environment.

In this article, I am going to walk through a few examples of how we have aided debuggability for our developers. If you do the same analysis and implementation for your project, perhaps you can help your developers illuminate the dark corners of the codebase and learn what truly goes on there.
Figure 1: computer log entry from the Mark II, with a moth taped to the page.
Running on Localhost Read more on the Testing Blog: Hermetic Servers by Chaitali Narla and Diego Salas

Suppose you’re developing a service with a mobile app that connects to that service. You’re working on a new feature in the app that requires changes in the backend. Do you develop in production? That’s a really bad idea, as you must push unfinished code to production to work on your change. Don’t do that: it could break your service for your existing users. Instead, you need some kind of script that brings up your server stack on localhost.

You can probably run your servers by hand, but that quickly gets tedious. In Google, we usually use fancy python scripts that invoke the server binaries with flags. Why do we need those flags? Suppose, for instance, that you have a server A that depends on a server B and C. The default behavior when the server boots should be to connect to B and C in production. When booting on localhost, we want to connect to our local B and C though. For instance:

b_serv --port=1234 --db=/tmp/fakedb
c_serv --port=1235
a_serv --b_spec=localhost:1234 --c_spec=localhost:1235

That makes it a whole lot easier to develop and debug your server. Make sure the logs and stdout/stderr end up in some well-defined directory on localhost so you don’t waste time looking for them. You may want to write a basic debug client that sends HTTP requests or RPCs or whatever your server handles. It’s painful to have to boot the real app on a mobile phone just to test something.

A localhost setup is also a prerequisite for making hermetic tests,where the test invokes the above script to bring up the server stack. The test can then run, say, integration tests among the servers or even client-server integration tests. Such integration tests can catch protocol drift bugs between client and server, while being super stable by not talking to external or shared services.
Debugging Mobile AppsFirst, mobile is hard. The tooling is generally less mature than for desktop, although things are steadily improving. Again, unit tests are great for hackability here. It’s really painful to always load your app on a phone connected to your workstation to see if a change worked. Robolectric unit tests and Espresso functional tests, for instance, run on your workstation and do not require a real phone. xcTestsand Earl Grey give you the same on iOS.

Debuggers ship with Xcode and Android Studio. If your Android app ships JNI code, it’s a bit trickier, but you can attach GDB to running processes on your phone. It’s worth spending the time figuring this out early in the project, so you don’t have to guess what your code is doing. Debugging unit tests is even better and can be done straightforwardly on your workstation.
When Debugging gets TrickySome products are harder to debug than others. One example is hard real-time systems, since their behavior is so dependent on timing (and you better not be hooked up to a real industrial controller or rocket engine when you hit a breakpoint!). One possible solution is to run the software on a fake clock instead of a hardware clock, so the clock stops when the program stops.

Another example is multi-process sandboxed programs such as Chromium. Since the browser spawns one renderer process per tab, how do you even attach a debugger to it? The developers have made it quite a lot easier with debugging flags and instructions. For instance, this wraps gdb around each renderer process as it starts up:

chrome --renderer-cmd-prefix='xterm -title renderer -e gdb --args'

The point is, you need to build these kinds of things into your product; this greatly aids hackability.
Proper LoggingRead more on the Testing Blog: Optimal Logging by Anthony Vallone

It’s hackability to get the right logs when you need them. It’s easy to fix a crash if you get a stack trace from the error location. It’s far from guaranteed you’ll get such a stack trace, for instance in C++ programs, but this is something you should not stand for. For instance, Chromium had a problem where renderer process crashes didn’t print in test logs, because the test was running in a separate process. This was later fixed, and this kind of investment is worthwhile to make. A clean stack trace is worth a lot more than a “renderer crashed” message.

Logs are also useful for development. It’s an art to determine how much logging is appropriate for a given piece of code, but it is a good idea to keep the default level of logging conservative and give developers the option to turn on more logging for the parts they’re working on (example: Chromium). Too much logging isn’t hackability. This article elaborates further on this topic.

Logs should also be properly symbolized for C/C++ projects; a naked list of addresses in a stack trace isn’t very helpful. This is easy if you build for development (e.g. with -g), but if the crash happens in a release build it’s a bit trickier. You then need to build the same binary with the same flags and use addr2line / ndk-stack / etc to symbolize the stack trace. It’s a good idea to build tools and scripts for this so it’s as easy as possible.
Monitoring and StatisticsIt aids hackability if developers can quickly understand what effect their changes have in the real world. For this, monitoring tools such as Stackdriver for Google Cloudare excellent. If you’re running a service, such tools can help you keep track of request volumes and error rates. This way you can quickly detect that 30% increase in request errors, and roll back that bad code change, before it does too much damage. It also makes it possible to debug your service in production without disrupting it.
System Under Test (SUT) SizeTests and debugging go hand in hand: it’s a lot easier to target a piece of code in a test than in the whole application. Small and focused tests aid debuggability, because when a test breaks there isn’t an enormous SUT to look for errors in. These tests will also be less flaky. This article discusses this fact at length.

Figure 2. The smaller the SUT, the more valuable the test.
You should try to keep the above in mind, particularly when writing integration tests. If you’re testing a mobile app with a server, what bugs are you actually trying to catch? If you’re trying to ensure the app can still talk to the server (i.e. catching protocol drift bugs), you should not involve the UI of the app. That’s not what you’re testing here. Instead, break out the signaling part of the app into a library, test that directly against your local server stack, and write separate tests for the UI that only test the UI.

Smaller SUTs also greatly aids test speed, since there’s less to build, less to bring up and less to keep running. In general, strive to keep the SUT as small as possible through whatever means necessary. It will keep the tests smaller, faster and more focused.
SourcesFigure 1: By Courtesy of the Naval Surface Warfare Center, Dahlgren, VA., 1988. - U.S. Naval Historical Center Online Library Photograph NH 96566-KN, Public Domain,

(Continue to Pillar 3: Infrastructure)
Categories: Testing & QA

Hackable Projects - Pillar 1: Code Health

Google Testing Blog - Thu, 11/10/2016 - 20:33
By: Patrik Höglund
IntroductionSoftware development is difficult. Projects often evolve over several years, under changing requirements and shifting market conditions, impacting developer tools and infrastructure. Technical debt, slow build systems, poor debuggability, and increasing numbers of dependencies can weigh down a project The developers get weary, and cobwebs accumulate in dusty corners of the code base.

Fighting these issues can be taxing and feel like a quixotic undertaking, but don’t worry — the Google Testing Blog is riding to the rescue! This is the first article of a series on “hackability” that identifies some of the issues that hinder software projects and outlines what Google SETIs usually do about them.

According to Wiktionary, hackable is defined as:
hackable ‎(comparative more hackable, superlative most hackable)
  1. (computing) That can be hacked or broken into; insecure, vulnerable. 
  2. That lends itself to hacking (technical tinkering and modification); moddable.

Obviously, we’re not going to talk about making your product more vulnerable (by, say, rolling your own crypto or something equally unwise); instead, we will focus on the second definition, which essentially means “something that is easy to work on.” This has become the mainfocus for SETIs at Google as the role has evolved over the years.
In PracticeIn a hackable project, it’s easy to try things and hard to break things. Hackability means fast feedback cycles that offer useful information to the developer.

This is hackability:
  • Developing is easy
  • Fast build
  • Good, fast tests
  • Clean code
  • Easy running + debugging
  • One-click rollbacks
In contrast, what is not hackability?
  • Broken HEAD (tip-of-tree)
  • Slow presubmit (i.e. checks running before submit)
  • Builds take hours
  • Incremental build/link > 30s
  • Flakytests
  • Can’t attach debugger
  • Logs full of uninteresting information
The Three Pillars of HackabilityThere are a number of tools and practices that foster hackability. When everything is in place, it feels great to work on the product. Basically no time is spent on figuring out why things are broken, and all time is spent on what matters, which is understanding and working with the code. I believe there are three main pillars that support hackability. If one of them is absent, hackability will suffer. They are:

Pillar 1: Code Health“I found Rome a city of bricks, and left it a city of marble.”
   -- Augustus
Keeping the code in good shape is critical for hackability. It’s a lot harder to tinker and modify something if you don’t understand what it does (or if it’s full of hidden traps, for that matter).
TestsUnit and small integration tests are probably the best things you can do for hackability. They’re a support you can lean on while making your changes, and they contain lots of good information on what the code does. It isn’t hackability to boot a slow UI and click buttons on every iteration to verify your change worked - it is hackability to run a sub-second set of unit tests! In contrast, end-to-end (E2E) tests generally help hackability much less (and can evenbe a hindrance if they, or the product, are in sufficiently bad shape).

Figure 1: the Testing Pyramid.
I’ve always been interested in how you actually make unit tests happen in a team. It’s about education. Writing a product such that it has good unit tests is actually a hard problem. It requires knowledge of dependency injection, testing/mocking frameworks, language idioms and refactoring. The difficulty varies by language as well. Writing unit tests in Go or Java is quite easy and natural, whereas in C++ it can be very difficult (and it isn’t exactly ingrained in C++ culture to write unit tests).

It’s important to educate your developers about unit tests. Sometimes, it is appropriate to lead by example and help review unit tests as well. You can have a large impact on a project by establishing a pattern of unit testing early. If tons of code gets written without unit tests, it will be much harder to add unit tests later.

What if you already have tons of poorly tested legacy code? The answer is refactoring and adding tests as you go. It’s hard work, but each line you add a test for is one more line that is easier to hack on.
Readable Code and Code ReviewAt Google, “readability” is a special committer status that is granted per language (C++, Go, Java and so on). It means that a person not only knows the language and its culture and idioms well, but also can write clean, well tested and well structured code. Readability literally means that you’re a guardian of Google’s code base and should push back on hacky and ugly code. The use of a style guide enforces consistency, and code review (where at least one person with readability must approve) ensures the code upholds high quality. Engineers must take care to not depend too much on “review buddies” here but really make sure to pull in the person that can give the best feedback.

Requiring code reviews naturally results in small changes, as reviewers often get grumpy if you dump huge changelists in their lap (at least if reviewers are somewhat fast to respond, which they should be). This is a good thing, since small changes are less risky and are easy to roll back. Furthermore, code review is good for knowledge sharing. You can also do pair programming if your team prefers that (a pair-programmed change is considered reviewed and can be submitted when both engineers are happy). There are multiple open-source review tools out there, such as Gerrit.

Nice, clean code is great for hackability, since you don’t need to spend time to unwind that nasty pointer hack in your head before making your changes. How do you make all this happen in practice? Put together workshops on, say, the SOLID principles, unit testing, or concurrency to encourage developers to learn. Spread knowledge through code review, pair programming and mentoring (such as with the Readability concept). You can’t just mandate higher code quality; it takes a lot of work, effort and consistency.
Presubmit Testing and LintConsistently formatted source code aids hackability. You can scan code faster if its formatting is consistent. Automated tooling also aids hackability. It really doesn’t make sense to waste any time on formatting source code by hand. You should be using tools like gofmt, clang-format, etc. If the patch isn’t formatted properly, you should see something like this (example from Chrome):

$ git cl upload
Error: the media/audio directory requires formatting. Please run
git cl format media/audio.

Source formatting isn’t the only thing to check. In fact, you should check pretty much anything you have as a rule in your project. Should other modules not depend on the internals of your modules? Enforce it with a check. Are there already inappropriate dependencies in your project? Whitelist the existing ones for now, but at least block new bad dependencies from forming. Should our app work on Android 16 phones and newer? Add linting, so we don’t use level 17+ APIs without gating at runtime. Should your project’s VHDL code always place-and-route cleanly on a particular brand of FPGA? Invoke the layout tool in your presubmit and and stop submit if the layout process fails.

Presubmit is the most valuable real estate for aiding hackability. You have limited space in your presubmit, but you can get tremendous value out of it if you put the right things there. You should stop all obvious errors here.

It aids hackability to have all this tooling so you don’t have to waste time going back and breaking things for other developers. Remember you need to maintain the presubmit well; it’s not hackability to have a slow, overbearing or buggy presubmit. Having a good presubmit can make it tremendously more pleasant to work on a project. We’re going to talk more in later articles on how to build infrastructure for submit queues and presubmit.
Single Branch And Reducing RiskHaving a single branch for everything, and putting risky new changes behind feature flags, aids hackability since branches and forks often amass tremendous risk when it’s time to merge them. Single branches smooth out the risk. Furthermore, running all your tests on many branches is expensive. However, a single branch can have negative effects on hackability if Team A depends on a library from Team B and gets broken by Team B a lot. Having some kind of stabilization on Team B’s software might be a good idea there. Thisarticle covers such situations, and how to integrate often with your dependencies to reduce the risk that one of them will break you.
Loose Coupling and TestabilityTightly coupled code is terrible for hackability. To take the most ridiculous example I know: I once heard of a computer game where a developer changed a ballistics algorithm and broke the game’s chat. That’s hilarious, but hardly intuitive for the poor developer that made the change. A hallmark of loosely coupled code is that it’s upfront about its dependencies and behavior and is easy to modify and move around.

Loose coupling, coherence and so on is really about design and architecture and is notoriously hard to measure. It really takes experience. One of the best ways to convey such experience is through code review, which we’ve already mentioned. Education on the SOLID principles, rules of thumb such as tell-don’t-ask, discussions about anti-patterns and code smells are all good here. Again, it’s hard to build tooling for this. You could write a presubmit check that forbids methods longer than 20 lines or cyclomatic complexity over 30, but that’s probably shooting yourself in the foot. Developers would consider that overbearing rather than a helpful assist.

SETIs at Google are expected to give input on a product’s testability. A few well-placed test hooks in your product can enable tremendously powerful testing, such as serving mock content for apps (this enables you to meaningfully test app UI without contacting your real servers, for instance). Testability can also have an influence on architecture. For instance, it’s a testability problem if your servers are built like a huge monolith that is slow to build and start, or if it can’t boot on localhost without calling external services. We’ll cover this in the next article.
Aggressively Reduce Technical DebtIt’s quite easy to add a lot of code and dependencies and call it a day when the software works. New projects can do this without many problems, but as the project becomes older it becomes a “legacy” project, weighed down by dependencies and excess code. Don’t end up there. It’s bad for hackability to have a slew of bug fixes stacked on top of unwise and obsolete decisions, and understanding and untangling the software becomes more difficult.

What constitutes technical debt varies by project and is something you need to learn from experience. It simply means the software isn’t in optimal form. Some types of technical debt are easy to classify, such as dead code and barely-used dependencies. Some types are harder to identify, such as when the architecture of the project has grown unfit to the task from changing requirements. We can’t use tooling to help with the latter, but we can with the former.

I already mentioned that dependency enforcement can go a long way toward keeping people honest. It helps make sure people are making the appropriate trade-offs instead of just slapping on a new dependency, and it requires them to explain to a fellow engineer when they want to override a dependency rule. This can prevent unhealthy dependencies like circular dependencies, abstract modules depending on concrete modules, or modules depending on the internals of other modules.

There are various tools available for visualizing dependency graphs as well. You can use these to get a grip on your current situation and start cleaning up dependencies. If you have a huge dependency you only use a small part of, maybe you can replace it with something simpler. If an old part of your app has inappropriate dependencies and other problems, maybe it’s time to rewrite that part.

(Continue to Pillar 2: Debuggability)
Categories: Testing & QA

Hackable Projects - Pillar 3: Infrastructure

Google Testing Blog - Thu, 11/10/2016 - 20:12
By: Patrik Höglund

This is the third article in our series on Hackability; also see the first and second article.

We have seen in our previous articles how Code Health and Debuggability can make a project much easier to work on. The third pillar is a solid infrastructure that gets accurate feedback to your developers as fast as possible. Speed is going to be major theme in this article, and we’ll look at a number of things you can do to make your project easier to hack on.

Build Systems Speed
Question: What’s a change you’d really like to see in our development tools?

“I feel like this question gets asked almost every time, and I always give the same answer:
 I would like them to be faster.”
        -- Ian Lance Taylor

Replace make with ninja. Use the gold linker instead of ld. Detect and delete dead code in your project (perhaps using coverage tools). Reduce the number of dependencies, and enforce dependency rules so new ones are not added lightly. Give the developers faster machines. Use distributed build, which is available with many open-source continuous integration systems (or use Google’s system, Bazel!). You should do everything you can to make the build faster.

Figure 1: “Cheetah chasing its prey” by Marlene Thyssen.

Why is that? There’s a tremendous difference in hackability if it takes 5 seconds to build and test versus one minute, or even 20 minutes, to build and test. Slow feedback cycles kill hackability, for many reasons:
  • Build and test times longer than a handful of seconds cause many developers’ minds to wander, taking them out of the zone.
  • Excessive build or release times* makes tinkering and refactoring much harder. All developers have a threshold when they start hedging (e.g. “I’d like to remove this branch, but I don’t know if I’ll break the iOS build”) which causes refactoring to not happen.

* The worst I ever heard of was an OS that took 24 hours to build!

How do you actually make fast build systems? There are some suggestions in the first paragraph above, but the best general suggestion I can make is to have a few engineers on the project who deeply understand the build systems and have the time to continuously improve them. The main axes of improvement are:
  1. Reduce the amount of code being compiled.
  2. Replace tools with faster counterparts.
  3. Increase processing power, maybe through parallelization or distributed systems.
Note that there is a big difference between full builds and incremental builds. Both should be as fast as possible, but incremental builds are by far the most important to optimize. The way you tackle the two is different. For instance, reducing the total number of source files will make a full build faster, but it may not make an incremental build faster. 
To get faster incremental builds, in general, you need to make each source file as decoupled as possible from the rest of the code base. The less a change ripples through the codebase, the less work to do, right? See “Loose Coupling and Testability” in Pillar 1 for more on this subject. The exact mechanics of dependencies and interfaces depends on programming language - one of the hardest to get right is unsurprisingly C++, where you need to be disciplined with includes and forward declarations to get any kind of incremental build performance. 
Build scripts and makefiles should be held to standards as high as the code itself. Technical debt and unnecessary dependencies have a tendency to accumulate in build scripts, because no one has the time to understand and fix them. Avoid this by addressing the technical debt as you go.

Continuous Integration and Presubmit Queues
You should build and run tests on all platforms you release on. For instance, if you release on all the major desktop platforms, but all your developers are on Linux, this becomes extra important. It’s bad for hackability to update the repo, build on Windows, and find that lots of stuff is broken. It’s even worse if broken changes start to stack on top of each other. I think we all know that terrible feeling: when you’re not sure your change is the one that broke things.
At a minimum, you should build and test on all platforms, but it’s even better if you do it in presubmit. The Chromium submit queue does this. It has developed over the years so that a normal patch builds and tests on about 30 different build configurations before commit. This is necessary for the 400-patches-per-day velocity of the Chrome project. Most projects obviously don’t have to go that far. Chromium’s infrastructure is based off BuildBot, but there are many other job scheduling systems depending on your needs.
Figure 2: How a Chromium patch is tested.
As we discussed in Build Systems, speed and correctness are critical here. It takes a lot of ongoing work to keep build, tests, and presubmits fast and free of flakes. You should never accept flakes, since developers very quickly lose trust in flaky tests and systems. Tooling can help a bit with this; for instance, see the Chromium flakiness dashboard.

Test Speed
Speed is a feature, and this is particularly true for developer infrastructure. In general, the longer a test takes to execute, the less valuable it is. My rule of thumb is: if it takes more than a minute to execute, its value is greatly diminished. There are of course some exceptions, such as soak tests or certain performance tests. 
Figure 3. Test value.
If you have tests that are slower than 60 seconds, they better be incredibly reliable and easily debuggable. A flaky test that takes several minutes to execute often has negative value because it slows down all work in the code it covers. You probably want to build better integration tests on lower levels instead, so you can make them faster and more reliable.
If you have many engineers on a project, reducing the time to run the tests can have a big impact. This is one reason why it’s great to have SETIs or the equivalent. There are many things you can do to improve test speed:
  • Sharding and parallelization. Add more machines to your continuous build as your test set or number of developers grows.
  • Continuously measure how long it takes to run one build+test cycle in your continuous build, and have someone take action when it gets slower.
  • Remove tests that don’t pull their weight. If a test is really slow, it’s often because of poorly written wait conditions or because the test bites off more than it can chew (maybe that unit test doesn’t have to process 15000 audio frames, maybe 50 is enough).
  • If you have tests that bring up a local server stack, for instance inter-server integration tests, making your servers boot faster is going to make the tests faster as well. Faster production code is faster to test! See Running on Localhost, in Pillar 2 for more on local server stacks.

Workflow Speed
We’ve talked about fast builds and tests, but the core developer workflows are also important, of course. Chromium undertook multi-year project to switch from Subversion to Git, partly because Subversion was becoming too slow. You need to keep track of your core workflows as your project grows. Your version control system may work fine for years, but become too slow once the project becomes big enough. Bug search and management must also be robust and fast, since this is generally systems developers touch every day.

Release Often
It aids hackability to deploy to real users as fast as possible. No matter how good your product's tests are, there's always a risk that there's something you haven't thought of. If you’re building a service or web site, you should aim to deploy multiple times per week. For client projects, Chrome’s six-weeks cycle is a good goal to aim for.
You should invest in infrastructure and tests that give you the confidence to do this - you don’t want to push something that’s broken. Your developers will thank you for it, since it makes their jobs so much easier. By releasing often, you mitigate risk, and developers will rush less to hack late changes in the release (since they know the next release isn’t far off).

Easy Reverts
If you look in the commit log for the Chromium project, you will see that a significant percentage of the commits are reverts of a previous commits. In Chromium, bad commits quickly become costly because they impede other engineers, and the high velocity can cause good changes to stack onto bad changes. 
Figure 4: Chromium’s revert button.

This is why the policy is “revert first, ask questions later”. I believe a revert-first policy is good for small projects as well, since it creates a clear expectation that the product/tools/dev environment should be working at all times (and if it doesn’t, a recent change should probably be reverted).
It has a wonderful effect when a revert is simple to make. You can suddenly make speculative reverts if a test went flaky or a performance test regressed. It follows that if a patch is easy to revert, so is the inverse (i.e. reverting the revert or relanding the patch). So if you were wrong and that patch wasn’t guilty after all, it’s simple to re-land it again and try reverting another patch. Developers might initially balk at this (because it can’t possibly be their patch, right?), but they usually come around when they realize the benefits.
For many projects, a revert can simply begit revert 9fbadbeefgit push origin master

If your project (wisely) involves code review, it will behoove you to add something like Chromium’s revert button that I mentioned above. The revert button will create a special patch that bypasses review and tests (since we can assume a clean revert takes us back to a more stable state rather than the opposite). See Pillar 1 for more on code review and its benefits.
For some projects, reverts are going to be harder, especially if you have a slow or laborious release process. Even if you release often, you could still have problems if a revert involves state migrations in your live services (for instance when rolling back a database schema change). You need to have a strategy to deal with such state changes. 

Reverts must always put you back to safer ground, and everyone must be confident they can safely revert. If not, you run the risk of massive fire drills and lost user trust if a bad patch makes it through the tests and you can’t revert it.

Performance Tests: Measure Everything
Is it critical that your app starts up within a second? Should your app always render at 60 fps when it’s scrolled up or down? Should your web server always serve a response within 100 ms? Should your mobile app be smaller than 8 MB? If so, you should make a performance test for that. Performance tests aid hackability since developers can quickly see how their change affects performance and thus prevent performance regressions from making it into the wild.
You should run your automated performance tests on the same device; all devices are different and this will reflect in the numbers. This is fairly straightforward if you have a decent continuous integration system that runs tests sequentially on a known set of worker machines. It’s harder if you need to run on physical phones or tablets, but it can be done. 
A test can be as simple as invoking a particular algorithm and measuring the time it takes to execute it (median and 90th percentile, say, over N runs).
Figure 5: A VP8-in-WebRTC regression (bug) and its fix displayed in a Catapult dashboard.

Write your test so it outputs performance numbers you care about. But what to do with those numbers? Fortunately, Chrome’s performance test framework has been open-sourced, which means you can set up a dashboard, with automatic regression monitoring, with minimal effort. The test framework also includes the powerful Telemetry framework which can run actions on web pages and Android apps and report performance results. Telemetry and Catapult are battle-tested by use in the Chromium project and are capable of running on a wide set of devices, while getting the maximum of usable performance data out of the devices.

Figure 1: By Malene Thyssen (Own work) [CC BY-SA 3.0 (], via Wikimedia Commons

Categories: Testing & QA

The Software Development Knights Who Say “No!”

From the Editor of Methods & Tools - Wed, 11/09/2016 - 17:13
In the movie “Monty Python and the Holy Grail“, King Arthur and the Knights of the Round Table had to face the obstacle of the knights who say “ni!” to travel further in their quest for the Grail. In the modern software development world, it seems that software developers can instead follow the knights that […]

Software Architecture for the Unknown

From the Editor of Methods & Tools - Thu, 11/03/2016 - 16:35
In this SATURN 2016 Keynote, Grady Booch discusses about architecting the unknown. There are many systems that we know how to architect (usually because we’ve built them many times before). There also many systems for which we know a process that will lead us to a reasonable architecture (usually because the forces on our project […]

A Whirlwind Tour of the Kotlin Type Hierarchy

Mistaeks I Hav Made - Nat Pryce - Fri, 10/28/2016 - 09:08
Kotlin has plenty of good language documentation and tutorials. But I’ve not found an article that describes in one place how Kotlin’s type hierarchy fits together. That’s a shame, because I find it to be really neat1. Kotlin’s type hierarchy has very few rules to learn. Those rules combine together consistently and predictably. Thanks to those rules, Kotlin can provide useful, user extensible language features – null safety, polymorphism, and unreachable code analysis – without resorting to special cases and ad-hoc checks in the compiler and IDE. Starting from the Top All types of Kotlin object are organised into a hierarchy of subtype/supertype relationships. At the “top” of that hierarchy is the abstract class Any. For example, the types String and Int are both subtypes of Any. Any is the equivalent of Java’s Object class. Unlike Java, Kotlin does not draw a distinction between “primitive” types, that are intrinsic to the language, and user-defined types. They are all part of the same type hierarchy. If you define a class that is not explicitly derived from another class, the class will be an immediate subtype of Any. class Fruit(val ripeness: Double) If you do specify a base class for a user-defined class, the base class will be the immediate supertype of the new class, but the ultimate ancestor of the class will be the type Any. abstract class Fruit(val ripeness: Double) class Banana(ripeness: Double, val bendiness: Double): Fruit(ripeness) class Peach(ripeness: Double, val fuzziness: Double): Fruit(ripeness) If your class implements one or more interfaces, it will have multiple immediate supertypes, with Any as the ultimate ancestor. interface ICanGoInASalad interface ICanBeSunDried class Tomato(ripeness: Double): Fruit(ripeness), ICanGoInASalad, ICanBeSunDried The Kotlin type checker enforces subtype/supertype relationships. For example, you can store a subtype into a supertype variable: var f: Fruit = Banana(bendiness=0.5) f = Peach(fuzziness=0.8) But you cannot store a supertype value into a subtype variable: val b = Banana(bendiness=0.5) val f: Fruit = b val b2: Banana = f // Error: Type mismatch: inferred type is Fruit but Banana was expected Nullable Types Unlike Java, Kotlin distinguishes between “non-null” and “nullable” types. The types we’ve seen so far are all “non-null”. Kotlin does not allow null to be used as a value of these types. You’re guaranteed that dereferencing a reference to a value of a “non-null” type will never throw a NullPointerException. The type checker rejects code that tries to use null or a nullable type where a non-null type is expected. For example: var s : String = null // Error: Null can not be a value of a non-null type String If you want a value to maybe be null, you need to use the nullable equivalent of the value type, denoted by the suffix ‘?’. For example, the type String? is the nullable equivalent String, and so allows all String values plus null. var s : String? = null s = "foo" s = null s = bar The type checker ensures that you never use a nullable value without having first tested that it is not null. Kotlin provides operators to make working with nullable types more convenient. See the Null Safety section of the Kotlin language reference for examples. When non-null types are related by subtyping, their nullable equivalents are also related in the same way. For example, because String is a subtype of Any, String? is a subtype of Any?, and because Banana is a subtype of Fruit, Banana? is a subtype of Fruit?. Just as Any is the root of the non-null type hierarchy, Any? is the root of the nullable type hierarchy. Because Any? is the supertype of Any, Any? is the very top of Kotlin’s type hierarchy. A non-null type is a subtype of its nullable equivalent. For example, String, as well as being a subtype of Any, is also a subtype of String?. This is why you can store a non-null String value into a nullable String? variable, but you cannot store a nullable String? value into a non-null String variable. Kotlin’s null safety is not enforced by special rules, but is an outcome of the same subtype/supertype rules that apply between non-null types. This applies to user-defined type hierarchies as well. Unit Kotlin is an expression oriented language. All control flow statements (apart from variable assignment, unusually) are expressions. Kotlin does not have void functions, like Java and C. Functions always return a value. Functions that don’t actually calculate anything – being called for their side effect, for example – return Unit, a type that has a single value, also called Unit. Most of the time you don’t need to explicitly specify Unit as a return type or return Unit from functions. If you write a function with a block body and do not specify the result type, the compiler will treat it as a Unit function. Otherwise the compiler will infer it. fun example() { println("block body and no explicit return type, so returns Unit") } val u: Unit = example() There’s nothing special about Unit. Like any other type, it’s a subtype of Any. It can be made nullable, so is a subtype of Unit?, which is a subtype of Any?. The type Unit? is a strange little edge case, a result of the consistency of Kotlin’s type system. It has only two members: the Unit value and null. I’ve never found a need to use it explicitly, but the fact that there is no special case for “void” in the type system makes it much easier to treat all kinds of functions generically. Nothing At the very bottom of the Kotlin type hierarchy is the type Nothing. As its name suggests, Nothing is a type that has no instances. An expression of type Nothing does not result in a value. Note the distinction between Unit and Nothing. Evaluation of an expression type Unit results in the singleton value Unit. Evaluation of an expression of type Nothing never returns at all. This means that any code following an expression of type Nothing is unreachable. The compiler and IDE will warn you about such unreachable code. What kinds of expression evaluate to Nothing? Those that perform control flow. For example, the throw keyword interrupts the calculation of an expression and throws an exception out of the enclosing function. A throw is therefore an expression of type Nothing. By having Nothing as a subtype of every other type, the type system allows any expression in the program to actually fail to calculate a value. This models real world eventualities, such as the JVM running out of memory while calculating an expression, or someone pulling out the computer’s power plug. It also means that we can throw exceptions from within any expression. fun formatCell(value: Double): String = if (value.isNaN()) throw IllegalArgumentException("$value is not a number") else value.toString() It may come as a surprise to learn that the return statement has the type Nothing. Return is a control flow statement that immediately returns a value from the enclosing function, interrupting the evaluation of any expression of which it is a part. fun formatCellRounded(value: Double): String = val rounded: Long = if (value.isNaN()) return "#ERROR" else Math.round(value) rounded.toString() A function that enters an infinite loop or kills the current process has a result type of Nothing. For example, the Kotlin standard library declares the exitProcess function as: fun exitProcess(status: Int): Nothing If you write your own function that returns Nothing, the compiler will check for unreachable code after a call to your function just as it does with built-in control flow statements. inline fun forever(action: ()->Unit): Nothing { while(true) action() } fun example() { forever { println("doing...") } println("done") // Warning: Unreachable code } Like null safety, unreachable code analysis is not implemented by ad-hoc, special-case checks in the IDE and compiler, as it has to be in Java. It’s a function of the type system. Nullable Nothing? Nothing, like any other type, can be made nullable, giving the type Nothing?. Nothing? can only contain one value: null. In fact, Nothing? is the type of null. Nothing? is the ultimate subtype of all nullable types, which lets the value null be used as a value of any nullable type. Conclusion When you consider it all at once, Kotlin’s entire type hierarchy can feel quite complicated. But never fear! I hope this article has demonstrated that Kotlin has a simple and consistent type system. There are few rules to learn: a hierarchy of supertype/subtype relationships with Any? at the top and Nothing at the bottom, and subtype relationships between non-null and nullable types. That’s it. There are no special cases. Useful language features like null safety, object-oriented polymorphism, and unreachable code analysis all result from these simple, predictable rules. Thanks to this consistency, Kotlin’s type checker is a powerful tool that helps you write concise, correct programs. “Neat” meaning “done with or demonstrating skill or efficiency”, rather than the Kevin Costner backstage at a Madonna show sense of the word↩
Categories: Programming, Testing & QA

Software Development Conferences Forecast October 2016

From the Editor of Methods & Tools - Tue, 10/25/2016 - 14:02
Here is a list of software development related conferences and events on Agile project management ( Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

Creating EC2 and Route 53 resources with Terraform

Agile Testing - Grig Gheorghiu - Thu, 10/20/2016 - 22:14
Inspired by the great series of Terraform-related posts published on the Gruntwork blog, I've been experimenting with Terraform the last couple of days. So far, I like it a lot, and I think the point that Yevgeniy Brikman makes in the first post of the series, on why they chose Terraform over other tools (which are the usual suspects Chef, Puppet, Ansible), is a very valid point: Terraform is a declarative and client-only orchestration tool that allows you to manage immutable infrastructure. Read that post for more details about why this is a good thing.

In this short post I'll show how I am using Terraform in its Docker image incarnation to create an AWS ELB and a Route 53 CNAME record pointing to the name of the newly-created ELB.

I created a directory called terraform and created this Dockerfile inside it:

$ cat Dockerfile
FROM hashicorp/terraform:full
COPY data /data/

My Terraform configuration files are under terraform/data locally. I have 2 files, one for variable declarations and one for the actual resource declarations.

Here is the variable declaration file:

$ cat data/

variable "access_key" {}
variable "secret_key" {}
variable "region" {
  default = "us-west-2"

variable "exposed_http_port" {
  description = "The HTTP port exposed by the application"
  default = 8888

variable "security_group_id" {
  description = "The ID of the ELB security group"
  default = "sg-SOMEID"

variable "host1_id" {
  description = "EC2 Instance ID for ELB Host #1"
  default = "i-SOMEID1"

variable "host2_id" {
  description = "EC2 Instance ID for ELB Host #2"
  default = "i-SOMEID2"

variable "elb_cname" {
  description = "CNAME for the ELB"

variable "route53_zone_id" {
  description = "Zone ID for the Route 53 zone "
  default = "MY_ZONE_ID"

Note that I am making some assumptions, namely that the ELB will point to 2 EC2 instances. This is because I know beforehand which 2 instances I want to point it to. In my case, those instances are configured as Rancher hosts, and the ELB's purpose is to expose as port 80 to the world an internal Rancher load balancer port (say 8888).

Here is the resource declaration file:

$ cat data/

# --------------------------------------------------------
# --------------------------------------------------------

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"

# --------------------------------------------------------
# --------------------------------------------------------

resource "aws_elb" "my-elb" {
  name = "MY-ELB"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

  listener {
    instance_port = "${var.exposed_http_port}"
    instance_protocol = "http"
    lb_port = 80
    lb_protocol = "http"

  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 2
    timeout = 3
    target = "TCP:${var.exposed_http_port}"
    interval = 10

  instances = ["${var.host1_id}", "${var.host2_id}"]
  security_groups = ["${var.security_group_id}"]
  cross_zone_load_balancing = true
  idle_timeout = 400
  connection_draining = true
  connection_draining_timeout = 400

  tags {
    Name = "MY-ELB"

# --------------------------------------------------------
# --------------------------------------------------------

resource "aws_route53_record" "my-elb-cname" {
  zone_id = "${var.route53_zone_id}"
  name = "${var.elb_cname}"
  type = "CNAME"
  ttl = "300"
  records = ["${}"]

First I declare a provider of type aws, then I declare 2 resources, one of type aws_elb, and another one of type aws_route53_record. These types are all detailed in the very good Terraform documentation for the AWS provider.

The aws_elb resource defines an ELB named my-elb, which points to the 2 EC2 instances mentioned above. The instances are specified by their variable names from, using the syntax ${var.VARIABLE_NAME}, e.g. ${var.host1_id}. For the other properties of the ELB resource, consult the Terraform aws_elb documentation.

The aws_route53_record defined a CNAME record in the given Route 53 zone file (specified via the route53_zone_id variable). An important thing to note here is that the CNAME points to the name of the ELB just created via the variable. This is one of the powerful things you can do in Terraform - reference properties of resources in other resources.

Again, for more details on aws_route53_record, consult the Terraform documentation.

Given these files, I built a local Docker image:

$ docker build -t terraform:local

I can then run the Terraform 'plan' command to see what Terraform intends to do:

$ docker run -t --rm terraform:local plan \
-var "access_key=$TERRAFORM_AWS_ACCESS_KEY" \
-var "secret_key=$TERRAFORM_AWS_SECRET_KEY" \
-var "exposed_http_port=$LB_EXPOSED_HTTP_PORT" \
-var "elb_cname=$ELB_CNAME"

The nice thing about this is that I can run Terraform in exactly the same way via Jenkins. The variables above are defined in Jenkins either as credentials of type 'secret text' (the 2 AWS keys), or as build parameters of type string. In Jenkins, the Docker image name would be specified as an ECR image, something of the type

After making sure that the plan corresponds to what I expected, I ran the Terraform apply command:

$ docker run -t --rm terraform:local apply \
-var "access_key=$TERRAFORM_AWS_ACCESS_KEY" \
-var "secret_key=$TERRAFORM_AWS_SECRET_KEY" \
-var "exposed_http_port=$LB_EXPOSED_HTTP_PORT" \
-var "elb_cname=$ELB_CNAME"

One thing to note is that the AWS credentials I used are for an IAM user that has only the privileges needed to create the resources I need. I tinkered with the IAM policy generator until I got it right. Terraform will emit various AWS errors when it's not able to make certain calls. Those errors help you add the required privileges to the IAM policies. In my case, here are some example of policies.

Allow all ELB operations:

"Version": "2012-10-17",
"Statement": [
"Sid": "Stmt1476919435000",
"Effect": "Allow",
"Action": [ "elasticloadbalancing:*"
"Resource": [
Allow the ec2:DescribeSecurityGroups operation:

"Version": "2012-10-17",
"Statement": [
"Sid": "Stmt1476983196000",
"Effect": "Allow",
"Action": [
"Resource": [
Allow the route53:GetHostedZone and GetChange operations:
"Version": "2012-10-17",
"Statement": [
"Sid": "Stmt1476986919000",
"Effect": "Allow",
"Action": [
"Resource": [

Allow the creation, changing and listing of Route 53 record sets in a given Route 53 zone file:
"Version": "2012-10-17",
"Statement": [
"Sid": "Stmt1476987070000",
"Effect": "Allow",
"Action": [
"Resource": [

Software Development Linkopedia October 2016

From the Editor of Methods & Tools - Wed, 10/19/2016 - 11:02
Here is our monthly selection of knowledge on programming, software testing and project management. This month you will find some interesting information and opinions about the life and career of a software developer, technical debt, DevOps, software testing, metrics, microservices, API and mobile testing. Blog: Being A Developer After 40 Blog: Tech Debt Snowball – […]

Quote of the Month October 2016

From the Editor of Methods & Tools - Wed, 10/12/2016 - 12:00
If you blindly accept what clients say they want and proceed with a project on that basis, both you and the clients may be in for a rude awakening. You will have built something the clients cannot use. Often in the process of building a solution, the clients learn what they need is not the […]

A Critical Review of the IoT Hype

From the Editor of Methods & Tools - Thu, 10/06/2016 - 16:58
According to the Gartner Hype Cycle 2015, Internet of Things (IoT) is on the “Peak of Inflated Expectations”. So it’s time to take a good hard look past the promises of IoT , to see what it is really accomplishing. What is IoT really? What do companies mean when they dream about the IoT revolution? […]

Accountability for What You Say is Dangerous and That’s Okay

James Bach’s Blog - Sat, 10/01/2016 - 20:33

[Note: I offered Maaret Pyhäjärvi the right to review this post and suggest edits to it before I published it. She declined.]

A few days ago I was keynoting at the New Testing Conference, in New York City, and I used a slide that has offended some people on Twitter. This blog post is intended to explore that and hopefully improve the chances that if you think I’m a bad guy, you are thinking that for the right reasons and not making a mistake. It’s never fun for me to be a part of something that brings pain to other people. I believe my actions were correct, yet still I am sorry that I caused Maaret hurt, and I will try to think of ways to confer better in the future.

Here’s the theme of this post: Getting up in front of the world to speak your mind is a dangerous process. You will be misunderstood, and that will feel icky. Whether or not you think of yourself as a leader, speaking at a conference IS an act of leadership, and leadership carries certain responsibilities.

I long ago learned to let go of the outcome when I speak in public. I throw the ideas out there, and I do that as an American Aging Overweight Left-Handed Atheist Married Father-And-Father-Figure Rough-Mannered Bearded Male Combative Aggressive Assertive High School Dropout Self-Confident Freedom-Loving Sometimes-Unpleasant-To-People-On-Twitter Intellectual. I know that my ideas will not be considered in a neutral context, but rather in the context of how people feel about all that. I accept that.  But, I have been popular and successful as a speaker in the testing world, so maybe, despite all the difficulties, enough of my message and intent gets through, overall.

What I can’t let go of is my responsibility to my audience and the community at large to speak the truth and to do so in a compassionate and reasonable way. Regardless of what anyone else does with our words, I believe we speakers need to think about how our actions help or harm others. I think a lot about this.

Let me clarify. I’m not saying it’s wrong to upset people or to have disagreement. We have several different culture wars (my reviewers said “do you have to say wars?”) going on in the software development and testing worlds right now, and they must continue or be resolved organically in the marketplace of ideas. What I’m saying is that anyone who speaks out publicly must try to be cognizant of what words do and accept the right of others to react.

Although I’m surprised and certainly annoyed by the dark interpretations some people are making of what I did, the burden of such feelings is what I took on when I first put myself forward as a public scold about testing and software engineering, a quarter century ago. My annoyance about being darkly interpreted is not your problem. Your problem, assuming you are reading this and are interested in the state of the testing craft, is to feel what you feel and think what you think, then react as best fits your conscience. Then I listen and try to debug the situation, including helping you debug yourself while I debug myself. This process drives the evolution of our communities. Jay Philips, Ash Coleman, Mike Talks, Ilari Henrik Aegerter, Keith Klain, Anna Royzman, Anne-Marie Charrett, David Greenlees, Aaron Hodder, Michael Bolton, and my own wife all approached me with reactions that helped me write this post. Some others approached me with reactions that weren’t as helpful, and that’s okay, too.

Leadership and The Right of Responding to Leaders

In my code of conduct, I don’t get to say “I’m not a leader.” I can say no one works for me and no one has elected me, but there is more to leadership than that. People with strong voices and ideas gain a certain amount of influence simply by virtue of being interesting. I made myself interesting, and some people want to hear what I have to say. But that comes with an implied condition that I behave reasonably. The community, over time negotiates what “reasonable” means. I am both a participant and a subject of those negotiations. I recommend that we hold each other accountable for our public, professional words. I accept accountability for mine. I insist that this is true for everyone else. Please join me in that insistence.

People who speak at conferences are tacitly asserting that they are thought leaders– that they deserve to influence the community. If that influence comes with a rule that “you can’t talk about me without my permission” it would have a chilling effect on progress. You can keep to yourself, of course; but if you exercise your power of speech in a public forum you cannot cry foul when someone responds to you. Please join me in my affirmation that we all have the right of response when a speaker takes the microphone to keynote at a conference.

Some people have pointed out that it’s not okay to talk back to performers in a comedy show or Broadway play. Okay. So is that what a conference is to you? I guess I believe that conferences should not be for show. Conferences are places for conferring. However, I can accept that some parts of a conference might be run like infomercials or circus acts. There could be a place for that.

The Slide

Here is the slide I used the other day:


Before I explain this slide, try to think what it might mean. What might its purposes be? That’s going to be difficult, without more information about the conference and the talks that happened there. Here are some things I imagine may be going through your mind:

  • There is someone whose name is Maaret who James thinks he’s different from.
  • He doesn’t trust nice people. Nice people are false. Is Maaret nice and therefore he doesn’t trust her, or does Maaret trust nice people and therefore James worries that she’s putting herself at risk?
  • Is James saying that niceness is always false? That’s seems wrong. I have been nice to people whom I genuinely adore.
  • Is he saying that it is sometimes false? I have smiled and shook hands with people I don’t respect, so, yes, niceness can be false. But not necessarily. Why didn’t he put qualifying language there?
  • He likes debate and he thinks that Maaret doesn’t? Maybe she just doesn’t like bad debate. Did she actually say she doesn’t like debate?
  • What if I don’t like debate, does that mean I’m not part of this community?
  • He thinks excellence requires attention and energy and she doesn’t?
  • Why is James picking on Maaret?

Look, if all I saw was this slide, I might be upset, too. So, whatever your impression is, I will explain the slide.

Like I said I was speaking at a conference in NYC. Also keynoting was Maaret Pyhäjärvi. We were both speaking about the testing role. I have some strong disagreements with Maaret about the social situation of testers. But as I watched her talk, I was a little surprised at how I agreed with the text and basic concepts of most of Maaret’s actual slides, and a lot of what she said. (I was surprised because Maaret and I have a history. We have clashed in person and on Twitter.) I was a bit worried that some of what I was going to say would seem like a rehash of what she just did, and I didn’t want to seem like I was papering over the serious differences between us. That’s why I decided to add a contrast slide to make sure our differences weren’t lost in the noise. This means a slide that highlights differences, instead of points of connection. There were already too many points of connection.

The slide was designed specifically:

  • for people to see who were in a specific room at a specific time.
  • for people who had just seen a talk by Maaret which established the basis of the contrast I was making.
  • about differences between two people who are both in the spotlight of public discourse.
  • to express views related to technical culture, not general social culture.
  • to highlight the difference between two talks for people who were about to see the second talk that might seem similar to the first talk.
  • for a situation where both I and Maaret were present in the room during the only time that this slide would ever be seen (unless someone tweeted it to people who would certainly not understand the context).
  • as talking points to accompany my live explanation (which is on video and I assume will be public, someday).
  • for a situation where I had invited anyone in the audience, including Maaret, to ask me questions or make challenges.

These people had just seen Maaret’s talk and were about to see mine. In the room, I explained the slide and took questions about it. Maaret herself spoke up about it, for which I publicly thanked her for doing so. It wasn’t something I was posting with no explanation or context. Nor was it part of the normal slides of my keynote.

Now I will address some specific issues that came up on Twitter:

1. On Naming Maaret

Maaret has expressed the belief that no one should name another person in their talk without getting their permission first. I vigorously oppose that notion. It’s completely contrary to the workings of a healthy society. If that principle is acceptable, then you must agree that there should be no free press. Instead, I would say if you stand up and speak in the guise of an expert, then you must be personally accountable for what you say. You are fair game to be named and critiqued. And the weird thing is that Maaret herself, regardless of what she claims to believe, behaves according to my principle of freedom to call people out. She, herself, tweeted my slide and talked about me on Twitter without my permission. Of course, I think that is perfectly acceptable behavior, so I’m not complaining. But it does seem to illustrate that community discourse is more complicated than “be nice” or “never cause someone else trouble with your speech” or “don’t talk about people publicly unless they gave you permission.”

2. On Being Nice

Maaret had a slide in her talk about how we can be kind to each other even though we disagree. I remember her saying the word “nice” but she may have said “kind” and I translated that into “nice” because I believed that’s what she meant. I react to that because, as a person who believes in the importance of integrity and debate over getting along for the sake of appearances, I observe that exhortations to “be nice” or even to “be kind” are often used when people want to quash disturbing ideas and quash the people who offer them. “Be nice” is often code for “stop arguing.” If I stop arguing, much of my voice goes away. I’m not okay with that. No one who believes there is trouble in the world should be okay with that. Each of us gets to have a voice.

I make protests about things that matter to me, you make protests about things that matter to you.

I think we need a way of working together that encourages debate while fostering compassion for each other. I use the word compassion because I want to get away from ritualized command phrases like “be nice.” Compassion is a feeling that you cultivate, rather than a behavior that you conform to or simulate. Compassion is an antithesis of “Rules of Order” and other lists of commandments about courtesy. Compassion is real. Throughout my entire body of work you will find that I promote real craftsmanship over just following instructions. My concern about “niceness” is the same kind of thing.

Look at what I wrote: I said “I don’t trust nice people.” That’s a statement about my feelings and it is generally true, all things being equal. I said “I’m not nice.” Yet, I often behave in pleasant ways, so what did I mean? I meant I seek to behave authentically and compassionately, which looks like “nice” or “kind”, rather than to imagine what behavior would trick people into thinking I am “nice” when indeed I don’t like them. I’m saying people over process, folks.

I was actually not claiming that Maaret is untrustworthy because she is nice, and my words don’t say that. Rather, I was complaining about the implications of following Maaret’s dictum. I was offering an alternative: be authentic and compassionate, then “niceness” and acts of kindness will follow organically. Yes, I do have a worry that Maaret might say something nice to me and I’ll have to wonder “what does that mean? is she serious or just pretending?” Since I don’t want people to worry about whether I am being real, I just tell them “I’m not nice.” If I behave nicely it’s either because I feel genuine good will toward you or because I’m falling down on my responsibility to be honest with you. That second thing happens, but it’s a lapse. (I do try to stay out of rooms with people I don’t respect so that I am not forced to give them opinions they aren’t willing or able to process.)

I now see that my sentence “I want to be authentic and compassionate” could be seen as an independent statement connected to “how I differ from Maaret,” implying that I, unlike her, am authentic and compassionate. That was an errant construction and does not express my intent. The orange text on that line indicated my proposed policy, in the hope that I could persuade her to see it my way. It was not an attack on her. I apologize for that confusion.

3. Debate vs. Dialogue

Maaret had earlier said she doesn’t want debate, but rather dialogue. I have heard this from other Agilists and I find it disturbing. I believe this is code for “I want the freedom to push my ideas on other people without the burden of explaining or defending those ideas.” That’s appropriate for a brainstorming session, but at some point, the brainstorming is done and the judging begins. I believe debate is absolutely required for a healthy professional community. I’m guided in this by dialectical philosophy, the history of scientific progress, the history of civil rights (in fact, all of politics), and the modern adversarial justice system. Look around you. The world is full of heartfelt disagreement. Let’s deal with it. I helped create the culture of small invitational peer conferences in our industry which foster debate. We need those more than ever.

But if you don’t want to deal with it, that’s okay. All that means is that you accept that there is a wall between your friends and those other people whom you refuse to debate with. I will accept the walls if necessary but I would rather resolve the walls. That’s why I open myself and my ideas for debate in public forums.

Debate is not a process of sticking figurative needles into other people. Debate is the exchange of views with the goal of resolving our differences while being accountable for our words and actions. Debate is a learning process. I have occasionally heard from people I think are doing harm to the craft that they believe I debate for the purposes of hurting people instead of trying to find resolution. This is deeply insulting to me, and to anyone who takes his vocation seriously. What’s more, considering that these same people express the view that it’s important to be “nice,” it’s not even nice. Thus, they reveal themselves to be unable to follow their own values. I worry that “Dialogue not debate” is a slogan for just another power group trying to suppress its rivals. Beware the Niceness Gang.

I understand that debating with colleagues may not be fun. But I’m not doing it for fun. I’m doing it because it is my responsibility to build a respectable craft. All testing professionals share this responsibility. Debate serves another purpose, too, managing the boundaries between rival value systems. Through debate we may discover that we occupy completely different paradigms; schools of thought. Debate can’t bridge gaps between entirely different world views, and yet I have a right to my world view just as you have a right to yours.

Jay Philips said on Twitter:

@jamesmarcusbach pointless 2debate w/ U because in your mind you’re right. Slide &points shouldn’t have happened @JokinAspiazu @ericproegler

— Jay Philips (@jayphilips) September 30, 2016

I admire Jay. I called her and we had a satisfying conversation. I filled her in on the context and she advised me to write this post.

One thing that came up is something very important about debate: the status of ideas is not the only thing that gets modified when you debate someone; what also happens is an evolution of feelings.

Yes I think “I’m right.” I acted according to principles I think are eternal and essential to intellectual progress in society. I’m happy with those principles. But I also have compassion for the feelings of others, and those feelings may hold sway even though I may be technically right. For instance, Maaret tweeted my slide without my permission. That is copyright violation. She’s objectively “wrong” to have done that. But that is irrelevant.

[Note: Maaret points out that this is legal under the fair use doctrine. Of course, that is correct. I forgot about fair use. Of course, that doesn’t change the fact that though I may feel annoyed by her selective publishing of my work, that is irrelevant, because I support her option to do that. I don’t think it was wise or helpful for her to do that, but I wouldn’t seek to bar her from doing so. I believe in freedom to communicate, and I would like her to believe in that freedom, too]

I accept that she felt strongly about doing that, so I [would] choose to waive my rights. I feel that people who tweet my slides, in general, are doing a service for the community. So while I appreciate copyright law, I usually feel okay about my stuff getting tweeted.

I hope that Jay got the sense that I care about her feelings. If Maaret were willing to engage with me she would find that I care about her feelings, too. This does not mean she gets whatever she wants, but it’s a factor that influences my behavior. I did offer her the chance to help me edit this post, but again, she refused.

4. Focus and Energy

Maaret said that eliminating the testing role is a good thing. I worry it will lead to the collapse of craftsmanship. She has a slide that says “from tester to team member” which is a sentiment she has expressed on Twitter that led me to say that I no longer consider her a tester. She confirmed to me that I hurt her feelings by saying that, and indeed I felt bad saying it, except that it is an extremely relevant point. What does it mean to be a tester? This is important to debate. Maaret has confirmed publicly (when I asked a question about this during her talk) that she didn’t mean to denigrate testing by dismissing the value of a testing role on projects. But I don’t agree that we can have it both ways. The testing role, I believe, is a necessary prerequisite for maintaining a healthy testing craft. My key concern is the dilution of focus and energy that would otherwise go to improving the testing craft. This is lost when the role is lost.

This is not an attack on Maaret’s morality. I am worried she is promoting too much generalism for the good of the craft, and she is worried I am promoting too much specialism. This is a matter of professional judgment and perspective. It cannot be settled, I think, but it must be aired.

The Slide Should Not Have Been Tweeted But It’s Okay That It Was

I don’t know what Maaret was trying to accomplish by tweeting my slide out of context. Suffice it to say what is right there on my slide: I believe in authenticity and compassion. If she was acting out of authenticity and compassion then more power to her. But the slide cannot be understood in isolation. People who don’t know me, or who have any axe to grind about what I do, are going to cry “what a cruel man!” My friends contacted me to find out more information.

I want you to know that the slide was one part of a bigger picture that depicts my principled objection to several matters involving another thought leader. That bigger picture is: two talks, one room, all people present for it, a lot of oratory by me explaining the slide, as well as back and forth discussion with the audience. Yes, there were people in the room who didn’t like hearing what I had to say, but “don’t offend anyone, ever” is not a rule I can live by, and neither can you. After all, I’m offended by most of the talks I attend.

Although the slide should not have been tweeted, I accept that it was, and that doing so was within the bounds of acceptable behavior. As I announced at the beginning of my talk, I don’t need anyone to make a safe space for me. Just follow your conscience.

What About My Conscience?
  • My conscience is clean. I acted out of true conviction to discuss important matters. I used a style familiar to anyone who has ever seen a public debate, or read an opinion piece in the New York Times. I didn’t set out to hurt Maaret’s feelings and I don’t want her feelings to be hurt. I want her to engage in the debate about the future of the craft and be accountable for her ideas. I don’t agree that I was presuming too much in doing so.
  • Maaret tells me that my slide was “stupid and hurtful.” I believe she and I do not share certain fundamental values about conferring. I will no longer be conferring with her, until and unless those differences are resolved.
  • Compassion is important to me. I will continue to examine whether I am feeling and showing the compassion for my fellow humans that they are due. These conversations and debates I have with colleagues help me do that.
  • I agree that making a safe space for students is important. But industry consultants and pundits should be able to cope with the full spectrum, authentic, principled reactions by their peers. Leaders are held to a higher standard, and must be ready and willing to defend their ideas in public forums.
  • The reaction on Twitter gave me good information about a possible trend toward fragility in the Twitter-facing part of the testing world. There seems to be a significant group of people who prize complete safety over the value that comes from confrontation. In the next conference I help arrange, I will set more explicit ground rules, rather than assuming people share something close to my own sense of what is reasonable to do and expect.
  • I will also start thinking, for each slide in my presentation: “What if this gets tweeted out of context?”

(Oh, and to those who compared me to Donald Trump… Can you even imagine him writing a post like this in response to criticism? BELIEVE ME, he wouldn’t.)

Categories: Testing & QA

How Michael Bolton and I Collaborate on Articles

James Bach’s Blog - Mon, 09/05/2016 - 07:28

(Someone posted a question on Quora asking how Michael and I write articles together. This is the answer I gave, there.)

It begins with time. We take our time. We rarely write on a deadline, except for fun, self-imposed deadlines that we can change if we really want to. For Michael and I, the quality of our writing always dominates over any other consideration.

Next is our commitment to each other. Neither one of us can contemplate releasing an article that the other of us is not proud of and happy with. Each of us gets to “stop ship” at any time, for any reason. We develop a lot of our work through debate, and sometimes the debate gets heated. I have had many colleagues over the years who tired of my need to debate even small issues. Michael understands that. When our debating gets too hot, as it occasionally does, we know how to stop, take a break if necessary, and remember our friendship.

Then comes passion for the subject. We don’t even try to write articles about things we don’t care about. Otherwise, we couldn’t summon the energy for the debate and the study that we put into our work. Michael and I are not journalists. We don’t function like reporters talking about what other people do. You will rarely find us quoting other people in our work. We speak from our own experiences, which gives us a sort of confidence and authority that comes through in our writing.

Our review process also helps a lot. Most of the work we do is reviewed by other colleagues. For our articles, we use more reviewers. The reviewers sometimes give us annoying responses, and they generally aren’t as committed to debating as we are. But we listen to each one and do what we can to answer their concerns without sacrificing our own vision. The responses can be annoying when a reviewer reads something into our article that we didn’t put there; some assumption that may make sense according to someone else’s methodology but not for our way of thinking. But after taking some time to cool off, we usually add more to the article to build a better bridge to the reader. This is especially true when more than one reviewer has a similar concern. Ultimately, of course, pleasing people is not our mission. Our mission is to say something true, useful, important, and compassionate (in that order of priority, at least in my case). Note that “amiable” and “easy to understand” or “popular” are not on that short list of highest priorities.

As far as the mechanisms of collaboration go, it depends on who “owns” it. There are three categories of written work: my blog, Michael’s blog, and jointly authored standalone articles. For the latter, we use Google Docs until we have a good first draft. Sometimes we write simultaneously on the same paragraph; more normally we work on different parts of it. If one of us is working on it alone he might decide to re-architect the whole thing, subject, of course, to the approval of the other.

After the first full draft (our recent automation article went through 28 revisions, according to Google Docs, over 14-weeks, before we reached that point), one of us will put it into Word and format it. At some point one of us will become the “article boss” and manage most of the actual editing to get it done, while the other one reviews each draft and comments. One heuristic of reviewing we frequently use is to turn change-tracking off for the first re-read, if there have been many changes.  That way whichever of us is reviewing is less likely to object to a change based purely on attachment to the previous text, rather than having an actual problem with the new text.

For the blogs, usually we have a conversation, then the guy who’s going to publish it on his blog writes a draft and does all the editing while getting comments from the other guy. The publishing party decides when to “ship” but will not do so over the other party’s objections.

I hope that makes it reasonably clear.

(Thanks to Michael Bolton for his review.)

Categories: Testing & QA

Thought after Test Automation Day 2013

Henry Ford said “Obstacles are those frightful things you see when take your eyes off the goal.” After I’ve been to Test Automation Day last month I’m figuring out why industrializing testing doesn’t work. I try to put it in this negative perspective, because I think it works! But also when is it successful? A lot of times the remark from Ford is indeed the problem. People tend to see obstacles. Obstacles because of the thought that it’s not feasible to change something. They need to change. But that’s not an easy change.

After attending the #TAD2013 as it was on Twitter I saw a huge interest in better testing, faster testing, even cheaper testing by using tools to industrialize. Test automation has long been seen as an interesting option that can enable faster testing. it wasn’t always cheaper, especially the first time, but at least faster. As I see it it’ll enable better testing. “Better?” you may ask. Test automation itself doesn’t enable better testing, but by automating regression tests and simple work the tester can focus on other areas of the quality.


And isn’t that the goal? In the end all people involved in a project want to deliver a high quality product, not full of bugs. But they also tend to see the obstacles. I see them less and less. New tools are so well advanced and automation testers are becoming smarter and smarter that they enable us to look beyond the obstacles. I would like to say look over the obstacles.

At the Test Automation Day I learned some new things, but it also proved something I already know; test automation is here to stay. We don’t need to focus on the obstacles, but should focus on the goal.

Categories: Testing & QA

The Toyota Way: The need for doing it right the first time

After WWII Toyota started developing its Toyota Production System (TPS); which was identified as ‘Lean’ in the 1990s. Taiichi Ohno, Shigeo Shingo and Eiji Toyoda developed the system between 1948 and 1975. In the myth surrounding the system it was not inspired by the American automotive industry, but from a visit to American supermarkets, Ohno saw the supermarket as model for what he was trying to accomplish in the factor and perfect the Just-in-Time (JIT) production system. While accomplishing this low inventory levels were a key outcome of the TPS, and an important element of the philosophy behind its system is to work intelligently and eliminate waste so that only minimal inventory is needed.

As TPS and Lean have their own principles as outlined by Toyota:

  • Long-term Philosophy
  • Right process will produce the right results
  • Value to organization by developing people
  • Solving root problems drives organizational learning

As these principles were summed up and published by Toyota in 2001, by naming it “The Toyota Way 2001”. It consists the above named principles in two key areas: Continuous Improvement, and Respect for People.


The principles for a continuous improvement include establishing a long-term vision, working on challenges, continual innovation, and going to the source of the issue or problem. The principles relating to respect for people include ways of building respect and teamwork. When looking at the ALM all these principles come together in the ‘first time right’ approach already mentioned. And from Toyota’s view they were outlined as followed:

  • The right process will produce the right results
    • Create continuous process flow to bring problems to the surface.
    • Use the ‘pull’ system to avoid overproduction (kanban)
    • Level out the workload (heijunka).
    • Build a culture of stopping to fix problems, to get quality right from the first (jidoka)
  • Continuously solving root problems drives organizational learning
    • Go and see for yourself to thoroughly understand the situation (Genchi Genbutsu);
    • Make decisions slowly by consensus, thoroughly considering all options (nemawashi); implement decisions rapidly;
    • Become a learning organization through relentless reflection (hansei) and continuous improvement (kaizen).
Let’s do it right now!

As the economy is changing and IT is more common sense throughout ore everyday life the need for good quality software products has never been this high. Software issues create bigger and bigger issues in our lives. Think about trains that cannot ride due to software issues, bank clients that have no access to their bank accounts, and people oversleeping because their alarm app didn’t work on their iPhone. As Capers Jones [Jones, 2011] states in his 2011 study that “software is blamed for more major business problems than any other man-made product” and that “poor quality has become one of the most expensive topics in human history”. The improvement of software quality is a key topic for all industries.

 Right the first time vs jidoka

In both TPS and Lean autonomation or jidoka are used. Autonomation can be described as ‘intelligent autonomation’, it means that when an abnormal situation arises the ‘machine’ stops and fix the abnormality. Autonomation prevents the production of defective products, eliminates overproduction, and focuses attention on understanding the problem and ensuring that it never recurs; a quality control process that applies the following four principles:

  • Detect the abnormality.
  • Stop.
  • Fix or correct the immediate condition.
  • Investigate the root cause and install a countermeasure.
Find defects as early as possible

In other words autonomation helps to get quality right the first time perfectly. With IT projects being different from the Toyota car production line, ‘perfectly’ may be a bit too much, but the process around quality assurance should be the same:

  • Find the defect.
  • Stop.
  • Fix or correct the error.
  • Investigate the root cause and take countermeasures.

The defect should be found as early as possible to be fixed as early as possible. And as with Lean and TPS the reason behind this is to make it possible to address the identification and correction of defects immediately in the process.

Categories: Testing & QA

When will you start with test automation?

I just came back from vacation and when I started again I noticed a slight change in resource requests I now see coming by; as almost all requests are with a statement around test automation. In the last two days I had two separate sessions around test automation tools. Has test automation all of a sudden become more important? Did people follow up on my last post, where I state that tools are a prerequisite in testing today, or actually: yesterday!

If you missed the latest cycle in new tools for test automation you’re either an ostrich with your head in the ground (“sorry vacation was in Southern Africa”), or you just simply still afraid. Afraid of change that test automation would cannibalize your manual test execution.

Test automation is not anymore that it takes over test execution in a very complex and unmanageable way. No it offers higher efficiency on test design, test execution, but also more options to test certain non-functional parts of applications – that could not be done without those tools – like security and performance, and virtual environments to do end-2-end tests without test environments that are down all the time. Tools are now also offering more support for testing mobile solutions. Tools are everywhere!


Test automation offers us testers the opportunity to do more, faster, les risky, and cheaper. I set these words specifically in that order. Test automation is often seen as a way to do test cheaper. You can, but you also can do more, for instance:

  • Let the tool do the checks and you explore the application further;
  • Setup a virtual test environment that doesn’t go down after 1 hour of use and test more in the same time;
  • Create and execute more test cases by generating and executing them automatically.
  • Get higher quality by really doing a thorough regression test, instead of a simple check, to find integration errors.

There are enough reasons to work on test automation and I don’t see why not. I think now it is even time for the next step in test automation. What that is time will tell, but I look forward to hearing that at the Test Automation Day in June. Where Bryan Bakker will tell more on this next step in his presentation “Design for Testability – the next step in Test Automation”. After the congress I’ll post my ideas here.

Categories: Testing & QA