Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Testing & QA

Software Development Linkopedia January 2015

From the Editor of Methods & Tools - Tue, 01/20/2015 - 14:58
Here is our monthly selection of interesting knowledge material on programming, software testing and project management. This month you will find some interesting information and opinions about managing software developers, software architecture, Agile testing, product owner patterns, mobile testing, continuous improvement, project planning and technical debt. Blog: Killing the Crunch Mode Antipattern Blog: Barry’s Rules of Engineering and Architecture Blog: Lessons Learned Moving From Engineering Into Management Blog: A ScrumMaster experience report on using Feature Teams Article: Using Models to Help Plan Tests in Agile Projects Article: A Mitigation Plan for a Product Owner’s Anti-Pattern Article: Guerrilla Project ...

Software Architecture Articles of 2014

From the Editor of Methods & Tools - Thu, 01/15/2015 - 15:07
When software features are distributed on multiple infrastructures (server, mobile, cloud) that needs to communicate and synchronize, having a sound and reactive software architecture is a key for success and evolution of business functions. Here are seven software architecture articles published in 2014 that can help you understand the basic topics and the current trends in software architecture: Agile, Cloud, SOA, Security
 and even a little bit of data modeling. * Designing Software in a Distributed World This is an overview of what is involved in designing services that use distributed computing ...

Testing on the Toilet: Prefer Testing Public APIs Over Implementation-Detail Classes

Google Testing Blog - Wed, 01/14/2015 - 18:35
by Andrew Trenk

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.


Does this class need to have tests?
class UserInfoValidator {
public void validate(UserInfo info) {
if (info.getDateOfBirth().isInFuture()) { throw new ValidationException()); }
}
}
Its method has some logic, so it may be good idea to test it. But what if its only user looks like this?
public class UserInfoService {
private UserInfoValidator validator;
public void save(UserInfo info) {
validator.validate(info); // Throw an exception if the value is invalid.
writeToDatabase(info);
}
}
The answer is: it probably doesn’t need tests, since all paths can be tested through UserInfoService. The key distinction is that the class is an implementation detail, not a public API.

A public API can be called by any number of users, who can pass in any possible combination of inputs to its methods. You want to make sure these are well-tested, which ensures users won’t see issues when they use the API. Examples of public APIs include classes that are used in a different part of a codebase (e.g., a server-side class that’s used by the client-side) and common utility classes that are used throughout a codebase.

An implementation-detail class exists only to support public APIs and is called by a very limited number of users (often only one). These classes can sometimes be tested indirectly by testing the public APIs that use them.

Testing implementation-detail classes is still useful in many cases, such as if the class is complex or if the tests would be difficult to write for the public API class. When you do test them, they often don’t need to be tested in as much depth as a public API, since some inputs may never be passed into their methods (in the above code sample, if UserInfoService ensured that UserInfo were never null, then it wouldn’t be useful to test what happens when null is passed as an argument to UserInfoValidator.validate, since it would never happen).

Implementation-detail classes can sometimes be thought of as private methods that happen to be in a separate class, since you typically don’t want to test private methods directly either. You should also try to restrict the visibility of implementation-detail classes, such as by making them package-private in Java.

Testing implementation-detail classes too often leads to a couple problems:

- Code is harder to maintain since you need to update tests more often, such as when changing a method signature of an implementation-detail class or even when doing a refactoring. If testing is done only through public APIs, these changes wouldn’t affect the tests at all.

- If you test a behavior only through an implementation-detail class, you may get false confidence in your code, since the same code path may not work properly when exercised through the public API. You also have to be more careful when refactoring, since it can be harder to ensure that all the behavior of the public API will be preserved if not all paths are tested through the public API.
Categories: Testing & QA

MVVM and Threading

Actively Lazy - Sun, 01/11/2015 - 21:01

The Model-View-ViewModel pattern is a very powerful design pattern when building WPF applications, even if I’m not sure everyone interprets it the same way. However, it’s never been clear to me how to easily manage multi-threaded WPF applications: writing multi-threaded code is hard and there seems to be no real support baked into WPF or the idea of MVVM to make multi-threaded code easier to get right.

The Problem

All WPF applications are effectively bound by the same three constraints:

  1. All interaction with UI elements must be done on the UI thread
  2. All long-running tasks (web service calls etc) should be done on a background thread to keep the UI responsive
  3. Repeatedly switching between threads is expensive

Bound by these constraints it means that all WPF code has two types of thread running through it: UI threads and background threads. It is clearly important to know which type of thread will be executing any given line of code: a background thread cannot interact with UI elements and UI threads should not make expensive calls.

Example

A very brief, contrived example might help. The source code for this is available on github.

Here is a viewmodel for a very simple view:

class ViewModel
{
  public ObservableCollection<string> Items { get; private set; }
  public ICommand FetchCommand { get; private set; }

  public async void Fetch()
  {
    var items = await m_model.Fetch();
    foreach (var item in items)
      Items.Add(item);
  }
}

The ViewModel exposes a command, which calls ViewModel.Fetch() to retrieve some data from the model; once retrieved this data is added to the list of displayed items. Since Fetch is called by an ICommand and interacts with an ObservableCollection (bound to the view) it clearly runs on the UI thread.

Our model is then responsible for fetching the data requested by the viewmodel:

class Model
{
  public async Task<IList<string>> Fetch()
  {
    return await Task.Run(() => DoFetch());
  }

  private IList<string> DoFetch()
  {
    Thread.Sleep(1000);
    return new[] { "Hello" };
  }
}

In a real application DoFetch() would obviously call a database or web service or whatever is required; it is also probable that the Fetch() method might be more complex, e.g. coordinating multiple service calls or managing caching or other application logic.

There is a difference in threading in the model compared to the viewmodel: the Fetch method will, in our example, be called on the UI thread whereas DoFetch will be called on a background thread. Here we have a class through which may run two separate types of thread.

In this very simple example which thread type calls each method is obvious. But scale this up to a real application with dozens of classes and many methods and it becomes less obvious. It suddenly becomes very easy to add a long-running service call to a method which runs on the UI thread; or a switch to a background thread from a method that already runs on a background thread. These errors can be difficult to spot: neither causes an obvious failure. The first will only show if the service call is obviously slow, the observed behaviour may simply be a UI that intermittently pauses for no apparent reason. The second simply slows tasks down, the application will seem slower than it ought to be with no obvious indication of why.

It seems as though WPF and MVVM have carefully guided us into a multi-threaded minefield.

First Approach

The first idea is to simply apply a naming convention, each method is suffixed with _UI or _Worker to indicate which type of thread invoked it. E.g. our model class becomes:

class Model
{
  public async Task<IList<string>> Fetch_UI()
  {
    return await Task.Run(() => Fetch_Worker());
  }

  private IList<string> Fetch_Worker()
  {
    Thread.Sleep(1000);
    return new[] { "Hello" };
  }
}

This at least makes it obvious to my future self which type of thread I think executes each method. Simple code inspection shows that a _UI method calling someWebService.Invoke(…) is a Bad Thing. Similarly, Task.Run(…) in a _Worker method is obviously suspect. However, it looks ugly and isn’t guaranteed to be correct – it is just a naming convention, nothing stops me calling a _UI method from a background thread, or vice versa.

Introducing PostSharp

If you haven’t tried PostSharp, the C# AOP library, it is definitely worth a look. Even the free version is quite powerful and allows us to evolve the first idea into something more powerful.

PostSharp allows you to create an attribute which will introduce “advice” (i.e. code to run) around any method annotated with the attribute. For example, we can annotate the ViewModel constructor with a new UIThreadPolicy attribute:

  [UIThreadPolicy]
  public ViewModel()
  {
    Items = new ObservableCollection<string>();
    FetchCommand = new FetchCommand(this);
  }

This attribute is logically similar to using a suffix on the method name, in that it documents our intention. However, by using AOP it also allows us to introduce code to run before the method executes:

class UIThreadPolicy : OnMethodBoundaryAspect
{
  public override void OnEntry(MethodExecutionArgs args)
  {
#if DEBUG
    if (Thread.CurrentThread.IsBackground)
      Console.WriteLine("*** Thread policy warning: \n" + Environment.StackTrace);
#endif
  }
}

The OnEntry method will be triggered before the annotated method is invoked. If the method is ever invoked on a background thread, we’ll output a warning to the console. In this rudimentary way, we not only document our intention that the ViewModel should only be created on a UI thread, we also enforce it at runtime to ensure that it remains correct.

We can define another attribute, WorkerThreadPolicy, to enforce the reverse: that a method is only invoked on a background thread. With discipline one of these two attributes can be applied to every method. This makes it trivial when making changes in months to come to know whether a given method runs on the UI thread or a background thread.

Locking

Understanding situations where multiple threads access code is hard, so wouldn’t it be great if we could easily identify situations where it’s safe to ignore it?

By using the thread attributes, we can identify situations where code is only accessed by the UI thread. In this case, we have no concurrency to deal with. There is exactly one UI thread so we can ignore any concurrency concerns. For example, our simple Fetch() method above can add a property to keep track of whether we’re already busy:

  [UIThreadPolicy]
  public async void Fetch()
  {
    IsFetching = true;
    try
    {
      var items = await m_model.Fetch();
      foreach (var item in items)
      Items.Add(item);
    }
    finally
    {
      IsFetching = false;
    }
  }

So long as all access to the IsFetching property is on the UI thread, there is no need to worry about locking. We can enforce this by adding attributes to the property itself, too:

  private bool m_isFetching;

  internal bool IsFetching
  {
    [UIThreadPolicy]
    get { return m_isFetching; }

    [UIThreadPolicy]
    set
    {
      m_isFetching = value;
      IsFetchingChanged(this, new EventArgs());
    }
  }

Here we use a simple, unlocked bool – knowing that the only access is from a single thread. Without these attributes, it is possible that someone in the future writes to IsFetching from a background thread. It will generally work – but access from the UI thread could continue to see a stale value for a short period.

Conclusion

In general we have aimed for a pattern where ViewModels are only accessed on the UI thread. Model classes, however, tend to have a mixture. And since the “model” in any non-trivial application actually includes dozens of classes this mixture of threads permeates the code. By using these attributes it is possible to understand which threads are used where without exhaustively searching up call stacks.

Writing multi-threaded code is hard: but, with a bit of PostSharp magic, knowing which type of thread will execute any given line of code at least makes it a little bit easier.


Categories: Programming, Testing & QA

Decoupling JUnit and Hamcrest

Mistaeks I Hav Made - Nat Pryce - Tue, 01/06/2015 - 23:34
Evolution of Hamcrest and JUnit has been complicated by the dependencies between the two projects. If they can be decoupled, both projects can move forward with less coordination required between the development teams. To that end, we've come up with a plan to allow JUnit to break its dependency on Hamcrest. We've created a new library, hamcrest-junit , that contains a copy of the JUnit code that uses Hamcrest. The classes have been repackaged into org.hamcrest.junit so that the library can be added to existing projects without breaking any code and projects can migrate to the new library at their own pace. The Assert.assertThat function has been copied into the class org.hamcrest.junit.MatcherAssert and Assume.assumeThat to org.hamcrest.junit.MatcherAssume. The ExpectedException and ErrorCollector classes have been copied from package org.junit.rules to package org.hamcrest.junit. The classes in hamcrest-junit only pass strings to JUnit. For example, the assumeThat function does not use the constructor of the AssumptionFailedException that takes a Matcher but generates the description of the matcher as a String and uses that to create the exception instead. This will allow JUnit to deprecate and eventually remove any code that depends on types defined by Hamcrest, finally decoupling the two projects. Version 1.0.0.0-RC1 of the hamcrest-junit library has been published to Maven Central (groupId: org.hamcrest, artifactId: hamcrest-junit, version: 1.0.0.0-RC1). The source is on GitHub at https://github.com/hamcrest/hamcrest-junit. Feedback about the library, about how easy it is to transition code from using JUnit's classes to using hamcrest-junit and suggestions for improvement are most welcome. Discussion can happen on the hamcrest mailing list or the project's issue tracker on GitHub.
Categories: Programming, Testing & QA

10 technologies that impressed me in 2014

Agile Testing - Grig Gheorghiu - Wed, 12/31/2014 - 07:26
Some of these have been new to me, some are old friends that I came to appreicate more. In alphabetical order:

  1. Ansible
  2. Bitcoin
  3. Consul
  4. Docker
  5. Golang
  6. HAProxy
  7. nginx
  8. OpenStack
  9. Sysdig
  10. Varnish

Agile Analysis, Self-Selecting Teams, TDD & BDD in Methods & Tools Winter 2014 issue

From the Editor of Methods & Tools - Mon, 12/22/2014 - 15:22
Methods & Tools – the free e-magazine for software developers, testers and project managers – has just published its Winter 2014 issue that discusses Agile Analysis, Self-Selecting Teams,Collaborative Development of Domain-specific Languages, TDD with Mock Objects, BDDfire. Methods & Tools Winter 2014 contains the following articles: * Analysis on Analysts in Agile * Self-Selecting Teams – Why You Should Try Self-selection * Collaborative Development of Domain-specific Languages, Models and Generators * TDD with Mock Objects: Design Principles and Emergent Properties * BDDfire: Instant Ruby-Cucumber Framework 55 pages of software development knowledge that you can freely download from ...

Testing on the Toilet: Truth: a fluent assertion framework

Google Testing Blog - Fri, 12/19/2014 - 19:28
by Dori Reuveni and Kurt Alfred Kluever

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.


As engineers, we spend most of our time reading existing code, rather than writing new code. Therefore, we must make sure we always write clean, readable code. The same goes for our tests; we need a way to clearly express our test assertions.

Truth is an open source, fluent testing framework for Java designed to make your test assertions and failure messages more readable. The fluent API makes reading (and writing) test assertions much more natural, prose-like, and discoverable in your IDE via autocomplete. For example, compare how the following assertion reads with JUnit vs. Truth:
assertEquals("March", monthMap.get(3));          // JUnit
assertThat(monthMap).containsEntry(3, "March"); // Truth
Both statements are asserting the same thing, but the assertion written with Truth can be easily read from left to right, while the JUnit example requires "mental backtracking".

Another benefit of Truth over JUnit is the addition of useful default failure messages. For example:
ImmutableSet<String> colors = ImmutableSet.of("red", "green", "blue", "yellow");
assertTrue(colors.contains("orange")); // JUnit
assertThat(colors).contains("orange"); // Truth
In this example, both assertions will fail, but JUnit will not provide a useful failure message. However, Truth will provide a clear and concise failure message:

AssertionError: <[red, green, blue, yellow]> should have contained <orange>

Truth already supports specialized assertions for most of the common JDK types (Objects, primitives, arrays, Strings, Classes, Comparables, Iterables, Collections, Lists, Sets, Maps, etc.), as well as some Guava types (Optionals). Additional support for other popular types is planned as well (Throwables, Iterators, Multimaps, UnsignedIntegers, UnsignedLongs, etc.).

Truth is also user-extensible: you can easily write a Truth subject to make fluent assertions about your own custom types. By creating your own custom subject, both your assertion API and your failure messages can be domain-specific.

Truth's goal is not to replace JUnit assertions, but to improve the readability of complex assertions and their failure messages. JUnit assertions and Truth assertions can (and often do) live side by side in tests.

To get started with Truth, check out http://google.github.io/truth/

Categories: Testing & QA

Dynamic DNS updates with nsupdate (new and improved!)

Agile Testing - Grig Gheorghiu - Wed, 12/17/2014 - 23:14
I blogged about this topic before. This post shows a slightly different way of using nsupdate remotely against a DNS server running BIND 9 in order to programatically update DNS records. The scenario I am describing here involves an Ubuntu 12.04 DNS server running BIND 9 and an Ubuntu 12.04 client running nsupdate against the DNS server.

1) Run ddns-confgen and specify /dev/urandom as the source of randomness and the name of the zone file you want to dynamically update via nsupdate:

$ ddns-confgen -r /dev/urandom -z myzone.com

# To activate this key, place the following in named.conf, and
# in a separate keyfile on the system or systems from which nsupdate
# will be run:
key "ddns-key.myzone.com" {
algorithm hmac-sha256;
secret "1D1niZqRvT8pNDgyrJcuCiykOQCHUL33k8ZYzmQYe/0=";
};

# Then, in the "zone" definition statement for "myzone.com",
# place an "update-policy" statement like this one, adjusted as
# needed for your preferred permissions:
update-policy {
 grant ddns-key.myzone.com zonesub ANY;
};

# After the keyfile has been placed, the following command will
# execute nsupdate using this key:
nsupdate -k <keyfile>

2) Follow the instructions in the output of ddns-keygen (above). I actually named the key just ddns-key, since I was going to use it for all the zones on my DNS server. So I added this stanza to /etc/bind/named.conf on the DNS server:

key "ddns-key" {
algorithm hmac-sha256;
secret "1D1niZqRvT8pNDgyrJcuCiykOQCHUL33k8ZYzmQYe/0=";
};

3) Allow updates when the key ddns-key is used. In my case, I added the allow-update line below to all zones that I wanted to dynamically update, not only to myzone.com:

zone "myzone.com" {
        type master;
        file "/etc/bind/zones/myzone.com.db";
allow-update { key "ddns-key"; };
};

At this point I also restarted the bind9 service on my DNS server.

4) On the client box, create a text file containing nsupdate commands to be sent to the DNS server. In the example below, I want to dynamically add both an A record and a reverse DNS PTR record:

$ cat update_dns1.txt
server dns1.mycompany.com
debug yes
zone myzone.com
update add testnsupdate1.myzone.com 3600 A 10.10.2.221
show
send
zone 2.10.10.in-addr.arpa
update add 221.2.10.10.in-addr.arpa 3600 PTR testnsupdate1.myzone.com
show
send

Still on the client box, create a file containing the stanza with the DDNS key generated in step 1:

$ cat ddns-key.txt
key "ddns-key" {
algorithm hmac-sha256;
secret "Wxp1uJv3SHT+R9rx96o6342KKNnjW8hjJTyxK2HYufg=";
};

5) Run nsupdate and feed it both the update_dns1.txt file containing the commands, and the ddns-key.txt file:

$ nsupdate -k ddns-key.txt -v update_dns1.txt

You should see some fairly verbose output, since the command file specifies 'debug yes'. At the same time, tail /var/log/syslog on the DNS server and make sure there are no errors.

In my case, there were some hurdles I had to overcome on the DNS server. The first one was that apparmor was installed and it wasn't allowing the creation of the journal files used to keep track of DDNS records. I saw lines like these in /var/log/syslog:

Dec 16 11:22:59 dns1 kernel: [49671335.189689] type=1400 audit(1418757779.712:12): apparmor="DENIED" operation="mknod" parent=1 profile="/usr/sbin/named" name="/etc/bind/zones/myzone.com.db.jnl" pid=31154 comm="named" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Dec 16 11:22:59 dns1 kernel: [49671335.306304] type=1400 audit(1418757779.828:13): apparmor="DENIED" operation="mknod" parent=1 profile="/usr/sbin/named" name="/etc/bind/zones/rev.2.10.10.in-addr.arpa.jnl" pid=31153 comm="named" requested_mask="c" denied_mask="c" fsuid=107 ouid=107

To get past this issue, I disabled apparmor for named:

# ln -s /etc/apparmor.d/usr.sbin.named /etc/apparmor.d/disable/
# service apparmor restart

The next issue was an OS permission denied (nothing to do with apparmor) when trying to create the journal files in /etc/bind/zones:

Dec 16 11:30:54 dns1 named[32640]: /etc/bind/zones/myzone.com.db.jnl: create: permission denied
Dec 16 11:30:54 dns named[32640]: /etc/bind/zones/rev.2.0.10.in-addr.arpa.jnl: create: permission denied

I got past this issue by running

# chown -R bind:bind /etc/bind/zones

At this point everything worked as expected.


Software Development Linkopedia December 2014

From the Editor of Methods & Tools - Wed, 12/10/2014 - 15:37
Here is our monthly selection of interesting knowledge material on programming, software testing and project management.  This month you will find some interesting information and opinions about slow programming, technical career paths, Agile QA, Scrum backlog refinement meetings, being a better test manager, java BDD, mixing Waterfall and Agile, the TDD cycle and dealing with bad Java code. Blog: The Case for Slow Programming Blog: Coding, Fast and Slow: Developers and the Psychology of Overconfidence Blog: Climbing off the CTO ladder Blog: What Does QA Do on the First Day of a Sprint? Blog: Stop ...

Quote of the Month December 2014

From the Editor of Methods & Tools - Tue, 12/09/2014 - 18:23
Stories shouldn’t be small because they need to fit into an iteration, but because the world shouldn’t end just because a story turns out to be wrong. Stories are based on assumptions about business value, and those assumptions might turn out to be right or wrong.The key questions for story sizing shouldn’t be about the iteration length, but about how much business stakeholders want to invest in learning whether a proposed change will actually give them what they assumed. Source: Fifty Quick Ideas to Improve your User Stories, Gojko Adzic and ...

Protractor: Angular testing made easy

Google Testing Blog - Sun, 11/30/2014 - 18:50
By Hank Duan, Julie Ralph, and Arif Sukoco in Seattle

Have you worked with WebDriver but been frustrated with all the waits needed for WebDriver to sync with the website, causing flakes and prolonged test times? If you are working with AngularJS apps, then Protractor is the right tool for you.

Protractor (protractortest.org) is an end-to-end test framework specifically for AngularJS apps. It was built by a team in Google and released to open source. Protractor is built on top of WebDriverJS and includes important improvements tailored for AngularJS apps. Here are some of Protractor’s key benefits:

  • You don’t need to add waits or sleeps to your test. Protractor can communicate with your AngularJS app automatically and execute the next step in your test the moment the webpage finishes pending tasks, so you don’t have to worry about waiting for your test and webpage to sync. 
  • It supports Angular-specific locator strategies (e.g., binding, model, repeater) as well as native WebDriver locator strategies (e.g., ID, CSS selector, XPath). This allows you to test Angular-specific elements without any setup effort on your part. 
  • It is easy to set up page objects. Protractor does not execute WebDriver commands until an action is needed (e.g., get, sendKeys, click). This way you can set up page objects so tests can manipulate page elements without touching the HTML. 
  • It uses Jasmine, the framework you use to write AngularJS unit tests, and Javascript, the same language you use to write AngularJS apps.

Follow these simple steps, and in minutes, you will have you first Protractor test running:

1) Set up environment

Install the command line tools ‘protractor’ and ‘webdriver-manager’ using npm:
npm install -g protractor

Start up an instance of a selenium server:
webdriver-manager update & webdriver-manager start

This downloads the necessary binary, and starts a new webdriver session listening on http://localhost:4444.

2) Write your test
// It is a good idea to use page objects to modularize your testing logic
var angularHomepage = {
nameInput : element(by.model('yourName')),
greeting : element(by.binding('yourName')),
get : function() {
browser.get('index.html');
},
setName : function(name) {
this.nameInput.sendKeys(name);
}
};

// Here we are using the Jasmine test framework
// See http://jasmine.github.io/2.0/introduction.html for more details
describe('angularjs homepage', function() {
it('should greet the named user', function(){
angularHomepage.get();
angularHomepage.setName('Julie');
expect(angularHomepage.greeting.getText()).
toEqual('Hello Julie!');
});
});

3) Write a Protractor configuration file to specify the environment under which you want your test to run:
exports.config = {
seleniumAddress: 'http://localhost:4444/wd/hub',

specs: ['testFolder/*'],

multiCapabilities: [{
'browserName': 'chrome',
// browser-specific tests
specs: 'chromeTests/*'
}, {
'browserName': 'firefox',
// run tests in parallel
shardTestFiles: true
}],

baseUrl: 'http://www.angularjs.org',
};

4) Run the test:

Start the test with the command:
protractor conf.js

The test output should be:
1 test, 1 assertions, 0 failures


If you want to learn more, here’s a full tutorial that highlights all of Protractor’s features: http://angular.github.io/protractor/#/tutorial

Categories: Testing & QA

Software Development Conferences Forecast November 2014

From the Editor of Methods & Tools - Tue, 11/25/2014 - 15:20
Here is a list of software development related conferences and events on Agile ( Scrum, Lean, Kanban) software testing and software quality, programming (Java, .NET, JavaScript, Ruby, Python, PHP) and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods & Tools software development magazine. SPTechCon, February 8-11 2015, Austin, USA Use code SHAREPOINT for a $200 conference discount off the 3 and 4-day pass NorDevCon, February 27 2015, Norwich, UK QCon London, March 2-6 2015, London, UK Exclusive 50 pounds Method & Tools discount ...

Quote of the Month November 2014

From the Editor of Methods & Tools - Mon, 11/24/2014 - 14:57
Walking on water and developing software from a specification are easy if both are frozen. Source: Edward V. Berard (1993) Essays on object-oriented software engineering. Volume 1, Prentice Hall

Software Development Linkopedia November 2014

From the Editor of Methods & Tools - Tue, 11/18/2014 - 17:19
Here is our monthly selection of interesting knowledge material on programming, software testing and project management.  This month you will find some interesting information and opinions about Agile coaching and management, positive quality assurance, product managers and owners, enterprise software architecture, functionnal programming, MySQL performance and software testing at Google. Blog: Fewer Bosses. More Coaches. Please. Blog: Advice for Agile Coaches on “Dealing With” Middle Management Blog: Five ways to make testing more positive Blog: 9 Things Every Product Manager Should Know about Being an Agile Product Owner Article: Managing Trust in Distributed Teams Article: Industrial ...

Service discovery with consul and consul-template

Agile Testing - Grig Gheorghiu - Tue, 11/18/2014 - 00:20
I talked in the past about an "Ops Design Pattern: local haproxy talking to service layer". I described how we used a local haproxy on pretty much all nodes at a given layer of our infrastructure (webapp, API, e-commerce) to talk to services offered by the layer below it. So each webapp server has a local haproxy that talks to all API nodes it sends requests to. Similarly, each API node has a local haproxy that talks to all e-commerce nodes it needs info from.

This seemed like a good idea at a time, but it turns out it has a couple of annoying drawbacks:
  • each local haproxy runs health checks against N nodes, so if you have M nodes running haproxy, each of the N nodes will receive M health checks; if M and N are large, then you have a health check storm on your hands
  • to take a node out of a cluster at any given layer, we tag it as 'inactive' in Chef, then run chef-client on all nodes that run haproxy and talk to the inactive node at layers above it; this gets old pretty fast, especially when you're doing anything that might conflict with Chef and that the chef-client run might overwrite (I know, I know, you're not supposed to do anything of that nature, but we are all human :-)
For the second point, we are experimenting with haproxyctl so that we don't have to run chef-client on every node running haproxy. But it still feels like a heavy-handed approach.
If I were to do this again (which I might), I would still have an haproxy instance in front of our webapp servers, but for communicating from one layer of services to another I would use a proper service discovery tool such as grampa Apache ZooKeeper or the newer kids on the block, etcd from CoreOS and consul from HashiCorp.

I settled on consul for now, so in this post I am going to show how you can use consul in conjunction with the recently released consul-template to discover services and to automate configuration changes. At the same time, I wanted to experiment a bit with Ansible as a configuration management tool. So the steps I'll describe were actually automated with Ansible, but I'll leave that for another blog post.

The scenario I am going to describe involves 2 haproxy instances, each pointing to 2 Wordpress servers running Apache, PHP and MySQL, with Varnish fronting the Wordpress application. One of the 2 Wordpress servers is considered primary as far as haproxy is concerned, and the other one is a backup server, which will only get requests if the primary server is down. All servers are running Ubuntu 12.04.

Install and run the consul agent on all nodes

The agent will start in server mode on the 2 haproxy nodes, and in agent mode on the 2 Wordpress nodes.

I first deployed consul to the 2 haproxy nodes. I used a modified version of the ansible-consul role from jivesoftware. The configuration file /etc/consul.cfg for the first server (lb1) is:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "lb1",
  "server": true,
  "bind_addr": "10.0.0.1",
  "datacenter": "us-west-1b",
  "bootstrap": true,
  "rejoin_after_leave": true
}

(and similar for lb2, with only node_name and bind_addr changed to lb2 and 10.0.0.2 respectively)

The ansible-consul role also creates a consul user and group, and an upstart configuration file like this:

# cat /etc/init/consul.conf

# Consul Agent (Upstart unit)
description "Consul Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec sudo -u consul -g consul /opt/consul/bin/consul agent -config-dir /etc/consul.d -config-file=/etc/consul.conf >> /var/log/consul 2>&1
respawn
respawn limit 10 10
kill timeout 10

To start/stop consul, I use:

# start consul
# stop consul

Note that "server" is set to true and "bootstrap" is also set to true, which means that each consul server will be the leader of a cluster with 1 member, itself. To join the 2 servers into a consul cluster, I did the following:
  • join lb1 to lb2: on lb1 run consul join 10.0.0.2
  • tail /var/log/consul on lb1, note messages complaining about both consul servers (lb1 and lb2) running in bootstrap mode
  • stop consul on lb1: stop consul
  • edit /etc/consul.conf on lb1 and set  "bootstrap": false
  • start consul on lb1: start consul
  • tail /var/log/consul on both lb1 and lb2; it should show no more errors
  • run consul info on both lb1 and lb2; the output should show server=true on both nodes, but leader=true only on lb2
Next I ran the consul agent in regular non-server mode on the 2 Wordpress nodes. The configuration file /etc/consul.cfg on node wordpress1 was:
{  "domain": "consul.",  "data_dir": "/opt/consul/data",  "log_level": "INFO",  "node_name": "wordpress1",  "server": false,  "bind_addr": "10.0.1.1",  "datacenter": "us-west-1b",  "rejoin_after_leave": true}
(and similar for wordpress2, with the node_name set to wordpress2 and bind_addr set to 10.0.1.2)
After starting up the agents via upstart, I joined them to lb2 (although the could be joined to any of the existing members of the cluster). I ran this on both wordpress1 and wordpress2:
# consul join 10.0.0.2
At this point, running consul members on any of the 4 nodes should show all 4 members of the cluster:
Node          Address         Status  Type    Build  Protocollb1           10.0.0.1:8301   alive   server  0.4.0  2wordpress2    10.0.1.2:8301   alive   client  0.4.0  2lb2           10.0.0.2:8301   alive   server  0.4.0  2wordpress1    10.0.1.1:8301   alive   client  0.4.0  2
Install and run dnsmasq on all nodes
The ansible-consul role does this for you. Consul piggybacks on DNS resolution for service naming, and by default the domain names internal to Consul start with consul. In my case they are configured in consul.cfg via "domain": "consul."
The dnsmasq configuration file for consul is:
# cat /etc/dnsmasq.d/10-consul
server=/consul./127.0.0.1#8600
This causes dnsmasq to provide DNS resolution for domain names starting with consul. by querying a DNS server on 127.0.0.1 running on port 8600 (which is the port the local consul agent listens on to provide DNS resolution).
To start/stop dnsmasq, use: service dnsmasq start | stop.
Now that dnsmasq is running, you can look up names that end in .node.consul from any member node of the consul cluster (there are 4 member nodes in my cluster, 2 servers and 2 agents). For example, I ran this on lb2:
$ dig wordpress1.node.consul
; <<>> DiG 9.8.1-P1 <<>> wordpress1.node.consul;; global options: +cmd;; Got answer:;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2511;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:;wordpress1.node.consul. IN A
;; ANSWER SECTION:wordpress1.node.consul. 0 IN A 10.0.1.1
;; Query time: 1 msec;; SERVER: 127.0.0.1#53(127.0.0.1);; WHEN: Fri Nov 14 00:09:16 2014;; MSG SIZE  rcvd: 76
Configure services and checks on consul agent nodes

Internal DNS resolution within the .consul domain becomes even more useful when nodes define services and checks. For example, the 2 Wordpress nodes run varnish and apache (on port 80 and port 443) so we can define 3 services as JSON files in /etc/consul.d. On wordpress1, which is our active/primary node in haproxy, I defined these services:

$ cat http_service.json
{
    "service": {
        "name": "http",
        "tags": ["primary"],
        "port":80,
        "check": {
                "id": "http_check",
                "name": "HTTP Health Check",
  "script": "curl -H 'Host=www.mydomain.com' http://localhost",
        "interval": "5s"
        }
    }
}

$ cat ssl_service.json
{
    "service": {
        "name": "ssl",
        "tags": ["primary"],
        "port":443,
        "check": {
                "id": "ssl_check",
                "name": "SSL Health Check",
  "script": "curl -k -H 'Host=www.mydomain.com' https://localhost:443",
        "interval": "5s"
        }
    }
}

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["primary"],
        "port":6081 ,
        "check": {
                "id": "varnish_check",
                "name": "Varnish Health Check",
  "script": "curl http://localhost:6081",
        "interval": "5s"
        }
    }
}

Each service we defined has a name, a port and a check with its own ID, name, script that runs whenever the check is executed, and an interval that specifies how often the check is run. In the examples above I specified simple curl commands against the ports that these services are running on. Note also that each service has a list of tags associated with it. In my case, the services on wordpress1 have the tag "primary". The services defined on wordpress2 are identical to the ones on wordpress1 with the only difference being the tag, which on wordpress2 is "backup".

After restarting consul on wordpress1 and wordpress2, the following service-related DNS names are available for resolution on all nodes in the consul cluster (I am going to include only relevant portions of the dig output):

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.1
varnish.service.consul. 0 IN A 10.0.1.2

This name resolves in DNS round-robin fashion to the IP addresses of all nodes that are running the varnish service, regardless of their tags and regardless of the data centers that their nodes run in. In our case, it resolves to the IP addresses of wordpress1 and wordpress2.

Note that the IP address of a given node only appears in the DNS result set if the service running on that node has a healty check. If the check fails, then consul's DNS service will not include the IP of the node in the result set. This is very important for the dynamic discovery of healthy services.

$ dig varnish.service.us-west-1b.consul

;; ANSWER SECTION:
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.2
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.1

If we include the data center (in our case us-west-1b) in the DNS name we query, then only the services running on nodes in that data center will be returned in the result set. In our case though, all nodes run in the us-west-1b data center, so this query returns, like the previous one, the IP addresses of wordpress1 and wordpress2. Note that the IPs can be returned in any order, because of DNS round-robin. In this case the IP of wordpress2 was first.

$ dig SRV varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress1.node.us-west-1b.consul.
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress2.node.us-west-1b.consul.

;; ADDITIONAL SECTION:
wordpress1.node.us-west-1b.consul. 0 IN A 10.0.1.1
wordpress2.node.us-west-1b.consul. 0 IN A 10.0.1.2

A useful feature of the consul DNS service is that it returns the port number that a given service runs on when queried for an SRV record. So this query returns the names and IPs of the nodes that the varnish service runs on, as well as the port number, which in this case is 6081. The application querying for the SRV record needs to interpret this extra piece of information, but this is very useful for the discovery of internal services that might run on non-standard port numbers.

$ dig primary.varnish.service.consul

;; ANSWER SECTION:
primary.varnish.service.consul. 0 IN A 10.0.1.1

$ dig backup.varnish.service.consul

;; ANSWER SECTION:
backup.varnish.service.consul. 0 IN A 10.0.1.2

The 2 DNS queries above show that it's possible to query a service by its tag, in our case 'primary' vs. 'backup'. The result set will contain the IP addresses of the nodes tagged with the specific tag and running the specific service we asked for. This feature will prove useful when dealing with consul-template in haproxy, as I'll show later in this post.

Load balance across services

It's easy now to see how an application can take advantage of the internal DNS service provided by consul and load balance across services. For example, an application that needs to load balance across the 2 varnish services on wordpress1 and wordpress2 would use varnish.service.consul as the DNS name it talks to when it needs to hit varnish. Every time this DNS name is resolved, a random node from wordpress1 and wordpress2 is returned via the DNS round-robin mechanism. If varnish were to run on a non-standard port number, the application would need to issue a DNS request for the SRV record in order to obtain the port number as well as the IP address to hit.

Note that this method of load balancing has health checks built in. If the varnish health check fails on one of the nodes providing the varnish service, that node's IP address will not be included in the DNS result set returned by the DNS query for that service.

Also note that the DNS query can be customized for the needs of the application, which can query for a specific data center, or a specific tag, as I showed in the examples above.

Force a node out of service

I am still looking for the best way to take nodes in and out of service for maintenance or other purposes. One way I found so far is to deregister a given service via the Consul HTTP API. Here is an example of a curl command that accomplishes that, executed on node wordpress1:

$ curl -v http://localhost:8500/v1/agent/service/deregister/varnish
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/agent/service/deregister/varnish HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 17 Nov 2014 19:01:06 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
* Closing connection #0

The effect of this command is that the varnish service on node wordpress1 is 'deregistered', which for my purposes means 'marked as down'. DNS queries for varnish.service.consul will only return the IP address of wordpress2:

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.2

We can also use the Consul HTTP API to verify that the varnish service does not appear in the list of active services on node wordpress1. We'll use the /agent/services API call and we'll save the output to a file called services.out, then we'll use the jq tool to pretty-print the output:

$ curl -v http://localhost:8500/v1/agent/services -o services.out
$ jq . <<< `cat services.out`{  "http": {    "ID": "http",    "Service": "http",    "Tags": [      "primary"    ],    "Port": 80  },  "ssl": {    "ID": "ssl",    "Service": "ssl",    "Tags": [      "primary"    ],    "Port": 443  }}
Note that only the http and ssl services are shown.
Force a node back in service

Again, I am still looking for the best way to mark as service as 'up' once it was marked as 'down'. One way would be to register the service via the Consul HTTP API, and that requires issuing a POST request with the payload being the JSON configuration file for that service. Another way is to just restart the consul agent on the node in question. This will register the service that had been deregistered previously.

Install and configure consul-template

For the next few steps, I am going to show how to use consul-template in conjuction with consul for discovering services and configuring haproxy based on the discovered services.

I automated the installation and configuration of consul-template via an Ansible role that I put on Github, but I am going to discuss the main steps here. See also the instructions on the consul-template Github page.

In my Ansible role, I copy the consul-template binary to the target node (in my case the 2 haproxy nodes lb1 and lb2), then create a directory structure /opt/consul-template/{bin,config,templates}. The consul-template configuration file is /opt/consul-template/config/consul-template.cfg and it looks like this in my case:

$ cat config/consul-template.cfg
consul = "127.0.0.1:8500"

template {
  source = "/opt/consul-template/templates/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

Note that consul-template needs to be able to talk a consul agent, which in my case is the local agent listening on port 8500. The template that consul-template maintains is defined in another file,  /opt/consul-template/templates/haproxy.ctmpl. What consul-template does is monitor changes to that file via changes to the services referenced in the file. Upon any such change, consul-template will generate a new target file based on the template and copy it to the destination file, which in my case is the haproxy config file /etc/haproxy/haproxy.cfg. Finally, consul-template will executed a command, which in my case is the restarting of the haproxy service.

Here is the actual template file for my haproxy config, which is written in the Go template format:

$ cat /opt/consul-template/templates/haproxy.ctmpl

global
  log 127.0.0.1   local0
  maxconn 4096
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  option  dontlognull
  retries 3
  option redispatch
  timeout connect 5s
  timeout client 50s
  timeout server 50s
  balance  roundrobin

# Set up application listeners here.

frontend http
  maxconn {{key "service/haproxy/maxconn"}}
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
{{range service "primary.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

frontend https
  maxconn            {{key "service/haproxy/maxconn"}}
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
{{range service "primary.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}


To the trained eye, this looks like a regular haproxy configuration file, with the exception of the portions bolded above. These are Go template snippets which rely on a couple of template functions exposed by consul-template above and beyond what the Go templating language offers. Specifically, the key function queries a key stored in the Consul key/value store and outputs the value associated with that key (or an empty string if the value doesn't exist). The service function queries a consul service by its DNS name and returns a result set used inside the range statement. The variables inside the result set can be inspected for properties such as Node, Address and Port, which correspond to the Consul service node name, IP address and port number for that particular service.

In my example above, I use the value of the key service/haproxy/maxconn as the value of maxconn. In the http-varnish backend, I used 2 sets of services names, primary.varnish and backup.varnish, because I wanted to differentiate in haproxy.cfg between the primary server (wordpress1 in my case) and the backup server (wordpress2). In the ssl backend, I did the same but with the ssl service.
Everything so far would work fine with the exception of the key/value pair represented by the key service/haproxy/maxconn. To define that pair, I used the Consul key/value store API (this can be run on any member of the Consul cluster):
$ cat set_haproxy_maxconn.sh#!/bin/bash
MAXCONN=4000
curl -X PUT -d "$MAXCONN" http://localhost:8500/v1/kv/service/haproxy/maxconn
To verify that the value was set, I used:
$ cat query_consul_kv.sh#!/bin/bash
curl -v http://localhost:8500/v1/kv/?recurse
$ ./query_consul_kv.sh* About to connect() to localhost port 8500 (#0)*   Trying 127.0.0.1... connected> GET /v1/kv/?recurse HTTP/1.1> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3> Host: localhost:8500> Accept: */*>< HTTP/1.1 200 OK< Content-Type: application/json< X-Consul-Index: 30563< X-Consul-Knownleader: true< X-Consul-Lastcontact: 0< Date: Mon, 17 Nov 2014 23:01:07 GMT< Content-Length: 118<* Connection #0 to host localhost left intact* Closing connection #0[{"CreateIndex":10995,"ModifyIndex":30563,"LockIndex":0,"Key":"service/haproxy/maxconn","Flags":0,"Value":"NDAwMA=="}]
At this point, everything is ready for starting up the consul-template service (in Ubuntu), I did it via this Upstart configuration file:
# cat /etc/init/consul-template.conf# Consul Template (Upstart unit)description "Consul Template"start on (local-filesystems and net-device-up IFACE!=lo)stop on runlevel [06]
exec /opt/consul-template/bin/consul-template  -config=/opt/consul-template/config/consul-template.cfg >> /var/log/consul-template 2>&1
respawnrespawn limit 10 10kill timeout 10
# start consul-template
Once consul-template starts, it will peform the actions corresponding to the functions defined in the template file /opt/consul-template/templates/haproxy.ctmpl. In my case, it will query Consul for the value of the key service/haproxy/maxconn and for information about the 2 Consul services varnish.service and ssl.service. It will then save the generated file to /etc/haproxy/haproxy.cfg and it will restart the haproxy service. The relevant snippets from haproxy.cfg are:
frontend http  maxconn 4000  bind 0.0.0.0:80  default_backend servers-http
backend servers-http  balance            roundrobin  option httpchk GET /  option  httplog
    server wordpress1 10.0.1.1:6081 weight 1 check port 6081

    server wordpress2 10.0.1.2:6081 backup weight 1 check port 6081
and
frontend https  maxconn            4000  mode               tcp  bind               0.0.0.0:443  default_backend    servers-https
backend servers-https  mode               tcp  option             tcplog  balance            roundrobin
    server wordpress1 10.0.1.1:443 weight 1 check port 443

    server wordpress2 10.0.1.2:443 backup weight 1 check port 443
I've been running this as a test on lb2. I don't consider my setup quite production-ready because I don't have monitoring in place, and I also want to experiment with consul security tokens for better security. But this is a pattern that I think will work.





7 Articles On Risk Management in Software Development

From the Editor of Methods & Tools - Wed, 11/12/2014 - 16:20
While starting a new software development project creates some enthusiasm, the engineer part of the software developer and project manager will also see this event as a set of possible risks. These risks encompass many domains: business, project team, software architecture, technology… Besides being aware of the possible impediments to the project success, risk management is also used to make choices in the product and technology roadmaps and manage priorities when resources are limited. Here is a list of interesting articles that discusses approaches to risk in software development. Risk Management ...

“Are you listening? Say something!”

James Bach’s Blog - Mon, 10/27/2014 - 03:04

[Note: J. Michael Hammond suggests that I note right at the top of this post that the dictionary definition of listen does not restrict the word to the mere noticing of sounds. In the Oxford English Dictionary, as well as others, an extended and more active definition of listening is also given. It’s that extended definition of listening I am writing about here.]

I’m tired of hearing the simplistic advice about how to listen one must not talk. That’s not what listening means. I listen by reacting. As an extravert, I react partly by talking. Talking is how I chew on what you’ve told me. If I don’t chew on what you say, I will choke or get tummy aches and nightmares. You don’t want me to have nightmares, do you? Until you interrupt me to say otherwise, I charitably assume you don’t.

Below is an alternative theory of listening; one that does not require passivity. I will show how this theory is consistent the “don’t talk” advice if you consider that being quiet while other people speak is one heuristic of good listening, rather than the definition or foundation of it. I am tempted to say that listening requires talking, but that is not quite true. This is my proposal of a universal truth of listening: Listening requires you to change.

To Listen is to Change

  1. I propose that to listen is to react coherently and charitably to incoming information. That is how I would define listening.
  2. To react is to change. The reactions of listening may involve a change of mood, attention, concept, or even a physical action.

Notice that I said “coherently and charitably” and not “constructively” or “agreeably.” I think I can be listening to a criminal who demands ransom even if I am not constructive in my response to him. Reacting coherently is not the same as accepting someone’s view of the world. If I don’t agree with you or do what you want me to, that is not proof of my poor listening. “Coherently” refers to a way of making sense of something by interpreting it such that it does not contradict anything important that you also believe is true and important about the world. “Charitably” refers to making sense of something in a way most likely to fit the intent of the speaker.

Also, notice that coherence does not require understanding. I would not be a bad listener, necessarily, if I didn’t understand the intent or implications of what was told to me. Understanding is too high a burden to require for listening. Coherence and charitability already imply a reasonable attempt to understand, and that is the important part.

Poor listening would be the inability or refusal to do the following:

  • take in data at a reasonable pace. (“reasonable pace” is subject to disagreement)
  • make sense of data that is reasonably sensible in that context, including empathizing with it. (“reasonably sensible” is subject to disagreement)
  • reason appropriately about the data. (“reason appropriately” is subject to disagreement)
  • take appropriate responsibility for one’s feelings about the data (“appropriate responsibility” is subject to disagreement)
  • make a coherent response. (“coherent response” is subject to disagreement)
  • comprehend the reasonable purposes and nature of the interaction (“reasonable purposes and nature” is subject to disagreement)

Although all these elements are subject to disagreement, you might not choose to actively dispute them in a given situation, because maybe you feel that the disagreement is not very important. (As an example, I originally wrote “dispute” in the text above, which I think is fine, but during review, after hearing me read the above, Michael Bolton suggested changing “dispute” to “disagreement” and that seemed okay, too, so I made the change. In making his suggestion, he did not need to explain or defend his preference, because he’s earned a lot of trust with me and I felt listened to.)

I was recently told, in an argument, that I was not listening. I didn’t bother to reply to the man that I also felt he wasn’t listening to me. For the record, I think I was listening well enough, and what the man wanted from me was not listening– he wanted compliance to his world view, which was the very matter of dispute! Clearly he wasn’t getting the reaction he wanted, and the word he used for that was listening. Meanwhile, I had reacted to his statements with arguments against them. To me, this is close to the essence of listening.

If you really believe someone isn’t listening, it’s unlikely that it will help to say that, unless you have a strong personal relationship. When my wife tells me I’m not listening, that’s a very special case. She’s weaker than me and crucial to my health and happiness, therefore I will use every tool at my disposal to make myself easy for her to talk to. I generally do the same for children, dogs, people who seem mentally unstable, fire, and dangerous things, but not for most colleagues. I do get crossed up sometimes. Absolutely. Especially on Twitter. Sometimes I assume a colleague feels powerful, and respond to him that way, only later to discover he was afraid of me.

(This happened again just the other day on Twitter. Which is why it is unlikely you will see me teach in Finland any time soon! I am bitten by such a mistake a few times a year, at least. For me this is not a reason to be softer with my colleagues. Then again, it may be. I struggle with the pros and cons. There is no simple answer. I regularly receive counsel from my most trusted colleagues on this point.)

A Sign of Being Listened to is the Change that Happens

Introspect for a moment. How do you know that your computer is listening to you? At this moment, as I am typing, the letters I want to see are appearing on the screen as I press the keys. WordPress is talking back to me. WordPress is changing, and its changes seem coherent and reasonable to me. My purposes are apparently being served. The computer is listening. Consider what happens when you don’t see a response from your computer. How many times have you clicked “save” or “print” or “calculate” or “paste” and suffered that sinking feeling as the forest noises go completely silent and your screen goes glassy and gets that faraway grayed out look of the damned? You feel out of control. You want to shout at your screen “Come back! I’ve changed my mind! Undo! Cancel!” How do you feel then? You don’t say to yourself “what a good listener my computer is!”

Why is this so? It’s because you are involved in a cybernetic control loop with your computer. Without frequent feedback from your system you lose your control over it. You don’t know what it needs or what to do about it. It may be listening to something, but when nothing changes in a manner that seems to relate to your input, you suspect it is not listening to you.

Based just on this example I conjecture that we feel listened to when a system responds to our utterances and actions in a harmonious manner that honors our purposes. I further conjecture that the advice to maintain attentive silence in order to listen better is a special case of change in such a way as to foster harmony and supportiveness.

Can we think of a situation where listening to someone means shouting loudly over them? I can. I was recently in a situation where a quiet colleague was trying to get students to return to her tutorial after a break. The hallway was too noisy and few people could hear her. I noticed that, so I repeated her words very loudly that her students might hear. I would argue that I listened and responded harmoniously in support of her needs. I didn’t ask her if she felt that I listened to her. She knows I did. I could tell by her smile.

If my wife cries “brake!” when I’m driving, I hit the brake. The physical action of my foot on the brake is her evidence that I listened, not attentive silence or passivity.

It may be a small change or a large change, but for the person communicating with you to feel listened to, they must see good evidence of an appropriate change (or change process) in you.

Let me tell you about being a father of a strong-minded son. I have been in numerous arguments with my boy. I have learned how to get my point across: plant the idea, argue for a while, and then let go of it. I discovered it doesn’t matter if he seems to reject the idea. In fact, I’ve come to believe he cannot reject any idea of mine unless it is genuinely wrong for him. I know he’s listening because he argues with me. And if he gets upset, that means he must be taking it quite seriously. Then I wait. And I invariably see a response in the days that follow (I mean not a single instance of this not happening comes to mind right now).

One of the tragedies of fatherhood is that many fathers can’t tell when their children are listening because they need to see too specific a response too quickly. Some listening is a long process. I know that my son needs to chew on difficult ideas in order to process them. This is how to think about the listening process. True listening implies digestion and incubation. The mental metabolism is subtle, complicated, and absolutely vital.

Let People Chew on Your Ideas

Listening is not primarily about taking information into yourself, any more than eating is about taking food into yourself. With eating the real point is digestion. And for good listening you need to digest, too. Part of digestion is chewing, and for humans part of listening is reacting to the raw data for the purposes of testing understanding and contrasting the incoming data with other data they have. Listening well about any complicated thing requires testing. Does this apply to your spouse and children, too? Yes! But perhaps it applies differently to them than to a colleague at work, and certainly differently than testing-as-listening to politician or a telemarketer.

Why does this matter so much? Because if we uncritically accept ideas we risk falling prey to shallow agreement, which is the appearance of agreement despite an unrecognized deep disagreement. I don’t want to find out in the middle of a critical moment on a project that your definition of testing, or role, or collaboration, or curiosity doesn’t match mine. I want to have conversations about the meanings of words well before that. Therefore I test my understanding. Too many in the Agile culture seem to confuse a vacant smile with philosophical and practical comprehension. I was told recently that for an Agile tester, “collaboration” may be more important than testing skill. That is probably the stupidest thing I have heard all year. By “stupid” I mean willfully refusing to use one’s mind. I was talking to a smart man who would not use his smarts in that moment, because, by his argument, the better tester is the one who agrees to do anything for anyone, not the one who knows how to find important bugs quickly. In other words, any unskilled day laborer off the street, desperate for work, is apparently a better tester than me. Yeah… Right…

In addition to the idea digestion process, listening also has a critical social element. As I said above, whether or not you are listening is, practically speaking, always a matter of potential dispute. That’s the way of it. Listening practices and instances are all tied up in socially constructed rituals and heuristics. And these rituals are all about making ourselves open to reasonable change in response to each other. Listening is about the maintenance of social order as well as maintaining specific social relationships. This is the source of all that advice about listening by keeping attentively quiet while someone else speaks. What that misses is that the speaker also has a duty to perform in the social system. The speaker cannot blather on in ignorance or indifference to the idea processing practices of his audience. When I teach, I ask my students to interrupt me, and I strive to reward them for doing so. When I get up to speak, I know I must skillfully use visual materials, volume control, rhythm, and other rhetorical flourishes in order to package what I’m communicating into a more digestible form.

Unlike many teachers, I don’t interpret silence as listening. Silence is easy. If an activity can be done better and cheaper by a corpse or an inanimate object, I don’t consider it automatically worth doing as a living human.

I strongly disagree with Paul Klipp when he writes: “Then stop thinking about talking and pretend, if it’s not obvious to you yet, that the person who is talking is as good at thinking as you are. You may suddenly have a good idea, or you may have information that the person speaking doesn’t. That’s not a good enough reason to interrupt them when they are thinking.” Paul implies that interrupting a speaker is an expression of dominance or subversion. Yes, it can be, but it is not necessarily so, and I wish someone trained in Anthropology would avoid such an uncharitable oversimplification. Some interruptions are harmful and some are helpful. In fact, I would say that every social act is both harmful and helpful in some way. We must use our judgment to know what to say, how to say it and when. Stating favorite heuristics as if they were “best practices” is patronizing and unnecessary.

One Heuristic of Listening: Stop Talking

Where I agree with Paul and others like him is that one way of improving the harmony of communication and that feeling of being coherently and charitably responded to is to talk less. I’m more likely to use that in a situation where I’m dealing with someone whom I suspect is feeling weak, and whom I want to encourage to speak to me. However, another heuristic I use in that situation is to speak more. I do this when I want to create a rhetorical framework to help the person get his idea across. This has the side effect of taking pressure of someone who may not want to speak at all. I say this based on the vivid personal experience of my first date with the one who would become my wife. I estimate I spoke many thousands of words that evening. She said about a dozen. I found out later that’s just what she was looking for. How do I know? After two dates we got married. We’ve been married 23 years, so far. I also have many vivid experiences of difficult conversations that required me to sit next to her in silence for as long as 10 minutes until she was ready to speak. Both the “talk more” and “talk less” heuristics are useful for having a conversation.

What does this have to do with testing?

My view of listening can be annoying to people for exactly the same reason that testing is annoying to people. A developer may want me to accept his product without “judgment.” Sorry, man. That is not the tester’s way. A tester who doesn’t subject your product to criticism is, in fact, not taking it seriously. You should not feel honored by that, but rather insulted. Testing is how I honor strong, good products. And arguing with you may be how I honor your ideas.

Listening, I claim, is itself a testing process. It must be, because testing is how we come to comprehend anything deeply. Testing is a practice that enables deep learning and deeply trusting what we know.

Are You Listening to Me?

Then feel free to respond. Even if you disagree, you could well have been listening. I might be able to tell from your response, if that matters to you.

If you want to challenge this post, try reading it carefully… I will understand if you skip parts, or see one thing and want to argue with that. Go ahead. That might be okay. If I feel that there is critical information that you are missing, I will suggest that you read the post again. I don’t require that people read or listen to me thoroughly before responding. I ask only that you make a reasonable and charitable effort to make sense of this.

 

Categories: Testing & QA

Thought after Test Automation Day 2013

Henry Ford said “Obstacles are those frightful things you see when take your eyes off the goal.” After I’ve been to Test Automation Day last month I’m figuring out why industrializing testing doesn’t work. I try to put it in this negative perspective, because I think it works! But also when is it successful? A lot of times the remark from Ford is indeed the problem. People tend to see obstacles. Obstacles because of the thought that it’s not feasible to change something. They need to change. But that’s not an easy change.

After attending the #TAD2013 as it was on Twitter I saw a huge interest in better testing, faster testing, even cheaper testing by using tools to industrialize. Test automation has long been seen as an interesting option that can enable faster testing. it wasn’t always cheaper, especially the first time, but at least faster. As I see it it’ll enable better testing. “Better?” you may ask. Test automation itself doesn’t enable better testing, but by automating regression tests and simple work the tester can focus on other areas of the quality.

images

And isn’t that the goal? In the end all people involved in a project want to deliver a high quality product, not full of bugs. But they also tend to see the obstacles. I see them less and less. New tools are so well advanced and automation testers are becoming smarter and smarter that they enable us to look beyond the obstacles. I would like to say look over the obstacles.

At the Test Automation Day I learned some new things, but it also proved something I already know; test automation is here to stay. We don’t need to focus on the obstacles, but should focus on the goal.

Categories: Testing & QA

The Toyota Way: The need for doing it right the first time

After WWII Toyota started developing its Toyota Production System (TPS); which was identified as ‘Lean’ in the 1990s. Taiichi Ohno, Shigeo Shingo and Eiji Toyoda developed the system between 1948 and 1975. In the myth surrounding the system it was not inspired by the American automotive industry, but from a visit to American supermarkets, Ohno saw the supermarket as model for what he was trying to accomplish in the factor and perfect the Just-in-Time (JIT) production system. While accomplishing this low inventory levels were a key outcome of the TPS, and an important element of the philosophy behind its system is to work intelligently and eliminate waste so that only minimal inventory is needed.

As TPS and Lean have their own principles as outlined by Toyota:

  • Long-term Philosophy
  • Right process will produce the right results
  • Value to organization by developing people
  • Solving root problems drives organizational learning

As these principles were summed up and published by Toyota in 2001, by naming it “The Toyota Way 2001”. It consists the above named principles in two key areas: Continuous Improvement, and Respect for People.

the-toyota-way

The principles for a continuous improvement include establishing a long-term vision, working on challenges, continual innovation, and going to the source of the issue or problem. The principles relating to respect for people include ways of building respect and teamwork. When looking at the ALM all these principles come together in the ‘first time right’ approach already mentioned. And from Toyota’s view they were outlined as followed:

  • The right process will produce the right results
    • Create continuous process flow to bring problems to the surface.
    • Use the ‘pull’ system to avoid overproduction (kanban)
    • Level out the workload (heijunka).
    • Build a culture of stopping to fix problems, to get quality right from the first (jidoka)
  • Continuously solving root problems drives organizational learning
    • Go and see for yourself to thoroughly understand the situation (Genchi Genbutsu);
    • Make decisions slowly by consensus, thoroughly considering all options (nemawashi); implement decisions rapidly;
    • Become a learning organization through relentless reflection (hansei) and continuous improvement (kaizen).
Let’s do it right now!

As the economy is changing and IT is more common sense throughout ore everyday life the need for good quality software products has never been this high. Software issues create bigger and bigger issues in our lives. Think about trains that cannot ride due to software issues, bank clients that have no access to their bank accounts, and people oversleeping because their alarm app didn’t work on their iPhone. As Capers Jones [Jones, 2011] states in his 2011 study that “software is blamed for more major business problems than any other man-made product” and that “poor quality has become one of the most expensive topics in human history”. The improvement of software quality is a key topic for all industries.

 Right the first time vs jidoka

In both TPS and Lean autonomation or jidoka are used. Autonomation can be described as ‘intelligent autonomation’, it means that when an abnormal situation arises the ‘machine’ stops and fix the abnormality. Autonomation prevents the production of defective products, eliminates overproduction, and focuses attention on understanding the problem and ensuring that it never recurs; a quality control process that applies the following four principles:

  • Detect the abnormality.
  • Stop.
  • Fix or correct the immediate condition.
  • Investigate the root cause and install a countermeasure.
Find defects as early as possible

In other words autonomation helps to get quality right the first time perfectly. With IT projects being different from the Toyota car production line, ‘perfectly’ may be a bit too much, but the process around quality assurance should be the same:

  • Find the defect.
  • Stop.
  • Fix or correct the error.
  • Investigate the root cause and take countermeasures.

The defect should be found as early as possible to be fixed as early as possible. And as with Lean and TPS the reason behind this is to make it possible to address the identification and correction of defects immediately in the process.

Categories: Testing & QA