Software Development Blogs: Programming, Software Testing, Agile, Project Management


Testing & QA

Quote of the Month January 2016

From the Editor of Methods & Tools - Wed, 01/06/2016 - 16:19
Testing by itself does not improve software quality. Test results are an indicator of quality, but in and of themselves, they don’t improve it. Trying to improve software quality by increasing the amount of testing is like trying to lose weight by weighing yourself more often. What you eat before you step onto the scale […]

Git stash driven development

Actively Lazy - Tue, 01/05/2016 - 22:23

I’ve found myself using a pattern quite often recently, which I’ve been calling “git stash driven development” – that is, relying heavily on the magic of git stash as part of my development workflow.

Normally I follow what I think of as a fairly typical TDD workflow:

  • Write next test, watch it fail
  • Write code to make it pass
  • Commit
  • Refactor
  • Commit
  • Push

This cycle can repeat very frequently – as often as every couple of minutes. Sometimes this cycle gets slowed down when the next test to write isn’t obvious or the refactoring needs more thought. But generally this is the process I try and follow.

Quite often, having written the next test that takes me forwards on my feature, I hit a problem: I can’t actually make the test pass (easily). First I need to refactor to make the problem easy. In that situation I can mark the test as ignored, commit and come back to it later. I refactor as required, commit, push; then finally unignore my test and get back to where I was before. This is a fairly neat process.

However, there are a couple of times when this process doesn’t work. What if I’m part way through writing my test and realise I can’t finish without refactoring the test infrastructure? I can’t ignore my test; it probably isn’t even compiling. I certainly don’t want to commit it in its current state. I could just bin my test and re-write it; if I’m following the 15 minute rule I’m not going to lose much work. But, with the magic of git stash, I can stash my changes and come back once I’ve refactored the test code.
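
In git terms, that flow is just a stash wrapped around an ordinary commit; a sketch (the middle steps are whatever the refactoring needs):

$ git stash          # park the half-written, non-compiling test
# ...refactor the test infrastructure, commit, push...
$ git stash pop      # resume the test exactly where I left off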

The more annoying case is when I’m part way through a refactoring step. This happens more commonly when I’m really working through a design change – this isn’t quite refactoring, as it often happens outside of the normal TDD loop. I’m trying to evolve the design to somewhere different; sometimes this is driven by tests, sometimes it’s a non-feature-changing refactor. But often there are non-trivial changes happening across numerous source files. At this point it is very easy to get part way through a refactor and realise that something else needed to happen first. I could bin my change – I only stand to lose 15 minutes’ work – but why throw it away when I have git stash?

So I git stash my changes, then go and make the change that needed to happen first. Then, all too commonly, I get part way through this second change and realise something else needs to happen first. Well, git stash again! This stack of git stashes can get quite deep if you’re not careful. But once I’ve bottomed out the stack, once I’ve managed to commit a refactor that frees up the step above, I can git stash pop, complete the next refactor, commit, git stash pop; and so on up the stack until I’m done.
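
Unwinding one of these stacks looks roughly like this (a sketch with hypothetical commit messages; each pop resumes whatever was parked most recently):

$ git stash                        # park refactor A: it needs B first
# ...start refactor B, discover it needs C first...
$ git stash                        # park refactor B too
# ...complete refactor C...
$ git commit -am "Refactor C"
$ git stash pop                    # resume refactor B, complete it
$ git commit -am "Refactor B"
$ git stash pop                    # resume refactor A, complete it
$ git commit -am "Refactor A"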

Now, arguably, I’m discovering the refactoring in reverse order, but that often seems to be how I find it. I could have spent more time analysing the change in detail, of course; spent time planning out my change on paper before embarking on it in the correct order. However, that is always time consuming, and there’s still the risk that I miss something and come at a change “backwards”. I find that using git stash in this way lets me discover the refactoring I need to make one step at a time. Each commit is kept small; I try to stick to the 15 minute rule so that no single commit puts more than 15 minutes’ work at risk. Ultimately the design change is completed as a sequence of small commits, each of which builds logically on the one before. They’ve been discovered by exploration; the commits were just discovered in reverse order.

The danger is always that I find a refactoring step I can’t complete the way I’d imagined – then I can’t unwind the stack, and potentially none of the previous git stashes is committable. Whenever this happens, I normally find that going one or two levels up the stack presents a different approach, from where I can continue as before.


Categories: Programming, Testing & QA

Distributing a beta version of an iOS app

Agile Testing - Grig Gheorghiu - Fri, 01/01/2016 - 20:41

I am not an iOS expert by any means, but recently I’ve had to maintain an iOS app and distribute it to beta testers. I had to jump through a few hoops, so I am documenting here the steps I had to take.

First of all, I am using Xcode 6.4 with the Fabric 2.1.1 plugin. I assume you are already signed up for the Fabric/Crashlytics service and that you also have an Apple developer account.
  1. Ask each beta tester to send you the UDID of each device they want to run your app on.
  2. Go to developer.apple.com -> "Certificates, Identifiers and Profiles" -> "Devices" and add each device with its associated UDID. Let’s say you add a device called "Tom’s iPhone 6s" with its UDID.
  3. Go to Xcode -> Preferences -> Accounts. If you already have an account set up, remove it by selecting it and clicking the minus icon on the lower left side. Then add an account: click the plus icon, choose "Add Apple ID" and enter your Apple ID and password. This will import your Apple developer provisioning profile into Xcode, with the newly added device UDIDs. (Note: there may be a better way of adding/modifying the provisioning profile within Xcode, but this worked for me.)
  4. Make sure the Fabric plugin is running on your Mac.
  5. Go to Xcode and choose the iOS application you want to distribute. Choose iOS Device as the target for the build.
  6. Go to Xcode -> Product -> Archive. This will build the app, then the Fabric plugin will pop up a message box asking you if you want to distribute the archive build. Click Distribute.
  7. The Fabric plugin will pop up a dialog box asking you for the email of the tester(s) you want to invite. Enter one or more email addresses. Enter release notes. At this point the Fabric plugin will upload your app build to the Fabric site and notify the tester(s) that they are invited to test the app.

Early Impressions of Kotlin

Mistaeks I Hav Made - Nat Pryce - Thu, 12/31/2015 - 00:07
We’ve been using the Kotlin programming language for a few weeks on our latest project to perform technical experiments, explore the problem space, and write a few HTTP services. I’ve also ported Hamcrest to Kotlin, as HamKrest, to help us write tests, and written a small library for type safe configuration of our services.

Why Kotlin?

The organisation I’m working with has mature infrastructure for deploying JVM services in their internal PaaS cloud. They use a mix of Java and Scala but have found Scala builds too slow. Watching my colleague struggle to use Java 8 streams to write what should have been basic functional map-and-fold code, I decided to have a look at some other “post-Java” JVM languages.

We wanted a typed language. We wanted language-aware editing. And we wanted a language that had an organisation behind its development and enough people using it that we could get questions answered, even the stupid ones we’d be likely to ask while learning. That eliminated dynamically typed languages (Groovy, JavaScript, Clojure) and languages that are less popular or have small, informal development teams behind them (Xtend, Gosu, Fantom, Frege, etc.). In the end it came down to Red Hat’s Ceylon and JetBrains’ Kotlin. Of the two, Ceylon is the more innovative, and therefore (to me, anyway) the more interesting, but Kotlin met more of our criteria: it has a more active community, is being used for commercial development by JetBrains and a number of other companies [1], has good editing support within IntelliJ, has an active community on social media, and promises easy interop with existing Java libraries.

Good points

Kotlin was very quick to learn. It is a conservative increment to Java that smooths off a lot of Java’s rough edges. It is small and regular, with few special cases and gotchas to learn. In many ways it feels a bit like a compiled, typed Python with curly-brace syntax.

The type system is a breath of fresh air compared to Java. Type inference makes code less cluttered. There is no distinction between primitive and reference types. Generics and subtyping work together far better than in Java: the type system uses declaration-site variance, not use-site variance, and variance does not usually have to be specified at all for functions. You never have unavoidable compiler warnings, as you do with Java’s type system.

Functional programming is more convenient in Kotlin than in Java. You can define free-standing functions and constants at the top level of a module, instead of having to define them in a “Utils” class. Function definitions can be nested. There is language support for immutable value types (aka “data classes”) and algebraic data types (“sealed classes”). Null safety is enforced by the type system, and variables and fields are non-nullable by default. The language defines standard function types and a lambda syntax for anonymous functions.

You can define extension methods on existing types. They are only syntactic sugar for free-standing functions, but nevertheless can lead to more concise, expressive code. The Kotlin standard library defines a number of useful extension methods, especially on the Iterable and String types.

Kotlin supports, and carefully controls, operator overloading. You can overload operators that act as functions (e.g. arithmetic, comparison, function call) but not those that perform flow control (e.g. short-cut logical operators). You cannot define your own operators, which will stop me going down the rabbit hole of “ascii-art programming” that I found hard to resist when writing Scala.

Class definitions require a lot less boilerplate than in Java. Using old-fashioned getter-and-setter style Java code is made more convenient by language support for bean properties. Anonymous extension methods (borrowed from Groovy, I believe) make domain-specific embedded languages easier to write. Method-chaining builder APIs are unnecessary in Kotlin, and more awkward than using the apply function to set properties of an object.

Kotlin has a philosophy of preferring behaviour to be explicitly specified. For example, there is no automatic coercion between numeric types, even from low to high precision, so you have to explicitly convert from Int to Long. I like that, but I expect some people will find it annoying.

Surprises

I didn’t miss destructuring pattern match. Kotlin doesn’t have destructuring pattern match or language support for tuples. A limited form of destructuring can be used in for loops and assignments, and the when expression is an alternative conditional expression that doesn’t do destructuring. I was surprised to find that I didn’t really miss destructuring. Without tuples you have to use data classes with named fields, and flow-sensitive typing then works rather well where destructuring would be necessary in a language whose type system is not flow-sensitive. For example:

    sealed class Result<T> {
        class Success<T>(val value: T) : Result<T>()
        class Failure<T>(val exception: Throwable) : Result<T>()
    }
    ...
    val e: Result<String> = doSomething()
    if (e is Result.Success) {
        // e is known to be a Result.Success and
        // can be used as such without a downcast
        println(e.value)
    }

Sealed classes cannot be data classes. An algebraic data type cannot be a data class, so you have to write equals, hashCode, etc. yourself. This is a surprising omission, since I always want an algebraic data type to be a value type. The Kotlin developers say that this will be fixed after version 1.0 is released.

Functions are not quite first-class objects. There’s a difference between f1 and f2 below:

    val f1 = { i: Int -> i + 1 }
    fun f2(i: Int) = i + 1

Code can refer to the value of f1 directly, but must use the “::” operator to obtain a reference to f2:

    val f = ::f2

Null-safe operators push me to write more extension functions. The null-safe operators (?. and ?:) only help when dereferencing a nullable reference. When you have a nullable reference and want to pass the value (if it exists) to a function as a parameter, you have to use the awkward construct:

    nullableThing?.let { thing -> fn(thing, other, parameters) }

I find myself refactoring my functions into extension functions to reduce the syntactic noise, letting me rewrite the code above as:

    nullableThing?.fn(other, parameters)

Is that a good thing? I’m not sure. Sometimes it feels a little forced.

Companion objects. Kotlin borrows the concept of companion objects from Scala. Why can’t classes be objects? Perhaps it’s a limitation of the JVM, but compared to, say, Python, it feels clunky.

Frustrations

I only have a couple of frustrations with Kotlin.

Lack of polymorphism. Operators and the for loop are syntactic sugar that desugars to method calls. The methods are invoked statically and structurally: the target does not need to implement an interface that defines the method. For example, a + b desugars to a.plus(b). However, there’s no way to write a generic function that sums its parameters, because plus is not defined on an interface that can be used as an upper bound. There’s no way to define the following function for any type that has a plus operator method:

    fun <T> sum(first: T, second: T): T = first + second

You would have to define overloads for different types, but then you wouldn’t be able to call sum within a generic function:

    fun sum(first: Int, second: Int) = first + second
    fun sum(first: Long, second: Long) = first + second
    fun sum(first: Double, second: Double) = first + second
    fun sum(first: Matrix, second: Matrix) = first + second
    fun sum(first: Money, second: Money) = first + second
    ...

That kind of duplication is what generics are meant to avoid!

Extension methods are also statically bound, which means you cannot write a generic function that calls an extension method on the value of a type parameter. For example, the standard library defines the same extension methods on unrelated types (forEach, map, flatMap, fold, etc.), but there is no concept of, say, “mappable” or “foldable”, nor a way of defining such a concept and applying it to existing types so that you can write generic functions over unrelated types. Kotlin doesn’t have higher-kinded types that would let you express this kind of generic function, or type classes that would let you add polymorphism without modifying existing type definitions. Compared to Rust’s traits, which support extension methods, operator overloading, type classes for parametric polymorphism, and interfaces for subtype polymorphism, Kotlin’s monomorphic extension methods and operator overloading are quite limited and do not help refactor duplicated logic. However, for typical monomorphic, procedural Java code this is probably not an issue.

Optionality is a special case. Kotlin’s support for nullable types is implemented by a special case in the type system and special-case operators that only apply to nullable references. The operators do not desugar to methods that can be implemented for other types. For example, optionalThing?.foo can be considered a map of the function { thing -> thing.foo } over the option optionalThing, and if foo itself is nullable, it can be considered a flatMap. But the expressions do not desugar to map and flatMap, and if you want to map or flatMap a function, you have to use a different syntax: optionalThing?.let(theFunction). For typical Java code, which is monomorphic and uses null references with wild abandon, language support for nullability is invaluable. But I would find it more convenient if it could be used polymorphically, or if Kotlin used a common naming convention for optional and other functor types.

Null safety is not enforced when you interop with Java code, and you do that a lot: Kotlin doesn’t have many libraries or much of a runtime, and makes it easy to call existing Java libraries directly. Kotlin expects you to know what you’re doing with respect to null references when calling Java code, and doesn’t force you to treat every value returned from Java as nullable. As a result, null safety is a bit of an illusion in a lot of our Kotlin code, and it has come back to bite us.

Conclusion

The code we’ve been writing has been a mix of coordinating and piping data between existing Java libraries – Apache HTTP client, Undertow HTTP server, JDBC, Sesame, JSON and XML parsers, and so on – and algorithmic code that analyses human-readable text. For that kind of work, Kotlin has been very useful. The Kotlin code is far more concise than the equivalent Java, and in our domain models and algorithmic code, Kotlin’s type safety, and especially null safety, has been a big help.

Exactly how happy we’ve been with Kotlin has depended on the design style of the code we’re writing. For functional programming, Kotlin has occasionally been frustrating, because we cannot use parametric polymorphism to factor out duplicated logic as much as we’d like. For object-oriented programming, Kotlin’s concise syntax for class definitions and language support for delegation avoid a lot of Java’s boilerplate. However, most Java out there is monomorphic, procedural code moving data between “NOJOs” and APIs that expect objects to have “bean” getters and setters. Kotlin has made working with that kind of API much easier and far more concise than doing so in Java.

[1] As far as I can tell, Red Hat sponsor development of Ceylon but do not actually use it to develop their own products. If I’m wrong, please let me know in the comments.
Categories: Programming, Testing & QA

Installing and configuring Raspbian Jessie on a Raspberry Pi B+

Agile Testing - Grig Gheorghiu - Wed, 12/23/2015 - 20:38
I blogged before about configuring a Raspberry Pi B+ with Raspbian Wheezy. Here are some notes I took today while going through the whole process again, but this time with the latest Raspbian version, Jessie, from 2015-11-21. Many steps are the same, but I will add instructions for configuring a wireless connection.

1) Bought a micro SD card. Note: DO NOT get a regular SD card for the B+ because it will not fit in the SD card slot. You need a micro SD card.

2) Inserted the SD card via an SD USB adaptor in my MacBook Pro.

3) Went to the command line and ran df to see which volume the SD card was mounted as. In my case, it was /dev/disk2s1.

4) Unmounted the SD card from my Mac. I initially tried 'sudo umount /dev/disk2s1' but the system told me to use 'diskutil unmount', so the command that worked for me was:

$ diskutil unmount /dev/disk2s1

5) Downloaded 2015-11-21-raspbian-jessie.zip from https://downloads.raspberrypi.org/raspbian/images. Unzipped it to obtain the image file 2015-11-21-raspbian-jessie.img.
6) Used dd to copy the image from my Mac to the SD card. Thanks to an anonymous commenter on my previous blog post, I specified the target of the dd command as the raw device /dev/rdisk2. Note: DO NOT specify the target as /dev/disk2s1 or /dev/rdisk2s1. Either /dev/disk2 or /dev/rdisk2 will work, but copying to the raw device is faster. Here is the dd command I used:
$ sudo dd if=2015-11-21-raspbian-jessie.img of=/dev/rdisk2 bs=1m
3752+0 records in
3752+0 records out
3934257152 bytes transferred in 233.218961 secs (16869371 bytes/sec)
7) I unmounted the SD card from my Mac one more time:
$ diskutil unmount /dev/disk2s1
8) I inserted the SD card into my Raspberry Pi. I also inserted a USB WiFi adapter (I used the Wi-Pi 802.11n adapter). My Pi was also connected to a USB keyboard, to a USB mouse and to a monitor via HDMI. 
9) I powered up the Pi. It went through the Raspbian Jessie boot process uneventfully, and it brought up the X Windows GUI interface (which is the default in Jessie, as opposed to the console in Wheezy). At this point, I configured the Pi to boot back into console mode by going to Menu -> Preferences -> Raspberry Pi Configuration and changing the Boot option from "To Desktop" to "To CLI". While in the configuration dialog, I also changed the default password for user pi, and unchecked the autologin option.
10) I rebooted the Pi and this time it booted up in console mode and stopped at the login prompt. I logged in as user pi.
11) I spent the next 30 minutes googling around to find out how to make the wireless interface work. It's always been a chore for me to get wlan to work on a Pi, hence the following instructions (based on this really good blog post).
12) Edit /etc/network/interfaces:
(i) change "auto lo" to "auto wlan0"
(ii) change "iface wlan0 inet manual" to "iface wlan0 inet dhcp"
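
After those two edits, the relevant part of /etc/network/interfaces should look roughly like this (a sketch; the allow-hotplug and wpa-conf lines are already present in the stock Jessie file, and yours may differ slightly):

auto wlan0
allow-hotplug wlan0
iface wlan0 inet dhcp
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf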
13) Edit /etc/wpa_supplicant/wpa_supplicant.conf and add this at the end:
network={
    ssid="your_ssid"
    psk="your_ssid_password"
}
14) Rebooted the Pi and ran ifconfig. At this point I could see wlan0 configured properly with an IP address.
Hope these instructions work for you. Merry Christmas!

Software Architecture for Rookies

From the Editor of Methods & Tools - Wed, 12/23/2015 - 17:10
How frequently have you encountered people who are neither coding- nor tech-savvy discussing software architecture? How frequently are they decision-makers? Yes, it happens, and everyone who experiences it knows what it causes. The point is to find a way to talk about software architecture with all interested stakeholders (even if they are business/product people who […]

Software Development Conferences Forecast December 2015

From the Editor of Methods & Tools - Mon, 12/21/2015 - 16:01
Here is a list of software development related conferences and events on Agile project management (Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

Software Development Linkopedia December 2015

From the Editor of Methods & Tools - Fri, 12/11/2015 - 09:52
Here is our monthly selection of knowledge on programming, software testing and project management. This month you will find some interesting information and opinions about hiring software developers, mobile testing, self-organization, technical debt, Agile testing, product prioritization, menu design, project estimating, Agile learning and Devops in large organizations. Blog: How to Onboard Software Engineers – […]

GTAC 2015 Wrap Up

Google Testing Blog - Tue, 12/08/2015 - 21:45
by Michael Klepikov and Lesley Katzen on behalf of the GTAC Committee

The ninth GTAC (Google Test Automation Conference) was held on November 10-11 at the Google Cambridge office, the “Hub” of innovation. The conference was completely packed with presenters and attendees from all over the world, from industry and academia, discussing advances in test automation and the test engineering computer science field, bringing with them a huge diversity of experiences. Speakers from numerous companies and universities (Applitools, Automattic, Bitbar, Georgia Tech, Google, Indian Institute of Science, Intel, LinkedIn, Lockheed Martin, MIT, Nest, Netflix, OptoFidelity, Splunk, Supersonic, Twitter, Uber, University of Waterloo) spoke on a variety of interesting and cutting edge test automation topics.


All presentation videos and slides are posted on the Video Recordings and Presentations pages. All videos have professionally transcribed closed captions, and the YouTube descriptions include links to the slides. Enjoy and share!

We had over 1,300 applications, more than 200 of them to speak. Over 250 people filled our venue to capacity, and the live stream had a peak of about 400 concurrent viewers, with about 3,300 total viewing hours.

Our goal in hosting GTAC is to make the conference highly relevant and useful for both attendees and the larger test engineering community as a whole. Our post-conference survey shows that we are close to achieving that goal; thanks to everyone who completed the feedback survey!

  • Our 82 survey respondents were mostly (81%) test-focused professionals with a wide range of 1 to 40 years of experience.
  • 76% of respondents rated the conference as a whole as above average, with marked satisfaction for the venue, the food (those Diwali treats!), and the breadth and coverage of the talks themselves.


The top five most popular talks were:

  • The Uber Challenge of Cross-Application/Cross-Device Testing (Apple Chow and Bian Jiang) 
  • Your Tests Aren't Flaky (Alister Scott) 
  • Statistical Data Sampling (Celal Ziftci and Ben Greenberg) 
  • Coverage is Not Strongly Correlated with Test Suite Effectiveness (Laura Inozemtseva) 
  • Chrome OS Test Automation Lab (Simran Basi and Chris Sosa).


Our social events also proved to be crowd pleasers. They were a direct response to feedback from GTAC 2014 asking for organized opportunities for socialization among GTAC attendees.


This isn’t to say there isn’t room for improvement. 11% of respondents expressed frustration with event communications and provided some long, thoughtful suggestions for what we could do to improve next year. Also, many of the long-form comments asked for a better mix of technologies, noting that mobile had a big presence in the talks this year.

If you have any suggestions on how we can improve, please comment on this post, or better yet – fill out the survey, which remains open. Based on feedback from last year urging more transparency in speaker selection, we included an individual outside of Google in the speaker evaluation. Feedback is precious, we take it very seriously, and we will use it to improve next time around.

Thank you to all the speakers, attendees, and online viewers who made this a special event once again. To receive announcements about the next GTAC, currently planned for early 2017, subscribe to the Google Testing Blog.

Categories: Testing & QA

Protecting your site for free with Let's Encrypt SSL certificates and acmetool

Agile Testing - Grig Gheorghiu - Tue, 12/08/2015 - 02:47
The buzz level around Let's Encrypt has been more elevated lately, due to their opening up their service as a public beta. If you don't know what Let's Encrypt is, it's a Certificate Authority which provides SSL certificates free of charge. The twist is that they implement a protocol called ACME ("Automated Certificate Management Environment") for automating the management of domain-validation certificates, based on a simple JSON-over-HTTPS interface. Read more technical details about Let's Encrypt here.

The certificates from Let's Encrypt have a short life of 90 days; this is done on purpose, to encourage web site administrators to renew them programmatically and automatically. In what follows, I'll walk you through how to obtain and install Let's Encrypt certificates for nginx on Ubuntu. I will use a tool called acmetool, and not the official Let's Encrypt client tools, because acmetool generates standalone SSL keys and certs and doesn't try to reconfigure a given web server automatically in order to use them (like the letsencrypt client tools do). I like this separation of concerns. Plus acmetool is written in Go, so you just deploy it as a binary and you're off to the races.
1) Configure nginx to serve your domain name

I will assume you want to protect www.mydomain.com with SSL certificates from Let's Encrypt. The very first step, which I assume you have already taken, is to configure nginx to serve www.mydomain.com on port 80. I also assume the document root is /var/www/mydomain.
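
If you don't have that configured yet, a minimal server block along the following lines (using the names assumed in this walkthrough) is enough to start from:

server {
    listen 80;
    server_name www.mydomain.com;
    root /var/www/mydomain;
}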
2) Install acmetool

$ sudo apt-get install libcap-dev
$ git clone https://github.com/hlandau/acme
$ cd acme
$ make && sudo make install
3) Run "acmetool quickstart" to configure ACME
The ACME protocol requires verification of your ownership of mydomain.com. There are multiple ways to prove ownership; the one I chose below was to let the ACME agent (in this case acmetool) drop a file under the nginx document root. As part of the verification, the ACME agent will also generate a keypair under the covers and sign a nonce sent by the ACME server with the private key, in order to prove possession of the keypair.

# acmetool quickstart
------------------------- Select ACME Server -----------------------
Please choose an ACME server from which to request certificates. Your principal choices are the Let's Encrypt Live Server, and the Let's Encrypt Staging Server.
You can use the Let's Encrypt Live Server to get real certificates.
The Let's Encrypt Staging Server does not issue publically trusted certificates. It is useful for development purposes, as it has far higher rate limits than the live server.
  1) Let's Encrypt Live Server - I want live certificates
  2) Let's Encrypt Staging Server - I want test certificates
  3) Enter an ACME server URL
I chose option 1 (Let's Encrypt Live Server).
----------------- Select Challenge Conveyance Method ---------------
acmetool needs to be able to convey challenge responses to the ACME server in order to prove its control of the domains for which you issue certificates. These authorizations expire rapidly, as do ACME-issued certificates (Let's Encrypt certificates have a 90 day lifetime), thus it is essential that the completion of these challenges is a) automated and b) functioning properly. There are several options by which challenges can be facilitated:
WEBROOT: The webroot option installs challenge files to a given directory. You must configure your web server so that the files will be available at <http://[HOST]/.well-known/acme-challenge/>. For example, if your webroot is "/var/www", specifying a webroot of "/var/www/.well-known/acme-challenge" is likely to work well. The directory will be created automatically if it does not already exist.
PROXY: The proxy option requires you to configure your web server to proxy requests for paths under /.well-known/acme-challenge/ to a special web server running on port 402, which will serve challenges appropriately.
REDIRECTOR: The redirector option runs a special web server daemon on port 80. This means that you cannot run your own web server on port 80. The redirector redirects all HTTP requests to the equivalent HTTPS URL, so this is useful if you want to enforce use of HTTPS. You will need to configure your web server to not listen on port 80, and you will need to configure your system to run "acmetool redirector" as a daemon. If your system uses systemd, an appropriate unit file can automatically be installed.
LISTEN: Directly listen on port 80 or 443, whichever is available, in order to complete challenges. This is useful only for development purposes.
  1) WEBROOT - Place challenges in a directory
  2) PROXY - I'll proxy challenge requests to an HTTP server
  3) REDIRECTOR - I want to use acmetool's redirect-to-HTTPS functionality
  4) LISTEN - Listen on port 80 or 443 (only useful for development purposes)
I chose option 1 (WEBROOT).
------------------------- Enter Webroot Path -----------------------
Please enter the path at which challenges should be stored.
If your webroot path is /var/www, you would enter /var/www/.well-known/acme-challenge here.
The directory will be created if it does not exist.
Webroot paths vary by OS; please consult your web server configuration.
I indicated /var/www/mydomain/.well-known/acme-challenge as the directory where the challenge will be stored.
------------------------- Quickstart Complete ----------------------
The quickstart process is complete.
Ensure your chosen challenge conveyance method is configured properly before attempting to request certificates. You can find more information about how to configure your system for each method in the acmetool documentation: https://github.com/hlandau/acme/blob/master/doc/WSCONFIG.md
To request a certificate, run:
$ sudo acmetool want example.com www.example.com
If the certificate is successfully obtained, it will be placed in /var/lib/acme/live/example.com/{cert,chain,fullchain,privkey}.
Press Return to continue.
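
Since I chose WEBROOT, nginx has to serve the challenge files over plain HTTP. With the document root above this happens automatically; if your layout differs, a location block along these lines (a sketch using this walkthrough's paths) maps the path explicitly:

location /.well-known/acme-challenge/ {
    alias /var/www/mydomain/.well-known/acme-challenge/;
}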
4) Obtain the Let's Encrypt SSL key and certificates for www.mydomain.com
As the quickstart output indicates above, we need to run:
# acmetool want www.mydomain.com
This should run with no errors and drop the following files in /var/lib/acme/live/www.mydomain.com: cert, chain, fullchain, privkey and url.
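
A quick sanity check is to list that directory; with the domain used in this walkthrough you should see something like:

# ls /var/lib/acme/live/www.mydomain.com
cert  chain  fullchain  privkey  url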
5) Configure nginx to use the Let's Encrypt SSL key and certificate chain
I found a good resource for specifying secure (as of Dec. 2015) SSL configurations for a variety of software, including nginx: cipherli.st.
Here is the nginx configuration pertaining to SSL that I used, pointing to the SSL key and certificate chain retrieved by acmetool from Let's Encrypt:
        listen 443 ssl default_server;
        listen [::]:443 ssl default_server;

        ssl_certificate     /var/lib/acme/live/www.mydomain.com/fullchain;
        ssl_certificate_key /var/lib/acme/live/www.mydomain.com/privkey;

        ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH";
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_prefer_server_ciphers on;
        ssl_session_cache shared:SSL:10m;
        add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        ssl_session_tickets off; # Requires nginx >= 1.5.9
        ssl_stapling on; # Requires nginx >= 1.3.7
        ssl_stapling_verify on; # Requires nginx >= 1.3.7
At this point, if you hit www.mydomain.com over SSL, you should be able to inspect the SSL certificate and see that it's considered valid by your browser (I tested it in Chrome, Firefox and Safari). The Issuer Name has Organization Name "Let's Encrypt" and Common Name "Let's Encrypt Authority X1".
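
You can also check the certificate from the command line with standard openssl tooling (not specific to acmetool); something like this prints the issuer and validity dates:

$ echo | openssl s_client -connect www.mydomain.com:443 -servername www.mydomain.com 2>/dev/null | openssl x509 -noout -issuer -dates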
6) Configure cron job for SSL certificate renewal
Let's Encrypt certificates expire 90 days after the issue date, so you need to renew them more often than you may be used to with regular SSL certificates. I added this line to my crontab on the server that handles www.mydomain.com:
# m h  dom mon dow   command
0 0 1 * * /usr/local/bin/acmetool reconcile --batch; service nginx restart
This runs the acmetool "reconcile" command in batch mode (with no input required from the user) at midnight on the 1st day of every month, then restarts nginx just in case the certificate has changed. If the Let's Encrypt SSL certificate is 30 days away from expiring, acmetool reconcile will renew it.
I think Let's Encrypt is a great service, and you should start using it if you're not already!


Quote of the Month December 2015

From the Editor of Methods & Tools - Wed, 12/02/2015 - 08:58
Now more than ever, we need to unleash the talent of individuals, teams, and organizations. This might be the only hope we have to not just survive but thrive. But, too often, we put a wet blanket over the fire of innovation and motivation. In the midst of uncertainty, we attempt to control outcomes by […]

Software Development Conferences Forecast November 2015

From the Editor of Methods & Tools - Mon, 11/23/2015 - 17:14
Here is a list of software development related conferences and events on Agile project management (Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

Initial experiences with the Prometheus monitoring system

Agile Testing - Grig Gheorghiu - Fri, 11/20/2015 - 22:23
I've been looking for a while for a monitoring system written in Go, self-contained and easy to deploy. I think I finally found what I was looking for in Prometheus, a monitoring system open-sourced by SoundCloud and started there by ex-Googlers who took their inspiration from Google's Borgmon system.

Prometheus is a pull system: the monitoring server pulls data from its clients by hitting a special HTTP handler exposed by each client ("/metrics" by default) and retrieving a list of metrics from that handler. The output of /metrics is plain text, which makes it easy for humans to parse as well, and helps with troubleshooting.
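
Because the output is plain text, a quick curl against a client is all it takes to check it by hand (using node_exporter's default port 9100, the same endpoint shown below):

$ curl -s http://client_ip_or_name:9100/metrics | head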

Here's a subset of the OS-level metrics that are exposed by a client running the node_exporter Prometheus binary (and available when you hit http://client_ip_or_name:9100/metrics):

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="idle"} 2803.93
node_cpu{cpu="cpu0",mode="iowait"} 31.38
node_cpu{cpu="cpu0",mode="irq"} 0
node_cpu{cpu="cpu0",mode="nice"} 2.26
node_cpu{cpu="cpu0",mode="softirq"} 0.23
node_cpu{cpu="cpu0",mode="steal"} 21.16
node_cpu{cpu="cpu0",mode="system"} 25.84
node_cpu{cpu="cpu0",mode="user"} 79.94
# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge
node_disk_io_now{device="xvda"} 0
# HELP node_disk_io_time_ms Milliseconds spent doing I/Os.
# TYPE node_disk_io_time_ms counter
node_disk_io_time_ms{device="xvda"} 44608
# HELP node_disk_io_time_weighted The weighted # of milliseconds spent doing I/Os. See https://www.kernel.org/doc/Documentation/iostats.txt.
# TYPE node_disk_io_time_weighted counter
node_disk_io_time_weighted{device="xvda"} 959264

There are many such "exporters" available for Prometheus, exposing metrics in the format expected by the Prometheus server from systems such as Apache, MySQL, PostgreSQL, HAProxy and many others (see a list here).

What drew me to Prometheus though was the fact that it allows for easy instrumentation of code by providing client libraries for many languages: Go, Java/Scala, Python, Ruby and others. 
One of the main advantages of Prometheus over alternative systems such as Graphite is the rich query language that it provides. You can associate labels (which are arbitrary key/value pairs) with any metrics, and you are then able to query the system by label. I'll show examples in this post. Here's a more in-depth comparison between Prometheus and Graphite.
Installation (on Ubuntu 14.04)
I put together an ansible role that is loosely based on Brian Brazil's demo_prometheus_ansible repo.
Check out my ansible-prometheus repo for this ansible role, which installs Prometheus, node_exporter and PromDash (a ruby-based dashboard builder). For people not familiar with ansible, most of the installation commands are in the install.yml task file. Here is the sequence of installation actions, in broad strokes.
For the Prometheus server:
  • download prometheus-0.16.1.linux-amd64.tar.gz from https://github.com/prometheus/prometheus/releases/download
  • extract tar.gz into /opt/prometheus/dist and link /opt/prometheus/prometheus-server to /opt/prometheus/dist/prometheus-0.16.1.linux-amd64
  • create Prometheus configuration file from ansible template and drop it in /etc/prometheus/prometheus.yml (more on the config file later)
  • create Prometheus default command-line options file from ansible template and drop it in /etc/default/prometheus
  • create Upstart script for Prometheus in /etc/init/prometheus.conf:
# Run prometheus

start on startup

chdir /opt/prometheus/prometheus-server

script
./prometheus -config.file /etc/prometheus/prometheus.yml
end script
For node_exporter:
  • download node_exporter-0.12.0rc1.linux-amd64.tar.gz from https://github.com/prometheus/node_exporter/releases/download
  • extract tar.gz into /opt/prometheus/dist and move node_exporter binary to /opt/prometheus/bin/node_exporter
  • create Upstart script for Prometheus in /etc/init/prometheus_node_exporter.conf:
# Run prometheus node_exporter
start on startup
script
    /opt/prometheus/bin/node_exporter
end script
For PromDash:
  • git clone from https://github.com/prometheus/promdash
  • follow instructions in the Prometheus tutorial from Digital Ocean (can't stop myself from repeating that D.O. publishes the best technical tutorials out there!)
Here is a minimal Prometheus configuration file (/etc/prometheus/prometheus.yml):
global:
  scrape_interval: 30s
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'prometheus'
    target_groups:
      - targets:
        - prometheus.example.com:9090
  - job_name: 'node'
    target_groups:
      - targets:
        - prometheus.example.com:9100
        - api01.example.com:9100
        - api02.example.com:9100
        - test-api01.example.com:9100
        - test-api02.example.com:9100
The configuration file format for Prometheus is well documented in the official docs. My example shows that the Prometheus server itself is monitored (or "scraped" in Prometheus parlance) on port 9090, and that OS metrics are also scraped from 5 clients which are running the node_exporter binary on port 9100, including the Prometheus server.
At this point, you can start Prometheus and node_exporter on your Prometheus server via Upstart:
# start prometheus
# start prometheus_node_exporter
Then you should be able to hit http://prometheus.example.com:9100 to see the metrics exposed by node_exporter, and more importantly http://prometheus.example.com:9090 to see the default Web console included in the Prometheus server. A demo page available from Robust Perception can be examined here.
Note that Prometheus also provides default Web consoles for node_exporter OS-level metrics. They are available at http://prometheus.example.com:9090/consoles/node.html (the ansible-prometheus role installs nginx and redirects http://prometheus.example.com:80 to the previous URL). The node consoles show CPU, Disk I/O and Memory graphs and also network traffic metrics for each client running node_exporter. 




Working with the MySQL exporter
I installed the mysqld_exporter binary on my Prometheus server box.
# cd /opt/prometheus/dist
# git clone https://github.com/prometheus/mysqld_exporter.git
# cd mysqld_exporter
# make
Then I created a wrapper script I called run_mysqld_exporter.sh:
# cat run_mysqld_exporter.sh
#!/bin/bash

export DATA_SOURCE_NAME="dbuser:dbpassword@tcp(dbserver:3306)/dbname"; ./mysqld_exporter
Two important notes here:
1) Note the somewhat awkward format of the DATA_SOURCE_NAME environment variable. I tried many other formats but only this one worked for me. The wrapper script's main purpose is to define this variable properly. With some of my other tries, I got this error message:
INFO[0089] Error scraping global state: Default addr for network 'dbserver:3306' unknown  file=mysqld_exporter.go line=697
You could also define this variable in ~/.bashrc, but in that case it may clash with other Prometheus exporters (the one for PostgreSQL, for example) which also need to define this variable.
2) Note that the dbuser specified in the DATA_SOURCE_NAME variable needs either the SUPER or the REPLICATION CLIENT privilege on the MySQL server you want to monitor. I ran a SQL statement of this form:
GRANT REPLICATION CLIENT ON *.* TO dbuser@'%' IDENTIFIED BY 'dbpassword';

I created an Upstart init script I called /etc/init/prometheus_mysqld_exporter.conf:
# cat /etc/init/prometheus_mysqld_exporter.conf
# Run prometheus mysqld exporter
start on startup
chdir /opt/prometheus/dist/mysqld_exporter
script
    ./run_mysqld_exporter.sh
end script
I modified the Prometheus server configuration file (/etc/prometheus/prometheus.yml) and added a scrape job for the MySQL metrics:

  - job_name: 'mysql'
    honor_labels: true
    target_groups:
      - targets:
        - prometheus.example.com:9104

I restarted the Prometheus server:

# stop prometheus
# start prometheus

Then I started up mysqld_exporter via Upstart:
# start prometheus_mysqld_exporter
If everything goes well, the metrics scraped from MySQL will be available at http://prometheus.example.com:9104/metrics
Here are some of the available metrics:
# HELP mysql_global_status_innodb_data_reads Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_data_reads untyped
mysql_global_status_innodb_data_reads 12660
# HELP mysql_global_status_innodb_data_writes Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_data_writes untyped
mysql_global_status_innodb_data_writes 528790
# HELP mysql_global_status_innodb_data_written Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_data_written untyped
mysql_global_status_innodb_data_written 9.879318016e+09
# HELP mysql_global_status_innodb_dblwr_pages_written Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_dblwr_pages_written untyped
mysql_global_status_innodb_dblwr_pages_written 285184
# HELP mysql_global_status_innodb_row_ops_total Total number of MySQL InnoDB row operations.
# TYPE mysql_global_status_innodb_row_ops_total counter
mysql_global_status_innodb_row_ops_total{operation="deleted"} 14580
mysql_global_status_innodb_row_ops_total{operation="inserted"} 847656
mysql_global_status_innodb_row_ops_total{operation="read"} 8.1021419e+07
mysql_global_status_innodb_row_ops_total{operation="updated"} 35305

Most of the metrics exposed by mysqld_exporter are of type Counter, which means they always increase. A meaningful number to graph then is not their absolute value, but their rate of change. For example, for the mysql_global_status_innodb_row_ops_total metric, the rate of change of reads for the last 5 minutes (reads/sec) can be expressed as:
rate(mysql_global_status_innodb_row_ops_total{operation="read"}[5m])
This is also an example of a Prometheus query that filters by a specific label (in this case {operation="read"}).
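
Labels also make aggregation straightforward. For example, a query along these lines (a sketch using the same metric) sums the 5-minute rates across all scraped MySQL servers, keeping one series per operation:

sum(rate(mysql_global_status_innodb_row_ops_total[5m])) by (operation)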
A good way to get a feel for the metrics available to the Prometheus server is to go to the Web console and graphing tool available at http://prometheus.example.com:9090/graph. You can copy and paste the line above into the Expression edit box and click Execute. You should see something like this graph in the Graph tab:


It's important to familiarize yourself with the 4 types of metrics handled by Prometheus: Counter, Gauge, Histogram and Summary. 
Working with the Postgres exporter
Although not an official Prometheus package, the Postgres exporter has worked just fine for me. 
I built and installed the postgres_exporter binary on my Prometheus server box.
# cd /opt/prometheus/dist
# git clone https://github.com/wrouesnel/postgres_exporter.git
# cd postgres_exporter
# make
Then I created a wrapper script I called run_postgres_exporter.sh:

# cat run_postgres_exporter.sh
#!/bin/bash

export DATA_SOURCE_NAME="postgres://dbuser:dbpassword@dbserver/dbname"; ./postgres_exporter
Note that the format for DATA_SOURCE_NAME is a bit different from the MySQL format.
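I haven't needed it in my setup, but since this is a standard libpq-style connection URL, it accepts the usual connection parameters; if your Postgres server doesn't have SSL configured, you may need something like:
export DATA_SOURCE_NAME="postgres://dbuser:dbpassword@dbserver/dbname?sslmode=disable"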
I created an Upstart init script I called /etc/init/prometheus_postgres_exporter.conf:
# cat /etc/init/prometheus_postgres_exporter.conf
# Run prometheus postgres exporter
start on startup
chdir /opt/prometheus/dist/postgres_exporter
script
  ./run_postgres_exporter.sh
end script
I modified the Prometheus server configuration file (/etc/prometheus/prometheus.yml) and added a scrape job for the Postgres metrics:

  - job_name: 'postgres'
    honor_labels: true
    target_groups:
      - targets:
        - prometheus.example.com:9113

I restarted the Prometheus server:

# stop prometheus
# start prometheus
Then I started up postgres_exporter via Upstart:
# start prometheus_postgres_exporter
If everything goes well, the metrics scraped from Postgres will be available at http://prometheus.example.com:9113/metrics
Here are some of the available metrics:
# HELP pg_stat_database_tup_fetched Number of rows fetched by queries in this database
# TYPE pg_stat_database_tup_fetched counter
pg_stat_database_tup_fetched{datid="1",datname="template1"} 7.730469e+06
pg_stat_database_tup_fetched{datid="12998",datname="template0"} 0
pg_stat_database_tup_fetched{datid="13003",datname="postgres"} 7.74208e+06
pg_stat_database_tup_fetched{datid="16740",datname="mydb"} 2.18194538e+08
# HELP pg_stat_database_tup_inserted Number of rows inserted by queries in this database
# TYPE pg_stat_database_tup_inserted counter
pg_stat_database_tup_inserted{datid="1",datname="template1"} 0
pg_stat_database_tup_inserted{datid="12998",datname="template0"} 0
pg_stat_database_tup_inserted{datid="13003",datname="postgres"} 0
pg_stat_database_tup_inserted{datid="16740",datname="mydb"} 3.5467483e+07
# HELP pg_stat_database_tup_returned Number of rows returned by queries in this database
# TYPE pg_stat_database_tup_returned counter
pg_stat_database_tup_returned{datid="1",datname="template1"} 6.41976558e+08
pg_stat_database_tup_returned{datid="12998",datname="template0"} 0
pg_stat_database_tup_returned{datid="13003",datname="postgres"} 6.42022129e+08
pg_stat_database_tup_returned{datid="16740",datname="mydb"} 7.114057378094e+12
# HELP pg_stat_database_tup_updated Number of rows updated by queries in this database
# TYPE pg_stat_database_tup_updated counter
pg_stat_database_tup_updated{datid="1",datname="template1"} 1
pg_stat_database_tup_updated{datid="12998",datname="template0"} 0
pg_stat_database_tup_updated{datid="13003",datname="postgres"} 1
pg_stat_database_tup_updated{datid="16740",datname="mydb"} 4351

These metrics are also of type Counter, so to generate meaningful graphs for them, you need to plot their rates. For example, to see the rate of rows returned per second from the database called mydb, you would plot this expression:
rate(pg_stat_database_tup_returned{datid="16740",datname="mydb"}[5m])
The Prometheus expression evaluator available at http://prometheus.example.com:9090/graph is again your friend. BTW, if you start typing pg_ in the expression field, you'll see a drop-down filled automatically with all the available metrics starting with pg_. Handy!
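Aggregations work here too. For instance, to compare row traffic across all databases on one graph, you can sum the per-database rates by the datname label:
sum(rate(pg_stat_database_tup_returned[5m])) by (datname)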
Working with the AWS CloudWatch exporter
This is one of the officially supported Prometheus exporters, used for graphing and alerting on AWS CloudWatch metrics. I installed it on the Prometheus server box. It's a Java app, so it needs a JDK installed, and also Maven for building the app.
# cd /opt/prometheus/dist
# git clone https://github.com/prometheus/cloudwatch_exporter.git
# apt-get install maven2 openjdk-7-jdk
# cd cloudwatch_exporter
# mvn package
The cloudwatch_exporter app needs AWS credentials in order to connect to CloudWatch and read the metrics. Here's what I did:
  1. created an AWS IAM user called cloudwatch_ro and downloaded its access key and secret key
  2. created an AWS IAM custom policy called CloudWatchReadOnlyAccess-201511181031, which includes the default CloudWatchReadOnlyAccess policy (the custom policy is not strictly necessary, and you can use the default one, but I preferred a custom one because I may need to make further edits to the policy)
  3. attached the CloudWatchReadOnlyAccess-201511181031 policy to the cloudwatch_ro user
  4. created a file called ~/.aws/credentials with the contents:
[default]
aws_access_key_id=ACCESS_KEY_FOR_USER_CLOUDWATCH_RO
aws_secret_access_key=SECRET_KEY_FOR_USER_CLOUDWATCH_RO
The cloudwatch_exporter app also needs a json file containing the CloudWatch metrics we want it to retrieve from AWS. Here is an example of ELB-related metrics I specified in a file called cloudwatch.json:
{
  "region": "us-west-2",
  "metrics": [
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "RequestCount",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "BackendConnectionErrors",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_Backend_2XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_Backend_4XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_Backend_5XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_ELB_4XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_ELB_5XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "SurgeQueueLength",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Maximum", "Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "SpilloverCount",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "Latency",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Average"]}
  ]
}
Note that you need to look up the exact syntax for each metric name, dimensions and preferred statistics in the AWS CloudWatch documentation. For ELB metrics, the documentation is here. The CloudWatch metric name corresponds to the cloudwatch_exporter JSON parameter aws_metric_name, the dimensions correspond to aws_dimensions, and the preferred statistics correspond to aws_statistics.
I modified the Prometheus server configuration file (/etc/prometheus/prometheus.yml) and added a scrape job for the CloudWatch metrics:

  - job_name: 'cloudwatch'
    honor_labels: true
    target_groups:
      - targets:
        - prometheus.example.com:9106

I restarted the Prometheus server:

# stop prometheus
# start prometheus

I created an Upstart init script I called /etc/init/prometheus_cloudwatch_exporter.conf:
# cat /etc/init/prometheus_cloudwatch_exporter.conf
# Run prometheus cloudwatch exporter
start on startup
chdir /opt/prometheus/dist/cloudwatch_exporter
script
  /usr/bin/java -jar target/cloudwatch_exporter-0.2-SNAPSHOT-jar-with-dependencies.jar 9106 cloudwatch.json
end script
Then I started up cloudwatch_exporter via Upstart:
# start prometheus_cloudwatch_exporter
If everything goes well, the metrics scraped from CloudWatch will be available at http://prometheus.example.com:9106/metrics
Here are some of the available metrics:
# HELP aws_elb_request_count_sum CloudWatch metric AWS/ELB RequestCount Dimensions: [AvailabilityZone, LoadBalancerName] Statistic: Sum Unit: Count
# TYPE aws_elb_request_count_sum gauge
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2a",} 1.0
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2c",} 1.0
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2c",} 2.0
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2a",} 12.0
# HELP aws_elb_httpcode_backend_2_xx_sum CloudWatch metric AWS/ELB HTTPCode_Backend_2XX Dimensions: [AvailabilityZone, LoadBalancerName] Statistic: Sum Unit: Count
# TYPE aws_elb_httpcode_backend_2_xx_sum gauge
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2a",} 1.0
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2c",} 1.0
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2c",} 2.0
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2a",} 12.0
# HELP aws_elb_latency_average CloudWatch metric AWS/ELB Latency Dimensions: [AvailabilityZone, LoadBalancerName] Statistic: Average Unit: Seconds
# TYPE aws_elb_latency_average gauge
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2a",} 0.5571935176849365
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2c",} 0.5089397430419922
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2c",} 0.035556912422180176
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2a",} 0.0031794110933939614

Note that there are 3 labels available to query the metrics above: job, load_balancer_name and availability_zone. 
If we specify something like aws_elb_request_count_sum{job="aws_elb"} in the expression evaluator at http://prometheus.example.com:9090/graph, we'll see 4 graphs, one for each load_balancer_name/availability_zone combination. 
To see only graphs related to a specific load balancer, say LB1, we can specify an expression of the form:
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1"}
In this case, we'll see 2 graphs for LB1, one for each availability zone.
In order to see the request count across all availability zones for a specific load balancer, we need to apply the sum function:
sum(aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1"}) by (load_balancer_name)
In this case, we'll see one graph with the request count summed across the 2 availability zones pertaining to LB1.
If we want to graph all load balancers but only show one graph per balancer, summing all availability zones for each balancer, we would use an expression like this:
sum(aws_elb_request_count_sum{job="aws_elb"}) by (load_balancer_name)
So in this case we'll see 2 graphs, one for LB1 and one for LB2, with each graph summing the request count across the availability zones for LB1 and LB2 respectively.
Note that in all the expressions above, since the job label has the value "aws_elb" common to all metrics, it can be dropped from the queries because it doesn't produce any useful filtering.
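The last expression above, for instance, can be shortened to the equivalent form:
sum(aws_elb_request_count_sum) by (load_balancer_name)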
For other AWS CloudWatch metrics, consult the Amazon CloudWatch Namespaces, Dimensions and Metrics Reference.

Instrumenting Go code with Prometheus
For me, the most interesting feature of Prometheus is that it allows for easy instrumentation of the code. Instead of pushing metrics a la statsd and Graphite, a web app needs to implement a /metrics handler and use the Prometheus client library code to publish app-level metrics to that handler. The Prometheus server will then hit /metrics on the client and pull/scrape the metrics.

More specifics for Go code instrumentation

1) Declare and register Prometheus metrics in your code

I have the following 2 variables defined in an init.go file in a common package that gets imported in all of the webapp code:

var PrometheusHTTPRequestCount = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Namespace: "myapp",
        Name:      "http_request_count",
        Help:      "The number of HTTP requests.",
    },
    []string{"method", "type", "endpoint"},
)

var PrometheusHTTPRequestLatency = prometheus.NewSummaryVec(
    prometheus.SummaryOpts{
        Namespace: "myapp",
        Name:      "http_request_latency",
        Help:      "The latency of HTTP requests.",
    },
    []string{"method", "type", "endpoint"},
)

Note that the first metric is a CounterVec, which in the Prometheus client_golang library specifies a Counter metric that can also get labels associated with it. The labels in my case are "method", "type" and "endpoint". The purpose of this metric is to measure the HTTP request count. Since it's a Counter, it will increase monotonically, so for graphing purposes we'll need to plot its rate and not its absolute value.

The second metric is a SummaryVec, which in the client_golang library specifies a Summary metric with labels. I use the same labels as for the CounterVec metric. The purpose of this metric is to measure the HTTP request latency. Because it's a Summary, it will provide the sum of the measurements and their count, as well as quantiles over the measurements.

These 2 variables then get registered in the init function:

func init() {
    // Register Prometheus metric trackers
    prometheus.MustRegister(PrometheusHTTPRequestCount)
    prometheus.MustRegister(PrometheusHTTPRequestLatency)
}

2) Let Prometheus handle the /metrics endpoint

The GitHub README for client_golang shows the simplest way of doing this:

http.Handle("/metrics", prometheus.Handler())
http.ListenAndServe(":8080", nil)

However, most of the Go webapp code will rely on some sort of web framework, so YMMV. In our case, I had to insert the prometheus.Handler function as a variable pretty deep in our framework code in order to associate it with the /metrics endpoint.

3) Modify Prometheus metrics in your code

The final step in getting Prometheus to instrument your code is to modify the Prometheus metrics you registered by incrementing Counter variables and taking measurements for Summary variables in the appropriate places in your app. In my case, I increment PrometheusHTTPRequestCount in every HTTP handler in my webapp by calling its Inc() method. I also measure the HTTP latency, i.e. the time it took for the handler code to execute, and call the Observe() method on the PrometheusHTTPRequestLatency variable.

The values I associate with the "method", "type" and "endpoint" labels come from the endpoint URL associated with each instrumented handler. As an example, for an HTTP GET request to a URL such as http://api.example.com/customers/find, "method" is the HTTP method used in the request ("GET"), "type" is "customers", and "endpoint" is "/customers/find".
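For illustration only, here is a minimal sketch of what such a URL-splitting helper could look like; the names and logic below are my assumptions, and our real common.SplitUrlForMonitoring handles more cases:

import "strings"

// splitURLForMonitoring is a hypothetical stand-in for common.SplitUrlForMonitoring.
// For a path like /customers/find it returns "customers" (used as the "type"
// label value) and "/customers/find" (used as the "endpoint" label value).
func splitURLForMonitoring(path string) (string, string) {
    segments := strings.Split(strings.Trim(path, "/"), "/")
    if len(segments) == 0 || segments[0] == "" {
        return "unknown", path
    }
    return segments[0], path
}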

Here is the code I use for modifying the Prometheus metrics (R is an object/struct which represents the HTTP request):

    // Modify Prometheus metrics.
    // pkg and endpoint become the "type" and "endpoint" label values;
    // elapsed is the handler's execution time (a time.Duration measured
    // around the handler call).
    pkg, endpoint := common.SplitUrlForMonitoring(R.URL.Path)
    method := R.Method
    PrometheusHTTPRequestCount.WithLabelValues(method, pkg, endpoint).Inc()
    PrometheusHTTPRequestLatency.WithLabelValues(method, pkg, endpoint).Observe(float64(elapsed) / float64(time.Millisecond))
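If your framework exposes standard http.Handler values, the same bookkeeping can be packaged as middleware. This is a hedged sketch (the handler and helper names are mine, not our framework's; it reuses the hypothetical splitURLForMonitoring helper from above):

import (
    "net/http"
    "time"
)

// instrument wraps an http.Handler so that every request increments the
// request counter and records the handler's execution time in milliseconds.
func instrument(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        next.ServeHTTP(w, r)
        elapsed := time.Since(start)
        pkg, endpoint := splitURLForMonitoring(r.URL.Path)
        PrometheusHTTPRequestCount.WithLabelValues(r.Method, pkg, endpoint).Inc()
        PrometheusHTTPRequestLatency.WithLabelValues(r.Method, pkg, endpoint).Observe(float64(elapsed) / float64(time.Millisecond))
    })
}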


4) Retrieving your metrics

Assuming your web app runs on port 8080, you'll need to modify the Prometheus server configuration file and add a scrape job for app-level metrics. I have something similar to this in /etc/prometheus/prometheus.yml:

  - job_name: 'myapp-api'
    target_groups:
      - targets:
        - api01.example.com:8080
        - api02.example.com:8080
        labels:
          group: 'production'
      - targets:
        - test-api01.example.com:8080
        - test-api02.example.com:8080
        labels:
          group: 'test'

Note an extra label called "group" defined in the configuration file. It has the values "production" and "test" respectively, and allows for the filtering of Prometheus measurements by the environment of the monitored nodes.

Whenever the Prometheus configuration file gets modified, you need to restart the Prometheus server:

# stop prometheus
# start prometheus

At this point, the metrics scraped from the webapp servers will be available at http://api01.example.com:8080/metrics.

Here are some of the available metrics:
# HELP myapp_http_request_count The number of HTTP requests.
# TYPE myapp_http_request_count counter
myapp_http_request_count{endpoint="/merchant/register",method="GET",type="admin"} 2928
# HELP myapp_http_request_latency The latency of HTTP requests.
# TYPE myapp_http_request_latency summary
myapp_http_request_latency{endpoint="/merchant/register",method="GET",type="admin",quantile="0.5"} 31.284808
myapp_http_request_latency{endpoint="/merchant/register",method="GET",type="admin",quantile="0.9"} 33.353354
myapp_http_request_latency{endpoint="/merchant/register",method="GET",type="admin",quantile="0.99"} 33.353354
myapp_http_request_latency_sum{endpoint="/merchant/register",method="GET",type="admin"} 93606.57930099976
myapp_http_request_latency_count{endpoint="/merchant/register",method="GET",type="admin"} 2928

Note that myapp_http_request_count and myapp_http_request_latency_count show the same value for the method/type/endpoint combination in this example. You could argue that myapp_http_request_count is redundant in this case. There could be instances where you want to increment a counter without taking a measurement for the summary, so it's still useful to have both. 
Also note that myapp_http_request_latency, being a summary, computes 3 different quantiles: 0.5, 0.9 and 0.99 (so 50%, 90% and 99% of the measurements respectively fall under the given numbers for the latencies).
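As far as I know, these match the client_golang defaults; if you need different quantile targets, the library lets you set them explicitly through the Objectives field of SummaryOpts (a map from quantile to allowed estimation error). The values below simply restate the defaults:

var PrometheusHTTPRequestLatency = prometheus.NewSummaryVec(
    prometheus.SummaryOpts{
        Namespace: "myapp",
        Name:      "http_request_latency",
        Help:      "The latency of HTTP requests.",
        // Track the 0.5/0.9/0.99 quantiles, each with its tolerated error margin.
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
    },
    []string{"method", "type", "endpoint"},
)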

5) Graphing your metrics with PromDash
The PromDash tool provides an easy way to create dashboards with a look and feel similar to Graphite. PromDash is available at http://prometheus.example.com:3000. 
First you need to define a server by clicking on the Servers link up top, then entering a name ("prometheus") and the URL of the Prometheus server ("http://prometheus.example.com:9090/").
Then click on Dashboards up top, and create a new directory, which offers a way to group dashboards. You can call it something like "myapp". Now you can create a dashboard (you also need to select the directory it belongs to). Once you are in the Dashboard create/edit screen, you'll see one empty graph with the default title "Title". 
When you hover over the header of the graph, you'll see other buttons available. You want to click on the 2nd button from the left, called Datasources, then click Add Expression. Note that the server field is already pre-filled. If you start typing myapp in the expression field, you should see the metrics exposed by your application (for example myapp_http_request_count and myapp_http_request_latency).
To properly graph a Counter-type metric, you need to plot its rate. The following expression shows the HTTP requests/second rate measured over the last minute for all the production endpoints in my webapp:
rate(myapp_http_request_count{group="production",job="myapp-api"}[1m])
(the job and group values correspond to what we specified in /etc/prometheus/prometheus.yml)
If you want to show the HTTP request/second rate for test endpoints of "admin" type, use this expression:
rate(myapp_http_request_count{group="test",job="myapp-api",type="admin"}[1m])
If you want to show the HTTP request/second rate for a specific production endpoint, use an expression similar to this:
rate(myapp_http_request_count{group="production",job="myapp-api",endpoint="/merchant/register",type="admin"}[1m])
Once you enter the expression you want, close the Datasources form (it will save everything). Also change the title by clicking on the button called "Graph and Axis Settings". In that form, you can also specify that you want the plot lines stacked as opposed to regular lines.
For latency metrics, you don't need to look at the rate. Instead, you can look at a specific quantile. Let's say you want to plot the 99% quantile for latencies observed in all production endpoints, for write operations (corresponding to HTTP methods other than GET). Then you would use an expression like this:
myapp_http_request_latency{method!="GET",quantile="0.99",group="production",job="myapp-api"}
As for the HTTP request/second graphs, you can refine the latency queries by specifying a type, an endpoint or both:
myapp_http_request_latency{method!="GET",quantile="0.99",group="production",type="admin",endpoint="/merchant/register",job="myapp-api"}
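You can also derive the average latency over a time window from the _sum and _count series that every Summary metric exposes:
rate(myapp_http_request_latency_sum{group="production",job="myapp-api"}[5m]) / rate(myapp_http_request_latency_count{group="production",job="myapp-api"}[5m])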
I hope you have enough information at this point to go wild with dashboards! Remember, who has the most dashboards wins!
Wrapping up
I wanted to write this blog post so I don't forget all the stuff that was involved in setting up and using Prometheus. It's a lot, but it's also not that bad once you get the hang of it. In particular, the Prometheus server itself is remarkably easy to set up and maintain, a refreshing change from other monitoring systems I've used before.
One thing I haven't touched on is the alerting mechanism used in Prometheus. I haven't looked at that yet, since I'm still using a combination of Pingdom, monit and Jenkins for my alerting. I'll tackle Prometheus alerting in another blog post.
I really like Prometheus so far and I hope you'll give it a try!

Software Development Linkopedia November 2015

From the Editor of Methods & Tools - Thu, 11/19/2015 - 17:55
Here is our monthly selection of knowledge on programming, software testing and project management. This month you will find some interesting information and opinions about software development teams, software engineering management, metrics approaches, UX personas, tools for distributed retrospectives, exploratory testing, Agile principles and load testing. Blog: How to Make the Leap from Engineer to […]

Why I like golang: a programming autobiography

Agile Testing - Grig Gheorghiu - Mon, 11/16/2015 - 19:07
Tried my hand at writing a story on Medium.

Quote of the Month November 2015

From the Editor of Methods & Tools - Tue, 11/10/2015 - 15:31
Because we are working in such small increments, the only thing we need to come up with at a Retrospective is something to try out for the next few weeks: an experiment. Because we are only going to try the experiment for a limited amount of time, any negative impact is controlled by the fact […]

GTAC 2015 is Next Week!

Google Testing Blog - Sat, 11/07/2015 - 03:46
by Anthony Vallone on behalf of the GTAC Committee

The ninth GTAC (Google Test Automation Conference) commences on Tuesday, November 10th, at the Google Cambridge office. You can find the latest details on the conference site, including schedule, speaker profiles, and travel tips.

If you have not been invited to attend in person, you can watch the event live. And if you miss the livestream, we will post slides and videos later.

We have an outstanding speaker lineup this year, and we look forward to seeing you all there or online!

Categories: Testing & QA

Grady Booch: The Future of Software Engineering

From the Editor of Methods & Tools - Fri, 11/06/2015 - 10:21
No matter what future we may envision, it relies on software that has not yet been written. Even now, software-intensive systems have woven themselves into the interstitial spaces of civilization, and we as individuals and as a species have slowly surrendered ourselves to computing. Looking back, we can identify several major and distinct styles whereby […]

Higher Order React Components

Mistaeks I Hav Made - Nat Pryce - Fri, 11/06/2015 - 09:27
When writing user interfaces with the React framework, I often find that several of my components have similar behaviour. For example, I may have several components that display the eventual value of a promise, or display changing values of an Rx event stream, or are sources or targets for drag-and-drop interactions, and so on. I want to define these common behaviours once and compose them into my component classes where required. This, in a nutshell, is what “higher-order components” do.

An Example Use Case for Higher-Order Components

Imagine we’re writing an international e-commerce site. When a customer uses the site, the information they see is localised for the country in which they reside. The site uses the user’s country to determine the currency in which to display prices, calculate shipping costs, etc. The site displays the customer’s country in the navigation bar at the top of each page. If the user is travelling, they can select their preferred country from a menu of countries supported by the site.

Both the country associated with the user’s session and the list of all countries supported by the application are fetched by HTTP from the server in JSON format and displayed by React components. For example, the user’s country is served as:

{"iso": "gb", "name": "United Kingdom"}

And the list of supported countries is served as:

[{"iso": "fr", "name": "France"},
 {"iso": "gb", "name": "United Kingdom"},
 ...]

The Country component below displays the user’s preferred country. Because country data is received asynchronously, the Country component must be given a promise of the country information. While the promise is pending, the component displays a loading indicator. When the promise is resolved successfully, the component displays the country information as a flag icon and name. If the promise is rejected, the component displays an error message.

class Country extends React.Component {
    constructor(props) {
        super(props);
        this.state = {loading: true, error: null, country: null};
    }
    componentDidMount() {
        this.props.promise.then(
            value => this.setState({loading: false, country: value}),
            error => this.setState({loading: false, error: error}));
    }
    render() {
        if (this.state.loading) {
            return Loading...;
        } else if (this.state.error !== null) {
            return Error: {this.state.error.message};
        } else {
            var iso = this.state.country.iso;
            var name = this.state.country.name;
            return ( {name} );
        }
    }
}

It can be used like this (assuming fetchJson starts loading JSON from a URL and returns a promise of the JSON):

The CountryChooser component below displays the list of available countries, which are also passed to it as a promise:

class CountryChooser extends React.Component {
    constructor(props) {
        super(props);
        this.state = {loading: true, error: null, countries: null};
    }
    componentDidMount() {
        this.props.promise.then(
            value => this.setState({loading: false, countries: value}),
            error => this.setState({loading: false, error: error}));
    }
    render() {
        if (this.state.loading) {
            return Loading...;
        } else if (this.state.error !== null) {
            return Error: {this.state.error.message};
        } else {
            return (
                {this.state.countries.map(c =>
                    this.props.onSelect(c.iso)}> {c.name}
                )}
            );
        }
    }
}

It can be used like this (assuming the same fetchJson function and a changeUsersPreferredCountry function that sends the change of country to the server):

There’s a lot of duplication between the two components. They duplicate the state machine required to receive and render data obtained asynchronously from a promise. These are not the only React components in the application that need to display data loaded asynchronously from the server, so addressing that duplication will shrink the code significantly.

The CountryChooser component cannot use the Country component to display the countries in the list because the event handling is intermingled with the presentation of the data. It therefore duplicates the code to render a country as HTML. We don’t want these HTML fragments diverging, because that will then create further duplication in our CSS stylesheets.

What can we do?

We can’t achieve what we want with parent/child relationships between components, where a parent component handles the promise events and child components render the promised value. Child component props are specified in the code that creates the component hierarchy, but at that point we do not know the prop values. We want to calculate the props dynamically, when the promise is resolved.

We could extract the promise event handling into a base class. But JavaScript only supports single inheritance, so if our components inherit event handling for promises, they cannot inherit base classes that provide event handling for other things, such as user interaction [1]. And although it disentangles the promise event handling from the rendering, it doesn’t disentangle the rendering from the promise event handling, so we still couldn’t use the Country component within the CountryChooser.

It sounds like a job for mixins, but React’s mixins don’t work with ES6 classes and are going to be dropped from the API.

The solution is a higher-order component.

Higher-Order Components

A higher-order component is merely a function from component class to component class. The function takes a component class as a parameter and returns a new component class that wraps useful functionality around the class passed in [2]. If you’re familiar with the “Gang of Four” design patterns and are thinking “Decorator pattern”, you’re pretty much bang on.

As a shorthand, in the rest of this article, and to avoid confusion with ES7 decorators, I’m going to call the class passed to the function the “wrapped class”, the class returned by the function the “wrapper class”, and the function itself the “higher-order component”. I’ll use “wrapped component” and “wrapper component” to mean instances of the wrapped and wrapper classes.

A wrapper component usually handles events on behalf of the wrapped component. It maintains some state and communicates with the wrapped component by passing state values and callbacks to the wrapped component via its props.

Let’s assume we have a higher-order component called Promised that translates a promise of a value into props for a wrapped component. The wrapper component performs all the state management required to use the promise. This means that wrapped components can be stateless, only concerned with presentation.
The Country component now only needs to display the country information:

var Country = ({name, iso}) => {name};

To define a component that receives the country information asynchronously as a promise, we decorate it with the Promised higher-order component:

var AsyncCountry = Promised(Country);

The CountryChooser can also be written as a stateless component, and can now use the Country component to display each country:

var CountryChooser = ({countries, onSelect}) =>
    {countries.map(c =>
        onSelect(c.iso)}>
    )};

And it can also be wrapped with Promised to receive the list of countries as a promise:

var AsyncCountryChooser = Promised(CountryChooser);

By moving state management into a generic higher-order component, we have made our application-specific components both simpler and more useful, in that they can be used in more contexts.

Implementing the Higher-Order Component

Here is the implementation of the Promised function:

var React = require('react');
var R = require('ramda');

var Promised = Wrapped => class Promised extends React.Component { // (1)
    constructor(props) {
        super(props);
        this.state = {loading: true, error: null, value: null};
    }
    componentDidMount() {
        this.props.promise.then( // (2)
            value => this.setState({loading: false, value: value}),
            error => this.setState({loading: false, error: error}));
    }
    render() {
        if (this.state.loading) {
            return Loading...;
        } else if (this.state.error !== null) {
            return Error: {this.state.error.message};
        } else {
            var propsWithoutThePromise = R.dissoc('promise', this.props); // (3)
            return ;
        }
    }
};

Promised is a function from one component class, named Wrapped in this example, to another class. Like a function, the returned class closes over definitions in the scope where it is defined, and so the methods of the class can refer to the parameters and local variables of the function. The parameter name for the wrapped component must start with a capital letter so that the JSX compiler recognises it as a React component, rather than an HTML DOM element.

Client code passes a promise of props for the wrapped component to the wrapper as a prop named “promise”. The wrapper passes all other props through to the wrapped component unchanged. This lets you configure a Promised(X) component with the same props you would use to configure an unwrapped X component. For example, you can initialise the wrapper with event callbacks that get passed to the wrapped component when it is rendered.

When the wrapper renders the wrapped component, it creates the props for the wrapped component by merging the properties of the promised value with its own properties, except for the promise itself. The code above uses a utility function from the Ramda library to remove the promise from the wrapper component’s props, and uses ES6 “spread” syntax to merge the props with the properties of the promised value.

Massaging Props to Avoid Name Clash

The eagle-eyed reader will have noticed that the AsyncCountryChooser has a slightly different API from the original CountryChooser component above. The original accepted a promise of an array of country objects. But the Promised wrapper uses the fields of the promised value as the props of the wrapped component, so the promised value must be an object with the array of country objects in a field named “countries”. We can address that by mapping the array to an object when we create the promise:

{countries: list})} onSelect={changeCountry}/>,

Another problem is that the current implementation reserves the prop name “promise”. This means we cannot pass a prop named “promise” through to the wrapped component. This could cause some head-scratching in the future as we evolve the system. If it is to be compatible with arbitrary components, a higher-order component must provide a way to control the interface between the wrapper and wrapped components, to avoid name clash and map the data provided by the wrapper to the props expected by the wrapped component.
The most flexible method, which is the one most frequently used by libraries published in NPM, is to parameterise the higher-order component with a function that maps the state and props of the wrapper component to props that are passed to the wrapped component. That way, the client code is in full control of the interface between the wrapper and wrappee and can programmatically resolve any name clash. However, in this example, letting the caller name the promise prop and using the promise’s then method to map the promised value to props of the wrapped component is good enough. Because a JavaScript class is a closure, we can pass the name of the promise prop to the Promised function along with the class to be wrapped.

var Promised = (promiseProp, Wrapped) => class extends React.Component {
    constructor(props) {
        super(props);
        this.state = {loading: true, error: null, value: null};
    }
    componentDidMount() {
        this.props[promiseProp].then(
            value => this.setState({loading: false, value: value}),
            error => this.setState({loading: false, error: error}));
    }
    render() {
        if (this.state.loading) {
            return Loading...;
        } else if (this.state.error !== null) {
            return Error: {this.state.error.message};
        } else {
            var propsWithoutThePromise = R.dissoc(promiseProp, this.props);
            return ;
        }
    }
};

We now have to name the promises when we apply the higher-order component to define new classes. But this lets us introduce better names into the code, which (in my opinion) is a good thing.

var AsyncCountry = Promised("country", Country);
var AsyncCountryChooser = Promised("countries", CountryChooser);
...
{countries: list})} onSelect={changeCountry}/>

Further Reading

You can access the full source code for this example on GitHub. Other React libraries that use higher-order components include:

React Drag and Drop
React Callback Register

[1] For example, in a recent project we needed to compose live updates, drag-source and drop-target behaviour into stateless rendering components.
[2] Actually, a higher-order component could take more than one component as parameters, but we only need one in this example.
Categories: Programming, Testing & QA