
Software Development Blogs: Programming, Software Testing, Agile Project Management


Software Development Conferences Forecast December 2016

From the Editor of Methods & Tools - Tue, 12/27/2016 - 08:41
Here is a list of software development related conferences and events on Agile project management ( Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

Blockchain for Software Developers

From the Editor of Methods & Tools - Tue, 12/20/2016 - 09:18
In a lot of software developer conferences, there are talks about the technical aspects of blockchains, how to develop smart contracts on top of Ethereum and things like that. But before looking at those, it is crucial to take a step back and understand what the blockchain is, what it brings to the table […]

Software Development Linkopedia December 2016

From the Editor of Methods & Tools - Wed, 12/14/2016 - 12:21
Here is our monthly selection of knowledge on programming, software testing and project management. This month you will find some interesting information and opinions about project management personalities, better teams, starting a new job, code reviews, agile testing, scaling Agile, IoT and tests quality. Blog: Implementers, Solvers, and Finders Blog: Giving better code reviews Blog: […]

Quote of the Month December 2016

From the Editor of Methods & Tools - Mon, 12/12/2016 - 15:48
Experience shows that architecting is not something that’s performed once, early in a project. Rather, architecting is applied over the life of the project; the architecture is grown through the delivery of a series of incremental and iterative deliveries of executable software. At each delivery, the architecture becomes more complete and stable, which raises the […]

The Impostor Software Developer Syndrome

From the Editor of Methods & Tools - Wed, 12/07/2016 - 17:50
Did you ever feel like a fraud as a software developer? Have the feeling that at some point, someone is going to find out that you really don’t belong where you are? That you are not as smart as other people think? You are not alone with this; many high-achieving people suffer from the imposter […]

Using Helm to install Traefik as an Ingress Controller in Kubernetes

Agile Testing - Grig Gheorghiu - Tue, 12/06/2016 - 23:15
That was a mouthful of a title... Hope this post lives up to it :)

First of all, just a bit of theory. If you want to expose your application running on Kubernetes to the outside world, you have several choices.

One choice you have is to expose the pods running your application via a Service of type NodePort or LoadBalancer. If you run your service as a NodePort, Kubernetes will allocate a random high port on every node in the cluster, and it will proxy traffic to that port to your service. Services of type LoadBalancer are only supported if you run your Kubernetes cluster using certain specific cloud providers such as AWS and GCE. In this case, the cloud provider will create a specific load balancer resource, for example an Elastic Load Balancer in AWS, which will then forward traffic to the pods comprising your service. Either way, the load balancing you get by exposing a service is fairly crude, at the TCP layer and using a round-robin algorithm.
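As a quick illustration, here is a minimal sketch of a Service of type NodePort (the name, labels and ports are hypothetical, not taken from the setup described later in this post):

apiVersion: v1
kind: Service
metadata:
  name: myapp-nodeport
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
  - port: 80          # port exposed inside the cluster
    targetPort: 80    # port the pods listen on
    # nodePort: 30080 # optional; if omitted, Kubernetes picks a random high port

Traffic sent to any-node-IP:nodePort is then forwarded to the pods matched by the selector.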

A better choice for exposing your Kubernetes application is to use Ingress resources together with Ingress Controllers. An ingress resource is a fancy name for a set of layer 7 load balancing rules, as you might be familiar with if you use HAProxy or Pound as a software load balancer. An Ingress Controller is a piece of software that actually implements those rules by watching the Kubernetes API for requests to Ingress resources. Here is a fragment from the Ingress Controller documentation on GitHub:

What is an Ingress Controller?

An Ingress Controller is a daemon, deployed as a Kubernetes Pod, that watches the ApiServer's /ingresses endpoint for updates to the Ingress resource. Its job is to satisfy requests for ingress.
Writing an Ingress Controller

Writing an Ingress controller is simple. By way of example, the nginx controller does the following:
  • Poll until apiserver reports a new Ingress
  • Write the nginx config file based on a go text/template
  • Reload nginx
As I mentioned in a previous post, I warmly recommend watching a KubeCon presentation from Gerred Dillon on "Kubernetes Ingress: Your Router, Your Rules" if you want to further delve into the advantages of using Ingress Controllers as opposed to plain Services.
While nginx is the only software currently included in the Kubernetes source code as an Ingress Controller, I wanted to experiment with a full-fledged HTTP reverse proxy such as Traefik. I should add from the beginning that only nginx offers the TLS feature of Ingress resources. Traefik can terminate SSL of course, and I'll show how you can do that, but it is outside of the Ingress resource spec.

I've also been looking at Helm, the Kubernetes package manager, and I noticed that Traefik is one of the 'stable' packages (or Charts as they are called) currently offered by Helm, so I went the Helm route in order to install Traefik. In the following instructions I will assume that you are already running a Kubernetes cluster in AWS and that your local kubectl environment is configured to talk to that cluster.

Install Helm

This is pretty easy. Follow the instructions on GitHub to download or install a binary for your OS.
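For example, on a Mac with Homebrew something along these lines should work (the package name is an assumption and may have changed; downloading the release binary and putting it on your PATH works just as well):

$ brew install kubernetes-helm
$ helm help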

Initialize Helm

Run helm init in order to install the server component of Helm, called tiller, which will be run as a Kubernetes Deployment in the kube-system namespace of your cluster.
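A quick sanity check that tiller came up (a sketch; adjust the grep pattern as needed):

$ helm init
$ kubectl get deployments --namespace kube-system | grep tiller
$ helm version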

Get the Traefik Helm chart from GitHub

I git cloned the entire kubernetes/charts repo, then copied the traefik directory locally under my own source code repo which contains the rest of the yaml files for my Kubernetes resource manifests.

# git clone https://github.com/kubernetes/charts.git helmcharts
# cp -r helmcharts/stable/traefik traefik-helm-chart
It is instructive to look at the contents of a Helm chart. The main advantage of a chart in my view is the bundling together of all the Kubernetes resources necessary to run a specific set of services. The other advantage is that you can use Go-style templates for the resource manifests, and the variables in those template files can be passed to helm via a values.yaml file or via the command line.
For more details on Helm charts and templates, I recommend this linux.com article.
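For reference, the layout of a chart looks roughly like this (file names are from memory and may differ slightly from the current traefik chart):

traefik-helm-chart/
  Chart.yaml          # chart name, version, description
  values.yaml         # default values referenced by the templates
  templates/
    _helpers.tpl      # named template helpers such as "fullname"
    deployment.yaml
    service.yaml
    dashboard-ingress.yaml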
Create an Ingress resource for your application service
I copied the dashboard-ingress.yaml template file from the Traefik chart and customized it so as to refer to my application's web service, which is running in a Kubernetes namespace called tenant1.

# cd traefik-helm-chart/templates
# cp dashboard-ingress.yaml web-ingress.yaml
# cat web-ingress.yaml
{{- if .Values.tenant1.enabled }}
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: {{ .Values.tenant1.namespace }}
  name: {{ template "fullname" . }}-web-ingress
  labels:
    app: {{ template "fullname" . }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    release: "{{ .Release.Name }}"
    heritage: "{{ .Release.Service }}"
spec:
  rules:
  - host: {{ .Values.tenant1.domain }}
    http:
      paths:
      - path: /
        backend:
          serviceName: {{ .Values.tenant1.serviceName }}
          servicePort: {{ .Values.tenant1.servicePort }}
{{- end }}
The variables referenced in the template above are defined in the values.yaml file in the Helm chart. I started with the variables in the values.yaml file that came with the Traefik chart and added my own customizations:
# vi traefik-helm-chart/values.yaml
ssl:
  enabled: true
acme:
  enabled: true
  email: admin@mydomain.com
  staging: false
  # Save ACME certs to a persistent volume. WARNING: If you do not do this, you will re-request
  # certs every time a pod (re-)starts and you WILL be rate limited!
  persistence:
    enabled: true
    storageClass: kubernetes.io/aws-ebs
    accessMode: ReadWriteOnce
    size: 1Gi
dashboard:
  enabled: true
  domain: tenant1-lb.dev.mydomain.com
gzip:
  enabled: false
tenant1:
  enabled: true
  namespace: tenant1
  domain: tenant1.dev.mydomain.com
  serviceName: web
  servicePort: http
Note that I added a section called tenant1, where I defined the variables referenced in the web-ingress.yaml template above. I also enabled the ssl and acme sections, so that Traefik can automatically install SSL certificates from Let's Encrypt via the ACME protocol.
Install your customized Helm chart for Traefik
With these modifications done, I ran 'helm install' to actually deploy the various Kubernetes resources included in the Traefik chart. 
I specified the directory containing my Traefik chart files (traefik-helm-chart) as the last argument passed to helm install:
# helm install --name tenant1-lb --namespace tenant1 traefik-helm-chart/
NAME: tenant1-lb
LAST DEPLOYED: Tue Nov 29 09:51:12 2016
NAMESPACE: tenant1
STATUS: DEPLOYED
RESOURCES:
==> extensions/Ingress
NAME                             HOSTS                         ADDRESS   PORTS     AGE
tenant1-lb-traefik-web-ingress   tenant1.dev.mydomain.com                80        1s
tenant1-lb-traefik-dashboard     tenant1-lb.dev.mydomain.com             80        0s

==> v1/PersistentVolumeClaim
NAME                      STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
tenant1-lb-traefik-acme   Pending                                      0s

==> v1/Secret
NAME                              TYPE      DATA      AGE
tenant1-lb-traefik-default-cert   Opaque    2         1s

==> v1/ConfigMap
NAME                 DATA      AGE
tenant1-lb-traefik   1         1s

==> v1/Service
NAME                           CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
tenant1-lb-traefik-dashboard   10.3.0.15    <none>        80/TCP           1s
tenant1-lb-traefik             10.3.0.215   <pending>     80/TCP,443/TCP   1s

==> extensions/Deployment
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
tenant1-lb-traefik   1         1         1            0           1s

NOTES:
1. Get Traefik's load balancer IP/hostname:
    NOTE: It may take a few minutes for this to become available.
    You can watch the status by running:
        $ kubectl get svc tenant1-lb-traefik --namespace tenant1 -w
    Once 'EXTERNAL-IP' is no longer '<pending>':
        $ kubectl describe svc tenant1-lb-traefik --namespace tenant1 | grep Ingress | awk '{print $3}'
2. Configure DNS records corresponding to Kubernetes ingress resources to point to the load balancer IP/hostname found in step 1
At this point you should see two Ingress resources, one for the Traefik dashboard and one for the custom web ingress resource:
# kubectl --namespace tenant1 get ingress
NAME                             HOSTS                         ADDRESS   PORTS     AGE
tenant1-lb-traefik-dashboard     tenant1-lb.dev.mydomain.com             80        50s
tenant1-lb-traefik-web-ingress   tenant1.dev.mydomain.com                80        51s
As per the Helm notes above (shown as part of the output of helm install), run this command to figure out the CNAME of the AWS ELB created by Kubernetes during the creation of the tenant1-lb-traefik service of type LoadBalancer:
# kubectl describe svc tenant1-lb-traefik --namespace tenant1 | grep Ingress | awk '{print $3}'
a5be275d8b65c11e685a402e9ec69178-91587212.us-west-2.elb.amazonaws.com
Create tenant1.dev.mydomain.com and tenant1-lb.dev.mydomain.com as DNS CNAME records pointing to a5be275d8b65c11e685a402e9ec69178-91587212.us-west-2.elb.amazonaws.com.
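If your DNS zone lives in Route 53, the CNAME can also be created with the AWS CLI, along these lines (a sketch; the hosted zone ID is a placeholder, and you would repeat the command for tenant1-lb.dev.mydomain.com):

$ aws route53 change-resource-record-sets --hosted-zone-id ZXXXXXXXXXXXXX \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "tenant1.dev.mydomain.com",
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [{"Value": "a5be275d8b65c11e685a402e9ec69178-91587212.us-west-2.elb.amazonaws.com"}]
        }
      }]
    }'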

Now, if you hit http://tenant1-lb.dev.mydomain.com you should see the Traefik dashboard showing the frontends on the left and the backends on the right:
If you hit http://tenant1.dev.mydomain.com you should see your web service in action.
You can also inspect the logs of the tenant1-lb-traefik pod to see what's going on under the covers when Traefik is launched and to verify that the Let's Encrypt SSL certificates were properly downloaded via ACME:
# kubectl --namespace tenant1 logs tenant1-lb-traefik-3710322105-o2887
time="2016-11-29T00:03:51Z" level=info msg="Traefik version v1.1.0 built on 2016-11-18_09:20:46AM"
time="2016-11-29T00:03:51Z" level=info msg="Using TOML configuration file /config/traefik.toml"
time="2016-11-29T00:03:51Z" level=info msg="Preparing server http &{Network: Address::80 TLS:<nil> Redirect:<nil> Auth:<nil> Compress:false}"
time="2016-11-29T00:03:51Z" level=info msg="Preparing server https &{Network: Address::443 TLS:0xc4201b1800 Redirect:<nil> Auth:<nil> Compress:false}"
time="2016-11-29T00:03:51Z" level=info msg="Starting server on :80"
time="2016-11-29T00:03:58Z" level=info msg="Loading ACME Account..."
time="2016-11-29T00:03:59Z" level=info msg="Loaded ACME config from store /acme/acme.json"
time="2016-11-29T00:04:01Z" level=info msg="Starting provider *main.WebProvider {\"Address\":\":8080\",\"CertFile\":\"\",\"KeyFile\":\"\",\"ReadOnly\":false,\"Auth\":null}"
time="2016-11-29T00:04:01Z" level=info msg="Starting provider *provider.Kubernetes {\"Watch\":true,\"Filename\":\"\",\"Constraints\":[],\"Endpoint\":\"\",\"DisablePassHostHeaders\":false,\"Namespaces\":null,\"LabelSelector\":\"\"}"
time="2016-11-29T00:04:01Z" level=info msg="Retrieving ACME certificates..."
time="2016-11-29T00:04:01Z" level=info msg="Retrieved ACME certificates"
time="2016-11-29T00:04:01Z" level=info msg="Starting server on :443"
time="2016-11-29T00:04:01Z" level=info msg="Server configuration reloaded on :80"
time="2016-11-29T00:04:01Z" level=info msg="Server configuration reloaded on :443"
To get an even better warm and fuzzy feeling about the SSL certificates installed via ACME, you can run this command against the live endpoint tenant1.dev.mydomain.com:
# echo | openssl s_client -showcerts -servername tenant1.dev.mydomain.com -connect tenant1.dev.mydomain.com:443 2>/dev/null
CONNECTED(00000003)
---
Certificate chain
 0 s:/CN=tenant1.dev.mydomain.com
   i:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
-----BEGIN CERTIFICATE-----
MIIGEDCCBPigAwIBAgISAwNwBNVU7ZHlRtPxBBOPPVXkMA0GCSqGSIb3DQEBCwUA
-----END CERTIFICATE-----
 1 s:/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
   i:/O=Digital Signature Trust Co./CN=DST Root CA X3
-----BEGIN CERTIFICATE-----
uM2VcGfl96S8TihRzZvoroed6ti6WqEBmtzw3Wodatg+VyOeph4EYpr/1wXKtx8/KOqkqm57TH2H3eDJAkSnh6/DNFu0Qg==
-----END CERTIFICATE-----
---
Server certificate
subject=/CN=tenant1.dev.mydomain.com
issuer=/C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3
---
No client certificate CA names sent
---
SSL handshake has read 3009 bytes and written 713 bytes
---
New, TLSv1/SSLv3, Cipher is AES128-SHA
Server public key is 4096 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : AES128-SHA
    Start Time: 1480456552
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
etc.
Other helm commands
You can list the Helm releases that are currently running (a Helm release is a particular versioned instance of a Helm chart) with helm list:
# helm list
NAME         REVISION   UPDATED                    STATUS     CHART
tenant1-lb   1          Tue Nov 29 10:13:47 2016   DEPLOYED   traefik-1.1.0-a

If you change any files or values in a Helm chart, you can apply the changes by means of the 'helm upgrade' command:

# helm upgrade tenant1-lb traefik-helm-chart
You can see the status of a release with helm status:
# helm status tenant1-lb
LAST DEPLOYED: Tue Nov 29 10:13:47 2016
NAMESPACE: tenant1
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME                           CLUSTER-IP   EXTERNAL-IP        PORT(S)          AGE
tenant1-lb-traefik             10.3.0.76    a92601b47b65f...   80/TCP,443/TCP   35m
tenant1-lb-traefik-dashboard   10.3.0.36    <none>             80/TCP           35m

==> extensions/Deployment
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
tenant1-lb-traefik   1         1         1            1           35m

==> extensions/Ingress
NAME                             HOSTS                         ADDRESS   PORTS     AGE
tenant1-lb-traefik-web-ingress   tenant1.dev.mydomain.com                80        35m
tenant1-lb-traefik-dashboard     tenant1-lb.dev.mydomain.com             80        35m

==> v1/PersistentVolumeClaim
NAME                      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
tenant1-lb-traefik-acme   Bound     pvc-927df794-b65f-11e6-85a4-02e9ec69178b   1Gi        RWO           35m

==> v1/Secret
NAME                              TYPE      DATA      AGE
tenant1-lb-traefik-default-cert   Opaque    2         35m

==> v1/ConfigMap
NAME                 DATA      AGE
tenant1-lb-traefik   1         35m




Announcing OSS-Fuzz: Continuous Fuzzing for Open Source Software

Google Testing Blog - Thu, 12/01/2016 - 18:00
By Mike Aizatsky, Kostya Serebryany (Software Engineers, Dynamic Tools); Oliver Chang, Abhishek Arya (Security Engineers, Google Chrome); and Meredith Whittaker (Open Research Lead). 

We are happy to announce OSS-Fuzz, a new Beta program developed over the past years with the Core Infrastructure Initiative community. This program will provide continuous fuzzing for select core open source software.

Open source software is the backbone of the many apps, sites, services, and networked things that make up "the internet." It is important that the open source foundation be stable, secure, and reliable, as cracks and weaknesses impact all who build on it.

Recent security stories confirm that errors like buffer overflow and use-after-free can have serious, widespread consequences when they occur in critical open source software. These errors are not only serious, but notoriously difficult to find via routine code audits, even for experienced developers. That's where fuzz testing comes in. By generating random inputs to a given program, fuzzing triggers and helps uncover errors quickly and thoroughly.

In recent years, several efficient general purpose fuzzing engines have been implemented (e.g. AFL and libFuzzer), and we use them to fuzz various components of the Chrome browser. These fuzzers, when combined with Sanitizers, can help find security vulnerabilities (e.g. buffer overflows, use-after-free, bad casts, integer overflows, etc), stability bugs (e.g. null dereferences, memory leaks, out-of-memory, assertion failures, etc) and sometimes even logical bugs.

OSS-Fuzz's goal is to make common software infrastructure more secure and stable by combining modern fuzzing techniques with scalable distributed execution. OSS-Fuzz combines various fuzzing engines (initially, libFuzzer) with Sanitizers (initially, AddressSanitizer) and provides a massive distributed execution environment powered by ClusterFuzz.
Early successes

Our initial trials with OSS-Fuzz have had good results. An example is the FreeType library, which is used on over a billion devices to display text (and which might even be rendering the characters you are reading now). It is important for FreeType to be stable and secure in an age when fonts are loaded over the Internet. Werner Lemberg, one of the FreeType developers, was an early adopter of OSS-Fuzz. Recently the FreeType fuzzer found a new heap buffer overflow only a few hours after the source change:

ERROR: AddressSanitizer: heap-buffer-overflow on address 0x615000000ffa 
READ of size 2 at 0x615000000ffa thread T0
SCARINESS: 24 (2-byte-read-heap-buffer-overflow-far-from-bounds)
#0 0x885e06 in tt_face_vary_cvt src/truetype/ttgxvar.c:1556:31

OSS-Fuzz automatically notified the maintainer, who fixed the bug; then OSS-Fuzz automatically confirmed the fix. All in one day! You can see the full list of fixed and disclosed bugs found by OSS-Fuzz so far.
Contributions and feedback are welcome

OSS-Fuzz has already found 150 bugs in several widely used open source projects (and churns ~4 trillion test cases a week). With your help, we can make fuzzing a standard part of open source development, and work with the broader community of developers and security testers to ensure that bugs in critical open source applications, libraries, and APIs are discovered and fixed. We believe that this approach to automated security testing will result in real improvements to the security and stability of open source software.

OSS-Fuzz is launching in Beta right now, and will be accepting suggestions for candidate open source projects. In order for a project to be accepted to OSS-Fuzz, it needs to have a large user base and/or be critical to Global IT infrastructure, a general heuristic that we are intentionally leaving open to interpretation at this early stage. See more details and instructions on how to apply here.

Once a project is signed up for OSS-Fuzz, it is automatically subject to the 90-day disclosure deadline for newly reported bugs in our tracker (see details here). This matches industry's best practices and improves end-user security and stability by getting patches to users faster.

Help us ensure this program is truly serving the open source community and the internet which relies on this critical software, contribute and leave your feedback on GitHub.
Categories: Testing & QA


Kubernetes resource graphing with Heapster, InfluxDB and Grafana

Agile Testing - Grig Gheorghiu - Tue, 11/29/2016 - 23:58
I know that the Cloud Native Computing Foundation chose Prometheus as the monitoring platform of choice for Kubernetes, but in this post I'll show you how to quickly get started with graphing CPU, memory, disk and network in a Kubernetes cluster using Heapster, InfluxDB and Grafana.

The documentation in the kubernetes/heapster GitHub repo is actually pretty good. Here's what I did:

$ git clone https://github.com/kubernetes/heapster.git
$ cd heapster/deploy/kube-config/influxdb

Look at the yaml manifests to see if you need to customize anything. I left everything 'as is' and ran:

$ kubectl create -f .
deployment "monitoring-grafana" created
service "monitoring-grafana" created
deployment "heapster" created
service "heapster" created
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created

Then you can run 'kubectl cluster-info' and look for the monitoring-grafana endpoint. Since the monitoring-grafana service is of type LoadBalancer, if you run your Kubernetes cluster in AWS, the service creation will also involve the creation of an ELB. By default the ELB security group allows 80 from all, so I edited that to restrict it to some known IPs.
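That security group edit can also be done with the AWS CLI, roughly like this (a sketch; the security group ID and the allowed CIDR are placeholders):

$ aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 80 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 80 --cidr 203.0.113.0/24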

After a few minutes, you should see CPU and memory graphs shown in the Kubernetes dashboard. Here is an example showing pods running in the kube-system namespace:



You can also hit the Grafana endpoint and choose the Cluster or Pods dashboards. Note that if you have a namespace different from default and kube-system, you have to enter its name manually in the namespace field of the Grafana Pods dashboard. Only then will you be able to see data corresponding to pods running in that namespace (or at least I had to jump through that hoop.)

Here is an example of graphs for the kubernetes-dashboard pod running in the kube-system namespace:


For info on how to customize the Grafana graphs, here's a good post from Deis.

Software Development Conferences Forecast November 2016

From the Editor of Methods & Tools - Tue, 11/29/2016 - 16:00
Here is a list of software development related conferences and events on Agile project management ( Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

Quote of the Month November 2016

From the Editor of Methods & Tools - Mon, 11/28/2016 - 13:32
There is surely no team sport in which every player on the field is not accurately aware of the score at any and every moment of play. Yet in software development it is not uncommon to find team members who do not know the next deadline, or what their colleagues are doing. Nor is it […]

Rethinking Equivalence Class Partitioning, Part 1

James Bach’s Blog - Sun, 11/27/2016 - 13:41

Wikipedia’s article on equivalence class partitioning (ECP) is a great example of the poor thinking and teaching and writing that often passes for wisdom in the testing field. It’s narrow and misleading, serving to imply that testing is some little game we play with our software, rather than an open investigation of a complex phenomenon.

(No, I’m not going to edit that article. I don’t find it fun or rewarding to offer my expertise in return for arguments with anonymous amateurs. Wikipedia is important because it serves as a nearly universal reference point when criticizing popular knowledge, but just like popular knowledge itself, it is not fixable. The populus will always prevail, and the populus is not very thoughtful.)

In this article I will comment on the Wikipedia post. In a subsequent post I will describe ECP my way, and you can decide for yourself if that is better than Wikipedia.

“Equivalence partitioning or equivalence class partitioning (ECP)[1] is a software testing technique that divides the input data of a software unit into partitions of equivalent data from which test cases can be derived.”

Not exactly. There’s no reason why ECP should be limited to “input data” as such. The ECP thought process may be applied to output, or even versions of products, test environments, or test cases themselves. ECP applies to anything you might be considering to do that involves any variations that may influence the outcome of a test.

Yes, ECP is a technique, but a better word for it is “heuristic.” A heuristic is a fallible method of solving a problem. ECP is extremely fallible, and yet useful.

“In principle, test cases are designed to cover each partition at least once. This technique tries to define test cases that uncover classes of errors, thereby reducing the total number of test cases that must be developed.”

This text is pretty good. Note the phrase “In principle” and the use of the word “tries.” These are softening words, which are important because ECP is a heuristic, not an algorithm.

Speaking in terms of “test cases that must be developed,” however, is a misleading way to discuss testing. Testing is not about creating test cases. It is for damn sure not about the number of test cases you create. Testing is about performing experiments. And the totality of experimentation goes far beyond such questions as “what test case should I develop next?” The text should instead say “reducing test effort.”

“An advantage of this approach is reduction in the time required for testing a software due to lesser number of test cases.”

Sorry, no. The advantage of ECP is not in reducing the number of test cases. Nor is it even about reducing test effort, as such (even though it is true that ECP is “trying” to reduce test effort). ECP is just a way to systematically guess where the bigger bugs probably are, which helps you focus your efforts. ECP is a prioritization technique. It also helps you explain and defend those choices. Better prioritization does not, by itself, allow you to test with less effort, but we do want to stumble into the big bugs sooner rather than later. And we want to stumble into them with more purpose and less stumbling. And if we do that well, we will feel comfortable spending less effort on the testing. Reducing effort is really a side effect of ECP.

“Equivalence partitioning is typically applied to the inputs of a tested component, but may be applied to the outputs in rare cases. The equivalence partitions are usually derived from the requirements specification for input attributes that influence the processing of the test object.”

Typically? Usually? Has this writer done any sort of research that would substantiate that? No.

ECP is a process that we all do informally, not only in testing but in our daily lives. When you push open a door, do you consciously decide to push on a specific square centimeter of the metal push plate? No, you don’t. You know that for most doors it doesn’t matter where you push. All pushable places are more or less equivalent. That is ECP! We apply ECP to anything that we interact with.

Yes, we apply it to output. And yes, we can think of equivalence classes based on specifications, but we also think of them based on all other learning we do about the software. We perform ECP based on all that we know. If what we know is wrong (for instance if there are unexpected bugs) then our equivalence classes will also be wrong. But that’s okay, if you understand that ECP is a heuristic and not a golden ticket to perfect testing.

“The fundamental concept of ECP comes from equivalence class which in turn comes from equivalence relation. A software system is in effect a computable function implemented as an algorithm in some implementation programming language. Given an input test vector some instructions of that algorithm get covered, ( see code coverage for details ) others do not…”

At this point the article becomes Computer Science propaganda. This is why we can’t have nice things in testing: as soon as the CS people get hold of it, they turn it into a little logic game for gifted kids, rather than a pursuit worthy of adults charged with discovering important problems in technology before it’s too late.

The fundamental concept of ECP has nothing to do with computer science or computability. It has to do with logic. Logic predates computers. An equivalence class is simply a set. It is a set of things that share some property. The property of interest in ECP is utility for exploring a particular product risk. In other words, an equivalence class in testing is an assertion that any member of that particular group of things would be more or less equally able to reveal a particular kind of bug if it were employed in a particular kind of test.

If I define a “test condition” as something about a product or its environment that could be examined in a test, then I can define equivalence classes like this: An equivalence class is a set of tests or test conditions that are equivalent with respect to a particular product risk, in a particular context. 

This implies that two inputs which are not equivalent for the purposes of one kind of bug may be equivalent for finding another kind of bug. It also implies that if we model a product incorrectly, we will also be unable to know the true equivalence classes. Actually, considering that bugs come in all shapes and sizes, to have the perfectly correct set of equivalence classes would be the same as knowing, without having tested, where all the bugs in the product are. This is because ECP is based on guessing what kind of bugs are in the product.

If you read the technical stuff about Computer Science in the Wikipedia article, you will see that the author has decided that two inputs which cover the same code are therefore equivalent for bug finding purposes. But this is not remotely true! This is a fantasy propagated by people who I suspect have never tested anything that mattered. Off the top of my head, code-coverage-as-gold-standard ignores performance bugs, requirements bugs, usability bugs, data type bugs, security bugs, and integration bugs. Imagine two tests that cover the same code, and both involve input that is displayed on the screen, except that one includes an input which is so long that when it prints it goes off the edge of the screen. This is a bug that the short input didn’t find, even though both inputs are “valid” and “do the same thing” functionally.

The Fundamental Problem With Most Testing Advice Is…

The problem with most testing advice is that it is either uncritical folklore that falls apart as soon as you examine it, or else it is misplaced formalism that doesn’t apply to realistic open-ended problems. Testing advice is better when it is grounded in a general systems perspective as well as a social science perspective. Both of these perspectives understand and use heuristics. ECP is a powerful, ubiquitous, and rather simple heuristic, whose utility comes from and is limited by your mental model of the product. In my next post, I will walk through an example of how I use it in real life.

Categories: Testing & QA

Running an application using Kubernetes on AWS

Agile Testing - Grig Gheorghiu - Wed, 11/23/2016 - 02:13
I've been knee-deep in Kubernetes for the past few weeks and to say that I like it is an understatement. It's exhilarating to have at your fingertips a distributed platform created by Google's massive brain power.

I'll jump right in and talk about how I installed Kubernetes in AWS and how I created various resources in Kubernetes in order to run a database-backed PHP-based web application.

Installing Kubernetes

I used the tack tool from my laptop running OSX to spin up a Kubernetes cluster in AWS. Tack uses terraform under the hood, which I liked a lot because it makes it very easy to delete all AWS resources and start from scratch while you are experimenting with it. I went with the tack defaults and spun up 3 m3.medium EC2 instances for running etcd and the Kubernetes API, the scheduler and the controller manager in an HA configuration. Tack also provisioned 3 m3.medium EC2 instances as Kubernetes workers/minions, in an EC2 auto-scaling group. Finally, tack spun up a t2.nano EC2 instance to serve as a bastion host for getting access into the Kubernetes cluster. All 7 EC2 instances launched by tack run CoreOS.

Using kubectl

Tack also installs kubectl, which is the Kubernetes command-line management tool. I used kubectl to create the various Kubernetes resources needed to run my application: deployments, services, secrets, config maps, persistent volumes etc. It pays to become familiar with the syntax and arguments of kubectl.
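A few commands I found myself running constantly while getting oriented (nothing specific to this setup; the pod and namespace names are placeholders):

$ kubectl cluster-info
$ kubectl get nodes
$ kubectl get pods --all-namespaces
$ kubectl describe pod <pod-name> --namespace <namespace>
$ kubectl logs <pod-name> --namespace <namespace>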

Creating namespaces

One thing I needed to do right off the bat was to think about ways to achieve multi-tenancy in my Kubernetes cluster. This is done with namespaces. Here's my namespace.yaml file:

$ cat namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant1

To create the namespace tenant1, I used kubectl create:

$ kubectl create -f namespace.yaml

To list all namespaces:

$ kubectl get namespaces
NAME          STATUS    AGE
default       Active    12d
kube-system   Active    12d
tenant1       Active    11d 

If you don't need a dedicated namespace per tenant, you can just run kubectl commands in the 'default' namespace.
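To avoid typing --namespace tenant1 on every command, you can also set the namespace on the current kubectl context (purely a convenience, not required for anything below):

$ kubectl config set-context $(kubectl config current-context) --namespace=tenant1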

Creating persistent volumes, storage classes and persistent volume claims

I'll show how you can create two types of Kubernetes persistent volumes in AWS: one based on EFS, and one based on EBS. I chose the EFS one for my web application layer, for things such as shared configuration and media files. I chose the EBS one for my database layer, to be mounted as the data volume.

First, I created an EFS share using the AWS console (although I recommend using terraform to do it automatically, but I am not there yet). I allowed the Kubernetes worker security group to access this share. I noted one of the DNS names available for it, e.g. us-west-2a.fs-c830ab1c.efs.us-west-2.amazonaws.com. I used this Kubernetes manifest to define a persistent volume (PV) based on this EFS share:

$ cat web-pv-efs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-efs-web
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: us-west-2a.fs-c830ab1c.efs.us-west-2.amazonaws.com
    path: "/"

To create the PV, I used kubectl create, and I also specified the namespace tenant1:

$ kubectl create -f web-pv-efs.yaml --namespace tenant1

However, creating a PV is not sufficient. Pods use persistent volume claims (PVC) to refer to persistent volumes in their manifests. So I had to create a PVC:

$ cat web-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi 

$ kubectl create -f web-pvc.yaml --namespace tenant1

Note that a PVC does not refer directly to a PV. The storage specified in the PVC is provisioned from available persistent volumes.

Instead of defining a persistent volume for the EBS volume I wanted to use for the database, I created a storage class:

$ cat db-storageclass-ebs.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: db-ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2

$ kubectl create -f db-storageclass-ebs.yaml --namespace tenant1

I also created a PVC which does refer directly to the storage class name db-ebs. When the PVC is used in a pod, the underlying resource (i.e. the EBS volume in this case) will be automatically provisioned by Kubernetes.

$ cat db-pvc-ebs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-pvc-ebs
  annotations:
     volume.beta.kubernetes.io/storage-class: 'db-ebs'
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi

$ kubectl create -f db-pvc-ebs.yaml --namespace tenant1

To list the newly created resource, you can use:

$ kubectl get pv,pvc,storageclass --namespace tenant1

Creating secrets and ConfigMaps

I followed the "Persistent Installation of MySQL and Wordpress on Kubernetes" guide to figure out how to create and use Kubernetes secrets. Here is how to create a secret for the MySQL root password, necessary when you spin up a pod based on a Percona or plain MySQL image:
$ echo -n $MYSQL_ROOT_PASSWORD > mysql-root-pass.secret
$ kubectl create secret generic mysql-root-pass --from-file=mysql-root-pass.secret --namespace tenant1 

Kubernetes also has the handy notion of ConfigMap, a resource where you can store either entire configuration files, or key/value properties that you can then use in other Kubernetes resource definitions. For example, I save the GitHub branch and commit environment variables for the code I deploy in a ConfigMap:
$ kubectl create configmap git-config --namespace tenant1 \
    --from-literal=GIT_BRANCH=$GIT_BRANCH \
    --from-literal=GIT_COMMIT=$GIT_COMMIT
I'll show how to use secrets and ConfigMaps in pod definitions a bit later on.
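In the meantime, you can double-check what was stored by dumping the resources back out (the secret values are shown base64-encoded):

$ kubectl --namespace tenant1 get configmap git-config -o yaml
$ kubectl --namespace tenant1 get secret mysql-root-pass -o yaml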
Creating an ECR image pull secret and a service account

We use AWS ECR to store our Docker images. Kubernetes can access images stored in ECR, but you need to jump through a couple of hoops to make that happen. First, you need to create a Kubernetes secret of type dockerconfigjson which encapsulates the ECR credentials in base64 format. Here's a shell script that generates a file called ecr-pull-secret.yaml:

#!/bin/bash

TMP_JSON_CONFIG=/tmp/ecr_config.json

PASSWORD=$(aws --profile default --region us-west-2 ecr get-login | cut -d ' ' -f 6)

cat > $TMP_JSON_CONFIG << EOF
{"https://YOUR_AWS_ECR_ID.dkr.ecr.us-west-2.amazonaws.com":{"username":"AWS","email":"none","password":"$PASSWORD"}}
EOF


BASE64CONFIG=$(cat $TMP_JSON_CONFIG | base64)
cat > ecr-pull-secret.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
  name: ecr-key
  namespace: tenant1
data:
  .dockerconfigjson: $BASE64CONFIG
type: kubernetes.io/dockerconfigjson
EOF

rm -rf $TMP_JSON_CONFIG

Once you run the script and generate the file, you can then define a Kubernetes service account that will use this secret:

$ cat service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: tenant1
  name: tenant1-dev
imagePullSecrets:
 - name: ecr-key

Note that the service account refers to the ecr-key secret in the imagePullSecrets property.

As usual, kubectl create will create these resources based on their manifests:

$ kubectl create -f ecr-pull-secret.yaml
$ kubectl create -f service-account.yaml

Creating deployments

The atomic unit of scheduling in Kubernetes is a pod. You don't usually create a pod directly (though you can, and I'll show you a case where it makes sense.) Instead, you create a deployment, which keeps track of how many pod replicas you need, and spins up the exact number of pods to fulfill your requirement. A deployment actually creates a replica set under the covers, but in general you don't deal with replica sets directly. Note that deployments are the new recommended way to create multiple pods. The old way, which is still predominant in the documentation, was to use replication controllers.
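Once a deployment exists (the manifests are shown below), scaling it and watching a rollout are one-liners; for example, against the web-deployment defined later in this post:

$ kubectl --namespace tenant1 scale deployment web-deployment --replicas=3
$ kubectl --namespace tenant1 rollout status deployment web-deployment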

Here's my deployment manifest for a pod running a database image:

$ cat db-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: db-deployment
  labels:
    app: myapp
spec:
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myapp
        tier: db
    spec:
      containers:
      - name: db
        image: MY_ECR_ID.dkr.ecr.us-west-2.amazonaws.com/myapp-db:tenant1
        imagePullPolicy: Always
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-root-pass
              key: mysql-root-pass.secret
        - name: MYSQL_DATABASE
          valueFrom:
            configMapKeyRef:
              name: tenant1-config
              key: MYSQL_DATABASE
        - name: MYSQL_USER
          valueFrom:
            configMapKeyRef:
              name: tenant1-config
              key: MYSQL_USER
        - name: MYSQL_DUMP_FILE
          valueFrom:
            configMapKeyRef:
              name: tenant1-config
              key: MYSQL_DUMP_FILE
        - name: S3_BUCKET
          valueFrom:
            configMapKeyRef:
              name: tenant1-config
              key: S3_BUCKET
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: ebs
          mountPath: /var/lib/mysql
      volumes:
      - name: ebs
        persistentVolumeClaim:
          claimName:  db-pvc-ebs
      serviceAccount: tenant1-dev

The template section specifies the elements necessary for spinning up new pods. Of particular importance are the labels, which, as we will see, are used by services to select pods that are included in a given service.  The image property specifies the ECR Docker image used to spin up new containers. In my case, the image is called myapp-db and it is tagged with the tenant name tenant1. Here is the Dockerfile from which this image was generated:

$ cat Dockerfile
FROM mysql:5.6

# disable interactive functions
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y python-pip
RUN pip install awscli

VOLUME /var/lib/mysql

COPY etc/mysql/my.cnf /etc/mysql/my.cnf
COPY scripts/db_setup.sh /usr/local/bin/db_setup.sh

Nothing out of the ordinary here. The image is based on the mysql DockerHub image, specifically version 5.6. The my.cnf is getting added in as a customization, and a db_setup.sh script is copied over so it can be run at a later time.

Some other things to note about the deployment manifest:

  • I made pretty heavy use of secrets and ConfigMap key/values
  • I also used the db-pvc-ebs Persistent Volume Claim and mounted the underlying physical resource (an EBS volume in this case) as /var/lib/mysql
  • I used the tenant1-dev service account, which allows the deployment to pull down the container image from ECR
  • I didn't specify the number of replicas I wanted, which means that 1 pod will be created (the default)

To create the deployment, I ran kubectl:

$ kubectl create -f db-deployment.yaml --record --namespace tenant1

Note that I used the --record flag, which tells Kubernetes to keep a history of the commands used to create or update that deployment. You can show this history with the kubectl rollout history command:

$ kubectl --namespace tenant1 rollout history deployment db-deployment 

To list the running deployments, replica sets and pods, you can use:

$ kubectl get deployments,rs,pods --namespace tenant1 --show-all

Here is another example of a deployment manifest, this time for redis:

$ cat redis-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis-deployment
spec:
  replicas: 1
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: myapp
        tier: redis
    spec:
      containers:
        - name: redis
          command: ["redis-server", "/etc/redis/redis.conf", "--requirepass", "$(REDIS_PASSWORD)"]
          image: MY_ECR_ID.dkr.ecr.us-west-2.amazonaws.com/myapp-redis:tenant1
          imagePullPolicy: Always
          env:
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: redis-pass
                key: redis-pass.secret
          ports:
          - containerPort: 6379
            protocol: TCP
      serviceAccount: tenant1-dev

One thing that is different from the db deployment is the way a secret (REDIS_PASSWORD) is used as a command-line parameter for the container command. Make sure you use the $(VARIABLE_NAME) syntax in this case, because that's what Kubernetes expects.

Also note the labels, which have app: myapp in common with the db deployment, but a different value for tier, redis instead of db.

My last deployment example for now is the one for the web application pods:

$ cat web-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 2
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myapp
        tier: frontend
    spec:
      containers:
      - name: web
        image: MY_ECR_ID.dkr.ecr.us-west-2.amazonaws.com/myapp-web:tenant1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: web-persistent-storage
          mountPath: /var/www/html/shared
      volumes:
      - name: web-persistent-storage
        persistentVolumeClaim:
          claimName: web-pvc
      serviceAccount: tenant1-dev

Note that replicas is set to 2, so that 2 pods will be launched and kept running at all times. The labels have the same common part app: myapp, but the tier is different, set to frontend.  The persistent volume claim web-pvc for the underlying physical EFS volume is used to mount /var/www/html/shared over EFS.

The image used for the container is derived from a stock ubuntu:14.04 DockerHub image, with apache and php 5.6 installed on top. Something along these lines:

FROM ubuntu:14.04

RUN apt-get update && \
    apt-get install -y ntp build-essential binutils zlib1g-dev telnet git acl lzop unzip mcrypt expat xsltproc python-pip curl language-pack-en-base && \
    pip install awscli

RUN export LC_ALL=en_US.UTF-8 && export LC_ALL=en_US.UTF-8 && export LANG=en_US.UTF-8 && \
        apt-get install -y mysql-client-5.6 software-properties-common && add-apt-repository ppa:ondrej/php

RUN apt-get update && \
    apt-get install -y --allow-unauthenticated apache2 apache2-utils libapache2-mod-php5.6 php5.6 php5.6-mcrypt php5.6-curl php-pear php5.6-common php5.6-gd php5.6-dev php5.6-opcache php5.6-json php5.6-mysql

RUN apt-get remove -y libapache2-mod-php5 php7.0-cli php7.0-common php7.0-json php7.0-opcache php7.0-readline php7.0-xml

RUN curl -sSL https://getcomposer.org/composer.phar -o /usr/bin/composer \
    && chmod +x /usr/bin/composer \
    && composer selfupdate

COPY files/apache2-foreground /usr/local/bin/
RUN chmod +x /usr/local/bin/apache2-foreground
EXPOSE 80
CMD bash /usr/local/bin/apache2-foreground

Creating services

In Kubernetes, you are not supposed to refer to individual pods when you want to target the containers running inside them. Instead, you need to use services, which provide endpoints for accessing a set of pods based on a set of labels.

Here is an example of a service for the db-deployment I created above:

$ cat db-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: db
  labels:
    app: myapp
spec:
  ports:
    - port: 3306
  selector:
    app: myapp
    tier: db
  clusterIP: None

Note the selector property, which is set to app: myapp and tier: db. By specifying these labels, we make sure that only the deployments tagged with those labels will be included in this service. There is only one deployment with those 2 labels, and that is db-deployment.

Here are similar service manifests for the redis and web deployments:

$ cat redis-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: myapp
spec:
  ports:
    - port: 6379
  selector:
    app: myapp
    tier: redis
  clusterIP: None

$ cat web-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    app: myapp
spec:
  ports:
    - port: 80
  selector:
    app: myapp
    tier: frontend
  type: LoadBalancer

The selector properties for each service are set so that the proper deployment is included in each service.

One important thing to note in the definition of the web service: its type is set to LoadBalancer. Since Kubernetes is AWS-aware, the service creation will create an actual ELB in AWS, so that the application can be accessible from the outside world. It turns out that this is not the best way to expose applications externally, since this LoadBalancer resource operates only at the TCP layer. What we need is a proper layer 7 load balancer, and in a future post I'll show how to use a Kubernetes ingress controller in conjunction with the traefik proxy to achieve that. In the mean time, here is a KubeCon presentation from Gerred Dillon on "Kubernetes Ingress: Your Router, Your Rules".

To create the services defined above, I used kubectl:

$ kubectl create -f db-service.yaml --namespace tenant1
$ kubectl create -f redis-service.yaml --namespace tenant1
$ kubectl create -f web-service.yaml --namespace tenant1
At this point, the web application can refer to the database 'host' in its configuration files by simply using the name of the database service, which is db in our example. Similarly, the web application can refer to the redis 'host' by using the name of the redis service, which is redis. The Kubernetes magic will make sure calls to db and redis are properly routed to their end destinations, which are the actual containers running those services.
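To convince yourself of this, you can resolve the service names from inside one of the web pods (a sketch; it assumes getent is available in the image, which it should be for an ubuntu:14.04-based image):

$ POD=$(kubectl --namespace tenant1 get pods | grep web-deployment | head -1 | awk '{print $1}')
$ kubectl --namespace tenant1 exec -it $POD -- getent hosts db
$ kubectl --namespace tenant1 exec -it $POD -- getent hosts redis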

Running commands inside pods with kubectl exec

Although you are not really supposed to do this in a container world, I found it useful to run a command such as loading a database from a MySQL dump file on a newly created pod. Kubernetes makes this relatively easy via the kubectl exec functionality. Here's how I did it:

DEPLOYMENT=db-deployment
NAMESPACE=tenant1

POD=$(kubectl --namespace $NAMESPACE get pods --show-all | grep $DEPLOYMENT | awk '{print $1}')
echo Running db_setup.sh command on pod $POD
kubectl --namespace $NAMESPACE exec $POD -it /usr/local/bin/db_setup.sh

where db_setup.sh downloads a sql.tar.gz file from S3 and loads it into MySQL.

A handy troubleshooting tool is to get a shell prompt inside a pod. First you get the pod name (via kubectl get pods --show-all), then you run:

$ kubectl --namespace tenant1 exec -it $POD -- bash -il

Sharing volumes across containers

One of the patterns I found useful in docker-compose files is to mount a container volume into another container, for example to check out the source code in a container volume, then mount it as /var/www/html in another container running the web application. This pattern is not extremely well supported in Kubernetes, but you can find your way around it by using init-containers.

Here's an example of creating an individual pod for the sole purpose of running a Capistrano task against the web application source code. Simply running two regular containers inside the same pod would not achieve this goal, because the order of creation for those containers is random. What we need is to force one container to start before any regular containers by declaring it to be an 'init-container'.

$ cat capistrano-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: capistrano
  annotations:
     pod.beta.kubernetes.io/init-containers: '[
            {
                "name": "data4capistrano",
                "image": "MY_ECR_ID.dkr.ecr.us-west-2.amazonaws.com/myapp-web:tenant1",
                "command": ["cp", "-rH", "/var/www/html/current", "/tmpfsvol/"],
                "volumeMounts": [
                    {
                        "name": "crtvol",
                        "mountPath": "/tmpfsvol"
                    }
                ]
            }
        ]'
spec:
  containers:
  - name: capistrano
    image: MY_ECR_ID.dkr.ecr.us-west-2.amazonaws.com/capistrano:tenant1
    imagePullPolicy: Always
    command: [ "cap", "$(CAP_STAGE)", "$(CAP_TASK)", "--trace" ]
    env:
    - name: CAP_STAGE
      valueFrom:
        configMapKeyRef:
          name: tenant1-cap-config
          key: CAP_STAGE
    - name: CAP_TASK
      valueFrom:
        configMapKeyRef:
          name: tenant1-cap-config
          key: CAP_TASK
    - name: DEPLOY_TO
      valueFrom:
        configMapKeyRef:
          name: tenant1-cap-config
          key: DEPLOY_TO
    volumeMounts:
    - name: crtvol
      mountPath: /var/www/html
    - name: web-persistent-storage
      mountPath: /var/www/html/shared
  volumes:
  - name: web-persistent-storage
    persistentVolumeClaim:
      claimName: web-pvc
  - name: crtvol
    emptyDir: {}
  restartPolicy: Never
  serviceAccount: tenant1-dev

The logic here is a bit convoluted. Hopefully some readers of this post will know a better way to achieve the same thing. What I am doing is launching a container based on the myapp-web:tenant1 Docker image, which already contains the source code checked out from GitHub. This container is declared as an init-container, so it's guaranteed to run first. It mounts a special Kubernetes volume, declared at the bottom of the pod manifest as an emptyDir, which means that Kubernetes will allocate some storage on the node where this pod will run. The data4capistrano container then runs a command which copies the contents of the /var/www/html/current directory from the myapp-web image into this storage space, mounted as /tmpfsvol inside data4capistrano. One other thing to note is that init-containers are currently a beta feature, so their declaration needs to be embedded in an annotation.

When the regular capistrano container is created inside the pod, it also mounts the same emptyDir volume (which is not empty at this point, because it was populated by the init-container), this time as /var/www/html. It also mounts the shared EFS file system as /var/www/html/shared. With these volumes in place, it has everything it needs in order to run Capistrano locally via the cap command. The stage, task, and target directory for Capistrano are passed in via ConfigMap values.

One thing to note is that restartPolicy is set to Never for this pod, because we only want to run it once and be done with it.

To run the pod, I used kubectl again:

$ kubectl create -f capistrano-pod.yaml --namespace tenant1

Creating jobs

Kubernetes also has the concept of jobs, which differ from deployments in that they run one instance of a pod and make sure it completes. Jobs are useful for one-off tasks that you want to run to completion; for periodic tasks such as cron commands, Kubernetes offers scheduled jobs built on top of regular jobs. Here is an example of a job manifest which runs a script that uses the Twig template engine under the covers in order to generate a configuration file for the web application:

$ cat template-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-template
spec:
  template:
    metadata:
      name: myapp-template
    spec:
      containers:
      - name: myapp-template
        image: MY_ECR_ID.dkr.ecr.us-west-2.amazonaws.com/myapp-template:tenant1
        imagePullPolicy: Always
        command: [ "php", "/root/scripts/templatize.php"]
        env:
        - name: DBNAME
          valueFrom:
            configMapKeyRef:
              name: tenant1-config
              key: MYSQL_DATABASE
        - name: DBUSER
          valueFrom:
            configMapKeyRef:
              name: tenant1-config
              key: MYSQL_USER
        - name: DBPASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-db-pass
              key: mysql-db-pass.secret
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-pass
              key: redis-pass.secret
        volumeMounts:
        - name: web-persistent-storage
          mountPath: /var/www/html/shared
      volumes:
      - name: web-persistent-storage
        persistentVolumeClaim:
          claimName: web-pvc
      restartPolicy: Never
      serviceAccount: tenant1-dev

The templatize.php script substitutes DBNAME, DBUSER, DBPASSWORD and REDIS_PASSWORD with the values passed in the job manifest, obtained from either Kubernetes secrets or ConfigMaps.

To create the job, I used kubectl:

$ kubectl create -f template-job.yaml --namespace tenant1
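
Since completed pods no longer show up in the default kubectl get pods output, it's worth checking the job status and its logs afterwards, for example:

$ kubectl --namespace tenant1 get jobs
$ POD=$(kubectl --namespace tenant1 get pods --show-all | grep myapp-template | awk '{print $1}')
$ kubectl --namespace tenant1 logs $POD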

Performing rolling updates and rollbacks for Kubernetes deployments

Once your application pods are running, you'll need to update the application to a new version. Kubernetes allows you to do a rolling update of your deployments. One advantage of using deployments, as opposed to the older method of using replication controllers, is that the update process for a deployment happens on the Kubernetes server side, and can be paused and resumed. There are a few ways of doing a rolling update for a deployment (a recent linux.com article has a good overview as well).

a) You can modify the deployment's yaml file and change a label such as a version or a git commit, then run kubectl apply:

$ kubectl --namespace tenant1 apply -f deployment.yaml

Note from the Kubernetes documentation on updating deployments:

a Deployment’s rollout is triggered if and only if the Deployment’s pod template (i.e. .spec.template) is changed, e.g. updating labels or container images of the template. Other updates, such as scaling the Deployment, will not trigger a rollout.

b) You can use kubectl set to specify a new image for the deployment containers. Example from the documentation:
$ kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
deployment "nginx-deployment" image updated

c) You can use kubectl patch to add a unique label to the deployment spec template on the fly. This is the method I've been using, with the label being set to a timestamp:
$ kubectl patch deployment web-deployment --namespace tenant1 -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%Y%m%d%H%M%S'`\"}}}}}"

When updating a deployment, a new replica set will be created for that deployment, and the specified number of pods will be launched by that replica set, while the pods from the old replica set will be shut down. However, the old replica set itself will be preserved, allowing you to perform a rollback if needed. 
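
You can watch the progress of such an update, and see the old replica sets being kept around, with:

$ kubectl --namespace tenant1 rollout status deployment web-deployment
$ kubectl --namespace tenant1 get rs
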
If you want to roll back to a previous version, you can use kubectl rollout history to show the revisions of your deployment updates:
$ kubectl --namespace tenant1 rollout history deployment web-deployment
deployments "web-deployment"
REVISION  CHANGE-CAUSE
1         kubectl create -f web-deployment.yaml --record --namespace tenant1
2         kubectl patch deployment web-deployment --namespace tenant1 -p {"spec":{"template":{"metadata":{"labels":{"date":"1479161196"}}}}}
3         kubectl patch deployment web-deployment --namespace tenant1 -p {"spec":{"template":{"metadata":{"labels":{"date":"1479161573"}}}}}
4         kubectl patch deployment web-deployment --namespace tenant1 -p {"spec":{"template":{"metadata":{"labels":{"date":"1479243444"}}}}}
Now use kubectl rollout undo to roll back to a previous revision:
$ kubectl --namespace tenant1 rollout undo deployments web-deployment --to-revision=3
deployment "web-deployment" rolled back
I should note that all these kubectl commands can be easily executed from Jenkins pipeline scripts or shell steps. I use a Docker image to wrap kubectl and its keys, so that I don't have to install them on the Jenkins worker nodes.
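
As a rough sketch of that idea, a Jenkins shell step could run something like the command below; the wrapper image name and the mounted paths are made up for illustration, with the assumption that the image bundles kubectl and picks up a kubeconfig from the mounted directory:

# Hypothetical kubectl wrapper image invoked from a Jenkins shell step
$ docker run --rm \
    -v /var/lib/jenkins/.kube:/root/.kube \
    -v $WORKSPACE/k8s:/manifests \
    my-registry.example.com/kubectl-wrapper:latest \
    kubectl --namespace tenant1 apply -f /manifests/web-deployment.yaml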

And there you have it. I hope the examples I provided will shed some light on some aspects of Kubernetes that go past the 'Kubernetes 101' stage. Before I forget, here's a good overview from the official documentation on using Kubernetes in production.

I have a lot more Kubernetes things on my plate, and I hope to write blog posts on all of them. Some of these:

  • ingress controllers based on traefik
  • creation and renewal of Let's Encrypt certificates
  • monitoring
  • logging
  • using the Helm package manager
  • ...and more




Software Development Linkopedia November 2016

From the Editor of Methods & Tools - Mon, 11/21/2016 - 15:08
Here is our monthly selection of knowledge on programming, software testing and project management. This month you will find some interesting information and opinions about introvert project manager, scaling Agile, Test-Driven Development, UX vs UI, philosophy and programming, retrospectives, BDD in Java and Agile metrics. Blog: How Introvert Can Survive as Project Manager Blog: #AgileAfrica […]

What Test Engineers do at Google: Building Test Infrastructure

Google Testing Blog - Fri, 11/18/2016 - 18:13
Author: Jochen Wuttke

In a recent post, we broadly talked about What Test Engineers do at Google. In this post, I talk about one aspect of the work TEs may do: building and improving test infrastructure to make engineers more productive.

Refurbishing legacy systems makes new tools necessary
A few years ago, I joined an engineering team that was working on replacing a legacy system with a new implementation. Because building the replacement would take several years, we had to keep the legacy system operational and even add features, while building the replacement so there would be no impact on our external users.

The legacy system was so complex and brittle that the engineers spent most of their time triaging and fixing bugs and flaky tests, but had little time to implement new features. The goal for the rewrite was to learn from the legacy system and to build something that was easier to maintain and extend. As the team's TE, my job was to understand what caused the high maintenance cost and how to improve on it. I found two main causes:
  • Tight coupling and insufficient abstraction made unit testing very hard, and as a consequence, a lot of end-to-end tests served as functional tests of that code.
  • The infrastructure used for the end-to-end tests had no good way to create and inject fakes or mocks for these services. As a result, the tests had to bring up a large number of servers for all these external dependencies. This led to very large and brittle tests that our existing test execution infrastructure was not able to handle reliably.
Exploring solutions
At first, I explored if I could split the large tests into smaller ones that would test specific functionality and depend on fewer external services. This proved impossible, because of the poorly structured legacy code. Making this approach work would have required refactoring the entire system and its dependencies, not just the parts my team owned.

In my second approach, I also focussed on large tests and tried to mock services that were not required for the functionality under test. This also proved very difficult, because dependencies changed often and individual dependencies were hard to trace in a graph of over 200 services. Ultimately, this approach just shifted the required effort from maintaining test code to maintaining test dependencies and mocks.

My third and final approach, illustrated in the figure below, made small tests more powerful. In the typical end-to-end test we faced, the client made RPC calls to several services, which in turn made RPC calls to other services. Together the client and the transitive closure over all backend services formed a large graph (not tree!) of dependencies, which all had to be up and running for the end-to-end test. The new model changes how we test client and service integration. Instead of running the client on inputs that will somehow trigger RPC calls, we write unit tests for the code making method calls to the RPC stub. The stub itself is mocked with a common mocking framework like Mockito in Java. For each such test, a second test verifies that the data used to drive that mock "makes sense" to the actual service. This is also done with a unit test, where a replay client uses the same data the RPC mock uses to call the RPC handler method of the service.


This pattern of integration testing applies to any RPC call, so the RPC calls made by a backend server to another backend can be tested just as well as front-end client calls. When we apply this approach consistently, we benefit from smaller tests that still test correct integration behavior, and make sure that the behavior we are testing is "real".

To arrive at this solution, I had to build, evaluate, and discard several prototypes. While it took a day to build a proof-of-concept for this approach, it took me and another engineer a year to implement a finished tool developers could use.

Adoption
The engineers embraced the new solution very quickly when they saw that the new framework removes large amounts of boilerplate code from their tests. To further drive its adoption, I organized multi-day events with the engineering team where we focussed on migrating test cases. It took a few months to migrate all existing unit tests to the new framework, close gaps in coverage, and create the new tests that validate the mocks. Once we converted about 80% of the tests, we started comparing the efficacy of the new tests and the existing end-to-end tests.

The results are very good:
  • The new tests are as effective in finding bugs as the end-to-end tests are.
  • The new tests run in about 3 minutes instead of 30 minutes for the end-to-end tests.
  • The client side tests are 0% flaky. The verification tests are usually less flaky than the end-to-end tests, and never more.
Additionally, the new tests are unit tests, so you can run them in your IDE and step through them to debug. These results allowed us to run the end-to-end tests very rarely, only to detect misconfigurations of the interacting services, but not as functional tests.

Building and improving test infrastructure to help engineers be more productive is one of the many things test engineers do at Google. Running this project from requirements gathering all the way to a finished product gave me the opportunity to design and implement several prototypes, drive the full implementation of one solution, lead engineering teams to adoption of the new framework, and integrate feedback from engineers and actual measurements into the continuous refinement of the tool.
Categories: Testing & QA

Is Your Organization Killing Your Software?

From the Editor of Methods & Tools - Thu, 11/17/2016 - 16:19
When asked “What is your architecture?” most people immediately respond with how their software is laid out and what their plans are for improving parts of it. Rarely does anybody really think through their team and organizational architecture, and even more rarely do people understand how that may fundamentally impact how the software gets written […]

GTAC 2016 Registration Deadline Extended

Google Testing Blog - Tue, 11/15/2016 - 21:09
by Sonal Shah on behalf of the GTAC Committee

Our goal in organizing GTAC each year is to make it a first-class conference, dedicated to presenting leading edge industry practices. The quality of submissions we've received for GTAC 2016 so far has been overwhelming. In order to include the best talks possible, we are extending the deadline for speaker and attendee submissions by 15 days. The new timelines are as follows:

June 15, 2016 (extended from June 1, 2016) - Last day for speaker, attendee and diversity scholarship submissions.
July 15, 2016 (extended from June 15, 2016) - Attendees and scholarship awardees will be notified of selection/rejection/waitlist status. Those on the waitlist will be notified as space becomes available.
August 29, 2016 (extended from August 15, 2016) - Selected speakers will be notified.

To register, please fill out this form.
To apply for diversity scholarship, please fill out this form.

The GTAC website has a list of frequently asked questions. Please do not hesitate to contact gtac2016@google.com if you still have any questions.

Categories: Testing & QA


Hackable Projects - Pillar 2: Debuggability

Google Testing Blog - Thu, 11/10/2016 - 20:34
By: Patrik Höglund

This is the second article in our series on Hackability; also see the first article.


“Deep into that darkness peering, long I stood there, wondering, fearing, doubting, dreaming dreams no mortal ever dared to dream before.” -- Edgar Allan Poe

Debuggability can mean being able to use a debugger, but here we’re interested in a broader meaning. Debuggability means being able to easily find what’s wrong with a piece of software, whether it’s through logs, statistics or debugger tools. Debuggability doesn’t happen by accident: you need to design it into your product. The amount of work it takes will vary depending on your product, programming language(s) and development environment.

In this article, I am going to walk through a few examples of how we have aided debuggability for our developers. If you do the same analysis and implementation for your project, perhaps you can help your developers illuminate the dark corners of the codebase and learn what truly goes on there.
Figure 1: computer log entry from the Mark II, with a moth taped to the page.
Running on Localhost
Read more on the Testing Blog: Hermetic Servers by Chaitali Narla and Diego Salas

Suppose you’re developing a service with a mobile app that connects to that service. You’re working on a new feature in the app that requires changes in the backend. Do you develop in production? That’s a really bad idea, as you must push unfinished code to production to work on your change. Don’t do that: it could break your service for your existing users. Instead, you need some kind of script that brings up your server stack on localhost.

You can probably run your servers by hand, but that quickly gets tedious. In Google, we usually use fancy python scripts that invoke the server binaries with flags. Why do we need those flags? Suppose, for instance, that you have a server A that depends on a server B and C. The default behavior when the server boots should be to connect to B and C in production. When booting on localhost, we want to connect to our local B and C though. For instance:

b_serv --port=1234 --db=/tmp/fakedb
c_serv --port=1235
a_serv --b_spec=localhost:1234 --c_spec=localhost:1235

That makes it a whole lot easier to develop and debug your server. Make sure the logs and stdout/stderr end up in some well-defined directory on localhost so you don’t waste time looking for them. You may want to write a basic debug client that sends HTTP requests or RPCs or whatever your server handles. It’s painful to have to boot the real app on a mobile phone just to test something.
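
Such a debug client doesn't need to be fancy; assuming the local a_serv above listens on port 8080 and speaks HTTP (both assumptions for this sketch), something along these lines already goes a long way:

# Minimal debug client sketch: poke the local server stack over HTTP.
# The port and paths are assumptions for the example servers above.
curl -v http://localhost:8080/healthz
curl -s -d '{"query": "ping"}' http://localhost:8080/debug/echo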

A localhost setup is also a prerequisite for making hermetic tests, where the test invokes the above script to bring up the server stack. The test can then run, say, integration tests among the servers or even client-server integration tests. Such integration tests can catch protocol drift bugs between client and server, while being super stable by not talking to external or shared services.
Debugging Mobile Apps
First, mobile is hard. The tooling is generally less mature than for desktop, although things are steadily improving. Again, unit tests are great for hackability here. It’s really painful to always load your app on a phone connected to your workstation to see if a change worked. Robolectric unit tests and Espresso functional tests, for instance, run on your workstation and do not require a real phone. XCTests and Earl Grey give you the same on iOS.

Debuggers ship with Xcode and Android Studio. If your Android app ships JNI code, it’s a bit trickier, but you can attach GDB to running processes on your phone. It’s worth spending the time figuring this out early in the project, so you don’t have to guess what your code is doing. Debugging unit tests is even better and can be done straightforwardly on your workstation.
When Debugging gets Tricky
Some products are harder to debug than others. One example is hard real-time systems, since their behavior is so dependent on timing (and you better not be hooked up to a real industrial controller or rocket engine when you hit a breakpoint!). One possible solution is to run the software on a fake clock instead of a hardware clock, so the clock stops when the program stops.

Another example is multi-process sandboxed programs such as Chromium. Since the browser spawns one renderer process per tab, how do you even attach a debugger to it? The developers have made it quite a lot easier with debugging flags and instructions. For instance, this wraps gdb around each renderer process as it starts up:

chrome --renderer-cmd-prefix='xterm -title renderer -e gdb --args'

The point is, you need to build these kinds of things into your product; this greatly aids hackability.
Proper Logging
Read more on the Testing Blog: Optimal Logging by Anthony Vallone

It’s hackability to get the right logs when you need them. It’s easy to fix a crash if you get a stack trace from the error location. It’s far from guaranteed you’ll get such a stack trace, for instance in C++ programs, but this is something you should not stand for. For instance, Chromium had a problem where renderer process crashes didn’t print in test logs, because the test was running in a separate process. This was later fixed, and this kind of investment is worthwhile to make. A clean stack trace is worth a lot more than a “renderer crashed” message.

Logs are also useful for development. It’s an art to determine how much logging is appropriate for a given piece of code, but it is a good idea to keep the default level of logging conservative and give developers the option to turn on more logging for the parts they’re working on (example: Chromium). Too much logging isn’t hackability. This article elaborates further on this topic.

Logs should also be properly symbolized for C/C++ projects; a naked list of addresses in a stack trace isn’t very helpful. This is easy if you build for development (e.g. with -g), but if the crash happens in a release build it’s a bit trickier. You then need to build the same binary with the same flags and use addr2line / ndk-stack / etc to symbolize the stack trace. It’s a good idea to build tools and scripts for this so it’s as easy as possible.
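
For instance, symbolizing a single frame from a release crash on Linux can be as simple as the following, assuming you have rebuilt the exact same binary with debug info (the binary name and address below are placeholders):

# Resolve an address from a release crash log against a symbolized rebuild of the binary.
addr2line -e ./my_server_with_symbols -f -C 0x00000000004005f6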
Monitoring and Statistics
It aids hackability if developers can quickly understand what effect their changes have in the real world. For this, monitoring tools such as Stackdriver for Google Cloud are excellent. If you’re running a service, such tools can help you keep track of request volumes and error rates. This way you can quickly detect that 30% increase in request errors, and roll back that bad code change, before it does too much damage. It also makes it possible to debug your service in production without disrupting it.
System Under Test (SUT) Size
Tests and debugging go hand in hand: it’s a lot easier to target a piece of code in a test than in the whole application. Small and focused tests aid debuggability, because when a test breaks there isn’t an enormous SUT to look for errors in. These tests will also be less flaky. This article discusses this fact at length.


Figure 2. The smaller the SUT, the more valuable the test.
You should try to keep the above in mind, particularly when writing integration tests. If you’re testing a mobile app with a server, what bugs are you actually trying to catch? If you’re trying to ensure the app can still talk to the server (i.e. catching protocol drift bugs), you should not involve the UI of the app. That’s not what you’re testing here. Instead, break out the signaling part of the app into a library, test that directly against your local server stack, and write separate tests for the UI that only test the UI.

Smaller SUTs also greatly aid test speed, since there’s less to build, less to bring up and less to keep running. In general, strive to keep the SUT as small as possible through whatever means necessary. It will keep the tests smaller, faster and more focused.
Sources
Figure 1: By Courtesy of the Naval Surface Warfare Center, Dahlgren, VA., 1988. - U.S. Naval Historical Center Online Library Photograph NH 96566-KN, Public Domain, https://commons.wikimedia.org/w/index.php?curid=165211


(Continue to Pillar 3: Infrastructure)
Categories: Testing & QA

Hackable Projects - Pillar 1: Code Health

Google Testing Blog - Thu, 11/10/2016 - 20:33
By: Patrik Höglund
Introduction
Software development is difficult. Projects often evolve over several years, under changing requirements and shifting market conditions, impacting developer tools and infrastructure. Technical debt, slow build systems, poor debuggability, and increasing numbers of dependencies can weigh down a project. The developers get weary, and cobwebs accumulate in dusty corners of the code base.

Fighting these issues can be taxing and feel like a quixotic undertaking, but don’t worry — the Google Testing Blog is riding to the rescue! This is the first article of a series on “hackability” that identifies some of the issues that hinder software projects and outlines what Google SETIs usually do about them.

According to Wiktionary, hackable is defined as:
Adjective
hackable ‎(comparative more hackable, superlative most hackable)
  1. (computing) That can be hacked or broken into; insecure, vulnerable. 
  2. That lends itself to hacking (technical tinkering and modification); moddable.

Obviously, we’re not going to talk about making your product more vulnerable (by, say, rolling your own crypto or something equally unwise); instead, we will focus on the second definition, which essentially means “something that is easy to work on.” This has become the main focus for SETIs at Google as the role has evolved over the years.
In Practice
In a hackable project, it’s easy to try things and hard to break things. Hackability means fast feedback cycles that offer useful information to the developer.

This is hackability:
  • Developing is easy
  • Fast build
  • Good, fast tests
  • Clean code
  • Easy running + debugging
  • One-click rollbacks
In contrast, what is not hackability?
  • Broken HEAD (tip-of-tree)
  • Slow presubmit (i.e. checks running before submit)
  • Builds take hours
  • Incremental build/link > 30s
  • Flaky tests
  • Can’t attach debugger
  • Logs full of uninteresting information
The Three Pillars of Hackability
There are a number of tools and practices that foster hackability. When everything is in place, it feels great to work on the product. Basically no time is spent on figuring out why things are broken, and all time is spent on what matters, which is understanding and working with the code. I believe there are three main pillars that support hackability. If one of them is absent, hackability will suffer. They are:


Pillar 1: Code Health
“I found Rome a city of bricks, and left it a city of marble.”
   -- Augustus
Keeping the code in good shape is critical for hackability. It’s a lot harder to tinker and modify something if you don’t understand what it does (or if it’s full of hidden traps, for that matter).
Tests
Unit and small integration tests are probably the best things you can do for hackability. They’re a support you can lean on while making your changes, and they contain lots of good information on what the code does. It isn’t hackability to boot a slow UI and click buttons on every iteration to verify your change worked - it is hackability to run a sub-second set of unit tests! In contrast, end-to-end (E2E) tests generally help hackability much less (and can even be a hindrance if they, or the product, are in sufficiently bad shape).

Figure 1: the Testing Pyramid.
I’ve always been interested in how you actually make unit tests happen in a team. It’s about education. Writing a product such that it has good unit tests is actually a hard problem. It requires knowledge of dependency injection, testing/mocking frameworks, language idioms and refactoring. The difficulty varies by language as well. Writing unit tests in Go or Java is quite easy and natural, whereas in C++ it can be very difficult (and it isn’t exactly ingrained in C++ culture to write unit tests).

It’s important to educate your developers about unit tests. Sometimes, it is appropriate to lead by example and help review unit tests as well. You can have a large impact on a project by establishing a pattern of unit testing early. If tons of code gets written without unit tests, it will be much harder to add unit tests later.

What if you already have tons of poorly tested legacy code? The answer is refactoring and adding tests as you go. It’s hard work, but each line you add a test for is one more line that is easier to hack on.
Readable Code and Code Review
At Google, “readability” is a special committer status that is granted per language (C++, Go, Java and so on). It means that a person not only knows the language and its culture and idioms well, but also can write clean, well tested and well structured code. Readability literally means that you’re a guardian of Google’s code base and should push back on hacky and ugly code. The use of a style guide enforces consistency, and code review (where at least one person with readability must approve) ensures the code upholds high quality. Engineers must take care to not depend too much on “review buddies” here but really make sure to pull in the person that can give the best feedback.

Requiring code reviews naturally results in small changes, as reviewers often get grumpy if you dump huge changelists in their lap (at least if reviewers are somewhat fast to respond, which they should be). This is a good thing, since small changes are less risky and are easy to roll back. Furthermore, code review is good for knowledge sharing. You can also do pair programming if your team prefers that (a pair-programmed change is considered reviewed and can be submitted when both engineers are happy). There are multiple open-source review tools out there, such as Gerrit.

Nice, clean code is great for hackability, since you don’t need to spend time to unwind that nasty pointer hack in your head before making your changes. How do you make all this happen in practice? Put together workshops on, say, the SOLID principles, unit testing, or concurrency to encourage developers to learn. Spread knowledge through code review, pair programming and mentoring (such as with the Readability concept). You can’t just mandate higher code quality; it takes a lot of work, effort and consistency.
Presubmit Testing and Lint
Consistently formatted source code aids hackability. You can scan code faster if its formatting is consistent. Automated tooling also aids hackability. It really doesn’t make sense to waste any time on formatting source code by hand. You should be using tools like gofmt, clang-format, etc. If the patch isn’t formatted properly, you should see something like this (example from Chrome):

$ git cl upload
Error: the media/audio directory requires formatting. Please run
git cl format media/audio.

Source formatting isn’t the only thing to check. In fact, you should check pretty much anything you have as a rule in your project. Should other modules not depend on the internals of your modules? Enforce it with a check. Are there already inappropriate dependencies in your project? Whitelist the existing ones for now, but at least block new bad dependencies from forming. Should our app work on Android 16 phones and newer? Add linting, so we don’t use level 17+ APIs without gating at runtime. Should your project’s VHDL code always place-and-route cleanly on a particular brand of FPGA? Invoke the layout tool in your presubmit and stop the submit if the layout process fails.
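
As a small illustration, a bare-bones presubmit formatting check for a Go project can be as simple as the sketch below (gofmt -l lists files that are not gofmt-clean); the same pattern works for clang-format and friends:

#!/bin/sh
# Minimal presubmit/pre-commit sketch: reject the change if any Go file needs gofmt.
unformatted=$(gofmt -l .)
if [ -n "$unformatted" ]; then
  echo "Please run gofmt on the following files:" >&2
  echo "$unformatted" >&2
  exit 1
fi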

Presubmit is the most valuable real estate for aiding hackability. You have limited space in your presubmit, but you can get tremendous value out of it if you put the right things there. You should stop all obvious errors here.

It aids hackability to have all this tooling so you don’t have to waste time going back and breaking things for other developers. Remember you need to maintain the presubmit well; it’s not hackability to have a slow, overbearing or buggy presubmit. Having a good presubmit can make it tremendously more pleasant to work on a project. We’re going to talk more in later articles on how to build infrastructure for submit queues and presubmit.
Single Branch And Reducing Risk
Having a single branch for everything, and putting risky new changes behind feature flags, aids hackability since branches and forks often amass tremendous risk when it’s time to merge them. Single branches smooth out the risk. Furthermore, running all your tests on many branches is expensive. However, a single branch can have negative effects on hackability if Team A depends on a library from Team B and gets broken by Team B a lot. Having some kind of stabilization on Team B’s software might be a good idea there. This article covers such situations, and how to integrate often with your dependencies to reduce the risk that one of them will break you.
Loose Coupling and Testability
Tightly coupled code is terrible for hackability. To take the most ridiculous example I know: I once heard of a computer game where a developer changed a ballistics algorithm and broke the game’s chat. That’s hilarious, but hardly intuitive for the poor developer that made the change. A hallmark of loosely coupled code is that it’s upfront about its dependencies and behavior and is easy to modify and move around.

Loose coupling, coherence and so on is really about design and architecture and is notoriously hard to measure. It really takes experience. One of the best ways to convey such experience is through code review, which we’ve already mentioned. Education on the SOLID principles, rules of thumb such as tell-don’t-ask, discussions about anti-patterns and code smells are all good here. Again, it’s hard to build tooling for this. You could write a presubmit check that forbids methods longer than 20 lines or cyclomatic complexity over 30, but that’s probably shooting yourself in the foot. Developers would consider that overbearing rather than a helpful assist.

SETIs at Google are expected to give input on a product’s testability. A few well-placed test hooks in your product can enable tremendously powerful testing, such as serving mock content for apps (this enables you to meaningfully test app UI without contacting your real servers, for instance). Testability can also have an influence on architecture. For instance, it’s a testability problem if your servers are built like a huge monolith that is slow to build and start, or if it can’t boot on localhost without calling external services. We’ll cover this in the next article.
Aggressively Reduce Technical Debt
It’s quite easy to add a lot of code and dependencies and call it a day when the software works. New projects can do this without many problems, but as the project becomes older it becomes a “legacy” project, weighed down by dependencies and excess code. Don’t end up there. It’s bad for hackability to have a slew of bug fixes stacked on top of unwise and obsolete decisions, and understanding and untangling the software becomes more difficult.

What constitutes technical debt varies by project and is something you need to learn from experience. It simply means the software isn’t in optimal form. Some types of technical debt are easy to classify, such as dead code and barely-used dependencies. Some types are harder to identify, such as when the architecture of the project has grown unfit to the task from changing requirements. We can’t use tooling to help with the latter, but we can with the former.

I already mentioned that dependency enforcement can go a long way toward keeping people honest. It helps make sure people are making the appropriate trade-offs instead of just slapping on a new dependency, and it requires them to explain to a fellow engineer when they want to override a dependency rule. This can prevent unhealthy dependencies like circular dependencies, abstract modules depending on concrete modules, or modules depending on the internals of other modules.

There are various tools available for visualizing dependency graphs as well. You can use these to get a grip on your current situation and start cleaning up dependencies. If you have a huge dependency you only use a small part of, maybe you can replace it with something simpler. If an old part of your app has inappropriate dependencies and other problems, maybe it’s time to rewrite that part.

(Continue to Pillar 2: Debuggability)
Categories: Testing & QA