Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Feed aggregator

How to write an Amazon RDS service broker for Cloud Foundry

Xebia Blog - Mon, 03/23/2015 - 10:58

Cloud Foundry is a wonderful on-premise PaaS that makes it very easy to build and deploy your stateless applications while providing them with scalability and high availability. But Cloud Foundry is really an application platform service and does not provide high availability and scalability for your data. Fortunately, there is Amazon RDS, which excels in providing this as a service.

In this blog I will show you how easy it is to build, install and use a Cloud Foundry Service Broker for Amazon RDS. The broker was developed in Node.js using the Restify framework and can be deployed as a normal Cloud Foundry application. Finally, I will point you to a skeleton service broker which you can use as the basis for your own.

Cloud Foundry Service Broker Domain

Before I race off into the details of the implementation, I would like to introduce you to the Cloud Foundry lingo. If you are already familiar with the lingo, just skip to the paragraph 'AWS RDS Service Broker operations'.

Service - an external resource that can be used by an application. It can be a database, a messaging system or an external application.  Commonly provided services are mysql, postgres, redis and memcached.

Service Plan - a plan specifies the quality of the service and governs the amount of memory, disk space, nodes etc. provided with the service.

Service Catalog - a document containing all services and service plans of a service broker.

Service Broker - a program that is capable of creating services and providing the necessary information to applications to connect to the service.

Now a service broker can provide the following operations:

Describe Services - Show me all the services this broker can provide.

Create Service - Creating an instance of a service matching a specified plan. When the service is a database, it depends on the broker what this means: It may create an entire database server, or just a new database instance, or even just a database schema.   Cloud Foundry calls this 'provisioning a service instance'.

Binding a Service - providing a specific application with the necessary information to connect to an existing service. When the service is a database, it provides the hostname, port, database name, username and password. Depending on the service broker, the broker may even create specific credentials for each bind request/application. The Cloud Controller will store the returned credentials in a JSON document stored as a UNIX environment variable (VCAP_SERVICES).

Unbind service - depending on the service broker, undo what was done on the bind.

Destroy Service - Easy, just deleting what was created. Cloud Foundry calls this 'deprovisioning a service instance'.
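
To make the bind step concrete, here is a minimal Python 3 sketch of how an application might read its credentials back out of VCAP_SERVICES. The document is keyed by service label and holds a list of bound instances, each with a credentials object. The field names inside credentials depend on the broker, so the ones below (hostname, port, name, username, password) and the sample values are assumptions for illustration.

```python
import json
import os


def find_credentials(service_label, environ=os.environ):
    """Return the credentials of the first bound instance of a service, or None."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    instances = services.get(service_label, [])
    return instances[0]["credentials"] if instances else None


# Hypothetical environment, shaped like what the Cloud Controller injects.
example_env = {
    "VCAP_SERVICES": json.dumps({
        "mysql": [{
            "name": "mysql-844b1",
            "credentials": {
                "hostname": "db.example.com",
                "port": 3306,
                "name": "cfdb",
                "username": "root",
                "password": "secret",
            },
        }]
    })
}

creds = find_credentials("mysql", example_env)
print(creds["hostname"], creds["port"])
```

Passing the environment in as a parameter keeps the lookup easy to test without touching the real process environment.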

In the next paragraph I will map these operations to Amazon AWS RDS services.

AWS RDS Service Broker operations

Any Service Broker has to implement the REST API of the Cloud Foundry specification. To create the Amazon AWS RDS service broker, I had to implement four out of five methods:

  • describe services - returns available services and service plans
  • create service - calls the createDBInstance operation and stores the generated credentials as tags on the instance.
  • bind service - calls the describeDBInstances operation and returns the stored credentials.
  • delete service - just calls the deleteDBInstance operation.

I implemented these REST calls using the Restify framework and the Amazon AWS RDS API for JavaScript. The skeleton looks like this:

var restify = require('restify');
var server = restify.createServer();

// get catalog
server.get('/v2/catalog', function(request, response, next) {
        // the body of this handler is not shown in this skeleton
});

// create service
server.put('/v2/service_instances/:id', function(request, response, next) {
        response.send(501, { 'description' : 'create/provision service not implemented' });
        next();
});

// delete service
server.del('/v2/service_instances/:id', function(req, response, next) {
        response.send(501, { 'description' : 'delete/unprovision service not implemented' });
        next();
});

// bind service
server.put('/v2/service_instances/:instance_id/service_bindings/:id', function(req, response, next) {
        response.send(501, { 'description' : 'bind service not implemented' });
        next();
});

// unbind service
server.del('/v2/service_instances/:instance_id/service_bindings/:id', function(req, response, next) {
        response.send(501, { 'description' : 'unbind service not implemented' });
        next();
});

For the actual implementation of each operation on AWS RDS, I would like to refer you to the source code of aws-rds-service-broker.js.

Design decisions

That does not look all too difficult, does it? Here are some of my design decisions:

Where do I store the credentials?

I store the credentials as tags on the instance. I wanted to create a service broker that was completely stateless so that I could deploy it in Cloud Foundry itself. I did not want to be dependent on a complete database for a little bit of information. The tags seemed to fit the purpose.
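
The idea of keeping credentials in instance tags can be sketched as a pair of pure conversion functions. This is a Python 3 illustration, not the broker's actual code: RDS represents tags as a list of Key/Value pairs, but the tag names used here (CF-AWS-USERNAME, CF-AWS-PASSWORD) are hypothetical.

```python
def credentials_to_tags(credentials):
    """Convert a credentials dict into the Key/Value list shape used for RDS tags."""
    return [{"Key": key, "Value": str(value)}
            for key, value in sorted(credentials.items())]


def tags_to_credentials(tags):
    """Rebuild the credentials dict from a list of Key/Value tags."""
    return {tag["Key"]: tag["Value"] for tag in tags}


# Hypothetical tag names and values, for illustration only.
stored = credentials_to_tags({"CF-AWS-USERNAME": "root", "CF-AWS-PASSWORD": "s3cret"})
restored = tags_to_credentials(stored)
print(restored["CF-AWS-USERNAME"])
```

Because everything needed for a bind can be recovered from the instance itself, the broker process needs no storage of its own.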

Why does bind return the same credentials for every bind?

I wanted the bind service to be as simple as possible. I did not want to generate new user accounts and passwords, because if I did, I would have even more state to maintain. Moreover, I found that if I bind two applications to the same MySQL service, they can see each other's data. So why bother creating users for binds? Finally, making the bind service simple kept the unbind service even simpler, because there is nothing to undo.

How to implement different service plans?

The createDBInstance operation of the AWS RDS API takes a JSON object as its input parameter, which is basically the equivalent of a plan. I just had to add an appropriate JSON record to the configuration file for each plan. See the description of the params parameter of the createDBInstance operation.
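
As a rough Python 3 sketch of this plan-as-parameters idea: each plan is a record of createDBInstance parameters, and provisioning merges the plan with the per-instance values. The parameter names are real RDS request fields, but the plan contents (instance classes, sizes) and the function name are assumptions chosen to match the two plans shown later in this post.

```python
import copy

# Hypothetical plan records; the keys are createDBInstance parameter names.
PLANS = {
    "default": {"Engine": "mysql", "DBInstanceClass": "db.t2.micro",
                "AllocatedStorage": 5, "MultiAZ": False},
    "10gb": {"Engine": "mysql", "DBInstanceClass": "db.m3.medium",
             "AllocatedStorage": 10, "MultiAZ": True},
}


def build_create_params(plan_name, instance_id, username, password, subnet_group):
    """Merge a plan record with the values that differ per service instance."""
    params = copy.deepcopy(PLANS[plan_name])  # never mutate the shared plan
    params.update({
        "DBInstanceIdentifier": instance_id,
        "MasterUsername": username,
        "MasterUserPassword": password,
        "DBSubnetGroupName": subnet_group,
    })
    return params


params = build_create_params("default", "cfdb-1", "root", "s3cret",
                             "stackato-db-subnet-group")
print(params["DBInstanceClass"], params["AllocatedStorage"])
```

Adding a plan then only means adding a record to the configuration; no code changes are needed.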

How do I create an AWS RDS service within 60 seconds?

Well, I don't. The service broker API states that you have to create a service within the timeout of the cloud controller (which is 60 seconds), but RDS takes a wee bit more time. So the create request is initiated within seconds, but it may take a few minutes before you can bind an application to it. Nothing I can do about that.
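
A bind request that arrives too early can be detected from the describeDBInstances result, because an instance that is still being created has no endpoint yet. The following Python 3 sketch shows that check on plain dictionaries shaped like the API response (DBInstances, DBInstanceStatus, Endpoint with Address and Port). The function and exception names are hypothetical, and this is not the broker's actual code.

```python
class NotReadyError(Exception):
    """Raised when the database instance cannot be bound yet."""


def endpoint_for_bind(describe_result):
    """Return (address, port) of the instance, or raise if it has no endpoint yet."""
    instance = describe_result["DBInstances"][0]
    endpoint = instance.get("Endpoint")
    if not endpoint:
        raise NotReadyError(
            "No endpoint set on the instance '%s'. The instance is in state '%s'."
            % (instance["DBInstanceIdentifier"], instance["DBInstanceStatus"]))
    return endpoint["Address"], endpoint["Port"]


# Hypothetical responses for an instance that is ready and one that is not.
ready = {"DBInstances": [{"DBInstanceIdentifier": "cfdb-1",
                          "DBInstanceStatus": "available",
                          "Endpoint": {"Address": "cfdb-1.example.com", "Port": 3306}}]}
creating = {"DBInstances": [{"DBInstanceIdentifier": "cfdb-1",
                             "DBInstanceStatus": "creating"}]}

print(endpoint_for_bind(ready))
```

The caller can turn NotReadyError into an error response that asks the user to retry a few minutes later.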

Why store the service broker credentials in environment variables?

I want the service broker to be configured upon deployment time. When the credentials are in the config file, you need to change the files of the application on each deployment.
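
Reading the configuration from the environment can be sketched as follows in Python 3. The variable names are the ones used in the manifest.yml of this post; the function name and the fail-fast behaviour are assumptions.

```python
import os

REQUIRED = [
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION",
    "AWS_DB_SUBNET_GROUP", "SERVICE_BROKER_USERNAME", "SERVICE_BROKER_PASSWORD",
]


def load_config(environ=os.environ):
    """Collect the broker settings, failing fast when one is missing or empty."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}


# Hypothetical values, passed in explicitly instead of using the real environment.
example = {name: "example-value" for name in REQUIRED}
config = load_config(example)
print(sorted(config))
```

Failing at startup gives a clear message at deployment time instead of an obscure error on the first request.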


In these instructions, I presume you have access to an AWS account and you have an installation of Cloud Foundry. I used Stackato, which is a Cloud Foundry implementation by ActiveState. These instructions assume you are using it too!

  1. Create an AWS IAM user
    First create an AWS IAM user (cf-aws-service-broker) with at least the following privileges.
  2. Assign privileges to execute AWS RDS operations
    The newly created IAM user needs the privileges to create RDS databases. I used the following permissions:

      "Version": "2012-10-17",
      "Statement": [
          "Effect": "Allow",
          "Action": [
          "Resource": [
          "Effect": "Allow",
          "Action": [
          "Resource": [
  3. Generate AWS access key and secret for the user 'cf-aws-service-broker'
  4. Create a Database Subnet
    Create a database subnet 'stackato-db-subnet-group' in the AWS Region where you want the databases to be created.
  5. Check out the service broker
    git clone
    cd aws-rds-service-broker
  6. Add all your parameters as environment variables to the manifest.yml
       - name: aws-rds-service-broker
         mem: 256M
         disk: 1024M
         instances: 1
           AWS_ACCESS_KEY_ID: <fillin>
           AWS_SECRET_ACCESS_KEY: <fillin>
           AWS_REGION: <of db subnet group,eg eu-west-1>
           AWS_DB_SUBNET_GROUP: stackato-db-subnet-group
           SERVICE_BROKER_USERNAME: <fillin>
           SERVICE_BROKER_PASSWORD: <fillin>
             - .git
             - bin
             - node_modules
  7. Deploy the service broker
    stackato target <your-service-broker> --skip-ssl-validation
    stackato login
  8. Install the service broker
    This script is a cunning implementation which creates the service broker in Cloud Foundry and makes all the plans publicly available. In Stackato we use the curl commands to achieve this. This script requires you to have installed jq, the wonderful JSON command line processor by Stephen Dolan.


Now you can use the service broker!

Using the Service Broker

Now we are ready to use the service broker.

  1. Deploy a sample application
    $ git clone
    $ stackato push -n 
  2. Create a service for the mysql services
    $ stackato create-service
    1. filesystem 1.0, by core
    2. mysql
    3. mysql 5.5, by core
    4. postgres
    5. postgresql 9.1, by core
    6. redis 2.8, by core
    7. user-provided
    Which kind to provision:? 2
    1. 10gb: 10Gb HA MySQL database.
    2. default: Small 5Gb non-HA MySQL database
    Please select the service plan to enact:? 2
    Creating new service [mysql-844b1] ... OK
  3. Bind the service to the application
    stackato bind-service mysql-844b1 paas-monitor
      Binding mysql-844b1 to paas-monitor ... Error 10001: Service broker error: No endpoint set on the instance 'cfdb-3529e5764'. The instance is in state 'creating'. please retry a few minutes later (500)

    Retry until the database is actually created (3-10 minutes on AWS).

    stackato bind-service mysql-844b1 paas-monitor
     Binding mysql-844b1 to paas-monitor ...
    Stopping Application [paas-monitor] ... OK
    Starting Application [paas-monitor] ...
    http://paas-monitor.<your-api-endpoint>/ deployed
  4. Check the environment of the application
    curl -s http://paas-monitor.<your-api-endpoint>/environment | jq .DATABASE_URL

    As you can see, the credentials for the newly created database have been inserted into the environment of the application.
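
An application that receives a DATABASE_URL can split it into its parts with the standard library. A minimal Python 3 sketch, assuming the URL has the common scheme://user:password@host:port/database form; the exact format produced by the platform is not shown in this post, so the sample URL is made up.

```python
from urllib.parse import urlparse


def parse_database_url(url):
    """Split a database URL into the pieces a database driver usually needs."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }


# Hypothetical URL, for illustration only.
info = parse_database_url("mysql://root:secret@db.example.com:3306/cfdb")
print(info["hostname"], info["port"], info["name"])
```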

Creating your own service broker

If you want to create your own service broker in Node.js, you may find the Skeleton Service Broker a good starting point. It includes a number of utilities to test your broker in the bin directory.

  • - calls the catalog operation
  • - calls the create operation
  • - calls the delete operation
  • - calls the bind operation on a specified instance
  • - calls the unbind operation on a specified instance and bind id.
  • - calls the list all service instances operation
  • - gets the environment variables of a CF application as sourceable output
  • - installs the application and makes all plans public.
  • - calls the stackato CURL operation, and requires jq to be installed.


As you can see, it is quite easy to create your own Cloud Foundry service broker!

By: » Business Analyst resource guide

Software Requirements Blog - Mon, 03/23/2015 - 03:11

[…] Seilevel Blog – Visit this site for relevant and timely articles on Business Analysis. The authors are business analysts who write about their work and what they’ve learnt on the job. The tips you get from this site are practical and can be applied to your projects. […]

Categories: Requirements

Python: Equivalent to flatMap for flattening an array of arrays

Mark Needham - Mon, 03/23/2015 - 01:45

I found myself wanting to flatten an array of arrays while writing some Python code earlier this afternoon and, being lazy, my first attempt involved building the flattened array manually:

episodes = [
    {"id": 1, "topics": [1,2,3]},
    {"id": 2, "topics": [4,5,6]}
]

# build one {"id", "topic"} entry per topic of each episode
flattened_episodes = []
for episode in episodes:
    for topic in episode["topics"]:
        flattened_episodes.append({"id": episode["id"], "topic": topic})

for episode in flattened_episodes:
    print episode

If we run that we’ll see this output:

$ python
{'topic': 1, 'id': 1}
{'topic': 2, 'id': 1}
{'topic': 3, 'id': 1}
{'topic': 4, 'id': 2}
{'topic': 5, 'id': 2}
{'topic': 6, 'id': 2}

What I was really looking for was the Python equivalent to the flatmap function which I learnt can be achieved in Python with a list comprehension like so:

flattened_episodes = [{"id": episode["id"], "topic": topic}
                      for episode in episodes
                      for topic in episode["topics"]]
for episode in flattened_episodes:
    print episode

We could also choose to use itertools in which case we’d have the following code:

from itertools import chain, imap

# map each episode to a list of entries, then chain the lists together
flattened_episodes = chain.from_iterable(
                        imap(lambda episode: [{"id": episode["id"], "topic": topic}
                                             for topic in episode["topics"]],
                             episodes))

for episode in flattened_episodes:
    print episode

We can then simplify this approach a little by wrapping it up in a ‘flatmap’ function:

def flatmap(f, items):
        return chain.from_iterable(imap(f, items))
flattened_episodes = flatmap(
    lambda episode: [{"id": episode["id"], "topic": topic} for topic in episode["topics"]], episodes)
for episode in flattened_episodes:
    print episode

I think the list comprehensions approach still works but I need to look into itertools more – it looks like it could work well for other list operations.
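
The listings above are written for Python 2, where imap lives in itertools. As a sketch for Python 3 (an assumption about the reader's interpreter, not something from the original post), the built-in map is already lazy, so the same flatmap helper needs only chain:

```python
from itertools import chain


def flatmap(f, items):
    """Apply f to every item and lazily concatenate the resulting iterables."""
    return chain.from_iterable(map(f, items))


episodes = [
    {"id": 1, "topics": [1, 2, 3]},
    {"id": 2, "topics": [4, 5, 6]},
]

flattened_episodes = list(flatmap(
    lambda episode: [{"id": episode["id"], "topic": topic}
                     for topic in episode["topics"]],
    episodes))

for episode in flattened_episodes:
    print(episode)
```

Wrapping the result in list() materialises the iterator, which is handy when the flattened data is needed more than once.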

Categories: Programming

SPaMCAST 334 - Mario Lucero, It’s All About Agile Coaching

Software Process and Measurement Cast - Sun, 03/22/2015 - 22:00

In this episode of the Software Process and Measurement Cast we feature our interview with Agile coach Mario Lucero.  Mario and I discussed the nuts and bolts of coaching Agile teams, what is and isn’t Agile and the impact of coaching on success. Mario provided insights on Agile that span both Americas!

Mario describes himself as an Agile evangelist (including Kanban) delivering coaching for Agile transformations and Scrum mastery. He performs as a Scrum Master for several teams while mentoring and coaching other teams, Scrum Masters and product owners.

Mario is as comfortable advising senior management on the Agile transformation strategy and implementation as he is working with teams.


Twitter: @metlucero



Call to action!

Can you tell a friend about the podcast? If your friends don’t know how to subscribe or listen to a podcast, show them how you listen and subscribe them! Remember to send us the name of the person you subscribed (and a picture) and I will give both you and the horde you have converted to listeners a call out on the show.

Re-Read Saturday News

The Re-Read Saturday focus on Eliyahu M. Goldratt and Jeff Cox’s The Goal: A Process of Ongoing Improvement began on February 21st. The Goal has been hugely influential because it introduced the Theory of Constraints, which is central to lean thinking. The book is written as a business novel. Visit the Software Process and Measurement Blog and catch up on the re-read.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast.

Dead Tree Version or Kindle Version 

I am beginning to think of which book will be next. Do you have any ideas?

Upcoming Events

CMMI Institute Conference EMEA 2015
March 26-27 London, UK
I will be presenting “Agile Risk Management.”

QAI Quest 2015
April 20-21 Atlanta, GA, USA
Scale Agile Testing Using the TMMi

DCG will also have a booth!

Next SPaMCast

The next Software Process and Measurement Cast will feature our essay on the definitions of four critical words.  What do the words effectiveness, efficiency, frameworks and methodologies really mean?  These words get used ALL the time, however they really do have fairly specific meanings.  Meanings that, once understood and used to guide how we work, can help everyone to deliver more value and make our customers more satisfied! 

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management

Capabilities Based Planning First Then Requirements

Herding Cats - Glen Alleman - Sun, 03/22/2015 - 16:23

When I hear about requirements churn, bad requirements management - which is really bad business management, emergent requirements that turn over 20% a month for a complete turnover in 4 months - it's clear there is a serious problem in understanding how to manage the development of a non-trivial project.

Let's start here. Start with: what capabilities does this project need to produce when it is done? The order of the capabilities is dependent on the business's ability to absorb each capability, and on the value stream of those capabilities in support of the business strategy.

That picture at the bottom shows a value stream of capabilities for a health insurance provider network system. The notion of INVEST in agile has to be tested for any project. Dependencies exist and are actually required for enterprise projects. See the flow of capabilities chart below. Doing work in independent order would simply not work. 

Once we have the needed capabilities, and know their dependencies, we can determine - from the business strategy - what order they need to be delivered.
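
Working out a delivery order from known dependencies is a topological sort. A minimal Python 3 sketch (graphlib requires Python 3.9 or later); the capability names are hypothetical and only loosely modelled on a health insurance provider network, not taken from the chart.

```python
from graphlib import TopologicalSorter

# Hypothetical capabilities; each maps to the set of capabilities it depends on.
dependencies = {
    "enroll providers": set(),
    "verify member eligibility": {"enroll providers"},
    "process claims": {"verify member eligibility"},
    "report on claims": {"process claims"},
}

# static_order() yields every capability after all of its dependencies.
delivery_order = list(TopologicalSorter(dependencies).static_order())

for position, capability in enumerate(delivery_order, start=1):
    print(position, capability)
```

The sort only gives an order that respects the dependencies; choosing among the valid orders is still a business strategy decision.
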

The Point

When you hear about all the problems with requirements - or anything to do with software development - stop and remember - it is trivial to point out problems. The classical example of this trivial approach is estimates are the smell of dysfunction. This approach is a Dilbert cartoon management method. It's not only lame, it's not managing projects as an adult. Adults don't whine, they provide solutions.

So here's a place to start with Requirements Management. Each of these books informs our Command Media for requirements elicitation and management for software intensive systems. As well, professional journals provide up-to-date guidance.

There are also tools for requirements management. But don't start with tools, start with a process. Analytic Hierarchy Process (AHP) is my favorite.

There is no reason to not have a credible requirements process - don't let the whiners dominate the conversation. Provide solutions to the problem.

Categories: Project Management

Quote of the Day

Herding Cats - Glen Alleman - Sun, 03/22/2015 - 15:40

Science is the great antidote to the poison of enthusiasm and superstition.
- Adam Smith, The Wealth of Nations

If you hear a conjecture or a claim that sounds like it is not what you were taught in school, doesn't seem to make sense in a common sense way, or appears to violate established principles of science, math, or business - ask for the numbers.

Categories: Project Management

Python: Simplifying the creation of a stop word list with defaultdict

Mark Needham - Sun, 03/22/2015 - 02:51

I’ve been playing around with topic models again and recently read a paper by David Mimno which suggested the following heuristic for working out which words should go onto the stop list:

A good heuristic for identifying such words is to remove those that occur in more than 5-10% of documents (most common) and those that occur fewer than 5-10 times in the entire corpus (least common).

I decided to try this out on the HIMYM dataset that I’ve been working on over the last couple of months.

I started out with the following code to build a dictionary of words, their total occurrences and the episodes they’d been used in:

import csv
from sklearn.feature_extraction.text import CountVectorizer
from collections import defaultdict

# concatenate the sentences of each episode into one document
episodes = defaultdict(str)
with open("sentences.csv", "r") as file:
    reader = csv.reader(file, delimiter = ",")
    for row in reader:
        episodes[row[1]] += row[4]

vectorizer = CountVectorizer(analyzer='word', min_df = 0, stop_words = 'english')
matrix = vectorizer.fit_transform(episodes.values())
features = vectorizer.get_feature_names()

words = {}
for doc_id, doc in enumerate(matrix.todense()):
    for word_id, score in enumerate(doc.tolist()[0]):
        word = features[word_id]
        if not words.get(word):
            words[word] = {}
        if not words[word].get("score"):
            words[word]["score"] = 0
        words[word]["score"] += score
        if not words[word].get("episodes"):
            words[word]["episodes"] = set()
        if score > 0:
            # record the document (episode) in which the word appears
            words[word]["episodes"].add(doc_id)

This works fine but the code inside the last for block is ugly and most of it is handling the case when parts of a dictionary aren’t yet initialised which is defaultdict territory. You’ll notice I am using defaultdict in the first part of the code but not yet the second as I’d struggled to get it working.

This was my first attempt to make the ‘words’ variable based on it:

>>> words = defaultdict({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: first argument must be callable

We can see why this doesn’t work if we try to evaluate ‘{}’ as a function which is what defaultdict does internally:

>>> {}()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict' object is not callable

Instead what we need is to pass in ‘dict':

>>> dict()
{}
>>> words = defaultdict(dict)
>>> words
defaultdict(<type 'dict'>, {})

That simplifies the first bit of the loop:

words = defaultdict(dict)
for doc_id, doc in enumerate(matrix.todense()):
    for word_id, score in enumerate(doc.tolist()[0]):
        word = features[word_id]
        if not words[word].get("score"):
            words[word]["score"] = 0
        words[word]["score"] += score
        if not words[word].get("episodes"):
            words[word]["episodes"] = set()
        if score > 0:
            # record the document (episode) in which the word appears
            words[word]["episodes"].add(doc_id)

We’ve still got a couple of other places to simplify though which we can do by defining a custom function and passing that into defaultdict:

def default_dict_function():
   return {"score": 0, "episodes": set()}
>>> words = defaultdict(default_dict_function)
>>> words
defaultdict(<function default_dict_function at 0x10963fcf8>, {})
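
One detail worth spelling out is why the factory has to be a function: defaultdict calls it once for every missing key, so each word gets its own fresh dictionary and its own set. A small Python 3 sketch (the sample words and the name default_entry are made up):

```python
from collections import defaultdict


def default_entry():
    """Build a fresh record for a word that has not been seen before."""
    return {"score": 0, "episodes": set()}


words = defaultdict(default_entry)
words["ted"]["episodes"].add(1)
words["ted"]["score"] += 2
words["barney"]["score"] += 3

# "barney" has its own empty set; nothing leaked over from "ted".
print(words["ted"], words["barney"])
```

Passing a single pre-built dictionary instead would fail with the TypeError shown earlier, because the first argument has to be callable.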

And here’s the final product:

def default_dict_function():
   return {"score": 0, "episodes": set()}

words = defaultdict(default_dict_function)
for doc_id, doc in enumerate(matrix.todense()):
    for word_id, score in enumerate(doc.tolist()[0]):
        word = features[word_id]
        words[word]["score"] += score
        if score > 0:
            # record the document (episode) in which the word appears
            words[word]["episodes"].add(doc_id)

After this we can write out the words to our stop list:

with open("stop_words.txt", "w") as file:
    writer = csv.writer(file, delimiter = ",")
    for word, value in words.iteritems():
        # the write calls are missing from this listing; writerow([word]) is assumed
        # appears in > 10% of episodes
        if len(value["episodes"]) > int(len(episodes) / 10):
            writer.writerow([word])
        # less than 10 occurrences
        if value["score"] < 10:
            writer.writerow([word])
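
The heuristic itself can also be sketched without scikit-learn. The following Python 3 version works on documents that are already tokenised into lists of words; the function name, the default thresholds (10% of documents, 10 occurrences) and the toy corpus are assumptions for illustration.

```python
from collections import Counter


def stop_words(documents, max_doc_fraction=0.10, min_total_count=10):
    """Return words that are in too many documents or too rare in the corpus."""
    doc_freq = Counter()   # number of documents containing each word
    total = Counter()      # number of occurrences in the whole corpus
    for tokens in documents:
        total.update(tokens)
        doc_freq.update(set(tokens))
    limit = max_doc_fraction * len(documents)
    too_common = {word for word, count in doc_freq.items() if count > limit}
    too_rare = {word for word, count in total.items() if count < min_total_count}
    return too_common | too_rare


# Toy corpus: "the" is in every document, "rare" occurs once, "mid" ten times in one.
documents = [["the"] for _ in range(20)]
documents[0] += ["mid"] * 10 + ["rare"]

print(sorted(stop_words(documents)))
```

Returning a set means a word that is both too common and too rare is reported only once.
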
Categories: Programming

Python: Forgetting to use enumerate

Mark Needham - Sun, 03/22/2015 - 02:28

Earlier this evening I found myself writing the equivalent of the following Python code while building a stop list for a topic model…

words = ["mark", "neo4j", "michael"]
word_position = 0
for word in words:
   print word_position, word
   word_position +=1

…which is very foolish given that there’s already a function that makes it really easy to grab the position of an item in a list:

for word_position, word in enumerate(words):
   print word_position, word

Python does make things extremely easy at times – you’re welcome future Mark!
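
enumerate also accepts a start argument, which avoids adding 1 by hand when positions should be counted from one. A short sketch in Python 3 syntax (the listings above use the Python 2 print statement):

```python
words = ["mark", "neo4j", "michael"]

# start=1 makes the counter begin at 1 instead of 0
numbered = list(enumerate(words, start=1))

for word_position, word in numbered:
    print(word_position, word)
```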

Categories: Programming

Risk Management is How Adults Manage Projects

Herding Cats - Glen Alleman - Sat, 03/21/2015 - 23:50

The quote in the title is from Tim Lister. It says volumes about project management and project failure. It also means that managing risk is managing in the presence of uncertainty. And managing in the presence of uncertainty means making estimates about the impacts of our decision on future outcomes. So you can invert the statement when you hear we can make decisions in the absence of estimates.

For those interested in managing projects in the presence of uncertainty and the risk that uncertainty creates, here's a collection from the office library, in no particular order

Categories: Project Management

The Microeconomics of Decision Making in the Presence of Uncertainty - Re-Deux

Herding Cats - Glen Alleman - Sat, 03/21/2015 - 22:43

Microeconomics is a branch of economics that studies the behavior of individuals and small impacting organizations in making decisions on the allocation of limited resources.

All engineering is constrained optimization. How do we take the resources we've been given and deliver the best outcomes? That's what microeconomics is. Unlike models of mechanical engineering or classical physics, the models of microeconomics are never precise. They are probabilistic, driven by the underlying statistical processes of the two primary actors - suppliers and consumers.

Let's look at both in light of the allocation of limited resources paradigm.

  • Supplier = development resources - these are limited in both time and capacity for work, and just as likely in talent. They also produce latent defects, which cost time and money to remove.
  • Consumer = those paying for the development resources have limited time and money. Limited money is obvious, they have a budget. Limited time, since the time value of money is part of the Return on Capital equation used by the business. Committing capital (not real capital, software development is usually carried on the books as an expense) needs a time when that capital investment will start to return value.

In both cases time, money, and capacity for productive value are limited (scarce) and compete with each other and compete with the needs of both the supplier and the consumer. In addition, since the elasticity of labor costs is limited by the market, we can't simply buy cheaper to make up for time and capacity. It's done of course, but always to the detriment of quality and actual productivity.

So cost is inelastic, time is inelastic, capacity for work is inelastic, and other attributes of the developed product are constrained. The market need is likely constrained as well. Business needs are rarely elastic - oh we really didn't need to pay people in the time keeping system, let's just collect the time sheets, we'll run payroll when that feature gets implemented.

Enough Knowing, Let's Have Some Doing

With the principles of Microeconomics applied to software development, there is one KILLER issue, that if willfully ignored ends the conversation for any business person trying to operate in the presence of limited resources - time, money, capacity for work.

The decisions being made about these limited resources are being made in the presence of uncertainty. This uncertainty - as mentioned - is based on random processes. Random processes produce imprecise data. Data drawn from random variables. Random variables with variances, instability (stochastic processes), non-linear stochastic processes.

Quick Diversion Into Random Variables

There are many mathematical definitions of random variables, but for this post let's use a simple one.

  • A variable is an attribute of a system or project that can take on multiple values. If the value of this variable is fixed, for example when someone asks what the number of people on the project is, it can be known by counting them and writing that down. When someone asks, you could count and say 16.
  • When the values of the variable are random, the variable can take on a range of values just like the non-random variable, but we don't know exactly what those values will be when we want to use that variable to answer a question. If the variable is a random variable and someone asks what the cost of this project will be when it is done, you'll have to provide a range of values and the confidence for each of the numbers in the range.

A simple example - silly but illustrative - would be HR wanting to buy special shoes for the development team, with the company logo on them. If we could not for some reason (doesn't matter why) measure the shoe size of all the males on our project, we could estimate how many shoes of what size would be needed from the statistical distribution of male shoe sizes for a large population of male coders.


This would get us close to how many shoes of what size we need to order. This is a notional example, so please don't place an order for actual shoes. But the underlying probability distribution of the values the random variable can take on can tell us about the people working on the project.
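
The shoe estimate can be sketched in a few lines of Python 3. The size distribution below is made up for illustration; a real estimate would use measured population data.

```python
def expected_order(team_size, size_distribution):
    """Estimate how many pairs of each size to order from a size distribution."""
    return {size: round(team_size * probability)
            for size, probability in size_distribution.items()}


# Made-up probabilities of each shoe size in the wider population.
distribution = {9: 0.2, 10: 0.4, 11: 0.3, 12: 0.1}

order = expected_order(20, distribution)
print(order)
```

The estimate says nothing certain about any one person's size, only about what a team of that size is likely to need in total.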

Since all the variables on any project are random variables, we can't know the exact value of them at any one time. But we can know about their possible ranges and the probabilities of any specific value when asked to produce that value for making a decision. 
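
As a minimal sketch of what "a range of values and the confidence for each" can look like, the following reports the empirical probability that a cost comes in at or under a given figure. The past cost figures and the function name `confidence_of_value` are assumptions made up for illustration; they are not from any real project.

```python
def confidence_of_value(samples, value):
    """Fraction of observed samples that are <= value (the empirical CDF)."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s <= value) / len(samples)


# Hypothetical costs (in $K) of ten similar past projects.
past_costs = [410, 450, 480, 500, 520, 560, 600, 640, 700, 820]

# For each candidate budget, print the share of past projects
# that finished at or under that budget.
for budget in (500, 600, 700):
    print(budget, confidence_of_value(past_costs, budget))
```

With only ten samples the confidence figures are coarse, which is exactly why the number of samples matters when these values are used to make a decision.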

The variability of the population values and its analysis should be seen not as a way of making precise predictions about the project outcomes, but as a way of ensuring that all relevant outcomes produced by these variables have been considered, that they have been evaluated appropriately, and that we have a reasonable sense of what will happen for the multitude of values produced by a specific variable. It provides a way of structuring our thinking about the problem.

Making Decisions In The Presence of Random Variables

To make a decision - a choice among several choices - means making an opportunity cost decision based on random data. And if there is only one choice, then the choice is either take the choice or don't.

This means the factors that go into that decision are themselves random variables. Labor, productivity, defects, capacity, quality, usability, functionality, produced business capability, time. Each is a random variable, interacting in nonlinear ways with the other random variables.

To make a choice in the presence of this paradigm we must make estimates of not only the behaviors of the variables, but also the behaviors of the outcomes.

In other words

To develop software in the presence of limited resources driven by uncertain processes for each resource (time, money, capacity, technical outcomes), we must ESTIMATE the behaviors of these variables that inform our decision.

It's that simple and it's that complex. Anyone conjecturing decisions can be made in the absence of estimates of the future outcomes of that decision is willfully ignoring the Microeconomics of business decision making in the software development domain.

For those interested in further exploring of the core principle of Software Development business beyond this willful ignorance, here's a starting point.

These are the tip of the big pile of books, papers, journal articles on estimating software systems. 

A Final Thought on Empirical Data

Making choices in the presence of uncertainty can be informed by several means:

  • We have data from the past
  • We have a model of the system that can be simulated
  • We have reference classes from which we can extract similar information

This is empirical data. But there are several critically important questions that must be answered if we are not going to be disappointed with our empirical data outcomes

  • Is the past representative of the future?
  • Is the sample of data from the past sufficient to make sound forecasts of the future? The number of samples needed greatly influences the confidence intervals on the estimates of the future.

Calculating the number of samples needed for a specific level of confidence requires some statistics. But here's a place to start. Suffice it to say, those conjecturing estimates based on past performance (number of story points in the past) will need to produce the confidence calculation before any non-trivial decisions are made on their data. Without those calculations, the use of past performance can be very sporty when spending other people's money.
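
One common textbook form of that calculation, sketched here under stated assumptions: the samples are independent, roughly normally distributed, and the standard deviation is known or well estimated. Under those assumptions the number of samples needed to estimate a mean within a margin of error E is n = (z * sigma / E)^2. The function name and the story point figures below are hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed(std_dev, margin, confidence=0.95):
    """Samples needed so the mean is within +/- margin at the given confidence.

    Assumes independent, roughly normal samples with a known standard deviation.
    """
    if std_dev <= 0 or margin <= 0 or not 0 < confidence < 1:
        raise ValueError("std_dev and margin must be > 0, confidence in (0, 1)")
    # Two-sided z value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * std_dev / margin) ** 2)


# Hypothetical team: velocity varies with a standard deviation of 8 story
# points per sprint, and we want the mean velocity to within +/- 3 points.
print(samples_needed(std_dev=8, margin=3, confidence=0.95))
print(samples_needed(std_dev=8, margin=3, confidence=0.80))
```

This says nothing about whether the past is representative of the future; it only addresses whether there are enough samples for the stated confidence.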

Thanks to Richard Askew for suggesting the addition of the random variable background

Categories: Project Management

Re-Read Saturday: The Goal: A Process of Ongoing Improvement. Part 5


The Goal: A Process of Ongoing Improvement, published in 1984, is a business novel. The Goal uses the story of Alex Rogo, plant manager, to illustrate the theory of constraints and how the wrong measurement focus can harm an organization. The focus of the re-read is less on the story, but rather on the ideas the book explores that have shaped lean thinking. Earlier entries in this re-read are:

Part 1                         Part 2                         Part 3                         Part 4

In the next 4 chapters Alex stumbles, nearly literally, on how the concepts of dependent events and variability affect the flow of work.

Chapter 13

Chapter 13 begins with Alex awaking to his son, Dave, in his Boy Scout uniform waiting to go on a weekend hike and camping trip. Alex ends up as the leader due to the absence of the normal scout master. The column of scouts sets out with an adult, Ron, leading the way. Alex asks Ron to set a pace that is consistent and maintainable. The scouts create a queue based on the arcane rationale of young boys, and Alex anchors the column to ensure the troop stays together. The column spreads out immediately even though everyone is moving at the same “average” speed. The interaction amongst the hikers is a series of dependent events. Scouts speed up and slow down, impacting those behind them. The act of speeding up and slowing down is the statistical variation described in Chapter 12. The speed of any individual hiker is influenced by the person directly ahead of them in the line. Finally Alex realizes that the speed of the overall hike is less a reflection of the first person in line than of the last person in line. In the software-testing world, testing is not complete when the first test is done, but rather when the last test is completed.

Side Note: Anyone who wants to understand why every effort should be made to remove or reduce dependencies needs to read this set of chapters carefully. Dependencies make any process A LOT more complicated.

Chapter 14

Rogo considers how to reduce the statistical variation in the column. While stopping for lunch, Alex recruits a few of the scouts to play a game using match sticks, bowls and a die (they are boy scouts . . . ready for anything, including dice games). The game is played by moving match sticks between bowls. The number of match sticks moved in each step is based on the roll of the die. As Alex gathers statistics by repeating the game, the combination of dependent events (movement of the match sticks from one bowl to another) and statistical variation (die roll) shows him how buildups of inventory occur between steps. The flow of work becomes irregular and unmanageable.
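
The game is easy to simulate. The sketch below is one possible reading of the rules, not a transcription of the book: the number of stations, the number of rounds, the unlimited supply at the first station and the function name are all assumptions for illustration.

```python
import random


def play_match_game(stations=5, rounds=10, seed=0):
    """Simulate the match stick game: each station moves up to a die roll
    of matches per round, limited by what is waiting in its bowl."""
    rng = random.Random(seed)
    bowls = [0] * stations  # bowls[i] holds matches waiting in front of station i
    fed_in = 0              # matches released into the line by the first station
    finished = 0            # matches that made it past the last station
    for _ in range(rounds):
        for i in range(stations):
            roll = rng.randint(1, 6)
            if i == 0:
                moved = roll  # first station draws from an unlimited supply
                fed_in += moved
            else:
                moved = min(roll, bowls[i])  # cannot move more than is waiting
                bowls[i] -= moved
            if i + 1 < stations:
                bowls[i + 1] += moved
            else:
                finished += moved
    return fed_in, finished, bowls


fed_in, finished, bowls = play_match_game()
print("fed in:", fed_in, "finished:", finished, "stuck between steps:", bowls)
```

Every station has the same average capacity (3.5 per roll), yet a downstream station can never move more than its upstream neighbor has delivered, so low rolls upstream are passed along while high rolls downstream are wasted. Running the game with different seeds shows the inventory piling up in different bowls each time.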

Side Note: This is a great game to play with software teams to drive home the point of the impact of variability.

Chapter 15

As the troop starts out from lunch, Alex considers the concept of reserves as a mechanism to fix the flow problem (spreading out) the troop is having. He watches the slowest kid in the troop fall behind and then sprint to catch up over and over, generating large gaps in the line. The troop is utilizing all of its energy to stay together, meaning that it has no spare capacity to recover when gaps appear. Consider software teams that generate plans with 100% utilization. As any developer or tester knows, nothing goes exactly as planned, and as soon as a problem is encountered you are immediately behind if you are 100% utilized.

Another option he considers is to have everyone hike as fast as they can individually. In this scenario everyone would optimize their individual performance. The outcome would be chaos with scouts strung out all over the trail. With the troop spread out on the trail it would be impossible to know when the last person would get to the camp for the night. Remember the hike is only complete when the last person gets to camp, therefore chaos does not promote predictability. In Alex’s plant, even though they are using robots and each step is running at high levels of efficiency, orders are not completing on time. Similar problems can be seen in many software projects with developers and testers individually running at 100% capacity and high levels of efficiency while functionality is delivered well after it was promised.

When Rogo realizes that the process is only as fast as the slowest person, he decides to re-adjust the line of scouts so that the slowest is in front. The gaps immediately disappear. With the process now under control he can shift to helping the slowest person speed up, therefore improving the whole process.

Chapter 16 moves the novel's plot forward, with Julie, Alex’s wife, dumping Alex’s daughter at Alex’s mother’s house and leaving Alex.

Chapters 13 – 15 drive home the point that dependent events and statistical variation impact the performance of the overall system. In order for the overall process to be more effective you have to understand the capability and capacity of each step and then take a systems view. These chapters establish the concepts of bottlenecks and constraints without directly naming them, and show that focusing on local optimums causes more trouble than benefit.

Summary of The Goal so far:

Chapters 1 through 3 actively present the reader with a burning platform. The plant and division are failing. Alex Rogo has actively pursued increased efficiency and automation to generate cost reductions, however performance is falling even further behind and fear has become a central feature in the corporate culture.

Chapters 4 through 6 shift the focus from steps in the process to the process as a whole. Chapters 4 – 6 move us down the path of identifying the ultimate goal of the organization (in this book). The goal is making money and embracing the big picture of systems thinking. In this section, the authors point out that we are often caught up with pursuing interim goals, such as quality, efficiency or even employment, to the exclusion of the ultimate goal. We are reminded by the burning platform identified in the first few pages of the book, the impending closure of the plant and perhaps the division, that in the long run an organization must make progress towards its ultimate goal, or it won’t exist.

Chapters 7 through 9 show Alex’s commitment to change: he seeks more precise advice from Jonah, brings his closest reports into the discussion and begins a dialog with his wife (remember this is a novel). In this section of the book the concept “that you get what you measure” is addressed. We see measures of efficiency being used at the level of part production, but not at the level of whole orders or even sales. We discover the corollary to the adage ‘you get what you measure’ is that if you measure the wrong thing … you get the wrong thing. We begin to see Alex’s urgency and commitment to make a change.

Chapters 10 through 12 mark a turning point in the book. Alex has embraced a more systems view of the plant and recognizes that the measures that have been used to date are more focused on optimizing parts of the process to the detriment of the overall goal of the plant. What has not fallen into place is how to take that new knowledge and change how the plant works. The introduction of the concepts of dependent events and statistical variation begins to shift the conceptual understanding from what to measure towards how the management team can actually use that information.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version or Kindle Version

Categories: Process Management

Feynman Lectures Now Online

Herding Cats - Glen Alleman - Sat, 03/21/2015 - 15:43

The Feynman Lectures were a staple of my education, including having Feynman come to UC Irvine and speak to the Student Physics Society on his current work in Quantum Electrodynamics (QED).

The 3 volume set is still in our library. Mine are hardbound; there are paperbacks available now.

The books are not actually very good text books. The Lectures are just that, transcriptions of lectures. When reading them, you can hear Feynman talk, in the way several other authors write in the way they talk.

The point of this post is the Lectures are now available electronically at The Feynman Lectures on Physics.

Anyone interested in physics, or who has ever heard of Richard Feynman, should take a look.

My memory - after many decades - is Feynman loved students in ways not all physics professors do. He was a professional teacher as well as a physicist. His Nobel prize never got in the way of his love of students. One of our own Nobel Laureates, Fred Reines  had a similar view of students - both undergrad and grad students. Dr. Reines would invite us to his house for BBQ and entertain us with stories of Los Alamos and other adventures.

Take a look, see how a true teacher writes about a topic he loves.

Categories: Project Management

Reminder to migrate to OAuth 2.0 or OpenID Connect

Google Code Blog - Fri, 03/20/2015 - 22:12

Posted by William Denniss, Product Manager, Identity and Authentication

Over the past few years, we’ve publicized that ClientLogin, OAuth 1.0 (3LO)1, AuthSub, and OpenID 2.0 were deprecated and would shut down on April 20, 2015. We’re moving away from these older protocols in order to focus support on the latest Internet standards, OAuth 2.0 and OpenID Connect, which increase security and reduce complexity for developers.

The easiest way to migrate to these new standards is to use the Google Sign-in SDKs (see the migration documentation). Google Sign-in is built on top of our OAuth 2.0 and OpenID Connect infrastructure and provides a single interface for authentication and authorization flows on Web, Android and iOS.

If the migration for applications using these deprecated protocols is not completed before the deadline, the application will experience an outage in its ability to connect with Google (possibly including the ability to sign in) until the migration to a supported protocol occurs. To avoid any interruptions in service, it is critical that you work to migrate prior to the shutdown date.

If you need to migrate your integration with Google:

If you have any technical questions about migrating your application, please post questions to Stack Overflow under the tag google-oauth or google-openid.

1 3LO stands for 3-legged OAuth: There's an end-user that provides consent. In contrast, 2-legged (2LO) corresponds to Enterprise authorization scenarios: organization-wide policies control access. Both OAuth1 3LO and 2LO flows are deprecated.

Categories: Programming

Android UI Automated Testing

Google Testing Blog - Fri, 03/20/2015 - 21:51
by Mona El Mahdy


This post reviews four strategies for Android UI testing with the goal of creating UI tests that are fast, reliable, and easy to debug.

Before we begin, let’s not forget an important rule: whatever can be unit tested should be unit tested. Robolectric and Gradle unit test support are great examples of unit test frameworks for Android. UI tests, on the other hand, are used to verify that your application returns the correct UI output in response to a sequence of user actions on a device. Espresso is a great framework for running UI actions and verifications in the same process. For more details on the Espresso and UI Automator tools, please see: test support libraries.

The Google+ team has performed many iterations of UI testing. Below we discuss the lessons learned during each strategy of UI testing. Stay tuned for more posts with more details and code samples.

Strategy 1: Using an End-To-End Test as a UI Test

Let’s start with some definitions. A UI test ensures that your application returns the correct UI output in response to a sequence of user actions on a device. An end-to-end (E2E) test brings up the full system of your app including all backend servers and client app. E2E tests will guarantee that data is sent to the client app and that the entire system functions correctly.

Usually, in order to make the application UI functional, you need data from backend servers, so UI tests need to simulate the data but not necessarily the backend servers. In many cases UI tests are confused with E2E tests because E2E is very similar to manual test scenarios. However, debugging and stabilizing E2E tests is very difficult due to many variables like network flakiness, authentication against real servers, size of your system, etc.

When you use UI tests as E2E tests, you face the following problems:
  • Very large and slow tests. 
  • High flakiness rate due to timeouts and memory issues. 
  • Hard to debug/investigate failures. 
  • Authentication issues (ex: authentication from automated tests is very tricky).

Let’s see how these problems can be fixed using the following strategies.

Strategy 2: Hermetic UI Testing using Fake Servers

In this strategy, you avoid network calls and external dependencies, but you need to provide your application with data that drives the UI. Update your application to communicate with a local server rather than an external one, and create a fake local server that provides data to your application. You then need a mechanism to generate the data needed by your application. This can be done using various approaches depending on your system design. One approach is to record server responses and replay them in your fake server.
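
The record-and-replay idea can be sketched in a few lines. This is a language-neutral illustration in Python rather than Android code, and it is not the Google+ team's implementation: the recorded `/profile` response, the handler and the helper names are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical responses recorded earlier from a real backend, keyed by path.
RECORDED = {
    "/profile": {"name": "Test User", "posts": 3},
}


class ReplayHandler(BaseHTTPRequestHandler):
    """Replays a recorded response for each known path, 404 otherwise."""

    def do_GET(self):
        body = RECORDED.get(self.path)
        status = 200 if body is not None else 404
        if body is None:
            body = {"error": "not recorded"}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_fake_server():
    """Start the fake server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ReplayHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# An opener with no proxies, so requests always stay on the local machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_json(base_url, path):
    """Stand-in for the app's networking code, pointed at the fake server."""
    with _opener.open(base_url + path, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A test starts the server, points the client at `http://127.0.0.1:<port>`, exercises the UI, and shuts the server down afterwards. The test and the fake server are still two moving parts that have to be kept in sync.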

Once you have hermetic UI tests talking to a local fake server, you should also have server hermetic tests. This way you split your E2E test into a server side test, a client side test, and an integration test to verify that the server and client are in sync (for more details on integration tests, see the backend testing section of blog).

Now, the client test flow looks like:

While this approach drastically reduces the test size and flakiness rate, you still need to maintain a separate fake server as well as your test. Debugging is still not easy as you have two moving parts: the test and the local server. While test stability will be largely improved by this approach, the local server will cause some flakes.

Let’s see how this could be improved...

Strategy 3: Dependency Injection Design for Apps

To remove the additional dependency of a fake server running on Android, you should use dependency injection in your application for swapping real module implementations with fake ones. One example is Dagger, or you can create your own dependency injection mechanism if needed.

This will improve the testability of your app for both unit testing and UI testing, providing your tests with the ability to mock dependencies. In instrumentation testing, the test apk and the app under test are loaded in the same process, so the test code has runtime access to the app code. Not only that, but you can also use classpath override (the fact that test classpath takes priority over app under test) to override a certain class and inject test fakes there. For example, to make your test hermetic, your app should support injection of the networking implementation. During testing, the test injects a fake networking implementation to your app, and this fake implementation will provide seeded data instead of communicating with backend servers.
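
The shape of that injection is the same in any language. The sketch below uses plain constructor injection in Python, not Dagger or Android code, and every class name in it is hypothetical.

```python
class RealNetwork:
    """Production implementation; would talk to backend servers."""

    def fetch_posts(self):
        raise RuntimeError("would call backend servers; not available in tests")


class FakeNetwork:
    """Test implementation that returns seeded data instead of calling out."""

    def __init__(self, seeded_posts):
        self._posts = list(seeded_posts)

    def fetch_posts(self):
        return list(self._posts)


class StreamScreen:
    """A piece of UI that receives its networking dependency from outside."""

    def __init__(self, network):
        self._network = network

    def render(self):
        posts = self._network.fetch_posts()
        if not posts:
            return "No posts yet"
        return "\n".join(f"* {post}" for post in posts)


# In a test, inject the fake so the screen renders seeded data hermetically.
screen = StreamScreen(FakeNetwork(["Hello", "World"]))
print(screen.render())
```

Because `StreamScreen` never constructs its own networking object, the test controls exactly what data the UI sees, and no local server is needed.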

Strategy 4: Building Apps into Smaller Libraries

If you want to scale your app into many modules and views, and plan to add more features while maintaining stable and fast builds/tests, then you should build your app into small components/libraries. Each library should have its own UI resources and user dependency management. This strategy not only enables mocking dependencies of your libraries for hermetic testing, but also serves as an experimentation platform for various components of your application.

Once you have small components with dependency injection support, you can build a test app for each component.

The test apps bring up the actual UI of your libraries, fake data needed, and mock dependencies. Espresso tests will run against these test apps. This enables testing of smaller libraries in isolation.

For example, let’s consider building smaller libraries for login and settings of your app.

The settings component test now looks like:


UI testing can be very challenging for rich apps on Android. Here are some UI testing lessons learned on the Google+ team:
  1. Don’t write E2E tests instead of UI tests. Instead write unit tests and integration tests beside the UI tests.
  2. Hermetic tests are the way to go.
  3. Use dependency injection while designing your app.
  4. Build your application into small libraries/modules, and test each one in isolation. You can then have a few integration tests to verify integration between components is correct.
  5. Componentized UI tests have proven to be much faster than E2E and 99%+ stable. Fast and stable tests have proven to drastically improve developer productivity.

Categories: Testing & QA

Five Estimating Pathologies and Their Corrective Actions

Herding Cats - Glen Alleman - Fri, 03/20/2015 - 19:03

Jim Benson has a thought-provoking post on the Five Pathologies of estimating. Each is likely present in an organization that has not moved up the maturity scale, that is, the maturity levels in the CMMI paradigm. The post, like many notions in the agile world, starting with the Manifesto, assumes the software development process is broken - ML 1. With that assumption there is little to direct the effort towards identifying the root causes and taking corrective actions, as found at higher maturity levels.

CMMI Maturity Levels

Let's See What Corrective Actions Will Remove the Symptoms produced by a Root Cause

Each item in the post could be a root cause or just a symptom of the root cause. A full Root Cause Analysis would be needed for a specific domain, but I'll make suggestions below that can be broadly applied to each. These responses are extracted from Steve Loving's "Mitigation's for the Planning Fallacy" given at the PMI Silicon Valley Symposium.

  • Guarantism – The belief an estimate is actually correct. The commitment to an estimate as a fact, that is, we guarantee the price of the work will be $X.XX
    • First let's establish a fundamental principle of all estimating processes. No point estimate is correct in the absence of a confidence level.
    • All numbers on projects are random numbers. Only the check out price at the grocery store is a fixed number.
    • Is that $600 plumbing quote an 80% confidence number or a 30% confidence number?
    • When the receiver of the quote doesn't ask for the confidence level, and further ask for the uncertainties in that quote - the reducible and irreducible uncertainties - then the provider of the quote is off the hook. And the receiver of the quote has now locked that number in her mind.
    • This is the anchoring and adjustment problem
    • CORRECTIVE ACTION - no quote accepted without a statistical confidence based on past performance or some parametric assessment
  • Promisoriality – The belief that estimates are possible. This is labeled the Planning fallacy
    • Use of the Planning Fallacy without also reading the solutions to the Planning Fallacy is a Root Cause.
    • Bent Flyvbjerg has much to say on the Planning Fallacy and the corrective actions.
    • Reference Class Forecasting, Parametric Modeling, simple wide band Delphi and many other estimating techniques can address the Planning Fallacy
    • CORRECTIVE ACTION - Citing the Planning Fallacy and NOT reading the solutions is a low maturity behaviour. Read the solutions and apply them.
  • Swami-itis – The belief that an estimate is a basis for sound decision making. All projects are probabilistic processes driven by the underlying statistical network of interconnected activities. Failing to recognize this and model the system is the root cause of naive and ill-informed estimates.
    • Our domain - software intensive systems - is driven by Systems Engineering and Monte Carlo Simulation of the System of Systems.
    • Tools are available - some simple some complex. But the paradigm is systems engineering drives our thought processes.
  • Craftosis – The assumption that estimates can be done better. We believe we will get better at our estimates. That estimation is a skill. This may be true, our estimating skills can be honed. However, the planning fallacy and the realities of how we work put a cap on the accuracy we are able to attain.
    • We're back to the misuse of the Planning Fallacy.
    • So let's revisit the planning fallacy foundations so we don't misuse it again
      • The context of planning provides many examples in which the distribution of outcomes in past experience is ignored. Scientists and writers, for example, are notoriously prone to underestimate the time required to complete a project, even when they have considerable experience of past failures to live up to planned schedules. - - Kahneman and Tversky (1979)
      • Kahneman and Tversky tell us, as does Flyvbjerg - don't do stupid things on purpose. Don't underestimate when you have direct experience from the past that those underestimates were wrong.

An Interlude to Address the Planning Fallacy

The Planning Fallacy is a set of cognitive biases present across all levels of expertise and all subject matters

  • Planners/PMs/Project teams are exhibiting the Planning Fallacy when they
    • Use internal, idealized scenarios about the future.
    • Ignore past information
    • Fall prey to false optimism.
    • Engage in the estimating processes with unacknowledged high motivational forces.
  • The Planning Fallacy has two components
    • Anchors that influence the Planning process – data introduced early in planning, even spurious data, that then wrongly influence estimates
    • Related to the Planning Fallacy, found in novices and experts alike, are limits in cognition when probabilistic thinking is involved – this becomes an issue when project teams must deal with the likelihood of risk events. This is seen in the project (Identifying and Managing Project Risk: Essential Tools for Failure-Proofing Your Project, Second Edition, by Tom Kendrick, 2009).
  • The Planning Fallacy has several dimensions
    • Plans
    • Past
    • Optimism
    • Motivation
    • Anchor
    • Models
  • Solutions to the Planning Fallacy are straightforward
    • Premortem - What were all the possible pitfalls in this imaginary project failure?
    • Reference Classes - The Reference Class is a database of projects. A project team can compare their project to projects in the database. There are several external databases for most IT projects.
    • Most Likely Development - Look at parts of the project that have the highest probability of cost overrun, schedule overrun, or negative environmental impacts.
    • Bayesian analysis - adjust judgment of the probability of success of the planning efforts.
    • Explicit model - simulation models, including Monte Carlo and Method of Moments.
    • Structured techniques - processes that rely on data and detail, repeated multiple times throughout estimation phases
    • Decomposition - using the Work Breakdown Structure to force more details, and illuminate faulty estimating models and resulting biases
    • Scrum - Stories and tasks are ranked and scored by the Scrum team using story points.
    • Theory of Constraints - use real time mitigation for projects via buffer management.
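
As a minimal sketch of the "explicit model" approach, the following Monte Carlo simulation models a small network of interconnected activities. The four activities, their three-point estimates (in days), the triangular distributions and the function names are all assumptions chosen for illustration; a real model would draw its ranges from reference class data or past performance.

```python
import random


def simulate_completion(n_trials=10000, seed=42):
    """Simulate total duration of a tiny activity network, sorted ascending.

    Network: design -> (backend in parallel with frontend) -> testing.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode).
        design = rng.triangular(5, 15, 8)
        backend = rng.triangular(10, 30, 15)
        frontend = rng.triangular(8, 25, 12)
        testing = rng.triangular(4, 12, 6)
        # Parallel branches finish when the slower branch finishes.
        totals.append(design + max(backend, frontend) + testing)
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Value at or below which a fraction p of the sorted values fall."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]


totals = simulate_completion()
for p in (0.5, 0.8, 0.95):
    print(f"{int(p * 100)}% confidence: done within {percentile(totals, p):.1f} days")
```

Adding up the most likely values along the longest path gives 8 + 15 + 6 = 29 days. Because the distributions are skewed to the right and the parallel branches merge, the simulated durations tend to sit well above that figure, which is the optimism the Planning Fallacy describes.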

OK, Back to the Post

  • Reality Blindness – The insistence that estimates are implementable. In business, we create estimates and plans and begin work immediately, rarely with the expectation that those plans will change.
    • This is a very naive process, a low maturity level in the CMMI sense.
    • Plans change, that's why it's called a plan.
    • When plans change for all the right, and possibly wrong, reasons, estimates must be adjusted.
    • In our formal Federal Acquisition Regulation, Baseline Change Requests mandate a new Estimate to Complete and Estimate At Completion.


The post showed the symptoms of poor planning processes. The root causes are well known and the Corrective Actions readily available. The challenge is to have the will and fortitude to actually do the right thing.

This is a challenge in our high maturity space and defense business. So that's the real problem. It is not what many suggest - that estimating can't be done. It can. But you have to want to do it.

Like nearly every root cause it comes down to the people, page 18 below. Without a Root Cause Analysis, the original post just points out the symptoms and provides no means to take corrective actions, leaving 2 out of the 3 steps missing for making improvements to our probability of project success.

  Some Resource Materials

Final Comment

The key here is to understand that Benson's 5 symptoms of poor planning all have root causes. Each root cause has a corrective action. Each corrective action is hard to implement and hard to sustain, but for mission critical projects manifestly important to apply.

And most important, when we hear estimating can't be done, do your homework, look for thought leaders in estimating, read about estimating in all kinds of domains, and don't believe any conjecture without first testing that concept in the broader realm of research and evidence based, peer reviewed, field validated concepts. The notion that we can make decisions about the spending of other people's money in the absence of estimating how much, when we'll be done, and what we'll be able to deliver is, as Alistair suggests about #NoEstimates, a pile of self-contradictory statements.

Time to reconnect with the business process of managing other people's money.


Categories: Project Management

Publishing Google Docs add-ons for domain-wide installation

Google Code Blog - Fri, 03/20/2015 - 18:02
Since we introduced add-ons for Google Docs, Sheets, and Forms last year, our developer partners have brought a world of new features to millions of users. Still, administrators for Google Apps domains (and developers!) kept asking for two things:

So, if you’ve built (or are thinking of building) a Google Docs, Sheets or Forms add-on, then be sure to make your add-on available in Google Apps Marketplace today.

Posted by Saurabh Gupta, product manager, Google Apps Script
Categories: Programming

Stuff The Internet Says On Scalability For March 20th, 2015

Hey, it's HighScalability time:

What a view! The solar eclipse at sunrise from the International Space Station.
  • 60 billion: rows in DynamoDB; 18.5 billion: BuzzFeed impressions
  • Quotable Quotes:
    • @postwait: Hell is other people’s APIs.
    • @josephruscio: .@Netflix is now 34% of US Internet traffic at night. 2B+ hours of streaming a month. #SREcon15
    • Geo Curnoff: Everything he said makes an insane amount of sense, but it might sound like a heresy to most people, who are more interested in building software cathedrals rather than solving real problems.
    • Mike Acton: Reality is not a hack you're forced to deal with to solve your abstract, theoretical problem. Reality is the actual problem.
    • @allspaw: "The right tool for the job!" said someone whose assumptions, past experience, motivations, and definition of "job" aren't explicit.
    • Sam Cutler: Mechanical ignorance is, in fact, not a strength.
    • @Grady_Booch: Beautiful quote from @timoreilly “rms is sort of like an Old Testament prophet, with lots of ‘Thou shalt not.'" 
    • @simonbrown: "With event-sourcing, messaging is back in the hipster quadrant" @ufried at #OReillySACon
    • @ID_AA_Carmack: I just dumped the C++ server I wrote last year for a new one in Racket. May not scale, but it is winning for development even as a newbie.
    • @mfdii: Containers aren't going to reduce the need to manage the underlying services that containers depend on. Exhibit A: 
    • @bdnoble: "DevOps: The decisions you make now affect the quality of sleep you get later." @caitie at #SREcon15
    • @giltene: By that logic C++ couldn't possibly multiply two integers faster than an add loop on CPUs with no mul instruction, right?
    • @mjpt777: Aeron beats all the native C++ messaging implementations and it is written in Java. 
    • @HypertextRanch: Here's what happens to your Elasticsearch performance when you upgrade the firmware on your SSDs.
    • @neil_conway: Old question: "How is this better than Hadoop?". New question: "How is this better than GNU Parallel?"
    • @evgenymorozov: "Wall Street Firm Develops New High-Speed Algorithm Capable Of Performing Over 10,000 Ethical Violations Per Second"

  • And soon the world's largest army will have no soldiers. @shirazdatta: In 2015 Uber, the world's largest taxi company owns no vehicles, Facebook the world's most popular media owner creates no content, Alibaba, the most valuable retailer has no inventory and Airbnb the world's largest accommodation provider owns no real estate.

  • Not doing something is still the #1 performance improver. Coordination Avoidance in Database Systems: after looking at the problem from a fresh perspective, and without breaking any of the invariants required by TPC-C, the authors were able to create a linearly scalable system with 200 servers processing 12.7M tps – about 25x the next-best system.

  • Tesla and the End of the Physical World. Tesla downloading new software to drastically improve battery usage is cool, but devices have been doing this forever. Routers, switches, set tops, phones, virtually every higher end connected device knows how to update itself. Cars aren't any different. Cars are just another connected device. Also, interesting that Tesla is Feature Flagging their new automatic steering capability.

  • The Apple Watch is technology fused with fashion and ecosystem in a way we've never seen before. Which is a fascinating way of routing around slower moving tech cycles. Cycles equal money. Do you need a new phone or tablet every year? Does the technology demand it? Not so much. But fashion will. Fashion is a force that drives cycles to move for no reason at all. And that's what you need to make money. Crazy like a fox.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Best Happiness Books #InternationalDayOfHappiness

NOOP.NL - Jurgen Appelo - Fri, 03/20/2015 - 11:21

Today is International Day of Happiness. What a great day to search for the best books on the topic of happiness!

This list is created from the books on GoodReads tagged with “happiness”, sorted using an algorithm that favors number of reviews, average rating, and recent availability.

The post Best Happiness Books #InternationalDayOfHappiness appeared first on NOOP.NL.

Categories: Project Management

Badass: Making users awesome – Kathy Sierra: Book Review

Mark Needham - Fri, 03/20/2015 - 08:30

I started reading Kathy Sierra’s new book ‘Badass: Making users awesome‘ a couple of weeks ago and with the gift of flights to/from Stockholm this week I’ve got through the rest of it.

I really enjoyed the book and have found myself returning to it almost every day to check exactly what was said on a particular topic.

There were a few things that I’ve taken away and have been going on about to anyone who will listen.

Paraphrasing, ‘help users acquire skills, don’t throw knowledge at them.’ I found this advice helpful both for my own learning of new things and for thinking about how to help users of Neo4j get up and running faster.

Whether we’re doing a talk, workshop or online training, the goal isn’t to teach the user a bunch of information/facts but rather to help them learn skills which they can use to achieve their ‘compelling context‘.

Having said that, it’s very easy to fall into the information/facts trap as that type of information is much easier to prepare and present. You don’t have to spend much time thinking about how the user is going to use it; rather, you hope that if you throw enough information at them some of it will stick.

A user’s compelling context is the problem they’re trying to solve regardless of the tools they use to solve it. The repeated example of this is a camera – we don’t buy a camera because we want to buy a camera, we buy it because we want to take great photographs.

There’s a really interesting section in the middle of the book which talks about expert performance and skill acquisition and how we can achieve this through deliberate practice.

My main takeaway here is that we have only mastered a skill if we can achieve 95% reliability in repeating the task within one to three sessions of 45-90 minutes each.

If we can’t achieve this then the typical reaction is to either give up or keep trying to achieve the goal for many more hours. Neither of these is considered a useful approach.

Instead we should realise that if we can’t do the skill it’s probably because there’s a small sub skill that we need to master first. So our next step is to break this skill down into its components, master those and then try the original skill again.

Amy Hoy’s ‘doing it backwards‘ guide is very helpful for doing the skill breakdown as it makes you ask the question ‘can I do it tomorrow?‘ or is there something else that I need to do (learn) first.

I’ve been trying to apply this approach to my machine learning adventures, which most recently have involved various topic modelling attempts on a How I Met Your Mother data set.

I’d heard good things about the MALLET open source library but, having never used it before, I sketched out the goals/skills I wanted to achieve:

Extract topics for HIMYM corpus ->
Train a topic model with mallet ->
Tweak an existing topic model that uses mallet ->
Run an existing topic model that uses mallet -> 
Install mallet

The idea is that you then start from the last action and work your way back up the chain – it should also act as a nice deterrent for yak shaving.
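As a rough illustration of working back up the chain, here is a minimal Python sketch. The step names are taken from the list above; the helper functions `work_order` and `next_step` are hypothetical names made up for this example and are not from the book or from Amy Hoy's guide.

```python
# Goal chain from the post, ordered from the end goal down to the first prerequisite.
goal_chain = [
    "Extract topics for HIMYM corpus",
    "Train a topic model with mallet",
    "Tweak an existing topic model that uses mallet",
    "Run an existing topic model that uses mallet",
    "Install mallet",
]


def work_order(chain):
    """Return the steps in the order to tackle them: last prerequisite first."""
    return list(reversed(chain))


def next_step(chain, mastered):
    """Return the first step, in working order, that is not yet mastered (or None)."""
    for step in work_order(chain):
        if step not in mastered:
            return step
    return None


if __name__ == "__main__":
    # With only the installation done, the next thing to practise is
    # running an existing model, not jumping straight to the end goal.
    print(next_step(goal_chain, {"Install mallet"}))
```

Keeping the list in goal-first order and reversing it only when deciding what to do next mirrors how the breakdown is written down: you think from the goal backwards, but you work from the smallest prerequisite forwards.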

While learning about mallet I came across several more articles that I should read about topic modelling and while these don’t directly contribute to learning a skill I think they will give me good background to help pick up some of the intuition behind topic modelling.

My takeaway about gaining knowledge on a skill is that when we’re getting started we should spend more time gaining practical knowledge rather than only reading, but once we get more into it we’ll naturally become more curious and do the background reading. I often find myself just reading non-stop about things but never completely understanding them because I don’t go hands-on, so this was a good reminder.

One of the next things I’m working on is a similar skill break down for people learning Neo4j and then we’ll look to apply this to make our meetup sessions more effective – should be fun!

The other awesome thing about this book is that I’ve come away with a bunch of other books to read as well:

In summary, if learning is your thing get yourself a copy of the book and read it over a few times – so many great tips, I’ve only covered a few.

Categories: Programming