Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!


Neo4j: LOAD CSV – Processing hidden arrays in your CSV documents

Mark Needham - Thu, 07/10/2014 - 15:54

I was recently asked how to process an ‘array’ of values inside a column in a CSV file using Neo4j’s LOAD CSV tool and although I initially thought this wouldn’t be possible as every cell is treated as a String, Michael showed me a way of working around this which I thought was pretty neat.

Let’s say we have a CSV file representing people and their friends. It might look like this:


And what we want is to have the following nodes:

  • Mark
  • Michael
  • Peter
  • Kenny
  • Anders

And the following friends relationships:

  • Mark -> Michael
  • Mark -> Peter
  • Michael -> Peter
  • Michael -> Kenny
  • Kenny -> Anders
  • Kenny -> Michael

We’ll start by loading the CSV file and returning each row:

$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row RETURN row;
| row                                            |
| {name -> "Mark", friends -> "Michael,Peter"}   |
| {name -> "Michael", friends -> "Peter,Kenny"}  |
| {name -> "Kenny", friends -> "Anders,Michael"} |
3 rows

As expected the ‘friends’ column is being treated as a String which means we can use the split function to get an array of people that we want to be friends with:

$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row RETURN row, split(row.friends, ",") AS friends;
| row                                            | friends              |
| {name -> "Mark", friends -> "Michael,Peter"}   | ["Michael","Peter"]  |
| {name -> "Michael", friends -> "Peter,Kenny"}  | ["Peter","Kenny"]    |
| {name -> "Kenny", friends -> "Anders,Michael"} | ["Anders","Michael"] |
3 rows

Now that we’ve got them as an array we can use UNWIND to get pairs of friends that we want to create:

$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row 
  WITH row, split(row.friends, ",") AS friends 
  UNWIND friends AS friend 
  RETURN, friend;
|  | friend    |
| "Mark"    | "Michael" |
| "Mark"    | "Peter"   |
| "Michael" | "Peter"   |
| "Michael" | "Kenny"   |
| "Kenny"   | "Anders"  |
| "Kenny"   | "Michael" |
6 rows

And now we’ll introduce some MERGE statements to create the appropriate nodes and relationships:

$ load csv with headers from "file:/Users/markneedham/Desktop/friends.csv" AS row 
  WITH row, split(row.friends, ",") AS friends 
  UNWIND friends AS friend  
  MERGE (p1:Person {name:}) 
  MERGE (p2:Person {name: friend}) 
  MERGE (p1)-[:FRIENDS_WITH]->(p2);
| No data returned. |
Nodes created: 5
Relationships created: 6
Properties set: 5
Labels added: 5
373 ms

And now if we query the database to get back all the nodes + relationships…

$ match (p1:Person)-[r]->(p2) RETURN p1,r, p2;
| p1                      | r                  | p2                      |
| Node[0]{name:"Mark"}    | :FRIENDS_WITH[0]{} | Node[1]{name:"Michael"} |
| Node[0]{name:"Mark"}    | :FRIENDS_WITH[1]{} | Node[2]{name:"Peter"}   |
| Node[1]{name:"Michael"} | :FRIENDS_WITH[2]{} | Node[2]{name:"Peter"}   |
| Node[1]{name:"Michael"} | :FRIENDS_WITH[3]{} | Node[3]{name:"Kenny"}   |
| Node[3]{name:"Kenny"}   | :FRIENDS_WITH[4]{} | Node[4]{name:"Anders"}  |
| Node[3]{name:"Kenny"}   | :FRIENDS_WITH[5]{} | Node[1]{name:"Michael"} |
6 rows

…you’ll see that we have everything.

If instead of a comma separated list of people we have a literal array in the cell…

"Mark", "[Michael,Peter]"
"Michael", "[Peter,Kenny]"
"Kenny", "[Anders,Michael]"

…we’d need to tweak the part of the query which extracts our friends to strip off the first and last characters:

$ load csv with headers from "file:/Users/markneedham/Desktop/friendsa.csv" AS row 
  RETURN row, split(substring(row.friends, 1, length(row.friends) -2), ",") AS friends;
| row                                              | friends              |
| {name -> "Mark", friends -> "[Michael,Peter]"}   | ["Michael","Peter"]  |
| {name -> "Michael", friends -> "[Peter,Kenny]"}  | ["Peter","Kenny"]    |
| {name -> "Kenny", friends -> "[Anders,Michael]"} | ["Anders","Michael"] |
3 rows

And then if we put the whole query together we end up with this:

$ load csv with headers from "file:/Users/markneedham/Desktop/friendsa.csv" AS row 
  WITH row, split(substring(row.friends, 1, length(row.friends) -2), ",") AS friends 
  UNWIND friends AS friend  
  MERGE (p1:Person {name:}) 
  MERGE (p2:Person {name: friend}) 
  MERGE (p1)-[:FRIENDS_WITH]->(p2);;
| No data returned. |
Nodes created: 5
Relationships created: 6
Properties set: 5
Labels added: 5
Categories: Programming

Fitting In With Corporate Culture

Making the Complex Simple - John Sonmez - Thu, 07/10/2014 - 15:00

In this video I answer a question about fitting into corporate culture when you come from a different background.

The post Fitting In With Corporate Culture appeared first on Simple Programmer.

Categories: Programming

Update on Android Wear Paid Apps

Android Developers Blog - Wed, 07/09/2014 - 22:10

We have a workaround to enable paid apps (and other apps that use Google Play's forward-lock mechanism) on Android Wear. The assets/ directory of those apps, which contains the wearable APK, cannot be extracted or read by the wearable installer. The workaround is to place the wearable APK in the res/raw directory instead.

As per the documentation, there are two ways to package your wearable app: use the “wearApp” Gradle rule to package your wearable app or manually package the wearable app. For paid apps, the workaround is to manually package your apps with the following two changes, and you cannot use the “wearApp” Gradle rule. To manually package the wearable APK into res/raw, do the following:

  1. Copy the signed wearable app into your handheld project's res/raw directory and rename it to wearable_app.apk, it will be referred to as wearable_app.
  2. Create a res/xml/wearable_app_desc.xml file that contains the version and path information of the wearable app:
    <wearableApp package="wearable app package name">

    The package, versionCode, and versionName are the same as values specified in the wearable app's AndroidManifest.xml file. The rawPathResId is the static variable name of the resource. If the filename of your resource is wearable_app.apk, the static variable name would be wearable_app.

  3. Add a <meta-data> tag to your handheld app's <application> tag to reference the wearable_app_desc.xml file.
    <meta-data android:name=""
  4. Build and sign the handheld app.

We will be updating the “wearApp” Gradle rule in a future update to the Android SDK build tools to support APK embedding into res/raw. In the meantime, for paid apps you will need to follow the manual steps outlined above. We will be also be updating the documentation to reflect the above workaround. We're working to make this easier for you in the future, and we apologize for the inconvenience.

Categories: Programming

Putting your Professional Group on the Map

Google Code Blog - Wed, 07/09/2014 - 17:30
By Sarah Maddox, Google Developer Relations team

People love to know what's happening in their area of expertise around the world. What better way to show it, than on a map? Tech Comm on a Map puts technical communication tidbits onto an interactive map, together with the data and functionality provided by Google Maps.

I'm a technical writer at Google. In this post I share a project that uses the new Data layer in the Google Maps JavaScript API, with a Google Sheets spreadsheet as a data source and a location search provided by Google Places Autocomplete.

Although this project is about technical communication, you can easily adapt it for other special interest groups too. The code is on GitHub. The map in action Visit Tech Comm on a Map to see it in action. Here's a screenshot:
The colored circles indicate the location of technical communication conferences, societies, groups and businesses. The 'other' category is for bits and pieces that don't fit into any of the categories. You can select and deselect the checkboxes at top left of the map, to choose the item types you're interested in.

When you hover over a circle, an info window pops up with information about the item you chose. If you click a circle, the map zooms in so that you can see where the event or group is located. You can also search for a specific location, to see what's happening there.

Let's look at the building blocks of Tech Comm on a Map.
Getting hold of a mapI'm using the Google Maps JavaScript API to display and interact with a map. Where does the data come from?When planning this project, I decided I want technical communicators to be able to add data (conferences, groups, businesses, and so on) themselves, and the data must be immediately visible on the map.

I needed a data entry and storage tool that provided a data entry UI, user management and authorization, so that I didn't have to code all that myself. In addition, contributors shouldn't need to learn a new UI or a new syntax in order to add data items to the map. I needed a data entry mechanism that is familiar to most people - a spreadsheet, for example.

In an episode of Google Maps Developer Shortcuts, Paul Saxman shows how to pull data from Google Drive into your JavaScript app. That's just what I needed. Here's how it works.

The data for Tech Comm on a Map is in a Google Sheets spreadsheet. It looks something like this:

Also in the spreadsheet is a Google Apps Script that outputs the data in JSON format:

var SPREADSHEET_ID = '[MY-SPREADSHEET-ID]';var SHEET_NAME = 'Data';function doGet(request) {  var callback = request.parameters.jsonp;  var range = SpreadsheetApp      .openById(SPREADSHEET_ID)      .getSheetByName(SHEET_NAME)      .getDataRange();  var json = callback + '(' +      Utilities.jsonStringify(range.getValues()) + ')';    return ContentService      .createTextOutput(json)      .setMimeType(ContentService.MimeType.JAVASCRIPT);}

Follow these steps to add the script to the spreadsheet and make it available as a web service:
  1. In Google Sheets, choose 'Tools' > 'Script Editor'.
  2. Add a new script as a blank project.
  3. Insert the above code.
  4. Choose 'File' > 'Manage Versions', and name the latest version of the script.
  5. Choose 'Publish' >  'Deploy as web app'. Make it executable by 'anyone, even anonymous'. Note: This means anyone will be able to access the data in this spreadsheet via a script.
  6. Choose 'Deploy'.
  7. Copy the URL of the web service. You'll need to paste it into the JavaScript on your web page.

In your JavaScript, define a variable to contain the URL of the Google Apps script, and add the JSONP callback parameter:
var DATA_SERVICE_URL =   "[MY-SCRIPT-ID]/exec?jsonp=?";

Then use jQuery's Ajax function to fetch and process the rows of data from the spreadsheet. Each row contains the information for an item: type, item name, description, website, start and end dates, address, latitude and longitude.

$.ajax({  url: DATA_SERVICE_URL,  dataType: 'jsonp',  success: function(data) {    // Get the spreadsheet rows one by one.    // First row contains headings, so start the index at 1 not 0.    for (var i = 1; i < data.length; i++) {{        properties: {          type: data[i][0],          name: data[i][1],          description: data[i][2],          website: data[i][3],          startdate: data[i][4],          enddate: data[i][5],          address: data[i][6]        },        geometry: {          lat: data[i][7],          lng: data[i][8]        }      });    }  }});The new Data layer in the Maps JavaScript API
Now that I could pull the tech comm information from the spreadsheet into my web page, I needed a way to visualize the data on the map. The new Data layer in the Google Maps JavaScript API is designed for just such a purpose. Notice the method in the above code. This is an instruction to add a feature in the Data layer.

With the basic JavaScript API you can add separate objects to the map, such as a polygon, a marker, or a line. But by using the Data layer, you can define a collection of objects and then manipulate and style them as a group. (The Data layer is also designed to play well with GeoJSON, but we don't need that aspect of it for this project.)

The tech comm data is represented as a series of features in the Data layer, each with a set of properties (type, name, address, etc) and a geometry (latitude and longitude).

Style the markers on the map, with different colors depending on the data type (conference, society, group, etc):

function techCommItemStyle(feature) {
 var type = feature.getProperty('type');
 var style = {
   icon: {      path: google.maps.SymbolPath.CIRCLE,      fillOpacity: 1,      strokeWeight: 3,      scale: 10            },    // Show the markers for this type if    // the user has selected the corresponding checkbox.    visible: (checkboxes[type] != false)  };
 // Set the marker colour based on type of tech comm item.  switch (type) {    case 'Conference':      style.icon.fillColor = '#c077f1';      style.icon.strokeColor = '#a347e1';      break;    case 'Society':      style.icon.fillColor = '#f6bb2e';      style.icon.strokeColor = '#ee7b0c';      break;. . . SNIPPED SOME DATA TYPES FOR BREVITY    default:      style.icon.fillColor = '#017cff';      style.icon.strokeColor = '#0000ff';  }  return style;}

Set listeners to respond when the user hovers over or clicks a marker. For example, this listener opens an info window on hover, showing information about the relevant data item:'mouseover', function(event) {    createInfoWindow(event.feature);;  });The Place Autocomplete search
The last piece of the puzzle is to let users search for a specific location on the map, so that they can zoom in and see the events in that location. The location search box on the map is provided by the Place Autocomplete widget from the Google Places API.What's next?
Tech Comm on a Map is an ongoing project. We technical communicators are using a map to document our presence in the world!

Would you like to contribute? The code is on GitHub.

Posted by Louis Gray, Googler
Categories: Programming

Developer Productivity Tool Review: Telerik’s Devcraft

Making the Complex Simple - John Sonmez - Tue, 07/08/2014 - 15:00

I don’t do many product reviews on this blog–and there is a good reason for it. I get plenty of requests for companies asking me to “pimp” their stuff on this blog, but most of the stuff I am asked to write about I just don’t use or would never really find myself using. However, […]

The post Developer Productivity Tool Review: Telerik’s Devcraft appeared first on Simple Programmer.

Categories: Programming

What is an Estimate?

Software Development Today - Vasco Duarte - Tue, 07/08/2014 - 04:00
If you don’t know what an estimate is, you can’t avoid using them. So here’s my attempt to define what is an estimate.
The "estimates" that I'm concerned about are those that can easily (by omission, incompetence or malice) be turned into commitments. I believe Software Development is better seen as a discovery process (even simple software projects). In this context, commitments remove options and force the teams/companies to follow a plan instead of responding to change.

Here's my definition: "Estimates, in the context of #NoEstimates, are all estimates that can be (on purpose, or by accident) turned into a commitment regarding project work that is not being worked on at the moment when the estimate is made."
The principles behind this definition of an estimateIn this definition I have the following principles in mind:
  • Delay commitment, create options: When we commit to a particular solution up front we forego possible other solutions and, as a consequence we will make the plan harder to change. Each solution comes with explicit and implicit assumptions about what we will tackle in the future, therefore I prefer to commit only to what is needed in order to validate or deliver value to the customer now. This way, I keep my options open regarding the future.
  • Responding to change over following a plan: Following a plan is easy and comforting, especially when plans are detailed and very clear: good plans. That’s why we create plans in the first place! But being easy does not make it right. Sooner or later we are surprised by events we could not predict and are no longer compatible with the plan we created upfront. Estimation up front makes it harder for us to change the plan because as we define the plan in detail, we commit ourselves to following it, mentally and emotionally.
  • Collaboration over contract negotiation: Perhaps one of the most important Agile principles. Even when you spend time and invest time in creating a “perfect” contract there will be situations you cannot foresee. What do you do then? Hopefully by then you’ve established a relationship of trust with the other party. In that case, a simple conversation to review options and chose the best path will be enough. Estimation locks us down and tends to put people on the defensive when things don’t happen as planned. Leaving the estimation open and relying on incremental development with constant focus on validating the value delivered will help both parties come to an agreement when things don’t go as planned. Thereby focusing on collaboration instead of justifying why an estimated release date was missed.
  Here are some examples that fit the definition of Estimates that I outlined above:
  • An estimate of time/duration for work that is several days, weeks or months in the future.
  • An estimate of value that is not part of an experiment (the longer the time-frame the more of a problem it is).
  • A long term estimate of time and/or value that we can only validate after that long term is overï»ż.

How do you define Estimates in your work? Are you still doing estimates that fit the definition above? What is making it hard to stop doing such estimates? Share below in the comments what you think of this definition and how you would improve it.

This definition of an estimate was sent to the #NoEstimates mailing list a few weeks ago. If you want to receive exclusive content about #NoEstimates just sign up below. You will receive a regular email on the topic of #NoEstimates as Agile Software Development.

#mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; } /* Add your own MailChimp form style overrides in your site stylesheet or in this style block. We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */ Subscribe to our mailing list* indicates required Email Address * First Name Last Name

Picture credit: Pascal @ Flickr

Applying Little's Law in Agile Games

Xebia Blog - Mon, 07/07/2014 - 21:20

Have you ever used Little's Law to explain that lower WiP (work in progress) limits lead to shorter cycle times? Ever tried to illustrate Little's Law in an Agile game and found it doesn't hold? Then read this blog to discover that it is exactly true in Agile games and how it really works.

Some time ago I gave a kanban workshop. Part of the workshop was a game of folding paper airplanes to illustrate flow. To illustrate Little's Law we determined the throughput, cycle time and work in progress. To my surprise the law didn't hold. Not even close. In this blog I want to share the insight into why it does work!


It is well known that the average number of items in progress is proportional to the average cycle time of completed work items. The proportionality is the average input rate (or throughput rate) of work items. This relation is known as Little's Law. It was discovered by Little in the 1960s and has found many applications.

In kanban teams this relationship is often used to qualitatively argue that it is favourable for flow to have not too much work in parallel. To this end WiP (work in progress) limits are introduced. The smaller the WiP the smaller the average cycle time which means better flow.

A surprise to me was that it is exactly true and it remains true under very relaxed conditions.

Little's Law

In mathematical form the law is often stated as:


 \bar{N} = \lambda \bar{W} \bar{N} = \lambda \bar{W}

Here  \bar{N} \bar{N} is the average number of work items in progress at a certain time, and  \bar{W} \bar{W} is the average cycle time.  \lambda \lambda is the average input rate (new work items per unit of time). In stable systems this also equals the average throughput. In this case Little's Law is often (re)stated as


 \frac{\mathrm{Work\, in\, progress}}{\mathrm{Throughput}} = \mathrm{Cycle Time} \frac{\mathrm{Work\, in\, progress}}{\mathrm{Throughput}} = \mathrm{Cycle Time}


In practise one considers Little's Law over a finite period of time, e.g. 6 months, 5 sprints, 3 rounds in an Agile game. Also in practise, teams work on backlog items which are discrete items. After the work is done this results in a new product increment.

Under the following conditions (1) is exact:

  • The system is observed over a finite period of time,
  • The system is a queueing system.

A queuing system is a system that consists of discrete items which arrive at a certain rate, receive service after which they depart.

Examples of a queueing system. An agile team works on backlog items. A kanban team that works on production incidents. A scrum team.

Agile Game

An often used game for explaining the importance of flow to team is the game of folding paper airplanes. Many forms of this games exist. See e.g. [Heintz11].

For this blog's purpose consider a team that folds air planes. The backlog is a stack of white paper. 3 Rounds of folding are done. Airplanes that are folded and fly at least 2 meters are considered done.

new doc 5_1-1

At the end of each round we will collect the following metrics:

  • number of completed airplanes
  • number of airplanes in progress and not yet finished.

The result of the the 3 rounds are shown at the right. At the end of round 1 Team A completed 3 airplanes and having 8 unfinished airplanes. Likewise, Team B finished 4 airplanes in round 3 giving a total of 12 finished airplanes and having 6 unfinished airplanes in progress.

The cycle time are got by writing the round number of the sheet of paper when starting to fold the airplane. When done, write the round number of the paper. The cycle time for one airplanes is got by subtracting the two and adding 1.

Calculating Little's Law

The way I was always calculating the number for work in progress, throughput and cycle time has been

  1. averaging cycle time for all completed airplanes,
  2. averaging the throughput over all rounds,
  3. averaging the work in progress over all rounds.

When calculated at the end of round 3, for Team A this amounts to:

  • Average work in progress = (8+6+2)/3 = 16/3,
  • Average throughput = 10 (completed airplanes)/3 = 10/3,
  • Average cycle time = 22/10 = 11/5

Using (2) above we get: 16/3 / (10/3) = 8/5. This is not equal to the average cycle time of 11/5. Not even close. How come?

The Truth

The interpretation of work in progress, throughput and cycle time I got from working with cumulative flow diagrams. There are many resources explaining these, see e.g. [Vega2011].

The key to the correct interpretation is choosing the time interval for which to measure the quantities  \bar{W} \bar{W} ,  \lambda \lambda , and  \bar{W} \bar{W} . Second, using the input rate instead of the throughput. Third, at the end of the time period include the unfinished items. Last, in calculating  \bar{N} \bar{N} consider all items that went through the system.

When we reinterpret the results for teams A and B we get

Team A

  • Average work in progress
    In round 1 3 airplanes were completed and left 8 unfinished; a total of 11 for work in progress (11 airplanes picked up as work)
    In round 2 the team completed 2 airplanes and have 6 unfinished; a total of 8
    In round 3 the team finished an additional 5 airplanes and left 2 uncompleted; a total of 7
    When measured over 3 rounds an average of (11+8+7)/3 = 26/3
  • Average input rate
    Using the input rate:
    In round 1 the team picked up 11 new airplanes
    In round 2 the team picked up no additional airplanes
    In round 3 one new airplanes was picked up.
    An average input rate of (11+0+1)/3 = 4 airplanes per round
  • Average cycle time
    At the end of the third round 2 airplanes are left in progress; one taken up in the third round having a waiting time of 1 and one left from the first round having waiting time of 3 rounds. A total waiting time of 22 + 3 + 1 = 26 rounds.
    Averaging over 12 airplanes we have an average cycle time of 26/12 = 13/6 rounds per airplane.

Dividing the average work in progress by the average input rate we get 26/3 divided by 4 = 26/12(!). This is exactly equal to the calculated average cycle time!

Team B

In a similar manner the reinterpreted results for team B are:

  • Average work in progress = (13+14+10)/3 = 37/3 airplanes,
  • Average input rate = (13+2+3)/3 = 6 airplanes per round,
  • Average cycle time = (27 (completed) + 10 (unfinished))/18 (airplanes) = 37/18 rounds per airplane

Again, dividing the average work in progress by the average input rate we get 37/18 rounds per airplane, which again is exactly equal to the average cycle time or waiting time!

Note: the cycle time of 10 days is built up by (a) 1 airplane from round 1 (cycle time of 3), 2 airplanes picked up in round 2 (total of 4 rounds), 3 airplanes picked up in round 3 (total of 3 rounds).

What About Cumulative Flow Diagrams?

Now that we understand how to calculate the quantities in Little's Law, we go back to cumulative flow diagrams. How come Little's Law works in this case.

In the case of teams that have collected data on cycle time, work in progress and throughput Little's Law work when done as explained in the section 'Calculating Little's Law' because:

  1. the teams are kept stable by having WiP limits on the left most column ("To Do"); then the throughput is more or less equal to the input rate,
  2. the team has completed a fairly large amount of work items in which case the waiting time of unfinished work items can be neglected,
  3. when measured over the (large part of the) value creation process, the completed items per time period can often be neglected in the calculation of the average work in progress.

Little's Law (1) holds under the conditions that (a) the system considered is a queueing system and (b) the observation or measurements are done over a finite time interval. It then holds independently of the stationaryness of the probability distributions, queuing discipline, emptiness of the system at the start and end of the time interval.

Calculate the quantities  \bar{N} \bar{N} ,  \lambda \lambda , and  \bar{W} \bar{W} as follows:

  • Average work in progress  \bar{N} \bar{N}
    For each time interval considered count the total amount of work in the system and add any items completed in that time interval.
  • Average cycle time  \bar{W} \bar{W}
    Sum the cycle times for all completed items and include the waiting time for unfinished items and divide by the total number of items.
  • Average input rate  \lambda \lambda
    Add the total number of items that entered the system and divide by the total number of time intervals.

[Little61] Little, J. D. C. 1961. A proof for the queuing formula: L = ãW . Oper. Res. 9(3) 383–387.

[Heintz11] John Heintz, June 2011, Agile Airplane Game, GistLabs,

[Vega11] Vega Information System Services, Inc., September 2011, Basics of Reading Cumulative Flow Diagrams,


Do Software Developers Really Need Degrees?

Making the Complex Simple - John Sonmez - Mon, 07/07/2014 - 15:45

When I first started out my career as a software developer, I didn’t have a degree. I took my first real job when I was on summer break from my first year of college. By the time the summer was up and it was time to enroll back in school, I found that the salary […]

The post Do Software Developers Really Need Degrees? appeared first on Simple Programmer.

Categories: Programming

R/plyr: ddply – Error in vector(type, length) : vector: cannot make a vector of mode ‘closure’.

Mark Needham - Mon, 07/07/2014 - 07:07

In my continued playing around with plyr’s ddply function I was trying to group a data frame by one of its columns and return a count of the number of rows with specific values and ran into a strange (to me) error message.

I had a data frame:

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
df = data.frame(n, s, b)

And wanted to group and count on column ‘b’ so I’d get back a count of 2 for TRUE and 1 for FALSE. I wrote this code:

ddply(df, "b", function(x) { 
  countr <- length(x$n) 
  data.frame(count = count) 

which when evaluated gave the following error:

Error in vector(type, length) : 
  vector: cannot make a vector of mode 'closure'.

It took me quite a while to realise that I’d just made a typo in assigned the count to a variable called ‘countr’ instead of ‘count’.

As a result of that typo I think the R compiler was trying to find a variable called ‘count’ somwhere else in the lexical scope but was unable to. If I’d defined the variable ‘count’ outside the call to ddply function then my typo wouldn’t have resulted in an error but rather an unexpected resulte.g.

> count = 10
> ddply(df, "b", function(x) { 
+   countr <- length(x$n) 
+   data.frame(count = count) 
+ })
      b count
1 FALSE     4
2  TRUE     4

Once I spotted the typo and fixed it things worked as expected:

> ddply(df, "b", function(x) { 
+   count <- length(x$n) 
+   data.frame(count = count) 
+ })
      b count
1 FALSE     1
2  TRUE     2
Categories: Programming

Create the smallest possible Docker container

Xebia Blog - Fri, 07/04/2014 - 21:59

When you are playing around with Docker, you quickly notice that you are downloading large numbers of megabytes as you use preconfigured containers. A simple Ubuntu container easily exceeds 200MB and as software is installed on top of it, the size increases. In some use cases, you do not need everything that comes with Ubuntu. For example, if you want to run a simple web server, written in Go, there is no need for any tool around that at all.

I have been searching for the smallest possible container to start with and found this one:

docker pull scratch

The scratch image is perfect. Literally perfect! It is elegant, small and fast. It does not contain any bugs, security leaks, slow code or technical debt. And that is because it is basically empty. Except for a bit of metadata added by Docker. In fact, you could have created this scratch image yourself with this command as described in the Docker documentation:

tar cv --files-from /dev/null | docker import - scratch


So that is it, the smallest possible Docker image. End of blog post!

... or is there something more we can say about this? For example, how do you use the scratch base image? It turns out this brings some challenges of its own.

Creating content for the scratch image

What can we run on an empty base image? An executable without dependencies. Do you have executables without dependencies?

I used to write code in Python, Java and JavaScript. Each of these languages/platforms require a runtime installed. Recently, I started looking into the Go (or GoLang if you prefer) platform. And it seems (spoiler alert) like Go is statically  linked. So I tried compiling a simple web server saying Hello World and running it within the scratch container. Here is the code for the Hello World web server:

package main

import (

func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello World from Go in minimal Docker container")

func main() {
	http.HandleFunc("/", helloHandler)

	fmt.Println("Started, serving at 8080")
	err := http.ListenAndServe(":8080", nil)
	if err != nil {
		panic("ListenAndServe: " + err.Error())


Obviously, I cannot compile my webserver inside the scratch container as there is no Go compiler in it. And as I am working on a Mac, I also cannot compile a Linux binary just like that. (Actually, it is possible to cross-compile GoLang sources to different platforms, but that is material for another blog post)

So I first need a Docker container with a Go compiler. Let's start simple:

docker run -ti google/golang /bin/bash


Inside this container, I can build the Go web server, which I have committed in a GitHub repository:

go get


The go get command is a variant of the go build command that allows fetching and building remote dependencies. You can start the resulting executable with:



This works. But it is not what we want. We need the hello world container to run inside the scratch container. So, in fact, we need a Dockerfile saying:

FROM scratch
ADD bin/helloworld /helloworld
CMD ["/helloworld"]


and then start that. Unfortunately, the way we started the google/golang container, there is no way to build this Dockerfile. So first, we need a way to access Docker from within the container.

Calling Docker from within Docker

When you use Docker, sooner or later you run into the need to control Docker from within Docker. There are multiple ways to accomplish this. You could use recursion and run Docker inside Docker. However, that seems overly complex and again leads to large containers. You can also provide access to the Docker server outside the instance with a few additional command line options:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(which docker):$(which docker) -ti google/golang /bin/bash


Before you continue, please rerun the Go compiler, as Docker forgot our previous compilation during the restart:

go get


When starting the container, the -v flag creates a volume inside the Docker container and allows you to provide a file from the Docker machine as input. The /var/run/docker.sock is the Unix socket that allows access to the Docker server. The $(which docker) part is a clever way to provide the path for the docker executable inside the container without hardcoding it. However, be careful when you use this command on an Apple when using boot2docker. If the docker executable is installed in a different location than it is installed in boot2docker's virtual machine, this results in a mismatch. It will be the executable inside the boot2docker virtual server that gets inserted into the container. So you may want to replace $(which docker) with /usr/local/bin/docker which is hardcoded. Similarly, if you run a different system, there is a chance that the /var/run/docker.sock has a different location and you need to adjust it accordingly.

Now you can use the Dockerfile inside the google/golang container in the $GOPATH directory, which points to /gopath in this example. Actually, I already checked this Dockerfile into GitHub. So you can copy it from the Go build directory to the desired location like this:



You need to copy this as the compiled binary is now located in $GOPATH/bin and it is not possible to include files from parent directories when building a Dockerfile. So after copying, the next step is:

docker build -t adejonge/helloworld $GOPATH


And if all goes, well, Docker responds with something like:

Successfully built 6ff3fd5a381d


Which allows you to run the container:

docker run -ti --name hellobroken adejonge/helloworld


But unfortunately, now Docker responds with:

2014/07/02 17:06:48 no such file or directory


So what is going on? We have a statically linked executable inside a scratch container. Did we make a mistake?

As it turns out, Go does not statically link libraries. Or at least not all libraries. Under Linux, we can see the dynamically linked libraries for an executable with the ldd command:

ldd $GOPATH/bin/helloworld 


Which responds with: => (0x00007fff039fe000) => /lib/x86_64-linux-gnu/ (0x00007f61df30f000) => /lib/x86_64-linux-gnu/ (0x00007f61def84000)
/lib64/ (0x00007f61df530000)


So before we can run the Hello World webserver, we need to tell the Go compiler to actually do static linking.

Creating statically linked executables in Go

In order to create statically linked executables, we need to tell Go to use the cgo compiler rather than the go compiler. The command to do so is:

CGO_ENABLED=0 go get -a -ldflags '-s'


The CGO_ENABLED environment variable tells Go to use the cgo compiler rather than the go compiler. The -a flag tells Go to rebuild all dependencies. Otherwise you still end up with dynamically linked dependencies. And finally the -ldflags '-s' flag is a nice extra. It reduces the file size of the resulting executable by roughly 50%. You can also do this without the cgo compiler. The size reduction is a result from removing debug information.

Just to be sure, rerun the ldd command.

ldd $GOPATH/bin/helloworld 


It should now respond with:

not a dynamic executable


You can also rerun the steps for creating the Docker container around the executable from scratch:

docker build -t adejonge/helloworld $GOPATH


And if all goes well, Docker responds with something like:

Successfully built 6ff3fd5a381d


Which allows you to run the container:

docker run -ti --name helloworld adejonge/helloworld


And this time it should respond with:

Started, serving at 8080


Until so far, there were many manual steps and there is a lot of room for error. Let's exit from the google/golang container and continue from the surrounding machine:

<Press Ctrl-C>


You can check the existence or absence of containers and images with:

docker ps -a
docker images -a


And you can do some cleaning of Docker with:

docker rm -f helloworld
docker rmi -f adejonge/helloworld


Creating a Docker container that creates a Docker container

The steps we took so far, we can also record in a Dockerfile and have Docker do the work for us:

FROM google/golang
RUN CGO_ENABLED=0 go get -a -ldflags '-s'
RUN cp /gopath/src/ /gopath
CMD docker build -t adejonge/helloworld gopath


I checked this Dockerfile into a separate GitHub repository called adriaandejonge/hellobuild. It can be built with this command:

docker build -t adejonge/hellobuild


Providing the  -t flag names the image as adejonge/hellobuild and implicitly tags it as latest. These names make it easier for you to remove the image later on. Next,  you can create a container from this image while providing the flags that you have seen earlier in this post:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(which docker):$(which docker) -ti --name hellobuild adejonge/hellobuild


Providing the --name hellobuild flag makes it easier to remove the container after running. In fact, you can do so right away, because after running this command, you already created the adejonge/helloworld image:

docker rm -f hellobuild
docker rmi -f adejonge/hellobuild


And now you can start a new container named helloworld based on the adejonge/helloworld image as you have done before:

docker run -ti --name helloworld adejonge/helloworld


Because all these steps are run from the same command line, without opening a bash shell inside a Docker container, you can add these steps to a bash script and run it automatically. For your convenience, I have added these bash scripts to the hellobuild GitHub repository.

Also, if you want to try the smallest possible Docker container running a Hello World web server without following all the steps described in this blog post, you can also use the pre-built image that I checked into the Docker Hub repository:

docker pull adejonge/helloworld


With docker images -a you can see that the size is 3.6MB. Of course, you can make it even smaller if you manage to create an executable that is smaller than the web server in Go that I wrote. In C or Assembly you may be able to do so. However, you can never make it smaller than the scratch image.

Dockerfiles as automated installation scripts

Xebia Blog - Thu, 07/03/2014 - 19:16

Dockerfiles are great and easily readable specifications for the installation and configuration of an application. It is terse, can be understood by anyone who understands UNIX commands, results in a testable product and can easily be turned into an automated installation script using a little awk'ward magic. Just in case you want to install the application in question on the good old fashioned way, without the Docker hassle :-)

In this case, we needed to experiment with the Codahale Metrics library and Zabbix. Instead of installing a complete Zabbix server, I googled for a docker container and was pleased to find a ready to run Zabbix server configuration created by Bernardo Gomez Palacio. . Unfortunately, the server stopped repeatedly after about 5 minutes due the simplevisor's impression that it was requested to stop. I could not figure out where this request was coming from, and as it was pretty persistent, I decided to install zabbix on a virtual box.

So I checked out the  docker-zabbix github project and found a ready to run Vagrant configuration to build the zabbix docker container itself (Cool!). The Dockerfile contained easily and readable instructions on how to install and configure Zabbix. But,  instead of copy-and-pasting the instructions to the command prompt, I cloned the project on the vagrant box and created the following awk script in order to execute the instructions in the Dockerfile directly on the running system.

/^ADD/ {
sub(/ADD/, "")
    cmd = "mkdir -p $(dirname " $2 ")"
    cmd = "cp " $0

/^RUN/ {
    sub(/RUN/, "")
    cmd = $0

After a few minutes, the image was properly configured. I just needed to run the database initialisation script (/ and ensured that all the services were started on reboot.

 cd /etc/init.d
for i in zabbix* httpd mysqld snmp* ; do
     chkconfig $i on
     service $i start

Even if you do not use Docker in production, Dockerfiles are a great improvement in the specifications of installation instructions!

Be Bold!

Making the Complex Simple - John Sonmez - Thu, 07/03/2014 - 15:00

Sometimes you have to just be bold if you want to maximize your opportunities.

The post Be Bold! appeared first on Simple Programmer.

Categories: Programming

How architecture enables kick ass teams (1): replication considered harmful?

Xebia Blog - Thu, 07/03/2014 - 11:51

At Xebia we regularly have discussions regarding Agile Architecture? What is it? What does it take? How should you organise this? Is it technical or organisational? And much more questions
 which I won’t be answering today. What I will do today is kick off a blog series covering subjects that are often part of these heated debates. In general what we strive for with Agile Architecture is an architecture that enables the organisation to keep moving fast and without IT be a limiting factor for realising changes. As you read this series you’ll start noticing one theme coming back over and over again: Autonomy. Sometimes we’ll be focussing on the architecture of systems, sometimes on the architecture of the organisation or teams, but autonomy is the overarching theme. And if you’re familiar with Conways Law it should be no surprise that there is a strong correlation between team and system structure. Having a structure of teams  that is completely different from your system landscape causes friction. We are convinced that striving for optimal team and system autonomy will lead to an organisation which is able to quickly adapt and respond to changes.

The first subject is replication of data, this is more a systems (landscape) issue and less of an organisational issue and definitely not the only one, more posts will follow.

We all have to deal with situations where:

  • consumers of a data retrieval service (e.g. customer account details) require this service to be highly available, or
  • compute intensive analysis must be done using the data in a system, or
  • data owned by a system must be searched in a way that is not (efficiently) supported by that system

These situations all impact the autonomy of the system owning the data.Is the system able to provide the it's functionality at the require quality level or do these external requirements lead to negative consequences on quality of the service provided or maintainability? Should these requirements be forced into the system or is another approach more appropriate?

Above examples all could be solved by replicating data into another system which is more suitable for meeting these requirements but 
 replication of data is considered to be harmful by some. Is it really? Often mentioned reasons not to replicate data are:

  • The replicated data will always be less accurate and timely than the original data
    True, and is this really a problem for the specific situation you’re dealing with? Sometimes you really need the latest version of a customer record, but in many situations it is no problem is the data is seconds, minutes or even hours old.
  • Business logic that interprets the data is implemented twice and needs to be maintained
    Yes, and you have to compare the costs of this against the benefits. As long as the benefits outweigh the costs, it is a good choice.  You can even consider to provide a library that is used in both systems.
  • System X is the authoritative source of the data and should be the only one that exposes it
    Agree, and keeping that system as the authoritative source is good practice and does not mean that there could be read only access to the same (replicated) data in other systems.

As you can see it is never a black and white decision, you’ll have to make a balanced decision and include benefits and costs of both alternatives. The gained autonomy and business benefits derived from this can easily outweigh the extra development, hosting and maintenance costs of replicating data.

A few concrete examples from my own experience:

We had a situation where a CRM system instance owned data which was also required in a 24x7 emergency support proces. The data was nicely exposed by a number of data retrieval services. At that organisation the CRM system deployment was such that most components were redundant, but during updates the system as a whole would still be down for several hours. Which was not acceptable given that the data was required in a 24x7 emergency support process. Making the CRM system deployment upgradable without downtime was not possible or would cost .
In this situation the costs of replicating the CRM system database to another datacenter using standard database features and having the data retrieval services access either that replicated database or the original database (as fall back) was much cheaper than trying to make CRM system itself high available. The replicated database would remain running accessible even when CRM system  got upgraded. Yes, we’re bypassing the CRM system business logic for interpreting the data, but for this situation the logic was so simple that the costs of reimplementing and maintaining this in a new lightweight service (separate from CRM system) were neglectable.

Another example is from a telecom provider that uses a chain of fulfilment systems in which it registered all network products sold to its customers (e.g. internet access, telephony, tv). Each product instance depends on instances registered in another system and if you drill down deep enough you’ll reach the physical network hardware ports on which it runs. The systems that registered all products used a relational model which was okay for registration. However, questions like “if this product instance breaks, which customers are impacted” were impossible to answer without overheating CPUs in those systems. By publishing all changes in the registrations to a separate system we could model the whole inventory of services as a network graph and easily do analysis on it without impacting the fulfilment systems. The fact that the data would be a (at most) a few seconds old was absolutely no problem.

And a last example is that sometimes you want to do a full (phonetic) text search through a subset of your domain model. Relational data models quickly get you into an unmaintainable situation. You’re SQL queries will require many tables, lot’s of inefficient “LIKE ‘%gold%’" and developers that have a hard time understanding what a query actually intended to do. Replicating the data to a search engine makes searching far easier and provides more possibilities for searches that are hard to realise in a relational database.

As you can see replication of data can increase autonomy of systems and teams and thereby make your system or system landscape and organisation more agile. I.e. you can realise new functionality faster and get it available for your users quicker because the coupling with other systems or teams is reduced.

In a next blog we'll discuss another subject that impacts team or system autonomy.

Google Play services 4.4

Android Developers Blog - Wed, 07/02/2014 - 20:01

A new release of Google Play services has now been rolled out to the world, and as usual we have a number of features that can make your apps better than before. This release includes a major enhancement to Maps with the introduction of Street View, as well as new features in Location, Games Services, Mobile Ads, and Wallet API.

Here are the highlights of Google Play services release 4.4:

Google Maps Android API

Starting with a much anticipated announcement for the Google Maps Android API: Introducing Street View. You can now embed Street View imagery into an activity enabling your users to explore the world through panoramic 360-degree views. Programmatically control the zoom and orientation (tilt and bearing) of the Street View camera, and animate the camera movements over a given duration. Here is an example of what you can do with the API, where the user navigates forward one step:
We've also added more features to the Indoor Maps feature of the API. You can turn the default floor picker off - useful if you want to build your own. You can also detect when a new building comes into focus, and find the currently-active building and floor. Great if you want to show custom markup for the active level, for example.

Activity Recognition

And while we are on the topic of maps, let’s turn to some news in the Location API. For those of you that have used this API, you may have seen the ability already there to detect if the device is in a vehicle, on a bicycle, on foot, still, or tilting.

In this release, two new activity detectors have been added: Running, and Walking. So a great opportunity to expand your app to be even more responsive to your users. And for you that have not worked with this capability earlier, we hardly need to tell the cool things you can do with it. Just imagine combining this capability with features in Maps, Games Services, and other parts of Location...

Games Services Update

In the 4.3 release we introduced Game Gifts, which allows you to request gifts or wishes. And although there are no external API changes this time, the default requests sending UI has been extended to now allow the user to select multiple Game Gifts recipients. For your games this means more collaboration and social engagement between your players.

Mobile Ads

For Mobile Ads, we’ve added new APIs for publishers to display in-app promo ads, which enables users to purchase advertised items directly. We’re offering app developers control of targeting specific user segments with ads, for example offering high-value users an ad for product A, or new users with an ad for product B, etc.

With these extensions, users can conveniently purchase in-app items that interest them, advertisers can reach consumers, and your app connects the dots; a win-win-win in other words.

Wallet Fragments

For the Instant Buy API, we’ve now reduced the work involved to place a Buy With Google button in an app. The WalletFragment API introduced in this release makes it extremely easy to integrate Google Wallet Instant Buy with an existing app. Just configure these fragments and add them to your app.

And that’s another release of Google Play services. The updated Google Play services SDK is now available through the Android SDK manager. Coming up in June is Google I/O, no need to say more

For the release video, please see:
DevBytes: Google Play services 4.4

For details on the APIs, please see:
New Features in Google Play services 4.4

Join the discussion on
+Android Developers

Categories: Programming

Art, made with code, opens at London’s Barbican

Google Code Blog - Wed, 07/02/2014 - 19:30
Author PhotoBy Paul Kinlan, Staff Developer Advocate and tinkerer

Good News Everybody! DevArt has officially opened at the Barbican’s Digital Revolution Exhibition, the biggest exploration of digital creativity ever staged in the UK.

(Images - Andrew Meredith)
Technology has long gone hand in hand with art and with DevArt we’re showcasing the developers who use technology as their canvas and code as their raw material to create innovative, interactive digital art installations. Karsten Schmidt, Zach Lieberman, and duo Varvara Guljajeva and Mar Canet, have been commissioned by Google and the Barbican for Digital Revolution. Alongside these three commissions, a fourth - Cyril Diagne and Beatrice Lartigue - were handpicked as a result of DevArt’s global initiative to discover the interactive artists of tomorrow. You can also see their incredible art online and through our exhibition launch film here:

Play the World, 2014. Zach Lieberman [View on Github]
Using Google Compute Engine, Google Maps Geolocation API and openFrameworks, Zach has been able to find musical notes from hundreds of live radio stations around the world, resulting in a unique geo-orientated piece of music every time a visitor plays the piano at the centre of the piece.

Image by Andrew Meredith

Wishing Wall, 2014, Varvara Guljajeva & Mar Canet [View on Github]
Taking advantage of Google Compute Engine, Web Speech API, Chrome Apps, openFrameworks and node.js, Varvara and Mar are able to capture a whispered wish, and let you watch it transform before your eyes, allowing you to reach out and let it land on your hand.

Image by Andrew Meredith

Co(de) Factory, 2014, Karsten Schmidt [View on Github]
Android, Google Cloud Platform, Google Closure Compiler, WebGL, WebSockets, and YouTube have been combined by Karsten to allow anybody to create art and become an artist. It empowers people by giving them the tools to create, and offers them the chance to have their digital piece fabricated in 3D and showcased in the exhibition.

Image by Andrew Meredith

Les MĂ©tamorphoses de Mr. Kalia, 2014, BĂ©atrice Lartigue and Cyril Diagne [View on Github]
Android, Chrome Apps, Google App Engine, node.js, openFrameworks have enabled BĂ©atrice and Cyril to create tracking technology that transforms movement into a visual performance where visitors take on the persona of Mr. Kalia, a larger-than-life animated character, that undergoes a series of surreal changes while following your every movement.

Image by Andrew Meredith

DevArt will tour the world with the Digital Revolution Exhibition for up to five years following the Barbican show in London.

Soon we’re also starting our DevArt Young Creators program — an education component of DevArt designed to inspire a new generation of coders — each led by the DevArt interactive artists. Developed alongside the UK’s new computing curriculum, the workshops have been designed especially for students aged 9-13 years who have never tried coding before. Each workshop will be developed into lesson plans in-line with the UK’s new national computing curriculum, and distributed to educators by arts and technology organisations.

Paul Kinlan is a Developer Advocate in the UK on the Chrome team specialising on mobile. He lives in Liverpool and loves trying to progress the city's tech community from places like DoES Liverpool hack-space.

Posted by Louis Gray, Googler
Categories: Programming

Loan calculator

Phil Trelford's Array - Wed, 07/02/2014 - 08:17

Yesterday I came across a handy loan payment calculator in C# by Jonathon Wood via Alvin Ashcraft’s Morning Dew links. The implementation appears to be idiomatic C# using a class, mutable properties and wrapped in a host console application to display the results.

I thought it’d be fun to spend a few moments re-implementing it in F# so it can be executed in F# interactive as a script or a console application.

Rather than use a class, I’ve plumped for a record type that captures all the required fields:

/// Loan record
type Loan = {
   /// The total purchase price of the item being paid for.
   PurchasePrice : decimal
   /// The total down payment towards the item being purchased.
   DownPayment : decimal
   /// The annual interest rate to be charged on the loan
   InterestRate : double
   /// The term of the loan in months. This is the number of months
   /// that payments will be made.
   TermMonths : int


And for the calculation simply a function:

/// Calculates montly payment amount
let calculateMonthlyPayment (loan:Loan) =
   let monthsPerYear = 12
   let rate = (loan.InterestRate / double monthsPerYear) / 100.0
   let factor = rate + (rate / (Math.Pow(rate+1.,double loan.TermMonths) 1.))
   let amount = loan.PurchasePrice - loan.DownPayment
   let payment = amount * decimal factor


We can test the function immediately in F# interactive

let loan = {
   PurchasePrice = 50000M
   DownPayment = 0M
   InterestRate = 6.0
   TermMonths = 5 * 12

calculateMonthlyPayment loan


Then a test run (which produces the same results as the original code):

let displayLoanInformation (loan:Loan) =
   printfn "Purchase Price: %M" loan.PurchasePrice
   printfn "Down Payment: %M" loan.DownPayment
   printfn "Loan Amount: %M" (loan.PurchasePrice - loan.DownPayment)
   printfn "Annual Interest Rate: %f%%" loan.InterestRate
   printfn "Term: %d months" loan.TermMonths
   printfn "Monthly Payment: %f" (calculateMonthlyPayment loan)
   printfn ""

for i in 0M .. 1000M .. 10000M do
   let loan = { loan with DownPayment = i }
   displayLoanInformation loan


Another option is to simply skip the record and use arguments:

/// Calculates montly payment amount
let calculateMonthlyPayment(purchasePrice,downPayment,interestRate,months) =
   let monthsPerYear = 12
   let rate = (interestRate / double monthsPerYear) / 100.0
   let factor = rate + (rate / (Math.Pow(rate + 1.0, double months) - 1.0))
   let amount = purchasePrice - downPayment
   let payment = amount * decimal factor
Categories: Programming

R/plyr: ddply – Renaming the grouping/generated column when grouping by date

Mark Needham - Wed, 07/02/2014 - 07:30

On Nicole’s recommendation I’ve been having a look at R’s plyr package to see if I could simplify my meetup analysis and I started by translating my code that grouped meetup join dates by day of the week.

To refresh, the code without plyr looked like this:

timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01")
query = "MATCH (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
         RETURN membership.joined AS joinDate"
meetupMembers = cypher(graph, query)
meetupMembers$joined <- timestampToDate(meetupMembers$joinDate)
dd = aggregate(meetupMembers$joined, by=list(format(meetupMembers$joined, "%A")), function(x) length(x))
colnames(dd) = c("dayOfWeek", "count")

which returns the following:

> dd
  dayOfWeek count
1    Friday   135
2    Monday   287
3  Saturday    80
4    Sunday   102
5  Thursday   187
6   Tuesday   286
7 Wednesday   211

We need to use plyr’s ddply function which takes a data frame and transforms it into another one.

To refresh, this is what the initial data frame looks like:

> meetupMembers[1:10,]
       joinDate              joined
1  1.376572e+12 2013-08-15 14:13:40
2  1.379491e+12 2013-09-18 08:55:11
3  1.349454e+12 2012-10-05 17:28:04
4  1.383127e+12 2013-10-30 09:59:03
5  1.372239e+12 2013-06-26 10:27:40
6  1.330295e+12 2012-02-26 22:27:00
7  1.379676e+12 2013-09-20 12:22:39
8  1.398462e+12 2014-04-25 22:41:19
9  1.331734e+12 2012-03-14 14:11:43
10 1.396874e+12 2014-04-07 13:32:26

Most of the examples of using ddply show how to group by a specific ‘column’ e.g. joined but I want to group by part of the value in that column and eventually came across an example which showed how to do it:

> ddply(meetupMembers, .(format(joined, "%A")), function(x) {
    count <- length(x$joined)
    data.frame(count = count)
  format(joined, "%A") count
1               Friday   135
2               Monday   287
3             Saturday    80
4               Sunday   102
5             Thursday   187
6              Tuesday   286
7            Wednesday   211

Unfortunately the generated column heading for the group by key isn’t very readable and it took me way longer than it should have to work out how to name it as I wanted! This is how you do it:

> ddply(meetupMembers, .(dayOfWeek=format(joined, "%A")), function(x) {
    count <- length(x$joined)
    data.frame(count = count)
  dayOfWeek count
1    Friday   135
2    Monday   287
3  Saturday    80
4    Sunday   102
5  Thursday   187
6   Tuesday   286
7 Wednesday   211

If we want to sort that in descending order by ‘count’ we can wrap that ddply in another one:

> ddply(ddply(meetupMembers, .(dayOfWeek=format(joined, "%A")), function(x) {
    count <- length(x$joined)
    data.frame(count = count)
  }), .(count = count* -1))
  dayOfWeek count
1    Monday   287
2   Tuesday   286
3 Wednesday   211
4  Thursday   187
5    Friday   135
6    Sunday   102
7  Saturday    80

From reading a bit about ddply I gather that its slower than using some other approaches e.g. data.table but I’m not dealing with much data so it’s not an issue yet.

Once I got the hang of how it worked ddply was quite nice to work with so I think I’ll have a go at translating some of my other code to use it now.

Categories: Programming

New Entreprogrammers Episodes: 18 and 19

Making the Complex Simple - John Sonmez - Tue, 07/01/2014 - 18:37

We have two new episodes this week, since I had to call an emergency meeting last week about an important life decision. Check them out here: I can’t believe we are already at episode 19.

The post New Entreprogrammers Episodes: 18 and 19 appeared first on Simple Programmer.

Categories: Programming

How combined Lean- and Agile practices will change the world as we know it

Xebia Blog - Tue, 07/01/2014 - 08:50

You might have attended this month at our presentation about eXtreme Manufacturing and the keynote of Nalden last week on XebiCon 2014. There are a few epic takeaways and additions I would like to share with you in this blogpost.

Epic TakeAway #1: The Learn, Unlearn and Relearn Cycle Like Nalden expressed in his inspiring keynote, one of the major things for him to be successful is being able to Learn, Unlearn and Relearn every time again. In my opinion, this will be the key ability for every successful company in the near future.  In fact, this is how nature evolutes: in the end, only the species who are able to adapt to changing circumstances will survive and evolute. This mechanism makes for example, most of the startups fail, but those who will survive, can be extremely disruptive for non-agile organizations.  Best example for this is of course Whatsapp.  Beating up the Telco Industry by almost destroying their whole businessmodel in only a few months. Learn more about disruptive innovation from one of my personal heroes, Harvard Professor Clayton Christensen.

Epic TakeAway #2: Unlearning Waterfall, Relearning Lean & Agile Globally, Waterfall is still the dominant method in companies and universities.  Waterfall has its origins more than 40 years ago. Times have changed. A lot. A new, successful and disruptive product could be there in only a matter of days instead of (many) years. Finally, things are changing. For example, the US Department of Defence has recently embraced Lean and Agile as mandatory practices, especially Scrum. Schools and universities are also more and more adopting the Agile way of working. Later more in this blogpost.

Epic TakeAway #3: Combined Lean- and Agile practices =  XM Lean practices arose in Japan in the 1980’s , mainly in the manufacturing industry, Toyota being the frontrunner here.  Agile practices like Scrum, were first introduced in the 1990’s by Ken Schwaber and Jeff Sutherland, these practices were mainly applied in the IT-industry. Until now, the manufacturing and IT world didn’t really joined forces combining Lean and Agile practices.  Until recently.  The WikiSpeed initiative of Joe Justice proved combining these practices result in a hyper-productive environment, where a 100 Mile/Gallon road legal sportscar could be developed in less than 3 months.  Out of this success eXtreme Manufacturing (XM) arose. Finally, a powerful combination of best practices from the manufacturing- and IT-world came together.

Epic TakeAway #4: Agile Mindset & Education fotoLike Sir Ken Robinson and Dan Pink already described in their famous TED-talks, the way most people are educated and rewarded, is not suitable anymore for modern times and even conflicts with the way we are born.  We learn by "failing", not by preventing it.  Failing in it’s essence should stimulate creativity to do things better next time, not be punished.  On the long run, failing (read: learning!) has more added value than short-term succes, for example by chasing milestones blindly. EduScrum in the Netherlands stimulates schools and universities to apply Scrum in their daily classes in order to stimulate creativity, happiness, self-reliantness and talent. The results of the schools joining these initiative are spectacular: happy students, less dropouts an significantly higher grades. For a prestigious project for the Delft University, Forze, the development of a hydrogen race car, the students are currently being trained and coached to apply Agile and Lean practices.  Also these results are more than promising. The Forze team is happier, more productive and more able to learn faster and better from setbacks.  Actually, they are taking the first steps of being anti-fragile.  Due too an intercession of the Forze team members themselves,  the current support of agile (Xebia) coaches is now planned being extended to the flagship of the Delft University:  the NUON solar team.

The Final Epic TakeAway In my opinion, we reached a tipping point in the way goals should be achieved.  Organizations are massively abandoning Waterfall and embracing Agile practices, like Scrum.  Adding Lean practices like Joe Justice did in his WikiSpeed project, makes Agile and Lean extremely powerful.  Yes, this will even make this world a much better place.  We cannot prevent nature disasters with this, but we can be anti-fragile.  We cannot prevent every epidemic, but we can respond in an XM-fashion on this by developing a vaccin in only days instead of years.  This brings me finally to the missing statement of the current Agile Manifesto:   We should Unlearn and Relearn before we Judge.  Dare to Dream like a little kid again. Unlearn your skepticism.  Companies like Boeing, Lockheed Martin and John Deere already did. Adopting XM speeded up their velocity in some cases with more than 7 times.

What is Capacity in software development? - The #NoEstimates journey

Software Development Today - Vasco Duarte - Tue, 07/01/2014 - 04:00

I hear this a lot in the #NoEstimates discussion: you must estimate to know what you can deliver for a certain price, time or effort.

Actually, you don’t. There’s a different way to look at your organization and your project. Organizations and projects have an inherent capacity, that capacity is a result of many different variables - not all can be predicted. Although you can add more people to a team, you don’t actually know what the impact of that addition will be until you have some data. Estimating the impact is not going to help you, if we are to believe the track record of the software industry.

So, for me the recipe to avoid estimates is very simple: Just do it, measure it and react. Inspect and adapt - not a very new idea, but still not applied enough.

Let’s make it practical. How many of these stories or features is my team or project going to deliver in the next month? Before you can answer that question, you must find out how many stories or features your team or project has delivered in the past.

Look at this example.

How many stories is this team going to deliver in the next 10 sprints? The answer to this question is the concept of capacity (aka Process Capability). Every team, project or organization has an inherent capacity. Your job is to learn what that capacity is and limit the work to capacity! (Credit to Mary Poppendieck (PDF, slide 15) for this quote).

Why is limiting work to capacity important? That’s a topic for another post, but suffice it to say that adding more work than the available capacity, causes many stressful moments and sleepless nights; while having less work than capacity might get you and a few more people fired.

My advice is this: learn what the capacity of your project or team is. Only then you will be able to deliver reliably, and with quality the software you are expected to deliver.

How to determine capacity?

Determining the capacity of capability of a team, organization or project is relatively simple. Here's how

  • 1- Collect the data you have already:
    • If using timeboxes, collect the stories or features delivered(*) in each timebox
    • If using Kanban/flow, collect the stories or features delivered(*) in each week or period of 2 weeks depending on the length of the release/project
  • 2- Plot a graph with the number of stories delivered for the past N iterations, to determine if your System of Development (slideshare) is stable
  • 3- Determine the process capability by calculating the upper (average + 1*sigma) and the lower limits(average - 1*sigma) of variability

At this point you know what your team, organization or process is likely to deliver in the future. However, the capacity can change over time. This means you should regularly review the data you have and determine (see slideshare above) if you should update the capacity limits as in step 3 above.

(*): by "delivered" I mean something similar to what Scrum calls "Done". Something that is ready to go into production, even if the actual production release is done later. In my language delivered means: it has been tested and accepted in a production-like environment.

Note for the statisticians in the audience: Yes, I know that I am assuming a normal distribution of delivered items per unit of time. And yes, I know that the Weibull distribution is a more likely candidate. That's ok, this is an approximation that has value, i.e. gives us enough information to make decisions.

You can receive exclusive content (not available on the blog) on the topic of #NoEstimates, just subscribe to the #NoEstimates mailing list below. As a bonus you will get my #NoEstimates whitepaper, where I review the background and reasons for using #NoEstimates #mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; } /* Add your own MailChimp form style overrides in your site stylesheet or in this style block. We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */ Subscribe to our mailing list* indicates required Email Address * First Name Last Name

Picture credit: John Hammink, follow him on twitter