
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Feed aggregator

Distributed Agile: Daily Stand-up Meetings

Stand-ups are best on your feet!

Distributed Agile teams require a different level of care and feeding than a co-located team in order to ensure that they are as effective as possible. This is even truer for a team that is working through its forming-storming-norming process. Core to making Agile-as-framework work effectively are the concepts of team and communication. Daily stand-up meetings are one of the most important communication tools used in Scrum and other Agile/Lean frameworks. Techniques that are effective in making daily stand-ups work for distributed teams include:

  1. Deal with the time zone issue. There are two primary options to deal with time zones. The first is to keep the team members within three or four time zones of each other. Given typical sourcing options, this tends to be difficult. A second option is to rotate the time for the stand-up meeting from sprint to sprint so that everyone loses a similar amount of sleep (the “share the pain” option). One usable solution when distributed teams can’t overlap is to have one team member (on a rotating basis) stay late or come in early so that work times overlap.
  2. Identify and attack blockers between stand-ups. Typically, on distributed teams, not all parties will work at the same time. Team members should be counseled to communicate blockers to the team as soon as they are discovered so that something discovered late in the day in one time zone does not affect the team in a different time zone that might just be starting to work. One group I worked with had stand-ups twice each day (at the beginning of the day and at the end of the day) to ensure continuous communication.
  3. Push status outside the stand-up. A solution suggested by Matt Hauser is to have the team answer the classic three questions (What did you do yesterday? What will you do today? Is there anything blocking your progress?) on a WIKI for everyone on the team to read before the stand-up meeting. This helps focus the meeting on planning or dealing with issues.
  4. Vary the question set being asked. The process of varying the question set keeps the team focused on communication rather than giving a memorized speech. For example, ask:
    1. Is anyone stuck?
    2. Does anyone need help?
    3. What did not get completed yesterday?
    4. Is there anything everyone should know?

This technique can be used for non-distributed teams, as well as distributed teams.

  5. Ensure that everyone is standing. This is code for making sure that everyone is paying attention and staying focused. Standing is just one technique for helping team members stay focused. Others include banning cell phones and side conversations.
  6. Make sure the meeting stays “crisp.” Stand-up meetings are by definition short and to the point. The team needs to ensure that the meeting stays as disciplined as possible. All team members should show up on time and be prepared to discuss their role in the project. Discussion includes the willingness to ask for help and to provide help to team members.
  7. Use a physical status wall. While the term “distributed” screams tool usage, using a physical wall helps to focus the team. The simplicity of a physical wall takes the complexity of tool usage off the table so the focus can be on communication. Using a physical wall in a distributed environment will mean using video to move tasks on the wall (after the fact, a picture can be provided to the team). If video is not available, use a tool that EVERYONE has access to. Keep tools as simple as possible.
  8. Don’t stop doing stand-ups. Stand-up meetings are a critical communication and planning event; not doing stand-ups for a distributed team is an indicator that the organization should go back to project manager/plan-based methods.

Like any other distributed team meeting, having good telecommunication/video tools is not only important, it is a prerequisite. If team members can’t hear each other, they CAN’T communicate.

Stand-ups are nearly ubiquitous in Agile. I would do stand-ups even if I were not doing Agile. However, despite their simplicity, the added complexity of distributed teams can cause problems. The whole team is responsible for making the stand-up meetings work. While the Scrum master may take the lead in ensuring the logistics are right or facilitating the session when needed, everyone needs to play a role.


Categories: Process Management

R: A first attempt at linear regression

Mark Needham - Tue, 09/30/2014 - 23:20

I’ve been working through the videos that accompany the Introduction to Statistical Learning with Applications in R book and thought it’d be interesting to try out the linear regression algorithm against my meetup data set.

I wanted to see how well a linear regression algorithm could predict how many people were likely to RSVP to a particular event. I started with the following code to build a data frame containing some potential predictors:

library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")  # connection to a local Neo4j instance (URL assumed)
# helper used below to convert epoch milliseconds to POSIXct dates (definition assumed from earlier posts)
timestampToDate <- function(x) as.POSIXct(x / 1000, origin = "1970-01-01", tz = "GMT")
officeEventsQuery = "MATCH (g:Group {name: \"Neo4j - London User Group\"})-[:HOSTED_EVENT]->(event)<-[:TO]-({response: 'yes'})<-[:RSVPD]-(),
                           (event)-[:HELD_AT]->(venue)
                     WHERE (event.time + event.utc_offset) < timestamp() AND venue.name IN [\"Neo Technology\", \"OpenCredo\"]
                     RETURN event.time + event.utc_offset AS eventTime,event.announced_at AS announcedAt, event.name, COUNT(*) AS rsvps"
 
events = subset(cypher(graph, officeEventsQuery), !is.na(announcedAt))
events$eventTime <- timestampToDate(events$eventTime)
events$day <- format(events$eventTime, "%A")
events$monthYear <- format(events$eventTime, "%m-%Y")
events$month <- format(events$eventTime, "%m")
events$year <- format(events$eventTime, "%Y")
events$announcedAt<- timestampToDate(events$announcedAt)
events$timeDiff = as.numeric(events$eventTime - events$announcedAt, units = "days")

If we preview ‘events’ it contains the following columns:

> head(events)
            eventTime         announcedAt                                        event.name rsvps       day monthYear month year  timeDiff
1 2013-01-29 18:00:00 2012-11-30 11:30:57                                   Intro to Graphs    24   Tuesday   01-2013    01 2013 60.270174
2 2014-06-24 18:30:00 2014-06-18 19:11:19                                   Intro to Graphs    43   Tuesday   06-2014    06 2014  5.971308
3 2014-06-18 18:30:00 2014-06-08 07:03:13                         Neo4j World Cup Hackathon    24 Wednesday   06-2014    06 2014 10.476933
4 2014-05-20 18:30:00 2014-05-14 18:56:06                                   Intro to Graphs    53   Tuesday   05-2014    05 2014  5.981875
5 2014-02-11 18:00:00 2014-02-05 19:11:03                                   Intro to Graphs    35   Tuesday   02-2014    02 2014  5.950660
6 2014-09-04 18:30:00 2014-08-26 06:34:01 Hands On Intro to Cypher - Neo4j's Query Language    20  Thursday   09-2014    09 2014  9.497211

We want to predict ‘rsvps’ from the other columns so I started off by creating a linear model which took all the other columns into account:

> summary(lm(rsvps ~., data = events))
 
Call:
lm(formula = rsvps ~ ., data = events)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-8.2582 -1.1538  0.0000  0.4158 10.5803 
 
Coefficients: (14 not defined because of singularities)
                                                                    Estimate Std. Error t value Pr(>|t|)   
(Intercept)                                                       -9.365e+03  3.009e+03  -3.113  0.00897 **
eventTime                                                          3.609e-06  2.951e-06   1.223  0.24479   
announcedAt                                                        3.278e-06  2.553e-06   1.284  0.22339   
event.nameGraph Modelling - Do's and Don'ts                        4.884e+01  1.140e+01   4.286  0.00106 **
event.nameHands on build your first Neo4j app for Java developers  3.735e+01  1.048e+01   3.562  0.00391 **
event.nameHands On Intro to Cypher - Neo4j's Query Language        2.560e+01  9.713e+00   2.635  0.02177 * 
event.nameIntro to Graphs                                          2.238e+01  8.726e+00   2.564  0.02480 * 
event.nameIntroduction to Graph Database Modeling                 -1.304e+02  4.835e+01  -2.696  0.01946 * 
event.nameLunch with Neo4j's CEO, Emil Eifrem                      3.920e+01  1.113e+01   3.523  0.00420 **
event.nameNeo4j Clojure Hackathon                                 -3.063e+00  1.195e+01  -0.256  0.80203   
event.nameNeo4j Python Hackathon with py2neo's Nigel Small         2.128e+01  1.070e+01   1.989  0.06998 . 
event.nameNeo4j World Cup Hackathon                                5.004e+00  9.622e+00   0.520  0.61251   
dayTuesday                                                         2.068e+01  5.626e+00   3.676  0.00317 **
dayWednesday                                                       2.300e+01  5.522e+00   4.165  0.00131 **
monthYear01-2014                                                  -2.350e+02  7.377e+01  -3.185  0.00784 **
monthYear02-2013                                                  -2.526e+01  1.376e+01  -1.836  0.09130 . 
monthYear02-2014                                                  -2.325e+02  7.763e+01  -2.995  0.01118 * 
monthYear03-2013                                                  -4.605e+01  1.683e+01  -2.736  0.01805 * 
monthYear03-2014                                                  -2.371e+02  8.324e+01  -2.848  0.01468 * 
monthYear04-2013                                                  -6.570e+01  2.309e+01  -2.845  0.01477 * 
monthYear04-2014                                                  -2.535e+02  8.746e+01  -2.899  0.01336 * 
monthYear05-2013                                                  -8.672e+01  2.845e+01  -3.049  0.01011 * 
monthYear05-2014                                                  -2.802e+02  9.420e+01  -2.975  0.01160 * 
monthYear06-2013                                                  -1.022e+02  3.283e+01  -3.113  0.00897 **
monthYear06-2014                                                  -2.996e+02  1.003e+02  -2.988  0.01132 * 
monthYear07-2014                                                  -3.123e+02  1.054e+02  -2.965  0.01182 * 
monthYear08-2013                                                  -1.326e+02  4.323e+01  -3.067  0.00976 **
monthYear08-2014                                                  -3.060e+02  1.107e+02  -2.763  0.01718 * 
monthYear09-2013                                                          NA         NA      NA       NA   
monthYear09-2014                                                  -3.465e+02  1.164e+02  -2.976  0.01158 * 
monthYear10-2012                                                   2.602e+01  1.959e+01   1.328  0.20886   
monthYear10-2013                                                  -1.728e+02  5.678e+01  -3.044  0.01020 * 
monthYear11-2012                                                   2.717e+01  1.509e+01   1.800  0.09704 . 
month02                                                                   NA         NA      NA       NA   
month03                                                                   NA         NA      NA       NA   
month04                                                                   NA         NA      NA       NA   
month05                                                                   NA         NA      NA       NA   
month06                                                                   NA         NA      NA       NA   
month07                                                                   NA         NA      NA       NA   
month08                                                                   NA         NA      NA       NA   
month09                                                                   NA         NA      NA       NA   
month10                                                                   NA         NA      NA       NA   
month11                                                                   NA         NA      NA       NA   
year2013                                                                  NA         NA      NA       NA   
year2014                                                                  NA         NA      NA       NA   
timeDiff                                                                  NA         NA      NA       NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 5.287 on 12 degrees of freedom
Multiple R-squared:  0.9585,	Adjusted R-squared:  0.8512 
F-statistic: 8.934 on 31 and 12 DF,  p-value: 0.0001399

As I understand it, we can look at the R-squared values to understand how much of the variance in the data has been explained by the model – in this case the adjusted R-squared tells us it’s about 85%.
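
(Side note, not from the original post: these numbers can also be pulled out of the fitted model programmatically rather than read off the printed summary. A small sketch, assuming the events data frame built above.)

model <- lm(rsvps ~ ., data = events)
s <- summary(model)
s$r.squared       # proportion of variance explained by the model
s$adj.r.squared   # adjusted for the number of predictors (~0.85 here)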

A lot of the coefficients seem to be based around specific event names, which seems a bit too specific to me, so I wanted to see what would happen if I derived a feature which indicated whether a session was practical:

events$practical = grepl("Hackathon|Hands on|Hands On", events$event.name)

We can now run the model again with the new column, having excluded the ‘event.name’ field:

> summary(lm(rsvps ~., data = subset(events, select = -c(event.name))))
 
Call:
lm(formula = rsvps ~ ., data = subset(events, select = -c(event.name)))
 
Residuals:
    Min      1Q  Median      3Q     Max 
-18.647  -2.311   0.000   2.908  23.218 
 
Coefficients: (13 not defined because of singularities)
                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)      -3.980e+03  4.752e+03  -0.838   0.4127  
eventTime         2.907e-06  3.873e-06   0.751   0.4621  
announcedAt       3.336e-08  3.559e-06   0.009   0.9926  
dayTuesday        7.547e+00  6.080e+00   1.241   0.2296  
dayWednesday      2.442e+00  7.046e+00   0.347   0.7327  
monthYear01-2014 -9.562e+01  1.187e+02  -0.806   0.4303  
monthYear02-2013 -4.230e+00  2.289e+01  -0.185   0.8553  
monthYear02-2014 -9.156e+01  1.254e+02  -0.730   0.4742  
monthYear03-2013 -1.633e+01  2.808e+01  -0.582   0.5676  
monthYear03-2014 -8.094e+01  1.329e+02  -0.609   0.5496  
monthYear04-2013 -2.249e+01  3.785e+01  -0.594   0.5595  
monthYear04-2014 -9.230e+01  1.401e+02  -0.659   0.5180  
monthYear05-2013 -3.237e+01  4.654e+01  -0.696   0.4952  
monthYear05-2014 -1.015e+02  1.509e+02  -0.673   0.5092  
monthYear06-2013 -3.947e+01  5.355e+01  -0.737   0.4701  
monthYear06-2014 -1.081e+02  1.604e+02  -0.674   0.5084  
monthYear07-2014 -1.110e+02  1.678e+02  -0.661   0.5163  
monthYear08-2013 -5.144e+01  6.988e+01  -0.736   0.4706  
monthYear08-2014 -1.023e+02  1.784e+02  -0.573   0.5731  
monthYear09-2013 -6.057e+01  7.893e+01  -0.767   0.4523  
monthYear09-2014 -1.260e+02  1.874e+02  -0.672   0.5094  
monthYear10-2012  9.557e+00  2.873e+01   0.333   0.7430  
monthYear10-2013 -6.450e+01  9.169e+01  -0.703   0.4903  
monthYear11-2012  1.689e+01  2.316e+01   0.729   0.4748  
month02                  NA         NA      NA       NA  
month03                  NA         NA      NA       NA  
month04                  NA         NA      NA       NA  
month05                  NA         NA      NA       NA  
month06                  NA         NA      NA       NA  
month07                  NA         NA      NA       NA  
month08                  NA         NA      NA       NA  
month09                  NA         NA      NA       NA  
month10                  NA         NA      NA       NA  
month11                  NA         NA      NA       NA  
year2013                 NA         NA      NA       NA  
year2014                 NA         NA      NA       NA  
timeDiff                 NA         NA      NA       NA  
practicalTRUE    -9.388e+00  5.289e+00  -1.775   0.0919 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 10.21 on 19 degrees of freedom
Multiple R-squared:  0.7546,	Adjusted R-squared:  0.4446 
F-statistic: 2.434 on 24 and 19 DF,  p-value: 0.02592

Now we’re only accounting for 44% of the variance and none of our coefficients are significant, so this wasn’t such a good change.

I also noticed that we’ve got a bit of overlap in the date-related features – we’ve got one column for monthYear and then separate ones for month and year. Let’s strip out the combined one:

> summary(lm(rsvps ~., data = subset(events, select = -c(event.name, monthYear))))
 
Call:
lm(formula = rsvps ~ ., data = subset(events, select = -c(event.name, 
    monthYear)))
 
Residuals:
     Min       1Q   Median       3Q      Max 
-16.5745  -4.0507  -0.1042   3.6586  24.4715 
 
Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -1.573e+03  4.315e+03  -0.364   0.7185  
eventTime      3.320e-06  3.434e-06   0.967   0.3425  
announcedAt   -2.149e-06  2.201e-06  -0.976   0.3379  
dayTuesday     4.713e+00  5.871e+00   0.803   0.4294  
dayWednesday  -2.253e-01  6.685e+00  -0.034   0.9734  
month02        3.164e+00  1.285e+01   0.246   0.8075  
month03        1.127e+01  1.858e+01   0.607   0.5494  
month04        4.148e+00  2.581e+01   0.161   0.8736  
month05        1.979e+00  3.425e+01   0.058   0.9544  
month06       -1.220e-01  4.271e+01  -0.003   0.9977  
month07        1.671e+00  4.955e+01   0.034   0.9734  
month08        8.849e+00  5.940e+01   0.149   0.8827  
month09       -5.496e+00  6.782e+01  -0.081   0.9360  
month10       -5.066e+00  7.893e+01  -0.064   0.9493  
month11        4.255e+00  8.697e+01   0.049   0.9614  
year2013      -1.799e+01  1.032e+02  -0.174   0.8629  
year2014      -3.281e+01  2.045e+02  -0.160   0.8738  
timeDiff              NA         NA      NA       NA  
practicalTRUE -9.816e+00  5.084e+00  -1.931   0.0645 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 10.19 on 26 degrees of freedom
Multiple R-squared:  0.666,	Adjusted R-squared:  0.4476 
F-statistic: 3.049 on 17 and 26 DF,  p-value: 0.005187

Again, none of the coefficients are statistically significant, which is disappointing. I think the main problem may be that I have very few data points (only 42), making it difficult to come up with a general model.
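
(An aside that is not in the original post: with so few rows, one cheap way to get a feel for out-of-sample error is leave-one-out cross-validation. The sketch below uses only the numeric/logical predictors, so that factor levels appearing in a single row don’t break prediction; it is an illustration, not the author’s code.)

# Leave-one-out cross-validation on a stripped-down model (illustration only)
loocvData <- subset(events, select = c(rsvps, timeDiff, practical))
errors <- sapply(1:nrow(loocvData), function(i) {
  fit <- lm(rsvps ~ ., data = loocvData[-i, ])
  loocvData$rsvps[i] - predict(fit, newdata = loocvData[i, ])
})
sqrt(mean(errors^2))  # root mean squared prediction error, in RSVPs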

I think my next step is to look for some other features that could impact the number of RSVPs e.g. other events on that day, the weather.

I’m a novice at this but trying to learn more so if you have any ideas of what I should do next please let me know.

Categories: Programming

Neo4j: Generic/Vague relationship names

Mark Needham - Tue, 09/30/2014 - 17:47

An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.

Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.

However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.

I created 4 different databases which had one node with 60,000 outgoing relationships – 10,000 which we wanted to retrieve and 50,000 that were irrelevant.

I modelled the ‘relationship’ in 4 different ways…

  • Using a specific relationship type
    (node)-[:HAS_ADDRESS]->(address)
  • Using a generic relationship type and then filtering by end node label
    (node)-[:HAS]->(address:Address)
  • Using a generic relationship type and then filtering by relationship property
    (node)-[:HAS {type: "address"}]->(address)
  • Using a generic relationship type and then filtering by end node property
    (node)-[:HAS]->(address {type: "address"})

…and then measured how long it took to retrieve the ‘has address’ relationships.

The code is on github if you want to take a look.

Although it’s obviously not as precise as a JMH micro benchmark I think it’s good enough to get a feel for the difference between the approaches.

I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):

Using a generic relationship type and then filtering by end node label
50%ile: 6.0    75%ile: 6.0    99%ile: 402.60999999999825
 
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0   75%ile: 22.0   99%ile: 504.85999999999785
 
Using a generic relationship type and then filtering by end node property
50%ile: 4.0    75%ile: 4.0    99%ile: 145.65999999999931
 
Using a specific relationship type
50%ile: 0.0    75%ile: 1.0    99%ile: 25.749999999999872
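
(Not part of the original post: a rough way to reproduce this kind of timing from R, reusing the RNeo4j package seen in the previous post. The connection URL is an assumption, and the query shown is the specific-relationship variant; the author’s actual benchmark is in the linked GitHub repository.)

library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")  # local Neo4j instance assumed

query = "MATCH (n) WHERE id(n) = 0 MATCH (n)-[:HAS_ADDRESS]->() RETURN COUNT(n)"

# run the query 100 times and record the elapsed time of each run in milliseconds
times = sapply(1:100, function(i) {
  start = as.numeric(Sys.time())
  invisible(cypher(graph, query))
  (as.numeric(Sys.time()) - start) * 1000
})

quantile(times, probs = c(0.5, 0.75, 0.99))  # 50th, 75th and 99th percentiles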

We can drill further into why there’s a difference in the times for each of the approaches by profiling the equivalent cypher query. We’ll start with the one which uses a specific relationship type:

Using a specific relationship type

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +SimplePatternMatcher
      |
      +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+-----------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                 Other |
+----------------------+-------+--------+-----------------------------+-----------------------+
|         ColumnFilter |     1 |      0 |                             | keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                       |
| SimplePatternMatcher | 10000 |  10000 | n,   UNNAMED53,   UNNAMED35 |                       |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |          {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+-----------------------+
 
Total database accesses: 10001

Here we can see that there were 10,001 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.

By contrast the other approaches have to load in a lot more data only to then filter it out:

Using a generic relationship type and then filtering by end node label

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+----------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                            Other |
+----------------------+-------+--------+-----------------------------+----------------------------------+
|         ColumnFilter |     1 |      0 |                             |            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                  |
|               Filter | 10000 |  10000 |                             | hasLabel(  UNNAMED45:Address(0)) |
| SimplePatternMatcher | 10000 |  60000 | n,   UNNAMED45,   UNNAMED35 |                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+----------------------------------+
 
Total database accesses: 70001

Using a generic relationship type and then filtering by relationship property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED35,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED63,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
 
Total database accesses: 140001

Using a generic relationship type and then filtering by end node property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n);
+----------+
| count(n) |
+----------+
| 10000    |
+----------+
1 row
 
ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +SimplePatternMatcher
        |
        +NodeByIdOrEmpty
 
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED45,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED45,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
 
Total database accesses: 140001

So in summary…specific relationships #ftw!

Categories: Programming

Sudoku, Linear Optimization, and the Ten Cent Diet

Google Code Blog - Tue, 09/30/2014 - 17:12
Originally posted on the Google Research blog. Cross-posted on the Google Apps Developers blog.

In 1945, future Nobel laureate George Stigler wrote an essay in the Journal of Farm Economics titled The Cost of Subsistence about a seemingly simple problem: how could a soldier be fed for as little money as possible?

The “Stigler Diet” became a classic problem in the then-new field of linear optimization, which is used today in many areas of science and engineering. Any time you have a set of linear constraints such as “at least 50 square meters of solar panels” or “the amount of paint should equal the amount of primer” along with a linear goal (e.g., “minimize cost” or “maximize customers served”), that’s a linear optimization problem.
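
(As a toy illustration that is not part of the original post: here is a minimal diet-style linear program in R using the lpSolve package rather than Glop; every number below is made up for illustration.)

# Minimise cost subject to minimum nutrient requirements (illustrative numbers only)
library(lpSolve)

cost <- c(0.36, 0.14, 0.20)            # cost per unit of three hypothetical foods

nutrients <- matrix(c(400, 200, 150,   # calories per unit of each food
                       10,   2,   8),  # grams of protein per unit of each food
                    nrow = 2, byrow = TRUE)

result <- lp(direction = "min",
             objective.in = cost,
             const.mat = nutrients,
             const.dir = c(">=", ">="),
             const.rhs = c(2000, 55))  # at least 2000 calories and 55 g of protein

result$solution  # units of each food in the cheapest diet
result$objval    # cost of that diet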

At Google, our engineers work on plenty of optimization problems. One example is our YouTube video stabilization system, which uses linear optimization to eliminate the shakiness of handheld cameras. A more lighthearted example is in the Google Docs Sudoku add-on, which instantaneously generates and solves Sudoku puzzles inside a Google Sheet, using the SCIP mixed integer programming solver to compute the solution.

Today we’re proud to announce two new ways for everyone to solve linear optimization problems. First, you can now solve linear optimization problems in Google Sheets with the Linear Optimization add-on written by Google Software Engineer Mihai Amarandei-Stavila. The add-on uses Google Apps Script to send optimization problems to Google servers. The solutions are displayed inside the spreadsheet. For developers who want to create their own applications on top of Google Apps, we also provide an API to let you call our linear solver directly.

Second, we’re open-sourcing the linear solver underlying the add-on: Glop (the Google Linear Optimization Package), created by Bruno de Backer with other members of the Google Optimization team. It’s available as part of the or-tools suite and we provide a few examples to get you started. On that page, you’ll find the Glop solution to the Stigler diet problem. (A Google Sheets file that uses Glop and the Linear Optimization add-on to solve the Stigler diet problem is available here. You’ll need to install the add-on first.)

Stigler posed his problem as follows: given nine nutrients (calories, protein, Vitamin C, and so on) and 77 candidate foods, find the foods that could sustain soldiers at minimum cost.

The Simplex algorithm for linear optimization was two years away from being invented, so Stigler had to do his best, arriving at a diet that cost $39.93 per year (in 1939 dollars), or just over ten cents per day. Even that wasn’t the cheapest diet. In 1947, Jack Laderman used Simplex, nine calculator-wielding clerks, and 120 person-days to arrive at the optimal solution.

Glop’s Simplex implementation solves the problem in 300 milliseconds. Unfortunately, Stigler didn’t include taste as a constraint, and so the poor hypothetical soldiers will eat nothing but the following, ever:

  • Enriched wheat flour
  • Liver
  • Cabbage
  • Spinach
  • Navy beans

Is it possible to create an appealing dish out of these five ingredients? Google Chef Anthony Marco took it as a challenge, and we’re calling the result Foie Linéaire à la Stigler:
This optimal meal consists of seared calf liver dredged in flour, atop a navy bean purée with marinated cabbage and a spinach pesto.

Chef Marco reported that the most difficult constraint was making the dish tasty without butter or cream. That said, I had the opportunity to taste our linear optimization solution, and it was delicious.

Posted by Jon Orwant, Engineering Manager
Categories: Programming

Episode 211: Continuous Delivery on Windows with Rachel Laycock and Max Lincoln

Johannes talks with Rachel Laycock and Max Lincoln from ThoughtWorks about continuous delivery on Windows. The outline includes: introduction to continuous delivery; continuous integration; DevOps and ChatOps; decisions to be taken when implementing continuous delivery on Windows; build tools on Windows; packaging and deployment on Windows; infrastructure automation and infrastructure as code with Chef, Puppet […]
Categories: Programming

Just Enough

Software Requirements Blog - Seilevel.com - Tue, 09/30/2014 - 17:00
One concept you’ll hear tossed about in an Agile discussion is that of “just enough.” You want just enough documentation, just enough development and testing, just enough time for meetings, just enough grooming, and so on. The idea is that doing more than is needed means you have throwaway work when you need to make […]
Categories: Requirements

Sponsored Post: Apple, Flipboard, All Your Base, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Siri Operations Developer. Apple is looking for talented developers to help build the next generation internal cloud platform for Siri. This person should be excited about solving difficult distributed systems problems as well as constantly improving user-experience. This person will be working with a highly technical and motivated team solving the hard problems. Please apply here.
    • Site Reliability Engineer. The Apple Pay Site Reliability Team is hiring for multiple roles focused on the front line customer experience and the back end integration of Apple systems with our Network and Banking partners. Please apply here.
    • Senior Software Engineer, iTunes Infrastructure. Hands-on senior software engineering for the iTunes digital media supply chain engineering team. We are looking for a self starting, energetic individual who is not afraid to question assumptions and with excellent written and oral communication skills. Please apply here
    • iTunes - Content Management Tools Engineer. The candidate should have several years experience developing large-scale web-based applications using object-oriented languages. Excellent understanding of relational databases and data-modeling techniques is also a must. Please apply here
    • Senior Engineer: Mobile Services. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here.
    • Apple Pay Automation Engineer. The Apple Pay group within iOS Systems is looking for an outstanding automation engineer with strong experience in building client and server test automation. We work in an agile software development environment and are building infrastructure to move towards continuous delivery where every code change is thoroughly tested at the push of a button and is considered ready to be deployed if we choose so. Please apply here

  • Flipboard's Site Reliability Engineering Team is hiring! This team offers great challenges solving unique problems unlike any you have seen!  They work exclusively in the cloud, ensuring a highly available and performant product to millions of users daily.  If you have a passion for large-scale systems, next generation provisioning and orchestration tools apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • All Your Base is the only curated database conference of its kind in the UK. Listen to talks from database creators, industry leaders and developers working at the coal face on where to store and how to handle your data. Book tickets.
Cool Products and Services
  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Better, Faster, Cheaper: Pick Three. Scalyr is your universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs”; our columnar data store enables enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – get on board!

  • Are You a Startup in need of Reliable Speed at Scale? The Aerospike Startup Special provides free access to the Aerospike Enterprise Edition software with Community Support for qualifying startups. Learn more and see if you qualify! http://www.aerospike.com/blog/special-startups/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

What Can Lean Learn From Systems Engineering?

Herding Cats - Glen Alleman - Tue, 09/30/2014 - 16:33

The Lean Aerospace Initiative and the Lean Aerospace Initiative Consortium define processes applicable in many domains for applying lean. At first glance there is no natural connection between Lean and Systems Engineering. The ideas below are from a paper I gave at a Lean conference.

Key Takeaways

  • Lean and Systems Engineering are cousins.
  • All but trivial projects are systems, and many are systems of systems. Thinking like a systems engineer is the basis of implementing Lean processes. Thinking in the absence of systems does little to add sustaining value to any process improvement.
  • Product development is a value stream process, but how the components interact at the technical, business, financial, and operational levels is a systems engineering process. Lean itself does not possess the vocabulary to speak to these systems complexity issues. [1]

Core Concepts of Systems Engineering

  1. Capture and understand the requirements in terms of Capabilities assessed through Measures of Effectiveness (MOE) and Measures of Performance (MOP).
  2. Ensure requirements are consistent with what is predicted to be possible in a solution in these MOEs and MOPs.
  3. Treat goals as desired characteristics for what may not be possible.
  4. Define the MOE, MOP, goals, and solutions for the whole lifecycle of the project in units meaningful to the buyer.
  5. Maintain the distinction between the statement of the problem and the description of the solution.
  6. Baseline each statement of the problem and the statement of the solution.
  7. Identify descriptions of alternative solutions.
  8. Develop descriptions of the solution.
  9. Except for simple problems, develop a logical solution description.
  10. Be prepared to iterate in design to drive up effectiveness.
  11. Base the solution on the evaluation of its effectiveness, in units of measure meaningful to the buyer.
  12. Independently verify all work products.
  13. Validate all work products from the perspective of the stakeholders.
  14. Some management is needed to plan and implement effective and efficient transformation of requirements and goals into a description of the solution.

Typical System Engineering Activities

  1. Technical management
  2. System design
  3. Product realization
  4. Technical analysis and evaluation
  5. Product control
  6. Process control
  7. Post implementation support

Steps to Lean Thinking [2]

  1. Specify value
  2. Identify value stream
  3. Make value flow continuously
  4. Let customers pull value
  5. Pursue perfection

Differences and Similarities between Lean and Systems Engineering

  1. Both emerged from practice. Only later were the principles and theories codified.
  2. Both have focused on different phases of the product lifecycle. SE focuses generally on product development and is more oriented toward planning, while Lean focuses generally on product production and is more oriented toward empirical action.
  3. Unlike Lean, SE has less focus on quality, except for Integrated Product and Process Development (IPPD).

Despite these differences and similarities both Lean and Systems Engineering are focused on the same objectives – delivering products or lifecycle value to the stakeholders.

It is the lifecycle value that drives both paradigms and must drive any other process paradigm associated with Lean and Systems Engineering – paradigms like software development, the management of any form of project, and the very notion of agile. A critical understanding often missed is that lifecycle value includes the cost of delivering that value.

Value can’t be determined in the absence of knowing the cost. ROI and the microeconomics of decision making require both variables to be used to make decisions.

What do we mean by lifecycle?

Generally, lifecycle is a combination of product performance, quality, cost, and fulfillment of the buyer’s needed capabilities. [3]

Lean and Systems Engineering share this common goal. The more complex the system, the more contribution there is from Lean and SE.

Putting Lean and Systems Engineering Together on Real Projects

First, some success factors on complex projects [4]:

  1. Dedicated and stable interdisciplinary teams
  2. Use of prototypes and models to generate tradeoffs
  3. Prioritizing product features
  4. Engagement with senior management and customers at every point in the project
  5. Some form of high-performing front-end decision process that reduces instability of key inputs and improves the flow of work throughout the product lifecycle.

This last success factor is core to any complex environment, no matter what the process is called. In the absence of stability of requirements and funding, improvements to the flow of work are constrained.

The notion of adapting to changing requirements is not the same as having the requirements – and the associated funding – be unstable.

Mapping of the Value Stream to the work process requires some level of stability. It is the search for this stability where Systems Engineering – as a paradigm – adds measurable value to any Lean initiative.

The standardization and commonality of processes across complex systems is the basis for this value. [5]

Conclusions

  1. Lean and SE are two sides of the same coin regarding the objective of creating value for the stakeholder.
  2. Lean and SE complement each other during different phases of the project – ideation and product trades for SE, and production waste removal for Lean, anchor both ends of the spectrum of improvement opportunities.
  3. Value stream thinking makes visible the paths to be taken in transitioning to a Lean paradigm while maintaining the principles of systems engineering. [6]
  4. The result is the combination of Speed and Robustness – systems are easily adaptable to change while producing fewer surprises, using leading indicators to make decisions, and decreasing sensitivity to production and use variables.

[1] “The Lean Enterprise – A Management Philosophy at Lockheed Martin,” Joyce and Schechter, Defense Acquisition Review Journal, 2004.

[2] Lean Thinking, Womack and Jones, Simon and Schuster, 1996

[3] Lean Enterprise Value: Insights from MIT’s Lean Aerospace Initiative, Murman, et al, Palgrave 2002.

[4] “Lean Systems Engineering: Research Initiatives in Support of a New Paradigm,” Rebentisch, Rhodes, and Murman, Conference on Systems Engineering, April 2004.

[5] LM21 Best Practices, Jack Hugus, National Security Studies, Louis A. Bantle Symposium, Syracuse University Maxwell School, October 1999

[6] “Enterprise Transition to Lean Roadmap,” MIT Lean Aerospace Initiative, 2004 Plenary Conference.

 

Categories: Project Management

Handling Requests for Unnecessary Artifacts

Mike Cohn's Blog - Tue, 09/30/2014 - 15:00

The following was originally published in Mike Cohn's monthly newsletter. If you like what you're reading, sign up to have this content delivered to your inbox weeks before it's posted on the blog, here.

“Working software over comprehensive documentation.” You’ve certainly seen that statement on the Agile Manifesto. It is perhaps the most important of the Manifesto’s four value statements—working software is, after all, the reason a team has undertaken a software development effort.

It is also one of the most misused parts of the Manifesto. This is the quote people cite when trying to get out of all documentation, which is not what the Manifesto says we should value.

Some documentation on a project can be great. But most non-agile teams write too much and talk too little. Some agile teams go to the opposite extreme, but many seem to find a good balance.

Occasionally, though, a team and product owner may disagree on the necessity of a document—usually with the product owner wanting a document and the team saying it’s not necessary. I’ve found two guidelines helpful in determining how to handle requests for various artifacts, especially documentation, on an agile project.

Guideline No. 1: If a team would produce an artifact while in the process of creating working software, that artifact is just naturally produced.

This guideline covers essentially everything a team would want to produce while on the way to building a system or product. It includes, for example, source code. It also includes any design documents, user guides and other items that the team wants to produce for the benefit of the current team, future teams maintaining the software or end users.

Guideline No. 2: If an artifact would not naturally be produced in the process of creating working software, the artifact is added to the product backlog.

The second guideline is there to cover cases when the product owner (or any other outside stakeholder) wants an artifact produced (usually a document) that the team would not normally produce.

For example, suppose the product owner asks the team to write a document describing every table and field in the database. I’ve certainly seen projects where such a document has been extremely helpful. (In fact, I’ve both requested and written such a document before.) But I’ve also seen projects where this would have been unnecessary.

If the team thought this database description document were helpful, they would produce it in the process of creating the working software. And Guideline No. 1 would apply. But if they don’t think this document is necessary, they won’t produce it. Unless, that is, the product owner insists, which is where Guideline No. 2 comes in.

If the product owner wants this document, the product owner creates a new product backlog item saying so. The team can then estimate the time it will take to develop this document, just like they’d estimate any other product backlog item.

Putting an estimate on creating the document makes its cost explicit. This forces a product owner to think about the opportunity cost of developing that document. The product owner will be able to ponder: This five-point document or five points worth of new features?

I don’t know which the product owner will choose, but this approach makes the cost of that artifact explicit, allowing it to be compared with the value of additional features instead.

I’d love to know your thoughts on this. How does your team handle product owner requests for artifacts the team wouldn’t naturally produce? What artifacts does your team find helpful? Please share your thoughts in the discussion below.

Is Agile Dead or Can Good Software Development Scale?

From the Editor of Methods & Tools - Tue, 09/30/2014 - 13:23
As Agile becomes widely accepted as a software development approach, many large organizations have adopted it, mainly in its Scrum form, to reduce development cycle time. There might even be a fair share of adopters that are really trying to apply Agile values. The topic of scaling Agile has been discussed for many years, and you can read the excellent books of Craig Larman and Bas Vodde on this topic. We have also recently seen the emergence of “proprietary” approaches, like SAFe, to achieve this goal. At the same time, ...

Distributed Agile: Sprint Planning

#6 Make sure the telecommunications tools work.

In Distributed Agile: Distributed Team Degree of Difficulty Matrix, I described the many flavors of distributed Agile teams and the complexity different configurations create. All other things being equal, distributed teams are less effective than co-located teams. Nevertheless, distributed Agile, with teams spread across countries, continents and companies, has become a fact of life. There are techniques to help distributed Agile teams become more effective. In an environment using Scrum, the first formal activity for most teams is sprint planning. There are numerous techniques that can help make distributed sprint planning more effective. These techniques include:

1.   Bring the team physically together. Co-location, whether for a single sprint or some periodic basis, will increase the team’s ability to understand each other and know how to work together more effectively.
2.   Develop a sprint planning checklist. The process of getting together and planning is a fairly predictable process. Capture the typical preparation and meeting tasks and make sure they happen. Items can include booking rooms, securing video or telecom facilities, publishing an agenda with breaks and more.
3.   Review the definition of done. Ensure that everyone understands the organization’s definition of done before starting to plan. The definition of done will help the team know the tasks they need to complete during the sprint to meet the organization’s (or product owner’s) process standards.
4.   Focus on the stories. Don’t let distractions get in the way of planning. Before beginning the planning session, review the process that will be followed with the entire team. Make sure that planning the next sprint is the only topic on the agenda.
5.   Ensure that the stories have been properly groomed. The stories that the team will accept and plan need to be properly formed and have acceptance criteria. This generally means that the stories that are most apt to be accepted by the team (and a few more) need to have been through a grooming session. Make this a prerequisite for the planning meeting.
6.   Make sure the telecommunications tools work and have a backup. Distributed planning means that all of the team will be using the phone or video conference. Make sure the tools are set up and tested. Also, always have a backup plan in case your favorite collaboration tool fails, because sooner or later it will. Planning is a whole-team activity, and when the whole team can’t participate in planning, it loses effectiveness.
7.   Everyone should understand the big picture. Have the product owner provide an overview of the goals of the project, and how the current sprint will support those goals. Repeating the big picture will provide the team with a common touch point to validate progress.
8.   Use physical tools for interaction. Physical tools, like flip charts and card walls, can be difficult when many locations are involved in sprint planning. However, when possible, use physical tools like flip charts and whiteboards, and then use webcams (preferably) or cameras to share data. Have one location scribe one story and then switch locations for the next story.
9.   Try multiple facilitators. When a team is evenly distributed between two locations, consider having another Scrum master act as a second facilitator to ensure everyone stays on track. Similarly, have the Scrum master rotate between locations to facilitate the planning session. This can be very effective in helping each location feel connected.
10. Remember that sprint planning is a team meeting. Make sure everyone is involved.

Sprint planning, done well, helps a team understand what they have to do in order to consider a story complete, both from a functional and a technical perspective. Distributed Agile teams will need to focus on making sure that everyone is involved and a part of the planning process. Remember to plan for planning, because when you are on the other end of a phone or video conference, the tools, process and logistics can make or break the meeting!


Categories: Process Management

The Cognitive Illusion of Bad Software Project Outcomes

Herding Cats - Glen Alleman - Tue, 09/30/2014 - 00:41

In their paper On the Reality of Cognitive Illusions ‡, Daniel Kahneman and Amos Tversky suggest, through their research, that intuitive predictions and judgements are often mediated by a small number of distinctive mental operations, called judgement heuristics. These heuristics often lead to characteristic errors and biases.

For example, the effect of aerial perspective on apparent distance is confirmed by the observation that the same mountain appears closer on a clear day than on a hazy day. Intuitive prediction and judgement of probability are often based on the relations of similarity between evidence and possible outcomes. This representativeness is an assessment of the degree of correspondence between a sample and a population.

The next heuristic is the availability bias, in which probability is estimated by assessing availability or associative distance. † Experience shows, and experiments confirm, that instances of large classes are recalled better and faster than instances of less frequent classes, that likely occurrences are easier to imagine than unlikely ones, and that associative connections are strengthened when two events frequently co-occur. That these associative bonds are strengthened by repetition is the basis of memory.

So Here's the Issue

When we hear or read that software projects fail often, that the Standish report says ..., or a personal anecdote that resonates with our own experience, we recall that experience from memory. The actual data from the population of all data are not used for comparison. Rather we assume - by applying the cognitive illusion - that the sample data represents the large class of population data, since our repeated observations of the sample data class have reinforced our illusion that that sample data IS the population.

This is the core issue with anecdotal information when making decisions in the presence of uncertainty. Or speaking about a condition in the absence of a statistically testable hypothesis. Or attempting to convey a message in the absence of external confirmation that the message is on solid footing compared to the population of data.

Why This Is Not Good Management

When we hear we're all bad at making estimates, in the absence of actual population statistics about estimate making, we're using Cognitive Illusions and Availability Heuristics, because we have personal experience with making bad estimates and the majority of people we associate with have the same experience.

This experience is in no way representative of the population of people tasked with making estimates. This would be irrelevant, of course, if the conversation were simple chatter at the bar. But once that conversation enters the realm of policy making, method development, or suggestions that the anecdotal observations need to result in changing how business conducts its business - we're bad at making estimates so the solution is to stop making estimates - then both availability bias and Cognitive Illusions have displaced the actual conversation about the very validity of the anecdotal concepts. It is replaced by a strong defense of the cognitively biased idea, no matter the credibility of the concept - which is most often weak at best and simply false at worst.

So next time you hear some statement about something involving observational and anecdotal data, ask a simple question.

What's the process by which these anecdotal observations have been tested in the broader population of conditions?

This is the core issue with the Standish Reports. They are self-selected samples of troubled projects, gathered in the absence of the full population of projects, both troubled and not troubled.
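
To make the sampling point concrete, here is a minimal R sketch - purely illustrative, with invented numbers rather than data from any real survey - of how a self-selected sample can report a failure rate far higher than the one in the full population:

set.seed(42)

# Assume, for illustration only, that 20% of all projects are troubled.
troubled <- runif(10000) < 0.20

# Assume troubled projects are five times more likely to answer the survey.
response_rate <- ifelse(troubled, 0.50, 0.10)
responded <- runif(10000) < response_rate

mean(troubled)             # population rate: roughly 0.20
mean(troubled[responded])  # rate among respondents: roughly 0.55

The respondents answer honestly, yet the sample rate is almost three times the population rate, simply because the sample was never drawn from the whole population.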

Always ask for references, data representative of the references, and an assessment of the statistical confidence that the anecdotal data is in fact correlated with the population data. Otherwise it's just an opinion, and very likely an uninformed opinion.

And if you're paying money to listen to someone tell you anecdotal data and don't speak up and ask those questions, you've participated in the availability heuristic and cognitive illusion along with the speaker.

† Availability: A Heuristic for Judging Frequency and Probability, Amos Tversky and Daniel Kahneman, Cognitive Psychology, Vol. 5, pp. 207-232, 1973

‡ On the Reality of Cognitive Illusions, Daniel Kahneman and Amos Tversky, Psychological Review, Vol. 103, No. 3, pp. 582-591, 1996

Categories: Project Management

PostgreSQL: ERROR: column does not exist

Mark Needham - Mon, 09/29/2014 - 23:40

I’ve been playing around with PostgreSQL recently and in particular the Northwind dataset typically used as an introductory data set for relational databases.

Having imported the data I wanted to take a quick look at the employees table:

postgres=# SELECT * FROM employees LIMIT 1;
 EmployeeID | LastName | FirstName |        Title         | TitleOfCourtesy | BirthDate  |  HireDate  |           Address           |  City   | Region | PostalCode | Country |   HomePhone    | Extension | Photo |                                                                                      Notes                                                                                      | ReportsTo |              PhotoPath               
------------+----------+-----------+----------------------+-----------------+------------+------------+-----------------------------+---------+--------+------------+---------+----------------+-----------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------------------------------
          1 | Davolio  | Nancy     | Sales Representative | Ms.             | 1948-12-08 | 1992-05-01 | 507 - 20th Ave. E.\nApt. 2A | Seattle | WA     | 98122      | USA     | (206) 555-9857 | 5467      | \x    | Education includes a BA IN psychology FROM Colorado State University IN 1970.  She also completed "The Art of the Cold Call."  Nancy IS a member OF Toastmasters International. |         2 | http://accweb/emmployees/davolio.bmp
(1 row)

That works fine but what if I only want to return the ‘EmployeeID’ field?

postgres=# SELECT EmployeeID FROM employees LIMIT 1;
ERROR:  column "employeeid" does not exist
LINE 1: SELECT EmployeeID FROM employees LIMIT 1;

I hadn’t realised (or had forgotten) that PostgreSQL lower cases unquoted identifiers, so we need to quote the column name if it’s been stored in mixed case:

postgres=# SELECT "EmployeeID" FROM employees LIMIT 1;
 EmployeeID 
------------
          1
(1 row)

From my reading, the suggestion seems to be to have your field names lower cased to avoid this problem, but since it’s just a dummy data set I guess I’ll just put up with the quoting overhead for now.

Categories: Programming

R: Deriving a new data frame column based on containing string

Mark Needham - Mon, 09/29/2014 - 22:37

I’ve been playing around with R data frames a bit more and one thing I wanted to do was derive a new column based on the text contained in an existing column.

I started with something like this:

> x = data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
> x
             name
1  Java Hackathon
2 Intro to Graphs
3 Hands on Cypher

And I wanted to derive a new column based on whether or not the session was a practical one. The grepl function seemed to be the best tool for the job:

> grepl("Hackathon|Hands on|Hands On", x$name)
[1]  TRUE FALSE  TRUE

We can then add a column to our data frame with that output:

x$practical = grepl("Hackathon|Hands on|Hands On", x$name)

And we end up with the following:

> x
             name practical
1  Java Hackathon      TRUE
2 Intro to Graphs     FALSE
3 Hands on Cypher      TRUE

Not too tricky but it took me a bit too long to figure it out so I thought I’d save future Mark some time!
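
A small follow-up that isn’t in the original post: grepl also accepts an ignore.case argument, so the pattern doesn’t need to spell out both capitalisations of ‘Hands on’:

> x$practical = grepl("hackathon|hands on", x$name, ignore.case = TRUE)
> x
             name practical
1  Java Hackathon      TRUE
2 Intro to Graphs     FALSE
3 Hands on Cypher      TRUE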

Categories: Programming

Instagram Improved their App's Performance. Here's How.

Is flat design just another pretty face or is it a huge performance hack cloaked as a UI revolution? It turns out flat design is a stone cold performance win.

This and more is expertly explained by Tyler Kieft, Engineer at Instagram, in a crisp and content-filled talk he gave at the @scale conference: Instagram on Typical Android. The talk was part of a series of talks given by Facebook on how to design for the reality of mobile applications across the globe, where phones are slower, screens are smaller, and networks are slower than they are in the US.

Designing for a typical phone rather than a high-end phone required the Instagram team to rethink their design in a deep way. One of the revelations in Tyler's talk was that moving to a flat design not only made the application more beautiful and more usable, it also substantially increased performance.

This was quite a surprise. I've only ever thought of flat design as just a way to think about how to build pretty UIs. Silly me. Thanks to Tyler for explaining the benefits of flat design so clearly and forcefully, using Instagram as a great example of what is possible.

Flat design is the anti-skeuomorphism, going digital native, eschewing a slavish obsession with the appearance of reality, adopting simple elements, simple typography, flat colors, and simple designs.

Using flat design, Instagram was able to shave 120ms off its cold start times. It was also able to reduce the number of assets it took to display the feed screen from 29 down to 8. All while making the application more beautiful and more usable, with more focus given to the content across different phone sizes.

How did flat design make all this possible? Please keep on reading...

Categories: Architecture

The Future of Jobs

Will you have a job in the future?

What will that job look like and how will the nature of work change?

Will automation take over your job in the near future?

These are the kinds of questions that Ruth Fisher, author of Winning the Hardware-Software Game, has tackled in a series of posts.

I wrote a summary post to distill her big ideas and insights about the future of jobs in my post:

The Future of Jobs

Fisher has done an outstanding job of framing out the landscape and walking the various arguments and perspectives on how automation will change the nature of work and shape the future of jobs.

One of the first things you might be wondering is, what jobs will automation take away?

Fisher addresses that.

Another question is, what new types of jobs will be created?

While that’s an exercise for the reader, Fisher provides clues based on what industry luminaries have seen in terms of how jobs are changing.

The key is to know what automation can and can’t do, and to look at the pattern of work in terms of what’s better suited for humans, and what’s better suited for machines.

As one of my mentors puts it, “If the work can be automated, it’s not human.”

He’s a fan of people doing creative, non-routine work, where they can thrive and shine.

As I take on work, or push back on work, I look through a pretty simple lens:

  1. Is the work repetitive in nature? (in which case, something that should be automated)
  2. Is the work a high-value activity? (if not, why am I doing non high-value activities?)
  3. Does the work create greater capability? (for me, the team, the organization, etc.)
  4. Does the work play to my strengths? (if not, who is a better resource or provider?  You grow faster in your strengths, and in today’s world, if people aren’t giving their best where they have their best to give, it leads to a low-impact team that eventually gets out-executed, or put out to pasture.)
  5. Does the work lead to world-class impact?  (When everything gets exposed beyond the firewall, and when it’s a globally connected ecosystem, it’s really important to not only bring your A-game, but to play in a way where you can provide the best service in the world for your specific niche.   If you can’t be the best in your niche in a sustainable way, then you’re in the wrong niche.)

I find that by using this simple lens, I tend to take on high-value work that creates high impact and that cannot be easily automated.  At the same time, while I perform the work, I look for ways to turn things into repetitive activities that can be outsourced or automated so that I can keep moving up the stack, and producing higher-value work … that’s more human.

Categories: Architecture, Programming

Scale Agile With Small-World Networks Posted

I posted my most recent Pragmatic Manager newsletter, Scale Agile With Small-World Networks on my site.

This is a way you can scale agile out, not up. No hierarchies needed.

Small-world networks take advantage of the fact that people want to help other people in the organization. Unless you have created MBOs (Management By Objectives) that make people not want to help others, people want to see the entire product succeed. That means they want to help others. Small-world networks also take advantage of the best network in your organization—the rumor mill.

If you enjoy reading this newsletter, please do subscribe. I let my readers know first about specials that I run for my books and about new books when they come out.

Categories: Project Management

What Ever Happened to the Founders of Sierra Online?

Making the Complex Simple - John Sonmez - Mon, 09/29/2014 - 15:00

Some of the fondest memories of my childhood involve playing adventure games like Space Quest, Kings Quest and Quest for Glory. I remember spending countless hours reloading from save spots and trying to figure out a puzzle. I remember that exciting feeling of anticipation when the Sierra logo flashed onto the screen as my 486 […]

The post What Ever Happened to the Founders of Sierra Online? appeared first on Simple Programmer.

Categories: Programming

R: Filtering data frames by column type (‘x’ must be numeric)

Mark Needham - Mon, 09/29/2014 - 06:46

I’ve been working through the exercises from An Introduction to Statistical Learning and one of them required you to create a pairwise correlation matrix of variables in a data frame.

The exercise uses the ‘Carseats’ data set which can be imported like so:

> install.packages("ISLR")
> library(ISLR)
> head(Carseats)
  Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
2 11.22       111     48          16        260    83      Good  65        10   Yes Yes
3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes
4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes
5  4.15       141     64           3        340   128       Bad  38        13   Yes  No
6 10.81       124    113          13        501    72       Bad  78        16    No Yes

Some of these variables are categorical, and we’ll need to filter them out of the data frame before we can build the correlation matrix.

If we try to run the ‘cor‘ function on the data frame we’ll get the following error:

> cor(Carseats)
Error in cor(Carseats) : 'x' must be numeric

As the error message suggests, we can’t pass non-numeric variables to this function so we need to remove the categorical variables from our data frame.

But first we need to work out which columns those are:

> sapply(Carseats, class)
      Sales   CompPrice      Income Advertising  Population       Price   ShelveLoc         Age   Education 
  "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"    "factor"   "numeric"   "numeric" 
      Urban          US 
   "factor"    "factor"

We can see a few columns of type ‘factor’ and luckily for us there’s a function which will help us identify those more easily:

> sapply(Carseats, is.factor)
      Sales   CompPrice      Income Advertising  Population       Price   ShelveLoc         Age   Education 
      FALSE       FALSE       FALSE       FALSE       FALSE       FALSE        TRUE       FALSE       FALSE 
      Urban          US 
       TRUE        TRUE

Now we can remove those columns from our data frame and create the correlation matrix:

> cor(Carseats[sapply(Carseats, function(x) !is.factor(x))])
                  Sales   CompPrice       Income  Advertising   Population       Price          Age    Education
Sales        1.00000000  0.06407873  0.151950979  0.269506781  0.050470984 -0.44495073 -0.231815440 -0.051955242
CompPrice    0.06407873  1.00000000 -0.080653423 -0.024198788 -0.094706516  0.58484777 -0.100238817  0.025197050
Income       0.15195098 -0.08065342  1.000000000  0.058994706 -0.007876994 -0.05669820 -0.004670094 -0.056855422
Advertising  0.26950678 -0.02419879  0.058994706  1.000000000  0.265652145  0.04453687 -0.004557497 -0.033594307
Population   0.05047098 -0.09470652 -0.007876994  0.265652145  1.000000000 -0.01214362 -0.042663355 -0.106378231
Price       -0.44495073  0.58484777 -0.056698202  0.044536874 -0.012143620  1.00000000 -0.102176839  0.011746599
Age         -0.23181544 -0.10023882 -0.004670094 -0.004557497 -0.042663355 -0.10217684  1.000000000  0.006488032
Education   -0.05195524  0.02519705 -0.056855422 -0.033594307 -0.106378231  0.01174660  0.006488032  1.000000000
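
A small aside that isn’t in the original post: the same matrix can be produced by selecting the numeric columns directly rather than excluding the factors. Either of these base R variants gives the output above:

> cor(Carseats[, sapply(Carseats, is.numeric)])   # select the numeric columns
> cor(Filter(is.numeric, Carseats))               # Filter() keeps columns where is.numeric returns TRUE
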
Categories: Programming