Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Feed aggregator

Distributed Agile: Backlog Grooming Meetings

Understanding how a story or a group of stories fits into the big picture is sometimes like reading a single line of Shakespeare and trying to develop the plot for the entire play.

There are two reasons to hold backlog grooming meetings. The first is to make sprint planning more efficient and effective. The second is to make sure you understand your backlog. When teams don’t spend the time needed to groom the backlog, planning meetings can be very tense and extend for hours . . . and hours. Backlog grooming sessions can be whole team activities (rare) or sub-team activities (more common). The most common technique used to generate a sub-team for grooming is the Three Amigos (or some variant). The tallest hurdle all distributed teams face is ensuring effective communication, followed quickly by staying focused on the task at hand. Many of the same techniques we discussed for sprint planning in distributed teams will be effective; however, backlog grooming has a few unique twists.

  1. Everybody needs to see the story at all times. Everyone involved must be able to see the story being groomed, preferably as it is being edited. Reading a story to someone at the other end of a phone and then amending the reading as you wordsmith the statement is difficult for many people to follow. Most webinar tools now have whiteboard options. Cut and paste the story and acceptance criteria into the whiteboard feature so that everyone can see the words. One team I recently worked with used messaging software to approximate the process (it worked fairly well). Tools like webcams and telepresence can be used; however, make sure the story and the acceptance criteria are easily readable by all parties. When team members can’t hear or see well enough to stay involved, they will lose focus and probably start doing email.
  2. The right people and locations need to be involved. There are many shades of distributed teams, ranging from two locations to completely dispersed (everyone in different locations). The goal of grooming is to make sure the backlog items that may be used in the next sprint are understood, well-formed and have acceptance criteria. Typically, grooming is most effective when the three major team constituencies are involved: the business, the developers and the testers. When a team is distributed, locations can become constituencies that need to be involved to ensure that the grooming session attains the goal of making sure the stories are understood. This is an argument for whole team grooming sessions so that no location feels left out.
  3. Use story maps to link stories to the big picture. Understanding how a story or a group of stories fits into the big picture is sometimes like reading a single line of Shakespeare and trying to develop the plot for the entire play. When a team is distributed, it becomes more difficult for members to have a side conversation to get things back on track or to develop ways to stay aligned to the project’s big picture without a more formal reference. A story map provides a frame of reference so that the team members involved in the grooming session can see how the stories fit into the overall project. The use of a story map in the grooming process makes it easier to identify or develop a theme for the next sprint (a theme provides focus and direction to the team).

Backlog grooming is a process to make sure the stories that might be used in the next sprint are understood, well-formed and have acceptance criteria. When backlogs are not well groomed, teams tend to spend a lot of time planning and re-planning rather than delivering value. This is true whether a team is distributed or not. The problem is that when a team is distributed any hiccup takes more effort to fix, making grooming even more important.

Categories: Process Management

Modularity and testability

Coding the Architecture - Simon Brown - Wed, 10/01/2014 - 22:21

I've been writing blog posts covering a number of topics over the past few months; from the conflict between software architecture and code and architecturally-evident coding styles through to representing a software architecture model as code and how microservice architectures can easily turn into distributed big balls of mud. The common theme running throughout all of them is structure, and this in turn has a relationship with testability.

The TL;DR version of this post is: think about modularity, think about how you structure your code, think about the options you have for testing your code and stop making everything public.

1. The conflict between software architecture and code

I've recently been talking a lot about the disconnect between software architecture and code. George Fairbanks calls this the "model-code gap". It basically says that the abstractions we consider at the architecture level (components, services, modules, layers, etc) are often not explicitly reflected in the code. A major cause is that we don't have those concepts in OO programming languages such as Java, C#, etc. You can't do public component X in Java, for example.

2. The "unit testing is wasteful" thing

Hopefully, we've all seen the "unit testing is wasteful" thing, and all of the follow-up discussion. The unfortunate thing about much of the discussion is that "unit testing" has been used interchangeably with "TDD". In my mind, the debate is about unit testing rather than TDD as a practice. I'm not a TDDer, but I do write automated tests. I mostly write tests afterwards. But sometimes I write them beforehand, particularly if I want to test-drive my implementation of something before integrating it. If TDD works for you, that's great. If not, don't worry about it. Just make sure that you *do* write some tests. :-)

There are, of course, a number of sides to the debate, but in "TDD is dead. Long live testing." (ignore the title), DHH makes some good points about the numbers and types of tests that a system should have. To quote (strikethrough mine):

I think that's the direction we're heading. Less emphasis on unit tests, because we're no longer doing test-first as a design practice, and more emphasis on, yes, slow, system tests. (Which btw do not need to be so slow any more, thanks to advances in parallelization and cloud runner infrastructure).

The type of software system you're building will also have an impact on the number and types of tests. I once worked on a system where we had a huge number of integration tests, but very few unit tests, primarily because the system actually did very little aside from get data from a Microsoft Dynamics CRM system (via web services) and display it on some web pages. I've also worked on systems that were completely the opposite, with lots of complex business logic.

There's another implicit assumption in all of this ... what's the "unit" in "unit testing"? For many it's an isolated class, but for others the word "unit" can be used to represent anything from a single class through to an entire sub-system.

3. The microservices hype

Microservices is the new, shiny kid in town. There *are* many genuine benefits from adopting this style of architecture, but I do worry that we're simply going to end up building the next wave of distributed big balls of mud if we're not careful. Technologies like Spring Boot make creating and deploying microservices relatively straightforward, but the design thinking behind partitioning a software system into services is still as hard as it's ever been. This is why I've been using this slide in my recent talks.

If you can't build a structured monolith, what makes you think microservices is the answer!?


Uncle Bob Martin posted "Microservices and Jars" last month, which touches upon the topic of building monolithic applications that do have a clean internal structure, by using the concept of separately deployable units (e.g. JARs, DLLs, etc). Although he doesn't talk about the mechanisms needed to make this happen (e.g. plugin architectures, Java classloaders, etc), it's all achievable. I rarely see teams doing this though.

Structuring our code for modularity at the macro level, even in monolithic systems, provides a number of benefits, not least that it's a simple way to reduce the model-code gap. In other words, we structure our code to reflect the structural building blocks (e.g. components, services, modules) that we define at the architecture level. If there are "components" on the architecture diagrams, I want to see "components" in the code. This alignment of architecture and code has positive implications for explaining, understanding, maintaining, adapting and working with the system.

It's also about avoiding big balls of mud, and I want to do this by enforcing some useful boundaries in order to slice up my thousands of lines of code/classes into manageable chunks. Uncle Bob suggests that you can use JARs to do this. There are other modularity mechanisms available in Java too; including SPI, CDI and OSGi. But you don't even need a plugin architecture to build a structured monolith. Simply using the scoping modifiers built in to Java is sufficient to represent the concept of a lightweight in-process component/module.

Stop making everything public

We need to resist the temptation to make everything public though, because this is often why codebases turn into a sprawling mass of interconnected objects. I do wonder whether the keystrokes used to write public class are ingrained into our muscle memory as developers. As I said during my closing session at DevDay in Krakow last week, we should make a donation to charity every time we type public class without thinking about whether that class really needs to be public.

Donate to charity every time you type public class without thinking

A simple way to create a lightweight component/module in Java is to create a public interface and keep all of the implementation (one or more classes) package protected (that is, declared with no access modifier), ensuring there is only one "component" per package. Here's an example of such a component, which also happens to be a Spring Bean. This isn't a silver bullet and there are trade-offs that I have consciously made (e.g. shared domain classes and utility code), but it does at least illustrate that all code doesn't need to be public. Proponents of DDD and ports & adapters may disagree with the naming I've used but, that aside, I do like the stronger sense of modularity that such an approach provides.
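As a rough illustration of the idea (not the component from the original post), the sketch below pairs an interface with a package-private implementation. All names here (CustomerComponent, InMemoryCustomerComponent, create) are hypothetical, and a plain static factory stands in for Spring wiring so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical component API. In a real project this interface would be declared
// `public` in its own file inside a dedicated package; the modifier is omitted
// here only so that the sketch compiles as a single file.
interface CustomerComponent {
    void register(String id, String name);

    Optional<String> findName(String id);

    // Factory method: the only way callers obtain an implementation
    // (an assumption of this sketch; the original uses a Spring Bean instead).
    static CustomerComponent create() {
        return new InMemoryCustomerComponent();
    }
}

// No access modifier, so the class is visible only inside its own package.
// Code in other packages cannot name it and has to depend on the interface.
final class InMemoryCustomerComponent implements CustomerComponent {
    private final Map<String, String> namesById = new HashMap<>();

    @Override
    public void register(String id, String name) {
        namesById.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(namesById.get(id));
    }
}
```

A caller in another package would write `CustomerComponent customers = CustomerComponent.create();` and never see the implementing class, which is what keeps the boundary enforceable by the compiler rather than by convention.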


And now you have some options for writing automated tests. In this particular example, I've chosen to write automated tests that treat the component as a single thing; going through the component API to the database and back again. You can still do class-level testing too (inside the package), but only if it makes sense and provides value. You can also do TDD; both at the component API and the component implementation level. Treating your components/modules as black boxes results in a slightly different testing pyramid, in that it changes the balance of class and component tests.
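A component-level test of this kind might look roughly like the following sketch. Everything in it is hypothetical (OrderComponent, its collaborators and the test class), an in-memory store stands in for the real database, and plain Java checks replace a test framework so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component API (would be `public` in its own package).
interface OrderComponent {
    void placeOrder(String item, int quantity);

    int totalQuantity();

    static OrderComponent create() {
        return new DefaultOrderComponent(new OrderValidator(), new InMemoryOrderStore());
    }
}

// Package-private implementation details; the test below never names them.
final class OrderValidator {
    void check(String item, int quantity) {
        if (item == null || item.isEmpty() || quantity <= 0) {
            throw new IllegalArgumentException("invalid order");
        }
    }
}

// Stand-in for a database-backed store (assumption made for this sketch).
final class InMemoryOrderStore {
    private final List<Integer> quantities = new ArrayList<>();

    void save(int quantity) {
        quantities.add(quantity);
    }

    int sum() {
        return quantities.stream().mapToInt(Integer::intValue).sum();
    }
}

final class DefaultOrderComponent implements OrderComponent {
    private final OrderValidator validator;
    private final InMemoryOrderStore store;

    DefaultOrderComponent(OrderValidator validator, InMemoryOrderStore store) {
        this.validator = validator;
        this.store = store;
    }

    @Override
    public void placeOrder(String item, int quantity) {
        validator.check(item, quantity); // reject bad input before storing
        store.save(quantity);
    }

    @Override
    public int totalQuantity() {
        return store.sum();
    }
}

// Component-level test: validator, store and wiring are exercised together,
// and only through the OrderComponent API.
final class OrderComponentTest {
    static void run() {
        OrderComponent orders = OrderComponent.create();
        orders.placeOrder("book", 2);
        orders.placeOrder("pen", 3);
        check(orders.totalQuantity() == 5, "valid orders are stored and summed");

        boolean rejected = false;
        try {
            orders.placeOrder("", 1);
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        check(rejected, "an order without an item is rejected");
        check(orders.totalQuantity() == 5, "a rejected order is not stored");
    }

    private static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError(description);
        }
    }
}
```

Because the test only touches the API, the internal classes can be split, merged or renamed without rewriting it; class-level tests for OrderValidator could still be added inside the package where they provide value.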

Rethinking the testing pyramid?

A microservice architecture will likely push you down this route too, with a balanced mix of low-level class and higher-level service tests. Of course there is no "typical" shape for the testing pyramid; the type of system you're building will determine what it looks like. There are many options for building testable software, but neither unit testing nor TDD is dead.

In summary, I'm looking for ways in which we can structure our code for modularity at the macro-level, to avoid the big ball of mud and to shrink the model-code gap. I also want to be able to automatically draw some useful architecture diagrams based upon the code. We shouldn't blindly be making everything public and writing automated tests at the class level. After all, there are a number of different approaches that we can take for all of this, and the modularity you choose has an implication on the number and types of tests that you write. As I said at the start: think about modularity, think about how you structure your code, think about the options you have for testing your code and stop making everything public. Designing software requires conscious effort. Let's not stop thinking.

Categories: Architecture

No Estimates Needs to Come In Contact With Those Providing the Money

Herding Cats - Glen Alleman - Wed, 10/01/2014 - 17:36

For all the words written and posted around estimating or not estimating - and I've contributed my share - the basis of estimates has yet to be addressed outside of a few people. @PeterKretzman @aritanninen @kalapaistos @fscavo come to mind.

The gap here is simple. No one seems to ask - or even want to ask - Who are the estimates for? They are not likely for developers, who, rightly so in some cases, see estimating as taking away from their valuable development duties.

Who Are Estimates For?

Estimates are for the business managers providing the money that appears in the developers' paychecks. Estimates are for those same business managers accountable for the Profit & Loss statement of the firm employing the developers writing the code. Those estimates forecast confidence intervals of profit or loss on a project or service before that profit or loss arrives and is irrevocable.

Estimates are for the business marketing staff in a product firm, who are forecasting the "break even" plan for the sunk cost of developing software that will be sold in the market, and whose revenue will pay back the short term loan (line of credit) used to pay the salaries of the developers. Without this forecast, decisions about spending or further spending have to be made in the dark.

Estimates are for the business development staff in a professional services and development firm, who forecast the confidence that the contractual obligation to provide working software will not cost more - including management reserve and contingency - than they quoted the customer during the early phases of the project. Since all forecasts are probabilistic, this confidence is - or should be - discussed as the probability of cost at or below a given value, or of completing on or before a given date. The dysfunction of using estimates as commitments is recognized as just that - dysfunction. But as a dysfunction, it's classified as Bad Management. Don't Do Stupid Things on Purpose is good advice for any business.

Estimates are for the internal business finance staff accountable for managing and forecasting costs for internal software development or procurement used to run the business - and likely used to generate revenue - and assure the senior finance people that the "value" produced by this software measured in monetized units of "money" will exceed the cost to achieve that value when the project completes. And some sense of when the date will be, so those monetized benefits can start to appear on the balance sheet using FASB 86 accounting rules.

The estimates are not for the developers

Those talking about #NoEstimates from the developers' point of view are talking to the wrong people. They appear to be talking to their own self-selected group and not the group that provides the money for their work. As my former NASA Cost Director colleague reminds me, "follow the money." So follow the money. Unless the developers are providing the money themselves, the question of estimating or not estimating is a self-referencing conversation in the absence of these people. Because of that, those best placed to say whether estimates are of value are not in the conversation.

So Back To The Original Question

Ignore for the moment the observed or perceived dysfunctions, found in low maturity software development organizations, of the misuse of estimates. Ignore for the moment the perception that making estimates of the future cost, duration, and probabilistic outcomes of development work is part of normal engineering processes. Ignore the emotional rhetoric of the Dilbert approach to management.

The core principle of the Microeconomics of software development requires that we have some approximation of the future to make decisions about alternatives. The opportunity cost, the trade-space of decision making, requires that we approximate the cost and outcomes of our decisions.

Now add the core business process of managing expenditures against a planned and targeted Return on Investment, which has both Value and Cost in its equation.

Then ask those conjecturing that there are:

  • Decision making frameworks for projects that do not require estimates
  • Investment models for software projects that do not require estimates
  • Project management approaches to dealing with risk, scope management, and progress reporting that do not require estimates

to connect the dots between those conjectures and the Microeconomics of software development and the ROI assessments of standard business processes.


Related articles

  • How NOT to Estimate Anything
  • How To Fix Martin Fowler's Estimating Problem in 3 Easy Steps
  • More #NoEstimates
  • All Project Numbers are Random Numbers - Act Accordingly
  • How To Estimate, If You Really Want To
  • Resources for Moving Beyond the "Estimating Fallacy"
  • Back To The Future
  • How to "Lie" with Statistics


Categories: Project Management

Announcing the GTAC 2014 Agenda

Google Testing Blog - Wed, 10/01/2014 - 01:37
by Anthony Vallone on behalf of the GTAC Committee

We have completed selection and confirmation of all speakers and attendees for GTAC 2014. You can find the detailed agenda at:

Thank you to all who submitted proposals! It was very hard to make selections from so many fantastic submissions.

There was a tremendous amount of interest in GTAC this year with over 1,500 applicants (up from 533 last year) and 194 of those for speaking (up from 88 last year). Unfortunately, our venue only seats 250. However, don’t despair if you did not receive an invitation. Just like last year, anyone can join us via YouTube live streaming. We’ll also be setting up Google Moderator, so remote attendees can get involved in Q&A after each talk. Information about live streaming, Moderator, and other details will be posted on the GTAC site soon and announced here.

Categories: Testing & QA

Distributed Agile: Daily Stand-up Meetings

Stand-ups are best on your feet!

Distributed Agile teams require a different level of care and feeding than a co-located team in order to ensure that they are as effective as possible. This is even truer for a team that is working through its forming-storming-norming process. Core to making Agile-as-framework work effectively are the concepts of team and communication. Daily stand-up meetings are one of the most important communication tools used in Scrum or other Agile/Lean frameworks. Techniques that are effective in making daily stand-ups work for distributed teams include:

  1. Deal with the time zone issue. There are two primary options to deal with time zones. The first is to keep the team members within three or four time zones of each other. Given typical sourcing options, this tends to be difficult. A second option is to rotate the time for the stand-up meeting from sprint to sprint so that everyone loses a similar amount of sleep (share the pain option). One usable fallback when the working hours of a distributed team can’t overlap is to have one team member (on rotation) stay late or come in early to create an overlap.
  2. Identify and attack blockers between stand-ups. Typically, on distributed teams, all parties will not work at the same time. Team members should be counseled to communicate blockers to the team as soon as they are discovered so that something discovered late in the day in one time zone does not affect the team in a different time zone that might just be starting to work. One group I worked with had stand-ups twice each day (at the beginning of the day and at the end of the day) to ensure continuous communication.
  3. Push status outside the stand-up. A solution suggested by Matt Hauser is to have the team answer the classic three questions (What did you do yesterday? What will you do today? Is there anything blocking your progress?) on a wiki for everyone on the team to read before the stand-up meeting. This helps focus the meeting on planning or dealing with issues.
  4. Vary the question set being asked. The process of varying the question set keeps the team focused on communication rather than giving a memorized speech. For example ask:
    1. Is anyone stuck?
    2. Does anyone need help?
    3. What did not get completed yesterday?
    4. Is there anything everyone should know?

This technique can be used for non-distributed teams, as well as distributed teams.

  5. Ensure that everyone is standing. This is code for making sure that everyone is paying attention and staying focused. Standing is just one technique for helping team members stay focused. Others include banning cell phones and side conversations.
  6. Make sure the meeting stays “crisp.” Stand-up meetings by definition are short and to the point. The team needs to ensure that the meeting stays as disciplined as possible. All team members should show up on time and be prepared to discuss their role in the project. Discussion includes the willingness to ask for help and to provide help to team members.
  7. Use a physical status wall. While the term “distributed” screams tool usage, using a physical wall helps to focus the team. The simplicity of a physical wall takes the complexity of tool usage off the table so the focus can be on communication. Use of a physical wall in a distributed environment will mean using video to show tasks being moved on the wall (after the fact, a picture can be provided to the team). If video is not available, use a tool that EVERYONE has access to. Keep tools as simple as possible.
  8. Don’t stop doing stand-ups. Stand-up meetings are a critical communication and planning event. Not doing stand-ups for a distributed team is an indicator that the organization should go back to project manager/plan-based methods.

Like any other distributed team meeting, having good telecommunication/video tools is not only important, it is a prerequisite. If team members can’t hear each other, they CAN’T communicate.

Stand-ups are nearly ubiquitous in Agile. I would do stand-ups even if I were not doing Agile. However, despite their simplicity, the added complexity of distributed teams can cause problems. The whole team is responsible for making the stand-up meetings work. While the Scrum master may take the lead in ensuring the logistics are right or in facilitating the session when needed, everyone needs to play a role.

Categories: Process Management

R: A first attempt at linear regression

Mark Needham - Tue, 09/30/2014 - 23:20

I’ve been working through the videos that accompany the Introduction to Statistical Learning with Applications in R book and thought it’d be interesting to try out the linear regression algorithm against my meetup data set.

I wanted to see how well a linear regression algorithm could predict how many people were likely to RSVP to a particular event. I started with the following code to build a data frame containing some potential predictors:

officeEventsQuery = "MATCH (g:Group {name: \"Neo4j - London User Group\"})-[:HOSTED_EVENT]->(event)<-[:TO]-({response: 'yes'})<-[:RSVPD]-(),
                     WHERE (event.time + event.utc_offset) < timestamp() AND IN [\"Neo Technology\", \"OpenCredo\"]
                     RETURN event.time + event.utc_offset AS eventTime, event.announced_at AS announcedAt, event.name, COUNT(*) AS rsvps"
events = subset(cypher(graph, officeEventsQuery), !
events$eventTime <- timestampToDate(events$eventTime)
events$day <- format(events$eventTime, "%A")
events$monthYear <- format(events$eventTime, "%m-%Y")
events$month <- format(events$eventTime, "%m")
events$year <- format(events$eventTime, "%Y")
events$announcedAt<- timestampToDate(events$announcedAt)
events$timeDiff = as.numeric(events$eventTime - events$announcedAt, units = "days")

If we preview ‘events’ it contains the following columns:

> head(events)
            eventTime         announcedAt                                        event.name rsvps       day monthYear month year  timeDiff
1 2013-01-29 18:00:00 2012-11-30 11:30:57                                   Intro to Graphs    24   Tuesday   01-2013    01 2013 60.270174
2 2014-06-24 18:30:00 2014-06-18 19:11:19                                   Intro to Graphs    43   Tuesday   06-2014    06 2014  5.971308
3 2014-06-18 18:30:00 2014-06-08 07:03:13                         Neo4j World Cup Hackathon    24 Wednesday   06-2014    06 2014 10.476933
4 2014-05-20 18:30:00 2014-05-14 18:56:06                                   Intro to Graphs    53   Tuesday   05-2014    05 2014  5.981875
5 2014-02-11 18:00:00 2014-02-05 19:11:03                                   Intro to Graphs    35   Tuesday   02-2014    02 2014  5.950660
6 2014-09-04 18:30:00 2014-08-26 06:34:01 Hands On Intro to Cypher - Neo4j's Query Language    20  Thursday   09-2014    09 2014  9.497211

We want to predict ‘rsvps’ from the other columns so I started off by creating a linear model which took all the other columns into account:

> summary(lm(rsvps ~., data = events))
lm(formula = rsvps ~ ., data = events)
    Min      1Q  Median      3Q     Max 
-8.2582 -1.1538  0.0000  0.4158 10.5803 
Coefficients: (14 not defined because of singularities)
                                                                    Estimate Std. Error t value Pr(>|t|)   
(Intercept)                                                       -9.365e+03  3.009e+03  -3.113  0.00897 **
eventTime                                                          3.609e-06  2.951e-06   1.223  0.24479   
announcedAt                                                        3.278e-06  2.553e-06   1.284  0.22339   
event.nameGraph Modelling - Do's and Don'ts                        4.884e+01  1.140e+01   4.286  0.00106 **
event.nameHands on build your first Neo4j app for Java developers  3.735e+01  1.048e+01   3.562  0.00391 **
event.nameHands On Intro to Cypher - Neo4j's Query Language        2.560e+01  9.713e+00   2.635  0.02177 * 
event.nameIntro to Graphs                                          2.238e+01  8.726e+00   2.564  0.02480 * 
event.nameIntroduction to Graph Database Modeling                 -1.304e+02  4.835e+01  -2.696  0.01946 * 
event.nameLunch with Neo4j's CEO, Emil Eifrem                      3.920e+01  1.113e+01   3.523  0.00420 **
event.nameNeo4j Clojure Hackathon                                 -3.063e+00  1.195e+01  -0.256  0.80203   
event.nameNeo4j Python Hackathon with py2neo's Nigel Small         2.128e+01  1.070e+01   1.989  0.06998 . 
event.nameNeo4j World Cup Hackathon                                5.004e+00  9.622e+00   0.520  0.61251   
dayTuesday                                                         2.068e+01  5.626e+00   3.676  0.00317 **
dayWednesday                                                       2.300e+01  5.522e+00   4.165  0.00131 **
monthYear01-2014                                                  -2.350e+02  7.377e+01  -3.185  0.00784 **
monthYear02-2013                                                  -2.526e+01  1.376e+01  -1.836  0.09130 . 
monthYear02-2014                                                  -2.325e+02  7.763e+01  -2.995  0.01118 * 
monthYear03-2013                                                  -4.605e+01  1.683e+01  -2.736  0.01805 * 
monthYear03-2014                                                  -2.371e+02  8.324e+01  -2.848  0.01468 * 
monthYear04-2013                                                  -6.570e+01  2.309e+01  -2.845  0.01477 * 
monthYear04-2014                                                  -2.535e+02  8.746e+01  -2.899  0.01336 * 
monthYear05-2013                                                  -8.672e+01  2.845e+01  -3.049  0.01011 * 
monthYear05-2014                                                  -2.802e+02  9.420e+01  -2.975  0.01160 * 
monthYear06-2013                                                  -1.022e+02  3.283e+01  -3.113  0.00897 **
monthYear06-2014                                                  -2.996e+02  1.003e+02  -2.988  0.01132 * 
monthYear07-2014                                                  -3.123e+02  1.054e+02  -2.965  0.01182 * 
monthYear08-2013                                                  -1.326e+02  4.323e+01  -3.067  0.00976 **
monthYear08-2014                                                  -3.060e+02  1.107e+02  -2.763  0.01718 * 
monthYear09-2013                                                          NA         NA      NA       NA   
monthYear09-2014                                                  -3.465e+02  1.164e+02  -2.976  0.01158 * 
monthYear10-2012                                                   2.602e+01  1.959e+01   1.328  0.20886   
monthYear10-2013                                                  -1.728e+02  5.678e+01  -3.044  0.01020 * 
monthYear11-2012                                                   2.717e+01  1.509e+01   1.800  0.09704 . 
month02                                                                   NA         NA      NA       NA   
month03                                                                   NA         NA      NA       NA   
month04                                                                   NA         NA      NA       NA   
month05                                                                   NA         NA      NA       NA   
month06                                                                   NA         NA      NA       NA   
month07                                                                   NA         NA      NA       NA   
month08                                                                   NA         NA      NA       NA   
month09                                                                   NA         NA      NA       NA   
month10                                                                   NA         NA      NA       NA   
month11                                                                   NA         NA      NA       NA   
year2013                                                                  NA         NA      NA       NA   
year2014                                                                  NA         NA      NA       NA   
timeDiff                                                                  NA         NA      NA       NA   
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.287 on 12 degrees of freedom
Multiple R-squared:  0.9585,	Adjusted R-squared:  0.8512 
F-statistic: 8.934 on 31 and 12 DF,  p-value: 0.0001399

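As I understand it, we can look at the R-squared value to understand how much of the variance in the data has been explained by the model. In this case the adjusted R-squared is 85% (the multiple R-squared, which does not penalise the number of predictors, is 96%).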
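The summary above reports both a multiple and an adjusted R-squared, and the gap between them can be reproduced from the printed figures. The worked equation below uses the standard adjustment formula; n = 44 observations and p = 31 estimated predictors are inferred from the "31 and 12 DF" line (31 + 12 + 1), not stated directly in the output.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.9585)\,\frac{44 - 1}{44 - 31 - 1}
          = 1 - 0.0415 \times \frac{43}{12}
          \approx 0.851
```

With only 12 residual degrees of freedom left after fitting 31 coefficients, the penalty factor 43/12 is large, which is why the adjusted figure sits well below the multiple R-squared.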

A lot of the coefficients seem to be based around specific event names which seems a bit too specific to me so I wanted to see what would happen if I derived a feature which indicated whether a session was practical:

events$practical = grepl("Hackathon|Hands on|Hands On", events$event.name)

We can now run the model again with the new column, having excluded the ‘event.name’ field:

> summary(lm(rsvps ~., data = subset(events, select = -c(event.name))))
lm(formula = rsvps ~ ., data = subset(events, select = -c(event.name)))
    Min      1Q  Median      3Q     Max 
-18.647  -2.311   0.000   2.908  23.218 
Coefficients: (13 not defined because of singularities)
                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)      -3.980e+03  4.752e+03  -0.838   0.4127  
eventTime         2.907e-06  3.873e-06   0.751   0.4621  
announcedAt       3.336e-08  3.559e-06   0.009   0.9926  
dayTuesday        7.547e+00  6.080e+00   1.241   0.2296  
dayWednesday      2.442e+00  7.046e+00   0.347   0.7327  
monthYear01-2014 -9.562e+01  1.187e+02  -0.806   0.4303  
monthYear02-2013 -4.230e+00  2.289e+01  -0.185   0.8553  
monthYear02-2014 -9.156e+01  1.254e+02  -0.730   0.4742  
monthYear03-2013 -1.633e+01  2.808e+01  -0.582   0.5676  
monthYear03-2014 -8.094e+01  1.329e+02  -0.609   0.5496  
monthYear04-2013 -2.249e+01  3.785e+01  -0.594   0.5595  
monthYear04-2014 -9.230e+01  1.401e+02  -0.659   0.5180  
monthYear05-2013 -3.237e+01  4.654e+01  -0.696   0.4952  
monthYear05-2014 -1.015e+02  1.509e+02  -0.673   0.5092  
monthYear06-2013 -3.947e+01  5.355e+01  -0.737   0.4701  
monthYear06-2014 -1.081e+02  1.604e+02  -0.674   0.5084  
monthYear07-2014 -1.110e+02  1.678e+02  -0.661   0.5163  
monthYear08-2013 -5.144e+01  6.988e+01  -0.736   0.4706  
monthYear08-2014 -1.023e+02  1.784e+02  -0.573   0.5731  
monthYear09-2013 -6.057e+01  7.893e+01  -0.767   0.4523  
monthYear09-2014 -1.260e+02  1.874e+02  -0.672   0.5094  
monthYear10-2012  9.557e+00  2.873e+01   0.333   0.7430  
monthYear10-2013 -6.450e+01  9.169e+01  -0.703   0.4903  
monthYear11-2012  1.689e+01  2.316e+01   0.729   0.4748  
month02                  NA         NA      NA       NA  
month03                  NA         NA      NA       NA  
month04                  NA         NA      NA       NA  
month05                  NA         NA      NA       NA  
month06                  NA         NA      NA       NA  
month07                  NA         NA      NA       NA  
month08                  NA         NA      NA       NA  
month09                  NA         NA      NA       NA  
month10                  NA         NA      NA       NA  
month11                  NA         NA      NA       NA  
year2013                 NA         NA      NA       NA  
year2014                 NA         NA      NA       NA  
timeDiff                 NA         NA      NA       NA  
practicalTRUE    -9.388e+00  5.289e+00  -1.775   0.0919 .
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.21 on 19 degrees of freedom
Multiple R-squared:  0.7546,	Adjusted R-squared:  0.4446 
F-statistic: 2.434 on 24 and 19 DF,  p-value: 0.02592

Now we’re only accounting for 44% of the variance and none of our coefficients are significant so this wasn’t such a good change.

I also noticed that we’ve got a bit of overlap in the date related features – we’ve got one column for monthYear and then separate ones for month and year. Let’s strip out the combined one:

> summary(lm(rsvps ~., data = subset(events, select = -c(, monthYear))))
lm(formula = rsvps ~ ., data = subset(events, select = -c(, 
     Min       1Q   Median       3Q      Max 
-16.5745  -4.0507  -0.1042   3.6586  24.4715 
Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -1.573e+03  4.315e+03  -0.364   0.7185  
eventTime      3.320e-06  3.434e-06   0.967   0.3425  
announcedAt   -2.149e-06  2.201e-06  -0.976   0.3379  
dayTuesday     4.713e+00  5.871e+00   0.803   0.4294  
dayWednesday  -2.253e-01  6.685e+00  -0.034   0.9734  
month02        3.164e+00  1.285e+01   0.246   0.8075  
month03        1.127e+01  1.858e+01   0.607   0.5494  
month04        4.148e+00  2.581e+01   0.161   0.8736  
month05        1.979e+00  3.425e+01   0.058   0.9544  
month06       -1.220e-01  4.271e+01  -0.003   0.9977  
month07        1.671e+00  4.955e+01   0.034   0.9734  
month08        8.849e+00  5.940e+01   0.149   0.8827  
month09       -5.496e+00  6.782e+01  -0.081   0.9360  
month10       -5.066e+00  7.893e+01  -0.064   0.9493  
month11        4.255e+00  8.697e+01   0.049   0.9614  
year2013      -1.799e+01  1.032e+02  -0.174   0.8629  
year2014      -3.281e+01  2.045e+02  -0.160   0.8738  
timeDiff              NA         NA      NA       NA  
practicalTRUE -9.816e+00  5.084e+00  -1.931   0.0645 .
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.19 on 26 degrees of freedom
Multiple R-squared:  0.666,	Adjusted R-squared:  0.4476 
F-statistic: 3.049 on 17 and 26 DF,  p-value: 0.005187

Again none of the coefficients are statistically significant which is disappointing. I think the main problem may be that I have very few data points (only 44) making it difficult to come up with a general model.

I think my next step is to look for some other features that could impact the number of RSVPs e.g. other events on that day, the weather.

I’m a novice at this but trying to learn more so if you have any ideas of what I should do next please let me know.

Categories: Programming

Neo4j: Generic/Vague relationship names

Mark Needham - Tue, 09/30/2014 - 17:47

An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.

Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.

However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.

I created 4 different databases which had one node with 60,000 outgoing relationships – 10,000 which we wanted to retrieve and 50,000 that were irrelevant.

I modelled the ‘relationship’ in 4 different ways…

  • Using a specific relationship type
    (node)-[:HAS_ADDRESS]->(address)
  • Using a generic relationship type and then filtering by end node label
    (node)-[:HAS]->(address:Address)
  • Using a generic relationship type and then filtering by relationship property
    (node)-[:HAS {type: "address"}]->(address)
  • Using a generic relationship type and then filtering by end node property
    (node)-[:HAS]->(address {type: "address"})

…and then measured how long it took to retrieve the ‘has address’ relationships.

The code is on github if you want to take a look.

Although it’s obviously not as precise as a JMH micro benchmark I think it’s good enough to get a feel for the difference between the approaches.

I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):

Using a generic relationship type and then filtering by end node label
50%ile: 6.0    75%ile: 6.0    99%ile: 402.60999999999825
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0   75%ile: 22.0   99%ile: 504.85999999999785
Using a generic relationship type and then filtering by end node label
50%ile: 4.0    75%ile: 4.0    99%ile: 145.65999999999931
Using a specific relationship type
50%ile: 0.0    75%ile: 1.0    99%ile: 25.749999999999872
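For anyone wanting to reproduce this kind of summary, here is a minimal Python sketch of taking percentiles over a list of timings. It uses linear interpolation between the closest ranks, which is one common definition and may not be exactly the one used by the code on github. The `timings_ms` values are made up for illustration.

```python
def percentile(samples, pct):
    """Return the pct-th percentile using linear interpolation."""
    if not samples:
        raise ValueError("no samples")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(samples)
    # Fractional position of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical timings in ms: mostly fast with one slow outlier.
timings_ms = [4.0] * 98 + [5.0, 150.0]

for pct in (50, 75, 99):
    print(f"{pct}%ile: {percentile(timings_ms, pct)}")
```

A single slow run, such as a cold first query, is enough to pull the 99th percentile far away from the median, which is the pattern in the numbers above.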

We can drill further into why there’s a difference in the times for each of the approaches by profiling the equivalent cypher query. We’ll start with the one which uses a specific relationship name:

Using a specific relationship type

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n);
| count(n) |
| 10000    |
1 row
|             Operator |  Rows | DbHits |                 Identifiers |                 Other |
|         ColumnFilter |     1 |      0 |                             | keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                       |
| SimplePatternMatcher | 10000 |  10000 | n,   UNNAMED53,   UNNAMED35 |                       |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |          {  AUTOINT0} |
Total database accesses: 10001

Here we can see that there were 10,001 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.

By contrast the other approaches have to load in a lot more data only to then filter it out:

Using a generic relationship type and then filtering by end node label

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n);
| count(n) |
| 10000    |
1 row
|             Operator |  Rows | DbHits |                 Identifiers |                            Other |
|         ColumnFilter |     1 |      0 |                             |            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                  |
|               Filter | 10000 |  10000 |                             | hasLabel(  UNNAMED45:Address(0)) |
| SimplePatternMatcher | 10000 |  60000 | n,   UNNAMED45,   UNNAMED35 |                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                     {  AUTOINT0} |
Total database accesses: 70001

Using a generic relationship type and then filtering by relationship property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n);
| count(n) |
| 10000    |
1 row
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED35,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED63,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
Total database accesses: 140001

Using a generic relationship type and then filtering by end node property

neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n);
| count(n) |
| 10000    |
1 row
|             Operator |  Rows | DbHits |                 Identifiers |                                            Other |
|         ColumnFilter |     1 |      0 |                             |                            keep columns count(n) |
|     EagerAggregation |     1 |      0 |                             |                                                  |
|               Filter | 10000 |  20000 |                             | Property(  UNNAMED45,type(0)) == {  AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n,   UNNAMED45,   UNNAMED35 |                                                  |
|      NodeByIdOrEmpty |     1 |      1 |                        n, n |                                     {  AUTOINT0} |
Total database accesses: 140001
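To put the four profiles side by side, here is a small Python sketch that takes the 'Total database accesses' figures printed above and expresses each one relative to the specific relationship type. The dictionary keys are just labels I've chosen for illustration.

```python
# Totals copied from the four profile outputs above.
TOTAL_DB_ACCESSES = {
    "specific relationship type": 10001,
    "generic type + end node label": 70001,
    "generic type + relationship property": 140001,
    "generic type + end node property": 140001,
}


def relative_cost(totals, baseline):
    """Express each total as a multiple of the baseline approach."""
    base = totals[baseline]
    return {name: hits / base for name, hits in totals.items()}


ratios = relative_cost(TOTAL_DB_ACCESSES, "specific relationship type")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Filtering by label costs roughly 7 times as many database accesses as the specific relationship type, and filtering by either kind of property roughly 14 times.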

So in summary…specific relationships #ftw!

Categories: Programming

Sudoku, Linear Optimization, and the Ten Cent Diet

Google Code Blog - Tue, 09/30/2014 - 17:12
Originally posted on the Google Research blog. Cross posted on the Google Apps Developers blog.

In 1945, future Nobel laureate George Stigler wrote an essay in the Journal of Farm Economics titled The Cost of Subsistence about a seemingly simple problem: how could a soldier be fed for as little money as possible?

The “Stigler Diet” became a classic problem in the then-new field of linear optimization, which is used today in many areas of science and engineering. Any time you have a set of linear constraints such as “at least 50 square meters of solar panels” or “the amount of paint should equal the amount of primer” along with a linear goal (e.g., “minimize cost” or “maximize customers served”), that’s a linear optimization problem.
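To make the shape of such a problem concrete, here is a toy sketch in Python of a two-food, two-nutrient diet problem. The foods, costs and nutrient figures are invented for illustration and are not Stigler's data. The solver is a brute-force enumeration of the corner points of the feasible region, which only works for two variables and a bounded optimum; it is not Simplex and not how Glop works.

```python
from itertools import combinations


def solve_two_variable_lp(cost, constraints):
    """Minimise cost[0]*x + cost[1]*y subject to a*x + b*y >= rhs, x, y >= 0.

    Assumes non-negative costs, so an optimum sits at a corner point.
    Returns (value, x, y), or None if no feasible corner exists.
    """
    # Treat x >= 0 and y >= 0 as ordinary constraints.
    all_constraints = list(constraints) + [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(all_constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        feasible = all(a * x + b * y >= r - 1e-9 for a, b, r in all_constraints)
        if feasible:
            value = cost[0] * x + cost[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best


# Hypothetical data: food X costs 2 per unit, food Y costs 3 per unit.
# Nutrient 1: 1*x + 2*y >= 8.  Nutrient 2: 3*x + 1*y >= 9.
result = solve_two_variable_lp(
    cost=(2.0, 3.0),
    constraints=[(1.0, 2.0, 8.0), (3.0, 1.0, 9.0)],
)
print(result)
```

With these made-up numbers the cheapest diet is 2 units of X and 3 units of Y at a cost of 13. Real problems such as Stigler's 77 foods and nine nutrients have far too many corner points for this kind of enumeration, which is why algorithms like Simplex matter.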

At Google, our engineers work on plenty of optimization problems. One example is our YouTube video stabilization system, which uses linear optimization to eliminate the shakiness of handheld cameras. A more lighthearted example is in the Google Docs Sudoku add-on, which instantaneously generates and solves Sudoku puzzles inside a Google Sheet, using the SCIP mixed integer programming solver to compute the solution.
Today we’re proud to announce two new ways for everyone to solve linear optimization problems. First, you can now solve linear optimization problems in Google Sheets with the Linear Optimization add-on written by Google Software Engineer Mihai Amarandei-Stavila. The add-on uses Google Apps Script to send optimization problems to Google servers. The solutions are displayed inside the spreadsheet. For developers who want to create their own applications on top of Google Apps, we also provide an API to let you call our linear solver directly.
Second, we’re open-sourcing the linear solver underlying the add-on: Glop (the Google Linear Optimization Package), created by Bruno de Backer with other members of the Google Optimization team. It’s available as part of the or-tools suite and we provide a few examples to get you started. On that page, you’ll find the Glop solution to the Stigler diet problem. (A Google Sheets file that uses Glop and the Linear Optimization add-on to solve the Stigler diet problem is available here. You’ll need to install the add-on first.)

Stigler posed his problem as follows: given nine nutrients (calories, protein, Vitamin C, and so on) and 77 candidate foods, find the foods that could sustain soldiers at minimum cost.

The Simplex algorithm for linear optimization was two years away from being invented, so Stigler had to do his best, arriving at a diet that cost $39.93 per year (in 1939 dollars), or just over ten cents per day. Even that wasn’t the cheapest diet. In 1947, Jack Laderman used Simplex, nine calculator-wielding clerks, and 120 person-days to arrive at the optimal solution.

Glop’s Simplex implementation solves the problem in 300 milliseconds. Unfortunately, Stigler didn’t include taste as a constraint, and so the poor hypothetical soldiers will eat nothing but the following, ever:

  • Enriched wheat flour
  • Liver
  • Cabbage
  • Spinach
  • Navy beans

Is it possible to create an appealing dish out of these five ingredients? Google Chef Anthony Marco took it as a challenge, and we’re calling the result Foie Linéaire à la Stigler:
This optimal meal consists of seared calf liver dredged in flour, atop a navy bean purée with marinated cabbage and a spinach pesto.

Chef Marco reported that the most difficult constraint was making the dish tasty without butter or cream. That said, I had the opportunity to taste our linear optimization solution, and it was delicious.

Posted by Jon Orwant, Engineering Manager
Categories: Programming

Episode 211: Continuous Delivery on Windows with Rachel Laycock and Max Lincoln

Johannes talks with Rachel Laycock and Max Lincoln from ThoughtWorks about continuous delivery on Windows. The outline includes: introduction to continuous delivery; continuous integration; DevOps and ChatOps; decisions to be taken when implementing continuous delivery on windows; build tools on windows; packaging and deploy on windows; infrastructure automation and infrastructure as code with chef, puppet […]
Categories: Programming

Just Enough

Software Requirements Blog - Tue, 09/30/2014 - 17:00
One concept you’ll hear tossed about in an Agile discussion is that of “just enough.” You want just enough documentation, just enough development and testing, just enough time for meetings, just enough grooming, and so on. The idea is that doing more than is needed means you have throwaway work when you need to make […]
Categories: Requirements

Sponsored Post: Apple, Flipboard, All Your Base, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Siri Operations Developer. Apple is looking for talented developers to help build the next generation internal cloud platform for Siri. This person should be excited about solving difficult distributed systems problems as well as constantly improving user-experience. This person will be working with a highly technical and motivated team solving the hard problems. Please apply here.
    • Site Reliability Engineer. The Apple Pay Site Reliability Team is hiring for multiple roles focused on the front line customer experience and the back end integration of Apple systems with our Network and Banking partners. Please apply here.
    • Senior Software Engineer, iTunes Infrastructure. Hands-on senior software engineering for the iTunes digital media supply chain engineering team. We are looking for a self starting, energetic individual who is not afraid to question assumptions and with excellent written and oral communication skills. Please apply here.
    • iTunes - Content Management Tools Engineer. The candidate should have several years experience developing large-scale web-based applications using object-oriented languages. Excellent understanding of relational databases and data-modeling techniques is also a must. Please apply here.
    • Senior Engineer: Mobile Services. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here.
    • Apple Pay Automation Engineer. The Apple Pay group within iOS Systems is looking for an outstanding automation engineer with strong experience in building client and server test automation. We work in an agile software development environment and are building infrastructure to move towards continuous delivery where every code change is thoroughly tested by push of a button and is considered ready to be deployed if we choose so. Please apply here.

  • Flipboard's Site Reliability Engineering Team is hiring! This team offers great challenges solving unique problems unlike any you have seen!  They work exclusively in the cloud, ensuring a highly available and performant product to millions of users daily.  If you have a passion for large-scale systems, next generation provisioning and orchestration tools apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • All Your Base is the only curated database conference of its kind in the UK. Listen to talks from database creators, industry leaders and developers working at the coal face on where to store and how to handle your data. Book tickets.
Cool Products and Services
  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Better, Faster, Cheaper: Pick Three. Scalyr is your universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs”; our columnar data store enables enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – get on board!

  • Are You a Startup in need of Reliable Speed at Scale? The Aerospike Startup Special provides free access to the Aerospike Enterprise Edition software with Community Support for qualifying startups. Learn more and see if you qualify!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

What Can Lean Learn From Systems Engineering?

Herding Cats - Glen Alleman - Tue, 09/30/2014 - 16:33

The Lean Aerospace Initiative and the Lean Aerospace Initiative Consortium define processes applicable in many domains for applying Lean. At first glance there is no natural connection between Lean and Systems Engineering. The ideas below are from a paper I gave at a Lean conference.

Key Takeaways

  • Lean and Systems engineering are cousins.
  • All but trivial projects are systems and many are systems of systems. Thinking like a systems engineer is the basis of implementing Lean processes. Thinking in the absence of systems does little to add sustaining value to any process improvement.
  • Product development is a value stream process, but how the components interact at the technical, business, financial, and operational levels is a systems engineering process. Lean itself does not possess the vocabulary to speak to these systems complexity issues. [1]

Core Concepts of Systems Engineering

  1. Capture and understand the requirements in terms of Capabilities assessed through Measures of Effectiveness (MOE) and Measures of Performance (MOP).
  2. Ensure requirements are consistent with what is predicted to be possible in a solution in these MOEs and MOPs.
  3. Treat goals as desired characteristics for what may not be possible.
  4. Define the MOE, MOP, goals, and solutions for the whole lifecycle of the project in units meaningful to the buyer.
  5. Maintain the distinction between the statement of the problem and the description of the solution.
  6. Baseline each statement of the problem and the statement of the solution.
  7. Identify descriptions of alternative solutions.
  8. Develop descriptions of the solution.
  9. Except for simple problems, develop a logical solution description.
  10. Be prepared to iterate in design to drive up effectiveness.
  11. Base the solution on the evaluation of its effectiveness, in units of measure meaningful to the buyer.
  12. Independently verify all work products.
  13. Validate all work products from the perspective of the stakeholders.
  14. Some management is needed to plan and implement effective and efficient transformation of requirements and goals into a description of the solution.

Typical System Engineering Activities

  1. Technical management
  2. System design
  3. Product realization
  4. Technical analysis and evaluation
  5. Product control
  6. Process control
  7. Post implementation support

Steps to Lean Thinking [2]

  1. Specify value
  2. Identify value stream
  3. Make value flow continuously
  4. Let customers pull value
  5. Pursue perfection

Differences and Similarities between Lean and Systems Engineering

  1. Both emerged from practice. Only later were the principles and theories codified.
  2. Both have focused on different phases of the product lifecycle. SE is generally focused on product development and on planning, while Lean is generally focused on production and on empirical action.
  3. Unlike Lean, SE has less focus on quality, except for Integrated Product and Process Development (IPPD).

Despite these differences and similarities both Lean and Systems Engineering are focused on the same objectives – delivering products or lifecycle value to the stakeholders.

It is the lifecycle value that drives both paradigms and must drive any other process paradigm associated with Lean and Systems Engineering: paradigms like software development, the management of any form of project, and the very notion of agile. A critical understanding often missed is that Lifecycle Value includes the cost of delivering that value.

Value can't be determined in the absence of knowing the cost. ROI and Microeconomics of decision making require both variables to be used to make decisions.

What do we mean by lifecycle?

Generally lifecycle is a combination of product performance, quality, cost and fulfillment of the buyer's needed capabilities. [3]

Lean and Systems Engineering share this common goal. The more complex the system, the greater the contribution from Lean and SE.

Putting Lean and Systems Engineering Together on Real Projects

First, some success factors on complex projects: [4]

  1. Dedicated and stable interdisciplinary teams
  2. Use of prototypes and models to generate tradeoffs
  3. Prioritizing product features
  4. Engagement with senior management and customers at every point in the project
  5. Some form of high performing front end decision process that reduces instability of key inputs and improves the flow of work throughout the product lifecycle.

This last success factor is core to any complex environment, no matter what the process is called. In the absence of stability of requirements and funding, improvements to the flow of work are constrained.

The notion of adapting to changing requirements is not the same as having the requirements – and the associated funding – be unstable.

Mapping of the Value Stream to the work process requires some level of stability. It is the search for this stability where Systems Engineering – as a paradigm – adds measurable value to any Lean initiative.

The standardization and commonality of processes across complex systems is the basis for this value. [5]


  1. Lean and SE are two sides of the same coin regarding the objective of creating value for the stakeholder.
  2. Lean and SE complement each other during different phases of the project – ideation, product trades for SE and production waste removal for Lean anchor both ends of the spectrum of improvement opportunities.
  3. Value stream thinking makes visible the paths to be taken in transitioning to a Lean paradigm while maintaining the principles of systems engineering. [6]
  4. The result is the combination of Speed and Robustness – systems are easily adaptable to change while maintaining fewer surprises, using leading indicators to make decisions and decreasing sensitivity to production and use variables.

[1] “The Lean Enterprise – A Management Philosophy at Lockheed Martin,” Joyce and Schechter, Defense Acquisition Review Journal, 2004.

[2] Lean Thinking, Womack and Jones, Simon and Schuster, 1996

[3] Lean Enterprise Value: Insights from MIT’s Lean Aerospace Initiative, Murman, et al, Palgrave 2002.

[4] “Lean Systems Engineering: Research Initiatives in Support of a New Paradigm,” Rebentisch, Rhodes, and Murman, Conference on Systems Engineering, April 2004.

[5] LM21 Best Practices, Jack Hugus, National Security Studies, Louis A. Bantle Symposium, Syracuse University Maxwell School, October 1999

[6] “Enterprise Transition to Lean Roadmap,” MIT Lean Aerospace Initiative, 2004 Plenary Conference.


Categories: Project Management

Handling Requests for Unnecessary Artifacts

Mike Cohn's Blog - Tue, 09/30/2014 - 15:00

The following was originally published in Mike Cohn's monthly newsletter. If you like what you're reading, sign up to have this content delivered to your inbox weeks before it's posted on the blog, here.

“Working software over comprehensive documentation.” You’ve certainly seen that statement on the Agile Manifesto. It is perhaps the most important of the Manifesto’s four value statements—working software is, after all, the reason a team has undertaken a software development effort.

It is also one of the most misused parts of the Manifesto. This is the quote people cite when trying to get out of all documentation, which is not what the Manifesto says we should value.

Some documentation on a project can be great. But most non-agile teams write too much and talk too little. Some agile teams go to the opposite extreme, but many seem to find a good balance.

Occasionally, though, a team and product owner may disagree on the necessity of a document—usually with the product owner wanting a document and the team saying it’s not necessary. I’ve found two guidelines helpful in determining how to handle requests for various artifacts, especially documentation, on an agile project.

Guideline No. 1: If a team would produce an artifact while in the process of creating working software, that artifact is just naturally produced.

This guideline covers essentially everything a team would want to produce while on the way to building a system or product. It includes, for example, source code. It also includes any design documents, user guides and other items that the team wants to produce for the benefit of the current team, future teams maintaining the software or end users.

Guideline No. 2: If an artifact would not naturally be produced in the process of creating working software, the artifact is added to the product backlog.

The second guideline is there to cover cases when the product owner (or any other outside stakeholder) wants an artifact produced (usually a document) that the team would not normally produce.

For example, suppose the product owner asks the team to write a document describing every table and field in the database. I’ve certainly seen projects where such a document has been extremely helpful. (In fact, I’ve both requested and written such a document before.) But I’ve also seen projects where this would have been unnecessary.

If the team thought this database description document were helpful, they would produce it in the process of creating the working software. And Guideline No. 1 would apply. But if they don’t think this document is necessary, they won’t produce it. Unless, that is, the product owner insists, which is where Guideline No. 2 comes in.

If the product owner wants this document, the product owner creates a new product backlog item saying so. The team can then estimate the time it will take to develop this document, just like they’d estimate any other product backlog item.

Putting an estimate on creating the document makes its cost explicit. This forces a product owner to think about the opportunity cost of developing that document. The product owner will be able to ponder: This five-point document or five points worth of new features?

I don’t know which the product owner will choose, but this approach makes the cost of that artifact explicit, allowing it to be compared with the value of additional features instead.

I’d love to know your thoughts on this. How does your team handle product owner requests for artifacts the team wouldn’t naturally produce? What artifacts does your team find helpful? Please share your thoughts in the discussion below.

Is Agile Dead or Can Good Software Development Scale?

From the Editor of Methods & Tools - Tue, 09/30/2014 - 13:23
As Agile becomes widely accepted as a software development approach, many large organizations have adopted it, mainly in its Scrum form, to reduce the development cycle. There might even be a fair share of adopters that are really trying to apply Agile values. The topic of scaling Agile has been discussed for many years, and you can read the excellent books of Craig Larman and Bas Vodde on this topic. We have also recently seen the emergence of “proprietary” approaches, like SAFe, to achieve this goal. At the same time, ...

Distributed Agile: Sprint Planning

#6 Make sure the telecommunications tools work.

In Distributed Agile: Distributed Team Degree of Difficulty Matrix, I described the many flavors of distributed Agile teams and the complexity different configurations create. All things being equal, distributed teams are less effective than collocated teams. Nevertheless, distributed Agile teams spread across countries, continents and companies have become a fact of life. In an environment using Scrum, the first formal activity for most teams is sprint planning. There are numerous techniques that can help make distributed Agile more effective. These techniques include:

1.   Bring the team physically together. Co-location, whether for a single sprint or some periodic basis, will increase the team’s ability to understand each other and know how to work together more effectively.
2.   Develop a sprint planning checklist. The process of getting together and planning is a fairly predictable process. Capture the typical preparation and meeting tasks and make sure they happen. Items can include booking rooms, securing video or telecom facilities, publishing an agenda with breaks and more.
3.   Review the definition of done. Ensure that everyone understands the organization’s definition of done before starting to plan. The definition of done will help the team know the tasks they need to complete during the sprint to meet the organization’s (or product owner’s) process standards.
4.   Focus on the stories. Don’t let distractions get in the way of planning. Before beginning the planning session, review the process that will be followed with the entire team. Make sure that planning the next sprint is the only topic on the agenda.
5.   Ensure that the stories have been properly groomed. The stories that the team will accept and plan need to be properly formed and have acceptance criteria. This generally means that the stories that are most apt to be accepted by the team (and a few more) need to have been through a grooming session. Make this a prerequisite for the planning meeting.
6.   Make sure the telecommunications tools work and have a backup. Distributed planning means that all of the team will be using the phone or video conference. Make sure they are set up and tested. Also, always have a backup plan in case your favorite collaboration tool fails, because sooner or later it will. Planning is a whole team activity, and when the whole team can’t participate, planning will lose effectiveness.
7.   Everyone should understand the big picture. Have the product owner provide an overview of the goals of the project, and how the current sprint will support those goals. Repeating the big picture will provide the team with a common touch point to validate progress.
8.   Use physical tools for interaction. Physical tools, like flip charts and card walls, can be difficult when many locations are involved in sprint planning. However, when possible, use physical tools like flip charts and whiteboards and then use webcams (preferable) and cameras to share data. Have one location scribe one story and then switch locations for the next story.
9.   Try multiple facilitators. When a team is evenly distributed between two locations, consider having another Scrum master act as a second facilitator to ensure everyone stays on track. Alternatively, have the Scrum master rotate between locations to facilitate the planning session. This can be very effective in helping each location feel connected.
10.  Remember that sprint planning is a team meeting. Make sure everyone is involved.

Sprint planning, done well, helps a team understand what they have to do in order to consider a story complete, both from a functional and technical perspective. Distributed Agile teams will need to focus on making sure that everyone is involved and a part of the planning process. Remember to plan for planning, because when you are on the other end of a phone or videoconference, the tools, process and logistics can make or break the meeting!

Categories: Process Management

The Cognitive Illusion of Bad Software Project Outcomes

Herding Cats - Glen Alleman - Tue, 09/30/2014 - 00:41

In their paper On the Reality of Cognitive Illusions, ‡ Daniel Kahneman and Amos Tversky suggest, through their research, that intuitive predictions and judgements are often mediated by a small number of distinctive mental operations, called judgement heuristics. These heuristics often lead to characteristic errors and biases.

For example, the effect of aerial perspective on apparent distance is confirmed by the observation that the same mountain appears closer on a clear day than on a hazy day. The intuitive prediction and judgement of probability are often based on the relations of similarity between evidence and possible outcomes. This representativeness is an assessment of the degree of correspondence between a sample and a population.

The next heuristic is the availability bias, in which probability is estimated by assessing availability or associative distance. † Experience shows and experiments confirm that instances of large classes are recalled better and faster than instances of less frequent classes, that likely occurrences are easier to imagine than unlikely ones, and that associative connections are strengthened when two events frequently co-occur. That these associative bonds are strengthened by repetition is the basis of memory.

So Here's the Issue

When we hear or read that software projects fail often or Standish report says ..., or a personal anecdote that resonates with our own personal experience, we recall that experience from memory. The actual data from the population of all data are not used for comparison. Rather we assume - by applying the cognitive illusion - that the sample data represents the large class of population data, since our repeated observations of the sample data class have reinforced our illusion that the sample data IS the population.

This is the core issue with anecdotal information when making decisions in the presence of uncertainty. Or speaking about a condition in the absence of statistically testable hypothesis. Or attempting to convey a message in the absence of external confirmation that the message is on solid footing compared to the population of data.

Why This Is Not Good Management

When we hear we're all bad at making estimates, in the absence of actual population statistics about estimate making, we're using Cognitive Illusions and Availability Heuristics, because we have personal experience with making bad estimates and the majority of people we associate with have the same experience.

This experience is in no way representative of the population of people tasked with making estimates. This would be irrelevant, of course, if the conversation were simple chatter at the bar. But once that conversation enters the realm of policy making, method development, or suggestions that the anecdotal observations need to result in changing how business conducts its business - we're bad at making estimates so the solution is to stop making estimates - then both availability bias and Cognitive Illusions have displaced the actual conversation about the very validity of the anecdotal concepts. That conversation is replaced by a strong defense of the cognitively biased idea, no matter the credibility of the concept - which is most often weak at best and simply false at worst.

So next time you hear some statement about something involving observational and anecdotal data, ask a simple question.

What's the process by which these anecdotal observations have been tested in the broader population of conditions?

This is the core issue with the Standish Reports. They are self-selected samples of projects that are troubled, reported in the absence of the population of projects that are troubled and not troubled.
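To make the self-selection point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the 1,000-project population, the 30% troubled rate, and the rule that troubled projects always report while only one in seven untroubled projects do. None of it is Standish data.

```python
# Hypothetical population: 1,000 projects, 30% of them troubled.
# These figures are invented for illustration; they are not survey data.
projects = [{"id": i, "troubled": i % 10 < 3} for i in range(1000)]


def responds(project):
    """Assumed self-selection rule: troubled projects always report,
    untroubled ones report only occasionally (1 in 7 here)."""
    return project["troubled"] or project["id"] % 10 == 9


def troubled_rate(group):
    """Fraction of projects in the group that are troubled."""
    return sum(p["troubled"] for p in group) / len(group)


# The sample contains only the projects that chose to report.
sample = [p for p in projects if responds(p)]

population_rate = troubled_rate(projects)
sample_rate = troubled_rate(sample)

print(f"population: {population_rate:.0%} troubled")
print(f"self-selected sample: {sample_rate:.0%} troubled")
```

Under these assumed numbers the sample looks far more troubled than the population it came from, and nothing inside the sample itself reveals the gap. That is why the population data has to come from somewhere else.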

Always ask for references, data representative of the references, and an assessment of the statistical confidence that the anecdotal data is in fact correlated with the population data. Otherwise it's just an opinion, and very likely an uninformed opinion.

And if you're paying money to listen to someone tell you anecdotal data and don't speak up and ask those questions, you've participated in the availability heuristic and cognitive illusion along with the speaker.

† Availability: A Heuristic for Judging Frequency and Probability, Amos Tversky and Daniel Kahneman, a chapter appearing in Cognitive Psychology, 1973

‡ On the Reality of Cognitive Illusions, Daniel Kahneman and Amos Tversky, Psychological Review, Vol. 103, No. 3, pp. 582-591

Categories: Project Management

PostgreSQL: ERROR: column does not exist

Mark Needham - Mon, 09/29/2014 - 23:40

I’ve been playing around with PostgreSQL recently and in particular the Northwind dataset typically used as an introductory data set for relational databases.

Having imported the data I wanted to take a quick look at the employees table:

postgres=# SELECT * FROM employees LIMIT 1;
 EmployeeID | LastName | FirstName |        Title         | TitleOfCourtesy | BirthDate  |  HireDate  |           Address           |  City   | Region | PostalCode | Country |   HomePhone    | Extension | Photo |                                                                                      Notes                                                                                      | ReportsTo |              PhotoPath               
          1 | Davolio  | Nancy     | Sales Representative | Ms.             | 1948-12-08 | 1992-05-01 | 507 - 20th Ave. E.\nApt. 2A | Seattle | WA     | 98122      | USA     | (206) 555-9857 | 5467      | \x    | Education includes a BA in psychology from Colorado State University in 1970.  She also completed "The Art of the Cold Call."  Nancy is a member of Toastmasters International. |         2 | http://accweb/emmployees/davolio.bmp
(1 row)

That works fine but what if I only want to return the ‘EmployeeID’ field?

postgres=# SELECT EmployeeID FROM employees LIMIT 1;
ERROR:  column "employeeid" does not exist
LINE 1: SELECT EmployeeID FROM employees LIMIT 1;

I hadn’t realised (or had forgotten) that unquoted field names get folded to lower case, so we need to quote the name if it’s been stored in mixed case:

postgres=# SELECT "EmployeeID" FROM employees LIMIT 1;
 EmployeeID 
------------
          1
(1 row)

From my reading the suggestion seems to be to have your field names lower cased to avoid this problem but since it’s just a dummy data set I guess I’ll just put up with the quoting overhead for now.
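The folding rule is easy to model outside the database. The Python sketch below is a simplified illustration, not PostgreSQL's actual parser: `fold_identifier` is a hypothetical helper that lower-cases unquoted names and leaves double-quoted names untouched, which is enough to show why only the quoted form matches a column stored as EmployeeID.

```python
def fold_identifier(identifier: str) -> str:
    """Return the name that would be looked up for an identifier as typed.

    Simplified model of the rule: unquoted identifiers are folded to lower
    case, double-quoted ones keep their case (a doubled "" inside the quotes
    stands for one literal quote character).
    """
    is_quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if is_quoted:
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


# The column name as it was created by the import, in mixed case.
stored_column = "EmployeeID"

for typed in ("EmployeeID", "employeeid", '"EmployeeID"'):
    resolved = fold_identifier(typed)
    status = "found" if resolved == stored_column else "does not exist"
    print(f"{typed} -> {resolved}: {status}")
```

Only the quoted spelling resolves to the stored name. Creating columns in lower case from the start sidesteps the problem, because every unquoted spelling then folds to the same thing.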

Categories: Programming

R: Deriving a new data frame column based on containing string

Mark Needham - Mon, 09/29/2014 - 22:37

I’ve been playing around with R data frames a bit more and one thing I wanted to do was derive a new column based on the text contained in the existing column.

I started with something like this:

> x = data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
> x
             name
1  Java Hackathon
2 Intro to Graphs
3 Hands on Cypher

And I wanted to derive a new column based on whether or not the session was a practical one. The grepl function seemed to be the best tool for the job:

> grepl("Hackathon|Hands on|Hands On", x$name)
[1]  TRUE FALSE  TRUE

We can then add a column to our data frame with that output:

x$practical = grepl("Hackathon|Hands on|Hands On", x$name)

And we end up with the following:

> x
             name practical
1  Java Hackathon      TRUE
2 Intro to Graphs     FALSE
3 Hands on Cypher      TRUE

Not too tricky but it took me a bit too long to figure it out so I thought I’d save future Mark some time!
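For comparison, here is a rough equivalent of the same derivation in plain Python, using the standard `re` module rather than a data frame. The variable names are my own, and the case-insensitive flag is a small change from the R version above: it replaces the separate "Hands on|Hands On" alternatives (grepl offers a similar `ignore.case = TRUE` argument).

```python
import re

names = ["Java Hackathon", "Intro to Graphs", "Hands on Cypher"]

# Case-insensitive match, so "Hands on" and "Hands On" need only one alternative.
practical_pattern = re.compile(r"hackathon|hands on", re.IGNORECASE)

# One dict per row, with the derived boolean stored alongside the original name.
rows = [
    {"name": name, "practical": bool(practical_pattern.search(name))}
    for name in names
]

for row in rows:
    print(f"{row['name']:<16} {row['practical']}")
```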

Categories: Programming

Instagram Improved their App's Performance. Here's How.

Is flat design just another pretty face or is it a huge performance hack cloaked as a UI revolution? It turns out flat design is a stone cold performance win.

This and more is expertly explained by Tyler Kieft, Engineer at Instagram, in a crisp and content filled talk he gave at the @scale conference: Instagram on Typical Android. This talk was part of a series of talks given by Facebook on how to design for the reality of mobile applications across the globe, where phones are slower, screens are smaller, and networks are slower than they are in the US.

Designing for a typical phone rather than a high-end phone required the Instagram team to rethink their design in a deep way. One of the revelations in Tyler's talk was that moving to a flat design was huge in making the application more beautiful, more usable, and it also substantially increased performance.

This was quite a surprise. I've only ever thought of flat design as just a way to think about how to build pretty UIs. Silly me. Thanks to Tyler for explaining the benefits of flat design so clearly and forcefully, using Instagram as a great example of what is possible.

Flat design is the anti-skeuomorphism, going digital native, eschewing a slavish obsession with the appearance of reality, adopting simple elements, simple typography, flat colors, and simple designs.

Using flat design, Instagram was able to shave 120ms off its cold start times. It was also able to reduce the number of assets it took to display the feed screen from 29 assets down to 8 assets. All while making the application more beautiful and more usable, and giving more focus to the content across different phone sizes.

How did flat design make all this possible? Please keep on reading...

Categories: Architecture

The Future of Jobs

Will you have a job in the future?

What will that job look like and how will the nature of work change?

Will automation take over your job in the near future?

These are the kinds of questions that Ruth Fisher, author of Winning the Hardware-Software Game, has tackled in a series of posts.

I distilled her big ideas and insights about the future of jobs in my summary post:

The Future of Jobs

Fisher has done an outstanding job of framing out the landscape and walking through the various arguments and perspectives on how automation will change the nature of work and shape the future of jobs.

One of the first things you might be wondering is, what jobs will automation take away?

Fisher addresses that.

Another question is, what new types of jobs will be created?

While that’s an exercise for the reader, Fisher provides clues based on what industry luminaries have seen in terms of how jobs are changing.

The key is to know what automation can and can’t do, and to look at the pattern of work in terms of what’s better suited for humans, and what’s better suited for machines.

As one of my mentors puts it, “If the work can be automated, it’s not human.”

He’s a fan of people doing creative, non-routine work, where they can thrive and shine.

As I take on work, or push back on work, I look through a pretty simple lens:

  1. Is the work repetitive in nature? (in which case, something that should be automated)
  2. Is the work a high-value activity? (if not, why am I doing non high-value activities?)
  3. Does the work create greater capability? (for me, the team, the organization, etc.)
  4. Does the work play to my strengths? (if not, who is a better resource or provider?  You grow faster in your strengths, and in today’s world, if people aren’t giving their best where they have their best to give, it leads to a low-impact team that eventually gets out-executed, or put out to pasture.)
  5. Does the work lead to world-class impact?  (When everything gets exposed beyond the firewall, and when it’s a globally connected ecosystem, it’s really important to not only bring your A-game, but to play in a way where you can provide the best service in the world for your specific niche.   If you can’t be the best in your niche in a sustainable way, then you’re in the wrong niche.)

I find that by using this simple lens, I tend to take on high-value work that creates high impact and that cannot be easily automated.  At the same time, while I perform the work, I look for ways to turn things into repetitive activities that can be outsourced or automated so that I can keep moving up the stack, and producing higher-value work … that’s more human.

Categories: Architecture, Programming