Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Pairing Patterns

Actively Lazy - Wed, 02/04/2015 - 22:01

Pair programming is hard. When most developers start pairing it feels unnatural. After a lifetime of coding alone, headphones on, no human contact; suddenly talking about every damned line of code can seem weird. Counter-productive, even.

And yet… effective pairing is the cheapest way to improve code quality. Despite what superficially seems like a halving in productivity – after all, your team of eight developers are only working on four things now instead of eight! – it turns out that productivity doesn’t drop at all. If anything, I’ve seen the opposite.

Going it Alone

In my experience most developers are used to, and feel most comfortable, coding on their own. It seems the most natural way to write code. But it introduces all sorts of problems.

If you're the only person that wrote this code, there's only one person that knows it, which means at 3am in six months' time guess who's getting the phone call? And what happens when you decide to leave? No, worse, what happens when that other guy decides to leave and now you've got a metric fuckton of code to support. And of course, he couldn't code for shit. His code stinks. His design stinks. You question his ability, his morals, even his parentage. Because everybody codes to a different style it's hard to maintain any consistency. This varies from the most trivial of stylistic complaints (braces on new lines, puhleeze, what a loser) to consistency of architectural approach and standardised tools and libraries. This makes picking up other people's code hard.

When you're coding on your own, it's harder to be disciplined: I don't need to write a unit test for this class, it's pretty trivial. I don't need to refactor this mess, I know how it works. With nobody looking over your shoulder it takes a lot more self-discipline to write the high quality code you know you ought to.

Getting Started Pairing

The easiest way to get started is to pair with someone that's experienced at doing it. It can feel quite strange and quickly become dysfunctional if you're not used to it, so having an experienced hand show you what effective pairing feels like is really important.

The most important thing to realise is that pairing is incredibly social. You will spend a massive amount of time talking. It turns out that days of coding can save literally minutes of thought up front. When you're pairing, this thinking happens out loud as you argue about the best way to approach the design, the best way to test this class, the best way to refactor it.

This can feel alien at first and incredibly wasteful. Why don’t you just shut up and let me code? Because then we’ll just have to delete your crap code and you’ll feel bad. Or worse, we’ll push it so you don’t feel bad and then we’ll come back to this mess again and again over the coming months and pay an incredibly high price instead of spending another few minutes discussing it now until we agree.

The Roles

When pairing we traditionally label the two roles "driver" and "navigator". The driver is the person with their hands on the keyboard, typing. The navigator isn't. So what the hell's the navigator doing? The critical thing is that they're not just sitting there watching. The driver is busy writing good code that compiles; the driver is focused on details. The navigator is looking at the bigger picture: making sure that what we're doing is consistent with the overall design.

One thing I really struggle with, but which is really important as a navigator: don't interrupt the driver's flow. Resist the temptation to tell the driver there's a missing bracket or semi-colon. Resist the urge to tell them what order to fix the compile errors in. Keep track of what needs to be done; if the driver misses something small, write it down and come back to it.

The navigator should be taking copious notes, letting the driver stay hands-on-keyboard typing. If there's a test we've spotted we're missing, write it down. If there's an obvious design smell we need to come back to, write it down. If there's a refactoring we should do next, write it down. The navigator uses these notes to guide the coding session – ensuring details aren't missed and that we keep heading in the right direction and come back to every detail we've spotted along the way.

The navigator can also keep track of the development “call stack”. You know how it goes: we started writing the shopping basket returns a price in euros test; but to do that we need¬†to change¬†the basket item get price method; this breaks¬†a couple of basket item unit tests, the first of these shows we don’t have a currency conversion available for a basket item; so now we’re changing how currency conversion is constructed so we can pass it into the basket item factory. This call stack of development activities can get very deep if you’re not careful, but a disciplined navigator with a clear navigator’s pad will guide the way.
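The navigator's pad is essentially a last-in-first-out stack. A minimal sketch in Python (the task names are hypothetical, loosely following the example above):

```python
# A navigator's "pad" modelled as a stack: push a sub-task whenever the
# current work uncovers a prerequisite, pop it once it is finished.
pad = []

pad.append("test: basket returns price in euros")
pad.append("change BasketItem.get_price()")        # uncovered by the test
pad.append("fix broken BasketItem unit tests")     # uncovered by the change
pad.append("inject currency conversion via factory")

# Work always resumes at the deepest uncovered task...
assert pad.pop() == "inject currency conversion via factory"
# ...and unwinds back towards the original goal.
assert pad[-1] == "fix broken BasketItem unit tests"
```

A disciplined pad means nothing uncovered along the way is forgotten, however deep the stack gets.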

Changing Roles

Generally the person that knows the domain / code base / problem the best should spend the least time being the driver. If I don't know this code and you're driving, I'm just gonna sit here watching you type. I can't really contribute any design ideas because you know the domain. I can't ask questions because it stops you typing. But the other way round: I can be busy typing, learning the code as I go, while you use your superior knowledge to guide me in the right direction. I can ask lots of questions because when I don't know, work stops until I'm happy again.

A good approach can be ping-pong pairing: one person writes a failing test; the other makes it pass, then writes another failing test; back to the first to make this test pass and write another failing test; and so on and so on… This can give a good balance to a pairing session as both developers write test and production code, and it gives a natural rhythm that prevents any one developer from dominating the driver role.
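One round of ping-pong can be sketched with Python's unittest; the Basket class and its tests here are hypothetical, purely to show the rhythm:

```python
import unittest

# Developer A writes a failing test first...
class BasketTest(unittest.TestCase):
    def test_empty_basket_total_is_zero(self):
        self.assertEqual(Basket().total(), 0)

    def test_total_sums_item_prices(self):
        basket = Basket()
        basket.add(3)
        basket.add(4)
        self.assertEqual(basket.total(), 7)

# ...Developer B writes just enough production code to make it pass,
# then writes the next failing test and hands the keyboard back.
class Basket:
    def __init__(self):
        self.items = []

    def add(self, price):
        self.items.append(price)

    def total(self):
        return sum(self.items)

# Run the suite programmatically (equivalent to `python -m unittest`).
suite = unittest.TestLoader().loadTestsFromTestCase(BasketTest)
assert unittest.TextTestRunner().run(suite).wasSuccessful()
```

Each keyboard swap happens at a green bar, so both developers share the test-writing and the production-code-writing.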

Sometimes it's necessary to impose a time limit; I find 25 minutes is long enough for one person to be driving. This can happen when someone has an idea about a refactoring, especially if it becomes a sprawling change. 25 minutes also puts a good upper limit on a change: if you've not been able to commit to source control in 25 minutes, it is definitely time to abort and do-over.

At the end of the day, write up your navigator pad and email it to your partner. The following day you can swap pairs, allowing either of you to carry on from exactly where you left off today.


Pairing can feel strange at first, but with practice it will begin to feel normal. If you can keep pairing day-in day-out you will come to rely on having a second brain alongside you. You'll realise you can get through complex work faster because you've got two people working at different detail levels. Keep pairing long enough and coding on your own will begin to feel strange, almost dangerous. Who's watching my back?

Categories: Programming, Testing & QA

Matt Cutts: 10 Lessons Learned from the Early Days of Google

I mainly know of Matt Cutts, long time Google employee (since 2000) and currently head of Google's Webspam team, from his appearances on TwiT with Leo Laporte. On TwiT Matt always comes off as smart, thoughtful, and a really nice guy. This you might expect.

What I didn't expect from this talk he gave, Lessons learned from the early days of Google, is that Matt also turns out to be quite funny and a good storyteller. The stories he's telling are about Matt's early days at Google. He puts a very human face on Google. When you think everything Google does is a calculation made by some behind-the-scenes AI, Matt reminds us that it's humans making these decisions and they generally just do the best they can.

The primary theme of the talk is innovation and problem solving through creativity. When you are caught between a rock and a hard place you need to get creative. Question your assumptions. Maybe there’s a creative way to solve your problem?

The talk is short and well worth watching. There are lots of those fun little details that only someone with experience and perspective can give. And there’s lots of wisdom here too. Here’s my gloss on Matt’s talk:

1. Sometimes creativity makes a big difference.
Categories: Architecture

Microservices versus the common SOA implementation

Xebia Blog - Wed, 02/04/2015 - 15:02

When I first read about microservices architectures (MSA) I had a hard time figuring out what was so different from a Service Oriented Architecture (SOA). The main reason for this is that the SOA paradigm leaves quite a bit of room for interpretation, and various people interpret it differently. When Martin Fowler wrote about MSA almost a year ago he also mentioned that some people see it as "SOA done right"; he himself considers MSA a subset of SOA. So, what are the differences, if any?
At first glance there seem to be more than a few similarities. Both talk about services as the cornerstone of the architecture, services need to be loosely coupled and technology agnostic, and there are probably a few more. What sets the microservices architectural style apart is that it's clearer on where the focus needs to be when designing your services. The name suggests that the services need to be very small and probably very fine grained, but in reality no size restrictions apply. Ironically, it's the size (and focus) of services that commonly cripples SOA implementations.
The main characteristics of MSA are: autonomy, high cohesion and low coupling. Out of these characteristics autonomy is the defining one. Here autonomy not only means that a service can run or function independently (that is what low coupling is about), it means that a service should be able to provide business value on its own. This principle ties together the focus on low coupling and high cohesion. Lower coupling allows the service to operate independently, and high cohesion increases its ability to add value on its own.
What you often see in SOA implementations is a focus on reuse. This means that a certain function should only exist in one place, or that a single service should handle a certain function. Where things go sour is in how these "functions" are determined. This is where cohesion comes into play. Cohesion is the degree to which functionality in a service belongs together. The highest cohesion is functional cohesion, where the services contribute to a well-defined task or business capability. MSA strives towards functional (high) cohesion. The cohesion found in common SOA implementations is logical cohesion (one step up from the worst kind, coincidental cohesion). With logical cohesion the services are grouped around similar tasks, like the one service for one function mentioned above. This approach leads to finer grained services that focus on atomic tasks. Take for instance a "data service" that handles all communication with a database. It accepts messages in a common format and then uses that information to execute the desired action on the database. The benefit is that applications that use this service don't need to worry about the tech behind said service and only need to provide messages in a common format. If the database is shared by multiple services they all use this service when communicating with the database. Now imagine what would happen if this service goes down (intentionally or not). Or what the impact is when this service needs to be modified. And this is assuming that this service is well designed and built so it can handle all the different requests thrown at it.
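The contrast between the two kinds of cohesion can be sketched in a few lines of Python; both services are hypothetical illustrations, not a prescribed design:

```python
# Logical cohesion: one generic "data service" grouped around a similar
# task (database access). Every other service depends on it, so its
# downtime or modification ripples everywhere.
class DataService:
    def handle(self, message):
        # Parse the common message format and run the requested action.
        table, action, payload = message["table"], message["action"], message["payload"]
        return f"{action} on {table}: {payload}"

# Functional cohesion: a service grouped around one business capability,
# owning its own storage end to end. It can add value on its own.
class CustomerService:
    def __init__(self):
        self._customers = {}          # private storage, not shared

    def register(self, customer_id, name):
        self._customers[customer_id] = name

    def lookup(self, customer_id):
        return self._customers.get(customer_id)
```

The customer service can keep registering and looking up customers regardless of what any other service is doing; the data service is a single point of failure for everyone that talks to the database through it.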
With MSA the goal should be high cohesion, which means grouping things around a business capability (so it can add business value). It also provides all aspects of that capability end-to-end, from data storage to user interface functions. So instead of having a "data service" you create a "customer service" or "shipping service".
Another aspect that is relevant here is that with MSA one should strive for low coupling (or better yet, no coupling). Low coupling boils down to the dependency between services. With no coupling the services can function completely independent of each other. If two services have no coupling downtime of one of the two will have zero impact on the other. The higher the coupling, the higher the impact.
With SOA implementations based on logical cohesion the coupling tends to be high, because services depend on other services to function. High coupling increases the impact of changes.
For MSA the goal is no coupling. But lowering the coupling does not mean they can’t or shouldn’t interact. Services can still interact with other services but they don’t depend on them. Another distinction to take into consideration is that these are more technical dependencies, not functional ones. Take for instance an order service and a shipping service. Both can operate independently. The shipping service will process all the orders it has received, regardless of the availability of an order service. If the order service is down it simply means no new orders will be created for the shipping service. So when the shipping service is done handling its last known order it stops. Vice versa, if the shipping service is down the order service will still be able to process orders. They just won’t be processed by the shipping service. When the shipping service comes back up it will process the backlog created by the order service.
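A minimal sketch of this kind of decoupling, assuming the services exchange work through a message backlog rather than calling each other directly (Python; the names are hypothetical):

```python
from collections import deque

# The only "coupling" between the services is a message backlog:
# neither service ever calls the other directly.
backlog = deque()

class OrderService:
    def create_order(self, order_id):
        backlog.append(order_id)       # works even if shipping is down

class ShippingService:
    def __init__(self):
        self.shipped = []

    def process_backlog(self):
        while backlog:                 # drains whatever has accumulated
            self.shipped.append(backlog.popleft())

orders = OrderService()
orders.create_order("A-1")
orders.create_order("A-2")             # shipping is "down" so far: no impact

shipping = ShippingService()
shipping.process_backlog()             # catches up once it is back
assert shipping.shipped == ["A-1", "A-2"]
```

Downtime of one service merely pauses the flow of messages; it never stops the other service from doing its own work.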
How much the Microservices architectural style differs from SOA depends on who you talk to. If nothing else, MSA offers a clearer and better defined approach on setting up an architecture built around services. The key differentiator being the focus on autonomously adding value. There is a lot more to say on the subject but hopefully this gives some insight into how MSA differentiates itself from what you typically see with SOA implementations.

Quote of the Day

Herding Cats - Glen Alleman - Wed, 02/04/2015 - 15:02

Ensure Good Data gets to the Bad Decision Makers

This quote is from a DOD Contracting surveillance officer on the inability of some managers to use data for decision making.

Making good decisions requires good data. Data about the future. The confidence in that data starts with gathering good data. It then moves to understanding the naturally occurring and event based uncertainties in that data. With this understanding the decisions can then be based on risk informed, statistically adjusted impacts to cost, schedule, and technical performance for future outcomes.

No Data? Driving in the Dark with Your Headlights Off. Hoping you don't run off the road.

My favorite though is this one: driving in the rear view mirror, where objects are closer than they appear.


Related articles:
  • We Suck At Estimating
  • Building the Perfect Schedule
  • Don't Manage By Quoting Dilbert
Categories: Project Management

What Model Do Your Estimates Follow?

For years, we bought the cone of uncertainty for estimation—that is, our estimates were just as likely to be over as under.

Laurent Bossavit, in The Leprechauns of Software Engineering, shows us how that assumption is wrong. (It was an assumption that some people, including me, assumed was real.)

This is a Gaussian (normal) distribution. It’s what we expect. But, it’s almost never right. As Laurent says,

“Many projects stay in 90% done for a long time.”

What curve do our estimates follow if they don't follow a Gaussian distribution?

Troy Magennis, in “The Economic Impact of Software Development Process Choice – Cycle Time Analysis and Monte Carlo Simulation Results,” suggests we should look at the Power-Law (Weibull) distribution.

What this distribution says with respect to estimation is this: We are good at estimating small things. Our estimates get much worse quickly, and for the long tail (larger and larger chunks of work), we are quite bad.
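The difference in shape is easy to see by sampling both distributions with Python's standard library (the parameters here are illustrative, not calibrated to any real project data):

```python
import random

random.seed(42)

# A symmetric (Gaussian) model says overruns and underruns are equally
# likely around the estimate, and the tail dies off quickly.
gaussian = [random.gauss(10, 2) for _ in range(10_000)]

# A Weibull model with shape < 1 has a long right tail: most work
# finishes near the estimate, but some items take far, far longer.
weibull = [random.weibullvariate(10, 0.8) for _ in range(10_000)]

assert max(gaussian) < 25    # the Gaussian tail vanishes fast
assert max(weibull) > 50     # the long tail does not
```

That long tail is exactly the "90% done for a long time" project: a few pieces of work blow out by multiples of the estimate, not by a symmetric plus-or-minus.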

Why? Because creating software is innovation. Building software is about learning. We better our learning as we proceed, assuming we finish features.

We rarely, if ever, do the same thing again. We can’t apply precise estimation approaches to something we don’t know.

You should read Troy's paper because it's fascinating. It's well-written, so don't worry about not being able to understand it. You will understand it. It's only 10 pages long.

The question is this: What effect does understanding an estimation model have on our estimates?

If we know that the normal distribution is wrong, then we won’t apply it. Right, why would you do something you know to be wrong? You would not estimate large chunks and expect to have a +/- 10% estimate. It doesn’t make sense to do that.

But what can we do? In the printed proceedings (p. 5061), Troy has a table that is telling. In it, he says that if you have large unique work items or large WIP, you will have poor predictability. (He has suggestions for what to do in your process.)

My suggestions for your estimation:

  1. Estimate small chunks of work that a team can complete in a day or so.
  2. Keep WIP low.
  3. Replan as you finish work.
  4. Watch your cycle time.
  5. No multitasking.

What should you do when people ask you for estimates? What kind of requirements do you have? If you have large requirements, follow my advice and use the percentage confidence, as in We Need Planning; Do We Need Estimation? Read the estimation series or get Essays on Estimation.

You can predict a little for estimates. You can better your prediction. And, you may have to predict a large effort. In that case, it helps to know what distribution model might reflect your estimate.

Categories: Project Management

Do You Have Senior Management Support?

Do you have someone backing you up?

Audio Version on SPaMCAST 143

I asked many of my colleagues what they thought were the precursors to beginning a CMMI change program. Almost to a person, they began their list with senior management support, which makes sense as the CMMI has become the top-down process improvement framework of choice, and a prominent attribute of top-down change programs is the need for explicit senior management support.

Deciding whether your process improvement program is best pursued from a bottom-up or a top-down perspective is not a throw-away question; the method you are using changes as it matures over time. I have heard that during the early days of the SEPG conference there were numerous presentations on how the CMMI could be implemented as a grassroots change program. Presentations on pursuing the CMMI using grassroots techniques are now few and far between; however, if you go to an Agile conference you will still see presentations of this type.

Given the importance of senior management support, you need to ensure you have it BEFORE you start any top-down improvement program using a framework like the CMMI.  There are six things to consider when determining whether you have senior management support. They are:

  1. Assigning the right people
  2. Being visible
  3. Organizational change management support
  4. Providing required resources
  5. Enforcing the processes
  6. Having a constancy of purpose

Assigning the right people: Start by assessing whether your top performers and leaders are assigned to staff your CMMI change program. Assigning the best and brightest serves multiple purposes. Top performers tend to demand and draw respect from the staff.  Secondly, assigning the best and brightest is a show of determination by the organization.

Being visible: Do members of the senior management team attend training classes or status meetings? Do they stop people in the hall and ask about the program? Being visible is a convincing way to prove that the CMMI program is important and that success is personal. Tom Szurszewski said, "The Senior Management/Sponsor should attend the 'Intro to CMMI' class, along with the individuals who were being charged with 'making it happen.'" Participating in training ensures an equal level of understanding and a very public show of visibility.

Organizational change management support: Making the changes needed to support the CMMI tends to require organizational nips and tucks. Only senior management can grease the skids to make organizational changes. Nanette Yokley stressed the need for an Executive Sponsor that "ideally … understands what they are getting into related to changes needed and long-term process."

Providing the required resources: Resources can include budget, tools, space, training classes and others. Without the right resources, change programs will struggle. Trying to apply the CMMI on the cheap is usually a prescription for problems. Paul Laberge went to the heart of the matter with one of his comments, saying, "management must ensure the availability of a resource (or more) to maintain the process improvement program and documented processes."

Enforcing the processes: When implementing any process changes, using the process can’t be optional. When push comes to shove (and it will), management can’t hand out free passes. Management must enforce the process or risk the failure of the program.

Constancy of purpose: W. Edwards Deming felt so strongly about the need for constancy of purpose that he made it the first of his famous fourteen points. Lasting change requires a focus that goes past the first problem or the next quarter. If the CMMI is perceived to be the change "flavor of the week," the overall degree of difficulty of staying the course will be higher than expected. Dominique Bourget talked about measuring "the will of the upper management to improve." Frankly, that will say a lot about the staying power of any change program.


  1. Review each attribute. Can you honestly say that your senior management team (usually more than one) is delivering on each attribute?
  2. Answer each with a yes or no.

If you answered yes to five or more, you are in good shape. If you answered yes to fewer than five, it is time for a serious conversation with your senior management on how to remediate the problems and build management support.

Categories: Process Management

Apache Spark

Xebia Blog - Tue, 02/03/2015 - 21:22

Spark is the new kid on the block when it comes to big data processing. Like Hadoop, it is an open-source cluster computing framework, but measured by community contribution Spark is much more popular. How come? What is so special and innovative about Spark? Is it that Spark makes big data processing easy and much more accessible to the developer? Or is it because the performance is outstanding, especially compared to Hadoop?

This article gives an introduction to both systems and compares them in depth in order to explain the power of Spark.

Parallel computing

Before we can compare Hadoop and Spark, we first have to understand the difference between Hadoop's approach and conventional parallel computing.

Distributed computing

In the case of parallel computing, all tasks have access to shared data to exchange information and perform their calculations. With distributed computing, each task has its own data. Information is exchanged by passing data between the tasks.
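The distinction can be sketched in a few lines of Python: parallel workers all read one shared dataset, while "distributed" workers each own a partition and exchange results as messages (a toy illustration, not a real cluster):

```python
from threading import Thread
from queue import Queue

numbers = list(range(100))

# Parallel computing: all tasks have direct access to the shared data.
shared_partials = []
def parallel_worker(lo, hi):
    shared_partials.append(sum(numbers[lo:hi]))   # reads shared memory

threads = [Thread(target=parallel_worker, args=(i, i + 50)) for i in (0, 50)]
for t in threads: t.start()
for t in threads: t.join()

# Distributed computing: each task owns its data; information is
# exchanged by passing messages, not through shared memory.
results = Queue()
def distributed_worker(own_partition):
    results.put(sum(own_partition))               # only sees its own data

partitions = [numbers[:50], numbers[50:]]
workers = [Thread(target=distributed_worker, args=(p,)) for p in partitions]
for w in workers: w.start()
for w in workers: w.join()

assert sum(shared_partials) == results.get() + results.get() == sum(numbers)
```

Both arrive at the same answer; the difference is who owns the data and how partial results travel, which is exactly what data locality is about.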

One of the main concepts of distributed computing is data locality, which reduces network traffic. Because of data locality, data is processed faster and more efficiently. There is no separate storage network or processing network.

Apache Hadoop delivers an ecosystem for distributed computing. One of the biggest advantages of this approach is that it is easily scalable, and one can build a cluster with commodity hardware. Hadoop is designed in the way that it can handle server hardware failures.


To understand the main differences between Spark and Hadoop we have to look at their stacks. Both stacks consist of several layers.

Stacks of Spark and Hadoop

The storage layer is responsible for a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster. Spark uses the Hadoop layer. That means one can use HDFS (the file system of Hadoop) or other storage systems supported by the Hadoop API. The following storage systems are supported by Hadoop: your local file system, Amazon S3, Cassandra, Hive, and HBase.

The computing layer is the programming model for large scale data processing. Hadoop and Spark differ significantly in this area. Hadoop uses a disk-based solution provided by a map/reduce model. A disk-based solution persists its temporary data on disk. Spark uses a memory-based solution with its Spark Core. Therefore, Spark is much faster. The differences in their computing models will be discussed in the next chapter.

Cluster managers are a bit different from the other components. They are responsible for managing computing resources and using them for the scheduling of users' applications. Hadoop uses its own cluster manager (YARN). Spark can run over a variety of cluster managers, including YARN, Apache Mesos, and a simple cluster manager called the Standalone Scheduler.

A concept unique to Spark is its high-level packages. They provide lots of functionality that isn't available in Hadoop. One can also see this layer as a sort of abstraction layer, whereby code becomes much easier to understand and produce. These packages are:

  • Spark SQL is Spark's package for working with structured data. It allows querying data via SQL.
  • Spark Streaming enables processing live streams of data, for example log files or a Twitter feed.
  • MLlib is a package for machine learning functionality. A practical example of machine learning is spam filtering.
  • GraphX is a library that provides an API for manipulating graphs (like social networks) and performing graph-parallel computations.

The Spark Core is itself written in Scala and supports Scala natively, which is a far better language than Java for implementing the kinds of transformations it supports. This results in less code, which is more intuitive.


Computational model

The main difference between Hadoop and Spark is the computational model. A computational model is the algorithm and the set of allowable operations to process the data.

Hadoop uses map/reduce, which involves several steps.

Hadoop computational model: Map/Reduce

  • The input data is processed and indexed on a key/value basis. This processing is done by the map task.
  • Then the data is shuffled and sorted among the nodes based on the keys, so that each node contains all the values for a particular key.
  • The reduce task performs computations over all the values of each key (for instance counting the total values of a key) and writes the results to disk.

With the Hadoop computational model, only two functions are available: map and reduce.
Note that distributed computing with Hadoop results, most of the time, in several iterations of map/reduce. After each iteration, all data is persisted on disk; that is why it is called disk-based computing.
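The three steps can be sketched as a tiny in-process word count (plain Python mimicking the model, not the Hadoop API; on a real cluster the shuffle step moves data between nodes and spills to disk):

```python
from itertools import groupby
from operator import itemgetter

lines = ["spark is fast", "hadoop is disk based"]

# Map: emit (key, value) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle & sort: bring all values for a key together (on a cluster this
# is where data travels between nodes and is persisted to disk).
mapped.sort(key=itemgetter(0))

# Reduce: one computation per key.
counts = {key: sum(v for _, v in group)
          for key, group in groupby(mapped, key=itemgetter(0))}

assert counts["is"] == 2 and counts["spark"] == 1
```

Anything more complicated than a single word count has to be expressed as a chain of these map/shuffle/reduce rounds, paying the disk cost on every round.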

Spark uses RDDs, or Resilient Distributed Datasets. Working with and processing data in RDDs is much easier:

Spark computational model: RDD

  • Reading input data creates an RDD.
  • Transforming data creates new RDDs (with each iteration, and in memory). Each transformation of the data results in a new RDD. For transforming RDDs there are lots of functions one can use, like map, flatMap, filter, distinct, sample, union, intersection, subtract, etc. With map/reduce you only have the map function.
  • Calling actions computes a result (output data). Again there are lots of actions available, like collect, count, take, top, reduce, fold, etc., instead of only reduce with map/reduce.
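The transformation/action model can be mimicked in plain Python (a sketch of the shape of the API only; real Spark code would go through a SparkContext and keep each intermediate RDD in memory across the cluster):

```python
lines = ["spark is fast", "hadoop is disk based"]

# Transformations: each step derives a new dataset from the previous one
# (in Spark, each of these would be a new in-memory RDD).
words   = [w for line in lines for w in line.split()]   # flatMap
no_stop = [w for w in words if w != "is"]               # filter
lengths = [len(w) for w in no_stop]                     # map

# Actions: compute a result from the final dataset.
total   = sum(lengths)                                  # reduce
longest = max(no_stop, key=len)                         # top(1)

assert total == 24
assert longest == "hadoop"
```

Because the intermediate datasets stay in memory, chaining many transformations costs nothing extra on disk, unlike chaining map/reduce rounds.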

Performance Hadoop vs Spark

Behind the scenes, Spark does a lot, like distributing the data across your cluster and parallelizing the operations. Note that distributed computing with Spark is memory-based computing: data between transformations is not saved to disk. That's why Spark is so much faster.


All in all, Spark is the next step in the area of big data processing, and it has several advantages compared to Hadoop. The innovation of Spark lies in its computational model. The biggest advantages of Spark over Hadoop are:

  • Its in-memory computing capabilities that deliver speed
  • Packages like streaming and machine-learning
  • Ease of development - one can program natively in Scala

How to Write a Book: Structured or Emergent

NOOP.NL - Jurgen Appelo - Tue, 02/03/2015 - 19:29

I believe most authors apply the hybrid approach to writing. They start anywhere they want, either with a logical outline or with some random writing, but then they bounce up and down continuously to ensure that their writing has both structure and surprise.

The post How to Write a Book: Structured or Emergent appeared first on NOOP.NL.

Categories: Project Management

Sponsored Post: Apple, Couchbase, Farmerswife, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Apple is hiring a Application Security Engineer. Apple’s Gift Card Engineering group is looking for a software engineer passionate about application security for web applications and REST services. Be part of a team working on challenging and fast paced projects supporting Apple's business by delivering high volume, high performance, and high availability distributed transaction processing systems. Please apply here.

  • Want to be the leader and manager of a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join farmerswife on the beautiful island of Mallorca and help create the next generation on project management tools. Please apply here.

  • Senior DevOps EngineerSocialRadar. We are a VC funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems EngineerTransversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud and today we are one of the nordics bigger AWS consumers, the time has come for us to the embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology and micro services, we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you.
Cool Products and Services
  • See how LinkedIn uses Couchbase to help power its “Follow” service for 300M+ global users, 24x7.

  • VividCortex is a hosted (SaaS) database performance management platform that provides unparalleled insight and query-level analysis for both MySQL and PostgreSQL servers at micro-second detail. It's not just another tool to draw time-series charts from status counters. It's deep analysis of every metric, every process, and every query on your systems, stitched together with statistics and data visualization. Start a free trial today with our famous 15-second installation.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs. After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.


If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

The First Annual Testing on the Toilet Awards

Google Testing Blog - Tue, 02/03/2015 - 17:51
By Andrew Trenk

The Testing on the Toilet (TotT) series was created in 2006 as a way to spread unit-testing knowledge across Google by posting flyers in bathroom stalls. It quickly became a part of Google culture and is still going strong today, with new episodes published every week and read in hundreds of bathrooms by thousands of engineers in Google offices across the world. Initially focused on content related to testing, TotT now covers a variety of technical topics, such as tips on writing cleaner code and ways to prevent security bugs.

While TotT episodes often have a big impact on many engineers across Google, until now we never did anything to formally thank authors for their contributions. To fix that, we decided to honor the most popular TotT episodes of 2014 by establishing the Testing on the Toilet Awards. The winners were chosen through a vote that was open to all Google engineers. The Google Testing Blog is proud to present the winners that were posted on this blog (there were two additional winners that weren’t posted on this blog since we only post testing-related TotT episodes).

And the winners are ...

Erik Kuefler: Test Behaviors, Not Methods and Don't Put Logic in Tests 
Alex Eagle: Change-Detector Tests Considered Harmful

The authors of these episodes received their very own Flushy trophy, which they can proudly display on their desks.

(The logo on the trophy is the same one we put on the printed version of each TotT episode, which you can see by looking for the “printer-friendly version” link in the TotT blog posts).

Congratulations to the winners!

Categories: Testing & QA

What does it mean when we say 80% confidence in a number?

Herding Cats - Glen Alleman - Tue, 02/03/2015 - 17:07

Confidence intervals are a means of estimating population parameters. A concern in inferential statistics (making a prediction from a sample of data or from a model of that data) is the estimation of the population parameter from the sample statistic.

The sample statistic is calculated from the sampled data and the population parameter is estimated from this sample statistic.

  • Statistics are calculated - this means the data we are looking at - for example, the time series of values in a project - are used in a calculation
  • Parameters are estimated - a parameter is then estimated from these numbers in the time series. This estimate has a confidence interval. From this estimate we can make inferences.

One issue in making inferences - estimating - is sample size determination. How large a sample do we need to make an accurate estimate? This is why small sample sizes produce very unreliable inferences. For example, sampling 27 stories in an agile project and making an inference about how the remaining stories are going to behave is very sporty business.
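To see why small samples are a problem, here's a quick sketch (in Python rather than the R used later in this post) of how the maximum error of a mean estimate shrinks with sample size. The standard deviation of 10 and the 95% level are illustrative assumptions, not values from the post:

```python
import math

def half_width(n, s=10.0, z=1.96):
    """Maximum error E of the estimate of a mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

for n in (27, 108, 270, 1080):
    print(f"n = {n:5d}  E = {half_width(n):.2f}")
```

Note that E shrinks only with the square root of n: quadrupling the sample merely halves the error, which is why 27 stories buy you so little confidence.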

To have a good estimator, that is to make good estimates from sampled or simulated data the estimator must be:

  • Unbiased - the expected value of the estimator must be equal to the mean of the parameter
  • Consistent - the value of the estimator approaches the value of the parameter as the sample size increases
  • Relatively Efficient - the estimator has the smallest variance of all estimators which could be used.
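These properties can be checked by simulation. Here's a small Python sketch (my own illustration, with arbitrary parameters, not something from the post) showing that the sample mean is an unbiased and consistent estimator of the population mean:

```python
import random
import statistics

random.seed(42)
MU, SIGMA = 5.0, 2.0   # true population parameters (assumed for the demo)

# Unbiased: the average of many sample means converges on the true mean,
# even when each individual sample is tiny (n = 5).
sample_means = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(5))
                for _ in range(2000)]
print("mean of sample means:", statistics.mean(sample_means))

# Consistent: a single estimate improves as the sample size grows.
for n in (10, 100, 10000):
    est = statistics.mean(random.gauss(MU, SIGMA) for _ in range(n))
    print(f"n = {n:5d}  estimate = {est:.3f}")
```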

The Confidence Interval

The point estimate differs from the population parameter due to sampling error, and there is no way to know how close it is to the actual parameter. Because of this, statisticians give an interval estimate: a range of values used to estimate the parameter.

What's the cost of this project going to be when we're done with all our efforts, given we've done some work so far?

The confidence interval is an interval estimate with a specific level of confidence. A level of confidence is the probability that the interval estimate will contain the parameter. The level of confidence is 1 − α, where the 1 − α area lies within the confidence interval. The maximum error of the estimate, E, is ½ the width of the confidence interval.

For a symmetric distribution, the confidence interval runs from the point estimate minus the maximum error of the estimate to the point estimate plus the maximum error; the true population parameter lies between the two.
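As a concrete sketch of "point estimate plus or minus E" - in Python rather than R, using the six Old Faithful waiting times recorded in the next section:

```python
import math
import statistics

waiting = [79, 54, 74, 62, 85, 55]       # minutes, from the sample below
n = len(waiting)
xbar = statistics.mean(waiting)          # point estimate of the mean
s = statistics.stdev(waiting)            # sample standard deviation
t = 2.571                                # t critical value, 95%, df = n - 1 = 5
E = t * s / math.sqrt(n)                 # maximum error of the estimate
print(f"{xbar - E:.1f} < mu < {xbar + E:.1f}")
```

With only six observations the interval is wide - roughly 55 to 82 minutes - which is the small-sample problem from earlier in numbers.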

An Example from Actual Observations

While staying at the Yellowstone Lodge during the Millennium (year 2000), our kids got sick with some type of flu going around the lodge. My wife lay in bed, tending them all night long and passed the time recording data about Old Faithful erupting outside our bedroom window. 

The data looked something like this:

        Eruptions Waiting 
1     3.600      79 
2     1.800      54 
3     3.333      74 
4     2.283      62 
5     4.533      85 
6     2.883      55

Eruptions is the duration of the eruption of Old Faithful and Waiting is the waiting time before the next eruption. There is a correlation between these pieces of data. This is due to the physical processes of expelling water at high temperature and the refilling processes of the caverns below the surface.
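The strength of that correlation can be checked directly from the six observations above. A quick Python sketch (the post itself uses R) computing the Pearson correlation coefficient from first principles:

```python
import math
import statistics

eruptions = [3.600, 1.800, 3.333, 2.283, 4.533, 2.883]
waiting   = [79, 54, 74, 62, 85, 55]

# Pearson correlation: covariance over the product of standard deviations.
mx, my = statistics.mean(eruptions), statistics.mean(waiting)
sxy = sum((x - mx) * (y - my) for x, y in zip(eruptions, waiting))
sxx = sum((x - mx) ** 2 for x in eruptions)
syy = sum((y - my) ** 2 for y in waiting)
r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.2f}")   # strongly positive: longer eruptions, longer waits
```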


 If we use R as our analysis tool, we can get a sense of what is happening statistically with Old Faithful. (R code below)

> attach(faithful)     # attach the data frame 
> eruption.lm = lm(eruptions ~ waiting)

Then we create a new data frame that sets the waiting time value.

> newdata = data.frame(waiting=80)

We now apply the predict function and set the predictor variable in the newdata argument. We also set the interval type as "confidence", and use the default 0.95 confidence level.

> predict(eruption.lm, newdata, interval="confidence") 
     fit    lwr    upr 
1 4.1762 4.1048 4.2476 
> detach(faithful)     # clean up

We can see the 95% confidence interval of the mean eruption duration for a waiting time of 80 minutes is between 4.1048 and 4.2476 minutes.

Now to a Project Example

In the graph below, the black line to the left is the historical data for a parameter I want to estimate from its past values. But I need an 80% confidence interval and a 95% confidence interval for the customer as to what values this parameter will take on in the future. We can see from the time series of past values both the 80% and the 95% confidence bands for the possible values the parameter can take on in the future.
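For readers without R, the same confidence-interval-for-a-mean-response calculation can be done by hand. Here's a Python sketch using only the six observations listed above; the R run above uses the full 272-row faithful dataset, so the numbers differ:

```python
import math
import statistics

eruptions = [3.600, 1.800, 3.333, 2.283, 4.533, 2.883]
waiting   = [79, 54, 74, 62, 85, 55]
n = len(waiting)

# Least-squares fit: eruptions = a + b * waiting
mx, my = statistics.mean(waiting), statistics.mean(eruptions)
sxx = sum((x - mx) ** 2 for x in waiting)
sxy = sum((x - mx) * (y - my) for x, y in zip(waiting, eruptions))
b = sxy / sxx
a = my - b * mx

# 95% CI for the mean eruption duration at waiting = 80 minutes
x0 = 80
fit = a + b * x0
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(waiting, eruptions))
s = math.sqrt(sse / (n - 2))                      # residual standard error
t = 2.776                                         # t critical value, 95%, df = n - 2 = 4
E = t * s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
print(f"fit = {fit:.3f}, interval = ({fit - E:.3f}, {fit + E:.3f})")
```

Six points give a much wider interval than R's 272-point fit - again, sample size drives confidence.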


What Does This Mean?

It means two things:

  • When we say we have an 80% confidence that a parameter will assume a value, we need to know how that parameter behaved in the past.
  • When we hear that we are estimating the future from the past, we MUST know about the behaviour of those past values, the size of the population, and the sample size before we can determine the confidence in the possible future outcomes. Having an average value without this data is pretty much useless in our decision making process.

What Does This Really Mean?

Anyone suggesting we can make decisions about future outcomes in the presence of uncertainty, and at the same time in the absence of estimating those outcomes, is pretty much clueless about basic probability, statistics, and random processes.

All project variables - the statistical parameters - are random variables, driven by underlying processes that we must estimate using the statistical tools available in R and our high school stats book.


When someone mentions "I use Bayesian statistics" or "I use Real Options," ask if they are using something like the R Tutorial Resource with Bayesian Statistics - and of course the source code for the statistical processes described above. Then ask to see their data. There seem to be a lot of people tossing around words like Bayesian, Real Options, Monte Carlo, and other buzz words without actually being able to show their work, or results that can be tested outside their personal anecdotes. Sad but true.

Related articles
  • Taxonomy of Logical Fallacies
  • Building a Credible Performance Measurement Baseline
  • Late Start = Late Finish
Categories: Project Management

Sticking to our Guns

Software Requirements Blog - - Tue, 02/03/2015 - 16:00
I’ve been working on a program that has certainly been a challenge - from the subject matter, to the extremely short timeframes (imposed governmental regulations that must be met), to the stakeholders (who have day jobs to perform as well). Everyone is under pressure, which can make for some short fuses as well as poor […]
Categories: Requirements

Thoughts on Estimation

Mike Cohn's Blog - Tue, 02/03/2015 - 15:00

Ron Jeffries has a new book out, "The Nature of Software Development". I highly recommend everyone read it--in fact, Ron is one of the few people whose every word I recommend people read. I love how he always gives me something to think about. I don't always agree with him--but that's why I like reading what he has to say. If I always agreed, I'd never learn anything and there'd be no point in that.

To celebrate Ron's new book, I asked him to write a guest post for us, which I'm happy to share below. I know you'll enjoy it. Be sure to check out his offer to win one of the great drawings from the book! --Mike


My new book, The Nature of Software Development, will hit the streets in final form any minute now. The book is "long-awaited", at least by me, as it has been percolating in my mind and computer for years. In Nature, I try to share my view that under all the complications and complexity of software development, there is an essential simplicity. I believe that if we keep that simplicity in mind, we can find our way out of the twisty little passages that make up the software business. The pictures in this article are taken from the book. It's full of pictures, hand-drawn by the author (that's me), in the manner of a chalk-talk or illustrated keynote talk.

Did I say "final form"? How silly of me. For example, Nature talks about estimation in its chapter on planning, and elsewhere, including a bonus chapter on the "Five-Card Method" of quickly getting a handle on the scope of the project. But the topic of estimation rages on, and my thinking continues to change, not in direction so much as in refined understanding. I've probably published four or five articles on estimation since the book was finished, and here come some more thoughts for you.

Large Scale: Budgeting over estimation

At the overall project level, I favor a quick estimate if you must, and prefer a budget. Not a multi-million dollar ten year budget but a week, a month, maybe three months for a small team to get a real handle on what the product is and how long it will take until we can have something of use.

We build high-value items first, and continue that so long as the product shows itself to be worth investing in. If we need faster progress, maybe we add people, or even teams. Let's move into the unknown with exploration, not with a huge commitment that may be misplaced.

Small Scale: Maybe no estimates at all

Starting a large project with a small one is controversial enough. At the level of the next few weeks' work, the Scrum Sprint, I have begun to recommend against estimating stories at all, be it in points, hours, or gummi bears.







As Eisenhower said, "Plans are useless, but planning is essential." It's commonly believed that a key part of planning a Sprint is to have estimates for all the backlog items, and the Scrum Guide even tells us that we should track our performance against those estimates.

We might drop or defer a story

One possible advantage of this is that the Product Owner might decide differently what to do if something is going to cost four days instead of two. It could happen: I've seen it happen. In the case I saw, the team scribbled estimates beside stories on the white board, and erased them at the end of the meeting. They found it useful to decide how much work to take on, and at least once, their Product Owner withdrew a story because it was a couple of days more costly than she thought.

This might happen, and it might even happen often enough to matter. Still, I've only seen it once. Even so, if we're doing Scrum as we're supposed to, the stories are in priority order, we forecast how many we can take on, and we work through them. That works so nearly perfectly that the estimates at the Sprint level are generally wasteful.

Estimates will make us learn about the story

Another often-quoted value to small-scale estimates is that it causes the developers to learn enough about the story to really understand it, resulting in better implementations, fewer mistakes, and so on. Yes, it's true that if you will actually think about what's being asked for, you'll do better. It's not clear that estimates are an ideal way to do that.

That said, someone pointed out a value to doing Planning Poker: it quickly identifies stories on which there is mixed understanding, and it provides a ritual where someone who might otherwise be quiet has an occasion to speak. That could be useful. Like my friends with the white board, you might want to consider throwing away any estimates that come out.

Commonly misused

More pernicious -- I used that word in Nature, a copy editor suggested that I change it, and I refused -- is the common practice of comparing estimates to actuals "so we can improve our estimates". Hello!?!? How about we improve almost anything rather than estimates? The cleanness of our code, the coordination among our team, the communication between Product Owner and developers? Surely there's something more important to do than focus on the cost of things and guessing the cost correctly. Maybe we could focus on the value?

Value rather than cost






In Nature, I take the position that "Value is what you want," and I mean that literally. Maybe you want revenue. Maybe your product saves lives. Maybe it organizes meetings. Whatever the point of your product is, it is almost certainly not "let's make our actuals match our estimates".

As the picture above suggests, backlog items offer various levels of value, and come with varying costs. What the picture does not show is that value had better be many times larger than cost, or the feature isn't worth doing. If that picture were drawn to scale, the cost dimension would be so narrow we could hardly see it. The picture shows us that what matters most is building high-value items first. Almost any other consideration is far secondary to that one.

I could go on. In fact I have gone on. Beyond the estimation topics discussed in Nature, which are just a tiny part of a small book, I've already published quite a few more detailed articles about the topic. And I need your help.

Valuable prizes. Well, prizes.

I certainly hope you'll read, enjoy, and promote Nature. But I have another little proposal in mind for you here.

I'd like to write a few more articles about estimation, mostly about how to get rid of it, but also about what it's good for and how to do it well, if I can find out. To do that, I'd like to hear from you on topics like these:

  • Where have small-scale estimates actually been useful to your team?
  • What have you done to get rid of estimates if you found them troublesome?
  • If management insisted on estimates, how have you been successful in turning that around?

Write me an email at ronjeffries at acm dot org, telling me a bit about your discoveries.

I'll pick at least two emails and dig into them, maybe more. Probably we'll exchange a few notes or talk on the phone. Maybe you'll write an article for my web site or your own. I'll certainly write about your ideas on mine.

Wait, don't answer yet!

If I use your idea, I'll send you a signed print of one of the drawings from the book, printed on the very best paper that will go into my printer. Your choice of drawing if you wish, or I'll pick one.

In addition, I'll draw an equal number of entries randomly. If I use two emails for articles, two more random pictures. If three, three. And so on.

As you'll see when you read the book -- and I hope you do -- these pictures are suitable for framing and for display in any small dark room in your house, such as a closet or pantry. And quite likely, if your cat is like mine, your cat will sit on the drawing if you put it on the floor.

What a fantastic opportunity! Let's see if we can begin to build a community understanding of when estimates are good and when they aren't, and how to work with them, and to work without them.

Oh, and do please consider getting a copy of the book. The pictures are in color, so you might prefer the paper version, or to read it on your color e-reader.


Marco Arment Uses Go Instead of PHP and Saves Money by Cutting the Number of Servers in Half

On the excellent Accidental Tech Podcast there's a running conversation about Marco Arment's (Tumblr, Instapaper) switch from much-loved PHP to Go to implement feed crawling for Overcast, his popular podcasting app for the iPhone.

In Episode 101 (at about 1:10) Marco said he halved the number of servers used for crawling feeds by switching to Go. The total savings was a few hundred dollars a month in server costs.

Why? Feed crawling requires lots of parallel networking requests and PHP is bad at that sort of thing, while Go is good at it. 
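Overcast's actual crawler is written in Go and isn't public, but the shape of the win is easy to sketch: with I/O-bound fetches, running them concurrently costs roughly one fetch's latency instead of the sum of all of them. In this Python illustration the feed URLs and the fetch function are stand-ins, with network latency simulated by a sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor

FEEDS = [f"https://example.com/feed/{i}.xml" for i in range(8)]  # hypothetical

def fetch(url):
    """Stand-in for an HTTP request; sleeps to simulate network latency."""
    time.sleep(0.1)
    return (url, "ok")

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(FEEDS)) as pool:
    results = list(pool.map(fetch, FEEDS))
elapsed = time.monotonic() - start

# Serially this would take ~0.8s; concurrently it takes ~0.1s.
print(f"fetched {len(results)} feeds in {elapsed:.2f}s")
```

Go's goroutines make this pattern cheap enough to run at much higher fan-out than a thread pool, which is the core of the server savings described here.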

Amazingly, Marco wrote an article on how much Overcast earned in 2014. It earned $164,000 after Apple's 30%, but before other expenses. At this revenue level the savings, while not huge in absolute terms given the traffic of some other products Marco has worked on, was a good return on programming effort. 

How much effort? It took about two months to rewrite and debug the feed crawlers. In addition, lots of supporting infrastructure that tied into the crawling system had to be created, like the logging infrastructure, the infrastructure that says when a feed was last crawled, monitoring delays, knowing if there's queue congestion, and forcing a feed to be crawled immediately.

So while the development costs were high up front, as Overcast grows the savings will also grow over time as efficient code on fast servers can absorb more load without spinning up more servers.

Lots of good lessons here, especially for the lone developer:

Categories: Architecture

Why I’m Not Writing a Blog Post This Week

Making the Complex Simple - John Sonmez - Mon, 02/02/2015 - 17:00

Just a minute ago my wife walked into my office and busted me. It’s Saturday, around 4:40 PM and I’ve been in my office “writing a blog post” for the last 3 or 4 hours. It’s not like I didn’t get anything done. I responded to like 50 emails, and I shipped out some signed copies of my “Soft Skills” ... Read More

The post Why I’m Not Writing a Blog Post This Week appeared first on Simple Programmer.

Categories: Programming

You Need Feature Teams to Produce Features

Many organizations create teams by their architectural part: front end, back end, middleware. That may have worked back in the waterfall days. It doesn’t work well when you want to implement by feature. (For better images, see Managing the Stream of Features in an Agile Program.)

Pierce Wetter wrote this great article on LinkedIn, There is no “front end” or “back end.” Notice how he says, referring to the yin/yang picture,

Your product isn’t just the white part or the black part above. It’s the whole circle.

That has implications for how you structure your teams.

If you have front end, back end, or middleware teams, you lose the holistic way of looking at features. You can’t produce features—you produce components, parts of features that work across the architecture. Even if everyone does their job perfectly, they still have to knit those pieces together to create features. Too often, the testers find the problems that prevent features.

Instead, you want a product development team, a feature team. That team has someone from the back end, someone from the front end, someone from middleware, and a tester, at minimum. Your team may have more people, but you need those people to be able to create a feature.

You might call these teams product development teams, because they produce product chunks. You can call them feature teams because they can create features.

Whatever you call them, make sure—regardless of your life cycle—that you have feature teams. You can have feature teams in any approach: serial, iterative, incremental, or agile. What differentiates these teams from functional or component teams is that feature teams can produce features.

Features are what you offer to your customers. Doesn’t it make sense that you have teams that create features?


Categories: Project Management

How to make the sprint review meeting worth your while

Xebia Blog - Mon, 02/02/2015 - 16:44

My work allows me to meet a lot of different people who actively pursue Scrum. Some of them question the value of doing a sprint review meeting at the end of every sprint: stakeholders presumably do not "use" or "see" the work directly, or the iterated product is not yet releasable.

Looks like this Scrum ritual is not suited for all. If you are a person questioning the value of a demo, then focus on your stakeholders and start to demo the delta instead of just your product. Here is a 3-step plan to make your sprint reviews worth your while.

Step 1: Provide context

Stakeholders are not interested in individual user stories. They want to see releasable working software they can use and work with. That is where the value delta is for them: in practical services and products they can use to make their lives a little bit better than before.
So the thing you should do in the sprint review is explain the completed stories in the context of the iteration and how this iteration will add value to the upcoming product release. Providing the context will make it easier for stakeholders to give your team the feedback it is looking for.

Step 2: Demo the delta

Search for the abstraction level on which the user is impacted by the story you have completed, even if this means combining stories to demo as a whole. This is especially useful if you are working in component-based teams.

It will enable you to explicitly illustrate the added value from a stakeholder perspective after the change. It’s not just about adding an input screen or expansion of variables. It’s about being able to do something different as a stakeholder.

Think about the possibilities given the new additions to the system. Maybe people can be more effective or more efficient saving time, money and energy. If possible try to explicitly show the effect of the delta on the stakeholder, for instance by measuring key variables in the before and after situation. Seeing the explicit effect of the changes will get your stakeholders fired up to provide your team with additional ideas and feedback for your next release.

Step 3: Ask for feedback

The goal of the sprint review is to generate feedback. Often this won't happen automatically, which means you have to explicitly ask for it. Doing it wrong is to run through the review without interruption and to conclude by asking: "any questions?" This will certainly not get people to participate in a group discussion, as posing questions is, unfortunately, still seen by many as a sign of weakness.

To counter this group dynamic, you should be the one asking the questions of your audience. Example stakeholder-focused questions to generate feedback might be: "What are your thoughts on improving this feature further?" or "How would this fit into your day-to-day routine?"

By doing so, you will lower the barrier for others to start asking questions themselves. Another tip to turn around the question dynamic is to specifically target your question to a single stakeholder, getting the group out of the conversation.

Up until now these steps helped me a lot in improving my sprint review meetings. These 3 simple steps will let your key-stakeholders be the subject of your review meeting, maximizing the chance that they will provide you with valuable feedback to improve your product.

Sources for Reference Class Forecasting

Herding Cats - Glen Alleman - Mon, 02/02/2015 - 16:19

Reference Class Forecasting is a useful source of data for making estimates in a variety of domains.

And some databases used for RCF

Then there are tools for parametric estimating

A sample of the nearly endless materials on how to apply Reference Class Forecasting

Screen Shot 2015-02-01 at 12.24.18 PM

So when you hear "we can't possibly estimate this piece of software, it's never been done before," look around a bit to see if someone has done it. Then look some more; maybe they have a source for a Reference Class you can use.

Related articles
  • Good Project and Bad Project
  • Building a Credible Performance Measurement Baseline
  • Your Project Needs a Budget and Other Things We Suck At Estimating
Categories: Project Management

How to Define Your Target Audience… with Questions

NOOP.NL - Jurgen Appelo - Mon, 02/02/2015 - 16:16

“Can our organization be a little bit more like Pixar, Spotify, Netflix, Zappos, Virgin, Valve or IDEO? Is there something I can do to get a better company culture? Better collaboration? Better management?”

There is a reason why I started the book description of my #Workout book on Amazon with this question. For me, it defines the target audience of the book.

The post How to Define Your Target Audience… with Questions appeared first on NOOP.NL.

Categories: Project Management

I'm speaking at the O'Reilly Software Architecture Conference

Coding the Architecture - Simon Brown - Mon, 02/02/2015 - 13:19

I'm thrilled to say that I'll be speaking at the inaugural O'Reilly Software Architecture Conference in Boston during March. The title of my session is Software architecture vs code and I'll be speaking about the conflict between software architecture and code. This is a 90-minute session, so I look forward to also discussing how we can solve this issue. Here's the abstract...

Software architecture and coding are often seen as mutually exclusive disciplines, despite us referring to higher level abstractions when we talk about our software. You've probably heard others on your team talking about components, services and layers rather than objects when they're having discussions. Take a look at the codebase though. Can you clearly see these abstractions or does the code reflect some other structure? If so, why is there no clear mapping between the architecture and the code? Why do those architecture diagrams that you have on the wall say one thing whereas your code says another? In fact, why is it so hard to automatically generate a decent architecture diagram from an existing codebase? Join us to explore this topic further.

Software Architecture Conference 2015

You can register with code FRIEND20 for a discount. See you there!

Categories: Architecture