
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Quote of the Day

Herding Cats - Glen Alleman - Sun, 07/27/2014 - 15:09

"in order to reason well ... it is absolutely necessary to possess ... such virtues as intellectual honesty and sincerity and a real love of the truth." — C. S. Pierce

Categories: Project Management

Building A Backlog: Technique Synthesis

 

Putting the parts together!


Hand Drawn Chart Saturday

Techniques for building an initial backlog can be classified by how the conversation between stakeholders and the project team is initiated. Some techniques are focused on asking, other techniques focus on observing, while the third category is all about showing something and getting reactions. Most practitioners blend the best from each of the categories. Here are some examples of the hybrid techniques:

Role Playing Prototype: This technique blends paper prototypes (show) with role-playing (observe) to get users and stakeholders to consider how they might act in an environment that has not been fully designed.

Straw man/JAD: This synthesis seeds a JAD session (ask) with a loose outline of a solution or a set of potential solutions that is used to guide the discussions at the core of JAD. However, the seeding tactic can inhibit creativity. The technique is less constraining when a set of competing solutions is used as the conversation seed; however, developing the range of solutions before the JAD session increases the cost of the JAD.

Embedding: This technique puts team member(s) into a department to actually perform the work (observe) alongside actual users and stakeholders. This generally requires the embedded team member to be trained and mentored to see intimately how the work is done. I have seen debrief sessions added to this technique to ensure that participants get a chance to discuss the nuances in the workflow. As I have noted before, with any observation technique everyone needs to understand what is going on and why. This is not an episode of Undercover Boss.

Combining tactics from different categories of techniques that teams use to develop an initial backlog can fundamentally change the dynamics of the requirements definition session. A group of stakeholders will generally have a diverse range of learning and interaction styles that they favor. Combining backlog building techniques gives the project team a better chance at making a connection. Combining techniques should not be done randomly or on an ad hoc basis. Selecting which techniques to combine has four prerequisites:

  1. Someone with experience and training (perhaps a business analyst).
  2. A knowledge of the user community (knowledge the product owner can provide).
  3. Planning (time and effort).
  4. Involved users and stakeholders (call on the product owner and project sponsor to help with this prerequisite).

Developing an initial backlog is a step to get projects going and moving in the right direction. It is, however, only a first step. Backlogs will evolve. Teams, product owners, users and other stakeholders will gain knowledge and experience as the project moves forward that will continually shape and reshape the backlog.


Categories: Process Management

Fearless Speaking

“Do one thing every day that scares you.” ― Eleanor Roosevelt

I did a deep dive book review.

This time, I reviewed Fearless Speaking.

The book is more than meets the eye.

It’s actually a wealth of personal development skills at your fingertips and it’s a powerful way to grow your personal leadership skills.

In fact, there are almost fifty exercises throughout the book.

Here’s an example of one of the techniques …

Spotlight Technique #1

When you’re overly nervous and anxious as a public speaker, you place yourself in a ‘third degree’ spotlight.  That’s the name for the harsh bright light police detectives used in days gone by to ‘sweat’ a suspect and elicit a confession.  An interrogation room was always otherwise dimly lit, so the source of light trained on the person (who was usually forced to sit in a hard, straight-backed chair) was unrelenting.

This spotlight is always harsh, hot, and uncomfortable – and the truth is, you voluntarily train it on yourself by believing your audience is unforgiving.  The larger the audience, the more likely you believe that to be true.

So here’s a technique to get out from under this hot spotlight that you’re imagining so vividly: turn it around! Visualize swiveling the spotlight so it’s aimed at your audience instead of you.  After all, aren’t you supposed to illuminate your listeners? You don’t want to leave them in the dark, do you?

There’s no doubt that it’s cooler and much more comfortable when you’re out from under that harsh light.  The added benefit is that now the light is shining on your listeners – without question the most important people in the room or auditorium!

I like that there are so many exercises and techniques to choose from.   Many of them don’t fit my style, but there were several that exposed me to new ways of thinking and new ideas to try.

And what’s especially great is knowing that these exercises come from professional actors and speakers – it’s like an insider’s guide at your fingertips.

My book review of Fearless Speaking includes a list of all the exercises, the chapters at a glance, key features from the book, and a few of my favorite highlights (sort of like a movie trailer for the book).

You Might Also Like

7 Habits of Highly Effective People at a Glance

347 Personal Effectiveness Articles to Help You Change Your Game

Effectiveness Blog Post Roundup

Categories: Architecture, Programming

Transform the input before indexing in elasticsearch

Gridshore - Sat, 07/26/2014 - 07:51

Sometimes you are indexing data and want to have as little influence on the input as possible, or maybe even none. Still you need to make changes: you want other content, other fields, or maybe even to remove fields. In elasticsearch 1.3 a new feature called Transform is introduced. In this blogpost I am going to show some of the aspects of this new feature.

Insert the document with the problem

The input we get is coming from a system that puts the string null in a field if it is empty. We do not want null as a string in the elasticsearch index. Therefore we want to remove this field completely when indexing a document like that. We start with the example and the proof that you can search on the field.

PUT /transform/simple/1
{
  "title":"This is a document with text",
  "description":"null"
}

Now search for the word null in the description.
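A minimal version of that query, assuming a plain match on the description field (in the same style as the other requests in this post):

GET /transform/_search
{
  "query": {
    "match": {
      "description": "null"
    }
  }
}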

For completeness I’ll show you the response as well.

Response:
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "transform",
            "_type": "simple",
            "_id": "1",
            "_score": 0.30685282,
            "_source": {
               "title": "This is a document with text",
               "description": "null"
            }
         }
      ]
   }
}
Change mapping to contain transform

Next we are going to use the transform functionality to remove the field if it contains the string null. To do that we need to remove the index and create a mapping containing the transform functionality. We use the groovy language for the script. Beware that the script is only validated when the first document is inserted.
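Removing the existing index is just a delete of the index. A minimal sketch (note that this also drops the document we indexed earlier, and the same delete is needed before each of the mapping changes further on):

DELETE /transform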

PUT /transform
{
  "mappings": {
    "simple": {
      "transform": {
        "lang":"groovy",
        "script":"if (ctx._source['description']?.equals('null')) ctx._source['description'] = null"
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        }
      }
    }
  }
}

When we insert the same document as before and execute the same query we do not get hits. The description field is no longer indexed. An important aspect is that the actual _source is not changed. When requesting the _source of the document you still get back the original document.

GET transform/simple/1/_source
Response:
{
   "title": "This is a document with text",
   "description": "null"
}
Add a field to the mapping

To add a bit more complexity, we add a field called nullField which will contain the name of the field that was null. Not very useful, but it serves to show the possibilities.

PUT /transform
{
  "mappings": {
    "simple": {
      "transform": {
        "lang":"groovy",
        "script":"if (ctx._source['description']?.equals('null')) {ctx._source['description'] = null;ctx._source['nullField'] = 'description';}"
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "nullField": {
          "type": "string"
        }
      }
    }
  }
}

Notice that the script has changed: not only do we remove the description field, we now also add a new field called nullField. Check that the _source is still not changed. Now we do a search and only return the fields description and nullField. Before scrolling to the response, think about the response that you would expect.

GET /transform/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["nullField","description"]
}

Did you really think about it? Try it out and notice that the nullField is not returned. That is because we did not store it in the index and it is not obtained from the source. So if we really need this value, we can store the nullField in the index and we are fine.

PUT /transform
{
  "mappings": {
    "simple": {
      "transform": {
        "lang":"groovy",
        "script":"if (ctx._source['description']?.equals('null')) {ctx._source['description'] = null;ctx._source['nullField'] = 'description';}"
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "nullField": {
          "type": "string",
          "store": "yes"
        }
      }
    }
  }
}

Then, with the match_all query requesting the two fields, we get the following response.

GET /transform/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["nullField","description"]
}
Response:
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "transform",
            "_type": "simple",
            "_id": "1",
            "_score": 1,
            "fields": {
               "description": [
                  "null"
               ],
               "nullField": [
                  "description"
               ]
            }
         }
      ]
   }
}

Yes, now we do have the new field. That is it, but wait, there is more you need to know. There is a way to check what is actually passed to the index for a certain document.

GET transform/simple/1?pretty&_source_transform
Result:
{
   "_index": "transform",
   "_type": "simple",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
      "description": null,
      "nullField": "description",
      "title": "This is a document with text"
   }
}

Notice the null description and the nullField in the _source.

Final remark

You cannot update the transform part of a mapping. Think about what would happen to your index when some documents passed through version 1 of the transform and others through version 2.

I would be gentle with this feature: try to solve the problem before sending the data to elasticsearch. But maybe you have just the use case for this feature, and now you know it exists.

In my next blogpost I dive a little bit deeper into the scripting module.


Categories: Architecture, Programming

Building A Backlog: Notes on Showing For Gathering Requirements

The evacuation instructions could be a form of paper prototype.


The most powerful single technique for generating requirements is showing users and stakeholders something and collecting reactions. There are numerous techniques for developing something to generate feedback, ranging from functional code that could be implemented in production at one extreme, to functional prototypes in the middle, to paper prototypes at the other end of the scale. Functional code is used to generate feedback to evolve the backlog in Agile projects; however, showing techniques are not used as often as they should be to generate the initial requirements backlog. The reason these techniques are not used is the perceived level of effort needed to generate the prototypes or an impetus to begin building the solution immediately.

The power of the showing techniques is based on the theory that many people only know what they want when they see it. The process of generating feedback and requirements is also risk management. A prototype reduces the risk that the project will either build the wrong thing or build the right thing wrong. Functional prototypes are also, to an extent, useful to show that a solution is at least conceptually feasible (prototypes are generally too small and not fully functional, and therefore do not truly serve to prove technical feasibility). Paper prototypes address generating requirements and reducing risk, but are faster and cheaper to generate because there is no code or code-related infrastructure. A very simple example of a paper prototype for a customer relationship management (CRM) system might be a set of drawings of screens showing the rough flow of work. Users use the paper screens to imagine the process and discuss the flow. In comparison, a functional prototype would have mock screens on a computer to show system flow, but typically without any background functionality.

The first issue with prototypes is that developing them requires time and effort. Project teams are often presented with a goal, a budget and a deadline. In most corporate IT organizations, performance against the project budget is considered a critical measure of progress (and in the longer term, project success). Therefore teams and project/program managers mercilessly manage the budget to improve the possibility of project success. Unless prototyping is built into the budget and the planned approach for the project, there will be pressure to use the less costly, albeit less effective, asking techniques to generate requirements. Teams, sponsors and project managers make a rational choice to manage the budget risk against the possibility of not generating a good enough backlog to get started. However, generating requirements using prototypes is a process that can be used to balance requirements and budget risk. The higher the risk of not having a thorough early backlog, the more important techniques like prototyping become to mitigate that risk.

The second issue that the use of prototypes faces is what I call “starting fever.” In many methods, including Agile, the whole cross-functional project team is assembled to start the project. There are many individuals who don’t believe that gathering requirements is important; therefore, unless there is something for them to build or test, they will find other things to do. There are numerous ways to deal with this type of structural slack; two extreme cases will illustrate the range. The first solution is to have a subset of the team (like the Three Amigos) generate the initial backlog before kicking the project off. The second solution is to have the whole team spend a day as the project kicks off to generate a quick initial backlog and then use the demonstrations at the end of each sprint to continue to flesh out the backlog. I lean towards scenario one, or variants in which the other portions of the team work on framing the technical and physical infrastructure.

Another way “starting fever” can be triggered is the panic caused by a fixed budget, fixed scope and fixed date project that someone actually said yes to before the requirements were known. Before you protest: regardless of how much every developer, tester, BA, project manager or CIO believes this is an irrational situation, these projects happen (I see this as the norm in some organizations), and they cause teams to abandon good practices like prototyping. In the long run, project teams have to pick up the pieces when these types of projects have problems. When teams are put under immediate schedule, budget and scope pressure, leaping into action and then creating and firming up a backlog later can look like a great solution. Action is still equated to progress.

Prototyping is an effective tool for driving out requirements that can’t be expressed until they are seen. The backlog building techniques that show the users and stakeholders something to react to also serve to mitigate the risk of building the wrong thing or the right thing wrong. The power of these techniques is offset by the cost and time required to generate prototypes, and by the fear that unless you are building something once the project has started, the due date is at risk.


Categories: Process Management

R: ggplot – Plotting back to back charts using facet_wrap

Mark Needham - Fri, 07/25/2014 - 22:57

Earlier in the week I showed a way to plot back to back charts using R’s ggplot library but looking back on the code it felt like it was a bit hacky to ‘glue’ two charts together using a grid.

I wanted to find a better way.

To recap, I came up with the following charts showing the RSVPs to Neo4j London meetup events using this code:

2014 07 20 17 42 40

The first thing we need to do to simplify chart generation is to return ‘yes’ and ‘no’ responses in the same cypher query, like so:

timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01", tz = "GMT")
 
query = "MATCH (e:Event)<-[:TO]-(response {response: 'yes'})
         WITH e, COLLECT(response) AS yeses
         MATCH (e)<-[:TO]-(response {response: 'no'})<-[:NEXT]-()
         WITH e, COLLECT(response) + yeses AS responses
         UNWIND responses AS response
         RETURN response.time AS time, e.time + e.utc_offset AS eventTime, response.response AS response"
allRSVPs = cypher(graph, query)
allRSVPs$time = timestampToDate(allRSVPs$time)
allRSVPs$eventTime = timestampToDate(allRSVPs$eventTime)
allRSVPs$difference = as.numeric(allRSVPs$eventTime - allRSVPs$time, units="days")

The query is a bit more involved because we want to capture the ‘no’ responses from people who initially said yes, which is why we check for a ‘NEXT’ relationship when looking for the negative responses.

Let’s inspect allRSVPs:

> allRSVPs[1:10,]
                  time           eventTime response difference
1  2014-06-13 21:49:20 2014-07-22 18:30:00       no   38.86157
2  2014-07-02 22:24:06 2014-07-22 18:30:00      yes   19.83743
3  2014-05-23 23:46:02 2014-07-22 18:30:00      yes   59.78053
4  2014-06-23 21:07:11 2014-07-22 18:30:00      yes   28.89084
5  2014-06-06 15:09:29 2014-07-22 18:30:00      yes   46.13925
6  2014-05-31 13:03:09 2014-07-22 18:30:00      yes   52.22698
7  2014-05-23 23:46:02 2014-07-22 18:30:00      yes   59.78053
8  2014-07-02 12:28:22 2014-07-22 18:30:00      yes   20.25113
9  2014-06-30 23:44:39 2014-07-22 18:30:00      yes   21.78149
10 2014-06-06 15:35:53 2014-07-22 18:30:00      yes   46.12091

We’ve returned the actual response with each row so that we can distinguish between responses. It will also come in useful for pivoting our single chart later on.

The next step is to get ggplot to generate our side by side charts. I started off by plotting both types of response on the same chart:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  geom_bar(binwidth=1)

2014 07 25 22 14 28

This one stacks the ‘yes’ and ‘no’ responses on top of each other which isn’t what we want as it’s difficult to compare the two.

What we need is the facet_wrap function which allows us to generate multiple charts grouped by key. We’ll group by ‘response’:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  geom_bar(binwidth=1) + 
  facet_wrap(~ response, nrow=2, ncol=1)

2014 07 25 22 34 46

The only thing we’re missing now is the red and green colours which is where the scale_fill_manual function comes in handy:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  scale_fill_manual(values=c("#FF0000", "#00FF00")) + 
  geom_bar(binwidth=1) +
  facet_wrap(~ response, nrow=2, ncol=1)

2014 07 25 22 39 56

If we want to show the ‘yes’ chart on top we can pass in an extra parameter to facet_wrap to change where it places the highest value:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  scale_fill_manual(values=c("#FF0000", "#00FF00")) + 
  geom_bar(binwidth=1) +
  facet_wrap(~ response, nrow=2, ncol=1, as.table = FALSE)

2014 07 25 22 43 29

We could go one step further and group by response and day. First let’s add a ‘day’ column to our data frame:

allRSVPs$dayOfWeek = format(allRSVPs$eventTime, "%A")

And now let’s plot the charts using both columns:

ggplot(allRSVPs, aes(x = difference, fill=response)) + 
  scale_fill_manual(values=c("#FF0000", "#00FF00")) + 
  geom_bar(binwidth=1) +
  facet_wrap(~ response + dayOfWeek, as.table = FALSE)

2014 07 25 22 49 57

The distribution of dropouts looks fairly similar for all the days – Thursday is just an order of magnitude below the other days because we haven’t run many events on Thursdays so far.

At a glance it doesn’t appear that so many people sign up for Thursday events on the day or one day before.

One potential hypothesis is that people have things planned for Thursday whereas they decide more last minute what to do on the other days.

We’ll have to run some more events on Thursdays to see whether that trend holds out.

The code is on github if you want to play with it.

Categories: Programming

Why Project Management is a Control System

Herding Cats - Glen Alleman - Fri, 07/25/2014 - 21:48

When it is mentioned that project management is a control system, many in the agile world wince. But in fact a project is a control system, a closed loop control system.

Here's how it works.

  • We have a goal, a target, some desired outcome.
  • The desired outcome usually comes with a budget - some expected cost.
  • It also comes with a time frame for achieving that desired outcome.
  • That outcome usually - or should, if we're doing it right - provides a benefit.

Each of these elements has some unit of measure:

  • Time - the day we need to deliver value or a capability to meet the business goals or accomplish a mission. There can of course be incremental deliverables, but those also have a time element.
  • Outcome - might be the accomplishment of a mission or fulfillment of the business strategy.
  • Cost - there is no way to determine the value of anything without knowing its cost. This is the foundation of microeconomics. Not knowing the cost can only happen by intentionally ignoring the principles of micro-economics. It's done, I know, but Don't Do Stupid Things On Purpose.

Here's a small example of incremental delivery of value in an enterprise domain

Project Maturity Flow is the Incremental Delivery of Business Value

The accomplishment of a mission or fulfillment of a business strategy can be called the value produced by the project. In the picture above the value delivered to the business is incremental, but fully functional on delivery to accomplish the business goal. These goals are defined in Measures of Effectiveness and Measures of Performance, and these measures are derived from the business strategy or mission statement. So if I want a fleet of cars for my taxi service, producing a skateboard, then a bicycle, is not likely to accomplish the business goal.

The term value alone is nice, but not sufficient. Value needs to have some unit of measure: revenue, cost reduction, environmental cleanup, education of students, reduction of disease, the processing of sales orders at a lower cost, flying the 747 to its destination with minimal fuel. Something that can be assessed in tangible units of measure.

In exchange for this value, with its units of measure, we have the cost of producing this value.

To assess the value or the cost, we need to know the other item. We can't know the value of something without knowing its cost. We can't know if the cost is appropriate without knowing the value produced by the cost.

This is one principle of Microeconomics of software development

The process of deciding between choices about cost and value - the trade space between cost and value - starts with information about both cost and value. This information lives in the realm of uncertainty before and during the project's life-cycle. The cost is only known after the project completes, and the value may never be known in the absence of some uncertainty as to the actual measure. This is also a principle of microeconomics - the measures we use to make decisions are random variables.

To determine the value of these random variables we need to estimate, since of course they are random. With these random variables - the cost of producing value and the value exchanged for that cost - the next step in projects is to define what we want the project to do:

  • The desired outcome in the form of capabilities.
  • The desired cost for the desired value.
  • The desired time for the delivery of that desired value for the desired cost.
  • The confidence that we can show up on or before the desired time, at or below the desired cost to deliver the desired value, and deliver the needed capabilities to fulfill the mission.

The actual delivery of this value can be incremental, it can be iterative, evolutionary, linear, big bang, or other ways. Software many times can be iterative or incremental, pouring concrete and welding pipe can as well. Building the Interstate might be incremental, the high rise usually needs to wait for the occupancy permit before the value is delivered to the owners. There is no single approach.

For each of these a control system is needed to assure progress to plan is being made. The two types of control systems are Open Loop and Closed Loop. The briefing below speaks to those and their use.

So In The End, when we hear about a control loop applied to project management, we'll now know about Open and Closed Loop control. And we'll know that there can't be a Closed Loop control process without:
  • A desired outcome - the target budget, date, or some performance parameter.
  • Measures of progress to plan - what has the project been doing to date? This should be measured in units of physical percent complete. Working software is a popular platitude in the agile community, but is it the right working software to fulfill the needed capabilities developed through a Capabilities Based Planning process?
  • Variances between actual and plan - with the target outcomes, capture actuals and calculate variances. These variances are the information needed to make business decisions.
    • These decisions are based not only on past performance - the actuals - but also on future performance - the estimates of future performance given the past performance.
    • This future performance must also be risk adjusted.
    • Both the future performance and the uncertainties that create risks to this performance are statistical processes, producing probabilistic outcomes, that are integrated into the decision making processes.
  • Take Corrective Actions - to keep the project inside the white lines. Using the past performance of cost, schedule, and technical outcomes, and the assessment of variances, the role of management is to take corrective action to meet the desired outcomes of the project.
    • The cost goals - we have a target budget that is used for assessing the Return on Investment or target product margin.
    • The schedule goals - we have a planned go-live or release date that has been communicated to the market or the customer.
    • The Needed Capabilities - for the product (internal or external) to earn its keep.
    • Adjustments - each of these attributes requires management action, and assessment of this action, to actually get back to GREEN, in our parlance, and keep the project headed to success.
  • Monitor these actions against plan - once corrections are taken, management must still monitor the project to assure it stays on plan.
Related articles:
  • Elements of Project Success
  • The Value of Information
  • Control Systems - Their Misuse and Abuse
  • Four Critical Elements of Project Success
  • Critical Thinking Skills Needed for Any Change To Be Effective
  • Seven Immutable Activities of Project Success
  • How Not To Make Decisions Using Bad Estimates
  • Why is Statistical Thinking Hard?
Categories: Project Management

Marketing scrum vs IT scrum - a report published and presented at agile 2014

Xebia Blog - Fri, 07/25/2014 - 17:49

As we know, Scrum is the perfect framework for IT / software development projects to learn, adapt to change and deliver great software of value, faster.

But is Scrum also usable outside of software development? Can we apply similar or maybe even the same principles in other departments in the enterprise?

Yes, we can! And yes there are differences but there are also a lot of similarities.

We (Remco and I) successfully implemented Scrum in the marketing departments of two large companies: the ANWB and ING Bank. Both companies are now using Scrum for the development of new campaigns, their full commercial expressions and even at the product development level. They wanted a faster time to market, more ownership, and greater innovation. How did we approach and realize a transition with those goals in the marketing environment? And what are the results?

So when we are not delivering software but other things, how does Scrum change? Well, a great deal actually. The people working in these other departments are, in general, quite different to those in Software Development (and yes more than you would expect). This means coaches or change agents need to take another approach.

Since the people are different, it is possible to go faster or ‘deeper’ in certain areas. Entrepreneurial skills or ambitions are more present in marketing. This gives a sense of ‘act first, apologize later’, taking ownership, a higher drive to succeed, and upfront and willing behavior. Scrumming here means thinking more about business goals and KPIs (how to go from department goals to scrum team goals, for example). After that the fun begins…

I will be speaking about this topic at Agile 2014. A great honor, of course, to be standing there. I will also attend the conference and therefore try to post some updates here.

To read more about this topic you can read my publication about marketing scrum. It contains the extensive research paper I published about this story. Please feel free to give me comments and questions, either about Agile 2014 or the paper.

 

Enjoy reading the paper:

Marketing scrum vs IT scrum – two marketing case studies who now ‘act first and apologize later'

 

Stuff The Internet Says On Scalability For July 25th, 2014

Hey, it's HighScalability time:


It's systems all the way down. Bugs That Call Us Home.
  • 1 million users in just 4 days: Yo;  30 billion: Pinterest Pins
  • Quotable Quotes:
    • @GlennF: Amazon still dreams it is a startup, like a dog dreaming of chasing rabbits, twitching its legs while asleep.
    • @mfdii: Nobody knows how git works. We all just type in commands like monkeys trying to write Shakespeare. #devopsdays 
    • Benedict Evans: When you pull these strands together, smartphones don't just increase the size of the internet by 2x or 3x, but more like 5x or 10x. It's not just how many devices, but how different those devices are, that has the multiplier effect.
    • @Aaronontheweb: @codinghorror I broke this rule for myself last week. Spent 3 days fixing a problem that we finally solved by a $0.06/hour AWS bill increase
    • Physicist George Ellis: Barring something very unforeseen – the possible tests of the very large and the very small are coming towards the limits of whatever will be possible.
    • The Master Switch: Once the industry had concluded that its profits could be maximized if more people listened to fewer stations, the government, acting as if the business of America were only business, did the industry’s bidding, showing only the most feeble awareness of its consequences for the American ideal of free expression.

  • Ex-Googlers try to recreate Spanner with CockroachDB (awesome name!), which is A Scalable, Geo-Replicated, Transactional Datastore. The design is here and looks good. There's an article on Wired. Good discussion on HackerNews. A globally distributed transactional database it's not yet, but these are early days. After all, they can only work in the dark.

  • Useful post on Handling 1 Billion requests a week with Symfony2. Symfony2 provides good performance and a nice development environment. HAProxy distributes to application servers. Varnish in every application’s server to keep high availability – without having a single point of failure (SPOF). Redis and MySQL for storing data. MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources. 

  • The truest form of the Interest Graph on the net? Details on how Pinterest scales their data infrastructure to create a personalized discovery engine. 20 terabytes of new data each day. 10 petabytes of data in S3. 100 regular Mapreduce users run over 2,000 jobs each day through Qubole. 6 standing Hadoop clusters comprised of over 3,000 nodes. 

  • Just like Captain Kirk. Shifts In Algorithm Design: Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful. The brilliance of the current approach is that we can change the problem. 

  • Wet Computing Could Put a Terabyte in a Tablespoon: Researchers from the University of Michigan and New York University demonstrated how plastic nanoparticles, deposited in a liquid, can form a one-bit cluster—the essential building block for information storage. It's called "wet computing," and the technique mimics other biological processes found in nature, like DNA in living cells.

  • Daniel Eloff: The world is not just going massively multicore, it's going heterogeneous core. The one core fits all model of programming is going away. Big performance and efficiency gains can be had from splitting your application among different types of specialized processors. Programmable hardware with FPGAs seems like a natural extension of this trend.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

The Drag of Old Mental Models on Innovation and Change

“Don’t worry about people stealing your ideas. If your ideas are any good, you’ll have to ram them down people’s throats.” — Howard Aiken

It's not a lack of risk taking that holds innovation and change back. 

Even big companies take big risks all the time.

The real barrier to innovation and change is the drag of old mental models.

People end up emotionally invested in their ideas, or they are limited by their beliefs or their world views.  They can't see what's possible with the lens they look through, or fear and doubt hold them back.  In some cases, it's even learned helplessness.

In the book The Future of Management, Gary Hamel shares some great insight into what holds people and companies back from innovation and change.

Yesterday’s Heresies are Tomorrow’s Dogmas

Yesterday's ideas that were profoundly at odds with what is generally accepted, eventually become the norm, and then eventually become a belief system that is tough to change.

Via The Future of Management:

“Innovators are, by nature, contrarians.  Trouble is, yesterday's heresies often become tomorrow's dogmas, and when they do, innovation stalls and the growth curve flattens out.”

Deeply Held Beliefs are the Real Barrier to Strategic Innovation

Success turns beliefs into barriers by cementing ideas that become inflexible to change.

Via The Future of Management:

“... the real barrier to strategic innovation is more than denial -- it's a matrix of deeply held beliefs about the inherent superiority of a business model, beliefs that have been validated by millions of customers; beliefs that have been enshrined in physical infrastructure and operating handbooks; beliefs that have hardened into religious convictions; beliefs that are held so strongly, that nonconforming ideas seldom get considered, and when they do, rarely get more than grudging support.”

It's Not a Lack of Risk Taking that Holds Innovation Back

Big companies take big risks every day.  But the risks are scoped and constrained by old beliefs and the way things have always been done.

Via The Future of Management:

“Contrary to popular mythology, the thing that most impedes innovation in large companies is not a lack of risk taking.  Big companies take big, and often imprudent, risks every day.  The real brake on innovation is the drag of old mental models.  Long-serving executives often have a big chunk of their emotional capital invested in the existing strategy.  This is particularly true for company founders.  While many start out as contrarians, success often turns them into cardinals who feel compelled to defend the one true faith.  It's hard for founders to credit ideas that threaten the foundations of the business models they invented.  Understanding this, employees lower down self-edit their ideas, knowing that anything too far adrift from conventional thinking won't win support from the top.  As a result, the scope of innovation narrows, the risk of getting blindsided goes up, and the company's young contrarians start looking for opportunities elsewhere.”

Legacy Beliefs are a Much Bigger Liability When It Comes to Innovation

When you want to change the world, sometimes it takes a new view, and existing world views get in the way.

Via The Future of Management:

“When it comes to innovation, a company's legacy beliefs are a much bigger liability than its legacy costs.  Yet in my experience, few companies have a systematic process for challenging deeply held strategic assumptions.  Few have taken bold steps to open up their strategy process to contrarian points of view.  Few explicitly encourage disruptive innovation.  Worse, it's usually senior executives, with their doctrinaire views, who get to decide which ideas go forward and which get spiked.  This must change.”

What you see, or can’t see, changes everything.

You Might Also Like

The New Competitive Landscape

The New Realities that Call for New Organizational and Management Capabilities

Who’s Managing Your Company

Categories: Architecture, Programming

Playing with two most interesting new features of elasticsearch 1.3.0

Gridshore - Fri, 07/25/2014 - 11:47

Just a few days ago elasticsearch released version 1.3.0 of their flagship product. Two new features stand out for me. The first one is the long awaited feature called the Top hits aggregation. Basically this is what is called grouping: you want to group certain items based on one characteristic, but within this group you want to have the best matching result(s) based on score. The other very important feature is the new support for scripts: better security options when using scripts, through sandboxed script languages.

In this blogpost I am going to explain and show the top_hits feature as well as the new scripting support.


Top hits

I am going to show a very simple example of top hits using my music index. This index contains all the songs I have in my itunes library. The first step is to find songs by genre. The following query gives (the default) 10 hits based on the (implicit) match_all query and the terms aggregation as requested.

GET /mymusic/_search
{
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    }
  }
}

The response is of the format:

{
	"hits": {},
	"aggregations": {
		"byGenre": {
			"buckets": [
				{"key":"rock","doc_count":1910},
				...
			]
		}
    }
}

Now we add the query to the request, songs containing the word love in the title.

GET /mymusic/_search
{
  "query": {
    "match": {
      "name": "love"
    }
  }, 
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    }
  }
}

Now we have fewer hits, still a number of buckets and the amount of songs that match our query within each bucket. The biggest change is the score in the returned hits. In the previous query the score was always 1, now the score is different due to the query we execute. The highest score now is the song Love by The Mission. The genre for this song is Rock and the song is from the year 1990. Time to introduce the top hits aggregation. With this query we can return the top song containing the word love in the title per genre.

GET /mymusic/_search
{
  "query": {
    "match": {
      "name": "love"
    }
  },
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 5
      },
      "aggs": {
        "topFoundHits": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

Again we get hits, but they are no different from the query before. The interesting part is in the aggs part. Here we add a sub aggregation to the byGenre aggregation. This aggregation is called topFoundHits and is of type top_hits. We only return the best hit per genre. The next code block shows the part of the response with the top hits; I removed the content of the _source field in the top_hits to keep the response shorter.

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 141,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "byGenre": {
         "buckets": [
            {
               "key": "rock",
               "doc_count": 52,
               "topFoundHits": {
                  "hits": {
                     "total": 52,
                     "max_score": 4.715253,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "4147",
                           "_score": 4.715253,
                           "_source": {
                              "name": "Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "pop",
               "doc_count": 39,
               "topFoundHits": {
                  "hits": {
                     "total": 39,
                     "max_score": 3.3341873,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "11381",
                           "_score": 3.3341873,
                           "_source": {
                              "name": "Love To Love You",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "alternative",
               "doc_count": 12,
               "topFoundHits": {
                  "hits": {
                     "total": 12,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "b",
               "doc_count": 9,
               "topFoundHits": {
                  "hits": {
                     "total": 9,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "r",
               "doc_count": 7,
               "topFoundHits": {
                  "hits": {
                     "total": 7,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

Did you note a problem with my analyser for genre? Hint: R&B!

More information on the top_hits aggregation can be found here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

Scripting

Elasticsearch has had support for scripts for a long time. The default scripting language was and is mvel up to version 1.3. It will change to groovy in 1.4. Mvel is not a well known scripting language. Its biggest advantage is that mvel is very powerful. The disadvantage is that it is too powerful. Mvel does not come with a sandbox principle, therefore it is possible to write some very nasty scripts, even when only doing a query. This was very well shown by a colleague of mine (Byron Voorbach) who created a query to read private keys on developer machines whose elasticsearch instance was not safeguarded. Therefore dynamic scripting was switched off in version 1.2 by default.
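For reference, the switch that controls dynamic scripting lives in elasticsearch.yml; disabling it completely (the 1.2 default referred to above) is a single setting, shown here as a minimal sketch:

script.disable_dynamic: true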

This came with a very big disadvantage: it was no longer possible to use the function_score query without resorting to stored scripts on the server. In version 1.3 of elasticsearch a much better way is introduced. Now you can use sandboxed scripting languages like groovy to keep using the flexible approach. Groovy can be configured to restrict which object creation and method calls are allowed. More information about this is provided in the elasticsearch documentation about scripting.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

Next is an example query against my music index. This index contains all the songs from my music library. It queries all the songs after the year 1999 and calculates the score based on the year, so the newest songs get the highest score. And yes, I know a sort by year desc would have given the same result.

GET mymusic/_search
{
  "query": {
    "function_score": {
      "query": {
        "range": {
          "year": {
            "gte": 2000
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "lang": "groovy", 
            "script": "_score * doc['year'].value"
          }
        }
      ]
    }
  }
}

The score now becomes high. Since we do a range query we get back only scores of one; using the function_score to multiply the year by the score, the end score is the year. I added the year as the only field to return. Some of the results then are:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 2895,
      "max_score": 2014,
      "hits": [
         {
            "_index": "mymusic",
            "_type": "itunes",
            "_id": "12965",
            "_score": 2014,
            "fields": {
               "year": [
                  "2014"
               ]
            }
         },
         {
            "_index": "mymusic",
            "_type": "itunes",
            "_id": "12975",
            "_score": 2014,
            "fields": {
               "year": [
                  "2014"
               ]
            }
         }
      ]
   }
}

Next up is the last sample, a combination of top_hits and scripting.

Top hits with scripting

We start with the sample from top_hits using my music index. Now we want to sort the buckets on the score of the best matching document in the bucket. The default is the number of documents in the bucket. As mentioned in the documentation you need a trick to do this.

The top_hits aggregator isn’t a metric aggregator and therefore can’t be used in the order option of the terms aggregator.

GET /mymusic/_search?search_type=count
{
  "query": {
    "match": {
      "name": "love"
    }
  },
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 5,
        "order": {
          "best_hit":"desc"
        }
      },
      "aggs": {
        "topFoundHits": {
          "top_hits": {
            "size": 1
          }
        },
        "best_hit": {
          "max": {
            "lang": "groovy", 
            "script": "doc.score"
          }
        }
      }
    }
  }
}

The results of this query, again with most of the _source taken out, are as follows. Compare it to the query in the top_hits section. Notice the different genres that we get back now. Also check the scores.

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 141,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "byGenre": {
         "buckets": [
            {
               "key": "rock",
               "doc_count": 37,
               "topFoundHits": {
                  "hits": {
                     "total": 37,
                     "max_score": 4.715253,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "4147",
                           "_score": 4.715253,
                           "_source": {
                              "name": "Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.715252876281738
               }
            },
            {
               "key": "alternative",
               "doc_count": 12,
               "topFoundHits": {
                  "hits": {
                     "total": 12,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.194550514221191
               }
            },
            {
               "key": "punk",
               "doc_count": 3,
               "topFoundHits": {
                  "hits": {
                     "total": 3,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.194550514221191
               }
            },
            {
               "key": "pop",
               "doc_count": 24,
               "topFoundHits": {
                  "hits": {
                     "total": 24,
                     "max_score": 3.3341873,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "11381",
                           "_score": 3.3341873,
                           "_source": {
                              "name": "Love To Love You",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 3.3341872692108154
               }
            },
            {
               "key": "b",
               "doc_count": 7,
               "topFoundHits": {
                  "hits": {
                     "total": 7,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 3.027156352996826
               }
            }
         ]
      }
   }
}

This is just a first introduction into the top_hits and scripting. Stay tuned for more blogs around these topics.


Categories: Architecture, Programming

Quote of the Month July 2014

From the Editor of Methods & Tools - Fri, 07/25/2014 - 08:22
Research has shown that the presumption of selfishness is true for maybe 30% of most populations; another 50% are reliably unselfish, and the remaining 20% could go either way, depending on the context. If a company presumes that the undecided 20% are selfish, you can bet they will be selfish—it’s a self-fulfilling prophecy. But worse, the company will create an environment where the 50% of the people who are unselfish are forced to act selfishly. And losing the energy, commitment, and intelligence of half the workforce is perhaps the biggest ...

Building A Backlog: Notes On Observing For Gathering Requirements


One of my jobs during high school was in a tire manufacturing plant in Memphis, Tennessee. On more than one occasion the hated and dreaded time and motion “guy” showed up to observe how I was doing my job. I never knew what the outcome of the observation was, or whether the change to the four page process I performed to sort green tires was due to the observation. The job was never easier after the change. Reflecting on that time (and several industrial classes later), I understand that observation is an important tool for developing an understanding of how work should be done, but it is not a tool to be used all the time. Using observation in the right scenarios and then taking steps to plan how you will observe is critical to getting value for the effort needed.

Why observe? The simplest and clearest rationale for using observation techniques is that users and stakeholders either don’t always know what they want or can’t always express their current needs and foresee their future needs. Therefore a new set of eyes will expose more and different needs. There are four typical scenarios in which observing should be considered as a tool to gather knowledge and requirements.

  1. Physical location is a determinant. Processes and workflow are often affected by the physical location. When the physical layout of the people or machines could strongly affect the solution, the team developing the initial backlog should observe the process in action to understand the nuances of the flow of work.
  2. When people can’t tell you. Occasionally the process being studied is so complex that no one is able to coherently describe how it works or how it should work. Even more rarely, asking is met with silence due to a lack of trust. In both cases observation is a valid tool to develop an initial backlog.
  3. When interactions are crucial. Complex processes often require a wide range of interactions between people, tools and applications. Interactions, except when they cross a boundary, are difficult to identify unless you see them.
  4. When the output and the process don’t match. When, on occasion, the measured output or the output described by a manager does not match what is possible based on the published process, then observing the real process is mandatory.

Once you have decided that you must observe, planning becomes a necessity.

  1. Begin by reviewing the known policies, culture and process of the organization or team being observed. This step helps to ensure that you have a sense of the environment and what you will be seeing.
  2. Decide on how long you will observe. Some processes and process variations need time to be seen. If a process requires a week to complete you will need to observe for at least that amount of time.
  3. Determine how you will record what you see. Trying to memorize what you see will capture some information, but at a minimum you will need to take notes. Remember that recording can include taking notes or capturing audio and video. The level of detail needed will help determine the method.
  4. Finalize the logistics of the observation session before showing up. Office space, network and physical access can suck up huge quantities of time and effort. If you have a week for observation, do not spend the first day dealing with administrative tasks.
  5. Decide how you will create rapport with the group you are observing. Your presence will cause disruption. You need to find a way of observing with minimal impact to the results and without scaring those you are observing into calling placement firms. I am a fan of transparency; tell people why you are observing and what will be done with the data. Where possible I usually involve those that I have observed in an early review of the data collected to elicit more information (a hybridization of techniques, combining observing and asking).
  6. Finally do what was planned, but do not be afraid to tweak the plan as needed.

When I was in the tire plant, the time and motion guy would just appear and no one was thrilled. When we saw him coming we followed the prescribed process a bit more carefully, even if it was less effective. Observation can change behavior positively or negatively (the Hawthorne effect). Sometimes observation might be the only way to know what is really happening, but without planning the data you gather might be what someone wants you to know rather than what you need to know.


Categories: Process Management

Review The Twitter Story by Nick Bilton

Gridshore - Thu, 07/24/2014 - 23:35


A lot of people dream of creating the new Facebook, Twitter or Instagram. Nobody knows in advance that they will create it; most of these ideas simply turned out to be big. Personally I do not believe I will ever create such a product. I am too busy with too many different things, usually based on the great ideas of others. One thing I do like is reading about the success stories of others. I have read books about Starbucks, Microsoft, Apple and a few others.

Recently I started reading The Twitter Story. It reads like an exciting story, yet it is based on interviews and facts behind one of the most exciting companies on the internet in recent years.

I do not think it is a coincidence that a lot of what I read in The Lean Startup, and also in the Starbucks story, comes back in the Twitter story. One thing that struck me in this book is what business does to friendship. It also shows that people with great ideas are usually not the people who turn those ideas into a profitable company.

When starting a company based on your terrific idea, read this book and learn from it. It might make your life a lot better.

The post Review The Twitter Story by Nick Bilton appeared first on Gridshore.

Categories: Architecture, Programming

Purpose, Not Discipline

NOOP.NL - Jurgen Appelo - Thu, 07/24/2014 - 16:06
Purpose, Not Discipline

I tried running a few years ago, but I stopped because of shin splints and impossible work schedules.

I tried a fitness school, for several months, but I hated all the machines and uninteresting equipment.

I tried Pilates exercises, for a few days, but I found the exercises on a mat mind-numbingly boring.

I tried swimming, for a week or two, but the pool was always crowded and it was far away from my home.

I tried body-weight exercises, for exactly two days, but I found them too hard, which didn’t really motivate me.

And I tried yoga, for less than a week, but it was at least as boring as the Pilates exercises.

The post Purpose, Not Discipline appeared first on NOOP.NL.

Categories: Project Management

How Do I Make $2,000 A Month On Passive Income?

Making the Complex Simple - John Sonmez - Thu, 07/24/2014 - 15:00

In this video I answer a question about how to make passive income from a book and a blog.

The post How Do I Make $2,000 A Month On Passive Income? appeared first on Simple Programmer.

Categories: Programming

How Not To Make Decisions Using Bad Estimates

Herding Cats - Glen Alleman - Thu, 07/24/2014 - 04:54

The presentation Dealing with Estimation, Uncertainty, Risk, and Commitment: An Outside-In Look at Agility and Risk Management has become a popular message for those suggesting we can make decisions about software development in the absence of estimates.

The core issue starts with the first chart. It shows the actual completion of a self-selected set of projects versus the ideal estimate. This chart is now used by the #NoEstimates paradigm as evidence that estimating is flawed and should be eliminated. How to eliminate estimates while making decisions about spending other people's money is not actually clear. You'll have to pay €1,300 to find out. 

But let's look at this first chart. It shows the self-selected projects, the vast majority completed above the initial estimate. What is this initial estimate? In the original paper, the initial estimate appears to be the estimate made by someone for how long the project would take. It is not clear how that estimate was arrived at - the basis of estimate - or how the estimate was derived. We all know that subject matter expertise alone is the least desirable basis and that past performance, calibrated for all the variables, is the best.

So Herein Lies the Rub - to Misquote Shakespeare's Hamlet

The ideal line is not calibrated. There is no assessment of whether the original estimate was credible or bogus. If it was credible, what was the confidence in that credibility and what was the error band on that confidence? 

This is a serious - some might say egregious - error in statistical analysis. We're comparing actuals to a baseline that is not calibrated. This means the initial estimate is meaningless in the analysis of the variances without an assessment of its accuracy and precision. Constructing a probability distribution chart is nice, but measured against what - against bogus data?

This is harsh, but the paper and the presentation provide no description of the credibility of the initial estimates. Without that, any statistical analysis is meaningless. Let's move to another example in the second chart.

The second chart - below - is from a calibrated baseline. The calibration comes from a parametric model, where the parameters of the initial estimate are derived from prior projects - the reference class forecasting paradigm. The tool used here is COCOMO. There are other tools based on COCOMO, Larry Putnam's method, and other approaches that can be used for similar calibration of the initial estimates. A few we use are QSM, SEER, and Price.

One place to start is Validation Method for Calibrating Software Effort Models. But this approach started long ago with An Empirical Validation of Software Cost Estimation Models. All the way to the current approaches of ARIMA and PCA forecasting for cost, schedule, and performance using past performance. And current approaches, derived from past research, of tuning those cost drivers using Bayesian Statistics.
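
To make the idea of a parametric estimate concrete, here is a minimal sketch using the published Basic COCOMO coefficients. This is a deliberate simplification: the calibrated tools named above tune many more cost drivers, the coefficients below are the textbook organic-mode values rather than values calibrated to any particular organization, and the size figure is hypothetical.

// Basic COCOMO (Boehm), organic mode - a deliberately simple parametric model.
// Effort (person-months) = a * KLOC^b, Schedule (months) = c * Effort^d
public class BasicCocomo
{
    private static final double A = 2.4, B = 1.05;   // organic-mode effort coefficients
    private static final double C = 2.5, D = 0.38;   // schedule coefficients

    public static void main( String[] args )
    {
        double kloc = 32;                             // hypothetical size estimate (thousands of lines of code)
        double effortPersonMonths = A * Math.pow( kloc, B );
        double scheduleMonths = C * Math.pow( effortPersonMonths, D );
        System.out.printf( "Effort: %.1f person-months, Schedule: %.1f months%n",
                effortPersonMonths, scheduleMonths );
    }
}

The point is not the specific numbers, but that the output is traceable to a model whose parameters were derived from past projects - and can therefore be calibrated, challenged, and improved.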

So What's All The Flap About?

Issues with software management and the estimates of software cost, time, and performance abound. We hear about them every day. Our firm works on programs that have gone Over Target Baseline. So we walk the walk every day.

But when bad statistics are used to sell solutions to complex problems, that's when it becomes a larger problem. To solve this nearly intractable problem of project cost and schedule overrun, we need to look to the root cause. Let's start with the book Facts and Fallacies of Estimating Software Cost and Schedule. From there let's look to some more root causes of software project problems. Why Projects Fail is a good place to move to, with its 101 common causes. Like the RAND and IDA Root Cause Analysis reports, many are symptoms rather than root causes, but good information all the same.

So in the end, when it is suggested that the woes of project success can be addressed by applying

  • Decision making frameworks for projects that do not require estimates.
  • Investment models for software projects that do not require estimates.
  • Project management (risk management, scope management, progress reporting, etc.) approaches that do not require estimates.

Ask a simple question - is there any tangible, verifiable, externally reviewed evidence for this? Or is this just another self-selected, self-reviewed, self-promoting idea that violates the principles of microeconomics as applied to software development, where:

  • Economics is the study of how people make decisions in resource-limited situations. This definition of economics fits the major branches of classical economics very well. 

  • Macroeconomics is the study of how people make decisions in resource-limited situations on a national or global scale. It deals with the effects of decisions that national leaders make on such issues as tax rates, interest rates, and foreign and trade policy, in the presence of uncertainty

  • Microeconomics is the study of how people make decisions in resource-limited situations on a personal scale. It deals with the decisions that individuals and organizations make on such issues as how much insurance to buy, which word processor to buy, what features to develop in what order, whether to make or buy a capability, or what prices to charge for their products or services, in the presence of uncertainty. Real Options is part of this decision making process as well.

Economic principles underlie the structure of the software development life cycle, and its primary refinements of prototyping, iterative and incremental development, and emerging requirements. 

If we look at writing software for money, it falls into the microeconomics realm. We have limited resources, limited time, and we need to make decisions in the presence of uncertainty.

In order to decide about the future impact of any one decision - making a choice - we need to know something about the future, which is itself uncertain. The tool for making these decisions about the future in the presence of uncertainty is called estimating. Lots of ways to estimate. Lots of tools to help us. Lots of guidance - books, papers, classrooms, advisers. 
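
As one small example of what "estimating to support a decision" can look like, here is a minimal Monte Carlo sketch. The three-point effort estimate and the budget are hypothetical numbers chosen for illustration, not figures from the presentation being discussed.

import java.util.Random;

// Minimal Monte Carlo sketch: given a three-point (min / most likely / max) effort estimate,
// estimate the probability that the work fits within a fixed budget of person-months.
public class EffortMonteCarlo
{
    // Sample from a triangular distribution - a common choice for three-point estimates.
    static double sampleTriangular( Random rng, double min, double mode, double max )
    {
        double u = rng.nextDouble();
        double cut = ( mode - min ) / ( max - min );
        return u < cut
                ? min + Math.sqrt( u * ( max - min ) * ( mode - min ) )
                : max - Math.sqrt( ( 1 - u ) * ( max - min ) * ( max - mode ) );
    }

    public static void main( String[] args )
    {
        Random rng = new Random( 42 );
        double budget = 14.0;                             // hypothetical budget in person-months
        int trials = 100_000, withinBudget = 0;
        for ( int i = 0; i < trials; i++ )
        {
            double effort = sampleTriangular( rng, 9, 12, 20 );  // hypothetical three-point estimate
            if ( effort <= budget ) withinBudget++;
        }
        System.out.printf( "P(effort <= %.0f person-months) ~ %.0f%%%n",
                budget, 100.0 * withinBudget / trials );
    }
}

The output is not a single number but a probability of fitting the budget - exactly the kind of information a decision made in the presence of uncertainty needs.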

But asserting we can in fact make decisions about the future in the presence of uncertainty without estimating is mathematically and practically nonsense. 

So now is the time to learn how to estimate, using your favorite method, because deciding in the absence of knowing the impact of that decision is counter to the stewardship of our customers' money. And if we want to keep writing software for money we need to be good stewards first.

Related articles
  • Averages Without Variances are Meaningless - Or Worse Misleading
  • How to "Lie" with Statistics
  • How to Fib With Statistics
  • When Uncertainty is Good
  • No Estimates of Costs and Schedule?
  • The Value of Information
  • COCOMO Model
  • Why is Statistical Thinking Hard?
  • Back To The Future
  • The Failure of Open Loop Thinking
Categories: Project Management

All Decisions Are Based On Mathematics

Herding Cats - Glen Alleman - Thu, 07/24/2014 - 04:25

Obviously not every decision we make is based on mathematics, but when we're spending money, especially other people's money, we'd better have some good reason to do so. Some reason other than gut feel for any significant value at risk. This is the principle of microeconomics.

All Things Considered is running a series on how people interpret probability, from capturing a terrorist to the probability it will rain at your house today. The world lives on probabilistic outcomes. These probabilities are driven by underlying statistical processes. These statistical processes create uncertainties in our decision making processes.

Both Aleatory and Epistemic uncertainty exist on projects. These two uncertainties create risk. This risk impacts how we make decisions. Minimizing risk while maximizing reward is a project management process, as well as a microeconomics process. By applying statistical process control we can engage project participants in the decision making process. Making decisions in the presence of uncertainty is sporty business, and examples of poor forecasts abound. The flaws of statistical thinking are well documented.

When we encounter the notion that decisions can be made in the absence of statistical thinking, there are some questions that need to be answered. Here's one set of questions and answers from the point of view of the mathematics of decision making using probability and statistics.

The book How Not To Be Wrong opens with a simple example.

Here's a question. We're designing airplanes - during WWII - in ways that will prevent them from getting shot down by enemy fighters, so we provide them with armor. But armor makes them heavier. Heavier planes are less maneuverable and use more fuel. Armoring planes too much is a problem. Too little is a problem. Somewhere in between is optimum.

When the planes came back from a mission, the number of bullet holes was recorded. The damage was not uniformly distributed, but followed this pattern

  • Engine - 1.11 bullet holes per square foot (BH/SF)
  • Fuselage - 1.73 BH/SF
  • Fuel System - 1.55 BH/SF
  • Rest of plane - 1.8 BH/SF
The first thought was to provide armor where the need was the highest. But after some thought, the right answer was to provide armor where the bullet holes aren't - on the engines. Where are the missing bullet holes? The answer was on the missing planes. The total number of planes leaving minus those returning is the number of planes hit in a location that caused them not to return - the engines.

The mathematics here is simple. Start by setting a variable to zero. This variable is the probability that a plane that takes a hit in the engine manages to stay in the air and return to base. The result of this analysis (pp. 5-7 of the book) can be applied to our project work.

This is an example of the thought processes needed for project management and the decision making processes needed for spending other people's money. The mathematician's approach is to ask: what assumptions are we making? Are they justified? The first assumption - the erroneous assumption - was that the returning planes were a random sample of all the planes. If that were true, the conclusions could be drawn.
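
A toy simulation makes the selection effect concrete. The return probabilities below are hypothetical values chosen for illustration, not the book's data: if planes hit in the engine rarely make it home, counting holes only on returning planes makes the engine look like the safest place.

import java.util.Random;

// Toy survivorship-bias simulation: every plane takes one hit in a random section; planes hit
// in the engine are much less likely to return. Counting holes only on returning planes makes
// the engine look like the place that needs the least armor - the erroneous conclusion.
public class SurvivorshipBias
{
    public static void main( String[] args )
    {
        String[] sections = { "engine", "fuselage", "fuel system", "rest of plane" };
        double[] returnProbabilityIfHit = { 0.3, 0.9, 0.8, 0.95 };   // hypothetical numbers
        int[] hitsOnReturningPlanes = new int[sections.length];

        Random rng = new Random( 7 );
        int planes = 100_000;
        for ( int i = 0; i < planes; i++ )
        {
            int section = rng.nextInt( sections.length );            // each plane takes one hit
            if ( rng.nextDouble() < returnProbabilityIfHit[section] )
            {
                hitsOnReturningPlanes[section]++;                    // we only ever see these
            }
        }
        for ( int s = 0; s < sections.length; s++ )
        {
            System.out.printf( "%-14s observed hits on returning planes: %d%n",
                    sections[s], hitsOnReturningPlanes[s] );
        }
    }
}

Even though every section is hit equally often in the simulation, the engine shows the fewest observed holes, because engine hits remove planes from the sample - the same selection bias that undermines conclusions drawn from an uncalibrated, self-selected set of projects.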

In The End

Show me the numbers. Numbers talk, BS walks is the crude phrase, but true. When we hear some conjecture about the latest fad, think about the numbers. But before that, read Beyond the Hype: Rediscovering the Essence of Management, by Robert Eccles and Nitin Nohria. This is an important book that lays out the processes for sorting out the hype - the untested and likely untestable conjectures - from the testable processes.

Related articles
  • How To Fix Martin Fowler's Estimating Problem in 3 Easy Steps
  • The World of Probability and Statistics
  • Stationary processes
  • How Not To Be Wrong
  • Why is Statistical Thinking Hard?
  • Selection bias and bombers
  • How Not To Make Decisions Using Bad Estimates
Categories: Project Management

Building A Backlog: Notes On Asking For Requirements

Asking requires listening and writing down what you hear!

Asking stakeholders to describe or define requirements is the most common way to develop requirements for projects. Specific techniques include talking to stakeholders informally in the hall, interviews and questionnaires, and very formal joint application design (JAD) sessions. These techniques are popular because asking and talking to people is easy and opens a dialog. However, while stakeholders may know their business need, they may not know the details of what they really want and need. Moderation and planning are critical for making all of the techniques in this category as effective as possible for creating an initial backlog. Examples of how moderation and planning could be implemented in two classic “asking” techniques are shown below:

Joint Application Design (JAD) is a very formal technique that is an off-shoot of Joint Application Development, which evolved in the 1970s. JAD is a highly structured approach to developing the requirements and design for an application or project. The process is based on the interaction between key roles (sponsor, subject matter experts including business and IT participants, facilitator, scribe and potentially observers). The process requires all roles. It should be noted that the JAD process was one of the earliest techniques used to embed business and IT personnel for any substantial period of time. The process (documented many places, including Wikipedia) has a number of key steps that provide a structured approach for interaction and generating information. Setting the goal (one of the key steps) of the JAD acts as an anchor for the process and provides a tool for the facilitator to re-focus the process if it wanders off course. In order for a JAD to work, up-front planning is mandatory. The participants need to be carefully identified, the goals of the JAD defined, and a detailed agenda with supporting documentation developed. Preparing for the JAD can take as long as the session itself. JADs typically run three to eight days, and participants are typically sequestered from their normal working environment during the session. The combination of a skilled facilitator and structure helps IT and business participants interact in a creative and productive fashion. Overall JAD is a very powerful technique; however, the structure and overhead tend to make it more difficult to apply.

In its classic form, JAD is viewed as less than Agile. Historically it was used to develop the much abused, big up-front design (BUFD). Agile principles call out the concept of emergent design, while eschewing the BUFD. The practice of Agile is generally more a reflection of finding the balance between what needs to be known and what needs to be discovered. I have used the formal structure of the JAD process as a tool to initiate Agile projects very successfully by refocusing the goal to build an initial product backlog. The combination of structure and facilitation is more valuable when a team is addressing a new business area or in matrix organizations where teams are assembled for each new project.

Interviews are another of the classic “asking” techniques. Interview techniques can range from formally scripted question and answer sessions to loosely guided discussions. Formal interview techniques begin by developing a set of questions to be asked during the interview. In formal interview situations, the responses to the questions in the script, and to any follow-on questions, are captured as close to verbatim as possible. A legal deposition is an example of a formal interview. Formal interviews require the interviewers to prepare not only by developing the set of questions to be asked, but also by gathering information about the general outline of the answers they are going to receive. A good interviewer is rarely surprised by the answer they receive. Informal interviews are typically less structured; however, they still require preparation. In less formal scenarios I generally recommend developing a loose set of framing questions (framing questions capture the direction of the interview without being specific) so that the interviewer develops a goal for the interview and then plans the approach to attain that goal. The framing process is important in case the interviewee throws a curve, so that the interviewer can gradually guide the interview back onto the correct track. Take notes (do not trust your memory) in all interviews. While informal interviews seem more like common conversation, interviewers who are good at the informal technique tend to be good counter-punchers (able to deliver well-formed follow-up questions that keep the interviewee talking); however, even in an informal interview, the interviewer must always keep their ultimate goal in mind. In both formal and informal situations, if the interviewer is emotionally involved in what the answer should be, consider using a facilitator or external interviewer.

Asking stakeholders for requirements is a tried and true method to generate an initial backlog. Asking should not equate to ad hoc or mere order taking. Asking requires preparation to be effective whether using formal techniques based on JAD or informal interviews. As an interviewer you need to map out where you want the session to go and then act as the guide.


Categories: Process Management

Java: Determining the status of data import using kill signals

Mark Needham - Wed, 07/23/2014 - 23:20

A few weeks ago I was working on the initial import of ~ 60 million bits of data into Neo4j and we kept running into a problem where the import process just seemed to freeze and nothing else was imported.

It was very difficult to tell what was happening inside the process – taking a thread dump merely informed us that it was attempting to process one line of a CSV line and was somehow unable to do so.

One way to help debug this would have been to print out every single line of the CSV as we processed it and then watch where it got stuck, but this seemed a bit overkill. Ideally we wanted to print out the line we were processing only on demand.

As luck would have it we can do exactly this by sending a kill signal to our import process and have it print out where it had got up to. We had to make sure we picked a signal which wasn’t already being handled by the JVM and decided to go with ‘SIGTRAP’ i.e. kill -5 [pid]

We came across a neat blog post that explained how to wire everything up and then created our own version:

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

import sun.misc.Signal;
import sun.misc.SignalHandler;

// Prints the current import progress whenever the registered signal is received
class Kill3Handler implements SignalHandler
{
    private final AtomicInteger linesProcessed;
    private final AtomicReference<Map<String, Object>> lastRowProcessed;

    public Kill3Handler( AtomicInteger linesProcessed, AtomicReference<Map<String, Object>> lastRowProcessed )
    {
        this.linesProcessed = linesProcessed;
        this.lastRowProcessed = lastRowProcessed;
    }

    @Override
    public void handle( Signal signal )
    {
        // Invoked on the JVM's signal dispatch thread when the registered signal (e.g. kill -5 [pid]) arrives
        System.out.println( "Last Line Processed: " + linesProcessed.get() + " " + lastRowProcessed.get() );
    }
}

We then wired that up like so:

AtomicInteger linesProcessed = new AtomicInteger( 0 );
AtomicReference<Map<String, Object>> lastRowProcessed = new AtomicReference<>();
Kill3Handler kill3Handler = new Kill3Handler( linesProcessed, lastRowProcessed );
Signal.handle( new Signal( "TRAP" ), kill3Handler ); // register the handler for SIGTRAP, i.e. kill -5 [pid]
 
// as we iterate each line we update those variables
 
linesProcessed.incrementAndGet();
lastRowProcessed.getAndSet( properties ); // properties = a representation of the row we're processing

This worked really well for us and we were able to work out that we had a slight problem with some of the data in our CSV file which was causing it to be processed incorrectly.

We hadn’t been able to see this by visual inspection since the CSV files were a few GB in size. We’d therefore only skimmed a few lines as a sanity check.

I didn’t even know you could do this but it’s a neat trick to keep in mind – I’m sure it shall come in useful again.

Categories: Programming