
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

How combined Lean- and Agile practices will change the world as we know it

Xebia Blog - Tue, 07/01/2014 - 08:50

You might have attended our presentation about eXtreme Manufacturing this month, or Nalden's keynote last week at XebiCon 2014. There are a few epic takeaways and additions I would like to share with you in this blog post.

Epic TakeAway #1: The Learn, Unlearn and Relearn Cycle. As Nalden expressed in his inspiring keynote, one of the major reasons for his success is being able to Learn, Unlearn and Relearn, over and over again. In my opinion, this will be the key ability for every successful company in the near future. In fact, this is how nature evolves: in the end, only the species that are able to adapt to changing circumstances survive and evolve. This mechanism is why, for example, most startups fail, but those that survive can be extremely disruptive for non-agile organizations. The best example of this is of course WhatsApp, which shook up the telco industry by almost destroying its whole business model in only a few months. Learn more about disruptive innovation from one of my personal heroes, Harvard Professor Clayton Christensen.

Epic TakeAway #2: Unlearning Waterfall, Relearning Lean & Agile. Globally, Waterfall is still the dominant method in companies and universities. Waterfall has its origins more than 40 years ago, and times have changed. A lot. A new, successful and disruptive product can now appear in a matter of days instead of (many) years. Finally, things are changing. For example, the US Department of Defense has recently embraced Lean and Agile as mandatory practices, especially Scrum. Schools and universities are also increasingly adopting the Agile way of working. More on this later in this blog post.

Epic TakeAway #3: Combined Lean and Agile practices = XM. Lean practices arose in Japan in the 1980s, mainly in the manufacturing industry, with Toyota as the frontrunner. Agile practices like Scrum were first introduced in the 1990s by Ken Schwaber and Jeff Sutherland and were mainly applied in the IT industry. Until recently, the manufacturing and IT worlds never really joined forces to combine Lean and Agile practices. The WikiSpeed initiative of Joe Justice proved that combining these practices results in a hyper-productive environment, in which a 100 mile-per-gallon, road-legal sports car could be developed in less than 3 months. Out of this success eXtreme Manufacturing (XM) arose: finally, a powerful combination of best practices from the manufacturing and IT worlds came together.

Epic TakeAway #4: Agile Mindset & Education. As Sir Ken Robinson and Dan Pink have described in their famous TED talks, the way most people are educated and rewarded is no longer suitable for modern times and even conflicts with the way we are born. We learn by "failing", not by preventing it. Failing in its essence should stimulate creativity to do things better next time, not be punished. In the long run, failing (read: learning!) has more added value than short-term success, for example by chasing milestones blindly. EduScrum in the Netherlands stimulates schools and universities to apply Scrum in their daily classes in order to stimulate creativity, happiness, self-reliance and talent. The results of the schools joining these initiatives are spectacular: happy students, fewer dropouts and significantly higher grades. For a prestigious project at Delft University, Forze, the development of a hydrogen race car, the students are currently being trained and coached to apply Agile and Lean practices. These results are also more than promising: the Forze team is happier, more productive and better able to learn from setbacks. In fact, they are taking the first steps towards being anti-fragile. Thanks to an intercession by the Forze team members themselves, the current support of agile (Xebia) coaches is now planned to be extended to the flagship of Delft University: the NUON solar team.

The Final Epic TakeAway. In my opinion, we have reached a tipping point in the way goals should be achieved. Organizations are massively abandoning Waterfall and embracing Agile practices like Scrum. Adding Lean practices, as Joe Justice did in his WikiSpeed project, makes Agile and Lean extremely powerful. Yes, this will even make the world a much better place. We cannot prevent natural disasters with this, but we can be anti-fragile. We cannot prevent every epidemic, but we can respond to one in an XM fashion by developing a vaccine in only days instead of years. This brings me finally to the missing statement of the current Agile Manifesto: We should Unlearn and Relearn before we Judge. Dare to dream like a little kid again. Unlearn your skepticism. Companies like Boeing, Lockheed Martin and John Deere already did; adopting XM sped up their velocity in some cases by more than 7 times.

What is Capacity in software development? - The #NoEstimates journey

Software Development Today - Vasco Duarte - Tue, 07/01/2014 - 04:00

I hear this a lot in the #NoEstimates discussion: you must estimate to know what you can deliver for a certain price, time or effort.

Actually, you don’t. There’s a different way to look at your organization and your project. Organizations and projects have an inherent capacity, that capacity is a result of many different variables - not all can be predicted. Although you can add more people to a team, you don’t actually know what the impact of that addition will be until you have some data. Estimating the impact is not going to help you, if we are to believe the track record of the software industry.

So, for me the recipe to avoid estimates is very simple: Just do it, measure it and react. Inspect and adapt - not a very new idea, but still not applied enough.

Let’s make it practical. How many of these stories or features is my team or project going to deliver in the next month? Before you can answer that question, you must find out how many stories or features your team or project has delivered in the past.

Look at this example.

How many stories is this team going to deliver in the next 10 sprints? The answer to this question is the concept of capacity (aka Process Capability). Every team, project or organization has an inherent capacity. Your job is to learn what that capacity is and limit the work to capacity! (Credit to Mary Poppendieck (PDF, slide 15) for this quote).

Why is limiting work to capacity important? That’s a topic for another post, but suffice it to say that adding more work than the available capacity causes many stressful moments and sleepless nights, while having less work than capacity might get you and a few more people fired.

My advice is this: learn what the capacity of your project or team is. Only then will you be able to deliver, reliably and with quality, the software you are expected to deliver.

How to determine capacity?

Determining the capacity or capability of a team, organization or project is relatively simple. Here's how:

  • 1- Collect the data you have already:
    • If using timeboxes, collect the stories or features delivered (*) in each timebox
    • If using Kanban/flow, collect the stories or features delivered (*) in each week or two-week period, depending on the length of the release/project
  • 2- Plot a graph with the number of stories delivered for the past N iterations, to determine if your System of Development (slideshare) is stable
  • 3- Determine the process capability by calculating the upper (average + 1*sigma) and lower (average - 1*sigma) limits of variability (see the sketch below)
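
To make step 3 concrete, here is a minimal R sketch of the calculation, using a hypothetical vector of stories delivered per sprint (the numbers are made up for illustration):

# stories delivered in each of the last 10 sprints (hypothetical data)
delivered = c(7, 9, 6, 8, 10, 7, 9, 8, 6, 9)
 
avg   = mean(delivered)   # centre of the process
sigma = sd(delivered)     # variability (sample standard deviation)
 
# capacity limits: average +/- 1*sigma
upper = avg + sigma
lower = avg - sigma
 
round(c(lower = lower, average = avg, upper = upper), 2)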

At this point you know what your team, organization or process is likely to deliver in the future. However, the capacity can change over time. This means you should regularly review the data you have and determine (see slideshare above) if you should update the capacity limits as in step 3 above.

(*): by "delivered" I mean something similar to what Scrum calls "Done". Something that is ready to go into production, even if the actual production release is done later. In my language delivered means: it has been tested and accepted in a production-like environment.

Note for the statisticians in the audience: Yes, I know that I am assuming a normal distribution of delivered items per unit of time. And yes, I know that the Weibull distribution is a more likely candidate. That's ok, this is an approximation that has value, i.e. gives us enough information to make decisions.

You can receive exclusive content (not available on the blog) on the topic of #NoEstimates by subscribing to the #NoEstimates mailing list. As a bonus you will get my #NoEstimates whitepaper, where I review the background and reasons for using #NoEstimates.

Picture credit: John Hammink, follow him on twitter

Measures of Central Tendency

Clouds distributed around the mean, err, horizon.

Measures of central tendency attempt to define the center of a set of data. They are important for interpreting benchmarks and when single rates are used in contracts.  There are many ways to measure central tendency; however, the three most popular are the mean, median and mode.  Each of the measures of central tendency provides interesting information and each is more or less useful in different circumstances.  Let’s explore the following simple data set.

[Image: the example data set]

Mean

The mean is the most popular and well known measure of central tendency.  The mean is calculated by summing the values in the sample (or population) and dividing by the total number of observations.  In the example the mean is calculated as 231.43 / 13 or 17.80.  The mean is most useful when the data points are distributed evenly around the mean or are normally distributed. A mean is highly influenced by outliers.

Advantages include:

  • Most common measure of central tendency used and therefore most quickly understood.
  • The answer is unique.

Disadvantages include:

  • Influenced by extremes (skewed data and outliers).

Median

Median is the middle observation in a set of data.  Median is affected less by outliers or skewed data.  In order to find the median (by hand) you need to arrange the data in numerical order.  Using the same data set:

[Image: the example data set arranged in numerical order]

The median is 18.64 (six observations above and six observations below).  Since the median is positional, it is less affected by extreme values. Therefore the median is a better reflection of central tendency for data that has outliers or is skewed.  Most project metrics include outliers and tend to be skewed, therefore the median is very valuable when evaluating software measures.

Advantages

  • Extreme values (outliers) do not affect the median as strongly as they do the mean.
  • The answer is unique.

Disadvantages

  • Not as popular as the mean.

Mode

The mode is the most frequent observation in the set of data.  The mode may not be the best measure of central tendency and may not be unique. Worse, the set may not have a mode at all.  The mode is most useful when the data is non-numeric or when you are attempting to find the most popular item in a data set. Determine the mode by counting the occurrences of each unique observation. In our example data set:

[Image: counts of each unique observation in the data set]

The mode in this data set is 26.43; it has two observations.  

Advantages:

  • Extreme values (outliers) do not affect the mode.

Disadvantages:

  • May be more than one answer.
  • If every value is unique the mode is useless (every value is the mode).
  • May be difficult to interpret.

Based on our test data set the three measures of central tendency return the following values:

  • Mean: 17.8
  • Median: 18.64
  • Mode: 26.43
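
All three are easy to reproduce in R. The original data set is only shown as an image above, so the vector below is a hypothetical 13-value stand-in that matches the reported statistics (mean 17.80, median 18.64, mode 26.43); note that base R has no built-in mode function, so we define one:

# hypothetical data standing in for the article's example data set
observations = c(4.03, 10.5, 12.1, 14.8, 15.0, 17.2, 18.64,
                 19.3, 20.1, 21.9, 25.0, 26.43, 26.43)
 
mean(observations)    # sum of the values divided by the number of observations
median(observations)  # middle value once the data is arranged in numerical order
 
# the mode: the value(s) occurring most frequently
mode.value = function(x) {
  counts = table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
mode.value(observations)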

Each statistic returns a different value.  The mean and median provide relatively similar values, therefore it would be important to understand whether the data set represents a sample or the whole population.  If the data is from a sample or could become more skewed by extreme values, the median is probably a better representation of central tendency in this case.  If the population is evenly distributed about the mean (or is normally distributed) the mean is a better representation of central tendency. In the sample data set the mode provides little explanatory power. Understanding which measure of central tendency is being used allows change agents to better target changes, and if your contract uses metrics to determine performance, the choice of measure can have an impact.  Changing or arguing over which to use smacks of poor contracting or gaming the measure.


Categories: Process Management

R: Aggregate by different functions and join results into one data frame

Mark Needham - Mon, 06/30/2014 - 23:47

In continuing my analysis of the London Neo4j meetup group using R I wanted to see which days of the week we organise meetups and how many people RSVP affirmatively by the day.

I started out with this query which returns each event and the number of ‘yes’ RSVPs:

library(Rneo4j)
timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01")
 
query = "MATCH (g:Group {name: \"Neo4j - London User Group\"})-[:HOSTED_EVENT]->(event)<-[:TO]-({response: 'yes'})<-[:RSVPD]-()
         WHERE (event.time + event.utc_offset) < timestamp()
         RETURN event.time + event.utc_offset AS eventTime, COUNT(*) AS rsvps"
events = cypher(graph, query)
events$datetime <- timestampToDate(events$eventTime)
      eventTime rsvps            datetime
1  1.314815e+12     3 2011-08-31 19:30:00
2  1.337798e+12    13 2012-05-23 19:30:00
3  1.383070e+12    29 2013-10-29 18:00:00
4  1.362474e+12     5 2013-03-05 09:00:00
5  1.369852e+12    66 2013-05-29 19:30:00
6  1.385572e+12    67 2013-11-27 17:00:00
7  1.392142e+12    35 2014-02-11 18:00:00
8  1.364321e+12    23 2013-03-26 18:00:00
9  1.372183e+12    22 2013-06-25 19:00:00
10 1.401300e+12    60 2014-05-28 19:00:00

I wanted to get a data frame which had these columns:

Day of Week | RSVPs | Number of Events

Getting the number of events for a given day was quite easy as I could use the groupBy function I wrote last time:

groupBy = function(dates, format) {
  dd = aggregate(dates, by=list(format(dates, format)), function(x) length(x))
  colnames(dd) = c("key", "count")
  dd
}
 
> groupBy(events$datetime, "%A")
        key count
1  Thursday     9
2   Tuesday    24
3 Wednesday    35

The next step is to get the sum of RSVPs by the day which we can get with the following code:

dd = aggregate(events$rsvps, by=list(format(events$datetime, "%A")), FUN=sum)
colnames(dd) = c("key", "count")

The difference between this and our previous use of the aggregate function is that we’re passing in the number of RSVPs for each event and then grouping by the day and summing up the values for each day rather than counting how many occurrences there are.

If we evaluate ‘dd’ we get the following:

> dd
        key count
1  Thursday   194
2   Tuesday   740
3 Wednesday  1467

We now have two data tables with a very similar shape and it turns out there’s a function called merge which makes it very easy to convert these two data frames into a single one:

x = merge(groupBy(events$datetime, "%A"), dd, by = "key")
colnames(x) = c("day", "events", "rsvps")
> x
        day events rsvps
1  Thursday      9   194
2   Tuesday     24   740
3 Wednesday     35  1467

We could now choose to order our new data frame by number of events descending:

> x[order(-x$events),]
        day events rsvps
3 Wednesday     35  1467
2   Tuesday     24   740
1  Thursday      9   194

We might also add an extra column to calculate the average number of RSVPs per day:

> x$rsvpsPerEvent = x$rsvps / x$events
> x
        day events rsvps rsvpsPerEvent
1  Thursday      9   194      21.55556
2   Tuesday     24   740      30.83333
3 Wednesday     35  1467      41.91429
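
As an aside, the same summary can be produced in a single step with the formula interface of aggregate by summing a constant column alongside the RSVP counts - a small sketch, assuming the events data frame from above:

# group by day once, summing a column of 1s (the event count) and the RSVPs together
events$day = format(events$datetime, "%A")
perDay = aggregate(cbind(events = 1, rsvps) ~ day, data = events, FUN = sum)
perDay$rsvpsPerEvent = perDay$rsvps / perDay$events

Evaluating ‘perDay’ gives the same day/events/rsvps/rsvpsPerEvent table as above.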

I’m still getting the hang of it but already it seems like the combination of R and Neo4j allows us to quickly get insights into our data and I’ve barely scratched the surface!

Categories: Programming

Everything's a Remix

Herding Cats - Glen Alleman - Mon, 06/30/2014 - 23:12

In the estimating discussion there is a popular notion that we can't possibly estimate something we haven't done before. So we have to explore - using the customer's money, by the way - to discover what we don't know.

Then, when we hear that we've never done this before and that estimating is a waste of time, think about the title of this post.

Everything's a Remix

Other than inventing new physics, all software development has been done in some form or another before. The only truly original thing in the universe is the Big Bang. Everything else is derived from something that came before.

Now we may not know about this thing in the past, but that's a different story. It was done before in some form, but we didn't realize it. There are endless examples of ideas copied from the past while thinking they are innovative, new and breakthrough. The iPad and all laptops came from Alan Kay's 1972 paper, "A Personal Computer for Children of All Ages." Even how the touch screen on the iPhone works was done before Apple announced it as the biggest breakthrough in the history of computing.

In our formal defense acquisition paradigm there are many programs that are research. The flow looks like the figure below. Making estimates about the effort and duration is difficult, so blocks of money are provided to find out. But these are not product production or systems development processes. Systems Design and Development (SDD) takes place between MS-B and MS-C. We don't confuse exploring with developing. Want to explore? Work on a DARPA program. Want to develop? Work post-MS-B and know something about what came before.

[Image: DoD 5000.02 acquisition process]

The pre-Milestone A work is to identify what capabilities will be needed in the final product. The DARPA programs I work on are even further to the left of Milestone A.

On the other end of the spectrum from this formal process, a collection of sticky notes on the wall could have a similar flow of maturity. But the principles are still the same.

So How To Estimate in the Presence of We've Never Done This Before

  • The first thing to do is go find someone who has. Hire them, buy them dinner, pick their brain.
  • The next would be to find an example of what you want to build and take it apart. This is what every product designer does. In the fashion business they just copy. In the software business they copy and make it better.
  • Long ago I had an idea, along with several others, of writing a book of reusable code in our domain. Algorithms that could be reused. The IBM FORTRAN Scientific Subroutine Library was our model. The remix of the code elements is now built into hardware chips for doing what we did - process signals from radar systems. The Collected Algorithms of the ACM is a similar idea.

Here's a critical concept - we can't introduce anything new until we're fluent in the language of our domain, and we do that through emulation.† This means for us to move forward we have to have done something like this in the past. So if we haven't done something like this in the past, don't know anyone who has, or can't find an example of it being done, we will have little success being innovative. As well, we will hopelessly fail in trying to estimate the cost, schedule, and probability of delivering capabilities. In other words we'll fail and blame it on the estimating process and assume that we'll be successful if we stop estimating.

So stop thinking about what we can't know and what we don't know, and start thinking that someone has done this before - we just need to find that someone, somewhere, something. Nobody starts out being original; we need copying to get started. Once copied, transformation is the next step. With the copy we can estimate size and effort. We can then transform it into something that is better, and since we now know about the thing we copied, we have a reference class. Yes, that famous Reference Class Forecasting used by all mature estimating shops. With the copy and its transformation, we can then combine ideas into something new. The Alto from Xerox, and then the Xerox Star for executives, were the basis of the Lisa and the Mac.

The End

You can estimate almost anything, and every software system, if you do some homework and suspend the belief that it can't be done. Why? Because it's not your money, and those providing you the money have an interest in several things about their money: what will it cost, when will you be done, and, using the revenue side of the balance sheet, when will they break even on the exchange of money for value? This is the principle of every for-profit business on the planet. The not-for-profits have to pay the electric bill as well, as do the non-profits. So everyone, everywhere needs to know the cost of the value they asked us to produce BEFORE we've spent all their money and run out of time to reach the target market for that pesky break-even equation.

Anyone who tells you otherwise is not in the business of business, but just on the expense side, and that means not on the decision making side either - just labor doing what they're told to do, which is a noble profession, but unlikely to influence how decisions are made.

The notion of decision rights is the basis of governance. When you hear about doing or not doing something in the absence of who needs this information, ask who needs this information and whether it is your decision right to fulfill or not fulfill the request for that information. As my colleague, a retired NASA Cost Director, says: follow the money, that's where you find the decider.

† Everything is a Remix, Part 3, Kirby Ferguson.


 

Categories: Project Management

R: Order by data frame column and take top 10 rows

Mark Needham - Mon, 06/30/2014 - 22:44

I’ve been doing some ad-hoc analysis of the Neo4j London meetup group using R and Neo4j, and having worked out how to group by certain keys, the next step was to order the rows of the data frame.

I wanted to drill into the days on which people join the group and see whether they join it at a specific time of day. My feeling was that most people would join on a Monday morning.

The first step was to run the query using RNeo4j and then group by day and hour:

library(Rneo4j)
 
query = "MATCH (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
         RETURN membership.joined AS joinDate"
 
timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01")
 
meetupMembers = cypher(graph, query)
meetupMembers$joined <- timestampToDate(meetupMembers$joinDate)
 
groupBy = function(dates, format) {
  dd = aggregate(dates, by= list(format(dates, format)), function(x) length(x))
  colnames(dd) = c("key", "count")
  dd
}
 
byDayTime = groupBy(meetupMembers$joined, "%A %H:00")

This returned quite a few rows so we’ll just display a subset of them:

> byDayTime[12:25,]
            key count
12 Friday 14:00    12
13 Friday 15:00     8
14 Friday 16:00    11
15 Friday 17:00    10
16 Friday 18:00     3
17 Friday 19:00     1
18 Friday 20:00     3
19 Friday 21:00     4
20 Friday 22:00     7
21 Friday 23:00     2
22 Monday 00:00     3
23 Monday 01:00     1
24 Monday 03:00     1
25 Monday 05:00     3

The next step was to order by the ‘count’ column which wasn’t too difficult:

> byDayTime[order(byDayTime$count),][1:10,]
              key count
2    Friday 03:00     1
3    Friday 04:00     1
4    Friday 05:00     1
5    Friday 07:00     1
17   Friday 19:00     1
23   Monday 01:00     1
24   Monday 03:00     1
46 Saturday 03:00     1
66   Sunday 06:00     1
67   Sunday 07:00     1

If we run the order function on its own we’ll see that it returns the order in which the current rows in the data frame should appear:

> order(byDayTime$count)
  [1]   2   3   4   5  17  23  24  46  66  67 109 128 129   1  21  44  47  48  81  86  87  88 108 130  16  18  22  25  45  53  64  71  75 107  19  26  49  51  55  56  58  59  61
 [44]  65  68  77  79  85 106 110 143  50  52  54  82  84 101 127 146  27  57  60  62  63  69  70  73  99 103 126 145   6  20  76  83  89 105 122 131 144   7  13  40  43  72  80
 [87] 102  39  78 100 132 147  15  94 121 123 142  14  42  74 104 137 140  12  38  92  93 111 124   8   9  11  90  96 125 139  10  32  34  36  95  97  98  28 135 136  33  35 112
[130] 113 116 134  91 141  41 115 120 133  37 119 138  31 117 118  30 114  29

The first 4 rows in our sorted data frame will be rows 2-5 from the initial data frame, which are:

           key count
2 Friday 03:00     1
3 Friday 04:00     1
4 Friday 05:00     1
5 Friday 07:00     1

So that makes sense! In our case we want to sort in descending order which we can do by prefixing the sorting variable with a minus sign:

> byDayTime[order(-byDayTime$count),][1:10,]
                key count
29     Monday 09:00    34
30     Monday 10:00    28
114   Tuesday 11:00    28
31     Monday 11:00    27
117   Tuesday 14:00    27
118   Tuesday 15:00    27
138 Wednesday 14:00    23
119   Tuesday 16:00    22
37     Monday 17:00    21
115   Tuesday 12:00    20

As expected, Monday morning makes a strong showing, although Tuesday afternoon is also popular, which is unexpected. We’ll need to do some more investigation to figure out what’s going on there.

Categories: Programming

ThreadSanitizer: Slaughtering Data Races

Google Testing Blog - Mon, 06/30/2014 - 22:30
by Dmitry Vyukov, Synchronization Lookout, Google, Moscow

Hello,

I work in the Dynamic Testing Tools team at Google. Our team develops tools like AddressSanitizer, MemorySanitizer and ThreadSanitizer which find various kinds of bugs. In this blog post I want to tell you about ThreadSanitizer, a fast data race detector for C++ and Go programs.

First of all, what is a data race? A data race occurs when two threads access the same variable concurrently, and at least one of the accesses is a write. Most programming languages provide very weak guarantees, or no guarantees at all, for programs with data races. For example, in C++ absolutely any data race renders the behavior of the whole program completely undefined (yes, it can suddenly format the hard drive). Data races are common in concurrent programs, and they are notoriously hard to debug and localize. A typical manifestation of a data race is when a program occasionally crashes with obscure symptoms; the symptoms are different each time and do not point to any particular place in the source code. Such bugs can take several months of debugging without particular success, since typical debugging techniques do not work. Fortunately, ThreadSanitizer can catch most data races in the blink of an eye. See Chromium issue 15577 for an example of such a data race and issue 18488 for the resolution.

Due to the complex nature of bugs caught by ThreadSanitizer, we don't suggest waiting until product release validation to use the tool. For example, in Google, we've made our tools easily accessible to programmers during development, so that anyone can use the tool for testing if they suspect that new code might introduce a race. For both Chromium and Google internal server codebase, we run unit tests that use the tool continuously. This catches many regressions instantly. The Chromium project has recently started using ThreadSanitizer on ClusterFuzz, a large scale fuzzing system. Finally, some teams also set up periodic end-to-end testing with ThreadSanitizer under a realistic workload, which proves to be extremely valuable. When races are found by the tool, our team has zero tolerance for races and does not consider any race to be benign, as even the most benign races can lead to memory corruption.

Our tools are dynamic (as opposed to static tools). This means that they do not merely "look" at the code and try to surmise where bugs can be; instead they instrument the binary at build time and then analyze the dynamic behavior of the program to catch it red-handed. This approach has its pros and cons. On one hand, the tool does not have any false positives, thus it does not bother a developer with something that is not a bug. On the other hand, in order to catch a bug, the test must expose the bug -- the racing data accesses must actually be executed in different threads. This requires writing good multi-threaded tests and makes end-to-end testing especially effective.

As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms (see e.g. this bug in the V8 concurrent garbage collector).

The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).

Interestingly, after integrating the tool into compilers, we've found some bugs in the compilers themselves. For example, LLVM was illegally widening stores, which can introduce very harmful data races into otherwise correct programs. And GCC was injecting unsafe code for initialization of function static variables. Among our other trophies are more than a thousand bugs in Chromium, Firefox, the Go standard library, WebRTC, OpenSSL, and of course in our internal projects.

So what are you waiting for? You know what to do!
Categories: Testing & QA

Complexity is Simple

Software Architecture Zen - Pete Cripp - Mon, 06/30/2014 - 20:18
I was taken with this cartoon and the comments put up by Hugh Macleod last week over at his gapingvoid.com blog, so I hope he doesn’t mind me reproducing it here.

Read more...
Categories: Architecture

Data Doesn't Need to Be Free, But it Does Need to Have Sex

How do we pay for the services we want to create and use? That is the question. Systems like Twitter, Instagram, Pinterest and all the other services you love are not cheap to build at scale. "Grow now and figure out your business model later" is not a sustainable strategy as the VC funding disappears, like hope. If we want new services that stick around, we are going to have to figure out a way for them to make money.

I’m going to argue here that a business model that could make money for software companies, while benefiting users, is creating an open market for data. Yes, your data. For sale. On an open market. For anyone to buy. Privacy is dead. Isn’t it time we leverage the death of privacy for our own gain?

The idea is to create an ecosystem around the production, consumption, and exploitation of data so that all the players can get the energy they need to live and prosper.

The proposed model:

Categories: Architecture

Business Rhythm Drives Process

Herding Cats - Glen Alleman - Mon, 06/30/2014 - 16:05

The agile notion of delivering early, delivering often is a wonderful platitude, but it ignores the underlying business rhythm for accepting software features into productive use, driven by the dynamics of the business or market channel. Here are some examples of business rhythms I've worked with.

  • Paypal - release continually as features arrive from development. I taught a class on software development management at Carnegie Mellon with the lead for the development process.
  • Medicaid enrollment - release at minimum once a week in support of the enrollment agencies - states, counties, and health providers. Provide emergency releases when rules change without notice.
  • Oracle DB updates used to occur on a weekly basis. Our joke when asked what version we had in production: look at your watch, tell me the time and where the second hand is. Oracle figured that was a bad idea and went to announced release dates with a few weeks notice.
  • Health Insurance Provider Network - release when the capabilities needed to move to the next business process are available, as in the figure below. This approach defines the needed business capabilities and their order of delivery, and the planning process defines when they will be available. This provides the basis for putting them to work, since more than the software is needed for the business benefits to be accrued: training, integration, data migration, promotion and advertising, and general Go Live activities.

Capabilities Flow

  • In this example above, the planning process for the needed deliverables, in the proper order, was worked out through a Capabilities Based Planning process.
  • Release updates to major defense systems on a planned schedule - with coordination of dozens of sites around the world and dozens of vehicles on orbit. Full verification and validation regression testing of tens of millions of lines of code, some legacy code going back to the late 1970s which I actually worked on in Fortran 77, running on VAX 11/780s and Cyber 74 mainframes running a real-time standalone operating system.

Capabilities based planning (v2) from Glen Alleman
  • Next is a larger release process. The flight avionics for the Orion spacecraft are released periodically into a systems integration and test environment, coordinated with other software and hardware elements of the spacecraft in the Crew Exploration Vehicle Avionics Integration Laboratory (CAIL). This software is developed incrementally with capabilities coming in chunks. But dropping this software on the CAIL requires coordination with other elements of the spacecraft at the pace those become available.
  • A SAP rollout has similar external dependencies for the business rhythm - a plan to roll out SAP to 53 sites worldwide for complete integration across a $37B industrial market. On the site I worked, the go-live Monday was a non-event.

What's the Point of All This? When we hear deploy fast, deploy often, maybe once a day, test that platitude against your business rhythm first to see if it matches.
Categories: Project Management

Step By Step Path to Becoming a Great Software Developer

Making the Complex Simple - John Sonmez - Mon, 06/30/2014 - 15:30

I get quite a few emails that basically say “how do I become a good / great software developer?” These kinds of emails generally tick me off, because I feel like when you ask this kind of question, you are looking for some magical potion you can take that will suddenly make you into a […]

The post Step By Step Path to Becoming a Great Software Developer appeared first on Simple Programmer.

Categories: Programming

Neo4j/R: Grouping meetup members by join timestamp

Mark Needham - Mon, 06/30/2014 - 01:06

I wanted to do some ad-hoc analysis on the join date of members of the Neo4j London meetup group, and since cypher doesn’t yet have functions for dealing with dates I thought I’d give R a try.

I started off by executing a cypher query which returned the join timestamp of all the group members using Nicole White’s RNeo4j package:

> library(Rneo4j)
 
> query = "match (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
RETURN membership.joined AS joinDate"
 
> meetupMembers = cypher(graph, query)
 
> meetupMembers[1:5,]
[1] 1.389107e+12 1.376572e+12 1.379491e+12 1.349454e+12 1.383127e+12

I realised that if I was going to do any date manipulation I’d need to translate the timestamp into an R friendly format so I wrote the following function to help me do that:

> timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01")

I added another column to the data frame with this date representation:

> meetupMembers$joined <- timestampToDate(meetupMembers$joinDate)
 
> meetupMembers[1:5,]
      joinDate              joined
1 1.389107e+12 2014-01-07 15:08:40
2 1.376572e+12 2013-08-15 14:13:40
3 1.379491e+12 2013-09-18 08:55:11
4 1.349454e+12 2012-10-05 17:28:04
5 1.383127e+12 2013-10-30 09:59:03

Next I wanted to group those timestamps by the combination of month + year for which the aggregate and format functions came in handy:

> dd = aggregate(meetupMembers$joined, by=list(format(meetupMembers$joined, "%m-%Y")), function(x) length(x))
> colnames(dd) = c("month", "count")
> dd
     month count
1  01-2012     4
2  01-2013    52
3  01-2014    88
4  02-2012     7
5  02-2013    52
6  02-2014    91
7  03-2012    12
8  03-2013    23
9  03-2014    93
10 04-2012     3
11 04-2013    34
12 04-2014   119
13 05-2012     9
14 05-2013    69
15 05-2014   102
16 06-2011    14
17 06-2012     5
18 06-2013    39
19 06-2014   114
20 07-2011     4
21 07-2012    16
22 07-2013    20
23 08-2011     2
24 08-2012    34
25 08-2013    50
26 09-2012    14
27 09-2013    52
28 10-2011     2
29 10-2012    29
30 10-2013    42
31 11-2011     2
32 11-2012    31
33 11-2013    34
34 12-2012     7
35 12-2013    19

I wanted to be able to group by different date formats so I created the following function to make life easier:

groupBy = function(dates, format) {
  dd = aggregate(dates, by= list(format(dates, format)), function(x) length(x))
  colnames(dd) = c("key", "count")
  dd
}

Now we can find the join dates grouped by year:

> groupBy(meetupMembers$joined, "%Y")
   key count
1 2011    24
2 2012   171
3 2013   486
4 2014   607

or by day:

> groupBy(meetupMembers$joined, "%A")
        key count
1    Friday   135
2    Monday   287
3  Saturday    80
4    Sunday   102
5  Thursday   187
6   Tuesday   286
7 Wednesday   211

or by month:

> groupBy(meetupMembers$joined, "%m")
   key count
1   01   144
2   02   150
3   03   128
4   04   156
5   05   180
6   06   172
7   07    40
8   08    86
9   09    66
10  10    73
11  11    67
12  12    26

I found the ‘by day’ grouping interesting as I had the impression that the huge majority of people joined meetup groups on a Monday but the difference between Monday and Tuesday isn’t significant. 60% of the joins happen between Monday and Wednesday.

The ‘by month’ grouping is a bit skewed by the fact we’re only half way into 2014 and there have been a lot more people joining this year than in previous years.

If we exclude this year then the spread is more uniform with a slight dip in December:

> groupBy(meetupMembers$joined[format(meetupMembers$joined, "%Y") != 2014], "%m")
   key count
1   01    56
2   02    59
3   03    35
4   04    37
5   05    78
6   06    58
7   07    40
8   08    86
9   09    66
10  10    73
11  11    67
12  12    26

Next up I think I need to get some charts going and perhaps compare the distributions of join dates of various London meetup groups against each other.
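
As a first stab at charting, a minimal ggplot2 sketch (assuming the meetupMembers data frame and groupBy function above) could plot the joins per year as a bar chart:

library(ggplot2)
 
byYear = groupBy(meetupMembers$joined, "%Y")
 
# plot the number of members who joined in each year
ggplot(byYear, aes(x = key, y = count)) +
  geom_bar(stat = "identity") +
  xlab("Year") +
  ylab("Members joined")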

I’m an absolute R newbie so if anything I’ve done is stupid and can be done better please let me know.

Categories: Programming

Keeping a journal

Gridshore - Sun, 06/29/2014 - 23:34

Today I was reading the first part of a book I got as a gift from one of my customers. The book is called Show Your Work by Austin Kleon (Show Your Work! @ Amazon). The whole idea behind this book is that you must be open and share what you learn and the steps you took to learn it.

I think this fits me like a glove, but I can be more expressive. Therefore I have decided to do things differently. I want to start by writing smaller pieces about the things I want to do that day, or what I accomplished that day, and give some excerpts of things I am working on. Not real blog posts or tutorials, but more notes that I share with you. Since it is a Sunday I only want to share the book I am reading.


The post Keeping a journal appeared first on Gridshore.

Categories: Architecture, Programming


SPaMCAST 296 – Jeff Dalton, CMMI, Agile, Resiliency

Software Process and Measurement Cast - Sun, 06/29/2014 - 22:00

SPaMCAST 296 features our interview with Jeff Dalton, in which we talked about Agile and resiliency. If Agile is resilient it will be able to spring back into shape after being bent or compressed by the pressures of development and support.  In the conversation Jeff and I discussed whether Agile is resilient and how frameworks like the CMMI can be used to make Agile more resilient.

Jeff is Broadsword’s President, Certified Lead Appraiser, CMMI Instructor, ScrumMaster and author of “agileCMMI,” Broadsword’s leading methodology for incremental and iterative process improvement.  He is Chairman of the CMMI Institute’s Partner Advisory Board and former President of the Great Lakes Software Process Improvement Network (GL-SPIN).  He is a recipient of the Software Engineering Institute’s SEI Member Award for Outstanding Representative for his work uniting the Agile and CMMI communities together through his popular blog “Ask the CMMI Appraiser.”  He holds degrees in Music and Computer Science and builds experimental airplanes in his spare time.  You can reach Jeff at appraiser@broadswordsolutions.com.

Contact Data:
Email:  appraiser@broadswordsolutions.com.
Twitter:  @CMMIAppraiser
Blog: http://askthecmmiappraiser.blogspot.com/
Web:  http://www.broadswordsolutions.com/
also see:  www.cmmi-tv.com

Next week we will feature our essay on IFPUG Function Points.  IFPUG function points are an ISO Standard means to size projects and applications. IFPUG function points are used across a wide range of project types, industries and countries.

Upcoming Events

Upcoming DCG Webinars:

July 24 11:30 EDT – The Impact of Cognitive Bias On Teams
Check these out at www.davidconsultinggroup.com

I will be attending Agile 2014 in Orlando, July 28 through August 1, 2014.  It would be great to get together with SPaMCAST listeners, let me know if you are attending.

I will be presenting at the International Conference on Software Quality and Test Management in San Diego, CA on October 1

I will be presenting at the North East Quality Council 60th Conference October 21st and 22nd in Springfield, MA.

More on all of these great events in the near future! I look forward to seeing all SPaMCAST readers and listeners that attend these great events!

The Software Process and Measurement Cast has a sponsor.

As many of you know, I do at least one webinar for the IT Metrics and Productivity Institute (ITMPI) every year. The ITMPI provides a great service to the IT profession. ITMPI’s mission is to pull together the expertise and educational efforts of the world’s leading IT thought leaders and to create a single online destination where IT practitioners and executives can meet all of their educational and professional development needs. The ITMPI offers a premium membership that gives members unlimited free access to 400 PDU accredited webinar recordings, and waives the PDU processing fees on all live and recorded webinars. The Software Process and Measurement Cast receives some support if you sign up here. All the revenue our sponsorship generates goes toward bandwidth, hosting and new cool equipment to create more and better content for you. Support the SPaMCAST and learn from the ITMPI.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management

Diagramming Spring MVC webapps

Coding the Architecture - Simon Brown - Sun, 06/29/2014 - 09:54

Following on from my previous post (Software architecture as code), where I demonstrated how to create a software architecture model as code, I decided to throw together a quick implementation of a Spring component finder that could be used to (mostly) automatically create a model of a Spring MVC web application. Spring has a bunch of annotations (e.g. @Controller, @Component, @Service and @Repository) and these can often be used to signify the major building blocks of a web application. To illustrate this, I took the Spring PetClinic application and produced some diagrams for it. First is a context diagram.

A context diagram for the Spring PetClinic application

Next up are the containers, which in this case are just a web server (e.g. Apache Tomcat) and a database (HSQLDB by default).

A container diagram for the Spring PetClinic application

And finally we have a diagram showing the components that make up the web application. These, and their dependencies, were found by scanning the compiled version of the application (I cloned the project from GitHub and ran the Maven build).

A component diagram for the Spring PetClinic web application

Here is the code that I used to generate the model behind the diagrams.

The resulting JSON representing the model was then copy-pasted across into my simple (and very much in progress) diagramming tool. Admittedly the diagrams are lacking on some details (i.e. component responsibilities and arrow annotations, although those can be fixed), but this approach proves you can expend very little effort to get something that is relatively useful. As I've said before, it's all about getting the abstractions right.

Categories: Architecture

Neo4j: Set Based Operations with the experimental Cypher optimiser

Mark Needham - Sun, 06/29/2014 - 09:45

A few months ago I wrote about cypher queries which look for a missing relationship and showed how you could optimise them by re-working the query slightly.

To refresh, we wanted to find all the people in the London office that I hadn’t worked with given this model…

…and this initial query:

MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)

This took on average 7.46 seconds to execute using cypher-query-tuning so we came up with the following version which took 150 ms on average:

MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)

With the release of Neo4j 2.1 we can now make use of Ronja – the experimental Cypher optimiser – which performs much better for certain types of queries. I thought I’d give it a try against this one.

We can use the experimental optimiser by prefixing our query like so:

cypher 2.1.experimental MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)

If we run that through the query tuner we get the following results:

$ python set-based.py
 
cypher 2.1.experimental MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)
Min 0.719580888748 50% 0.723278999329 95% 0.741609430313 Max 0.743646144867
 
 
MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)
Min 0.706955909729 50% 0.715770959854 95% 0.731880950928 Max 0.733670949936

As you can see there’s not much in it – our original query now runs as quickly as the optimised one. Ronja #ftw!

Give it a try on your slow queries and see how it gets on. There’ll certainly be some cases where it’s slower but over time it should be faster for a reasonable chunk of queries.

Categories: Programming

Neo4j’s Cypher vs Clojure – Group by and Sorting

Mark Needham - Sun, 06/29/2014 - 03:56

One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

First let’s create some future and some past events based on the current timestamp of 1404006050535:

CREATE (event1:Event {name: "Future Event 1", timestamp: 1414002772427 })
CREATE (event2:Event {name: "Future Event 2", timestamp: 1424002772427 })
CREATE (event3:Event {name: "Future Event 3", timestamp: 1416002772427 })
 
CREATE (event4:Event {name: "Past Event 1", timestamp: 1403002772427 })
CREATE (event5:Event {name: "Past Event 2", timestamp: 1402002772427 })

If we return all the events we see the following:

$ MATCH (e:Event) RETURN e;
==> +------------------------------------------------------------+
==> | e                                                          |
==> +------------------------------------------------------------+
==> | Node[15414]{name:"Future Event 1",timestamp:1414002772427} |
==> | Node[15415]{name:"Future Event 2",timestamp:1424002772427} |
==> | Node[15416]{name:"Future Event 3",timestamp:1416002772427} |
==> | Node[15417]{name:"Past Event 1",timestamp:1403002772427}   |
==> | Node[15418]{name:"Past Event 2",timestamp:1402002772427}   |
==> +------------------------------------------------------------+
==> 5 rows
==> 13 ms

We can achieve the desired grouping and sorting with the following cypher query:

(def sorted-query "MATCH (e:Event)
WITH COLLECT(e) AS events
WITH [e IN events WHERE e.timestamp <= timestamp()] AS pastEvents,
     [e IN events WHERE e.timestamp > timestamp()] AS futureEvents
UNWIND pastEvents AS pastEvent
WITH pastEvent, futureEvents ORDER BY pastEvent.timestamp DESC
WITH COLLECT(pastEvent) as orderedPastEvents, futureEvents
UNWIND futureEvents AS futureEvent
WITH futureEvent, orderedPastEvents ORDER BY futureEvent.timestamp
RETURN COLLECT(futureEvent) AS orderedFutureEvents, orderedPastEvents")

We then use the following function to call through to the Neo4j server using the excellent neocons library:

(ns neo4j-meetup.db
  (:require [clojure.walk :as walk])
  (:require [clojurewerkz.neocons.rest.cypher :as cy])
  (:require [clojurewerkz.neocons.rest :as nr]))
 
(def NEO4J_HOST "http://localhost:7521/db/data/")
 
(defn cypher
  ([query] (cypher query {}))
  ([query params]
     (let [conn (nr/connect! NEO4J_HOST)]
       (->> (cy/tquery query params)
            walk/keywordize-keys))))

We call that function and grab the first row since we know there won’t be any other rows in the result:

(def query-result (->> ( db/cypher sorted-query) first))

Now we need to extract the past and future collections so that we can display them on the page which we can do like so:

> (map #(% :data) (query-result :orderedPastEvents))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (map #(% :data) (query-result :orderedFutureEvents))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

An alternative approach is to return the events from cypher and then handle the grouping and sorting in clojure. In that case our query is much simpler:

(def unsorted-query "MATCH (e:Event) RETURN e")

We’ll use the clj-time library to determine the current time:

(def now (clj-time.coerce/to-long (clj-time.core/now)))

First let’s split the events into past and future:

> (def grouped-by-events 
     (->> (db/cypher unsorted-query)
          (map #(->> % :e :data))
          (group-by #(> (->> % :timestamp) now))))
 
> grouped-by-events
{true [{:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1424002772427, :name "Future Event 2"} {:timestamp 1416002772427, :name "Future Event 3"}], 
 false [{:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"}]}

And finally we sort appropriately using these functions:

(defn time-descending [row] (* -1 (->> row :timestamp)))
(defn time-ascending [row] (->> row :timestamp))
> (sort-by time-descending (get grouped-by-events false))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (sort-by time-ascending (get grouped-by-events true))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

I used Clojure to do the sorting and grouping in my project because the query to get the events was a bit more complicated and became very difficult to read with the sorting and grouping mixed in.

Unfortunately cypher doesn’t provide an easy way to sort within a collection so we need our sorting in the row context and then collect the elements back again afterwards.

Categories: Programming

Adding #NoEstimates to the Framework

#NoEstimates . . . Yes or No?

 

Hand Drawn Chart Saturday!

When I published An Estimation Framework Is Required In Complex Environments, several people that I respect, including Luis Gonçalves (interviewed on SPaMCAST 282 with Ben Linders), begged to differ with my conclusion that a framework is required.  Luis made an impassioned plea for #NoEstimates.  The premise of #NoEstimates is that estimates enforce a plan, and plans are many times overcome by changes that range across both technology and business needs.

Vasco Duarte, a leading proponent of #NoEstimates, describes the process as follows:

  1. Select the highest value piece of work the team needs to do.
  2. Break that piece of work down into small components.  Vasco uses the term risk-neutral chunks, meaning pieces of work that, if they don’t get delivered in the first attempt, will not put the project at risk.
  3. Develop each piece of work according to the definition of done. #NoEstimates makes a strong case that unless done means usable by the end customers, the project is not getting the feedback needed to avoid negative surprises.
  4. Iterate and refactor. Continue until the product or enhancement meets the organization’s definition of done.

Estimates are part of a continuum that begins with budgeting, continues to estimating and terminates at planning.  Organizations build strategic plans based on bringing new or enhanced products to market.  For example, a retailer might commit to opening x number of stores in the next year.  Once such a commitment is publicly stated, the organization will need to perform to it or face a wide range of consequences.  Based on experience gathered by working in several retailers’ IT organizations, I know that even a single store is a major effort that includes store operations, purchasing, legal and IT.  Missing an opening date causes embarrassment and, typically, large financial penalties (paying workers who aren’t working, rescheduling advertising and possible tax penalties, not to mention the impact on stock prices).  Organizations need to budget and estimate at a strategic level.

Where the #NoEstimates approach makes sense is at the planning level.  The #NoEstimates process empowers teams (product owner, Scrum Master/coach and development personnel) to work on the highest value work first and to develop a predictable capacity to deliver work.  The results generated by the team provide feedback to evaluate the promises made through organization-level budgets and estimates.

When performance is at odds with what has been promised, business choices should be made.  Choices can range from involving other teams (when this makes sense) to accepting the implications of not meeting the commitments made by the organization.

Does #NoEstimates make sense?  Yes, the process and concepts embodied by #NoEstimates fit solidly into a framework of budgeting, estimating and planning.  Without a framework to codify the use of #NoEstimates and to govern organizational behavior, getting to the point of making hard business choices will generate pressure to fall back to a command and control approach.

Note:  I am working on scheduling an interview and discussion with Luis and Vasco on the Software Process and Measurement Cast to discuss #NoEstimates.


Categories: Process Management

Data Science: Mo’ Data Mo’ Problems

Mark Needham - Sun, 06/29/2014 - 00:35

Over the last couple of years I’ve worked on several proof of concept style Neo4j projects, and on a lot of them people have wanted to work with their entire data set, which I don’t think makes sense so early on.

In the early parts of a project we’re trying to prove out our approach rather than prove we can handle big data – something that Ashok taught me a couple of years ago on a project we worked on together.

In a Neo4j project that means coming up with an effective way to model and query our data and if we lose track of this it’s very easy to get sucked into working on the big data problem.

This could mean optimising our import scripts to deal with huge amounts of data or working out how to handle different aspects of the data (e.g. variability in shape or encoding) that only seem to reveal themselves at scale.

These are certainly problems that we need to solve but in my experience they end up taking much more time than expected and therefore aren’t the best problem to tackle when time is limited. Early on we want to create some momentum and keep the feedback cycle fast.

We probably want to tackle the data size problem as part of the implementation/production stage of the project, to use Michael Nygard’s terminology.

At this stage we’ll have some confidence that our approach makes sense and then we can put aside the time to set things up properly.

I’m sure there are some types of projects where this approach doesn’t make sense so I’d love to hear about them in the comments so I can spot them in future.

Categories: Programming