
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

History of Definitions of ET

James Bach’s Blog - Mon, 03/16/2015 - 16:20

History of the term “Exploratory Testing” as applied to software testing within the Rapid Software Testing methodology space.

For a discussion of some of the social and philosophical issues surrounding this chronology, see Exploratory Testing 3.0.

1988 – First known use of the term, defined variously as “quick tests”; “whatever comes to mind”; “guerrilla raids” – Cem Kaner, Testing Computer Software. (There is explanatory text for different styles of ET in the 1988 edition of Testing Computer Software. Cem says that some of the text was actually written in 1983.)

1990 – “Organic Quality Assurance”, James Bach’s first talk on agile testing, filmed by Apple Computer, which discussed exploratory testing without using the words agile or exploratory.

1993 – June: “Persistence of Ad Hoc Testing” talk given at the ICST conference by James Bach. Beginning of James’ abortive attempt to rehabilitate the term “ad hoc.”

1995 – February: First appearance of “exploratory testing” on Usenet in a message by Cem Kaner.

1995 – Exploratory testing means learning, planning, and testing all at the same time. – James Bach (Market Driven Software Testing class)

1996 – Simultaneous exploring, planning, and testing. – James Bach (Exploratory Testing class v1.0)

1999 – An interactive process of concurrent product exploration, test design, and test execution. – James Bach (Exploratory Testing class v2.0)

2001 (post WHET #1)

The Bach View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.

The Kaner View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed, uses information gained while testing to design new and better tests, and where the following conditions apply:

  • The tester is not required to use or follow any particular test materials or procedures.
  • The tester is not required to produce materials or procedures that enable test re-use by another tester or management review of the details of the work done.

– Resolution between Bach and Kaner following WHET #1 and BBST class at Satisfice Tech Center.

(To account for both views, James started speaking of the “scripted/exploratory continuum”, which has greatly helped in explaining ET to factory-style testers.)

2003-2006 – Simultaneous learning, test design, and test execution. – Bach, Kaner

2006-2015 – An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project. – (Bach/Bolton edit of Kaner suggestion)

2015 – Exploratory testing is now a deprecated term within Rapid Software Testing methodology. See testing, instead. (In other words, all testing is exploratory to some degree. The definition of testing in the RST space is now: evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.)

 

Categories: Testing & QA

11 Rules All Programmers Should Live By

Making the Complex Simple - John Sonmez - Mon, 03/16/2015 - 15:00

I am a person who tends to live by rules. Now granted, they are mostly rules I set for myself—but they are still rules. I find that creating rules for myself helps me to function better, because I pre-decide things ahead of time instead of making all kinds of decisions on the fly. Should I […]

The post 11 Rules All Programmers Should Live By appeared first on Simple Programmer.

Categories: Programming

Python: Transforming Twitter datetime string to timestamp ('z' is a bad directive in format)

Mark Needham - Sun, 03/15/2015 - 23:43

I’ve been playing around with importing Twitter data into Neo4j and, since Neo4j can’t store dates natively just yet, I needed to convert a date string to a timestamp.

I started with the following which unfortunately throws an exception:

from datetime import datetime
date = "Sat Mar 14 18:43:19 +0000 2015"
 
>>> datetime.strptime(date, "%a %b %d %H:%M:%S %z %Y")
 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 317, in _strptime
    (bad_directive, format))
ValueError: 'z' is a bad directive in format '%a %b %d %H:%M:%S %z %Y'

%z is actually a valid directive used to extract the timezone, but my googling suggests that its failure here is one of the idiosyncrasies of Python 2.7’s strptime.

I eventually came across the python-dateutil library, as recommended by Joe Shaw on StackOverflow.

Using that library the problem is suddenly much simpler:

$ pip install python-dateutil
from dateutil import parser
parsed_date = parser.parse(date)
 
>>> parsed_date
datetime.datetime(2015, 3, 14, 18, 43, 19, tzinfo=tzutc())

To get to a timestamp we can use calendar as I’ve described before:

import calendar
timestamp = calendar.timegm(parser.parse(date).timetuple())
 
>>> timestamp
1426358599
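
As an aside (not something the original post covers): on Python 3 the %z directive does work for numeric offsets like +0000, so the strptime approach succeeds there without any extra library. A minimal sketch:

from datetime import datetime
 
date = "Sat Mar 14 18:43:19 +0000 2015"
parsed = datetime.strptime(date, "%a %b %d %H:%M:%S %z %Y")  # %z works on Python 3.2+
timestamp = int(parsed.timestamp())  # 1426358599; .timestamp() needs Python 3.3+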
Categories: Programming


SPaMCAST 333 – What is Agile, Selling Defect Control, Planning Communication

Software Process and Measurement Cast - Sun, 03/15/2015 - 22:00

This week’s Software Process and Measurement Cast is a magazine feature with three columns. This week we have columns from Kim Pries, The Software Sensei, and Jo Ann Sweeney’s Explaining Change.  In this installment Kim discusses the ins and outs of selling defect control.  In Explaining Change, Jo Ann tackles the concept of planning for communication (protip: it is better than winging it). The SPaMCAST essay this week tackles the topic of what is and isn’t Agile.  Does just saying you are Agile make you Agile?  We think not!

Call to action!

Can you tell a friend about the podcast? If your friends don’t know how to subscribe or listen to a podcast, show them how you listen and subscribe them!  Remember to send us the name of the person you subscribed (and a picture) and I will give both you and the horde you have converted to listeners a call out on the show.

Re-Read Saturday News

The Re-Read Saturday focus on Eliyahu M. Goldratt and Jeff Cox’s The Goal: A Process of Ongoing Improvement began on February 21st. The Goal has been hugely influential because it introduced the Theory of Constraints, which is central to lean thinking. The book is written as a business novel. Visit the Software Process and Measurement Blog and catch up on the re-read.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast.

Dead Tree Version or Kindle Version 

I am beginning to think of which book will be next. Do you have any ideas?

Upcoming Events

CMMI Institute Conference EMEA 2015
March 26 -27 London, UK
I will be presenting “Agile Risk Management.” 
http://cmmi.unicom.co.uk/

QAI Quest 2015
April 20 -21 Atlanta, GA, USA
Scale Agile Testing Using the TMMi
http://www.qaiquest.org/2015/

DCG will also have a booth!

CANCELED – International Conference on Software Quality and Test Management
Washington D.C. May 31 - June 5, 2015

Next SPaMCast

The next Software Process and Measurement Cast will feature our interview with Agile coach Mario Lucero.  Mario and I discussed the nuts and bolts of coaching Agile teams, what is and isn’t Agile and the impact of coaching on success. 

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques was co-authored by Murali Chemuturi and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management

Managing Projects By The Numbers

Herding Cats - Glen Alleman - Sun, 03/15/2015 - 19:42

When we hear we don't need deadlines; we don't need estimates; we can slice work small enough to have each and every activity be the same size, with no variance; we can use small samples with ±30% variance to forecast future outcomes in the absence of any uncertainties; we're Bad at Estimates; and the plethora of other reasons for not doing the work needed to be proper stewards of our customer's money - we're really saying we weren't paying attention in high school statistics class.

All project activities are probabilistic, driven by the underlying uncertainties of the individual random processes. When coupled together - only trivial projects have independent work activities - cost, schedule, and technical activities drive each other in non-linear, non-deterministic ways. Managing in the presence of this uncertainty means risk reduction work or margin. Both are needed.
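
To make that concrete, here is a toy Monte Carlo sketch (my illustration, not from the post) of three dependent activities with uncertain durations; the gap between the 50th and 85th percentile totals is what schedule margin has to cover:

import random
 
def task_duration(optimistic, likely, pessimistic):
    # Triangular distribution: a common stand-in for an uncertain task duration
    return random.triangular(optimistic, pessimistic, likely)
 
# Three sequential (dependent) tasks, durations in days - illustrative values only
totals = sorted(
    task_duration(3, 5, 10) + task_duration(2, 4, 9) + task_duration(5, 8, 20)
    for _ in range(10000)
)
print("50th percentile: %.1f days" % totals[len(totals) // 2])
print("85th percentile: %.1f days" % totals[int(len(totals) * 0.85)])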

When it's not our money, we are obligated to be stewards of the money from those paying for the production of the value. This is a core principle of all business operations.

The notion that those on the cost side of the balance sheet have an equal voice with those on the revenue side when it comes to managing the firm's money to produce products or services seems to be lost on those not accountable for managing the money.

Categories: Project Management

Haystack TV Doubles Engagement with Android TV

Android Developers Blog - Sat, 03/14/2015 - 23:51

Posted by Joshua Gordon, Developer Advocate

Haystack TV is a small six-person startup with an ambitious goal: personalize the news. Traditionally, watching news on TV means viewing a list of stories curated by the network. Wouldn’t it be better if you could watch a personalized news channel, based on interesting YouTube stories?

Haystack already had a mobile app, but entering the living room space seemed daunting. Although “Smart TVs” have been on the market for a while, they remain challenging for developers to work with. Many hardware OEMs have proprietary platforms, but Android TV is different. It’s an open ecosystem with great developer resources. Developers can reach millions of users with familiar Android APIs. If you have an existing Android app, it’s easy to bring it to the living room.

Two weeks was all it took for Haystack TV to bring their mobile app to Android TV. That includes building an immersive, cinematic UI (a task greatly simplified by the Android framework). Since launching on Android TV, Haystack TV’s viewership is growing at 40% per month. Previously, users were spending about 40 minutes watching content on mobile per week. Now that’s up to 80 minutes in the living room. Their longest engagements are through Chromecast and Android TV.

Hear from Daniel Barreto, CEO of Haystack TV, on developing for Android TV

Haystack TV’s success on Android TV is a great example of how the Android multi-form factor developer experience shines. Once you’ve learned the ropes of writing Android apps, developing for another form factor (Wear, Auto, TV) is simple.

Android TV helps you create cinematic UIs

Haystack TV’s UI is smooth and cinematic. How were they able to build a great one so quickly? Developing an immersive UI/UX with Android TV is surprisingly easy. The Leanback support library provides fragments for browsing content, showing a details screen, and search. You can use these to get transitions and animations almost for free. To learn more about building UIs for Android TV, watch the Using the Leanback Library DevByte and check out the code samples.

Browsing recommended stories

Your content, front and center

The recommendations row is a central feature of the Android TV home screen. It’s the first thing users see when they turn on their TVs. You can surface content to appear on the recommendations row by implementing the recommendation service. For example, your app can suggest videos your users will want to watch next (say, the next episode in a series, or a related news story). This is great for getting noticed and increasing engagements.

Haystack’s content on the recommendations row

Make your content searchable

How can users find their favorite movie or show from a library of thousands? On Android TV, they can search for it using their voice. This is much faster and more relaxing than typing on the screen with a remote control! In addition to providing in-app search, your app can surface content to appear on the global search results page. The framework takes care of speech recognition for you and delivers the result to your app as a plain text string.

Next Steps

Android TV makes it possible for small startups to create apps for the living room. There are extensive developer resources. For an overview, watch the Introduction to Android TV DevByte. For details, see the developer training docs. Watch this episode of Coffee with a Googler to learn more about the vision for the platform. To get started on your app, visit developer.android.com/tv.

Join the discussion on

+Android Developers
Categories: Programming

Re-Read Saturday: The Goal: A Process of Ongoing Improvement. Part 4


Eliyahu M. Goldratt and Jeff Cox wrote The Goal: A Process of Ongoing Improvement in 1984.  The Goal is a business novel. It uses the story of Alex Rogo, plant manager, to illustrate the Theory of Constraints and how the wrong measurement focus can harm an organization. The focus of the re-read to date has been on the ideas and concepts that affect Software Process and Measurement readers and listeners. This focus is not meant as a slight to the novel itself, but rather is intended to highlight the ideas that have shaped lean thinking. Earlier entries in this re-read are:

Part 1                       Part 2                       Part 3

In the next 3 chapters Alex and his team begin to explore how their newly emerging understanding of throughput, inventory and operational expense can be applied in their organization.

Chapter 10

In Chapter 8 Alex and Johan discussed three critical definitions:

Throughput – the rate at which the system generates money through sales.

Inventory – all of the money the system has invested in purchasing things it intends to sell.

Operational Expense – all the money the system spends in order to turn inventory into throughput.

In Chapter 10 Alex and team discuss whether all of the activities at the plant can be measured by the concepts of throughput, inventory and operational expense. The three definitions equate to money: throughput is money coming in, inventory is money in the system, and operational expense is money that leaves the system. The three measures generate consternation and debate. For example, the team spends time debating individual expenses and whether each expense is operational or included in inventory. In the end they conclude that everything can be measured by the three measures. The problem is that they can’t conceive of how the three metrics can be used to turn the plant around. Alex goes off to New York City to meet with Johan the next morning to get some more answers.
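
As a side note, the three measures combine into the plant's bottom-line numbers using standard Theory of Constraints arithmetic (not spelled out in this chapter) - a toy illustration with made-up values:

# Standard TOC relationships: net profit and ROI from the three measures
throughput = 500000           # money generated through sales this period
inventory = 350000            # money tied up in things intended for sale
operational_expense = 420000  # money spent turning inventory into throughput
 
net_profit = throughput - operational_expense
roi = net_profit / float(inventory)
print("net profit: %d" % net_profit)  # 80000
print("ROI: %.2f" % roi)              # 0.23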

Chapter 11

Alex and Johan discuss the problems in the plant over coffee in the hotel restaurant before Johan meets the people he is in NYC to visit. Johan guides Alex by asking questions and letting him find the answers for himself (many of us would recognize the techniques Johan uses to help Alex formulate answers as being very similar to those used in Agile coaching). Key discoveries include the realization that the inventory problem the plant is experiencing is a reflection of over-utilization of plant personnel on specific tasks rather than a focus on throughput. The measure of 100% capacity utilization leads to inefficiencies when the parts can’t be used. Conceptually this is the same as a software development process that segregates coders and testers, where one group produces more than the other can immediately absorb. For example, coders often keep generating more code even if the testers can’t immediately begin testing the work that is being handed off. The Agile/lean concept of self-organizing teams would attack the problem differently by shifting resources to avoid building excess inventory. The organization’s over-reliance on that single measure of efficiency in isolation has caused disastrous results.

Johan and Alex identify two phenomena that cause the problems that Alex and his management team are seeing. The first is the idea of dependent events, which is when one event must happen before another. For example, code must be written before the unit tests can be run. The second is statistical fluctuation; said differently, all processes have some degree of variability. The execution of events varies for many reasons. The book uses the example of a waiter and the variability in the time needed to bring the check and pay (Johan was pressed for time and had fit his meeting with Alex into his breakfast). If we consider coding and unit testing as a process, most of us would recognize that coding and testing any specific piece of work has a degree of variability. Variance could be caused because some problems are harder, or the programmer might get hungry and go to lunch before finishing a specific piece of work. When both concepts interact, process variability becomes much harder to “plan” out of existence.
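
To see how these two phenomena interact, here is a small simulation (my sketch, in the spirit of the book's dice game, not taken from it): a line of dependent workstations, each able to process a random one to six units per round. Even though each station averages 3.5 units on its own, the line's throughput comes out lower while inventory piles up between stations.

import random
 
def simulate_line(stations=5, rounds=10000):
    # Dependent events + statistical fluctuation: each station can pass along
    # at most what it rolls AND what upstream stations have delivered to it.
    buffers = [0] * stations  # WIP queued in front of stations 2..n
    completed = 0
    for _ in range(rounds):
        moved = 0
        for i in range(stations):
            capacity = random.randint(1, 6)  # this round's fluctuation
            if i == 0:
                moved = capacity             # the first station never starves
            else:
                buffers[i] += moved          # dependent on upstream output
                moved = min(capacity, buffers[i])
                buffers[i] -= moved
        completed += moved                   # output of the last station
    return completed / float(rounds)
 
print("throughput per round: %.2f (vs. 3.5 for a station alone)" % simulate_line())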

Johan leaves Alex with the question of how he can apply these concepts and the three measures to save the plant.

Chapter 12

This chapter could easily be written off as purely back-story. However, in this chapter Alex and his wife discuss how to establish a pace that respects both the needs of the business and of family life. In today’s IT environment we would recognize the discussion as one of sustainable pace. Sustainable pace typically reflects a rate of work that can be consistently performed. It is a concept commonly used when discussing Agile.

Chapters 10 – 12 mark a turning point in the book. Alex has embraced a more systems-oriented view of the plant and recognized that the measures used to date have been focused on optimizing parts of the process to the detriment of the overall goal of the plant. What has not fallen into place is how to take that new knowledge and change how the plant works.

Summary of The Goal so far:

Chapters 1 through 3 actively present the reader with a burning platform. The plant and division are failing. Alex Rogo has actively pursued increased efficiency and automation to generate cost reductions; however, performance is falling even further behind and fear has become a central feature of the corporate culture.

Chapters 4 through 6 shift the focus from steps in the process to the process as a whole. Chapters 4 – 6 move us down the path of identifying the ultimate goal of the organization (in this book, making money) and embracing the big picture of systems thinking. In this section, the authors point out that we are often caught up with pursuing interim goals, such as quality, efficiency or even employment, to the exclusion of the ultimate goal. We are reminded by the burning platform identified in the first few pages of the book - the impending closure of the plant and perhaps the division - that in the long run an organization must make progress towards its ultimate goal, or it won’t exist.

Chapters 7 through 9 show Alex’s commitment to change as he seeks more precise advice from Johan, brings his closest reports into the discussion and begins a dialog with his wife (remember this is a novel). In this section of the book the concept that “you get what you measure” is addressed. We see measures of efficiency being used at the level of part production, but not at the level of whole orders or even sales. We discover the corollary to the adage “you get what you measure”: if you measure the wrong thing, you get the wrong thing. We begin to see Alex’s urgency and commitment to make a change.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version or Kindle Version


Categories: Process Management

Pi Day

Herding Cats - Glen Alleman - Sat, 03/14/2015 - 14:30

March 14, 2015

3/14/15

at

9:26:53 AM

π=3.141592653

Categories: Project Management

Python: Checking any value in a list exists in a line of text

Mark Needham - Sat, 03/14/2015 - 03:52

I’ve been doing some log file analysis to see what cypher queries were being run on a Neo4j instance and I wanted to narrow down the lines I looked at to only contain ones which had mutating operations i.e. those containing the words MERGE, DELETE, SET or CREATE

Here’s an example of the text file I was parsing:

$ cat blog.txt
MATCH n RETURN n
MERGE (n:Person {name: "Mark"}) RETURN n
MATCH (n:Person {name: "Mark"}) ON MATCH SET n.counter = 1 RETURN n

So I only want lines 2 & 3 to be returned as the first one only returns data and doesn’t execute any updates on the graph.

I started off with a very crude way of doing this:

with open("blog.txt", "r") as ins:
    for line in ins:
        if "MERGE" in line or "DELETE" in line or "SET" in line or "CREATE" in line:
           print line.strip()

A better way of doing this is to use the any function and make sure at least one of the words exists in the line:

mutating_commands = ["SET", "DELETE", "MERGE", "CREATE"]
with open("blog.txt", "r") as ins:
    for line in ins:
        if any(command in line for command in mutating_commands):
           print line.strip()

I thought I might be able to simplify the code even further by using itertools but my best attempt so far is less legible than the above:

import itertools
 
mutating_commands = ["SET", "CREATE", "MERGE", "DELETE"]
with open("blog.txt", "r") as ins:
    for line in ins:
        if len(list(itertools.ifilter(lambda x: x in line, mutating_commands))) > 0:
            print line.strip()

I think I’ll go with the 2nd approach for now, but if I’m doing something wrong with itertools and it’s much easier to use than the example I’ve shown, do correct me!
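
Another option the post doesn't mention is a compiled regular expression, which reads reasonably well too - a sketch in the same Python 2 style as above:

import re
 
# Matches any of the mutating keywords anywhere in the line
mutating_pattern = re.compile(r"MERGE|DELETE|SET|CREATE")
with open("blog.txt", "r") as ins:
    for line in ins:
        if mutating_pattern.search(line):
           print line.strip()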

Categories: Programming

A new reference app for multi-device applications

Android Developers Blog - Fri, 03/13/2015 - 19:56
It is now possible to bring the benefits of your app to your users wherever they happen to be, no matter what device they have near them. Today we’re releasing a reference sample that shows how to implement such a service with an app that works across multiple Android form-factors. This sample, the Universal Music Player, is a bare-bones but functional reference app that supports multiple devices and form factors in a single codebase. It is compatible with Android Auto, Android Wear, and Google Cast devices. Give it a try and easily adapt your own app for wherever your users are, be that a phone, watch, TV, car, or more!

Playback controls and album art in the lock screen.
On the application toolbar, the Google Cast icon.
Controlling playback through Android Auto

Controlling playback on an Android Wear watch
This sample uses a number of new features in Android 5.0 Lollipop, like MediaStyle notifications, MediaSession and MediaBrowserService. They make it easy to implement media browsing and playback on multiple devices with a single version of your app.

Check out the source code and let your users enjoy your app from wherever they like.

Posted by Renato Mangini, Senior Developer Platform Engineer, Google Developer Platform Team

Join the discussion on

+Android Developers
Categories: Programming

Stuff The Internet Says On Scalability For March 13th, 2015

Hey, it's HighScalability time:


1957: 13 men delivering a computer. 2017: a person may wear 13 computing devices (via Vala Afshar)
  • 5.3 million/second: LinkedIn metrics collected; 1.7 billion: Tinder ratings per day
  • Quotable Quotes:
    • @jankoum: WhatsApp crossed 1B Android downloads. btw our android team is four people + Brian. very small team, very big impact.
    • @daverog: Unlike milk, software gets more expensive, per unit, in larger quantities (diseconomies of scale) @KevlinHenney #qconlondon
    • @DevOpsGuys: Really? Wow! RT @DevOpsGuys: Vast majority of #Google software is in a single repository. 60million builds per year. #qconlondon
    • Ikea: We are world champions in making mistakes, but we’re really good at correcting them.
    • Chris Lalonde: I know a dozen startups that failed from their own success, these problems are only going to get bigger.
    • Chip Overclock: Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)
    • @rbranson: OH: "call me back when the number of errors per second is a top 100 web site."
    • @wattersjames: RT @datacenter: Worldwide #server sales reportedly reach $50.9 billion in 2014
    • @johnrobb: Costs of surveillance.  Bots are cutting the costs by a couple of orders of magnitude more.

  • Is this a sordid story of intrigue? How GitHub Conquered Google, Microsoft, and Everyone Else. Not really. The plot arises from the confluence of the creation of Git, the evolution of Open Source, small network effects, and a tide rising so slowly we may have missed it. Brian Doll (a GitHub VP) is letting us know the water is about nose height: Of GitHub’s ranking in the top 100, Doll says, “What that tells me is that software is becoming as important as the written word.”

  • This is audacious. Peter Lawrey Describes Petabyte JVMs. For when you really really want to avoid a network hit and access RAM locally. The approach is beautifully twisted: It’s running on a machine with 6TB and 6 NUMA regions. Since, as previously noted, we want to try and restrict the heap or at least the JVM to a single NUMA region, you end up with 5 JVMs with a heap of up to 64GB each, and memory mapped caches for both the indexes and the raw data, plus a 6th NUMA region reserved just for the operating system and monitoring tasks.

There's a parallel between networks and microprocessors in that network latency is not getting any lower and microprocessor clock speeds are not getting any faster, yet we are getting more and more bandwidth and more and more MIPS per box. Programming for this reality will be a new skill. Observed while listening to How Did We End Up Here?

  • Advances in medicine will happen because of big data. We need larger sample sizes to figure things out. Sharing Longevity Data.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Estimating Probabilistic Outcomes? Of Course We Can!

Herding Cats - Glen Alleman - Fri, 03/13/2015 - 16:30

I've been at a client site in Sacramento, CA off and on since last December. Driving back to the hotel tonight, I heard a radio show on the current Uniform California Earthquake Rupture Forecast (Version 3).


A former Colorado neighbor is an earthquake researcher, and we've had discussions about probabilistic estimation of complex systems and the complexity sciences.

The notion that we can't predict - to some level of confidence - outcomes in the future is of course simply not correct. Earthquake prediction is not technically possible in the populist sense. It's a complex probabilistic process. 

Making forecasts - estimates of future outcomes - for software development projects is much less complex. The processes used to make these estimates range from past performance time series to multi-dimensional parametric models. Several tools are available for these parametric models. Steve McConnell provided an original one I used a decade or so ago. Steve provides some background on making estimates where he speaks of the 10 deadly sins:

  • Confusing targets with estimates - the bug-a-boo of all #NoEstimates advocates. It's simple - DON'T DO THIS.
  • Saying yes when you mean no - no quantitative data and guessing means bad estimates - DON'T DO THIS
  • Committing Too Early - use the cone of uncertainty
  • Assuming underestimating has no impact on the project - DON'T DO THIS
  • Estimating in the Impossible Zone - an optimistic estimate has a non-zero probability of coming true
  • Overestimating from use of new tools - DON'T DO THIS
  • Using only one estimating technique - DON'T DO THIS
  • Not using estimating software - DON'T DO THIS
  • Not including risk factors - the primary sin of the simple small samples of stories or story points used to linearly forecast future performance. DON'T DO THIS.
  • Providing off the cuff estimates - this is called guessing. DON'T DO THIS.

When you need to estimate - as you do in any non-trivial project - make sure you're not committing any of the 10 sins Steve mentions.

So Why The Earthquakes and SW Estimates?

One process is very complex and an emerging science. The other is a well-developed mathematical process.

There is so much misinformation about estimating software development, it's hard to know where to start. From outright wrong math, to misuse of mathematical concepts, to failure to acknowledge that estimates aren't for developers, they're for those paying the developers.

Categories: Project Management

Will You Offer This for Free?

NOOP.NL - Jurgen Appelo - Fri, 03/13/2015 - 12:54

I have done many things for free.

All chapters of my new #Workout book have been available for free since I started writing them, because this helped me get early and fast feedback. I have done countless free talks for local communities, because it allowed me to learn about their needs and stories. And I have offered plenty of participants free seats in my workshops when they offered me some other value in return.

What I don’t do is give away stuff for free just because people ask.

The post Will You Offer This for Free? appeared first on NOOP.NL.

Categories: Project Management

Empirical Data Used to Estimate Future Performance Re-Deux

Herding Cats - Glen Alleman - Fri, 03/13/2015 - 05:05

Came across this puzzling tweet today

... real empirical data & using probability to forecast are worlds apart. I don't buy "estimation uses data" argument.

This always reminds me of Wolfgang Pauli's remark to a colleague who showed him a paper from an author who wanted Pauli to comment on it...

Das ist nicht nur nicht richtig, es ist nicht einmal falsch!
It is not only not right, it is not even wrong

So let's Look At a Simple Forecasting Process

First, forecasting is about the future. Estimates are about the past, present, and future. So estimates of future cost, schedule, and technical performance can be called forecasts.

A project's past performance data is a time series, gathered from things that happened in the past at intervals of time. These intervals should be evenly spaced. They don't have to be, but that makes the analysis more complex. The example below is done with R, a statistical programming language found at www.r-project.org. R is used in a wide variety of domains. I was introduced to R through our son's work in cellular biology when he pointed out I'd get a D in the BioStats class he taught. Stop making linear projections, unadjusted for the variances of the past, and most of all unadjusted for variances created from uncertainty in the future. Come on Dad, get with the program of making risk informed decisions. Here's a good reference on how to do this at a much broader scale: the Risk Informed Decision Making Handbook.

Below is an R plot from historical data of a project cost parameter, forecasting the possible values of this parameter into the future using ARIMA (Autoregressive Integrated Moving Average). ARIMA is built into R - which can be downloaded for free. R and its statistical analysis capabilities are used in our domain to develop estimates. Using past performance - in the example below, a cost index - we can forecast the range, and the confidence bounds on that range, for cost index values.

(Chart: ARIMA forecast of cost-index values, with confidence bounds, from past performance data)

The chart above is from the paper below.

Earned Value Management Meets Big Data from Glen Alleman

So when you hear about empirical data being used for forecasting the future, or you hear the statement in the opening of this post, ask several questions:
  • Do you have any experience forecasting future outcomes from past performance that is mathematically credible?
  • Did you adjust your forecast for past variances?
  • Did you adjust your forecast for future uncertainty?

No? Then it's unlikely your number will have any chance of being correct.
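
For readers who would rather stay in Python than use R, statsmodels offers a comparable ARIMA fit. A rough sketch, assuming a hypothetical series of historical cost-index values (the API shown is the pre-1.0 statsmodels arima_model module):

import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # pre-1.0 statsmodels API
 
# Hypothetical monthly cost-index history - illustrative values only
history = pd.Series([1.02, 0.98, 1.05, 1.01, 0.97, 1.04, 1.00, 0.99, 1.03, 1.01])
 
model = ARIMA(history, order=(1, 0, 0)).fit()
forecast, stderr, conf_int = model.forecast(steps=3)  # point forecast plus 95% bounds
print(forecast)  # forecast cost-index values
print(conf_int)  # confidence bounds on that range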

We Have No Empirical Data, Now What?

Here's a continuation of the Tweet stream

Have you ever been asked to estimate something and haven't got any empirical data. This happens all the time. New teams are put together in new domains and asked to estimate, which really means commit. I don't see too many managers gathering real data about their projects and using them to forecast lead times.

Here's the way to solve this non-problem, problem.

This is a short list, just from my office book shelf. The office library has dozens of other books and the files have many dozens of recent papers on estimating software in the absence of empirical data. Google will find you 100's more.

So here's the final outcome. Whenever we hear about the reason we can't estimate, it's simply not true, never was true, never will be true. 

Related articles:
  • Software Engineering is a Verb
  • Failure is not an Option
  • Managing in the Presence of Uncertainty
Categories: Project Management

Accuracy and Precision: Brothers From a Different Species


In a recent discussion of the concepts of Cost of Delay (CoD) and Weighted Shortest Job First (WSJF), the use of tables and mathematical formulae led to a discussion of accuracy and precision. The use of math and formulae suggests high levels of accuracy and precision even though the CoD and project sizes used were relative estimates. One of the questions posed in the class was whether the application of WSJF, or for that matter ANY prioritization or estimation technique, is accurate enough to use in making decisions when the input is imprecise estimated data. Techniques like CoD and WSJF, when used to prioritize work, are generally based on estimates rather than ultra-precise measures. Remember, the goal of techniques like WSJF is to generate accurate priorities so that the most important work gets into the pipeline. Prioritization based on frameworks like WSJF requires an understanding of the concepts of accuracy and precision and what is required in terms of accuracy or precision when prioritizing work. The two concepts are often conflated; however, accuracy and precision are different but interrelated.
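
To make the mechanics concrete, here is a toy WSJF calculation (my sketch, not from the class discussion): the score is the relative cost of delay divided by the relative job size, and work is ordered from highest score down.

# Toy WSJF: score = relative cost of delay / relative job size
jobs = [
    {"name": "Feature A", "cost_of_delay": 8,  "size": 5},
    {"name": "Feature B", "cost_of_delay": 13, "size": 3},
    {"name": "Feature C", "cost_of_delay": 5,  "size": 8},
]
for job in jobs:
    job["wsjf"] = job["cost_of_delay"] / float(job["size"])
 
# Highest WSJF first: B (4.33), then A (1.60), then C (0.62)
for job in sorted(jobs, key=lambda j: j["wsjf"], reverse=True):
    print("%s: %.2f" % (job["name"], job["wsjf"]))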

Accuracy is defined as how close a measured value is to the true value (www.mathisfun.com) or as freedom from mistake or error (www.merriam-webster.com/dictionary/accuracy). In some instances, accuracy is easy to judge. For example, Pi to 10 decimal points is 3.1415926535 (if you are a geek, Pi to a million), whereas Pi calculated to two decimal points is 3.14. Both are accurate. Judging accuracy gets messier when the outcome is unknown, for instance when generating an estimate for project prioritization. In order to judge accuracy we generally apply a decision-making rule or standard, typically expressed as a confidence interval or a degree of risk. Another popular alternative is comparison to a relative measure, such as an analogy or story points. In order for anything to be perceived as accurate it has to be within the standards set, so that the organization can have confidence as they prioritize work.

Precision is a reflection of exactness. In the example of Pi, 3.14 is accurate but not precise. Calculating Pi to a million decimal points increases the level of precision but doesn’t make the calculation precise: Pi is an irrational number, so we will never be able to calculate it precisely. In software development, most activities required to deliver business value have some level of variability in performance. Variability makes precise prediction difficult or impossible. Estimates of software development are imprecise by nature. Imprecision, driven by the variability in the processes and by the inability to know all the factors that will be encountered, makes precision difficult at best. Precision is typically generated by padding the estimates, cutting or adding scope so that the outcome matches the estimate, or in some cases fibbing about the results.

Given the effort and consternation often taken to generate false precision, a better avenue is to define an acceptable level of precision so that accuracy is an artifact of doing the work rather than an artifact of controlling how the work is measured. For example, estimating that a piece of work will be done at 2:33.06 PM EST takes far more effort than estimating it will be completed this afternoon.  Both estimates could be equally accurate, although the latter has more chance at accuracy. Techniques like CoD and WSJF, when used to prioritize work, are generally based on estimates rather than ultra-precise measures. The goal is to generate accurate priorities so that the most important work gets into the pipeline. Like many things in Agile and other iterative frameworks, just because a piece of work is not at the top of the list today does not mean its priority can’t increase when the list is reassessed. Organizations have to decide what level of precision is really needed and whether they are willing to trade precision for accuracy.

This weekend I am going to run the 35th Annual St. Malachi Church Run in a time of roughly fifty minutes. The race is five miles long and the forecast is for a very cold rain. In this case, my estimate is probably accurate (time will tell), but not precise. Last year, I actually ran the race in 49:06.36. The measured results are very precise and are accurate within the tolerance of the measurement system used (chips in the race bib). I feel faster this year, but I am a bit heavier and a bit older, which increases the possible variability in my speed. In terms of setting expectations, I think I would rather be accurate than precise so that my support team can meet me with the umbrella without having to stand in the rain too long themselves.

PS March 14th is Pi Day – have an irrationally good day!


Categories: Process Management

Paper: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores

How would your OLTP database perform if it had to scale up to 1024 cores? Not very well, according to this fascinating paper: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores, where a few intrepid chaos monkeys report the results of their fiendish experiment. The conclusion: we need a completely redesigned DBMS architecture that is rebuilt from the ground up.

Summary:

Categories: Architecture

Are Ads On Your Blog Generally Frowned Upon?

Making the Complex Simple - John Sonmez - Thu, 03/12/2015 - 16:00

In this video I respond to an email asking about whether or not having ads on your blog is good or bad. I explain that unless you are making significant conversions, you’d be better off without the ads. A good way to judge this is if you have 500 daily visitors, then you might […]

The post Are Ads On Your Blog Generally Frowned Upon? appeared first on Simple Programmer.

Categories: Programming

Video - Agility and the essence of software architecture

Coding the Architecture - Simon Brown - Wed, 03/11/2015 - 22:59

This is just a quick note to say that the video of my "Agility and the essence of software architecture" talk from YOW! 2014 in Brisbane is now available to watch online. This talk covers the subject of software architecture and agile from a number of perspectives, focusing on how to create agile software systems in an agile way.

Agility and the essence of software architecture

The slides are also available to view online/download. A huge thanks to everybody who attended for making it such a fun session. :-)

Categories: Architecture

Python/Neo4j: Finding interesting computer sciency people to follow on Twitter

Mark Needham - Wed, 03/11/2015 - 22:13

At the beginning of this year I moved from Neo4j’s field team to the dev team, and since the code we write there is much lower level than I’m used to, I thought I should find some people to follow on twitter whom I can learn from.

My technique for finding some of those people was to pick a person from the Neo4j kernel team who’s very good at systems programming and uses twitter, which led me to Mr Chris Vest.

I thought that the people Chris interacts with on twitter are likely to be interested in this type of programming, so I manually traversed out to those people, had a look at who they interacted with, and put them all into a list.

This approach has worked well and I’ve picked up lots of reading material from following these people. It does feel like something that could be automated though, and we’ll use Python and Neo4j as our tool kit.

We need to find a library to connect to the Twitter API. There are a few to choose from but tweepy seems simple enough so I started using that.

The first thing we need to do is fill in our Twitter API auth details. Since the application is just for me I’m not going to bother with setting up OAuth – instead I’ll just create an application on apps.twitter.com and grab the appropriate tokens.

Once we’ve done that we can navigate to that app and get a consumer key, consumer secret, access token and access token secret. They all reside on the ‘Keys and Access Tokens’ tab.

Now that we’ve got ourself all auth’d up let’s write some code that starts from Chris and goes out and finds his latest tweets and the people he’s interacted with in them. We’ll write the appropriate information out to a CSV file so that we can import it into Neo4j later on:

import tweepy
import csv
from collections import Counter, deque
 
# consumer_key, consumer_secret, access_token and access_token_secret are
# the values copied from the app's 'Keys and Access Tokens' tab
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
 
api = tweepy.API(auth, wait_on_rate_limit = True, wait_on_rate_limit_notify = True)
 
counter = Counter()
users_to_process = deque()
USERS_TO_PROCESS = 50
 
def extract_tweet(tweet):
    user_mentions = ",".join([user["screen_name"].encode("utf-8")
                             for user in tweet.entities["user_mentions"]])
    urls = ",".join([url["expanded_url"]
                     for url in tweet.entities["urls"]])
    return [tweet.user.screen_name.encode("utf-8"),
            tweet.id,
            tweet.text.encode("utf-8"),
            user_mentions,
            urls]
 
starting_user = "chvest"
with open("tweets.csv", "a") as tweets:
    writer = csv.writer(tweets, delimiter=",", escapechar="\\", doublequote = False)
    for tweet in tweepy.Cursor(api.user_timeline, id=starting_user).items(50):
        writer.writerow(extract_tweet(tweet))
        tweets.flush()
        for user in tweet.entities["user_mentions"]:
            if not len(users_to_process) > USERS_TO_PROCESS:
                users_to_process.append(user["screen_name"])
                counter[user["screen_name"]] += 1
            else:
                break

As well as writing out Chris’ tweets, I’m also capturing the other users he’s interacted with and putting them in a queue that we’ll drain later on. We’re limiting the number of other users that we’ll process to 50 for now but it’s easy to change.

If we print out the first few lines of ‘tweets.csv’ this is what we’d see:

$ head -n 5 tweets.csv
userName,tweetId,contents,usersMentioned,urls
chvest,575427045167071233,@shipilev http://t.co/WxqFIsfiSF,shipilev,
chvest,575403105174552576,@AlTobey I often use http://t.co/G7Cdn9Udst for small one-off graph diagrams.,AlTobey,http://www.apcjones.com/arrows/
chvest,575337346687766528,RT @theburningmonk: this is why you need composition over inheritance... :s #CompositionOverInheritance http://t.co/aKRwUaZ0qo,theburningmonk,
chvest,575269402083459072,@chvest except…? “Each library implementation should therefore be identical with respect to the public API”,chvest,

We’re capturing the user, tweetId, the tweet itself, any users mentioned in the tweet and any URLs shared in the tweet.

Next we want to get some of the tweets of the people Chris has interacted with:

# Grab the code from here too - https://gist.github.com/mneedham/3188c44b2cceb88c6de0
 
import tweepy
import csv
from collections import Counter, deque
 
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
 
api = tweepy.API(auth, wait_on_rate_limit = True, wait_on_rate_limit_notify = True)
 
counter = Counter()
users_to_process = deque()
USERS_TO_PROCESS = 50
 
def extract_tweet(tweet):
    user_mentions = ",".join([user["screen_name"].encode("utf-8")
                             for user in tweet.entities["user_mentions"]])
    urls = ",".join([url["expanded_url"]
                     for url in tweet.entities["urls"]])
    return [tweet.user.screen_name.encode("utf-8"),
            tweet.id,
            tweet.text.encode("utf-8"),
            user_mentions,
            urls]
 
starting_user = "chvest"
with open("tweets.csv", "a") as tweets:
    writer = csv.writer(tweets, delimiter=",", escapechar="\\", doublequote = False)
    for tweet in tweepy.Cursor(api.user_timeline, id=starting_user).items(50):
        writer.writerow(extract_tweet(tweet))
        tweets.flush()
        for user in tweet.entities["user_mentions"]:
            if not len(users_to_process) > USERS_TO_PROCESS:
                users_to_process.append(user["screen_name"])
                counter[user["screen_name"]] += 1
            else:
                break
    users_processed = set([starting_user])
    while True:
        if len(users_processed) >= USERS_TO_PROCESS:
            break
        else:
            if len(users_to_process) > 0:
                next_user = users_to_process.popleft()
                print next_user
                if next_user in users_processed:
                    print "-- user already processed"
                else:
                    print "-- processing user"
                    users_processed.add(next_user)
                    for tweet in tweepy.Cursor(api.user_timeline, id=next_user).items(10):
                        writer.writerow(extract_tweet(tweet))
                        tweets.flush()
                        for user_mentioned in tweet.entities["user_mentions"]:
                            if not len(users_processed) > 50:
                                users_to_process.append(user_mentioned["screen_name"])
                                counter[user_mentioned["screen_name"]] += 1
                            else:
                                break
            else:
                break

Finally let’s take a quick look at the users who show up most frequently:

>>> for user_name, count in counter.most_common(20):
        print user_name, count
 
neo4j 13
devnexus 12
AlTobey 11
bitprophet 11
hazelcast 10
chvest 9
shipilev 9
AntoineGrondin 8
gvsmirnov 8
GatlingTool 8
lagergren 7
tomsontom 6
dorkitude 5
noctarius2k 5
DanHeidinga 5
chris_mahan 5
coda 4
mccv 4
gAmUssA 4
jmhodges 4

A few of the people on that list are in my list, which is a good start. We can explore the data set better once it’s in Neo4j though, so let’s write some Cypher import statements to create our own mini Twitter graph:

// add people
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
MERGE (p:Person {userName: row.userName});
 
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
WITH SPLIT(row.usersMentioned, ",") AS users
UNWIND users AS user
MERGE (p:Person {userName: user});
 
// add tweets
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
MERGE (t:Tweet {id: row.tweetId})
ON CREATE SET t.contents = row.contents;
 
// add urls
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
WITH SPLIT(row.urls, ",") AS urls
UNWIND urls AS url
MERGE (:URL {value: url});
 
// add links
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
MATCH (p:Person {userName: row.userName})
MATCH (t:Tweet {id: row.tweetId})
MERGE (p)-[:TWEETED]->(t);
 
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
WITH SPLIT(row.usersMentioned, ",") AS users, row
UNWIND users AS user
MATCH (p:Person {userName: user})
MATCH (t:Tweet {id: row.tweetId})
MERGE (p)-[:MENTIONED_IN]->(t);
 
LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-twitter/tweets.csv" AS row
WITH SPLIT(row.urls, ",") AS urls, row
UNWIND urls AS url
MATCH (u:URL {value: url})
MATCH (t:Tweet {id: row.tweetId})
MERGE (t)-[:CONTAINS_LINK]->(u);

We can put all those commands in a file and execute them using neo4j-shell:

$ ./neo4j-community-2.2.0-RC01/bin/neo4j-shell --file import.cql

Now let’s write some queries against the graph:

// Find the tweets where Chris mentioned himself
MATCH path = (n:Person {userName: "chvest"})-[:TWEETED]->()<-[:MENTIONED_IN]-(n)
RETURN path

Graph  5

// Find the most popular links shared in the network
MATCH (u:URL)<-[:CONTAINS_LINK]-()
RETURN u.value, COUNT(*) AS times
ORDER BY times DESC
LIMIT 10
 
+-------------------------------------------------------------------------------------------------+
| u.value                                                                                 | times |
+-------------------------------------------------------------------------------------------------+
| "http://www.polyglots.dk/"                                                              | 4     |
| "http://www.java-forum-nord.de/"                                                        | 4     |
| "http://hirt.se/blog/?p=646"                                                            | 3     |
| "http://wp.me/p26jdv-Ja"                                                                | 3     |
| "https://instagram.com/p/0D4I_hH77t/"                                                   | 3     |
| "https://blogs.oracle.com/java/entry/new_java_champion_tom_chindl"                      | 3     |
| "http://www.kennybastani.com/2015/03/spark-neo4j-tutorial-docker.html"                  | 2     |
| "https://firstlook.org/theintercept/2015/03/10/ispy-cia-campaign-steal-apples-secrets/" | 2     |
| "http://buff.ly/1GzZXlo"                                                                | 2     |
| "http://buff.ly/1BrgtQd"                                                                | 2     |
+-------------------------------------------------------------------------------------------------+
10 rows

The first link is for a programming language meetup in Copenhagen, the second for a Java conference in Hanover, and the third an announcement about the latest version of Java Mission Control. So far so good!

A next step in this area would be to run the links through Prismatic’s interest graph so we can model topics in our graph as well. For now let’s have a look at the interactions between Chris and others in the graph:

// Find the people who Chris interacts with most often
MATCH path = (n:Person {userName: "chvest"})-[:TWEETED]->()<-[:MENTIONED_IN]-(other)
RETURN other.userName, COUNT(*) AS times
ORDER BY times DESC
LIMIT 5
 
+------------------------+
| other.userName | times |
+------------------------+
| "gvsmirnov"    | 7     |
| "shipilev"     | 5     |
| "nitsanw"      | 4     |
| "DanHeidinga"  | 3     |
| "AlTobey"      | 3     |
+------------------------+
5 rows

Let’s generalise that to find interactions between any pair of people:

// Find the people who interact most often
MATCH (n:Person)-[:TWEETED]->()<-[:MENTIONED_IN]-(other)
WHERE n <> other
RETURN n.userName, other.userName, COUNT(*) AS times
ORDER BY times DESC
LIMIT 5
 
+------------------------------------------+
| n.userName    | other.userName   | times |
+------------------------------------------+
| "fbogsany"    | "AntoineGrondin" | 8     |
| "chvest"      | "gvsmirnov"      | 7     |
| "chris_mahan" | "bitprophet"     | 6     |
| "maxdemarzi"  | "neo4j"          | 6     |
| "chvest"      | "shipilev"       | 5     |
+------------------------------------------+
5 rows

Let’s combine a couple of these together to come up with a score for each person:

MATCH (n:Person)
 
// number of mentions
OPTIONAL MATCH (n)-[mention:MENTIONED_IN]->()
WITH n, COUNT(mention) AS mentions
 
// number of links shared by someone else
OPTIONAL MATCH (n)-[:TWEETED]->()-[:CONTAINS_LINK]->(link)<-[:CONTAINS_LINK]-()
 
WITH n, mentions, COUNT(link) AS links
RETURN n.userName, mentions + links AS score, mentions, links
ORDER BY score DESC
LIMIT 10
 
+------------------------------------------+
| n.userName    | score | mentions | links |
+------------------------------------------+
| "chvest"      | 17    | 10       | 7     |
| "hazelcast"   | 16    | 10       | 6     |
| "neo4j"       | 15    | 13       | 2     |
| "noctarius2k" | 14    | 4        | 10    |
| "devnexus"    | 12    | 12       | 0     |
| "polyglotsdk" | 11    | 2        | 9     |
| "shipilev"    | 11    | 10       | 1     |
| "AlTobey"     | 11    | 10       | 1     |
| "bitprophet"  | 10    | 9        | 1     |
| "GatlingTool" | 10    | 8        | 2     |
+------------------------------------------+
10 rows

Amusingly, Chris is top of his own network, but we also see three accounts which aren’t people but rather products – neo4j, hazelcast and GatlingTool. The rest are legit though.

That’s as far as I’ve got, but to make this more useful I think we need to introduce follower/friend links as well as import more data.

In the meantime I’ve got a bunch of links to go and read!

Categories: Programming