
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

The Customer Bought a Capability

Herding Cats - Glen Alleman - 1 hour 19 min ago

Customers buy capabilities to accomplish a business goal or successfully complete a mission. Deliverables are the foundation of the ability to provide this capability. Here's how to manage a project focused on Deliverables. 

Deliverables Based Planning from Glen Alleman

These capabilities and the deliverables that enable them do need to show up at the time they are needed, for the cost necessary to support the business case. They're not the Minimal Viable Capabilities, they're the Needed Capabilities. Anything less will not meet the needs of the business.
Categories: Project Management

Should Team Members Sign Up for Tasks During Sprint Planning?

Mike Cohn's Blog - 1 hour 53 min ago

During sprint planning, a team selects a set of product backlog items they will work on during the coming sprint. As part of doing this, most teams will also identify a list of the tasks to be performed to complete those product backlog items.

Many teams will also provide rough estimates of the effort involved in each task. Collectively, these artifacts are the sprint backlog and could be presented along the following lines:

One issue for teams to address is whether individuals should sign up for tasks during sprint planning.

If a team walks out of sprint planning with a name next to every task, individual accountability will definitely be increased. I will feel more responsibility to finish the tasks with my name or initials next to them. And you will feel the same for those with yours. But, this will come at the expense of team accountability.

My recommendation is that a team should leave sprint planning without having put names on tasks. Following a real-time sign-up strategy will allow more flexibility during the sprint.

SE-Radio Episode 225 Brendan Gregg on Systems Performance

Senior performance architect and author of *Systems Performance* Brendan Gregg talks with Robert Blumen about systems performance: how the hardware and OS layers affect application behavior. The discussion covers the scope of systems performance, systems performance in the software life cycle, the role of performance analysis in architecture, methodologies for solving performance problems, dynamic tracing […]
Categories: Programming

R: dplyr – Error in (list: invalid subscript type 'double'

Mark Needham - Mon, 04/27/2015 - 23:34

In my continued playing around with R I wanted to find the minimum value for a specified percentile given a data frame representing a cumulative distribution function (CDF).

e.g. imagine we have the following CDF represented in a data frame:

library(dplyr)
df = data.frame(score = c(5,7,8,10,12,20), percentile = c(0.05,0.1,0.15,0.20,0.25,0.5))

and we want to find the minimum value for the 0.05 percentile. We can use the filter function to do so:

> (df %>% filter(percentile > 0.05) %>% slice(1))$score
[1] 7

Things become more tricky if we want to return multiple percentiles in one go.

My first thought was to create a data frame with one row for each target percentile and then pull in the appropriate row from our original data frame:

targetPercentiles = c(0.05, 0.2)
percentilesDf = data.frame(targetPercentile = targetPercentiles)
> percentilesDf %>% 
    group_by(targetPercentile) %>%
    mutate(x = (df %>% filter(percentile > targetPercentile) %>% slice(1))$score)
 
Error in (list(score = c(5, 7, 8, 10, 12, 20), percentile = c(0.05, 0.1,  : 
  invalid subscript type 'double'

Unfortunately this didn’t quite work as I expected – Antonios pointed out that this is probably because we’re mixing up two pipelines and dplyr can’t figure out what we want to do.

Instead he suggested the following variant which uses the do function:

df = data.frame(score = c(5,7,8,10,12,20), percentile = c(0.05,0.1,0.15,0.20,0.25,0.5))
targetPercentiles = c(0.05, 0.2)
 
> data.frame(targetPercentile = targetPercentiles) %>%
    group_by(targetPercentile) %>%
    do(df) %>% 
    filter(percentile > targetPercentile) %>% 
    slice(1) %>%
    select(targetPercentile, score)
Source: local data frame [2 x 2]
Groups: targetPercentile
 
  targetPercentile score
1             0.05     7
2             0.20    12

We can then wrap this up in a function:

percentiles = function(df, targetPercentiles) {
  # make sure the percentiles are in order
  df = df %>% arrange(percentile)
 
  data.frame(targetPercentile = targetPercentiles) %>%
    group_by(targetPercentile) %>%
    do(df) %>% 
    filter(percentile > targetPercentile) %>% 
    slice(1) %>%
    select(targetPercentile, score)
}

which we call like this:

df = data.frame(score = c(5,7,8,10,12,20), percentile = c(0.05,0.1,0.15,0.20,0.25,0.5))
> percentiles(df, c(0.08, 0.10, 0.50, 0.80))
Source: local data frame [2 x 2]
Groups: targetPercentile
 
  targetPercentile score
1             0.08     7
2             0.10     8

Note that we don’t actually get any rows back for 0.50 or 0.80 since we don’t have any entries greater than those percentiles. With a complete CDF we would, though, so the function does its job.
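For instance, with a hypothetical complete CDF (the values below are invented for illustration), the same function returns a row for every target:

library(dplyr)
 
fullCdf = data.frame(score      = c(5, 7, 8, 10, 12, 20, 25, 30),
                     percentile = c(0.05, 0.1, 0.15, 0.2, 0.25, 0.5, 0.8, 1.0))
 
percentiles(fullCdf, c(0.08, 0.10, 0.50, 0.80))
# returns one row per target percentile: scores 7, 8, 25 and 30 respectively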

Categories: Programming

What Mongo-ish API would mobile developers want?

Eric.Weblog() - Eric Sink - Mon, 04/27/2015 - 19:00

A couple weeks ago I blogged about mobile sync for MongoDB.

Updated Status of Elmo

Embeddable Lite Mongo continues to move forward nicely:

  • Progress on indexes:

    • Compound and multikey indexes are supported.
    • Sparse indexes are not done yet.
    • Index key encoding is different from the KeyString stuff that Mongo itself does. For encoding numerics, I did an ugly-but-workable F# port of the encoding used by SQLite4.
    • Hint is supported, but is poorly tested so far.
    • Explain is supported, partially, and only for version 3 of the wire protocol. More work to do there.
    • The query planner (which has delusions of grandeur for even referring to itself by that term) isn't very smart.
    • Indexes cannot yet be used for sorting.
    • Indexes are currently never used to cover a query.
    • When grabbing index bounds from the query, $elemMatch is ignored. Because of this, and because of the way Mongo multikey indexes work, most index scans are bounded at only one end.
    • The $min and $max query modifiers are supported.
    • The query planner doesn't know how to deal with $or at all.
  • Progress on full-text search:

    • This feature is working for some very basic cases.
    • Phrase search is not implemented yet.
    • Language is currently ignored.
    • The matcher step for $text is not implemented yet at all. Everything within the index bounds will get returned.
    • The tokenizer is nothing more than string.split. No stemming. No stop words.
    • Negations are not implemented yet.
    • Weights are stored in the index entries, but textScore is not calculated yet.

I also refactored to get better separation between the CRUD logic and the storage of bson blobs and indexes (making it easier to plug in different storage layers).

Questions about client-side APIs

So, let's assume you are building a mobile app which communicates with your Mongo server in the cloud using a "replicate and sync" approach. In other words, your app is not doing its CRUD operations by making networking/REST calls back to the server. Instead, your app is working directly with a partial clone of the Mongo database that is right there on the mobile device. (And periodically, that partial clone is magically synchronized with the main database on the server.)

What should the API for that "embedded lite mongo" look like?

Obviously, for each development environment, the form of the API should be designed to feel natural or native in that environment. This is the approach taken by Mongo's own client drivers. In fact, as far as I can tell, these drivers don't even share much (or any?) code. For example, the drivers for C# and Java and Ruby are all different, and (unless I'm mistaken) none of them are mere wrappers around something lower level like the C driver. Each one is built and maintained to provide the most pleasant experience to developers in a specific ecosystem.

My knee-jerk reaction here is to say that mobile developers might want the exact same API as presented by their nearest driver. For example, if I am building a mobile app in C# (using the Xamarin tools), there is a good chance my previous Mongo experience is also in C#, so I am familiar with the C# driver, so that's the API I want.

Intuitive as this sounds, it may not be true. Continuing with the C# example, that driver is quite large. Is its size appropriate for use on a mobile device? Is it even compatible with iOS, which requires AOT compilation? (FWIW, I tried compiling this driver as a PCL (Portable Class Library), and it didn't Just Work.)

For Android, the same kinds of questions would need to be asked about the Mongo Java driver.

And then there are Objective-C and Swift (the primary developer platform for iOS), for which there is no official Mongo driver. But there are a couple of them listed on the Community Supported Drivers page: http://docs.mongodb.org/ecosystem/drivers/community-supported-drivers/.

And we need to consider Phonegap/Cordova as well. Is the Node.js driver a starting point?

And in all of these cases, if we assume that the mobile API should be the same as the driver's API, how should that be achieved? Fork the driver code and rip out all the networking and replace it with calls to the embedded library?

Or should each mobile platform get a newly-designed API which is specifically for mobile use cases?

Believe it or not, some days I wonder: Suppose I got Elmo running as a server on an Android device, listening on localhost port 27017. Could an Android app talk to it with the Mongo Java driver unchanged? Even if this would work, it would be more like a proof-of-concept than a production solution. Still, when looking for solutions to a problem, the mind goes places...

So anyway, I've got more questions than answers here, and I would welcome thoughts or opinions.

  • Feel free to post an issue on GitHub: https://github.com/zumero/Elmo/issues

  • Or email me: eric@zumero.com

  • Or Tweet: @eric_sink

  • Or find me at MongoDB World in NYC at the beginning of June.

 

How can we Build Better Complex Systems? Containers, Microservices, and Continuous Delivery.

We must be able to create better complex software systems. That’s the message from Mary Poppendieck in a wonderful, far-ranging talk she gave at the Craft Conference: New New Software Development Game: Containers, Micro Services.

The driving insight is that complexity grows nonlinearly with size. The type of system doesn’t really matter, but we know software size will continue to grow, so software complexity will continue to grow even faster.

What can we do about it? The running themes are lowering friction and limiting risk:

  • Lower friction. This allows change to happen faster. Methods: dump the centralizing database; adopt microservices; use containers; better organize teams.

  • Limit risk. Risk is inherent in complex systems. Methods: PACT testing; continuous delivery.

Some key points:

  • When does software really grow? When smart people can do their own thing without worrying about their impact on others. This argues for building federated systems that ensure isolation, which argues for using microservices and containers.

  • Microservices usually grow successfully from monoliths. In creating a monolith developers learn how to properly partition a system.

  • Continuous delivery both lowers friction and lowers risk. In a complex system if you want stability, if you want security, if you want reliability, if you want safety then you must have lots of little deployments. 

  • Every member of a team is aware of everything. That's what makes a winning team. Good situational awareness.

The highlight of the talk for me was the section on the amazing design of the Swedish Gripen Fighter Jet. Talks on microservices tend to be highly abstract. The fun of software is in the building. Talk about parts can be so nebulous. With the Gripen the federated design of the jet as a System of Systems becomes glaringly concrete and real. If you can replace your guns, radar system, and virtually any other component without impacting the rest of the system, that’s something! Mary really brings this part of the talk home. Don’t miss it.

It’s a very rich and nuanced talk with a lot of history and context, so I can’t capture all the details; watching the video is well worth the effort. Having said that, here’s my gloss on the talk...

Hardware Scales by Abstraction and Miniaturization
Categories: Architecture

Blog Hosting: The Ultimate Guide

Making the Complex Simple - John Sonmez - Mon, 04/27/2015 - 16:00

Blog hosting is such a complicated topic. So many choices. How can you ever know what platform and host to use to host your blog? Well, I’ve done it all, from free hosting to cheap paid hosting, to full service hosting and even running my own server. In this post, I am going to break […]

The post Blog Hosting: The Ultimate Guide appeared first on Simple Programmer.

Categories: Programming

Quote of the Month April 2015

From the Editor of Methods & Tools - Mon, 04/27/2015 - 13:30
Code that doesn’t have tests rots. It rots because we don’t feel confident to touch it, we’re afraid to break the “working” parts. Code rotting means that it doesn’t improve, staying the way we first wrote it. I’ll be the first to admit that whenever I write code, it comes in its most ugly form. It may not look ugly immediately after I wrote it, but if I wait a couple of days (or a couple of hours), I know I will find many ways to improve it. Without tests ...


SPaMCAST 339 – Demonstrations, Microservices

Software Process and Measurement Cast - Sun, 04/26/2015 - 22:00

Software Process and Measurement Cast 339 features our essay on demonstrations and a new Form Follows Function column from Gene Hughson.

Demonstrations are a tool to generate conversations about what is being delivered.  Because a demonstration occurs at the end of every sprint the team will continually be demonstrating the value they are delivering, which reinforces confidence and motivation. The act of demonstrating value provides the team with a platform for collecting feedback that will help them stay on track and focused on delivering what has the most value to the business.

Gene continues his theme of microservices. This week we tackle, “Microservices, SOA, and EITA: Where To Draw the Line? Why to Draw the Line?” Gene says, “we recognize lines to prevent needless conflict and waste.”

Two special notes:

Jo Ann Sweeny of the Explaining Change column is running her annual Worth Working Summit.  Please visit http://www.worthworkingsummit.com/

Jeremy Berriault will be joining the SPaMCAST family.  Jeremy will be focusing on testing and the lessons testing can provide to a team and organization.

Call to action!

Reviews of the Podcast help to attract new listeners. Can you write a review of the Software Process and Measurement Cast and post it on the podcatcher of your choice? Whether you listen on iTunes or any other podcatcher, a review will help to grow the podcast! Thank you in advance!

Re-Read Saturday News

The Re-Read Saturday focus on Eliyahu M. Goldratt and Jeff Cox’s The Goal: A Process of Ongoing Improvement began on February 21st. The Goal has been hugely influential because it introduced the Theory of Constraints, which is central to lean thinking. The book is written as a business novel. Visit the Software Process and Measurement Blog and catch up on the re-read.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast.

Dead Tree Version or Kindle Version 

I am beginning to think of which book will be next. Do you have any ideas?

Upcoming Events

CMMI Institute Global Congress
May 12-13 Seattle, WA, USA
My topic - Agile Risk Management
http://cmmiconferences.com/

DCG will also have a booth!

Next SPaMCast

The next Software Process and Measurement Cast will feature our interview with Tom Howlett.  Tom is the author of the Diary of a Scrummaster and is a Scrum Master’s Scrum Master. Tom and I talked Agile and being Agile outside of the classic software development environments.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques, co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management

Suprastructure – how come “Microservices” are getting small?


Now, I don’t want to get off on a rant here*, but it seems like “Microservices” are all the rage these days – at least judging from my Twitter, Feedly and Prismatic feeds. I already wrote that in my opinion “Microservices” is just a new name for SOA. I thought I’d give a couple of examples of what I mean.

I worked on systems that today would pass for Microservices years ago (as early as 2004/5). For instance, in 2007 I worked at a startup called xsights. We developed something like Google Goggles for brands (or a barcodeless barcode) so users could snap a picture of an ad/brochure etc. and get relevant content or perks in response (e.g. we had campaigns in Germany with a book publisher where MMSing shots of newspaper ads or outdoor signage resulted in getting information and discounts on the advertised books). The architecture driving that was a set of small, focused, autonomous services. Each service encapsulated its own data store (if it had one), and services were replaceable (e.g. we had MMS, web, apps & 3G video call gateways). We developed the infrastructure to support automatic service discovery, the ability to create ad-hoc long-running interactions a.k.a. Sagas that enabled different cooperations between the services (e.g. the flow for a 3G video call needed more services for fulfilment than an app one), etc. You can read a bit about it in the “putting it all together” chapter of my SOA Patterns book or view the presentation I gave at QCon a few years back called “Building reliable systems from unreliable components” (see slides below); both elaborate some more on that system.

Another example is a naval command and control system I (along with Udi Dahan) designed back in 2004 for an unmanned surface vessel (like a drone, but on water). In that system we had services like “navigation” that suggested navigation routes based on waypoints and data from other services (e.g. weather), a “protector” service that handled communications to and from the actual USVs, a “Common Operational Picture” (COP) service that aggregated target data from external services and sensors (e.g. the ones on the protectors), an “Alerts” service where business rules could trigger various actions, etc. These services communicated using events and messages and had flows like: the protector publishes its current position, the COP publishes updated target positions (protector + other targets), the navigation service spots a potential interception problem and publishes that, the alert service identifies that the threshold for the potential problem has been exceeded and triggers an alert to users, which then initiates a request for alternative navigation plans, etc. Admittedly some of these services could have been more focused and smaller, but they were still autonomous, with separate storage – and hey, that was 2004 :)

So, what changed in the last decade? For one, I guess that after years of “enterprisey” hype that ruined SOA’s name, the actual architectural style is finally getting some traction (even if it had to change its name for that to happen).

However, this post is not just a rant on Microservices…

The more interesting change is the shift in the role of infrastructure from a set of libraries and tools that are embedded within the software we write to larger constructs running outside of the software and running/managing it – or, in other words, the emergence of “suprastructure” instead of infrastructure (infra = below, supra = above). It isn’t that infrastructure vanishes, but a lot of functionality is “outsourced” to suprastructure. This is something that started a few years back with PaaS but is (IMHO) getting more acceptance and use in the last couple of years, especially with the growing popularity of Docker (and, more importantly, its ecosystem).

Consider, for example, the architecture of Appsflyer, which I recently joined. (You can listen to Nir Rubinshtein, our system architect, presenting it (in Hebrew), or check out the slides on Speaker Deck or below (English).)

Instead of writing or using elaborate service hosts and application servers, you can host simple apps in Docker; run and schedule them with Mesos, get cluster and discovery services from Consul, recover from failure by rereading logs from Kafka, etc. Back in the day we also had these capabilities, but we wrote tons of code to make it happen – tons of code that was specific to the solution and technology (and was prone to bugs and problems). For modern solutions, all these capabilities are available almost off the shelf, everywhere: on premise, on clouds and even across clouds.

The importance of suprastructure in regard to “microservices” is that this “outsourcing” of functionality helps drive down the overhead and costs associated with making services small(er). In previous years the threshold from useful services to nanoservices was easier to cross. Today, it is almost reversed – you spend the effort of setting up all this suprastructure once, and you only really begin to see the return if you have enough services to make it worthwhile.

Another advantage of suprastructure is that it is easier to get polyglot services – i.e. it is easier to write different services using different technologies. Instead of investing in a lot of technology-specific infrastructure, you can get more generic capabilities from the suprastructure and spend more time solving the business problems using the right tool for the job. It also makes it easier to change and evolve technologies over time – again saving the sunk costs of investing in elaborate infrastructure.

 

Of course, that’s just my opinion, I could be wrong...*

PS – we (Appsflyer) are hiring: UI tech lead, data scientist, senior devs and more... :)

Building reliable systems from unreliable components

* with apologies to Dennis Miller

Categories: Architecture

The Flaw of Empirical Data Used to Make Decisions About the Future

Herding Cats - Glen Alleman - Sun, 04/26/2015 - 19:20

It's popular in the agile world and even more popular in the No Estimates paradigm to use the term empirical data as a substitute for estimating future outcomes. And my favorite meme that further confuses the conversation.

Probabilistic forecasting will outperform estimation every time

This of course is "It is not only not right, it is not even wrong."† Probabilistic forecasting IS estimating. Estimating is about the past, present, and future. Forecasting is estimating about the future. I'll save the embarrassment by not saying the name of the #NoEstimates person posting this. 

First, a definition. Empirical means originating in or based on observation or experience. But we should all know that that data needs to properly represent two sides of the problem: the past and the future.

Let's look at some flawed logic in this empirical data paradigm:

  • The past – we took 18 samples from the start of the project till now, calculated the average value, and we'll use that as a representative number for the future.
  • The future – is the past a proper representation – statistically – of the future?
    • It's taken 45 minutes from the driveway to the airport garage the last 5 times I left on a Monday afternoon for the remote site.
    • What's the probability it will take 45 minutes today?

One more technical detail.

  • The flow or Kanban style processes depend on a critical concept – each random variable that is always present in our project must be independent and identically distributed.
  • This means each random variable has the same probability distribution as the others, and they are mutually independent.
  • This CAN be the case in some situations, but when we are developing software in an emergent environment – not a production line – it is unlikely. (A tiny numeric illustration follows this list.)
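To make that concrete, here is a tiny R illustration (the numbers are invented): if the underlying distribution shifts partway through – i.e. the samples are not identically distributed – the historical average badly misrepresents the future.

set.seed(1)
past   = rnorm(18, mean = 10, sd = 2)   # 18 past samples from a stable regime
future = rnorm(18, mean = 16, sd = 4)   # emergent work shifts the distribution
mean(past)     # roughly 10 - the "empirical" forecast
mean(future)   # roughly 16 - what actually happens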

So Now Some Issues Of Using Just Empirical Data

The future is emergent in most development work. If it were a production line – and software development is not production – then past performance would be a good indicator of future performance. So let's ask some questions before using this past empirical data:

  • Is the data in the past properly assessed for variance, stability (stationarity), and independence?
  • If it has been, what are the statistical parameters? Especially independence. The notion of INVEST in agile cannot be assumed without a test.
  • Is the future going to be stable, stationary, independent and represented by the past?
  • What's the uncertainty in the future events?
  • What was the uncertainty in the past that was not recognized, yet influenced the statistics and was not represented?
  • What are the irreducible uncertainties in the future – the naturally occurring variances that will need margin?
  • What are the reducible uncertainties in the future that must be brought down or have management reserve?

Don't have the answers to these and working a non-trivial project? Then our empirical data is not worth much, because it doesn't actually represent the future. We might as well guess, and stop using the term empirical as a substitute for knowing much of anything about the future.

With those answers we can build a credible model of the future – with interdependencies between the work elements and probability distribution functions for their statistical behavior – and start asking the killer question:

What's the probability of completing on or before the need date for the work we are producing?

This answer only tells us the probability, not the exact date. So here's the most important point.

  • When we have a model, we can test if there is an acceptable probability of success.
  • That's all we can do, model, test, assess, model some more.

All decisions about future outcomes in the presence of uncertainty need estimates that are placed in the model and assessed for their applicability.

This is called Closed Loop Statistical Process Control. And that's how non-trivial projects are managed. Low value at risk, no one cares if you estimate or not. 
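As an illustration of the kind of model that can answer the killer question above, here is a small Monte Carlo sketch in R. The three-point duration estimates are invented for the example, and this is my simplification, not the author's method:

rtriangle1 = function(a, m, b) {
  # one inverse-CDF draw from a triangular(min = a, mode = m, max = b) distribution
  u = runif(1)
  fc = (m - a) / (b - a)
  if (u < fc) a + sqrt(u * (b - a) * (m - a)) else b - sqrt((1 - u) * (b - a) * (b - m))
}
 
set.seed(42)
# hypothetical three-point estimates (optimistic, most likely, pessimistic) in days
tasks = list(c(4, 6, 10), c(8, 10, 15), c(3, 5, 9))
totals = replicate(10000, sum(sapply(tasks, function(t) rtriangle1(t[1], t[2], t[3]))))
 
needDate = 25               # days available
mean(totals <= needDate)    # estimated probability of completing on or before the need date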

† Which, by the way, is the situation with most #NoEstimates conjectures, starting with the willful ignorance of the microeconomics of decision making as an opportunity cost process. What will it cost us if we decide between multiple choices in the presence of an uncertain future? That question can't be answered without making an estimate of that opportunity cost.

 

Categories: Project Management

Re-Read Saturday: The Goal: A Process of Ongoing Improvement. Part 10


As I began to summarize my Re-Read notes from Chapters 25 and 26, I was struck by a conversation I participated in at the QAI Quest Conference in Atlanta this week (I spoke on scaling Agile testing using the TMMi, but that is a topic for another column). A gentleman at my lunch table expressed his frustration because his testing team could not keep up with all of the work generated by the development team his group served. They had read that for every ten developers there should be two testers, and his company had been struggling with balancing the flow of work between testing and development for a long time. It was not working. I asked what happened when they were asked to do more work than they had capacity to deliver. The answer was that it sometimes depended on who was yelling at them. The tactics they used included expediting work, letting some work sit around until it became critical, or just cutting the amount of planned testing. He capped the conversation by evoking the old adage, “the squeaky wheel gets the grease.” My lunch companion had reverted to expediting work through a bottleneck much in the same way suggested by Alex (and rejected by Johan) in today’s Re-Read Saturday installment.

Part 1       Part 2       Part 3      Part 4      Part 5      Part 6      Part 7      Part 8    Part 9

Chapter 25. In Chapter 24 Alex and his team suddenly found that 30 parts of their product have become constraints. As Chapter 25 begins, Johan arrives. After being briefed on the problem and the data, Johan suggests that Alex and his leadership team go back out into the plant to SEE what is really happening. The milling machine has become a problem area. Red tag items – priority items that need to be ready for the NCX-10 and heat treat bottlenecks – are being built to the exclusion of green tag (non-priority) parts. Two things have occurred: excess red tag inventory has built up, and now overall products can’t be assembled because green tag parts are missing. Johan points out that the red card/green card process and running all steps at 100% capacity have created the problem. Remember, by definition non-bottleneck steps or processes have more capacity than is needed, and when a non-bottleneck process is run consistently at 100% capacity it produces more output than the overall process needs. Bottlenecks, as we have noted earlier, define the capacity of the overall process. When the output of any step outstrips what the bottleneck can consume, excess inventory is generated. In this case, since everyone has been told to build red card items first, they have less time to create non-bottleneck parts, so inventory builds up for parts that are not currently needed to the exclusion of the parts that are needed. A mechanism is needed to signal when parts should start to flow through the process so they will arrive at assembly when they are needed.

Alex and his team discover two rules:

  1. The level of capacity utilization of a non-bottleneck step is not determined by its own potential capacity, but by some other constraint. Said differently, non-bottleneck steps should only be used to the capacity required to support the bottleneck steps and to the level customers want the output of the process. (A small numeric sketch after this list illustrates the effect.)
  2. Activation of a resource (just turning a resource on or off) and utilization of a resource (making use of a resource in a way that moves the system closer to the goal) are not synonymous. Restated, work that does not support attaining the goals of the system generates excess inventory and waste.
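Here is a toy R illustration of rule 1 (my own invented rates, not from the book): a non-bottleneck step run flat out feeds a bottleneck that can only consume part of its output, so excess inventory grows without bound.

days       = 1:20
upstream   = 10   # parts per day the non-bottleneck step produces at 100% utilization
bottleneck = 8    # parts per day the bottleneck step can consume
 
excess = cumsum(rep(upstream - bottleneck, length(days)))
data.frame(day = days, excess_inventory = excess)   # grows by 2 parts every day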

Returning briefly to my lunch conversation at QAI Quest, my new chum noted that recently one small project had to wait so long to be tested that the business changed their mind, meaning that six months of coding had to be backed out. This is one reason that work-in-progress increases waste even in software development.

One of the great lines in Chapter 26 is, “a system of local optimums is not optimum at all.” Everyone working at full capacity rarely generates the maximum product flow through the overall process.

Side Note: At the beginning of the chapter Alex and his team had the data that indicated that a problem existed. Until they all went to the floor as a whole team and VISUALIZED the work, the problem was difficult to understand. Both Agile and Kanban use visualization as a tool to develop an understanding of how raw material turns into shippable product.

Chapter 26. Alex sits at his kitchen table thinking about the solution to the new dilemma of making sure the right parts flow through the system when they are needed without building up as inventory. Parts should only be worked on when they can continuously move through the process without stopping. This means each step needs to be coordinated, and if the full capacity of a step is not needed it should not be run. This means people might not be busy at all times. Sharon and David (Alex’s children) express interest in helping solve the problem. Using the metaphor of the Boy Scout hike David and Alex participated in earlier in the book, both children try to find a solution of pace and synchronization. Sharon suggests using a drummer to provide a coordinated pace. In Agile we would call the concept of a drummer cadence. Cadence provides a beat that Agile teams use to pace their delivery of product. David suggests tying a rope to each part flowing through the process. Parts would move at a precise rate: if any step speeds up, the rope provides resistance, which is a signal to slow down, and if slack occurs it is a signal to speed up in order to keep the pace. Parts arrive at the end of the system when they are needed, in a coordinated fashion. In Kanban we would recognize this concept as work being pulled through the process rather than being pushed.

Back at the plant, Ralph (the computer guy) announces he can calculate the release rate needed for the red tag parts to flow smoothly through the process. This would mean only the parts that will be needed for orders being worked on would be created. No excess inventory would be created at any step, including the bottlenecks. Johan points out that Ralph can also use the same data to calculate the release rates needed for the green tag items. Ralph thinks it will take him a long time to get both right. Alex tells them to begin even though it won’t be perfect and that they can tune the process as they get data. Do not let analysis paralysis keep you from getting started.

The chapter ends with Donavan (ops) and Alex recognizing that their corporate efficiency reporting (they report efficiency of steps, not the whole system) isn’t going to look great. Even though they will be completing and shipping more finished product, the corporate measures have not been synchronized to the new way the plant is working. The reduction in efficiency (cost per part – see installments one and two) is going to attract the attention of Alex’s boss, Bill Peach.

Summary of The Goal so far:

Chapters 1 through 3 actively present the reader with a burning platform. The plant and division are failing. Alex Rogo has actively pursued increased efficiency and automation to generate cost reductions, however performance is falling even further behind and fear has become a central feature of the corporate culture.

Chapters 4 through 6 shift the focus from steps in the process to the process as a whole. Chapters 4 – 6 move us down the path of identifying the ultimate goal of the organization (in this book). The goal is making money and embracing the big picture of systems thinking. In this section, the authors point out that we are often caught up with pursuing interim goals, such as quality, efficiency or even employment, to the exclusion of the ultimate goal. We are reminded by the burning platform identified in the first few pages of the book – the impending closure of the plant and perhaps the division – that in the long run an organization must make progress towards its ultimate goal, or it won’t exist.

Chapters 7 through 9 show Alex’s commitment to change as he seeks more precise advice from Johan, brings his closest reports into the discussion and begins a dialog with his wife (remember, this is a novel). In this section of the book the concept that “you get what you measure” is addressed. Here we see measures of efficiency being used at the level of part production, but not at the level of whole orders or even sales. We discover the corollary to the adage “you get what you measure”: if you measure the wrong thing, you get the wrong thing. We begin to see Alex’s urgency and commitment to make a change.

Chapters 10 through 12 mark a turning point in the book. Alex has embraced a more systems view of the plant and recognized that the measures that have been used to date are focused on optimizing parts of the process to the detriment of the overall goal of the plant. What has not fallen into place is how to take that new knowledge and change how the plant works. The introduction of the concepts of dependent events and statistical variation begins to shift the conceptual understanding from what to measure towards how the management team can actually use that information.

Chapters 13 through 16 drive home the point that dependent events and statistical variation impact the performance of the overall system. In order for the overall process to be more effective you have to understand the capability and capacity of each step and then take a systems view. These chapters establish the concepts of bottlenecks and constraints without directly naming them, and show that focusing on local optimums causes more trouble than benefit.

Chapters 17 through 18 introduce the concept of bottlenecked resources. The effect of the combination of dependent events and statistical variability flowing through bottlenecked resources makes delivery unpredictable and substantially more costly. The variability in flow through the process exposes bottlenecks that limit our ability to catch up, making projects and products late or, worse, generating technical debt when corners are cut in order to make the date or budget.

Chapters 19 through 20 begin with Johan coaching Alex’s team to help them identify a palette of possible solutions. They discover that every time the capacity of a bottleneck is increased, more product can be shipped. Changing the capacity of a bottleneck includes reducing down time and the amount of waste the process generates. The impact of a bottleneck is not the cost of an individual part, but the cost of the whole product that cannot be shipped. Alex and his team implement changes incrementally rather than waiting until they can deliver all of the changes.

Chapters 21 through 22 are a short primer on change management. Just telling people to do something different does not generate support. Significant change requires transparency, communication and involvement. One of Deming’s 14 Principles is constancy of purpose. Alex and his team engage the workforce through a wide range of communication tools while staying focused on implementing the changes needed to stay in business.

Chapters 23 through 24 introduce the idea of involving the people doing the work in defining the solutions to work problems and finding opportunities. In Agile we use retrospectives to involve and capture the team’s ideas on process and personnel improvements. We also find that fixing one problem without an overall understanding of the whole system can cause problems to pop up elsewhere.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version or Kindle Version


Categories: Process Management

Game Performance: Layout Qualifiers

Android Developers Blog - Sat, 04/25/2015 - 23:39

Today, we want to share some best practices on using the OpenGL Shading Language (GLSL) that can optimize the performance of your game and simplify your workflow. Specifically, Layout qualifiers make your code more deterministic and increase performance by reducing your work.


Let’s start with a simple vertex shader and change it as we go along.

This basic vertex shader takes position and texture coordinates, transforms the position and outputs the data to the fragment shader:
attribute vec4 vertexPosition;
attribute vec2 vertexUV;

uniform mat4 matWorldViewProjection;

varying vec2 outTexCoord;

void main()
{
  outTexCoord = vertexUV;
  gl_Position = matWorldViewProjection * vertexPosition;
}
Vertex Attribute Index

To draw a mesh on to the screen, you need to create a vertex buffer and fill it with vertex data, including positions and texture coordinates for this example.

In our sample shader, the vertex data may be laid out like this:
struct Vertex
{
  Vector4 Position;
  Vector2 TexCoords;
};
Therefore, we defined our vertex shader attributes like this:
attribute vec4 vertexPosition;
attribute vec2  vertexUV;
To associate the vertex data with the shader attributes, a call to glGetAttribLocation will get the handle of the named attribute. The attribute format is then detailed with a call to glVertexAttribPointer.
GLint handleVertexPos = glGetAttribLocation( myShaderProgram, "vertexPosition" );
glVertexAttribPointer( handleVertexPos, 4, GL_FLOAT, GL_FALSE, 0, 0 );

GLint handleVertexUV = glGetAttribLocation( myShaderProgram, "vertexUV" );
glVertexAttribPointer( handleVertexUV, 2, GL_FLOAT, GL_FALSE, 0, 0 );
But you may have multiple shaders with the vertexPosition attribute, and calling glGetAttribLocation for every shader wastes performance and increases the loading time of your game.

Using layout qualifiers you can change your vertex shader attributes declaration like this:
layout(location = 0) in vec4 vertexPosition;
layout(location = 1) in vec2 vertexUV;
To do so you also need to tell the shader compiler that your shader targets GLSL ES version 3.00 (OpenGL ES 3.0). This is done by adding a version declaration:
#version 300 es
Let’s see how this affects our shader:
#version 300 es

layout(location = 0) in vec4 vertexPosition;
layout(location = 1) in vec2 vertexUV;

uniform mat4 matWorldViewProjection;

out vec2 outTexCoord;

void main()
{
  outTexCoord = vertexUV;
  gl_Position = matWorldViewProjection * vertexPosition;
}
Note that we also changed outTexCoord from varying to out. The varying keyword is deprecated from version 300 es and requires changing for the shader to work.

Note that Vertex Attribute qualifiers and #version 300 es are supported from OpenGL ES 3.0. The desktop equivalent is supported on OpenGL 3.3 and using #version 330.

Now you know your position attribute is always at 0 and your texture coordinates will be at 1, and you can bind your shader format without using glGetAttribLocation:
const int ATTRIB_POS = 0;
const int ATTRIB_UV   = 1;

glVertexAttribPointer( ATTRIB_POS, 4, GL_FLOAT, GL_FALSE, 0, 0 );
glVertexAttribPointer( ATTRIB_UV, 2, GL_FLOAT, GL_FALSE, 0, 0 );
This simple change leads to a cleaner pipeline, simpler code and saved performance during loading time.

To learn more about performance on Android, check out the Android Performance Patterns series.

Posted by Shanee Nishry, Games Developer Advocate

Join the discussion on +Android Developers
Categories: Programming

Deliberate Practice: Watching yourself fail

Mark Needham - Sat, 04/25/2015 - 23:26


I’ve recently been reading the literature written by K. Anders Eriksson and co on Deliberate Practice and one of the suggestions for increasing our competence at a skill is to put ourselves in a situation where we can fail.

I’ve been reading Think Bayes – an introductory text on Bayesian statistics, something I know nothing about – and each chapter concludes with a set of exercises to practice, a potentially perfect exercise in failure!

I’ve been going through the exercises and capturing my screen while I do so, an idea I picked up from one of the papers:

our most important breakthrough was developing a relatively inexpensive and efficient way for students to record their exercises on video and to review and analyze their own performances against well-defined criteria

Ideally I’d get a coach to review the video but that seems too much of an ask of someone. Antonios has taken a look at some of my answers, however, and made suggestions for how he’d solve them which has been really helpful.

After each exercise I watch the video and look for areas where I get stuck or don’t make progress so that I can go and practice more in that area. I also try to find inefficiencies in how I solve a problem as well as the types of approaches I’m taking.

These are some of the observations from watching myself back over the last week or so:

  • I was most successful when I had some idea of what I was going to try first. Most of the time the first code I wrote didn’t end up being correct but it moved me closer to the answer or ruled out an approach.

    It’s much easier to see the error in approach if there is an approach! On one occasion where I hadn’t planned out an approach I ended up staring at the question for 10 minutes and didn’t make any progress at all.

  • I could either solve the problems within 20 minutes or I wasn’t going to solve them and needed to chunk down to a simpler problem and then try the original exercise again.

    e.g. one exercise was to calculate the 5th percentile of a posterior distribution which I flailed around with for 15 minutes before giving up. Watching back on the video it was obvious that I hadn’t completely understood what a probability mass function was. I read the Wikipedia entry and retried the exercise and this time got the answer. (A short sketch of that percentile calculation follows this list.)

  • Knowing that you’re going to watch the video back stops you from getting distracted by email, twitter, Facebook etc.
  • It’s a painful experience watching yourself struggle – you can see exactly which functions you don’t know or things you need to look up on Google.
  • I deliberately don’t copy/paste any code while doing these exercises. I want to see how well I can do the exercises from scratch so that would defeat the point.
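For reference, here is a minimal sketch of that percentile calculation in R (my own, rather than the book's Python): the 5th percentile of a discrete posterior is the smallest value whose cumulative probability reaches 0.05. The names here are purely illustrative.

percentileOfPmf = function(values, probs, p = 0.05) {
  ord = order(values)
  cdf = cumsum(probs[ord] / sum(probs[ord]))   # normalise the PMF and accumulate it
  values[ord][which(cdf >= p)[1]]              # first value whose cumulative probability reaches p
}
 
# e.g. percentileOfPmf(1:1000, posteriorProbs, 0.05) for a posterior over values 1..1000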

One of the suggestions that Eriksson makes for practice sessions is to focus on ‘technique’ rather than only on outcome, but I haven’t yet been able to translate what exactly that would involve in a programming context.

If you have any ideas or thoughts on this approach do let me know in the comments.

Categories: Programming

Want To Learn How To Estimate? Part Troisième

Herding Cats - Glen Alleman - Sat, 04/25/2015 - 22:46

If you work in a domain, as I do, where you need to answer the question of when we will be providing the needed capability, to produce the planned value, for the planned amount of money, then estimating is going to be part of answering those questions.

If you work where those paying for the work have little or no interest in asking these questions or knowing these answers, or have confidence you'll not overrun the cost and schedule and will deliver the needed capabilities as planned, then maybe estimates are not needed.

It would be interesting to hear from those actually paying for those outcomes, to see what they need to make decisions in the presence of uncertainty.

Here's some more guidance for getting started with estimating software efforts.

And some tools to help out

So you see a trend here? There are nearly unlimited resources on how to estimate software development projects, how to manage in the presence of uncertainty, how to elicit requirements, and how to plan and schedule software projects.

So if you hear we're bad at estimates, that's likely the actual experience of the person making that statement, because the person saying it hasn't yet learned how to estimate. Or when we hear estimates are a waste, it's likely it's not their money, so to them estimates take away from some other activity they see as more important. Why would they care if the project overruns its budget, is late, or doesn't produce the needed value? Or my favorite, estimates are the smell of dysfunction, when there is no domain, root cause, or corrective action suggested – because that's actually hard work, and it's much easier just to point out bad management than to provide suggestions of good management.

Estimating is hard. Anything of value is hard. All the easy problems have been solved. 

But if we are to ever get a handle on the root causes of software project failure modes, we do need to seek out the root causes. And that means much more than just asking the 5 Whys. That's one of many steps in RCA and far from the complete solution to removing the symptoms of our problems. So start there, but never leave it there.

Here's an approach we use.

Unanticipated cost growth is a symptom of failing to properly estimate in the first place, failing to update those estimates as the project progresses, and failing to deal with the underlying risks that drive that cost growth. The same goes for lateness and less-than-acceptable delivered capabilities. Once the estimate has been established in some credible form and adjusted as the project progresses, you of course have to execute to the plan, or change the plan. It's a closed loop system (a small numeric sketch follows the list below):

  • Target
  • Execute¬†
  • Assess performance
  • Determine error signal
  • Take corrective actions
    • Change target
    • Change execution processes
  • Repeat until complete
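One conventional way to put numbers on "assess performance" and "determine error signal" is earned value arithmetic. The figures below are invented, and this is my illustration rather than a prescription from the post:

bac = 500000   # budget at completion
pv  = 200000   # planned value: budgeted cost of work scheduled to date
ev  = 170000   # earned value: budgeted cost of work actually performed
ac  = 190000   # actual cost of work performed
 
cpi = ev / ac                 # cost performance index (< 1 means over cost)
spi = ev / pv                 # schedule performance index (< 1 means behind schedule)
eac = ac + (bac - ev) / cpi   # estimate at completion if current cost efficiency holds
 
c(cpi = cpi, spi = spi, eac = eac)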

The answer to the root causes that create the symptoms we so quickly label as smells of dysfunction is NOT to stop doing something, but to learn what the actual cause is. Stopping before this knowledge is acquired leaves the symptom still in place. And that doesn't help anyone.

Related articles:
  • Who Builds a House without Drawings?
  • Decision Analysis and Software Project Management
  • Herding Cats: Five Estimating Pathologies and Their Corrective Actions
  • Capability Maturity Levels and Implications on Software Estimating
  • Economics of Software Development
  • Qui Bono
  • Want To Learn How To Estimate?
  • Calculating Value from Software Projects - Estimating is a Risk Reduction Process
  • Root Cause Analysis
Categories: Project Management

Game Performance: Explicit Uniform Locations

Android Developers Blog - Sat, 04/25/2015 - 20:15

Posted by Shanee Nishry, Games Developer Advocate

Uniform variables in GLSL are crucial for passing data between the game code on the CPU and the shader program on the graphics card. Unfortunately, up until the availability of OpenGL ES 3.1, using uniforms required some preparation which made the workflow slightly more complicated and wasted time during loading.

Let us examine a simple vertex shader and see how OpenGL ES 3.1 allows us to improve it:

#version 300 es

layout(location = 0) in vec4 vertexPosition;
layout(location = 1) in vec2 vertexUV;

uniform mat4 matWorldViewProjection;

out vec2 outTexCoord;

void main()
{
    outTexCoord = vertexUV;
    gl_Position = matWorldViewProjection * vertexPosition;
}

Note: You might be familiar with this shader from a previous Game Performance article on Layout Qualifiers. Find it here.

We have a single uniform for our world view projection matrix:

uniform mat4 matWorldViewProjection;

The inefficiency appears when you want to assign the uniform value.

You need to use glUniformMatrix4fv or glUniform4f to set the uniform’s value but you also need the handle for the uniform’s location in the program. To get the handle you must call glGetUniformLocation.

GLuint program; // the shader program
float matWorldViewProject[16]; // 4x4 matrix as float array

GLint handle = glGetUniformLocation( program, "matWorldViewProjection" );
glUniformMatrix4fv( handle, 1, false, matWorldViewProject );

That pattern leads to having to call glGetUniformLocation for each uniform in every shader and keep the handles around – or worse, calling glGetUniformLocation every frame.

Warning! Never call glGetUniformLocation every frame! Not only is it bad practice but it is slow and bad for your game’s performance. Always call it during initialization and save it somewhere in your code for use in the render loop.

This process is inefficient, it requires you to do more work and costs precious time and performance.

Also take into consideration that you might have multiple shaders with the same uniforms. It would be much better if your code was deterministic and the shader language allowed you to explicitly set the locations of your uniforms so you don’t need to query and manage access handles. This is now possible with Explicit Uniform Locations.

You can set the location for uniforms directly in the shader’s code. They are declared like this

layout(location = index) uniform type name;

For our example shader it would be:

layout(location = 0) uniform mat4 matWorldViewProjection;

This means you never need to use glGetUniformLocation again, resulting in simpler code, initialization process and saved CPU cycles.

This is how the example shader looks after the change:

#version 310 es

layout(location = 0) in vec4 vertexPosition;
layout(location = 1) in vec2 vertexUV;

layout(location = 0) uniform mat4 matWorldViewProjection;

out vec2 outTexCoord;

void main()
{
    outTexCoord = vertexUV;
    gl_Position = matWorldViewProjection * vertexPosition;
}

As Explicit Uniform Locations are only supported from OpenGL ES 3.1 we also changed the version declaration to 310.

Now all you need to do to set your matWorldViewProjection uniform value is call glUniformMatrix4fv for the handle 0:

const GLint UNIFORM_MAT_WVP = 0; // Uniform location for WorldViewProjection
float matWorldViewProject[16]; // 4x4 matrix as float array

glUniformMatrix4fv( UNIFORM_MAT_WVP, 1, false, matWorldViewProject );

This change is extremely simple and the improvements can be substantial, producing cleaner code, a cleaner asset pipeline and improved performance. Be sure to make these changes if you are targeting OpenGL ES 3.1 or creating multiple APKs to support a wide range of devices.

To learn more about Explicit Uniform Locations, check out the OpenGL wiki page for them, which contains valuable information on different layouts and how arrays are represented.
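For example (a hedged sketch with invented uniform names, not part of the example shader), an array uniform declared with an explicit location takes one location per element, starting at the location you specify, so later uniforms must leave room for it:

layout(location = 0) uniform mat4 matWorldViewProjection;
layout(location = 1) uniform vec4 lightPositions[4];  // occupies locations 1, 2, 3 and 4
layout(location = 5) uniform float ambientIntensity;  // the next free location is 5

On the CPU side, glUniform4fv( 1, 4, lightPositions ) would then set the whole array in a single call.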

Join the discussion on +Android Developers
Categories: Programming

R: Think Bayes Locomotive Problem ‚Äď Posterior probabilities for different priors

Mark Needham - Sat, 04/25/2015 - 00:53

In my continued reading of Think Bayes the next problem to tackle is the Locomotive problem which is defined thus:

A railroad numbers its locomotives in order 1..N.

One day you see a locomotive with the number 60. Estimate how many locomotives the railroad has.

The interesting thing about this question is that it initially seems that we don’t have enough information to come up with any sort of answer. However, we can get an estimate if we come up with a prior to work with.

The simplest prior is to assume that there's one railroad operator with between, say, 1 and 1000 locomotives, with an equal probability of each fleet size.

We can then write similar code as with the dice problem to update the prior based on the trains we’ve seen.

First we'll create a data frame which captures the cross product of 'number of locomotives' and the observations of locomotives that we've seen (in this case we've only seen one locomotive, with the number 60):

library(dplyr)
 
possibleValues = 1:1000
observations = c(60)
 
l = list(value = possibleValues, observation = observations)
df = expand.grid(l) 
 
> df %>% head()
  value observation
1     1          60
2     2          60
3     3          60
4     4          60
5     5          60
6     6          60

Next we want to add a column which represents the probability that the observed locomotive could have come from a particular fleet. If the fleet size is less than 60 then we have a probability of 0, otherwise we have 1 / numberOfLocomotivesInFleet:

prior = 1  / length(possibleValues)
df = df %>% mutate(score = ifelse(value < observation, 0, 1/value))
 
> df %>% sample_n(10)
     value observation       score
179    179          60 0.005586592
1001  1001          60 0.000999001
400    400          60 0.002500000
438    438          60 0.002283105
667    667          60 0.001499250
661    661          60 0.001512859
284    284          60 0.003521127
233    233          60 0.004291845
917    917          60 0.001090513
173    173          60 0.005780347

To find the probability of each fleet size we write the following code:

weightedDf = df %>% 
  group_by(value) %>% 
  summarise(aggScore = prior * prod(score)) %>%
  ungroup() %>%
  mutate(weighted = aggScore / sum(aggScore))
 
> weightedDf %>% sample_n(10)
Source: local data frame [10 x 3]
 
   value     aggScore     weighted
1    906 1.102650e-06 0.0003909489
2    262 3.812981e-06 0.0013519072
3    994 1.005031e-06 0.0003563377
4    669 1.493275e-06 0.0005294465
5    806 1.239455e-06 0.0004394537
6    673 1.484400e-06 0.0005262997
7    416 2.401445e-06 0.0008514416
8    624 1.600963e-06 0.0005676277
9     40 0.000000e+00 0.0000000000
10   248 4.028230e-06 0.0014282246

Let’s plot the data frame to see how the probability varies for each fleet size:

library(ggplot2)
ggplot(aes(x = value, y = weighted), data = weightedDf) + 
  geom_line(color="dark blue")

[Plot of posterior probability against fleet size]

Based on this plot the most likely fleet size is 60, but an alternative would be to find the mean of the posterior, which we can do like so:

> weightedDf %>% mutate(mean = value * weighted) %>% select(mean) %>% sum()
[1] 333.6561

Now let’s create a function with all that code in so we can play around with some different priors and observations:

meanOfPosterior = function(values, observations) {
  l = list(value = values, observation = observations)   
  df = expand.grid(l) %>% mutate(score = ifelse(value < observation, 0, 1/value))
 
  prior = 1 / length(values) # uniform prior over the candidate fleet sizes passed in
  weightedDf = df %>% 
    group_by(value) %>% 
    summarise(aggScore = prior * prod(score)) %>%
    ungroup() %>%
    mutate(weighted = aggScore / sum(aggScore))
 
  return (weightedDf %>% mutate(mean = value * weighted) %>% select(mean) %>% sum()) 
}

If we update our observed locomotives to have numbers 60, 30 and 90 we'd get the following posterior means assuming different priors:

> meanOfPosterior(1:500, c(60, 30, 90))
[1] 151.8496
> meanOfPosterior(1:1000, c(60, 30, 90))
[1] 164.3056
> meanOfPosterior(1:2000, c(60, 30, 90))
[1] 171.3382

At the moment the function assumes that we always want a uniform prior, i.e. every fleet size has an equal probability of being chosen, but we might want to vary the prior to see how different assumptions influence the posterior.

We can refactor the function to take in values & priors instead of calculating the priors in the function:

meanOfPosterior = function(values, priors, observations) {
  priorDf = data.frame(value = values, prior = priors)
  l = list(value = priorDf$value, observation = observations)
 
  df = merge(expand.grid(l), priorDf, by.x = "value", by.y = "value") %>% 
    mutate(score = ifelse(value < observation, 0, 1 / value))
 
  df %>% 
    group_by(value) %>% 
    summarise(aggScore = max(prior) * prod(score)) %>%
    ungroup() %>%
    mutate(weighted = aggScore / sum(aggScore)) %>%
    mutate(mean = value * weighted) %>%
    select(mean) %>%
    sum()
}

Now let’s check we get the same posterior means for the uniform priors:

> meanOfPosterior(1:500,  1/length(1:500), c(60, 30, 90))
[1] 151.8496
> meanOfPosterior(1:1000, 1/length(1:1000), c(60, 30, 90))
[1] 164.3056
> meanOfPosterior(1:2000, 1/length(1:2000), c(60, 30, 90))
[1] 171.3382

Now, instead of a uniform prior, let's use a power law prior, where the assumption is that smaller fleets are more likely:

> meanOfPosterior(1:500,  sapply(1:500,  function(x) x ** -1), c(60, 30, 90))
[1] 130.7085
> meanOfPosterior(1:1000, sapply(1:1000, function(x) x ** -1), c(60, 30, 90))
[1] 133.2752
> meanOfPosterior(1:2000, sapply(1:2000, function(x) x ** -1), c(60, 30, 90))
[1] 133.9975
> meanOfPosterior(1:5000, sapply(1:5000, function(x) x ** -1), c(60, 30, 90))
[1] 134.212
> meanOfPosterior(1:10000, sapply(1:10000, function(x) x ** -1), c(60, 30, 90))
[1] 134.2435

Now we get very similar posterior means which converge on roughly 134, so that's our best prediction.

Categories: Programming

Complex, Complexity, Complicated

Herding Cats - Glen Alleman - Fri, 04/24/2015 - 23:41

In the agile community it is popular to use the terms complex, complexity, and complicated interchangeably, and many times wrongly. These terms are often overloaded with an agenda and used to push a process or even a method.

First, some definitions:

  • Complex - consisting of many different and connected parts. Not easy to analyze or understand. Complicated or intricate. When a system or problem is considered complex, analytical approaches, like dividing it into parts to make the problem tractable, are not sufficient, because it is the interactions of the parts that make the system complex; without these interconnections, the system no longer functions.
  • Complex System - a functional whole, consisting of interdependent and variable parts. Unlike conventional systems, the parts need not have fixed relationships, fixed behaviors or fixed quantities, and their individual functions may be undefined in traditional terms.
  • Complicated - containing a number of hidden parts, which must be revealed separately because they do not interact. By contrast, it is mutual interaction of components that creates nonlinear behaviors in a system. In principle all systems are complex. The number of parts or components is irrelevant in the definition of complexity. There can be complexity - nonlinear behaviour - in small systems or large systems.
  • Complexity - there is no standard definition of complexity. It is a view of systems that suggests simple causes can result in complex effects. Complexity as a term is generally used to characterize a system with many parts whose interactions with each other occur in multiple ways. Complexity can occur in a variety of forms:
    • Complex behaviour
    • Complex mechanisms
    • Complex situations
    • Complex systems
    • Complex data
  • Complexity Theory - states that critically interacting components self-organize to form potentially evolving structures exhibiting a hierarchy of emergent system properties. This theory takes the view that systems are best regarded as wholes, and studied as such, rejecting the traditional emphasis on simplification and reduction as inadequate techniques on which to base this sort of scientific work.

One more item we need is the types of complexity:

  • Type 1 - fixed systems, where the structure doesn't change as a function of time.
  • Type 2 - systems where time causes change, either in repetitive cycles or as continuous change over time.
  • Type 3 - systems that move beyond repetitive behaviour into organic behaviour, where change is extensive and non-cyclic in nature.
  • Type 4 - self-organizing systems, where we can combine the internal constraints of closed systems, like machines, with the creative evolution of open systems, like people.

And Now To The Point

When we hear complex, complexity, complex systems, or complex adaptive systems, pause to ask: what kind of complex is being talked about? What type of complex system? To what system is the term complex being applied? Has that system been classified in a way that actually matches a real system?

It is common for the terms complex, complicated, and complexity to be used interchangeably, and for software development to be classified or mis-classified as one, two, or all three of them. It is also common to toss these terms around with no actual understanding of their meaning or application.

We need to move beyond buzz words, words like Systems Thinking. Building software is part of a system. There are interacting parts that, when assembled, produce an outcome - hopefully a desired outcome. In the case of software the interacting parts are more than just the parts: software has emergent properties. It is a Type 4 system, built from Type 1, 2, and 3 systems. With changes in time and uncertainty, modeling these systems requires stochastic processes. These processes depend on estimating behaviors as a starting point.

The understanding that software development is an uncertain (stochastic) process is well known, starting in the 1980's [1] with COCOMO. Later models, like the Cone of Uncertainty, made it clear that these uncertainties themselves evolve with time. The current predictive models based on stochastic processes include Monte Carlo Simulation of networks of activities, Real Options, and Bayesian Networks. Each is directly applicable to modeling software development projects.

[1] Software Engineering Economics, Barry Boehm, Prentice-Hall, 1981.
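As a minimal sketch of the Monte Carlo approach mentioned above (not from the original post; the three tasks and their durations are purely hypothetical), we can simulate a small chain of activities with uncertain durations and look at the resulting distribution of completion times rather than a single-point estimate:

set.seed(42)
trials = 10000
 
# Sample from a triangular distribution (lo = optimistic, ml = most likely, hi = pessimistic)
# using the inverse transform method
rtriangle = function(n, lo, ml, hi) {
  u = runif(n)
  fc = (ml - lo) / (hi - lo)
  ifelse(u < fc,
         lo + sqrt(u * (hi - lo) * (ml - lo)),
         hi - sqrt((1 - u) * (hi - lo) * (hi - ml)))
}
 
# Three sequential activities, durations in days
total = rtriangle(trials, 5, 8, 15) +
        rtriangle(trials, 3, 5, 12) +
        rtriangle(trials, 8, 10, 20)
 
# The completion time is a distribution, summarised here at the 50th, 80th and 95th percentiles
quantile(total, c(0.5, 0.8, 0.95))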

Related articles
  • Decision Analysis and Software Project Management
  • Making Decisions in the Presence of Uncertainty
  • Some More Background on Probability, Needed for Estimating
  • Approximating for Improved Understanding
  • The Microeconomics of a Project Driven Organization
  • How to Avoid the "Yesterday's Weather" Estimating Problem
  • Hope is not a Strategy
Categories: Project Management

Thinking About #NoEstimates?

I have a new article up on agileconnection.com called The Case for #NoEstimates.

The idea is to produce value instead of spending time estimating. We have a vigorous “debate” going on in the comments. I have client work today, so I will be slow to answer comments. I will answer as soon as I have time to compose thoughtful replies!

This column is the follow-on to How Do Your Estimates Provide Value?

If you would like to learn to estimate better or recover from “incorrect” estimates (an oxymoron if I ever heard one), see Predicting the Unpredictable. (All estimates are guesses. If they are ever correct, it’s because we got lucky.)

Categories: Project Management