
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Announcing the GTAC 2013 Agenda

Google Testing Blog - Sat, 05/04/2013 - 16:54

by The GTAC Committee

We have completed selection and confirmation of all speakers for GTAC 2013. You can find the detailed agenda at:
  developers.google.com/gtac/2013/schedule

Thank you to all who submitted proposals! It was very hard to make selections from so many fantastic submissions.

If you were not extended an invitation, don’t forget that you can join us via YouTube live streaming. We’ll be setting up Google Moderator, so remote attendees can get involved in Q&A after each talk. Information about live streaming, Moderator, and other details will be posted on the GTAC site soon and announced here.

Categories: Testing & QA

Daily Process Thoughts: Empowerment, May 4, 2013

Empowerment

 

Hand Drawn Chart Saturday

The principles in the Agile Manifesto stress self-management and self-organization, which are directly at odds with the authoritative core of command and control management. In a people-centric management approach, teams and team members are empowered to make decisions. Using a command and control form of leadership to drive a project or program sends the team a message that “management” does not trust them to make good decisions about schedules, functionality, or budget. The natural tendency of a team in this scenario is to wait to be told what to do. Agile works best when organizations take a people-centric management approach. If decisions and direction are handed down to the team as law, concepts like self-organization and self-management become difficult to implement.

While command and control and empowerment appear to be binary states, in real life they sit on a continuum. As organizations back away from command and control management techniques, empowerment increases. When teams self-organize, Agile implementations begin to take hold and flourish, and customer satisfaction, quality, and productivity grow with them, because the organization provides an environment where everyone can unleash their creativity.


Categories: Process Management

Creating Career Opportunities

How do you create career opportunities?   You reinvent yourself.

While you can always hope for things to land in your lap, there are specific patterns I see successful people follow.  Among those who continuously create the best career opportunities, here are the key success patterns:

  1. They invest in themselves.  They’re always learning, and taking some sort of training, beyond their day job.
  2. They reinvent themselves.  As a result of investing in themselves, they grow new capabilities.   With their new capabilities, they expand the opportunities they can easily plug themselves into.  For example, a few of my friends started to focus on data science in anticipation of big data, as one of the key trends for 2013 and beyond.  As part of re-inventing themselves, they re-brand themselves to better showcase what they’re bringing to the table.
  3. They build connections before they need them.  It’s always been a game of who you know and what you know, but now more than ever, your network can be the difference that makes the difference when it comes to finding out about relevant opportunities.
  4. They know whose job they want.   They have a role model or two who already does the job they want.  The role model exemplifies how they want to show up and how they want to spend their time, and through that role model they learn the types of challenges they want to take on and get a better perspective on what the lifestyle is actually like.  This not only helps them get clarity on the type of job they want, but it also helps when they tell other people the kind of job they want, because they can point to specific examples.
  5. They know the market.   They pay attention to where the action is.   They don’t just follow their passion.  They follow the money, too, to know where the growth is, and where there’s value to be captured.  As the saying goes, every market has niches, but not every niche has a market.
  6. They have a mentor, and a “board of directors.”   They use a circle of trusted advisors who can help clue them into where to grow their strengths, and how to find better opportunities, based on what they’re capable of.   It might be their “wolf pack”, but more often than not, it’s a seasoned mentor or two with great insight, who can see what they can’t and help them see things from a balcony view.  Most importantly, the sharp mentors, the wise and able ones, help them know their Achilles heel, get past glass ceilings, and avoid career-limiting moves.
  7. They have a sponsor.  Like a game of Chutes and Ladders, skilled sponsors help them find the short-cuts, avoid the dead ends, and avoid sliding backwards.

If you’re wondering where the best career opportunities are, sometimes it’s the job you’ve already got, sometimes you have to go find them, and sometimes, you have to make them.

Categories: Architecture, Programming

Google API infrastructure outage incident report

Google Code Blog - Sat, 05/04/2013 - 00:07
By the Google API Infrastructure Team


As we described in a previous post, earlier this week we experienced an outage in our API infrastructure. Today we’re providing an incident report that details the nature of the outage and our response.

The following is the incident report for the Google API infrastructure outage that occurred on April 30, 2013. We understand this service issue has impacted our valued developers and users, and we apologize to everyone who was affected.

Issue Summary

From 6:26 PM to 7:58 PM PT, requests to most Google APIs resulted in 500 error response messages. Google applications that rely on these APIs also returned errors or had reduced functionality. At its peak, the issue affected 100% of traffic to this API infrastructure. Users could continue to access certain APIs that run on separate infrastructures. The root cause of this outage was an invalid configuration change that exposed a bug in a widely used internal library.

Timeline (all times Pacific Time)
  • 6:19 PM: Configuration push begins
  • 6:26 PM: Outage begins
  • 6:26 PM: Pagers alerted teams
  • 6:54 PM: Failed configuration change rollback
  • 7:15 PM: Successful configuration change rollback
  • 7:19 PM: Server restarts begin
  • 7:58 PM: 100% of traffic back online
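The timeline implies the key intervals directly. The arithmetic is sketched in Python below; the event labels are our own shorthand for the timeline entries, and only the clock times come from the report.

```python
from datetime import datetime

# Clock times from the timeline (Pacific Time, April 30, 2013).
# The dictionary keys are illustrative labels, not terms from the report.
TIMELINE = {
    "push_begins": "18:19",
    "outage_begins": "18:26",
    "rollback_succeeds": "19:15",
    "restarts_begin": "19:19",
    "fully_restored": "19:58",
}

def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two timeline events on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(TIMELINE[end], fmt) - datetime.strptime(TIMELINE[start], fmt)
    return int(delta.total_seconds() // 60)

if __name__ == "__main__":
    print("Outage duration:", minutes_between("outage_begins", "fully_restored"), "min")          # 92
    print("Push to outage:", minutes_between("push_begins", "outage_begins"), "min")               # 7
    print("Outage to good rollback:", minutes_between("outage_begins", "rollback_succeeds"), "min")  # 49
    print("Restarts to full traffic:", minutes_between("restarts_begin", "fully_restored"), "min")   # 39
```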
Root Cause

At 6:19 PM PT, a configuration change was inadvertently released to our production environment without first being released to the testing environment. The change specified an invalid address for the authentication servers in production. This exposed a bug in the authentication libraries which caused them to block permanently while attempting to resolve the invalid address to physical services. In addition, the internal monitoring systems permanently blocked on this call to the authentication library. The combination of the bug and configuration error quickly caused all of the serving threads to be consumed. Traffic was permanently queued waiting for a serving thread to become available. The servers began repeatedly hanging and restarting as they attempted to recover and at 6:26 PM PT, the service outage began.
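A call that never returns pins every serving thread that reaches it, which is why one of the corrective actions is making the authentication libraries time out. The Python sketch below illustrates the general idea only; it is not Google's code. `lookup_auth_server` is a hypothetical stand-in for the blocking library call, and the two-thread pool and 50 ms timeout are arbitrary. With a bounded wait, each request fails fast and releases its thread, so queued requests still get an answer.

```python
import concurrent.futures as cf
import threading

# Hypothetical stand-in for the address lookup inside the authentication library.
# The invalid address never resolves, so this event is never set.
_resolved = threading.Event()

def lookup_auth_server(timeout_s):
    """Wait for the lookup to finish. Passing timeout_s=None reproduces the bug:
    Event.wait(None) blocks forever, pinning the calling serving thread."""
    if not _resolved.wait(timeout_s):
        raise TimeoutError("auth server lookup timed out")
    return "auth-server"

def serve(request_id, timeout_s=0.05):
    """Handle one request; on timeout, return an error so the thread is freed."""
    try:
        lookup_auth_server(timeout_s)
        return (request_id, "ok")
    except TimeoutError:
        return (request_id, "error")

def run_requests(num_requests=8, serving_threads=2):
    """Push more requests than there are serving threads through a fixed-size pool."""
    with cf.ThreadPoolExecutor(max_workers=serving_threads) as pool:
        return list(pool.map(serve, range(num_requests)))

if __name__ == "__main__":
    # With a bounded wait, all 8 requests get an error response in roughly
    # 8 / 2 * 0.05 s, instead of both threads hanging and the rest queuing forever.
    print(run_requests())
```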

Resolution and recovery

At 6:26 PM PT, the monitoring systems alerted our engineers who investigated and quickly escalated the issue. By 6:40 PM, the incident response team identified that the monitoring system was exacerbating the problem caused by this bug.

At 6:54 PM, we attempted to roll back the problematic configuration change. This rollback failed due to complexity in the configuration system, which caused our security checks to reject the rollback. These problems were addressed, and we successfully rolled back at 7:15 PM.

Some jobs started to slowly recover, and we determined that overall recovery would be faster if we restarted all of the API infrastructure servers globally. To help with the recovery, we turned off some of our monitoring systems, which were triggering the bug. We then decided to restart servers gradually (at 7:19 PM), to avoid possible cascading failures from a wide-scale restart. By 7:49 PM, 25% of traffic was restored, and 100% of traffic was routed to the API infrastructure at 7:58 PM.

Corrective and Preventative Measures

In the last two days, we’ve conducted an internal review and analysis of the outage. The following are actions we are taking to address the underlying causes of the issue and to help prevent recurrence and improve response times:
  • Disable the current configuration release mechanism until safer measures are implemented. (Completed.)
  • Change rollback process to be quicker and more robust.
  • Fix the underlying authentication libraries and monitoring to correctly timeout/interrupt on errors.
  • Programmatically enforce staged rollouts of all configuration changes.
  • Improve process for auditing all high-risk configuration options.
  • Add a faster rollback mechanism and improve the traffic ramp-up process, so any future problems of this type can be corrected quickly.
  • Develop a better mechanism for quickly delivering status notifications during incidents.
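Enforcing staged rollouts programmatically means the release tool itself refuses to skip a stage. The report does not describe how Google implements this; the Python sketch below is one hypothetical way to express the rule, with invented stage names and a simple health flag that a real system would set from monitoring.

```python
from dataclasses import dataclass, field

# Hypothetical stages; a change must be healthy in each one before the next.
STAGES = ["testing", "canary", "production"]

@dataclass
class Rollout:
    deployed: dict = field(default_factory=dict)  # stage -> config version
    healthy: dict = field(default_factory=dict)   # stage -> passed health checks?

    def push(self, stage: str, version: str) -> None:
        """Deploy `version` to `stage`, refusing if the previous stage hasn't validated it."""
        idx = STAGES.index(stage)  # raises ValueError for an unknown stage
        if idx > 0:
            prev = STAGES[idx - 1]
            if self.deployed.get(prev) != version or not self.healthy.get(prev, False):
                raise PermissionError(f"{version} is not yet healthy in {prev!r}")
        self.deployed[stage] = version
        self.healthy[stage] = False  # unknown until health checks report in

    def mark_healthy(self, stage: str) -> None:
        """Record that health checks passed for whatever is deployed in `stage`."""
        self.healthy[stage] = True

if __name__ == "__main__":
    r = Rollout()
    try:
        r.push("production", "v2")  # skipping testing and canary is rejected
    except PermissionError as err:
        print("rejected:", err)
    r.push("testing", "v2")
    r.mark_healthy("testing")
    r.push("canary", "v2")          # allowed: v2 is healthy in testing
```

The key design choice is that the check lives in `push` itself, so an operator cannot release to production by accident; the only path there runs through every earlier stage.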
Google is committed to continually and quickly improving our technology and operational processes to prevent outages. We appreciate your patience and again apologize for the impact to you, your users, and your organization. We thank you for your business and continued support.

Sincerely,

The Google API Infrastructure Team


Posted by Scott Knaster, Editor
Categories: Programming

May 3, 2013: This Week at Engine Yard

Engine Yard Blog - Fri, 05/03/2013 - 20:29

We’ve finalized some major under-the-hood upgrades at Engine Yard this week that should start showing themselves in public facing features within the next few months! In the meantime, this is what you can actively check out.

--Tasha Drew, Product Manager

Engineering Updates

Improvements to ELB handling are live and in production! Updates include better error handling for a smoother integration and experience.

We have removed Passenger 2 as an option for customers booting new environments because it’s really old. Customers with an existing environment assigned to the Passenger 2 application server stack have the feature flag enabled and will continue to see it as an option. You are also encouraged to upgrade for all the awesome benefits of Passenger 3.

Engine Yard Cloud customers can now file tickets directly through the Cloud dashboard.

We had a bunch of other minor bumps you can read about in our release notes.

Data Data Data

Riak has been bumped to 1.3.1 as it reaches the last few weeks of its early access phase!

Social Calendar (Come say hi!)

Tuesday, May 7th: Engine Yard’s Dublin, Ireland office will be hosting the second Postgres User Group meetup with Greg Stark, a long-time Postgres contributor and committer as the speaker.

Thursday, May 9th: Coder Dojo in PDX continues planning how to help kids and their parents learn about and explore coding and software. Everyone is encouraged to grab a laptop and jump in!

Thursday, May 9th: Pub Standards in Dublin, Ireland welcomes any and all in-town developers, designers, founders, and people-who-like-to-build-stuff to stop by the Bull & Castle for a beer and a chat.

Articles of Interest

Pricing updates went live, and customers can expect to take advantage of reduced instance pricing on their April bill!

Our friends at TMX posted a thoughtful piece, “In Search of Software Quality.”

Pacific Coast Support team lead and all around awesome guy Ralph Bankston (who sadly has no twitter handle for me to link to) has gone in-depth about how to troubleshoot cron jobs.

Categories: Programming

Stuff The Internet Says On Scalability For May 3, 2013

Hey, it's HighScalability time:


(Giant Hurricane on Saturn, here's one in New Orleans)

 

  • 1,966,080 cores: Time Warp synchronization protocol using up to 7.8M MPI tasks on 1,966,080 cores of the Sequoia Blue Gene/Q supercomputer system. 33 trillion events processed in 65 seconds yielding a peak event-rate in excess of 504 billion events/second using 120 racks of Sequoia.
  • Quotable Quotes:
    • Thad Starner: as the time needed to access a device grows beyond 2s, its actual usage decreases exponentially. Thus, he claimed that a wrist watch interface, always sitting on one's wrist ready to use, should be more successful than mobile phones, which have to be pulled out of the pocket. 
    • @joedevon: We came for scalability but we stayed for agility #NoSQL
    • @jahmailay: "Our user base is exploding. I really wish we spent more time on scalability instead of features customers don't use." - Everybody, always.
    • @bsletten: I don’t think it is a coincidence that the words eval() and evil are so close.
    • @RCSecure: Maybe Gov should stop deploying crappy #CyberSecurity instead of Surveiling Citizens
    • @davidpav: "This is what Netflix does - after each deployment creates AMI for faster scaling up"
    • @franzgranlund: Rewrote my little batch-processing application using #akka . 20% performance increase just like that - and now it is easier to scale.
    • @marshray: Ouch, that's kind of dismal. Perhaps we need a new term: "eventual scalability"
    • @adrianco: RT @rbranson: @cscotta load average is the worst thing ever. Slowly trying to evangelize it's demise as a reasonable metric. < +1 every 15 m

  • MIT Tech Review picks 10 breakthrough technologies: Smart Watches (really?), Memory implants (deciphering the code by which the brain forms long-term memories), Additive manufacturing (3-D printing), Supergrids (finally says Edison, DC powergrids), Temporary social media (sigh), Prenatal DNA sequencing (great for full lifecycle ad targeting), Baxter (compliant robots), Deep Learning (the singularity is near), Ultra-Efficient Solar Power (now we are talking). Prediction: We'll laugh at all this filter control talk once we have all of Google's datacenters and knowledge graph software implanted in our heads.

  • IBM on making movies using atoms as pixels. Characterization was a little thin but the plot was magnetic.

  • Lesson from Airbnb: Give yourself permission to experiment with non-scalable changes. Building better is better than building bigger.

  • Here's a short review by me of CyberStorm by Matthew Mather. Matthew is also the author of the most excellent Atopia Chronicles, a sprawling exploration of "artificial intelligence, distributed computing, nanotechnology, and the full range of humanity." CyberStorm is a chilling blow-by-blow account of what could happen in a real cyber attack. As a programmer, what I found most fascinating is the implied idea of a kind of Crisis OS built on a mesh of smartphones. Not much seems to be done in this area and even the how-to of writing such applications is rarely discussed. Could be interesting.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

Daily Process Thoughts: Overworked Product Owners


The role of Product Owner is a critical component for ensuring that the rest of the business or company and the IT team work together effectively, and it requires significant effort on a daily basis. The product owner provides vision, mentors the team, answers questions, makes decisions about the product, communicates with the broader organization, negotiates resource contentions, coordinates business interaction and serves as a liaison to leaders. In short, the role is difficult and complex. So much so that I have suggested on more than one occasion that it is the most challenging role in Scrum. There are many potential pitfalls in Scrum; however, potentially the most destructive and easiest to avoid is the overworked Product Owner. An individual serving as the Product Owner in addition to their normal day job will likely be overwhelmed. Overworked Product Owners tend to be less effective.

What happens when someone is overworked? Sooner or later something gets neglected and corners get cut. Some work gets jettisoned as they try to bring their life back to equilibrium. Overworking the Product Owner may lead to inattention to the team, neglect of grooming the product backlog, and unavailability or missed meetings. It is possible that the Product Owner will neglect their day job; however, I have generally noticed that people tend to focus first on the portion of their job that is most important to their long-term career.

The Product Owner role is challenging; performing it effectively requires effort and focus. Organizations need to ensure that Product Owners have enough of their time allocated to the Product Owner role. Work generally needs to be taken off their plate. The rest of the team needs to support the Product Owner so that obstacles are minimized. Avoid Overworked Product Owner Syndrome and make sure the person playing the Product Owner role has the time needed to focus on the project rather than on the work they can avoid.


Categories: Process Management

Emphasize Good Practices

NOOP.NL - Jurgen Appelo - Fri, 05/03/2013 - 12:59

In many working environments people’s focus is usually on fixing problems. This makes sense, because continuous improvement allows organizations to survive and thrive. However, a focus on things that could be improved usually comes down to a focus on failures and mistakes, and this mindset can have some serious side effects. Being a perfectionist, I have sometimes been guilty of this myself. I have “raised the bar” for me and for others until the bar was so high that Godzilla could do a limbo dance underneath while carrying a space shuttle.

However, I noticed a strange thing when I urged people to stop screwing up. I found this didn’t motivate them at all! I realized getting better isn’t just about reducing what goes wrong (making mistakes). It’s also about increasing what is right (using good practices). And every now and then people need a reminder that they’re doing just fine.

It’s no wonder the culture in many organizations feels negative when the focus of discussions is mainly on mistakes and problems. Workers feel they are held accountable for not being perfect. Instead of having a constructive view on improvement, people end up with a defensive frame of mind. They evade taking responsibility, and for every perceived problem they point at others who must have caused it. Because people’s minds are focused on self-defense instead of improvement, things will not get any better, and the organization will just make more mistakes.

I believe we should emphasize the good recipes over the mistakes, because you get more of what you focus on. If you focus on mistakes, people will make more mistakes. If you focus on good practices, people will invent more good practices.

It seems evident to me that we should emphasize the good behaviors, not the bad ones. We should celebrate good practices, not punish mistakes.

This text is part of Yay Questions, a Management 3.0 Workout article. Read more on my mailing list.
Categories: Project Management

Personal Kanban and Iterations, Day 5

I am still making progress, although it’s more difficult to see my progress today. Why? Because I did not get as much to done.

One of my readers asked a question about the Urgent queue and the relative ranking of my ever-growing left hand column. How did I determine what to do, and what was the rank of each?

The Urgent queue always trumps everything on the left hand side of the list. I was so frantic on Monday, I didn’t order anything when I put the list together. It almost didn’t matter what I worked on, as long as I made enough progress to get enough things to done. As you can see, I did pick and choose. When I rewrite my list for next week, I will reconsider what I need to do in order. I need to complete the workshops and talks first. Then do the writing. My list next week should be shorter, so I should feel less frantic and be able to finish it.

As for the ones I have added to the bottom of the list, trumping the older ones in importance? No, not really. They are there because I realized I needed to do them also this week. My todos are getting away from me. Putting them on the list means I don’t lose them. I can relax because they are there. Now, I have to focus and do them.

If you are wondering, will I continue this series next week? No. I will not. One week of this is plenty. I wanted to show you a number of things:

  • Everyone has trouble every so often, with too much to do
  • The best way to organize your work is to see it, no matter what you decide to do next
  • I like personal kanban, where I finish one chunk of work and go on to the next
  • If you keep your chunks of work small, you can finish one and continue on to the next one. If your chunks of work are too large, you can’t finish anything and you are tempted to multitask. (Don’t do that!)

If you want to see all the posts in this series, here they are:

To see a “real” personal kanban board, the way I suggest you do it in Manage Your Job Search, go to Personal Kanban for Your Job Hunt.

Read my Book Review of Personal Kanban for more information on how to do it right. And, Gil Broza will be interviewing me for his Individuals and Interactions virtual training May 15, 2013. My topic? “Focus Keeps You Going.” Surprised? I don’t think so!

Categories: Project Management

Troubleshooting Common Problems with Cron Jobs at Engine Yard

Engine Yard Blog - Thu, 05/02/2013 - 22:34

Cron is a basic Unix tool used to run specific commands at specific times. A job can be anything from deleting files to starting a script that processes payments in your application, without anyone having to remember to start the script manually. The most common questions we receive about cron jobs concern verifying the time at which a cron job is supposed to run, environment and path issues while running rake tasks, and unexpected cron output. Here are examples and solutions to some of these common cron problems.

Timing

The most common question we receive is how to verify the time that a cron job is supposed to run. An important first step in that process is to verify the time zone the server is currently set to, because cron runs based on the system time. Our servers default to UTC, but some of our older servers run on Pacific Time, so make sure you know which time zone your server uses. You can verify this by either checking /etc/localtime or typing date.
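Because cron uses the server's clock, it can help to translate a schedule into your own time zone before deciding it is wrong. The Python sketch below does that for a `30 1 * * *` entry on a UTC server. It assumes a fixed UTC-7 offset for Pacific Daylight Time (in effect in early May); real code should use `zoneinfo` so daylight saving transitions are handled.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: fixed UTC-7 offset for Pacific Daylight Time. It is wrong during
# standard time (UTC-8); zoneinfo.ZoneInfo("America/Los_Angeles") handles both.
PDT = timezone(timedelta(hours=-7), "PDT")

def fire_time_in(viewer_tz, day, hour, minute, server_tz=UTC):
    """Return when a job scheduled at hour:minute on the server's clock runs, in viewer_tz."""
    scheduled = datetime(day.year, day.month, day.day, hour, minute, tzinfo=server_tz)
    return scheduled.astimezone(viewer_tz)

if __name__ == "__main__":
    # "30 1 * * *" on a UTC server, for the run on May 3, 2013 (server date).
    local = fire_time_in(PDT, datetime(2013, 5, 3), hour=1, minute=30)
    print(local.strftime("%Y-%m-%d %H:%M %Z"))  # 2013-05-02 18:30 PDT
```

In other words, a job that looks like it runs "at 1:30 in the morning" fires in the early evening of the previous day for someone reading the schedule on Pacific Time.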

The five scheduling positions are: minute (0 - 59), hour (0 - 23), day of the month (1 - 31), month (1 - 12), and day of the week (0 - 6, with Sunday being 0; most cron implementations also accept 7 for Sunday, as the diagram below shows). A shorthand comment you can add to the top of a crontab as a reminder is # min hr DoM m DoW.

*    *    *    *    *  command to be executed
┬    ┬    ┬    ┬    ┬
│    │    │    │    │
│    │    │    │    │
│    │    │    │    └───── day of week (0 - 7) (0 or 7 are Sunday, or use names)
│    │    │    └────────── month (1 - 12)
│    │    └─────────────── day of month (1 - 31)
│    └──────────────────── hour (0 - 23)
└───────────────────────── min (0 - 59)

(Wikipedia)

You can also use *, the wildcard that matches every possible value of a field. A step value written as */n runs the job every n units of that field; for example, */15 in the minute field runs it every 15 minutes. There are also websites that can check the timing, such as http://www.generateit.net/cron-job/ and http://cronwtf.github.com/.
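To check a schedule without waiting for it to fire, the field rules above can be expressed in a few lines of Python. This is a simplified sketch, not cron's actual parser: it handles numbers, `*`, ranges, comma lists, and `/` steps, but not month or weekday names or shortcuts such as `@daily`. The day-of-month/day-of-week handling follows the rule documented in the crontab(5) man page for Vixie-derived crons: when both fields are restricted, a match on either is enough.

```python
from datetime import datetime

# Allowed (min, max) for minute, hour, day of month, month, day of week.
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def expand(field, lo, hi):
    """Expand one numeric cron field ('*', '*/15', '1-5', '0,30', '10-50/20') to a set."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            end = hi if step else start  # 'N/step' means 'N through max, every step'
        values.update(range(start, end + 1, step_n))
    return values

def matches(expr, when):
    """True if the five-field cron expression fires in the minute `when`."""
    fields = expr.split()
    minute, hour, dom, month, dow = (
        expand(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES))
    dow = {d % 7 for d in dow}            # 0 and 7 both mean Sunday
    cron_dow = (when.weekday() + 1) % 7   # Python: Monday=0; cron: Sunday=0
    dom_ok, dow_ok = when.day in dom, cron_dow in dow
    # Vixie-style rule: if both day fields are restricted, matching either is enough.
    if fields[2].startswith("*") or fields[4].startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok
    return when.minute in minute and when.hour in hour and when.month in month and day_ok

if __name__ == "__main__":
    # Every 15 minutes, 9:00-17:45, Monday to Friday. May 2, 2013 was a Thursday.
    print(matches("*/15 9-17 * * 1-5", datetime(2013, 5, 2, 10, 45)))  # True
    print(matches("*/15 9-17 * * 1-5", datetime(2013, 5, 4, 10, 45)))  # False (Saturday)
```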

Environment and Path Issues

A common problem is not using the proper path when running a rake command. If you are running a rake task, make sure you set the environment correctly and, if needed, the path.

Example: A rake task that may work with system gems but not with bundler because of a path and environment issue.

Deploy User Crontab:
30 1 * * * rake ts:index

This entry will only index Sphinx if you are using system gems rather than Bundler. If you are using Bundler, make sure you either use bundle exec or call the binstub executables directly within the application.

Example: A rake task that works.

Deploy User Crontab:
30 1 * * * cd /data/appname/current && RAILS_ENV=production bundle exec rake ts:index

This command changes to the correct application directory and sets the RAILS_ENV environment variable, so you get the expected results for the Rails environment you are running. In some instances you may have to specify the full path to rake in the bundled gems: /usr/local/bin/bundle exec /data/appname/current/ey_bundler_binstubs/rake.
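When a rake task works from your shell but not from cron, the difference is usually the environment: cron starts jobs with a much smaller one than your login shell. The Python sketch below runs a crontab command line under /bin/sh with a stripped-down environment so you can reproduce the problem interactively. The exact variables cron provides vary by system; PATH=/usr/bin:/bin is a common default, but treat CRON_LIKE_ENV as an assumption to adjust for your servers.

```python
import subprocess

# Assumption: a cron-like minimal environment. Many crons set little more than
# PATH=/usr/bin:/bin, SHELL, HOME and LOGNAME; check your own system.
CRON_LIKE_ENV = {"PATH": "/usr/bin:/bin", "SHELL": "/bin/sh"}

def run_like_cron(command: str) -> subprocess.CompletedProcess:
    """Run a crontab command line under /bin/sh with a cron-like minimal environment."""
    return subprocess.run(["/bin/sh", "-c", command], env=CRON_LIKE_ENV,
                          capture_output=True, text=True)

if __name__ == "__main__":
    # Nothing from your login shell (RAILS_ENV, extra PATH entries) is inherited.
    result = run_like_cron('echo "PATH=$PATH RAILS_ENV=${RAILS_ENV:-unset}"')
    print(result.stdout.strip())  # PATH=/usr/bin:/bin RAILS_ENV=unset
```

On a server you would pass the full entry, for example `run_like_cron("cd /data/appname/current && RAILS_ENV=production bundle exec rake ts:index")`, and inspect `returncode` and `stderr`. If `bundle` lives in /usr/local/bin, it will not be found under this PATH, which is the case where the full path given above helps.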

Cron Output

We commonly see cron jobs whose output isn't handled at all, or is handled in an unexpected manner. The choices for cron job output are to have no output, to create a log file of what happened during the rake task, or to receive only errors. The first step in deciding how to handle output is whether you want cron to notify you of anything or whether your command will handle it internally. If you do nothing and the job produces output, cron will attempt to send it by email; if ssmtp mail is not configured on your instance, the output is written to the dead.letter file instead. If you do not want any output saved from the cron job, append >/dev/null 2>&1 to your command; this sends both standard output and standard error to /dev/null (/dev/null is a device that discards any data sent to it).

Another option is to capture the output of the rake task in a log. Running the task with --trace and appending > /data/deploy/appname/current/log/ts_rake.log 2>&1 writes a verbose log, including any errors, to that file. The cron job for that would look like this:

30 1 * * * cd /data/appname/current && RAILS_ENV=production bundle exec rake ts:index --trace > /data/deploy/appname/current/log/ts_rake.log 2>&1

The log file will also need to either exist and be writable by the deploy user running the cron job or the user will need write permission to the directory that contains the log file.
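Shell redirections are applied left to right, so their order decides where output ends up. In `command > file >/dev/null 2>&1`, the second `>` replaces the first: the log file is created but stays empty, and errors are discarded too. The Python sketch below runs a hypothetical stand-in job (JOB, not the real rake task) under /bin/sh with two redirection patterns to show the difference.

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for the rake task: one line on stdout, one on stderr.
JOB = 'echo "indexing..."; echo "warning: slow query" >&2'

def log_after(redirections):
    """Run JOB under /bin/sh with the given redirections ({log} = log file path)
    and return what ended up in the log file."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "ts_rake.log")
        command = "{ " + JOB + "; } " + redirections.format(log=log)
        subprocess.run(["/bin/sh", "-c", command], check=True)
        with open(log) as fh:
            return fh.read()

if __name__ == "__main__":
    # The second '>' replaces the first, so the log is created but stays empty.
    print(repr(log_after("> {log} >/dev/null 2>&1")))  # ''
    # Stdout goes to the log first, then stderr is pointed at the same place.
    print(repr(log_after("> {log} 2>&1")))  # 'indexing...\nwarning: slow query\n'
```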

You can have the output sent by email by not discarding it with >/dev/null 2>&1. As stated previously, our systems are not set up to send email, so mail delivery will need to be configured before you receive anything.

Each cron run is recorded by default in /var/log/syslog. Running sudo grep -i cron /var/log/syslog shows the cron jobs that have run during the current day. You can check older days by searching the older log files, which are rotated daily.

Cron on Engine Yard

Cron jobs are great for scheduled tasks. There are two important things to remember when running applications on Engine Yard Cloud. First, the application master or solo instance is the only instance in an environment on which the dashboard installs cron jobs. Keep this in mind if all of your application instances need to run the script, or if the job should run on a utility instance. Second, when an application master takeover is initiated, the newly promoted application master doesn't have the full contents of the previous application master: the cron jobs from the dashboard have not yet been put in place. Pressing the Apply button in the dashboard installs the cron jobs from your dashboard on the new application master.

Categories: Programming

Daily Process Thoughts, Do Teams Have Boundaries? May 2, 2013

It is a commonly held belief that a team composed of a blend of skills and experiences can accomplish nearly anything. Because we believe that teams are effective, they are used to solve nearly every problem. In some cases the word team has become a talisman without a practical definition. In many cases the lack of definition means that what we call teams have amorphous membership and boundaries, which makes it hard to understand who is a member, even for those who are on the team.

In his 2009 book, ‘Leading Teams’, J. Richard Hackman suggests that if teams are not bounded, the effectiveness of the team is reduced. This reduced effectiveness can be caused by many factors, including role confusion or an inability to invest in building trusting relationships. If you don’t know who is on the team today or who will be on it tomorrow, it is difficult to invest the time and emotional capital needed to build relationships. This is especially true of diverse teams with people of different backgrounds.

Teams have great value. When we discuss the Agile principle of IT and the business working together on a daily basis, the underlying assumption is that the interaction happens at a team level. However, for the interaction to be effective the team needs to be effective. One critical component of building an effective team is that it needs to be bounded, so that the necessary relationships can be built for information and knowledge sharing.


Categories: Process Management

Cost (and Schedule) Estimating Foundations

Herding Cats - Glen Alleman - Thu, 05/02/2013 - 16:02

Cost estimating methods have been around for a long time. The current processes found in agile use a points system, sometimes with a Fibonacci series to bin the values of the points.

The challenge with this approach is that the estimate in agile is not monetized, so we can't really tell whether the Total Allocated Budget (TAB) is sufficient for the project at any point in time, unless the capacity for work and the quality of the outcomes are steady, that is, Level of Effort (LOE). 

With the LOE approach, the capacity for work is the critical measurement needed for estimating the cost at completion. This capacity for work also needs to be updated continuously, as it is on well-run agile projects.
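Under the LOE assumption, monetizing a points-based estimate is simple arithmetic: remaining points divided by a steady capacity for work gives the remaining sprints, and each sprint has a known cost. A minimal Python sketch follows; all figures are hypothetical, and rounding up to whole sprints is our assumption.

```python
import math

def cost_at_completion(remaining_points, points_per_sprint, cost_per_sprint, spent_to_date):
    """Deterministic forecast: valid only if capacity for work (velocity) is steady."""
    sprints_left = math.ceil(remaining_points / points_per_sprint)
    return spent_to_date + sprints_left * cost_per_sprint

if __name__ == "__main__":
    # Hypothetical project: 400 points left, steady 40 points/sprint, $120k per sprint.
    cac = cost_at_completion(400, 40, 120_000, spent_to_date=1_000_000)
    tab = 2_000_000  # hypothetical Total Allocated Budget
    print(f"Cost at completion ${cac:,}; within TAB: {cac <= tab}")
    # Cost at completion $2,200,000; within TAB: False
```

The single number this produces is only as good as the "steady capacity" assumption behind it.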

But there are other issues with this LOE approach on larger projects:

  • Do we know the variances in the capacity for work and how those variances will impact the final Cost at Completion?
  • Do we know the interdependencies between the various work products and how they impact the final cost?
  • Do we know the Aleatory uncertainty - the naturally occurring variability that can't be fixed and has to be covered with margin?
  • Do we know the Epistemic uncertainty - the event based risks that need a risk handling plan?

Examples like the one found at Projects @ Work don't really consider any of the underlying uncertainties in estimating. Without the next level down (statistically adjusted estimates of the work effort, the capacity for work, the quality of that work, and the interdependencies between those work activities and their products), the simple and maybe even simple-minded approaches to estimating have limits to scaling.
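One way to add a statistical layer is a Monte Carlo simulation that treats capacity for work as a random variable rather than a constant, and reports percentiles instead of a single number. The Python sketch below assumes each sprint's velocity is an independent normal draw, with hypothetical figures. Independence ignores exactly the cross-correlations and interdependencies raised in the questions above, so this is a first step rather than a full model.

```python
import random
import statistics

def simulate_cost_at_completion(remaining_points, velocity_mean, velocity_sd,
                                cost_per_sprint, spent_to_date, trials=10_000, seed=1):
    """Monte Carlo forecast; each sprint's velocity is an independent normal draw (assumed)."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        remaining, sprints = remaining_points, 0
        while remaining > 0:
            # Floor at 1 point so a rare negative draw cannot stall the loop.
            remaining -= max(1.0, rng.gauss(velocity_mean, velocity_sd))
            sprints += 1
        outcomes.append(spent_to_date + sprints * cost_per_sprint)
    return outcomes

if __name__ == "__main__":
    # Hypothetical: 400 points left, velocity 40 +/- 10 points, $120k per sprint.
    costs = simulate_cost_at_completion(400, 40, 10, 120_000, 1_000_000)
    p50 = statistics.median(costs)
    p80 = statistics.quantiles(costs, n=10)[7]  # 80th percentile cut point
    print(f"P50 ${p50:,.0f}   P80 ${p80:,.0f}")
```

Reading the P80 rather than the P50 against the budget is one simple way to decide how much cost margin the variance in capacity calls for.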

This is one of those topics where everyone is right in some way, depending on the domain, the context in the domain, and the scale of that domain. As agile enters the larger acquisition community, where we're spending other people's money, maybe hundreds of millions of dollars, care needs to be taken when applying un-monetized, non-probabilistic, non-joint probability (cross-correlations between work elements) and non-stochastic forecasting models. The real world is not that simple.

Categories: Project Management

Both Aleatory and Epistemic Uncertainty Create Risk

Herding Cats - Glen Alleman - Thu, 05/02/2013 - 15:08

Over the past weeks there have been several discussions on the forums and blogs about risk and risk management. Here's a short post on those topics and their impact on project performance.

Risk Comes from Uncertainty

Risk does not exist by itself. Risk is created when there is uncertainty. If I am certain that it is going to rain this afternoon, then there is no risk of rain. It's going to rain with 100% probability. There is no uncertainty about the forecast of rain.

So first we need uncertainty to have a risk. But there are two classes of uncertainty:

  • Aleatory uncertainty - is uncertainty that comes from a random process. Flipping a coin and predicting either HEADS or TAILS is aleatory uncertainty. In other words, the uncertainty we are observing is random; it is part of the natural processes of what we are observing.

Aleatory uncertainty refers to the inherent uncertainty due to probabilistic variability.

This type of uncertainty is irreducible, in that there will always be variability in the underlying variables.

These uncertainties are characterized by a probability distribution.

The parameter being measured (duration, RPM, the discharge of a river) is stochastic, and its variability cannot be reduced.

  • Epistemic uncertainty - is uncertainty that comes from the lack of knowledge. This lack of knowledge comes from many sources. Inadequate understanding of the underlying processes, incomplete knowledge of the phenomena, or imprecise evaluation of the related characteristics are common sources of epistemic uncertainty. In other words, we don't know how this thing works, so there is uncertainty about its operation.

Epistemic uncertainty refers to the limited knowledge we may have about the system (modeled or real). This type of uncertainty is reducible: we can take more measurements, conduct more tests, or otherwise "buy" more information. The parameter being measured is usually a characteristic of the material or the physical process, and the uncertainty is related to our "lack of knowledge" about this parameter.
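The distinction between the two classes can be illustrated with a small simulation. This is a sketch assuming a hypothetical task whose duration is normally distributed with a mean of 10 days and a standard deviation of 2 days. Taking more measurements shrinks the standard error of our estimate of the mean (the epistemic part, our lack of knowledge about the parameter), while the spread of individual outcomes (the aleatory part) stays near 2 days no matter how much data we collect.

```python
import random
import statistics


def observe(n: int, seed: int = 7) -> tuple[float, float, float]:
    """Sample n durations from a hypothetical process (true mean 10 days, true sd 2 days).

    Returns (estimated mean, standard error of that estimate, spread of the outcomes).
    """
    rng = random.Random(seed)
    samples = [rng.gauss(10.0, 2.0) for _ in range(n)]
    spread = statistics.stdev(samples)        # aleatory: settles near 2.0, never to zero
    std_error = spread / len(samples) ** 0.5  # epistemic: shrinks as we "buy" more data
    return statistics.mean(samples), std_error, spread


for n in (10, 100, 10_000):
    mean, se, sd = observe(n)
    print(f"n={n:>6}: mean={mean:5.2f}  std error={se:.3f}  spread={sd:.2f}")
```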

Handling Risks Created from Uncertainty

  • Aleatory risk is not a lack of information. It is a naturally occurring process. We cannot buy more information, so we have to provide margin for this type of risk: Schedule Margin to cover the naturally occurring variances in how long it takes to do the work, and Cost Margin to cover the naturally occurring variances in the price of something we are consuming in our project.

Aleatory uncertainty and the resulting risk are modeled with a Probability Distribution Function. This PDF describes all the possible values the process can take and the probability of each value. For a single toss of a fair coin, there is a 50% probability of HEADS and a 50% probability of TAILS. For multiple tosses of a fair coin, the probability distribution of the total number of HEADS (or of TAILS) is a binomial distribution. For a fair coin tossed 20 times, the distribution of the number of HEADS looks like this:
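The binomial probabilities for 20 tosses follow from the standard formula P(k) = C(n, k) p^k (1 - p)^(n - k). A short sketch that tabulates them for a fair coin (p = 0.5), with a crude text histogram:

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float = 0.5) -> float:
    """Probability of exactly k HEADS in n tosses of a coin with P(HEADS) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_tosses = 20
distribution = {k: binomial_pmf(n_tosses, k) for k in range(n_tosses + 1)}

for k, prob in distribution.items():
    # One '#' per half-percentage-point of probability.
    print(f"{k:2d} heads: {prob:.4f} {'#' * round(prob * 200)}")
```

The most likely outcome is 10 HEADS, with a probability of about 0.176; every other count is possible, which is exactly the irreducible variability described above.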

[Figure: binomial distribution of the number of HEADS in 20 tosses of a fair coin]

The PDF for the possible durations of the work in the project can be determined in several ways. It turns out we can buy knowledge about aleatory uncertainty through Reference Class Forecasting and past performance modeling: our past performance on similar work provides information about our future performance, and this new information allows us to update (adjust) the PDF. But the underlying process is still random, and our new information simply creates a new aleatory uncertainty PDF.
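A minimal sketch of that idea, assuming a hypothetical reference class of ten past durations: the empirical distribution of actuals stands in for the PDF, and the schedule margin is taken, as an illustrative assumption, to be the gap between the point estimate and the 80th percentile of past performance.

```python
import statistics

# Hypothetical reference class: actual durations (days) of ten similar past work packages.
past_durations = [18, 22, 19, 25, 21, 30, 20, 23, 27, 24]


def duration_percentile(history: list[float], pct: int) -> float:
    """Empirical percentile (1-99) of the reference-class durations."""
    return statistics.quantiles(history, n=100, method="inclusive")[pct - 1]


def schedule_margin(history: list[float], point_estimate: float, pct: int = 80) -> float:
    """Margin needed so the plan covers the pct-th percentile of past performance."""
    return max(0.0, duration_percentile(history, pct) - point_estimate)


print(duration_percentile(past_durations, 50))  # median of the reference class
print(schedule_margin(past_durations, point_estimate=22))
```

Appending new actuals to `past_durations` and recomputing is the "update" step: the process is still random, we just have a better-informed PDF to size margin from.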

  • Epistemic risk comes from the lack of knowledge. Epistemology is the branch of philosophy concerned with the nature and scope of knowledge. Lack of knowledge is epistemic uncertainty. 

Epistemic risk is modeled by defining the probability that the risk will occur, the time frame in which that probability is active, and the probability of an impact or consequence from the risk when it does occur.
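Those three elements can be captured in a simple risk-register entry. This is a sketch, not a standard schema: the field names, probabilities, windows, and costs are hypothetical, and multiplying the two probabilities by the cost to get an expected exposure is one common simplification.

```python
from dataclasses import dataclass


@dataclass
class EventRisk:
    """An event-based (epistemic) risk; every value used below is hypothetical."""
    statement: str
    p_occurs: float          # probability the event occurs while the window is active
    window: tuple[str, str]  # time frame in which p_occurs applies (start, end)
    p_impact: float          # probability of a consequence, given the event occurs
    impact_cost: float       # cost of that consequence ($K)

    def exposure(self) -> float:
        """Expected cost of the risk: P(occur) x P(impact | occur) x cost."""
        return self.p_occurs * self.p_impact * self.impact_cost


register = [
    EventRisk("IF subcontractor data is late THEN our schedule status is wrong",
              0.4, ("2013-05", "2013-07"), 0.5, 200.0),
    EventRisk("IF we miss the next milestone THEN business value slips a quarter",
              0.3, ("2013-06", "2013-09"), 0.8, 500.0),
]

for risk in register:
    print(f"{risk.exposure():7.1f}  {risk.statement}")
print(f"{sum(r.exposure() for r in register):7.1f}  total expected exposure")
```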

Risk statements are used to define and model these event-based risks:

  • IF-THEN - if we miss our next milestone, then the project will fail to achieve its business value during the next quarter.
  • CONDITION-CONCERN - our subcontractor has not provided enough information for us to status the schedule, and our concern is the schedule is slipping and we don't know it.
  • CONDITION-EVENT-CONSEQUENCE - our status shows there are some tasks behind schedule, so we could miss our milestone, and the project will fail to achieve its business value in the next quarter.
For these types of risks we can have an explicit or an implicit risk handling plan. I use the word handling with a specific purpose. We handle risks in a variety of ways, and mitigation is one of those ways. But the risk handling work is actual work. It is in the schedule. We are doing work to mitigate the risk: we are buying down the risk, or we are retiring the risk. In all cases, we are spending money and consuming time to reduce the probability that the risk will occur. Or we could be spending money and consuming time to reduce the impact of the risk when it does occur. In both cases we are taking action to address the risk.

The second approach to handling an epistemic risk is to have Management Reserve to cover the cost of the consequences when the risk occurs. Sometimes the term contingency is used, and Management Reserve and Contingency may be used together. In both cases, money is set aside to handle the risk. We need time as well, so we may also have schedule reserve. Schedule reserve is often confused with schedule margin, but it is still needed.
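To compare the two approaches on a single risk, here is a sketch under simple expected-value assumptions. The numbers are hypothetical, and sizing Management Reserve to the expected cost is a simplification (reserve is often sized to a confidence level instead). Scheduled handling work pays off when its cost plus the residual exposure is lower than the exposure we would otherwise reserve for.

```python
def handling_options(p: float, impact: float, mitigation_cost: float,
                     p_after: float, impact_after: float) -> dict[str, float]:
    """Expected cost of two handling approaches for one event-based risk.

    reserve:  accept the risk and set aside Management Reserve for its expected cost.
    mitigate: pay for scheduled risk-handling work that lowers the probability
              and/or the impact, then reserve for the residual expected cost.
    """
    return {
        "reserve": p * impact,
        "mitigate": mitigation_cost + p_after * impact_after,
    }


# Hypothetical risk: 30% chance of a 500 ($K) consequence.
# First option buys down the probability; second option reduces the impact.
print(handling_options(0.3, 500.0, mitigation_cost=40.0, p_after=0.1, impact_after=500.0))
print(handling_options(0.3, 500.0, mitigation_cost=30.0, p_after=0.3, impact_after=300.0))
```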

Risk Management is how Adults Manage Projects

One of the posters gave what I'd consider a lame response to the processes and seeming complexity of managing risks on non-trivial projects, stating "you're making this too complex - Just Do It." It was lame. Here's the response to those who object in whatever way to doing risk management.

First, answer the question: what is the value at risk for your project? Don't know? Go find out. Then ask the project sponsor, or the person giving you money to manage the project, if they would be willing to lose that money outright, just writing it off when the risk comes true. The answer would probably be no. So go do the risk management process.

Here's Tim Lister's advice. The section title is Lister's quote and should be used every time some lame response comes back about risk management.

Tim Lister

Related articles:
  • Four components of risk
  • Time to Revisit The Risk Discussion
  • Uncertainty is the Source of Risk
  • Deterministic versus Stochastic Trends in Earned Value Management Data
Categories: Project Management

Personal Kanban and Iterations, Day 4

I’m still chugging along, making great progress. I took some interruptions yesterday, as many people do. They are not reflected on my kanban. They are in my calendar, which I am not showing you :-)

[Image: personal kanban board, day 4]

A potential client emailed and asked for a call. I said yes, and we arranged for a call that day. Could I have put it on my kanban? Yes. Did I bother? No. Does that make me a bad person? No. It’s my kanban, not yours.

I don’t track metrics from my kanban. If I did, I would want that and the other calls there. But I don’t, so it’s fine.

I’m using my kanban to help me to get to done on my tasks, not to track my every piece of work. I’m using it to not forget work. I have a couple of phone calls this morning and a phone call this afternoon. I hope to complete one of the workshops today. Maybe.

I have a workout tomorrow and a number of phone calls, so I might not complete anything tomorrow. We will see. On the other hand, I am whittling down my list to something manageable. I no longer feel anxious about it. I can see my progress. And, I have managed to blog this week. I am a happy, productive woman.

And, that is what personal kanban is all about.

Categories: Project Management

London Workshops Almost Full, May 16 & 17, 2013

Are you considering joining me in my Coaching or Project Management workshops in London on May 16 or May 17, 2013? If so, please decide quickly.

I have room for two more people in the coaching workshop. I have room for three more people  in the project management workshop. When those places are gone, they are gone. That’s it, no more. I will run a waiting list.

If you are considering it but are not sure, email me.

 

Categories: Project Management

Daily Process Thoughts, Are Product Owners Tour Guides?, May 1, 2013

If you have ever visited a major tourist site, you have seen tour guides shepherding groups of camera-toting tourists. It is easy to see the tour guide role as that of a leader. A typical tour guide plans the tactical logistics of the tour, herds the tour group to ensure everyone is moving in the same direction, and implements the vision of the tour planner to deliver value. The goal of our tour guide is to make sure the team begins and ends together, that no one gets lost, and that the goal of the tour is accomplished. The role provides administrative and tactical leadership to the tour group. The tour guide is not playing the role of the product owner.

In Agile projects the product owners provide visionary leadership. Tactical leadership and administration, the tour guide role, are generally diffused across the entire team. This arrangement of roles is facilitated by the application of two Agile principles. The first is the principle that directs the business and IT personnel to work together on a daily basis. The second principle in play here is that of self-organizing teams. For example, one mechanism that spreads the role of tour guide across the team is the backlog prioritized by the product owner. The backlog represents the vision in bite-sized chunks that the team can then plan and execute. Another example of tactical leadership that the team drives is the standup meeting, in which the whole team acts as cat herders. So, on an Agile project, who is the tour guide that herds the team toward the product owner's vision? The answer is that the role is spread across the team, and that Agile techniques help make sure that we start and end in the correct place.


Categories: Process Management

Google API infrastructure outage yesterday

Google Code Blog - Thu, 05/02/2013 - 01:59
By Louis Ryan, Software Engineer

We know that developers around the world depend on our APIs for their apps, sites and businesses every day. Unfortunately, we experienced an outage of the Google API serving infrastructure yesterday, April 30. This outage impacted most Google APIs, resulting in requests failing with a 500 error code. Additionally, users may have experienced missing features or capabilities from some Google services that rely on these APIs.

At 6:26 pm Pacific Time, we pushed a config change that inadvertently caused a widespread outage of our API infrastructure.  Our normal rollback procedure failed, delaying the rollback until 7:22 pm, at which time APIs started to recover. The outage was completely resolved by 8:00 pm.

We are making several changes to help ensure this issue won’t happen again. We’ve identified some key improvements to our release and rollback process that we are implementing immediately. Reliability is a top priority at Google, and we are continuously making improvements to our systems. We apologize to everyone who was affected.


Louis Ryan is an engineer on the API platforms team in Mountain View. Louis is passionate about making APIs faster, more consistent, and reliable.

Posted by Scott Knaster, Editor
Categories: Programming

Announcing the Release of WebMatrix 3

ScottGu's Blog - Scott Guthrie - Wed, 05/01/2013 - 20:53

I’m excited to announce the release of WebMatrix 3.  WebMatrix is a free, lightweight web development tool we first introduced in 2010, and which provides a great, focused web development experience for ASP.NET, PHP, and Node.js.  

Today’s release includes a ton of great new features.  You can easily get started by downloading it, and watching an introduction video:


Some of the highlights of today’s release include deep Windows Azure integration, source control tooling for Git and TFS, and a new remote editing experience. 

Windows Azure Integration

With WebMatrix 3, we are making it really easy to move to the cloud. 

The first time you launch WebMatrix 3, there’s an option to sign into Windows Azure.  You can sign in using the same credentials you use with the Windows Azure Management Portal:


Once you are signed in, your Windows Azure account and subscriptions are integrated directly within WebMatrix. You have the option to create up to 10 free sites on Windows Azure:


You can use the "My Sites" button to browse and edit the web sites you already have hosted on Windows Azure.  You can also use the New button to directly create and host new web sites on Windows Azure – and create either a blank new site, or a site created from the Windows Azure Web App Gallery (which lets you start with templates like Umbraco, WordPress, Drupal, etc):


In this case we’ll create a new web site using the popular Umbraco CMS solution – one of the templates in the Windows Azure Web Site Gallery:


When you select this template, WebMatrix can help you create a new Web Site to host it on Windows Azure, and associate all of the publishing information you need to publish it and keep it in sync with your editing environment within WebMatrix:


Once created you get a tailored experience within WebMatrix that provides integrated Umbraco (or WordPress or Drupal, etc) editing functionality inside the tool:


WebMatrix also lets you open and edit any appropriate files, with editing and code IntelliSense support:


When you are done, you can one-click publish the site to Windows Azure using the Publish command in the top left of the tool. WebMatrix will provide real-time feedback as it uploads and publishes the site:


The end result is a simple, fast and super effective way to edit your sites locally and host and manage them in Windows Azure. 

Watch this great video as Eric builds a site with WebMatrix 3 and deploys it to Windows Azure.

Source Control with Git and TFS

One of the most requested features in WebMatrix 2 was support for version control.  WebMatrix 3 now supports both Git and TFS.  The source control experience is extensible, and we’ve worked with several partners to include rich support for Team Foundation Service, CodePlex and GitHub:


The Git tooling works with your current source repositories, configuration, and existing tools.  The experience includes support for commits, branching, multiple remotes, and works great for publishing Web Sites to Windows Azure:


The TFS experience is focused on making common source control tasks easy.  It matches up well with Team Foundation Service, our hosted TFS solution that provides free private Git and TFS repositories.

Watch these great videos of Justin giving a tour of the Git and TFS integration in WebMatrix 3.

Remote Editing

In WebMatrix 2, we added the ability to open your Web Site directly from the Windows Azure Management Portal.  With WebMatrix 3, we’ve rounded out that experience by providing an amazing developer experience for live remote editing of your sites.   The new My Sites gallery now allows you to open existing web sites on your local machine, or to remotely edit sites that are hosted in Windows Azure:


While working with the remote site, IntelliSense and the other tools work as though the site was on your local machine.  But when you save changes it pushes them directly to the remote hosted site.  This makes it ideal for when you want to make quick changes in a hurry.

If you want to work with the site locally, you can click the ‘download’ button to install and configure any runtime dependencies, and work with the site on your machine:


Watch this video of Thao showing you how to edit your live site on Windows Azure using WebMatrix 3.

Summary

WebMatrix 3 includes a seamless experience for working with sites in Windows Azure, source control support for working with Git and TFS, and a vastly improved remote editing experience. These are just a few of the hundreds of improvements throughout the application, including an extension for PHP validation and TypeScript support.

You can easily get started with WebMatrix by downloading it for free, and watching an introduction video about it:


We look forward to seeing what you build with the new release!

Hope this helps,

Scott

P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu

Categories: Architecture, Programming

Maximizing Project Value - a Review - Chapter 2

Herding Cats - Glen Alleman - Wed, 05/01/2013 - 18:04

Chapter 1 of Maximizing Project Value introduced the idea of value. Chapter 2 speaks to where and how that value flows. A couple of years ago there was confusion about the term value, so let's restate it here. The value John speaks to is the business value. He speaks to Earned Value later in this book, but even more so in Project Management the Agile Way.

Chapter 2 Highlights

  • There is a distinction between project value and business value - project value is measured by Earned Value: how much of your budget did you earn? Business value must be derived from the business strategy. A business case is fine, but a measurable strategy is better. Balanced Scorecard is the best way to connect business value with Earned Value. Here's how to do that: Notes on Balanced Scorecard.
  • Interests of the customer must be balanced as the value is developed - one way to do that is with a Scorecard that connects project performance with strategy. Then the customer can see the Measures of Effectiveness (MoE) of the project's outcomes against the fulfillment of the strategy.
  • Projects are the instrument of strategy - yep this is how strategy gets implemented. The literature shows strategies fail most often during execution.
  • There is a six-step process from opportunity to projects. Projects drive value - I prefer the Goal Question Measurement approach to making these connections. John's approach is straightforward and simple.
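For readers new to the project-value side of that distinction, the standard Earned Value indicators can be sketched in a few lines. The status figures are hypothetical, and EAC = BAC / CPI is only one of several common Estimate at Completion formulas. None of this measures business value, which still has to be derived from the strategy.

```python
def earned_value_metrics(bac: float, percent_complete: float,
                         planned_value: float, actual_cost: float) -> dict[str, float]:
    """Standard Earned Value Management indicators for one point in time."""
    ev = bac * percent_complete  # Earned Value: budgeted cost of the work actually performed
    cpi = ev / actual_cost       # Cost Performance Index (< 1 means over cost)
    spi = ev / planned_value     # Schedule Performance Index (< 1 means behind schedule)
    return {
        "EV": ev,
        "CV": ev - actual_cost,    # cost variance
        "SV": ev - planned_value,  # schedule variance, in currency
        "CPI": cpi,
        "SPI": spi,
        "EAC": bac / cpi,          # one common Estimate at Completion formula
    }


# Hypothetical status: $1,000K budget, 40% complete, $500K planned to date, $450K spent.
print(earned_value_metrics(bac=1000.0, percent_complete=0.4, planned_value=500.0, actual_cost=450.0))
```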

I've found what I was looking for in Chapter 7, which makes the book critical to our success: connecting Business Value with Capabilities-Based Planning. I won't skip ahead. For now, Chapter 2 is the starting point for putting these ideas to work.

Related articles Understanding Balanced Scorecard and Performance Measure All Projects Must Have a Strategy for Success Calculating "Earned Value" Must Read Book
Categories: Project Management

Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue

In NoSQL: Past, Present, Future, Eric Brewer has a particularly fine section explaining the often hard-to-understand ideas of BASE (Basically Available, Soft State, Eventually Consistent), ACID (Atomicity, Consistency, Isolation, Durability), and CAP (Consistency, Availability, Partition Tolerance) in terms of a pernicious, long-standing myth about the sanctity of consistency in banking.

Myth: Money is important, so banks must use transactions to keep money safe and consistent, right?

Reality: Banking transactions are inconsistent, particularly for ATMs. ATMs are designed to have a normal case behaviour and a partition mode behaviour. In partition mode Availability is chosen over Consistency.

Why? 1) Availability correlates with revenue and consistency generally does not. 2) Historically there was never an idea of perfect communication so everything was partitioned...
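A toy model makes the two modes concrete. This is a hypothetical sketch, not how any particular bank implements it: the per-withdrawal offline limit, the queue, and the overdraft-on-reconnect behaviour are illustrative assumptions. While partitioned, the ATM keeps dispensing cash (availability, and therefore revenue) within a limit that bounds the risk, and any inconsistency is reconciled after the partition heals.

```python
from dataclasses import dataclass, field


@dataclass
class Atm:
    """Toy ATM that picks consistency when connected and availability when partitioned."""
    ledger: dict[str, int]     # stand-in for the bank's authoritative balances
    connected: bool = True
    offline_limit: int = 200   # hypothetical cap per withdrawal in partition mode
    pending: list[tuple[str, int]] = field(default_factory=list)

    def withdraw(self, account: str, amount: int) -> bool:
        if self.connected:
            # Normal case: check the authoritative balance before dispensing cash.
            if self.ledger[account] < amount:
                return False
            self.ledger[account] -= amount
            return True
        # Partition mode: stay available, bound the risk with a limit,
        # and queue the withdrawal for later reconciliation.
        if amount > self.offline_limit:
            return False
        self.pending.append((account, amount))
        return True

    def reconnect(self) -> dict[str, int]:
        """Replay queued withdrawals; a negative balance is an overdraft to settle later."""
        self.connected = True
        for account, amount in self.pending:
            self.ledger[account] -= amount
        self.pending.clear()
        return self.ledger


atm = Atm(ledger={"alice": 150})
atm.connected = False       # the network partition starts
atm.withdraw("alice", 100)
atm.withdraw("alice", 100)  # both succeed: availability over consistency
print(atm.reconnect())      # {'alice': -50}
```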

Categories: Architecture