
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Python: Find the highest value in a group

Mark Needham - Sun, 01/25/2015 - 13:47

In my continued playing around with a How I met your mother data set I needed to find out the last episode that happened in a season so that I could use it in a chart I wanted to plot.

I had this CSV file containing each of the episodes:

$ head -n 10 data/import/episodes.csv
NumberOverall,NumberInSeason,Episode,Season,DateAired,Timestamp
1,1,/wiki/Pilot,1,"September 19, 2005",1127084400
2,2,/wiki/Purple_Giraffe,1,"September 26, 2005",1127689200
3,3,/wiki/Sweet_Taste_of_Liberty,1,"October 3, 2005",1128294000
4,4,/wiki/Return_of_the_Shirt,1,"October 10, 2005",1128898800
5,5,/wiki/Okay_Awesome,1,"October 17, 2005",1129503600
6,6,/wiki/Slutty_Pumpkin,1,"October 24, 2005",1130108400
7,7,/wiki/Matchmaker,1,"November 7, 2005",1131321600
8,8,/wiki/The_Duel,1,"November 14, 2005",1131926400
9,9,/wiki/Belly_Full_of_Turkey,1,"November 21, 2005",1132531200

I started out by parsing the CSV file into a dictionary of (seasons -> episode ids):

import csv
from collections import defaultdict

# Map each season to the list of overall episode ids in that season (Python 2 syntax).
seasons = defaultdict(list)
with open("data/import/episodes.csv", "r") as episodesfile:
    reader = csv.reader(episodesfile, delimiter=",")
    reader.next()  # skip the header row
    for row in reader:
        seasons[int(row[3])].append(int(row[0]))

print seasons

which outputs the following:

$ python blog.py
defaultdict(<type 'list'>, {
  1: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], 
  2: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
  3: [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64], 
  4: [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], 
  5: [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112], 
  6: [113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136], 
  7: [137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160], 
  8: [161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184], 
  9: [185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208]})

It’s reasonably easy to transform that into a dictionary of (season -> max episode id) with the following couple of lines:

for season, episode_ids in seasons.iteritems():
    seasons[season] = max(episode_ids)
 
>>> print seasons
defaultdict(<type 'list'>, {1: 22, 2: 44, 3: 64, 4: 88, 5: 112, 6: 136, 7: 160, 8: 184, 9: 208})
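
For comparison, the same transformation can be written as a single dict comprehension - a minimal sketch in Python 3 syntax (where items() replaces iteritems()), assuming the seasons dictionary built above:

# Collapse each season's list of episode ids down to the highest id.
last_episodes = {season: max(episode_ids) for season, episode_ids in seasons.items()}
print(last_episodes)
# {1: 22, 2: 44, 3: 64, 4: 88, 5: 112, 6: 136, 7: 160, 8: 184, 9: 208}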

This works fine but it felt very much like a dplyr problem to me so I wanted to see whether I could write something cleaner using pandas.

I started out by capturing the seasons and episode ids in separate lists and then building up a DataFrame:

import csv
import pandas as pd
from pandas import DataFrame

# Collect the Season and NumberOverall columns in parallel lists (Python 2 syntax).
seasons, episode_ids = [], []
with open("data/import/episodes.csv", "r") as episodesfile:
    reader = csv.reader(episodesfile, delimiter=",")
    reader.next()  # skip the header row
    for row in reader:
        seasons.append(int(row[3]))
        episode_ids.append(int(row[0]))

df = DataFrame.from_items([('Season', seasons), ('EpisodeId', episode_ids)])
 
>>> print df.groupby("Season").max()["EpisodeId"]
Season
1          22
2          44
3          64
4          88
5         112
6         136
7         160
8         184
9         208

Or we can simplify that and read the CSV file directly into a DataFrame:

df = pd.read_csv('data/import/episodes.csv', index_col=False, header=0)
 
>>> print df.groupby("Season").max()["NumberOverall"]
Season
1          22
2          44
3          64
4          88
5         112
6         136
7         160
8         184
9         208

Pretty neat. I need to get more into pandas.

Categories: Programming

Re-read Saturday: Part Three: Implications for the Twenty-First Century, John P. Kotter Chapters 11 and 12


We complete the re-read of John P. Kotter’s book Leading Change by reviewing the implications from the last two chapters of the book. Part Three paints the picture of a world in which the urgency for change will not abate and perhaps might even increase. In Chapter 11, titled The Organization of the Future, Kotter suggests that while in the past a single key leader could drive change, collaboration at the top of organizations is now required due to both the rate and complexity of change. He argues that one person simply can’t have the time and expertise to manage, lead, communicate, provide vision . . . you get the point. The message of the chapter is that for organizations of any type to prosper in the 21st century, the ability to create and communicate vision is critical. That skill needs to be fostered and developed over the long term just like any other significant organizational asset. Long-term and continuous development of leadership is not accomplished simply by providing a two-week course in leadership. While leadership is critical, it only goes so far in creating and fostering change and must be supplemented by a culture of empowerment. Broad-based empowerment allows organizations to tap a wide range of knowledge and energy at all levels of the organization.

Boiling the message of Chapter 11 down, Kotter suggests that an organization at home with the dynamic nature of the 21st century will require a lean, non-bureaucratic structure that leverages a wide range of performance data. For example, in an empowered organization performance data must be gathered and analyzed from many sources. Performance data (e.g. customer satisfaction, productivity, returns, quality and others) gains maximum power when everyone has access to the data in order to drive continuous improvements. The culture of the new organization needs to shift from internally focused, command-and-control to externally focused and non-bureaucratic. While Kotter does not use the terms lean and Agile, the organization he describes as tuned to the 21st century reflects the tenets of lean and agile.

Chapter 12, titled Leadership and Lifelong Learning, circles back to the concept of leadership. It is a constant thread across all facets of the eight-stage model of change detailed in Leading Change. Kotter describes the need for leaders to continually develop competitive capacity (the capability to deal with an increasingly competitive and dynamic environment). The model Kotter uses to describe the development of competitive capacity begins with personal history and flows through competitive drive, lifelong learning, and skills and abilities to competitive capacity. Lifelong learning is an input and a tool for developing and honing skills and abilities. Skills and abilities feed competitive capacity. In our re-read of The Seven Habits of Highly Effective People, Stephen Covey culminated the seven habits with the habit called Sharpening the Saw. Sharpening the Saw is a prescription for balanced self-renewal. Lifelong learning is an important component of balanced self-renewal. Whether you read Kotter or Covey, the need to continuously learn is an inescapable necessity for any leader.

As a rule, I am never overwhelmed by the chapters that follow the meat of most self-help books (I consider Leading Change a management self-help book, part of a continuum on which Covey’s Seven Habits of Highly Effective People would also be found). Part Three of Leading Change ties the book together by reinforcing the need for the eight-stage model for change and by addressing the need for continuously sharpening the saw. Kotter’s model is a tool that leaders must apply; therefore, organizations and leaders must foster the capacity to address needed changes.

Re-read Summary

Change is a fact of life. John P. Kotter’s book, Leading Change, defines his famous eight-stage model for change. The first stage of the model is establishing a sense of urgency. A sense of urgency provides the energy and rationale for any large, long-term change program. Once a sense of urgency has been established, the second stage in the eight-stage model for change is the establishment of a guiding coalition. If a sense of urgency provides the energy to drive change, a guiding coalition provides the power for making change happen. A vision, built on the foundation of urgency and a guiding coalition, represents a picture of a state of being at some point in the future. Developing a vision and strategy is only a start; the vision and strategy must be clearly and consistently communicated to build the critical mass needed to make change actually happen. Once an organization is wound up and primed, the people within the organization must be empowered and let loose to create change. Short-term wins provide the feedback and credibility needed to deliver on the change vision. The benefits and feedback from the short-term wins and other environmental feedback are critical for consolidating gains and producing more change. Once a change has been made it needs to be anchored so that the organization does not revert to older, comfortable behaviors, throwing away the gains it has invested blood, sweat and tears to create.

The need for change is not abating. The eight-stage model for change requires leadership and vision.  Organizations need to foster leadership while both organizations and the people in those organizations must continually learn and hone their skills.

Next week we will review the list of books that readers of the blog and listeners to the podcast have identified as having a major impact on their careers, and vote on the next book we will tackle on Re-read Saturday. Right now The Mythical Man-Month by Fred Brooks is at the top of the list. Care to influence the list? Let me know the two books that most influenced your career.


Categories: Process Management

Stuff The Internet Says On Scalability For January 23rd, 2015

Hey, it's HighScalability time:


Elon Musk: The universe is really, really big  [Gigapixels of Andromeda [4K]]
  • 90: is the new 50 for woman designer; $656.8 million: 3 months of Uber payouts; $10 billion: all it takes to build the Internet in space; 1 billion: registered WeChat users
  • Quotable Quotes:
    • @antirez: Tech stacks, more replaceable than ever: hardware is better, startups get $$ (few nodes + or - who cares), alternatives countless.
    • Olivio Sarikas: If every Star in this Image was a 2 millimeter Sandcorn you would end up with 1110 kg of Sand!!!!!!!!!
    • Chad Cipoletti: In even simpler terms, we see brands as people.
    • @timoreilly: Love it: “We need a stack, not a pile” says @michalmigurski.
    • @neha: I would be very happy to never again see a distributed systems paper eval on a workload that would fit on one machine.
    • @etherealmind: OH: "oh yeah, the extra 4 PB of storage is being installed today. Its about 4 racks of gear".
    • @lintool: Andrew Moore: Google's ecommerce platform ingests 100K-200K events per second continuously. 

  • Programming as myth building. Myths to Live By: The true symbol does not merely point to something else. It contains in itself a structure which awakens our consciousness to a new awareness of the inner meaning of life and of reality itself. A true symbol takes us to the center of the circle, not to another point on the circumference.

  • Not shocking at all: "We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code...A majority (77%) of the failures require more than one input event to manifest, but most of the failures (90%) require no more than 3." Really, who has the time? More on human nature in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems.

  • Let simplicity fail before climbing the complexity ladder. Scalability! But at what COST?: "Big data systems may scale well, but this can often be just because they introduce a lot of overhead. Rather than making your computation go faster, the systems introduce substantial overheads which can require large compute clusters just to bring under control. In many cases, you’d be better off running the same computation on your laptop." But notice the kicker: "it took some work for parallel union-find." Replacing smart work with brute force is often the greater win. What are a few machine cycles between friends?

  • Programming is the ultimate team sport, so Why are Some Teams Smarter Than Others? The smartest teams were distinguished by three characteristics. First, their members contributed more equally to the team’s discussions. Second, their members can better read complex emotional states. Third, teams with more women outperformed teams with more men.

  • WhatsApp doesn't understand the web. Interesting design and discussions. Using proprietary Chrome APIs is a tough call, but this is more perplexing: "Your phone needs to stay connected to the internet for our web client to work." Is this for consistency reasons? To make sure the phone and the web stay in sync? Is it for monetization reasons? It does create a closed proxy that effectively prevents monetization leaks. It's tough to judge a solution without understanding the requirements, but there must be something compelling to impose so many limitations.

  • Roman Leventov's analysis of Redis data structures, in which Salvatore 'antirez' Sanfilippo addresses point by point criticisms of Redis' implementation. People love Redis, and part of that love has to come from what a good guy antirez is. Here he doesn't go all black diamond alpha nerd in the face of a challenge. He admits where things can be improved. He explains design decisions in detail. He advances the discussion with grace, humility, and smarts. A worthy model to emulate.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Agile Roles: What do product owners do other than make decisions?

 

The product owner role is anything but boring.

The product owner role is anything but boring.

The role of the product owner is incredibly important. The decision-making role of a product owner helps grease the skids for the team so that they deliver value efficiently and effectively. That said, there is more to the role than making decisions. In the survey of practitioners (Agile Roles: What does a product owner do?), the next four items were:

      1. Attends Scrum meetings
      2. Prioritizes the user stories (and backlog)
      3. Grooms backlog
      4. Defines product vision and features

The product owner is a core member of the team. Participating in the Scrum meetings ensures that the voice of the customer is woven into all levels of planning and is not just a hurdle to be surmounted in a demo. When I was taught Scrum, the participation of the product owner was optional at the daily stand-up, in the retrospective and in more technical parts of sprint planning. Experience has taught me that optional typically translates to not present, and not present translates into defects and rework. Note, on the original list #15 was buy the pizza. I think the Scrum meetings are a good place to occasionally spring for pizza or DONUTS.

The backlog is “owned” by the product owner. The product owner prioritizes the backlog based on interaction with the whole team and other stakeholders. There are many techniques for prioritizing the backlog, ranging from business value to technical complexity to the squeaky wheel (usually not a good method). Regardless of the method, the final prioritization is delivered by the product owner.

As projects progress the backlog evolves. That evolution reflects new stories, new knowledge about the business problem, changes in the implementation approach and the need to break stories into smaller components. The process for making sure stories are well-formed, granular enough to complete and have acceptance criteria is story grooming. Grooming is often a small-team affair; however, the product owner is typically part of the grooming team. Techniques like the Three Amigos are useful for structuring the grooming approach.

The product owner interprets the sponsor’s (the person with the checkbook and political capital to authorize the project) vision by providing the team with the product vision. The product vision represents the purpose or motivation for the project. Until the project is delivered, the vision is the picture that anyone involved with the project should be able to describe. Delivering the vision for the product and its features is a leadership role that helps teams decide how to deliver a function. Knowing where the project needs to end up gives the team the knowledge that supports making technical decisions.

The product owner is a leader, a doer, a visionary and a team member. As the voice of the customer, the product owner describes the value proposition for the project from the business’ point of view. As part of the team, the product owner interprets and synthesizes information from other team members and outside stakeholders. This is reflected in the decisions and priorities that shape the project and the value it delivers.

 


Categories: Process Management

As a DBA Expert, which database would you choose?

This is a guest post by Jenny Richards, a professional database administrator who is currently employed at Remote DBA.

In the world of databases, there is no single silver bullet that fits every gun. How you select the database to use depends heavily on every other factor of your work:

  • Who are you and what do you do? 
  • What is your end goal – what are you working to achieve?
  • How much data do you intend to store?
  • On what language and OS platforms do your applications run?
  • What is your budget?
  • Will you also require data warehousing, decision support systems and/or BI?
Background information
Categories: Architecture

Business Analyst Tip: Seilevel Approach Works

Software Requirements Blog - Seilevel.com - Thu, 01/22/2015 - 16:00
I still remember my first project at Seilevel, more than 9 years ago. I struggled with the information flood and came out of that project with at least one clear goal: to get better at acquiring new information. In October, I started on a new project with a new client in an industry new to […]
Categories: Requirements

How to Start an Agile Project

Making the Complex Simple - John Sonmez - Thu, 01/22/2015 - 16:00

In this video, I quickly outline how I would start an Agile project from the ground up. I go over the idea of having a single product owner who is responsible for the product, rather than a committee of stakeholders.

The post How to Start an Agile Project appeared first on Simple Programmer.

Categories: Programming

Building a Credible Performance Measurement Baseline

Herding Cats - Glen Alleman - Thu, 01/22/2015 - 03:03

Many times I hear about Cost of Delay, Deliver Value, Measure story points, or Measure Stories, and a myriad of other assessments of project performance - all of which (OK, most of which) are examples of Open Loop Control.

Back in 2014, we had a paper in a publication of the College of Performance Management, starting on Page 17. As well, a colleague, Nick Pisano (CDR, US Navy, Retired), has a post on the same topic at his blog.


The notion of a baseline, let alone a Performance Measurement Baseline, is at the heart of Closed Loop Control of all processes, from the heating and air conditioning system in your house, to the flight controls on the 737-700 winging its way back home to Denver, to the project you're working on - using whatever project management method or software development method of your choosing.

The notion that we can manage anything, the temperature of the room, the nice soft ride in the 737, or the probability of showing up on or before the need date, at or below the needed cost, with the needed capabilities - and NOT have a baseline to steer to is simply wrong. 

Below is the framework for Closed Loop control. This paradigm says simply:

  • State where we are going in units of time, cost, and technical performance:
    • I need this set of features (Capabilities) to be available for use by the customer on or before this date, with some confidence level, for some cost - again with a confidence level.
    • With these features - provided on a planned date, for a planned cost - I can then assess the progress toward that planned date, planned cost, and planned capabilities.
  • With the Planned data and the assessment of the actual data - cost, schedule, and technical performance:
    • Technical Performance is actually not enough
    • Measures of Effectiveness are needed
    • Measures of Performance as well
    • And other ...ilities of the outcomes - reliability, maintainability, serviceability, stability, etc.
  • Then with these measures we can generate an error signal - between planned to date and actual to date - to determine several critical things, without which we're flying Open Loop.
    • Given our Performance to Date and the Planned Performance at this point, how far behind are we, how over budget are we, and how close are we to getting this gadget to work as needed?
    • With this data, we can then make a decision.
  • Making those decisions means:
    • ESTIMATING both the to-be target - where we should be at this point in the project for cost, schedule and technical performance - and where we should be to close the gaps between our target and the actual progress to date.
    • ESTIMATING what cost, effort, changes in technical direction are needed to close those gaps.

ESTIMATING IS THE BASIS OF DECISION MAKING - it can't be any clearer than that.
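
As a concrete illustration of that error signal, here is a minimal, hypothetical sketch (the names and numbers are invented, not from any real baseline) that compares planned and actual performance to date and prints the gaps a decision would have to address:

# Hypothetical closed-loop check: compare planned vs. actual to date and
# produce the error signal used to decide on corrective action.
planned = {"cost": 120000.0, "features_done": 10}   # baseline to date (invented numbers)
actual  = {"cost": 135000.0, "features_done": 7}    # measured performance to date

cost_error     = actual["cost"] - planned["cost"]                     # > 0 means over budget
progress_error = planned["features_done"] - actual["features_done"]  # > 0 means behind plan

print("Cost variance to date:     %+.0f" % cost_error)
print("Feature shortfall to date: %+d features" % progress_error)

# The decision step then needs an ESTIMATE: what will it cost, and how long will
# it take, to close these gaps and still meet the needed date?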

Categories: Project Management

Python/pdfquery: Scraping the FIFA World Player of the Year votes PDF into shape

Mark Needham - Thu, 01/22/2015 - 01:25

Last week the FIFA Ballon d’Or 2014 was announced, and along with the announcement of the winner the individual votes were also made available.

Unfortunately they weren’t made open in a way that Ben Wellington (of IQuantNY fame) would approve of – the choice of format for the data is a PDF file!

I wanted to extract this data to play around with it but I wanted to automate the extraction as I’d done when working with Google Trends data.

I had a quick look for PDF scraping libraries in Python and R and eventually settled on Python’s pdfquery, mainly because there was lots of documentation which made it easy to get started.

One way you scrape data from a PDF is by locating an element on the page and then grabbing everything within a bounded box relative to that element.

In my case I had 17 pages all of which had a heading for each of six columns.


I wanted to grab the data in each of those columns but initially struggled working out what elements I should be looking for until I came across the following function which allows you to dump an XML version of the PDF to disk:

import pdfquery
pdf = pdfquery.PDFQuery("fboaward_menplayer2014_neutral.pdf")
pdf.load()
pdf.tree.write("/tmp/yadda", pretty_print=True)

The output looks like this:

$ head -n 10 /tmp/yadda
<pdfxml ModDate="D:20150110224554+01'00'" CreationDate="D:20150110224539+01'00'" Producer="Microsoft&#174; Excel&#174; 2010" Creator="Microsoft&#174; Excel&#174; 2010">
  <LTPage bbox="[0, 0, 841.8, 595.2]" height="595.2" pageid="1" rotate="0" width="841.8" x0="0" x1="841.8" y0="0" y1="595.2" page_index="0" page_label="">
    <LTAnon> </LTAnon>
    <LTTextLineHorizontal bbox="[31.08, 546.15, 122.524, 556.59]" height="10.44" width="91.444" word_margin="0.1" x0="31.08" x1="122.524" y0="546.15" y1="556.59"><LTTextBoxHorizontal bbox="[31.08, 546.15, 122.524, 556.59]" height="10.44" index="0" width="91.444" x0="31.08" x1="122.524" y0="546.15" y1="556.59">FIFA Ballon d'Or 2014 </LTTextBoxHorizontal></LTTextLineHorizontal>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>

Having scanned through the file I realised that what I needed to do was locate the ‘LTTextLineHorizontal’ element for each heading and then grab all the ‘LTTextLineHorizontal’ elements that appeared in that column.

I started out by trying to grab the ‘Name’ column on the first page:

>>> name_element = pdf.pq('LTPage[pageid=\'1\'] LTTextLineHorizontal:contains("Name")')[0]
>>> name_element.text
'Name '

Next I needed to get the other elements in that column. With a bit of trial and error I ended up with the following code:

x = float(name_element.get('x0'))
y = float(name_element.get('y0'))
cells = pdf.extract( [
         ('with_parent','LTPage[pageid=\'1\']'),
         ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y-500, x+150, y))
    ])
 
>>> [cell.text.encode('utf-8').strip() for cell in cells['cells']]
['Amiri Islam', 'Cana Lorik', 'Bougherra Madjid', 'Luvu Rafe Talalelei', 'Sonejee Masand Oscar', 'Amaral Felisberto', 'Liddie Ryan', 'Griffith Quinton', 'Messi Lionel', 'Berezovskiy Roman', 'Breinburg Reinhard', 'Jedinak Mile', 'Fuchs Christian', 'Sadigov Rashad', 'Gavin Christie', 'Hasan Mohamed', 'Mamun Md Mamnul Islam', 'Burgess Romelle', 'Kalachou Tsimafei', 'Komany Vincent', 'Eiley Dalton', 'Nusum John', 'Tshering Passang', 'Raldes Ronald', 'D\xc5\xbeeko Edin', 'Da Silva Santos Junior Neymar', 'Ceasar Troy', 'Popov Ivelin', 'Kabore Charles', 'Ntibazonkiza Saidi', 'Kouch Sokumpheak']

I cleaned that up and generified it to work for any page and for columns of different widths. This is what the function looks like:

def extract_cells(page, header, cell_width):
    # Find the column heading on the given page.
    name_element = pdf.pq('LTPage[pageid=\'%s\'] LTTextLineHorizontal:contains("%s")' % (page, header))[0]
    # Use the heading's bottom-left corner as the anchor for the column's bounding box.
    x = float(name_element.get('x0'))
    y = float(name_element.get('y0'))
    # Grab every text line inside the box below the heading (500 points deep, cell_width points wide).
    cells = pdf.extract( [
         ('with_parent','LTPage[pageid=\'%s\']' % (page)),
         ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y-500, x+cell_width, y))
    ])
    return [cell.text.encode('utf-8').strip() for cell in cells['cells']]

We can then call that for each column on the page and zip together the resulting arrays to get a tuple for each row:

roles = extract_cells(1, "Vote", 50)
countries = extract_cells(1, "Country", 150)
voters = extract_cells(1, "Name", 170)
first = extract_cells(1, "First (5 points)", 150)
second = extract_cells(1, "Second (3 points)", 150)
third = extract_cells(1, "Third (1 point)", 130)
 
>>> for vote in zip(roles, countries, voters, first, second, third)[:5]:
       print vote
 
('Captain', 'Afghanistan', 'Amiri Islam', 'Messi Lionel', 'Cristiano Ronaldo', 'Ibrahimovic Zlatan')
('Captain', 'Albania', 'Cana Lorik', 'Cristiano Ronaldo', 'Robben Arjen', 'Mueller Thomas')
('Captain', 'Algeria', 'Bougherra Madjid', 'Cristiano Ronaldo', 'Robben Arjen', 'Benzema Karim')
('Captain', 'American Samoa', 'Luvu Rafe Talalelei', 'Neymar', 'Robben Arjen', 'Cristiano Ronaldo')
('Captain', 'Andorra', 'Sonejee Masand Oscar', 'Cristiano Ronaldo', 'Mueller Thomas', 'Kroos Toni')

The next step was to write out each of those rows to a CSV file so we can use it from another program. The full script looks like this:

import pdfquery
import csv
 
def extract_cells(page, header, cell_width):
    name_element = pdf.pq('LTPage[pageid=\'%s\'] LTTextLineHorizontal:contains("%s")' % (page, header))[0]
    x = float(name_element.get('x0'))
    y = float(name_element.get('y0'))
    cells = pdf.extract( [
         ('with_parent','LTPage[pageid=\'%s\']' %(page)),
         ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y-500, x+cell_width, y))
    ])
    return [cell.text.encode('utf-8').strip() for cell in cells['cells']]
 
if __name__ == "__main__":
    pdf = pdfquery.PDFQuery("fboaward_menplayer2014_neutral.pdf")
    pdf.load()
    pdf.tree.write("/tmp/yadda", pretty_print=True)
 
    pages_in_pdf = len(pdf.pq('LTPage'))
 
    with open('votes.csv', 'w') as votesfile:
        writer = csv.writer(votesfile, delimiter=",")
        writer.writerow(["Role", "Country", "Voter", "FirstPlace", "SecondPlace", "ThirdPlace"])
        for page in range(1, pages_in_pdf + 1):
            print page
            roles = extract_cells(page, "Vote", 50)
            countries = extract_cells(page, "Country", 150)
            voters = extract_cells(page, "Name", 170)
            first = extract_cells(page, "First (5 points)", 150)
            second = extract_cells(page, "Second (3 points)", 150)
            third = extract_cells(page, "Third (1 point)", 130)
            votes = zip(roles, countries, voters, first, second, third)
            print votes
            for vote in votes:
                writer.writerow(list(vote))

The code is on github if you want to play around with it or if you just want to grab the votes data that’s there too.
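
If you just want a quick look at the result, a small sketch like this (assuming the votes.csv produced by the script above) tallies the first-place votes with pandas:

import pandas as pd

# Assumes votes.csv was written by the scraping script above.
votes = pd.read_csv("votes.csv")

# Count how many first-place (5 point) votes each player received.
print(votes["FirstPlace"].value_counts().head(10))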

Categories: Programming

Learn from my pain - 5 Lessons from Ello's Adventures in Rapid Scaling

Within one week Ello went from thousands of sessions a day to a few million sessions a day. Mike Pack wrote a great article sharing what they’ve learned: 5 Early Lessons from Rapid, High Availability Scaling with Rails.

Some of their scaling challenges: quantity of data, team size, DNS, bot prevention, responding to users, inappropriate content, and other forms of caching. What did they learn?

  1. Move the graph. User relationships were implemented on a standard Rails stack using Heroku and Postgres. The relationships table became the bottleneck. Solution: denormalize the social graph and move hot data into Redis. Redis is used for speed and Postgres is used for durability. Lesson: know the core pillar that supports your core offering and make it work. (A rough sketch of this pattern follows the list.)

  2. Create indexes early, or you're screwed. There's a camp that says only create indexes when they are needed. They are wrong. The lack of btree indexes kills query performance. Forget a unique index and your data becomes corrupted. Once the damage is done it's hard to add unique indexes later. The data has to be cleaned up and indexes take a long time to build when there's a lot of data.

  3. Sharding is cool, but not that cool. Shard all the things only after you've tried vertically scaling as much as possible. Sharding caused a lot of pain. Creating a covering index from the start and adding more RAM so data could be served from memory, not from disk, would have saved a lot of time and stress as the system scaled.

  4. Don't create bottlenecks, or do. Every new user automatically followed a system user that was used for announcements, etc. Scaling problems that would have been months down the road hit quickly as any write to the system user caused a write amplification of millions of records. The lesson here is not what you may think. While scaling to meet the challenge of the system user was a pain, it made them stay ahead of the scaling challenge. Lesson: self-inflict problems early and often.

  5. It always takes 10 times longer. All the solutions mentioned take much longer to implement than you might think. Early estimates of a couple days soon give way to the reality of much longer time hits. Simply moving large amounts of data can take days. Adding indexes to large amounts of data takes time. And with large amounts of data problems tend to happen as you get to the larger data sizes which means you need to apply a fix and start over. 
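
As a rough illustration of point 1 (moving the hot social graph into Redis while Postgres stays the durable store), here is a minimal, hypothetical sketch using the redis-py client - not Ello's actual code, just the shape of the idea:

import redis

# Hypothetical denormalized social graph: hot relationship data lives in Redis
# sets for fast reads; Postgres remains the durable system of record.
r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

def follow(follower_id, followee_id):
    # Persist the relationship in Postgres first (durability), then mirror it
    # into Redis (speed). The Postgres write is omitted here.
    r.sadd("followers:%d" % followee_id, follower_id)
    r.sadd("following:%d" % follower_id, followee_id)

def follower_ids(user_id):
    # Served entirely from memory; no relational join required.
    return {int(member) for member in r.smembers("followers:%d" % user_id)}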

The full article is excellent and filled with much more detail, which makes it well worth reading.

Categories: Architecture

We Need Planning; Do We Need Estimation?

As I write the program management book, I am struck by how difficult it is to estimate large chunks of work.

In Essays on Estimation and Manage It!, I recommend several approaches to estimation, each of which includes showing that there is no one absolute date for a project or a program.

What can you do? Here are some options:

  1. Plan to replan. Decide how much to invest in the project or program for now. See (as in demo) the project/program progress. Decide how much longer you want to invest in the project or program.
  2. Work to a target date. A target date works best if you work iteratively and incrementally. If you have internal releases often, you can see project/program progress and replan. (If you use a waterfall approach, you are not likely to meet the target with all the features you want and the defects you don’t want.) If you work iteratively and incrementally, you refine the plan as you approach the target. Notice I said refine the plan, not the estimate.
  3. Provide a 3-point estimate: possible, likely, and worst case. This is PERT estimation. (A small worked example follows this list.)
  4. Provide a percentage confidence with your estimate. You think you can release near a certain date. What is your percentage confidence in that date? This works best with replanning, so you can update your percentage confidence.
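
For option 3, the usual PERT arithmetic weights the likely case and keeps the spread visible. A minimal sketch, with invented week numbers purely for illustration:

# PERT (three-point) estimate: possible (optimistic), likely, worst case (pessimistic).
optimistic, likely, pessimistic = 8.0, 12.0, 20.0   # weeks (invented numbers)

expected = (optimistic + 4 * likely + pessimistic) / 6.0   # weighted mean
spread   = (pessimistic - optimistic) / 6.0                # rough one-sigma spread

print("Expected: %.1f weeks, +/- %.1f weeks" % (expected, spread))
# Expected: 12.7 weeks, +/- 2.0 weeks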

Each of these shows your estimation audience you have uncertainty. The larger the project or program, the more you want to show uncertainty.

If you are agile, you may not need to estimate at all. I have managed many projects and programs over the years. No one asked me for a cost or schedule estimate. I received targets. Sometimes, those targets were so optimistic, I had to do a gross estimate to explain why we could not meet that date.

However, I am not convinced anything more than a gross estimate is useful. I am convinced an agile roadmap, building incrementally, seeing progress, and deciding what to do next are good ideas.

Agile Roadmap

When you see this roadmap, you can see how we have planned for an internal release each month.

With internal releases, everyone can see the project or program progress.

In addition, we have a quarterly external release. Now, your project or program might not be able to release to your customers every quarter. But, that should be a business decision, not a decision you make because you can’t release. If you are not agile, you might not be able to meet a quarterly release. But, I’m assuming you are agile.

Agile Roadmap, One Quarter at a Time

In the one-quarter view, you can see the Minimum Viable Products.

You might need to replace MVPs with MIFS, Minimum Indispensable Feature Sets, especially at the beginning.

If you always make stories so small that you can count them, instead of estimate them, you will be close. You won’t spend time estimating instead of developing product, especially at the beginning.

You know the least about the risks and gotchas at the beginning of a project or program. You might not even know much about your MIFS or MVPs. However, if you can release something for customer consumption, you can get the feedback you need.

Feedback is what will tell you:

  • Are these stories too big to count? If so, any estimate you create will be wrong.
  • Are we delivering useful work? If so, the organization will continue to invest.
  • Are we working on the most valuable work, right now? What is valuable will change during the project/program. Sometimes, you realize this feature (set) is less useful than another. Sometimes you realize you’re done.
  • Are we ready to stop? If we met the release criteria early, that’s great. If we are not ready to release, what more do we have to do?

Here’s my experience with estimation. If you provide an estimate, managers won’t believe you. They pressure you to “do more with less,” or some such nonsense. They say things such as, “If we cut out testing, you can go faster, right?” (The answer to that question is, “NO. The less technical debt we have or create, the faster we can go.”)

However, you do need the planning of roadmaps and backlogs. If you don’t have a roadmap that shows people something like what they can expect when, they think you’re faking. You need to replan the roadmap, because what the teams deliver won’t be everything the product owner wanted. That’s okay. Getting feedback about what the teams can do early is great.

There are two questions you want to ask people who ask for estimates:

  1. How much would you like to invest in this project/program before we stop?
  2. How valuable is this project/program to you?

If you work on the most valuable project/program, why are you estimating it? You need to understand how much the organization wants to invest before you stop. If you’re not working on the most valuable project/program, you still want to know how much the organization wants to invest. Or, you need a target date. With a target date, you can release parts iteratively and incrementally until you meet the target.

This is risk management for estimation and replanning. Yes, I am a fan of #noestimates, because the smaller we make the chunks, the easier it is to see what to plan and replan.

We need planning and replanning. I am not convinced we need detailed estimation if we use iterative and incremental approaches.

Categories: Project Management

Continuous Delivery across multiple providers

Xebia Blog - Wed, 01/21/2015 - 13:04

Over the last year, three of the four customers I worked with had a similar challenge with their environments. In different variations they all had their environments set up across separate domains, ranging from physically separated on-premise networks to environments running across different hosting providers managed by different parties.

Regardless of the reasoning behind having these kinds of setups, it’s a situation where the continuous delivery concepts really add value. The stereotypical problems that exist with manual deployment and testing practices tend to get amplified when they occur in separated domains. Things get even worse when you add more parties to the mix (like external application developers). Sticking to doing things manually is a recipe for disaster unless you enjoy going through expansive procedures every time you want to do anything in any of ‘your’ environments. And if you’ve outsourced your environments to an external party, you probably don’t want to have to (re)hire a lot of people just so you can communicate with your supplier.

So how can continuous delivery help in this situation? By automating your provisioning and deployments you make deploying your applications, if nothing else, repeatable and predictable. Regardless of where they need to run.

Just automating your deployments isn’t enough, however; a big question that remains is who does what, a question that is most likely backed by a lengthy contract. Agreements between all the parties are meant to provide an answer to that very question. A development partner develops, an outsourcing partner handles the hardware, etc. But nobody handles bringing everything together...

The process of automating your steps already provides some help with this problem. In order to automate you need some form of agreement on how to provide input for the tooling. This at least clarifies what the various parties need to produce. It also clarifies what the result of a step will be. This removes some of the fuzziness from the process. Things like whether the JVM is part of the OS or part of the middleware should become clear. But not everything is that clearcut. It’s in the parts of the puzzle where pieces actually come together that things turn gray. A single tool may need input from various parties. Here you need to resist the common knee-jerk reaction to shield said tool from other people with procedures and red tape. Instead, provide access to those tools to all relevant parties and handle your separation of concerns through a reliable access mechanism. Even then there might be some parts that can’t be used by just a single party and in that case, *gasp*, people will need to work together.

What this results in is an automated pipeline that will keep your environments configured properly and allow applications to be deployed onto them when needed, within minutes, wherever they may run.

MultiProviderCD

The diagram above shows how we set this up for one of our clients. Using XL Deploy, XL Release and Puppet as the automation tooling of choice.

In the first domain we have a git repository to which developers commit their code. A Jenkins build is used to extract this code, build it and package it in such a way that the deployment automation tool (XL Deploy) understands. It’s also kind enough to make that package directly available in XL Deploy. From there, XL Deploy is used to deploy the application not only to the target machines but also to another instance of XL Deploy running in the next domain, thus enabling that same package to be deployed there. This same mechanism can then be applied to the next domain. In this instance we ensure that the machines we are deploying to are consistent by using Puppet to manage them.

To round things off we use a single instance of XL Release to orchestrate the entire pipeline. A single release process is able to trigger the build in Jenkins and then deploy the application to all environments spread across the various domains.

A setup like this lowers deployment errors that come with doing manual deployments and cuts out all the hassle that comes with following the required procedures. As an added bonus your deployment pipeline also speeds up significantly. And we haven’t even talked about adding automated testing to the mix…

Small Basic on Mac & Linux

Phil Trelford's Array - Wed, 01/21/2015 - 09:29

Microsoft’s Small Basic is a simple programming language and environment aimed at beginners.

It ships with an IDE for Windows, a command line compiler and a small .Net library. Small Basic programs can also be run in the browser on Windows & Mac via Silverlight.

The shipped .Net library for Small Basic targets WPF for graphics which is unfortunately not supported on Mono, which means Small Basic apps will not run directly on Mac or Linux.

To get Small Basic apps running from the command prompt on Mac and Linux, all that is needed is a new library without the WPF dependency.

Recently I knocked up such a library providing support for command line input and output; graphics support is a work in progress.

But this does mean I can now write and run FizzBuzz, or even work through the majority of the Small Basic Tutorial, on Linux or Mac via Mono:

Small Basic on Mac

Combine this with my open source Small Basic compiler project (written in F#) and there’s now a cross-platform version of Small Basic :)

If you fancy having a play with an early version of the source download it here: http://trelford.com/SmallBasicLibrary2012.zip

Future work

I’m currently evaluating GtkSharp, OpenTK and WinForms as options for a cross platform version of the graphics library.

As well as the compiler, I’ve also written an interpreter for Small Basic which means it should be possible to edit and run programs on iOS and Android, but that’s another story…

Categories: Programming

Agile Roles: What does a product owner do?

 

One of the product owner’s roles is to buy the pizza (or the sushi!)

The product owner role, one of the three identified in Scrum, is deceptively simple. The product owner is the voice of the customer; a conduit to bring business knowledge into the team. The perceived (the word perceived is important) simplicity of the role leads to a wide range of interpretations. Deconstructing the voice of the customer a bit further yields tasks and activities that include defining what needs to be delivered, dynamically providing answers and feedback to the team and prioritizing the backlog. I recently asked a number of product owners, Scrum masters and process improvement personnel for a list of the four activities that a product owner is responsible for. The list (ranked by the number of responses, but without censorship) is shown below:

      1. Makes decisions
      2. Attends Scrum meetings
      3. Prioritizes the user stories (and backlog)
      4. Grooms backlog
      5. Defines product vision and features
      6. Accepts or rejects work
      7. Plans for releases
      8. Involves stakeholders (included customers, users, executives, SMEs)
      9. Sells the project
      10. Trains the business
      11. Buys pizza
      12. Provides the project budget
      13. Tests features
      14. Shares the feature list with business
      15. Generates team consensus

The #1 activity of the product owner is to make decisions. Decisions are a critical input for all project teams. Projects are a reflection of a nearly continuous stream of decisions - decisions that, if not made by the right person, could take a project off course. While not all decisions made by a team rise to the level of needing input from the product owner, or even more importantly rise to the level of needing immediate input from a product owner, when needed the product owner needs to be available and ready to make the tough calls.

In the first major technology project I was involved with, my company decided to shift from one computer platform to another. It was a big deal. In our first attempt the IT department attempted to manage the process without interacting with the business (I was the business). That first attempt at a conversion was . . . exciting. I learned a number of new poignant phrases in several Eastern European languages. The second time around, a business lead was appointed to act as the voice of the business and to coordinate business involvement. The business lead spent at least half the day with the project team and half in the business. Leads from all departments and project teams involved in the project met daily to review progress and issues (sort of a Scrum of Scrums, back in 1979). The ability to meet, talk and make decisions was critical for delivering the functionality needed by the business.

Making decisions isn’t the only task that product owners are called on to perform, but it is one of a very few that almost everyone can agree upon. Although buying pizza would have been higher up my list!

What would you add to the list? Which do you disagree with?


Categories: Process Management

Sponsored Post: Couchbase, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Hypertable, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Senior DevOps Engineer - SocialRadar. We are a VC funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems Engineer - Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers; the time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology and microservices; we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/
Cool Products and Services
  • See How PayPal Manages 1B Documents & 10TB Data with Couchbase. This presentation showcases PayPal's usage of Couchbase within its architecture, highlighting Linear scalability, Availability, Flexibility & Extensibility. See How PayPal Manages 1B Documents & 10TB Data with Couchbase.

  • VividCortex is a hosted (SaaS) database performance management platform that provides unparalleled insight and query-level analysis for both MySQL and PostgreSQL servers at micro-second detail. It's not just another tool to draw time-series charts from status counters. It's deep analysis of every metric, every process, and every query on your systems, stitched together with statistics and data visualization. Start a free trial today with our famous 15-second installation.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs. After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Government Regulations from a Requirements Perspective

Software Requirements Blog - Seilevel.com - Tue, 01/20/2015 - 16:00
Recently, I have been working with a large financial service provider to help them comply with regulations revolving around the Dodd-Frank regulatory bill passed after the financial crisis of 2008. The Dodd-Frank bill itself is 848 pages while the regulations born out of the law amount to almost 14,000. To give a little perspective, that […]
Categories: Requirements

Focus on Benefits Rather than Features

Mike Cohn's Blog - Tue, 01/20/2015 - 15:00

The following was originally published in Mike Cohn's monthly newsletter. If you like what you're reading, sign up to have this content delivered to your inbox weeks before it's posted on the blog, here.

Suppose your boss has not bought into trying an agile approach in your organization. You schedule a meeting with the boss, and stress how your organization should use Scrum because Scrum:

  • Has short time boxes
  • Relies on self-organization
  • Includes three distinct roles

And based on this discussion, your boss isn’t interested.

Why? Because you focused on the features of Scrum rather than its benefits.

Product owners and Scrum teams make the same mistake when working with the product backlog. A feature is something that is in a product—a spell-checker is in a word processor. A benefit is something a user gets from a product. By using a word processor, the user gets documents free from spelling errors. The spell-checker is the feature; mistake-free documents are the benefit.

It is generally considered a good practice for the items at the top of a product backlog to be small. Each must be small enough to fit into a sprint, and most teams will do at least a handful each sprint.

The risk here is that small items are much more likely to be features than benefits. When a Scrum team (and specifically its product owner) becomes overly focused on features, it is possible to lose sight of the benefits.

Scrum teams commonly mitigate this risk in two ways. First, they leave stories as epics until they move toward the top of the product backlog. Second, they include a so-that clause in their user stories. These help, but do not fully eliminate the risk.

Let’s return to your attempt to convince your boss to let your team use Scrum. Imagine you had focused on the benefits of Scrum rather than its features. You told your boss how using Scrum would lead to more successful products, more productive teams, higher quality software, more satisfied stakeholders, happier teams, and so on.

Can you see how that conversation would have gone differently than one focused on short time boxes, self-organization and roles?

Software Development Linkopedia January 2015

From the Editor of Methods & Tools - Tue, 01/20/2015 - 14:58
Here is our monthly selection of interesting knowledge material on programming, software testing and project management. This month you will find some interesting information and opinions about managing software developers, software architecture, Agile testing, product owner patterns, mobile testing, continuous improvement, project planning and technical debt. Blog: Killing the Crunch Mode Antipattern Blog: Barry’s Rules of Engineering and Architecture Blog: Lessons Learned Moving From Engineering Into Management Blog: A ScrumMaster experience report on using Feature Teams Article: Using Models to Help Plan Tests in Agile Projects Article: A Mitigation Plan for a Product Owner’s Anti-Pattern Article: Guerrilla Project ...

Debugging Small Basic Apps in Visual Studio

Phil Trelford's Array - Tue, 01/20/2015 - 09:17

Microsoft Small Basic ships with a custom IDE with syntax colouring and code completion but no debugger:

SmallBasic

There’s a good article by Nonki Takahashi on Microsoft Technet on How to debug Small Basic programs manually which boils down to:

  • trace with TextWindow.WriteLine
  • add conditional debug code with If debug Then …
  • promote your app to full VB.Net

Small Basic in Visual Studio

Last year, for fun, I wrote a custom Small Basic compiler with some extensions like functions with parameters and tuples and pattern matching.

I added debugger support recently so you can compile and debug Small Basic apps directly in Visual Studio:

SmallBasic Debug in VS2013

Setup Steps:

  • download and compile the custom Small Basic compiler
  • create a project to host your Small Basic file (.sb)
  • compile the app with the custom Small Basic Compiler
  • in the project properties Debug tab configure Start External Program

Implementation details

The Reflection.Emit library allows you to mark points in the emitted IL code with corresponding points in the source file using the MarkSequencePoint method. There’s a good guide from back in 2005 on Michael Stall’s blog: Debugging Dynamically Generated Code (Reflection.Emit)

Only a few changes to the compiler were required to provide Debugger support. First I augmented the parser, which uses FParsec, to produce line and column information for each statement. For this FParsec provides a handy getPosition function that returns the current position in the input stream. Then in the IL emit step I simply used MarkSequencePoint to annotate each statement.

Future work

It would also be nice to add syntax colouring for Small Basic within Visual Studio too. If anyone is interested in working with me on this, please get in touch :)

I’m also now able to run Small Basic programs on Mac and Linux via Mono but that’s another post…

Categories: Programming

Planning And Estimating Is Required for Project Success

Herding Cats - Glen Alleman - Mon, 01/19/2015 - 20:06

In project work we're looking to create or change something, in some defined period of time, for a defined cost. This means we're going to spend money now for some future outcome. The elements that go into this effort to produce some change in the future include (but are not limited to) the scope of our efforts (requirements for the outcomes), technical performance (including quality and other ...ilities of the outcomes), the schedule for the work (so we don't have to do everything at once), the budget (so we know the cost of the value produced), the resources that do the work in exchange for money defined in the budget, the risk to cost, schedule, and technical performance goals, and other attributes. A specific project will have specific constraints from each of these attributes.

The relationships between these attributes are usually non-linear, random in some way (stochastic), and affect future outcomes. Because of the random nature of the attributes and the random nature of their relationships, simple linear, non-statistical projections of past performance onto future performance are most likely to be a disappointment.
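
To make that concrete, here is a minimal, hypothetical Monte Carlo sketch (the task durations are invented for illustration) showing how a probabilistic projection differs from simply adding up single-point numbers:

import random

# Hypothetical three-point duration ranges, in days, for a handful of tasks.
tasks = [(3, 5, 10), (2, 4, 9), (5, 8, 15), (1, 2, 6)]  # (min, most likely, max)

def one_trial():
    # Sample each task from a triangular distribution and sum the durations.
    return sum(random.triangular(low, high, mode) for low, mode, high in tasks)

trials = sorted(one_trial() for _ in range(10000))
p50 = trials[len(trials) // 2]           # median outcome
p80 = trials[int(len(trials) * 0.8)]     # 80% confidence level

naive = sum(mode for _, mode, _ in tasks)  # just adding up the "most likely" values
print("Most-likely sum: %d days, P50: %.1f days, P80: %.1f days" % (naive, p50, p80))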

Welcome to the Future

To answer the question of what the future looks like when the past is a non-linear stochastic process, we need to be able to manage in the presence of uncertainty. With this ability, the future simply emerges, and many times this future is not what was expected.

This is the role of planning. The best description of planning is

planning constantly peers into the future for indications as to where a solution may emerge. A Plan is a complex situation, adapting to an emerging solution. 
- Mike Dwyer, Big Visible

To be successful at planning we need to do many things. Since it is the future we're planning for, each of these things requires an

ESTIMATE

Yes, we're never going to see it coming if we don't Plan. And to Plan, we need to estimate. And to estimate we need to learn how to estimate. So if we want to manage in the presence of uncertainty, here's a starting point...

When managing in the presence of uncertainty, we're going to need to make estimates of the impacts of that uncertainty on our decision making processes for that emerging future. When we hear that we can make decisions in the absence of estimates, it's simply not true in any non-trivial manner. For trivial decisions - do I start with the enrollment UI or the validation of IRS information UI, or do I buy a Mac with 4GB or 6GB of memory (assuming I can upgrade if I need more) - making estimates is likely of little value. But for a decision like "do we switch all our legacy systems to SAP or JD Edwards," I'm going to need a credible estimate of the cost, schedule, resources, risks, and tangible beneficial outcomes for my enterprise. So without a domain, a context, the Value at Risk, the underlying processes for uncertainty (reducible and irreducible) and some other attributes, the notion of making decisions in the absence of estimating the cost and impact of those decisions is simply bad management.

For Those Interested in Decision Making Processes in the Presence of Uncertainty

 

Categories: Project Management