Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

C4 stencil for OmniGraffle

Coding the Architecture - Simon Brown - Sun, 06/28/2015 - 11:38

If you like the look and feel of the C4 software architecture diagrams in my Software Architecture for Developers book (see examples here), Dennis Laumen has created an OmniGraffle stencil that will save you some time. Just download the stencil, install and it will appear in your stencil library.

The C4 stencil is available from Omni Group's Stenciltown. Thanks Dennis!

Categories: Architecture

Re-Read Saturday: Mythical Man-Month, Part 1

The Mythical Man-Month

The Mythical Man-Month: Essays on Software Engineering was originally published in 1975. That was the year I took my first course in programming – FORTRAN, at Memphis State University. The Mythical Man-Month has been a must-read for every professional involved in delivering value since it was published. The core themes in the book, which include Brooks' Law (adding people to a late project just makes it later – more on that in a future installment), are just as important today as they were when they were published because they still hold true for all types of software development and maintenance. The Mythical Man-Month, I think on reflection you will agree, is still important because we are still trying to come to grips with most, if not all, of the concepts that Brooks exposed.

Here is the game plan for the re-read. We will be reading from the Anniversary Edition of The Mythical Man-Month (Addison-Wesley, 1995). My copy is the 14th printing from 2014. Part of this re-read will actually be a first read for me: I believe I originally read the book in the early 1980s, and this edition has a new preface and four new chapters. My intent during the re-read is to cover two essays/chapters per week so the re-read takes ten weeks-ish. Today will be the exception to the rule: we will tackle only the first essay, The Tar Pit (I already used the ideas in the essay to make a point in the essay on Scrum of Scrums membership).

Preface(s):
The original preface provides background for Brooks' perspective. Brooks came to his conclusions based on his history as a hardware architect and then as the manager of the OS/360 project (IBM's highly successful mainframe operating system). In the preface Brooks telegraphs one of his central theorems – large projects are different from small projects. It took many years and the Agile Manifesto (2001) for the industry to recognize this fact and begin to act on it.

The anniversary edition includes a new preface in addition to the original. Two notes in the new preface gave me pause. The first was Brooks' observation that few of the central ideas in the book had been critiqued, proven, or disproven by research and experience. I would like your feedback as we review the essays; I will explicitly point these ideas out during the re-read, and if I miss any, skip ahead to Chapter 18 to keep me honest. The second point from the new preface, which caught my eye because of my interest in the publishing industry, was that Brooks was able to publicly thank the Addison-Wesley staff he found instrumental in preparing the first edition. The publishing world has changed. The question I pose to Software Process and Measurement Cast blog readers is: has our profession changed as much?

Chapter 1: The Tar Pit

In The Tar Pit, Brooks sets the context for systems programming as a craft: the programmer is not just a lone coder sitting in his or her office creating a stand-alone app, but rather a member of a team building large-scale systems. The Tar Pit describes why the complexity and level of effort are different and then explores the joys and woes of the profession.

The first takeaway from The Tar Pit is that as a piece of work progresses from a single program to a programming product or a programming system, and then to a programming systems product, the complexity of managing and understanding the work increases. As size and complexity increase, so do the tasks and activities needed to ensure the product works together. Brooks suggests that the level of effort required to create a programming systems product is roughly nine times that of a stand-alone program (about three times to turn a program into a product, and another three times to make it part of a system). The complexity of coordinating all of these additional tasks and activities while solving complex business problems impacts our ability to deliver on-time, on-budget and on-scope. At times, it also impacts our ability to deliver WOW.

The second takeaway is the joys of programming. These joys motivate programmers to put forth the effort needed to deliver value while trying to navigate the tar pit. Brooks points out that the rewards of programming are at least fourfold:

  1. The joy of making things.
  2. The joy of making things that are useful to others.
  3. The joy of problem solving.
  4. The joy of learning.

These joys are offset by several woes. These woes are the part of the tar pit that makes programming hard.

  1. Expectation of perfection. Unless the code is correct and satisfies the requirements, it is wrong. There is little to no gray area.
  2. Dependence on others. Others set your objectives, provide resources and often control the source of information. This often leads to the tension caused when individuals or teams have the responsibility to deliver without commensurate authority. Agile principles, when applied to teams, are a start to addressing these woes.
  3. Finding the bugs in the big picture.  Designing the big picture is significantly more fun than the tedious effort required to find bugs.
  4. Dead on arrival projects. Long-running projects are occasionally dead on arrival: by the time they ship, the market has changed and what has been delivered is not what is needed.

In 1975, when this essay was published, Agile was not a word one would attach to building applications or systems. However, I think we can see the seeds of the Agile revolution in this essay.


Categories: Process Management

R: Command line – Error in GenericTranslator$new : could not find function "loadMethod"

Mark Needham - Sat, 06/27/2015 - 23:47

I’ve been reading Text Processing with Ruby over the last week or so and one of the ideas the author describes is setting up your scripts so you can run them directly from the command line.

I wanted to do this with my Wimbledon R script and wrote the following script which uses the ‘Rscript’ executable so that R doesn’t launch in interactive mode:

wimbledon

#!/usr/bin/env Rscript
 
library(rvest)
library(dplyr)
library(stringr)
library(readr)
 
# stuff

Then I tried to run it:

$ time ./wimbledon
 
...
 
Error in GenericTranslator$new : could not find function "loadMethod"
Calls: write.csv ... html_extract_n -> <Anonymous> -> Map -> mapply -> <Anonymous> -> $
Execution halted
 
real	0m1.431s
user	0m1.127s
sys	0m0.078s

As the error suggests, the script fails when trying to write to a CSV file – it looks like Rscript doesn't automatically load the methods package, which an interactive R session attaches by default. It turns out adding the following line to our script is all we need:

library(methods)

So we end up with this:

#!/usr/bin/env Rscript
 
library(methods)
library(rvest)
library(dplyr)
library(stringr)
library(readr)

And when we run that all is well!

Categories: Programming

R: dplyr – squashing multiple rows per group into one

Mark Needham - Sat, 06/27/2015 - 23:36

I spent a bit of the day working on my Wimbledon data set and the next thing I explored is all the people that have beaten Andy Murray in the tournament.

The following dplyr query gives us the names of those people and the year the match took place:

library(dplyr)
 
> main_matches %>% filter(loser == "Andy Murray") %>% select(winner, year)
 
            winner year
1  Grigor Dimitrov 2014
2    Roger Federer 2012
3     Rafael Nadal 2011
4     Rafael Nadal 2010
5     Andy Roddick 2009
6     Rafael Nadal 2008
7 Marcos Baghdatis 2006
8 David Nalbandian 2005

As you can see, Rafael Nadal shows up multiple times. I wanted to get one row per player and list all the years in a single column.

This was my initial attempt:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year))
Source: local data frame [6 x 2]
 
            winner years
1     Andy Roddick  2009
2 David Nalbandian  2005
3  Grigor Dimitrov  2014
4 Marcos Baghdatis  2006
5     Rafael Nadal  2011
6    Roger Federer  2012

Unfortunately it just gives you the last matching row per group, which isn't quite what we want. I realised my mistake while trying to pass a vector into paste and noticing that a vector came back when I'd expected a string:

> paste(c(2008,2009,2010))
[1] "2008" "2009" "2010"

The missing argument was ‘collapse’ – something I’d come across when using plyr last year:

> paste(c(2008,2009,2010), collapse=", ")
[1] "2008, 2009, 2010"

Now, if we apply that to our original function:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year, collapse=", "))
Source: local data frame [6 x 2]
 
            winner            years
1     Andy Roddick             2009
2 David Nalbandian             2005
3  Grigor Dimitrov             2014
4 Marcos Baghdatis             2006
5     Rafael Nadal 2011, 2010, 2008
6    Roger Federer             2012

That’s exactly what we want. Let’s tidy that up a bit:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% arrange(year) %>%
     summarise(years  = paste(year, collapse =","), times = length(year))  %>%
     arrange(desc(times), years)
Source: local data frame [6 x 3]
 
            winner          years times
1     Rafael Nadal 2008,2010,2011     3
2 David Nalbandian           2005     1
3 Marcos Baghdatis           2006     1
4     Andy Roddick           2009     1
5    Roger Federer           2012     1
6  Grigor Dimitrov           2014     1
Categories: Programming

Scala development with GitHub's Atom editor

Xebia Blog - Sat, 06/27/2015 - 14:57

GitHub recently released version 1.0 of their Atom editor. This post gives a rough overview of its Scala support.

Basic features

Basic features such as Scala syntax highlighting are provided by the language-scala plugin.

Some work on worksheets as found in e.g. Eclipse has been done in the scala-worksheet-plus plugin, but this is still missing major features and not very useful at this time.

Navigation and completion

Ctags

Atom provides basic 'Go to Declaration' (ctrl-alt-down) and 'Search symbol' (cmd-shift-r) support by way of the default ctags-based symbols-view.

While there are multiple sbt plugins for generating ctags, the easiest seems to be to have Ensime download the sources (more on that below) and invoke ctags manually: put this configuration in your home directory and run the 'ctags' command from your project root.

This is useful for searching for symbols, but limited for finding declarations: for example, when checking the declaration for Success, ctags doesn't know whether this is scala.util.Success, akka.actor.Status.Success, spray.http.StatusCodes.Success or some other 3rd-party or local symbol with that name.

Ensime

This is where the Ensime plugin comes in.

Ensime is a service for Scala IDE support, originally written for the Scala support in Emacs. The project metadata for Ensime can be generated with 'sbt gen-ensime' from the ensime-sbt sbt plugin.

Usage

Start the Ensime server from Atom with 'cmd-shift-p' 'Ensime: start'. After a small pause the status bar proclaims 'Indexer ready!' and you should be good to go.

At this point the main features are 'jump to definition' (alt-click), hover for type info, and auto-completion:

atom.io ensime completion

There are some rough edges, but this is a promising start based on a solid foundation.

Conclusions

While Atom is already a pleasant, modern, open source, cross platform editor, it is clearly still early days.

The Scala support in Atom is not yet as polished as in IDEs such as IntelliJ IDEA, or as stable as in more mature editors such as Sublime Text, but it is already practically useful and has serious potential. Startup is not instant, but I did not notice the 'sluggish feel' reported by earlier reviewers.

Feel free to share your experiences in the comments, I will keep this post updated as the tools - and our experience with them - evolve.

R: ggplot – Show discrete scale even with no value

Mark Needham - Fri, 06/26/2015 - 23:48

As I mentioned in a previous blog post, I’ve been scraping data for the Wimbledon tennis tournament, and having got the data for the last ten years I wrote a query using dplyr to find out how players did each year over that period.

I ended up with the following functions to filter my data frame of all the matches:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
    return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}
 
player_performance = function(name, matches) {
  player = data.frame()
  for(y in 2005:2014) {
    round = round_reached(name, filter(matches, year == y))
    if(length(round) == 1) {
      player = rbind(player, data.frame(year = y, round = round))      
    } else {
      player = rbind(player, data.frame(year = y, round = "Did not enter"))
    } 
  }
  return(player)
}

When we call that function we see the following output:

> player_performance("Andy Murray", main_matches)
   year          round
1  2005    Round of 32
2  2006    Round of 16
3  2007  Did not enter
4  2008 Quarter-Finals
5  2009    Semi-Finals
6  2010    Semi-Finals
7  2011    Semi-Finals
8  2012         Finals
9  2013         Winner
10 2014 Quarter-Finals

I wanted to create a chart showing Murray's progress over the years with the round reached on the y axis and the year on the x axis. In order to do this I had to make sure the 'round' column was being treated as a factor variable:

df = player_performance("Andy Murray", main_matches)
 
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
df$round = factor(df$round, levels =  rounds)
 
> df$round
 [1] Round of 32    Round of 16    Did not enter  Quarter-Finals Semi-Finals    Semi-Finals    Semi-Finals   
 [8] Finals         Winner         Quarter-Finals
Levels: Did not enter Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals Winner

Now that we’ve got that we can plot his progress:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds)

[chart: round reached by year for Andy Murray]

This is a good start but we've lost the rounds which don't have a corresponding entry on the y axis. I'd like to keep them so it's easier to compare the performance of different players.

It turns out that all we need to do is pass ‘drop = FALSE’ to scale_y_discrete and it will work exactly as we want:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop = FALSE)

[chart: round reached by year, with all rounds kept on the y axis]

Neat. Now let’s have a look at the performances of some of the other top players:

draw_chart = function(player, main_matches){
  df = player_performance(player, main_matches)
  df$round = factor(df$round, levels =  rounds)
 
  ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop=FALSE) + 
    ggtitle(player) + 
    theme(axis.text.x=element_text(angle=90, hjust=1))
}
 
a = draw_chart("Andy Murray", main_matches)
b = draw_chart("Novak Djokovic", main_matches)
c = draw_chart("Rafael Nadal", main_matches)
d = draw_chart("Roger Federer", main_matches)
 
library(gridExtra)
grid.arrange(a,b,c,d, ncol=2)

[charts: round reached by year for Murray, Djokovic, Nadal and Federer]

And that’s all for now!

Categories: Programming

An update on Eclipse Android Developer Tools

Android Developers Blog - Fri, 06/26/2015 - 22:28

Posted by Jamal Eason, Product Manager, Android

Over the past few years, our team has focused on improving the development experience for building Android apps with Android Studio. Since the launch of Android Studio, we have been impressed with the excitement and positive feedback. As the official Android IDE, Android Studio gives you access to a powerful and comprehensive suite of tools to evolve your app across Android platforms, whether it's on the phone, wrist, car or TV.

To that end and to focus all of our efforts on making Android Studio better and faster, we are ending development and official support for the Android Developer Tools (ADT) in Eclipse at the end of the year. This specifically includes the Eclipse ADT plugin and Android Ant build system.

Time to Migrate

If you have not had the chance to migrate your projects to Android Studio, now is the time. To get started, download Android Studio. For many developers, migration is as simple as importing your existing Eclipse ADT projects in Android Studio with File → New → Import Project as shown below:

For more details on the migration process, check out the migration guide. Also, to learn more about Android Studio and the underlying build system, check out this overview page.

Next Steps

Over the next few months, we are migrating the rest of the standalone performance tools (e.g. DDMS, Trace Viewer) and building in additional support for the Android NDK into Android Studio.

We are focused on Android Studio so that our team can deliver a great experience on a unified development environment. Android tools inside Eclipse will continue to live on in the open source community via the Eclipse Foundation. Check out the latest Eclipse Andmore project if you are interested in contributing or learning more.

For those of you that are new to Android Studio, we are excited for you to integrate Android Studio into your development workflow. Also, if you want to contribute to Android Studio, you can also check out the project source code. To follow all the updates on Android Studio, join our Google+ community.

Categories: Programming

Automatically launching and configuring an EC2 instance with ansible

Agile Testing - Grig Gheorghiu - Fri, 06/26/2015 - 20:53
Ansible makes it easy to handle an EC2 instance from soup to nuts: launching the instance and then configuring it. Here's a complete playbook I use for this purpose:

$ cat ec2-launch-instance-api.yml
---
- name: Create a new api EC2 instance
  hosts: localhost
  gather_facts: False
  vars:
    keypair: api
    instance_type: t2.small
    security_group: api-core
    image: ami-5189a661
    region: us-west-2
    vpc_subnet: subnet-xxxxxxx
    name_tag: api01
  tasks:
  - name: Launch instance
    ec2:
      key_name: "{{ keypair }}"
      group: "{{ security_group }}"
      instance_type: "{{ instance_type }}"
      image: "{{ image }}"
      wait: true
      region: "{{ region }}"
      vpc_subnet_id: "{{ vpc_subnet }}"
      assign_public_ip: yes
      instance_tags:
        Name: "{{ name_tag }}"
    register: ec2

  - name: Add Route53 DNS record for this instance (overwrite if needed)
    route53:
      command: create
      zone: mycompany.com
      record: "{{ name_tag }}.mycompany.com"
      type: A
      ttl: 3600
      value: "{{ item.private_ip }}"
      overwrite: yes
    with_items: ec2.instances

  - name: Add new instance to proper ansible group
    add_host: hostname={{ name_tag }} groupname=api-servers ansible_ssh_host={{ item.private_ip }} ansible_ssh_user=ubuntu ansible_ssh_private_key_file=/Users/grig.gheorghiu/.ssh/api.pem
    with_items: ec2.instances

  - name: Wait for SSH to come up
    wait_for: host={{ item.private_ip }} port=22 search_regex=OpenSSH delay=210 timeout=420 state=started
    with_items: ec2.instances

- name: Configure api EC2 instance
  hosts: api-servers
  sudo: True
  gather_facts: True
  roles:
    - base
    - tuning
    - postfix
    - monitoring
    - nginx
    - api


The first thing I do in this playbook is to launch a new EC2 instance, add or update its Route53 DNS A record, add it to an ansible group and wait for it to be accessible via ssh. Then I configure the instance by applying a handful of roles to it. That's it.
Some things to note:
1) Ansible uses boto under the covers, so you need that installed on your local host, and you also need a ~/.boto configuration file with your AWS credentials:
[Credentials]
aws_access_key_id = xxxxx
aws_secret_access_key = yyyyyyyyyy
2) When launching an EC2 instance with ansible via the ansible ec2 module, the hosts variable should point to localhost and gather_facts should be set to False. 
3) The various parameters expected by the EC2 API (keypair name, instance type, VPC subnet, security group, instance name tag etc.) can be set in the vars section and then used in the tasks section in the ec2 stanza.
4) I used the ansible route53 module for managing DNS. This module has a handy property called overwrite, which when set to yes will update a DNS record in place if it exists, or will create it if it doesn't exist.

5) The add_host task is very useful in that it adds the newly created instance to a hosts group, in my case api-servers. This host group already has a group_vars/api-servers configuration file, where I set various ansible variables used in different roles (mostly secret-type variables such as API keys, user names, passwords etc.); a sketch of such a file follows these notes. The group_vars directory is NOT checked in.
6) In the final play of the playbook, the [api-servers] group (which consists of only the newly created EC2 instance) gets the respective roles applied to it. Why does this group consist of only the newly created EC2 instance? Because when I run the playbook with ansible-playbook, I indicate an empty hosts file to make sure this group is empty:
$ ansible-playbook -i hosts/myhosts.empty ec2-launch-instance-api.yml
If instead I wanted to also apply the specified roles to my existing EC2 instances in that group, I would specify a hosts file that already has those instances defined in the [api-servers] group.
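To illustrate note 5, here is a minimal sketch of what a group_vars/api-servers file could look like. The variable names below are hypothetical examples, not the actual contents of my file, which holds secrets and is deliberately kept out of version control:

# group_vars/api-servers -- hypothetical sketch; real values are secrets and never checked in
some_service_api_key: "xxxxxxxxxxxxxxxx"
db_user: api
db_password: "yyyyyyyyyy"
smtp_user: ses-smtp-user
smtp_password: "zzzzzzzzzz"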


Episode 230: Shubhra Kar on NodeJS

Shubhra Kar of StrongLoop talks to Jeff Meyerson about Node.js. Node allows for server-side JavaScript. Shubhra and Jeff explore why Node is so important from three standpoints: isomorphic JavaScript, the single-threaded concurrency model, and the “API economy.” Isomorphic JavaScript apps have their own control and viewing logic, but they share the state and specification of the […]
Categories: Programming

What Creates Trust in Your Organization?

I published my most recent newsletter, Creating Trustworthy Estimates, this past week. I also noted on Twitter that one person said his estimates created trust in his organization. (He was responding to a #noestimate post that I had retweeted.)

Sometimes, estimates do create trust. They provide a comfortable feeling to many people that you have an idea of what size this beast is. That’s why I offer solutions for a gross estimate in Predicting the Unpredictable. I have nothing against gross estimates.

I don’t like gross estimates (or even detailed estimates) as a way to evaluate projects in the project portfolio because estimates are guesses. Estimates are not a great way to understand and discuss the value of a project. They might be one piece of the valuation discussion, but if you use them as the only way to value a project, you are missing the value discussion you need to have. See Why Cost is the Wrong Question for Evaluating Projects in Your Project Portfolio.

I have not found that only estimates create trust. I have found that delivering the product (or interim product) creates more trust.

Way back, when I was a software developer, I had a difficult machine vision project. Back then, we invented as we went. We had some in-house libraries, but we had to develop new solutions for each customer.

I had an estimate of 8 weeks for that project. I prototyped and tried a gazillion things. Finally, at 6 weeks, I had a working prototype. I showed it to my managers and other interested people. I finished the project and we shipped it.

Many years later, when I was a consultant, I encountered one of those managers. He said to me, “We held our breath for 6 weeks until you showed us a prototype. You had gone dark and we were worried. We had no idea if you would finish.”

By that time, I had managed people like me. I asked them for visual updates on their status each week or two. I had learned from my experiences.

I asked that manager why they held their breath. I always used an engineering notebook. I could have explained my status at any time to anyone who wanted it. He replied, “We so desperately wanted your estimate to be true. We were so afraid it wasn’t. We had no idea what to do. When you showed us a working prototype, that’s when we started to believe you could finish the project.”

They trusted my initial estimate. It’s a good thing they didn’t ask for updated estimates each week. I remember that project as a series of highs and lows.

That’s the problem with invention/innovation. You can keep track of your progress. You can determine ways to make progress. And, with the highs, you meet or beat your estimate. With the lows, you extend your estimate. I remember that at the beginning of week 5 I was sure I was not going to meet my date. Then, I discovered a way to make the project work. I remember my surprise that it was something “that easy.” It wasn’t easy. I had tracked my experiments in my notebook. There wasn’t much more I could do.

Since then, I have asked my managers, “When do you want to know my project is in trouble? As soon as I think I’m not going to meet my date; after I do some experiments; or at the last possible moment?” I create trust when I ask that question because it shows I’m taking their concerns seriously.

After that project, here is what I did to create trust:

  1. Created a first draft estimate.
  2. Tracked my work so I could show visible progress and what didn’t work.
  3. Delivered often. That is why I like inch-pebbles. Yes, after that project, I often had one- or two-day deliverables.
  4. If I thought I wasn’t going to make it, used the questions above to decide when to say, “I’m in trouble.”
  5. Delivered a working product.

Estimates can be useful. They can show you the risks. And, I’m sure that only having estimates is insufficient for building trust. If you want to learn more about estimation, see Predicting the Unpredictable: Pragmatic Approaches to Estimating Cost or Schedule.

Categories: Project Management

Git Subproject Compile-time Dependencies in Sbt

Xebia Blog - Fri, 06/26/2015 - 13:33

When creating an sbt project recently, I tried to include a project with no releases. This means that including it using libraryDependencies in build.sbt does not work. An option is to clone the project and publish it locally, but this is tedious manual work that needs to be repeated every time the cloned project changes.

Most examples explain how to add a direct compile-time dependency on a git repository to sbt, but they just show how to add a single-project repository as a dependency using a RootProject. After some searching I found the solution for adding projects from a multi-project repository: instead of RootProject, ProjectRef should be used. This allows for a second argument to specify the subproject in the repository.

This is my current project/Build.scala file:

import sbt.{Build, Project, ProjectRef, uri}

object GotoBuild extends Build {
  lazy val root = Project("root", sbt.file(".")).dependsOn(staminaCore, staminaJson, ...)

  lazy val staminaCore = ProjectRef(uri("git://github.com/scalapenos/stamina.git#master"), "stamina-core")
  lazy val staminaJson = ProjectRef(uri("git://github.com/scalapenos/stamina.git#master"), "stamina-json")
  ...
}

These subprojects are now a compile-time dependency and sbt will pull in and maintain the repository in ~/.sbt/0.13/staging/[sha]/stamina. So no manual checkout with local publish is needed. This is very handy when depending on an internal independent project/module without needing to create a new release for every change. (One side note is that my IntelliJ currently does not recognize that the library is on the class/source path of the main project, so it complains it cannot find symbols and therefore cannot do proper syntax checking and auto-completion.)

Inspirational Quotes, Inspirational Life Quotes, and Great Leadership Quotes

I know several people looking for inspiration.

I believe the right words ignite or re-ignite us.

There is no better way to prime your mind for great things to come than filling your head and heart with the greatest inspirational quotes that the world has ever known.

Of course, the challenge is finding the best inspirational quotes to draw from.

Well, here you go …

3 Great Inspirational Quotes Collections at Your Fingertips

I revamped a few of my best inspirational quotes collections to really put the gems of insight at your fingertips:

  1. Inspirational Quotes – light a fire from the inside out, or find your North Star that pulls you forward
  2. Inspirational Life Quotes -
  3. Great Leadership Quotes – learn what great leadership really looks like and how it helps lift others up

Each of these inspirational quotes collections is hand-crafted with deep words of wisdom, insight, and action.

You'll find inspirational quotes from Charles Dickens, Confucius, Dr. Seuss, George Bernard Shaw, Henry David Thoreau, Horace, Lao Tzu, Lewis Carroll, Mahatma Gandhi, Oprah Winfrey, Oscar Wilde, Paulo Coelho, Ralph Waldo Emerson, Stephen King, Tony Robbins, and more.

You'll even find an inspirational quote from The Wizard of Oz (and it’s not “There’s no place like home.”)

Inspirational Quotes Jump Start

Here are a few of my favorite inspirational quotes to get you started:

“Courage doesn’t always roar. Sometimes courage is the quiet voice at the end of the day saying, ‘I will try again tomorrow.’”

– Mary Anne Radmacher

“Do not follow where the path may lead. Go, instead, where there is no path and leave a trail.”

– Ralph Waldo Emerson

“Don’t cry because it’s over, smile because it happened.”

– Dr. Seuss

“It is not length of life, but depth of life.”

– Ralph Waldo Emerson

“Life is not measured by the number of breaths you take, but by every moment that takes your breath away.”

– Anonymous

“You live but once; you might as well be amusing.”

– Coco Chanel

“It is never too late to be who you might have been.”

– George Eliot

“Smile, breathe and go slowly.”

– Thich Nhat Hanh

“What lies behind us and what lies before us are tiny matters compared to what lies within us.”

– Ralph Waldo Emerson

These inspirational quotes are living, breathing collections. I periodically sweep them to reflect new additions, and I re-organize or re-style the quotes if I find a better way.

I invest a lot of time on quotes because I’ve learned the following simple truth:

Quotes change lives.

The right words, at the right time, can be just that little bit you need, to breakthrough or get unstuck, or find your mojo again.

Have you had your dose of inspiration today?

Categories: Architecture, Programming

Deploying monitoring tools with ansible

Agile Testing - Grig Gheorghiu - Fri, 06/26/2015 - 01:10
At my previous job, I used Chef for configuration management. New job, new tools, so I decided to use ansible, which I had played with before. Part of the reason was that I got sick of tools based on Ruby. Managing all the gem dependencies and migrating from one Ruby version to another was a nightmare that I didn't want to go through again. That's one reason why at my new job we settled on Go as the main language we use for our backend API layer.

Back to ansible. Since it's written in Python, it's already got good marks in my book. Plus it doesn't need a server and it's fairly easy to wrap your head around. I've been very happy with it so far.

For external monitoring, we use Pingdom because it works and it's cheap. We also use New Relic for application performance monitoring, but it's very expensive, so I've been looking at ways to supplement it with Open Source tools.

An announcement about telegraf drew my attention the other day: here was a metrics collection tool written in Go and sending its data to InfluxDB, which is a scalable database also written in Go and designed to receive time-series data. Seemed like a perfect fit. I just needed a way to display the data from InfluxDB. Cool, it looked like Grafana supports InfluxDB! It turns out however that Grafana support for the latest InfluxDB version 0.9.0 is experimental, i.e. doesn't really work. Plus telegraf itself has some rough edges in the way it tags the data it sends to InfluxDB. Long story short, after a whole day of banging my head against the telegraf/InfluxDB/Grafana wall, I decided to abandon this toolset.

Instead, I reached again for trusty old Graphite and its loyal companion statsd. I had problems with Graphite not scaling well before, but for now we're not sending it such a huge amount of metrics, so it will do. I also settled on collectd as the OS metrics collector. It's small, easy to configure, and very stable. The final piece of the puzzle was a process monitoring and alerting tool. I chose monit for this purpose. Again: simple, serverless, small footprint, widely used, stable, easy to configure.

This seems like a lot of tools, but it's not really that bad if you have a good solid configuration management system in place -- ansible in my case.

Here are some tips and tricks specific to ansible for dealing with multiple monitoring tools that need to be deployed across various systems.

Use roles extensively

This is of course recommended no matter what configuration management system you use. With ansible, it's easy to use the command 'ansible-galaxy init rolename' to create the directory structure for a new role. My approach is to create a new role for each major application or tool that I want to deploy. Here are some of the roles I created:

  • a base role that adds users, deals with ssh keys and sudoers.d files, creates directory structures common to all servers, etc.
  • a tuning role that mostly configures TCP-related parameters in sysctl.conf
  • a postfix role that installs and configures postfix to use Amazon SES
  •  a go role that installs golang from source and configures GOPATH and GOROOT
  • an nginx role that installs nginx and deploys self-signed SSL certs for development/testing purposes
  • a collectd role that installs collectd and deploys (as an ansible template) a collectd.conf configuration file common to all systems, which sends data to graphite (the system name is customized as {{inventory_hostname}} in the write_graphite plugin); a sketch of such a template task follows this list
  • a monit role that installs monit and deploys (again as an ansible template) a monitrc file that monitors resource metrics such as CPU, memory, disk etc. common to all systems
  • an api role that does the heavy lifting for the installation and configuration of the packages that are required by our API layer
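As a rough illustration of the template deployment mentioned in the collectd bullet above, a task in the collectd role could look something like the sketch below. The file names and handler name here are assumptions for illustration, not the actual contents of my role:

# roles/collectd/tasks/main.yml -- hypothetical sketch
- name: Deploy collectd.conf from a template (write_graphite hostname comes from inventory_hostname)
  template: src=collectd.conf.j2 dest=/etc/collectd/collectd.conf
  notify: restart collectd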
Use an umbrella 'monitoring' role

At first I was configuring each ansible playbook to use both the monit role and the collectd role. I realized that it's a bit clearer and also easier to maintain if playbooks instead use a more generic monitoring role, which does nothing but list monit and collectd as dependencies in its meta/main.yml file:

dependencies:
  - { role: monit }
  - { role: collectd }
Customize monitoring-related configuration files in other roles

A nice thing about both monit and collectd, and a main reason I chose them, is that they read configuration files from a directory: /etc/monit/conf.d for monit and /etc/collectd/collectd.conf.d for collectd. This makes it easy for each role to add its own configuration files. For example, the api role adds 2 files as custom checks in /etc/monit/conf.d: check_api and check_nginx. It also adds 2 files as custom metric collectors in /etc/collectd/collectd.conf.d: nginx.conf and memcached.conf. The api role does this via a file called tasks/monitoring.yml which gets included in tasks/main.yml.

As another example, the nginx role also adds its own check_nginx configuration file to /etc/monit/conf.d via a tasks/monitoring.yml file.

The rule of thumb I arrived at is this: each low-level role such as monit and collectd installs the common configuration files needed by all other roles, whereas each higher-level role such as api installs its own custom checks and metric collectors via a monitoring.yml task file. This way, it's easy to see at a glance what each high-level role does for monitoring: just look in its monitoring.yml task file (a sketch follows below).
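For illustration, here is a minimal sketch of what such a tasks/monitoring.yml could look like for the api role. The file names match the ones mentioned above, but the module choices and handler names are assumptions rather than the actual code from my roles:

# roles/api/tasks/monitoring.yml -- hypothetical sketch
- name: Install custom monit checks for the api role
  copy: src={{ item }} dest=/etc/monit/conf.d/{{ item }}
  with_items:
    - check_api
    - check_nginx
  notify: restart monit

- name: Install custom collectd collectors for the api role
  copy: src={{ item }} dest=/etc/collectd/collectd.conf.d/{{ item }}
  with_items:
    - nginx.conf
    - memcached.conf
  notify: restart collectd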
To wrap this post up, here is an example of a playbook I use to build API servers:
$ cat api-servers.yml
---
- hosts: api-servers
  sudo: yes
  roles:
    - base
    - tuning
    - postfix
    - monitoring
    - nginx
    - api
I call this playbook with:
$ ansible-playbook -i hosts/myhosts api-servers.yml
To make this work, the hosts/myhosts file has a section similar to this:
[api-servers]
api01 ansible_ssh_host=api01.mydomain.com
api02 ansible_ssh_host=api02.mydomain.com




Scaling Agile: Scrum of Scrums, Membership Revisited

Pick a direction?

A Scrum of Scrums (SoS) is a mechanism that has been adopted to coordinate a group of teams so that they act as a team of teams. Historically the concept of a SoS was not part of the original Scrum framework and is considered an add-on. In other words, a Scrum of Scrums is an optional technique that can be added to the more canonical Scrum framework if useful; however, because the technique is optional, the amount of guidance is patchy. For example, when an organization adopts a SoS, who should participate is often hotly debated. The participation debate popped up after we published Scrum of Scrums, The Basics, and I had a number of conversations with readers to discuss the topic. Consolidating those discussions suggests that the type of membership a person supports depends on what they want to get from using the SoS. All of the readers felt that the SoS should always be focused on coordination; however, there can be two flavors: one focused on activities and the second on technical questions.

Coordination of Activities: Those that believe the SoS is primarily a tool for coordinating team activities support the idea that Scrum Masters should be chosen for SoS membership. The rationale put forward focuses on the idea that the Scrum Master is uniquely positioned to gather and communicate information related to the coordination of teams. Readers in this group feel that since Scrum Masters interact with all team members as facilitators rather than technical leaders, they are better coordinators. The alternate view, held by many Agilistas, is that Scrum Masters acting in this role violate Scrum principles and represent a prima facie reinstitution of the role of the project manager. Further, the Agilista view suggests that when technical issues need to be dealt with in SoS meetings populated with Scrum Masters, the Scrum Masters can't make decisions but rather often need to act as a conduit between technical team members. When the SoS becomes a conduit, a version of the classic telephone game, decision-making effectiveness is reduced.

Technical Coordination: Fred Brooks, in his essay The Tar Pit (foreshadowing the next installment of Re-Read Saturday), suggests that software development increases in complexity as it progresses past the development of an individual program. Integrating work with the work of others requires sharing technical information and making technical decisions that often can impact more than a single person or team. Scrums of Scrums that are being used for technical coordination require participants with relevant technical acumen. Participants with technical acumen generally come from the technical part of the team (developers, architects or testers).

Pushing aside the noise of whether the coordination of activities is less Agile than using the SoS for technical coordination, a more pragmatic approach is to recognize that the needs of the team of teams are context driven. The type of decision the team of teams needs to make will change across a sprint or a SAFe Program Increment; therefore who should attend a SoS needs to vary. The downside to varying membership is ensuring the right people are in attendance to address the SoS's current need. One solution I have observed is to develop a cadence for topics. For example, tackle coordination every fourth SoS gathering, with more technical topics being the focus in between. Predictability makes it easy to plan who should attend. Regardless of approach, I suggest that any SoS should agree upon a mechanism to decide on the type of meeting to hold. Flexibility to identify the type of SoS will ensure the team does not fall prey to meeting schedule paralysis or the equally evil telephone game.


Categories: Process Management

R: Scraping Wimbledon draw data

Mark Needham - Fri, 06/26/2015 - 00:14

Given Wimbledon starts next week I wanted to find a data set to explore before it gets underway. Having searched around and failed to find one I had to resort to scraping the ATP World Tour’s event page which displays the matches in an easy to access format.

We’ll be using the Wimbledon 2013 draw since Andy Murray won that year! This is what the page looks like:

[screenshot: ATP World Tour results page for Wimbledon 2013]

Each match is in its own row of a table and each column has a class attribute which makes it really easy to scrape. We’ll be using R’s rvest again. I wrote the following script which grabs the player names, seedings and score of the match and stores everything in a data frame:

library(rvest)
library(dplyr)
library(stringr)
 
s = html_session("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2013/results")
rows = s %>% html_nodes("div#scoresResultsContent tr")
 
matches = data.frame()
for(row in rows) {  
  players = row %>% html_nodes("td.day-table-name a")
  seedings = row %>% html_nodes("td.day-table-seed")
  score = row %>% html_node("td.day-table-score a")
 
  if(!is.null(score)) {
    player1 = players[1] %>% html_text() %>% str_trim()
    seeding1 = ifelse(!is.na(seedings[1]), seedings[1] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
 
    player2 = players[2] %>% html_text() %>% str_trim()
    seeding2 = ifelse(!is.na(seedings[2]), seedings[2] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
 
    matches = rbind(data.frame(winner = player1, 
                               winner_seeding = seeding1, 
                               loser = player2, 
                               loser_seeding = seeding2,
                               score = score %>% html_text() %>% str_trim(),
                               round = round), matches)
 
  } else {
    round = row %>% html_node("th") %>% html_text()
  }
}

This is what the data frame looks like:

> matches %>% sample_n(10)
               winner winner_seeding                       loser loser_seeding            score                round
61      Wayne Odesnik            (4)                Thiago Alves          <NA>            61 64 1st Round Qualifying
4     Danai Udomchoke           <NA>            Marton Fucsovics          <NA>       61 57 1210 1st Round Qualifying
233    Jerzy Janowicz           (24)                Lukasz Kubot          <NA>         75 64 64       Quarter-Finals
90       Malek Jaziri           <NA>             Illya Marchenko           (9)        674 75 64 2nd Round Qualifying
222      David Ferrer            (4)         Alexandr Dolgopolov          (26) 676 762 26 61 62          Round of 32
54  Michal Przysiezny           (11)                 Dusan Lojda          <NA>         26 63 62 1st Round Qualifying
52           Go Soeda           (13)               Nikola Mektic          <NA>            62 60 1st Round Qualifying
42    Ruben Bemelmans           (23) Jonathan Dasnieres de Veigy          <NA>            63 64 1st Round Qualifying
31        Mirza Basic           <NA>              Tsung-Hua Yang          <NA>     674 33 (RET) 1st Round Qualifying
179     Jurgen Melzer           <NA>              Julian Reister           (Q)    36 762 765 62          Round of 64

It also contains qualifying matches which I’m not so interested in. Let’s strip those out:

main_matches = matches %>% filter(!grepl("Qualifying", round)) %>% mutate(year = 2013)

We’ll also put a column in for ‘year’ so that we can handle the draws for multiple years later on.

Next I wanted to clean up the data a bit. I’d like to be able to do some queries based on the seedings of the players but at the moment that column contains numeric brackets in values as well as some other values which indicate whether a player is a qualifier, lucky loser or wildcard entry.

I started by adding a column to store this extra information:

main_matches$winner_type = NA
main_matches$winner_type[main_matches$winner_seeding == "(WC)"] = "wildcard"
main_matches$winner_type[main_matches$winner_seeding == "(Q)"] = "qualifier"
main_matches$winner_type[main_matches$winner_seeding == "(LL)"] = "lucky loser"
 
main_matches$loser_type = NA
main_matches$loser_type[main_matches$loser_seeding == "(WC)"] = "wildcard"
main_matches$loser_type[main_matches$loser_seeding == "(Q)"] = "qualifier"
main_matches$loser_type[main_matches$loser_seeding == "(LL)"] = "lucky loser"

And then I cleaned up the existing column:

tidy_seeding = function(seeding) {
  no_brackets = gsub("\\(|\\)", "", seeding)
  return(gsub("WC|Q|L", NA, no_brackets))
}
 
main_matches = main_matches %>% 
  mutate(winner_seeding = as.numeric(tidy_seeding(winner_seeding)),
         loser_seeding = as.numeric(tidy_seeding(loser_seeding)))

Now we can write a query against the data frame to find out when the underdog won i.e. a player with no seeding beat a player with a seeding or a lower seeded player beat a higher seeded one:

> main_matches %>%  filter((winner_seeding > loser_seeding) | (is.na(winner_seeding) & !is.na(loser_seeding)))
                  winner winner_seeding                 loser loser_seeding                  score          round year
1          Jurgen Melzer             NA         Fabio Fognini            30           675 75 63 62   Round of 128 2013
2          Bernard Tomic             NA           Sam Querrey            21       766 763 36 26 63   Round of 128 2013
3        Feliciano Lopez             NA          Gilles Simon            19             62 64 7611   Round of 128 2013
4             Ivan Dodig             NA Philipp Kohlschreiber            16 46 676 763 63 21 (RET)   Round of 128 2013
5         Viktor Troicki             NA      Janko Tipsarevic            14              63 64 765   Round of 128 2013
6         Lleyton Hewitt             NA         Stan Wawrinka            11               64 75 63   Round of 128 2013
7           Steve Darcis             NA          Rafael Nadal             5             764 768 64   Round of 128 2013
8      Fernando Verdasco             NA      Julien Benneteau            31             761 764 64    Round of 64 2013
9           Grega Zemlja             NA       Grigor Dimitrov            29       36 764 36 64 119    Round of 64 2013
10      Adrian Mannarino             NA            John Isner            18               11 (RET)    Round of 64 2013
11         Igor Sijsling             NA          Milos Raonic            17              75 64 764    Round of 64 2013
12     Kenny De Schepper             NA           Marin Cilic            10                  (W/O)    Round of 64 2013
13        Ernests Gulbis             NA    Jo-Wilfried Tsonga             6         36 63 63 (RET)    Round of 64 2013
14     Sergiy Stakhovsky             NA         Roger Federer             3         675 765 75 765    Round of 64 2013
15          Lukasz Kubot             NA          Benoit Paire            25               61 63 64    Round of 32 2013
16     Kenny De Schepper             NA           Juan Monaco            22              64 768 64    Round of 32 2013
17        Jerzy Janowicz             24       Nicolas Almagro            15              766 63 64    Round of 32 2013
18         Andreas Seppi             23         Kei Nishikori            12        36 62 674 61 64    Round of 32 2013
19         Bernard Tomic             NA       Richard Gasquet             9          767 57 75 765    Round of 32 2013
20 Juan Martin Del Potro              8          David Ferrer             4              62 64 765 Quarter-Finals 2013
21           Andy Murray              2        Novak Djokovic             1               64 75 64         Finals 2013

There are actually very few times when a lower seeded player beat a higher seeded one but there are quite a few instances of non seeds beating seeds. We’ve got 21 occurrences of underdogs winning out of a total of 127 matches.

Let’s filter that set of rows and see which seeds lost in the first round:

> main_matches %>%  filter(round == "Round of 128" & !is.na(loser_seeding))
           winner winner_seeding                 loser loser_seeding                  score        round year
1   Jurgen Melzer             NA         Fabio Fognini            30           675 75 63 62 Round of 128 2013
2   Bernard Tomic             NA           Sam Querrey            21       766 763 36 26 63 Round of 128 2013
3 Feliciano Lopez             NA          Gilles Simon            19             62 64 7611 Round of 128 2013
4      Ivan Dodig             NA Philipp Kohlschreiber            16 46 676 763 63 21 (RET) Round of 128 2013
5  Viktor Troicki             NA      Janko Tipsarevic            14              63 64 765 Round of 128 2013
6  Lleyton Hewitt             NA         Stan Wawrinka            11               64 75 63 Round of 128 2013
7    Steve Darcis             NA          Rafael Nadal             5             764 768 64 Round of 128 2013

Rafael Nadal is the most prominent, but Stan Wawrinka also lost in the first round that year, which I'd forgotten about! Next let's make the 'round' column an ordered factor so that we can sort matches by round:

main_matches$round = factor(main_matches$round, levels =  c("Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals"))
 
> main_matches$round
...     
Levels: Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals

We can now really easily work out which unseeded players went the furthest in the tournament:

> main_matches %>% filter(is.na(loser_seeding)) %>% arrange(desc(round)) %>% head(5)
             winner winner_seeding             loser loser_seeding           score          round year
1    Jerzy Janowicz             24      Lukasz Kubot            NA        75 64 64 Quarter-Finals 2013
2       Andy Murray              2 Fernando Verdasco            NA  46 36 61 64 75 Quarter-Finals 2013
3 Fernando Verdasco             NA Kenny De Schepper            NA        64 64 64    Round of 16 2013
4      Lukasz Kubot             NA  Adrian Mannarino            NA  46 63 36 63 64    Round of 16 2013
5    Jerzy Janowicz             24     Jurgen Melzer            NA 36 761 64 46 64    Round of 16 2013

Next up I thought it’d be cool to write a function which showed which round each player exited in:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
    return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}

Our function isn’t vectorisable – it only works if we pass in a single player at a time so we’ll have to group the data frame by player before calling it. Let’s check it works by seeing how far Andy Murray and Rafael Nadal got:

> round_reached("Rafael Nadal", main_matches)
[1] "Round of 128"
> round_reached("Andy Murray", main_matches)
[1] "Winner"

Great. What about if we try it against each of the top 8 seeds?

> rbind(main_matches %>% filter(winner_seeding %in% 1:8) %>% mutate(name = winner, seeding = winner_seeding), 
        main_matches %>% filter(loser_seeding %in% 1:8) %>% mutate(name = loser, seeding = loser_seeding)) %>%
    select(name, seeding) %>%
    distinct() %>%
    arrange(seeding) %>%
    group_by(name) %>%
    mutate(round_reached = round_reached(name, main_matches))
Source: local data frame [8 x 3]
Groups: name
 
                   name seeding  round_reached
1        Novak Djokovic       1         Finals
2           Andy Murray       2         Winner
3         Roger Federer       3    Round of 64
4          David Ferrer       4 Quarter-Finals
5          Rafael Nadal       5   Round of 128
6    Jo-Wilfried Tsonga       6    Round of 64
7         Tomas Berdych       7 Quarter-Finals
8 Juan Martin Del Potro       8    Semi-Finals

Neat. Next up I want to do a comparison between the round they reached and the round you’d expect them to get to given their seeding, but that’s for the weekend!
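
As an aside, since round_reached isn’t vectorised, the group_by above calls it once per player; an alternative is a small wrapper that takes a whole vector of names. This is just a sketch against the same main_matches data frame (rounds_reached is my own helper name, not something from the original post):

# a vectorised wrapper: apply the single-player lookup to each name in turn
rounds_reached = function(players, main_matches) {
  sapply(players, function(p) round_reached(p, main_matches))
}
 
# e.g. the furthest round for a few players in one call
rounds_reached(c("Andy Murray", "Rafael Nadal", "Fernando Verdasco"), main_matches)

sapply keeps the input names on the result, so this returns a named character vector which is easy to eyeball or bind onto a table of seedings.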

I’ve put a CSV file containing all the data in this gist in case you want to play with it. I’m planning to scrape a few more years’ worth of data before Monday and add in some extra fields as well, but in case I don’t get around to it, the full script from this blog post is included in the gist too, so feel free to tweak it if tennis is your thing.

Categories: Programming

Two people coding is twice as productive, right?

Actively Lazy - Thu, 06/25/2015 - 22:14

Stands to reason, doesn’t it? If one person can make 5 widgets an hour, then two people can make 10 widgets an hour. It’s just the natural way of things. You can’t argue with science.

The same is obviously true of software, isn’t it? If one developer can write 10 lines of code an hour, then clearly two can write 20 lines of code an hour. If you want more code written, just hire more developers. There’s nothing mythical about¬†my man months.

And yet… somehow… software persists in being weird stuff.

This week I had an interesting experience. One other developer and I have been working on a new, greenfield project. We’ve been ploughing through the work, ticking off stories at a decent rate. Only now it’s getting to that difficult stage where the original design ideas are rapidly giving way to new problems and new ideas; substantial refactoring is going on as we discuss better ways of representing our problem. This seems good and healthy.

Then I had one of those days where everywhere I turned there was a design problem. Not a single line of code could be written without me getting grumpy about the design. Worst of all, it was the code my co-worker had just finished that was showing the flaws in the original design. Cue much discussion. At one point he lamented that he could finish the task (that was blocking me from making progress) “if he could just get a 30 minute run at his computer”. It was nearly 5pm.

A day where a 30 minute spell of productive coding is hard to find is not a day where much code has been written. Oh, we were productive: the design was much improved by day’s end. The code? Nothing to see here, move along, please. Were we really twice as productive that day? Hell no. I spent the entire day distracting him from completing his tasks to discuss design problems; he spent the entire day trying to merge a branch that my design refactoring had made difficult. We spent the entire day working against each other.

What could we have done differently? Well the first problem was trying to maintain two streams of development activity through the same (small) code base. We were tripping up over each other like crazy. Unwinding a few days, we probably would have got more done with just one person working. That way there would only be one narrative thread through the code, one sequence of refactoring steps at a time.

Wait, what – say that again: we would have got more done last week if only one person had been working on it. Well that’s just crazy talk, let me tell you about making widgets, boy…

I think we massively underestimate the cost of coordination and communication when building software. From the outside it’s very easy to miss: a quick 5 minute conversation laden with jargon. And yet… this is where the magic happens: this is where the design comes from. But if that 5 minute conversation interrupted someone’s work, the next 45 minutes could be lost while they try and reload into memory what they were working on. Pile up a few of these interruptions in your day, and no wonder it feels like you’re swimming upstream.

Clearly, what we should have been doing but weren’t was pairing. That way there would only have been one narrative thread. One sequence of ideas being applied at a time. Changes neatly serialized by there only being one keyboard. Of course, by pairing we still could have had the design discussions – but instead they would appear at a time when we were both already stuck. There is no cost of interruption when you’re both already there, immersed in the problem. By pairing we would have stopped working against each other and created an interruption-free space for design discussions.

So in fact: two people can be more productive than one. Two people pairing is definitely better than one person working on their own. It’s made me realise that we’ve been explaining pairing all wrong: we try and justify the “cost” of pairing, as though we somehow have to explain why having two people working at the same machine really isn’t halving productivity. It’s all based on a false assumption: that two people working on different machines are twice as productive as one person working alone. Once you realise that this assumption is fundamentally flawed, the “cost” of pairing evaporates. Instead pairing removes the cost of coordination between two developers: no interruptions, no divergent ideas, no merge conflicts.

Pair programming is actually a cost-saving exercise.


Categories: Programming, Testing & QA

Everything I Learned About PM Came From an Elementary School Teacher

Herding Cats - Glen Alleman - Thu, 06/25/2015 - 18:08

Our daughter is an elementary teacher in Austin, Texas, at a nice school – the number 2 school in Texas.

While visiting this week, we were talking about a new book a group of us are working on. When I showed her the TOC, she said, “Dad, we do all that stuff (minus the finance side) every day, week, month, semester, and year. It's not that hard. That's what we've been trained to do.” OK, but talent, dedication, skill, and a gift for teaching help. 

Here's how an elementary school teacher sees her job as the Project Manager of 20 young clients.

  • Plan before starting anything; it’s going to go wrong, so know that up front and be able to recognize the train wreck is coming and get out of the way.
    • The plan is a strategy for the successful completion of the project.
    • Without the plan, you don't know how to assess progress in terms meaningful to the decision makers. Measures of cost and schedule are measures of effectiveness. Measures of stories produced and features delivered aren't measures of capabilities produced.
    • A Capabilities Based Plan is that measure. What capabilities does the customer need to accomplish the business case or fulfill a mission?
    • In education, Bloom's Taxonomy with TLOs and ELOs defines the capabilities the student will possess at the end of the course.
  • Have a notion of what done looks like, so when you get there, you can stop and move on.
    • Done is defined as possessing a capability to accomplish something.
    • Write this down in units of Effectiveness and Performance.
  • Have your Plan B always ready to go and then start thinking of Plan C when Plan B is under way. No Plan A ever lasts too long in the presence of chaos.
    • Risk management is how adults manage projects - Tim Lister
    • Adult supervision is the role of the teacher. Many times adult supervision is also the role of the project manager.
  • Make sure you‚Äôve got all the right resources lined up and ready to spring into action when things go wrong. Classroom aides, class leaders, parents, staff all ready to go when the plan goes in the ditch.
    • Resource planning is a critical success factor for all projects.
  • Know what can go wrong before you start, steer away from trouble and trouble will stay away.
    • Risk planning is planning. Planning is strategy.
    • Apply good risk management to all activities on the project. Perform some formal sequence of risk management. Pick one. My favorite is the SEI Continuous Risk Management process.
  • Separate the troublemakers from the mainstream. You know them on day one.
    • Any good project manager can see trouble coming.
    • Isolate the troubled parts. Assign them to separate teams. Have them fix the problem so the rest of the project isn't impacted by them.
  • Show up early, prepare for the work, clean up afterward, so you can start “clean” again the next day. No less than 100% complete at the end of each period of performance. If not, you’ll pay dearly for it later.
    • Being prepared is the major attribute of project success.
    • This means planning.
    • Letting things emerge is fine for small, non-trivial projects with low value at risk.
  • Always ask “is this your best work?” and “did you put your name on it?” Otherwise you're creating re-work.
    • Set the highest quality standards possible.
  • No crying when it doesn’t work. Redo it and get back on schedule; recess time is schedule margin - you get to stay in and finish your planned work.
    • No whining; everyone put your “big boy” pants on and do the work needed to get the job done.
  • Take a break, go outside and play, think about what you’re going to do next hour. Come back and do it.
    • Have retrospectives.
    • Look back for opportunities for improvement
    • Do Root Cause Analysis to find out the “real” reason why things didn't work
    • Have fun while still working hard

Is This Your Best Work

Related articles: Who's Budget is it Anyway? | Systems Thinking, System Engineering, and Systems Management | Myth's Abound
Categories: Project Management

I Literally Don’t Give A Shit!

Making the Complex Simple - John Sonmez - Thu, 06/25/2015 - 16:00

In this episode, I literally don’t give a shit. Full transcript: John: Hey, John Sonmez from simpleprogrammer.com. I’ve got a bit of a long question. This is kind of an interesting one and there might be some profanity. I try to keep the content pretty clean, but I might get a little worked up about […]

The post I Literally Don’t Give A Shit! appeared first on Simple Programmer.

Categories: Programming

New Azure Billing APIs Available

ScottGu's Blog - Scott Guthrie - Thu, 06/25/2015 - 06:59

Organizations moving to the cloud can achieve significant cost savings.  But to achieve the maximum benefit you need to be able to accurately track your cloud spend in order to monitor and predict your costs. Enterprises need to be able to get detailed, granular consumption data and derive insights to effectively manage their cloud consumption.

I’m excited to announce the public preview release of two new Azure Billing APIs today: the Azure Usage API and Azure RateCard API which provide customers and partners programmatic access to their Azure consumption and pricing details:

Azure Usage API – A REST API that customers and partners can use to get their usage data for an Azure subscription. As part of this new Billing API we now correlate the usage/costs by the resource tags you can now set on your Azure resources (for example: you could assign a tag “Department abc” or “Project X” to a VM or Database in order to better track spend on a resource and charge it back to an internal group within your company). To get more details, please read the MSDN page on the Usage API. Enterprise Agreement (EA) customers can also use this API to get a more granular view into their consumption data, and to complement what they get from the EA Billing CSV.

Azure RateCard API – A REST API that customers and partners can use to get the list of available resources, along with metadata and price information about them. To get more details, please read the MSDN page on the RateCard API.

You can start taking advantage of both of these APIs today.  You can write your own custom code that uses the APIs to construct your own custom reports, or you can take advantage of the pre-built bill tracking systems provided by our partners, which already integrate the APIs into their existing solutions.

Partner Solutions

Two of our Azure Billing partners (Cloudyn and Cloud Cruiser) have already integrated the new Billing APIs into their products:

Cloudyn has integrated with Azure Billing APIs to provide IT financial management insights on cost optimization. You can read more about their integration experience in Microsoft Azure Billing APIs enable Cloudyn to Provide ITFM for Customers.

Cloud Cruiser has integrated with the Azure RateCard API to provide an estimate of what it would cost the customer to run the same workloads on Azure. They are also working on integrating with the Azure Usage API to provide insights based on the Azure consumption. You can read more about their integration in Cloud Cruiser and Microsoft Azure Billing API Integration.

You can adopt one or both of the above solutions immediately and use them to better track your Azure bill without having to write a single line of code.


Cloudyn's integration enables you to view and query the breakdown of Azure usage by resource tags (e.g. “Dev/Test”, “Department abc”, “Project X”):


Cloudyn's integration showing trend of estimated charges over time:


Cloud Cruiser's integration showing the estimated cost of running a workload on Azure:

Using the Billing APIs directly

You can also use the new Billing APIs directly to write your own custom reports and billing tracking logic.  To get started with the APIs, you can leverage the code samples on Github.
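
To give a feel for the call pattern, here is a minimal sketch in R (using httr and jsonlite) of calling the Usage API, assuming you already have an Azure AD bearer token for the subscription. The endpoint, api-version and query parameters shown are my reading of the preview documentation – verify them against the MSDN pages and the Github samples rather than treating this as a definitive reference:

library(httr)
library(jsonlite)
 
subscription_id = "your-subscription-id"   # placeholder
access_token    = "your-aad-access-token"  # an Azure AD bearer token, obtained separately
 
# Usage API (preview): pull daily usage aggregates for a reporting window
response = GET(
  paste0("https://management.azure.com/subscriptions/", subscription_id,
         "/providers/Microsoft.Commerce/UsageAggregates"),
  query = list(`api-version`          = "2015-06-01-preview",
               reportedStartTime      = "2015-05-01T00:00:00Z",
               reportedEndTime        = "2015-06-01T00:00:00Z",
               aggregationGranularity = "Daily",
               showDetails            = "false"),
  add_headers(Authorization = paste("Bearer", access_token)))
 
usage = fromJSON(content(response, as = "text"), flatten = TRUE)
 
# each entry carries the meter, quantity and any resource tags, which is what
# lets you break spend down by tags like "Department abc" or "Project X";
# the RateCard API follows the same pattern with a $filter query parameter
head(usage$value)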

The Billing APIs leverage the new Azure Resource Manager and use Azure Active Directory for Authentication and follow the Azure Role-based access control policies.  The code samples we’ve published show a variety of common scenarios and how to integrate this logic end to end.

Summary

The new Azure Billing APIs make it much easier to track your bill and save money.

As always, please reach out to us on the Azure Feedback forum and through the Azure MSDN forum.

Hope this helps,

Scott

Categories: Architecture, Programming