Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Succeeding with Geographically Distributed Scrum Teams - New White Paper

10x Software Development - Steve McConnell - Tue, 06/30/2015 - 19:02

We have a new white paper, "Succeeding with Geographically Distributed Scrum Teams." To quote the white paper itself: 

When organizations adopt Agile throughout the enterprise, they typically apply it to both large and small projects. The gap is that most Agile methodologies, such as Scrum and XP, are team-level workflow approaches. These approaches can be highly effective at the team level, but they do not address large project architecture, project management, requirements, and project planning needs. Our clients find that succeeding with Scrum on a large, geographically distributed team requires adopting additional practices to ensure the necessary coordination, communication, integration, and architectural work. This white paper discusses common considerations for success with geographically distributed Scrum.

Check it out!

What’s new with Google Fit: Distance, Calories, Meal data, and new apps and wearables

Google Code Blog - Tue, 06/30/2015 - 18:52

Posted by Angana Ghosh, Lead Product Manager, Google Fit

To help users keep track of their physical activity, we recently updated the Google Fit app with some new features, including an Android Wear watch face that helps users track their progress throughout the day. We also added data types to the Google Fit SDK and have new partners tracking data (e.g. nutrition, sleep, etc.) that developers can now use in their own apps. Find out how to integrate Google Fit into your app and read on to check out some of the cool new stuff you can do.


Distance traveled per day

The Google Fit app now computes the distance traveled per day. Subscribe to it using the Recording API and query it using the History API.

Calories burned per day

If a user has entered their details into the Google Fit app, the app now computes their calories burned per day. Subscribe to it using the Recording API and query it using the History API.

Nutrition data from LifeSum, Lose It!, and MyFitnessPal

LifeSum and Lose It! are now writing nutrition data, like calories consumed, macronutrients (proteins, carbs, fats), and micronutrients (vitamins and minerals) to Google Fit. MyFitnessPal will start writing this data soon too. Query it from Google Fit using the History API.

Sleep activity from Basis Peak and Sleep as Android

Basis Peak and Sleep as Android are now writing sleep activity segments to Google Fit. Query this data using the History API.

New workout sessions and activity data from even more great apps and fitness wearables!

Endomondo, Garmin, the Daily Burn, the Basis Peak and the Xiaomi miBand are new Google Fit partners that will allow users to store their workout sessions and activity data. Developers can access this data with the user's permission, and it will also be shown in the Google Fit app.

How are developers using the Google Fit platform?

Partners like LifeSum and Lose It! are reading all-day activity to help users keep track of their physical activity in their favorite fitness app.

Runkeeper now shows a Google Now card to its users encouraging them to “work off” their meals, based on their meals written to Google Fit by other apps.

Instaweather has integrated Google Fit into a new Android Wear watch face that they're testing in beta. To try out the face, first join this Google+ community and then follow the link to join the beta and download the app.

We hope you enjoy checking out these Google Fit updates. Thanks to all our partners for making it possible! Find out more about integrating the Google Fit SDK into your app.

Categories: Programming

Prioritize and Optimize Over a Slightly Longer Horizon

Mike Cohn's Blog - Tue, 06/30/2015 - 15:00

A lot of agile literature stresses that product owners must prioritize the delivery of value. I’m not going to argue with that. But I am going to argue that product owners need to optimize over a slightly longer horizon than a single sprint.

A product owner's goal is to maximize the amount of value delivered over the life of a product. If the product owner shows up at each sprint planning meeting focused on maximizing the value delivered in only that one sprint, the product owner will never choose to invest in the system's future.

The product owner will instead always choose to deliver the feature that delivers the most value in the immediate future regardless of the long-term benefit. Such a product owner is like the pleasure-seeking student who parties every night during the semester but fails, and has to repeat the course during the summer.

A product owner with a short time horizon may have the team work on features A1, B1, C1 and D1 in four different parts of the application this sprint because those are the highest valued features.

A product owner with a more appropriate, longer view of the product will instead have the team work on A1, A2, A3 and A4 in the first sprint because there are synergies to working on all the features in area A at the same time.

Agile is still about focusing on value. And it’s still about making sure we deliver value in the short term. We just don’t want to become shortsighted about it.

How to create the smallest possible docker container of any image

Xebia Blog - Tue, 06/30/2015 - 10:46

Once you start to do some serious work with Docker, you soon find that downloading images from the registry is a real bottleneck in starting applications. In this blog post we show you how to reduce the size of any Docker image to just a few percent of the original. So if your image is too fat, try stripping your Docker image! The strip-docker-image utility demonstrated in this blog makes your containers faster and safer at the same time!


We are working quite intensively on our Highly Available Docker Container Platform using CoreOS and Consul, which consists of a number of containers (NGiNX, HAProxy, the Registrator and Consul). These containers run on each of the nodes in our CoreOS cluster, and when the cluster boots, more than 600 MB is downloaded by the 3 nodes. This is quite time consuming.

cargonauts/consul-http-router      latest              7b9a6e858751        7 days ago          153 MB
cargonauts/progrium-consul         latest              32253bc8752d        7 weeks ago         60.75 MB
progrium/registrator               latest              6084f839101b        4 months ago        13.75 MB

The size of the images is not only detrimental to the boot time of our platform, it also increases the attack surface of the container. With 153 MB of utilities in the NGiNX-based consul-http-router, there is a lot of stuff in the container that can be used once you get inside. As we were thinking of running this router in a DMZ, we wanted to minimise the number of tools lying around for a potential hacker.

From our colleague Adriaan de Jonge we already learned how to create the smallest possible Docker container  for a Go program. Could we repeat this by just extracting the NGiNX executable from the official distribution and copying it onto a scratch image?  And it turns out we can!
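To make that idea concrete before diving in: the end result is conceptually nothing more than the empty scratch base image plus the extracted files. Here is a minimal sketch, assuming the required files have first been collected into a local export.tar (the archive name and this Dockerfile are illustrative, not part of the original recipe):

FROM scratch
# ADD unpacks a local tar archive into the image's (empty) root filesystem
ADD export.tar /
ENTRYPOINT ["/usr/sbin/nginx"]

The rest of this post works out which files need to go into that archive.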

Finding the necessary files

Using the dpkg utility, we can list all the files that are installed by the NGiNX package:

docker run nginx dpkg -L nginx
...
/.
/usr
/usr/sbin
/usr/sbin/nginx
/usr/share
/usr/share/doc
/usr/share/doc/nginx
...
/etc/init.d/nginx
Locating dependent shared libraries

So we have the list of files in the package, but we do not have the shared libraries that are referenced by the executable. Fortunately, these can be retrieved using the ldd utility.

docker run nginx ldd /usr/sbin/nginx
...
	linux-vdso.so.1 (0x00007fff561d6000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd8f17cf000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007fd8f1598000)
	libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fd8f1329000)
	libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fd8f10c9000)
	libcrypto.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fd8f0cce000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd8f0ab2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd8f0709000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd8f19f0000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd8f0505000)
Following and including symbolic links

Now that we have the executable and the referenced shared libraries, it turns out that ldd normally lists the symbolic link rather than the actual file name of the shared library.

docker run nginx ls -l /lib/x86_64-linux-gnu/libcrypt.so.1
...
lrwxrwxrwx 1 root root 16 Apr 15 00:01 /lib/x86_64-linux-gnu/libcrypt.so.1 -> libcrypt-2.19.so

By resolving the symbolic links and including both the link and the file, we are ready to export the bare essentials from the container!
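A rough shell sketch of that step, using the libcrypt example from above (hedged: libcrypt.tar is an illustrative name, and the strip-docker-image utility introduced below automates this for every library):

# resolve the symlink inside the image to find the real file name
LINK=/lib/x86_64-linux-gnu/libcrypt.so.1
TARGET=$(docker run --rm nginx readlink -f "$LINK")

# export both the symlink and its target; tar preserves the link itself
docker run --rm nginx tar -cf - "$LINK" "$TARGET" > libcrypt.tar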

getpwnam does not work

But after copying all the essential files to a scratch image, NGiNX did not start. It appeared that NGiNX tries to resolve the user 'nginx' and fails to do so.

docker run -P  --entrypoint /usr/sbin/nginx stripped-nginx  -g "daemon off;"
...
2015/06/29 21:29:08 [emerg] 1#1: getpwnam("nginx") failed (2: No such file or directory) in /etc/nginx/nginx.conf:2
nginx: [emerg] getpwnam("nginx") failed (2: No such file or directory) in /etc/nginx/nginx.conf:2

It turned out that the name service switch (NSS) libraries, which read /etc/passwd and /etc/group, are loaded at runtime and therefore do not show up as dependencies in the ldd output. By adding these shared libraries (/lib/*/libnss*) to the container, NGiNX worked!
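To see which files that glob actually matches in the official image, a quick check (the output will vary with the image version):

docker run --rm nginx sh -c 'ls -l /lib/*/libnss*'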

strip-docker-image example

So now, the strip-docker-image utility is here for you to use!

    strip-docker-image  -i image-name
                        -t target-image-name
                        [-p package]
                        [-f file]
                        [-x expose-port]
                        [-v]

The options are explained below:

-i image-name           the image to strip
-t target-image-name    the name for the stripped image
-p package              a package to include from the image; multiple -p allowed
-f file                 a file (or glob) to include from the image; multiple -f allowed
-x port                 a port to expose
-v                      verbose output

The following example creates a new nginx image, named stripped-nginx based on the official Docker image:

strip-docker-image -i nginx -t stripped-nginx  \
                           -x 80 \
                           -p nginx  \
                           -f /etc/passwd \
                           -f /etc/group \
                           -f '/lib/*/libnss*' \
                           -f /bin/ls \
                           -f /bin/cat \
                           -f /bin/sh \
                           -f /bin/mkdir \
                           -f /bin/ps \
                           -f /var/run \
                           -f /var/log/nginx \
                           -f /var/cache/nginx

Aside from the nginx package, we add the files /etc/passwd and /etc/group and the /lib/*/libnss* shared libraries. The directories /var/run, /var/log/nginx and /var/cache/nginx are required for NGiNX to operate. In addition, we added /bin/sh and a few handy utilities, just to be able to snoop around a little bit.
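Those utilities also give you an easy way to inspect the result from the inside. For example (assuming the stripped-nginx image built above):

# snoop around in the stripped image using the /bin/sh we included
docker run --rm -it --entrypoint /bin/sh stripped-nginx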

The stripped image has shrunk to an incredible 5.4% of the original: from 132.8 MB down to just 7.3 MB, and it is still fully operational!

docker images | grep nginx
...
stripped-nginx                     latest              d61912afaf16        21 seconds ago      7.297 MB
nginx                              1.9.2               319d2015d149        12 days ago         132.8 MB

And it works!

ID=$(docker run -P -d --entrypoint /usr/sbin/nginx stripped-nginx  -g "daemon off;")
docker run --link $ID:stripped cargonauts/toolbox-networking curl -s -D - http://stripped
...
HTTP/1.1 200 OK

For HAProxy, check out the examples directory.

Conclusion

It is possible to use the official images that are maintained and distributed by Docker and strip them down to their bare essentials, ready for use! This speeds up load times and reduces the attack surface of that specific container.

Check out the GitHub repository for the script and the manual page.

Please send me your examples of incredibly shrunk Docker images!

Making Decisions In The Presence of Uncertainty

Herding Cats - Glen Alleman - Tue, 06/30/2015 - 03:57

Decision making is hard. Decision making is easy when we know what to do; when we don't know what to do, there are conflicting choices that must be balanced in the presence of uncertainty about each of those choices. The bigger issue is that the important choices are usually the ones where we know the least about the outcomes, and about the cost and schedule needed to achieve those outcomes.

Decision science evolved to cope with decision making in the presence of uncertainty. This approach goes back to Bernoulli in the early 1700s, but remained an academic subject into the 20th century, because there was no satisfactory way to deal with the complexity of real life. Just after World War II, the fields of systems analysis and operations research began to develop. With the help of computers, it became possible to analyze problems of great complexity in the presence of uncertainty.

In 1938, Chester Barnard, author of The Functions of the Executive, imported the term “decision making” from the lexicon of public administration into the business world. This term replaced narrower descriptions such as “resource allocation” and “policy making.”

Decision analysis functions at four different levels:

  • Philosophy - uncertainty is a consequence of our incomplete knowledge of the world. In some cases, uncertainty can be partially or completely resolved before decisions are made and resources committed. In many important cases, complete information is not available or is too expensive (in time, money, or other resources) to obtain.
  • Decision framework - decision analysis provides concepts and language to help the decision-maker become aware of the adequacy or inadequacy of the decision basis.
  • Decision-making process - provides a step-by-step procedure that has proved practical in tackling even the most complex problems in an efficient and orderly way.
  • Decision making methodology - provides a number of specific tools that are sometimes indispensable in analyzing a decision problem. These tools include procedures for eliciting and constructing influence diagrams, probability trees, and decision trees; procedures for encoding probability functions and utility curves; and a methodology for evaluating these trees and obtaining information useful to further refine the analysis.

Each level focuses on different aspects of the problem of making decisions. And it is decision making that we're after.  The purpose of the analysis is not to obtain a set of numbers describing decision alternatives. It is to provide the decision-maker the insight needed to choose between alternatives. These insights typically have three elements:

  • What is important to making the decision?
  • Why is it important?
  • How important is it?

Now To The Problem at Hand

It has been conjectured ...

No Estimates

The key here, and the critical unanswered question, is: how can a decision about a future outcome, in the presence of that uncertain future, be made in the absence of estimates of the attributes going into that decision?

That is, if we have less than acceptable knowledge about a future outcome, how can we make a decision about the choices involved in that outcome?

Dealing with Uncertainty

All project work operates in the presence of uncertainty. The underlying statistical processes create probabilistic outcomes for future activities. These activities may be probabilistic events, or the naturally occurring variances of the processes that make up the project. 

Clarity of discussion through the language of probability is one of the foundations of decision analysis. The reality of uncertainty must be confronted and described, and the mathematics of probability is the natural language for describing uncertainty.

When we don't have that clarity of language, when redefined or misused mathematical terms enter the conversation, agreeing on the ways (and there are many) of making decisions in the presence of an uncertain future becomes bogged down in approaches that can't be tested in any credible manner. What remains is personal opinion, small-sample anecdotes, and attempts to solve complex problems with simple, and simple-minded, approaches.

For every complex problem there is an answer that is clear, simple, and wrong. H. L. Mencken

Related articles: Eyes Wide Shut - A View of No Estimates; Carl Sagan's BS Detector; Systems Thinking, System Engineering, and Systems Management
Categories: Project Management

Your Job Is Going Away and You Won’t Get a Retirement

Making the Complex Simple - John Sonmez - Mon, 06/29/2015 - 16:00

The world is changing—rapidly. Not too long ago, you could reasonably expect gainful employment at a single company for most of your life and then enjoy a nice pension or retirement plan later in life. It wasn’t too long ago when most software developers might be expected to work for the same company for at […]

The post Your Job Is Going Away and You Won’t Get a Retirement appeared first on Simple Programmer.

Categories: Programming

Dealing with People You Can’t Stand

“If You Want To Go Fast, Go Alone. If You Want To Go Far, Go Together” – African Proverb

I blew the dust off some old posts to rekindle some of the most important information for work and life.

It’s about dealing with people you can’t stand.

Whether you think of them as jerks, bullies, or just difficult people, the better you can deal with difficult people, the better you can get things done and make things happen.

And the more you learn how to bring out the best in people at their worst, the less often you'll find people you can't stand.

How To Bring Out the Best in People at Their Worst (Including Yourself)

Everything I needed to learn about dealing with difficult people, I learned from the book Dealing with People You Can’t Stand: How to Bring Out the Best in People at Their Worst, by Dr. Rick Brinkman and Dr. Rick Kirschner.

It’s one of the most brilliant, thoughtful books I’ve ever read on interpersonal skills and dealing with all sorts of bad behaviors.

The real key to dealing with difficult behavior is more than just recognizing bad behaviors in other people.

It’s recognizing bad behaviors in yourself, the kind that contribute to and amplify other people’s bad behaviors.

The more you know, the more you grow, and this is truly one of those transformational books.

Learn How To Deal with Difficult People (and Gain Some Mad Interpersonal Skills)

I've completely re-written my post that provides an overview of the big ideas in Dealing with People You Can't Stand:

Dealing with People You Can’t Stand

Even better, I’ve re-written all of my posts that talk through the 10 Types of Difficult People, and what to do about them.

I have to warn you:  Once you learn the 10 Types of Difficult People, you’ll be using the labels to classify bad behaviors that you experience in the halls, in meetings, behind your back, etc.

With that in mind, here they are …

10 Types of Difficult People

Here are the 10 Types of Difficult People at a glance:

  1. Grenade Person – After a brief period of calm, the Grenade person explodes into unfocused ranting and raving about things that have nothing to do with the present circumstances.
  2. Know-It-Alls – Seldom in doubt, the Know-It-All person has a low tolerance for correction and contradiction. If something goes wrong, however, the Know-It-All will speak with the same authority about who’s to blame – you!
  3. Maybe Person – In a moment of decision, the Maybe Person procrastinates in the hope that a better choice will present itself.
  4. No Person – A No Person kills momentum and creates friction for you. More deadly to morale than a speeding bullet, more powerful than hope, able to defeat big ideas with a single syllable.
  5. Nothing Person – A Nothing Person doesn’t contribute to the conversation. No verbal feedback, no nonverbal feedback, Nothing. What else could you expect from … the Nothing Person.
  6. Snipers – Whether through rude comments, biting sarcasm, or a well-timed roll of the eyes, making you look foolish is the Sniper’s specialty.
  7. Tanks – The Tank is confrontational, pointed and angry, the ultimate in pushy and aggressive behavior
  8. Think-They-Know-It-Alls – Think-They-Know-It-All people can’t fool all the people all the time, but they can fool some of the people enough of the time, and enough of the people all of the time – all for the sake of getting some attention.
  9. Whiners – Whiners feel helpless and overwhelmed by an unfair world. Their standard is perfection, and no one and nothing measures up to it.
  10. Yes Person – In an effort to please people and avoid confrontation, Yes People say “yes” without thinking things through.

I warned you.  Are you already thinking about some Snipers in a few meetings that you have, or is there a Yes Person driving you nuts (or are you that Yes Person?)

Have you talked to a Think-They-Know-It-All lately, or worse, a Know-It-All?

Never fear, I’ve included actionable insights and recommendations for dealing with all the various bad behaviors you’ll encounter.

The Lens of Human Understanding

If all this talk about dealing with difficult people and silly labels seems like a gimmick, it's not. It's actually deep insight rooted in a powerful but simple framework that Dr. Rick Brinkman and Dr. Rick Kirschner refer to as the Lens of Human Understanding:

[Diagram: The Lens of Human Understanding]

Once I learned The Lens of Human Understanding, so many things fell into place.

Not only did I understand myself better, but I could instantly see what was driving other people, and how my behavior would either create more conflict or resolve it.

But when you don’t know what makes people tick, it’s very easy to get ticked off, or to tick them off.

Here’s looking at you … and other people … and their behaviors … in a brand new way.

You Might Also Like

25 Books the Most Successful Microsoft Leaders Read and Do

Interpersonal Skills Books

Personal Development Hub on Sources of Insight

Personal Development Resources at Sources of Insight

The Great Leadership Quotes Collection

Categories: Architecture, Programming

Software Development Conferences Forecast June 2015

From the Editor of Methods & Tools - Mon, 06/29/2015 - 14:17
Here is a list of software development related conferences and events on Agile project management ( Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP) and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods & […]

R: Speeding up the Wimbledon scraping job

Mark Needham - Mon, 06/29/2015 - 06:36

Over the past few days I've written a few blog posts about a Wimbledon data set I've been building, and after running the scripts a few times I noticed that it was taking much longer to run than I expected.

To recap, I started out with the following function which takes in a URI and returns a data frame containing a row for each match:

library(rvest)
library(dplyr)
library(stringr)  # for str_trim, used below
 
scrape_matches1 = function(uri) {
  matches = data.frame()
 
  s = html(uri)
  rows = s %>% html_nodes("div#scoresResultsContent tr")
  i = 0
  for(row in rows) {  
    players = row %>% html_nodes("td.day-table-name a")
    seedings = row %>% html_nodes("td.day-table-seed")
    score = row %>% html_node("td.day-table-score a")
    flags = row %>% html_nodes("td.day-table-flag img")
 
    if(!is.null(score)) {
      player1 = players[1] %>% html_text() %>% str_trim()
      seeding1 = ifelse(!is.na(seedings[1]), seedings[1] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag1 = flags[1] %>% html_attr("alt")
 
      player2 = players[2] %>% html_text() %>% str_trim()
      seeding2 = ifelse(!is.na(seedings[2]), seedings[2] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag2 = flags[2] %>% html_attr("alt")
 
      matches = rbind(data.frame(winner = player1, 
                                 winner_seeding = seeding1, 
                                 winner_flag = flag1,
                                 loser = player2, 
                                 loser_seeding = seeding2,
                                 loser_flag = flag2,
                                 score = score %>% html_text() %>% str_trim(),
                                 round = round), matches)      
    } else {
      # header rows carry the round name; remember it for the match rows that follow
      round = row %>% html_node("th") %>% html_text()
    }
  } 
  return(matches)
}

Let’s run it to get an idea of the data that it returns:

matches1 = scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results")
 
> matches1 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals
2   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
3 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
6  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
7  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals

As I mentioned, it’s quite slow but I thought I’d wrap it in system.time so I could see exactly how long it was taking:

> system.time(scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results"))
   user  system elapsed 
 25.570   0.111  31.416

About 30 seconds! The first thing I tried was downloading the file separately and running the function against the local file:

> system.time(scrape_matches1("data/raw/2014.html"))
   user  system elapsed 
 25.662   0.123  25.863

Hmmm, that's only saved us about 5 seconds, so the bottleneck must be somewhere else. Still, there's no point making an HTTP request every time we run the script, so we'll stick with the local file version.

While browsing rvest’s vignette I noticed a function called html_table which I was curious about. I decided to try and replace some of my code with a call to that:

matches2 = html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round)
 
> matches2 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding           loser loser_seeding            score          round
1  Novak Djokovic            (1)   Roger Federer           (4) 677 64 764 57 64         Finals
2  Novak Djokovic            (1) Grigor Dimitrov          (11)    64 36 762 767    Semi-Finals
3   Roger Federer            (4)    Milos Raonic           (8)         64 64 64    Semi-Finals
4  Novak Djokovic            (1)     Marin Cilic          (26)  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)     Andy Murray           (3)        61 764 62 Quarter-Finals
6   Roger Federer            (4)   Stan Wawrinka           (5)     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)    Nick Kyrgios          (WC)    674 62 64 764 Quarter-Finals

I had to do some slightly clever stuff to get the ’round’ column into shape using zoo’s na.locf function which I wrote about previously.

Unfortunately I couldn't work out how to extract the flag with this version – that value is hidden in the 'alt' attribute of an img, and presumably html_table just grabs the text value of each cell. This version is much quicker though!

system.time(html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round))
 
   user  system elapsed 
  0.545   0.002   0.548

What I realised from writing this version is that I need to match all the columns with one call to html_nodes rather than getting the row and then each column in a loop.

I rewrote the function to do that:

scrape_matches3 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  matches3 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim())
  )
  return(matches3)
}

Let’s run and time that to check we’re getting back the right results in a timely manner:

> matches3 %>% sample_n(10)
                   winner winner_seeding winner_flag               loser loser_seeding loser_flag         score
70           David Ferrer            (7)         ESP Pablo Carreno Busta                      ESP  60 673 61 61
128        Alex Kuznetsov           (26)         USA         Tim Smyczek           (3)        USA   46 63 63 63
220   Rogerio Dutra Silva                        BRA   Kristijan Mesaros                      CRO         62 63
83         Kevin Anderson           (20)         RSA        Aljaz Bedene          (LL)        GBR      63 75 62
73          Kei Nishikori           (10)         JPN   Kenny De Schepper                      FRA     64 765 75
56  Roberto Bautista Agut           (27)         ESP         Jan Hernych           (Q)        CZE   75 46 62 62
138            Ante Pavic                        CRO        Marc Gicquel          (29)        FRA  46 63 765 64
174             Tim Puetz                        GER     Ruben Bemelmans                      BEL         64 62
103        Lleyton Hewitt                        AUS   Michal Przysiezny                      POL 62 6714 61 64
35          Roger Federer            (4)         SUI       Gilles Muller           (Q)        LUX      63 75 63
 
> system.time(scrape_matches3("data/raw/2014.html"))
   user  system elapsed 
  0.815   0.006   0.827

It's still quick – a bit slower than html_table, but we can deal with that. As you can see, I also had to add some logic to separate the values for the winners and losers – the players, seeds and flags come back as one big list. The odd entries represent the winner; the even entries the loser.

Annoyingly we've now lost the 'round' column because that appears as a table heading, so we can't extract it the same way. I ended up cheating a bit to get it to work, by working out how many matches each round should contain and generating a vector with that number of entries:

raw_rounds = s %>% html_nodes("th") %>% html_text()
 
> raw_rounds
 [1] "Finals"               "Semi-Finals"          "Quarter-Finals"       "Round of 16"          "Round of 32"         
 [6] "Round of 64"          "Round of 128"         "3rd Round Qualifying" "2nd Round Qualifying" "1st Round Qualifying"
 
rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
            sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
> rounds[1:10]
 [1] "Finals"         "Semi-Finals"    "Semi-Finals"    "Quarter-Finals" "Quarter-Finals" "Quarter-Finals" "Quarter-Finals"
 [8] "Round of 16"    "Round of 16"    "Round of 16"

Let’s put that code into the function and see if we end up with the same resulting data frame:

scrape_matches4 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  raw_rounds = s %>% html_nodes("th") %>% html_text()
  rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
              sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
  matches4 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim()),
    round          = rounds
  )
  return(matches4)
}
 
matches4 = scrape_matches4("data/raw/2014.html")
 
> matches4 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals
2  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
3   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
6   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals

We shouldn’t have added much to the time but let’s check:

> system.time(scrape_matches4("data/raw/2014.html"))
   user  system elapsed 
  0.816   0.004   0.824

Sweet. We've saved ourselves 29 seconds per page, as long as the number of rounds stays constant over the years. For the 10 years that I've looked at it has, but I expect that if you go back further the draw sizes will have been different and our script would break.

For now though this will do!

Categories: Programming

Information Technology Estimating Quality

Herding Cats - Glen Alleman - Mon, 06/29/2015 - 01:07

The estimator's charter is not to state what the developers should do, but rather to provide a reasonable projection of what they will do. - Tom DeMarco

Here are a few resource materials for estimating cost, schedule, and technical outcomes on software-intensive systems. In a recent meeting about managing risk in the presence of uncertainty, it became clear we need to integrate estimating with risk, technical performance measures, measures of effectiveness, measures of performance, cost, and schedule.

Some Recent Resources

There are 1,117 other papers and articles in the Software Cost Estimating folder on my server. These are just a very small sample of how to make estimates.

The notion that we can make decisions in the presence of uncertainty without estimating (#NoEstimates) the outcomes of those decisions can only work if there are no opportunity costs at risk in the future. That is, there is nothing at risk in making a choice between multiple outcomes.

Related articles: Humpty Dumpty and #NoEstimates; Climbing Mountains Requires Good Estimates; Carl Sagan's BS Detector
Categories: Project Management

R: dplyr – Update rows with earlier/previous rows values

Mark Needham - Sun, 06/28/2015 - 23:30

Recently I had a data frame which contained a column which had mostly empty values:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA))
  col1 col2
1    1    a
2    2 <NA>
3    3 <NA>
4    4    b
5    5 <NA>

I wanted to fill in the NA values with the last non NA value from that column. So I want the data frame to look like this:

1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

I spent ages searching around before I came across the na.locf function in the zoo library which does the job:

library(zoo)
library(dplyr)
 
> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA)) %>% 
    do(na.locf(.))
  col1 col2
1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

This will fill in the missing values for every column, so if we had a third column with missing values it would populate those too:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    do(na.locf(.))
 
  col1 col2 col3
1    1    a    A
2    2    a    A
3    3    a    B
4    4    b    B
5    5    b    B

If we only want to populate 'col2' and leave 'col3' as it is, we can apply the function specifically to that column:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    mutate(col2 = na.locf(col2))
  col1 col2 col3
1    1    a    A
2    2    a <NA>
3    3    a    B
4    4    b <NA>
5    5    b <NA>

It's quite a neat function and certainly comes in handy when cleaning up data sets, which don't tend to be as uniform as you'd hope!

Categories: Programming

SPaMCAST 348 - Woody Zuill, #NoEstimates

Software Process and Measurement Cast - Sun, 06/28/2015 - 22:00

The Software Process and Measurement Cast features our interview with Woody Zuill.  We talked about the concept and controversy swirling around #NoEstimates. Even if the concept is a bridge too far for you, the conversation is important because we talked about why thinking and questioning is a critical survival technique. As Woody points out, it is important to peer past the “thou musts” to gain greater understanding of what you should be doing!

Woody Zuill has been programming computers for 30+ years. Over the last 15+ years he has worked as an Agile Coach, Trainer, and Extreme Programmer and now works with Industrial Logic as a Trainer/Coach/Consultant for Agile and Lean software development. He believes code must be simple, clean, and maintainable to realize the Agile promise of Responding to Change. 

Contact Information
Mob Programming: http://mobprogramming.org/
Blog: http://zuill.us/WoodyZuill/
Twitter: https://twitter.com/woodyzuill

Call to action!

I have a challenge for the Software Process and Measurement Cast listeners for the next few weeks. I would like you to find one person who you think would like the podcast and introduce them to the cast. This might mean sending them the URL or teaching them how to download podcasts. If you like the podcast and think it is valuable, they will be thankful to you for introducing them to the Software Process and Measurement Cast! Thank you in advance!

Re-Read Saturday News

We have just begun the Re-Read Saturday of The Mythical Man-Month. We are off to a rousing start beginning with the Tar Pit.   Get a copy now and start reading!

The Re-Read Saturday and other great articles can be found on the Software Process and Measurement Blog.

Remember: We just completed the Re-Read Saturday of Eliyahu M. Goldratt and Jeff Cox's The Goal: A Process of Ongoing Improvement, which began on February 21st. What did you think? Did the re-read cause you to pick The Goal back up for a refresher? Visit the Software Process and Measurement Blog and review the whole re-read.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast.

Dead Tree Version or Kindle Version 

Upcoming Events

Software Quality and Test Management 

September 13 – 18, 2015

San Diego, California

http://qualitymanagementconference.com/

I will be speaking on the impact of cognitive biases on teams!  Let me know if you are attending!

More on other great conferences soon!

Next SPaMCast

The next Software Process and Measurement Cast is a magazine installment.  We will feature our essay on Agile Testing. The flow of testing is different in an Agile project.  In many cases, organizations have either not recognized the change in flow, or have created Agile/waterfall hybrids with test groups holding onto waterfall patterns.  While some of the hybrids are driven by mandated contractual relationships, the majority are driven by lack of understanding or fear of how testing should flow in Agile projects.

We will also have new installments from Jeremy Berriault's QA Corner. Jeremy is a leader in the world of quality assurance and testing and was originally interviewed on Software Process and Measurement Cast 274. The third column features Steve Tendon discussing more of his great new book, Hyper-Productive Knowledge Work Performance.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques, co-authored by Murali Chemuturi and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management


C4 stencil for OmniGraffle

Coding the Architecture - Simon Brown - Sun, 06/28/2015 - 11:38

If you like the look and feel of the C4 software architecture diagrams in my Software Architecture for Developers book (see examples here), Dennis Laumen has created an OmniGraffle stencil that will save you some time. Just download the stencil, install and it will appear in your stencil library.

The C4 stencil is available from Omni Group's Stenciltown. Thanks Dennis!

Categories: Architecture

Re-Read Saturday: Mythical Man-Month, Part 1


The Mythical Man-Month: Essays on Software Engineering was originally published in 1975. That was the year I took my first course in programming: FORTRAN, at Memphis State University. The Mythical Man-Month has been a must-read for every professional involved in delivering value since it was published. The core themes in the book, which include Brooks' Law (adding people to a late project just makes it later; more on that in a future installment), are just as important today as they were when first published, because they are still true for all types of software development and maintenance. The Mythical Man-Month, I think on reflection you will agree, is still important because we are still trying to come to grips with most, if not all, of the concepts that Brooks exposed.

Here is the game plan for the re-read. We will be reading from the Anniversary Edition of The Mythical Man-Month (Addison-Wesley, 1995). My copy is the 14th printing, from 2014. Part of this re-read will actually be a first read for me: I believe I originally read the book in the early 1980s, and this edition has a new preface and four new chapters. My intent is to cover two essays/chapters per week, so the re-read should take roughly ten weeks. Today will be the exception to the rule: we will tackle only the first essay, The Tar Pit (I already used the ideas in that essay to make a point in the essay on Scrum of Scrums membership).

Preface(s):
The original preface provides background for Brooks' perspective. Brooks came to his conclusions based on his history as a hardware architect and then as the manager of the OS/360 project (IBM's highly successful mainframe operating system). In the preface Brooks telegraphs one of his central theorems: large projects are different from small projects. It took a few years and the Agile Manifesto (2001) to recognize this fact and begin to act on it.

The anniversary edition includes a new preface in addition to the original. Two notes in the new preface gave me pause. The first was Brooks' observation that few of the central thoughts in the book had been critiqued, proven, or disproven by research and experience. I would like your feedback as we review the essays; I will explicitly point those ideas out during the re-read, and if I miss any, skip ahead to Chapter 18 to keep me honest. The second point, which reflects my interest in the publishing industry, was that in the new preface Brooks was able to publicly thank the Addison-Wesley staff he found instrumental in preparing the first edition. The publishing world has changed. The question I pose to the Software Process and Measurement blog readers is: has our profession changed as much?

Chapter 1: The Tar Pit

In The Tar Pit, Brooks sets the context for systems programming as a craft whose practitioner is not just a lone coder sitting in his or her office creating a stand-alone app, but rather a member of a team building large-scale systems. The Tar Pit describes why the complexity and level of effort are different, and then explores the joys and woes of the profession.

The first takeaway from The Tar Pit is that as the product of a piece of work progresses from a single program to a programming product or programming system, and then to a programming systems product, the complexity of managing and understanding the work increases. As size and complexity increase, so do the tasks and activities needed to ensure the product works together. Brooks suggests that the difference in level of effort between creating a stand-alone program and a programming systems product is nine times! The complexity of coordinating all of these additional tasks and activities while solving complex business problems impacts our ability to deliver on-time, on-budget and on-scope. At times, it also impacts our ability to deliver WOW.

The second takeaway is the joys of programming. The joys motivate programmers to put forth the effort needed to deliver value while navigating the tar pit. Brooks points out that the rewards of programming are at least fourfold:

  1. The joy of making things.
  2. The joy of making things that are useful to others.
  3. The joy of problem solving.
  4. The joy of learning.

These joys are offset by several woes. The woes are the part of the tar pit that makes programming hard.

  1. Expectation of perfection. Unless the code is correct and solves the requirements, it is wrong. There is little to no gray area.
  2. Dependence on others. Others set your objectives, provide resources and often control the sources of information. This often leads to the tension created when individuals or teams have the responsibility to deliver without commensurate authority. Agile principles, when applied to teams, are a start toward addressing these woes.
  3. Finding the bugs in the big picture.  Designing the big picture is significantly more fun than the tedious effort required to find bugs.
  4. Dead on arrival projects.  Long-running projects are occasionally dead on arrival, or the market has changed and what has been delivered is no longer what is needed.

In 1975, when this essay was published, Agile was not a word one would attach to building applications or systems. However, I think we can see the seeds of the Agile revolution in this essay.


Categories: Process Management

R: Command line – Error in GenericTranslator$new : could not find function “loadMethod”

Mark Needham - Sat, 06/27/2015 - 23:47

I’ve been reading Text Processing with Ruby over the last week or so and one of the ideas the author describes is setting up your scripts so you can run them directly from the command line.

I wanted to do this with my Wimbledon R script and wrote the following script which uses the ‘Rscript’ executable so that R doesn’t launch in interactive mode:

wimbledon

#!/usr/bin/env Rscript
 
library(rvest)
library(dplyr)
library(stringr)
library(readr)
 
# stuff

Then I tried to run it:

$ time ./wimbledon
 
...
 
Error in GenericTranslator$new : could not find function "loadMethod"
Calls: write.csv ... html_extract_n -> <Anonymous> -> Map -> mapply -> <Anonymous> -> $
Execution halted
 
real	0m1.431s
user	0m1.127s
sys	0m0.078s

As the error suggests, the script fails when trying to write to a CSV file – it looks like Rscript doesn’t load in something from the core library that we need. It turns out adding the following line to our script is all we need:

library(methods)

So we end up with this:

#!/usr/bin/env Rscript
 
library(methods)
library(rvest)
library(dplyr)
library(stringr)
library(readr)

And when we run that all is well!

Categories: Programming

R: dplyr – squashing multiple rows per group into one

Mark Needham - Sat, 06/27/2015 - 23:36

I spent a bit of the day working on my Wimbledon data set, and the next thing I explored was all the people who have beaten Andy Murray in the tournament.

The following dplyr query gives us the names of those people and the year the match took place:

library(dplyr)
 
> main_matches %>% filter(loser == "Andy Murray") %>% select(winner, year)
 
            winner year
1  Grigor Dimitrov 2014
2    Roger Federer 2012
3     Rafael Nadal 2011
4     Rafael Nadal 2010
5     Andy Roddick 2009
6     Rafael Nadal 2008
7 Marcos Baghdatis 2006
8 David Nalbandian 2005

As you can see, Rafael Nadal shows up multiple times. I wanted to get one row per player and list all the years in a single column.

This was my initial attempt:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year))
Source: local data frame [6 x 2]
 
            winner years
1     Andy Roddick  2009
2 David Nalbandian  2005
3  Grigor Dimitrov  2014
4 Marcos Baghdatis  2006
5     Rafael Nadal  2011
6    Roger Federer  2012

Unfortunately it just gives you the last matching row per group, which isn't quite what we want. I realised my mistake while trying to pass a vector into paste and noticing that a vector came back when I'd expected a string:

> paste(c(2008,2009,2010))
[1] "2008" "2009" "2010"

The missing argument was ‘collapse’ – something I’d come across when using plyr last year:

> paste(c(2008,2009,2010), collapse=", ")
[1] "2008, 2009, 2010"

Now, if we apply that to our original function:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year, collapse=", "))
Source: local data frame [6 x 2]
 
            winner            years
1     Andy Roddick             2009
2 David Nalbandian             2005
3  Grigor Dimitrov             2014
4 Marcos Baghdatis             2006
5     Rafael Nadal 2011, 2010, 2008
6    Roger Federer             2012

That’s exactly what we want. Let’s tidy that up a bit:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% arrange(year) %>%
     summarise(years  = paste(year, collapse =","), times = length(year))  %>%
     arrange(desc(times), years)
Source: local data frame [6 x 3]
 
            winner          years times
1     Rafael Nadal 2008,2010,2011     3
2 David Nalbandian           2005     1
3 Marcos Baghdatis           2006     1
4     Andy Roddick           2009     1
5    Roger Federer           2012     1
6  Grigor Dimitrov           2014     1
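
A small variant on the same query: dplyr's n() helper counts the rows in each group directly, so we don't need length(year). It produces the same result as above:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% arrange(year) %>%
     summarise(years = paste(year, collapse = ","), times = n()) %>%
     arrange(desc(times), years)
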
Categories: Programming

Scala development with GitHub's Atom editor

Xebia Blog - Sat, 06/27/2015 - 14:57

GitHub recently released version 1.0 of their Atom editor. This post gives a rough overview of its Scala support.

Basic features

Basic features such as Scala syntax highlighting are provided by the language-scala plugin.

Some work on worksheets as found in e.g. Eclipse has been done in the scala-worksheet-plus plugin, but this is still missing major features and not very useful at this time.

Navigation and completion

Ctags

Atom offers basic 'Go to Declaration' (ctrl-alt-down) and 'Search symbol' (cmd-shift-r) support by way of the default ctags-based symbols-view.

While there are multiple sbt plugins for generating ctags, the easiest approach seems to be to have Ensime download the sources (more on that below) and invoke ctags manually: put this configuration in your home directory and run the 'ctags' command from your project root.

This is useful for searching for symbols, but limited for finding declarations: for example, when looking up the declaration of Success, ctags doesn't know whether this is scala.util.Success, akka.actor.Status.Success, spray.http.StatusCodes.Success or some other third-party or local symbol with that name.

Ensime

This is where the Ensime plugin comes in.

Ensime is a service for Scala IDE support, originally written for the Scala support in Emacs. The project metadata for Ensime can be generated with 'sbt gen-ensime' from the ensime-sbt sbt plugin.

Usage

Start the Ensime server from Atom with 'cmd-shift-p' 'Ensime: start'. After a small pause the status bar proclaims 'Indexer ready!' and you should be good to go.

At this point the main features are 'jump to definition' (alt-click), hover for type info, and auto-completion:

[Screenshot: Ensime auto-completion in the Atom editor]

There are some rough edges, but this is a promising start based on a solid foundation.

Conclusions

While Atom is already a pleasant, modern, open source, cross platform editor, it is clearly still early days.

The Scala support in Atom is not yet as polished as in IDEs such as IntelliJ IDEA, or as stable as in more mature editors such as Sublime Text, but it is already practically useful and has serious potential. Startup is not instant, but I did not notice the 'sluggish feel' reported by earlier reviewers.

Feel free to share your experiences in the comments; I will keep this post updated as the tools - and our experience with them - evolve.

R: ggplot – Show discrete scale even with no value

Mark Needham - Fri, 06/26/2015 - 23:48

As I mentioned in a previous blog post, I’ve been scraping data for the Wimbledon tennis tournament, and having got the data for the last ten years I wrote a query using dplyr to find out how players did each year over that period.

I ended up with the following functions to filter my data frame of all the matches:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
  return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}
 
player_performance = function(name, matches) {
  player = data.frame()
  for(y in 2005:2014) {
    round = round_reached(name, filter(matches, year == y))
    if(length(round) == 1) {
      player = rbind(player, data.frame(year = y, round = round))      
    } else {
      player = rbind(player, data.frame(year = y, round = "Did not enter"))
    } 
  }
  return(player)
}

When we call that function we see the following output:

> player_performance("Andy Murray", main_matches)
   year          round
1  2005    Round of 32
2  2006    Round of 16
3  2007  Did not enter
4  2008 Quarter-Finals
5  2009    Semi-Finals
6  2010    Semi-Finals
7  2011    Semi-Finals
8  2012         Finals
9  2013         Winner
10 2014 Quarter-Finals
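
As an aside (my sketch, not from the original post), the year-by-year loop can also be written with sapply, which avoids growing a data frame with rbind on each iteration:

# A variant of player_performance built with sapply; assumes the
# round_reached function defined above.
player_performance2 = function(name, matches) {
  years = 2005:2014
  rounds_reached = sapply(years, function(y) {
    r = round_reached(name, filter(matches, year == y))
    if(length(r) == 1) as.character(r) else "Did not enter"
  })
  data.frame(year = years, round = rounds_reached)
}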

I wanted to create a chart showing Murray's progress over the years, with the round reached on the y axis and the year on the x axis. In order to do this I had to make sure the 'round' column was being treated as a factor variable:

df = player_performance("Andy Murray", main_matches)
 
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
df$round = factor(df$round, levels =  rounds)
 
> df$round
 [1] Round of 32    Round of 16    Did not enter  Quarter-Finals Semi-Finals    Semi-Finals    Semi-Finals   
 [8] Finals         Winner         Quarter-Finals
Levels: Did not enter Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals Winner

Now that we’ve got that we can plot his progress:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds)

[Plot: Andy Murray's progress by year, with unused rounds dropped from the y axis]

This is a good start, but we've lost the rounds which don't have a corresponding data point, so they no longer appear on the y axis. I'd like to keep them so it's easier to compare the performance of different players.

It turns out that all we need to do is pass ‘drop = FALSE’ to scale_y_discrete and it will work exactly as we want:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop = FALSE)

[Plot: the same chart with every round kept on the y axis thanks to drop = FALSE]

Neat. Now let’s have a look at the performances of some of the other top players:

draw_chart = function(player, main_matches){
  df = player_performance(player, main_matches)
  df$round = factor(df$round, levels =  rounds)
 
  ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop=FALSE) + 
    ggtitle(player) + 
    theme(axis.text.x=element_text(angle=90, hjust=1))
}
 
a = draw_chart("Andy Murray", main_matches)
b = draw_chart("Novak Djokovic", main_matches)
c = draw_chart("Rafael Nadal", main_matches)
d = draw_chart("Roger Federer", main_matches)
 
library(gridExtra)
grid.arrange(a,b,c,d, ncol=2)

[Plot: a 2x2 grid comparing the progress of Murray, Djokovic, Nadal and Federer]
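
As an aside, and purely a sketch on my part rather than something from the post: we could get the same 2x2 comparison with ggplot2's facets by stacking the four players into one data frame, instead of arranging separate plots with gridExtra:

# Build one data frame covering all four players, then facet by player.
players = c("Andy Murray", "Novak Djokovic", "Rafael Nadal", "Roger Federer")
all_players = do.call(rbind, lapply(players, function(p) {
  df = player_performance(p, main_matches)
  df$player = p
  df
}))
all_players$round = factor(all_players$round, levels = rounds)
 
ggplot(aes(x = year, y = round, group = 1), data = all_players) + 
  geom_point() + 
  geom_line() + 
  scale_x_continuous(breaks = 2005:2014) + 
  scale_y_discrete(breaks = rounds, drop = FALSE) + 
  facet_wrap(~ player, ncol = 2) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))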

And that’s all for now!

Categories: Programming

An update on Eclipse Android Developer Tools

Android Developers Blog - Fri, 06/26/2015 - 22:28

Posted by Jamal Eason, Product Manager, Android

Over the past few years, our team has focused on improving the development experience for building Android apps with Android Studio. Since the launch of Android Studio, we have been impressed with the excitement and positive feedback. As the official Android IDE, Android Studio gives you access to a powerful and comprehensive suite of tools to evolve your app across Android platforms, whether it's on the phone, wrist, car or TV.

To that end and to focus all of our efforts on making Android Studio better and faster, we are ending development and official support for the Android Developer Tools (ADT) in Eclipse at the end of the year. This specifically includes the Eclipse ADT plugin and Android Ant build system.

Time to Migrate

If you have not had the chance to migrate your projects to Android Studio, now is the time. To get started, download Android Studio. For many developers, migration is as simple as importing your existing Eclipse ADT projects into Android Studio with File → New → Import Project:

For more details on the migration process, check out the migration guide. Also, to learn more about Android Studio and the underlying build system, check out this overview page.

Next Steps

Over the next few months, we are migrating the rest of the standalone performance tools (e.g. DDMS, Trace Viewer) and adding support for the Android NDK to Android Studio.

We are focused on Android Studio so that our team can deliver a great experience on a unified development environment. Android tools inside Eclipse will continue to live on in the open source community via the Eclipse Foundation. Check out the latest Eclipse Andmore project if you are interested in contributing or learning more.

For those of you who are new to Android Studio, we are excited for you to integrate it into your development workflow. If you want to contribute to Android Studio, check out the project source code. To follow all the updates on Android Studio, join our Google+ community.

Categories: Programming