Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Programming

R: Calculating the difference between ordered factor variables

Mark Needham - Thu, 07/02/2015 - 23:55

In my continued exploration of Wimbledon data I wanted to work out whether a player had done as well as their seeding suggested they should.

I therefore wanted to work out the difference between the round they reached and the round they were expected to reach. A ’round’ in the dataset is an ordered factor variable.

These are all the possible values:

rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

And if we want to factorise a couple of strings into this factor we would do it like this:

round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)  
 
> round
[1] Finals
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner
 
> expected
[1] Winner
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner

In this case the difference between the actual round and expected round should be -1 – the player was expected to win the tournament but lost in the final. We can calculate that differnce by calling the unclass function on each variable:

 
> unclass(round) - unclass(expected)
[1] -1
attr(,"levels")
[1] "Did not enter"  "Round of 128"   "Round of 64"    "Round of 32"    "Round of 16"    "Quarter-Finals"
[7] "Semi-Finals"    "Finals"         "Winner"

That still seems to have some remnants of the factor variable so to get rid of that we can cast it to a numeric value:

> as.numeric(unclass(round) - unclass(expected))
[1] -1

And that’s it! We can now go and apply this calculation to all seeds to see how they got on.

Categories: Programming

Game Performance: Data-Oriented Programming

Android Developers Blog - Thu, 07/02/2015 - 22:12

Posted by Shanee Nishry, Game Developer Advocate

To improve game performance, we’d like to highlight a programming paradigm that will help you maximize your CPU potential, make your game more efficient, and code smarter.

Before we get into detail of data-oriented programming, let’s explain the problems it solves and common pitfalls for programmers.

Memory

The first thing a programmer must understand is that memory is slow and the way you code affects how efficiently it is utilized. Inefficient memory layout and order of operations forces the CPU idle waiting for memory so it can proceed doing work.

The easiest way to demonstrate is by using an example. Take this simple code for instance:

char data[1000000]; // One Million bytes
unsigned int sum = 0;

for ( int i = 0; i < 1000000; ++i )
{
  sum += data[ i ];
}

An array of one million bytes is declared and iterated on one byte at a time. Now let's change things a little to illustrate the underlying hardware. Changes marked in bold:

char data[16000000]; // Sixteen Million bytes
unsigned int sum = 0;

for ( int i = 0; i < 16000000; i += 16 )
{
  sum += data[ i ];
}

The array is changed to contain sixteen million bytes and we iterate over one million of them, skipping 16 at a time.

A quick look suggests there shouldn't be any effect on performance as the code is translated to the same number of instructions and runs the same number of times, however that is not the case. Here is the difference graph. Note that this is on a logarithmic scale--if the scale were linear, the performance difference would be too large to display on any reasonably-sized graph!


Graph in logarithmic scale

The simple change making the loop skip 16 bytes at a time makes the program run 5 times slower!

The average difference in performance is 5x and is consistent when iterating 1,000 bytes up to a million bytes, sometimes increasing up to 7x. This is a serious change in performance.

Note: The benchmark was run on multiple hardware configurations including a desktop with Intel 5930K 3.50GHz CPU, a Macbook Pro Retina laptop with 2.6 GHz Intel i7 CPU and Android Nexus 5 and Nexus 6 devices. The results were pretty consistent.

If you wish to replicate the test, you might have to ensure the memory is out of the cache before running the loop because some compilers will cache the array on declaration. Read below to understand more on how it works.

Explanation

What happens in the example is quite simply explained when you understand how the CPU accesses data. The CPU can’t access data in RAM; the data must be copied to the cache, a smaller but extremely fast memory line which resides near the CPU chip.

When the program starts, the CPU is set to run an instruction on part of the array but that data is still not in the cache, therefore causing a cache miss and forcing the CPU to wait for the data to be copied into the cache.

For simplicity sake, assume a cache size of 16 bytes for the L1 cache line, this means 16 bytes will be copied starting from the requested address for the instruction.

In the first code example, the program next tries to operate on the following byte, which is already copied into the cache following the initial cache miss, therefore continuing smoothly. This is also true for the next 14 bytes. After 16 bytes, since the first cache miss the loop, will encounter another cache miss and the CPU will again wait for data to operate on, copying the next 16 bytes into the cache.

In the second code sample, the loop skips 16 bytes at a time but hardware continues to operate the same. The cache copies the 16 subsequent bytes each time it encounters a cache miss which means the loop will trigger a cache miss with each iteration and cause the CPU to wait idle for data each time!

Note: Modern hardware implements cache prefetch algorithms to prevent incurring a cache miss per frame, but even with prefetching, more bandwidth is used and performance is lower in our example test.

In reality the cache lines tend to be larger than 16 bytes, the program would run much slower if it were to wait for data at every iteration. A Krait-400 found in the Nexus 5 has a L0 data cache of 4 KB with 64 Bytes per line.

If you are wondering why cache lines are so small, the main reason is that making fast memory is expensive.

Data-Oriented Design

The way to solve such performance issues is by designing your data to fit into the cache and have the program to operate on the entire data continuously.

This can be done by organizing your game objects inside Structures of Arrays (SoA) instead of Arrays of Structures (AoS) and pre-allocating enough memory to contain the expected data.

For example, a simple physics object in an AoS layout might look like this:

struct PhysicsObject
{
  Vec3 mPosition;
  Vec3 mVelocity;

  float mMass;
  float mDrag;
  Vec3 mCenterOfMass;

  Vec3 mRotation;
  Vec3 mAngularVelocity;

  float mAngularDrag;
};

This is a common way way to present an object in C++.

On the other hand, using SoA layout looks more like this:

class PhysicsSystem
{
private:
  size_t mNumObjects;
  std::vector< Vec3 > mPositions;
  std::vector< Vec3 > mVelocities;
  std::vector< float > mMasses;
  std::vector< float > mDrags;

  // ...
};

Let’s compare how a simple function to update object positions by their velocity would operate.

For the AoS layout, a function would look like this:

void UpdatePositions( PhysicsObject* objects, const size_t num_objects, const float delta_time )
{
  for ( int i = 0; i < num_objects; ++i )
  {
    objects[i].mPosition += objects[i].mVelocity * delta_time;
  }
}

The PhysicsObject is loaded into the cache but only the first 2 variables are used. Being 12 bytes each amounts to 24 bytes of the cache line being utilised per iteration and causing a cache miss with every object on a 64 bytes cache line of a Nexus 5.

Now let’s look at the SoA way. This is our iteration code:

void PhysicsSystem::SimulateObjects( const float delta_time )
{
  for ( int i = 0; i < mNumObjects; ++i )
  {
    mPositions[ i ] += mVelocities[i] * delta_time;
  }
}

With this code, we immediately cause 2 cache misses, but we are then able to run smoothly for about 5.3 iterations before causing the next 2 cache misses resulting in a significant performance increase!

The way data is sent to the hardware matters. Be aware of data-oriented design and look for places it will perform better than object-oriented code.

We have barely scratched the surface. There is still more to data-oriented programming than structuring your objects. For example, the cache is used for storing instructions and function memory so optimizing your functions and local variables affects cache misses and hits. We also did not mention the L2 cache and how data-oriented design makes your application easier to multithread.

Make sure to profile your code to find out where you might want to implement data-oriented design. You can use different profilers for different architecture, including the NVIDIA Tegra System Profiler, ARM Streamline Performance Analyzer, Intel and PowerVR PVRMonitor.

If you want to learn more on how to optimize for your cache, read on cache prefetching for various CPU architectures.

Join the discussion on

+Android Developers
Categories: Programming

Do You Really Bill $300 an Hour?

Making the Complex Simple - John Sonmez - Thu, 07/02/2015 - 15:00

In this episode, I explain why I bill $300 an hour. Full transcript: John:               Hey, this is John Sonmez from simpleprogrammer.com. I got a question today about my billing rate, so whether it’s real or not and I found a few questions like this. I just want to say upfront that when I say something […]

The post Do You Really Bill $300 an Hour? appeared first on Simple Programmer.

Categories: Programming

Flappy

Phil Trelford's Array - Thu, 07/02/2015 - 08:04

This week I ran a half-day hands on games development session at the Progressive .Net Tutorials hosted by Skills Matter in London. I believe this was the last conference to be held in Goswell Road before the big move to an exciting new venue.

My session was on mobile games development with F# as the implementation language:

Ready, steady, cross platform games - ProgNet 2015 from Phillip Trelford

Here’s a quick peek inside the room:

Game programming with @ptrelford @ #prognet2015 I'ma make and sell a BILLION copies of this game y'all! pic.twitter.com/5qa4wOSY1G

— Adron (@adron) July 1, 2015

The session tasks were around 2 themes:

  • implement a times table question and answer game (think Nintendo’s Brain Training game)
  • extended a simple Flappy Bird clone

Times table game

The motivation behind this example was to help people:

  • build a simple game loop
  • pick up some basic F# skills

The first tasks , like asking a multiplication question, could be built using F#’s REPL (F# Interactive) and later tasks that took user input required running as a console application.

Here’s some of the great solutions that were posted up to F# Snippets:

To run them, create a new F# Console Application project in Xamarin Studio or Visual Studio and paste in the code (use the Raw view in F# Snippets to copy the code).

Dominic Finn’s source code includes some fun ASCII art too:

// _____ _   _ _____ _____ _____  ______  _  _   _____  _____ _     
//|  __ \ | | |  ___/  ___/  ___| |  ___|| || |_|  _  ||  _  | |    
//| |  \/ | | | |__ \ `--.\ `--.  | |_ |_  __  _| | | || | | | |    
//| | __| | | |  __| `--. \`--. \ |  _| _| || |_| | | || | | | |    
//| |_\ \ |_| | |___/\__/ /\__/ / | |  |_  __  _\ \_/ /\ \_/ / |____
// \____/\___/\____/\____/\____/  \_|    |_||_|  \___/  \___/\_____/
//  

Flappy Bird clone

For this example I sketched out a flappy bird clone using Monogame (along with WinForms and WPF for comparison) with the idea that people could enhance and extend the game:

image

Monogame lets you target multiple platforms including iOS and Android along with Mac, Linux, Windows and even Rapsberry Pi!

The different flavours are available on F# Snippets, simply cut and paste them into an F# script file to run them:

All the samples and tasks are also available in a zip: http://trelford.com/ProgNet15.zip

Have fun!

Categories: Programming

SE-Radio Episode 231: Joshua Suereth and Matthew Farwell on SBT and Software Builds

Joshua Suereth and Matthew Farwell discuss SBT (Simple Build Tool) and their new book SBT in Action. They first look at the factors creating a need for build systems and why they think SBT—a new addition to this area—is a valuable contribution in spite of the vast number of existing build tools. Host Tobias Kaatz, […]
Categories: Programming

R: write.csv – unimplemented type ‘list’ in ‘EncodeElement’

Mark Needham - Tue, 06/30/2015 - 23:26

Everyone now and then I want to serialise an R data frame to a CSV file so I can easily load it up again if my R environment crashes without having to recalculate everything but recently ran into the following error:

> write.csv(foo, "/tmp/foo.csv", row.names = FALSE)
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  : 
  unimplemented type 'list' in 'EncodeElement'

If we take a closer look at the data frame in question it looks ok:

> foo
  col1 col2
1    1    a
2    2    b
3    3    c

However, one of the columns contains a list in each cell and we need to find out which one it is. I’ve found the quickest way is to run the typeof function over each column:

> typeof(foo$col1)
[1] "double"
 
> typeof(foo$col2)
[1] "list"

So ‘col2′ is the problem one which isn’t surprising if you consider the way I created ‘foo':

library(dplyr)
foo = data.frame(col1 = c(1,2,3)) %>% mutate(col2 = list("a", "b", "c"))

If we do have a list that we want to add to the data frame we need to convert it to a vector first so we don’t run into this type of problem:

foo = data.frame(col1 = c(1,2,3)) %>% mutate(col2 = list("a", "b", "c") %>% unlist())

And now we can write to the CSV file:

write.csv(foo, "/tmp/foo.csv", row.names = FALSE)
$ cat /tmp/foo.csv
"col1","col2"
1,"a"
2,"b"
3,"c"

And that’s it!

Categories: Programming

Debian Size Claims - New Lecture Posted

10x Software Development - Steve McConnell - Tue, 06/30/2015 - 19:17

In this week's lecture (https://cxlearn.com) I demonstrate how to use some of the size information we've discussed in other lectures by diving into the Wikipedia claims about the sizes of various versions of Debian.  The point of this week's lecture is to show how to apply critical thinking to size information presented by an authoritative source (Wikipedia), and how to arrive at a confident conclusion that that information is not credible. Practicing software professionals should be able to look at size claims like the Debian size claims and, based on general knowledge, immediately think, "That seems far from credible." Yet, few professionals actually do that. My hope is that working through public examples like this in the lecture series will help software professionals improve their instincts and judgment, which can then be applied to projects in their own organizations. 

Lectures posted so far include:  

0.0 Understanding Software Projects - Intro
     0.1 Introduction - My Background
     0.2 Reading the News
     0.3 Definitions and Notations 

1.0 The Software Lifecycle Model - Intro
     1.1 Variations in Iteration 
     1.2 Lifecycle Model - Defect Removal
     1.3 Lifecycle Model Applied to Common Methodologies
     1.4 Lifecycle Model - Selecting an Iteration Approach  

2.0 Software Size
     2.05 Size - Comments on Lines of Code
     2.1 Size - Staff Sizes 
     2.2 Size - Schedule Basics 
     2.3 Size - Debian Size Claims (New)

Check out the lectures at http://cxlearn.com!

Understanding Software Projects - Steve McConnell

 

Succeeding with Geographically Distributed Scrum Teams - New White Paper

10x Software Development - Steve McConnell - Tue, 06/30/2015 - 19:02

We have a new white paper, "Succeeding with Geographically Distributed Scrum Teams." To quote the white paper itself: 

When organizations adopt Agile throughout the enterprise, they typically apply it to both large and small projects. The gap is that most Agile methodologies, such as Scrum and XP, are team-level workflow approaches. These approaches can be highly effective at the team level, but they do not address large project architecture, project management, requirements, and project planning needs. Our clients find that succeeding with Scrum on a large, geographically distributed team requires adopting additional practices to ensure the necessary coordination, communication, integration, and architectural work. This white paper discusses common considerations for success with geographically distributed Scrum.

Check it out!

What’s new with Google Fit: Distance, Calories, Meal data, and new apps and wearables

Google Code Blog - Tue, 06/30/2015 - 18:52

Posted by Angana Ghosh, Lead Product Manager, Google Fit

To help users keep track of their physical activity, we recently updated the Google Fit app with some new features, including an Android Wear watch face that helps users track their progress throughout the day. We also added data types to the Google Fit SDK and have new partners tracking data (e.g. nutrition, sleep, etc.) that developers can now use in their own apps. Find out how to integrate Google Fit into your app and read on to check out some of the cool new stuff you can do.

table, th, td { border: clear; border-collapse: collapse; }

Distance traveled per day

The Google Fit app now computes the distance traveled per day. Subscribe to it using the Recording API and query it using the History API.

Calories burned per day

If a user has entered their details into the Google Fit app, the app now computes their calories burned per day. Subscribe to it using the Recording API and query it using the History API.

Nutrition data from LifeSum, Lose It!, and MyFitnessPal

LifeSum and Lose It! are now writing nutrition data, like calories consumed, macronutrients (proteins, carbs, fats), and micronutrients (vitamins and minerals) to Google Fit. MyFitnessPal will start writing this data soon too. Query it from Google Fit using the History API.

Sleep activity from Basis Peak and Sleep as Android

Basis Peak and Sleep as Android are now writing sleep activity segments to Google Fit. Query this data using the History API.

New workout sessions and activity data from even more great apps and fitness wearables!

Endomondo, Garmin, the Daily Burn, the Basis Peak and the Xiaomi miBand are new Google Fit partners that will allow users to store their workout sessions and activity data. Developers can access this data with permission from the user, which will also be shown in the Google Fit app.

How are developers using the Google Fit platform?

Partners like LifeSum, and Lose It! are reading all day activity to help users keep track of their physical activity in their favorite fitness app.

Runkeeper now shows a Google Now card to its users encouraging them to “work off” their meals, based on their meals written to Google Fit by other apps.

Instaweather has integrated Google Fit into a new Android Wear face that they’re testing in beta. To try out the face, first join this Google+ community and then follow the link to join the beta and download the app.

We hope you enjoy checking out these Google Fit updates. Thanks to all our partners for making it possible! Find out more about integrating the Google Fit SDK into your app.

Categories: Programming

How to create the smallest possible docker container of any image

Xebia Blog - Tue, 06/30/2015 - 10:46

Once you start to do some serious work with Docker, you soon find that downloading images from the registry is a real bottleneck in starting applications. In this blog post we show you how you can reduce the size of any docker image to just a few percent of the original. So is your image too fat, try stripping your Docker image! The strip-docker-image utility demonstrated in this blog makes your containers faster and safer at the same time!


We are working quite intensively on our High Available Docker Container Platform  using CoreOS and Consul which consists of a number of containers (NGiNX, HAProxy, the Registrator and Consul). These containers run on each of the nodes in our CoreOS cluster and when the cluster boots, more than 600Mb is downloaded by the 3 nodes in the cluster. This is quite time consuming.

cargonauts/consul-http-router      latest              7b9a6e858751        7 days ago          153 MB
cargonauts/progrium-consul         latest              32253bc8752d        7 weeks ago         60.75 MB
progrium/registrator               latest              6084f839101b        4 months ago        13.75 MB

The size of the images is not only detrimental to the boot time of our platform, it also increases the attack surface of the container.  With 153Mb of utilities in the  NGiNX based consul-http-router, there is a lot of stuff in the container that you can use once you get inside. As we were thinking of running this router in a DMZ, we wanted to minimise the amount of tools lying around for a potential hacker.

From our colleague Adriaan de Jonge we already learned how to create the smallest possible Docker container  for a Go program. Could we repeat this by just extracting the NGiNX executable from the official distribution and copying it onto a scratch image?  And it turns out we can!

finding the necessary files

Using the utility dpkg we can list all the files that are installed by NGiNX

docker run nginx dpkg -L nginx
...
/.
/usr
/usr/sbin
/usr/sbin/nginx
/usr/share
/usr/share/doc
/usr/share/doc/nginx
...
/etc/init.d/nginx
locating dependent shared libraries

So we have the list of files in the package, but we do not have the shared libraries that are referenced by the executable. Fortunately, these can be retrieved using the ldd utility.

docker run nginx ldd /usr/sbin/nginx
...
	linux-vdso.so.1 (0x00007fff561d6000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd8f17cf000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007fd8f1598000)
	libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007fd8f1329000)
	libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fd8f10c9000)
	libcrypto.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fd8f0cce000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fd8f0ab2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd8f0709000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd8f19f0000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd8f0505000)
Following and including symbolic links

Now we have the executable and the referenced shared libraries, it turns out that ldd normally names the symbolic link and not the actual file name of the shared library.

docker run nginx ls -l /lib/x86_64-linux-gnu/libcrypt.so.1
...
lrwxrwxrwx 1 root root 16 Apr 15 00:01 /lib/x86_64-linux-gnu/libcrypt.so.1 -> libcrypt-2.19.so

By resolving the symbolic links and including both the link and the file, we are ready to export the bare essentials from the container!

getpwnam does not work

But after copying all essentials files to a scratch image, NGiNX did not start.  It appeared that NGiNX tries to resolve the user id 'nginx' and fails to do so.

docker run -P  --entrypoint /usr/sbin/nginx stripped-nginx  -g "daemon off;"
...
2015/06/29 21:29:08 [emerg] 1#1: getpwnam("nginx") failed (2: No such file or directory) in /etc/nginx/nginx.conf:2
nginx: [emerg] getpwnam("nginx") failed (2: No such file or directory) in /etc/nginx/nginx.conf:2

It turned out that the shared libraries for the name switch service reading /etc/passwd and /etc/group are loaded at runtime and not referenced in the shared libraries. By adding these shared libraries ( (/lib/*/libnss*) to the container, NGiNX worked!

strip-docker-image example

So now, the strip-docker-image utility is here for you to use!

    strip-docker-image  -i image-name
                        -t target-image-name
                        [-p package]
                        [-f file]
                        [-x expose-port]
                        [-v]

The options are explained below:

-i image-name           to strip
-t target-image-name    the image name of the stripped image
-p package              package to include from image, multiple -p allowed.
-f file                 file to include from image, multiple -f allowed.
-x port                 to expose.
-v                      verbose.

The following example creates a new nginx image, named stripped-nginx based on the official Docker image:

strip-docker-image -i nginx -t stripped-nginx  \
                           -x 80 \
                           -p nginx  \
                           -f /etc/passwd \
                           -f /etc/group \
                           -f '/lib/*/libnss*' \
                           -f /bin/ls \
                           -f /bin/cat \
                           -f /bin/sh \
                           -f /bin/mkdir \
                           -f /bin/ps \
                           -f /var/run \
                           -f /var/log/nginx \
                           -f /var/cache/nginx

Aside from the nginx package, we add the files /etc/passwd, /etc/group and /lib/*/libnss* shared libraries. The directories /var/run, /var/log/nginx and /var/cache/nginx are required for NGiNX to operate. In addition, we added /bin/sh and a few handy utilities, just to be able to snoop around a little bit.

The stripped image has now shrunk to an incredible 5.4% of the original 132.8 Mb to just 7.3Mb and is still fully operational!

docker images | grep nginx
...
stripped-nginx                     latest              d61912afaf16        21 seconds ago      7.297 MB
nginx                              1.9.2               319d2015d149        12 days ago         132.8 MB

And it works!

ID=$(docker run -P -d --entrypoint /usr/sbin/nginx stripped-nginx  -g "daemon off;")
docker run --link $ID:stripped cargonauts/toolbox-networking curl -s -D - http://stripped
...
HTTP/1.1 200 OK

For HAProxy, checkout the examples directory.

Conclusion

It is possible to use the official images that are maintained and distributed by Docker and strip them down to their bare essentials, ready for use! It speeds up load times and reduces the attack surface of that specific container.

Checkout the github repository for the script and the manual page.

Please send me your examples of incredibly shrunk Docker images!

Your Job Is Going Away and You Won’t Get a Retirement

Making the Complex Simple - John Sonmez - Mon, 06/29/2015 - 16:00

The world is changing—rapidly. Not too long ago, you could reasonably expect gainful employment at a single company for most of your life and then enjoy a nice pension or retirement plan later in life. It wasn’t too long ago when most software developers might be expected to work for the same company for at […]

The post Your Job Is Going Away and You Won’t Get a Retirement appeared first on Simple Programmer.

Categories: Programming

Dealing with People You Can’t Stand

“If You Want To Go Fast, Go Alone. If You Want To Go Far, Go Together” – African Proverb

I blew the dust off some olds posts to rekindle some of the most important information for work and life.

It’s about dealing with people you can’t stand.

Whether you think of them as jerks, bullies, or just difficult people, the better you can deal with difficult people, the better you can get things done and make things happen.

And the more you learn how to bring out the best, in people at their worst, the less you’ll find people you can’t stand.

How To Bring Out the Best in People at Their Worst (Including Yourself)

Everything I needed to learn about dealing with difficult people, I learned from the book Dealing with People You Can’t Stand: How to Bring Out the Best in People at Their Worst, by Dr. Rick Brinkman and Dr. Rick Kirschner.

It’s one of the most brilliant, thoughtful books I’ve ever read on interpersonal skills and dealing with all sorts of bad behaviors.

The real key to dealing with difficult behavior is more than just recognizing bad behaviors in other people.

It’s recognizing bad behaviors in yourself, the kind that contribute to and amplify other people’s bad behaviors.

The more you know, the more you grow, and this is truly one of those transformational books.

Learn How To Deal with Difficult People (and Gain Some Mad Interpersonal Skills)

I’ve completely re-written my pot that provides an overview of the big ideas in Dealing with People You Can’t Stand:

Dealing with People You Can’t Stand

Even better, I’ve re-written all of my posts that talk through the 10 Types of Difficult People, and what to do about them.

I have to warn you:  Once you learn the 10 Types of Difficult People, you’ll be using the labels to classify bad behaviors that you experience in the halls, in meetings, behind your back, etc.

With that in mind, here they are …

10 Types of Difficult People

Here are the 10 Types of Difficult People at a glance:

  1. Grenade Person – After a brief period of calm, the Grenade person explodes into unfocused ranting and raving about things that have nothing to do with the present circumstances.
  2. Know-It-Alls – Seldom in doubt, the Know-It-All person has a low tolerance for correction and contradiction. If something goes wrong, however, the Know-It-All will speak with the same authority about who’s to blame – you!
  3. Maybe Person – In a moment of decision, the Maybe Person procrastinates in the hope that a better choice will present itself.
  4. No Person – A No Person kills momentum and creates friction for you. More deadly to morale than a speeding bullet, more powerful than hope, able to defeat big ideas with a single syllable.
  5. Nothing Person – A Nothing Person doesn’t contribute to the conversation. No verbal feedback, no nonverbal feedback, Nothing. What else could you expect from … the Nothing Person.
  6. Snipers – Whether through rude comments, biting sarcasm, or a well-timed roll of the eyes, making you look foolish is the Sniper’s specialty.
  7. Tanks – The Tank is confrontational, pointed and angry, the ultimate in pushy and aggressive behavior
  8. Think-They-Know-It-Alls – Think-They-Know-It-All people can’t fool all the people all the time, but they can fool some of the people enough of the time, and enough of the people all of the time – all for the sake of getting some attention.
  9. Whiners – Whiners feel helpless and overwhelmed by an unfair world. Their standard is perfection, and no one and nothing measures up to it.
  10. Yes Person – In an effort to please people and avoid confrontation, Yes People say “yes” without thinking things through.

I warned you.  Are you already thinking about some Snipers in a few meetings that you have, or is there a Yes Person driving you nuts (or are you that Yes Person?)

Have you talked to a Think-They-Know-It-All lately, or worse, a Know-It—All?

Never fear, I’ve included actionable insights and recommendations for dealing with all the various bad behaviors you’ll encounter.

The Lens of Human Understanding

If all this talk about dealing with difficult people, and having silly labels seems like a gimmick, it’s not.  It’s actually deep insight rooted in a powerful, but simple framework that Dr. Rick Brinkman and Dr. Rick Kirschner refer to as the Lens of Human Understanding:

The Lens of Human Understanding

Once I learned The Lens of Human Understanding, so many things fell into place.

Not only did I understand myself better, but I could instantly see what was driving other people, and how my behavior would either create more conflict or resolve it.

But when you don’t know what makes people tick, it’s very easy to get ticked off, or to tick them off.

Here’s looking at you … and other people … and their behaviors … in a brand new way.

You Might Also Like

25 Books the Most Successful Microsoft Leaders Read and Do

Interpersonal Skills Books

Personal Development Hub on Sources of Insight

Personal Development Resources at Sources of Insight

The Great Leadership Quotes Collection

Categories: Architecture, Programming

R: Speeding up the Wimbledon scraping job

Mark Needham - Mon, 06/29/2015 - 06:36

Over the past few days I’ve written a few blog posts about a Wimbledon data set I’ve been building and after running the scripts a few times I noticed that it was taking much longer to run that I expected.

To recap, I started out with the following function which takes in a URI and returns a data frame containing a row for each match:

library(rvest)
library(dplyr)
 
scrape_matches1 = function(uri) {
  matches = data.frame()
 
  s = html(uri)
  rows = s %>% html_nodes("div#scoresResultsContent tr")
  i = 0
  for(row in rows) {  
    players = row %>% html_nodes("td.day-table-name a")
    seedings = row %>% html_nodes("td.day-table-seed")
    score = row %>% html_node("td.day-table-score a")
    flags = row %>% html_nodes("td.day-table-flag img")
 
    if(!is.null(score)) {
      player1 = players[1] %>% html_text() %>% str_trim()
      seeding1 = ifelse(!is.na(seedings[1]), seedings[1] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag1 = flags[1] %>% html_attr("alt")
 
      player2 = players[2] %>% html_text() %>% str_trim()
      seeding2 = ifelse(!is.na(seedings[2]), seedings[2] %>% html_node("span") %>% html_text() %>% str_trim(), NA)
      flag2 = flags[2] %>% html_attr("alt")
 
      matches = rbind(data.frame(winner = player1, 
                                 winner_seeding = seeding1, 
                                 winner_flag = flag1,
                                 loser = player2, 
                                 loser_seeding = seeding2,
                                 loser_flag = flag2,
                                 score = score %>% html_text() %>% str_trim(),
                                 round = round), matches)      
    } else {
      round = row %>% html_node("th") %>% html_text()
    }
  } 
  return(matches)
}

Let’s run it to get an idea of the data that it returns:

matches1 = scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results")
 
> matches1 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals
2   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
3 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
6  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
7  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals

As I mentioned, it’s quite slow but I thought I’d wrap it in system.time so I could see exactly how long it was taking:

> system.time(scrape_matches1("http://www.atpworldtour.com/en/scores/archive/wimbledon/540/2014/results"))
   user  system elapsed 
 25.570   0.111  31.416

About 30 seconds! The first thing I tried was downloading the file separately and running the function against the local file:

> system.time(scrape_matches1("data/raw/2014.html"))
   user  system elapsed 
 25.662   0.123  25.863

Hmmm, that’s only saved us 5 seconds so the bottleneck must be somewhere else. Still there’s no point making a HTTP request every time we run the script so we’ll stick with the local file version.

While browsing rvest’s vignette I noticed a function called html_table which I was curious about. I decided to try and replace some of my code with a call to that:

matches2= html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round)
 
> matches2 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding           loser loser_seeding            score          round
1  Novak Djokovic            (1)   Roger Federer           (4) 677 64 764 57 64         Finals
2  Novak Djokovic            (1) Grigor Dimitrov          (11)    64 36 762 767    Semi-Finals
3   Roger Federer            (4)    Milos Raonic           (8)         64 64 64    Semi-Finals
4  Novak Djokovic            (1)     Marin Cilic          (26)  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)     Andy Murray           (3)        61 764 62 Quarter-Finals
6   Roger Federer            (4)   Stan Wawrinka           (5)     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)    Nick Kyrgios          (WC)    674 62 64 764 Quarter-Finals

I had to do some slightly clever stuff to get the ’round’ column into shape using zoo’s na.locf function which I wrote about previously.

Unfortunately I couldn’t work out how to extract the flag with this version – that value is hidden in the ‘alt’ tag of an img and presumably html_table is just grabbing the text value of each cell. This version is much quicker though!

system.time(html("data/raw/2014.html") %>% 
  html_node("div#scoresResultsContent table.day-table") %>% html_table(header = FALSE) %>% 
  mutate(X1 = ifelse(X1 == "", NA, X1)) %>%
  mutate(round = ifelse(grepl("\\([0-9]\\)|\\(", X1), NA, X1)) %>% 
  mutate(round = na.locf(round)) %>%
  filter(!is.na(X8)) %>%
  select(winner = X3, winner_seeding = X1, loser = X7, loser_seeding = X5, score = X8, round))
 
   user  system elapsed 
  0.545   0.002   0.548

What I realised from writing this version is that I need to match all the columns with one call to html_nodes rather than getting the row and then each column in a loop.

I rewrote the function to do that:

scrape_matches3 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  matches3 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim())
  )
  return(matches3)
}

Let’s run and time that to check we’re getting back the right results in a timely manner:

> matches3 %>% sample_n(10)
                   winner winner_seeding winner_flag               loser loser_seeding loser_flag         score
70           David Ferrer            (7)         ESP Pablo Carreno Busta                      ESP  60 673 61 61
128        Alex Kuznetsov           (26)         USA         Tim Smyczek           (3)        USA   46 63 63 63
220   Rogerio Dutra Silva                        BRA   Kristijan Mesaros                      CRO         62 63
83         Kevin Anderson           (20)         RSA        Aljaz Bedene          (LL)        GBR      63 75 62
73          Kei Nishikori           (10)         JPN   Kenny De Schepper                      FRA     64 765 75
56  Roberto Bautista Agut           (27)         ESP         Jan Hernych           (Q)        CZE   75 46 62 62
138            Ante Pavic                        CRO        Marc Gicquel          (29)        FRA  46 63 765 64
174             Tim Puetz                        GER     Ruben Bemelmans                      BEL         64 62
103        Lleyton Hewitt                        AUS   Michal Przysiezny                      POL 62 6714 61 64
35          Roger Federer            (4)         SUI       Gilles Muller           (Q)        LUX      63 75 63
 
> system.time(scrape_matches3("data/raw/2014.html"))
   user  system elapsed 
  0.815   0.006   0.827

It’s still quick – a bit slower than html_table but we can deal with that. As you can see, I also had to add some logic to separate the values for the winners and losers – the players, seeds, flags come back as as one big list. The odd rows represent the winner; the even rows the loser.

Annoyingly we’ve now lost the ’round’ column because that appears as a table heading so we can’t extract it the same way. I ended up cheating a bit to get it to work by working out how many matches each round should contain and generated a vector with that number of entries:

raw_rounds = s %>% html_nodes("th") %>% html_text()
 
> raw_rounds
 [1] "Finals"               "Semi-Finals"          "Quarter-Finals"       "Round of 16"          "Round of 32"         
 [6] "Round of 64"          "Round of 128"         "3rd Round Qualifying" "2nd Round Qualifying" "1st Round Qualifying"
 
rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
            sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
> rounds[1:10]
 [1] "Finals"         "Semi-Finals"    "Semi-Finals"    "Quarter-Finals" "Quarter-Finals" "Quarter-Finals" "Quarter-Finals"
 [8] "Round of 16"    "Round of 16"    "Round of 16"

Let’s put that code into the function and see if we end up with the same resulting data frame:

scrape_matches4 = function(uri) {
  s = html(uri)
 
  players  = s %>% html_nodes("div#scoresResultsContent tr td.day-table-name a")
  seedings = s %>% html_nodes("div#scoresResultsContent tr td.day-table-seed")
  scores   = s %>% html_nodes("div#scoresResultsContent tr td.day-table-score a")
  flags    = s %>% html_nodes("div#scoresResultsContent tr td.day-table-flag img") %>% html_attr("alt") %>% str_trim()
 
  raw_rounds = s %>% html_nodes("th") %>% html_text()
  rounds = c( sapply(0:6, function(idx) rep(raw_rounds[[idx + 1]], 2 ** idx)) %>% unlist(),
              sapply(7:9, function(idx) rep(raw_rounds[[idx + 1]], 2 ** (idx - 3))) %>% unlist())
 
  matches4 = data.frame(
    winner         = sapply(seq(1,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    winner_seeding = sapply(seq(1,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    winner_flag    = sapply(seq(1,length(flags),2),    function(idx) flags[[idx]]),  
    loser          = sapply(seq(2,length(players),2),  function(idx) players[[idx]] %>% html_text()),
    loser_seeding  = sapply(seq(2,length(seedings),2), function(idx) seedings[[idx]] %>% html_text() %>% str_trim()),
    loser_flag     = sapply(seq(2,length(flags),2),    function(idx) flags[[idx]]),
    score          = sapply(scores,                    function(score) score %>% html_text() %>% str_trim()),
    round          = rounds
  )
  return(matches4)
}
 
matches4 = scrape_matches4("data/raw/2014.html")
 
> matches4 %>% filter(round %in% c("Finals", "Semi-Finals", "Quarter-Finals"))
           winner winner_seeding winner_flag           loser loser_seeding loser_flag            score          round
1  Novak Djokovic            (1)         SRB   Roger Federer           (4)        SUI 677 64 764 57 64         Finals
2  Novak Djokovic            (1)         SRB Grigor Dimitrov          (11)        BUL    64 36 762 767    Semi-Finals
3   Roger Federer            (4)         SUI    Milos Raonic           (8)        CAN         64 64 64    Semi-Finals
4  Novak Djokovic            (1)         SRB     Marin Cilic          (26)        CRO  61 36 674 62 62 Quarter-Finals
5 Grigor Dimitrov           (11)         BUL     Andy Murray           (3)        GBR        61 764 62 Quarter-Finals
6   Roger Federer            (4)         SUI   Stan Wawrinka           (5)        SUI     36 765 64 64 Quarter-Finals
7    Milos Raonic            (8)         CAN    Nick Kyrgios          (WC)        AUS    674 62 64 764 Quarter-Finals

We shouldn’t have added much to the time but let’s check:

> system.time(scrape_matches4("data/raw/2014.html"))
   user  system elapsed 
  0.816   0.004   0.824

Sweet. We’ve saved ourselves 29 seconds per page as long as the number of rounds stayed constant over the years. For the 10 years that I’ve looked at it has but I expect if you go back further the draw sizes will have been different and our script would break.

For now though this will do!

Categories: Programming

R: dplyr – Update rows with earlier/previous rows values

Mark Needham - Sun, 06/28/2015 - 23:30

Recently I had a data frame which contained a column which had mostly empty values:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA))
  col1 col2
1    1    a
2    2 <NA>
3    3 <NA>
4    4    b
5    5 <NA>

I wanted to fill in the NA values with the last non NA value from that column. So I want the data frame to look like this:

1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

I spent ages searching around before I came across the na.locf function in the zoo library which does the job:

library(zoo)
library(dplyr)
 
> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA)) %>% 
    do(na.locf(.))
  col1 col2
1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

This will fill in the missing values for every column, so if we had a third column with missing values it would populate those too:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    do(na.locf(.))
 
  col1 col2 col3
1    1    a    A
2    2    a    A
3    3    a    B
4    4    b    B
5    5    b    B

If we only want to populate ‘col2′ and leave ‘col3′ as it is we can apply the function specifically to that column:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    mutate(col2 = na.locf(col2))
  col1 col2 col3
1    1    a    A
2    2    a <NA>
3    3    a    B
4    4    b <NA>
5    5    b <NA>

It’s quite a neat function and certainly comes in helpful when cleaning up data sets which don’t tend to be as uniform as you’d hope!

Categories: Programming

R: Command line – Error in GenericTranslator$new : could not find function “loadMethod”

Mark Needham - Sat, 06/27/2015 - 23:47

I’ve been reading Text Processing with Ruby over the last week or so and one of the ideas the author describes is setting up your scripts so you can run them directly from the command line.

I wanted to do this with my Wimbledon R script and wrote the following script which uses the ‘Rscript’ executable so that R doesn’t launch in interactive mode:

wimbledon

#!/usr/bin/env Rscript
 
library(rvest)
library(dplyr)
library(stringr)
library(readr)
 
# stuff

Then I tried to run it:

$ time ./wimbledon
 
...
 
Error in GenericTranslator$new : could not find function "loadMethod"
Calls: write.csv ... html_extract_n -> <Anonymous> -> Map -> mapply -> <Anonymous> -> $
Execution halted
 
real	0m1.431s
user	0m1.127s
sys	0m0.078s

As the error suggests, the script fails when trying to write to a CSV file – it looks like Rscript doesn’t load in something from the core library that we need. It turns out adding the following line to our script is all we need:

library(methods)

So we end up with this:

#!/usr/bin/env Rscript
 
library(methods)
library(rvest)
library(dplyr)
library(stringr)
library(readr)

And when we run that all is well!

Categories: Programming

R: dplyr – squashing multiple rows per group into one

Mark Needham - Sat, 06/27/2015 - 23:36

I spent a bit of the day working on my Wimbledon data set and the next thing I explored is all the people that have beaten Andy Murray in the tournament.

The following dplyr query gives us the names of those people and the year the match took place:

library(dplyr)
 
> main_matches %>% filter(loser == "Andy Murray") %>% select(winner, year)
 
            winner year
1  Grigor Dimitrov 2014
2    Roger Federer 2012
3     Rafael Nadal 2011
4     Rafael Nadal 2010
5     Andy Roddick 2009
6     Rafael Nadal 2008
7 Marcos Baghdatis 2006
8 David Nalbandian 2005

As you can see, Rafael Nadal shows up multiple times. I wanted to get one row per player and list all the years in a single column.

This was my initial attempt:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year))
Source: local data frame [6 x 2]
 
            winner years
1     Andy Roddick  2009
2 David Nalbandian  2005
3  Grigor Dimitrov  2014
4 Marcos Baghdatis  2006
5     Rafael Nadal  2011
6    Roger Federer  2012

Unfortunately it just gives you the last matching row per group which isn’t quite what we want.. I realised my mistake while trying to pass a vector into paste and noticing that a vector came back when I’d expected a string:

> paste(c(2008,2009,2010))
[1] "2008" "2009" "2010"

The missing argument was ‘collapse’ – something I’d come across when using plyr last year:

> paste(c(2008,2009,2010), collapse=", ")
[1] "2008, 2009, 2010"

Now, if we apply that to our original function:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% summarise(years = paste(year, collapse=", "))
Source: local data frame [6 x 2]
 
            winner            years
1     Andy Roddick             2009
2 David Nalbandian             2005
3  Grigor Dimitrov             2014
4 Marcos Baghdatis             2006
5     Rafael Nadal 2011, 2010, 2008
6    Roger Federer             2012

That’s exactly what we want. Let’s tidy that up a bit:

> main_matches %>% filter(loser == "Andy Murray") %>% 
     group_by(winner) %>% arrange(year) %>%
     summarise(years  = paste(year, collapse =","), times = length(year))  %>%
     arrange(desc(times), years)
Source: local data frame [6 x 3]
 
            winner          years times
1     Rafael Nadal 2008,2010,2011     3
2 David Nalbandian           2005     1
3 Marcos Baghdatis           2006     1
4     Andy Roddick           2009     1
5    Roger Federer           2012     1
6  Grigor Dimitrov           2014     1
Categories: Programming

Scala development with GitHub's Atom editor

Xebia Blog - Sat, 06/27/2015 - 14:57
.code { font-family: monospace; background-color: #eeeeee; }

GitHub recently released version 1.0 of their Atom editor. This post gives a rough overview of its Scala support.

Basic features

Basic features such as Scala syntax highlighting are provided by the language-scala plugin.

Some work on worksheets as found in e.g. Eclipse has been done in the scala-worksheet-plus plugin, but this is still missing major features and not very useful at this time.

Navigation and completion Ctags

Atom supports basic 'Go to Declaration' (ctrl-alt-down) and 'Search symbol' (cmd-shift-r) support by way of the default ctags-based symbols-view.

While there are multiple sbt plugins for generating ctags, the easiest seems to be to have Ensime download the sources (more on that below) and invoke ctags manually: put this configuration in your home directory and run the 'ctags' command from your project root.

This is useful for searching for symbols, but limited for finding declarations: for example, when checking the declaration for Success, ctags doesn't know whether this is scala.util.Success, akka.actor.Status.Success, spray.http.StatusCodes.Success or some other 3rd-party or local symbol with that name.

Ensime

This is where the Ensime plugin comes in.

Ensime is a service for Scala IDE support, originally written for the Scala support in Emacs. The project metadata for Ensime can be generated with 'sbt gen-ensime' from the ensime-sbt sbt plugin.

Usage

Start the Ensime server from Atom with 'cmd-shift-p' 'Ensime: start'. After a small pause the status bar proclaims 'Indexer ready!' and you should be good to go.

At this point the main features are 'jump to definition' (alt-click), hover for type info, and auto-completion:

atom.io ensime completion

There are some rough edges, but this is a promising start based on a solid foundation.

Conclusions

While Atom is already a pleasant, modern, open source, cross platform editor, it is clearly still early days.

The Scala support in Atom is not yet as polished as in IDE's such as IntelliJ IDEA or as stable as in more mature editors such as Sublime Text, but is already practically useful and has serious potential. Startup is not instant, but I did not notice a 'sluggish feel' as reported by earlier reviewers.

Feel free to share your experiences in the comments, I will keep this post updated as the tools - and our experience with them - evolve.

R: ggplot – Show discrete scale even with no value

Mark Needham - Fri, 06/26/2015 - 23:48

As I mentioned in a previous blog post, I’ve been scraping data for the Wimbledon tennis tournament, and having got the data for the last ten years I wrote a query using dplyr to find out how players did each year over that period.

I ended up with the following functions to filter my data frame of all the mataches:

round_reached = function(player, main_matches) {
  furthest_match = main_matches %>% 
    filter(winner == player | loser == player) %>% 
    arrange(desc(round)) %>% 
    head(1)  
 
    return(ifelse(furthest_match$winner == player, "Winner", as.character(furthest_match$round)))
}
 
player_performance = function(name, matches) {
  player = data.frame()
  for(y in 2005:2014) {
    round = round_reached(name, filter(matches, year == y))
    if(length(round) == 1) {
      player = rbind(player, data.frame(year = y, round = round))      
    } else {
      player = rbind(player, data.frame(year = y, round = "Did not enter"))
    } 
  }
  return(player)
}

When we call that function we see the following output:

> player_performance("Andy Murray", main_matches)
   year          round
1  2005    Round of 32
2  2006    Round of 16
3  2007  Did not enter
4  2008 Quarter-Finals
5  2009    Semi-Finals
6  2010    Semi-Finals
7  2011    Semi-Finals
8  2012         Finals
9  2013         Winner
10 2014 Quarter-Finals

I wanted to create a chart showing Murray’s progress over the years with the round reached on the y axis and the year on the x axis. In order to do this I had to make sure the ’round’ column was being treated as a factor variable:

df = player_performance("Andy Murray", main_matches)
 
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
df$round = factor(df$round, levels =  rounds)
 
> df$round
 [1] Round of 32    Round of 16    Did not enter  Quarter-Finals Semi-Finals    Semi-Finals    Semi-Finals   
 [8] Finals         Winner         Quarter-Finals
Levels: Did not enter Round of 128 Round of 64 Round of 32 Round of 16 Quarter-Finals Semi-Finals Finals Winner

Now that we’ve got that we can plot his progress:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds)

2015 06 26 23 37 32

This is a good start but we’ve lost the rounds which don’t have a corresponding entry on the x axis. I’d like to keep them so it’s easier to compare the performance of different players.

It turns out that all we need to do is pass ‘drop = FALSE’ to scale_y_discrete and it will work exactly as we want:

ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop = FALSE)

2015 06 26 23 41 01

Neat. Now let’s have a look at the performances of some of the other top players:

draw_chart = function(player, main_matches){
  df = player_performance(player, main_matches)
  df$round = factor(df$round, levels =  rounds)
 
  ggplot(aes(x = year, y = round, group=1), data = df) + 
    geom_point() + 
    geom_line() + 
    scale_x_continuous(breaks=df$year) + 
    scale_y_discrete(breaks = rounds, drop=FALSE) + 
    ggtitle(player) + 
    theme(axis.text.x=element_text(angle=90, hjust=1))
}
 
a = draw_chart("Andy Murray", main_matches)
b = draw_chart("Novak Djokovic", main_matches)
c = draw_chart("Rafael Nadal", main_matches)
d = draw_chart("Roger Federer", main_matches)
 
library(gridExtra)
grid.arrange(a,b,c,d, ncol=2)

2015 06 26 23 46 15

And that’s all for now!

Categories: Programming

An update on Eclipse Android Developer Tools

Android Developers Blog - Fri, 06/26/2015 - 22:28

Posted by Jamal Eason, Product Manager, Android

Over the past few years, our team has focused on improving the development experience for building Android apps with Android Studio. Since the launch of Android Studio, we have been impressed with the excitement and positive feedback. As the official Android IDE, Android Studio gives you access to a powerful and comprehensive suite of tools to evolve your app across Android platforms, whether it's on the phone, wrist, car or TV.

To that end and to focus all of our efforts on making Android Studio better and faster, we are ending development and official support for the Android Developer Tools (ADT) in Eclipse at the end of the year. This specifically includes the Eclipse ADT plugin and Android Ant build system.

Time to Migrate

If you have not had the chance to migrate your projects to Android Studio, now is the time. To get started, download Android Studio. For many developers, migration is as simple as importing your existing Eclipse ADT projects in Android Studio with File → New→ Import Project as shown below:

For more details on the migration process, check out the migration guide. Also, to learn more about Android Studio and the underlying build system, check out this overview page.

Next Steps

Over the next few months, we are migrating the rest of the standalone performance tools (e.g. DDMS, Trace Viewer) and building in additional support for the Android NDK into Android Studio.

We are focused on Android Studio so that our team can deliver a great experience on a unified development environment. Android tools inside Eclipse will continue to live on in the open source community via the Eclipse Foundation. Check out the latest Eclipse Andmore project if you are interested in contributing or learning more.

For those of you that are new to Android Studio, we are excited for you to integrate Android Studio into your development workflow. Also, if you want to contribute to Android Studio, you can also check out the project source code. To follow all the updates on Android Studio, join our Google+ community.

Categories: Programming

Episode 230: Shubhra Kar on NodeJS

Shubhra Kar of StrongLoop talks to Jeff Meyerson about Node.js. Node allows for server-side JavaScript. Shubra and Jeff explore why Node is so important from three standpoints: isomorphic JavaScript, the single-threaded-concurrency model, and the “API economy.” Isomorphic JavaScript apps have their own control and viewing logic, but they share the state and specification of the […]
Categories: Programming