
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Reminder: ClientLogin Shutdown scheduled for April 20, 2015

Google Code Blog - Tue, 02/17/2015 - 21:17

Posted by Ryan Troll, Technical Lead, Identity and Authentication

As mentioned in our earlier post reminding users to migrate to newer Google Data APIs, we would like to once again share that the ClientLogin shutdown date is fast approaching, and applications which rely on it will stop working when it shuts down. We encourage you to minimize user disruption by switching to OAuth 2.0.

Our top priority is to safeguard users’ data, and at Google we use risk based analysis to block the vast majority of account hijacking attempts. Our risk analysis systems take into account many signals in addition to passwords to ensure that user data is protected. Password-only authentication has several well known shortcomings and we are actively working to move away from it. Moving to OAuth 2.0 ensures that advances we make in secure authentication are passed on to users signing in to Google services from your applications.

In our efforts to eliminate password-only authentication, we took the first step by announcing a deprecation date of April 20, 2015 for ClientLogin three years ago. At the same time, we recommended OAuth 2.0 as the standard authentication mechanism for our APIs. Applications using OAuth 2.0 never ask users for passwords, and users have tighter control over which data client applications can access. You can use OAuth 2.0 to build clients and websites that securely access account data and work with our advanced security features like 2-step verification.

We’ve taken steps to provide alternatives to password authentication in other protocols as well. CalDAV API V2 only supports OAuth 2.0, and we’ve added OAuth 2.0 support to IMAP, SMTP, and XMPP. While a deprecation timeline for password authentication in these protocols hasn’t been announced yet, developers are strongly encouraged to move to OAuth 2.0.
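To make the shape of the change concrete, here is a rough, illustrative sketch of the OAuth 2.0 authorization-code exchange in Python. The endpoint, scopes and client values below are placeholders; in practice you should use one of Google's client libraries and consult the current OAuth 2.0 documentation rather than rolling your own.

import requests

# Illustrative values only - register your application in the Google Developers
# Console to obtain real client credentials and confirm the correct endpoint.
TOKEN_URL = "https://accounts.google.com/o/oauth2/token"

def exchange_code_for_tokens(auth_code, client_id, client_secret, redirect_uri):
    # Swap the one-time authorization code (granted by the user on the consent
    # screen) for an access token and a refresh token - no password ever
    # touches the application.
    response = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": auth_code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    })
    response.raise_for_status()
    tokens = response.json()
    return tokens["access_token"], tokens.get("refresh_token")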

Categories: Programming

Hadoop and the OpenDataPlatform


Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP) – an attempt to standardize Hadoop. The move seems to be backed by IBM, Teradata and others, who appear as sponsors on the initiative’s site.

This move has a lot of potential and a few possible downsides.

ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this “Every vendor shipping a Hadoop distribution builds off the Hadoop trunk. The APIs, data formats and semantics of trunk are stable. The project is a decade old, now, and the global Hadoop community exercises its governance obligations responsibly. There’s simply no fundamental incompatibility among the core Hadoop components shipped by the various vendors.”

I disagree. While it is true that there are no “fundamental incompatibilities”, there are plenty of non-fundamental ones. Each release by each vendor includes backports of features that are somewhere on the main trunk but far from the stable release. This means that, as a vendor, we have to both test our solutions on multiple distributions and work around the subtle incompatibilities. We also have to limit ourselves to the lowest common denominator of the different platforms (or not support a distro) – for instance, until today, IBM did not support YARN or Spark on their distribution.

Hopefully standardization around a common core will also mean that the involved vendors will deliver their value-add on that core, unlike today where the offerings are based on proprietary extensions (this is true for Pivotal, IBM etc., not so much for Hortonworks). Today we can’t take Impala and run it on Pivotal, nor can we take HAWQ and run it on HDP. With ODP we would, hopefully, be able to mix-and-match and have installations where we can, say, use IBM’s BigSQL with GemFire HD running on HDP, and other such mixes. This can be good news for these vendors, by enlarging their addressable market, and for us as users, by increasing our choice and reducing lock-in.

So what are the downsides/possible problems?

Well, for one, we need to see that the scenarios I described above will actually happen and that this isn’t just a marketing ploy. Another problem, the elephant in the room if you will, is that the move is not complete – Cloudera, a major Hadoop player, is not part of it and, as can be seen in the post referenced above, is against it. The same is true for MapR. With these two vendors out we still have multiple vendors to deal with, and the problems ODP sets out to solve will not disappear. I guess if ODP were led by the ASF or some other more “impartial” party it would have been easier to digest, but as it is now all I can do is hope both that ODP will live up to its expectations and that in the long run Cloudera and MapR will also join this initiative.

 

 

Categories: Architecture

JetBrains webinar recording: Software architecture as code

Coding the Architecture - Simon Brown - Tue, 02/17/2015 - 18:32

The lovely people at JetBrains have published the recording of the live webinar I did with them last week about software architecture as code. I've embedded the YouTube video below, but you should also go and take a look at their website because there are answers to a bunch of questions that I didn't get time to answer during the webinar itself.

If you've already seen one of my Software architecture vs code presentations, you should probably jump straight to the demo section where I show how to create a software architecture model with code and Structurizr. You can also get the slides and the code that I used.

Thanks again to JetBrains (especially Hadi Hariri, Trisha Gee and Robert Demmer) and to everybody who listened in.

Categories: Architecture

Sponsored Post: Apple, Sentient, Couchbase, Farmerswife, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Apple is hiring an Application Security Engineer. Apple’s Gift Card Engineering group is looking for a software engineer passionate about application security for web applications and REST services. Be part of a team working on challenging and fast-paced projects supporting Apple's business by delivering high volume, high performance, and high availability distributed transaction processing systems. Please apply here.

  • Apple is hiring a Software Engineer for Maps Services. The Maps Team is looking for a developer to support and grow some of the core backend services that support Apple Map's Front End Services. The ideal candidate would have experience with system architecture, as well as the design, implementation, and testing of individual components, but would also be comfortable with multiple scripting languages. Please apply here.

  • Sentient Technologies is hiring several Senior Distributed Systems Engineers and a Senior Distributed Systems QA Engineer. Sentient Technologies is a privately held company seeking to solve the world’s most complex problems through massively scaled artificial intelligence running on one of the largest distributed compute resources in the world. Help us expand our existing million+ distributed cores to many, many more. Please apply here.

  • Want to be the leader and manager of a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join farmerswife on the beautiful island of Mallorca and help create the next generation of project management tools. Please apply here.

  • Senior DevOps Engineer – SocialRadar. We are a VC funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems Engineer – Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics’ bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology and microservices; we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer – AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data – AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend components of the software that manages application architectures. Apply here.
Fun and Informative Events
  • Rise of the Multi-Model Database. FoundationDB Webinar: March 10th at 1pm EST. Do you want a SQL, JSON, Graph, Time Series, or Key Value database? Or maybe it’s all of them? Not all NoSQL databases are created equal. The latest development in this space is the Multi-Model Database. Please join FoundationDB for an interactive webinar as we discuss the Rise of the Multi-Model Database and what to consider when choosing the right tool for the job.

  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses: Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/
Cool Products and Services
  • See how LinkedIn uses Couchbase to help power its “Follow” service for 300M+ global users, 24x7. http://info.couchbase.com/14Q4-MKTG-Website-LinkedIn-Scale-LP.html

  • VividCortex Developer edition delivers a groundbreaking performance management solution to startups, open-source projects, nonprofits, and other organizations free of charge. It integrates high-resolution metrics on queries, metrics, processes, databases, and the OS and hardware to deliver an unprecedented level of visibility into production database activity.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs. After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Software Development Linkopedia February 2015

From the Editor of Methods & Tools - Tue, 02/17/2015 - 16:09
Here is our monthly selection of interesting knowledge material on programming, software testing and project management. This month you will find some interesting information and opinions about software and system modeling, programmer psychology, managing priorities, improving software architecture, technical user stories, free tools for Scrum, coding culture and integrating UX in Agile approaches. Web site: […]

Multiple Levels of Done

Mike Cohn's Blog - Tue, 02/17/2015 - 16:00

The following was originally published in Mike Cohn's monthly newsletter. If you like what you're reading, sign up to have this content delivered to your inbox weeks before it's posted on the blog, here.

Having a “definition of done” has become a near-standard thing for Scrum teams. The definition of done (often called a “DoD”) establishes what must be true of each product backlog item for that item to be done.

A typical DoD would be something similar to:

  • The code is well written. (That is, we’re happy with it and don’t feel like it immediately needs to be rewritten.)
  • The code is checked in. (Kind of an “of course” statement, but still worth calling out.)
  • The code was either pair programmed or peer reviewed.
  • The code comes with tests at all appropriate levels. (That is, unit, service and user interface.)
  • The feature the code implements has been documented in any end-user documentation such as manuals or help systems.

Many teams will improve their Definition of Done over time. For example, a team using the example above might not be able to do so much automated testing when first starting out. But, hopefully, they would add that to their definition of done over time.

All this is sufficient for the vast majority of teams. But I’ve worked on a few projects whose teams benefitted from having multiple definitions of done. A team takes a product backlog item to definition of done Level 1 in a first sprint, to definition of done Level 2 in a subsequent sprint, and so on.

I am most definitely not saying they code something in a first sprint and test it in a second sprint. “Done” still means tested, but it may mean tested to different—but appropriate—levels. Let’s look at an example.

An Example from a Game Studio

One thing I’ve really enjoyed in working with game studios is that they understand that not all work will make it into the finished game. Sometimes, for example, a game team experiments with a new character trying to make the character fun. If they can’t, the character isn’t added to the game.

So it would be extremely wasteful for a game team to have a definition of done requiring all art to be perfect, all audio be recorded, and refresh rates be high when they are merely trying to decide if a new character is fun. The team should do just enough to answer that question.

In a number of game studios, this has led to a four-level definition of done:

Done, Level 1 (D1) means the new feature works and decisions can be made. For animation, this was often “the character is animated in a white room.” It’s “shippable” to friendly users (often internal) who can comment on whether the new functionality meets its objective.

D2: The thing is integrated into the game and users can play it / interact with it.

D3: The feature is truly shippable. It’s good enough to include in a major public release. The team may not want to release it yet—they may first want to improve the frame rate, add some polygons, brighten colors, and so on. But the feature could be shipped in this state if necessary.

D4: The feature is tuned, polished, and everyone loves it. There’s nothing the team would change. A typical public release will include a mix of D4 and D3 items. There will always be areas the team wants to go back to and further improve. But, time intrudes and they ship the product. So D3 is totally shippable. You’re not embarrassed by D3 and only your hardest core users will notice the ways it could be better. D4 rocks.

Are Multiple Definitions of Done Right for You?

Very likely not. Most teams do quite well with a single definition of done. But the ideas above extend beyond just game development. I’ve used the same approach in a variety of other application domains, notably hardware development. In that case, the teams involved were developing dozens of new gadgets for an integrated suite of home automation products.

They used these definitions:

D1: The new hardware works on a test bench in the office.

D2: The new hardware is integrated with the other products in the suite.

D3: The new hardware is installed and running in at least one model house used for this type of beta testing.

D4: The product is fully ready for sale (e.g., it meets all requirements for UL approval).

Within this company, there were dozens of components in development at all times, and some components could be found at each level of doneness. For example, a product to raise and lower window shades could be in testing at the model home, while a newer component to open and close doors had just been started and was only working on one developer’s test bench.

Most projects will never need this. If you do think it’s appropriate for you, before trying it, really be sure you’re not using the technique as an excuse to skip things like testing.

Each level should exist as a way of making decisions about the product. A good test of that is to see if some features are dropped at each level. It is a good sign, for example, that sometimes a feature reaches a certain doneness level, and the product owner decides the feature is no longer wanted, perhaps because of its cost or delivery time.

Cancelling $http requests for fun and profit

Xebia Blog - Tue, 02/17/2015 - 09:11

At my current client, we have a large AngularJS application that is configured to show a full-page error whenever one of the $http requests ends up in error. This is implemented with an error interceptor as you would expect it to be. However, we’re also using some calculation-intensive resources that happen to time out once in a while. This combination is tricky: a user triggers a resource request when navigating to a certain page, navigates to a second page and suddenly ends up with an error message, as the request from the first page triggered a timeout error. This is a particularly unpleasant side effect that I’m going to address in a generic way in this post.

There are of course multiple solutions to this problem. We could create a more resilient implementation in the backend that will not time out, but accepts retries. We could change the full-page error into something less ‘in your face’ (but you would still get some out-of-place error notification). For this post I’m going to fix it using a different approach: cancel any running requests when a user switches to a different location (the route part of the URL). This makes sense; your browser does the same when navigating from one page to another, so why not mimic this behaviour in your Angular app?

I’ve created a pretty verbose implementation to explain how to do this. At the end of this post, you’ll find a link to the code as a packaged bower component that can be dropped in any Angular 1.2+ app.

Angular does not offer that many options for cancelling a running request. Under the hood there are some places you can hook into, but that won’t be necessary. If we look at the $http usage documentation, the timeout property is mentioned and it accepts a promise to abort the underlying call. Perfect! If we set a promise on all created requests, and abort these at once when the user navigates to another page, we’re (probably) all set.

Let’s write an interceptor to plug in the promise in each request:

angular.module('angularCancelOnNavigateModule')
  .factory('HttpRequestTimeoutInterceptor', function ($q, HttpPendingRequestsService) {
    return {
      request: function (config) {
        config = config || {};
        if (config.timeout === undefined && !config.noCancelOnRouteChange) {
          config.timeout = HttpPendingRequestsService.newTimeout();
        }
        return config;
      }
    };
  });

The interceptor will not overwrite the timeout property when it is explicitly set. Also, if the noCancelOnRouteChange option is set to true, the request won’t be cancelled. For better separation of concerns, I’ve created a new service (the HttpPendingRequestsService) that hands out new timeout promises and stores references to them.

Let’s have a look at that pending requests service:

angular.module('angularCancelOnNavigateModule')
  .service('HttpPendingRequestsService', function ($q) {
    var cancelPromises = [];

    function newTimeout() {
      var cancelPromise = $q.defer();
      cancelPromises.push(cancelPromise);
      return cancelPromise.promise;
    }

    function cancelAll() {
      angular.forEach(cancelPromises, function (cancelPromise) {
        cancelPromise.promise.isGloballyCancelled = true;
        cancelPromise.resolve();
      });
      cancelPromises.length = 0;
    }

    return {
      newTimeout: newTimeout,
      cancelAll: cancelAll
    };
  });

So, this service creates new timeout promises that are stored in an array. When the cancelAll function is called, all timeout promises are resolved (thus aborting all requests that were configured with the promise) and the array is cleared. By setting the isGloballyCancelled property on the promise object, a response promise method can check whether it was cancelled or another exception has occurred. I’ll come back to that one in a minute.

Now we hook up the interceptor and call the cancelAll function at a sensible moment. There are several events triggered on the root scope that are good hook candidates. Eventually I settled for $locationChangeSuccess. It is only fired when the location change is a success (hence the name) and not cancelled by any other event listener.

angular
  .module('angularCancelOnNavigateModule', [])
  .config(function($httpProvider) {
    $httpProvider.interceptors.push('HttpRequestTimeoutInterceptor');
  })
  .run(function ($rootScope, HttpPendingRequestsService) {
    $rootScope.$on('$locationChangeSuccess', function (event, newUrl, oldUrl) {
      if (newUrl !== oldUrl) {
        HttpPendingRequestsService.cancelAll();
      }
    })
  });

When writing tests for this setup, I found that the $locationChangeSuccess event is triggered at the start of each test, even though the location did not change yet. To circumvent this situation, the function does a simple difference check.

Another problem popped up during testing. When the request is cancelled, Angular creates an empty error response, which in our case still triggers the full-page error. We need to catch and handle those error responses. We can simply add a responseError function in our existing interceptor. And remember the special isGloballyCancelled property we set on the promise? That’s the way to distinguish between cancelled and other responses.

We add the following function to the interceptor:

      responseError: function (response) {
        if (response.config.timeout.isGloballyCancelled) {
          return $q.defer().promise;
        }
        return $q.reject(response);
      }

The responseError function must return a promise that normally re-throws the response as rejected. However, that’s not what we want: neither a success nor a failure callback should be called. We simply return a never-resolving promise for all cancelled requests to get the behaviour we want.

That’s all there is to it! To make it easy to reuse this functionality in your Angular application, I’ve packaged this module as a bower component that is fully tested. You can check the module out on this GitHub repo.

Google launches the Chinese language Developer Channel on YouTube

Google Code Blog - Tue, 02/17/2015 - 01:30

Posted by Bill Luan, Greater China Regional Lead, Google Developer Relations

Today, the Google Developer Platform team is launching a Chinese language and captioned YouTube channel, aiming to make it easier for the developers in China to learn more about Google services and technologies around mobile, web and the cloud. The channel includes original content in Chinese (Mandarin speaking), and curates content from the English version of the Google Developers channel with Simplified Chinese captions.

A special thank you to the volunteers in the Google Developers Group community in the city of Nanyang (Nanyang GDG) in China, for their effort and contribution in adding Chinese language translations to the English-language Google Developers Channel videos on YouTube. Over time, we will produce more original Chinese language content, as well as continue working with GDG volunteers in China to add Chinese captions to more English videos from the Google Developers Channel, to serve the learning needs of developers.

Categories: Programming

Python/pandas: Column value in list (ValueError: The truth value of a Series is ambiguous.)

Mark Needham - Mon, 02/16/2015 - 22:39

I’ve been using Python’s pandas library while exploring some CSV files and although for the most part I’ve found it intuitive to use, I had trouble filtering a data frame based on checking whether a column value was in a list.

A subset of one of the CSV files I’ve been working with looks like this:

$ cat foo.csv
"Foo"
1
2
3
4
5
6
7
8
9
10

Loading it into a pandas data frame is reasonably simple:

import pandas as pd
df = pd.read_csv('foo.csv', index_col=False, header=0)
>>> df
   Foo
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
9   10

If we want to find the rows which have a value of 1 we’d write the following:

>>> df[df["Foo"] == 1]
   Foo
0    1

Finding the rows with a value less than 7 is as you’d expect too:

>>> df[df["Foo"] < 7]
   Foo
0    1
1    2
2    3
3    4
4    5
5    6

Next I wanted to pick out the rows containing odd numbers, which I initially tried to do like this:

odds = [i for i in range(1,10) if i % 2 != 0]
>>> odds
[1, 3, 5, 7, 9]
 
>>> df[df["Foo"] in odds]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/pandas/core/generic.py", line 698, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Unfortunately that doesn’t work and I couldn’t get any of the suggestions from the error message to work either. Luckily pandas has a special isin function for this use case which we can call like this:

>>> df[df["Foo"].isin(odds)]
   Foo
0    1
2    3
4    5
6    7
8    9

Much better!
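
As an aside, for this particular filter we don't even need the list – a vectorised comparison on the column gives the same rows (a small sketch, equivalent to the isin version above):

>>> df[df["Foo"] % 2 != 0]
   Foo
0    1
2    3
4    5
6    7
8    9

isin is the more general tool though, since it works for arbitrary lists of values rather than only conditions we can express directly on the column.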

Categories: Programming

When development resembles the ageing of wine

Xebia Blog - Mon, 02/16/2015 - 20:29

Once upon a time I was asked to help out a software product company.  The management briefing went something like this: "We need you to increase productivity; the guys in development seem to be unable to ship anything! And if they do ship something, it's only a fraction of what we expected."

And so the story begins. Now there are many ways we can improve the team's outcome and its output (the first matters more), but it always starts with observing what they do today and trying to figure out why.

It turns out that requests from the business were treated like a good wine, and were allowed to "age", in the oak barrel that was called Jira. Not so much to add flavour in the form of details, requirements, designs, non functional requirements or acceptance criteria, but mainly to see if the priority of this request would remain stable over a period of time.

In the days that followed I participated in the "Change Control Board" and saw the problem first-hand. Management would change priorities on the fly and make swift decisions on requirements that would take weeks to implement. To stay in wine-making terms, wine was poured in and out of the barrels at such a rate that it bore more resemblance to a blender than to the art of wine making.

Though management was happy to learn I had unearthed the root cause of their problem, they were less pleased to learn that they themselves were responsible.  The Agile world created the Product Owner role for this, and it turned out that this is a hat that can only be worn by a single person.

Once we funnelled all the requests through a single person, responsible both for the success of the product and for the development, we saw a big change. Not only did the business get a reliable sparring partner, but the development team had a single voice when it came to setting priorities. Once the team started finishing what it started, we began shipping at regular intervals, with features that we had all committed to.

Of course it did not take away the dynamics of the business, but it allowed us to deliver, and become reliable in how and when we responded to change. Perhaps not the most aged wine, but enough to delight our customers and learn what we should put in our barrel for the next round.

 

ScottGu Azure event in London on March 2nd

ScottGu's Blog - Scott Guthrie - Mon, 02/16/2015 - 19:16

On March 2nd I'm doing an Azure event in London that you can attend for free.  I'll be speaking for about 2.5 hours and will do an end-to-end walkthrough of Microsoft Azure, show off a bunch of demos of great new features/capabilities, and talk about some of the improvements coming out over the next few months.


You can sign-up and attend the event for free (while tickets last - they are going fast).  If you are interested sign-up now.  The event is being held at the Mermaid Conference & Events Centre in Blackfriars, London:


Hope to see some of you there!

Scott

Categories: Architecture, Programming

Scaling Kim Kardashian to 100 Million Page Views

The team at PAPER estimated their article (NSFW) containing pictures of a very naked Kim Kardashian would quickly receive over 100 million page views. The very definition of bursty viral driven traffic.

As a comparison in 2013 it was estimated Google processed over 500 million searches a day. So a nude Kim Kardashian is worth one-fifth of a Google. Strangely, I can believe it.

How did they handle this traffic gold mine? A complete recounting of the unusual behind the scenes story is told by Paul Ford in How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian (SFW).  (BTW, only one butt pun was made intentionally in this story, all others are serendipity).

What can we learn from the experience? I know what you are thinking. This is just a single static page with a few static pictures. It’s not a complex problem like search or a social network. Shouldn’t any decent CDN be enough to handle that? And you would be correct, but that’s not the whole of the story:

Categories: Architecture

The Joel Test For Programmers (The Simple Programmer Test)

Making the Complex Simple - John Sonmez - Mon, 02/16/2015 - 17:00

A while back—the year 2000 to be exact—Joel Spolsky wrote a blog post entitled: “The Joel Test: 12 Steps to Better Code.” Many software engineers and developers use this test for evaluating a company to determine if a company is a good company to work for. In fact, many software development organizations use the Joel Test […]

The post The Joel Test For Programmers (The Simple Programmer Test) appeared first on Simple Programmer.

Categories: Programming

Agile Misconceptions: There Is One Right Approach

I have an article up on agileconnection.com called Common Misconceptions about Agile: There Is Only One Approach.

If you read my Design Your Agile Project series, you know I am a fan of determining what approach works when for your organization or project.

Please leave comments over there. Thanks!

Two notes:

  1. If you would like to write an article for agileconnection.com, I’m the technical editor. Send me your article and we can go from there.
  2. If you would like more common-sense approaches to agile, sign up for the Influential Agile Leader. We’re leading it in San Francisco and London this year. Early bird pricing ends soon.
Categories: Project Management

SPaMCAST 329 – Commitment, Message and Themes, HALT Testing


http://www.spamcast.net

Listen Now

Subscribe on iTunes

This week’s Software Process and Measurement Cast is our magazine with three features.  We begin with Jo Ann Sweeney’s Explaining Change column.  In this column Jo Ann tackles the concepts of messages and themes.  I consider this the core of communication.  Visit Jo Ann’s website at http://www.sweeneycomms.com and let her know what you think of her column.

The middle segment is our essay on commitment.  The making and keeping of commitments are core components of both professional behavior and Agile. The simple definition of a commitment is a promise to perform. Whether Agile or Waterfall, commitments are used to manage software projects. Commitments drive the behavior of individuals, teams and organizations.  Commitments are powerful!

We wrap this week’s podcast up with a new column from the Software Sensei, Kim Pries. In this installment Kim discusses software HALT testing.  HALT stands for highly accelerated life test.  The goal is to find defects, faults and things that go bump in the night in hours or days rather than waiting for weeks, months or years.  Whether you are testing software, hardware or some combination this is a concept you need to have in your portfolio.

Call to action!

Can you tell a friend about the podcast?  Even better, show them how you listen to the Software Process and Measurement Cast and subscribe them!  Send me the name of the person you subscribed and I will give both you and the horde you have converted to listeners a call out on the show.

Re-Read Saturday News

The next book in our Re-Read Saturday feature will be Eliyahu M. Goldratt and Jeff Cox’s The Goal: A Process of Ongoing Improvement. Originally published in 1984, it has been hugely influential because it introduced the Theory of Constraints, which is central to lean thinking. The book is written as a business novel. On February 21st we will begin the re-read on the Software Process and Measurement Blog.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast.

Dead Tree Version or Kindle Version 

Next SPaMCAST

In the next Software Process and Measurement Cast we will feature our interview with Anthony Mersino, author of Emotional Intelligence for Project Managers and the newly published Agile Project Management.  Anthony and I talked about Agile, coaching and organizational change.  A wide-ranging interview that will help any leader raise the bar!

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese


Categories: Process Management


Early Bird Ends Soon for Influential Agile Leader

If you are a leader for your agile efforts in your organization, you need to consider participating in The Influential Agile Leader. If you are working on how to transition to agile, how to talk about agile, how to help your peers, managers, or teams, you want to participate.

Gil Broza and I designed it to be experiential and interactive. We’re leading the workshop in San Francisco, Mar 31-Apr 1. We’ll be in London April 14-15.

The early bird pricing ends Feb 20.

People who participate see great results, especially when they bring peers/managers from their organization. Sign up now.

Categories: Project Management

Python/scikit-learn: Calculating TF/IDF on How I met your mother transcripts

Mark Needham - Sun, 02/15/2015 - 16:56

Over the past few weeks I’ve been playing around with various NLP techniques to find interesting insights into How I met your mother from its transcripts and one technique that kept coming up is TF/IDF.

The Wikipedia definition reads like this:

tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.

It is often used as a weighting factor in information retrieval and text mining.

The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.
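
(In its most common form the score for a term t in a document d, across a corpus of N documents, is tf(t, d) * log(N / df(t)), where df(t) is the number of documents containing t. scikit-learn’s defaults add smoothing and length normalisation on top of that, so the numbers it produces differ slightly from this textbook formula.)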

I wanted to generate a TF/IDF representation of phrases used in the hope that it would reveal some common themes used in the show.

Python’s scikit-learn library gives you two ways to generate the TF/IDF representation:

  1. Generate a matrix of token/phrase counts from a collection of text documents using CountVectorizer and feed it to TfidfTransformer to generate the TF/IDF representation.
  2. Feed the collection of text documents directly to TfidfVectorizer and go straight to the TF/IDF representation skipping the middle man.

I started out using the first approach and hadn’t quite got it working when I realised there was a much easier way!
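
For completeness, the two-step version looks roughly like this – a sketch with a tiny made-up corpus just to show the shape of the pipeline, since TfidfVectorizer below collapses both steps into one:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
 
docs = ["ted meets robin", "barney has a playbook", "marshall and lily get engaged"]
 
# Step 1: raw token/phrase counts, one row per document
counts = CountVectorizer(analyzer='word', ngram_range=(1, 3)).fit_transform(docs)
 
# Step 2: re-weight those counts into TF/IDF scores
tfidf = TfidfTransformer().fit_transform(counts)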

I have a collection of sentences in a CSV file so the first step is to convert those into a list of documents:

from collections import defaultdict
import csv
 
episodes = defaultdict(list)
with open("data/import/sentences.csv", "r") as sentences_file:
    reader = csv.reader(sentences_file, delimiter=',')
    reader.next()
    for row in reader:
        episodes[row[1]].append(row[4])
 
for episode_id, text in episodes.iteritems():
    episodes[episode_id] = "".join(text)
 
corpus = []
for id, episode in sorted(episodes.iteritems(), key=lambda t: int(t[0])):
    corpus.append(episode)

corpus contains 208 entries (1 per episode), each of which is a string containing the transcript of that episode. Next it’s time to train our TF/IDF model which is only a few lines of code:

from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer='word', ngram_range=(1,3), min_df = 0, stop_words = 'english')

The most interesting parameter here is ngram_range – we’re telling it to generate 2 and 3 word phrases along with the single words from the corpus.

e.g. if we had the sentence “Python is cool” we’d end up with 6 phrases – ‘Python’, ‘is’, ‘cool’, ‘Python is’, ‘Python is cool’ and ‘is cool’.
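
You can check that expansion by fitting a vectoriser on just that sentence and printing the vocabulary it builds (lowercasing is on by default, hence the lowercase output):

from sklearn.feature_extraction.text import CountVectorizer
 
cv = CountVectorizer(analyzer='word', ngram_range=(1, 3)).fit(["Python is cool"])
print cv.get_feature_names()
# [u'cool', u'is', u'is cool', u'python', u'python is', u'python is cool']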

Let’s execute the model against our corpus:

tfidf_matrix = tf.fit_transform(corpus)
feature_names = tf.get_feature_names()
>>> len(feature_names)
498254
 
>>> feature_names[50:70]
[u'00 does sound', u'00 don', u'00 don buy', u'00 dressed', u'00 dressed blond', u'00 drunkenly', u'00 drunkenly slurred', u'00 fair', u'00 fair tonight', u'00 fall', u'00 fall foliage', u'00 far', u'00 far impossible', u'00 fart', u'00 fart sure', u'00 friends', u'00 friends singing', u'00 getting', u'00 getting guys', u'00 god']

So we’ve got nearly 500,000 phrases and if we look at tfidf_matrix we’d expect it to be a 208 x 498254 matrix – one row per episode, one column per phrase:

>>> tfidf_matrix
<208x498254 sparse matrix of type '<type 'numpy.float64'>'
	with 740396 stored elements in Compressed Sparse Row format>

This is what we’ve got although under the covers it’s using a sparse representation to save space. Let’s convert the matrix to dense format to explore further and find out why:

dense = tfidf_matrix.todense()
>>> len(dense[0].tolist()[0])
498254

What I’ve printed out here is the size of one row of the matrix, which contains the TF/IDF score for every phrase in our corpus for the 1st episode of How I met your mother. A lot of those phrases won’t have occurred in the 1st episode, so let’s filter those out:

episode = dense[0].tolist()[0]
phrase_scores = [pair for pair in zip(range(0, len(episode)), episode) if pair[1] > 0]
 
>>> len(phrase_scores)
4823

There are just under 5,000 phrases used in this episode, roughly 1% of the phrases in the whole corpus. The sparse matrix makes a bit more sense now – if scipy used a dense matrix representation there’d be around 493,000 entries with no score for this row alone, and that overhead becomes more significant as the number of documents increases.
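
If you're curious just how sparse it is, the matrix knows how many entries it actually stores – nnz is the count of non-zero cells, so the density is easy to work out:

stored = tfidf_matrix.nnz                                # 740396 for this corpus
total = tfidf_matrix.shape[0] * tfidf_matrix.shape[1]    # 208 * 498254 cells
print "%d of %d cells hold a score (%.2f%%)" % (stored, total, 100.0 * stored / total)
# => 740396 of 103636832 cells hold a score (0.71%)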

Next we’ll sort the phrases by score in descending order to find the most interesting phrases for the first episode of How I met your mother:

>>> sorted(phrase_scores, key=lambda t: t[1] * -1)[:5]
[(419207, 0.2625177493269755), (312591, 0.19571419072701732), (267538, 0.15551468983363487), (490429, 0.15227880637176266), (356632, 0.1304175242341549)]

The first value in each tuple is the phrase’s position in our initial vector and also corresponds to the phrase’s position in feature_names which allows us to map the scores back to phrases. Let’s look up a couple of phrases:

>>> feature_names[419207]
u'ted'
>>> feature_names[312591]
u'olives'
>>> feature_names[356632]
u'robin'

Let’s automate that lookup:

sorted_phrase_scores = sorted(phrase_scores, key=lambda t: t[1] * -1)
for phrase, score in [(feature_names[word_id], score) for (word_id, score) in sorted_phrase_scores][:20]:
   print('{0: <20} {1}'.format(phrase, score))
 
ted                  0.262517749327
olives               0.195714190727
marshall             0.155514689834
yasmine              0.152278806372
robin                0.130417524234
barney               0.124411751867
lily                 0.122924977859
signal               0.103793246466
goanna               0.0981379875009
scene                0.0953423604123
cut                  0.0917336653574
narrator             0.0864622981985
flashback            0.078295921554
flashback date       0.0702825260177
ranjit               0.0693927691559
flashback date robin 0.0585687716814
ted yasmine          0.0585687716814
carl                 0.0582101172888
eye patch            0.0543650529797
lebanese             0.0543650529797

We see all the main characters’ names, which aren’t that interesting – perhaps they should be part of the stop list – but ‘olives’ stands out: that’s where the olive theory is first mentioned. I thought olives came up more often, but a quick search for the term suggests it isn’t mentioned again until Episode 9 in Season 9:

$ grep -rni --color "olives" data/import/sentences.csv | cut -d, -f 2,3,4 | sort | uniq -c
  16 1,1,1
   3 193,9,9

‘yasmine’ is also an interesting phrase in this episode but she’s never mentioned again:

$ grep -h -rni --color "yasmine" data/import/sentences.csv
49:48,1,1,1,"Barney: (Taps a woman names Yasmine) Hi, have you met Ted? (Leaves and watches from a distance)."
50:49,1,1,1,"Ted: (To Yasmine) Hi, I'm Ted."
51:50,1,1,1,Yasmine: Yasmine.
53:52,1,1,1,"Yasmine: Thanks, It's Lebanese."
65:64,1,1,1,"[Cut to the bar, Ted is chatting with Yasmine]"
67:66,1,1,1,Yasmine: So do you think you'll ever get married?
68:67,1,1,1,"Ted: Well maybe eventually. Some fall day. Possibly in Central Park. Simple ceremony, we'll write our own vows. But--eh--no DJ, people will dance. I'm not going to worry about it! Damn it, why did Marshall have to get engaged? (Yasmine laughs) Yeah, nothing hotter than a guy planning out his own imaginary wedding, huh?"
69:68,1,1,1,"Yasmine: Actually, I think it's cute."
79:78,1,1,1,"Lily: You are unbelievable, Marshall. No-(Scene splits in half and shows both Lily and Marshall on top arguing and Ted and Yasmine on the bottom mingling)"
82:81,1,1,1,Ted: (To Yasmine) you wanna go out sometime?
85:84,1,1,1,[Cut to Scene with Ted and Yasmine at bar]
86:85,1,1,1,Yasmine: I'm sorry; Carl's my boyfriend (points to bartender)

It would be interesting to filter out the phrases which don’t occur in any other episode and see what insights we get from doing that. For now though we’ll extract phrases for all episodes and write to CSV so we can explore more easily:

with open("data/import/tfidf_scikit.csv", "w") as file:
    writer = csv.writer(file, delimiter=",")
    writer.writerow(["EpisodeId", "Phrase", "Score"])
 
    doc_id = 0
    for doc in tfidf_matrix.todense():
        print "Document %d" %(doc_id)
        word_id = 0
        for score in doc.tolist()[0]:
            if score > 0:
                word = feature_names[word_id]
                writer.writerow([doc_id+1, word.encode("utf-8"), score])
            word_id +=1
        doc_id +=1

And finally a quick look at the contents of the CSV:

$ tail -n 10 data/import/tfidf_scikit.csv
208,york apparently laughs,0.012174304095213192
208,york aren,0.012174304095213192
208,york aren supposed,0.012174304095213192
208,young,0.013397275854758335
208,young ladies,0.012174304095213192
208,young ladies need,0.012174304095213192
208,young man,0.008437685963000223
208,young man game,0.012174304095213192
208,young stupid,0.011506395106658192
208,young stupid sighs,0.012174304095213192
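
Coming back to the idea of filtering out phrases that only occur in a single episode – a rough, untested sketch of that would be to derive each phrase's document frequency from the matrix and keep the ones with a frequency of exactly 1:

import numpy as np
 
# Sum a boolean version of the matrix down the columns to get, for each phrase,
# the number of episodes it appears in.
doc_freq = np.asarray((tfidf_matrix > 0).sum(axis=0)).ravel()
 
# Phrases that only ever appear in one episode
unique_phrase_ids = np.where(doc_freq == 1)[0]
print "%d of %d phrases appear in only one episode" % (len(unique_phrase_ids), len(feature_names))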
Categories: Programming

Diamond Kata - Some Thoughts on Tests as Documentation

Mistaeks I Hav Made - Nat Pryce - Sun, 02/15/2015 - 13:13
Comparing example-based tests and property-based tests for the Diamond Kata, I’m struck by how well property-based tests reduce duplication of test code. For example, in the solutions by Sandro Mancuso and George Dinwiddie, not only do multiple tests exercise the same property with different examples, but the tests duplicate assertions. Property-based tests avoid the former by defining generators of input data, but I’m not sure why the latter occurs. Perhaps Seb’s “test recycling” approach would avoid this kind of duplication.

But compared to example-based tests, property-based tests do not work so well as an explanatory overview. Examples convey an overall impression of what the functionality is, but are not good at describing precise details. When reading example-based tests, you have to infer the properties of the code from multiple examples and informal text in identifiers and comments. The property-based tests I wrote for the Diamond Kata specify precise properties of the diamond function, but nowhere is there a test that describes that the function draws a diamond!

There’s a place for both examples and properties. It’s not an either/or decision. However, explanatory examples used for documentation need not be test inputs. If we’re generating inputs for property tests and generating documentation for our software, we can combine the two, and insert generated inputs and calculated outputs into generated documentation.
Categories: Programming, Testing & QA

Re-Read Saturday . . . And The Readers Have Spoken


The next book in our Re-Read Saturday feature will be Eliyahu M. Goldratt and Jeff Cox’s The Goal: A Process of Ongoing Improvement. Originally published in 1984, it has been hugely influential because it introduced the Theory of Constraints, which is central to lean thinking. The book is written as a business novel. On February 21st we will begin the re-read.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version or Kindle Version 

For the record, the top five books in the overall voting were:

  1. The Goal: A Process of Ongoing Improvement – Eliyahu M. Goldratt and Jeff Cox 71%
  2. Checklist Manifesto: How to Get Things Done Right – Atul Gawande 43%
  3. Three tied:
    The Principles of Product Development Flow – Donald G. Reinertsen 8.57%
    The Art of Software Testing – Glenford J. Myers, Cory Sandler and Tom Badgett 8.57%
    The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses – Eric Reis 8.57%

I was asked on LinkedIn for a list of the other books that we have featured in the Re-Read Saturday series. Here they are:

7 Habits of Highly Effective People – Stephen Covey

Dr. Covey lays out seven behaviors of successful people (hence the title).  The book is based on observation, interviews and research; therefore the habits presented in the book not only make common sense, but also have a solid evidentiary basis. One of the reasons the book works is the integration of character and ethics into the principles.  I have written and podcasted on the importance and value of character and ethics in the IT environment many times.

Note: If you don’t have a copy of the book, buy one (I would loan you mine, but I suspect I will read it again).  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version Kindle Version

The re-read blog entries:

The audio podcast can be listened to HERE

Leading Change – John P. Kotter

Leading Change by John P. Kotter, originally published in 1996, has become a classic reference that most process improvement specialists either have or should have on their bookshelf. The core of the book lays out an eight-step model for effective change that anyone involved in change will find useful. However there is more to the book than just the model.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version 

Entries in the Re-Read are:

I have not compiled the entries into a single essay and podcast as of February 2015.


Categories: Process Management