Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Feed aggregator

R: Cohort analysis of Neo4j meetup members

Mark Needham - Tue, 02/24/2015 - 02:19

A few weeks ago I came across a blog post explaining how to apply cohort analysis to customer retention using R and I thought it’d be a fun exercise to calculate something similar for meetup attendees.

In the customer retention example we track customer purchases on a month by month basis and each customer is put into a cohort or bucket based on the first month they made a purchase in.

We then calculate how many of them made purchases in subsequent months and compare that with the behaviour of people in other cohorts.

In our case we aren’t selling anything so our equivalent will be a person attending a meetup. We’ll put people into cohorts based on the month of the first meetup they attended.

This can act as a proxy for when people become interested in a technology and could perhaps allow us to see how the behaviour of innovators, early adopters and the early majority differs, if at all.

The first thing we need to do is get the data showing the events that people RSVP’d ‘yes’ to. I’ve already got the data in Neo4j so we’ll write a query to extract it as a data frame:

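library(RNeo4j)
library(dplyr)   # provides %>%, group_by, summarise, count
library(ggplot2) # used for the charts
library(zoo)     # provides as.yearmon, used in monthNumber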
graph = startGraph("http://127.0.0.1:7474/db/data/")
 
query = "MATCH (g:Group {name: \"Neo4j - London User Group\"})-[:HOSTED_EVENT]->(e),
               (e)<-[:TO]-(rsvp {response: \"yes\"})<-[:RSVPD]-(person)
         RETURN rsvp.time, person.id"
 
timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01", tz = "GMT")
 
df = cypher(graph, query)
df$time = timestampToDate(df$rsvp.time)
df$date = format(as.Date(df$time), "%Y-%m")
> df %>% head()
##         rsvp.time person.id                time    date
## 612  1.404857e+12  23362191 2014-07-08 22:00:29 2014-07
## 1765 1.380049e+12 112623332 2013-09-24 18:58:00 2013-09
## 1248 1.390563e+12   9746061 2014-01-24 11:24:35 2014-01
## 1541 1.390920e+12   7881615 2014-01-28 14:40:35 2014-01
## 3056 1.420670e+12  12810159 2015-01-07 22:31:04 2015-01
## 607  1.406025e+12  14329387 2014-07-22 10:34:51 2014-07
## 1634 1.391445e+12  91330472 2014-02-03 16:33:58 2014-02
## 2137 1.371453e+12  68874702 2013-06-17 07:17:10 2013-06
## 430  1.407835e+12 150265192 2014-08-12 09:15:31 2014-08
## 2957 1.417190e+12 182752269 2014-11-28 15:45:18 2014-11
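For readers who don't use R, the timestamp handling above can be sketched in Python. This is a minimal sketch, assuming (as the R code does) that `rsvp.time` is a millisecond epoch timestamp; the function name `to_year_month` is made up for illustration.

```python
from datetime import datetime, timezone


def to_year_month(timestamp_ms):
    """Convert a millisecond epoch timestamp to a 'YYYY-MM' string in UTC."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m")


# 1404856829000 ms corresponds to 2014-07-08 22:00:29 UTC
print(to_year_month(1404856829000))
```

Fixing the time zone to UTC mirrors the `tz = "GMT"` argument in the R helper, so an RSVP made close to midnight at the end of a month is bucketed consistently.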

Next we need to find the first meetup that a person attended – this will determine the cohort that the person is assigned to:

firstMeetup = df %>% 
  group_by(person.id) %>% 
  summarise(firstEvent = min(time), count = n()) %>% 
  arrange(desc(count))
 
> firstMeetup
## Source: local data frame [10 x 3]
## 
##    person.id          firstEvent count
## 1   13526622 2013-01-24 20:25:19     2
## 2  119400912 2014-10-03 13:09:09     2
## 3  122524352 2014-08-14 14:09:44     1
## 4   37201052 2012-05-21 10:26:24     3
## 5  137112712 2014-07-31 09:32:12     1
## 6  152448642 2014-06-20 08:32:50    17
## 7   98563682 2014-11-05 17:27:57     1
## 8  146976492 2014-05-17 00:04:42     4
## 9   12318409 2014-11-03 05:25:26     2
## 10  41280492 2014-10-16 19:02:03     5
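The same group-and-summarise step might look like this in plain Python. It is only a sketch: the input is assumed to be a list of `(person_id, timestamp_ms)` pairs, and `first_meetups` is a hypothetical name rather than anything from the original post.

```python
def first_meetups(rsvps):
    """Return {person_id: (first_timestamp_ms, rsvp_count)}.

    rsvps is an iterable of (person_id, timestamp_ms) pairs,
    one pair per 'yes' RSVP.
    """
    summary = {}
    for person_id, timestamp_ms in rsvps:
        # Default to the current timestamp so min() works on first sight.
        first, count = summary.get(person_id, (timestamp_ms, 0))
        summary[person_id] = (min(first, timestamp_ms), count + 1)
    return summary
```

The earliest timestamp per person is what determines the cohort; the count is only there to mirror the `count = n()` column in the R output.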

Let’s assign each person to a cohort (month/year) and see how many people belong to each one:

firstMeetup$date = format(as.Date(firstMeetup$firstEvent), "%Y-%m")
byMonthYear = firstMeetup %>% count(date) %>% arrange(date)
 
ggplot(aes(x=date, y = n), data = byMonthYear) + 
  geom_bar(stat="identity", fill = "dark blue") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: bar chart of the number of people in each cohort, by month of first meetup]

Next we need to track a cohort over time to see whether people keep coming to events. I wrote the following function to work it out:

countsForCohort = function(df, firstMeetup, cohort) {
  members = (firstMeetup %>% filter(date == cohort))$person.id
 
  attendance = df %>% 
    filter(person.id %in% members) %>% 
    count(person.id, date) %>% 
    ungroup() %>%
    count(date)
 
  allCohorts = df %>% select(date) %>% unique
  cohortAttendance = merge(allCohorts, attendance, by = "date", all = TRUE)
 
  cohortAttendance[is.na(cohortAttendance) & cohortAttendance$date > cohort] = 0
  cohortAttendance %>% mutate(cohort = cohort, retention = n / length(members))  
}

On the first line we get the ids of all the people in the cohort so that we can filter the data frame to only include RSVPs by these people. The first call to ‘count’ makes sure that we only have one entry per person per month and the second call gives us a count of how many people attended an event in a given month.

Next we do the equivalent of a left join using the merge function to ensure we have a row representing each month even if no one from the cohort attended. This will lead to NA entries if there’s no matching row in the ‘attendance’ data frame – we’ll replace those with a 0 if the month falls after the cohort’s first month. If not we’ll leave them as NA.

Finally we calculate the retention rate for each month for that cohort, e.g. these are some of the rows for the ‘2011-06’ cohort:

> countsForCohort(df, firstMeetup, "2011-06") %>% sample_n(10)
      date n  cohort retention
16 2013-01 1 2011-06      0.25
5  2011-10 1 2011-06      0.25
30 2014-03 0 2011-06      0.00
29 2014-02 0 2011-06      0.00
40 2015-01 0 2011-06      0.00
31 2014-04 0 2011-06      0.00
8  2012-04 2 2011-06      0.50
39 2014-12 0 2011-06      0.00
2  2011-07 1 2011-06      0.25
19 2013-04 1 2011-06      0.25
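The logic of countsForCohort can also be sketched without dplyr. The Python version below is an illustration under stated assumptions, not a translation the author provided: RSVPs are `(person_id, 'YYYY-MM')` pairs, `first_month` maps each person to the month of their first RSVP, and `None` plays the role of R's NA.

```python
def counts_for_cohort(rsvps, first_month, cohort):
    """Return one row per month in the data for the given cohort.

    rsvps: list of (person_id, 'YYYY-MM') pairs, one per 'yes' RSVP.
    first_month: dict of person_id -> 'YYYY-MM' of that person's first RSVP.
    cohort: the 'YYYY-MM' month that defines the cohort.
    """
    members = {p for p, month in first_month.items() if month == cohort}

    # A set keeps one entry per person per month, like the first count().
    attended = {(p, month) for p, month in rsvps if p in members}

    # Number of distinct cohort members seen in each month.
    per_month = {}
    for _, month in attended:
        per_month[month] = per_month.get(month, 0) + 1

    rows = []
    for month in sorted({month for _, month in rsvps}):
        n = per_month.get(month)
        # Months after the cohort started with no attendees count as 0;
        # months before the cohort started stay as None (NA in R).
        if n is None and month > cohort:
            n = 0
        retention = None if n is None or not members else n / len(members)
        rows.append({"date": month, "n": n, "cohort": cohort,
                     "retention": retention})
    return rows
```

Comparing 'YYYY-MM' strings with `>` works because the format sorts lexicographically in date order, which is also what the R version relies on.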

We could then choose to plot that cohort:

ggplot(aes(x=date, y = retention, colour = cohort), data = countsForCohort(df, firstMeetup, "2011-06")) + 
  geom_line(aes(group = cohort)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: line chart of retention over time for the 2011-06 cohort]

From this chart we can see that none of the people who first attended a Neo4j meetup in June 2011 have attended any events for the last two years.

Next we want to be able to plot multiple cohorts on the same chart which we can easily do by constructing one big data frame and passing it to ggplot:

cohorts = collect(df %>% select(date) %>% unique())[,1]
 
cohortAttendance = data.frame()
for(cohort in cohorts) {
  cohortAttendance = rbind(cohortAttendance,countsForCohort(df, firstMeetup, cohort))      
}
 
ggplot(aes(x=date, y = retention, colour = cohort), data = cohortAttendance) + 
  geom_line(aes(group = cohort)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: line chart of retention over time for all cohorts]

This all looks a bit of a mess and at the moment we can’t easily compare cohorts as they start at different places on the x axis. We can fix that by adding a ‘monthNumber’ column to the data frame which we calculate with the following function:

monthNumber = function(cohort, date) {
  cohortAsDate = as.yearmon(cohort)
  dateAsDate = as.yearmon(date)
 
  if(cohortAsDate > dateAsDate) {
    "NA"
  } else {
    paste(round((dateAsDate - cohortAsDate) * 12), sep="")
  }
}
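The month offset can also be computed with integer arithmetic instead of yearmon subtraction. A small Python sketch, assuming both arguments are 'YYYY-MM' strings (`month_number` here is an illustrative name, and `None` stands in for the "NA" string):

```python
def month_number(cohort, date):
    """Whole months from the cohort month to the given month.

    Both arguments are 'YYYY-MM' strings. Returns None when the
    month is before the cohort started.
    """
    cohort_year, cohort_month = map(int, cohort.split("-"))
    date_year, date_month = map(int, date.split("-"))
    difference = (date_year - cohort_year) * 12 + (date_month - cohort_month)
    return difference if difference >= 0 else None


print(month_number("2011-06", "2013-01"))  # 19
```

Working in whole months avoids the rounding step that the yearmon version needs, since yearmon differences are fractions of a year.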

Now let’s create a new data frame with the month field added:

cohortAttendanceWithMonthNumber = cohortAttendance %>% 
  group_by(row_number()) %>% 
  mutate(monthNumber = monthNumber(cohort, date)) %>%
  filter(monthNumber != "NA") %>%
  filter(monthNumber != "0") %>% 
  mutate(monthNumber = as.numeric(monthNumber)) %>% 
  arrange(monthNumber)

We’re also filtering out the ‘NA’ rows, which represent months from before the cohort started, and month 0, the month in which the cohort itself started. We don’t want to plot those.

Finally let’s plot a chart containing all cohorts normalised by month number:

ggplot(aes(x=monthNumber, y = retention, colour = cohort), data = cohortAttendanceWithMonthNumber) + 
  geom_line(aes(group = cohort)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1), panel.background = element_blank())

[Figure: line chart of retention by month number for all cohorts]

It’s still a bit of a mess but what stands out is that when the number of people in a cohort is small the fluctuation in the retention value can be quite pronounced.

The next step is to make the cohorts a bit more coarse grained to see if it reveals some insights. I think I’ll start out with a cohort covering a 3 month period and see how that works out.
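One way the coarser cohorts could be derived is by mapping each month to a calendar quarter. The post doesn't say how the 3 month periods will be chosen, so calendar quarters and the `quarter_cohort` name are assumptions in this Python sketch.

```python
def quarter_cohort(year_month):
    """Map a 'YYYY-MM' string to a calendar-quarter label such as '2014-Q3'."""
    year, month = map(int, year_month.split("-"))
    quarter = (month - 1) // 3 + 1
    return f"{year}-Q{quarter}"


print(quarter_cohort("2014-07"))  # 2014-Q3
```

Larger buckets put more people in each cohort, which should dampen the swings in retention seen when a cohort only has a handful of members.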

Categories: Programming

Introducing ASP.NET 5

ScottGu's Blog - Scott Guthrie - Mon, 02/23/2015 - 21:41

The first preview release of ASP.NET 1.0 came out almost 15 years ago.  Since then millions of developers have used it to build and run great web applications, and over the years we have added and evolved many, many capabilities to it. 

I'm excited today to post about a new release of ASP.NET that we are working on that we are calling ASP.NET 5.  This new release is one of the most significant architectural updates we've done to ASP.NET.  As part of this release we are making ASP.NET leaner, more modular, cross-platform, and cloud optimized.  ASP.NET 5 is now available as a preview release, and you can start using it today by downloading the latest CTP of Visual Studio 2015 which we just made available.

ASP.NET 5 is an open source web framework for building modern web applications that can be developed and run on Windows, Linux and the Mac. It includes the MVC 6 framework, which now combines the features of MVC and Web API into a single web programming framework.  ASP.NET 5 will also be the basis for SignalR 3 - enabling you to add real time functionality to cloud connected applications. ASP.NET 5 is built on the .NET Core runtime, but it can also be run on the full .NET Framework for maximum compatibility.

With ASP.NET 5 we are making a number of architectural changes that make the core web framework much leaner (it no longer requires System.Web.dll) and more modular (almost all features are now implemented as NuGet modules - allowing you to optimize your app to have just what you need).  With ASP.NET 5 you gain the following foundational improvements:

  • Build and run cross-platform ASP.NET apps on Windows, Mac and Linux
  • Built on .NET Core, which supports true side-by-side app versioning
  • New tooling that simplifies modern Web development
  • Single aligned web stack for Web UI and Web APIs
  • Cloud-ready environment-based configuration
  • Integrated support for creating and using NuGet packages
  • Built-in support for dependency injection
  • Ability to host on IIS or self-host in your own process

The end result is an ASP.NET that you'll feel very familiar with, and which is also now even more tuned for modern web development.

Flexible, Cross-Platform Runtime

ASP.NET 5 works with two runtime environments to give you greater flexibility when hosting your app. The two runtime choices are:

.NET Core – a new, modular, cross-platform runtime with a smaller footprint.  When you target the .NET Core, you’ll be able to take advantage of some exciting new benefits:

1) You can deploy the .NET Core runtime with your app which means your app will run with this deployed version of the runtime rather than the version of the runtime that is installed on the host operating system. Your version of the runtime runs side-by-side with versions for other apps. You can update that runtime, if needed, without affecting other apps, or you can continue running on the same version even though other apps on the system have been updated.  This makes app deployment and framework updates much easier and less impactful to other apps running on a system.

2) Your app is only dependent on features it really needs. Therefore, you are never prompted to update/service the runtime for features that are not relevant to your app. You will spend less time testing and deploying updates that are perhaps unrelated to the functionality of your app.

3) Your app can now be run cross-platform. We will provide a cross-platform version of .NET Core for Windows, Linux and Mac OS X systems.  Regardless of which operating system you use for development or which operating system you target for deployment, you will be able to use .NET. The cross-platform version of the runtime has not been released yet, but we are working on it on GitHub and plan to have an official preview of it out soon.

.NET Framework – The API for .NET Core is currently more limited than the full .NET Framework, so you may need to modify existing apps to target .NET Core. If you don't want to have to update your app you can instead run ASP.NET 5 applications on the full .NET Framework (version 4.5.2 and above).  When doing this you have access to the complete set of .NET Framework APIs. Your existing applications and libraries will work without modification on this runtime.

MVC 6 - a unified programming model

MVC, Web API and Web Pages provide complementary functionality and are frequently used together when developing a solution. However, in past ASP.NET releases, these programming frameworks were implemented separately and therefore contained some duplication and inconsistencies. With MVC 6, we are merging those models into a single programming model. Now, you can create a single web application that handles the Web UI and data services without needing to reconcile differences in these programming frameworks. You will also be able to seamlessly transition a simple site first developed with Web Pages into a more robust MVC application.

You can now return Razor views and content-negotiated data from the same controller and using the same MVC filter pipeline.

In addition to unifying the existing frameworks we are also adding new features to make server-side Web development easier, like the new tag helpers feature. Tag helpers let you use HTML helpers in your views by simply extending the semantics of tags in your markup.

So instead of writing this:

@Html.ValidationSummary(true, "", new { @class = "text-danger" })
<div class="form-group">
    @Html.LabelFor(m => m.UserName, new { @class = "col-md-2 control-label" })
    <div class="col-md-10">
        @Html.TextBoxFor(m => m.UserName, new { @class = "form-control" })
        @Html.ValidationMessageFor(m => m.UserName, "", new { @class = "text-danger" })
    </div>
</div>

You can instead write this:

<div asp-validation-summary="ModelOnly" class="text-danger"></div>
<div class="form-group">
    <label asp-for="UserName" class="col-md-2 control-label"></label>
    <div class="col-md-10">
        <input asp-for="UserName" class="form-control" />
        <span asp-validation-for="UserName" class="text-danger"></span>
    </div>
</div>

Tag helpers make authoring your views more natural and readable. They also simplify customizing the output of HTML helpers with additional markup while letting you take full advantage of the HTML editor.

For more examples of creating MVC 6 apps, see these tutorials.

Modern web development

This week's ASP.NET 5 preview also includes a number of other great development features that enable you to build even better web applications:

Dynamic Development

In Visual Studio 2015, we take advantage of dynamic compilation to provide a streamlined developer experience. You no longer have to compile your application every time you want to see a change. Instead, just (1) edit the code, (2) save your changes, (3) refresh the browser, and then (4) see your change automatically appear.

[Image: the edit, save and refresh workflow in Visual Studio 2015]

You enjoy a development experience that is similar to working with an interpreted language without sacrificing the benefits of a compiled language.

You can also optionally use other code editors to work on your ASP.NET 5 projects. Every function within the Visual Studio user interface is matched with cross-platform command-line operations.

Integration with Popular Web Development Tools (Bower, Grunt and Gulp)

Another exciting feature in Visual Studio 2015 is built-in support for Bower, Grunt, and Gulp - popular open source tools that we think should be in every Web developer’s toolkit.

  • Bower is a package manager for client-side libraries, including both JavaScript and CSS libraries.
  • Grunt and Gulp are task runners, which help you to automate your web development workflow. You can use Grunt or Gulp for tasks like compiling LESS, CoffeeScript, or TypeScript files, running JSLint, or minifying JavaScript files.

Bower: To add a JavaScript library to your ASP.NET project add it directly in the bower.json config file:

[Image: bower.json in Visual Studio with IntelliSense listing available packages]

Notice that Visual Studio gives you IntelliSense with a list of available packages. The next time you open the solution, Visual Studio automatically restores any missing packages, so you don’t need to check the packages into source control.

For server-side packages, you’ll still use NuGet Package Manager.

Grunt: In modern web development, you can find yourself managing a lot of tasks, just to build your app: Compiling LESS, TypeScript, or CoffeeScript files, linting, JavaScript minification, running JS unit tests, and so on. Every team will have its own set of requirements, depending on the particular tools that you use. Task runners make it easier to manage and coordinate these tasks. Visual Studio 2015 will support two popular task runners, Grunt and Gulp.

For example, let’s say you want to use Grunt to compile LESS files. Just go into package.json and add the grunt-contrib-less package, which is a third-party Grunt plugin.

[Image: package.json with the grunt-contrib-less package added]

Use the new Task Runner Explorer in Visual Studio 2015 to bind the task to a build step (pre-build, post-build, clean, or when the solution is opened).

[Image: Task Runner Explorer in Visual Studio 2015]

This makes it incredibly easy to automate common tasks within your projects - and have them work both for you and across a team-wide project.

Simplified dependency management

In ASP.NET 5 you manage dependencies by adding NuGet packages. You can use the NuGet Package Manager or simply edit the JSON file (project.json) that lists the NuGet packages and versions used in your project. The project.json file is easy to work with and you can edit it with any text editor, which enables you to update dependencies even when the app has been deployed to the cloud.

The project.json file looks like:

[Image: an example project.json file]

In Visual Studio 2015, IntelliSense assists you with finding the available NuGet packages that you can add as dependencies.

[Image: IntelliSense listing available NuGet packages in project.json]

And IntelliSense can even help you with the available versions:

[Image: IntelliSense listing available package versions in project.json]

Cloud-ready configuration

In ASP.NET 5, we eliminated the need to use a Web.config file for configuration values. We wanted to make it easier for you to deploy your app to the cloud and have the app automatically read the correct configuration values for that environment. The new system enables you to request named values from a variety of sources (such as JSON, XML, or environment variables). You can decide which formats work best in your situation.

In the Startup.cs file, you can now add or remove the sources for configuration values.

[Image: Startup.cs code that adds the configuration sources]

The above code snippet shows a project that is set up to retrieve configuration values from a JSON file and environment variables. You can change this code if you need to specify other sources. In the specified config.json file, you could provide the values.

[Image: an example config.json file]

In your host environment, such as Azure, you can set the environment variables and those values are automatically used instead of local configuration values after the application is deployed. You can deploy your application without worrying about publishing test values.
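The layering idea, where values from a later source override those from an earlier one, can be sketched in a few lines of Python. This is a language-neutral illustration and not the ASP.NET 5 configuration API; `load_config` and the key names are invented for the example, and only flat key/value settings are handled.

```python
import json


def load_config(json_text, environment):
    """Merge configuration sources; later sources win.

    json_text: contents of a local JSON settings file (flat keys only).
    environment: mapping of environment variables, e.g. os.environ.
    """
    config = dict(json.loads(json_text))
    # Environment values replace local file values with the same key.
    config.update(environment)
    return config


local_file = '{"ConnectionString": "local-db", "Debug": "true"}'
host_environment = {"ConnectionString": "production-db"}
print(load_config(local_file, host_environment))
```

Because the environment is applied last, a deployed app picks up the host's connection string while keys that the host doesn't set fall back to the file.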

Dependency injection (DI)

Dependency Injection (DI) is supported in existing ASP.NET frameworks, like MVC, Web API and SignalR, but not in a consistent and holistic way. ASP.NET 5 provides a built-in DI abstraction that is available in a consistent way throughout the entire web stack. You can access services at startup, in middleware, in filters, in controllers, in model binding and virtually any part of the pipeline where you want to use your services. ASP.NET 5 includes a minimalistic DI container to bootstrap the system, but you can easily replace the default container with your container of choice (Autofac, Ninject, etc). Services can be singleton, scoped to the request or transient.

For example, to see how to use constructor injection with ASP.NET MVC 6, create a new ASP.NET 5 Starter Web project and add a simple time service:

using System;

namespace WebApplication1
{
    public class TimeService
    {
        public TimeService()
        {
            Ticks = DateTime.Now.Ticks.ToString();
        }

        public String Ticks { get; set; }
    }
}

The simple service class sets the current Ticks when the constructor is called.

Next, register the time service as a transient service in the ConfigureServices method of the Startup class:

public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();
    services.AddTransient<TimeService>();
}

Then, update the HomeController to use constructor injection and to write the Ticks when the TimeService object was created.

public class HomeController : Controller
{
    public TimeService TimeService { get; set; }

    public HomeController(TimeService timeService)
    {
        TimeService = timeService;
    }

    public IActionResult About()
    {
        ViewBag.Message = TimeService.Ticks + " From Controller";
        System.Threading.Thread.Sleep(1);
        return View();
    }

    // Code removed for brevity
}

Notice the controller doesn't create a TimeService. It's injected when the controller is instantiated.

In MVC 6 you can use the [Activate] attribute to inject services via properties. You can use [Activate] not just on controllers but also on filters, and view components. This means you can simplify your controller code like this:

public class HomeController : Controller
{
    [Activate]
    public TimeService TimeService { get; set; }

    // Code removed for brevity
}

MVC 6 also supports DI into Razor views via the @inject keyword. In the code below, I’ve injected the time service into the about view directly and defined a TimeSvc property by which it can be accessed:

@using WebApplication1
@inject TimeService TimeSvc

<h3>@ViewBag.Message</h3>

<h3>
    @TimeSvc.Ticks From Razor
</h3>

When you run the app, you can see different ticks values from the controller and the view.

[Image: the About page showing the ticks values from the controller and from the Razor view]
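The two ticks values differ because the service is registered as transient, so each injection gets a fresh instance. The toy container below illustrates the difference between transient and singleton lifetimes in Python. It is a sketch of the concept only: `Container`, `add_transient`, `add_singleton` and `resolve` are hypothetical names and not the ASP.NET 5 API, and a counter stands in for `DateTime.Now.Ticks`.

```python
import itertools


class Container:
    """Toy DI container supporting transient and singleton lifetimes."""

    def __init__(self):
        self._registrations = {}
        self._singletons = {}

    def add_transient(self, service_type):
        self._registrations[service_type] = "transient"

    def add_singleton(self, service_type):
        self._registrations[service_type] = "singleton"

    def resolve(self, service_type):
        lifetime = self._registrations[service_type]
        if lifetime == "transient":
            # A new instance for every resolution.
            return service_type()
        # Singleton: create once, then reuse the same instance.
        if service_type not in self._singletons:
            self._singletons[service_type] = service_type()
        return self._singletons[service_type]


_creation_counter = itertools.count(1)


class TimeService:
    def __init__(self):
        # Stand-in for a clock reading: advances on every construction.
        self.ticks = next(_creation_counter)
```

Resolving `TimeService` twice from a container where it is transient yields two objects with different `ticks`; registering it as a singleton yields the same object both times, which is why the lifetime choice matters for services that hold state.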

Fast HTTP performance

ASP.NET 5 introduces a new HTTP request pipeline that is modular so you can add only the components that you need. The pipeline is also no longer dependent on System.Web. By reducing the overhead in the pipeline, your app can experience better throughput and a more tuned HTTP stack. The new pipeline is based on many of the learnings from the Katana project and also supports OWIN.

To customize which components are used in the pipeline, use the Configure method in your Startup class. The Configure method is used to specify which middleware you want to “use” in your request pipeline. ASP.NET 5 already includes ported versions of many of the middleware from the Katana project, like middleware for static files, authentication and diagnostics. The following code shows some of the features you can add or remove to the pipeline for your project.

public void Configure(IApplicationBuilder app)
{
    // Add static files to the request pipeline.
    app.UseStaticFiles();

    // Add cookie-based authentication to the request pipeline.
    app.UseIdentity();

    // Add MVC and routing to the request pipeline.
    app.UseMvc(routes =>
    {
        routes.MapRoute(
            name: "default",
            template: "{controller}/{action}/{id?}",
            defaults: new { controller = "Home", action = "Index" });
    });
}
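The idea of a modular pipeline, where each middleware either handles a request or passes it on, can be sketched in Python as a chain of functions. This is a conceptual illustration, not the ASP.NET 5 or OWIN API; every name here (`build_pipeline`, `static_files`, `logging_middleware`, `not_found`) is made up for the example.

```python
def build_pipeline(middlewares, terminal):
    """Compose middleware so the first one registered runs first."""
    handler = terminal
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


def static_files(next_handler):
    files = {"/site.css": "body { margin: 0 }"}

    def handle(path):
        if path in files:
            return files[path]  # Short-circuit: later middleware never runs.
        return next_handler(path)

    return handle


def logging_middleware(log):
    def middleware(next_handler):
        def handle(path):
            log.append(path)  # Record the request, then pass it along.
            return next_handler(path)

        return handle

    return middleware


def not_found(path):
    return "404 Not Found"


log = []
app = build_pipeline([logging_middleware(log), static_files], not_found)
print(app("/site.css"))
print(app("/missing"))
```

Leaving a middleware out of the list removes its cost from every request, which is the point of only adding the components you need.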

You can also write your own middleware components and add them to the pipeline.

Open source

We are developing ASP.NET 5 as an open source project on GitHub. You can view the code, see when changes were made, download the code, and submit changes. We believe making ASP.NET 5 open source will make it easier for you to understand the code, understand our intended direction, and contribute to the project.

[Image: the ASP.NET project on GitHub]

Docs and tutorials

To get started with ASP.NET 5 you can find docs and tutorials on the ASP.NET site at http://asp.net/vnext. The following tutorials will guide you through the steps of creating your first ASP.NET 5 project.

Also read this article for even more ASP.NET and Web Development improvements coming this week.

Hope this helps,

Scott

Categories: Architecture, Programming

I Think You'll Find It's a Bit More Complicated Than That

Herding Cats - Glen Alleman - Mon, 02/23/2015 - 19:35

When we encounter simple answers to complex problems, we need to not only be skeptical, we need to think twice about the credibility of the person posing the solution. A recent example is:

The cost of software is not directly proportional to the value it produces. Knowing cost is potentially useless information.

The first sentence is likely the case. The value of any one feature or capability is not necessarily related to its cost, since cost in software development is nearly 100% correlated with the cost of the labor needed to produce the feature.

But certainly the cost of developing all the capabilities and the cost of individual capabilities when their interactions are considered must be related to their value or the principles of Microeconomics of Software Development would no longer be in place.

Microeconomics is a branch of economics that studies the behavior of individuals and small impacting organizations in making decisions on the allocation of limited resources. Those limited resources include (but are not limited to) Time and Money.

So without knowing the cost or time it takes to produce an outcome, the simple decision-making process of spending other people's money based on the Return on that Investment gets a divide-by-zero error:

ROI = (Value - Cost) / Cost
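As a worked illustration of the formula, here is a minimal Python sketch. The figures are invented purely for the example and the function name `roi` is an assumption.

```python
def roi(value, cost):
    """Return on investment: (Value - Cost) / Cost."""
    return (value - cost) / cost


# Hypothetical figures: a capability worth 150,000 that costs 100,000 to build.
print(roi(150000, 100000))  # 0.5, i.e. a 50% return
```

Calling `roi` with a cost of 0 raises `ZeroDivisionError`, and there is no way to call it at all without supplying some cost, which is the point being made: the calculation cannot be done until the cost is known or estimated.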

Since all elements of a project are driven by statistical processes, the outcomes are always probabilistic. The delivered capabilities are what the customer bought. Cost and Schedule are needed to produce those capabilities. The success of the project in providing the needed capabilities depends on knowing the Key Performance Parameters, the Measures of Effectiveness, the Measures of Performance, and the Technical Performance Measures of those capabilities and the technical and operational requirements that implement them.

The cost and schedule to fulfill all these probabilistic outcomes is itself probabilistic. It is literally impossible to determine these outcomes in a deterministic manner when each is a statistical process without estimating. The Cost and Schedule elements are also probabilistic, further requiring estimates.
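One common way to produce such a probabilistic estimate is a Monte Carlo simulation over per-task ranges. The Python sketch below is an illustration under assumptions and not a method taken from the post: each task cost is modelled with a triangular distribution, and the task figures, `simulate_total_cost` and `percentile` are all hypothetical.

```python
import random


def simulate_total_cost(tasks, runs=10000, seed=0):
    """Return sorted simulated total costs.

    tasks: list of (low, most_likely, high) cost estimates per task.
    """
    rng = random.Random(seed)
    return sorted(
        sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
        for _ in range(runs)
    )


def percentile(sorted_values, fraction):
    """Value below which roughly the given fraction of results fall."""
    index = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


# Hypothetical three-point estimates for two tasks, in arbitrary cost units.
totals = simulate_total_cost([(10, 15, 30), (5, 8, 20)])
print(percentile(totals, 0.5), percentile(totals, 0.8))
```

The output is a range with confidence levels rather than a single number, for example the cost that 80% of simulated outcomes stay under.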

The notion that you can determine the Value of something without knowing its Cost is actually nonsense. Anyone suggesting that is the case has little understanding of business, microeconomics of software development or how the world of business treats expenditures of other people's money.

Here's some background to help in that understanding:

And for any suggestion that cost is not important in determining value please read

Between this last book and the books above, and all the papers, articles, and training provided about how to manage other people's money when producing value from software systems, you'll hopefully come to realize that the notions that we don't need to know the cost, can't know the cost, are poor at making estimates, and should simply start coding and see what comes out are not only seriously misinformed, but misinformed with intentional ignorance.

If your project is not using other people's money, if your project has low value at risk, if your project is of low importance to those paying, then maybe, just maybe they don't really care how much you spend, when you'll be done, or what will result. But that doesn't sound very fulfilling where I live.

Categories: Project Management

HappyPancake: a Retrospective on Building a Simple and Scalable Foundation

This is a guest repost by Rinat Abdullin, who worked on HappyPancake, the largest free dating site in Sweden. Initially written in ASP.NET and MS SQL Database server, it eventually became overly complex and expensive to scale. This is the last post in a nearly two year long series of engaging articles on the evolution of the project. For the complete list please see the end of this article.

Our project at HappyPancake completed this week. We delivered a simple and scalable foundation for the next version of the largest free dating web site in Sweden (with presence in Norway and Finland).

Journey

Below is the short map of that journey. It lists technologies and approaches that we evaluated for this project. Yellow regions highlight items which made their way into the final design.

Project Deliverables
Categories: Architecture

It’s Not The Critic Who Counts

Making the Complex Simple - John Sonmez - Mon, 02/23/2015 - 17:00

“It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and ... Read More

The post It’s Not The Critic Who Counts appeared first on Simple Programmer.

Categories: Programming

SPaMCAST 330 – Anthony Mersino, Agile Project Management

http://www.spamcast.net

This week’s Software Process and Measurement Cast features our interview with Anthony Mersino, author of Emotional Intelligence for Project Managers and the newly published Agile Project Management.  Anthony and I talked about Agile, coaching and organizational change.  It is a wide ranging interview that will help any leader raise the bar!   We also talked about his new venture: Vitality Chicago.

We are having a contest! Anthony has offered a copy of his great new book to a randomly selected SPaMCAST listener, ANYWHERE IN THE WORLD.  Enter between February 22nd and March 7th.  The winner will be announced on March 8th.  If you want a copy of Agile Project Management you have two options: send your name and email address to spamcastinfor@gmail.com (I will act as the broker and notify the winner at which point we can deal with other types of addresses), OR you can buy a copy.  Remember buying a copy through the Software Process and Measurement Cast helps support the podcast.

Dead Tree Version or Kindle Version

Anthony’s bio:
Anthony C. Mersino, PMP, PMI-ACP, CSP is an Agile Transformation Coach and IT Program Manager with more than 28 years of experience.  He has delivered large-scale business solutions to clients that include Abbot Labs, IBM, Unisys, NORC, and Wolters Kluwer, and provided Agile Coaching for The Carlyle Group, Northern Trust, Bank of America, and Highland Solutions.

Anthony is the author of Agile Project Management, and Emotional Intelligence for Project Managers.  He is also the founder of Vitality Chicago, an Agile transformation consulting firm focused on helping teams THRIVE and organizations TRANSFORM.

Contact information:

Email:
Anthony@ProjectAdvisorsGroup.com
AMERSINO@VITALITYCHICAGO.COM

Websites:
http://projectadvisorsgroup.com/about.html

http://www.vitalitychicago.com/

Call to action!

Can you tell a friend about the podcast?  Even better, show them how you listen to the Software Process and Measurement Cast and subscribe them!  Send me the name of the person you subscribed and I will give both you and the horde you have converted to listeners a call out on the show.

Re-Read Saturday News

The Re-Read Saturday focus on Eliyahu M. Goldratt and Jeff Cox’s The Goal: A Process of Ongoing Improvement began on February 21st. The Goal has been hugely influential because it introduced the Theory of Constraints, which is central to lean thinking. The book is written as a business novel. Visit the Software Process and Measurement Blog and catch up on the re-read.

Note: If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast.

Dead Tree Version or Kindle Version 

Upcoming Events

CMMI Institute Conference EMEA 2016
March 26-27, London, UK
I will be presenting. My presentation is titled “Agile Risk Management.”
http://cmmi.unicom.co.uk/

 

Next SPaMCast

The next Software Process and Measurement Cast will be another magazine-style episode.  It will include columns from Gene Hughson, discussing micro-services, and Jo Ann Sweeney, Explaining Change, along with our essay on Agile Coaching.  Coaches help teams and projects deliver the most value; however, organizations often eschew coaches or conflate management and coaching.  Both actions rob teams and organizations of energy and value. We discuss why next week.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques was co-authored by Murali Chematuri and me and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.


Categories: Process Management

SPaMCAST 330 – Anthony Mersino, Agile Project Management

Software Process and Measurement Cast - Sun, 02/22/2015 - 23:00

This week’s Software Process and Measurement Cast features our interview with Anthony Mersino, author of Emotional Intelligence for Project Managers and the newly published Agile Project Management.  Anthony and I talked about Agile, coaching and organizational change.  It is a wide-ranging interview that will help any leader raise the bar!  We also talked about his new venture: Vitality Chicago.

Categories: Process Management

R/dplyr: Extracting data frame column value for filtering with %in%

Mark Needham - Sun, 02/22/2015 - 09:58

I’ve been playing around with dplyr over the weekend and wanted to extract the values from a data frame column to use in a later filtering step.

I had a data frame:

library(dplyr)
df = data.frame(userId = c(1,2,3,4,5), score = c(2,3,4,5,5))

And wanted to extract the userIds of those people who have a score greater than 3. I started with:

highScoringPeople = df %>% filter(score > 3) %>% select(userId)
> highScoringPeople
  userId
1      3
2      4
3      5

And then filtered the data frame expecting to get back those 3 people:

> df %>% filter(userId %in% highScoringPeople)
[1] userId score 
<0 rows> (or 0-length row.names)

No rows! I created a vector with the numbers 3-5 to make sure that worked:

> df %>% filter(userId %in% c(3,4,5))
  userId score
1      3     4
2      4     5
3      5     5

That works as expected so highScoringPeople obviously isn’t in the right format to facilitate an ‘in lookup’. Let’s explore:

> str(c(3,4,5))
 num [1:3] 3 4 5
 
> str(highScoringPeople)
'data.frame':	3 obs. of  1 variable:
 $ userId: num  3 4 5

Now it’s even more obvious why it doesn’t work – highScoringPeople is still a data frame when we need it to be a vector/list.

One way to fix this is to extract the userIds using the $ syntax instead of the select function:

highScoringPeople = (df %>% filter(score > 3))$userId
 
> str(highScoringPeople)
 num [1:3] 3 4 5
 
> df %>% filter(userId %in% highScoringPeople)
  userId score
1      3     4
2      4     5
3      5     5

Or if we want to do the column selection using dplyr we can extract the values for the column like this:

highScoringPeople = (df %>% filter(score > 3) %>% select(userId))[[1]]
 
> str(highScoringPeople)
 num [1:3] 3 4 5

Not so difficult after all.

Categories: Programming

The Great Motivational Quotes Revamped

When you need to make things happen, motivational quotes can help you dig deep and get going.

I put together a very comprehensive collection of the world’s best motivational quotes a while back.

It was time for a refresh.  Here it is:

Motivational Quotes – The Great Motivational Quotes Collection

Imagine motivational wisdom of the ages and modern sages right at your fingertips, all on one page.  I included motivational quotes from Bruce Lee, Tony Robbins, Winston Churchill, Ralph Waldo Emerson, Jim Rohn, and more.

See if you can find at least three motivational quotes that you can take with you on the road of life, to help you deal with setbacks and challenges, and to unleash your inner-awesome.

Getting Started with Motivational Quotes

I’ll start you off.   If you don’t already have these in your personal motivational quotes collection, here are a few that I draw from often:

“If you’re going through hell, keep going.” — Winston Churchill

“When it’s time to die, let us not discover that we have never lived.” – Henry David Thoreau

“Don’t ask yourself what the world needs, ask yourself what makes you come alive. And then go do that. Because what the world needs is people who have come alive.”— Howard Thurman

How’s that for a starter set?

Build Better Motivational Thought Habits

You can train your brain with motivational mantras.     Our thoughts are habits.   If you want to build better thought habits, then feed on some of the best motivational quotes of all time.

“An ounce of action is worth a ton of theory.” – Ralph Waldo Emerson

“Positive thinking won’t let you do anything but it will let you do everything better than negative thinking will.” – Zig Ziglar

“The only person you are destined to become is the person you decide to be.” – Ralph Waldo Emerson

If you train yourself well, you won’t entirely eliminate motivational setbacks, but you’ll be able to defeat procrastination, and you’ll be able to bounce back faster when you find yourself in a slump.   Motivation is a skill you can build, and it will serve you well, in work and life.

You Create Your Future

The most important motivational concept to hold on to is the idea that you create your future.  Or, as Wayne Dyer puts it:

“Go for it now. The future is promised to no one.”

So go for the bold, and get your game face on.

If you need some help kick-starting your fire, stroll through the motivational quotes a few times until something really sinks in or clicks for you.  Life’s better with the right words, and there are just the right words already out there, just waiting to be found.

Enjoy and take your time sifting through the Motivational Quotes – The Great Motivational Quotes Collection.

Also, if you have a favorite motivational quote that I don’t have listed, let me know.

You Might Also Like

The Great Inspirational Quotes Revamped

The Great Happiness Quotes Collection Revamped

The Great Leadership Quotes Collection Revamped

The Great Love Quotes Collection Revamped

The Great Personal Development Quotes Collection Revamped

The Great Productivity Quotes Collection Revamped

Categories: Architecture, Programming

Agile Software Development in the DOD

Herding Cats - Glen Alleman - Sun, 02/22/2015 - 00:59

I spoke at a workshop this week at The Nexus of Agile Software Development and Earned Value Management, OUSD(AT&L)/PARCA, February 19 – 20, 2015 Institute for Defense Analysis, Alexandria, VA.

This meeting was attended by government and industry representatives to share ideas on how to integrate Agile Software Development on Earned Value Management programs. EVM programs start with awards greater than $20M, so these are non-trivial efforts. The presentations will be available soon, and I'll update this post when they're posted on the PARCA site.

In the meantime, there is existing guidance for starting this process. But first, here's a collection from SEI on the topic.

  • First Principle - Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  • Second Principle - Welcome changing requirements, even late in development.
  • Third Principle - Delivering working software frequently.
  • Fourth Principle - Business people and developers must work together daily throughout the project.
  • Fifth Principle - Build projects around motivated individuals. Give them the environment and support they need and trust them to get the job done.
  • Sixth Principle - The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  • Seventh Principle - Working software is the primary measure of progress.
  • Eighth Principle - Sustainable development pace.

As well, here's some background.

The notion of agile in large complex programs has come to the forefront of defense acquisition with Better Buying Power and Achieving Better Buying Power for Software Acquisitions

Related articles
  • Real Life Sources of Empirical Data for Project Estimates
  • We Suck At Estimating
  • Software Engineering is a Verb
  • Building A Credible Measurement Baseline
Categories: Project Management

Re-Read Saturday: The Goal: A Process of Ongoing Improvement, Part 1

Today we begin the re-read of The Goal. If you don’t have a copy of the book, buy one.  If you use the link below it will support the Software Process and Measurement blog and podcast. Dead Tree Version or Kindle Version 

Eliyahu M. Goldratt and Jeff Cox wrote The Goal: A Process of Ongoing Improvement (published in 1984).  The Goal was framed as a business novel. In general, a novel presents a story through a set of actions and events to facilitate a plot. A business novel uses the plot, interactions between characters and events to develop and illustrate concepts or processes that are important to the author.  Bottom line, a business novel uses metaphors rather than drawn-out scholarly exposition to make its point.  The Goal uses the story of Alex Rogo, plant manager, to illustrate the theory of constraints and how the wrong measurement focus can harm an organization.

I am using the 30th anniversary edition of The Goal for this re-read.  This version of the book includes two forewords and 40 chapters.

The two forewords to The Goal expose Goldratt’s philosophical approach.  For example, in the forewords, science is defined as a method to develop or expose a “minimum set of assumptions that can be explained through straightforward derivation, the existence of many phenomena of nature.” Science provides an approach to develop an understanding of why something is occurring and then to be able to test against that understanding. We deduce based on observation and measurement to develop a hypothesis and then continue to compare what we see against the hypothesis. Good science is the foundation of effective process improvement. Good process improvement is simply a requirement for survival in today’s dynamic business environment.

The characters introduced in chapters 1 – 4:

  • Alex Rogo – the protagonist, manufacturing plant manager
  • Bill Peach – command and control division vice-president
  • Fran – Alex’s secretary
  • Bob Donovan – Production Manager
  • Julie Rogo – Alex Rogo’s wife

Chapter 1:

In the first chapter we are immediately introduced to Alex Rogo, plant manager, and his boss, Bill Peach.  Our protagonist, Alex Rogo, is immediately thrown into a crisis revolving around a late shipment that his boss has arrived, unannounced, at the plant to expedite. Bill Peach begins by interfering with plant operations, which leads to a critical mechanic quitting and to a broken and potentially sabotaged machine.  Remember back to Kotter’s eight stage model for significant change in his seminal book, Leading Change (the last book featured in our Re-Read Saturday feature).  The first step in the model was to establish a sense of urgency. Goldratt and Cox use chapter one to establish the proverbial burning platform.  The plant is losing money, orders are shipping late and Peach has delivered an ultimatum that unless the plant is turned around, it will be closed.

Chapter 2:

The immediate crisis is surmounted: the order is completed and shipped. The plant focused on getting a single order done and shipped. Bob Donovan noted that everyone pulled together, behavior that the Agile community would call “swarming.”  A thread running through the chapter is that the plant has aggressively pursued cost savings and increased efficiency. This thread foreshadows a recognition that measuring the cost savings or efficiency improvement in any individual step might not provide the results the organization expects. Rogo reflects at one point that he has the best people and the best technology, therefore he must be a poor manager.

Chapter 3:

This chapter develops the corporate culture by exposing the fixation on efficiency and cost control as the basis for measurement and comparison.  The whole division is on the chopping block and an endemic atmosphere of fear has taken hold.  For example, Rogo’s and Peach’s relationship, which had in the past been marked by camaraderie, now reflects the fear and animosity that has been generated. Fear hinders the collaboration and innovation that will be needed to save both the plant and the division.  W. Edwards Deming in his famous 14 principles explicitly stated “drive out fear, so that everyone may work effectively for the company.” My interpretation of chapter 3 is that fear and the tools that generate fear will need to be addressed for the division to survive.

Chapters 1 through 3 actively present the reader with a burning platform.  The plant and division are failing.  Alex Rogo has actively pursued increased efficiency and automation to generate cost reductions; however, performance is falling even further behind and fear has become a central feature in the corporate culture.

Next week we begin the path toward redemption!

What are your thoughts on the forewords and first 3 chapters?


Categories: Process Management

The Great Inspirational Quotes Collection Revamped

I think of inspiration simply as “breathe life into.”

Whether you're shipping code, designing the next big thing, or simply making things happen, inspirational quotes can help keep you going.

In the spirit of helping people find their Eye of the Tiger or get their mojo on, I’ve put together a hand-crafted collection of the ultimate inspirational quotes:

The Inspirational Quotes Collection

If you’ve seen my collection of inspirational quotes before, it’s completely revamped.  It should be much easier to browse all of the inspirational quotes now so you can see some old familiar quotes that you may have heard long ago, as well as many inspirational quotes you have never heard before.

Dive in, explore the collection of inspirational quotes, and see if you can find at least three inspirational quotes that breathe new life into your moment, your day, your work, or anything you do.

The Power of Inspirational Quotes

Inspirational quotes can help us move mountains.   The right inspirational words and ideas can help us boldly go where we have not gone before, as well as conquer our fears and soar to new heights.

Or, the right inspirational quote can simply help us roar a little louder inside, when we need it most.

Life isn’t always a bowl of cherries.  And work can be an incredible challenge.  And sometimes, even our best laid plans go up in flames.

So having a repertoire of inspirational quotes and inspiring mantras at your mental fingertips can help you roll with the punches and keep going.

One of the most important inspirational ideas I learned early on goes like this:

Whatever doesn’t kill you makes you stronger.

It helped me turn trials into triumphs, and eventually learn to take on big challenges as a way to grow.

Another inspirational idea that really helped me find my way forward is by Ralph Waldo Emerson, and it goes like this:

“Do not follow where the path may lead. Go, instead, where there is no path and leave a trail.”

Whenever I went on a new journey, down an unfamiliar path, it helped remind me that I don’t always need a trail, and that many times, it’s about blazing my own trail.

The power of inspirational quotes is their power to light a fire inside and fan the flames until we go and blaze a trail that leaves ourselves, and others, in awe.

What Lies Within Us

Perhaps, the greatest inspirational quote of all time is another amazing quote by Emerson:

“What lies behind us and what lies before us are tiny matters compared to what lies within us.”

It’s an awe-inspiring reminder to not only do what makes us come alive, but to realize our potential and unleash what we are capable of.

It’s Better to Burn Out than Fade Away

So many inspirational quotes remind us that life is short and that we have to go for it.   But maybe George Bernard Shaw said it best:

“I want to be all used up when I die.”

One quote that I think about often is by Seth Godin:

“Life is like skiing.  Just like skiing, the goal is not to get to the bottom of the hill. It’s to have a bunch of good runs before the sun sets.”

It’s all about making the journey worth it.

When It’s Over

What do you do when it’s over?  It all depends.  Dr. Seuss has an interesting twist:

“Don’t cry because it’s over. Smile because it happened.”

But the one that I find has true wisdom is from Dave Weinbaum:

“The secret to a rich life is to have more beginnings than endings.”

Here’s to many more new beginnings in your life.

Enjoy and be sure to explore The Inspirational Quotes Collection to soar or roar in your own personal way.

You Might Also Like

The Great Happiness Quotes Collection Revamped

The Great Leadership Quotes Collection Revamped

The Great Love Quotes Collection Revamped

The Great Personal Development Quotes Collection Revamped

The Great Productivity Quotes Collection Revamped

Categories: Architecture, Programming

Python/scikit-learn: Detecting which sentences in a transcript contain a speaker

Mark Needham - Fri, 02/20/2015 - 23:42

Over the past couple of months I’ve been playing around with How I met your mother transcripts and the most recent thing I’ve been working on is how to extract the speaker for a particular sentence.

This initially seemed like a really simple problem as most of the initial sentences I looked at were structured like this:

<speaker>: <sentence>

If they were all in that format then we could write a simple regular expression and move on, but unfortunately they aren’t. We could probably write a more complex regex to pull out the speaker but I thought it’d be fun to see if I could train a model to work it out instead.
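For reference, the "simple regular expression" route might look roughly like the sketch below. The pattern and the `split_speaker` helper are my own assumptions rather than anything from the transcripts project: the pattern treats a short capitalised label before the first colon as the speaker, which is exactly the kind of guess that breaks on messier lines.

```python
import re

# Assumption: a speaker label starts with a capital letter and runs up to
# the first colon on the line. Real transcripts are messier than this.
SPEAKER_LINE = re.compile(r"^(?P<speaker>[A-Z][\w.' ]*?):\s*(?P<sentence>.+)$")


def split_speaker(line):
    """Return (speaker, sentence), with speaker set to None if no label is found."""
    stripped = line.strip()
    match = SPEAKER_LINE.match(stripped)
    if match is None:
        return None, stripped
    return match.group("speaker"), match.group("sentence")
```

This handles the well-formed case, but a line such as "At 10:30 we left" would be misread as having the speaker "At 10", which hints at why a hand-written pattern gets complicated quickly.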

The approach I’ve taken is derived from an example in the NLTK book.

The first problem with this approach was that I didn’t have any labelled data to work with, so I wrote a little web application that made it easy for me to train chunks of sentences at a time.

I stored the trained words in a JSON file. Each entry looks like this:

import json
with open("data/import/trained_sentences.json", "r") as json_file:
    json_data = json.load(json_file)
 
>>> json_data[0]
{u'words': [{u'word': u'You', u'speaker': False}, {u'word': u'ca', u'speaker': False}, {u'word': u"n't", u'speaker': False}, {u'word': u'be', u'speaker': False}, {u'word': u'friends', u'speaker': False}, {u'word': u'with', u'speaker': False}, {u'word': u'Robin', u'speaker': False}, {u'word': u'.', u'speaker': False}]}
 
>>> json_data[1]
{u'words': [{u'word': u'Robin', u'speaker': True}, {u'word': u':', u'speaker': False}, {u'word': u'Well', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'it', u'speaker': False}, {u'word': u"'s", u'speaker': False}, {u'word': u'a', u'speaker': False}, {u'word': u'bit', u'speaker': False}, {u'word': u'early', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'but', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'of', u'speaker': False}, {u'word': u'course', u'speaker': False}, {u'word': u',', u'speaker': False}, {u'word': u'I', u'speaker': False}, {u'word': u'might', u'speaker': False}, {u'word': u'consider', u'speaker': False}, {u'word': u'...', u'speaker': False}, {u'word': u'I', u'speaker': False}, {u'word': u'moved', u'speaker': False}, {u'word': u'here', u'speaker': False}, {u'word': u',', u'speaker': False}, {u'word': u'let', u'speaker': False}, {u'word': u'me', u'speaker': False}, {u'word': u'think', u'speaker': False}, {u'word': u'.', u'speaker': False}]}

Each word in the sentence is represented by a JSON object which also indicates if that word was a speaker in the sentence.

Feature selection

Now that I had some trained data to work with, I needed to choose which features I’d use to train my model.

One of the most obvious indicators that a word is the speaker in the sentence is that the next word is ‘:’ so ‘next word’ can be a feature. I also went with ‘previous word’ and the word itself for my first cut.

This is the function I wrote to convert a word in a sentence into a set of features:

def pos_features(sentence, i):
    features = {}
    features["word"] = sentence[i]
    if i == 0:
        features["prev-word"] = "<START>"
    else:
        features["prev-word"] = sentence[i-1]
    if i == len(sentence) - 1:
        features["next-word"] = "<END>"
    else:
        features["next-word"] = sentence[i+1]
    return features

Let’s try a couple of examples:

import nltk
 
>>> pos_features(nltk.word_tokenize("Robin: Hi Ted, how are you?"), 0)
{'prev-word': '<START>', 'word': 'Robin', 'next-word': ':'}
 
>>> pos_features(nltk.word_tokenize("Robin: Hi Ted, how are you?"), 5)
{'prev-word': ',', 'word': 'how', 'next-word': 'are'}

Now let’s run that function over our full set of labelled data:

with open("data/import/trained_sentences.json", "r") as json_file:
    json_data = json.load(json_file)
 
tagged_sents = []
for sentence in json_data:
    tagged_sents.append([(word["word"], word["speaker"]) for word in sentence["words"]])
 
featuresets = []
for tagged_sent in tagged_sents:
    untagged_sent = nltk.tag.untag(tagged_sent)
    for i, (word, tag) in enumerate(tagged_sent):
        featuresets.append( (pos_features(untagged_sent, i), tag) )

Here’s a sample of the contents of featuresets:

>>> featuresets[:5]
[({'prev-word': '<START>', 'word': u'You', 'next-word': u'ca'}, False), ({'prev-word': u'You', 'word': u'ca', 'next-word': u"n't"}, False), ({'prev-word': u'ca', 'word': u"n't", 'next-word': u'be'}, False), ({'prev-word': u"n't", 'word': u'be', 'next-word': u'friends'}, False), ({'prev-word': u'be', 'word': u'friends', 'next-word': u'with'}, False)]

It’s nearly time to train our model, but first we need to split our labelled data into training and test sets so we can see how well our model performs on data it hasn’t seen before. scikit-learn has a function that does this for us:

from sklearn.cross_validation import train_test_split
train_data,test_data = train_test_split(featuresets, test_size=0.20, train_size=0.80)
 
>>> len(train_data)
9480
 
>>> len(test_data)
2370

Now let’s train our model. I decided to try out Naive Bayes and Decision tree models to see how they got on:

>>> classifier = nltk.NaiveBayesClassifier.train(train_data)
>>> print(nltk.classify.accuracy(classifier, test_data))
0.977215189873
 
>>> classifier = nltk.DecisionTreeClassifier.train(train_data)
>>> print(nltk.classify.accuracy(classifier, test_data))
0.997046413502

It looks like both are doing a good job here, with the decision tree doing slightly better. One thing to keep in mind is that most of the sentences we’ve trained on are in the form ‘<speaker>: <sentence>’ and we can get those correct with a simple regex, so we should expect the accuracy to be very high.

If we explore the internals of the decision tree we’ll see that it’s massively overfitting which makes sense given our small training data set and the repetitiveness of the data:

>>> print(classifier.pseudocode(depth=2))
if next-word == u'!': return False
if next-word == u'$': return False
...
if next-word == u"'s": return False
if next-word == u"'ve": return False
if next-word == u'(':
  if word == u'!': return False
  ...
if next-word == u'*': return False
if next-word == u'*****': return False
if next-word == u',':
  if word == u"''": return False
  ...
if next-word == u'--': return False
if next-word == u'.': return False
if next-word == u'...':
  ...
  if word == u'who': return False
  if word == u'you': return False
if next-word == u'/i': return False
if next-word == u'1': return True
...
if next-word == u':':
  if prev-word == u"'s": return True
  if prev-word == u',': return False
  if prev-word == u'...': return False
  if prev-word == u'2030': return True
  if prev-word == '<START>': return True
  if prev-word == u'?': return False
...
if next-word == u'\u266a\u266a': return False

One update I may make to the features is to include the part of speech of the word rather than its actual value to see if that makes the model a bit more general. Another option is to train a bunch of decision trees against a subset of the data and build an ensemble/random forest of those trees.
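The ensemble idea could be sketched along these lines. This is only a bagging-style majority vote over models trained on random subsets, not a full random forest (which would also sample features), and `train_ensemble`, `ensemble_classify` and the `train_fn` parameter are hypothetical names introduced for illustration.

```python
import random
from collections import Counter


def train_ensemble(train_fn, labelled_data, n_models=5, sample_fraction=0.8, seed=0):
    """Train n_models classifiers, each on a random subset of the labelled data.

    train_fn (hypothetical) takes a list of (features, label) pairs and
    returns a callable that maps a feature dict to a predicted label.
    """
    rng = random.Random(seed)
    sample_size = max(1, int(len(labelled_data) * sample_fraction))
    return [train_fn(rng.sample(labelled_data, sample_size)) for _ in range(n_models)]


def ensemble_classify(models, features):
    """Return the label that most of the models vote for."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]
```

With NLTK, `train_fn` could be something like `lambda data: nltk.DecisionTreeClassifier.train(data).classify`, and an odd `n_models` avoids tied votes on a True/False label.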

Once I’ve got a working ‘speaker detector’ I want to then go and work out who the likely speaker is for the sentences which don’t contain a speaker. The plan is to calculate the word distributions of the speakers from sentences I do have and then calculate the probability that they spoke the unlabelled sentences.

This might not work perfectly as there could be new characters in those episodes but hopefully we can come up with something decent.
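A rough sketch of the word distribution plan, using only the standard library, might look like this. The function names, the add-one smoothing and the assumption that every speaker is equally likely beforehand are all mine, so treat it as a starting point rather than the eventual implementation.

```python
import math
from collections import Counter, defaultdict


def train_word_distributions(labelled_sentences):
    """Count words per speaker from (speaker, list_of_words) pairs."""
    counts = defaultdict(Counter)
    for speaker, words in labelled_sentences:
        counts[speaker].update(word.lower() for word in words)
    return counts


def log_score(counts, speaker, words, vocabulary_size, alpha=1.0):
    """Log-probability of the words under one speaker's smoothed word distribution."""
    speaker_counts = counts[speaker]
    total = sum(speaker_counts.values())
    score = 0.0
    for word in words:
        # Add-alpha smoothing so that unseen words do not zero out the score.
        probability = (speaker_counts[word.lower()] + alpha) / (total + alpha * vocabulary_size)
        score += math.log(probability)
    return score


def most_likely_speaker(counts, words):
    """Pick the speaker whose word distribution best explains the sentence."""
    vocabulary = set(word for speaker_counts in counts.values() for word in speaker_counts)
    return max(counts, key=lambda speaker: log_score(counts, speaker, words, len(vocabulary)))
```

A character who never appears in the labelled sentences can never be chosen by this scheme, which is the limitation mentioned above.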

The full code for this example is on github if you want to have a play with it.

Any suggestions for improvements are always welcome in the comments.

Categories: Programming

Stuff The Internet Says On Scalability For February 20th, 2015

Hey, it's HighScalability time:


Networks are everywhere, they can even help reveal disease connections.
  • trillions: number of photons constantly hitting your eyes; $19 billion: Snapchat valuation;  8.5K: average number of questions asked on Stack Overflow per day
  • Quotable Quotes:
    • @BenedictEvans: End of 2014: 3.75-4bn mobiles ~1.5bn PCs  7-800m consumer PCs 1.2-1.3bn closed Android 4-500m open Android 650-675m iOS 80m Macs, ~75m Linux
    • @JeremiahLee: “Humans only use 10% of their internet.” —@nvcexploder #NodeSummit
    • beguiledfoil: Javavu - The feeling experienced when you see new languages make the same mistakes Java made 20 years ago and momentarily mistake said language for Java.
    • @ewolff: If Conway's Law is so important - are #Microservices more an organizational approach than an architecture?
    • @KentLangley: "Apache Spark Continues to Spread Beyond Hadoop." I would say supplant. 
    • Database Soup: An in-memory database is one which lacks the capability of spilling to disk.
    • Matthew Dillon: 1-2 year SSD wear on build boxes has been minimal.
    • @gwenshap: Except there is one writer and many readers - so schema and validation must be done on ingest. Anywhere else is just shifting responsibility
    • @jaykreps: Startup style of engineering (fail fast & iterate) doesn't work for every domain, esp. databases & financial systems
    • Taulant Ramabaja: Decentralization is not a goal in and of itself, it is a strategy
    • Eli Reisman: Etsy runs more Hadoop jobs by 7am than most companies do all day.
    • Dormando: We're [memcached] not sponsored like redis is. I've only ever lost money on this venture.
    • The Trust Engineers: There are more Facebook users than Catholics.

  • Exponent...The new integration is hardware + software + services. Not services like disk storage, but services like HomeKit, HealthKit, Siri, Car Play, Apple Pay. Services that touch every part of our lives. Apple doesn't build cars, stores, or information services, it wraps them with an Apple layer that provides the customer with an integrated experience while taking full advantage of modularity. Modularity wrapped with integration. Owning the hardware is a better profit model than services in the cloud.

  • Quite a response to You Don't Like Google's Go Because You Are Small on reddit. A vigorous 500+ comments were written. Golang isn't perfect. How disappointing, so many things are.

  • After making Linux work well on multiple cores, the next bump in performance comes from Improving Linux networking performance. It's a hard job. For a 100Gb adapter on 3GHz CPU there are only about 200 CPU cycles to process each packet. Good breakdown of time budgets for various instructions. The approach is improved batching at multiple layers of the stack and better memory management, which leads directly into Toward a more efficient slab allocator.

  • The process behind creating a Google Doodle for Alessandro Volta’s 270th Birthday reminds me a lot of the process of making old style illustrations as described in Cartographies of Time: A History of the Timeline. The idea is to encode symbolically as much of the important information as possible in a single diagram. The coded icon of a tiny skull could mean, for example, a king died while on the throne. A single flame could stand for the fall of man. This art is not completely lost with today's need to convey a lot of information on small screens. This sort of compression has advantages: Strass believed that a graphic representation of history held manifold advantages over a textual one: it revealed order, scale, and synchronism simply and without the trouble of memorization and calculation.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...
Categories: Architecture

Exploring container platforms: StackEngine

Xebia Blog - Fri, 02/20/2015 - 15:51

Docker has been around for more than a year already, and there are a lot of container platforms popping up. In this series of blogposts I will explore these platforms and share some insights. This blogpost is about StackEngine.

TL;DR: StackEngine is (for now) just a nice frontend to the Docker binary. Nothing...

Product versus Project: Exposing Behavior Differences

Organizations with a product perspective generally have an understanding that a project or release will follow the current project, reducing the need to get as large a bite at the apple as possible (having tried this as a child, I can tell you choking risk is increased).

The concepts of product and project are common perspectives in software development organizations. A simple definition for each is that a product is the thing that is delivered – software, an app or an interface. A project reflects the activities needed to develop the product or a feature of the product. Products often have roadmaps that define the path they will follow as they evolve. I was recently shown a roadmap for an appraisal tool a colleague markets that showed a number of new features planned for later this year and areas that would be addressed in the next few years. The map became less precise the further the time horizon was pushed out. Projects, releases and sprints typically are significantly more granular, with specific plans for work that is currently being developed. Different perspectives generate several different behaviors.

  1. Roadmap versus plan: The time-boxed nature of a project or a sprint (both have a stated beginning and end) tends to generate a focus on planning and executing specific activities and tasks. For example, in Scrum sprint planning, teams accept and commit to the user stories they will deliver. There is often a many-to-one relationship between stories and features that would be recognized by end-users or customers. Product planning tends to focus on the features and architectures that meet the needs of the user community. Projects foster short-term rather than long-term focus. Short-term focus can lead to architectural trade-offs or technical shortcuts to meet specific dates that will have negative implications in the future. The product owner is often the bridge between the project and product perspectives, acting as an arbiter. The product owner helps the team make decisions that could have long-term implications and provides the whole team with an understanding of the roadmap. Teams without (or with limited) access to a product owner and product roadmap can only focus on the time horizon they know.
  2. Needs versus Constraints: Projects are often described as the interaction between the triple constraints of time, budget and scope. Sprints are no different; cadence – time, fixed team size – budget, and committed stories – scope. There is always a natural tension between the business/product owner and the development team. In organizations with a project perspective, product owners and other business stakeholders typically have a rational economic reason to pressure teams to commit to more than can reasonably be accomplished in any specific project. Who knows when the next project will be funded? This behavior is often illustrated when the business indicates that ALL requirements they have identified are critical, or when concepts like a minimum viable product are met with hostility. Other examples of this behavior can be seen in organizations that adopt pseudo-Agile. In pseudo-Agile, backlogs are created and an overall due date generated for all the stories before a team even understands their capacity to deliver. Shortcuts, technical debt and lower customer satisfaction are often the results of this type of perspective. Organizations with a product perspective generally have an understanding that a project or release will follow the current project, reducing the need to get as large a bite at the apple as possible (having tried this as a child, I can tell you choking risk is increased).
  3. Measuring Efficiency/Cost versus Revenue: Organizations with a product perspective tend to take a wider view of what needs to be measured. Books such as The Goal (by Goldratt and Cox) make a passionate argument for the measurement of overall revenue. The thought is that any process change or any system enhancement needs to be focused on optimizing the big picture rather than over-optimizing steps that don’t translate to the goals of the organization. Focusing on delivering projects more efficiently, which is the classic IT measurement, does not make sense if what is being done does not translate to delivering value. Measuring the impact of a product roadmap (e.g. revenue, sales, ROI) leads organizations to a product view of work which lays stories and features out as a portfolio of work.

These dichotomies represent how differences in project and product perspectives generate different behaviors. Both perspectives are important based on the role a person is playing in an organization. For example, a sprint team must have a project perspective so they can commit to work within a time box. That same team needs to have a product view when they are making the day-to-day trade-offs that all teams make, or technical debt may overtake their ability to deliver. Product owners are often the bridge between the project and product perspectives; however, the best teams understand and leverage both.


Categories: Process Management

Diamond Kata - TDD with only Property-Based Tests

Mistaeks I Hav Made - Nat Pryce - Fri, 02/20/2015 - 00:04
The Diamond Kata is a simple exercise that Seb Rose described in a recent blog post. Seb describes the Diamond Kata as:

Given a letter, print a diamond starting with ‘A’ with the supplied letter at the widest point. For example: print-diamond ‘C’ prints

      A
     B B
    C   C
     B B
      A

Seb used the exercise to illustrate how he “recycles” tests to help him work incrementally towards a full solution. Seb’s approach prompted Alastair Cockburn to write an article in response in which he argued for more thinking before programming. Alastair’s article shows how he approached the Diamond Kata with more up-front analysis. Ron Jeffries and George Dinwiddie responded to Alastair’s article, showing how they approached the Diamond Kata relying on emergent design to produce an elegant solution (“thinking all the time”, as Ron Jeffries put it). There was some discussion on Twitter, and several other people published their approaches. (I’ll list as many as I know about at the end of this article).

The discussion sparked my interest, so I decided to have a go at the exercise myself. The problem seemed to me, at first glance, to be a good fit for property testing. So I decided to test-drive a solution using only property-based tests and see what happens. I wrote the solution in Scala and used ScalaTest to run and organise the tests and ScalaCheck for property testing.

What follows is an unexpurgated, warts-and-all walkthrough of my progress, not just the eventual complete solution. I made wrong turns and stupid mistakes along the way. The walkthrough is pretty long, so if you don’t want to follow through step by step, jump straight to the complete solution and/or my conclusions on how the exercise went and what I learned. Alternatively, if you want to follow the walkthrough in more detail, the entire history is on GitHub, with a commit per TDD step (add a failing test, commit, make the implementation pass the test, commit, refactor, commit, … and repeat).
Walkthrough

Getting Started: Testing the Test Runner

The first thing I like to do when starting a new project is make sure my development environment and test runner are set up right, that I can run tests, and that test failures are detected and reported. I use Gradle to bootstrap a new Scala project with dependencies on the latest versions of ScalaTest and ScalaCheck and import the Gradle project into IntelliJ IDEA.

ScalaTest supports several different styles of test and assertion syntax. The user guide recommends writing an abstract base class that combines traits and annotations for your preferred testing style and test runner, so that’s what I do first:

    @RunWith(classOf[JUnitRunner])
    abstract class UnitSpec extends FreeSpec with PropertyChecks {
    }

My test class extends UnitSpec:

    class DiamondSpec extends UnitSpec {
    }

I add a test that explicitly fails, to check that the test framework, IDE and build hang together correctly. When I see the test failure, I’m ready to write the first real test.

The First Test

Given that I’m writing property tests, I have to start with a simple property of the diamond function, not a simple example. The simplest property I can think of is:

For all valid input characters, the diamond contains one or more lines of text.

To turn that into a property test, I must define “all valid input characters” as a generator. The description of the Diamond Kata defines valid input as a single upper case character. ScalaCheck has a predefined generator for that:

    val inputChar = Gen.alphaUpperChar

At this point, I haven’t decided how I will represent the diamond. I do know that my test will assert on the number of lines of text, so I write the property with respect to an auxiliary function, diamondLines(c:Char):Vector[String], which will generate a diamond for input character c and return the lines of the diamond in a vector.

    "produces some lines" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c).nonEmpty)
      }
    }

I like the way that the test reads in ScalaTest/ScalaCheck. It is pretty much a direct translation of my English description of the property into code.

To make the test fail, I write diamondLines as:

    def diamondLines(c : Char) : Vector[String] = {
      Vector()
    }

The entire test class is:

    import org.scalacheck._

    class DiamondSpec extends UnitSpec {
      val inputChar = Gen.alphaUpperChar

      "produces some lines" in {
        forAll (inputChar) { c =>
          assert(diamondLines(c).nonEmpty)
        }
      }

      def diamondLines(c : Char) : Vector[String] = {
        Vector()
      }
    }

The simplest implementation that will make that property pass is to return a single string:

    object Diamond {
      def diamond(c: Char) : String = {
        "A"
      }
    }

I make the diamondLines function in the test call the new function and split its result into lines:

    def diamondLines(c : Char) = {
      Diamond.diamond(c).lines.toVector
    }

The implementation can be used like this:

    object DiamondApp extends App {
      import Diamond.diamond

      println(diamond(args.lift(0).getOrElse("Z").charAt(0)))
    }

A Second Test, But It Is Not Very Helpful

I now need to add another property, to more tightly constrain the solution. I notice that the diamond always has an odd number of lines, and decide to test that:

For all valid input characters, the diamond has an odd number of lines.

This implies that the number of lines is greater than zero (because vectors cannot have a negative number of elements and zero is even), so I change the existing test rather than adding another one:

    "produces an odd number lines" in {
      forAll (inputChar) { c =>
        assert(isOdd(diamondLines(c).length))
      }
    }

    def isOdd(n : Int) = n % 2 == 1

But this new test has a problem: my existing solution already passes it. The diamond function returns a single line, and 1 is an odd number. This choice of property is not helping drive the development forwards.
A Failing Test To Drive Development, But a Silly Mistake

The next simplest property I can think of is the number of lines of the diamond. If ‘ord(c)’ is the number of letters between ‘A’ and c, (zero for A, 1 for B, 2 for C, etc.) then:

For all valid input characters, c, the number of lines in a diamond for c is 2*ord(c)+1.

At this point I make a silly mistake. I write my property as:

    "number of lines" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c).length == ord(c)+1)
      }
    }

    def ord(c: Char) : Int = c - 'A'

I don’t notice the mistake immediately. When I do, I decide to leave it in the code as an experiment to see if the property tests will detect the error by becoming inconsistent, and how long it will take before they do so.

This kind of mistake would easily be caught by an example test. It’s a good idea to have a few examples, as well as properties, to act as smoke tests.

I make the test pass with the smallest amount of production code possible. I move the ord function from the test into the production code and use it to return the required number of lines that are all the same.

    def diamond(c: Char) : String = {
      "A\n" * (ord(c)+1)
    }

    def ord(c: Char) : Int = c - 'A'

Despite sharing the ord function between the test and production code, there’s still some duplication. Both the production and test code calculate ord(c)+1. I want to address that before writing the next test.

Refactor: Duplicated Calculation

I replace ord(c)+1 with lineCount(c), which calculates number of lines generated for an input letter, and inline the ord(c) function, because it’s now only used in one place.

    object Diamond {
      def diamond(c: Char) : String = {
        "A\n" * lineCount(c)
      }

      def lineCount(c: Char) : Int = (c - 'A')+1
    }

And I use lineCount in the test as well:

    "number of lines" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c).length == lineCount(c))
      }
    }

On reflection, using the lineCount calculation from production code in the test feels like a mistake.
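The remark that an example test would have caught the mistake can be made concrete. The sketch below is not part of the original walkthrough: it uses plain Scala rather than ScalaTest, and the names SmokeTestSketch, buggyLineCount, correctLineCount and expectedC are hypothetical. It compares one hand-written diamond for ‘C’ against both line-count formulas.

```scala
object SmokeTestSketch {
  def ord(c: Char): Int = c - 'A'

  // The formula as mistakenly written in the "number of lines" property.
  def buggyLineCount(c: Char): Int = ord(c) + 1

  // The formula stated in the prose: 2*ord(c)+1.
  def correctLineCount(c: Char): Int = 2 * ord(c) + 1

  // A hand-written example (assumed padded to a square): the diamond for 'C'.
  val expectedC: Vector[String] = Vector(
    "  A  ",
    " B B ",
    "C   C",
    " B B ",
    "  A  "
  )

  def main(args: Array[String]): Unit = {
    println(expectedC.length == correctLineCount('C')) // true: 5 == 2*2+1
    println(expectedC.length == buggyLineCount('C'))   // false: 5 != 2+1
  }
}
```

A single example like this pins the arithmetic down independently of the production code, which a property that shares lineCount with the implementation cannot do.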
Squareness

The next property I add is:

For all valid input characters, the text containing the diamond is square

Where “is square” means:

The length of each line is equal to the total number of lines

In Scala, this is:

    "squareness" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c) forall {_.length == lineCount(c)})
      }
    }

I can make the test pass like this:

    object Diamond {
      def diamond(c: Char) : String = {
        val side: Int = lineCount(c)

        ("A" * side + "\n") * side
      }

      def lineCount(c: Char) : Int = (c - 'A')+1
    }

Refactor: Rename the lineCount Function

The lineCount is also being used to calculate the length of each line, so I rename it to squareSide.

    object Diamond {
      def diamond(c: Char) : String = {
        val side: Int = squareSide(c)

        ("A" * side + "\n") * side
      }

      def squareSide(c: Char) : Int = (c - 'A')+1
    }

Refactor: Clarify the Tests

I’m now a little dissatisfied with the way the tests read:

    "number of lines" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c).length == squareSide(c))
      }
    }

    "squareness" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c) forall {_.length == squareSide(c)})
      }
    }

The “squareness” property does not stand alone. It doesn’t communicate that the output is square unless combined with the “number of lines” property.

I refactor the test to disentangle the two properties:

    "squareness" in {
      forAll (inputChar) { c =>
        val lines = diamondLines(c)
        assert(lines forall {line => line.length == lines.length})
      }
    }

    "size of square" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c).length == squareSide(c))
      }
    }

The Letter on Each Line

The next property I write specifies which characters are printed on each line. The characters of each line should be either a letter that depends on the index of the line, or a space.

Because the diamond is vertically symmetrical, I only need to consider the lines from the top to the middle of the diamond. This makes the calculation of the letter for each line much simpler.
I make a note to add a property for the vertical symmetry once I have made the implementation pass this test.

    "single letter per line" in {
      forAll (inputChar) { c =>
        val allLines = diamondLines(c)
        val topHalf = allLines.slice(0, allLines.size/2 + 1)

        for ((line, index) <- topHalf.zipWithIndex) {
          val lettersInLine = line.toCharArray.toSet diff Set(' ')
          val expectedOnlyLetter = ('A' + index).toChar
          assert(lettersInLine == Set(expectedOnlyLetter), "line " + index + ": \"" + line + "\"")
        }
      }
    }

To make this test pass, I change the diamond function to:

    def diamond(c: Char) : String = {
      val side: Int = squareSide(c)

      (for (lc <- 'A' to c) yield lc.toString * side) mkString "\n"
    }

This repeats the correct letter for the top half of the diamond, but the bottom half of the diamond is wrong. This will be fixed by the property for vertical symmetry, which I’ve noted down to write next.

Vertical Symmetry

The property for vertical symmetry is:

For all input characters, c, the lines from the top to the middle of the diamond, inclusive, are equal to the reversed lines from the middle to the bottom of the diamond, inclusive.

    "is vertically symmetrical" in {
      forAll(inputChar) { c =>
        val allLines = diamondLines(c)
        val topHalf = allLines.slice(0, allLines.size / 2 + 1)
        val bottomHalf = allLines.slice(allLines.size / 2, allLines.size)

        assert(topHalf == bottomHalf.reverse)
      }
    }

The implementation is:

    def diamond(c: Char) : String = {
      val side: Int = squareSide(c)

      val topHalf = for (lc <- 'A' to c) yield lineFor(side, lc)
      val bottomHalf = topHalf.slice(0, topHalf.length-1).reverse

      (topHalf ++ bottomHalf).mkString("\n")
    }

But this fails the “squareness” and “size of square” tests! My properties are now inconsistent. The test suite has detected the erroneous implementation of the squareSide function.

The correct implementation of squareSide is:

    def squareSide(c: Char) : Int = 2*(c - 'A') + 1

With this change, the implementation passes all of the tests.
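The corrected formula follows directly from how the diamond is assembled: the top half has one line per letter from ‘A’ to c, which is ord(c)+1 lines, and the mirrored bottom half repeats all of them except the middle one. A minimal sketch of that arithmetic, in plain Scala and with the hypothetical names LineCountSketch and mirroredLineCount (neither appears in the original code), checks it for every valid input:

```scala
object LineCountSketch {
  def ord(c: Char): Int = c - 'A'

  // Counts the lines produced by mirroring the top half around the middle line.
  // Each line is stood in for by its letter; only the count matters here.
  def mirroredLineCount(c: Char): Int = {
    val topHalf = ('A' to c).map(_.toString) // ord(c)+1 entries, one per letter
    val bottomHalf = topHalf.reverse.tail    // the same entries minus the middle one
    (topHalf ++ bottomHalf).length
  }

  def main(args: Array[String]): Unit = {
    // There are only 26 valid inputs, so this checks all of them instead of sampling.
    val allAgree = ('A' to 'Z').forall(c => mirroredLineCount(c) == 2 * ord(c) + 1)
    println(allAgree) // true
  }
}
```

So (ord(c)+1) + ord(c) = 2*ord(c)+1, which is why the symmetric implementation could not satisfy a “size of square” property built on ord(c)+1.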
The Position Of The Letter In Each Line

Now I add a property that specifies the position and value of the letter in each line, and that all other characters in a line are spaces. Like the previous test, I can rely on symmetry in the output to simplify the arithmetic. This time, because the diamond has horizontal symmetry, I only need specify the position of the letter in the first half of the line. I add a specification for horizontal symmetry, and factor out generic functions to return the first and second half of strings and sequences.

    "is vertically symmetrical" in {
      forAll (inputChar) { c =>
        val lines = diamondLines(c)
        assert(firstHalfOf(lines) == secondHalfOf(lines).reverse)
      }
    }

    "is horizontally symmetrical" in {
      forAll (inputChar) { c =>
        for ((line, index) <- diamondLines(c).zipWithIndex) {
          assert(firstHalfOf(line) == secondHalfOf(line).reverse,
            "line " + index + " should be symmetrical")
        }
      }
    }

    "position of letter in line of spaces" in {
      forAll (inputChar) { c =>
        for ((line, lineIndex) <- firstHalfOf(diamondLines(c)).zipWithIndex) {
          val firstHalf = firstHalfOf(line)
          val expectedLetter = ('A'+lineIndex).toChar
          val letterIndex = firstHalf.length - (lineIndex + 1)

          assert (firstHalf(letterIndex) == expectedLetter, firstHalf)
          assert (firstHalf.count(_==' ') == firstHalf.length-1,
            "number of spaces in line " + lineIndex + ": " + line)
        }
      }
    }

    def firstHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = {
      v.slice(0, (v.length+1)/2)
    }

    def secondHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = {
      v.slice(v.length/2, v.length)
    }

The implementation is:

    object Diamond {
      def diamond(c: Char) : String = {
        val side: Int = squareSide(c)

        val topHalf = for (letter <- 'A' to c) yield lineFor(side, letter)

        (topHalf ++ topHalf.reverse.tail).mkString("\n")
      }

      def lineFor(length: Int, letter: Char): String = {
        val halfLength = length/2
        val letterIndex = halfLength - ord(letter)
        val halfLine = " "*letterIndex + letter + " "*(halfLength-letterIndex)

        halfLine ++ halfLine.reverse.tail
      }

      def squareSide(c: Char) : Int = 2*ord(c) + 1

      def ord(c: Char): Int = c - 'A'
    }

It turns out the ord function, which I inlined into squareSide a while ago, is needed after all.

The implementation is now complete. Running the DiamondApp application prints out diamonds. But there’s plenty of scope for refactoring both the production and test code.

Refactoring: Delete the “Single Letter Per Line” Property

The “position of letter in line of spaces” property makes the “single letter per line” property superfluous, so I delete “single letter per line”.

Refactoring: Simplify the Diamond Implementation

I rename some parameters and simplify the implementation of the diamond function.

    object Diamond {
      def diamond(maxLetter: Char) : String = {
        val topHalf = for (letter <- 'A' to maxLetter) yield lineFor(maxLetter, letter)

        (topHalf ++ topHalf.reverse.tail).mkString("\n")
      }

      def lineFor(maxLetter: Char, letter: Char): String = {
        val halfLength = ord(maxLetter)
        val letterIndex = halfLength - ord(letter)
        val halfLine = " "*letterIndex + letter + " "*(halfLength-letterIndex)

        halfLine ++ halfLine.reverse.tail
      }

      def squareSide(c: Char) : Int = 2*ord(c) + 1

      def ord(c: Char): Int = c - 'A'
    }

The implementation no longer uses the squareSide function. It’s only used by the “size of square” property.

Refactoring: Inline the squareSide function

I inline the squareSide function into the test.

    "size of square" in {
      forAll (inputChar) { c =>
        assert(diamondLines(c).length == 2*ord(c) + 1)
      }
    }

I believe the erroneous calculation would have been easier to notice if I had done this from the start.

Refactoring: Common Implementation of Symmetry

There’s one last bit of duplication in the implementation. The expressions that create the horizontal and vertical symmetry of the diamond can be replaced with calls to a generic function.
I’ll leave that as an exercise for the reader…

Complete Tests and Implementation

Tests:

    import Diamond.ord
    import org.scalacheck._
    import scala.collection.generic.CanBuildFrom

    class DiamondSpec extends UnitSpec {
      val inputChar = Gen.alphaUpperChar

      "squareness" in {
        forAll (inputChar) { c =>
          val lines = diamondLines(c)
          assert(lines forall {line => line.length == lines.length})
        }
      }

      "size of square" in {
        forAll (inputChar) { c =>
          assert(diamondLines(c).length == 2*ord(c) + 1)
        }
      }

      "is vertically symmetrical" in {
        forAll (inputChar) { c =>
          val lines = diamondLines(c)
          assert(firstHalfOf(lines) == secondHalfOf(lines).reverse)
        }
      }

      "is horizontally symmetrical" in {
        forAll (inputChar) { c =>
          for ((line, index) <- diamondLines(c).zipWithIndex) {
            assert(firstHalfOf(line) == secondHalfOf(line).reverse,
              "line " + index + " should be symmetrical")
          }
        }
      }

      "position of letter in line of spaces" in {
        forAll (inputChar) { c =>
          for ((line, lineIndex) <- firstHalfOf(diamondLines(c)).zipWithIndex) {
            val firstHalf = firstHalfOf(line)
            val expectedLetter = ('A'+lineIndex).toChar
            val letterIndex = firstHalf.length - (lineIndex + 1)

            assert (firstHalf(letterIndex) == expectedLetter, firstHalf)
            assert (firstHalf.count(_==' ') == firstHalf.length-1,
              "number of spaces in line " + lineIndex + ": " + line)
          }
        }
      }

      def firstHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = {
        v.slice(0, (v.length+1)/2)
      }

      def secondHalfOf[AS, A, That](v: AS)(implicit asSeq: AS => Seq[A], cbf: CanBuildFrom[AS, A, That]) = {
        v.slice(v.length/2, v.length)
      }

      def diamondLines(c : Char) = {
        Diamond.diamond(c).lines.toVector
      }
    }

Implementation:

    object Diamond {
      def diamond(maxLetter: Char) : String = {
        val topHalf = for (letter <- 'A' to maxLetter) yield lineFor(maxLetter, letter)

        (topHalf ++ topHalf.reverse.tail).mkString("\n")
      }

      def lineFor(maxLetter: Char, letter: Char): String = {
        val halfLength = ord(maxLetter)
        val letterIndex = halfLength - ord(letter)
        val halfLine = " "*letterIndex + letter + " "*(halfLength-letterIndex)

        halfLine ++ halfLine.reverse.tail
      }

      def ord(c: Char): Int = c - 'A'
    }

Conclusions

In his article, “Thinking Before Programming”, Alastair Cockburn writes:

The advantage of the Dijkstra-Gries approach is the simplicity of the solutions produced. The advantage of TDD is modern fine-grained incremental development. … Can we combine the two?

I think property-based tests in the TDD process combined the two quite successfully in this exercise. I could record my half-formed thoughts about the problem and solution as generators and properties while using “modern fine-grained incremental development” to tighten up the properties and grow the code that met them.

In Seb’s original article, he writes that when working from examples…

it’s easy enough to get [the tests for ‘A’ and ‘B’] to pass by hardcoding the result. Then we move on to the letter ‘C’. The code is now screaming for us to refactor it, but to keep all the tests passing most people try to solve the entire problem at once. That’s hard, because we’ll need to cope with multiple lines, varying indentation, and repeated characters with a varying number of spaces between them.

I didn’t encounter this problem when driving the implementation with properties. Adding a new property always required an incremental improvement to the implementation to get the tests passing again.

Neither did I need to write throw-away tests for behaviour that was not actually desired of the final implementation, as Seb did with his “test recycling” approach. Every property I added applied to the complete solution. I only deleted properties that were implied by properties I added later, and so had become unnecessary duplication.

I took the approach of starting from very generic properties and incrementally adding more specific properties as I refined the implementation. Generic properties were easy to come up with, and helped me make progress in the problem.
The suite of properties reinforced one another, testing the tests, and detected the mistake I made in one property that caused it to be inconsistent with the rest.

I didn’t know Scala, ScalaTest or ScalaCheck well. Now that I’ve learned them better, I wish I had written a minimisation strategy for the input character. This would have made test failure messages easier to understand.

I also didn’t address what the diamond function would do with input outside the range of ‘A’ to ‘Z’. Scala doesn’t let one define a subtype of Char, so I can’t enforce the input constraint in the type system. I guess the Scala way would be to define diamond as a PartialFunction[Char,String].

Further Thoughts

  • Thoughts on duplication and tests as documentation
  • Thoughts on property-based tests and iterative/incremental development

Other Solutions

Mark Seemann has approached Diamond Kata with property-based tests, using F# and FsCheck.

Solutions to the Diamond Kata using example-based tests include:

  • Seb Rose: Recycling Tests in TDD
  • Alastair Cockburn: Thinking Before Programming
  • Seb Rose: Diamond recycling (and painting yourself into a corner)
  • Ron Jeffries: a detailed walkthrough of his solution
  • George Dinwiddie: Another Approach to the Diamond Kata
  • Ivan Sanchez: A walkthrough of his Clojure solution.
  • Jon Jagger: print “squashed-circle” diamond
  • Sandro Mancuso: A Java solution on GitHub
  • Krzysztof Jelski: A Python solution on GitHub
  • Philip Schwarz: A Clojure solution on GitHub
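Returning to the open question in the conclusions about input outside ‘A’ to ‘Z’: the PartialFunction idea might look something like the sketch below. This is an assumption about how it could be done, not code from the original article. The object name PartialDiamondSketch is hypothetical, the body reuses the final lineFor and ord logic, and the guard on the case clause is what restricts the domain.

```scala
object PartialDiamondSketch {
  def ord(c: Char): Int = c - 'A'

  def lineFor(maxLetter: Char, letter: Char): String = {
    val halfLength = ord(maxLetter)
    val letterIndex = halfLength - ord(letter)
    val halfLine = " " * letterIndex + letter + " " * (halfLength - letterIndex)
    halfLine + halfLine.reverse.tail
  }

  // Defined only for 'A' to 'Z'; any other character is outside the domain.
  val diamond: PartialFunction[Char, String] = {
    case maxLetter if maxLetter >= 'A' && maxLetter <= 'Z' =>
      val topHalf = for (letter <- 'A' to maxLetter) yield lineFor(maxLetter, letter)
      (topHalf ++ topHalf.reverse.tail).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    println(diamond.isDefinedAt('C')) // true
    println(diamond.isDefinedAt('c')) // false
    println(diamond.lift('?'))        // None
    println(diamond('C'))             // prints the five-line diamond for 'C'
  }
}
```

Callers can then use isDefinedAt or lift to handle invalid input explicitly, and the existing properties would still apply unchanged to the letters the generator produces.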
Categories: Programming, Testing & QA

Geeks with Empathy – The Key Traits a Business Analyst Must Possess

Software Requirements Blog - Seilevel.com - Thu, 02/19/2015 - 17:10
I recently participated in a Panel Discussion on the state of the Business Analyst Profession held by IIBA of Austin. One of the questions we tackled was the following: “What are the key traits that a Business Analyst must possess?” There are many qualifications and abilities that a good analyst must possess. The ability to […]
Categories: Requirements

What Should I Do With Free Time At Work?

Making the Complex Simple - John Sonmez - Thu, 02/19/2015 - 17:00

In this video, I respond to a not-so-anonymous email that asks me a question that I often get asked about which is: What you should do with free time at work?

The post What Should I Do With Free Time At Work? appeared first on Simple Programmer.

Categories: Programming

GTAC 2014 Coming to Seattle/Kirkland in October

Google Testing Blog - Thu, 02/19/2015 - 14:22
Posted by Anthony Vallone on behalf of the GTAC Committee

If you're looking for a place to discuss the latest innovations in test automation, then charge your tablets and pack your gumboots - the eighth GTAC (Google Test Automation Conference) will be held on October 28-29, 2014 at Google Kirkland! The Kirkland office is part of the Seattle/Kirkland campus in beautiful Washington state. This campus forms our third largest engineering office in the USA.



GTAC is a periodic conference hosted by Google, bringing together engineers from industry and academia to discuss advances in test automation and the test engineering computer science field. It’s a great opportunity to present, learn, and challenge modern testing technologies and strategies.

You can browse the presentation abstracts, slides, and videos from last year on the GTAC 2013 page.

Stay tuned to this blog and the GTAC website for application information and opportunities to present at GTAC. Subscribing to this blog is the best way to get notified. We're looking forward to seeing you there!

Categories: Testing & QA