
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

Auth0 Architecture - Running in Multiple Cloud Providers and Regions

This is a guest post by Jose Romaniello, Head of Engineering, at Auth0.

Auth0 provides authentication, authorization and single sign on services for apps of any type: mobile, web, native; on any stack.

Authentication is critical for the vast majority of apps. We designed Auth0 from the beginning with multiple levels of redundancy. One of these levels is hosting. Auth0 can run anywhere: our cloud, your cloud, or even your own servers. And when we run Auth0, we run it on multiple cloud providers and in multiple regions simultaneously.

This article is a brief introduction to the infrastructure behind app.auth0.com and the strategies we use to keep it up and running with high availability.

Core Service Architecture

The core service is relatively simple:

  • Front-end servers: these consist of several x-large VMs, running Ubuntu on Microsoft Azure.

  • Store: MongoDB, running on dedicated memory-optimized x-large VMs.

  • Intra-node service routing: nginx

All components of Auth0 (e.g. Dashboard, transaction server, docs) run on every node, and all nodes are identical.

Multi-cloud / High Availability
Categories: Architecture

What It Actually Means to Market Yourself as a Software Developer

Making the Complex Simple - John Sonmez - Mon, 12/01/2014 - 17:00

Today is your lucky day! No, really it is. I am going to tell you exactly what it means to market yourself as a software developer and why it just might not be such a bad thing. Believe me, I know what you are thinking. I get a lot of flak about the idea of marketing yourself or doing any ... Read More

The post What It Actually Means to Market Yourself as a Software Developer appeared first on Simple Programmer.

Categories: Programming

About snowmen and mathematical proof why agile works

Xebia Blog - Mon, 12/01/2014 - 16:05

Last week I had an interesting course by Roger Sessions on Snowman Architecture. The perishable nature of snowmen under any serious form of pressure fortunately does not apply to his architecture principles, but being an agile fundamentalist I noticed some interesting patterns in the math underlying the Snowman Architecture that are well rooted in agile practices. Understanding these principles may give you facts to back up your gut feeling about these philosophies, and mathematical proof as to why Agile works.

Complexity

“What has philosophy got to do with measuring anything? It's the mathematicians you have to trust, and they measure the skies like we measure a field. “ - Galileo Galilei, Concerning the New Star (1606).

In his book “Facts and Fallacies of Software Engineering”, Robert Glass implies that when the functionality of a system increases by 25%, its complexity effectively doubles. So in formula form:

                      complexity = functionality^3.11      (since 1.25^3.11 ≈ 2)

This hypothesis is supported by empirical evidence, and also explains why planning poker that focuses on the complexity of the implementation, rather than on the functionality delivered, is a more accurate estimator of what a team can deliver in a sprint.

Basically, the smaller you can make the functionality the better, and that is better to the power 3 for you! Once you start making functionality smaller, you will find that your awesome small functionality needs to talk to other functions in order to be useful for an end user. These dependencies are penalized by Roger’s model.

“An outside dependency contributes as much complexity as any other function, but does so independently of the functions.”

In other words, splitting a function of, say, 4 points (74 complexity points) into two equal separate functions reduces the overall complexity to 17 complexity points. This benefit, however, vanishes when each module has more than 3 connections.
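
To make the arithmetic concrete, here is a minimal Python sketch of the model as described above; the exponent follows from the 25%-doubles rule, and connections are counted like functions, per the quote (the point sizes are the ones from the example):

import math

# Glass's law: a 25% increase in functionality doubles complexity,
# so complexity ~ functionality**K with 1.25**K == 2.
K = math.log(2) / math.log(1.25)  # ~3.11

def complexity(functions, connections=0):
    # Per the quote above, a connection is penalized like a function,
    # but independently of the functions.
    return functions ** K + connections ** K

print(round(complexity(4)))         # one 4-point function: ~74 complexity points
print(round(2 * complexity(2)))     # split into two uncoupled halves: ~17
print(round(2 * complexity(2, 3)))  # 3 connections each: ~78, the benefit is gone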

An interesting observation that one can derive from this is a mathematical model that helps you to find which functions “belong” together. It stands to reason that when those functions suffer from technical interfacing, they will equally suffer from human interfaces. But how do we find which functions “belong” together, and does it matter if we get it approximately right? 

Endless possibilities

“Almost right doesn’t count” – Dr. Taylor, on landing a spacecraft, after a 300-million-mile journey, 50 meters from a spot with adequate sunlight for the solar panels.

Partitioning math is incredibly complex, and the main problem with the separation of functions and interfaces is that getting it only “just about right” has massive implications. This is neatly covered by the Bell number (http://en.wikipedia.org/wiki/Bell_number).

These numbers grow quite quickly. For example, a set of 2 functions can be split 2 ways, but a set of 3 already has 5 options, at 6 it is 203, and if your application covers a mere 16 business functions, there are already more than 10 billion ways to create sets, and only a handful will give that desired low complexity number.
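
The growth is easy to verify with the Bell triangle; a minimal Python sketch:

def bell(n):
    # Bell triangle: each row starts with the last entry of the previous row;
    # every other entry is its left neighbour plus the entry above it.
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]  # B(n): the number of ways to partition a set of n elements

for n in (2, 3, 6, 16):
    print(n, bell(n))  # 2 -> 2, 3 -> 5, 6 -> 203, 16 -> 10480142147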

So how can math help us find the optimal set division, the one with the lowest complexity factor?

Equivalence Relations

In order to find business functions that belong together, or at least have so much in common that the number of interfaces will outweigh the functional complexity, we can resort to the set equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation). It is both the strong and the weak point in the Snowman architecture. It provides a genius algorithm for separating a set into the most optimal subsets (and doing so in O(n + k log k) time). The equivalence relation that Sessions proposes is as follows:

            Two business functions {a, b} have synergy if, and only if, from a business perspective {a} is not useful without {b} and vice versa.

The weak point is the subjective measurement in the equation. Applied at too high a level, everything will be required; at too low a level, it will not return any valuable business results.
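
As an illustration only (this is a sketch, not Sessions' actual algorithm, and the function names are invented): once the business has judged which pairs of functions have synergy, a union-find structure groups the functions into subsets in near-linear time:

def partition(functions, synergies):
    # Union-find: every function starts in its own set ...
    parent = {f: f for f in functions}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving keeps finds cheap
            f = parent[f]
        return f

    # ... and every synergy pair ("{a} is not useful without {b}") merges two sets
    for a, b in synergies:
        parent[find(a)] = find(b)

    groups = {}
    for f in functions:
        groups.setdefault(find(f), []).append(f)
    return list(groups.values())

print(partition(["catalog", "basket", "checkout", "fulfilment", "shipping"],
                [("catalog", "basket"), ("basket", "checkout"),
                 ("fulfilment", "shipping")]))
# [['catalog', 'basket', 'checkout'], ['fulfilment', 'shipping']]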

In my last project we split a large eCommerce platform into a customer-facing part and an order-handling part. This worked so well that the teams started complaining that the separation had lowered their knowledge of each other’s codebase, since very little functionality required coding on both subsystems.

We had effectively reduced complexity considerably, but could have taken it one step further. The order handling system was talking to a lot of other systems in order to get orders fulfilled. From a business perspective we could have separated further, reducing complexity even more. In fact, armed with Glass’s Law, we’ll refactor the application to make it even better than it is today.

Why bother?

Well, polynomially growing problems can’t be solved with linear solutions.

Polynomial problems vs linear solutions plotted against time

As long as the complexity is below the solution curve, things will be going fine. Then there is a point in time where the complexity surpasses our ability to solve it. Sure, we can add a team, or a new technology, but unless we change the nature of our problem, we are only postponing the inevitable.

This is the root cause of why your user stories should not exceed the sprint boundaries. Scrum forces you to chop the functionality into smaller pieces that keep the team in a phase where linear development power exceeds the complexity of the problem. In practice, in almost every case where we saw a team breaking this rule, they would end up at the “uh-oh moment” at some point in the future, at that stage where there are no neat solutions any more.
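
A toy Python sketch of that crossover; the constants are arbitrary, only the shapes of the curves matter:

K = 3.11  # Glass's law exponent from above

def capacity(t):    # linear solution power, e.g. a steady team
    return 40 * t

def complexity(t):  # polynomially growing problem
    return (2 * t) ** K

for t in range(1, 6):
    status = "fine" if complexity(t) <= capacity(t) else "uh-oh"
    print(t, round(complexity(t)), capacity(t), status)
# t=1: 9 40 fine / t=2: 74 80 fine / t=3: 263 120 uh-oh ... and it never recovers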

So believe in the math and divide your complexity curve into smaller chunks, where your solution capacity exceeds the problem’s complexity. (As a bonus you get a happy and thriving team.)

Systems Engineering in the Enterprise IT Domain

Herding Cats - Glen Alleman - Mon, 12/01/2014 - 06:14

Systems Engineering has two components

  • System - a set of interrelated components working together toward some common objective.
  • Engineering - the application of scientific principles to practical ends; as the design, construction and operation of efficient and economical structures, equipment, and systems.

When we work in the Enterprise IT domain or any Software Intensive Systems ...

...systems engineering is focused on the system as a whole; it emphasizes its total operation. It looks at the system from the outside, that is, at its interactions with other systems and the environment, as well as from the inside. It is concerned not only with the engineering design of the system but also with external factors, which can significantly constrain the design. These include the identification of customer needs, the system operational environment, interfacing systems, logistics support requirements, the capabilities of operating personnel, and such other factors as must be correctly reflected in system requirements documents and accommodated in the system design. [Systems Engineering Principles and Practices, Alexander Kossiakoff, John Wiley & Sons]

So what does this mean in practice?

It means that when we start without knowing what DONE looks like, no method, no technique, no clever process is going to help us discover what DONE looks like, until we spend a pile of money and expend a lot of time trying out various ideas in our search for DONE.

What this means is that emergent requirements mean wandering around looking for what DONE looks like. We need to state DONE in units that connect with Capabilities to fulfill a mission or deliver success for a business case.

What this doesn't mean is that we need the requirements up front. In fact, we may not actually want the requirements up front. If we don't know what DONE means, those requirements must change, and that change costs much more money than writing down what DONE looks like in units of measure meaningful to the decision makers.

So Here Are Some Simple Examples of What a Capability Sounds Like

  • We need the capability to pre-process insurance claims at $0.07 per transaction rather than the current $0.11 per transaction.
  • We need the capability to remove 1½ hours from the retail ordering process once the merger is complete.
  • We need the capability to change the Wide Field Camera and the internal nickel hydride batteries, while doing no harm to the telescope.
  • We need the capability to fly 4 astronauts to the International Space Station, dock, stay 6 months, and return safely.
  • We need the capability to control the Hellfire missile with a new touch panel while maintaining existing navigation and guidance capabilities in the helicopter.
  • We need the capability to comply with FAR Part 15 using the current ERP system and its supporting work processes.

Here's a more detailed example

Identifying System Capabilities is the starting point for any successful program. Systems Capabilities are not direct requirements, but statements of what the system should provide in terms of “abilities.”

For example there are three capabilities needed for the Hubble Robotic Service Mission:

  • Do no harm to the telescope - it is very fragile
  • Change the Wide Field Camera - was built here in Boulder
  • Connect the battery umbilical cable - like our cell phones they wear out

How is this to be done and what are the technical, operational, safety and mission assurance requirements? Don’t really know yet, but the Capabilities guide their development. The critical reason for starting with capabilities is to establish a home for all the requirements.

To answer the questions:

  • Why is this requirement present?
  • Why is this requirement needed?
  • What business or mission value does fulfilling this requirement provide?

Capabilities statements can then be used to define the units of measure for program progress. Measuring progress with physical percent complete at each level is mandatory. But measuring how the Capabilities are being fulfilled is most meaningful to the customer. The “meaningful to the customer” units of measure are critical to the success of any program. Without these measures the program may be a cost, schedule, and technical success but fail to fulfill the mission.

This is the difference between Fit for Purpose and Fit for Use.

The process flow below is the starting point for identifying the Needed Capabilities and determining their priorities. Starting with the Capabilities prevents the “Bottom Up” requirements gathering process from producing a “list” of requirements – all needed – that is missing a well formed topology. This Requirements Architecture is no different than the Technical or Programmatic architecture of the system.

Capabilities Based Planning (CBP) focuses on “outputs” rather than “inputs.”

These “outputs” are the mission capabilities that are fulfilled by the program. Without the capabilities, it is never clear the mission will be a success, because there is no clear and concise description of what success means. Success means providing the needed capabilities, on or near schedule and cost. The concept of CBP recognizes the interdependence of systems, strategy, organization, and support in delivering the capability, and the need to examine options and trade-offs in terms of performance, cost and risk to identify optimum development investments. CBP relies on Use Cases and scenarios to provide the context to measure the level of capability.

Here's One Approach For Capturing the Needed Capabilities

[Process flow diagram]

In Order To Capture These Needed Capabilities We Need To...

[Diagram: steps for capturing the needed capabilities]

What Does All This Mean?

When we hear of all the failures of IT projects, and other projects for that matter, the first question that must be answered is 

What was the root cause of the failure?

Research has shown that unclear, vague, and many times conflicting requirements are the source of confusion about what DONE looks like. In the absence of a definitive description of DONE in units of effectiveness and performance, those requirements have no home to be assessed for their appropriateness. 

Related articles: Estimating Guidance, Complex Project Management, Populist versus Technical View of Problems
Categories: Project Management

SPaMCAST 318 – Rob Cross, Big Data and Data Analytics In Software Development

Software Process and Measurement Cast - Sun, 11/30/2014 - 23:00

Listen to the Software Process and Measurement Cast 318

SPaMCAST 318 features our interview with Rob Cross.  Rob and I discussed his InfoQ article “How to Incorporate Data Analytics into Your Software Process.”  Rob provides ideas on how the theory of big data can be incorporated into big action that provides “ah-ha” moments for executives and developers alike.

Rob Cross has been in the software development industry for over 15 years in various capacities.  He has worked for several start-up businesses including his current company, PSC.  These companies have been focused on providing software quality, security and performance data to organizations leveraging state of the art technologies.  Rob’s current company has analyzed over 8 billion lines of code as an independent software assessment company on products ranging from military systems, medical devices, satellite systems, video games to Wall Street exchanges.

Rob’s email: rc@proservicescorp.com

Call to action!

We are in the middle of a re-read of John Kotter’s classic Leading Change on the Software Process and Measurement Blog.  Are you participating in the re-read? Please feel free to jump in and add your thoughts and comments!

After we finish the current re-read, we will need to decide which book will be next.  We are building a list of the books that have had the most influence on readers of the blog and listeners to the podcast.  Can you answer the question?

What are the two books that have most influenced your career (business, technical or philosophical)?  Send the titles to spamcastinfo@gmail.com.

First, we will compile a list and publish it on the blog.  Second, we will use the list to drive future “Re-read” Saturdays. Re-read Saturday is an exciting new feature that began on the Software Process and Measurement blog on November 8th.  Feel free to choose your platform; send an email, leave a message on the blog or Facebook, or just tweet the list (use hashtag #SPaMCAST)!

Next

Why Are Requirements So Hard To Get Right? IT projects have been around in one form or another since the 1940’s. Looking back in the literature describing the history of IT, the topic of requirements in general and identification of requirements specifically have been top of mind since day one.

Upcoming Events

DCG Webinars:

Agile Risk Management – It Is Still Important
Date: December 18th, 2014
Time: 11:30am EST
Register Now

The Software Process and Measurement Cast has a sponsor.

As many of you know, I do at least one webinar for the IT Metrics and Productivity Institute (ITMPI) every year. The ITMPI provides a great service to the IT profession. ITMPI’s mission is to pull together the expertise and educational efforts of the world’s leading IT thought leaders and to create a single online destination where IT practitioners and executives can meet all of their educational and professional development needs. The ITMPI offers a premium membership that gives members unlimited free access to 400 PDU accredited webinar recordings, and waives the PDU processing fees on all live and recorded webinars. The Software Process and Measurement Cast gets some support if you sign up here. All the revenue our sponsorship generates goes for bandwidth, hosting and new cool equipment to create more and better content for you. Support the SPaMCAST and learn from the ITMPI.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.


Categories: Process Management


Protractor: Angular testing made easy

Google Testing Blog - Sun, 11/30/2014 - 18:50
By Hank Duan, Julie Ralph, and Arif Sukoco in Seattle

Have you worked with WebDriver but been frustrated with all the waits needed for WebDriver to sync with the website, causing flakes and prolonged test times? If you are working with AngularJS apps, then Protractor is the right tool for you.

Protractor (protractortest.org) is an end-to-end test framework specifically for AngularJS apps. It was built by a team in Google and released to open source. Protractor is built on top of WebDriverJS and includes important improvements tailored for AngularJS apps. Here are some of Protractor’s key benefits:

  • You don’t need to add waits or sleeps to your test. Protractor can communicate with your AngularJS app automatically and execute the next step in your test the moment the webpage finishes pending tasks, so you don’t have to worry about waiting for your test and webpage to sync. 
  • It supports Angular-specific locator strategies (e.g., binding, model, repeater) as well as native WebDriver locator strategies (e.g., ID, CSS selector, XPath). This allows you to test Angular-specific elements without any setup effort on your part. 
  • It is easy to set up page objects. Protractor does not execute WebDriver commands until an action is needed (e.g., get, sendKeys, click). This way you can set up page objects so tests can manipulate page elements without touching the HTML. 
  • It uses Jasmine, the framework you use to write AngularJS unit tests, and JavaScript, the same language you use to write AngularJS apps.

Follow these simple steps, and in minutes you will have your first Protractor test running:

1) Set up environment

Install the command line tools ‘protractor’ and ‘webdriver-manager’ using npm:
npm install -g protractor

Start up an instance of a selenium server:
webdriver-manager update & webdriver-manager start

This downloads the necessary binary, and starts a new webdriver session listening on http://localhost:4444.

2) Write your test
// It is a good idea to use page objects to modularize your testing logic
var angularHomepage = {
  nameInput: element(by.model('yourName')),
  greeting: element(by.binding('yourName')),
  get: function() {
    browser.get('index.html');
  },
  setName: function(name) {
    this.nameInput.sendKeys(name);
  }
};

// Here we are using the Jasmine test framework
// See http://jasmine.github.io/2.0/introduction.html for more details
describe('angularjs homepage', function() {
  it('should greet the named user', function() {
    angularHomepage.get();
    angularHomepage.setName('Julie');
    expect(angularHomepage.greeting.getText()).
      toEqual('Hello Julie!');
  });
});

3) Write a Protractor configuration file to specify the environment under which you want your test to run:
exports.config = {
  seleniumAddress: 'http://localhost:4444/wd/hub',

  specs: ['testFolder/*'],

  multiCapabilities: [{
    'browserName': 'chrome',
    // browser-specific tests
    specs: 'chromeTests/*'
  }, {
    'browserName': 'firefox',
    // run tests in parallel
    shardTestFiles: true
  }],

  baseUrl: 'http://www.angularjs.org',
};

4) Run the test:

Start the test with the command:
protractor conf.js

The test output should be:
1 test, 1 assertions, 0 failures


If you want to learn more, here’s a full tutorial that highlights all of Protractor’s features: http://angular.github.io/protractor/#/tutorial

Categories: Testing & QA

Spark: Write to CSV file with header using saveAsFile

Mark Needham - Sun, 11/30/2014 - 09:21

In my last blog post I showed how to write to a single CSV file using Spark and Hadoop, and the next thing I wanted to do was add a header row to the resulting file.

Hadoop’s FileUtil#copyMerge function does take a String parameter but it adds this text to the end of each partition file which isn’t quite what we want.

However, if we copy that function into our own FileUtil class we can restructure it to do what we want:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import java.io.IOException;
 
public class MyFileUtil {
    // Variant of Hadoop's FileUtil#copyMerge: writes the header once, before the
    // partition contents, instead of appending text after each partition file
    public static boolean copyMergeWithHeader(FileSystem srcFS, Path srcDir, FileSystem dstFS, Path dstFile, boolean deleteSource, Configuration conf, String header) throws IOException {
        dstFile = checkDest(srcDir.getName(), dstFS, dstFile, false);
        if(!srcFS.getFileStatus(srcDir).isDir()) {
            return false;
        } else {
            FSDataOutputStream out = dstFS.create(dstFile);
            if(header != null) {
                out.write((header + "\n").getBytes("UTF-8"));
            }

            try {
                // Stream every partition file in the source directory into the single output file
                FileStatus[] contents = srcFS.listStatus(srcDir);

                for(int i = 0; i < contents.length; ++i) {
                    if(!contents[i].isDir()) {
                        FSDataInputStream in = srcFS.open(contents[i].getPath());

                        try {
                            IOUtils.copyBytes(in, out, conf, false);

                        } finally {
                            in.close();
                        }
                    }
                }
            } finally {
                out.close();
            }

            return deleteSource?srcFS.delete(srcDir, true):true;
        }
    }

    // Lifted from Hadoop's FileUtil: resolves the destination path and refuses to
    // overwrite an existing file unless asked to
    private static Path checkDest(String srcName, FileSystem dstFS, Path dst, boolean overwrite) throws IOException {
        if(dstFS.exists(dst)) {
            FileStatus sdst = dstFS.getFileStatus(dst);
            if(sdst.isDir()) {
                if(null == srcName) {
                    throw new IOException("Target " + dst + " is a directory");
                }

                return checkDest((String)null, dstFS, new Path(dst, srcName), overwrite);
            }

            if(!overwrite) {
                throw new IOException("Target " + dst + " already exists");
            }
        }
        return dst;
    }
}

We can then update our merge function to call this instead:

def merge(srcPath: String, dstPath: String, header:String): Unit =  {
  val hadoopConfig = new Configuration()
  val hdfs = FileSystem.get(hadoopConfig)
  MyFileUtil.copyMergeWithHeader(hdfs, new Path(srcPath), hdfs, new Path(dstPath), false, hadoopConfig, header)
}

We call merge from our code like this:

merge(file, destinationFile, "type,count")

I wasn’t sure how to import my Java based class into the Spark shell so I compiled the code into a JAR and submitted it as a job instead:

$ sbt package
[info] Loading global plugins from /Users/markneedham/.sbt/0.13/plugins
[info] Loading project definition from /Users/markneedham/projects/spark-play/playground/project
[info] Set current project to playground (in build file:/Users/markneedham/projects/spark-play/playground/)
[info] Compiling 3 Scala sources to /Users/markneedham/projects/spark-play/playground/target/scala-2.10/classes...
[info] Packaging /Users/markneedham/projects/spark-play/playground/target/scala-2.10/playground_2.10-1.0.jar ...
[info] Done packaging.
[success] Total time: 8 s, completed 30-Nov-2014 08:12:26
 
$ time ./bin/spark-submit --class "WriteToCsvWithHeader" --master local[4] /path/to/playground/target/scala-2.10/playground_2.10-1.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.propertie
...
14/11/30 08:16:15 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
14/11/30 08:16:15 INFO SparkContext: Job finished: saveAsTextFile at WriteToCsvWithHeader.scala:49, took 0.589036 s
 
real	0m13.061s
user	0m38.977s
sys	0m3.393s

And if we look at our destination file:

$ cat /tmp/singlePrimaryTypes.csv
type,count
THEFT,859197
BATTERY,757530
NARCOTICS,489528
CRIMINAL DAMAGE,488209
BURGLARY,257310
OTHER OFFENSE,253964
ASSAULT,247386
MOTOR VEHICLE THEFT,197404
ROBBERY,157706
DECEPTIVE PRACTICE,137538
CRIMINAL TRESPASS,124974
PROSTITUTION,47245
WEAPONS VIOLATION,40361
PUBLIC PEACE VIOLATION,31585
OFFENSE INVOLVING CHILDREN,26524
CRIM SEXUAL ASSAULT,14788
SEX OFFENSE,14283
GAMBLING,10632
LIQUOR LAW VIOLATION,8847
ARSON,6443
INTERFERE WITH PUBLIC OFFICER,5178
HOMICIDE,4846
KIDNAPPING,3585
INTERFERENCE WITH PUBLIC OFFICER,3147
INTIMIDATION,2471
STALKING,1985
OFFENSES INVOLVING CHILDREN,355
OBSCENITY,219
PUBLIC INDECENCY,86
OTHER NARCOTIC VIOLATION,80
RITUALISM,12
NON-CRIMINAL,12
OTHER OFFENSE ,6
NON - CRIMINAL,2
NON-CRIMINAL (SUBJECT SPECIFIED),2

Happy days!

The code is available as a gist if you want to see all the details.

Categories: Programming

Spark: Write to CSV file

Mark Needham - Sun, 11/30/2014 - 08:40

A couple of weeks ago I wrote about how I’d been using Spark to explore a City of Chicago crime data set. Having worked out how many of each crime had been committed, I wanted to write that to a CSV file.

Spark provides a saveAsTextFile function which allows us to save RDDs, so I refactored my code into the following format to allow me to use it:

import au.com.bytecode.opencsv.CSVParser
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._
import org.apache.hadoop.fs.FileUtil // used below to clear the output path
import java.io.File
 
def dropHeader(data: RDD[String]): RDD[String] = {
  data.mapPartitionsWithIndex((idx, lines) => {
    if (idx == 0) {
      lines.drop(1) // skip the header line, which lives in the first partition
    }
    lines
  })
}
 
// https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
val crimeFile = "/Users/markneedham/Downloads/Crimes_-_2001_to_present.csv"
 
val crimeData = sc.textFile(crimeFile).cache()
val withoutHeader: RDD[String] = dropHeader(crimeData)
 
val file = "/tmp/primaryTypes.csv"
FileUtil.fullyDelete(new File(file))
 
val partitions: RDD[(String, Int)] = withoutHeader.mapPartitions(lines => {
  val parser = new CSVParser(',')
  lines.map(line => {
    val columns = parser.parseLine(line)
    (columns(5), 1) // column 5 holds the primary crime type
  })
})
 
val counts = partitions.
  reduceByKey {case (x,y) => x + y}.
  sortBy {case (key, value) => -value}.
  map { case (key, value) => Array(key, value).mkString(",") }
 
counts.saveAsTextFile(file)

If we run that code from the Spark shell we end up with a folder called /tmp/primaryTypes.csv containing multiple part files:

$ ls -lah /tmp/primaryTypes.csv/
total 496
drwxr-xr-x  66 markneedham  wheel   2.2K 30 Nov 07:17 .
drwxrwxrwt  80 root         wheel   2.7K 30 Nov 07:16 ..
-rw-r--r--   1 markneedham  wheel     8B 30 Nov 07:16 ._SUCCESS.crc
-rw-r--r--   1 markneedham  wheel    12B 30 Nov 07:16 .part-00000.crc
-rw-r--r--   1 markneedham  wheel    12B 30 Nov 07:16 .part-00001.crc
-rw-r--r--   1 markneedham  wheel    12B 30 Nov 07:16 .part-00002.crc
-rw-r--r--   1 markneedham  wheel    12B 30 Nov 07:16 .part-00003.crc
...
-rwxrwxrwx   1 markneedham  wheel     0B 30 Nov 07:16 _SUCCESS
-rwxrwxrwx   1 markneedham  wheel    28B 30 Nov 07:16 part-00000
-rwxrwxrwx   1 markneedham  wheel    17B 30 Nov 07:16 part-00001
-rwxrwxrwx   1 markneedham  wheel    23B 30 Nov 07:16 part-00002
-rwxrwxrwx   1 markneedham  wheel    16B 30 Nov 07:16 part-00003
...

If we look at some of those part files we can see that it’s written the crime types and counts as expected:

$ cat /tmp/primaryTypes.csv/part-00000
THEFT,859197
BATTERY,757530
 
$ cat /tmp/primaryTypes.csv/part-00003
BURGLARY,257310

This is fine if we’re going to pass those CSV files into another Hadoop-based job, but I actually want a single CSV file, so it’s not quite what I want.

One way to achieve this is to force everything to be calculated on one partition which will mean we only get one part file generated:

val counts = partitions.repartition(1).
  reduceByKey {case (x,y) => x + y}.
  sortBy {case (key, value) => -value}.
  map { case (key, value) => Array(key, value).mkString(",") }
 
 
counts.saveAsTextFile(file)

part-00000 now looks like this:

$ cat !$
cat /tmp/primaryTypes.csv/part-00000
THEFT,859197
BATTERY,757530
NARCOTICS,489528
CRIMINAL DAMAGE,488209
BURGLARY,257310
OTHER OFFENSE,253964
ASSAULT,247386
MOTOR VEHICLE THEFT,197404
ROBBERY,157706
DECEPTIVE PRACTICE,137538
CRIMINAL TRESPASS,124974
PROSTITUTION,47245
WEAPONS VIOLATION,40361
PUBLIC PEACE VIOLATION,31585
OFFENSE INVOLVING CHILDREN,26524
CRIM SEXUAL ASSAULT,14788
SEX OFFENSE,14283
GAMBLING,10632
LIQUOR LAW VIOLATION,8847
ARSON,6443
INTERFERE WITH PUBLIC OFFICER,5178
HOMICIDE,4846
KIDNAPPING,3585
INTERFERENCE WITH PUBLIC OFFICER,3147
INTIMIDATION,2471
STALKING,1985
OFFENSES INVOLVING CHILDREN,355
OBSCENITY,219
PUBLIC INDECENCY,86
OTHER NARCOTIC VIOLATION,80
NON-CRIMINAL,12
RITUALISM,12
OTHER OFFENSE ,6
NON - CRIMINAL,2
NON-CRIMINAL (SUBJECT SPECIFIED),2

This works but it’s quite a bit slower than when we were doing the aggregation across partitions so it’s not ideal.

Instead, what we can do is make use of one of Hadoop’s merge functions which squashes part files together into a single file.

First we import Hadoop into our SBT file:

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.5.2"

Now let’s bring our merge function into the Spark shell:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._
 
def merge(srcPath: String, dstPath: String): Unit =  {
  val hadoopConfig = new Configuration()
  val hdfs = FileSystem.get(hadoopConfig)
  // copyMerge concatenates every part file under srcPath into the single dstPath file
  FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), false, hadoopConfig, null)
}

And now let’s make use of it:

val file = "/tmp/primaryTypes.csv"
FileUtil.fullyDelete(new File(file))
 
val destinationFile= "/tmp/singlePrimaryTypes.csv"
FileUtil.fullyDelete(new File(destinationFile))
 
val counts = partitions.
  reduceByKey {case (x,y) => x + y}.
  sortBy {case (key, value) => -value}.
  map { case (key, value) => Array(key, value).mkString(",") }
 
counts.saveAsTextFile(file)
 
merge(file, destinationFile)

And now we’ve got the best of both worlds:

$ cat /tmp/singlePrimaryTypes.csv
THEFT,859197
BATTERY,757530
NARCOTICS,489528
CRIMINAL DAMAGE,488209
BURGLARY,257310
OTHER OFFENSE,253964
ASSAULT,247386
MOTOR VEHICLE THEFT,197404
ROBBERY,157706
DECEPTIVE PRACTICE,137538
CRIMINAL TRESPASS,124974
PROSTITUTION,47245
WEAPONS VIOLATION,40361
PUBLIC PEACE VIOLATION,31585
OFFENSE INVOLVING CHILDREN,26524
CRIM SEXUAL ASSAULT,14788
SEX OFFENSE,14283
GAMBLING,10632
LIQUOR LAW VIOLATION,8847
ARSON,6443
INTERFERE WITH PUBLIC OFFICER,5178
HOMICIDE,4846
KIDNAPPING,3585
INTERFERENCE WITH PUBLIC OFFICER,3147
INTIMIDATION,2471
STALKING,1985
OFFENSES INVOLVING CHILDREN,355
OBSCENITY,219
PUBLIC INDECENCY,86
OTHER NARCOTIC VIOLATION,80
RITUALISM,12
NON-CRIMINAL,12
OTHER OFFENSE ,6
NON - CRIMINAL,2
NON-CRIMINAL (SUBJECT SPECIFIED),2

The full code is available as a gist if you want to play around with it.

Categories: Programming

Kanban: Process Improvement and Bottlenecks Revisited

Bottlenecks constrain flow!

We are revisiting one of the more popular essays from 2013 and will return to Re-read Saturday next week with Chapter 4, “Creating A Guiding Coalition.”

Kanban implementation is a powerful tool to focus the continuous improvement efforts of teams and organizations on delivering more value.  Kanban, through the visualization of work in progress, helps to identify constraints (this is an implementation of the Theory of Constraints).  Add to that visualization the core principles discussed in Daily Process Thoughts, Kanban: An Overview and Introduction (feedback loops and transparency to regulate the process, an impetus to improve as a team, and evolution using models and the scientific method) and we have a process improvement engine.

Kanban describes units of work that are blocked or stalled as bottlenecks.  Finding and removing bottlenecks increases the flow of work through the process, therefore increasing the delivery of value.

A perfect example of a bottleneck exists in the highway system in Cleveland, Ohio (the closest major city to my home).  A highway (three lanes in each direction) sweeps into town along the shore of Lake Erie.  When it reaches the edge of downtown, the highway makes a nearly 90-degree left-hand turn.  The turn is known as Dead Man’s Curve.  Instantly, cars and trucks must slow down.  Even when there is no accident, the traffic can back up for miles during rush hour.  The turn is a constraint that creates a bottleneck.  If the city wanted to improve the flow of traffic, removing the Dead Man’s Curve bottleneck would help substantially.

Here’s an IT example to see how a bottleneck is identified and how a team could attack the bottleneck. We will use a simple Kanban board.  In this example, the team has a backlog of similarly sized units of work.  Each step of the process has a WIP limit.  One of the core practices in Kanban is that WIP limits are not to be systematically violated.

Each step can have different WIP limits.

As work is drawn through the process, there will be a bottleneck as soon as the analysis for the first wave of work is completed, because development only has the capacity to start work on four items. In our example of an application of Kanban, when a unit of work completes the analysis step it will be pulled into the development step only if capacity exists.  In this case one unit of work is immediately blocked and becomes inventory (shown below as the item marked with the letter “B”).

Unbalanced process flows cause bottlenecks

The team has three basic options.  The first is to continue to pull more items into the analysis step and build inventory until the backlog is empty.  This option creates a backlog of work that is waiting for feedback, increasing the potential rework as defects are found and new decisions are made.  The second possibility is that team members swarm to the blocked unit and add capacity to a step until the blocked unit is freed.  This solution makes sense if the reason for the blockage is temporary, like a developer who is out sick.  The third (and preferred) option is to change the process to create a balanced flow of work.  In this example, the goal would be to rearrange people and tools to create balanced WIP limits.
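
A deliberately crude Python sketch of the first option (the WIP limits and batch sizes are invented, and development is assumed to clear its WIP every cycle) shows how blocked inventory piles up in front of the smaller step:

analysis_wip, dev_wip = 5, 4  # invented WIP limits for the two steps
backlog, blocked = 20, 0

for cycle in range(1, 5):
    finished = min(analysis_wip, backlog)      # analysis completes a wave of work
    backlog -= finished
    pulled = min(finished + blocked, dev_wip)  # development pulls only what fits its WIP limit
    blocked = finished + blocked - pulled      # the remainder sits as blocked inventory
    print(f"cycle {cycle}: development pulled {pulled}, blocked inventory {blocked}")
# blocked inventory grows by one item every cycle until the flow is balanced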

Process improvement maximizes throughput.

Visually tracking work is a powerful tool for identifying bottlenecks.  Kanban’s core practices dissuade practitioners from violating WIP limits because exceeding them stresses the process, which leads to technical debt, defects and rework. Other core practices provide a focus on continuous process improvement so that when a bottleneck is identified, the team works to remove it.  Continually improving the flow of work through the development process increases an organization’s ability to deliver value to customers.

 


Categories: Process Management

Estimating Guidance - Updated

Herding Cats - Glen Alleman - Sat, 11/29/2014 - 20:29

There is an abundance of estimating guidance to counter the abundance of ill-informed notions about estimating. Here are some sources we use on our programs:

The list goes on for hundreds of other sources; Google "software cost estimating." But here's the core issue, from the opening line in the Welcome section of Software Estimation: Demystifying the Black Art by Steve McConnell:

The most unsuccessful three years in the education of cost estimators appears to be fifth-grade arithmetic - Norman R. Augustine

Augustine is former Chairman and CEO of Martin Marietta. His seminal book Augustine's Laws describes the complexities and conundrums of today's business management and offers solutions. Anyone interested in learning how successful management of complex technology-based firms is done should read that book.

All Project Processes Driven By Uncertainty

The hope that uncertainty can be "programmed" out of a project is a false hope. However, we can manage in the presence of these uncertainties by understanding the risk they represent, and addressing each in an appropriate manner. In Against the Gods: The Remarkable Story of Risk, author Peter Bernstein states that one of the major intellectual triumphs of the modern world is the transformation of uncertainty from a matter of fate to an area of study. And so, risk analysis is the process of assessing risks, while risk management uses risk analysis to devise management strategies to reduce or ameliorate risk.

Estimating the outcomes of our choices - the opportunity cost paradigm of Microeconomics - is an integral part of managing in the presence of uncertainty. To successfully develop a credible estimate we need to identify and address four types of uncertainty on projects (a small simulation sketch follows the list):

  1. Normal variations occur in the completion of tasks, arising from normal work processes. Deming has shown that these uncertainties are just part of the process, and attempts to control them, plan around them, or otherwise remove them are a waste of time. Mitigations for these normal variations include fine-grained assessment points in the plan verifying progress. The assessment of these activities should be done in a 0% or 100% manner. Buffers and schedule margin are inserted in front of the critical activities to protect against their slippage. Statistical process control approaches forecast further slippage.
  2. Foreseen uncertainties are identified but have uncertain influences. Mitigations for these foreseen uncertainties are done by creating contingent paths forward in the plan. These on-ramp and off-ramp points can be taken if needed.
  3. Unforeseen uncertainties are events that can’t be identified in the planning process. When these unforeseen uncertainties appear, new approaches must be developed.
  4. Chaos appears when the basic structure of the project becomes unstable, with no ability to forecast its occurrence or the uncertainties it produces. In the presence of chaos, continuous verification of the project’s strategy is needed. Major iterations of deliverables can isolate these significant disruptions.
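
For the first category, normal variation, a minimal Monte Carlo sketch (illustrative task numbers only) shows how three-point estimates roll up into a schedule estimate with confidence levels:

import random

# Three-point estimates per task: (best case, most likely, worst case) in days
tasks = [(3, 5, 10), (2, 4, 9), (5, 8, 15)]

trials = 10_000
totals = sorted(
    sum(random.triangular(low, high, mode) for low, mode, high in tasks)
    for _ in range(trials)
)

print("50% confidence:", round(totals[trials // 2], 1), "days")
print("80% confidence:", round(totals[int(trials * 0.8)], 1), "days")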

Managing in the Presence of Uncertainty

Uncertainty management is essential for any significant project. Certain information about key project cost, performance, and schedule attributes is often unknown until the project is underway. Risks emerging from these uncertainties that can be identified early in the project but impact it later are often termed “known unknowns.” These risks can be mitigated with a good risk management process. For risks that are beyond the vision of the project team, a properly implemented risk management process can also rapidly quantify a risk's impact and provide sound plans for mitigating its effect.

Uncertainty and the resulting risk management is concerned with the outcome of future events, whose exact outcome is unknown, and with how to deal with these uncertainties. Outcomes are categorized as favorable or unfavorable, and risk management is the art and science of planning, assessing, handling, and monitoring future events to ensure favorable outcomes. A good risk management process is proactive and fundamentally different than issue management or problem solving, which is reactive.

Risk management is an important skill applied to a wide variety of projects. In an era of downsizing, consolidation, shrinking budgets, increasing technological sophistication, and shorter development times, risk management provides valuable insight to help key project personnel plan for risks, alert them to potential issues, analyze these issues, and develop, implement, and monitor plans to address risks long before they surface as issues and adversely affect project cost, performance, and schedule.

Project management in the presence of uncertainty, and the risks this creates, requires - actually mandates - estimating the outcomes of these uncertainties. As Tim Lister advises in "Risk Management Is Project Management for Adults," IEEE Software, May 1997:

Risk Management is Project Management for Adults

In the End

So those conjecturing that software estimating can't be done have either missed that 5th grade class or are intentionally ignoring the basis of all business decision making processes - the assessment of opportunity costs using Microeconomics.

As DeMarco and Lister state:

An almost-defining characteristic of adulthood is a willingness to confront the unpleasantness of life, from the niggling to the cataclysmic.

Related articles: Assessing Value Produced By Investments, Mike Cohn's Agile Quotes, Complex Project Management, Software Estimating for Non Trivial Projects, Estimating Guidance, Software Estimation in an Agile World
Categories: Project Management

(Edu) Scrum at XP Days Benelux: beware of the next generation

Xebia Blog - Sat, 11/29/2014 - 09:21

XP Days Benelux 2014 is over, and it was excellent.
Good sessions, interesting mix of topics and presenters, and a wonderful atmosphere of knowledge sharing, respect and passion for Agile.

After 12 years, XP Days Benelux continues to be inspiring and surprising.

The greatest surprise for me was the participation of 12 high school students from the Valuas College in Venlo, who arrived on the second day. These youngsters not only attended the conference, but actually hosted a 120-minute session on Scrum at school, called EduScrum.


EduScrum

EduScrum uses the ceremonies, roles and artifacts of Scrum to help young people learn in a better way. Students work together in small teams, and thus take ownership of their own learning process. At the Valuas College, two enthusiastic Chemistry teachers introduced EduScrum in their department two years ago, and have made the switch to teaching Chemistry in this new way.

In an interactive session, we, the adults, learned from the youngsters how they work and what EduScrum brought them. They showed their (foldable!) Scrum boards, explained how their teams are formed, and what the impact was on their study results. Forcing themselves to speak English, they were open, honest, courageous and admirable.


Learnings

Doing Scrum in school has many similarities with doing Scrum at work. However, there is also a lot we can learn from the youngsters. These are my main takeaways:

- Transition is hard
It took the students some time to get used to working in the new way. At first they thought it was awkward. The transition took about… 4 lessons. That means that these youngsters were up and running with Scrum in 2 weeks (!).

- Inform your stakeholders
When the teachers introduced Scrum, they did not inform their main stakeholders, the parents. Some parents, therefore, were quite worried about this strange thing happening at school. However, after some explanations, the parents recognised that EduScrum actually helps to prepare their children for today’s society and were happy with the process.

- Results count
In schools more than anywhere else, your results (grades) count. EduScrum students are graded as a team as well as individually. When they transitioned to Scrum the students experienced a drop in their grades at first, maybe due to the greater freedom and responsibility they had to get used to. Soon after, their grades got better.

- Compliance is important
Schools and teachers have to comply with many rules and regulations. The knowledge that needs to get acquired each year is quite fixed. However, with EduScrum the students decide how they will acquire that knowledge.

- Scrum teaches you to cooperate
Not surprisingly, all students said that, next to Chemistry, they now learned to cooperate and communicate better. Because of this teamwork, most students like to work this way. However, this is also the reason a few classmates would like to return to the old, individual, style of learning. Teamwork does not suit everyone.

- Having fun helps you to work better
School (and work) should not be boring, and we work better together when we have some fun too. Therefore, next to a Definition of Done, the student teams also have a Definition of Fun.  :-)

Next generation Scrum

At the conference, the youngsters were surprised to see that so many companies that they know personally (like Bol.com) are actually doing Scrum. ‘I thought this was just something I learned to do in school’, one girl said. ‘But now I see that it is being used in so many companies and I will actually be able to use it after school, too.’

Beware of these youngsters. When this generation enters the workforce, they will embrace Scrum as the natural way of working. In fact, this generation is going to take Scrum to the next level.

Cause and Effect and Root Cause Analysis

Fishing for the root cause.

When something out of the ordinary happens, one of the first questions asked is “why?” You can use Root Cause Analysis to find the link between an effect and a cause. Finding the link and the ultimate cause matters if you want to avoid, or replicate, the event in the future. Deming suggested that there are two macro categories of root causes: common causes and special causes. Each needs to be approached differently.

All processes have variation in how they are performed and in the results they deliver. For example, the process of a stand-up meeting is not performed exactly the same way every day, causing minor variation in duration and potentially in the information shared. Common cause variation is a reflection of the capacity (or capability) of the process. In order to reduce the variance or to increase the capability, changes would have to be made to the process(es), the people or the physical environment. Lasting change requires change to the overall system. Inspecting each variance individually will rarely generate systemic changes. For example, if a team consistently did not finish one (out of some number) of the stories that were committed in a sprint, determining the root cause of each miss individually would be far less effective than looking at the pattern as a whole. Techniques like the “five whys” or fishbone diagramming are useful for taking a systems view, which is at the core of tackling common cause variation.

Even in systems that are under statistical process control (a method of quality control which uses statistical methods to monitor and control processes to ensure they are operating at peak performance), events occur which generate performance that is substantially better or worse than the normal capability of the system or process. These swings in performance are generally the result of special causes. Special causes are by definition out of the norm, and action needs to be taken to understand what has occurred. Generally it makes sense, when a special cause event occurs, to address the special case and not the overall system. For example, last year I observed a team that had failed to complete any of the stories they had committed to during a sprint. The performance was due to a severe storm that had left the region without power for nearly a week. Whipping the team or changing the Agile process to account for a storm would have less long-term impact than changing the environment or perhaps buying a generator.
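
A minimal Python sketch of that statistical process control idea, with invented stand-up durations: derive control limits from the process's demonstrated capability, then treat only points outside the 3-sigma band as special causes:

import statistics

baseline = [14, 16, 15, 13, 17, 15, 14, 16]  # in-control stand-up durations, in minutes
mean = statistics.mean(baseline)
sigma = statistics.stdev(baseline)

for minutes in (15, 41, 16):  # new observations
    if abs(minutes - mean) > 3 * sigma:
        print(minutes, "min: special cause - investigate the event itself")
    else:
        print(minutes, "min: common cause - change the system, not the data point")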

Finding the cause of an effect is an important skill; otherwise history, in the form of problems and issues, will tend to repeat itself.  Understanding the difference between common and special cause variance allows teams to decide on the approach to problem solving that has the best chance of identifying the root causes of the issue.


Categories: Process Management

Bad Estimation Is a Systematic Problem

Herding Cats - Glen Alleman - Fri, 11/28/2014 - 16:53

Vasco Duarte posted on twitter a quote (without link) ...

Bad estimation is a systematic problem, not an individual failure ...

The chart below is from a much larger briefing on Essential Views needed to increase the probability of success in the software intensive systems domain. Vasco is correct. The question is what to do about it.

[Chart from the Essential Views briefing]

So the question, for those ACAT 1 Programs ($6B and above) and for every other domain where cost and schedule overage is common, is: what's the approach? In our domain we have approaches. One is below; there are others.

But it would seem that "not estimating" is an unlikely candidate for addressing poor estimating.

[Chart: one approach to addressing poor estimating, from the briefing]

Related articles: Assessing Value Produced By Investments, Software Estimating for Non Trivial Projects
Categories: Project Management

Being Thankful!

I am thankful for my WHOLE family (even the part not in the picture)!

The fourth Thursday of November is Thanksgiving in the United States. Thanksgiving is a traditional harvest festival in the United States and Canada (on different dates), although most cultures have a celebration to give thanks for the bounty of the harvest of the land around them. As with many holidays, Thanksgiving provides a time for reflection about our lives and the lives of those around us. While I hope we are always thankful, there are times when it is important to actively remember what we have to be thankful for, and perhaps even testify just a bit.

I am thankful for:

  • For my whole family and their families and their families’ families!
  • For my dog and my cats (and that they don’t fight too much)
  • For parents and people that are nearly parents
  • For my friends
  • For everyone that takes or has taken care of me
  • For everyone that has sacrificed so I can be who I am . . . whether they know me or not
  • For the world around me
  • For the problems I face (and for the solutions to those problems)
  • For sand between my toes and the occasional rock in my shoe
  • For the ability to think, smile and laugh
  • For the internet
  • For science and things that are not exactly science
  • For the readers of my blog
  • For the listeners to my podcast
  • For the fact that not everyone agrees with me
  • For the fact that at least a few people do agree with me
  • For mostly rational people and for the guy on the corner that does silly things occasionally
  • For problems to think about
  • For sunrises and sunsets
  • For the 24 hours in a day (even though sometimes I want just a bit more time)

There are probably lots of other things to be thankful for that have slipped my mind; I do not think I am alone in this predicament. Perhaps taking time on a daily basis to reflect on what we are thankful for, rather than just on specific days like Thanksgiving, would make the world a less stressful, angry and bitter place. But even if reflecting on what you are thankful for could not achieve this lofty goal, perhaps it could add one extra smile to your day. A smile that you could share with someone else.


Categories: Process Management

Knowledge Transfer and Validation

Software Requirements Blog - Seilevel.com - Thu, 11/27/2014 - 17:00
I’ve just wrapped up a 2½-year consulting engagement. Typically, when I leave a client, I have been on a project and my focus is on completing as many action items as possible and making sure I have transitioned the remaining action items to someone. My team members generally already understand the project. This […]
Categories: Requirements

Take the Harder Road

Making the Complex Simple - John Sonmez - Thu, 11/27/2014 - 16:00

In this video, I talk about why it is usually a good idea to take the harder road, when faced with two choices and also how important it is to specialize.

The post Take the Harder Road appeared first on Simple Programmer.

Categories: Programming

Docker/Neo4j: Port forwarding on Mac OS X not working

Mark Needham - Thu, 11/27/2014 - 13:28

Prompted by Ognjen Bubalo’s excellent blog post I thought it was about time I tried running Neo4j in a Docker container on my MacBook Pro to make it easier to play around with different data sets.

I got the container up and running by following Ognjen’s instructions and had the following ports forwarded to my host machine:

$ docker ps
CONTAINER ID        IMAGE                 COMMAND                CREATED             STATUS              PORTS                                              NAMES
c62f8601e557        tpires/neo4j:latest   "/bin/bash -c /launc   About an hour ago   Up About an hour    0.0.0.0:49153->1337/tcp, 0.0.0.0:49154->7474/tcp   neo4j

This should allow me to access Neo4j on port 49154 but when I tried to access that host:port pair I got a connection refused message:

$ curl -v http://localhost:49154
* Adding handle: conn: 0x7ff369803a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7ff369803a00) send_pipe: 1, recv_pipe: 0
* About to connect() to localhost port 49154 (#0)
*   Trying ::1...
*   Trying 127.0.0.1...
*   Trying fe80::1...
* Failed connect to localhost:49154; Connection refused
* Closing connection 0
curl: (7) Failed connect to localhost:49154; Connection refused

My first thought was that maybe Neo4j hadn’t started up correctly inside the container so I checked the logs:

$ docker logs --tail=10 c62f8601e557
10:59:12.994 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@2edfbe28{/webadmin,jar:file:/usr/share/neo4j/system/lib/neo4j-server-2.1.5-static-web.jar!/webadmin-html,AVAILABLE}
10:59:13.449 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@192efb4e{/db/manage,null,AVAILABLE}
10:59:13.699 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@7e94c035{/db/data,null,AVAILABLE}
10:59:13.714 [main] INFO  o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /browser, did not find org.apache.jasper.servlet.JspServlet
10:59:13.715 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@3e84ae71{/browser,jar:file:/usr/share/neo4j/system/lib/neo4j-browser-2.1.5.jar!/browser,AVAILABLE}
10:59:13.807 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@4b6690b1{/,null,AVAILABLE}
10:59:13.819 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@495350f0{HTTP/1.1}{c62f8601e557:7474}
10:59:13.900 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@23ad0c5a{SSL-HTTP/1.1}{c62f8601e557:7473}
2014-11-27 10:59:13.901+0000 INFO  [API] Server started on: http://c62f8601e557:7474/
2014-11-27 10:59:13.902+0000 INFO  [API] Remote interface ready and available at [http://c62f8601e557:7474/]

Nope! It’s up and running perfectly fine, which suggested the problem was with port forwarding.

I eventually found my way to Chris Jones’ ‘How to use Docker on OS X: The Missing Guide‘ which explained the problem:

The Problem: Docker forwards ports from the container to the host, which is boot2docker, not OS X.

The Solution: Use the VM’s IP address.

So to access Neo4j on my machine I need to use the VM’s IP address rather than localhost. We can get the VM’s IP address like so:

$ boot2docker ip
 
The VM's Host only interface IP address is: 192.168.59.103

Let’s strip out that surrounding text though:

$ boot2docker ip 2> /dev/null
192.168.59.103

Now if we cURL using that IP instead:

$ curl -v http://192.168.59.103:49154
* About to connect() to 192.168.59.103 port 49154 (#0)
*   Trying 192.168.59.103...
* Adding handle: conn: 0x7fd794003a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fd794003a00) send_pipe: 1, recv_pipe: 0
* Connected to 192.168.59.103 (192.168.59.103) port 49154 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.30.0
> Host: 192.168.59.103:49154
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Access-Control-Allow-Origin: *
< Content-Length: 112
* Server Jetty(9.0.5.v20130815) is not blacklisted
< Server: Jetty(9.0.5.v20130815)
<
{
  "management" : "http://192.168.59.103:49154/db/manage/",
  "data" : "http://192.168.59.103:49154/db/data/"
* Connection #0 to host 192.168.59.103 left intact

Happy days!
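
As a small convenience, the two steps can be combined so the VM's IP address is never hard-coded. This is my own sketch, assuming the same forwarded port 49154 that docker ps reported above:

$ curl "http://$(boot2docker ip 2> /dev/null):49154"

The same substitution works for the other endpoints too, e.g. by appending /db/data/ to the URL.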

Chris has solutions to lots of other common problems people come across when using Docker with Mac OS X so it’s worth having a flick through his post.

Categories: Programming

Finding Cause and Effect Using “Five Whys”

Just asking why five times will make you seem like a five-year-old


The “Five Whys” is an iterative approach to identifying cause and effect. The technique provides a structured approach to breaking through surface thinking or intellectual smoke screens in order to get to the real issue. The technique is based on the belief that the first answer to any question is not the root cause, but rather an overly rationalized version.

The tools:

  1. Whiteboard/Flipchart and dry erase markers (for group sessions), or
  2. Paper and pencil (for sessions where taking notes is more appropriate)

The process (team version):

  1. Write down the problem you are trying to investigate.
  2. Ask why the problem occurred and write the answer down.
  3. Using the answer provided ask why again and write the answer down.
  4. Continue until the team agrees that you have exposed a root cause.
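
Purely as an illustration - the real technique is a conversation, not a program - the loop above can be sketched as a tiny shell script that records the chain of answers. The filename and prompts are my own invention:

# five-whys.sh - a toy note-taker for the iterative questioning loop (illustrative only)
printf "Problem under investigation: "
read -r PROBLEM
echo "Problem: $PROBLEM" > five-whys-notes.txt
for i in 1 2 3 4 5; do
    # When asking aloud, add context from the previous answer (see note 3 below)
    printf "Why (%s)? " "$i"
    read -r ANSWER
    echo "Why $i: $ANSWER" >> five-whys-notes.txt
done
cat five-whys-notes.txt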

Notes for facilitating using the “Five Whys”:

  1. Generally you can tell you are approaching a root cause when you begin to expose emotion.
  2. Five is not a magic number. Root causes can be exposed by asking with fewer or more whys. If you are well past five whys and have not exposed a root cause, break down the problem into smaller chunks or restate the problem to make sure everyone understands what is being discussed.
  3. Just asking why five times will make you seem like a five-year-old. Add context to the why using the previous answer.
  4. If you are using the technique and participants get frustrated with the pattern of questioning, change tactics.
  5. Sometimes “Five Whats” can be substituted for the “Five Whys.” I often use “what” when I am trying to establish a chain of events before looking at or discussing why something occurred.

Example:

Problem: The team is still hungry after a “lunch and learn” session.

Why 1 – Why is everyone hungry?

Answer: No one ate during the session.

Why 2 – Why didn’t team members bring their own lunch?

Answer: Lunch was promised so no one brought their own lunch.

Why 3 – Why wasn’t the promised lunch delivered at the session?

Answer: The pizzas were delivered after the session.

Why 4 – Why were the pizzas late?

Answer: Joe ordered the pizzas a “little” late.

Why 5 – Why did Joe order the pizzas late?

Answer: Sid, the team leader, could not find the corporate credit card when the order was supposed to be made.

(Note: this is only sort of fictional)

The act of iteratively asking questions is a trick to peel back the layers until a real (or at least real-er) answer is exposed. Finding the root cause of an issue makes it more possible to solve the problem. Without getting a handle on the root cause, it is possible to spend precious time and effort on a solution that won’t deliver results.


Categories: Process Management

Complex Project Management

Herding Cats - Glen Alleman - Wed, 11/26/2014 - 19:53

[Book cover: Effective Complex Project Management] There is much confusion in the domain of project management, and especially in software projects, between complexity, complex, and complicated. Wikipedia definitions almost always fall short.

The book on the left is the latest addition to this topic in the domain of agile software development. The book is based on an Adaptive Complex Project Framework. The notion - a naive notion - that complexity can be reduced and complex systems should be avoided is just that: notional. In practice complex systems can't be avoided in any business or technical domain where mission-critical systems exist. That is, non-trivial systems are complex. 

These systems include System of Systems, Enterprise Systems, Federated Systems, and systems in which interaction with other systems is needed to accomplish the mission or business goal.

The book emerged from Capitalizing on Complexity, a 2010 IBM study of 1,541 executives in 60 countries about preparedness for complex systems work. From the report there are ten Critical Success Factors, in priority order:

  1. Executive support - if those at the top aren't willing to support your project, it's going to be difficult to get help when things start going bad - and they must go bad, because projects break new ground, and this creates pushback from those uninterested in breaking new ground.
  2. User involvement - projects are about users getting their needs met through new capabilities, delivered through technical and operational requirements.
  3. Clear business objectives - if we don't know what Done looks like in units of measure meaningful to the decision makers, we'll never recognize Done before we run out of time and money.
  4. Emotional maturity - project work is hard work. If you're easily offended by blunt questions about where you're going, how you're going to get there, what assures that when you arrive the product or service will actually work, and what evidence there is to show you spent the money wisely, then you're not ready for project work. People deliver projects, but a project is about the mission or business goal. Maturity in all things is required for success.
  5. Optimizing scope - full functionality can never be foreseen. But a set of needed capabilities must be foreseen if the project is not to turn into a death march - exploring and searching for what Done looks like. Only in a research project do capabilities emerge. If you're spending your customer's money, have some definitive notion of what capabilities will result from this spend, and of what Measures of Effectiveness and Measures of Performance will be used to assure progress is being made toward the goal of delivering those capabilities.
  6. Agile process - no one has visibility into what will emerge in the future. Be prepared - the Boy Scout Motto - for new information and surprises. Ask and answer: how long are you willing to wait to discover you are late, over budget, and that the gadget you're building doesn't work? The answer is ½ the time needed to take corrective action. In other words, determine progress to plan iteratively every few weeks so you have sufficient time to fix what you broke. This is a pure closed-loop control system problem. The sampling rate needed to remain in control is the Nyquist rate, and it is ½ the rate of change of the control variable. (A short worked example follows this list.)
  7. Project management expertise - managing projects is all about having a plan: a sequence of work activities that deliver incremental capabilities of increasing maturity for the planned cost, to produce the planned value. All measures of cost and value are monetary.
  8. Skilled resources - work gets done by skilled, experienced people. If they've not seen the problem before and don't know someone who has, then it's going to be a rough ride. 
  9. Execution - it's all about execution. Execution at a sustainable pace, with tangible outcomes that can be assessed for their increasing maturity in units meaningful to the decision makers. The notion that working software has to be delivered every few days is totally domain dependent. The notion that requirements go stale is equally domain dependent. Don't listen to anyone with an idea who doesn't define the domain where that idea is known to work. Such a person is just blowing smoke.
  10. Tools and Infrastructure - tools are critical. Any complex project is too complex for one person to manage the data, processes, people, and progress. When you hear "complexity is bad, reduce complexity," ask whether the speaker has ever actually managed a complex project. If not, they should be quiet.
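
To make the sampling rule in factor 6 concrete, here is a minimal worked example - my own illustration of the ½-rule stated above, with an assumed corrective-action time: if taking corrective action in your domain needs T = 4 weeks, then the sampling interval Δt must satisfy Δt ≤ T/2 = 2 weeks. Assess progress to plan at least every two weeks, or you will discover you are late or over budget with too little time left to act.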
Related articles: Estimating Guidance
Categories: Project Management