
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Feed aggregator

[New eBook] Download The No-nonsense Guide to App Growth

Android Developers Blog - Thu, 07/30/2015 - 23:53

Originally posted on the AdMob Blog.

What’s the secret to rapid growth for your app?

Play Store or App Store optimization? A sophisticated paid advertising strategy? A viral social media campaign?

While all of these strategies could help you grow your user base, the foundation for rapid growth is much more basic and fundamental—you need an engaging app.

This handbook will walk you through practical ways to increase your app’s user engagement to help you eventually transition to growth. You’ll learn how to:

  • Pick the right metric to represent user engagement
  • Look at data to audit your app and find areas to fix
  • Promote your app after you’ve reached a healthy level of user engagement

Download a free copy here.

For more tips on app monetization, be sure to stay connected on all things AdMob by following our Twitter and Google+ pages.

Posted by Raj Ajrawat, Product Specialist, AdMob

Join the discussion on

+Android Developers
Categories: Programming

Lighting the way with BLE beacons

Android Developers Blog - Thu, 07/30/2015 - 23:52

Originally posted on the Google Developers blog.

Posted by Chandu Thota, Engineering Director and Matthew Kulick, Product Manager

Just like lighthouses have helped sailors navigate the world for thousands of years, electronic beacons can be used to provide precise location and contextual cues within apps to help you navigate the world. For instance, a beacon can label a bus stop so your phone knows to have your ticket ready, or a museum app can provide background on the exhibit you’re standing in front of. Today, we’re beginning to roll out a new set of features to help developers build apps using this technology. This includes a new open format for Bluetooth low energy (BLE) beacons to communicate with people’s devices, a way for you to add this meaningful data to your apps and to Google services, as well as a way to manage your fleet of beacons efficiently.

Eddystone: an open BLE beacon format

Working closely with partners in the BLE beacon industry, we’ve learned a lot about the needs and the limitations of existing beacon technology. So we set out to build a new class of beacons that addresses real-life use-cases, cross-platform support, and security.

At the core of what it means to be a BLE beacon is the frame format—i.e., a language—that a beacon sends out into the world. Today, we’re expanding the range of use cases for beacon technology by publishing a new and open format for BLE beacons that anyone can use: Eddystone. Eddystone is robust and extensible: It supports multiple frame types for different use cases, and it supports versioning to make introducing new functionality easier. It’s cross-platform, capable of supporting Android, iOS or any platform that supports BLE beacons. And it’s available on GitHub under the open-source Apache v2.0 license, for everyone to use and help improve.

By design, a beacon is meant to be discoverable by any nearby Bluetooth Smart device, via its identifier which is a public signal. At the same time, privacy and security are really important, so we built in a feature called Ephemeral Identifiers (EIDs) which change frequently, and allow only authorized clients to decode them. EIDs will enable you to securely do things like find your luggage once you get off the plane or find your lost keys. We’ll publish the technical specs of this design soon.

Eddystone for developers: Better context for your apps

Eddystone offers two key developer benefits: better semantic context and precise location. To support these, we’re launching two new APIs. The Nearby API for Android and iOS makes it easier for apps to find and communicate with nearby devices and beacons, such as a specific bus stop or a particular art exhibit in a museum, providing better context. And the Proximity Beacon API lets developers associate semantic location (i.e., a place associated with a lat/long) and related data with beacons, stored in the cloud. This API will also be used in existing location APIs, such as the next version of the Places API.

Eddystone for beacon manufacturers: Single hardware for multiple platforms

Eddystone’s extensible frame formats allow hardware manufacturers to support multiple mobile platforms and application scenarios with a single piece of hardware. An existing BLE beacon can be made Eddystone compliant with a simple firmware update. At the core, we built Eddystone as an open and extensible protocol that’s also interoperable, so we’ll also introduce an Eddystone certification process in the near future by closely working with hardware manufacturing partners. We already have a number of partners that have built Eddystone-compliant beacons.

Eddystone for businesses: Secure and manage your beacon fleet with ease

As businesses move from validating their beacon-assisted apps to deploying beacons at scale in places like stadiums and transit stations, hardware installation and maintenance can be challenging: which beacons are working, broken, missing or displaced? So starting today, beacons that implement Eddystone’s telemetry frame (Eddystone-TLM) in combination with the Proximity Beacon API’s diagnostic endpoint can help deployers monitor their beacons’ battery health and displacement—common logistical challenges with low-cost beacon hardware.

Eddystone for Google products: New, improved user experiences

We’re also starting to improve Google’s own products and services with beacons. Google Maps launched beacon-based transit notifications in Portland earlier this year, to help people get faster access to real-time transit schedules for specific stations. And soon, Google Now will also be able to use this contextual information to help prioritize the most relevant cards, like showing you menu items when you’re inside a restaurant.

We want to make beacons useful even when a mobile app is not available; to that end, the Physical Web project will be using Eddystone beacons that broadcast URLs to help people interact with their surroundings.

Beacons are an important way to deliver better experiences for users of your apps, whether you choose to use Eddystone with your own products and services or as part of a broader Google solution like the Places API or Nearby API. The ecosystem of app developers and beacon manufacturers is important in pushing these technologies forward and the best ideas won’t come from just one company, so we encourage you to get some Eddystone-supported beacons today from our partners and begin building!

Categories: Programming


10x Software Development - Steve McConnell - Thu, 07/30/2015 - 22:13

I've posted a YouTube video that gives my perspective on #NoEstimates. 

This is in the new Construx Brain Casts video series. 


Inversion of Control In Commercial Off The Shelf Products (COTS)

Herding Cats - Glen Alleman - Thu, 07/30/2015 - 19:25

The architecture of COTS products comes fixed from the vendor. As standalone systems this is not a problem. When integration starts, it is a problem. 

Here's a white paper from the past that addresses this critical enterprise IT issue

Inversion of Control from Glen Alleman
Categories: Project Management

How Debugging is Like Hunting Serial Killers

Warning: a quote I use in this article is quite graphic. That's the power of the writing, but if you are at all squirmy, you may want to turn back now.

Debugging requires a particular sympathy for the machine. You must be able to run the machine and networks of machines in your mind while simulating what-ifs based on mere wisps of insight.

There's another process that is surprisingly similar to debugging: hunting down serial killers.

I ran across this parallel while reading Mindhunter: Inside the FBI's Elite Serial Crime Unit by John E. Douglas, an FBI profiler whose specialty is the dark debugging of twisted human minds.

Here's how John describes profiling:

You have to be able to re-create the crime scene in your head. You need to know as much as you can about the victim so that you can imagine how she might have reacted. You have to be able to put yourself in her place as the attacker threatens her with a gun or a knife, a rock, his fists, or whatever. You have to be able to feel her fear as he approaches her. You have to be able to feel her pain as he rapes her or beats her or cuts her. You have to try to imagine what she was going through when he tortured her for his sexual gratification. You have to understand what it’s like to scream in terror and agony, realizing that it won’t help, that it won’t get him to stop. You have to know what it was like. And that is a heavy burden to have to carry.

Serial killers are like bugs in the societal machine. They hide. They blend in. They can pass for "normal," which makes them tough to find. They attack weakness, causing untold damage, and they will keep causing damage until caught. They are always hunting for opportunity.

After reading the book I'm quite grateful that the only bugs I've had to deal with are of the computer variety. The human bugs are very very scary.

Here are some other quotes from the book you may also appreciate:

Categories: Architecture

How Do I Come Up With A Profitable Side Project Idea?

Making the Complex Simple - John Sonmez - Thu, 07/30/2015 - 15:00

In this episode, I'm finally going to answer a question about side projects. Full transcript: John: Hey, John Sonmez from I got a question. I've been getting asked this a lot, so I'm finally going to answer this. I'll probably do a blogpost at some point on this as well, but I always mention […]

The post How Do I Come Up With A Profitable Side Project Idea? appeared first on Simple Programmer.

Categories: Programming

Architecture-Centered ERP Systems in the Manufacturing Domain

Herding Cats - Glen Alleman - Thu, 07/30/2015 - 13:51

I found another paper, presented in a newspaper systems journal, on architecture in manufacturing and ERP.

One of the 12 principles of agile says "The best architectures, requirements, and designs emerge from self-organizing teams." This is a developer's point of view of architecture. The architect's point of view looks like this:

Architectured Centered Design from Glen Alleman
Categories: Project Management

Embracing the Zen of Program Management

The lovely folks at Thoughtworks interviewed me for a blog post, Embracing the Zen of Program Management.  I hope you like the information there.

If you want to know about agile and lean program management, see Agile and Lean Program Management: Scaling Collaboration Across the Organization. In beta now.

Categories: Project Management

Doing Terrible Things To Your Code

Coding Horror - Jeff Atwood - Thu, 07/30/2015 - 10:31

In 1992, I thought I was the best programmer in the world. In my defense, I had just graduated from college, this was pre-Internet, and I lived in Boulder, Colorado working in small business jobs where I was lucky to even hear about other programmers much less meet them.

I eventually fell in with a guy named Bill O'Neil, who hired me to do contract programming. He formed a company with the regrettably generic name of Computer Research & Technologies, and we proceeded to work on various gigs together, building line of business CRUD apps in Visual Basic or FoxPro running on Windows 3.1 (and sometimes DOS, though we had a sense by then that this new-fangled GUI thing was here to stay).

Bill was the first professional programmer I had ever worked with. Heck, for that matter, he was the first programmer I ever worked with. He'd spec out some work with me, I'd build it in Visual Basic, and then I'd hand it over to him for review. He'd then calmly proceed to utterly demolish my code:

  • Tab order? Wrong.
  • Entering a number instead of a string? Crash.
  • Entering a date in the past? Crash.
  • Entering too many characters? Crash.
  • UI element alignment? Off.
  • Does it work with unusual characters in names like, say, O'Neil? Nope.

One thing that surprised me was that the code itself was rarely the problem. He occasionally had some comments about the way I wrote or structured the code, but what I clearly had no idea about was testing my code.

I dreaded handing my work over to him for inspection. I slowly, painfully learned that the truly difficult part of coding is dealing with the thousands of ways things can go wrong with your application at any given time – most of them user related.

That was my first experience with the buddy system, and thanks to Bill, I came out of that relationship with a deep respect for software craftsmanship. I have no idea what Bill is up to these days, but I tip my hat to him, wherever he is. I didn't always enjoy it, but learning to develop discipline around testing (and breaking) my own stuff unquestionably made me a better programmer.

It's tempting to lay all this responsibility at the feet of the mythical QA engineer.

QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.

— Bill Sempf (@sempf) September 23, 2014

If you are ever lucky enough to work with one, you should have a very, very healthy fear of professional testers. They are terrifying. Just scan this "Did I remember to test" list and you'll be having the worst kind of flashbacks in no time. And that's the abbreviated version of his list.

I believe a key turning point in every professional programmer's working life is when you realize you are your own worst enemy, and the only way to mitigate that threat is to embrace it. Act like your own worst enemy. Break your UI. Break your code. Do terrible things to your software.

This means programmers need a good working knowledge of at least the common mistakes, the frequent cases that average programmers tend to miss, to work against. You are tester zero. This is your responsibility.
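To make "tester zero" concrete, here is a hypothetical sketch: `parse_quantity` and its range limits are invented for illustration, and the hostile values come straight from the QA-engineer joke above. The point is that the validator rejects junk instead of crashing the way my Visual Basic apps did:

```python
def parse_quantity(raw) -> int:
    """Parse a user-entered quantity, rejecting junk instead of crashing."""
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):  # "lizard", "sfdeljknesv", None...
        raise ValueError(f"not a number: {raw!r}")
    if not (1 <= value <= 999):           # 0, -1, 999999999 beers
        raise ValueError(f"out of range: {value}")
    return value

# Hammer it with the bar-order inputs from the joke.
for raw in ["1", "0", "999999999", "lizard", "-1", "sfdeljknesv", "  42 "]:
    try:
        print(raw, "->", parse_quantity(raw))
    except ValueError as e:
        print(raw, "-> rejected:", e)
```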

Let's start with Patrick McKenzie's classic Falsehoods Programmers Believe about Names:

  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People’s names fit within a certain defined amount of space.
  7. People’s names do not change.
  8. People’s names change, but only at a certain enumerated set of events.
  9. People’s names are written in ASCII.
  10. People’s names are written in any single character set.

That's just the first 10. There are thirty more. Plus a lot in the comments if you're in the mood for extra credit. Or, how does Falsehoods Programmers Believe About Time grab you?

  1. There are always 24 hours in a day.
  2. Months have either 30 or 31 days.
  3. Years have 365 days.
  4. February is always 28 days long.
  5. Any 24-hour period will always begin and end in the same day (or week, or month).
  6. A week always begins and ends in the same month.
  7. A week (or a month) always begins and ends in the same year.
  8. The machine that a program runs on will always be in the GMT time zone.
  9. Ok, that’s not true. But at least the time zone in which a program has to run will never change.
  10. Well, surely there will never be a change to the time zone in which a program has to run in production.
  11. The system clock will always be set to the correct local time.
  12. The system clock will always be set to a time that is not wildly different from the correct local time.
  13. If the system clock is incorrect, it will at least always be off by a consistent number of seconds.
  14. The server clock and the client clock will always be set to the same time.
  15. The server clock and the client clock will always be set to around the same time.

Are there more? Of course there are! There's even a whole additional list of stuff he forgot when he put that giant list together.
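A few of these falsehoods can be checked directly with Python's standard library; a small sketch (the name "Zoë" is just an example) covering Unicode names from the first list and calendar quirks from the second:

```python
import calendar
import unicodedata

# Falsehoods 9-10 about names: the "same" name in two Unicode forms.
composed = "Zoë"                                     # precomposed U+00EB
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining diaeresis
print(composed == decomposed)          # False: same name, different code points
print(len(composed), len(decomposed))  # 3 4

# Falsehoods 2-4 about time: month lengths vary, February is not always 28 days.
print(calendar.monthrange(2015, 2)[1])               # 28
print(calendar.monthrange(2016, 2)[1])               # 29: leap year
print(calendar.isleap(1900), calendar.isleap(2000))  # False True: the century rule
```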

Catastrophic Error - User attempted to use program in the manner program was meant to be used

I think you can see where this is going. This is programming. We do this stuff for fun, remember?

But in true made-for-TV fashion, wait, there's more! Seriously, guys, where are you going? Get back here. We have more awesome failure states to learn about:

At this point I wouldn't blame you if you decided to quit programming altogether. But I think it's better if we learn to do for each other what Bill did for me, twenty years ago — teach less experienced developers that a good programmer knows they have to do terrible things to their code. Do it because if you don't, I guarantee you other people will, and when they do, they will either walk away or create a support ticket. I'm not sure which is worse.

Categories: Programming

Neo4j: Cypher – Removing consecutive duplicates

Mark Needham - Thu, 07/30/2015 - 07:23

When writing Cypher queries I sometimes find myself wanting to remove consecutive duplicates in collections that I’ve joined together.

e.g we might start with the following query where 1 and 7 appear consecutively:

RETURN [1,1,2,3,4,5,6,7,7,8] AS values
==> +-----------------------+
==> | values                |
==> +-----------------------+
==> | [1,1,2,3,4,5,6,7,7,8] |
==> +-----------------------+
==> 1 row

We want to end up with [1,2,3,4,5,6,7,8]. We can start by exploding our array and putting consecutive elements next to each other:

WITH [1,1,2,3,4,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
RETURN idx, idx+1, values[idx], values[idx+1]
==> +-------------------------------------------+
==> | idx | idx+1 | values[idx] | values[idx+1] |
==> +-------------------------------------------+
==> | 0   | 1     | 1           | 1             |
==> | 1   | 2     | 1           | 2             |
==> | 2   | 3     | 2           | 3             |
==> | 3   | 4     | 3           | 4             |
==> | 4   | 5     | 4           | 5             |
==> | 5   | 6     | 5           | 6             |
==> | 6   | 7     | 6           | 7             |
==> | 7   | 8     | 7           | 7             |
==> | 8   | 9     | 7           | 8             |
==> +-------------------------------------------+
==> 9 rows

Next we can filter out rows which have the same values since that means they have consecutive duplicates:

WITH [1,1,2,3,4,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
==> +-------+
==> | a | b |
==> +-------+
==> | 1 | 2 |
==> | 2 | 3 |
==> | 3 | 4 |
==> | 4 | 5 |
==> | 5 | 6 |
==> | 6 | 7 |
==> | 7 | 8 |
==> +-------+
==> 7 rows

Now we need to join the collection back together again. Most of the values we want are in field ‘b’ but we also need to grab the first value from field ‘a’:

WITH [1,1,2,3,4,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN COLLECT(a)[0] + COLLECT(b) AS noDuplicates
==> +-------------------+
==> | noDuplicates      |
==> +-------------------+
==> | [1,2,3,4,5,6,7,8] |
==> +-------------------+
==> 1 row

What about if we have more than 2 duplicates in a row?

WITH [1,1,1,2,3,4,5,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN COLLECT(a)[0] + COLLECT(b) AS noDuplicates
==> +-------------------+
==> | noDuplicates      |
==> +-------------------+
==> | [1,2,3,4,5,6,7,8] |
==> +-------------------+
==> 1 row

Still happy, good times! Of course, if we have a non-consecutive duplicate, that wouldn't be removed:

WITH [1,1,1,2,3,4,5,5,6,7,7,8,1] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN COLLECT(a)[0] + COLLECT(b) AS noDuplicates
==> +---------------------+
==> | noDuplicates        |
==> +---------------------+
==> | [1,2,3,4,5,6,7,8,1] |
==> +---------------------+
==> 1 row
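As a quick cross-check of the Cypher results, the same consecutive-duplicate removal can be sketched in Python using itertools.groupby, which groups adjacent equal elements:

```python
from itertools import groupby

def dedupe_consecutive(values):
    # groupby collapses runs of equal adjacent elements; keep one key per run.
    return [key for key, _group in groupby(values)]

print(dedupe_consecutive([1, 1, 2, 3, 4, 5, 6, 7, 7, 8]))
# [1, 2, 3, 4, 5, 6, 7, 8]
print(dedupe_consecutive([1, 1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 1]))
# [1, 2, 3, 4, 5, 6, 7, 8, 1]  (the trailing non-consecutive 1 survives)
```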
Categories: Programming

What is Insight?

"A moment's insight is sometimes worth a life's experience." -- Oliver Wendell Holmes, Sr.

Some say we’re in the Age of Insight.  Others say insight is the new currency in the Digital Economy.

And still others say that insight is the backbone of innovation.

Either way, we use “insight” an awful lot without talking about what insight actually is.

So, what is insight?

I thought it was time to finally do a deeper dive on what insight actually is.  Here is my elaboration of “insight” on Sources of Insight:


You can think of it as “insight explained.”

The simple way that I think of insight, or those “ah ha” moments, is by remembering a question Ward Cunningham uses a lot:

“What did you learn that you didn’t expect?” or “What surprised you?”

Ward uses these questions to reveal insights, rather than have somebody tell him a bunch of obvious or uneventful things he already knows.  For example, if you ask somebody what they learned at their presentation training, they’ll tell you that they learned how to present more effectively, speak more confidently, and communicate their ideas better.

No kidding.

But if you instead ask them, “What did you learn that you didn’t expect?” they might actually reveal some insight and say something more like this:

“Even though we say don’t shoot the messenger all the time, you ARE the message.”


“If you win the heart, the mind follows.”

It’s the non-obvious stuff, that surprises you (at least at first).  Or sometimes, insight strikes us as something that should have been obvious all along and becomes the new obvious, or the new normal.

Ward used this insights gathering technique to more effectively share software patterns.  He wanted stories and insights from people, rather than descriptions of the obvious.

I’ve used it myself over the years and it really helps get to deeper truths.  If you are a truth seeker or a lover of insights, you’ll enjoy how you can tease out more insights, just by changing your questions.   For example, if you have kids, don’t ask, “How was your day?”   Ask them, “What was the favorite part of your day?” or “What did you learn that surprised you?”

Wow, I know this is a short post, but I almost left without defining insight.

According to the dictionary, insight is “The capacity to gain an accurate and deep intuitive understanding of a person or thing.”   Or you may see insight explained as inner sight, mental vision, or wisdom.

I like Edward de Bono’s simple description of insight as “Eureka moments.”

Some people count steps in their day.  I count my “ah-ha” moments.  After all, the most important ingredient of effective ideation and innovation is …yep, you guessed it – insight!

For a deeper dive on the power of insight, read my page on Insight Explained, on Sources of Insight.

Categories: Architecture, Programming

A Well Known But Forgotten Trick: Object Pooling

This is a guest repost by Alex Petrov. Find the original article here.

Most problems are quite straightforward to solve: when something is slow, you can either optimize it or parallelize it. When you hit a throughput barrier, you partition the workload across more workers. But when you face problems that involve Garbage Collection pauses or simply hit the limits of the virtual machine you're working with, it gets much harder to fix them.

When you're working on top of a VM, you may face things that are simply out of your control. Namely, time drifts and latency. Gladly, there are enough battle-tested solutions, that require a bit of understanding of how JVM works.

If you can serve 10K requests per second, conforming with certain performance (memory and CPU parameters), it doesn't automatically mean that you'll be able to linearly scale it up to 20K. If you're allocating too many objects on heap, or waste CPU cycles on something that can be avoided, you'll eventually hit the wall.

The simplest (yet underrated) way of saving up on memory allocations is object pooling. Even though the concept sounds similar to just pooling connections and socket descriptors, there's a slight difference.

When we're talking about socket descriptors, we have a limited, rather small (tens, hundreds, or at most thousands) number of descriptors to go through. These resources are pooled because of the high initialization cost (establishing the connection, performing a handshake over the network, memory-mapping the file, or whatever else). In this article we'll talk about pooling larger amounts of short-lived objects which are not so expensive to initialize, to save allocation and deallocation costs and avoid memory fragmentation.
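To make the pattern concrete, here is a minimal object-pool sketch. It's in Python rather than on the JVM the article targets, and the class is invented for illustration, but the shape is the same: acquire from a free list, reset on release, and fall back to allocation only when the pool runs dry.

```python
class ObjectPool:
    """A minimal free-list pool for short-lived, cheap-to-reset objects."""

    def __init__(self, factory, reset, size):
        self._factory = factory
        self._reset = reset
        self._free = [factory() for _ in range(size)]  # pre-allocate up front

    def acquire(self):
        # Reuse a pooled object if one is free; otherwise allocate fresh.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        self._reset(obj)          # scrub state so stale data never leaks out
        self._free.append(obj)

pool = ObjectPool(factory=dict, reset=dict.clear, size=4)
buf = pool.acquire()
buf["request_id"] = 42
pool.release(buf)
assert pool.acquire() is buf      # same object, recycled and emptied
```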

Object Pooling
Categories: Architecture

Great Review of Predicting the Unpredictable

Ryan Ripley “highly recommends” Predicting the Unpredictable: Pragmatic Approaches to Estimating Cost or Schedule. See his post: Pragmatic Agile Estimation: Predicting the Unpredictable.

He says this:

This is a practical book about the work of creating software and providing estimates when needed. Her estimation troubleshooting guide highlights many of the hidden issues with estimating such as: multitasking, student syndrome, using the wrong units to estimate, and trying to estimate things that are too big. — Ryan Ripley

Thank you, Ryan!

PredictingUnpredictable-small See Predicting the Unpredictable: Pragmatic Approaches to Estimating Cost or Schedule for more information.

Categories: Project Management

IT Risk Management

Herding Cats - Glen Alleman - Wed, 07/29/2015 - 02:46

I was sorting through a desk drawer and came across a collection of papers from book chapters and journals done in the early 2000s, when I was the architect of an early newspaper editorial system.

Here's one on Risk Management

Information Technology Risk Management from Glen Alleman

This work was done early in the risk management development process. Tim Lister's quote came later: "Risk management is how adults manage projects."
Categories: Project Management

The Definition of Done at Scale


While there is agreement that you should use DoD at scale, how to apply it is less clear.

The Definition of Done (DoD) is an important technique for increasing the operational effectiveness of team-level Agile. The DoD provides a team with a set of criteria that they can use to plan and bound their work. As Agile is scaled up to deliver larger, more integrated solutions the question that is often asked is whether the concept of the DoD can be applied. And if it is applied, does the application require another layer of done (more complexity)?

The answer to the first question is simple and straightforward. If the question is whether the Definition of Done technique can be used as Agile projects are scaled, then the answer is an unequivocal ‘yes’. In preparation for this essay I surveyed a few dozen practitioners and coaches on the topic to ensure that my use of the technique at scale wasn't extraordinary. To a person, they all used the technique in some form. Mario Lucero, an Agile Coach in Chile (interviewed on SPaMCAST 334), said it succinctly: “No, the use of Definition of Done doesn’t depend on how large the project is.”

While everyone agreed that the DoD makes sense in a scaled Agile environment, there is far less consensus on how to apply the technique. The divergence of opinion and practice centered on whether or not the teams working together continually integrated their code as part of their build management process. There are two camps. The first camp typically finds itself in organizations that integrate functions as a final step in a sprint, perform integration as a separate function outside of development, or use a separate hardening sprint. This camp generally feels that applying the Definition of Done requires a separate DoD specifically for integration, covering requirements for integrating functions, integration testing, and architectural requirements that span teams. The second camp of respondents finds itself in environments where continuous integration is practiced. In this scenario each respondent either added integration criteria to the team DoD or did nothing at all. The primary difference boiled down to whether the team members were responsible for making sure their code integrated with the overall system or whether someone else (real or perceived) was responsible.

In practice, applying the DoD includes a bit of the infamous “it depends” magic. During our discussion on the topic, Luc Bourgault from Wolters Kluwer stated, “in a perfect world the definition should be the same, but I think we should accept differences when it makes sense.” Pradeep Chennavajhula, Senior Global VP at QAI, made three points:

  1. Principles/characteristics of the Definition of Done do not change with the size of the project.
  2. However, the considerations and detail will certainly be impacted.
  3. This may, however, create a perception that the Definition of Done varies by size of project.

The Definition of Done is useful for all Agile work whether a single team or a large scaled effort. However, how you have organized your Agile effort will have more of a potential impact on your approach.

Categories: Process Management

SE-Radio Episode 233: Fangjin Yang on OLAP and the Druid Real-Time Analytical Data Store

Fangjin Yang, creator of the Druid real-time analytical database, talks with Robert Blumen. They discuss the OLAP (online analytical processing) domain, OLAP concepts (hypercube, dimension, metric, and pivot), types of OLAP queries (roll-up, drill-down, and slicing and dicing), use cases for OLAP by organizations, the OLAP store’s position in the enterprise workflow, what “real time” […]
Categories: Programming

Neo4j: MERGE’ing on super nodes

Mark Needham - Tue, 07/28/2015 - 22:04

In my continued playing with the Chicago crime data set I wanted to connect the crimes committed to their position in the FBI crime type hierarchy.

These are the sub graphs that I want to connect:

2015 07 26 22 19 04

We have a ‘fbiCode’ on each ‘Crime’ node which indicates which ‘Crime Sub Category’ the crime belongs to.

I started with the following query to connect the nodes together:

MATCH (crime:Crime)
WITH crime SKIP {skip} LIMIT 10000
MATCH (subCat:SubCategory {code: crime.fbiCode})
MERGE (crime)-[:CATEGORY]->(subCat)
RETURN COUNT(*) AS crimesProcessed

I had this running inside a Python script which incremented ‘skip’ by 10,000 on each iteration as long as ‘crimesProcessed’ came back with a value > 0.
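The batching loop looked roughly like this; `run_query` is a stand-in for the actual py2neo call (assumed to return the `crimesProcessed` value), and `{skip}` is the parameter placeholder syntax Cypher used at the time:

```python
# Batch driver: keep advancing `skip` by the batch size until a batch
# processes zero crimes.
QUERY = """
MATCH (crime:Crime)
WITH crime SKIP {skip} LIMIT 10000
MATCH (subCat:SubCategory {code: crime.fbiCode})
MERGE (crime)-[:CATEGORY]->(subCat)
RETURN COUNT(*) AS crimesProcessed
"""

def process_all(run_query, batch_size=10000):
    skip = 0
    while True:
        processed = run_query(QUERY, {"skip": skip})
        if processed == 0:        # no rows matched: we've consumed every crime
            break
        skip += batch_size
    return skip
```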

To start with, the ‘CATEGORY’ relationships were created very quickly, but it slowed down quite noticeably about 1 million nodes in.

I profiled the queries but the query plans didn’t show anything obviously wrong. My suspicion was that I had a super node problem where the Cypher runtime was iterating through all of the sub category’s relationships to check whether one of them pointed to the crime on the other side of the ‘MERGE’ statement.

I cancelled the import job and wrote a query to check how many relationships each sub category had. It varied from 1,000 to 93,000 somewhat confirming my suspicion.

Michael suggested tweaking the query to use the shortestpath function to check for the existence of the relationship and then use the ‘CREATE’ clause to create it if it didn’t exist.

The neat thing about the shortestpath function is that it will start from the side with the lowest cardinality and as soon as it finds a relationship it will stop searching. Let’s have a look at that version of the query:

MATCH (crime:Crime)
WITH crime SKIP {skip} LIMIT 10000
MATCH (subCat:SubCategory {code: crime.fbiCode})
WITH crime, subCat, shortestPath((crime)-[:CATEGORY]->(subCat)) AS path
WHERE path IS NULL
CREATE (crime)-[:CATEGORY]->(subCat)

This worked much better – 10,000 nodes processed in ~2.5 seconds – and the time remained constant as more relationships were added. This allowed me to create all the category relationships, but we can actually do even better if we use CREATE UNIQUE instead of MERGE:

MATCH (crime:Crime)
WITH crime SKIP {skip} LIMIT 10000
MATCH (subCat:SubCategory {code: crime.fbiCode})
CREATE UNIQUE (crime)-[:CATEGORY]->(subCat)
RETURN COUNT(*) AS crimesProcessed

Using this query, 10,000 nodes took ~250ms-900ms to process, which means we can process all the nodes in 5-6 minutes – good times!

I’m not super familiar with ‘CREATE UNIQUE’ so I’m not sure it’s always a good substitute for ‘MERGE’, but on this occasion it does the job.

The lesson for me here is that if a query is taking longer than you think it should, try other constructs, or a combination of them, and see whether things improve – they just might!

Categories: Programming

Python: Difference between two datetimes in milliseconds

Mark Needham - Tue, 07/28/2015 - 21:05

I’ve been doing a bit of adhoc measurement of some cypher queries executed via py2neo and wanted to work out how many milliseconds each query was taking end to end.

I thought there’d be an obvious way of doing this but if there is it’s evaded me so far, and I ended up calculating the difference between two datetime objects, which gave me the following timedelta object:

>>> import datetime
>>> start = datetime.datetime.now()
>>> # ... run the queries ...
>>> end = datetime.datetime.now()
>>> end - start
datetime.timedelta(0, 3, 519319)

The 3 parts of this object are ‘days’, ‘seconds’ and ‘microseconds’ which I found quite strange!

These are the methods/attributes we have available to us:

>>> dir(end - start)
['__abs__', '__add__', '__class__', '__delattr__', '__div__', '__doc__', '__eq__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__pos__', '__radd__', '__rdiv__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmul__', '__rsub__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', 'days', 'max', 'microseconds', 'min', 'resolution', 'seconds', 'total_seconds']

There’s no ‘milliseconds’ on there so we’ll have to calculate it from what we do have:

>>> diff = end - start
>>> elapsed_ms = (diff.days * 86400000) + (diff.seconds * 1000) + (diff.microseconds / 1000)
>>> elapsed_ms
3519

Or we could do the following slightly simpler calculation:

>>> diff.total_seconds() * 1000
3519.319
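The calculation can also be wrapped in a small helper (a sketch; the function name is mine):

```python
import datetime

def elapsed_ms(start, end):
    """Elapsed time between two datetimes, in milliseconds."""
    return (end - start).total_seconds() * 1000.0
```

This avoids having to remember the days/seconds/microseconds arithmetic each time.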

And now back to the query profiling!

Categories: Programming

Estimates on Split Stories Do Not Need to Equal the Original

Mike Cohn's Blog - Tue, 07/28/2015 - 15:00

It is good practice to first write large user stories (commonly known as epics) and then to split them into smaller pieces, a process known as product backlog refinement or grooming. When product backlog items are split, they are often re-estimated.

I’m often asked if the sum of the estimates on the smaller stories must equal the estimate on the original, larger story.


Part of the reason for splitting the stories is to understand them better. Team members discuss the story with the product owner. As a product owner clarifies a user story, the team will know more about the work they are to do.

That improved knowledge should be reflected in any estimates they provide. If those estimates don’t sum to the same value as the original story, so be it.

But What About the Burndown?

But, I hear you asking, what about the release burndown chart? A boss, client or customer was told that a story was equal to 20 points. Now that the team split it apart, it’s become bigger.

Well, first, and I always feel compelled to say this: We should always stress to our bosses, clients and customers that estimates are estimates and not commitments.

When we told them the story would be 20 points, that meant perhaps 20, perhaps 15, perhaps 25. Perhaps even 10 or 40 if things went particularly well or poorly.

OK, you’ve probably delivered that message, and it may have gone in one ear and out the other of your boss, client or customer. So here’s something else you should be doing that can protect you against a story becoming larger when split and its parts are re-estimated.

I’ve always written and trained that the numbers in Planning Poker are best thought of as buckets of water.

You have, for example an 8 and a 13 but not a 10 card. If you have a story that you think is a 10, you need to estimate it as a 13. This slight rounding up (which only occurs on medium to large numbers) will mitigate the effect of stories becoming larger when split.

Consider the example of a story a team thinks is a 15. If they play Planning Poker the way I recommend, they will call that large story a 20.

Later, they split it into multiple smaller stories. Let’s say they split it into stories they estimate as 8, 8 and 5. That’s 21. That’s significantly larger than the 15 they really thought it was, but not much larger at all than the 20 they put on the story.
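Rounding an estimate up to the next bucket is easy to mechanize; here is a minimal sketch (the bucket values are the commonly used Planning Poker deck, and the function name is mine):

```python
# Round an estimate up to the next Planning Poker "bucket".
# The sequence below is the commonly used Planning Poker deck.
BUCKETS = [1, 2, 3, 5, 8, 13, 20, 40, 100]

def round_up_to_bucket(estimate, buckets=BUCKETS):
    for bucket in buckets:
        if bucket >= estimate:
            return bucket
    raise ValueError("estimate larger than the biggest bucket: %s" % estimate)
```

So a story the team thinks is a 10 becomes a 13, and a 15 becomes a 20, exactly as in the example above.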

In practice, I’ve found this slight pessimistic bias works well to counter the natural tendency I believe many developers have to underestimate, and to provide a balance against those who will be overly shocked when any actual effort overruns its estimate.

Why Guessing is not Estimating and Estimating is not Guessing

Herding Cats - Glen Alleman - Mon, 07/27/2015 - 19:00

I hear all the time that estimating is the same as guessing. This is not true mathematically, nor is it true as a business process. Guessing is an approach used by many who don’t understand that making decisions in the presence of uncertainty requires we understand the impact of that decision. When the future is uncertain, we need to know that impact in probabilistic terms. And with this comes the confidence, precision, and accuracy of the estimate.

What’s the difference between estimate and guess? The distinction between the two words is one of the degree of care taken in arriving at a conclusion.

The word estimate is derived from the Latin aestimare, meaning to value. The term shares its origin with estimable, which means capable of being estimated or worthy of esteem, and of course esteem, which means regard, as in high regard.

To estimate means to judge the extent, nature, or value of something - connected to regard, as in he is held in high regard - with the implication that the result is based on expertise or familiarity. An estimate is the resulting calculation or judgment. A related term is approximation, meaning close or near.

In between a guess and an estimate is an educated guess, a more casual estimate. An idiomatic term for this type of middle-ground conclusion is ballpark figure. The origin of this American English idiom, which alludes to a baseball stadium, is not certain, but one explanation is that it is related to in the ballpark, meaning close, in the sense that someone at such a location may not be in a precise spot but is in the stadium.

To guess is to believe or suppose, to form an opinion based on little or no evidence, or to be correct by chance or conjecture. A guess is a thought or idea arrived at by one of these methods. Synonyms for guess include conjecture and surmise, which like guess can be employed both as verbs and as nouns.

We could have a hunch or an intuition, or we can engage in guesswork or speculation. Dead reckoning is often used to mean guesswork, though it originally referred to a navigation process based on reliable information. Near synonyms describing thoughts or ideas developed with more rigor include hypothesis and supposition, as well as theory and thesis.

A guess is a casual, perhaps spontaneous conclusion. An estimate is based on intentional thought processes supported by data.

What Does This Mean For Projects?

If we're guessing we're making uninformed conclusions usually in the absence of data, experience, or any evidence of credibility. If we're estimating we are making informed conclusions based on data, past performance, models - including Monte Carlo models, and parametric models.
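As an illustration of what informed, probabilistic estimating can look like, here is a minimal Monte Carlo sketch; the task durations, distributions, and function name are invented for illustration, not taken from any particular project:

```python
import random

# Illustrative Monte Carlo sketch: each task's duration is drawn from a
# triangular distribution (best case, most likely, worst case), and the
# project total is sampled many times to produce a probabilistic answer.
TASKS = [(3, 5, 10), (2, 4, 9), (5, 8, 20)]  # (low, mode, high); invented numbers

def duration_at_confidence(tasks, confidence=0.8, trials=10000, seed=42):
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
        for _ in range(trials)
    )
    # e.g. confidence=0.8: the duration we expect to beat 80% of the time.
    return totals[int(trials * confidence)]
```

Rather than a single guessed number, this yields statements like "80% confident we finish within N days" - an estimate with stated confidence, which a guess cannot provide.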

When we hear that decisions can be made without estimates, or that all estimating is guessing, we now know that neither of these is true, mathematically or as a business process.

This post is derived from Daily Writing Tips 

Related articles:
- Making Conjectures Without Testable Outcomes
- Strategy is Not the Same as Operational Effectiveness
- Are Estimates Really The Smell of Dysfunction?
- Information Technology Estimating Quality
Categories: Project Management