Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Mark Needham
Thoughts on Software Development

Neo4j: Cypher – Creating a time tree down to the day

Sat, 04/19/2014 - 22:15

Michael recently wrote a blog post showing how to create a time tree representing time down to the second using Neo4j’s Cypher query language, something I built on top of for a side project I’m working on.

The domain I want to model is RSVPs to meetup invites – I want to understand how much in advance people respond and how likely they are to drop out at a later stage.

For this problem I only need to measure time down to the day so my task is a bit easier than Michael’s.

After a bit of fiddling around with leap years I believe the following query will create a time tree representing all the days from 2011 – 2014, which covers the time the London Neo4j meetup has been running:

WITH range(2011, 2014) AS years, range(1,12) as months
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN (CASE 
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) 
                      WHEN month = 2 THEN 
                        CASE
                          WHEN year % 4 <> 0 THEN range(1,28)
                          WHEN year % 100 <> 0 THEN range(1,29)
                          WHEN year % 400 <> 0 THEN range(1,28)
                          ELSE range(1,29)
                        END
                      ELSE range(1,30)
                    END) |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d))))
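
The nested CASE is aiming at the Gregorian month-length rule. As a sanity check outside Cypher, here is a minimal Java sketch of that rule, cross-checked against `java.time.YearMonth` for the years the query covers. The class and method names are my own and are not part of the original query:

```java
import java.time.YearMonth;

class DaysInMonth {
    // 31-day months first, then February by the leap-year rule, otherwise 30 days.
    static int daysInMonth(int year, int month) {
        switch (month) {
            case 1: case 3: case 5: case 7: case 8: case 10: case 12:
                return 31;
            case 2:
                if (year % 4 != 0) return 28;    // not divisible by 4: common year
                if (year % 100 != 0) return 29;  // divisible by 4 but not by 100: leap year
                if (year % 400 != 0) return 28;  // century not divisible by 400: common year
                return 29;                       // divisible by 400: leap year
            default:
                return 30;
        }
    }

    public static void main(String[] args) {
        // Compare every month from 2011 to 2014 with the JDK's own calendar.
        for (int year = 2011; year <= 2014; year++) {
            for (int month = 1; month <= 12; month++) {
                int expected = YearMonth.of(year, month).lengthOfMonth();
                if (daysInMonth(year, month) != expected) {
                    throw new AssertionError(year + "-" + month);
                }
            }
        }
        System.out.println(daysInMonth(2012, 2)); // prints 29
    }
}
```

The century branches only matter outside 2011 to 2014 (for example 1900 and 2000), but they are cheap to get right.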

The next step is to link adjacent days together so that we can traverse from one day to the next without needing to go back up and down the tree. For example, we should end up with something like this:

(jan31)-[:NEXT]->(feb1)-[:NEXT]->(feb2)

We can build this by first collecting all the ‘day’ nodes in date order like so:

MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
RETURN days

And then iterating over adjacent nodes to create the ‘NEXT’ relationship:

MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
FOREACH(i in RANGE(0, length(days)-2) | 
    FOREACH(day1 in [days[i]] | 
        FOREACH(day2 in [days[i+1]] | 
            CREATE UNIQUE (day1)-[:NEXT]->(day2))))

Now if we want to find the previous 5 days from the 1st February 2014 we could write the following query:

MATCH (y:Year {year: 2014})-[:HAS_MONTH]->(m:Month {month: 2})-[:HAS_DAY]->(:Day {day: 1})<-[:NEXT*0..5]-(day)
RETURN y,m,day

[Image: screenshot of the query results]
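
Because the variable-length pattern starts at 0 hops, the query should match the 1st February node itself as well as the five days before it. A small Java sketch of the dates we would expect back, using `java.time` rather than the graph (the class and method names are hypothetical):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class PreviousDays {
    // Walks back from 'start' by 0..maxHops days, mirroring a [:NEXT*0..maxHops] traversal.
    static List<LocalDate> walkBack(LocalDate start, int maxHops) {
        List<LocalDate> days = new ArrayList<>();
        for (int hops = 0; hops <= maxHops; hops++) {
            days.add(start.minusDays(hops));
        }
        return days;
    }

    public static void main(String[] args) {
        // Six dates in total: 2014-02-01 back to 2014-01-27.
        for (LocalDate day : walkBack(LocalDate.of(2014, 2, 1), 5)) {
            System.out.println(day);
        }
    }
}
```

Changing the lower bound to 1 would be the equivalent of excluding the starting day.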

If we want to we can create the time tree and then connect the day nodes all in one query by using ‘WITH *’ like so:

WITH range(2011, 2014) AS years, range(1,12) as months
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN (CASE 
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) 
                      WHEN month = 2 THEN 
                        CASE
                          WHEN year % 4 <> 0 THEN range(1,28)
                          WHEN year % 100 <> 0 THEN range(1,29)
                          WHEN year % 400 <> 0 THEN range(1,28)
                          ELSE range(1,29)
                        END
                      ELSE range(1,30)
                    END) |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d))))
 
WITH *
 
MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
FOREACH(i in RANGE(0, length(days)-2) | 
    FOREACH(day1 in [days[i]] | 
        FOREACH(day2 in [days[i+1]] | 
            CREATE UNIQUE (day1)-[:NEXT]->(day2))))

Now I need to connect the RSVP events to the tree!

Categories: Programming

Neo4j 2.0.1: Cypher – Concatenating an empty collection / Type mismatch: expected Integer, Collection<Integer> or Collection<Collection<Integer>> but was Collection<Any>

Sat, 04/19/2014 - 20:51

Last weekend I was playing around with some collections using Neo4j’s Cypher query language and I wanted to concatenate two collections.

This was easy enough when both collections contained values…

$ RETURN [1,2,3,4] + [5,6,7];
==> +---------------------+
==> | [1,2,3,4] + [5,6,7] |
==> +---------------------+
==> | [1,2,3,4,5,6,7]     |
==> +---------------------+
==> 1 row

…but I ended up with the following exception when I tried to concatenate with an empty collection:

$ RETURN [1,2,3,4] + [];
==> SyntaxException: Type mismatch: expected Integer, Collection<Integer> or Collection<Collection<Integer>> but was Collection<Any> (line 1, column 20)
==> "RETURN [1,2,3,4] + []"
==>                     ^

I figured there was probably some strange type coercion going on for the empty collection and came up with the following workaround using the RANGE function:

$ RETURN [1,2,3,4] + RANGE(0,-1);
==> +-------------------------+
==> | [1,2,3,4] + RANGE(0,-1) |
==> +-------------------------+
==> | [1,2,3,4]               |
==> +-------------------------+
==> 1 row

While writing this up I decided to check if it behaved the same way in the recently released 2.0.2 and was pleasantly surprised to see that the workaround is no longer necessary:

$ RETURN [1,2,3,4] + [];
==> +----------------+
==> | [1,2,3,4] + [] |
==> +----------------+
==> | [1,2,3,4]      |
==> +----------------+
==> 1 row

So if you’re seeing the same issue get yourself upgraded!

Categories: Programming

Neo4j: Cypher – Creating relationships between a collection of nodes / Invalid input ‘[‘:

Sat, 04/19/2014 - 07:33

When working with graphs we’ll frequently find ourselves wanting to create relationships between collections of nodes.

A common example of this would be creating a linked list of days so that we can quickly traverse across a time tree. Let’s say we start with just 3 days:

MERGE (day1:Day {day:1 })
MERGE (day2:Day {day:2 })
MERGE (day3:Day {day:3 })
RETURN day1, day2, day3

And we want to create a ‘NEXT’ relationship between adjacent days:

(day1)-[:NEXT]->(day2)-[:NEXT]->(day3)

The most obvious way to do this would be to collect the days into an ordered collection and iterate over them using FOREACH, creating a relationship between adjacent nodes:

MATCH (day:Day)
WITH day
ORDER BY day.day
WITH COLLECT(day) AS days
FOREACH(i in RANGE(0, length(days)-2) | 
  CREATE UNIQUE (days[i])-[:NEXT]->(days[i+1]))

Unfortunately this isn’t valid syntax:

Invalid input '[': expected an identifier character, node labels, a property map, whitespace, ')' or a relationship pattern (line 6, column 32)
"            CREATE UNIQUE (days[i])-[:NEXT]->(days[i+1]))"
                                ^

It doesn’t seem to like us using array indices where we specify the node identifier.

However, we can work around that by putting days[i] and days[i+1] into single item arrays and using nested FOREACH loops on those, something Michael Hunger showed me last year and I forgot all about!

MATCH (day:Day)
WITH day
ORDER BY day.day
WITH COLLECT(day) AS days
FOREACH(i in RANGE(0, length(days)-2) | 
  FOREACH(day1 in [days[i]] | 
    FOREACH(day2 in [days[i+1]] | 
      CREATE UNIQUE (day1)-[:NEXT]->(day2))))
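
The index arithmetic is the part that is easy to get wrong: Cypher's RANGE is inclusive at both ends, so the loop has to stop at length(days)-2 for days[i+1] to stay in bounds. A minimal Java sketch of the same pairing, with plain strings standing in for nodes (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdjacentPairs {
    // Pairs each element with its successor; i runs from 0 to size-2 inclusive.
    static List<String> link(List<String> days) {
        List<String> relationships = new ArrayList<>();
        for (int i = 0; i <= days.size() - 2; i++) {
            relationships.add("(" + days.get(i) + ")-[:NEXT]->(" + days.get(i + 1) + ")");
        }
        return relationships;
    }

    public static void main(String[] args) {
        // Three days give two relationships.
        for (String relationship : link(Arrays.asList("day1", "day2", "day3"))) {
            System.out.println(relationship);
        }
    }
}
```

A collection of n days yields n-1 relationships, and an empty or single-element collection yields none.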

Now if we do a query to get back all the days we’ll see they’re connected:

[Image: screenshot showing the connected day nodes]

Categories: Programming

Neo4j 2.0.0: Query not prepared correctly / Type mismatch: expected Map

Sun, 04/13/2014 - 18:40

I was playing around with Neo4j’s Cypher last weekend and found myself accidentally running some queries against an earlier version of the Neo4j 2.0 series (2.0.0).

My first query started with a map and I wanted to create a person from an identifier inside the map:

WITH {person: {id: 1}} AS params
MERGE (p:Person {id: params.person.id})
RETURN p

When I ran the query I got this error:

==> SyntaxException: Type mismatch: expected Map but was Boolean, Number, String or Collection<Any> (line 1, column 62)
==> "WITH {person: {id: 1}} AS params MERGE (p:Person {id: params.person.id}) RETURN p"

If we try the same query in 2.0.1 it works as we’d expect:

==> +---------------+
==> | p             |
==> +---------------+
==> | Node[1]{id:1} |
==> +---------------+
==> 1 row
==> Nodes created: 1
==> Properties set: 1
==> Labels added: 1
==> 47 ms

My next query was the following which links topics of interest to a person:

WITH {topics: [{name: "Java"}, {name: "Neo4j"}]} AS params
MERGE (p:Person {id: 2})
FOREACH(t IN params.topics | 
  MERGE (topic:Topic {name: t.name})
  MERGE (p)-[:INTERESTED_IN]->(topic)
)
RETURN p

In 2.0.0 that query fails like so:

==> InternalException: Query not prepared correctly!

but if we try it in 2.0.1 we’ll see that it works as well:

==> +---------------+
==> | p             |
==> +---------------+
==> | Node[4]{id:2} |
==> +---------------+
==> 1 row
==> Nodes created: 1
==> Relationships created: 2
==> Properties set: 1
==> Labels added: 1
==> 53 ms

So if you’re seeing either of those errors then get yourself upgraded to 2.0.1 as well!

Categories: Programming

install4j and AppleScript: Creating a Mac OS X Application Bundle for a Java application

Mon, 04/07/2014 - 01:04

We have a few internal applications at Neo which can be launched using ‘java -jar’ and I always forget where the jars are, so I thought I’d wrap a Mac OS X application bundle around one of them to make life easier.

My favourite installation pattern is the one where when you double click the dmg it shows you a window where you can drag the application into the ‘Applications’ folder, like this:

[Image: screenshot of a dmg window with the application and the ‘Applications’ folder]

I’m not a fan of the installation wizards and the installation process here is so simple that a wizard seems overkill.

I started out learning about the structure of an application bundle which is well described in the Apple Bundle Programming guide. I then worked my way through a video which walks you through bundling a JAR file in a Mac application.

I figured that bundling a JAR was probably a solved problem and had a look at App Bundler, JAR Bundler and Iceberg before settling on Install4j which we used for Neo4j desktop.

I started out by creating an installer using Install4j and then manually copying the launcher it created into an Application bundle template but it was incredibly fiddly and I ended up with a variety of indecipherable messages in the system error log.

Eventually I realised that I didn’t need to create an installer and that what I actually wanted was a Mac OS X single bundle archive media file.

After I’d got install4j creating that for me I just needed to figure out how to create the background image telling the user to drag the application into their ‘Applications’ folder.

Luckily I came across this StackOverflow post which provided some AppleScript to do just that and with a bit of tweaking I ended up with the following shell script which seems to do the job:

#!/bin/bash
 
rm -f target/DBench_macos_1_0_0.tgz
/Applications/install4j\ 5/bin/install4jc TestBench.install4j
 
title="DemoBench"
backgroundPictureName="graphs.png"
applicationName="DemoBench"
finalDMGName="DemoBench.dmg"
 
rm -rf target/dmg && mkdir -p target/dmg
tar -C target/dmg -xvf target/DBench_macos_1_0_0.tgz
cp -r src/packaging/.background target/dmg
ln -s /Applications target/dmg
 
cd target
rm -f "${finalDMGName}"
umount -f "/Volumes/${title}"
 
hdiutil create -volname "${title}" -size 100m -srcfolder dmg/ -ov -format UDRW pack.temp.dmg
device=$(hdiutil attach -readwrite -noverify -noautoopen "pack.temp.dmg" | egrep '^/dev/' | sed 1q | awk '{print $1}')
 
sleep 5
 
echo '
   tell application "Finder"
     tell disk "'${title}'"
           open
           set current view of container window to icon view
           set toolbar visible of container window to false
           set statusbar visible of container window to false
           set the bounds of container window to {400, 100, 885, 430}
           set theViewOptions to the icon view options of container window
           set arrangement of theViewOptions to not arranged
           set icon size of theViewOptions to 72
           set background picture of theViewOptions to file ".background:'${backgroundPictureName}'"
           set position of item "'${applicationName}'" of container window to {100, 100}
           set position of item "Applications" of container window to {375, 100}
           update without registering applications
           delay 5
           eject
     end tell
   end tell
' | osascript
 
hdiutil detach "${device}"
hdiutil convert "pack.temp.dmg" -format UDZO -imagekey zlib-level=9 -o "${finalDMGName}"
rm -f pack.temp.dmg
 
cd ..

To summarise, this script creates a symlink to ‘Applications’, puts a background image in a directory titled ‘.background’, sets that as the background of the window and positions the symlink and application appropriately.

Et voila:

[Image: screenshot of the finished DemoBench dmg window]

The Firefox guys wrote a couple of blog posts detailing their experiences writing an installer which were quite an interesting read as well.

Categories: Programming

Clojure: Not so lazy sequences a.k.a chunking behaviour

Sun, 04/06/2014 - 23:07

I’ve been playing with Clojure over the weekend and got caught out by the behaviour of lazy sequences due to chunking – something which was obvious to experienced Clojurians although not me.

I had something similar to the following bit of code which I expected to only evaluate the first item of the infinite sequence that the range function generates:

> (take 1 (map (fn [x] (println (str "printing..." x))) (range)))
(printing...0
printing...1
printing...2
printing...3
printing...4
printing...5
printing...6
printing...7
printing...8
printing...9
printing...10
printing...11
printing...12
printing...13
printing...14
printing...15
printing...16
printing...17
printing...18
printing...19
printing...20
printing...21
printing...22
printing...23
printing...24
printing...25
printing...26
printing...27
printing...28
printing...29
printing...30
printing...31
nil)

The reason this was annoying is because I wanted to shortcut the lazy sequence using take-while, much like the poster of this StackOverflow question.

As I understand it when we have a lazy sequence the granularity of that laziness is 32 items at a time a.k.a one chunk, something that Michael Fogus wrote about 4 years ago. This was a bit surprising to me but it sounds like it makes sense for the majority of cases.

However, if we want to work around that behaviour we can wrap the lazy sequence in the following unchunk function provided by Stuart Sierra:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

Now if we repeat our initial code we’ll see it only prints once:

> (take 1 (map (fn [x] (println (str "printing..." x))) (unchunk (range))))
(printing...0
nil)
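
To make the chunk granularity concrete, here is a toy model in Java. It is emphatically not how Clojure implements chunked sequences; it only imitates the observable effect, where asking for one item forces the mapping function to run over a whole chunk. All names are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

class ChunkedTake {
    // Toy model only: maps 0, 1, 2, ... one whole chunk at a time,
    // then hands back just the first n results.
    static List<Integer> take(int n, int chunkSize, IntUnaryOperator f) {
        List<Integer> realised = new ArrayList<>();
        int next = 0;
        while (realised.size() < n) {
            for (int i = 0; i < chunkSize; i++) {
                realised.add(f.applyAsInt(next++));
            }
        }
        return new ArrayList<>(realised.subList(0, n));
    }

    public static void main(String[] args) {
        int[] calls = {0};
        take(1, 32, x -> { calls[0]++; return x; });
        System.out.println(calls[0]); // prints 32: one item requested, a full chunk evaluated

        calls[0] = 0;
        take(1, 1, x -> { calls[0]++; return x; });
        System.out.println(calls[0]); // prints 1: a chunk size of 1 behaves like unchunk
    }
}
```
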
Categories: Programming

Soulver: For all your random calculations

Sun, 03/30/2014 - 15:48

I often find myself doing random calculations and I used to do so part manually and part using Alfred’s calculator until Alistair pointed me at Soulver, a desktop/iPhone/iPad app, which is even better.

I thought I’d write some examples of calculations I use it for, partly so I’ll remember the syntax in future!

Calculating how much memory Neo4j memory mapping will take up

800 mb + 2660mb + 6600mb + 9500mb + 40mb in GB = 19.6 GB

How long would it take to cover 20,000 km at 100 km / day?

20,000 km / 100 km/day in months = 6.57097681677241832481 months

How long did an import of some data using the Neo4j shell take?

4550855 ms in minutes = 75.84758333333333333333 minutes

Bit shift 1 by 32 places

1 << 32 = 4,294,967,296

Translating into easier to digest units

32381KB / second in MB per minute = 1,942.86 MB/minute
500,000 / 3 years in per hour = 19.01324310408685857874 per hour^2

How long would it take to process a chunk of data?

100 GB / (32381KB / second in MB per minute)  = 51.47051254336390681778 minutes

Hexadecimal to base 10

0x1111 = 4,369
1 + 16 + 16^2 + 16^3 = 4,369

I’m sure there’s much more that you can do that I haven’t figured out yet but even for these simple examples it saves me a bunch of time.
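
For comparison, a few of the calculations above can be reproduced in plain Java. This is only a sketch: the helper names are mine, and it assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 KB), which is what the figures above are consistent with:

```java
class SoulverChecks {
    // Decimal units are assumed throughout (1 GB = 1000 MB, 1 MB = 1000 KB).
    static double mbToGb(double mb) { return mb / 1000.0; }

    static double msToMinutes(long ms) { return ms / 60_000.0; }

    static double kbPerSecondToMbPerMinute(double kbPerSecond) { return kbPerSecond * 60 / 1000.0; }

    static double minutesToProcess(double gb, double mbPerMinute) { return gb * 1000.0 / mbPerMinute; }

    public static void main(String[] args) {
        System.out.println(mbToGb(800 + 2660 + 6600 + 9500 + 40));
        System.out.println(msToMinutes(4_550_855L));
        double rate = kbPerSecondToMbPerMinute(32_381);
        System.out.println(rate);
        System.out.println(minutesToProcess(100, rate));

        // The left operand must be a long: for an int, Java only uses the low
        // 5 bits of the shift distance, so 1 << 32 evaluates to 1.
        System.out.println(1L << 32);                        // prints 4294967296
        System.out.println(0x1111);                          // prints 4369
        System.out.println(1 + 16 + 16 * 16 + 16 * 16 * 16); // prints 4369
    }
}
```

The amount of unit bookkeeping needed here is a fair illustration of what Soulver takes care of.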

Categories: Programming

Remote profiling Neo4j using yourkit

Tue, 03/25/2014 - 00:44

yourkit is my favourite JVM profiling tool and whilst it’s really easy to profile a local JVM process, sometimes I need to profile a process on a remote machine.

In that case we need to first have the remote JVM started up with a yourkit agent parameter passed as one of the args to the Java program.

Since I’m mostly working with Neo4j this means we need to add the following to conf/neo4j-wrapper.conf:

wrapper.java.additional=-agentpath:/Users/markhneedham/Downloads/YourKit_Java_Profiler_2013_build_13074.app/bin/mac/libyjpagent.jnilib=port=8888

If we run lsof with the Neo4j process ID we’ll see that there’s now a socket listening on port 8888:

java    4388 markhneedham   20u    IPv6 0x901df453b4e9a125       0t0      TCP *:8888 (LISTEN)
...

We can connect to that via the ‘Monitor Remote Applications’ section of yourkit:

[Image: screenshot of the ‘Monitor Remote Applications’ dialog in yourkit]

In this case I’m demonstrating how to connect to it on my laptop and am using localhost but usually we’d specify the remote machine’s host name instead.

We also need to ensure that port 8888 is open on any firewalls we have in front of the machine.

The file we refer to in the ‘agentpath’ flag is a bit different depending on the operating system we’re using. All the details are on the yourkit website.

Categories: Programming

Functional Programming in Java – Venkat Subramaniam: Book Review

Sun, 03/23/2014 - 22:18

I picked up Venkat Subramaniam’s ‘Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions’ to learn a little bit more about Java 8 having struggled to find any online tutorials which did that.

A big chunk of the book focuses on lambdas, functional collection parameters and lazy evaluation which will be familiar to users of C#, Clojure, Scala, Haskell, Ruby, Python, F# or libraries like totallylazy and Guava.

Although I was able to race through the book quite quickly it was still interesting to see how Java 8 is going to reduce the amount of code we need to write to do simple operations on collections.

I wrote up my thoughts on lambda expressions instead of auto closeable, using group by on collections and sorting values in collections in previous blog posts.

I noticed a couple of subtle differences in the method names added to collections, e.g. skip/limit are there instead of drop/take for grabbing a subset of said collection.

There are also methods such as ‘mapToInt’ and ‘mapToDouble’ where in other languages you’d just have a single ‘map’ and it would handle everything.

Over the last couple of years I’ve used totallylazy on Java projects to deal with collections and, while I like the style of code it encourages, you end up with a lot of code due to all the anonymous classes you have to create.

In Java 8 lambdas are a first class concept which should make using totallylazy even better.

In a previous blog post I showed how you’d go about sorting a collection of people by age. In Java 8 it would look like this:

// assumes: import static java.util.Comparator.comparing;
List<Person> people = Arrays.asList(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
people.stream().sorted(comparing(p -> p.getAge())).forEach(System.out::println);
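
For anyone who wants to run this, here is a self-contained sketch. The Person class is a hypothetical stand-in for the one used in the snippet above, and I have used `Comparator.comparingInt` with a method reference, which is one possible spelling rather than the only one:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortPeopleByAge {
    // Hypothetical stand-in for the Person class used in the post.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }

        int getAge() { return age; }
    }

    static List<Person> sample() {
        return Arrays.asList(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
    }

    // comparingInt extracts the sort key without boxing each age into an Integer.
    static List<String> namesByAge(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparingInt(Person::getAge))
                .map(Person::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesByAge(sample())); // prints [Paul, Will, Mark]
    }
}
```
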

I find the ‘comparing’ function that we have to use a bit unintuitive and this is what we’d have using totallylazy pre Java 8:

Sequence<Person> people = sequence(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
 
people.sortBy(new Callable1<Person, Integer>() {
    @Override
    public Integer call(Person person) throws Exception {
        return person.getAge();
    }
});

Using Java 8 lambdas the code is much simplified:

Sequence<Person> people = sequence(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
System.out.println(people.sortBy(Person::getAge));

If we use ‘forEach’ to print out each person individually we end up with the following:

Sequence<Person> people = sequence(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
people.sortBy(Person::getAge).forEach((Consumer<? super Person>) System.out::println);

The compiler can’t work out whether we want to use the forEach method from totallylazy or from Iterable so we end up having to cast which is a bit nasty.

I haven’t yet tried converting the totallylazy code I’ve written but my thinking is that the real win of Java 8 will be making it easier to use libraries like totallylazy and Guava.

Overall the book describes Java 8’s features very well but if you’ve used any of the languages I mentioned at the top it will all be very familiar: finally Java has caught up with the rest!

Categories: Programming

Neo4j 2.1.0-M01: LOAD CSV with Rik Van Bruggen’s Tube Graph

Mon, 03/03/2014 - 17:34

Last week we released the first milestone of Neo4j 2.1.0 and one of its features is a new clause in Cypher, LOAD CSV, which aims to make it easier to get data into Neo4j.

I thought I’d give it a try to import the London tube graph – something that my colleague Rik wrote about a few months ago.

I’m using the same data set as Rik but I had to tweak it a bit as there were naming differences when describing the connection from Kennington to Waterloo and Kennington to Oval. My updated version of the dataset is on github.

With the help of Alistair we now have a variation on the original which takes into account the various platforms at stations and the waiting time of a train on the platform. This will also enable us to add in things like getting from the ticket hall to the various platforms more easily.

The model looks like this:

[Image: diagram of the tube graph model]

Now we need to create a graph and the first step is to put an index on station name as we’ll be looking that up quite frequently in the queries that follow:

CREATE INDEX on :Station(stationName)

Now that’s in place we can make use of LOAD CSV. The data is very de-normalised which works out quite nicely for us and we end up with the following script:

LOAD CSV FROM "file:/Users/markhneedham/code/tube/runtimes.csv" AS csvLine
WITH csvLine[0] AS lineName, 
     csvLine[1] AS direction, 
     csvLine[2] AS startStationName,
     csvLine[3] AS destinationStationName, 
     toFloat(csvLine[4]) AS distance, 
     toFloat(csvLine[5]) AS runningTime
 
MERGE (start:Station { stationName: startStationName}) 
MERGE (destination:Station { stationName: destinationStationName}) 
MERGE (line:Line { lineName: lineName}) 
MERGE (line) - [:DIRECTION] -> (dir:Direction { direction: direction}) 
CREATE (inPlatform:InPlatform {name: "In: " + destinationStationName + " " + lineName + " " + direction})
CREATE (outPlatform:OutPlatform {name: "Out: " + startStationName + " " + lineName + " " + direction}) 
CREATE (inPlatform) - [:AT] -> (destination) 
CREATE (outPlatform) - [:AT] -> (start) 
CREATE (inPlatform) - [:ON] -> (dir) 
CREATE (outPlatform) - [:ON] -> (dir) 
CREATE (outPlatform) - [r:TRAIN {distance: distance, runningTime: runningTime}] -> (inPlatform)

This file doesn’t contain any headers so we’ll simulate them by using a WITH clause so that we don’t have index lookups all over the place. In this case we’re pointing to a file on the local file system but we could choose to point to a CSV file on the web if we wanted to.

Since stations, lines and directions appear frequently we’ll use MERGE to ensure they don’t get duplicated.

After that we have a post processing step to connect the ‘in’ and ‘out’ platforms shown in the diagram.

MATCH (station:Station) <-[:AT]- (platformIn:InPlatform), 
      (station:Station) <-[:AT]- (platformOut:OutPlatform), 
      (direction:Direction) <-[:ON]- (platformIn:InPlatform), 
      (direction:Direction) <-[:ON]- (platformOut:OutPlatform) 
CREATE (platformIn) -[:WAIT {runningTime: 0.5}]-> (platformOut)

After running a few queries on the graph I realised that it wasn’t possible to combine some journeys through Kennington and Euston so I had to add some relationships in there as well:

// link the Euston stations
MATCH (euston:Station {stationName: "EUSTON"})<-[:AT]-(eustonIn:InPlatform)
MATCH (eustonCx:Station {stationName: "EUSTON (CX)"})<-[:AT]-(eustonCxIn:InPlatform)
MATCH (eustonCity:Station {stationName: "EUSTON (CITY)"})<-[:AT]-(eustonCityIn:InPlatform)
 
CREATE UNIQUE (eustonIn)-[:WAIT {runningTime: 0.0}]->(eustonCxIn)
CREATE UNIQUE (eustonIn)-[:WAIT {runningTime: 0.0}]->(eustonCityIn)
CREATE UNIQUE (eustonCxIn)-[:WAIT {runningTime: 0.0}]->(eustonCityIn)
 
// link the Kennington stations
MATCH (kenningtonCx:Station {stationName: "KENNINGTON (CX)"})<-[:AT]-(kenningtonCxIn:InPlatform)
MATCH (kenningtonCity:Station {stationName: "KENNINGTON (CITY)"})<-[:AT]-(kenningtonCityIn:InPlatform)
 
CREATE UNIQUE (kenningtonCxIn)-[:WAIT {runningTime: 0.0}]->(kenningtonCityIn)

I’ve been playing around with the A* algorithm to find the quickest route between stations based on the distances between stations.

The next step is to put a timetable graph alongside this so we can do quickest routes at certain parts of the day and the next step after that will be to take delays into account.

If you’ve got some data you want to get into the graph give LOAD CSV a try and let us know how you get on, the cypher team are keen to get feedback on this.

Categories: Programming

Neo4j: Cypher – Finding directors who acted in their own movie

Fri, 02/28/2014 - 23:57

I’ve been doing quite a few Intro to Neo4j sessions recently and since they contain a lot of problems for the attendees to work on I get to see how first time users of Cypher actually use it.

A couple of hours in we want to write a query to find directors who acted in their own film based on the following model.

[Image: diagram of the movie graph model]

A common answer is the following:

MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
WHERE a.name = d.name
RETURN a

We’re matching an actor ‘a’, finding the movie they acted in and then finding the director of that movie. We now have pairs of actors and directors which we filter down by comparing their ‘name’ property.

I haven’t written SQL for a while but if my memory serves me correctly comparing properties or attributes in this way is quite a common way to test for equality.

In a graph we don’t need to compare properties – what we actually want to check is if ‘a’ and ‘d’ are the same node:

MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(d)
WHERE a = d
RETURN a

We’ve simplified the query a bit but we can actually go one better by binding the director to the same identifier as the actor like so:

MATCH (a)-[:ACTED_IN]->(m)<-[:DIRECTED]-(a)
RETURN a

So now we’re matching an actor ‘a’, finding the movie they acted in and then finding the director if they happen to be the same person as ‘a’.

The code is now much simpler and more revealing of its intent too.
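
There is also a correctness angle: two different people can share a name, in which case the property comparison would report a match that the node comparison would not. A loose Java analogy, with object identity standing in for node identity (the Person class and method names are hypothetical):

```java
class SameNode {
    // Hypothetical stand-in for a person node with a 'name' property.
    static class Person {
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Analogous to WHERE a.name = d.name
    static boolean sameByName(Person a, Person d) {
        return a.name.equals(d.name);
    }

    // Analogous to WHERE a = d (or reusing the identifier 'a' in the pattern)
    static boolean sameNode(Person a, Person d) {
        return a == d;
    }

    public static void main(String[] args) {
        Person actor = new Person("Chris Smith");
        Person director = new Person("Chris Smith"); // a different person with the same name
        System.out.println(sameByName(actor, director)); // prints true
        System.out.println(sameNode(actor, director));   // prints false
        System.out.println(sameNode(actor, actor));      // prints true
    }
}
```
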

Categories: Programming

Java 8: Lambda Expressions vs Auto Closeable

Wed, 02/26/2014 - 08:32

If you used earlier versions of Neo4j via its Java API with Java 6 you probably have code similar to the following to ensure write operations happen within a transaction:

public class StylesOfTx
{
    public static void main( String[] args ) throws IOException
    {
        String path = "/tmp/tx-style-test";
        FileUtils.deleteRecursively(new File(path));
 
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
 
        Transaction tx = db.beginTx();
        try 
        {
            db.createNode();
            tx.success();
        } 
        finally 
        {
            tx.close();
        }
    }
}

In Neo4j 2.0 Transaction started extending AutoCloseable which meant that you could use ‘try with resources’ and the ‘close’ method would be automatically called when the block finished:

public class StylesOfTx
{
    public static void main( String[] args ) throws IOException
    {
        String path = "/tmp/tx-style-test";
        FileUtils.deleteRecursively(new File(path));
 
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
 
        try ( Transaction tx = db.beginTx() )
        {
            Node node = db.createNode();
            tx.success();
        }
    }
}

This works quite well although it’s still possible to have transactions hanging around in an application when people don’t use this syntax – the old style is still permissible.

In Venkat Subramaniam’s Java 8 book he suggests an alternative, lambda-based approach:

public class StylesOfTx
{
    public static void main( String[] args ) throws IOException
    {
        String path = "/tmp/tx-style-test";
        FileUtils.deleteRecursively(new File(path));
 
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
 
        Db.withinTransaction(db, neo4jDb -> {
            Node node = neo4jDb.createNode();
        });
    }
 
    static class Db {
        public static void withinTransaction(GraphDatabaseService db, Consumer<GraphDatabaseService> fn) {
            try ( Transaction tx = db.beginTx() )
            {
                fn.accept(db);
                tx.success();
            }
        }
    }
}

The ‘withinTransaction’ function would actually go on GraphDatabaseService or similar rather than being on that Db class but it was easier to put it on there for this example.

A disadvantage of this style is that you don’t have explicit control over the transaction for handling the failure case – it’s assumed that if ‘tx.success()’ isn’t called then the transaction failed and it’s rolled back. I’m not sure what % of use cases actually need such fine grained control though.
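
To see the commit and rollback paths without a database, here is a dependency-free sketch of the same shape. FakeTx is a made-up stand-in that merely records what happens; it is not the Neo4j Transaction API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class HoleInTheMiddle {
    // Made-up stand-in for a transaction: logs begin, then commit or rollback on close.
    static class FakeTx implements AutoCloseable {
        private final List<String> log;
        private boolean success = false;

        FakeTx(List<String> log) {
            this.log = log;
            log.add("begin");
        }

        void success() {
            success = true;
        }

        @Override
        public void close() {
            log.add(success ? "commit" : "rollback");
        }
    }

    // The caller supplies only the 'hole': the work to do between begin and close.
    static void withinTransaction(List<String> log, Consumer<List<String>> work) {
        try (FakeTx tx = new FakeTx(log)) {
            work.accept(log);
            tx.success(); // skipped if the work throws, so close() rolls back
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        withinTransaction(log, l -> l.add("work"));
        System.out.println(log); // prints [begin, work, commit]
    }
}
```

If the lambda throws, the exception still propagates to the caller after the rollback has been logged, so failure handling moves to the call site rather than disappearing.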

Brian Hurt refers to this as the ‘hole in the middle pattern’ and I imagine we’ll start seeing more code of this ilk once Java 8 is released and becomes more widely used.

Categories: Programming

Jersey: Ignoring SSL certificate – javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException

Wed, 02/26/2014 - 01:12

Last week Alistair and I were working on an internal application and we needed to make an HTTPS request directly to an AWS machine using a certificate issued for a different host.

We use jersey-client so our code looked something like this:

Client client = Client.create();
 
client.resource("https://some-aws-host.compute-1.amazonaws.com").post();
// and so on

When we ran this we predictably ran into trouble:

com.sun.jersey.api.client.ClientHandlerException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.compute-1.amazonaws.com found.
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
	at com.sun.jersey.api.client.Client.handle(Client.java:648)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
	at com.sun.jersey.api.client.WebResource.post(WebResource.java:241)
	at com.neotechnology.testlab.manager.bootstrap.ManagerAdmin.takeBackup(ManagerAdmin.java:33)
	at com.neotechnology.testlab.manager.bootstrap.ManagerAdminTest.foo(ManagerAdminTest.java:11)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.compute-1.amazonaws.com found.
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1884)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:276)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:270)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1341)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
	at sun.security.ssl.Handshaker.process_record(Handshaker.java:804)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1300)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
	... 31 more
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.compute-1.amazonaws.com found.
	at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:191)
	at sun.security.util.HostnameChecker.match(HostnameChecker.java:93)
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:347)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:203)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1323)
	... 45 more

We figured that we needed to get our client to ignore the host name mismatch on the certificate and came across this Stack Overflow thread which had some suggestions on how to do this.

None of the suggestions worked on their own but we ended up with a combination of a couple of the suggestions which did the trick:

public Client hostIgnoringClient() {
    try
    {
        SSLContext sslcontext = SSLContext.getInstance( "TLS" );
        sslcontext.init( null, null, null );
        DefaultClientConfig config = new DefaultClientConfig();
        Map<String, Object> properties = config.getProperties();
        HTTPSProperties httpsProperties = new HTTPSProperties(
                new HostnameVerifier()
                {
                    @Override
                    public boolean verify( String s, SSLSession sslSession )
                    {
                        return true;
                    }
                }, sslcontext
        );
        properties.put( HTTPSProperties.PROPERTY_HTTPS_PROPERTIES, httpsProperties );
        config.getClasses().add( JacksonJsonProvider.class );
        return Client.create( config );
    }
    catch ( KeyManagementException | NoSuchAlgorithmException e )
    {
        throw new RuntimeException( e );
    }
}
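
A verifier that returns true for everything accepts any host name at all. If that feels too broad, one possible variation is a verifier that only waves through the single host we already know about. This is a sketch under that assumption rather than something taken from the Stack Overflow thread; ‘SingleHostVerifier’ and ‘allowOnly’ are made-up names, and only the standard ‘javax.net.ssl.HostnameVerifier’ interface is used.

```java
import javax.net.ssl.HostnameVerifier;

class SingleHostVerifier {
    // Accept only the one host we have decided to trust despite the name mismatch.
    static HostnameVerifier allowOnly(final String expectedHost) {
        return (hostname, session) -> expectedHost.equalsIgnoreCase(hostname);
    }

    public static void main(String[] args) {
        HostnameVerifier verifier = allowOnly("some-aws-host.compute-1.amazonaws.com");

        // This verifier never looks at the session, so null is good enough for a demo.
        System.out.println(verifier.verify("some-aws-host.compute-1.amazonaws.com", null));
        System.out.println(verifier.verify("somewhere-else.example.com", null));
    }
}
```

The result of ‘allowOnly(...)’ could be passed to ‘HTTPSProperties’ in place of the anonymous class above.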

You’re welcome Future Mark.

Categories: Programming

Java 8: Group by with collections

Sun, 02/23/2014 - 20:16

In my continued reading of Venkat Subramaniam’s ‘Functional Programming in Java‘ I’ve reached the part of the book where the Stream#collect function is introduced.

We want to take a collection of people, group them by age and return a map of (age -> people’s names), which is where this function comes in handy.

To refresh, this is what the Person class looks like:

static class Person {
    private String name;
    private int age;
 
    Person(String name, int age) {
 
        this.name = name;
        this.age = age;
    }
 
    @Override
    public String toString() {
        return String.format("Person{name='%s', age=%d}", name, age);
    }
}

And we can write the following code in Java 8 to get a map of people’s names grouped by age:

Stream<Person> people = Stream.of(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
Map<Integer, List<String>> peopleByAge = people
    .collect(groupingBy(p -> p.age, mapping((Person p) -> p.name, toList())));
System.out.println(peopleByAge);
{24=[Paul], 28=[Will], 30=[Mark]}

We’re running the ‘collect’ function over the collection, grouping by the ‘age’ property as we go and grouping the names of people rather than the people themselves.
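
For anyone wanting to run this outside the book’s examples, here is a self-contained sketch with the static imports spelled out. It differs from the snippet above in two assumed ways: it asks ‘groupingBy’ for a ‘TreeMap’ so that the keys print in a predictable order, and it adds a second 24 year old (‘Anna’, an invented name) so that one group has more than one entry.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

class GroupNamesByAge {
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Group by age, keeping only the names, with the ages held in sorted order.
    static Map<Integer, List<String>> namesByAge(Stream<Person> people) {
        return people.collect(
                groupingBy((Person p) -> p.age, TreeMap::new, mapping((Person p) -> p.name, toList())));
    }

    public static void main(String[] args) {
        Stream<Person> people = Stream.of(
                new Person("Paul", 24), new Person("Mark", 30),
                new Person("Will", 28), new Person("Anna", 24));

        // prints {24=[Paul, Anna], 28=[Will], 30=[Mark]}
        System.out.println(namesByAge(people));
    }
}
```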

This is a little bit different to what you’d do in Ruby where there’s a ‘group_by’ function which you can call on a collection:

> people = [ {:name => "Paul", :age => 24}, {:name => "Mark", :age => 30}, {:name => "Will", :age => 28}]
> people.group_by { |p| p[:age] }
=> {24=>[{:name=>"Paul", :age=>24}], 30=>[{:name=>"Mark", :age=>30}], 28=>[{:name=>"Will", :age=>28}]}

This gives us back lists of people grouped by age but we need to apply an additional ‘map’ operation to change that to be a list of names instead:

> people.group_by { |p| p[:age] }.map { |k,v| [k, v.map { |person| person[:name] } ] }
=> [[24, ["Paul"]], [30, ["Mark"]], [28, ["Will"]]]

At this stage we’ve got an array of (age, names) pairs but luckily Ruby 2.1.0 has a function ‘to_h’ which we can call to get back to a hash again:

> people.group_by { |p| p[:age] }.map { |k,v| [k, v.map { |person| person[:name] } ] }.to_h
=> {24=>["Paul"], 30=>["Mark"], 28=>["Will"]}

If we want to follow the Java approach of grouping by a property while running a reduce over the collection we’d have something like the following:

> people.reduce({}) { |acc, item| acc[item[:age]] ||=[]; acc[item[:age]] << item[:name]; acc }
=> {24=>["Paul"], 30=>["Mark"], 28=>["Will"]}

If we’re using Clojure then we might end up with something like this instead:

(def people
  [{:name "Paul", :age 24} {:name "Mark", :age 30} {:name "Will", :age 28}])
 
> (reduce (fn [acc [k v]] (assoc-in acc [k] (map :name v))) {} (group-by :age people))
{28 ("Will"), 30 ("Mark"), 24 ("Paul")}

I thought the Java version looked a bit weird to begin with but it’s actually not too bad having worked through the problem in a couple of other languages.

It’d be good to know whether there’s a better way of doing this the Ruby/Clojure way though!

Categories: Programming

Java 8: Sorting values in collections

Sun, 02/23/2014 - 15:43

Having realised that Java 8 is due for its GA release within the next few weeks I thought it was about time I had a look at it and over the last week have been reading Venkat Subramaniam’s book.

I’m up to chapter 3 which covers sorting a collection of people. The Person class is defined roughly like so:

static class Person {
    private String name;
    private int age;
 
    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
 
    @Override
    public String toString() {
        return String.format("Person{name='%s', age=%d}", name, age);
    }
}

In the first example we take a list of people and then sort them in ascending age order:

List<Person> people = Arrays.asList(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
people.stream().sorted((p1, p2) -> p1.age - p2.age).forEach(System.out::println);
Person{name='Paul', age=24}
Person{name='Will', age=28}
Person{name='Mark', age=30}

If we were to write a function to do the same thing in Java 7 it’d look like this:

Collections.sort(people, new Comparator<Person>() {
    @Override
    public int compare(Person o1, Person o2) {
        return o1.age - o2.age;
    }
});
 
for (Person person : people) {
    System.out.println(person);
}

Java 8 has reduced the amount of code we have to write although it’s still more complicated than what we could do in Ruby:

> people = [ {:name => "Paul", :age => 24}, {:name => "Mark", :age => 30}, {:name => "Will", :age => 28}]
> people.sort_by { |p| p[:age] }
=> [{:name=>"Paul", :age=>24}, {:name=>"Will", :age=>28}, {:name=>"Mark", :age=>30}]

A few pages later Venkat shows how you can get close to this by using the Comparator#comparing function:

Function<Person, Integer> byAge = p -> p.age ;
people.stream().sorted(comparing(byAge)).forEach(System.out::println);

I thought I could make this simpler by inlining the ‘byAge’ lambda like this:

people.stream().sorted(comparing(p -> p.age)).forEach(System.out::println);

This seems to compile and run correctly although IntelliJ 13.0 suggests there is a ‘cyclic inference‘ problem. IntelliJ is happy if we explicitly cast the lambda like this:

people.stream().sorted(comparing((Function<Person, Integer>) p -> p.age)).forEach(System.out::println);

IntelliJ also seems happy if we explicitly type ‘p’ in the lambda, so I think I’ll go with that for the moment:

people.stream().sorted(comparing((Person p) -> p.age)).forEach(System.out::println);
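
Another option, which is not from the book as far as this post shows, is ‘Comparator.comparingInt’. It takes a function returning a primitive int, so the ages are not boxed, and it avoids the subtraction trick used in the first example, which can overflow for extreme int values. A minimal sketch, with ‘SortByAge’ and ‘namesSortedByAge’ as made-up names:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortByAge {
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Sort by the int-valued age and return just the names, to keep the output short.
    static List<String> namesSortedByAge(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparingInt((Person p) -> p.age))
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));

        // prints [Paul, Will, Mark]
        System.out.println(namesSortedByAge(people));
    }
}
```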

Categories: Programming

Automating Skype’s ‘This message has been removed’

Fri, 02/21/2014 - 00:16

One of the stranger features of Skype is that it allows you to delete the contents of a message that you’ve already sent to someone – something I haven’t seen on any other messaging system I’ve used.

For example if I wrote a message in Skype and wanted to edit it I would press the ‘up’ arrow:

2014 02 20 23 02 28

Once I’ve deleted the message I’d see this in the space where the message used to be:

2014 02 20 23 00 41

I almost certainly am too obsessed with this but I find it quite amusing when I see people posting and retracting messages so I wanted to see if it could be automated.

Alistair showed me Automator, a built-in tool on the Mac for automating workflows.

Automator allows you to execute AppleScript so we wrote the following code which selects the current chat in Skype, writes a message and then deletes it one character at a time:

on run {input, parameters}
	tell application "Skype"
		activate
	end tell
 
	tell application "System Events"
		set message to "now you see me, now you don't"
		keystroke message
		keystroke return
		keystroke (ASCII character 30) --up arrow
		repeat length of message times
			keystroke (ASCII character 8) --backspace
		end repeat
		keystroke return
	end tell
	return input
end run

We wired up the AppleScript via the Utilities > Run AppleScript menu option in Automator:

2014 02 20 23 12 38

We can then go further and wire that up to a keyboard shortcut if we want by saving the workflow as a service in Automator but for my messing around purposes clicking the ‘Run’ button from Automator didn’t seem too much of a hardship!

Categories: Programming

Neo4j: Cypher – Set Based Operations

Thu, 02/20/2014 - 19:22

I was recently reminded of a Neo4j cypher query that I wrote a couple of years ago to find the colleagues that I hadn’t worked with in the ThoughtWorks London office.

The model looked like this:

2014 02 18 17 04 01

And I created the following fake data set of the aforementioned model:

public class SetBasedOperations
{
    private static final Label PERSON = DynamicLabel.label( "Person" );
    private static final Label OFFICE = DynamicLabel.label( "Office" );
 
    private static final DynamicRelationshipType COLLEAGUES = DynamicRelationshipType.withName( "COLLEAGUES" );
    private static final DynamicRelationshipType MEMBER_OF = DynamicRelationshipType.withName( "MEMBER_OF" );
 
    public static void main( String[] args ) throws IOException
    {
        Random random = new Random();
        String path = "/tmp/set-based-operations";
        FileUtils.deleteRecursively( new File( path ) );
 
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path );
 
        Transaction tx = db.beginTx();
        try
        {
            Node me = db.createNode( PERSON );
            me.setProperty( "name", "me" );
 
            Node londonOffice = db.createNode( OFFICE );
            londonOffice.setProperty( "name", "London Office" );
 
            me.createRelationshipTo( londonOffice, MEMBER_OF );
 
            for ( int i = 0; i < 1000; i++ )
            {
                Node colleague = db.createNode( PERSON );
                colleague.setProperty( "name", "person" + i );
 
                colleague.createRelationshipTo( londonOffice, MEMBER_OF );
 
                if(random.nextInt( 10 ) >= 8) {
                    me.createRelationshipTo( colleague, COLLEAGUES );
                }
 
            }
 
            // Mark the transaction as successful once, after the loop has created everything.
            tx.success();
        }
        finally
        {
            tx.finish();
        }
 
        db.shutdown();
 
        CommunityNeoServer server = CommunityServerBuilder
                .server()
                .usingDatabaseDir( path )
                .onPort( 9001 )
                .persistent()
                .build();
 
        server.start();
 
    }
}

I’ve created a node representing me and 1,000 people who work in the London office. Each of those people has a 2 in 10 chance of having worked with me, so roughly 200 of them end up as my colleagues.

If I want to write a cypher query to find the exact number of people who haven’t worked with me I might start with the following:

MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)

We start by finding me, then find the London office which I was a member of, and then find the other people who are members of that office. On the second line we remove people that I’ve previously worked with and then return a count of the people who are left.

When I ran this through my Cypher query tuning tool the average time to evaluate this query was 7.46 seconds.

That is obviously a bit too slow if we want to run the query on a web page and as far as I can tell the reason for that is that for each potential colleague we are searching through my ‘COLLEAGUES’ relationships and checking whether they exist. We’re doing that 1,000 times which is a bit inefficient.

I chatted to David about this, and he suggested that a more efficient query would be to work out all my colleagues up front once and then do the filtering from that set of people instead.

The re-worked query looks like this:

MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)

When we run that through the query tuner the average time reduces to 150 milliseconds which is much better.

This type of query seems to be more about set operations than graph ones because we’re looking for what isn’t there rather than what is. When that’s the case getting the set of things that we want to compare against up front is more profitable.
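
The same idea can be sketched away from the graph with plain collections. This is only an analogy and says nothing about how Neo4j evaluates either query: the colleagues are collected into a set once, and each office member is then checked against that set instead of being searched for from scratch. The class and method names, and the choice of every fifth person as a colleague, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NotWorkedWith {
    // Build the set of colleagues up front, then filter the office members against it.
    static int countNotWorkedWith(List<String> officeMembers, List<String> myColleagues) {
        Set<String> colleagueSet = new HashSet<>(myColleagues);
        int count = 0;
        for (String member : officeMembers) {
            if (!colleagueSet.contains(member)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> officeMembers = new ArrayList<>();
        List<String> myColleagues = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            officeMembers.add("person" + i);
            if (i % 5 == 0) {
                myColleagues.add("person" + i); // every fifth person has worked with me
            }
        }

        // prints 800
        System.out.println(countNotWorkedWith(officeMembers, myColleagues));
    }
}
```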

Categories: Programming

Neo4j: Creating nodes and relationships from a list of maps

Mon, 02/17/2014 - 15:11

Last week Alistair and I were porting some Neo4j cypher queries from 1.8 to 2.0 and one of the queries we had to change was an interesting one that created a bunch of relationships from a list/array of maps.

In the query we had a user ‘Mark’ and wanted to create ‘FRIENDS_WITH’ relationships to Peter and Michael.

2014 02 17 13 39 08

The application passed in a list of maps representing Peter and Michael as a parameter but if we remove the parameters the query looked like this:

MERGE (me:User {userId: 1} )
SET me.name = "Mark"
FOREACH (f IN [{userId: 2, name: "Michael"}, {userId: 3, name: "Peter"}] | 
    MERGE (u:User {userId: f.userId})
    SET u = f
    MERGE (me)-[:FRIENDS_WITH]->(u))

We first ensure that a user with a ‘userId’ of 1 exists in the database and then make sure their name is set to ‘Mark’. After we’ve done that we iterate over a list of maps containing Mark’s friends and ensure there is a ‘FRIENDS_WITH’ relationship from Mark to them.

The parameterised version of that query looks like this:

MERGE (me:User { userId: {userId} }) 
SET me.name = {name} 
FOREACH(f IN {friends} | 
    MERGE (u:User {userId: f.userId }) 
    SET u = f 
    MERGE (me)-[:FRIENDS_WITH]->(u))

We can then execute that query using Jersey:

public class ListsOfMapsCypher
{
    public static void main( String[] args )
    {
        ObjectNode request = JsonNodeFactory.instance.objectNode();
        request.put("query",
                "MERGE (me:User { userId: {userId} }) " +
                "SET me.name = {name} " +
                "FOREACH(f IN {friends} | " +
                    "MERGE (u:User {userId: f.userId }) " +
                    "SET u = f " +
                    "MERGE (me)-[:FRIENDS_WITH]->(u)) ");
 
        ObjectNode params = JsonNodeFactory.instance.objectNode();
        params.put("userId", 1);
        params.put("name", "Mark");
 
        ArrayNode friends = JsonNodeFactory.instance.arrayNode();
 
        ObjectNode friend1 = JsonNodeFactory.instance.objectNode();
        friend1.put( "userId", 2 );
        friend1.put( "name", "Michael" );
        friends.add( friend1 );
 
        ObjectNode friend2 = JsonNodeFactory.instance.objectNode();
        friend2.put( "userId", 3 );
        friend2.put( "name", "Peter" );
        friends.add( friend2 );
 
        params.put("friends", friends );
 
        request.put("params", params );
 
        ClientResponse clientResponse = client()
                .resource( "http://localhost:7474/db/data/cypher" )
                .accept( MediaType.APPLICATION_JSON )
                .entity( request, MediaType.APPLICATION_JSON_TYPE )
                .post( ClientResponse.class );
 
 
        System.out.println(clientResponse.getEntity( String.class ));
    }
 
    private static Client client()
    {
        DefaultClientConfig defaultClientConfig = new DefaultClientConfig();
        defaultClientConfig.getClasses().add(JacksonJsonProvider.class);
        return Client.create(defaultClientConfig);
    }
}
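
The shape of the ‘params’ object is easier to see without the Jackson node classes in the way. The sketch below builds the same structure from plain maps and lists and prints it. It does not talk to Neo4j, and ‘FriendsParams’, ‘user’ and ‘params’ are names invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FriendsParams {
    // One map per user, matching the keys that the query reads (userId, name).
    static Map<String, Object> user(int userId, String name) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("userId", userId);
        user.put("name", name);
        return user;
    }

    // The full parameter map: Mark's own fields plus the list of friend maps.
    static Map<String, Object> params() {
        List<Map<String, Object>> friends = new ArrayList<>();
        friends.add(user(2, "Michael"));
        friends.add(user(3, "Peter"));

        Map<String, Object> params = user(1, "Mark");
        params.put("friends", friends);
        return params;
    }

    public static void main(String[] args) {
        // prints {userId=1, name=Mark, friends=[{userId=2, name=Michael}, {userId=3, name=Peter}]}
        System.out.println(params());
    }
}
```

Each entry in ‘friends’ is what ‘f’ refers to inside the FOREACH, which is why ‘f.userId’ can be used in the MERGE.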

We can then write a query to check Mark and his friends were persisted:

2014 02 17 14 10 12

And that’s it!

Categories: Programming

Neo4j: Value in relationships, but value in nodes too!

Thu, 02/13/2014 - 01:10

I’ve recently spent a bit of time working with people on their graph commons and a common pattern I’ve come across is that although the models have lots of relationships there are often missing nodes.

Emails

We’ll start with a model which represents the emails that people send between each other. A first cut might look like this:

2014 02 12 08 30 59

The problem with this approach is that we haven’t modelled the concept of an email – that’s been implicitly modelled via a relationship.

This means that if we want to indicate who was cc’d or bcc’d on the email we can’t do it. We might also want to track the replies on a thread but again we can’t do it.

A richer model that treated an email as a first class citizen would allow us to do both these things and would look like this:

2014 02 12 23 16 02

We could then write queries to get the chain of emails in a thread or find all the emails that a person was cc’d on – two queries that would be much more difficult to write if we didn’t have the concept of an email.
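
To make the point concrete outside of a graph, here is a small object sketch of the richer model. It is only an analogy: the ‘Email’ class, its fields and the two helper methods are invented for this example. Once an email is a thing in its own right it has somewhere to hang the cc list and the reply link, and both questions become short loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmailModel {
    static class Email {
        final String subject;
        final String from;
        final List<String> to;
        final List<String> cc;
        final Email inReplyTo; // null for the first email in a thread

        Email(String subject, String from, List<String> to, List<String> cc, Email inReplyTo) {
            this.subject = subject;
            this.from = from;
            this.to = to;
            this.cc = cc;
            this.inReplyTo = inReplyTo;
        }
    }

    // Walk back from an email to the start of its thread, oldest subject first.
    static List<String> thread(Email email) {
        List<String> subjects = new ArrayList<>();
        for (Email e = email; e != null; e = e.inReplyTo) {
            subjects.add(0, e.subject);
        }
        return subjects;
    }

    // Subjects of all the emails that the given person was cc'd on.
    static List<String> ccdOn(List<Email> emails, String person) {
        List<String> subjects = new ArrayList<>();
        for (Email e : emails) {
            if (e.cc.contains(person)) {
                subjects.add(e.subject);
            }
        }
        return subjects;
    }

    public static void main(String[] args) {
        Email first = new Email("Lunch?", "Mark", Arrays.asList("Peter"), Arrays.asList("Michael"), null);
        Email reply = new Email("Re: Lunch?", "Peter", Arrays.asList("Mark"), new ArrayList<String>(), first);

        System.out.println(thread(reply));                                // [Lunch?, Re: Lunch?]
        System.out.println(ccdOn(Arrays.asList(first, reply), "Michael")); // [Lunch?]
    }
}
```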

Footballers and football matches

Our second example comes from my football dataset and involves modelling the matches that players participated in.

My first attempt looked like this:

2014 02 12 23 30 35

This works reasonably well but I wanted to be able to model which team a player had represented in a match which was quite difficult with this model.

One approach would be to add a ‘team’ property to the ‘PLAYED_IN’ relationship but then we’d need to do some work at query time to work out which team node that property value referred to.

Instead I realised that I was missing the concept of a player’s performance in a match which would make some queries much easier to write. The new model looks like this:

2014 02 12 23 37 28

The tube

The final example is modelling the London tube although this could apply to any transport system. Our initial model of part of the Northern Line might look like this:

2014 02 12 23 59 46

This model works really well and my colleague Rik has written a blog post showing the queries you could write against it.

However, it’s missing the concept of a platform, which means that if we want to create a routing application which takes into account the amount of time it takes to move between different platforms we can’t do it.

If we introduce a node to represent the different platforms in a station we can introduce that type of information:

2014 02 13 00 04 06

In each of these examples we’ve effectively normalised our model by introducing an extra concept which means it looks more complicated.

The benefit of this approach across all three examples is that it allows us to answer more complicated questions of our data which in my experience are the really interesting questions.

As always, let me know what you think in the comments.

Categories: Programming