Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Mark Needham
Thoughts on Software Development

Neo4j/R: Analysing London NoSQL meetup membership

Sat, 05/31/2014 - 22:32

In my spare time I’ve been working on a Neo4j application that runs on top of meetup.com’s API, and Nicole recently showed me how I could wire up some of the queries to use her Rneo4j library:

@markhneedham pic.twitter.com/8014jckEUl

— Nicole White (@_nicolemargaret) May 31, 2014

The query used in that visualisation shows the number of members that overlap between each pair of groups, but a more interesting query is one which shows the percentage overlap between groups based on the unique members across both groups. For example, if one group has 100 members, another has 50 and 30 people belong to both, the combined unique membership is 120, so the overlap is 100 * 30 / 120 = 25%.

The query is a bit more complicated than the original:

MATCH (group1:Group), (group2:Group)
OPTIONAL MATCH (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
 
WITH group1, group2, COUNT(*) as commonMembers
MATCH (group1)<-[:MEMBER_OF]-(group1Member)
 
WITH group1, group2, commonMembers, COLLECT(id(group1Member)) AS group1Members
MATCH (group2)<-[:MEMBER_OF]-(group2Member)
 
WITH group1, group2, commonMembers, group1Members, COLLECT(id(group2Member)) AS group2Members
WITH group1, group2, commonMembers, group1Members, group2Members
 
UNWIND(group1Members + group2Members) AS combinedMember
WITH DISTINCT group1, group2, commonMembers, combinedMember
 
WITH group1, group2, commonMembers, COUNT(combinedMember) AS combinedMembers
 
RETURN group1.name, group2.name, toInt(round(100.0 * commonMembers / combinedMembers)) AS percentage
ORDER BY group1.name, group2.name

The next step is to wire that up to use Rneo4j and ggplot2. First we’ll get the libraries installed and loaded:

install.packages("devtools")
devtools::install_github("nicolewhite/Rneo4j")
install.packages("ggplot2")
 
library(Rneo4j)
library(ggplot2)

And now we’ll execute the query and create a chart from the results:

graph = startGraph("http://localhost:7474/db/data/")
 
query = "MATCH (group1:Group), (group2:Group)
         WHERE group1 <> group2
         OPTIONAL MATCH p = (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
         WITH group1, group2, COLLECT(p) AS paths
         RETURN group1.name, group2.name, LENGTH(paths) as commonMembers
         ORDER BY group1.name, group2.name"
 
group_overlap = cypher(graph, query)
 
ggplot(group_overlap, aes(x=group1.name, y=group2.name, fill=commonMembers)) + 
geom_bin2d() +
geom_text(aes(label = commonMembers)) +
labs(x= "Group", y="Group", title="Member Group Member Overlap") +
scale_fill_gradient(low="white", high="red") +
theme(axis.text = element_text(size = 12, color = "black"),
      axis.title = element_text(size = 14, color = "black"),
      plot.title = element_text(size = 16, color = "black"),
      axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
 
# as percentage
 
query = "MATCH (group1:Group), (group2:Group)
         WHERE group1 <> group2
         OPTIONAL MATCH path = (group1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(group2)
 
         WITH group1, group2, COLLECT(path) AS paths
 
         WITH group1, group2, LENGTH(paths) as commonMembers
         MATCH (group1)<-[:MEMBER_OF]-(group1Member)
 
         WITH group1, group2, commonMembers, COLLECT(id(group1Member)) AS group1Members
         MATCH (group2)<-[:MEMBER_OF]-(group2Member)
 
         WITH group1, group2, commonMembers, group1Members, COLLECT(id(group2Member)) AS group2Members
         WITH group1, group2, commonMembers, group1Members, group2Members
 
         UNWIND(group1Members + group2Members) AS combinedMember
         WITH DISTINCT group1, group2, commonMembers, combinedMember
 
         WITH group1, group2, commonMembers, COUNT(combinedMember) AS combinedMembers
 
         RETURN group1.name, group2.name, toInt(round(100.0 * commonMembers / combinedMembers)) AS percentage
 
         ORDER BY group1.name, group2.name"
 
group_overlap = cypher(graph, query)
 
ggplot(group_overlap, aes(x=group1.name, y=group2.name, fill=percentage)) + 
  geom_bin2d() +
  geom_text(aes(label = percentage)) +
  labs(x= "Group", y="Group", title="Member Group Member Overlap") +
  scale_fill_gradient(low="white", high="red") +
  theme(axis.text = element_text(size = 12, color = "black"),
        axis.title = element_text(size = 14, color = "black"),
        plot.title = element_text(size = 16, color = "black"),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

[Screenshot: heatmap of percentage member overlap between the meetup groups]

A first glance at the visualisation suggests that the Hadoop, Data Science and Big Data groups have the most overlap, which seems to make sense as they do cover quite similar topics.

Thanks to Nicole for the library and the idea of the visualisation. Now we need to do some more analysis on the data to see if there are any more interesting insights.

Categories: Programming

Thoughts on meetups

Sat, 05/31/2014 - 20:50

I recently came across an interesting blog post by Zach Tellman in which he explains a new approach that he’s been trialling at The Bay Area Clojure User Group.

Zach explains that a lecture-based approach isn’t necessarily the most effective way for people to learn, and that half of the people attending the meetup are likely to be novices who would struggle to follow more advanced content.

He then goes on to explain an alternative approach:

We’ve been experimenting with a Clojure meetup modelled on a different academic tradition: office hours.

At a university, students who have questions about the lecture content or coursework can visit the professor and have a one-on-one conversation.

At the beginning of every meetup, we give everyone a name tag, and provide a whiteboard with two columns, “teachers” and “students”.

Attendees are encouraged to put their name and interests in both columns. From there, everyone can [...] go in search of someone from the opposite column who shares their interests.

While running Neo4j meetups we’ve had similar observations and my colleagues Stefan and Cedric actually ran a meetup in Paris a few months ago which sounds very similar to Zach’s ‘office hours’ style one.

However, we’ve also been experimenting with the idea that one size doesn’t need to fit all by running different styles of meetups aimed at different people.

For example, we have:

  • An introductory meetup which aims to get people to the point where they can follow talks about more advanced topics.
  • A more hands on session for people who want to learn how to write queries in cypher, Neo4j’s query language.
  • An advanced session for people who want to learn how to model a problem as a graph and import data into a graph.

I’m also thinking of running something similar to the Clojure Dojo but focused on data and graphs where groups of people could work together and build an app.

I noticed that Nick Manning has been doing a similar thing with the New York City Neo4j meetup as well, which is cool.

I’d be interested in hearing about different / better approaches that other people have come across so if you know of any let me know in the comments.

Categories: Programming

Neo4j: Cypher – UNWIND vs FOREACH

Sat, 05/31/2014 - 15:19

I’ve written a couple of posts about the new UNWIND clause in Neo4j’s cypher query language but I forgot about my favourite use of UNWIND, which is to get rid of some uses of FOREACH from our queries.

Let’s say we’ve created a timetree up front and now have a series of events coming in that we want to create in the database and attach to the appropriate part of the timetree.

Before UNWIND existed we might try to write the following query using FOREACH:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
FOREACH (event IN events | 
  CREATE (e:Event {name: event.name})
  MATCH (year:Year {year: event.timetree.year }), 
        (year)-[:HAS_MONTH]->(month {month: event.timetree.month }),
        (month)-[:HAS_DAY]->(day {day: event.timetree.day })
  CREATE (e)-[:HAPPENED_ON]->(day))

Unfortunately we can’t use MATCH inside a FOREACH statement so we’ll get the following error:

Invalid use of MATCH inside FOREACH (line 5, column 3)
"  MATCH (year:Year {year: event.timetree.year }), "
   ^
Neo.ClientError.Statement.InvalidSyntax

We can work around this by using MERGE instead in the knowledge that it’s never going to create anything because the timetree already exists:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
FOREACH (event IN events | 
  CREATE (e:Event {name: event.name})
  MERGE (year:Year {year: event.timetree.year })
  MERGE (year)-[:HAS_MONTH]->(month {month: event.timetree.month })
  MERGE (month)-[:HAS_DAY]->(day {day: event.timetree.day })
  CREATE (e)-[:HAPPENED_ON]->(day))

If we replace the FOREACH with UNWIND we’d get the following:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
UNWIND events AS event
CREATE (e:Event {name: event.name})
WITH e, event.timetree AS timetree
MATCH (year:Year {year: timetree.year }), 
      (year)-[:HAS_MONTH]->(month {month: timetree.month }),
      (month)-[:HAS_DAY]->(day {day: timetree.day })
CREATE (e)-[:HAPPENED_ON]->(day)

Although the number of lines of code has slightly increased, the query is now correct and we won’t accidentally create new parts of our time tree.

We could also pass the event that we created on to the next part of the query, which isn’t possible when using FOREACH.
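
For example, a minimal sketch (assuming we just want to see what was attached) only needs a RETURN added to the end of the UNWIND version:

WITH [{name: "Event 1", timetree: {day: 1, month: 1, year: 2014}}, 
      {name: "Event 2", timetree: {day: 2, month: 1, year: 2014}}] AS events
UNWIND events AS event
CREATE (e:Event {name: event.name})
WITH e, event.timetree AS timetree
MATCH (year:Year {year: timetree.year }), 
      (year)-[:HAS_MONTH]->(month {month: timetree.month }),
      (month)-[:HAS_DAY]->(day {day: timetree.day })
CREATE (e)-[:HAPPENED_ON]->(day)
RETURN e.name, day.day, month.month, year.year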

Categories: Programming

Neo4j: Cypher – Neo.ClientError.Statement.ParameterMissing and neo4j-shell

Sat, 05/31/2014 - 13:44

Every now and then I get sent Neo4j cypher queries to look at, and more often than not they’re parameterised, which means you can’t easily run them in the Neo4j browser.

For example let’s say we have a database which has a user called ‘Mark’:

CREATE (u:User {name: "Mark"})

Now we write a query to find ‘Mark’ with the name parameterised so we can easily search for a different user in future:

MATCH (u:User {name: {name}}) RETURN u

If we run that query in the Neo4j browser we’ll get this error:

Expected a parameter named name
Neo.ClientError.Statement.ParameterMissing

If we try that in neo4j-shell we’ll get the same exception to start with:

$ MATCH (u:User {name: {name}}) RETURN u;
ParameterNotFoundException: Expected a parameter named name

However, as Michael pointed out to me, the neat thing about neo4j-shell is that we can define parameters by using the export command:

$ export name="Mark"
$ MATCH (u:User {name: {name}}) RETURN u;
+-------------------------+
| u                       |
+-------------------------+
| Node[1923]{name:"Mark"} |
+-------------------------+
1 row

export is a bit sensitive to spaces so it’s best to keep them to a minimum. For example, the following tries to create the variable ‘name ’ (with a trailing space), which is invalid:

$ export name = "Mark"
name  is no valid variable name. May only contain alphanumeric characters and underscores.

The variables we create in the shell don’t have to only be primitives. We can create maps too:

$ export params={ name: "Mark" }
$ MATCH (u:User {name: {params}.name}) RETURN u;
+-------------------------+
| u                       |
+-------------------------+
| Node[1923]{name:"Mark"} |
+-------------------------+
1 row

A simple tip but one that saves me from having to rewrite queries all the time!

Categories: Programming

Clojure: Destructuring group-by’s output

Sat, 05/31/2014 - 01:03

One of my favourite features of Clojure is that it allows you to destructure a data structure into values that are a bit easier to work with.

I often find myself referring to Jay Fields’ article which contains several examples showing the syntax and is a good starting point.

One recent use of destructuring I had was where I was working with a vector containing events like this:

user> (def events [{:name "e1" :timestamp 123} {:name "e2" :timestamp 456} {:name "e3" :timestamp 789}])

I wanted to split the events in two – those with a timestamp greater than 123 and those with a timestamp less than or equal to 123.

After remembering that the function I wanted was group-by and not partition-by (I always make that mistake!) I had the following:

user> (group-by #(> (->> % :timestamp) 123) events)
{false [{:name "e1", :timestamp 123}], true [{:name "e2", :timestamp 456} {:name "e3", :timestamp 789}]}

I wanted to get 2 vectors that I could pass to the web page and this is fairly easy with destructuring:

user> (let [{upcoming true past false} (group-by #(> (->> % :timestamp) 123) events)] 
       (println upcoming) (println past))
[{:name e2, :timestamp 456} {:name e3, :timestamp 789}]
[{:name e1, :timestamp 123}]
nil

Simple!

Categories: Programming

Neo4j: Cypher – Rounding a float value to decimal places

Sun, 05/25/2014 - 23:17

About 6 months ago Jacqui Read created a github issue explaining how she wanted to round a float value to a number of decimal places but was unable to do so due to the round function not taking the appropriate parameter.

I found myself wanting to do the same thing last week where I initially had the following value:

RETURN toFloat("12.336666") AS value

I wanted to round that to 2 decimal places and Wes suggested multiplying the value before using ROUND and then dividing afterwards to achieve that.

For two decimal places we need to multiply and divide by 100:

WITH toFloat("12.336666") AS value
RETURN round(100 * value) / 100 AS value
12.34

Michael suggested abstracting the number of decimal places like so:

WITH 2 as precision
WITH toFloat("12.336666") AS value, 10^precision AS factor
RETURN round(factor * value)/factor AS value

If we want to round to 4 decimal places we can easily change that:

WITH 4 as precision
WITH toFloat("12.336666") AS value, 10^precision AS factor
RETURN round(factor * value)/factor AS value
12.3367

The code is available as a graph gist too if you want to play around with it. And you may as well check out the other gists while you’re at it – enjoy!

Categories: Programming

Neo4j 2.1: Passing around node ids vs UNWIND

Sun, 05/25/2014 - 11:48

When Neo4j 2.1 is released we’ll have the UNWIND clause which makes working with collections of things easier.

In my blog post about creating adjacency matrices we wanted to show how many people were members of the first 5 meetup groups ordered alphabetically and then check how many were members of each of the other groups.

Without the UNWIND clause we’d have to do this:

MATCH (g:Group)
WITH g
ORDER BY g.name
LIMIT 5
 
WITH COLLECT(id(g)) AS groups
 
MATCH (g1) WHERE id(g1) IN groups
MATCH (g2) WHERE id(g2) IN groups
 
OPTIONAL MATCH path = (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
 
RETURN g1.name, g2.name, CASE WHEN path is null THEN 0 ELSE COUNT(path) END AS overlap

Here we get the first 5 groups, put their IDs into a collection and then create a cartesian product of groups by doing back-to-back MATCHes with a node id lookup.

If, instead of passing around node ids in ‘groups’, we passed around nodes and used those in the MATCH step, we’d end up doing a full node scan, which becomes very slow as the store grows.

For example, this version would be very slow:

MATCH (g:Group)
WITH g
ORDER BY g.name
LIMIT 5
 
WITH COLLECT(g) AS groups
 
MATCH (g1) WHERE g1 IN groups
MATCH (g2) WHERE g2 IN groups
 
OPTIONAL MATCH path = (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
 
RETURN g1.name, g2.name, CASE WHEN path is null THEN 0 ELSE COUNT(path) END AS overlap

This is the output from the original query:

+-------------------------------------------------------------------------------------------------------------+
| g1.name                                         | g2.name                                         | overlap |
+-------------------------------------------------------------------------------------------------------------+
| "Big Data Developers in London"                 | "Big Data / Data Science / Data Analytics Jobs" | 17      |
| "Big Data Jobs in London"                       | "Big Data London"                               | 190     |
| "Big Data London"                               | "Big Data Developers in London"                 | 244     |
| "Cassandra London"                              | "Big Data / Data Science / Data Analytics Jobs" | 16      |
| "Big Data Jobs in London"                       | "Big Data Developers in London"                 | 52      |
| "Cassandra London"                              | "Cassandra London"                              | 0       |
| "Big Data London"                               | "Big Data / Data Science / Data Analytics Jobs" | 36      |
| "Big Data London"                               | "Cassandra London"                              | 422     |
| "Big Data Jobs in London"                       | "Big Data Jobs in London"                       | 0       |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data / Data Science / Data Analytics Jobs" | 0       |
| "Big Data Jobs in London"                       | "Cassandra London"                              | 74      |
| "Big Data Developers in London"                 | "Big Data London"                               | 244     |
| "Cassandra London"                              | "Big Data Jobs in London"                       | 74      |
| "Cassandra London"                              | "Big Data London"                               | 422     |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data London"                               | 36      |
| "Big Data Jobs in London"                       | "Big Data / Data Science / Data Analytics Jobs" | 20      |
| "Big Data Developers in London"                 | "Big Data Jobs in London"                       | 52      |
| "Cassandra London"                              | "Big Data Developers in London"                 | 69      |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Jobs in London"                       | 20      |
| "Big Data Developers in London"                 | "Big Data Developers in London"                 | 0       |
| "Big Data Developers in London"                 | "Cassandra London"                              | 69      |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Developers in London"                 | 17      |
| "Big Data London"                               | "Big Data Jobs in London"                       | 190     |
| "Big Data / Data Science / Data Analytics Jobs" | "Cassandra London"                              | 16      |
| "Big Data London"                               | "Big Data London"                               | 0       |
+-------------------------------------------------------------------------------------------------------------+
25 rows

If we use UNWIND we don’t need to pass around node ids anymore, instead we can collect up the nodes into a collection and then explode them out into a cartesian product:

MATCH (g:Group)
WITH g
ORDER BY g.name
LIMIT 5
 
WITH COLLECT(g) AS groups
 
UNWIND groups AS g1
UNWIND groups AS g2
 
OPTIONAL MATCH path = (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
 
RETURN g1.name, g2.name, CASE WHEN path is null THEN 0 ELSE COUNT(path) END AS overlap

There’s not significantly less code but I think the intent of the query is a bit clearer using UNWIND.

I’m looking forward to seeing the innovative uses of UNWIND people come up with once 2.1 is GA.

Categories: Programming

Clojure: Create a directory

Sat, 05/24/2014 - 01:12

I spent much longer than I should have done trying to work out how to create a directory in Clojure as part of an import script I’m working on, so for my future self this is how you do it:

(.mkdir (java.io.File. "/path/to/dir/to/create"))

I’m creating a directory whose name contains today’s date, so I’d want something like ‘members-2014-05-24’ if I was running it today. The clj-time library is very good for working with dates.

To create a folder containing today’s date this is what we’d have:

(ns neo4j-meetup.core
  (:require [clj-time.core :as t]
            [clj-time.format :as f]))
 
(def format-as-year-month-day (f/formatter "yyyy-MM-dd"))
 
(defn create-directory-for-today []
  (let [date (f/unparse format-as-year-month-day (t/now))]
    (.mkdir (java.io.File. (str "data/members-" date)))))

Initial code shamelessly stolen from Shu Wang’s gist so thanks to him as well!

Categories: Programming

Neo4j 2.1: Creating adjacency matrices

Wed, 05/21/2014 - 00:14

About 9 months ago I wrote a blog post showing how to export an adjacency matrix from a Neo4j 1.9 database using the cypher query language, and I thought it deserved an update to use 2.0 syntax.

I’ve been spending some of my free time working on an application that runs on top of meetup.com’s API and one of the queries I wanted to write was to find the common members between 2 meetup groups.

The first part of this query is a cartesian product of the groups we want to consider which will give us the combinations of pairs of groups:

MATCH (g1:Group), (g2:Group)
RETURN g1.name, g2.name
LIMIT 10
+-------------------------------------------------------------------------------------+
| g1.name                           | g2.name                                         |
+-------------------------------------------------------------------------------------+
| "London ElasticSearch User Group" | "London ElasticSearch User Group"               |
| "London ElasticSearch User Group" | "Big Data / Data Science / Data Analytics Jobs" |
| "London ElasticSearch User Group" | "eXist User Group London"                       |
| "London ElasticSearch User Group" | "Couchbase London"                              |
| "London ElasticSearch User Group" | "Big Data Developers in London"                 |
| "London ElasticSearch User Group" | "HBase London Meetup"                           |
| "London ElasticSearch User Group" | "Marklogic Financial Services Community"        |
| "London ElasticSearch User Group" | "GridGain London"                               |
| "London ElasticSearch User Group" | "MEAN Stack"                                    |
| "London ElasticSearch User Group" | "Hazelcast User Group London (HUGL)"            |
...
+-------------------------------------------------------------------------------------+

Our next step is to write a pattern which checks for common members between each pair of groups. We end up with the following:

MATCH (g1:Group), (g2:Group)
OPTIONAL MATCH (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
RETURN g1.name, g2.name, COUNT(*) AS overlap
+----------------------------------------------------------------------------------------------+
| g1.name                     | g2.name                                              | overlap |
+----------------------------------------------------------------------------------------------+
| "eXist User Group London"   | "Women in Data"                                      | 1       |
| "Hive London"               | "Big Data Developers in London"                      | 47      |
| "Neo4j - London User Group" | "London ElasticSearch User Group"                    | 80      |
| "MEAN Stack"                | "The London Distributed Graph Database Meetup Group" | 1       |
| "HBase London Meetup"       | "Big Data London"                                    | 92      |
| "London MongoDB User Group" | "Big Data Developers in London"                      | 63      |
| "Big Data London"           | "Hive London"                                        | 195     |
| "HBase London Meetup"       | "Cassandra London"                                   | 58      |
| "Big Data London"           | "Neo4j - London User Group"                          | 330     |
| "Cassandra London"          | "Oracle Big Data 4 the Enterprise"                   | 50      |
...
+----------------------------------------------------------------------------------------------+

The next step is to sort the rows so that we can create an array of values for each group in our next step. We therefore sort by group1 and then by group2:

MATCH (g1:Group), (g2:Group)
OPTIONAL MATCH (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
RETURN g1.name, g2.name, COUNT(*) AS overlap
ORDER BY g1.name, g2.name
+-------------------------------------------------------------------------------------------------------------+
| g1.name                                         | g2.name                                         | overlap |
+-------------------------------------------------------------------------------------------------------------+
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data / Data Science / Data Analytics Jobs" | 1       |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Developers in London"                 | 17      |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Jobs in London"                       | 20      |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data London"                               | 37      |
| "Big Data / Data Science / Data Analytics Jobs" | "Cassandra London"                              | 16      |
| "Big Data / Data Science / Data Analytics Jobs" | "Couchbase London"                              | 3       |
| "Big Data / Data Science / Data Analytics Jobs" | "Data Science London"                           | 49      |
| "Big Data / Data Science / Data Analytics Jobs" | "DeNormalised London"                           | 3       |
| "Big Data / Data Science / Data Analytics Jobs" | "Enterprise Search London Meetup"               | 2       |
| "Big Data / Data Science / Data Analytics Jobs" | "GridGain London"                               | 1       |
...
+-------------------------------------------------------------------------------------------------------------+

One strange thing we see here is that there is an overlap of 1 between ‘Big Data / Data Science / Data Analytics Jobs’ and itself, which is ‘wrong’ as the query doesn’t actually find any overlapping members. However, since we used OPTIONAL MATCH we still get 1 row back for that pair of groups with a ‘null’ path, and COUNT(*) counts that row. Let’s fix that:

MATCH (g1:Group), (g2:Group)
OPTIONAL MATCH path = (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
 
WITH g1, g2, CASE WHEN path is null THEN 0 ELSE COUNT(path) END AS overlap
RETURN g1.name, g2.name, overlap
ORDER BY g1.name, g2.name
LIMIT 10
+-------------------------------------------------------------------------------------------------------------+
| g1.name                                         | g2.name                                         | overlap |
+-------------------------------------------------------------------------------------------------------------+
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data / Data Science / Data Analytics Jobs" | 0       |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Developers in London"                 | 17      |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Jobs in London"                       | 20      |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data London"                               | 37      |
| "Big Data / Data Science / Data Analytics Jobs" | "Cassandra London"                              | 16      |
| "Big Data / Data Science / Data Analytics Jobs" | "Couchbase London"                              | 3       |
| "Big Data / Data Science / Data Analytics Jobs" | "Data Science London"                           | 49      |
| "Big Data / Data Science / Data Analytics Jobs" | "DeNormalised London"                           | 3       |
| "Big Data / Data Science / Data Analytics Jobs" | "Enterprise Search London Meetup"               | 2       |
| "Big Data / Data Science / Data Analytics Jobs" | "GridGain London"                               | 0       |
...
+-------------------------------------------------------------------------------------------------------------+

We can also now see that there is no overlap with ‘GridGain London’ either, which we couldn’t tell before. We’ve been able to do this by using CASE to check whether or not the OPTIONAL MATCH came up with a path.

Our next step is to group the data returned so that we have one row for each meetup group which contains an array showing the overlap with all the other groups:

MATCH (g1:Group), (g2:Group)
OPTIONAL MATCH path = (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
 
WITH g1, g2, CASE WHEN path is null THEN 0 ELSE COUNT(path) END AS overlap
ORDER BY g1.name, g2.name
RETURN g1.name, COLLECT(overlap)
ORDER BY g1.name
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| g1.name                                                     | COLLECT(overlap)                                                                                                    |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Big Data / Data Science / Data Analytics Jobs"             | [0,17,20,37,16,3,49,3,2,0,6,4,28,2,11,0,3,4,5,13,4,1,4,0,2,0,20,1,5,5,0,5,4,4,1]                                    |
| "Big Data Developers in London"                             | [17,0,48,231,67,18,228,18,17,0,38,10,150,12,47,4,24,18,31,63,36,11,20,7,7,1,88,2,38,10,0,33,11,26,3]                |
| "Big Data Jobs in London"                                   | [20,48,0,189,70,16,168,19,7,0,21,10,128,9,51,4,24,14,23,69,13,5,20,5,7,2,69,1,34,12,0,10,10,19,4]                   |
| "Big Data London"                                           | [37,231,189,0,417,49,1128,94,89,0,92,31,738,39,195,20,116,93,124,328,98,44,81,20,36,10,330,2,122,79,2,74,45,107,11] |
| "Cassandra London"                                          | [16,67,70,417,0,36,276,63,40,1,58,13,292,34,104,9,72,55,71,195,58,23,64,9,10,2,174,4,50,65,2,21,23,19,4]            |
| "Couchbase London"                                          | [3,18,16,49,36,0,42,8,6,1,19,6,56,7,20,2,16,11,24,51,21,10,22,12,7,1,43,2,12,9,1,6,5,9,2]                           |
| "Data Science London"                                       | [49,228,168,1128,276,42,0,93,83,2,71,32,611,24,174,17,63,83,120,268,82,36,60,21,22,3,363,3,88,65,0,98,45,141,9]     |
| "DeNormalised London"                                       | [3,18,19,94,63,8,93,0,5,1,17,5,75,6,39,3,20,34,16,53,16,7,27,1,6,2,55,1,20,17,0,3,17,7,3]                           |
| "Enterprise Search London Meetup"                           | [2,17,7,89,40,6,83,5,0,0,9,0,64,4,22,2,6,8,75,44,12,5,11,3,17,2,48,0,9,19,0,7,9,6,0]                                |
| "GridGain London"                                           | [0,0,0,0,1,1,2,1,0,0,0,1,1,3,1,0,0,0,0,1,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0]                                             |
| "HBase London Meetup"                                       | [6,38,21,92,58,19,71,17,9,0,0,3,94,15,37,3,17,9,30,38,22,6,12,5,5,1,51,2,24,9,0,9,10,4,4]                           |
| "HPC & GPU Supercomputing Group of London"                  | [4,10,10,31,13,6,32,5,0,1,3,0,25,4,6,1,6,4,4,8,2,1,4,0,0,0,16,0,3,4,0,2,3,1,1]                                      |
| "Hadoop Users Group UK"                                     | [28,150,128,738,292,56,611,75,64,1,94,25,0,29,214,9,81,67,113,272,75,28,72,13,28,4,259,3,101,60,4,38,39,48,11]      |
| "Hazelcast User Group London (HUGL)"                        | [2,12,9,39,34,7,24,6,4,3,15,4,29,0,6,1,6,5,5,20,14,2,10,2,1,1,27,0,3,2,1,5,2,0,1]                                   |
| "Hive London"                                               | [11,47,51,195,104,20,174,39,22,1,37,6,214,6,0,2,22,31,40,75,23,13,26,4,9,1,80,2,39,27,1,12,18,13,1]                 |
| "London Actionable Behavioral Analytics for Web and Mobile" | [0,4,4,20,9,2,17,3,2,0,3,1,9,1,2,0,1,0,2,8,4,1,1,1,0,1,7,0,2,0,0,8,1,2,1]                                           |
| "London Cloud Computing / NoSQL"                            | [3,24,24,116,72,16,63,20,6,0,17,6,81,6,22,1,0,11,15,52,21,7,27,3,7,1,39,0,15,21,4,2,2,9,5]                          |
| "London Data Bar"                                           | [4,18,14,93,55,11,83,34,8,0,9,4,67,5,31,0,11,0,13,58,12,4,22,3,1,0,44,4,19,7,0,5,8,8,0]                             |
| "London ElasticSearch User Group"                           | [5,31,23,124,71,24,120,16,75,0,30,4,113,5,40,2,15,13,0,80,22,9,32,9,6,0,80,1,20,33,1,6,9,11,2]                      |
| "London MongoDB User Group"                                 | [13,63,69,328,195,51,268,53,44,1,38,8,272,20,75,8,52,58,80,0,56,32,64,62,21,4,211,5,52,71,3,17,22,22,5]             |
| "London NoSQL"                                              | [4,36,13,98,58,21,82,16,12,1,22,2,75,14,23,4,21,12,22,56,0,16,24,20,8,2,69,1,12,13,0,18,8,6,3]                      |
| "London PostgreSQL Meetup Group"                            | [1,11,5,44,23,10,36,7,5,0,6,1,28,2,13,1,7,4,9,32,16,0,12,2,5,1,29,1,10,10,0,3,2,7,0]                                |
| "London Riak Meetup"                                        | [4,20,20,81,64,22,60,27,11,0,12,4,72,10,26,1,27,22,32,64,24,12,0,5,7,1,63,2,9,24,1,9,12,4,3]                        |
| "MEAN Stack"                                                | [0,7,5,20,9,12,21,1,3,0,5,0,13,2,4,1,3,3,9,62,20,2,5,0,1,0,27,1,1,4,1,6,1,3,1]                                      |
| "MarkLogic User Group London"                               | [2,7,7,36,10,7,22,6,17,0,5,0,28,1,9,0,7,1,6,21,8,5,7,1,0,16,22,1,8,6,0,0,5,5,13]                                    |
| "Marklogic Financial Services Community"                    | [0,1,2,10,2,1,3,2,2,0,1,0,4,1,1,1,1,0,0,4,2,1,1,0,16,0,6,0,1,1,0,1,1,1,4]                                           |
| "Neo4j - London User Group"                                 | [20,88,69,330,174,43,363,55,48,2,51,16,259,27,80,7,39,44,80,211,69,29,63,27,22,6,0,5,40,43,3,36,44,58,11]           |
| "OpenCredo Tech Workshops"                                  | [1,2,1,2,4,2,3,1,0,0,2,0,3,0,2,0,0,4,1,5,1,1,2,1,1,0,5,0,2,1,0,0,1,1,0]                                             |
| "Oracle Big Data 4 the Enterprise"                          | [5,38,34,122,50,12,88,20,9,0,24,3,101,3,39,2,15,19,20,52,12,10,9,1,8,1,40,2,0,10,0,2,7,9,4]                         |
| "Redis London"                                              | [5,10,12,79,65,9,65,17,19,0,9,4,60,2,27,0,21,7,33,71,13,10,24,4,6,1,43,1,10,0,0,2,7,2,1]                            |
| "The Apache Jmeter London Group"                            | [0,0,0,2,2,1,0,0,0,0,0,0,4,1,1,0,4,0,1,3,0,0,1,1,0,0,3,0,0,0,0,1,0,0,0]                                             |
| "The Data Scientist - UK"                                   | [5,33,10,74,21,6,98,3,7,0,9,2,38,5,12,8,2,5,6,17,18,3,9,6,0,1,36,0,2,2,1,0,2,12,1]                                  |
| "The London Distributed Graph Database Meetup Group"        | [4,11,10,45,23,5,45,17,9,0,10,3,39,2,18,1,2,8,9,22,8,2,12,1,5,1,44,1,7,7,0,2,0,9,1]                                 |
| "Women in Data"                                             | [4,26,19,107,19,9,141,7,6,0,4,1,48,0,13,2,9,8,11,22,6,7,4,3,5,1,58,1,9,2,0,12,9,0,1]                                |
| "eXist User Group London"                                   | [1,3,4,11,4,2,9,3,0,0,4,1,11,1,1,1,5,0,2,5,3,0,3,1,13,4,11,0,4,1,0,1,1,1,0]                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

This query is reasonably easy to follow and our next step would be to plug its output into a visualisation tool of some sort.

Sometimes we can’t create the cartesian product as easily as we were able to here – all we needed to do was call MATCH with the same label twice.

We can create cartesian products in other scenarios as well. For example let’s say we only want to compare the first 5 meetup groups ordered by name.

First we’ll get the top 5 groups:

MATCH (g:Group)
RETURN g.name
ORDER BY g.name
LIMIT 5
+-------------------------------------------------+
| g.name                                          |
+-------------------------------------------------+
| "Big Data / Data Science / Data Analytics Jobs" |
| "Big Data Developers in London"                 |
| "Big Data Jobs in London"                       |
| "Big Data London"                               |
| "Cassandra London"                              |
+-------------------------------------------------+

Now let’s get all the pairs of those groups:

MATCH (g:Group)
WITH g
ORDER BY g.name
LIMIT 5
 
WITH COLLECT(g) AS groups
UNWIND groups AS g1
UNWIND groups AS g2
RETURN g1.name, g2.name
ORDER BY g1.name, g2.name
+---------------------------------------------------------------------------------------------------+
| g1.name                                         | g2.name                                         |
+---------------------------------------------------------------------------------------------------+
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data / Data Science / Data Analytics Jobs" |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Developers in London"                 |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data Jobs in London"                       |
| "Big Data / Data Science / Data Analytics Jobs" | "Big Data London"                               |
| "Big Data / Data Science / Data Analytics Jobs" | "Cassandra London"                              |
| "Big Data Developers in London"                 | "Big Data / Data Science / Data Analytics Jobs" |
| "Big Data Developers in London"                 | "Big Data Developers in London"                 |
| "Big Data Developers in London"                 | "Big Data Jobs in London"                       |
| "Big Data Developers in London"                 | "Big Data London"                               |
| "Big Data Developers in London"                 | "Cassandra London"                              |
| "Big Data Jobs in London"                       | "Big Data / Data Science / Data Analytics Jobs" |
| "Big Data Jobs in London"                       | "Big Data Developers in London"                 |
| "Big Data Jobs in London"                       | "Big Data Jobs in London"                       |
| "Big Data Jobs in London"                       | "Big Data London"                               |
| "Big Data Jobs in London"                       | "Cassandra London"                              |
| "Big Data London"                               | "Big Data / Data Science / Data Analytics Jobs" |
| "Big Data London"                               | "Big Data Developers in London"                 |
| "Big Data London"                               | "Big Data Jobs in London"                       |
| "Big Data London"                               | "Big Data London"                               |
| "Big Data London"                               | "Cassandra London"                              |
| "Cassandra London"                              | "Big Data / Data Science / Data Analytics Jobs" |
| "Cassandra London"                              | "Big Data Developers in London"                 |
| "Cassandra London"                              | "Big Data Jobs in London"                       |
| "Cassandra London"                              | "Big Data London"                               |
| "Cassandra London"                              | "Cassandra London"                              |
+---------------------------------------------------------------------------------------------------+

Here we’re making use of my current favourite function in cypher – UNWIND – which allows you to take a collection of things and expand them out to have an individual row each.
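
If UNWIND itself is unfamiliar, here’s a minimal standalone sketch (not from the original post) showing what it does – each element of the collection becomes its own row, so this returns three rows containing 1, 2 and 3:

WITH [1,2,3] AS numbers
UNWIND numbers AS number
RETURN number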

It’s currently only available in the latest RC of Neo4j 2.1 so we’ll have to wait a little bit longer before using it in production!

We complete the query like so:

MATCH (g:Group)
WITH g
ORDER BY g.name
LIMIT 5
 
WITH COLLECT(g) AS groups
UNWIND groups AS g1
UNWIND groups AS g2
OPTIONAL MATCH path = (g1)<-[:MEMBER_OF]-()-[:MEMBER_OF]->(g2)
 
WITH g1, g2, CASE WHEN path is null THEN 0 ELSE COUNT(path) END AS overlap
ORDER BY g1.name, g2.name
RETURN g1.name, COLLECT(overlap)
ORDER BY g1.name
+----------------------------------------------------------------------+
| g1.name                                         | COLLECT(overlap)   |
+----------------------------------------------------------------------+
| "Big Data / Data Science / Data Analytics Jobs" | [0,17,20,37,16]    |
| "Big Data Developers in London"                 | [17,0,48,231,67]   |
| "Big Data Jobs in London"                       | [20,48,0,189,70]   |
| "Big Data London"                               | [37,231,189,0,417] |
| "Cassandra London"                              | [16,67,70,417,0]   |
+----------------------------------------------------------------------+

And we’re done!

Categories: Programming

Jersey/Jax RS: Streaming JSON

Wed, 04/30/2014 - 02:24

About a year ago I wrote a blog post showing how to stream an HTTP response using Jersey/Jax RS and I recently wanted to do the same thing, but this time using JSON.

A common pattern is to take our Java object and build a JSON string representation of it, but that isn’t the most efficient use of memory because we then hold both the Java object and its string representation.

This is particularly problematic if we need to return a lot of data in a response.

By writing a little bit more code we can get our response to stream to the client as soon as some of it is ready rather than building the whole result and sending it all in one go:

@Path("/resource")
public class MadeUpResource
{
    private final ObjectMapper objectMapper;
 
    public MadeUpResource() {
        objectMapper = new ObjectMapper();
    }
 
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response loadHierarchy(@PathParam( "pkPerson" ) String pkPerson) {
        final Map<Integer, String> people  = new HashMap<>();
        people.put(1, "Michael");
        people.put(2, "Mark");
 
        StreamingOutput stream = new StreamingOutput() {
            @Override
            public void write(OutputStream os) throws IOException, WebApplicationException
            {
                JsonGenerator jg = objectMapper.getJsonFactory().createJsonGenerator( os, JsonEncoding.UTF8 );
                jg.writeStartArray();
 
                for ( Map.Entry<Integer, String> person : people.entrySet()  )
                {
                    jg.writeStartObject();
                    jg.writeFieldName( "id" );
                    jg.writeString( person.getKey().toString() );
                    jg.writeFieldName( "name" );
                    jg.writeString( person.getValue() );
                    jg.writeEndObject();
                }
                jg.writeEndArray();
 
                jg.flush();
                jg.close();
            }
        };
 
 
        return Response.ok().entity( stream ).type( MediaType.APPLICATION_JSON ).build();
    }
}

If we run that this is the output we’d see:

[{"id":"1","name":"Michael"},{"id":"2","name":"Mark"}]

It’s a simple example but hopefully it’s easy to see how we could translate that if we wanted to stream more complex data.

Categories: Programming

Clojure: Paging meetup data using lazy sequences

Wed, 04/30/2014 - 01:20

I’ve been playing around with the meetup API to do some analysis on the Neo4j London meetup and one thing I wanted to do was download all the members of the group.

A feature of the meetup API is that each endpoint will only allow you to return a maximum of 200 records, so I needed to make use of offsets and paging to retrieve everybody.

It seemed like a good chance to use some lazy sequences to keep track of the offsets and then stop making calls to the API once I wasn’t retrieving any more results.

I wrote the following functions to take care of that bit:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
 
(defn offsets []
  (unchunk (range)))
 
 
(defn get-all [api-fn]
  (flatten
   (take-while seq
               (map #(api-fn {:perpage 200 :offset % :orderby "name"}) (offsets)))))

I previously wrote about the chunking behaviour of lazy collections, which meant that I initially ended up with a minimum of 32 calls to each URI – not what I had in mind! The unchunk function above works around that by realising one element at a time.

To get all the members in the group I wrote the following function which is passed to get-all:

(:require [clj-http.client :as client])
 
(defn members
  [{perpage :perpage offset :offset orderby :orderby}]
  (->> (client/get
        (str "https://api.meetup.com/2/members?page=" perpage
             "&offset=" offset
             "&orderby=" orderby
             "&group_urlname=" MEETUP_NAME
             "&key=" MEETUP_KEY)
        {:as :json})
       :body :results))

So to get all the members we’d do this:

(defn all-members []
  (get-all members))

I’m told that using lazy collections when side effects are involved is a bad idea – presumably because the calls to the API might never end – but since I only run it manually I can just kill the process if anything goes wrong.

I’d be interested in how others would go about solving this problem – core.async was suggested but that seems to result in much more, and more complicated, code than this version.

The code is on github if you want to take a look.

Categories: Programming

Clojure: clj-time – Formatting a date / timestamp with day suffixes e.g. 1st, 2nd, 3rd

Sat, 04/26/2014 - 08:50

I’ve been using the clj-time library recently – a Clojure wrapper around Joda Time – and one thing I wanted to do was format a date with day suffixes, e.g. 1st, 2nd, 3rd.

I started with the following timestamp:

1309368600000

The first step was to convert that into a DateTime object like so:

user> (require '[clj-time.coerce :as c])
user> (c/from-long 1309368600000)
#<DateTime 2011-06-29T17:30:00.000Z>

I wanted to output that date in the following format:

29th June 2011

We can get quite close by using a custom time formatter:

user> (require '[clj-time.format :as f])
nil
user> (f/unparse (f/formatter "d MMMM yyyy") (c/from-long 1309368600000))
"29 June 2011"

Unfortunately I couldn’t find anything in the documentation explaining how to get the elusive ‘th’ or ‘st’ to print. I was hoping for something similar to PHP’s date formatting:

[Screenshot: PHP date formatting documentation]

Eventually I came across a Stack Overflow post about Joda Time suggesting that you can’t actually format a day in the way I was hoping to.

So I now have the following function to do it for me:

(defn day-suffix [day]
  (let [stripped-day (if (< day 20) day (mod day 10))]
    (cond (= stripped-day 1) "st"
          (= stripped-day 2) "nd"
          (= stripped-day 3) "rd"
          :else "th")))

and the code to get the date in my favoured format looks like this:

user> (def my-time (c/from-long 1309368600000))
#'user/my-time
user> (def day (read-string (f/unparse (f/formatter "d") my-time)))
#'user/day
user> (str day (day-suffix day) " " (f/unparse (f/formatter "MMMM yyyy") my-time))
"29th June 2011"

I’m assuming there’s a better way but what is it?!

Categories: Programming

Neo4j: Cypher – Flatten a collection

Wed, 04/23/2014 - 23:02

Every now and then in Cypher land we’ll end up with a collection of arrays, often created via the COLLECT function, that we want to squash down into one array.

For example let’s say we have the following array of arrays…

$ RETURN [[1,2,3], [4,5,6], [7,8,9]] AS result;
==> +---------------------------+
==> | result                    |
==> +---------------------------+
==> | [[1,2,3],[4,5,6],[7,8,9]] |
==> +---------------------------+
==> 1 row

…and we want to return the array [1,2,3,4,5,6,7,8,9].

Many programming languages have a ‘flatten’ function and although cypher doesn’t we can make our own by using the REDUCE function:

$ WITH  [[1,2,3], [4,5,6], [7,8,9]] AS result 
  RETURN REDUCE(output = [], r IN result | output + r) AS flat;
==> +---------------------+
==> | flat                |
==> +---------------------+
==> | [1,2,3,4,5,6,7,8,9] |
==> +---------------------+
==> 1 row

Here we accumulate into the array ‘output’ as we iterate over the collection, appending each of the individual arrays ([1,2,3], [4,5,6] and [7,8,9]) to it.
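
If REDUCE is new to you, here’s a minimal standalone sketch (not from the original post): the accumulator starts at 0 and each element of the collection is folded into it, so this returns 10:

RETURN REDUCE(total = 0, n IN [1,2,3,4] | total + n) AS total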

If we’re working with numbers in Neo4j 2.0.1 we’ll get this type exception with this version of the code:

==> SyntaxException: Type mismatch: expected Any, Collection<Any> or Collection<Collection<Any>> but was Integer (line 1, column 148)

We can easily work around that by coercing the type of ‘output’ like so:

WITH  [[1,2,3], [4,5,6], [7,8,9]] AS result 
RETURN REDUCE(output = range(0,-1), r IN result | output + r);

Of course this is quite a simple example but we can handle more complicated scenarios as well by using nested calls to REDUCE. For example let’s say we wanted to completely flatten this array:

$ RETURN [[1,2,3], [4], [5, [6, 7]], [8,9]] AS result;
==> +-------------------------------+
==> | result                        |
==> +-------------------------------+
==> | [[1,2,3],[4],[5,[6,7]],[8,9]] |
==> +-------------------------------+
==> 1 row

We could write the following cypher code:

$ WITH  [[1,2,3], [4], [5, [6, 7]], [8,9]] AS result 
  RETURN REDUCE(output = [], r IN result | output + REDUCE(innerOutput = [], innerR in r | innerOutput + innerR)) AS flat;
==> +---------------------+
==> | flat                |
==> +---------------------+
==> | [1,2,3,4,5,6,7,8,9] |
==> +---------------------+
==> 1 row

Here we have an outer REDUCE function which iterates over [1,2,3], [4], [5, [6,7]] and [8,9] and then an inner REDUCE function which iterates over those individual arrays.

If we had more nesting then we could just introduce another level of nesting!

Categories: Programming

Neo4j: Cypher – Creating a time tree down to the day

Sat, 04/19/2014 - 22:15

Michael recently wrote a blog post showing how to create a time tree representing time down to the second using Neo4j’s Cypher query language, something I built on top of for a side project I’m working on.

The domain I want to model is RSVPs to meetup invites – I want to understand how far in advance people respond and how likely they are to drop out at a later stage.

For this problem I only need to measure time down to the day so my task is a bit easier than Michael’s.

After a bit of fiddling around with leap years I believe the following query will create a time tree representing all the days from 2011 – 2014, which covers the time the London Neo4j meetup has been running:

WITH range(2011, 2014) AS years, range(1,12) as months
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN (CASE 
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) 
                      WHEN month = 2 THEN 
                        CASE
                          WHEN year % 4 <> 0 THEN range(1,28)
                          WHEN year % 100 <> 0 THEN range(1,29)
                          WHEN year % 400 <> 0 THEN range(1,28)
                          ELSE range(1,29)
                        END
                      ELSE range(1,30)
                    END) |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d))))

The next step is to link adjacent days together so that we can easily traverse between adjacent days without needing to go back up and down the tree. For example we should have something like this:

(jan31)-[:NEXT]->(feb1)-[:NEXT]->(feb2)

We can build this by first collecting all the ‘day’ nodes in date order like so:

MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
RETURN days

And then iterating over adjacent nodes to create the ‘NEXT’ relationship:

MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
FOREACH(i in RANGE(0, length(days)-2) | 
    FOREACH(day1 in [days[i]] | 
        FOREACH(day2 in [days[i+1]] | 
            CREATE UNIQUE (day1)-[:NEXT]->(day2))))

Now if we want to find the previous 5 days from the 1st February 2014 we could write the following query:

MATCH (y:Year {year: 2014})-[:HAS_MONTH]->(m:Month {month: 2})-[:HAS_DAY]->(:Day {day: 1})<-[:NEXT*0..5]-(day)
RETURN y,m,day

[Screenshot: graph visualisation of the returned year, month and day nodes]

If we want to we can create the time tree and then connect the day nodes all in one query by using ‘WITH *’ like so:

WITH range(2011, 2014) AS years, range(1,12) as months
FOREACH(year IN years | 
  MERGE (y:Year {year: year})
  FOREACH(month IN months | 
    CREATE (m:Month {month: month})
    MERGE (y)-[:HAS_MONTH]->(m)
    FOREACH(day IN (CASE 
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) 
                      WHEN month = 2 THEN 
                        CASE
                          WHEN year % 4 <> 0 THEN range(1,28)
                          WHEN year % 100 <> 0 THEN range(1,29)
                          WHEN year % 400 <> 0 THEN range(1,28)
                          ELSE range(1,29)
                        END
                      ELSE range(1,30)
                    END) |      
      CREATE (d:Day {day: day})
      MERGE (m)-[:HAS_DAY]->(d))))
 
WITH *
 
MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day)
WITH year,month,day
ORDER BY year.year, month.month, day.day
WITH collect(day) as days
FOREACH(i in RANGE(0, length(days)-2) | 
    FOREACH(day1 in [days[i]] | 
        FOREACH(day2 in [days[i+1]] | 
            CREATE UNIQUE (day1)-[:NEXT]->(day2))))

Now I need to connect the RSVP events to the tree!
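
That step isn’t covered in this post, but a minimal sketch with a single hypothetical RSVP (the :RSVP label and its properties are assumptions, not something from the original) might look like this:

MATCH (:Year {year: 2014})-[:HAS_MONTH]->(:Month {month: 2})-[:HAS_DAY]->(day:Day {day: 1})
CREATE (rsvp:RSVP {event: "Intro to Graphs"})
CREATE (rsvp)-[:HAPPENED_ON]->(day)
RETURN rsvp, day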

Categories: Programming

Neo4j 2.0.1: Cypher – Concatenating an empty collection / Type mismatch: expected Integer, Collection<Integer> or Collection<Collection<Integer>> but was Collection<Any>

Sat, 04/19/2014 - 20:51

Last weekend I was playing around with some collections using Neo4j’s Cypher query language and I wanted to concatenate two collections.

This was easy enough when both collections contained values…

$ RETURN [1,2,3,4] + [5,6,7];
==> +---------------------+
==> | [1,2,3,4] + [5,6,7] |
==> +---------------------+
==> | [1,2,3,4,5,6,7]     |
==> +---------------------+
==> 1 row

…but I ended up with the following exception when I tried to concatenate with an empty collection:

$ RETURN [1,2,3,4] + [];
==> SyntaxException: Type mismatch: expected Integer, Collection<Integer> or Collection<Collection<Integer>> but was Collection<Any> (line 1, column 20)
==> "RETURN [1,2,3,4] + []"
==>                     ^

I figured there was probably some strange type coercion going on for the empty collection and came up with the following workaround using the RANGE function:

$ RETURN [1,2,3,4] + RANGE(0,-1);
==> +-------------------------+
==> | [1,2,3,4] + RANGE(0,-1) |
==> +-------------------------+
==> | [1,2,3,4]               |
==> +-------------------------+
==> 1 row

While writing this up I decided to check if it behaved the same way in the recently released 2.0.2 and was pleasantly surprised to see that the workaround is no longer necessary:

$ RETURN [1,2,3,4] + [];
==> +----------------+
==> | [1,2,3,4] + [] |
==> +----------------+
==> | [1,2,3,4]      |
==> +----------------+
==> 1 row

So if you’re seeing the same issue get yourself upgraded!

Categories: Programming

Neo4j: Cypher – Creating relationships between a collection of nodes / Invalid input ‘[‘:

Sat, 04/19/2014 - 07:33

When working with graphs we’ll frequently find ourselves wanting to create relationships between collections of nodes.

A common example of this would be creating a linked list of days so that we can quickly traverse across a time tree. Let’s say we start with just 3 days:

MERGE (day1:Day {day:1 })
MERGE (day2:Day {day:2 })
MERGE (day3:Day {day:3 })
RETURN day1, day2, day3

And we want to create a ‘NEXT’ relationship between adjacent days:

(day1)-[:NEXT]->(day2)-[:NEXT]->(day3)

The most obvious way to do this would be to collect the days into an ordered collection and iterate over them using FOREACH, creating a relationship between adjacent nodes:

MATCH (day:Day)
WITH day
ORDER BY day.day
WITH COLLECT(day) AS days
FOREACH(i in RANGE(0, length(days)-2) | 
  CREATE UNIQUE (days[i])-[:NEXT]->(days[i+1]))

Unfortunately this isn’t valid syntax:

Invalid input '[': expected an identifier character, node labels, a property map, whitespace, ')' or a relationship pattern (line 6, column 32)
"            CREATE UNIQUE (days[i])-[:NEXT]->(days[i+1]))"
                                ^

Cypher doesn’t seem to like a collection index expression such as days[i] in the position where a node identifier is expected in the pattern.

However, we can work around that by putting days[i] and days[i+1] into single-item collections and using nested FOREACH loops over those, a trick Michael Hunger showed me last year which I’d forgotten all about!

MATCH (day:Day)
WITH day
ORDER BY day.day
WITH COLLECT(day) AS days
FOREACH(i in RANGE(0, length(days)-2) | 
  FOREACH(day1 in [days[i]] | 
    FOREACH(day2 in [days[i+1]] | 
      CREATE UNIQUE (day1)-[:NEXT]->(day2))))

Now if we do a query to get back all the days we’ll see they’re connected:

[Screenshot: the three Day nodes linked by NEXT relationships]
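
If you’d rather check the links in the shell than eyeball a graph visualisation, a simple query along the following lines (my addition, not from the original post) returns each day together with the day it points to:

// Each row shows a day and the day its NEXT relationship points at
MATCH (day1:Day)-[:NEXT]->(day2:Day)
RETURN day1.day AS day, day2.day AS nextDay
ORDER BY day1.day
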
Categories: Programming

Neo4j 2.0.0: Query not prepared correctly / Type mismatch: expected Map

Sun, 04/13/2014 - 18:40

I was playing around with Neo4j’s Cypher last weekend and found myself accidentally running some queries against an earlier version of the Neo4j 2.0 series (2.0.0).

My first query started with a map and I wanted to create a person from an identifier inside the map:

WITH {person: {id: 1}} AS params
MERGE (p:Person {id: params.person.id})
RETURN p

When I ran the query I got this error:

==> SyntaxException: Type mismatch: expected Map but was Boolean, Number, String or Collection<Any> (line 1, column 62)
==> "WITH {person: {id: 1}} AS params MERGE (p:Person {id: params.person.id}) RETURN p"

If we try the same query in 2.0.1 it works as we’d expect:

==> +---------------+
==> | p             |
==> +---------------+
==> | Node[1]{id:1} |
==> +---------------+
==> 1 row
==> Nodes created: 1
==> Properties set: 1
==> Labels added: 1
==> 47 ms

My next query was the following, which links topics of interest to a person:

WITH {topics: [{name: "Java"}, {name: "Neo4j"}]} AS params
MERGE (p:Person {id: 2})
FOREACH(t IN params.topics | 
  MERGE (topic:Topic {name: t.name})
  MERGE (p)-[:INTERESTED_IN]->(topic)
)
RETURN p

In 2.0.0 that query fails like so:

==> InternalException: Query not prepared correctly!

but if we try it in 2.0.1 we’ll see that it works as well:

==> +---------------+
==> | p             |
==> +---------------+
==> | Node[4]{id:2} |
==> +---------------+
==> 1 row
==> Nodes created: 1
==> Relationships created: 2
==> Properties set: 1
==> Labels added: 1
==> 53 ms

So if you’re seeing either of those errors, get yourself upgraded to 2.0.1 as well!

Categories: Programming

install4j and AppleScript: Creating a Mac OS X Application Bundle for a Java application

Mon, 04/07/2014 - 01:04

We have a few internal applications at Neo which can be launched using ‘java -jar ‘, and since I always forget where the jars are I thought I’d wrap a Mac OS X application bundle around one of them to make life easier.

My favourite installation pattern is the one where double clicking the dmg shows you a window from which you can drag the application into the ‘Applications’ folder, like this:

[Screenshot: the dmg window prompting you to drag the application into the Applications folder]

I’m not a fan of installation wizards, and the installation process here is so simple that a wizard seems like overkill.

I started out learning about the structure of an application bundle which is well described in the Apple Bundle Programming guide. I then worked my way through a video which walks you through bundling a JAR file in a Mac application.

I figured that bundling a JAR was probably a solved problem and had a look at App Bundler, JAR Bundler and Iceberg before settling on install4j, which we used for Neo4j desktop.

I started out by creating an installer using install4j and then manually copying the launcher it created into an application bundle template, but it was incredibly fiddly and I ended up with a variety of indecipherable messages in the system error log.

Eventually I realised that I didn’t need to create an installer and that what I actually wanted was a Mac OS X single bundle archive media file.

After I’d got install4j creating that for me I just needed to figure out how to create the background image telling the user to drag the application into their ‘Applications’ folder.

Luckily I came across this StackOverflow post, which provided some AppleScript to do just that, and with a bit of tweaking I ended up with the following shell script which seems to do the job:

#!/bin/bash
 
rm target/DBench_macos_1_0_0.tgz
/Applications/install4j\ 5/bin/install4jc TestBench.install4j
 
title="DemoBench"
backgroundPictureName="graphs.png"
applicationName="DemoBench"
finalDMGName="DemoBench.dmg"
 
rm -rf target/dmg && mkdir -p target/dmg
tar -C target/dmg -xvf target/DBench_macos_1_0_0.tgz
cp -r src/packaging/.background target/dmg
ln -s /Applications target/dmg
 
cd target
rm "${finalDMGName}"
umount -f /Volumes/"${title}"
 
hdiutil create -volname ${title} -size 100m -srcfolder dmg/ -ov -format UDRW pack.temp.dmg
device=$(hdiutil attach -readwrite -noverify -noautoopen "pack.temp.dmg" | egrep '^/dev/' | sed 1q | awk '{print $1}')
 
sleep 5
 
echo '
   tell application "Finder"
     tell disk "'${title}'"
           open
           set current view of container window to icon view
           set toolbar visible of container window to false
           set statusbar visible of container window to false
           set the bounds of container window to {400, 100, 885, 430}
           set theViewOptions to the icon view options of container window
           set arrangement of theViewOptions to not arranged
           set icon size of theViewOptions to 72
           set background picture of theViewOptions to file ".background:'${backgroundPictureName}'"
           set position of item "'${applicationName}'" of container window to {100, 100}
           set position of item "Applications" of container window to {375, 100}
           update without registering applications
           delay 5
           eject
     end tell
   end tell
' | osascript
 
hdiutil detach ${device}
hdiutil convert "pack.temp.dmg" -format UDZO -imagekey zlib-level=9 -o "${finalDMGName}"
rm -f pack.temp.dmg
 
cd ..

To summarise, this script creates a symlink to ‘Applications’, puts a background image in a directory titled ‘.background’, sets that as the background of the window and positions the symlink and application appropriately.

Et voila:

[Screenshot: the finished DemoBench dmg window]

The Firefox guys wrote a couple of blog posts detailing their experiences writing an installer, which made for quite an interesting read as well.

Categories: Programming

Clojure: Not so lazy sequences a.k.a chunking behaviour

Sun, 04/06/2014 - 23:07

I’ve been playing with Clojure over the weekend and got caught out by the chunking behaviour of lazy sequences, something which is apparently obvious to experienced Clojurians although it wasn’t to me.

I had something similar to the following bit of code, which I expected to evaluate only the first item of the infinite sequence that the range function generates:

> (take 1 (map (fn [x] (println (str "printing..." x))) (range)))
(printing...0
printing...1
printing...2
printing...3
printing...4
printing...5
printing...6
printing...7
printing...8
printing...9
printing...10
printing...11
printing...12
printing...13
printing...14
printing...15
printing...16
printing...17
printing...18
printing...19
printing...20
printing...21
printing...22
printing...23
printing...24
printing...25
printing...26
printing...27
printing...28
printing...29
printing...30
printing...31
nil)

The reason this was annoying is that I wanted to cut the lazy sequence short using take-while, much like the poster of this StackOverflow question.

As I understand it, when we have a lazy sequence the granularity of that laziness is 32 items at a time, a.k.a. one chunk, something that Michael Fogus wrote about 4 years ago. This was a bit surprising to me, but it sounds like it makes sense for the majority of cases.

However, if we want to work around that behaviour we can wrap the lazy sequence in the following unchunk function provided by Stuart Sierra:

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

Now if we repeat our initial code we’ll see it only prints once:

> (take 1 (map (fn [x] (println (str "printing..." x))) (unchunk (range))))
(printing...0
nil)
Categories: Programming