Mark Needham

Thoughts on Software Development

Go vs Python: Parsing a JSON response from an HTTP API


As part of a talk on recommendations with Neo4j that I’ve presented a few times over the last year, I have a set of scripts that download some data from the meetup.com API.

They’re all written in Python but I thought it’d be a fun exercise to see what they’d look like in Go. My eventual goal is to try and parallelise the API calls.

This is the Python version of the script:

import requests
import os
import json

key = os.environ['MEETUP_API_KEY']
lat = "51.5072"
lon = "0.1275"

seed_topic = "nosql"
uri = "https://api.meetup.com/2/groups?&topic={0}&lat={1}&lon={2}&key={3}".format(seed_topic, lat, lon, key)

r = requests.get(uri)
all_topics = [topic["urlkey"] for result in r.json()["results"] for topic in result["topics"]]

for topic in all_topics:
    print topic

We’re using the requests library to send a request to the meetup API to get the groups which have the topic ‘nosql’ in the London area. We then parse the response and print out the topics.

Now to do the same thing in Go! The first bit of the script is almost identical:

package main

import (
	"fmt"
	"os"
	"net/http"
	"log"
	"time"
)

func handleError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

func main() {
	var httpClient = &http.Client{Timeout: 10 * time.Second}

	seedTopic := "nosql"
	lat := "51.5072"
	lon := "0.1275"
	key := os.Getenv("MEETUP_API_KEY")

	uri := fmt.Sprintf("https://api.meetup.com/2/groups?&topic=%s&lat=%s&lon=%s&key=%s", seedTopic, lat, lon, key)

	response, err := httpClient.Get(uri)
	handleError(err)
	defer response.Body.Close()
	fmt.Println(response)
}

Let's run that:

$ go run cmd/blog/main.go

So far so good. Now we need to parse the response that comes back.

Most of the examples that I came across suggest creating a struct with all the fields that you want to extract from the JSON document, but that feels like overkill for such a simple script.

Instead we can just decode into a map[string]interface{} and then apply type assertions where appropriate. I ended up with the following code to extract the topics:

import "encoding/json"

var target map[string]interface{}
decoder := json.NewDecoder(response.Body)
handleError(decoder.Decode(&target))

for _, rawGroup := range target["results"].([]interface{}) {
    group := rawGroup.(map[string]interface{})
    for _, rawTopic := range group["topics"].([]interface{}) {
        topic := rawTopic.(map[string]interface{})
        fmt.Println(topic["urlkey"])
    }
}

It’s more verbose than the Python version because we have to explicitly assert the type of each value we take out of the map at every stage, but it’s not too bad. This is the full script:

package main

import (
	"fmt"
	"os"
	"net/http"
	"log"
	"time"
	"encoding/json"
)

func handleError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

func main() {
	var httpClient = &http.Client{Timeout: 10 * time.Second}

	seedTopic := "nosql"
	lat := "51.5072"
	lon := "0.1275"
	key := os.Getenv("MEETUP_API_KEY")

	uri := fmt.Sprintf("https://api.meetup.com/2/groups?&topic=%s&lat=%s&lon=%s&key=%s", seedTopic, lat, lon, key)

	response, err := httpClient.Get(uri)
	handleError(err)
	defer response.Body.Close()

	var target map[string]interface{}
	decoder := json.NewDecoder(response.Body)
	handleError(decoder.Decode(&target))

	for _, rawGroup := range target["results"].([]interface{}) {
		group := rawGroup.(map[string]interface{})
		for _, rawTopic := range group["topics"].([]interface{}) {
			topic := rawTopic.(map[string]interface{})
			fmt.Println(topic["urlkey"])
		}
	}
}
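
One caveat with this approach: a bare type assertion like target["results"].([]interface{}) will panic if the API returns something other than what we expect, e.g. an error payload. If you want the script to fail with a friendlier message you can use the comma-ok form of the assertion. A quick standalone sketch (the error payload here is made up for illustration):

package main

import (
	"encoding/json"
	"log"
)

func main() {
	var target map[string]interface{}

	// A made-up error payload instead of the expected {"results": [...]}
	payload := []byte(`{"errors": ["invalid key"]}`)
	if err := json.Unmarshal(payload, &target); err != nil {
		log.Fatal(err)
	}

	// The comma-ok form lets us check the assertion rather than panic
	results, ok := target["results"].([]interface{})
	if !ok {
		log.Fatalf("unexpected response shape: %v", target)
	}
	log.Println(results)
}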

Once I’ve got these topics the next step is to make more API calls to get the groups for those topics.

I want to make those API calls in parallel while making sure I don’t exceed the rate limit restrictions on the API and I think I can make use of goroutines, channels, and timers to do that. But that’s for another post!
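
In the meantime, here's roughly the shape I have in mind — just a sketch, where fetchGroups stands in for the real API call: a time.Tick channel throttles how often a new request can start, goroutines fan out the requests, and a channel collects the results.

package main

import (
	"fmt"
	"time"
)

// fetchGroups is a placeholder for the real meetup.com API call
func fetchGroups(topic string) string {
	return "groups for " + topic
}

func main() {
	topics := []string{"nosql", "graph-databases", "golang"}

	// Allow at most one new request every 500ms to respect the rate limit
	throttle := time.Tick(500 * time.Millisecond)
	results := make(chan string)

	for _, topic := range topics {
		<-throttle // wait for the next tick before firing off a request
		go func(topic string) {
			results <- fetchGroups(topic)
		}(topic)
	}

	for range topics {
		fmt.Println(<-results)
	}
}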


Go: First attempt at channels

Sat, 12/24/2016 - 11:45

In a previous blog post I mentioned that I wanted to extract blips from The ThoughtWorks Radar into a CSV file and I thought this would be a good mini project for me to practice using Go.

In particular I wanted to try using channels and this seemed like a good chance to do that.

I watched a talk by Rob Pike on designing concurrent applications where he uses the following definition of concurrency:

Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

He then demonstrates this with a diagram in the talk.

I broke the scraping application down into four parts:

  1. Find the links of blips to download ->
  2. Download the blips ->
  3. Scrape the data from each page ->
  4. Write the data into a CSV file

I don’t think we gain much by parallelising steps 1) or 4) but steps 2) and 3) seem easily parallelisable. Therefore we’ll use a single goroutine for steps 1) and 4) and multiple goroutines for steps 2) and 3).

We’ll create two channels:

  • filesToScrape
  • filesScraped

And they will interact with our components like this:

  • 2) will write the path of the downloaded files into filesToScrape
  • 3) will read from filesToScrape and write the scraped content into filesScraped
  • 4) will read from filesScraped and put that information into a CSV file.


I decided to write a completely serial version of the scraping application first so that I could compare it to the parallel version. I had the following common code:

scrape/scrape.go

package scrape

import (
	"github.com/PuerkitoBio/goquery"
	"os"
	"bufio"
	"fmt"
	"log"
	"strings"
	"net/http"
	"io"
)

func checkError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

type Blip struct {
	Link  string
	Title string
}

func (blip Blip) Download() File {
	parts := strings.Split(blip.Link, "/")
	fileName := "rawData/items/" + parts[len(parts) - 1]

	if _, err := os.Stat(fileName); os.IsNotExist(err) {
		resp, err := http.Get("http://www.thoughtworks.com" + blip.Link)
		checkError(err)
		body := resp.Body

		file, err := os.Create(fileName)
		checkError(err)

		writer := bufio.NewWriter(file)
		io.Copy(writer, body)
		writer.Flush() // flush buffered bytes to disk before closing, otherwise the tail of the file can be lost
		file.Close()
		body.Close()
	}

	return File{Title: blip.Title, Path: fileName }
}

type File struct {
	Title string
	Path  string
}

func (fileToScrape File) Scrape() ScrapedFile {
	file, err := os.Open(fileToScrape.Path)
	checkError(err)

	doc, err := goquery.NewDocumentFromReader(bufio.NewReader(file))
	checkError(err)
	file.Close()

	var entries []map[string]string
	doc.Find("div.blip-timeline-item").Each(func(i int, s *goquery.Selection) {
		entry := make(map[string]string, 0)
		entry["time"] = s.Find("div.blip-timeline-item__time").First().Text()
		entry["outcome"] = strings.Trim(s.Find("div.blip-timeline-item__ring span").First().Text(), " ")
		entry["description"] = s.Find("div.blip-timeline-item__lead").First().Text()
		entries = append(entries, entry)
	})

	return ScrapedFile{File:fileToScrape, Entries:entries}
}

type ScrapedFile struct {
	File    File
	Entries []map[string]string
}

func FindBlips(pathToRadar string) []Blip {
	blips := make([]Blip, 0)

	file, err := os.Open(pathToRadar)
	checkError(err)

	doc, err := goquery.NewDocumentFromReader(bufio.NewReader(file))
	checkError(err)

	doc.Find(".blip").Each(func(i int, s *goquery.Selection) {
		item := s.Find("a")
		title := item.Text()
		link, _ := item.Attr("href")
		blips = append(blips, Blip{Title: title, Link: link })
	})

	return blips
}

Note that we’re using the goquery library to scrape the HTML files that we download.

A Blip is used to represent an item that appears on the radar e.g. .NET Core. A File is a representation of that blip on my local file system and a ScrapedFile contains the local representation of a blip and has an array containing every appearance the blip has made in radars over time.

Let’s have a look at the single threaded version of the scraper:

cmd/single/main.go

package main

import (
	"fmt"
	"encoding/csv"
	"os"
	"github.com/mneedham/neo4j-thoughtworks-radar/scrape"
)


func main() {

	blips := scrape.FindBlips("rawData/twRadar.html")

	var filesToScrape []scrape.File
	for _, blip := range blips {
		filesToScrape = append(filesToScrape, blip.Download())
	}

	var filesScraped []scrape.ScrapedFile
	for _, file := range filesToScrape {
		filesScraped = append(filesScraped, file.Scrape())
	}

	blipsCsvFile, _ := os.Create("import/blipsSingle.csv")
	writer := csv.NewWriter(blipsCsvFile)
	defer blipsCsvFile.Close()

	writer.Write([]string{"technology", "date", "suggestion" })
	for _, scrapedFile := range filesScraped {
		fmt.Println(scrapedFile.File.Title)
		for _, blip := range scrapedFile.Entries {
			writer.Write([]string{scrapedFile.File.Title, blip["time"], blip["outcome"] })
		}
	}
	writer.Flush()
}

rawData/twRadar.html is a local copy of the A-Z page which contains all the blips. This version is reasonably simple: we create an array containing all the blips, scrape them into another array, and then write that array into a CSV file. And if we run it:

$ time go run cmd/single/main.go 

real	3m10.354s
user	0m1.140s
sys	0m0.586s

$ head -n10 import/blipsSingle.csv 
technology,date,suggestion
.NET Core,Nov 2016,Assess
.NET Core,Nov 2015,Assess
.NET Core,May 2015,Assess
A single CI instance for all teams,Nov 2016,Hold
A single CI instance for all teams,Apr 2016,Hold
Acceptance test of journeys,Mar 2012,Trial
Acceptance test of journeys,Jul 2011,Trial
Acceptance test of journeys,Jan 2011,Trial
Accumulate-only data,Nov 2015,Assess

It takes a few minutes and most of the time will be taken in the blip.Download() function – work which is easily parallelisable. Let’s have a look at the parallel version where goroutines use channels to communicate with each other:

cmd/parallel/main.go

package main

import (
	"os"
	"encoding/csv"
	"github.com/mneedham/neo4j-thoughtworks-radar/scrape"
)

func main() {
	var filesToScrape chan scrape.File = make(chan scrape.File)
	var filesScraped chan scrape.ScrapedFile = make(chan scrape.ScrapedFile)
	defer close(filesToScrape)
	defer close(filesScraped)

	blips := scrape.FindBlips("rawData/twRadar.html")

	for _, blip := range blips {
		go func(blip scrape.Blip) { filesToScrape <- blip.Download() }(blip)
	}

	for i := 0; i < len(blips); i++ {
		select {
		case file := <-filesToScrape:
			go func(file scrape.File) { filesScraped <- file.Scrape() }(file)
		}
	}

	blipsCsvFile, _ := os.Create("import/blips.csv")
	writer := csv.NewWriter(blipsCsvFile)
	defer blipsCsvFile.Close()

	writer.Write([]string{"technology", "date", "suggestion" })
	for i := 0; i < len(blips); i++ {
		select {
		case scrapedFile := <-filesScraped:
			for _, blip := range scrapedFile.Entries {
				writer.Write([]string{scrapedFile.File.Title, blip["time"], blip["outcome"] })
			}
		}
	}
	writer.Flush()
}

Let's remove the files we just downloaded and give this version a try.

$ rm rawData/items/*

$ time go run cmd/parallel/main.go 

real	0m6.689s
user	0m2.544s
sys	0m0.904s

$ head -n10 import/blips.csv 
technology,date,suggestion
Zucchini,Oct 2012,Assess
Reactive Extensions for .Net,May 2013,Assess
Manual infrastructure management,Mar 2012,Hold
Manual infrastructure management,Jul 2011,Hold
JavaScript micro frameworks,Oct 2012,Trial
JavaScript micro frameworks,Mar 2012,Trial
NPM for all the things,Apr 2016,Trial
NPM for all the things,Nov 2015,Trial
PowerShell,Mar 2012,Trial

So we're down from 190 seconds to 7 seconds, pretty cool! One interesting thing is that the order of the values in the CSV file will be different since the goroutines won't necessarily come back in the same order that they were launched. We do end up with the same number of values:

$ wc -l import/blips.csv 
    1361 import/blips.csv

$ wc -l import/blipsSingle.csv 
    1361 import/blipsSingle.csv

And we can check that the contents are identical:

$ cat import/blipsSingle.csv  | sort > /tmp/blipsSingle.csv

$ cat import/blips.csv  | sort > /tmp/blips.csv

$ diff /tmp/blips.csv /tmp/blipsSingle.csv 
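
The empty diff confirms they're the same. The reordering itself is inherent to goroutines: whichever one finishes its work first sends to the channel first. A tiny self-contained example shows the effect:

package main

import (
	"fmt"
	"time"
)

func main() {
	results := make(chan int)

	for i := 0; i < 5; i++ {
		go func(i int) {
			// simulate work that takes longer for the earlier values
			time.Sleep(time.Duration(5-i) * 20 * time.Millisecond)
			results <- i
		}(i)
	}

	for i := 0; i < 5; i++ {
		fmt.Println(<-results) // prints 4 3 2 1 0 rather than 0 1 2 3 4
	}
}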


The code in this post is all on github. I'm sure I've made some mistakes/there are ways that this could be done better so do let me know in the comments or I'm @markhneedham on twitter.


Go: cannot execute binary file: Exec format error

Fri, 12/23/2016 - 19:24

In an earlier blog post I mentioned that I’d been building an internal application to learn a bit of Go and I wanted to deploy it to AWS.

Since the application was only going to live for a couple of days I didn’t want to spend a long time building anything fancy, so my plan was just to build the executable, scp it to my AWS instance, and then run it.

My initial (somewhat naive) approach was to just build the project on my Mac and upload and run it:

$ go build

$ scp myapp ubuntu@aws...

$ ssh ubuntu@aws...

$ ./myapp
-bash: ./myapp: cannot execute binary file: Exec format error

That didn’t go so well! By reading Ask Ubuntu and Dave Cheney’s blog post on cross compilation I realised that I just needed to set the appropriate environment variables before running go build.

The following did the trick:

env GOOS=linux GOARCH=amd64 GOARM=7 go build

And that’s it! (GOARM only applies when GOARCH=arm, so it’s redundant for an amd64 build, but it does no harm.) I’m sure there are more sophisticated ways of doing this that I’ll come to learn about but for now this worked for me.
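
As an aside, a Go binary knows which platform it was built for — runtime.GOOS and runtime.GOARCH are baked in at compile time — so a trivial program like this can be used to sanity check a cross compiled build:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// these constants reflect the GOOS/GOARCH the binary was compiled with
	fmt.Println(runtime.GOOS + "/" + runtime.GOARCH)
}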


Neo4j: Graphing the ThoughtWorks Technology Radar

Fri, 12/23/2016 - 18:40

For a bit of Christmas holiday fun I thought it’d be cool to create a graph of the different blips on the ThoughtWorks Technology Radar and how the recommendations have changed over time.

I wrote a script to extract each blip (e.g. .NET Core) and the recommendation made in each radar that it appeared in. I ended up with a CSV file:

|----------------------------------------------+----------+-------------|
|  technology                                  | date     | suggestion  |
|----------------------------------------------+----------+-------------|
|  AppHarbor                                   | Mar 2012 | Trial       |
|  Accumulate-only data                        | Nov 2015 | Assess      |
|  Accumulate-only data                        | May 2015 | Assess      |
|  Accumulate-only data                        | Jan 2015 | Assess      |
|  Buying solutions you can only afford one of | Mar 2012 | Hold        |
|----------------------------------------------+----------+-------------|

I then wrote a Cypher script to create the following graph model:


WITH ["Hold", "Assess", "Trial", "Adopt"] AS positions
UNWIND RANGE (0, size(positions) - 2) AS index
WITH positions[index] AS pos1, positions[index + 1] AS pos2
MERGE (position1:Position {value: pos1})
MERGE (position2:Position {value: pos2})
MERGE (position1)-[:NEXT]->(position2);

load csv with headers from "file:///blips.csv" AS row
MATCH (position:Position {value:  row.suggestion })
MERGE (tech:Technology {name:  row.technology })
MERGE (date:Date {value: row.date})
MERGE (recommendation:Recommendation {
  id: tech.name + "_" + date.value + "_" + position.value})
MERGE (recommendation)-[:ON_DATE]->(date)
MERGE (recommendation)-[:POSITION]->(position)
MERGE (recommendation)-[:TECHNOLOGY]->(tech);

match (date:Date)
SET date.timestamp = apoc.date.parse(date.value, "ms", "MMM yyyy");

MATCH (date:Date)
WITH date
ORDER BY date.timestamp
WITH COLLECT(date) AS dates
UNWIND range(0, size(dates)-2) AS index
WITH dates[index] as month1, dates[index+1] AS month2
MERGE (month1)-[:NEXT]->(month2);

MATCH (tech)<-[:TECHNOLOGY]-(reco:Recommendation)-[:ON_DATE]->(date)
WITH tech, reco, date
ORDER BY tech.name, date.timestamp
WITH tech, COLLECT(reco) AS recos
UNWIND range(0, size(recos)-2) AS index
WITH recos[index] AS reco1, recos[index+1] AS reco2
MERGE (reco1)-[:NEXT]->(reco2);

Note that I installed the APOC procedures library so that I could convert the string representation of a date into a timestamp using the apoc.date.parse function. The blips.csv file needs to go in the import directory of Neo4j.

Now we’re ready to write some queries.

The Technology Radar has 4 positions that can be taken for a given technology: Hold, Assess, Trial, and Adopt:

  • Hold: Proceed with Caution
  • Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
  • Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
  • Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.

I was curious whether there had ever been a technology where the advice was initially to ‘Hold’ but had later changed to ‘Assess’. I wrote the following query to find out:

MATCH (pos1:Position {value:"Hold"})<-[:POSITION]-(reco)-[:TECHNOLOGY]->(tech),
      (pos2:Position {value:"Assess"})<-[:POSITION]-(otherReco)-[:TECHNOLOGY]->(tech),
      (reco)-[:ON_DATE]->(recoDate),
      (otherReco)-[:ON_DATE]->(otherRecoDate)
WHERE (reco)-[:NEXT]->(otherReco)
RETURN tech.name AS technology, otherRecoDate.value AS dateOfChange;

╒════════════╀══════════════╕
β”‚"technology"β”‚"dateOfChange"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•ͺ══════════════║
β”‚"Azure"     β”‚"Aug 2010"    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Only Azure! The page doesn’t have any explanation for the initial ‘Hold’ advice in April 2010 which was presumably just before ‘the cloud’ became prominent. What about the other way around? Are there any technologies where the suggestion was initially to ‘Assess’ but later to ‘Hold’?

MATCH (pos1:Position {value:"Assess"})<-[:POSITION]-(reco)-[:TECHNOLOGY]->(tech),
      (pos2:Position {value:"Hold"})<-[:POSITION]-(otherReco)-[:TECHNOLOGY]->(tech),
      (reco)-[:ON_DATE]->(recoDate),
      (otherReco)-[:ON_DATE]->(otherRecoDate)
WHERE (reco)-[:NEXT]->(otherReco)
RETURN tech.name AS technology, otherRecoDate.value AS dateOfChange;

╒═══════════════════════════════════╀══════════════╕
β”‚"technology"                       β”‚"dateOfChange"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ══════════════║
β”‚"RIA"                              β”‚"Apr 2010"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Backbone.js"                      β”‚"Oct 2012"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Pace-layered Application Strategy"β”‚"Nov 2015"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"SPDY"                             β”‚"May 2015"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"AngularJS"                        β”‚"Nov 2016"    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

A couple of these are JavaScript libraries/frameworks so presumably the advice is now to use React instead. Let’s check:

MATCH (t:Technology)<-[:TECHNOLOGY]-(reco)-[:ON_DATE]->(date), (reco)-[:POSITION]->(pos)
WHERE t.name contains "React.js"
RETURN pos.value, date.value 
ORDER BY date.timestamp

╒═══════════╀════════════╕
β”‚"pos.value"β”‚"date.value"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•ͺ════════════║
β”‚"Assess"   β”‚"Jan 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"    β”‚"May 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"    β”‚"Nov 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"    β”‚"Apr 2016"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"    β”‚"Nov 2016"  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Ember is also popular:

MATCH (t:Technology)<-[:TECHNOLOGY]-(reco)-[:ON_DATE]->(date), (reco)-[:POSITION]->(pos)
WHERE t.name contains "Ember"
RETURN pos.value, date.value 
ORDER BY date.timestamp

╒═══════════╀════════════╕
β”‚"pos.value"β”‚"date.value"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•ͺ════════════║
β”‚"Assess"   β”‚"May 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Assess"   β”‚"Nov 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"    β”‚"Apr 2016"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"    β”‚"Nov 2016"  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Let’s go off on a different tangent: how many technologies were introduced in the most recent radar?

MATCH (date:Date {value: "Nov 2016"})<-[:ON_DATE]-(reco)
WHERE NOT (reco)<-[:NEXT]-()
RETURN COUNT(*) 

╒══════════╕
β”‚"COUNT(*)"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•‘
β”‚"45"      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Wow, 45 new things! How were they spread across the different positions?

MATCH (date:Date {value: "Nov 2016"})<-[:ON_DATE]-(reco)-[:TECHNOLOGY]->(tech), 
      (reco)-[:POSITION]->(position)
WHERE NOT (reco)<-[:NEXT]-()
WITH position, COUNT(*) AS count, COLLECT(tech.name) AS technologies
ORDER BY LENGTH((position)-[:NEXT*]->()) DESC
RETURN position.value, count, technologies

╒════════════════╀═══════╀══════════════════════════════════════════════╕
β”‚"position.value"β”‚"count"β”‚"technologies"                                β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═══════β•ͺ══════════════════════════════════════════════║
β”‚"Hold"          β”‚"1"    β”‚["Anemic REST"]                               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Assess"        β”‚"28"   β”‚["Nuance Mix","Micro frontends","Three.js","Scβ”‚
β”‚                β”‚       β”‚ikit-learn","WebRTC","ReSwift","Vue.js","Electβ”‚
β”‚                β”‚       β”‚ron","Container security scanning","wit.ai","Dβ”‚
β”‚                β”‚       β”‚ifferential privacy","Rapidoid","OpenVR","AWS β”‚
β”‚                β”‚       β”‚Application Load Balancer","Tarantool","IndiaSβ”‚
β”‚                β”‚       β”‚tack","Ethereum","axios","Bottled Water","Cassβ”‚
β”‚                β”‚       β”‚andra carefully","ECMAScript 2017","FBSnapshotβ”‚
β”‚                β”‚       β”‚Testcase","Client-directed query","JuMP","Clojβ”‚
β”‚                β”‚       β”‚ure.spec","HoloLens","Android-x86","Physical Wβ”‚
β”‚                β”‚       β”‚eb"]                                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"         β”‚"13"   β”‚["tmate","Lightweight Architecture Decision Reβ”‚
β”‚                β”‚       β”‚cords","APIs as a product","JSONassert","Unityβ”‚
β”‚                β”‚       β”‚ beyond gaming","Galen","Enzyme","Quick and Niβ”‚
β”‚                β”‚       β”‚mble","Talisman","fastlane","Auth0","Pa11y","Pβ”‚
β”‚                β”‚       β”‚hoenix"]                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"         β”‚"3"    β”‚["Grafana","Babel","Pipelines as code"]       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Lots of new things to explore over the holidays! The CSV files, import script, and queries used in this post are all available on github if you want to play around with them.


Go: Templating with the Gin Web Framework

Fri, 12/23/2016 - 15:30

I spent a bit of time over the last week building a little internal web application using Go and the Gin Web Framework, and since it took me a while to get the hang of the templating language I thought I’d write up some examples.

Before we get started, I’ve got my GOPATH set to the following path:

$ echo $GOPATH
/Users/markneedham/projects/gocode

And the project containing the examples sits inside the src directory:

$ pwd
/Users/markneedham/projects/gocode/src/github.com/mneedham/golang-gin-templating-demo

Let’s first install Gin:

$ go get gopkg.in/gin-gonic/gin.v1

It gets installed here:

$ ls -lh $GOPATH/src/gopkg.in
total 0
drwxr-xr-x   3 markneedham  staff   102B 23 Dec 10:55 gin-gonic

Now let’s create a main function to launch our web application:

demo.go

package main

import (
	"github.com/gin-gonic/gin"
	"net/http"
)

func main() {
	router := gin.Default()
	router.LoadHTMLGlob("templates/*")

	// our handlers will go here

	router.Run("0.0.0.0:9090")
}

We’re launching our application on port 9090 and the templates live in the templates directory which is located relative to the file containing the main function:

$ ls -lh
total 8
-rw-r--r--  1 markneedham  staff   570B 23 Dec 13:34 demo.go
drwxr-xr-x  4 markneedham  staff   136B 23 Dec 13:34 templates

Arrays

Let’s create a route which will display the values of an array in an unordered list:

	router.GET("/array", func(c *gin.Context) {
		var values []int
		for i := 0; i < 5; i++ {
			values = append(values, i)
		}

		c.HTML(http.StatusOK, "array.tmpl", gin.H{"values": values})
	})

And the corresponding template, which renders the values as an unordered list:

templates/array.tmpl

<ul>
  {{ range .values }}
  <li>{{ . }}</li>
  {{ end }}
</ul>

And now we'll cURL our application to see what we get back:

$ curl http://localhost:9090/array
<ul>
  <li>0</li>
  <li>1</li>
  <li>2</li>
  <li>3</li>
  <li>4</li>
</ul>

What about if we have an array of structs instead of just strings?

import "strconv"

type Foo struct {
	Value1 int // fields must be exported for the struct literal and template below to see them
	Value2 string
}

	router.GET("/arrayStruct", func(c *gin.Context) {
		var values []Foo
		for i := 0; i < 5; i++ {
			values = append(values, Foo{Value1: i, Value2: "value " + strconv.Itoa(i)})
		}

		c.HTML(http.StatusOK, "arrayStruct.tmpl", gin.H{"values": values})
	})

And the template:

templates/arrayStruct.tmpl

<ul>
  {{ range .values }}
  <li>{{ .Value1 }} -> {{ .Value2 }}</li>
  {{ end }}
</ul>

cURL time:

$ curl http://localhost:9090/arrayStruct
<ul>
  <li>0 -> value 0</li>
  <li>1 -> value 1</li>
  <li>2 -> value 2</li>
  <li>3 -> value 3</li>
  <li>4 -> value 4</li>
</ul>

Maps

Now let's do the same for maps.

	router.GET("/map", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"

		c.HTML(http.StatusOK, "map.tmpl", gin.H{"myMap": values})
	})

And the template:

templates/map.tmpl

<ul>
  {{ range .myMap }}
  <li>{{ . }}</li>
  {{ end }}
</ul>

And cURL it:

$ curl http://localhost:9090/map
<ul>
  <li>Go</li>
  <li>1.7.4</li>
</ul>

What if we want to see the keys as well?

	router.GET("/mapKeys", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"

		c.HTML(http.StatusOK, "mapKeys.tmpl", gin.H{"myMap": values})
	})

And the template:

templates/mapKeys.tmpl

<ul>
  {{ range $key, $value := .myMap }}
  <li>{{ $key }} -> {{ $value }}</li>
  {{ end }}
</ul>

$ curl http://localhost:9090/mapKeys
<ul>
  <li>language -> Go</li>
  <li>version -> 1.7.4</li>
</ul>

And finally, what if we want to select specific values from the map?

	router.GET("/mapSelectKeys", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"

		c.HTML(http.StatusOK, "mapSelectKeys.tmpl", gin.H{"myMap": values})
	})

And the template:

templates/mapSelectKeys.tmpl

<ul>
  <li>Language: {{ .myMap.language }}</li>
  <li>Version: {{ .myMap.version }}</li>
</ul>

$ curl http://localhost:9090/mapSelectKeys
<ul>
  <li>Language: Go</li>
  <li>Version: 1.7.4</li>
</ul>

I've found the Hugo Go Template Primer helpful for figuring this out so that's a good reference if you get stuck. You can find a go file containing all the examples on github if you want to use that as a starting point.


Docker: Unknown – Unable to query docker version: x509: certificate is valid for

Wed, 12/21/2016 - 08:11

I was playing around with Docker locally and somehow ended up with this error when I tried to list my docker machines:

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
default   -        virtualbox   Running   tcp://192.168.99.101:2376           Unknown   Unable to query docker version: Get https://192.168.99.101:2376/v1.15/version: x509: certificate is valid for 192.168.99.100, not 192.168.99.101

My Google Fu was weak and I couldn’t find any suggestions for what this might mean, so I tried shutting it down and starting it again!

On the restart I actually got some helpful advice:

$ docker-machine stop
Stopping "default"...
Machine "default" was stopped.
$ docker-machine start
Starting "default"...
(default) Check network to re-create if needed...
(default) Waiting for an IP...
Machine "default" was started.
Waiting for SSH to be available...
Detecting the provisioner...
Started machines may have new IP addresses. You may need to re-run the `docker-machine env` command.

So I tried that:

$ docker-machine env
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.101:2376": x509: certificate is valid for 192.168.99.100, not 192.168.99.101
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

And then regenerated my certificates:

$ docker-machine regenerate-certs
Regenerate TLS machine certs?  Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...

And now everything is happy again!

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER   ERRORS
default   -        virtualbox   Running   tcp://192.168.99.101:2376           v1.9.0

Kubernetes: Simulating a network partition

Sun, 12/04/2016 - 13:37

A couple of weeks ago I wrote a post explaining how to create a Neo4j causal cluster using Kubernetes and this time I wanted to work out how to simulate a network partition which would put the leader on the minority side and force an election.

We’ve done this on our internal tooling on AWS using the iptables command but unfortunately that isn’t available in my container, which only has the utilities provided by BusyBox.

Luckily one of these is the route command, which will allow us to achieve the same thing.

To recap, I have 3 Neo4j pods up and running:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          6h
neo4j-1   1/1       Running   0          6h
neo4j-2   1/1       Running   0          6h

And we can check that the route command is available:

$ kubectl exec neo4j-0 -- ls -alh /sbin/route 
lrwxrwxrwx    1 root     root          12 Oct 18 18:58 /sbin/route -> /bin/busybox

Let’s have a look what role each server is currently playing:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!
$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"

Bye!

Slight aside: I’m able to call cypher-shell without a user and password because I’ve disabled authorisation by putting the following in conf/neo4j.conf:

dbms.security.auth_enabled=false

Back to the network partitioning… we need to partition away neo4j-2 from the other two servers, which we can do by running the following commands:

$ kubectl exec neo4j-2 -- route add -host neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route add -host neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject

If we look at the logs of neo4j-2 we can see that it’s stepped down after being disconnected from the other two servers:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:30:10.186+0000 INFO  [o.n.c.c.c.RaftMachine] Moving to FOLLOWER state after not receiving heartbeat responses in this election timeout period. Heartbeats received: []
...

Who’s taken over as leader?

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"

Bye!
$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!

Looks like neo4j-0! Let’s put some data into the database:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CREATE (:Person {name: 'Mark'})"
Added 1 nodes, Set 1 properties, Added 1 labels

Bye!

Let’s check if that node made it to the other two servers. We’d expect it to be on neo4j-1 but not on neo4j-2:

$ kubectl exec neo4j-1 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})

Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"


Bye!

On neo4j-2 we’ll repeatedly see these types of entries in the log as its election timeout triggers but it fails to get any responses to the vote requests it sends out:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:32:56.735+0000 INFO  [o.n.c.c.c.RaftMachine] Election timeout triggered
2016-12-04 11:32:56.736+0000 INFO  [o.n.c.c.c.RaftMachine] Election started with vote request: Vote.Request from MemberId{ca9b954c} {term=11521, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467} and members: [MemberId{484178c4}, MemberId{0acdb8dd}, MemberId{ca9b954c}]
...

We can see those vote requests by looking at the raft-messages.log which can be enabled by setting the following property in conf/neo4j.conf:

causal_clustering.raft_messages_log_enable=true

$ kubectl exec neo4j-2 -- cat logs/raft-messages.log
...
11:33:42.101 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:42.102 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}

11:33:45.432 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:45.433 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}

11:33:48.362 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:48.362 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
...

To ‘heal’ the network partition we just need to delete the routes we added earlier:

$ kubectl exec neo4j-2 -- route delete neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route delete neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject

Now let’s check that neo4j-2 now has the node that we created earlier:

$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})

Bye!

That’s all for now!


Kubernetes: Spinning up a Neo4j 3.1 Causal Cluster

Fri, 11/25/2016 - 17:55

A couple of weeks ago I wrote a blog post explaining how I’d created a Neo4j causal cluster using docker containers directly, and for my next pet project I wanted to use Kubernetes as an orchestration layer so that I could declaratively change the number of servers in my cluster.

I’d never used Kubernetes before but I saw a presentation showing how to use it to create an Elastic cluster at the GDG Cloud meetup a couple of months ago.

In that presentation I was introduced to the idea of a PetSet, an abstraction exposed by Kubernetes that lets us manage a set of pods (containers) with a fixed identity. The documentation explains it better:

A PetSet ensures that a specified number of β€œpets” with unique identities are running at any given time. The identity of a Pet is comprised of:

  • a stable hostname, available in DNS
  • an ordinal index
  • stable storage: linked to the ordinal & hostname

In my case I need to have a stable hostname because each member of a Neo4j cluster is given a list of other cluster members with which it can create a new cluster or join an already existing one. This is the first use case described in the documentation:

PetSet also helps with the 2 most common problems encountered managing such clustered applications:

  • discovery of peers for quorum
  • startup/teardown ordering

So the first thing we need to do is create some stable storage for our pods to use.

We’ll create a cluster of 3 members so we need to create one PersistentVolume for each of them. A script along the following lines does the job — hostPath volumes are assumed here, which is fine for a local minikube cluster:

volumes.sh

for i in $(seq 0 2); do
  # hostPath is assumed; swap in an appropriate volume type for a real cluster
  cat <<EOF | kubectl create -f -
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv$i
  annotations:
    volume.alpha.kubernetes.io/storage-class: anything
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/pv$i
EOF
done

If we run this script it'll create 3 volumes which we can see by running the following command:

$ kubectl get pv
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM                     REASON    AGE
pv0       1Gi        RWO           Bound     default/datadir-neo4j-0             7s
pv1       1Gi        RWO           Bound     default/datadir-neo4j-1             7s
pv2       1Gi        RWO           Bound     default/datadir-neo4j-2             7s
$ kubectl get pvc
NAME              STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
datadir-neo4j-0   Bound     pv0       1Gi        RWO           26s
datadir-neo4j-1   Bound     pv1       1Gi        RWO           26s
datadir-neo4j-2   Bound     pv2       1Gi        RWO           25s

Next we need to create a PetSet template. After a lot of iterations I ended up with the following:

# Headless service to provide DNS lookup
apiVersion: v1
kind: Service
metadata:
  labels:
    app: neo4j
  name: neo4j
spec:
  clusterIP: None
  ports:
    - port: 7474
  selector:
    app: neo4j
---
# new API name
apiVersion: "apps/v1alpha1"
kind: PetSet
metadata:
  name: neo4j
spec:
  serviceName: neo4j
  replicas: 3
  template:
    metadata:
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
        pod.beta.kubernetes.io/init-containers: '[
            {
                "name": "install",
                "image": "gcr.io/google_containers/busybox:1.24",
                "command": ["/bin/sh", "-c", "echo \"
                unsupported.dbms.edition=enterprise\n
                dbms.mode=CORE\n
                dbms.connectors.default_advertised_address=$HOSTNAME.neo4j.default.svc.cluster.local\n
                dbms.connectors.default_listen_address=0.0.0.0\n
                dbms.connector.bolt.type=BOLT\n
                dbms.connector.bolt.enabled=true\n
                dbms.connector.bolt.listen_address=0.0.0.0:7687\n
                dbms.connector.http.type=HTTP\n
                dbms.connector.http.enabled=true\n
                dbms.connector.http.listen_address=0.0.0.0:7474\n
                causal_clustering.raft_messages_log_enable=true\n
                causal_clustering.initial_discovery_members=neo4j-0.neo4j.default.svc.cluster.local:5000,neo4j-1.neo4j.default.svc.cluster.local:5000,neo4j-2.neo4j.default.svc.cluster.local:5000\n
                causal_clustering.leader_election_timeout=2s\n
                  \" > /work-dir/neo4j.conf" ],
                "volumeMounts": [
                    {
                        "name": "confdir",
                        "mountPath": "/work-dir"
                    }
                ]
            }
        ]'
      labels:
        app: neo4j
    spec:
      containers:
      - name: neo4j
        image: "neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise"
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
          name: discovery
        - containerPort: 6000
          name: tx
        - containerPort: 7000
          name: raft
        - containerPort: 7474
          name: browser
        - containerPort: 7687
          name: bolt
        securityContext:
          privileged: true
        volumeMounts:
        - name: datadir
          mountPath: /data
        - name: confdir
          mountPath: /conf
      volumes:
      - name: confdir
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

The main thing I had trouble with was getting the members of the cluster to talk to each other. The default docker config uses hostnames but I found that pods were unable to contact each other unless I specified the FQDN in the config file. We can run the following command to create the PetSet:

$ kubectl create -f neo4j.yaml 
service "neo4j" created
petset "neo4j" created

We can check if the pods are up and running by executing the following command:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          2m
neo4j-1   1/1       Running   0          14s
neo4j-2   1/1       Running   0          10s

And we can tail neo4j's log files like this:

$ kubectl logs neo4j-0
Starting Neo4j.
2016-11-25 16:39:50.333+0000 INFO  Starting...
2016-11-25 16:39:51.723+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:51.733+0000 INFO  Initiating metrics...
2016-11-25 16:39:51.911+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-25 16:40:12.074+0000 INFO  Started.
2016-11-25 16:40:12.428+0000 INFO  Mounted REST API at: /db/manage
2016-11-25 16:40:13.350+0000 INFO  Remote interface available at http://neo4j-0.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-1
Starting Neo4j.
2016-11-25 16:39:53.846+0000 INFO  Starting...
2016-11-25 16:39:56.212+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:56.225+0000 INFO  Initiating metrics...
2016-11-25 16:39:56.341+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-25 16:40:16.623+0000 INFO  Started.
2016-11-25 16:40:16.951+0000 INFO  Mounted REST API at: /db/manage
2016-11-25 16:40:17.607+0000 INFO  Remote interface available at http://neo4j-1.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-2
Starting Neo4j.
2016-11-25 16:39:57.828+0000 INFO  Starting...
2016-11-25 16:39:59.166+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:59.176+0000 INFO  Initiating metrics...
2016-11-25 16:39:59.329+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-25 16:40:19.216+0000 INFO  Started.
2016-11-25 16:40:19.675+0000 INFO  Mounted REST API at: /db/manage
2016-11-25 16:40:21.029+0000 INFO  Remote interface available at http://neo4j-2.neo4j.default.svc.cluster.local:7474/

I wanted to log into the servers from my host machine's browser so I set up port forwarding for each of the servers:

$ kubectl port-forward neo4j-0 7474:7474 7687:7687

We can then get an overview of the cluster by running the following procedure:

CALL dbms.cluster.overview()

╒════════════════════════════════════╀═════════════════════════════════════════════════════╀════════╕
β”‚id                                  β”‚addresses                                            β”‚role    β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═════════════════════════════════════════════════════β•ͺ════════║
β”‚81d8e5e2-02db-4414-85de-a7025b346e84β”‚[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,β”‚LEADER  β”‚
β”‚                                    β”‚ http://neo4j-0.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚347b7517-7ca0-4b92-b9f0-9249d46b2ad3β”‚[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,β”‚FOLLOWERβ”‚
β”‚                                    β”‚ http://neo4j-1.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚a5ec1335-91ce-4358-910b-8af9086c2969β”‚[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,β”‚FOLLOWERβ”‚
β”‚                                    β”‚ http://neo4j-2.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

So far so good. What if we want to have 5 servers in the cluster instead of 3? We can run the following command to increase our replica size:

$ kubectl patch petset neo4j -p '{"spec":{"replicas":5}}'
"neo4j" patched

Let's run that procedure again:

CALL dbms.cluster.overview()

╒════════════════════════════════════╀═════════════════════════════════════════════════════╀════════╕
β”‚id                                  β”‚addresses                                            β”‚role    β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═════════════════════════════════════════════════════β•ͺ════════║
β”‚81d8e5e2-02db-4414-85de-a7025b346e84β”‚[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,β”‚LEADER  β”‚
β”‚                                    β”‚ http://neo4j-0.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚347b7517-7ca0-4b92-b9f0-9249d46b2ad3β”‚[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,β”‚FOLLOWERβ”‚
β”‚                                    β”‚ http://neo4j-1.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚a5ec1335-91ce-4358-910b-8af9086c2969β”‚[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,β”‚FOLLOWERβ”‚
β”‚                                    β”‚ http://neo4j-2.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚28613d06-d4c5-461c-b277-ddb3f05e5647β”‚[bolt://neo4j-3.neo4j.default.svc.cluster.local:7687,β”‚FOLLOWERβ”‚
β”‚                                    β”‚ http://neo4j-3.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚2eaa0058-a4f3-4f07-9f22-d310562ad1ecβ”‚[bolt://neo4j-4.neo4j.default.svc.cluster.local:7687,β”‚FOLLOWERβ”‚
β”‚                                    β”‚ http://neo4j-4.neo4j.default.svc.cluster.local:7474]β”‚        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Neat! And it's just as easy to go back down to 3 again:

$ kubectl patch petset neo4j -p '{"spec":{"replicas":3}}'
"neo4j" patched

CALL dbms.cluster.overview()

╒════════════════════════════════════╀══════════════════════════════════════════════════════╀════════╕
β”‚id                                  β”‚addresses                                             β”‚role    β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ══════════════════════════════════════════════════════β•ͺ════════║
β”‚81d8e5e2-02db-4414-85de-a7025b346e84β”‚[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687, β”‚LEADER  β”‚
β”‚                                    β”‚http://neo4j-0.neo4j.default.svc.cluster.local:7474]  β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚347b7517-7ca0-4b92-b9f0-9249d46b2ad3β”‚[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687, β”‚FOLLOWERβ”‚
β”‚                                    β”‚http://neo4j-1.neo4j.default.svc.cluster.local:7474]  β”‚        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚a5ec1335-91ce-4358-910b-8af9086c2969β”‚[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687, β”‚FOLLOWERβ”‚
β”‚                                    β”‚http://neo4j-2.neo4j.default.svc.cluster.local:7474]  β”‚        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Next I need to look at how we can add read replicas into the cluster. These don't take part in the membership/quorum algorithm so I think I'll be able to use the more common ReplicationController/Pod architecture for those.

If you want to play around with this the code is available as a gist. I'm using the minikube tool for all my experiments but I'll hopefully get around to trying this on GCE or AWS soon.


Kubernetes: Writing hostname to a file

Tue, 11/22/2016 - 20:56

Over the weekend I spent a bit of time playing around with Kubernetes and to get the hang of the technology I set myself the task of writing the hostname of the machine to a file.

I’m using the excellent minikube tool to create a local Kubernetes cluster for my experiments so the first step is to spin that up:

$ minikube start
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.

The first thing I needed to work out was how to get the hostname. I figured there was probably an environment variable that I could access. We can call the env command to see a list of all the environment variables in a container so I created a pod template that would show me that information:

hostname_super_simple.yaml

apiVersion: v1
kind: Pod
metadata:
  name: mark-super-simple-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "env" ]      
  dnsPolicy: Default
  restartPolicy: Never

I then created a pod from that template and checked the logs of that pod:

$ kubectl create -f hostname_super_simple.yaml 
pod "mark-super-simple-test-pod" created
$ kubectl logs  mark-super-simple-test-pod
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.0.0.1:443
HOSTNAME=mark-super-simple-test-pod
SHLVL=1
HOME=/root
KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443
PWD=/
KUBERNETES_SERVICE_HOST=10.0.0.1

The information we need is in $HOSTNAME so the next thing we need to do is create a pod template which puts that into a file.

hostname_simple.yaml

apiVersion: v1
kind: Pod
metadata:
  name: mark-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "echo $HOSTNAME > /tmp/bar; cat /tmp/bar" ]
  dnsPolicy: Default
  restartPolicy: Never

We can create a pod using this template by running the following command:

$ kubectl create -f hostname_simple.yaml
pod "mark-test-pod" created

Now let’s check the logs of the instance to see whether our script worked:

$ kubectl logs mark-test-pod
mark-test-pod

Indeed it did, good times!


Neo4j 3.1 beta3 + docker: Creating a Causal Cluster

Sun, 11/13/2016 - 13:30

Over the weekend I’ve been playing around with docker and learning how to spin up a Neo4j Causal Cluster.

Causal Clustering is Neo4j’s new clustering architecture which makes use of Diego Ongaro’s Raft consensus algorithm to ensure writes are committed on a majority of servers. It’ll be available in the 3.1 series of Neo4j which is currently in beta. I’ll be using BETA3 in this post.


I don’t know much about docker but luckily my colleague Kevin Van Gundy wrote a blog post a couple of weeks ago explaining how to spin up Neo4j inside a docker container which was very helpful for getting me started.

Kevin spins up a single Neo4j server using the latest released version, which at the time of writing is 3.0.7. Since we want to use a beta version we’ll need to use a docker image from the neo4j-experimental repository.

We’re going to create 3 docker instances, each running Neo4j, and have them form a cluster. We’ll name them instance0, instance1, and instance2. We’ll create config files for each instance on the host machine and refer to those from our docker instance. This is the config file for instance0:

/tmp/ce/instance0/conf/neo4j.conf

unsupported.dbms.edition=enterprise
dbms.mode=CORE

dbms.security.auth_enabled=false
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=512m
dbms.memory.pagecache.size=100M
dbms.tx_log.rotation.retention_policy=false

dbms.connector.bolt.type=BOLT
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474

dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=instance0

causal_clustering.initial_discovery_members=instance0:5000,instance1:5000,instance2:5000
causal_clustering.leader_election_timeout=2s

The only config that changes between instances is dbms.connectors.default_advertised_address which would have a value of instance1 or instance2 for the other members of our cluster.

We can create a docker instance using this config:

docker run --name=instance0 --detach \
           --publish=7474:7474 \
           --publish=7687:7687 \
           --net=cluster \
           --hostname=instance0 \
           --volume /tmp/ce/instance0/conf:/conf \
           --volume /tmp/ce/instance0/data:/data \
           neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise

We create the network ‘cluster’ referenced on the 4th line like this:

docker network create --driver=bridge cluster

It’s a bit of a pain having to create these config files and docker calls by hand, but luckily Michael has scripted the whole thing for us.

docker.sh

function config {
mkdir -p /tmp/ce/$1/conf
cat > /tmp/ce/$1/conf/neo4j.conf << EOF
unsupported.dbms.edition=enterprise
dbms.mode=CORE

dbms.security.auth_enabled=false
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=512m
dbms.memory.pagecache.size=100M
dbms.tx_log.rotation.retention_policy=false

dbms.connector.bolt.type=BOLT
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474

dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=${1}

causal_clustering.initial_discovery_members=instance0:5000,instance1:5000,instance2:5000
causal_clustering.leader_election_timeout=2s
EOF
}

function run {
HOST=$1
INSTANCE=instance$HOST
config $INSTANCE
docker run --name=$INSTANCE --detach \
           --publish=$[7474+$HOST]:7474 \
           --publish=$[7687+$HOST]:7687 \
           --net=cluster \
           --hostname=$INSTANCE \
           --volume /tmp/ce/$INSTANCE/conf:/conf \
           --volume /tmp/ce/$INSTANCE/data:/data \
           neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise
}

docker network create --driver=bridge cluster

run 0
run 1
run 2

Once we run the script we can run the following command to check that the cluster has come up:

$ docker logs instance0
Starting Neo4j.
2016-11-13 11:46:55.863+0000 INFO  Starting...
2016-11-13 11:46:57.241+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-13 11:46:57.255+0000 INFO  Initiating metrics...
2016-11-13 11:46:57.439+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-13 11:47:17.816+0000 INFO  Started.
2016-11-13 11:47:18.054+0000 INFO  Mounted REST API at: /db/manage
2016-11-13 11:47:19.068+0000 INFO  Remote interface available at http://instance0:7474/

Each instance is available at port 7474 but we've mapped these to different ports on the host OS by using this line in the parameters we passed to docker run:

--publish=$[7474+$HOST]:7474

We can therefore access each of these Neo4j instances from the host OS at the following ports:

instance0 -> http://localhost:7474
instance1 -> http://localhost:7475
instance2 -> http://localhost:7476

If we open one of those we'll be confronted with the following dialog:

[screenshot: Neo4j browser connection dialog]

This is a bit strange as we explicitly disabled security in our config.

The actual problem is that the Neo4j browser is unable to communicate with the underlying database. There are two ways to work around this:

Connect using HTTP instead of BOLT

We can tell the browser to connect to the database using the HTTP protocol rather than BOLT by unticking the checkbox:

[screenshot: unticking the connection checkbox]

Update the BOLT host

Or we can update the Bolt host value to refer to a host:port value that's accessible from the host OS. Each server is accessible from port 7687 but we mapped those ports to different ports on the host OS with this flag that we passed to docker run:

--publish=$[7687+$HOST]:7687

We can access BOLT from the following ports:

instance0 -> localhost:7687
instance1 -> localhost:7688
instance2 -> localhost:7689

Let's try changing it for instance2:

[screenshot: updating the Bolt host for instance2]

You might have to refresh your web browser after you change the value, but it usually updates automatically. We can run the :sysinfo command in the browser to see the state of our cluster:

[screenshot: :sysinfo output showing the cluster members]
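
If you’d rather query the topology than read a screenshot, we can ask any member directly via the dbms.cluster.overview procedure that comes with the Causal Clustering work (I’m assuming it’s already present in this beta):

CALL dbms.cluster.overview()

This should return one row per member with its id, its addresses and its current role (LEADER, FOLLOWER or READ_REPLICA).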

And we're good to go. The full script is available as a gist if you want to give it a try.

Let me know how you get on!

Categories: Programming

Neo4j: Find the intermediate point between two lat/longs

Tue, 11/01/2016 - 23:10

Yesterday I wrote a blog post showing how to find the midpoint between two lat/longs using Cypher which worked well as a first attempt at filling in missing locations, but I realised I could do better.

As I mentioned in the last post, when I find a stop that’s missing lat/long coordinates I can usually find two nearby stops that allow me to triangulate this stop’s location.

I also have train routes which indicate the number of seconds it takes to go from one stop to another, which lets me work out whether the location-less stop is closer to one stop than the other.

For example, consider stops a, b, and c where b doesn’t have a location. If we have these distances between the stops:

(a)-[:NEXT {time: 60}]->(b)-[:NEXT {time: 240}]->(c)

it tells us that point ‘b’ is actually 0.2 of the distance from ‘a’ to ‘c’ (60 / (60 + 240) = 0.2) rather than being the midpoint.
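
We don’t have to hard code that fraction either; a minimal sketch for reading it off the graph, assuming that location-less stops simply lack a latitude property:

// find a stop with no location but a neighbour on either side
MATCH (a)-[r1:NEXT]->(b)-[r2:NEXT]->(c)
WHERE NOT exists(b.latitude)
// fraction of the a->c journey time spent before reaching b
RETURN a, b, c, toFloat(r1.time) / (r1.time + r2.time) AS f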

There’s a formula we can use to work out that point:

a = sin((1−f)⋅δ) / sin δ
b = sin(f⋅δ) / sin δ
x = a ⋅ cos φ1 ⋅ cos λ1 + b ⋅ cos φ2 ⋅ cos λ2
y = a ⋅ cos φ1 ⋅ sin λ1 + b ⋅ cos φ2 ⋅ sin λ2
z = a ⋅ sin φ1 + b ⋅ sin φ2
φi = atan2(z, √(x² + y²))
λi = atan2(y, x)

δ is the angular distance d/R between the two points.
φ = latitude
λ = longitude

Translated to Cypher (with mandatory Greek symbols) it reads like this to find the point 0.2 of the way from one point to another:

with {latitude: 51.4931963543, longitude: -0.0475185810} AS p1, 
     {latitude: 51.47908, longitude: -0.05393950 } AS p2

WITH p1, p2, distance(point(p1), point(p2)) / 6371000 AS δ, 0.2 AS f
WITH p1, p2, δ,
     sin((1-f) * δ) / sin(δ) AS a,
     sin(f * δ) / sin(δ) AS b
WITH radians(p1.latitude) AS φ1, radians(p1.longitude) AS λ1,
     radians(p2.latitude) AS φ2, radians(p2.longitude) AS λ2,
     a, b
WITH a * cos(φ1) * cos(λ1) + b * cos(φ2) * cos(λ2) AS x,
     a * cos(φ1) * sin(λ1) + b * cos(φ2) * sin(λ2) AS y,
     a * sin(φ1) + b * sin(φ2) AS z
RETURN degrees(atan2(z, sqrt(x^2 + y^2))) AS φi,
       degrees(atan2(y,x)) AS λi
╒═════════════════╤════════════════════╕
│φi               │λi                  │
╞═════════════════╪════════════════════╡
│51.49037311149128│-0.04880308288561931│
└─────────────────┴────────────────────┘

A quick sanity check plugging in 0.5 instead of 0.2 finds the midpoint, which I was able to verify against yesterday’s post:

╒═════════════════╤═════════════════════╕
│φi               │λi                   │
╞═════════════════╪═════════════════════╡
│51.48613822097523│-0.050729537454086385│
└─────────────────┴─────────────────────┘

That’s all for now!

Categories: Programming

Neo4j: Find the midpoint between two lat/longs

Mon, 10/31/2016 - 20:31

[Google Maps screenshot showing the two input points and the calculated midpoint]

Over the last couple of weekends I’ve been playing around with some transport data and I wanted to run the A* algorithm to find the quickest route between two stations.

The A* algorithm takes an estimateEvaluator as one of its parameters and the evaluator looks at the lat/longs of nodes to work out whether a path is worth following or not. I therefore needed to add lat/longs for each station and I found it surprisingly hard to find this location data for all the points in the dataset.

Luckily I tend to have the lat/longs for two points either side of a station so I can work out the midpoint as an approximation for the missing one.

I found an article which defines a formula we can use to do this and there’s a StackOverflow post which has some Java code that implements the formula.

I wanted to find the midpoint between Surrey Quays station (51.4931963543,-0.0475185810) and a point further south on the train line (51.47908,-0.05393950). I wrote the following Cypher query to calculate this point:

WITH 51.4931963543 AS lat1, -0.0475185810 AS lon1, 
     51.47908 AS lat2 , -0.05393950 AS lon2

WITH radians(lat1) AS rlat1, radians(lon1) AS rlon1, 
     radians(lat2) AS rlat2, radians(lon2) AS rlon2, 
     radians(lon2 - lon1) AS dLon

WITH rlat1, rlon1, rlat2, rlon2, 
     cos(rlat2) * cos(dLon) AS Bx, 
     cos(rlat2) * sin(dLon) AS By

WITH atan2(sin(rlat1) + sin(rlat2), 
           sqrt( (cos(rlat1) + Bx) * (cos(rlat1) + Bx) + By * By )) AS lat3,
     rlon1 + atan2(By, cos(rlat1) + Bx) AS lon3

RETURN degrees(lat3) AS midLat, degrees(lon3) AS midLon
╒═════════════════╤═════════════════════╕
│midLat           │midLon               │
╞═════════════════╪═════════════════════╡
│51.48613822097523│-0.050729537454086385│
└─────────────────┴─────────────────────┘

The Google Maps screenshot on the right hand side shows the initial points at the top and bottom and the midpoint in between. It’s not perfect; ideally I’d like the midpoint to be on the track, but I think it’s good enough for the purposes of the algorithm.

Now I need to go and fill in the lat/longs for my location-less stations!
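
A sketch of what that fill-in could look like, assuming the stations are :Stop nodes connected by :NEXT relationships and that a missing location shows up as an absent latitude property (neither of which is fixed by this post):

// find a location-less stop with located neighbours and set its
// lat/long to the midpoint of those neighbours
MATCH (a:Stop)-[:NEXT]->(b:Stop)-[:NEXT]->(c:Stop)
WHERE NOT exists(b.latitude)
WITH b,
     radians(a.latitude) AS rlat1, radians(a.longitude) AS rlon1,
     radians(c.latitude) AS rlat2,
     radians(c.longitude - a.longitude) AS dLon
WITH b, rlat1, rlon1, rlat2,
     cos(rlat2) * cos(dLon) AS Bx,
     cos(rlat2) * sin(dLon) AS By
WITH b,
     atan2(sin(rlat1) + sin(rlat2),
           sqrt((cos(rlat1) + Bx) * (cos(rlat1) + Bx) + By * By)) AS lat3,
     rlon1 + atan2(By, cos(rlat1) + Bx) AS lon3
SET b.latitude = degrees(lat3), b.longitude = degrees(lon3)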

Categories: Programming

Neo4j: Create dynamic relationship type

Sun, 10/30/2016 - 23:12

One of the things I’ve often found frustrating when importing data using Cypher, Neo4j’s query language, is that it’s quite difficult to create dynamic relationship types.

Say we have a CSV file structured like this:

load csv with headers from "file:///people.csv" AS row
RETURN row
╒═══════════════════════════════════════════════════════╕
│row                                                    │
╞═══════════════════════════════════════════════════════╡
│{node1: Mark, node2: Reshmee, relationship: MARRIED_TO}│
├───────────────────────────────────────────────────────┤
│{node1: Mark, node2: Alistair, relationship: FRIENDS}  │
└───────────────────────────────────────────────────────┘

We want to connect the nodes with the relationship type specified in the file. Unfortunately, in Cypher we can’t parameterise relationship types, so we have to resort to the FOREACH hack to create our relationships:

load csv with headers from "file:///people.csv" AS row
MERGE (p1:Person {name: row.node1})
MERGE (p2:Person {name: row.node2})

FOREACH(ignoreMe IN CASE WHEN row.relationship = "MARRIED_TO" THEN [1] ELSE [] END |
 MERGE (p1)-[:MARRIED_TO]->(p2))

FOREACH(ignoreMe IN CASE WHEN row.relationship = "FRIENDS" THEN [1] ELSE [] END |
 MERGE (p1)-[:FRIENDS]->(p2))

This works, but:

  1. Looks horrendous
  2. Doesn’t scale particularly well when we have multiple relationship types to deal with

As in my last post the APOC library comes to the rescue again, this time in the form of the apoc.create.relationship procedure.

This procedure allows us to change our initial query to read like this:

load csv with headers from "file:///people.csv" AS row
MERGE (p1:Person {name: row.node1})
MERGE (p2:Person {name: row.node2})

WITH p1, p2, row
CALL apoc.create.relationship(p1, row.relationship, {}, p2) YIELD rel
RETURN rel

Much better!
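
As a quick sanity check (my addition, not part of the import itself) we can confirm the relationships came out with the right types:

MATCH (p1:Person)-[r]->(p2:Person)
RETURN p1.name AS from, type(r) AS relationship, p2.name AS to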

Categories: Programming

Neo4j: Dynamically add property/Set dynamic property

Thu, 10/27/2016 - 06:29

I’ve been playing around with a dataset which has the timetable for the national rail in the UK and they give you departure and arrival times of each train in a textual format.

For example, the node to represent a stop could be created like this:

CREATE (stop:Stop {arrival: "0802", departure: "0803H"})

That time format isn’t particularly amenable to querying so I wanted to add another property which indicated the number of seconds since the start of the day. The trailing ‘H’ indicates an extra half minute, i.e. 30 seconds.

So we want to add ‘arrivalSecondsSinceStartOfDay’ and ‘departureSecondsSinceStartOfDay’ properties to our node. I wrote the following query to calculate the values for those properties.

MATCH (stop:Stop)
UNWIND ["arrival", "departure"] AS key

WITH key,
     toInteger(substring(stop[key], 0, 2)) AS hours,          
     toInteger(substring(stop[key], 2, 2)) AS minutes,
     CASE WHEN substring(stop[key], 4,1) = "H" THEN 30 ELSE 0 END AS seconds

WITH key, (hours * 60 * 60) + (minutes * 60) + seconds AS secondsSinceStartOfDay

RETURN key + "SecondsSinceStartOfDay" AS newKey, secondsSinceStartOfDay
╒═══════════════════════════════╤══════════════════════╕
│newKey                         │secondsSinceStartOfDay│
╞═══════════════════════════════╪══════════════════════╡
│arrivalSecondsSinceStartOfDay  │28920                 │
├───────────────────────────────┼──────────────────────┤
│departureSecondsSinceStartOfDay│29010                 │
└───────────────────────────────┴──────────────────────┘

Now we’re ready to set those properties on the ‘stop’ node.

MATCH (stop:Stop)
UNWIND ["arrival", "departure"] AS key

WITH stop,
     key,
     toInteger(substring(stop[key], 0, 2)) AS hours,          
     toInteger(substring(stop[key], 2, 2)) AS minutes,
     CASE WHEN substring(stop[key], 4,1) = "H" THEN 30 ELSE 0 END AS seconds

WITH stop, key, (hours * 60 * 60) + (minutes * 60) + seconds AS secondsSinceStartOfDay
WITH stop, key + "SecondsSinceStartOfDay" AS newKey, secondsSinceStartOfDay
SET stop[newKey] = secondsSinceStartOfDay
Invalid input '[': expected an identifier character, whitespace, '{', node labels, a property map, a relationship pattern, '.', '(', '=' or "+=" (line 12, column 9 (offset: 447))
"SET stop[newKey] = secondsSinceStartOfDay"
         ^

Hmmm that didn’t work as expected! It doesn’t look like we can set dynamic properties using Cypher just yet.

Luckily my colleague Michael Hunger and the Neo4j community have been curating the APOC procedures library and it has just the procedure to help us out.

You’ll need to download the jar for your version of Neo4j and then place it in the plugins directory. I’m using Neo4j 3.1 Beta1 so this is what it looks like for me:

$ tree neo4j-enterprise-3.1.0-BETA1/plugins/

neo4j-enterprise-3.1.0-BETA1/plugins/
└── apoc-3.1.0.1-all.jar

0 directories, 1 file

After you’ve done that you’ll need to restart Neo4j so that it can pick up the new procedures that we’ve added. Once you’ve done that execute the following query to check they’ve installed correctly:

call dbms.procedures()
YIELD name 
WITH name 
WHERE name STARTS WITH "apoc"
RETURN COUNT(*)
╒════════╕
│COUNT(*)│
╞════════╡
│183     │
└────────┘

We’re now ready to dynamically set properties in the graph. The procedure that we’ll use is apoc.create.setProperty and it’s easy to update our query to use it:

MATCH (stop:Stop)
UNWIND ["arrival", "departure"] AS key

WITH stop,
     key,
     toInteger(substring(stop[key], 0, 2)) AS hours,          
     toInteger(substring(stop[key], 2, 2)) AS minutes,
     CASE WHEN substring(stop[key], 4,1) = "H" THEN 30 ELSE 0 END AS seconds

WITH stop, key, (hours * 60 * 60) + (minutes * 60) + seconds AS secondsSinceStartOfDay
WITH stop, key + "SecondsSinceStartOfDay" AS newKey, secondsSinceStartOfDay
CALL apoc.create.setProperty(stop, newKey, secondsSinceStartOfDay) 
Query cannot conclude with CALL (must be RETURN or an update clause) (line 12, column 1 (offset: 439))
"CALL apoc.create.setProperty(stop, newKey, secondsSinceStartOfDay)"
 ^

Oops, I spoke too soon! We need to yield the procedure’s return column and then either return it or just return a count to work around this:

MATCH (stop:Stop)
UNWIND ["arrival", "departure"] AS key

WITH stop,
     key,
     toInteger(substring(stop[key], 0, 2)) AS hours,          
     toInteger(substring(stop[key], 2, 2)) AS minutes,
     CASE WHEN substring(stop[key], 4,1) = "H" THEN 30 ELSE 0 END AS seconds

WITH stop, key, (hours * 60 * 60) + (minutes * 60) + seconds AS secondsSinceStartOfDay
WITH stop, key + "SecondsSinceStartOfDay" AS newKey, secondsSinceStartOfDay
CALL apoc.create.setProperty(stop, newKey, secondsSinceStartOfDay) 
YIELD node
RETURN COUNT(*)
╒════════╕
│COUNT(*)│
╞════════╡
│2       │
└────────┘

And that’s it, we can now dynamically set properties in our queries.
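
A quick way to eyeball the result, assuming the single stop created earlier:

MATCH (stop:Stop)
RETURN stop.arrival, stop.arrivalSecondsSinceStartOfDay,
       stop.departure, stop.departureSecondsSinceStartOfDay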

Categories: Programming

Neo4j: Detecting rogue spaces in CSV headers with LOAD CSV

Wed, 10/19/2016 - 06:16

Last week I was helping someone load the data from a CSV file into Neo4j and we were having trouble filtering out rows which contained a null value in one of the columns.

This is what the data looked like:

load csv with headers from "file:///foo.csv" as row
RETURN row
╒══════════════════════════════════╕
│row                               │
╞══════════════════════════════════╡
│{key1: a,  key2: (null),  key3: c}│
├──────────────────────────────────┤
│{key1: d,  key2: e,  key3: f}     │
└──────────────────────────────────┘

We’d like to filter out any rows which have ‘key2’ as null, so let’s tweak our query to do that:

load csv with headers from "file:///foo.csv" as row
WITH row WHERE NOT row.key2 is null
RETURN row
(no rows)

Hmmm that’s odd, it’s got rid of both rows. We’d expect to see the 2nd row since that doesn’t have a null value.

At this point we might suspect that what we’re seeing on the screen isn’t actually what the data looks like. Let’s write the following query to check our header values:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒═════╤═════════╕
│key  │SIZE(key)│
╞═════╪═════════╡
│key1 │4        │
├─────┼─────────┤
│ key2│5        │
├─────┼─────────┤
│ key3│5        │
└─────┴─────────┘

The second column tells us that there are some extra characters in the columns for ‘key2’ and ‘key3’ or rather ‘ key2’ and ‘ key3’. In this case they are spaces, but it could easily be another character:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, replace(key, " ", "_SPACE_") AS spaces
╒═════╤═══════════╕
│key  │spaces     │
╞═════╪═══════════╡
│key1 │key1       │
├─────┼───────────┤
│ key2│_SPACE_key2│
├─────┼───────────┤
│ key3│_SPACE_key3│
└─────┴───────────┘

If we clean up our CSV file and try again everything works as expected:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒════╤═════════╕
│key │SIZE(key)│
╞════╪═════════╡
│key1│4        │
├────┼─────────┤
│key2│4        │
├────┼─────────┤
│key3│4        │
└────┴─────────┘
load csv with headers from "file:///foo.csv" as row
WITH row WHERE NOT row.key2 is null
RETURN row
╒═══════════════════════════╕
│row                        │
╞═══════════════════════════╡
│{key1: d, key2: e, key3: f}│
└───────────────────────────┘
Categories: Programming

Neo4j: requirement failed

Tue, 10/04/2016 - 23:33

Last week during a hands on Cypher meetup, using Neo4j’s built in movie dataset, one of the attendees showed me the following query which wasn’t working as expected:

MATCH (p:Person)-[:ACTED_IN]->(movie)
RETURN p, COLLECT(movie.title) AS movies
ORDER BY COUNT(movies) DESC
LIMIT 10
requirement failed

We can get a full stack trace in logs/debug.log if we run the same query from the cypher-shell, which was introduced during one of the Neo4j 3.1 milestone releases:

2016-10-03 23:25:07.529+0000 ERROR [o.n.b.v.r.ErrorReporter] Client triggered an unexpected error [UnknownError]: requirement failed, reference to. requirement failed
java.lang.IllegalArgumentException: requirement failed
        at scala.Predef$.require(Predef.scala:212)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.steps.sortSkipAndLimit$.apply(sortSkipAndLimit.scala:38)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.PlanEventHorizon$.apply(PlanEventHorizon.scala:43)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.PlanEventHorizon$.apply(PlanEventHorizon.scala:31)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.PlanWithTail.apply(PlanWithTail.scala:46)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.PlanWithTail.apply(PlanWithTail.scala:29)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.PlanSingleQuery.apply(PlanSingleQuery.scala:47)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.PlanSingleQuery.apply(PlanSingleQuery.scala:30)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.DefaultQueryPlanner$$anonfun$2.apply(QueryPlanner.scala:51)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.DefaultQueryPlanner$$anonfun$2.apply(QueryPlanner.scala:51)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:285)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.DefaultQueryPlanner.planQueries(QueryPlanner.scala:51)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.logical.DefaultQueryPlanner.plan(QueryPlanner.scala:36)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.CostBasedExecutablePlanBuilder.produceLogicalPlan(CostBasedExecutablePlanBuilder.scala:95)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.CostBasedExecutablePlanBuilder$$anonfun$1.apply(CostBasedExecutablePlanBuilder.scala:71)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.CostBasedExecutablePlanBuilder$$anonfun$1.apply(CostBasedExecutablePlanBuilder.scala:71)
        at org.neo4j.cypher.internal.compiler.v3_1.helpers.package$$anonfun$closing$1.apply(package.scala:29)
        at org.neo4j.cypher.internal.compiler.v3_1.helpers.package$$anonfun$closing$1.apply(package.scala:29)
        at org.neo4j.cypher.internal.compiler.v3_1.helpers.package$.using(package.scala:37)
        at org.neo4j.cypher.internal.compiler.v3_1.helpers.package$.closing(package.scala:29)
        at org.neo4j.cypher.internal.compiler.v3_1.planner.CostBasedExecutablePlanBuilder.producePlan(CostBasedExecutablePlanBuilder.scala:70)
        at org.neo4j.cypher.internal.compiler.v3_1.executionplan.procs.DelegatingProcedureExecutablePlanBuilder.producePlan(DelegatingProcedureExecutablePlanBuilder.scala:99)
        at org.neo4j.cypher.internal.compiler.v3_1.executionplan.FallbackBuilder$class.producePlan(SilentFallbackPlanBuilder.scala:37)
        at org.neo4j.cypher.internal.compiler.v3_1.executionplan.SilentFallbackPlanBuilder.producePlan(SilentFallbackPlanBuilder.scala:56)
        at org.neo4j.cypher.internal.compiler.v3_1.executionplan.ExecutionPlanBuilder.build(ExecutionPlanBuilder.scala:106)
        at org.neo4j.cypher.internal.compiler.v3_1.CypherCompiler$$anonfun$5.apply(CypherCompiler.scala:176)
        at org.neo4j.cypher.internal.compiler.v3_1.CypherCompiler$$anonfun$5.apply(CypherCompiler.scala:176)
        at org.neo4j.cypher.internal.compiler.v3_1.QueryCache$$anonfun$getOrElseUpdate$1$$anonfun$apply$1.apply(CacheAccessor.scala:36)
        at org.neo4j.cypher.internal.compiler.v3_1.MonitoringCacheAccessor$$anonfun$1.apply(CacheAccessor.scala:57)
        at org.neo4j.cypher.internal.compiler.v3_1.LFUCache$$anon$1.apply(LFUCache.scala:31)

The problem line seems to be this one:

require(sortItems.forall(_.expression.isInstanceOf[Variable]))

The solution is actually quite simple. We should be using the SIZE function to calculate the size of a list rather than the COUNT function:

MATCH (p:Person)-[:ACTED_IN]->(movie)
RETURN p, COLLECT(movie.title) AS movies
ORDER BY SIZE(movies) DESC
LIMIT 10
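
Another way to satisfy that requirement (my variant, not from the post) is to alias the size as a variable in a WITH clause and sort on that:

MATCH (p:Person)-[:ACTED_IN]->(movie)
WITH p, COLLECT(movie.title) AS movies
// sort items must be variables, so give the size a name first
WITH p, movies, SIZE(movies) AS numberOfMovies
RETURN p, movies
ORDER BY numberOfMovies DESC
LIMIT 10
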
Categories: Programming

Neo4j: Procedure call inside a query does not support passing arguments implicitly (pass explicitly after procedure name instead)

Sun, 10/02/2016 - 11:13

A couple of days ago I was trying to write a Cypher query to filter the labels in my database.

I started with the following procedure call to get the list of all the labels:

CALL db.labels
╒══════════╕
│label     │
╞══════════╡
│Airport   │
├──────────┤
│Flight    │
├──────────┤
│Airline   │
├──────────┤
│Movie     │
├──────────┤
│AirportDay│
├──────────┤
│Person    │
├──────────┤
│Engineer  │
└──────────┘

I was only interested in labels that contained the letter ‘a’ so I tweaked the query to filter the output of the procedure:

CALL db.labels
YIELD label 
WITH label WHERE tolower(label) contains "a"
RETURN label

Unfortunately that didn’t work as I expected:

Procedure call inside a query does not support passing arguments implicitly (pass explicitly after procedure name instead) (line 1, column 9 (offset: 8))
"CALL db.labels"
         ^

The mistake I made was calling the procedure implicitly without using parentheses. If you want to do any post-processing on the output of a procedure you need to call it explicitly, otherwise Cypher gets very confused.

If we add back the parentheses it’s much happier:

CALL db.labels()
YIELD label 
WITH label WHERE tolower(label) contains "a"
RETURN label
╒══════════╕
│label     │
╞══════════╡
│Airport   │
├──────────┤
│Airline   │
├──────────┤
│AirportDay│
└──────────┘

It stumped me for a while until I figured out what the error message meant! I think I’ll just use explicit parentheses all the time from now on to save me running into this one again.

Categories: Programming