Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Mark Needham
Syndicate content
Thoughts on Software Development
Updated: 4 hours 34 min ago

Luigi: An ExternalProgramTask example – Converting JSON to CSV

Sat, 03/25/2017 - 15:09

I’ve been playing around with the Python library Luigi which is used to build pipelines of batch jobs and I struggled to find an example of an ExternalProgramTask so this is my attempt at filling that void.

Luigi - the Python data library for building data science pipelines

I’m building a little data pipeline to get data from the meetup.com API and put it into CSV files that can be loaded into Neo4j using the LOAD CSV command.

The first task I created calls the /groups endpoint and saves the result into a JSON file:

import luigi
import requests
import json
from collections import Counter

class GroupsToJSON(luigi.Task):
    key = luigi.Parameter()
    lat = luigi.Parameter()
    lon = luigi.Parameter()

    def run(self):
        seed_topic = "nosql"
        uri = "https://api.meetup.com/2/groups?&topic={0}&lat={1}&lon={2}&key={3}".format(seed_topic, self.lat, self.lon, self.key)

        r = requests.get(uri)
        all_topics = [topic["urlkey"]  for result in r.json()["results"] for topic in result["topics"]]
        c = Counter(all_topics)

        topics = [entry[0] for entry in c.most_common(10)]

        groups = {}
        for topic in topics:
            uri = "https://api.meetup.com/2/groups?&topic={0}&lat={1}&lon={2}&key={3}".format(topic, self.lat, self.lon, self.key)
            r = requests.get(uri)
            for group in r.json()["results"]:
                groups[group["id"]] = group

        with self.output().open('w') as groups_file:
            json.dump(list(groups.values()), groups_file, indent=4, sort_keys=True)

    def output(self):
        return luigi.LocalTarget("/tmp/groups.json")

We define a few parameters at the top of the class which will be passed in when this task is executed. The most interesting lines of the run function are the last couple where we write the JSON to a file. self.output() refers to the target defined in the output function which in this case is /tmp/groups.json.

Now we need to create a task to convert that JSON file into CSV format. The jq command line tool does this job well so we’ll use that. The following task does the job:

from luigi.contrib.external_program import ExternalProgramTask

class GroupsToCSV(luigi.contrib.external_program.ExternalProgramTask):
    file_path = "/tmp/groups.csv"
    key = luigi.Parameter()
    lat = luigi.Parameter()
    lon = luigi.Parameter()

    def program_args(self):
        return ["./groups.sh", self.input()[0].path, self.output().path]

    def output(self):
        return luigi.LocalTarget(self.file_path)

    def requires(self):
        yield GroupsToJSON(self.key, self.lat, self.lon)

groups.sh

#!/bin/bash

in=${1}
out=${2}

echo "id,name,urlname,link,rating,created,description,organiserName,organiserMemberId" > ${out}
jq -r '.[] | [.id, .name, .urlname, .link, .rating, .created, .description, .organizer.name, .organizer.member_id] | @csv' ${in} >> ${out}

I wanted to call jq directly from the Python code but I couldn’t figure out how to do it so putting that code in a shell script is my workaround.

The last piece of the puzzle is a wrapper task that launches the others:

import os

class Meetup(luigi.WrapperTask):
    def run(self):
        print("Running Meetup")

    def requires(self):
        key = os.environ['MEETUP_API_KEY']
        lat = os.getenv('LAT', "51.5072")
        lon = os.getenv('LON', "0.1275")

        yield GroupsToCSV(key, lat, lon)

Now we’re ready to run the tasks:

$ PYTHONPATH="." luigi --module blog --local-scheduler Meetup
DEBUG: Checking if Meetup() is complete
DEBUG: Checking if GroupsToCSV(key=xxx, lat=51.5072, lon=0.1275) is complete
INFO: Informed scheduler that task   Meetup__99914b932b   has status   PENDING
DEBUG: Checking if GroupsToJSON(key=xxx, lat=51.5072, lon=0.1275) is complete
INFO: Informed scheduler that task   GroupsToCSV_xxx_51_5072_0_1275_e07372cebf   has status   PENDING
INFO: Informed scheduler that task   GroupsToJSON_xxx_51_5072_0_1275_e07372cebf   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 3
INFO: [pid 4452] Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) running   GroupsToJSON(key=xxx, lat=51.5072, lon=0.1275)
INFO: [pid 4452] Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) done      GroupsToJSON(key=xxx, lat=51.5072, lon=0.1275)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   GroupsToJSON_xxx_51_5072_0_1275_e07372cebf   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 2
INFO: [pid 4452] Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) running   GroupsToCSV(key=xxx, lat=51.5072, lon=0.1275)
INFO: Running command: ./groups.sh /tmp/groups.json /tmp/groups.csv
INFO: [pid 4452] Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) done      GroupsToCSV(key=xxx, lat=51.5072, lon=0.1275)
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   GroupsToCSV_xxx_51_5072_0_1275_e07372cebf   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 4452] Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) running   Meetup()
Running Meetup
INFO: [pid 4452] Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) done      Meetup()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   Meetup__99914b932b   has status   DONE
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=970508581, workers=1, host=Marks-MBP-4, username=markneedham, pid=4452) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 3 tasks of which:
* 3 ran successfully:
    - 1 GroupsToCSV(key=xxx, lat=51.5072, lon=0.1275)
    - 1 GroupsToJSON(key=xxx, lat=51.5072, lon=0.1275)
    - 1 Meetup()

This progress looks 🙂 because there were no failed tasks or missing external dependencies

===== Luigi Execution Summary =====

Looks good! Let’s quickly look at our CSV file:

$ head -n10 /tmp/groups.csv 
id,name,urlname,link,rating,created,description,organiserName,organiserMemberId
1114381,"London NoSQL, MySQL, Open Source Community","london-nosql-mysql","https://www.meetup.com/london-nosql-mysql/",4.28,1208505614000,"

Meet others in London interested in NoSQL, MySQL, and Open Source Databases.

","Sinead Lawless",185675230 1561841,"Enterprise Search London Meetup","es-london","https://www.meetup.com/es-london/",4.66,1259157419000,"

Enterprise Search London is a meetup for anyone interested in building search and discovery experiences β€” from intranet search and site search, to advanced discovery applications and beyond.

Disclaimer: This meetup is NOT about SEO or search engine marketing.

What people are saying:

  • ""Join this meetup if you have a passion for enterprise search and user experience that you would like to share with other able-minded practitioners."" β€” Vegard Sandvold
  • ""Full marks for vision and execution. Looking forward to the next Meetup."" β€” Martin White
  • β€œConsistently excellent” β€” Helen Lippell

Sweet! And what if we run it again?

$ PYTHONPATH="." luigi --module blog --local-scheduler Meetup
DEBUG: Checking if Meetup() is complete
INFO: Informed scheduler that task   Meetup__99914b932b   has status   DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=172768377, workers=1, host=Marks-MBP-4, username=markneedham, pid=4531) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 present dependencies were encountered:
    - 1 Meetup()

Did not run any tasks
This progress looks 🙂 because there were no failed tasks or missing external dependencies

===== Luigi Execution Summary =====

As expected nothing happens since our dependencies are already satisfied and we have our first Luigi pipeline up and running.

The post Luigi: An ExternalProgramTask example – Converting JSON to CSV appeared first on Mark Needham.

Categories: Programming

Python 3: TypeError: Object of type β€˜dict_values’ is not JSON serializable

Sun, 03/19/2017 - 17:40

I’ve recently upgraded to Python 3 (I know, took me a while!) and realised that one of my scripts that writes JSON to a file no longer works!

This is a simplified version of what I’m doing:

>>> import json
>>> x = {"mark": {"name": "Mark"}, "michael": {"name": "Michael"}  } 
>>> json.dumps(x.values())
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'dict_values' is not JSON serializable

Python 2.7 would be perfectly happy:

>>> json.dumps(x.values())
'[{"name": "Michael"}, {"name": "Mark"}]'

The difference is in the results returned by the values method:

# Python 2.7.10
>>> x.values()
[{'name': 'Michael'}, {'name': 'Mark'}]

# Python 3.6.0
>>> x.values()
dict_values([{'name': 'Mark'}, {'name': 'Michael'}])
>>> 

Python 3 no longer returns an array, instead we have a dict_values wrapper around the data.

Luckily this is easy to resolve – we just need to wrap the call to values with a call to list:

>>> json.dumps(list(x.values()))
'[{"name": "Mark"}, {"name": "Michael"}]'

This versions works with Python 2.7 as well so if I accidentally run the script with an old version the world isn’t going to explode.

The post Python 3: TypeError: Object of type ‘dict_values’ is not JSON serializable appeared first on Mark Needham.

Categories: Programming

Neo4j: apoc.date.parse – java.lang.IllegalArgumentException: Illegal pattern character β€˜T’ / java.text.ParseException: Unparseable date: β€œ2012-11-12T08:46:15Z”

Mon, 03/06/2017 - 21:52

I often find myself wanting to convert date strings into Unix timestamps using Neo4j’s APOC library and unfortunately some sources don’t use the format that apoc.date.parse expects.

e.g.

return apoc.date.parse("2012-11-12T08:46:15Z",'s') 
AS ts

Failed to invoke function `apoc.date.parse`: 
Caused by: java.lang.IllegalArgumentException: java.text.ParseException: Unparseable date: "2012-11-12T08:46:15Z"

We need to define the format explicitly so the SimpleDataFormat documentation comes in handy. I tried the following:

return apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-ddTHH:mm:ssZ") 
AS ts

Failed to invoke function `apoc.date.parse`: 
Caused by: java.lang.IllegalArgumentException: Illegal pattern character 'T'

Hmmm, we need to quote the ‘T’ character – we can’t just include it in the pattern. Let’s try again:

return  apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-dd'T'HH:mm:ssZ") 
AS ts

Failed to invoke function `apoc.date.parse`: 
Caused by: java.lang.IllegalArgumentException: java.text.ParseException: Unparseable date: "2012-11-12T08:46:15Z"

The problem now is that we haven’t quoted the ‘Z’ but the error doesn’t indicate that – not sure why!

We can either quote the ‘Z’:

return  apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-dd'T'HH:mm:ss'Z'") 
AS ts

╒══════════╕
β”‚"ts"      β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•‘
β”‚1352709975β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Or we can match the timezone using ‘XXX’:

return  apoc.date.parse("2012-11-12T08:46:15Z",'s',"yyyy-MM-dd'T'HH:mm:ssXXX") 
AS ts

╒══════════╕
β”‚"ts"      β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•‘
β”‚1352709975β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The post Neo4j: apoc.date.parse – java.lang.IllegalArgumentException: Illegal pattern character ‘T’ / java.text.ParseException: Unparseable date: “2012-11-12T08:46:15Z” appeared first on Mark Needham.

Categories: Programming

Neo4j: Graphing the β€˜My name is…I work’ Twitter meme

Tue, 02/28/2017 - 16:50

Over the last few days I’ve been watching the chain of ‘My name is…’ tweets kicked off by DHH with interest. As I understand it, the idea is to show that coding interview riddles/hard tasks on a whiteboard are ridiculous.

Hello, my name is David. I would fail to write bubble sort on a whiteboard. I look code up on the internet all the time. I don't do riddles.

— DHH (@dhh) February 21, 2017

Other people quoted that tweet and added their own piece and yesterday Eduardo Hernacki suggested that traversing this chain of tweets seemed tailor made for Neo4j.

@eduardohki is someone traversing all this stuff? #Neo4j

— Eduardo Hernacki (@eduardohki) February 28, 2017

Michael was quickly on the scene and created a Cypher query which calls the Twitter API and creates a Neo4j graph from the resulting JSON response. The only tricky bit is creating a ‘bearer token’ but Jason Kotchoff has a helpful gist showing how to generate one from your Twitter consumer key and consumer secret.

Now that we’re got our bearer token let’s create a parameter to store it. Type the following in the Neo4j browser:

:param bearer: ''

Now we’re ready to query the Twitter API. We’ll start with the search API and find all tweets which contain the text ‘”my name” “I work”‘. That will return a JSON response containing lots of tweets. We’ll then create a node for each tweet it returns, a node for the user who posted the tweet, a node for the tweet it quotes, and relationships to glue them all together.

We’re going to use the apoc.load.jsonParams procedure from the APOC library to help us import the data. If you want to follow along you can use a Neo4j sandbox instance which comes with APOC installed. For your local Neo4j installation, grab the APOC jar and put it into your plugins folder before restarting Neo4j.

This is the query in full:

WITH 'https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=' as url, {bearer} as bearer

CALL apoc.load.jsonParams(url + "%22my%20name%22%20is%22%20%22I%20work%22",{Authorization:"Bearer "+bearer},null) yield value

UNWIND value.statuses as status
WITH status, status.user as u, status.entities as e
WHERE status.quoted_status_id is not null

// create a node for the original tweet
MERGE (t:Tweet {id:status.id}) 
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count

// create a node for the author + a POSTED relationship from the author to the tweet
MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)

// create a MENTIONED relationship from the tweet to any users mentioned in the tweet
FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))

// create a node for the quoted tweet and create a QUOTED relationship from the original tweet to the quoted one
MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)–[:QUOTED]->(q)

// repeat the above steps for the quoted tweet
WITH t as t0, status.quoted_status as status WHERE status is not null
WITH t0, status, status.user as u, status.entities as e

MERGE (t:Tweet {id:status.id}) 
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count

MERGE (t0)-[:QUOTED]->(t)

MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)

FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))

MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)–[:QUOTED]->(q);

The resulting graph looks like this:

MATCH p=()-[r:QUOTED]->() RETURN p LIMIT 25

Graph  21

A more interesting query would be to find the path from DHH to Eduardo which we can find with the following query:

match path = (dhh:Tweet {id: 834146806594433025})<-[:QUOTED*]-(eduardo:Tweet{id: 836400531983724545})
UNWIND NODES(path) AS tweet
MATCH (tweet)<-[:POSTED]->(user)
RETURN tweet, user

This query:

  • starts from DHH’s tweet
  • traverses all QUOTED relationships until it finds Eduardo’s tweet
  • collects all those tweets and then finds the author
  • returns the tweet and the author

And this is the output:

Graph  20

I ran a couple of other queries against the Twitter API to hydrate some nodes that we hadn’t set all the properties on – you can see all the queries on this gist.

For the next couple of days I also have a sandbox running https://10-0-1-157-32898.neo4jsandbox.com/browser/. You can login using the credentials readonly/twitter.

If you have any questions/suggestions let me know in the comments, @markhneedham on twitter, or email the Neo4j DevRel team – devrel@neo4j.com.

The post Neo4j: Graphing the ‘My name is…I work’ Twitter meme appeared first on Mark Needham.

Categories: Programming

Neo4j: How do null values even work?

Thu, 02/23/2017 - 00:28

Every now and then I find myself wanting to import a CSV file into Neo4j and I always get confused with how to handle the various null values that can lurk within.

Let’s start with an example that doesn’t have a CSV file in sight. Consider the following list and my attempt to only return null values:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value = null
RETURN value

(no changes, no records)

Hmm that’s weird. I’d have expected that at least keep the first value in the collection. What about if we do the inverse?

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value <> null
RETURN value

(no changes, no records)

Still nothing! Let’s try returning the output of our comparisons rather than filtering rows:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
RETURN value = null AS outcome

╒═══════╀═════════╕
β”‚"value"β”‚"outcome"β”‚
β•žβ•β•β•β•β•β•β•β•ͺ═════════║
β”‚null   β”‚null     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"null" β”‚null     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚""     β”‚null     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Mark" β”‚null     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Ok so that isn’t what we expected. Everything has an ‘outcome’ of ‘null’! What about if we want to check whether the value is the string “Mark”?

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
RETURN value = "Mark" AS outcome

╒═══════╀═════════╕
β”‚"value"β”‚"outcome"β”‚
β•žβ•β•β•β•β•β•β•β•ͺ═════════║
β”‚null   β”‚null     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"null" β”‚false    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚""     β”‚false    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Mark" β”‚true     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

From executing this query we learn that if one side of a comparison is null then the return value is always going to be null.

So how do we exclude a row if it’s null?

It turns out we have to use the ‘is’ keyword rather than using the equality operator. Let’s see what that looks like:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value is null
RETURN value

╒═══════╕
β”‚"value"β”‚
β•žβ•β•β•β•β•β•β•β•‘
β”‚null   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”˜

And the positive case:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value is not null
RETURN value

╒═══════╕
β”‚"value"β”‚
β•žβ•β•β•β•β•β•β•β•‘
β”‚"null" β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚""     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Mark" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”˜

What if we want to get rid of empty strings?

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value <> ""
RETURN value

╒═══════╕
β”‚"value"β”‚
β•žβ•β•β•β•β•β•β•β•‘
β”‚"null" β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Mark" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”˜

Interestingly that also gets rid of the null value which I hadn’t expected. But if we look for values matching the empty string:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
WITH value WHERE value = ""
RETURN value

╒═══════╕
β”‚"value"β”‚
β•žβ•β•β•β•β•β•β•β•‘
β”‚""     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”˜

It’s not there either! Hmm what’s going on here:

WITH [null, "null", "", "Mark"] AS values
UNWIND values AS value
RETURN value, value = "" AS isEmpty, value <> "" AS isNotEmpty

╒═══════╀═════════╀════════════╕
β”‚"value"β”‚"isEmpty"β”‚"isNotEmpty"β”‚
β•žβ•β•β•β•β•β•β•β•ͺ═════════β•ͺ════════════║
β”‚null   β”‚null     β”‚null        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"null" β”‚false    β”‚true        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚""     β”‚true     β”‚false       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Mark" β”‚false    β”‚true        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

null values seem to get filtered out for every type of equality match unless we explicitly check that a value ‘is null’.

So how do we use this knowledge when we’re parsing CSV files using Neo4j’s LOAD CSV tool?

Let’s say we have a CSV file that looks like this:

$ cat nulls.csv
name,company
"Mark",
"Michael",""
"Will",null
"Ryan","Neo4j"

So none of the first three rows have a value for ‘company’. I don’t have any value at all, Michael has an empty string, and Will has a null value. Let’s see how LOAD CSV interprets this:

load csv with headers from "file:///nulls.csv" AS row
RETURN row

╒═════════════════════════════════╕
β”‚"row"                            β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚{"name":"Mark","company":null}   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚{"name":"Michael","company":""}  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚{"name":"Will","company":"null"} β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚{"name":"Ryan","company":"Neo4j"}β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

We’ve got the full sweep of all the combinations from above. We’d like to create a Person node for each row but only create a Company node and associated ‘WORKS_FOR’ relationshp if an actual company is defined – we don’t want to create a null company.

So we only want to create a company node and ‘WORKS_FOR’ relationship for the Ryan row.

The following query does the trick:

load csv with headers from "file:///nulls.csv" AS row
MERGE (p:Person {name: row.name})
WITH p, row
WHERE row.company <> "" AND row.company <> "null"
MERGE (c:Company {name: row.company})
MERGE (p)-[:WORKS_FOR]->(c)

Added 5 labels, created 5 nodes, set 5 properties, created 1 relationship, statement completed in 117 ms.

And if we visualise what’s been created:

Graph  15

Perfect. Perhaps this behaviour is obvious but it always trips me up so hopefully it’ll be useful to someone else as well!

There’s also a section on the Neo4j developer pages describing even more null scenarios that’s worth checking out.

The post Neo4j: How do null values even work? appeared first on Mark Needham.

Categories: Programming

Neo4j: Analysing a CSV file using LOAD CSV and Cypher

Sun, 02/19/2017 - 23:39

Last week we ran our first online meetup for several years and I wanted to wanted to analyse the stats that YouTube lets you download for an event.

The file I downloaded looked like this:

$ cat ~/Downloads/youtube_stats_pW9boJoUxO0.csv 
Video IDs:, pW9boJoUxO0, Start time:, Wed Feb 15 08:57:55 2017, End time:, Wed Feb 15 10:03:10 2017
Playbacks, Peak concurrent viewers, Total view time (hours), Average session length (minutes)
348, 112, 97.125, 16.7456896552, 

Country code, AR, AT, BE, BR, BY, CA, CH, CL, CR, CZ, DE, DK, EC, EE, ES, FI, FR, GB, HU, IE, IL, IN, IT, LB, LU, LV, MY, NL, NO, NZ, PK, PL, QA, RO, RS, RU, SE, TR, US, VN, ZA
Playbacks, 2, 2, 1, 14, 1, 10, 2, 1, 1, 1, 27, 1, 1, 1, 3, 1, 25, 54, 1, 4, 6, 8, 1, 1, 1, 1, 1, 23, 1, 1, 1, 1, 1, 1, 2, 6, 22, 1, 114, 1, 1
Peak concurrent viewers, 2, 1, 1, 4, 1, 5, 1, 1, 0, 0, 11, 1, 1, 1, 2, 1, 6, 25, 1, 3, 3, 2, 1, 1, 1, 1, 1, 9, 1, 1, 0, 1, 0, 1, 1, 3, 7, 0, 44, 1, 0
Total view time (hours), 1.075, 0.0166666666667, 0.175, 2.58333333333, 0.00833333333333, 3.01666666667, 0.858333333333, 0.0583333333333, 0.0, 0.0, 8.69166666667, 0.8, 0.0166666666667, 0.0583333333333, 0.966666666667, 0.0166666666667, 4.20833333333, 20.8333333333, 0.00833333333333, 1.39166666667, 1.75, 0.766666666667, 0.00833333333333, 0.15, 0.0333333333333, 1.05833333333, 0.0333333333333, 7.36666666667, 0.0583333333333, 0.916666666667, 0.0, 0.00833333333333, 0.0, 0.00833333333333, 0.4, 1.10833333333, 5.28333333333, 0.0, 32.7333333333, 0.658333333333, 0.0
Average session length (minutes), 32.25, 0.5, 10.5, 11.0714285714, 0.5, 18.1, 25.75, 3.5, 0.0, 0.0, 19.3148148148, 48.0, 1.0, 3.5, 19.3333333333, 1.0, 10.1, 23.1481481481, 0.5, 20.875, 17.5, 5.75, 0.5, 9.0, 2.0, 63.5, 2.0, 19.2173913043, 3.5, 55.0, 0.0, 0.5, 0.0, 0.5, 12.0, 11.0833333333, 14.4090909091, 0.0, 17.2280701754, 39.5, 0.0

I want to look at the country specific stats so the first 4 lines aren’t interesting to me:

$ tail -n+5 youtube_stats_pW9boJoUxO0.csv > youtube.csv

I then put the youtube.csv file into the import directory of Neo4j and wrote the following query to return a row representing each country and its score for each of the metrics:

load csv with headers from "file:///youtube.csv" AS row
WITH [key in keys(row) where key <> "Country code"] AS keys, row, row["Country code"] AS heading
UNWIND keys AS key
RETURN key AS country, heading AS key, row[key] AS value

╒═════════╀═══════════╀═══════╕
β”‚"country"β”‚"key"      β”‚"value"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•ͺ═══════════β•ͺ═══════║
β”‚" SE"    β”‚"Playbacks"β”‚"22"   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚" GB"    β”‚"Playbacks"β”‚"54"   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚" FR"    β”‚"Playbacks"β”‚"25"   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚" RS"    β”‚"Playbacks"β”‚"2"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€
β”‚" LV"    β”‚"Playbacks"β”‚"1"    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

Now I want to create a node representing each country and create a property for each of the metrics. Since the property names are going to be dynamic I’ll make use of the APOC library which I drop into my plugins directory. I then tweaked the query to create the nodes:

load csv with headers from "https://dl.dropboxusercontent.com/u/14493611/youtube.csv" AS row
WITH [key in keys(row) where key <> "Country code"] AS keys, row, row["Country code"] AS heading
UNWIND keys AS key
WITH key AS country, heading AS key, row[key] AS value
MERGE (c:Country {name: replace(country, " ", "")})
WITH *
CALL apoc.create.setProperty(c, key, toInteger(value))
YIELD node
RETURN COUNT(*)

We can now see which country provided the most viewers:

MATCH (n:Country) 
RETURN n.name, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins
ORDER BY playbacks DESC
LIMIT 10

╒════════╀═══════════╀═════════════════╀═════════════════╀════════════════╕
β”‚"n.name"β”‚"playbacks"β”‚"viewTimeInHours"β”‚"peakConcViewers"β”‚"aveSessionMins"β”‚
β•žβ•β•β•β•β•β•β•β•β•ͺ═══════════β•ͺ═════════════════β•ͺ═════════════════β•ͺ════════════════║
β”‚"US"    β”‚"114"      β”‚"32"             β”‚"44"             β”‚"17"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"GB"    β”‚"54"       β”‚"20"             β”‚"25"             β”‚"23"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"DE"    β”‚"27"       β”‚"8"              β”‚"11"             β”‚"19"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"FR"    β”‚"25"       β”‚"4"              β”‚"6"              β”‚"10"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"NL"    β”‚"23"       β”‚"7"              β”‚"9"              β”‚"19"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"SE"    β”‚"22"       β”‚"5"              β”‚"7"              β”‚"14"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"BR"    β”‚"14"       β”‚"2"              β”‚"4"              β”‚"11"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"CA"    β”‚"10"       β”‚"3"              β”‚"5"              β”‚"18"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"IN"    β”‚"8"        β”‚"0"              β”‚"2"              β”‚"5"             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"IL"    β”‚"6"        β”‚"1"              β”‚"3"              β”‚"17"            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The United States in first unsurprisingly followed by the UK, Germany, and France. We ran the meetup at 5pm UK time so it was a friendly enough time for this side of the globe but not so friendly for Asia or Australia so it’s not too surprising we don’t see anybody from there!

For my last trick I wanted to see the full names of the countries so I downloaded the 2 digit codes for each country along with their full name.

I then updated my graph:

load csv with headers from "file:///countries.csv" AS row
MATCH (c:Country {name: row.Code})
SET c.fullName = row.Name;

Now let’s re-run our query and show the country fullnames instead:

MATCH (n:Country) 
RETURN n.fullName, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins
ORDER BY playbacks DESC
LIMIT 10

╒════════════════╀═══════════╀═════════════════╀═════════════════╀════════════════╕
β”‚"n.fullName"    β”‚"playbacks"β”‚"viewTimeInHours"β”‚"peakConcViewers"β”‚"aveSessionMins"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═══════════β•ͺ═════════════════β•ͺ═════════════════β•ͺ════════════════║
β”‚"United States" β”‚"114"      β”‚"32"             β”‚"44"             β”‚"17"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"United Kingdom"β”‚"54"       β”‚"20"             β”‚"25"             β”‚"23"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Germany"       β”‚"27"       β”‚"8"              β”‚"11"             β”‚"19"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"France"        β”‚"25"       β”‚"4"              β”‚"6"              β”‚"10"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Netherlands"   β”‚"23"       β”‚"7"              β”‚"9"              β”‚"19"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Sweden"        β”‚"22"       β”‚"5"              β”‚"7"              β”‚"14"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Brazil"        β”‚"14"       β”‚"2"              β”‚"4"              β”‚"11"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Canada"        β”‚"10"       β”‚"3"              β”‚"5"              β”‚"18"            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"India"         β”‚"8"        β”‚"0"              β”‚"2"              β”‚"5"             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Israel"        β”‚"6"        β”‚"1"              β”‚"3"              β”‚"17"            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

And that’s the end of my analysis with no relationships in sight!

The post Neo4j: Analysing a CSV file using LOAD CSV and Cypher appeared first on Mark Needham.

Categories: Programming

ReactJS/Material-UI: Cannot resolve module β€˜material-ui/lib/’

Sun, 02/12/2017 - 23:43

I’ve been playing around with ReactJS and the Material-UI library over the weekend and ran into this error while trying to follow one of the example from the demo application:

ERROR in ./src/app/modules/Foo.js
Module not found: Error: Cannot resolve module 'material-ui/lib/Subheader' in /Users/markneedham/neo/reactjs-test/src/app/modules
 @ ./src/app/modules/Foo.js 13:17-53
webpack: Failed to compile.

This was the component code:

import React from 'react'
import Subheader from 'material-ui/lib/Subheader'

export default React.createClass({
  render() {
    return 
    Some Text
    
  }
})

which is then rendered like this:

import Foo from './modules/Foo'
render(Foo, document.getElementById("app"))

I came across this post on Stack Overflow which seemed to describe a similar issue and led me to realise that I was actually on the wrong version of the documentation. I’m using version 0.16.7 but the demo I copied from is for version 0.15.0-alpha.1!

This is the component code that we actually want:

import React from 'react'
import Subheader from 'material-ui/Subheader'

export default React.createClass({
  render() {
    return 
    Some Text
    
  }
})

And that’s all I had to change. There are several other components that you’ll see the same error for and it looks like the change was made between the 0.14.x and 0.15.x series of the library.

The post ReactJS/Material-UI: Cannot resolve module ‘material-ui/lib/’ appeared first on Mark Needham.

Categories: Programming

Go: Multi-threaded writing to a CSV file

Tue, 01/31/2017 - 06:57

As part of a Go script I’ve been working on I wanted to write to a CSV file from multiple Go routines, but realised that the built in CSV Writer isn’t thread safe.

My first attempt at writing to the CSV file looked like this:

package main


import (
	"encoding/csv"
	"os"
	"log"
	"strconv"
)

func main() {

	csvFile, err := os.Create("/tmp/foo.csv")
	if err != nil {
		log.Panic(err)
	}

	w := csv.NewWriter(csvFile)
	w.Write([]string{"id1","id2","id3"})

	count := 100
	done := make(chan bool, count)

	for i := 0; i < count; i++ {
		go func(i int) {
			w.Write([]string {strconv.Itoa(i), strconv.Itoa(i), strconv.Itoa(i)})
			done <- true
		}(i)
	}

	for i:=0; i < count; i++ {
		<- done
	}
	w.Flush()
}

This script should output the numbers from 0-99 three times on each line. Some rows in the file are written correctly, but as we can see below, some aren't:

40,40,40
37,37,37
38,38,38
18,18,39
^@,39,39
...
67,67,70,^@70,70
65,65,65
73,73,73
66,66,66
72,72,72
75,74,75,74,75
74
7779^@,79,77
...

One way that we can make our script safe is to use a mutex whenever we're calling any methods on the CSV writer. I wrote the following code to do this:

type CsvWriter struct {
	mutex *sync.Mutex
	csvWriter *csv.Writer
}

func NewCsvWriter(fileName string) (*CsvWriter, error) {
	csvFile, err := os.Create(fileName)
	if err != nil {
		return nil, err
	}
	w := csv.NewWriter(csvFile)
	return &CsvWriter{csvWriter:w, mutex: &sync.Mutex{}}, nil
}

func (w *CsvWriter) Write(row []string) {
	w.mutex.Lock()
	w.csvWriter.Write(row)
	w.mutex.Unlock()
}

func (w *CsvWriter) Flush() {
	w.mutex.Lock()
	w.csvWriter.Flush()
	w.mutex.Unlock()
}

We create a mutex when NewCsvWriter instantiates CsvWriter and then use it in the Write and Flush functions so that only one go routine at a time can access the underlying CsvWriter. We then tweak the initial script to call this class instead of calling CsvWriter directly:

func main() {
	w, err := NewCsvWriter("/tmp/foo-safe.csv")
	if err != nil {
		log.Panic(err)
	}

	w.Write([]string{"id1","id2","id3"})

	count := 100
	done := make(chan bool, count)

	for i := 0; i < count; i++ {
		go func(i int) {
			w.Write([]string {strconv.Itoa(i), strconv.Itoa(i), strconv.Itoa(i)})
			done <- true
		}(i)
	}

	for i:=0; i < count; i++ {
		<- done
	}
	w.Flush()
}

And now if we inspect the CSV file all lines have been written successfully:

...
25,25,25
13,13,13
29,29,29
32,32,32
26,26,26
30,30,30
27,27,27
31,31,31
28,28,28
34,34,34
35,35,35
33,33,33
37,37,37
36,36,36
...

That's all for now. If you have any suggestions for a better way to do this do let me know in the comments or on twitter - I'm @markhneedham

The post Go: Multi-threaded writing to a CSV file appeared first on Mark Needham.

Categories: Programming

Go vs Python: Parsing a JSON response from a HTTP API

Sat, 01/21/2017 - 11:49

As part of a recommendations with Neo4j talk that I’ve presented a few times over the last year I have a set of scripts that download some data from the meetup.com API.

They’re all written in Python but I thought it’d be a fun exercise to see what they’d look like in Go. My eventual goal is to try and parallelise the API calls.

This is the Python version of the script:

import requests
import os
import json

key =  os.environ['MEETUP_API_KEY']
lat = "51.5072"
lon = "0.1275"

seed_topic = "nosql"
uri = "https://api.meetup.com/2/groups?&topic={0}&lat={1}&lon={2}&key={3}".format(seed_topic, lat, lon, key)

r = requests.get(uri)
all_topics = [topic["urlkey"]  for result in r.json()["results"] for topic in result["topics"]]

for topic in all_topics:
    print topic

We’re using the requests library to send a request to the meetup API to get the groups which have the topic ‘nosql’ in the London area. We then parse the response and print out the topics.

Now to do the same thing in Go! The first bit of the script is almost identical:

import (
	"fmt"
	"os"
	"net/http"
	"log"
	"time"
)

func handleError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

func main() {
	var httpClient = &http.Client{Timeout: 10 * time.Second}

	seedTopic := "nosql"
	lat := "51.5072"
	lon := "0.1275"
	key := os.Getenv("MEETUP_API_KEY")

	uri := fmt.Sprintf("https://api.meetup.com/2/groups?&topic=%s&lat=%s&lon=%s&key=%s", seedTopic, lat, lon, key)

	response, err := httpClient.Get(uri)
	handleError(err)
	defer response.Body.Close()
	fmt.Println(response)
}

If we run that this is the output we see:

$ go cmd/blog/main.go

&{200 OK 200 HTTP/2.0 2 0 map[X-Meetup-Request-Id:[2d3be3c7-a393-4127-b7aa-076f150499e6] X-Ratelimit-Reset:[10] Cf-Ray:[324093a73f1135d2-LHR] X-Oauth-Scopes:[basic] Etag:["35a941c5ea3df9df4204d8a4a2d60150"] Server:[cloudflare-nginx] Set-Cookie:[__cfduid=d54db475299a62af4bb963039787e2e3d1484894864; expires=Sat, 20-Jan-18 06:47:44 GMT; path=/; domain=.meetup.com; HttpOnly] X-Meetup-Server:[api7] X-Ratelimit-Limit:[30] X-Ratelimit-Remaining:[29] X-Accepted-Oauth-Scopes:[basic] Vary:[Accept-Encoding,User-Agent,Accept-Language] Date:[Fri, 20 Jan 2017 06:47:45 GMT] Content-Type:[application/json;charset=utf-8]] 0xc420442260 -1 [] false true map[] 0xc4200d01e0 0xc4202b2420}

So far so good. Now we need to parse the response that comes back.

Most of the examples that I came across suggest creating a struct with all the fields that you want to extract from the JSON document but that feels a bit over kill for such a simple script.

Instead we can just create maps of (string -> interface{}) and then apply type conversions where appropriate. I ended up with the following code to extract the topics:

import "encoding/json"

var target map[string]interface{}
decoder := json.NewDecoder(response.Body)
decoder.Decode(&target)

for _, rawGroup := range target["results"].([]interface{}) {
    group := rawGroup.(map[string]interface{})
    for _, rawTopic := range group["topics"].([]interface{}) {
        topic := rawTopic.(map[string]interface{})
        fmt.Println(topic["urlkey"])
    }
}

It’s more verbose that the Python version because we have to explicitly type each thing we take out of the map at every stage, but it’s not too bad. This is the full script:

package main

import (
	"fmt"
	"os"
	"net/http"
	"log"
	"time"
	"encoding/json"
)

func handleError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

func main() {
	var httpClient = &http.Client{Timeout: 10 * time.Second}

	seedTopic := "nosql"
	lat := "51.5072"
	lon := "0.1275"
	key := os.Getenv("MEETUP_API_KEY")

	uri := fmt.Sprintf("https://api.meetup.com/2/groups?&topic=%s&lat=%s&lon=%s&key=%s", seedTopic, lat, lon, key)

	response, err := httpClient.Get(uri)
	handleError(err)
	defer response.Body.Close()

	var target map[string]interface{}
	decoder := json.NewDecoder(response.Body)
	decoder.Decode(&target)

	for _, rawGroup := range target["results"].([]interface{}) {
		group := rawGroup.(map[string]interface{})
		for _, rawTopic := range group["topics"].([]interface{}) {
			topic := rawTopic.(map[string]interface{})
			fmt.Println(topic["urlkey"])
		}
	}
}

Once I’ve got these topics the next step is to make more API calls to get the groups for those topics.

I want to make those API calls in parallel while making sure I don’t exceed the rate limit restrictions on the API and I think I can make use of go routines, channels, and timers to do that. But that’s for another post!

The post Go vs Python: Parsing a JSON response from a HTTP API appeared first on Mark Needham.

Categories: Programming

Go: First attempt at channels

Sat, 12/24/2016 - 11:45

In a previous blog post I mentioned that I wanted to extract blips from The ThoughtWorks Radar into a CSV file and I thought this would be a good mini project for me to practice using Go.

In particular I wanted to try using channels and this seemed like a good chance to do that.

I watched a talk by Rob Pike on designing concurrent applications where he uses the following definition of concurrency:

Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

He then demonstrates this with the following diagram:

2016 12 23 19 52 30

I broke the scraping application down into four parts:

  1. Find the links of blips to download ->
  2. Download the blips ->
  3. Scrape the data from each page ->
  4. Write the data into a CSV file

I don’t think we gain much by parallelising steps 1) or 4) but steps 2) and 3) seem easily parallelisable. Therefore we’ll use a single goroutine for steps 1) and 4) and multiple goroutines for steps 2) and 3).

We’ll create two channels:

  • filesToScrape
  • filesScraped

And they will interact with our components like this:

  • 2) will write the path of the downloaded files into filesToScape
  • 3) will read from filesToScrape and write the scraped content into filesScraped
  • 4) will read from filesScraped and put that information into a CSV file.


I decided to write a completely serial version of the scraping application first so that I could compare it to the parallel version. I had the following common code:

scrape/scrape.go

package scrape

import (
	"github.com/PuerkitoBio/goquery"
	"os"
	"bufio"
	"fmt"
	"log"
	"strings"
	"net/http"
	"io"
)

func checkError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

type Blip struct {
	Link  string
	Title string
}

func (blip Blip) Download() File {
	parts := strings.Split(blip.Link, "/")
	fileName := "rawData/items/" + parts[len(parts) - 1]

	if _, err := os.Stat(fileName); os.IsNotExist(err) {
		resp, err := http.Get("http://www.thoughtworks.com" + blip.Link)
		checkError(err)
		body := resp.Body

		file, err := os.Create(fileName)
		checkError(err)

		io.Copy(bufio.NewWriter(file), body)
		file.Close()
		body.Close()
	}

	return File{Title: blip.Title, Path: fileName }
}

type File struct {
	Title string
	Path  string
}

func (fileToScrape File ) Scrape() ScrapedFile {
	file, err := os.Open(fileToScrape.Path)
	checkError(err)

	doc, err := goquery.NewDocumentFromReader(bufio.NewReader(file))
	checkError(err)
	file.Close()

	var entries []map[string]string
	doc.Find("div.blip-timeline-item").Each(func(i int, s *goquery.Selection) {
		entry := make(map[string]string, 0)
		entry["time"] = s.Find("div.blip-timeline-item__time").First().Text()
		entry["outcome"] = strings.Trim(s.Find("div.blip-timeline-item__ring span").First().Text(), " ")
		entry["description"] = s.Find("div.blip-timeline-item__lead").First().Text()
		entries = append(entries, entry)
	})

	return ScrapedFile{File:fileToScrape, Entries:entries}
}

type ScrapedFile struct {
	File    File
	Entries []map[string]string
}

func FindBlips(pathToRadar string) []Blip {
	blips := make([]Blip, 0)

	file, err := os.Open(pathToRadar)
	checkError(err)

	doc, err := goquery.NewDocumentFromReader(bufio.NewReader(file))
	checkError(err)

	doc.Find(".blip").Each(func(i int, s *goquery.Selection) {
		item := s.Find("a")
		title := item.Text()
		link, _ := item.Attr("href")
		blips = append(blips, Blip{Title: title, Link: link })
	})

	return blips
}

Note that we’re using the goquery library to scrape the HTML files that we download.

A Blip is used to represent an item that appears on the radar e.g. .NET Core. A File is a representation of that blip on my local file system and a ScrapedFile contains the local representation of a blip and has an array containing every appearance the blip has made in radars over time.

Let’s have a look at the single threaded version of the scraper:

cmd/single/main.go

package main

import (
	"fmt"
	"encoding/csv"
	"os"
	"github.com/mneedham/neo4j-thoughtworks-radar/scrape"
)


func main() {
	var filesCompleted chan scrape.ScrapedFile = make(chan scrape.ScrapedFile)
	defer close(filesCompleted)

	blips := scrape.FindBlips("rawData/twRadar.html")

	var filesToScrape []scrape.File
	for _, blip := range blips {
		filesToScrape = append(filesToScrape, blip.Download())
	}

	var filesScraped []scrape.ScrapedFile
	for _, file := range filesToScrape {
		filesScraped = append(filesScraped, file.Scrape())
	}

	blipsCsvFile, _ := os.Create("import/blipsSingle.csv")
	writer := csv.NewWriter(blipsCsvFile)
	defer blipsCsvFile.Close()

	writer.Write([]string{"technology", "date", "suggestion" })
	for _, scrapedFile := range filesScraped {
		fmt.Println(scrapedFile.File.Title)
		for _, blip := range scrapedFile.Entries {
			writer.Write([]string{scrapedFile.File.Title, blip["time"], blip["outcome"] })
		}
	}
	writer.Flush()
}

rawData/twRadar.html is a local copy of the A-Z page which contains all the blips. This version is reasonably simple: we create an array containing all the blips, scrape them into another array, and then that array into a CSV file. And if we run it:

$ time go run cmd/single/main.go 

real	3m10.354s
user	0m1.140s
sys	0m0.586s

$ head -n10 import/blipsSingle.csv 
technology,date,suggestion
.NET Core,Nov 2016,Assess
.NET Core,Nov 2015,Assess
.NET Core,May 2015,Assess
A single CI instance for all teams,Nov 2016,Hold
A single CI instance for all teams,Apr 2016,Hold
Acceptance test of journeys,Mar 2012,Trial
Acceptance test of journeys,Jul 2011,Trial
Acceptance test of journeys,Jan 2011,Trial
Accumulate-only data,Nov 2015,Assess

It takes a few minutes and most of the time will be taken in the blip.Download() function – work which is easily parallelisable. Let’s have a look at the parallel version where goroutines use channels to communicate with each other:

cmd/parallel/main.go

package main

import (
	"os"
	"encoding/csv"
	"github.com/mneedham/neo4j-thoughtworks-radar/scrape"
)

func main() {
	var filesToScrape chan scrape.File = make(chan scrape.File)
	var filesScraped chan scrape.ScrapedFile = make(chan scrape.ScrapedFile)
	defer close(filesToScrape)
	defer close(filesScraped)

	blips := scrape.FindBlips("rawData/twRadar.html")

	for _, blip := range blips {
		go func(blip scrape.Blip) { filesToScrape <- blip.Download() }(blip)
	}

	for i := 0; i < len(blips); i++ {
		select {
		case file := <-filesToScrape:
			go func(file scrape.File) { filesScraped <- file.Scrape() }(file)
		}
	}

	blipsCsvFile, _ := os.Create("import/blips.csv")
	writer := csv.NewWriter(blipsCsvFile)
	defer blipsCsvFile.Close()

	writer.Write([]string{"technology", "date", "suggestion" })
	for i := 0; i < len(blips); i++ {
		select {
		case scrapedFile := <-filesScraped:
			for _, blip := range scrapedFile.Entries {
				writer.Write([]string{scrapedFile.File.Title, blip["time"], blip["outcome"] })
			}
		}
	}
	writer.Flush()
}

Let's remove the files we just downloaded and give this version a try.

$ rm rawData/items/*

$ time go run cmd/parallel/main.go 

real	0m6.689s
user	0m2.544s
sys	0m0.904s

$ head -n10 import/blips.csv 
technology,date,suggestion
Zucchini,Oct 2012,Assess
Reactive Extensions for .Net,May 2013,Assess
Manual infrastructure management,Mar 2012,Hold
Manual infrastructure management,Jul 2011,Hold
JavaScript micro frameworks,Oct 2012,Trial
JavaScript micro frameworks,Mar 2012,Trial
NPM for all the things,Apr 2016,Trial
NPM for all the things,Nov 2015,Trial
PowerShell,Mar 2012,Trial

So we're down from 190 seconds to 7 seconds, pretty cool! One interesting thing is that the order of the values in the CSV file will be different since the goroutines won't necessarily come back in the same order that they were launched. We do end up with the same number of values:

$ wc -l import/blips.csv 
    1361 import/blips.csv

$ wc -l import/blipsSingle.csv 
    1361 import/blipsSingle.csv

And we can check that the contents are identical:

$ cat import/blipsSingle.csv  | sort > /tmp/blipsSingle.csv

$ cat import/blips.csv  | sort > /tmp/blips.csv

$ diff /tmp/blips.csv /tmp/blipsSingle.csv 


The code in this post is all on github. I'm sure I've made some mistakes/there are ways that this could be done better so do let me know in the comments or I'm @markhneedham on twitter.

The post Go: First attempt at channels appeared first on Mark Needham.

Categories: Programming

Go: cannot execute binary file: Exec format error

Fri, 12/23/2016 - 19:24

In an earlier blog post I mentioned that I’d been building an internal application to learn a bit of Go and I wanted to deploy it to AWS.

Since the application was only going to live for a couple of days I didn’t want to spend a long time build up anything fancy so my plan was just to build the executable, SSH it to my AWS instance, and then run it.

My initial (somewhat naive) approach was to just build the project on my Mac and upload and run it:

$ go build

$ scp myapp ubuntu@aws...

$ ssh ubuntu@aws...

$ ./myapp
-bash: ./myapp: cannot execute binary file: Exec format error

That didn’t go so well! By reading Ask Ubuntu and Dave Cheney’s blog post on cross compilation I realised that I just needed to set the appropriate environment variables before running go build.

The following did the trick:

env GOOS=linux GOARCH=amd64 GOARM=7 go build

And that’s it! I’m sure there’s more sophisticated ways of doing this that I’ll come to learn about but for now this worked for me.

The post Go: cannot execute binary file: Exec format error appeared first on Mark Needham.

Categories: Programming

Neo4j: Graphing the ThoughtWorks Technology Radar

Fri, 12/23/2016 - 18:40

For a bit of Christmas holiday fun I thought it’d be cool to create a graph of the different blips on the ThoughtWorks Technology Radar and how the recommendations have changed over time.

I wrote a script to extract each blip (e.g. .NET Core) and the recommendation made in each radar that it appeared in. I ended up with a CSV file:

|----------------------------------------------+----------+-------------|
|  technology                                  | date     | suggestion  |
|----------------------------------------------+----------+-------------|
|  AppHarbor                                   | Mar 2012 | Trial       |
|  Accumulate-only data                        | Nov 2015 | Assess      |
|  Accumulate-only data                        | May 2015 | Assess      |
|  Accumulate-only data                        | Jan 2015 | Assess      |
|  Buying solutions you can only afford one of | Mar 2012 | Hold        |
|----------------------------------------------+----------+-------------|

I then wrote a Cypher script to create the following graph model:

2016 12 23 16 52 08

WITH ["Hold", "Assess", "Trial", "Adopt"] AS positions
UNWIND RANGE (0, size(positions) - 2) AS index
WITH positions[index] AS pos1, positions[index + 1] AS pos2
MERGE (position1:Position {value: pos1})
MERGE (position2:Position {value: pos2})
MERGE (position1)-[:NEXT]->(position2);

load csv with headers from "file:///blips.csv" AS row
MATCH (position:Position {value:  row.suggestion })
MERGE (tech:Technology {name:  row.technology })
MERGE (date:Date {value: row.date})
MERGE (recommendation:Recommendation {
  id: tech.name + "_" + date.value + "_" + position.value})
MERGE (recommendation)-[:ON_DATE]->(date)
MERGE (recommendation)-[:POSITION]->(position)
MERGE (recommendation)-[:TECHNOLOGY]->(tech);

match (date:Date)
SET date.timestamp = apoc.date.parse(date.value, "ms", "MMM yyyy");

MATCH (date:Date)
WITH date
ORDER BY date.timestamp
WITH COLLECT(date) AS dates
UNWIND range(0, size(dates)-2) AS index
WITH dates[index] as month1, dates[index+1] AS month2
MERGE (month1)-[:NEXT]->(month2);

MATCH (tech)<-[:TECHNOLOGY]-(reco:Recommendation)-[:ON_DATE]->(date)
WITH tech, reco, date
ORDER BY tech.name, date.timestamp
WITH tech, COLLECT(reco) AS recos
UNWIND range(0, size(recos)-2) AS index
WITH recos[index] AS reco1, recos[index+1] AS reco2
MERGE (reco1)-[:NEXT]->(reco2);

Note that I installed the APOC procedures library so that I could convert the string representation of a date into a timestamp using the apoc.date.parse function. The blips.csv file needs to go in the import directory of Neo4j.

Now we’re reading to write some queries.

The Technology Radar has 4 positions that can be taken for a given technology: Hold, Assess, Trial, and Adopt:

  • Hold: Process with Caution
  • Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
  • Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
  • Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.

I was curious whether there had ever been a technology where the advice was initially to ‘Hold’ but had later changed to ‘Assess’. I wrote the following query to find out:

MATCH (pos1:Position {value:"Hold"})<-[:POSITION]-(reco)-[:TECHNOLOGY]->(tech),
      (pos2:Position {value:"Assess"})<-[:POSITION]-(otherReco)-[:TECHNOLOGY]->(tech),
      (reco)-[:ON_DATE]->(recoDate),
      (otherReco)-[:ON_DATE]->(otherRecoDate)
WHERE (reco)-[:NEXT]->(otherReco)
RETURN tech.name AS technology, otherRecoDate.value AS dateOfChange;

╒════════════╀══════════════╕
β”‚"technology"β”‚"dateOfChange"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•ͺ══════════════║
β”‚"Azure"     β”‚"Aug 2010"    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Only Azure! The page doesn’t have any explanation for the initial ‘Hold’ advice in April 2010 which was presumably just before ‘the cloud’ became prominent. What about the other way around? Are there any technologies where the suggestion was initially to ‘Assess’ but later to ‘Hold’?

MATCH (pos1:Position {value:"Assess"})<-[:POSITION]-(reco)-[:TECHNOLOGY]->(tech),
      (pos2:Position {value:"Hold"})<-[:POSITION]-(otherReco)-[:TECHNOLOGY]->(tech),
      (reco)-[:ON_DATE]->(recoDate),
      (otherReco)-[:ON_DATE]->(otherRecoDate)
WHERE (reco)-[:NEXT]->(otherReco)
RETURN tech.name AS technology, otherRecoDate.value AS dateOfChange;

╒═══════════════════════════════════╀══════════════╕
β”‚"technology"                       β”‚"dateOfChange"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ══════════════║
β”‚"RIA"                              β”‚"Apr 2010"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Backbone.js"                      β”‚"Oct 2012"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Pace-layered Application Strategy"β”‚"Nov 2015"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"SPDY"                             β”‚"May 2015"    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"AngularJS"                        β”‚"Nov 2016"    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

A couple of these are Javascript libraries/frameworks so presumably the advice is now to use React instead. Let’s check:

MATCH (t:Technology)<-[:TECHNOLOGY]-(reco)-[:ON_DATE]->(date), (reco)-[:POSITION]->(pos)
WHERE t.name contains "React.js"
RETURN pos.value, date.value 
ORDER BY date.timestamp

╒═══════════╀════════════╕
β”‚"pos.value"β”‚"date.value"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•ͺ════════════║
β”‚"Assess"   β”‚"Jan 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"    β”‚"May 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"    β”‚"Nov 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"    β”‚"Apr 2016"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"    β”‚"Nov 2016"  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Ember is also popular:

MATCH (t:Technology)<-[:TECHNOLOGY]-(reco)-[:ON_DATE]->(date), (reco)-[:POSITION]->(pos)
WHERE t.name contains "Ember"
RETURN pos.value, date.value 
ORDER BY date.timestamp

╒═══════════╀════════════╕
β”‚"pos.value"β”‚"date.value"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•ͺ════════════║
β”‚"Assess"   β”‚"May 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Assess"   β”‚"Nov 2015"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"    β”‚"Apr 2016"  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"    β”‚"Nov 2016"  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Let’s go on a different tangent and look at how many technologies were introduced in the most recent radar?

MATCH (date:Date {value: "Nov 2016"})<-[:ON_DATE]-(reco)
WHERE NOT (reco)<-[:NEXT]-()
RETURN COUNT(*) 

╒══════════╕
β”‚"COUNT(*)"β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•‘
β”‚"45"      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Wow, 45 new things! How were they spread across the different positions?

MATCH (date:Date {value: "Nov 2016"})<-[:ON_DATE]-(reco)-[:TECHNOLOGY]->(tech), 
      (reco)-[:POSITION]->(position)
WHERE NOT (reco)<-[:NEXT]-()
WITH position, COUNT(*) AS count, COLLECT(tech.name) AS technologies
ORDER BY LENGTH((position)-[:NEXT*]->()) DESC
RETURN position.value, count, technologies

╒════════════════╀═══════╀══════════════════════════════════════════════╕
β”‚"position.value"β”‚"count"β”‚"technologies"                                β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═══════β•ͺ══════════════════════════════════════════════║
β”‚"Hold"          β”‚"1"    β”‚["Anemic REST"]                               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Assess"        β”‚"28"   β”‚["Nuance Mix","Micro frontends","Three.js","Scβ”‚
β”‚                β”‚       β”‚ikit-learn","WebRTC","ReSwift","Vue.js","Electβ”‚
β”‚                β”‚       β”‚ron","Container security scanning","wit.ai","Dβ”‚
β”‚                β”‚       β”‚ifferential privacy","Rapidoid","OpenVR","AWS β”‚
β”‚                β”‚       β”‚Application Load Balancer","Tarantool","IndiaSβ”‚
β”‚                β”‚       β”‚tack","Ethereum","axios","Bottled Water","Cassβ”‚
β”‚                β”‚       β”‚andra carefully","ECMAScript 2017","FBSnapshotβ”‚
β”‚                β”‚       β”‚Testcase","Client-directed query","JuMP","Clojβ”‚
β”‚                β”‚       β”‚ure.spec","HoloLens","Android-x86","Physical Wβ”‚
β”‚                β”‚       β”‚eb"]                                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Trial"         β”‚"13"   β”‚["tmate","Lightweight Architecture Decision Reβ”‚
β”‚                β”‚       β”‚cords","APIs as a product","JSONassert","Unityβ”‚
β”‚                β”‚       β”‚ beyond gaming","Galen","Enzyme","Quick and Niβ”‚
β”‚                β”‚       β”‚mble","Talisman","fastlane","Auth0","Pa11y","Pβ”‚
β”‚                β”‚       β”‚hoenix"]                                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚"Adopt"         β”‚"3"    β”‚["Grafana","Babel","Pipelines as code"]       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Lots of new things to explore over the holidays! The CSV files, import script, and queries used in this post are all available on github if you want to play around with them.

The post Neo4j: Graphing the ThoughtWorks Technology Radar appeared first on Mark Needham.

Categories: Programming

Go: Templating with the Gin Web Framework

Fri, 12/23/2016 - 15:30

I spent a bit of time over the last week building a little internal web application using Go and the Gin Web Framework and it took me a while to get the hang of the templating language so I thought I’d write up some examples.

Before we get started, I’ve got my GOPATH set to the following path:

$ echo $GOPATH
/Users/markneedham/projects/gocode

And the project containing the examples sits inside the src directory:

$ pwd
/Users/markneedham/projects/gocode/src/github.com/mneedham/golang-gin-templating-demo

Let’s first install Gin:

$ go get gopkg.in/gin-gonic/gin.v1

It gets installed here:

$ ls -lh $GOPATH/src/gopkg.in
total 0
drwxr-xr-x   3 markneedham  staff   102B 23 Dec 10:55 gin-gonic

Now let’s create a main function to launch our web application:

demo.go

package main

import (
	"github.com/gin-gonic/gin"
	"net/http"
)

func main() {
	router := gin.Default()
	router.LoadHTMLGlob("templates/*")

	// our handlers will go here

	router.Run("0.0.0.0:9090")
}

We’re launching our application on port 9090 and the templates live in the templates directory which is located relative to the file containing the main function:

$ ls -lh
total 8
-rw-r--r--  1 markneedham  staff   570B 23 Dec 13:34 demo.go
drwxr-xr-x  4 markneedham  staff   136B 23 Dec 13:34 templates
Arrays

Let’s create a route which will display the values of an array in an unordered list:

	router.GET("/array", func(c *gin.Context) {
		var values []int
		for i := 0; i < 5; i++ {
			values = append(values, i)
		}

		c.HTML(http.StatusOK, "array.tmpl", gin.H{"values": values})
	})
    {{ range .values }}
  • {{ . }}
  • {{ end }}

And now we'll cURL our application to see what we get back:

$ curl http://localhost:9090/array
  • 0
  • 1
  • 2
  • 3
  • 4

What about if we have an array of structs instead of just strings?

import "strconv"

type Foo struct {
	value1 int
	value2 string
}

	router.GET("/arrayStruct", func(c *gin.Context) {
		var values []Foo
		for i := 0; i < 5; i++ {
			values = append(values, Foo{Value1: i, Value2: "value " + strconv.Itoa(i)})
		}

		c.HTML(http.StatusOK, "arrayStruct.tmpl", gin.H{"values": values})
	})

    {{ range .values }}
  • {{ .Value1 }} -> {{ .Value2 }}
  • {{ end }}

cURL time:

$ curl http://localhost:9090/arrayStruct
  • 0 -> value 0
  • 1 -> value 1
  • 2 -> value 2
  • 3 -> value 3
  • 4 -> value 4
Maps

Now let's do the same for maps.

	router.GET("/map", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"

		c.HTML(http.StatusOK, "map.tmpl", gin.H{"myMap": values})
	})
    {{ range .myMap }}
  • {{ . }}
  • {{ end }}

And cURL it:

$ curl http://localhost:9090/map
  • Go
  • 1.7.4

What if we want to see the keys as well?

	router.GET("/mapKeys", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"

		c.HTML(http.StatusOK, "mapKeys.tmpl", gin.H{"myMap": values})
	})
    {{ range $key, $value := .myMap }}
  • {{ $key }} -> {{ $value }}
  • {{ end }}
$ curl http://localhost:9090/mapKeys
  • language -> Go
  • version -> 1.7.4

And finally, what if we want to select specific values from the map?

	router.GET("/mapSelectKeys", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"

		c.HTML(http.StatusOK, "mapSelectKeys.tmpl", gin.H{"myMap": values})
	})
  • Language: {{ .myMap.language }}
  • Version: {{ .myMap.version }}
$ curl http://localhost:9090/mapSelectKeys
  • Language: Go
  • Version: 1.7.4

I've found the Hugo Go Template Primer helpful for figuring this out so that's a good reference if you get stuck. You can find a go file containing all the examples on github if you want to use that as a starting point.

The post Go: Templating with the Gin Web Framework appeared first on Mark Needham.

Categories: Programming

Docker: Unknown – Unable to query docker version: x509: certificate is valid for

Wed, 12/21/2016 - 08:11

I was playing around with Docker locally and somehow ended up with this error when I tried to list my docker machines:

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
default   -        virtualbox   Running   tcp://192.168.99.101:2376           Unknown   Unable to query docker version: Get https://192.168.99.101:2376/v1.15/version: x509: certificate is valid for 192.168.99.100, not 192.168.99.101

My Google Fu was weak I couldn’t find any suggestions for what this might mean so I tried shutting it down and starting it again!

On the restart I actually got some helpful advice:

$ docker-machine stop
Stopping "default"...
Machine "default" was stopped.
$ docker-machine start
Starting "default"...
(default) Check network to re-create if needed...
(default) Waiting for an IP...
Machine "default" was started.
Waiting for SSH to be available...
Detecting the provisioner...
Started machines may have new IP addresses. You may need to re-run the `docker-machine env` command.

So I tried that:

$ docker-machine env
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.101:2376": x509: certificate is valid for 192.168.99.100, not 192.168.99.101
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

And then regenerates my certificates:

$ docker-machine regenerate-certs
Regenerate TLS machine certs?  Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...

And now everything is happy again!

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER   ERRORS
default   -        virtualbox   Running   tcp://192.168.99.101:2376           v1.9.0

The post Docker: Unknown – Unable to query docker version: x509: certificate is valid for appeared first on Mark Needham.

Categories: Programming