Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!
Software Development Blogs: Programming, Software Testing, Agile Project Management
As I mentioned in my previous blog post I’ve been hacking on a product taxonomy and I wanted to create a ‘CHILD’ relationship between a collection of categories.
For example, I had the following array and I wanted to transform it into an array of ‘Category, SubCategory’ pairs:
taxonomy = ["Cat", "SubCat", "SubSubCat"]
# I wanted this to become [("Cat", "SubCat"), ("SubCat", "SubSubCat")]
In order to do this we need to zip the first 2 items with the last 2, which I found reasonably easy to do using Python:
>>> zip(taxonomy[:-1], taxonomy[1:])
[('Cat', 'SubCat'), ('SubCat', 'SubSubCat')]
Here we’re using the Python list slicing notation to get all but the last item of ‘taxonomy’ and then all but the first item of ‘taxonomy’, and zipping them together.
I wanted to achieve that effect in Ruby though because my import job was written in that!
We can’t achieve the open ended slicing as far as I can tell so the following gives us an error:
> taxonomy[..-1]
SyntaxError: (irb):10: syntax error, unexpected tDOT2, expecting ']'
taxonomy[..-1]
^
from /Users/markhneedham/.rbenv/versions/1.9.3-p327/bin/irb:12:in `<main>'
The way negative indexing works is a bit different: Ruby’s ‘..’ ranges include the end index, so to remove the last item of the array we use ‘-2’ rather than ‘-1’:
> taxonomy[0..-2].zip(taxonomy[1..-1])
=> [["Cat", "SubCat"], ["SubCat", "SubSubCat"]]
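The Python session above looks like Python 2, where zip returns a list. As a minimal sketch for Python 3, where zip returns a lazy iterator, the same pairing could be written as follows. The function name ‘parent_child_pairs’ is made up for this example.

```python
# Pair each category with its direct child by zipping the list
# against a copy of itself shifted by one position.
def parent_child_pairs(categories):
    # categories[:-1] drops the last item, categories[1:] drops the first.
    # list() is needed on Python 3 because zip returns an iterator there.
    return list(zip(categories[:-1], categories[1:]))


taxonomy = ["Cat", "SubCat", "SubSubCat"]
pairs = parent_child_pairs(taxonomy)
print(pairs)
```

A list with a single category has no parent/child pair, so it yields an empty list rather than an error.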
I’ve been playing around with modelling a product taxonomy and one thing that I wanted to do was find out the full path where a product sits under the tree.
I created a simple data set to show the problem:
CREATE (cat { name: "Cat" })
CREATE (subcat1 { name: "SubCat1" })
CREATE (subcat2 { name: "SubCat2" })
CREATE (subsubcat1 { name: "SubSubCat1" })
CREATE (product1 { name: "Product1" })
CREATE (cat)-[:CHILD]->(subcat1)-[:CHILD]->(subsubcat1)
CREATE (product1)-[:HAS_CATEGORY]->(subsubcat1)
I wanted to write a query which would return ‘product1’ and the tree ‘Cat -> SubCat1 -> SubSubCat1’ and initially wrote the following query:
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category,
taxonomy=category<-[:CHILD*1..]-parent
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)
which returns:
==> +--------------------------------------------------------------------+
==> | product | EXTRACT(n IN NODES(taxonomy): n.name) |
==> +--------------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1"] |
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1","Cat"] |
==> +--------------------------------------------------------------------+
==> 2 rows
I didn’t want to return the first row since that isn’t the full tree, and Andres suggested that looking for nodes which didn’t have any incoming ‘CHILD’ relationships would help me do that:
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category,
taxonomy=category<-[:CHILD*1..]-parent
WHERE NOT parent<-[:CHILD]-()
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)
==> +--------------------------------------------------------------------+
==> | product | EXTRACT(n IN NODES(taxonomy): n.name) |
==> +--------------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1","Cat"] |
==> +--------------------------------------------------------------------+
==> 1 row
If we want to reverse the taxonomy so it’s in the right order we can follow Wes Freeman’s advice from the following Stack Overflow thread:
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent
WHERE NOT parent<-[:CHILD]-()
RETURN product,
REDUCE(acc=[], cat IN EXTRACT(n IN NODES(taxonomy): n.name): cat + acc) AS taxonomy
==> +-------------------------------------------------------------+
==> | product | taxonomy |
==> +-------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["Cat","SubCat1","SubSubCat1"] |
==> +-------------------------------------------------------------+
==> 1 row
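The REDUCE trick works because each name is prepended to the accumulator rather than appended, so the last name seen ends up at the front. Here is a rough Python equivalent of that one expression, just to show the mechanics. It is an illustration, not something neo4j runs.

```python
from functools import reduce

# Names in the order the path returns them: leaf category first, root last.
names = ["SubSubCat1", "SubCat1", "Cat"]

# Mirrors REDUCE(acc=[], cat IN ...: cat + acc):
# start from an empty list and put each name in front of what we have so far.
taxonomy = reduce(lambda acc, cat: [cat] + acc, names, [])
print(taxonomy)
```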
Chris and I were looking at the neo4j log files of a client earlier in the week and wanted to do some processing of the file so we could ask the client to send us some further information.
The log file was over 10,000 lines long but the bit of the file we were interested in was only a few hundred lines.
I usually use Vim and the ‘:set number’ when I want to refer to line numbers in a file but Chris showed me that we can achieve the same thing with e.g. ‘less -N data/log/neo4j.0.0.log’.
We can then operate on say lines 10-100 by passing the ‘-n’ flag to sed:
-n By default, each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.
$ sed -n '10,15p' data/log/neo4j.0.0.log
INFO: Enabling HTTPS on port [7473]
May 19, 2013 11:11:52 AM org.neo4j.server.logging.Logger log
INFO: No SSL certificate found, generating a self-signed certificate..
May 19, 2013 11:11:53 AM org.neo4j.server.logging.Logger log
INFO: Mounted discovery module at [/]
May 19, 2013 11:11:53 AM org.neo4j.server.logging.Logger log
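If the later processing is going to happen in a script anyway, the same line range can be picked out without sed. This is a minimal Python sketch of the idea. The function name ‘select_lines’ is made up for this example, and it takes 1-based, inclusive line numbers like sed does.

```python
from itertools import islice


def select_lines(lines, first, last):
    # sed's '10,15p' is 1-based and inclusive, whereas islice is
    # 0-based and end-exclusive, hence the first - 1.
    return list(islice(lines, first - 1, last))


# Usage with a real file (the path is the one from the sed example):
# with open("data/log/neo4j.0.0.log") as log:
#     for line in select_lines(log, 10, 15):
#         print(line, end="")

sample = ["line %d\n" % i for i in range(1, 21)]
selected = select_lines(sample, 10, 15)
print("".join(selected), end="")
```

Because islice stops reading once it reaches the last requested line, this doesn’t need to load the whole 10,000 line file.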
We then used a combination of grep, awk and sort to work out which log files we needed.
I’ve written a couple of posts over the last few months about my experiences with A/B testing and one conversation we often used to have was around user experience vs conversion rate.
Once you start running an A/B test it encourages you to focus more on the conversion rate of users in different parts of the flow and your inclination is to make changes that increase that conversion rate.
Another one of our drivers is to provide the best user experience that we can to our customers and since sometimes this means that the best thing for them is not to switch it seems that these two must be in conflict.
I found it particularly interesting seeing how the conversion rate could be impacted by the way that information was displayed to a user.
This was an idea that I first came across when reading about how the Obama campaign used A/B testing where they noticed big changes in conversion rates by making small tweaks to sentences and imagery.
Our goal from a user experience perspective was to put all the information in front of the user so that they could make an informed choice about what to do.
Initially we made the negative features of the plans very prominent and had them in a large font which led to a drop in conversion.
We assumed that people were now giving more importance to the negative features than was warranted e.g. some plans had a cancellation fee but it typically only accounted for 5% of the saving they’d make by switching to the plan.
When the product is a bit more complicated we could argue that we improve the user experience by helping the user to make an appropriate choice.
On a website the way that we do this is by how we display information by changing the font size, font weight, positioning and a variety of other things.
It’s an interesting balance to find between the two drivers, but if we veer towards conversion at all costs then, although we’ll get a higher conversion rate, in the long term we’ll have some frustrated customers who won’t use our website again.
If we look at it that way then the two drivers don’t seem so opposed to each other.
In my time playing around with neo4j I’ve run into a problem a few times where I executed a query using the web console (usually accessible @ http://localhost:7474/webadmin/#/console/) and have got absolutely no response.
I noticed a similar thing today when Rickard and I were having a look at why a Lucene index query wasn’t behaving as we expected.
I set up some data in a neo4j database using neography with the following code:
require 'neography'
@neo = Neography::Rest.new
@neo.create_node_index("Id_Index", "exact", "lucene")
node1 = @neo.create_node("Hour" => 1, "name" => "Max")
node2 = @neo.create_node("Hour" => 2, "name" => "Mark")
node3 = @neo.create_node("Hour" => 3, "name" => "Rickard")
@neo.add_node_to_index("Id_Index", "Hour", 1, node1)
@neo.add_node_to_index("Id_Index", "Hour", 2, node2)
@neo.add_node_to_index("Id_Index", "Hour", 3, node3)
I then ran the following query which I was expecting to return all the nodes:
start hour=node:Id_Index("Hour:[00 TO 02] or Hour:[03 TO 05]") RETURN hour
Instead it returned nothing and I couldn’t see anything being logged either.
Rickard pointed out that this was because the exception is only returned to the API caller, and that it would be better to run the query from the Data Browser, which is typically accessible from http://localhost:7474/webadmin/#/data/search/
If we run the query from there then we can see what’s going wrong:
BadInputException
StackTrace:
org.neo4j.server.rest.repr.RepresentationExceptionHandlingIterable.exceptionOnHasNext(RepresentationExceptionHandlingIterable.java:50)
org.neo4j.helpers.collection.ExceptionHandlingIterable$1.hasNext(ExceptionHandlingIterable.java:60)
org.neo4j.helpers.collection.IteratorWrapper.hasNext(IteratorWrapper.java:42)
org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:58)
org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)
org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)
org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:57)
org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:42)
org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:179)
org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:131)
org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:117)
org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:55)
org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:94)
java.lang.reflect.Method.invoke(Method.java:597)
There seemed to be some strangeness going on with how Lucene handles the query when a default search field isn’t provided but we noticed that it behaved as expected if we didn’t use an OR since Lucene has an implicit OR between statements anyway.
start hour=node:Id_Index("Hour:[00 TO 02] Hour:[03 TO 05]") RETURN hour
Either way, the lesson for me was if the console isn’t giving a result run the query in the data browser to work out what’s going wrong!
Nate Silver is famous for having correctly predicted the winner of all 50 states in the 2012 United States elections and Sid recommended his book so I could learn more about statistics for the A/B tests that we were running.
I thought the book was a really good introduction to applied statistics and by using real life examples which most people would be able to relate to it makes a potentially dull subject interesting.
Reasonably early on the author points out that there’s a difference between making a prediction and making a forecast:
The book mainly focuses on the latter.
We then move onto quite an interesting section about over fitting which is where we mistake noise for signal in our data.
I first came across this term when Jen and I were working through one of the Kaggle problems and were using a random forest of deliberately over fitted Decision Trees to do digit recognition.
It’s not a problem when we combine lots of decision trees together and use a majority wins algorithm to make our prediction but if we use just one of them its predictions on any new data will be completely wrong.
Later on in the book he points out that a lot of conspiracy theories come when we look at data retrospectively and can easily detect signal from noise in data when at the time it was much more difficult.
He also points out that sometimes there isn’t actually any signal, it’s all noise, and we can fall into the trap of looking for something that isn’t there. I think this ‘noise’ is what we’d refer to as random variation in the context of an A/B test.
Silver also encourages us to make sure that we understand the theory behind any inference we make:
Statistical inferences are much stronger when backed up by theory or at least some deeper thinking about their root causes.
When we were running A/B tests Sid encouraged people to think whether a theory about why conversion had changed made logical sense before assuming it was true which I think covers similar ground.
A big chunk of the book covers Bayes’ theorem and how often when we’re making forecasts we have prior beliefs which it forces us to make explicit.
For example there is a section which talks about the probability a lady is being cheated on given that she’s found some underwear that she doesn’t recognise in her house.
In order to work out the probability she’s being cheated on we need to know the probability that she was being cheated on before she found the underwear. Silver suggests that since 4% of married partners cheat on their spouses that would be a good number to use.
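To make the mechanics concrete, here is a small sketch of the Bayes’ theorem calculation in Python. Only the 4% prior comes from the text above. The two likelihoods (how probable the underwear is if she is being cheated on, and if she isn’t) are numbers picked purely for illustration, so treat the result as an example rather than a real estimate.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    # Bayes' theorem: P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|not H)P(not H))
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


prior = 0.04                # from the text: 4% of married partners cheat
p_underwear_if_cheating = 0.5    # assumption for illustration only
p_underwear_if_faithful = 0.05   # assumption for illustration only

p_cheating = posterior(prior, p_underwear_if_cheating, p_underwear_if_faithful)
print(round(p_cheating, 3))
```

With these inputs the posterior comes out at roughly 29%, which shows how much a low prior holds the answer down even when the evidence looks damning.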
He then goes on to show multiple other problems throughout the book that we can apply Bayes’ theorem to.
Some other interesting things I picked up are that if we’re good at forecasting then being given more information should make our forecast better and that when we don’t have any special information we’re better off following the opinion of the crowd.
Silver also showed a clever trick for inferring data points on a data set which follows a power law i.e. the long tail distribution where there are very few massive events but lots of really small ones.
We have a power law distribution when modelling the number of terrorist attacks vs number of fatalities but if we change both scales to be logarithmic we can come up with a probability of how likely more deadly attacks are.
There is then some discussion of how we can make changes in the way that we treat terrorism to try and impact the shape of the chart e.g. in Israel Silver suggests that they really want to avoid a very deadly attack but at the expense of there being more smaller attacks.
A lot of the book is spent discussing weather/earthquake forecasting which is very interesting to read about but I couldn’t quite see a link back to the software context.
Overall though I found it an interesting read although there are probably a few places that you can skim over the detail and still get the gist of what he’s saying.
I’ve been using Sublime a bit recently and one thing I wanted to do was put neo4j cypher queries into files with arbitrary extensions and have them recognised as cypher files every time I open them.
I’m using the cypher Sublime plugin to get the syntax highlighting but since I’ve got my cypher in a .haml file it only remembers that it should have cypher highlighting as long as the file is open.
As soon as I close and then re-open the file it goes back to being highlighted as HAML.
I initially thought that the way around this would be to write a plugin which kept track of files that you’d manually assigned a syntax to but then I came across the ApplySyntax plugin which seems even better.
ApplySyntax allows you to assign syntaxes to files based on regular expression matching on the file name or on the first line of the file.
At the moment, the easiest way to detect that a file is a cypher query is that the first line will begin with ‘START’ so I wrote the following in my user settings file:
~/Library/Application Support/Sublime Text 2/Packages/User/ApplySyntax.sublime-settings
{
    "reraise_exceptions": false,
    "new_file_syntax": false,
    "syntaxes": [
        {
            "name": "Cypher",
            "rules": [
                {"first_line": "^START"}
            ]
        }
    ]
}
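To sanity check what a rule like "^START" will and won’t catch, the pattern can be tried out in Python. This is only a sketch of the matching, not how ApplySyntax is implemented internally, and the function name ‘looks_like_cypher’ is made up for this example.

```python
import re


def looks_like_cypher(first_line, pattern=r"^START"):
    # The pattern is anchored to the start of the line and is case sensitive.
    return re.search(pattern, first_line) is not None


print(looks_like_cypher('START product=node:node_auto_index(name="Product1")'))
print(looks_like_cypher("start hour=node:Id_Index(...)"))
print(looks_like_cypher("%html"))
```

One thing this shows is that a query written with a lowercase ‘start’ would not match the pattern as written, so the rule would need adjusting if queries aren’t always upper case.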
ApplySyntax is a pretty neat plugin, worth having a look if you have this problem to solve!
Thibaut and I spent the best part of the last couple of days trying to diagnose a problem we were having trying to make a POST request using rest-client to one of our services.
We have nginx fronting the application server so the request passes through there first:
The problem we were having was that the request was timing out on the client side before it had been processed and the request wasn’t reaching the application server.
We initially thought there might be a problem with our nginx configuration because we don’t have many POST requests with largish (40kb) payloads so we initially tried tweaking the proxy buffer size.
It was a bit of a long shot because changing that setting only reduces the likelihood that nginx writes the request body to disc and then loads it later which shouldn’t impact performance that much.
The next thing we tried was replicating the request using cURL with a smaller payload which worked fine. cURL had no problem with the bigger payload either.
We therefore thought there must be a difference in the request headers being sent by rest-client and our initial investigation suggested that it might be to do with the ‘Content-Length‘ header.
There was a 1 byte difference in the value being sent by cURL and the one being sent by rest-client which was to do with the last character of the payload being a 0A (linefeed) character.
We changed the ‘Content-Length’ header on our cURL request to match that of the rest-client request (i.e. 1 byte too large) and were able to replicate the timeout problem.
At this stage we thought that calling ‘strip’ on the body of our rest-client request would solve the problem as the ‘Content-Length’ header would now be set to the correct value. It did set the ‘Content-Length’ header properly but unfortunately didn’t get rid of the timeout.
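The one byte difference is easy to reproduce, since ‘Content-Length’ is the number of bytes in the body and a trailing linefeed is one extra byte. The following is a minimal Python sketch with a made up payload. It is not the body we were actually sending.

```python
def content_length(body):
    # Content-Length counts bytes, not characters, so encode first.
    return len(body.encode("utf-8"))


payload = '{"name": "Max"}'        # hypothetical body
with_linefeed = payload + "\n"     # same body with a trailing 0A byte

print(content_length(payload))
print(content_length(with_linefeed))
print(content_length(with_linefeed.strip()))
```

Stripping the body brings the count back to the value for the payload without the linefeed, which matches what we saw with the header after calling ‘strip’.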
Our next step was to check whether or not we could get any request to work from rest-client so we tried using a smaller payload which worked fine.
At this stage Jason heard us discussing what to do next and said that he’d come across it earlier and that upgrading our Ruby version from ‘1.9.3p0’ would solve all our woes.
That Ruby version is a couple of years old and most of our servers are running ‘1.9.3p392’ but somehow this one had slipped through the net.
We spun up a new server with the newer version of Ruby installed and it did indeed fix the problem.
However, we were curious what the fix was and had a look at the change log of the first patch release after ‘1.9.3p0’. We noticed the following which seemed relevant:
Tue May 31 17:03:24 2011 Hiroshi Nakamura
* lib/net/http.rb, lib/net/protocol.rb: Allow to configure to wait
server returning '100 continue' response before sending HTTP request
body. See NEWS for more detail. See #3622.
Original patch is made by Eric Hodel.
* test/net/http/test_http.rb: test it.
* NEWS: Add new feature.
One thing we noticed from looking at the requests with ngrep was that cURL was setting the ‘Expect: 100-continue’ request header and rest-client wasn’t.
When the payload size was small nginx didn’t seem to send a ‘100 Continue’ response which was presumably why we weren’t seeing a problem with the small payloads.
I wasn’t sure how to go about finding out exactly what was going wrong but given how long it took us to get to this point I thought I’d summarise what we tried and see if anyone could explain it to me.
So if you’ve come across this problem (probably 2 years ago!) it’d be cool to know exactly what the problem was.
When I first started working at uSwitch Sid installed a couple of ‘productivity applications’ on my Mac which I’ve found pretty useful but from talking to others I realised they aren’t known/being used by everyone.
Alfred
Alfred is a Quicksilver replacement which allows you to quickly open applications, find files, search Google and more. Even though we’re not using half of its features it’s still proved to be useful.
I quite like the calculator feature which we’ve been using for ad hoc calculations like working out how much free memory there was on a server or the conversion rate on part of an A/B test.
Moom
The other application is Moom which allows you to move/resize windows.
I didn’t see the point when I first saw it but it’s actually really useful when you’re working on a big monitor and want to put say the terminal alongside the browser.
We have the following shortcuts set up:
That allows us to type ‘Ctrl + Space’ to make the window fill the left hand side of the screen, ‘Alt + Space’ to make it fill the right hand side of the screen and ‘Alt + Ctrl + Space’ to fill the whole screen.
You can also set up shortcuts to allow you to move a window between displays or to rearrange the windows based on certain events.
Highly recommended!
If anyone knows any other cool tools like this I’d love to hear about them.
I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received.
1 – Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels.
To refresh, this is the sub graph that we’ll need to look at to work it out:
I started off with the following query which traverses out from each match, finds the players who were sent off in the match and then groups the sendings off by the team they were playing for:
START game = node:matches('match_id:*')
MATCH game<-[:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards
LIMIT 5
When we run this we get the following results:
+------------------------------+
| team.name         | redCards |
+------------------------------+
| "Sunderland"      | 1        |
| "West Ham United" | 1        |
| "Norwich City"    | 1        |
| "Reading"         | 1        |
| "Liverpool"       | 2        |
+------------------------------+
5 rows
The problem we have here is that it hasn’t returned Manchester United because they haven’t yet received any red cards and therefore none of their players match the ‘sent_off_in’ relationship.
I ran into something similar in a post I wrote about a month ago where I was working out which day of the week players scored on.
The first step towards getting Manchester United to return with a count of 0 is to make the ‘sent_off_in’ relationship optional.
However, that on its own isn’t enough because it now returns a count of all the player performances for each team:
START game = node:matches('match_id:*')
MATCH game<-[?:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards ASC
LIMIT 5
+-----------------------------+
| team.name        | redCards |
+-----------------------------+
| "Chelsea"        | 448      |
| "Wigan Athletic" | 459      |
| "Fulham"         | 460      |
| "Liverpool"      | 466      |
| "Everton"        | 467      |
+-----------------------------+
5 rows
Instead what we need to do is collect up all the ‘sent_off_in’ relationships and sum them up.
We can use the COLLECT function to do that and the neat thing about COLLECT is that it doesn’t bother collecting the empty relationships so we end up with exactly what we need:
START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, COLLECT(r) AS redCards
LIMIT 5
+-----------------------------------------------------------------------------------------------------+
| team.name | redCards |
+-----------------------------------------------------------------------------------------------------+
| "Wigan Athletic" | [:sent_off_in[26443] {},:sent_off_in[37785] {}] |
| "Everton" | [:sent_off_in[6795] {minute:61},:sent_off_in[21735] {},:sent_off_in[34594] {}] |
| "Newcastle United" | [:sent_off_in[434] {minute:75},:sent_off_in[32389] {},:sent_off_in[34915] {}] |
| "Southampton" | [:sent_off_in[49393] {minute:70},:sent_off_in[49392] {minute:82}] |
| "West Ham United" | [:sent_off_in[21734] {minute:67}] |
+-----------------------------------------------------------------------------------------------------+
5 rows
We then just need to call the LENGTH function to work out how many red cards there are in each collection and then we’re done:
START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, LENGTH(COLLECT(r)) AS redCards
ORDER BY redCards
LIMIT 5
+--------------------------------+
| team.name           | redCards |
+--------------------------------+
| "Manchester United" | 0        |
| "West Ham United"   | 1        |
| "Sunderland"        | 1        |
| "Norwich City"      | 1        |
| "Reading"           | 1        |
+--------------------------------+
5 rows
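The reason LENGTH(COLLECT(r)) gives a 0 where COUNT(game) didn’t is that every player performance still produces a row, but only the rows with an actual ‘sent_off_in’ relationship get collected. Here is a small Python sketch of the same counting logic. The team names and rows are made up for the example and the function name ‘red_cards_per_team’ is hypothetical.

```python
from collections import defaultdict


def red_cards_per_team(rows):
    # Each row is (team, sent_off), where sent_off is None when the
    # optional relationship didn't match for that player performance.
    counts = defaultdict(int)
    for team, sent_off in rows:
        # Adding 0 still registers the team, so teams with no red cards
        # show up with a count of 0 instead of disappearing.
        counts[team] += 0 if sent_off is None else 1
    return dict(counts)


rows = [
    ("Team A", None), ("Team A", None),   # no red cards at all
    ("Team B", "r1"), ("Team B", None),   # one red card
    ("Team C", "r2"), ("Team C", "r3"),   # two red cards
]
red_cards = red_cards_per_team(rows)
for team, count in sorted(red_cards.items(), key=lambda item: item[1]):
    print(team, count)
```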
A few months ago I wrote about my initial experiences with A/B testing and since then we’ve been working on another one and learnt some things around reporting on these types of tests that I thought were interesting.
Reporting as a first class concern
One thing we changed from our previous test after a suggestion by Mike was to start treating the reporting of data related to the test as a first class citizen.
To do this we created an end point which the main application could send POST requests to in order to record page views and various other information about users.
On our previous test we’d derived the various conversion rates from our main transactional data store but it was really slow and painful because the way we structure data in there is optimised for a completely different use case.
Having just the data we want to report on in a separate data store has massively reduced the time spent generating reports.
However, one thing that we learnt about this approach is that you need to spend some time thinking about what data is going to be needed up front.
If you don’t then it will have to be added later on and the reporting on that metric won’t cover the whole test duration.
Drilling down to get insight
In the first test we ran we only really looked at conversion at quite a high level which is good for getting an overview but doesn’t give much insight into what’s going on.
For this test we started off with higher level metrics but a few days in became curious about what was going on between two of the pages and so created a report that segmented users based on an action they’d taken on the first page.
This allowed us to rule out a theory about a change in conversion which we had initially thought was down to a change we’d made but actually proved to be because of a change in an external factor.
The frustrating part of drilling down into the data is that you don’t really know what it is you’re going to want to zoom in on so you have to write code for the specific scenario each time!
Detecting bugs
We generate browser specific metrics on each test that we run and while the conversion rate is generally similar between them there have been some times when there’s a big drop in one browser.
More often than not when we’ve drilled into this we’ve found that there was actually a JavaScript bug that we hadn’t detected and we can then go back and sort that out.
An alternative approach would be to have an automated JavaScript/Web Driver test suite which ran against each browser. We’ve effectively traded off the maintenance cost of that for what is usually a small period of inconvenience for some users.
A while ago I wrote a post about treating servers as cattle, not as pets in which I described an approach to managing virtual machines at uSwitch whereby we frequently spin up new ones and delete the existing ones.
I’ve worked on teams previously where we’ve also talked about this mentality but ended up not doing it because it was difficult, usually for one of two reasons:
Martin Fowler wrote a post a couple of years ago where he said the following:
One of my favorite soundbites is: if it hurts, do it more often. It has the happy property of seeming nonsensical on the surface, but yielding some valuable meaning when you dig deeper
I think it applies in this context too and I have noticed that the more frequently we tear down and spin up new nodes the easier it becomes to do so.
Part of this is because there’s been less time for changes to have happened in package repositories but we are also more inclined to optimise things that we have to do frequently so the whole process is faster as well.
For example in one of our sets of machines we need to give one machine a specific tag so that when the application is deployed it sets up a bunch of cron jobs to run each evening.
Initially this was done manually and we were quite reluctant to ever tear down that machine but we’ve now got it all automated and it’s not a big deal anymore – it can be cattle just like the rest of them!
One neat rule of thumb Phil taught me is that if we make major changes to our infrastructure we should spin up some new machines to check that it still actually works.
If we don’t do this then when we actually need to spin up a new node because of a traffic spike or machine corruption problem it’s not going to work and we’re going to have to fix things in a much more stressful context.
For example we recently moved some repositories around in github and although it’s a fairly simple change spinning up new nodes helped us see all the places where we’d failed to make the appropriate change.
While I appreciate taking this approach is more time consuming in the short term I’d argue that if we automate as much of the pain as possible in the long run it will probably be beneficial.
Over the last year or so I’ve spent quite a bit of time working with puppet and one of the things that we had to decide when installing packages was whether or not to specify a particular version.
On the first project I worked on we didn’t bother and just let the package manager choose the most recent version.
Therefore if we were installing nginx the puppet code would read like this:
package { 'nginx':
  ensure => 'present',
}
We can see which version that would install by checking the version table for the package:
$ apt-cache policy nginx
nginx:
  Installed: (none)
  Candidate: 1:1.2.6-1~43~precise1
  Version table:
     1:1.2.6-1~43~precise1 0
        500 http://ppa.launchpad.net/brightbox/ruby-ng/ubuntu/ precise/main amd64 Packages
     1.4.0-1~precise 0
        500 http://nginx.org/packages/ubuntu/ precise/nginx amd64 Packages
     1.1.19-1ubuntu0.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise-updates/universe amd64 Packages
     1.1.19-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/universe amd64 Packages
In this case if we don’t specify a version the Brightbox ‘1:1.2.6-1~43~precise1’ version will be installed.
Running dpkg with the ‘compare-versions’ flag shows us that this version is considered higher than the nginx.org one:
$ dpkg --compare-versions '1:1.2.6-1~43~precise1' gt '1.4.0-1~precise' ; echo $?
0
From what I understand you can pin versions higher up the list by associating a higher number with them but given that all these versions are set to ‘500’ I’m not sure how it decides on the order!
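One likely explanation for the ordering is the ‘1:’ prefix on the Brightbox version. Debian version strings can start with an optional epoch followed by a colon, a missing epoch counts as 0, and epochs are compared before anything else in the version. When the priorities are equal the highest version wins, so an epoch of 1 beats 1.4.0 even though 1.2.6 looks older. Here is a minimal Python sketch that only pulls out the epoch. It is not a full implementation of the Debian comparison rules, and the function name ‘split_epoch’ is made up for this example.

```python
def split_epoch(version):
    # Debian versions look like [epoch:]upstream_version[-debian_revision].
    # If there is no numeric epoch before the first colon, it is treated as 0.
    epoch, separator, rest = version.partition(":")
    if separator and epoch.isdigit():
        return int(epoch), rest
    return 0, version


brightbox = split_epoch("1:1.2.6-1~43~precise1")
nginx_org = split_epoch("1.4.0-1~precise")
print(brightbox)
print(nginx_org)

# The epoch is compared first, so the rest of the string only matters
# when both epochs are equal.
print(brightbox[0] > nginx_org[0])
```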
The problem with not specifying a version is that when a new version becomes available the next time puppet runs it will automatically upgrade the version for us.
Most of the time this isn’t a problem but there were a couple of occasions when a version got bumped and something elsewhere stopped working and it took us quite a while to work out what had changed.
The alternative approach is to pin the package installation to a specific version. So if we want the recent 1.4.0 version installed we’d have the following code:
package { 'nginx':
  ensure => '1.4.0-1~precise',
}
The nice thing about this approach is that we always know which version is going to be installed.
The problem we now introduce is that when an updated version is added to the repository the old one is typically removed which means a puppet run on a new machine will fail because it can’t find the version.
After working with puppet for a few months it becomes quite easy to see when this is the reason for the failure but it creates the perception that ‘puppet is always failing’ for newer people which isn’t so good.
I think on balance I prefer to have the versions explicitly defined because I find it easier to work out what’s going on that way but I’m sure there’s an equally strong argument for just picking the latest version.
Tim and I were investigating a weird problem we were having with nginx where it was getting into a state where it had exceeded the number of open files allowed on the system and started rejecting requests.
We can find out the maximum number of open files that we’re allowed on a system with the following command:
$ ulimit -n
1024
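The same limit can also be read from inside a process. This is a minimal sketch, assuming a Unix-like system where Python's standard `resource` module is available; the soft limit should correspond to what `ulimit -n` reports for that process.

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
# The soft limit is the one enforced; the hard limit is the ceiling
# the soft limit can be raised to without extra privileges.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft_limit)
print("hard limit:", hard_limit)
```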
Our hypothesis was that some socket connections were never being closed and therefore the number of open files was climbing slowly upwards until it exceeded the limit.
We wanted to check how many sockets nginx had open so to start with we needed to know the process IDs it was running under:
$ ps aux | grep nginx | grep -v grep
root 1089 0.0 0.7 105152 2736 ? Ss 17:34 0:00 nginx: master process /usr/sbin/nginx
www-data 17474 0.0 0.6 105300 2296 ? S 21:49 0:04 nginx: worker process
www-data 17475 0.0 0.7 105300 2856 ? S 21:49 0:04 nginx: worker process
www-data 17476 0.0 0.7 105300 2792 ? S 21:49 0:03 nginx: worker process
www-data 17477 0.0 0.7 105300 2668 ? S 21:49 0:04 nginx: worker process
So the process IDs we’re interested in are 1089, 17474, 17475, 17476 and 17477.
We can check which file descriptors they have open with the following command:
$ sudo ls -alh /proc/{1089,17{474,475,476,477}}/fd
/proc/17476/fd:
total 0
dr-x------ 2 www-data www-data 0 Apr 23 23:40 .
...
l-wx------ 1 www-data www-data 64 Apr 23 23:40 6 -> /var/log/nginx/error.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 7 -> /var/www/thinkingingraphs/shared/log/nginx_access.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 8 -> /var/www/thinkingingraphs/shared/log/nginx_error.log
lrwx------ 1 www-data www-data 64 Apr 23 23:40 9 -> socket:[8910]
/proc/17477/fd:
total 0
...
lrwx------ 1 www-data www-data 64 Apr 23 23:40 56 -> socket:[52213]
lrwx------ 1 www-data www-data 64 Apr 23 23:40 57 -> anon_inode:[eventpoll]
l-wx------ 1 www-data www-data 64 Apr 23 23:40 6 -> /var/log/nginx/error.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 7 -> /var/www/thinkingingraphs/shared/log/nginx_access.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 8 -> /var/www/thinkingingraphs/shared/log/nginx_error.log
lrwx------ 1 www-data www-data 64 Apr 23 23:40 9 -> socket:[8910]
We can narrow that down to just show us how many sockets are open:
$ sudo ls -alh /proc/{1089,17{474,475,476,477}}/fd | grep socket | wc -l
189
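If we wanted to track this number over time, the same count could be sketched in Python by reading the symlinks under /proc directly. This is Linux-specific and only a rough equivalent of the `ls | grep socket | wc -l` pipeline; the `count_socket_fds` name and its `proc_root` parameter are my own additions for illustration, and inspecting another user's process would still need root.

```python
import os


def count_socket_fds(pid, proc_root="/proc"):
    """Count the file descriptors of `pid` whose symlink points at a socket."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            count += 1
    return count


# Only try the live example where a Linux-style /proc is present.
if os.path.isdir("/proc/self/fd"):
    print(count_socket_fds(os.getpid()))
```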
We could also use lsof although for some reason that returns a slightly different number:
$ sudo lsof -p 1089,17474,17475,17476,17477 | grep socket | wc -l
184
If we want to use brace expansion to do that it becomes a bit more tricky:
$ sudo lsof -p `echo {1089,174{74,75,76,77}} | sed 's/ /,/g'` | grep socket | wc -l
184
Annoyingly we couldn’t actually replicate the error but we think that it’s been solved in nginx 1.2.0 (we were using 1.1.19) by this change:
Bugfix: a segmentation fault might occur in a worker process if the
"try_files" directive was used; the bug had appeared in 1.1.19.
As I mentioned a couple of weeks ago I’ve been working on a tutorial about thinking through problems in graphs and since it’s a Sinatra application I thought thin would be a decent choice for web server.
In my initial setup I had the following nginx config file which was used to proxy requests on to thin:
/etc/nginx/sites-available/thinkingingraphs.conf
upstream thin {
server 127.0.0.1:3000;
}
server {
listen 80 default;
server_name _;
charset utf-8;
rewrite ^\/status(.*)$ $1 last;
gzip on;
gzip_disable "MSIE [1-6]\.(?!.*SV1)";
gzip_types text/plain application/xml text/xml text/css application/x-javascript application/xml+rss text/javascript application/json;
gzip_vary on;
access_log /var/www/thinkingingraphs/shared/log/nginx_access.log;
error_log /var/www/thinkingingraphs/shared/log/nginx_error.log;
root /var/www/thinkingingraphs/current/public;
location / {
proxy_pass http://thin;
}
error_page 404 /404.html;
error_page 500 502 503 504 /500.html;
}
I had an upstart script which started the thin server…
/etc/init/thinkingingraphs.conf
script
  export RACK_ENV=production
  export RUBY=ruby
  cd /var/www/thinkingingraphs/current
  exec su -s /bin/sh vagrant -c '$RUBY -S bundle exec thin -p 3000 start >> /var/www/thinkingingraphs/current/log/production.log 2>&1'
end script
… and then I used the following capistrano script to stop and start the server whenever I was deploying a new version of the application:
config/deploy.rb
namespace :deploy do
task(:start) {}
task(:stop) {}
desc "Restart Application"
task :restart do
sudo "stop thinkingingraphs || echo 0"
sudo "start thinkingingraphs"
end
end
The problem with this approach is that some requests receive a 502 response code while it’s restarting:
$ bundle exec cap deploy
$ while true; do curl -w %{http_code}:%{time_total} http://localhost/ -o /dev/null -s; printf "\n"; sleep 0.5; done
200:0.076
200:0.074
200:0.095
502:0.003
200:0.696
I wanted to try and make a no downtime deploy script and I came across a couple of posts which helped me work out how to do it.
The first step was to make sure that I had more than one thin instance running so that requests could be sent to one of the other ones while a restart was in progress.
I created the following config file:
/etc/thin/thinkingingraphs.yml
chdir: /var/www/thinkingingraphs/current
environment: production
address: 0.0.0.0
port: 3000
timeout: 30
log: log/thin.log
pid: tmp/pids/thin.pid
max_conns: 1024
max_persistent_conns: 100
require: []
wait: 30
servers: 3
daemonize: true
onebyone: true
One of the other properties that we need to set is ‘onebyone’ which means that when you restart thin it will take down the thin instances one at a time. This means one of the other two can handle incoming requests.
We’ve set the number of servers to 3 which will spin up 3 instances on ports 3000, 3001 and 3002.
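The port numbering is simple enough to sketch: each instance takes the next port up from the configured one, and those are the addresses nginx needs to list in its upstream block. The helper names below (`thin_ports`, `upstream_block`) are hypothetical and just illustrate the mapping; they aren't part of thin or nginx.

```python
def thin_ports(base_port, servers):
    """Ports used by the cluster: one per server, counting up from base_port."""
    return [base_port + offset for offset in range(servers)]


def upstream_block(name, ports, host="127.0.0.1"):
    """Render an nginx upstream block with one server line per port."""
    lines = ["upstream %s {" % name]
    lines += ["  server %s:%d;" % (host, port) for port in ports]
    lines.append("}")
    return "\n".join(lines)


ports = thin_ports(3000, 3)
print(ports)  # prints [3000, 3001, 3002]
print(upstream_block("thin", ports))
```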
I changed my upstart script to look like this:
/etc/init/thinkingingraphs.conf
script
  export RACK_ENV=production
  export RUBY=ruby
  cd /var/www/thinkingingraphs/current
  exec su -s /bin/sh vagrant -c '$RUBY -S bundle exec thin -C /etc/thin/thinkingingraphs.yml start >> /var/www/thinkingingraphs/current/log/production.log 2>&1'
end script
I also had to change the capistrano script to call ‘thin restart’ instead of stopping and starting the upstart script:
config/deploy.rb
namespace :deploy do
task(:start) {}
task(:stop) {}
desc "Restart Application"
task :restart do
run "cd #{current_path} && bundle exec thin restart -C /etc/thin/thinkingingraphs.yml"
end
end
Finally I had to make some changes to the nginx config file to send requests on to the other thin instances if the first attempt failed (due to it being restarted) using the proxy_next_upstream directive:
/etc/nginx/sites-available/thinkingingraphs.conf
upstream thin {
server 127.0.0.1:3000 max_fails=1 fail_timeout=15s;
server 127.0.0.1:3001 max_fails=1 fail_timeout=15s;
server 127.0.0.1:3002 max_fails=1 fail_timeout=15s;
}
server {
listen 80 default;
server_name _;
charset utf-8;
rewrite ^\/status(.*)$ $1 last;
gzip on;
gzip_disable "MSIE [1-6]\.(?!.*SV1)";
gzip_types text/plain application/xml text/xml text/css application/x-javascript application/xml+rss text/javascript application/json;
gzip_vary on;
access_log /var/www/thinkingingraphs/shared/log/nginx_access.log;
error_log /var/www/thinkingingraphs/shared/log/nginx_error.log;
root /var/www/thinkingingraphs/current/public;
location / {
proxy_pass http://thin;
proxy_next_upstream error timeout http_502 http_503;
}
error_page 404 /404.html;
error_page 500 502 503 504 /500.html;
}
We’ve also made a change to our upstream definition to proxy requests to one of the thin instances which will be running.
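To make the retry behaviour a bit more concrete, here is a rough model of what I understand proxy_next_upstream to be doing for us: if one upstream errors or answers with 502/503, the request is passed to the next one. This is only a sketch of the idea. It ignores nginx's load balancing and the max_fails/fail_timeout bookkeeping, and `proxy_with_failover` and `fake_send` are made-up names standing in for the real proxying.

```python
RETRY_STATUSES = {502, 503}


def proxy_with_failover(upstreams, send):
    """Try each upstream in order and return (upstream, status) for the first good reply.

    `send` is a hypothetical callable that returns an HTTP status code or
    raises ConnectionError when the upstream cannot be reached.
    """
    last_status = 502  # what the client would see if every upstream fails
    for upstream in upstreams:
        try:
            status = send(upstream)
        except ConnectionError:
            continue  # e.g. connection refused while that instance restarts
        if status in RETRY_STATUSES:
            last_status = status
            continue
        return upstream, status
    return None, last_status


def fake_send(upstream):
    # Pretend the instance on port 3000 is in the middle of a restart.
    if upstream == "127.0.0.1:3000":
        raise ConnectionError("connection refused")
    return 200


result = proxy_with_failover(
    ["127.0.0.1:3000", "127.0.0.1:3001", "127.0.0.1:3002"], fake_send
)
print(result)  # prints ('127.0.0.1:3001', 200)
```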
When I deploy the application now there is no downtime:
$ bundle exec cap deploy
$ while true; do curl -w %{http_code}:%{time_total} http://localhost/ -o /dev/null -s; printf "\n"; sleep 0.5; done
200:0.094
200:0.095
200:0.082
200:0.102
200:0.080
200:0.081
The only problem is that upstart now seems to have lost a handle on the thin processes and from what I can tell there isn’t a master process which upstart could get a handle on so I’m not sure how to wire this up.
Any ideas welcome!
In order to run the neo4j server on my Ubuntu 12.04 Vagrant VM I needed to install the Oracle/Sun JDK which proved to be more difficult than I’d expected.
I initially tried to install it via the OAB-Java script but was running into some dependency problems and eventually came across a post which specified a PPA that had an installer I could use.
I wrote a little puppet Java module to wrap the commands in:
class java($version) {
package { "python-software-properties": }
exec { "add-apt-repository-oracle":
command => "/usr/bin/add-apt-repository -y ppa:webupd8team/java",
notify => Exec["apt_update"]
}
package { 'oracle-java7-installer':
ensure => "${version}",
require => [Exec['add-apt-repository-oracle']],
}
}
I then included this in my default node definition:
node default {
class { 'java': version => '7u21-0~webupd8~0', }
}
Unfortunately when I ran that I ended up with the following error:
err: /Stage[main]/Java/Package[oracle-java7-installer]/ensure: change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install oracle-java7-installer' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
The following extra packages will be installed:
java-common
Suggested packages:
...
Unpacking oracle-java7-installer (from .../oracle-java7-installer_7u21-0~webupd8~0_all.deb) ...
oracle-license-v1-1 license could not be presented
try 'dpkg-reconfigure debconf' to select a frontend other than noninteractive
dpkg: error processing /var/cache/apt/archives/oracle-java7-installer_7u21-0~webupd8~0_all.deb (--unpack):
subprocess new pre-installation script returned error exit status 2
Processing triggers for man-db ...
Errors were encountered while processing:
/var/cache/apt/archives/oracle-java7-installer_7u21-0~webupd8~0_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
I came across this post on Ask Ubuntu which explained a neat trick for getting around it by making it look like we’ve agreed to the licence. This is done by passing options to debconf-set-selections.
For a real server I guess you’d want some step where a person accepts the licence but since this is just for my hacking it seems to make sense.
My new Java manifest looks like this:
class java($version) {
package { "python-software-properties": }
exec { "add-apt-repository-oracle":
command => "/usr/bin/add-apt-repository -y ppa:webupd8team/java",
notify => Exec["apt_update"]
}
exec {
'set-licence-selected':
command => '/bin/echo debconf shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections';
'set-licence-seen':
command => '/bin/echo debconf shared/accepted-oracle-license-v1-1 seen true | /usr/bin/debconf-set-selections';
}
package { 'oracle-java7-installer':
ensure => "${version}",
require => [Exec['add-apt-repository-oracle'], Exec['set-licence-selected'], Exec['set-licence-seen']],
}
}
As I’ve mentioned in a couple of previous posts I’ve been playing around with creating a Vagrant VM that I can use for my neo4j hacking which has involved a lot of messing around with installing apt packages.
There are loads of different ways of working out what’s going on when packages aren’t installing as you’d expect so I thought it’d be good to document the ones I’ve been using so I can find them more easily next time.
Finding reverse dependencies
A couple of times I found myself wondering how a certain package had ended up on the VM because I hadn’t specified that it should be installed so I wanted to know who had!
I wanted to find the reverse dependencies of the package, e.g. to find out who depends on make we can use the following command:
$ apt-cache rdepends make
make
Reverse Depends:
  ...
  build-essential
  make:i386
  libc6-dev:i386
  open-vm-dkms
  mythbuntu-desktop
  broadcom-sta-source
  ...
The nice thing about ‘rdepends’ is that it will tell us reverse dependencies even for a package that we haven’t installed. This was helpful here as I had forgotten to install ‘build-essential’ and this made it obvious.
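Conceptually a reverse dependency lookup is just the forward dependency map turned inside out. Here is a small sketch of that idea using a couple of the dependencies that show up in this post; the `DEPENDS` dictionary is a toy stand-in for the real package index, which has far more entries and also tracks versions and architectures.

```python
from collections import defaultdict

# Forward dependencies: package -> packages it depends on.
# A tiny hand-written sample, not the real apt database.
DEPENDS = {
    "build-essential": ["make"],
    "ruby1.9.1": ["libruby1.9.1", "libc6"],
}


def reverse_depends(depends):
    """Invert a package -> dependencies map into dependency -> dependants."""
    reverse = defaultdict(set)
    for package, dependencies in depends.items():
        for dependency in dependencies:
            reverse[dependency].add(package)
    return reverse


print(sorted(reverse_depends(DEPENDS)["make"]))  # prints ['build-essential']
```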
Finding which version of a package is installed
I added one of the Brightbox repositories to get a more recent Ruby version and noticed that something weird was going on with the version of ‘nginx-common’ that puppet was trying to install.
It seemed like one of my dependencies was trying to pull in the ‘latest’ version of ‘nginx-common’ which I’d expected to be ‘1.1.19-1ubuntu0.1’.
By passing the ‘policy’ flag to apt-cache I was able to see that there was a recent version available via Brightbox:
$ apt-cache policy nginx-common
nginx-common:
Installed: 1.1.19-1ubuntu0.1
Candidate: 1:1.2.6-1~43~precise1
Version table:
1:1.2.6-1~43~precise1 0
500 http://ppa.launchpad.net/brightbox/ruby-ng/ubuntu/ precise/main amd64 Packages
*** 1.1.19-1ubuntu0.1 0
500 http://us.archive.ubuntu.com/ubuntu/ precise-updates/universe amd64 Packages
100 /var/lib/dpkg/status
1.1.19-1 0
500 http://us.archive.ubuntu.com/ubuntu/ precise/universe amd64 Packages
Finding which versions of a package are available
Another flag that we can pass to apt-cache is ‘madison’ which shows us the available versions for a package but doesn’t indicate which version is installed:
$ apt-cache madison nginx-common
nginx-common | 1:1.2.6-1~43~precise1 | http://ppa.launchpad.net/brightbox/ruby-ng/ubuntu/ precise/main amd64 Packages
nginx-common | 1.1.19-1ubuntu0.1 | http://us.archive.ubuntu.com/ubuntu/ precise-updates/universe amd64 Packages
nginx-common | 1.1.19-1 | http://us.archive.ubuntu.com/ubuntu/ precise/universe amd64 Packages
nginx | 1.1.19-1 | http://us.archive.ubuntu.com/ubuntu/ precise/universe Sources
nginx | 1.1.19-1ubuntu0.1 | http://us.archive.ubuntu.com/ubuntu/ precise-updates/universe Sources
nginx | 1:1.2.6-1~43~precise1 | http://ppa.launchpad.net/brightbox/ruby-ng/ubuntu/ precise/main Sources
Finding which package a file belongs to
At some stage I wanted to check which exact package was installing nginx which I was able to do with the following command:
$ dpkg -S `which nginx`
nginx-extras: /usr/sbin/nginx
I had installed ‘nginx-common’ which I learnt depends on ‘nginx-extras’ by using our ‘rdepends’ command:
$ apt-cache rdepends nginx-extras
nginx-extras
Reverse Depends:
  nginx-naxsi:i386
  ...
  nginx-common
Finding the dependencies of a package
I wanted to check the dependencies of the ‘ruby1.9.1’ package to see whether or not I needed to explicitly install ‘libruby1.9.1’ or if that would be taken care of.
Passing the ‘-s’ flag to dpkg let me check this:
$ dpkg -s ruby1.9.1
Package: ruby1.9.1
Status: install ok installed
Architecture: amd64
Version: 1:1.9.3.327-1bbox2~precise1
Replaces: irb1.9.1, rdoc1.9.1, rubygems1.9.1
Provides: irb1.9.1, rdoc1.9.1, ruby-interpreter, rubygems1.9.1
Depends: libruby1.9.1 (= 1:1.9.3.327-1bbox2~precise1), libc6 (>= 2.2.5)
Suggests: ruby1.9.1-examples, ri1.9.1, graphviz, ruby1.9.1-dev, ruby-switch
Conflicts: irb1.9.1 (<< 1.9.1.378-2~), rdoc1.9.1 (<< 1.9.1.378-2~), ri (<= 4.5), ri1.9.1 (<< 1.9.2.180-3~), ruby (<= 4.5), rubygems1.9.1
...
These are the ones that I’ve found useful so far. I’d love to hear other people’s favourites though as I’m undoubtedly missing some.
Last week I was writing a query to find the top scorers in the Premier League so far this season alongside the number of games they’ve played in which initially read like this:
START player = node:players('name:*')
MATCH player-[:started|as_sub]-playedLike-[:in]-game-[r?:scored_in]-player
WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals
RETURN player.name, games, LENGTH(allGoals) AS goals
ORDER BY goals DESC
LIMIT 5
+------------------------------------+
| player.name        | games | goals |
+------------------------------------+
| "Luis Suárez"      | 30    | 22    |
| "Robin Van Persie" | 30    | 19    |
| "Gareth Bale"      | 27    | 17    |
| "Michu"            | 29    | 16    |
| "Demba Ba"         | 28    | 15    |
+------------------------------------+
5 rows
1 ms
I modelled whether a player started a game or came on as a substitute with separate relationship types ‘started’ and ‘as_sub’ but in this query we’re not interested in that, we just want to know whether they played.
In the world of relational database design we tend to try and avoid redundancy but with graphs this isn’t such a big deal so I thought I may as well add a ‘played’ relationship whenever an ‘as_sub’ or ‘started’ one was being created.
We can then simplify the above query to read like this:
START player = node:players('name:*')
MATCH player-[:played]-playedLike-[:in]-game-[r?:scored_in]-player
WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals
RETURN player.name, games, LENGTH(allGoals) AS goals
ORDER BY goals DESC
LIMIT 5
+------------------------------------+
| player.name        | games | goals |
+------------------------------------+
| "Luis Suárez"      | 30    | 22    |
| "Robin Van Persie" | 30    | 19    |
| "Gareth Bale"      | 27    | 17    |
| "Michu"            | 29    | 16    |
| "Demba Ba"         | 28    | 15    |
+------------------------------------+
5 rows
0 ms
When I’m querying I often forget that I modelled starting/substitute separately and think the data has screwed up and it’s always because I’ve forgotten to include the ‘as_sub’ relationship.
Having the ‘played’ relationship means that no longer happens which is cool.
I have a reasonably small data set so I haven’t seen any performance problems from creating this redundancy.
However, a player is unlikely to have more than a few thousand relationships going out from it, so I don’t think it will become a problem either.
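The import-side change amounts to writing two relationships instead of one. As a rough sketch of that idea, with a plain list of tuples standing in for the graph: the `record_appearance` function and the game identifiers are invented for illustration, and the intermediate node from the real model is collapsed so that players link straight to games.

```python
def record_appearance(relationships, player, game, kind):
    """Store how a player featured in a game plus a redundant 'played' edge.

    `relationships` is a list of (start, type, end) tuples; `kind` must be
    'started' or 'as_sub'.
    """
    if kind not in ("started", "as_sub"):
        raise ValueError("unknown appearance type: %s" % kind)
    relationships.append((player, kind, game))
    relationships.append((player, "played", game))


relationships = []
record_appearance(relationships, "Michu", "game-1", "started")
record_appearance(relationships, "Michu", "game-2", "as_sub")

# Queries that only care whether someone played need a single type now.
games_played = [end for start, rel, end in relationships
                if start == "Michu" and rel == "played"]
print(games_played)  # prints ['game-1', 'game-2']
```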
As always I’d be interested in thoughts from others who have come across similar problems or can see something that I’ve missed.
I’ve been playing around with a puppet configuration to run a neo4j server on an Ubuntu VM and one thing that has been quite tricky is getting the Sun/Oracle Java JDK to install repeatably.
I adapted Julian’s Java module which uses OAB-Java and although it was certainly working cleanly at one stage I somehow ended up with it not working because of failed dependencies:
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: [x] Installing Java build requirements Ofailed
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: ^[[m^O [i] Showing the last 5 lines from the logfile (/root/oab-java.sh.log)...
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: nginx-common
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: nginx-extras
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: E: Sub-process /usr/bin/dpkg returned an error code (1)
...
[2013-04-12 07:03:10] Warning: /Stage[main]/Java/Package[sun-java6-jdk]: Skipping because of failed dependencies
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[default JVM]: Dependency Exec[install OAB repo] has failures: true
[2013-04-12 07:03:10] Warning: /Stage[main]/Java/Exec[default JVM]: Skipping because of failed dependencies
I spent a few hours looking at this problem but couldn’t quite figure out how to sort out the dependency problem and ended up running one command manually after which applying puppet again worked.
Obviously this is a bit of a cop out because ideally I’d like it to be possible to spin up the VM in one puppet run without manual intervention.
A couple of days ago I was discussing the problem with Ashok and he suggested that it was probably good to know when I could defer fixing the problem to a later stage since having a completely automated spin up isn’t my highest priority.
i.e. when I could take on what he referred to as ‘Puppet debt’.
I think this is a reasonable way of looking at things and I have worked on projects where we’ve been baffled by puppet’s dependency graph and have setup scripts which run puppet twice until we have time to sort it out.
If we’re spinning up new instances frequently then we have less ability to take on this type of debt because it’s going to hurt us much more but if not then I think it is reasonable to defer the problem.
This feels like another type of technical debt to me but I’d be interested in others’ thoughts and whether I’m just a complete cop out!
As I mentioned in my previous post I’ve been deploying a web application to a vagrant VM using Capistrano and my initial configuration was like so:
require 'capistrano/ext/multistage'
set :application, "thinkingingraphs"
set :scm, :git
set :repository, "git@bitbucket.org:markhneedham/thinkingingraphs.git"
set :scm_passphrase, ""
set :ssh_options, {:forward_agent => true, :paranoid => false, keys: ['~/.vagrant.d/insecure_private_key']}
set :stages, ["vagrant"]
set :default_stage, "vagrant"
set :user, "vagrant"
server "192.168.33.101", :app, :web, :db, :primary => true
set :deploy_to, "/var/www/thinkingingraphs"
When I ran ‘cap deploy’ I ended up with the following error:
* executing "git clone -q git@bitbucket.org:markhneedham/thinkingingraphs.git /var/www/thinkingingraphs/releases/20130414171523 && cd /var/www/thinkingingraphs/releases/20130414171523 && git checkout -q -b deploy 6dcbf945ef5b8a5d5d39784800f4a6b7731c7d8a && (echo 6dcbf945ef5b8a5d5d39784800f4a6b7731c7d8a > /var/www/thinkingingraphs/releases/20130414171523/REVISION)"
servers: ["192.168.33.101"]
[192.168.33.101] executing command
** [192.168.33.101 :: err] Host key verification failed.
** [192.168.33.101 :: err] fatal: The remote end hung up unexpectedly
As far as I can tell the reason for this is that bitbucket hasn’t been verified as a host by the VM and therefore the equivalent of the following happens when it tries to clone the repository:
$ ssh git@bitbucket.org
The authenticity of host 'bitbucket.org (207.223.240.182)' can't be established.
RSA key fingerprint is 97:8c:1b:f2:6f:14:6b:5c:3b:ec:aa:46:46:74:7c:40.
Are you sure you want to continue connecting (yes/no)?
Since we aren’t answering ‘yes’ to that question and bitbucket isn’t in our ~/.ssh/known_hosts file it’s not able to continue.
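A quick way to see whether a host has already been accepted is to look for it in the known_hosts file. The sketch below checks plain-text entries only, so it won't find hosts if the file uses hashed hostnames (lines starting with `|1|`), and the key in the sample line is a placeholder rather than bitbucket's real key. `host_is_known` is a name I've made up for the example.

```python
def host_is_known(known_hosts_text, hostname):
    """Return True if `hostname` is listed in a plain (unhashed) known_hosts entry."""
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and marker lines such as @revoked.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        # The first field is a comma-separated list of host names/addresses.
        hosts = line.split()[0].split(",")
        if hostname in hosts:
            return True
    return False


# Placeholder key material, for illustration only.
sample = "bitbucket.org,207.223.240.181 ssh-rsa AAAAplaceholderkey\n"
print(host_is_known(sample, "bitbucket.org"))  # prints True
print(host_is_known("", "bitbucket.org"))      # prints False
```

To check the real file we could pass in the contents of ~/.ssh/known_hosts instead of the sample string.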
One solution to this problem is to run the ssh command above and then answer ‘yes’ to the question which will add bitbucket to our known_hosts file and we can then run ‘cap deploy’ again.
It’s a bit annoying to have that manual step though so another way is to set cap to use pty by putting the following line in our config file:
set :default_run_options, {:pty => true}
Now when we run ‘cap deploy’ we can see that bitbucket automatically gets added to the known_hosts file:
servers: ["192.168.33.101"]
[192.168.33.101] executing command
** [192.168.33.101 :: out] The authenticity of host 'bitbucket.org (207.223.240.181)' can't be established.
** RSA key fingerprint is 97:8c:1b:f2:6f:14:6b:5c:3b:ec:aa:46:46:74:7c:40.
** Are you sure you want to continue connecting (yes/no)?
** [192.168.33.101 :: out] yes
** [192.168.33.101 :: out] Warning: Permanently added 'bitbucket.org,207.223.240.181' (RSA) to the list of known hosts.
As far as I can tell this runs the command using a pseudo terminal and then automatically adds bitbucket into the known_hosts file but I’m not entirely sure how that works. My google skillz have also failed me so if anyone can explain it to me that’d be cool