
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Gridshore
A weblog about software engineering, architecture, technology, and other things we like.

AngularJS directives for c3.js

Tue, 08/19/2014 - 10:19

For one of my projects we wanted to create some nice charts. It feels like something you often want but never get around to because it takes too much time. This time we really needed it. We had a look at the D3.js library, a very nice library, but with so many options and a lot to do yourself. Then we found c3.js; check the blog post by Roberto: Creating charts with C3.js. Since I do a lot with AngularJS, I wanted to integrate these c3.js charts with AngularJS. I already wrote a piece about the integration. Now I went one step further by creating a set of AngularJS directives.

You can read the blog on the trifork blog:

http://blog.trifork.com/2014/08/19/angularjs-directives-for-c3-js-chart-library/

The post AngularJS directives for c3.js appeared first on Gridshore.

Categories: Architecture, Programming

Creativity, Inc. – by Ed Catmull

Fri, 08/15/2014 - 20:39


A colleague of mine, Ronald Vonk, recommended this book to me. It is a book by one of the founders of Pixar, the company you know from all those fantastic computer-animated movies. At Pixar they created a continuously changing environment in which creativity can excel. It is a very interesting read if you are interested in management books that are not too heavy on theory. Ed explains well, and entertainingly, how they went from a small company with a vision to a huge company with a vision.

Without giving away too many spoilers (you really need to read the book yourself), I want to mention a few things that I remembered after reading the book.

The team is more important than the ideas or the talent of the individual people. Take care of the team, make sure it functions well, and give it responsibility. Make the members feel proud when they finish what they wanted to create. Always put people first.

This is something I have run into in my own working life as well. I do think you have to enable teams to adapt and to remain good teams. The challenge is to bring in others to learn and later on replace team members or start their own team.

We would never make a film that way again. It is management's job to take the long view, to intervene and protect our people from their willingness to pursue excellence at all costs. Not to do so would be irresponsible.

This was a remark after delivering a movie under great time stress. They pulled through, but at a cost.

Braintrust: a group of people giving feedback and ideas for improvement on a certain idea. What matters is that the feedback is meant to improve the idea, not to bully the person(s) the idea originated from. It is very important that everybody is open to the feedback and not defensive. In the end it is not the braintrust that makes the decision; it is the person in charge of the product. Still, this group of people is kind of the first user, and therefore the feedback should not be taken too lightly.

This was something I thought about for a long time; my conclusion was that I am not really good at this. I often feel that my ideas are my babies that need to be defended. First persuade me that I am wrong, oh sorry, that an idea someone had was not the best.

I did not want to become a manager, I just wanted to be one of the boys and do research. When we became bigger I realised I had become more important and new people did not see me as a peer or one of the guys. I realised things were starting to be hidden from me. That is no problem, as long as you can trust that people will tell someone else, who will pass the most important things back to me.

Couldn’t agree more.

You can have this very nice, polished, finely tuned locomotive. People think that being the driver of the train gives them power. They feel that driving the train is, in the end, shaping the company. The truth is, it's not. Driving the train does not set its course. The real job is laying the track.

This was an eye opener as well, something you know but that is hard to put into words.

At Pixar they do not have contracts. They feel that employment contracts hurt both the employer and the employee. If someone had a problem with the company, there wasn't much point in complaining because they were under contract. If someone didn't perform well, on the other hand, there was no point in confronting them about it; their contract simply wouldn't be renewed, which might be the first time they heard about their need to improve. The whole system discouraged and devalued day-to-day communication and was culturally dysfunctional. But since everybody was used to it, they were blind to the problem.

This is a long one; I have thought about it for a while. I think for now I would be too scared to do this in a company, but I still like the idea.

What is the point of hiring smart people if you don't empower them to fix what's broken? Often too much time is lost in making sure no mistakes will be made. Often, however, it takes just a few days to find solutions for mistakes.

It keeps coming back to the same point: a manager is a facilitator, nothing more, nothing less. It is a very important role, just like all the others. Think about it: it is the team, the complete team.

The post Creativity, INC. – by Ed Catmull appeared first on Gridshore.

Categories: Architecture, Programming

Using C3js with AngularJS

Tue, 07/29/2014 - 17:17

C3js is a charting JavaScript library on top of D3js. For a project we needed graphs and we ended up using c3js. How this happened, and some of the first steps we took, is written down by Roberto van der Linden in his blog post: Creating charts with C3.js

Using C3js without other JavaScript libraries is of course perfectly doable. Still, I think using it in combination with AngularJS is interesting to evaluate. In this blog post I am going to document some of the steps I took to go from the basic c3 sample to a more advanced AngularJS-powered sample.

Setup project – most basic sample

The easiest setup is cloning my GitHub repository: c3-angular-sample. If you are more the follow-along kind of person, then these are the steps.

We start by downloading C3js, d3 and AngularJS and create the first html page to get us started.

Next we create index1.html, which loads the stylesheet as well as the required JavaScript files. Two things are important to notice. One, the div with id chart; this is used to position the chart. Two, the script block that creates the chart object with the data.

<!doctype html>
<html>
<head>
	<meta charset="utf-8">

	<link href="css/c3-0.2.4.css" rel="stylesheet" type="text/css">
</head>
<body>
	<h1>Sample 1: Most basic C3JS sample</h1>
	<div id="chart"></div>

	<!-- Load the javascript libraries -->
	<script src="js/d3/d3-3.4.11.min.js" charset="utf-8"></script>
	<script src="js/c3/c3-0.2.4.min.js"></script>

	<!-- Initialize and draw the chart -->
	<script type="text/javascript">
		var chart = c3.generate({
		    bindto: '#chart',
		    data: {
		      columns: [
		        ['data1', 30, 200, 100, 400, 150, 250],
		        ['data2', 50, 20, 10, 40, 15, 25]
		      ]
		    }
		});
	</script>
</body>
</html>

Next we introduce AngularJS to create the chart object.

Use AngularJS to create the chart

The first step is to remove the script block and add more JavaScript libraries to load. We load the AngularJS library and our own js file called graph-app-2.js. In the html tag we initialise the AngularJS app using ng-app="graphApp". In the body tag we initialise the GraphCtrl controller and call the showGraph function when the Angular app is initialised.

<html ng-app="graphApp">
<body ng-controller="GraphCtrl" ng-init="showGraph()">

Now let us have a look at some AngularJS JavaScript code. Notice that there is hardly a difference between the JavaScript code in the two samples.

var graphApp = angular.module('graphApp', []);
graphApp.controller('GraphCtrl', function ($scope) {
	$scope.chart = null;
	
	$scope.showGraph = function() {
		$scope.chart = c3.generate({
			    bindto: '#chart',
			    data: {
			      columns: [
			        ['data1', 30, 200, 100, 400, 150, 250],
			        ['data2', 50, 20, 10, 40, 15, 25]
			      ]
			    }
			});		
	}
});

Still, everything is really static; it mostly makes use of defaults. Let us move on and make it possible for the user to make some changes to the data and the type of chart.

Give the power to the user to change the data

We have two input boxes that accept comma-separated data and two dropdowns to change the type of the chart. It is fun to try out all the different types available. Below is first the piece of HTML that contains the form to accept the user input.

	<form novalidate>
		<p>Enter in format: val1,val2,val3,etc</p>
		<input ng-model="config.data1" type="text" size="100"/>
		<select ng-model="config.type1" ng-options="typeOption for typeOption in typeOptions"></select>
		<p>Enter in format: val1,val2,val3,etc</p>
		<input ng-model="config.data2" type="text" size="100"/>
		<select ng-model="config.type2" ng-options="typeOption for typeOption in typeOptions"></select>
	</form>
	<button ng-click="showGraph()">Show graph</button>

This should be easy to understand. There are a few fields taken from the model as well as a button that generates the graph. Below is the complete javascript code for the controller.

var graphApp = angular.module('graphApp', []);

graphApp.controller('GraphCtrl', function ($scope) {
	$scope.chart = null;
	$scope.config={};
	$scope.config.data1="30, 200, 100, 200, 150, 250";
	$scope.config.data2="70, 30, 10, 240, 150, 125";

	$scope.typeOptions=["line","bar","spline","step","area","area-step","area-spline"];

	$scope.config.type1=$scope.typeOptions[0];
	$scope.config.type2=$scope.typeOptions[1];


	$scope.showGraph = function() {
		var config = {};
		config.bindto = '#chart';
		config.data = {};
		config.data.json = {};
		config.data.json.data1 = $scope.config.data1.split(",");
		config.data.json.data2 = $scope.config.data2.split(",");
		config.axis = {"y":{"label":{"text":"Number of items","position":"outer-middle"}}};
		config.data.types={"data1":$scope.config.type1,"data2":$scope.config.type2};
		$scope.chart = c3.generate(config);		
	}
});

In the first part we initialise all the variables in the $scope that are used in the form. In the showGraph function we have added a few things compared to the previous sample. We no longer use the columns data provider; now we use the json provider. Therefore we create arrays out of the comma-separated number strings. We also add a label to the y-axis, and we set the chart types using the same names as in the json object with the data. I think this is a good time to show a screenshot; still, it is much nicer to open the index3.html file yourself and play around with it.
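As a side note, split(",") leaves the values as strings with leading spaces; c3 generally copes with numeric strings, but converting explicitly is safer. A small sketch of such a parser (my own helper, not part of the post's code):

```javascript
// Hypothetical helper: parse the comma-separated user input into numbers
// before handing it to c3's json data provider.
function parseSeries(input) {
    return input.split(",").map(function (value) {
        return Number(value.trim()); // "30, 200" -> [30, 200], not ["30", " 200"]
    });
}
```

In the controller above, something like config.data.json.data1 = parseSeries($scope.config.data1); would then feed real numbers to the chart.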

[Screenshot: the chart generated from the user-provided data and chart types]

Now we want to do more with Angular: introduce a service that could obtain data from the server, but also make the data time-based.

Introducing time and data generation

The HTML is very basic; it only contains two buttons, one to start generating data and one to stop generating data. Let us focus on the JavaScript we have created. When creating the application we now tell it to look for another module called graphApp.services. Then we create the service using the factory and register it with the application. That way we can inject the service into the controller later on.

var graphApp = angular.module('graphApp', ['graphApp.services']);
var services = angular.module('graphApp.services', []);
services.factory('dataService', [function() {
	function DataService() {
		var data = [];
		var numDataPoints = 60;
		var maxNumber = 200;

		this.loadData = function(callback) {
			if (data.length > numDataPoints) {
				data.shift();
			}
			data.push({"x":new Date(),"data1":randomNumber(),"data2":randomNumber()});
			callback(data);
		};

		function randomNumber() {
			return Math.floor((Math.random() * maxNumber) + 1);
		}
	}
	return new DataService();
}]);

This service has one exposed method, loadData(callback). The result is that you get back a collection of objects with three fields: x, data1 and data2. The number of points is capped at around 60, and the values are random numbers between 1 and 200.
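Stripped of Angular, the windowing and random logic of the service can be sketched and checked on their own (the standalone function names here are mine, mirroring the service above):

```javascript
// Sliding window as used by DataService: shift the oldest point out
// once the buffer overflows, then push the new point.
function makeWindow(numDataPoints) {
    var data = [];
    return function add(point) {
        if (data.length > numDataPoints) {
            data.shift(); // drop the oldest point
        }
        data.push(point);
        return data;
    };
}

// Random generator as used by DataService: integers from 1 up to maxNumber.
function randomNumber(maxNumber) {
    return Math.floor(Math.random() * maxNumber) + 1;
}
```

Worth noting: because the length check runs before the push, the window actually settles at numDataPoints + 1 entries, and the random values start at 1, not 0.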

Next is the controller. I start with the controller declaration, the parameters that are initialised, and the method that draws the graph.

graphApp.controller('GraphCtrl', ['$scope','$timeout','dataService',function ($scope,$timeout,dataService) {
	$scope.chart = null;
	$scope.config={};

	$scope.config.data=[]

	$scope.config.type1="spline";
	$scope.config.type2="spline";
	$scope.config.keys={"x":"x","value":["data1","data2"]};

	$scope.keepLoading = true;

	$scope.showGraph = function() {
		var config = {};
		config.bindto = '#chart';
		config.data = {};
		config.data.keys = $scope.config.keys;
		config.data.json = $scope.config.data;
		config.axis = {};
		config.axis.x = {"type":"timeseries","tick":{"format":"%S"}};
		config.axis.y = {"label":{"text":"Number of items","position":"outer-middle"}};
		config.data.types={"data1":$scope.config.type1,"data2":$scope.config.type2};
		$scope.chart = c3.generate(config);		
	}

Take note of $scope.config.keys, which is used later on in the config.data.keys object. With this we move from an array of data per line to an object with all the datapoints. The x value comes from the x property; the value properties come from data1 and data2. Also take note of the config.axis.x property. Here we specify that we are now dealing with timeseries data, and we format the ticks to show only the seconds. This is logical, since we configured it to have a maximum of 60 points and we create a new point every second, as we will see later on. The next code block shows the other three functions in the controller.

	$scope.startLoading = function() {
		$scope.keepLoading = true;
		$scope.loadNewData();
	}

	$scope.stopLoading = function() {
		$scope.keepLoading = false;
	}

	$scope.loadNewData = function() {
		dataService.loadData(function(newData) {
			var data = {};
			data.keys = $scope.config.keys;
			data.json = newData;
			$scope.chart.load(data);
			$timeout(function(){
				if ($scope.keepLoading) {
					$scope.loadNewData()				
				}
			},1000);			
		});
	}

In the loadNewData method we use the $timeout function of AngularJS. After a second we call loadNewData again, unless the keepLoading property has explicitly been set to false. Note that adding a point to the graph and calling the function recursively is done in the callback of the method that is passed to the dataService.
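The same start/stop polling pattern can be sketched framework-free (an injectable schedule function stands in for $timeout; the names are mine):

```javascript
// Poll repeatedly: load data, then schedule the next load from inside the
// callback, stopping as soon as the keepLoading flag is cleared.
function makePoller(loadData, schedule) {
    var keepLoading = true;

    function tick() {
        loadData(function () {
            schedule(function () {
                if (keepLoading) {
                    tick(); // recurse only while polling is active
                }
            }, 1000);
        });
    }

    return {
        start: function () { keepLoading = true; tick(); },
        stop: function () { keepLoading = false; }
    };
}
```

Because the next tick is only scheduled once the load callback has fired, a slow data source can never pile up overlapping requests; that is the attraction of recursing from the callback rather than using a fixed interval.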

Now run the sample for longer than 60 seconds and see what happens. Below is a screenshot, but it is more fun when you run the sample yourself.

[Screenshot: the live-updating timeseries chart]

That is about it. Now I am going to integrate this with my elasticsearch-ui plugin.

The post Using C3js with AngularJS appeared first on Gridshore.

Categories: Architecture, Programming

Transform the input before indexing in elasticsearch

Sat, 07/26/2014 - 07:51

Sometimes you are indexing data and want to have as little influence on the input as possible, maybe even none. Still you need to make changes: you want other content, or other fields, or maybe even to remove fields. In elasticsearch 1.3 a new feature is introduced called Transform. In this blog post I am going to show some aspects of this new feature.

Insert the document with the problem

The input we get comes from a system that puts the string null in a field if it is empty. We do not want null as a string in the elasticsearch index; therefore we want to remove this field completely when indexing such a document. We start with the example and the proof that you can search on the field.

PUT /transform/simple/1
{
  "title":"This is a document with text",
  "description":"null"
}

Now search for the word null in the description field:

GET /transform/_search
{
  "query": {
    "match": {
      "description": "null"
    }
  }
}

For completeness I’ll show you the response as well.

Response:
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "transform",
            "_type": "simple",
            "_id": "1",
            "_score": 0.30685282,
            "_source": {
               "title": "This is a document with text",
               "description": "null"
            }
         }
      ]
   }
}
Change mapping to contain transform

Next we are going to use the transform functionality to remove the field if it contains the string null. To do that we need to remove the index and create a new mapping containing the transform functionality. We use the Groovy language for the script. Beware that the script is only validated when the first document is inserted.

PUT /transform
{
  "mappings": {
    "simple": {
      "transform": {
        "lang":"groovy",
        "script":"if (ctx._source['description']?.equals('null')) ctx._source['description'] = null"
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        }
      }
    }
  }
}

When we insert the same document as before and execute the same query, we get no hits. The description field is no longer indexed. An important aspect is that the actual _source is not changed: when requesting the _source of the document, you still get back the original document.

GET transform/simple/1/_source
Response:
{
   "title": "This is a document with text",
   "description": "null"
}
Add a field to the mapping

To add a bit more complexity, we add a field called nullField which will contain the name of the field that was null. Not very useful, but it serves to show the possibilities.

PUT /transform
{
  "mappings": {
    "simple": {
      "transform": {
        "lang":"groovy",
        "script":"if (ctx._source['description']?.equals('null')) {ctx._source['description'] = null;ctx._source['nullField'] = 'description';}"
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "nullField": {
          "type": "string"
        }
      }
    }
  }
}

Notice that the script has changed: not only do we remove the description field, we now also add a new field called nullField. Check that the _source is still unchanged. Now we do a search and only return the fields description and nullField. Before scrolling to the response, think about the response you would expect.

GET /transform/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["nullField","description"]
}

Did you really think about it? Try it out and notice that the nullField is not returned. That is because we did not store it in the index, and it is not obtained from the _source. So if we really need this value, we can store the nullField in the index and we are fine.

PUT /transform
{
  "mappings": {
    "simple": {
      "transform": {
        "lang":"groovy",
        "script":"if (ctx._source['description']?.equals('null')) {ctx._source['description'] = null;ctx._source['nullField'] = 'description';}"
      },
      "properties": {
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "nullField": {
          "type": "string",
          "store": "yes"
        }
      }
    }
  }
}

Then, with the match_all query for the two fields, we get the following response.

GET /transform/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["nullField","description"]
}
Response:
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "transform",
            "_type": "simple",
            "_id": "1",
            "_score": 1,
            "fields": {
               "description": [
                  "null"
               ],
               "nullField": [
                  "description"
               ]
            }
         }
      ]
   }
}

Yes, now we do have the new field. That is it... but wait, there is more you need to know. There is a way to check what is actually passed to the index for a certain document.

GET transform/simple/1?pretty&_source_transform
Result:
{
   "_index": "transform",
   "_type": "simple",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
      "description": null,
      "nullField": "description",
      "title": "This is a document with text"
   }
}

Notice the null description and the nullField in the _source.

Final remark

You cannot update the transform part: think about what would happen to your index when some documents passed through version 1 of the transform and others through version 2.

I would be gentle with this feature: try to solve the problem before sending the data to elasticsearch. But maybe you have just the use case for this feature; now you know it exists.

In my next blog post I will dive a little deeper into the scripting module.

The post Transform the input before indexing in elasticsearch appeared first on Gridshore.

Categories: Architecture, Programming

Playing with the two most interesting new features of elasticsearch 1.3.0

Fri, 07/25/2014 - 11:47

Just a few days ago elasticsearch released version 1.3.0 of their flagship product. It comes with two features I find most interesting. The first is the long-awaited top hits aggregation. Basically this is what is called grouping: you want to group certain items based on one characteristic, but within each group you want the best matching result(s) based on score. The other very important feature is the new support for scripts: better security options when using scripts, through sandboxed scripting languages.

In this blogpost I am going to explain and show the top_hits feature as well as the new scripting support.


Top hits

I am going to show a very simple example of top hits using my music index. This index contains all the songs in my iTunes library. The first step is to find songs by genre. The following query gives (the default) 10 hits based on the match_all query, plus the terms aggregation as requested.

GET /mymusic/_search
{
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    }
  }
}

The response is of the format:

{
	"hits": {},
	"aggregations": {
		"byGenre": {
			"buckets": [
				{"key":"rock","doc_count":1910},
				...
			]
		}
    }
}

Now we add a query to the request: songs containing the word love in the title.

GET /mymusic/_search
{
  "query": {
    "match": {
      "name": "love"
    }
  }, 
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 10
      }
    }
  }
}

Now we have fewer hits, still a number of buckets, and the number of songs that match our query within each bucket. The biggest change is the score in the returned hits. In the previous query the score was always 1; now the score differs due to the query we execute. The highest score now is the song Love by The Mission. The genre for this song is Rock, and the song is from the year 1990. Time to introduce the top hits aggregation. With this query we can return, per genre, the top song containing the word love in the title.

GET /mymusic/_search
{
  "query": {
    "match": {
      "name": "love"
    }
  },
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 5
      },
      "aggs": {
        "topFoundHits": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

Again we get hits, but they are no different from the query before. The interesting part is the aggs part. Here we add a sub-aggregation to the byGenre aggregation. This aggregation is called topFoundHits and is of type top_hits. We only return the best hit per genre. The next code block shows the part of the response with the top hits; I removed the content of the _source field in the top_hits to keep the response shorter.

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 141,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "byGenre": {
         "buckets": [
            {
               "key": "rock",
               "doc_count": 52,
               "topFoundHits": {
                  "hits": {
                     "total": 52,
                     "max_score": 4.715253,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "4147",
                           "_score": 4.715253,
                           "_source": {
                              "name": "Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "pop",
               "doc_count": 39,
               "topFoundHits": {
                  "hits": {
                     "total": 39,
                     "max_score": 3.3341873,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "11381",
                           "_score": 3.3341873,
                           "_source": {
                              "name": "Love To Love You",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "alternative",
               "doc_count": 12,
               "topFoundHits": {
                  "hits": {
                     "total": 12,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "b",
               "doc_count": 9,
               "topFoundHits": {
                  "hits": {
                     "total": 9,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "r",
               "doc_count": 7,
               "topFoundHits": {
                  "hits": {
                     "total": 7,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

Did you notice a problem with my analyser for the genre field? Hint: look at the buckets r and b, as in R&B!

More information on the top_hits aggregation can be found here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

Scripting

Elasticsearch has had support for scripts for a long time. The default scripting language was, and up to version 1.3 still is, MVEL; it will change to Groovy in 1.4. MVEL is not a well-known scripting language. Its biggest advantage is that it is very powerful; the disadvantage is that it is too powerful. MVEL does not come with a sandbox principle, so it is possible to write some very nasty scripts, even when only doing a query. This was very well shown by a colleague of mine (Byron Voorbach), who created a query to read private keys on the machines of developers who did not safeguard their elasticsearch instance. Therefore dynamic scripting was switched off by default in version 1.2.

This came with a very big disadvantage: it was no longer possible to use the function_score query without resorting to stored scripts on the server. In version 1.3 of elasticsearch a much better approach is introduced. Now you can use sandboxed scripting languages like Groovy and keep the flexible approach. Groovy can be configured with the object creations and method calls that are allowed. More information about this is provided in the elasticsearch documentation about scripting.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

Next is an example query against my music index, which contains all the songs from my music library. It queries all the songs after the year 1999 and calculates the score based on the year, so the newest songs get the highest score. And yes, I know a sort by year desc would have given the same result.

GET mymusic/_search
{
  "query": {
    "function_score": {
      "query": {
        "range": {
          "year": {
            "gte": 2000
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "lang": "groovy", 
            "script": "_score * doc['year'].value"
          }
        }
      ]
    }
  }
}

The score now becomes high: since we execute a range query, every document initially gets a score of 1. With the function_score multiplying the year by that score, the end score is simply the year. I added the year as the only field to return; some of the results then are:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 2895,
      "max_score": 2014,
      "hits": [
         {
            "_index": "mymusic",
            "_type": "itunes",
            "_id": "12965",
            "_score": 2014,
            "fields": {
               "year": [
                  "2014"
               ]
            }
         },
         {
            "_index": "mymusic",
            "_type": "itunes",
            "_id": "12975",
            "_score": 2014,
            "fields": {
               "year": [
                  "2014"
               ]
            }
         }
      ]
   }
}

Next up is the last sample, a combination of top_hits and scripting.

Top hits with scripting

We start with the top_hits sample on my music index. Now we want to sort the buckets on the score of the best matching document in each bucket; the default is the number of documents in the bucket. As mentioned in the documentation, you need a trick to do this.

The top_hits aggregator isn’t a metric aggregator and therefore can’t be used in the order option of the terms aggregator.

GET /mymusic/_search?search_type=count
{
  "query": {
    "match": {
      "name": "love"
    }
  },
  "aggs": {
    "byGenre": {
      "terms": {
        "field": "genre",
        "size": 5,
        "order": {
          "best_hit":"desc"
        }
      },
      "aggs": {
        "topFoundHits": {
          "top_hits": {
            "size": 1
          }
        },
        "best_hit": {
          "max": {
            "lang": "groovy", 
            "script": "doc.score"
          }
        }
      }
    }
  }
}

The results of this query, again with most of the _source removed, follow. Compare them to the query in the top_hits section: notice the different genres we get back now, and also check the scores.

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 3,
      "successful": 3,
      "failed": 0
   },
   "hits": {
      "total": 141,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "byGenre": {
         "buckets": [
            {
               "key": "rock",
               "doc_count": 37,
               "topFoundHits": {
                  "hits": {
                     "total": 37,
                     "max_score": 4.715253,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "4147",
                           "_score": 4.715253,
                           "_source": {
                              "name": "Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.715252876281738
               }
            },
            {
               "key": "alternative",
               "doc_count": 12,
               "topFoundHits": {
                  "hits": {
                     "total": 12,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.194550514221191
               }
            },
            {
               "key": "punk",
               "doc_count": 3,
               "topFoundHits": {
                  "hits": {
                     "total": 3,
                     "max_score": 4.1945505,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "7889",
                           "_score": 4.1945505,
                           "_source": {
                              "name": "Love Love Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 4.194550514221191
               }
            },
            {
               "key": "pop",
               "doc_count": 24,
               "topFoundHits": {
                  "hits": {
                     "total": 24,
                     "max_score": 3.3341873,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "11381",
                           "_score": 3.3341873,
                           "_source": {
                              "name": "Love To Love You",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 3.3341872692108154
               }
            },
            {
               "key": "b",
               "doc_count": 7,
               "topFoundHits": {
                  "hits": {
                     "total": 7,
                     "max_score": 3.0271564,
                     "hits": [
                        {
                           "_index": "mymusic",
                           "_type": "itunes",
                           "_id": "2549",
                           "_score": 3.0271564,
                           "_source": {
                              "name": "First Love",
                           }
                        }
                     ]
                  }
               },
               "best_hit": {
                  "value": 3.027156352996826
               }
            }
         ]
      }
   }
}
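What the order option effectively did can be mimicked in plain Java (hypothetical data, not the aggregation framework itself): per bucket, keep the maximum document score as the best_hit metric and sort the buckets on it descending.

```java
import java.util.*;

// Sketch of the bucket-ordering trick: each genre bucket records the score of
// its best hit (the "best_hit" max aggregation) and buckets are sorted on it.
public class BucketOrderSketch {

    static List<String> orderByBestHit(Map<String, List<Double>> scoresPerGenre) {
        List<Map.Entry<String, Double>> bestHits = new ArrayList<>();
        for (Map.Entry<String, List<Double>> entry : scoresPerGenre.entrySet()) {
            // the max metric per bucket
            bestHits.add(Map.entry(entry.getKey(), Collections.max(entry.getValue())));
        }
        // the "order": {"best_hit": "desc"} part of the terms aggregation
        bestHits.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Double> entry : bestHits) {
            ordered.add(entry.getKey());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Scores taken from the response above.
        Map<String, List<Double>> scores = Map.of(
                "rock", List.of(4.715253, 2.1),
                "pop", List.of(3.3341873),
                "punk", List.of(4.1945505));
        System.out.println(orderByBestHit(scores)); // [rock, punk, pop]
    }
}
```

Note how punk, with only 3 documents, now outranks pop with 24: the ordering looks at the best score, not the document count.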

This was just a first introduction to top_hits and scripting. Stay tuned for more blogs around these topics.

The post Playing with two most interesting new features of elasticsearch 1.3.0 appeared first on Gridshore.

Categories: Architecture, Programming

Review The Twitter Story by Nick Bilton

Thu, 07/24/2014 - 23:35


A lot of people dream of creating the new Facebook, Twitter or Instagram. Nobody knows beforehand that they will create it; most of the ideas just happened to become big. Personally I do not believe I will ever create such a product. I am too busy with too many different things, usually based on the great ideas of others. One thing I do like is reading about the success stories of others. I read the books about Starbucks, Microsoft, Apple and a few others.

Recently I started reading the Twitter story. It reads like an exciting novel, yet it tells a story based on interviews and facts behind one of the most exciting companies on the internet of the last decade.

I do not think it is a coincidence that a lot of what I read in The Lean Startup, but also in the Starbucks story, comes back in the Twitter story. One thing that struck me in this book is what business does to friendship. It also shows that the people with great ideas are usually not the people that turn those ideas into a profitable company.

When starting a company based on your terrific idea, read this book and learn from it. It might make your life a lot better.

The post Review The Twitter Story by Nick Bilton appeared first on Gridshore.


Better than something or someone or just good

Sun, 07/13/2014 - 15:56

Recently I saw some tweets about the possibility to finally have a website to express your hate for Eclipse. Since I am a strong believer in another tool than Eclipse and I really don’t want to work with Eclipse, at first I thought this was funny. Then I was checking some news posts and found a post on Apple Insider showing a Samsung advertorial that bashes the iPhone battery life. Again, funny if you like Samsung or you think Android is better than the iPhone. Maybe even funny if you still love the iPhone.

These tweets and news items made me think about the marketing of products, the opinions of people and sometimes company strategies. As a company, do you want to prove you are better than someone else, or do you want to prove that you are really good? Personally I do not really care if I am better than someone else; I only care if I am good or maybe even superb.

Does it really work to be better than …? Was Argentina better than the Dutch just a week ago? If not, as so many say, did it help the Dutch? Better than is an interesting concept in soccer. If your soccer team never wins the championship, you as a fan still think they are the best. I see the same with my local soccer team: every parent thinks his kid is better than the other kids. So better than is incredibly subjective. If you say you create better software, does it sell better? What is good software? If you are into software quality you probably have a few tools to calculate the quality of software. Are you a better programmer if the lines of code you write are shorter? Are you a better software developer if you use another tool than Eclipse? I don’t really care. Use Eclipse if you want to, use something else if you can afford it.

I like to make fun of people that use Eclipse, just like I make fun of people using Linux or Windows. But in the end, I don’t really care. I know that people using Windows or Linux and working in Eclipse can make better software than I can. I do not want to work for a company that tries to sell projects only by stating that they are better than someone else. I want my company to tell customers what we are good at.

So if you are in the hating or “better than” business, put a hate comment on this blog; if not, have a good day.

The post Better than something or someone or just good appeared first on Gridshore.


Keeping a journal

Sun, 06/29/2014 - 23:34

Today I was reading the first part of a book I got as a gift from one of my customers. The book is called Show Your Work by Austin Kleon (Show Your Work! @ Amazon). The whole idea of this book is that you must be open and share what you learn and the steps you took to learn it.

I think this fits me like a glove, but I can be more expressive. Therefore I have decided to do things differently. I want to start by writing smaller pieces about the things I want to do that day, or what I accomplished that day, and give some excerpts of things I am working on. Not real blog posts or tutorials, but more notes that I share with you. Since it is a Sunday, I only want to share the book I am reading.


The post Keeping a journal appeared first on Gridshore.


Using Dropwizard in combination with Elasticsearch

Thu, 05/15/2014 - 21:09

Dropwizard logo

How often do you start creating a new application? How often have you thought about configuring an application? Where to locate a config file, how to load the file, what format to use? Another thing you regularly do is add timers to track execution time, management tools to do thread analysis, etc. From a more functional perspective you want a rich client-side application using AngularJS, so you need a REST backend to deliver json documents. Does this sound like something you need regularly? Then this blog post is for you. If you never need this, please keep on reading, you might like it.

In this blog post I will create an application that shows you all the available indexes in your elasticsearch cluster. Not very sexy, but I am going to use AngularJS, Dropwizard and elasticsearch. That should be enough to get a lot of you interested.


What is Dropwizard

Dropwizard is a framework that combines a lot of other frameworks that have become the de facto standard in their own domain: Jersey for the REST interface, Jetty as a lightweight container, Jackson for json parsing, FreeMarker for front-end templates, Metrics for the metrics, and slf4j for logging. Dropwizard provides utilities to combine these frameworks and enable you as a developer to be very productive in constructing your application. It provides building blocks like lifecycle management, resources, views, loading of bundles, configuration and initialization.

Time to jump in and start creating an application.

Structure of the application

The application is set up as a Maven project. To start off we only need one dependency:

<dependency>
    <groupId>io.dropwizard</groupId>
    <artifactId>dropwizard-core</artifactId>
    <version>${dropwizard.version}</version>
</dependency>

If you want to follow along, you can check my github repository:


https://github.com/jettro/dropwizard-elastic

Configure your application

Every application needs configuration. In our case we need to configure how to connect to elasticsearch. In Dropwizard you extend the Configuration class and create a pojo. Using Jackson and Hibernate Validator annotations we configure validation and serialization. In our case the configuration object looks like this:

public class DWESConfiguration extends Configuration {
    @NotEmpty
    private String elasticsearchHost = "localhost:9200";

    @NotEmpty
    private String clusterName = "elasticsearch";

    @JsonProperty
    public String getElasticsearchHost() {
        return elasticsearchHost;
    }

    @JsonProperty
    public void setElasticsearchHost(String elasticsearchHost) {
        this.elasticsearchHost = elasticsearchHost;
    }

    @JsonProperty
    public String getClusterName() {
        return clusterName;
    }

    @JsonProperty
    public void setClusterName(String clusterName) {
        this.clusterName = clusterName;
    }
}

Then you need to create a yml file containing the properties in the configuration as well as some nice values. In my case it looks like this:

elasticsearchHost: localhost:9300
clusterName: jc-play

How often did you start a project by creating the configuration mechanism yourself? Usually I start with Maven and quickly move to Tomcat. Not this time: we did Maven, now we did configuration. Next up is the runner for the application.

Add the runner

This is the class we run to start the application. Internally Jetty is started. We extend the Application class and use the configuration class as a generic type. This is the class that initializes the complete application: used bundles are initialized, classes are created and passed to other classes.

public class DWESApplication extends Application<DWESConfiguration> {
    private static final Logger logger = LoggerFactory.getLogger(DWESApplication.class);

    public static void main(String[] args) throws Exception {
        new DWESApplication().run(args);
    }

    @Override
    public String getName() {
        return "dropwizard-elastic";
    }

    @Override
    public void initialize(Bootstrap<DWESConfiguration> dwesConfigurationBootstrap) {
    }

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        logger.info("Running the application");
    }
}

When starting this application, we have no success: we get a big error, because we did not register any resources.

ERROR [2014-05-14 16:58:34,174] com.sun.jersey.server.impl.application.RootResourceUriRules: 
	The ResourceConfig instance does not contain any root resource classes.
Nothing happens, we just need a resource.

Before we can return something, we need to have something to return. We create a pojo called Index that contains one property called name. For now we just return this object as a json object. The following code shows the IndexResource that handles the requests related to the indexes.

@Path("/indexes")
@Produces(MediaType.APPLICATION_JSON)
public class IndexResource {

    @GET
    @Timed
    public Index showIndexes() {
        Index index = new Index();
        index.setName("A Dummy Index");

        return index;
    }
}

The @GET, @Path and @Produces annotations are from the Jersey REST library. @Timed is from the Metrics library. Before starting the application we need to register our index resource with Jersey.

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        logger.info("Running the application");
        final IndexResource indexResource = new IndexResource();
        environment.jersey().register(indexResource);
    }

Now we can start the application using the following runner from IntelliJ. Later on we will create the executable jar.

Running the app from IntelliJ

Run the application again, this time it works. You can browse to http://localhost:8080/indexes and see our dummy index as a nice json document. There is something in the logs though. I love this message; this is what you get when running the application without health checks.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!    THIS APPLICATION HAS NO HEALTHCHECKS. THIS MEANS YOU WILL NEVER KNOW      !
!     IF IT DIES IN PRODUCTION, WHICH MEANS YOU WILL NEVER KNOW IF YOU'RE      !
!    LETTING YOUR USERS DOWN. YOU SHOULD ADD A HEALTHCHECK FOR EACH OF YOUR    !
!         APPLICATION'S DEPENDENCIES WHICH FULLY (BUT LIGHTLY) TESTS IT.       !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Creating a health check

We add a health check. Since we are creating an application interacting with elasticsearch, we create a health check for elasticsearch. Don’t think too much about how we connect to elasticsearch yet; we will get there later on.

public class ESHealthCheck extends HealthCheck {

    private ESClientManager clientManager;

    public ESHealthCheck(ESClientManager clientManager) {
        this.clientManager = clientManager;
    }

    @Override
    protected Result check() throws Exception {
        ClusterHealthResponse clusterIndexHealths = clientManager.obtainClient().admin().cluster().health(new ClusterHealthRequest())
                .actionGet();
        switch (clusterIndexHealths.getStatus()) {
            case GREEN:
                return HealthCheck.Result.healthy();
            case YELLOW:
                return HealthCheck.Result.unhealthy("Cluster state is yellow, maybe replication not done? New Nodes?");
            case RED:
            default:
                return HealthCheck.Result.unhealthy("Something is very wrong with the cluster", clusterIndexHealths);

        }
    }
}
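The status-to-result mapping in the check method can be shown in isolation. A dependency-free sketch, where the enum and the string results are hypothetical stand-ins for Elasticsearch’s cluster status and Dropwizard’s HealthCheck.Result:

```java
// Dependency-free sketch of the health check mapping: green is healthy,
// yellow and red are unhealthy, each with its own message.
public class HealthMappingSketch {
    enum ClusterStatus { GREEN, YELLOW, RED }

    static String check(ClusterStatus status) {
        switch (status) {
            case GREEN:
                return "healthy";
            case YELLOW:
                return "unhealthy: cluster state is yellow, maybe replication not done? New nodes?";
            case RED:
            default:
                return "unhealthy: something is very wrong with the cluster";
        }
    }

    public static void main(String[] args) {
        System.out.println(check(ClusterStatus.GREEN)); // healthy
    }
}
```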

Just like the resource handler, the health check needs to be registered. Together with the standard http port for normal users, another port is exposed for administration. There you can find the admin reports like Metrics, Ping, Threads and Healthcheck.

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        ESClientManager clientManager = new ESClientManager(config.getElasticsearchHost(), config.getClusterName());

        logger.info("Running the application");
        final IndexResource indexResource = new IndexResource(clientManager);
        environment.jersey().register(indexResource);

        final ESHealthCheck esHealthCheck = new ESHealthCheck(clientManager);
        environment.healthChecks().register("elasticsearch", esHealthCheck);
    }

You as a reader now have an assignment: start the application and check the admin pages yourself at http://localhost:8081. We are going to connect to elasticsearch in the meantime.

Connecting to elasticsearch

We connect to elasticsearch using the transport client. This is taken care of by the ESClientManager. We make use of Dropwizard’s managed classes: the lifecycle of these classes is managed by Dropwizard. From the configuration object we take the host(s) and the cluster name. Now we can obtain a client in the start method and pass this client to the classes that need it. The first class that needs it is the health check, but we already had a look at that one. Using the ESClientManager, other classes have access to the client. The Managed interface mandates a start as well as a stop method.

    @Override
    public void start() throws Exception {
        Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", clusterName).build();

        logger.debug("Settings used for connection to elasticsearch : {}", settings.toDelimitedString('#'));

        TransportAddress[] addresses = getTransportAddresses(host);

        logger.debug("Hosts used for transport client : {}", (Object) addresses);

        this.client = new TransportClient(settings).addTransportAddresses(addresses);
    }

    @Override
    public void stop() throws Exception {
        this.client.close();
    }
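Stripped of the Elasticsearch specifics, the contract the ESClientManager fulfils is tiny. A stdlib-only sketch, where the Managed interface is a hand-rolled stand-in for io.dropwizard.lifecycle.Managed and the client is faked with a boolean:

```java
// Hand-rolled stand-in for io.dropwizard.lifecycle.Managed: Dropwizard calls
// start() when the application boots and stop() on shutdown.
interface Managed {
    void start() throws Exception;
    void stop() throws Exception;
}

public class ManagedClientSketch implements Managed {
    private boolean connected; // stands in for the TransportClient

    @Override
    public void start() throws Exception {
        connected = true; // real code builds the TransportClient here
    }

    @Override
    public void stop() throws Exception {
        connected = false; // real code closes the TransportClient here
    }

    boolean isConnected() {
        return connected;
    }

    public static void main(String[] args) throws Exception {
        ManagedClientSketch manager = new ManagedClientSketch();
        manager.start();
        System.out.println(manager.isConnected()); // true
        manager.stop();
        System.out.println(manager.isConnected()); // false
    }
}
```

The point of handing the manager to environment.lifecycle().manage() is exactly this: you never call start or stop yourself.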

We need to register our managed class with the lifecycle of the environment in the runner class.

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        ESClientManager esClientManager = new ESClientManager(config.getElasticsearchHost(), config.getClusterName());
        environment.lifecycle().manage(esClientManager);
    }	

Next we want to change the IndexResource to use the elasticsearch client to list all indexes.

    public List<Index> showIndexes() {
        IndicesStatusResponse indices = clientManager.obtainClient().admin().indices().prepareStatus().get();

        List<Index> result = new ArrayList<>();
        for (String key : indices.getIndices().keySet()) {
            Index index = new Index();
            index.setName(key);
            result.add(index);
        }
        return result;
    }

Now we can browse to http://localhost:8080/indexes and we get back a nice json object. In my case I got this:

[
	{"name":"logstash-tomcat-2014.05.02"},
	{"name":"mymusicnested"},
	{"name":"kibana-int"},
	{"name":"playwithip"},
	{"name":"logstash-tomcat-2014.05.08"},
	{"name":"mymusic"}
]
Creating a better view

Having this REST-based interface with json documents is nice, but not if you are human like me (well, kind of). So let us add some AngularJS magic to create a slightly better view. The following page can of course also be created with simpler view technologies, but I want to demonstrate what you can do with Dropwizard.

First we make it possible to use FreeMarker as a template engine. To make this work we need two additional dependencies: dropwizard-views and dropwizard-views-freemarker. The first step is a view class that knows the FreeMarker template to load and provides the fields that your template can read. In our case we want to expose the cluster name.

public class HomeView extends View {
    private final String clusterName;

    protected HomeView(String clusterName) {
        super("home.ftl");
        this.clusterName = clusterName;
    }

    public String getClusterName() {
        return clusterName;
    }
}

Then we have to create the FreeMarker template. This looks like the following code block:

<#-- @ftlvariable name="" type="nl.gridshore.dwes.HomeView" -->
<html ng-app="myApp">
<head>
    <title>DWAS</title>
</head>
<body ng-controller="IndexCtrl">
<p>Underneath a list of indexes in the cluster <strong>${clusterName?html}</strong></p>

<div ng-init="initIndexes()">
    <ul>
        <li ng-repeat="index in indexes">{{index.name}}</li>
    </ul>
</div>

<script src="/assets/js/angular-1.2.16.min.js"></script>
<script src="/assets/js/app.js"></script>
</body>
</html>

By default you put these templates in the resources folder, using the same subfolders as the package of your view class. If you look closely you see some AngularJS code; more on this later on. First we need to map a url to the view. This is done with a resource class. The following code block shows the HomeResource class that maps “/” to the HomeView.

@Path("/")
@Produces(MediaType.TEXT_HTML)
public class HomeResource {
    private String clusterName;

    public HomeResource(String clusterName) {
        this.clusterName = clusterName;
    }

    @GET
    public HomeView goHome() {
        return new HomeView(clusterName);
    }
}

Notice we now configure it to return text/html. The goHome method is annotated with @GET, so each GET request to the path “/” is mapped to the HomeView class. Now we need to tell Jersey about this mapping. That is done in the runner class.

final HomeResource homeResource = new HomeResource(config.getClusterName());
environment.jersey().register(homeResource);
Using assets

The final part I want to show is how to use the assets bundle from Dropwizard to map a folder “/assets” to part of the url. To use this bundle you have to add the following dependency in Maven: dropwizard-assets. Then we can easily map the assets folder in our resources folder to the web assets folder.

    @Override
    public void initialize(Bootstrap<DWESConfiguration> dwesConfigurationBootstrap) {
        dwesConfigurationBootstrap.addBundle(new ViewBundle());
        dwesConfigurationBootstrap.addBundle(new AssetsBundle("/assets/", "/assets/"));
    }

That is it; now you can load the AngularJS javascript file. My very basic sample has one AngularJS controller. This controller uses the $http service to call our /indexes url. The result is used to show the indexes in a list view.

var myApp = angular.module('myApp', []); // module definition, assumed to live in app.js

myApp.controller('IndexCtrl', function ($scope, $http) {
    $scope.indexes = [];

    $scope.initIndexes = function () {
        $http.get('/indexes').success(function (data) {
            $scope.indexes = data;
        });
    };
});

And the result

the very basic screen showing the indexes

Concluding

This was my first go at using Dropwizard, and I must admit I like what I have seen so far. I am not sure if I would create a big application with it; on the other hand, it is really structured. Before moving on I would need to read a bit more about the library and check all of its options. There is a lot more possible than what I have shown you here.


The post Using Dropwizard in combination with Elasticsearch appeared first on Gridshore.
