Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Gridshore
A weblog about software engineering, architecture, technology and other things we like.

Introducing a query tool for elasticsearch

Tue, 03/12/2013 - 20:47

In my previous blog posts I was working on reading data from WordPress blogs using groovy. This is nice, but of course there was a reason why I needed this: I wanted to create a tool, or better, a plugin for Elasticsearch. This plugin should make it easier to check the state of your cluster, play around with facets and query your data.

On my employer's blog I started a series of blog posts that explain this plugin. I go into detail on the libraries I used: AngularJS, Twitter Bootstrap and of course elastic.js.

Check my employer's blog post if you are interested.

Introducing a query tool as an elasticsearch plugin (part 1)

The post Introducing a query tool for elasticsearch appeared first on Gridshore.

Categories: Architecture, Programming

Doing more ElasticSearch with groovy

Thu, 01/31/2013 - 22:07

In my previous blog post I wrote about a groovy client for reading a wordpress blog. Then, using this client, I sent the data to ElasticSearch to be indexed. Of course you cannot do anything with ElasticSearch if you do not read the data by executing queries, so that blog post also talks about executing search queries and count queries.

But what if you want to start playing with things like facets? What if you want to use a different analyzer to separate the keywords on a comma? Then you can use curl. No wait, there is more: you can also use groovy of course.

That is what I discuss in this blog post: creating and removing indexes. Beware, this is not something you want to do on your production server. Deleting an index makes the index disappear, yes for real, you cannot get it back.

Read on if you want to learn more about groovy and ElasticSearch.


Prequel

Please read my previous post if you want to understand the sample that I am creating. I will not discuss it in this blog post.

Learning about ElasticSearch

Delete an index

Again, be careful: deleting an index should usually not be possible in production, so you should disable it there.

The ElasticSearchGateway contains the following method to delete an index.

public deleteIndex() {
    try {
        def response = node.client.admin.indices.prepareDelete(indexValue).execute().get()
        if (response.acknowledged) {
            println "The index is removed"
        } else {
            println "The index could not be removed"
        }
    } catch (Exception e) {
        println "The index you want to delete is missing : ${e.message}"
    }
}

node is an instance of GNode created in the constructor of my Gateway class. You ask for the client to get access to the ElasticSearch cluster. Then you ask for the admin part, on which you can call index related operations. In this case we call the prepareDelete method. I put a try-catch around the delete statement because I do not want the script to stop if the index cannot be deleted because it is not there.

Creating the index

Before I show you how to create an index with some advanced stuff going on in there, I want to explain why I needed it. I want to create facets around the keywords and the categories in my blogs. The format of the keywords field is:

Yvonne van der Mey, fotografie, bloemen

I want three terms out of this after analyzing for creating facets. Therefore I want to tokenize based on the comma and I want to strip the spaces from the items. This can be done with a custom analyzer. The custom analyzer refers to a pattern based tokenizer and a trimming filter. In groovy you can create this index, with settings and mappings as shown in the next block.

public createIndex() {
    def future = node.client.admin.indices.create {
        index = this.indexValue
        settings {
            number_of_shards = "1"
            analysis {
                analyzer {
                    comma {
                        type = "custom"
                        tokenizer = "bycomma"
                        filter = ["nowhite"]
                    }
                }
                tokenizer {
                    bycomma {
                        type = "pattern"
                        pattern = ","
                    }
                }
                filter {
                    nowhite {
                        type = "trim"
                    }
                }
            }
        }
        mapping this.typeValue, {
            "${this.typeValue}" {
                properties {
                    keywords {
                        type = "string"
                        analyzer = "comma"
                    }
                    categories {
                        type = "string"
                        analyzer = "comma"
                    }
                }
            }
        }
    }

    future.success = { CreateIndexResponse response ->
        println "Index is created"
    }

    future.failure = {
        println "ERROR creating index $it"
    }

}

With groovy you can use the closure notation to create items like settings and mappings. Behind the scenes the closure is transformed into the json structure and passed to the method that takes a string containing the source. The notation looks very similar to the json you provide when using curl.
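To get a feeling for how a closure can turn into json, here is a minimal sketch using the JsonBuilder that ships with groovy. This is only a stand-in: the ElasticSearch groovy client uses its own GXContentBuilder, and JsonBuilder wants method-call notation (type "custom") rather than the assignments used in the index code, so treat it as an illustration of the idea and not as what the client does internally.

```groovy
import groovy.json.JsonBuilder

// Stand-in for the client's closure handling: JsonBuilder also turns a
// closure into a json structure, using method calls instead of assignments.
def builder = new JsonBuilder()
builder {
    number_of_shards "1"
    analysis {
        analyzer {
            comma {
                type "custom"
                tokenizer "bycomma"
                filter(["nowhite"])
            }
        }
        tokenizer {
            bycomma {
                type "pattern"
                pattern ","
            }
        }
        filter {
            nowhite {
                type "trim"
            }
        }
    }
}

// Print the json that the closure was transformed into.
println builder.toPrettyString()
```

The printed structure has the same shape as the settings body you would send with curl when creating the index.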

A nice way to test if your analyzer works is using curl. You can actually call the analyzer with a string to be analyzed.

[~]$ curl -XGET 'localhost:9200/coenradie/_analyze?analyzer=comma&pretty=true' -d 'this ,is a ,test, with a lot of,difference,'
{
  "tokens" : [ {
    "token" : "this",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "is a",
    "start_offset" : 6,
    "end_offset" : 11,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "test",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "with a lot of",
    "start_offset" : 17,
    "end_offset" : 31,
    "type" : "word",
    "position" : 4
  }, {
    "token" : "difference",
    "start_offset" : 32,
    "end_offset" : 42,
    "type" : "word",
    "position" : 5
  } ]
}

As you can see, the spaces at the beginning and end of each token are removed, but multiple words between two commas stay together as one token. Exactly what I want.
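If you do not have a node running, you can approximate what the analyzer does in plain groovy. This is just a sketch of the behaviour (split on the comma, trim each piece), not the Lucene code that ElasticSearch actually runs; the closure name analyzeByComma is made up for this example, and dropping the empty pieces is my assumption to match the curl output above.

```groovy
// Rough imitation of the "comma" analyzer: a pattern tokenizer on ","
// followed by a trim filter. Empty pieces are dropped.
def analyzeByComma = { String text ->
    text.split(',').collect { it.trim() }.findAll { it }
}

def tokens = analyzeByComma('this ,is a ,test, with a lot of,difference,')
println tokens

// The keywords field from the blog gives three terms.
println analyzeByComma('Yvonne van der Mey, fotografie, bloemen')
```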

The code also shows we use this analyzer for two fields: keywords and categories. The next screen shows another project I am working on. This project is actually the main project, but I needed some content, so I created the groovy scripts to find content and send it to ElasticSearch. Check the facets at the bottom of the screen.

Screen Shot 2013 01 31 at 22 03 28

I hope you like this addition. My next blog will be about ElasticSearch and writing a plugin using AngularJS.

The post Doing more ElasticSearch with groovy appeared first on Gridshore.

Categories: Architecture, Programming

Learning about ElasticSearch

Sun, 01/27/2013 - 10:04

This week I had a training at Trifork Amsterdam by Martijn and Uri from ElasticSearch. This training was a very nice in-depth look at the capabilities of ElasticSearch. Like with all trainings and conferences I get motivated to try out the technology immediately. I am working on a plugin for ElasticSearch using AngularJS. More on this in a next blog at Trifork.

In this blog I am going to tell you the steps I took to get going with ElasticSearch. I show some of the steps for installing ElasticSearch and explain the first steps in using the groovy library. Next to that I’ll show you how to index this wordpress blog using ElasticSearch. When everything is indexed I will of course show some of the queries you can perform.

Let us get going.


Installing and configuring

I do not want to make this part too extensive. The documentation on the ElasticSearch site is thorough, so extra lines from me would be too much. Some of the things I want to accomplish in the configuration are:

  • Start ElasticSearch with the logs in your command prompt: -f
  • Specify the location of the configuration file: -Des.config=/path/to/config/file
    • Change the name of the cluster and the node: cluster.name: jc-elasticsearch and node.name: “Node-gridshore”
    • Disable the multiple shards and replicas for local development: index.number_of_shards: 1 and index.number_of_replicas: 0
    • Path to the location of the configuration files: path.conf: /path/to/conf
    • Other paths, like work folder, logs folder, data folder and plugin folder.
  • Install the head plugin to have a look at what is happening in your cluster. I have chosen to install the plugin in the normal directory and then copy it to my configured plugins folder.

Don’t forget to install and use Java 7; all that Java 6 stuff is really old and slow compared to 7. ElasticSearch can do a better job when running on 7.

Those are the first commands to get everything running, so once more the steps:

  1. Download ElasticSearch: http://www.elasticsearch.org/download/. I used 0.20.2
  2. Create a directory structure outside of the downloaded ElasticSearch folder: config, data, logs, plugins, work.
  3. Copy the elasticsearch.yml and logging.yml to the config folder and make the changes as mentioned.
  4. Install the plugin and copy it to your plugins folder: bin/plugin -install Aconex/elasticsearch-head
  5. Move to the main folder of your ElasticSearch download and execute the command (with your location of the config file)
bin/elasticsearch -f -Des.config=/.../elasticsearch/projects/gridshore/config/elasticsearch.yml

Now you can browse to the head plugin and see that not a lot has happened yet.

http://localhost:9200/_plugin/head/index.html

Before going to the index creation you can play around with some aspects of the REST api that ElasticSearch exposes. Try a few of these requests with curl to see the data that ElasticSearch returns. Don’t use the pretty=true in production.

http://localhost:9200/_cluster/state?pretty=true

http://localhost:9200/_cluster/nodes?pretty=true
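Since the rest of this post uses groovy, the same requests can also be sketched in a few lines of groovy instead of curl. The closure names endpointUrl and fetchJson are made up for this example, and the base url assumes a node running locally on the default port as in the urls above.

```groovy
import groovy.json.JsonSlurper

// Assumption: a local node on the default http port.
def baseUrl = 'http://localhost:9200'

// Build the url for an endpoint, optionally asking for pretty printed json.
def endpointUrl = { String path, boolean pretty = false ->
    ("${baseUrl}/${path}" + (pretty ? '?pretty=true' : '')).toString()
}

// Fetch an endpoint and parse the json response (needs a running node).
def fetchJson = { String path ->
    new JsonSlurper().parseText(new URL(endpointUrl(path)).text)
}

println endpointUrl('_cluster/state', true)
println endpointUrl('_cluster/nodes', true)

// With a node running you could inspect the parsed response, for example:
// println fetchJson('_cluster/state')
```

Parsing the response with JsonSlurper gives you maps and lists, so there is no need for pretty=true when you read the data from a script.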

Reading the data using Groovy

Of course you can create the index and the required mapping using the curl based api. You can also just insert a document; the index and a default mapping will then be created automatically. In our case I first want to introduce you to the groovy client I use to read data from the blog using xmlrpc, and to the data model. Then we move on to the mapping.

In the sample code I use gradle to configure the project and my environment (Intellij).

dependencies {
    groovy 'org.codehaus.groovy:groovy:2.0.6'
    groovy 'org.codehaus.groovy:groovy-all:2.0.6'
    groovy ('org.codehaus.groovy:groovy-xmlrpc:0.8') {
        exclude module: 'groovy-all'
    }
    groovy 'commons-cli:commons-cli:1.2'
    groovy 'log4j:log4j:1.2.16'

    groovy ('org.elasticsearch:elasticsearch-lang-groovy:1.2.0') {
        exclude module: 'groovy-all'
    }
}

Something strange is going on with the groovy and groovy-all libraries: I had some issues combining intellij and gradle, and this combination seems to work for me. Notice that we need the groovy-xmlrpc library and of course the elasticsearch-lang-groovy library.

Using the following groovy class we can read all posts from the gridshore blog. Since the code is straightforward, I am not going into any detail.

package nl.gridshore.wordpress

import groovy.net.xmlrpc.XMLRPCServerProxy

class WordPressReader {
    private String xmlrpcUrl
    private String username
    private String password

    private XMLRPCServerProxy serverProxy;

    def WordPressReader(xmlrpcUrl, username, password) {
        this.xmlrpcUrl = xmlrpcUrl
        this.username = username
        this.password = password

        serverProxy = new XMLRPCServerProxy(xmlrpcUrl)
        serverProxy.setBasicAuth(username, password)
    }

    def obtainMostRecentPosts(int number = 10) {
        def posts = []
        def foundPosts = serverProxy.metaWeblog.getRecentPosts(1, username, password, number)
        foundPosts.each {post ->
            def blogItem = new BlogItem()
            blogItem.id = post['postid']
            blogItem.link = post['permaLink']
            blogItem.status = post['post_status']
            blogItem.keywords = post['mt_keywords']
            blogItem.title = post['title']
            blogItem.createdOn = post['dateCreated']
            blogItem.content = post['description']
            blogItem.categories = post['categories']
            blogItem.author = post['wp_author_display_name']
            blogItem.slug = post['wp_slug']
            posts.add(blogItem)
        }
        return posts
    }
}
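The BlogItem class itself is not shown in this post. A minimal sketch of what such a data holder could look like is given below; the property names are taken from the reader above, but the types are my assumptions and the real class in the samples project may differ.

```groovy
// Hypothetical sketch of the BlogItem data holder. Property names come from
// the reader above; the types are assumptions, not the original class.
class BlogItem {
    def id
    String link
    String status
    String keywords      // comma separated, e.g. "Yvonne van der Mey, fotografie, bloemen"
    String title
    def createdOn
    String content
    def categories
    String author
    String slug
}

// Groovy generates a map based constructor, so a test item is easy to create.
def item = new BlogItem(title: 'Doing more with groovy',
                        keywords: 'groovy, elasticsearch',
                        categories: ['Programming'])
println "${item.title}: ${item.keywords}"
```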

Time to put some stuff into ElasticSearch

Put some data into the index

The groovy library is very easy to use. The following code block shows the opening and closing of the connection. We make use of a client node that does not contain data. Take special notice of the line where we set a property for GXContentBuilder; this is required to enable the configuration using a closure. In the constructor we open the connection, and you have to close the connection yourself using the close method.

package nl.gridshore.elasticsearch

import nl.gridshore.wordpress.BlogItem
import org.elasticsearch.action.index.IndexResponse
import org.elasticsearch.groovy.common.xcontent.GXContentBuilder
import org.elasticsearch.groovy.node.GNode
import org.elasticsearch.groovy.node.GNodeBuilder

import static org.elasticsearch.groovy.node.GNodeBuilder.nodeBuilder

class ElasticSearchGateway {
    GNode node

    ElasticSearchGateway() {
        GXContentBuilder.rootResolveStrategy = Closure.DELEGATE_FIRST; // required to use a closure as settings

        GNodeBuilder nodeBuilder = nodeBuilder();
        nodeBuilder.settings {
            node {
                client = true
            }

            cluster {
                name = "jc-elasticsearch"
            }
        }

        node = nodeBuilder.node()
    }

    public close() {
        node.stop().close()
    }
}

Now that we know how to obtain a connection, let us obtain some data and store the documents in the ElasticSearch index. The next method also comes from the ElasticSearchGateway class that I have created. This is the most basic version, which auto-creates an index and the complete mapping.

    public indexBlogItem(BlogItem blogItem) {
        def future = node.client.index {
            index = "gridshore"
            type = "blog"
            source {
                blogId = blogItem.id
                link = blogItem.link
                status = blogItem.status
                keywords = blogItem.keywords
                title = blogItem.title
                createdOn_date = blogItem.createdOn
                content = blogItem.content
                categories = blogItem.categories
                author = blogItem.author
                slug = blogItem.slug
            }
        }

        future.success = {IndexResponse response ->
            println "Indexed $response.index/$response.type/$response.id"
        }
    }

With the following script we obtain 100 items from my blog and add them to the ElasticSearch index.

import nl.gridshore.elasticsearch.ElasticSearchGateway
import nl.gridshore.wordpress.BlogItem
import nl.gridshore.wordpress.WordPressReader

def rpcUrl = "http://www.gridshore.nl/xmlrpc.php"
def username = "?"
def password = "?"

def reader = new WordPressReader(rpcUrl,username,password)

def posts = reader.obtainMostRecentPosts(100)

ElasticSearchGateway gateway = new ElasticSearchGateway()

posts.each {BlogItem item ->
    println item.title
    gateway.indexBlogItem(item)
}

System.in.withReader {
    print 'input: '
    println it.readLine()
}

gateway.close()

Execute queries on the data

Now that I told you I have data in the index, let us create a query to check if there is actually something in the index. Of course we use groovy to query the index and to print the results.

    public queryIndex(theTerm) {
        def search = node.client.search {
            indices "gridshore"   // method call; "indices : ..." would only be a label
            types "blog"
            source {
                query {
                    term(_all: theTerm)
                }
            }
        }

        search.response.hits.each {SearchHit hit ->
            println "Got hit $hit.id from $hit.index/$hit.type with title $hit.source.title"
        }
    }

By running the following script I get the shown results.

import nl.gridshore.elasticsearch.ElasticSearchGateway

ElasticSearchGateway gateway = new ElasticSearchGateway()

gateway.queryIndex("groovy")

System.in.withReader {
    print 'input: '
    println it.readLine()
}

gateway.close()

Got hit Huib3BPiTEua7Yvy6kc83w from gridshore/blog with title Doing more with groovy
Got hit lp79gd1UQ6mSdp3yp7O8Jg from gridshore/blog with title Analyzing beet results with groovy
Got hit CoRdYwD9ScyNnocqDvLGYg from gridshore/blog with title Cleaning up your maven repository with groovy
Got hit _U2q3miSRn6dTdUJB9PNYA from gridshore/blog with title Cleaning up artifactory with a groovy script
Got hit StrcQtnwTJSIvlu0pYojpA from gridshore/blog with title Exposing jmx through jmxmp and reading the jmx data with groovy
Got hit iQz0QFtYT3af9-Rls4d8Zw from gridshore/blog with title Use Grails and Axon to create a CQRS application (part II)
Got hit fajgAgrgST6AnNwCeyyocA from gridshore/blog with title Recap of the year 2010
Got hit V6h65a_rQ2mOvpE-POTF4Q from gridshore/blog with title Use Grails and Axon to create a CQRS application (part I)
Got hit xdi2KN9zQf2JD8qdbhBmdQ from gridshore/blog with title Using the NOS open data API with the springframework and jackson
Got hit DOiahRyMS0KUbg9e6BQHpQ from gridshore/blog with title Doing grails, yes I like it

The last function I want to show is counting documents. The following code block shows how to count all documents in the gridshore index of type blog. Given that we imported 100 blog items, the result should be obvious.

    public countAllDocuments() {
        def count = node.client.count {
            indices "gridshore"   // method call; "indices : ..." would only be a label
            types "blog"
        }
        
        println "Number of found blog items : $count.response.count"
    }

Number of found blog items : 100

This is it, you can find the sources online @github. Check my GridshoreSamples project, in there you’ll find a small project called groovy-es-client.

https://github.com/jettro/GridshoreSamples

The post Learning about ElasticSearch appeared first on Gridshore.

Categories: Architecture, Programming

Review of the book: Presentation Patterns

Sun, 12/30/2012 - 13:10

I have written a review about the book Presentation Patterns: Techniques for Crafting Better Presentations (@Amazon)

In the past I have read multiple books about creating and giving presentations. I really like to present something to the public. I have presented at numerous NLJug events as well as at a lot of internal events @ trifork.

These books have helped me to improve my presentation skills as well as the process to come up with nice presentations.

If you are also interested in these kinds of books and you do not mind that I earn a few bucks, you can buy these books that I highly recommend @ Amazon.

The post Review of the book: Presentation Patterns appeared first on Gridshore.

Categories: Architecture, Programming

This was my 2012

Fri, 12/28/2012 - 11:27

It is almost the end of the year, we are getting ready for 2013. Therefore I want to look back at my 2012 as a software engineer. I want to look back at the blogs I have written, the presentations I have given, the books I read and the conferences I attended. This blog post will give a nice overview of my year and therefore a lot of links to the things I found interesting.

Twitter

I am not a very active user on twitter with @gridshore. I tweet more with my personal account @jettroCoenradie. Still it is not hard to see what I was busy with this year. Check my twitter overview created with vizify.

Presentations

This year I presented at two events. At the Hippo Gettogether I presented about the ldap integration we did at the University of Amsterdam. You can find the slides here at slideshare.

At the yearly JFall conference of the NLJug I presented about creating polyglot and scalable applications. You can download the slides, but you can also watch the presentation on Parleys.

Blogs

When looking back I see there is not a lot of activity on the gridshore website. Most of my activity moved to my employer's blog. Therefore I have decided to write a short post in here as well when I write a blog post at my employer's blog.

My year started with a blog post about running the Hippo CMS components from within intellij. At Hippo they preferred the cargo plugin to run their components, which is not ideal when debugging your software. Therefore I came up with a way to make it easy to run your components in intellij.

The next blogpost I wrote was about creating a MongoDB based event store for the Axonframework. Allard made some nice improvements to the event store, but this was the foundation. The event store is used in a sample application called the Axon Trader. More on this later on.

I am a hobbyist photographer. I attend a number of workshops every year at Yvonne van der Mey. She needed a new website and I like to help people out. Therefore I created a wordpress based website for her. I have a number of other wordpress websites under maintenance (www.wateenjuweeltje.nl, www.coenradie.com, www.nicobulder.com). In the beginning I was doing my editing on the server, but I wanted to have a better solution. I started using git to have a history for my scripts. Manually copying the sources to the server was not nice. Therefore I created a few scripts that ask git which changes were not sent to the server yet. Using ftp I send these files to the server. Of course I wrote a blog post about this: Deployments with Git and Bash on a Mac.

I continued building applications using Axonframework and MongoDB. I already created the event store, but an application using axon also has a query side. For the Axon Trader sample I wanted to use MongoDB for the query side as well. I heard about the spring-data project and thought there was a good match for my sample. In this blog post I discussed my experiences with the spring-data project and the MongoDB module specifically.

For a number of projects we wanted information about the server using a ping request. I thought it was good to create something reusable, therefore I created the Healthcheck library. In this blogpost you can read more about the java based health check library.

For the University of Amsterdam we are creating a Hippo based solution for their new websites. Like with other hippo projects we have a lot of integration to do. I usually create an administrator application for managing all these integration components. Of course we do not want everybody to be able to log in, therefore we have created an authentication mechanism for Spring-security to authenticate against a hippo repository. This blog post discusses that mechanism.

During 2012 I started learning about Vert.x. This is a very nice application framework to create scalable/polyglot applications on the JVM. I had an idea about a presentation proposal for the NLJuG JFall event. Before being able to present about it I had to learn. Usually I learn by creating sample applications and blogging about my experiences. Therefore I wrote this blog post about my first steps with vert.x. This was a very well read blogpost that was tweeted about 32 times. Nice to know that people liked what I was doing.

During the Devoxx conference I learned about AngularJS. I really liked the idea around this JavaScript framework. It also seemed easy enough to work with. With my interest in Vert.x and Axonframework I wanted to combine this knowledge in a sample application. This blog post described my experiences: Basic Axon Framework sample using vert.x and angular.js.

Conferences

In May I attended the GoTo conference in Amsterdam. At Trifork we organise all the GoTo conferences. The one in Amsterdam was a nice conference this year. I learned a lot and got really inspired by a few talks.

In November I was able to join the Devoxx conference. What a thrill this is. I attended a lot of presentations and we joined a booth with 10gen, the creators of MongoDB. I had some very interesting talks that made me think and want to try out new stuff. I really liked the talk about AngularJS. The evening I got back I rewrote a part of a sample using AngularJS. This is really something for the future. Here you can read more about my devoxx 2012.

iOS

Screen ios rekenen

This year I also started learning about iOS development. I started with an app for the iPad. But since both my kids have an iPod touch I wanted to create a very basic application for the kids to practice mathematics. I’ll write more about this in 2013.

Books

Like every year I spend some money on books. A lot of them are eBooks, but I also keep buying hard copy books.

  • Presentation patterns – Neal Ford
  • The lean startup – Eric Ries
  • Just enough software architecture – George Fairbanks
  • Java Performance – Charlie Hunt
  • Don’t make me think – Steve Krug
  • Spring Integration in Action – Mark Fisher
  • 7 databases in 7 weeks – Eric Redmond
  • iOS Storyboarding – Dr. Rory Lewis
  • The iOS 5 developers Cookbook – Erica Sadun
  • iOS SDK Development – Chris Adamson

The post This was my 2012 appeared first on Gridshore.

Categories: Architecture, Programming