
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Architecture

Paper: Nanocubes for Real-Time Exploration of Spatiotemporal Datasets

How do you turn Big Data into fast, useful, and interesting visualizations? Using R and a technology called Nanocubes. The visualizations are stunning and amazingly responsive. Almost as interesting as the technology behind them.

David Smith wrote a great article explaining the technology and showing a demo by Simon Urbanek of a visualization that uses 32 TB of Twitter data. It runs smoothly and interactively on a single machine with 16 GB of RAM. For more information and demos, go to nanocubes.net.

David Smith sums it up nicely:

Despite the massive number of data points and the beauty and complexity of the real-time data visualization, it runs impressively quickly. The underlying data structure is based on Nanocubes, a fast datastructure for in-memory data cubes. The basic idea is that nanocubes aggregate data hierarchically, so that as you zoom in and out of the interactive application, one pixel on the screen is mapped to just one data point, aggregated from the many that sit "behind" that pixel. Learn more about nanocubes, and try out the application yourself (modern browser required) at the link below.

Abstract from Nanocubes for Real-Time Exploration of Spatiotemporal Datasets:

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop’s main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
Categories: Architecture

Declaration of Interdependence

You might already know the Agile Manifesto:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

But do you know the Declaration of Interdependence:

  • We increase return on investment by making continuous flow of value our focus.
  • We deliver reliable results by engaging customers in frequent interactions and shared ownership.
  • We expect uncertainty and manage for it through iterations, anticipation, and adaptation.
  • We unleash creativity and innovation by recognizing that individuals are the ultimate source of value, and creating an environment where they can make a difference.
  • We boost performance through group accountability for results and shared responsibility for team effectiveness.
  • We improve effectiveness and reliability through situationally specific strategies, processes and practices.

While the Agile Manifesto is geared toward Agile practitioners, the Declaration of Interdependence is geared towards Agile project leaders.

When you know the values that shape things, it helps you better understand why things are the way they are. 

Notice how you can read the Agile Manifesto as, “we value this more than that” and you can read the Declaration of Interdependence as “this benefit we achieve through this.”  Those are actually powerful and repeatable language patterns.  I’ve found myself drawing from those patterns over the years, whenever I was trying to articulate operating principles (which is a good name for principles that guide how you operate.)

You Might Also Like

3 Ways to Accelerate Business Value

Extreme Programming (XP) at a Glance

How We Adhered to the Agile Manifesto on the patterns & practices Team

The patterns & practices Way

Using MadLibs to Create Actionable Principles and Practices

Categories: Architecture, Programming

The Power of Annual Reviews for Personal Development

Talk about taking some things for granted.  Especially when it’s a love-hate relationship.  I’m talking about Annual Reviews. 

I didn’t realize how valuable they can be when you own the process and you line them up with your bigger goal setting for life.  I’ve done them for so long, in this way, that I forgot how much they are a part of my process for carving out a high-impact year.

I know I might do things a bit differently in terms of how I do my review, so I highlighted the key things in my post:

The Power of Annual Reviews for Achieving Your Goals and Realizing Your Potential

Note that if you hate the term Annual Review because it conjures up a bunch of bad memories, then consider calling it your Annual Retrospective.  If you’re a Scrum fan, you’ll appreciate the twist.

Here’s the big idea:

If you “own” your Annual Review, you can use taking a look back to take a leap forward.

What I mean is that if you are pro-active in your approach, and if you really use feedback as a gift, you can gain tremendous insights into your personal growth and capabilities.

Here’s a summary of what I do in terms of my overall review process:

  1. Take a Look Back.  In December, I take a look back.  For example, this would be my 2013 Year in Review.  What did I achieve?  What went well?  What didn’t go well?  How did I do against my 3-5 key goals that really mattered?  I use The Rule of 3, so really, I care about 3 significant changes that I can tell a story around for the year (The value of a story is the value of the change, and the value of the change is the value of the challenge.)
  2. Take a Look Forward.  Also in December, I take a look ahead.  What are my 3-5 big goals that I want to achieve for this year?  I really focus on 3 wins for each year.  The key is to hone in on the changes that matter.  If it’s not a change, then it’s business as usual, and doesn’t really need my attention because it’s already a habit and I’m already doing it.
  3. Align Work + Life.  When the Microsoft mid-year process starts, I figure out what I want to achieve in terms of themes and goals for the year at work.  I’ve already got my bigger picture in mind.   Now it’s just a matter of ensuring alignment between work and life.  There’s always a way to create better alignment and better leverage, and that’s how we empower ourselves to flourish in work and life.

It’s not an easy process.  But that’s just it.  That’s what makes it worth it.  It’s a tough look at the hard stuff that matters.  The parts of the process that make it a challenge are the opportunities for growth.  Looking back, I can see how much easier it is for me to really plan out a high-impact year where I live my values and play to my strengths.  I can also see early warning signs and anticipate downstream challenges.  I know when I first started, it was daunting to figure out what a year might look like.  Now, it’s almost too easy.

This gives me a great chance up front to play out a lot of “What If?” scenarios.  This also gives me a great chance right up front to ask the question, if this is how the year will play out, is that the ride I want to be on?  The ability to plan out our future capability vision, design a better future, and change our course is part of owning our destiny.

In my experience, a solid plan at the right level gives you more flexibility and helps you make smarter choices, before you become a frog in the boiling pot.

If you haven’t taken the chance to really own and drive your Annual Review, then consider doing an Annual Retrospective, and use the process to help you leapfrog ahead.

Make this YOUR year.

You Might Also Like

2012 Year in Review

Anatomy of a High-Potential

Mid-Year Review, Career, and Getting Ahead

Performance Review Template

The Guerilla Guide to Getting a Better Performance Review at Microsoft

Categories: Architecture, Programming

Phoney Deadlines are Deadly for Achieving Results

Xebia Blog - Sat, 12/28/2013 - 17:55

Ever had deadlines that must be met, causing short-term decisions to be made? Ever worked overtime with your team to meet an important deadline, only to see the delivered product go unused for a couple of weeks?

I believe we all know these examples where deadlines are imposed on the team for questionable reasons.

Yet, deadlines are part of reality and we have to deal with them. Certainly, there is business value in meeting them but they also have costs.

The Never-Ending Story of Shifting Deadlines…

Some time ago I was involved in a project for delivering personalised advertisements on mobile phones. At that time this was quite revolutionary and we didn’t know how the market would react. Therefore, a team of skilled engineers and marketers was assembled and we set out to deliver a prototype in a couple of months and test it in real life, i.e. with real phones and in the mobile network. This was a success and we got the assignment to turn it into a commercial product, version 1.0.
At that time there was no paying customer for the product yet, so we built it targeting multiple potential customers.

For the investment to make sense, the deadline for delivering version 1.0 was set at 8 months.

The prototype worked fine, but how do you achieve a robust product when it has to scale to millions of subscribers and thousands of advertisements per second? What architecture should we use? Should we build upon the prototype, or throw it away and start over with the acquired knowledge?

A new architecture would require new technology, which in turn would require training and time to get acquainted with it. Time we did not have, as the deadline was only 8 months away. We double-checked whether the deadline could be moved to a later date. Of course, this wasn't possible, as it would invalidate the business case. We decided not to throw away the prototype but to enhance it further.

As the deadline approached, it became clear that we were not going to deliver a 1.0 product. The causes were multiple: the prototype's architecture did not quite fit the 1.0 needs, the scope changed along the way as marketing got new insights from the changing market, the team grew in size, and the integration with other network components took time as well.
So the deadline was extended by another 6 months.

The deadline was shifted two more times.

This felt really bad. It felt like we had let down both the company and the product owner by not delivering on time. We had the best people on the team and already had a working prototype. How come we weren't able to deliver? What happened? What could we do to prevent this from happening again?

Then the new product was going to be announced at a large telecom conference. This was what the product (and the team) needed; we still had a deadline, but this time there was a clear goal associated with it, namely a demo version for attracting the first customers! Moreover, there was a small time window for delivering the product; missing the deadline would mean losing an important opportunity, with severe impact on the business case. This time we made the deadline.

The conference was a success and we got our first customers; of course new deadlines followed and this time with clear goals related to the needs of specific customers.

The Effect Deadlines Have

Looking back, which is always easy, we could have finished the product much earlier if the initial deadline had been set at a much later date. Certainly, there was value in being able to deliver a product very fast, i.e. in having short-term deadlines. On the other hand, there were also costs associated with these short-term deadlines, including:

  • struggling with new technologies, because we did not take the time for the necessary training and for gaining experience,
  • working with an architecture that does not quite fit, causing more and more workarounds,
  • relatively simple tasks becoming more complex over time.

In this case the short-term deadline put pressure on the team to deliver quickly, causing the team to take shortcuts along the way, which in turn caused delays and refactoring later on. Over time, fewer results get delivered.
What makes this pattern hard to fix is that imposing a deadline does deliver short-term results, and so it seems a good way to get results from the team.

This pattern is known as ‘Shifting the Burden’ and is depicted below. In the example above, the root cause is a lack of trust that the team will get the job done. This distrust is addressed by imposing a deadline as a way to ‘force’ the team to deliver.

(Figure: ‘Shifting the Burden’ causal loop diagram for imposing deadlines)

The balancing (B) loop on top is the short-term solution to the problem of getting results from the team. The fear of lacking focus, and therefore results, leads to imposing a deadline, thereby increasing the focus (and results) of the team. The problem symptom diminishes, but it will reappear, causing an 'addiction' to the loop of imposing deadlines.

The fundamental solution of trusting the team, prioritising, and giving them goals is often overlooked. This fundamental solution is also less obvious, and it costs the organisation energy and effort to implement. The short-term solution has unwanted side effects that in the long run (the slashed arrow) have a negative impact on the team's results.

In the example above, the fundamental solution consisted of setting, and prioritising towards, the goal of a successful demo at the conference. This worked because it was a short-term and realistic goal. Furthermore, the urgency was evident to the team: there was not going to be a second chance if this deadline was missed.

Another example
In practice I also encounter the situation in which deadlines are imposed on a team that seems to lack focus. The underlying problem is the lack of a (business) vision and goals. The symptom as experienced by the organisation is the lack of concrete results. In fact the team does work hard, but it does so by working on multiple goals at the same time. Here, clear goals and prioritising the work to be done first will help.

(Figure: ‘Shifting the Burden’ diagram for a team lacking vision and goals)

In this example too, imposing a deadline to ‘solve’ the problem has the unintended side effect of leaving the underlying problem unaddressed. This will make the problem of the team not delivering results reappear.

Goals & Deadlines

The deadlines in the examples of the previous section are what I call phoney deadlines. When a deadline is imposed in practice, it usually also implies a fixed scope and a fixed date.

Deadlines
Deadlines should be related to the business case and should induce large costs if not met. For the deadlines in the examples above this was not the case.

Examples of deadlines that carry large costs if not met are:

  • deadlines associated with a small market window for introducing a product (or set of features); the cost of missing that window is very high,
  • deadlines associated with the implementation dates of laws; again, missing these deadlines severely harms the business,
  • hardware that reaches end of life,
  • support contracts that end.

Goals
In the story above, the ‘real’ deadline was actually 2 years instead of 8 months. In this case the deadline was probably used as a management tool, with all good intentions, to get the team focussed on producing results. In fact, it caused the team to take shortcuts in order to meet the deadline.

Getting focus in teams is done by giving the team a goal: a series of subgoals leading to the end goal [Bur13]. Establish a vision and derive a series of (sub)goals to realise the vision. Relate these goals to the business case. Story mapping is one of the tools available to define a series of goals [Lit].

Conclusion

Avoid setting deadlines as a means to get results from a team. In the short term this will give results, but in the long run it negatively impacts the results that the organisation wants to achieve.

Reserve deadlines for events that have a very high Cost of Delay when missed, i.e. the cost of missing the deadline is very large.

Instead, set a vision (both for the organisation and the product) that is consistent with the fundamental solution. In addition, derive a series of goals and prioritise them to help the team focus on achieving results. Several techniques can be used to derive such a series of goals, such as story mapping and goal-impact mapping.

References

[Bur13] Daniel Burm, 2013, The Secret to 3-Digit Productivity Growth
[Wik] Shifting the Burden, Wikipedia, Systems Thinking
[Lit] Lithespeed, Story Mapping

Season's greetings

Coding the Architecture - Simon Brown - Tue, 12/24/2013 - 11:28

2013 has been a fantastic year for me and I've had the pleasure of meeting so many people in more than a dozen countries. I just wanted to take this opportunity to say thank you and to wish all of my friends and clients a Merry Christmas. I hope that you have a happy and healthy 2014.

Instead of sending greeting cards, I'll be making a donation to the Channel Islands Air Search and RNLI Jersey. Both are voluntary organisations made up of people who regularly risk their own lives to save others. Jersey has experienced some very stormy weather during recent weeks where the RNLI and the CI AirSearch have both been called out. The CI AirSearch aeroplane unfortunately had to make an emergency landing too. They're incredibly brave to head out in the scary conditions we've had over the past few weeks and are an essential part of island life. "Inspirational" is an understatement. Let's hope they have a quiet holiday season.

Categories: Architecture

Composite User Stories

Xebia Blog - Wed, 12/18/2013 - 11:21

User stories

User stories must represent business value. That's why we use the well-known one-line description 'As an <actor> I want <action>, so I can reach a <goal>'. It is both simple and powerful because it provides the team with a concrete, customer-related context for identifying the tasks needed to reach the required goal.

The stories pulled into the sprint by the team have a constraint on size: they should at least be small enough to fit into a sprint. This constraint on story size can in some cases require a story to be broken down into smaller stories. There are some useful patterns to do this, such as workflow steps, business rules, or data variations.

Complex systems

When dealing with large and complex systems consisting of many interacting components, the process of breaking down stories can pose problems even when following the standard guidelines. This is especially the case when breaking a story down leads to stories related to components deep within the system, with no direct connection to the end user or the business goal. Those stories are usually inherently technical and far removed from the business perspective.

Let's say the team encounters a story like ‘As a user I want something really complex that doesn’t fit in a story, so I can do A’. The story requires the interaction of multiple components, so the team breaks it down into smaller stories like ‘As a user I want component X to expose operation Y, so I can do A’. There is a user and a business goal, but the action has no direct relation to either of them. It provides no meaningful context for this particular story, and it just doesn't feel right.

Constrained by time, and with no apparent solution provided by the known patterns, the team is likely to define a story like ‘Implement operation Y in component X’, which is basically a compound task description and provides no context at all.

Components as actors

Breaking the rules a bit, it is possible to use the principle of user story definition and still provide meaningful context in these cases. The trick is to zoom into the system and define the sub-stories on another level, using a composite relation and making the components actors themselves, with their own goals: ‘As component Z I want to call operation Y on component X, so I can do B’ and ‘As component X I want to implement operation Y, so I can do C’.

There is no direct customer or business value in such a sub-story, but because it is linked by composition it is easy to trace back to the business value. Each of the sub-goals contributes to the goal stated in the composite story. Goal A is reached by reaching both goal B and goal C (A = B + C).

Linking the stories

There are several ways to link the stories to their composite story. You can number the stories like 1, 1a, 1b, ..., or indent the sticky notes with the sub user stories on the scrum board to visualize the relationship. To make the relation more explicit, you can also extend the story description: ‘As an <Actor> I want <Action>, so I can reach <Goal> and contribute to <Composite Goal>’.

(Figure: Composite Stories)

Summary

The emphasis of this approach is on maintaining meaningful context while splitting (technical) user stories for complex systems with many interacting components. By viewing components as actors with their own goals, you can create meaningful user stories with relevant contexts. The use of a composite structure creates logical relations between the stories in the composition and connects them to the business value. This way the team can maintain a consistent way of expressing functionality using user stories.

Disclaimer

This method should only be applied when splitting user stories using the standard patterns is not possible. For instance, it does not satisfy the rule that each story should deliver value to the end user, and it is likely that more than one sprint is needed to deliver a composite story. You should also ask yourself why there is complexity in the system and whether it could be avoided. But for teams facing this complexity, and the challenge of splitting the stories today, this method can be the lesser of two evils.

When C4 becomes C5

Coding the Architecture - Simon Brown - Mon, 12/02/2013 - 10:59

I've been working with a number of teams recently, helping them to diagram their software systems using the C4 approach that is described in my Software Architecture for Developers book. To summarise, it's a way to visualise the static structure of a software system using a small number of simple diagrams as follows:

  1. Context: a high-level diagram that sets the scene, including key system dependencies and actors.
  2. Containers: a containers diagram shows the high-level technology choices (web servers, application servers, databases, mobile devices, etc), how responsibilities are distributed across them and how they communicate.
  3. Components: for each container, a components diagram lets you see the key logical components/services and their relationships.
  4. Classes: if there’s important information that I want to portray, I’ll include some class diagrams. This is an optional level of detail and its inclusion depends on a number of factors.

In the real world, software systems never live in isolation and it's often useful to understand how all of the various software systems fit together within the bounds of an enterprise. To do this, I'll simply add another diagram that sits on top of the C4 diagrams, to show the enterprise context from an IT perspective. This usually includes:

  • The organisational boundary.
  • Internal and external users.
  • Internal and external systems (including a high-level summary of their responsibilities and data owned).

Essentially this becomes a high-level map of the software systems at the enterprise level, with a C4 drill-down for each software system of interest. Caveat time. I do appreciate that enterprise architecture isn't simply about technology but, in my experience, many organisations don't have an enterprise architecture view of their IT landscape. In fact, it shocks me how often I come across organisations of all sizes that lack such a holistic view, especially considering IT is usually a key part of the way they implement business processes and service customers. Sketching out the enterprise context from a technology perspective at least provides a way to think outside of the typical silos that form around IT systems.

Categories: Architecture

Oh no, more logs, start with logstash

Gridshore - Sun, 11/10/2013 - 20:26

How many posts have you seen about logging? And how many have you read about logging? Recently logging became cool again. Nowadays everybody talks about logstash, elasticsearch and kibana. It feels like everybody is playing with these tools. If you are not yet among the people playing around with them, then this is the blog post for you. I am going to help you get started with logstash: getting familiar with the configuration and configuring the input as well as the output. Then, when you are familiar with the concepts and know how to play around with logstash, I move on to storing things in elasticsearch. There are some interesting steps to take there as well. When you have a way to put data into elasticsearch, we move on to looking at the data. Before you can understand the power of Kibana, you have to create some queries on your own. I’ll help you there as well. In the end we will also have a look at Kibana.


Logstash

Logstash comes with a number of different components. You can run them all using the executable jar, but logstash is very pluggable, so you can also replace the internal logstash components with other ones. Logstash contains the following components:

  • Shipper – sends events to logstash
  • Broker/indexer – sends events to an output, elasticsearch for instance
  • Search/storage – provides search capabilities using an internal elasticsearch
  • Web interface – provides a GUI using a version of Kibana

Logstash is created using JRuby, so you need a JVM to run it. When you have the executable jar, all you need to do is create a basic config file and you can start experimenting. The config file consists of three main parts:

  • Input – the way we receive messages or events
  • Filters – how we leave out or convert messages
  • Output – the way to send out messages

The next code block gives the most basic config: it uses the standard input of the terminal where you run logstash and outputs the messages to the same console.

input {
  stdin { }
}
output {
  stdout { }
}

Time to run logstash using this config:

java -jar logstash-1.2.2-flatjar.jar agent -v -f basic.conf

Then when typing Hello World!! we get (I did remove some debug info):

2013-11-08T21:58:13.178+0000 jettro.gridshore.nl Hello World!!

With the 1.2.2 release it is still annoying that you cannot just stop the agent using Ctrl+C; I have to kill it with the -9 option.
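
If you run into the same thing, here is a minimal shell sketch for finding and killing the agent (assuming a single logstash process started from the flat jar as above):

# kill the logstash agent hard; pgrep -f matches the full command line
kill -9 $(pgrep -f logstash-1.2.2-flatjar.jar)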

It is important to understand that the input and output sections are plugins. We can add other plugins to handle other input sources, as well as plugins for outputting data. One of these, the elasticsearch output, we will see later on.

Elasticsearch

I am not going to explain how to install elasticsearch; there are many resources available online, especially on the elasticsearch website, so I take it you have a running elasticsearch installation. Now we are going to update the logstash configuration to send all events to elasticsearch. The following code block shows the config for sending events, as entered on the standard in, to elasticsearch.

input {
  stdin { }
}
output {
  stdout { }

  elasticsearch {
    cluster => "logstash"
  }
}
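
For this to work, the cluster name on the elasticsearch side has to match. A minimal sketch of the relevant line in config/elasticsearch.yml (file location assumed from a default install; multicast auto discovery is left at its default):

cluster.name: logstash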

Now, when we have elasticsearch running with auto discovery enabled and a cluster name equal to logstash, we can start logstash again. The output should resemble the following.

[~/javalibs/logstash-indexer]$ java -jar logstash-1.2.2-flatjar.jar agent -v -f basic.conf 
Pipeline started {:level=>:info}
New ElasticSearch output {:cluster=>"logstash", :host=>nil, :port=>"9300-9305", :embedded=>false, :level=>:info}
log4j, [2013-11-09T17:51:30.005]  INFO: org.elasticsearch.node: [Nuklo] version[0.90.3], pid[32519], build[5c38d60/2013-08-06T13:18:31Z]
log4j, [2013-11-09T17:51:30.006]  INFO: org.elasticsearch.node: [Nuklo] initializing ...
log4j, [2013-11-09T17:51:30.011]  INFO: org.elasticsearch.plugins: [Nuklo] loaded [], sites []
log4j, [2013-11-09T17:51:31.897]  INFO: org.elasticsearch.node: [Nuklo] initialized
log4j, [2013-11-09T17:51:31.897]  INFO: org.elasticsearch.node: [Nuklo] starting ...
log4j, [2013-11-09T17:51:31.987]  INFO: org.elasticsearch.transport: [Nuklo] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/192.168.178.38:9301]}
log4j, [2013-11-09T17:51:35.052]  INFO: org.elasticsearch.cluster.service: [Nuklo] detected_master [jc-server][wB-3FWWEReyhWYRzWyZggw][inet[/192.168.178.38:9300]], added {[jc-server][wB-3FWWEReyhWYRzWyZggw][inet[/192.168.178.38:9300]],}, reason: zen-disco-receive(from master [[jc-server][wB-3FWWEReyhWYRzWyZggw][inet[/192.168.178.38:9300]]])
log4j, [2013-11-09T17:51:35.056]  INFO: org.elasticsearch.discovery: [Nuklo] logstash/5pzXIeDpQNuFqQasY6jFyw
log4j, [2013-11-09T17:51:35.056]  INFO: org.elasticsearch.node: [Nuklo] started

Now when typing Hello World !! in the terminal the following is logged to the standard out.

output received {:event=>#"Hello World !!", "@timestamp"=>"2013-11-09T16:55:19.180Z", "@version"=>"1", "host"=>"jettrocadiesmbp.fritz.box"}>, :level=>:info}
2013-11-09T16:55:19.180+0000 jettrocadiesmbp.fritz.box Hello World !!

This time, however, the same thing is also sent to elasticsearch. We can check that by doing the following query to determine the index that was created.

curl -XGET "http://localhost:9200/_mapping?pretty"

The response then is the mapping. The index name contains today's date, there is a type called logs, and it has the fields that are also written to the console.

{
  "logstash-2013.11.09" : {
    "logs" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "@version" : {
          "type" : "string"
        },
        "host" : {
          "type" : "string"
        },
        "message" : {
          "type" : "string"
        }
      }
    }
  }
}

Now that we know the name of the index, we can create a query to see if our message got through.

curl -XGET "http://localhost:9200/logstash-2013.11.09/logs/_search?q=message:hello&pretty"

The response for this query now is

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "logstash-2013.11.09",
      "_type" : "logs",
      "_id" : "9l8N2tIZSuuLaQhGrLhg7A",
      "_score" : 0.19178301, "_source" : {"message":"Hello World !!","@timestamp":"2013-11-09T16:55:19.180Z","@version":"1","host":"jettrocadiesmbp.fritz.box"}
    } ]
  }
}

There you go: we can enter messages in the console where logstash is running and query elasticsearch to see that the messages are actually in the system. Not sure how useful this is, but at least you have seen the steps. The next step is to have a look at our data using a tool called Kibana.

Kibana

There are multiple ways to install Kibana, and depending on your environment one is easier than the other. I like to install Kibana as a plugin in elasticsearch on a development machine. In the elasticsearch plugins folder, create the folder kibana/_site and store all the content of the downloaded Kibana tar in there. Now browse to http://localhost:9200/_plugin/kibana. In the first screen, look for the logstash dashboard. When you open the dashboard it will look a bit different from mine; I made some changes to make it easier to present on screen. Later on I will show how to create your own dashboards and panels. The following screen shows Kibana.

(Screenshot: the Kibana logstash dashboard)

Logstash also comes with an option to run Kibana from the logstash executable. I personally prefer to have it as a separate install; that way you can always use the latest and greatest version.
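
For reference, a rough sketch of the plugin-style install described above; the elasticsearch location and the name of the downloaded Kibana archive are assumptions, so adjust them to your own setup:

cd /path/to/elasticsearch
mkdir -p plugins/kibana/_site
tar -xzf ~/Downloads/kibana-latest.tar.gz -C plugins/kibana/_site --strip-components=1

After this, http://localhost:9200/_plugin/kibana should serve the dashboard.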

Using tomcat access logs

This is all nice, but we are not implementing a system like this just to enter a few messages, so we want to attach another input source to logstash. I am going to give an example with Tomcat access logs. If you want to obtain access logs in Tomcat, you need to add a valve to the configured host in server.xml.

<Valve className="org.apache.catalina.valves.AccessLogValve" 
       directory="/Users/jcoenradie/temp/logs/" 
       prefix="localhost_access_log" 
       suffix=".txt" 
       renameOnRotate="true"
       pattern="%h %t %S &quot;%r&quot; %s %b" />

An example of the output in the logs is shown below; the list after it explains what the pattern means.

0:0:0:0:0:0:0:1 [2013-11-10T16:28:00.580+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/partials/dashboard.html HTTP/1.1" 200 988
0:0:0:0:0:0:0:1 [2013-11-10T16:28:00.580+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/api/settings HTTP/1.1" 200 90
0:0:0:0:0:0:0:1 [2013-11-10T16:28:02.753+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/partials/users.html HTTP/1.1" 200 7160
0:0:0:0:0:0:0:1 [2013-11-10T16:28:02.753+0100] C054CED0D87023911CC07DB00B2F8F75 "GET /admin/api/users HTTP/1.1" 200 1332
  • %h – remote host
  • %t – timestamp
  • %S – session id
  • %r – first line of the request
  • %s – HTTP status code of the response
  • %b – bytes sent

If you want more information about the logging options, check the Tomcat configuration documentation.

The first step is to get the contents of this file into logstash, so we change the configuration to add an input coming from a file.

input {
  stdin { }
  file {
    type => "tomcat-access"
    path => ["/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt"]
  }
}
output {
  stdout { }

  elasticsearch {
    cluster => "logstash"
  }
}

The debug output now becomes:

output received {:event=>#"0:0:0:0:0:0:0:1 [2013-11-10T17:15:11.028+0100] 9394CB826328D32FEB5FE1F510FD8F22 \"GET /static/js/mediaOverview.js HTTP/1.1\" 304 -", "@timestamp"=>"2013-11-10T16:15:20.554Z", "@version"=>"1", "type"=>"tomcat-access", "host"=>"jettro-coenradies-macbook-pro.fritz.box", "path"=>"/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt"}>, :level=>:info}

Now we have data in elasticsearch, but each event is just one string: the message. We know there is more interesting data inside that message. Let us move on to the next component of logstash: filtering.

Logstash filtering

You can use filters to enhance the received events. The following configuration shows how to extract the client, timestamp, session id, method, uri path, uri param, protocol, status code and bytes. As you can see, we use grok to match these fields in the input.

input {
  stdin { }
  file {
    type => "tomcat-access"
    path => ["/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt"]
  }
}
filter {
  if [type] == "tomcat-access" {
    grok {
      match => ["message","%{IP:client} \[%{TIMESTAMP_ISO8601:timestamp}\] (%{WORD:session_id}|-) \"%{WORD:method} %{URIPATH:uri_path}(?:%{URIPARAM:uri_param})? %{DATA:protocol}\" %{NUMBER:code} (%{NUMBER:bytes}|-)"]
    }
  }
}
output {
  stdout { }

  elasticsearch {
    cluster => "logstash"
  }
}

Now compare the new output.

output received {:event=>#"0:0:0:0:0:0:0:1 [2013-11-10T17:46:19.000+0100] 9394CB826328D32FEB5FE1F510FD8F22 \"GET /static/img/delete.png HTTP/1.1\" 304 -", "@timestamp"=>"2013-11-10T16:46:22.112Z", "@version"=>"1", "type"=>"tomcat-access", "host"=>"jettro-coenradies-macbook-pro.fritz.box", "path"=>"/Users/jcoenradie/temp/dpclogs/localhost_access_log.txt", "client"=>"0:0:0:0:0:0:0:1", "timestamp"=>"2013-11-10T17:46:19.000+0100", "session_id"=>"9394CB826328D32FEB5FE1F510FD8F22", "method"=>"GET", "uri_path"=>"/static/img/delete.png", "protocol"=>"HTTP/1.1", "code"=>"304"}>, :level=>:info}

Now if we go back to Kibana, we can see that we have more fields. The message is now replaced by the mentioned fields, so we can easily filter on, for instance, session_id. The following image shows that we can select the new fields.

(Screenshot: Kibana showing the new fields extracted by the grok filter)
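
If you prefer the command line over Kibana for a quick check, the same kind of filtering can be done with a query straight against elasticsearch. A sketch that reuses the session id and index date from the debug output above:

curl -XGET "http://localhost:9200/logstash-2013.11.10/_search?q=session_id:9394CB826328D32FEB5FE1F510FD8F22&pretty"

This should return the tomcat-access events recorded for that session.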

That is it for now; later on I’ll blog about more logstash options and about creating dashboards with Kibana.

The post Oh no, more logs, start with logstash appeared first on Gridshore.

Categories: Architecture, Programming

Setting up keys to sign emails in Samsung’s Android email app

Gridshore - Mon, 09/30/2013 - 17:37
Introduction

For almost half a year now, I’ve been the proud owner of a Samsung Galaxy SIII Mini (bought it just before the release of the S4, because my phone died and I couldn’t wait for the S4). Since then I’ve got it doing most of what I want it to do, except sign my outgoing emails when I want it to (sign them cryptographically, obviously; I got it to add a text signature within two seconds). The problem here is that setting up the Samsung stock mail app (I don’t use the GMail app) is not immediately obvious. But today I finally got it working, after a long and frustrating day. Read on to find out how…

To sign or not to sign…

First of all, let’s take a look at the basic infrastructure for securing your outgoing mail in Samsung’s mail client. This infrastructure is found in the mail application’s settings, which are accessed using the menu key once you start the mail client:

(Screenshot: opening the mail settings via the menu key)

After you access the settings, find the security options item and tap that:

(Screenshot: the security options entry in the mail settings)

You should now see a screen like the one below. Hooray, you can manage keys that allow you to sign and/or encrypt your mails!

(Screenshot: the security options screen with key management)

But this is where things start to get awkward. There are two competing standards out there (both endorsed by the IETF) for signing and encrypting mail. First, there is S/MIME, which uses the same PKI infrastructure also used to secure web traffic and which requires you to use RSA keypairs and signed certificates. On the other hand, there is Pretty Good Privacy (PGP), which uses many types of keypairs, keyservers and a web of trust. So the first question that you run into here is: which do you use? The answer is that you use PGP, because S/MIME is not supported by this mail client except for Exchange servers. But you have to dig long and hard on the web to find that out, because there is no official documentation to tell you so.

Your next move is going to be to use a tool like GPG to generate your public/private keypair with a passphrase, publish it on a keyserver if you wish, and export the public and private keys as .ASC files. After that, you can follow the instructions you find all across the web to place these files in the root of your SD card and import the keys, which you do by going to Private keys or Public keys in the menu shown above, hitting the menu button and selecting Import keys. And then you will discover that this does not work, because no key file is found. You see, for some bizarre reason Samsung chose not to use the onboard key management facilities of Android to manage their keys, instead opting to roll their own. To import the keys into the Samsung mail client, place your key files on your SD card in the directory
/opengpg/export
Yes, that is correct, export. Then, make sure your keyfiles have the correct name. They should be called
<your email address>_<your name as you filled it in in the mail account settings>_0x<the ID of your PGP keypair>_Private_Key.asc

and

<your email address>_<your name as you filled it in in the mail account settings>_0x<the ID of your PGP keypair>_Public_Key.asc

respectively for the private and public keys. If you use other names, the mail app will not find them. You can generate an example if you want: in the mail app, use the Create keys option and export the keys to see what the names look like. You’ll have to get the ID from your GPG tool.
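
As an illustration, here is a minimal GnuPG sketch of the generate-and-export step. The email address, name and key ID below are placeholders; substitute the values from your own key and mail account settings:

# generate a keypair with a passphrase, then look up its ID
gpg --gen-key
gpg --list-keys jane@example.com
# export public and private keys using the file names the Samsung mail app expects
gpg --export --armor jane@example.com > jane@example.com_Jane_Doe_0xA1B2C3D4_Public_Key.asc
gpg --export-secret-keys --armor jane@example.com > jane@example.com_Jane_Doe_0xA1B2C3D4_Private_Key.asc

Copy both .asc files into /opengpg/export on the SD card before importing them in the mail app.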

After all that, you should be able to import your keys. Then use the Set default key option to choose a default keypair. You can either select to sign all your mails, or you can use the settings per mail to sign and/or encrypt. Don’t lose your passphrase, you have to fill it in every time you sign a mail!

The post Setting up keys to sign emails in Samsung’s Android email app appeared first on Gridshore.

Categories: Architecture, Programming

Cloning an OpenSuSE 12.3 virtual machine using GRUB2, VirtualBox and cryptfs

Gridshore - Mon, 08/19/2013 - 10:49
Introduction

So I just bought myself a new laptop (an Asus N76VB-T4038H), which I love so far. It’s a great machine and real value for money. Of course it comes preloaded with Windows 8, which I hate so far, but am holding on to for the occasional multimedia thingie and because I might have to install Microsoft Office at some point. But my main OS is still Linux, and in particular OpenSuSE. And given the way I want to use this laptop in the near future, the idea hit me that it would be a good idea to virtualize my Linux machines, so that I can clone them and set up new and separate environments when needed. So I installed VirtualBox on my host OS, downloaded the OpenSuSE 12.3 ISO, set up a new machine to create a golden image, pointed it to the ISO and installed OpenSuSE. Smooth sailing, no problems. I even set up an encrypted home partition without any pain. And then I tried to clone the machine in VirtualBox and start the clone… and found that it couldn’t find my cloned hard drive.

This is actually a reasonably well-known problem in OpenSuSE and some other Linux distros. You’ll find another blog about what is going on here and there are several forum posts. But none of the ones I found cover a solution involving GRUB2, so I thought I’d post my experiences here.

The problem

The basic problem is this: Linux has a single file system tree which spans all of the available hard drives, partitions and so on in one large, virtual structure. To make this happen, different parts of the physical storage infrastructure of your machine are mapped to branches of the file system tree (so-called mount points). So, for example, the /home directory in your file system may be designated a mount point for the second partition on your primary SCSI drive. Which means that you can type /home, and behind the scenes the OS will start looking at the partition also known as /dev/sda2. This trick can be applied at any time, by the way, using the mount command. This is also what happens, for instance, when you insert a USB drive or DVD: a previously normal directory like /media/USB suddenly and magically becomes the file system location for the USB drive.

Now, recently, Linux has acquired different ways of naming partitions and drives. It all used to be /dev/hdaX and /dev/sdaX, but nowadays several distributions have introduced additional naming schemes using symlinks. For example, on my system there is a directory /dev/disk which includes several subdirectories containing symlinks to the actual device files, all using different naming schemes:

bzt@linux-akf6:/dev/disk> ll
total 0
drwxr-xr-x 2 root root 260 Aug 19 11:27 by-id
drwxr-xr-x 2 root root 140 Aug 19 11:27 by-path
drwxr-xr-x 2 root root 120 Aug 19 11:27 by-uuid
bzt@linux-akf6:/dev/disk>

The by-id directory for instance includes symbolic names for the device files:

bzt@linux-akf6:/dev/disk> ll by-id/
total 0
lrwxrwxrwx 1 root root  9 Aug 19 11:27 ata-VBOX_CD-ROM_VB2-01700376 -> ../../sr0
lrwxrwxrwx 1 root root  9 Aug 19 11:27 ata-VBOX_HARDDISK_VB3ceba069-ef9ac38e -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 19 11:27 ata-VBOX_HARDDISK_VB3ceba069-ef9ac38e-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug 19 11:27 ata-VBOX_HARDDISK_VB3ceba069-ef9ac38e-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug 19 11:27 ata-VBOX_HARDDISK_VB3ceba069-ef9ac38e-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Aug 19 11:27 dm-name-cr_home -> ../../dm-0
lrwxrwxrwx 1 root root 10 Aug 19 11:27 dm-uuid-CRYPT-LUKS1-22b6b0f3ad0c433e855383ca2e64bef1-cr_home -> ../../dm-0
lrwxrwxrwx 1 root root  9 Aug 19 11:27 scsi-SATA_VBOX_HARDDISK_VB3ceba069-ef9ac38e -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 19 11:27 scsi-SATA_VBOX_HARDDISK_VB3ceba069-ef9ac38e-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug 19 11:27 scsi-SATA_VBOX_HARDDISK_VB3ceba069-ef9ac38e-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug 19 11:27 scsi-SATA_VBOX_HARDDISK_VB3ceba069-ef9ac38e-part3 -> ../../sda3
bzt@linux-akf6:/dev/disk>

Some of these mount points are fixed, meaning that the operating system automatically remounts them on system reboot. These mount points are recorded in the /etc/fstab file (the file system table, a table of the information needed to mount each mount point) and in the boot loader configuration (because the boot loader has to know where to start the operating system from). The boot loader files are in /boot/grub2 if you are using GRUB2.

Now, some Linux distributions (including OpenSuSE 12.3) have chosen to generate these configuration files using the new, symbolic names for device files (found in /dev/disk/by-id) rather than the actual device file names (e.g. /dev/sda2). Which is usually no problem. Except when you are on a virtual computer which has been cloned and is using a virtual disk which has also been cloned. Because when you clone a disk, its symbolic name changes. But the configuration files are not magically updated during cloning to reflect this change.

The solution

The solution to this problem is to switch back to the device file names (because those are correct, even after cloning).

To make your Linux virtual machine with GRUB2 clone-ready, perform the following steps:

  1. Take a snapshot of your virtual machine.
  2. Become root on your machine.
  3. Examine the /etc/fstab file. Find all the symbolic names that occur in that file.
  4. Refer to the /dev/disk/by-id directory and examine the symlinks to figure out which device file is equivalent to which symbolic name.
  5. Use vi to edit the /etc/fstab file.
  6. Replace all symbolic names with the correct device file names.
  7. Save the new /etc/fstab file.
  8. Go to the /boot/grub2 directory.
  9. Make the same changes to the device.map and grub.cfg files.
You should now be able to clone the machine and boot the clone and have it find your drives.

Encrypted partitions

There is one exception though: if you have encrypted partitions, they still will not work. This is because Linux uses some indirection in the file system table for encrypted file systems. To get your encrypted partitions to work, you have to edit the /etc/crypttab file and make the same symbolic-name-to-device-file-name substitutions there.
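
To make the substitution concrete, here is a sketch of what a single /etc/fstab entry might look like before and after the edit. The by-id name is taken from the listing above; the mount point, file system type and options are examples and will differ on your machine. The same kind of substitution applies to /boot/grub2/device.map, /boot/grub2/grub.cfg and /etc/crypttab.

# before: installer-generated symbolic name
/dev/disk/by-id/ata-VBOX_HARDDISK_VB3ceba069-ef9ac38e-part2  /  ext4  defaults  1 1
# after: plain device file name, which survives cloning
/dev/sda2  /  ext4  defaults  1 1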

The post Cloning an OpenSuSE 12.3 virtual machine using GRUB2, VirtualBox and cryptfs appeared first on Gridshore.

Categories: Architecture, Programming

New book review ‘SOA Made Simple’ coming soon.

Ben Wilcock - Tue, 02/05/2013 - 14:24

My review copy has arrived and I’ll be reviewing it just as soon as I can, but in the meantime if you’d like more information about this new book go to http://bit.ly/wpNF1J


Categories: Architecture, Programming

Facebook Has An Architectural Governance Challenge

Just to be clear, I don't work for Facebook, I have no active engagements with Facebook, my story here is my own and does not necessarily represent that of IBM. I'd spent a little time at Facebook some time ago, I've talked with a few of its principal developers, and I've studied its architecture. That being said:

Facebook has a looming architectural governance challenge.

When I last visited the company, they had only a hundred or so developers, the bulk of whom fit cozily in one large war room. Honestly, it was nearly indistinguishable from a Really Nice college computer lab: nice work desks, great workstations, places where you could fuel up with caffeine and sugar. Dinner was served right there, so you never needed to leave. Were I a twenty-something with only a dog and a futon to my name, it would have been geek heaven. The code base at the time was, by my estimate, small enough that it was grokable, and the major functional bits were not so large and were sufficiently loosely coupled such that development could proceed along nearly independent threads of progress.

I'll reserve my opinions of Facebook's development and architectural maturity for now. But, I read with interest this article that reports that Facebook plans to double in size in the coming year.

Oh my, the changes they are a comin'.

Let's be clear, there are certain limited conditions under which the maxim "give me PHP and a place to stand, and I will move the world" holds true. Those conditions include having a) a modest code base, b) with no legacy friction, c) growth, acceptance, and limited competition that masks inefficiencies, d) a hyper-energetic, manically focused group of developers, e) who all fit pretty much in the same room. Relax any of those constraints, and Developing Really Really Hard just doesn't cut it any more.

Consider: the moment you break a development organization across offices, you introduce communication and coordination challenges. Add the crossing of time zones, and unless you've got some governance in place, architectural rot will slowly creep in and the flaws in your development culture will be magnified. The subtly different development cultures that will evolve in each office will yield subtly different textures of code; it's kind of like the evolutionary drift on which Darwin reported. If your architecture is well-structured, well-syndicated, and well-governed, you can more easily split the work across groups; if your architecture is poorly structured, held in the tribal memory of only a few, and ungoverned, then you can rely on heroics for a while, but that's unsustainable. Your heroes will dig in, burn out, or cash out.

Just to be clear, I'm not picking on Facebook. What's happening here is a story that every group that's at the threshold of complexity must cross. If you are outsourcing to India or China or across the city, if you are growing your staff to the point where the important architectural decisions no longer will fit in One Guy's Head, if you no longer have the time to just rewrite everything, if your growing customer base grows increasingly intolerant of capricious changes, then, like it or not, you've got to inject more discipline.

Now, I'm not advocating extreme, high-ceremony measures. As a start, there are some fundamentals that will go a long way: establish a well-instrumented and well-automated build and release system; use collaboration tools that channel work but also allow for serendipitous connections; codify and syndicate the system's load-bearing walls/architectural decisions; create a culture of patterns and refactoring.

Remind your developers that what they do, each of them, is valued; remind your developers that there is more to life than coding.

It will be interesting to watch how Facebook metabolizes this growth. Some organizations are successful in so doing; many are not. But I really do wish Facebook success. If they thought the past few years were interesting times, my message to them is that the really interesting times are only now beginning. And I hope they enjoy the journey.
Categories: Architecture

How Watson Works

Earlier this year, I conducted an archeological dig on Watson. I applied the techniques I've developed for the Handbook, which involve the use of the UML, Philippe Kruchten's 4+1 View Model, and IBM's Rational Software Architect. The fruits of this work have proven to be useful as groups other than Watson's original developers begin to transform the Watson code base for use in other domains.

You can watch my presentation at IBM Innovate on How Watson Works here.
Categories: Architecture

Books on Computing

Over the past several years, I've immersed myself in the literature of the history and the implications of computing. All told, I've consumed over two hundred books, almost one hundred documentaries, and countless articles and websites - and I have a couple of hundred more books yet to metabolize. I've begun to name the resources I've studied here and so offer them up for your reading pleasure.

I've just begun to enter my collection of books - what you see there now at the time of this blog is just a small number of the books that currently surround me in my geek cave - so stay tuned as this list grows. If you have any particular favorites you think I should study, please let me know.
Categories: Architecture

The Computing Priesthood

At one time, computing was a priesthood, then it became personal; now it is social, but it is becoming more human.

In the early days of modern computing - the 40s, 50s and 60s - computing was a priesthood. Only a few were allowed to commune directly with the machine; all others would give their punched card offerings to the anointed, who would in turn genuflect before their card readers and perform their rituals amid the flashing of lights, the clicking of relays, and the whirring of fans and motors. If the offering was well-received, the anointed would call the communicants forward and in solemn silence hand them printed manuscripts, whose signs and symbols would be studied with fevered brow.

But there arose in the world heretics, the Martin Luthers of computing, who demanded that those glass walls and raised floors be brought down. Most of these heretics cried out for reformation because they once had a personal revelation with a machine; from time to time, a secular individual was allowed full access to an otherwise sacred machine, and therein would experience an epiphany that it was the machines who should serve the individual, not the reverse. Their heresy spread organically until it became dogma. The computer was now personal.

But no computer is an island entire of itself; every computer is a piece of the continent, a part of the main. And so it passed that the computer, while still personal, became social, connected to other computers that were in turn connected to yet others, bringing along their users who delighted in the unexpected consequences of this network effect. We all became part of the web of computed humanity, able to weave our own personal threads in a way that added to this glorious tapestry whose patterns made manifest the noise and the glitter of a frantic global conversation.

It is as if we have created a universe, then as its creators, made the choice to step inside and live within it. And yet, though connected, we remain restless. We now strive to craft devices that amplify us, that look like us, that mimic our intelligence.

Dr. Jeffrey McKee has noted that "every species is a transitional species." It is indeed so; in the co-evolution of computing and humanity, both are in transition. It is no surprise, therefore, that we now turn to re-create computing in our own image, and in that journey we are equally transformed.
Categories: Architecture

Responsibility

No matter what future we may envision, that future relies on software-intensive systems that have not yet been written.

You can now follow me on Twitter.
Categories: Architecture

There Were Giants Upon the Earth

Steve Jobs. Dennis Ritchie. John McCarthy. Tony Sale.

These are men who - save for Steve Jobs - were little known outside the technical community, but without whom computing as we know it today would not be. Dennis co-created Unix and created C; John invented Lisp; Tony continued the legacy of Bletchley Park, where Turing and others toiled in extreme secrecy but whose efforts shortened World War II by two years.

All pioneers of computing.

They will be missed.
Categories: Architecture

Steve Jobs

This generation, this world, was graced with the brilliance of Steve Jobs, a man of integrity who irreversibly changed the nature of computing for the good. His passion for simplicity, elegance, and beauty - even in the invisible - was and is an inspiration for all software developers.

Quote of the day:

Almost everything - all external expectations, all pride, all fear of embarrassment or failure - these things just fall away in the face of death, leaving only what is truly important. Remembering that you are going to die is the best way I know to avoid the trap of thinking you have something to lose. You are already naked. There is no reason not to follow your heart.
Steve Jobs
Categories: Architecture
