
Software Development Blogs: Programming, Software Testing, Agile Project Management



All employees should be limited only by their ability rather than an absence of resources.

James Hamilton hid a pearl of wisdom inside Why Renewable Energy (Alone) Won't Fully Solve the Problem that I think is well worth prying out:

I’ve long advocated the use of economic incentives to drive innovative uses of computing resources inside the company while preventing costs from spiraling out of control.  Most IT departments control costs by having computing resources in short supply and only buying more resources slowly and with considerable care. Effectively computing is a scarce resource so it needs to get used carefully. This effectively limits IT cost growth and controls wastage but it also limits overall corporate innovation and the gains driven by the experiments that need these additional resources.

I’m a big believer in making effectively infinite computing resources available internally and billing them back precisely to the team that used them. Of course, each internal group needs to show the customer value of their resource consumption. Asking every group to effectively be a standalone profit center is, in some ways, complex in that the “product” from some groups is hard to quantitatively measure. Giving teams the resources they need to experiment and then allowing successful experiments to progress rapidly into production encourages innovation, makes for a more exciting place to work, and the improvements brought by successful experiments help the company be more competitive and better serve its customers.

I argue that all employees should be limited only by their ability rather than an absence of resources or an inability to argue convincingly for more. This is one of the most important yet least discussed advantages of cloud computing: taking away artificial resource limitations in support of light-weight experimentation and rapid innovation. Making individual engineers and teams responsible for delivering more value for the resources they consume makes it possible to encourage experimentation without fear that costs will rise without sufficient value being produced.

Categories: Architecture

Testing Feature Branches Remotely with Grunt

Xebia Blog - Tue, 12/02/2014 - 17:48

At my current job we are working on multiple features simultaneously, using git feature branches. We have a Jenkins build server which we use for integration testing of the master branch, which runs about 20 jobs simultaneously for Protractor and Fitnesse tests. An individual job typically takes around 10 minutes to complete.

Our policy is to keep the master branch production ready at all times. Therefore we have a review process in place that should assure that feature branches are only pushed to master when they can't break the application.
This all works very well as long as the feature you are working on requires only one or two integration test suites to test its functionality. But every once in a while you're working on something that could have effects all over the application, and you would like to run a larger number of integration test suites - of course before you merge your feature branch to master.
Running all the integration suites on your local machine would take way too much time. And Jenkins is configured to run all its suites against the master branch. So what to do?

The Solution

In this article I'm going to show a solution that we developed for this problem, which lets us start multiple remote Jenkins jobs on the branch that we are working on. This way we can continue working on our local machine while Jenkins is running integration tests on the build server.
Most of our integration suites run against our frontend modules, and for those modules we use grunt as our build tool.
Therefore the most practical step was to extend grunt with a task for starting the integration tests on Jenkins: we'd like to type 'grunt jenkins' and then grunt should figure out which branch we have checked out, send that information to Jenkins, and start all the integration suites.
To accomplish that we need to take the following steps:

  • Have some Jenkins integration tests suites which can take a git branch as a parameter
  • Create a custom grunt task called 'jenkins'
  • Let the grunt jenkins task figure out which branch we have checked out
  • Let the grunt jenkins task start a bunch of remote jenkins jobs with the branch as a parameter
The Parameterized Jenkins job

Jenkins offers the feature to configure your build with a parameter. Here is how we do it:
In the configuration of a Jenkins job there's a little checkbox saying 'the build is parameterized'. Upon checking it, you can enter a parameter name, which will be available in your Jenkins build script.
We'll add a parameter called BRANCH, like in the screenshot below:

Jenkins job Parameter

Then in our Jenkins build script, we can check if the parameter is set, and if this is the case, check out the branch. It will look something like this:

git fetch
# If the parameterized build passed a BRANCH, check out and update that branch;
# otherwise fall back to the promoted commit or origin/master.
if [[ -n "$BRANCH" ]]; then
  git checkout -f $BRANCH
  git pull
else
  git checkout -f ${PROMOTED_GIT_COMMIT-"origin/master"}
fi

What's nice about our parameterized build job is that we can invoke it via a REST call and include our parameter as a query parameter. I'll show that later on.

Our custom 'jenkins' Grunt task

In our grunt.js configuration file, we can load custom tasks. The following snippet loads all files in the conf/grunt/tasks folder.

 grunt.loadTasks('conf/grunt/tasks');

In our tasks folder we create a jenkins.js file, containing our custom jenkins task.
The next thing to do is to retrieve the name of the branch which we have checked out on our machine. There's a grunt plugin called 'gitinfo' which will help us with that.
When the gitinfo plugin is invoked it will add a section to the grunt configuration which contains, amongst others, the name of our current local branch:

module.exports = function (grunt) {
  grunt.registerTask('jenkins', ['gitinfo', 'build-branch']);
  
  grunt.registerTask('build-branch', function () {
    var git = grunt.config().gitinfo;
    grunt.log.ok('Building branch: ' + git.local.branch.current.name);


And now we can start our parameterized job with the correct value for the branch parameter, like this:

    var request = require('request');

    var jenkinsUser = 'your username';
    var jenkinsPassword = 'your password';
    var jenkinsHost = 'your jenkins host';
    var job = 'my-parameterized-integration-suite'; 

    var url = 'http://' + jenkinsUser + ':' + jenkinsPassword + '@' + jenkinsHost + ':8080/job/' + job + '/buildWithParameters?BRANCH=' + git.local.branch.current.name + '&delay=0sec';

      request({
        url: url,
        method: 'POST'
      },
      jobFinished(job));
    });

First we acquire a reference to the 'request' package. This is a simple Node package that lets you perform HTTP requests.
We then build the REST URL; to connect to Jenkins we need to supply our Jenkins username and password.
And finally we post a request to the REST endpoint of Jenkins, which will start our job. We supply a callback called 'jobFinished'.

Putting it all together: starting multiple jobs

With these steps in place, we have a new grunt task which we can invoke with 'grunt jenkins' from the commandline, and which will start a Jenkins job on the feature branch that we have checked out locally.
But this will only be useful if our grunt jenkins task is able to start not just one job, but a bunch of them.
Here is the full source code of the jenkins.js file. It has a (hardcoded) array of jobs, starts all of them and keeps track of how many of them have finished:

module.exports = function (grunt) {

  grunt.registerTask('jenkins', ['gitinfo', 'build-branch']);

  grunt.registerTask('build-branch', function () {
    var request = require('request');

    var jenkinsUser = 'your username';
    var jenkinsPassword = 'your password';
    var jenkinsHost = 'your jenkins host';
  
    var jobs = [
      'my-parameterized-integration-suite-1',
      'my-parameterized-integration-suite-2',
      'my-parameterized-integration-suite-3',
      'my-parameterized-integration-suite-4',
      'my-parameterized-integration-suite-5'
    ];
    var git = grunt.config().gitinfo;
    var done = this.async();
    var jobCounter = 0;

    grunt.log.writeln();
    grunt.log.ok('Building branch: ' + git.local.branch.current.name);
    grunt.log.writeln();

    function jobFinished (job) {
      return function (error, response, body) {
        jobCounter++;
        grunt.log.ok('[' + jobCounter + '/' + jobs.length + '] Started: ' + job);

        if (error) {
          grunt.log.error('Error: ' + error + (response ? ', status: ' + response.statusCode : ''));
        } else if (response.statusCode === 301) {
          grunt.log.writeln('See: ' + response.headers.location);
        }

        if (body) {
          grunt.log.writeln(body);
        }

        if (jobCounter === jobs.length) {
          grunt.log.ok();
          done();
        }
      };
    }

    jobs.forEach(function (job, i) {
      var url = 'http://' + jenkinsUser + ':' + jenkinsPassword + '@' + jenkinsHost + ':8080/job/' + job + '/buildWithParameters?BRANCH=' + git.local.branch.current.name + '&delay=0sec';
      grunt.log.ok('[' + (i + 1) + '/' + jobs.length + '] Starting: ' + job);

      request({
        url: url,
        method: 'POST'
      },
      jobFinished(job));
    });

    grunt.log.ok();

  });
};

And here's the console output:

$ grunt jenkins
Running "gitinfo" task

Running "build-branch" task

>> Building branch: my-feature-branch

>> [1/5] Starting: my-parameterized-integration-suite-1
>> [2/5] Starting: my-parameterized-integration-suite-2
>> [3/5] Starting: my-parameterized-integration-suite-3
>> [4/5] Starting: my-parameterized-integration-suite-4
>> [5/5] Starting: my-parameterized-integration-suite-5
OK
>> [1/5] Started: my-parameterized-integration-suite-1
>> [2/5] Started: my-parameterized-integration-suite-2
>> [3/5] Started: my-parameterized-integration-suite-3
>> [4/5] Started: my-parameterized-integration-suite-4
>> [5/5] Started: my-parameterized-integration-suite-5
OK

Done, without errors.

If You Want to Thrive at Microsoft

I was reading back through Satya Nadella’s email on Bold Ambition and Our Core, and a few things caught my eye.

One of them was the idea that if you want to thrive at Microsoft, you need to drive change.

Satya writes:

“And if you want to thrive at Microsoft and make a world impact, you and your team must add numerous more changes to this list that you will be enthusiastic about driving.

Nothing is off the table in how we think about shifting our culture to deliver on this core strategy. Organizations will change. Mergers and acquisitions will occur. Job responsibilities will evolve. New partnerships will be formed. Tired traditions will be questioned. Our priorities will be adjusted. New skills will be built. New ideas will be heard. New hires will be made. Processes will be simplified. And if you want to thrive at Microsoft and make a world impact, you and your team must add numerous more changes to this list that you will be enthusiastic about driving.”

Change is in the air, and Satya has given everyone a license to thrive by re-imagining how to change the world, or at least their part of it.

For me, I’m focused on how to accelerate business transformation with Cloud, Mobile, Social, Big Data and the Internet of Things.

Together, these technology trends are enabling new end-to-end customer experiences, workforce transformation, and operations transformation.

It’s all about unleashing what individuals and businesses are capable of.

Categories: Architecture, Programming

New blogpost on kibana 4 beta

Gridshore - Tue, 12/02/2014 - 12:55

If you, like me, are interested in elasticsearch and kibana, then you might be interested in a blog post I wrote on my employer's blog about the new Kibana 4 beta. If so, head over to my employer's blog:

http://amsterdam.luminis.eu/2014/12/01/experiment-with-the-kibana-4-beta/


Categories: Architecture, Programming

Auth0 Architecture - Running in Multiple Cloud Providers and Regions

This is a guest post by Jose Romaniello, Head of Engineering, at Auth0.

Auth0 provides authentication, authorization and single sign on services for apps of any type: mobile, web, native; on any stack.

Authentication is critical for the vast majority of apps. We designed Auth0 from the beginning with multiple levels of redundancy. One of these levels is hosting. Auth0 can run anywhere: our cloud, your cloud, or even your own servers. And when we run Auth0, we run it on multiple cloud providers and in multiple regions simultaneously.

This article is a brief introduction to the infrastructure behind app.auth0.com and the strategies we use to keep it up and running with high availability.

Core Service Architecture

The core service is relatively simple:

  • Front-end servers: these consist of several x-large VMs, running Ubuntu on Microsoft Azure.

  • Store: MongoDB, running on dedicated memory-optimized X-large VMs.

  • Intra-node service routing: nginx

All components of Auth0 (e.g. Dashboard, transaction server, docs) run on all nodes. All identical.

Multi-cloud / High Availability
Categories: Architecture

About snowmen and mathematical proof why agile works

Xebia Blog - Mon, 12/01/2014 - 16:05

Last week I had an interesting course by Roger Sessions on Snowman Architecture. The perishable nature of snowmen under any serious form of pressure fortunately does not apply to his architecture principles, but being an agile fundamentalist I noticed some interesting patterns in the math underlying the Snowman Architecture that are well rooted in agile practices. Understanding these principles may give facts to feed your gut feeling about these philosophies and give mathematical proof as to why Agile works.

Complexity

“What has philosophy got to do with measuring anything? It's the mathematicians you have to trust, and they measure the skies like we measure a field. “ - Galileo Galilei, Concerning the New Star (1606).

In his book “Facts and Fallacies of Software Engineering” Robert Glass implied that when the functionality of a system increases by 25% the complexity of it effectively doubles. So in formula form:

                      complexity = functionality ^ 3.11        (the exponent follows from 1.25 ^ x = 2, i.e. x ≈ 3.11)

This hypothesis is supported by empirical evidence, and it also explains why planning poker that focuses on the complexity of the implementation rather than on the functionality delivered is a more accurate estimator of what a team can deliver in a sprint.

Basically the smaller you can make the functionality the better, and that is better to the power 3 for you! Once you start making functionality smaller, you will find that your awesome small functionality needs to talk to other functionalities in order to be useful for an end user. These dependencies are penalized by Roger’s model.

“An outside dependency contributes as much complexity as any other function, but does so independently of the functions.”

In other words, splitting a function of, say, 4 points (74 complexity points) into two equal separate functions reduces the overall complexity to 17 complexity points. This benefit however vanishes when each module has more than 3 connections.
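To make that arithmetic concrete, here is a minimal JavaScript sketch of these numbers. The 3.11 exponent follows from Glass's hypothesis (1.25 ^ x = 2), and treating each outside dependency as a separate complexity contribution is my reading of Sessions' model rather than code taken from the course:

// Glass's law: 25% more functionality doubles the complexity,
// so complexity grows with functionality to the power log(2)/log(1.25) ≈ 3.11.
var EXPONENT = Math.log(2) / Math.log(1.25);

// Illustrative reading of the model: a module's complexity is driven by its
// function points plus its outside connections, each raised to the exponent.
function complexity(functionPoints, connections) {
  return Math.pow(functionPoints, EXPONENT) + Math.pow(connections || 0, EXPONENT);
}

console.log(Math.round(complexity(4, 0)));     // ≈ 74: one function of 4 points
console.log(Math.round(2 * complexity(2, 0))); // ≈ 17: split into two functions of 2 points
console.log(Math.round(2 * complexity(2, 3))); // ≈ 78: three connections each and the benefit is gone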

An interesting observation that one can derive from this is a mathematical model that helps you to find which functions “belong” together. It stands to reason that when those functions suffer from technical interfacing, they will equally suffer from human interfaces. But how do we find which functions “belong” together, and does it matter if we get it approximately right? 

Endless possibilities

“Almost right doesn’t count” – Dr. Taylor; on landing a spacecraft, after a 300 million mile journey, 50 meters from a spot with adequate sunlight for the solar panels.

Partitioning math is incredibly complex, and the main problem with the separation of functions and interfaces is that it has massive implications if you get it “just about right”. This is neatly covered by “the Bell number” (http://en.wikipedia.org/wiki/Bell_number).

These numbers grow quite quickly: a set of 2 functions can be split 2 ways, a set of 3 already has 5 options, a set of 6 has 203, and if your application covers a mere 16 business functions there are already more than 10 billion ways to create sets, only a handful of which will give that desired low complexity number.
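For illustration, a small JavaScript sketch (a hypothetical helper, using the Bell triangle) reproduces these numbers:

// bell(n) returns the Bell number B(n): the number of ways to partition a set of n items.
function bell(n) {
  var row = [1];
  for (var i = 1; i <= n; i++) {
    // Each new row of the Bell triangle starts with the last entry of the previous row.
    var next = [row[row.length - 1]];
    for (var j = 0; j < row.length; j++) {
      next.push(next[j] + row[j]);
    }
    row = next;
  }
  return row[0];
}

console.log(bell(2));  // 2
console.log(bell(3));  // 5
console.log(bell(6));  // 203
console.log(bell(16)); // 10480142147, i.e. more than 10 billion possible partitions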

So how can math help us find the optimal set division, the one with the lowest complexity factor?

Equivalence Relations

In order to find business functions that belong together, or at least have so much in common that the number of interfaces will outweigh the functional complexity, we can resort to the set equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation). It is both the strong and the weak point of the Snowman architecture. It provides a genius algorithm for separating a set into the most optimal subsets (and doing so in O(n + k log k) time). The equivalence relation that Sessions proposes is as follows:

            Two business functions {a, b} have synergy if, and only if, from a business perspective {a} is not useful without {b} and vice versa.

The weak point is the subjective measurement in the equation: applied at too high a level everything will be required, while at too low a level it will not return any valuable business results.
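To make the mechanics of partitioning by an equivalence relation concrete, here is a small, brute-force JavaScript sketch (illustrative function names, O(n²) rather than Sessions' O(n + k log k) algorithm) that groups business functions into subsets given a synergy predicate:

// Partition business functions into equivalence classes, given a (subjective)
// synergy predicate supplied by the business. Union-find with path compression.
function partition(functions, haveSynergy) {
  var parent = {};
  functions.forEach(function (f) { parent[f] = f; });

  function find(f) { return parent[f] === f ? f : (parent[f] = find(parent[f])); }

  functions.forEach(function (a) {
    functions.forEach(function (b) {
      if (a !== b && haveSynergy(a, b)) {
        parent[find(a)] = find(b); // merge the two classes
      }
    });
  });

  var classes = {};
  functions.forEach(function (f) {
    var root = find(f);
    (classes[root] = classes[root] || []).push(f);
  });
  return Object.keys(classes).map(function (k) { return classes[k]; });
}

// Example with made-up webshop functions: placing and paying an order have synergy.
console.log(partition(
  ['place order', 'pay order', 'browse catalog'],
  function (a, b) { return (a === 'place order' && b === 'pay order') || (a === 'pay order' && b === 'place order'); }
));
// => [ [ 'place order', 'pay order' ], [ 'browse catalog' ] ]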

In my last project we split a large eCommerce platform into a customer-facing part and an order-handling part. This worked so well that the teams started complaining that the separation had lowered their knowledge of each other’s codebase, since very little functionality required coding on both subsystems.

We had effectively reduced complexity considerably, but could have taken it one step further. The order handling system was talking to a lot of other systems in order to get the order fulfilled. From a business perspective we could have separated further, reducing complexity even more. In fact, armed with Glass’s Law, we’ll refactor the application to make it even better than it is today.

Why bother?

Well, polynomially growing problems can't be solved with linear solutions.

Polynomial problems vs linear solutions plotted against time

As long as the complexity is below the solution curve, things will be going fine. Then there is a point in time where the complexity surpasses our ability to solve it. Sure, we can add a team or a new technology, but unless we change the nature of our problem, we are only postponing the inevitable.
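A tiny sketch, with made-up numbers for team capacity and functionality growth, shows where that point in time lies: the sprint in which polynomial complexity overtakes a linearly growing delivery capacity.

var EXPONENT = Math.log(2) / Math.log(1.25); // Glass's law exponent, ≈ 3.11

// Purely illustrative numbers: capacity grows linearly per sprint,
// complexity grows with the accumulated functionality to the power 3.11.
function crossoverSprint(capacityPerSprint, functionalityPerSprint) {
  for (var sprint = 1; sprint <= 1000; sprint++) {
    var capacity = capacityPerSprint * sprint;
    var complexity = Math.pow(functionalityPerSprint * sprint, EXPONENT);
    if (complexity > capacity) {
      return sprint; // the "uh-oh moment"
    }
  }
  return null;
}

console.log(crossoverSprint(100, 1)); // 9: with these toy numbers the team is overtaken in sprint 9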

This is the root cause of why your user stories should not exceed the sprint boundaries. Scrum forces you to chop the functionality into smaller pieces that keep the team in a phase where linear development power exceeds the complexity of the problem. In practice, in almost every case where we saw a team breaking this rule, they would end up at the “uh-oh moment” at some point in the future, at the stage where there are no neat solutions any more.

So believe in the math and divide your complexity curve into smaller chunks, where your solution capacity exceeds the problem's complexity. (As a bonus you get a happy and thriving team.)

(Edu) Scrum at XP Days Benelux: beware of the next generation

Xebia Blog - Sat, 11/29/2014 - 09:21

XP Days Benelux 2014 is over, and it was excellent.
Good sessions, interesting mix of topics and presenters, and a wonderful atmosphere of knowledge sharing, respect and passion for Agile.

After 12 years, XP Days Benelux continues to be inspiring and surprising.

The greatest surprise for me was the participation of 12 high school students from the Valuas College in Venlo, who arrived on the second day. These youngsters not only attended the conference, but actually hosted a 120-minute session on Scrum at school, called EduScrum.


Eduscrum

EduScrum uses the ceremonies, roles and artifacts of Scrum to help young people learn in a better way. Students work together in small teams, and thus take ownership of their own learning process. At the Valuas College, two enthusiastic Chemistry teachers introduced EduScrum in their department two years ago, and have made the switch to teaching Chemistry in this new way.

In an interactive session, we, the adults, learned from the youngsters how they work and what EduScrum brought them. They showed their (foldable!) Scrum boards, explained how their teams are formed, and what the impact was on their study results. Forcing themselves to speak English, they were open, honest, courageous and admirable.


Learnings

Doing Scrum in school has many similarities with doing Scrum at work. However, there is also a lot we can learn from the youngsters. These are my main takeaways:

- Transition is hard
It took the students some time to get used to working in the new way. At first they thought it was awkward. The transition took about… 4 lessons. That means that these youngsters were up and running with Scrum in 2 weeks (!).

- Inform your stakeholders
When the teachers introduced Scrum, they did not inform their main stakeholders, the parents. Some parents, therefore, were quite worried about this strange thing happening at school. However, after some explanations, the parents recognised that EduScrum actually helps to prepare their children for today’s society and were happy with the process.

- Results count
In schools more than anywhere else, your results (grades) count. EduScrum students are graded as a team as well as individually. When they transitioned to Scrum the students experienced a drop in their grades at first, maybe due to the greater freedom and responsibility they had to get used to. Soon after, their grades got better.

- Compliance is important
Schools and teachers have to comply with many rules and regulations. The knowledge that needs to get acquired each year is quite fixed. However, with EduScrum the students decide how they will acquire that knowledge.

- Scrum teaches you to cooperate
Not surprisingly, all students said that, next to Chemistry, they now learned to cooperate and communicate better. Because of this teamwork, most students like to work this way. However, this is also the reason a few classmates would like to return to the old, individual, style of learning. Teamwork does not suit everyone.

- Having fun helps you to work better
School (and work) should not be boring, and we work better together when we have some fun too. Therefore, next to a Definition of Done, the student teams also have a Definition of Fun.  :-)

Next generation Scrum

At the conference, the youngsters were surprised to see that so many companies that they know personally (like Bol.com) are actually doing Scrum. ‘I thought this was just something I learned to do in school ‘, one girl said. ‘But now I see that it is being used in so many companies and I will actually be able to use it after school, too.’

Beware of these youngsters. When this generation enters the work force, they will embrace Scrum as the natural way of working. In fact, this generation is going to take Scrum to the next level.

Make Any Framework Suck Less With These 10 Insightful Lessons

Alexey Migutsky in 2 years with Angular has a lot to say about Angular, which I can't comment on at all, not being an Angular user. But buried in his article are some lessons for building better frameworks that obviously come from deep experience. Frameworks will always suck, but if you follow these lessons will your frameworks suck less? Yes, I think they will.

Here are Alexey's Lessons for framework (and metaframework) developers:

  1. You should have as small a number of abstractions as possible.
  2. You should name things consistently with your "thought domain".
  3. Do not mix several responsibilities in your components. Make fine-grained abstractions with well-defined roles.
  4. Always describe the intention for your decisions and tradeoffs in your documentation.
  5. Have a curated and updated reference project/examples.
  6. Your abstractions should scale "from the bottom up". Start with small items and then fit them to a Composite pattern. Do not start with the question "How do we override it globally?".
  7. Global state is pure evil. It's like darkness in the horror films - you never know what problems you will have when you tread into it...
  8. The dataflow and data changes should be granular and localized to a single component.
  9. Do not make things easy to use, make your components and abstractions simple to understand. People should learn how to do stuff in a new and effective way, do not ADAPT to their comfort zone.
  10. Do not encode all the good things you know in the framework.
Categories: Architecture

How to implement validation callbacks in AngularJS 1.3

Xebia Blog - Wed, 11/26/2014 - 09:21

In my current project we've recently switched from AngularJS 1.2 to 1.3. Except for a few breaking changes the upgrade was quite trivial. However, after diving into the changelog we noticed that the way AngularJS handles form validation changed drastically. Since we're working on a greenfield application we decided it was worth the effort to rewrite the validation logic. The main argument for this was that the validation we had could be drastically simplified by using the new validation pipeline.

This article is aimed at AngularJS developers interested in the new validation pipeline offered by AngularJS 1.3. Except for a small introduction, this article will not be about all the different aspects related to validating forms. I will showcase 2 different cases in which we had to come up with custom solutions:

  • Displaying additional information after successful validation
  • Validating equality of multiple password fields

What has changed?

Whereas in AngularJS 1.2 we could use $parsers and $formatters for form validation, AngularJS 1.3 introduces the concept of $validators and $asyncValidators. As we can deduce from the names, the latter is for server-side validations using HTTP calls and the former is for validations on the client-side.

All validators are directives that are registered on a specific ngModel by adding them to either ngModel.$validators or ngModel.$asyncValidators. When validating, all $validators run before the $asyncValidators are executed.

AngularJS 1.3 also utilises the HTML5 validation API wherever possible. For each of the HTML5 validation attributes, AngularJS offers a directive. For example, for minlength, add ng-minlength to your input field to incorporate the minimum length check.

When it comes down to showing error messages we can rely on the $error property on an ngModel. Whenever one of the validators fails, it will add the name of the validator to the $error property. Using the brand new ngMessages module we can then easily display a specific error message depending on the type of validator.

Displaying additional information after successful validation

Implementing the new validation pipeline came with a few challenges. The biggest being that we had quite a few use cases in which, after successfully validating a field, we wanted to display some data returned by the web service. Below I will discuss how we've solved this.

The directive itself is very simple and merely does the following:

  1. Clear the data displayed next to the field. If the user has already entered text and the validation succeeds, the data from the validation call will be shown next to the input field. If the user were to change the input's value and it would not validate correctly, the data displayed next to the field would be stale. To prevent this we first clear the data displayed next to the field at the start of the validation.
  2. Validate the content against the web service using the HelloResource. Besides returning the promise the resource gives us we invoke the callback() method when the promise is successfully resolved.
  3. Display data returned by the HTTP call using a callback method
'use strict';

angular.module('angularValidators')
  .directive('validatorWithCallback', function (HelloResource) {
    return {
      require: 'ngModel',
      link: function (scope, element, attrs, ngModel) {
        function callback(response) {}

        ngModel.$asyncValidators.validateWithCallback = function (modelValue, viewValue) {
          callback('');

          var value = modelValue || viewValue;

          return HelloResource.get({name: value}).$promise.then(function (response) {
            callback(response);
          });
        };
      }
    };
});

We can add the validator to our input by adding the validator-with-callback attribute to the input which we would like to validate.

<form name="form">
    <input type="text" name="name" ng-model="name" required validator-with-callback />
</form>
Implementing the clear and callback

Because this directive should be independent from any specific ngModel we have to find a way to pass the ngModel to the directive. To accomplish this we add a value to the validator-with-callback attribute. We also change the ng-model attribute value to name.value. Why this is required will be explained later on. To finish we also add a div that will only display when the form element is valid and we will set it to display the value of name.detail.

<form name="form">
    <input type="text" name="name" ng-model="name.value" required validator-with-callback="name" />
    <div ng-if="form.name.$valid">{{name.detail}}</div>
</form>

The $eval method from scope can be used to resolve the object using the attribute's value. Displaying the data won't work if we simply supply and overwrite any scoped object (e.g. $scope.data). We have to add a scoped object name which contains 2 properties: value and detail. Note: the naming is not important.

We will add a controller to our HTML file which will be responsible for setting the default value for our scoped object name. As shown in our HTML view above, the value property will be used for storing the value of the field. The detail property will be used to store the response from the web service call and display it to the user.

'use strict';

angular.module('angularValidators')
  .controller('ValidationController', function ($scope) {
    $scope.name = {
      value: undefined,
      detail: ''
    };
});

The last thing is changing the directive implementation to retrieve the target object and implement the clear and callback methods. We do this by retrieving the value from the validator-with-callback attribute by calling scope.$eval(attrs.validatorWithCallback). When we have the target object we can implement the callback method.

'use strict';

angular.module('angularValidators')
  .directive('validatorWithCallback', function (HelloResource) {
    return {
      require: 'ngModel',
      link: function (scope, element, attrs, ngModel) {
        var target = scope.$eval(attrs.validatorWithCallback);

        function callback(response) {
          if (_.isUndefined(target)) {
            return;
          }

          target.detail = response.msg;
        }
        ngModel.$asyncValidators.validateWithCallback = function (modelValue, viewValue) {
          ...omitted...
        };
      }
    };
});

This is all that's needed to create a directive with a callback method. This callback method uses data returned by the web service to populate the detail property, but it can of course be adjusted to do anything we desire.

Validating equality of multiple password fields

The second example we would like to show is a synchronous validator that validates the values of 2 different fields. The use case we had for this were 2 password fields which were required to be equal.

Requirements
  • (In)validate both fields when the user changes the value of either one of them and they are (not) equal
  • The validator successfully validates the field if the second input has not been touched or the value of the second input is empty.
  • Only display 1 error message and only when no other validators (required & min-length) are invalid
Implementation

We start off by creating the 2 different form elements, both with a required and an ng-minlength validator. We also add a button to the form to show how enabling/disabling the button depending on the validity of the form works.

Both password fields also have the validate-must-equal-to="other_field_name" attribute. This indicates that we wish to validate the value of this field against the field defined by the attribute. We also add a form-name="form" attribute to pass in the name of the form to our directive. This is needed for invalidating the second input on our form without hardcoding the form name inside the directive and thus making this directive fully independent from form and field names.

To conclude we also conditionally show or hide the error displaying containers. For the errors related to the first input field we also specify that it should not display if the notEqualTo error has been set by our directive. This ensures that no empty div will be displayed if our validator invalidates the first field.

<form name="form">
    <input type="password" name="password" ng-model="password" required ng-minlength="8" validate-must-equal-to="password2" form-name="form" />
    <div ng-messages="form.password.$error" ng-if="form.password.$touched && form.password.$invalid && !form.password.$error.notEqualTo">
        <div ng-message="required">This field is required</div>
        <div ng-message="minlength">Your password must be at least 8 characters long</div>
    </div>
    <br />
    <input type="password" name="password2" ng-model="password2" required ng-minlength="8" validate-must-equal-to="password" form-name="form" />
    <div ng-messages="form.password2.$error" ng-if="form.password2.$touched && form.password2.$invalid">
        <div ng-message="required">This field is required</div>
        <div ng-message="minlength">Your password must be at least 8 characters long</div>
    </div>

    <p>The submit button will only be enabled when the entire form is valid</p>
    <button ng-disabled="form.$invalid">Submit</button>
</form>

The validator itself is again very compact. Basically all we want is to retrieve the value from the input and pass it to an isEqualToOther method which returns a boolean. At the beginning of the link method we also do a check to see if the form-name attribute is provided. If not, we throw an error. We do this to communicate to any developer reusing this directive that it requires the form name to function correctly. Unfortunately at this moment there is no other way to communicate the additional mandatory attribute.

'use strict';

angular.module('angularValidators')
  .directive('validateMustEqualTo', function () {
    return {
      require: 'ngModel',
      link: function (scope, element, attrs, ngModel) {
        if (_.isUndefined(attrs.formName)) {
          throw 'For this directive to function correctly you need to supply the form-name attribute';
        }

        function isEqualToOther(value) {
          ...omitted...
        }

        ngModel.$validators.notEqualTo = function (modelValue, viewValue) {
          var value = modelValue || viewValue;

          return isEqualToOther(value);
        };
      }
    };
  });

The isEqualToOther method itself does the following:

  • Retrieve the other input form element
  • Throw an error if it cannot be found which again means this directive won't function as intended
  • Retrieve the value from the other input and validate the field if the input has not been touched or the value is empty
  • Compare both values
  • Set the validity of the other field depending on the comparison
  • Return the comparison to (in)validate the field this directive is linked to
'use strict';

angular.module('angularValidators')
  .directive('validateMustEqualTo', function () {
    return {
      require: 'ngModel',
      link: function (scope, element, attrs, ngModel) {
        ...omitted...

        function isEqualToOther(value) {
          var otherInput = scope[attrs.formName][attrs.validateMustEqualTo];
          if (_.isUndefined(otherInput)) {
            throw 'Cannot retrieve the second field to compare with from the scope';
          }

          var otherValue = otherInput.$modelValue || otherInput.$viewValue;
          if (otherInput.$untouched || _.isEmpty(otherValue)) {
            return true;
          }

          var isEqual = (value === otherValue);

          otherInput.$setValidity('notEqualTo', isEqual);

          return isEqual;
        }

        ngModel.$validators.notEqualTo = function (modelValue, viewValue) {
          ...omitted...
        };
      }
    };
  });
Alternative solution

An alternative solution to the validate-must-equal-to directive could be to implement a directive that encapsulates both password fields and has a scoped function that would handle validation using ng-blur on the fields or a $watch on both properties. However, this approach does not use the out-of-the-box validation pipeline and we would thus have to extend the logic in the form button's ng-disabled to allow the user to submit the form.
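For illustration only, here is a minimal sketch of what that alternative could look like (hypothetical directive and scope property names, not code from our project). It watches both password properties and sets a scoped flag, which is exactly why the button's ng-disabled expression would need to be extended to something like form.$invalid || !passwordsMatch:

'use strict';

// Hypothetical alternative: wrap both fields and compare them with a $watchGroup
// instead of plugging into the AngularJS 1.3 validation pipeline.
angular.module('angularValidators')
  .directive('passwordsMustMatch', function () {
    return {
      restrict: 'A',
      link: function (scope) {
        scope.$watchGroup(['password', 'password2'], function (values) {
          var first = values[0];
          var second = values[1];

          // Treat the pair as valid while the second field is still empty.
          scope.passwordsMatch = !second || first === second;
        });
      }
    };
  });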

Conclusion

AngularJS 1.3 introduces a new validation pipeline which is incredibly easy to use. However, when faced with more advanced validation rules it becomes clear that certain features (like the callback mechanism) are lacking for which we had to find custom solutions. In this article we've shown you 2 different validation cases which extend the standard pipeline.

Demo application

I've set up a stand-alone demo application which can be cloned from GitHub. This demo includes both validators and karma tests that cover all the different scenarios. Please feel free to use and modify this code as you feel appropriate.

Sponsored Post: Apple, Asana, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Sr. Software Engineer-iOS Systems. Do you love building highly scalable, distributed web applications? Does the idea of performance tuning Java applications make your heart leap? Would you like to work in a fast-paced environment where your technical abilities will be challenged on a day to day basis? Do you want your work to make a difference in the lives of millions of people? Please apply here.
    • Apple Pay - Site Reliability Engineer. You already know this… every issue counts. A single ticket can be the key to discovering an issue impacting thousands of people. And now that you’ve found it, you can’t wait to fix it. You also know that owning the quality of an application is about separating the signal from the noise. Finding that signal is what motivates you. Now that you’ve found it, your next step is to roll up the sleeves and start coding. As a member of the Apple Pay SRE team, you’re expected to not just find the issues, but to write code and fix them. Please apply here.
    • Senior Software Engineer - iOS Systems. This role demands the best and brightest engineers. The ideal candidate will be well rounded and offer a diverse skill set that aligns with key qualifications. Practical experience integrating with a diverse set of third-party APIs will also serve to distinguish you from other candidates. This is a highly cross-functional role; the typical team member's day-to-day responsibilities are on the Carrier Services team. Please apply here.
  • Aerospike is hiring! Join the innovative team behind the world's fastest flash-optimized in-memory NoSQL database. Currently hiring for positions in our Mountain View, Calif., and Bangalore offices. Apply now! http://www.aerospike.com/careers

  • As a production-focused infrastructure engineer at Asana, you’ll be the person who takes the lead on setting and achieving our stability and uptime goals, architecting the production stack, defining the on-call experience, the build process, cluster management, monitoring and alerting. Please apply here.

  • Performance and Scale Engineer at Sprout Social: you will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/
Cool Products and Services
  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

CITCON Europe 2014 wrap-up

Xebia Blog - Mon, 11/24/2014 - 19:33

On the 19th and 20th of September CITCON (pronounced "kit-con") took place in Zagreb, Croatia. CITCON is dedicated to continuous integration and testing. It brings together some of the most interesting people of the European testing and continuous integration community. These people also determine the topics of the conference.

They can do this because CITCON is an Open Space conference. If you're not familiar with the concept of Open Space, check out Wikipedia. On Friday evening, attendees can pitch their proposals. Through dot voting and (constant) shuffling of the schedule, the attendees create their conference program.

In this post we'll wrap up a few topics that were discussed.

Polytesting

Peter Zsoldos (@zsepi) went into his most recent brain-spin: polytesting. If I have a set of requirements, is it feasible to apply those requirements at different levels of my application - say, component, integration and UI level? This sounds very appealing because you can perform ATDD at different levels.
This approach is particularly interesting because it has the potential to keep you focused on the required functionality all the way. You'll need good, concrete requirements for this to work.

Microservices

Microservices are a hot topic nowadays. The promise of small, isolated units with clear interfaces is tempting. There are generally two types of architectures that can be applied. The most common one is where there is no central entity, and services communicate with each other directly.

Douglas Squirrel (@douglasquirrel) explained an alternative architecture by using a central pub-sub "database" to which each service is allowed to publish "facts". Douglas deliberately used the term facts to describe single items that are considered true at a specific point in time ("events" is too generic a term).

The second model comes closer to mechanisms such as event sourcing (or even ESBs if you take it to the extreme). One of the advantages of this approach is that, because facts are stored, it's possible to construct new functionality based on existing facts. For example, you could use this functionality in a game to create leader boards and, at a later stage, create leaderboards per continent, country, or whatever seems fit.
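As a small illustration of that idea (a toy sketch, not Douglas's actual implementation), services publish immutable facts to a central log and a per-country leaderboard is derived later by folding over the stored facts:

// A toy in-memory fact log; in reality this would be the central pub-sub "database".
var facts = [];

function publish(fact) {
  fact.at = Date.now(); // a fact is considered true at a specific point in time
  facts.push(fact);
}

// Services publish facts as they happen.
publish({ type: 'game-finished', player: 'alice', country: 'NL', score: 120 });
publish({ type: 'game-finished', player: 'bob', country: 'HR', score: 95 });

// Added much later: a leaderboard per country, built purely from existing facts.
function leaderboard(country) {
  return facts
    .filter(function (f) { return f.type === 'game-finished' && f.country === country; })
    .sort(function (a, b) { return b.score - a.score; })
    .map(function (f) { return f.player + ': ' + f.score; });
}

console.log(leaderboard('NL')); // [ 'alice: 120' ]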

Unit testing

Arjan Molenaar introduced a flaming hot topic this year: "unit testing is a waste". Inspired by the recent discussions between DHH, Martin Fowler, and Kent Beck, Arjan tried to find out the opinions of the CITCON crowd. Most of the people contributing to the discussion must have been working in consultancy, because the main conclusion was "It depends".

Whether unit testing is worth the effort mainly depends on the goals that people try to achieve when writing their unit tests. Some people write them from a TDD perspective. They use tests to guide themselves through development cycles, making sure they haven't made little errors. If this helps you, then please keep doing it! If it does not really help, well ...

Other people write unit tests from a regression perspective, or at least maintain them for regression testing. This part caused the most discussion. How useful are unit tests for regression testing purposes? Are you really catching regression if you isolate a single unit?

The growing interest in microservices also sheds new light on this discussion. In the future, when microservices will be widely adopted, we will be working with much smaller codebases. They might be so small and clear that unit tests are no longer required to guide us through the development process.

CI scaling

Another trending topic was scaling CI systems. It was good to see that the ideas we have at Xebia were consistent with the ideas we heard at CITCON. First off, the solution for everything (and world peace, it seems) is microservices. Unfortunately, some of us, for the time being, must deal with monolithic codebases. Luckily there are still options for your growing CI system, even though for now it remains one big chunk of code.

The staged pipeline: you test the things most likely to break first. Basically, you break your test suite up into multiple test suites and run them at separate stages in the pipeline.

But how do you determine what is most likely to break and what to test where? Tests that are most likely to break are those that are closely linked to the code changes, so run them first. Also, determine high-risk areas for your application. These areas can be identified based on trends (in failing tests) or simply through human analysis. To determine where to run the different test suites is mainly a matter of speed versus confidence. You want fast feedback so you don't want to push all your tests to the end of the pipeline. But you also don't want to wait forever before you know you can move on to the next thing.
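As a rough illustration of that trade-off (made-up suite names, failure rates and durations), ordering suites by failure likelihood per minute of runtime puts fast, risky feedback first and pushes expensive low-risk suites to later stages:

// Hypothetical historical data per test suite.
var suites = [
  { name: 'unit',            failureRate: 0.35, minutes: 3 },
  { name: 'checkout-ui',     failureRate: 0.20, minutes: 12 },
  { name: 'search-api',      failureRate: 0.05, minutes: 8 },
  { name: 'full-regression', failureRate: 0.10, minutes: 45 }
];

// Most likely to break per minute of runtime goes first: fast feedback early,
// slow low-risk confidence builders at the end of the pipeline.
var ordered = suites.slice().sort(function (a, b) {
  return (b.failureRate / b.minutes) - (a.failureRate / a.minutes);
});

ordered.forEach(function (suite, stage) {
  console.log('stage ' + (stage + 1) + ': ' + suite.name);
});
// stage 1: unit, stage 2: checkout-ui, stage 3: search-api, stage 4: full-regression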

Beer brewing for process refinement

Who isn't interested in beer brewing? Tom Denley (@scarytom) proposed a session on home brewing and its analogy to software development. Because Arjan is a homebrewer himself, this seemed like an obvious session for him.

In addition to Tom explaining the process of brewing, we discussed how we got into brewing. In both cases, the first brew was made with a can of hopped malt syrup, adding yeast, and producing a beer from there. For your second beer, you replace the can of syrup with malt extract powder and dark malt (for flavour). At a later stage, you can replace the malt extract with ground malt.

What we basically do is start with the end in mind. If you're starting with continuous delivery, it is considered good practice to do the same: get your application deployed in production as soon as possible and optimise your process from your deployed app back to source code.

Again, it was a good conference with some nice take-aways. Next year's episode will most likely take place in Finland. The year after that... The Netherlands?

Testing cheatsheet

Xebia Blog - Mon, 11/24/2014 - 19:00

Sometimes it is not clear to everybody how unit tests relate to e2e tests. This cheatsheet I created describes on one page:

  1. The different definitions
  2. Different structures of the tests
  3. The importance of unit tests
  4. The importance of e2e tests
  5. External versus internal quality
  6. E2E and unit tests living next to each other

Feel free to download and use it in your project if you feel there is a confusion of tongues between unit and e2e tests.

Download: TestingCheatSheet

 

A Flock of Tasty Sources on How to Start Learning High Scalability

This is a guest repost by Leandro Moreira.


When we are interested in scalability we usually look for links, explanations, books, and references. This mini article links to the references I think might help you in this journey.

DISCLAIMER:

You don’t need to have N machines to build/test a cluster or highly scalable system; nowadays you can use Vagrant to spin up N machines easily.

THE REFERENCES:

Now that you know you can empower yourself with virtual servers, I challenge you to not only read these links but put them into practice.

Good questions to test your knowledge:

Categories: Architecture

Stuff The Internet Says On Scalability For November 21st, 2014

Hey, it's HighScalability time:


Sweet dreams brave Philae. May you awaken to a bright-throned dawn for one last challenge.

 

  •  80 million: bacteria xferred in a juicy kiss;
  • Quotable Quotes:
    • James Hamilton: Every day, AWS adds enough new server capacity to support all of Amazon's global infrastructure when it was a $7B annual revenue enterprise.
    • @iglazer: What is the test that could most destroy your business model?  Test for that. @adrianco #defragcon
    • @zhilvis: Prefer decoupling over duplication. Coupling will kill you before duplication does by @ICooper #buildstufflt
    • @jmbroad: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." ~George Box
    • @RichardWarburto: Optimisation maybe premature but measurement isn't.
    • @joeerl: Hell hath no version numbers - the great ones saw no need for version numbers - they used port numbers instead. See, for example, RFC 821,
    • JustCallMeBen: tldr: queues only help to flatten out burst load. Make sure your maintained throughput is high enough.
    • @rolandkuhn: «the event log is a database of the past, not just of the present» — @jboner at #reactconf
    • @ChiefScientist: CRUD is dead. -- @jboner #reactconf 
    • @fdmts: 30T of flash disks cabled up and online.  Thanks @scalableinfo!
    • monocasa: Immutable, statically linked, minimal system images running micro services on top of a hypervisor is a very old concept too. This is basically the direction IBM went in the 60's with their hypervisors and they haven't looked back.
    • Kiril Savino: Scaling is the process of decoupling load from latency.

  • Perhaps they were controlled by a master AI? Google and Stanford Built Similar Neural Networks Without Knowing It: Neural networks can be plugged into one another in a very natural way. So we simply take a convolutional neural network, which understands the content of images, and then we take a recurrent neural network, which is very good at processing language, and we plug one into the other. They speak to each other—they can take an image and describe it in a sentence.

  • You know how you never really believed the view in MVC was ever really separate? Now this is MVC. WatchKit apps run on the iPhone as an extension, only the UI component runs on the watch. XWindows would be so proud.

  • Shopify shows how they Build an Internal Cloud with Docker and CoreOS: Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1700 cores and 6 TB RAM.

  • Machine learning isn't just about creating human-like AIs. It's a technology, like electricity, that will transform everything it affixes with its cyclops gaze. Here's a seemingly mundane example from Google, as discussed on the Green (Low Carbon) Data Center Blog. Google has turned inward, applying Machine Learning to its data center fleet. The result: Google achieved an 8% to 25% reduction in the energy used to cool the data center, with an average of 15%. Who wouldn’t be excited to save an average of 15% on their cooling energy costs by providing new settings to run the mechanical plant? < And this is how the world will keep those productivity increases reaching skyward.

  • Does anyone say "I love my water service"? Or "I love my garbage service"? Then why would anyone say "I love Facebook"? That's when you've arrived. When you are so much a part of the way things are that people don't even think of loving them or not. They just are. The Fall of Facebook

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

We are leaving 3x-4x performance on the table just because of configuration.

Performance guru Martin Thompson gave a great talk at Strangeloop: Aeron: Open-source high-performance messaging, and one of the many interesting points he made was how much performance is being lost because we aren't configuring machines properly.

This point comes on the observation that "Loss, throughput, and buffer size are all strongly related."

Here's a gloss of Martin's reasoning. It's a problem that keeps happening, and people aren't aware of it because most people do not know how to tune network parameters in the OS.

The separation of programmers and system admins has become an anti-pattern. Developers don’t talk to the people who have root access on machines who don’t talk to the people that have network access. Which means machines are never configured right, which leads to a lot of loss. We are leaving 3x-4x performance on the table just because of configuration.

We need to work out how to bridge that gap, know what the parameters are, and how to fix them.

So know your OS network parameters and how to tune them.
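As a starting point, here is a tiny Node sketch (Linux only, with illustrative thresholds that are my assumption, not Martin's recommendations) that reads a couple of kernel socket-buffer parameters and flags ones that look small for high-throughput links:

var fs = require('fs');

// Real Linux sysctl parameters; the 16 MB "want" values are illustrative only.
var params = [
  { name: 'net.core.rmem_max', path: '/proc/sys/net/core/rmem_max', wantAtLeast: 16 * 1024 * 1024 },
  { name: 'net.core.wmem_max', path: '/proc/sys/net/core/wmem_max', wantAtLeast: 16 * 1024 * 1024 }
];

params.forEach(function (p) {
  var value = parseInt(fs.readFileSync(p.path, 'utf8'), 10);
  var verdict = value < p.wantAtLeast ? 'consider raising it (sysctl -w ' + p.name + '=...)' : 'looks ok';
  console.log(p.name + ' = ' + value + ' -> ' + verdict);
});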

Related Articles
Categories: Architecture

What is a Monolith?

Coding the Architecture - Simon Brown - Wed, 11/19/2014 - 10:00

There is currently a strong trend for microservice based architectures and frequent discussions comparing them to monoliths. There is much advice about breaking-up monoliths into microservices and also some amusing fights between proponents of the two paradigms - see the great Microservices vs Monolithic Melee. The term 'Monolith' is increasingly being used as a generic insult in the same way that 'Legacy' is!

However, I believe that there is a great deal of misunderstanding about exactly what a 'Monolith' is and those discussing it are often talking about completely different things.

A monolith can be considered an architectural style or a software development pattern (or anti-pattern if you view it negatively). Styles and patterns usually fit into different Viewtypes (a viewtype is a set, or category, of views that can be easily reconciled with each other [Clements et al., 2010]) and some basic viewtypes we can discuss are:

  • Module - The code units and their relation to each other at compile time.
  • Allocation - The mapping of the software onto its environment.
  • Runtime - The structure of the software elements and how they interact at runtime.

A monolith could refer to any of the basic viewtypes above.


Module Monolith

If you have a module monolith then all of the code for a system is in a single codebase that is compiled together and produces a single artifact. The code may still be well structured (classes and packages that are coherent and decoupled at a source level rather than a big-ball-of-mud) but it is not split into separate modules for compilation. Conversely a non-monolithic module design may have code split into multiple modules or libraries that can be compiled separately, stored in repositories and referenced when required. There are advantages and disadvantages to both but this tells you very little about how the code is used - it is primarily done for development management.

Module Monolith


Allocation Monolith

For an allocation monolith, all of the code is shipped/deployed at the same time. In other words once the compiled code is 'ready for release' then a single version is shipped to all nodes. All running components have the same version of the software running at any point in time. This is independent of whether the module structure is a monolith. You may have compiled the entire codebase at once before deployment OR you may have created a set of deployment artifacts from multiple sources and versions. Either way this version for the system is deployed everywhere at once (often by stopping the entire system, rolling out the software and then restarting).

A non-monolithic allocation would involve deploying different versions to individual nodes at different times. This is again independent of the module structure as different versions of a module monolith could be deployed individually.

Allocation Monolith


Runtime Monolith

A runtime monolith will have a single application or process performing the work for the system (although the system may have multiple, external dependencies). Many systems have traditionally been written like this (especially line-of-business systems such as Payroll, Accounts Payable, CMS etc).

Whether the runtime is a monolith is independent of whether the system code is a module monolith or not. A runtime monolith often implies an allocation monolith if there is only one main node/component to be deployed (although this is not the case if a new version of software is rolled out across regions, with separate users, over a period of time).

Runtime Monolith

Note that my examples above are slightly forced for the viewtypes and it won't be as hard-and-fast in the real world.

Conclusion

Be very careful when arguing about 'Microservices vs Monoliths'. A direct comparison is only possible when discussing the Runtime viewtype and properties. You should also not assume that moving away from a Module or Allocation monolith will magically enable a Microservice architecture (although it will probably help). If you are moving to a Microservice architecture then I'd advise you to consider all these viewtypes and align your boundaries across them i.e. don't just code, build and distribute a monolith that exposes subsets of itself on different nodes.

Categories: Architecture

Ready, Test, Go!

Xebia Blog - Tue, 11/18/2014 - 10:42

The full potential of many an agile organization is hardly ever reached. Many teams find themselves redefining user stories although they have been committed to as part of the sprint. The ‘ready phase’, meant to get user stories clear and sufficiently detailed so they can be implemented, is missed. How will each user story result in high quality features that deliver business value? The ‘Definition of Ready’ is lacking one important entry: “Automated tests are available.” Ensuring that testable, and hence automated, acceptance criteria exist before committing to user stories in a sprint allows you to retain focus during the sprint. We define this as: Ready, Test, Go!

Ready

Behaviour-Driven Development has proven to be a fine technique to write automated acceptance criteria. Using the Gherkin format (given, when, then), examples can be specified that need to be supported by the system once the user story is completed. When a sufficiently detailed list of examples is available, all Scrum stakeholders agree on the specification. Common understanding is achieved: once the story is implemented, we are one step closer to building the right thing.
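For example, a scenario for a hypothetical story about free shipping (the story and numbers are made up for illustration, not taken from a real backlog) could be written like this:

Scenario: Orders above the threshold ship for free
  Given the free shipping threshold is 50 euros
  And a customer has a cart worth 55 euros
  When the customer proceeds to checkout
  Then no shipping costs are added to the order

Each such example is both documentation and, once glued to the application with fixture code, an executable acceptance test.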

Test

The specification itself becomes executable: at any moment in time, the gap between the desired and the implemented functionality becomes visible. In other words, this automated acceptance test should be run continuously. The first time it runs, it happily fails. Next, implementation can start. Following Test-Driven Development principles, this starts with writing (also failing) unit tests, after which development of the production code begins. When both the unit tests and the acceptance tests for a story pass, the next user story can be picked up; its tests, in turn, will happily fail at first. Tests thus act as a safeguard to continuously verify that the team is building the thing right. Later, the automated tests (acceptance tests and unit tests) serve as a safety net for regression testing during subsequent sprints.

Go!

That's simple: release your software to production. Ensure that other testing activities (performance tests, chain tests, etc.) are automated as much as possible and performed as part of the sprint.

The (Agile) Test Automation Engineer

In order to facilitate or boost this way of working, the role of the test automation engineer is key. The test automation engineer defines the test architecture and provides the infrastructure needed to run tests often and fast. He interacts with developers to co-develop fixtures, to understand how the production code is built, and to decide upon the structure and granularity of the test code.

Apart from their valuable and unique analytical skills, relevant testers grow their technical skills. If it cannot be automated, it cannot be checked, so one might question whether the user story is ready to be committed to in a sprint. The test automation engineer helps the Scrum teams to identify when they are ‘ready to test’ and urges the product owner and business to specify requirements – at least for the next sprint ahead.

So: ready, test, go!!

Service discovery with consul and consul-template

Agile Testing - Grig Gheorghiu - Tue, 11/18/2014 - 00:20
I talked in the past about an "Ops Design Pattern: local haproxy talking to service layer". I described how we used a local haproxy on pretty much all nodes at a given layer of our infrastructure (webapp, API, e-commerce) to talk to services offered by the layer below it. So each webapp server has a local haproxy that talks to all API nodes it sends requests to. Similarly, each API node has a local haproxy that talks to all e-commerce nodes it needs info from.

This seemed like a good idea at the time, but it turns out it has a couple of annoying drawbacks:
  • each local haproxy runs health checks against N nodes, so if you have M nodes running haproxy, each of the N nodes will receive M health checks; if M and N are large, then you have a health check storm on your hands
  • to take a node out of a cluster at any given layer, we tag it as 'inactive' in Chef, then run chef-client on all nodes that run haproxy and talk to the inactive node at layers above it; this gets old pretty fast, especially when you're doing anything that might conflict with Chef and that the chef-client run might overwrite (I know, I know, you're not supposed to do anything of that nature, but we are all human :-)
For the second point, we are experimenting with haproxyctl so that we don't have to run chef-client on every node running haproxy. But it still feels like a heavy-handed approach.
If I were to do this again (which I might), I would still have an haproxy instance in front of our webapp servers, but for communicating from one layer of services to another I would use a proper service discovery tool such as grampa Apache ZooKeeper or the newer kids on the block, etcd from CoreOS and consul from HashiCorp.

I settled on consul for now, so in this post I am going to show how you can use consul in conjunction with the recently released consul-template to discover services and to automate configuration changes. At the same time, I wanted to experiment a bit with Ansible as a configuration management tool. So the steps I'll describe were actually automated with Ansible, but I'll leave that for another blog post.

The scenario I am going to describe involves 2 haproxy instances, each pointing to 2 Wordpress servers running Apache, PHP and MySQL, with Varnish fronting the Wordpress application. One of the 2 Wordpress servers is considered primary as far as haproxy is concerned, and the other one is a backup server, which will only get requests if the primary server is down. All servers are running Ubuntu 12.04.

Install and run the consul agent on all nodes

The agent will start in server mode on the 2 haproxy nodes, and in agent mode on the 2 Wordpress nodes.

I first deployed consul to the 2 haproxy nodes. I used a modified version of the ansible-consul role from jivesoftware. The configuration file /etc/consul.conf for the first server (lb1) is:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "lb1",
  "server": true,
  "bind_addr": "10.0.0.1",
  "datacenter": "us-west-1b",
  "bootstrap": true,
  "rejoin_after_leave": true
}

(and similar for lb2, with only node_name and bind_addr changed to lb2 and 10.0.0.2 respectively)

The ansible-consul role also creates a consul user and group, and an upstart configuration file like this:

# cat /etc/init/consul.conf

# Consul Agent (Upstart unit)
description "Consul Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec sudo -u consul -g consul /opt/consul/bin/consul agent -config-dir /etc/consul.d -config-file=/etc/consul.conf >> /var/log/consul 2>&1
respawn
respawn limit 10 10
kill timeout 10

To start/stop consul, I use:

# start consul
# stop consul

Note that "server" is set to true and "bootstrap" is also set to true, which means that each consul server will be the leader of a cluster with 1 member, itself. To join the 2 servers into a consul cluster, I did the following:
  • join lb1 to lb2: on lb1 run consul join 10.0.0.2
  • tail /var/log/consul on lb1, note messages complaining about both consul servers (lb1 and lb2) running in bootstrap mode
  • stop consul on lb1: stop consul
  • edit /etc/consul.conf on lb1 and set  "bootstrap": false
  • start consul on lb1: start consul
  • tail /var/log/consul on both lb1 and lb2; it should show no more errors
  • run consul info on both lb1 and lb2; the output should show server=true on both nodes, but leader=true only on lb2
Next I ran the consul agent in regular non-server mode on the 2 Wordpress nodes. The configuration file /etc/consul.conf on node wordpress1 was:
{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "wordpress1",
  "server": false,
  "bind_addr": "10.0.1.1",
  "datacenter": "us-west-1b",
  "rejoin_after_leave": true
}
(and similar for wordpress2, with the node_name set to wordpress2 and bind_addr set to 10.0.1.2)
After starting up the agents via upstart, I joined them to lb2 (although they could be joined to any of the existing members of the cluster). I ran this on both wordpress1 and wordpress2:
# consul join 10.0.0.2
At this point, running consul members on any of the 4 nodes should show all 4 members of the cluster:
Node          Address         Status  Type    Build  Protocol
lb1           10.0.0.1:8301   alive   server  0.4.0  2
wordpress2    10.0.1.2:8301   alive   client  0.4.0  2
lb2           10.0.0.2:8301   alive   server  0.4.0  2
wordpress1    10.0.1.1:8301   alive   client  0.4.0  2
Install and run dnsmasq on all nodes
The ansible-consul role does this for you. Consul piggybacks on DNS resolution for service naming, and by default the domain names internal to Consul start with consul. In my case this is configured in consul.conf via "domain": "consul."
The dnsmasq configuration file for consul is:
# cat /etc/dnsmasq.d/10-consul
server=/consul./127.0.0.1#8600
This causes dnsmasq to provide DNS resolution for domain names starting with consul. by querying a DNS server on 127.0.0.1 running on port 8600 (which is the port the local consul agent listens on to provide DNS resolution).
To start/stop dnsmasq, use: service dnsmasq start | stop.
Now that dnsmasq is running, you can look up names that end in .node.consul from any member node of the consul cluster (there are 4 member nodes in my cluster, 2 servers and 2 agents). For example, I ran this on lb2:
$ dig wordpress1.node.consul
; <<>> DiG 9.8.1-P1 <<>> wordpress1.node.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2511
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;wordpress1.node.consul. IN A

;; ANSWER SECTION:
wordpress1.node.consul. 0 IN A 10.0.1.1

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 14 00:09:16 2014
;; MSG SIZE  rcvd: 76
Configure services and checks on consul agent nodes

Internal DNS resolution within the .consul domain becomes even more useful when nodes define services and checks. For example, the 2 Wordpress nodes run varnish and apache (on port 80 and port 443) so we can define 3 services as JSON files in /etc/consul.d. On wordpress1, which is our active/primary node in haproxy, I defined these services:

$ cat http_service.json
{
    "service": {
        "name": "http",
        "tags": ["primary"],
        "port": 80,
        "check": {
            "id": "http_check",
            "name": "HTTP Health Check",
            "script": "curl -H 'Host: www.mydomain.com' http://localhost",
            "interval": "5s"
        }
    }
}

$ cat ssl_service.json
{
    "service": {
        "name": "ssl",
        "tags": ["primary"],
        "port": 443,
        "check": {
            "id": "ssl_check",
            "name": "SSL Health Check",
            "script": "curl -k -H 'Host: www.mydomain.com' https://localhost:443",
            "interval": "5s"
        }
    }
}

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["primary"],
        "port": 6081,
        "check": {
            "id": "varnish_check",
            "name": "Varnish Health Check",
            "script": "curl http://localhost:6081",
            "interval": "5s"
        }
    }
}

Each service we defined has a name, a port and a check with its own ID, name, script that runs whenever the check is executed, and an interval that specifies how often the check is run. In the examples above I specified simple curl commands against the ports that these services are running on. Note also that each service has a list of tags associated with it. In my case, the services on wordpress1 have the tag "primary". The services defined on wordpress2 are identical to the ones on wordpress1 with the only difference being the tag, which on wordpress2 is "backup".

After restarting consul on wordpress1 and wordpress2, the following service-related DNS names are available for resolution on all nodes in the consul cluster (I am going to include only relevant portions of the dig output):

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.1
varnish.service.consul. 0 IN A 10.0.1.2

This name resolves in DNS round-robin fashion to the IP addresses of all nodes that are running the varnish service, regardless of their tags and regardless of the data centers that their nodes run in. In our case, it resolves to the IP addresses of wordpress1 and wordpress2.

Note that the IP address of a given node only appears in the DNS result set if the service running on that node has a passing health check. If the check fails, then consul's DNS service will not include the IP of the node in the result set. This is very important for the dynamic discovery of healthy services.
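If you want to see which nodes are currently passing their checks without going through DNS, the health endpoint of the HTTP API can be queried directly. A quick sketch (double-check the ?passing filter against the documentation for the Consul version you are running):

# List only the nodes whose checks for the varnish service are currently passing
$ curl http://localhost:8500/v1/health/service/varnish?passing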

$ dig varnish.service.us-west-1b.consul

;; ANSWER SECTION:
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.2
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.1

If we include the data center (in our case us-west-1b) in the DNS name we query, then only the services running on nodes in that data center will be returned in the result set. In our case though, all nodes run in the us-west-1b data center, so this query returns, like the previous one, the IP addresses of wordpress1 and wordpress2. Note that the IPs can be returned in any order, because of DNS round-robin. In this case the IP of wordpress2 was first.

$ dig SRV varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress1.node.us-west-1b.consul.
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress2.node.us-west-1b.consul.

;; ADDITIONAL SECTION:
wordpress1.node.us-west-1b.consul. 0 IN A 10.0.1.1
wordpress2.node.us-west-1b.consul. 0 IN A 10.0.1.2

A useful feature of the consul DNS service is that it returns the port number that a given service runs on when queried for an SRV record. So this query returns the names and IPs of the nodes that the varnish service runs on, as well as the port number, which in this case is 6081. The application querying for the SRV record needs to interpret this extra piece of information, but this is very useful for the discovery of internal services that might run on non-standard port numbers.

$ dig primary.varnish.service.consul

;; ANSWER SECTION:
primary.varnish.service.consul. 0 IN A 10.0.1.1

$ dig backup.varnish.service.consul

;; ANSWER SECTION:
backup.varnish.service.consul. 0 IN A 10.0.1.2

The 2 DNS queries above show that it's possible to query a service by its tag, in our case 'primary' vs. 'backup'. The result set will contain the IP addresses of the nodes tagged with the specific tag and running the specific service we asked for. This feature will prove useful when dealing with consul-template in haproxy, as I'll show later in this post.

Load balance across services

It's easy now to see how an application can take advantage of the internal DNS service provided by consul and load balance across services. For example, an application that needs to load balance across the 2 varnish services on wordpress1 and wordpress2 would use varnish.service.consul as the DNS name it talks to when it needs to hit varnish. Every time this DNS name is resolved, a random node from wordpress1 and wordpress2 is returned via the DNS round-robin mechanism. If varnish were to run on a non-standard port number, the application would need to issue a DNS request for the SRV record in order to obtain the port number as well as the IP address to hit.

Note that this method of load balancing has health checks built in. If the varnish health check fails on one of the nodes providing the varnish service, that node's IP address will not be included in the DNS result set returned by the DNS query for that service.

Also note that the DNS query can be customized for the needs of the application, which can query for a specific data center, or a specific tag, as I showed in the examples above.
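As a rough sketch of what that looks like from a client's point of view, here is how a shell script could discover both the address and the port via the SRV record (assuming dig is available and the local dnsmasq/consul setup described above; the output shown is what I would expect given the earlier dig examples):

# Ask the local resolver (dnsmasq -> consul) for the SRV record of the varnish service
$ dig +short SRV varnish.service.consul
1 1 6081 wordpress1.node.us-west-1b.consul.
1 1 6081 wordpress2.node.us-west-1b.consul.

# Resolve one of the returned node names to an IP to build a host:port pair
$ dig +short wordpress1.node.us-west-1b.consul
10.0.1.1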

Force a node out of service

I am still looking for the best way to take nodes in and out of service for maintenance or other purposes. One way I found so far is to deregister a given service via the Consul HTTP API. Here is an example of a curl command that accomplishes that, executed on node wordpress1:

$ curl -v http://localhost:8500/v1/agent/service/deregister/varnish
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/agent/service/deregister/varnish HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 17 Nov 2014 19:01:06 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
* Closing connection #0

The effect of this command is that the varnish service on node wordpress1 is 'deregistered', which for my purposes means 'marked as down'. DNS queries for varnish.service.consul will only return the IP address of wordpress2:

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.2

We can also use the Consul HTTP API to verify that the varnish service does not appear in the list of active services on node wordpress1. We'll use the /agent/services API call and we'll save the output to a file called services.out, then we'll use the jq tool to pretty-print the output:

$ curl -v http://localhost:8500/v1/agent/services -o services.out
$ jq . <<< `cat services.out`
{
  "http": {
    "ID": "http",
    "Service": "http",
    "Tags": [
      "primary"
    ],
    "Port": 80
  },
  "ssl": {
    "ID": "ssl",
    "Service": "ssl",
    "Tags": [
      "primary"
    ],
    "Port": 443
  }
}
Note that only the http and ssl services are shown.
Force a node back in service

Again, I am still looking for the best way to mark a service as 'up' once it was marked as 'down'. One way would be to register the service again via the Consul HTTP API, which requires sending the service's JSON definition to the agent's register endpoint. Another way is to just restart the consul agent on the node in question. This will re-register the service that had been deregistered previously.
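For completeness, here is a sketch of what re-registering the varnish service might look like against the agent's register endpoint. Note that the register API takes the bare service definition rather than the 'service'-wrapped form used in the /etc/consul.d config files; treat the exact payload and HTTP method as something to verify against the docs for your Consul version:

$ curl -X PUT http://localhost:8500/v1/agent/service/register \
    -d '{"Name": "varnish", "Tags": ["primary"], "Port": 6081, "Check": {"Script": "curl http://localhost:6081", "Interval": "5s"}}'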

Install and configure consul-template

For the next few steps, I am going to show how to use consul-template in conjunction with consul for discovering services and configuring haproxy based on the discovered services.

I automated the installation and configuration of consul-template via an Ansible role that I put on Github, but I am going to discuss the main steps here. See also the instructions on the consul-template Github page.

In my Ansible role, I copy the consul-template binary to the target node (in my case the 2 haproxy nodes lb1 and lb2), then create a directory structure /opt/consul-template/{bin,config,templates}. The consul-template configuration file is /opt/consul-template/config/consul-template.cfg and it looks like this in my case:

$ cat config/consul-template.cfg
consul = "127.0.0.1:8500"

template {
  source = "/opt/consul-template/templates/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

Note that consul-template needs to be able to talk to a consul agent, which in my case is the local agent listening on port 8500. The template that consul-template maintains is defined in another file, /opt/consul-template/templates/haproxy.ctmpl. What consul-template does is watch the Consul services and keys referenced in that template. Upon any change to them, consul-template will render the template and copy the result to the destination file, which in my case is the haproxy config file /etc/haproxy/haproxy.cfg. Finally, consul-template will execute a command, which in my case restarts the haproxy service.
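Before letting consul-template restart haproxy automatically, it can be reassuring to render the template once to stdout and eyeball the result. A quick sketch using consul-template's dry-run flags (confirm with consul-template -h that your version supports them):

# Render the template once to stdout, without writing haproxy.cfg or running the restart command
$ /opt/consul-template/bin/consul-template \
    -config=/opt/consul-template/config/consul-template.cfg \
    -dry -once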

Here is the actual template file for my haproxy config, which is written in the Go template format:

$ cat /opt/consul-template/templates/haproxy.ctmpl

global
  log 127.0.0.1   local0
  maxconn 4096
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  option  dontlognull
  retries 3
  option redispatch
  timeout connect 5s
  timeout client 50s
  timeout server 50s
  balance  roundrobin

# Set up application listeners here.

frontend http
  maxconn {{key "service/haproxy/maxconn"}}
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
{{range service "primary.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

frontend https
  maxconn            {{key "service/haproxy/maxconn"}}
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
{{range service "primary.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}


To the trained eye, this looks like a regular haproxy configuration file, with the exception of the portions wrapped in {{ }} above. These are Go template snippets which rely on a couple of template functions exposed by consul-template above and beyond what the Go templating language offers. Specifically, the key function queries a key stored in the Consul key/value store and outputs the value associated with that key (or an empty string if the value doesn't exist). The service function queries a consul service by its DNS name and returns a result set used inside the range statement. The variables inside the result set can be inspected for properties such as Node, Address and Port, which correspond to the Consul service node name, IP address and port number for that particular service.

In my example above, I use the value of the key service/haproxy/maxconn as the value of maxconn. In the http-varnish backend, I used 2 sets of service names, primary.varnish and backup.varnish, because I wanted to differentiate in haproxy.cfg between the primary server (wordpress1 in my case) and the backup server (wordpress2). In the ssl backend, I did the same but with the ssl service.
Everything so far would work fine with the exception of the key/value pair represented by the key service/haproxy/maxconn. To define that pair, I used the Consul key/value store API (this can be run on any member of the Consul cluster):
$ cat set_haproxy_maxconn.sh
#!/bin/bash
MAXCONN=4000
curl -X PUT -d "$MAXCONN" http://localhost:8500/v1/kv/service/haproxy/maxconn
To verify that the value was set, I used:
$ cat query_consul_kv.sh
#!/bin/bash
curl -v http://localhost:8500/v1/kv/?recurse
$ ./query_consul_kv.sh
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/kv/?recurse HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 30563
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Mon, 17 Nov 2014 23:01:07 GMT
< Content-Length: 118
<
* Connection #0 to host localhost left intact
* Closing connection #0
[{"CreateIndex":10995,"ModifyIndex":30563,"LockIndex":0,"Key":"service/haproxy/maxconn","Flags":0,"Value":"NDAwMA=="}]
At this point, everything is ready for starting up the consul-template service. On Ubuntu, I did it via this Upstart configuration file:
# cat /etc/init/consul-template.conf

# Consul Template (Upstart unit)
description "Consul Template"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]
exec /opt/consul-template/bin/consul-template  -config=/opt/consul-template/config/consul-template.cfg >> /var/log/consul-template 2>&1
respawn
respawn limit 10 10
kill timeout 10
# start consul-template
Once consul-template starts, it will perform the actions corresponding to the functions defined in the template file /opt/consul-template/templates/haproxy.ctmpl. In my case, it will query Consul for the value of the key service/haproxy/maxconn and for information about the 2 Consul services varnish.service and ssl.service. It will then save the generated file to /etc/haproxy/haproxy.cfg and it will restart the haproxy service. The relevant snippets from haproxy.cfg are:
frontend http
  maxconn 4000
  bind 0.0.0.0:80
  default_backend servers-http

backend servers-http
  balance            roundrobin
  option httpchk GET /
  option  httplog
    server wordpress1 10.0.1.1:6081 weight 1 check port 6081

    server wordpress2 10.0.1.2:6081 backup weight 1 check port 6081
and
frontend https
  maxconn            4000
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
    server wordpress1 10.0.1.1:443 weight 1 check port 443

    server wordpress2 10.0.1.2:443 backup weight 1 check port 443
I've been running this as a test on lb2. I don't consider my setup quite production-ready because I don't have monitoring in place, and I also want to experiment with consul security tokens for better security. But this is a pattern that I think will work.





Aeron: Do we really need another messaging system?

Do we really need another messaging system? We might if it promises to move millions of messages a second, at small microsecond latencies between machines, with consistent response times, to large numbers of clients, using an innovative design.  

And that’s the promise of Aeron (the Celtic god of battle, not the chair, though tell that to the search engines), a new high-performance open source message transport library from the team of Todd Montgomery, a multicast and reliable protocol expert, Richard Warburton, an expert on compiler optimizations, and Martin Thompson, the pasty faced performance gangster.

The claims are that Aeron already beats the best products out there on throughput, and that its latency matches the best commercial products up to the 90th percentile. Aeron can push small 40 byte messages at 6 million messages a second, which is a very difficult case.

Here’s a talk Martin gave on Aeron at Strangeloop: Aeron: Open-source high-performance messaging. I’ll give a gloss of his talk as well as integrating in sources of information listed at the end of this article.

Martin and his team were in the enviable position of having a client that required a product like Aeron and was willing to both finance its development while also making it open source. So go git Aeron on GitHub. Note, it’s early days for Aeron and they are still in the heavy optimization phase.

The world has changed, and therefore endpoints need to scale as never before. This is why Martin says we need a new messaging system. It’s now a multi-everything world. We have multi-core, multi-socket, multi-cloud, multi-billion user computing, where communication is happening all the time. Huge numbers of consumers regularly pound a channel to read from the same publisher, which causes lock contention and queueing effects, which in turn cause throughput to drop and latency to spike.

What’s needed is a new messaging library to make the most of this new world. The move to microservices only heightens the need:

As we move to a world of micro services then we need very low and predictable latency from our communications otherwise the coherence component of USL will come to rain fire and brimstone on our designs.

With Aeron the goal is to keep things pure and focused. The benchmarking we have done so far suggests a step forward in throughput and latency. What is quite unique is that you do not have to choose between throughput and latency. With other high-end messaging transports this is a distinct choice. The algorithms employed by Aeron give maximum throughput while minimising latency up until saturation.

“Many messaging products are a Swiss Army knife; Aeron is a scalpel,” says Martin, which is a good way to understand Aeron. It’s not a full featured messaging product in the way you may be used to, like Kafka. Aeron does not persist messages, it doesn’t support guaranteed delivery, nor clustering, nor does it support topics. Aeron won’t know if a client has crashed and be able to sync it back up from history or initialize a new client from history. 

The best way to place Aeron in your mental matrix might be as a message oriented replacement for TCP, with higher level services written on top. Todd Montgomery expands on this idea:

Aeron being an ISO layer 4 protocol provides a number of things that messaging systems can't and also doesn't provide several things that some messaging systems do.... if that makes any sense. Let me explain slightly more wrt all typical messaging systems (not just Kafka and 0MQ). 

One way to think more about where Aeron fits is TCP, but with the option of reliable multicast delivery. However, that is a little limited in that Aeron also, by design, has a number of possible uses that go well beyond what TCP can do. Here are a few things to consider: 

Todd continues on with more detail, so please keep reading the article to see more on the subject.

At its core Aeron is a replicated persistent log of messages. And through a very conscious design process messages are wait-free and zero-copy along the entire path from publication to reception. This means latency is very good and very predictable.

That sums up Aeron in a nutshell. It was created by an experienced team, using solid design principles sharpened on many previous projects, backed by techniques not everyone has in their tool chest. Every aspect has been well thought out to be clean, simple, highly performant, and highly concurrent.

If simplicity is indistinguishable from cleverness, then there’s a lot of cleverness going on in Aeron. Let’s see how they did it...

Categories: Architecture

How To Build a Roadmap for Your Digital Business Transformation

Let’s say you want to take your business to the Cloud --  How do you do it?

If you’re a small shop or a startup, it might be easy to just swipe your credit card and get going.

If, on the other hand, you’re a larger business that wants to start your journey to the Cloud, with a lot of investments and people that you need to bring along, you need a roadmap.

The roadmap will help you deal with setbacks, create confidence in the path, and help ensure that you can get from point A to point B (and that you know what point B actually is).  By building an implementable roadmap for your business transformation, you can also build a coalition of the willing to help you get there faster.  And you can design your roadmap so that your journey delivers continuous business value along the way.

In the book, Leading Digital: Turning Technology into Business Transformation, George Westerman, Didier Bonnet, and Andrew McAfee, share how top leaders build better roadmaps for their digital business transformation.

Why You Need to Build a Roadmap for Your Digital Transformation

If you had infinite time and resources, maybe you could just wing it, and hope for the best.   A better approach is to have a roadmap as a baseline.  Even if your roadmap changes, at least you can share the path with others in your organization and get them on board to help make it happen.

Via Leading Digital:

“In a perfect world, your digital transformation would deliver an unmatched customer experience, enjoy the industry's most effective operations, and spawn innovative, new business models.  There are a myriad of opportunities for digital technology to improve your business and no company can entertain them all at once.  The reality of limited resources, limited attention spans, and limited capacity for change will force focused choices.  This is the aim of your roadmap.”

Find Your Entry Point

Your best starting point is a business capability that you want to exploit.

Via Leading Digital:

“Many companies have come to realize that before they can create a wholesale change within their organization, they have to find an entry point that will begin shifting the needle.  How? They start by building a roadmap that leverages existing assets and capabilities.  Burberry, for example, enjoyed a globally recognized brand and a fleet of flagship retail locations around the world.  The company started by revitalizing its brand and customer experience in stores and online.  Others, like Codelco, began with the core operational processes of their business.  Caesars Entertainment combined strong capabilities in analytics with a culture of customer service to deliver a highly personalized guest experience.  There is no single right way to start your digital transformation.  What matters is that you find the existing capability--your sweet spot--that will get your company off the starting blocks.

Once your initial focus is clear, you can start designing your transformation roadmap.  Which investments and activities are necessary to close the gap to your vision?  What is predictable, and what isn't? What is the timing and scheduling of each initiative? What are the dependencies between them?  What organizational resources, such as analytics skills, are required?”

Engage Practitioners Early in the Design

If you involve others in your roadmap, you get their buy-in, and they will help you with your business transformation.

Via Leading Digital:

“Designing your roadmap will require input from a broad set of stakeholders.  Rather than limit the discussion to the top team, engage the operational specialists who bring an on-the-ground perspective.  This will minimize the traditional vision-to-execution gap.  You can crowd-source the design.  Or, you can use facilitated workshops, such as 'digital days,' as an effective way to capture and distill the priorities and information you will need to consider.  We've seen several Digital Masters do both.

Make no mistake; designing your roadmap will take time, effort, and multiple iterations.  But you will find it a valuable exercise.  It forces agreement on priorities and helps align senior management and the people tasked to execute the program.  Your roadmap will become more than just a document.  If executed well, it can be the canvas of the transformation itself.  Because your roadmap is a living document, it will evolve as your implementation progresses.”

Design for Business Outcome, Not Technology

When you create your roadmap, focus on the business outcomes.   Think in terms of adding incremental business capabilities.   Don’t make it a big bang thing.   Instead, start small, but iterate on building business capabilities that take advantage of Cloud, Mobile, Social, and Big Data technologies.

Via Leading Digital:

“Technology for its own sake is a common trap.  Don't build your roadmap as a series of technology projects.  Technology is only part of the story in digital transformation and often the least challenging one.  For example, the major hurdles for Enterprise 2.0 platforms are not technical.  Deploying the platform is relatively straightforward, and today's solutions are mature.  The challenge lies in changing user behavior--encouraging adoption and sustaining engagement in the activities the platform is meant to enable.

Express your transformation roadmap in terms of business outcomes.  For example, 'Establish a 360-degree understanding of our customers.'  Build into your roadmap the many facets of organizational change that your transformation will require: customer experiences, operational processes, employee ways of working, organization, culture, communication--the list goes on.  This is why contributions from a wide variety of stakeholders are so critical.”

There are lots of ways to build a roadmap, but the best thing you can do is put something down on paper so that you can share the path with other people and start getting feedback and buy-in.

You’ll be surprised, but when you show business and IT leaders a roadmap, it helps turn strategy into execution and makes things real in people’s minds.

You Might Also Like

10 High-Value Activities in the Enterprise

Cloud Changes the Game from Deployment to Adoption

Drive Business Transformation by Reenvisioning Operations

Drive Business Transformation by Reenvisioning Your Customer Experience

Dual-Speed IT Drives Business Transformation and Improves IT-Business Relationships

How To Build a Better Business Case for Digital Initiatives

How To Improve the IT-Business Relationship

How Leaders are Building Digital Skills

Management Innovation is at the Top of the Innovation Stack

Categories: Architecture, Programming