Software Development Blogs: Programming, Software Testing, Agile, Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!


Announcing Windows Server 2016 Containers Preview

ScottGu's Blog - Scott Guthrie - Wed, 08/19/2015 - 17:01

At DockerCon this year, Mark Russinovich, CTO of Microsoft Azure, demonstrated the first ever application built using code running in both a Windows Server Container and a Linux container connected together. This demo helped demonstrate Microsoft's vision that in partnership with Docker, we can help bring the Windows and Linux ecosystems together by enabling developers to build container-based distributed applications using the tools and platforms of their choice.

Today we are excited to release the first preview of Windows Server Containers as part of our Windows Server 2016 Technical Preview 3 release. We’re also announcing great updates from our close collaboration with Docker, including enabling support for the Windows platform in the Docker Engine and a preview of the Docker Engine for Windows. Our Visual Studio Tools for Docker, which we previewed earlier this year, have also been updated to support Windows Server Containers, providing you a seamless end-to-end experience straight from Visual Studio to develop and deploy code to both Windows Server and Linux containers. Last but not least, we’ve made it easy to get started with Windows Server Containers in Azure via a dedicated virtual machine image.

Windows Server Containers

Windows Server Containers create a highly agile Windows Server environment, enabling you to accelerate the DevOps process to efficiently build and deploy modern applications. With today’s preview release, millions of Windows developers will be able to experience the benefits of containers for the first time using the languages of their choice – whether .NET, ASP.NET, PowerShell, Python, Ruby on Rails, Java or many others.

Today’s announcement delivers on the promise we made in partnership with Docker, the fast-growing open platform for distributed applications, to offer container and DevOps benefits to Linux and Windows Server users alike. Windows Server Containers are now part of the Docker open source project, and Microsoft is a founding member of the Open Container Initiative. Windows Server Containers can be deployed and managed either using the Docker client or PowerShell.

Getting Started using Visual Studio

The preview of our Visual Studio Tools for Docker, which enables developers to build and publish ASP.NET 5 Web Apps or console applications directly to a Docker container, has been updated to include support for today’s preview of Windows Server Containers. The extension automates creating and configuring your container host in Azure, building a container image which includes your application, and publishing it directly to your container host. You can download and install this extension, and read more about it, at the Visual Studio Gallery here: http://aka.ms/vslovesdocker.

Once installed, developers can right-click on their projects within Visual Studio and select “Publish”:


Doing so will display a Publish dialog which will now include the ability to deploy to a Docker Container (on either a Windows Server or Linux machine):


You can choose to deploy to any existing Docker host you already have running:


Or use the dialog to create a new Virtual Machine running either Windows Server or Linux with containers enabled.  The screenshot below shows how easy it is to create a new VM hosted on Azure that runs today’s Windows Server 2016 TP3 preview that supports Containers – you can do all of this (and deploy your apps to it) easily without ever having to leave the Visual Studio IDE:

Getting Started Using Azure

In June of last year, at the first DockerCon, we enabled a streamlined Azure experience for creating and managing Docker hosts in the cloud. Up until now these hosts have only run on Linux. With the new preview of Windows Server 2016 supporting Windows Server Containers, we have enabled a parallel experience for Windows users.

Directly from the Azure Marketplace, users can now deploy a Windows Server 2016 virtual machine pre-configured with the container feature enabled and Docker Engine installed. Our quick start guide has all of the details, including screen shots and a walkthrough video, so take a look here: https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/azure_setup.


Once your container host is up and running, the quick start guide includes step by step guides for creating and managing containers using both Docker and PowerShell.

Getting Started Locally Using Hyper-V

Creating a virtual machine on your local machine using Hyper-V to act as your container host is now really easy. We’ve published some PowerShell scripts to GitHub that automate nearly the whole process so that you can get started experimenting with Windows Server Containers as quickly as possible. The quick start guide has all of the details at https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/container_setup.

Once your container host is up and running, the quick start guide includes step by step guides for creating and managing containers using both Docker and PowerShell.

Additional Information and Resources

A great list of resources including links to past presentations on containers, blogs and samples can be found in the community section of our documentation. We have also set up a dedicated Windows containers forum where you can provide feedback, ask questions and report bugs. If you want to learn more about the technology behind containers I would highly recommend reading Mark Russinovich’s blog on “Containers: Docker, Windows and Trends” that was published earlier this week.

Summary

At the //Build conference earlier this year we talked about our plan to make containers a fundamental part of our application platform, and today’s releases are a set of significant steps in making this a reality. The decision we made to embrace Docker and the Docker ecosystem to enable this in both Azure and Windows Server has generated a lot of positive feedback and we are just getting started.

While there is still more work to be done, users in the Windows Server ecosystem can now begin experiencing the world of containers. I highly recommend you download the Visual Studio Tools for Docker, create a Windows Container host in Azure or locally, and try out our PowerShell and Docker support. Most importantly, we look forward to hearing feedback on your experience.

Hope this helps,

Scott

Categories: Architecture, Programming

The Microsoft Take on Containers and Docker

This is a guest repost by Mark Russinovich, CTO of Microsoft Azure (and novelist!). We all benefit from a vibrant competitive cloud market and Microsoft is part of that mix. Here's a good container overview along with Microsoft's plan of attack. Do you like their story? Is it interesting? Is it compelling?

You can’t have a discussion on cloud computing lately without talking about containers. Organizations across all business segments, from banks and major financial service firms to e-commerce sites, want to understand what containers are, what they mean for applications in the cloud, and how to best use them for their specific development and IT operations scenarios.

From the basics of what containers are and how they work, to the scenarios they’re being most widely used for today, to emerging trends supporting “containerization”, I thought I’d share my perspectives to better help you understand how to best embrace this important cloud computing development to more seamlessly build, test, deploy and manage your cloud applications.

Containers Overview

In abstract terms, all of computing is based upon running some “function” on a set of “physical” resources, like processor, memory, disk, network, etc., to accomplish a task, whether a simple math calculation, like 1+1, or a complex application spanning multiple machines, like Exchange. Over time, as the physical resources became more and more powerful, often the applications did not utilize even a fraction of the resources provided by the physical machine. Thus “virtual” resources were created to simulate underlying physical hardware, enabling multiple applications to run concurrently – each utilizing fractions of the physical resources of the same physical machine.

We commonly refer to these simulation techniques as virtualization. While many people immediately think virtual machines when they hear virtualization, that is only one implementation of virtualization. Virtual memory, a mechanism implemented by all general purpose operating systems (OSs), gives applications the illusion that a computer’s memory is dedicated to them and can even give an application the experience of having access to much more RAM than the computer has available.

Containers are another type of virtualization, also referred to as OS Virtualization. Today’s containers on Linux create the perception of a fully isolated and independent OS to the application. To the running container, the local disk looks like a pristine copy of the OS files, the memory appears only to hold files and data of a freshly-booted OS, and the only thing running is the OS. To accomplish this, the “host” machine that creates a container does some clever things.

The first technique is namespace isolation. Namespaces include all the resources that an application can interact with, including files, network ports and the list of running processes. Namespace isolation enables the host to give each container a virtualized namespace that includes only the resources that it should see. With this restricted view, a container can’t access files not included in its virtualized namespace regardless of their permissions because it simply can’t see them. Nor can it list or interact with applications that are not part of the container, which fools it into believing that it’s the only application running on the system when there may be dozens or hundreds of others.

For efficiency, many of the OS files, directories and running services are shared between containers and projected into each container’s namespace. Only when an application makes changes to its container, for example by modifying an existing file or creating a new one, does the container get distinct copies from the underlying host OS – but only of those portions changed, using Docker’s “copy-on-write” optimization. This sharing is part of what makes deploying multiple containers on a single host extremely efficient.

Second, the host controls how much of the host’s resources can be used by a container. Governing resources like CPU, RAM and network bandwidth ensures that a container gets the resources it expects and that it doesn’t impact the performance of other containers running on the host. For example, a container can be constrained so that it cannot use more than 10% of the CPU. That means that even if the application within it tries, it can’t access the other 90%, which the host can assign to other containers or keep for its own use. Linux implements such governance using a technology called “cgroups.” Resource governance isn’t required in cases where containers placed on the same host are cooperative, allowing for standard OS dynamic resource assignment that adapts to changing demands of application code.

The combination of instant startup that comes from OS virtualization and reliable execution that comes from namespace isolation and resource governance makes containers ideal for application development and testing. During the development process, developers can quickly iterate. Because its environment and resource usage are consistent across systems, a containerized application that works on a developer’s system will work the same way on a different production system. The instant-start and small footprint also benefits cloud scenarios, since applications can scale-out quickly and many more application instances can fit onto a machine than if they were each in a VM, maximizing resource utilization.

Comparing a similar scenario that uses virtual machines with one that uses containers highlights the efficiency gained by the sharing. In the example shown below, the host machine has three VMs. In order to provide the applications in the VMs complete isolation, they each have their own copies of OS files, libraries and application code, along with a full in-memory instance of an OS. Starting a new VM requires booting another instance of the OS, even if the host or existing VMs already have running instances of the same version, and loading the application libraries into memory. Each application VM pays the cost of the OS boot and the in-memory footprint for its own private copies, which also limits the number of application instances (VMs) that can run on the host.

App Instances on Host

The figure below shows the same scenario with containers. Here, containers simply share the host operating system, including the kernel and libraries, so they don’t need to boot an OS, load libraries or pay a private memory cost for those files. The only incremental space they take is any memory and disk space necessary for the application to run in the container. While the application’s environment feels like a dedicated OS, the application deploys just like it would onto a dedicated host. The containerized application starts in seconds and many more instances of the application can fit onto the machine than in the VM case.

Containers on Host

Docker’s Appeal
Categories: Architecture

Release Burn Down Brought to Life

Xebia Blog - Tue, 08/18/2015 - 23:48

Inspired by Mike Cohn's blog post [Coh08] "Improving On Traditional Release Burndown Charts", I created a time-lapse version of it. It also nicely demonstrates that forecasts of "What will be finished?" (at a certain time) get better as the project progresses.

The improved traditional release burn down chart clearly shows (a) what is finished (light green), (b) what will very likely be finished (dark green), (c) what may or may not be finished (orange), and (d) what is almost guaranteed not to be finished (red).

This knowledge supports product owners in ordering the backlog based on the current knowledge.


The result is obtained by doing a Monte Carlo simulation of a toy project, using a fixed product backlog of around 100 backlog items of various sizes. The amount of work realized also varies per project day, based on a simple uniform probability distribution.

Forecasting is done using a 'worst' velocity and a 'best' velocity. Both are determined using only the last 3 velocities, i.e. only the last 3 sprints are considered.

The 2 grey lines represent the height of the orange part of the backlog, i.e. the backlog items that might or might not be finished. This also indicates the uncertainty over time of what will actually be delivered by the team at the given time.
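As a rough illustration, the simulation described above could be sketched in JavaScript like this. This is a minimal, hypothetical version: the ~100-item backlog, the uniform random progress and the 3-sprint forecast window come from the description above, but all concrete numbers (item sizes of 1-5 points, a velocity range of 10-30 points) are assumptions, and the per-day variation is simplified to per-sprint variation:

```javascript
// Fixed backlog: ~100 items with random sizes between 1 and 5 points (assumed sizes).
const backlog = Array.from({ length: 100 }, () => 1 + Math.floor(Math.random() * 5));
const totalPoints = backlog.reduce((a, b) => a + b, 0);

const sprintVelocities = [];
let remaining = totalPoints;
let sprint = 0;

while (remaining > 0) {
  // Work realized this sprint: uniformly random between 10 and 30 points (assumed range).
  const velocity = 10 + Math.random() * 20;
  remaining = Math.max(0, remaining - velocity);
  sprintVelocities.push(velocity);
  sprint++;

  // Forecast band built from the last 3 sprint velocities only.
  const recent = sprintVelocities.slice(-3);
  const worst = Math.min(...recent);
  const best = Math.max(...recent);

  // Sprints still needed under each assumption: these two numbers
  // correspond to the two grey lines in the chart.
  const pessimistic = Math.ceil(remaining / worst);
  const optimistic = Math.ceil(remaining / best);
  console.log(`sprint ${sprint}: ${remaining.toFixed(0)} pts left, ` +
              `forecast ${optimistic}-${pessimistic} more sprints`);
}
```

As in the movie, the gap between the optimistic and pessimistic forecasts narrows as more sprints contribute to the velocity window and the remaining backlog shrinks.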




The Making Of...

The movie above has been created using GNU Plot [GNU Plot] for drawing the charts, and ffmpeg [ffmpeg] has been used to create the time-lapse movie from the set of charts.


Over time the difference between the 2 grey lines gets smaller, a clear indication of improving predictability and reduction of risk. The movie also shows that the final set of backlog items done lies well between the 2 grey lines from the start of the project.

This looks very similar to the 'Cone of Uncertainty'. Besides the fact that the shape of the grey lines only remotely resembles a cone, another difference is that the above simulation merely takes statistical chances into account. The fact that the team gains more knowledge and insight over time is not considered in the simulation, whereas it is an important factor in the 'Cone of Uncertainty'.


[Coh08] "Improving On Traditional Release Burndown Charts", Mike Cohn, June 2008, https://www.mountaingoatsoftware.com/blog/improving-on-traditional-release-burndown-charts

[GNU Plot] Gnu plot version 5.0, "A portable command-line driven graphing utility", http://www.gnuplot.info

[ffmpeg] "A complete, cross-platform solution to record, convert and stream audio and video", https://ffmpeg.org

Sponsored Post: Surge, Redis Labs, Jut.io, VoltDB, Datadog, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • VoltDB's in-memory SQL database combines streaming analytics with transaction processing in a single, horizontal scale-out platform. Customers use VoltDB to build applications that process streaming data the instant it arrives to make immediate, per-event, context-aware decisions. If you want to join our ground-breaking engineering team and make a real impact, apply here.  

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?
Cool Products and Services
  • MongoDB Management Made Easy. Gain confidence in your backup strategy. MongoDB Cloud Manager makes protecting your mission critical data easy, without the need for custom backup scripts and storage. Start your 30 day free trial today.

  • In a recent benchmark of NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance obliterated Couchbase, Cassandra and Aerospike in this real-life, write-intensive use case. A full backstage pass and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a .NET-native in-memory database for analysing large amounts of data. It runs natively on .NET and provides native .NET, COM and ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Iterables, Iterators and Generator functions in ES2015

Xebia Blog - Mon, 08/17/2015 - 20:45

ES2015 adds a lot of new features to JavaScript that make a number of powerful constructs, present in other languages for years, available in the browser (well, as soon as support for those features is rolled out of course, but in the meantime we can use them via a transpiler such as Babeljs or Traceur).
Some of the more complicated additions are the iterator and iterable protocols and generator functions. In this post I'll explain what they are and what you can use them for.

The Iterable and Iterator protocols

These protocols are analogous to, for example, Java interfaces and define the contract that an object must adhere to in order to be considered an iterable or an iterator. So instead of introducing new language features they leverage existing constructs by agreeing upon a convention (as JavaScript does not have a concept like an interface found in other typed languages). Let's have a closer look at these protocols and see how they interact with each other.

The Iterable protocol

This protocol specifies that for an object to be considered iterable (and usable in, for example, a `for ... of` loop) it has to define a function with the special key `Symbol.iterator` that returns an object adhering to the iterator protocol. That is basically the only requirement. For example, say you have a data structure you want to iterate over; in ES2015 you would do that as follows:

class DataStructure {
  constructor(data) {
    this.data = data;
  }

  [Symbol.iterator]() {
    let current = 0;
    let data = this.data.slice();
    return {
      next: function () {
        return {
          value: data[current++],
          done: current > data.length
        };
      }
    };
  }
}

let ds = new DataStructure(['hello', 'world']);

console.log([...ds]) // ["hello","world"]

The big advantage of using the iterable protocol over another construct like `for ... in` is that you have more clearly defined iteration semantics (for example: you do not need explicit hasOwnProperty checking when iterating over an array to filter out properties that are on the array object but not in the array). Another advantage is that when using generator functions you can benefit from lazy evaluation (more on generator functions later).
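The difference is easy to see with a plain array that has an extra property attached (a small sketch):

```javascript
const arr = ['a', 'b'];
arr.extra = 'not an element'; // an own enumerable property on the array object

// for ... in iterates over enumerable property keys, including "extra".
const keys = [];
for (const key in arr) {
  keys.push(key);
}
console.log(keys); // ["0", "1", "extra"]

// for ... of uses the iterable protocol and only visits the elements.
const values = [];
for (const value of arr) {
  values.push(value);
}
console.log(values); // ["a", "b"]
```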

The iterator protocol

As mentioned before, the only requirement for the iterable protocol is for the object to define a function that returns an iterator. But what defines an iterator?
In order for an object to be considered an iterator it must provide a method named `next` that returns an object with 2 properties:
* `value`: the actual value of the iterable that is being iterated. This is only valid when `done` is `false`.
* `done`: `false` when `value` is an actual value, `true` when the iterator did not produce a new value.

Note that when you provide a `value` you can leave out the `done` property, and when the `done` property is `true` you can leave out the `value` property.

The object returned by the function bound to the DataStructure's `Symbol.iterator` property in the previous example does this by returning the entry from the array as the value property and returning `done: false` while there are still entries in the data array.

So by simply implementing both these protocols you can turn any `Class` (or `object` for that matter) into an object you can iterate over.
A number of built-ins in ES2015 already implement these protocols so you can experiment with the protocol right away. You can already iterate over Strings, Arrays, TypedArrays, Maps and Sets.
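You can observe both protocols directly by asking one of these built-ins, for example a Map, for its iterator and stepping through it by hand:

```javascript
const map = new Map([['a', 1], ['b', 2]]);

// The iterable protocol: ask the object for its iterator.
const iterator = map[Symbol.iterator]();

// The iterator protocol: call next() until done is true.
// A Map iterator yields [key, value] pairs.
console.log(iterator.next()); // { value: ["a", 1], done: false }
console.log(iterator.next()); // { value: ["b", 2], done: false }
console.log(iterator.next()); // { value: undefined, done: true }
```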

Generator functions

As shown in the earlier example, implementing the iterable and iterator protocols manually can be quite a hassle and is error-prone. That is why a language feature was added to ES2015: generator functions. A generator combines both an iterable and an iterator in a single function definition. A generator function is declared by adding an asterisk (`*`) to the function name and using `yield` to return values. The big advantage of this method is that your generator function returns an iterator that, when its `next()` method is invoked, will run up to the first `yield` statement it encounters and suspend execution until `next()` is called again (after which it resumes and runs until the next `yield` statement). This allows us to write an iteration that is evaluated lazily instead of all at once.
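The suspend-and-resume behaviour can be made visible with a trivial generator (a small sketch; the log statements show when the function body actually runs):

```javascript
function* counter() {
  console.log('started');
  yield 1;
  console.log('resumed');
  yield 2;
}

const it = counter();
// Nothing has run yet; the body only starts on the first next() call.
console.log(it.next()); // logs "started", then { value: 1, done: false }
console.log(it.next()); // logs "resumed", then { value: 2, done: false }
console.log(it.next()); // { value: undefined, done: true }
```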

The following example re-implements the iterable and iterator using a generator function, producing the same result with a more concise syntax.

class DataStructure {
  constructor(data) {
    this.data = data;
  }

  *[Symbol.iterator] () {
    let data = this.data.slice();
    for (let entry of data) {
      yield entry;
    }
  }
}

let ds = new DataStructure(['hello', 'world']);

console.log([...ds]) // ["hello","world"]
More complex usages of generators

As mentioned earlier, generator functions allow for lazy evaluation of (possibly) infinite iterations, allowing you to use constructs known from more functional languages, such as taking a limited subset from an infinite sequence:

function* generator() {
  let i = 0;
  while (true) {
    yield i++;
  }
}

function* take(number, gen) {
  let current = 0;
  for (let result of gen) {
    yield result;
    if (++current >= number) {
      return;
    }
  }
}

console.log([...take(10, generator())]) // [0,1,2,3,4,5,6,7,8,9]
console.log([...take(10, [1,2,3])]) // [1,2,3]

Delegating generators
Within a generator it is possible to delegate to a second generator, making it possible to create recursive iteration structures. The following example demonstrates a simple generator delegating to a sub generator and then returning to the main generator.

function* generator() {
  yield 1;
  yield* subGenerator();
  yield 4;
}

function* subGenerator() {
  yield 2;
  yield 3;
}

console.log([...generator()]) // [1,2,3,4]
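Since delegation also works recursively, generators are a natural fit for nested structures. A small sketch flattening a hypothetical nested-array tree:

```javascript
// Flatten a nested array structure lazily using recursive delegation.
function* flatten(items) {
  for (const item of items) {
    if (Array.isArray(item)) {
      yield* flatten(item); // delegate to a sub generator recursively
    } else {
      yield item;
    }
  }
}

const nested = [1, [2, [3, 4]], 5];
console.log([...flatten(nested)]); // [1,2,3,4,5]
```

Because the values are produced lazily, the traversal only descends as far as the consumer actually pulls.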

30 Day Sprints for Personal Development: Change Yourself with Skill

"What lies behind us and what lies before us are small matters compared to what lies within us. And when we bring what is within us out into the world, miracles happen." -- Ralph Waldo Emerson

I've written about 30 Day Sprints before, but it's time to talk about them again:

30 Day Sprints help you change yourself with skill.

Once upon a time, I found that when I was learning a new skill, or changing a habit, or trying something new, I wasn't getting over that first hump, or making enough progress to stick with it.

At the same time, I would get distracted by shiny new objects.  Because I like to learn and try new things, I would start something else, and ditch whatever I was trying to work on, to pursue my new interest.  So I was hopping from thing to thing, without much to show for it, or getting much better.

I decided to stick with something for 30 days to see if it would make a difference.  It was my personal 30 day challenge.  And it worked.   What I found was that sticking with something past two weeks got me past those initial hurdles: those dips that sit just in front of where breakthroughs happen.

All I did was spend a little effort each day for 30 days.  I would try to learn a new insight or try something small each day.  Each day, it wasn't much.  But over 30 days, it accumulated.  And over 30 days, the little effort added up to a big victory.

Why 30 Day Sprints Work So Well

Eventually, I realized why 30 Day Sprints work so well.  You effectively stack things in your favor.  By investing in something for a month, you can change how you approach things.  It's a very different mindset when you are looking at your overall gain over 30 days versus worrying about whether today or tomorrow gave you immediate return on your time.  By taking a longer term view, you give yourself more room to experiment and learn in the process.

  1. 30 Day Sprints let you chip away at the stone.  Rather than go big bang or whole hog up front, you can chip away at it.  This takes the pressure off of you.  You don't have to make a breakthrough right away.  You just try to make a little progress and focus on the learning.  When you don't feel like you made progress, you at least can learn something about your approach.
  2. 30 Day Sprints get you over the initial learning curve.  When you are taking in new ideas and learning new concepts, it helps to let things sink in.  If you're only trying something for a week or even two weeks, you'd be amazed at how many insights and breakthroughs are waiting just over that horizon.  Those troughs hold the keys to our triumphs.
  3. 30 Day Sprints help you stay focused.  For 30 days, you stick with it.  Sure you want to try new things, but for 30 days, you keep investing in this one thing that you decided was worth it.  Because you do a little every day, it actually gets easier to remember to do it. But the best part is, when something comes up that you want to learn or try, you can add it to your queue for your next 30 Day Sprint.
  4. 30 Day Sprints help you do things better, faster, easier, and deeper.  For 30 days, you can try different ways.  You can add a little twist.  You can find what works and what doesn't.  You can keep testing your abilities and learning your boundaries.  You push the limits of what you're capable of.  Over the course of 30 days, as you kick the tires on things, you'll find short-cuts and new ways to improve. Effectively, you unleash your learning abilities through practice and performance.
  5. 30 Day Sprints help you forge new habits.  Because you focus for a little bit each day, you actually create new habits.  A habit is much easier to put in place when you do it each day.  Eventually, you don't even have to think about it, because it becomes automatic.  Doing something every other day, or every third day, means you also have to remember when to do it.  We're creatures of habit.  Just redirect a little of the time you already spend each day so it works on your behalf.

And that is just the tip of the iceberg.

The real power of 30 Day Sprints is that they help you take action.  They help you get rid of all the excuses and all the distractions so you can start to achieve what you’re fully capable of.

Ways to Make 30 Day Sprints Work Better

When I first started using 30 Day Sprints for personal development, the novelty of doing something for more than a day or a week or even two weeks was enough to get tremendous value.  But eventually, as I did more 30 Day Sprints, I wanted to get more out of them.

Here is what I learned:

  1. Start 30 Day Sprints at the beginning of each month.  Sure, you can start 30 Day Sprints whenever you want, but I have found it much easier if the 17th of the month is day 17 of my 30 Day Sprint.  Also, it's a way to get a fresh start each month.  It's like turning the page.  You get a clean slate.  But what about February?  Well, that's when I do a 28 Day Sprint (and one day more when Leap Year comes).
  2. Same Time, Same Place.  I've found it much easier and more consistent, when I have a consistent time and place to work on my 30 Day Sprint.  Sure, sometimes my schedule won't allow it.  Sure, some things I'm learning require that I do it from different places.  But when I know, for example, that I will work out 6:30 - 7:00 A.M. each day in my living room, that makes things a whole lot easier.  Then I can focus on what I'm trying to learn or improve, and not spend a lot of time just hoping I can find the time each day.  The other benefit is that I start to find efficiencies because I have a stable time and place, already in place.  Now I can just optimize things.
  3. Focus on the learning.  When it's the final inning and the score is tied, and you have runners on base, and you're up at bat, focus is everything.  Don't focus on the score.  Don't focus on what's at stake.  Focus on the pitch.  And swing your best.  And, hit or miss, when it's all over, focus on what you learned.  Don't dwell on what went wrong.  Focus on how to improve.  Don't focus on what went right.  Focus on how to improve.  Don't get entangled by your mini-defeats, and don't get seduced by your mini-successes.  Focus on the little lessons that you sometimes have to dig deeper for.

Obviously, you have to find what works for you, but I've found these ideas to be especially helpful in getting more out of each 30 Day Sprint.  Especially the part about focusing on the learning.  I can't tell you how many times I got too focused on the results, and ended up missing the learning and the insights. 

If you slow down, you speed up, because you connect the dots at a deeper level, and you take the time to really understand nuances that make the difference.

Getting Started

Keep things simple when you start.  Just start.  Pick something, and make it your 30 Day Sprint. 

In fact, if you want to line your 30 Day Sprint up with the start of the month, then just start your 30 Day Sprint now and use it as a warm-up.  Try stuff.  Learn stuff.  Get surprised.  And then, at the start of next month, just start your 30 Day Sprint again.

If you really don't know how to get started, or want to follow a guided 30 Day Sprint, then try 30 Days of Getting Results.  It's where I share my best lessons learned for personal productivity, time management, and work-life balance.  It's a good baseline, because by mastering your productivity, time management, and work-life balance, you will make all of your future 30 Day Sprints more effective.

Boldly Go Where You Have Not Gone Before

But it's really up to you.  Pick something you've been either frustrated by, inspired by, or scared of, and dive in.

Whether you think of it as a 30 Day Challenge, a 30 Day Improvement Sprint, a Monthly Improvement Sprint, or just a 30 Day Sprint, the big idea is to do something small for 30 days.

If you want to go beyond the basics and learn everything you can about mastering personal productivity, then check out Agile Results, introduced in Getting Results the Agile Way.

Who knows what breakthroughs lie within?

May you surprise yourself profoundly.

Categories: Architecture, Programming

How Autodesk Implemented Scalable Eventing over Mesos

This is a guest post by Olivier Paugam, SW Architect for the Autodesk Cloud. I really like this post because it shows how bits of infrastructure--Mesos, Kafka, RabbitMQ, Akka, Splunk, Librato, EC2--can be combined together to solve real problems. It's truly amazing how much can get done these days by a small team.

I was tasked a few months ago to come up with a central eventing system, something that would allow our various backends to communicate with each other. We are talking about activity streaming backends, rendering, data translation, BIM, identity, log reporting, analytics, etc.  So something really generic with varying load, usage patterns and scaling profile.  And oh, also something that our engineering teams could interface with easily.  Of course every piece of the system should be able to scale on its own.

I obviously didn't have time to write too much code and picked Kafka as our storage core, as it's stable, widely used and works well (please note I'm not bound to using it and could switch over to something else).  Now, of course, I could not expose it directly and had to front it with an API. Without thinking much, I also rejected the idea of having my backend manage the offsets, as that places too many constraints on how one deals with failures, for instance.

So what did I end up with?

Categories: Architecture

Persistence with Docker containers - Team 1: GlusterFS

Xebia Blog - Mon, 08/17/2015 - 10:05

This is a follow-up post from the KLM innovation day.

The goal of Team 1 was to have a GlusterFS cluster running in Docker containers and to expose the distributed file system to a container by ‘mounting’ it through a so-called data container.

Setting up GlusterFS was not that hard, the installation steps are explained here [installing-glusterfs-a-quick-start-guide].

The Dockerfiles we eventually created and used can be found here [docker-glusterfs]
Note: the Dockerfiles still contain some manual steps, because you need to tell GlusterFS about the other nodes so they can find each other. In a real environment this could be automated with, for example, Consul.

Although setting up a GlusterFS cluster was not hard, mounting it on CoreOS proved much more complicated. We wanted to mount the folder through a container using the GlusterFS client, but to achieve that the container needs to run in privileged mode or with the ‘SYS_ADMIN’ capability. This has nothing to do with GlusterFS itself; Docker doesn’t allow mounts without these options. Eventually the remote folder can be mounted, but exposing this mounted folder as a Docker volume is not possible. This is a Docker shortcoming, see the Docker issue here

Our second, less preferred, method was mounting the folder in CoreOS itself and then using it in a container. The problem here is that CoreOS does not support the GlusterFS client, but it does support NFS. So to make this work we exposed GlusterFS through NFS; the steps to enable this can be found here [Using_NFS_with_Gluster].  After enabling NFS on GlusterFS we mounted the exposed folder in CoreOS and used it in a container, which worked fine.

Mounting GlusterFS through NFS was not what we wanted, but luckily Docker released their experimental volume plugin support. And our luck did not end there, because it turned out David Calavera had already created a volume plugin for GlusterFS. So to test this out, we used the experimental Docker version 1.8 and ran the plugin with the necessary settings. This all worked fine, but this is where our luck ran out. When using the experimental Docker daemon in combination with this plugin, we can see in debug mode that the plugin connects to GlusterFS and reports that it is mounting the folder. But unfortunately it then receives an error, seemingly from the server, and unmounts the folder.

The volume plugin above is basically a wrapper around the GlusterFS client. We also found a Go API for GlusterFS. This could be used to create a pure Go implementation of the volume plugin, but unfortunately we ran out of time to actually try this.


Using a distributed file system like GlusterFS or CEPH sounds very promising, especially combined with the Docker volume plugin, which hides the implementation and allows you to decouple the container from the storage of the host.


Between the innovation day and this blog post the world evolved: Docker 1.8 came out, and with it a CEPH Docker volume plugin.

Innovation day at KLM: Persistence with Docker containers

Xebia Blog - Sat, 08/15/2015 - 12:06

On the 3rd of July, KLM and Cargonauts joined forces at KLM headquarters for an innovation day. The goal was to share knowledge and find out how to properly do “Persistence with Docker containers”.

Persistence is data that you want to have available after a reboot, and to make it more complex, in some cases you also want to share that data over multiple nodes. Examples are a shared upload folder or a database. Our innovation day case focuses on a MySQL database: we want to find out how we can host MySQL data reliably and with high availability.

Persistence with Docker containers poses the same dilemmas as when you don’t use them, but with more options to choose from. Eventually those options can be summarized as follows:

  1. Don’t solve the problem on the Docker platform; Solve it at the application level by using a distributed database like Cassandra.
  2. Don’t solve the problem on the Docker platform; Use a SAAS or some technology that provides a high available storage to your host.
  3. Do fix it on the Docker platform, so you are not tied to a specific solution. This allows you to deploy anywhere, as long as Docker is available.

Since this was an innovation day and we are container enthusiasts, we focused on the last option. To make it more explicit, we decided to investigate these two possible solutions to our problem:

  1. Provide the host with a highly available distributed file system with GlusterFS. This will allow us to start a container anywhere and move it freely because the data is available anywhere on the platform.
  2. GlusterFS might not provide the best performance due to its distributed nature. So to have better performance we need to have the data available on the host. For this we investigated Flocker.

Note: We realise that these two solutions only solve part of the problem because what happens if the data gets corrupted? To solve that we still need some kind of a snapshot/backup solution.

Having decided on the approach we split the group into two teams, team 1 focused on GlusterFS and team 2 focused on Flocker. We will post their stories in the coming days. Stay tuned!

The innovation day went exactly as an innovation day should go: a lot of enthusiasm, followed by some disappointments, which led to small victories and unanswered questions, but also great new knowledge and a clear vision of where to focus your energy next.

We would like to thank KLM for hosting the innovation day!

Stuff The Internet Says On Scalability For August 14th, 2015

Hey, it's HighScalability time:

Being Google CEO: Nice. Becoming Tony Stark: Priceless (Alphabet)


  • $7: WeChat's revenue per user and there are 549 million of them; 60%: Etsy users using mobile; 10: times per second a self-driving car makes a decision; 900: calories in a litre of blood, vampires have very efficient metabolisms; 5 billion: the largest feature in the universe in light years

  • Quotable Quotes:
    • @sbeam: they finally had the Enigma machine. They opened the case. A card fell out. Turing picked it up. "Damn. They included a EULA." #oraclefanfic
    • kordless: compute and storage continue to track with Moore's Law but bandwidth doesn't. I keep wondering if this isn't some sort of universal limitation on this reality that will force high decentralization.
    • @SciencePorn: If you were to remove all of the empty space from the atoms that make up every human on earth, all humans would fit into an apple.
    • @adrianco: Commodity server with 1.4TB of RAM running a mix of 16GB regular DRAM and 128GB Memory1 modules.
    • @JudithNursalim: "One of the most scalable structure in history was the Roman army. Its unit: eight guys; the number of guys that fits in a tent" - Chris Fry
    • GauntletWizard: Google RPCs are fast. The RPC trace depth of many requests is >20 in miliseconds. Google RPCs are free - Nobody budgets for intradatacenter traffic. Google RPCs are reliable - Most teams hold themselves to a SLA of 4 9s, as measured by their customers, and many see >5 as the usual state of affairs.
    • @rzidane360: I am a Java library and I will start 50 threads and allocate a billion objects  on your behalf.
    • @codinghorror: From Sandy Bridge in Jan 2011 to Skylake in Aug 2015, x86 CPU perf increased ~25%. Same time for ARM mobile CPUs: ~800%.
    • @raistolo: "The cloud is not a cloud at all, it's a limited number of companies that have control over a large part of the Internet" @granick
    • Benedict Evans: since 1999 there are now roughly 10x more people online, US online revenues from ecommerce and advertising have risen 15x, and the cost of creating software companies has fallen by roughly 10x. 

  • App constellations aren't working. Is this another idea the West will borrow from the East? When One App Rules Them All: The Case of WeChat and Mobile in China: Chinese apps tend to combine as many features as possible into one application. This is in stark contrast to Western apps, which lean towards “app constellations”.

  • It doesn't get much more direct than this. Labellio: Scalable Cloud Architecture for Efficient Multi-GPU Deep Learning: The Labellio architecture is based on the modern distributed computing architectural concept of microservices, with some modification to achieve maximal utilization of GPU resources. At the core of Labellio is a messaging bus for deep learning training and classification tasks, which launches GPU cloud instances on demand. Running behind the web interface and API layer are a number of components including data collectors, dataset builders, model trainer controllers, metadata databases, image preprocessors, online classifiers and GPU­-based trainers and predictors. These components all run inside docker containers. Each component communicates with the others mainly through the messaging bus to maximize the computing resources of CPU, GPU and network, and share data using object storage as well as RDBMS.

  • How might your application architecture change using Lambda? Here's a nice example of Building Scalable and Responsive Big Data Interfaces with AWS Lambda. A traditional master-slave or job server model is not used; instead, Lambda is used to connect streams or processes in a pipeline. Work is broken down into smaller, parallel operations on small chunks, with Lambda functions doing the heavy lifting. The pipeline consists of an S3 key lister, an AWS Lambda invoker/result aggregator, and a web client response handler. 

  • The Indie Web folks have put together a really big list of Site Deaths, that is sites that have had their plugs pulled, bits blackened, dreams dashed. Take some time, look through, and say a little something for those that have gone before.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Personal Development Insights from the Greatest Book on Personal Development Ever

“Let him who would move the world first move himself.” ― Socrates

At work, and in life, you need every edge you can get.

Personal development is a process of realizing and maximizing your potential.

It’s a way to become all that you’re capable of.

One of the most powerful books on personal development is Unlimited Power, by Tony Robbins.  In Unlimited Power, Tony Robbins shares some of the most profound insights in personal development that the world has ever known.

Develop Your Abilities and Model Success

Through a deep dive into the world of NLP (Neuro-Linguistic Programming) and Neuro-Associative Conditioning, Robbins shows you how to master your mind, master your body, master your emotional intelligence, and improve what you’re capable of in all aspects of your life.  You can think of NLP as re-programming your mind, body, and emotions for success.

We’ve already been programmed by the shows we watch, the books we’ve read, the people in our lives, the beliefs we’ve formed.  But a lot of this was unconscious.  We were young and took things at face value, and jumped to conclusions about how the world works, who we are, and who we can be, or worse, who others think we should be.

NLP is a way to break away from limiting beliefs and to model the success of others with skill.  You can effectively reverse-engineer how other people achieve success and then model the behavior, the attitudes, and the actions that create that success.  And you can do it better, faster, and easier than you might imagine. 

NLP is really a way to model what the most successful people think, say, and do.

Unlimited Power at Your Fingertips

I’ve created a landing page that is a round up and starting point to dive into some of the book nuggets from Unlimited Power:

Unlimited Power Book Nuggets at a Glance

On that page, I also provided very brief summaries of the core personal development insight so that you can get a quick sense of the big ideas.

A Book Nugget is simply what I call a mini-lesson or insight from a book that you can use to change what you think, feel, or do.

Unlimited Power is not an easy book to read, but it’s one of the most profound tomes of knowledge in terms of personal development insights.

Personal Development Insights at Your Fingertips

If you want to skip the landing page and just jump into a few Unlimited Power Book Nuggets and take a few personal development insights for a spin, here you go:

5 Keys to Wealth and Happiness

5 Rules for Formulating Outcomes

5 Sources of Beliefs for Personal Excellence

7 Beliefs for Personal Excellence

7 Traits of Success

Create Your Ideal Day, the Tony Robbins Way

Don’t Compare Yourself to Others

How To Change the Emotion of Any Experience to Empower You

How To Get Whatever You Want

Leadership for a Better World

Persuasion is the Most Important Skill You Can Develop

Realizing Your Potential is a Dynamic Process

Schotoma: Why You Can’t See What’s Right in Front of You

Seven Meta-Programs for Understanding People

The Difference Between Those Who Succeed and Those Who Fail

As you’ll quickly see, Unlimited Power remains one of the most profound sources of insight for realizing your potential and becoming all that you’re capable of.

It truly is the ultimate source of personal development in action.


Categories: Architecture, Programming

Why My Water Droplet Is Better Than Your Hadoop Cluster

We’ve had computation using slime mold and soap film, now we have computation using water droplets. Stanford bioengineers have built a “fully functioning computer that runs like clockwork - but instead of electrons, it operates using the movement of tiny magnetised water droplets.”


By changing the layout of the bars on the chip it's possible to make all the universal logic gates. And any Boolean logic circuit can be built by moving the little magnetic droplets around. Currently the chips are about half the size of a postage stamp and the droplets are smaller than poppy seeds.

What all this means I'm not sure, but pavo6503 has a comment that helps understand what's going on:

Logic gates pass high and low states. Since they plan to use drops of water as carriers and the substances in those drops to determine what the high/low state is they could hypothetically make a filter that sorts drops of water containing 1 to many chemicals. Pure water passes through unchanged. water with say, oil in it, passes to another container, water with alcohol to another. A "chip" with this setup could be used to purify water where there are many contaminants you want separated.

Categories: Architecture

You might not need lodash (in your ES2015 project)

Xebia Blog - Tue, 08/11/2015 - 12:44

This post is the first in a series of ES2015 posts. We'll be covering new JavaScript functionality every week for the coming two months.

ES2015 brings a lot of new functionality to the table. It might be a good idea to evaluate whether your new or existing projects actually require a library such as lodash. We'll talk about several common usages of lodash functions that can simply be replaced by a native ES2015 implementation.


_.extend / _.merge

Let's start with _.extend and its related _.merge function. These functions are often used for combining multiple configuration properties in a single object.

const dst = { xeb: 0 };
const src1 = { foo: 1, bar: 2 };
const src2 = { foo: 3, baz: 4 };

_.extend(dst, src1, src2);

assert.deepEqual(dst, { xeb: 0, foo: 3, bar: 2, baz: 4 });

Using the new Object.assign method, the same behaviour is natively possible:

const dst2 = { xeb: 0 };

Object.assign(dst2, src1, src2);

assert.deepEqual(dst2, { xeb: 0, foo: 3, bar: 2, baz: 4 });

We're using Chai assertions to confirm the correct behaviour.
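One caveat worth adding here (our observation, not part of the original comparison): like _.extend, Object.assign copies properties shallowly, so it is not a drop-in replacement for _.merge when the sources contain nested objects. A quick sketch with illustrative config objects:

```javascript
const defaults = { colors: { bgColor: 'black', fgColor: 'white' } };
const overrides = { colors: { bgColor: 'grey' } };

// The whole nested colors object is replaced, not merged recursively
const merged = Object.assign({}, defaults, overrides);

console.log(merged.colors.bgColor) // 'grey'
console.log(merged.colors.fgColor) // undefined: fgColor from defaults was lost
```

If you rely on deep merging, _.merge has no one-line native equivalent in ES2015.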


_.defaults / _.defaultsDeep

Sometimes when passing many parameters to a method, a config object is used. _.defaults and its related _.defaultsDeep function come in handy to define defaults in a certain structure for these config objects.

function someFuncExpectingConfig(config) {
  _.defaultsDeep(config, {
    text: 'default',
    colors: {
      bgColor: 'black',
      fgColor: 'white'
    }
  });
  return config;
}

let config = { colors: { bgColor: 'grey' } };
config = someFuncExpectingConfig(config);

assert.equal(config.text, 'default');
assert.equal(config.colors.bgColor, 'grey');
assert.equal(config.colors.fgColor, 'white');

With ES2015, you can now destructure these config objects into separate variables. Together with the new default param syntax we get:

function destructuringFuncExpectingConfig({
  text = 'default',
  colors: {
    bgColor = 'black',
    fgColor = 'white'
  }
}) {
  return { text, bgColor, fgColor };
}

const config2 = destructuringFuncExpectingConfig({ colors: { bgColor: 'grey' } });

assert.equal(config2.text, 'default');
assert.equal(config2.bgColor, 'grey');
assert.equal(config2.fgColor, 'white');
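One caveat with the destructuring version (our observation, not from the original comparison): unlike _.defaultsDeep, it throws a TypeError when the colors property, or the whole config object, is omitted. Adding a default of = {} at each nesting level guards against that:

```javascript
// Hypothetical variant with defaults at every nesting level
function safeDestructuringConfig({
  text = 'default',
  colors: { bgColor = 'black', fgColor = 'white' } = {}
} = {}) {
  return { text, bgColor, fgColor };
}

// Now it can be called with a partial config, or no config at all
console.log(safeDestructuringConfig()) // { text: 'default', bgColor: 'black', fgColor: 'white' }
console.log(safeDestructuringConfig({ colors: { bgColor: 'grey' } }).bgColor) // 'grey'
```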


_.find / _.findIndex

Searching in arrays using a predicate function is a clean way of separating behaviour from logic.

const arr = [{ name: 'A', id: 123 }, { name: 'B', id: 436 }, { name: 'C', id: 568 }];
function predicateB(val) {
  return val.name === 'B';
}

assert.equal(_.find(arr, predicateB).id, 436);
assert.equal(_.findIndex(arr, predicateB), 1);

In ES2015, this can be done in exactly the same way using Array.find.

assert.equal(Array.find(arr, predicateB).id, 436);
assert.equal(Array.findIndex(arr, predicateB), 1);

Note that we're not using the extended Array syntax arr.find(predicate). This is not possible with the Babel setup that was used to transpile this ES2015 code.
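For completeness (our addition): in an environment with full native ES2015 support, such as a current browser or Node.js without transpiling, the prototype methods do work directly:

```javascript
const people = [{ name: 'A', id: 123 }, { name: 'B', id: 436 }, { name: 'C', id: 568 }];
const isB = val => val.name === 'B';

// Native Array.prototype.find / findIndex
console.log(people.find(isB).id)   // 436
console.log(people.findIndex(isB)) // 1
```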


_.repeat, _.startsWith, _.endsWith and _.includes

Some very common string functions that were never natively supported before are _.repeat, to repeat a string multiple times, and _.startsWith / _.endsWith / _.includes, to check whether a string starts with, ends with, or includes another string, respectively.

assert.equal(_.repeat('ab', 3), 'ababab');
assert.isTrue(_.startsWith('ab', 'a'));
assert.isTrue(_.endsWith('ab', 'b'));
assert.isTrue(_.includes('abc', 'b'));

Strings now have a set of new builtin prototypical functions:

assert.equal('ab'.repeat(3), 'ababab');
assert.isTrue('ab'.startsWith('a'));
assert.isTrue('ab'.endsWith('b'));
assert.isTrue('abc'.includes('b'));



_.fill

A not-so-common function to fill an array with default values without looping explicitly is _.fill.

const filled = _.fill(new Array(3), 'a', 1);
assert.deepEqual(filled, [, 'a', 'a']);

It now has a drop-in replacement: Array.fill.

const filled2 = Array.fill(new Array(3), 'a', 1);
assert.deepEqual(filled2, [, 'a', 'a']);


_.isNaN, _.isFinite

Some type checks are quite tricky, and _.isNaN and _.isFinite fill in such gaps.

assert.isTrue(_.isNaN(NaN));
assert.isFalse(_.isFinite(Infinity));

Simply use the new Number builtins for these checks now:

assert.isTrue(Number.isNaN(NaN));
assert.isFalse(Number.isFinite(Infinity));

_.first, _.rest

Lodash comes with a set of functional programming-style functions, such as _.first (aliased as _.head) and _.rest (aliased _.tail) which get the first and rest of the values from an array respectively.

const elems = [1, 2, 3];

assert.equal(_.first(elems), 1);
assert.deepEqual(_.rest(elems), [2, 3]);

The syntactical power of the rest parameter together with destructuring replaces the need for these functions.

const [first, ...rest] = elems;

assert.equal(first, 1);
assert.deepEqual(rest, [2, 3]);



_.restParam

Specially written for ES5, lodash contains helper functions that mimic the behaviour of some ES2015 features. An example is the _.restParam function, which wraps a function and sends the trailing parameters as an array to the wrapped function:

function whatNames(what, names) {
  return what + ' ' + names.join(';');
}

const restWhatNames = _.restParam(whatNames);

assert.equal(restWhatNames('hi', 'a', 'b', 'c'), 'hi a;b;c');

Of course, in ES2015 you can simply use the rest parameter as intended.

function whatNamesWithRest(what, ...names) {
  return what + ' ' + names.join(';');
}

assert.equal(whatNamesWithRest('hi', 'a', 'b', 'c'), 'hi a;b;c');



_.spread

Another example is the _.spread function, which wraps a function so that it can be called with an array whose elements are passed as separate parameters to the wrapped function:

function whoWhat(who, what) {
  return who + ' ' + what;
}

const spreadWhoWhat = _.spread(whoWhat);
const callArgs = ['yo', 'bro'];

assert.equal(spreadWhoWhat(callArgs), 'yo bro');

Again, in ES2015 you want to use the spread operator.

assert.equal(whoWhat(...callArgs), 'yo bro');


_.values, _.keys, _.pairs

A couple of functions exist to fetch all values, keys or value/key pairs of an object as an array:

const bar = { a: 1, b: 2, c: 3 };

const values = _.values(bar);
const keys = _.keys(bar);
const pairs = _.pairs(bar);

assert.deepEqual(values, [1, 2, 3]);
assert.deepEqual(keys, ['a', 'b', 'c']);
assert.deepEqual(pairs, [['a', 1], ['b', 2], ['c', 3]]);

Now you can use the Object builtins:

const values2 = Object.values(bar);
const keys2 = Object.keys(bar);
const pairs2 = Object.entries(bar);

assert.deepEqual(values2, [1, 2, 3]);
assert.deepEqual(keys2, ['a', 'b', 'c']);
assert.deepEqual(pairs2, [['a', 1], ['b', 2], ['c', 3]]);
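A word of caution (our addition): Object.values and Object.entries were not part of ES2015 itself; they were only standardized later, in ES2017, so depending on your Babel configuration they may require an extra polyfill. An ES5-compatible fallback can be built from Object.keys:

```javascript
const sample = { a: 1, b: 2, c: 3 };

// Fallback equivalents of Object.values / Object.entries
const sampleValues = Object.keys(sample).map(key => sample[key]);
const sampleEntries = Object.keys(sample).map(key => [key, sample[key]]);

console.log(sampleValues)  // [1, 2, 3]
console.log(sampleEntries) // [['a', 1], ['b', 2], ['c', 3]]
```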


_.forEach (for looping over object properties)

Looping over the properties of an object is often done using a helper function, as there are some caveats such as skipping inherited properties. _.forEach can be used for this.

const foo = { a: 1, b: 2, c: 3 };
let sum = 0;
let lastKey = undefined;

_.forEach(foo, function (value, key) {
  sum += value;
  lastKey = key;
});

assert.equal(sum, 6);
assert.equal(lastKey, 'c');

With ES2015 there's a clean way of looping over Object.entries and destructuring them:

sum = 0;
lastKey = undefined;
for (let [key, value] of Object.entries(foo)) {
  sum += value;
  lastKey = key;
}

assert.equal(sum, 6);
assert.equal(lastKey, 'c');



_.get

Often when having nested structures, a path selector can help in selecting the right variable. _.get is created for such an occasion.

const obj = { a: [{}, { b: { c: 3 } }] };

const getC = _.get(obj, 'a[1].b.c');

assert.equal(getC, 3);

Although ES2015 does not have a native equivalent for path selectors, you can use destructuring as a way of 'selecting' a specific value.

let a, b, c;
({ a : [, { b: { c } }]} = obj);

assert.equal(c, 3);
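Destructuring cannot express _.get's optional default value for missing paths, though. A tiny helper (our own sketch, handling dot-separated paths only, unlike _.get's bracket syntax) comes close:

```javascript
// Minimal path selector with a default, mimicking _.get(obj, path, def)
function get(obj, path, def) {
  const result = path.split('.').reduce(
    (acc, key) => (acc == null ? undefined : acc[key]),
    obj
  );
  return result === undefined ? def : result;
}

const data = { a: { b: { c: 3 } } };

console.log(get(data, 'a.b.c', 0)) // 3
console.log(get(data, 'a.x.y', 0)) // 0: the missing path falls back to the default
```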



_.range

_.range is a very Python-esque function that creates an array of integer values, with an optional step size.

const range = _.range(5, 10, 2);
assert.deepEqual(range, [5, 7, 9]);

As a nice ES2015 alternative, you can use a generator function and the spread operator to replace it:

function* rangeGen(from, to, step = 1) {
  for (let i = from; i < to; i += step) {
    yield i;
  }
}

const range2 = [...rangeGen(5, 10, 2)];

assert.deepEqual(range2, [5, 7, 9]);

A nice side-effect of a generator function is its laziness: it is possible to use the range generator without generating the entire array first, which can come in handy when memory usage should be minimal.
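To make that laziness concrete, here is a small sketch (our own illustration): an infinite range can be consumed one value at a time, and nothing is computed until next() is called:

```javascript
// An unbounded range generator; values are produced on demand
function* infiniteRange(from, step = 1) {
  for (let i = from; ; i += step) {
    yield i;
  }
}

const it = infiniteRange(5, 2);

console.log(it.next().value) // 5
console.log(it.next().value) // 7
console.log(it.next().value) // 9
```

Spreading this generator into an array would of course never terminate, so laziness here is not just an optimization but a requirement.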


Just like kicking the jQuery habit, we've seen that there are alternatives to many lodash functions, and it can be preferable to use as few of these functions as possible. Keep in mind that the lodash library offers a consistent API that developers are probably familiar with. Only swap it out if the ES2015 benefits outweigh the consistency gains (for instance, when performance is an issue).

For reference, you can find the above code snippets at this repo. You can run them yourself using webpack and Babel.

Spark, Parquet and S3 – It’s complicated.

(A version of this post was originally posted in AppsFlyer’s blog. Also special thanks to Morri Feldman and Michael Spector from AppsFlyer data team that did most of the work solving the problems discussed in this article)

TL;DR: The combination of Spark, Parquet and S3 (& Mesos) is a powerful, flexible and cost-effective analytics platform (and, incidentally, an alternative to Hadoop). However, making all these technologies gel and play nicely together is not a simple task. This post describes the challenges we (AppsFlyer) faced when building our analytics platform on these technologies, and the steps we took to mitigate them and make it all work.

Spark is shaping up as the leading alternative to Map/Reduce for several reasons: wide adoption by the different Hadoop distributions, combining both batch and streaming on a single platform, and a growing library of machine-learning integrations (both in terms of included algorithms and integration with machine-learning languages, namely R and Python). At AppsFlyer, we've been using Spark for a while now as the main framework for ETL (Extract, Transform & Load) and analytics. A recent example is the new version of our retention report, which utilized Spark to crunch several data streams (> 1TB a day) with ETL (mainly data cleansing) and analytics (a stepping stone towards full click-fraud detection) to produce the report.
One of the main changes we introduced in this report is the move from building on Sequence files to using Parquet files. Parquet is a columnar data format, which is probably the best option today for storing long-term big data for analytics purposes (unless you are heavily invested in Hive, where ORC is the more suitable format). The advantages of Parquet vs. Sequence files are performance and compression, without losing the benefit of wide support by big-data tools (Spark, Hive, Drill, Tajo, Presto etc.).

One relatively unique aspect of our infrastructure for big data is that we do not use Hadoop (perhaps that’s a topic for a separate post). We are using Mesos as a resource manager instead of YARN and we use Amazon S3 instead of HDFS as a distributed storage solution. HDFS has several advantages over S3, however, the cost/benefit for maintaining long running HDFS clusters on AWS  vs. using S3 are overwhelming in favor of S3.

That said, the combination of Spark, Parquet and S3 posed several challenges for us and this post will list the major ones and the solutions we came up with to cope with them.

Parquet & Spark

Parquet and Spark seem to have been in a love-hate relationship for a while now. On the one hand, the Spark documentation touts Parquet as one of the best formats for analytics of big data (it is), and on the other hand the support for Parquet in Spark is incomplete and annoying to use. Things are surely moving in the right direction but there are still a few quirks and pitfalls to watch out for.

To start on a positive note, Spark and Parquet integration has come a long way in the past few months. Previously, one had to jump through hoops just to be able to convert existing data to Parquet. The introduction of DataFrames to Spark made this process much, much simpler. When the input format is supported by the DataFrame API, e.g. the input is JSON (built-in) or Avro (which isn't built into Spark yet, but you can use a library to read it), converting to Parquet is just a matter of reading the input format on one side and persisting it as Parquet on the other. Consider for example the following snippet in Scala:

View the code on Gist.
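(The original gist isn't reproduced here. As a rough sketch of what such a conversion looks like with the Spark 1.4 DataFrame API — the bucket names and paths are placeholders, not the actual job:)

```scala
// Read a DataFrame-supported input format (JSON here) and persist it as Parquet.
// sqlContext is the SQLContext of a running Spark application; paths are placeholders.
val events = sqlContext.read.json("s3a://my-bucket/input/events/")
events.write.parquet("s3a://my-bucket/output/events-parquet/")
```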

Even when you are handling a format where the schema isn't part of the data, the conversion process is quite simple, as Spark lets you specify the schema programmatically. The Spark documentation is pretty straightforward and contains examples in Scala, Java and Python. Furthermore, it isn't too complicated to define schemas in other languages. For instance, here at AppsFlyer, we use Clojure as our main development language, so we developed a couple of helper functions to do just that. The sample code below provides the details:

The first step is to extract the data from whatever structure we have and specify the schema we'd like. The code below takes an event-record and extracts various data points from it into a vector of the form [:column_name value optional_data_type]. Note that the data type is optional, since it defaults to string if not specified.

View the code on Gist.

The next step is to use the above mentioned structure to both extract the schema and convert to DataFrame Rows:

View the code on Gist.

Finally, we apply these functions over an RDD, convert it to a DataFrame and save it as Parquet:

View the code on Gist.

As mentioned above, things are on the up and up for Parquet and Spark but the road is not clear yet. Some of the problems we encountered include:

  • a critical bug in the 1.4 release, where a race condition when writing Parquet files caused massive data loss on jobs (this bug is fixed in 1.4.1 – so if you are using Spark 1.4 and Parquet, upgrade yesterday!)
  • Filter pushdown optimization, which is turned off by default since Spark still uses Parquet 1.6.0rc3, even though 1.6.0 has been out for a while (it seems Spark 1.5 will use Parquet 1.7.0, so this problem will be solved)
  • Parquet is not "natively" supported in Spark; instead, Spark relies on Hadoop support for the Parquet format. This is not a problem in itself, but for us it caused major performance issues when we tried to use Spark and Parquet with S3 – more on that in the next section

Parquet, Spark & S3

Amazon S3 (Simple Storage Service) is an object storage solution that is relatively cheap to use. It does have a few disadvantages vs. a "real" file system; the major one is eventual consistency, i.e. changes made by one process are not immediately visible to other applications. (If you are using Amazon's EMR you can use EMRFS "consistent view" to overcome this.) However, if you understand this limitation, S3 is still a viable input and output source, at least for batch jobs.

As mentioned above, Spark doesn’t have a native S3 implementation and relies on Hadoop classes to abstract the data access to Parquet. Hadoop provides 3 file system clients to S3:

  • S3 block file system (URI scheme of the form "s3://..") which only works on EMR and doesn't seem to work with standalone Spark (edited 12/8/2015, thanks to Ewan Leith)
  • S3 Native Filesystem ("s3n://.." URIs) – download a Spark distribution that supports Hadoop 2.* and up if you want to use this (tl;dr – you don't)
  • S3a – a replacement for S3n that removes some of the limitations and problems of S3n. Download the "Spark with Hadoop 2.6 and up" build to use this one (tl;dr – you want this, but it needs some work before it is usable)


When we used Spark 1.3 we encountered many problems when we tried to use S3, so we started out using s3n. This worked for the most part, i.e. we got jobs running and completing, but a lot of them failed with various read timeout and host unknown exceptions. Looking at the tasks within the jobs, the picture was even grimmer, with high percentages of failures that pushed us to increase timeouts and retries to ridiculous levels. When we moved to Spark 1.4.1, we took another stab at s3a. This time around we got it to work. The first thing we had to do was to set both spark.executor.extraClassPath and spark.driver.extraClassPath to point at the aws-java-sdk and hadoop-aws jars, since apparently both are missing from the "Spark with Hadoop 2.6" build. Naturally we used the 2.6 version of these files, but then we were hit by another little problem: the Hadoop 2.6 AWS implementation has a bug which causes it to split S3 files in unexpected ways (e.g. a job on 400 files ran with 18 million tasks). Luckily, upgrading the Hadoop AWS jar to version 2.7.0 instead of 2.6 solved this problem. With all that set, the s3a prefix works without hitches (and provides better performance than s3n).

Finding the right S3 Hadoop library contributed to the stability of our jobs, but regardless of the S3 library (s3n or s3a), the performance of Spark jobs that use Parquet files was still abysmal. When looking at the Spark UI, the actual work of handling the data seemed quite reasonable, but Spark spent a huge amount of time before actually starting the work, and again after the job was "completed" before it actually terminated. We like to call this phenomenon the "Parquet Tax."

Obviously we couldn't live with the "Parquet Tax," so we delved into the log files of our jobs and discovered several issues. The first one has to do with the startup times of Parquet jobs. The people who built Spark understood that schemas can evolve over time, and provided a nice DataFrame feature called "schema merging." If you look at the schema in a big data lake/reservoir (or whatever it is called today) you can definitely expect the schema to evolve over time. However, if you look at a directory that is the result of a single job, there is no difference in the schema… It turns out that when Spark initializes a job, it reads the footers of all the Parquet files to perform the schema merging. All this work is done from the driver before any tasks are allocated to the executors, and it can take long minutes, even hours (e.g. we have jobs that look back at half a year of install data). It isn't documented, but looking at the Spark code you can override this behavior by specifying mergeSchema as false:

In Scala:

View the code on Gist.
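(In case the gist doesn't load, a sketch of what this override looks like in Scala — the path is a placeholder:)

```scala
// Skip the footer-reading schema merge; Spark will take the schema
// from a single file (or from _common_metadata) instead
val df = sqlContext.read
  .option("mergeSchema", "false")
  .parquet("s3a://my-bucket/events-parquet/")
```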

and in Clojure:

View the code on Gist.

Note that this doesn't work in Spark 1.3. In Spark 1.4 it works as expected, and in Spark 1.4.1 it causes Spark to only look at the _common_metadata file, which is not the end of the world, since it is a small file and there's only one per directory. However, this brings us to another aspect of the "Parquet Tax": the "end of job" delays.

Turning off schema merging and controlling the schema used by Spark helped cut down the job startup times but, as mentioned, we still suffered from long delays at the end of jobs. We already knew of one Hadoop<->S3 related problem when using text files: Hadoop, treating its output as immutable, first writes files to a temp directory and then copies them over. On a real file system that's cheap, but with S3 the copy operation is very, very expensive. For text files, DataBricks created a DirectOutputCommitter (probably for their Spark SaaS offering). Replacing the output committer for text files is fairly easy – you just need to set "spark.hadoop.mapred.output.committer.class" on the Spark configuration, e.g.:

View the code on Gist.
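(The gist isn't reproduced here; the setting itself looks roughly like this. The committer's fully qualified class name depends on where you source DirectOutputCommitter from, so treat it as a placeholder:)

```scala
val conf = new SparkConf()
  .setAppName("my-job") // placeholder app name
  // Write text-file output directly to S3 instead of temp-dir-then-copy
  .set("spark.hadoop.mapred.output.committer.class",
       "com.example.hadoop.DirectOutputCommitter") // placeholder class name
```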

A similar solution exists for Parquet, and unlike the solution for text files it is even part of the Spark distribution. However, to make things complicated, you have to configure it on the Hadoop configuration and not on the Spark configuration. To get the Hadoop configuration, you first need to create a Spark context from the Spark configuration, call hadoopConfiguration on it, and then set "spark.sql.parquet.output.committer.class" as in:

View the code on Gist.
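(Again as a hedged sketch rather than the original gist — `conf` is an existing SparkConf, and the committer class name is the one shipped with Spark 1.4:)

```scala
val sc = new SparkContext(conf)
// Note: this goes on the *Hadoop* configuration, not the Spark one
sc.hadoopConfiguration.set(
  "spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
```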

Using the DirectParquetOutputCommitter provided a significant reduction in the "Parquet Tax," but we still found that some jobs were taking a very long time to complete. Again, the culprits were the file system assumptions Spark and Hadoop hold. Remember the "_common_metadata" file Spark looks at during the onset of a job? Well, Spark spends a lot of time at the end of the job creating both this file and an additional metadata file with further info from the files in the directory. Again, this is all done from one place (the driver) rather than being handled by the executors. When the job results in small files (even a couple of thousand of them), the process takes a reasonable amount of time. However, when the job results in larger files (e.g. when we ingest a full day of application launches), this takes upward of an hour. As with mergeSchema, the solution is to manage the metadata manually: we set "parquet.enable.summary-metadata" to false (again, on the Hadoop configuration) and generate the _common_metadata file ourselves (for the large jobs).
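The corresponding setting, again on the Hadoop configuration (a sketch; `sc` is an existing SparkContext):

```scala
// Stop Spark from writing _metadata/_common_metadata summary files at job end
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
```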

To sum up, Parquet and especially Spark are works in progress – making cutting-edge technologies work for you can be a challenge and requires a lot of digging. The documentation is far from perfect at times, but luckily all the relevant technologies are open source (even the Amazon SDK), so you can always dive into the bug reports, code etc. to understand how things actually work and find the solutions you need. Also, from time to time you can find articles and blog posts that explain how to overcome common issues in the technologies you are using. I hope this post clears up some of the complications of integrating Spark, Parquet and S3, which are, at the end of the day, all great technologies with a lot of potential.

Image by xkcd. Licensed under creative-commons 2.5

Categories: Architecture

Testing UI changes in large web applications

Xebia Blog - Mon, 08/10/2015 - 13:51

When a web application starts to grow in terms of functionality, number of screens and amount of code, automated testing becomes a necessity. Not only will these tests prevent you from delivering bugs to your users, but they'll also help maintain a high speed of development. This ensures that you'll be focusing on new and better features without having to fix bugs in the existing ones.

However, even with all kinds of unit, integration and end-to-end tests in place, you'll still end up with a huge blind spot: does your application still look like it's supposed to?

Can we test for this as well? (hint: we can).

Breaking the web's UI is easy

A web application's appearance is determined by a myriad of HTML tags and CSS rules, which are often re-used in many different combinations. And therein lies the problem: any seemingly innocuous change to markup or CSS could lead to a broken layout, unaligned elements or other unintended side effects. A change in CSS or markup for one screen could lead to problems on another.

Additionally, as browsers are frequently updated, CSS and markup bugs might be either fixed or introduced. How will you know if your application still looks good in the latest Firefox or Chrome version, or the next big browser of the future?

So how do we test this?

The most obvious method to prevent visual regressions in a web application is to manually click through every screen of the application using several browsers on different platforms, looking for problems. While this might work fine at first, it does not scale very well. The number of screens you'll have to look through will increase, which will steadily increase the time you need for testing. This in turn will slow your development speed considerably.

Clicking through every screen every time you want to release a new feature is a very tedious process. And because you'll be looking at the same screens over and over again, you (and possibly your testing colleagues) will start to overlook things.

So this manual process slows down your development, it's error-prone and, most importantly, it's no fun!

Automate all the things?

As a developer, my usual response to repetitive manual processes is to automate them away with some clever scripts or tools. Sadly, this solution won't work either. Currently it's not possible to let a script determine which visual change to a page is good or bad. While we might delegate this task to some revolutionary artificial intelligence in the future, it's not a solution we can use right now.

What we can do: automate the pieces of the visual testing process where we can, while still having humans determine whether a visual change is intended.

Taking into account our requirements in regard to quality and development speed, we'll be looking for a tool that:

  • minimizes the manual steps in our development workflow
  • makes it easy to create, update, debug and run the tests
  • provides a solid user- and developer/tester experience

Introducing: VisualReview

To address these issues we're developing a new tool called VisualReview. Its goal is to provide a productive and human-friendly workflow for testing and reviewing your web application's layout for any regressions. In short, VisualReview allows you to:

  • use a (scripting) environment of your choice to automate screen navigation and take screenshots of selected screens
  • compare these screenshots against previously accepted ones
  • accept or reject any differences in screenshots between runs in a user-friendly workflow

With these features (and more to come) VisualReview's primary focus is to provide a great development process and environment for development teams.

How does it work?

VisualReview acts as a server that receives screenshots through a regular HTTP upload. When a screenshot is received, it's compared against a baseline and any differences are stored. After all screenshots have been analyzed, someone from your team (a developer, tester or any other role) opens up the server's analysis page to view the differences and accepts or rejects them. Every screenshot that's been accepted will be set as a baseline for future tests.

Sending screenshots to VisualReview is typically done from a test script. We already provide an API for Protractor (AngularJS's browser testing tool, basically an Angular-friendly wrapper around Selenium), but any environment could potentially use VisualReview, as the upload is done using a simple HTTP REST call. A great example of this happened during a recent meetup where we presented VisualReview. During our presentation, a couple of attendees created a node client for use in their own project. A working version was running even before the meetup was over.

Example workflow

To illustrate how this works in practice I'll be using an example web application. In this case a twitter clone called 'Deep Thoughts' where users can post a single-sentence thought, similar to Reddit's shower thoughts.
Deep Thoughts is an Angular application, so I'll be using Angular's browser testing tool Protractor to test for visual changes. Protractor does not support sending screenshots to VisualReview by default, so we'll be using visualreview-protractor as a dependency of the Protractor test suite. After adding some additional Protractor configuration and making sure the VisualReview server is running, we're ready to run the test script. The test script could look like this:

var vr = browser.params.visualreview;
describe('the deep thoughts app', function() {
  it('should show the homepage', function() {
    browser.get('http://localhost:8000/'); // URL of the app under test
    vr.takeScreenshot('homepage'); // upload a screenshot to the VisualReview server
  });
});

With all pieces in place, we can now run the Protractor script:

protractor my-protractor-config.js

When all tests have been executed, the test script ends with the following message:

VisualReview-protractor: test finished. Your results can be viewed at: http://localhost:7000/#/1/1/2/rp

Opening the link in a browser it shows the VisualReview screenshot analysis tool.

VisualReview analysis screen

For this example we've already created a baseline of images, so this screen now highlights differences between the baseline and the new screenshot. As you can see, the left and right side of the submit button are highlighted in red: it seems that someone has changed the button's width. Using keyboard or mouse navigation, I can view both the new screenshot and the baseline screenshot. The differences between the two are highlighted in red.

Now I can decide whether or not I'm going to accept this change using the top menu.

Accepting or rejecting screenshots in VisualReview

If I accept this change, the screenshot will replace the one in the baseline. If I reject it, the baseline image will remain as it is, while this screenshot is marked as a 'rejection'. With this rejection state, I can point other team members to all the rejected screenshots using the filter option, which allows for better team cooperation.

VisualReview filter menu

Open source

VisualReview is an open source project hosted on GitHub. We recently released our first stable version and are very interested in your feedback. Try out the latest release or run it from an example project. Let us know what you think!


Another elasticsearch blog post, now about Shield

Gridshore - Thu, 01/29/2015 - 21:13

I just wrote another piece of text on my other blog. This time I wrote about the recently released elasticsearch plugin called Shield. If you want to learn more about securing your elasticsearch cluster, please head over to my other blog and start reading.


The post Another elasticsearch blog post, now about Shield appeared first on Gridshore.

Categories: Architecture, Programming

New blog posts about bower, grunt and elasticsearch

Gridshore - Mon, 12/15/2014 - 08:45

Two new blog posts I want to point out to you all. I wrote these blog posts on my employers blog:

The first post is about creating backups of your elasticsearch cluster. Some time ago they introduced the snapshot/restore functionality. Of course you can use the REST endpoint to access the functionality, but how much easier is it if you can use a plugin to handle the snapshots? Or maybe even better, integrate the functionality into your own Java application. That is what this blog post is about: integrating snapshot/restore functionality into your Java application. As a bonus, there are screenshots of my elasticsearch gui project showing the snapshot/restore functionality.

Creating elasticsearch backups with snapshot/restore

The second blog post I want to bring to your attention is front-end oriented. I already mentioned my elasticsearch gui project. This is an AngularJS application. I have been working on the plugin for a long time and the amount of JavaScript code is increasing. Therefore I wanted to introduce grunt and bower to my project. That is what this blog post is about.

Improve my AngularJS project with grunt

The post New blog posts about bower, grunt and elasticsearch appeared first on Gridshore.

Categories: Architecture, Programming

New blogpost on kibana 4 beta

Gridshore - Tue, 12/02/2014 - 12:55

If you are, like me, interested in elasticsearch and kibana, then you might be interested in a blog post I wrote on my employer's blog about the new Kibana 4 beta. If so, head over to my employer's blog:


The post New blogpost on kibana 4 beta appeared first on Gridshore.

Categories: Architecture, Programming

Big changes

Gridshore - Sun, 10/26/2014 - 08:27

The first 10 years of my career I worked as a consultant for Capgemini and Accenture. I learned a lot in that time. One of the things I learned was that I wanted something else. I wanted to do something with more impact, more responsibility, together with people who want to do challenging projects – not purely to climb the career ladder, but because they like doing cool stuff. Therefore I left Accenture to become part of a company called JTeam. That was over 6 years ago.

I started as Chief Architect at JTeam. The goal was to become a leader to the other architects and create a team together with Bram. At that time I was lucky that Allard joined me. We share a lot of ideas, which makes it easier to set goals and accomplish them. I got to know a few very good people at JTeam; too bad that some of them left, but that is life.

After a few years, bigger changes took place. Leonard made way for Steven, and the shift to a company that needed to grow started. We took over two companies (Funk and Neteffect), and we now had all disciplines of software development available, from front-end to operations. As the company grew, some things had to change. I got more involved in arranging things like internships, tech events, partnerships and human resource management.

We moved into a bigger building and had better opportunities. One of those opportunities was a search solution created by Shay Banon. Then Steven was gone too: together with Shay he founded Elasticsearch. We were acquired by Trifork. In this change we lost most of our search expertise, because all of our search people joined the Elasticsearch initiative. Someone had to pick up search at Trifork, and that was me, together with Bram.

For over 2 years I invested a lot of time in learning, mainly about elasticsearch. I created a number of workshops/trainings and got involved with multiple customers that needed search. I have given trainings to a number of customers, to groups varying between 2 and 15 people. In general they were all really pleased with the trainings I gave.

Having so much focus for a while gave me a lot of time to think. I did not need to think about next steps for the company; I just needed to get more knowledgeable about elasticsearch. In that time I started out on a journey to find out what I want. I talked to my management about it and thought about it myself a lot. Then, right before the summer holiday, I had a dinner with two people I know through the NLJUG, Hans and Bert. We had a very nice talk, and in the end they gave me an opportunity that I really had to give some good thought. It was really interesting: a challenge, not really a technical challenge, but more an experience that is hard to find. During the summer holiday I convinced myself this was a very interesting direction and I took the next step.

I had a lunch meeting with my soon-to-be business partner Sander. After around 30 minutes it already felt good. I really feel the energy of creating something new; I feel inspired again. This is the feeling I had been missing for a while. In September we were told that Bram was leaving Trifork. Since he is the person who got me into JTeam back in the day, it felt weird. I understand his reasons to go out and try to start something new. Bram leaving resulted in a vacancy for a CTO, and the management team decided to approach Allard for this role. This was a surprise to me, but a very nice opportunity for Allard, and I know he is going to do a good job. At the end of September, Sander and I presented the draft business plan to the board of Luminis. That afternoon hands were shaken. It was then that I made the last call and decided to resign from my job at Trifork and take this new opportunity at Luminis.

I feel sad about leaving some people behind. I am going to miss the morning talks in the car with Allard about everything related to the company; I am going to miss doing projects with Roberto (we are a hell of a team); I am going to miss Byron for his capabilities (you make me feel proud that I guided your first steps within Trifork); I am going to miss chasing customers with Henk (we did a good job the past year); and I am going to miss Daphne and the after-lunch walks. To all of you and all the others at Trifork: it is a small world…

Luminis logo

Together with Sander, and with the help of all the others at Luminis, we are going to start Luminis Amsterdam. This is going to be a challenge for me, but together with Sander I feel we are going to make it happen. I feel confident that the big changes to come will be good changes.

The post Big changes appeared first on Gridshore.

Categories: Architecture, Programming

New book review ‘SOA Made Simple’ coming soon.

Ben Wilcock - Tue, 02/05/2013 - 14:24

My review copy has arrived and I’ll be reviewing it just as soon as I can, but in the meantime if you’d like more information about this new book go to http://bit.ly/wpNF1J

Categories: Architecture, Programming