
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools



Improvements in Xamarin.Forms 1.3

Eric.Weblog() - Eric Sink - Tue, 01/27/2015 - 19:00

Back in November I wrote a blog entry about performance problems resulting from the design of the layout system in Xamarin.Forms. I am pleased to report that things took a big step forward with the recent release of version 1.3.

Reviewing the problem

In a nutshell, the Layout classes do too much. They contain functionality to make sure everything gets updated whenever something changes. In principle, this is good, since we obviously don't want stale stuff on the screen. But in practice, there are many cases where the built-in update code ends up being slower than necessary.

For example, suppose I'm going to add ten child views to a layout. With the built-in update code, a layout cycle will get triggered ten times, once for each child view I add. Worse, if I'm trying to do any kind of subview recycling, the odds are high that I want to add a child view while I am processing a layout cycle. This will trigger a recursive layout cycle, resulting in the end of civilization as we know it.

Instead, what I want is one layout cycle which happens after all ten child views have been added.
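The difference between the two behaviours can be sketched with a toy model. This is not the Xamarin.Forms API: ToyLayout, invalidate_on_add and layout_passes are hypothetical names invented for illustration, and the sketch is in Python purely to keep it short.

```python
class ToyLayout:
    """Toy container that counts layout passes (hypothetical, not Xamarin.Forms)."""

    def __init__(self, invalidate_on_add=True):
        self.children = []
        self.layout_passes = 0
        self.invalidate_on_add = invalidate_on_add

    def add(self, child):
        self.children.append(child)
        if self.invalidate_on_add:
            # Eager behaviour: every added child triggers its own layout pass.
            self.layout()

    def layout(self):
        self.layout_passes += 1


# Eager container: ten children cost ten layout passes.
eager = ToyLayout(invalidate_on_add=True)
for i in range(10):
    eager.add("view %d" % i)

# Batched container: add everything first, then lay out once.
batched = ToyLayout(invalidate_on_add=False)
for i in range(10):
    batched.add("view %d" % i)
batched.layout()

print(eager.layout_passes, batched.layout_passes)  # 10 1
```

The second container does the same work with a single pass, which is the behaviour I am after.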

The solution I proposed

IMHO, the best design for this kind of problem is to have multiple layers:

  • The Low-Level layer models child view relationships only. It provides a way for a View to be inside another View, but it doesn't give much more than that. In iOS terms, this is UIView.addSubview.

  • The High-Level layer (which is built on the functionality provided by the layers below it) has Views which actively manage their child views. In iOS terms, an example of this would be UICollectionView.

  • In the Middle, it would make sense to have a layer which provides things which are common to all (or nearly all) of the stuff in the High-Level layer, to avoid code duplication.

Xamarin.Forms has the High-Level layer and the Middle layer, but it does not have the Low-Level layer. So I proposed creating it.

I didn't get exactly what I wanted, but...

The solution in Xamarin.Forms 1.3

In Xamarin.Forms 1.3, the Middle layer is still the lowest thing we've got. However, there are new capabilities which allow the Middle layer to pretend like it is a Low-Level layer. It still has a bunch of built-in update code, but now that code can be turned off. :-)

The important new capabilities are:

  • ShouldInvalidateOnChildAdded
  • ShouldInvalidateOnChildRemoved
  • OnChildMeasureInvalidated

By returning false from my override of ShouldInvalidateOnChildAdded() and ShouldInvalidateOnChildRemoved(), I can have a Layout which doesn't do any automatic updates when I add or remove children.

And by overriding OnChildMeasureInvalidated(), I can have a Layout which refuses to do real estate negotiations with its child views.

This is good.

How I'm using this

Because of this new stuff, an upcoming release of our DataGrid component will be even faster. Our panel layout class will look something like this:

private class myLayout : Layout<View>
{
    // Callback which tells the layout where each child view should go
    Func<View,Rectangle> getBox;

    public myLayout(Func<View,Rectangle> f)
    {
        getBox = f;
    }

    public void LayoutOneChild(View v)
    {
        Rectangle r = getBox (v);
        v.Layout (r);
    }

    public void LayoutAllChildren()
    {
        foreach (View v in Children) {
            LayoutOneChild (v);
        }
    }

    protected override bool ShouldInvalidateOnChildAdded (View child)
    {
        return false; // stop pestering me
    }

    protected override bool ShouldInvalidateOnChildRemoved (View child)
    {
        return false; // go away and leave me alone
    }

    protected override void OnChildMeasureInvalidated ()
    {
        // I'm ignoring you.  You'll take whatever size I want to give
        // you.  And you'll like it.
    }

    protected override void LayoutChildren (double x, double y, double width, double height)
    {
        LayoutAllChildren ();
    }
}

This Layout class is obviously very simplistic, but it merely scratches the surface of what becomes possible now that Xamarin.Forms has [something that can imitate] a Low-Level subview layer.

Kudos and thanks to the Xamarin.Forms team!


Android Wear & QR Code: Putting Users through the Fast Track

Android Developers Blog - Tue, 01/27/2015 - 13:18

Posted by Hoi Lam, Developer Advocate

Rushing onto a train, entering a concert, or simply ordering a coffee, we have all seen users (or ourselves) rummaging through their wallets or mobile app trying to get the right boarding pass, ticket or loyalty card. With Android Wear and a few lines of code in your mobile app, this can all work like magic.

What’s new in the Android Support Library

While QR Code images could be attached to a notification since the first release of the Android Wear platform, developers have asked about two situations which they would like to see improved:

  1. With circular displays, it is hard for developers to know if the QR code is displayed in its entirety and not cropped.
  2. To conserve battery, Android Wear switches off the screen after five seconds of inactivity. However, this makes it hard for the user to ensure that the QR code is still displayed on their wrist when they reach the front of the queue.

With the latest support library, we have added two additional methods to WearableExtender to give developers more control over how background images are displayed in notifications. These new APIs can be used in a number of scenarios, but we will focus on the QR code use case in this post:

  • Ensure the image is not cropped - setHintAvoidBackgroundClipping(true)
    With this new method, developers can ensure that the entire QR code is always visible.
    Wrong: setHintAvoidBackgroundClipping(false) // this is the default
    Right: setHintAvoidBackgroundClipping(true)
  • Ensure the QR code is still displayed when the user gets to the front of the queue - setHintScreenTimeout(timeInMS)
    This new method enables developers to set a timeout that makes sense for their specific use case.

Design Best Practices

We have experimented with a number of customization options with QR codes and here are some of the lessons learnt:

  • Do test with your equipment - Before deploying, test with your QR code readers to ensure that the QR code displayed on the wearable works with your equipment.
  • Do use black and white QR codes - This ensures maximum contrast and makes it easier for the reader to read the information.
  • Do display only the core information in the text notification - Remember that less is more. Glanceability is important for wearables.
  • Do test with both round and square watches - The amount of text that can be displayed on the notification varies with the form factor (square or circular).
  • Do brand with icon - On the main notification in the Android Wear stream, developers can set a full color icon using setLargeIcon to brand their notification.
  • Do convey additional information using background - To achieve an even better result, consider setting context sensitive backgrounds through setBackground, such as a photo of the destination for the train or a picture of the stadium.
  • Do use QR codes which are 400x400 pixels or larger - In line with other background images, the recommended minimum size for QR code is 400x400 pixels.
  • Do not brand the QR code - The screen real estate is limited on Android Wear and using some of this for branding may result in the QR code not working correctly.
  • Do not use anything other than grey or default theme color for notification text - Although Android Wear notifications support basic text formatting such as setting text color, this should be used in moderation with the color set to default or grey. The reason is that the Holo theme for Android 4.x has a default background of black whereas Material Design theme for Android 5+ including Android Wear has a white background. This makes it hard for the color to work for both themes. Bold and Italic are fine formatting choices.

Android Wear is for people on the move

Using QR codes on Android Wear is a very delightful experience. The information that the user needs is right on their wrist at the right time in the right place. With the new APIs, you can now unlock more doors than ever before and give users an easier time with check in on the go.

Sample code can be downloaded from this repository.

Categories: Programming

Should I Work On Non-Work Things At Work?

Making the Complex Simple - John Sonmez - Mon, 01/26/2015 - 17:00

I’ve received a lot of questions lately about whether or not it is appropriate to work on non-work things at work. This isn’t an easy question to answer and every situation is a bit different, but I thought I’d offer some general advice that can help you figure out the answer for yourself. Doing something is better than doing nothing ... Read More

The post Should I Work On Non-Work Things At Work? appeared first on Simple Programmer.

Categories: Programming

Python: Find the highest value in a group

Mark Needham - Sun, 01/25/2015 - 13:47

In my continued playing around with a How I met your mother data set I needed to find out the last episode that happened in a season so that I could use it in a chart I wanted to plot.

I had this CSV file containing each of the episodes:

$ head -n 10 data/import/episodes.csv
1,1,/wiki/Pilot,1,"September 19, 2005",1127084400
2,2,/wiki/Purple_Giraffe,1,"September 26, 2005",1127689200
3,3,/wiki/Sweet_Taste_of_Liberty,1,"October 3, 2005",1128294000
4,4,/wiki/Return_of_the_Shirt,1,"October 10, 2005",1128898800
5,5,/wiki/Okay_Awesome,1,"October 17, 2005",1129503600
6,6,/wiki/Slutty_Pumpkin,1,"October 24, 2005",1130108400
7,7,/wiki/Matchmaker,1,"November 7, 2005",1131321600
8,8,/wiki/The_Duel,1,"November 14, 2005",1131926400
9,9,/wiki/Belly_Full_of_Turkey,1,"November 21, 2005",1132531200

I started out by parsing the CSV file into a dictionary of (seasons -> episode ids):

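import csv
from collections import defaultdict
seasons = defaultdict(list)
with open("data/import/episodes.csv", "r") as episodesfile:
    reader = csv.reader(episodesfile, delimiter = ",")
    next(reader)  # skip the header row
    for row in reader:
        # row[3] is the season, row[0] is the overall episode number
        seasons[int(row[3])].append(int(row[0]))
print seasons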

which outputs the following:

$ python
defaultdict(<type 'list'>, {
  1: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22], 
  2: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], 
  3: [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64], 
  4: [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], 
  5: [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112], 
  6: [113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136], 
  7: [137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160], 
  8: [161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184], 
  9: [185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208]})

It’s reasonably easy to transform that into a dictionary of (season -> max episode id) with the following couple of lines:

for season, episode_ids in seasons.iteritems():
    seasons[season] = max(episode_ids)
>>> print seasons
defaultdict(<type 'list'>, {1: 22, 2: 44, 3: 64, 4: 88, 5: 112, 6: 136, 7: 160, 8: 184, 9: 208})
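The same idea can be sketched without touching a file. The snippet below is an illustration rather than the script used in this post: it is written for Python 3 (the listings here are Python 2), the function name max_episode_per_season is made up, and the two season 2 rows are placeholder data invented to give a second group.

```python
import csv
import io
from collections import defaultdict

# Same column order as episodes.csv: overall number, number in season,
# link, season, air date, timestamp.
# The first two rows come from the file preview; the season 2 rows are
# made-up placeholders, NOT real data.
SAMPLE = '''1,1,/wiki/Pilot,1,"September 19, 2005",1127084400
2,2,/wiki/Purple_Giraffe,1,"September 26, 2005",1127689200
23,1,/wiki/Placeholder_A,2,"placeholder date",0
24,2,/wiki/Placeholder_B,2,"placeholder date",0
'''


def max_episode_per_season(csv_text):
    """Return {season: highest overall episode number in that season}."""
    ids_by_season = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        ids_by_season[int(row[3])].append(int(row[0]))
    # Build a fresh dict instead of overwriting values while iterating.
    return {season: max(ids) for season, ids in ids_by_season.items()}


last_episodes = max_episode_per_season(SAMPLE)
print(last_episodes)  # {1: 2, 2: 24}
```

Returning a new dictionary keeps the grouped lists intact, which is handy if you want other aggregates (min, count) from the same grouping.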

This works fine but it felt very much like a dplyr problem to me so I wanted to see whether I could write something cleaner using pandas.

I started out by capturing the seasons and episode ids in separate lists and then building up a DataFrame:

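import csv
import pandas as pd
from pandas import DataFrame
seasons, episode_ids = [], []
with open("data/import/episodes.csv", "r") as episodesfile:
    reader = csv.reader(episodesfile, delimiter = ",")
    next(reader)  # skip the header row
    for row in reader:
        seasons.append(int(row[3]))      # season number
        episode_ids.append(int(row[0]))  # overall episode number
df = DataFrame.from_items([('Season', seasons), ('EpisodeId', episode_ids)])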
>>> print df.groupby("Season").max()["EpisodeId"]
1          22
2          44
3          64
4          88
5         112
6         136
7         160
8         184
9         208

Or we can simplify that and read the CSV file directly into a DataFrame:

df = pd.read_csv('data/import/episodes.csv', index_col=False, header=0)
>>> print df.groupby("Season").max()["NumberOverall"]
1          22
2          44
3          64
4          88
5         112
6         136
7         160
8         184
9         208

Pretty neat. I need to get more into pandas.

Categories: Programming

How to Start an Agile Project

Making the Complex Simple - John Sonmez - Thu, 01/22/2015 - 16:00

In this video, I quickly outline how I would start an Agile project from the ground up. I go over the idea of having a single product owner who is responsible for the product, rather than a committee of stakeholders.

The post How to Start an Agile Project appeared first on Simple Programmer.

Categories: Programming

Python/pdfquery: Scraping the FIFA World Player of the Year votes PDF into shape

Mark Needham - Thu, 01/22/2015 - 01:25

Last week the FIFA Ballon d’Or 2014 was announced and along with the announcement of the winner the individual votes were also made available.

Unfortunately they weren’t made open in a way that Ben Wellington (of IQuantNY fame) would approve of – the choice of format for the data is a PDF file!

I wanted to extract this data to play around with it but I wanted to automate the extraction as I’d done when working with Google Trends data.

I had a quick look for PDF scraping libraries in Python and R and eventually settled on Python’s pdfquery, mainly because there was lots of documentation which made it easy to get started.

One way you scrape data from a PDF is by locating an element on the page and then grabbing everything within a bounded box relative to that element.
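To make the bounding-box idea concrete, here is a toy model in plain Python 3. It does not use pdfquery: Element, in_bbox and column_below are hypothetical names, and the coordinates are invented. It assumes PDF-style coordinates with the origin at the bottom left, so "below the header" means smaller y values.

```python
from collections import namedtuple

# A text element with its bounding box (x0, y0) bottom-left, (x1, y1) top-right.
Element = namedtuple("Element", "text x0 y0 x1 y1")


def in_bbox(el, x0, y0, x1, y1):
    """True when the element lies entirely inside the given box."""
    return el.x0 >= x0 and el.y0 >= y0 and el.x1 <= x1 and el.y1 <= y1


def column_below(elements, header_text, width, depth):
    """Texts of the elements inside a box hanging below the header element."""
    header = next(el for el in elements if header_text in el.text)
    box = (header.x0, header.y0 - depth, header.x0 + width, header.y0)
    return [el.text for el in elements
            if el is not header and in_bbox(el, *box)]


# Invented coordinates, for illustration only.
page = [
    Element("Country", 20, 500, 60, 510),
    Element("Name", 100, 500, 130, 510),
    Element("Afghanistan", 20, 480, 80, 490),
    Element("Amiri Islam", 100, 480, 160, 490),
    Element("Cana Lorik", 100, 460, 155, 470),
]

names = column_below(page, "Name", width=150, depth=500)
print(names)  # ['Amiri Islam', 'Cana Lorik']
```

The width and depth of the box are the knobs you end up tuning by trial and error: too narrow and long cells fall outside it, too wide and you pick up the neighbouring column.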

In my case I had 17 pages all of which had a heading for each of six columns.

I wanted to grab the data in each of those columns but initially struggled working out what elements I should be looking for until I came across the following function which allows you to dump an XML version of the PDF to disk:

import pdfquery
pdf = pdfquery.PDFQuery("fboaward_menplayer2014_neutral.pdf")
pdf.tree.write("/tmp/yadda", pretty_print=True)

The output looks like this:

$ head -n 10 /tmp/yadda
<pdfxml ModDate="D:20150110224554+01'00'" CreationDate="D:20150110224539+01'00'" Producer="Microsoft&#174; Excel&#174; 2010" Creator="Microsoft&#174; Excel&#174; 2010">
  <LTPage bbox="[0, 0, 841.8, 595.2]" height="595.2" pageid="1" rotate="0" width="841.8" x0="0" x1="841.8" y0="0" y1="595.2" page_index="0" page_label="">
    <LTAnon> </LTAnon>
    <LTTextLineHorizontal bbox="[31.08, 546.15, 122.524, 556.59]" height="10.44" width="91.444" word_margin="0.1" x0="31.08" x1="122.524" y0="546.15" y1="556.59"><LTTextBoxHorizontal bbox="[31.08, 546.15, 122.524, 556.59]" height="10.44" index="0" width="91.444" x0="31.08" x1="122.524" y0="546.15" y1="556.59">FIFA Ballon d'Or 2014 </LTTextBoxHorizontal></LTTextLineHorizontal>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>
    <LTAnon> </LTAnon>

Having scanned through the file I realised that what I needed to do was locate the ‘LTTextLineHorizontal’ element for each heading and then grab all the ‘LTTextLineHorizontal’ elements that appeared in that column.

I started out by trying to grab the ‘Name’ column on the first page:

>>> name_element = pdf.pq('LTPage[pageid=\'1\'] LTTextLineHorizontal:contains("Name")')[0]
>>> name_element.text
'Name '

Next I needed to get the other elements in that column. With a bit of trial and error I ended up with the following code:

x = float(name_element.get('x0'))
y = float(name_element.get('y0'))
cells = pdf.extract( [
         ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y-500, x+150, y))
    ])
>>> [cell.text.encode('utf-8').strip() for cell in cells['cells']]
['Amiri Islam', 'Cana Lorik', 'Bougherra Madjid', 'Luvu Rafe Talalelei', 'Sonejee Masand Oscar', 'Amaral Felisberto', 'Liddie Ryan', 'Griffith Quinton', 'Messi Lionel', 'Berezovskiy Roman', 'Breinburg Reinhard', 'Jedinak Mile', 'Fuchs Christian', 'Sadigov Rashad', 'Gavin Christie', 'Hasan Mohamed', 'Mamun Md Mamnul Islam', 'Burgess Romelle', 'Kalachou Tsimafei', 'Komany Vincent', 'Eiley Dalton', 'Nusum John', 'Tshering Passang', 'Raldes Ronald', 'D\xc5\xbeeko Edin', 'Da Silva Santos Junior Neymar', 'Ceasar Troy', 'Popov Ivelin', 'Kabore Charles', 'Ntibazonkiza Saidi', 'Kouch Sokumpheak']

I cleaned that up and generified it to work for any page and for columns of different widths. This is what the function looks like:

def extract_cells(page, header, cell_width):
    name_element = pdf.pq('LTPage[pageid=\'%s\'] LTTextLineHorizontal:contains("%s")' % (page, header))[0]
    x = float(name_element.get('x0'))
    y = float(name_element.get('y0'))
    cells = pdf.extract( [
         ('with_parent','LTPage[pageid=\'%s\']' %(page)),
         ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y-500, x+cell_width, y))
    ])
    return [cell.text.encode('utf-8').strip() for cell in cells['cells']]

We can then call that for each column on the page and zip together the resulting arrays to get a tuple for each row:

roles = extract_cells(1, "Vote", 50)
countries = extract_cells(1, "Country", 150)
voters = extract_cells(1, "Name", 170)
first = extract_cells(1, "First (5 points)", 150)
second = extract_cells(1, "Second (3 points)", 150)
third = extract_cells(1, "Third (1 point)", 130)
>>> for vote in zip(roles, countries, voters, first, second, third)[:5]:
       print vote
('Captain', 'Afghanistan', 'Amiri Islam', 'Messi Lionel', 'Cristiano Ronaldo', 'Ibrahimovic Zlatan')
('Captain', 'Albania', 'Cana Lorik', 'Cristiano Ronaldo', 'Robben Arjen', 'Mueller Thomas')
('Captain', 'Algeria', 'Bougherra Madjid', 'Cristiano Ronaldo', 'Robben Arjen', 'Benzema Karim')
('Captain', 'American Samoa', 'Luvu Rafe Talalelei', 'Neymar', 'Robben Arjen', 'Cristiano Ronaldo')
('Captain', 'Andorra', 'Sonejee Masand Oscar', 'Cristiano Ronaldo', 'Mueller Thomas', 'Kroos Toni')
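A note for anyone trying this on Python 3, sketched under that assumption: zip returns a lazy iterator there, so it can't be sliced directly, and it silently stops at the shortest input. The snippet below uses the first three rows from the output above; short_voters is a deliberately truncated list made up to show the second point.

```python
from itertools import islice, zip_longest

roles = ["Captain", "Captain", "Captain"]
countries = ["Afghanistan", "Albania", "Algeria"]
voters = ["Amiri Islam", "Cana Lorik", "Bougherra Madjid"]

# In Python 3 zip() is lazy, so take the first rows with islice
# instead of slicing with [:2].
first_two = list(islice(zip(roles, countries, voters), 2))
for vote in first_two:
    print(vote)

# If one column comes back short (e.g. a cell fell outside the bounding box),
# zip() drops the trailing rows without complaint ...
short_voters = voters[:2]
truncated = list(zip(roles, countries, short_voters))

# ... whereas zip_longest() keeps the row and makes the gap visible.
padded = list(zip_longest(roles, countries, short_voters, fillvalue=None))
print(len(truncated), padded[2])  # 2 ('Captain', 'Algeria', None)
```

Comparing the column lengths before zipping is a cheap sanity check when the column widths were found by trial and error.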

The next step was to write out each of those rows to a CSV file so we can use it from another program. The full script looks like this:

import pdfquery
import csv
def extract_cells(page, header, cell_width):
    name_element = pdf.pq('LTPage[pageid=\'%s\'] LTTextLineHorizontal:contains("%s")' % (page, header))[0]
    x = float(name_element.get('x0'))
    y = float(name_element.get('y0'))
    cells = pdf.extract( [
         ('with_parent','LTPage[pageid=\'%s\']' %(page)),
         ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y-500, x+cell_width, y))
    ])
    return [cell.text.encode('utf-8').strip() for cell in cells['cells']]
if __name__ == "__main__":
    pdf = pdfquery.PDFQuery("fboaward_menplayer2014_neutral.pdf")
    pdf.tree.write("/tmp/yadda", pretty_print=True)
    pages_in_pdf = len(pdf.pq('LTPage'))
    with open('votes.csv', 'w') as votesfile:
        writer = csv.writer(votesfile, delimiter=",")
        writer.writerow(["Role", "Country", "Voter", "FirstPlace", "SecondPlace", "ThirdPlace"])
        for page in range(1, pages_in_pdf + 1):
            print page
            roles = extract_cells(page, "Vote", 50)
            countries = extract_cells(page, "Country", 150)
            voters = extract_cells(page, "Name", 170)
            first = extract_cells(page, "First (5 points)", 150)
            second = extract_cells(page, "Second (3 points)", 150)
            third = extract_cells(page, "Third (1 point)", 130)
            votes = zip(roles, countries, voters, first, second, third)
            print votes
            for vote in votes:
                writer.writerow(list(vote))

The code is on GitHub if you want to play around with it; if you just want to grab the votes data, that’s there too.

Categories: Programming

Continuous Delivery across multiple providers

Xebia Blog - Wed, 01/21/2015 - 13:04

Over the last year three of the four customers I worked with had a similar challenge with their environments. In different variations they all had their environments set up across separate domains, ranging from physically separated on-premise networks to environments running across different hosting providers managed by different parties.

Regardless of the reasoning behind this kind of setup, it’s a situation where the continuous delivery concepts really add value. The stereotypical problems that exist with manual deployment and testing practices tend to get amplified when they occur in separated domains. Things get even worse when you add more parties to the mix (like external application developers). Sticking to doing things manually is a recipe for disaster unless you enjoy going through expansive procedures every time you want to do anything in any of ‘your’ environments. And if you’ve outsourced your environments to an external party you probably don’t want to have to (re)hire a lot of people just so you can communicate with your supplier.

So how can continuous delivery help in this situation? By automating your provisioning and deployments you make deploying your applications, if nothing else, repeatable and predictable. Regardless of where they need to run.

Just automating your deployments isn’t enough, however. A big question that remains is who does what, a question that is most likely backed by a lengthy contract. Agreements between all the parties are meant to provide an answer to that very question. A development partner develops, an outsourcing partner handles the hardware, etc. But nobody handles bringing everything together...

The process of automating your steps already provides some help with this problem. In order to automate you need some form of agreement on how to provide input for the tooling. This at least clarifies what the various parties need to produce. It also clarifies what the result of a step will be. This removes some of the fuzziness from the process. Questions like whether the JVM is part of the OS or part of the middleware should become clear. But not everything is that clear-cut. It’s in the parts of the puzzle where pieces actually come together that things turn gray. A single tool may need input from various parties. Here you need to resist the common knee-jerk reaction to shield said tool from other people with procedures and red tape. Instead provide access to those tools to all relevant parties and handle your separation of concerns through a reliable access mechanism. Even then there might be some parts that can’t be used by just a single party and in that case, *gasp*, people will need to work together.

What this results in is an automated pipeline that will keep your environments configured properly and allow applications to be deployed onto them when needed, within minutes, wherever they may run.


The diagram above shows how we set this up for one of our clients. Using XL Deploy, XL Release and Puppet as the automation tooling of choice.

In the first domain we have a git repository to which developers commit their code. A Jenkins build is used to extract this code, build it and package it in such a way that the deployment automation tool (XL Deploy) understands. It’s also kind enough to make that package directly available in XL Deploy. From there, XL Deploy is used to deploy the application not only to the target machines but also to another instance of XL Deploy running in the next domain, thus enabling that same package to be deployed there. This same mechanism can then be applied to the next domain. In this instance we ensure that the machines we are deploying to are consistent by using Puppet to manage them.

To round things off we use a single instance of XL Release to orchestrate the entire pipeline. A single release process is able to trigger the build in Jenkins and then deploy the application to all environments spread across the various domains.

A setup like this reduces the deployment errors that come with doing manual deployments and cuts out all the hassle that comes with following the required procedures. As an added bonus your deployment pipeline also speeds up significantly. And we haven’t even talked about adding automated testing to the mix.

Small Basic on Mac & Linux

Phil Trelford's Array - Wed, 01/21/2015 - 09:29

Microsoft’s Small Basic is a simple programming language and environment aimed at beginners.

It ships with an IDE for Windows, a command line compiler and a small .Net library. Small Basic programs can also be run in the browser on Windows & Mac via Silverlight.

The shipped .Net library for Small Basic targets WPF for graphics which is unfortunately not supported on Mono, which means Small Basic apps will not run directly on Mac or Linux.

To get Small Basic apps running from the command prompt on Mac and Linux, all that is needed is a new library without the WPF dependency.

Recently I knocked up such a library providing support for command line input and output; providing graphics is a work-in-progress.

But this does mean I can now write and run FizzBuzz, or even work through the majority of the Small Basic Tutorial, on Linux or Mac via Mono:

Small Basic on Mac

Combine this with my open source Small Basic compiler project (written in F#) and there’s now a cross platform version of Small Basic :)

If you fancy having a play with an early version of the source download it here:

Future work

I’m currently evaluating GtkSharp, OpenTK and WinForms as options for a cross platform version of the graphics library.

As well as the compiler, I’ve also written an interpreter for Small Basic which means it should be possible to edit and run programs on iOS and Android, but that’s another story.

Categories: Programming

Debugging Small Basic Apps in Visual Studio

Phil Trelford's Array - Tue, 01/20/2015 - 09:17

Microsoft Small Basic ships with a custom IDE with syntax colouring and code completion but no debugger:


There’s a good article by Nonki Takahashi on Microsoft Technet on How to debug Small Basic programs manually which boils down to:

  • trace with TextWindow.WriteLine
  • add conditional debug code with If debug Then
  • promote your app to full VB.Net

Small Basic in Visual Studio

Last year, for fun, I wrote a custom Small Basic compiler with some extensions like functions with parameters and tuples and pattern matching.

I added debugger support recently so you can compile and debug Small Basic apps directly in Visual Studio:

SmallBasic Debug in VS2013

Setup Steps:

  • download and compile the custom Small Basic compiler
  • create a project to host your Small Basic file (.sb)
  • compile the app with the custom Small Basic Compiler
  • in the project properties Debug tab configure Start External Program

Implementation details

The Reflection.Emit library allows you to mark points in the emitted IL code with corresponding points in the source file using the MarkSequencePoint method. There’s a good guide from back in 2005 on Michael Stall’s blog: Debugging Dynamically Generated Code (Reflection.Emit)

Only a few changes to the compiler were required to provide Debugger support. First I augmented the parser, which uses FParsec, to produce line and column information for each statement. For this FParsec provides a handy getPosition function that returns the current position in the input stream. Then in the IL emit step I simply used MarkSequencePoint to annotate each statement.

Future work

It would also be nice to add syntax colouring for Small Basic within Visual Studio. If anyone is interested in working with me on this, please get in touch :)

I’m also now able to run Small Basic programs on Mac and Linux via Mono, but that’s another post.

Categories: Programming

Cracking The Coding Interview: 12 Things You Need To Know

Making the Complex Simple - John Sonmez - Mon, 01/19/2015 - 16:30

Cracking the coding interview is the holy grail of many programmers and software developers, but is cracking the coding interview really possible? Nothing, I mean nothing, terrifies more software engineers than the dreaded coding interview. Sure, Gayle McDowell wrote an excellent book that is actually called “Cracking the Coding Interview,” but is it actually possible? Yes, but I don’t think ... Read More

The post Cracking The Coding Interview: 12 Things You Need To Know appeared first on Simple Programmer.

Categories: Programming

Try is free in the Future

Xebia Blog - Mon, 01/19/2015 - 09:40

Lately I have seen a few developers consistently use a Try inside of a Future in order to make error handling easier. Here I will investigate whether this has any merit or whether a Future on its own offers enough error handling.

If you look at the following code there is nothing that a Future can’t supply but a Try can:

import scala.concurrent.{Await, Future, Awaitable}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Try, Success, Failure}

object Main extends App {

  // Happy Future
  val happyFuture = Future {
    42 // any Int will do here
  }

  // Bleak future
  val bleakFuture = Future {
    throw new Exception("Mass extinction!")
  }

  // We would want to wrap the result into a hypothetical http response
  case class Response(code: Int, body: String)

  // This is the handler we will use
  def handle[T](future: Future[T]): Future[Response] = {
    future.map {
      case answer: Int => Response(200, answer.toString)
    } recover {
      case t: Throwable => Response(500, "Uh oh!")
    }
  }

  {
    val result = Await.result(handle(happyFuture), 1.second)
    println(result)
  }

  {
    val result = Await.result(handle(bleakFuture), 1.second)
    println(result)
  }
}

After giving it some thought, the only situation where I could imagine Try being useful in conjunction with Future is when awaiting a Future but not wanting to deal with error situations yet. The times I would be awaiting a future are very few in practice though. But when needed, something like this might do:

object TryAwait {
  def result[T](awaitable: Awaitable[T], atMost: Duration): Try[T] = {
    Try {
      Await.result(awaitable, atMost)
    }
  }
}

If you do feel that using Trys inside of Futures adds value to your codebase please let me know.

Python/NLTK: Finding the most common phrases in How I Met Your Mother

Mark Needham - Mon, 01/19/2015 - 01:24

Following on from last week’s blog post where I found the most popular words in How I met your mother transcripts, in this post we’ll have a look at how we can pull out sentences and then phrases from our corpus.

The first thing I did was tweak the scraping script to pull out the sentences spoken by characters in the transcripts.

Each dialogue is separated by two line breaks so we use that as our separator. I also manually skimmed through the transcripts and found out which tags we need to strip out. I ended up with the following:

import csv
import nltk
import re
import bs4
from bs4 import BeautifulSoup, NavigableString
from soupselect import select
from nltk.corpus import stopwords
from collections import Counter
from nltk.tokenize import word_tokenize
episodes_dict = {}
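def strip_tags(soup, invalid_tags):
    for tag in invalid_tags:
        for match in soup.findAll(tag):
            # Unwrap the tag but keep its children in place
            match.replaceWithChildren()
    return soup
def extract_sentences(html):
    # html is the list of child nodes of the transcript element
    clean = []
    brs_in_a_row = 0
    temp = ""
    for item in html:
        if item.name == "br":
            brs_in_a_row = brs_in_a_row + 1
        else:
            temp = temp + item
        if brs_in_a_row == 2:
            # Two line breaks mark the end of a piece of dialogue
            clean.append(temp)
            temp = ""
            brs_in_a_row = 0
    return clean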
speakers = []
with open('data/import/episodes.csv', 'r') as episodes_file, \
     open("data/import/sentences.csv", 'w') as sentences_file:
    reader = csv.reader(episodes_file, delimiter=',')
    writer = csv.writer(sentences_file, delimiter=',')
    writer.writerow(["SentenceId", "EpisodeId", "Season", "Episode", "Sentence"])
    sentence_id = 1
    for row in reader:
        transcript = open("data/transcripts/S%s-Ep%s" %(row[3], row[1])).read()
        soup = BeautifulSoup(transcript)
        rows = select(soup, "table.tablebg tr div.postbody")
        raw_text = rows[0]
        [ad.extract() for ad in select(raw_text, "")]
        [ad.extract() for ad in select(raw_text, "div.t-foot-links")]
        [ad.extract() for ad in select(raw_text, "hr")]
        for tag in ['strong', 'em', "a"]:
            for match in raw_text.findAll(tag):
                # Unwrap formatting tags so only their text remains
                match.replaceWithChildren()
        print row
        for sentence in [
                item.encode("utf-8").strip()
                for item in extract_sentences(raw_text.contents)
        ]:
            writer.writerow([sentence_id, row[0], row[3], row[1], sentence])
            sentence_id = sentence_id + 1
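
The "two line breaks separate each piece of dialogue" rule can be tried in isolation, without BeautifulSoup. The following is a minimal Python 3 sketch, not the script above: the function name split_on_double_break and the sample list are made up for illustration, plain strings stand in for parsed nodes, and (unlike the script) the break counter is reset whenever text is seen, so only consecutive breaks count.

```python
def split_on_double_break(items, marker="br"):
    """Group text fragments into utterances, cutting after two markers in a row."""
    utterances = []
    current = ""
    markers_in_a_row = 0
    for item in items:
        if item == marker:
            markers_in_a_row += 1
        else:
            current += item
            markers_in_a_row = 0  # assumption: only consecutive breaks count
        if markers_in_a_row == 2:
            if current.strip():
                utterances.append(current.strip())
            current = ""
            markers_in_a_row = 0
    if current.strip():
        utterances.append(current.strip())  # keep trailing text with no final break
    return utterances


# Hypothetical stand-in for the child nodes of a transcript element
sample = [
    "Son: Are we being punished for something?", "br", "br",
    "Narrator: No", "br", "br",
    "Narrator: Yes.", "br", " (Kids are annoyed)",
]
utterances = split_on_double_break(sample)
for line in utterances:
    print(line)
```

A single break inside an utterance does not split it, which is why the last two fragments end up on one line.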

Here’s a preview of the sentences CSV file:

$ head -n 10 data/import/sentences.csv
2,1,1,1,Scene One
3,1,1,1,[Title: The Year 2030]
4,1,1,1,"Narrator: Kids, I'm going to tell you an incredible story. The story of how I met your mother"
5,1,1,1,Son: Are we being punished for something?
6,1,1,1,Narrator: No
7,1,1,1,"Daughter: Yeah, is this going to take a while?"
8,1,1,1,"Narrator: Yes. (Kids are annoyed) Twenty-five years ago, before I was dad, I had this whole other life."
9,1,1,1,"(Music Plays, Title ""How I Met Your Mother"" appears)"

The next step is to iterate through each of those sentences and create some n-grams to capture the common phrases in the transcripts.

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.

Python’s nltk library has a function that makes this easy e.g.

>>> import nltk
>>> tokens = nltk.word_tokenize("I want to be in an n gram")
>>> tokens
['I', 'want', 'to', 'be', 'in', 'an', 'n', 'gram']
>>> nltk.util.ngrams(tokens, 2)
[('I', 'want'), ('want', 'to'), ('to', 'be'), ('be', 'in'), ('in', 'an'), ('an', 'n'), ('n', 'gram')]
>>> nltk.util.ngrams(tokens, 3)
[('I', 'want', 'to'), ('want', 'to', 'be'), ('to', 'be', 'in'), ('be', 'in', 'an'), ('in', 'an', 'n'), ('an', 'n', 'gram')]
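
To make the definition concrete, here is a small pure-Python sketch of the same idea. It is not NLTK's implementation; the helper name ngrams is simply reused for illustration, and the tokens come from a plain split() rather than a real tokenizer.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "I want to be in an n gram".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

A sequence of k tokens yields k - n + 1 n-grams, and none at all when n is larger than k. As far as I know, more recent NLTK releases return an iterator from nltk.util.ngrams rather than a list, so wrapping the call in list() may be needed to reproduce the session above.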

If we do a similar thing with the HIMYM transcripts, stripping out the speaker’s name first (lines are mostly in the form “Speaker: Sentence”), we can count the most common phrases with the following script:

import nltk
import csv
import string
import re
from collections import Counter
non_speaker = re.compile('[A-Za-z]+: (.*)')
def extract_phrases(text, phrase_counter, length):
    for sent in nltk.sent_tokenize(text):
        strip_speaker = non_speaker.match(sent)
        if strip_speaker is not None:
            sent = strip_speaker.group(1)
        words = nltk.word_tokenize(sent)
        for phrase in nltk.util.ngrams(words, length):
            phrase_counter[phrase] += 1
phrase_counter = Counter()
with open("data/import/sentences.csv", "r") as sentencesfile:
    reader = csv.reader(sentencesfile, delimiter=",")
    for sentence in reader:
        extract_phrases(sentence[4], phrase_counter, 3)
most_common_phrases = phrase_counter.most_common(50)
for k,v in most_common_phrases:
    print '{0: <5}'.format(v), k

And if we run that:

$ python
1123  (',', 'I', "'m")
1099  ('I', 'do', "n't")
1005  (',', 'it', "'s")
535   ('I', 'ca', "n't")
523   ('I', "'m", 'not')
507   ('I', 'mean', ',')
507   (',', 'you', "'re")
459   (',', 'that', "'s")
458   ('2030', ')', ':')
454   ('(', '2030', ')')
453   ('Ted', '(', '2030')
449   ('I', "'m", 'sorry')
247   ('I', 'have', 'to')
247   ('No', ',', 'I')
246   ("'s", 'gon', 'na')
241   (',', 'I', "'ll")
229   ('I', "'m", 'going')
226   ('do', "n't", 'want')
226   ('It', "'s", 'not')

I noticed that quite a few of the phrases contained punctuation, so my next step was to get rid of any phrase containing a punctuation token. I updated extract_phrases like so:

def extract_phrases(text, phrase_counter, length):
    for sent in nltk.sent_tokenize(text):
        strip_speaker = non_speaker.match(sent)
        if strip_speaker is not None:
            sent = strip_speaker.group(1)
        words = nltk.word_tokenize(sent)
        for phrase in nltk.util.ngrams(words, length):
            if all(word not in string.punctuation for word in phrase):
                phrase_counter[phrase] += 1

Let’s run it again:

$ python
1099  ('I', 'do', "n't")
535   ('I', 'ca', "n't")
523   ('I', "'m", 'not')
449   ('I', "'m", 'sorry')
414   ('do', "n't", 'know')
383   ('Ted', 'from', '2030')
338   ("'m", 'gon', 'na')
334   ('I', "'m", 'gon')
300   ('gon', 'na', 'be')
279   ('END', 'OF', 'FLASHBACK')
267   ("'re", 'gon', 'na')
155   ('It', "'s", 'just')
151   ('at', 'the', 'bar')
150   ('a', 'lot', 'of')
147   ("'re", 'going', 'to')
144   ('I', 'have', 'a')
142   ('I', "'m", 'so')
138   ('do', "n't", 'have')
137   ('I', 'think', 'I')
136   ('not', 'gon', 'na')
136   ('I', 'can', 'not')
135   ('and', 'I', "'m")
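
The counting and filtering step can be checked on a tiny hand-made sample. This is a Python 3 sketch under stated assumptions: the function name count_phrases and the two pre-tokenized sentences are invented for illustration, and the tokens are written by hand in the style the tokenizer produces, so NLTK is not needed to run it.

```python
import string
from collections import Counter


def count_phrases(token_lists, length):
    """Count n-grams of the given length, skipping any that contain a punctuation token."""
    counter = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - length + 1):
            phrase = tuple(tokens[i:i + length])
            if all(word not in string.punctuation for word in phrase):
                counter[phrase] += 1
    return counter


# Hypothetical, hand-tokenized sentences
sentences = [
    ["No", ",", "I", "do", "n't", "know"],
    ["I", "do", "n't", "know", "."],
]
counts = count_phrases(sentences, 3)
for phrase, count in counts.most_common():
    print(count, phrase)
```

Note that `word not in string.punctuation` is a substring test against one long string of punctuation characters. Single-character tokens such as "," are caught, but a multi-character token such as "..." is not a substring of it and would slip through the filter.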

Next I wanted to display each phrase as a string rather than a tuple, which was more difficult than I expected. I ended up with the following function, which almost does the job:

def untokenize(ngram):
    tokens = list(ngram)
    return "".join([" "+i if not i.startswith("'") and \
                             i not in string.punctuation and \
                             i != "n't"
                          else i for i in tokens]).strip()

I updated extract_phrases to use that function:

def extract_phrases(text, phrase_counter, length):
    for sent in nltk.sent_tokenize(text):
        strip_speaker = non_speaker.match(sent)
        if strip_speaker is not None:
            sent = strip_speaker.group(1)
        words = nltk.word_tokenize(sent)
        for phrase in nltk.util.ngrams(words, length):
            if all(word not in string.punctuation for word in phrase):
                phrase_counter[untokenize(phrase)] += 1

Let’s go again:

$ python
1099  I don't
535   I can't
523   I'm not
449   I'm sorry
414   don't know
383   Ted from 2030
338   'm gon na
334   I'm gon
300   gon na be
151   at the bar
150   a lot of
147   're going to
144   I have a
142   I'm so
138   don't have
137   I think I
136   not gon na
136   I can not
135   and I'm

These were some of the interesting things that stood out for me and deserve further digging into:

  • A lot of the most popular phrases begin with ‘I’ – it would be interesting to filter those sentences to find the general sentiment.
  • The ‘untokenize’ function struggles to reconstruct the slang phrase ‘gonna’ into a single word.
  • ‘Ted from 2030’ is actually a speaker which doesn’t follow the expected regex pattern and so wasn’t filtered out.
  • ‘END OF FLASHBACK’ shows quite high up and pulling out those flashbacks would probably be an interesting feature to extract to see which episodes reference each other.
  • ‘Marshall and Lily’ and ‘Lily and Marshall’ show up on the list – it would be interesting to explore the frequency of pairs of other characters.
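
Two of the observations above lend themselves to a quick experiment. The Python 3 sketch below is only one possible approach and is not part of the code on github: the pattern speaker_prefix and the helpers strip_speaker and merge_gonna are hypothetical names, and the 40-character limit on a speaker label is an arbitrary assumption.

```python
import re

# Hypothetical, more permissive pattern: anything up to the first colon
# (at most 40 characters, no sentence-ending punctuation) is a speaker label.
speaker_prefix = re.compile(r"^[^:.!?]{1,40}:\s*(.*)$")


def strip_speaker(line):
    """Drop a leading 'Speaker:' label if the line appears to have one."""
    match = speaker_prefix.match(line)
    return match.group(1) if match else line


def merge_gonna(tokens):
    """Rejoin the tokenizer's 'gon' + 'na' split into a single 'gonna' token."""
    merged = []
    for token in tokens:
        if token == "na" and merged and merged[-1] == "gon":
            merged[-1] = "gonna"
        else:
            merged.append(token)
    return merged


print(strip_speaker("Ted from 2030: Kids, I'm going to tell you a story"))
print(strip_speaker("Ted (2030): Kids, I'm going to tell you a story"))
print(merge_gonna(["I", "'m", "gon", "na", "be"]))
```

The looser pattern has a cost: any line with an early colon is treated as having a speaker, so a stage direction like "[Title: The Year 2030]" or a line mentioning a time such as "10:30" would be trimmed by mistake.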

The code is all on github if you want to play with it.

Categories: Programming


Xebia Blog - Sun, 01/18/2015 - 12:11

Have you ever used AngularJS as a frontend framework? Then you should definitely give Meteor a try! Where AngularJS is powerful purely as a client framework, Meteor is great as a full-stack framework. That means you write your code in one language, as if there were no backend and frontend at all. In fact, you get an Android and iOS client for free. Meteor is so simple that you are productive from the beginning.

Where meteor kicks angular

One of the killer features of Meteor is that you have a shared code base for frontend and backend. The next code snippet shows a file shared by backend and frontend:

// Collection shared and synchronized across client, server and database
Todos = new Mongo.Collection('todos');

// Shared validation logic
validateTodo = function (todo) {
  var errors = {};
  if (!todo.title)
    errors.title = "Please fill in a title";
  return errors;
};

Can you imagine how neat the code above is?


With one codebase, you get the full stack!

  1. Both the backend file and the frontend file can access and query the Todos collection. Meteor is responsible for syncing the todos. Even when another user adds an item, it is visible to your client immediately. Meteor accomplishes this with a client-side Mongo implementation (MiniMongo).
  2. You write validation rules once, and they are executed on both the frontend and the backend. So you can give your user quick feedback about invalid input, while also guaranteeing that no invalid data is processed by the backend (when someone bypasses the client). All of this comes without duplicated code.

Another killer feature of Meteor is that it works out of the box, and it's easy to understand. Angular can be a bit overwhelming; you have to learn concepts like directives, services, factories, filters, isolated scopes and transclusion. For some initial scaffolding, you need to know grunt, yeoman, and so on. With Meteor every developer can create, run and deploy a full-stack application within minutes. After installing Meteor you can run your app within seconds.

$ curl | /bin/sh
$ meteor create dummyapp
$ cd dummyapp
$ meteor
$ meteor deploy

Meteor dummy application

Another nice aspect of Meteor is that it uses DDP, the Distributed Data Protocol. The team invented the protocol and they are heavily promoting it as "REST for websockets". It is a simple, pragmatic approach for delivering live updates as data changes in the backend. Remember that this all works out of the box. This talk walks you through its concepts. The result is that if you change data on one client, it is updated immediately on the other clients.

And there is so much more, like...

  1. Latency Compensation. On the client, Meteor prefetches data and simulates models to make it look like server method calls return instantly.
  2. Meteor is open source and integrates with existing open source tools and frameworks.
  3. Services (like an official package server and a build farm).
  4. Command line tools
  5. Hot deploys

Where meteor falls short

Meteor is not the answer to all your problems, though. The reason I'll still choose Angular over Meteor for my professional work is that Angular's view framework rocks. It makes it easy to structure your client code into testable units and connect them via dependency injection. With Angular you can separate your HTML from your JavaScript. With Meteor your JavaScript contains HTML elements (because their UI library is based on handlebars). That makes testing harder, and large projects will become unstructured very quickly.

Another flaw emerges if your project already has a backend. When you choose Meteor, you choose their full stack. That means Mongo as database and Node.js as backend. Although you are able to create powerful applications, Meteor doesn't (easily) allow you to change this stack.

Under the hood

Meteor consists of several subprojects. In fact, it is a library of libraries, or rather a stack: a standard set of core packages that are designed to work well together:

Components used by meteor

  1. To make meteor reactive they've included the components blaze and tracker. The blaze component is heavily based on handlebars.
  2. The DDP component is a new protocol, described by meteor, for modern client-server communication.
  3. Livequery and full stack database take all the pain of data synchronization between the database, backend and frontend away! You don't have to think about it anymore.
  4. The Isobuild package is a unified build system for browser, server and mobile.

If you want to create a website or a mobile app with a backend in no time, with getting lots of functionality out of the box, meteor is a very interesting tool. If you want to have more control or connect to an existing backend, then meteor is probably less suitable.

You can watch this presentation I recently gave, to go along with the article.

Bandita Joarder on How Presence is Something You Can Learn

Bandita is one of the most amazing leaders in the technology arena.

She’s not just technical, but she also has business skills, and executive presence.

But she didn’t start out that way.

She had to learn presence from the school of hard knocks.   Many people think presence is something that either you have or you don’t.

Bandita proves otherwise.

Here is a guest post by Bandita Joarder on how presence is something you can learn:

Presence is Something You Can Learn

It’s a personal story.  It’s an empowering story.  It’s a story of a challenge and a change, and how learning the power of presence, helped Bandita move forward in her career.


Categories: Architecture, Programming

How Google Analytics helps you make better decisions for your apps

Android Developers Blog - Fri, 01/16/2015 - 01:12

Posted by Russell Ketchum, Lead Product Manager, Google Analytics for Mobile Apps

Knowing how your customers use your app is the foundation to keeping them happy and engaged. It’s important to track downloads and user ratings, but the key to building a successful business is using data to dive deeper into understanding the full acquisition funnel and what makes users stick around.

Google Analytics is the easiest way to understand more about what your users are doing inside your app on Google Play, while also simultaneously tracking your users across the web and other mobile platforms. To show how Google Analytics can help, we've created a new "Analyze" section on the Android Developers website for you to check out. We provide guidance on how to design a measurement plan and implement effective in-app analytics – and take advantage of features only available between Google Play and Google Analytics.

The Google Play Referral Flow in Analytics

Google Analytics for mobile apps provides a comprehensive view into your app’s full user lifecycle, including user acquisition, composition, in app behavior, and key conversions. Our Analytics Academy course on mobile app analytics is also a great resource to learn the fundamentals.

Eltsoft LLC, a foreign language learning and education app developer for Android, recognized early on how much impact Google Analytics would have on the company's ability to quickly improve its apps and meet user needs.

Analytics has really helped us to track the effectiveness of the changes to our app. I would say six months ago, that our success was a mystery. The data said we were doing well, but the whys were not clear. Therefore, we couldn’t replicate or push forward. But today, we understand what’s happening and can project our future success. We have not only the data, but can control certain variables allowing us to understand that data. - Jason Byrne, Eltsoft LLC

Here are some powerful tips to make the most of Google Analytics:

  1. Understand the full acquisition funnel
     Uniquely integrated with the Google Play Developer Console, Google Analytics gives you a comprehensive view of the Google Play Referral Flow. By linking Analytics to the Developer Console, you can track useful data on how users move through the acquisition flow from your marketing efforts to the Google Play store listing to the action of launching the app. If you find that a significant number of users browse your app in Google Play, but don’t install it, for example, you can then focus your efforts on improving your store listing.
  2. Unlock powerful insights on in-app purchases
     Monitoring in-app purchases in the Google Play Developer Console will show you the total revenue your app is generating, but it does not give you the full picture about your paying users. By instrumenting your app with the Google Analytics ecommerce tracking, you’ll get a fuller understanding of what paying users do inside your app. For example, you can find out which acquisition channels deliver users who stay engaged and go on to become the highest value users.
  3. Identify roadblocks and common paths with the Behavior Flow
     Understanding how users move through your app is best done with in-app analytics. With Google Analytics, you can easily spot if a significant percentage of users leave your app during a specific section. For example, if you see significant drop off on a certain level of your game, you may want to make that level easier, so that more users complete the level and progress through the game. Similarly, if you find users who complete a tutorial stay engaged with your app, you might put the tutorial front and center for first-time users.
  4. Segment your audience to find valuable insights
     Aggregated data can help you answer questions about overall trends in your app. If you want to unlock deeper insights about what drives your users’ behavior, you can slice and dice your data using segmentation, such as demographics, behavior, or install date. If something changes in one of your key metrics, segmentation can help you get to the root of the issue -- for example, was a recent app update unpopular with users from one geographic area, or were users with a certain device or carrier affected by a bug?
  5. Use custom data to measure what matters for your business
     Simply activating the Google Analytics library gives you many out-of-the-box metrics without additional work, such as daily and monthly active users, session duration, breakdowns by country, and many more variables. However, it’s likely that your app has many user actions or data types that are unique to it, which are critical to building an engaged user base. Google Analytics provides events, custom dimensions, and custom metrics so you can craft a measurement strategy that fits your app and business.
  6. No more one-size-fits-all ad strategy
     If you’re a developer using AdMob to monetize your app, you can now see all of your Analytics data in the AdMob dashboard. Running a successful app business is all about reaching the right user with the right ad or product at the right time. If you create specific user segments in Google Analytics, you can target each segment with different ad products. For example, try targeting past purchasers with in-app purchase ads, while monetizing users who don’t purchase through targeted advertising.

By measuring your app performance on a granular level, you will be able to make better decisions for your business. Successful developers build their measurement plan at the same time as building their app in order to set goals and track progress against key success metrics, but it’s never too late to start.

Choose the implementation that works best for your app to get started with Google Analytics today and find out more about what you can do in the new “Analyze” section of

Categories: Programming

Does Manual Testing Have a Future?

Making the Complex Simple - John Sonmez - Thu, 01/15/2015 - 16:00

In this video, I tackle whether or not manual testing has a future or whether someone who is a manual tester should look to move on to a different role.

The post Does Manual Testing Have a Future? appeared first on Simple Programmer.

Categories: Programming

Monitoring Akka with Kamon

Xebia Blog - Thu, 01/15/2015 - 13:49

Kamon is a framework for monitoring the health and performance of applications based on akka, the popular actor system framework often used with Scala. It provides good quick indicators, but also allows in-depth analysis.


Beyond just collecting local metrics per actor (e.g. message processing times and mailbox size), Kamon is unique in that it also monitors message flow between actors.

Essentially, Kamon introduces a TraceContext that is maintained across asynchronous calls: it uses AOP to pass the context along with messages. None of your own code needs to change.

Because of convenient integration modules for Spray/Play, a TraceContext can be automatically started when an HTTP request comes in.

If nothing else, this can be easily combined with the Logback converter shipped with Kamon: simply logging the token is of great use right out of the gate.
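
The idea of a context that travels along with asynchronous work, without the application code passing it explicitly, can be illustrated outside the JVM. The sketch below is a Python analogy using the standard contextvars module, not Kamon's API or its AspectJ-based mechanism; the names trace_token, handle_message and handle_request are invented for the example.

```python
import asyncio
import contextvars
import uuid

# Ambient trace token, analogous in spirit to a trace context
trace_token = contextvars.ContextVar("trace_token", default=None)


async def handle_message(name, log):
    # No token is passed in explicitly; it is read from the ambient context.
    log.append((name, trace_token.get()))


async def handle_request(log):
    trace_token.set(uuid.uuid4().hex)  # start a new trace for this request
    # Tasks created here inherit a copy of the current context.
    await asyncio.gather(
        asyncio.create_task(handle_message("parse", log)),
        asyncio.create_task(handle_message("forward", log)),
    )


async def main():
    log = []
    # Two concurrent requests, each with its own isolated token
    await asyncio.gather(handle_request(log), handle_request(log))
    return log


log = asyncio.run(main())
for step, token in log:
    print(token, step)
```

Each request produces two log entries that share one token, which is the property that makes a logged token useful for correlating the steps of a single request.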


Kamon does not come with a dashboard by itself (though some work in this direction is underway).

Instead, it provides 3 'backends' to post the data to (4 if you count the 'LogReporter' backend that just dumps some statistics into Slf4j): 2 on-line services (NewRelic and DataDog), and statsd (from Etsy).

statsd might seem like a hassle to set up, as it needs additional components such as grafana/graphite to actually browse the statistics. Kamon fortunately provides a correctly set-up docker container to get you up and running quickly. We unfortunately ran into some issues with the image uploaded to the Docker Hub Registry, but building it ourselves from the definition on github resolved most of these.


We found the source code of Kamon to be clear and to the point. While we're generally no great fans of AspectJ, for this purpose the technique seems to be quite well-suited.

'Monkey-patching' a core part of your stack like this can of course be dangerous, especially with respect to performance considerations. Unless you enable the heavier analyses (which are off by default and clearly marked), it seems this could be fairly light - but of course only real tests will tell.

Getting Started

Most Kamon modules are enabled by adding their respective akka extension. We found the quickest way to get started is to:

  • Add the Kamon dependencies to your project as described in the official getting started guide
  • Enable the Metrics and LogReporter extensions in your akka configuration
  • Start your application with AspectJ run-time weaving enabled. How to do this depends on how you start your application. We used the sbt-aspectj plugin.

Enabling AspectJ weaving can require a little bit of twiddling, but adding the LogReporter should give you quick feedback on whether you were successful: it should start periodically logging metrics information.

Next steps are:

  • Enabling Spray or Play plugins
  • Adding the trace token to your logging
  • Enabling other backends (e.g. statsd)
  • Adding custom application-specific metrics and trace points

Kamon looks like a healthy, useful tool that not only has great potential, but also provides some great quick wins.

The available documentation is of great quality, but some parts of the system are not so well covered. Luckily, the source code is very approachable.

It is clear the Kamon project is not very popular yet, judging by some of the rough edges we encountered. These, however, seem to be mostly superficial: the core ideas and implementation seem solid. We highly recommend taking a look.


Remco Beckers

Arnout Engelen

Exploring Akka Stream's TCP Back Pressure

Xebia Blog - Wed, 01/14/2015 - 15:48

Some years ago, when Reactive Streams lived in utopia, we got the assignment to build a high-volume message broker. A considerable amount of the code in the solution we delivered back then was dedicated to preventing this broker from being flooded with messages in case an endpoint became slow.

How would we have solved this problem today with the shiny new Akka Reactive Stream (experimental) implementation just within reach?

In this blog we explore Akka Streams in general and TCP Streams in particular. Moreover, we show how much more easily we can solve the challenge we faced back then using Streams.

A use-case for TCP Back Pressure

The high-volume message broker mentioned in the introduction basically did the following:

  • Read messages (from syslog) from a TCP socket
  • Parse the message
  • Forward the message to another system via a TCP connection

For optimal throughput multiple TCP connections were available, which allowed delivering messages to the endpoint system in parallel. The broker was supposed to handle about 4000 - 6000 messages per second. The following schema shows the noteworthy components and message flow:


Naturally we chose Akka as framework to implement this application. Our approach was to have an Actor for every TCP connection to the endpoint system. An incoming message was then forwarded to one of these connection Actors.

The biggest challenge was related to back pressure: how could we prevent our connection Actors from being flooded with messages in case the endpoint system slowed down or was not available? With 6000 messages per second an Actor's mailbox is flooded very quickly.

Another requirement was that message buffering had to be done by the client application, which was syslog. Syslog has excellent facilities for that. Durable mailboxes or the like were out of the question. Therefore, we had to find a way to pull only as many messages into our broker as it could deliver to the endpoint. In other words: provide our own back pressure implementation.
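
To make "pull only as many messages as can be delivered" concrete, here is a minimal Python sketch of back pressure with a bounded queue. It is an analogy only and has nothing to do with the original broker code: the names producer and slow_consumer, the buffer size of 5 and the 1 ms delay are all assumptions chosen for illustration.

```python
import asyncio


async def producer(queue, count, log):
    for i in range(count):
        await queue.put(i)  # suspends while the queue is full
        log.append(("put", i, queue.qsize()))
    await queue.put(None)  # sentinel: no more messages


async def slow_consumer(queue, delivered, delay):
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.sleep(delay)  # simulate a slow endpoint
        delivered.append(message)


async def main():
    queue = asyncio.Queue(maxsize=5)  # the only buffer in the system
    log, delivered = [], []
    await asyncio.gather(
        producer(queue, 50, log),
        slow_consumer(queue, delivered, 0.001),
    )
    return log, delivered


log, delivered = asyncio.run(main())
max_buffered = max(size for _, _, size in log)
print("delivered:", len(delivered), "max buffered:", max_buffered)
```

The producer never gets more than five messages ahead of the consumer, however slow the consumer is. The slowdown propagates upstream instead of piling up in an unbounded mailbox.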

A considerable amount of the code in the solution we delivered back then was dedicated to back pressure. During one of our recurring innovation days we tried to figure out how much easier the back pressure challenge would have been if Akka Streams had been available.

Akka Streams in a nutshell

In case you are new to Akka Streams, here is some basic information to help you understand the rest of the blog.

The core ingredients of a Reactive Stream consist of three building blocks:

  • A Source that produces some values
  • A Flow that performs some transformation of the elements produced by a Source
  • A Sink that consumes the transformed values of a Flow

Akka Streams provide a rich DSL through which transformation pipelines can be composed using the mentioned three building blocks.

A transformation pipeline executes asynchronously. For that to work it requires a so-called FlowMaterializer, which will execute every step of the pipeline. A FlowMaterializer uses Actors for the pipeline's execution, even though from a usage perspective you are unaware of that.

A basic transformation pipeline looks as follows:


  implicit val actorSystem = ActorSystem()
  implicit val materializer = FlowMaterializer()

  val numberReverserFlow: Flow[Int, String] = Flow[Int].map(_.toString.reverse)

  numberReverserFlow.runWith(Source(100 to 200), ForeachSink(println))

We first create a Flow that consumes Ints and transforms them into reversed Strings. For the Flow to run we call the runWith method with a Source and a Sink. After runWith is called, the pipeline starts executing asynchronously.

The exact same pipeline can be expressed in various ways, such as:

    //Use the via method on the Source to pass in the Flow
    Source(100 to 200).via(numberReverserFlow).to(ForeachSink(println)).run()

    //Directly call map on the Source.
    //The disadvantage of this approach is that the transformation logic cannot be re-used.
    Source(100 to 200).map(_.toString.reverse).to(ForeachSink(println)).run()
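
For readers who do not use Scala, the Source, Flow and Sink roles can be mimicked with plain Python generators. This is only a conceptual sketch: it runs synchronously and involves no materializer or actors, and the names source, number_reverser and run are made up for the example.

```python
def source(start, stop):
    """Source: produces the integers from start to stop, inclusive."""
    for n in range(start, stop + 1):
        yield n


def number_reverser(numbers):
    """Flow: turns each integer into its reversed string form."""
    for n in numbers:
        yield str(n)[::-1]


def run(stream, sink):
    """Sink driver: pulls every element through the pipeline and consumes it."""
    for element in stream:
        sink(element)


collected = []
run(number_reverser(source(100, 200)), collected.append)
print(collected[:3], "...", collected[-1])
```

Because generators are pull-based, nothing is produced until the sink asks for the next element, which is a very loose analogue of demand flowing upstream in a Reactive Stream.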

For more information about Akka Streams you might want to have a look at this Typesafe presentation.

A simple reverse proxy with Akka Streams

Let's move back to our initial quest. The first task we tried to accomplish was to create a stream that accepts data from an incoming TCP connection and forwards it to a single outgoing TCP connection. In that sense this stream was supposed to act as a typical reverse proxy that simply forwards traffic to another connection. The only remarkable quality compared to a traditional blocking/synchronous solution is that our stream operates asynchronously while preserving back-pressure.


implicit val system = ActorSystem("on-to-one-proxy")
implicit val materializer = FlowMaterializer()

val serverBinding = StreamTcp().bind(new InetSocketAddress("localhost", 6000))

val sink = ForeachSink[StreamTcp.IncomingConnection] { connection =>
      println(s"Client connected from: ${connection.remoteAddress}")
      connection.handleWith(StreamTcp().outgoingConnection(new InetSocketAddress("localhost", 7000)).flow)
}
val materializedServer =


First we create the mandatory instances every Akka reactive Stream requires, which is an ActorSystem and a FlowMaterializer. Then we create a server binding using the StreamTcp Extension that listens to incoming traffic on localhost:6000. With the ForeachSink[StreamTcp.IncomingConnection] we define how to handle the incoming data for every StreamTcp.IncomingConnection by passing a flow of type Flow[ByteString, ByteString]. This flow consumes ByteStrings of the IncomingConnection and produces a ByteString, which is the data that is sent back to the client.

In our case the flow of type Flow[ByteString, ByteString] is created by means of the StreamTcp().outgoingConnection(endpointAddress).flow. It forwards a ByteString to the given endpointAddress (here localhost:7000) and returns its response as a ByteString as well. This flow could also be used to perform some data transformations, like parsing a message.

Parallel reverse proxy with a Flow Graph

Forwarding a message from one connection to another will not meet our self-defined requirements. We need to be able to forward messages from a single incoming connection to a configurable number of outgoing connections.

Covering this use-case is slightly more complex. For it to work we make use of the flow graph DSL.

  import akka.util.ByteString

  private def parallelFlow(numberOfConnections:Int): Flow[ByteString, ByteString] = {
    PartialFlowGraph { implicit builder =>
      val balance = Balance[ByteString]
      val merge = Merge[ByteString]
      UndefinedSource("in") ~> balance

      1 to numberOfConnections map { _ =>
        balance ~> StreamTcp().outgoingConnection(new InetSocketAddress("localhost", 7000)).flow ~> merge
      }

      merge ~> UndefinedSink("out")
    } toFlow (UndefinedSource("in"), UndefinedSink("out"))
  }

We construct a flow graph that makes use of the junction vertices Balance and Merge, which allow us to fan out the stream to several other streams and fan them back in. For the number of parallel connections we want to support, we create a fan-out flow starting with a Balance vertex, followed by an OutgoingConnection flow, which is then merged with a Merge vertex.
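
The balance-then-merge shape can also be sketched in Python with asyncio. This is an analogy to the graph described above rather than a port of it: the function names connection and parallel_flow, the simulated latency and the use of a None sentinel to stop each worker are all assumptions made for the example.

```python
import asyncio


async def connection(worker_id, inbox, outbox, latency):
    """One simulated outgoing connection: take a message, 'send' it, emit the reply."""
    while True:
        message = await inbox.get()
        if message is None:  # sentinel: shut this worker down
            break
        await asyncio.sleep(latency)  # simulated endpoint latency
        await outbox.put((worker_id, message))


async def parallel_flow(messages, number_of_connections, latency=0.001):
    # Balance: a bounded queue hands each message to whichever worker is free.
    inbox = asyncio.Queue(maxsize=number_of_connections)
    # Merge: all workers write their results to one shared queue.
    outbox = asyncio.Queue()
    workers = [
        asyncio.create_task(connection(i, inbox, outbox, latency))
        for i in range(number_of_connections)
    ]
    for message in messages:
        await inbox.put(message)  # suspends when every worker is busy
    for _ in workers:
        await inbox.put(None)
    await asyncio.gather(*workers)
    return [outbox.get_nowait() for _ in range(outbox.qsize())]


results = asyncio.run(parallel_flow(list(range(40)), 4))
print(len(results), "messages via", len({w for w, _ in results}), "connections")
```

As with a merge junction, the results come back in completion order rather than input order, so any consumer that needs ordering has to restore it itself.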

From an API perspective we faced the challenge of how to connect this flow to our IncomingConnection. Almost all flow graph examples take a concrete Source and Sink implementation as starting point, whereas the IncomingConnection exposes neither a Source nor a Sink. It only accepts a complete flow as input. Consequently, we needed a way to abstract the Source and Sink, since our fan-out flow requires them.

The flow graph API offers the PartialFlowGraph class for that, which allows you to work with abstract Sources and Sinks (UndefinedSource and UndefinedSink). We needed quite some time to figure out how they work: simply declaring an UndefinedSource/Sink without a name won't work. It is essential that you give the UndefinedSource/Sink a name which is identical to the one used in the UndefinedSource/Sink passed to the toFlow method. A bit more documentation on this topic would help.

Once the fan-out flow is created, it can be passed to the handleWith method of the IncomingConnection:

val sink = ForeachSink[StreamTcp.IncomingConnection] { connection =>
      println(s"Client connected from: ${connection.remoteAddress}")
      val parallelConnections = 20
      connection.handleWith(parallelFlow(parallelConnections))
}

As a result, this implementation delivers all incoming messages to the endpoint system in parallel while still preserving back-pressure. Mission completed!

Testing the Application

To test our solution we wrote two helper applications:

  • A blocking client that pumps as many messages as possible into a socket connection to the parallel reverse proxy
  • A server that delays responses with a configurable latency in order to mimic a slow endpoint. The parallel reverse proxy forwards messages via one of its connections to this endpoint.

The following chart depicts how throughput increases with the number of connections. Due to the nondeterministic concurrent behavior there are some spikes in the results, but the trend shows a clear correlation between throughput and the number of connections:


End-to-end solution

The end-to-end solution can be found here.
By changing the numberOfConnections variable you can see the impact on performance yourself.

Check it out! ...and go with the flow ;-)

Information about TCP back pressure with Akka Streams

At the time of writing there was not much information available about Akka Streams, since it is one of the newest toys from the Typesafe factory. The following resources helped us get started:

Efficient Game Textures with Hardware Compression

Android Developers Blog - Tue, 01/13/2015 - 20:43

Posted by Shanee Nishry, Developer Advocate

As you may know, high resolution textures contribute to better graphics and a more impressive game experience. Adaptive Scalable Texture Compression (ASTC) helps solve many of the challenges involved, including reducing memory footprint and loading time, and can even improve performance and battery life.

If you have a lot of textures, you are probably already compressing them. Unfortunately, not all compression algorithms are made equal. PNG, JPG and other common formats are not GPU friendly. Some of the highest-quality algorithms today are proprietary and limited to certain GPUs. Until recently, the only broadly supported GPU accelerated formats were relatively primitive and produced poor results.

With the introduction of ASTC, a new compression technique invented by ARM and standardized by the Khronos group, we expect to see dramatic changes for the better. ASTC promises to be both high quality and broadly supported by future Android devices. But until devices with ASTC support become widely available, it’s important to understand the variety of legacy formats that exist today.

We will examine preferable compression formats which are supported on the GPU to help you reduce .apk size and loading times of your game.

Texture Compression

Popular compressed formats include PNG and JPG, which can’t be decoded directly by the GPU. As a consequence, they need to be decompressed before copying them to the GPU memory. Decompressing the textures takes time and leads to increased loading times.

A better option is to use hardware accelerated formats. These formats are lossy but have the advantage of being designed for the GPU.

This means they do not need to be decompressed before being copied to the GPU, which decreases loading times for the player and may even lead to increased performance due to hardware optimizations.

Hardware Accelerated Formats

Hardware accelerated formats have many benefits. As mentioned before, they help improve loading times and the runtime memory footprint.

Additionally, because these formats require less bandwidth and consume less energy, they help improve performance and battery life and reduce heating of the device.

There are two categories of hardware accelerated formats, standard and proprietary. This table shows the standard formats:

ETC1    Supported on all Android devices with OpenGL ES 2.0 and above. Does not support alpha channel.
ETC2    Requires OpenGL ES 3.0 and above.
ASTC    Higher quality than ETC1 and ETC2. Supported with the Android Extension Pack.

As you can see, with higher OpenGL support you gain access to better formats. There are proprietary formats to replace ETC1, delivering higher quality and alpha channel support. These are shown in the following table:

ATC     Available with Adreno GPU.
PVRTC   Available with a PowerVR GPU.
DXT1    S3 DXT1 texture compression. Supported on devices running Nvidia Tegra platform.
S3TC    S3 texture compression, nonspecific to DXT variant. Supported on devices running Nvidia Tegra platform.

That’s a lot of formats, revealing a different problem. How do you choose which format to use?

To best support all devices you need to create multiple apks using different texture formats. The Google Play developer console allows you to add multiple apks and will deliver the right one to the user based on their device. For more information check this page.

When a device only supports OpenGL ES 2.0, it is recommended to use a proprietary format to get the best results possible. This means making an apk for each type of hardware.

On devices with access to OpenGL ES 3.0 you can use ETC2. The GL_COMPRESSED_RGBA8_ETC2_EAC format is an improved version of ETC1 with added alpha support.

The best case is when the device supports the Android Extension Pack. Then you should use the ASTC format which has better quality and is more efficient than the other formats.
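The decision rules above can be sketched as a small function. This is only an illustration in Python, not code from the Android SDK: choose_texture_format is a hypothetical helper, and it assumes you have already read the OpenGL ES version and the extension strings from the device. The extension names are the standard OpenGL ES extension strings for each format.

```python
# Extensions that indicate ASTC support (directly, or via the Android Extension Pack).
ASTC_EXTENSIONS = {
    "GL_KHR_texture_compression_astc_ldr",
    "GL_ANDROID_extension_pack_es31a",
}

# Proprietary formats, checked only when ETC2 and ASTC are not available.
PROPRIETARY_EXTENSIONS = [
    ("GL_IMG_texture_compression_pvrtc", "PVRTC"),
    ("GL_AMD_compressed_ATC_texture", "ATC"),
    ("GL_EXT_texture_compression_s3tc", "S3TC"),
    ("GL_EXT_texture_compression_dxt1", "DXT1"),
]


def choose_texture_format(gles_version, extensions):
    """Pick a texture format from a (major, minor) GLES version and extension strings."""
    extensions = set(extensions)
    if extensions & ASTC_EXTENSIONS:
        return "ASTC"                      # best case: ASTC is available
    if gles_version >= (3, 0):
        return "ETC2"                      # standard format with alpha support
    for extension, name in PROPRIETARY_EXTENSIONS:
        if extension in extensions:
            return name                    # OpenGL ES 2.0: prefer a proprietary format
    return "ETC1"                          # fallback supported from OpenGL ES 2.0 upwards


if __name__ == "__main__":
    print(choose_texture_format((2, 0), ["GL_AMD_compressed_ATC_texture"]))
```

In a real project the same decision is usually made at packaging time rather than at runtime, by building one apk per texture format as described above.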

Adaptive Scalable Texture Compression (ASTC)

The Android Extension Pack has ASTC as a standard format, removing the need to have different formats for different devices.

In addition to being supported on modern hardware, ASTC also offers improved quality over other GPU formats by having full alpha support and better quality preservation.

ASTC is a block-based texture compression algorithm developed by ARM. It offers multiple block footprints and bitrate options to lower the size of the final texture. The larger the block footprint, the smaller the final file, but the more quality may be lost.
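To see how the block footprint drives the size, here is a rough sketch in Python. It relies on the fact that every ASTC block is stored in 128 bits regardless of its footprint. The function names and the 512x512 texture are assumptions made for illustration, and the result covers the block data only (no file header, no mipmaps), so real file sizes can differ somewhat.

```python
import math

ASTC_BLOCK_BYTES = 16  # every ASTC block occupies 128 bits, whatever its footprint


def astc_size_bytes(width, height, block_w, block_h):
    """Size of the ASTC block data for one texture level, in bytes."""
    blocks_x = math.ceil(width / block_w)   # partial blocks at the edge still cost a full block
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


def astc_bits_per_pixel(block_w, block_h):
    """Bitrate implied by a block footprint."""
    return ASTC_BLOCK_BYTES * 8 / (block_w * block_h)


if __name__ == "__main__":
    # Hypothetical 512x512 texture.
    for footprint in [(4, 4), (6, 6), (8, 8)]:
        size = astc_size_bytes(512, 512, *footprint)
        print(footprint, size, "bytes", astc_bits_per_pixel(*footprint), "bits per pixel")
```

A 4x4 footprint works out to 8 bits per pixel and an 8x8 footprint to 2 bits per pixel, which is why larger footprints give smaller files.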

Note that some images compress better than others. Images with similar neighboring pixels tend to have better quality compared to images with vastly different neighboring pixels.

Let’s examine a texture to better understand ASTC:

This bitmap is 1.1MB uncompressed and 299KB when compressed as PNG.

Compressing the Android jellybean jar texture into ASTC through the Mali GPU Texture Compression Tool yields the following results.

Block Footprint               4x4       6x6       8x8
Memory                        262KB     119KB     70KB
Image Output                  [image]   [image]   [image]
Difference Map                [image]   [image]   [image]
5x Enhanced Difference Map    [image]   [image]   [image]

As you can see, even at the highest-quality block footprint (4x4), ASTC is already smaller than the PNG (262KB versus 299KB). Unlike PNG, this gain stays even after copying the image to the GPU.

The tradeoff comes in the detail, so it is important to carefully examine textures when compressing them to see how much compression is acceptable.


Using hardware accelerated textures in your games will help you reduce the size of your .apk, runtime memory use as well as loading times.

Improve performance on a wider range of devices by uploading multiple apks with different GPU texture formats and declaring the texture type in the AndroidManifest.xml.

If you are aiming for high end devices, make sure to use ASTC which is included in the Android Extension Pack.

Categories: Programming

Python: Counter – ValueError: too many values to unpack

Mark Needham - Tue, 01/13/2015 - 00:16

I recently came across Python’s Counter tool which makes it really easy to count the number of occurrences of items in a list.

In my case I was trying to work out how many times words occurred in a corpus so I had something like the following:

>>> from collections import Counter
>>> counter = Counter(["word1", "word2", "word3", "word1"])
>>> print counter
Counter({'word1': 2, 'word3': 1, 'word2': 1})

I wanted to write a for loop to iterate over the counter and print the (key, value) pairs and started with the following:

>>> for key, value in counter:
...   print key, value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack

I’m not sure why I expected this to work, but since Counter is a subclass of dict, iterating over it yields only the keys. We need to call iteritems to get an iterator of (key, value) pairs instead.

The following does the job:

>>> for key, value in counter.iteritems():
...   print key, value
word1 2
word3 1
word2 1
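For reference, here is a small sketch of the same thing on Python 3, where iteritems no longer exists and items() is used instead. The variable names pairs and top are just for illustration. The first loop shows why the error occurs: each key is a five-character string, which cannot be unpacked into two names.

```python
from collections import Counter

counter = Counter(["word1", "word2", "word3", "word1"])

# Iterating over a Counter (like any dict) yields only the keys, so Python
# tries to unpack the string "word1" into key and value, which fails.
try:
    for key, value in counter:
        pass
except ValueError as error:
    print("unpacking keys failed:", error)

# items() yields (key, count) pairs; sorting makes the order predictable.
pairs = sorted(counter.items())
for key, value in pairs:
    print(key, value)

# most_common(n) returns the n most frequent entries as (key, count) pairs.
top = counter.most_common(1)
print(top)
```

If the goal is simply to list words by frequency, most_common() without an argument returns all pairs in descending order of count.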

Hopefully future Mark will remember this!

Categories: Programming