Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Feed aggregator

Try to Be 50% Wrong from Both Sides

NOOP.NL - Jurgen Appelo - Mon, 07/07/2014 - 14:06
50% Wrong

When extremists on both sides claim that you’re 50% wrong, you can feel confident that you’re probably right where you should be.

Vegetables are healthy, but I’ve never heard someone suggest that eating only vegetables is a wise approach to long-term health

The post Try to Be 50% Wrong from Both Sides appeared first on NOOP.NL.

Categories: Project Management

R/plyr: ddply ‚Äď Error in vector(type, length) : vector: cannot make a vector of mode ‚Äėclosure‚Äô.

Mark Needham - Mon, 07/07/2014 - 07:07

In my continued playing around with plyr’s ddply function I was trying to group a data frame by one of its columns and return a count of the number of rows with specific values and ran into a strange (to me) error message.

I had a data frame:

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 
df = data.frame(n, s, b)

And wanted to group and count on column ‘b’ so I’d get back a count of 2 for TRUE and 1 for FALSE. I wrote this code:

ddply(df, "b", function(x) { 
  countr <- length(x$n) 
  data.frame(count = count) 
})

which when evaluated gave the following error:

Error in vector(type, length) : 
  vector: cannot make a vector of mode 'closure'.

It took me quite a while to realise that I’d just made a typo in assigned the count to a variable called ‘countr’ instead of ‘count’.

As a result of that typo I think the R compiler was trying to find a variable called ‘count’ somwhere else in the lexical scope but was unable to. If I’d defined the variable ‘count’ outside the call to ddply function then my typo wouldn’t have resulted in an error but rather an unexpected resulte.g.

> count = 10
> ddply(df, "b", function(x) { 
+   countr <- length(x$n) 
+   data.frame(count = count) 
+ })
      b count
1 FALSE     4
2  TRUE     4

Once I spotted the typo and fixed it things worked as expected:

> ddply(df, "b", function(x) { 
+   count <- length(x$n) 
+   data.frame(count = count) 
+ })
      b count
1 FALSE     1
2  TRUE     2
Categories: Programming

Quote of the Day

Herding Cats - Glen Alleman - Mon, 07/07/2014 - 07:02

Movement without direction will create a hole in the ground ‚ÄĒ¬†Sophia Bedford-Pierce

Categories: Project Management

SPaMCAST 297 ‚Äď IFPUG Function Points

www.spamcast.net

http://www.spamcast.net

 

Listen to SPaMCAST 297 now!

SPaMCAST 297 features our essay on IPFUG Function Points. IFPUG Function Points are a measure of the functionality delivered by the project or application being counted based on a set of rules documented in the IFPUG Counting Practices Manual (CPM). The measure of delivered functionality is a proxy for size which can be used in estimating and measuring work. An analogy for function points is the measure of the number of square feet (or square meters) of a house. Want more?  Listen to SPaMCAST 297 now!

Next week we will feature our interview with Brian Federici.  We discussed working in an environment with a nearly continuous delivery model from the point of view of a practitioner. Does what sounds good in theory really work when it is implemented?

Upcoming Events

Upcoming DCG Webinars:

July 24 11:30 EDT – The Impact of Cognitive Bias On Teams
Check these out at www.davidconsultinggroup.com

I will be attending Agile 2014 in Orlando, July 28 through August 1, 2014.  It would be great to get together with SPaMCAST listeners, let me know if you are attending. http://agile2014.agilealliance.org/

I will be presenting at the International Conference on Software Quality and Test Management in San Diego, CA on October 1.  I HAVE A PROMO CODE!  Send me an email and I will share.

http://www.pnsqc.org/international-conference-software-quality-test-management-2014/

I will be presenting at the North East Quality Council 60th Conference October 21st and 22nd in Springfield, MA.

http://www.neqc.org/conference/60/location.asp

More on all of these great events in the near future! I look forward to seeing all SPaMCAST readers and listeners that attend these great events!

The Software Process and Measurement Cast has a sponsor.

As many you know I do at least one webinar for the IT Metrics and Productivity Institute (ITMPI) every year. The ITMPI provides a great service to the IT profession. ITMPI’s mission is to pull together the expertise and educational efforts of the world’s leading IT thought leaders and to create a single online destination where IT practitioners and executives can meet all of their educational and professional development needs. The ITMPI offers a premium membership that gives members unlimited free access to 400 PDU accredited webinar recordings, and waives the PDU processing fees on all live and recorded webinars. The Software Process and Measurement Cast some support if you sign up here. All the revenue our sponsorship generates goes for bandwidth, hosting and new cool equipment to create more and better content for you. Support the SPaMCAST and learn from the ITMPI.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here.

Available in English and Chinese.


Categories: Process Management

SPaMCAST 297 ‚Äď IFPUG Function Points

Software Process and Measurement Cast - Sun, 07/06/2014 - 22:00

SPaMCAST 297 features our essay on IPFUG Function Points. IFPUG Function Points are a measure of the functionality delivered by the project or application being counted based on a set of rules documented in the IFPUG Counting Practices Manual (CPM). The measure of delivered functionality is a proxy for size which can be used in estimating and measuring work. An analogy for function points is the measure of the number of square feet (or square meters) of a house.

Next week we will feature our interview with Brian Federici.  We discussed working in an environment with a nearly continuous delivery model from the point of view of a practitioner. Does what sounds good in theory really work when it is implemented?

Upcoming Events

Upcoming DCG Webinars:

July 24 11:30 EDT - The Impact of Cognitive Bias On Teams
Check these out at www.davidconsultinggroup.com

 

I will be attending Agile 2014 in Orlando, July 28 through August 1, 2014.  It would be great to get together with SPaMCAST listeners, let me know if you are attending. http://agile2014.agilealliance.org/

I will be presenting at the International Conference on Software Quality and Test Management in San Diego, CA on October 1.  I HAVE A PROMO CODE!  Send me an email and I will share.
http://www.pnsqc.org/international-conference-software-quality-test-management-2014/

I will be presenting at the North East Quality Council 60th Conference October 21st and 22nd in Springfield, MA.
http://www.neqc.org/conference/60/location.asp

More on all of these great events in the near future! I look forward to seeing all SPaMCAST readers and listeners that attend these great events!

The Software Process and Measurement Cast has a sponsor.

As many you know I do at least one webinar for the IT Metrics and Productivity Institute (ITMPI) every year. The ITMPI provides a great service to the IT profession. ITMPI's mission is to pull together the expertise and educational efforts of the world's leading IT thought leaders and to create a single online destination where IT practitioners and executives can meet all of their educational and professional development needs. The ITMPI offers a premium membership that gives members unlimited free access to 400 PDU accredited webinar recordings, and waives the PDU processing fees on all live and recorded webinars. The Software Process and Measurement Cast some support if you sign up here. All the revenue our sponsorship generates goes for bandwidth, hosting and new cool equipment to create more and better content for you. Support the SPaMCAST and learn from the ITMPI.

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques co-authored by Murali Chematuri and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: "This book will prove that software projects should not be a tedious process, neither for you or your team." Support SPaMCAST by buying the book here.

Available in English and Chinese.

Categories: Process Management

I teach people how to draw pictures

Coding the Architecture - Simon Brown - Sun, 07/06/2014 - 11:55

Regular readers will know that I'm a big fan of pictures, especially as a mechanism for communicating the structure of software systems. To this end, I sometimes introduce myself as somebody who teaches people how to draw pictures. This is often said with a smile, because on the face of it this is exactly what I appear to be doing. But there's less truth to this than there initially appears.

Something that we do on the full 2-day training course and the 1-day sketching workshop is to get people drawing some pictures to communicate the structure of a software system. This is either a solution to the financial risk system or their own software. Over the years, I've done this for thousands of people and pretty much everybody struggles to draw something that effectively communicates the software. The same challenges crop up again and again.

The challenges of drawing pictures

The diagrams that are produced during the initial 90-minute timebox are usually fairly confusing and exhibit problems ranging from a lack of clarity around the abstractions and notation used through to diagrams that show too little or too much detail. This is an aside, but the majority of the diagrams are informal "boxes and lines" rather than UML.

After some reviews and feedback, I introduce people to my C4 model. In a nutshell, it's all about thinking of a software system as being made up of containers (web applications, databases, mobile devices, etc), each of which contains components, which in turn are made up of classes. There are some variations of this depending on the technology you're using, but that's basically it. With this structure in mind, you can then draw a diagram at each level in turn, which leads to a system context diagram, a container diagram, one or more component diagrams and (optionally) some class diagrams. If you want a slightly more detailed introduction to C4, take a look at Simple sketches for diagramming your software architecture.

The C4 model

Back to the workshops, and iteration two is another 90-minute timebox in which teams can redraw their diagrams. I don't actually mandate that people should follow my C4 approach, but most do, and what happens next is really interesting. The conversations change. In stark contrast to the first timebox, the conversations are much more technical. Gone are the discussions about what to include on each diagram and instead the focus is on the technical aspects of the software. Instead of "what should we show on this diagram?", the questions I hear are more along the lines of "what are the responsibilities of this thing?" and "how does this thing talk to that thing?". In essence, the discussion returns to be about software design, which is exactly where it should be.

There are a number of ways in which you can communicate the architecture of a software system, so I don't want to pitch my C4 approach as "the one true way". What I do want to say is that providing even a little guidance in this area can go a long way. I ask people for their thoughts after the second timebox and the consensus is that the diagrams are easier to draw and comprehend because of the "framework" (their words, not mine) that's been provided. All I'm doing is providing some constraints that people might want to work within, and this frees them up to focus on what's important again.

Back to that comment about introducing myself as somebody who teaches people how to draw pictures. It turns out that this isn't actually what I do. I don't really care so much *how* people draw pictures. What I do care about is that they use a well-understood, consistent and meaningful set of *abstractions* to think about and therefore describe their software. It turns out that modelling our software isn't dead after all. ;-)

Categories: Architecture

Distributed big balls of mud

Coding the Architecture - Simon Brown - Sun, 07/06/2014 - 10:27

If you want evidence that the software development industry is susceptible to fashion, just go and take a look at all of the hype around microservices. It's everywhere! For some people microservices is "the next big thing", whereas for others it's simply a lightweight evolution of the big SOAP service-oriented architectures that we saw 10 years ago "done right". I do like a lot of what the current microservice architectures are doing, but it's by no means a silver bullet. Okay, I know that sounds obvious, but I think many people are jumping on them for the wrong reason.

From monoliths to microservices

I often show this slide in my conference talks, and I've blogged about this before, but basically there are different ways to build software systems. On the one side we have traditional monolithic systems, where everything is bundled up inside a single deployable unit. This is probably where most of the industry is. Caveats apply, but monoliths can be built quickly and are easy to deploy, but they provide limited agility because even tiny changes require a full redeployment. We also know that monoliths often end up looking like a big ball of mud because of the way that software often evolves over time. For example, many monolithic systems are built using a layered architecture, and it's relatively easy for layered architectures to be abused (e.g. skipping "around" a service to call the repository/data access layer directly).

On the other side we have service-based architectures, where a software system is made up of many separately deployable services. Again, caveats apply but, if done well, service-based architectures buy you a lot of flexibility and agility because each service can be developed, tested, deployed, scaled, upgraded and rewritten separately, especially if the services are decoupled via asynchronous messaging. The downside is increased complexity because your software system now has many more moving parts than a monolith. As Robert says, the complexity is still there, you're just moving it somewhere else.

There is, of course, a mid-ground here. We can build monolithic systems that are made up of in-process components, each of which has an explicit well-defined interface and set of responsibilities. This is old-school component-based design that talks about high cohesion and low coupling, but I usually sense some hesitation when I talk about it. And this seems odd to me. Before I explain why, let me quote something from a blog post that I read earlier this morning about the rationale behind a team adopting a microservices approach. When we started building Karma, we decided to split the project into two main parts: the backend API, and the frontend application. The backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this API. Along the way we noticed that if the whole backend API is monolithic it doesn't work very well because everything gets entangled.

The blog post also mentions scaling, versioning and multiple languages/frameworks as other reasons to choose microservices. Again, there are no silver bullets here, everything is a trade-off. Anyway, "everything getting entangled" is not a reason to switch from monoliths to microservices. If you're building a monolithic system and it's turning into a big ball of mud, perhaps you should consider whether you're taking enough care of your software architecture. Do you really understand what the core structural abstractions are in your software? Are their interfaces and responsibilities clear too? If not, why do you think moving to a microservices architecture will help? Sure, the physical separation of services will force you to not take some shortcuts, but you can achieve the same separation between components in a monolith. A little design thinking and an architecturally-evident coding style will help to achieve this without the baggage of going distributed.

Many of the teams I've spoken to are building monolithic systems and don't want to look at component-based design. The mid-ground seems to be a hard-sell. I ran a software architecture sketching workshop with a team earlier this year where we diagrammed one of their software systems. The diagram started as a strictly layered architecture (presentation, business services, data access) with all arrows pointing downwards and each layer only ever calling the layer directly beneath it. The code told a different story though and the eventual diagram didn't look so neat anymore. We discussed how adopting a package by component approach could fix some of these problems, but the response was, "meh, we like building software using layers".

It seems as if teams are jumping on microservices because they're sexy, but the design thinking and decomposition strategy required to create a good microservices architecture are the same as those needed to create a well structured monolith. If teams find it hard to create a well structured monolith, I don't rate their chances of creating a well structured microservices architecture. As Michael Feathers recently said, "There's a bit of overhead involved in implementing each microservice. If they ever become as easy to create as classes, people will have a freer hand to create trouble - hulking monoliths at a different scale.". I agree. A world of distributed big balls of mud worries me.

Categories: Architecture

Stakeholders Don’t Care How Hard A Story Is!

The language they understand is months and dollars (or any other type of currency).

The language they understand is months and dollars (or any other type of currency).

Clients, stakeholders and pointy haired bosses really do care about how long a project will take, how much the project will cost and by extension the effort required to deliver the project. What clients, stakeholders and bosses don’t care about is how much the team needs to think or the complexity of the stories or features, except as those attributes effect the duration, cost and effort.  The language they understand is months and dollars (or any other type of currency). Teams however, need to speak in terms of complexity and code (programming languages). Story points are an attempt to create a common understanding.

When a team uses story points, t-shirt or other relative sizing techniques, they hash a myriad of factors together.  When a team decomposes problem they have to assess complexity, capability and capacity in order to determine how long a story, feature or task will take (and therefor cost).  The number of moving parts in this mental algebra makes the outcome variable.  That variability generates debates on how rational it is to estimate at this level that we will not tackle in this essay.  When the team translates their individual perceptions (that include complexity, capacity and capability) into story points or other relative sizing techniques, they are attempting to share an understanding with stakeholders of how long and at what price (with a pinch of variability).  For example, if a team using t-shirt sizing and two week sprints indicate they can deliver 1 large story and 2 two medium or 1 medium and 5 small stories based on past performance, it would be fairly easy to determine when the items on the backlog will be delivered and a fair approximation on the number of sprints (aka effort, which equates to cost).

Clients, stakeholders and bosses are not interested in the t-shirt sizes or the number of story points, but they do care about whether a feature will take a long time to build or cost a lot. The process of sizing helps technical teams translate how hard a story or a project is into words that clients, stakeholders and bosses can understand intimately.


Categories: Process Management

Celebration And Reflection Are Sides Of The Same Coin

Happy 4th of July.

Happy 4th of July.

Whether on a personal or a project level, the calendar is the most important measuring stick used to gauge progress because it is the measure everyone understands and can follow.  We mark significant milestones with a celebration. For example, birthdays are always a milestone and every country, person or project has a birthday (whether they celebrate them or not is another story). Whether the celebration is annually or monthly (anyone with a newborn has celebrated birthdays monthly), isn’t as material as using the calendar milestone to create space for celebration and reflection.  At milestones like birthdays we should remember and reflect on where we started, the path that we have taken and our goals.

In Agile projects we build in time for reflection called retrospectives. While in classic waterfall projects we have post-implementation reviews.  Regardless of which project management camp you find yourself in, introspection and the process adjustments generated from those changes are valuable. What is rarer is personal reflection. Sure we create resolutions on New Year’s Day, but how often do we review our progress against those goals? Milestones represent chance not only to celebrate, but, as importantly, a chance to step back and take time for introspection. For example, the team at the Software Process and Measurement Podcast and blog (yes there is a team) spent a bit of time during a recent retreat reflecting on a number of stalled projects and how we were going to get them going.

Milestone evoke celebration and introspection. Celebration is the easy part, what is typically harder is to reflect on how we met our goals. In some cases to we accept accomplishment with asking whether means justify the ends.  Early in my career I saw projects that, even though they delivered great outcomes, left project teams in shambles, or some cases, used creative accounting to hide budget overages.  In the short run the celebration was exciting, however in the long run was cost worth the benefit?  I think not. Tellingly, I do not know anyone from that stage of my career that is still in the business. A better approach is when the  markers that show that time is passing are a signal to celebrate and find time to reflect and renew.  In projects, those milestones include sprint reviews, demonstrations, sprint planning or classic phase gates. Each of those milestones generate feedback that helps teams and individuals change direction if needed. Feedback provides the impetus for change. It is usually very easy to mark those events by letting friends, family and stakeholders know what has been accomplished since the last milestone.

Everyone likes a celebration, whether it is because of fireworks, a piece of cake or the demonstration of some tasty bit of promised functionality.  After the celebration step back and reflect on the path that has been taken.  Seek out feedback, just like a retrospective and make changes where needed to ensure that when you reflect back at the next milestone that you do not have to think about what you should have done.


Categories: Process Management

Create the smallest possible Docker container

Xebia Blog - Fri, 07/04/2014 - 21:59

When you are playing around with Docker, you quickly notice that you are downloading large numbers of megabytes as you use preconfigured containers. A simple Ubuntu container easily exceeds 200MB and as software is installed on top of it, the size increases. In some use cases, you do not need everything that comes with Ubuntu. For example, if you want to run a simple web server, written in Go, there is no need for any tool around that at all.

I have been searching for the smallest possible container to start with and found this one:

docker pull scratch

The scratch image is perfect. Literally perfect! It is elegant, small and fast. It does not contain any bugs, security leaks, slow code or technical debt. And that is because it is basically empty. Except for a bit of metadata added by Docker. In fact, you could have created this scratch image yourself with this command as described in the Docker documentation:

tar cv --files-from /dev/null | docker import - scratch

 

So that is it, the smallest possible Docker image. End of blog post!

... or is there something more we can say about this? For example, how do you use the scratch base image? It turns out this brings some challenges of its own.

Creating content for the scratch image

What can we run on an empty base image? An executable without dependencies. Do you have executables without dependencies?

I used to write code in Python, Java and JavaScript. Each of these languages/platforms require a runtime installed. Recently, I started looking into the Go (or GoLang if you prefer) platform. And it seems (spoiler alert) like Go is statically  linked. So I tried compiling a simple web server saying Hello World and running it within the scratch container. Here is the code for the Hello World web server:

package main

import (
	"fmt"
	"net/http"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello World from Go in minimal Docker container")
}

func main() {
	http.HandleFunc("/", helloHandler)

	fmt.Println("Started, serving at 8080")
	err := http.ListenAndServe(":8080", nil)
	if err != nil {
		panic("ListenAndServe: " + err.Error())
	}
}

 

Obviously, I cannot compile my webserver inside the scratch container as there is no Go compiler in it. And as I am working on a Mac, I also cannot compile a Linux binary just like that. (Actually, it is possible to cross-compile GoLang sources to different platforms, but that is material for another blog post)

So I first need a Docker container with a Go compiler. Let's start simple:

docker run -ti google/golang /bin/bash

 

Inside this container, I can build the Go web server, which I have committed in a GitHub repository:

go get github.com/adriaandejonge/helloworld

 

The go get command is a variant of the go build command that allows fetching and building remote dependencies. You can start the resulting executable with:

$GOPATH/bin/helloworld

 

This works. But it is not what we want. We need the hello world container to run inside the scratch container. So, in fact, we need a Dockerfile saying:

FROM scratch
ADD bin/helloworld /helloworld
CMD ["/helloworld"]

 

and then start that. Unfortunately, the way we started the google/golang container, there is no way to build this Dockerfile. So first, we need a way to access Docker from within the container.

Calling Docker from within Docker

When you use Docker, sooner or later you run into the need to control Docker from within Docker. There are multiple ways to accomplish this. You could use recursion and run Docker inside Docker. However, that seems overly complex and again leads to large containers. You can also provide access to the Docker server outside the instance with a few additional command line options:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(which docker):$(which docker) -ti google/golang /bin/bash

 

Before you continue, please rerun the Go compiler, as Docker forgot our previous compilation during the restart:

go get github.com/adriaandejonge/helloworld

 

When starting the container, the -v flag creates a volume inside the Docker container and allows you to provide a file from the Docker machine as input. The /var/run/docker.sock is the Unix socket that allows access to the Docker server. The $(which docker) part is a clever way to provide the path for the docker executable inside the container without hardcoding it. However, be careful when you use this command on an Apple when using boot2docker. If the docker executable is installed in a different location than it is installed in boot2docker's virtual machine, this results in a mismatch. It will be the executable inside the boot2docker virtual server that gets inserted into the container. So you may want to replace $(which docker) with /usr/local/bin/docker which is hardcoded. Similarly, if you run a different system, there is a chance that the /var/run/docker.sock has a different location and you need to adjust it accordingly.

Now you can use the Dockerfile inside the google/golang container in the $GOPATH directory, which points to /gopath in this example. Actually, I already checked this Dockerfile into GitHub. So you can copy it from the Go build directory to the desired location like this:

cp $GOPATH/src/github.com/adriaandejonge/helloworld/Dockerfile $GOPATH

 

You need to copy this as the compiled binary is now located in $GOPATH/bin and it is not possible to include files from parent directories when building a Dockerfile. So after copying, the next step is:

docker build -t adejonge/helloworld $GOPATH

 

And if all goes, well, Docker responds with something like:

Successfully built 6ff3fd5a381d

 

Which allows you to run the container:

docker run -ti --name hellobroken adejonge/helloworld

 

But unfortunately, now Docker responds with:

2014/07/02 17:06:48 no such file or directory

 

So what is going on? We have a statically linked executable inside a scratch container. Did we make a mistake?

As it turns out, Go does not statically link libraries. Or at least not all libraries. Under Linux, we can see the dynamically linked libraries for an executable with the ldd command:

ldd $GOPATH/bin/helloworld 

 

Which responds with:

linux-vdso.so.1 => (0x00007fff039fe000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f61df30f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f61def84000)
/lib64/ld-linux-x86-64.so.2 (0x00007f61df530000)

 

So before we can run the Hello World webserver, we need to tell the Go compiler to actually do static linking.

Creating statically linked executables in Go

In order to create statically linked executables, we need to tell Go to use the cgo compiler rather than the go compiler. The command to do so is:

CGO_ENABLED=0 go get -a -ldflags '-s' github.com/adriaandejonge/helloworld

 

The CGO_ENABLED environment variable tells Go to use the cgo compiler rather than the go compiler. The -a flag tells Go to rebuild all dependencies. Otherwise you still end up with dynamically linked dependencies. And finally the -ldflags '-s' flag is a nice extra. It reduces the file size of the resulting executable by roughly 50%. You can also do this without the cgo compiler. The size reduction is a result from removing debug information.

Just to be sure, rerun the ldd command.

ldd $GOPATH/bin/helloworld 

 

It should now respond with:

not a dynamic executable

 

You can also rerun the steps for creating the Docker container around the executable from scratch:

docker build -t adejonge/helloworld $GOPATH

 

And if all goes well, Docker responds with something like:

Successfully built 6ff3fd5a381d

 

Which allows you to run the container:

docker run -ti --name helloworld adejonge/helloworld

 

And this time it should respond with:

Started, serving at 8080

 

Until so far, there were many manual steps and there is a lot of room for error. Let's exit from the google/golang container and continue from the surrounding machine:

<Press Ctrl-C>
exit

 

You can check the existence or absence of containers and images with:

docker ps -a
docker images -a

 

And you can do some cleaning of Docker with:

docker rm -f helloworld
docker rmi -f adejonge/helloworld

 

Creating a Docker container that creates a Docker container

The steps we took so far, we can also record in a Dockerfile and have Docker do the work for us:

FROM google/golang
RUN CGO_ENABLED=0 go get -a -ldflags '-s' github.com/adriaandejonge/helloworld
RUN cp /gopath/src/github.com/adriaandejonge/helloworld/Dockerfile /gopath
CMD docker build -t adejonge/helloworld gopath

 

I checked this Dockerfile into a separate GitHub repository called adriaandejonge/hellobuild. It can be built with this command:

docker build -t adejonge/hellobuild github.com/adriaandejonge/hellobuild

 

Providing the  -t flag names the image as adejonge/hellobuild and implicitly tags it as latest. These names make it easier for you to remove the image later on. Next,  you can create a container from this image while providing the flags that you have seen earlier in this post:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(which docker):$(which docker) -ti --name hellobuild adejonge/hellobuild

 

Providing the --name hellobuild flag makes it easier to remove the container after running. In fact, you can do so right away, because after running this command, you already created the adejonge/helloworld image:

docker rm -f hellobuild
docker rmi -f adejonge/hellobuild

 

And now you can start a new container named helloworld based on the adejonge/helloworld image as you have done before:

docker run -ti --name helloworld adejonge/helloworld

 

Because all these steps are run from the same command line, without opening a bash shell inside a Docker container, you can add these steps to a bash script and run it automatically. For your convenience, I have added these bash scripts to the hellobuild GitHub repository.

Also, if you want to try the smallest possible Docker container running a Hello World web server without following all the steps described in this blog post, you can also use the pre-built image that I checked into the Docker Hub repository:

docker pull adejonge/helloworld

 

With docker images -a you can see that the size is 3.6MB. Of course, you can make it even smaller if you manage to create an executable that is smaller than the web server in Go that I wrote. In C or Assembly you may be able to do so. However, you can never make it smaller than the scratch image.

Stuff The Internet Says On Scalability For July 4th, 2014

Hey, it's HighScalability time:


Beauty is everywhere. Household dust magnified 22 million times.
  • Let's play a game of guess the company. They have: >100 billion searches per month; > 60 trillion known URLs; > 50 billion facts in knowledge graph; > 100 hours of video uploaded every minute; > 2 billion containers; > 6 trillion Cloud Datastore ops/month. Who is it? Why it's Google, of course. 
  • Billions of events every day: Twitter. One billion active users: Android.
  • Quotable quotes:
    • PeterGriffin: I don't know why the author called this "Multi-process architectures suck :(" when he really meant "I suck at multi-process architectures :("
    • @khrabrov: Experienced startup engineers are looking for a full-stack Business Guy to be CEO, COO, PM, marketer, account manager, HR, and receptionist.
    • @PatrickMcFadin: 30x perf over #hadoop by running #spark over #cassandra The crowd was stunned. 
    • @jcoglan: A programmer is someone who can simultaneously entertain the ideas that tight coupling is bad and fridges should be connected to the 'net
    • @BenedictEvans: Consumers spend more on apps (~$20bn run rate) than on recorded music ($17bn). 
    • @solarce: "You achieve nirvana when all failures are viewed as normal operations and not as apocalyptic events"
    • Rudiger Moller: Yup. As memory keeps getting cheaper, Java cannot profit except going off heap or use Azul Zing. Either improve concurrent GC or reduce the amount of references required to model data structures in Java.
    • @PatrickMcFadin: OH: "idompotency is better than beefalo"
  • I started listening to Songza about 6 weeks ago. Loved its emotional intelligence. And now I find Google went and acquired it. A coincidence? This is not a case of megalomania. It occurred to me that Google is in the perfect position to let some algorithms loose on its data to see if a service like Songza is gaining mind share. If you look at DNS access, G+, Gmail, Chrome, web trends, etc you have a pretty good proxy for actual usage data. In fact, your algorithms could just look at everything and identify acquisition targets by ranking what services are rising above the noise. And in double fact Google can probably estimate future growth trends better than Songza because they have historical data on many other services. 
  • Concurrency Improvements in HyperLevelDB. Taking single threaded code and making multithreaded is not for the faint of heart. Deadlocks await each new access pattern. By reducing time locks are held, using lock free data structures, and using fine grained locking HyperDex was able to reach 400K operations per second, better than LevelDB's 275K operations per second.
  • The Lambda Architecture has nothing to do with The Secret, in case you were wondering. To see why Jay Kreps has an excellent article Questioning the Lambda Architecture based on his experiences at LinkedIn. The main objection is double processing, concluding: These days, my advice is to use a batch processing framework like MapReduce if you aren’t latency sensitive, and use a stream processing framework if you are, but not to try to do both at the same time unless you absolutely must. Great discussion in the comment section. For me it's as simple as never mix read and write streams. They have completely different purposes. More on Hacker News.
  • Videos from the Velocity Conference 2014 on YouTube.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so keep on going)...

Categories: Architecture

Quote of the Day

Herding Cats - Glen Alleman - Fri, 07/04/2014 - 16:23

It is the mark of an educated mind to rest satisfied with the degree of precision which the nature of the subject admits and not to seek exactness where only an approximation is possible¬†‚ÄĒAristotle (384 B.C - 322 B.C.)

When we hear someone say estimates are guesses, When we estimate we act as if we believe the plan will not change, or similar uninformed nonsense, think of Aristotle. Without the understanding, from education, experience, and skill to realize that all project variables are random variables and vary naturally and vary from external events.

As such in order to determine the future impacts from decisions that involve cost, schedule, and performance, we need to estimate that impact of those random processes on the outcome of our decision.

This is the basis of all decision making in the presence of uncertainty. It's been claimed decisions can be made without estimates, but until someone comes up with the way to make decisions without estimating those impacts, statistical estimating is the way.

Categories: Project Management

My Best Diagram Ever: the Celebration Grid

NOOP.NL - Jurgen Appelo - Fri, 07/04/2014 - 12:59
Celebration Grid

At the Spark the Change conference in London, someone tweeted, “This one slide was worth the ticket price!” which was then retweeted more than 30 times. I got similar great comments from people in my workshops. And yes, I agree. My celebration grid might be the best model I ever created. It is based on the work by Donald Reinertsen, but I tried to visualize what he said in many more words.

The post My Best Diagram Ever: the Celebration Grid appeared first on NOOP.NL.

Categories: Project Management

Baselining and Benchmarking Require Standards

Baseline, not base line...

Baseline, not base line…

Measuring a process generates a baseline.  By contrast, a benchmark is a comparison of a baseline to another baseline.  Benchmarks can compare baselines to other internal baselines or external baselines.  I am often asked whether it possible to externally benchmark measures and metrics that have no industry definition or occasionally are team specific. Without developing a common definition of the measure or metric so that data is comparable, the answer is no.  A valid baseline line and benchmark requires that the measure or metric being collected is defined and consistently collected by all parties using the benchmark.

Measures or metrics used in external benchmarks need to be based on published or agreed upon standards between the parties involved in the benchmark.  Most examples of standards are obvious.  For example, in the software field there are a myriad of standards that can be leveraged to define software metrics.  Examples of standards groups include: IEEE, ISO, IFPUG, COSMIC and OMG. Metrics that are defined by these standards can be externally benchmarked and there are numerous sources of data.  Measures without international standards require all parties to specifically define what is being measured.  I recently ran across a simple example. The definition of a month caused a lot of discussion.  An organization compared function points per month (a simple throughput metric) to benchmark data they purchased.  The organization’s data was remarkably below the baseline.  The problem was that the benchmark used the common definition of a month (12 in a year) while their data used an internal definition of a 13 period year. The benchmark data or their data should have been modified to be comparable.

Applying the defined metric consistently is also critical and not always a given.  For example, when discussing the cost of an IT project understanding what is included is important for consistency.  Project costs could include hardware, software development and changes, purchased software, management costs, project management costs, business participation costs, and the list could go on ad-infinitum.  Another example might be the use of story points (a relative measure based on team perception), while a team may well be able to apply the measure consistently because it is based on perception comparisons, outside of the team would be at best valueless and at worst dangerous.

The data needed to create a baseline and for a benchmark comparison must be based on a common definition that is understood by all parties, or the results will generate misunderstandings.  A common definition is only a step along the route to a valuable baseline or benchmark, the data collection must be done on a consistent basis.  It is one thing to agree upon a definition and then have that definition consistently applied during data collection. Even metrics like IFPUG Function Points, which have a standard definition and rigorous training, can show up to a five percent variance between counters.  Less rigorously defined and trained metrics are unknowns that require due diligence by anyone that use them.


Categories: Process Management

Dockerfiles as automated installation scripts

Xebia Blog - Thu, 07/03/2014 - 19:16

Dockerfiles are great and easily readable specifications for the installation and configuration of an application. It is terse, can be understood by anyone who understands UNIX commands, results in a testable product and can easily be turned into an automated installation script using a little awk'ward magic. Just in case you want to install the application in question on the good old fashioned way, without the Docker hassle :-)

In this case, we needed to experiment with the Codahale Metrics library and Zabbix. Instead of installing a complete Zabbix server, I googled for a docker container and was pleased to find a ready to run Zabbix server configuration created by Bernardo Gomez Palacio. . Unfortunately, the server stopped repeatedly after about 5 minutes due the simplevisor's impression that it was requested to stop. I could not figure out where this request was coming from, and as it was pretty persistent, I decided to install zabbix on a virtual box.

So I checked out the  docker-zabbix github project and found a ready to run Vagrant configuration to build the zabbix docker container itself (Cool!). The Dockerfile contained easily and readable instructions on how to install and configure Zabbix. But,  instead of copy-and-pasting the instructions to the command prompt, I cloned the project on the vagrant box and created the following awk script in order to execute the instructions in the Dockerfile directly on the running system.

/^ADD/ {
sub(/ADD/, "")
    cmd = "mkdir -p $(dirname " $2 ")"
    system(cmd)
    cmd = "cp " $0
    system(cmd)
}

/^RUN/ {
    sub(/RUN/, "")
    cmd = $0
    system(cmd)
}

After a few minutes, the image was properly configured. I just needed to run the database initialisation script (/start.sh) and ensured that all the services were started on reboot.

 cd /etc/init.d
for i in zabbix* httpd mysqld snmp* ; do
     chkconfig $i on
     service $i start
done

Even if you do not use Docker in production, Dockerfiles are a great improvement in the specifications of installation instructions!

The Value of Information †

Herding Cats - Glen Alleman - Thu, 07/03/2014 - 15:18

Since all variables on all projects are random - cost, schedule, and delivered capabilities, in the economics of projects, the chance of being wrong and the Cost of being wrong is the expected opportunity cost. When we write software for money, we are participating in the microeconomics process of decision making based on information about the future:

  • What value will be returned in exchange for the cost to produce that value? This value can be tangible - revenue from the sales of our product, revenue from the efforts produced through our contracting firm to our client, cost savings to the internal IT operation, increased sales from our ability to be faster and better than our competition,¬†or intangible value in some broader sense - a public good for example.
  • What is the cost to produce that value? This cost is almost always tangible. Money spent over some period of time.¬†

Information is needed to assess both the cost and the value in order to DECIDE what to do. The formula for the value of this information can be mathematical as well as intuitive.

We make better decisions when we can reduce uncertainty about those decisions. Knowing the value of the information used to make those decisions is part of the microeconomics of writing software for money. 

If we are uncertain about a business decision, or a decision for the business based on technology, that means we have a chance of making a wrong decision. By wrong it means the consequences of the alternatives cannot be assessed and one chocie that might have been preferable was not choosen. The cost of being wrong is the difference between the wrong choice and the best alternative.

In order to make an informed decision we need information - as mentioned above. This information itself has uncertainty, and therefore most times we need to estimate the actual numbers from the source of the information:

  • How much will it cost? Good question. Can't really tell unless we have a firm fixed price quote from a vendor. Cost is not the same as budget. We might be able to fix the budget, but cost is always varaible in practice. We can stop when cost reaches budget, but then there is a chance we're not done - we have an incomplete deliverable, missing needed features.
  • What value will be produced? Good question. Can't really tell unless we have revenue from the same product or similar products, or have price quotes from our competitors for the same product we are trying to sell.
  • How long will it take to produce a¬†value producing outcome?¬†Good question. Can't really tell since we haven't done this before.

These questions and their answers are critical to the successful operation of any business, whose fundamental principle of operations is to turn expense into revenue. Since the variables involved in our projects are actually random variables, we'll need to estimate the answers, leaving the bigger question unanswered to date...

Can we make decisions without estimating the future impact on cost, schedule, and performance or that decision?

Gather information in support of decision making is decision risk reduction. The desire to reduce risk is good business practice. The decision maker needs information about the behaviour of the random variables involved in the decision making process. These must be estimated before the fact to make a decision about the future. 

To develop the needed estimates we need a Basis of Estimate process, which means building the estimates from Reference Classes, parametric models, or similar cardinal based processes that have calibrated in some way. The Ordinal (relative) estimate are not credible. This removes the ill conceived notion that estimates are guesses.

† Extracted from How To Measure Anything, Douglas W. Hubbard.

Related articles Making Estimates For Your Project Require Discipline, Skill, and Experience The Calculus of Writing Software for Money, Part II Why is Statistical Thinking Hard? How to "Lie" with Statistics All Project Numbers are Random Numbers - Act Accordingly How To Assure Your Project Will Fail Economics of Iterative Software Development Everything is a Random Variable Four Critical Elements of Project Success Why We Must Learn to Estimate
Categories: Project Management

Be Bold!

Making the Complex Simple - John Sonmez - Thu, 07/03/2014 - 15:00

Sometimes you have to just be bold if you want to maximize your opportunities.

The post Be Bold! appeared first on Simple Programmer.

Categories: Programming

How architecture enables kick ass teams (1): replication considered harmful?

Xebia Blog - Thu, 07/03/2014 - 11:51

At Xebia we regularly have discussions regarding Agile Architecture? What is it? What does it take? How should you organise this? Is it technical or organisational? And much more questions… which I won’t be answering today. What I will do today is kick off a blog series covering subjects that are often part of these heated debates. In general what we strive for with Agile Architecture is an architecture that enables the organisation to keep moving fast and without IT be a limiting factor for realising changes. As you read this series you’ll start noticing one theme coming back over and over again: Autonomy. Sometimes we’ll be focussing on the architecture of systems, sometimes on the architecture of the organisation or teams, but autonomy is the overarching theme. And if you’re familiar with Conways Law it should be no surprise that there is a strong correlation between team and system structure. Having a structure of teams  that is completely different from your system landscape causes friction. We are convinced that striving for optimal team and system autonomy will lead to an organisation which is able to quickly adapt and respond to changes.

The first subject is replication of data, this is more a systems (landscape) issue and less of an organisational issue and definitely not the only one, more posts will follow.

We all have to deal with situations where:

  • consumers of a data retrieval service (e.g. customer account details) require this service to be highly available, or
  • compute intensive analysis must be done using the data in a system, or
  • data owned by a system must be searched in a way that is not (efficiently) supported by that system

These situations all impact the autonomy of the system owning the data.Is the system able to provide the it's functionality at the require quality level or do these external requirements lead to negative consequences on quality of the service provided or maintainability? Should these requirements be forced into the system or is another approach more appropriate?

Above examples all could be solved by replicating data into another system which is more suitable for meeting these requirements but … replication of data is considered to be harmful by some. Is it really? Often mentioned reasons not to replicate data are:

  • The replicated data will always be less accurate and timely than the original data
    True, and is this really a problem for the specific situation you’re dealing with? Sometimes you really need the latest version of a customer record, but in many situations it is no problem is the data is seconds, minutes or even hours old.
  • Business logic that interprets the data is implemented twice and needs to be maintained
    Yes, and you have to compare the costs of this against the benefits. As long as the benefits outweigh the costs, it is a good choice.  You can even consider to provide a library that is used in both systems.
  • System X is the authoritative source of the data and should be the only one that exposes it
    Agree, and keeping that system as the authoritative source is good practice and does not mean that there could be read only access to the same (replicated) data in other systems.

As you can see it is never a black and white decision, you’ll have to make a balanced decision and include benefits and costs of both alternatives. The gained autonomy and business benefits derived from this can easily outweigh the extra development, hosting and maintenance costs of replicating data.

A few concrete examples from my own experience:

We had a situation where a CRM system instance owned data which was also required in a 24x7 emergency support proces. The data was nicely exposed by a number of data retrieval services. At that organisation the CRM system deployment was such that most components were redundant, but during updates the system as a whole would still be down for several hours. Which was not acceptable given that the data was required in a 24x7 emergency support process. Making the CRM system deployment upgradable without downtime was not possible or would cost .
In this situation the costs of replicating the CRM system database to another datacenter using standard database features and having the data retrieval services access either that replicated database or the original database (as fall back) was much cheaper than trying to make CRM system itself high available. The replicated database would remain running accessible even when CRM system  got upgraded. Yes, we’re bypassing the CRM system business logic for interpreting the data, but for this situation the logic was so simple that the costs of reimplementing and maintaining this in a new lightweight service (separate from CRM system) were neglectable.

Another example is from a telecom provider that uses a chain of fulfilment systems in which it registered all network products sold to its customers (e.g. internet access, telephony, tv). Each product instance depends on instances registered in another system and if you drill down deep enough you‚Äôll reach the physical network hardware ports on which it runs. The systems that registered all products used a relational model which was okay for registration. However, questions like ‚Äúif this product instance breaks, which customers are impacted‚ÄĚ were impossible to answer without overheating CPUs in those systems. By publishing all changes in the registrations to a separate system we could model the whole inventory of services as a network graph and easily do analysis on it without impacting the fulfilment systems. The fact that the data would be a (at most) a few seconds old was absolutely no problem.

And a last example is that sometimes you want to do a full (phonetic) text search through a subset of your domain model. Relational data models quickly get you into an unmaintainable situation. You‚Äôre SQL queries will require many tables, lot‚Äôs of inefficient ‚ÄúLIKE ‚Äė%gold%‚Äô" and developers that have a hard time understanding what a query actually intended to do. Replicating the data to a search engine makes searching far easier and¬†provides more possibilities for searches that are hard to realise in a relational database.

As you can see replication of data can increase autonomy of systems and teams and thereby make your system or system landscape and organisation more agile. I.e. you can realise new functionality faster and get it available for your users quicker because the coupling with other systems or teams is reduced.

In a next blog we'll discuss another subject that impacts team or system autonomy.

Loan calculator

Phil Trelford's Array - Wed, 07/02/2014 - 08:17

Yesterday I came across a handy loan payment calculator in C# by Jonathon Wood via Alvin Ashcraft’s Morning Dew links. The implementation appears to be idiomatic C# using a class, mutable properties and wrapped in a host console application to display the results.

I thought it’d be fun to spend a few moments re-implementing it in F# so it can be executed in F# interactive as a script or a console application.

Rather than use a class, I’ve plumped for a record type that captures all the required fields:

/// Loan record
type Loan = {
   /// The total purchase price of the item being paid for.
   PurchasePrice : decimal
   /// The total down payment towards the item being purchased.
   DownPayment : decimal
   /// The annual interest rate to be charged on the loan
   InterestRate : double
   /// The term of the loan in months. This is the number of months
   /// that payments will be made.
   TermMonths : int
   }

 

And for the calculation simply a function:

/// Calculates montly payment amount
let calculateMonthlyPayment (loan:Loan) =
   let monthsPerYear = 12
   let rate = (loan.InterestRate / double monthsPerYear) / 100.0
   let factor = rate + (rate / (Math.Pow(rate+1.,double loan.TermMonths) 1.))
   let amount = loan.PurchasePrice - loan.DownPayment
   let payment = amount * decimal factor
   Math.Round(payment,2)

 

We can test the function immediately in F# interactive

let loan = {
   PurchasePrice = 50000M
   DownPayment = 0M
   InterestRate = 6.0
   TermMonths = 5 * 12
   }

calculateMonthlyPayment loan

 

Then a test run (which produces the same results as the original code):

let displayLoanInformation (loan:Loan) =
   printfn "Purchase Price: %M" loan.PurchasePrice
   printfn "Down Payment: %M" loan.DownPayment
   printfn "Loan Amount: %M" (loan.PurchasePrice - loan.DownPayment)
   printfn "Annual Interest Rate: %f%%" loan.InterestRate
   printfn "Term: %d months" loan.TermMonths
   printfn "Monthly Payment: %f" (calculateMonthlyPayment loan)
   printfn ""

for i in 0M .. 1000M .. 10000M do
   let loan = { loan with DownPayment = i }
   displayLoanInformation loan

 

Another option is to simply skip the record and use arguments:

/// Calculates montly payment amount
let calculateMonthlyPayment(purchasePrice,downPayment,interestRate,months) =
   let monthsPerYear = 12
   let rate = (interestRate / double monthsPerYear) / 100.0
   let factor = rate + (rate / (Math.Pow(rate + 1.0, double months) - 1.0))
   let amount = purchasePrice - downPayment
   let payment = amount * decimal factor
   Math.Round(payment,2
Categories: Programming

New Entreprogrammers Episodes: 18 and 19

Making the Complex Simple - John Sonmez - Tue, 07/01/2014 - 18:37

We have two new episodes this week, since I had to call an emergency meeting last week about an important life decision. Check them out here:¬†http://entreprogrammers.com/ I can’t believe we are already at episode 19.

The post New Entreprogrammers Episodes: 18 and 19 appeared first on Simple Programmer.

Categories: Programming