Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!
Software Development Blogs: Programming, Software Testing, Agile Project Management
Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

If any of these items interest you there's a full description of each sponsor below. Please click to read more...
Now that we have the C10K concurrent connection problem licked, how do we level up and support 10 million concurrent connections? Impossible you say. Nope, systems right now are delivering 10 million concurrent connections using techniques that are as radical as they may be unfamiliar.
To learn how it’s done we turn to Robert Graham, CEO of Errata Security, and his absolutely fantastic talk at Shmoocon 2013 called C10M Defending The Internet At Scale.
Robert has a brilliant way of framing the problem that I’ve never heard of before. He starts with a little bit of history, relating how Unix wasn’t originally designed to be a general server OS, it was designed to be a control system for a telephone network. It was the telephone network that actually transported the data so there was a clean separation between the control plane and the data plane. The problem is we now use Unix servers as part of the data plane, which we shouldn’t do at all. If we were designing a kernel for handling one application per server we would design it very differently than for a multi-user kernel.
Which is why he says the key is to understand:
The kernel isn’t the solution. The kernel is the problem.
Which means:
Don’t let the kernel do all the heavy lifting. Take packet handling, memory management, and processor scheduling out of the kernel and put it into the application, where it can be done efficiently. Let Linux handle the control plane and let the the application handle the data plane.
The result will be a system that can handle 10 million concurrent connections with 200 clock cycles for packet handling and 1400 hundred clock cycles for application logic. As a main memory access costs 300 clock cycles it’s key to design in way that minimizes code and cache misses.
With a data plane oriented system you can process 10 million packets per second. With a control plane oriented system you only get 1 million packets per second.
If this seems extreme keep in mind the old saying: scalability is specialization. To do something great you can’t outsource performance to the OS. You have to do it yourself.
Now, let’s learn how Robert creates a system capable of handling 10 million concurrent connections...
Hey, it's HighScalability time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

This is an email interview with Viktor Klang, Director of Engineering at Typesafe, on the Scala Futures model & Akka, both topics on which is he is immensely passionate and knowledgeable.
How do you structure your application? That’s the question I explored in the article Beyond Threads And Callbacks. An option I did not talk about, mostly because of my own ignorance, is a powerful stack you may not be all that familiar with: Scala and Akka.
To remedy my oversight is our acting tour guide, Typesafe’s Viktor Klang, long time Scala hacker and Java enterprise systems architect. Viktor was very patient in answering my questions and was enthusiastic about sharing his knowledge. He’s a guy who definitely knows what he is talking about.
I’ve implemented several Actor systems along with the messaging infrastructure, threading, async IO, service orchestration, failover, etc, so I’m innately skeptical about frameworks that remove control from the programmer at the cost of latency.
So at the end of the interview am I ready to drink the koolaid? Not quite, but I’ll have a cup of coffee with the idea.
I came to think of Scala + Akka as a kind of a IaaS for your process architecture. Toss in Play for the web framework and you have a slick stack, with far more out of the box power than Go, Node, or plaino jaino Java.
The build or buy decision is surprisingly similar to every other infrastructure decision you make. Should you use a cloud or build your own? It’s the same sort of calculation you need to go through when deciding on your process architecture. While at the extremes you lose functionality and flexibility, but since they’ve already thought of most everything you would need to think about, with examples, and support, you gain a tremendous amount too. Traditionally, however, processes architecture has been entirely ad-hoc. That may be changing.
Now, let’s start the interview with Viktor...
I read one of these poignantly humorous comics on Not Invented Here a while back and since I wasn't sure it was OK to repost I emailed asking for permission. Nada. Then I saw Martijn de Vrieze posted a collection of scalability comics from NIH and decided what the heck (click image to read on site):
Thanks to Martijn for curating the collection and NIH for creating them.
And I agree with Martijn, they do capture an ineffable quality about the entire space.

Hey, it's HighScalability time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

In NoSQL: Past, Present, Future Eric Brewer has a particularly fine section on explaining the often hard to understand ideas of BASE (Basically Available, Soft State, Eventually Consistent), ACID (Atomicity, Consistency, Isolation, Durability), CAP (Consistency Availability, Partition Tolerance), in terms of a pernicious long standing myth about the sanctity of consistency in banking.
Myth: Money is important, so banks must use transactions to keep money safe and consistent, right?
Reality: Banking transactions are inconsistent, particularly for ATMs. ATMs are designed to have a normal case behaviour and a partition mode behaviour. In partition mode Availability is chosen over Consistency.
Why? 1) Availability correlates with revenue and consistency generally does not. 2) Historically there was never an idea of perfect communication so everything was partitioned...

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

This is a repost of part 2 (part 1) of an interview I did for the Boundary blog.
Boundary: There’s another battle coming down the pike between Amazon (AWS) and Google (GCE). How should the CTO decide which one’s best?
Hoff: Given that GCE is still closed to public access we have very little common experience on which to judge. The best way to decide is as always, by running a few experiments. Pick a few representative projects, a representative team, implement the projects on both infrastructures, crunch some numbers, figure out the bigger picture and then select the one you wanted in the first place
.
Sebastian Stadil, founder of Scalr, recently wrote about his experiences on both platforms and found some interesting differences: AWS has a much richer set of services; GCE is on-demand only, so AWS can be cheaper; GCE has faster disk and faster network IO, especially between datacenters; GCE has faster boot times and can mount read-only partitions across multiple machines; and GCE shares images across regions...
Hey, it's HighScalability time:
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Joe Armstrong is a co-inventor of Erlang and general all around renaissance software tinkerer as shown by his excellent work on writing a C Compiler and his voluminous work on GitHub.
Given the success of Erlang it's probably no surprise that he wrote his thesis on the ground breaking ideas behind Erlang: Making reliable distributed systems in the presence of software errors.
Even if you have yet to join the cult of Erlang the principles behind Erlang are universal and well worth exploring for your own designs. Highly recommended.
Introduction:

Solving problems while saving money is always a problem. In Nobody ever got ï¬red for using Hadoop on a cluster they give some counter-intuitive advice by showing a big-memory server may provide better performance per dollar than a cluster:

This is a repost of part 1 of an interview I did for the Boundary blog.
Boundary: What is Facebook’s secret sauce for managing what’s got to be the biggest Big Data project, if you will, on the Web?
Hoff: From several presentations we’ve learned what Facebook insiders like Aditya Agarwal and Robert Johnson, both former Directors of Engineering, consider their secret sauce:
Hey, it's HighScalability time:
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...
Tachyon (github) is interesting new filesystem brought to by the folks at the UC Berkeley AMP Lab:

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Pinterest has been riding an exponential growth curve, doubling every month and half. They’ve gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances.
Stunning growth. So what’s Pinterest's story? To tell their story we have our bards, Pinterest’s Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest’s architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and half ago when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices.
This is a great talk. It’s full of amazing details. It’s also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended.
Two of my favorite lessons from the talk:
These two lessons are interrelated. Tools following the principles in (2) can scale by adding more boxes. And as load increases mature products should have fewer problems. When you do hit problems you’ll at least have a community to help fix them. It’s when your tools are too tricky and too finicky that you hit walls so high you can’t climb over.
It’s in what I think is the best part of the entire talk, the discussion of why sharding is better than clustering, that you see the themes of growing by adding resources, few failure modes, mature, simple, and good support, come into full fruition. Notice all the tools they chose grow by adding shards, not through clustering. The discussion of why they prefer sharding and how they shard is truly interesting and will probably cover ground you’ve never considered before.
Now, let’s see how Pinterest scales:
Hey, it's HighScalability time:
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...
In Don’t panic! Here’s how to quickly scale your mobile apps Mike Maelzer paints a wonderful picture of how Avocado, a mobile app for connecting couples, evolved to handle 30x traffic within a few weeks. If you are just getting started then this is a great example to learn from.
What I liked: it's well written, packing a lot of useful information in a little space; it's failure driven, showing the process of incremental change driven by purposeful testing and production experience; it shows awareness of what's important, in their case, user signup; a replica setup was used for testing, a nice cloud benefit.
Their Biggest lesson learned is a good one:
It would have been great to start the scaling process much earlier. Due to time pressure we had to make compromises –like dropping four of our media resizer boxes. While throwing more hardware at some scaling problems does work, it’s less than ideal.Here's my gloss on the article:
Evolution One - Make it Work