
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Stuff The Internet Says On Scalability For January 23rd, 2015

Hey, it's HighScalability time:


Elon Musk: The universe is really, really big  [Gigapixels of Andromeda [4K]]
  • 90: the new 50 for women designers; $656.8 million: 3 months of Uber payouts; $10 billion: all it takes to build the Internet in space; 1 billion: registered WeChat users
  • Quotable Quotes:
    • @antirez: Tech stacks, more replaceable than ever: hardware is better, startups get $$ (few nodes + or - who cares), alternatives countless.
    • Olivio Sarikas: If every Star in this Image was a 2 millimeter Sandcorn you would end up with 1110 kg of Sand!!!!!!!!!
    • Chad Cipoletti: In even simpler terms, we see brands as people.
    • @timoreilly: Love it: “We need a stack, not a pile” says @michalmigurski.
    • @neha: I would be very happy to never again see a distributed systems paper eval on a workload that would fit on one machine.
    • @etherealmind: OH: "oh yeah, the extra 4 PB of storage is being installed today. It's about 4 racks of gear".
    • @lintool: Andrew Moore: Google's ecommerce platform ingests 100K-200K events per second continuously. 

  • Programming as myth building. Myths to Live By: The true symbol does not merely point to something else. It contains in itself a structure which awakens our consciousness to a new awareness of the inner meaning of life and of reality itself. A true symbol takes us to the center of the circle, not to another point on the circumference.

  • Not shocking at all: "We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code...A majority (77%) of the failures require more than one input event to manifest, but most of the failures (90%) require no more than 3." Really, who has the time? More on human nature in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems.

  • Let simplicity fail before climbing the complexity ladder. Scalability! But at what COST?: "Big data systems may scale well, but this can often be just because they introduce a lot of overhead. Rather than making your computation go faster, the systems introduce substantial overheads which can require large compute clusters just to bring under control. In many cases, you’d be better off running the same computation on your laptop." But notice the kicker: "it took some work for parallel union-find." Replacing smart work with brute force is often the greater win. What are a few machine cycles between friends?

  • Programming is the ultimate team sport, so Why are Some Teams Smarter Than Others? The smartest teams were distinguished by three characteristics. First, their members contributed more equally to the team’s discussions. Second, their members were better at reading complex emotional states. Third, teams with more women outperformed teams with more men.

  • WhatsApp doesn't understand the web. Interesting design and discussions. Using proprietary Chrome APIs is a tough call, but this is more perplexing: "Your phone needs to stay connected to the internet for our web client to work." Is this for consistency reasons? To make sure the phone and the web stay in sync? Is it for monetization reasons? It does create a closed proxy that effectively prevents monetization leaks. It's tough to judge a solution without understanding the requirements, but there must be something compelling to impose so many limitations.

  • Roman Leventov's analysis of Redis data structures. In which Salvatore 'antirez' Sanfilippo addresses, point by point, criticisms of Redis' implementation. People love Redis, and part of that love has to come from what a good guy antirez is. Here he doesn't go all black diamond alpha nerd in the face of a challenge. He admits where things can be improved. He explains design decisions in detail. He advances the discussion with grace, humility, and smarts. A worthy model to emulate.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

As a DBA Expert, which database would you choose?

This is a guest post by Jenny Richards, a professional database administrator who is currently employed at Remote DBA.

In the world of databases, there is no single silver bullet that fits every gun. Which database you select depends heavily on every other factor of your work:

  • Who are you and what do you do? 
  • What is your end goal – what are you working to achieve?
  • How much data do you intend to store?
  • On what language and OS platforms do your applications run?
  • What is your budget?
  • Will you also require data warehousing, decision support systems and/or BI?
Background information
Categories: Architecture

Learn from my pain - 5 Lessons from Ello's Adventures in Rapid Scaling

Within one week Ello went from thousands of sessions a day to a few million sessions a day. Mike Pack wrote a great article sharing what they’ve learned: 5 Early Lessons from Rapid, High Availability Scaling with Rails.

Some of their scaling challenges: quantity of data, team size, DNS, bot prevention, responding to users, inappropriate content, and other forms of caching. What did they learn?

  1. Move the graph. User relationships were implemented on a standard Rails stack using Heroku and Postgres. The relationships table became the bottleneck. Solution: denormalize the social graph and move hot data into Redis (a rough sketch of the idea follows this list). Redis is used for speed and Postgres is used for durability. Lesson: know the core pillar that supports your core offering and make it work.

  2. Create indexes early, or you're screwed. There's a camp that says only create indexes when they are needed. They are wrong. The lack of btree indexes kills query performance. Forget a unique index and your data becomes corrupted. Once the damage is done it's hard to add unique indexes later. The data has to be cleaned up and indexes take a long time to build when there's a lot of data.

  3. Sharding is cool, but not that cool. Shard all the things only after you've tried vertically scaling as much as possible. Sharding caused a lot of pain. Creating a covering index from the start and adding more RAM so data could be served from memory, not from disk, would have saved a lot of time and stress as the system scaled.

  4. Don't create bottlenecks, or do. Every new user automatically followed a system user that was used for announcements, etc. Scaling problems that would have been months down the road hit quickly as any write to the system user caused a write amplification of millions of records. The lesson here is not what you may think. While scaling to meet the challenge of the system user was a pain, it made them stay ahead of the scaling challenge. Lesson: self-inflict problems early and often.

  5. It always takes 10 times longer. All the solutions mentioned take much longer to implement than you might think. Early estimates of a couple of days soon give way to much longer timelines. Simply moving large amounts of data can take days. Adding indexes to large amounts of data takes time. And with large amounts of data, problems tend to surface only at the larger sizes, which means you need to apply a fix and start over.
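
Ello's stack was Rails on Postgres, but to make lesson 1 concrete, here is a rough sketch (in Scala with the Jedis client; the key layout and helper names are hypothetical, not Ello's actual code) of what a denormalized follower set in Redis can look like, with Postgres remaining the durable source of truth:

import redis.clients.jedis.Jedis
import scala.collection.JavaConverters._

// Hypothetical denormalization: one Redis set per user holding follower ids.
// Writes still go to Postgres for durability; Redis serves the hot reads.
object SocialGraphCache {
  private val redis = new Jedis("localhost")

  def addFollower(userId: Long, followerId: Long): Unit =
    redis.sadd(s"followers:$userId", followerId.toString)

  def followers(userId: Long): Set[String] =
    redis.smembers(s"followers:$userId").asScala.toSet

  def isFollowing(followerId: Long, userId: Long): Boolean =
    redis.sismember(s"followers:$userId", followerId.toString)
}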

The full article is excellent and filled with much more detail that makes it well worth reading.

Categories: Architecture

Continuous Delivery across multiple providers

Xebia Blog - Wed, 01/21/2015 - 13:04

Over the last year three of the four customers I worked with had a similar challenge with their environments. In different variations, they all had their environments set up across separate domains, ranging from physically separated on-premise networks to environments running across different hosting providers managed by different parties.

Regardless of the reasoning behind these kinds of setups, it's a situation where the continuous delivery concepts really add value. The stereotypical problems that exist with manual deployment and testing practices tend to get amplified when they occur in separated domains. Things get even worse when you add more parties to the mix (like external application developers). Sticking to doing things manually is a recipe for disaster, unless you enjoy going through extensive procedures every time you want to do anything in any of ‘your’ environments. And if you’ve outsourced your environments to an external party you probably don’t want to have to (re)hire a lot of people just so you can communicate with your supplier.

So how can continuous delivery help in this situation? By automating your provisioning and deployments you make deploying your applications, if nothing else, repeatable and predictable. Regardless of where they need to run.

Just automating your deployments isn’t enough, however; a big question that remains is who does what. A question that is most likely backed by a lengthy contract. Agreements between all the parties are meant to provide an answer to that very question. A development partner develops, an outsourcing partner handles the hardware, etc. But nobody handles bringing everything together...

The process of automating your steps already provides some help with this problem. In order to automate, you need some form of agreement on how to provide input for the tooling. This at least clarifies what the various parties need to produce. It also clarifies what the result of a step will be. This removes some of the fuzziness from the process. Things like whether the JVM is part of the OS or part of the middleware should become clear. But not everything is that clear-cut. It's the parts of the puzzle where pieces actually come together that turn gray. A single tool may need input from various parties. Here you need to resist the common knee-jerk reaction to shield said tool from other people with procedures and red tape. Instead, provide access to those tools to all relevant parties and handle your separation of concerns through a reliable access mechanism. Even then there might be some parts that can’t be used by just a single party and in that case, *gasp*, people will need to work together.

What this results in is an automated pipeline that will keep your environments configured properly and allow applications to be deployed onto them when needed, within minutes, wherever they may run.

[Diagram: MultiProviderCD]

The diagram above shows how we set this up for one of our clients. Using XL Deploy, XL Release and Puppet as the automation tooling of choice.

In the first domain we have a git repository to which developers commit their code. A Jenkins build is used to extract this code, build it and package it in such a way that the deployment automation tool (XL Deploy) understands. It’s also kind enough to make that package directly available in XL Deploy. From there, XL Deploy is used to deploy the application not only to the target machines but also to another instance of XL Deploy running in the next domain, thus enabling that same package to be deployed there. This same mechanism can then be applied to the next domain. In this instance we ensure that the machines we are deploying to are consistent by using Puppet to manage them.

To round things off we use a single instance of XL Release to orchestrate the entire pipeline. A single release process is able to trigger the build in Jenkins and then deploy the application to all environments spread across the various domains.

A setup like this lowers deployment errors that come with doing manual deployments and cuts out all the hassle that comes with following the required procedures. As an added bonus your deployment pipeline also speeds up significantly. And we haven’t even talked about adding automated testing to the mix…

Sponsored Post: Couchbase, VividCortex, Internap, SocialRadar, Campanja, Transversal, MemSQL, Hypertable, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Senior DevOps Engineer, SocialRadar. We are a VC funded startup based in Washington, D.C. operated like our West Coast brethren. We specialize in location-based technology. Since we are rapidly consuming large amounts of location data and monitoring all social networks for location events, we have systems that consume vast amounts of data that need to scale. As our Senior DevOps Engineer you’ll take ownership over that infrastructure and, with your expertise, help us grow and scale both our systems and our team as our adoption continues its rapid growth. Full description and application here.

  • Linux Web Server Systems Engineer, Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology and microservices; we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • UI Engineer, AppDynamics. Founded in 2008 and led by proven innovators, AppDynamics is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data, AppDynamics. The leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/
Cool Products and Services
  • See How PayPal Manages 1B Documents & 10TB Data with Couchbase. This presentation showcases PayPal's usage of Couchbase within its architecture, highlighting Linear scalability, Availability, Flexibility & Extensibility. See How PayPal Manages 1B Documents & 10TB Data with Couchbase.

  • VividCortex is a hosted (SaaS) database performance management platform that provides unparalleled insight and query-level analysis for both MySQL and PostgreSQL servers at micro-second detail. It's not just another tool to draw time-series charts from status counters. It's deep analysis of every metric, every process, and every query on your systems, stitched together with statistics and data visualization. Start a free trial today with our famous 15-second installation.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • Aerospike demonstrates RAM-like performance with Google Compute Engine Local SSDs. After scaling to 1 M Writes/Second with 6x fewer servers than Cassandra on Google Compute Engine, we certified Google’s new Local SSDs using the Aerospike Certification Tool for SSDs (ACT) and found RAM-like performance and 15x storage cost savings. Read more.

  • FoundationDB 3.0. 3.0 makes the power of a multi-model, ACID transactional database available to a set of new connected device apps that are generating data at previously unheard of speed. It is the fastest, most scalable, transactional database in the cloud - A 32 machine cluster running on Amazon EC2 sustained more than 14M random operations per second.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Try is free in the Future

Xebia Blog - Mon, 01/19/2015 - 09:40

Lately I have seen a few developers consistently use a Try inside of a Future in order to make error handling easier. Here I will investigate whether this has any merits or whether a Future on its own offers enough error handling.

If you look at the following code, there is nothing a Try can supply that a Future can't:

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future, Awaitable}
import scala.concurrent.duration._
import scala.util.{Try, Success, Failure}

object Main extends App {

  // Happy Future
  val happyFuture = Future {
    42
  }

  // Bleak future
  val bleakFuture = Future {
    throw new Exception("Mass extinction!")
  }

  // We would want to wrap the result into a hypothetical http response
  case class Response(code: Int, body: String)

  // This is the handler we will use
  def handle[T](future: Future[T]): Future[Response] = {
    future.map {
      case answer: Int => Response(200, answer.toString)
    } recover {
      case t: Throwable => Response(500, "Uh oh!")
    }
  }

  {
    val result = Await.result(handle(happyFuture), 1 second)
    println(result)
  }

  {
    val result = Await.result(handle(bleakFuture), 1 second)
    println(result)
  }
}

After giving it some thought, the only situation where I could imagine Try being useful in conjunction with Future is when awaiting a Future but not wanting to deal with error situations yet. The times I would be awaiting a future are very few in practice though. But when needed, something like this might do:

object TryAwait {
  def result[T](awaitable: Awaitable[T], atMost: Duration): Try[T] = {
    Try {
      Await.result(awaitable, atMost)
    }
  }
}
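
For completeness, here is a minimal usage sketch of the TryAwait helper above (hypothetical, reusing the imports and the bleakFuture from the earlier example): block for the result now, and decide how to handle failure later.

// Hypothetical usage of TryAwait: await the Future, postpone error handling
val outcome = TryAwait.result(bleakFuture, 1 second)

outcome match {
  case Success(value) => println(s"Got: $value")
  case Failure(error) => println(s"Deal with '${error.getMessage}' later")
}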

If you do feel that using Trys inside of Futures adds value to your codebase please let me know.

Meteor

Xebia Blog - Sun, 01/18/2015 - 12:11

Did you ever use AngularJS as a frontend framework? Then you should definitely give Meteor a try! Where AngularJS is powerful only as a client framework, Meteor is great as a full-stack framework. That means you just write your code in one language as if there is no back- and frontend at all. In fact, you get an Android and iOS client for free. Meteor is so incredibly simple that you are productive from the beginning.

Where meteor kicks angular

One of the killer features of Meteor is that you'll have a shared code base for frontend and backend. In the next code snippet, you'll see a file shared by backend and frontend:

// Collection shared and synchronized across client, server and database
Todos = new Mongo.Collection('todos');

// Shared validation logic
validateTodo = function (todo) {
  var errors = {};
  if (!todo.title)
    errors.title = "Please fill in a title";
  return errors;
}

Can you imagine how neat the code above is?

[Image: Scan 04 Jan 2015 18.48, page 4]

With one codebase, you get the full stack!

  1. Both the backend file and the frontend file can access and query the Todos collection. Meteor is responsible for syncing the todos. Even when another user adds an item, it becomes visible in your client immediately. Meteor accomplishes this with a client-side Mongo implementation (MiniMongo).
  2. One can write validation rules once, and they are executed both on the front-end and on the back-end. So you can give your user quick feedback about invalid input, but you can also guarantee that no invalid data is processed by the backend (when someone bypasses the client). And this is all without duplicated code.

Another killer feature of meteor is that it works out of the box, and it's easy to understand. Angular can be a bit overwhelming; you have to learn concepts like directives, services, factories, filters, isolated scopes, transclusion. For some initial scaffolding, you need to know grunt, yeoman, etcetera. With meteor every developer can create, run and deploy a full-stack application within minutes. After installing meteor you can run your app within seconds.

$ curl https://install.meteor.com | /bin/sh
$ meteor create dummyapp
$ cd dummyapp
$ meteor
$ meteor deploy dummyapp.meteor.com
[Screenshot: Meteor dummy application]

Another nice aspect of Meteor is that it uses DDP, the Distributed Data Protocol. The team invented the protocol and they are heavily promoting it as "REST for websockets". It is a simple, pragmatic protocol for delivering live updates as data changes in the backend. Remember that this all works out of the box. This talk walks you through its concepts. The result is that if you change data on one client, it is updated immediately on the other clients.

And there is so much more, like...

  1. Latency Compensation. On the client, Meteor prefetches data and simulates models to make it look like server method calls return instantly.
  2. Meteor is open source and integrates with existing open source tools and frameworks.
  3. Services (like an official package server and a build farm).
  4. Command line tools
  5. Hot deploys
Where meteor falls short

Yes, Meteor is not the answer to all your problems. The reason I'd still choose Angular over Meteor for my professional work is that Angular's view framework rocks. It makes it easy to structure your client code into testable units and connect them via dependency injection. With Angular you can separate your HTML from your JavaScript. With Meteor your JavaScript contains HTML elements (because their UI library is based on Handlebars), which makes testing harder, and large projects become unstructured very quickly.

Another flaw emerges if your project already has a backend. When you choose Meteor, you choose their full stack. That means: Mongo as the database and Node.js as the backend. Although you are able to create powerful applications, Meteor doesn't (easily) allow you to change this stack.

Under the hood

Meteor consists of several subprojects. In fact, it is a library of libraries: a stack, a standard set of core packages that are designed to work well together:

Components used by meteor

  1. To make meteor reactive they've included the components blaze and tracker. The blaze component is heavily based on handlebars.
  2. The DDP component is a new protocol, described by meteor, for modern client-server communication.
  3. Livequery and the full stack database take all the pain of data synchronization between the database, backend and frontend away! You don't have to think about it anymore.
  4. The Isobuild package is a unified build system for browser, server and mobile.
Conclusion

If you want to create a website or a mobile app with a backend in no time, while getting lots of functionality out of the box, Meteor is a very interesting tool. If you want to have more control or connect to an existing backend, then Meteor is probably less suitable.

You can watch this presentation I recently gave, to go along with the article.

Stuff The Internet Says On Scalability For January 16th, 2015

Hey, it's HighScalability time:


First people to free-climb the Dawn Wall of El Capitan using nothing but stone knives and bearskins (pics). 
  • $3.3 trillion: mobile revenue in 2014; ~10%: the difference between a good SpaceX landing and a crash; 6: hours for which quantum memory was held stable 
  • Quotable Quotes:
    • @stevesi: "'If you had bought the computing power found inside an iPhone 5S in 1991, it would have cost you $3.56 million.'"
    • @imgurAPI: Where do you buy shares in data structures? The Stack Exchange
    • @postwait: @xaprb agreed. @circonus does per-second monitoring, but *retain* one minute for 7 years; that plus histograms provides magical insight.
    • @iamaaronheld: A single @awscloud datacenter consumes enough electricity to send 24 DeLoreans back in time
    • @rstraub46: "We are becoming aware that the major questions regarding technology are not technical but human questions" - Peter Drucker, 1967
    • @Noahpinion: Behavioral economics IS the economics of information. via @CFCamerer 
    • @sheeshee: "decentralize all the things" (guess what everybody did in the early 90ies & why we happily flocked to "services". ;)
    • New Clues: The Internet is no-thing at all. At its base the Internet is a set of agreements, which the geeky among us (long may their names be hallowed) call "protocols," but which we might, in the temper of the day, call "commandments."

  • Can't agree with this. We Suck at HTTP. HTTP is just a transport. It should only deliver transport related error codes. Application errors belong in application messages, not spread all over the stack. 

  • Apple has lost the functional high ground. It's funny how microservices are hot and one of their wins is the independent evolution of services, yet Apple's software releases now tie everything together. It's a strategy tax. The watch just extends the rigidity of the structure. But this is a huge upgrade. Apple is moving to a cloud multi-device sync model, which is a complete revolution. It will take a while for all this to shake out.

  • This is so cool, I've never heard of Cornelis Drebbel (1620s) before or about his amazing accomplishments. The Vulgar Mechanic and His Magical Oven: His oven is one of the earliest devices that gave human control away to a machine and thus can be seen as a forerunner of the smart machine, the self-deciding automaton, the thinking robot.

  • Do you think there's a DevOps identity crisis, as Baron Schwartz suggests? Does DevOps have a messaging and positioning problem? Is DevOps just old wine in a new skin? Is DevOps made up of echo chambers? I don't know, but an interesting analysis by Baron.

  • How does Hyper-threading double your CPU throughput?: So if you are optimizing for higher throughput – that may be fine. But if you are optimizing for response time, then you may consider running with HT turned off.

  • Underdog.io shares what's Inside Datadog’s Tech Stack: Python, JavaScript and Go; the front-end happens in D3 and React; databases are Kafka, Redis, Cassandra, S3, ElasticSearch, PostgreSQL; DevOps is Chef, Capistrano, Jenkins, Hubot, and others.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Bandita Joarder on How Presence is Something You Can Learn

Bandita is one of the most amazing leaders in the technology arena.

She’s not just technical, but she also has business skills, and executive presence.

But she didn’t start out that way.

She had to learn presence from the school of hard knocks.   Many people think presence is something that either you have or you don’t.

Bandita proves otherwise.

Here is a guest post by Bandita Joarder on how presence is something you can learn:

Presence is Something You Can Learn

It’s a personal story.  It’s an empowering story.  It’s a story of a challenge and a change, and how learning the power of presence helped Bandita move forward in her career.

Enjoy.

Categories: Architecture, Programming

Monitoring Akka with Kamon

Xebia Blog - Thu, 01/15/2015 - 13:49

Kamon is a framework for monitoring the health and performance of applications based on akka, the popular actor system framework often used with Scala. It provides good quick indicators, but also allows in-depth analysis.

Tracing

Beyond just collecting local metrics per actor (e.g. message processing times and mailbox size), Kamon is unique in that it also monitors message flow between actors.

Essentially, Kamon introduces a TraceContext that is maintained across asynchronous calls: it uses AOP to pass the context along with messages. None of your own code needs to change.

Because of convenient integration modules for Spray/Play, a TraceContext can be automatically started when an HTTP request comes in.

If nothing else, this can be easily combined with the Logback converter shipped with Kamon: simply logging the token is of great use right out of the gate.

Dashboarding

Kamon does not come with a dashboard by itself (though some work in this direction is underway).

Instead, it provides 3 'backends' to post the data to (4 if you count the 'LogReporter' backend that just dumps some statistics into Slf4j): 2 on-line services (NewRelic and DataDog), and statsd (from Etsy).

statsd might seem like a hassle to set up, as it needs additional components such as grafana/graphite to actually browse the statistics. Kamon fortunately provides a correctly set-up docker container to get you up and running quickly. We unfortunately ran into some issues with the image uploaded to the Docker Hub Registry, but building it ourselves from the definition on github resolved most of these.

Implementation

We found the source code to Kamon to be clear and to-the-point. While we're generally no great fan of AspectJ, for this purpose the technique seems to be quite well-suited.

'Monkey-patching' a core part of your stack like this can of course be dangerous, especially with respect to performance considerations. Unless you enable the heavier analyses (which are off by default and clearly marked), it seems this could be fairly light - but of course only real tests will tell.

Getting Started

Most Kamon modules are enabled by adding their respective akka extension. We found the quickest way to get started is to:

  • Add the Kamon dependencies to your project as described in the official getting started guide
  • Enable the Metrics and LogReporter extensions in your akka configuration
  • Start your application with AspectJ run-time weaving enabled. How to do this depends on how you start your application. We used the sbt-aspectj plugin.

Enabling AspectJ weaving can require a little bit of twiddling, but adding the LogReporter should give you quick feedback on whether you were successful: it should start periodically logging metrics information.

Next steps are:

  • Enabling Spray or Play plugins
  • Adding the trace token to your logging
  • Enabling other backends (e.g. statsd)
  • Adding custom application-specific metrics and trace points
Conclusion

Kamon looks like a healthy, useful tool that not only has great potential, but also provides some great quick wins.

The documentation that is available is of great quality, but there are some parts of the system that are not so well-covered. Luckily, the source code is very approachable.

It is clear the Kamon project is not very popular yet, judging by some of the rough edges we encountered. These, however, seem to be mostly superficial: the core ideas and implementation seem solid. We highly recommend taking a look.

 

Remco Beckers

Arnout Engelen

StackExchange's Performance Dashboard

StackExchange created a very cool performance dashboard that looks to be updated from real system metrics. Wouldn't it be fascinating if every site had a similar dashboard?

The dashboard contains information like there are 560 million page views per month, 260,000 sustained connections,  34 TB data transferred per month, 9 web servers with 48GB of RAM handling 185 req/s at 15% CPU usage. There are 4 SQL servers, 2 redis servers, 3 tag engine servers, 3 elasticsearch servers, and 2 HAProxy servers, along with stats on each.

There's also an excellent discussion thread on reddit that goes into more interesting details, with questions being answered by folks from StackExchange. 

StackExchange is still doing innovative work and is very much an example worth learning from. They've always danced to their own tune and it's a catchy tune at that. More at StackOverflow Update: 560M Pageviews A Month, 25 Servers, And It's All About Performance.

Categories: Architecture

Exploring Akka Stream's TCP Back Pressure

Xebia Blog - Wed, 01/14/2015 - 15:48

Some years ago, when Reactive Streams lived in utopia, we got the assignment to build a high-volume message broker. A considerable amount of code in the solution we delivered back then was dedicated to preventing this broker from being flooded with messages in case an endpoint became slow.

How would we have solved this problem today with the shiny new Akka Reactive Stream (experimental) implementation just within reach?

In this blog we explore Akka Streams in general and TCP streams in particular. Moreover, we show how much more easily we can solve the challenge we faced back then using Streams.

A use-case for TCP Back Pressure

The high-volume message broker mentioned in the introduction basically did the following:

  • Read messages (from syslog) from a TCP socket
  • Parse the message
  • Forward the message to another system via a TCP connection

For optimal throughput multiple TCP connections were available, which allowed delivering messages to the endpoint system in parallel. The broker was supposed to handle about 4000 - 6000 messages per second. The following schema shows the noteworthy components and the message flow:

[Diagram: Waterhose2]

Naturally we chose Akka as the framework to implement this application. Our approach was to have an Actor for every TCP connection to the endpoint system. An incoming message was then forwarded to one of these connection Actors.

The biggest challenge was related to back pressure: how could we prevent our connection Actors from being flooded with messages in case the endpoint system slowed down or was not available? With 6000 messages per second an Actor's mailbox is flooded very quickly.

Another requirement was that message buffering had to be done by the client application, which was syslog. Syslog has excellent facilities for that. Durable mailboxes or anything of the like were out of the question. Therefore, we had to find a way to pull only as many messages into our broker as it could deliver to the endpoint. In other words: provide our own back pressure implementation.

A considerable amount of code of the solution we delivered back then was dedicated to back pressure. During one of our re-occurring innovation days we tried to figure out how much easier the back pressure challenge would have been if Akka Streams would have been available.

Akka Streams in a nutshell

In case you are new to Akka Streams, here is some basic information to help you understand the rest of the blog.

The core ingredients of a Reactive Stream consist of three building blocks:

  • A Source that produces some values
  • A Flow that performs some transformation of the elements produced by a Source
  • A Sink that consumes the transformed values of a Flow

Akka Streams provide a rich DSL through which transformation pipelines can be composed using the mentioned three building blocks.

A transformation pipeline executes asynchronously. For that to work it requires a so-called FlowMaterializer, which will execute every step of the pipeline. A FlowMaterializer uses Actors for the pipeline's execution, even though from a usage perspective you are unaware of that.

A basic transformation pipeline looks as follows:


  import akka.stream.scaladsl._
  import akka.stream.FlowMaterializer
  import akka.actor.ActorSystem

  implicit val actorSystem = ActorSystem()
  implicit val materializer = FlowMaterializer()

  val numberReverserFlow: Flow[Int, String] = Flow[Int].map(_.toString.reverse)

  numberReverserFlow.runWith(Source(100 to 200), ForeachSink(println))

We first create a Flow that consumes Ints and transforms them into reversed Strings. For the Flow to run we call the runWith method with a Source and a Sink. After runWith is called, the pipeline starts executing asynchronously.

The exact same pipeline can be expressed in various ways, such as:


    //Use the via method on the Source to pass in the Flow
    Source(100 to 200).via(numberReverserFlow).to(ForeachSink(println)).run()

    //Directly call map on the Source.
    //The disadvantage of this approach is that the transformation logic cannot be re-used.
    Source(100 to 200).map(_.toString.reverse).to(ForeachSink(println)).run()

For more information about Akka Streams you might want to have a look at this Typesafe presentation.

A simple reverse proxy with Akka Streams

Lets move back to our initial quest. The first task we tried to accomplish was to create a stream that accepts data from an incoming TCP connection, which is forwarded to a single outgoing TCP connection. In that sense this stream was supposed to act as a typical reverse-proxy that simply forwards traffic to another connection. The only remarkable quality compared to a traditional blocking/synchronous solution is that our stream operates asynchronously while preserving back-pressure.

import java.net.InetSocketAddress
import akka.actor.ActorSystem
import akka.stream.FlowMaterializer
import akka.stream.io.StreamTcp
import akka.stream.scaladsl.ForeachSink

implicit val system = ActorSystem("on-to-one-proxy")
implicit val materializer = FlowMaterializer()

val serverBinding = StreamTcp().bind(new InetSocketAddress("localhost", 6000))

val sink = ForeachSink[StreamTcp.IncomingConnection] { connection =>
      println(s"Client connected from: ${connection.remoteAddress}")
      connection.handleWith(StreamTcp().outgoingConnection(new InetSocketAddress("localhost", 7000)).flow)
}
val materializedServer = serverBinding.connections.to(sink).run()

serverBinding.localAddress(materializedServer)

First we create the mandatory instances every Akka reactive Stream requires, which is an ActorSystem and a FlowMaterializer. Then we create a server binding using the StreamTcp Extension that listens to incoming traffic on localhost:6000. With the ForeachSink[StreamTcp.IncomingConnection] we define how to handle the incoming data for every StreamTcp.IncomingConnection by passing a flow of type Flow[ByteString, ByteString]. This flow consumes ByteStrings of the IncomingConnection and produces a ByteString, which is the data that is sent back to the client.

In our case the flow of type Flow[ByteString, ByteString] is created by means of the StreamTcp().outgoingConnection(endpointAddress).flow. It forwards a ByteString to the given endpointAddress (here localhost:7000) and returns its response as a ByteString as well. This flow could also be used to perform some data transformations, like parsing a message.
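
To give a flavour of that, here is a hedged sketch (written against the same experimental API used in this post; the framing and normalization logic is purely hypothetical) of a transformation step expressed as just another Flow[ByteString, ByteString], which could be composed with the outgoing connection flow before it is passed to handleWith:

import akka.util.ByteString
import akka.stream.scaladsl.Flow

// Hypothetical parsing/normalization step: decode the incoming bytes,
// trim the message and re-encode it before forwarding it to the endpoint.
val parseMessage: Flow[ByteString, ByteString] =
  Flow[ByteString].map { bytes =>
    val message = bytes.utf8String.trim
    ByteString(message + "\n")
  }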

Parallel reverse proxy with a Flow Graph

Forwarding a message from one connection to another will not meet our self-defined requirements. We need to be able to forward messages from a single incoming connection to a configurable number of outgoing connections.

Covering this use-case is slightly more complex. For it to work we make use of the flow graph DSL.


  import java.net.InetSocketAddress
  import akka.util.ByteString
  import akka.stream.scaladsl._
  import akka.stream.scaladsl.FlowGraphImplicits._

  private def parallelFlow(numberOfConnections:Int): Flow[ByteString, ByteString] = {
    PartialFlowGraph { implicit builder =>
      val balance = Balance[ByteString]
      val merge = Merge[ByteString]
      UndefinedSource("in") ~> balance

      1 to numberOfConnections map { _ =>
        balance ~> StreamTcp().outgoingConnection(new InetSocketAddress("localhost", 7000)).flow ~> merge
      }

      merge ~> UndefinedSink("out")
    } toFlow (UndefinedSource("in"), UndefinedSink("out"))
  }

We construct a flow graph that makes use of the junction vertices Balance and Merge, which allow us to fan out the stream to several other streams. For the number of parallel connections we want to support, we create a fan-out flow starting with a Balance vertex, followed by an OutgoingConnection flow, which is then merged with a Merge vertex.

From an API perspective we faced the challenge of how to connect this flow to our IncomingConnection. Almost all flow graph examples take a concrete Source and Sink implementation as their starting point, whereas the IncomingConnection exposes neither a Source nor a Sink. It only accepts a complete flow as input. Consequently, we needed a way to abstract over the Source and Sink since our fan-out flow requires them.

The flow graph API offers the PartialFlowGraph class for that, which allows you to work with abstract Sources and Sinks (UndefinedSource and UndefinedSink). We needed quite some time to figure out how they work: simply declaring an UndefinedSource/Sink without a name won't work. It is essential that you give the UndefinedSource/Sink a name which must be identical to the one used in the UndefinedSource/Sink passed into the toFlow method. A bit more documentation on this topic would help.

Once the fan-out flow is created, it can be passed to the handleWith method of the IncomingConnection:

...
val sink = ForeachSink[StreamTcp.IncomingConnection] { connection =>
      println(s"Client connected from: ${connection.remoteAddress}")
      val parallelConnections = 20
      connection.handleWith(parallelFlow(parallelConnections))
    }
...

As a result, this implementation delivers all incoming messages to the endpoint system in parallel while still preserving back-pressure. Mission completed!

Testing the Application

To test our solution we wrote two helper applications:

  • A blocking client that pumps as many messages as possible into a socket connection to the parallel reverse proxy
  • A server that delays responses with a configurable latency in order to mimic a slow endpoint. The parallel reverse proxy forwards messages via one of its connections to this endpoint.

The following chart depicts the increase in throughput as the number of connections increases. Due to the nondeterministic concurrent behavior there are some spikes in the results, but the trend shows a clear correlation between throughput and the number of connections:

[Chart: Performance_Chart (throughput vs. number of connections)]

End-to-end solution

The end-to-end solution can be found here.
By changing the numberOfConnections variable you can see the impact on performance yourself.

Check it out! ...and go with the flow ;-)

Information about TCP back pressure with Akka Streams

At the time of this writing there was not much information available about Akka Streams, due to the fact that it is one of the newest toys of the Typesafe factory. Here are some valuable resources that helped us get started:

Why isn't the architecture in the code?

Coding the Architecture - Simon Brown - Tue, 01/13/2015 - 10:25

In response to my System Context diagram as code post yesterday was this question:

@simonbrown why is that information not already in the system's code?

— Nat Pryce (@natpryce) January 12, 2015

I've often asked the same thing and, if the code is the embodiment/implementation of the architecture, this information really should be present in the code. But my experience suggests this is rarely the case.

System context

My starting point for describing a software system is to draw a system context diagram. This shows the system in question along with key user types (e.g. actors, roles, personas, etc) and system dependencies.

I should be able to get a list of user roles from the code. For example, many web applications will have some configuration that describes the various user roles, Active Directory groups, etc. and the parts of the web application that they have access to. This will differ from codebase to codebase and technology to technology, but in theory this information is available somewhere.

The key system dependencies are a little harder to extract from a codebase. Again, we can scrape security configuration to identify links to systems such as LDAP and Active Directory. We could also search the codebase for links to known libraries or APIs, and assume that these are system dependencies. But what about those system interactions that are done by copying a file into a network share? I know this sounds archaic, but it still happens. Understanding inbound dependencies is also tricky, especially if you don't keep track of your API consumers.

Containers

The next level in my C4 model is a container diagram, which basically shows the various web applications, mobile apps, databases, file systems, standalone applications, etc and how they interact to form the overall software system. Again, some of this information will be present, in one form or another, in the codebase. For example, you could scrape this information out of an IDE such as IntelliJ IDEA (i.e. modules) or Visual Studio (i.e. projects). The output from build scripts for code (e.g. Ant, Maven, MSBuild, etc) and infrastructure (e.g. Puppet, Chef, Vagrant, Docker, etc) will probably result in deployable units, which can again be identified and this information used to create the containers model.

Components

The third level of the C4 model is components (or modules, services, layers, etc). Since even a relatively small application may consist of a large number of components, this is a level that we certainly want to automate. But it turns out that even this is tricky. Usually there's a lack of an architecturally-evident coding style, which means you get a conflict between the software architecture model and the code. This is particularly true in older systems where the codebase lacks modularity and looks like a sea of thousands of classes interacting with one another. As Robert Annett suggests, there are a number of strategies that we can use to identify "components" from a codebase though; including annotations/attributes, packaging conventions, naming conventions, module systems (e.g. OSGi), library dependencies and so on.
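
To make the packaging-convention strategy concrete, here is a toy sketch (the package layout and names are hypothetical, not taken from any real codebase) that derives a component name from a fully-qualified class name:

// Toy sketch: assume components live under a well-known package such as
// "com.mycompany.mysystem.component.<componentName>".
object ComponentFinder {
  private val ComponentPackage = "com.mycompany.mysystem.component."

  def componentOf(fullyQualifiedClassName: String): Option[String] =
    if (fullyQualifiedClassName.startsWith(ComponentPackage))
      Some(fullyQualifiedClassName.stripPrefix(ComponentPackage).takeWhile(_ != '.'))
    else
      None
}

// componentOf("com.mycompany.mysystem.component.search.SearchService") == Some("search")
// componentOf("com.mycompany.mysystem.web.HomeController") == None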

Auto-generating the software architecture model

Ultimately, I'd like to auto-generate as much of the software architecture model as possible from the code, but this isn't currently realistic. Why?

@natpryce @simonbrown because code doesn't contain the structures needed (and we don't train/show people how to do it)

— Eoin Woods (@eoinwoodz) January 13, 2015

We face two key challenges here. First of all, we need to get people thinking about software architecture once again so that they are able to think about, describe and discuss the various structures needed to reason about a large and/or complex software system. And secondly, we need to find a way to get these structures into the codebase. We have a way to go but, in time, I hope that the thought of using Microsoft Visio for drawing software architecture diagrams will seem ridiculous.

Categories: Architecture

The Stunning Scale of AWS and What it Means for the Future of the Cloud

James Hamilton, VP and Distinguished Engineer at Amazon, and long time blogger of interesting stuff, gave an enthusiastic talk at AWS re:Invent 2014 on AWS Innovation at Scale. He’s clearly proud of the work they are doing and it shows.

James shared a few eye popping stats about AWS:

  • 1 million active customers
  • All 14 other cloud providers combined have 1/5th the aggregate capacity of AWS (estimate by Gartner in 2013)
  • 449 new services and major features released in 2014
  • Every day, AWS adds enough new server capacity to support all of Amazon’s global infrastructure when it was a $7B annual revenue enterprise (in 2004).
  • S3 has 132% year-over-year growth in data transfer
  • 102Tbps network capacity into a datacenter.

The major theme of the talk is the cloud is a different world. It’s a special environment that allows AWS to do great things at scale, things you can’t do, which is why the transition from on premise x86 servers to the public cloud is happening at a blistering pace. With so many scale driven benefits to the public cloud, it's a transition that can't be stopped. The cloud will keep getting more reliable, more functional, and cheaper at a rate that you can't begin to match with your limited resources, generalist gear, bloated software stacks, slow supply chains, and outdated innovation paradigms.

That's the PR message at least. But one thing you can say about Amazon is they are living it. They are making it real. So some healthy doubt is warranted, but extrapolating out the lines of fate would also be wise.

One of the fickle finger of fate advantages AWS has is resources. At one million customers they have the scale to keep the engine of expansion and improvement going. Profits aren't being taken out, money is being reinvested. This is perhaps the most important advantage of scale.

But money without smarts is simply waste. Amazon wants you to know they have the smarts. We've heard how Google and Facebook build their own gear; Amazon does too. They build their own networking gear, networking software, racks, and they work with Intel to get faster versions of processors than are available on the market. The key is they know everything and control everything about their environment, so they can build simpler gear that does exactly what they want, which turns out to be cheaper and more reliable in the end.

Complete control allows quality metrics to be built into everything. Metrics drive a constant quality increase in all parts of the system, which is why against all odds AWS is getting more reliable as the pace of innovation quickens. Great pools of actionable data turned into knowledge is another huge advantage of scale.

Another thing AWS can do that you can't is the Availability Zone architecture itself. Each AZ is its own datacenter and AZs within a region are located very close together. This reduces messaging latencies, which means state can be synchronously replicated between AZs, which greatly improves availability compared to the typical approach where redundant datacenters are very far apart. 

It's a talk rich with information and...well, spunk. The real meta-theme of the talk is how Amazon consciously uses scale to their competitive advantage. For Amazon scale isn't just an expense to be dealt with, scale is a resource to exploit, if you know how.

Here's my gloss of James Hamilton's incredible talk...

Everything in the Talk has a Foundation in Scale
Categories: Architecture

System Context diagram as code

Coding the Architecture - Simon Brown - Mon, 01/12/2015 - 15:10

As I said in Resolving the conflict between software architecture and code, my focus for this year is representing a software architecture model as code. In Simple Sketches for Diagramming Your Software Architecture, I showed an example System Context diagram for my techtribes.je website.

techtribes.je System Context diagram

It's a simple diagram that shows techtribes.je in the middle, surrounded by the key types of users and system dependencies. It's your typical "big picture" view. This diagram was created using OmniGraffle (think Microsoft Visio for Mac OS X) and it's exactly that - a static diagram that needs to be manually kept up to date. Instead, wouldn't it be great if this diagram was based upon a model that we could better version control, collaborate on and visualise? If you're not sure what I mean by a "model", take a look at Models, sketches and everything in between.

This is basically what the aim of Structurizr is. It's a way to describe a software architecture model as code, and then visualise it in a simple way. The Structurizr Java library is available on GitHub and you can download a prebuilt binary. Just as a warning, this is very much a work in progress and so don't be surprised if things change! Here's some Java code to recreate the techtribes.je System Context diagram.

Executing this code creates this JSON, which you can then copy and paste into the try it page of Structurizr. The result (if you move the boxes around) is something like this.

techtribes.je System Context diagram

Don't worry, there will eventually be an API for uploading software architecture models and the diagrams will get some styling, but it proves the concept. What we have then is an API that implements the various levels in my C4 software architecture model, with a simple browser-based rendering tool. Hopefully that's a nice simple introduction of how to represent a software architecture model as code, and gives you a flavour for the sort of direction I'm taking it. Having the software architecture as code provides some interesting opportunities that you don't get with static diagrams from Visio, etc and the ability to keep the models up to date automatically by scanning the codebase is what I find particularly exciting. If you have any thoughts on this, please do drop me a note.

Categories: Architecture

Models, sketches and everything in between

Coding the Architecture - Simon Brown - Mon, 01/12/2015 - 13:35

Eoin Woods (co-author of the Software Systems Architecture book) and I presented a session at the Software Architect 2014 conference titled Models, sketches and everything in between, where we discussed the differences between diagrams and models for capturing and communicating the software architecture of a system.

Just the mention of the word "modelling" brings back horrible memories of analysis paralysis for many software developers. And, in their haste to adopt agile approaches, we’ve seen countless software teams who have thrown out the modelling baby with the process bathwater. In extreme cases, this has led to the creation of software systems that really are the stereotypical "big ball of mud". In this session, Simon and Eoin will discuss models, sketches and everything in between, providing you with some real world advice on how even a little modelling can help you avoid chaos.

Models, sketches and everything in between - video

Models, sketches and everything in between - slides

The video and slides are both available. After a short overview of our (often differing!) opinions, we answered the following questions.

  1. Modelling - Why Bother?
  2. Modelling and Agility?
  3. How to Do It?
  4. UML - Is It Worth the Hassle?
  5. Modelling in the Large vs the Small

It was a very fun session to do and I'd recommend taking a look if you're interested in describing/communicating the software architecture of your system. Enjoy!

Categories: Architecture

Stuff The Internet Says On Scalability For January 9th, 2015

Hey, it's HighScalability time:


UFOs or Floating Solar Balloon power stations? You decide.

 

  • 700 Million: WhatsApp active monthly users; 17 million: comments on Stack Exchange in 2014
  • Quotable Quotes
    • John von Neumann: It is easier to write a new code than to understand an old one.
    • @BenedictEvans: Gross revenue on Apple & Google's app stores was a little over $20bn in 2014. Bigger than recorded music, FWIW.
    • Julian Bigelow: Absence of a signal should never be used as a signal. 
    • Bigelow ~ separate signal from noise at every stage of the process—in this case, at the transfer of every single bit—rather than allowing noise to accumulate along the way
    • cgb_: One of the things I've found interesting about rapidly popular opensource solutions in the last 1-2 years is how quickly venture cap funding comes in and drives the direction of future development.
    • @miostaffin: "If Amazon wants to test 5,000 users to use a feature, they just need to turn it on for 45 seconds." -@jmspool #uxdc
    • Roberta Ness: Amazing possibility on the one hand and frustrating inaction on the other—that is the yin and yang of modern science. Invention generates ever more gizmos and gadgets, but imagination is not providing clues to solving the scientific puzzles that threaten our very existence.

  • Can HTTPS really be faster than HTTP? Yes, it can. Take the test for yourself. The secret: SPDY. More at Why we don’t use a CDN: A story about SPDY and SSL

  • A fascinating and well told tale of the unexpected at Facebook. Solving the Mystery of Link Imbalance: A Metastable Failure State at Scale: The most literal conclusion to draw from this story is that MRU connection pools shouldn’t be used for connections that traverse aggregated links. At a meta-level, the next time you are debugging emergent behavior, you might try thinking of the components as agents colluding via covert channels. At an organizational level, this investigation is a great example of why we say that nothing at Facebook is somebody else’s problem.

  • Everything old is new again. Facebook on disaggregation vs. hyperconvergence: Just when everyone agreed that scale-out infrastructure with commodity nodes of tightly-coupled CPU, memory and storage is the way to go, Facebook’s Jeff Qin, a capacity management engineer – in a talk at Storage Visions 2015 – offers an opposing vision: disaggregated racks. One rack for computes, another for memory and a third – and fourth – for storage.

  • Why Instagram Worked. Instagram was the result of a pivot away from a social networking site that wasn't popular enough to a stripped-down app that allowed people to document their world in pictures. Though the source article is short on the why, there's a good discussion on Hacker News. Some interesting reasons: Instagram worked because it algorithmically hides flaws in photographs so everyone's pictures look "good"; snapping a photo is easy and revolves around a moment -- something easier to recognize when it's worthy of sharing; startups need lucky breaks, but connections with the right people increase the odds considerably; Instagram worked because it was at the right place at the right time; it worked because it's a simple, quick, ultra-low-friction way of sharing photos.

  • Atheists, it's not what you think. The God Login. The incomparable Jeff Atwood does a deep dive on the design of a common everyday object: the Login page. The title was inspired by one of Jeff's teachers, who asked what the "God Algorithm" for a problem would be - that is, if God solved a problem, what would the solution look like? While you may not agree with the proposed solution to the Login page problem, you may at least come away believing that one may or may not exist.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Shneiderman's mantra

Coding the Architecture - Simon Brown - Thu, 01/08/2015 - 10:01

I attended a fantastic talk about big data visualisation at the YOW! 2014 conference in Sydney last month (slides), where Doug Talbott talked about how to understand and visualise large quantities of data. One of the things he mentioned was Shneiderman's mantra:

Overview first, zoom and filter, then details-on-demand

Leaving aside the thorny issue of how teams structure their software systems as code, one of the major problems I see teams having with software architecture is how to think about their systems. There are various ways to do this, including a number of view catalogs (e.g. logical view, design view, development view, etc) and I have my C4 model that focuses on the static structure of a software system. If you inherit an existing codebase and are asked to create a software architecture model though, where do you start? And how do people start understanding the model as quickly as possible so that they can get on with their jobs?

Shneiderman's mantra fits really nicely with the C4 model because it's hierarchical.

Shneiderman's mantra and the C4 software architecture model

Overview first (context and container diagrams)

My starting point for understanding any software system is to draw a system context diagram. This helps me to understand the scope of the system, who is using it and what the key system dependencies are. It's usually quick to draw and quick to understand.

Next I'll open up the system and draw a diagram showing the containers (web applications, mobile apps, standalone applications, databases, file systems, message buses, etc) that make up the system. This shows the overall shape of the software system, how responsibilities have been distributed and the key technology choices that have been made.

Zoom and filter (component diagrams)

As developers, we often need more detail, so I'll then zoom into each (interesting) container in turn and show the "components" inside it. This is where I show how each application has been decomposed into components, services, modules, layers, etc, along with a brief note about key responsibilities and technology choices. If you're hand-drawing the diagrams, this part can get a little tedious, which is why I'm focussing on creating a software architecture model as code, and automating as much of this as possible.
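To make that zoom concrete, here's a hedged sketch (again using the current open-source Structurizr for Java API, with purely illustrative container and component names) of how a software system opens up into containers and then components when the model is expressed as code:

```java
import com.structurizr.Workspace;
import com.structurizr.model.Component;
import com.structurizr.model.Container;
import com.structurizr.model.Model;
import com.structurizr.model.SoftwareSystem;
import com.structurizr.view.ComponentView;
import com.structurizr.view.ContainerView;
import com.structurizr.view.ViewSet;

public class ZoomAndFilter {

    public static void main(String[] args) {
        Workspace workspace = new Workspace("Example", "Zoom and filter with a hierarchical model");
        Model model = workspace.getModel();
        SoftwareSystem softwareSystem = model.addSoftwareSystem("Example System", "An example software system");

        // Overview: containers show the overall shape and the key technology choices.
        Container webApplication = softwareSystem.addContainer("Web Application", "Serves content to users", "Java and Spring MVC");
        Container database = softwareSystem.addContainer("Database", "Stores content", "MySQL");
        webApplication.uses(database, "Reads from and writes to");

        // Zoom and filter: components show how a single container has been decomposed.
        Component contentController = webApplication.addComponent("ContentController", "Handles requests for content", "Spring MVC controller");
        Component contentRepository = webApplication.addComponent("ContentRepository", "Provides access to content", "Spring Data");
        contentController.uses(contentRepository, "Uses");
        contentRepository.uses(database, "Reads from");

        // One view per level of the hierarchy.
        ViewSet views = workspace.getViews();
        ContainerView containerView = views.createContainerView(softwareSystem, "containers", "The container diagram");
        containerView.addAllContainers();
        ComponentView componentView = views.createComponentView(webApplication, "components", "Components within the web application");
        componentView.addAllComponents();
    }
}
```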

Details on demand (class diagrams)

Optionally, I might progress deeper into the hierarchy to show the classes* that make up a particular component, service, module, layer, etc. Ultimately though, this detail resides in the code and, as software developers, we can get that on demand.

Understanding a large and/or complex software system

Next time you're asked to create an architecture model, understand an existing system, present a system overview, do some software archaeology, etc, my advice is to keep Shneiderman's mantra in mind. Start at the top and work down, creating a story that gets deeper into the detail as it progresses. The C4 model is a great way to do this and if you'd like an introduction to it (with example diagrams), you can take a look at Simple Sketches for Diagramming Your Software Architecture on the new Voxxed website.

* this assumes an OO language like Java or C#, for example

Categories: Architecture

Habits, Dreams, and Goals

I’ve been talking to people in the halls about what they learned about goals from last year, and what they are going to do differently this year.   We’ve had chats about New Years Resolutions, habits, goals, and big dreams. (My theme is Dream Big for 2015.)

Here are a few of the insights that I've been sharing with people that really seem to create a lot of clarity:

  1. Dream big first, then create your goals.  Too many people start with goals, but miss the dream that holds everything together.  The dream is the backdrop and it needs to inspire you and pull you forward.  Your dream needs to be actionable and believable, and it needs to reflect your passion and your purpose.
  2. There are three types of actions:  habits, goals, and inspired actions.   Habits can help support our goals and reach our dreams.   Goals are really the above and beyond that we set our sights on and help us funnel and focus our energy to reach meaningful milestones.   They take deliberate focus and intent.  You don’t randomly learn to play the violin with skill.  It takes goals.  Inspired actions are the flashes of insight and moments of brilliance.
  3. People mess up by focusing on goals, but not having any habits that support them.  For example, if I have an extreme fitness goal, but I have the ice-cream habit, I might not reach my goals.  Or, if I want to be an early bird, but I have the party-all-night habit, or I'm a late-night reader, that might not work out so well.
  4. People mess up on their habits when they have no goals.  They might inch their way forward, but they can easily spend an entire year and not actually achieve anything significant or meaningful for themselves, because they never took the chance to dream big or set a goal they cared about.  So while they've made progress, they didn't make any real pop.  Their life was slow and steady.  In some cases, this is great, if that's all they wanted.  But I also know people who feel like they wasted the year, because they didn't do what they knew they were capable of, or wanted to achieve.
  5. People can build habits that help them reach new goals.  Some people I know have built fantastic habits.  They put a strong foundation in place that helps them reach for more.  They grow better, faster, stronger, and more powerful.  In my own experience, I had some extreme fitness goals, but I started with a few healthy habits.  My best one is wake up, work out.  I just do it.  I do a 30-minute workout.  I don't have to think about it; it's just part of my day, like brushing my teeth.  Since it's a habit, I keep doing it, so I get better over time.  When I first started the workout, I sucked.  I repeated the same workout three times, but by the third time, I was on fire.  And, since it's a habit, it's there for me as a staple in my day and, in reality, the most empowering part of my day.  It boosts me and gives me energy that makes everything else in my day much easier to deal with, and I can do things in half the time, or in some cases 10X faster.

Maybe the most important insight is that while you don't need goals to make your habits effective, it's really easy to spend a year and then wonder where it went, without meaningful milestones to look back on.  That said, I've had a few years where I simply focused on habits without specific goals, but I always had a vision for a better me, or a better future, in mind (more like a direction than a destination).

As I've taken friends and colleagues through some of these learnings about habits, dreams, and goals over the holidays, a few people have said that I should put it all together and share it, since it might help more people add some clarity to setting and achieving their goals.

Here it is:

How Dreams, Goals, and Habits Fit Together

Enjoy, and Dream Big for 2015.

Categories: Architecture, Programming

The Ultimate Guide: 5 Methods for Debugging Production Servers at Scale

This is a guest post by Alex Zhitnitsky, an engineer working at Takipi, who is on a mission to help Java and Scala developers solve bugs in production and rid the world of buggy software.

How to approach the production debugging conundrum?

All sorts of wild things happen when your code leaves the safe and warm development environment. Unlike the comfort of the debugger in your favorite IDE, when errors happen on a live server you'd better come prepared. No more breakpoints, step over, or step into, and you can forget about adding that quick line of code to help you understand what just happened. In production, bad things happen first and then you have to figure out what exactly went wrong. To be able to debug in this kind of environment, we first need to switch our debugging mindset to one of planning ahead. If you're not prepared with good practices in advance, roaming aimlessly through the logs won't be very effective.

And that's not all. With high scalability architectures come high scalability errors. In many cases we find transactions that originate on one machine or microservice and break something on another. Together with continuous delivery practices and constant code changes, errors find their way to production at an increasing rate. The biggest problem we're facing here is capturing the exact state that led to the error: what were the variable values, which thread were we in, and what was this piece of code even trying to do?

Let's take a look at 5 methods that can help us answer just that: distributed logging, advanced jstack techniques, BTrace and other custom JVM agents.

1. Distributed Logging
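One common way to make log lines from different machines line up (not necessarily the exact approach the full article goes on to describe) is to stamp every line with a correlation ID that follows a transaction across services. Here is a minimal sketch using SLF4J's MDC; the ID generation and the way it is propagated between services are assumptions for illustration:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

import java.util.UUID;

public class CorrelationIdExample {

    private static final Logger log = LoggerFactory.getLogger(CorrelationIdExample.class);

    public static void main(String[] args) {
        // Generate a correlation ID for this transaction (or accept one passed in by an upstream service).
        String correlationId = UUID.randomUUID().toString();

        // Put it into the Mapped Diagnostic Context so every log line on this thread carries it.
        MDC.put("correlationId", correlationId);
        try {
            log.info("Starting to process request");
            // ... call downstream services, passing the correlation ID along (e.g. in a header)
            // so that their log lines can be stitched together with ours ...
            log.info("Finished processing request");
        } finally {
            // Clean up so the ID doesn't leak onto the next request handled by this thread.
            MDC.remove("correlationId");
        }
    }
}
```

For the ID to actually appear in the output, the logging pattern needs to reference the MDC key, e.g. %X{correlationId} in a Logback pattern.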
Categories: Architecture