Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

High Scalability - Building bigger, faster, more reliable websites
Updated: 5 hours 2 min ago

Building Globally Distributed, Mission Critical Applications: Lessons From the Trenches Part 2

10 hours 28 min ago

This is Part 2 of a guest post by Kris Beevers, founder and CEO, NSONE, a purveyor of a next-gen intelligent DNS and traffic management platform. Here's Part 1.

Integration and functional testing is crucial

Unit testing is hammered home in every modern software development class. It’s good practice. Whether you’re doing test-driven development or just banging out code, you can’t be sure a piece of code will do what it’s supposed to unless you test it carefully and ensure those tests keep passing as your code evolves.

In a distributed application, your systems will break even if you have the world’s best unit testing coverage. Unit testing is not enough.

You need to test the interactions between your subsystems. What if a particular piece of configuration data changes – how does that impact Subsystem A’s communication with Subsystem B? What if you changed a message format – do all the subsystems generating and handling those messages continue to talk with each other? Does a particular kind of request that depends on results from four different backend subsystems still result in a correct response after your latest code changes?

Unit tests don’t answer these questions, but integration tests do. Invest time and energy in your integration testing suite, and put a process in place for integration testing at all stages of your development and deployment process. Ideally, run integration tests on your production systems, all the time.
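
As a minimal sketch of the idea, assume two hypothetical subsystems: a producer that serializes a config-change message as JSON and a consumer that parses it. An integration test exercises both sides of the message format together rather than each in isolation. The names `encode_event`, `handle_event`, and the message fields are illustrative assumptions, not part of any real system described here.

```python
import json

REQUIRED_FIELDS = {"version", "zone", "ttl"}  # hypothetical message schema


def encode_event(zone: str, ttl: int) -> bytes:
    """Subsystem A (hypothetical): serialize a config-change event."""
    return json.dumps({"version": 1, "zone": zone, "ttl": ttl}).encode("utf-8")


def handle_event(raw: bytes) -> dict:
    """Subsystem B (hypothetical): parse and validate an event from Subsystem A."""
    msg = json.loads(raw.decode("utf-8"))
    missing = REQUIRED_FIELDS - set(msg)
    if missing:
        raise ValueError(f"message missing fields: {sorted(missing)}")
    if msg["version"] != 1:
        raise ValueError(f"unsupported version: {msg['version']}")
    return {"zone": msg["zone"], "ttl": int(msg["ttl"])}


def test_producer_and_consumer_agree() -> None:
    """Integration test: whatever A emits, B must accept and interpret correctly."""
    result = handle_event(encode_event("example.com", 300))
    assert result == {"zone": "example.com", "ttl": 300}


if __name__ == "__main__":
    test_producer_and_consumer_agree()
    print("integration test passed")
```

If someone later renames a field in `encode_event` without updating `handle_event`, unit tests for each function can still pass while this test fails, which is exactly the class of breakage described above.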

There is no such thing as a service interrupting maintenance
Categories: Architecture

Sponsored Post: Microsoft, Librato, Surge, Redis Labs, Jut.io, VoltDB, Datadog, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 09/01/2015 - 16:56

Who's Hiring?
  • Microsoft’s Visual Studio Online team is building the next generation of software development tools in the cloud out in Durham, North Carolina. Come help us build innovative workflows around Git and continuous deployment, help solve the Git scale problem or help us build a best-in-class web experience. Learn more and apply.

  • VoltDB's in-memory SQL database combines streaming analytics with transaction processing in a single, horizontal scale-out platform. Customers use VoltDB to build applications that process streaming data the instant it arrives to make immediate, per-event, context-aware decisions. If you want to join our ground-breaking engineering team and make a real impact, apply here.  

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?
Cool Products and Services
  • Librato, a SolarWinds Cloud company, is a hosted monitoring platform for real-time operations and performance analytics. Easily add metrics from any source using turnkey solutions such as the AWS Cloudwatch integration, or by leveraging any of over 100 open source collection agents and language bindings. Librato is loved equally by DevOps and data engineers. Start using Librato today. Full-featured and free for 30 days.

  • MongoDB Management Made Easy. Gain confidence in your backup strategy. MongoDB Cloud Manager makes protecting your mission critical data easy, without the need for custom backup scripts and storage. Start your 30 day free trial today.

  • In a recent benchmark for NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance obliterated Couchbase, Cassandra and Aerospike in this real life, write-intensive use case. Full backstage pass and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Building Globally Distributed, Mission Critical Applications: Lessons From the Trenches Part 1

Mon, 08/31/2015 - 16:56

This is Part 1 of a guest post by Kris Beevers, founder and CEO, NSONE, a purveyor of a next-gen intelligent DNS and traffic management platform. Here's Part 2.

Every tech company thinks about it: the unavoidable – in fact, enviable – challenge of scaling its applications and systems as the business grows. How can you think about scaling from the beginning, and put your company on good footing, without optimizing prematurely? What are some of the key challenges worth thinking about now, before they bite you later on? When you’re building mission critical technology, these are fundamental questions. And when you’re building a distributed infrastructure, whether for reliability or performance or both, they’re hard questions to answer.

Putting the right architecture and processes in place will enable your systems and company to withstand the common hiccups distributed, high traffic applications face. This enables you to stay ahead of scaling constraints, manage inevitable network and system failures, stay calm and debug production issues in real-time, and grow your company and product successfully.

Who is this guy?

I’ve been building globally distributed, large scale applications for a long time.  Way back in the first dot-com boom, I bailed on college classes for a year and built backend infrastructure for a file-sharing startup which grew to millions of users – until the RIAA’s lawyers caught wind and sent us packing back to our dorm rooms. The business went bust, but I was hooked on scale.

More recently, at Voxel, an internet infrastructure provider that was acquired by Internap in 2011, I built global internet infrastructure used by many large web companies – we built globally distributed public cloud, bare metal as-a-service, content delivery networks, and much more. We learned a lot of scaling lessons, and we learned them the hard way.

Now, at NSONE, we’ve built a next-gen intelligent DNS and traffic management platform, which today services some of the largest properties on the Internet, including many companies who are themselves mission critical service providers.  This is truly globally distributed, mission critical infrastructure, and the lessons we learned at Voxel have served us well – and been reinforced time and again – as we’ve built and scaled the NSONE platform.

It’s time to share some of what we’ve learned, and with luck, maybe you can apply some of these lessons in your own applications – instead of learning them the hard way!

Architecture first
Categories: Architecture

Stuff The Internet Says On Scalability For August 28th, 2015

Fri, 08/28/2015 - 16:56

Hey, it's HighScalability time:


The oldest known fossil of a flowering plant. 130 million years old. What digital will last so long?
  • 32.6: Ashley Madison password cracks per hour; 1 million: cores in the Human Brain Project's silicon brain; 54,000: tennis balls used at Wimbledon; 4 kB: size of first web page; 1.2 million: messages per second Apache Samza handles on a single node; 27%: higher conversion for sites loading one second faster

  • Quotable Quotes:
    • @adrianco: Apple first read about Mesos on http://highscalability.com  and for a year have run Siri on the worlds biggest cluster 
    • @Besvinick: Interesting recurring sentiment from recent grads: We lived most of our college lives on Snapchat—now we don't have any "tangible" memories.
    • Robin Hobb: For most moments of our lives, we have forgotten almost all of the world around us, except for what currently claims our interest.
    • @Carnage4Life: I'd like to thank all the Amazon employees who cried at their desks to make this possible
Categories: Architecture

7 Strategies for 10x Transformative Change

Wed, 08/26/2015 - 16:56

Peter Thiel, VC, PayPal co-founder, early Facebook investor, and most importantly, the supposed inspiration for Silicon Valley's intriguing Peter Gregory character, argues in his book Zero to One that a successful business needs to make a product that is 10 times better than its closest competitor.

The title Zero to One refers to the idea of progress as either horizontal/extensive or vertical/intensive. For a more detailed explanation take a look at Peter Thiel's CS183: Startup - Class 1 Notes Essay.

Horizontal/extensive progress refers to copying things that work. Observe, imitate, and repeat. The one-word summary for the concept is "globalization." For more on this, PAYPAL MAFIA: Reid Hoffman & Peter Thiel's Master Class in China is an interesting watch.

Vertical/intensive progress means doing something genuinely new, that is, going from zero to one, as opposed to going from one to N, which is merely globalization. This is the creative spark: the hero's journey of overcoming obstacles on the way to becoming the Master of the Universe you were always meant to be.

We see this pattern with Google a lot. Google often hits scaling challenges long before anyone else, and because it has a systematizing culture it produces discrete, replicable technologies that then diffuse out to the rest of the world, often through open source efforts.

Google told us about the Google File System in 2003, MapReduce in 2004, Bigtable in 2006, The Datacenter as a Computer in 2009, Percolator (real-time updates) in 2010, Pregel (graph processing) in 2010, Dremel (interactive analysis) in 2010, Spanner (globally distributed database) in 2012,  Omega (cluster scheduling) in 2013, Borg (cluster manager) in 2015, and Jupiter Rising (advanced networking) in 2015.

Some time later, we saw the development of open source parallels like HDFS, Hadoop, HBase, Giraph, YARN, Drill, and Mesos.

So, how can you rise up and meet the 10x challenge?

Murat Demirbas, a computer science and engineering professor at SUNY Buffalo, and awesome writer on all things distributed, came up with some good suggestions in How to go for 10X.

Categories: Architecture

Ask HighScalability: Choose an Async App Server or Multiple Blocking Servers?

Mon, 08/24/2015 - 16:56

Jonathan Willis, software developer by day and superhero by night, asked an interesting question via Twitter on StackOverflow:

tl;dr Many Rails apps or one Vertx/Play! app?


I've been having discussions with other members of my team on the pros and cons of using an async app server such as the Play! Framework (built on Netty) versus spinning up multiple instances of a Rails app server. I know that Netty is asynchronous/non-blocking, meaning during a database query, network request, or something similar an async call will allow the event loop thread to switch from the blocked request to another request ready to be processed/served. This will keep the CPUs busy instead of blocking and waiting.

I'm arguing in favor of using something such as the Play! Framework or Vertx.io, something that is non-blocking... Scalable. My team members, on the other hand, are saying that you can get the same benefit by using multiple instances of a Rails app, which out of the box only comes with one thread and doesn't have true concurrency as do apps on the JVM. They are saying just use enough app instances to match the performance of one Play! application (or however many Play! apps we use), and when a Rails app blocks the OS will switch processes to a different Rails app. In the end, they are saying that the CPUs will be doing the same amount of work and we will get the same performance.

What do you think? The marketplace has seemingly moved, in the form of node.js, Golang, Akka, and even Java, to the async server model. Does that mean it's the only right way?
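
To make the trade-off in the question concrete, here is a minimal Python sketch. Python's asyncio stands in for an event loop like Netty's; this is not Play! or Vert.x code, and `fake_db_query` is a hypothetical stand-in for a database call that only waits. One thread serves many requests concurrently because each `await` hands control back to the event loop.

```python
import asyncio
import time


async def fake_db_query(delay: float) -> str:
    """Hypothetical I/O call: waits without occupying the thread."""
    await asyncio.sleep(delay)
    return "row"


async def handle_request(request_id: int) -> tuple[int, str]:
    # While this request awaits I/O, the event loop runs other requests.
    row = await fake_db_query(0.1)
    return request_id, row


async def serve(n_requests: int) -> float:
    """Serve n requests on one thread; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n_requests)))
    assert len(results) == n_requests
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(serve(50))
    # The waits overlap, so total time is close to one wait, not fifty.
    print(f"50 requests served in {elapsed:.2f}s on a single thread")
```

A blocking server gets the same overlap only by dedicating a process or thread to each in-flight request, which is the team's counter-proposal. The CPU work per request is similar either way; what differs is the memory and scheduling cost of each waiting request.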

Here's my attempt at a response:

Categories: Architecture

Stuff The Internet Says On Scalability For August 21st, 2015

Fri, 08/21/2015 - 16:56

Hey, it's HighScalability time:


Hunter-Seeker? Nope. This is the beauty of what a Google driverless car sees. Great TED talk.
  • $2.8 billion: projected Instagram ad revenue in 2017; 1 trillion: Azure event hub events per month; 10 million: Stack Overflow questions asked; 1 billion: max volts generated by a lightning strike; 850: apps downloaded every second from the AppStore; 2000: years data can be stored in DNA; 60: # of robots needed to replace 600 humans; 1 million: queries per second with Nginx, Ubuntu, EC2

  • Quotable Quotes:
    • Tales from the Lunar Module Guidance Computer: we landed on the moon with 152 Kbytes of onboard computer memory.
    • @ijuma: Included in JDK 8 update 60 "changes GHASH internals from using byte[] to long, improving performance about 10x"
    • @ErrataRob: I love the whining over the Bitcoin XT fork. It's as if anarchists/libertarians don't understand what anarchy/libertarianism means.
    • Network World: the LHC Computing Grid has 132,992 physical CPUs, 553,611 logical CPUs, 300PB of online disk storage and 230PB of nearline (magnetic tape) storage. It's a staggering amount of processing capacity and data storage that relies on having no single point of failure.
    • @petereisentraut: Chef is kind of a distributed monkey-patching festival running as root.
    • @SciencePorn: If you were to remove all of the empty space from the atoms that make up every human on earth, all humans would fit into an apple.
    • SDN for the cloud: Most of the concepts presented in the papers have been put into practice in Microsoft cloud infrastructures. As a result of these improvements, modern Azure services can carry up to 1,400,000 SQL databases. Moreover, a typical Azure event hub sees as high as 1 trillion events per month.

  • On the Alphabet Google reorg...what Horace Dediu has to say on functional vs divisional organizations may provide insight. A functional organization, which is used by the Army and Apple, prevents cross divisional fights for resources and power. Those are the kind of internal politics that kill a company. Why not just sidestep all that?

  • Here's how Pinterest shards MySQL to scale: All data needed to be replicated to a slave machine for backup, with high availability and dumping to S3 for MapReduce...You never want to read/write to a slave in production...Slaves lag, which causes strange bugs; I still recommend startups avoid the fancy new stuff — try really hard to just use MySQL. Trust me. I have the scars to prove it...We created a 64 bit ID that contains the shard ID...To create a new Pin, we gather all the data and create a JSON blob...A mapping table links one object to another...there are three primary ways to add more capacity...more RAM...open up new ranges...move some shards to new machines...This system is best effort. It does not give you Atomicity, Isolation or Consistency in all cases...We stored the shard configuration table in ZooKeeper...This system has been in production at Pinterest for 3.5 years now and will likely be in there forever. 

  • Nobody expects the quadruple fault! Google loses data as lightning strikes: four successive lightning strikes on the local utilities grid that powers our European datacenter caused a brief loss of power to storage systems...only a very small number of disks remained affected, totalling less than 0.000001% of the space of allocated persistent disks...full recovery is not possible.

  • Flxone upgraded to Go version 1.5 and reduced their 95th-percentile garbage collection pause from 279 milliseconds down to just 10 ms, a 96% decrease in garbage collection pause time. Average request latency dropped by 53%. I wonder now if they can reduce the number of nodes required to meet their SLA? And would the results hold if they wrote their app more natively, that is, to generate less garbage?
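
The Pinterest item above mentions a 64 bit ID that contains the shard ID. A minimal sketch of that idea follows. The bit widths (16 bits of shard, 10 of object type, 36 of per-shard local ID) and the function names are assumptions for illustration; the summary above does not give the layout.

```python
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36  # assumed layout, 62 bits used


def make_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Pack shard, object type, and per-shard local ID into one integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def split_id(packed: int) -> tuple[int, int, int]:
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = packed & ((1 << LOCAL_BITS) - 1)
    type_id = (packed >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (packed >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id
```

Because the shard is encoded in the ID itself, a lookup can be routed to the right database from the ID alone, without consulting a central index first.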

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

The Microsoft Take on Containers and Docker

Wed, 08/19/2015 - 16:56

This is a guest repost by Mark Russinovich, CTO of Microsoft Azure (and novelist!). We all benefit from a vibrant competitive cloud market and Microsoft is part of that mix. Here's a good container overview along with Microsoft's plan of attack. Do you like their story? Is it interesting? Is it compelling?

You can’t have a discussion on cloud computing lately without talking about containers. Organizations across all business segments, from banks and major financial service firms to e-commerce sites, want to understand what containers are, what they mean for applications in the cloud, and how to best use them for their specific development and IT operations scenarios.

From the basics of what containers are and how they work, to the scenarios they’re being most widely used for today, to emerging trends supporting “containerization”, I thought I’d share my perspectives to better help you understand how to best embrace this important cloud computing development to more seamlessly build, test, deploy and manage your cloud applications.

Containers Overview

In abstract terms, all of computing is based upon running some “function” on a set of “physical” resources, like processor, memory, disk, network, etc., to accomplish a task, whether a simple math calculation, like 1+1, or a complex application spanning multiple machines, like Exchange. Over time, as the physical resources became more and more powerful, often the applications did not utilize even a fraction of the resources provided by the physical machine. Thus “virtual” resources were created to simulate underlying physical hardware, enabling multiple applications to run concurrently – each utilizing fractions of the physical resources of the same physical machine.

We commonly refer to these simulation techniques as virtualization. While many people immediately think virtual machines when they hear virtualization, that is only one implementation of virtualization. Virtual memory, a mechanism implemented by all general purpose operating systems (OSs), gives applications the illusion that a computer’s memory is dedicated to them and can even give an application the experience of having access to much more RAM than the computer has available.

Containers are another type of virtualization, also referred to as OS Virtualization. Today’s containers on Linux create the perception of a fully isolated and independent OS to the application. To the running container, the local disk looks like a pristine copy of the OS files, the memory appears only to hold files and data of a freshly-booted OS, and the only thing running is the OS. To accomplish this, the “host” machine that creates a container does some clever things.

The first technique is namespace isolation. Namespaces include all the resources that an application can interact with, including files, network ports and the list of running processes. Namespace isolation enables the host to give each container a virtualized namespace that includes only the resources that it should see. With this restricted view, a container can’t access files not included in its virtualized namespace regardless of their permissions because it simply can’t see them. Nor can it list or interact with applications that are not part of the container, which fools it into believing that it’s the only application running on the system when there may be dozens or hundreds of others.

For efficiency, many of the OS files, directories and running services are shared between containers and projected into each container’s namespace. Only when an application makes changes to its container, for example by modifying an existing file or creating a new one, does the container get distinct copies from the underlying host OS – but only of those portions changed, using Docker’s “copy-on-write” optimization. This sharing is part of what makes deploying multiple containers on a single host extremely efficient.
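
The copy-on-write behavior can be illustrated with an in-memory analogy. This is only a sketch: real container runtimes use layered filesystems rather than Python dictionaries, and the paths and contents below are made up. Reads fall through to a shared base layer, while writes land in a private layer that holds only what changed.

```python
from collections import ChainMap

# Shared, read-only base layer (hypothetical file contents keyed by path).
base_layer = {"/etc/hostname": "host", "/bin/sh": "<shell binary>"}


def new_container(base: dict) -> ChainMap:
    """Give a container a private, initially empty layer on top of the shared base."""
    return ChainMap({}, base)


a = new_container(base_layer)
b = new_container(base_layer)

a["/etc/hostname"] = "container-a"  # the write lands only in a's private layer

print(a["/etc/hostname"])  # container-a
print(b["/etc/hostname"])  # host (b still reads the shared copy)
print(a.maps[0])           # {'/etc/hostname': 'container-a'}
```

Container `a` stores one changed entry, container `b` stores nothing, and both still read `/bin/sh` from the single shared base.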

Second, the host controls how much of the host’s resources can be used by a container. Governing resources like CPU, RAM and network bandwidth ensures that a container gets the resources it expects and that it doesn’t impact the performance of other containers running on the host. For example, a container can be constrained so that it cannot use more than 10% of the CPU. That means that even if the application within it tries, it can’t access the other 90%, which the host can assign to other containers or use itself. Linux implements such governance using a technology called “cgroups.” Resource governance isn’t required in cases where containers placed on the same host are cooperative, allowing for standard OS dynamic resource assignment that adapts to changing demands of application code.
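
As a rough illustration of the 10% example, the sketch below computes the value that cgroup v2 expects in a group's `cpu.max` file, which takes a quota and a period in microseconds. It assumes "10% of the CPU" means 10% of one CPU's time; the helper name is hypothetical, and the sketch only formats the value. Actually writing it requires root and a cgroup v2 hierarchy.

```python
def cpu_max_value(fraction: float, period_us: int = 100_000) -> str:
    """Format a CPU limit as the 'quota period' pair used by cgroup v2's cpu.max.

    fraction is the share of one CPU (0.10 means 10%); both numbers are microseconds.
    """
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    quota_us = round(fraction * period_us)
    return f"{quota_us} {period_us}"


print(cpu_max_value(0.10))  # 10000 100000
```

With that setting, processes in the group may run for at most 10,000 microseconds in every 100,000 microsecond window, and are throttled for the rest of the window.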

The combination of instant startup that comes from OS virtualization and reliable execution that comes from namespace isolation and resource governance makes containers ideal for application development and testing. During the development process, developers can quickly iterate. Because its environment and resource usage are consistent across systems, a containerized application that works on a developer’s system will work the same way on a different production system. The instant-start and small footprint also benefit cloud scenarios, since applications can scale-out quickly and many more application instances can fit onto a machine than if they were each in a VM, maximizing resource utilization.

Comparing a similar scenario that uses virtual machines with one that uses containers highlights the efficiency gained by the sharing. In the example shown below, the host machine has three VMs. In order to provide the applications in the VMs complete isolation, they each have their own copies of OS files, libraries and application code, along with a full in-memory instance of an OS. Starting a new VM requires booting another instance of the OS, even if the host or existing VMs already have running instances of the same version, and loading the application libraries into memory. Each application VM pays the cost of the OS boot and the in-memory footprint for its own private copies, which also limits the number of application instances (VMs) that can run on the host.

App Instances on Host

The figure below shows the same scenario with containers. Here, containers simply share the host operating system, including the kernel and libraries, so they don’t need to boot an OS, load libraries or pay a private memory cost for those files. The only incremental space they take is any memory and disk space necessary for the application to run in the container. While the application’s environment feels like a dedicated OS, the application deploys just like it would onto a dedicated host. The containerized application starts in seconds and many more instances of the application can fit onto the machine than in the VM case.

Containers on Host
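
The density argument in the two scenarios above comes down to simple arithmetic, sketched here. Every number is a made-up round figure chosen for illustration, not a measurement, and the function name is hypothetical. In the VM case each instance pays for its own OS; in the container case the OS cost is paid once.

```python
def max_instances(host_mem_mb: int, per_instance_overhead_mb: int, app_mem_mb: int,
                  shared_overhead_mb: int = 0) -> int:
    """How many app instances fit in host memory.

    per_instance_overhead_mb: memory each instance pays privately (e.g. a guest OS in a VM).
    shared_overhead_mb: memory paid once for all instances (e.g. the shared host OS).
    """
    usable = host_mem_mb - shared_overhead_mb
    return max(usable // (per_instance_overhead_mb + app_mem_mb), 0)


# All figures below are made-up round numbers for illustration only.
HOST_MB, OS_MB, APP_MB = 16_384, 1_024, 256

vms = max_instances(HOST_MB, per_instance_overhead_mb=OS_MB, app_mem_mb=APP_MB)
containers = max_instances(HOST_MB, per_instance_overhead_mb=0, app_mem_mb=APP_MB,
                           shared_overhead_mb=OS_MB)
print(vms, containers)  # 12 60
```

The ratio depends entirely on how large the per-instance OS footprint is relative to the application, so the gap shrinks for applications that are themselves memory hungry.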

Docker’s Appeal
Categories: Architecture

Sponsored Post: Surge, Redis Labs, Jut.io, VoltDB, Datadog, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 08/18/2015 - 16:56


  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

How Autodesk Implemented Scalable Eventing over Mesos

Mon, 08/17/2015 - 16:56

This is a guest post by Olivier Paugam, SW Architect for the Autodesk Cloud. I really like this post because it shows how bits of infrastructure--Mesos, Kafka, RabbitMQ, Akka, Splunk, Librato, EC2--can be combined together to solve real problems. It's truly amazing how much can get done these days by a small team.

I was tasked a few months ago to come up with a central eventing system, something that would allow our various backends to communicate with each other. We are talking about activity streaming backends, rendering, data translation, BIM, identity, log reporting, analytics, etc.  So something really generic with varying load, usage patterns and scaling profile.  And oh, also something that our engineering teams could interface with easily.  Of course every piece of the system should be able to scale on its own.

I obviously didn't have time to write too much code and picked up Kafka as our storage core as it's stable, widely used and works okay (please note I'm not bound to using it and could switch over to something else).  Now I of course could not expose it directly and had to front-end it with some API. Without thinking much I also rejected the idea of having my backend manage the offsets as it places too much constraint on how one deals with failures for instance.

So what did I end up with?

Categories: Architecture

Stuff The Internet Says On Scalability For August 14th, 2015

Fri, 08/14/2015 - 16:56

Hey, it's HighScalability time:


Being Google CEO: Nice. Becoming Tony Stark: Priceless (Alphabet)

 

  • $7: WeChat's revenue per user and there are 549 million of them; 60%: Etsy users using mobile; 10: times per second a self-driving car makes a decision; 900: calories in a litre of blood, vampires have very efficient metabolisms; 5 billion: the largest feature in the universe in light years

  • Quotable Quotes:
    • @sbeam: they finally had the Enigma machine. They opened the case. A card fell out. Turing picked it up. "Damn. They included a EULA." #oraclefanfic
    • kordless: compute and storage continue to track with Moore's Law but bandwidth doesn't. I keep wondering if this isn't some sort of universal limitation on this reality that will force high decentralization.
    • @SciencePorn: If you were to remove all of the empty space from the atoms that make up every human on earth, all humans would fit into an apple.
    • @adrianco: Commodity server with 1.4TB of RAM running a mix of 16GB regular DRAM and 128GB Memory1 modules.
    • @JudithNursalim: "One of the most scalable structure in history was the Roman army. Its unit: eight guys; the number of guys that fits in a tent" - Chris Fry
    • GauntletWizard: Google RPCs are fast. The RPC trace depth of many requests is >20 in milliseconds. Google RPCs are free - Nobody budgets for intradatacenter traffic. Google RPCs are reliable - Most teams hold themselves to a SLA of 4 9s, as measured by their customers, and many see >5 as the usual state of affairs.
    • @rzidane360: I am a Java library and I will start 50 threads and allocate a billion objects  on your behalf.
    • @codinghorror: From Sandy Bridge in Jan 2011 to Skylake in Aug 2015, x86 CPU perf increased ~25%. Same time for ARM mobile CPUs: ~800%.
    • @raistolo: "The cloud is not a cloud at all, it's a limited number of companies that have control over a large part of the Internet" @granick
    • Benedict Evans: since 1999 there are now roughly 10x more people online, US online revenues from ecommerce and advertising have risen 15x, and the cost of creating software companies has fallen by roughly 10x. 

  • App constellations aren't working. Is this another idea the West will borrow from the East? When One App Rules Them All: The Case of WeChat and Mobile in China: Chinese apps tend to combine as many features as possible into one application. This is in stark contrast to Western apps, which lean towards “app constellations”.

  • It doesn't get much more direct than this. Labellio: Scalable Cloud Architecture for Efficient Multi-GPU Deep Learning: The Labellio architecture is based on the modern distributed computing architectural concept of microservices, with some modification to achieve maximal utilization of GPU resources. At the core of Labellio is a messaging bus for deep learning training and classification tasks, which launches GPU cloud instances on demand. Running behind the web interface and API layer are a number of components including data collectors, dataset builders, model trainer controllers, metadata databases, image preprocessors, online classifiers and GPU­-based trainers and predictors. These components all run inside docker containers. Each component communicates with the others mainly through the messaging bus to maximize the computing resources of CPU, GPU and network, and share data using object storage as well as RDBMS.

  • How might your application architecture change using Lambda? Here's a nice example of Building Scalable and Responsive Big Data Interfaces with AWS Lambda. A traditional master-slave or job server model is not used; instead Lambda is used to connect streams or processes in a pipeline. Work is broken down into smaller, parallel operations on small chunks with Lambda functions doing the heavy lifting. The pipeline consists of an S3 key lister, an AWS Lambda invoker/result aggregator, and a Web client response handler.

  • The Indie Web folks have put together a really big list of Site Deaths, that is sites that have had their plugs pulled, bits blackened, dreams dashed. Take some time, look through, and say a little something for those that have gone before.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Why My Water Droplet Is Better Than Your Hadoop Cluster

Wed, 08/12/2015 - 16:56

We’ve had computation using slime mold and soap film, now we have computation using water droplets. Stanford bioengineers have built a “fully functioning computer that runs like clockwork - but instead of electrons, it operates using the movement of tiny magnetised water droplets.”

 

By changing the layout of the bars on the chip it's possible to make all the universal logic gates. And any Boolean logic circuit can be built by moving the little magnetic droplets around. Currently the chips are about half the size of a postage stamp and the droplets are smaller than poppy seeds.
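To make the "universal logic gates" claim concrete, here is a minimal sketch in Python showing how every other gate falls out of a single universal one. NAND is picked as the building block purely for illustration; it is an assumption here, not a statement about which gates the Stanford chip actually implements.

```python
def nand(a: bool, b: bool) -> bool:
    """The only primitive gate; everything below is wired from it."""
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

def xor(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Print the full truth table for the derived gates.
for a in (False, True):
    for b in (False, True):
        print(a, b, and_(a, b), or_(a, b), xor(a, b))
```

Swap the body of `nand` for droplets moving past magnetic bars and, in principle, the rest of the circuit does not care.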

What all this means I'm not sure, but pavo6503 has a comment that helps understand what's going on:

Logic gates pass high and low states. Since they plan to use drops of water as carriers and the substances in those drops to determine what the high/low state is they could hypothetically make a filter that sorts drops of water containing 1 to many chemicals. Pure water passes through unchanged. water with say, oil in it, passes to another container, water with alcohol to another. A "chip" with this setup could be used to purify water where there are many contaminants you want separated.

Categories: Architecture

How Google Invented an Amazing Datacenter Network Only They Could Create

Mon, 08/10/2015 - 16:56

 

Google with justly earned pride recently announced:

Today at the 2015 Open Network Summit, we are revealing for the first time the details of five generations of our in-house network technology. From Firehose, our first in-house datacenter network, ten years ago to our latest-generation Jupiter network, we’ve increased the capacity of a single datacenter network more than 100x. Our current generation — Jupiter fabrics — can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, such capacity would be enough for 100,000 servers to exchange information at 10Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.

Google’s datacenter network is the magic behind what makes much of Google really work. But what is “bisection bandwidth” and why does it matter? We talked about bisection bandwidth a while back in Changing Architectures: New Datacenter Networks Will Set Your Code And Data Free. In short, bisection bandwidth is the bandwidth available between the two halves of a network split down the middle, so it measures how much capacity Google servers have to talk to each other.
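The numbers in Google's announcement are easy to sanity check. The sketch below simply multiplies the quoted server count by the quoted per-server rate; treating the result as the aggregate figure is my reading of the quote, not something taken from the Jupiter paper.

```python
GBIT = 10**9    # bits per second in 1 Gb/s
PBIT = 10**15   # bits per second in 1 Pb/s

servers = 100_000             # figure quoted in the announcement
per_server_bps = 10 * GBIT    # 10 Gb/s each, also from the announcement

total_bps = servers * per_server_bps
print(total_bps / PBIT)  # 1.0
```

So 100,000 servers at 10Gb/s each works out to exactly 1 Pb/s, which matches the headline number.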

Historically datacenter networks were oriented around talking to users. Let’s say a request for a web page came in from a browser. The request would go to a server and a reply was crafted by talking to just a few other servers, or perhaps even none at all, and the reply would be sent back to the client. This style of network is called a North/South oriented network. Very little internal communication was needed to implement a request.

That all changed as website and API services grew richer over time. Now literally thousands of backend requests can be made to create a single web page. Mind blowing. This meant communication shifted from being dominated by talking to users to talking to other machines within a datacenter. So these are called East/West oriented networks.

The shift to East/West dominated communication patterns meant a different topology was needed for datacenter networks. The old traditional fat tree network designs were out and something new needed to take their place.

Google has been on the forefront of developing new rich service supportive network designs largely because of their guiding vision of seeing The Datacenter as a Computer. Once your datacenter is the computer then your network is equivalent to a backplane on a single computer, so it must be as fast and reliable as possible so remote disk and remote storage can be accessed as if they were local.

Google’s efforts revolve around a three pronged plan of attack: use a Clos topology, use SDN (Software Defined Networking), and build their own custom gear in their own Googlish way.

Until now we’ve had a limited exposure to Google’s network designs. While we don’t exactly have an all access pass, Amin Vahdat, Google Fellow and Technical Lead for networking at Google, shared a lot of juicy details in a great talk: ONS [Open Networking Summit] 2015: Wednesday Keynote. There’s also a paper: Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network.

Why release details earlier than they usually do? Google has some real competition with Amazon and they need to find compelling points of differentiation. Google hopes their datacenter network is one such point.

So what makes Google different? The overall message:

  • The end of Moore’s Law means how programs are built is changing.

  • Google has figured it out. Google knows how to build great networks and achieve proper datacenter balance.

  • You can prosper by taking advantage of the innovations and capabilities of Google’s Cloud Platform, the very same platform that powers Google Search.

  • So climb on board, the network is fine! 

Is that enough? Perhaps it's not a message with mass appeal, but it may find a home with the discriminating buyer. 

Some key points from the talk for me:

  • We don’t know how to build big networks that deliver lots of bandwidth. Google says their network provides 1 Pb/sec of total bisection bandwidth, but it turns out that’s not nearly enough. To support a datacenter’s worth of large compute servers you’ll need 5 Pb/sec networks. Keep in mind the entire internet today is probably near 200Tb/s.

  • It’s more efficient to schedule jobs over huge clusters. Otherwise you have leftover CPU in one place and leftover memory in another. So if you can build your system correctly, a datacenter scale computer gives you a decided economy of scale.

  • Google built their datacenter network system using lessons they learned from the server and storage world: scale out, logically centralize, use commodity components, and never ever manage singlets of anything. Manage all your servers, storage, and networks as a unified whole.

  • The I/O gap is huge. Amin says it has to get solved; if it doesn’t then we’ll stop innovating. Storage capacity has increased through disaggregation. The opportunity is to access global datacenter storage as if it were local. This will get harder and harder with flash and NVM. A new tier of flash and NVM will completely change programming models. Note: unfortunately he didn’t expand on this notion, I dearly wished he had. Amin, can we talk?
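The point above about leftover CPU in one place and leftover memory in another can be shown with a toy example. This is a minimal sketch with made-up numbers: the `Pool` type, the cluster sizes and the job size are all hypothetical, and it models each cluster as one aggregate pool, ignoring per-machine placement.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    cpu: int     # free cores
    mem_gb: int  # free memory in GB

def fits(pool: Pool, cpu: int, mem_gb: int) -> bool:
    """A job fits only if the pool has enough of both resources."""
    return pool.cpu >= cpu and pool.mem_gb >= mem_gb

# Hypothetical leftovers: A has spare cores, B has spare memory.
cluster_a = Pool(cpu=64, mem_gb=16)
cluster_b = Pool(cpu=4, mem_gb=512)
job = (32, 256)  # needs 32 cores and 256 GB

separately = fits(cluster_a, *job) or fits(cluster_b, *job)

# Scheduled as one big cluster, the leftovers add up.
merged = Pool(cluster_a.cpu + cluster_b.cpu, cluster_a.mem_gb + cluster_b.mem_gb)
together = fits(merged, *job)

print(separately, together)  # False True
```

The merged pool only behaves like one pool if the network between the two halves is fast enough, which is the whole argument for spending so much on bisection bandwidth.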

What you look for in a good story are characters that act from a core identity. Here we see Google operating from a unique vision that grew organically from their deep experience building scalable software systems. Probably only Google would have had the guts to follow their vision through and build a datacenter network so completely different from accepted wisdom. That takes huge huevos. And it makes for a good story.

Here’s my hopelessly inadequate gloss on the talk:

Categories: Architecture

Stuff The Internet Says On Scalability For August 7th, 2015

Fri, 08/07/2015 - 16:56

Hey, it's HighScalability time:


A feather? Brass relief? River valley? Nope. It's frost on mars!
  • $10 billion: Microsoft data center spend per year; 1: hours from London to New York at mach 4.5; 1+: million Facebook requests per second; 25TB: raw data collected per day at Criteo; 1440: minutes in a day; 2.76: farthest distance a human eye can detect a candle flame in kilometers.

  • Quotable Quotes:
    • @drunkcod: IT is a cost center you say? Ok, let's shut all the servers down until you figure out what part of revenue we contribute to.
    • Beacon 23: I’m here because they ain’t made a computer yet that won’t do something stupid one time out of a hundred trillion. Seems like good odds, but when computers are doing trillions of things a day, that means a whole lot of stupid. 
    • @johnrobb: China factory: Went from 650 employees to 60 w/ robots. 3x production increase.  1/5th defect rate.
    • @twotribes: "Metrics are the internet’s heroin and we’re a bunch of junkies mainlining that black tar straight into the jugular of our organizations."
    • @javame: @adrianco I've seen a 2Tb erlang monolith and I don't want to see that again cc/@martinfowler
    • @micahjay1: Thinking about @a16z podcast about bio v IT ventures. Having done both, big diff is cost to get started and burn rate. No AWS in bio...yet
    • @0xced: XML: 1996 XLink: 1997 XML-RPC: 1998 XML Schema: 1998 JSON: 2001 JSON-LD: 2010 JSON-RPC: 2005 JSON Schema: 2009 
    • Inside the failure of Google+: What people failed to understand was Facebook had network effects. It’s like you have this grungy night club and people are having a good time and you build something next door that’s shiny and new, and technically better in some ways, but who wants to leave? People didn't need another version of Facebook.
    • @bdu_p: Old age and treachery will beat youth and skill every time. A failed attempt to replace unix grep 

  • The New World looks a lot like the old Moscow. The Master of Disguise: My Secret Life in the CIA: we assume constant surveillance. This saturation level of surveillance, which far surpassed anything Western intelligence services attempted in their own democratic societies, had greatly constrained CIA operations in Moscow for decades.

  • How Netflix made their website startup time 70% faster. They removed a lot of server side complexity by moving to mostly client side rendering. Java, Tomcat, Struts, and Tiles were replaced with Node.js and React.js.  They call this Universal JavaScript, JavaScript on the server side and the client side. "Using Universal JavaScript means the rendering logic is simply passed down to the client." Only a bootstrap view is rendered on the server with everything else rendered incrementally on the client.

  • How Facebook fights spam with Haskell. Haskell is used as an expressive, latency sensitive rules engine. Sitting at the front of the ingestion point pipeline, it synchronously handles every single write request to Facebook and Instagram. That's more than one million requests per second. So not so slow. Haskell works well because it's a purely functional strongly typed language, supports hot swapping, supports implicit concurrency, performs well, and supports interactive development. Haskell is not used for the entire stack however. It's sandwiched. On the top there's C++ to process messages and on the bottom there's C++ client code that interacts with other services. Key design decision: rules can't make writes, which means an abstract syntax tree of fetches can be overlapped and batched. 

  • You know how kids these days don't know the basics, like how eggs come from horses or that milk comes from chickens? The disassociation disorder continues. Now Millions of Facebook users have no idea they’re using the internet: A while back, a highly-educated friend and I were driving through an area that had a lot of data centers. She asked me what all of those gigantic blocks of buildings contained. I told her that they were mostly filled with many servers that were used to host all sorts of internet services. It completely blew her mind. She had no idea that the services that she and billions of others used on their phones actually required millions and millions of computers to transmit and process the data.

  • History rererepeats itself. Serialization is still evil. Improving Facebook's performance on Android with FlatBuffers:  It took 35 ms to parse a JSON stream of 20 KB...A JSON parser needs to build field mappings before it can start parsing, which can take 100 ms to 200 ms...FlatBuffers is a data format that removes the need for data transformation between storage and the UI...Story load time from disk cache is reduced from 35 ms to 4 ms per story...Transient memory allocations are reduced by 75 percent...Cold start time is improved by 10-15 percent.
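The FlatBuffers item is easier to appreciate with a small contrast between parsing and reading in place. The sketch below is not the FlatBuffers wire format (which uses offset tables rather than fixed records); the `STORY` layout and field names are hypothetical, and Python's standard `struct` module stands in for the real library. It only illustrates the idea of pulling one field straight out of a buffer instead of parsing everything first.

```python
import json
import struct

# Hypothetical fixed record: uint32 id, uint32 like count, float64 timestamp.
# "<" means little-endian with no padding, so each record is 16 bytes.
STORY = struct.Struct("<IId")

def read_likes(buf: bytes, index: int) -> int:
    """Read one field in place: record offset plus 4 bytes to skip the id."""
    (likes,) = struct.unpack_from("<I", buf, index * STORY.size + 4)
    return likes

stories = [(101, 7, 1438700000.0), (102, 42, 1438700100.0), (103, 3, 1438700200.0)]

# Binary form: records laid end to end, readable without a parse step.
buf = b"".join(STORY.pack(*s) for s in stories)

# JSON form: the whole document is parsed into objects before any field is usable.
as_json = json.dumps([{"id": i, "likes": l, "ts": t} for i, l, t in stories])

likes_json = json.loads(as_json)[1]["likes"]
likes_bin = read_likes(buf, 1)
print(likes_json, likes_bin)  # 42 42
```

The JSON path allocates a list and three dictionaries to answer one question; the binary path allocates a single integer. That difference is where the reduced transient allocations come from.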

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

How do you program a computer with 10 terabytes of RAM?

Wed, 08/05/2015 - 16:56

How do you program a computer with 10 terabytes of RAM in a single address space?  When the great Adrian Cockcroft was interviewed for Enterprise Initiatives Episode blog, that’s one of the answers he gave to the question of “What’s the next big thing?”

Adrian says we are already taking big machines and running tiny little containers on them. He thinks another interesting workload is huge memory systems. Building computers with many terabytes of main memory will soon be affordable. We already know the JVM has problems garbage collecting on machines with 10s of gigabytes of RAM. What about machines with terabytes of RAM? We don’t really have the programming models worked out yet. It may be that garbage collected languages won't make the cut.

Sounds like a good idea for a post, right? Here’s the problem: I found surprisingly little on huge memory systems. If you have any ideas on good sources please leave a comment. Here’s some of what I did find…

SGI’s 64TB Computer
Categories: Architecture

Sponsored Post: Surge, Redis Labs, Jut.io, VoltDB, Datadog, Power Admin, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 08/04/2015 - 16:56

Who's Hiring?
  • VoltDB's in-memory SQL database combines streaming analytics with transaction processing in a single, horizontal scale-out platform. Customers use VoltDB to build applications that process streaming data the instant it arrives to make immediate, per-event, context-aware decisions. If you want to join our ground-breaking engineering team and make a real impact, apply here.  

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?
Cool Products and Services
  • MongoDB Management Made Easy. Gain confidence in your backup strategy. MongoDB Cloud Manager makes protecting your mission critical data easy, without the need for custom backup scripts and storage. Start your 30 day free trial today.

  • In a recent benchmark for NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance obliterated Couchbase, Cassandra and Aerospike in this real life, write-intensive use case. Full backstage pass and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Seven of the Nastiest Anti-patterns in Microservices

Mon, 08/03/2015 - 16:56

Daniel Bryant gave an energetic talk at Devoxx UK 2015 on lessons learned from over five years of experience with microservice based projects. The talk: The Seven Deadly Sins of Microservices: Redux (video, slides).

If you don't want to risk your immortal API then be sure to avoid:

  1. Lust - using the latest and greatest tech with the idea it will solve all your problems. It won't. Do you really need microservices at all? If you do go microservices do you really need new tech in your stack? Choose boring technology. Know why you are choosing something. A monolith can perform better, and because a monolith can be developed faster it may also be the correct choice for proving your business case.
  2. Gluttony - excessive communication protocols. Projects often have a crazy number of protocols for gluing parts together. Standardize on the glue across an organization. Choose one synchronous and one asynchronous protocol. Don't gold-plate.
  3. Greed - all your service are belong to us. Do not underestimate the impact moving to a microservice approach will have on your organization. Your business organization needs to change to take advantage of microservices. Typically orgs will have silos between Dev, QA, and Ops with even more silos inside each silo like front-end, middleware, and database. Use cross functional teams like Spotify, Amazon, and Gilt. Connect rather than divide your company. 
  4. Sloth - creating a distributed monolith. If you can't deploy your services independently then they aren't microservices. Decouple. Transform data at a less central part of the stack. Some options are schema-first design and consumer-driven contracts.
  5. Wrath - blowing up when bad things happen. Bad things happen all the time so you need to test. Microservices are inherently distributed so you have network problems to deal with that weren't a problem in a monolith. The book Release It! has a lot of good fault tolerance patterns. Operationally you need to implement continuous delivery, agile, and devops. Test for failures using real life disaster scenarios testing, live injection failure testing, and something like Netflix's Simian Army.
  6. Envy - the shared single domain fallacy. A lot of time has been spent building and perfecting the model of a single domain. There's one big database with a unified schema. Microservices decompose a system along different lines and that can cause contention in an organization. Reports can be generated using pull by service or data pumps with events. 
  7. Pride - testing in the world of transience. Does your stuff really work? We all make mistakes. Think testing at the developer level, operational level, and business level. Surprisingly little has been written about testing microservices. Invest in your build pipeline testing. Some tools: Serenity BDD, Wiremock/Saboteur, Jenkins Performance Plugin. Testing in production is an emerging idea with companies that deploy many microservices.
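The Wrath item points at the fault tolerance patterns in Release It!, and the circuit breaker is probably the best known of them. Here is a minimal sketch of the idea in Python. It is not taken from the talk or the book's code; the class name, the thresholds and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time

class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that keeps failing."""

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests need no sleeping
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: let one trial call through. The failure count is
            # still at the threshold, so one more failure re-opens at once.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever hypothetical client function talks to the other service. Once the dependency has failed enough times, callers get an immediate `CircuitOpenError` rather than piling up on timeouts.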
Categories: Architecture

Stuff The Internet Says On Scalability For July 31st, 2015

Fri, 07/31/2015 - 16:56

Hey, it's HighScalability time:


Where does IBM's Watson or Google Translate fit? (SciencePorn)
  • 40Tb/s: Bandwidth for Windows 10 launch; $4.04B: Facebook Q2 revenue; 37M: Americans who don't use the web;
  • Quotable Quotes:
    • @BoredElonMusk: We would have already discovered Earth 6.0 if NASA got the same budget as the DOD.
    • David Blight~ Something I've always believed as a historian and more and more it seems true to me is what really moves history, or brings about change in rather sudden and almost always unpredictable ways, is events. 
    • Quentyn Kennemer: Tom Brady replaces Android with iPhone, gets suspended 4 games
    • @BenedictEvans: Apple Maps has ~300m users to iOS GMaps 100m, of 4-500m iPhones. Spotify has 20m paying & 70m free users. And then there’s YouTube
    • Ben: Some scale problems should go unsolved. No. Most scale problems should go unsolved.
    • @mikedicarlo: 3.5 million Redis ops per/sec across our cluster. Wondering how that compares with other production deployments out there. 
    • @Carnage4Life: $1 billion valuation for a caller ID app with $800K in revenues? Unicorn valuations are officially meaningless 

  • Is shooting a trespasser filming a video of your potentially intimate moments considered a crime? Kentucky man shoots down drone hovering over his backyard

  • Death through premature scaling. Larry Berman determined this was the cause of death of RewardMe, his once scrappy startup. In the next turn of the wheel the dharma is: Be a 1-man growth team; Get customers online as opposed to through a long sales cycle; Don’t hold inventory; Focus on product and support. The new enlightenment: Don’t scale until you’re ready for it. Cash is king, and you need to extend your runway as long as possible until you’ve found product market fit. 

  • What about scaling for the rest of us? That's the topic addressed in Scaling Ruby Apps to 1000 Requests per Minute - A Beginner's Guide. A very good resource. It goes into explaining the path of request through Heroku. Dispels some myths like scaling up makes a system faster. Explains queue time. And other good stuff. 

  • Not quite as sexy as Zero Point energy, but 3D Xpoint memory sounds pretty cool: Intel and Micron have unveiled what appears to be the holy grail of memory. Called 3D XPoint (pronounced "cross point"), this is an entirely new type of non-volatile memory, with roughly 1,000 times the performance and 1,000 times the endurance of conventional NAND flash, while also being 10 times denser than conventional DRAM.

  • So what is 3D XPoint Memory really? Here's a great analysis at DailyTech by Jason Mick. More than analysis, it's a detective story. Jason puts together clues from history and recently filed patents to deduce that this new wonder RAM is most likely to be PRAM or Phase-change Memory, that stores data "in the form of a phase change to a tiny atomic-level structure." Jason thinks "any usage scenarios, it may be possible to run exclusively off PRAM." Forgetting just got even harder.

  • Damn. I may die after all. The Brain vs Deep Learning Part I: Computational Complexity — Or Why the Singularity Is Nowhere Near: My model shows that it can be estimated that the brain operates at least 10^21 operations per second. With current rates of growth in computational power we could achieve supercomputers with brain-like capabilities by the year 2037, but estimates after the year 2080 seem more realistic.

  • It has always struck me that telcos who desperately want to get into the cloud business, where they are just an also-ran, control some of the most desired potential colo space in the world: cell towers. Turn those towers into location-aware clouds and we can really get some revolutionary edge computing going on. Transiting traffic back to a centralized cloud is such a waste. Could 'Supercomputing at the Edge' provide a scalable platform for new mobile services?

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

How Debugging is Like Hunting Serial Killers

Thu, 07/30/2015 - 16:56

Warning: A quote I use in this article is quite graphic. That's the power of the writing, but if you are at all squirmy you may want to turn back now.

Debugging requires a particular sympathy for the machine. You must be able to run the machine and networks of machines in your mind while simulating what-ifs based on mere wisps of insight.

There's another process that is surprisingly similar to debugging: hunting down serial killers.

I ran across this parallel while reading Mindhunter: Inside the FBI's Elite Serial Crime Unit by John E. Douglas, an FBI profiler whose specialty is the dark debugging of twisted human minds.

Here's how John describes profiling:

You have to be able to re-create the crime scene in your head. You need to know as much as you can about the victim so that you can imagine how she might have reacted. You have to be able to put yourself in her place as the attacker threatens her with a gun or a knife, a rock, his fists, or whatever. You have to be able to feel her fear as he approaches her. You have to be able to feel her pain as he rapes her or beats her or cuts her. You have to try to imagine what she was going through when he tortured her for his sexual gratification. You have to understand what it’s like to scream in terror and agony, realizing that it won’t help, that it won’t get him to stop. You have to know what it was like. And that is a heavy burden to have to carry.

Serial killers are like bugs in the societal machine. They hide. They blend in. They can pass for "normal" which makes them tough to find. They attack weakness causing untold damage until caught. And they will keep causing damage until caught. They are always hunting for opportunity.

After reading the book I'm quite grateful that the only bugs I've had to deal with are of the computer variety. The human bugs are very very scary.

Here are some other quotes from the book you may also appreciate:

Categories: Architecture

A Well Known But Forgotten Trick: Object Pooling

Wed, 07/29/2015 - 16:56

This is a guest repost by Alex Petrov. Find the original article here.

Most problems are quite straightforward to solve: when something is slow, you can either optimize it or parallelize it. When you hit a throughput barrier, you partition the workload across more workers. However, when you face problems that involve Garbage Collection pauses, or simply hit the limits of the virtual machine you're working with, they get much harder to fix.

When you're working on top of a VM, you may face things that are simply out of your control, namely time drifts and latency. Fortunately, there are plenty of battle-tested solutions, though they require some understanding of how the JVM works.

If you can serve 10K requests per second while conforming to certain performance parameters (memory and CPU), it doesn't automatically mean that you'll be able to scale linearly to 20K. If you're allocating too many objects on the heap, or wasting CPU cycles on something that can be avoided, you'll eventually hit the wall.

The simplest (yet underrated) way of saving on memory allocations is object pooling. Even though the concept sounds similar to just pooling objects and socket descriptors, there's a slight difference.

When we're talking about socket descriptors, we have a limited, rather small number of descriptors (tens, hundreds, or at most thousands) to go through. These resources are pooled because of their high initialization cost (establishing a connection, performing a handshake over the network, memory-mapping a file, or whatever else). In this article we'll talk about pooling larger numbers of short-lived objects that are not so expensive to initialize, to save allocation and deallocation costs and avoid memory fragmentation.
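To make the idea concrete, here is a minimal sketch of such a pool in Java. It is not the implementation from the original article: the class name SimpleObjectPool, the reset callback, and the choice of a bounded ArrayBlockingQueue are assumptions made for illustration. Released objects are cleared and kept for reuse, and a new object is allocated only when the pool is empty.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical minimal pool; names and design are illustrative assumptions,
// not taken from the original article.
final class SimpleObjectPool<T> {
    private final ArrayBlockingQueue<T> idle; // bounded, thread-safe store of free objects
    private final Supplier<T> factory;        // creates a new object when none is idle
    private final Consumer<T> reset;          // clears object state before reuse

    SimpleObjectPool(int capacity, Supplier<T> factory, Consumer<T> reset) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
        this.reset = reset;
    }

    // Reuse an idle instance if one exists, otherwise allocate a new one.
    T acquire() {
        T obj = idle.poll(); // non-blocking; returns null when the pool is empty
        return obj != null ? obj : factory.get();
    }

    // Clear the object and hand it back. If the pool is already full,
    // offer() returns false and the object is simply left to the GC.
    void release(T obj) {
        reset.accept(obj);
        idle.offer(obj);
    }

    // Number of objects currently waiting to be reused.
    int idleCount() {
        return idle.size();
    }
}
```

For example, new SimpleObjectPool<>(1024, StringBuilder::new, sb -> sb.setLength(0)) would hand out reusable StringBuilder instances. The caller must not touch an object after releasing it, which is the main hazard of pooling, and whether a pool beats plain allocation on a given JVM is something to measure rather than assume.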

Object Pooling
Categories: Architecture