High Scalability - Building bigger, faster, more reliable websites

Stuff The Internet Says On Scalability For December 19th, 2014

Fri, 12/19/2014 - 18:12

Hey, it's HighScalability time:


Brilliant & hilarious keynote to finish the day at #yow14 (Matt)

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

The Big Problem is Medium Data

Wed, 12/17/2014 - 17:56

This is a guest post by Matt Hunt, who leads open source projects for Bloomberg LP R&D. 

“Big Data” systems continue to attract substantial funding, attention, and excitement. As with many new technologies, they are neither a panacea, nor even a good fit for many common uses. Yet they also hold great promise. The question is, can systems originally designed to serve hundreds of millions of requests for something like web pages also work for requests that are computationally expensive and have tight tolerances?

Modern era big data technologies are a solution to an economics problem faced by Google and other Internet giants a decade ago. Storing, indexing, and responding to searches against all web pages required tremendous amounts of disk space and computer power. Very powerful machines, fast SAN storage, and data center space were prohibitively expensive. The solution was to pack cheap commodity machines as tightly together as possible with local disks.

This addressed the space and hardware cost problem, but introduced a software challenge. Writing distributed code is hard, and with many machines comes many failures. So a framework was also required to take care of such problems automatically for the system to be viable.

Hadoop

Right now, the industry is in a transition phase that began with the arrival of Hadoop and its community in 2004. Understanding why and how these systems were created also offers insight into some of their weaknesses.

At Bloomberg we don’t have a big data problem. What we have is a “medium data” problem -- and so does everyone else. Systems such as Hadoop and Spark are generally less efficient and less mature for these typical low latency enterprise uses. High core counts, SSDs, and large RAM footprints are common today, but many of the commodity platforms have yet to take full advantage of them, and challenges remain. A number of distributed components are further hampered by Java, which creates its own complications for low latency performance.

A practical use case
Categories: Architecture

Multithreaded Programming has Really Gone to the Dogs

Tue, 12/16/2014 - 17:56

Taken from Multithreaded programming - theory and practice on reddit, which also has some very funny comments. If anything this is way too organized. 

 What's not shown? All the little messes that have to be cleaned up after...

Categories: Architecture

The Machine: HP's New Memristor Based Datacenter Scale Computer - Still Changing Everything

Tue, 12/16/2014 - 17:52

The end of Moore’s law is the best thing that’s happened to computing in the last 50 years. Moore’s law has been a tyranny of comfort. You were assured your chips would see a constant improvement. Everyone knew what was coming and when it was coming. The entire semiconductor industry was held captive to delivering on Moore’s law. There was no new invention allowed in the entire process. Just plod along on the treadmill and do what was expected. We are finally breaking free of these shackles and entering what is the most exciting age of computing that we’ve seen since the late 1940s. Finally we are in a stage where people can invent and those new things will be tried out and worked on and find their way into the market. We’re finally going to do things differently and smarter.

-- Stanley Williams (paraphrased)

HP has been working on a radically new type of computer, enigmatically called The Machine (not this machine). The Machine is perhaps the largest R&D project in the history of HP. It’s a complete rebuild of both hardware and software from the ground up. A massive effort. HP hopes to have a small version of their datacenter scale product up and running in two years.

The story began when we first met HP’s Stanley Williams about four years ago in How Will Memristors Change Everything? In the latest chapter of the memristor story, Mr. Williams gives another incredible talk: The Machine: The HP Memristor Solution for Computing Big Data, revealing more about how The Machine works.

The goal of The Machine is to collapse the memory/storage hierarchy. Computation today is energy inefficient. Eighty percent of the energy and vast amounts of time are spent moving bits between hard disks, memory, processors, and multiple layers of cache. Customers end up spending more money on power bills than on the machines themselves. So The Machine has no hard disks, DRAM, or flash. Data is held in power efficient memristors, an ion based nonvolatile memory, and data is moved over a photonic network, another very power efficient technology. When a bit of information leaves a core it leaves as a pulse of light.

On graph processing benchmarks The Machine reportedly performs 2-3 orders of magnitude better based on energy efficiency and one order of magnitude better based on time. There are no details on these benchmarks, but that’s the gist of it.

The Machine puts data first. The concept is to build a system around nonvolatile memory with processors sprinkled liberally throughout the memory. When you want to run a program you send the program to a processor near the memory, do the computation locally, and send the results back. Computation uses a wide range of heterogeneous multicore processors. By transmitting only the bits required for the program and the results, the savings are enormous compared to moving terabytes or petabytes of data around.
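A rough back-of-the-envelope sketch of that saving, in Python. The sizes used here (a 10 MB program, a 1 KB result, a 1 TB dataset) are hypothetical values picked only to illustrate the argument; none of them come from the talk.

```python
# Hypothetical sizes, chosen only to illustrate the argument; not from the talk.
PROGRAM_BYTES = 10 * 10**6   # 10 MB of code shipped to a processor near the data
RESULT_BYTES = 1 * 10**3     # 1 KB answer sent back
DATASET_BYTES = 1 * 10**12   # 1 TB of data that would otherwise be moved


def bytes_moved_data_to_compute(dataset_bytes: int, result_bytes: int) -> int:
    """Conventional approach: pull the whole dataset to the processor."""
    return dataset_bytes + result_bytes


def bytes_moved_compute_to_data(program_bytes: int, result_bytes: int) -> int:
    """Data-first approach: send the program to the data, return only results."""
    return program_bytes + result_bytes


conventional = bytes_moved_data_to_compute(DATASET_BYTES, RESULT_BYTES)
data_first = bytes_moved_compute_to_data(PROGRAM_BYTES, RESULT_BYTES)
reduction = conventional / data_first

print(f"conventional: {conventional:,} bytes moved")
print(f"data-first:   {data_first:,} bytes moved")
print(f"reduction:    ~{reduction:,.0f}x")
```

With these assumed sizes the traffic drops by roughly five orders of magnitude. The real ratio depends entirely on the workload, and the sketch ignores the energy cost per bit, which is the other half of HP's argument.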

The Machine is not targeted at standard HPC workloads. It’s not a LINPACK buster. The problem HP is trying to solve for their customers is where a customer wants to perform a query and figure out the answer by searching through a gigantic pile of data. These are problems that need to store lots of data and analyze it in realtime as new data comes in.

Why is a very different architecture needed for building a computer? Computer systems can’t keep up with the flood of data that’s coming in. HP is hearing from their customers that they need the ability to handle ever greater amounts of data. The number of bits being collected is growing exponentially faster than the rate at which transistors are being manufactured. It’s also the case that information collection is growing faster than the rate at which hard disks are being manufactured. HP estimates there are 250 trillion DVDs worth of data that people really want to do something with. Vast amounts of the data being collected in the world are never even looked at.

So something new is needed. That’s at least the bet HP is making. While it’s easy to get excited about the technology HP is developing, it won’t be for you and me, at least until the end of the decade. These will not be commercial products for quite a while. HP intends to use them for their own enterprise products, internally consuming everything that’s made. The idea is we are still very early in the tech cycle, so high cost systems are built first, then as volumes grow and processes improve, the technology will be ready for commercial deployment. Eventually costs will come down enough that smaller form factors can be sold.

What is interesting is HP is essentially building its own cloud infrastructure, but instead of leveraging commodity hardware and software, they are building their own best of breed custom hardware and software. A cloud typically makes available vast pools of memory, disk, and CPU, organized around instance types which are connected by fast networks. Recently there’s a move to treat these resource pools as independent of the underlying instances. So we are seeing high level scheduling software like Kubernetes and Mesos becoming bigger forces in the industry. HP has to build all this software themselves, solving many of the same problems, along with the opportunities provided by specialized chips. You can imagine programmers programming very specialized applications to eke out every ounce of performance from The Machine, but what is more likely is HP will have to create a very sophisticated scheduling system to optimize how programs run on top of The Machine. What's next in software is the evolution of a kind of Holographic Application Architecture, where function is fluid in both time and space, and identity arises at run-time from a two-dimensional structure. Schedule optimization is the next frontier being explored on the cloud.

The talk is organized in two broad sections: hardware and software. Two-thirds of the project is software, but Mr. Williams is a hardware guy, so hardware makes up the majority of the talk.  The hardware section is based around the idea of optimizing the various functions around the physics that is available: electrons compute; ions store; photons communicate.

Here’s my gloss on Mr. Williams’ talk. As usual with such a complex subject much can be missed. Also, Mr. Williams tosses huge interesting ideas around like pancakes, so viewing the talk is highly recommended. But until then, let’s see The Machine HP thinks will be the future of computing….

Categories: Architecture

Stuff The Internet Says On Scalability For December 12th, 2014

Fri, 12/12/2014 - 17:56

Hey, it's HighScalability time:


We've had a wee bit of a storm in the bay area.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Reactive prefetch speeds Google's mobile search by 100-150 milliseconds.

Wed, 12/10/2014 - 17:56

Increasing responsiveness by parallelizing and prefetching content using hints and dependency graphs is an old concept, but seldom do we see such a nice tight example of the benefit as is given by the great Ilya Grigorik in this G+ post:

The insight here is that we're initiating the fetch for the HTML and its critical resources in parallel... which requires that the page initiating the navigation knows which critical resources are being used on the target page.

This is a powerful pattern and one that you can use to accelerate your site as well. The key insight is that we are not speculatively prefetching resources and do not incur unnecessary downloads. Instead, we wait for the user to click the link and tell us exactly where they are headed, and once we know that, we tell the browser which other resources it should fetch in parallel - aka, reactive prefetch!

As you can infer, implementing the above strategy requires a lot of smarts both in the browser and within the search engine... First, we need to know the list of critical resources that may delay rendering of the destination page for every page on the web! No small feat, but the Search team has us covered - they're good like that. Next, we need a browser API that allows us to invoke the prefetch logic when the click occurs: the search page listens for the click event, and once invoked, dynamically inserts prefetch hints into the search results page. Finally, this is where Chrome comes in: as the search results page is unloaded, the browser begins fetching the hinted resources in parallel with the request for the destination page. The net result is that the critical resources are fetched much sooner, allowing the browser to render the destination page 100-150 milliseconds earlier.
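A minimal sketch of why this helps, written in Python rather than browser code. It is not Google's implementation: the resource table, the URLs, and the latency numbers are all made up for illustration, and the timing model is deliberately simplistic (it ignores bandwidth contention, parsing, and connection setup).

```python
from typing import Dict, List

# Hypothetical table of critical resources per destination page. In the post,
# the Search team supplies this knowledge; the URLs here are invented.
CRITICAL_RESOURCES: Dict[str, List[str]] = {
    "https://example.com/article": [
        "https://example.com/static/site.css",
        "https://example.com/static/app.js",
    ],
}


def hints_for_click(destination: str) -> List[str]:
    """On click, return the resources to fetch alongside the destination HTML."""
    return CRITICAL_RESOURCES.get(destination, [])


def ready_without_hints(html_ms: float, resource_ms: Dict[str, float]) -> float:
    """Resources are discovered only after the HTML arrives, so the times add."""
    return html_ms + max(resource_ms.values(), default=0.0)


def ready_with_hints(html_ms: float, resource_ms: Dict[str, float]) -> float:
    """HTML and hinted resources are requested together; the slowest one wins."""
    return max(html_ms, max(resource_ms.values(), default=0.0))


# Illustrative latencies in milliseconds (assumed, not measured).
destination = "https://example.com/article"
html_ms = 300.0
resource_ms = {url: 120.0 for url in hints_for_click(destination)}

baseline = ready_without_hints(html_ms, resource_ms)  # 300 + 120
prefetched = ready_with_hints(html_ms, resource_ms)   # max(300, 120)
print(f"without hints: {baseline:.0f} ms, with hints: {prefetched:.0f} ms")
print(f"saved: {baseline - prefetched:.0f} ms")
```

The point of the model is that nothing is fetched until the click, so no download is wasted. The saving comes purely from overlapping requests that would otherwise run one after the other.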

Categories: Architecture

In Memory: Grace Hopper to Programmers: Mind Your Nanoseconds!

Wed, 12/10/2014 - 00:22

This is an article published a few years ago, but as today is Grace Hopper's birthday I thought it would be a good time to share again an amazing talk from this amazing woman.

Computing pioneer Grace Hopper, inventor of the compiler, searched for a concrete way to create an intuitive understanding of just how fast a nanosecond is: a billionth of a second, which was the speed of their new computer circuits. As an illustration she settled on the length of wire that light can travel in one nanosecond. The length is a very portable 11.8 inches. A microsecond's worth of wire is still portable, but a much bulkier 984 feet. In one millisecond light travels 186 miles, which only Hercules could carry. In today's terms, at a 3.06 GHz clock speed, there's about 0.33 nanoseconds between ticks, or about 3.86 inches of light travel.
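The distances in this paragraph follow directly from the speed of light in a vacuum, so they are easy to recompute. A short Python check, using only the defined value of the speed of light and standard unit conversions (the function and variable names are just for this example):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METRES_PER_INCH = 0.0254              # exact
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280


def light_travel_inches(seconds: float) -> float:
    """Distance light covers in a vacuum during `seconds`, in inches."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METRES_PER_INCH


nanosecond_in = light_travel_inches(1e-9)
microsecond_ft = light_travel_inches(1e-6) / INCHES_PER_FOOT
millisecond_mi = light_travel_inches(1e-3) / INCHES_PER_FOOT / FEET_PER_MILE

# One clock tick at 3.06 GHz.
clock_hz = 3.06e9
tick_ns = 1e9 / clock_hz
tick_in = light_travel_inches(1 / clock_hz)

print(f"1 ns: {nanosecond_in:.1f} inches")
print(f"1 us: {microsecond_ft:.0f} feet")
print(f"1 ms: {millisecond_mi:.0f} miles")
print(f"one tick at 3.06 GHz: {tick_ns:.2f} ns, {tick_in:.2f} inches")
```

Signals in real wire travel slower than light in a vacuum, so these lengths are an upper bound on how far a signal gets in the given time.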

Understanding the profligate ways of programmers, she suggests that every programmer wear a necklace of a microsecond's worth of wire so they know what they are wasting when they throw away microseconds. And if a General is busting your chops about satellite messages taking too long to send, you can bust out your piece of wire and explain there are a lot of nanoseconds between here and there.

Here's a short, witty, and wise video of her famous nanosecond demonstration. An amazing lady, great innovator, an engaging speaker, and an inspiring teacher.

Categories: Architecture

Sponsored Post: Campanja, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Tue, 12/09/2014 - 17:56

Who's Hiring?
  • Campanja is an Internet advertising optimization company born in the cloud, and today we are one of the Nordics' bigger AWS consumers. The time has come for us to embrace the next generation of cloud infrastructure. We believe in immutable infrastructure, container technology, and microservices; we hope to use PaaS when we can get away with it but consume at the IaaS layer when we have to. Please apply here.

  • Performance and Scale Engineer. At Sprout Social, you will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/
Cool Products and Services
  • Aerospike Hits 1M writes per second with 6x Fewer Servers than Cassandra. A new Google Compute Engine benchmark demonstrates how the Aerospike database hit 1 million writes per second with just 50 nodes - compared to Cassandra's 300 nodes. Read the benchmark: http://www.aerospike.com/blog/1m-wps-6x-fewer-servers-than-cassandra/

  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key-Value Store will have free access to SQL Layer. SQL Layer is also open source; you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Stuff The Internet Says On Scalability For December 5th, 2014

Fri, 12/05/2014 - 17:56

Hey, it's HighScalability time:


InfoSec Taylor Swift is wise...haters gonna hate.

 

  • 6 billion+: Foursquare checkins; 25000: allocs for every keystroke in Chrome's Omnibox
  • Quotable Quotes:
    • @wattersjames: Pretty convinced that more value is created by networks of products in today's world than 'stacks'--'stack' model seems outdated.
    • @ChrisLove: WOW 70% of http://WalMart.com  traffic over the holidays was from a 'mobile' device! #webperf #webdevelopment #html5
    • @Nick_Craver: No compelling reason - we can run all of #stackexchange on one R610 server (4yr old) @ 5% CPU. #redis is incredibly efficient.
    • @jehiah: The ticker on http://bitly.com  rolled past 20 BILLION Bitlinks today. Made possible by reliable mysql clusters + NSQ.
    • @tonydenyer: "micro services how to turn your ball of mud into a distributed ball of mud" #xpdaylon
    • @moonpolysoft: containers are the new nosql b/c all are dimly aware of a payoff somewhere, and are more than willing to slice each other up to get there.
    • @shipilev: Shipilev's First Law of Performance Issues: "It is almost always something very simple and embarrassing to admit once you found it"
    • Gérard Berry: We discovered that this whole hypothesis of being infinitely fast both simplified programming and helped us to be faster than others, which was fun.
    • @randybias: OH: “The religion of technology is featurism.”  [ brilliant observation ]
    • @rolandkuhn: ACID 2.0: associative commutative idempotent distributed #reactconf @PatHelland
    • @techmilind: @adrianco @wattersjames @cloudpundit Intra-region latency of <2ms is the killer feature. That's what makes Aurora possible.
    • @timreid: async involves a higher level of initial essential complexity, but a greatly reduced level of continual accidental complexity #reactconf
    • @capotribu: Docker Machine + Swarm + (proposed) Compose = multi-containers apps on clusters in 1 command #DockerCon
    • @dthume: "Some people say playing with thread affinity is for wimps. Those people don't make money" - @mjpt777 at #reactconf
    • @jamesurquhart: Reading a bunch of apparently smart people remain blind to the lessons of complexity. #rootcauseismerelyaclue
    • Facebook: the rate of our machine-to-machine traffic growth remains exponential, and the volume has been doubling at an interval of less than a year.

  • In the US we tend to be practical mobile users instead of personal and social fun users. Innovation is happening elsewhere as is clearly shown in Dan Grover's epic exploration of Chinese Mobile App UI Trends: using voice instead of text for messaging; QR codes for everything; indeterminate badges to indicate there's something interesting to look at; a discover menu item that contains "changing menagerie of fun"; lots of app stores; using phone numbers for login, even on websites; QR code logins; chat as the universal UI; more damn stickers; each app has a wallet; use of location in ways those in the US might find creepy; tight integration with offline consumption; common usage of the assistive touch feature on the iPhone; cutesy mascots in loading and error screens; pollution widgets; full ad splash screen when an app starts; theming of apps. 

  • Awesome analysis. A really deep dive with great graphics on Facebook's new network architecture. Facebook Fabric Networking Deconstructed: the new Facebook Network Fabric is in fact a Fat-Tree with 3-levels.

  • Just a tiny thing. AMD, Numascale, and Supermicro Deploy Large Shared Memory System: The Numascale system, installed over the past two weeks, consists of 5184 CPU cores and 20.7 TBytes of shared memory which is housed in 108 Supermicro 1U servers connected in a 3D torus with NumaConnect, using three cabinets with 36 servers apiece in a 6x6x3 topology. Each server has 48 cores in three AMD Opteron 6386 CPUs and 192GB memory, providing a single system image and 20.7 TB to all 5184 cores.

  • Quit asking why something happened. The question that must be answered is how. The Infinite Hows (or, the Dangers Of The Five Whys)

  • What do customers want? Answers. Greg Ferro talks about how a company that hired engineers to answer sales inquiries doubled their sales. All people wanted were their questions answered. Once answered they would place an order. No complex, time-wasting sales cycle required. Technology has replaced the information gathering part of the sales cycle. Customers already know everything about a product before making contact. Now what they need are answers.

  • Docker Networking is as simple as a reverse 4 and a half somersault piked from a 3 metre board into a black hole.

  • How is that Google Cloud Platform thingy working out? Pretty well says Aerospike. 1M Aerospike reads/sec for just $11.44/hour, on 50 nodes, with  linear scalability for 100% reads and 100% writes.

  • Your bright human mind can solve a maze. So what? Fatty acid chemistry can too: This study demonstrates that the Marangoni flow in a channel network can solve maze problems such as exploring and visualizing the shortest path and finding all possible solutions in a parallel fashion. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

All employees should be limited only by their ability rather than an absence of resources.

Wed, 12/03/2014 - 17:56

James Hamilton hid a pearl of wisdom inside Why Renewable Energy (Alone) Won't Full Solve the Problem that I think is well worth prying out:

I’ve long advocated the use of economic incentives to drive innovative uses of computing resources inside the company while preventing costs from spiraling out of control.  Most IT departments control costs by having computing resources in short supply and only buying more resources slowly and with considerable care. Effectively computing is a scarce resource so it needs to get used carefully. This effectively limits IT cost growth and controls wastage but it also limits overall corporate innovation and the gains driven by the experiments that need these additional resources.

I’m a big believer in making effectively infinite computing resources available internally and billing them back precisely to the team that used them. Of course, each internal group needs to show the customer value of their resource consumption. Asking every group to effectively be a standalone profit center is, in some ways, complex in that the “product” from some groups is hard to quantitatively measure. Giving teams the resources they need to experiment and then allowing successful experiments to progress rapidly into production encourages innovation, makes for a more exciting place to work, and the improvements brought by successful experiments help the company be more competitive and better serve its customers.

I argue that all employees should be limited only by their ability rather than an absence of resources or an inability to argue convincingly for more. This is one of the most important yet least discussed advantages of cloud computing: taking away artificial resource limitations in support of light-weight experimentation and rapid innovation. Making individual engineers and teams responsible for delivering more value for more resources consumed makes it possible to encourage experimentation without fear that costs will rise without sufficient value being produced.

Categories: Architecture

Auth0 Architecture - Running in Multiple Cloud Providers and Regions

Mon, 12/01/2014 - 17:56

This is a guest post by Jose Romaniello, Head of Engineering, at Auth0.

Auth0 provides authentication, authorization and single sign on services for apps of any type: mobile, web, native; on any stack.

Authentication is critical for the vast majority of apps. We designed Auth0 from the beginning with multiple levels of redundancy. One of these levels is hosting. Auth0 can run anywhere: our cloud, your cloud, or even your own servers. And when we run Auth0 we run it on multiple cloud providers and in multiple regions simultaneously.

This article is a brief introduction to the infrastructure behind app.auth0.com and the strategies we use to keep it up and running with high availability.

Core Service Architecture

The core service is relatively simple:

  • Front-end servers: these consist of several x-large VMs, running Ubuntu on Microsoft Azure.

  • Store: MongoDB, running on dedicated memory-optimized X-large VMs.

  • Intra-node service routing: nginx

All components of Auth0 (e.g. Dashboard, transaction server, docs) run on all nodes. All identical.

Multi-cloud / High Availability
Categories: Architecture

Make Any Framework Suck Less With These 10 Insightful Lessons

Wed, 11/26/2014 - 17:56

Alexey Migutsky in 2 years with Angular has a lot to say about Angular, which I can't comment on at all, not being an Angular user. But buried in his article are some lessons for building better frameworks that obviously come from deep experience. Frameworks will always suck, but if you follow these lessons will your frameworks suck less? Yes, I think they will.

Here are Alexey's Lessons for framework (and metaframework) developers:

  1. You should have as small a number of abstractions as possible.
  2. You should name things consistent with your "thought domain".
  3. Do not mix several responsibilities in your components. Make fine-grained abstractions with well-defined roles.
  4. Always describe the intention for your decisions and tradeoffs in your documentation.
  5. Have a curated and updated reference project/examples.
  6. Your abstractions should scale "from bottom up". Start with small items and then fit them to a Composite pattern. Do not start with the question "How do we override it globally?".
  7. Global state is pure evil. It's like darkness in the horror films - you never know what problems you will have when you tread into it...
  8. The dataflow and data changes should be granular and localized to a single component.
  9. Do not make things easy to use, make your components and abstractions simple to understand. People should learn how to do stuff in a new and effective way, do not ADAPT to their comfort zone.
  10. Do not encode all good things you know in the framework.
Categories: Architecture

Sponsored Post: Apple, Asana, Hypertable, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Tue, 11/25/2014 - 17:56

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Sr. Software Engineer-iOS Systems. Do you love building highly scalable, distributed web applications? Does the idea of performance tuning Java applications make your heart leap? Would you like to work in a fast-paced environment where your technical abilities will be challenged on a day to day basis? Do you want your work to make a difference in the lives of millions of people? Please apply here.
    • Apple Pay - Site Reliability Engineer. You already know this… every issue counts. A single ticket can be the key to discovering an issue impacting thousands of people. And now that you’ve found it, you can’t wait to fix it. You also know that owning the quality of an application is about separating the signal from the noise. Finding that signal is what motivates you. Now that you’ve found it, your next step is to roll up your sleeves and start coding. As a member of the Apple Pay SRE team, you’re expected to not just find the issues, but to write code and fix them. Please apply here.
    • Senior Software Engineer -iOS Systems. This role demands the best and brightest engineers. The ideal candidate will be well rounded and offer a diverse skill set that aligns with key qualifications. Practical experience integrating with a diverse set of third-party APIs will also serve to distinguish you from other candidates. This is a highly cross functional role, and the typical team member's day to day responsibilities on the Carrier Services team. Please apply here
  • Aerospike is hiring! Join the innovative team behind the world's fastest flash-optimized in-memory NoSQL database. Currently hiring for positions in our Mountain View, Calif., and Bangalore offices. Apply now! http://www.aerospike.com/careers

  • As a production-focused infrastructure engineer at Asana, you’ll be the person who takes the lead on setting and achieving our stability and uptime goals, architecting the production stack, defining the on-call experience, the build process, cluster management, monitoring and alerting. Please apply here.

  • Performance and Scale Engineer, Sprout Social. You will be like a physical trainer for the Sprout social media management platform: you will evaluate and make improvements to keep our large, diverse tech stack happy, healthy, and, most importantly, fast. You'll work up and down our back-end stack - from our RESTful API through to our myriad data systems and into the Java services and Hadoop clusters that feed them - searching for SPOFs, performance issues, and places where we can shore things up. Apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop its user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next-generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/
Cool Products and Services
  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key-Value Store will have free access to SQL Layer. SQL Layer is also open source, and you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

A Flock of Tasty Sources on How to Start Learning High Scalability

Mon, 11/24/2014 - 17:56

This is a guest repost by Leandro Moreira.

When we get interested in scalability we usually look for links, explanations, books, and references. This mini article links to the references I think might help you on this journey.

DISCLAIMER:

You don’t need N physical machines to build or test a cluster or a highly scalable system; these days you can use Vagrant to bring up N virtual machines easily.

THE REFERENCES:

Now that you know you can empower yourself with virtual servers, I challenge you to not only read these links but put them into practice.

Good questions to test your knowledge:

Categories: Architecture

Stuff The Internet Says On Scalability For November 21st, 2014

Fri, 11/21/2014 - 17:56

Hey, it's HighScalability time:


Sweet dreams brave Philae. May you awaken to a bright-throned dawn for one last challenge.

 

  •  80 million: bacteria xferred in a juicy kiss;
  • Quotable Quotes:
    • James Hamilton: Every day, AWS adds enough new server capacity to support all of Amazon's global infrastructure when it was a $7B annual revenue enterprise.
    • @iglazer: What is the test that could most destroy your business model?  Test for that. @adrianco #defragcon
    • @zhilvis: Prefer decoupling over duplication. Coupling will kill you before duplication does by @ICooper #buildstufflt
    • @jmbroad: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." ~George Box
    • @RichardWarburto: Optimisation maybe premature but measurement isn't.
    • @joeerl: Hell hath no version numbers - the great ones saw no need for version numbers - they used port numbers instead. See, for example, RFC 821,
    • JustCallMeBen: tldr: queues only help to flatten out burst load. Make sure your maintained throughput is high enough.
    • @rolandkuhn: «the event log is a database of the past, not just of the present» — @jboner at #reactconf
    • @ChiefScientist: CRUD is dead. -- @jboner #reactconf 
    • @fdmts: 30T of flash disks cabled up and online.  Thanks @scalableinfo!
    • monocasa: Immutable, statically linked, minimal system images running micro services on top of a hypervisor is a very old concept too. This is basically the direction IBM went in the 60's with their hypervisors and they haven't looked back.
    • Kiril Savino: Scaling is the process of decoupling load from latency.

  • Perhaps they were controlled by a master AI? Google and Stanford Built Similar Neural Networks Without Knowing It: Neural networks can be plugged into one another in a very natural way. So we simply take a convolutional neural network, which understands the content of images, and then we take a recurrent neural network, which is very good at processing language, and we plug one into the other. They speak to each other—they can take an image and describe it in a sentence.

  • You know how you never really believed the view in MVC was ever really separate? Now this is MVC. WatchKit apps run on the iPhone as an extension, only the UI component runs on the watch. XWindows would be so proud.

  • Shopify shows how they Build an Internal Cloud with Docker and CoreOS: Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1700 cores and 6 TB RAM.

  • Machine learning isn't just about creating humavoire AIs. It's a technology, like electricity, that will transform everything it affixes with its cyclops gaze. Here's a seemingly mundane example from Google, as discussed on the Green (Low Carbon) Data Center Blog. Google has turned inward, applying Machine Learning to its data center fleet. The result: Google achieved from 8% to 25% reduction in its energy used to cool the data center with an average of 15%. Who wouldn’t be excited to save an average of 15% on their cooling energy costs by providing new settings to run the mechanical plant? And this is how the world will keep those productivity increases reaching skyward.

  • Does anyone say "I love my water service"? Or "I love my garbage service"? Then why would anyone say "I love Facebook"? That's when you've arrived. When you are so much a part of the way things are that people don't even think of loving them or not. They just are. The Fall of Facebook

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

We are leaving 3x-4x performance on the table just because of configuration.

Wed, 11/19/2014 - 17:56

Performance guru Martin Thompson gave a great talk at Strangeloop: Aeron: Open-source high-performance messaging, and one of the many interesting points he made was how much performance is being lost because we aren't configuring machines properly.

This point comes from the observation that "Loss, throughput, and buffer size are all strongly related."

Here's a gloss of Martin's reasoning. It's a problem that keeps happening and people aren't aware that it's happening because most people are not aware of how to tune network parameters in the OS.

The separation of programmers and system admins has become an anti-pattern. Developers don’t talk to the people who have root access on machines who don’t talk to the people that have network access. Which means machines are never configured right, which leads to a lot of loss. We are leaving 3x-4x performance on the table just because of configuration.

We need to work out how to bridge that gap, know what the parameters are, and how to fix them.

So know your OS network parameters and how to tune them.
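One rough way to see why buffer size and throughput are tied together is the bandwidth-delay product: a sender can have at most one buffer's worth of data in flight per round trip. The sketch below is not from Martin's talk. The function names are hypothetical and the link speed and round-trip time are illustrative assumptions. It also reads the default buffer sizes the OS hands to a new socket, using the standard `SO_RCVBUF` and `SO_SNDBUF` socket options.

```python
import socket


def bandwidth_delay_product(bits_per_second: float, rtt_seconds: float) -> int:
    """Bytes that must be in flight to keep a link of this speed full."""
    return round(bits_per_second * rtt_seconds / 8)


def max_throughput(buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bits/second when one buffer is sent per round trip."""
    return buffer_bytes * 8 / rtt_seconds


def default_buffer_sizes() -> tuple:
    """Return (receive, send) buffer sizes the OS gives a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    # Assumed example link: 10 Gbit/s with a 1 ms round trip.
    link_bps, rtt = 10e9, 0.001
    needed = bandwidth_delay_product(link_bps, rtt)
    rcv, snd = default_buffer_sizes()
    print(f"buffer needed to fill the link: {needed} bytes")
    print(f"OS default receive/send buffers: {rcv}/{snd} bytes")
    print(f"ceiling with the default receive buffer: "
          f"{max_throughput(rcv, rtt) / 1e9:.2f} Gbit/s")
```

If the default buffer is far smaller than the bandwidth-delay product, the configuration rather than the hardware sets the ceiling. On Linux the relevant limits are usually sysctl settings such as `net.core.rmem_max` and `net.core.wmem_max`.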

Related Articles
Categories: Architecture

Aeron: Do we really need another messaging system?

Mon, 11/17/2014 - 18:26

Do we really need another messaging system? We might if it promises to move millions of messages a second, at small microsecond latencies between machines, with consistent response times, to large numbers of clients, using an innovative design.  

And that’s the promise of Aeron (the Celtic god of battle, not the chair, though tell that to the search engines), a new high-performance open source message transport library from the team of Todd Montgomery, a multicast and reliable protocol expert, Richard Warburton, an expert on compiler optimizations, and Martin Thompson, the pasty faced performance gangster.

The claim is that Aeron already beats the best products out there on throughput, and that its latency matches the best commercial products up to the 90th percentile. Aeron can push small 40 byte messages at 6 million messages a second, which is a very difficult case.
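A quick back-of-the-envelope calculation shows why small messages are the hard case. The sketch below uses only the two figures quoted above. The function names are hypothetical, and the result counts payload bytes only, ignoring any protocol headers.

```python
MESSAGE_BYTES = 40                # message size quoted in the article
MESSAGES_PER_SECOND = 6_000_000   # message rate quoted in the article


def payload_bits_per_second(message_bytes: int, messages_per_second: int) -> int:
    """Payload bandwidth in bits/second, excluding all protocol overhead."""
    return message_bytes * messages_per_second * 8


def per_message_budget_ns(messages_per_second: int) -> float:
    """Average time available to handle one message, in nanoseconds."""
    return 1e9 / messages_per_second


if __name__ == "__main__":
    gbits = payload_bits_per_second(MESSAGE_BYTES, MESSAGES_PER_SECOND) / 1e9
    budget = per_message_budget_ns(MESSAGES_PER_SECOND)
    print(f"payload bandwidth: {gbits:.2f} Gbit/s")
    print(f"time budget per message: {budget:.0f} ns")
```

The payload works out to under 2 Gbit/s, so raw bandwidth is not the difficulty. The difficulty is the time budget of roughly 167 nanoseconds per message, which leaves little room for locks, copies, or allocation on the message path.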

Here’s a talk Martin gave on Aeron at Strangeloop: Aeron: Open-source high-performance messaging. I’ll give a gloss of his talk and fold in information from the sources listed at the end of this article.

Martin and his team were in the enviable position of having a client that required a product like Aeron and was willing both to finance its development and to make it open source. So go git Aeron on GitHub. Note, it’s early days for Aeron and they are still in the heavy optimization phase.

The world has changed, so endpoints need to scale as never before. This is why Martin says we need a new messaging system. It’s now a multi-everything world. We have multi-core, multi-socket, multi-cloud, multi-billion user computing, where communication is happening all the time. Huge numbers of consumers regularly pound a channel to read from the same publisher, causing lock contention and queueing effects, which in turn cause throughput to drop and latency to spike.

What’s needed is a new messaging library to make the most of this new world. The move to microservices only heightens the need:

As we move to a world of micro services then we need very low and predictable latency from our communications otherwise the coherence component of USL will come to rain fire and brimstone on our designs.

With Aeron the goal is to keep things pure and focused. The benchmarking we have done so far suggests a step forward in throughput and latency. What is quite unique is that you do not have to choose between throughput and latency. With other high-end messaging transports this is a distinct choice. The algorithms employed by Aeron give maximum throughput while minimising latency up until saturation.
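The USL mentioned in the quote is Neil Gunther's Universal Scalability Law. It models relative capacity at concurrency N as N / (1 + α(N − 1) + βN(N − 1)), where α is the contention penalty and β is the coherence penalty. The sketch below is not from the talk, and the α and β values are illustrative assumptions. It shows how even a small coherence term makes throughput peak and then fall as concurrency grows.

```python
import math


def usl_throughput(n: int, alpha: float, beta: float) -> float:
    """Relative capacity at concurrency n under the Universal Scalability Law."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_concurrency(alpha: float, beta: float) -> float:
    """Concurrency at which USL throughput is highest (requires beta > 0)."""
    return math.sqrt((1 - alpha) / beta)


if __name__ == "__main__":
    alpha, beta = 0.05, 0.001  # assumed contention and coherence penalties
    for n in (1, 8, 32, 64, 128):
        print(f"N={n:4d}  relative throughput={usl_throughput(n, alpha, beta):6.2f}")
    print(f"throughput peaks near N={peak_concurrency(alpha, beta):.0f}")
```

With these assumed values throughput peaks at roughly 31 concurrent workers and declines after that. This is why the coherence cost of communication matters more as a system is split into more services.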

“Many messaging products are a Swiss Army knife; Aeron is a scalpel,” says Martin, which is a good way to understand Aeron. It’s not a full featured messaging product in the way you may be used to, like Kafka. Aeron does not persist messages, it doesn’t support guaranteed delivery, nor clustering, nor does it support topics. Aeron won’t know if a client has crashed and be able to sync it back up from history or initialize a new client from history. 

The best way to place Aeron in your mental matrix might be as a message oriented replacement for TCP, with higher level services written on top. Todd Montgomery expands on this idea:

Aeron being an ISO layer 4 protocol provides a number of things that messaging systems can't and also doesn't provide several things that some messaging systems do.... if that makes any sense. Let me explain slightly more wrt all typical messaging systems (not just Kafka and 0MQ). 

One way to think more about where Aeron fits is TCP, but with the option of reliable multicast delivery. However, that is a little limited in that Aeron also, by design, has a number of possible uses that go well beyond what TCP can do. Here are a few things to consider: 

Todd continues on with more detail, so please keep reading the article to see more on the subject.

At its core Aeron is a replicated persistent log of messages. And through a very conscious design process messages are wait-free and zero-copy along the entire path from publication to reception. This means latency is very good and very predictable.

That sums up Aeron in a nutshell. It was created by an experienced team, using solid design principles sharpened on many previous projects, backed by techniques not everyone has in their tool chest. Every aspect has been well thought out to be clean, simple, highly performant, and highly concurrent.

If simplicity is indistinguishable from cleverness, then there’s a lot of cleverness going on in Aeron. Let’s see how they did it...

Categories: Architecture

Stuff The Internet Says On Scalability For November 14th, 2014

Fri, 11/14/2014 - 17:56

Hey, it's HighScalability time:


Spectacular rendering of the solar system to scale. (Roberto Ziche)

 

  • 700: number of low-orbit satellites in a sidecar cheap internet; 130 terabytes: AdRoll ad data processed daily; 15 billion: daily Weather Channel forecasts; 1 million: AWS customers
  • Quotable Quotes:
    • @benkepes: Each AWS data center has typically 50k to 80k physical servers. Up to 102Tbps provisioned networking capacity #reinvent
    • @scottvdp: AWS just got rid of infrastructure behind any application tier. Lambda for async distributed events, container engine for everything else.
    • @wif: AWS is handling 7 trillion DynamoDB requests per month in a single region. 4x over last year. same jitter. #reinvent
    • Philae: If my path was off by even half a degree the humans would have had to abort the mission.
    • Al Aho: Well, you can get a stack of stacks, basically. And the nested stack automaton has sort of an efficient way of implementing the stack of stacks, and you can think of it as sort of almost like a cactus. That's why some people are calling it cactus automata, at the time.
    • Gilt: Someone spent $30K on an Acura & LA travel package on their iPhone.
    • @cloudpundit: Gist of Jassy's #reinvent remarks: Are you an enterprise vendor? Do you have a high-margin product/service? AWS is probably coming for you.
    • @mappingbabel: Things coming out from the AWS #reinvent analyst summit - Amazon has minimum 3 million servers & lights up own globe-spanning fibre.
    • @cloudpundit: James Hamilton says mobile hardware design patterns are future of servers. Single-chip offerings, semiconductor-level innovation. #reinvent
    • @rightscale: RT @owenrog: AWS builds its own electricity substations simply because the power companies can't build fast enough to meet demand #reInvent
    • @timanderson: New C4 instances #reinvent up to 36 cores up to 16TB SSD
    • @holly_cummins: L1 cache is a beer in hand, L3 is fridge, main memory is walking to the store, disk access is flying to another country for beer. 
    • @ericlaw: Sample HTTP compression ratios observed on @Facebook: -1300%, -34.5%, -14.7%, -25.4%. ProTip: Don't compress two byte responses. #webperf
    • @JefClaes: It's not the concept that defines the invariants but the invariants that define the concept.

  • It's hard to imagine that just a few short years ago AWS did not exist. Now it has 1 million customers, runs 70 million hours of software per month, and their AWS re:Invent conference has a robust 13,500 attendees. Re:Invent shows if Amazon is going to be disrupted, a lack of innovation will not be the cause. The key talking point is that AWS is not just IaaS anymore, AWS is a Platform. The underlying subtext is lock-in. Minecraft-like, Amazon is building out their platform brick by brick. Along with GCE, AWS announced a Docker based container service. Intel designed a special new cloud processor for AWS, which will be available in a new C4 instance type. There's Aurora, a bigger, badder MySQL. To the joy of many EBS is getting bigger and faster. The world is getting more reactive, S3 is emitting events. With less fanfare comes an impressive suite of code deployment and management tools. There's also a key management service, a configuration manager, and a service catalog. Most provocative is Lambda, or PaaS++, which as the name suggests is the ability to call a function in response to events. Big deal? It could be, though it is quite limited currently, only supporting Node.js and a few event types. You can't, for example, terminate a REST request. What it could grow into is promising: a complete abstraction layer over the cloud so that any sense of machines and locations is removed. They aren't the first, but that hardly matters.

  • It's not a history of the Civil War. Or WW I. Or the Dust Bowl. But An Oral History of Unix. Yes, that much time has passed. Interviewed are many names you'll recognize and some you've probably never heard of. A fascinating window into the deep deep past.

  • No surprise at all. Plants talk to each other using an internet of fungus: We suggest that tomato plants can 'eavesdrop' on defense responses and increase their disease resistance against potential pathogen...the phantom orchid, get the carbon they need from nearby trees, via the mycelia of fungi that both are connected to...Other orchids only steal when it suits them. These "mixotrophs" can carry out photosynthesis, but they also "steal" carbon from other plants...The fungal internet exemplifies one of the great lessons of ecology: seemingly separate organisms are often connected, and may depend on each other. 

  • How do you persist a 200 thousand messages/second data stream while guaranteeing data availability and redundancy? Tadas Vilkeliskis shows you how with Apache Kafka. It excels at high write rates, compression saves lots on network traffic, and a custom C++ http-to-kafka accommodates performance.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Three Reasons You Probably Don’t Need Multi-Data Center Capabilities

Wed, 11/12/2014 - 17:56

This is a guest post by Nikhil Palekar, Systems Architect, FoundationDB

For many organizations that care a lot about strong consistency and low latency or haven’t already built a fault tolerant application tier on top of their database, adding a multiple data center (MDC) database implementation may create more complexity or unintended consequences than meaningful benefits. Why might that be?

Categories: Architecture