Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

High Scalability - Building bigger, faster, more reliable websites

Sponsored Post: Dow Jones, Spotify, Evernote, Surge, Rackspace, Amazon, Booking, aiCache, Aerospike, Percona, ScaleOut, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7

Tue, 05/14/2013 - 17:00

Who's Hiring?
  • Amazing things are happening at Dow Jones – help build the next generation News and Media platforms that serve the best journalism in the world. High-impact, passionate, and driven technologists thrive in our environment, building platforms that deliver trusted content that enlightens and inspires millions around the world.  Please apply online
  • Want to build scalable systems that power the world's largest music streaming service? Spotify is looking for engineers for our backend infrastructure team. Apply now.
  • At Evernote our vision is to help the world remember everything. If you want to work in a fast paced, highly rewarding environment with some of the smartest engineers on the planet, then come join us! We are looking for Sr. Security Engineers and Sr. Operations Engineers/DevOps to join our operations team.
  • LogicMonitor is looking for a Front End developer to have a huge impact, be valued, realize their dreams, and help us realize ours. We are looking for someone to own the code that delivers the design and usability of LogicMonitor's enterprise SaaS application(s). Please apply online
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Please apply online.
  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
Fun and Informative Events
  • Surge - The Scalability & Performance Conference, presented by OmniTI is happening on Sept. 12th-13th. Special, High Scalability Reader Rate: $50 off registration--now through September 10!
  • It's back! Join the MySQL Community at the annual Percona Live MySQL Conference and Expo in Santa Clara, April 22-25. This year's conference features an outstanding lineup of 92 speakers delivering 112 breakout sessions over three days! 
Cool Products and Services

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

The Secret to 10 Million Concurrent Connections - The Kernel is the Problem, Not the Solution

Mon, 05/13/2013 - 16:30

Now that we have the C10K concurrent connection problem licked, how do we level up and support 10 million concurrent connections? Impossible you say. Nope, systems right now are delivering 10 million concurrent connections using techniques that are as radical as they may be unfamiliar.

To learn how it’s done we turn to Robert Graham, CEO of Errata Security, and his absolutely fantastic talk at Shmoocon 2013 called C10M Defending The Internet At Scale.

Robert has a brilliant way of framing the problem that I’ve never heard before. He starts with a little bit of history, relating how Unix wasn’t originally designed to be a general server OS; it was designed to be a control system for a telephone network. It was the telephone network that actually transported the data, so there was a clean separation between the control plane and the data plane. The problem is we now use Unix servers as part of the data plane, which we shouldn’t do at all. If we were designing a kernel for handling one application per server we would design it very differently than a multi-user kernel.

Which is why he says the key is to understand:

  • The kernel isn’t the solution. The kernel is the problem.

Which means:

  • Don’t let the kernel do all the heavy lifting. Take packet handling, memory management, and processor scheduling out of the kernel and put them into the application, where they can be done efficiently. Let Linux handle the control plane and let the application handle the data plane.

The result will be a system that can handle 10 million concurrent connections with 200 clock cycles for packet handling and 1,400 clock cycles for application logic. As a main memory access costs 300 clock cycles, it’s key to design in a way that minimizes code and cache misses.

With a data plane oriented system you can process 10 million packets per second. With a control plane oriented system you only get 1 million packets per second.
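
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 200 and 1,400 cycle figures, the 300-cycle memory access cost, and the 10 million packets per second rate come from the text above; the 3 GHz clock rate is an assumption chosen only for illustration, and the sketch assumes every packet spends its full budget.

```python
import math

# Figures quoted in the article.
PACKET_HANDLING_CYCLES = 200
APP_LOGIC_CYCLES = 1400
MEMORY_ACCESS_CYCLES = 300
PACKETS_PER_SECOND = 10_000_000

# Assumption for illustration only: each core runs at 3 GHz.
ASSUMED_CLOCK_HZ = 3_000_000_000


def cycles_per_packet():
    """Total cycle budget for one packet (handling + application logic)."""
    return PACKET_HANDLING_CYCLES + APP_LOGIC_CYCLES


def cores_needed(pps=PACKETS_PER_SECOND, clock_hz=ASSUMED_CLOCK_HZ):
    """Cores required if every packet uses its whole cycle budget."""
    total_cycles_per_second = pps * cycles_per_packet()
    return math.ceil(total_cycles_per_second / clock_hz)


def cache_misses_in_budget():
    """How many main memory accesses fit inside one packet's budget."""
    return cycles_per_packet() // MEMORY_ACCESS_CYCLES
```

Under these assumptions the whole per-packet budget is only about five main memory accesses, which is why minimizing cache misses matters so much.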

If this seems extreme keep in mind the old saying: scalability is specialization. To do something great you can’t outsource performance to the OS. You have to do it yourself.

Now, let’s learn how Robert creates a system capable of handling 10 million concurrent connections...

Categories: Architecture

Stuff The Internet Says On Scalability For May 10, 2013

Fri, 05/10/2013 - 17:00

Hey, it's HighScalability time:


(In Thailand, they figured out how to solve the age-old queuing problem!)

 

  • Nanoscale: Plants IM Using Nanoscale Sound Waves; 100 petabytes: CERN data storage
  • Quotable Quotes:
    • Geoff Arnold: Arguably all interesting advances in computer science and software engineering occur when a resource that was previously scarce or expensive becomes cheap and plentiful.
    • @jamesurquhart: "Complexity is a characteristic of the system, not of the parts in it." -Dekker
    • @louisnorthmore: Scaling down - now that's scalability!
    • @peakscale: Where distributed systems people retire to forget the madness: http://en.wikipedia.org/wiki/Antipaxos 
    • @dozba: "The Linux Game Database" ... Well, at least they will never have scaling problems.
    • Michael Widenius: There is no reason at all to use MySQL
    • @steveloughran: Whenever someone says "unlimited scalability", ask if that exceeds the berkenstein bound
    • @nationofminds: "I have infinite MIPS. Unlimited scalability. And zero effing patience." 
    • Endowing cells with logic and memory: Genetic circuits that process and permanently store information are created with recombinases that flip the orientation of DNA cassettes.

  • Search Is Eating The World. The long sought after Nirvana of search and database becoming one may be nigh. 

  • And you thought scalability didn't pay: Twitter Acquires Palo Alto-Based Scalable Computing Startup Ubalo

  • New Finds: @foodfight is an interesting and informative Chef oriented DevOps podcast you may enjoy if that's the sort of thing you enjoy, which you probably do. From which I learned from fellow Way of Kings aficionado Brandon Burton about a new deep systems podcast called Real Talk by James Golick and Joe Damato, who want to talk about things concrete, not like that Hacker News BS.

  • I'd love to see the API: The idea we live in a simulation isn't science fiction. Magic anyone?

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

Typesafe Interview: Scala + Akka is an IaaS for Your Process Architecture

Wed, 05/08/2013 - 17:00

This is an email interview with Viktor Klang, Director of Engineering at Typesafe, on the Scala Futures model & Akka, both topics on which he is immensely passionate and knowledgeable.

How do you structure your application? That’s the question I explored in the article Beyond Threads And Callbacks. An option I did not talk about, mostly because of my own ignorance, is a powerful stack you may not be all that familiar with: Scala and Akka.

To remedy my oversight is our acting tour guide, Typesafe’s Viktor Klang, long time Scala hacker and Java enterprise systems architect. Viktor was very patient in answering my questions and was enthusiastic about sharing his knowledge. He’s a guy who definitely knows what he is talking about.

I’ve implemented several Actor systems along with the messaging infrastructure, threading, async IO, service orchestration, failover, etc, so I’m innately skeptical about frameworks that remove control from the programmer at the cost of latency.

So at the end of the interview am I ready to drink the koolaid? Not quite, but I’ll have a cup of coffee with the idea. 

I came to think of Scala + Akka as a kind of IaaS for your process architecture. Toss in Play for the web framework and you have a slick stack, with far more out of the box power than Go, Node, or plaino jaino Java.

The build or buy decision is surprisingly similar to every other infrastructure decision you make. Should you use a cloud or build your own? It’s the same sort of calculation you need to go through when deciding on your process architecture. At the extremes you lose functionality and flexibility, but since they’ve already thought of most everything you would need to think about, with examples and support, you gain a tremendous amount too. Traditionally, however, process architecture has been entirely ad-hoc. That may be changing. 

Now, let’s start the interview with Viktor...

Categories: Architecture

Not Invented Here: A Comical Series on Scalability

Tue, 05/07/2013 - 19:36

I read one of these poignantly humorous comics on Not Invented Here a while back and since I wasn't sure it was OK to repost I emailed asking for permission. Nada. Then I saw Martijn de Vrieze posted a collection of scalability comics from NIH and decided what the heck (click image to read on site):

Thanks to Martijn for curating the collection and NIH for creating them.

And I agree with Martijn, they do capture an ineffable quality about the entire space.

Categories: Architecture

7 Not So Sexy Tips for Saving Money On Amazon

Mon, 05/06/2013 - 17:01

Harish Ganesan, CTO of 8KMiles, has a very helpful blog, Cloud, Big Data and Mobile, where he shows a nice analytical bent which leads to a lot of practical advice and cost saving tips:
  1. Use SQS Batch Requests to reduce the number of requests hitting SQS, which saves costs. Sending 10 messages in a single batch request saves $30/month in the example.
  2. Use SQS Long Polling to reduce extra polling requests, cutting down empty receives, which in the example saves ~$600 in empty receive leakage costs.
  3. Choose the right search technology to save costs in AWS by matching your activity pattern to the technology. For a small application with constant load or a heavily utilized search tier or seasonal loads Amazon Cloud Search looks like the cost efficient play. 
  4. Use Amazon CloudFront Price Class to minimize costs by selecting the right Price Class for your audience to potentially reduce delivery costs by excluding Amazon CloudFront’s more expensive edge locations.
  5. Optimize ElastiCache Cluster costs by right sizing cluster node sizes. For different usage scenarios (heavy, moderate, low) there are optimal instance types. Choosing the right type for the right usage scenario saves money.
  6. Amazon Auto Scaling can save costs by better matching demand and capacity. Certainly not a new idea but the diagrams, different leakage scenarios (daily spike, weekly fluctuation, seasonal spike), and the explanation of potential savings (substantial) are well done.
  7. Use Amazon S3 Object Expiration feature to delete old backups, logs, documents, digital media, etc. A leakage of ~20 TB adds up to a tidy ~1650 USD a year. 
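
The arithmetic behind tip 1 can be sketched in a few lines of Python. The batch size of 10 comes from the tip itself; the message volume and the price per million requests used in the note below are hypothetical values for illustration, not actual AWS pricing or the numbers from Harish's example.

```python
import math

# Batch size from tip 1: up to 10 messages per SQS batch request.
MAX_BATCH_SIZE = 10


def requests_needed(messages, batch_size=1):
    """Number of billable requests needed to send `messages` messages."""
    return math.ceil(messages / batch_size)


def monthly_cost(messages, price_per_million_requests, batch_size=1):
    """Request cost for a month, given a (hypothetical) per-million price."""
    requests = requests_needed(messages, batch_size)
    return requests * price_per_million_requests / 1_000_000


def monthly_savings(messages, price_per_million_requests):
    """Savings from always sending full batches instead of single messages."""
    unbatched = monthly_cost(messages, price_per_million_requests)
    batched = monthly_cost(messages, price_per_million_requests, MAX_BATCH_SIZE)
    return unbatched - batched
```

With a hypothetical 100 million messages a month at a hypothetical $0.50 per million requests, batching cuts the billable requests from 100 million to 10 million. The same reasoning applies to long polling in tip 2: fewer empty receives means fewer billable requests.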
Categories: Architecture

Stuff The Internet Says On Scalability For May 3, 2013

Fri, 05/03/2013 - 17:00

Hey, it's HighScalability time:


(Giant Hurricane on Saturn, here's one in New Orleans)

 

  • 1,966,080 cores: Time Warp synchronization protocol using up to 7.8M MPI tasks on 1,966,080 cores of the {Sequoia} Blue Gene/Q supercomputer system. 33 trillion events processed in 65 seconds yielding a peak event-rate in excess of 504 billion events/second using 120 racks of Sequoia.
  • Quotable Quotes:
    • Thad Starner: the longer accessing a device exceeds 2s, the more its actual usage would decrease exponentially. Thus, he made a claim that a wrist watch interface always sitting on one's wrist ready to use should be more successful than mobile phones which have to be pulled out of the pocket. 
    • @joedevon: We came for scalability but we stayed for agility #NoSQL
    • @jahmailay: "Our user base is exploding. I really wish we spent more time on scalability instead of features customers don't use." - Everybody, always.
    • @bsletten: I don’t think it is a coincidence that the words eval() and evil are so close.
    • @RCSecure: Maybe Gov should stop deploying crappy #CyberSecurity instead of Surveiling Citizens
    • @davidpav: "This is what Netflix does - after each deployment creates AMI for faster scaling up"
    • @franzgranlund: Rewrote my little batch-processing application using #akka . 20% performance increase just like that - and now it is easier to scale.
    • @marshray: Ouch, that's kind of dismal. Perhaps we need a new term: "eventual scalability"
    • @adrianco: RT @rbranson: @cscotta load average is the worst thing ever. Slowly trying to evangelize it's demise as a reasonable metric. < +1 every 15 m

  • MIT Tech Review picks 10 breakthrough technologies: Smart Watches (really?), Memory implants (deciphering the code by which the brain forms long-term memories), Additive manufacturing (3-D printing), Supergrids (finally says Edison, DC powergrids), Temporary social media (sigh), Prenatal DNA sequencing (great for full lifecycle ad targeting), Baxter (compliant robots), Deep Learning (the singularity is near), Ultra-Efficient Solar Power (now we are talking). Prediction: We'll laugh at all this filter control talk once we have all of Google's datacenters and knowledge graph software implanted in our heads.

  • IBM on making movies using atoms as pixels. Characterization was a little thin but the plot was magnetic.

  • Lesson from Airbnb: Give yourself permission to experiment with non-scalable changes. Building better is better than building bigger.

  • Here's a short review by me on CyberStorm by Matthew Mather. Matthew is also the author of the most excellent Atopia Chronicles, a sprawling exploration of "artificial intelligence, distributed computing, nanotechnology, and the full range of humanity." CyberStorm is a chilling blow by blow of what could happen in a real cyber attack. As a programmer it's the implied idea of a kind of Crises OS built on a mesh of smartphones that I found most fascinating. Not much seems to be done in this area and even the how-to of writing such applications is rarely discussed. Could be interesting.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

Myth: Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue

Wed, 05/01/2013 - 17:00

In NoSQL: Past, Present, Future Eric Brewer has a particularly fine section explaining the often hard to understand ideas of BASE (Basically Available, Soft State, Eventually Consistent), ACID (Atomicity, Consistency, Isolation, Durability), and CAP (Consistency, Availability, Partition Tolerance), in terms of a pernicious long standing myth about the sanctity of consistency in banking.

Myth: Money is important, so banks must use transactions to keep money safe and consistent, right?

Reality: Banking transactions are inconsistent, particularly for ATMs. ATMs are designed to have a normal case behaviour and a partition mode behaviour. In partition mode Availability is chosen over Consistency.

Why? 1) Availability correlates with revenue and consistency generally does not. 2) Historically there was never an idea of perfect communication so everything was partitioned...
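
As a minimal sketch of what a normal mode and a partition mode might look like, consider the toy Python model below. The class names, the per-account withdrawal cap while partitioned, and the reconciliation step are all assumptions made for illustration; this is not how any real bank or ATM network is implemented.

```python
class Bank:
    """Toy source of truth for account balances (hypothetical)."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def apply(self, account, amount):
        self.balances[account] -= amount


class ATM:
    """Toy ATM with a normal mode and a partition mode (hypothetical)."""

    def __init__(self, bank, partition_limit=200):
        self.bank = bank
        # Assumed cap on withdrawals per account while partitioned,
        # used to bound the risk taken by staying available.
        self.partition_limit = partition_limit
        self.connected = True
        self.pending = []  # withdrawals made while partitioned

    def withdraw(self, account, amount):
        if self.connected:
            # Normal mode: check the real balance before paying out.
            if self.bank.balances[account] < amount:
                return False
            self.bank.apply(account, amount)
            return True
        # Partition mode: choose availability over consistency.
        already = sum(a for acct, a in self.pending if acct == account)
        if already + amount > self.partition_limit:
            return False
        self.pending.append((account, amount))
        return True

    def reconnect(self):
        """Replay pending withdrawals and report overdrawn accounts."""
        self.connected = True
        for account, amount in self.pending:
            self.bank.apply(account, amount)
        self.pending.clear()
        return [a for a, b in self.bank.balances.items() if b < 0]
```

In this sketch the ATM keeps dispensing cash while disconnected, so an account can end up overdrawn after reconciliation. That is the inconsistency being traded for availability.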

Categories: Architecture

Sponsored Post: Spotify, Evernote, Surge, Rackspace, Simple, Amazon, Booking, aiCache, Aerospike, Percona, ScaleOut, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7

Tue, 04/30/2013 - 17:00

Who's Hiring?
  • Want to build scalable systems that power the world's largest music streaming service? Spotify is looking for engineers for our backend infrastructure team. Apply now.
  • At Evernote our vision is to help the world remember everything. If you want to work in a fast paced, highly rewarding environment with some of the smartest engineers on the planet, then come join us! We are looking for Sr. Security Engineers and Sr. Operations Engineers/DevOps to join our operations team.
  • LogicMonitor is looking for a Front End developer to have a huge impact, be valued, realize their dreams, and help us realize ours. We are looking for someone to own the code that delivers the design and usability of LogicMonitor's enterprise SaaS application(s). Please apply online
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Please apply online.
  • Help build the platform that powers a better, fairer banking experience at Simple. Join a talented team that chooses its own tools; works across web, Android, iOS, and Ruby/Scala/Clojure backend apps; and develops a secure and scalable banking service on AWS. Learn more at careers.
  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
Fun and Informative Events
  • Surge - The Scalability & Performance Conference, presented by OmniTI is happening on Sept. 12th-13th. Special, High Scalability Reader Rate: $50 off registration--now through September 10!
  • It's back! Join the MySQL Community at the annual Percona Live MySQL Conference and Expo in Santa Clara, April 22-25. This year's conference features an outstanding lineup of 92 speakers delivering 112 breakout sessions over three days! 
Cool Products and Services

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

AWS v GCE Face-off and Why Innovation Needs Lower Cost Infrastructures

Mon, 04/29/2013 - 17:00

This is a repost of part 2 (part 1) of an interview I did for the Boundary blog.

Boundary:  There’s another battle coming down the pike between Amazon (AWS) and Google (GCE). How should the CTO decide which one’s best?

Hoff: Given that GCE is still closed to public access we have very little common experience on which to judge. The best way to decide is as always, by running a few experiments. Pick a few representative projects, a representative team, implement the projects on both infrastructures, crunch some numbers, figure out the bigger picture and then select the one you wanted in the first place :-) .

Sebastian Stadil, founder of Scalr, recently wrote about his experiences on both platforms and found some interesting differences: AWS has a much richer set of services; GCE is on-demand only, so AWS can be cheaper; GCE has faster disk and faster network IO, especially between datacenters; GCE has faster boot times and can mount read-only partitions across multiple machines; and GCE shares images across regions...

Categories: Architecture

Stuff The Internet Says On Scalability For April 26, 2013

Fri, 04/26/2013 - 17:05

Hey, it's HighScalability time:

 

  • 100 Billion -  Neurons in The Human Brain, As Many Cells as Stars in the Milky Way; 10TB - Tumblr memcache
  • Quotable Quotes:
    • @thoward3: OH: "We make scalability a possibility.. You know, we make 'scalapossibilty'. "
    • Tesla: When wireless is perfectly applied the whole earth will be converted into a huge brain, which in fact it is, all things being particles of a real and rhythmic whole. We shall be able to communicate with one another instantly, irrespective of distance. Not only this, but through television and telephony we shall see and hear one another as perfectly as though we were face to face, despite intervening distances of thousands of miles; and the instruments through which we shall be able to do this will be amazingly simple compared with our present telephone. A man will be able to carry one in his vest pocket.
    • @ADTELLIGENCE: Data on the internet: Data of all of 1993 = Data of 1 second in 2013
    • Nassim Taleb: Man-made complex systems tend to develop cascades and runaway chains of reactions that decrease, even eliminate, predictability and cause outsized events. So the modern world may be increasing in technological knowledge, but, paradoxically, it is making things a lot more unpredictable.
    • The Bw-Tree: A B-tree for New Hardware Platforms: We believe that latch free techniques and state changes that avoid update-in-place are the keys to high performance on modern processors.
    • @rvirding: WhatsApp "Bigger Than Twitter" With Over 200M Monthly Active Users, 8B Inbound And 12B and they use #erlang
    • Jasper Fforde: There’s a lot to be said about merely having a hazy idea of what’s going on but generally reaching the right outcome by following broad policy outlines. In fact, I’ve a sneaky suspicion that it’s the only way of getting things done. Once the horror and unpredictability of unintended consequences gets a hold, even the best-intentioned and noblest of plans generally descend to mayhem, confusion and despair.
    • @enygma: I'm starting to think the Twitter unfollow bug is actually their way to handle scalability
    • @ndubaz: Spent last 2 days training with the Army's latest virtual trainers. More skeptical than ever of scalability and utility for light forces.
    • @bernardgolden: Airbnb workflow control system was 10K (!) lines of bash script.
  • Scaling Deployment at Etsy by Daniel Schauenberg. 1.49 billion page views, 4,215,169 items sold, $94.7 million of goods sold, 22+ million members, 800,000+ active shops. LAMMP + Monolithic App + No Branching + Frequent deployment + lots more.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

Paper: Making reliable distributed systems in the presence of software errors

Thu, 04/25/2013 - 17:25

Joe Armstrong is a co-inventor of Erlang and general all around renaissance software tinkerer as shown by his excellent work on writing a C Compiler and his voluminous work on GitHub.

Given the success of Erlang it's probably no surprise that he wrote his thesis on the ground breaking ideas behind Erlang: Making reliable distributed systems in the presence of software errors.

Even if you have yet to join the cult of Erlang the principles behind Erlang are universal and well worth exploring for your own designs. Highly recommended.

Introduction:

Categories: Architecture

Strategy: Using Lots of RAM Often Cheaper than Using a Hadoop Cluster

Wed, 04/24/2013 - 17:25

Solving problems while saving money is always a problem. In Nobody ever got fired for using Hadoop on a cluster they give some counter-intuitive advice by showing a big-memory server may provide better performance per dollar than a cluster:

  1. For jobs where the input data is multi-terabyte or larger a Hadoop cluster is the right solution.
  2. For smaller problems memory has reached a GB/$ ratio where it is technically and financially feasible to use a single server with 100s of GB of DRAM rather than a cluster. Given the majority of analytics jobs do not process huge data sets, a cluster doesn't need to be your first option. Scaling up RAM saves on programmer time, reduces programmer effort, improves accuracy, and reduces hardware costs.

 

Categories: Architecture

Facebook Secrets of Web Performance

Tue, 04/23/2013 - 17:25

This is a repost of part 1 of an interview I did for the Boundary blog.

Boundary: What is Facebook’s secret sauce for managing what’s got to be the biggest Big Data project, if you will, on the Web?

Hoff: From several presentations we’ve learned what Facebook insiders like Aditya Agarwal and Robert Johnson, both former Directors of Engineering, consider their secret sauce:

Categories: Architecture

Stuff The Internet Says On Scalability For April 19, 2013

Fri, 04/19/2013 - 17:25

Hey, it's HighScalability time:


(Ukrainian daredevil scaling buildings)
  • Two Trillion Objects, 1.1 Million Requests / Second: S3; 1.4TB/s: Titan supercomputer has world’s fastest storage; four billion hours: Netflix streaming in last 3 months; $1.2B: Google's Q1 infrastructure spend
  • Quotable Quotes:
    • Google: We'll track EVERY task on EVERY data center server
    • Stacey Higginbotham: All in all in the last five years the world has gained 54 Tbps of new capacity.
    • @seveas: Scalability 103: Hardware sucks. Software sucks. Everything *will* break, prepare for failure of any component of your system.
    • bloodredsun: The long and short of it is that Cassandra is a fantastic system for write heavy situations. What it is not good at are read heavy situations where deterministic low latency is required, which is pretty much what the pinterest guys were dealing with.
    • @viktorklang: "The e-mail message could not be delivered because the user's mailfolder is full." <-- EMAIL HAS BACKPRESSURE OMG
  • Interesting Behind the Scenes: Airbnb Neighborhoods. Includes a description of their work flow and a detailed breakdown of their stack: Rails, PostgreSQL/PostGIS, Memcached, CoffeeScript, Sass, jQuery, Handlebars, Backbone, Underscore, Sinatra, Clojure, Java, Hadoop, Cascalog. Highlight: "You don't need a database, you need a [expletive deleted] cache" So that's what we did, we traded our database for a cache.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS

Wed, 04/17/2013 - 17:25

Tachyon (github) is an interesting new filesystem brought to you by the folks at the UC Berkeley AMP Lab:

Tachyon is a fault tolerant distributed file system enabling reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It offers up to 300 times higher throughput than HDFS, by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read. It has a Java-like File API, native support for raw tables, a pluggable file system, and it works with Hadoop with no modifications. It might work well for streaming media too as you wouldn't have to wait for the complete file to hit the disk before rendering. Discuss on Hacker News
Categories: Architecture

Sponsored Post: Surge, Rackspace, Simple, Fitbit, Amazon, Booking, aiCache, Aerospike, Percona, ScaleOut, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7

Tue, 04/16/2013 - 17:28

Who's Hiring?
  • LogicMonitor is looking for a Front End developer to have a huge impact, be valued, realize their dreams, and help us realize ours. We are looking for someone to own the code that delivers the design and usability of LogicMonitor's enterprise SaaS application(s). Please apply online
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Please apply online.
  • Help build the platform that powers a better, fairer banking experience at Simple. Join a talented team that chooses its own tools; works across web, Android, iOS, and Ruby/Scala/Clojure backend apps; and develops a secure and scalable banking service on AWS. Learn more at careers.
  • Fitbit is hiring a Site Operations Lead to help us on our mission to make the world a healthier place! Fitbit's wearable fitness devices are worn by people across the world, each syncing with the web site, wirelessly and automatically, every 15 minutes. Join our mission here!
  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
  • Aerospike is Hiring! You dream in C - and like it? Then join us as a Senior Distributed Systems Engineer or Client / Application Engineer. People covet your bag of tricks for troubleshooting systems and network issues? Join our Operations and QA team. See if these positions are a fit for you! 
Fun and Informative Events
  • Surge - The Scalability & Performance Conference, presented by OmniTI is happening on Sept. 12th-13th. Special, High Scalability Reader Rate: $50 off registration--now through September
  • It's back! Join the MySQL Community at the annual Percona Live MySQL Conference and Expo in Santa Clara, April 22-25. This year's conference features an outstanding lineup of 92 speakers delivering 112 breakout sessions over three days! 
Cool Products and Services
  • The Rackspace Cloud Application Programming Interface (API) has changed the game allowing customers to easily modify their cloud configuration with just a few lines of code.  Read about three of the most popular things that customers do with the Rackspace AP.
  • aiCache creates a better user experience by increasing the speed scale and stability of your web-site. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • New Benchmark shows Aerospike nearly 10x Faster than the Competition. Thumbtack Technology YCSB Benchmark shows Aerospike nearly 10x faster than Cassandra, Couchbase and Mongodb. Read it now!
  • ScaleOut Software. In-Memory Data Grids for the Enterprise. Download a Free Trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years

Mon, 04/15/2013 - 17:25

Pinterest has been riding an exponential growth curve, doubling every month and half. They’ve gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances.

Stunning growth. So what’s Pinterest's story? To tell their story we have our bards, Pinterest’s Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest’s architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and a half ago when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices.

This is a great talk. It’s full of amazing details. It’s also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended.

Two of my favorite lessons from the talk:

  1. Architecture is doing the right thing when growth can be handled by adding more of the same stuff. You want to be able to scale by throwing money at a problem, which means throwing more boxes at a problem as you need them. If your architecture can do that, then you’re golden.
  2. When you push something to the limit all technologies fail in their own special way. This led them to evaluate tool choices with a preference for tools that are: mature; really good and simple; well known and liked; well supported; consistently good performers; as failure free as possible; free. Using these criteria they selected: MySQL, Solr, Memcache, and Redis. Cassandra and Mongo were dropped.

These two lessons are interrelated. Tools following the principles in (2) can scale by adding more boxes. And as load increases mature products should have fewer problems. When you do hit problems you’ll at least have a community to help fix them.  It’s when your tools are too tricky and too finicky that you hit walls so high you can’t climb over.

It’s in what I think is the best part of the entire talk, the discussion of why sharding is better than clustering, that you see the themes of growing by adding resources, few failure modes, mature, simple, and good support, come into full fruition. Notice all the tools they chose grow by adding shards, not through clustering. The discussion of why they prefer sharding and how they shard is truly interesting and will probably cover ground you’ve never considered before.
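As a rough illustration of what "growing by adding shards" can look like, here is a minimal sketch of application-level shard routing. It is a generic technique, not Pinterest's actual scheme, which this summary does not describe. The host names, the shard count, and the function names are all hypothetical.

```python
import zlib

# Assumption: the number of logical shards is fixed up front and never changes,
# so a key always maps to the same logical shard.
NUM_LOGICAL_SHARDS = 8

# Hypothetical placement of logical shards on physical database hosts.
shard_to_host = {s: ("db-a" if s < 4 else "db-b") for s in range(NUM_LOGICAL_SHARDS)}


def shard_for(key: str) -> int:
    """Map a key to a logical shard using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_LOGICAL_SHARDS


def host_for(key: str, placement: dict) -> str:
    """Look up which physical host currently holds the key's shard."""
    return placement[shard_for(key)]


def move_shard(placement: dict, shard: int, new_host: str) -> dict:
    """Return a new placement with one logical shard reassigned to another host."""
    updated = dict(placement)
    updated[shard] = new_host
    return updated


# Adding a box: reassign one logical shard to a new host (after copying its data).
key = "user:42"
new_placement = move_shard(shard_to_host, shard_for(key), "db-c")
print(host_for(key, shard_to_host), "->", host_for(key, new_placement))
```

The point of the sketch is that capacity grows by editing a small lookup table and adding hosts. The key-to-shard mapping never changes, and no cluster-wide coordination protocol is involved.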

Now, let’s see how Pinterest scales:

Categories: Architecture

Stuff The Internet Says On Scalability For April 12, 2013

Fri, 04/12/2013 - 17:25

Hey, it's HighScalability time:


(Ukrainian daredevil scaling buildings)

 

  • 877,000 TPS: Erlang and VoltDB. 
  • Quotable Quotes:
    • Hendrik Volkmer: Complexity + Scale => Reduced Reliability + Increased Chance of catastrophic failures
    • @TheRealHirsty: This coffee could use some "scalability"
    • @billcurtis_: Angular.js with Magento + S3 json file caching = wicked scalability
    • Dan Milstein: Screw you Joel Spolsky, We're Rewriting It From Scratch!
    • Anil Dash: Terms of Service and IP trump the Constitution
    • Jeremy Zawodny: Yeah, seek time matters. A lot.
    • @joeweinman: @adrianco proves why auto scaling is better than curated capacity management. < 50% + Cost Saving
    • @ascendantlogic: Any "framework" naturally follows this progression. Something is complex so someone does something to make it easier. Everyone rushes to it but needs one or two things from the technologies they left behind so they introduce that into the "new" framework. Over the years everyone's edge cases are accounted for with frameworks on top of frameworks and suddenly everyone is looking for the next big simplification.
  • Imagine if you had a Beowulf cluster of tiny antennas? You could build a TV rebroadcasting service that has old media running for the Galt's Gulch of pay TV.
  • As a technologically advanced nation, why haven't we done this yet? Nationwide Google Fiber would cost $11B over five years, probably will never happen. I say this while using my nationwide power/telephone/road/defense system.
  • Great list of technical talks. I'm partial to Big Ball of Mud.
  • Making Black Swans work for you: Stick to simple rules; Decentralize; Develop layered systems; Build in redundancy and overcompensation; Resist the urge to suppress randomness; Ensure everyone has skin in the game; Give higher status to practitioners rather than theoreticians.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

Check Yourself Before You Wreck Yourself - Avocado's 5 Early Stages of Architecture Evolution

Wed, 04/10/2013 - 17:25

In “Don’t panic! Here’s how to quickly scale your mobile apps,” Mike Maelzer paints a wonderful picture of how Avocado, a mobile app for connecting couples, evolved to handle 30x traffic within a few weeks. If you are just getting started then this is a great example to learn from.

What I liked: it's well written, packing a lot of useful information in a little space; it's failure driven, showing the process of incremental change driven by purposeful testing and production experience; it shows awareness of what's important, in their case, user signup; a replica setup was used for testing, a nice cloud benefit. 

Their biggest lesson learned is a good one:

It would have been great to start the scaling process much earlier. Due to time pressure we had to make compromises – like dropping four of our media resizer boxes. While throwing more hardware at some scaling problems does work, it’s less than ideal.

Here's my gloss on the article:

Evolution One - Make it Work
Categories: Architecture