Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

High Scalability - Building bigger, faster, more reliable websites
Syndicate content
Updated: 1 day 17 hours ago

Paper: Actor Model of Computation: Scalable Robust Information Systems

Wed, 10/22/2014 - 16:56

With Reactive Systems becoming the new old hotness, it will help to have a thorough grounding in the Actor Model. Here's a good start. Carl Hewitt in Actor Model of Computation: Scalable Robust Information Systems gives a very thorough and relatively concise explanation of the Actor model.

Here's the abstract.

The Actor model is a mathematical theory that treats "Actors" as the universal primitives of concurrent digital computation. The model has been used both as a framework for a theoretical understanding of concurrency, and as the theoretical basis for several practical implementations of concurrent systems. Unlike previous models of computation, the Actor model was inspired by physical laws. It was also influenced by the programming languages Lisp, Simula 67 and Smalltalk-72, as well as ideas for Petri Nets, capability-based systems and packet switching. The advent of massive concurrency through client-cloud computing and many-core computer architectures has galvanized interest in the Actor model.


Actor technology will see significant application for integrating all kinds of digital information for individuals, groups, and organizations so their information usefully links together. Information integration needs to make use of the following information system principles:
    * Persistence. Information is collected and indexed.
    * Concurrency: Work proceeds interactively and concurrently, overlapping in time.
    * Quasi-commutativity: Information can be used regardless of whether it initiates new work or become relevant to ongoing work.
    * Sponsorship: Sponsors provide resources for computation, i.e., processing, storage, and communications.
    * Pluralism: Information is heterogeneous, overlapping and often inconsistent.
    * Provenance: The provenance of information is carefully tracked and recorded

The Actor Model is intended to provide a foundation for inconsistency robust information integration.

Related Articles
Categories: Architecture

Facebook Mobile Drops Pull For Push-based Snapshot + Delta Model

Mon, 10/20/2014 - 16:56

We've learned mobile is different. In If You're Programming A Cell Phone Like A Server You're Doing It Wrong we learned programming for a mobile platform is its own specialty. In How Facebook Makes Mobile Work At Scale For All Phones, On All Screens, On All Networks we learned bandwidth on mobile networks is a precious resource. 

Given all that, how do you design a protocol to sync state (think messages, comments, etc.) between mobile nodes and the global state holding servers located in a datacenter?

Facebook recently wrote about their new solution to this problem in Building Mobile-First Infrastructure for Messenger. They were able to reduce bandwidth usage by 40% and reduced by 20% the terror of hitting send on a phone.

That's a big win...that came from a protocol change.

Facebook Messanger went from a traditional notification triggered full state pull:

Categories: Architecture

Stuff The Internet Says On Scalability For October 17th, 2014

Fri, 10/17/2014 - 16:56

Hey, it's HighScalability time:


What could this be? Swarms of drones painting 3D light sculptures against the night sky!
  • Quotable Quotes:
    • Visnja Zeljeznjak: Steve Jobs' product pricing formula: cost of materials x 3 + 33%
    • Benedict Evans: We now have over 2bn iOS and Android devices on earth, and this will grow in the next few years to well over 3bn.
    • @ClearStoryData: It's true! Avg beer drinker attracts 4.4% more Mosquitos than water drinker #Strataconf
    • Leslie Lamport: The core idea of the problem of that notion of causality came about because of my familiarity with special relativity...where one event could causally effect another depended on weather or not information from one could physically reach the other.
    • @laurelatoreilly: Fascinating session about cargo ships going dark to shift market prices #IoT #strataconf "your decisions are only as good as your data"
    • @muratdemirbas: Distributed/decentralized coordination is expensive & hard to scale. Centralized coordination is cheap & scales easily using hierarchies.
    • @froidianslip: ”Kafka is awesome. We heard it cures cancer." -- @gwenshap #Strataconf
    • @timoreilly: RT @grapealope: The self-driving car has 6000 sensors, and takes readings at 4Hz. That's a lot of data. @MCSrivas #strataconf #MapR
    • @froidianslip: Love the paraphrase borrowed from Ray Bradbury, "Any sufficiently complex configuration is indistinguishable from code." #Strataconf
    • @matei_zaharia: Spark shatters MapReduce's 100 TB and 1 PB sort records... with 10x fewer nodes
    • @msallstr: “Synchronous calls in this environment are the crystal meth of programming”  @mjpt777 on the new   reactive manifesto 
    • @postwait: “If you put them under enough stress, perfectly rational people will panic and start believing in science” #priceless
    • Ilya Grigorik: It's great to see access from mobile is around 30% faster compared to last year.
    • @ryandotsmith: Recently migrated an async system to SQS. Much simple. Tiny latency. Here is the code (maybe a gem?)

  • People just don't appreciate the power of messy. The problematic culture of "Worse is Better". There's an implied notion here that people can't recognize better when they see it. Better is not a platonic ideal. It can't be proved by argument. Better, like evolution, is something that works itself out in practice. Like evolution, Worse is Better is an algorithm for stepping through a possibility space by jumping from one working phenotype to the next more adapted working phenotype. And for many, that's better. Not Ideal, but Better.

  • The Times They Are a-Changin'. Docker and Microsoft partner to drive adoption of distributed applications. What's the goal? nickstinemates: Package your Windows app in a docker container, use same tooling you would otherwise use to deploy to a docker engine running on a Windows host. Package your Linux app in a docker container, use same tooling you would otherwise use to deploy to a docker engine running on a Linux host.

  • Leandro Pereira writes a fine autobiography in Life of a HTTP request, as seen by my toy web server. All the stages of life are there. Socket creation. Acceptance. Scheduling. Coroutines. Reading requests. Parsing requests. All the way to the reply and the death of the connection. A lot to learn if you want to look at the simplified internals of a service.

  • Wonderful talk: Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions. Kyle Kingsbury takes a detailed look at different partition problems in different databases. There are split brains. Masters dying. Lost data. General network mayhem. It's great. The lesson: what's written down in the marketing documentation is not always what you get. Test your application and see what really happens. The world is not simple. A dumb solution where you understand the failure modes can be a good choice.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Using a SSD Cache in Front of EBS Boosted Throughput by 50%, for Free

Wed, 10/15/2014 - 16:56

Using EBS has lots of advantages--reliability, snapshotting, resizing--but overcoming the performance problems by using Provisioned IOPS is expensive. 

Swrve, an integrated marketing and A/B testing and optimization platform for mobile apps, did something clever. They are using the c3.xlarge EC2 instances, that have two 40GB SSD devices per instance, as a cache.

They found through testing RAID-0 striping using a 4-way stripe along with enhanceio, effectively increased throughput by over 50%, for free. With no filesystem corruption problems.

How is it free? "We were planning on upgrading to the C3 class of instance anyway, and sticking with EBS as the backing store. Once you’re using an instance which has SSD ephemeral storage, there are no additional fees to use that hardware."

For great analysis, lots of juicy details, graphs, and configuration commands, please take a look at How we increased our EC2 event throughput by 50%, for free

Categories: Architecture

Sponsored Post: Apple, Hypertable, VSCO, Gannett, Sprout Social, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Tue, 10/14/2014 - 17:04

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Senior Engineer: Mobile Services. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here.
    • Apple Pay Automation Engineer. The Apple Pay group within iOS Systems is looking for a outstanding automation engineer with strong experience in building client and server test automation. We work in an agile software development environment and are building infrastructure to move towards continuous delivery where every code change is thoroughly tested by push of a button and is considered ready to be deployed if we choose so. Please apply here
    • Site Reliability Engineer. As a member of the Apple Pay SRE team, you’re expected to not just find the issues, but to write code and fix them. You’ll be involved in all phases and layers of the application, and you’ll have a direct impact on the experience of millions of customers. Please apply here.
    • Software Engineering Manager. In this role, you will be communicating extensively with business teams across different organizations, development teams, support teams, infrastructure teams and management. You will also be responsible for working with cross-functional teams to delivery large initiatives. Please apply here

  • VSCO. Do you want to: ship the best digital tools and services for modern creatives at VSCO? Build next-generation operations with Ansible, Consul, Docker, and Vagrant? Autoscale AWS infrastructure to multiple Regions? Unify metrics, monitoring, and scaling? Build self-service tools for engineering teams? Contact me (Zo, zo@vs.co) and let’s talk about working together. vs.co/careers.

  • Gannett Digital is looking for talented Front-end developers with strong Python/Django experience to join their Development & Integrations team. The team focuses on video, user generated content, API integrations and cross-site features for Gannett Digital’s platform that powers sites such as http://www.usatoday.com, http://www.wbir.com or http://www.democratandchronicle.com. Please apply here.

  • Platform Software Engineer, Sprout Social, builds world-class social media management software designed and built for performance, scale, reliability and product agility. We pick the right tool for the job while being pragmatic and scrappy. Services are built in Python and Java using technologies like Cassandra and Hadoop, HBase and Redis, Storm and Finagle. At the moment we’re staring down a rapidly growing 20TB Hadoop cluster and about the same amount stored in MySQL and Cassandra. We have a lot of data and we want people hungry to work at scale. Apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Sign Up for New Aerospike Training Courses.  Aerospike now offers two certified training courses; Aerospike for Developers and Aerospike for Administrators & Operators, to help you get the most out of your deployment.  Find a training course near you. http://www.aerospike.com/aerospike-training/

  • November TokuMX Meetups for Those Interested in MongoDB. Join us in one of the following cities in November to learn more about TokuMX and hear TokuMX use cases. 11/5 - London;11/11 - San Jose; 11/12 - San Francisco. Not able to get to these cities? Check out our website for other upcoming Tokutek events in your area - www.tokutek.com/events.
Cool Products and Services
  • Hypertable Inc. Announces New UpTime Support Subscription Packages. The developer of Hypertable, an open-source, high-performance, massively scalable database, announces three new UpTime support subscription packages – Premium 24/7, Enterprise 24/7 and Basic. 24/7/365 support packages start at just $1995 per month for a ten node cluster -- $49.95 per machine, per month thereafter. For more information visit us on the Web at http://www.hypertable.com/. Connect with Hypertable: @hypertable--Blog.

  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Diagnose server issues from a single tab. Scalyr replaces all your monitoring and log management services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. Engineers say it's powerful and easy to use. Customer support teams use it to troubleshoot user issues. CTO's consider it a smart alternative to Splunk, with enterprise-grade functionality, sane pricing, and human support. Trusted by in-the-know companies like Codecademy – learn more!

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

How League of Legends Scaled Chat to 70 million Players - It takes Lots of minions.

Mon, 10/13/2014 - 16:56

How would you build a chat service that needed to handle 7.5 million concurrent players, 27 million daily players, 11K messages per second, and 1 billion events per server, per day?

What could generate so much traffic? A game of course. League of Legends. League of Legends is a team based game, a multiplayer online battle arena (MOBA), where two teams of five battle against each other to control a map and achieve objectives.

For teams to succeed communication is crucial. I learned that from Michal Ptaszek, in an interesting talk on Scaling League of Legends Chat to 70 million Players (slides) at the Strange Loop 2014 conference. Michal gave a good example of why multiplayer team games require good communication between players. Imagine a basketball game without the ability to call plays. It wouldn’t work. So that means chat is crucial. Chat is not a Wouldn’t It Be Nice feature.

Michal structures the talk in an interesting way, using as a template the expression: Make it work. Make it right. Make it fast.

Making it work meant starting with XMPP as a base for chat. WhatsApp followed the same strategy. Out of the box you get something that works and scales well...until the user count really jumps. To make it right and fast, like WhatsApp, League of Legends found themselves customizing the Erlang VM. Adding lots of monitoring capabilities and performance optimizations to remove the bottlenecks that kill performance at scale.

Perhaps the most interesting part of their chat architecture is the use of Riak’s CRDTs (commutative replicated data types) to achieve their goal of a shared nothing fueled massively linear horizontal scalability. CRDTs are still esoteric, so you may not have heard of them yet, but they are the next cool thing if you can make them work for you. It’s a different way of thinking about handling writes.

Let’s learn how League of Legends built their chat system to handle 70 millions players...

Stats
Categories: Architecture

Stuff The Internet Says On Scalability For October 10th, 2014

Fri, 10/10/2014 - 16:56

Hey, it's HighScalability time:


Social climber: Instagram explorer scales to new heights in New York.

 

  • 11 billion: world population in 2100; 10 petabytes: Size of Netflix data warehouse on S3; $600 Billion: the loss when a trader can't type; 3.2: 0-60 mph time of probably not my next car.
  • Quotable Quotes:
    • @kahrens_atl: Last week #NewRelic Insights wrote 618 billion events and ran 237 trillion queries with 9 millisecond response time #FS14
    • @sustrik: Imagine debugging on a quantum computer: Looking at the value of a variable changes its value. I hope I'll be out of business by then.
    • Arrival of the Fittest: Solving Evolution's Greatest Puzzle: Every cell contains thousands of such nanomachines, each of them dedicated to a different chemical reaction. And all their complex activities take place in a tiny space where the molecular building blocks of life are packed more tightly than a Tokyo subway at rush hour. Amazing.
    • Eric Schmidt: The simplest outcome is we're going to end up breaking the Internet," said Google's Schmidt. Foreign governments, he said, are "eventually going to say, we want our own Internet in our country because we want it to work our way, and we don't want the NSA and these other people in it.
    • Antirez: Basically it is neither a CP nor an AP system. In other words, Redis Cluster does not achieve the theoretical limits of what is possible with distributed systems, in order to gain certain real world properties.
    • @aliimam: Just so we can fathom the scale of 1B vs 1M: 1,000,000 seconds is 11.5 days. 1,000,000,000 seconds is 31.6 YEARS
    • @kayousterhout: 92% of catastrophic failures in distributed data-intensive systems caused by incorrect error handling https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf … #osdi14
    • @DrQz: 'The purpose of computing is insight, not numbers.' (Hamming) Sometimes numbers ARE the insight so, make them accesible too. (Me)

  • Robert Scoble on the Gillmor Gang said that because of the crush of signups, ello had to throttle invites. Their single PostgreSQL server couldn't handle it captain.

  • Containers are getting much larger with new composite materials. Not that kind of container. Shipping containers. High oil costs have driven ships carrying 5000 containers to evolve. Now they can carry 18,000 and soon 19,000 containers!

  • If you've wanted to make a network game then this is a great start. Making Fast-Paced Multiplayer Networked Games is Hard: Fast-paced multiplayer games over the Internet are hard, but possible. First understanding your constraints then building within them is essential. I hope I have shed some light on what those constraints are and some of the techniques you can use to build within them. No doubt there are other ways out there and ways yet to be used. Each game is different and has its own set of priorities. Learning from what has been done before could help a great deal.

  • Arrival of the Fittest: Solving Evolution's Greatest Puzzle: Environmental change requires complexity, which begets robustness, which begets genotype networks, which enable innovations, the very kind that allow life to cope with environmental change, increase its complexity, and so on, in an ascending spiral of ever-increasing innovability...is the hidden architecture of life.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

That's Not My Problem - I'm Renting Them

Wed, 10/08/2014 - 16:56

Scott Hanselman gives a hilarious and insightful talk in Virtual Machines, JavaScript and Assembler, a keynote at Velocity Santa Clara 2014. The topic of his talk is an intuitive understanding of the cloud and why it's the best thing ever. 

At about 6:30 into the video Scott is at his standup comic best when he recounts a story of a talk Adrian Cockroft gave on Netflix’s move to SSDs. An audience member energetically questioned the move to SSDs saying they had high failure rates and how moving to SSDs was a stupid idea.

To which Mr. Cockroft replies:

That's not my problem, I'm renting them.

Scott selected the ideal illustration of the high level of abstraction the cloud provides. If you are new to the cloud that's a very hard idea to grasp. "That's not my problem, I'm renting them" is the perfect mantra when you find yourself worried about things you don't need to be worried about anymore.

Categories: Architecture

How Clay.io Built their 10x Architecture Using AWS, Docker, HAProxy, and Lots More

Mon, 10/06/2014 - 16:56

This is a guest repost by Zoli Kahan from Clay.io. 

This is the first post in my new series 10x, where I share my experiences and how we do things at Clay.io to develop at scale with a small team. If you find these things interesting, we're hiring - zoli@clay.io.

The Cloud CloudFlare

CloudFlare

CloudFlare handles all of our DNS, and acts as a distributed caching proxy with some additional DDOS protection features. It also handles SSL.

Amazon EC2 + VPC + NAT server
Categories: Architecture

Stuff The Internet Says On Scalability For October 3rd, 2014

Fri, 10/03/2014 - 16:56

Hey, it's HighScalability time:


Is the database landscape evolving or devolving?

 

  • 76 million: once more through the data breach; 2016: when a Zettabyte is transfered over the Internet in one year
  • Quotable Quotes:
    • @wattersjames: Words missing from the Oracle PaaS keynote: agile, continuous delivery, microservices, scalability, polyglot, open source, community #oow14
    • @samcharrington: At last count, there were over 1,000,000 million containers running in the wild. http://stats.openvz.org  @jejb_ #ccevent
    • @mappingbabel: Oracle's cloud has 30,000 computers. Google has about two million computers. Amazon over a million. Rackspace over 100,000.
    • Andrew Auernheimer: The world should have given the GNU project some money to hire developers and security auditors. Hell, it should have given Stallman a place to sleep that isn't a couch at a university. There is no f*cking justice in this world.
    • John Nagle: The right answer is to track wins and losses on delayed and non-delayed ACKs. Don't turn on ACK delay unless you're sending a lot of non-delayed ACKs closely followed by packets on which the ACK could have been piggybacked. Turn it off when a delayed ACK has to be sent. I should have pushed for this in the 1980s.
    • @neil_conway: The number of < 15 node Hadoop clusters is >> the number of > 15 node Hadoop clusters. Unfortunately not reflected in SW architecture.

  • In the meat world Google wants devices to talk to you. The Physical Web. This will be better than Apple's beacons because Apple is severely limiting the functionality of beacons by requiring IDs be baked into applications. It's a very static and controlled world. In other words, it's very Apple. By using URLs Google is supporting both the web and apps; and adding flexibility because a single app can dynamically and generically handle the interaction from any kind of device. In other words, it's very Google. Apple has the numbers though, with hundreds of millions of beacon enabled phones in customer hands. Since it's just another protocol over BLE it should work on Apple devices as well.

  • Did Netflix survive the great AWS rebootathon? The Chaos Monkey says yes, yes they did: Out of our 2700+ production Cassandra nodes, 218 were rebooted. 22 Cassandra nodes were on hardware that did not reboot successfully. This led to those Cassandra nodes not coming back online. Our automation detected the failed nodes and replaced them all, with minimal human intervention. Netflix experienced 0 downtime that weekend. 

  • Google Compute Engine is following Moore's Law by announcing a 10% discount. Bandwidth is still expensive because networks don't care about silly laws. And margin has to come from somewhere.

  • Software is eating...well you've heard it before. Mesosphere cofounder envisions future data center as ‘one big computer’: The data center of the future will be fully virtualized, with everything from power supplies to storage devices consolidated into a single pool and managed by software, according to an executive whose company intends to lead the way.

  • Companies, startups, hacker spaces, teams are all intentional communities. People choose to work together towards some end. A consistent group killer is that people can be really sh*tty to each other. There's a lot of work that has been done around how to make intentional communities work. Holacracy is just one option. Here's a really interesting interview with Diana Leafe Christian on what makes communities work. It requires creating Community Glue, Good Process and Communication Skill, Effective Project Management, and good Governance and Decision making. Which is why most communities fail. Did you know there's even something called Non-Defensive Communication? If followed the Internet would collapse.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Sponsored Post: Apple, Flipboard, All Your Base, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Tue, 09/30/2014 - 16:56

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Siri Operations Developer. Apple is looking for talented developers to help build the next generation internal cloud platform for Siri. This person should be excited about solving difficult distributed systems problems as well as constantly improving user-experience. This person will be working with a highly technical and motivated team solving the hard problems. Please apply here.
    • Site Reliability Engineer. The Apple Pay Site Reliability Team is hiring for multiple roles focused on the front line customer experience and the back end integration of Apple systems with our Network and Banking partners. Please apply here.
    • Senior Software Engineer, iTunes Infrastructure. Hands-on senior software engineering for the iTunes digital media supply chain engineering team. We are looking for a self starting, energetic individual who is not afraid to question assumptions and with excellent written and oral communication skills. Please apply here
    • iTunes - Content Management Tools Engineer. The candidate should have several years experience developing large-scale web-based applications using object-oriented languages. Excellent understanding of relational databases and data-modeling techniques is also a must. Please apply here
    • Senior Engineer: Mobile Services. The Emerging Technologies/Mobile Services team is looking for a proactive and hardworking software engineer to join our team. The team is responsible for a variety of high quality and high performing mobile services and applications for internal use. We seek an accomplished server-side engineer capable of delivering an extraordinary portfolio of features and services based on emerging technologies to our internal customers. Please apply here.
    • Apple Pay Automation Engineer. The Apple Pay group within iOS Systems is looking for a outstanding automation engineer with strong experience in building client and server test automation. We work in an agile software development environment and are building infrastructure to move towards continuous delivery where every code change is thoroughly tested by push of a button and is considered ready to be deployed if we choose so. Please apply here

  • Flipboard's Site Reliability Engineering Team is hiring! This team offers great challenges solving unique problems unlike any you have seen!  They work exclusively in the cloud, ensuring a highly available and performant product to millions of users daily.  If you have a passion for large-scale systems, next generation provisioning and orchestration tools apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • All Your Base is the only curated database conference of its kind in the UK. Listen to talks from database creators, industry leaders and developers working at the coal face on where to store and how to handle your data. Book tickets.
Cool Products and Services
  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Better, Faster, Cheaper: Pick Three. Scalyr is your universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs”; our columnar data store enables enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – get on board!

  • Are You a Startup in need of Reliable Speed at Scale? The Aerospike Startup Special provides free access to the Aerospike Enterprise Edition software with Community Support for qualifying startups. Learn more and see if you qualify! http://www.aerospike.com/blog/special-startups/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Instagram Improved their App's Performance. Here's How.

Mon, 09/29/2014 - 16:56

Is flat design just another pretty face or is it a huge performance hack cloaked as a UI revolution? It turns out flat design is a stone cold performance win.

This and more is expertly explained by Tyler Kieft, Engineer at Instagram, in a crisp and content filled talk he gave at the @scale conferenceInstagram on Typical Android. This talk was part of series of talks given by Facebook on how to design for the reality of mobile applications across the globe, where phones are slower, screens are smaller, and networks are slower than they are in the US.

Designing for a typical phone rather than a high-end phone required the Instagram team to rethink their design in a deep way. One of the revelations in Tyler's talk was that moving to a flat design was huge in making the application more beautiful, more usable, and it also substantially increased performance.

This was quite a surprise. I've only ever thought of flat design as just a way to think about how to build pretty UIs. Silly me. Thanks to Tyler for explaining the benefits of flat design so clearly and forcefully, using Instagram as a great example of what is possible.

Flat design is the anti-skeuomorphism, going digital native, eschewing a slavish obsession with the appearance of reality, adopting simple elements, simple typography, flat colors, and simple designs.

Using flat design Instagram was able shave off 120ms from its cold start times. It was also able to reduce the number of assets it took to display the feed screen from 29 assets down to 8 assets. All while making the application more beautiful, more usable, with giving more focus given to the content across different phone sizes.

How did flat design make all this possible? Please keep on reading...

Categories: Architecture

Stuff The Internet Says On Scalability For September 26th, 2014

Fri, 09/26/2014 - 16:56

Hey, it's HighScalability time:


With tensegrity landing balls we'll be the coolest aliens to ever land on Mars.
  • 6-8Tbps:  Apple’s live video stream; $65B: crowdfunding's contribution to the global economy
  • Quotable Quotes:
    • @bodil: I asked @richhickey and he said "a transducer is just a pre-fused Kleisli arrows in the list monad." #strangeloop
    • @lusis: If you couldn’t handle runit maybe you shouldn’t be f*cking with systemd. You’ll shoot your g*ddamn foot off.
    • Rob Neely: Programming model stability + Regular advances in realized performance = scientific discovery through computation
    • @BenedictEvans: Maybe 5bn PCs have been sold so far. And 17bn mobile phones.
    • @xaprb: "There's no word for the opposite of synergy" @jasonh at #surgecon

  • The SSD Endurance Experiment. The good news: You don't have to worry about writing a lot of data to SSDs anymore. That bad news: When your SSD does die your data may not be safe. Good discussion on Hacker News.

  • Don't have a lot of money? Don't worry. Being cheap can actually create cool: Teleportation was used in Star Trek because the budget couldn't afford expensive shots of spaceships landing on different planets.

  • Not so crazy after all? Google’s Internet “Loon” Balloons Will Ring the Globe within a Year

  • Before cloud and after cloud as told through a car crash

  • Cluster around dear readers, videos from MesosCon 2014 are now available.

  • From Backbone To React: Our Experience Scaling a Web Application. This seems a lot like the approach Facebook uses in their Android apps. As things get complex move the logic to a top level centralized manager and then distribute changes down to components that are not incrementally changed, they are replaced entirely.

  • Deciding between GAE or EC2? This might help: Running a website: Google App Engine vs. Amazon EC2. AWS is hard to set up. Both give you a lot for free. GAE is not customizable. On AWS use whatever languages and software you want. GAE once written your software will scale. If you have a sysadmin or your project requires specific software go with AWS. If you are small or have a static site go with GAE. 

  • Mean vs Lamp – How Do They Stack Up? MEAN = MongoDB, Express.js, Angular.js, PHP or Python. Why be MEAN: the three most significant being a single language from top to bottom, flexibility in deployment platform, and enhanced speed in data retrieval. However, the switch is not without trade-offs; any existing code will either need to be rewritten in JavaScript or integrated into the new stack in a non-obvious manner.  

  • Free the Web: Sometimes, I feel like blaming money. When money comes into play, people start to fear. They fear losing their money, and they fear losing their visitors. And so they focus on making buttons easily clickable (which inevitably narrows down places where they can go), and they focus on making sites that are safe but predictably usable.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

5 Tips for Scaling NoSQL Databases: Don’t Trust Assumptions—Test, Test, Test!

Wed, 09/24/2014 - 17:08

Alex Bordei, product manager for Bigstep’s Full Metal Cloud, in Scaling NoSQL databases: 5 tips for increasing performance, shares a nice set of lessons he's learned about how NoSQL databases scale:

  • Never assume linearity in scaling. Hardware prices grow exponentially as the specs increase, but not all software can take full advantage of all that power. So you may be paying for hardware your database can't use. Find the sweet spot for price and hardware capabilities.
  • Tests speak louder than specs. Don't trust vendor documentation. It's cheap to spin up new instances so test the specs for yourself.
  • Mind the details: Memory & CPU numbers matter. For in-memory databases the specs on your memory modules matter. Faster memory means faster performance. Same for CPU frequencies. Pay attention to what your money is buying.
  • Do not neglect network latency. Paying for fast memory and fast CPU won't do a lot of good if your network is slow. 
  • Avoid virtualization with NoSQL databases. Virtualization can exact a 20-200% performance penalty. Noisy neighbors also help ruin the neighborhood. Up to 400% performance gains can be seen by switching away from virtualization and adopting bare metal clouds.

Lots of good advice. Each of these points in discussed in more detail in the original article, which is well worth reading.

 

Categories: Architecture

How Facebook Makes Mobile Work at Scale for All Phones, on All Screens, on All Networks

Mon, 09/22/2014 - 16:58

Update: Instagram Improved Their App's Performance. Here's How.

When you find your mobile application that ran fine in the US is slow in other countries, how do you fix it? That’s a problem Facebook talks about in a couple of enlightening videos from the @scale conference. Since mobile is eating the world, this is the sort of thing you need to consider with your own apps.

In the US we may complain about our mobile networks, but that’s more #firstworldproblems talk than reality. Mobile networks in other countries can be much slower and cost a lot more. This is the conclusion from Chris Marra, Project Manager at Facebook, in a really interesting talk titled Developing Android Apps for Emerging Market.

Facebook found in the US there’s 70.6% 3G penetration with 280ms average latency. In India there’s 6.9% 3G penetration with 500ms latency. In Brazil there’s 38.6% 3G penetration with more than 850ms average latency.

Chris also talked about Facebook’s comprehensive research on who uses Facebook and what kind of phones they use. In summary they found not everyone is on a fast phone, not everyone has a large screen, and not everyone is on a fast network.

It turns out the typical phone used by Facebook users is from circa 2011, dual core, with less than 1GB of RAM. By designing for a high end phone Facebook found all their low end users, which is the typical user, had poor user experiences.

For the slow phone problem Facebook created a separate application that used lighter weight animations and other strategies to work on lower end phones. For the small screen problem Facebook designers made sure applications were functional at different screen sizes.

Facebook has moved to a product organization. A single vertical group is responsible for producing a particular product rather than having, for example, an Android team try to create all Android products. There’s also a horizontally focussed Android team trying to figure out best practices for Android, delving deep into the details of what makes a platform tick.

Each team is responsible for the end-to-end performance and reliability for their product. There are also core teams looking at and analyzing general performance problems and helping where needed to improve performance.

Both core teams and product teams are needed. The core team is really good at instrumentation and identifying problems and working with product teams to fix them. For mobile it’s important that each team owns their full product end-to-end. Owning core engagement metrics, core reliability, and core performance metrics including daily usage, cold start times, and reliability, while also knowing how to fix problems. 

To solve the slow network problem there’s a whole other talk. This time the talk is given by Andrew Rogers, Engineering Manager at Facebook, and it’s titled Tuning Facebook for Constrained Networks. Andrew talks about three methods to help deal with network problems: Image Download Sizes, Network Quality Detection, Prefetching Content.

Overall, please note the immense effort that is required to operate at Facebook scale. Not only do you have different phones like Android and iOS, you have different segments within each type of phone you must code and design for. This is crazy hard to do.

Reducing Image Sizes -  WebP saved over 30% JPEG, 80% over PNG
Categories: Architecture

Stuff The Internet Says On Scalability For September 19th, 2014

Fri, 09/19/2014 - 16:56

Hey, it's HighScalability time:


Galactic bolt-hole or supermassive black hole, weighing as much as 21 million suns?
  • Quotable Quotes:
    • @debuggist: Chief takeaway from #velocityconf for me: failure happens so monitor for the ones that are important albeit in systems or culture & fix them
    • @Carnage4Life: The real tech bubble is valuations of Google, Facebook & Twitter are inflated by app install ads from unprofitable startups 
    • Jay Parikh: Android has 2x as many users as iOS. However, iOS average revenue per user is 4x higher than Android. 
    • Joe Armstrong: We’ve made a mess. Need to reverse entropy. Quantum mechanics sets limits to ultimate speed of computation. We need Math. Abolish names and places. Build the condenser. Make low-power computers - no net environmental damaged.
    • @reneritchie: iOS 8 is a beefy update. The internet is feeling the strain of millions of downloads. If it’s slow or stuttering, just give it some time.
    • @vishyp: Quip is big on disconnected clients and clients as write-thru caches. Like it! Uses C++ for x-platform. @btaylor #atscale2014
    • @csabacsoma: Instragram: we use Postgres for almost everything now, some Redis and Memcached #AtScale2014
    • @rfberry: "To maximize throughput, we needed an integrated approach to backpressure." #AtScale2014
    • @sharonw: I asked this Facebook / Instagram panel about image upload. Key changes: retry aggressively, and resize and encode on device. #AtScale2014
    • Igor Zaika: Most of the developers who work on Microsoft Office are younger than the codebase. 
    • Chris Marra: There are about 10k different device models accessing Facebook. Designing for high-end smartphones does not cut it.
    • @Carnage4Life: Apple spends $100m on a U2 album you don't want. Microsoft spends that on ads you don't like. Amazon spends it on free shipping #Marketing
    • @beerops: Laziness as a multiplier: train other people to do what you can do to remove yourself as a bottleneck #velocityconf
    • @xaprb: "Failure is a feature of complex systems." - #velocityconf
    • @dalmaer: "Our mobile app is a write through cache (SQLite) to the source of truth (MySQL on AWS)" -- @btaylor #AtScale2014
    • @herminghaus: @BenedictEvans Designed an 11.4TB patent retrieval system in 1993 with slow WORM robots. Cost $140m. Now <$1000.- at BestBuy.
    • @BenedictEvans: You can't ask people to decide on a trade-off when they have experience of one side but not the other.

  • Caching at Scale. There's a need to better manage caching, especially under failure conditions. Solutions are generally in the form of a proxy layer above memcached. Along these lines Box created Tron, Twitter created TwemProxy, and Facebook created a value meal in McRouter. Database people have always countered, why have a separate cache, just build a cache into the database? This hasn't worked for various reasons, mostly because a database always cares more about being a good database rather than being a good cache. Vitess wants to fix that. Vitess is an open-source system written in Go used at Youtube that: challenges the paradigm of treating caching as a separate layer by directly addressing the issues of database scalability and by modifying the handling of SQL queries.  

  • Talk about your Chaos Godzilla. Facebook Turned Off Entire Data Center to Test Resiliency. Before flipping that switch there must have been a little pause, perhaps thinking this wouldn't be prudent, but damn the torpedoes and full speed ahead. Apparently some issues were found, but it went fairly smoothly. Hazaa for the chutzpah.

  • Best LAN party ever? Researchers twist four radio beams together to achieve high data transmission speeds. The researchers reached data transmission rates of 32 gigabits per second across 2.5 meters of free space in a basement lab.

  • This is an understatement. iOS 8, thoroughly reviewed. An amazing job. A big take away for me is Apple is systematically removing reasons not to buy an iPhone. Bigger phone. Check. Configurable keyboard. Check. Extensions that display in the today view and allow app cooperation. Check. Another take away is Apple is abandoning simplicity for configurability, which is embracing complexity, which is a potential experience killer.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

The FireBox Warehouse Scale Computer in 2020 Will Have 1K Sockets, 100K Cores, 100PB NV RAM, and a 4Pb/s Network

Wed, 09/17/2014 - 16:56

That's the eye popping prediction from Krste Asanović, University of California, Berkeley, in a presentation he gave at FAST '14 titled: FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers (pdf).

FireFox looks system like this:

Trends in Warehouse Scale Computers (WSC)s:
Categories: Architecture

Sponsored Post: Apple, Flipboard, All Your Base, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Tue, 09/16/2014 - 16:56

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Siri Operations Developer. Apple is looking for talented developers to help build the next generation internal cloud platform for Siri. This person should be excited about solving difficult distributed systems problems as well as constantly improving user-experience. This person will be working with a highly technical and motivated team solving the hard problems. Please apply here.
    • Site Reliability Engineer. The Apple Pay Site Reliability Team is hiring for multiple roles focused on the front line customer experience and the back end integration of Apple systems with our Network and Banking partners. Please apply here.
    • Senior Software Engineer, iTunes Infrastructure. Hands-on senior software engineering for the iTunes digital media supply chain engineering team. We are looking for a self starting, energetic individual who is not afraid to question assumptions and with excellent written and oral communication skills. Please apply here
    • iTunes - Content Management Tools Engineer. The candidate should have several years experience developing large-scale web-based applications using object-oriented languages. Excellent understanding of relational databases and data-modeling techniques is also a must. Please apply here

  • Flipboard's Site Reliability Engineering Team is hiring! This team offers great challenges solving unique problems unlike any you have seen!  They work exclusively in the cloud, ensuring a highly available and performant product to millions of users daily.  If you have a passion for large-scale systems, next generation provisioning and orchestration tools apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • All Your Base is the only curated database conference of its kind in the UK. Listen to talks from database creators, industry leaders and developers working at the coal face on where to store and how to handle your data. Book tickets.
Cool Products and Services
  • FoundationDB launches SQL Layer. SQL Layer is an ANSI SQL engine that stores its data in the FoundationDB Key-Value Store, inheriting its exceptional properties like automatic fault tolerance and scalability. It is best suited for operational (OLTP) applications with high concurrency. Users of the Key Value store will have free access to SQL Layer. SQL Layer is also open source, you can get started with it on GitHub as well.

  • Better, Faster, Cheaper: Pick Three. Scalyr is your universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs”; our columnar data store enables enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – get on board!

  • Whitepaper Clarifies ACID Support in Aerospike. In our latest whitepaper, author and Aerospike VP of Engineering & Operations, Srini Srinivasan, defines ACID support in Aerospike, and explains how Aerospike maintains high consistency by using techniques to reduce the possibility of partitions.  Read the whitepaper: http://www.aerospike.com/docs/architecture/assets/AerospikeACIDSupport.pdf.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Getting Things Right: A Look at Centralized vs Decentralized Systems Through the Eyes of Instant Replay

Mon, 09/15/2014 - 17:06

Three baseball umpires were sitting around a bar, talking about how they make calls on each pitch: First umpire: Some are balls and some are strikes, and I call them as they are. Second umpire: Some are balls and some are strikes, and I call them as I see 'em. Third umpire: Some are balls and some are strikes, but they ain’t nothin' until I call 'em.


AT&T's Global Network Operations Center

 


MLB's Instant Replay Bunker
NHL's Situation Room

It’s fun to look at how concepts we think of as belonging primarily to the domain of computer science play out in other fields. One intriguing example is how Instant Replay reflects and even helps shape the culture of a sport by how replay is implemented: decentralized or centralized.

Lucrative TV deals have pumped huge sums of money into professional sports. With so much money in play, sports have shifted from being pure entertainment to wanting to get things right. The price of making a bad call is just too high to let the human element decide the fate of titans.

Getting things right is also a much talked about subject in computer science. In CS the language of getting things right uses terms like transaction, rollback, quorum, optimistic replication, linearizability, synchronization, lock, eventually consistent, compensating transaction, and so on.

In sports to get things right referees use terms like flag, penalty, by rule, ruling stands, reset the clock, down and distance, line to gain, the whistle blew, ruling confirmed, and ruling overturned.

Though the vocabulary is different, the intent is much the same. Correctness.

Intent is not all tech and sports have in common. As technology evolves we are seeing sports change to take advantage of the new capabilities technology offers. And those changes should be familiar to anyone in software. Sports have gone from a completely decentralized system of officiating to where we now see the NBA, NFL, MLB, and NHL, all converging on some form of a centralized system.

The NHL were the innovators, starting their centralized instant replay system in 2011. It works something like this...officials sit in a war room located in Toronto that looks a lot like every network operations center ever built. Video feeds from all games flow into the room. When there is a controversy or an obvious review-worthy play, Toronto is contacted for a quick review and judgement on the correct call.  Every sport will implement their own centralized replay system in their own way, but that's the gist of it.

We’ve seen the exact same transformation as federated services like email have been replaced with centralized services like Twitter and Facebook. It turns out sports and computer science have some deeper similarities. What might those be?

Categories: Architecture

Stuff The Internet Says On Scalability For September 12th, 2014

Fri, 09/12/2014 - 16:56

Hey, it's HighScalability time:


Each dot in this image is an entire galaxy containing billions of stars. What's in there?
  • Quotable Quotes:
    • mseepgood: Or "another language that's becoming popular, Node.js"
    • Joe Moreno: What good are billions of cycles of CPU power that make me wait. I shouldn't have to wait longer and longer due to launching, buffering, syncing, I/O and latency.
    • @stevecheney: Apple Pay is the magic that integrated hardware / software produces. No one else in the world can do this.
    • @etherealmind: Next gen Intel Xeon E5 V3 CPU includes packet processor for 40GBE, 30x increase in OpenSSL crypto, 25% increase in DPDK perf. #IDF14
    • @pbailis: There's actually an interesting question in understanding when to break "sharing" -- at core, NUMA domain, server, or cluster level?
    • @fmueller_bln: Just wait some minutes for vagrant to provision a vm with puppet and you’ll know why docker may be better option for dev machines...

  • Encryption will make fighting the spam war much costlier reveals Mike Hearn in an awesome post: A brief history of the spam war, where he gives insightful color commentary of the punch counter punch between World Heavyweight Champion Google and the challenger, Clever Spammer.  Mike worked in the Gmail trenches for over four years and recommends: make sending email cost money; use money to create deposits using bitcoin. 

  • jeswin: No other browser can practically implement or support Dart. If they do their implementation will be slower than Google's, and will get classified as inferior. < Ignoring the merits of Dart, this is an interesting ecosystem effect. By rating sites for non quality of content reasons Google can in effect select for characteristics over which they have a comparative advantage. It's not an arms length transaction. 

  • Dateline Seattle. Social media users execute a coordinated denial of service attack on cell networks, preventing those in need from accessing emergency services. Who are these terrorists? Football fans. City of Seattle asks people to stop streaming videos, posting photos because of football. Tweets, Instagram, YouTube, and Snapchat are overloading the cell networks so calls can't get through. Should the cell network expand capacity? Should there be an app tax to constrain demand? Should users pay per packet? As a 49ers fan I have another suggestion...move games to a different venue, perhaps the moon. That will help.

  • Are you a militant cable cutter who thinks the future of  TV is the Internet? Not so fast says Dan Rayburn in Internet Traffic Records Could Be Broken This Week Thanks To Apple, NFL, Sony, Xbox, EA and Others: Delivering video over the Internet at the same scale and quality that you do over a cable network isn’t possible. The Internet is not a cable network and if you think otherwise, you will be proven wrong this week. We’re going to see long download times, more buffering of streams, more QoS issues and ISPs that will take steps to deal with the traffic. 

  • Ted Nelson takes on the impossible in on How Bitcoin Actually Works (Computers for Cynics #7). And he does an excellent job, sharing his usual insight with a twist. The title is misleading however. There's hardly any cynicism. How disappointing! Ted is clearly impressed with the design and implementation of bitcoin. For good reason. No matter what you think of bitcoin and its potential role in society, it is a very well thought out and impressive piece of technology. On par with Newton, Mr. Nelson suggests. If you watch this you'll probably realize that you don't actually understand bitcoin, even if you think you do, and that's a good thing.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture