Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

High Scalability - Building bigger, faster, more reliable websites

Paper: Heracles: Improving Resource Efficiency at Scale

Thu, 06/04/2015 - 16:56

Underutilization and segregation are the classic strategies for ensuring resources are available when work absolutely must get done. Keep a database on its own server so that when the load spikes, another VM or high-priority thread can't interfere with RAM, power, disk, or CPU access. And when you really need fast and reliable networking, you can't rely on QoS; you keep a dedicated line.

Google flips the script in Heracles: Improving Resource Efficiency at Scale, shooting for high resource utilization while combining different load profiles.

I'm assuming the project name Heracles was chosen not simply for his legendary strength, but because when strength failed, Heracles could always depend on his wits. Who can ever forget when Heracles tricked Atlas into taking the sky back onto his shoulders? Good times.

The problem: better utilization of compute resources while complying with service level objectives (SLOs) for latency-critical (LC) and best effort batch (BE) tasks:

Categories: Architecture

What Does it Mean to Poke a Complex System?

Wed, 06/03/2015 - 16:56

A little bit of follow up...

In How Can We Build Better Complex Systems? Containers, Microservices, And Continuous Delivery I had a question about what Mary Poppendieck meant when she talked about poking complex systems.

InfoQ interviewed Mary & Tom Poppendieck and found out the answer:

Categories: Architecture

Why You Don't Want to Aim for 100% Uptime According to Google's Urs Hölzle

Tue, 06/02/2015 - 16:56

Wait, you don't want 100% uptime? Who said such a crazy thing? Risk taker Urs Hölzle, senior VP for technical infrastructure, in Google's Infrastructure Chief Talks SDN: Whenever you try something new, there are going to be problems with it....We were willing to take the risk to get the innovation. Our VP who runs our site reliability gave a great talk about not aiming for 100% uptime....The easiest way to make it be at 100% is to resist change, because change is when bad things happen. Looks great for your SLA, but it's bad for your business because you slow down innovation.... In the first year of running B4, [we asked] "Will we have an outage?" Realistically, yes there's a high chance because it was all new code. Are we going to be perfect? Probably not. You have to have a willingness to take a little risk.
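
To put numbers on the trade-off, here is a minimal sketch that converts an availability target into the downtime it allows. The function name and the 30-day period are assumptions made for illustration and do not come from the talk.

```python
def downtime_budget_minutes(availability_pct, period_days=30.0):
    """Minutes of downtime allowed in a period for a given availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how fast the budget shrinks.
    for target in (99.0, 99.9, 99.99, 100.0):
        budget = downtime_budget_minutes(target)
        print(f"{target}% over 30 days allows {budget:.2f} minutes of downtime")
```

A 99.9% target leaves roughly 43 minutes per 30 days in which changes can go wrong. A 100% target leaves none, which is the point of the quote: the only way to spend nothing is to change nothing.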
Categories: Architecture

Developing Products in the Style of Etsy

Mon, 06/01/2015 - 16:56

How should you go about structuring your project? We have two general paradigms that I'll characterize as flowing from the Etsy coaching tree, emphasizing the monolith, and from the Netflix coaching tree, emphasizing microservices. This is of course an oversimplification, but it's for instructional purposes only. For a broad comparison of the two approaches take a look at The Great Microservices Vs Monolithic Apps Twitter Melee.

This is not a good vs. evil sort of mythos. The Force is truly one. We simply have two valid and functional ways of looking at the world.

I think wdewind nails the heart of the difference:

The point of the article is that local optimization gives you this tiny boost in the beginning for a long term cost that eventually moves the organization in a direction of shipping less. It's not that innovative technologies are bad.

The mentioned article is Choose Boring Technology by Dan McKinley, in which Dan does a great job exploring Etsy style development with both insight and wisdom. 

Dan explores four different principles:

Categories: Architecture

Stuff The Internet Says On Scalability For May 29th, 2015

Fri, 05/29/2015 - 16:56

Hey, it's HighScalability time:


Just imagine. 0-100 mph in 1.2 seconds. Astronaut's view from the Dragon spacecraft.
  • $850B: mobile web market in 2018; 107: unicorns; 3.2 billion: # of people on the Internet; 10^82: atoms in the observable universe
  • Quotable Quotes:
    • @cloud_opinion: appropriate term for people that resist Docker is "VM Huggers"
    • @mikeloukides: Scale systems, not teams. Adding scale shouldn’t mean adding people. Teams should scale sublinearly.  @shinynew_oz @ #velocityconf
    • Marc Levinson: If the market repeatedly misjudged the container, so did the state. Governments in New York City and San Francisco ignored the consequences of containerization as they wasted hundreds of millions of dollars reconstructing ports that were outmoded before the concrete was dry
    • @corbett: doesn't describe ultimate origin but "Inflation describes how the universe emerges from a patch of 10^-28cm & mass of only a few grams" -AG
    • @Gizmodo: Since last year, over 600 million more people have smartphones. It’s the age of mobile, says Sundar Pichai. #io15
    • @stshank: Android in a nutshell: >1 billion users, 4000 devices, 500 carriers, 400 device makers says @sundarpichai at #io15 
    • Carlos C:  Congratulations, FP hackers. You won the battle of simplicity to express...and here is where Go wins the battle of simplicity to achieve.
    • @markimbriaco: @joestump In my day, we emitted HTML from our apps. Pushed the packets uphill to the browsers. Through driving DDoS. And we liked it.
    • aikah: Yep, hail "Isomorphic micro-service oriented management."
    • @bitfield: "We haven't got time to automate this stuff, because we're too busy dealing with the problems caused by our lack of automation." —Everyone
    • @raju: India reported 851 Million active mobile connections in February 2015
    • @ValaAfshar: The average smartphone user checks their mobile device 214 times per day... and 86% of the time is apps (vs 14% browser). #codecon
    • @BradStone: Meeker: 87 percent millennials say smartphones never leave their side night or day. 44 percent use camera at least once a day. #CodeCon
    • @sequoia: "We're close to 1M people everyday staying at an @Airbnb home. We're here to stay" @bchesky #codecon
    • @pmarca: Moore's Law used to be about faster, now it's more about cheaper. Huge change with the biggest possible consequences. 
    • Nicolas Liochon: CAP: if all you have is a timeout, everything looks like a partition
    • See the complete post for the full list...

  • This would change things. What Memory Will Intel’s Purley Platform Use?: One slide, titled: “Purley: Biggest Platform Advancement Since Nehalem” includes this post’s graphic, which tells of a memory with: “Up to 4x the capacity & lower cost than DRAM, and 500x faster than NAND.” Also, What High-Bandwidth Memory Is and Why You Should Care

  • The question seldom asked with these kinds of efforts: Does your idea of merit have merit? Startup Aims to Make Silicon Valley an Actual Meritocracy.

  • The reason for us to save everything is that our collective data is the training ground for future AIs. We should train them to understand all of humanity. Hopefully they'll learn pity. Oh, wait...  The Internet With A Human Face: I've come to believe that a lot of what's wrong with the Internet has to do with memory. The Internet somehow contrives to remember too much and too little at the same time. 

  • If you would like a rich exploration of the ethical implications of post-humanism then Apex: Nexus Arc Book 3 by Ramez Naam is the book for you. The framework is a game of iterated tit-for-tat. Ultimately if we don't want post-humans to destroy us lowly humans then we humans need to treat them well, from the start. If we harm them then the correct move on their part is to tat us. That won't be good. So open with a trust move and be nice. This radical notion might even work with normal humans.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

A Toolkit to Measure Basic System Performance and OS Jitter

Wed, 05/27/2015 - 16:56

Jean Dagenais published a great response on a mechanical-sympathy thread to Gil Tene's article, The Black Magic Of Systematically Reducing Linux OS Jitter. It's full of helpful tools for tracking down jitter problems. I apologize for the incomplete attribution. I did not find a web presence for Jean. 

To complement the great information I got on the “Systematic Way to Find Linux Jitter”, I have created a toolkit that I now use to evaluate current and future trading platforms.

In case this can be useful, I have listed these tools, as well as the URLs to get the source code and a description of their usage. I am learning a lot by reading the source code, and the blog entry associated.

This is far from an exhaustive list, as every week I find either a new problem area or a new tool that improves my understanding of this beautiful problem domain ;)

These tools are grouped into these categories: 

  1. CPU, Memory, Disk, Network
  2. X86, Linux, and Java time resolution
  3. Context Switches & Inter Thread Latency
  4. System Jitter
  5. Application Building Blocks: Disruptor, OpenHFT, Aeron & Workload Generator
  6. Application Performance Testing

Happy Benchmarking and Jitter Chasing!
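
To give a feel for what the "System Jitter" category measures, here is a minimal sketch that reads a monotonic clock in a tight loop and records the gap between consecutive reads. It is not one of the tools from Jean's list. The function names are made up for illustration, and the gaps include Python interpreter overhead, so treat the numbers as relative rather than absolute.

```python
import time


def measure_clock_gaps(samples=100_000):
    """Return the gaps in nanoseconds between consecutive clock reads."""
    gaps = []
    previous = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        gaps.append(now - previous)
        previous = now
    return gaps


def summarize(gaps):
    """Report median, 99th percentile, and worst-case gap."""
    ordered = sorted(gaps)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ns": ordered[len(ordered) // 2],
        "p99_ns": ordered[p99_index],
        "max_ns": ordered[-1],
    }


if __name__ == "__main__":
    print(summarize(measure_clock_gaps()))
```

On a quiet machine most gaps cluster near the median. Large outliers in the maximum hint at the thread being descheduled or interrupted, which is the kind of event the dedicated tools track down in far more detail.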

1. CPU, Memory, Disk, Network

Categories: Architecture

Sponsored Post: Tumblr, Power Admin, Learninghouse, MongoDB, Internap, Aerospike, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 05/26/2015 - 16:56

Who's Hiring?
  • Make Tumblr fast, reliable and available for hundreds of millions of visitors and tens of millions of users.  As a Site Reliability Engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems. Apply here now! 

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • 90 Days. 1 Bootcamp. A whole new life. Interested in learning how to code? Concordia St. Paul's Coding Bootcamp is an intensive, fast-paced program where you learn to be a software developer. In this full-time, 12-week on-campus course, you will learn either .NET or Java and acquire the skills needed for entry-level developer positions. For more information, read the Guide to Coding Bootcamp or visit bootcamp.csp.edu.

  • June 2nd – 4th, Santa Clara: Register for the largest NoSQL event of the year, Couchbase Connect 2015, and hear how innovative companies like Cisco, TurboTax, Joyent, PayPal, Nielsen and Ryanair are using our NoSQL technology to solve today’s toughest big data challenges. Register Today.

  • The Art of Cyberwar: Security in the Age of Information. Cybercrime is an increasingly serious issue both in the United States and around the world; the estimated annual cost of global cybercrime has reached $100 billion with over 1.5 million victims per day affected by data breaches, DDoS attacks, and more. Learn about the current state of cybercrime and the cybersecurity professionals in charge of combating it in The Art of Cyberwar: Security in the Age of Information, provided by Russell Sage Online, a division of The Sage Colleges.

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at http://mongodbworld.com/ but hurry! Super Early Bird pricing ends on April 3.
Cool Products and Services
  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • Instructions for implementing Redis functionality in Aerospike. Aerospike Director of Applications Engineering, Peter Milne, discusses how to obtain the semantic equivalent of Redis operations, on simple types, using Aerospike to improve scalability, reliability, and ease of use. Read more.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a .NET native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Appknox Architecture - Making the Switch from AWS to the Google Cloud

Mon, 05/25/2015 - 17:00

This is a guest post by dhilipsiva, Full-Stack & DevOps Engineer at Appknox.

Appknox helps detect and fix security loopholes in mobile applications. Securing your app is as simple as submitting your store link. We upload your app, scan for security vulnerabilities, and report the results. 

What's notable about our stack:
  • Modular Design. We modularized stuff so far that we de-coupled our front-end from our back-end. This architecture has many advantages that we'll talk about later in the post.
  • Switch from AWS to Google Cloud. We made our code largely vendor independent so we were able to easily make the switch from AWS to the Google Cloud. 
Primary Languages
  1. Python & Shell for the Back-end
  2. CoffeeScript and LESS for Front-end
Our Stack
  1. Django
  2. Postgres (Migrated from MySQL)
  3. RabbitMQ
  4. Celery
  5. Redis
  6. Memcached
  7. Varnish
  8. Nginx
  9. Ember
  10. Google Compute 
  11. Google Cloud Storage
Architecture
Categories: Architecture

Stuff The Internet Says On Scalability For May 22nd, 2015

Fri, 05/22/2015 - 16:56

Hey, it's HighScalability time:


Where is the World Brain? San Fernando marshes in Spain (by Cristobal Serrano)
  • 569TB: 500px total data transfer per month; 82% faster: elite athletes' brains; billions and millions: Facebook's graph store read and write load; 1.3 billion: daily Pinterest spam fighting events; 1 trillion: increase in processing power performance over six decades; 5 trillion: Facebook pub-sub messages per day
  • Quotable Quotes:
    • Silicon Valley: “Tell me the truth,” Gavin demands of a staff member. “Is it Windows Vista bad? Zune bad?” “I’m sorry,” the staffer tells Gavin, “but it’s Apple Maps bad!”
    • @garybernhardt: Reminder to people whose "big data" is under a terabyte: servers with 1 TB RAM can be had for about $20k. Your data set fits in RAM.
    • @epc: μServices and AWS Lambda are this year’s containers and Docker at #Gluecon
    • orasis: So by this theory the value of a tech startup is the developer's laptops and the value of a yoga studio is the loaner mats.
    • @ajclayton: An average attacker sits on your network for 229 days, collecting information. @StephenCoty #gluecon
    • @mipsytipsy: people don't *cause* problems, they trigger latent conditions that make failures more likely.  @allspaw on post mortems #srecon15europe
    • @pas256: The future of cloud infrastructure is a secure, elastically scalable, highly reliable, and continuously deployed microservices architecture
    • Kevin Marks: The Web is the network
    • @cdixon: We asked for flying cars and all we got was the entire planet communicating instantly via $34 pocket supercomputers 
    • @ajclayton: Uh oh, @pas256 just suggested that something could be called a "nanoservice"...microservices are already old. #gluecon
    • @jamesurquhart: A sign that containers are interim step? Pkging procs better than pkging servers, but not as good as pkging functs? 
    • @markburgess_osl: Let's rename "immutable infrastructure" to "prefab/disposable" infrastructure, to decouple it from the false association with functionalprog
    • @Beaker: Key to startup success: solve a problem that has been solved before but was constrained due to platform tech cost or non-automated ops scale
    • @mooreds: 10M req/month == $45 for lambda.  Cheap. -- @pas256 #gluecon
    • @ajclayton: Microservices "exist on all points of the hype cycle simultaneously" @johnsheehan #gluecon
    • @oztalip: "Treat web server as a library not as a container, start it inside your application, not the other way around!" -@starbuxman #GOTOChgo
    • @sharonclin: If a site doesn't load in 3 sec, 57% abandon, 80% never return.  @krbenedict #m6xchange #Telerik
    • QuirksMode: Tools don’t solve problems any more, they have become the problem.
    • @rzazueta: Was considering taking a shot every time I saw "Microservices" on the #gluecon hashtag. But I've already gone through two livers.
    • @MariaSallis: "If you don't invest in infrastructure, don't invest in microservices" @johnsheehan #gluecon
    • Brian Gallagher: If the world devolved into a single cloud provider, there would be no need for Cloud Foundry.
    • @b6n: startup idea: use technology from the 70s.
    • Stephen Hawking: The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge
    • @aneel: "Monolithic apps have unlimited invisible internal dependencies" -@adrianco #gluecon
    • @windley: microservices don’t reduce complexity, they move it around, from dev to ops. #gluecon
    • @paulsbruce: "When everyone has to be an expert in everything, that doesn't scale." @dberkholz @451research #gluecon
    • @oamike: I didn’t do SOA right, I didn’t do REST right, I’m sure as hell not going to do micro services right. #gluecon @kinlane
    • Urs Hölzle: My biggest worry is that regulation will threaten the pace of innovation.
    • @mccrory: There has been an explosion in managed OpenStack solutions - Platform9, MetaCloud, BlueBox
    • @viktorklang: Remember that you heard it here first, CPU L1 cache is the new disk.

  • This is more a measure of the fecundity of the ecosystem than an indication of disease. By its very nature the magic creation machine that is Silicon Valley must create both wonder and bewilderment. Silicon Valley Is a Big Fat Lie: That gap between the Silicon Valley that enriches the world and the Silicon Valley that wastes itself on the trivial is widening daily.

  • In a liquidity crisis all those promises mean nothing. RadioShack Sold Your Data to Pay Off Its Debts.

  • YouTube has to work at it too. To Take On HBO And Netflix, YouTube Had To Rewire Itself: All of the things that InnerTube has enabled—faster iteration, improved user testing, mobile user analytics, smarter recommendations, and more robust search—have paid off in a big way. As of early 2015, YouTube was finally becoming a destination: On mobile, 80% of YouTube sessions currently originate from within YouTube itself.

  • If you aren't doing web stuff, do you really need to use HTTP? Do you really know why you prefer REST over RPC? There's no reason for API requests to pass through an HTTP stack.

  • If scaling is specialization and the cloud is the computer then why are we still using TCP/IP between services within a datacenter? Remote Direct Memory Access is fast. FaRM: Fast Remote Memory: FaRM’s per-machine throughput of 6.3 million operations per second is 10x that reported for Tao. FaRM’s average latency at peak throughput was 41µs which is 40–50x lower than reported Tao latencies. 

  • MigratoryData with 10 Million Concurrent Connections on a single commodity server. Lots of details on how the benchmark was run and the various configuration options. CPU usage under 50% (with spikes), memory usage was predictable, network traffic was  0.8 Gbps for 168,000 messages per second, 95th Percentile Latency: 374.90 ms. Next up? C100M.

  • Does anyone have a ProductHunt invite that they would be willing to share with me?

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Database Scaling Redefined: Scaling Demanding Queries, High Velocity Data Modifications and Fast Indexing All At Once for Big Data

Wed, 05/20/2015 - 16:56

This is a guest post by Cihan Biyikoglu, Director of Product Management at Couchbase.

Question: A few million people are out looking for a setup to efficiently live and interact. What is the most optimized architecture they can use?

  1. Build one giant high-rise for everyone,
  2. Build many single-family homes OR
  3. Build something in between?

Schools, libraries, retail stores, corporate HQs, and homes are all there to optimize a variety of interactions. Sizes of groups and types of exchange vary drastically… Turns out, what we have chosen to do is build all of the above. To optimize different interactions, different architectures make sense.

While high-rises can be effective for interactions with a high density of people on a small amount of land, it is impractical to build 500-story buildings. It is also hard to add or remove floors as you need them. So high-rises feel awfully like scaling up: a cluster of processors communicating over fast memory computes fast, but with a limited scale ceiling and limited elasticity.

As your home, single-family architecture works great. Nice backyard to play in and private space for family dinners... You may need to get in your car to interact with other families, BUT it is easy to build more single-family houses: so easy elasticity and scale. Single-family structure feels awfully like scaling out, doesn't it? A cluster of commodity machines that communicate over slower networks and come with great elasticity.

“How does this all relate to database scalability?” you ask…

Categories: Architecture

Paper: FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

Tue, 05/19/2015 - 16:56

It's amazing what you can accomplish these days on a single machine using SSDs and smart design. In the paper FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs they:

demonstrate that FlashGraph is able to process graphs with billions of vertices and hundreds of billions of edges on a single commodity machine.

The challenge is SSDs are a lot slower than RAM:

The throughput of SSDs are an order of magnitude less than DRAM and the I/O latency is multiple orders of magnitude slower. Also, I/O performance is extremely non-uniform and needs to be localized. Finally, high-speed I/O consumes many CPU cycles, interfering with graph processing.

Their solution exploits caching, parallelism, smart scheduling and smart placement algorithms:

We build FlashGraph on top of a user-space SSD file system called SAFS [32] to overcome these technical challenges. The set-associative file system (SAFS) refactors I/O scheduling, data placement, and data caching for the extreme parallelism of modern NUMA multiprocessors. The lightweight SAFS cache enables FlashGraph to adapt to graph applications with different cache hit rates. We integrate FlashGraph with the asynchronous user-task I/O interface of SAFS to reduce the overhead of accessing data in the page cache and memory consumption, as well as overlapping computation with I/O.

The result achieves up to 80% of the performance of its in-memory implementation:

We observe that in many graph applications a large SSD array is capable of delivering enough I/Os to saturate the CPU. This suggests the importance of optimizing for CPU and RAM in such an I/O system. It also suggests that SSDs have been sufficiently fast to be an important extension for RAM when we build a machine for large-scale graph analysis applications.
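
The idea of overlapping computation with I/O can be sketched in a few lines. The example below is not the SAFS interface and not FlashGraph's code. It uses a thread pool as a stand-in for asynchronous I/O, and `FAKE_SSD`, `read_edges`, and `count_edges_overlapped` are hypothetical names invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for adjacency lists stored on SSD.
FAKE_SSD = {v: [(v + 1) % 8, (v + 3) % 8] for v in range(8)}


def read_edges(vertex):
    """Simulate a slow read of one vertex's adjacency list."""
    time.sleep(0.01)  # pretend I/O latency
    return FAKE_SSD[vertex]


def count_edges_overlapped(vertices, io_threads=4):
    """Sum out-degrees while reads for later vertices are still in flight."""
    total = 0
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        # pool.map submits every read immediately and yields results in order,
        # so the loop body runs while the remaining reads are still pending.
        for edges in pool.map(read_edges, vertices):
            total += len(edges)  # the "computation" step
    return total


if __name__ == "__main__":
    print(count_edges_overlapped(range(8)))
```

With several reads in flight at once, the per-read latency is hidden behind the work done on earlier results. That is the general effect the paper attributes to its asynchronous user-task interface, achieved there with very different machinery.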

Abstract: 

Categories: Architecture

How MySQL is able to scale to 200 Million QPS - MySQL Cluster

Mon, 05/18/2015 - 16:56

This is a guest post by Andrew Morgan, MySQL Principal Product Manager at Oracle.

The purpose of this post is to introduce MySQL Cluster - which is the in-memory, real-time, scalable, highly available version of MySQL. Before addressing the incredible claim in the title of 200 Million Queries Per Second it makes sense to go through an introduction of MySQL Cluster and its architecture in order to understand how it can be achieved.

Introduction to MySQL Cluster
Categories: Architecture

Stuff The Internet Says On Scalability For May 15th, 2015

Fri, 05/15/2015 - 16:56

Hey, it's HighScalability time:


Stand atop a volcano and survey the universe.  (By Shane Black & Judy Schmidt)
  • 1 million: Airbnb's room inventory; 2 billion: Telegram messages sent daily; Two billion: photos shared daily on Facebook; 10,000: sensors in every Airbus wing
  • Quotable Quotes:
    • Silicon Valley: “We’re about shaving yoctoseconds off latency for every layer in the stack,” he said. “If we rent from a public cloud, we’re using servers that are, by definition, generic and unpredictable.”
    • @liviutudor: Netflix: approx 250 Cassandra clusters over 7,000+ server instances #cloud
    • @GreylockVC: "More billion-dollar marketplaces will be created in the next five years than in the previous 20." - @simonrothman 
    • CDIXON: Exponential growth curves in the “feels gradual” phase are deceptive. There are many things happening today in technology that feel gradual and disappointing but will soon feel sudden and amazing.
    • @badnima: OH: "The gossip protocol has reached its scaling limits"
    • marcosdumay: People get pretty excited every time physicists talk about information. The bottom line is that information manipulation is just Math, viewed by a different angle.
    • Bill Janeway: There's only one way to hedge against uncertainty in venture capital...cash and control. Enough cash that when something goes wrong you can buy time to figure out what is and assess what you can do about it. 
    • zylo4747's coworker: Where's the step about preparing to have all your plans crushed and rushing shit out the door as fast as possible?
    • Martin Fowler: don't even consider microservices unless you have a system that's too complex to manage as a monolith. 
    • @postwait: Ingesting, querying, & visualizing data isn't a monitoring system. It isn't even sufficient plumbing for such a system. #srecon15europe
    • @techsummitpr: "Up to date weather conditions? It's not a marvel from Google, it's a marvel from the National Weather Service." @timoreilly #techsummitpr
    • @sovereignfund: Verified as legit: The top 25 hedge fund managers earn more than all kindergarten teachers in U.S. combined. 
    • Adrian Colyer: In their evaluation, the authors found that mixing MapReduce and memcached traffic in the same network extended memcached latency in the tail by 85x compared to an interference free network. 
    • @BenedictEvans: US ecommerce revenues 1999: $12bn 2013: $219bn
    • Gregory Hickok: the brain samples the world in rhythmic pulses, perhaps even discrete time chunks, much like the individual frames of a movie. From the brain’s perspective, experience is not continuous but quantized.
    • David Bollier: There is no master inventory of commons. They can arise whenever a community decides it wishes to manage a resource in a collective manner, with a special regard for equitable access, use and sustainability.

  • What’s Next for Moore’s Law?: I predict that Intel's 10nm process technology will use Quantum Well FETs (QWFETs) with a 3D fin geometry, InGaAs for the NFET channel, and strained Germanium for the PFET channel, enabling lower voltage and more energy efficient transistors in 2016, and the rest of the industry will follow suit at the 7nm node.

  • Don't read How to Build a Unicorn From Scratch – and Walk Away with Nothing if you are easily frightened. Years of work down the drain. **chills** To walk safely through the Valley: Focus on terms, not just valuation; Build a waterfall; Don’t do bad business deals just to get investment capital; Understand the motivations of others; Understand your own motivation.

  • How do you build a real-time chat system? Scaling Secret: Real-time Chat. Goal was to handle 50,000 simultaneous conversations. Pusher was used to deliver messages. For a database Secret used Google App Engine’s High-Replication Datastore. Some nice details on the schema and other issues. Good thread on HN where the main point of contention is whether an expensive service like Pusher should be used to do something so simple. Usual arguments about wasting money vs displaying your hacker plumage.

  • Under the hood: Facebook’s cold storage system. A top to bottom reengineering to save power for infrequently accessed photos. Yes, that's cool. Each cold storage datacenter uses 1/6th the energy of a normal datacenter while storing hundreds of petabytes of data. Erasure coding is used to store data. Data is scanned every 30 days to recreate any lost data. As capacity is added, data is rebalanced to the new racks. No file system is used at all.
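
To make the erasure coding idea in the cold storage item concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme, a single XOR parity block over k data blocks, which can rebuild any one lost block. The summary above doesn't say which code Facebook uses, and a production system would use a stronger code that survives multiple losses, so the function names (`encode`, `repair`) and the scheme itself are illustrative assumptions only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal-size blocks (zero-padded) plus one parity block.

    Hypothetical helper: single XOR parity, not any particular system's code.
    """
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def repair(blocks):
    """Rebuild at most one missing block (marked as None) from the survivors."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can only recover one lost block")
    repaired = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    photo = b"cold storage photo"
    stripe = encode(photo, k=4)
    damaged = list(stripe)
    damaged[2] = None  # simulate a lost block found during a periodic scan
    restored = repair(damaged)
    print(b"".join(restored[:4])[:len(photo)])
```

A periodic scan like the 30-day one described would, in this toy model, amount to checking each stripe for missing blocks and calling `repair` on it.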

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

To see the Future of the Apple Watch Just Go to Disneyland

Wed, 05/13/2015 - 17:07

by AreteStock

 

Removing friction. That’s what the Apple Watch is good at.

Many think watches are a category flop because they don’t have that obvious killer app. Like hot sauce, maybe a watch isn’t something you eat all by itself, but it gives whatever you sprinkle it on a little extra flavor?

Walk into your hotel, the system recognizes you, your room number pops up on your watch, you walk directly to your room and unlock it with your watch.

Walk into an airport, your flight displays on your watch along with directions to your terminal. To get on the plane you just flash your watch. On landing, walk to your rental car and unlock it with your watch.

A notification arrives that it’s time to leave for your meeting, traffic is bad, best get an early start.

While shopping you check with your partner if you need milk by talking directly through your watch. In the future you’ll just know if you need milk, but we’re not there yet.

You can do all these things with a phone. Google Now, for example. What the easy accessibility of the watch does in these scenarios is remove friction. It makes it natural for a complex backend system to talk to you about things it learns from you and your environment. Hiding in a pocket or a purse, a phone is too inconvenient and too general purpose. Your watch becomes a small custom viewport onto a much larger, more connected world.

After developing my own watch extension, using other extensions, and listening to a lot of discussion on the subject, it’s clear the form factor of a watch is very limiting and will always be limiting. You’ll never be able to do much UI-wise on a watch. Even the cleverest programmers can only do so much with so little screen real estate and low resource usage requirements. Instagram and Evernote simply aren’t the same on a watch.

But that’s OK. Every device has what it does well. It takes time for users and developers to explore a new device space.

What a watch does well is not so much enable new types of apps, but plug people into much larger and smarter systems. This is where the friction is removed.

Re-enchanting the World Disneyland Style
Categories: Architecture

Sponsored Post: Learninghouse, OpenDNS, MongoDB, Internap, Aerospike, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 05/12/2015 - 16:56

Who's Hiring?
  • The Cloud Platform team at OpenDNS is building a PaaS for our engineering teams to build and deliver their applications. This is a well-rounded team covering software, systems, and network engineering; expect your code to cut across all layers, from the network to the application. Learn More

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • 90 Days. 1 Bootcamp. A whole new life. Interested in learning how to code? Concordia St. Paul's Coding Bootcamp is an intensive, fast-paced program where you learn to be a software developer. In this full-time, 12-week on-campus course, you will learn either .NET or Java and acquire the skills needed for entry-level developer positions. For more information, read the Guide to Coding Bootcamp or visit bootcamp.csp.edu.

  • June 2nd – 4th, Santa Clara: Register for the largest NoSQL event of the year, Couchbase Connect 2015, and hear how innovative companies like Cisco, TurboTax, Joyent, PayPal, Nielsen and Ryanair are using our NoSQL technology to solve today’s toughest big data challenges. Register Today.

  • The Art of Cyberwar: Security in the Age of Information. Cybercrime is an increasingly serious issue both in the United States and around the world; the estimated annual cost of global cybercrime has reached $100 billion with over 1.5 million victims per day affected by data breaches, DDOS attacks, and more. Learn about the current state of cybercrime and the cybersecurity professionals charged with combating it in The Art of Cyberwar: Security in the Age of Information, provided by Russell Sage Online, a division of The Sage Colleges.

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at http://mongodbworld.com/ but hurry! Super Early Bird pricing ends on April 3.
Cool Products and Services
  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • Instructions for implementing Redis functionality in Aerospike. Aerospike Director of Applications Engineering, Peter Milne, discusses how to obtain the semantic equivalent of Redis operations, on simple types, using Aerospike to improve scalability, reliability, and ease of use. Read more.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Designing for Scale - Three Principles and Three Practices from Tapad Engineering

Mon, 05/11/2015 - 16:56

This is a guest post by Toby Matejovsky, Director of Engineering at Tapad (@TapadEng).

Here at Tapad, scaling our technology strategically has been crucial to our immense growth. Over the last four years we’ve scaled our real-time bidding system to handle hundreds of thousands of queries per second. We’ve learned a number of lessons about scalability along that journey.

Here are a few concrete principles and practices we’ve distilled from those experiences:

  • Principle 1: Design for Many
  • Principle 2: Service-Oriented Architecture Beats Monolithic Application
  • Principle 3: Monitor Everything
  • Practice 1: Canary Deployments
  • Practice 2: Distributed Clock
  • Practice 3: Automate To Assist, Not To Control
Principle 1: Design for Many
Categories: Architecture

Stuff The Internet Says On Scalability For May 8th, 2015

Fri, 05/08/2015 - 16:56

Hey, it's HighScalability time:


Not spooky at all. A 1,000 robot self-organizing flash mob.
  • 400 ppm: global CO2 concentration; 13.1 billion: distance in light-years of farthest galaxy
  • Quotable Quotes:
    • Pied Piper: It’s built on a universal compression engine that stacks on any file, data, video or image no matter what size.
    • Bokardo: 1 hour of research saves 10 hours of development time
    • @12Knocksinna: Microsoft uses Cassandra open source tech to help manage the 500+ million events generated by Office 365 hourly (along with SQL and Azure)
    • @antirez: Redis had a lot of client libs ASAP. By reusing the Redis protocol, Disque is getting clients even faster, and 2700 Github stars in 9 days!
    • @blueben: AWS Glacier seems like a great DR option until you realize it costs $180,000 to retrieve your 100TB archive in an emergency.
    • Peter Diamandis: The best way to become a billionaire is to solve a billion-person problem.
    • Cordkillers: YouTube visits up 40% from last year
    • @acroll: "It's about economics not innovation, otherwise we'd all be flying Concorde instead of Jumbo Jets." @JulieMarieMeyer #StrataHadoop
    • @DLoesch: Start time delayed because cable systems are overloaded due to PPV buys. Insane. Don't snooze, don't lose! #MayPac
    • grauenwolf: This is where unit test fanboys piss me off. They claim that they can't use integration tests because they are too slow. I claim that they need integration tests to find their slow queries.
    • nuclearqtip: The open source world needs a standardized trust model for binary artifacts. 
    • Greg Ferro: SDN and SNA are about as similar as a Model T Ford & any modern car. For the record, no one drives a Model T Ford to work every day. Stop comparing SDN to SNA. It's pointless.
    • Urs Hölzle: Now the decade of work we put into NoSQL is available to everyone using GCP.  One way it shows that we've been working on this longer than anyone else: 99% read latency is 6ms vs ~300ms for other systems.
    • Swardley: Cloud is not about saving money - never was. It's about doing more stuff with exactly the same amount of money. That can cause a real headache in competition. 
    • Johns Hopkins: scientists have discovered that neurons are risk takers: They use minor "DNA surgeries" to toggle their activity levels all day, every day. 

  • Tesla's Powerwall has already sold out. So will Tesla's next gigafactory be a terafactory or a petafactory?

  • Something to keep in mind when hiring: 21% of [NFL] Hall of Fame players were selected in the 4th round or later.

  • Move along, nothing to see here. Brett Slatkin: I wonder how long it will be before people realize that all of this server orchestration business is a waste of time? Ultimately, what you really want is to never think about systems like Borg that schedule processes to run on machines. That's the wrong level of abstraction. You want something like App Engine, vintage 2008 platform as a service, where you run a single command to deploy your system to production with zero configuration.

  • Can any product withstand Aphyr's Jepsen partition torture test? Aerospike, Elasticsearch, MongoDB, RabbitMQ, Riak, Cassandra, Kafka, NuoDB, Postgres, Redis, all had problems when stress tested under network partitions. Not surprising really, as Aphyr says, "Distributed systems design is really hard." That we find problems in popular well regarded products indicates that "We need formal theory, written proofs, computer verification, and experimental demonstration that our systems make the tradeoffs we think they make. As systems engineers, we continually struggle to erase the assumption of safety before that assumption causes data loss or downtime. We need to clearly document system behaviors so that users can make the right choices. We must understand our systems in order to explain them–and distributed systems are hard to understand." gmagnusson has a good sense of things: "I admire the work that Aphyr does - though at the end of the day, I need to build systems that work for the problem I'm trying to solve (and I have to choose from real things that are available). These technologies in general are trying to address really hard problems and design and architecture is the art of balancing tradeoffs. Nothing is going to be perfect. Yet."

Categories: Architecture

Varnish Goes Upstack with Varnish Modules and Varnish Configuration Language

Wed, 05/06/2015 - 16:56

This is a guest post by Denis Brækhus and Espen Braastad, developers on the Varnish API Engine from Varnish Software. Varnish has long been used in discriminating backends, so it's interesting to see what they are up to.

Varnish Software has just released Varnish API Engine, a high performance HTTP API Gateway which handles authentication, authorization and throttling all built on top of Varnish Cache. The Varnish API Engine can easily extend your current set of APIs with a uniform access control layer that has built in caching abilities for high volume read operations, and it provides real-time metrics.

Varnish API Engine is built using well known components like memcached, SQLite and most importantly Varnish Cache. The management API is written in Python. A core part of the product is written as an application on top of Varnish using VCL (Varnish Configuration Language) and VMODs (Varnish Modules) for extended functionality.
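
As a rough illustration of the throttling an API gateway performs, the sketch below implements a token bucket in Python. This is an assumption made for illustration only: the post doesn't describe how Varnish API Engine implements throttling, and the class name `TokenBucket` and its parameters (`rate`, `capacity`, `clock`) are hypothetical. The idea is that each client gets a bucket that refills at a steady rate, and a request is admitted only if a token is available.

```python
import time


class TokenBucket:
    """Hypothetical per-client rate limiter: `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=10)
    admitted = sum(bucket.allow() for _ in range(25))
    print(f"admitted {admitted} of 25 back-to-back requests")
```

A gateway would typically keep one bucket per API key and answer with an HTTP 429 when `allow()` returns False; a shared store for the counters would only matter once several gateway nodes have to enforce one limit together.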

We would like to use this as an opportunity to show how you can create your own flexible yet still high performance applications in VCL with the help of VMODs.

VMODs (Varnish Modules)
Categories: Architecture

Elements of Scale: Composing and Scaling Data Platforms

Mon, 05/04/2015 - 16:56

This is a guest repost of Ben Stopford's epic post on Elements of Scale: Composing and Scaling Data Platforms. A masterful tour through the evolutionary forces that shape how systems adapt to key challenges.

As software engineers we are inevitably affected by the tools we surround ourselves with. Languages, frameworks, even processes all act to shape the software we build.

Likewise databases, which have trodden a very specific path, inevitably affect the way we treat mutability and share state in our applications.

Over the last decade we’ve explored what the world might look like had we taken a different path. Small open source projects try out different ideas. These grow. They are composed with others. The platforms that result utilise suites of tools, with each component often leveraging some fundamental hardware or systemic efficiency. The result: platforms that solve problems too unwieldy or too specific to work within any single tool.

So today’s data platforms range greatly in complexity. From simple caching layers or polyglot persistence right through to wholly integrated data pipelines. There are many paths. They go to many different places. In some of these places at least, nice things are found.

So the aim for this talk is to explain how and why some of these popular approaches work. We’ll do this by first considering the building blocks from which they are composed. These are the intuitions we’ll need to pull together the bigger stuff later on.

Categories: Architecture

Stuff The Internet Says On Scalability For May 1st, 2015

Fri, 05/01/2015 - 16:56

Hey, it's HighScalability time:


Got containers? Gorgeous shot of the CSCL Globe (by Walter Scriptunas II), world's largest container ship: 1,313ft long; 19,000 standard containers.
  • $3000: Tesla's new 7kWh daily cycle battery.
  • Quotable Quotes:
    • @mamund: "Turns out there is nothing about HTTP that I like" --  Douglas Crockford 
    • @PeterChch: Your little unimportant site might be hacked not for your data but for your aws resources. E.g. bitcoin mining.
    • @Joseph_DeSimone: I find it stunning that Google's annual R&D budget totaled $9.8 billion and the Budget for the National Science Foundation was $7.3 billion
    • @jedberg: The new EC2 container service adds the missing granularity to #ec2
    • Randy Shoup: “Every service at Google is either deprecated or not ready yet.”  -- Google engineering proverb
    • @mtnygard: Today the ratio of admins to servers in well-behaved scalable web companies is about 1 to 10,000. @botchagalupe #craftconf
    • @joshk: Data: There Are Over 9x More Private IPOs Than Actual Tech IPOs 
    • @nwjsmith: “Systems are not algorithms. Systems are much more complex.“ #CraftConf @skamille
    • kk: “Because the center of the universe is wherever there is the least resistance to new ideas.”
    • John Allspaw: Stop thinking that you’re trying to solve a troubleshooting problem; you’re not. Instead of telling me about how your software will solve problems, show me that you’re trying to build a product that is going to join my team as an awesome team member, because I’m going to think about using/buying your service in the same way that I think about hiring.
    • @mpaluchowski: "Netflix is a #logging system that happens to play movies." #CraftConf
    • John Wilke:  Resiliency is more important than performance.
    • @peakscale: The server/cattle metaphor rubs me the wrong way... all the farmers I knew and worked for named and cared about their herd.
    • @aphyr: "We've managed to run 40 services in prod for three years without needing to introduce a consensus system" @skamille, #CraftConf
    • @ryantomlinson: “Spotify have been using DNS for service discovery for a long time” #CraftConf
    • @csanchez: Google "we start over 2 billion containers per week" containers, containers, containers! #qconlondon 
    • @tyler_treat: If you're using RabbitMQ, consider replacing it with Kafka. Higher throughput, better replication, replayability. Same goes for other MQs.
    • @tastapod: @botchagalupe telling #CraftConf how it is! “Yelp is spinning up 8 containers a second. This is the real sh*t, man!”
    • @mpaluchowski: "A static #alert threshold won't be any good next week. It must be calculated." #CraftConf
    • @mtnygard: #craftconf @randyshoup “Microservices are an answer to a scaling problem, not a business problem.”  So right.
    • @adrianco: @mtnygard @randyshoup speed of development is the business problem that leads to Microservices.
    • @b6n: the aws financials should be a wake-up call to anyone still thinking cloud isn't a game of raw scale
    • @mtnygard: The “edge” used to be top-of-rack. Then the hypervisor. Now it’s the container. That’s 100x the number of IPs. — @botchagalupe #craftconf
    • @idajantis: 'An escalator can never break; it can only become stairs' - nice one by @viktorklang at #CraftConf on Distributed Systems failing gracefully
    • @jessitron: "You should store your data in a real database and replicate it to Elasticsearch." @aphyr #CraftConf

  • A telling difference between Google and Apple: Google Now becomes a more robust platform with 70 new partner apps. Apple takes an app-centric view of the world and Google, not surprisingly, takes a data-centric view. With Google, developers feed Google data for Google to display. With Apple, developers feed Apple apps for users to consume. On Apple, developers push their own brand and control functionality through bundled extensions, but Google will have the perspective to really let their deep learning prowess sing. So there's a real choice.

  • How appropriate that game theory is applied to cyberwarfare. Mutually Assured Destruction isn't just for nukes. Pentagon Announces New Strategy for Cyberwarfare: “Deterrence is partially a function of perception,” the new strategy says. “It works by convincing a potential adversary that it will suffer unacceptable costs if it conducts an attack on the United States, and by decreasing the likelihood that a potential adversary’s attack will succeed.”

  • Reducing big data using ideas from quantum theory makes it easier to interpret. So maybe QM is nature's way of making sense of the BigData that is the Universe?

  • Synergy is not always BS. Cheaper bandwidth or bust: How Google saved YouTube: YouTube was burning through $2 million a month in bandwidth costs before the acquisition. What few knew at the time was that Google was a pioneer in data center technology, which allowed it to dramatically lower the costs of running YouTube.

  • In a winner-take-all market, is the cost of customer acquisition pyrrhic? Uber Burning $750 Million in a Year.

  • The cloud behind the cloud. Apple details how it rebuilt Siri on Mesos: Apple’s custom Mesos scheduler is called J.A.R.V.I.S.; Apple uses J.A.R.V.I.S. as its internal platform-as-a-service; Apple’s Mesos cluster spans thousands of nodes and runs about a hundred services; Siri’s Mesos backend represents its third generation, and a move away from “traditional” infrastructure.

Categories: Architecture