Software Development Blogs: Programming, Software Testing, Agile, Project Management

High Scalability - Building bigger, faster, more reliable websites

Free Book: Practical Scalability Analysis with the Universal Scalability Law

Wed, 11/18/2015 - 17:56

If you are very comfortable with math and modeling, Dr. Neil Gunther's Universal Scalability Law (USL) is a powerful way of predicting system performance and whittling down those bottlenecks. If not, the USL can be hard to wrap your head around.

There's a free eBook for that. Performance and scalability expert Baron Schwartz, founder of VividCortex, has written a wonderful exploration of scalability truths using the USL as a lens: Practical Scalability Analysis with the Universal Scalability Law.

As a sample of what you'll learn, here are some of the key takeaways from the book:

  • Scalability is a formal concept that is best defined as a mathematical function.
  • Linear scalability means equal return on investment. Double down on workers and you’ll get twice as much work done; add twice as many nodes and you’ll increase the maximum capacity twofold. Linear scalability is oft claimed but seldom delivered.
  • Systems scale sublinearly because of contention, which adds queueing delay, and crosstalk, which inflates service times. The penalty for contention grows linearly and the crosstalk penalty grows quadratically. (An alternative to the crosstalk theory is that longer queues are more costly to manage.)
  • Contention causes throughput to asymptotically approach the reciprocal of the serialized fraction of the workload. If your workload is 5% serialized, you’ll never grow the effective speedup by more than 20-fold (1/0.05 = 20).
  • Crosstalk causes the system to regress. The harder you try to push systems with crosstalk, the more time they spend fighting amongst themselves.
  • To build scalable systems, avoid contention (serialization) and crosstalk (synchronization). The contention and crosstalk penalties degrade system scalability and performance much faster than you’d think. Even tiny amounts of serialization or pairwise data synchronization cause big losses in efficiency.
  • If you can’t avoid crosstalk, partition (shard) into smaller systems that will lose less efficiency by avoiding the explosion of service times at larger sizes.
  • To model systems with the USL, obtain measurements of throughput at various levels of load or size, and use regression to estimate the parameters to Equation 3.
  • To forecast scalability beyond what’s observable, be pessimistic and treat the USL as a best-case scenario that won’t really happen. Use Equation 4 to forecast the maximum possible throughput, but don’t forecast too far out. Use Equation 6 to forecast response time.
  • Use your judgment to predict limitations that the USL can’t see, such as saturation of network bandwidth or changes in the system’s model when all of the CPUs become busy.
  • Use the USL to explain why systems aren’t scaling well. Too much queueing? Too much crosstalk? Treat the USL as a pessimistic model and demand that your systems scale at least as well as it does.
  • If you see superlinear scaling, check your measurements and how you’ve set up the system under test. In most cases σ should be positive, not negative. Make sure you’re not varying the system’s dimensions relative to each other and creating apparent superlinear efficiencies that don’t really exist.
  • It’s fun to fantasize about models that might match observed system behavior more closely than the USL, but the USL arises analytically from how we know queueing systems work. Invented models might not have any basis in reality. Besides, the USL usually models systems extremely well up to the point of inflection, and modeling what happens beyond that isn’t as interesting as knowing why it happens.
  • Never trust a scatterplot with an arbitrary curve fit through it unless you know why that’s the right curve. Don’t confuse the USL, hockey stick charts from queueing theory, or other charts that just happen to have similar shapes. Know what shape various plots should exhibit, and suspect bad measurements or other mistakes if you don’t see them.
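The takeaways above lean on the USL formula without stating it. As it is commonly written, throughput at size N is X(N) = λN / (1 + σ(N−1) + κN(N−1)), with σ as the contention penalty and κ as the crosstalk penalty. The Python sketch below evaluates the model, computes the size at which throughput peaks, and estimates σ and κ from measurements with a simple linearized least-squares regression. The function names are hypothetical, and the fitting shortcut (which assumes you measured throughput at N = 1) is one common approach, not necessarily the regression the book uses.

```python
import math
from typing import List, Tuple


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """USL: X(N) = lam*N / (1 + sigma*(N-1) + kappa*N*(N-1)).

    lam   -- throughput at N = 1
    sigma -- contention penalty (serialized fraction), grows linearly in N
    kappa -- crosstalk penalty, grows quadratically in N
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_size(sigma: float, kappa: float) -> float:
    """Size N at which throughput peaks; beyond it the system regresses."""
    if kappa <= 0:
        raise ValueError("without crosstalk (kappa > 0) there is no peak")
    return math.sqrt((1 - sigma) / kappa)


def fit_usl(ns: List[int], xs: List[float]) -> Tuple[float, float, float]:
    """Estimate (lam, sigma, kappa) from throughput xs measured at sizes ns.

    Assumes ns contains 1, so lam = X(1). With C(N) = X(N)/lam and x = N - 1,
    the USL rearranges to y = N/C(N) - 1 = kappa*x**2 + (sigma + kappa)*x,
    which is fitted by ordinary least squares through the origin.
    """
    lam = xs[ns.index(1)]
    pts = [(n - 1, n * lam / x - 1) for n, x in zip(ns, xs) if n != 1]
    # Normal equations for y = a*x^2 + b*x:
    #   a*S4 + b*S3 = T2   and   a*S3 + b*S2 = T1
    s4 = sum(x ** 4 for x, _ in pts)
    s3 = sum(x ** 3 for x, _ in pts)
    s2 = sum(x ** 2 for x, _ in pts)
    t2 = sum(y * x ** 2 for x, y in pts)
    t1 = sum(y * x for x, y in pts)
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - s3 * t1) / det
    b = (s4 * t1 - s3 * t2) / det
    return lam, b - a, a  # kappa = a, sigma = b - kappa
```

For example, throughput generated from λ = 1000, σ = 0.05, κ = 0.001 at N = 1, 2, 4, 8, 16, 32, 64 is fitted back to those parameters, and `peak_size(0.05, 0.001)` puts the peak near N ≈ 31. With κ = 0 the model reduces to Amdahl's law, and `usl_throughput(n, 1, 0.05, 0)` never reaches 20 however large n gets, matching the 5%-serialized example. Real measurements are noisy, so treat fitted values as estimates.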

Note: the link to the eBook requires registration, but the book is free, well written, and useful, so it's probably worth it.

Categories: Architecture

9ish Low Latency Strategies for SaaS Companies

Mon, 11/16/2015 - 17:56

Achieving very low latencies takes special engineering, but if you are a SaaS company, latencies of a few hundred milliseconds are possible for complex business logic using standard technologies like load balancers, queues, JVMs, and REST APIs.

Itai Frenkel, a software engineer at Forter, which provides a Fraud Prevention Decision as a Service, shows how in an excellent article: 9.5 Low Latency Decision as a Service Design Patterns.

While any article on latency will have some familiar suggestions, Itai goes into some new territory you can really learn from. The full article is rich with detail, so you'll want to read it, but here's a short gloss:

How Facebook's Safety Check Works

Sat, 11/14/2015 - 17:33

I noticed on Facebook during this horrible tragedy in Paris that there was some worry because not everyone had checked in using Safety Check (video). So I thought people might want to know a little more about how Safety Check works.

If a friend or family member hasn't checked in yet, it doesn't mean anything bad has happened to them. Please keep that in mind. Safety Check is a good system, but not a perfect one, so keep your hopes up.

This is a really short version; there's a longer article if you are interested.

When is Safety Check Triggered?
  • Before the Paris attack Safety Check was only activated for natural disasters. Paris was the first time it was activated for human disasters and they will be doing it more in the future. As a sign of this policy change, Safety Check has been activated for the recent bombing in Nigeria.

How Does Safety Check Work?
  • If you are in an area impacted by a disaster, Facebook will send you a push notification asking if you are OK.

  • Tapping the “I’m Safe” button marks that you are safe.

  • All your friends are notified that you are safe.

  • Friends can also see a list of all the people impacted by the disaster and how they are doing.

How is the impacted area selected?
  • Since Facebook only has city-level location for most users, declaring the area isn't as hard as drawing on a map. Facebook usually selects a number of cities, regions, states, or countries that are affected by the crisis.

  • Facebook always allows people to declare themselves into (or out of) the crisis in case the geolocation prediction is inaccurate. This means Facebook can be a bit more selective with the geographic area, since they want a pretty high signal with the notifications. Notification click-through and conversion rates are used as downstream signals of how well a launch went.

  • For something like Paris, Facebook selected the whole city and launched. Especially with the media reporting "Paris terror attacks," this seemed like a good fit.
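A minimal sketch of the area test described above, assuming a launch is stored as a set of city names and each user has an IP-based city prediction. The `Crisis` class and its fields are hypothetical, not Facebook's code; the point is only that a self-declaration overrides the prediction in either direction.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Crisis:
    """Hypothetical model of a Safety Check launch area."""

    cities: Set[str]                                      # cities selected for the launch
    declared_in: Set[str] = field(default_factory=set)   # users who opted in
    declared_out: Set[str] = field(default_factory=set)  # users who opted out

    def is_impacted(self, user_id: str, predicted_city: Optional[str]) -> bool:
        # A self-declaration beats the IP-based city prediction either way.
        if user_id in self.declared_out:
            return False
        if user_id in self.declared_in:
            return True
        # Otherwise fall back to the city-level prediction (which may be None).
        return predicted_city in self.cities
```

Because users can correct the prediction themselves, the selected city list can stay tight, which is what keeps the notification signal high.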

How do you build the pool of people impacted by a disaster in a certain area?
  • Building a geoindex is the obvious solution, but it has weaknesses.

  • People are constantly moving so the index will be stale.

  • A geoindex of 1.5 billion people is huge and would take a lot of resources they didn’t have. Remember, this is a small team without a lot of resources trying to implement a solution.

  • Rather than keep a rarely used data pipeline active all of the time, the solution should work only when there is an incident. This requires being able to make a query that is dynamic and instant.

  • Facebook does not have GPS-level location information for the majority of its user base (only those that turn on the nearby friends feature), so they use the same IP2Geo prediction algorithms that Google and other web companies use -- essentially determining city level location based on IP address.

The solution leveraged the shape of the social graph and its properties:
  • When there’s a disaster, say an earthquake in Nepal, a hook for Safety Check is turned on in every single news feed load.

  • When people check their news feed the hook executes. If the person checking their news feed is not in Nepal then nothing happens.

  • The magic happens when someone in Nepal checks their news feed.

  • Safety Check fans out to all their friends on their social graph. If a friend is in the same area then a push notification is sent asking if they are OK.

  • The process keeps repeating recursively. For every friend found in the disaster area a job is spawned to check their friends. Notifications are sent as needed.
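The fan-out can be sketched as a depth-first search with a shared seen set. This is a hypothetical, single-process Python sketch: the `friends` dictionary stands in for the social graph, `in_area` for the IP2Geo location check, `notify` for the push notification, and the explicit stack for the asynchronous jobs that get spawned. Only friends found inside the area are expanded further, which is what keeps the search selective.

```python
from typing import Callable, Dict, List, Set


def safety_check_fanout(
    seed: str,
    friends: Dict[str, List[str]],
    in_area: Callable[[str], bool],
    notify: Callable[[str], None],
    seen: Set[str],
) -> int:
    """DFS from a user whose news-feed load fired the Safety Check hook.

    Returns how many profiles this call checked. `seen` is shared across
    calls so nobody is checked or notified twice.
    """
    checked = 0
    stack = [seed]  # stands in for the queue of spawned async jobs
    while stack:
        user = stack.pop()
        if user in seen:
            continue
        seen.add(user)
        checked += 1
        if not in_area(user):
            continue  # outside the area: checked once, friends not explored
        notify(user)  # push the "Are you OK?" notification
        stack.extend(friend for friend in friends.get(user, []) if friend not in seen)
    return checked
```

Because `seen` is shared across calls, a later news-feed load from another user in the area does not re-notify anyone already handled. A person who is in the area but reachable only through friends outside it is not found from this particular seed; other users' news-feed loads act as additional seeds.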

In Practice this Solution Was Very Effective
  • At the end of the day it's really just DFS (Depth First Search) with seen state and selective exploration.

  • The product experience feels live and instant because the algorithm is so fast at finding people. Everyone in the same room, for example, will appear to get their notifications at the same time. Why?

  • Using the news feed gives a random sampling of users that is biased towards the most active users with the most friends. It also filters out inactive users, saving billions of rows of computation that need not be performed.

  • The graph is dense and interconnected. Six Degrees of Kevin Bacon is wrong, at least on Facebook: the average distance between any two of Facebook’s 1.5 billion users is 4.74 edges. Sorry Kevin. With 1.5 billion users, most of the graph can be explored within about 5 hops, so most people can be efficiently reached by following the social graph.

  • There’s a lot of parallelism for free using a social graph approach. Friends can be assigned to different machines and processed in parallel. As can their friends, and so on.

  • Isn't it possible to use something like Hadoop/Hive/Presto to simply get a list of all users in Paris on demand? Hive and Hadoop are offline. It can take ~45 minutes to execute a query on Facebook's entire user table (even longer if it involves joins), and at certain times of the day it's slower (usually during work hours). Not only that, but once the query executes, an engineer has to copy and paste the results into a script that would likely run on one machine. Doing this as distributed async jobs allowed for a lot more flexibility. Even better, the geographic area can be changed while the algorithm runs, and those changes are reflected immediately.

  • The cost of searching for the users in the area correlates directly with the geographic size of the crisis. A smaller crisis ends up being fairly cheap, whereas larger crises end up checking a larger and larger portion of the user base, up to 100% of it. For Nepal, a big disaster, ~1B profiles were checked. For some smaller launches only ~100k profiles were checked. Had an index been used, or an offline job that did joins and filters, the cost would be constant no matter how small the crisis.
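To see why the cost tracks the size of the crisis rather than the size of the user base, here is a level-by-level version of the fan-out described above in which each frontier's location checks run in parallel on a thread pool, a single-machine stand-in for jobs spread across many machines. Both functions are hypothetical, and the toy ring graph (every user befriends the two people on each side, and a user's city is `user // 100`) is an assumption made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple


def fanout_levels(
    seed: int,
    friends: Dict[int, List[int]],
    in_area: Callable[[int], bool],
    workers: int = 8,
) -> Tuple[Set[int], int]:
    """Level-synchronous fan-out; returns (impacted users, profiles checked)."""
    seen = {seed}
    frontier = [seed]
    impacted: Set[int] = set()
    checked = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            checked += len(frontier)
            # All location checks for this level run concurrently.
            flags = list(pool.map(in_area, frontier))
            next_frontier: List[int] = []
            for user, inside in zip(frontier, flags):
                if not inside:
                    continue  # outside the area: not expanded
                impacted.add(user)
                for friend in friends.get(user, []):
                    if friend not in seen:
                        seen.add(friend)
                        next_frontier.append(friend)
            frontier = next_frontier
    return impacted, checked


def ring_graph(n: int) -> Dict[int, List[int]]:
    """Toy graph: each user befriends two neighbours on each side of a ring."""
    return {i: [(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n] for i in range(n)}
```

On a 1,000-user ring seeded at user 0, a one-city crisis (`u // 100 in {0}`) checks 104 profiles, while a five-city crisis checks 504: the work grows with the affected area plus a thin border of friends just outside it.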

Stuff The Internet Says On Scalability For November 13th, 2015

Fri, 11/13/2015 - 17:56

Hey, it's HighScalability time:


Gorgeous picture of where microbes live in species. Humans have the most. (M. WARDEH ET AL)

  • $14.3 billion: Alibaba single-day sales; 1.55 billion: Facebook monthly active users; 6 billion: Snapchat video views per day; unlimited: now defined as 300 GB by Comcast; 80 km: circumference of China's proposed supercollider; 500: alien worlds visualized; 50: future sensors per acre on farms; 1 million: Instagram requests per second.

  • Quotable Quotes:
    • Adam Savage~ Lesson learned: do not test fire rockets indoors.
    • dave_sullivan: I'm going to say something unpopular, but horizontally-scaled deep learning is overkill for most applications. Can anyone here present a use case where they have personally needed horizontal scaling because a Titan X couldn't fit what they were trying to do? 
    • @bcantrill: Question I've been posing at #KubeCon: are we near Peak Confusion in the container space? Consensus: no -- confusion still accelerating!
    • @PeterGleick: When I was born, CO2 levels were  ~300 ppm. This week may be the last time anyone alive will see less than 400 ppm. 
    • @patio11: "So I'm clear on this: our business is to employ people who can't actually do worthwhile work, train them up, then hand to competition?"
    • Settlement-Size: This finding reveals that incipient forms of hierarchical settlement structure may have preceded socioeconomic complexity in human societies
    • wingolog: for a project to be technically cohesive, it needs to be socially cohesive as well; anything else is magical thinking.
    • @mjpt777: Damn! @toddlmontgomery has got Aeron C++ IPC to go at over 30m msg/sec. Java is struggling to keep up.
    • Tim O'Reilly: While technological unemployment is a real phenomenon, I think it's far more important to look at the financial incentives we've put in place for companies to cut workers and the cost of labor. If you're a public company whose management compensation is tied to your stock price, it's easy to make short term decisions that are good for your pocketbook but bad long term for both the company and for society as a whole.
    • @RichardDawkins: Evolution is "Descent with modification". Languages, computers and fashions evolve. Solar systems, mountains and embryos don't. They develop
    • @Grady_Booch: Dispatches from a programmer in the year 2065: "How do you expect me to fit 'Hello, World' into only a terabyte of memory?" via Joe Marasco
    • @huntchr: I find #Zookeeper to be the Achilles heel of a few otherwise interesting projects e.g. #kafka, #mesos.
    • Robert Scoble~ Facebook Live was bringing 10x more viewers than Twitter/Periscope
    • cryptoz: I've always wondered about this. Presumably the people leading big oil companies are not dumb idiots; so why wouldn't they take this knowledge and prepare in advance?

  • Waze is using data from sources you may not expect. Robert Scoble: How about Waze? I witnessed an accident one day on the highway near my house. Two lane road. The map turned red within 30 seconds of the accident. How did that happen? Well, it turns out cell phone companies (Verizon, in particular, in the United States) gather real time data from cell phones. Your phone knows how fast it’s going. In fact, today, Waze shows you that it knows. Verizon sells that data (anonymized) to Google, which then uses that data to put the red line on your map.

  • If email had been done really right in the early days, we wouldn't need half the social networks or messaging apps we have today. Almost everything we see is a reimplementation of email. Gmail, We Need To Talk.

  • Don Norman and Bruce Tognazzini, prophets from Apple's time in the wilderness, don't much like the new religion. They stand before the temple shaking fists at blasphemy. How Apple Is Giving Design A Bad Name: Apple is destroying design. Worse, it is revitalizing the old belief that design is only about making things look pretty. No, not so! Design is a way of thinking, of determining people’s true, underlying needs, and then delivering products and services that help them. Design combines an understanding of people, technology, society, and business. 

  • There's a new vision of the Internet out there and it's built around the idea of Named Data Networking (NDN). It's an evolution from today’s host-centric network architecture IP to a data-centric network architecture. Luminaries like Van Jacobson like the idea. Packet Pushers has good coverage in Show 262 – Future of Networking – Dave Ward. Dave Ward is the CTO of Engineering and Chief Architect at Cisco. For me, make the pipes dumb, fast, and secure. Everything else is emergent.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Sponsored Post: StatusPage.io, Digit, iStreamPlanet, Instrumental, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 11/10/2015 - 17:56

Who's Hiring?
  • Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • Digit Game Studios, Ireland’s largest game development studio, is looking for game server engineers to work on existing and new mobile 3D MMO games. Our most recent project in development is based on an iconic AAA-IP and therefore we expect very high DAU & CCU numbers. If you are passionate about games and if you are experienced in creating low-latency architectures and/or highly scalable but consistent solutions then talk to us and apply here.

  • As a Networking & Systems Software Engineer at iStreamPlanet you’ll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

  • As a Scalable Storage Software Engineer at iStreamPlanet you’ll be driving the design and implementation of numerous storage systems including software services, analytics and video archival. Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, a leader in next-generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • Instrumental is a hosted real-time application monitoring platform. In the words of one of our customers: "Instrumental is the first place we look when an issue occurs. Graphite was always the last place we looked." - Dan M

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

A 360 Degree View of the Entire Netflix Stack

Mon, 11/09/2015 - 17:56

This is a guest repost by Chris Ueland, creator of ScaleScale, with a creative high-level view of the Netflix stack.

As we research and dig deeper into scaling, we keep running into Netflix. They are very public with their stories. This post is a round up that we put together with Bryan’s help. We collected info from all over the internet. If you’d like to reach out with more info, we’ll add it to this post. Otherwise, please enjoy!

–Chris / ScaleScale / MaxCDN


A look at what we think is interesting about how Netflix scales

Stuff The Internet Says On Scalability For November 6th, 2015

Fri, 11/06/2015 - 05:56

Hey, it's HighScalability time:


Cool genealogy of Relational Database Management Systems.
  • 9,000: Artifacts Uncovered in California Desert; 400 Million: LinkedIn members; 100: CEOs have more retirement assets than 41% of American families; $160B: worth of AWS; 12,000: potential age of oldest oral history; fungi: world's largest miners 

  • Quotable Quotes:
    • @jaykreps: Someone tell @TheEconomist that people claiming you can build Facebook on top of a p2p blockchain are totally high.
    • Larry Page: I think my job is to create a scale that we haven't quite seen from other companies. How we invest all that capital, and so on.
    • Tiquor: I like how one of the oldest concepts in programming, the ifdef, has now become (if you read the press) a "revolutionary idea" created by Facebook and apparently the core of a company's business. I'm only being a little sarcastic.
    • @DrQz: +1 Data comes from the Devil, only models come from God. 
    • @DakarMoto: Great talk by @adrianco today quote of the day "i'm getting bored with #microservices, and I’m getting very interested in #teraservices.”
    • @adrianco: Early #teraservices enablers - Diablo Memory1 DIMMs, 2TB AWS X1 instances, in-memory databases and analytics...
    • @PatrickMcFadin: Average DRAM Contract Price Sank Nearly 10% in Oct Due to Ongoing Supply Glut. How long before 1T memory is min?
    • @leftside: "Netflix is a monitoring system that sometimes shows people movies." --@adrianco #RICON15
    • Linus: So I really see no reason for this kind of complete idiotic crap.
    • Jeremy Hsu: In theory, the new architecture could pack about 25 million physical qubits within an array that’s 150 micrometers by 150 µm. 
    • @alexkishinevsky: Just done AWS API Gateway HTTPS API, AWS Lambda function to process data straight into AWS Kinesis. So cool, so different than ever before.
    • @highway_62: @GreatDismal Food physics and candy scaling is a real thing. Expectations and ratios get blown. Mouth feel changes.
    • @randybias:  #5 you can’t get automation scaling without relative homogeneity (homologous) and that’s why the webscale people succeeded
    • Brian Biles: Behind it all: VMs won.  The only thing that kept this [Server Centric Storage is Killing Arrays] from happening a long time ago was OS proliferation on physical servers in the “Open Systems” years.  Simplifying storage for old OS’s required consolidative arrays with arbitrated-standard protocols.
    • @paulcbetts: This disk is writing at almost 1GB/sec and reading at ~2.2GB/sec. I remember in 2005 when I thought my HD reading at 60MB/sec was hot shit.
    • @merv: One of computing’s biggest challenges for architects and designers: scaling is not distributed uniformly in time or space.

  • To Zookeeper or not to Zookeeper? This is one of the questions debated on an energetic mechanical-sympathy thread. Some say Zookeeper is unreliable and difficult to manage. Others say Zookeeper works great if carefully tended. If you need a gossip/discovery service, there are alternatives: JGroups, Raft, Consul, Copycat.

  • Algorithms are as capable of tyranny as any other entity wielding power. Twins denied driver’s permit because DMV can’t tell them apart

  • Odd thought. What if Twitter took stock or options as payment for apps that want to use Twitter as a platform (not Fabric)? The current user caps would effectively be the free tier. If you want to go above that you can pay. Or you can exchange stock or options for service. This prevents the Yahoo problem of being King Makers, that is when Google becomes more valuable than you. It gives Twitter potential for growth. It aligns incentives because Twitter will be invested in the success of apps that use it. And it gives apps skin in the game. Although Twitter has to recognize the value of the stock they receive as revenue, they can offset that against previous losses.

  • One of the best stories ever told. Her Code Got Humans on the Moon—And Invented Software Itself: MARGARET HAMILTON WASN’T supposed to invent the modern concept of software and land men on the moon...But the Apollo space program came along. And Hamilton stayed in the lab to lead an epic feat of engineering that would help change the future of what was humanly—and digitally—possible. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Strategy: Avoid Lots of Little Files

Wed, 11/04/2015 - 18:59

I've been bitten by this one. It happens when you quite naturally use the file system as a quick and dirty database. A directory is a lot like a table and a file name looks a lot like a key. You can store many-to-one relationships via subdirectories. And the path to a file makes a handy quick lookup key. 
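The pattern is easy to write, which is part of its appeal. A minimal Python sketch, assuming keys are already safe file names; `FileTableStore` is a hypothetical helper with a directory per table, a file per record, and nested directories for many-to-one relationships.

```python
from pathlib import Path
from typing import List


class FileTableStore:
    """Hypothetical helper: the file system used as a quick-and-dirty database.

    root/<table>/<key> holds one record. A table name may contain '/' to nest
    directories, which is how many-to-one relationships get modeled.
    Keys are assumed to be safe file names (no '/', no '..').
    """

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, table: str, key: str, value: bytes) -> None:
        table_dir = self.root / table  # a directory plays the role of a table
        table_dir.mkdir(parents=True, exist_ok=True)
        (table_dir / key).write_bytes(value)  # one file per record

    def get(self, table: str, key: str) -> bytes:
        return (self.root / table / key).read_bytes()  # the path is the lookup key

    def scan(self, table: str) -> List[str]:
        # The "table scan" is a directory listing: fine for a handful of
        # files, increasingly slow once one directory holds very many.
        return sorted(p.name for p in (self.root / table).iterdir())
```

For example, `store.put("users/alice/posts", "p1", b"hello")` files a post under its owner. Nothing in this code hints at a limit, which is exactly why the problem shows up late.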

The problem is a file system isn't a database. That realization doesn't hit until you reach a threshold where there are actually lots of files. Everything works perfectly until then.

When the threshold is hit, iterating a directory becomes very slow because most file system directory data structures are not optimized for the case of lots of small files. Even opening a file becomes slow.

According to Steve Gibson on Security Now (@16:10), 1Password ran into this problem. 1Password stored every item in its vault in an individual file. This allowed standard file syncing technology to be used to update only the changed files: updating a password changes just one file, so only that file is synced.

Steve thinks this is a design mistake, but this approach makes perfect sense. It's simple and robust, which is good design given what I assume was the original, reasonable expectation of relatively small vaults.

The problem is that the file approach doesn't scale to larger vaults with thousands of files for thousands of web sites. Interestingly, decrypting files was not the bottleneck; the overhead of opening files was. The slowdown came from the elaborate security checks the OS makes to validate whether a process has the right to open a file.

The new version of 1Password uses a UUID to shard items into one of 16 files based on the first hex digit of the UUID. Given good random number generation, the files should grow more or less equally as items are added. Problem solved. Would this be your first solution when building a product? Probably not.
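A sketch of that sharding scheme, assuming each shard is a JSON file holding a dictionary from item UUID to item. The `BandedVault` class, the `band_X.json` file names, and the JSON layout are illustrative assumptions, not 1Password's actual on-disk format. Writing an item rewrites only the shard its UUID maps to, so a file-sync tool still transfers a single file per change, while the directory never holds more than 16 files.

```python
import json
import uuid
from pathlib import Path
from typing import Optional


class BandedVault:
    """Hypothetical sketch: items sharded into 16 JSON files by the first hex
    digit of their UUID. Not 1Password's real on-disk format."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _band_path(self, item_id: str) -> Path:
        # uuid4().hex is lowercase hex, so the first character is one of 0-9a-f.
        return self.root / f"band_{item_id[0].upper()}.json"

    @staticmethod
    def _load(path: Path) -> dict:
        return json.loads(path.read_text()) if path.exists() else {}

    def put(self, item: dict, item_id: Optional[str] = None) -> str:
        item_id = item_id or uuid.uuid4().hex
        path = self._band_path(item_id)
        band = self._load(path)
        band[item_id] = item
        path.write_text(json.dumps(band))  # only this one band file changes
        return item_id

    def get(self, item_id: str) -> dict:
        return self._load(self._band_path(item_id))[item_id]
```

The trade-off is that each write now rewrites a whole shard rather than one tiny file; with 16 shards that is roughly 1/16 of the vault per change, in exchange for opening at most 16 files instead of thousands.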

Apologies to 1Password if this is not a correct characterization of their situation, but even if wrong, the lesson still remains.

Categories: Architecture

Paper: Coordination Avoidance in Distributed Databases By Peter Bailis

Mon, 11/02/2015 - 17:56

Peter Bailis has released the work of a lifetime: his dissertation, Coordination Avoidance in Distributed Databases, is now available online.

The topic Peter is addressing is summed up nicely by his thesis statement: 

Many semantic requirements of database-backed applications can be efficiently enforced without coordination, thus improving scalability, latency, and availability.

I'd like to say I've read the entire dissertation and can offer cogent, insightful analysis, but that would be a lie. I have, though, watched several of Peter's videos (see Related Articles). He's doing important and interesting work that, as much university research has done, may change the future of what everyone is doing.

From the introduction:

The rise of Internet-scale geo-replicated services has led to upheaval in the design of modern data management systems. Given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., “NoSQL”) has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forego the cost of this expensive coordination, and when must we pay the price?

In this thesis, we investigate the potential for coordination avoidance—the use of as little coordination as possible while ensuring application integrity—in several modern data-intensive domains. We demonstrate how to leverage the semantic requirements of applications in data serving, transaction processing, and web services to enable more efficient distributed algorithms and system designs. The resulting prototype systems demonstrate regular order-of-magnitude speedups compared to their traditional, coordinated counterparts on a variety of tasks, including referential integrity and index maintenance, transaction execution under common isolation models, and database constraint enforcement. A range of open source applications and systems exhibit similar results.
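To make the question "when can we safely forego coordination?" concrete, here is a toy Python sketch, assuming a made-up model where each replica's state is the set of balance operations it has applied and replicas merge by set union. As I understand it, the dissertation formalizes this check as invariant confluence; the model, names, and numbers below are my own illustration, not Bailis's code.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical model, for illustration only: a replica's state is the set of
# operations it has applied; each operation is (operation id, balance delta).
Op = Tuple[str, int]
State = FrozenSet[Op]

START_BALANCE = 100  # hypothetical starting balance shared by all replicas


def balance(state: State) -> int:
    return START_BALANCE + sum(delta for _, delta in state)


def non_negative(state: State) -> bool:
    """Application invariant: the balance must never drop below zero."""
    return balance(state) >= 0


def merge(a: State, b: State) -> State:
    """Replicas reconcile asynchronously by taking the union of their operations."""
    return a | b


def survives_merge(op_a: Op, op_b: Op, invariant: Callable[[State], bool],
                   base: State = frozenset()) -> bool:
    """Apply op_a on one replica and op_b on another with no coordination.

    Each replica checks the invariant only against its own local state, as a
    coordination-free system would, then we ask whether the merged state still
    satisfies it. One failing pair is a counterexample; passing pairs are
    evidence, not proof, that the operation is safe without coordination.
    """
    a = base | {op_a}
    b = base | {op_b}
    assert invariant(a) and invariant(b), "each replica accepts its op locally"
    return invariant(merge(a, b))


if __name__ == "__main__":
    # Deposits can never push the balance below zero: safe without coordination.
    print(survives_merge(("A1", +50), ("B1", +30), non_negative))  # True
    # Two withdrawals each look fine locally (100-80, 100-70) but merge to -50.
    print(survives_merge(("A1", -80), ("B1", -70), non_negative))  # False
```

The point of the example is that the same invariant can be free for some operations (deposits) and require coordination for others (withdrawals), which is why the thesis focuses on the semantics of each application rather than on a blanket consistency level.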

Related Articles 

How Shopify Scales to Handle Flash Sales from Kanye West and the Superbowl

Mon, 11/02/2015 - 17:56

This is a guest repost by Christophe Limpalair, creator of Scale Your Code.

In this article, we take a look at methods used by Shopify to make their platform resilient. Not only is this interesting to read about, but it can also be practical and help you with your own applications.

Shopify's Scaling Challenges

Shopify, an ecommerce solution, handles about 300 million unique visitors a month, but as you'll see, these 300M people don't show up in an evenly distributed fashion.

One of their biggest challenges is what they call "flash sales": a tremendously popular store sells something at a specific time.

For example, Kanye West might sell new shoes. Combined with Kim Kardashian, they have a following of 50 million people on Twitter alone.

They also have customers who advertise on the Superbowl. Because of this, they have no idea how much traffic to expect. It could be 200,000 people showing up at 3:00 for a special sale that ends within a few hours.

How does Shopify scale to these sudden increases in traffic? Even if they can't scale that well for a particular sale, how can they make sure it doesn't affect other stores? This is what we will discuss in the next sections, after briefly explaining Shopify's architecture for context.

Shopify's Architecture

Stuff The Internet Says On Scalability For October 30th, 2015

Fri, 10/30/2015 - 16:56

Hey, it's HighScalability time:


Moviegoers Force crashed websites with record ticket presales. Yoda commented: Do. Or do not. There is no try.
  • $51.5 billion: Apple quarterly revenue; 1,481: distance in light years of a potential Dyson Sphere; $470 billion: size of insurance industry data play; 31,257: computer related documents in a scanned library; $1.2B: dollars lost to business email scams; 46 billion: pixels in largest astronomical image; 27: seconds of distraction after doing anything interesting in a car; 10 billion: transistors in the SPARC M7 chip; $10K: cost to get a pound into low Earth orbit; $8.2 billion: Microsoft cloud revenue.

  • Quotable Quotes:
    • @jasongorman: A $trillion industry has been built on the very lucky fact that Tim Berners-Lee never thought "how do I monetise this?"
    • Cade Metz: Sure, the app [WhatsApp] was simple. But it met a real need. And it could serve as a platform for building all sorts of other simple services in places where wireless bandwidth is limited but people are hungry for the sort of instant communication we take for granted here in the US.
    • Adrian Hanft: Brand experts insist that success comes from promoting your unique attributes, but in practice differentiation is less profitable than consolidation.
    • Jim Butcher: It’s a tradition. Were traditions rational, they’d be procedures.
    • Albert Einstein~ Sometimes I pretend I’m the Mayor of my kitchen and veto fish for dinner. ‘Too fishy’ is what I say!
    • @chumulu: “Any company big enough to have a research lab is too big to listen to it" -- Alan Kay
    • Robin Harris: So maybe AWS has all the growth it can handle right now and doesn’t want more visibility. AWS may be less scalable than we’d like to believe.
    • Michael Nielsen: Every finitely realizable physical system can be simulated efficiently and to an arbitrary degree of approximation by a universal model (quantum) computing machine operating by finite means.
    • Sundar Pichai~ there are now more Google mobile searches than desktop searches worldwide.
    • Joe Salvia~ The major advance in the science of construction over the last few decades has been the perfection of tracking and communication.
    • apy: In other words, as far as I can tell docker is replacing people learning how to use their package manager, not changing how software could or should have been deployed.
    • @joelgrus: "Data science is a god-like power." "Right, have you finished munging those CSVs yet?""No, they have time zone data in them!"
    • @swardley: "things are getting worse. Companies are increasingly financialised and spending less on basic research" @MazzucatoM 
    • Dan Rayburn: The cause of what Akamai is seeing is a result of Apple, Microsoft and Facebook moving a larger percentage of their traffic to their in-house delivery networks.
    • @littleidea: containers will not fix your broken architecture you are welcome
    • spawndog: I've typically found the best gameplay optimization comes from a greater amount of creative freedom like you mention. Let's not do it. Let's do it less frequently. Let's organize the data into something relative to usage pattern like spatial partitions.
    • @awealthofcs: The 1800s: I hope I survive my 3 month voyage to deliver a message to London Now: The streaming on this NFL game in London is a bit spotty
    • @ddwoods2: just having buffers ≠ resilience; resilience = the capacities for changing position/size/kind of buffers, before events eat those buffers
    • unoti: There's a dangerous, contagious illness that developers of every generation get that causes them to worry about architecture and getting "street cred" even more than they worry about solving business problems. I've fallen victim to this myself, because street cred is important to me. But it's a trap.
    • @kelseyhightower: Kubernetes is getting some awesome new features: Auto scaling pods, Jobs API (batch), and a new deployment API for server-side app rollouts.

  • Great story on Optimizing League of Legends. The process: Identification: profile the application and identify the worst performing parts; Comprehension: understand what the code is trying to achieve and why it is slow; Iteration: change the code based on step 2 and then re-profile. Repeat until fast enough. Result: memory savings of 750kb and a function that ran one to two milliseconds faster. 

  • Fantastic article on Medium's architecture: 25 million uniques a month;  service-oriented architecture, running about a dozen production services; GitHub; Amazon’s Virtual Private Cloud; Ansible; mostly Node with some Go; CloudFlare, Fastly, CloudFront with interesting traffic allocations; Nginx and HAProxy; Datadog, PagerDuty, Elasticsearch, Logstash, Kibana; DynamoDB, Redis, Aurora, Neo4J; Protocol Buffers used as contract between layers; and much more.

  • Are notifications the new Web X.0? Notification: the push and the pull: Right now we are witnessing another round of unbundling as the notification screen becomes the primary interface for mobile computing.

  • Algorithm hacking 101. Uber Surge Price? Research Says Walk A Few Blocks, Wait A Few Minutes.
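The identify, comprehend, iterate loop from the League of Legends optimization story above can be sketched with Python's built-in cProfile and pstats modules. Riot's engine is not Python; the two lookup functions here are hypothetical stand-ins for a hot spot and its fix.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    """Hypothetical hot spot: each membership test scans the whole list."""
    return sum(1 for t in targets if t in items)


def fast_lookup(items, targets):
    """Same result after the Comprehension step: a set gives O(1) lookups."""
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def profile(func, *args):
    """Identification: run func under cProfile and return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    items = list(range(5_000))
    targets = list(range(0, 10_000, 7))
    # Iteration: profile, change the code, re-profile, and confirm the
    # optimized version still produces the same answer.
    before, report = profile(slow_lookup, items, targets)
    print(report)
    after, report = profile(fast_lookup, items, targets)
    print(report)
    assert before == after
```

Checking that the result is unchanged after each iteration matters as much as the speedup; an optimization that changes behavior is just a new bug.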

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Five Lessons from Ten Years of IT Failures

Wed, 10/28/2015 - 16:56

IEEE Spectrum has a wonderful article series on Lessons From a Decade of IT Failures. It’s not your typical series in that there are very cool interactive graphs and charts based on data collected from past project failures. They are really fun to play with and I can only imagine how much work it took to put them together.

The overall takeaway of the series is:

Even given the limitations of the data, the lessons we draw from them indicate that IT project failures and operational issues are occurring more regularly and with bigger consequences. This isn’t surprising as IT in all its various forms now permeates every aspect of global society. It is easy to forget that Facebook launched in 2004, YouTube in 2005, Apple’s iPhone in 2007, or that there have been three new versions of Microsoft Windows released since 2005. IT systems are definitely getting more complex and larger (in terms of data captured, stored and manipulated), which means not only are they increasingly difficult and costly to develop, but they’re also harder to maintain.

Here are the specific lessons:


Sponsored Post: Digit, iStreamPlanet, Instrumental, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 10/27/2015 - 17:13

Who's Hiring?
  • Digit Game Studios, Ireland’s largest game development studio, is looking for game server engineers to work on existing and new mobile 3D MMO games. Our most recent project in development is based on an iconic AAA-IP and therefore we expect very high DAU & CCU numbers. If you are passionate about games and if you are experienced in creating low-latency architectures and/or highly scalable but consistent solutions then talk to us and apply here.

  • As a Networking & Systems Software Engineer at iStreamPlanet you’ll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

  • As a Scalable Storage Software Engineer at iStreamPlanet you’ll be driving the design and implementation of numerous storage systems including software services, analytics and video archival. Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, a leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • Instrumental is a hosted real-time application monitoring platform. In the words of one of our customers: "Instrumental is the first place we look when an issue occurs. Graphite was always the last place we looked." - Dan M

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...


What ideas in IT must die?

Mon, 10/26/2015 - 16:56

Are there ideas in IT that must die for progress to be made?

Max Planck wryly observed that scientific progress is often less meritocracy and more Lord of the Flies:

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

Playing off this insight is a thought-provoking book collecting responses to a question posed on Edge: This Idea Must Die: Scientific Theories That Are Blocking Progress. From the book blurb, some of the ideas that should transition into the postmortem: Jared Diamond explores the diverse ways that new ideas emerge; Nassim Nicholas Taleb takes down the standard deviation; Richard Thaler and novelist Ian McEwan reveal the usefulness of "bad" ideas; Steven Pinker dismantles the working theory of human behavior.

Let’s get edgy: Are there ideas that should die in IT?

What ideas do you think should pass into the great version control system called history? What ideas if garbage collected would allow us to transmigrate into a bright shiny new future? Be as deep and bizarre as you want. This is the time for it.

I have two: Winner Takes All and The Homogeneity Principle.

Winner Takes All

Stuff The Internet Says On Scalability For October 23rd, 2015

Fri, 10/23/2015 - 16:56

Hey, it's HighScalability time:


The amazing story of Voyager's walkabout and the three body problem.
If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • $18 billion: wasted on US Army Future Combat system; 70%: Americans who support an Internet sales tax; $1.3 billion: wasted on an interoperable health record system; trillions: NSA breaking Web and VPN connections; 615: human data teams beaten by a computer; $900,000: cost of apps on your smartphone 30 years ago.

  • Quotable Quotes:
    • @PatrickMcFadin: 'Sup 10x coder. Grace Hopper invented the compiler and has a US Navy destroyer named after her. Just how badass are you again?
    • @benwerd: I love Marty McFly too, but more importantly, the first transatlantic voice transmission was sent 100 years ago today. What a century.
    • Martin Goodwell: The nearly two-billion requests that Netflix receives each day result in roughly 20 billion internal API calls.
    • sigma914: It's great to see people implementing distributed services using a vertically scalable technology stack again. The past ~decade has seen a lot of "We can scale sideways so constant overheads are irrelevant! We'll just use Java and add more machines!" which, in real life, seems to leave a lot of performance on the table.
    • Eric Schmidt: The way you build great products is small teams with strong leaders who make tradeoffs and work all night to build a product that just barely works.
    • @boulderDanH: We adopted stateful services early on @VictorOps and I always worried we were crazy. Maybe not
    • @jamesallworth: "The pressure for conformity isn’t limited to car design, it affects *everything*."
    • Eric Schmidt: Hindsight is always that you make the important decisions more quickly.
    • @fromroots: Facebook bought Instagram and WhatsApp to block Chinese competitors like Tencent and Alibaba from scaling globally quickly
    • Eric Schmidt: You’ve got to have products that can scale. What’s new is that once you have that product, you can scale very quickly. Look at Uber.
    • David Ehrenberg: So, before scaling, build your plan, get your systems in place, control your cash burn, create meaningful milestones and plan for cash-flow positive. That’s the foundation to successfully scale.
    • Francis Fukuyama: Hence patrimonialism has evolved into what is called “neopatrimonialism,” in which political leaders adopt the outward forms of modern states—with bureaucracies, legal systems, elections, and the like—and yet in reality rule for private gain. 
    • @sandromancuso: I hate all these bloody Java frameworks. Why devs keep using them? No, you won’t die if you write some code yourself. 
    • Eric Schmidt: Their point was that the industry overvalues experience, and undervalues strategic and tactical flexibility.
    • @AWSUserGroupUK: Daily load fluctuates by two orders of magnitude - auto scaling architecture is essential #BMW #reinvent
    • @tpechacek: “The greatest shortcoming of the human race,” he said, “is our inability to understand the exponential function”
    • @mjpt777: +1000 "Programming with Java concurrency is like working with the inlaws. You never know what will happen." - @venkat_s #jokerconf
    • Eric Schmidt: The teams are far larger than they should be. It’s a failure of architecture — the programmers don’t have the right libraries. I hope that machine learning will fix that problem.
    • Marcus Zetterquist: Start writing your C++, Java and Javascript code using pure functions and immutability NOW. It gets super powers
    • Eric Schmidt: Companies like ours have so much cash that the main limit is opportunities to deploy it.
    • @berkson0: Loving and hating the Scaling keynote at #AWS #reinvent. All my painfully earned infrastructure experience rendered superfluous <sigh>
    • Eric Schmidt: The day we turned on the auctions, revenue tripled.
    • James S.A. Corey: Awareness is a function of the brain just like vision or motor control or language. It isn’t exempt from being broken
    • @mitchellh: Sure, but scaling linearly from millions to trillions of requests won’t scale financially. I’m talking about financial efficiency
    • Anil Ananthaswamy: It turns out that in order to anchor the self to the body, the brain has to integrate signals from within the body with external sensations, and with sensations of position and balance. When something goes wrong with brain regions that integrate all these signals, the results are even more dramatic than out-of-body experiences
    • @alemacgo: “The whole point of science is to penetrate the fog of human senses, including common sense.”
    • @themadstone: Why is life special? Bc a billionth of a billionth of a fraction of all matter in the universe is living matter.

  • Oh how the world has changed. Here's an email from 1996: Alta Vista is a very large project, requiring the cooperation of at least 5 servers, configured for searching huge indices and handling a huge Internet traffic load. The initial hardware configuration for Alta Vista is as follows... 

  • AWS has helped change the VC industry. AWS and Venture Capital. Getting a new company off the ground takes less than a few hundred thousand dollars these days. With AWS all you have are the variable costs of what you use. Gone are the days of needing to buy a bunch of servers and the people to maintain them. Old news. More interesting is that because less money is now needed to start a venture, more people can help ventures get started. VC incentives are aligned with companies that need to grow quickly, like an Uber. Given that a VC only needs one in ten investments or so to be a huge hit, it's best for a VC if those other nine die as fast as possible to minimize costs. This may not align with your interests if you would like to grow more organically. It's unprofitable for VCs to play in the seed funding realm. VCs used to win because they had access to capital and superior information, both of which have been commoditized at today's lower funding levels and higher availability of expertise. So if you want to get to heaven, you may need an Angel.

  • Here's the IPv6 carrot. Accessing Facebook can be 10-15 percent faster over IPv6. IPv6: It's time to get on board.



5 Lessons from 5 Years of Building Instagram

Wed, 10/21/2015 - 16:56

Instagram has always been generous in sharing their accumulated wisdom. Just take a look at the Related Articles section of this post to see how generous.

The tradition continues. Mike Krieger, Instagram co-founder, wrote a really good article on lessons learned from milestones achieved during Five Years of Building Instagram. Here's a summary of the lessons, but the article goes into much more of the connective tissue and is well worth reading.

  1. Do the simple thing first. This is the secret of supporting exponential growth. There's no need to future proof everything you do. That leads to paralysis. For each new challenge, find the fastest, simplest fix.
  2. Do fewer things better. Focus on a single platform. This allows you to iterate faster because not everything has to be done twice. When you have to expand, create a team explicitly for each platform.
  3. Upfront work can pay huge dividends. Create an automated, scriptable infrastructure implementing a repeatable server provisioning process. This makes it easier to bring on new hires and handle disasters. Hire engineers with the right stuff who aren't afraid to work through a disaster.
  4. Don’t reinvent the wheel. Instagram moved to Facebook's infrastructure because it allowed them to stay small and leverage a treasure trove of capabilities.
  5. Nothing lasts forever. Be open to evolving your product. Don't be afraid of creating special teams to tackle features and adapt to a rapidly scaling community.
Related Articles