Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

High Scalability - Building bigger, faster, more reliable websites
Syndicate content
Updated: 11 hours 6 min ago

Algolia's Fury Road to a Worldwide API Part 3

Mon, 07/27/2015 - 16:56

The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't built in a day, our infrastructure wasn't as well. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order to you to understand how we used them to improve our architecture.

The first blog post of this series focused on our early days in beta and the second post on the first 18 months of the service, including our first outages. In this last post, I will describe how we transformed our "startup" architecture into something new that was able to meet the expectation of big public companies.

Step 11: February 2015 Launch of our Synchronized Worldwide infrastructure
Categories: Architecture

Stuff The Internet Says On Scalability For July 24th, 2015

Fri, 07/24/2015 - 16:56

Hey, it's HighScalability time:


Walt Disney doesn't mouse around. Here's how he makes a goofy business plan.

 

  • 81%: AWS YOY growth; 400: hours of video uploaded to YouTube EVERY MINUTE; 9,000: # of mineable asteroids near earth; 1,400: light years to Earth's high latency backup node; 10K: in the future hard disks will be this many times faster 
  • Quotable Quotes:
    • @BenedictEvans: Chinese govt: At the end of 2014 China had 112.7 billion static webpages and 77.2 billion dynamic webpages. They used 9,310,312,446,467 KB
    • Michael Franklin (AMPLab): This is always a pendulum where you swing from highly distributed to more centralized and back in. My guess is there’s going to be another swing of the pendulum, where we really need to start thinking about how do you distribute processing throughout a wide area network.
    • Sherlock Holmes: Singularity is almost invariably a clue. 
    • @jpetazzo: OH: "In any team you need a tank, a healer, a damage dealer, someone with crowd control abilities, and another who knows iptables"
    • Jeff Sussna: Ultimately, the impact of containers will reach even beyond IT, and play a part in transforming the entire nature of the enterprise. 
    • @CarlosAlimurung: Impressive.  The number of #youtube channels making six figures grew by 50%. 
    • harlowja: Overall, no the community isn't perfect, yes there are issues, yes it burns some people out, but software isn't rainbows and butterflies after all.
    • werner: BTW nobody wants eventual consistency, it is a fact of live among many trade-offs. I would rather not expose it but it comes with other advantages ...
    • @VideoInkNews: We’re focused on our top three priorities – mobile, mobile and mobile, said @YouTube CEO @SusanWojcicki #VidCon2015 #keynote
    • Ivan Pepelnjak: Use a combination of MPLS/VPN and Internet VPN, or Internet VPN with 3G backup. Use multiple access methods, so the cable-seeking backhoe doesn’t bring down all uplinks.
    • @randybias: Repeat after me: containers do little to enable application portability.  If you want portability use a PaaS.  PaaS != Containers.
    • To see even more quotes please click through to see the rest of the post.

  • Can't we all just get along? And by "we" I mean humans and robots. Maybe. Inside Amazon shows by example how one new utopian community is bridging the categorical divide. Forget all your skepticism and technopanic, humans and robots can really work together in a highly efficient system.

  • A Brief History of Scaling LinkedIn. Not so brief actually. Lots of really good details. They of course started off with a monolith and ended up with a service oriented architecture. One of the most interesting ideas is the super block: groupings of backend services with a single access API. This allows us to have a specific team optimize the block, while keeping our call graph in check for each client.

  • If you want to move at the speed of software doesn't your datacenter infrastructure have to move at the same speed? Network Break 45 from Packet Pushers talks about an open source virtual software router, CloudRouter, running the latest release of OpenDaylight's SDN controller and ONOS. The idea is to make a dead simple router you can just instantiate as needed. Greg Ferro makes the point that if you don't have to care if you are starting 100 or 1000 virtual routers it changes how you go about building infrastructure. Running a Cisco Router, and F5 load balancer, and a virtual firewall, how much will it cost to spin up virtual datacenters for 100s of developers? How long will it take? How much will it cost? How does it even work? 

Categories: Architecture

Architecting Backend for a Social Product

Wed, 07/22/2015 - 16:56

This is aimed towards taking you through key architectural decisions which will make a social application a true next generation social product. The proposed changes addresses following attributes; a) availability b) reliability c) scalability d) performance and flexibility towards extensions (not modifications)

Goals

a) Ensuring that user’s content is easily discoverable and is available always.

b) Ensuring that the content pushed is relevant not only semantically but also from user’s device perspective.

c) Ensuring that the real time updates are generated, pushed and analyzed.

d) Eye towards saving user’s resources as much as possible.

e) Irrespective of server load, user’s experience should remain intact.

f) Ensuring overall application security

In summary we want to deal with an amazing challenge, where we must deal with a mega sea of ever expanding user generated contents, increasing number of users, and a constant stream of new items, all while ensuring an excellent performance. Considering the above challenge it is imperative that we must study certain key architectural elements which will influence the over system design. Here are the few key decisions & analysis.

Data Storage
Categories: Architecture

Sponsored Post: Redis Labs, Jut.io, VoltDB, Datadog, Tumblr, Power Admin, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 07/21/2015 - 16:56

Who's Hiring?
  • Make Tumblr fast, reliable and available for hundreds of millions of visitors and tens of millions of users.  As a Site Reliability Engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems. Apply here now! 

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • New Research Shows MongoDB Outperforms NoSQL Vendors. Independent evaluators United Software Associates recently released a report in which they demonstrated that MongoDB overwhelmingly outperforms key value stores both in terms of throughput and latency across a number of configurations. Download the report to get access to performance test results comparing Couchbase, Cassandra, and MongoDB with WiredTiger. Get the report.

  • Processing the stream of votes from millions of devices and providing results for a TV reality show in real time is a role that only the fastest database can pull off. In a recent benchmark for NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance had obliterated Couchbase, Cassandra and Aerospike in this real life, write-heavy use case. Full backstage pass and and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • VoltDB is a full-featured fast data platform that has all of the data processing capabilities of Apache Storm and Spark Streaming, but adds a tightly coupled, blazing fast ACID-relational database, scalable ingestion with backpressure; all with the flexibility and interactivity of SQL queries. Learn more.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Algolia's Fury Road to a Worldwide API Steps Part 2

Mon, 07/20/2015 - 17:13


The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't built in a day, our infrastructure wasn't as well. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order to you to understand how we used them to improve our architecture.

The first blog post of the series focused on the early days of our beta. This blog post will focus on the first 18 months of our service from September 2013 to December 2014 and even include our first outages!

Step 4: January 2014
Categories: Architecture

Stuff The Internet Says On Scalability For July 17th, 2015

Fri, 07/17/2015 - 16:56

Hey, it's HighScalability time:


In case you were wondering, the world is weird. Large Hadron Collider discovers new pentaquark particle.

 

  • 3x: Uber bigger than taxi market; 250x: traffic in HotSchedules' DDoS attack; 92%: Apple’s share of the smartphone profit pie; 7: Airbnb rejections
  • Quotable Quotes:
    • Netflix: A slow or unhealthy server is worse than a down server 
    • @inconshreveable: ngrok production servers, max GC pause: Go1.4 (top) vs Go1.5. Holy 85% reduction! /cc Go team
    • Nic Fleming: The fungal internet exemplifies one of the great lessons of ecology: seemingly separate organisms are often connected, and may depend on each other.
    • @IBMResearch: With 20+ billion transistors on new chip, that's a 50% scaling improvement over today’s tech #ibmresearch #7nm 

  • Apple and Google Race to See Who Can Kill the App First. Honest question, how are people supposed to make money in this new world? Apps are quickly becoming just an identity that ties together 10 or so components that appear integrated as part of the OS, but don't look like your app at all. Reminds me of laminar flow. We are seeing a rebirth of CORBA, COM and OLE 2, this time the container is an app bound by deep linking and some ad hoc ways to push messages around. Show developers the money.

  • The dark side of Google 10x: One former exec told Business Insider that the gospel of 10x, which is promoted by top execs including CEO Larry Page, has two sides. “It’s enormously energizing on one side, but on the other it can be totally paralyzing,”

  • Wait, are we going all RAM or all flash? So confusing. MIT Develops Cheaper Supercomputer Clusters By Nixing Costly RAM In Favor Of Flash: researchers presented evidence at the International Symposium on Computer Architecture that if servers executing a distributed computation go to disk for data even just 5 percent of the time, performance takes a hit to where it's comparable with flash memory anyway. 40 servers with 10 terabytes of RAM wouldn't chew through a 10.5TB computation any better than 20 servers with 20TB of flash memory. What's involved here is moving a little computational power off of the servers and onto the chips that control the flash drives.

  • Is disruption merely a Silicon Valley fantasy? Corporate America Hasn’t Been Disrupted: the advantage enjoyed by incumbents, always substantial, has been growing in recent years...more Americans worked for big companies...Large companies are becoming more dominant in part by buying up their rivals...Consolidation could explain at least part of the rising failure rate among startups...The startup rate has declined in every major industry, every state and nearly every city, and the failure rate’s rise has been nearly as universal. 

  • What's a unikernel and why should you care? Amir Chaudhry reveals all in his Unikernels talk given at PolyConf 15. And here's the supporting blog post. Why are we still applications on top of operating systems? Most applications are single purpose so why all the complexity? Why are we building software for the cloud the same way we build it for desktops? We can do better with Unikerels where every application is a single purpose VM with a single address space.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

This. Just. This.

Thu, 07/16/2015 - 17:23

In response to an honest comment about some of Instagram's rather "ordinary engineering choices", mikeyk (Co-founder @ Instagram) had what I consider the perfect response:  We (at IG) aren't claiming to be doing revolutionary things on infrastructure--but one thing I found super valuable when scaling Instagram in the early days was having access to stories from other companies on how they've scaled. That's the spirit in which I encourage our engineers to blog about our DB scaling, our search infra, etc--I think the more open we are (as a company, but more broadly as an industry) about technical approaches + solutions, the better off we'll be. This could be the anthem for HS and is a key reason our industry continues to get better. And in case you are interested, here are just a few of those stories from Instagram:

On HackerNews

Categories: Architecture

64 Network DO’s and DON’Ts for Game Engines. Part IIIa: Server-Side

Wed, 07/15/2015 - 16:56

This article originally appeared on ITHare.com. It's one article from an excellent series of articles: Part I. Client Side; Part IIa. Protocols and APIs; Part IIb; Protocols and APIs; Part IIIb. Server-Side (deployment, optimizations, and testing); Part IV. Great TCP-vs-UDP Debate; Part V. UDP; Part VI. TCP.

In Part III of the article, we’ll discuss issues specific to server-side, as well as certain DO’s and DON’Ts related to system testing. Due to the size, part III has been split, and in this part IIIa we’ll concentrate on the issues related to Store-Process-and-Forward architecture.

18. DO consider Event-Driven programming model for Server Side too

As discussed above (see item #1 in Part I), the event-driven programming is a must for the client side; in addition, it also comes handy on the server side. Having multi-threaded logic is still a nightmare for the server-side [NoBugs2010], and keeping logic single-threaded simplifies development a lot. Whether to think that multi-threaded game logic is normal, and single-threaded logic is a big improvement, or to think that single-threaded game logic is normal, and multi-threaded logic is a nightmare – is up to you. What is clear is that if you can keep your game logic single-threaded – you’ll be very happy compared to the multi-threaded alternative.

However, unlike the client-side where performance and scalability rarely pose problems, on the server side where you need to serve hundreds of thousands of players, they become really (or, if your project is successful, “really really”) important. I know two ways of handling performance/scalability for games, while keeping logic single-threaded.

18a. Off-loading
Categories: Architecture

A Very Old Version of the PlentyOfFish Architecture that was Just Sold for $575 Million

Wed, 07/15/2015 - 16:56

PlentyOfFish was acquired by the Match Group for $575 million in cash. And it all goes to Markus Frind. Here's the story of the acquisition

Way back in 2009 I wrote architecture article on PlentyOfFish, which I'll reproduce here for historical perspective. The main theme at that time was how Markus was making great fat stacks of cash from adsense by running this huge site all by himself on a Microsoft stack.

We know the adsense goldmine played out long ago. What else has changed? We don't really know. Sometime ago we stopped getting updates on PlentyOfFish architecture changes, so that's all we have.

I doubt much remains the same however. Now 75 people work at PlentyOfFish, there are 90 million registered users, and a whopping 3.6 million active daily users, so something must be happening.

Anyway, here's the old PlentyOfFish Architecture. It still makes for interesting reading. I'm just wondering, when you get done reading, is being sold for $575 Million the ending you would expect?

PlentyOfFish Architecture
Categories: Architecture

Algolia's Fury Road to a Worldwide API

Mon, 07/13/2015 - 16:56

Guest post by Julien Lemoine, co-founder & CTO of Algolia, a developer friendly search as a service API.

The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't build in a day, our infrastructure wasn't as well. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order to you to understand how we used them to improve our architecture.

This first part will focus on the first three first steps we took when building the service while in beta from March 2013 to August 2013.

The Cloud versus Bare metal debate
Categories: Architecture

Stuff The Internet Says On Scalability For July 10th, 2015

Fri, 07/10/2015 - 16:56

Hey, it's HighScalability time:


Spying on an ant holding its favorite microchip. (@SciencePorn)

 

  • 1,425%: malware attack ROI; 33333/sec: BP oil well datapoints generated; 8 million: mumified dogs; 5 billion: Apple map requests per week; 10 billion: parameter neural net in your basement; 1 trillion: records in Yahoo's Sherpa.
  • Quotable Quotes:
    • @warriors: "It's ironic but what the unexpected thing was that everything went exactly as we hoped. That never happens." 
    • @georgeblazer: At scale, architecture dominates material. Alan Kay #gophercon
    • Nassim Nicholas Taleb: How unpredictability rises faster and faster with dimensionality: add one variable to 100 and error doubles.
    • Elon Musk~ one of the biggest challenges in the CRS-7 event is matching events to the exact time. When you are talking a matter of milliseconds it's hard to match the ground track video to the sensors
    • @MeltingIce: NYSE halted, United grounds flights due to computer glitch, WSJ website is down. This is how it starts.
    • The Shut-In Economy: They end up asking each other which apps they work for: Postmates. Seamless. EAT24. GrubHub. Safeway.com
    • @aphyr: At a Devops Chicago meetup I asked how many people had experienced partitions in their datacenters, and over half the room raised hands.
    • @mjpt777: Simplifying code rocks. @toddlmontgomery and I are seeing new throughput highs on Aeron after simplifying and cutting indirection.
    • aphyr: Real software is fuzzier: our processes are usually not realtime, which means the network effectively extends within the node. Garbage collection, in particular, is a notorious cause of “network” partitions, because it delays messages.
    • @fgens: 2 key msgs at #AWSSummit : "Developers, we love you (IaaS is so yesterday)!" and "Go 'server-less' -- deploy code as Lambda microservices"
    • @BenedictEvans: @Jim_Edwards but the boom is caused by having 3bn people online and a generational change in the tech platform from PC to mobile.
    • @Obdurodon: "Our distributed file system doesn't work quite right and performs like crap." "OK, we'll just call it an object store then."
    • @viktorklang: Suspected main causes for slow programs: A) Doing unnecessary work B) Waiting for unnecessary work to be done
    • @jasongorman: There are only so many times one can re-learn the same patterns for remote procedure calls before one gets mightily sick of it
    • @BenedictEvans: Devices in use, end of 2014:  ~1.5bn PCs 7-800m consumer PCs 1.2-1.3bn closed Android 4-500m open Android 650-675m iOS 80m Macs, ~75m Linux.
    • vardump: When performance matters, don't chase pointers (references) and don't ever do random memory accesses if you can avoid it. Ensure variables needed about same time are near each other, so that a single cache line fill will get all. Except when other threads/cores frequently update a variable, try to keep those separately to reduce false sharing.
    • @JZdziarski: Quantum entanglement dictates if one programmer is writing productive good code, another somewhere is inexplicably porting to JavaScript.
    • There are even more quotes for your perusal, please click through to the full article.

  • Love this! jamiesonbecker: AMAZON DEPRECATES EC2 November 3, 2017, SEATTLE At the AWS Re:Invent in Las Vegas today, Amazon Web Services today announced the deprecation of Elastic Compute Cloud as it shifts toward lighter-weight, more horizontally scalable services. Amazon announced that it was giving customers the opportunity to migrate toward what it claims are lower cost "containers" and "Lambda processes".

  • Horace Dediu with an interesting view of Humanism++. This time it's preferring humans over algorithms instead of humans over faith. Curate, don't automate...or sermonize. Also interesting was the discussion on functional vs divisional organizations. Apple is the largest functional org outside of the army. Functional orgs prevent cross divisional fights for resources and power.

  • OK, this is funny: Using locks to solve your concurrency problems.

  • Do we have a precedence for the rise of walled gardens? 1492: The Year the World Began: shift of initiative—the upset in the normal state of the world—started in 1492, when the resources of the Americas began to be accessible to Westerners while remaining beyond the reach of other rival or potentially rival civilizations.

  • Here's how the new StackExchange blog was rebuilt. Pretty radical. They got rid of WordPress for a static blog built on Jekyll. While there was some contention with the move on HackerNewsjonhmchan set the record straight: Performance also wasn't the only plus here. Closing major security holes, making more of our content and technology more open, and moving to a platform that our devs liked working in are just some of the other wins. It's too early to say definitively now, but we think the change is probably a good one.

  • Should Uber be the new McDonalds? While I viscerally agree with the following statement, the idea that every place in the world must have the same laws is equally obviously wrong. Vive la difference! @paulg: Uber is so obviously a good thing that you can measure how corrupt cities are by how hard they try to suppress it.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

RebornDB: the Next Generation Distributed Key-Value Store

Wed, 07/08/2015 - 16:56

There are many key-value stores in the world and they are widely used in many systems. E.g, we can use a Memcached to store a MySQL query result for later same query, use MongoDB to store documents for better searching, etc.

For different scenarios, we should choose different key-value store. There is no silver-bullet key-value store for all solutions. But if you just want a simple key-value store, easy to use, very fast, supporting many powerful data structures, redis may be a good choice for your start.  

Redis is advanced key-value cache and store, under BSD license. It is very fast, has many data types(String, Hash, List, Set, Sorted Set …), uses RDB or AOF persistence and replication to guarantee data security, and supplies many language client libraries.

Most of all, market chooses Redis. There are many companies using Redis and it has proved its worth.

Although redis is great, it still has some disadvantages, and the biggest one is memory limitation.  Redis keeps all data in memory, which limits the whole dataset size and lets us save more data impossibly.

The official redis cluster solves this by splitting data into many redis servers, but it has not been proven in many practical environments yet. At the same time, it need us to change our client libraries to support “MOVED” redirection and other special commands, this is unacceptable in running production too. So redis cluster is not a good solution now.

QDB

We like redis, and want to go beyond its limitation, so we building a service named QDB, which is compatible with redis, saves data in disk to exceed memory limitation and keeps hot data in memory for performance.

Introduction

QDB is a redis like, fast key-value store.It has below good features:

Categories: Architecture

Sponsored Post: VoltDB, Datadog, Tumblr, Power Admin, Learninghouse, MongoDB, Internap, Aerospike, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 07/07/2015 - 16:56

Who's Hiring?
  • Make Tumblr fast, reliable and available for hundreds of millions of visitors and tens of millions of users.  As a Site Reliability Engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems. Apply here now! 

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • 90 Days. 1 Bootcamp. A whole new life. Interested in learning how to code? Concordia St. Paul's Coding Bootcamp is an intensive, fast-paced program where you learn to be a software developer. In this full-time, 12-week on-campus course, you will learn either .NET or Java and acquire the skills needed for entry-level developer positions. For more information, read the Guide to Coding Bootcamp or visit bootcamp.csp.edu.

  • The Art of Cyberwar: Security in the Age of Information. Cybercrime is an increasingly serious issue both in the United States and around the world; the estimated annual cost of global cybercrime has reached $100 billion with over 1.5 million victims per day affected by data breaches, DDOS attacks, and more. Learn about the current state of cybercrime and the cybersecurity professionals in charge with combatting it in The Art of Cyberwar: Security in the Age of Information, provided by Russell Sage Online, a division of The Sage Colleges.

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at http://mongodbworld.com/ but hurry! Super Early Bird pricing ends on April 3.
Cool Products and Services
  • VoltDB is a full-featured fast data platform that has all of the data processing capabilities of Apache Storm and Spark Streaming, but adds a tightly coupled, blazing fast ACID-relational database, scalable ingestion with backpressure; all with the flexibility and interactivity of SQL queries. Learn more.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • Instructions for implementing Redis functionality in Aerospike. Aerospike Director of Applications Engineering, Peter Milne, discusses how to obtain the semantic equivalent of Redis operations, on simple types, using Aerospike to improve scalability, reliability, and ease of use. Read more.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

How Do We Explain the Unreasonable Effectiveness of IT?

Mon, 07/06/2015 - 16:59

Joseph Campbell: As Schopenhauer says, when you look back on your life, it looks as though it were a plot, but when you are into it, it’s a mess: just one surprise after another. Then, later, you see it was perfect. So, I have a theory that if you are on your own path things are going to come to you. Since it’s your own path, and no one has ever been on it before, there’s no precedent, so everything that happens is a surprise and is timely.

Why is the IT industry so darn effective? Just think about these amazing advancements. A little over 30 years ago the Apple Mac went on sale. In 2020 Benedict Evans estimates 80% of adults on earth will have a smartphone. And about at that same time applications were typically monoliths that ran on one computer. Now applications can deploy with the push of a button on cloud native architectures that exploit many thousands of CPUs using datacenter scale operating systems. And software used to be this strange specialized niche only nerds cared about or understood. Now software is in everything and is so ubiquitous it’s becoming nearly invisible. The examples could go on and on and on...and on.

These advances have evolved step-by-step over time, so we don’t even realize the full weight of the transformative changes we’ve experienced. What can account for such astonishingly rapid progress?

Stepping stones.

What the heck do stepping stones have to do with anything? Here’s a clue...do you remember the Connections TV Series by the incredible James Burke?

For an explanation we turn to Ken Stanley, Computer scientist, artificial intelligence researcher, Associate Professor at the University of Central Florida, who wrote a new book Why Greatness Cannot Be Planned: The Myth of the Objective, with a fascinatingly counterintuitive premise:

The greatest achievements become less likely when they are made objectives. The best way to achieve greatness, the truest path to “blue sky” discovery or to fulfill boundless ambition, is to have no objective at all. To achieve our highest goals, we must be willing to abandon them. 

The Big Idea
Categories: Architecture

How to Configure Alerts to Prevent Too Many Server Notifications

Mon, 06/15/2015 - 16:56

This is a guest post by Ashish Mohindroo, who leads Product for Happy Apps, a new uptime and performance monitoring system.

It isn't uncommon for system administrators to receive a stream of alarms for situations that either can wait to be addressed, will remedy themselves, or simply weren't problems to begin with. The other side of the equation is missing the reports that indicate a problem that has to be addressed right away. Options for customizing server notifications let you decide the conditions that trigger alerts, set the level of alerts, and choose the recipients based on each alert's importance.

The only thing worse than receiving too many notifications is not receiving the one alert that would keep a small glitch from becoming a big problem. Preventing over-notification requires fine-tuning system alerts so that the right people find out about problems and potential problems at the right time. Here's a three-step approach to customizing alerts:

Categories: Architecture

Sponsored Post: Datadog, Tumblr, Power Admin, Learninghouse, MongoDB, Internap, Aerospike, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 06/09/2015 - 16:56

Who's Hiring?
  • Make Tumblr fast, reliable and available for hundreds of millions of visitors and tens of millions of users.  As a Site Reliability Engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems. Apply here now! 

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • 90 Days. 1 Bootcamp. A whole new life. Interested in learning how to code? Concordia St. Paul's Coding Bootcamp is an intensive, fast-paced program where you learn to be a software developer. In this full-time, 12-week on-campus course, you will learn either .NET or Java and acquire the skills needed for entry-level developer positions. For more information, read the Guide to Coding Bootcamp or visit bootcamp.csp.edu.

  • June 2nd – 4th, Santa Clara: Register for the largest NoSQL event of the year, Couchbase Connect 2015, and hear how innovative companies like Cisco, TurboTax, Joyent, PayPal, Nielsen and Ryanair are using our NoSQL technology to solve today’s toughest big data challenges. Register Today.

  • The Art of Cyberwar: Security in the Age of Information. Cybercrime is an increasingly serious issue both in the United States and around the world; the estimated annual cost of global cybercrime has reached $100 billion with over 1.5 million victims per day affected by data breaches, DDOS attacks, and more. Learn about the current state of cybercrime and the cybersecurity professionals in charge with combatting it in The Art of Cyberwar: Security in the Age of Information, provided by Russell Sage Online, a division of The Sage Colleges.

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at http://mongodbworld.com/ but hurry! Super Early Bird pricing ends on April 3.
Cool Products and Services
  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • Instructions for implementing Redis functionality in Aerospike. Aerospike Director of Applications Engineering, Peter Milne, discusses how to obtain the semantic equivalent of Redis operations, on simple types, using Aerospike to improve scalability, reliability, and ease of use. Read more.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Leveraging AWS to Build a Scalable Data Pipeline

Mon, 06/08/2015 - 18:06

While at Netflix and LinkedIn Siddharth "Sid" Anand wrote some great articles for HS. Sid is back, this time as a Data Architect at Agari. Original article is here.

Data-rich companies (e.g. LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. In order to meet strict requirements on data security, fault-tolerance, cost control, job scalability, and uptime, they need to closely manage their core technology. Like serving systems (e.g. web application servers and OLTP databases) that need to be up 24x7 to display content to users the world over, data pipelines need to be up and running in order to pick the most engaging and up-to-date content to display. In other words, updated ranking models, new content recommendations, and the like are what make data pipelines an integral part of an end user’s web experience by picking engaging, personalized content. 

Agari, a data-driven email security company, is no different in its demand for a low-latency, reliable, and scalable data pipeline.  It must process a flood of inbound email and email authentication metrics, analyze this data in a timely manner, often enriching it with 3rd party data and model-driven derived data, and publish findings. One twist is that Agari, unlike the companies listed above, operates completely in the cloud, specifically in AWS.  This has turned out to be more a boon than a disadvantage. 

Below is one such data pipeline used at Agari.

Agari Data Pipeline

Data Flow
Categories: Architecture

Gone Fishin' 2015

Mon, 06/08/2015 - 17:56

Well, not exactly Fishin', but I'll be on a month long vacation starting today.

I won't be posting (much) new content, so we'll all have a break. Disappointing, I know.

Please use this time for quiet contemplation and other inappropriate activities.

See you on down the road...

Categories: Architecture

Stuff The Internet Says On Scalability For June 5th, 2015

Fri, 06/05/2015 - 16:56

Hey, it's HighScalability time:


Stunning Multi-Wavelength Image Of The Solar Atmosphere.
  • 4x: amount spent by Facebook users
  • Quotable Quotes:
    • Facebook: Facebook's average data set for CF has 100 billion ratings, more than a billion users, and millions of items. In comparison, the well-known Netflix Prize recommender competition featured a large-scale industrial data set with 100 million ratings, 480,000 users, and 17,770 movies (items).
    • @BenedictEvans: The number of photos shared on social networks this year will probably be closer to 2 trillion than to 1 trillion.
    • @jeremysliew: For every 10 photos shared on @Snapchat, 5 are shared on @Facebook and 1 on @Instagtam. 8,696 photos/sec on Snapchat.
    • @RubenVerborgh: “Don’t ask for an API, ask for data access. Tim Berners-Lee called for open data, not open services.” —@pietercolpaert #SemDev2015 #ESWC2015
    • Craig Timberg: When they thought about security, they foresaw the need to protect the network against potential intruders or military threats, but they didn’t anticipate that the Internet’s own users would someday use the network to attack one another. 
    • Janet Abbate: They [ARPANET inventors] thought they were building a classroom, and it turned into a bank.
    • A.C. Hadfield: The power of accurate observation is often called cynicism by those who don’t possess it.
    • @plightbo: Every business is becoming a software business
    • @potsdamnhacker: Replaced Go service with an Erlang one. Already used hot-code reloading, fault tolerance, runtime inspectability to great effect. #hihaters
    • @PHP_CEO: WE SPENT 18 MONTHS MIGRATING FROM A MONOLITH TO MICROSERVICES RESULT:- GITHUB GETS PAID FOR MORE PRIVATE REPOS - FIND/REPLACE IS HARDER
    • @alsargent: Given continuous deployment, average lifetime of a #Docker image @newrelic is 12hrs. Different ops pattern than VMs. #velocityconf
    • @PHP_CEO: ALSO THE NODE GUY WHO WAS A RUBY GUY THAT REWROTE IT ALL BECAME A RUST GUY AND MOVED TO THAILAND TO BECOME A NOMAD STARTUP GUY
    • @abt_programming: "If you think good architecture is expensive, try bad architecture" - Brian Foote - and Joseph Yoder
    • @KlangianProverb: "I thought studying distributed systems would make me understand software better—it made me understand reality better."—Old Klangian Proverb
    • @rachelmetz: google's error rate for image recognition was 28 percent in 2008, now it's like 5 percent, quoc le says.

  • Fear or strength? Apple’s Tim Cook Delivers Blistering Speech On Encryption, Privacy. With Google Now on Tap Google is saying we've joyously crossed the freaky line and we unapologetically plan to leverage our huge lead in machine learning to the max. Apple knows they can't match this feature. Google knows this is a clear and distinct exploitable marketing idea, like a super thin MacBook Air slowly slipping out of a manila envelope.

  • How does Kubernetes compare to Mesos? cmcluck, who works at Google and was one of the founders of the project explains: we looked really closely at Apache Mesos and liked a lot of what we saw, but there were a couple of things that stopped us just jumping on it. (1) it was written in C++ and the containers world was moving to Go -- we knew we planned to make a sustained and considerable investment in this and knew first hand that Go was more productive (2) we wanted something incredibly simple to showcase the critical constructs (pods, labels, label selectors, replication controllers, etc) and to build it directly with the communities support and mesos was pretty large and somewhat monolithic (3) we needed what Joe Beda dubbed 'over-modularity' because we wanted a whole ecosystem to emerge, (4) we wanted 'cluster environment' to be lightweight and something you could easily turn up or turn down, kinda like a VM; the systems integrators i knew who worked with mesos felt that it was powerful but heavy and hard to setup (though i will note our friends at Mesosphere are helping to change this). so we figured doing something simple to create a first class cluster environment for native app management, 'but this time done right' as Tim Hockin likes to say everyday. < Also, CoreOS (YC S13) Raises $12M to Bring Kubernetes to the Enterprise.

  • If structure arises in the universe because electrons can't occupy the same space, why does structure arise in software?

  • The cost of tyranny is far lower than one might hope. How much would it cost for China to intercept connections and replace content flowing at 1.9-terabits/second? About $200K says Robert Graham in Scalability of the Great Cannon. Low? Probably. But for the price of a closet in the new San Francisco you can edit an entire people's perception of the Internet in real-time.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Paper: Heracles: Improving Resource Efficiency at Scale

Thu, 06/04/2015 - 16:56

Underutilization and segregation are the classic strategies for ensuring resources are available when work absolutely must get done. Keep a database on its own server so when the load spikes another VM or high priority thread can't interfere with RAM, power, disk, or CPU access. And when you really need fast and reliable networking you can't rely on QOS, you keep a dedicated line.

Google flips the script in Heracles: Improving Resource Efficiency at Scale, shooting for high resource utilization while combining different load profiles.

I'm assuming the project name Heracles was chosen not simply for his legendary strength, but because when strength failed, Heracles could always depend on his wits. Who can ever forget when Heracles tricked Atlas into taking the sky back onto his shoulders? Good times.

The problem: better utilization of compute resources while complying  service level objectives (SLOs) for latency-critical (LC) and best effort batch (BE) tasks: 

Categories: Architecture