
High Scalability - Building bigger, faster, more reliable websites

How HipChat Stores and Indexes Billions of Messages Using ElasticSearch and Redis

Mon, 01/06/2014 - 17:54

This article is from an interview with Zuhaib Siddique, a production engineer at HipChat, makers of group chat and IM for teams.

HipChat started in an unusual space, one you might not think would have much promise, enterprise group messaging, but as we are learning there is gold in them there enterprise hills. That is why Atlassian, maker of well-thought-of tools like JIRA and Confluence, acquired HipChat in 2012.

And in a tale not often heard, the resources and connections of a larger parent have actually helped HipChat enter an exponential growth cycle. Having reached the 1.2 billion message storage mark, they are now doubling the number of messages sent, stored, and indexed every few months.

That kind of growth puts a lot of pressure on a once-adequate infrastructure. HipChat exhibited a common scaling pattern: start simple, experience traffic spikes, and then ask, what do we do now? Using bigger computers is usually the first and best answer, and they did that. That gave them some breathing room to figure out what to do next. On AWS, after a certain inflection point, you start going Cloud Native, that is, scaling horizontally. And that’s what they did.

But there's a twist to the story. Security concerns have driven the development of an on-premises version of HipChat in addition to its cloud/SaaS version. We'll talk more about this interesting development in a post later this week.

While HipChat isn’t Google scale, there is good stuff to learn from them about how they index and search billions of messages in a timely manner, which is the key difference between something like IRC and HipChat. Indexing and storing messages under load without losing any is the challenge.

This is the road that HipChat took, what can you learn? Let’s see…
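
The interview doesn't include code, but the general shape of the pattern, queueing incoming messages in Redis so the write path never blocks on the indexer and then bulk-indexing batches into ElasticSearch, can be sketched in a few lines of Python. Everything below (queue key, index name, message fields) is a hypothetical illustration, not HipChat's actual schema or pipeline.

```python
# Minimal sketch: buffer chat messages in a Redis list, then drain the
# queue in batches and bulk-index them into Elasticsearch. All names here
# are hypothetical, not HipChat's real schema.
import json

import redis
from elasticsearch import Elasticsearch, helpers

r = redis.Redis(host="localhost", port=6379)
es = Elasticsearch(["http://localhost:9200"])

QUEUE_KEY = "chat:pending"   # messages wait here until indexed
INDEX = "messages"           # hypothetical index name

def enqueue(message: dict) -> None:
    """Write path: push the message onto the Redis buffer and return fast."""
    r.lpush(QUEUE_KEY, json.dumps(message))

def drain(batch_size: int = 500) -> int:
    """Indexer: pop up to batch_size messages and bulk-index them."""
    docs = []
    for _ in range(batch_size):
        raw = r.rpop(QUEUE_KEY)
        if raw is None:
            break
        docs.append(json.loads(raw))
    if docs:
        # Bulk request keeps per-document overhead low under load.
        helpers.bulk(es, ({"_index": INDEX, "_source": d} for d in docs))
    return len(docs)

if __name__ == "__main__":
    enqueue({"room": "ops", "from": "alice", "text": "deploy done", "ts": 1388966000})
    print(drain(), "messages indexed")
```

A real pipeline also has to cover the failure case this sketch ignores: if the bulk request fails after the pop, those messages are gone unless they are re-queued, which is exactly the "indexing and storing messages under load while not losing messages" problem described above.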

Categories: Architecture

Stuff The Internet Says On Scalability For January 3rd, 2014

Fri, 01/03/2014 - 17:54

Hey, it's HighScalability time, can you handle the truth?


  • Should software architectures include parasites? They increase diversity and complexity in the food web.
  • 10 Million: classic hockey stick growth pattern for GitHub repositories
  • Quotable Quotes:
    • Seymour Cray: A supercomputer is a device for turning compute-bound problems into IO-bound problems.
    • Robert Sapolsky: And why is self-organization so beautiful to my atheistic self? Because if complex, adaptive systems don’t require a blueprint, they don’t require a blueprint maker. If they don’t require lightning bolts, they don’t require Someone hurling lightning bolts.
    • @swardley: Asked for a history of PaaS? From memory, public launch - Zimki ('06), BungeeLabs ('06), Heroku ('07), GAE ('08), CloudFoundry ('11) ...
    • @neil_conway: If you're designing scalable systems, you should understand backpressure and build mechanisms to support it.
    • Scott Aaronson: ...the brain is not a quantum computer. A quantum computer is good at factoring integers, discrete logarithms, simulating quantum physics, and modest speedups for some combinatorial algorithms; none of these have obvious survival value. The things we are good at are not the same things quantum computers are good at.
    • @rbranson: Scaling down is way cooler than scaling up.
    • @rbranson: The i2 EC2 instances are a huge deal. Instagram could have put off sharding for 6 more months, would have had 3x the staff to do it.
    • @mraleph: often devs still approach performance of JS code as if they are riding a horse cart but the horse had long been replaced with fusion reactor
  • Now we know the cost of bandwidth: Netflix’s new plan: save a buck with SD-only streaming
  • Massively interesting Stack Overflow thread on Why is processing a sorted array faster than an unsorted array? Compilers may grant a hidden boon or turn traitor with a deep deceit. How do you tell? It's about branch prediction.
  • Can your database scale to 1000 cores? Nope. Concurrency Control in the Many-core Era: Scalability and Limitations: We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from the ground up and is tightly coupled with the hardware.
  • Not all SSDs are created equal. Power-Loss-Protected SSDs Tested: Only Intel S3500 Passes. With a follow-up. If data on your SSD can't survive a power outage it ain't worth a damn.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Categories: Architecture

xkcd: How Standards Proliferate:

Thu, 01/02/2014 - 18:00

The great thing about standards is there are so many to choose from. What is it about human nature that makes this so recognizably true?

Categories: Architecture

Paper: Nanocubes: Nanocubes for Real-Time Exploration of Spatiotemporal Datasets

Wed, 01/01/2014 - 17:55

How do you turn Big Data into fast, useful, and interesting visualizations? Using R and a technology called Nanocubes. The visualizations are stunning and amazingly reactive. Almost as interesting as the technologies behind them.

David Smith wrote a great article explaining the technology and showing a demo by Simon Urbanek of a visualization that uses 32TB of Twitter data. It runs smoothly and interactively on a single machine with 16GB of RAM. For more information and demos go to nanocubes.net.

David Smith sums it up nicely:

Despite the massive number of data points and the beauty and complexity of the real-time data visualization, it runs impressively quickly. The underlying data structure is based on Nanocubes, a fast data structure for in-memory data cubes. The basic idea is that nanocubes aggregate data hierarchically, so that as you zoom in and out of the interactive application, one pixel on the screen is mapped to just one data point, aggregated from the many that sit "behind" that pixel. Learn more about nanocubes, and try out the application yourself (modern browser required) at the link below.

Abstract from Nanocubes for Real-Time Exploration of Spatiotemporal Datasets:

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop’s main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
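
The aggregation idea behind nanocubes, pre-computing counts at every spatial resolution so that one pixel at any zoom level is answered by one lookup rather than a scan over raw points, can be illustrated with a toy multi-resolution grid. This is only a sketch of the hierarchical-aggregation principle, not the paper's actual nanocube structure (which also nests time and categorical dimensions and shares subtrees to stay within a laptop's memory); the class and parameter names below are made up for the example.

```python
# Toy multi-resolution count grid over the unit square [0,1) x [0,1).
# Level 0 has one cell; level L has 2^L x 2^L cells. Inserting a point
# updates one cell per level, so a "pixel" at any zoom level is answered
# by a single lookup instead of a scan over raw points.
from collections import defaultdict

class MultiResGrid:
    def __init__(self, max_level: int = 8):
        self.max_level = max_level
        # counts[level][(ix, iy)] -> number of points in that cell
        self.counts = [defaultdict(int) for _ in range(max_level + 1)]

    def insert(self, x: float, y: float) -> None:
        """Add one point; costs one counter update per zoom level."""
        for level in range(self.max_level + 1):
            n = 1 << level                      # cells per side at this level
            ix, iy = int(x * n), int(y * n)
            self.counts[level][(ix, iy)] += 1

    def query(self, level: int, ix: int, iy: int) -> int:
        """Count of points 'behind' one cell (pixel) at the given zoom level."""
        return self.counts[level][(ix, iy)]

if __name__ == "__main__":
    grid = MultiResGrid(max_level=4)
    for x, y in [(0.1, 0.2), (0.12, 0.22), (0.9, 0.9)]:
        grid.insert(x, y)
    print(grid.query(0, 0, 0))   # whole domain: 3 points
    print(grid.query(4, 1, 3))   # one fine cell near (0.1, 0.2): 2 points
```

Each inserted point costs one counter update per level, and a heatmap tile at any zoom level reads one pre-aggregated value per cell, which is the property that keeps the interactive demos responsive.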
Categories: Architecture