Software Development Blogs: Programming, Software Testing, Agile, Project Management


High Scalability - Building bigger, faster, more reliable websites

A Beginner's Guide to Scaling to 11 Million+ Users on Amazon's AWS

Mon, 01/11/2016 - 17:56

How do you scale a system from one user to more than 11 million users? Joel Williams, Amazon Web Services Solutions Architect, gives an excellent talk on just that subject: AWS re:Invent 2015 Scaling Up to Your First 10 Million Users.

If you are an advanced AWS user this talk is not for you, but it’s a great way to get started if you are new to AWS, new to the cloud, or if you haven’t kept up with the constant stream of new features Amazon keeps pumping out.

As you might expect from a talk by Amazon, Amazon services are always front and center as the solution to any problem. Their platform play is impressive and instructive. It's obvious from how the pieces all fit together that Amazon has done a great job of mapping out what users need and then making sure they have a product in that space.

Some of the interesting takeaways:

  • Start with SQL and only move to NoSQL when necessary.
  • A consistent theme is to take components and separate them out. This allows those components to scale and fail independently. It applies to breaking up tiers and creating microservices.
  • Only invest in tasks that differentiate you as a business; don't reinvent the wheel.
  • Scalability and redundancy are not two separate concepts; you can often do both at the same time.
  • There's no mention of costs. That would be a good addition to the talk as that is one of the major criticisms of AWS solutions.
The Basics
Categories: Architecture

Uptime Funk - Best Sysadmin Parody Video Ever!

Sun, 01/10/2016 - 18:14

This is so good! Perfect for your Monday morning jam.

 

Uptime Funk is a music video (a parody of Uptown Funk) from SUSECon 2015 in Amsterdam. My favorite:

I'm all green (hot patch)
Called a Penguin and Chameleon
I'm all green (hot patch)
Call Torvalds and Kroah-Hartman
It’s too hot (hot patch)
Yo, say my name you know who I am
It’s too hot (hot patch)
I ain't no simple code monkey
Nuthin's down
Categories: Architecture

Stuff The Internet Says On Scalability For January 8th, 2016

Fri, 01/08/2016 - 17:56

Hey, it's HighScalability time:


Finally, a clear diagram of Amazon's industry impact. (MARK A. GARLICK)

 

If you like this Stuff then please consider supporting me on Patreon.
  • 150: # of globular clusters in the Milky Way; 800 million: Facebook Messenger users; 180,000: high-res images of the past; 1 exaflops: 1 million trillion floating-point operations per second; 10%: of Google's traffic is now IPv6; 100 milliseconds: time it takes to remember; 35: percent of all US Internet traffic used by Netflix; 125 million: hours of content delivered each day by Netflix's CDN;

  • Quotable Quotes:
    • Erik DeBenedictis: We could build an exascale computer today, but we might need a nuclear reactor to power it
    • wstrange: What I really wish the cloud providers would do is reduce network egress costs. They seem insanely expensive when compared to dedicated servers.
    • rachellaw: What's fascinating is the bot-bandwagon is mirroring the early app market. With apps, you downloaded things to do things. With bots, you integrate them into things, so they'll do it for you. 
    • erichocean: The situation we're in today with RAM is pretty much the identical situation with the disks of yore.
    • @bernardgolden: @Netflix will spend 2X what HBO does on programming in 2016? That's an amazing stat. 
    • @saschasegan: Huawei's new LTE modem has 18 LTE bands. Qualcomm's dominance of LTE is really ending this year.
    • Unruly Places: The rise of placelessness, on top of the sense that the whole planet is now minutely known and surveilled, has given this dissatisfaction a radical edge, creating an appetite to find places that are off the map and that are somehow secret, or at least have the power to surprise us.
    • @mjpt777: Queues are everywhere. Recognise them, make them first class, model and monitor them for telemetry.
    • Guido de Croon:  the robot exploits the impending instability of its control system to perceive distances. This could be used to determine when to switch off its propellers during landing, for instance.
    • @gaberivera: In the future, all major policy questions will be settled by Twitter debates between venture capitalists
    • Craig McLuckie: It’s not obvious until you start to actually try to run massive numbers of services that you experience an incredible productivity that containers bring
    • Brian Kirsch: One of the biggest things when you look at the benefits of container-based virtualization is its ability to squeeze more and more things onto a single piece of hardware for cost savings. While that is good for budgets, it is excessively horrible when things go bad.
    • @RichardWarburto: It still surprises me that configuration is most popular user of strong consistency models atm. Is config more important than data
    • @jamesurquhart: Five years ago I predicted CFO would stop complaining about up front cost, and start asking to reduce monthly bill. Seeing that happen now.
    • @martinkl: Communities in a nutshell… • Databases research: “In fsync we trust” • Distributed systems research: “In majority vote we trust”
    • @BoingBoing: Tax havens hold $7.6 trillion; 8% of world's total wealth
    • @DrQz: Amazon's actual profits are still tiny, relying heavily on its AWS cloud business.
    • hadagribble: we need to view fast storage as something other than disk behind a block interface and slow memory, especially with all the different flavours of fast persistent storage that seem to be on the horizon. For the one's that attach to the memory bus, the PMFS-style [1] approach of treating them like a file-system for discoverability and then mmaping to allow them to be accessed as memory is pretty attractive.

  • EC2 with a 5% price reduction on certain things in certain places. Not exactly the race to the bottom one would hope for in a commodity market, which means the cloud is not a commodity. Happy New Year – EC2 Price Reduction (C4, M4, and R3 Instances).

  • Since the locus of the Internet is centering on a command line interface in the form of messaging, chatbot integrations may be giving APIs a second life, assuming they are let inside the walled garden. The next big thing in computing is called 'ChatOps,' and it's already happening inside Slack. The advantage chatops has over the old Web + API mashup dream is that messaging platforms come built-in with a business model/app store, large and growing user base, and network effects. Facebook’s Secret Chat SDK Lets Developers Build Messenger Bots. Slack apps. WeChat API. Telegram API. Alexa API. Google's Voice Actions. How about Siri or iMessage? Nope. njovin likes it: I've worked with the new Chat SDK and our customers' use cases aren't geared toward forcing (or even encouraging) users into using Facebook Messenger. Most of them are just trying to meet demand from their customers. In our particular case, we have customers with a lot of international travelers who have access to data while abroad but not necessarily SMS. IMO it's a lot better than having a dedicated app you have to download to interact with a specific brand.

  • The world watched a lot of porn this year. If you like analytics you'll love Pornhub’s 2015 Year in Review: In 2015 alone, we streamed 75GB of data a second; bandwidth used is 1,892 petabytes; 4,392,486,580 hours of video were watched; 21.2 billion visits.

  • A very interesting way to frame the issue. On the dangers of a blockchain monoculture: The Bitcoin blockchain: the world’s worst database. Would you use a database with these features? Uses approximately the same amount of electricity as could power an average American household for a day per transaction. Supports 3 transactions / second across a global network with millions of CPUs/purpose-built ASICs. Takes over 10 minutes to “commit” a transaction. Doesn’t acknowledge accepted writes: requires you read your writes, but at any given time you may be on a blockchain fork, meaning your write might not actually make it into the “winning” fork of the blockchain (and no, just making it into the mempool doesn’t count). In other words: “blockchain technology” cannot by definition tell you if a given write is ever accepted/committed except by reading it out of the blockchain itself (and even then). Can only be used as a transaction ledger denominated in a single currency, or to store/timestamp a maximum of 80 bytes per transaction. But it’s decentralized!

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Let's Donate Our Organs and Unused Cloud Cycles to Science

Wed, 01/06/2016 - 17:56

There’s a long history of donating spare compute cycles for worthy causes. Most of those efforts were started in the Desktop Age. Now, in the Cloud Age, how can we donate spare compute capacity? How about through a private spot market?

There are cycles to spare. Public Cloud Usage trends:

  • Instances are underutilized with average utilization rates between 8-9%

  • 24% of instance reservations are unused

Maybe all that CapEx sunk into Reserved Instances can be put to some use? Maybe over provisioned instances could be added to the resource pool as well? That’s a lot of power Captain. How could it be put to good use?

There is a need to crunch data. For science. Here’s a great example as described in This is how you count all the trees on Earth. The idea is simple: from satellite pictures count the number of trees. It’s an embarrassingly parallel problem, perfect for the cloud. NASA had a problem. Their cloud is embarrassingly tiny. 400 hypervisors shared amongst many projects. Analysing all the data would take 10 months. An unthinkable amount of time in this Real-time Age. So they used the spot market on AWS.

The upshot? The test run cost a measly $80, which means that NASA can process data collected for an entire UTM zone for just $250. The cost for all 11 UTM zones in sub-Saharan Africa and the use of all four satellites comes in at just $11,000.
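
The quoted figures line up if the $250 is read as the cost of one UTM zone for one satellite's imagery. That reading is an assumption, not something the article states. A quick check under that assumption (the function and constant names are made up for illustration):

```python
# Assumption: $250 covers one UTM zone for ONE satellite's imagery.
COST_PER_ZONE_PER_SATELLITE_USD = 250
UTM_ZONES = 11    # sub-Saharan Africa, as quoted
SATELLITES = 4    # "all four satellites", as quoted


def total_cost(zones: int, satellites: int, unit_cost: int) -> int:
    """Total cost when every (zone, satellite) pair is processed once."""
    return zones * satellites * unit_cost


print(total_cost(UTM_ZONES, SATELLITES, COST_PER_ZONE_PER_SATELLITE_USD))  # 11000
```

Under that reading, 11 zones times 4 satellites times $250 gives the $11,000 total mentioned above.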

“We have turned what was a $200,000 job into a $10,000 job and we went from 100 days to 10 days [to complete],” said Hoot. “That is something scientists can build easily into their budget proposals.”

That last quote, That is something scientists can build easily into their budget proposals, stuck in my craw.

Imagine how much science could get done if you didn’t have the budget proposal process slowing down the future? Especially when we know there are so many free cycles available that are already attached to well supported data processing pipelines. How could those cycles be freed up to serve a higher purpose?

Netflix shows the way with their internal spot market. Netflix has so many cloud resources at their disposal, a pool of 12,000 unused reserved instances at peak times, that they created their own internal spot market to drive better utilization. The whole beautiful setup is described in Creating Your Own EC2 Spot Market, Creating Your Own EC2 Spot Market -- Part 2, and in High Quality Video Encoding at Scale.

The win: By leveraging the internal spot market Netflix measured the equivalent of a 210% increase in encoding capacity.

Netflix has a long and glorious history of sharing and open sourcing their tools. It seems likely when they perfect their spot market infrastructure it could be made generally available.

Perhaps the Netflix spot market could be extended so unused resources across the Clouds could advertise themselves for automatic integration into a spot market usable by scientists to crunch data and solve important world problems.

Perhaps donated cycles could even be charitable contributions that could help offset the cost of the resource? My wife is a tax accountant and she says this is actually true, under the right circumstances.

This kind of idea has a long history with me. When AWS first started, I, like a lot of people, wondered: how can I make money off this gold rush? That’s before we knew Amazon was going to make most of the tools to sell to the miners themselves. The idea of exploiting underutilized resources fascinated me for some reason. That is, after all, what VMs do for physical hardware: exploit the underutilized resources of powerful machines. And it is in some ways the idea behind our modern economy. Yet even today software architectures aren’t such that we reach anything close to full utilization of our hardware resources. What I wanted to do was create a memcached system that allowed developers to sell their unused memory capacity (and later CPU, network, storage) to other developers as cheap dynamic pools of memcached storage. Get your cache dirt cheap and developers could make some money back on underused resources. A very similar idea to the spot market notion. But without homomorphic encryption the security issues were daunting, even assuming Amazon would allow it. With the advent of the Container Age sharing a VM is now way more secure and Amazon shouldn’t have a problem with the idea if it’s for science. I hope.

Categories: Architecture

Sponsored Post: Netflix, StatusPage.io, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 01/05/2016 - 17:56

Who's Hiring?
  • Manager - Site Reliability Engineering: Lead and grow the front door SRE team in charge of keeping Netflix up and running. You are an expert of operational best practices and can work with stakeholders to positively move the needle on availability. Find details on the position here: https://jobs.netflix.com/jobs/398

  • Senior Service Reliability Engineer (SRE): Drive improvements to help reduce both time-to-detect and time-to-resolve while concurrently improving availability through service team engagement.  Ability to analyze and triage production issues on a web-scale system a plus. Find details on the position here: https://jobs.netflix.com/jobs/434

  • Manager - Performance Engineering: Lead the world-class performance team in charge of both optimizing the Netflix cloud stack and developing the performance observability capabilities which 3rd party vendors fail to provide.  Expert on both systems and web-scale application stack performance optimization. Find details on the position here https://jobs.netflix.com/jobs/860482

  • Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • Turn chaotic logs and metrics into actionable data. Scalyr is a tool your entire team will love. Get visibility into your production issues without juggling multiple tools and tabs. Loved and used by teams at Codecademy, ReturnPath, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Server-Side Architecture. Front-End Servers and Client-Side Random Load Balancing

Mon, 01/04/2016 - 17:56

Chapter by chapter Sergey Ignatchenko is putting together a wonderful book on the Development and Deployment of Massively Multiplayer Games, though it has much broader applicability than games. Here's a recent chapter from his book.

Enter Front-End Servers

[Enter Juliet]
Hamlet:
Thou art as sweet as the sum of the sum of Romeo and his horse and his black cat! Speak thy mind!
[Exit Juliet]

— a sample program in Shakespeare Programming Language

 

 

Front-End Servers as an Offensive Line

 

Our Classical Deployment Architecture (especially if you do use FSMs) is not bad, and it will work, but there is still quite a bit of room for improvement for most of the games out there. More specifically, we can add another row of servers in front of the Game Servers, as shown on Fig VI.8:

Categories: Architecture

Stuff The Internet Says On Scalability For January 1st, 2016

Fri, 01/01/2016 - 17:56

Hey, Happy New Year, it's HighScalability time:


River system? Vascular system? Nope. It's a map showing how all roads really lead to Rome.

 

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 71: mentions of innovation by the Chinese Communist Party; 60.5%: of all burglaries involve forcible entry; 280,000-squarefoot: Amazon's fulfillment center in India capable of shipping 2 million items; 11 billion: habitable earth like planets in the goldilocks zone in just our galaxy; 800: people working on the iPhone's camera (how about the app store?); 3.3 million: who knew there were so many Hello Kitty fans?; 26 petabytes: size of League of Legends' data warehouse; 

  • Quotable Quotes:
    • George Torwell: Tor is Peace / Prism is Slavery / Internet is Strength
    • @SciencePorn: Mr Claus will eat 150 BILLION calories and visit 5,556 houses per second this Christmas Eve.
    • @SciencePorn: Blue Whale's heart is so big, a small child can swim through the veins.
    • @BenedictEvans: There are close to 4bn people on earth with a phone (depending on your assumptions). Will go to at least 5bn. So these issues will grow.
    • @JoeSondow: "In real life you won't always have a calculator with you." — math teachers in the 80s
    • James Hamilton: This is all possible due to the latencies we see with EC2 Enhanced networking. Within an availability zone, round-trip times are now tens of microseconds, which make it feasible to propose and commit transactions to multiple resilient nodes in less than a millisecond.
    • Benedict Evans: The mobile ecosystem, now, is heading towards perhaps 10x the scale of the PC industry, and mobile is not just a new thing or a big thing, but that new generation, whose scale makes it the new centre of gravity of the tech industry. Almost everything else will orbit around it. 
    • Ruth Williams: Bacteria growing in an unchanging environment continue to adapt indefinitely.
    • @Raju: Not one venture-backed news aggregator has yet shown a Sustainable Business Model
    • @joeerl: + choose accurate names + favor beauty over performance + design minimal essential API's + document the unobvious
    • @shibuyashadows: There is no such thing as a full-node anymore. Now there are two types: Mining Nodes Economic Nodes. Both sets are now semi-centralized on the network, are heavily inter-dependent and represent the majority of the active Bitcoin users.
    • @TheEconomist: In 1972 a man with a degree aged 25-34 earned 22% more than a man without. Today, it's 70%
    • Dr. David Miller~ We are in the age of Howard Hughes. People make their fortune elsewhere and spend it on space. 
    • Credit for CRISPR: Part of that oversimplification is rooted in the fact that most modern life-science researchers aren’t working to uncover broad biological truths. These days the major discoveries lie waiting in the details
    • @BenedictEvans: Idle observation: Facebook will almost certainly book more revenue in 2015 than the entire internet ad industry made up until to 2000
    • Eric Clemmons: Ultimately, the problem is that by choosing React (and inherently JSX), you’ve unwittingly opted into a confusing nest of build tools, boilerplate, linters, & time-sinks to deal with before you ever get to create anything.
    • Kyle Russell: Why do I need such a powerful PC for VR? Immersive VR experiences are 7x more demanding than PC gaming.
    • @josevalim: The system that manages rate limits for Pinterest written in Elixir with a 90% response time of 800 microseconds.
    • catnaroek: The normal distribution is important because it arises naturally when the preconditions of the central limit theorem hold. But you still have to use your brain - you can't unquestioningly assume that any random variable (or sample or whatever) you will stumble upon will be approximately normally distributed.
    • Dominic Chambers: Now, if you consider the server-side immutable state atom to be a materialized view of the historic events received by a server, you can see that we've already got something very close to a Samza style database, but without the event persistence.
    • Joscha Bach: In my view, the 20th century’s most important addition to understanding the world is not positivist science, computer technology, spaceflight, or the foundational theories of physics. It is the notion of computation. Computation, at its core, and as informally described as possible, is very simple: every observation yields a set of discernible differences.

  • The New Yorker is picking up on the Winner Takes All theme that's been developing, I guess that makes it an official meme. What's missing from their analysis is that users are attracted to the eventual winners because they provide a superior customer experience. Magical algorithms are in support of experience. As long as a product doesn't fail at providing that experience there's little reason to switch after being small networked into a choice. You might think many many products could find purchase along the long tail, but in low friction markets that doesn't seem to be the case. Other choices become invisible and what's invisible starves to death.

  • I wonder how long it took to get to the 1 billionth horse ride? Uber Hits One Billionth Ride in 5.5 years.

  • Let's say you are a frog that has been in a warming pot for the last 15 years, what would you have missed? Robert Scoble has put together quite a list. 15 years ago there was no: Facebook, YouTube, Twitter, Google+, Quora, Uber, Lyft, iPhone, iPads, iPod, Android, HDTV, self driving cars, Waze, Google Maps, Spotify, Soundcloud, WordPress, Wechat, Flipkart, AirBnb, Flipboard, LinkedIn, AngelList, Techcrunch, Google Glass, Y Combinator, Techstars, Geekdom, AWS, OpenStack, Azure, Kindle, Tesla, and a lot more.

  • He who controls the algorithm reaps the rewards. Kansas is now the 5th state where lottery prizes may have been fixed.

  • What Is The Power Grid? A stunning 60% of generated energy is lost before it can be consumed, which is why I like my power grids like my databases: distributed and shared nothing.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

How to choose an in-memory NoSQL solution: Performance measuring

Wed, 12/30/2015 - 17:56

The main purpose of this work is to show results of benchmarking some of the leading in-memory NoSQL databases with a tool named YCSB.

We selected three popular in-memory database management systems: Redis (standalone and in-cloud named Azure Redis Cache), Tarantool and CouchBase and one cache system Memcached. Memcached is not a database management system and does not have persistence. But we decided to take it, because it is also widely used as a fast storage system. Our “firing field” was a group of four virtual machines in Microsoft Azure Cloud. Virtual machines are located close to each other, meaning they are in one datacenter. This is necessary to reduce the impact of network overhead in latency measurements. Images of these VMs can be downloaded by links: one, two, three and four (login: nosql, password: qwerty). A pair of VMs named nosql-1 and nosql-2 is useful for benchmarking Tarantool and CouchBase and another pair of VMs named nosql-3 and nosql-4 is good for Redis, Azure Redis Cache and Memcached. Databases and tests are installed and configured on these images.

Our virtual machines were the basic A3 instances with 4 cores, 7 GB RAM and 120 GB disk size.

Databases and their configurations
Categories: Architecture

Using AWS Lambda functions to create print ready files

Mon, 12/28/2015 - 17:56

In a nutshell, Peecho is all about turning your digital content into professionally printed products. Although it might look like a simple task, a lot of stuff happens behind the scenes to make that possible. In this article, we’re going to tell you about our processing architecture as well as a recent performance improvement from the integration of AWS Lambda functions.

Print-ready files
In order to make digital content ready for printing facilities, there are some procedures that must occur after the order is received and before the final printing. In the printing industry this process is called pre-press, and the Peecho platform fully automates its initial stages before routing orders to printers.
Once the file has been created by the customer and uploaded to Peecho, it undergoes our processing stage. During processing, the file is checked to make sure it contains all the elements necessary for a successful print run: do the images have the proper format and resolution, are all the fonts included, are the RGB/CMYK colors set up appropriately, are all layout elements such as margins, crop marks and bleeds set up correctly, etc.
All these checks are automated by our backend systems. The entire process is quite complex and involves heavy computation that is expensive and time consuming. Let’s take a more detailed look at our processing architecture.
Processing Architecture
The processing stage starts right after an order is placed and payment has been confirmed. It’s initiated by the order intake server, which adds a message to an SQS processing queue with all information about the order and the file to be processed. Whenever there is a message available in the queue, a new processing machine (a large EC2 instance) starts working to transform the original data into a print-ready file.
At the core of the processing code we use open source libraries like iText as well as third party software for PDF and image encoding/conversion like PStill and ImageMagick. As the result of processing we generate PDF/X-3 files.

In earlier versions, when the Peecho platform first launched, all processing was executed by EC2 instances. For a single order it was done sequentially, page by page, as illustrated below.


Since we accept any kind of file, and usually really tough ones, the transformation process described above could take hours. On average, it would take 15 seconds per page. Since pages had to be processed sequentially, the processing time increased linearly with the number of pages. For example, a 400-page document would take around 1 hour and 40 minutes to process, which is a considerable amount of time for a single file.
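As a quick check of that arithmetic, here is a minimal Python sketch of the linear cost model. Only the 15 seconds per page and the 400-page example come from the article; the function and constant names are made up for illustration.

```python
from datetime import timedelta

# Average per-page processing time quoted in the article.
SECONDS_PER_PAGE = 15


def sequential_processing_time(pages: int, seconds_per_page: int = SECONDS_PER_PAGE) -> timedelta:
    """Total time when pages are processed one after another (linear in page count)."""
    return timedelta(seconds=pages * seconds_per_page)


if __name__ == "__main__":
    # 400 pages * 15 s = 6000 s, i.e. 1 hour and 40 minutes.
    print(sequential_processing_time(400))
```

Doubling the page count doubles the result, which is exactly the scaling problem the sequential design had.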
Recently, our development team has integrated the new AWS Lambda functions into the processing architecture and that has changed the story enormously.
AWS Lambda

Imagine you could simply define a piece of code that runs on a dedicated machine in the cloud, without worrying about provisioning, managing and scaling the servers that run it. That’s exactly what AWS Lambda is: a compute service where you can define functions that respond to events, such as changes to data in Amazon S3.
In the new processing architecture, we took the existing processing code and converted it into an AWS Lambda function that performs all file transformations on a single page of a document. The new function is written in Node.js and is triggered by S3 file uploads.
After the processing starts, the original document is split into separate pages, and each page is uploaded to S3; as each page's upload completes, a new Lambda instance is launched and starts crunching that page's data.

By doing that, we are now able to run a separate processing instance for each page in parallel. It means that for a 400-page document we now launch 400 Lambda instances simultaneously and process the entire document in the same amount of time it would take to process a single page. Therefore, the processing time does not increase with the number of pages. As a result, we can process almost any document in the time we used to need for a single page!
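The fan-out idea can be sketched in a few lines of Python. This is not Peecho's code (their function is written in Node.js and triggered by S3 uploads); here a local thread pool stands in for the Lambda invocations, and `process_page`, `process_document` and `parallel_wall_clock_seconds` are hypothetical names used only to illustrate the pattern.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_page(page_number: int) -> str:
    """Hypothetical stand-in for the per-page transformation; returns an output name."""
    return f"page-{page_number:04d}.pdf"


def process_document(page_count: int, max_workers: int = 32) -> List[str]:
    """Fan out one task per page; results come back in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_page, range(1, page_count + 1)))


def parallel_wall_clock_seconds(pages: int, concurrency: int, seconds_per_page: int = 15) -> int:
    """Idealized wall-clock time: pages are handled in waves of `concurrency`."""
    return math.ceil(pages / concurrency) * seconds_per_page


if __name__ == "__main__":
    print(process_document(3))
    print(parallel_wall_clock_seconds(400, concurrency=1))    # sequential case
    print(parallel_wall_clock_seconds(400, concurrency=400))  # one worker per page
```

The model ignores splitting, upload and start-up overhead, so treat it as an upper bound on the benefit: with one worker per page, the wall-clock time collapses to the cost of a single page.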
Although AWS Lambda is a great and powerful service, it has some limitations regarding execution time, disk space and memory. For instance, we are not able to use Lambda to process files larger than 500MB. Since we still have to process these big guys, the Peecho platform falls back to the previous mechanism whenever we need to handle corner cases like that.
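A fallback rule like this boils down to a small routing decision. The sketch below is an assumption about how such a check could look, not Peecho's implementation: the pipeline labels are invented, and the 500MB figure from the article is interpreted here as 500 * 1024 * 1024 bytes.

```python
# Size threshold mentioned in the article; the exact byte interpretation is an assumption.
LAMBDA_MAX_FILE_BYTES = 500 * 1024 * 1024

# Hypothetical labels for the two processing paths.
LAMBDA_PIPELINE = "lambda-per-page"
EC2_PIPELINE = "ec2-sequential"


def choose_pipeline(file_size_bytes: int) -> str:
    """Route oversized files to the older EC2 path, everything else to Lambda."""
    if file_size_bytes > LAMBDA_MAX_FILE_BYTES:
        return EC2_PIPELINE
    return LAMBDA_PIPELINE


if __name__ == "__main__":
    print(choose_pipeline(20 * 1024 * 1024))   # small file
    print(choose_pipeline(750 * 1024 * 1024))  # oversized file
```

Keeping the old path alive as the fallback means the fast path can stay simple, since it never has to cope with the corner cases.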
More on Lambda

Besides document processing, Peecho also uses AWS Lambda functions for some other cool features, like generating thumbnails for publication covers and content previews. For that, Lambda functions are triggered right after a publication is uploaded, so image thumbnails are instantly available in our dashboard, website and checkout pages.
Our development team is obsessed with making things simpler and faster. We are continuously seeking new possibilities for improving performance across Peecho applications. When it comes to that, AWS Lambda is a great fit, and we are definitely going to explore it more and more in future releases.
Categories: Architecture

Stuff The Internet Says On Scalability For December 18th, 2015

Fri, 12/18/2015 - 17:52

Hey, it's HighScalability time:


In honor of a certain event what could be better than a double-bladed lightsaber slicing through clouds? (ESA/Hubble & NASA)

 

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 66,000 yottayears: lifetime of an electron; 3 Gbps: potential throughput for backhaul microwave networks; 1.2 trillion: yearly Google searches; $100 trillion: global investible capital; 2.5cm: range of chip powered by radio waves; 

  • Quotable Quotes:
    • @KarenMN: He's making a database / He's sorting it twice / SELECT * from contacts WHERE behavior = 'nice' / SQL Clause is coming to town
    • abrkn: Every program attempts to expand until it has an app store. Those programs which cannot so expand are replaced by ones which can.
    • Amin Vahdat: Some recent external measurements indicate that our [Google] backbone carries the equivalent of 10 percent of all the traffic on the global Internet. The rate at which that volume is growing is faster than for the Internet as a whole.
    • Prismatic:  we also learned content distribution is a tough business and we’ve failed to grow at a rate that justifies continuing to support our Prismatic News products.
    • On General Pershing: Pershing was the way he was because he knew that winning wars was in the details. Troops who paid attention to the small things would master the big things. 
    • jbob2000: Wow! A single developer working on small websites doesn't need MVC? What a revelation! I bet he doesn't have any pesky problems, such as; working in large teams, long term support, developer turn over, documentation, changing requirements, deadlines, scaling, etc. etc. Oh, but the rendered HTML looks nice!
    • Poldrack: That was totally unexpected, but it shows that being caffeinated radically changes the connectivity of your brain
    • @ValaAfshar: Uber is less than 6 years old and now valued more than 80% of S&P 500 companies.
    • @HNTitles: Scaling Pinterest - From 0 to Startup: How We Use That. What startups use to prevent concussions
    • @Carnage4Life: Top 5 qualities of successful teams at Google 1 Failure is OK 2 Dependability  3 Clear structure 4 Meaning 5 Impact
    • Ustun Ozgur: The tides have changed there too. Now, you need just two endpoints: One for serving the initial HTML, one for the API endpoints. This is the essence of web programming in the future: Two endpoints to rule them all.
    • @ErlangerNick: US: 1 brewery per 78k people, 10 new breweries per week. UK: 1 brewery per 50k people, 15 new breweries per week when scaling populations.
    • jerf: When rewriting something, you should generally strive for a drop-in replacement that does the same thing, in some cases, even matching bug-for-bug
    • @EricMinick: "We found that where code deployments are most painful, you’ll find the poorest IT performance... and culture" - 2015 Puppet State of DevOps
    • @StartupLJackson: I'm going on the record to say the killer app for Bitcoin is not turning $1 of electricity into $.50 of BTC. 
    • @nntaleb: Paris blokes missed the point that it is not just temp rising, but its volatility rising more than the average! 2nd order effect=fragility
    • Julian Dunn: Unfortunately, I believe that the “large attack surface” is a fundamental design problem with containers being an evolutionary, not a revolutionary step from VMs and bare metal.
    • The Shade Tree Developer: sharing a database is like drug abusers sharing needles.
    • Joe Young: Keurig coffee machines are the bane of my trade. They are not built to last, some rarely make it a year in our business. They have no replaceable parts, so I can not fix them.
    • wh-uws: This is why slack is winning. They took many of the concepts of what makes irc great and put a much better user experience on top. Why is that so hard for people to understand?
    • @chamath: New VC dynamics: Returns being generated by new firms. Legacy firms increasingly dated and out of touch. 

  • The Talk Show interviewed Apple senior vice president of software engineering Craig Federighi about Swift. The upshot wasn't anything technical, it was a feeling: If you were worried that Apple is going to dangle Swift, get you pot committed, and then pull it out from under you, that seems highly unlikely. It's clear from the interview Apple is using Swift, they are excited about Swift, and it's here to stay. Plan accordingly. John Siracusa is dead on in his discussion of garbage collection. Swift is using ARC instead of garbage collection, which is a bet on determinism winning over virtual machine based language approaches, which is a good bet IMHO, even in the age of more powerful mobile processors.

  • Elon Musk’s Billion-Dollar AI Plan Is About Far More Than Saving the World. Those AIs are so clever. How do you distribute AIs as deep and wide into society as possible? You make it free and open! That's how the AIs are going to take over, riding the open source meme to victory. 

  • It's odd how in software we try to reduce coupling at all costs, yet in biology every opportunity to communicate and create feedback loops is exploited. Maybe it's we who are doing it wrong? Cells send tiny parcels to each other: cells package various molecules into tiny bubble-like parcels called extracellular vesicles to send important messages - in sickness and health.

  • Now that's disaster planning! Elon Musk worries third World War would ruin Mars mission.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

How Does the Use of Docker Affect Latency?

Wed, 12/16/2015 - 17:56

A great question came up on the mechanical-sympathy list that many others probably have as well: 

I keep hearing about [Docker] as if it is the greatest thing since sliced bread, but I've heard anecdotal evidence that low latency apps take a hit. 

Who better to answer than Gil Tene, Vice President of Technology and CTO, Co-Founder, of Azul Systems? Like Stephen Curry draining a deep transition three, Gil can always be counted on for his insight:

And here's Gil's answer:

Putting aside questions of taste and style, and focusing on the effects on latency (the original question), the analysis from a pure mechanical point of view is pretty simple: Docker uses Linux containers as a means of execution, with no OS virtualization layer for CPU and memory, and with optional (even if default is on) virtualization layers for i/o. 

CPU and Memory

From a latency point of view, Docker's (and any other Linux container's) CPU and memory latency characteristics are pretty much indistinguishable from Linux itself. But the same things that apply to latency behavior in Linux apply to Docker.

If you want clean & consistent low latency, you'll have to do the same things you need to do on non-dockerized and non-containerized Linux for the same levels of consistency. E.g. if you needed to keep the system as a whole under control (no hungry neighbors), you'll have to do that at the host level for Docker as well.

If you needed to isolate sockets or cores and choose which processes end up where, expect to do the same for your docker containers and/or the threads within them.

If you were numactl'ing or doing any sort of directed numa-driven memory allocation, the same will apply.

And some of the stuff you'll need to do may seem counter-style to how some people want to deploy docker, but if you are really interested in consistent low latency, you'll probably need to break out the toolbox and use the various cgroups, tasksets and other cool stuff to assert control over how things are laid out. But if/when you do, you won't be able to tell the difference (in terms of CPU and memory latency behaviors) between a dockeriz'ed process and one that isn't.

I/O

Disk I/O

I/O behavior under various configurations is where most of the latency overhead questions (and answers) usually end up. I don't know enough about disk i/o behaviors and options in docker to talk about it much. I'm pretty sure the answer to anything throughput and latency sensitive for storage will be "bypass the virtualization and volumes stuff, and provide direct device access to disks and mount points".

Networking

The networking situation is pretty clear: If you want one of those "land anywhere and NAT/bridge with some auto-generated networking stuff" deployments, you'll probably pay dearly for that behavior in terms of network latency and throughput (compared to bare metal dedicated NICs on normal linux). However, there are options for deploying docker containers (again, may be different from how some people would like to deploy things) that provide either low-overhead or essentially zero-latency-overhead network links for docker. Start with host networking and/or use dedicated IP addresses and NICs, and you'll do much better than the bridged defaults. But you can go to things like Solarflare's NICs (which tend to be common in bare metal low latency environments already), and even do kernel bypass, dedicated spinning-core network stack things that will have a latency behavior no different (on Docker) than if you did the same on bare metal Linux.

 

Docker (which is "userland as a unit") is not about packing lots of things into a box. Neither is guest-OS-as-a-unit virtualization. Sure, they can both be used for that (and often are), but the biggest benefit they both give is the ability to ship around a consistent, well captured configuration. And the ability to develop, test, and deploy that exact same configuration. This later turns into being able to easily manage deployment and versioning (including roll backs), and being able to do cool things like elastic sizing, etc. There are configuration tools (puppet/chef/...) that can be used to achieve similar results on bare metal as well, of course (assuming they truly control everything in your image), but the ability to pack up your working stuff as a bunch of bits that can "just be turned on" is very appealing.

I know people who use virtualization even with a single guest-per-host (e.g. an AWS r3.8xlarge instance type is probably that right now). And people who use docker the same way (single container per host). In both cases, it's about configuration control and how things get deployed, and not at all about packing things in a smaller footprint.

The low latency thing then becomes a "does it hurt?" question. And Docker hurts a lot less than hypervisor or KVM based virtualization does when it comes to low latency, and with the right choices for I/O (dedicated NICs, cores, and devices), it becomes truly invisible.
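To make the advice in Gil's answer a little more concrete, here is a hedged Python sketch that assembles a `docker run` command line with pinned cores, a pinned NUMA memory node and host networking. The flags `--cpuset-cpus`, `--cpuset-mems` and `--network host` are standard `docker run` options; the image name, core numbers, NUMA node and the helper function itself are assumptions for illustration, and the right values depend entirely on your hardware topology.

```python
import shlex
from typing import List, Sequence


def low_latency_docker_run(image: str, cpus: Sequence[int], numa_node: int) -> List[str]:
    """Build a `docker run` argv that pins CPUs and memory and skips bridged networking."""
    cpu_list = ",".join(str(cpu) for cpu in cpus)
    return [
        "docker", "run", "--rm",
        f"--cpuset-cpus={cpu_list}",   # restrict the container to specific cores
        f"--cpuset-mems={numa_node}",  # allocate memory from one NUMA node only
        "--network", "host",           # use the host network stack, no NAT/bridge hop
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image name and core layout.
    argv = low_latency_docker_run("example/low-latency-app:latest", cpus=[2, 3], numa_node=0)
    print(shlex.join(argv))
```

This only covers the container-level knobs. Keeping other workloads off those cores still has to happen at the host level, as the answer points out.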

On HackerNews

Categories: Architecture

Does AMP Counter an Existential Threat to Google?

Mon, 12/14/2015 - 18:06

When AMP (Accelerated Mobile Pages) was first announced it was right in line with Google’s long-standing project to make the web faster. Nothing seemingly out of the ordinary.

Then I listened to a great interview on This Week in Google with Richard Gingras, Head of News at Google, that made it clear AMP is more than just another forward looking initiative from Google. Much more.

What is AMP? AMP is two things. AMP is a restricted subset of HTML designed to make the web fast on mobile devices. AMP is also a strategy to counter an existential threat to Google: the mobile web is in trouble and if the mobile web is in trouble then Google is in trouble.

In the interview Richard says (approximately):

The alternative [to a strong vibrant community around AMP] is devastating. We don’t want to see a decline in the viability of the mobile web. We don’t want to see poor experiences on the mobile web propel users into proprietary platforms.

This point, or something very like it, is repeated many times during the interview. With ad blocker usage on the rise there’s a palpable sense of urgency to do something. So Google stepped up and took leadership in creating AMP when no one else was doing anything that aligned with the principles of the free and open web.

The irony for Google is that advertising helped break the web. We have fouled our own nest.

Why now? Web pages are routinely between 2MB and 10 MB in size for only 80K worth of content. The blimpification of web pages comes from two general sources: beautification and advertising. Lots of code and media are used to make the experience of content more compelling. Lots of code and media are used in advertising.
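To put those sizes in perspective, a small Python sketch can compute how much of a page's weight is actual content. The 80K, 2MB and 10MB figures come from the text above; the function name and the choice of 1 MB = 1024 KB are assumptions for illustration.

```python
def content_share(content_kb: float, page_mb: float) -> float:
    """Fraction of the page weight that is actual content (assuming 1 MB = 1024 KB)."""
    return content_kb / (page_mb * 1024)


if __name__ == "__main__":
    for page_mb in (2, 10):
        # 80 KB of content inside a 2 MB or 10 MB page.
        print(f"{page_mb} MB page: {content_share(80, page_mb):.1%} content")
```

In other words, on these pages roughly 96% to 99% of the bytes shipped to the phone are something other than the content the reader came for.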

The result: web pages have become very very slow. And a slow web is a dead web, especially in the parts of the world without fast or cheap mobile networks, which is much of the world. For many of these people the Internet consists of their social network, not the World Wide Web, and that’s not a good outcome for lots of people, including Google. So AMP wants to make people fall in love with the web again by speeding it up using a simple, cachable, and open format.

Does AMP work? Pinterest found AMP pages load four times faster and use eight times less data than traditional mobile-optimized pages. So, yes.

Is AMP being adopted? Seems like it. Some of those on board are: WordPress, Nuzzel, LinkedIn, Twitter, Fox News, The WSJ, The NYT, Huffington Post, BuzzFeed, The Washington Post, BBC, The Economist, FT, Vox Media, LINE, Viber, Tango, comScore, Chartbeat, Google Analytics, Parse.ly, Network18, and many more. Content publishers clearly see value in the survival of the web. Developers like AMP too. There are over 4500 developers on the AMP GitHub project.

When will AMP start? Google will reportedly send traffic to AMP pages in Google Search starting in late February, 2016.

Will Google advantage AMP in search results? Not directly, says Google, but since faster sites rank better, AMP will implicitly rank higher compared to heavier weight content. We may have a two-tiered web: the fast AMP based web and the slow bloated traditional web. Non AMP pages can still be made fast of course, but all of human history argues against it.

The AMP talk featured a well balanced panel representing a wide variety of interests. Leo Laporte, famous host and founder of TWiT, represents the small content publisher. He views AMP with a generally positive yet skeptical eye. AMP is open source, but it is still controlled by Google, so is the web still the open web? Jeff Jarvis is a journalism professor and a long time innovative thinker on how journalism can stay alive in the modern era. Jeff helped inspire the idea of AMP and sees AMP as a way publishers can distribute content to users on whatever form of media users are consuming. Kevin Marks is as good a representative for the free and open web as you could ask for. Matt Cutts, as a very early employee at Google, is of course pro Google, but he also represents an engineering perspective. Richard Gingras is the driving force behind AMP at Google. He’s also a compelling evangelist for AMP and the need for a true new Web 2.0.

Here’s a gloss of the discussion. I’m not attributing who said what, just the outstanding points that help reveal AMP’s vision for the future of the open web:

Origin Story
Categories: Architecture

Stuff The Internet Says On Scalability For December 11th, 2015

Fri, 12/11/2015 - 18:11

Hey, it's HighScalability time:


Cheesy Star Trek graphics? Nope. It's hot gas streaming into Pandora’s Cluster.

 


  • 100 million: John Henry as played by a conventional computer loses to a quantum computer; 400,000: cores in PayPal's OpenStack deployment; 10TB: max size of Google Cloud SQL database; 9%: Kickstarter projects that don't deliver; $2.3 trillion: worth of The Forbes 400 members; billions: worth of Spanish treasure ship;

  • Quotable Quotes:
    • Pandalicious: I actually expect that down the road most large open source projects will start distributing a standardized build environment via docker containers. 
    • @glasnt: "Optimise for speed flexibility & evolution" "Whoever is iterating faster has a huge advantage" - @adrianco #yow15 
    • @erikbryn: LIDAR goes from $75K to $500, leaves Moore's Law in the Dust
    • Henry Miller: One has to believe wholeheartedly in what one is doing, realize that it is the best one can do at the moment—forego perfection now and always!—and accept the consequences which giving birth entails.
    • @jedws: "uber is way more reliable on Saturday and Sunday because there are no engineers working on the system" #yow15
    • @samkottle: "Waffles are like kubernetes on a dish" -@rbranson
    • @brian_klaas: No server is easier to manage than no server, but are we moving all the complexity to the front-end?
    • @Carnage4Life: Death of #unbundling part 2: Facebook shutting down lab which shipped side apps like Hello, Rooms & Slingshot 
    • @carlosfairgray: Efforts to drive uncertainty out of development have only driven innovation out of development. #yow15 @DReinertsen 
    • : “Let’s legislate secure cryptographic backdoors” is the 21st century’s “let’s pass a law to make π = 3”
    • @jessitron: To call an API, or just grab it from the database? Don't tap into another team at the spine. Talk to their faces.
    • Brian Chesky: One of the keys to get to scale, is to do things that don’t scale. One other important lesson within this lesson is — 100 customers who love you > 1,000,000 users.
    • IbanezDavy: The areas of where we expect quantum computers to be faster are roughly known. There are cases where classical computers will still perform better than a quantum computer. But D Wave has been criticized of not truly having a quantum computer, so I think they are motivated in just demonstrating that they do indeed have one.
    • @tiagogriffo: "We developed the product so fast that marketing had not time to change the requirements" said a PM. From @DReinertsen talk at #yow15
    • @xaprb: push 10,000 metrics/sec at 1-sec resolution for 1000 servers for a year and see if it scales forever ;-)

  • Apple has open sourced Swift for reals, not just a code dump months too late to be of use. Swift is on GitHub, you can look at the code, see the entire version history from the very first check-in, see what's changing, contribute, file bugs, etc. So it's a real open source project. Apple is even porting key frameworks like their Foundation libraries over to Swift. If you are looking for the one language to rule them all, that can run fast enough on the server, be used for web apps, and run on mobile, Swift is making the case for being that language, which is no doubt what Apple also wants it for. Incentives align. Expect developers to quickly fill out the toolchain. How does Swift compare? Go vs Node vs Rust vs Swift. Swift is fast, but lacks language primitives for parallelism. 

  • Ruby can be much faster. 25,000+ Req/s for Rack JSON API with MRuby: MRuby is a minimal version of Ruby that can be embedded in any system that supports C...There is a new HTTP web server called H2O, which is really, really fast...When H2O is compiled, it embeds a MRuby interpreter that can be used to run Ruby code. The result: an astonishing 28,000+ requests per second.

  • Fox guarding the chickens. U.S. states pass laws backing Uber’s view of drivers as contractors.

  • In the same way there's always a tradeoff between ASIC and white box solutions, there's also an ebb and flow between domain specific languages and general purpose languages. Google replaced Sawzall, a DSL for performing powerful, scalable analysis, with a software ecosystem built around Go. Replacing Sawzall — a case study in domain-specific language migration. The result: we’ve found that with carefully designed libraries we can get most of the benefits of Sawzall in Go while gaining the advantages of a powerful general-purpose language. The overall response of analysts to these changes has been extremely positive. Today, logs analysis is one of the most intensive users of Go at Google, and Go is the most-used language for reading logs through the logs proxy.

  • There's a new data mining Barbie. The new talking Hello Barbie doll has the mind of Siri: "Equipped with Siri-like voice-recognition software and a wi-fi connection, Hello Barbie can respond to questions from kids about everything from her favorite color to career goals." Unfortunately I can't take credit for the data mining comment, I heard it on TWiT

  • If you have 70 data caching stations around the world connected with fast links and you are already expert at caching your own content, starting your own CDN makes a lot of sense. So that's what Google did. Cloud CDN. Interestingly, Google may be trying to turn these caching stations into datacenters, so says Google's Secret Plan to Catch Up to Amazon and Microsoft in Cloud. If you could use Kubernetes to place work on the edge and combine that with some kind of multi-datacenter database, you would have yourself very low latency access to a lot of mobile devices.


Categories: Architecture

Free Red Book: Readings in Database Systems, 5th Edition

Wed, 12/09/2015 - 17:56

For the first time in ten years there has been an update to the classic Red Book, Readings in Database Systems, which offers "readers an opinionated take on both classic and cutting-edge research in the field of data management."

Editors Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker curated the papers and wrote pithy introductions. Unfortunately, links to the papers are not included, but a kindly wizard, Nindalf, gathered all the referenced papers together and put them in one place.

What's in it?

  • Preface 
  • Background introduced by Michael Stonebraker 
  • Traditional RDBMS Systems introduced by Michael Stonebraker 
  • Techniques Everyone Should Know introduced by Peter Bailis 
  • New DBMS Architectures introduced by Michael Stonebraker
  • Large-Scale Dataflow Engines introduced by Peter Bailis 
  • Weak Isolation and Distribution introduced by Peter Bailis 
  • Query Optimization introduced by Joe Hellerstein 
  • Interactive Analytics introduced by Joe Hellerstein 
  • Languages introduced by Joe Hellerstein 
  • Web Data introduced by Peter Bailis 
  • A Biased Take on a Moving Target: Complex Analytics by Michael Stonebraker 
  • A Biased Take on a Moving Target: Data Integration by Michael Stonebraker

 

Categories: Architecture

Sponsored Post: StatusPage.io, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 12/08/2015 - 17:56

Who's Hiring?
  • Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

The Serverless Start-up - Down with Servers!

Mon, 12/07/2015 - 17:56


This is a guest post by Marcel Panse and Sander Nagtegaal from Teletext.io.

In our early Peecho days, we wrote an article explaining how to build a really scalable architecture for next to nothing, using Amazon Web Services. Auto-scaling, merciless decoupling and even automated bidding on unused server capacity were the tricks we used back then to operate on a shoestring. Now, it is time to take it one step further.

We would like to introduce Teletext.io, also known as the serverless start-up - again, entirely built around AWS, but leveraging only the Amazon API Gateway, Lambda functions, DynamoDB, S3 and CloudFront.

The Virtues of Constraint

We like rules. At our previous start-up Peecho, product owners had to do fifty push-ups as payment for each user story that they wanted to add to an ongoing sprint. Now, at our current company myTomorrows, our developer dance-offs are legendary: during the daily stand-ups, you are only allowed to speak while dancing - leading to the most efficient meetings ever.

This way of thinking goes all the way into our product development. It may seem counter-intuitive at first, but constraints fuel creativity. For example, all our logo design is done with technical diagramming tool Omnigraffle, so there is no way we could use hideous lens flares and such. Anyway - recently, we launched yet another initiative called Teletext.io. So, we needed a new restriction.

At Teletext.io, we are not allowed to use servers. Not even one.

It was a good choice. We will explain why.

Why Servers are Bad
Categories: Architecture

Stuff The Internet Says On Scalability For December 4th, 2015

Fri, 12/04/2015 - 17:56

Hey, it's HighScalability time:


Change: Elliott $800,000 in 1960, 8K RAM, 2kHz CPU vs Raspberry Pi Zero, $5, 1GHz, 512MB

 

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.

  • 434,000: square feet in Facebook's new office; $62.5 billion: Uber's valuation; 11: DigitalOcean datacenters; $4.45 billion: Black Friday online sales; 2MPH: speed news traveled in 1500; 95: percent of world covered by mobile broadband; 86%: items Amazon delivers that weigh less than five pounds.

  • Quotable Quotes:
    • Jeremy Hsu: Is anybody thinking about how we’ll have to code differently to accommodate the jump from a 1-exaflop supercomputer to 10 exaflops? There is not enough attention being paid to this issue.
    • @kml: “Process drives away talent” - @adrianco at #yow15
    • capkutay: Seems like a lot of the momentum behind containers is driven by the Silicon Valley investment community.
    • @taotetek: IoT is turning homes into datacenters with no system administrators and no security team.
    • @asymco: On Thursday and early Friday, mobile traffic accounted for nearly 60% of all online shopping traffic, and 40% of all online sales
    • Mobile App Developers are Suffering: It’s just too saturated. The barriers to adoption and therefore monetization are too high. It’s easier on the web.
    • Taleb: It is foolish to separate risk taking from the risk management of ruin.
    • Maxime Chevalier-Boisvert:  I believe dynamic languages are here to stay. They can be very nimble, in ways that statically typed languages might never be able to match. We’re at a point in time where static typing dominates mainstream thought in the programming world, but that doesn’t mean dynamic languages are dead.
    • @__edorian: "Can i have a static linked binary?" - "No that's stupid, it's slower and takes more space!" - "Can i have a docker image?" - "Sure!"
    • @grzegorz_dyk: When I see people talking about fine grained #microservices I am thinking: why not use actors? #akka #erlang
    • Henry Miller: When you can’t create you can work.
    • @ValaAfshar: For the first time ever, online media consumption is bigger than TV consumption. 
    • @matthewfellows: I learned today that Airbus code is reviewed by hand... in raw assembly code #yow15 @dius_au
    • Rich Hickey: Programmers know the benefits of everything and the tradeoffs of nothing
    • Robin Harris: Cheap storage is changing the world. Whether it is in the cloud, on a dash cam, or embedded in an app, cheap – as in inexpensive – storage is enabling new relationships between individuals, and with culture, power, and groups.
    • @sustrik: libmill shows 1400x performance improvement in c10k scenarios. Wow! I love low-hanging fruit.
    • @jmckenty: At Scale: Bigger than what you’ve got now.
    • John Cage: My notion of how to proceed in a society to bring change is not to protest the thing that is evil, but rather to let it die its own death.
    • @b6n: preemptively blog about how you scaled to support the million users you don't have yet.
    • @joeweinman: When will the FCC start addressing app neutrality?
    • @ufried: i have this post about data scalability always open in a tab, just to remind me of some essentials once in a while 

  • Personalization is getting more personal and more useful. Personalized Nutrition: Healthy foods are unique to individuals: Israeli research teams have demonstrated that there exists a high degree of variability in the responses of different individuals to identical meals...Using their set of amassed data, the researchers then went a step further, applying machine-learning algorithm to their cohort of 800 participants and developing an algorithm capable of predicting individualized PPGRs (postprandial (post-meal) glycemic responses). This intricate algorithm incorporates 137 features representing meal content, daily activity, blood parameters, CGM-derived features, questionnaires, and microbiome features.

  • Now that's putting concertina wire on the walled garden fence. WhatsApp is blocking links to a competing messenger app.

  • As programming is a creative act, perhaps the ultimate creative act, this advice applies to programmers too. Ira Glass: Nobody tells this to people who are beginners, I wish someone told me. All of us who do creative work, we get into it because we have good taste. But there is this gap. For the first couple years you make stuff, it’s just not that good. It’s trying to be good, it has potential, but it’s not. But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you. A lot of people never get past this phase, they quit. Most people I know who do interesting, creative work went through years of this. We know our work doesn’t have this special thing that we want it to have. We all go through this. And if you are just starting out or you are still in this phase, you gotta know it’s normal and the most important thing you can do is do a lot of work. Put yourself on a deadline so that every week you will finish one story. It is only by going through a volume of work that you will close that gap, and your work will be as good as your ambitions. And I took longer to figure out how to do this than anyone I’ve ever met. It’s gonna take awhile. It’s normal to take awhile. You’ve just gotta fight your way through.

  • So you want a revolution, what will be the cost? It’s a Trap: Emperor Palpatine’s Poison Pill: In this case study we found that the Rebel Alliance would need to prepare a bailout of at least 15%, and likely at least 20%, of GGP in order to mitigate the systemic risks and the sudden and catastrophic economic collapse. Without such funds at the ready, it is likely the Galactic economy would enter an economic depression of astronomical proportions.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Deep Lessons from Google and eBay on Building Ecosystems of Microservices

Tue, 12/01/2015 - 17:56

When you look at large scale systems from Google, Twitter, eBay, and Amazon, their architecture has evolved into something similar: a set of polyglot microservices.

What does it look like when you are in the polyglot microservices end state? Randy Shoup, who worked in high level positions at both Google and eBay, has a very interesting talk exploring just that idea: Service Architectures at Scale: Lessons from Google and eBay.

What I really like about Randy's talk is how he is self-consciously trying to immerse you in the experience of something you probably have no experience of: creating, using, perpetuating, and protecting a large scale architecture.

In the Ecosystem of Services section of the talk Randy asks: What does it look like to have a large scale ecosystem of polyglot microservices? In the Operating Services at Scale section he asks: As a service provider what does it feel like to operate such a service? In the Building a Service section he asks: When you are a service owner what does it look like? And in the Service Anti-Patterns section he asks: What can go wrong?

A very powerful approach.

The highlight of the talk for me was the idea of aligning incentives, a consistent theme that crosscuts the entire endeavour. While never explicitly pulled out as a separate strategy, it's the motivation behind why you want small teams to develop small clean services, why a charge back model for internal services is so powerful, how architecture can evolve without an architect, how clean design can evolve from a bottom up process, and how standards can evolve without a central committee.

My takeaway is that the deliberate aligning of incentives is how you scale both a large, dynamic organization and a large, dynamic code base. Putting in the right incentives nudges things into happening without explicit control, almost in the same way more work in a distributed system gets done when you remove locks, don't share state, communicate with messages, and parallelize everything.

Let's see how large scale systems are built in the modern era...

Polyglot Microservices are the End Game
Categories: Architecture

Stuff The Internet Says On Scalability For November 27th, 2015

Fri, 11/27/2015 - 17:56

Hey, it's HighScalability time:


The most detailed picture of the Internet ever as compiled by an illegal 420,000-node botnet.
  • $40 billion: P2P lending in China; 20%: amount of all US margin expansion accounted for by Apple since 2010; 11: years of Saturn photos; 117: number of different steering wheels offered for a VW Golf; 1Gbps: speed of a network using a lightbulb.

  • Quotable Quotes:
    • @jaksprats: If we could compile a subset of JavaScript to Lua, JS could run on Server (Node.js), Browser, Desktop, iOS, & Android. JS could run EVERYWHERE
    • @wilkieii: Tech: "Don't roll your own crypto if you aren't an expert" *replaces nutrition with Soylent, currency with bitcoin* *puts wifi in lightbulb*
    • @brianpeddle: The architecture of one human brain would require a zettabyte of capacity. Full simulation of a human brain by 2023.
    • MarshalBanana: That can still easily be the right choice. Complex algorithms trade asymptotic performance for setup cost and maintenance cost. Sometimes the tradeoff isn't worth it.
    • kevindeasi: There are so many things to know nowadays. Backend: Sql, NoSql, NewSql, etc. Middleware: Django, NodeJs, Spring, Groovy, RoR, Symfony, etc. Client: Angular, Ember, React, Jquery, etc. I haven't even mentioned hardware, security, servers/cloud, and api. Now you also need to know about theory, UI/UX, git, deploying servers, HTTP, scrum, software development process, testing.
    • Brian Chesky~ It was better to have 100 people who loved us vs. 1M people who liked us. All movements grow this way.
    • idlewords: All the advantages of a dedicated server without the hassle of saving tons of money.
    • jorangreef: Well, how would you handle massive traffic spikes? Through a combination of vertical and horizontal scaling? Through having excess capacity? Except that I would probably want to start with something fast and inexpensive to begin with.
    • @jaykreps: "The bigger the interface, the weaker the abstraction"--@rob_pike
    • Animats: That still irks me. The real problem is not tinygram prevention. It's ACK delays, and that stupid fixed timer. They both went into TCP around the same time, but independently. I did tinygram prevention (the Nagle algorithm) and Berkeley did delayed ACKs, both in the early 1980s. The combination of the two is awful.
    • @jaykreps: Distributed computing is the new normal: Mesos, K8s = dist'd processes; Cassandra, Kafka, etc = dist'd data; microservices = dist'd apps.
    • @bradfitz: OH: "Well you can add nodes to the cluster. They made that work well, but you can't remove them. It's the Hotel California of auto-scaling."

  • Creating Your Own EC2 Spot Market -- Part 2. Video encoding represents 70% of Netflix's computing needs. And Netflix has a daily peak of 12,000 unused instances. So they created their own spot market to improve encoding throughput by the equivalent of a 210% increase in encoding capacity. Using their update real-time approach they were able to perform an encoding job in 18 hours that they expected to take a few days. Great article with a lot of deep thinking on the topic.

  • Amen! We should come up with a catchy name for RAII so more languages support it because RAII is awesome and simplifies code!

  • Google as a cloud company instead of an ad company? It could happen: Google's Holzle Envisions Cloud Business Eclipsing Ads in 2020. Google announced Custom Machine Types so you can configure the number of virtual CPUs and the amount of RAM you want for your machine. I imagine this nifty feature is enabled by Google's advanced datacenter scheduling software, but it will take more than that to beat AWS and Azure. To take market share Google may need to instigate a price war. Though it looks like Google might make a lot of money charging back to Google.

  • Good explanation of what serverless computing is by Leonardo Federico: the phrase “serverless” doesn’t mean servers are no longer involved. It simply means that developers no longer have to think "that much" about them. Computing resources get used as services without having to manage around physical capacities or limits. Let's take for example AWS Lambda. "Lambda allows you to NOT think about servers. Which means you no longer have to deal with over/under capacity, deployments, scaling and fault tolerance, OS or language updates, metrics, and logging."
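
To make the Lambda description above a little more concrete, here is a minimal sketch of what a function deployed to AWS Lambda might look like in Python. It is an illustration, not code from the linked article: the function name `handler`, the greeting logic, and the assumption that the function sits behind an API Gateway proxy integration (which passes query parameters in `event["queryStringParameters"]`) are all ours. The point is that the code contains only request handling; there is nothing about processes, ports, or capacity.

```python
import json


def handler(event, context):
    """Entry point the Lambda runtime would invoke once per request.

    Assumes an API Gateway proxy-style event; the greeting is purely illustrative.
    """
    # Query parameters may be absent or None when the request has none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # A proxy integration expects a status code, headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test; on Lambda the platform supplies event and context.
    print(handler({"queryStringParameters": {"name": "HighScalability"}}, None))
```

Because the handler is a plain function, it can be exercised locally with a hand-built event, as the last lines show, before anything is deployed.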

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Sponsored Post: StatusPage.io, iStreamPlanet, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Tue, 11/24/2015 - 17:56

Who's Hiring?
  • Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • As a Networking & Systems Software Engineer at iStreamPlanet you’ll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

  • As a Scalable Storage Software Engineer at iStreamPlanet you’ll be driving the design and implementation of numerous storage systems including software services, analytics and video archival. Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a .NET native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture