Software Development Blogs: Programming, Software Testing, Agile, Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

High Scalability - Building bigger, faster, more reliable websites
Updated: 6 hours 42 min ago

The Always On Architecture - Moving Beyond Legacy Disaster Recovery

Wed, 08/24/2016 - 00:42
Failover does not cut it anymore. You need an ALWAYS ON architecture with multiple data centers. -- Martin Van Ryswyk, VP of Engineering at DataStax

Failover, switching to a redundant or standby system when a component fails, has a long and checkered history as a way of dealing with failure. The reason is your failover mechanism becomes a single point of failure that often fails just when it's needed most. Having worked on a few telecom systems that used a failover strategy, I know exactly how stressful failover events can be and how stupid you feel when your failover fails. If you have a double or triple fault in your system, failover is exactly the time when it will happen.

For a long time the only real trick we had for achieving fault tolerance was to have a hot, warm, or cold standby (disk, interface, card, server, router, generator, datacenter, etc.) and fail over to it when there's a problem. This old style of Disaster Recovery planning is no longer adequate or necessary.

Now, thanks to cloud infrastructures, at least at a software system level, we have an alternative: an always on architecture. Google calls this a natively multihomed architecture. You can distribute data across multiple datacenters in such a way that all your datacenters are always active. Each datacenter can automatically scale capacity up and down depending on what happens to other datacenters. You know, the usual sort of cloud propaganda. Robin Schumacher makes a good case here: Long live Dear CXO – When Will What Happened to Delta Happen to You?
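
To make the "scale capacity up and down depending on what happens to other datacenters" idea concrete, here is a minimal sketch. It assumes an even split of load across healthy datacenters plus a fixed headroom factor; the datacenter names, the `target_capacity` function, and the even-split policy are all hypothetical illustrations, not anything described by DataStax or Google.

```python
import math


def target_capacity(total_load_rps, datacenters, healthy, headroom=0.25):
    """Return the capacity (requests/sec) each datacenter should provision.

    Assumption: load is spread evenly over the healthy datacenters, and each
    one provisions `headroom` extra capacity on top of its share.
    """
    live = [dc for dc in datacenters if dc in healthy]
    if not live:
        raise RuntimeError("no healthy datacenters left to serve traffic")
    per_dc = total_load_rps / len(live) * (1 + headroom)
    # Unhealthy datacenters get a target of 0; the rest absorb their share.
    return {dc: (math.ceil(per_dc) if dc in healthy else 0) for dc in datacenters}


# Hypothetical datacenter names, used only for illustration.
DATACENTERS = ["us-east", "us-west", "eu-central"]

all_up = target_capacity(9000, DATACENTERS, set(DATACENTERS))
one_down = target_capacity(9000, DATACENTERS, {"us-east", "us-west"})
```

With all three datacenters active, each provisions its third of the load plus headroom. When one drops out, the remaining two scale up to cover it. There is no standby to switch to, only a change in how much work the already-active sites take on.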

Recent Problems With Disaster Recovery
Categories: Architecture

Stuff The Internet Says On Scalability For August 19th, 2016

Fri, 08/19/2016 - 16:56

Hey, it's HighScalability time:

 


Modern art? Nope. Pancreatic cancer revealed by fluorescent labeling.

 

If you like this sort of Stuff then please support me on Patreon.
  • 4: SpaceX rocket landings at sea; 32TB: 3D Vertical NAND Flash; 10x: compute power for deep learning as the best of today’s GPUs; 87%: of vehicles could go electric without any range problems; 06%: visitors that post comments on NPR; 235k: terrorism related Twitter accounts closed; 40%: AMD improvement in instructions per clock for Zen; 15%: apps are slower in summer because of humidity;

  • Quotable Quotes:
    • @netik: There is no Internet of Things. There are only many unpatched, vulnerable small computers on the Internet.
    • @Pinboard: The Programmers’ Credo: we do these things not because they are easy, but because we thought they were going to be easy
    • Aphyr: This advantage is not shared by sequential consistency, or its multi-object cousin, serializability. This much, I knew–but Herlihy & Wing go on to mention, almost offhand, that strict serializability is also nonlocal!
    • @PHP_CEO: I’VE HAD AN IDEA / WE’LL TAKE ALL THE BAD CODE / BUNDLE IT TOGETHER / AND SELL IT TO VCS AS A COLLATERALIZED TECHNICAL DEBT OBLIGATION
    • felixgallo: I agree, the actor model is a significantly more usable metaphor for containers than functions. When you start thinking about supervisor trees, you start heading towards Kubernetes, which is interesting.
    • David Rosenthal: So in practice blockchains are decentralized (not), anonymous (not and not), immutable (not), secure (not), fast (not) and cheap (not). What's (not) to like?
    • @grimmelm: You know, you can’t spell “idiotic” without “IoT”
    • @jroper: 10 years ago, backends were monolithic services and frontends many pages. Now frontends are monolithic pages and backends many services.
    • @jakevoytko: Ordinary human: Hey, this is a fork. You can eat with it! People who comment on programming blogs: You can't eat soup with that.
    • iLoch: Wow $5000/mo for 2000rps, just for the application servers? That's absurd. I think we're paying around $2000/mo for our app servers, a database which is over 2TB in size, and we ingest about 10 megabytes of text data per second, on top of a couple thousand requests per second to the user facing application.
    • @josh_wills: I'm thinking about writing a book on data engineering for kids: "An Immutable, Append-Only Log of Unfortunate Events"
    • Kill Process: What the world needs is not a new social network that concentrates power in a single place, but a design to intrinsically prevent the concentration of power that results in barriers to switching.
    • ljmasternoob: the bump was just Schrödinger's cat stepping on Occam's razor.
    • carsongross: The JVM is a treasure just sitting there waiting to be rediscovered.
    • @mjpt777: When @nitsanw points out some of what he finds in the JVM I often end up crying :(
    • @karpathy: I hoped TensorFlow would standardize our code but it's low level so we've diverged on layers over it: Slim, PrettyTensor, Keras, TFLearn ...
    • @rbranson:  coordination is a scaling bottleneck in teams as much as it is in distributed systems.
    • @mathiasverraes: There are only two hard problems in distributed systems:  2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery
    • @PhilDarnowsky: I've been using dynamically typed languages for a living for a decade. As a result, I prefer statically typed languages.
    • Allyn Malventano: 64-Layer is Samsung's 4th generation of V-NAND. We've seen 48-Layer and 32-Layer, but few know that 24-Layer was a thing (but was mainly in limited enterprise parts).
    • @cmeik: "It's a bit odd to me that programming languages today only give you the ability to write something that runs on one machine..." [1/2]
    • @trengriffin: @amcafee Use of higher radio frequencies will require a lot more antennas creating ever smaller coverage areas. More heterogeneous bandwidth
    • @jamesurquhart: Disagree IaaS multicloud tools will play major role moving forward. Game is in PaaS and app deployment (containers).

  • Linking it all together on a great episode of This Week In Tech. Google’s new OS, Fuchsia, for places where Android fears to tread, smaller, lower power IoT type devices. Intel Optane, an almost shipping non-volatile memory that is 1000X faster than SSD (maybe not), has up to 10X the capacity of DRAM, while only being a few X slower than typical DRAM, is perfect for converged IoT devices. Say goodbye to blocks and memory tiers. IoT devices don't have to be fast, so DRAM can be replaced with this new memory, hopefully making simpler cheaper devices that can last a decade on a small battery, especially when combined with low power ARM CPUs. NVMe is replacing SATA and AHCI for higher bandwidth, lower latency access to non-volatile memory. 5G, when it comes out, will specifically support billions of low power IoT devices. Machine learning ties everything together. That future that is full of sensors may actually happen. As Greg Ferro said~ We are starting to see the convergence of multiple advances. You can start to plot a pathway forward to see where the disruption occurs. The irony, still, is nothing will work together. We have ubiquitous wifi more from a fluke of history than any conscious design. We see how when left up to industry the silo mindset captures all reason, and we are all the poorer for it.

  • We have water rights. Mineral rights. Surface rights. Is there such a thing as virtual property rights? Do you own the virtual property rights of your own property when someone else decides to use it in an application? Pokemon GO Hit With Class Action Lawsuit. Why do people keep coming to this couple’s home looking for lost phones?

  • As data becomes more valuable, it becomes assumed that we are the product. Provider of Personal Finance Tools Tracks Bank Cards, Sells Data to Investors: Yodlee has another way of making money: The company sells some of the data it gathers from credit- and debit-card transactions to investors and research firms...“Yodlee can tell you down to the day how much the water bill was across 25,000 citizens of San Francisco” or the daily spending at McDonald’s throughout the country...The details are so valuable that some investment firms have paid more than $2 million apiece for an annual subscription to Yodlee’s service.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Sponsored Post: Zohocorp, Exoscale, Host Color, Cassandra Summit, Scalyr, Gusto, LaunchDarkly, Aerospike, VividCortex, MemSQL, AiScaler, InMemory.Net

Tue, 08/16/2016 - 16:56

Who's Hiring?
  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

Fun and Informative Events
  • Join database experts from companies like Apple, ING, Instagram, Netflix, and many more to hear about how Apache Cassandra changes how they build, deploy, and scale at Cassandra Summit 2016. This September in San Jose, California is your chance to network, get certified, and trained on the leading NoSQL, distributed database with an exclusive 20% off with  promo code - Academy20. Learn more at CassandraSummit.org

  • NoSQL Databases & Docker Containers: From Development to Deployment. What is Docker and why is it important to Developers, Admins and DevOps when they are using a NoSQL database? Find out in this on-demand webinar by Alvin Richards, VP of Product at Aerospike, the enterprise-grade NoSQL database. The video includes a demo showcasing the core Docker components (Machine, Engine, Swarm and Compose) and integration with Aerospike. See how much simpler Docker can make building and deploying multi-node, Aerospike-based applications!  
Cool Products and Services
  • Do you want a simpler public cloud provider but you still want to put real workloads into production? Exoscale gives you VMs with proper firewalling, DNS, S3-compatible storage, plus a simple UI and straightforward API. With datacenters in Switzerland, you also benefit from strict Swiss privacy laws. From just €5/$6 per month, try us free now.

  • High Availability Cloud Servers in Europe: High Availability (HA) is very important on the Cloud. It ensures business continuity and reduces application downtime. High Availability is a standard service on the European Cloud infrastructure of Host Color, active by default for all cloud servers, at no additional cost. It provides uniform, cost-effective failover protection against any outage caused by a hardware or an Operating System (OS) failure. The company uses VMware Cloud computing technology to create Public, Private & Hybrid Cloud servers. See Cloud service at Host Color Europe.

  • Dev teams are using LaunchDarkly’s Feature Flags as a Service to get unprecedented control over feature launches. LaunchDarkly allows you to cleanly separate code deployment from rollout. We make it super easy to enable functionality for whoever you want, whenever you want. See how it works.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips.  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

 

If any of these items interest you there's a full description of each sponsor below...

Categories: Architecture

How PayPal Scaled to Billions of Transactions Daily Using Just 8 VMs

Mon, 08/15/2016 - 16:56

How did PayPal take a billion-hits-a-day system that might traditionally run on 100s of VMs and shrink it down to run on 8 VMs, stay responsive even at 90% CPU, at transaction densities PayPal has never seen before, with jobs that take 1/10th the time, while reducing costs and allowing for much better organizational growth without growing the compute infrastructure accordingly?

PayPal moved to an Actor model based on Akka. PayPal told their story here: squbs: A New, Reactive Way for PayPal to Build Applications. They open sourced squbs and you can find it here: squbs on GitHub.
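
For readers new to the Actor model, here is a minimal sketch of the core idea: an actor owns private state and reacts to messages from a mailbox one at a time, so no locks are needed around that state. This is plain Python with a thread and a queue, not Akka or squbs; `CounterActor`, its message names, and `count_with_actor` are hypothetical names chosen for illustration.

```python
import queue
import threading


class CounterActor:
    """Owns a private counter; the only way to touch it is to send a message."""

    _STOP = object()  # sentinel message that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever read or written by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Enqueue a message; returns immediately (fire and forget)."""
        self._mailbox.put((message, reply_to))

    def stop(self):
        self._mailbox.put((self._STOP, None))
        self._thread.join()

    def _run(self):
        # Messages are handled strictly one at a time, in arrival order.
        while True:
            message, reply_to = self._mailbox.get()
            if message is self._STOP:
                return
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)


def count_with_actor(n_senders=4, per_sender=1000):
    """Have several threads send increments concurrently, then ask for the total."""
    actor = CounterActor()

    def worker():
        for _ in range(per_sender):
            actor.send("increment")

    senders = [threading.Thread(target=worker) for _ in range(n_senders)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    reply = queue.Queue()
    actor.send("get", reply_to=reply)
    total = reply.get(timeout=5)
    actor.stop()
    return total
```

Because every increment is enqueued before the `get` message, the reply reflects all of them. A production actor system adds supervision, routing, and remoting on top of this mailbox idea, none of which is shown here.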

The stateful service model still doesn't get enough consideration when projects are choosing a way of doing things. To learn more about stateful services there's an article, Making The Case For Building Scalable Stateful Services In The Modern Era, based on a great talk given by Caitie McCaffrey. And if that doesn't convince you here's WhatsApp, who used Erlang, an Akka competitor, to achieve incredible throughput: The WhatsApp Architecture Facebook Bought For $19 Billion.

I refer to the above articles because the PayPal article is short on architectural details. It's more about the factors that led to the selection of Akka and the benefits they've achieved by moving to Akka. But it's a very valuable motivating example for doing something different than the status quo.

What's wrong with services on lots of VMs approach?

Categories: Architecture

Stuff The Internet Says On Scalability For August 12th, 2016

Wed, 08/10/2016 - 16:20

Hey, it's HighScalability time:

 

 

The big middle finger to the Olympic Committee. They pulled this video of the incredibly beautiful Olympic cauldron at Rio.

 

If you like this sort of Stuff then please support me on Patreon.
  • 25 years ago: the first website went online; $236M: Pokemon Go revenue in 5 weeks in 3 countries; Several thousand: work on Apple maps; 2500 Nimitz Carriers: weight of iPhone if implemented using tube transistors; $50 trillion: cost of iPhone in 1950, economic output of the world in your hand; 1000x: faster phase-change RAM; 15lbs: Americans heavier than 20 years ago; 2 years: for hacking the IRS; 3.6PB: hypothetical storage pod based on 60 TB SSD; 330,000: cash registers hacked; 162%: increased love for electric cars in China;

  • Quotable Quotes:
    • @carllerche: it is hard to imagine how a node app could get closer to the metal with only 20MM LOC between the app and the hardware.
    • David Heinemeier Hansson (RoR)~ Lots and lots of huge systems that are running the gosh darn Internet are built by remote people operating asynchronously. You don't think that's good enough for your little shop?
    • Cesarini: Some frameworks that try to automate activities end up failing to hide complexity. They limit the trade-offs you can make, so they cater only to a subset of systems, often with very detailed requirements. 
    • "Uncle" Bob Martin: I have lived through 22 orders of magnitude of growth in hardware.
    • Jovanovic: To use Bitcoin for real-time trades, we need to eliminate its lazy fork-resolution mechanism and adopt strong consistency, a more proactive approach that guarantees transaction persistence.
    • Pedro Ramalhete: one latency distribution plot is worth a thousand throughput measurements
    • @n1ko_w1ll: Impressive numbers: - 80% cut code with #scala - responsive at 90% load with #akka
    • @samkroon: So Aussie government is asking 20 million ppl to login to one web site on the same night... Fail. Should have gone #serverless. #census2016
    • @caitie: "My contribution to RPC is not to make another system based on RPC" @cmeik #NikeTechTalks
    • @krisajenkins: This is your return type: Int / This is your return type on microservices: IO / (Logger (Either HttpError Int)) Microservices: Know the risks.
    • @nosqlonsql: Latency drives throughput if you cannot achieve enough concurrency. Kafka vs Chronicle. Must read by @PeterLawrey
    • reddit: Today's date is 100/1000/10000 in binary
    • @caitie: "The languages we associate with distributed programming are really concurrent languages" @cmeik #NikeTechTalks
    • @goserverless: Lambda down :( #aws #serverless
    • @pkanavos: @goserverless I think I'll PaaS
    • Jan Wedel: So if you plan to build an application from scratch and it is only meant to be used in on-premise scenarios as described, you probably shouldn't go for a microservice architecture.
    • @bmoesta: Any industry that solely focuses on efficiency innovation is on the verge of death. Disruptive innovations that drive progress drive growth
    • flak: It’s quite likely that your crypto will explode sooner or later, and it’s possible that random numbers will be implicated, but it’s very unlikely that some USB gizmo promising “true random” at kilobits per second will save you. Save your money instead.

  • Imagine how much the world has changed in those 25 years. The world's first website went online 25 years ago today. Without the Web the Internet would probably still be a backwater for researchers. The Web was the Internet's killer app. It's hard to imagine Pokemon is Augmented Reality's killer app. AR needs its let-the-people-make-it-bigger-and-better technology. Given the balkanization of AR into proprietary silos AR may never have its Web moment. Will there be an HTTP for AR?

  • The phrase "small, reprogrammable quantum computer" doesn't sound remotely present-tense, but it is: Shantanu Debnath and colleagues at the University of Maryland reveal their new device can solve three algorithms using quantum effects to perform calculations in a single step, where a normal computer would require several operations. Although the new device consists of just five bits of quantum information (qubits), the team said it had the potential to be scaled up to a larger computer...the key to the new device was a system of laser pulses that drove the quantum logic gates, which operate like the switches and transistors that power ordinary computers.

  • Turning programmers into a proper profession, like doctors, is not the way to go. How much do doctors innovate? Very little. Doctors as a profession have been pounded into their current shape by two oppressors: fear of lawsuits and educational debt. Doctors are bound by best practices and oaths to do nothing interesting. What must programmers do constantly? Innovate and do the interesting. By not being a profession we are free to do harm, yes, but we are also able to create. Creation is a better failure mode than ossification. "Uncle" Bob Martin - "The Future of Programming". Nice gloss by Eric Fleming: Long story short this was really two talks in one. The first speech was about progress in hardware and software from 1945 to 2015. The second talk is about how there is so much growth in the programming field that there are too many young inexperienced people to do it right which necessitates some self regulatory body to bring young professionals into the flock. Ironically the talk he didn't intend to give, the first one, is far more interesting than the talk he did give about how to fix the growing inexperience in industry.

  • Don't let what happened in Turkey happen to your coup attempt. Learn from experience. Here's your step-by-step guide on How to Overthrow a Government. Presented at, you may be surprised to hear, DefCon. First select from a menu of three overthrow methods: regime change: elections, coups and revolution. Next select a crack insurgency team from a handy wizard interface. Then there's a drop down list of intelligence gathering resources and funding options. After a few more clicks just press Go and you have your revolution (you'll certainly choose revolution, you get so many more points that way).

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

10 Gameday Failure Testing Scenarios from Obama for America

Tue, 08/09/2016 - 16:56

I have dozens if not hundreds of half finished articles and snippets of ideas in the haunted house that is my Google Docs. Walking the house around midnight, with the lights turned off, of course, I stumbled upon one ghost that has been haunting me since 2012. It is time to perform the ritual of exorcism by just publishing something.

You may or may not remember Obama for America, which in 2012 had a staff of 120 people that built and maintained the infrastructure that helped get out the vote for Obama. 

Harper Reed and Dylan Richard headed up the effort. Around that time they were getting a lot of press. One of the things that interested me was how they held Gameday test events, where they would simulate failure modes in their testing environments. Google calls these DiRT (Disaster Recovery Testing event) exercises.

So I asked Harper and Dylan what these exercises actually were and they were kind enough to reply. And I apparently forgot all about it. My apologies. Better late than never? Yah, let's go with that.

Here are some of the failure testing scenarios carried out by the Obama for America team:

  1. Flush memcache
  2. Kill memcache (null route on instances)
  3. Kill replicants (we used security groups to deny access)
  4. Kill master
  5. Kill the backing API (we had a heavy SOA)
  6. Put API in read-only (killing master should accomplish this - but this tests client apps explicitly)
  7. Kill SQS (we used it heavily, particularly for decoupled systems and fall backs)
  8. Emulate an EBS failure (kill all DBs [we used RDS], kill all EBS backed instances)
  9. Emulate full east coast failure (we had a 2 stage failover plan to the west coast - fail to a read only mode which we could do easily, and fail over permanently which would only happen in the case of extended east coast AWS unavailability)
  10. Emulate human error (claim to have done something [scale up, restart a DB, flush the cache, bounce the wsgi proc, etc] but don't actually do it) 
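
A list like this maps naturally onto a small drill harness: each scenario injects a fault, a health check observes how the system copes, and the fault is always reverted afterwards. The sketch below is an illustration under stated assumptions, not the Obama for America tooling. `Scenario`, `run_gameday`, and `FakeStack` are hypothetical names, and the in-memory `FakeStack` stands in for real infrastructure so the example is safe to run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    """One failure drill: a way to break something and a way to put it back."""
    name: str
    inject: Callable[[], None]
    restore: Callable[[], None]


def run_gameday(scenarios, check_health, log):
    """Run each drill in order and record whether the system stayed healthy."""
    results = {}
    for scenario in scenarios:
        log.append(f"inject: {scenario.name}")
        scenario.inject()
        try:
            results[scenario.name] = bool(check_health())
        finally:
            # Always undo the fault, even if the health check itself blows up.
            scenario.restore()
            log.append(f"restore: {scenario.name}")
    return results


class FakeStack:
    """Hypothetical stand-in for a real stack: a cache in front of a database."""

    def __init__(self):
        self.db = {"user:1": "alice"}
        self.cache = dict(self.db)
        self.cache_up = True

    def read(self, key):
        # Serve from cache when possible, otherwise fall back to the database.
        if self.cache_up and key in self.cache:
            return self.cache[key]
        return self.db.get(key)

    def flush_cache(self):
        self.cache = {}

    def warm_cache(self):
        self.cache = dict(self.db)

    def set_cache_up(self, up):
        self.cache_up = up


def demo():
    stack = FakeStack()
    scenarios = [
        Scenario("Flush memcache", stack.flush_cache, stack.warm_cache),
        Scenario(
            "Kill memcache",
            lambda: stack.set_cache_up(False),
            lambda: stack.set_cache_up(True),
        ),
    ]
    log = []
    results = run_gameday(
        scenarios, lambda: stack.read("user:1") == "alice", log
    )
    return results, log
```

Only the first two scenarios are modeled here. Against real infrastructure the `inject` and `restore` callables would wrap whatever mechanism the team uses (null routes, security group changes, and so on), and the human-error scenario would need a person rather than a function.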

Now there's one less ghost haunting the halls.

Related Articles
Categories: Architecture

Stuff The Internet Says On Scalability For August 5th, 2016

Fri, 08/05/2016 - 16:56

Hey, it's HighScalability time:

 

 

What does a 107 football field long battery building Gigafactory look like? A lot like a giant Costco. (tour)

 

If you like this sort of Stuff then please support me on Patreon.
  • 60 billion: Facebook messages per day; 3x: Facebook messages compared to global SMS traffic; $15: min wage increases job growth; 85,000: real world QPS for Twitter's search; 2017: when MRAM finally arrives; $60M: Bitcoin heist, bigger than any bank robbery; 710m: Internet users in China; 

  • Quotable Quotes:
    • @cmeik: When @eric_brewer told me that Go was good for building distributed systems, I couldn't help but think about this.
    • David Rosenthal: We can see the end of the era of data and computation abundance. Dealing with an era of constrained resources will be very different. In particular, enthusiasm for blockchain technology as A Solution To Everything will need to be tempered by its voracious demand for energy.
    • Dr Werner Vogels: What we’ve seen is a revolution where complete applications are being stripped of all their servers, and only code is being run. Quite a few companies are ripping out big pieces of their applications and replacing their servers, their VMs and their containers with just code. Perhaps we no longer have to think about servers.
    • @dsb: agree w serverless future - seeing more startups using that model & entirely eliminates most of my infra diligence questions
    • Emin Gün Sirer: It's too early for a coherent story to emerge from the smoldering ashes of the Bitfinex disaster. 
    • @jeremiahdillon: The coming decades will bring population shrinkage not seen since the Black Death. Good for wages, bad for GDP.
    • Nicole Hemsoth: The chatter is going around, once again, that AWS is looking to deliver a private version of its public cloud infrastructure, something that is not as easy to do as it sounds. 
    • Michael Rabin: I must admit that after many years of work in this area, the efficacy of randomness for so many algorithmic problems is absolutely mysterious to me. It is efficient, it works; but why and how is absolutely mysterious. 
    • Algorithms to Live By: that “bubble sort has no apparent redeeming features,” the research of Ackley and his collaborators suggests that there may be a place for algorithms like Bubble Sort after all. Its very inefficiency—moving items only one position at a time—makes it fairly robust against noise, far more robust than faster algorithms like Mergesort, in which each comparison potentially moves an item a long way. Mergesort’s very efficiency makes it brittle
    • JoshGlazebrook: Looks like Hitachi (HGST) is still leading in terms of reliability. 
    • @SeanMcElwee: don't argue with capitalists. seize the means of production.
    • jondubois: What the author describes, I would not call 'protocols' - The Bitcoin network is a hosted implementation of the Bitcoin protocol - It is not the protocol itself. Tokens in the context of the Bitcoin protocol itself have no value - The value is derived from the popularity of the infrastructure, not from the popularity of the protocol.

  • Where there is Pokemon there is a way. If you don't make an API someone will. Ingenious third party tracking services are one reason Pokemon Go is slow: The company says these services were making the servers unreliable. Pokémon Go doesn’t have an API, so it seems like Pokévision and others created countless of accounts on many servers around the world using Android emulators. With these emulators, they could fake movements around cities and reverse-engineer the game to create a sort of lightweight API and gather Pokémon data.

  • Two years later it appears Facebook creating a separate Messenger app was a good idea. Go figure. This Is The Smartest Thing Facebook Ever Did: In phase one, Facebook grows the user base. “We’re really at the beginning of phase two,” he said, in which the company focuses on growing organic interactions between people and businesses. Once businesses see this is working, the company launches stage three, in which it asks companies to pay up. This strategy has worked well for the company’s other products: Facebook reported $6.44 billion in sales this year, up 59 percent from a year ago. The company’s profits almost tripled to $2.06 billion.

  • So you want a system where the guberment has the master key to all encrypted systems? What a great idea! Anyone can now print out all TSA master keys.

  • This is from WWI! French gov: "WWI sites will be fully cleared of unexploded ordnance in... 300-900 years." Can you imagine what the aftermath of the cryptowars will be like? Sorry, don't touch that toaster...it will hack your neural lace and make you do crazy shite. Voting booths are all compromised, back to paper. Don't even think of using your all electric AI controlled car. It's now an IDAID (Improvised Destructive AI Device). Remember all those families that drove themselves over the cliff? So sad. After the fifth iteration of this pattern we'll have to melt it all down and start over again, only this time through, only steampunk tech will be allowed.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Is build back? The Fall of the General Purpose CPU

Wed, 08/03/2016 - 16:56

There's a meme out there that hardware is dead. Maybe not. Hardware is becoming more specialized as the general purpose CPU can't keep up. The tick-tock cycle created by Moore's law meant designers had a choice: build or buy. Make your own hardware to deep inspect 1 Gbps of network traffic (for example) and release later or use an off-the-shelf CPU and release sooner.

Now in the anarchy of a Moore's lawless world it looks like build is back. Jeff Dean is giving a talk at #scaledmlconf where he talks about this trend at Google.

@jackclarkSF: Jeff Dean says Google can run its full Inception v3 image model on a phone at about 6fps. And specialized ASICs are coming.

And Mo Patel captured this slide from the talk:

Categories: Architecture

Sponsored Post: Exoscale, Host Color, Cassandra Summit, Scalyr, Gusto, LaunchDarkly, Aerospike, VividCortex, MemSQL, AiScaler, InMemory.Net

Tue, 08/02/2016 - 16:56

Who's Hiring?
  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

Fun and Informative Events
  • Join database experts from companies like Apple, ING, Instagram, Netflix, and many more to hear about how Apache Cassandra changes how they build, deploy, and scale at Cassandra Summit 2016. This September in San Jose, California is your chance to network, get certified, and trained on the leading NoSQL, distributed database with an exclusive 20% off with  promo code - Academy20. Learn more at CassandraSummit.org

  • NoSQL Databases & Docker Containers: From Development to Deployment. What is Docker and why is it important to Developers, Admins and DevOps when they are using a NoSQL database? Find out in this on-demand webinar by Alvin Richards, VP of Product at Aerospike, the enterprise-grade NoSQL database. The video includes a demo showcasing the core Docker components (Machine, Engine, Swarm and Compose) and integration with Aerospike. See how much simpler Docker can make building and deploying multi-node, Aerospike-based applications!  
Cool Products and Services
  • Do you want a simpler public cloud provider but you still want to put real workloads into production? Exoscale gives you VMs with proper firewalling, DNS, S3-compatible storage, plus a simple UI and straightforward API. With datacenters in Switzerland, you also benefit from strict Swiss privacy laws. From just €5/$6 per month, try us free now.

  • High Availability Cloud Servers in Europe: High Availability (HA) is very important on the Cloud. It ensures business continuity and reduces application downtime. High Availability is a standard service on the European Cloud infrastructure of Host Color, active by default for all cloud servers, at no additional cost. It provides uniform, cost-effective failover protection against any outage caused by a hardware or an Operating System (OS) failure. The company uses VMware Cloud computing technology to create Public, Private & Hybrid Cloud servers. See Cloud service at Host Color Europe.

  • Dev teams are using LaunchDarkly’s Feature Flags as a Service to get unprecedented control over feature launches. LaunchDarkly allows you to cleanly separate code deployment from rollout. We make it super easy to enable functionality for whoever you want, whenever you want. See how it works.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

 

If any of these items interest you there's a full description of each sponsor below...

Categories: Architecture

How to Setup a Highly Available Multi-AZ Cassandra Cluster on AWS EC2

Mon, 08/01/2016 - 16:56

 

This is a guest post by Alessandro Pieri, Software Architect at Stream. Try out this 5 minute interactive tutorial to learn more about Stream’s API.

Originally built by Facebook in 2009, Apache Cassandra is a free and open-source distributed database designed to handle large amounts of data across a large number of servers. At Stream, we use Cassandra as the primary data store for our feeds. Cassandra stands out because it’s able to:

  • Shard data automatically

  • Handle partial outages without data loss or downtime

  • Scale close to linearly

If you’re already using Cassandra, your cluster is likely configured to handle the loss of 1 or 2 nodes. However, what happens when a full availability zone goes down?

In this article you will learn how to set up Cassandra to survive a full availability zone outage. Afterwards, we will analyze how moving from a single to a multi availability zone cluster impacts availability, cost, and performance.

Recap 1: What Are Availability Zones?
Categories: Architecture

Stuff The Internet Says On Scalability For July 29th, 2016

Fri, 07/29/2016 - 16:56

Hey, it's HighScalability time:


Facial tats to disrupt big brother surveillance systems may actually work. Our future?

 

If you like this sort of Stuff then please support me on Patreon.
  • 40.4 million: iPhones sold this quarter;  7: number of times Facebook has avoided the IRS; 104: new exoplanets; 100: new brain regions found; 2x: HTTPS adoption; 

  • Quotable Quotes:
    • @mat: Apple is doomed: "the nearly $8 billion in profits this quarter is more than twice what Facebook made in 2015"
    • Bruce Schneier: The truth is that technology magnifies power in general, but the rates of adoption are different. The unorganized, the distributed, the marginal, the dissidents, the powerless, the criminal: they can make use of new technologies faster. And when those groups discovered the Internet, suddenly they had power. But when the already powerful big institutions finally figured out how to harness the Internet for their needs, they had more power to magnify. That’s the difference: the distributed were more nimble and were quicker to make use of their new power, while the institutional were slower but were able to use their power more effectively.
    • @mjasay: What AWS does for AMZN: $2.89B in revenue (up from $1.8B last year), earning 56% of Amazon profits (EPS was $1.78, up from $0.19 last year)
    • @kurtseifried: I wonder how discrete cloud billing can get? Per cpu cycle? bit moved in and out? I suspect yes.
    • Algorithms to Live By: More generally, our intuitions about rationality are too often informed by exploitation rather than exploration. When we talk about decision-making, we usually focus just on the immediate payoff of a single decision—and if you treat every decision as if it were your last, then indeed only exploitation makes sense.
    • Pinterest: As it turns out, it’s damn hard to design consistent and beautiful things at scale. 
    • @obfuscurity: OH: “god i hate having to lie about loving containers all the time”
    • @beaucronin: Leah McGuire: "Metrics are the unit tests of data science"; without them you won't know when things break and you'll be exposed #wrangleconf
    • @tsantero: OH: "Blockchain: a system that allows a bunch of non-CS people to suddenly be distributed computing experts."
    • zeveb: People want safety; they want security; they want conformity; they want power over others.
    • Richard Watson: My take-home [re Pokemon Go]: even the very best can be surprised when the scale hits the fan.
    • @xaibeha: HTTP/2: Because a hundred requests per page load is just a fact of nature.
    • mdatwood: many people have this irrational hate for Java, or they hate the Java from 10 years ago. Today's Java is fast, has tons of mature frameworks, and is probably one of the best tools to use for building a web service back end.
    • @BenedictEvans: Obvious: an iPhone has hundreds of times more compute power than the original Pentium. More important: $50 Androids in rural Africa do too
    • Dark Silicon: infeasible to operate all on-chip components at full performance at the same time due to the thermal constraints (peak temperature, spatial and temporal thermal gradients, etc.).
    • @Sneakyness: Why do people always assume that companies have scaling issues, and not that they've determined that 85% uptime is enough to make money
    • @cdixon: Alternative headline: "Alphabet invests $859M in long-term projects."
    • @xaprb: We were promised a Utopian vision with the “semantic web,” but it turns out it’s actually Feedly, IFTT, Slack, and Pocket that fulfill it.
    • Amit: Let's drop 10¢ coins and $10 bills and treat them like 50¢ coins, $2 bills, $50 bills — they exist but we don't use them widely.
    • Graham Templeton: One major advantage of life over modern engineering is power efficiency.
    • @neil_conway: @t_crayford @kellabyte >10k threads running native code + user-defined stored procedures in a single address space sounds pretty scary.

  • Niantic is looking for a Software Engineer - Server Infrastructure to help make Pokemon Go. You think it's easy? Think again: Create the server infrastructure to support our hosted AR/Geo platform underpinning projects such as Pokémon GO using Java and Google Cloud. You will work on real-time indexing, querying and aggregation problems at massive scales of hundreds of millions of events per day, all on a single, coherent world-wide instance shared by millions of users.

  • DDoS attacks as a reason to bypass the kernel. Why we use the Linux kernel's TCP stack: During some attacks we are flooded with up to 3M packets per second (pps) per server...With this scale of attack the Linux kernel is not enough for us. We must work around it. We don't use the previously mentioned "full kernel bypass", but instead we run what we call a "partial kernel bypass". With this the kernel retains the ownership of the network card, and allows us to perform a bypass only on a single "RX queue".

  • BTW, I bought nothing on Prime Day. How AWS Powered Amazon’s Biggest Day Ever: This wave of traffic then circled the globe, arriving in Europe and the US over the course of 40 hours and generating 85 billion clickstream log entries. Orders surpassed Prime Day 2015 by more than 60% worldwide and more than 50% in the US alone. On the mobile side, more than one million customers downloaded and used the Amazon Mobile App for the first time.


Categories: Architecture

Economics May Drive Serverless

Wed, 07/27/2016 - 16:56

We've been following an increasing ephemerality curve to get more and more utilization out of our big brawny boxes. VMs, VMs in the cloud, containers, containers in the cloud, and now serverless, which looks to be our first native cloud infrastructure.

Serverless is said to be about functions, but you really need a zip file of code to do much of anything useful, which is basically a container.

So serverless isn't so much about packaging as it is about not standing up your own chunky persistent services. Those services, like storage, like the database, etc, have moved to the environment.

Your code orchestrates the dance and implements specific behaviours. Serverless is nothing if not a framework writ large.

Serverless also intensifies the developer friendly disintermediation of infrastructure that the cloud started.

Upload your code and charge it on your credit card. All the developer has to worry about is their function. Oh, and linking everything together (events, DNS, credentials, backups, etc) through a Byzantine patch panel of a UI; uploading each of your zillions of "functions" on every change; managing versions so you can separate out test, development, and production. But hey, nothing is perfect.

What may drive serverless more than anything else is economics. From markonen

In my book, the innovation in Lambda is, above everything else, about the billing model. My company moved the work of 40 dedicated servers onto Lambda and in doing so decimated our costs. Paying for 1500 cores (our current AWS limit) in 100ms increments has been a game changer. I'm sure there are upsides to adopting the same programming model with your own hardware or VMs, but the financial benefit of Lambda will not be there.

There are many more quotes like this, but that's the gist of it. And as pointed out by others, the payoff depends on some utilization threshold. If you can drive the utilization of your instances to some high level then running your own instances makes economic sense.
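
The utilization threshold can be made concrete with a little arithmetic. The following is a minimal sketch, not a real pricing calculator: the function names and every price used with them are hypothetical, and it assumes one always-on instance does the same work per busy hour as one function running for that hour. The only detail taken from the quote above is billing in 100ms increments.

```python
import math


def billed_increments(duration_ms: float, increment_ms: int = 100) -> int:
    """Round one invocation up to whole billing increments (at least one)."""
    return max(1, math.ceil(duration_ms / increment_ms))


def pay_per_use_cost(invocations: int, duration_ms: float,
                     price_per_increment: float,
                     increment_ms: int = 100) -> float:
    """Total cost when every invocation is billed per increment.

    price_per_increment is a hypothetical price, not a published rate.
    """
    per_call = billed_increments(duration_ms, increment_ms) * price_per_increment
    return invocations * per_call


def break_even_utilization(instance_price_per_hour: float,
                           price_per_increment: float,
                           increment_ms: int = 100) -> float:
    """Fraction of each hour an always-on instance must be busy before it
    becomes cheaper than paying per increment for the same busy time."""
    increments_per_hour = 3_600_000 // increment_ms
    full_hour_pay_per_use = increments_per_hour * price_per_increment
    return instance_price_per_hour / full_hour_pay_per_use


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the shape of the trade-off.
    threshold = break_even_utilization(instance_price_per_hour=0.10,
                                       price_per_increment=0.000005)
    print(f"break-even utilization: {threshold:.0%}")
```

With these made-up prices the instance only wins once it is busy more than about 56% of the time. Below that, paying per increment is cheaper, which is the point markonen makes about the billing model.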

For the rest of us taking advantage of the aggregation of a big cloud provider is a winner. Setting up a highly available service on the cloud, dealing with instances and all the other overhead is still a huge PITA. Why deal with all that if you don't have to?

Developers pick winners. Developers follow ease of use. Developers follow the money. So serverless is a winner. You'll just have to get over the name.

Categories: Architecture

Stuff The Internet Says On Scalability For July 22nd, 2016

Fri, 07/22/2016 - 16:56

Hey, it's HighScalability time:


It's not too late London. There's still time to make this happen

 

If you like this sort of Stuff then please support me on Patreon.
  • 40%: energy Google saves in datacenters using machine learning; 2.3: times more energy knights in armor spend than when walking; 1000x: energy efficiency of 3D carbon nanotubes over silicon chips; 176,000: searchable documents from the Founding Fathers of the US; 93 petaflops: China’s Sunway TaihuLight; $800m: Azure's quarterly revenue; 500 Terabits per square inch: density when storing a bit with an atom; 2 billion: Uber rides; 46 months: jail time for accessing a database; 

  • Quotable Quotes:
    • Lenin: There are decades where nothing happens; and there are weeks where decades happen.
    • Nitsan Wakart: I have it from reliable sources that incorrectly measuring latency can lead to losing ones job, loved ones, will to live and control of bowel movements.
    • Margaret Hamilton~ part of the culture on the Apollo program “was to learn from everyone and everything, including from that which one would least expect.”
    • @DShankar: Basically @elonmusk plans to compete with -all vehicle manufacturers (cars/trucks/buses) -all ridesharing companies -all utility companies
    • @robinpokorny: ‘Number one reason for types is to get idea what the hell is going on.’ @swannodette at #curryon
    • Dan Rayburn: Some have also suggested that the wireless carriers are seeing a ton of traffic because of Pokemon Go, but that’s not the case. Last week, Verizon Wireless said that Pokemon Go makes up less than 1% of its overall network data traffic.
    • @timbaldridge: When people say "the JVM is slow" I wonder to what dynamic, GC'd, runtime JIT'd, fully parallel, VM they are comparing it to.
    • @papa_fire: “Burnout is when long term exhaustion meets diminished interest.”  May be the best definition I’ve seen.
    • Sheena Josselyn: Linking two memories was very easy, but trying to separate memories that were normally linked became very difficult
    • @mstine: if your microservices must be deployed as a complete set in a specific order, please put them back in a monolith and save yourself some pain
    • teaearlgraycold: Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
    • Erik Duindam:  I bake minimum viable scalability principles into my app.
    • Hassabis: It [DeepMind] controls about 120 variables in the data centers. The fans and the cooling systems and so on, and windows and other things. They were pretty astounded.
    • @WhatTheFFacts: In 1989, a new blockbuster store was opening in America every 17 hours.
    • praptak: It [SRE] changes the mindset from "Failure? Just log an error, restore some 'good'-ish state and move on to the next cool feature." towards "New cool feature? What possible failures will it cause? How about improving logging and monitoring on our existing code instead?"
    • plusepsilon: I transitioned from using Bayesian models in academia to using machine learning models in industry. One of the core differences in the two paradigms is the "feel" when constructing models. For a Bayesian model, you feel like you're constructing the model from first principles. You set your conditional probabilities and priors and see if it fits the data. I'm sure probabilistic programming languages facilitated that feeling. For machine learning models, it feels like you're starting from the loss function and working back to get the best configuration

  • Isn't it time we admit Dark Energy and Dark Matter are simply optimizations in the algorithms running the sim of our universe? Occam's razor. Even the Eldritch engineers of our creation didn't have enough compute power to simulate an entire universe. So they fudged a bit. What's simpler than making 90 percent of matter in our galaxy invisible?

  • Do you have one of these? Google has a Head of Applied AI.

  • Uber with a great two article series on their stack. Part uno, Part deux: Our business runs on a hybrid cloud model, using a mix of cloud providers and multiple active data centers...We currently use Schemaless (built in-house on top of MySQL), Riak, and Cassandra...We use Redis for both caching and queuing. Twemproxy provides scalability of the caching layer without sacrificing cache hit rate via its consistent hashing algorithm. Celery workers process async workflow operations using those Redis instances...for logging, we use multiple Kafka clusters...This data is also ingested in real time by various services and indexed into an ELK stack for searching and visualizations...We use Docker containers on Mesos to run our microservices with consistent configurations scalably...Aurora for long-running services and cron jobs...Our service-oriented architecture (SOA) makes service discovery and routing crucial to Uber’s success...we’re moving to a pub-sub pattern (publishing updates to subscribers). HTTP/2 and SPDY more easily enable this push model. Several poll-based features within the Uber app will see a tremendous speedup by moving to push....we’re prioritizing long-term reliability over debuggability...Phabricator powers a lot of internal operations, from code review to documentation to process automation...We search through our code on OpenGrok...We built our own internal deployment system to manage builds. Jenkins does continuous integration. We combined Packer, Vagrant, Boto, and Unison to create tools for building, managing, and developing on virtual machines. We use Clusto for inventory management in development. Puppet manages system configuration...We use an in-house documentation site that autobuilds docs from repositories using Sphinx...Most developers run OSX on their laptops, and most of our production instances run Linux with Debian Jessie...At the lower levels, Uber’s engineers primarily write in Python, Node.js, Go, and Java...We rip out and replace older Python code as we break up the original code base into microservices. An asynchronous programming model gives us better throughput. And lots more.
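
The remark about consistent hashing preserving the cache hit rate is easier to see in code. Below is a minimal, generic hash ring with virtual nodes. It is a sketch of the general technique, not Twemproxy's implementation: the class name, the node names, the choice of SHA-256, and the number of virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring (illustrative, not Twemproxy's code)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 interpreted as an integer point on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        # Each node owns many points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["cache-a", "cache-b", "cache-c"])  # hypothetical nodes
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.remove("cache-b")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing one node")
```

Only the keys that lived on the removed node get remapped; every other key keeps its owner, so those cache entries stay warm. With plain modulo hashing, changing the node count would remap most keys.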


Categories: Architecture

Building Highly Scalable V6 Only Cloud Hosting

Wed, 07/20/2016 - 16:56

This is a guest repost by Donatas Abraitis, Lead Systems Engineer at Hostinger International.

This article is about how we built a new highly scalable cloud hosting solution using IPv6-only communication between commodity servers, what problems we faced with the IPv6 protocol, and how we tackled them to handle more than ten million active users.

Why did we decide to run an IPv6-only network?

At Hostinger we care a lot about innovative technologies, so we decided to run a new project named Awex that is based on IPv6. If we can, why not start today? Only frontend (user-facing) services run in a dual-stack environment; everything else is IPv6-only for east-west traffic.

Architecture
Categories: Architecture

Sponsored Post: Cassandra Summit, Scalyr, Gusto, LaunchDarkly, Awake Networks, Aerospike, VividCortex, MemSQL, AiScaler, InMemory.Net

Tue, 07/19/2016 - 16:56

Who's Hiring?
  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

  • Awake Networks is an early stage network security and analytics startup that processes, analyzes, and stores billions of events at network speed. We help security teams respond to intrusions with super-human  efficiency and provide macroscopic and microscopic insight into the networks they defend. We're looking for folks that are excited about building systems that handle scale in a constrained environment. We have many open-ended problems to solve around stream-processing, distributed systems, machine learning, query processing, data modeling, and much more! Please check out our jobs page to learn more.

Fun and Informative Events
  • Join database experts from companies like Apple, ING, Instagram, Netflix, and many more to hear about how Apache Cassandra changes how they build, deploy, and scale at Cassandra Summit 2016. This September in San Jose, California is your chance to network, get certified, and trained on the leading NoSQL, distributed database with an exclusive 20% off with  promo code - Academy20. Learn more at CassandraSummit.org

  • NoSQL Databases & Docker Containers: From Development to Deployment. What is Docker and why is it important to Developers, Admins and DevOps when they are using a NoSQL database? Find out in this on-demand webinar by Alvin Richards, VP of Product at Aerospike, the enterprise-grade NoSQL database. The video includes a demo showcasing the core Docker components (Machine, Engine, Swarm and Compose) and integration with Aerospike. See how much simpler Docker can make building and deploying multi-node, Aerospike-based applications!  
Cool Products and Services
  • Do you want a simpler public cloud provider but you still want to put real workloads into production? Exoscale gives you VMs with proper firewalling, DNS, S3-compatible storage, plus a simple UI and straightforward API. With datacenters in Switzerland, you also benefit from strict Swiss privacy laws. From just €5/$6 per month, try us free now.

  • High Availability Cloud Servers in Europe: High Availability (HA) is very important on the Cloud. It ensures business continuity and reduces application downtime. High Availability is a standard service on the European Cloud infrastructure of Host Color, active by default for all cloud servers, at no additional cost. It provides uniform, cost-effective failover protection against any outage caused by a hardware or an Operating System (OS) failure. The company uses VMware Cloud computing technology to create Public, Private & Hybrid Cloud servers. See Cloud service at Host Color Europe.

  • Dev teams are using LaunchDarkly’s Feature Flags as a Service to get unprecedented control over feature launches. LaunchDarkly allows you to cleanly separate code deployment from rollout. We make it super easy to enable functionality for whoever you want, whenever you want. See how it works.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

 

If any of these items interest you there's a full description of each sponsor below...

Categories: Architecture

How Does Google do Planet-Scale Engineering for a Planet-Scale Infrastructure?

Mon, 07/18/2016 - 17:15

 

How does Google keep all its services up and running? They almost never seem to fail. If you've ever wondered how, we get a wonderful peek behind the curtain in a talk given at GCP NEXT 2016 by Melissa Binde, Director, Storage SRE at Google: How Google Does Planet-Scale Engineering for Planet-Scale Infrastructure.

Melissa's talk is short, but it's packed with wisdom and delivered in a no-nonsense style that makes you think that, if your service is down, Melissa is definitely the kind of person you want on the case.

Oh, just what is SRE? It stands for Site Reliability Engineering, but a definition is more elusive. It's like the kind of answers you get when you ask for a definition of the Tao. It's more a process than a thing, as is made clear by Ben Sloss 24x7 VP, Google, who defines SRE as:

what happens when a software engineer is tasked with what used to be called operations.

Let that bounce around your head for a while.

Above and beyond all else one thing is clear: SREs are the custodians of production. SREs are the custodians of customer experience, for both google.com and GCP.

Some of the highlights of the talk for me:

  • The Destructive Incentives of Pitting Uptime vs Features. SRE is an attempt to solve the natural tension between developers who want to push features and sysadmins who want to maintain uptime by not pushing features.
  • The Error Budget. This is the idea that failure is expected. It's not a bad thing. Users can't tell if a service is up 100% of the time or 99.99%, so you can have errors. This reduces the tension between dev and ops. As long as the error budget is maintained you can push out new features and the ops side won't be blamed.
  • Goal is to restore service immediately. Troubleshooting comes later. This means you need a lot of logging and tooling to debug after a service has been restored. For some reason this made me flash on a line from an earlier article, also based on a talk from a Google SRE: Backups are useless. It’s the restore you care about.
  • No Boredom Philosophy of Paging. When a page comes in it should be for an interesting and new problem. You don't want SREs being bored handling repetitive problems. That's what bots are for.
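
The error budget idea reduces to simple arithmetic. Here is a minimal sketch, assuming a 30-day window and a time-based availability target; the function names and the release gate are illustrative and are not taken from the talk.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability allowed in the window for a given target.

    slo is the availability target as a fraction, e.g. 0.9999 for 99.99%.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after subtracting downtime already spent in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


def can_ship(slo: float, downtime_minutes: float,
             window_days: int = 30) -> bool:
    """Illustrative gate: keep pushing features while budget remains."""
    return budget_remaining(slo, downtime_minutes, window_days) > 0


if __name__ == "__main__":
    budget = error_budget_minutes(0.9999)
    print(f"99.99% over 30 days allows about {budget:.2f} minutes of downtime")
    print("ship after 3 minutes of downtime?", can_ship(0.9999, 3.0))
    print("ship after 5 minutes of downtime?", can_ship(0.9999, 5.0))
```

A 99.99% target over 30 days leaves roughly 4.3 minutes of budget. While some of it remains, dev can keep shipping; once it is spent, the same number tells both sides to slow down, which is how the budget takes the blame out of the conversation.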

Other interesting topics in the talk are: How is SRE structured organizationally? How do you hire devs into a role focused on production and keep them happy? How do we keep the team valued inside of Google? How do we help our teams communicate better and resolve disagreements with data rather than with assertions or power grabs?

Let's get on with it. Here's how Google does Planet-Scale Engineering for a Planet-Scale Infrastructure...

Categories: Architecture

Stuff The Internet Says On Scalability For July 15th, 2016

Fri, 07/15/2016 - 16:56

Hey, it's HighScalability time:


That little smudge on Jupiter is North America (size comparison). 

 

If you like this sort of Stuff then please support me on Patreon.
  • <2%: percent of total U.S. electricity consumption used by data centers; $4.99: hourly wage of Amazon Turkers; 8,072: cores in Cassandra cluster; .5: new reward for slaving away in the Bitcoin mines; 11: source code for the original Apollo guidance computer; 10 inverse femtobarns: number of collisions recorded by the Large Hadron Collider; 34 bps: using MEMO to send molecular messages through the air; 200 MB: record for storage in DNA; 10,000+: 3D printed parts are used in a Rolls-Royce Phantom; $43.6bn: IaaS revenue to triple by 2020; 

  • Quotable Quotes:
    • @PokemonGoApp: To ensure all Trainers can experience #PokemonGo, we continue to add new resources to accommodate everyone. Thank you for your patience.
    • @balajis: Pokemon Go is a classic overnight success, 10 years in the making. Ingress database, Google Maps, the Pokemon brand…
    • @avantgame: The math of Pokemon Go is pretty amazing. 21 million players in ONE week, playing 43 minutes on average a day.
    • @icecrime: Does Pokemon Go have generics?
    • @HarvardBiz: When companies start scaling, they often start seeing the future as a threat
    • Jakob Engblom: for the best performance, you want to break the design apart across cut-points with the lowest level of communication across the cut.
    • @peterpur: once again, it becomes obvious that complexity feeds itself, while simplicity needs conscious effort & hard work.
    • @jamesurquhart: Mine is already a microservice because it runs on a microcomputer. Right? Right?
    • Facebook: In our experience, every time we add a new tool, we are surprised that we managed without it.
    • @petecordell: Telling a programmer there's already a library to do X is like telling a songwriter there's already a song about love
    • @linclark: Code that my mom wrote 50 years ago just went up on GitHub
    • @danielbryantuk: "Our monolithic application was so monolithic that we gave it a name - jimmy..." Haha, awesome! @ZalandoTech at #microservices summit
    • Uri Hasson~ even across different languages, our brains show similar activity, or become “aligned,” when we hear the same idea or story.
    • @etherealmind: Bidirectional forwarding detection is most significant advance in Autonomous Routing in the last 20 years.
    • @aphyr: In particular, I'd like to note that @VoltDB has opted to preserve strong serializabilty as the default behavior, despite a latency cost.
    • @swardley: the system is based on a cycle of theft, settlers steal from pioneers forcing them to move on ...
    • Ian Adams: That the actual encoding at the CPU [for erasure coded storage] is generally not the bottleneck, but instead that the network tends to be, especially when you have really “wide” codes, e.g. 17/20 causing tons of traffic across many storage nodes for every request. 
    • Ayende: There is about 10% difference between fsync and fdatasync when using the HDD, but there is barely any difference as far as the SSD is concerned. This is because the SSD can do random updates (such as updating both the data and the metadata) much faster, since it doesn’t need to move the spindle.
    • @cpurdy: As long as flash capacities have an order-of-magnitude advantage over RAM, flash is allowed to be slower ;-)
    • @huntchr: Before you all go nuts re #serverless, #mechanicalsympathy remains important. You still need to understand what is going on under the hood.
    • Gallant: These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
    • @sheeshee: trying to convince somebody to archive a really old CGI script roughly 1994 for archeological purposes.. old code is important for learning.
    • Kreps & Kleppmann: we advocate a style of application development in which each data storage and processing component focuses on “doing one thing well”. Heterogeneous systems can be built by composing such specialised tools through the simple, general-purpose interface of a log. 

  • This is how you know you are Facebook. Instead of testing your new mobile software on one device you have a datacenter, with a lab, with around 60 custom-made racks bristling with 2000 mobile phones, all so you can test all the different combinations and permutations. The mobile device lab at the Prineville data center.

  • The challenge was made. @adrianco: Let me know when you run a 1000 node Cassandra cluster on Kubernetes :-). The challenge was met. Thousand Instances of Cassandra using Kubernetes Pet Set: We deployed 1,009 minion nodes to Google Compute Engine (GCE), spread across 4 zones, running a custom version of the Kubernetes 1.3 beta. We ran this demo on beta code since the demo was being set up before the 1.3 release date. For the minion nodes, GCE virtual machine n1-standard-8 machine size was chosen, which is vm with 8 virtual CPUs and 30GB of memory. It would allow for a single instance of Cassandra to run on one node, which is recommended for disk I/O. 

  • Lures from Pokemon Go have turned out to be amazingly effective. Pokemon Go Is Driving Insane Amounts of Sales at Small Local Businesses. Here's How It Works. Building that kind of native business model driver deep into the game mechanics is the real trick of the game. Also, How the gurus behind Google Earth created 'Pokémon Go'.
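As a quick sanity check on the Cassandra item above, the 8,072 cores quoted in the stats line follow directly from the node figures. This is a minimal sketch that assumes every one of the 1,009 nodes is the same n1-standard-8 machine type and that each virtual CPU is counted as one core; the function name is made up for illustration.

```python
# Figures quoted in the Kubernetes Pet Set item above.
NODES = 1009             # minion nodes deployed to GCE
VCPUS_PER_NODE = 8       # n1-standard-8: 8 virtual CPUs
MEMORY_GB_PER_NODE = 30  # n1-standard-8: 30GB of memory


def cluster_totals(nodes, vcpus_per_node, memory_gb_per_node):
    """Return (total vCPUs, total memory in GB) for a homogeneous cluster.

    Assumption: all nodes share one machine type, so totals are simple products.
    """
    return nodes * vcpus_per_node, nodes * memory_gb_per_node


total_vcpus, total_memory_gb = cluster_totals(
    NODES, VCPUS_PER_NODE, MEMORY_GB_PER_NODE
)
print(f"{total_vcpus} vCPUs, {total_memory_gb} GB of memory")
```

Under those assumptions the product matches the 8,072 cores in the stats line.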

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Why Amazon Retail Went to a Service Oriented Architecture

Wed, 07/13/2016 - 16:56

When Lee Atchison arrived at Amazon, Amazon was in the process of moving from a large monolithic application to a Service Oriented Architecture.

Lee talks about this evolution in an interesting interview on Software Engineering Daily: Scalable Architecture with Lee Atchison, about Lee's new book: Architecting for Scale: High Availability for Your Growing Applications.

This is a topic Adrian Cockcroft has talked a lot about in relation to his work at Netflix, but it's a powerful experience to hear Lee describe how Amazon made the transition, knowing what Amazon would later become.

Amazon was running into the problems of success. The problem was not so much scaling to handle requests as scaling the number of engineers working in the same code base.

At the time their philosophy was based on the Two Pizza team. A small group owns a particular piece of functionality. The problem is it doesn’t work to have hundreds of pizza teams working on the same code base. It became very difficult to innovate and add new features. It even became hard to build the application, pass the test suites, and deploy the software.

The solution: move to a Service Oriented Architecture (not microservices).

Organizing around services allowed individual teams to truly own the code base, the support responsibility, and top to bottom responsibility for the functionality.

The result: a dramatic increase in innovation and the ability to grow. After a while the Amazon retail site grew with a constant stream of new capabilities. Maybe too many :-)

The process requires a culture shift, really more of an ownership shift, from being part of a larger group to being an entity of its own that has responsibilities outside of its group as well as responsibilities inside the group.

While the strategy of consciously exploiting team organization as a means of speeding up product development and encouraging innovation is not a new idea now, in the early 2000s it would have been one ballsy move.

Categories: Architecture

Vertical Scaling Works for Bits and Bites

Tue, 07/12/2016 - 16:56

This is just too delicious a parallel to pass up.

Here we have Google building a new four-story datacenter (Scaling Up: Google Building Four-Story Data Centers):

 

And here we have a new vertical farm from AeroFarms:

 

Both have racks of consumables. One is a rack of bits, the other is a rack of bites. Both used to sprawl horizontally across huge swaths of land and now are building up. Both designs are driven by economic efficiency, extracting the most value per square foot. Both are expanding to meet increased demand. It's a strange sort of convergence.

Categories: Architecture

Stuff The Internet Says On Scalability For July 8th, 2016

Fri, 07/08/2016 - 16:56

Hey, it's HighScalability time:


Juno: 165,000mph, 1.7 billion miles, missed orbit by 10 miles. Dang buggy software. 

 

If you like this sort of Stuff then please support me on Patreon.
  • $3B: damages awarded to HP from Oracle; 37%: when to stop looking through your search period; 70%: observed Annualized Failure Rate (AFR) in production datacenters for some models of SSDs; 

  • Quotable Quotes:
    • spacerodent: After Christmas there was this huge excess capacity and that is when I first learned of the EC2 project. It was my belief EC2 came out of a need to utilize those extra Gurupa servers during the off season:)
    • bcantrill: That said, I think Sun's problem was pretty simple: we thought we were a hardware company long after it should have been clear that we were a systems company. As a result, we made overpriced, underperforming (and, it kills me to say, unreliable) hardware. And because we were hardware-fixated, we did not understand the economic disruptive force of either Intel or open source until it was too late. 
    • @cmeik: I am not convinced the blockchain and CRDTs *work.*
    • daly: Managers make decisions. Only go to management with your need for a decision and always present the options. They went to management with what was, in essence, a complaint. Worse, it was a complaint that had nothing to do with the business. Clearly they were not keeping the business uppermost in their priority queue. So management made a business decision and fixed the problem.
    • @colettecello: Architect: "we should break this down into 6 microservices" Me: "you have 6 teams who hate each other?" Architect: "how did you know that?"
    • Matt Stats: The differences between BSD and Linux all derive from basic philosophical differences. Once you understand those, everything else falls into place pretty neatly.
    • @wattersjames: "Last year, Johnson & Johnson turned off its last mainframe"
    • Allan Kelly: But in the world of software development this mindset [economies of scale] is a recipe for failure and under performance. The conflict between economies of scale thinking and diseconomies of scale working will create tension and conflict.
    • Jeff G: Today, a large part of my business is migrating companies off the monolithic Java EE containers into lightweight modular containers. Yes, even the tried and true banking and financial industries are moving away from Java EE. 
    • collyw: "Weeks of programming can save hours of planning" is a favorite quote of mine.
    • xiongchiamiov: when I see a team responsible for hundreds of microservices, it's not at all surprising when I find they're completely underwater and struggling to keep up with maintenance, much less new features.
    • Robert Plomin: We're always talking about differences. The only genetics that makes a difference is that 1 percent of the 3 billion base pairs. But that is over 10 million base pairs of DNA. We're looking at these differences and asking to what extent they cause the differences that we observe. 
    • @jmferdegue: Micro services as a cost reduction strategy for project delivery. Marco Cullen from @OpenCredo at #micromanchester
    • J.R.R. Tolkien: I like, and even dare to wear in these dull days, ornamental waistcoats.
    • @johnregehr~ HN commenter has reached enlightenment : In both cases, after about a year, we found ourselves wishing we had not rewritten the network stack.
    • @nigelbabu: OH: 9.9999% uptime is still five 9s.
    • @jessfraz: "We are going to need a floppy and a shaman" @ryanhuber
    • Peter Cohen: So, why the cloud? Because, the developer.
    • @CompSciFact: 'The fastest algorithm can frequently be replaced by one that is almost as fast and much easier to understand.' -- Douglas W. Jones
    • @igrigorik: Improved font loading in WebKit: http://bit.ly/29eaxV2  - tl;dr: 3s timeout, WOFF2, unicode-range, Font Loading API. hooray!
    • @danielbryantuk: "I've worked on teams with 200+. We had 3 people just to make JPA work" @myfear on scaling issues #micromanchester
    • AWS Origin Story: Jassy tells of an executive retreat at Jeff Bezos’ house in 2003. It was there that the executive team conducted an exercise identifying the company’s core competencies
    • @sheeshee: ".. you are charged for every 100ms your code executes and the number of times your code is triggered." the 1970ies are back. (aws lambda)
    • @KentBeck: accepting mediocrity as the price of scaling misunderstands the power law distribution of payoffs.
    • @cowtowncoder: that is: cost efficiency from AWS et al is for SMALL deployments, and at some point it always, invariably becomes cheaper to DIY
    • Exascale Computing Research priorities: Total power requirements suggest that CPUs will not be suitable commodity processors for supercomputers in the future.

  • Here's how Instagram does it. Instagram + Android: Four Years Later: At the core of this principle is the idea that the Instagram app is simply a renderer of server-provided data, much like a web browser. Almost all complex business logic happens server-side, where it is easier to fix bugs and add new features. We rely on the server to be perfect, enforced through continuous integration testing, and dispense with null-checking or data-consistency checking on the client.

  • Good story on how WePay is moving from a monolith to a services-based architecture on top of Kubernetes. Advantages: autoscaling, rolling updates, a pure model independent of software assigned to specific machines. WePay on Kubernetes: ‘It Changed Our Business’.

  • Julia Ferraioli with a really fun explanation of Kubernetes using Legos.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture