Software Development Blogs: Programming, Software Testing, Agile, Project Management


High Scalability - Building bigger, faster, more reliable websites

Stuff The Internet Says On Scalability For December 2nd, 2016

Fri, 12/02/2016 - 17:56

Hey, it's HighScalability time:

 

A phrase you've probably heard a lot this week: AWS announces...

 

If you like this sort of Stuff then please support me on Patreon.
  • 18 minutes: latency to Mars; 100TB: biggest DynamoDB table; 55M: Kaiser visits that were virtual; $2 billion: yearly Uber losses; 91%: Apple's take of smartphone profits; 825: AI patents held by IBM; $8: hourly cost of a spot welding in the auto industry; 70%: share of Walmart website traffic that was mobile; $3 billion: online Black Friday sales; 80%: IT jobs replaceable by automation; $7,500: cost of the one terabit per second DDoS attack on Dyn;

  • Quotable Quotes:
    • @BotmetricHQ: #AWS is deploying tens of thousands of servers every day, enough to power #Amazon in 2005 when it was a $8.5B Enterprise. #reInvent
    • bcantrill: From my perspective, if this rumor is true, it's a relief. Solaris died the moment that they made the source proprietary -- a decision so incredibly stupid that it still makes my head hurt six years later.
    • Dropbox: it can take up to 180 milliseconds for data traveling by undersea cables at nearly the speed of light to cross the Pacific Ocean. Data traveling across the Atlantic can take up to 90 milliseconds.
    • @James_R_Holmes: The AWS development cycle: 1) Have fun writing code for a few months 2) Delete and use new AWS service that replaces it
    • @swardley: * asked "Can Amazon be beaten?" Me : of course * : how? Me : ask your CEO * : they are asking Me : have you thought about working at Amazon?
    • @etherealmind: Whatever network vendors did to James Hamilton at AWS, he is NEVER going to forgive them.
    • Stratechery: the flexibility and modularity of AWS is the chief reason why it crushed Google’s initial cloud offering, Google App Engine, which launched back in 2008. Using App Engine entailed accepting a lot of decisions that Google made on your behalf; AWS let you build exactly what you needed.
    • @jbeda: AWS Lambda@Edge thing is huge. It is the evolution of the CDN. We'll see this until there are 100s of DCs available to users.
    • erikpukinskis: Everyone in this subthread is missing the point of open source industrial equipment. The point is not to get a cheap tractor, or even a good one. The point is not to have a tractor you can service. The point is to have a shared platform.
    • John Furrier: Mark my words, if Amazon does not start thinking about the open-source equation, they could see a revolt that no one’s ever seen before in the tech industry. If you’re using open source to build a company to take territory from others, there will be a revolt.
    • @toddtauber: As we've become more sophisticated at quantifying things, we've become less willing to take risks. via @asymco
    • Resilience Thinking: Being efficient, in a narrow sense, leads to elimination of redundancies-keeping only those things that are directly and immediately beneficial. We will show later that this kind of efficiency leads to drastic losses in resilience.
    • Connor Gibson: By placing advertisements around the outside of your game (in the header, footer and sidebars) as well as the possibility video overlays it is entirely possible to earn up to six figures through this platform.
    • Google Analytics: And maybe, if nothing else, I guess it suggests that despite the soup du jour — huge seed/A rounds, massive valuations, binary outcomes— you can sometimes do alright by just taking less money and more time.
    • badger_bodger: I'm starting to get Frontend Fatigue Fatigue.
    • Steve Yegge: But now, thanks to Moore's Law, even your wearable Android or iOS watch has gigs of storage and a phat CPU, so all the decisions they made turned out in retrospect to be overly conservative.  And as a result, the Android APIs and frameworks are far, far, FAR from what you would expect if you've come from literally any other UI framework on the planet.  They feel alien. 
    • David Rosenthal: Again we see that expensive operations with cheap requests create a vulnerability that requires mitigation. In this case rate limiting the ICMP type 3 code 3 packets that get checked is perhaps the best that can be done.
    • @IAmOnDemand: Private on public cloud means the you can burst public/private workloads intothe public and shut down yr premise or... #reinvent
    • @allingeek: It isn’t “serverless" if you own the server/device. It is just a functional programing framework. #reinvent
    • brilliantcode: If you told me to use Azure two years ago I would've laughed you out of the room. But here I am in 2016, using Azure, using ASP.net + IIS on Visual Studio. that's some powerful shit and currently AWS has cost leadership and perceived switching cost as their edge.
    • seregine: Having worked at both places for ~4 years each, I would say Amazon is much more of a product company, and a platform is really a collection of compelling products. Amazon really puts customers first...Google really puts ideas (or technology) first.
    • api: Amazon seems to be trying to build a 100% proprietary global mainframe that runs everywhere.
    • Athas: No, it [Erlang] does not use SIMD to any great extent. Erlang uses message passing, not data parallelism. Erlang is for concurrency, not parallelism, so it would benefit little from these kinds of massively parallel hardware.
    • @chuhnk: @adrianco @cloud_opinion funnily those of us who've built platforms at various startups now think a cloud provider is the best place to be.
    • @jbeda: So the guy now in charge of building OSS communities at @awscloud says you should just join Amazon? Communities are built on diversity.
    • @JoeEmison: There's also an aspect of some of these AWS services where they only exist because of problems with other AWS services.
    • logmeout: Until bandwidth pricing is fixed rather than nickel and dimeing us to death; a lot of us will choose fixed pricing alternatives to AWS, GCP and Rackspace.
    • arcticfox: 100%. I can't stand it [AWS]. It's unlimited liability for anyone that uses their service with no way to limit it. If you were able to set hard caps, you could have set yours at like $5 or even $0 (free tier) and never run into that.
    • @edw519: I hate batch processing so much that I won't even use the dishwasher. I just wash, dry, and put away real time.
    • @CodeBeard: it could be argued that games is the last real software industry. Libraries have reduced most business-useful code to glue.
    • Gall's Law: A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
    • @mathewlodge: AWS now also designing its own ASICs for networking #Reinvent
    • @giano: From instances to services, AWS better than anybody else understood that use case specific wins over general purpose every day. #reinvent
    • @ben11kehoe: AWS hitting breadth of capability hard. Good counterpoint to recent "Google is 50% cheaper" news #reinvent
    • Michael E. Smith: But there are also positive effects of energized crowding. Urban economists and economic geographers have known for a long time that when businesses and industries concentrate themselves in cities, it leads to economies of scale and thus major gains in productivity. These effects are called agglomeration effects.
    • Andrew Huang: The inevitable slowdown of Moore’s Law may spell trouble for today’s technology giants, but it also creates an opportunity for the fledgling open-hardware movement to grow into something that potentially could be very big. 
    • Stratechery: This is Google’s bet when it comes to the enterprise cloud: open-sourcing Kubernetes was Google’s attempt to effectively build a browser on top of cloud infrastructure and thus decrease switching costs; the company’s equivalent of Google Search will be machine learning.

  • Just what has Amazon been up to?
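David Rosenthal's point above is that when a cheap request triggers an expensive check, the usual mitigation is to cap how many of those requests get checked. A common way to do that is a token bucket. The sketch below is a generic illustration, not how any particular kernel rate-limits ICMP type 3 code 3 packets; the class name, the rate and capacity values, and the injectable clock are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch).

    Tokens refill continuously at `rate` per second, up to `capacity`.
    Each request that would trigger the expensive check spends one token;
    when the bucket is empty the request is dropped instead of processed.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class FakeClock:
    """Deterministic clock so the example's output is reproducible."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
bucket = TokenBucket(rate=10, capacity=5, clock=clock)
burst = [bucket.allow() for _ in range(8)]   # a burst of 8 packets at t=0
print(burst)  # first 5 allowed, last 3 dropped

clock.t += 0.5                               # 0.5 s * 10 tokens/s refills 5 tokens
refilled = sum(bucket.allow() for _ in range(8))
print(refilled)  # 5
```

The capacity bounds the burst an attacker can force through at once, while the rate bounds sustained cost, which is exactly the trade-off the quote describes.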

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

How to Make Your Database 200x Faster Without Having to Pay More?

Mon, 11/28/2016 - 17:56

This is a guest repost by Barzan Mozafari, an assistant professor at the University of Michigan and an advisor to a new startup, snappydata.io, which recently launched an open source OLTP + OLAP database built on Spark.

Almost everyone these days is complaining about performance in one way or another. It’s not uncommon for database administrators and programmers to find themselves in a situation where their servers are maxed out or their queries are taking forever. This frustration is all too common. The solutions are varied. The most typical one is squinting at the query and blaming the programmer for not being smarter with it: maybe they could have used the right index or materialized view, or just rewritten the query in a better way. Other times, if your company is using a cloud service, you might have to spin up a few more nodes. In other cases, when your servers are overloaded with too many slow queries, you might set different priorities for different queries so that at least the more urgent ones (e.g., CEO queries) finish faster. When the DB does not support priority queues, your admin might even cancel your queries to free up resources for the more urgent ones.
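The query-priority workaround mentioned above can be pictured as a small admission queue sitting in front of the database: urgent queries are dequeued first, and queries of equal priority run in arrival order. This is a minimal sketch under that assumption; QueryScheduler and QueuedQuery are hypothetical names, not a feature of any particular database, and real systems that support prioritization expose it through their own workload-management settings.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedQuery:
    priority: int                      # lower number = more urgent
    seq: int                           # tie-breaker: FIFO within one priority
    sql: str = field(compare=False)    # not used for ordering


class QueryScheduler:
    """Hypothetical admission queue that hands out urgent queries first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, sql: str, priority: int) -> None:
        heapq.heappush(self._heap, QueuedQuery(priority, next(self._counter), sql))

    def next_query(self):
        # Returns None when nothing is waiting.
        return heapq.heappop(self._heap).sql if self._heap else None


sched = QueryScheduler()
sched.submit("SELECT ... -- nightly report", priority=5)
sched.submit("SELECT ... -- CEO dashboard", priority=0)
sched.submit("SELECT ... -- ad-hoc analysis", priority=5)
order = [sched.next_query() for _ in range(3)]
print(order)
# ['SELECT ... -- CEO dashboard', 'SELECT ... -- nightly report', 'SELECT ... -- ad-hoc analysis']
```

Note that this only reorders waiting work; it does not make any individual query cheaper, which is why the post goes looking for other techniques.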

No matter which one of these experiences you’ve had, you’re probably familiar with the pain of waiting for slow queries, paying for more cloud instances, or buying faster and bigger servers. Most people are familiar with traditional database tuning and query optimization techniques, which come with their own pros and cons, so we’re not going to talk about those here. Instead, in this post, we’re going to talk about more recent techniques that are far less well known and in many cases lead to much better performance and cost savings.

To start, consider these scenarios:

Categories: Architecture

Stuff The Internet Says On Scalability For November 25th, 2016

Fri, 11/25/2016 - 18:40

Hey, it's HighScalability time:

 

Margaret Hamilton was honored with the Presidential Medal of Freedom for writing the Apollo guidance software. Oddly, she's absent from "best programmers of all time" lists.

 

If you like this sort of Stuff then please support me on Patreon.
  • 98 seconds: time before a new camera was infected with malware; zeptosecond: smallest fragment of time ever measured; 50%: how much cheaper Google Cloud is than AWS; 50%: share of the world that is online;

  • Quotable Quotes:
    • @skamille: Sometimes I think that human societies just weren't meant to scale to billions of people sharing arbitrary information
    • @joshk0: At @GetArbor we use #kubernetes to host a 30K QPS ad-tech serving platform. Maybe smaller than Pokemon Go but nothing to sneeze at.
    • HFT Guy: 2016 should be remembered as the year Google became a better choice than AWS. If 50% cheaper is not a solid argument, I don’t know what is.
    • Glenn Marcus: Hybrid [Progressive Web App] development takes 260% more effort man hours than Native development.
    • Bruce Schneier: I want to suggest another way of thinking about it in that everything is now a computer: This is not a phone. It’s a computer that makes phone calls. A refrigerator is a computer that keeps things cold. ATM machine is a computer with money inside. Your car is not a mechanical device with a computer. It’s a computer with four wheels and an engine… And this is the Internet of Things, and this is what caused the DDoS attack we’re talking about.
    • Bruce Schneier: I don’t like this. I like the world where the internet can do whatever it wants, whenever it wants, at all times. It’s fun. This is a fun device. But I’m not sure we can do that anymore.
    • southpolesteve: [Lambda] is cheaper and simpler to operate than our previous ec2+Opsworks setup. We get code to production faster and spend more time on actual business problems vs infrastructure problems.
    • Carlo Rovelli: Meaning = Information + Evolution
    • chadscira: We have been using Rancher as well... It allowed us to move away from DO and AWS. Now most of our infra is from OVH :). It's been smooth sailing. Because of massive costs savings we were able to just reinvest it in our own redundancy. Also 12-factor apps are pretty damn resilient.
    • Fiahil: Making separate [Google] accounts might not be enough considering they allegedly banned accounts related to each others by recovery address. Why would you think they would not do the same with accounts sharing occasionally the same laptop, the same ip address, and the same first and last name ?
    • @swardley: Arghhh, one of those "can IBM beat Amazon?" .... the answer has three parts 1) the game has become harder  2) yes it could  3) no it won't
    • fest: Replaying the sensor inputs and evaluating new estimated state is a really good way of debugging failures (because you can't just stop the system mid-air and evaluate internal state). It also helps with regression test suite and trying out new algorithms quickly.
    • @Tibocut: «Institutions prefer to have trillions sitting still than redistributing them towards opportunities» @asymco https://youtu.be/nD8QszyiVTY  at 2h45
    • @AlanaMassey: A gathering of two or more average looking white men is referred to by biologists as "a podcast."
    • @RyanHoliday: "How slow men are in matters when they believe they have time and how swift they are when necessity drives them to it." Machiavelli
    • agataygurturk: We use route53 health checks to invoke API gateway and thus the backend Lambda.
    • Paul Biggar: Yeah, BDSM. It’s San Francisco. Everyone’s into distributed systems and BDSM.
    • @mims: Since the Apollo program, we've privatized the R&D that drives all innovation. That might be a problem.
    • Backblaze:  We have fewer drives because over the last quarter we swapped out more than 3,500 2 terabyte (TB) HGST and WDC hard drives for 2,400 8 TB Seagate drives. So we have fewer drives, but more data.
    • @lee_newcombe: Fun finding from my talk earlier.  40 attendees: 37 on cloud, 3 about to start.  Only one trying serverless.  There's your opportunity folks
    • Resilience Thinking: In resilient systems everything is not necessarily connected to everything else. Overconnected systems are susceptible to shocks and they are rapidly transmitted through the system. A resilient system opposes such a trend; it would maintain or create a degree of modularity.

  • Security expert Rob Graham with a stunning blow-by-blow Twitter story of a botnet infecting his brand-new security camera. The whole process starts within 98 seconds of putting the camera on the internet, which is far faster than an ordinary mortal can configure the device to be secure. This was a cheap camera with good reviews. At some point we need to think of all this too-cheap equipment as being funded by a Botnet Subsidy. It's almost too much of a coincidence that all these cheap devices, meant to be bought like candy in the mass consumer market, have such obviously poor security. Maybe it's not an accident? See also, Pre-installed Backdoor On 700 Million Android

  • Their profit margin is your opportunity. With cloud price discounts fading (The Era of Cloud Price Discounts Is Fading) and the cost of metal continuing to decrease, is now a good time to consider moving to bare-metal, on-premises infrastructure? The incentives are coming into alignment. In Kubernetes: Finally...A True Cloud Platform, Sam Ghods, co-founder of Box, makes a good case for Kubernetes as the only truly portable infrastructure option.

  • This is both pure genius and a sure sign of the apocalypse. Exclusive Interview: How Jared Kushner Won Trump The White House. Democrats may have thought they had a technological lead because of the previous presidential election, but it turns out they were fighting the last war. Technology changed and they did not. Old: targeting, organizing and motivating voters. New: Moneyball meets Social Media with a twist of message tailoring, sentiment manipulation and machine learning. If this presidential election could be represented as a battle between Peter Thiel and Eric Schmidt: Thiel triumphed. Traditional microtargeting is almost quaint. Now, using Facebook's ability to target users with dark posts (newsfeed messages seen by no one except the users being targeted), each user can be shown a world specifically tailored to push and prod their particular buttons. For an explanation see The Secret Agenda of a Facebook Quiz. That's why it's both genius and apocalyptic. Things will never be the same.
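fest's point in the quotes above, that replaying recorded sensor inputs is a good way to debug failures you can't stop mid-air, depends on the state update being a pure function of the previous state and the input. The sketch below illustrates that with a toy exponential-smoothing estimator; the Estimator class, the alpha value and the recorded readings are all hypothetical stand-ins for a real flight log and filter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Estimator:
    """Toy state estimator: exponential smoothing of one sensor (hypothetical).

    The update depends only on (previous state, input), so feeding the same
    recorded inputs always reproduces the same sequence of states.
    """
    alpha: float = 0.5
    estimate: float = 0.0

    def update(self, reading: float) -> float:
        self.estimate += self.alpha * (reading - self.estimate)
        return self.estimate


def replay(log: List[Tuple[float, float]], alpha: float = 0.5) -> List[float]:
    """Re-run the estimator offline over a recorded (timestamp, reading) log."""
    est = Estimator(alpha=alpha)
    return [est.update(reading) for _, reading in log]


# Inputs recorded during the "flight"; a real system would load them from a log file.
recorded = [(0.0, 10.0), (0.1, 12.0), (0.2, 11.0), (0.3, 40.0)]  # 40.0 is a glitchy reading
states = replay(recorded)
print(states)  # [5.0, 8.5, 9.75, 24.875]

# Replaying is deterministic, so a recorded run can double as a regression test.
assert replay(recorded) == states
```

Swapping in a new algorithm and replaying the same log is also how the quote's "trying out new algorithms quickly" would work in practice.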

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

IT Hare: Ultimate DB Heresy: Single Modifying DB Connection. Part I. Performance

Tue, 11/22/2016 - 17:56

Sergey Ignatchenko continues his excellent book series with a new chapter on databases. This is a guest repost.

The idea of a single write connection is used extensively in the post. Since it's defined elsewhere, I asked Sergey for a definition so the article would make a little more sense:

As for single-write-connection: I mean that there is just one app (named "DB Server" in the article) with a single DB connection that is allowed to issue modifying statements (UPDATEs/INSERTs/DELETEs). This enables several important simplifications. First, all the fundamentally untestable concurrency issues (such as a missing SELECT FOR UPDATE, or deadlocks) are eliminated entirely. Second, the whole thing becomes deterministic, which is a significant help in figuring out bugs (even simple text logging has been seen to make the system quite debuggable, including post-mortem). Last but not least, this monopoly on updates can be used in quite creative ways to improve performance, in particular to keep an always-coherent app-level cache, which can be something like 100x-1000x more efficient than going to the DB.
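To make the single-modifying-connection idea concrete, here is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for the real RDBMS and a hypothetical players table. It is not Sergey's implementation: the DBServer class and its methods are illustrative names. One worker thread owns the only connection that issues UPDATEs/INSERTs; requests are queued and applied strictly in order, and because nothing else can modify the table, the in-memory cache can be updated right after each commit and stay coherent.

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future


class DBServer:
    """Sketch of a "DB Server" with a single modifying connection (hypothetical names)."""

    def __init__(self, path=":memory:"):
        self._requests = queue.Queue()
        self._cache = {}                        # player_id -> balance, kept coherent by the writer
        self._ready = threading.Event()
        self._worker = threading.Thread(target=self._run, args=(path,), daemon=True)
        self._worker.start()
        self._ready.wait()                      # don't accept requests before the schema exists

    def _run(self, path):
        conn = sqlite3.connect(path)            # created and used only by this thread
        conn.execute("CREATE TABLE IF NOT EXISTS players "
                     "(id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.commit()
        self._ready.set()
        while True:
            item = self._requests.get()         # requests are applied one at a time, in order
            if item is None:
                break
            (player_id, delta), fut = item
            try:
                with conn:                      # COMMIT on success, ROLLBACK on error
                    conn.execute("INSERT OR IGNORE INTO players (id, balance) VALUES (?, 0)",
                                 (player_id,))
                    conn.execute("UPDATE players SET balance = balance + ? WHERE id = ?",
                                 (delta, player_id))
                    (balance,) = conn.execute("SELECT balance FROM players WHERE id = ?",
                                              (player_id,)).fetchone()
                self._cache[player_id] = balance  # update cache only after a successful commit
                fut.set_result(balance)
            except Exception as exc:
                fut.set_exception(exc)
        conn.close()

    def add_to_balance(self, player_id, delta):
        fut = Future()
        self._requests.put(((player_id, delta), fut))
        return fut.result()                     # block until the writer has committed

    def cached_balance(self, player_id):
        # Safe without going to the DB: no other connection can modify players.
        return self._cache.get(player_id)

    def stop(self):
        self._requests.put(None)
        self._worker.join()


server = DBServer()
server.add_to_balance(42, 100)
server.add_to_balance(42, -30)
final = server.cached_balance(42)
print(final)  # 70
server.stop()
```

Serializing writes this way gives up write parallelism in exchange for the determinism and cache coherence described above; whether that trade pays off is what the rest of the article examines.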

Now that we have finished with all the preliminaries, we can get to the interesting part: implementing our transactional DB and DB Server. We already mentioned implementing DB Server briefly in Chapter VII, but now we need a much more detailed discussion of this all-important topic.

First of all, let’s reiterate what we’re speaking about. A transactional/operational DB is the place where all the automated decisions about your game (stock exchange, bank, etc.) are made.

It stores things such as player accounts, with all their persistent attributes etc. etc.; it also stores communications related to payment processing, and so on, and so forth. And “DB Server” is our app handling access to DBMS (as noted in Chapter VII, I am firmly against having SQL statements issued directly by your Game Servers/Game Logic, so an intermediary such as DB Server is necessary).

As discussed above, ACID properties tend to be extremely important for a transactional/operational DB. We don’t want money (or that artifact which sells for a real $20K on eBay) to be lost or duplicated. For this and some other reasons, we’ll be speaking about SQL databases for our transactional/operational DB. While it is possible to use NoSQL for a transactional/operational DB, achieving strict guarantees is usually difficult, in particular because most NoSQL DBs out there lack multi-object ACID transactions (see the discussion in the [[TODO]] section above).
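Here is why multi-object transactions matter for "not lost or duplicated": moving value between two accounts touches two rows, and both changes must commit together or not at all. The sketch below uses Python's built-in sqlite3 as a stand-in for whatever SQL database you actually run; the accounts table and the transfer() helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction.

    Both UPDATEs commit together or neither does, so value is never
    created (credit without debit) or destroyed (debit without credit).
    """
    try:
        with conn:  # COMMIT if the block succeeds, ROLLBACK if anything raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Debit second on purpose: if the CHECK constraint fails here,
            # the credit above has to be rolled back as well.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates CHECK, whole transfer rolled back
print(ok, failed, balances(conn))  # True False {'alice': 70, 'bob': 30}
```

Without a transaction spanning both rows, the failed transfer would have left bob with an extra 500 that nobody paid for, which is precisely the duplication the text warns about.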

And now, we’re finally ready to start discussing interesting things.

Multi-Connection DB Access
Categories: Architecture

Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Tue, 11/22/2016 - 17:56

Who's Hiring?
  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a .NET-native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high-value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost-effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30-day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below...

Categories: Architecture

Stuff The Internet Says On Scalability For November 18th, 2016

Fri, 11/18/2016 - 17:56

Hey, it's HighScalability time:

 

Now you don't have to shrink yourself to see inside a computer. Here's the Bighex machine, a fully functional 16-bit computer that covers over 26 square feet!

 

If you like this sort of Stuff then please support me on Patreon.
  • 50%: drop in latency and CPU load after adopting PHP7 at Tumblr; 4,425: satellites for Skynet; 13%: brain connectome shared by identical twins; 20: weird & wonderful datasets for machine learning; 200 Gb/sec: InfiniBand data rate; 15 TB: data generated nightly by the Large Synoptic Survey Telescope; 17.24%: top comments on Reddit that were also first comments; $120 million: estimated cost of developing Kubernetes; 3-4k: proteins involved in the intracellular communication network;

  • Quotable Quotes:
    • Westworld: Survival is just another loop.
    • Leo Laporte: All bits should be treated equally. 
    • Paul Horner: Honestly, people are definitely dumber. They just keep passing stuff around. Nobody fact-checks anything anymore
    • @WSJ: "A conscious effort by a nation-state to attempt to achieve a specific effect" NSA chief on WikiLeaks 
    • encoderer: For the saas business I run, Cronitor, aws costs have consistently stayed around 10% total MRR. I think there are a lot of small and medium sized businesses who realize a similar level of economic utility.
    • @joshtpm: 1: Be honest: Facebook and Twitter maxed out election frenzy revenues and cracked down once the cash was harvested. Also once political ...
    • boulos: As a counter argument: very few teams at Google run on dedicated machines. Those that do are enormous, both in the scale of their infrastructure and in their team sizes. I'm not saying always go with a cloud provider, I'm reiterating that you'd better be certain you need to.
    • Renegade Facebook Employees: Sadly, News Feed optimizes for engagement. As we've learned in this election, bullshit is highly engaging. A bias towards truth isn't an impossible goal.
    • Russ White: The bottom line is this—don’t be afraid to use DNS for what it’s designed for in your network...We need to learn to treat DNS like it’s a part of the IP stack, rather than something that “only the server folks care about,” or “a convenience for users we don’t really take seriously for operations.”
    • Wizart_App: It's always about speed – never about beauty.
    • Michael Zeltser: MapReduce is just too low level and too dumb. Mixing complex business logic with MapReduce low level optimization techniques is asking too much. 
    • Michael Zeltser: One thing that always bugged me in MapReduce is its inability to reason about my data as a dataset. Instead you are forced to think in single key-value pair, small chunk, block, split, or file. Coming from SQL, it felt like going backwards 20 years. Spark has solved this perfectly.  
    • Guillaume Sachot: I can confirm that I've seen high availability appliances fail more often than non-clustered ones. And it's not limited to firewalls that crash together due to a bug in session sharing, I have noticed it for almost anything that does HA: DRBD instances, Pacemaker, shared filesystems...
    • Albert-Laszlo Barabasi: The bottom line is: Brother, never give up. When you give up, that’s when your creativity ends
    • SpaceX: According to a transcript received by Space News, he argued that the supercooled liquid oxygen that SpaceX uses as propellant actually became so cold that it turned into a solid. And that’s not supposed to happen.
    • Murat: Safety is a system-level property, unit testing of components is not enough.
    • @alexjc: 1/ As deep learning evolves as a discipline, it's becoming more about architecting highly complex systems that leverage data & optimization.
    • btgeekboy: Indeed. If there's one thing I've learned in >10 years of building large, multi-tenant systems, it's that you need the ability to partition as you grow. Partitioning eases growth, reduces blast radius, and limits complexity.
    • @postwait: Monitoring vendors that say they support histograms and only support percentiles are lying to their customers. Full stop. #NowYouKnow
    • @crucially: Fastly hit 5mm request per seconds tonight with a cache hit ratio of 96% -- proud of the team.
    • Rick Webb: Just because Silicon Valley has desperately wanted to believe for twenty years that communities can self-police does not make it true. 
    • Cybiote: Humans can additionally predict other agents and other things about the world based on intuitive physics. This is why they can get on without the huge array of sensors and cars cannot. Humans make up for the lack of sensors by being able to use the poor quality data more effectively. To put this in perspective, 8.75 megabits / second is estimated to pass through the human retina but only on the order of a 100 bits is estimated to reach conscious attention.
    • David Rand: What I found was consistent with the theory and the initial results: in situations where there're no future consequences, so it's in your clear self-interest to be selfish, intuition leads to more cooperation than deliberation.   
    • SpaceX: With deployment of the first 800 satellites, SpaceX will be able to provide widespread U.S. and international coverage for broadband services. Once fully optimized through the Final Deployment, the system will be able to provide high bandwidth (up to 1 Gbps per user), low latency broadband services for consumers and businesses in the U.S. and globally.
    • Steve Gibson: Anyone can make a mistake [regarding Pixel ownage], and Google is playing security catch up. But what they CAN and SHOULD be proud of is that they had the newly discovered problem patched within 24 hours!
    • dragonnyxx: Calling a 10,000 line program a "large project" is like calling dating someone for a week a "long-term relationship".
    • Brockman: I have three friends: confusion, contradiction, and awkwardness. That’s how I try to meander through life. Make it strange.
    • Martin Sústrik: In this particular case, almost everybody will agree that adding the abstraction was not worth it. But why? It was a tradeoff between code duplication and increased level of abstraction. But why would one decide that the well known cost of code duplication is lower than somewhat fuzzy "cost of abstraction"?
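@postwait's point rests on one property: percentiles from separate sources cannot be combined, but histograms can. The minimal Python sketch below illustrates this. The bucket bounds, the two servers, and their latencies are made up. Averaging the two per-server p90 values gives a number that matches neither server's traffic, while merging the bucket counts and reading the p90 from the merged histogram recovers the p90 of all requests (to bucket resolution).

```python
# Hypothetical fixed bucket upper bounds, in milliseconds.
# Samples above the last bound are simply dropped in this toy.
BOUNDS = [10, 50, 100, 500, 1000]

def to_histogram(samples_ms):
    """Count each sample into the first bucket whose upper bound is >= the sample."""
    hist = [0] * len(BOUNDS)
    for s in samples_ms:
        for i, bound in enumerate(BOUNDS):
            if s <= bound:
                hist[i] += 1
                break
    return hist

def merge(*hists):
    """Histograms merge losslessly: just add the counts bucket by bucket."""
    return [sum(counts) for counts in zip(*hists)]

def percentile(hist, p):
    """Upper bound of the bucket holding the nearest-rank p-th percentile (p is an int, 0-100)."""
    total = sum(hist)
    rank = max(1, (p * total + 99) // 100)  # integer ceil(p * total / 100)
    seen = 0
    for bound, count in zip(BOUNDS, hist):
        seen += count
        if seen >= rank:
            return bound
    raise ValueError("empty histogram")

server_a = to_histogram([10] * 100)              # all fast
server_b = to_histogram([10] * 50 + [500] * 50)  # half slow

print("average of per-server p90s:", (percentile(server_a, 90) + percentile(server_b, 90)) / 2)
print("p90 of merged histogram:", percentile(merge(server_a, server_b), 90))
```

Here the averaged value is 255 ms, a latency no request actually had, while the merged histogram reports 500 ms, which is what 1 in 10 of all requests really experienced.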

  • Biomedical engineering might be an area where a lot of tech people interested in real-time monitoring and control at scale could help. Hr2: Wireless Spinal Tech, Climate Policy, Moon Impact. Researchers want to use wireless technology to record 100k+ neurons simultaneously, 24x7, for long periods of time. The goal is to use this data to control high-dimensional systems, as in reaching and grasping, where the shoulder, elbow, hand, wrist, and fingers must all work together in real time. Sound familiar?

  • Making the Switch from Node.js to Golang. Digg switched an S3-heavy service from Node to Go, and: Our average response time from the service was almost cut in half, our timeouts (in the scenario that S3 was slow to respond) were happening on time, and our traffic spikes had minimal effects on the service...With our Golang upgrade, we are easily able to handle 200 requests per minute and 1.5 million S3 item fetches per day. And those 4 load-balanced instances we were running Octo on initially? We’re now doing it with 2.

  • Not a lie. The best explanation of resilience. Resilience is how you maintain the self-organizing capacity of a system. Great explanation. The way you maintain the resilience of a system is by letting it probe its boundaries. The only way to make a forest resilient to fire is to burn it. Efficiency is riding as close as possible to the boundary by using feedback to keep the system self-organizing.

  • Facebook does a lot of work making their mobile apps work over poor networks. One change they are making is client-side ranking, to more efficiently show people stories in feed. Previously, all story ranking occurred on the server, and stories were paged down to the device and displayed in that order. The problem with this approach is that a story's rank could change while its media was being loaded. Now a pool of stories is kept on the client, and as new stories are added they are re-ranked and shown to users in rank order. This approach adapts well to slow networks because slow-loading content is temporarily down-ranked while it loads.
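The client-side ranking idea can be sketched in a few lines of Python. This is not Facebook's code. The `Story` fields, the server-provided score, and the multiplicative `LOADING_PENALTY` are hypothetical stand-ins, chosen only to show a client-side pool that is re-ranked whenever it changes, with still-loading stories temporarily pushed down rather than blocking the feed.

```python
from dataclasses import dataclass

# Hypothetical down-ranking factor for stories whose media has not finished loading.
LOADING_PENALTY = 0.5

@dataclass
class Story:
    story_id: str
    server_score: float        # assumed rank signal sent down by the server
    media_loaded: bool = False

class ClientFeedPool:
    """Pool of stories kept on the device and re-ranked on every read."""

    def __init__(self):
        self.stories = {}

    def add(self, story):
        self.stories[story.story_id] = story

    def mark_loaded(self, story_id):
        self.stories[story_id].media_loaded = True

    def effective_score(self, story):
        # Slow-loading content is down-ranked while it loads, not dropped.
        return story.server_score * (1.0 if story.media_loaded else LOADING_PENALTY)

    def ranked(self):
        return sorted(self.stories.values(), key=self.effective_score, reverse=True)

pool = ClientFeedPool()
pool.add(Story("video", server_score=0.9))                    # heavy media, still loading
pool.add(Story("text", server_score=0.6, media_loaded=True))
print([s.story_id for s in pool.ranked()])  # ['text', 'video']  (0.6 beats 0.9 * 0.5)
pool.mark_loaded("video")
print([s.story_id for s in pool.ranked()])  # ['video', 'text']
```

Re-ranking on read keeps the sketch simple. A real client would more likely re-rank incrementally as stories arrive or finish loading.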

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

The Story of Batching to Streaming Analytics at Optimizely

Wed, 11/16/2016 - 17:56

Our mission at Optimizely is to help decision makers turn data into action. This requires us to move data with speed and reliability. We track billions of user events, such as page views, clicks and custom events, on a daily basis. Providing our customers with immediate access to key business insights about their users has always been our top priority. Because of this, we are constantly innovating on our data ingestion pipeline.

In this article we will introduce how we transformed our data ingestion pipeline from batching to streaming to provide our customers with real-time session metrics.

Motivations 

Unification. Previously, we maintained two data stores for different use cases: HBase for computing Experimentation metrics and Druid for calculating Personalization results. These two systems were developed with distinct requirements in mind:

Experimentation            | Personalization
---------------------------|------------------------------
Instant event ingestion    | Delayed event ingestion OK
Query latency in seconds   | Query latency in subseconds
Visitor-level metrics      | Session-level metrics

As our business requirements evolved, however, things quickly became difficult to scale. Maintaining a Druid + HBase Lambda architecture (see below) to satisfy these business needs became a technical burden for the engineering team. We needed a solution that would reduce backend complexity and increase development productivity. More importantly, a unified counting infrastructure creates a generic platform for many of our future product needs.

Consistency. As mentioned above, the two counting infrastructures provide different metrics and computational guarantees. For example, Experimentation results show you the number of visitors who visited your landing page, whereas Personalization shows you the number of sessions instead. We want to bring consistent metrics to our customers and support both types of statistics across our products.

Real-time results. Our session-based results are computed using MapReduce (MR) jobs, which can be delayed by up to hours after the events are received. A real-time solution will provide our customers with a more up-to-date view of their data.
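Why visitor counts and session counts diverge can be illustrated in Python. The sketch counts both from the same event stream. The 30-minute inactivity timeout is an assumption (a common web-analytics convention), not Optimizely's documented session definition, and the events are invented.

```python
from collections import defaultdict

SESSION_TIMEOUT_S = 30 * 60  # assumed inactivity gap that starts a new session

def count_visitors_and_sessions(events):
    """events: iterable of (visitor_id, timestamp_seconds) for one page or experiment."""
    times_by_visitor = defaultdict(list)
    for visitor_id, ts in events:
        times_by_visitor[visitor_id].append(ts)

    sessions = 0
    for times in times_by_visitor.values():
        times.sort()
        sessions += 1  # a visitor's first event always opens a session
        for prev, cur in zip(times, times[1:]):
            if cur - prev > SESSION_TIMEOUT_S:
                sessions += 1  # gap too long: this event starts a new session
    return len(times_by_visitor), sessions

events = [
    ("alice", 0), ("alice", 600),  # same session (10 minutes apart)
    ("alice", 7200),               # new session (110 minutes after the previous event)
    ("bob", 100),
]
print(count_visitors_and_sessions(events))  # (2, 3)
```

The same four events yield 2 visitors but 3 sessions. That is the kind of mismatch that appears when Experimentation counts visitors and Personalization counts sessions.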

Druid + HBase

In our earlier posts, we introduced our backend ingestion pipeline and how we use Druid and MR to store transactional stats based on user sessions. The biggest benefit we get from Druid is low-latency results at query time. However, it comes with its own set of drawbacks. For example, since segment files are immutable, it is impossible to incrementally update the indexes. As a result, we are forced to reprocess user events within a given time window whenever we need to fix certain data issues, such as out-of-order events. In addition, we had difficulty scaling the number of dimensions and dimension cardinality, and queries spanning long periods of time became expensive.

On the other hand, we also use HBase for our visitor-based computation. We write each event into an HBase cell, which gives us maximum flexibility in the kinds of queries we can run. When a customer needs to find out “how many unique visitors have triggered an add-to-cart conversion”, for example, we scan over the range of the dataset for that experiment. Since events are pushed into HBase (through Kafka) in near real time, the data generally reflects the current state of the world. However, our current table schema does not aggregate any of the metadata associated with each event. This metadata includes a generic set of information, such as browser types and geolocation details, as well as customer-specific tags used for customized data segmentation. The redundancy of this data prevents us from supporting a large number of custom segmentations, as it increases our storage cost and query scan time.
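A Python toy shows the "scan over the range of the dataset for that experiment" query pattern. `SortedKV` is a hypothetical stand-in for a sorted key-value table such as HBase; it is not the HBase client API. The row-key layout (experiment id first, so one experiment's events are contiguous) is also an assumption, not Optimizely's actual schema.

```python
import bisect

class SortedKV:
    """Toy sorted key-value table; keys are kept in lexicographic order like HBase row keys."""

    def __init__(self):
        self.keys = []
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        for key in self.keys[lo:hi]:
            yield key, self.values[key]

def row_key(experiment_id, timestamp, visitor_id):
    # Hypothetical layout: experiment | zero-padded timestamp | visitor.
    return f"{experiment_id}|{timestamp:012d}|{visitor_id}"

def unique_converters(table, experiment_id, event_name):
    """Count distinct visitors who fired event_name, touching only this experiment's rows."""
    start = experiment_id + "|"
    stop = experiment_id + "}"  # '}' sorts right after '|' in ASCII, ending the prefix range
    visitors = set()
    for key, event in table.scan(start, stop):
        if event == event_name:
            visitors.add(key.rsplit("|", 1)[1])
    return len(visitors)

table = SortedKV()
table.put(row_key("exp1", 1, "alice"), "add-to-cart")
table.put(row_key("exp1", 2, "alice"), "add-to-cart")
table.put(row_key("exp1", 3, "bob"), "page-view")
table.put(row_key("exp2", 1, "carol"), "add-to-cart")
print(unique_converters(table, "exp1", "add-to-cart"))  # 1
```

The scan cost grows with the number of event rows in the range, which is one way to see why storing redundant per-event metadata increases query scan time.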

SessionDB 

How Urban Airship Scaled to 2.5 Billion Notifications During the U.S. Election

Mon, 11/14/2016 - 17:56

This is a guest post by Urban Airship. Contributors: Adam Lowry, Sean Moran, Mike Herrick, Lisa Orr, Todd Johnson, Christine Ciandrini, Ashish Warty, Nick Adlard, Mele Sax-Barnett, Niall Kelly, Graham Forest, and Gavin McQuillan

Urban Airship is trusted by thousands of businesses looking to grow with mobile. Urban Airship is a seven-year-old SaaS company with a freemium business model, so you can try it for free. For more information, visit www.urbanairship.com. Urban Airship now averages more than one billion push notifications delivered daily. This post highlights Urban Airship notification usage for the 2016 U.S. election, exploring the architecture of the system--the Core Delivery Pipeline--that delivers billions of real-time notifications for news publishers.

2016 U.S. Election

In the 24 hours surrounding Election Day, Urban Airship delivered 2.5 billion notifications—its highest daily volume ever. This is equivalent to 8 notifications per person in the United States or 1 notification for every active smartphone in the world. While Urban Airship powers more than 45,000 apps across every industry vertical, analysis of the election usage data shows that more than 400 media apps were responsible for 60% of this record volume, sending 1.5 billion notifications in a single day as election results were tracked and reported.

 

Notification volume was steady and peaked when the presidential election concluded.


Stuff The Internet Says On Scalability For November 11th, 2016

Fri, 11/11/2016 - 17:56

Hey, it's HighScalability time:

 

Hacking recognition systems with fashion.

 

If you like this sort of Stuff then please support me on Patreon.
  • 9 teraflops: PC GPU performance for VR rendering; 1.75 million requests per second: DDoS attack from cameras; 5GB/mo: average data consumption in the US; ~59.2GB: size of Wikipedia corpus; 50%: slower LTE within the last year; 5.4 million: entries in Microsoft Concept Graph; 20 microseconds: average round-trip latencies between 250,000 machines using direct FPGA-to-FPGA messages (Microsoft); 1.09 billion: Facebook daily active mobile users; 300 minutes: soaring time for an AI controlled glider; 82ms: latency streaming game play on Azure; 

  • Quotable Quotes:
    • AORTA: Apple’s service revenue is now consistently greater than iPad and Mac revenue streams making it the number two revenue stream behind the gargantuan iPhone bucket.
    • @GeertHub: Apple R&D budget: $10 billion. NASA science budget: $5 billion. One explored Pluto, the other made a new keyboard.
    • Steve Jobs: tie all of our products together, so we further lock customers into our ecosystem
    • @moxie: I think these types of posts are also the inevitable result of people overestimating our organizational capacity based on whatever limited success Signal and Signal Protocol have had. It could be that the author imagines me sitting in a glass skyscraper all day, drinking out of champagne flutes, watching over an enormous engineering team as they add support for animated GIF search as an explicit fuck you to people with serious needs.
    • @jdegoes: Devs don't REALLY hate abstraction—they hate obfuscation. Abstraction discards irrelevant details, retaining an essence governed by laws.
    • @ewolff: There are no stateless applications. It just means state is on the client or in the database.
    • @mjpt777: Pushing simple logic down into the memory controllers is the only way to overcome the bandwidth bottleneck. I'm glad to see it begin.
    • @gigastacey: Moral of @0xcharlie car hacking talk appears to be don't put actuators on the internet w/out thinking about security. #ARMTechCon
    • @markcallaghan: When does MySQL become too slow for analytics? Great topic, maybe hard to define but IO-bound index nested loops join isn't fast.
    • @iAnimeshS: A year's computing on the old Macintosh portable can now be processed in just 5 seconds on the #NewMacBookPro. #AppleEvent
    • @neil_conway: OH: "My philosophy for writing C++ is the same as for using Git: 'I stay in my damn lane.'"
    • qnovo: Yet as big as this figure sounds, and it is big, only 3 gallons of gasoline (11 liters) pack the same amount of energy. Whereas the Tesla battery weighs about 1300 lbs (590 kg), 3 gallons of gasoline weigh a mere 18 lbs (8 kg). This illustrates the concept of energy density: a lithium-ion battery is 74X less dense than gasoline.
    • @kelseyhightower: I'm willing to bet developers spend more time reverse engineering inadequate API documentation than implementing business logic.
    • @sgmansfield: OH: our ci server continues to run out of inodes because each web site uses ~140,000 files in node_modules
    • @relix42: “We use maven to download half the internet and npm to get the other half…”
    • NEIL IRWIN: economic expansions do not die of old age—an old expansion like our current one is not likelier to enter a recession in the next year than a young expansion.
    • @popey: I am in 6 slack channels. 1.5GB RAM consumed by the desktop app. In 100+ IRC channels. 25MB consumed by irssi. The future is rubbish.
    • @SwiftOnSecurity: The only way to improve the security of these IoT devices is market forces. They must not be allowed to profit without fear of repercussions
    • The Ancient One: you think you know how the world works. What if I told you, through the mystic arts, we harness energy and shape reality?
    • @natpryce: "If you have four groups working on a compiler*, you'll get a four-pass compiler" *and you describe the problem in terms of passes
    • @PatrickMcFadin: Free cloud APIs are closing up as investors start looking for a return. Codebender is closing down 
    • We have quotes the likes of which even god has never seen. Read the full article to see them all.

  • The true program is the programmer. Ralph Waldo Emerson: “The true poem is the poet's mind; the true ship is the ship-builder. In the man, could we lay him open, we should see the reason for the last flourish and tendril of his work; as every spine and tint in the sea-shell preexist in the secreting organs of the fish.”

  • Who would have thought something like this was possible? A Regex that only matches itself. As regexes go it's not even all that weird looking. One of the comments asks for a proof of why it works. That would be interesting.

  • Docker in Production: A History of Failure. Generated a lot of heat and some light. Good comments on HN and in a couple of threads on reddit. A lot of the comments say yes, there are problems with Docker, but end up saying something like...tzaman: That's odd, we've been using Docker for about a year in development and half a year in production (on Google Container engine / Kubernetes) and haven't experienced any of the panics, crashes yet (at least not any we could not attribute as a failure on our end).



Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Tue, 11/08/2016 - 18:48

Who's Hiring?
  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below...


The QuickBooks Platform

Mon, 11/07/2016 - 17:56

This is a guest post by Siddharth Ram – Chief Architect, Small Business. Siddharth_ram@intuit.com.

The QuickBooks ecosystem is the largest small business SaaS product. The QuickBooks Platform supports bookkeeping, payroll and payment solutions for small businesses, their customers and accountants worldwide. Since QuickBooks is also a compliance & tax filing platform, consistency in reporting is extremely important. Financial reporting requires flexibility in queries: a given report may have dozens of different dimensions that can be tweaked. Collaboration requires multiple edits by employees, accountants and business owners at the same time, leading to potential conflicts. All this leads to interesting scaling problems at Intuit.

Solving for scalability requires thinking on multiple time horizons and axes. Scaling is not just about scaling software – it is also about people scalability, process scalability and culture scalability. All these axes are actively worked on at Intuit. Our goal with employees is to create an atmosphere that allows them to do the best work of their lives.

Background

Future Tidal Wave of Mobile Video

Thu, 10/20/2016 - 17:23
In this article I will examine the growing trends of Internet mobile video, how consumer behaviour is rapidly adapting to a world of ‘always on content’, and the impact on the underlying infrastructure.

Gone Fishin'

Thu, 10/20/2016 - 04:07

Well, not exactly Fishin', but I'll be on a month long vacation starting today. I won't be posting (much) new content, so we'll all have a break. Disappointing, I know. Please use this time for quiet contemplation and other inappropriate activities. See you on down the road...


Datanet: a New CRDT Database that Lets You Do Bad Bad Things to Distributed Data

Mon, 10/17/2016 - 17:44

 

We've had databases targeting consistency. These are your typical RDBMSs. We've had databases targeting availability. These are your typical NoSQL databases.

If you're using your CAP decoder ring you know what's next...what databases do we have that target making concurrency a first class feature? That promise to thrive and continue to function when network partitions occur?

Not many, but we have a brand new concurrency-oriented database: Datanet, a P2P replication system that uses CRDT algorithms to allow multiple concurrent actors to modify data and then automatically & sensibly resolve modification conflicts.

Datanet is the creation of Russell Sullivan. Russell spent over three years hidden away in his mad scientist lair researching, thinking, coding, refining, and testing Datanet. You may remember Russell. He has been involved with several articles on HighScalability and he wrote AlchemyDB, a NoSQL database, which was acquired by Aerospike.

So Russell has a feel for what's next. When he built AlchemyDB he was way ahead of the pack and now he thinks practical, programmer friendly CRDTs are what's next. Why?

Concurrency and data locality. To quote Russell:

Datanet lets you ship data to the spot where the action is happening. When the action happens it is processed locally, your system's reactivity is insanely quick. This is pretty much the opposite of the non-concurrent case where you need to go to a specific machine in the cloud to modify a piece of data regardless of where the action takes place. As your system grows, the concurrent approach is superior.

We have been slowly moving away from transactions towards NoSQL for reasons of scalability, availability, robustness, etc. Datanet continues this evolution by taking the next step and moving towards extreme distribution: supporting tons of concurrent writers.

The shift is to more distribution in computation. We went from one app-server & one DB to app-server-clusters and clustered-DBs, to geographically distributed data-centers, and now we are going much further with Datanet, data is distributed anywhere you need it to a local cache that functions as a database master.

How does Datanet work?

In Datanet, the same piece of data can simultaneously exist as a write-able entity in many many places in the stack. Datanet is a different way of looking at data: Datanet more closely resembles an internet routing protocol than a traditional client-server database ... and this mirrors the current realities that data is much more in flight than it used to be.
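The core CRDT idea of many writable copies that merge automatically can be illustrated with the textbook grow-only counter (G-Counter), sketched below in Python. This shows the general technique only. The post does not describe which CRDTs Datanet implements or how, and `GCounter` here is not Datanet's API.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of sync order or repeated delivery.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two replicas accept writes independently, e.g. on either side of a network partition.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
b.merge(a)  # re-delivering the same state is harmless
print(a.value(), b.value())  # 5 5
```

Because each replica only ever bumps its own slot, no write is lost when both sides update concurrently. Richer CRDTs (sets, maps, registers) apply the same principle with more elaborate merge functions.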

What bad bad things can you do to your distributed data? Here's an amazing video of how Datanet recovers quickly, predictably, and automatically from Chaos Monkey level extinction events. It's pretty slick. 

 

Here's an email interview I did with Russell. He goes into a lot more detail about Datanet and what it's all about. I think you will find it interesting. 

Let's start with your name and a little of your background?

Stuff The Internet Says On Scalability For October 14th, 2016

Fri, 10/14/2016 - 16:56

Hey, it's HighScalability time:

 

A pattern from the collective unconscious of the universe. Scott Kelly's brilliant Year in Space Photos.

 

If you like this sort of Stuff then please support me on Patreon.
  • $1.5 million: new iOS hack bug bounty; 120 Terabits per second: Google and Facebook's submarine cable between Los Angeles and Hong Kong; 142,000: IT jobs lost last month; $17 billion: cost of recall to Samsung; $4.1 Billion: IRS detected identity theft tax fraud; 1956: first mention of P vs NP by Kurt Gödel to John von Neumann; 1 million HTTP requests per second: DDoS attacks coming from IoT cameras; 90 petaflops: capacity of volunteer computing; 500 msec: time it takes the brain to integrate all sensory data into consciousness;

  • Quotable Quotes:
    • @GreatDismal: Silicon Valley fantasy that our universe is a simulation is actually the fantasy that our universe is a *sucessful startup*
    • @gblache: Being POTUS must be like inheriting a 240 year old code base and being asked to fix it in 4 years while half your team tries to sandbag you.
    • chrissnell: I'm a huge believer in colocation/on-prem in the post-Kubernetes era. I manage technical operations at a SaaS company and we migrated out of public cloud and into our own private, dedicated gear almost two years ago. Kubernetes and--especially--CoreOS has been a game changer for us.
    • @BenedictEvans: You spend 50-100x more on your smartphone than Google or FB make from you in ad revenue. They pay for their clouds out of that ad revenue
    • @kevinmarks: #NextEconomy Urs Hölzle: training a large model is super computationally intensive - trillions of flops
    • Tim O'Reilly: we see huge amounts of capital sitting on the sidelines rather than being part of a city - how do we fix this?
    • old-gregg: When I was at Rackspace, I was trying to analyze the top reasons our startup customers would stop using some of our SaaS offerings. The most common one, unsurprisingly, was they'd run out of business. But another top one was "they got successful". As they got bigger and more successful (can't mention names) they'd bring more and more in-house, eventually getting to a point that the only products they were interested in were just servers and bandwidth.
    • Joel Spolsky: But developers don’t want to overhear conversations. That’s ideal for a trading floor, but developers need to concentrate
    • Werner Vogels: Fast Data is an emerging industry term for information that is arriving at high volume and incredible rates, faster than traditional databases can manage. 
    • mattmanser: Honestly mate, you're just talking about the same old, same old. Every framework is about componentization and encapsulation. You could take React out of your post and replace it with any framework name in the last 40 years and it would have made 'sense' at the time.
    • @danielbryantuk: "Traditional software dev was like farming. You bought your tool stack and got busy. Now we're more like foragers" @monkchips #jaxlondon
    • Prashant Deva: RethinkDB is a classic story of good engineers doing only 'cool' things, not understanding their business, and ignoring all the 'boring' things that actually make a business tick.
    • Ada Lovelace Day: Lovelace came up with a method for the Analytical Engine to repeat a series of instructions: the first documented loop in computing
    • Greg Sanders: Let's stop talking about the block size. Let's talk about weight, the weight of a transaction, the weight of a block, the externalities it puts on the system. Let's talk about throughput. We can put more information in small spaces, so let's look at these problems
    • James Ryan: A major hold-up has been memory issues. GTA can’t even keep a car in memory after it’s left the player’s field of view, so there’s been no room at all for maintaining something resembling a character’s inner world.
    • yummyfajitas: Paraphrasing this to data science: "Everybody wants to have software provide them insights from data, but no one wants to learn any math."
    • @hunterwalk: "YouTube has a 46% share [of online video market], MySpace has 23% & Google Video has 10%." @nytimes 10/9/06  Happy 10th anniversary YT acq
    • @datawireio: "Microservices should not be used if the organization isn't embracing DevOps principles" http://d6e.co/2dxp0vr  by @danielbryantuk
    • delinka: I'm a bit older than the author. Every time I feel like I'm "out of touch" with the hip new thing, I take a weekend to look into it. I tend to discover that the core principles are the same, this time someone has added another C to MVC; or the put their spin on an API for doing X; or you can tell they didn't learn from the previous solution and this new one misses the mark, but it'll be three years before anyone notices (because those with experience probably aren't touching it yet, and those without experience will discover the shortcomings in time.)
    • sonnytron: But that's never good enough for douche bags that have a Foosball table in the office. They want you to give up your lunch and your evenings and play foosball with them. And crush it bro. And kill it bro.
    • @tupshin: @cmeik at scale (for various axes of scale, such as geographic-induced latency) a totally ordered system is impractical due to ux concerns
    • Victor J. Blue: When we’re addicted to online life, every moment is fun and diverting, but the whole thing is profoundly unsatisfying.
    • Richard Evans: I looked through the code and it turned out that much much earlier in the game I’d been rude to a servant during dinner, and the servant had gone into the kitchen and told the people there what a jerk I’d been – one of those people was the doctor. He remembered that. This took me quite a long time to debug. This is an example of how emergence is exciting but it opens up questions about game design.

  • This is the old: We had a post about whether you need maths to program. My answer: You need this kind [discrete math]. This is the new: Foundations of Data Science: we have written this book to cover the theory likely to be useful in the next 40 years, just as an understanding of automata theory, algorithms and related topics gave students an advantage in the last 40 years. One of the major changes is the switch from discrete mathematics to more of an emphasis on probability, statistics, and numerical methods.

  • Unlocking Horizontal Scalability in Our Web Serving Tier. Using MySQL on AWS RDS, Airbnb ran into C10K problems (connection limitations) that manifested as query latency increases, increased request queues, and error rate spikes. So they added a connection pooling feature to MaxScale, a database proxy that supports intelligent query routing between client applications and a set of backend MySQL servers. To neutralize the extra network hop introduced by the proxy they implemented availability zone aware request routing in SmartStack. Result: we were able to scale the application server tier with the addition of more servers without an increase in MySQL server threads. More than 15 Airbnb MaxScale database proxy services are in production.
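The reason pooling helps is that many concurrent requests can share a small, fixed number of backend connections. The minimal in-process Python sketch below illustrates this. It is not MaxScale's implementation (MaxScale does this in a separate proxy), and `FakeConnection` is a hypothetical stand-in for a real MySQL connection.

```python
import queue
import threading
from contextlib import contextmanager

class FakeConnection:
    """Hypothetical stand-in for a backend MySQL connection; counts how many get opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def execute(self, sql):
        return f"ok: {sql}"

class ConnectionPool:
    """Fixed-size pool: callers borrow an idle connection and always hand it back."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # blocks while every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)

pool = ConnectionPool(size=4)
results = []
results_lock = threading.Lock()

def handle_request(i):
    with pool.connection() as conn:
        row = conn.execute(f"SELECT {i}")
    with results_lock:
        results.append(row)

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), FakeConnection.opened)  # 100 4
```

A hundred requests are served while only four backend connections exist, which mirrors how adding application servers no longer has to add MySQL server threads.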



Lessons Learned from Scaling Uber to 2000 Engineers, 1000 Services, and 8000 Git repositories

Wed, 10/12/2016 - 16:56

For a visual of the growth Uber is experiencing take a look at the first few seconds of the above video. It will start in the right place. It's from an amazing talk given by Matt Ranney, Chief Systems Architect at Uber and Co-founder of Voxer: What I Wish I Had Known Before Scaling Uber to 1000 Services (slides).

It shows a ceaseless, rhythmic, undulating traffic grid of growth occurring in a few Chinese cities. This same pattern of explosive growth is happening in cities all over the world. In fact, Uber is now in 400 cities and 70 countries. They have over 6000 employees, 2000 of whom are engineers. Only a year and a half ago there were just 200 engineers. Those engineers have produced over 1000 microservices, which are stored in over 8000 git repositories.

That's crazy 10x growth in a crazy short period of time. Who has experienced that? Not many. And as you might expect that sort of unique, compressed, fast paced, high stakes experience has to teach you something new, something deeper than you understood before.

Matt is not new to this game. He was co-founder of Voxer, which experienced its own rapid growth, but this is different. You can tell while watching the video Matt is trying to come to terms with what they've accomplished.

Matt is a thoughtful guy and that comes through. In a recent interview he says:

And a lot of architecture talks at QCon and other events left me feeling inadequate; like other people, like Google for example, had it all figured out but not me.

This talk is Matt stepping outside of the maelstrom for a bit, trying to make sense of an experience, trying to figure it all out. And he succeeds. Wildly.

It's part wisdom talk and part confessional. "Lots of mistakes have been made along the way," Matt says, and those are where the lessons come from.

The scaffolding of the talk hangs on the WIWIK (What I Wish I Had Known) device, which has become something of an Internet meme. It's advice he would give his naive, year-and-a-half-younger self, though of course, like all of us, he certainly would not listen.

And he would not be alone. Lots of people have been critical of Uber (Hacker News, Reddit). After all, those numbers are really crazy. Two thousand engineers? Eight thousand repositories? One thousand services? Something must be seriously wrong, right?

Maybe. Matt is surprisingly non-judgemental about the whole thing. His mode of inquiry is more questioning and searching than finding absolutes. He himself seems bemused over the number of repositories, but he gives the pros and cons of more repositories versus having fewer repositories, without saying which is better, because given Uber's circumstances: how do you define better?

Uber is engaged in a pitched world-wide battle to build a planetary scale system capable of capturing a winner-takes-all market. That's the business model. Be the last service standing. What does better mean in that context?  

Winner-takes-all means you have to grow fast. You could go slow and appear more ordered, but if you go too slow you’ll lose. So you balance on the edge of chaos and dip your toes, or perhaps your whole body, into chaos, because that’s how you’ll scale to become the dominant worldwide service. This isn’t a slow growth path. This is a knock-the-gate-down-and-take-everything strategy. Think you could do better? Really?

Microservices are a perfect fit for what Uber is trying to accomplish. Plug your ears, but it's a Conway's Law thing: you get so many services because that's the only way so many people can be hired and become productive.

There's no technical reason for so many services. There's no technical reason for so many repositories. This is all about people. mranney sums it up nicely:

Scaling the traffic is not the issue. Scaling the team and the product feature release rate is the primary driver.

A consistent theme of the talk is this or that is great, but there are tradeoffs, often surprising tradeoffs that you really only experience at scale. Which leads to two of the biggest ideas I took from the talk:

  • Microservices are a way of replacing human communication with API coordination. Rather than people talking and dealing with team politics, it's easier for teams to simply write new code. It reminds me of a book I read long ago (I don't remember the name) where people lived inside a Dyson Sphere. Because there was so much space and so much free energy available within the sphere, when any group had a conflict with another group they could just splinter off and settle into a new part of the sphere. Is this better? I don't know, but it does let a lot of work get done in parallel while avoiding lots of people overhead.
  • Pure carrots, no sticks. This is a deep point about the role of command and control in such a large, diverse group. You'll be tempted to mandate policy: thou shalt log this way, for example. If you don't, there will be consequences. That's the stick. Matt says don't do that. Use carrots instead. Any time the sticks come out it's bad. So no mandates. The way you want to handle it is to provide tools that are so obvious and easy to use that people wouldn’t do it any other way.
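One way to picture the "carrot" of an obvious, easy tool: a one-call helper that already logs the way you want, so nobody needs a mandate. The sketch below is hypothetical (the `get_service_logger` name and the JSON field names are invented for illustration, not Uber's tooling) and uses only Python's standard `logging` and `json` modules.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object per line (field names are hypothetical)."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        return json.dumps({
            "service": self.service,
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })


def get_service_logger(service, stream=None):
    """Hypothetical 'paved road' helper: one call yields a consistently formatted logger."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


buf = io.StringIO()
log = get_service_logger("trip-pricing", stream=buf)
log.info("quote computed for %s", "rider-42")
print(buf.getvalue().strip())
```

The design choice is that the easy path and the correct path are the same call, which is the whole point of using carrots instead of sticks.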

This is one of those talks you have to really watch to understand because a lot is being communicated along dimensions other than text. Though of course I still encourage you to read my gloss of the talk :-)

Stats (April 2016)

Sponsored Post: ScaleArc, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Tue, 10/11/2016 - 16:41

Who's Hiring?
  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.

Fun and Informative Events
  • Learn how Nielsen Marketing Cloud (NMC) leverages online machine learning and predictive personalization to drive its success in a live webinar on Tuesday, September 20 at 11 am PT / 2 pm ET. Hear from Nielsen’s Kevin Lyons, Senior VP of Data Science and Digital Technology, and Brent Keator, VP of Infrastructure, as well as from Brian Bulkowski, CTO and Co-Founder at Aerospike, as they describe the front-edge architecture and technical choices – including the Aerospike NoSQL database – that have led to NMC’s success. RSVP: https://goo.gl/xDQcu4
Cool Products and Services
  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services: all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost-effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30-day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com: Monitor End User Experience from a global monitoring network.



Stuff The Internet Says On Scalability For October 7th, 2016

Wed, 10/05/2016 - 17:57

Hey, it's HighScalability time:

 

The world's oldest analog computer, from 87 BC: the otherworldly Antikythera mechanism.

 

If you like this sort of Stuff then please support me on Patreon.
  • 70 billion: facts in Google's knowledge graph; 80 million: monthly visitors to walmart.com; 50%: lower cost for sending a container from Shanghai to Europe; 6 billion: Docker Hub pulls per 6 weeks; 5x: impact reduction using new airbag helmet; 400: nodes in a Cassandra + Spark cluster in Azure; 66%: loss of installs when apps > 100MB; 223GB: Udacity open sources self-driving car data;

  • Quotable Quotes:
    • rfrey: The success of many companies, and probably all of the unicorns, has nothing to do with technology. The tech is necessary, of course, but so are desks and an accounting department. Internalizing that has been difficult for me as an engineer.
    • @mza: 72 new features/services released last month on #AWS. 706 so far this year (up 42.9% YoY).
    • Marc Andreessen: To me the problem is clear: The problem is insufficient technological adoption, innovation, and disruption in these high-escalating price sectors of the economy. My thesis is that we're not in a tech bubble — we’re in a tech bust. Our problem isn't too much technology or people being too excited about technology. The problem is we don't have nearly enough technology. These cartel-like legacy industries are way too hard to disrupt.
    • @mfdii: What did the NSA agent say when it got access to all the email? Yahoo!
    • Ben Thompson~ [Google's Pixel event] was a huge event, you rarely see a company changing business models
    • @kerryb: News just in: databases to be “named and shamed” if they use foreign keys without trying to train local British keys first.
    • kazagistar: The biggest use of REST in our system (and I suspect a lot of large newer systems) is not "web client to backend server" but "microservice to microservice". And for this, GraphQL is severely immature.
    • @amcafee: Tesla software update: good braking "even if a UFO were to land on the freeway in zero visibility conditions."
    • evanelias: Facebook uses MySQL for countless other critical OLTP use-cases, and (for better or worse) even a few OLAP use-cases. It's the primary store of Facebook, across the entire company. It's the storage layer for ad serving, payments, async task persistence, internal tooling, many many other things. Most of these use-cases make full use of SQL and the relational model.
    • @rakamaric: Deschutes Brewery using light-weight formal methods (white-box fuzzing) to find bugs in their code! #soarlab
    • @tottinge: "Crowdsourcing is the tyranny of the herd, not the wisdom of crowds" @snowded #lascot16
    • @pedrolopesme: @toddlmontgomery "Your API is a protocol. Treat it like one."  #qconnyc 2016
    • Rodrick Brown: A pattern today many use to accomplish this [logging] is using a kafka logging library that hooks into their microservice and use something like spark to consume the logs from Kafka into elasticsearch. We're doing hundreds of thousands of events/sec on a tiny ~8 node ES cluster.
    • @dominicad: "The way people make decisions is key to understanding company culture. Instead of system analysis, record decisions." @snowded #lascot16
    • Hugh E. Williams: Engineers irrationally avoid hash tables because of the worst-case O(n) search time. In practice, that means they’re worried that everything they search for will hash to the same value
    • @JoeEmison: That's just not accurate. I've spent the last year trying to run on GCP and keep going back to AWS. It's not just perception.
    • boulos: Where I do agree is networking egress. The big three providers all have metered bandwidth rates that are way above the "all inclusive" fee you pay to Hetzner, OVH, DO, and others. The cheapest way to host an ftp server that serves 20 TB per month is certainly on one of these (today). None of these providers will let you serve 1 PB / month this way, but if you're in their sweet spot and they can make it work out on average, it's a good fit.
    • @DDDBE: "If you have a magical genie, you still have the problem of trying to explain what you want. That is domain complexity." @malk_zameth
    • avitzurel: The networking on AWS needs to be better. I don't want the strongest machine just to have a better transfer rate. It makes complete sense to have a micro machine for some services, but if those services are accessed or access other HTTP/s services, it will be unnecessarily slow
    • Alan Huang: the number of [Internet] hops can be reduced by 2X by converting the network into a toroid. The number of hops can be further reduced by recasting the network into N-dimensional hypercube or into a multistage network, such as a Perfect Shuffle or Banyan.
    • @jessfraz: Can we go back to ncurses apps instead of these memory hogging bullshits?
    • Russ White: The reality is we shouldn’t need DevOps for configuration at all. This is a bit of a revolution in my thinking in the last two or three years, but what I’m trying to do is to simply make DevOps, as it’s currently constituted, obsolete. DevOps should be about understanding how the network is working and making the network work better

  • Software is eating the world, but software is also eating software. Laugh. Cry. Shake your head and then your fist, but it's a satire that's all true: How it feels to learn JavaScript in 2016. Epic. Once you wipe away the tears you may also realize this is a great tutorial on all the different frameworks and how they fit together. You won't find better. 

  • Videos are available for Full Stack Fest 2016, held in Barcelona, with topics ranging from Docker, IPFS & GraphQL to Reactive Programming, Immutable Interfaces & Virtual Reality.

  • Great analogy by paulddraper on cloud pricing: "Restaurant prices are ridiculous ... made the comparison between groceries and menu offerings of McDonalds, Taco Bell, Burger King ... Olive Garden (SO EXPENSIVE) and you pay 5 times at a restaurant for the same." You're not paying just for hardware. You're paying for hardware, expertise, services, and convenience. On-prem or colocation may be a good choice. But limiting your comparison to raw computing power mischaracterizes the decision.
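Hugh E. Williams' quote above contrasts the hash table's typical cost with its O(n) worst case. The small Python sketch below makes the difference measurable: a toy separate-chaining table (written for illustration, not Python's built-in `dict`) counts key comparisons per lookup, first with a hash that spreads keys across buckets and then with a deliberately degenerate hash that sends every key to the same bucket.

```python
class ChainedHashTable:
    """Toy separate-chaining hash table that counts comparisons during lookups."""

    def __init__(self, buckets, hash_fn):
        self.buckets = [[] for _ in range(buckets)]
        self.hash_fn = hash_fn
        self.comparisons = 0

    def _bucket(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            self.comparisons += 1
            if k == key:
                return v
        raise KeyError(key)


def average_comparisons(hash_fn, n=1000, buckets=1024):
    table = ChainedHashTable(buckets, hash_fn)
    for i in range(n):
        table.put(i, i * i)
    for i in range(n):
        table.get(i)
    return table.comparisons / n


good = average_comparisons(lambda k: k)  # keys 0..999 land in distinct buckets
bad = average_comparisons(lambda k: 0)   # every key collides: the O(n) worst case
print(good, bad)  # 1.0 500.5
```

The worst case only appears when the hash function collapses; with a reasonable hash and load factor, lookups stay near one comparison.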



Stuff The Internet Says On Scalability For September 30th, 2016

Fri, 09/30/2016 - 16:56

Hey, it's HighScalability time:

 

Everything is a network. Map showing the global genetic interaction network of a cell. 

 

  • 18: Google can now drink and drive in Washington DC; $10 billion: cost of a Vision Quest to Mars; 620 Gbps: DDoS attack on KrebsOnSecurity; 1 Tbps: DDoS attack on OVH; $200,000: cost of a typical cyber incident; 8 million: video training dataset labeled with 4800 labels; 180: Amazon warehouses in the US; 10: bits of info per photon; 16: GPUs in new AI killer P2 instance type;

  • Quotable Quotes:
    • @markmccaughrean: 1,000,000 people to Mars in 100 yrs. 10 people/launch? That's 3 a day, every day, for a century. 1% failure rate? One explosion every month
    • @jeremiahg: Any sufficiently advanced exploit is indistinguishable from a 400lb hacker.
    • BrianKrebs: I suggested to Mr. Wright perhaps a better comparison was that ne’er-do-wells now have a virtually limitless supply of Stormtrooper clones that can be conscripted into an attack at a moment’s notice.
    • Sonia: Academia’s not-so-subtle disdain for applied research does more than damage a few promising careers; it renders our field’s output useless, destined to collect dust on the shelves of Elsevier.
    • Monica L. Smith: Nobody builds their own infrastructure. You don’t build your own highway, train line, water pipe, your own sewer. Those are things that connect you and your household to everybody else sequentially in your neighborhood, in your region, from the city out into the broader hinterlands.
    • @olesovhcom: This botnet with 145607 cameras/dvr (1-30Mbps per IP) is able to send >1.5Tbps DDoS. Type: tcp/ack, tcp/ack+psh, tcp/syn.
    • kenrose: We see this pattern at PagerDuty over the majority of our customers. There is a definite lull in alert volume over the weekends that picks up first thing Monday morning. It's led to my personal conclusion that most production issues are caused by people, not errant hardware or systems.
    • @rseroter: "We Crammed this Monolith Into a Container and Called it a Microservice"
    • @mweagle: I really don’t want to run my own k8s in AWS, but ECS is so opaque to debug that k8s seems like a good choice.
    • Werner Vogels~ We have this overarching goal which is customer centricity. Doing anything that benefits the customer gets priority above everything else. Working on eliminating all single points of failure in the company purely benefits the customer because it really improves the customer experience.
    • Cory Doctorow~ The thing open source software had going for it was the Ulysses Pact...the irrevocable license, the failure mode of open source software, having founded an open source software company, I can tell you there are moments where it feels like your survival turns on being able to close the code you had opened when you were idealistic. There are moments of desperation when that happens.
    • @lightbend: "We've been using #Akka in production for over two years, without a single crash." -@CruiseNorwegian
    • @cloud_opinion: Monolithic -> Microservices -> "which container image?" -> "Screw it, lets do PaaS" ->  CF  or AWS?
    • Etsy: concurrency proved to be great for logical aggregation of components, and not so great for performance optimization. Better database access would be better for that.
    • Yaniv Nizan: the number of users actually contributing ad revenue in your app is a lot lower than 6.5% and much closer to the 1% or 2% that contribute revenue from In-app purchases. 
    • @reckless: Elon is basically putting on an Apple event, for going to Mars.
    • @potch: DRY: Don't Repeat Yourself / DAMP: Do Abstraction/Minimalism Pragmatically / MOIST: Maybe Only Innovate Some Times?
    • @dannysullivan: In the Facebook video metrics thing, spare a thought for the poor BuzzFeed watermelon, less viral than it thought :)
    • Addison Snell: If the promise of cloud computing is overblown, it's because of the amplification it gets from its loyal converts, enterprises who have found liberation and agility in outsourcing IT.
    • @psaffo: In 1990, the size of the US software industry was $3.2 billion -- the same size as the gourmet popcorn industry in that same year.
    • David Rosenthal: [Storage] Revenues are flat or decreasing, profits are decreasing for both companies. These do not look like companies faced by insatiable demand for their products; they look like mature companies facing increasing difficulty in scaling their technology.
    • @legind: Let's Encrypt now the 3rd largest CA, after Comodo and Symantec, comprising over 13% of the SSL cert market share 
    • @stewartbrand: “In the long run, the technology driving activities in space will be biological.” Rousing essay by Freeman Dyson.
    • @jessitron: Constructing causal ordering at the generic level of "all messages received cause all future messages sent" is expensive and also less meaningful than a business-logic-aware, conscious causal ordering. This conscious causal ordering gives us external consistency, accurate legibility, and visibility into what we know to be causal.

  • In an article light on specifics and written with more of a marketing flourish, we still learn some interesting details about the infrastructure behind Pokemon Go: Bringing Pokémon GO to life on Google Cloud. It runs on Google Cloud, Kubernetes, Google Container Engine, HTTP/S Load Balancer, and Cloud Datastore. Keep in mind that Alphabet is invested in Niantic, and that Ingress, the forerunner of Pokemon Go, ran on App Engine. So it sounds like a new backend implementation that had to scale from zero to the size of Twitter in a matter of weeks, with a much more complicated workload. Growth was explosive. Player traffic was 50x larger than initial estimates. An implication is that the problems experienced during launch were not infrastructure related. Google, in the form of Customer Reliability Engineering (CRE), worked closely with Niantic to make sure the infrastructure scaled. The problems must have been elsewhere in the application stack, which is perfectly understandable. That sort of load could not have been predicted. The design decisions you make for 5x expected traffic are very different than they are for 50x. Nobody will spend the money or take the time to build a system for 50x. Nobody. Lots of good comments on Hacker News. Good question by ksec: would Pokemon Go even be possible in a pre-cloud era?
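@jessitron's quote above describes generic causal ordering, where "all messages received cause all future messages sent." A standard generic mechanism for that is the vector clock. The Python sketch below is a textbook illustration (not code from the quote's source) that shows where the cost comes from: every message carries one counter per process, whether or not the business logic cares about that causality.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """A process in a vector-clock system; the clock has one slot per process."""

    pid: int
    n: int
    clock: list = field(default_factory=list)

    def __post_init__(self):
        self.clock = [0] * self.n

    def send(self):
        self.clock[self.pid] += 1
        return list(self.clock)  # the full vector travels with every message

    def receive(self, msg_clock):
        # Merge: everything the sender had seen now precedes our next event.
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.pid] += 1


def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


p0, p1, p2 = (Process(i, 3) for i in range(3))
m1 = p0.send()   # [1, 0, 0]
p1.receive(m1)   # p1 is now [1, 1, 0]
m2 = p1.send()   # [1, 2, 0]
m3 = p2.send()   # [0, 0, 1], concurrent with m1 and m2
print(happened_before(m1, m2), happened_before(m1, m3), happened_before(m3, m1))
# True False False
```

A business-logic-aware ordering would attach only the dependencies that matter instead of an ever-growing vector per message.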



How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters

Wed, 09/28/2016 - 16:59

If you are Uber and you need to store the location data that is sent out every 30 seconds by both driver and rider apps, what do you do? That’s a lot of real-time data that needs to be used in real-time.

Uber’s solution is comprehensive. They built their own system that runs Cassandra on top of Mesos. It’s all explained in a good talk by Abhishek Verma, Software Engineer at Uber: Cassandra on Mesos Across Multiple Datacenters at Uber (slides).

Is this something you should do too? That’s an interesting thought that comes to mind when listening to Abhishek’s talk.

Developers have a lot of difficult choices to make these days. Should we go all in on the cloud? Which one? Isn’t it too expensive? Do we worry about lock-in? Or should we try to have it both ways and craft brew a hybrid architecture? Or should we just do it all ourselves for fear of being cloud shamed by our board for not reaching 50 percent gross margins?

Uber decided to build their own. Or rather, they decided to weld together their own system from two very capable open source components. What was needed was a way to make Cassandra and Mesos work together, and that’s what Uber built.

For Uber the decision is not all that hard. They are very well financed and have access to the top talent and resources needed to create, maintain, and update these kinds of complex systems.

Since Uber’s goal is for transportation to have 99.99% availability for everyone, everywhere, it really makes sense to want to be able to control your costs as you scale to infinity and beyond.

But as you listen to the talk you realize the staggering effort that goes into making these kinds of systems. Is this really something your average shop can do? No, not really. Keep this in mind if you are one of those cloud deniers who want everyone to build all their own code on top of the barest of bare metals.

Trading money for time is often a good deal. Trading money for skill is often absolutely necessary.

Given Uber’s goal of reliability, where out of 10,000 requests only one can fail, they need to run out of multiple datacenters. Since Cassandra is proven to handle huge loads and works across datacenters, it makes sense as the database choice.  
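The 99.99% target can be turned into concrete budgets with a little arithmetic. The sketch below assumes, as a simplification, that the same target applies both to requests and to wall-clock time over a 365-day year; it reproduces the "one failure in 10,000 requests" figure and the corresponding yearly downtime allowance.

```python
def error_budget(availability, requests, period_minutes=365 * 24 * 60):
    """Return (allowed failed requests, allowed downtime in minutes) for a target.

    Assumes the availability target applies uniformly to requests and to time.
    """
    failure_rate = 1 - availability
    return failure_rate * requests, failure_rate * period_minutes


failed, downtime = error_budget(0.9999, requests=10_000)
print(round(failed, 6), round(downtime, 1))  # 1.0 52.6
```

In other words, four nines leaves roughly 52.6 minutes of total unavailability per year, which is hard to guarantee from a single datacenter.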

And if you want to make transportation reliable for everyone, everywhere, you need to use your resources efficiently. That’s the idea behind using a datacenter OS like Mesos. By statistically multiplexing services on the same machines, you need 30% fewer machines, which saves money. Mesos was chosen because at the time it was the only product proven to work with cluster sizes of tens of thousands of machines, which was an Uber requirement. Uber does things in the large.
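Why does co-locating services save machines? Services rarely peak at the same time, so the peak of the combined load is lower than the sum of the individual peaks. The Python sketch below uses invented hourly load curves for three hypothetical services and a hypothetical per-machine capacity (none of this is Uber data); the savings it prints come from those made-up numbers, not from the 30% figure in the talk.

```python
import math

CAPACITY_PER_MACHINE = 100  # hypothetical units of load one machine can serve

# Invented hourly load for three hypothetical services that peak at different times.
loads = {
    "batch": [400, 380, 300, 120, 80, 60],
    "web": [90, 150, 320, 400, 350, 200],
    "reports": [50, 60, 100, 150, 300, 390],
}


def machines(load):
    return math.ceil(load / CAPACITY_PER_MACHINE)


# Dedicated clusters: each service is provisioned for its own peak.
dedicated = sum(machines(max(series)) for series in loads.values())

# Shared cluster: provision once for the peak of the summed hourly load.
combined = [sum(hour) for hour in zip(*loads.values())]
shared = machines(max(combined))

print(dedicated, shared, f"{1 - shared / dedicated:.0%} fewer machines")
```

The gain depends entirely on how uncorrelated the peaks are; services that all peak together gain nothing from sharing.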

What were some of the more interesting findings?

  • You can run stateful services in containers. Uber found there was hardly any difference, 5-10% overhead, between running Cassandra on bare metal versus running Cassandra in a container managed by Mesos.

  • Performance is good: mean read latency is 13 ms, mean write latency is 25 ms, and P99s look good.

  • For their largest clusters they are able to support more than a million writes/sec and ~100k reads/sec.

  • Agility is more important than performance. With this kind of architecture what Uber gets is agility. It’s very easy to create and run workloads across clusters.
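The throughput and latency figures above can be combined with Little's Law (L = λW) for a rough sense of concurrency. Pairing the largest-cluster throughput with the mean latencies is an assumption made purely for illustration; the talk does not say the two were measured together.

```python
def in_flight(arrival_rate_per_s, mean_latency_s):
    """Little's Law: average number of requests in the system, L = lambda * W."""
    return arrival_rate_per_s * mean_latency_s


# Assumed pairing: 1M writes/s at the 25 ms mean, ~100k reads/s at the 13 ms mean.
writes = in_flight(1_000_000, 0.025)
reads = in_flight(100_000, 0.013)
print(round(writes), round(reads))  # 25000 1300
```

Under that assumption, the cluster would be handling on the order of 25,000 concurrent writes at any instant.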

Here’s my gloss of the talk:

In the Beginning