Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Architecture

Sponsored Post: Apple, Asana, FoundationDB, Cie Games, BattleCry, Surge, Dreambox, Chartbeat, Monitis, Netflix, Salesforce, Cloudant, CopperEgg, Logentries, Gengo, Couchbase, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngin

Who's Hiring?
  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here. 
    • Software Developer in Test. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. The ideal candidate will also seek out ways to further automate all aspects of our existing process. This is a highly technical role and requires in-depth knowledge of both white-box and black-box testing methodologies. Please apply here
    • Senior Software Engineer -iOS Systems. Do you love building highly scalable, distributed web applications? Does the idea of a fast-paced environment make your heart leap? Do you want your technical abilities to be challenged every day, and for your work to make a difference in the lives of millions of people? If so, the iOS Systems Carrier Services team is looking for a talented software engineer who is not afraid to share knowledge, think outside the box, and question assumptions. Please apply here.
    • Software Engineering Manager, IS&T WWDR Dev Systems. The WWDR development team is seeking a hands-on engineering manager with a passion for building large-scale, high-performance applications. The successful candidate will be collaborating with Worldwide Developer Relations (WWDR) and various engineering teams throughout Apple. You will lead a team of talented engineers to define and build large-scale web services and applications. Please apply here.
    • C++ Senior Developer and Architect- Maps. The Maps Team is looking for a senior developer and architect to support and grow some of the core backend services that support Apple Map's Front End Services. Ideal candidate would have experience with system architecture, as well as the design, implementation, and testing of individual components but also be comfortable with multiple scripting languages. Please apply here.

  • Cie Games, small indie developer and publisher in LA, is looking for rock star Senior Game Java programmers to join our team! We need devs with extensive experience building scalable server-side code for games or commercial-quality applications that are rich in functionality. We offer competitive comp, great benefits, interesting projects, and exceptional growth opportunities. Check us out at http://www.ciegames.com/careers.

  • BattleCry, the newest ZeniMax studio in Austin, is seeking a qualified Front End Web Engineer to help create and maintain our web presence for AAA online games. This includes the game accounts web site, enhancing the studio website, our web and mobile- based storefront, and front end for support tools. http://jobs.zenimax.com/requisitions/view/540

  • FoundationDB is seeking outstanding developers to join our growing team and help us build the next generation of transactional database technology. You will work with a team of exceptional engineers with backgrounds from top CS programs and successful startups. We don’t just write software. We build our own simulations, test tools, and even languages to write better software. We are well-funded, offer competitive salaries and option grants. Interested? You can learn more here.

  • Asana. As an infrastructure engineer you will be designing software to process, query, search, analyze, and store data for applications that are continually growing in scale. You will work with a world-class team of engineers on deploying and operating existing systems, and building new ones for problems that are unique to our problem space. Please apply here.

  • Operations Engineer - AWS Cloud. Want to grow and extend a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join DreamBox Learning’s creative team of engineers, designers, and educators. Help us radically change education in an environment that values collaboration, innovation, integrity and fun. Please apply here. http://www.dreambox.com/careers

  • Chartbeat measures and monetizes attention on the web. Our traffic numbers are growing, and so is our list of product and feature ideas. That means we need you, and all your unparalleled backend engineer knowledge to help up us scale, extend, and evolve our infrastructure to handle it all. If you've these chops: www.chartbeat.com/jobs/be, come join the team!

  • The Salesforce.com Core Application Performance team is seeking talented and experienced software engineers to focus on system reliability and performance, developing solutions for our multi-tenant, on-demand cloud computing system. Ideal candidate is an experienced Java developer, likes solving real-world performance and scalability challenges and building new monitoring and analysis solutions to make our site more reliable, scalable and responsive. Please apply here.

  • Sr. Software Engineer - Distributed Systems. Membership platform is at the heart of Netflix product, supporting functions like customer identity, personalized profiles, experimentation, and more. Are you someone who loves to dig into data structure optimization, parallel execution, smart throttling and graceful degradation, SYN and accept queue configuration, and the like? Is the availability vs consistency tradeoff in a distributed system too obvious to you? Do you have an opinion about asynchronous execution and distributed co-ordination? Come join us

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • OmniTI has a reputation for scalable web applications and architectures, but we still lean on our friends and peers to see how things can be done better. Surge started as the brainchild of our employees wanting to bring the best and brightest in Web Operations to our own backyard. Now in its fifth year, Surge has become the conference on scalability and performance. Early Bird rate in effect until 7/24!
Cool Products and Services
  • A third party vendor benchmarked the performance of three databases: Couchbase Server, MongoDB and DataStax Enterprise. The databases were benchmarked with two different workloads (read-intensive / balanced) via YCSB on dedicated servers. Read.

  • Now track your log activities with Log Monitor and be on the safe side! Monitor any type of log file and proactively define potential issues that could hurt your business' performance. Detect your log changes for: Error messages, Server connection failures, DNS errors, Potential malicious activity, and much more. Improve your systems and behaviour with Log Monitor.

  • The NoSQL "Family Tree" from Cloudant explains the NoSQL product landscape using an infographic. The highlights: NoSQL arose from "Big Data" (before it was called "Big Data"); NoSQL is not "One Size Fits All"; Vendor-driven versus Community-driven NoSQL.  Create a free Cloudant account and start the NoSQL goodness

  • Finally, log management and analytics can be easy, accessible across your team, and provide deep insights into data that matters across the business - from development, to operations, to business analytics. Create your free Logentries account here.

  • CopperEgg. Simple, Affordable Cloud Monitoring. CopperEgg gives you instant visibility into all of your cloud-hosted servers and applications. Cloud monitoring has never been so easy: lightweight, elastic monitoring; root cause analysis; data visualization; smart alerts. Get Started Now.

  • Whitepaper Clarifies ACID Support in Aerospike. In our latest whitepaper, author and Aerospike VP of Engineering & Operations, Srini Srinivasan, defines ACID support in Aerospike, and explains how Aerospike maintains high consistency by using techniques to reduce the possibility of partitions.  Read the whitepaper: http://www.aerospike.com/docs/architecture/assets/AerospikeACIDSupport.pdf.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Who's Managing Your Company?

One of the best books I’m reading lately is The Future of Management, by Gary Hamel.

It’s all about how management innovation is the best competitive advantage, whether you look through the history of great businesses or the history of great militaries.  Hamel makes a great case that strategic innovation, product or service innovation, and operational innovation are fleeting advantages, but management innovation leads to competitive advantage for the long haul.

In The Future of Management, Hamel poses a powerful question …

“Who is managing your company?”

Via The Future of Management:

“Who's managing your company?  You might be tempted to answer, 'the CEO,' or 'the executive team,' or 'all of us in middle management.'  And you'd be right, but that wouldn't be the whole truth.  To a large extent, your company is being managed right now by a small coterie of long-departed theorists and practitioners who invented the rules and conventions of 'modern' management back in the early years of the 20th century.  They are the poltergeists who inhabit the musty machinery of management.  It is their edicts, echoing across the decades, that invisibly shape the way your company allocates resources, sets budgets, distributes power, rewards people, and makes decisions.”

That’s why it’s easy for CEOs to hop around companies …

Via The Future of Management:

“So pervasive is the influence of these patriarchs that the technology of management varies only slightly from firm to firm.  Most companies have a roughly similar management hierarchy (a cascade of EVPs, SVPs, and VPs).  They have analogous control systems, HR practices, and planning rituals, and rely on comparable reporting structures and review systems.  That's why it's so easy for a CEO to jump from one company to another -- the levers and dials of management are more or less the same in every corporate cockpit.”

What really struck me here is how much management approach has been handed down through the ages, and accepted as status quo.

It’s some great good for thought, especially given that management innovation is THE most powerful form of competitive advantage from an innovation standpoint (which Hamel really builds a strong case here throughout the entirety of the book.)

You Might Also Like

The New Competitive Landscape

Principles and Values Define a Culture

The Enterprise of the Future

Cognizant on The Next Generation Enterprise

Satya Nadella on the Future is Software

Categories: Architecture, Programming

StackOverflow Update: 560M Pageviews a Month, 25 Servers, and It's All About Performance

The folks at Stack Overflow remain incredibly open about what they are doing and why. So it’s time for another update. What has Stack Overflow been up to?

The network of sites that make up StackExchange, which includes StackOverflow, is now ranked 54th for traffic in the world; they have 110 sites and are growing at a rate of 3 or 4 a month; 4 million users; 40 million answers; and 560 million pageviews a month.

This is with just 25 servers. For everything. That’s high availability, load balancing, caching, databases, searching, and utility functions. All with a relative handful of employees. Now that’s quality engineering.

This update is based on The architecture of StackOverflow (video) by Marco Cecconi and What it takes to run Stack Overflow (post) by Nick Craver. In addition, I’ve merged in comments from various sources. No doubt some of the details are out of date as I meant to write this article long ago, but it should still be representative. 

Stack Overflow still uses Microsoft products. Microsoft infrastructure works and is cheap enough, so there’s no compelling reason to change. Yet SO is pragmatic. They use Linux where it makes sense. There’s no purity push to make everything Linux or keep everything Microsoft. That wouldn’t be efficient. 

Stack Overflow still uses a scale-up strategy. No clouds in site. With their SQL Servers loaded with 384 GB of RAM and 2TB of SSD, AWS would cost a fortune. The cloud would also slow them down, making it harder to optimize and troubleshoot system issues. Plus, SO doesn’t need a horizontal scaling strategy. Large peak loads, where scaling out makes sense, hasn’t  been a problem because they’ve been quite successful at sizing their system correctly.

So it appears Jeff Atwood’s quote: "Hardware is Cheap, Programmers are Expensive", still seems to be living lore at the company.

Marco Ceccon in his talk says when talking about architecture you need to answer this question first: what kind of problem is being solved?

First the easy part. What does StackExchange do? It takes topics, creates communities around them, and creates awesome question and answer sites. 

The second part relates to scale. As we’ll see next StackExchange is growing quite fast and handles a lot of traffic. How does it do that? Let’s take a look and see….

Stats
Categories: Architecture

100 Articles to Sharpen Your Mind

Actually, it's more than 100 articles for your mind.  I've tagged my articles with "mind" on Sources of Insight that focus on increasing your "intellectual horsepower":

Articles on Mind Power and the Power of Thoughts

Here are a few of the top mind articles that you can quickly get results with:

Note that if reading faster is important to you, then I recommend also reading How To Read 10,000 Words a Minute (it’s my ultimate edge) and The Truth About Speed Reading.

If there’s one little trick I use with reading (whether it’s a book, an email, or whatever), I ask myself “what’s the insight?” or “what’s the action?” or “how can I use this?"  You’d be surprised but just asking yourself those little focusing questions can help you parse down cluttered content fast and find the needles in the haystack.

Categories: Architecture, Programming

Stuff The Internet Says On Scalability For July 18th, 2014

Hey, it's HighScalability time:


The one is many. Lichen composed of possibly 400 distinct species.
  • Quotable Quotes:
    • The Master Switch: Selling radio sets—the old revenue model—was a good if limited business, for ultimately few households would need more than one radio every few years. But advertising revenues could expand indefinitely—or so it seemed then.
    • Larry Page: It’s pretty difficult to solve big problems in four years. I think it’s probably pretty easy to do it in 20 years. I think our whole system is setup in a way that makes it difficult for leaders of really big companies.

  • The Master Switch: The inventors we remember are significant not so much as inventors, but as founders of “disruptive” industries, ones that shake up the technological status quo. Through circumstance or luck, they are exactly at the right distance both to imagine the future and to create an independent industry to exploit it.

  • You thought you were clever and safe? Reality doesn't like that. The fallacy of distributed transactions: In other words, as far as the queue is concerned, the transaction committed, and the message is gone. As far as the database is concerned, that transaction was rolled back, and never happened. Of course, the chance that something like that can happen in one of your systems? Probably one in a million.

  • How do you make Cassandra 50% faster? You add batching of replies. You fix your thread pools. And you get rid of unecessary endcoding and decoding phases like with Thrift. Of course now the bottleneck has moved and the process starts again.

  • Everyday Algorithms: Elevator Allocation. Though I'm pretty sure my contextualized and personalized elevator scheduling goes something like: for him, slow as possible. And they and the crosswalk lights must be in cahoots, because I get the same fine service.

  • Simon Wardley: Anyhow, this is what I don't get. Micro services has become a big thing - good. So, why do we have to continuously create 'new' terms to describe what is already happening? < Why have new songs when they use all the same words? What changes is the context. A new binding requires a new word. It's like a linguistic method of versioning. 10 years ago all the words around that eras version of microservices would be different, so we need a new word now to reflect a new world.

  • Programmers really want the database to work as a queue. It just never works in the end. But if you are Antirez and are a programmer and have your own database then you can make that happen. Queues and databases

  • Another case of breaking one thing in to two parts and then arguing which part is more important. In reference to the debate over Linus' quote on data structures: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." < Good and evil. Light and dark. Mind and Body. Starsky and Hutch. They are defined in terms of each other and make no sense without each other. They are a single system. 

  • New russian 8-core CPU. It may not be the fastest CPU, but it won't break in the field and hardly ever jams when covered in mud or sand.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

First experiences with OpenStack

Agile Testing - Grig Gheorghiu - Thu, 07/17/2014 - 21:37
We hit a big milestone this week, as we started to use OpenStack as a private cloud, intially just for QA/integration environments. Up to now we've been creating KVM machines semi-manually, which used to take minutes. Now we cut down that process to seconds, calling the Nova API from the command line, e.g.:

$ nova boot --image precise-image --flavor www --key_name mykey --nic net-id=3eafbd4f-0389-4c5b-93ba-7764742ee8cd www1.qa1

Once an instance is provisioned, we bootstrap it with Chef:

$ knife bootstrap www1.qa1.mydomain.com -x ubuntu --sudo -E qa1 -N www1.qa1 -r "role[base], role[www]"

Our internal network architecture is fairly complex, so my colleague Jeff Roberts spent quite some time bending OpenStack Neutron to his will (in conjunction with Open vSwitch) in order to support our internal VLANs. The OpenStack infrastructure has been stable so far, and it's just such a pleasure to do everything via an API and not to spin VMs up manually. Being back to working with a (private) cloud feels good.

This is just version 1.0 of our OpenStack rollout. Soon we'll start spinning up one environment at a time using chef-metal and fog  and we'll also integrate instance + environment spin-up with Jenkins. Exciting times ahead!

Agile and the Definition of Quality

“Quality begins on the inside... then works its way out.” -- Bob Moawad

Quality is value to someone.

Quality is relative.

Quality does not exist in a non-human vacuum.

Who is the person behind a statement about quality?

Who’s requirements count the most?

What are people willing to pay or do to have their requirements met?

Quality can be elusive if you don’t know how to find it, or you don’t know where to look.  Worse, even when you know where to look, you need to know how to manage the diversity of conflicting views.

On a good note, Agile practices and an Agile approach can help you surface and tackle quality in a tractable and pragmatic way.

In the book Agile Impressions, by “the grandfather of Agile Programming”, Jerry Weinberg shares insights and lessons learned around the relativity of quality and how to make decisions about quality more explicit and transparent.

Example of Conflicting Ideas About Software Quality

Here are some conflicting ideas about what constitutes software quality, according to Weinberg:

“Zero defects is high quality.”
“Lots of features is high quality.”
Elegant coding is high quality.”
“High performance is high quality.”
”Low development cost is high quality.”
“Rapid development is high quality.”
“User-friendliness is high quality.”

More Quality for One Person, May Mean Less for Another

There are always trade-offs.  It can be a game of robbing Peter to pay Paul.

Via Agile Impressions:

“Recognizing the relativity of quality often resolves the semantic dilemma. This is a monumental contribution, but it still does not resolve the political dilemma:  More quality for one person may mean less quality for another.”

The Relativity of Quality

Quality is relative.

Via Agile Impressions:

“The reason for my dilemma lies in the relativity of quality. As the MiniCozy story crisply illustrates, what is adequate quality to one person may be inadequate quality to another.”

Quality Does Not Exist in a Non-Human Vacuum

So many

Via Agile Impressions:

“If you examine various definitions of quality, you will always find this relativity. You may have to examine with care, though, for the relativity is often hidden, or at best, implicit.

In short, quality does not exist in a non-human vacuum, but every statement about quality is a statement about some person(s).  That statement may be explicit or implicit. Most often, the “who” is implicit, and statements about quality sound like something Moses brought down from Mount Sinai on a stone tablet.  That’s why so many discussions of software quality are unproductive: It’s my stone tablet versus your Golden Calf.”

Ask, Who is the Person Behind that Statement About Quality?

The way to have more productive conversations about quality is to find out who is the person behind a specific statement about quality.

Via Agile Impressions:

“When we encompass the relativity of quality, we have a tool to make those discussions more fruitful.  Each time somebody asserts a definition of software quality, we simply ask, “Who is the person behind that statement about quality.”

Quality Is Value To Some Person

Whose requirements count the most?

Via Agile Impressions:

“The political/emotional dimension of quality is made evident by a somewhat different definition of quality.  The idea of ‘requirements’ is a bit too innocent to be useful in this early stage, because it says nothing about whose requirements count the most. A more workable definition would be this:

‘Quality is value to some person.’

By ‘value,’ I mean, ‘What are people willing to pay (do) to have their requirements met.’ Suppose, for instance, that Terra were not my niece, but the niece of the president of the MiniCozy Software Company.  Knowing MiniCozy’s president’s reputation for impulsive emotional action, the project manager might have defined “quality” of the word processor differently.  In that case, Terra’s opinion would have been given high weight in the decision about which faults to repair.”

The Definition of “Quality” is Always Political and Emotional

Quality is a human thing.

Via Agile Impressions:

“In short, the definition of ‘quality’ is always political and emotional, because it always involves a series of decisions about whose opinions count, and how much they count relative to one another. Of course, much of the time these political/emotional decisions– like all important political/emotional decisions–are hidden from public view. Most of us software people like to appear rational. That’s why very few people appreciate the impact of this definition of quality on the Agile approaches.”

Agile Teams Can Help Make Decisions About Quality More Explicit Transparent

Open processes and transparency can help arrive at a better quality bar.

Via Agile Impressions:

“What makes our task even more difficult is that most of the time these decisions are hidden even from the conscious minds of the persons who make them.  That’s why one of the most important actions of an Agile team is bringing such decisions into consciousness, if not always into public awareness. And that’s why development teams working with an open process (like Agile) are more likely to arrive at a more sensible definition of quality than one developer working alone. To me, I don’t consider Agile any team with even one secret component.”

The "Customer" Must Represent All Significant Decisions of Quality

The quality of your product will be gated by the quality of your representation.

Via Agile Impressions:

“Customer support is another emphasis in Agile processes, and this definition of quality guides the selection of the ‘customers.’ To put it succinctly, the ‘ customer’ must actively represent all of the significant definitions of ‘quality.’ Any missing component of quality may very likely lead to a product that’s deficient in that aspect of quality.”

If You Don’t Have Suitable Representation of Views on Quality, You’re Not Agile

It’s faster and far more efficient to ignore people and get your software done.  But it’s far less effective.  Your amplify your effectiveness for addressing quality by involving the right people, in the right way, at the right time.  That’s how you change your quality game.

Via Agile Impressions:

“As a consultant to supposedly Agile teams, I always examine whether or not they have active participation of a suitable representation of diverse views of their product’s quality. If they tell me, “We can be more agile if we don’t have to bother satisfying so many people, then they may indeed by agile, but they’re definitely not Agile.”

I’ve learned a lot about quality over the years.  Many of Jerry Weinberg’s observations and insights match what I’ve experienced across various projects, products, and efforts.   The most important thing I’ve learned is how much value is in the eye of the beholder and the stakeholder and that quality is something that you directly impact by having the right views involved throughout the process.

Quality is not something you can bolt on or something that you can patch.

While you can certainly improve things, so much of quality starts up front with vision and views of the end in mind.

You might even say that quality is a learning process of realizing the end in mind.

For me, quality is a process of vision + rapid learning loops to iterate my way through the jungle of conflicting and competing views and viewpoints, while brining people along the journey.

Categories: Architecture, Programming

10 Program Busting Caching Mistakes

While Ten Caching Mistakes that Break your App by Omar Al Zabir is a few years old, it is still a great source of advice on using caches, especially on the differences between using a local in-memory cache and when using a distributed cache.

Here are the top 10 mistakes (summarized):
  1. Relying on a default serializer. Default serializers can use a lot of CPU, especially for complex types. Give some thought to the best serialization and deserialization method for your language and environment.
  2. Storing large objects in a single cache item. Because of serialization and deserialization costs, under concurrent load, frequent access to large object graphs can kill your server's CPU. Instead, break up the larger graph into smaller subgraphs and cache them separately. Retrieve only the smallest unit you need.
  3. Using cache to share objects between threads. Race conditions, when writes are involved, develop if parts of a program are accessing the same cached items simultaneously. Some sort of external locking mechanism is needed. 
  4. Assuming items will be in cache immediately after storing them. Never assume an item will be in a cache, even after it was just written, because a cache can flush items when memory gets tight. Code should always check for a null return value from a cache.
  5. Storing entire collection with nested objects. Storing an entire collection when you need to get a particular item results in poor performance because of the serialization overhead. Cache individual items separately so they can be retrieved separately. 
  6. Storing parent-child objects together and also separately. Sometimes an object will simultaneously be contained in two or more parent objects. To not have the same object stored in two different places in the cache store it on its own under its own key. The parent objects will then read the objects when access is needed.
  7. Caching Configuration settings. Store configuration data in a static variable that is local to your process. Accessing cached data is expensive so you want to avoid that cost when possible.
  8. Caching Live Objects that have open handle to stream, file, registry, or network. Don't cache objects the have references to resources like files, streams, memory, etc. When the cached item is removed from the cache those resources will not be deleted and system resources will leak. 
  9. Storing same item using multiple keys. It can be convenient to access an item by a key and an index number.  This can work when a cache is in-memory because the cache can contain a reference to the same object which means changes to the object will be seen through both access paths. When using a remote cache any updates won't be visible so the objects will get out of sync. 
  10. Not updating or deleting items in cache after updating or deleting them on persistent storage. Items in a remote cache are stored as a copy, so updating an object won't update the cache. The cache must specifically be updated for the changes to be seen by anyone else. With an in-memory cache changes to an object will be seen by everyone. Same for deletion. Deleting an object won't delete it from the cache. It's up to the program make sure cached items are deleted correctly.
Categories: Architecture

Bitly: Lessons Learned Building a Distributed System that Handles 6 Billion Clicks a Month

Have you ever wondered how bitly makes money? A URL shortener can’t be that hard to write, right? Sean O'Connor, Lead Application Developer at bitly, answers the how can bitly possibly make money question immediately in a talk he gave on bitly at the Bacon conference.

Writing a URL shortner that works is easy, says Sean, writing one that scales and is highly available, is not so easy.

Bitly doesn’t make money with a Shortening as a Service service, bitly makes money on an analytics product that mashes URL click data with with data they crawl from the web to help customers understand what people are paying attention to on the web. 

Analytics products began as a backend service that crawled web server logs. Logs contained data from annotated links along with cookie data to indicate where on a page a link was clicked, who clicked it, what the link was, etc. But the links all went back to the domain of the web site. The idea of making links go to a different domain than your own so that a 3rd party can do the analytics is a scary proposition, but it’s also kind of genius.

While this talk is not on bitly’s architecture, it is a thoughtful exploration on the nature of distributed systems and how you can solve bigger than one box problems with them.

Perhaps my favorite lesson from his talk is this one (my gloss):

SOA + queues + async messaging is really powerful. This approach isolates components, lets work happen concurrently, lets boxes fail independently, while still having components be easy to reason about.

I also really like his explanation for why event style messages are better than command style messages. I’ve never heard it put that way before.

Sean talks from a place of authentic experience. If you are trying to make a jump from a single box mindset to a multibox way of thinking, this talk is well worth watching. 

So let’s see what Sean has to say about distributed systems...

Stats
Categories: Architecture

Better than something or someone or just good

Gridshore - Sun, 07/13/2014 - 15:56

Recently I saw some tweets about the possibility to finally have a website to express your hate for Eclipse. Since I am a strong believer in another tool than eclipse and I really don’t want to work with eclipse, at first I thought this was funny. Than I was checking some news posts and found a post on apple insider where they show a Samsung advertorial that bashes the iPhone battery life. Again, funny if you like samsung or you think android is better than the iPhone. Maybe even funny if you still love the iPhone.

These tweets and sources of information made me think about marketing of products, opinions of people and sometimes company strategies. As a company, do you want to prove you are better than someone else, or do you want to prove that you are really good. Personally I do not really care if I am better than someone else, I only care if I am good or maybe even superb.

Does it really work to be better than … ? Was Argentina better than the Dutch just a week a go? If not, like so many say, did it help the Dutch? Better than is an interesting concept in soccer. If your soccer teams never wins the championship, you as a fan still think they are the best. I get the same with my local soccer team, every parent thinks his kid is better than the other kids. So better than is incredibly subjective. If you say you create better software, does it sell better? What is good software? If you are into software quality you probably have a few tools to calculate the quality of software. Are you a better programmer if the lines of code you write have shorter length? Are you a better software developer if you use another tool than eclipse? I don’t really care. Use eclipse if you want to, use something else if you can afford it.

I like to make fun of people that use eclipse just like I make fun of people using linux or windows. But in the end, I don’t really care. I know people using windows or linux, working on eclipse can make better software than I can. I do not want to work for a company that tries to sell projects only by stating that they are better than someone else. I want my company to tell customers about what we are good at.

So if you are in the hating or Better than business put a hate comment on this blog, if not have a good day.

The post Better than something or someone or just good appeared first on Gridshore.

Categories: Architecture, Programming

WorldCup 2014 Retrospective: The Magic about Mindset and Leadership

Xebia Blog - Sun, 07/13/2014 - 13:56

This weekend preparing this bjohan-cruijff-hollande-1974logpost, I ran into a brilliant quote from Johan Cruijff. At a conference a few years ago for the Dutch local government, he told a great story about a talented blind golfer, Ronald Boef he played golf with.  Despite his handicap, Ronald Boef played his best golf in difficult mental circumstances like playing balls over a big pond or consistent putting. The conclusion of Johan Cruijff: "Ronald doesn’t “see" the problems, he is only focussing on the next target. He thinks from a positive mindset".   I couldn’t agree more.  In my opinion, this is one of the fundamentals behind eXtreme Manufacturing (XM) and the reason why the Dutch team didn’t made it through the WorldCup finals.

Like many consultants, topsport is an inspiring source for me.  Almost every day I show or tell stories from great sport coaches like Marc Lammers or Johan Cruijjff.   Like every major sports event, also this WorldCup in Brasil contained some interesting lessons for me I wanted to share with you.

The Big Question: You can have the best individual team members but still not be able to perform.  Why?

 3rd Place Playoff - 2014 FIFA World Cup Brazil

Top Team Ingredient #1: Mindset

The defeat of Spain against the Netherlands, the glorious win of Germany over Brazil showed having fun, faith and determination pay off and a lack of these ingredients will bring you in a lot of trouble.   Until the penalty series of the semi-finals, the right side of this recipe also worked for the Dutch squad. Now, penalty series are for no one a fun exercise, which only leaves faith and determination.   Unlike the previous penalty series against Costa Rica, the Dutch team had no faith in their keeper as a penalty-killer which directly effected the teams determination. They became more hesitant and aware of what could happen when missing a penalty.  Yes, Ronald Boef probably would have taken the penalties better than the Dutch team did against Argentina.. ;-)

Top Team Ingredient #2: Leadership

Like Johan Cruijjf stated in the same video, the leader on the pitch should be 100% concentrated on every detail and also (in my words) be the natural leader of the team, coaching them in keeping the spirit up and giving them enough room “to grow".  Despite his great qualities as a football-player, as a captain Robin van Persie was obviously not the natural leader of the team. Arjan Robben was. The natural leadership of Arjan Robben in combination with his determination was an important reason why The Netherlands were able to regain their motivation and pull off a highly respected 3rd place in this WorldCup.

In my opinion, a high performing team should always have a natural leader.  The options:

  1. A formal leader with natural leadership qualities is the perfect combination.
  2. A formal leader without natural leadership qualities but able to delegate this to another team member is also okay.
  3. A formal leader without natural leadership qualities and ignoring don’t having this competence, is bad news for the team, the team’s environment but above all, for the formal leader himself.

For the new coach van the Dutch team, Guus Hiddink, it will be a challenge convincing Robin van Persie to step back as the 1st captain after nominating Arjan Robben.  Robin van Persie should keep one thing in mind here:  no one is doubting his qualities as a top world class striker.  As a natural leader however, he is not that world class.  Trying to be one is effecting his performance as a world class striker and that would in the end be a disappointment for his supporters but above all, for Robin van Persie himself.

What does this imply for Leadership within organizations?

Leadership, especially natural leadership, is crucial for having highly motivated and productive teams.  The team stays motivated and focussed on their goal.

How ever, a lot of employees are still instrumentally “nominated” to become a coach or manager without having any leadership skills.  In my opinion, natural leadership is something you can’t gain by nomination or just by learning it.  You can improve it, but there should be some basis of natural leadership.  Ignoring this can be even counter-productive: conflicts will arise, the spirit and productivity will go down.

Stuff The Internet Says On Scalability For July 11th, 2014

Hey, it's HighScalability time:


Yesterday in history: Nikola Tesla's Birthday, born in 1856. The greatest geek who ever lived?
  • 10Gbps: New world record broadband speed of 10 Gbps over copper.
  • Quotable Quotes:
    • @BenedictEvans: There were 40m internet users when Netscape IPOed. The time's not far off when a startup with 40m users will be too small to get funded.
    • Scott Aaronson: In any case, the question I asked myself about CLEVER/PageRank was not the one that, maybe in retrospect, I should have asked: namely, “how can I leverage the fact that I know the importance of this idea before most people do, in order to make millions of dollars?”
    • chub79: µservices aren't technological as much as they are cultural.
    • @Elmood: I thought of a new term when talking about code: "It's made from unmaintainium."
    • @lxt: Amazing how quickly a bunch of nines go up in smoke.
    • @martinrue: Knock knock. Race condition. Who's there?

  • The Master Switch: The Rise and Fall of Information Empires by Tim Wu: History shows a typical progression of information technologies: from somebody’s hobby to somebody’s industry; from jury-rigged contraption to slick production marvel; from a freely accessible channel to one strictly controlled by a single corporation or cartel—from open to closed system. History also shows that whatever has been closed too long is ripe for ingenuity’s assault: in time a closed industry can be opened anew, giving way to all sorts of technical possibilities and expressive uses for the medium before the effort to close the system likewise begins again.

  • Tim Freeman indulges a well developed Technothantos Complex and comes up with a great big list of outage postmortems. You'll find the usual, outages from configuration issues, failover failures, quorumnesia, protocol flapping, bugs in not your stuff that causes bugs in your stuff, power outages, capacity problems, JPOBs (just plain old bugs), DDOS attacks, and good old operator error. 

  • Pinterest describes PinLater, An asynchronous job execution system. PinLater executes hundreds of different job types at a processing rate of over 100,000 per second. So you may say yet another async job system, but it's clear keeping such a critical part of their infrastructure in house makes sense. The article is a good explanation of a fairly standard approach. It used Thrift for the API, it's written in Java, Twitter’s Finagle is used for the RPC framework. MySQL is "used for relatively low throughput use cases and those that schedule jobs over long periods and thus can benefit from storing jobs on disk rather than purely in memory." Redis is "used for high throughput job queues that are normally drained in real time." Horizontal scaling is via sharding. 

  • In science class we did this one day, but I just couldn't do it. Dissecting Message Queues. Tyler Treat looks at both brokerless and brokered queues by looking a throughput benchmarks, latency benchmarks, and through qualitative analysis. No winner was declared, but if you are making a choice in this area it's well worth reading. 

  • 40 Million hits a day on WordPress using a $10 VPS. Sure, it's a static site, but still a good example of what can be done these days. Stack: Nginx + PHP-FPM (aka LEMP Stack) + Microcaching. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Why Little's Law Works...Always

Xebia Blog - Fri, 07/11/2014 - 12:00

On the internet there is much information on Little's Law. It is described an explained in many places [Vac]. Recently, the conditions under which it is true got  attention [Ram11]. As will be explained in this blog the conditions under which the law is true are very mild. It will be shown that for teams working on backlog items virtually there are no conditions.

Why does it work? Under what conditions does it hold?

Airplane Folding

In the previous post (Applying Little's Law in Agile Games) I described how to calculate the quantities in Little's Law. As an example this was applied to the Agile game of folding airplanes consisting of 3 rounds of folding.

airplane-1

Let's look in more detail in which round an airplane was picked up and in which round it was completed. This is depicted in the following figure.

The horizontal axis shows the number of rounds. The vertical axis describes each airplane to fold. The picture is then interpreted as follows. Airplane no. 2 is picked up in round 1 and competed in the same round. It has a waiting time of 1 round. This is indicated at the right of the lowest shaded rectangle.
Airplane no. 8 was picked up in round 1 and finished in round 3. A waiting time of 3 rounds. Airplane no 12 (top most shaded area) was picked up in round 3 and unfinished. Up to round 3 a waiting time of 1 round.

The number 3, 5, and 10 denote the number of completed airplanes at the end of round 1, 2, and 3 respectively.

Waiting Times

The waiting times are determined by counting the number of 'cells' in a row.

The pictures show that we have 12 airplanes (12 'rows'), 3 completed in the first round, 2 more completed in the second round and 5 additionally  folded airplanes in the third and last round giving a total of 10 finished paper airplanes.

All twelve airplanes have waiting times of 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, and 1 respectively.

Work in Progress

In the figure below the number of airplanes picked up by the team in each round are indicated with red numbers above the round. new doc 6_2
In round 1 the team has taken up the folding of 11 airplanes (of which 3 are completed). In round 2 the team was folding 8 airplanes (of which 2 were competed) and in round 3 the team was folding 7 airplanes (of which it completed 5).

Work in progress is determined by counting the number of 'cells' in a column.

Little's Law.....Again

Now that we have determined the waiting times and amount of work in progress, let's calculate the average waiting time and average work in progress.

Average Waiting Time. This quantity we get by adding all waiting times and dividing by the number of items. This gives 26/12.

Average Work in Progress. This quantity is equal to (11+8+7)/3 = 26/3.

Average input rate. This is equal to 12 (the height of the third column) divided by 3 which gives 4.

Again we find that: Average Waiting Time = Average Work in Progress / Average input rate.

Why It Works

new doc 6_3 Little's Law works....always....because the average waiting times is got by adding the lengths of all the rows dividing by the number of rows, so it is proportional to the size of the shaded area in the picture to the right.

The average work in progress is got by adding the heights of the columns in the shaded area which is also proportional to the size of the shaded area.

Both the waiting time and work in progress relate to the size of the shaded area: one by adding the heights and the other by adding the rows. The proportionality corresponds to the average input rate.

Conditions

What assumptions did we make? None...well this is not exactly true. The only assumptions we make in this calculation:

  • We count discrete items
  • There are a finite number of rounds (or sprints)
  • Items enter and possibly leave the system.

That's it. It doesn't need to be stable, ageing (items having increasingly larger waiting times) is not a problem, prioritisation/scheduling of items (also known as queueing discipline), etc. Only the above assumptions need to be true.

Note: Especially the second condition is important, i.e. Little's Law is measured over a finite time interval. For infinite time interval additional conditions need to be fulfilled.

Note: When applying this to agile teams we always consider finite time intervals, e.g. 6 months, 1 year, 8 sprints, etc.

Conclusion

Little's Law is true because the average waiting time is proportional to the size of the shaded area (see figure) and the average work in progress is also proportional to the size of the same shaded area.

Only 3 basic conditions need to be met for Little's Law to be true.

References

[Vac] Little’s Law Part I, Dan Vacanti, http://corporatekanban.com/littles-law-part-i/

[Ram11] Little’s Law – It’s not about the numbers, Agile Ramblings, http://agileramblings.com/2012/12/11/littles-law-its-not-about-the-numbers/

Bootstrapping and monitoring multiple processes in Docker using monit

Xebia Blog - Thu, 07/10/2014 - 23:01

If you have every tried to start a docker container and keep it running, you must have encountered the problem that this is no easy task. Most stuff I like to start in container are things like http servers, application servers and various other middleware components which tend to have start scripts that daemonize the program. Starting a single process is a pain, starting multiple processes becomes nasty. My advise is to use monit to start all but the most simple Docker application containers! When I found monit while delving through the inner works Cloud Foundry, I was ecstatic about it! It was so elegant, small, fast, with a beautiful DSL that I thought it was the hottest thing since sliced bread! I was determined to blog it off the roof tops. Until.... I discovered that the first release dated from somewhere in 2002. So it was not hot and new; Clearly I had been lying under a UNIX rock for quite a while. This time, the time was right to write about it! Most of the middleware components I want to start in a docker container, have a habit to start the process, daemonize it and exit immediately, with the docker container on its tail. My first attempt to circumvent this while starting a tomcat server in Docker looked something like this:

/bin/bash -c "service tomcat7 start;while service tomcat7 status;do sleep 1;done

Quite horrific. Imaging the ugliness when you have to start multiple processes. A better solution is needed:  With the zabbix docker container  the problem was solved using simplevisor. As you can read in this post that was not a pleasant experience either. As I knew little about simplevisor and could not solve the problem, I put in an issue and  resorted to a plain installation. But a voice in my head started nagging: "Why don't you fix it and send a pull request?"  (actually, it was the voice of my colleague Arjan Molenaar). Then, I remembered from my earlier explorations to the inner workings of Cloud Foundry, a tool that would be really suitable for the job: monit. Why? It will:

  1. Give you a beautiful,readable specification file stating which processes to start
  2. Make sure that your processes will keep on running
  3. Deliver you a clean and crisp monitoring application
  4. Reduce all your Docker starts to a single command!

In the case of the Zabbix server there were seven processes to start: the zabbix server, agent, java agent, apache, mysql and sshd. In monit this looks as follows:

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
        start program = "/sbin/service mysqld start"
        stop program = "/sbin/service mysqld stop"

check process zabbix-server with pidfile /var/run/zabbix/zabbix_server.pid
        start program = "/sbin/service zabbix-server start"
        stop program = "/sbin/service zabbix-server stop"
        depends on mysqld

check process zabbix-agent with pidfile /var/run/zabbix/zabbix_agentd.pid
        start program = "/sbin/service zabbix-agent start"
        stop program = "/sbin/service zabbix-agent stop"

check process zabbix-java-gateway with pidfile /var/run/zabbix/zabbix_java.pid
        start program = "/sbin/service zabbix-java-gateway start"
        stop program = "/sbin/service zabbix-java-gateway stop"

check process httpd with pidfile /var/run/httpd/httpd.pid
        start program = "/sbin/service httpd start"
        stop program = "/sbin/service httpd stop"
        depends on zabbix-server

check process sshd with pidfile /var/run/sshd.pid
        start program = "/sbin/service sshd start"
        stop program = "/sbin/service sshd stop"

Normally when you start monit it will start as a daemon. But fortunately, you can prevent this with the following configuration.

set init

Your Dockerfile CMD can now always look the same:

    monit -d 10 -Ic /etc/monitrc

Finally, by adding the following statement to the configuration you get an application to view the status of your container processes,

set httpd
     port 2812
     allow myuser:mypassword

After starting the container, surf to port 2812 and you will get a beautiful page showing the state of your processes and the ability to stop and restart them.

monit overview monit process control

Just delve into the documentation of monit and you will find much more features that will allow you to monitor network ports and files, start corrective actions and send out alerts.

Monit is true to its UNIX heritage: it is elegant and promotes an autonomous monitoring system. Monit is cool!

One view or many?

Coding the Architecture - Simon Brown - Wed, 07/09/2014 - 22:16

In Diagramming Spring MVC webapps, I presented an approach that allows you to create a fairly comprehensive model of a software system in code. It starts with you creating a simple base model that includes software systems, people and containers. With this in place, all of the components can then be automatically populated into the model via a scan of the compiled Java code. This is all based upon Software architecture as code.

Once you have a model to work with, it's relatively straightforward to visualise it via a number of views. In the Spring PetClinic example, three separate views (one each of a system context, containers and components view) are sufficient to show everything. With larger software systems, however, this isn't the case.

As an example, here's what a single component diagram for the web application of my techtribes.je system looks like.

A mess

Yup, it's a mess. The components around the left, top and right edges are Spring MVC controllers, while those in the centre are the core components. There are clearly three hotspots here - the LoggingComponent, ActivityComponent and ContentSourceComponent. The reason for the first should be obvious, in that almost all components use the LoggingComponent. The latter two are used by all controllers, simply because some common information is displayed on the header of all pages on the website. I don't mind excluding the LoggingComponent from the view, but I'd quite like to keep the other two. That aside, even excluding the ActivityComponent and ContentSourceComponent doesn't actually solve the problem here. The resulting diagram is still a mess because it's showing far too much information. Instead, another approach is needed.

With this in mind, what I've done instead is use a programmatic approach to create a number of views for the techtribes.je web application, one per Spring MVC controller. The code looks like this.

The result is a larger number of simple diagrams, but I think that the trade-off is worth it. It's a much better way to navigate a large model.

Not so much of a mess

And here's an example component diagram that focusses on a single Spring MVC controller.

Not so much of a mess

The JSON representing the techtribes.je model can be found on GitHub and you can copy-paste it into my (still in-progress) diagramming tool if you'd like to explore the model yourself. I'm still experimenting with much of this but I really like the opportunities provided by having the software architecture model in code. This really is "software architecture for developers". :-)

Categories: Architecture

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase that most of us are accustomed to hearing and that many still believe to be true when discussing the speed, scale and cost of processing data. To reach high speed data processing, it is necessary to utilize more memory resources which increases cost. This occurs because price increases as memory, on average, tends to be more expensive than commodity disk drive. The idea of data systems being unable to reliably provide you with both memory and fast access—not to mention at the right cost—has long been debated, though the idea of such limitations was cemented by computer scientist, Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Categories: Architecture

Sponsored Post: Surge, Apple, Dreambox, Chartbeat, Monitis, Netflix, Salesforce, Blizzard Entertainment, Cloudant, CopperEgg, Logentries, Gengo, ScaleOut Software, Couchbase, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngin

Who's Hiring?

  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here.
    • Senior Security Engineer. As a Senior Security Engineer on our team, you will be the ‘tip of the spear’ and will have direct impact on the Point-of-Sale system that powers Apple Retail globally. You will contribute to implementing standards and processes across multiple groups within the organization. You will also help lead the organization through a continuous process of learning and improving secure practices. Please apply here
    • Quality Assurance Engineer - Mobile Platforms. Apple’s Mobile Services/Emerging Technology group is looking for a highly motivated, result-oriented Quality Assurance Engineer. You will be responsible for overseeing quality engineering of mobile server and client platforms and applications in a fast-paced dynamic environment. Your job is to exceed our business customer's aggressive quality expectations and take the QA team forward on a path of continuous improvement. Please apply here.
    • Sr Software Engineer. Join Apple's Internet Applications Team, within the Information Systems and Technology group, as a Senior Software Engineer. Be involved in challenging and fast paced projects supporting Apple's business by delivering Java based IS Systems. Please apply here.
    • Senior Payment Engineer.  you will be responsible for working with cross-functional teams and developing Java server-based solutions to address business and technological needs. You will be helping design and build next generation retail solutions. You will be reviewing design and code developed by others on the team.You will build services and integrate with both internal as well as external services in a SOA environment. You will design and develop frameworks to be used by a large community of developers within the organization. Please apply here
    • Software Developer in Test. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. The ideal candidate will also seek out ways to further automate all aspects of our existing process. This is a highly technical role and requires in-depth knowledge of both white-box and black-box testing methodologies. Please apply here
    • Senior Software Engineer -iOS Systems.Do you love building highly scalable, distributed web applications? Does the idea of a fast-paced environment make your heart leap? Do you want your technical abilities to be challenged every day, and for your work to make a difference in the lives of millions of people? If so, the iOS Systems Carrier Services team is looking for a talented software engineer who is not afraid to share knowledge, think outside the box, and question assumptions. Please apply here.

  • Asana. As an infrastructure engineer you will be designing software to process, query, search, analyze, and store data for applications that are continually growing in scale. You will work with a world-class team of engineers on deploying and operating existing systems, and building new ones for problems that are unique to our problem space. Please apply here.

  • Operations Engineer - AWS Cloud. Want to grow and extend a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join DreamBox Learning’s creative team of engineers, designers, and educators. Help us radically change education in an environment that values collaboration, innovation, integrity and fun. Please apply here. http://www.dreambox.com/careers

  • Chartbeat measures and monetizes attention on the web. Our traffic numbers are growing, and so is our list of product and feature ideas. That means we need you, and all your unparalleled backend engineer knowledge to help up us scale, extend, and evolve our infrastructure to handle it all. If you've these chops: www.chartbeat.com/jobs/be, come join the team!

  • The Salesforce.com Core Application Performance team is seeking talented and experienced software engineers to focus on system reliability and performance, developing solutions for our multi-tenant, on-demand cloud computing system. Ideal candidate is an experienced Java developer, likes solving real-world performance and scalability challenges and building new monitoring and analysis solutions to make our site more reliable, scalable and responsive. Please apply here.

  • Sr. Software Engineer - Distributed Systems. Membership platform is at the heart of Netflix product, supporting functions like customer identity, personalized profiles, experimentation, and more. Are you someone who loves to dig into data structure optimization, parallel execution, smart throttling and graceful degradation, SYN and accept queue configuration, and the like? Is the availability vs consistency tradeoff in a distributed system too obvious to you? Do you have an opinion about asynchronous execution and distributed co-ordination? Come join us

  • Java Software Engineers of all levels, your time is now. Blizzard Entertainment is leveling up its Battle.net team, and we want to hear from experienced and enthusiastic engineers who want to join them on their quest to produce the most epic customer-facing site experiences possible. As a Battle.net engineer, you'll be responsible for creating new (and improving existing) applications in a high-load, high-availability environment. Please apply here.

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • OmniTI has a reputation for scalable web applications and architectures, but we still lean on our friends and peers to see how things can be done better. Surge started as the brainchild of our employees wanting to bring the best and brightest in Web Operations to our own backyard. Now in its fifth year, Surge has become the conference on scalability and performance. Early Bird rate in effect until 7/24! 

  • FoundationDB has announced a new course on concurrency which is free and fully browser-accessible. The course is targeted at developers who are familiar with the FoundationDB Key-Value Store API and want to achieve high throughput in their applications.
Cool Products and Services
  • Now track your log activities with Log Monitor and be on the safe side! Monitor any type of log file and proactively define potential issues that could hurt your business' performance. Detect your log changes for: Error messages, Server connection failures, DNS errors, Potential malicious activity, and much more. Improve your systems and behaviour with Log Monitor.

  • The NoSQL "Family Tree" from Cloudant explains the NoSQL product landscape using an infographic. The highlights: NoSQL arose from "Big Data" (before it was called "Big Data"); NoSQL is not "One Size Fits All"; Vendor-driven versus Community-driven NoSQL.  Create a free Cloudant account and start the NoSQL goodness

  • Finally, log management and analytics can be easy, accessible across your team, and provide deep insights into data that matters across the business - from development, to operations, to business analytics. Create your free Logentries account here.

  • CopperEgg. Simple, Affordable Cloud Monitoring. CopperEgg gives you instant visibility into all of your cloud-hosted servers and applications. Cloud monitoring has never been so easy: lightweight, elastic monitoring; root cause analysis; data visualization; smart alerts. Get Started Now.

  • Aerospike in-Memory NoSQL database is now Open Source. Read the news and see who scales with Aerospike. Check out the code on github!

  • consistent: to be, or not to be. That’s the question. Is data in MongoDB consistent? It depends. It’s a trade-off between consistency and performance. However, does performance have to be sacrificed to maintain consistency? more.

  • Do Continuous MapReduce on Live Data? ScaleOut Software's hServer was built to let you hold your daily business data in-memory, update it as it changes, and concurrently run continuous MapReduce tasks on it to analyze it in real-time. We call this "stateful" analysis. To learn more check out hServer.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Identifying Architectural Elements in Current Systems

Coding the Architecture - Simon Brown - Tue, 07/08/2014 - 10:27

Simon recently talked about the gap between Software Architecture and Code and how to close this with architecturally-evident coding. He's also creating tools to allow Software Architecture to be expressed as code.

If you're working on a greenfield project then including annotations to help with navigation is a great solution but what if you've inherited a large system with a model-code gap? Or if you only realise, sometime into a project, that you lack a model to help you understand its growing complexity? Well, Simon also had some thoughts on scanning Spring annotations to provide this data. This works quite well and it got me thinking about other artifacts in code that can be extracted for these diagrams.

(In more formal terms - for those of you that like to quote ISO42010 - we are trying to extract architectural elements from code that can be displayed within architectural views. Of course the elements may be from a variety of differing abstractions, levels and granularity and therefore need to be placed within differing views.)

So what can we extract from a current/legacy system to give us a view into it? Some suggestions include:

Annotations As already suggested, dependency injections systems such as Spring provide some annotations that can be extracted to give a basic, high level model. Annotations are also present in Java EE applications and other enterprise frameworks. XML DI Configuration files Many (legacy) Spring projects use xml configuration files to define beans. Having scanned a few examples this seems to create a relatively low level model which would need some manual tweaking after generation. With sensible naming convention for beans you can produce models for a desired abstraction. The bean properties indicate the connections between these elements. Module Bundling Systems Modular systems such as OSGi define bundles of components and services including lifecycle and service registry. The deployment information should provide a high level overview. Packages If you have used 'package-by-component' then your packages will relate one-to-one with your components. The links between components should be identifiable by the links between the classes within them (the has-a relationships). If you have package-by-layer then this is much harder or impossible to use. Experience tells me that most real-world systems are actually a combination of the two so you should have some useful information. Class Names It's very common for class names to contain a strong indication of their role e.g. XyzService, XyzConnector, XyzDao, XyzFacade etc. Scanning for known patterns should identify the element names and roles. Interfaces and class hierarchies If you implement interfaces (or extend base classes) then the interfaces used may show the abstraction level and type e.g. implementing Service, DAO, Connector, Repository etc Delegation or Library Dependency Shared delegates used by a set of classes/functions may indicate their purpose. e.g. components delegating to a database utility might indicate a DAO component or using a CORBA utility might indicate a service. This is likely to be time consuming as you need to identify and scan for each delegate you are interested in. Comments/Javadoc/JSDoc/NDoc/Doclets Comments and javadoc style API comments can provide a large amount of meta-information about a class or package. In fact, many UML modeling tools enrich code using custom comments and tags. This has the advantage of not affecting the compiled code or introducing library dependencies but may not be consistently used. Tests Test can provide a lot of meta-data about your system. Unit tests tend to be concentrated around important classes and often construct entire components to test. Simply extracting the names of classes that are directly tested will produce a useful list of components. The higher level systems tests will reveal the important services. Build Systems Build systems such as ant, maven, NuBuild etc all have hooks into the code base for building and deployment. A simple extraction of the build targets will give you the deployment modules (which is a very helpful view for operation teams). This may give you the required information for a Containers view.

What do you think? Of course all of the above is very dependent on your codebase but if none of the above works then you have to question the quality and structure of the code! The data extracted may need filtering and manual correction as it won't give you exactly what you want. You might consider creating structurizr annotations using an initial scan and then maintaining them. One of my tasks for the next few weeks is to try this out on some legacy codebases I maintain.

What other ways of identifying architectural elements can you think of?

Categories: Architecture

Applying Little's Law in Agile Games

Xebia Blog - Mon, 07/07/2014 - 21:20

Have you ever used Little's Law to explain that lower WiP (work in progress) limits lead to shorter cycle times? Ever tried to illustrate Little's Law in an Agile game and found it doesn't hold? Then read this blog to discover that it is exactly true in Agile games and how it really works.

Some time ago I gave a kanban workshop. Part of the workshop was a game of folding paper airplanes to illustrate flow. To illustrate Little's Law we determined the throughput, cycle time and work in progress. To my surprise the law didn't hold. Not even close. In this blog I want to share the insight into why it does work!

Introduction

It is well known that the average number of items in progress is proportional to the average cycle time of completed work items. The proportionality is the average input rate (or throughput rate) of work items. This relation is known as Little's Law. It was discovered by Little in the 1960s and has found many applications.

In kanban teams this relationship is often used to qualitatively argue that it is favourable for flow to have not too much work in parallel. To this end WiP (work in progress) limits are introduced. The smaller the WiP the smaller the average cycle time which means better flow.

A surprise to me was that it is exactly true and it remains true under very relaxed conditions.

Little's Law

In mathematical form the law is often stated as:

(1)

 \bar{N} = \lambda \bar{W} \bar{N} = \lambda \bar{W}

Here  \bar{N} \bar{N} is the average number of work items in progress at a certain time, and  \bar{W} \bar{W} is the average cycle time.  \lambda \lambda is the average input rate (new work items per unit of time). In stable systems this also equals the average throughput. In this case Little's Law is often (re)stated as

(2)

 \frac{\mathrm{Work\, in\, progress}}{\mathrm{Throughput}} = \mathrm{Cycle Time} \frac{\mathrm{Work\, in\, progress}}{\mathrm{Throughput}} = \mathrm{Cycle Time}

Conditions

In practise one considers Little's Law over a finite period of time, e.g. 6 months, 5 sprints, 3 rounds in an Agile game. Also in practise, teams work on backlog items which are discrete items. After the work is done this results in a new product increment.

Under the following conditions (1) is exact:

  • The system is observed over a finite period of time,
  • The system is a queueing system.

A queuing system is a system that consists of discrete items which arrive at a certain rate, receive service after which they depart.

Examples of a queueing system. An agile team works on backlog items. A kanban team that works on production incidents. A scrum team.

Agile Game

An often used game for explaining the importance of flow to team is the game of folding paper airplanes. Many forms of this games exist. See e.g. [Heintz11].

For this blog's purpose consider a team that folds air planes. The backlog is a stack of white paper. 3 Rounds of folding are done. Airplanes that are folded and fly at least 2 meters are considered done.

new doc 5_1-1

At the end of each round we will collect the following metrics:

  • number of completed airplanes
  • number of airplanes in progress and not yet finished.

The result of the the 3 rounds are shown at the right. At the end of round 1 Team A completed 3 airplanes and having 8 unfinished airplanes. Likewise, Team B finished 4 airplanes in round 3 giving a total of 12 finished airplanes and having 6 unfinished airplanes in progress.

The cycle time are got by writing the round number of the sheet of paper when starting to fold the airplane. When done, write the round number of the paper. The cycle time for one airplanes is got by subtracting the two and adding 1.

Calculating Little's Law

The way I was always calculating the number for work in progress, throughput and cycle time has been

  1. averaging cycle time for all completed airplanes,
  2. averaging the throughput over all rounds,
  3. averaging the work in progress over all rounds.

When calculated at the end of round 3, for Team A this amounts to:

  • Average work in progress = (8+6+2)/3 = 16/3,
  • Average throughput = 10 (completed airplanes)/3 = 10/3,
  • Average cycle time = 22/10 = 11/5

Using (2) above we get: 16/3 / (10/3) = 8/5. This is not equal to the average cycle time of 11/5. Not even close. How come?

The Truth

The interpretation of work in progress, throughput and cycle time I got from working with cumulative flow diagrams. There are many resources explaining these, see e.g. [Vega2011].

The key to the correct interpretation is choosing the time interval for which to measure the quantities  \bar{W} \bar{W} ,  \lambda \lambda , and  \bar{W} \bar{W} . Second, using the input rate instead of the throughput. Third, at the end of the time period include the unfinished items. Last, in calculating  \bar{N} \bar{N} consider all items that went through the system.

When we reinterpret the results for teams A and B we get

Team A

  • Average work in progress
    In round 1 3 airplanes were completed and left 8 unfinished; a total of 11 for work in progress (11 airplanes picked up as work)
    In round 2 the team completed 2 airplanes and have 6 unfinished; a total of 8
    In round 3 the team finished an additional 5 airplanes and left 2 uncompleted; a total of 7
    When measured over 3 rounds an average of (11+8+7)/3 = 26/3
  • Average input rate
    Using the input rate:
    In round 1 the team picked up 11 new airplanes
    In round 2 the team picked up no additional airplanes
    In round 3 one new airplanes was picked up.
    An average input rate of (11+0+1)/3 = 4 airplanes per round
  • Average cycle time
    At the end of the third round 2 airplanes are left in progress; one taken up in the third round having a waiting time of 1 and one left from the first round having waiting time of 3 rounds. A total waiting time of 22 + 3 + 1 = 26 rounds.
    Averaging over 12 airplanes we have an average cycle time of 26/12 = 13/6 rounds per airplane.

Dividing the average work in progress by the average input rate we get 26/3 divided by 4 = 26/12(!). This is exactly equal to the calculated average cycle time!

Team B

In a similar manner the reinterpreted results for team B are:

  • Average work in progress = (13+14+10)/3 = 37/3 airplanes,
  • Average input rate = (13+2+3)/3 = 6 airplanes per round,
  • Average cycle time = (27 (completed) + 10 (unfinished))/18 (airplanes) = 37/18 rounds per airplane

Again, dividing the average work in progress by the average input rate we get 37/18 rounds per airplane, which again is exactly equal to the average cycle time or waiting time!

Note: the cycle time of 10 days is built up by (a) 1 airplane from round 1 (cycle time of 3), 2 airplanes picked up in round 2 (total of 4 rounds), 3 airplanes picked up in round 3 (total of 3 rounds).

What About Cumulative Flow Diagrams?

Now that we understand how to calculate the quantities in Little's Law, we go back to cumulative flow diagrams. How come Little's Law works in this case.

In the case of teams that have collected data on cycle time, work in progress and throughput Little's Law work when done as explained in the section 'Calculating Little's Law' because:

  1. the teams are kept stable by having WiP limits on the left most column ("To Do"); then the throughput is more or less equal to the input rate,
  2. the team has completed a fairly large amount of work items in which case the waiting time of unfinished work items can be neglected,
  3. when measured over the (large part of the) value creation process, the completed items per time period can often be neglected in the calculation of the average work in progress.
Summary

Little's Law (1) holds under the conditions that (a) the system considered is a queueing system and (b) the observation or measurements are done over a finite time interval. It then holds independently of the stationaryness of the probability distributions, queuing discipline, emptiness of the system at the start and end of the time interval.

Calculate the quantities  \bar{N} \bar{N} ,  \lambda \lambda , and  \bar{W} \bar{W} as follows:

  • Average work in progress  \bar{N} \bar{N}
    For each time interval considered count the total amount of work in the system and add any items completed in that time interval.
  • Average cycle time  \bar{W} \bar{W}
    Sum the cycle times for all completed items and include the waiting time for unfinished items and divide by the total number of items.
  • Average input rate  \lambda \lambda
    Add the total number of items that entered the system and divide by the total number of time intervals.
References

[Little61] Little, J. D. C. 1961. A proof for the queuing formula: L = ãW . Oper. Res. 9(3) 383–387.

[Heintz11] John Heintz, June 2011, Agile Airplane Game, GistLabs, http://gistlabs.com/2011/06/agile-airplane-game/

[Vega11] Vega Information System Services, Inc., September 2011, Basics of Reading Cumulative Flow Diagrams, http://www.vissinc.com/2011/09/29/basics-of-reading-cumulative-flow-diagrams/

 

Scaling the World Cup - How Gambify runs a massive mobile betting app with a team of 2

This is a guest post by Elizabeth Osterloh and Tobias Wilke of cloudControl.

Startups face very different issues than big companies when they build software. Larger companies develop projects over much longer time frames and often have entire IT-departments to support them in creating customized architecture. It’s an entirely different story when a startup has a good idea, it gets popular, and they need to scale fast.

This was the situation for Gambify, an app for organizing betting games released just in time for the soccer World Cup. The company was founded and is run in Germany by only two people. When they managed to get a few major endorsements (including Adidas and the German team star Thomas Müller), they had to prepare for a sudden deluge of users, as well as very specific peak times.

The Gambify App: Basic Architecture
Categories: Architecture