Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Architecture

10 Program Busting Caching Mistakes

While Ten Caching Mistakes that Break your App by Omar Al Zabir is a few years old, it is still a great source of advice on using caches, especially on the differences between using a local in-memory cache and when using a distributed cache.

Here are the top 10 mistakes (summarized):
  1. Relying on a default serializer. Default serializers can use a lot of CPU, especially for complex types. Give some thought to the best serialization and deserialization method for your language and environment.
  2. Storing large objects in a single cache item. Because of serialization and deserialization costs, under concurrent load, frequent access to large object graphs can kill your server's CPU. Instead, break up the larger graph into smaller subgraphs and cache them separately. Retrieve only the smallest unit you need.
  3. Using cache to share objects between threads. Race conditions, when writes are involved, develop if parts of a program are accessing the same cached items simultaneously. Some sort of external locking mechanism is needed. 
  4. Assuming items will be in cache immediately after storing them. Never assume an item will be in a cache, even after it was just written, because a cache can flush items when memory gets tight. Code should always check for a null return value from a cache.
  5. Storing entire collection with nested objects. Storing an entire collection when you need to get a particular item results in poor performance because of the serialization overhead. Cache individual items separately so they can be retrieved separately. 
  6. Storing parent-child objects together and also separately. Sometimes an object will simultaneously be contained in two or more parent objects. To not have the same object stored in two different places in the cache store it on its own under its own key. The parent objects will then read the objects when access is needed.
  7. Caching Configuration settings. Store configuration data in a static variable that is local to your process. Accessing cached data is expensive so you want to avoid that cost when possible.
  8. Caching Live Objects that have open handle to stream, file, registry, or network. Don't cache objects the have references to resources like files, streams, memory, etc. When the cached item is removed from the cache those resources will not be deleted and system resources will leak. 
  9. Storing same item using multiple keys. It can be convenient to access an item by a key and an index number.  This can work when a cache is in-memory because the cache can contain a reference to the same object which means changes to the object will be seen through both access paths. When using a remote cache any updates won't be visible so the objects will get out of sync. 
  10. Not updating or deleting items in cache after updating or deleting them on persistent storage. Items in a remote cache are stored as a copy, so updating an object won't update the cache. The cache must specifically be updated for the changes to be seen by anyone else. With an in-memory cache changes to an object will be seen by everyone. Same for deletion. Deleting an object won't delete it from the cache. It's up to the program make sure cached items are deleted correctly.
Categories: Architecture

Bitly: Lessons Learned Building a Distributed System that Handles 6 Billion Clicks a Month

Have you ever wondered how bitly makes money? A URL shortener can’t be that hard to write, right? Sean O'Connor, Lead Application Developer at bitly, answers the how can bitly possibly make money question immediately in a talk he gave on bitly at the Bacon conference.

Writing a URL shortner that works is easy, says Sean, writing one that scales and is highly available, is not so easy.

Bitly doesn’t make money with a Shortening as a Service service, bitly makes money on an analytics product that mashes URL click data with with data they crawl from the web to help customers understand what people are paying attention to on the web. 

Analytics products began as a backend service that crawled web server logs. Logs contained data from annotated links along with cookie data to indicate where on a page a link was clicked, who clicked it, what the link was, etc. But the links all went back to the domain of the web site. The idea of making links go to a different domain than your own so that a 3rd party can do the analytics is a scary proposition, but it’s also kind of genius.

While this talk is not on bitly’s architecture, it is a thoughtful exploration on the nature of distributed systems and how you can solve bigger than one box problems with them.

Perhaps my favorite lesson from his talk is this one (my gloss):

SOA + queues + async messaging is really powerful. This approach isolates components, lets work happen concurrently, lets boxes fail independently, while still having components be easy to reason about.

I also really like his explanation for why event style messages are better than command style messages. I’ve never heard it put that way before.

Sean talks from a place of authentic experience. If you are trying to make a jump from a single box mindset to a multibox way of thinking, this talk is well worth watching. 

So let’s see what Sean has to say about distributed systems...

Stats
Categories: Architecture

Better than something or someone or just good

Gridshore - Sun, 07/13/2014 - 15:56

Recently I saw some tweets about the possibility to finally have a website to express your hate for Eclipse. Since I am a strong believer in another tool than eclipse and I really don’t want to work with eclipse, at first I thought this was funny. Than I was checking some news posts and found a post on apple insider where they show a Samsung advertorial that bashes the iPhone battery life. Again, funny if you like samsung or you think android is better than the iPhone. Maybe even funny if you still love the iPhone.

These tweets and sources of information made me think about marketing of products, opinions of people and sometimes company strategies. As a company, do you want to prove you are better than someone else, or do you want to prove that you are really good. Personally I do not really care if I am better than someone else, I only care if I am good or maybe even superb.

Does it really work to be better than … ? Was Argentina better than the Dutch just a week a go? If not, like so many say, did it help the Dutch? Better than is an interesting concept in soccer. If your soccer teams never wins the championship, you as a fan still think they are the best. I get the same with my local soccer team, every parent thinks his kid is better than the other kids. So better than is incredibly subjective. If you say you create better software, does it sell better? What is good software? If you are into software quality you probably have a few tools to calculate the quality of software. Are you a better programmer if the lines of code you write have shorter length? Are you a better software developer if you use another tool than eclipse? I don’t really care. Use eclipse if you want to, use something else if you can afford it.

I like to make fun of people that use eclipse just like I make fun of people using linux or windows. But in the end, I don’t really care. I know people using windows or linux, working on eclipse can make better software than I can. I do not want to work for a company that tries to sell projects only by stating that they are better than someone else. I want my company to tell customers about what we are good at.

So if you are in the hating or Better than business put a hate comment on this blog, if not have a good day.

The post Better than something or someone or just good appeared first on Gridshore.

Categories: Architecture, Programming

WorldCup 2014 Retrospective: The Magic about Mindset and Leadership

Xebia Blog - Sun, 07/13/2014 - 13:56

This weekend preparing this bjohan-cruijff-hollande-1974logpost, I ran into a brilliant quote from Johan Cruijff. At a conference a few years ago for the Dutch local government, he told a great story about a talented blind golfer, Ronald Boef he played golf with.  Despite his handicap, Ronald Boef played his best golf in difficult mental circumstances like playing balls over a big pond or consistent putting. The conclusion of Johan Cruijff: "Ronald doesn’t “see" the problems, he is only focussing on the next target. He thinks from a positive mindset".   I couldn’t agree more.  In my opinion, this is one of the fundamentals behind eXtreme Manufacturing (XM) and the reason why the Dutch team didn’t made it through the WorldCup finals.

Like many consultants, topsport is an inspiring source for me.  Almost every day I show or tell stories from great sport coaches like Marc Lammers or Johan Cruijjff.   Like every major sports event, also this WorldCup in Brasil contained some interesting lessons for me I wanted to share with you.

The Big Question: You can have the best individual team members but still not be able to perform.  Why?

 3rd Place Playoff - 2014 FIFA World Cup Brazil

Top Team Ingredient #1: Mindset

The defeat of Spain against the Netherlands, the glorious win of Germany over Brazil showed having fun, faith and determination pay off and a lack of these ingredients will bring you in a lot of trouble.   Until the penalty series of the semi-finals, the right side of this recipe also worked for the Dutch squad. Now, penalty series are for no one a fun exercise, which only leaves faith and determination.   Unlike the previous penalty series against Costa Rica, the Dutch team had no faith in their keeper as a penalty-killer which directly effected the teams determination. They became more hesitant and aware of what could happen when missing a penalty.  Yes, Ronald Boef probably would have taken the penalties better than the Dutch team did against Argentina.. ;-)

Top Team Ingredient #2: Leadership

Like Johan Cruijjf stated in the same video, the leader on the pitch should be 100% concentrated on every detail and also (in my words) be the natural leader of the team, coaching them in keeping the spirit up and giving them enough room “to grow".  Despite his great qualities as a football-player, as a captain Robin van Persie was obviously not the natural leader of the team. Arjan Robben was. The natural leadership of Arjan Robben in combination with his determination was an important reason why The Netherlands were able to regain their motivation and pull off a highly respected 3rd place in this WorldCup.

In my opinion, a high performing team should always have a natural leader.  The options:

  1. A formal leader with natural leadership qualities is the perfect combination.
  2. A formal leader without natural leadership qualities but able to delegate this to another team member is also okay.
  3. A formal leader without natural leadership qualities and ignoring don’t having this competence, is bad news for the team, the team’s environment but above all, for the formal leader himself.

For the new coach van the Dutch team, Guus Hiddink, it will be a challenge convincing Robin van Persie to step back as the 1st captain after nominating Arjan Robben.  Robin van Persie should keep one thing in mind here:  no one is doubting his qualities as a top world class striker.  As a natural leader however, he is not that world class.  Trying to be one is effecting his performance as a world class striker and that would in the end be a disappointment for his supporters but above all, for Robin van Persie himself.

What does this imply for Leadership within organizations?

Leadership, especially natural leadership, is crucial for having highly motivated and productive teams.  The team stays motivated and focussed on their goal.

How ever, a lot of employees are still instrumentally ‚Äúnominated‚ÄĚ to become a coach or manager without having any leadership skills. ¬†In my opinion, natural leadership is something you can‚Äôt gain by nomination or just by learning it. ¬†You can improve it, but there should be some basis of natural leadership. ¬†Ignoring this can be even counter-productive: conflicts will arise, the spirit and productivity will go down.

Stuff The Internet Says On Scalability For July 11th, 2014

Hey, it's HighScalability time:


Yesterday in history: Nikola Tesla's Birthday, born in 1856. The greatest geek who ever lived?
  • 10Gbps: New world record broadband speed of 10 Gbps over copper.
  • Quotable Quotes:
    • @BenedictEvans: There were 40m internet users when Netscape IPOed. The time's not far off when a startup with 40m users will be too small to get funded.
    • Scott Aaronson: In any case, the question I asked myself about CLEVER/PageRank was not the one that, maybe in retrospect, I should have asked: namely, “how can I leverage the fact that I know the importance of this idea before most people do, in order to make millions of dollars?”
    • chub79: µservices aren't technological as much as they are cultural.
    • @Elmood: I thought of a new term when talking about code: "It's made from unmaintainium."
    • @lxt: Amazing how quickly a bunch of nines go up in smoke.
    • @martinrue: Knock knock. Race condition. Who's there?

  • The Master Switch: The Rise and Fall of Information Empires by Tim Wu: History shows a typical progression of information technologies: from somebody’s hobby to somebody’s industry; from jury-rigged contraption to slick production marvel; from a freely accessible channel to one strictly controlled by a single corporation or cartel—from open to closed system. History also shows that whatever has been closed too long is ripe for ingenuity’s assault: in time a closed industry can be opened anew, giving way to all sorts of technical possibilities and expressive uses for the medium before the effort to close the system likewise begins again.

  • Tim Freeman indulges a well developed Technothantos Complex and comes up with a great big list of outage postmortems. You'll find the usual, outages from configuration issues, failover failures, quorumnesia, protocol flapping, bugs in not your stuff that causes bugs in your stuff, power outages, capacity problems, JPOBs (just plain old bugs), DDOS attacks, and good old operator error. 

  • Pinterest describes PinLater, An asynchronous job execution system. PinLater executes hundreds of different job types at a processing rate of over 100,000 per second. So you may say yet another async job system, but it's clear keeping such a critical part of their infrastructure in house makes sense. The article is a good explanation of a fairly standard approach. It used Thrift for the API, it's written in Java, Twitter’s Finagle is used for the RPC framework. MySQL is "used for relatively low throughput use cases and those that schedule jobs over long periods and thus can benefit from storing jobs on disk rather than purely in memory." Redis is "used for high throughput job queues that are normally drained in real time." Horizontal scaling is via sharding. 

  • In science class we did this one day, but I just couldn't do it. Dissecting Message Queues. Tyler Treat looks at both brokerless and brokered queues by looking a throughput benchmarks, latency benchmarks, and through qualitative analysis. No winner was declared, but if you are making a choice in this area it's well worth reading. 

  • 40 Million hits a day on WordPress using a $10 VPS. Sure, it's a static site, but still a good example of what can be done these days. Stack: Nginx + PHP-FPM (aka LEMP Stack) + Microcaching. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Why Little's Law Works...Always

Xebia Blog - Fri, 07/11/2014 - 12:00

On the internet there is much information on Little's Law. It is described an explained in many places [Vac]. Recently, the conditions under which it is true got  attention [Ram11]. As will be explained in this blog the conditions under which the law is true are very mild. It will be shown that for teams working on backlog items virtually there are no conditions.

Why does it work? Under what conditions does it hold?

Airplane Folding

In the previous post (Applying Little's Law in Agile Games) I described how to calculate the quantities in Little's Law. As an example this was applied to the Agile game of folding airplanes consisting of 3 rounds of folding.

airplane-1

Let's look in more detail in which round an airplane was picked up and in which round it was completed. This is depicted in the following figure.

The horizontal axis shows the number of rounds. The vertical axis describes each airplane to fold. The picture is then interpreted as follows. Airplane no. 2 is picked up in round 1 and competed in the same round. It has a waiting time of 1 round. This is indicated at the right of the lowest shaded rectangle.
Airplane no. 8 was picked up in round 1 and finished in round 3. A waiting time of 3 rounds. Airplane no 12 (top most shaded area) was picked up in round 3 and unfinished. Up to round 3 a waiting time of 1 round.

The number 3, 5, and 10 denote the number of completed airplanes at the end of round 1, 2, and 3 respectively.

Waiting Times

The waiting times are determined by counting the number of 'cells' in a row.

The pictures show that we have 12 airplanes (12 'rows'), 3 completed in the first round, 2 more completed in the second round and 5 additionally  folded airplanes in the third and last round giving a total of 10 finished paper airplanes.

All twelve airplanes have waiting times of 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, and 1 respectively.

Work in Progress

In the figure below the number of airplanes picked up by the team in each round are indicated with red numbers above the round. new doc 6_2
In round 1 the team has taken up the folding of 11 airplanes (of which 3 are completed). In round 2 the team was folding 8 airplanes (of which 2 were competed) and in round 3 the team was folding 7 airplanes (of which it completed 5).

Work in progress is determined by counting the number of 'cells' in a column.

Little's Law.....Again

Now that we have determined the waiting times and amount of work in progress, let's calculate the average waiting time and average work in progress.

Average Waiting Time. This quantity we get by adding all waiting times and dividing by the number of items. This gives 26/12.

Average Work in Progress. This quantity is equal to (11+8+7)/3 = 26/3.

Average input rate. This is equal to 12 (the height of the third column) divided by 3 which gives 4.

Again we find that: Average Waiting Time = Average Work in Progress / Average input rate.

Why It Works

new doc 6_3 Little's Law works....always....because the average waiting times is got by adding the lengths of all the rows dividing by the number of rows, so it is proportional to the size of the shaded area in the picture to the right.

The average work in progress is got by adding the heights of the columns in the shaded area which is also proportional to the size of the shaded area.

Both the waiting time and work in progress relate to the size of the shaded area: one by adding the heights and the other by adding the rows. The proportionality corresponds to the average input rate.

Conditions

What assumptions did we make? None...well this is not exactly true. The only assumptions we make in this calculation:

  • We count discrete items
  • There are a finite number of rounds (or sprints)
  • Items enter and possibly leave the system.

That's it. It doesn't need to be stable, ageing (items having increasingly larger waiting times) is not a problem, prioritisation/scheduling of items (also known as queueing discipline), etc. Only the above assumptions need to be true.

Note: Especially the second condition is important, i.e. Little's Law is measured over a finite time interval. For infinite time interval additional conditions need to be fulfilled.

Note: When applying this to agile teams we always consider finite time intervals, e.g. 6 months, 1 year, 8 sprints, etc.

Conclusion

Little's Law is true because the average waiting time is proportional to the size of the shaded area (see figure) and the average work in progress is also proportional to the size of the same shaded area.

Only 3 basic conditions need to be met for Little's Law to be true.

References

[Vac] Little’s Law Part I, Dan Vacanti, http://corporatekanban.com/littles-law-part-i/

[Ram11] Little‚Äôs Law ‚Äď It‚Äôs not about the numbers,¬†Agile Ramblings, http://agileramblings.com/2012/12/11/littles-law-its-not-about-the-numbers/

Bootstrapping and monitoring multiple processes in Docker using monit

Xebia Blog - Thu, 07/10/2014 - 23:01

If you have every tried to start a docker container and keep it running, you must have encountered the problem that this is no easy task. Most stuff I like to start in container are things like http servers, application servers and various other middleware components which tend to have start scripts that daemonize the program. Starting a single process is a pain, starting multiple processes becomes nasty. My advise is to use monit to start all but the most simple Docker application containers! When I found monit while delving through the inner works Cloud Foundry, I was ecstatic about it! It was so elegant, small, fast, with a beautiful DSL that I thought it was the hottest thing since sliced bread! I was determined to blog it off the roof tops. Until.... I discovered that the first release dated from somewhere in 2002. So it was not hot and new; Clearly I had been lying under a UNIX rock for quite a while. This time, the time was right to write about it! Most of the middleware components I want to start in a docker container, have a habit to start the process, daemonize it and exit immediately, with the docker container on its tail. My first attempt to circumvent this while starting a tomcat server in Docker looked something like this:

/bin/bash -c "service tomcat7 start;while service tomcat7 status;do sleep 1;done

Quite horrific. Imaging the ugliness when you have to start multiple processes. A better solution is needed:  With the zabbix docker container  the problem was solved using simplevisor. As you can read in this post that was not a pleasant experience either. As I knew little about simplevisor and could not solve the problem, I put in an issue and  resorted to a plain installation. But a voice in my head started nagging: "Why don't you fix it and send a pull request?"  (actually, it was the voice of my colleague Arjan Molenaar). Then, I remembered from my earlier explorations to the inner workings of Cloud Foundry, a tool that would be really suitable for the job: monit. Why? It will:

  1. Give you a beautiful,readable specification file stating which processes to start
  2. Make sure that your processes will keep on running
  3. Deliver you a clean and crisp monitoring application
  4. Reduce all your Docker starts to a single command!

In the case of the Zabbix server there were seven processes to start: the zabbix server, agent, java agent, apache, mysql and sshd. In monit this looks as follows:

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
        start program = "/sbin/service mysqld start"
        stop program = "/sbin/service mysqld stop"

check process zabbix-server with pidfile /var/run/zabbix/zabbix_server.pid
        start program = "/sbin/service zabbix-server start"
        stop program = "/sbin/service zabbix-server stop"
        depends on mysqld

check process zabbix-agent with pidfile /var/run/zabbix/zabbix_agentd.pid
        start program = "/sbin/service zabbix-agent start"
        stop program = "/sbin/service zabbix-agent stop"

check process zabbix-java-gateway with pidfile /var/run/zabbix/zabbix_java.pid
        start program = "/sbin/service zabbix-java-gateway start"
        stop program = "/sbin/service zabbix-java-gateway stop"

check process httpd with pidfile /var/run/httpd/httpd.pid
        start program = "/sbin/service httpd start"
        stop program = "/sbin/service httpd stop"
        depends on zabbix-server

check process sshd with pidfile /var/run/sshd.pid
        start program = "/sbin/service sshd start"
        stop program = "/sbin/service sshd stop"

Normally when you start monit it will start as a daemon. But fortunately, you can prevent this with the following configuration.

set init

Your Dockerfile CMD can now always look the same:

    monit -d 10 -Ic /etc/monitrc

Finally, by adding the following statement to the configuration you get an application to view the status of your container processes,

set httpd
     port 2812
     allow myuser:mypassword

After starting the container, surf to port 2812 and you will get a beautiful page showing the state of your processes and the ability to stop and restart them.

monit overview monit process control

Just delve into the documentation of monit and you will find much more features that will allow you to monitor network ports and files, start corrective actions and send out alerts.

Monit is true to its UNIX heritage: it is elegant and promotes an autonomous monitoring system. Monit is cool!

One view or many?

Coding the Architecture - Simon Brown - Wed, 07/09/2014 - 22:16

In Diagramming Spring MVC webapps, I presented an approach that allows you to create a fairly comprehensive model of a software system in code. It starts with you creating a simple base model that includes software systems, people and containers. With this in place, all of the components can then be automatically populated into the model via a scan of the compiled Java code. This is all based upon Software architecture as code.

Once you have a model to work with, it's relatively straightforward to visualise it via a number of views. In the Spring PetClinic example, three separate views (one each of a system context, containers and components view) are sufficient to show everything. With larger software systems, however, this isn't the case.

As an example, here's what a single component diagram for the web application of my techtribes.je system looks like.

A mess

Yup, it's a mess. The components around the left, top and right edges are Spring MVC controllers, while those in the centre are the core components. There are clearly three hotspots here - the LoggingComponent, ActivityComponent and ContentSourceComponent. The reason for the first should be obvious, in that almost all components use the LoggingComponent. The latter two are used by all controllers, simply because some common information is displayed on the header of all pages on the website. I don't mind excluding the LoggingComponent from the view, but I'd quite like to keep the other two. That aside, even excluding the ActivityComponent and ContentSourceComponent doesn't actually solve the problem here. The resulting diagram is still a mess because it's showing far too much information. Instead, another approach is needed.

With this in mind, what I've done instead is use a programmatic approach to create a number of views for the techtribes.je web application, one per Spring MVC controller. The code looks like this.

The result is a larger number of simple diagrams, but I think that the trade-off is worth it. It's a much better way to navigate a large model.

Not so much of a mess

And here's an example component diagram that focusses on a single Spring MVC controller.

Not so much of a mess

The JSON representing the techtribes.je model can be found on GitHub and you can copy-paste it into my (still in-progress) diagramming tool if you'd like to explore the model yourself. I'm still experimenting with much of this but I really like the opportunities provided by having the software architecture model in code. This really is "software architecture for developers". :-)

Categories: Architecture

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase that most of us are accustomed to hearing and that many still believe to be true when discussing the speed, scale and cost of processing data. To reach high speed data processing, it is necessary to utilize more memory resources which increases cost. This occurs because price increases as memory, on average, tends to be more expensive than commodity disk drive. The idea of data systems being unable to reliably provide you with both memory and fast access—not to mention at the right cost—has long been debated, though the idea of such limitations was cemented by computer scientist, Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Categories: Architecture

Sponsored Post: Surge, Apple, Dreambox, Chartbeat, Monitis, Netflix, Salesforce, Blizzard Entertainment, Cloudant, CopperEgg, Logentries, Gengo, ScaleOut Software, Couchbase, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngin

Who's Hiring?

  • Apple has multiple openings. Changing the world is all in a day's work at Apple. Imagine what you could do here.
    • Senior Security Engineer. As a Senior Security Engineer on our team, you will be the ‘tip of the spear’ and will have direct impact on the Point-of-Sale system that powers Apple Retail globally. You will contribute to implementing standards and processes across multiple groups within the organization. You will also help lead the organization through a continuous process of learning and improving secure practices. Please apply here
    • Quality Assurance Engineer - Mobile Platforms. Apple’s Mobile Services/Emerging Technology group is looking for a highly motivated, result-oriented Quality Assurance Engineer. You will be responsible for overseeing quality engineering of mobile server and client platforms and applications in a fast-paced dynamic environment. Your job is to exceed our business customer's aggressive quality expectations and take the QA team forward on a path of continuous improvement. Please apply here.
    • Sr Software Engineer. Join Apple's Internet Applications Team, within the Information Systems and Technology group, as a Senior Software Engineer. Be involved in challenging and fast paced projects supporting Apple's business by delivering Java based IS Systems. Please apply here.
    • Senior Payment Engineer.  you will be responsible for working with cross-functional teams and developing Java server-based solutions to address business and technological needs. You will be helping design and build next generation retail solutions. You will be reviewing design and code developed by others on the team.You will build services and integrate with both internal as well as external services in a SOA environment. You will design and develop frameworks to be used by a large community of developers within the organization. Please apply here
    • Software Developer in Test. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. The ideal candidate will also seek out ways to further automate all aspects of our existing process. This is a highly technical role and requires in-depth knowledge of both white-box and black-box testing methodologies. Please apply here
    • Senior Software Engineer -iOS Systems.Do you love building highly scalable, distributed web applications? Does the idea of a fast-paced environment make your heart leap? Do you want your technical abilities to be challenged every day, and for your work to make a difference in the lives of millions of people? If so, the iOS Systems Carrier Services team is looking for a talented software engineer who is not afraid to share knowledge, think outside the box, and question assumptions. Please apply here.

  • Asana. As an infrastructure engineer you will be designing software to process, query, search, analyze, and store data for applications that are continually growing in scale. You will work with a world-class team of engineers on deploying and operating existing systems, and building new ones for problems that are unique to our problem space. Please apply here.

  • Operations Engineer - AWS Cloud. Want to grow and extend a cutting-edge cloud deployment? Take charge of an innovative 24x7 web service infrastructure on the AWS Cloud? Join DreamBox Learning’s creative team of engineers, designers, and educators. Help us radically change education in an environment that values collaboration, innovation, integrity and fun. Please apply here. http://www.dreambox.com/careers

  • Chartbeat measures and monetizes attention on the web. Our traffic numbers are growing, and so is our list of product and feature ideas. That means we need you, and all your unparalleled backend engineer knowledge to help up us scale, extend, and evolve our infrastructure to handle it all. If you've these chops: www.chartbeat.com/jobs/be, come join the team!

  • The Salesforce.com Core Application Performance team is seeking talented and experienced software engineers to focus on system reliability and performance, developing solutions for our multi-tenant, on-demand cloud computing system. Ideal candidate is an experienced Java developer, likes solving real-world performance and scalability challenges and building new monitoring and analysis solutions to make our site more reliable, scalable and responsive. Please apply here.

  • Sr. Software Engineer - Distributed Systems. Membership platform is at the heart of Netflix product, supporting functions like customer identity, personalized profiles, experimentation, and more. Are you someone who loves to dig into data structure optimization, parallel execution, smart throttling and graceful degradation, SYN and accept queue configuration, and the like? Is the availability vs consistency tradeoff in a distributed system too obvious to you? Do you have an opinion about asynchronous execution and distributed co-ordination? Come join us

  • Java Software Engineers of all levels, your time is now. Blizzard Entertainment is leveling up its Battle.net team, and we want to hear from experienced and enthusiastic engineers who want to join them on their quest to produce the most epic customer-facing site experiences possible. As a Battle.net engineer, you'll be responsible for creating new (and improving existing) applications in a high-load, high-availability environment. Please apply here.

  • Human Translation Platform Gengo Seeks Sr. DevOps Engineer. Build an infrastructure capable of handling billions of translation jobs, worked on by tens of thousands of qualified translators. If you love playing with Amazon’s AWS, understand the challenges behind release-engineering, and get a kick out of analyzing log data for performance bottlenecks, please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • OmniTI has a reputation for scalable web applications and architectures, but we still lean on our friends and peers to see how things can be done better. Surge started as the brainchild of our employees wanting to bring the best and brightest in Web Operations to our own backyard. Now in its fifth year, Surge has become the conference on scalability and performance. Early Bird rate in effect until 7/24! 

  • FoundationDB has announced a new course on concurrency which is free and fully browser-accessible. The course is targeted at developers who are familiar with the FoundationDB Key-Value Store API and want to achieve high throughput in their applications.
Cool Products and Services
  • Now track your log activities with Log Monitor and be on the safe side! Monitor any type of log file and proactively define potential issues that could hurt your business' performance. Detect your log changes for: Error messages, Server connection failures, DNS errors, Potential malicious activity, and much more. Improve your systems and behaviour with Log Monitor.

  • The NoSQL "Family Tree" from Cloudant explains the NoSQL product landscape using an infographic. The highlights: NoSQL arose from "Big Data" (before it was called "Big Data"); NoSQL is not "One Size Fits All"; Vendor-driven versus Community-driven NoSQL.  Create a free Cloudant account and start the NoSQL goodness

  • Finally, log management and analytics can be easy, accessible across your team, and provide deep insights into data that matters across the business - from development, to operations, to business analytics. Create your free Logentries account here.

  • CopperEgg. Simple, Affordable Cloud Monitoring. CopperEgg gives you instant visibility into all of your cloud-hosted servers and applications. Cloud monitoring has never been so easy: lightweight, elastic monitoring; root cause analysis; data visualization; smart alerts. Get Started Now.

  • Aerospike in-Memory NoSQL database is now Open Source. Read the news and see who scales with Aerospike. Check out the code on github!

  • consistent: to be, or not to be. That’s the question. Is data in MongoDB consistent? It depends. It’s a trade-off between consistency and performance. However, does performance have to be sacrificed to maintain consistency? more.

  • Do Continuous MapReduce on Live Data? ScaleOut Software's hServer was built to let you hold your daily business data in-memory, update it as it changes, and concurrently run continuous MapReduce tasks on it to analyze it in real-time. We call this "stateful" analysis. To learn more check out hServer.

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

Identifying Architectural Elements in Current Systems

Coding the Architecture - Simon Brown - Tue, 07/08/2014 - 10:27

Simon recently talked about the gap between Software Architecture and Code and how to close this with architecturally-evident coding. He's also creating tools to allow Software Architecture to be expressed as code.

If you're working on a greenfield project then including annotations to help with navigation is a great solution but what if you've inherited a large system with a model-code gap? Or if you only realise, sometime into a project, that you lack a model to help you understand its growing complexity? Well, Simon also had some thoughts on scanning Spring annotations to provide this data. This works quite well and it got me thinking about other artifacts in code that can be extracted for these diagrams.

(In more formal terms - for those of you that like to quote ISO42010 - we are trying to extract architectural elements from code that can be displayed within architectural views. Of course the elements may be from a variety of differing abstractions, levels and granularity and therefore need to be placed within differing views.)

So what can we extract from a current/legacy system to give us a view into it? Some suggestions include:

Annotations As already suggested, dependency injections systems such as Spring provide some annotations that can be extracted to give a basic, high level model. Annotations are also present in Java EE applications and other enterprise frameworks. XML DI Configuration files Many (legacy) Spring projects use xml configuration files to define beans. Having scanned a few examples this seems to create a relatively low level model which would need some manual tweaking after generation. With sensible naming convention for beans you can produce models for a desired abstraction. The bean properties indicate the connections between these elements. Module Bundling Systems Modular systems such as OSGi define bundles of components and services including lifecycle and service registry. The deployment information should provide a high level overview. Packages If you have used 'package-by-component' then your packages will relate one-to-one with your components. The links between components should be identifiable by the links between the classes within them (the has-a relationships). If you have package-by-layer then this is much harder or impossible to use. Experience tells me that most real-world systems are actually a combination of the two so you should have some useful information. Class Names It's very common for class names to contain a strong indication of their role e.g. XyzService, XyzConnector, XyzDao, XyzFacade etc. Scanning for known patterns should identify the element names and roles. Interfaces and class hierarchies If you implement interfaces (or extend base classes) then the interfaces used may show the abstraction level and type e.g. implementing Service, DAO, Connector, Repository etc Delegation or Library Dependency Shared delegates used by a set of classes/functions may indicate their purpose. e.g. components delegating to a database utility might indicate a DAO component or using a CORBA utility might indicate a service. This is likely to be time consuming as you need to identify and scan for each delegate you are interested in. Comments/Javadoc/JSDoc/NDoc/Doclets Comments and javadoc style API comments can provide a large amount of meta-information about a class or package. In fact, many UML modeling tools enrich code using custom comments and tags. This has the advantage of not affecting the compiled code or introducing library dependencies but may not be consistently used. Tests Test can provide a lot of meta-data about your system. Unit tests tend to be concentrated around important classes and often construct entire components to test. Simply extracting the names of classes that are directly tested will produce a useful list of components. The higher level systems tests will reveal the important services. Build Systems Build systems such as ant, maven, NuBuild etc all have hooks into the code base for building and deployment. A simple extraction of the build targets will give you the deployment modules (which is a very helpful view for operation teams). This may give you the required information for a Containers view.

What do you think? Of course all of the above is very dependent on your codebase but if none of the above works then you have to question the quality and structure of the code! The data extracted may need filtering and manual correction as it won't give you exactly what you want. You might consider creating structurizr annotations using an initial scan and then maintaining them. One of my tasks for the next few weeks is to try this out on some legacy codebases I maintain.

What other ways of identifying architectural elements can you think of?

Categories: Architecture

Applying Little's Law in Agile Games

Xebia Blog - Mon, 07/07/2014 - 21:20

Have you ever used Little's Law to explain that lower WiP (work in progress) limits lead to shorter cycle times? Ever tried to illustrate Little's Law in an Agile game and found it doesn't hold? Then read this blog to discover that it is exactly true in Agile games and how it really works.

Some time ago I gave a kanban workshop. Part of the workshop was a game of folding paper airplanes to illustrate flow. To illustrate Little's Law we determined the throughput, cycle time and work in progress. To my surprise the law didn't hold. Not even close. In this blog I want to share the insight into why it does work!

Introduction

It is well known that the average number of items in progress is proportional to the average cycle time of completed work items. The proportionality is the average input rate (or throughput rate) of work items. This relation is known as Little's Law. It was discovered by Little in the 1960s and has found many applications.

In kanban teams this relationship is often used to qualitatively argue that it is favourable for flow to have not too much work in parallel. To this end WiP (work in progress) limits are introduced. The smaller the WiP the smaller the average cycle time which means better flow.

A surprise to me was that it is exactly true and it remains true under very relaxed conditions.

Little's Law

In mathematical form the law is often stated as:

(1)

 \bar{N} = \lambda \bar{W} \bar{N} = \lambda \bar{W}

Here  \bar{N} \bar{N} is the average number of work items in progress at a certain time, and  \bar{W} \bar{W} is the average cycle time.  \lambda \lambda is the average input rate (new work items per unit of time). In stable systems this also equals the average throughput. In this case Little's Law is often (re)stated as

(2)

 \frac{\mathrm{Work\, in\, progress}}{\mathrm{Throughput}} = \mathrm{Cycle Time} \frac{\mathrm{Work\, in\, progress}}{\mathrm{Throughput}} = \mathrm{Cycle Time}

Conditions

In practise one considers Little's Law over a finite period of time, e.g. 6 months, 5 sprints, 3 rounds in an Agile game. Also in practise, teams work on backlog items which are discrete items. After the work is done this results in a new product increment.

Under the following conditions (1) is exact:

  • The system is observed over a finite period of time,
  • The system is a queueing system.

A queuing system is a system that consists of discrete items which arrive at a certain rate, receive service after which they depart.

Examples of a queueing system. An agile team works on backlog items. A kanban team that works on production incidents. A scrum team.

Agile Game

An often used game for explaining the importance of flow to team is the game of folding paper airplanes. Many forms of this games exist. See e.g. [Heintz11].

For this blog's purpose consider a team that folds air planes. The backlog is a stack of white paper. 3 Rounds of folding are done. Airplanes that are folded and fly at least 2 meters are considered done.

new doc 5_1-1

At the end of each round we will collect the following metrics:

  • number of completed airplanes
  • number of airplanes in progress and not yet finished.

The result of the the 3 rounds are shown at the right. At the end of round 1 Team A completed 3 airplanes and having 8 unfinished airplanes. Likewise, Team B finished 4 airplanes in round 3 giving a total of 12 finished airplanes and having 6 unfinished airplanes in progress.

The cycle time are got by writing the round number of the sheet of paper when starting to fold the airplane. When done, write the round number of the paper. The cycle time for one airplanes is got by subtracting the two and adding 1.

Calculating Little's Law

The way I was always calculating the number for work in progress, throughput and cycle time has been

  1. averaging cycle time for all completed airplanes,
  2. averaging the throughput over all rounds,
  3. averaging the work in progress over all rounds.

When calculated at the end of round 3, for Team A this amounts to:

  • Average work in progress = (8+6+2)/3 = 16/3,
  • Average throughput = 10 (completed airplanes)/3 = 10/3,
  • Average cycle time = 22/10 = 11/5

Using (2) above we get: 16/3 / (10/3) = 8/5. This is not equal to the average cycle time of 11/5. Not even close. How come?

The Truth

The interpretation of work in progress, throughput and cycle time I got from working with cumulative flow diagrams. There are many resources explaining these, see e.g. [Vega2011].

The key to the correct interpretation is choosing the time interval for which to measure the quantities  \bar{W} \bar{W} ,  \lambda \lambda , and  \bar{W} \bar{W} . Second, using the input rate instead of the throughput. Third, at the end of the time period include the unfinished items. Last, in calculating  \bar{N} \bar{N} consider all items that went through the system.

When we reinterpret the results for teams A and B we get

Team A

  • Average work in progress
    In round 1 3 airplanes were completed and left 8 unfinished; a total of 11 for work in progress (11 airplanes picked up as work)
    In round 2 the team completed 2 airplanes and have 6 unfinished; a total of 8
    In round 3 the team finished an additional 5 airplanes and left 2 uncompleted; a total of 7
    When measured over 3 rounds an average of (11+8+7)/3 = 26/3
  • Average¬†input rate
    Using the input rate:
    In round 1 the team picked up 11 new airplanes
    In round 2 the team picked up no additional airplanes
    In round 3 one new airplanes was picked up.
    An average input rate of (11+0+1)/3 = 4 airplanes per round
  • Average cycle time
    At the end of the third round 2 airplanes are left in progress; one taken up in the third round having a waiting time of 1 and one left from the first round having waiting time of 3 rounds. A total waiting time of 22 + 3 + 1 = 26 rounds.
    Averaging over 12 airplanes we have an average cycle time of 26/12 = 13/6 rounds per airplane.

Dividing the average work in progress by the average input rate we get 26/3 divided by 4 = 26/12(!). This is exactly equal to the calculated average cycle time!

Team B

In a similar manner the reinterpreted results for team B are:

  • Average work in progress = (13+14+10)/3 = 37/3 airplanes,
  • Average input rate¬†= (13+2+3)/3 = 6 airplanes per round,
  • Average cycle time = (27 (completed) + 10 (unfinished))/18 (airplanes) = 37/18 rounds per airplane

Again, dividing the average work in progress by the average input rate we get 37/18 rounds per airplane, which again is exactly equal to the average cycle time or waiting time!

Note: the cycle time of 10 days is built up by (a) 1 airplane from round 1 (cycle time of 3), 2 airplanes picked up in round 2 (total of 4 rounds), 3 airplanes picked up in round 3 (total of 3 rounds).

What About Cumulative Flow Diagrams?

Now that we understand how to calculate the quantities in Little's Law, we go back to cumulative flow diagrams. How come Little's Law works in this case.

In the case of teams that have collected data on cycle time, work in progress and throughput Little's Law work when done as explained in the section 'Calculating Little's Law' because:

  1. the teams are kept stable by having WiP limits on the left most column ("To Do"); then the throughput is more or less equal to the input rate,
  2. the team has completed a fairly large amount of work items in which case the waiting time of unfinished work items can be neglected,
  3. when measured over the (large part of the) value creation process, the completed items per time period can often be neglected in the calculation of the average work in progress.
Summary

Little's Law (1) holds under the conditions that (a) the system considered is a queueing system and (b) the observation or measurements are done over a finite time interval. It then holds independently of the stationaryness of the probability distributions, queuing discipline, emptiness of the system at the start and end of the time interval.

Calculate the quantities  \bar{N} \bar{N} ,  \lambda \lambda , and  \bar{W} \bar{W} as follows:

  • Average work in progress  \bar{N} \bar{N}
    For each time interval considered count the total amount of work in the system and add any items completed in that time interval.
  • Average cycle time  \bar{W} \bar{W}
    Sum the cycle times for all completed items and include the waiting time for unfinished items and divide by the total number of items.
  • Average input rate  \lambda \lambda
    Add the total number of items that entered the system and divide by the total number of time intervals.
References

[Little61] Little, J. D. C. 1961. A proof for the queuing formula: L = √£W . Oper. Res. 9(3) 383‚Äď387.

[Heintz11] John Heintz, June 2011, Agile Airplane Game, GistLabs, http://gistlabs.com/2011/06/agile-airplane-game/

[Vega11] Vega Information System Services, Inc., September 2011, Basics of Reading Cumulative Flow Diagrams, http://www.vissinc.com/2011/09/29/basics-of-reading-cumulative-flow-diagrams/

 

Scaling the World Cup - How Gambify runs a massive mobile betting app with a team of 2

This is a guest post by Elizabeth Osterloh and Tobias Wilke of cloudControl.

Startups face very different issues than big companies when they build software. Larger companies develop projects over much longer time frames and often have entire IT-departments to support them in creating customized architecture. It’s an entirely different story when a startup has a good idea, it gets popular, and they need to scale fast.

This was the situation for Gambify, an app for organizing betting games released just in time for the soccer World Cup. The company was founded and is run in Germany by only two people. When they managed to get a few major endorsements (including Adidas and the German team star Thomas Müller), they had to prepare for a sudden deluge of users, as well as very specific peak times.

The Gambify App: Basic Architecture
Categories: Architecture

I teach people how to draw pictures

Coding the Architecture - Simon Brown - Sun, 07/06/2014 - 11:55

Regular readers will know that I'm a big fan of pictures, especially as a mechanism for communicating the structure of software systems. To this end, I sometimes introduce myself as somebody who teaches people how to draw pictures. This is often said with a smile, because on the face of it this is exactly what I appear to be doing. But there's less truth to this than there initially appears.

Something that we do on the full 2-day training course and the 1-day sketching workshop is to get people drawing some pictures to communicate the structure of a software system. This is either a solution to the financial risk system or their own software. Over the years, I've done this for thousands of people and pretty much everybody struggles to draw something that effectively communicates the software. The same challenges crop up again and again.

The challenges of drawing pictures

The diagrams that are produced during the initial 90-minute timebox are usually fairly confusing and exhibit problems ranging from a lack of clarity around the abstractions and notation used through to diagrams that show too little or too much detail. This is an aside, but the majority of the diagrams are informal "boxes and lines" rather than UML.

After some reviews and feedback, I introduce people to my C4 model. In a nutshell, it's all about thinking of a software system as being made up of containers (web applications, databases, mobile devices, etc), each of which contains components, which in turn are made up of classes. There are some variations of this depending on the technology you're using, but that's basically it. With this structure in mind, you can then draw a diagram at each level in turn, which leads to a system context diagram, a container diagram, one or more component diagrams and (optionally) some class diagrams. If you want a slightly more detailed introduction to C4, take a look at Simple sketches for diagramming your software architecture.

The C4 model

Back to the workshops, and iteration two is another 90-minute timebox in which teams can redraw their diagrams. I don't actually mandate that people should follow my C4 approach, but most do, and what happens next is really interesting. The conversations change. In stark contrast to the first timebox, the conversations are much more technical. Gone are the discussions about what to include on each diagram and instead the focus is on the technical aspects of the software. Instead of "what should we show on this diagram?", the questions I hear are more along the lines of "what are the responsibilities of this thing?" and "how does this thing talk to that thing?". In essence, the discussion returns to be about software design, which is exactly where it should be.

There are a number of ways in which you can communicate the architecture of a software system, so I don't want to pitch my C4 approach as "the one true way". What I do want to say is that providing even a little guidance in this area can go a long way. I ask people for their thoughts after the second timebox and the consensus is that the diagrams are easier to draw and comprehend because of the "framework" (their words, not mine) that's been provided. All I'm doing is providing some constraints that people might want to work within, and this frees them up to focus on what's important again.

Back to that comment about introducing myself as somebody who teaches people how to draw pictures. It turns out that this isn't actually what I do. I don't really care so much *how* people draw pictures. What I do care about is that they use a well-understood, consistent and meaningful set of *abstractions* to think about and therefore describe their software. It turns out that modelling our software isn't dead after all. ;-)

Categories: Architecture

Distributed big balls of mud

Coding the Architecture - Simon Brown - Sun, 07/06/2014 - 10:27

If you want evidence that the software development industry is susceptible to fashion, just go and take a look at all of the hype around microservices. It's everywhere! For some people microservices is "the next big thing", whereas for others it's simply a lightweight evolution of the big SOAP service-oriented architectures that we saw 10 years ago "done right". I do like a lot of what the current microservice architectures are doing, but it's by no means a silver bullet. Okay, I know that sounds obvious, but I think many people are jumping on them for the wrong reason.

From monoliths to microservices

I often show this slide in my conference talks, and I've blogged about this before, but basically there are different ways to build software systems. On the one side we have traditional monolithic systems, where everything is bundled up inside a single deployable unit. This is probably where most of the industry is. Caveats apply, but monoliths can be built quickly and are easy to deploy, but they provide limited agility because even tiny changes require a full redeployment. We also know that monoliths often end up looking like a big ball of mud because of the way that software often evolves over time. For example, many monolithic systems are built using a layered architecture, and it's relatively easy for layered architectures to be abused (e.g. skipping "around" a service to call the repository/data access layer directly).

On the other side we have service-based architectures, where a software system is made up of many separately deployable services. Again, caveats apply but, if done well, service-based architectures buy you a lot of flexibility and agility because each service can be developed, tested, deployed, scaled, upgraded and rewritten separately, especially if the services are decoupled via asynchronous messaging. The downside is increased complexity because your software system now has many more moving parts than a monolith. As Robert says, the complexity is still there, you're just moving it somewhere else.

There is, of course, a mid-ground here. We can build monolithic systems that are made up of in-process components, each of which has an explicit well-defined interface and set of responsibilities. This is old-school component-based design that talks about high cohesion and low coupling, but I usually sense some hesitation when I talk about it. And this seems odd to me. Before I explain why, let me quote something from a blog post that I read earlier this morning about the rationale behind a team adopting a microservices approach. When we started building Karma, we decided to split the project into two main parts: the backend API, and the frontend application. The backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this API. Along the way we noticed that if the whole backend API is monolithic it doesn't work very well because everything gets entangled.

The blog post also mentions scaling, versioning and multiple languages/frameworks as other reasons to choose microservices. Again, there are no silver bullets here, everything is a trade-off. Anyway, "everything getting entangled" is not a reason to switch from monoliths to microservices. If you're building a monolithic system and it's turning into a big ball of mud, perhaps you should consider whether you're taking enough care of your software architecture. Do you really understand what the core structural abstractions are in your software? Are their interfaces and responsibilities clear too? If not, why do you think moving to a microservices architecture will help? Sure, the physical separation of services will force you to not take some shortcuts, but you can achieve the same separation between components in a monolith. A little design thinking and an architecturally-evident coding style will help to achieve this without the baggage of going distributed.

Many of the teams I've spoken to are building monolithic systems and don't want to look at component-based design. The mid-ground seems to be a hard-sell. I ran a software architecture sketching workshop with a team earlier this year where we diagrammed one of their software systems. The diagram started as a strictly layered architecture (presentation, business services, data access) with all arrows pointing downwards and each layer only ever calling the layer directly beneath it. The code told a different story though and the eventual diagram didn't look so neat anymore. We discussed how adopting a package by component approach could fix some of these problems, but the response was, "meh, we like building software using layers".

It seems as if teams are jumping on microservices because they're sexy, but the design thinking and decomposition strategy required to create a good microservices architecture are the same as those needed to create a well structured monolith. If teams find it hard to create a well structured monolith, I don't rate their chances of creating a well structured microservices architecture. As Michael Feathers recently said, "There's a bit of overhead involved in implementing each microservice. If they ever become as easy to create as classes, people will have a freer hand to create trouble - hulking monoliths at a different scale.". I agree. A world of distributed big balls of mud worries me.

Categories: Architecture

Create the smallest possible Docker container

Xebia Blog - Fri, 07/04/2014 - 21:59

When you are playing around with Docker, you quickly notice that you are downloading large numbers of megabytes as you use preconfigured containers. A simple Ubuntu container easily exceeds 200MB and as software is installed on top of it, the size increases. In some use cases, you do not need everything that comes with Ubuntu. For example, if you want to run a simple web server, written in Go, there is no need for any tool around that at all.

I have been searching for the smallest possible container to start with and found this one:

docker pull scratch

The scratch image is perfect. Literally perfect! It is elegant, small and fast. It does not contain any bugs, security leaks, slow code or technical debt. And that is because it is basically empty. Except for a bit of metadata added by Docker. In fact, you could have created this scratch image yourself with this command as described in the Docker documentation:

tar cv --files-from /dev/null | docker import - scratch

 

So that is it, the smallest possible Docker image. End of blog post!

... or is there something more we can say about this? For example, how do you use the scratch base image? It turns out this brings some challenges of its own.

Creating content for the scratch image

What can we run on an empty base image? An executable without dependencies. Do you have executables without dependencies?

I used to write code in Python, Java and JavaScript. Each of these languages/platforms require a runtime installed. Recently, I started looking into the Go (or GoLang if you prefer) platform. And it seems (spoiler alert) like Go is statically  linked. So I tried compiling a simple web server saying Hello World and running it within the scratch container. Here is the code for the Hello World web server:

package main

import (
	"fmt"
	"net/http"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "Hello World from Go in minimal Docker container")
}

func main() {
	http.HandleFunc("/", helloHandler)

	fmt.Println("Started, serving at 8080")
	err := http.ListenAndServe(":8080", nil)
	if err != nil {
		panic("ListenAndServe: " + err.Error())
	}
}

 

Obviously, I cannot compile my webserver inside the scratch container as there is no Go compiler in it. And as I am working on a Mac, I also cannot compile a Linux binary just like that. (Actually, it is possible to cross-compile GoLang sources to different platforms, but that is material for another blog post)

So I first need a Docker container with a Go compiler. Let's start simple:

docker run -ti google/golang /bin/bash

 

Inside this container, I can build the Go web server, which I have committed in a GitHub repository:

go get github.com/adriaandejonge/helloworld

 

The go get command is a variant of the go build command that allows fetching and building remote dependencies. You can start the resulting executable with:

$GOPATH/bin/helloworld

 

This works. But it is not what we want. We need the hello world container to run inside the scratch container. So, in fact, we need a Dockerfile saying:

FROM scratch
ADD bin/helloworld /helloworld
CMD ["/helloworld"]

 

and then start that. Unfortunately, the way we started the google/golang container, there is no way to build this Dockerfile. So first, we need a way to access Docker from within the container.

Calling Docker from within Docker

When you use Docker, sooner or later you run into the need to control Docker from within Docker. There are multiple ways to accomplish this. You could use recursion and run Docker inside Docker. However, that seems overly complex and again leads to large containers. You can also provide access to the Docker server outside the instance with a few additional command line options:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(which docker):$(which docker) -ti google/golang /bin/bash

 

Before you continue, please rerun the Go compiler, as Docker forgot our previous compilation during the restart:

go get github.com/adriaandejonge/helloworld

 

When starting the container, the -v flag creates a volume inside the Docker container and allows you to provide a file from the Docker machine as input. The /var/run/docker.sock is the Unix socket that allows access to the Docker server. The $(which docker) part is a clever way to provide the path for the docker executable inside the container without hardcoding it. However, be careful when you use this command on an Apple when using boot2docker. If the docker executable is installed in a different location than it is installed in boot2docker's virtual machine, this results in a mismatch. It will be the executable inside the boot2docker virtual server that gets inserted into the container. So you may want to replace $(which docker) with /usr/local/bin/docker which is hardcoded. Similarly, if you run a different system, there is a chance that the /var/run/docker.sock has a different location and you need to adjust it accordingly.

Now you can use the Dockerfile inside the google/golang container in the $GOPATH directory, which points to /gopath in this example. Actually, I already checked this Dockerfile into GitHub. So you can copy it from the Go build directory to the desired location like this:

cp $GOPATH/src/github.com/adriaandejonge/helloworld/Dockerfile $GOPATH

 

You need to copy this as the compiled binary is now located in $GOPATH/bin and it is not possible to include files from parent directories when building a Dockerfile. So after copying, the next step is:

docker build -t adejonge/helloworld $GOPATH

 

And if all goes, well, Docker responds with something like:

Successfully built 6ff3fd5a381d

 

Which allows you to run the container:

docker run -ti --name hellobroken adejonge/helloworld

 

But unfortunately, now Docker responds with:

2014/07/02 17:06:48 no such file or directory

 

So what is going on? We have a statically linked executable inside a scratch container. Did we make a mistake?

As it turns out, Go does not statically link libraries. Or at least not all libraries. Under Linux, we can see the dynamically linked libraries for an executable with the ldd command:

ldd $GOPATH/bin/helloworld 

 

Which responds with:

linux-vdso.so.1 => (0x00007fff039fe000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f61df30f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f61def84000)
/lib64/ld-linux-x86-64.so.2 (0x00007f61df530000)

 

So before we can run the Hello World webserver, we need to tell the Go compiler to actually do static linking.

Creating statically linked executables in Go

In order to create statically linked executables, we need to tell Go to use the cgo compiler rather than the go compiler. The command to do so is:

CGO_ENABLED=0 go get -a -ldflags '-s' github.com/adriaandejonge/helloworld

 

The CGO_ENABLED environment variable tells Go to use the cgo compiler rather than the go compiler. The -a flag tells Go to rebuild all dependencies. Otherwise you still end up with dynamically linked dependencies. And finally the -ldflags '-s' flag is a nice extra. It reduces the file size of the resulting executable by roughly 50%. You can also do this without the cgo compiler. The size reduction is a result from removing debug information.

Just to be sure, rerun the ldd command.

ldd $GOPATH/bin/helloworld 

 

It should now respond with:

not a dynamic executable

 

You can also rerun the steps for creating the Docker container around the executable from scratch:

docker build -t adejonge/helloworld $GOPATH

 

And if all goes well, Docker responds with something like:

Successfully built 6ff3fd5a381d

 

Which allows you to run the container:

docker run -ti --name helloworld adejonge/helloworld

 

And this time it should respond with:

Started, serving at 8080

 

Until so far, there were many manual steps and there is a lot of room for error. Let's exit from the google/golang container and continue from the surrounding machine:

<Press Ctrl-C>
exit

 

You can check the existence or absence of containers and images with:

docker ps -a
docker images -a

 

And you can do some cleaning of Docker with:

docker rm -f helloworld
docker rmi -f adejonge/helloworld

 

Creating a Docker container that creates a Docker container

The steps we took so far, we can also record in a Dockerfile and have Docker do the work for us:

FROM google/golang
RUN CGO_ENABLED=0 go get -a -ldflags '-s' github.com/adriaandejonge/helloworld
RUN cp /gopath/src/github.com/adriaandejonge/helloworld/Dockerfile /gopath
CMD docker build -t adejonge/helloworld gopath

 

I checked this Dockerfile into a separate GitHub repository called adriaandejonge/hellobuild. It can be built with this command:

docker build -t adejonge/hellobuild github.com/adriaandejonge/hellobuild

 

Providing the  -t flag names the image as adejonge/hellobuild and implicitly tags it as latest. These names make it easier for you to remove the image later on. Next,  you can create a container from this image while providing the flags that you have seen earlier in this post:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(which docker):$(which docker) -ti --name hellobuild adejonge/hellobuild

 

Providing the --name hellobuild flag makes it easier to remove the container after running. In fact, you can do so right away, because after running this command, you already created the adejonge/helloworld image:

docker rm -f hellobuild
docker rmi -f adejonge/hellobuild

 

And now you can start a new container named helloworld based on the adejonge/helloworld image as you have done before:

docker run -ti --name helloworld adejonge/helloworld

 

Because all these steps are run from the same command line, without opening a bash shell inside a Docker container, you can add these steps to a bash script and run it automatically. For your convenience, I have added these bash scripts to the hellobuild GitHub repository.

Also, if you want to try the smallest possible Docker container running a Hello World web server without following all the steps described in this blog post, you can also use the pre-built image that I checked into the Docker Hub repository:

docker pull adejonge/helloworld

 

With docker images -a you can see that the size is 3.6MB. Of course, you can make it even smaller if you manage to create an executable that is smaller than the web server in Go that I wrote. In C or Assembly you may be able to do so. However, you can never make it smaller than the scratch image.

Dockerfiles as automated installation scripts

Xebia Blog - Thu, 07/03/2014 - 19:16

Dockerfiles are great and easily readable specifications for the installation and configuration of an application. It is terse, can be understood by anyone who understands UNIX commands, results in a testable product and can easily be turned into an automated installation script using a little awk'ward magic. Just in case you want to install the application in question on the good old fashioned way, without the Docker hassle :-)

In this case, we needed to experiment with the Codahale Metrics library and Zabbix. Instead of installing a complete Zabbix server, I googled for a docker container and was pleased to find a ready to run Zabbix server configuration created by Bernardo Gomez Palacio. . Unfortunately, the server stopped repeatedly after about 5 minutes due the simplevisor's impression that it was requested to stop. I could not figure out where this request was coming from, and as it was pretty persistent, I decided to install zabbix on a virtual box.

So I checked out the  docker-zabbix github project and found a ready to run Vagrant configuration to build the zabbix docker container itself (Cool!). The Dockerfile contained easily and readable instructions on how to install and configure Zabbix. But,  instead of copy-and-pasting the instructions to the command prompt, I cloned the project on the vagrant box and created the following awk script in order to execute the instructions in the Dockerfile directly on the running system.

/^ADD/ {
sub(/ADD/, "")
    cmd = "mkdir -p $(dirname " $2 ")"
    system(cmd)
    cmd = "cp " $0
    system(cmd)
}

/^RUN/ {
    sub(/RUN/, "")
    cmd = $0
    system(cmd)
}

After a few minutes, the image was properly configured. I just needed to run the database initialisation script (/start.sh) and ensured that all the services were started on reboot.

 cd /etc/init.d
for i in zabbix* httpd mysqld snmp* ; do
     chkconfig $i on
     service $i start
done

Even if you do not use Docker in production, Dockerfiles are a great improvement in the specifications of installation instructions!

How architecture enables kick ass teams (1): replication considered harmful?

Xebia Blog - Thu, 07/03/2014 - 11:51

At Xebia we regularly have discussions regarding Agile Architecture? What is it? What does it take? How should you organise this? Is it technical or organisational? And much more questions… which I won’t be answering today. What I will do today is kick off a blog series covering subjects that are often part of these heated debates. In general what we strive for with Agile Architecture is an architecture that enables the organisation to keep moving fast and without IT be a limiting factor for realising changes. As you read this series you’ll start noticing one theme coming back over and over again: Autonomy. Sometimes we’ll be focussing on the architecture of systems, sometimes on the architecture of the organisation or teams, but autonomy is the overarching theme. And if you’re familiar with Conways Law it should be no surprise that there is a strong correlation between team and system structure. Having a structure of teams  that is completely different from your system landscape causes friction. We are convinced that striving for optimal team and system autonomy will lead to an organisation which is able to quickly adapt and respond to changes.

The first subject is replication of data, this is more a systems (landscape) issue and less of an organisational issue and definitely not the only one, more posts will follow.

We all have to deal with situations where:

  • consumers of a data retrieval service (e.g. customer account details) require this service to be highly available, or
  • compute intensive analysis must be done using the data in a system, or
  • data owned by a system must be searched in a way that is not (efficiently) supported by that system

These situations all impact the autonomy of the system owning the data.Is the system able to provide the it's functionality at the require quality level or do these external requirements lead to negative consequences on quality of the service provided or maintainability? Should these requirements be forced into the system or is another approach more appropriate?

Above examples all could be solved by replicating data into another system which is more suitable for meeting these requirements but … replication of data is considered to be harmful by some. Is it really? Often mentioned reasons not to replicate data are:

  • The replicated data will always be less accurate and timely than the original data
    True, and is this really a problem for the specific situation you’re dealing with? Sometimes you really need the latest version of a customer record, but in many situations it is no problem is the data is seconds, minutes or even hours old.
  • Business logic that interprets the data is implemented twice and needs to be maintained
    Yes, and you have to compare the costs of this against the benefits. As long as the benefits outweigh the costs, it is a good choice.  You can even consider to provide a library that is used in both systems.
  • System X is the authoritative source of the data and should be the only one that exposes it
    Agree, and keeping that system as the authoritative source is good practice and does not mean that there could be read only access to the same (replicated) data in other systems.

As you can see it is never a black and white decision, you’ll have to make a balanced decision and include benefits and costs of both alternatives. The gained autonomy and business benefits derived from this can easily outweigh the extra development, hosting and maintenance costs of replicating data.

A few concrete examples from my own experience:

We had a situation where a CRM system instance owned data which was also required in a 24x7 emergency support proces. The data was nicely exposed by a number of data retrieval services. At that organisation the CRM system deployment was such that most components were redundant, but during updates the system as a whole would still be down for several hours. Which was not acceptable given that the data was required in a 24x7 emergency support process. Making the CRM system deployment upgradable without downtime was not possible or would cost .
In this situation the costs of replicating the CRM system database to another datacenter using standard database features and having the data retrieval services access either that replicated database or the original database (as fall back) was much cheaper than trying to make CRM system itself high available. The replicated database would remain running accessible even when CRM system  got upgraded. Yes, we’re bypassing the CRM system business logic for interpreting the data, but for this situation the logic was so simple that the costs of reimplementing and maintaining this in a new lightweight service (separate from CRM system) were neglectable.

Another example is from a telecom provider that uses a chain of fulfilment systems in which it registered all network products sold to its customers (e.g. internet access, telephony, tv). Each product instance depends on instances registered in another system and if you drill down deep enough you‚Äôll reach the physical network hardware ports on which it runs. The systems that registered all products used a relational model which was okay for registration. However, questions like ‚Äúif this product instance breaks, which customers are impacted‚ÄĚ were impossible to answer without overheating CPUs in those systems. By publishing all changes in the registrations to a separate system we could model the whole inventory of services as a network graph and easily do analysis on it without impacting the fulfilment systems. The fact that the data would be a (at most) a few seconds old was absolutely no problem.

And a last example is that sometimes you want to do a full (phonetic) text search through a subset of your domain model. Relational data models quickly get you into an unmaintainable situation. You‚Äôre SQL queries will require many tables, lot‚Äôs of inefficient ‚ÄúLIKE ‚Äė%gold%‚Äô" and developers that have a hard time understanding what a query actually intended to do. Replicating the data to a search engine makes searching far easier and¬†provides more possibilities for searches that are hard to realise in a relational database.

As you can see replication of data can increase autonomy of systems and teams and thereby make your system or system landscape and organisation more agile. I.e. you can realise new functionality faster and get it available for your users quicker because the coupling with other systems or teams is reduced.

In a next blog we'll discuss another subject that impacts team or system autonomy.

How combined Lean- and Agile practices will change the world as we know it

Xebia Blog - Tue, 07/01/2014 - 08:50

You might have attended this month at our presentation about eXtreme Manufacturing and the keynote of Nalden last week on XebiCon 2014. There are a few epic takeaways and additions I would like to share with you in this blogpost.

Epic TakeAway #1: The Learn, Unlearn and Relearn Cycle Like Nalden expressed in his inspiring keynote, one of the major things for him to be successful is being able to Learn, Unlearn and Relearn every time again. In my opinion, this will be the key ability for every successful company in the near future.  In fact, this is how nature evolutes: in the end, only the species who are able to adapt to changing circumstances will survive and evolute. This mechanism makes for example, most of the startups fail, but those who will survive, can be extremely disruptive for non-agile organizations.  Best example for this is of course Whatsapp.  Beating up the Telco Industry by almost destroying their whole businessmodel in only a few months. Learn more about disruptive innovation from one of my personal heroes, Harvard Professor Clayton Christensen.

Epic TakeAway #2: Unlearning Waterfall, Relearning Lean & Agile Globally, Waterfall is still the dominant method in companies and universities.  Waterfall has its origins more than 40 years ago. Times have changed. A lot. A new, successful and disruptive product could be there in only a matter of days instead of (many) years. Finally, things are changing. For example, the US Department of Defence has recently embraced Lean and Agile as mandatory practices, especially Scrum. Schools and universities are also more and more adopting the Agile way of working. Later more in this blogpost.

Epic TakeAway #3: Combined Lean- and Agile practices =  XM Lean practices arose in Japan in the 1980’s , mainly in the manufacturing industry, Toyota being the frontrunner here.  Agile practices like Scrum, were first introduced in the 1990’s by Ken Schwaber and Jeff Sutherland, these practices were mainly applied in the IT-industry. Until now, the manufacturing and IT world didn’t really joined forces combining Lean and Agile practices.  Until recently.  The WikiSpeed initiative of Joe Justice proved combining these practices result in a hyper-productive environment, where a 100 Mile/Gallon road legal sportscar could be developed in less than 3 months.  Out of this success eXtreme Manufacturing (XM) arose. Finally, a powerful combination of best practices from the manufacturing- and IT-world came together.

Epic TakeAway #4: Agile Mindset & Education fotoLike Sir Ken Robinson and Dan Pink already described in their famous TED-talks, the way most people are educated and rewarded, is not suitable anymore for modern times and even conflicts with the way we are born.  We learn by "failing", not by preventing it.  Failing in it’s essence should stimulate creativity to do things better next time, not be punished.  On the long run, failing (read: learning!) has more added value than short-term succes, for example by chasing milestones blindly. EduScrum in the Netherlands stimulates schools and universities to apply Scrum in their daily classes in order to stimulate creativity, happiness, self-reliantness and talent. The results of the schools joining these initiative are spectacular: happy students, less dropouts an significantly higher grades. For a prestigious project for the Delft University, Forze, the development of a hydrogen race car, the students are currently being trained and coached to apply Agile and Lean practices.  Also these results are more than promising. The Forze team is happier, more productive and more able to learn faster and better from setbacks.  Actually, they are taking the first steps of being anti-fragile.  Due too an intercession of the Forze team members themselves,  the current support of agile (Xebia) coaches is now planned being extended to the flagship of the Delft University:  the NUON solar team.

The Final Epic TakeAway In my opinion, we reached a tipping point in the way goals should be achieved.  Organizations are massively abandoning Waterfall and embracing Agile practices, like Scrum.  Adding Lean practices like Joe Justice did in his WikiSpeed project, makes Agile and Lean extremely powerful.  Yes, this will even make this world a much better place.  We cannot prevent nature disasters with this, but we can be anti-fragile.  We cannot prevent every epidemic, but we can respond in an XM-fashion on this by developing a vaccin in only days instead of years.  This brings me finally to the missing statement of the current Agile Manifesto:   We should Unlearn and Relearn before we Judge.  Dare to Dream like a little kid again. Unlearn your skepticism.  Companies like Boeing, Lockheed Martin and John Deere already did. Adopting XM speeded up their velocity in some cases with more than 7 times.

Keeping a journal

Gridshore - Sun, 06/29/2014 - 23:34

Today I was reading the first part of a book I got as a gift from one of my customers. The book is called Show your work by Austin Kleon(Show Your Work! @ Amazon). The whole idea around this book is that you must be open en share what you learn and the steps you took to learn.

I think this fits me like a glove, but I can be more expressive. Therefore I have decided to do things differently. I want to start by writing smaller pieces of the things I want to do that day, or what I accomplished that day, give some excerpts of things I am working on. Not real blog posts or tutorials but more notes that I share with you. Since it is a Sunday I only want to share the book I am reading.


The post Keeping a journal appeared first on Gridshore.

Categories: Architecture, Programming