
Software Development Blogs: Programming, Software Testing, Agile Project Management


Feed aggregator

Coding Android TV games is easy as pie

Android Developers Blog - Wed, 11/19/2014 - 19:28
Posted by Alex Ames, Fun Propulsion Labs at Google*

We’re pleased to announce Pie Noon, a simple game created to demonstrate multi-player support on the Nexus Player, an Android TV device. Pie Noon is an open source, cross-platform game written in C++ which supports:

  • Up to 4 players using Bluetooth controllers.
  • Touch controls.
  • Google Play Games Services sign-in and leaderboards.
  • Other Android devices (you can play on your phone or tablet in single-player mode, or against human adversaries using Bluetooth controllers).

Pie Noon serves as a demonstration of how to use the SDL library in Android games as well as Google technologies like Flatbuffers, Mathfu, fplutil, and WebP.

  • Flatbuffers provides efficient serialization of the data loaded at run time for quick loading times. (Examples: schema files and loading compiled Flatbuffers)
  • Mathfu drives the rendering code, particle effects, scene layout, and more, allowing for efficient mathematical operations optimized with SIMD. (Example: particle system)
  • fplutil streamlines the build process for Android, making iteration faster and easier. Our Android build script makes use of it to easily compile and run on Android devices.
  • WebP compresses image assets more efficiently than the JPG or PNG file formats, allowing for smaller APK sizes (see the example below).
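As an illustration (not taken from the Pie Noon build itself; the file names are hypothetical), converting an existing PNG asset to WebP is a one-liner with the cwebp tool:

  # Convert a PNG texture to lossy WebP at quality 80
  cwebp -q 80 textures/pie.png -o textures/pie.webp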

You can download the game in the Play Store and the latest open source release from our GitHub page. We invite you to learn from the code to see how you can implement these libraries and utilities in your own Android games. Take advantage of our discussion list if you have any questions, and don’t forget to throw a few pies while you’re at it!

* Fun Propulsion Labs is a team within Google that's dedicated to advancing gaming on Android and other platforms.

Categories: Programming

Three Alternatives for Making Smaller Stories

When I was in Israel a couple of weeks ago teaching workshops, one of the big problems people had was large stories. Why was this a problem? If your stories are large, you can’t show progress, and more importantly, you can’t change.

For me, the point of agile is the transparency—hey, look at what we’ve done!—and the ability to change. You can change the items in the backlog for the next iteration if you are working in iterations. You can change the project portfolio. You can change the features. But, you can’t change anything if you continue to drag on and on and on for a given feature. You’re not transparent if you keep developing a feature. You become a black hole.

Managers start to ask, “What are you guys doing? When will you be done? How much will this feature cost?” Do you see why you need to estimate more if the feature is large? Of course, the larger the feature, the more you need to estimate and the more difficult it is to estimate well.

Pawel Brodzinski said this quite well last year at the Agile conference, with his favorite estimation scale. Anything other than a size 1 was either too big or the team had no clue.

The reason Pawel and I and many other people like very small stories—a size of 1—is that you deliver something every day or more often. You have transparency. You don’t invest a ton of work without getting feedback on the work.

The people I met a couple of weeks ago felt (and were) stuck. One guy was doing intricate SQL queries. He thought that there was no value until the entire query was done. Nope, that’s where he is incorrect. There is value in interim results. Why? How else would you debug the query? How else would you discover if you had the database set up correctly for product performance?

I suggested that every single atomic transaction was a valuable piece, and that the way to build small stories was to separate this hairy SQL statement at the atomic transaction. I bet there are other ways, but that was a good start. He got that aha look, so I am sure he will think of other ways.

Another guy was doing algorithm development. Now, I know one issue with algorithm development is you have to keep testing performance or reliability or something else when you do the development. Otherwise, you fall off the deep end. You have an algorithm tuned for one aspect of the system, but not another one. The way I’ve done this in the past is to support algorithm development with a variety of tests.

Testing Continuum from Manage It!

This is the testing continuum from Manage It! Your Guide to Modern, Pragmatic Project Management. See the unit and component testing parts? If you do algorithm development, you need to test each piece of the algorithm—the inner loop, the next outer loop, repeat for each loop—with some sort of unit test, then component test, then as a feature. And, you can do system level testing for the algorithm itself.

Back when I tested machine vision systems, I was the system tester for an algorithm we wanted to go “faster.” I created the golden master tests and measured the performance. I gave my tests to the developers. Then, as they changed the inner loops, they created their own unit tests. (No, we were not smart enough to do test-driven development. You can be.) I helped create the component-level tests for the next-level-up tests. We could run each new potential algorithm against the golden master and see if the new algorithm was better or not.
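In shell terms, a golden-master check like the one described above can be as simple as diffing fresh output against the stored reference (a sketch; the program and file names are hypothetical):

  # Run the candidate algorithm over the reference inputs...
  ./vision_pipeline --input frames/ --output results.txt
  # ...and compare against the golden master captured from the known-good version
  diff -u golden_master.txt results.txt && echo "PASS"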

I realize that you don’t have a product until everything works. This is like saying in math that you don’t have an answer until you have finished the entire calculation. And, you are allowed—in fact, I encourage you—to show your interim work. How else can you know if you are making progress?

Another participant said that he was special. (Each and every one of you is special. Don’t you know that by now??) He was doing firmware development. I asked if he simulated the firmware before he downloaded to the device. “Of course!” he said. “So, simulate in smaller batches,” I suggested. He got that far-off look. You know that look, the one that says, “Why didn’t I think of that?”

He didn’t think of it because it requires changes to their simulator. He’s not an idiot. Their simulator is built for an entire system, not small batches. The simulator assumes waterfall, not agile. They have some technical debt there.

Here are the three ways, in case you weren’t clear:

  1. Use atomic transactions as a way to show value when you have a big honking transaction. Use tests for each atomic transaction to support your work and understand if you have the correct performance on each transaction.
  2. Break apart algorithm development, as in “show your work.” Support your algorithm development with tests, especially if you have loops.
  3. Simulate in small batches when you have hardware or firmware. Use tests to support your work.

You want to deliver value in your projects. Short stories allow you to do this. Long stories stop your momentum. The longer your project, and the more teams (if you work on a program), the more you need to keep your stories short. Try these alternatives.

Do you have other scenarios I haven’t discussed? Ask away in the comments.

Categories: Project Management

Launchpad Online for developers: Getting Started with Google APIs

Google Code Blog - Wed, 11/19/2014 - 19:00
Google APIs give you the power to build rich integrations with our most popular applications, including Google Maps, YouTube, Gmail, Google Drive and more. If you are new to our developer features, we want to give you a quick jumpstart on how to use them effectively and build amazing things.

As you saw last week, we’re excited to partner with the Startup Launch team to create the “Launchpad Online”, a new video series for a global audience that joins their cadre which already includes the established “Root Access” and “How I:” series, as well as a “Global Spotlight” series on some of the startups that have benefitted from the Startup Launch program. This new series focuses on developers who are beginners to one or more Google APIs. Each episode helps get new users off the ground with the basics, shows experienced developers more advanced techniques, or introduces new API features, all in as few lines of code as possible.

The series assumes you have some experience developing software and are comfortable using languages like JavaScript and Python. Be inspired by a short working example that you can drop into your own app and customize as desired, letting you “take off” with that great idea. Hopefully you had a chance to view our intro video last week and are ready for more!

Once you’ve got your developers ready to get started, learn how to use the Google Developers Console to set up a project (see video to the right), and walk through common security code required for your app to access Google APIs. When you’ve got this “foundation” under your belt, you’ll be ready to build your first app… listing your files on Google Drive. You’ll learn something new and see some code that makes it happen in each future episode… we look forward to having you join us soon on Launchpad Online!

Posted by Wesley Chun, Developer Advocate, Google

+Wesley Chun (@wescpy) is an engineer, author of the bestselling Core Python books, and a Developer Advocate at Google, specializing in Google Drive, Google Apps Script, Gmail, Google Apps, cloud computing, developer training, and academia. He loves traveling worldwide to meet Google users everywhere, whether at a developers conference, user group meeting, or on a university campus!
Categories: Programming

We are leaving 3x-4x performance on the table just because of configuration.

Performance guru Martin Thompson gave a great talk at Strangeloop: Aeron: Open-source high-performance messaging, and one of the many interesting points he made was how much performance is being lost because we aren't configuring machines properly.

This point comes on the observation that "Loss, throughput, and buffer size are all strongly related."

Here's a gloss of Martin's reasoning. It's a problem that keeps happening and people aren't aware that it's happening because most people are not aware of how to tune network parameters in the OS.

The separation of programmers and system admins has become an anti-pattern. Developers don’t talk to the people who have root access on machines, who in turn don’t talk to the people that have network access. Which means machines are never configured right, which leads to a lot of loss. We are leaving 3x-4x performance on the table just because of configuration.

We need to work out how to bridge that gap, know what the parameters are, and how to fix them.

So know your OS network parameters and how to tune them.
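For example, on Linux the socket buffer limits involved can be inspected and raised with sysctl (a sketch; the values are illustrative, not recommendations for your workload):

  # Inspect the current maximum socket receive/send buffer sizes
  sysctl net.core.rmem_max net.core.wmem_max
  # Raise them (persist the settings in /etc/sysctl.conf to survive reboots)
  sudo sysctl -w net.core.rmem_max=16777216
  sudo sysctl -w net.core.wmem_max=16777216
  # TCP auto-tuning ranges: min, default and max buffer sizes in bytes
  sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"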

Categories: Architecture

What is a Monolith?

Coding the Architecture - Simon Brown - Wed, 11/19/2014 - 10:00

There is currently a strong trend for microservice based architectures and frequent discussions comparing them to monoliths. There is much advice about breaking-up monoliths into microservices and also some amusing fights between proponents of the two paradigms - see the great Microservices vs Monolithic Melee. The term 'Monolith' is increasingly being used as a generic insult in the same way that 'Legacy' is!

However, I believe that there is a great deal of misunderstanding about exactly what a 'Monolith' is and those discussing it are often talking about completely different things.

A monolith can be considered an architectural style or a software development pattern (or anti-pattern if you view it negatively). Styles and patterns usually fit into different Viewtypes (a viewtype is a set, or category, of views that can be easily reconciled with each other [Clements et al., 2010]) and some basic viewtypes we can discuss are:

  • Module - The code units and their relation to each other at compile time.
  • Allocation - The mapping of the software onto its environment.
  • Runtime - The structure of the software elements and how they interact at runtime.

A monolith could refer to any of the basic viewtypes above.


Module Monolith

If you have a module monolith then all of the code for a system is in a single codebase that is compiled together and produces a single artifact. The code may still be well structured (classes and packages that are coherent and decoupled at a source level rather than a big-ball-of-mud) but it is not split into separate modules for compilation. Conversely a non-monolithic module design may have code split into multiple modules or libraries that can be compiled separately, stored in repositories and referenced when required. There are advantages and disadvantages to both but this tells you very little about how the code is used - it is primarily done for development management.

Module Monolith


Allocation Monolith

For an allocation monolith, all of the code is shipped/deployed at the same time. In other words once the compiled code is 'ready for release' then a single version is shipped to all nodes. All running components have the same version of the software running at any point in time. This is independent of whether the module structure is a monolith. You may have compiled the entire codebase at once before deployment OR you may have created a set of deployment artifacts from multiple sources and versions. Either way this version for the system is deployed everywhere at once (often by stopping the entire system, rolling out the software and then restarting).

A non-monolithic allocation would involve deploying different versions to individual nodes at different times. This is again independent of the module structure as different versions of a module monolith could be deployed individually.

Allocation Monolith


Runtime Monolith

A runtime monolith will have a single application or process performing the work for the system (although the system may have multiple, external dependencies). Many systems have traditionally been written like this (especially line-of-business systems such as Payroll, Accounts Payable, CMS etc).

Whether the runtime is a monolith is independent of whether the system code is a module monolith or not. A runtime monolith often implies an allocation monolith if there is only one main node/component to be deployed (although this is not the case if a new version of software is rolled out across regions, with separate users, over a period of time).

Runtime Monolith

Note that my examples above are slightly forced for the viewtypes and it won't be as hard-and-fast in the real world.

Conclusion

Be very careful when arguing about 'Microservices vs Monoliths'. A direct comparison is only possible when discussing the Runtime viewtype and properties. You should also not assume that moving away from a Module or Allocation monolith will magically enable a Microservice architecture (although it will probably help). If you are moving to a Microservice architecture then I'd advise you to consider all these viewtypes and align your boundaries across them i.e. don't just code, build and distribute a monolith that exposes subsets of itself on different nodes.

Categories: Architecture

SAFe Agile Release Planning: Participants Basics

Lots of varied roles!

SAFe’s release planning event has been considered both the antithesis of Agile and the secret sauce for scaling Agile to programs and the enterprise. The process is fairly simple, albeit formal. Everyone involved with a program increment (an 8 – 12 week segment of an Agile release train) gets together and synchronizes on the program increment objectives and plan, and then commits to the plan. Who is the “everybody” in a release planning event? The primary participants include:

  1. The Scrum Teams.   Release trains are typically comprised of 50 – 125 people with Scrum teams of 5 – 9 people comprising the lion’s share. Scrum teams are comprised of a Scrum master, product owner and technical personnel. The teams plan, identify risks and dependencies, share knowledge and commit to the plan.
  2. Release Train Engineer. The RTE facilitates the release planning meeting. The RTE is often called the Scrum master of Scrum masters. I have also heard the RTE called a cat herder.
  3. Product Management. Product management provides the vision for the program increment. They interpret the product roadmap and program epics providing direction.
  4. Business Owners. Business owners are the voice of the business. They interact with the teams in the ART to influence and generate an alignment between the program-level program increment objectives and the individual team-level program increment objectives.

Others that are involved include:

  1. CTO, Enterprise Architect and System Architect. These roles provide the architectural vision and an assessment of the Agile environment including changes since the beginning of the last program increment.
  2. Customer Representatives. Customer representatives are an adjunct to the product management personnel, providing interpretation of the vision.
  3. Release Managers. Release managers assist in planning, managing, and governing releases. The release manager may be involved in several ARTs.
  4. Engineering Development Managers. The development managers responsible for the applications involved in the release train need to be aware and provide support and knowledge capital to help ensure success.
  5. Tech Lead and Representative Team Members, Developers, QA/Test, Documentation, etc. Representatives from groups the ART must interface with during the program increment need to participate and have decision making power to commit personnel and resources needed to support the ART.
  6. VPs and Executives Responsible for Product Strategy, Delivery and/or other Affected Operations. Anyone that would typically need to review, provide support or resources or approve a plan or direction should be at hand during the release planning event.

It has been said that it takes a village to raise a child. It takes an equally complex number of roles and participants to plan and then generate commitment to that plan.


Categories: Process Management

Have You Signed Up for the Conscious Software Development Telesummit?

Do you know about the Conscious Software Development Telesummit? Michael Smith is interviewing more than 20 experts about all aspects of software development, project management, and project portfolio management. He’s releasing the interviews in chunks, so you can  listen and not lose work time. Isn’t that smart of him?

If you haven’t signed up yet, do it now. You get access to all of the interviews, recordings, and transcripts for all the speakers. That’s the Conscious Software Development Telesummit. Because you should make conscious decisions about what to do for your software projects.

Categories: Project Management

Begin developing with Android Auto

Android Developers Blog - Tue, 11/18/2014 - 19:09

Posted by Daniel Holle, Product Manager

At Google I/O back in June, we provided a preview of Android Auto. Today, we’re excited to announce the availability of our first APIs for building Auto-enabled apps for audio and messaging. Android apps can now be extended to the car in a way that is optimized for the driving experience.

For users, this means they simply connect their Android handheld to a compatible vehicle and begin utilizing a car-optimized Android experience that works with the car’s head unit display, steering wheel buttons, and more. For developers, the APIs and UX guidelines make it easy to provide a simple way for users to get the information they need while on the road. As an added bonus, the Android Auto APIs let developers easily extend their existing apps targeting Android 5.0 (API level 21) or higher to work in the car without having to worry about vehicle-specific hardware differences. This gives developers wide reach across manufacturers, models and regions, by just developing with one set of APIs and UX standards.

There are two use cases that Android Auto supports today:

  • Audio apps that expose content for users to browse and allow audio playback from the car, such as music, podcasts, news, etc.
  • Messaging apps that receive incoming notifications, read messages aloud, and send replies via voice from the car.

To help you get started with Android Auto, check out our Getting Started guide. It’s important to note that while the APIs are available today, apps extended with Android Auto cannot be published quite yet. More app categories will be supported in the future, providing more opportunities for developers and drivers of Android Auto. We encourage you to join the Android Auto Developers Google+ community to stay up-to-date on the latest news and timelines.

We’ve already started working with partners to develop experiences for Android Auto: iHeartRadio, Joyride, Kik, MLB.com, NPR, Pandora, PocketCasts, Songza, SoundCloud, Spotify, Stitcher, TextMe, textPlus, TuneIn, Umano, and WhatsApp. If you happen to be in the Los Angeles area, stop by the LA Auto Show through November 30 and visit us in the Hyundai booth to take Android Auto and an app or two for a test drive.

Categories: Programming

You’re invited to submit an app to the Google Fit Developer Challenge

Google Code Blog - Tue, 11/18/2014 - 18:00
With the recent launch of the Google Fit platform, we hope to spur innovation in the world of fitness apps and devices that encourage users to have a more active lifestyle. To expand the ecosystem and help users get a more comprehensive view of their fitness, we’re excited to announce the Google Fit Developer Challenge in partnership with adidas, Polar, and Withings.
We’re inviting you to create and showcase an app, or update an existing app, that integrates with Google Fit. The submission deadline is February 17, 2015. Our selected judges will choose up to six new apps and six existing apps for Fit as the winners. The judges are looking for fitness apps that are innovative, fun to use, keep users coming back, offer users real benefit, and meet the Android design and quality guidelines.

Challenge winners will be featured on Google Play in front of Android users around the world. Runners-up will also receive prizes, in the form of smart devices from Google, adidas, Polar, and Withings, so that they can continue to tinker and improve their fitness apps with the latest hardware.

Check out the challenge website to find out how to enter*. You will also find some helpful resources to find inspiration. We encourage you to join the Google Fit Developer Community on Google+ to discuss your ideas and the challenge.

* The challenge is open to registered Google Play developers in the US, UK, France, Germany, India, Israel, Hong Kong, Singapore, South Korea, Japan, and Taiwan who are over the age of 18 (or the age of majority in their country). For the full terms and conditions, read the official rules.

Posted by Angana Ghosh, Product Manager, Google Fit
Categories: Programming

Software Development Linkopedia November 2014

From the Editor of Methods & Tools - Tue, 11/18/2014 - 17:19
Here is our monthly selection of interesting knowledge material on programming, software testing and project management. This month you will find some interesting information and opinions about Agile coaching and management, positive quality assurance, product managers and owners, enterprise software architecture, functional programming, MySQL performance and software testing at Google.

  • Blog: Fewer Bosses. More Coaches. Please.
  • Blog: Advice for Agile Coaches on “Dealing With” Middle Management
  • Blog: Five ways to make testing more positive
  • Blog: 9 Things Every Product Manager Should Know about Being an Agile Product Owner
  • Article: Managing Trust in Distributed Teams
  • Article: Industrial ...

Adjusting To New Information

Software Requirements Blog - Seilevel.com - Tue, 11/18/2014 - 17:00
What does it mean for an organization to be agile? I don’t mean just in terms of agile software development, I mean for any team or company or group of people working toward any common goal. I tend to think about it in terms of ships: imagine that you are crossing the Atlantic, and you […]
Categories: Requirements

Are Vanity Metrics Really All That Bad?

Mike Cohn's Blog - Tue, 11/18/2014 - 15:00

The following was originally published in Mike Cohn's monthly newsletter. If you like what you're reading, sign up to have this content delivered to your inbox weeks before it's posted on the blog, here.

I have a bit of a problem with all the hatred shown to so-called vanity metrics.

Eric Ries first defined vanity metrics in his landmark book, The Lean Startup. Ries says vanity metrics are the ones that most startups are judged by—things like page views, number of registered users, account activations, and things like that.

Ries says that vanity metrics are in contrast to actionable metrics. He defines an actionable metric as one that demonstrates clear cause and effect. If what causes a metric to go up or down is clear, the metric is actionable. All other metrics are vanity metrics.

I’m pretty much OK with all this so far. I’m big on action. I’ve written in my books and in posts here that if a metric will not lead to a different action, that metric is not worth gathering. I’ve said the same of estimates. If you won’t behave differently by knowing a number, don’t waste time getting that number.

So I’m fine with the definitions of “actionable” and “vanity” metrics. My problem is with some of the metrics that are thrown away as being merely vanity. For example, the number one hit on Google today when I searched for “vanity metrics” was an article on TechCrunch.

They admit to being guilty of using them and cite such metrics as 1 million downloads and 10 million registered users.

But are such numbers truly vanity metrics?

One chapter in Succeeding with Agile is about metrics. In it, I wrote about creating a balanced scorecard and using both leading and lagging indicators. A lagging indicator is something you can measure after you have done something, and can be used to determine if you achieved a goal.

If your goal is improving quality, a lagging indicator could be the number of defects reported in the first 30 days after the release. That would tell you if you achieved your goal—but it comes with the drawback of not being at all available until 30 days after the release.

A leading indicator, on the other hand, is available in advance, and can tell you if you are on your way to achieving a goal.

The number of nightly tests that pass would be a leading indicator that a team is on its way to improving quality. The number of nightly tests passing, though, is a vanity metric in Ries’ terms. It can be easily manipulated; the team could run the same or similar tests many times to deliberately inflate the number of tests. Therefore, the linkage between cause and effect is weak. More passing tests do not guarantee improved quality.

But is the number of passing tests really a vanity metric? Is it really useless?

To show that it’s not, consider a few other metrics you’re probably familiar with: your cholesterol value, your blood pressure, your resting pulse, even your weight. A doctor can use these values and learn something about your health. If your total cholesterol value is 160, a heart attack is probably not imminent. A value of 300, though, and it’s a good thing you’re visiting your doctor.

These are leading indicators. They don’t guarantee anything. I could have a cholesterol value of 160 and have a heart attack as soon as I walk out of the doctor’s office. The only true lagging indicator would be the number of heart attacks I’ve had in the last year. Yes, absolutely a much better metric, but not available until the end of the year.

So should we avoid all vanity metrics? No. Vanity metrics can possess meaningful information. They are often leading indicators. If a website’s goal is to sell memberships, then the number of memberships sold is that company’s key, actionable metric.

But number of unique new visitors—a vanity metric—can be a great leading indicator. More new visitors should lead to more memberships sold. Just like more passing tests should lead to higher quality. It’s not guaranteed, but it is indicative.

The TechCrunch article I mentioned has the right attitude. It says, “Vanity metrics aren’t completely useless, just don’t be fooled by them.” The real danger of vanity metrics is that they can be gamed. We can run tests that can’t fail. We can buy traffic to our site that we know will never translate into paid memberships, but make the traffic metrics look good.

As long as no one is doing things like that, vanity metrics can serve as good leading indicators. Just keep in mind that they don’t measure what you really care about. They merely indicate whether you’re on the right path.

How Do You Do It? (A Job Like a Tailored Suit)

NOOP.NL - Jurgen Appelo - Tue, 11/18/2014 - 14:21

I regularly get the question, “How Do You Do It?”

“How are you able to travel so much and not get sick of it?”

“How can you read 50+ books per year and also write your own?”

Gosh, I don’t know.


Categories: Project Management

Ready, Test, Go!

Xebia Blog - Tue, 11/18/2014 - 10:42

The full potential of many an agile organization is hardly ever reached. Many teams find themselves redefining user stories although they have been committed to as part of the sprint. The ‘ready phase’, meant to get user stories clear and sufficiently detailed so they can be implemented, is missed. How will each user story result in high quality features that deliver business value? The ‘Definition of Ready’ is lacking one important entry: “Automated tests are available.” Ensuring to have testable and hence automated acceptance criteria before committing to user stories in a sprint, allows you to retain focus during the sprint. We define this as: Ready, Test, Go!

Ready

Behaviour-Driven Development has proven to be a fine technique to write automated acceptance criteria. Using the Gherkin format (given, when, then), examples can be specified that need to be supported by the system once the user story is completed. When a sufficiently detailed list of examples is available, all Scrum stakeholders agree with the specification. Common understanding is achieved that when the story is implemented, we are one step closer to building the right thing.

Test

The specification itself becomes executable: at any moment in time, the gap between the desired and implemented functionality becomes visible. In other words, this automated acceptance test should be run continuously. First time, it happily fails. Next, implementation can start. This, following Test-Driven Development principles, starts with writing (also failing) unit tests. Then, development of the production code starts. When the unit tests are passing and acceptance tests for a story are passing, other user stories can be picked up; stories of which the tests happily fail. Tests thus act as a safeguard to continuously verify that the team is building the thing right. Later, the automated tests (acceptance tests and unit tests) serve as a safety net for regression testing during subsequent sprints.

Go!

That's simple: release your software to production. Ensure that other testing activities (performance tests, chain tests, etc) are as much as possible automated and performed as part of the sprint.

The (Agile) Test Automation Engineer

In order to facilitate or boost this way of working, the role of the test automation engineer is key. The test automation engineer is defining the test architecture and facilitating the necessary infrastructure needed to run tests often and fast. He is interacting with developers to co-develop fixtures, to understand how the production code is built, and to decide upon the structure and granularity of the test code.

Apart from their valuable and unique analytical skills, relevant testers grow their technical skills. If it cannot be automated, it cannot be checked, so one might question whether the user story is ready to be committed to in a sprint. The test automation engineer helps the Scrum teams to identify when they are ‘ready to test’ and urges the product owner and business to specify requirements – at least for the next sprint ahead.

So: ready, test, go!!

Seq vs Streams

Phil Trelford's Array - Tue, 11/18/2014 - 08:22

Last week Gian Ntzik gave a great talk at the F#unctional Londoners meetup on the Nessos Streams library. It’s a lightweight F#/C# library for efficient functional-style pipelines on streams of data.

The main difference between LINQ/Seq and Streams is that LINQ is about composing external iterators (Enumerable/Enumerator) and Streams is based on the continuation-passing-style composition of internal iterators, which makes optimisations such as loop fusion easier.

The slides (using FsReveal) and samples are available on Gian’s github repository.

Simple Streams

Gian started the session by live coding a simple implementation of streams in about 20 minutes:

open System.Collections.Generic

type Stream<'T> = ('T -> unit) -> unit

let inline ofArray (source: 'T[]) : Stream<'T> =
   fun k ->
      let mutable i = 0
      while i < source.Length do
            k source.[i]
            i <- i + 1          

let inline filter (predicate: 'T -> bool) (stream: Stream<'T>) : Stream<'T> =
   fun k -> stream (fun value -> if predicate value then k value)

let inline map (mapF: 'T -> 'U) (stream: Stream<'T>) : Stream<'U> =
   fun k -> stream (fun v -> k (mapF v))

let inline iter (iterF: 'T -> unit) (stream: Stream<'T>) : unit =
   stream (fun v -> iterF v)

let inline toArray (stream: Stream<'T>) : 'T [] =
   let acc = new List<'T>()
   stream |> iter (fun v -> acc.Add(v))
   acc.ToArray()

let inline fold (foldF:'State->'T->'State) (state:'State) (stream:Stream<'T>) =
   let acc = ref state
   stream (fun v -> acc := foldF !acc v)
   !acc

let inline reduce (reducer: ^T -> ^T -> ^T) (stream: Stream< ^T >) : ^T
      when ^T : (static member Zero : ^T) =
   fold (fun s v -> reducer s v) LanguagePrimitives.GenericZero stream

let inline sum (stream : Stream< ^T>) : ^T
      when ^T : (static member Zero : ^T)
      and ^T : (static member (+) : ^T * ^T -> ^T) =
   fold (+) LanguagePrimitives.GenericZero stream

and, as you can see, it comes to only about 40 lines of code.

Sequential Performance

Just with this simple implementation, Gian was able to demonstrate a significant performance improvement over F#’s built-in Seq module for a simple pipeline:

#time // Turns on timing in F# Interactive

let data = [|1L..1000000L|]

let seqValue = 
   data
   |> Seq.filter (fun x -> x%2L = 0L)
   |> Seq.map (fun x -> x * x)
   |> Seq.sum
// Real: 00:00:00.252, CPU: 00:00:00.234, GC gen0: 0, gen1: 0, gen2: 0

let streamValue =
   data
   |> Stream.ofArray
   |> Stream.filter (fun x -> x%2L = 0L)
   |> Stream.map (fun x -> x * x)
   |> Stream.sum
// Real: 00:00:00.119, CPU: 00:00:00.125, GC gen0: 0, gen1: 0, gen2: 0

Note for operations over arrays, the F# Array module would be a more appropriate choice and is slightly faster:

let arrayValue =
   data
   |> Array.filter (fun x -> x%2L = 0L)
   |> Array.map (fun x -> x * x)
   |> Array.sum
// Real: 00:00:00.094, CPU: 00:00:00.093, GC gen0: 0, gen1: 0, gen2: 0

Also LINQ does quite well here as it has specialized overloads, including one for summing over int64 values:

open System.Linq

let linqValue =   
   data
      .Where(fun x -> x%2L = 0L)
      .Select(fun x -> x * x)
      .Sum()
// Real: 00:00:00.058, CPU: 00:00:00.062, GC gen0: 0, gen1: 0, gen2: 0

However with F# Interactive running in 64-bit mode Streams take back the advantage (thanks to Nick Palladinos for the tip):

let streamValue =
   data
   |> Stream.ofArray
   |> Stream.filter (fun x -> x%2L = 0L)
   |> Stream.map (fun x -> x * x)
   |> Stream.sum
// Real: 00:00:00.033, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0

Looks like the 64-bit JIT is doing some black magic there.

Parallel Performance

Switching to the full Nessos Streams library, there’s support for parallel streams via the ParStream module:

let parsStreamValue =
   data
   |> ParStream.ofArray
   |> ParStream.filter (fun x -> x%2L = 0L)
   |> ParStream.map (fun x -> x + 1L)
   |> ParStream.sum
// Real: 00:00:00.069, CPU: 00:00:00.187, GC gen0: 0, gen1: 0, gen2: 0

which demonstrates a good performance increase with little effort.

For larger computes Nessos Streams supports cloud based parallel operations against Azure.

Overall Nessos Streams looks like a good alternative to the Seq module for functional pipelines.

Nessos LinqOptimizer

For further optimization Gian recommended the Nessos LinqOptimizer:

An automatic query optimizer-compiler for Sequential and Parallel LINQ. LinqOptimizer compiles declarative LINQ queries into fast loop-based imperative code. The compiled code has fewer virtual calls and heap allocations, better data locality and speedups of up to 15x

The benchmarks are impressive.


Reactive Extensions (Rx)

One of the questions in the talk and on twitter later was, given Rx is also a push model, how does the performance compare:

@sforkmann @ptrelford @dsyme how efficient memory and perf wise to @ReactiveX? Our numbers are pretty hard to beat!

— Матвій Підвисоцький (@mattpodwysocki) November 13, 2014

Clearly the Nessos Streams library and Rx have different goals (data processing vs event processing), but I thought it would be interesting to compare them all the same:

open System.Reactive.Linq

let rxValue =
   data
      .ToObservable()
      .Where(fun x -> x%2L = 0L)
      .Select(fun x -> x * x)
      .Sum()      
      .ToEnumerable()
      |> Seq.head
// Real: 00:00:02.895, CPU: 00:00:02.843, GC gen0: 120, gen1: 0, gen2: 0

let streamValue =
   data
   |> Stream.ofArray
   |> Stream.filter (fun x -> x%2L = 0L)
   |> Stream.map (fun x -> x * x)
   |> Stream.sum
// Real: 00:00:00.130, CPU: 00:00:00.109, GC gen0: 0, gen1: 0, gen2: 0

In this naive comparison you can see Nessos Streams is roughly 20 times faster than Rx.

Observable module

F# also has a built-in Observable module for operations over IObservable<T> (support for operations over events was added to F# back in 2006). Based on the claims on Rx performance made by Matt Podwysocki I was curious to see how it stacked up:

let obsValue =
   data
   |> Observable.ofSeq
   |> Observable.filter (fun x -> x%2L = 0L)
   |> Observable.map (fun x -> x * x)
   |> Observable.sum
   |> Observable.first
// Real: 00:00:00.479, CPU: 00:00:00.468, GC gen0: 18, gen1: 0, gen2: 0

As you can see, the Observable module comes off roughly 5 times faster than Rx.

Note: I had to add some simple combinators to make this work; you can see the full snippet here: http://fssnip.net/ow

Summary

Nessos Streams look like a promising direction for performance of functional pipelines, and for gaining raw imperative performance the Nessos LinqOptimizer is impressive.

Categories: Programming

Google Play services 6.5

Android Developers Blog - Tue, 11/18/2014 - 06:46
Posted by Ian Lake, Developer Advocate

To offer more seamless integration of Google products within your app, we’re excited to start the rollout of the latest version of Google Play services.

Google Play services 6.5 includes new features in Google Maps, Google Drive and Google Wallet as well as the recently launched Google Fit API. We are also providing developers with more granular control over which Google Play services APIs your app depends on to help you maintain a lean app.

Google Maps

We’re making it easier to get directions to places from your app! The Google Maps Android API now offers a map toolbar to let users open Google Maps and immediately get directions and turn by turn navigation to the selected marker. This map toolbar will show by default when you compile against Google Play services 6.5.

In addition, there is also a new ‘lite mode’ map option, ideal for situations where you want to provide a number of smaller maps, or a map that is so small that meaningful interaction is impractical, such as a thumbnail in a list. A lite mode map is a bitmap image of a map at a specified location and zoom level.

In lite mode, markers and shapes are drawn client-side on top of the static image, so you still have full control over them. Lite mode supports all of the map types, the My Location layer, and a subset of the functionality of a fully-interactive map. Users can tap on the map to launch Google Maps when they need more details.

The Google Maps Android API also exposes a new getMapAsync(OnMapReadyCallback) method to MapFragment and MapView which will notify you exactly when the map is ready. This serves as a replacement for the now deprecated getMap() method.

We’re also exposing the Google Maps for Android app intents available to your apps including displaying the map, searching, starting turn by turn navigation, and opening Street View so you can build upon the familiar and powerful maps already available on the device.
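As a quick illustration of the kind of intent involved (the destination string is made up), the documented google.navigation URI can be fired from a shell with adb:

  # Launch turn-by-turn navigation in the Google Maps app via an intent
  adb shell am start -a android.intent.action.VIEW -d "google.navigation:q=1600+Amphitheatre+Parkway"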

Drive

You can now add both public and application private custom file properties to a Drive file which can be used to build very efficient search queries and allow apps to save information which is guaranteed to persist across editing by other apps.

We’ve also made it even easier to make syncing your files to Drive both user and battery friendly with the ability to control when files are uploaded by network type or charging status and cancel pending uploads.

Google Wallet

In addition to the existing ‘Buy with Google’ button available to quickly purchase goods & services using Google Wallet, this release adds a ‘Donate with Google’ button for providing the same ease of use in collecting donations.

Google Fit

The Google Fit SDK was recently officially released as part of Google Play services and can be used to super-charge your fitness apps with a simple API for working with sensors, recording activity data, and reading the user’s aggregated fitness data.

In this release, we’ve made it easier for developers to add activity segments (predefined time periods of running, walking, cycling, etc) when inserting sessions, making it easy to support pauses or multiple activity type workouts. We’ll also be adding additional samples to help kick-start your Google Fit integration.

Granular Dependency Management

As we’ve continued to add more APIs across the wide range of Google services, it can be hard to maintain a lean app, particularly if you're only using a portion of the available APIs. Now with Google Play services 6.5, you’ll be able to depend only on a minimal common library and the exact APIs your app needs. This makes it very lightweight to get started with Google Play services.

SDK Coming Soon!

We’ll be rolling out Google Play services 6.5 over the next few days, and we’ll update this blog post, publish the documentation, and make the SDK available once the rollout completes.

To learn more about Google Play services and the APIs available to you through it, visit the Google Play Services section on the Android Developer site.

Categories: Programming

SAFe Agile Release Planning: Event Basics


The release planning event in the Scaled Agile Framework Enterprise (SAFe) is a two-day program-level planning event that focuses the efforts of a group of teams to pursue a common vision and mission. As we have noted, the event includes participation by everyone involved in the Agile release train (ART), takes place in person (if humanly possible), occurs every 8 – 12 weeks and has a formal structured agenda. The agenda has five major components:

  1. Synchronization.  This component seeks to get everyone involved in the ART on the same page in terms of:
    1. Business Context
    2. Product Vision
    3. Architectural Vision
    4. Technical Environment and Standards

Each of these subcomponents provides the team with an understanding of what they are being asked to do and the environment they are being asked to operate within. The flow begins by grounding the team in the business context for the program increment (the 8 – 12 weeks). Each step after the business context increases in technical detail.

  2. Draft Planning. Leveraging the context, vision, architecture and environmental standards, the teams develop a draft plan. We will explore the process in greater detail in a later essay; however, this is where the teams break down the stories they will tackle during the program increment, expose risks, and identify and minimize dependencies. Draft planning usually consumes the second half of the day. The Release Train Engineer will gather the scrum masters together periodically during the draft planning process to ensure teams are sharing dependencies and understand the direction each is heading.
  3. Management Review and Problem Solving. At the end of draft planning, any significant problems with the scope of the program increment, staffing or other resource constraints are generally apparent. After the majority of teams have completed their work for the day, the management team (including the RTE and scrum masters) meets to share what was learned and make the decisions needed to adjust to the constraints. This must be completed before the beginning of day two.
  4. Final Planning. The teams review the adjustments made during the management review the previous evening as a whole group and then break out into teams again to continue planning, converting the draft plans into final(ish) plans. Plans are finalized when they are accepted by the business owners.
  5. Review, Re-planning and Acceptance. When the teams’ plans are finalized they are reviewed by the whole team, the risks are ROAMed, the whole team votes on acceptance (a form of peer review and acceptance), any rework is performed on the plan and finally a retrospective is performed and next steps identified.

The release planning meeting operationalizes a program increment. A program increment represents an 8 – 12 week planning horizon within a larger Agile Release Train. The large scale planning event helps keep all of the teams involved in the ART synchronized. The release planning meeting might be SAFe’s special sauce.


Categories: Process Management

Service discovery with consul and consul-template

Agile Testing - Grig Gheorghiu - Tue, 11/18/2014 - 00:20
I talked in the past about an "Ops Design Pattern: local haproxy talking to service layer". I described how we used a local haproxy on pretty much all nodes at a given layer of our infrastructure (webapp, API, e-commerce) to talk to services offered by the layer below it. So each webapp server has a local haproxy that talks to all API nodes it sends requests to. Similarly, each API node has a local haproxy that talks to all e-commerce nodes it needs info from.

This seemed like a good idea at a time, but it turns out it has a couple of annoying drawbacks:
  • each local haproxy runs health checks against N nodes, so if you have M nodes running haproxy, each of the N nodes will receive M health checks; if M and N are large, then you have a health check storm on your hands
  • to take a node out of a cluster at any given layer, we tag it as 'inactive' in Chef, then run chef-client on all nodes that run haproxy and talk to the inactive node at layers above it; this gets old pretty fast, especially when you're doing anything that might conflict with Chef and that the chef-client run might overwrite (I know, I know, you're not supposed to do anything of that nature, but we are all human :-)
For the second point, we are experimenting with haproxyctl so that we don't have to run chef-client on every node running haproxy. But it still feels like a heavy-handed approach.
If I were to do this again (which I might), I would still have an haproxy instance in front of our webapp servers, but for communicating from one layer of services to another I would use a proper service discovery tool such as grampa Apache ZooKeeper or the newer kids on the block, etcd from CoreOS and consul from HashiCorp.

I settled on consul for now, so in this post I am going to show how you can use consul in conjunction with the recently released consul-template to discover services and to automate configuration changes. At the same time, I wanted to experiment a bit with Ansible as a configuration management tool. So the steps I'll describe were actually automated with Ansible, but I'll leave that for another blog post.

The scenario I am going to describe involves 2 haproxy instances, each pointing to 2 Wordpress servers running Apache, PHP and MySQL, with Varnish fronting the Wordpress application. One of the 2 Wordpress servers is considered primary as far as haproxy is concerned, and the other one is a backup server, which will only get requests if the primary server is down. All servers are running Ubuntu 12.04.

Install and run the consul agent on all nodes

The agent will start in server mode on the 2 haproxy nodes, and in agent mode on the 2 Wordpress nodes.

I first deployed consul to the 2 haproxy nodes. I used a modified version of the ansible-consul role from jivesoftware. The configuration file /etc/consul.conf for the first server (lb1) is:

{
  "domain": "consul.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "node_name": "lb1",
  "server": true,
  "bind_addr": "10.0.0.1",
  "datacenter": "us-west-1b",
  "bootstrap": true,
  "rejoin_after_leave": true
}

(and similar for lb2, with only node_name and bind_addr changed to lb2 and 10.0.0.2 respectively)

The ansible-consul role also creates a consul user and group, and an upstart configuration file like this:

# cat /etc/init/consul.conf

# Consul Agent (Upstart unit)
description "Consul Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec sudo -u consul -g consul /opt/consul/bin/consul agent -config-dir /etc/consul.d -config-file=/etc/consul.conf >> /var/log/consul 2>&1
respawn
respawn limit 10 10
kill timeout 10

To start/stop consul, I use:

# start consul
# stop consul

Note that "server" is set to true and "bootstrap" is also set to true, which means that each consul server will be the leader of a cluster with 1 member, itself. To join the 2 servers into a consul cluster, I did the following:
  • join lb1 to lb2: on lb1 run consul join 10.0.0.2
  • tail /var/log/consul on lb1, note messages complaining about both consul servers (lb1 and lb2) running in bootstrap mode
  • stop consul on lb1: stop consul
  • edit /etc/consul.conf on lb1 and set "bootstrap": false
  • start consul on lb1: start consul
  • tail /var/log/consul on both lb1 and lb2; it should show no more errors
  • run consul info on both lb1 and lb2; the output should show server=true on both nodes, but leader=true only on lb2
Next I ran the consul agent in regular non-server mode on the 2 Wordpress nodes. The configuration file /etc/consul.conf on node wordpress1 was:
{  "domain": "consul.",  "data_dir": "/opt/consul/data",  "log_level": "INFO",  "node_name": "wordpress1",  "server": false,  "bind_addr": "10.0.1.1",  "datacenter": "us-west-1b",  "rejoin_after_leave": true}
(and similar for wordpress2, with the node_name set to wordpress2 and bind_addr set to 10.0.1.2)
After starting up the agents via upstart, I joined them to lb2 (although they could be joined to any of the existing members of the cluster). I ran this on both wordpress1 and wordpress2:
# consul join 10.0.0.2
At this point, running consul members on any of the 4 nodes should show all 4 members of the cluster:
Node          Address         Status  Type    Build  Protocol
lb1           10.0.0.1:8301   alive   server  0.4.0  2
wordpress2    10.0.1.2:8301   alive   client  0.4.0  2
lb2           10.0.0.2:8301   alive   server  0.4.0  2
wordpress1    10.0.1.1:8301   alive   client  0.4.0  2
Install and run dnsmasq on all nodes
The ansible-consul role does this for you. Consul piggybacks on DNS resolution for service naming, and by default the domain names internal to Consul start with consul. In my case they are configured in consul.conf via "domain": "consul."
The dnsmasq configuration file for consul is:
# cat /etc/dnsmasq.d/10-consul
server=/consul./127.0.0.1#8600
This causes dnsmasq to provide DNS resolution for domain names starting with consul. by querying a DNS server on 127.0.0.1 running on port 8600 (which is the port the local consul agent listens on to provide DNS resolution).
To start/stop dnsmasq, use: service dnsmasq start | stop.
Now that dnsmasq is running, you can look up names that end in .node.consul from any member node of the consul cluster (there are 4 member nodes in my cluster, 2 servers and 2 agents). For example, I ran this on lb2:
$ dig wordpress1.node.consul
; <<>> DiG 9.8.1-P1 <<>> wordpress1.node.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2511
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;wordpress1.node.consul.     IN   A

;; ANSWER SECTION:
wordpress1.node.consul. 0    IN   A    10.0.1.1

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Nov 14 00:09:16 2014
;; MSG SIZE  rcvd: 76
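
To rule dnsmasq in or out when debugging name resolution, you can also query the consul agent's DNS endpoint directly on its default port 8600 (given the setup above, this should return the same answer):

$ dig @127.0.0.1 -p 8600 wordpress1.node.consul +short
10.0.1.1
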
Configure services and checks on consul agent nodes

Internal DNS resolution within the .consul domain becomes even more useful when nodes define services and checks. For example, the 2 Wordpress nodes run varnish and apache (on port 80 and port 443) so we can define 3 services as JSON files in /etc/consul.d. On wordpress1, which is our active/primary node in haproxy, I defined these services:

$ cat http_service.json
{
    "service": {
        "name": "http",
        "tags": ["primary"],
        "port":80,
        "check": {
                "id": "http_check",
                "name": "HTTP Health Check",
  "script": "curl -H 'Host=www.mydomain.com' http://localhost",
        "interval": "5s"
        }
    }
}

$ cat ssl_service.json
{
    "service": {
        "name": "ssl",
        "tags": ["primary"],
        "port":443,
        "check": {
                "id": "ssl_check",
                "name": "SSL Health Check",
  "script": "curl -k -H 'Host=www.mydomain.com' https://localhost:443",
        "interval": "5s"
        }
    }
}

$ cat varnish_service.json
{
    "service": {
        "name": "varnish",
        "tags": ["primary"],
        "port":6081 ,
        "check": {
                "id": "varnish_check",
                "name": "Varnish Health Check",
  "script": "curl http://localhost:6081",
        "interval": "5s"
        }
    }
}

Each service we defined has a name, a port and a check with its own ID, name, script that runs whenever the check is executed, and an interval that specifies how often the check is run. In the examples above I specified simple curl commands against the ports that these services are running on. Note also that each service has a list of tags associated with it. In my case, the services on wordpress1 have the tag "primary". The services defined on wordpress2 are identical to the ones on wordpress1 with the only difference being the tag, which on wordpress2 is "backup".

After restarting consul on wordpress1 and wordpress2, the following service-related DNS names are available for resolution on all nodes in the consul cluster (I am going to include only relevant portions of the dig output):

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.1
varnish.service.consul. 0 IN A 10.0.1.2

This name resolves in DNS round-robin fashion to the IP addresses of all nodes that are running the varnish service, regardless of their tags and regardless of the data centers that their nodes run in. In our case, it resolves to the IP addresses of wordpress1 and wordpress2.

Note that the IP address of a given node only appears in the DNS result set if the service running on that node has a passing health check. If the check fails, then consul's DNS service will not include the IP of the node in the result set. This is very important for the dynamic discovery of healthy services.
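
The same health information is available over consul's HTTP API when you want more detail than DNS can convey. For example, the health endpoint lists each node providing a service together with the state of its checks (the ?passing filter, where supported, restricts the output to nodes whose checks are currently green):

$ curl http://localhost:8500/v1/health/service/varnish?passing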

$ dig varnish.service.us-west-1b.consul

;; ANSWER SECTION:
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.2
varnish.service.us-west-1b.consul. 0 IN A 10.0.1.1

If we include the data center (in our case us-west-1b) in the DNS name we query, then only the services running on nodes in that data center will be returned in the result set. In our case though, all nodes run in the us-west-1b data center, so this query returns, like the previous one, the IP addresses of wordpress1 and wordpress2. Note that the IPs can be returned in any order, because of DNS round-robin. In this case the IP of wordpress2 was first.

$ dig SRV varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress1.node.us-west-1b.consul.
varnish.service.consul. 0 IN SRV 1 1 6081 wordpress2.node.us-west-1b.consul.

;; ADDITIONAL SECTION:
wordpress1.node.us-west-1b.consul. 0 IN A 10.0.1.1
wordpress2.node.us-west-1b.consul. 0 IN A 10.0.1.2

A useful feature of the consul DNS service is that it returns the port number that a given service runs on when queried for an SRV record. So this query returns the names and IPs of the nodes that the varnish service runs on, as well as the port number, which in this case is 6081. The application querying for the SRV record needs to interpret this extra piece of information, but this is very useful for the discovery of internal services that might run on non-standard port numbers.
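The port and target can be pulled straight out of the SRV answer with dig's +short output, whose fields are priority, weight, port and target. A quick sketch (assuming the cluster state above):

$ dig +short SRV varnish.service.consul
1 1 6081 wordpress1.node.us-west-1b.consul.
1 1 6081 wordpress2.node.us-west-1b.consul.

$ dig +short SRV varnish.service.consul | awk '{sub(/\.$/,"",$4); print $4 ":" $3}'
wordpress1.node.us-west-1b.consul:6081
wordpress2.node.us-west-1b.consul:6081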

$ dig primary.varnish.service.consul

;; ANSWER SECTION:
primary.varnish.service.consul. 0 IN A 10.0.1.1

$ dig backup.varnish.service.consul

;; ANSWER SECTION:
backup.varnish.service.consul. 0 IN A 10.0.1.2

The 2 DNS queries above show that it's possible to query a service by its tag, in our case 'primary' vs. 'backup'. The result set will contain the IP addresses of the nodes tagged with the specific tag and running the specific service we asked for. This feature will prove useful when dealing with consul-template in haproxy, as I'll show later in this post.

Load balance across services

It's easy now to see how an application can take advantage of the internal DNS service provided by consul and load balance across services. For example, an application that needs to load balance across the 2 varnish services on wordpress1 and wordpress2 would use varnish.service.consul as the DNS name it talks to when it needs to hit varnish. Every time this DNS name is resolved, a random node from wordpress1 and wordpress2 is returned via the DNS round-robin mechanism. If varnish were to run on a non-standard port number, the application would need to issue a DNS request for the SRV record in order to obtain the port number as well as the IP address to hit.

Note that this method of load balancing has health checks built in. If the varnish health check fails on one of the nodes providing the varnish service, that node's IP address will not be included in the DNS result set returned by the DNS query for that service.

Also note that the DNS query can be customized for the needs of the application, which can query for a specific data center, or a specific tag, as I showed in the examples above.
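Putting these pieces together, here is a minimal sketch of what the client side could look like (hit_varnish.sh is a hypothetical helper, not part of my setup; it assumes bash and dig are available):

$ cat hit_varnish.sh
#!/bin/bash
# Resolve a healthy varnish node and its port via the SRV record,
# then issue a request against it. DNS round-robin plus consul's
# health checks take care of load balancing and failover.
read -r _ _ PORT TARGET < <(dig +short SRV varnish.service.consul | head -1)
curl "http://${TARGET%.}:${PORT}/"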

Force a node out of service

I am still looking for the best way to take nodes in and out of service for maintenance or other purposes. One way I found so far is to deregister a given service via the Consul HTTP API. Here is an example of a curl command that accomplishes that, executed on node wordpress1:

$ curl -v http://localhost:8500/v1/agent/service/deregister/varnish
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/agent/service/deregister/varnish HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 17 Nov 2014 19:01:06 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
* Closing connection #0

The effect of this command is that the varnish service on node wordpress1 is 'deregistered', which for my purposes means 'marked as down'. DNS queries for varnish.service.consul will only return the IP address of wordpress2:

$ dig varnish.service.consul

;; ANSWER SECTION:
varnish.service.consul. 0 IN A 10.0.1.2

We can also use the Consul HTTP API to verify that the varnish service does not appear in the list of active services on node wordpress1. We'll use the /agent/services API call and we'll save the output to a file called services.out, then we'll use the jq tool to pretty-print the output:

$ curl -v http://localhost:8500/v1/agent/services -o services.out
$ jq . <<< `cat services.out`
{
  "http": {
    "ID": "http",
    "Service": "http",
    "Tags": [
      "primary"
    ],
    "Port": 80
  },
  "ssl": {
    "ID": "ssl",
    "Service": "ssl",
    "Tags": [
      "primary"
    ],
    "Port": 443
  }
}
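For a quick list of just the service names, jq's keys filter works too:

$ jq -c 'keys' services.out
["http","ssl"]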
Note that only the http and ssl services are shown.

Force a node back in service

Again, I am still looking for the best way to mark a service as 'up' once it has been marked as 'down'. One way is to register the service via the Consul HTTP API, which requires sending the service's JSON definition as the request body. Another way is to simply restart the consul agent on the node in question, which re-registers any service that was previously deregistered.
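Here is a sketch of the first approach (the payload below is my rendering of the agent API format of the time; note that the register endpoint takes the inner service object with capitalized keys, not the config-file wrapper used in /etc/consul.d):

$ curl -X PUT http://localhost:8500/v1/agent/service/register -d '{
    "ID": "varnish",
    "Name": "varnish",
    "Tags": ["primary"],
    "Port": 6081,
    "Check": {
        "Script": "curl http://localhost:6081",
        "Interval": "5s"
    }
}'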

Install and configure consul-template

For the next few steps, I am going to show how to use consul-template in conjunction with consul to discover services and configure haproxy based on the discovered services.

I automated the installation and configuration of consul-template via an Ansible role that I put on Github, but I am going to discuss the main steps here. See also the instructions on the consul-template Github page.

In my Ansible role, I copy the consul-template binary to the target node (in my case the 2 haproxy nodes lb1 and lb2), then create a directory structure /opt/consul-template/{bin,config,templates}. The consul-template configuration file is /opt/consul-template/config/consul-template.cfg and it looks like this in my case:

$ cat config/consul-template.cfg
consul = "127.0.0.1:8500"

template {
  source = "/opt/consul-template/templates/haproxy.ctmpl"
  destination = "/etc/haproxy/haproxy.cfg"
  command = "service haproxy restart"
}

Note that consul-template needs to be able to talk to a consul agent, which in my case is the local agent listening on port 8500. The template that consul-template maintains is defined in another file, /opt/consul-template/templates/haproxy.ctmpl. consul-template watches the services and keys referenced in that template; upon any change, it renders a new file from the template and copies it to the destination, which in my case is the haproxy config file /etc/haproxy/haproxy.cfg. Finally, consul-template executes a command, which in my case restarts the haproxy service.
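Before wiring in the haproxy restart, it can be useful to render the template once to standard output without touching the destination file or running the command; consul-template's -dry and -once flags (assuming a build recent enough to have them) do exactly that:

$ /opt/consul-template/bin/consul-template \
    -config=/opt/consul-template/config/consul-template.cfg \
    -dry -once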

Here is the actual template file for my haproxy config, which is written in the Go template format:

$ cat /opt/consul-template/templates/haproxy.ctmpl

global
  log 127.0.0.1   local0
  maxconn 4096
  user haproxy
  group haproxy

defaults
  log     global
  mode    http
  option  dontlognull
  retries 3
  option redispatch
  timeout connect 5s
  timeout client 50s
  timeout server 50s
  balance  roundrobin

# Set up application listeners here.

frontend http
  maxconn {{key "service/haproxy/maxconn"}}
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
{{range service "primary.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.varnish"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}

frontend https
  maxconn            {{key "service/haproxy/maxconn"}}
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
{{range service "primary.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} weight 1 check port {{.Port}}
{{end}}
{{range service "backup.ssl"}}
    server {{.Node}} {{.Address}}:{{.Port}} backup weight 1 check port {{.Port}}
{{end}}


To the trained eye, this looks like a regular haproxy configuration file, with the exception of the portions between double curly braces. These are Go template snippets which rely on a couple of template functions exposed by consul-template above and beyond what the Go templating language offers. Specifically, the key function queries a key stored in the Consul key/value store and outputs the value associated with that key (or an empty string if the value doesn't exist). The service function queries a consul service by its DNS name and returns a result set used inside the range statement. The variables inside the result set can be inspected for properties such as Node, Address and Port, which correspond to the Consul service node name, IP address and port number for that particular service.

In my example above, I use the value of the key service/haproxy/maxconn as the value of maxconn. In the http-varnish backend, I used 2 sets of services names, primary.varnish and backup.varnish, because I wanted to differentiate in haproxy.cfg between the primary server (wordpress1 in my case) and the backup server (wordpress2). In the ssl backend, I did the same but with the ssl service.
Everything so far would work fine with the exception of the key/value pair represented by the key service/haproxy/maxconn. To define that pair, I used the Consul key/value store API (this can be run on any member of the Consul cluster):

$ cat set_haproxy_maxconn.sh
#!/bin/bash
MAXCONN=4000
curl -X PUT -d "$MAXCONN" http://localhost:8500/v1/kv/service/haproxy/maxconn

To verify that the value was set, I used:

$ cat query_consul_kv.sh
#!/bin/bash
curl -v http://localhost:8500/v1/kv/?recurse

$ ./query_consul_kv.sh
* About to connect() to localhost port 8500 (#0)
*   Trying 127.0.0.1... connected
> GET /v1/kv/?recurse HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:8500
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 30563
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Mon, 17 Nov 2014 23:01:07 GMT
< Content-Length: 118
<
* Connection #0 to host localhost left intact
* Closing connection #0
[{"CreateIndex":10995,"ModifyIndex":30563,"LockIndex":0,"Key":"service/haproxy/maxconn","Flags":0,"Value":"NDAwMA=="}]
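Note that the Value field is base64-encoded. To decode it (assuming GNU coreutils):

$ echo NDAwMA== | base64 -d
4000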
At this point, everything is ready for starting up the consul-template service. On Ubuntu, I did it via this Upstart configuration file:

# cat /etc/init/consul-template.conf
# Consul Template (Upstart unit)
description "Consul Template"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

exec /opt/consul-template/bin/consul-template -config=/opt/consul-template/config/consul-template.cfg >> /var/log/consul-template 2>&1

respawn
respawn limit 10 10
kill timeout 10

# start consul-template
Once consul-template starts, it will perform the actions corresponding to the functions defined in the template file /opt/consul-template/templates/haproxy.ctmpl. In my case, it will query Consul for the value of the key service/haproxy/maxconn and for information about the varnish and ssl services. It will then save the generated file to /etc/haproxy/haproxy.cfg and restart the haproxy service. The relevant snippets from haproxy.cfg are:
frontend http
  maxconn 4000
  bind 0.0.0.0:80
  default_backend servers-http-varnish

backend servers-http-varnish
  balance            roundrobin
  option httpchk GET /
  option  httplog
    server wordpress1 10.0.1.1:6081 weight 1 check port 6081
    server wordpress2 10.0.1.2:6081 backup weight 1 check port 6081

and

frontend https
  maxconn            4000
  mode               tcp
  bind               0.0.0.0:443
  default_backend    servers-https

backend servers-https
  mode               tcp
  option             tcplog
  balance            roundrobin
    server wordpress1 10.0.1.1:443 weight 1 check port 443
    server wordpress2 10.0.1.2:443 backup weight 1 check port 443
I've been running this as a test on lb2. I don't consider my setup quite production-ready because I don't have monitoring in place, and I also want to experiment with consul security tokens for better security. But this is a pattern that I think will work.





Aeron: Do we really need another messaging system?

Do we really need another messaging system? We might if it promises to move millions of messages a second, at small microsecond latencies between machines, with consistent response times, to large numbers of clients, using an innovative design.  

And that’s the promise of Aeron (the Celtic god of battle, not the chair, though tell that to the search engines), a new high-performance open source message transport library from the team of Todd Montgomery, a multicast and reliable protocol expert; Richard Warburton, an expert on compiler optimizations; and Martin Thompson, the pasty-faced performance gangster.

The claims: Aeron is already beating the best products out there on throughput, and its latency matches the best commercial products up to the 90th percentile. Aeron can push small 40-byte messages at 6 million messages a second, which is a very difficult case.

Here’s a talk Martin gave on Aeron at Strangeloop: Aeron: Open-source high-performance messaging. I’ll give a gloss of his talk as well as integrate the sources of information listed at the end of this article.

Martin and his team were in the enviable position of having a client that required a product like Aeron and was willing both to finance its development and to make it open source. So go git Aeron on GitHub. Note, it’s early days for Aeron and they are still in the heavy optimization phase.

The world has changed, therefore endpoints need to scale as never before. This is why Martin says we need a new messaging system. It’s now a multi-everything world. We have multi-core, multi-socket, multi-cloud, multi-billion-user computing, where communication is happening all the time. When huge numbers of consumers regularly pound a channel to read from the same publisher, lock contention and queueing effects build up, throughput drops, and latency spikes.

What’s needed is a new messaging library to make the most of this new world. The move to microservices only heightens the need:

As we move to a world of micro services then we need very low and predictable latency from our communications otherwise the coherence component of USL will come to rain fire and brimstone on our designs.

With Aeron the goal is to keep things pure and focused. The benchmarking we have done so far suggests a step forward in throughput and latency. What is quite unique is that you do not have to choose between throughput and latency. With other high-end messaging transports this is a distinct choice. The algorithms employed by Aeron give maximum throughput while minimising latency up until saturation.

“Many messaging products are a Swiss Army knife; Aeron is a scalpel,” says Martin, which is a good way to understand Aeron. It’s not a full featured messaging product in the way you may be used to, like Kafka. Aeron does not persist messages, it doesn’t support guaranteed delivery, nor clustering, nor does it support topics. Aeron won’t know if a client has crashed and be able to sync it back up from history or initialize a new client from history. 

The best way to place Aeron in your mental matrix might be as a message oriented replacement for TCP, with higher level services written on top. Todd Montgomery expands on this idea:

Aeron being an ISO layer 4 protocol provides a number of things that messaging systems can't and also doesn't provide several things that some messaging systems do.... if that makes any sense. Let me explain slightly more wrt all typical messaging systems (not just Kafka and 0MQ). 

One way to think more about where Aeron fits is TCP, but with the option of reliable multicast delivery. However, that is a little limited in that Aeron also, by design, has a number of possible uses that go well beyond what TCP can do. Here are a few things to consider: 

Todd continues on with more detail, so please keep reading the article to see more on the subject.

At its core Aeron is a replicated persistent log of messages. And through a very conscious design process messages are wait-free and zero-copy along the entire path from publication to reception. This means latency is very good and very predictable.

That sums up Aeron in a nutshell. It was created by an experienced team, using solid design principles sharpened on many previous projects, backed by techniques not everyone has in their tool chest. Every aspect has been well thought out to be clean, simple, highly performant, and highly concurrent.

If simplicity is indistinguishable from cleverness, then there’s a lot of cleverness going on in Aeron. Let’s see how they did it...

Categories: Architecture

How To Build a Roadmap for Your Digital Business Transformation

Let’s say you want to take your business to the Cloud -- how do you do it?

If you’re a small shop or a startup, it might be easy to just swipe your credit card and get going.

If, on the other hand, you’re a larger business that wants to start your journey to the Cloud, with a lot of investments and people that you need to bring along, you need a roadmap.

The roadmap will help you deal with setbacks, create confidence in the path, and help ensure that you can get from point A to point B (and that you know what point B actually is).  By building an implementable roadmap for your business transformation, you can also build a coalition of the willing to help you get there faster.  And you can design your roadmap so that your journey delivers continuous business value along the way.

In the book, Leading Digital: Turning Technology into Business Transformation, George Westerman, Didier Bonnet, and Andrew McAfee, share how top leaders build better roadmaps for their digital business transformation.

Why You Need to Build a Roadmap for Your Digital Transformation

If you had infinite time and resources, maybe you could just wing it, and hope for the best.   A better approach is to have a roadmap as a baseline.  Even if your roadmap changes, at least you can share the path with others in your organization and get them on board to help make it happen.

Via Leading Digital:

“In a perfect world, your digital transformation would deliver an unmatched customer experience, enjoy the industry's most effective operations, and spawn innovative, new business models.  There are a myriad of opportunities for digital technology to improve your business and no company can entertain them all at once.  The reality of limited resources, limited attention spans, and limited capacity for change will force focused choices.  This is the aim of your roadmap.”

Find Your Entry Point

Your best starting point is a business capability that you want to exploit.

Via Leading Digital:

“Many companies have come to realize that before they can create a wholesale change within their organization, they have to find an entry point that will begin shifting the needle.  How? They start by building a roadmap that leverages existing assets and capabilities.  Burberry, for example, enjoyed a globally recognized brand and a fleet of flagship retail locations around the world.  The company started by revitalizing its brand and customer experience in stores and online.  Others, like Codelco, began with the core operational processes of their business.  Caesars Entertainment combined strong capabilities in analytics with a culture of customer service to deliver a highly personalized guest experience.  There is no single right way to start your digital transformation.  What matters is that you find the existing capability--your sweet spot--that will get your company off the starting blocks.

Once your initial focus is clear, you can start designing your transformation roadmap.  Which investments and activities are necessary to close the gap to your vision?  What is predictable, and what isn't? What is the timing and scheduling of each initiative? What are the dependencies between them?  What organizational resources, such as analytics skills, are required?”

Engage Practitioners Early in the Design

If you involve others in your roadmap, you get their buy-in, and they will help you with your business transformation.

Via Leading Digital:

“Designing your roadmap will require input from a broad set of stakeholders.  Rather than limit the discussion to the top team, engage the operational specialists who bring an on-the-ground perspective.  This will minimize the traditional vision-to-execution gap.  You can crowd-source the design.  Or, you can use facilitated workshops, such as 'digital days,' as an effective way to capture and distill the priorities and information you will need to consider.  We've seen several Digital Masters do both.

Make no mistake; designing your roadmap will take time, effort, and multiple iterations.  But you will find it a valuable exercise.  It forces agreement on priorities and helps align senior management and the people tasked to execute the program.  Your roadmap will become more than just a document.  If executed well, it can be the canvas of the transformation itself.  Because your roadmap is a living document, it will evolve as your implementation progresses.”

Design for Business Outcome, Not Technology

When you create your roadmap, focus on the business outcomes.   Think in terms of adding incremental business capabilities.   Don’t make it a big bang thing.   Instead, start small, but iterate on building business capabilities that take advantage of Cloud, Mobile, Social, and Big Data technologies.

Via Leading Digital:

“Technology for its own sake is a common trap.  Don't build your roadmap as a series of technology projects.  Technology is only part of the story in digital transformation and often the least challenging one.  For example, the major hurdles for Enterprise 2.0 platforms are not technical.  Deploying the platform is relatively straightforward, and today's solutions are mature.  The challenge lies in changing user behavior--encouraging adoption and sustaining engagement in the activities the platform is meant to enable.

“Express your transformation roadmap in terms of business outcomes.  For example, 'Establish a 360-degree understanding of our customers.'  Build into your roadmap the many facets of organizational change that your transformation will require: customer experiences, operational processes, employee ways of working, organization, culture, communication--the list goes on.  This is why contribution from a wide variety of stakeholders is so critical.”

There are lots of ways to build a roadmap, but the best thing you can do is put something down on paper so that you can share the path with other people and start getting feedback and buy-in.

You’ll be surprised: when you show business and IT leaders a roadmap, it helps turn strategy into execution and makes things real in people’s minds.

You Might Also Like

10 High-Value Activities in the Enterprise

Cloud Changes the Game from Deployment to Adoption

Drive Business Transformation by Reenvisioning Operations

Drive Business Transformation by Reenvisioning Your Customer Experience

Dual-Speed IT Drives Business Transformation and Improves IT-Business Relationships

How To Build a Better Business Case for Digital Initiatives

How To Improve the IT-Business Relationship

How Leaders are Building Digital Skills

Management Innovation is at the Top of the Innovation Stack

Categories: Architecture, Programming