Software Development Blogs: Programming, Software Testing, Agile, Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Feed aggregator

Defining Empathy

Empathy is the ability to place yourself in another person’s shoes.

Coaching is a key tool to help individuals and teams reach peak performance. One of the key attributes of a good coach is empathy. I was recently drawn into a discussion about the role of empathy in coaching. Critical to understanding the role that empathy plays in coaching is understanding the definition of empathy. One of the most difficult parts of the discussion was agreeing on a common definition.

One of the first definitions proffered during our discussion was that empathy represents the ability to understand others’ feelings and emotions. On reflection we decided that this definition only goes part of the way. Empathy is a deeper concept that subsumes mere understanding, requiring a cognitive connection between the coach and coachee. Both parties must understand not only the emotion or behavior, but also the basis for the emotion or behavior, without taking on or accepting the emotion. In the end we arrived at a more precise definition: empathy is the capacity to understand or feel what another person is experiencing from their frame of reference.  Put more succinctly, as Chris Nurre noted on SPaMCAST 362, it is the ability to put yourself in another person’s shoes.

The ability to put yourself in another person’s shoes is critical for a good coach because it helps the coach to accept the coachee on his or her own terms.  Empathy is useful to:

  • Help to build trust.
  • Build a pathway that enables communication.
  • Disarm the coachee’s self-interest.
  • Facilitate the development of altruism.
  • Encourage a bigger-picture point of view.

Empathy is also a valuable commodity at the team level.  Empathy between team members allows teams to create bonds of trust in the same manner as between individuals. Without empathy between members, teams will not be cohesive and will pursue individual self-interest.  A coach must be able to put themselves into the shoes of the team and the team members they interact with in order to help those they are coaching build self-knowledge.  At the same time, team members must have enough empathy for their fellow team members to hold the team together so it can pursue a single goal.

The next few entries will include discussions of:

  1. Attributes and techniques of an empathic coach.
  2. Learning empathy.
  3. Too much empathy.

Categories: Process Management

Sponsored Post: StatusPage.io, iStreamPlanet, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?
  • Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • As a Networking & Systems Software Engineer at iStreamPlanet you’ll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

  • As a Scalable Storage Software Engineer at iStreamPlanet you’ll be driving the design and implementation of numerous storage systems including software services, analytics and video archival. Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.
Fun and Informative Events
  • Your event could be here. How cool is that?
Cool Products and Services
  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a native .NET in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Categories: Architecture

jq: Cannot iterate over number / string and number cannot be added

Mark Needham - Tue, 11/24/2015 - 01:12

In my continued parsing of meetup.com’s JSON API I wanted to extract some information from the following JSON file:

$ head -n40 data/members/18313232.json
[
  {
    "status": "active",
    "city": "London",
    "name": ". .",
    "other_services": {},
    "country": "gb",
    "topics": [],
    "lon": -0.13,
    "joined": 1438866605000,
    "id": 92951932,
    "state": "17",
    "link": "http://www.meetup.com/members/92951932",
    "photo": {
      "thumb_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/thumb_250896203.jpeg",
      "photo_id": 250896203,
      "highres_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/highres_250896203.jpeg",
      "photo_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/member_250896203.jpeg"
    },
    "lat": 51.49,
    "visited": 1446745707000,
    "self": {
      "common": {}
    }
  },
  {
    "status": "active",
    "city": "London",
    "name": "Abdelkader Idryssy",
    "other_services": {},
    "country": "gb",
    "topics": [
      {
        "name": "Weekend Adventures",
        "urlkey": "weekend-adventures",
        "id": 16438
      },
      {
        "name": "Community Building",
        "urlkey": "community-building",

In particular I want to extract the member’s id, name, join date and the ids of topics they’re interested in. I started with the following jq query to try and extract those attributes:

$ jq -r '.[] | [.id, .name, .joined, (.topics[] | .id | join(";"))] | @csv' data/members/18313232.json
Cannot iterate over number (16438)

Annoyingly this treats topic ids on an individual basis rather than as an array as I wanted. I tweaked the query to the following with no luck:

$ jq -r '.[] | [.id, .name, .joined, (.topics[].id | join(";"))] | @csv' data/members/18313232.json
Cannot iterate over number (16438)

As a guess I decided to wrap ‘.topics[].id’ in an array literal to see if it had any impact:

$ jq -r '.[] | [.id, .name, .joined, ([.topics[].id] | join(";"))] | @csv' data/members/18313232.json
92951932,". .",1438866605000,""
jq: error (at data/members/18313232.json:60013): string ("") and number (16438) cannot be added

Woot! A different error message at least, and this one seems to be due to a type mismatch between the string we want to end up with and the array of numbers that we currently have.

We can cast our way to victory with the ‘tostring’ function:

$ jq -r '.[] | [.id, .name, .joined, ([.topics[].id | tostring] | join(";"))] | @csv' data/members/18313232.json
92951932,". .",1438866605000,""
193866304,"Abdelkader Idryssy",1445195325000,"16438;20727;15401;9760;20246;20923;3336;2767;242;58259;4417;1789;10454;20274;10232;563;25375;16433;15187;17635;26273;21808;933;7789;23884;16212;144477;15322;21067;3833;108403;20221;1201;182;15083;9696;4377;15360;18296;15121;17703;10161;1322;3880;18333;3485;15585;44584;18692;21681"
28643052,"Abhishek Chanda",1439688955000,"646052;520302;15167;563;65735;537492;646072;537502;24959;1025832;8599;31197;24410;26118;10579;1064;189;48471;16216;18062;33089;107633;46831;20479;1423042;86258;21441;3833;21681;188;9696;58162;20398;113032;18060;29971;55324;30928;15261;58259;638;16475;27591;10107;242;109595;10470;26384;72514;1461192"
39523062,"Adam Kinder-Jones",1438677474000,"70576;21549;3833;42277;164111;21522;93380;48471;15167;189;563;25435;87614;9696;18062;58162;10579;21681;19882;108403;128595;15582;7029"
194119823,"Adam Lewis",1444867994000,"10209"
14847001,"Adam Rogers",1422917313000,""
87709042,"Adele Green",1436867381000,"15167;18062;102811;9696;30928;18060;78565;189;7029;48471;127567;10579;58162;563;3833;16216;21441;37708;209821;15401;59325;31792;21836;21900;984862;15720;17703;96823;4422;85951;87614;37428;2260;827;121802;19672;38660;84325;118991;135612;10464;1454542;17936;21549;21520;17628;148303;20398;66339;29661"
11497937,"Adrian Bridgett",1421067940000,"30372;15046;25375;638;498;933;374;27591;18062;18060;15167;10581;16438;15672;1998;1273;713;26333;15099;15117;4422;15892;242;142180;563;31197;20479;1502;131178;15018;43256;58259;1201;7319;15940;223;8652;66493;15029;18528;23274;9696;128595;21681;17558;50709;113737"
14151190,"adrian lawrence",1437142198000,"7029;78565;659;85951;15582;48471;9696;128595;563;10579;3833;101960;16137;1973;78566;206;223;21441;16216;108403;21681;186;1998;15731;17703;15043;16613;17885;53531;48375;16615;19646;62326;49954;933;22268;19243;37381;102811;30928;455;10358;73511;127567;106382;16573;36229;781;23981;1954"
183557824,"Adrien Pujol",1421882756000,"108403;563;9696;21681;188;24410;1064;32743;124668;15472;21123;1486432;1500742;87614;46831;1454542;46810;166000;126177;110474"
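
The fix generalizes: `join(";")` needs strings, so every numeric id must be cast first. For readers following along without jq installed, here is a minimal Python sketch of the same transformation (stringify each topic id, join with ";", emit CSV). The two member records are abbreviated, hypothetical stand-ins shaped like the meetup.com data above:

```python
import csv
import io
import json

# Two abbreviated records shaped like the meetup.com member JSON above
# (only the fields the jq query touches are kept).
members = json.loads("""
[
  {"id": 92951932, "name": ". .", "joined": 1438866605000, "topics": []},
  {"id": 193866304, "name": "Abdelkader Idryssy", "joined": 1445195325000,
   "topics": [{"id": 16438}, {"id": 20727}]}
]
""")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for member in members:
    # str.join() only accepts strings, hence the str() cast -- the same
    # reason jq needs `tostring` before `join(";")`.
    topic_ids = ";".join(str(topic["id"]) for topic in member["topics"])
    writer.writerow([member["id"], member["name"], member["joined"], topic_ids])
print(out.getvalue())
```

(Note that, unlike jq's @csv, Python's csv module only quotes fields that need it, so the exact quoting differs slightly from the output above.)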
Categories: Programming

Android Studio 2.0 Preview

Android Developers Blog - Tue, 11/24/2015 - 00:52

Posted by Jamal Eason, Product Manager, Android

One of the most requested features we receive is to make app builds and deployment faster in Android Studio. Today at the Android Developer Summit, we’re announcing a preview of Android Studio 2.0 featuring Instant Run that will dramatically improve your development workflow. With Android Studio 2.0, we are also including a preview of a new GPU Profiler.

All these updates are available now in the canary release channel, so we can get your feedback. Since this initial release is a preview, you may want to download and run an additional copy of Android Studio in parallel with your current version.

New Features in Android Studio 2.0
Instant Run: Faster Build & Deploy

Android Studio’s Instant Run feature allows you to quickly see your changes running on your device or emulator.

Getting started is easy. If you create a new project with Android Studio 2.0, your project is already set up. If you have a pre-existing app, open Settings/Preferences, then go to Build, Execution, Deployment → Instant Run and click Enable Instant Run... This will ensure you have the correct Gradle plugin for your project to work with Instant Run.

Enable Instant Run for Android Studio projects

Select Run as normal and Android Studio will perform normal compilation, packaging and install steps and run your app on your device or emulator. After you make edits to your source code or resources, pressing Run again will deploy your changes directly into the running app.

New Run & Stop Actions in Android Studio for Instant Run

For a more detailed guide to set up and try Instant Run, click here.

GPU Profiler

Profiling your OpenGL ES Android code is now even easier with the GPU Profiler in Android Studio. The tool is in early preview, but it is very powerful: it not only shows details about the GL state and commands, it also lets you record entire sessions and walk through the GL framebuffer and textures as your app runs OpenGL ES code.

Android Studio GPU Profiler

To get started, first download the GPU Debugging Tools package from the Android Studio SDK Manager. Click here for more details about the GPU Profiler tool and how to set up your Android app project for profiling.

What's Next

This is just a taste of some of the bigger updates in this latest release of Android Studio. We'll be going through the full release in more detail at the Android Developer Summit (livestreamed on Monday and Tuesday). Over the next few weeks, we'll be showing how to take advantage of even more features in Android Studio 2.0, so be sure to check back in.

If you're interested in more Android deep technical content, we will be streaming over 16 hours of content from the inaugural Android Developer Summit over the next two days, and together with Codelabs, all of this content will be available online after the Summit concludes.

Android Studio 2.0 is available today on the Android Studio canary channel. Let us know what you think of these new features by connecting with the Android Studio development team on Google+.

Categories: Programming

Example Mapping - Steering the conversation

Xebia Blog - Mon, 11/23/2015 - 19:14

People who are familiar with BDD and ATDD already know how useful the three amigos (product owner, tester and developer) session is for talking about what the system under development is supposed to do. But somehow these refinement sessions seem to drain the group's energy. One of the problems I see is not having a clear structure for conversations.

Example Mapping is a simple technique that can steer the conversation into breaking down any product backlog items within 30 minutes.

The Three Amigos

Example Mapping is best used in so-called Three Amigos sessions. The purpose of such a session is to create a common understanding of the requirements and a shared vocabulary across the product owner and the rest of the team. During the session the product owner presents each user story by explaining the need for change in the product. It is essential that the conversation includes multiple points of view: testers and developers identify missing requirements or edge cases, which are addressed by describing the accepted behaviour, before a feature is considered ready for development.

In order to help you steer the conversations, here is a list of guidelines for Three Amigos Sessions:

  • Empathy: Make sure the team has the capability to help each other understand the requirements. Without empathy and the room for it, you are lost.
  • Common understanding of the domain: Make sure that the team uses the same vocabulary (digital or physical) and speaks the same domain language.
  • Think big, but act small: Make sure all user stories are small and ready enough to make an impact.
  • Rules and examples: Make sure every user story explains the specification with rules and scenarios / examples.
Example mapping

Basic ingredients for Example Mapping are curiosity and a pack of post-it notes in the following colours:

  • Yellow for the user story
  • Red for questions
  • Blue for rules
  • Green for examples

Using the following steps can help you steer the conversations towards accepted behaviour of the system under development:

  1. Understanding the problem
    Let the product owner start by writing down the user story on a yellow post-it note and have him explain the need for change in the product. The product owner should help the team understand the problem.
  2. Challenge the problem by asking questions
    Once the team has understood the problem, the team challenges the problem by asking questions. Collect all the questions by writing them down starting with "What if ... " on red post-it notes. Place them on the right side of the user story (yellow) post-it note. We will treat this post-it note as a ticket for a specific and focussed discussion.
  3. Identifying rules
    The key here is to identify rules for every given answer (steered from the red question post-it notes). Extract rules from the answers and write them down on a blue post-it note. Place them below the user story (yellow) post-it note. This basically describes the acceptance criteria of a user story. Make sure that every rule can be discussed separately. The single responsibility principle and separation of concerns should be applied.
  4. Describing situations with examples
    Once you have collected all the important rules of the user story, you collect all interesting situations / examples by writing them down on a green post-it note. Place them below the rule (blue) post-it note. Make sure that the team talks about examples focussed on one single rule. Steer the discussion by asking questions like: Have we reached the boundaries of the rule? What happens when the rule fails?
An example


In the example above, the product owner requires a free shipping process. She wrote it down on a yellow post-it note. After collecting and answering questions, two rules were discussed and refined on blue post-it notes: the shopping cart limit and the appearance of the free shipping banner on product pages. All further discussion was steered towards the appropriate rule. Two examples were defined for the shopping cart limit and one for the free shipping banner, on green post-it notes. Besides steering the team into rule-based discussions, the team also gets a clear overview of the scope for the first iteration of the requirement.

Getting everyone on the same page is the key to success here. Try it a couple of times and let me know how it went.


Your New Technical Skills

One of the struggles a developer faces when moving up the ladder is how to keep their technical skills sharp.

If they are used to being a high-performing, individual contributor, and a technical go-to resource, this is especially challenging.


Because the job is different, now.

It’s no longer about how awesome your developer skills are.  Now it’s about bringing out the best from the people you manage, and hopefully *lead.*  Your job is now about creating a high-performing team.   It’s about growing more leaders.  It’s about being the oil and the glue.  The oil so that the team can work effectively, as friction-free as possible, and the glue, so that all the work connects together.

There’s a good book called What Got You Here, Won’t Get You There, by Marshall Goldsmith.  The book title sort of says it all, but the big idea is that if you take on a new management role, but continue to perform like an individual contributor, or at a lower level, don’t expect to be successful.

The irony is that most people will quickly default to doing what they do best, which is what got them to where they are.   But now the rules have changed, and they don’t adapt.  And as the saying goes, adapt or die.  It’s how a lot of careers end.

But not you.

While you will want to keep up the skills that got you to where you are, the real challenge is adding new ones.   And, at first blush, they might just seem like “soft skills”, while you are used to learning “technical skills.”   Well, treat these as your new technical skills to learn.

Your new technical skills are:

  1. Building EQ (Emotional Intelligence) in teams
  2. Building High-Performance Teams
  3. Putting vision/mission/values in place
  4. Putting the execution model in place
  5. Directing and inspiring as appropriate – situational leadership – per employee
  6. Creating and leveraging leadership opportunities and teachable moments
  7. Creating the right decision frameworks and flows and empowerment models
  8. Building a better business
  9. And doing thought-work in the space for the industry

I’ll leave this list at 9, so that it doesn’t become a 10 Top Skills to Learn to Advance Your Career post.

Emotional Intelligence as a Technical Skill

If you wonder how Emotional Intelligence can be a technical skill, I wish I could show you all the Mind Maps, the taxonomies, the techniques, the hard-core debates over the underlying principles, patterns, and practices, that I have seen many developers dive into over the years.

The good news is that Emotional Intelligence is a skill you can build.  I’ve seen many developers become first time managers and then work on their Emotional Intelligence skills and everything changes.  They become a better manager.  They become more influential.  They read a room better and know how to adapt themselves more effectively in any situation.  They know how to manage their emotions.  And they know how to inspire and delight others, instead of tick them off.

Along the lines of Emotional Intelligence, I should add Financial Intelligence to the mix.  So many developers and technologists would be more effective in the business arena, if they mastered the basics of Financial Intelligence.  There is actually a book called Financial Intelligence for IT Professionals.   It breaks down the basics of how to think in financial terms.   Innovation doesn’t fund itself.  Cool projects don’t fund themselves.  Technology is all fun and games until the money runs out.  But if you can show how technology helps the business, all of a sudden instead of being a cost or overhead, you are now part of the value chain, or at least the business can appreciate what you bring to the table.

Building High-Performance Teams as a Technical Skill

Building High-Performance Teams takes a lot of know-how.  It helps if you are already well grounded in how to ship stuff.  It really helps if you have some basic project management skills and you know how the parts of a project come together as a whole.  It especially helps if you have a strong background in Agile methodologies like Kanban, Scrum, XP, etc.  While you don’t need to create Kanbans, it certainly helps if you get the idea of visualizing the workflow and reducing open work.  And, while you may not need to do Scrum per se, it helps if you get the idea behind a Product Backlog, a Sprint Backlog, and Sprints.  And while you may not need to do XP, it helps if you get the idea of sustainable pace, test-driven development, pairing, collective ownership, and an on-site customer.

But the real key to building high-performance teams is actually about trust. 

Not trust as in “I trust that you’ll do that.”  

No.  It’s vulnerability-based trust, as in “I’ve got your back.”   This is what enables individuals on a team to go out on a limb, to try more, to do more, to become more.

Otherwise, everybody has to watch out for their own backs, and they spend their days making sure they don’t get pushed off the boat or left hanging from a limb while somebody saws it off.   (See 10 Things Great Managers Do.)

And nothing beats a self-organizing team, where people sign-up for work (vs. get assigned work), where people play their position well, and help others play theirs.

Vision, Mission, Values as a Technical Skill

Vision, mission, and values are actually some of the greatest technical skills you can master, for yourself and for any people or teams you might lead, now or in the future.   So many people mix up vision and mission.

Here’s the deal:

Mission is the job.

Vision is where you want to go, now that you know what the job is.

And Values are what you express in actions in terms of what you reward.  Notice how I said actions, not words.  Too many people and teams say they value one thing, but their actions value another.

It’s one thing to go off and craft a vision, mission, and values that you want everybody to adhere to.  It’s another thing to co-create the future with a team, and create your vision, mission, and values, with everybody’s fingerprints on it.  But that’s how you get buy-in.   And getting buy-in, usually involves dealing with conflict (which is a whole other set of technical skills you can master.)  

When a leader can express a vision, mission, and values with clarity, they can inspire the people around them, bring out the best in people, create a high-performance culture, and accelerate results.

Execution as a Technical Skill

This is where the rubber meets the road.  There are so many great books on how to execute with skill.  One of my favorites is Flawless Execution.  And one of the most insightful books on creating an effective execution model is Managing the Design Factory.

The main thing to master here is to be able to easily create a release schedule that optimizes resources and people, while flowing value to customers and stakeholders.

I know that’s boiling a lot down, but that’s the point.  To master execution, you need to be able to easily think about the challenges you are up against:  not enough time, not enough resources, not enough budget, not enough clarity, not enough customers, etc.

It’s a powerful thing when you can turn chaos into clarity and get the train leaving the station in a reliable way.

It’s hard to beat smart people shipping on a cadence, if they are always learning and always improving.

Situational Leadership as a Technical Skill

Sadly, this is one of the most common mistakes of new managers.  Seasoned ones, too.  They treat everybody on the team the same.  And they usually default to whatever they learned.   They either focus on motivating or they focus on directing.  And directing to the extreme, very quickly becomes micro-managing.

The big idea of Situational Leadership is to consider whether each person needs direction or motivation, or both.  

If you try to motivate somebody who is really looking for direction, you will both be frustrated.  Similarly, if you try to direct somebody who really is looking for motivation, it’s a quick spiral down.

There are many very good books on Situational Leadership and how to apply it in the real world.

Decision Making as a Technical Skill

This is where a lot of bloodshed happens.   This is where conflict thrives or dies.   Decision making is the bread and butter of today’s knowledge worker.  That’s what makes insight so valuable in a Digital Economy.  After all, what do you use the insight for?  To make better decisions.

It’s one thing for you to just make decisions.

But the real key here is how to create simple ways to deal with conflict and how to make better decisions as a group.   This includes how to avoid the pitfalls of groupthink.  It includes the ability to leverage the wisdom of the crowds.  It also includes the ability to influence and persuade with skill.  It includes the ability to balance connection with conviction.  It includes the ability to balance your Conflict Management Style with the Conflict Management Style of others.

Business as a Technical Skill

Business can be hard-core.   This isn’t so obvious if you deal with mediocre business people.  But when you interact with serious business leaders, you quickly understand how complicated, and technical, running a business and changing a business really is.

At the most fundamental level, the purpose of a business is to create a customer.

But even who you choose to serve as your “customer” is a strategic choice.

You can learn a lot about business by studying some of the great business challenges in the book, Case Interview Secrets, which is written by a former McKinsey consultant.

You can also learn a lot about business by studying which KPIs and business outcomes matter, in each industry, and by each business function.

It also helps to be able to quickly know how to whiteboard a value chain and be able to use some simple tools like SWOT analysis.  If you can really internalize Michael Porter’s mental models and toolset, then you will be ahead of many people in the business world.

Thoughtwork as a Technical Skill

There are many books and guides on how to be a leader in your field.   One of my favorites is, Lead the Field, by Earl Nightingale.  It’s an oldie, but goodie.

The real key is to be able to master ideation.  You need to be able to come up with ideas.   Probably the best technique I learned, long ago, was to simply set an idea quota.   In the book Thinkertoys, by Michael Michalko, I learned that Thomas Edison set a quota for thinking up new ideas.  Success really is a numbers game.   Anyway, I started by writing one idea per note in my little yellow sticky pad.  The first week, I had a handful of ideas.   But once my mind was cleared by writing my ideas down, I was soon filling up multiple yellow sticky pads per week.

I very quickly went from having an innovation challenge to having an execution challenge.

So then I went back to the drawing board and focused on mastering execution as a technical skill. ;)

Hopefully, if you are worried about how to keep growing your skills as you climb your corporate ladder, this will give you some food for thought.

Categories: Architecture, Programming

How Wistia Handles Millions of Requests Per Hour and Processes Rich Video Analytics

This is a guest repost from Christophe Limpalair of his interview with Max Schnur, Web Developer at Wistia.

Wistia is video hosting for business. They offer video analytics like heatmaps, and they give you the ability to add calls to action, for example. I was really interested in learning how all the different components work and how they’re able to stream so much video content, so that’s what this episode focuses on.

What does Wistia’s stack look like?

As you will see, Wistia is made up of different parts. Here are some of the technologies powering these different parts:

What scale are you running at?
Categories: Architecture

Software Development Conferences Forecast November 2015

From the Editor of Methods & Tools - Mon, 11/23/2015 - 17:14
Here is a list of software development related conferences and events on Agile project management ( Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

SPaMCAST 369 – The Stand-Up Meeting, #NoEstimates, More on Mastery

Software Process and Measurement Cast - Sun, 11/22/2015 - 23:00

The Software Process and Measurement Cast 369 features our essay on stand-up meetings.  Stand-up meetings are a combination of tactical planning on steroids and the perfect start to a great day. Stand-up meetings are one of the easiest Agile practices to adopt and often one of the easiest to mess up.

Also this week, we feature Kim Pries and his Software Sensei column.  Kim discusses what it takes to move toward mastery. Mastery implies more than just being good at any particular task. The Software Sensei provides a path forward.

Gene Hughson brings the first of his discussions on the topic of #NoEstimates from his Form Follows Function blog! Specifically, Gene provides more detail and background on his essay #NoEstimates – Questions, Answers, and Credibility.

Call to Action!

Review the SPaMCAST on iTunes, Stitcher or your favorite podcatcher/player and then share the review! Help your friends find the Software Process and Measurement Cast. After all, friends help friends find great podcasts!

Re-Read Saturday News

Remember that the Re-Read Saturday of The Mythical Man-Month is winding down. This week we are running a quick poll to identify the next book on the Software Process and Measurement Blog.  What would you like the next re-read book to be?

Upcoming Events

Details on 2016 conferences, including QAI Quest and StarEast, will be announced in a few weeks.


The next Software Process and Measurement Cast will feature a discussion with Steve Tendon about his new book Tame The Flow, and more. (Subject to change due to the Thanksgiving holiday in the States.)

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques, co-authored by Murali Chemuturi and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here. Available in English and Chinese.

Categories: Process Management


Agile at Scale - A Reading List

Herding Cats - Glen Alleman - Sun, 11/22/2015 - 20:29

I'm working two programs where Agile at Scale is the development paradigm. When we start an engagement using other people's money, in this case money belonging to a sovereign, we make sure everyone is on the same page. When Agile at Scale is applied, it is usually applied on programs that have tripped the FAR 34.2/DFARS 234.2 thresholds for Earned Value Management. This means $20M programs are self-assessed and $100M and above are validated by the DCMA (Defense Contract Management Agency).

While these programs are applying Agile, usually Scrum, they are also subject to EIA-748-C compliance and a list of other DIDs (Data Item Descriptions) and other procurement, development, testing, and operational guidelines. This means there are multiple constraints on how progress to plan is reported to the customer, the sovereign. These programs are not 5 guys at the same table with their customer exploring what will be needed for mission success when they're done. These programs are not everyone's cup of tea, but agile, in the right hands, is a powerful tool for Software Intensive System of Systems work on Mission Critical programs. Programs that MUST deliver the needed Capabilities, at the Need Time, for the Planned Cost, within the planned Margins for cost, schedule, and technical performance.

One place to start to improve the probability that we're all on the same page is this reading list. This is not an exhaustive list, and it is ever growing, but it's a start. It's hoped this list is the basis of a shared understanding that while Agile is a near universal principle, there are practices that must be tailored to specific domains, and one's experience in one domain may or may not be applicable to other domains. 

As it says in the Scrum Guide: 

Scrum (n): A framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value.

And since Scrum is an agile software development framework, Scrum is a framework, not a methodology. Scrum of Scrums and Agile at Scale, especially Agile at Scale inside EIA-748-C programs, have much different needs than 5 people sitting at the same table with their customer, working from an emerging set of requirements where the needed capabilities will also emerge.

Related articles:
  • Slicing Work Into Small Pieces
  • Agile Software Development in the DOD
  • Technical Performance Measures
  • What Do We Mean When We Say "Agile Community?"
  • Can Enterprise Agile Be Bottom Up?
  • How To Measure Anything
Categories: Project Management

The Next Book for Re-Read Saturday

The front cover of The Mythical Man-Month

Airports and driving have conspired to keep me from finishing this week’s essay from The Mythical Man-Month, therefore I thought I would seek your advice.  The Re-Read of The Mythical Man-Month by Fred P. Brooks is nearly complete.  We have one very long retrospective essay and an epilogue left in the book and a summary essay from me. I anticipate one or two more weeks of posts. The burning question is: what is next?  We have re-read Stephen Covey’s Seven Habits, The Goal by Goldratt and now The Mythical Man-Month.  I have two questions for you. 1) Do you want me to continue with this feature?  And, 2) which book should be next?   It is time for a poll.  If you would like to suggest a book that is not on the list, feel free to “write-in” a candidate in the comments for this essay.  I have included the top entries from the poll I did a few years ago and a few suggestions readers and listeners have made along the way (or books I have rediscovered as I poked around my library). 

One final comment before the poll.  Thanks to Anteneh Berhane for his great question on the relationship between value and done (Done and Value Are Related! Part 1 and Part 2). The question Anteneh asked made me think about an aspect of software development that is easy to take for granted: does functional software always mean value? I love to answer questions, so please continue to reach out by phone, email, or Twitter and let’s talk.  I categorize all responses to questions with the tag “FAQ” to facilitate searches.

So, please vote below. There are two polls. For the second question about my next Re-Read book, please vote for your top 2 choices.

Now it is time for the polls:

Take Our Poll


Take Our Poll

Categories: Process Management

Add ifPresent to Swift Optionals

Xebia Blog - Fri, 11/20/2015 - 22:43

In my previous post I wrote a lot about how you can use the map and flatMap functions of Swift Optionals. In this one, I'll add a custom function to Optionals through an extension, the ifPresent function.

extension Optional {

    public func ifPresent(@noescape f: (Wrapped) throws -> Void) rethrows {
        switch self {
        case .Some(let value): try f(value)
        case .None: ()
        }
    }
}
What this does is simply execute the closure f if the Optional is not nil. A small example:

var str: String? = "Hello, playground"
str.ifPresent { print($0) }

This works pretty much the same as the ifPresent method of Java optionals.

Why do we need it?

Well, we don't really need it. Swift has the built-in language features of if let and guard, which deal pretty well with these kinds of situations (something Java cannot do).

We could simply write the above example as follows:

var str: String? = "Hello, playground"
if let str = str {
    print(str)
}

For this example it doesn't matter much whether you use ifPresent or if let. And because everyone is familiar with if let, you'd probably want to stick with that.

When to use it

Sometimes, when you want to call a function that has exactly one parameter with the same type as your Optional, you might benefit a bit more from this syntax. Let's have a look at that:

var someOptionalView: UIView? = ...
var parentView: UIView = ...

someOptionalView.ifPresent(parentView.addSubview)

Since addSubview has one parameter of type UIView, we can immediately pass in that function reference to the ifPresent function.

Otherwise we would have to write the following code instead:

var someOptionalView: UIView? = ...
var parentView: UIView = ...

if let someOptionalView = someOptionalView {
    parentView.addSubview(someOptionalView)
}
When you can't use it

Unfortunately, it's not always possible to use this in the way we'd like to. If we look back at the very first example with the print function, we would ideally write it without a closure:

var str: String? = "Hello, playground"
str.ifPresent(print) // doesn't compile
Even though print can be called with just a String, its function signature takes variable arguments and default parameters. Whenever that's the case, it's not possible to use it as a function reference. This becomes increasingly frustrating when you add default parameters to existing methods, after which your code that refers to them doesn't compile anymore.

It would also be useful if it was possible to set class variables through a method reference. Instead of:

if let value = someOptionalValue {
  self.value = value
}

We would write something like this:


But that doesn't compile, so we need to write it like this:

someOptionalValue.ifPresent { self.value = $0 }

Which isn't really much better than the if let variant.

(I had a look at a post about If-Let Assignment Operator but unfortunately that crashed my Swift compiler while building, which is probably due to a compiler bug)


Is the ifPresent function a big game changer for Swift? Definitely not.
Is it necessary? No.
Can it be useful? Yes.

Initial experiences with the Prometheus monitoring system

Agile Testing - Grig Gheorghiu - Fri, 11/20/2015 - 22:23
I've been looking for a while for a monitoring system written in Go, self-contained and easy to deploy. I think I finally found what I was looking for in Prometheus, a monitoring system open-sourced by SoundCloud and started there by ex-Googlers who took their inspiration from Google's Borgmon system.

Prometheus is a pull system, where the monitoring server pulls data from its clients by hitting a special HTTP handler exposed by each client ("/metrics" by default) and retrieving a list of metrics from that handler. The output of /metrics is plain text, which makes it fairly easily parseable by humans as well, and also helps in troubleshooting.

Here's a subset of the OS-level metrics that are exposed by a client running the node_exporter Prometheus binary (and available when you hit http://client_ip_or_name:9100/metrics):

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="idle"} 2803.93
node_cpu{cpu="cpu0",mode="iowait"} 31.38
node_cpu{cpu="cpu0",mode="irq"} 0
node_cpu{cpu="cpu0",mode="nice"} 2.26
node_cpu{cpu="cpu0",mode="softirq"} 0.23
node_cpu{cpu="cpu0",mode="steal"} 21.16
node_cpu{cpu="cpu0",mode="system"} 25.84
node_cpu{cpu="cpu0",mode="user"} 79.94
# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge
node_disk_io_now{device="xvda"} 0
# HELP node_disk_io_time_ms Milliseconds spent doing I/Os.
# TYPE node_disk_io_time_ms counter
node_disk_io_time_ms{device="xvda"} 44608
# HELP node_disk_io_time_weighted The weighted # of milliseconds spent doing I/Os. See https://www.kernel.org/doc/Documentation/iostats.txt.
# TYPE node_disk_io_time_weighted counter
node_disk_io_time_weighted{device="xvda"} 959264
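The exposition format above really is easy to parse by hand. As a quick illustration (a stdlib-Python sketch, not the official Prometheus client-library parser), each sample line is a metric name, an optional {label="value",...} block, and a number:

```python
import re

# metric_name{optional,labels} value
METRIC_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metrics(text):
    """Parse Prometheus text exposition format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip HELP/TYPE comment lines
            continue
        m = METRIC_RE.match(line)
        if m:
            name, labelstr, value = m.groups()
            labels = dict(LABEL_RE.findall(labelstr or ''))
            samples.append((name, labels, float(value)))
    return samples

sample = '''# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 2803.93
node_disk_io_now{device="xvda"} 0'''

print(parse_metrics(sample))
```

This is exactly why the plain-text output helps in troubleshooting: you can eyeball it with curl, or script against it with a few lines of code.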

There are many such "exporters" available for Prometheus, exposing metrics in the format expected by the Prometheus server from systems such as Apache, MySQL, PostgreSQL, HAProxy and many others (see a list here).

What drew me to Prometheus though was the fact that it allows for easy instrumentation of code by providing client libraries for many languages: Go, Java/Scala, Python, Ruby and others. 
One of the main advantages of Prometheus over alternative systems such as Graphite is the rich query language that it provides. You can associate labels (which are arbitrary key/value pairs) with any metrics, and you are then able to query the system by label. I'll show examples in this post. Here's a more in-depth comparison between Prometheus and Graphite.
Installation (on Ubuntu 14.04)
I put together an ansible role that is loosely based on Brian Brazil's demo_prometheus_ansible repo.
Check out my ansible-prometheus repo for this ansible role, which installs Prometheus, node_exporter and PromDash (a ruby-based dashboard builder). For people not familiar with ansible, most of the installation commands are in the install.yml task file. Here is the sequence of installation actions, in broad strokes.
For the Prometheus server:
  • download prometheus-0.16.1.linux-amd64.tar.gz from https://github.com/prometheus/prometheus/releases/download
  • extract tar.gz into /opt/prometheus/dist and link /opt/prometheus/prometheus-server to /opt/prometheus/dist/prometheus-0.16.1.linux-amd64
  • create Prometheus configuration file from ansible template and drop it in /etc/prometheus/prometheus.yml (more on the config file later)
  • create Prometheus default command-line options file from ansible template and drop it in /etc/default/prometheus
  • create Upstart script for Prometheus in /etc/init/prometheus.conf:
# Run prometheus
start on startup
chdir /opt/prometheus/prometheus-server
script
  ./prometheus -config.file /etc/prometheus/prometheus.yml
end script
For node_exporter:
  • download node_exporter-0.12.0rc1.linux-amd64.tar.gz from https://github.com/prometheus/node_exporter/releases/download
  • extract tar.gz into /opt/prometheus/dist and move node_exporter binary to /opt/prometheus/bin/node_exporter
  • create Upstart script for Prometheus in /etc/init/prometheus_node_exporter.conf:
# Run prometheus node_exporter
start on startup
script
  /opt/prometheus/bin/node_exporter
end script
For PromDash:
  • git clone from https://github.com/prometheus/promdash
  • follow instructions in the Prometheus tutorial from Digital Ocean (can't stop myself from repeating that D.O. publishes the best technical tutorials out there!)
Here is a minimal Prometheus configuration file (/etc/prometheus/prometheus.yml):
global:
  scrape_interval: 30s
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'prometheus'
    target_groups:
      - targets:
        - prometheus.example.com:9090
  - job_name: 'node'
    target_groups:
      - targets:
        - prometheus.example.com:9100
        - api01.example.com:9100
        - api02.example.com:9100
        - test-api01.example.com:9100
        - test-api02.example.com:9100
The configuration file format for Prometheus is well documented in the official docs. My example shows that the Prometheus server itself is monitored (or "scraped" in Prometheus parlance) on port 9090, and that OS metrics are also scraped from 5 clients which are running the node_exporter binary on port 9100, including the Prometheus server.
At this point, you can start Prometheus and node_exporter on your Prometheus server via Upstart:
# start prometheus
# start prometheus_node_exporter
Then you should be able to hit http://prometheus.example.com:9100 to see the metrics exposed by node_exporter, and more importantly http://prometheus.example.com:9090 to see the default Web console included in the Prometheus server. A demo page available from Robust Perception can be examined here.
Note that Prometheus also provides default Web consoles for node_exporter OS-level metrics. They are available at http://prometheus.example.com:9090/consoles/node.html (the ansible-prometheus role installs nginx and redirects http://prometheus.example.com:80 to the previous URL). The node consoles show CPU, Disk I/O and Memory graphs and also network traffic metrics for each client running node_exporter. 

Working with the MySQL exporter
I installed the mysqld_exporter binary on my Prometheus server box.
# cd /opt/prometheus/dist
# git clone https://github.com/prometheus/mysqld_exporter.git
# cd mysqld_exporter
# make
Then I created a wrapper script I called run_mysqld_exporter.sh:
# cat run_mysqld_exporter.sh
#!/bin/bash
export DATA_SOURCE_NAME="dbuser:dbpassword@tcp(dbserver:3306)/dbname"; ./mysqld_exporter
Two important notes here:
1) Note the somewhat awkward format for the DATA_SOURCE_NAME environment variable. I tried many other formats, but only this one worked for me. The wrapper script's main purpose is to define this variable properly. With some of my other tries, I got this error message:
INFO[0089] Error scraping global state: Default addr for network 'dbserver:3306' unknown  file=mysqld_exporter.go line=697
You could also define this variable in ~/.bashrc, but in that case it may clash with other Prometheus exporters (the one for PostgreSQL, for example) which also need to define this variable.
2) Note that the dbuser specified in the DATA_SOURCE_NAME variable needs to have either SUPER or REPLICATION CLIENT permissions to the MySQL server you need to monitor. I ran a SQL statement of this form:

I created an Upstart init script I called /etc/init/prometheus_mysqld_exporter.conf:
# cat /etc/init/prometheus_mysqld_exporter.conf
# Run prometheus mysqld exporter
start on startup
chdir /opt/prometheus/dist/mysqld_exporter
script
  ./run_mysqld_exporter.sh
end script
I modified the Prometheus server configuration file (/etc/prometheus/prometheus.yml) and added a scrape job for the MySQL metrics:

  - job_name: 'mysql'
    honor_labels: true
    target_groups:
      - targets:
        - prometheus.example.com:9104

I restarted the Prometheus server:

# stop prometheus
# start prometheus

Then I started up mysqld_exporter via Upstart:
# start prometheus_mysqld_exporter
If everything goes well, the metrics scraped from MySQL will be available at http://prometheus.example.com:9104/metrics
Here are some of the available metrics:
# HELP mysql_global_status_innodb_data_reads Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_data_reads untyped
mysql_global_status_innodb_data_reads 12660
# HELP mysql_global_status_innodb_data_writes Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_data_writes untyped
mysql_global_status_innodb_data_writes 528790
# HELP mysql_global_status_innodb_data_written Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_data_written untyped
mysql_global_status_innodb_data_written 9.879318016e+09
# HELP mysql_global_status_innodb_dblwr_pages_written Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_dblwr_pages_written untyped
mysql_global_status_innodb_dblwr_pages_written 285184
# HELP mysql_global_status_innodb_row_ops_total Total number of MySQL InnoDB row operations.
# TYPE mysql_global_status_innodb_row_ops_total counter
mysql_global_status_innodb_row_ops_total{operation="deleted"} 14580
mysql_global_status_innodb_row_ops_total{operation="inserted"} 847656
mysql_global_status_innodb_row_ops_total{operation="read"} 8.1021419e+07
mysql_global_status_innodb_row_ops_total{operation="updated"} 35305

Most of the metrics exposed by mysqld_exporter are of type Counter, which means they always increase. A meaningful number to graph then is not their absolute value, but their rate of change. For example, for the mysql_global_status_innodb_row_ops_total metric, the rate of change of reads for the last 5 minutes (reads/sec) can be expressed as:

rate(mysql_global_status_innodb_row_ops_total{operation="read"}[5m])

This is also an example of a Prometheus query which filters by a specific label (in this case {operation="read"}).
A good way to get a feel for the metrics available to the Prometheus server is to go to the Web console and graphing tool available at http://prometheus.example.com:9090/graph. You can copy and paste the line above in the Expression edit box and click Execute. You should see something like this graph in the Graph tab:

It's important to familiarize yourself with the 4 types of metrics handled by Prometheus: Counter, Gauge, Histogram and Summary. 
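The arithmetic behind rate() on a Counter can be sketched from two scraped samples. This is a simplified Python illustration, not PromQL's actual implementation (which also extrapolates over the range window); the sample values are hypothetical:

```python
def counter_rate(samples):
    """Approximate per-second rate from (timestamp_sec, value) counter samples.

    Simplified version of PromQL's rate(): takes the first and last sample
    in the window and handles a single counter reset.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if v1 < v0:      # counter reset (e.g. the exporter restarted):
        v0 = 0       # treat the new value as the increase since the reset
    return (v1 - v0) / (t1 - t0)

# Two scrapes of a read-ops counter, 30 seconds apart (hypothetical values):
print(counter_rate([(0, 81021419), (30, 81036419)]))  # prints 500.0 reads/sec
```

Graphing this per-second rate, rather than the raw ever-increasing counter, is what makes Counter metrics readable.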
Working with the Postgres exporter
Although not an official Prometheus package, the Postgres exporter has worked just fine for me. 
I installed the postgres_exporter binary on my Prometheus server box.
# cd /opt/prometheus/dist
# git clone https://github.com/wrouesnel/postgres_exporter.git
# cd postgres_exporter
# make
Then I created a wrapper script I called run_postgres_exporter.sh:

# cat run_postgres_exporter.sh

export DATA_SOURCE_NAME="postgres://dbuser:dbpassword@dbserver/dbname"; ./postgres_exporter
Note that the format for DATA_SOURCE_NAME is a bit different from the MySQL format.
I created an Upstart init script I called /etc/init/prometheus_postgres_exporter.conf:
# cat /etc/init/prometheus_postgres_exporter.conf
# Run prometheus postgres exporter
start on startup
chdir /opt/prometheus/dist/postgres_exporter
script
  ./run_postgres_exporter.sh
end script
I modified the Prometheus server configuration file (/etc/prometheus/prometheus.yml) and added a scrape job for the Postgres metrics:

  - job_name: 'postgres'
    honor_labels: true
    target_groups:
      - targets:
        - prometheus.example.com:9113

I restarted the Prometheus server:

# stop prometheus
# start prometheus
Then I started up postgres_exporter via Upstart:
# start prometheus_postgres_exporter
If everything goes well, the metrics scraped from Postgres will be available at http://prometheus.example.com:9113/metrics
Here are some of the available metrics:
# HELP pg_stat_database_tup_fetched Number of rows fetched by queries in this database
# TYPE pg_stat_database_tup_fetched counter
pg_stat_database_tup_fetched{datid="1",datname="template1"} 7.730469e+06
pg_stat_database_tup_fetched{datid="12998",datname="template0"} 0
pg_stat_database_tup_fetched{datid="13003",datname="postgres"} 7.74208e+06
pg_stat_database_tup_fetched{datid="16740",datname="mydb"} 2.18194538e+08
# HELP pg_stat_database_tup_inserted Number of rows inserted by queries in this database
# TYPE pg_stat_database_tup_inserted counter
pg_stat_database_tup_inserted{datid="1",datname="template1"} 0
pg_stat_database_tup_inserted{datid="12998",datname="template0"} 0
pg_stat_database_tup_inserted{datid="13003",datname="postgres"} 0
pg_stat_database_tup_inserted{datid="16740",datname="mydb"} 3.5467483e+07
# HELP pg_stat_database_tup_returned Number of rows returned by queries in this database
# TYPE pg_stat_database_tup_returned counter
pg_stat_database_tup_returned{datid="1",datname="template1"} 6.41976558e+08
pg_stat_database_tup_returned{datid="12998",datname="template0"} 0
pg_stat_database_tup_returned{datid="13003",datname="postgres"} 6.42022129e+08
pg_stat_database_tup_returned{datid="16740",datname="mydb"} 7.114057378094e+12
# HELP pg_stat_database_tup_updated Number of rows updated by queries in this database
# TYPE pg_stat_database_tup_updated counter
pg_stat_database_tup_updated{datid="1",datname="template1"} 1
pg_stat_database_tup_updated{datid="12998",datname="template0"} 0
pg_stat_database_tup_updated{datid="13003",datname="postgres"} 1
pg_stat_database_tup_updated{datid="16740",datname="mydb"} 4351

These metrics are also of type Counter, so to generate meaningful graphs for them, you need to plot their rates. For example, to see the rate of rows returned per second from the database called mydb, you would plot this expression:

rate(pg_stat_database_tup_returned{datname="mydb"}[5m])

The Prometheus expression evaluator available at http://prometheus.example.com:9090/graph is again your friend. BTW, if you start typing pg_ in the expression field, you'll see a drop-down filled automatically with all the available metrics starting with pg_. Handy!
Working with the AWS CloudWatch exporter
This is one of the officially supported Prometheus exporters, used for graphing and alerting on AWS CloudWatch metrics. I installed it on the Prometheus server box. It's a Java app, so it needs a JDK installed, and also Maven for building the app.
# cd /opt/prometheus/dist
# git clone https://github.com/prometheus/cloudwatch_exporter.git
# apt-get install maven2 openjdk-7-jdk
# cd cloudwatch_exporter
# mvn package
The cloudwatch_exporter app needs AWS credentials in order to connect to CloudWatch and read the metrics. Here's what I did:
  1. created an AWS IAM user called cloudwatch_ro and downloaded its access key and secret key
  2. created an AWS IAM custom policy called CloudWatchReadOnlyAccess-201511181031, which includes the default CloudWatchReadOnlyAccess policy (the custom policy is not strictly necessary, and you can use the default one, but I preferred a custom one because I may need to make further edits to the policy file)
  3. attached the CloudWatchReadOnlyAccess-201511181031 policy to the cloudwatch_ro user
  4. created a file called ~/.aws/credentials with the contents:
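The credentials file content did not survive here; a ~/.aws/credentials file has the standard shape below (the key values are placeholders for the access key and secret key downloaded for the cloudwatch_ro user):

```ini
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
```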
The cloudwatch_exporter app also needs a json file containing the CloudWatch metrics we want it to retrieve from AWS. Here is an example of ELB-related metrics I specified in a file called cloudwatch.json:
{
  "region": "us-west-2",
  "metrics": [
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "RequestCount",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "BackendConnectionErrors",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_Backend_2XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_Backend_4XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_Backend_5XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_ELB_4XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "HTTPCode_ELB_5XX",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "SurgeQueueLength",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Maximum", "Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "SpilloverCount",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Sum"]},
    {"aws_namespace": "AWS/ELB", "aws_metric_name": "Latency",
     "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
     "aws_dimension_select": {"LoadBalancerName": ["LB1", "LB2"]},
     "aws_statistics": ["Average"]}
  ]
}
Note that you need to look up the exact syntax for each metric name, dimensions and preferred statistics in the AWS CloudWatch documentation. For ELB metrics, the documentation is here. The CloudWatch name corresponds to the cloudwatch_exporter JSON parameter aws_metric_name, dimensions corresponds to aws_dimensions, and preferred statistics corresponds to aws_statistics.
I modified the Prometheus server configuration file (/etc/prometheus/prometheus.yml) and added a scrape job for the CloudWatch metrics:

  - job_name: 'cloudwatch'
    honor_labels: true
    static_configs:
      - targets:
          - prometheus.example.com:9106

I restarted the Prometheus server:

# stop prometheus
# start prometheus

I created an Upstart init script I called /etc/init/prometheus_cloudwatch_exporter.conf:
# cat /etc/init/prometheus_cloudwatch_exporter.conf
# Run prometheus cloudwatch exporter
start on startup
chdir /opt/prometheus/dist/cloudwatch_exporter
script
    /usr/bin/java -jar target/cloudwatch_exporter-0.2-SNAPSHOT-jar-with-dependencies.jar 9106 cloudwatch.json
end script
Then I started up cloudwatch_exporter via Upstart:
# start prometheus_cloudwatch_exporter
If everything goes well, the metrics scraped from CloudWatch will be available at http://prometheus.example.com:9106/metrics
Here are some of the available metrics:
# HELP aws_elb_request_count_sum CloudWatch metric AWS/ELB RequestCount Dimensions: [AvailabilityZone, LoadBalancerName] Statistic: Sum Unit: Count
# TYPE aws_elb_request_count_sum gauge
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2a",} 1.0
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2c",} 1.0
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2c",} 2.0
aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2a",} 12.0
# HELP aws_elb_httpcode_backend_2_xx_sum CloudWatch metric AWS/ELB HTTPCode_Backend_2XX Dimensions: [AvailabilityZone, LoadBalancerName] Statistic: Sum Unit: Count
# TYPE aws_elb_httpcode_backend_2_xx_sum gauge
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2a",} 1.0
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2c",} 1.0
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2c",} 2.0
aws_elb_httpcode_backend_2_xx_sum{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2a",} 12.0
# HELP aws_elb_latency_average CloudWatch metric AWS/ELB Latency Dimensions: [AvailabilityZone, LoadBalancerName] Statistic: Average Unit: Seconds
# TYPE aws_elb_latency_average gauge
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2a",} 0.5571935176849365
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB1",availability_zone="us-west-2c",} 0.5089397430419922
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2c",} 0.035556912422180176
aws_elb_latency_average{job="aws_elb",load_balancer_name="LB2",availability_zone="us-west-2a",} 0.0031794110933939614

Note that there are 3 labels available to query the metrics above: job, load_balancer_name and availability_zone. 
If we specify something like aws_elb_request_count_sum{job="aws_elb"} in the expression evaluator at http://prometheus.example.com:9090/graph, we'll see 4 graphs, one for each load_balancer_name/availability_zone combination. 
To see only graphs related to a specific load balancer, say LB1, we can specify an expression of the form: aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1"}. In this case, we'll see 2 graphs for LB1, one for each availability zone.
In order to see the request count across all availability zones for a specific load balancer, we need to apply the sum function: sum(aws_elb_request_count_sum{job="aws_elb",load_balancer_name="LB1"}) by (load_balancer_name). In this case, we'll see one graph with the request count across the 2 availability zones pertaining to LB1.
If we want to graph all load balancers but only show one graph per balancer, summing all availability zones for each balancer, we would use an expression like this: sum(aws_elb_request_count_sum{job="aws_elb"}) by (load_balancer_name). So in this case we'll see 2 graphs, one for LB1 and one for LB2, with each graph summing the request count across the availability zones for LB1 and LB2 respectively.
Note that in all the expressions above, since the job label has the value "aws_elb" for all of these metrics, it can be dropped from the queries because it doesn't provide any useful filtering.
For other AWS CloudWatch metrics, consult the Amazon CloudWatch Namespaces, Dimensions and Metrics Reference.

Instrumenting Go code with Prometheus
For me, the most interesting feature of Prometheus is that it allows for easy instrumentation of the code. Instead of pushing metrics a la statsd and Graphite, a web app needs to implement a /metrics handler and use the Prometheus client library code to publish app-level metrics to that handler. The Prometheus server will then hit /metrics on the client and pull/scrape the metrics.

More specifics for Go code instrumentation

1) Declare and register Prometheus metrics in your code

I have the following 2 variables defined in an init.go file in a common package that gets imported in all of the webapp code:

var PrometheusHTTPRequestCount = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Namespace: "myapp",
        Name:      "http_request_count",
        Help:      "The number of HTTP requests.",
    },
    []string{"method", "type", "endpoint"},
)

var PrometheusHTTPRequestLatency = prometheus.NewSummaryVec(
    prometheus.SummaryOpts{
        Namespace: "myapp",
        Name:      "http_request_latency",
        Help:      "The latency of HTTP requests.",
    },
    []string{"method", "type", "endpoint"},
)

Note that the first metric is a CounterVec, which in the Prometheus client_golang library specifies a Counter metric that can also get labels associated with it. The labels in my case are "method", "type" and "endpoint". The purpose of this metric is to measure the HTTP request count. Since it's a Counter, it will increase monotonically, so for graphing purposes we'll need to plot its rate and not its absolute value.

The second metric is a SummaryVec, which in the client_golang library specifies a Summary metric with labels. It has the same labels as the CounterVec metric. The purpose of this metric is to measure the HTTP request latency. Because it's a Summary, it will provide the absolute measurement, the count, as well as quantiles for the measurements.

These 2 variables then get registered in the init function:

func init() {
    // Register Prometheus metric trackers
    prometheus.MustRegister(PrometheusHTTPRequestCount)
    prometheus.MustRegister(PrometheusHTTPRequestLatency)
}

2) Let Prometheus handle the /metrics endpoint

The GitHub README for client_golang shows the simplest way of doing this:

http.Handle("/metrics", prometheus.Handler())
http.ListenAndServe(":8080", nil)

However, most of the Go webapp code will rely on some sort of web framework, so YMMV. In our case, I had to insert the prometheus.Handler function as a variable pretty deep in our framework code in order to associate it with the /metrics endpoint.

3) Modify Prometheus metrics in your code

The final step in getting Prometheus to instrument your code is to modify the Prometheus metrics you registered by incrementing Counter variables and taking measurements for Summary variables in the appropriate places in your app. In my case, I increment PrometheusHTTPRequestCount in every HTTP handler in my webapp by calling its Inc() method. I also measure the HTTP latency, i.e. the time it took for the handler code to execute, and call the Observe() method on the PrometheusHTTPRequestLatency variable.

The values I associate with the "method", "type" and "endpoint" labels come from the endpoint URL associated with each instrumented handler. As an example, for an HTTP GET request to a URL such as http://api.example.com/customers/find, "method" is the HTTP method used in the request ("GET"), "type" is "customers", and "endpoint" is "/customers/find".
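As an illustration only, here's a minimal sketch of what such a splitter could look like. The real common.SplitUrlForMonitoring helper is internal to the app, so its behavior here is an assumption based on the example above:

```go
package main

import (
	"fmt"
	"strings"
)

// SplitUrlForMonitoring is a hypothetical sketch of the helper referenced
// in the text: it derives the "type" and "endpoint" label values from a
// request path such as "/customers/find".
func SplitUrlForMonitoring(path string) (pkg, endpoint string) {
	trimmed := strings.Trim(path, "/")
	pkg = strings.SplitN(trimmed, "/", 2)[0] // first path segment, e.g. "customers"
	endpoint = "/" + trimmed                 // normalized full path, e.g. "/customers/find"
	return pkg, endpoint
}

func main() {
	pkg, endpoint := SplitUrlForMonitoring("/customers/find")
	fmt.Println(pkg, endpoint) // customers /customers/find
}
```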

Here is the code I use for modifying the Prometheus metrics (R is an object/struct which represents the HTTP request):

    // Modify Prometheus metrics
    // (elapsed is the handler's execution time, a time.Duration measured around the handler call)
    pkg, endpoint := common.SplitUrlForMonitoring(R.URL.Path)
    method := R.Method
    PrometheusHTTPRequestCount.WithLabelValues(method, pkg, endpoint).Inc()
    PrometheusHTTPRequestLatency.WithLabelValues(method, pkg, endpoint).Observe(float64(elapsed) / float64(time.Millisecond))

4) Retrieving your metrics

Assuming your web app runs on port 8080, you'll need to modify the Prometheus server configuration file and add a scrape job for app-level metrics. I have something similar to this in /etc/prometheus/prometheus.yml:

  - job_name: 'myapp-api'
    static_configs:
      - targets:
          - api01.example.com:8080
          - api02.example.com:8080
        labels:
          group: 'production'
      - targets:
          - test-api01.example.com:8080
          - test-api02.example.com:8080
        labels:
          group: 'test'

Note an extra label called "group" defined in the configuration file. It has the values "production" and "test" respectively, and allows for the filtering of Prometheus measurements by the environment of the monitored nodes.

Whenever the Prometheus configuration file gets modified, you need to restart the Prometheus server:

# stop prometheus
# start prometheus

At this point, the metrics scraped from the webapp servers will be available at http://api01.example.com:8080/metrics.

Here are some of the available metrics:
# HELP myapp_http_request_count The number of HTTP requests.
# TYPE myapp_http_request_count counter
myapp_http_request_count{endpoint="/merchant/register",method="GET",type="admin"} 2928
# HELP myapp_http_request_latency The latency of HTTP requests.
# TYPE myapp_http_request_latency summary
myapp_http_request_latency{endpoint="/merchant/register",method="GET",type="admin",quantile="0.5"} 31.284808
myapp_http_request_latency{endpoint="/merchant/register",method="GET",type="admin",quantile="0.9"} 33.353354
myapp_http_request_latency{endpoint="/merchant/register",method="GET",type="admin",quantile="0.99"} 33.353354
myapp_http_request_latency_sum{endpoint="/merchant/register",method="GET",type="admin"} 93606.57930099976

myapp_http_request_latency_count{endpoint="/merchant/register",method="GET",type="admin"} 2928

Note that myapp_http_request_count and myapp_http_request_latency_count show the same value for the method/type/endpoint combination in this example. You could argue that myapp_http_request_count is redundant in this case. There could be instances where you want to increment a counter without taking a measurement for the summary, so it's still useful to have both. 
Also note that myapp_http_request_latency, being a summary, computes 3 different quantiles: 0.5, 0.9 and 0.99 (meaning that 50%, 90% and 99% of the measured latencies respectively fall below the reported values).

5) Graphing your metrics with PromDash
The PromDash tool provides an easy way to create dashboards with a look and feel similar to Graphite. PromDash is available at http://prometheus.example.com:3000. 
First you need to define a server by clicking on the Servers link up top, then entering a name ("prometheus") and the URL of the Prometheus server ("http://prometheus.example.com:9090/").
Then click on Dashboards up top, and create a new directory, which offers a way to group dashboards. You can call it something like "myapp". Now you can create a dashboard (you also need to select the directory it belongs to). Once you are in the Dashboard create/edit screen, you'll see one empty graph with the default title "Title". 
When you hover over the header of the graph, you'll see other buttons available. You want to click on the 2nd button from the left, called Datasources, then click Add Expression. Note that the server field is already pre-filled. If you start typing myapp in the expression field, you should see the metrics exposed by your application (for example myapp_http_request_count and myapp_http_request_latency).
To properly graph a Counter-type metric, you need to plot its rate. Use this expression to show the HTTP request/second rate measured in the last minute for all the production endpoints in my webapp:
(the job and group values correspond to what we specified in /etc/prometheus/prometheus.yml)
If you want to show the HTTP request/second rate for test endpoints of "admin" type, use this expression:
If you want to show the HTTP request/second rate for a specific production endpoint, use an expression similar to this:
Once you enter the expression you want, close the Datasources form (it will save everything). Also change the title by clicking on the button called "Graph and Axis Settings". In that form, you can also specify that you want the plot lines stacked as opposed to regular lines.
For latency metrics, you don't need to look at the rate. Instead, you can look at a specific quantile. Let's say you want to plot the 99% quantile for latencies observed in all production endpoints, for write operations (corresponding to HTTP methods which are not GET). Then you would use an expression like this:
As for the HTTP request/second graphs, you can refine the latency queries by specifying a type, an endpoint or both:
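As a sketch, assuming the metric names and label values from the configuration shown earlier, the expressions described above might look like the following:

```promql
# HTTP requests/second over the last minute, all production endpoints
rate(myapp_http_request_count{job="myapp-api",group="production"}[1m])

# HTTP requests/second for test endpoints of "admin" type
rate(myapp_http_request_count{job="myapp-api",group="test",type="admin"}[1m])

# HTTP requests/second for a specific production endpoint
rate(myapp_http_request_count{job="myapp-api",group="production",endpoint="/merchant/register"}[1m])

# 99% latency quantile for production write operations (non-GET methods)
myapp_http_request_latency{job="myapp-api",group="production",method!="GET",quantile="0.99"}
```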
I hope you have enough information at this point to go wild with dashboards! Remember, who has the most dashboards wins!
Wrapping up
I wanted to write this blog post so I don't forget all the stuff that was involved in setting up and using Prometheus. It's a lot, but it's also not that bad once you get the hang of it. In particular, the Prometheus server itself is remarkably easy to set up and maintain, a refreshing change from other monitoring systems I've used before.
One thing I haven't touched on is the alerting mechanism used in Prometheus. I haven't looked at that yet, since I'm still using a combination of Pingdom, monit and Jenkins for my alerting. I'll tackle Prometheus alerting in another blog post.
I really like Prometheus so far and I hope you'll give it a try!

Stuff The Internet Says On Scalability For November 20th, 2015

Hey, it's HighScalability time:

100 years ago people saw this as our future. We will be so laughably wrong about the future.
  • $24 billion: amount telcos make selling data about you; $500,000: cost of iOS zero day exploit; 50%: a year's growth of internet users in India; 72: number of cores in Intel's new chip; 30,000: Docker containers started on 1,000 nodes; 1962: when the first Cathode Ray Tube entered interplanetary space; 2x: cognitive improvement with better indoor air quality; 1 million: Kubernetes requests per second; 

  • Quotable Quotes:
    • Zuckerberg: One of our goals for the next five to 10 years is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition. 
    • Sawyer Hollenshead: I decided to do what any sane programmer would do: Devise an overly complex solution on AWS for a seemingly simple problem.
    • Marvin Minsky: Big companies and bad ideas don't mix very well.
    • @mathiasverraes: Events != hooks. Hooks allow you to reach into a procedure, change its state. Events communicate state change. Hooks couple, events decouple
    • @neil_conway: Lamport, trolling distributed systems engineers since 1998. 
    • @timoreilly: “Silicon Valley is the QA department for the rest of the world. It’s where you test out new business models.” @jamescham #NextEconomy
    • Henry Miller: It is my belief that the immature artist seldom thrives in idyllic surroundings. What he seems to need, though I am the last to advocate it, is more first-hand experience of life—more bitter experience, in other words. In short, more struggle, more privation, more anguish, more disillusionment.
    • @mollysf: "We save north of 30% when we move apps to cloud. Not in infrastructure; in operating model." @cdrum #structureconf
    • Alex Rampell: This is the flaw with looking at Square and Stripe and calling them commodity players. They have the distribution. They have the engineering talent. They can build their own TiVo. It doesn’t mean they will, but their success hinges on their own product and engineering prowess, not on an improbable deal with an oligopoly or utility.
    • @csoghoian: The Michigan Supreme Court, 1922: Cars are tools for robbery, rape, murder, enabling silent approach + swift escape.
    • @tomk_: Developers are kingmakers, driving technology adoption. They choose MongoDB for cost, agility, dev productivity. @dittycheria #structureconf
    • Andrea “Andy” Cunningham: You have to always foster an environment where people can stand up against the orthodoxy, otherwise you will never create anything new.
    • @joeweinman: Jay Parikh at #structureconf on moving Instagram to Facebook: only needed 1 FB server for every 3 AWS servers
    • amirmc: The other unikernel projects (i.e. MirageOS and HaLVM), take a clean-slate approach which means application code also has to be in the same language (OCaml and Haskell, respectively). However, there's also ongoing work to make pieces of the different implementations play nicely together too (but it's early days).

  • After a tragedy you can always expect the immediate fear-inspired reframing of agendas. Snowden responsible for Paris...really?

  • High finance in low places. The Hidden Wealth of Nations: In 2003, less than a year before its initial public offering in August 2004, Google US transferred its search and advertisement technologies to “Google Holdings,” a subsidiary incorporated in Ireland, but which for Irish tax purposes is a resident of Bermuda.

  • The entertaining True Tales of Engineering Scaling. Started with Rails and Postgres. Traffic jumped. High memory workers on Heroku broke the bank. Can't afford the time to move to AWS. Lots of connection issues. More traffic. More problems. More solutions. An interesting story with many twists. The lesson: Building and, more importantly, shipping software is about the constant trade off of forward movement and present stability.

  • 5 Tips to Increase Node.js Application Performance: Implement a Reverse Proxy Server; Cache Static Files; Implement a Node.js Load Balancer; Proxy WebSocket Connections; Implement SSL/TLS and HTTP/2.

  • Docker adoption is not that easy; Uber took months to get up and running with Docker. How Docker Turbocharged Uber’s Deployments: Everything just changes a bit, we need to think about stuff differently...You really need to rethink all of the parts of your infrastructure...Uber recognizes that Docker removed team dependencies, offering more freedom because members were no longer tied to specific frameworks or specific versions. Framework and service owners are now able to experiment with new technologies and to manage their own environments.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Altering the Deal

Making the Complex Simple - John Sonmez - Fri, 11/20/2015 - 14:00

For those of us who don’t have the luxury of The Force for mental persuasion, negotiation tactics are a vital tool we need to have in our tool belts in order to make the most out of our professional careers.  Luckily, most of us don’t have this guy as our boss or our client. For […]

The post Altering the Deal appeared first on Simple Programmer.

Categories: Programming

Android Developer Story: Gifted Mom reaches more mothers across Africa with Android

Android Developers Blog - Fri, 11/20/2015 - 07:38

Posted by Lily Sheringham, Google Play team

Gifted Mom is an app developed in Cameroon which provides users with basic, yet critical information about pregnancy, breastfeeding and child vaccinations. The widespread use of Android smartphones in Africa has meant that Gifted Mom has been able to reach more people at scale and improve lives.

Watch the creators of Gifted Mom, developer Alain Nteff and doctor Conrad Tankou, explain how they built their business and launched Gifted Mom on Google Play. They also talk about their plans to grow and help people in other developing countries across the continent in the next three years, in order to ultimately tackle maternal and infant mortality.

Find out more about building apps for Android and how to find success on Google Play.

Categories: Programming

Done and Value Are Related! (Part 2)


Just because it is done, doesn’t mean it adds value.

In a recent exchange after the 16th installment of the re-read Saturday of the Mythical Man-Month, Anteneh Berhane called me to ask one of the hardest questions I have ever been asked: why doesn’t the definition of done include value? At its core, the question asks how we can consider work “done” if no business value is delivered, even if the definition of done is satisfied, including demonstrably meeting acceptance criteria. Is software really done if the business need has changed, so that what was delivered is technically correct but misses the mark in today’s business environment? This is a scenario I was all too familiar with during the 1980s and 1990s at the height of large waterfall projects, but see less often in today’s Agile development environment.  Anteneh Berhane’s question reminds us that the problem is still possible.  We discussed five potential problems that disrupt the linkage between done and value.  Today we will discuss the second three.

Scaled efforts without plans for communication and coordination can cause lots of work to be done that doesn’t meet needs. Scaled Agile is a group of teams working together toward a common goal.  These teams need to coordinate in order to achieve a specific goal. Coordination requires the adoption of techniques that help make coordination possible.  For example, the release train engineer and release train planning event are tools used by the Scaled Agile Framework Enterprise (SAFe) to ensure that what gets built is what is needed for the business across all of the teams.  Without communication, planning and feedback tools and techniques to ensure coordination, it is possible for a team or a group of teams to build the wrong thing while meeting all the criteria for being done. When more than one team is working in a coordinated fashion, time and effort must be spent coordinating activities and work between all of the related teams.

A poorly defined value proposition is a BIG problem. One of the premises of Agile software development is that a team will deliver value on a regular cadence.  If the product owner or product management team has not spent the time needed to evaluate the features in the project backlog to determine which have the most value (or if specific requirements have any value at all) the software that gets delivered may be done, may meet acceptance criteria, but may not measure up on the value side of the equation. The simple fix for the product owner or product management group is to evaluate and rank features based on the business value they deliver. (Note: epics and stories can also be evaluated for value.)

Lack of product owner involvement is problematic for all Agile projects.  Pedro Serrador in an interview on SPAMCAST 368 stated that the level of product owner involvement was an indicator of success with Agile. Product owner involvement provides both the vision and the feedback needed to ensure the team is building the right software.  Projects without product owner involvement should not be done using Agile, and perhaps should not be done at all.

The definition of done often implies that the software being delivered has value. Unfortunately, that is not always true.  Evaluating and focusing on value is just as important as meeting the definition of done and verifying that the software meets the acceptance criteria.

Categories: Process Management

To ECC or Not To ECC

Coding Horror - Jeff Atwood - Fri, 11/20/2015 - 00:44

On one of my visits to the Computer History Museum – and by the way this is an absolute must-visit place if you are ever in the San Francisco bay area – I saw an early Google server rack circa 1999 in the exhibits.

Not too fancy, right? Maybe even … a little janky? This is building a computer the Google way:

Instead of buying whatever pre-built rack-mount servers Dell, Compaq, and IBM were selling at the time, Google opted to hand-build their server infrastructure themselves. The sagging motherboards and hard drives are literally propped in place on handmade plywood platforms. The power switches are crudely mounted in front, the network cables draped along each side. The poorly routed power connectors snake their way back to generic PC power supplies in the rear.

Some people might look at these early Google servers and see an amateurish fire hazard. Not me. I see a prescient understanding of how inexpensive commodity hardware would shape today's internet. I felt right at home when I saw this server; it's exactly what I would have done in the same circumstances. This rack is a perfect example of the commodity x86 market D.I.Y. ethic at work: if you want it done right, and done inexpensively, you build it yourself.

This rack is now immortalized in the National Museum of American History. Urs Hölzle posted lots more juicy behind the scenes details, including the exact specifications:

  • Supermicro P6SMB motherboard
  • 256MB PC100 memory
  • Pentium II 400 CPU
  • IBM Deskstar 22GB hard drives (×2)
  • Intel 10/100 network card

When I left Stack Exchange (sorry, Stack Overflow) one of the things that excited me most was embarking on a new project using 100% open source tools. That project is, of course, Discourse.

Inspired by Google and their use of cheap, commodity x86 hardware to scale on top of the open source Linux OS, I also built our own servers. When I get stressed out, when I feel the world weighing heavy on my shoulders and I don't know where to turn … I build servers. It's therapeutic.

I like to give servers a little pep talk while I build them. "Who's the best server! Who's the fastest server!"

— Jeff Atwood (@codinghorror) November 16, 2015

Don't judge me, man.

But more seriously, with the release of Intel's latest Skylake architecture, it's finally time to upgrade our 2013 era Discourse servers to the latest and greatest, something reflective of 2016 – which means building even more servers.

Discourse runs on a Ruby stack and one thing we learned early on is that Ruby demands exceptional single threaded performance, aka, a CPU running as fast as possible. Throwing umptazillion CPU cores at Ruby doesn't buy you a whole lot other than being able to handle more requests at the same time. Which is nice, but doesn't get you speed per se. Someone made a helpful technical video to illustrate exactly how this all works:

This is by no means exclusive to Ruby; other languages like JavaScript and Python also share this trait. And Discourse itself is a JavaScript application delivered through the browser, which exercises the mobile / laptop / desktop client CPU. Mobile devices reaching near-parity with desktop performance in single threaded performance is something we're betting on in a big way with Discourse.

So, good news! Although PC performance has been incremental at best in the last 5 years, between Haswell and Skylake, Intel managed to deliver a respectable per-thread performance bump. Since we are upgrading our servers from Ivy Bridge (very similar to the i7-3770k), the generation before Haswell, I'd expect a solid 33% performance improvement at minimum.

Even worse, the more cores they pack on a single chip, the slower they all go. From Intel's current Xeon E5 lineup:

  • E5-1680 → 8 cores, 3.2 Ghz
  • E5-1650 → 6 cores, 3.5 Ghz
  • E5-1630 → 4 cores, 3.7 Ghz

Sad, isn't it? Which brings me to the following build for our core web tiers, which optimizes for "lots of inexpensive, fast boxes"

2013 build ($2,427 total; 31w idle, 87w BurnP6 load):

  • Xeon E3-1280 V2 Ivy Bridge 3.6 Ghz / 4.0 Ghz quad-core ($640)
  • SuperMicro X9SCM-F-O mobo ($190)
  • 32 GB DDR3-1600 ECC ($292)
  • SC111LT-330CB 1U chassis ($200)
  • Samsung 830 512GB SSD ×2 ($1080)
  • 1U Heatsink ($25)

2016 build ($2,241 total; 14w idle, 81w BurnP6 load):

  • i7-6700k Skylake 4.0 Ghz / 4.2 Ghz quad-core ($370)
  • SuperMicro X11SSZ-QF-O mobo ($230)
  • 64 GB DDR4-2133 ($520)
  • CSE-111LT-330CB 1U chassis ($215)
  • Samsung 850 Pro 1TB SSD ×2 ($886)
  • 1U Heatsink ($20)

So, about 10% cheaper than what we spent in 2013, with 2× the memory, 2× the storage (probably 50-100% faster too), and at least ~33% faster CPU. With lower power draw, to boot! Pretty good. Pretty, pretty, pretty, pretty good.

(Note that the memory bump is only possible thanks to Intel finally relaxing their iron fist of maximum allowed RAM at the low end; that's new to the Skylake generation.)

One thing is conspicuously missing in our 2016 build: Xeons, and ECC Ram. In my defense, this isn't intentional – we wanted the fastest per-thread performance and no Intel Xeon, either currently available or announced, goes to 4.0 GHz with Skylake. Paying half the price for a CPU with better per-thread performance than any Xeon, well, I'm not going to kid you, that's kind of a nice perk too.

So what is ECC all about?

Error-correcting code memory (ECC memory) is a type of computer data storage that can detect and correct the most common kinds of internal data corruption. ECC memory is used in most computers where data corruption cannot be tolerated under any circumstances, such as for scientific or financial computing.

Typically, ECC memory maintains a memory system immune to single-bit errors: the data that is read from each word is always the same as the data that had been written to it, even if one or more bits actually stored have been flipped to the wrong state. Most non-ECC memory cannot detect errors although some non-ECC memory with parity support allows detection but not correction.
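To make the single-bit correction concrete, here is a toy Hamming(7,4) sketch in Go, the same class of code ECC memory implements in hardware (real ECC DIMMs use a wider single-error-correct, double-error-detect code, typically 72 bits storing 64):

```go
package main

import "fmt"

// encode packs 4 data bits into a 7-bit Hamming codeword.
// Layout (1-based positions): p1 p2 d1 p3 d2 d3 d4.
func encode(d [4]int) [7]int {
	var c [7]int
	c[2], c[4], c[5], c[6] = d[0], d[1], d[2], d[3]
	c[0] = c[2] ^ c[4] ^ c[6] // p1 checks positions 1,3,5,7
	c[1] = c[2] ^ c[5] ^ c[6] // p2 checks positions 2,3,6,7
	c[3] = c[4] ^ c[5] ^ c[6] // p3 checks positions 4,5,6,7
	return c
}

// correct recomputes the three parity checks; a nonzero syndrome is the
// 1-based position of the single flipped bit, which is flipped back.
func correct(c [7]int) [7]int {
	s := (c[0] ^ c[2] ^ c[4] ^ c[6]) +
		(c[1]^c[2]^c[5]^c[6])*2 +
		(c[3]^c[4]^c[5]^c[6])*4
	if s != 0 {
		c[s-1] ^= 1
	}
	return c
}

func main() {
	word := encode([4]int{1, 0, 1, 1})
	damaged := word
	damaged[4] ^= 1 // simulate a single flipped bit in RAM
	fmt.Println(correct(damaged) == word) // true: the error was corrected
}
```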

It's received wisdom in the sysadmin community that you always build servers with ECC RAM because, well, you build servers to be reliable, right? Why would anyone intentionally build a server that isn't reliable? Are you crazy, man? Well, looking at that cobbled together Google 1999 server rack, which also utterly lacked any form of ECC RAM, I'm inclined to think that reliability measured by "lots of redundant boxes" is more worthwhile and easier to achieve than the platonic ideal of making every individual server bulletproof.

Being the type of guy who likes to question stuff… I began to question. Why is it that ECC is so essential anyway? If ECC was so important, so critical to the reliable function of computers, why isn't it built into every desktop, laptop, and smartphone in the world by now? Why is it optional? This smells awfully… enterprisey to me.

Now, before everyone stops reading and I get permanently branded as "that crazy guy who hates ECC", I think ECC RAM is fine:

  • The cost difference between ECC and not-ECC is minimal these days.
  • The performance difference between ECC and not-ECC is minimal these days.
  • Even if ECC only protects you from rare 1% hardware error cases that you may never hit until you literally build hundreds or thousands of servers, it's cheap insurance.

I am not anti-insurance, nor am I anti-ECC. But I do seriously question whether ECC is as operationally critical as we have been led to believe, and I think the data shows modern, non-ECC RAM is already extremely reliable.

First, let's look at the Puget Systems reliability stats. These guys build lots of commodity x86 gamer PCs, burn them in, and ship them. They helpfully track statistics on how many parts fail either from burn-in or later in customer use. Go ahead and read through the stats.

For the last two years, CPU reliability has dramatically improved. What is interesting is that this lines up with the launch of the Intel Haswell CPUs, which is when CPU voltage regulation was moved from the motherboard onto the CPU itself. At the time we theorized that this should raise CPU failure rates (since there are more components on the CPU to break) but the data shows that it has actually increased reliability instead.

Even though DDR4 is very new, reliability so far has been excellent. Where DDR3 desktop RAM had an overall failure rate in 2014 of ~0.6%, DDR4 desktop RAM had absolutely no failures.

SSD reliability has dramatically improved recently. This year Samsung and Intel SSDs only had a 0.2% overall failure rate compared to 0.8% in 2013.

Modern commodity computer parts from reputable vendors are amazingly reliable. And the trends show that from 2012 onward, essential PC parts have gotten more reliable, not less. (I can also vouch for the improvement in SSD reliability, as we have had zero server SSD failures in 3 years across our 12 servers with 24+ drives, whereas in 2011 I was writing about the Hot/Crazy SSD Scale.) And doesn't this make sense from a financial standpoint? How does it benefit you as a company to ship unreliable parts? That's money right out of your pocket and the reseller's pocket, plus time spent dealing with returns.
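Those percentages are easier to reason about at fleet scale. A back-of-the-envelope sketch, using the ~0.6% annual DDR3 failure rate quoted above (the fleet sizes are made-up examples, and this treats failures as independent, which is a simplification):

```python
# Probability of seeing at least one part failure in a year across a fleet,
# given a per-part annual failure rate. Assumes independent failures.

def p_at_least_one_failure(per_part_rate, n_parts):
    return 1 - (1 - per_part_rate) ** n_parts

rate = 0.006                     # ~0.6% annual failure rate per DIMM
for dimms in (8, 100, 1000):
    p = p_at_least_one_failure(rate, dimms)
    print(f"{dimms:5d} DIMMs -> {p:.1%} chance of at least one failure/year")
```

One workstation with 8 DIMMs almost certainly sails through the year; a thousand-DIMM fleet is near-guaranteed to see failures. That's why "lots of redundant boxes" is the strategy that actually scales.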

We had a, uh, "spirited" discussion about this internally on our private Discourse instance.

This is not a new debate by any means, but I was frustrated by the lack of data out there. In particular, I'm really questioning the difference between "soft" and "hard" memory errors:

But what is the nature of those errors? Are they soft errors – as is commonly believed – where a stray Alpha particle flips a bit? Or are they hard errors, where a bit gets stuck?

I absolutely believe that hard errors are reasonably common. RAM DIMMS can have bugs, or the chips on the DIMM can fail, or there's a design flaw in circuitry on the DIMM that only manifests in certain corner cases or under extreme loads. I've seen it plenty. But a soft error where a bit of memory randomly flips?

There are two types of soft errors, chip-level soft error and system-level soft error. Chip-level soft errors occur when the radioactive atoms in the chip's material decay and release alpha particles into the chip. Because an alpha particle contains a positive charge and kinetic energy, the particle can hit a memory cell and cause the cell to change state to a different value. The atomic reaction is so tiny that it does not damage the actual structure of the chip.

Outside of airplanes and spacecraft, I have a difficult time believing that soft errors happen with any frequency, otherwise most of the computing devices on the planet would be crashing left and right. I deeply distrust the anecdotal voodoo behind "but one of your computer's memory bits could flip, you'd never know, and corrupted data would be written!" It'd be one thing if we observed this regularly, but I've been unhealthily obsessed with computers since birth and I have never found random memory corruption to be a real, actual problem on any computers I have either owned or had access to.

But who gives a damn what I think. What does the data say?

A 2007 study found that the observed soft error rate in live servers was two orders of magnitude lower than previously predicted:

Our preliminary result suggests that the memory soft error rate in two real production systems (a rack-mounted server environment and a desktop PC environment) is much lower than what the previous studies concluded. Particularly in the server environment, with high probability, the soft error rate is at least two orders of magnitude lower than those reported previously. We discuss several potential causes for this result.

A 2009 study on Google's server farm notes that soft errors were difficult to find:

We provide strong evidence that memory errors are dominated by hard errors, rather than soft errors, which previous work suspects to be the dominant error mode.
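To put study numbers like these on a human scale, DRAM error rates are usually quoted in FIT, failures per billion device-hours. Here's a sketch of the unit conversion; the 25,000 FIT/Mbit figure is illustrative, taken from the low end of the range the 2009 Google paper reports for correctable errors, so treat the exact number as an assumption:

```python
# Convert a FIT rate (failures per 10^9 device-hours) into errors per
# machine-year for a given amount of RAM.

FIT_PER_MBIT = 25_000            # illustrative, low end of the 2009 study
HOURS_PER_YEAR = 24 * 365

def errors_per_year(gigabytes, fit_per_mbit=FIT_PER_MBIT):
    mbits = gigabytes * 8 * 1024                  # GB -> Mbit
    failures_per_hour = fit_per_mbit * mbits / 1e9
    return failures_per_hour * HOURS_PER_YEAR

print(f"8 GB at {FIT_PER_MBIT} FIT/Mbit: "
      f"~{errors_per_year(8):,.0f} corrected errors/year")
```

That fleet-wide average looks alarming, until you notice the same studies found errors heavily concentrated on a small fraction of machines with hard faults, which is exactly the soft-versus-hard distinction at issue here.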

Yet another large scale study from 2012 discovered that RAM errors were dominated by permanent failure modes typical of hard errors:

Our study has several main findings. First, we find that approximately 70% of DRAM faults are recurring (e.g., permanent) faults, while only 30% are transient faults. Second, we find that large multi-bit faults, such as faults that affect an entire row, column, or bank, constitute over 40% of all DRAM faults. Third, we find that almost 5% of DRAM failures affect board-level circuitry such as data (DQ) or strobe (DQS) wires. Finally, we find that chipkill functionality reduced the system failure rate from DRAM faults by 36x.

In the end, we decided the non-ECC RAM risk was acceptable for every tier of service except our databases. Which is kind of a bummer since higher end Skylake Xeons got pushed back to the big Purley platform upgrade in 2017. Regardless, we burn in every server we build with a complete run of memtest86 and an overnight run of prime95/mprime, and you should too. There's one whirring away through endless memory tests right behind me as I write this.
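For flavor, here's a toy version of the kind of pattern test those burn-in tools run. It's a sketch only, not a substitute for memtest86: a real tester runs bare-metal so it can touch nearly all physical RAM, while a userspace script only exercises whatever pages the OS happens to hand it:

```python
# Toy walking-ones memory test: write a single-set-bit pattern across a
# buffer, read it back, and count mismatches. Real burn-in tools do this
# (and much more) against physical memory, bare-metal.

def walking_ones_test(buf_size=1 << 20):
    buf = bytearray(buf_size)
    errors = 0
    for bit in range(8):               # walk one set bit: 1, 2, 4, ... 128
        pattern = 1 << bit
        for i in range(buf_size):
            buf[i] = pattern           # write pass
        for i in range(buf_size):
            if buf[i] != pattern:      # read-back/verify pass
                errors += 1
    return errors

print(f"mismatches: {walking_ones_test(64 * 1024)}")  # healthy RAM: 0
```

Walking a single set bit through every cell is good at catching stuck bits and shorted adjacent lines, which, per the studies above, are exactly the hard-error modes that dominate in practice.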

I find it very, very suspicious that ECC – if it is so critical to preventing these random, memory corrupting bit flips – has not already been built into every type of RAM that we ship in the ubiquitous computing devices all around the world as a cost of doing business. But I am by no means opposed to paying a small insurance premium for server farms, either. You'll have to look at the data and decide for yourself. Mostly I wanted to collect all this information in one place so people who are also evaluating the cost/benefit of ECC RAM for themselves can read the studies and decide what they want to do.

Please feel free to leave comments if you have other studies to cite, or significant measured data to share.

Categories: Programming

Android Studio 1.5

Android Developers Blog - Thu, 11/19/2015 - 19:53

Posted by Jamal Eason, Product Manager, Android

Android Studio 1.5 is now available in the stable release channel. The latest release is focused on delivering more stability, with most of the enhancements being made under the hood (along with addressing several bugs).

Some of the specific bug fixes include the ability to use short names when code-completing custom views.

In addition to the stability improvements and bug fixes, we’ve added a new feature to the memory profiler. It can now assist you in detecting some of the most commonly known causes of leaked activities.

There are also several new lint checks, including one that warns you if you are attempting to override resources referenced from the manifest.

If you’re already using Android Studio, you can check for updates from the navigation menu (Help → Check for Update [Windows/Linux], Android Studio → Check for Updates [OS X]). For new users, you can learn more about Android Studio, or download the stable version from the Android Studio site.

As always, we welcome your feedback on how we can help you. You can also connect with the Android developer tools team via Google+. And don’t worry about what’s in the box from the video. It’s nothing. Really. Forget I mentioned it.

Categories: Programming