Methods & Tools: Feed Aggregator

Reasons for Continuous Planning

I’m working on the program management book, specifically on the release planning chapter. One of the problems I see in programs is that the organization/senior management/product manager wants a “commitment” for an entire quarter. Since they think in quarter-long roadmaps, that’s not unreasonable—from their perspective.

There is a problem with commitments and the need to plan an entire quarter: this is legacy (waterfall) thinking. Committing is not what the company actually wants. Delivery is what the company wants. The more often you deliver, the more often you can change.

That means changing how often you release and replan.

Consider these challenges for a one-quarter commitment:

  1. Even if you have small stories, you might not be able to estimate perfectly. You might finish something in less time than you had planned. Do you want to take advantage of schedule advances?
  2. In the case of too-large stories, where you can’t easily provide a good estimate and need a percent confidence or some other mechanism to explain risk, you are (in my experience) likely to underestimate.
  3. What if something changes mid-quarter, and you want more options or a change in what the feature teams can deliver? Do you want to wait until the end of a quarter to change the program’s direction?

If you “commit” on a shorter cadence, you can manage these problems. (I prefer the term replan.)

If you "commit" to no more than a month at a time, you can see the product evolve, provide feedback across the program, and change what you do at every monthly milestone. That’s better.

Here’s a novel idea: Don’t commit to anything at all. Use continuous planning.

(Figure: a one-quarter agile roadmap)

If you look at the one-quarter roadmap, you can see I show three iterations’ worth of stories as MVPs. In my experience, that is at least one iteration too much look-ahead. I know very few teams who can see six weeks out. I know many teams who can see to the next iteration. I know a few teams who can see two iterations.

What does that mean for planning?

Do continuous planning with short stories. You can keep the 6-quarter roadmap. That’s fine. The roadmap is a wish list. Don’t commit to a one-quarter roadmap. If you need a commitment, commit to one iteration at a time. Or, in flow/kanban, commit to one story at a time.

That will encourage everyone to:

  1. Think small. Small stories, short iterations, and asking every team to manage its WIP (work in progress) will help the entire program maintain momentum.
  2. See interdependencies. The smaller the features, the clearer the interdependencies are.
  3. Plan smaller things and plan for less time so you can reduce the planning load for the program. (I bet your planning for one iteration or two is much better and takes much less time than your one-quarter planning.)
  4. Use deliverable planning (“do these features”) in a rolling wave (continue to replan as teams deliver).

These ideas will help you see value more often in your program. When you release often and replan, you build trust as a program. Your managers might stop asking for “commits.”

If you keep the planning small, you don’t need to gather everyone in one big room once a quarter for release planning. If you do continuous planning, you might never need everyone in one room for planning. You might want everyone in one room for a kickoff or to help people see who is working on the program. That’s different than a big planning session, where people plan instead of deliver value.

If you are managing a program, what would it take for you to do continuous planning? What impediments can you see? What risks would you have planning this way?

Oh, and if you are on your way to agile and you use release trains, remember that the release train commits to a date, not scope and date.

Consider planning and replanning every week or two. What would it take for your program to do that?

Categories: Project Management

How Google Invented an Amazing Datacenter Network Only They Could Create

 

Google with justly earned pride recently announced:

Today at the 2015 Open Network Summit, we are revealing for the first time the details of five generations of our in-house network technology. From Firehose, our first in-house datacenter network, ten years ago to our latest-generation Jupiter network, we’ve increased the capacity of a single datacenter network more than 100x. Our current generation — Jupiter fabrics — can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, such capacity would be enough for 100,000 servers to exchange information at 10Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.

Google’s datacenter network is the magic behind much of what makes Google really work. But what is "bisection bandwidth" and why does it matter? We talked about bisection bandwidth a while back in Changing Architectures: New Datacenter Networks Will Set Your Code And Data Free. In short, bisection bandwidth is the worst-case bandwidth available between one half of a network’s servers and the other half; it measures how fast the servers inside a datacenter can talk to each other.
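As a quick sanity check on the announcement’s numbers (my own back-of-the-envelope sketch, not from the talk), 100,000 servers at 10 Gb/s each does indeed add up to 1 Pb/s:

// Back-of-the-envelope check of the announcement's bisection bandwidth claim.
var servers = 100000;
var gbpsPerServer = 10;                     // 10 Gb/s per server
var totalGbps = servers * gbpsPerServer;    // 1,000,000 Gb/s
var totalPbps = totalGbps / 1000000;        // 1 Pb/s = 1,000,000 Gb/s
console.log(totalPbps + " Pb/s");           // -> "1 Pb/s"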

Historically datacenter networks were oriented around talking to users. Let’s say a request for a web page came in from a browser. The request would go to a server and a reply was crafted by talking to just a few other servers, or perhaps even none at all, and the reply would be sent back to the client. This style of network is called a North/South oriented network. Very little internal communication was needed to implement a request.

That all changed as website and API services grew richer over time. Now literally thousands of backend requests can be made to create a single web page. Mind blowing. This meant communication shifted from being dominated by talking to users to talking to other machines within a datacenter. So these are called East/West oriented networks.

The shift to East/West-dominant communication patterns meant a different topology was needed for datacenter networks. The old traditional fat tree network designs were out and something new needed to take their place.

Google has been at the forefront of developing network designs that support rich services, largely because of their guiding vision of seeing The Datacenter as a Computer. Once your datacenter is the computer, your network is equivalent to the backplane of a single computer, so it must be as fast and reliable as possible so that remote disk and remote storage can be accessed as if they were local.

Google’s efforts revolve around a three-pronged plan of attack: use a Clos topology, use SDN (Software Defined Networking), and build their own custom gear in their own Googlish way.

Until now we’ve had a limited exposure to Google’s network designs. While we don’t exactly have an all access pass, Amin Vahdat, Google Fellow and Technical Lead for networking at Google, shared a lot of juicy details in a great talk: ONS [Open Networking Summit] 2015: Wednesday Keynote. There’s also a paper: Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network.

Why release details earlier than they usually would? Google has real competition in Amazon, and they need to find compelling points of differentiation. Google hopes their datacenter network is one such point.

So what makes Google different? The overall message:

  • The end of Moore’s Law means how programs are built is changing.

  • Google has figured it out. Google knows how to build great networks and achieve proper datacenter balance.

  • You can prosper by taking advantage of the innovations and capabilities of Google’s Cloud Platform, the very same platform that powers Google Search.

  • So climb on board, the network is fine! 

Is that enough? Perhaps it's not a message with mass appeal, but it may find a home with the discriminating buyer. 

Some key points from the talk for me:

  • We don’t know how to build big networks that deliver lots of bandwidth. Google says their network provides 1 Pb/sec of total bisection bandwidth, but it turns out that’s not nearly enough. To support a datacenter’s worth of large compute servers you’ll need 5 Pb/sec networks. Keep in mind the entire internet today is probably near 200Tb/s.

  • It’s more efficient to schedule jobs over huge clusters. Otherwise you have leftover CPU in one place and leftover memory in another. So if you can build your system correctly, a datacenter scale computer gives you a decided economy of scale.

  • Google built their datacenter network system using lessons they learned from the server and storage world: scale out, logically centralize, use commodity components, and never ever manage singlets of anything. Manage all your servers, storage, and networks as a unified whole.

  • The I/O gap is huge. Amin says it has to get solved; if it doesn’t, we’ll stop innovating. Storage capacity has increased through disaggregation. The opportunity is to access global datacenter storage as if it were local. This will get harder and harder with flash and NVM. A new tier of flash and NVM will completely change programming models. Note: unfortunately he didn’t expand on this notion; I dearly wish he had. Amin, can we talk?

What you look for in a good story are characters that act from a core identity. Here we see Google operating from a unique vision that grew organically from their deep experience building scalable software systems. Probably only Google would have had the guts to follow their vision through and build a datacenter network so completely different from accepted wisdom. That takes huge huevos. And it makes for a good story.

Here’s my hopelessly inadequate gloss on the talk:

Categories: Architecture

How to Predict the Release of Your Project Without Estimating

From the Editor of Methods & Tools - Mon, 08/10/2015 - 15:41
We often hear that estimating is a must in project management. “We can’t make decisions without them” we hear often. This video shows examples of how you can predict a release date of a project without any estimates, relying only on easily available data. Learn how you can follow progress on a project at all […]

Low-overhead rendering with Vulkan

Android Developers Blog - Mon, 08/10/2015 - 13:59

Posted by Shannon Woods, Technical Program Manager

Developers of games and 3D graphics applications have one key challenge to meet: How complex a scene can they draw in a small fraction of a second? Much of the work in graphics development goes into organizing data so it can be efficiently consumed by the GPU for rendering. But even the most careful developers can hit unforeseen bottlenecks, in part because the drivers for some graphics processors may reorganize all of that data before it can actually be processed. The APIs used to control these drivers are also not designed for multi-threaded use, requiring synchronization with locks around calls that could be more efficiently done in parallel. All of this results in CPU overhead, which consumes time and power that you’d probably prefer to spend drawing your scene.

Lowering overhead and handing control to developers

In order to address some of the sources of CPU overhead and provide developers with more explicit control over rendering, we’ve been working to bring a new 3D rendering API, Vulkan™, to Android. Like OpenGL™ ES, Vulkan is an open standard for 3D graphics and rendering maintained by Khronos. Vulkan is being designed from the ground up to minimize CPU overhead in the driver, and allow your application to control GPU operation more directly. Vulkan also enables better parallelization by allowing multiple threads to perform work such as command buffer construction at once.

An API is only useful if it does what you expect

To make it easier to write an application once that works across a variety of devices, Android 5.0 Lollipop significantly expanded the Android Compatibility Test Suite (CTS) with over fifty thousand new tests for OpenGL ES, and many more have been added since. This provides an extensive open source test suite for identifying problems in drivers so that they can be fixed, creating a more robust and reliable experience for both developers and end users. For Vulkan, we’ll not only develop similar tests for use in the Android CTS, but we’ll also contribute them to Khronos for use in Vulkan’s own open source Conformance Test Suite. This will enable Khronos to test Vulkan drivers across platforms and hardware, and improve the 3D graphics ecosystem as a whole.

It’s all about developer choice

We’ll be working hard to help create, test, and ship Vulkan, but at the same time, we’re also going to contribute to and support OpenGL ES. As a developer, you’ll be able to choose which API is right for you: the simplicity of OpenGL ES, or the explicit control of Vulkan. We’re committed to providing an excellent developer experience, no matter which API you choose.

Vulkan is still under development, but you’ll be able to find specifications, tests, and tools once they are released at http://www.khronos.org/vulkan.

Categories: Programming

Testing UI changes in large web applications

Xebia Blog - Mon, 08/10/2015 - 13:51

When a web application starts to grow in terms of functionality, number of screens and amount of code, automated testing becomes a necessity. Not only will these tests prevent you from delivering bugs to your users but also help to maintain a high speed of development. This ensures that you'll be focusing on new and better features without having to fix bugs in the existing ones.

However, even with all kinds of unit-, integration- and end-to-end tests in place, you'll still end up with a huge blind spot: does your application still look like it's supposed to?

Can we test for this as well? (hint: we can).

Breaking the web's UI is easy

A web application's look is determined by a myriad of HTML tags and CSS rules which are often re-used in many different combinations. And therein lies the problem: any seemingly innocuous change to markup or CSS could lead to a broken layout, unaligned elements or other unintended side effects. A change in CSS or markup for one screen could lead to problems on another.

Additionally, as browsers are frequently updated, CSS and markup bugs might be either fixed or introduced. How will you know if your application still looks good in the latest Firefox or Chrome version, or in the next big browser of the future?

So how do we test this?

The most obvious method to prevent visual regressions in a web application is to manually click through every screen of an application using several browsers on different platforms, looking for problems. While this solution might work fine at first, it does not scale very well. The amount of screens you'll have to look through will increase, which will steadily increase the time you'll need for testing. This in turn will slow your development speed considerably.

Clicking through every screen every time you want to release a new feature is a very tedious process. And because you'll be looking at the same screens over and over again, you (and possibly your testing colleagues) will start to overlook things.

So this manual process slows down your development, it's error prone and, most importantly, it's no fun!

Automate all the things?

As a developer, my usual response to repetitive manual processes is to automate them away with some clever scripts or tools. Sadly, this solution won't work either. Currently it's not possible to let a script determine which visual change to a page is good or bad. While we might delegate this task to some revolutionary artificial intelligence in the future, it's not a solution we can use right now.

What we can do: automate the pieces of the visual testing process where we can, while still having humans determine whether a visual change is intended.

Taking into account our quality requirements and our need for development speed, we'll be looking for a tool that:

  • minimizes the manual steps in our development workflow
  • makes it easy to create, update, debug and run the tests
  • provides a solid user- and developer/tester experience
Introducing: VisualReview

To address these issues we're developing a new tool called VisualReview. Its goal is to provide a productive and human-friendly workflow for testing and reviewing your web application's layout for any regressions. In short, VisualReview allows you to:

  • use a (scripting) environment of your choice to automate screen navigation and take screenshots of selected screens
  • compare these screenshots against previously accepted ones
  • accept or reject any differences between runs in a user-friendly workflow

With these features (and more to come) VisualReview's primary focus is to provide a great development process and environment for development teams.

How does it work?

VisualReview acts as a server that receives screenshots through a regular HTTP upload. When a screenshot is received, it's compared against a baseline and any differences are stored. After all screenshots have been analyzed, someone from your team (a developer, tester or any other role) opens the server's analysis page to view the differences and accepts or rejects them. Every screenshot that's been accepted will be set as the baseline for future tests.
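Conceptually, the screenshot comparison boils down to a per-pixel diff against the baseline. The sketch below is only an illustration of that idea, not VisualReview's actual algorithm; it assumes both images have already been decoded into equal-sized RGBA byte arrays.

// Illustrative per-pixel diff: returns the indices of pixels that differ
// noticeably between a baseline screenshot and a new one.
function diffPixels(baseline, screenshot, tolerance) {
  var changed = [];
  for (var i = 0; i < baseline.length; i += 4) {   // RGBA: 4 bytes per pixel
    var delta = Math.abs(baseline[i] - screenshot[i]) +
                Math.abs(baseline[i + 1] - screenshot[i + 1]) +
                Math.abs(baseline[i + 2] - screenshot[i + 2]);
    if (delta > tolerance) {
      changed.push(i / 4);                         // pixel index to highlight
    }
  }
  return changed;
}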

(Figure: how VisualReview works)
Sending screenshots to VisualReview is typically done from a test script. We already provide an API for Protractor (AngularJS's browser testing tool, basically an Angular-friendly wrapper around Selenium); however, any environment could potentially use VisualReview, as the upload is done using a simple HTTP REST call. A great example of this happened during a recent meetup where we presented VisualReview: a couple of attendees created a node client for use in their own project, and a working version was running even before the meetup was over.

Example workflow

To illustrate how this works in practice I'll be using an example web application. In this case a twitter clone called 'Deep Thoughts' where users can post a single-sentence thought, similar to Reddit's shower thoughts.
(Figure: the Deep Thoughts example site)
Deep Thoughts is an Angular application, so I'll be using Angular's browser testing tool Protractor to test for visual changes. Protractor does not support sending screenshots to VisualReview by default, so we'll be using visualreview-protractor as a dependency of the Protractor test suite. After adding some additional Protractor configuration and making sure the VisualReview server is running, we're ready to run the test script. The test script could look like this:

var vr = browser.params.visualreview;
describe('the deep thoughts app', function() {
  it('should show the homepage', function() {
    browser.get('http://localhost:8000/#/thoughts');
    vr.takeScreenshot('main');
  });
  [...]
});

With all pieces in place, we can now run the Protractor script:

protractor my-protractor-config.js

When all tests have been executed, the test script ends with the following message:

VisualReview-protractor: test finished. Your results can be viewed at: http://localhost:7000/#/1/1/2/rp

Opening the link in a browser shows the VisualReview screenshot analysis tool.

VisualReview analysis screen

For this example we've already created a baseline of images, so this screen now highlights differences between the baseline and the new screenshot. As you can see, the left and right sides of the submit button are highlighted in red: it seems that someone has changed the button's width. Using keyboard or mouse navigation, I can view both the new screenshot and the baseline screenshot, with the differences between the two highlighted in red.

Now I can decide whether or not I'm going to accept this change using the top menu.

Accepting or rejecting screenshots in VisualReview

If I accept this change, the screenshot will replace the one in the baseline. If I reject it, the baseline image will remain as it is, while this screenshot is marked as a 'rejection'. With this rejection state I can point other team members to all the rejected screenshots by using the filter option, which allows for better team cooperation.

VisualReview filter menu

Open source

VisualReview is an open source project hosted on github. We recently released our first stable version and are very interested in your feedback. Try out the latest release or run it from an example project. Let us know what you think!

 

Which of These Fears is Holding You Back

Making the Complex Simple - John Sonmez - Mon, 08/10/2015 - 13:00

I’ve sat here looking at this blank page for far too long. I wanted to start off by telling you about why fear is so limiting. Why fears—and the various manifestations of fear—are crippling to you, me, and everyone else in this world. But, the truth is: I… was afraid. I was afraid that I […]

The post Which of These Fears is Holding You Back appeared first on Simple Programmer.

Categories: Programming

Reasoning About the "Estimating" Problem

Herding Cats - Glen Alleman - Mon, 08/10/2015 - 00:59

Let's start with a background piece on estimating: the Fermi problem. A Fermi estimate is an order estimate of something - not an order of magnitude (that's a 10X estimate, easy for anyone to make). These types of problems are encountered in physics and engineering education. I know this from personal experience in oral exams, where we were asked to estimate something quickly on the blackboard (yes, the blackboard). Something like: what is the orbital velocity of a star with a specific mass composed of a specific set of fusion elements? You have 5 minutes, young student - work quickly.

These back-of-the-envelope calculations are well-known exercises that show how to make estimates in the presence of uncertainty and with very little data in hand. The technique is named after Enrico Fermi, for his ability to make good approximations with little or no actual data. These types of problems involve making justified guesses (not the uninformed guesses we see in many domains), with upper and lower variances.

A nice example: how many piano tuners were there in Chicago in 2009?

  1. There are approximately 9,000,000 people living in Chicago.
  2. On average, there are two persons in each household in Chicago.
  3. Roughly one household in twenty has a piano that is tuned regularly.
  4. Pianos that are tuned regularly are tuned on average about once per year.
  5. It takes a piano tuner about two hours to tune a piano, including travel time.
  6. Each piano tuner works eight hours in a day, five days in a week, and 50 weeks in a year.

With these assumptions, the number of piano tunings a year is approximately

  • (9,000,000 people in Chicago / 2 people per house) x 1 piano per 20 houses x 1 tuning per piano per year = 225,000 tunings per year
  • (50 weeks per year x 5 days per week x 8 hours a day) / 2 hours to tune = 1,000 tunings per year per piano tuner
  • (225,000 tunings per year) / 1000 tunings per year per tuner = 225 tuners in Chicago in 2009
  • The actual number in 2009 was 290
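The whole chain of assumptions fits in a few lines; here is a quick sketch (mine, not from the original post) that reproduces the arithmetic above.

// Fermi estimate: piano tuners in Chicago, reproducing the arithmetic above.
var population = 9000000;
var peoplePerHousehold = 2;
var pianosPerHousehold = 1 / 20;        // one household in twenty has a tuned piano
var tuningsPerPianoPerYear = 1;
var hoursPerTuning = 2;                 // including travel time
var hoursPerTunerPerYear = 8 * 5 * 50;  // 8 hours/day, 5 days/week, 50 weeks/year

var tuningsNeeded = (population / peoplePerHousehold) * pianosPerHousehold * tuningsPerPianoPerYear;
var tuningsPerTuner = hoursPerTunerPerYear / hoursPerTuning;
console.log(Math.round(tuningsNeeded / tuningsPerTuner)); // ~225 (the actual 2009 number was 290)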

This is similar to the Drake equation, which estimates the number of intelligent civilizations in our galaxy. Missing this kind of training, by the way, may be one of the reasons estimating is seen as hard or even not possible by some - they missed the opportunities where estimating is taught.

What Does This Have to do with Project Management?

Estimation theory is a critical aspect of project management. When spending other people's money, we need to make decisions in the presence of uncertainty. Estimation theory is a branch of statistics dealing with estimating values of parameters (numbers) based on measured/empirical data that have random values. The parameters describe the underlying physical process in a way that their values affect the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements.
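As a minimal illustration of that idea (a sketch of my own, under the assumption that past task durations are the measured data), the sample mean is an estimator of the unknown underlying duration, and the standard error says how much to trust it:

// Sketch: estimating an unknown parameter (mean task duration) from noisy measurements.
function estimateMean(samples) {
  var n = samples.length;
  var mean = samples.reduce(function (sum, x) { return sum + x; }, 0) / n;
  var variance = samples.reduce(function (sum, x) {
    return sum + Math.pow(x - mean, 2);
  }, 0) / (n - 1);
  return { mean: mean, standardError: Math.sqrt(variance / n) };
}

// Durations (in days) of similar past tasks - the empirical data.
console.log(estimateMean([4.5, 6, 5.5, 7, 5, 6.5]));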

In the project world, we have three core variables - cost, schedule, and technical performance. These are interdependent, likely non-linear, and many times non-stationary (evolving in time). There is a nice MIT OCW course on the Art of Approximation in Science and Engineering. These back-of-the-envelope estimates are critical to success in engineering and science. They are also critical to estimating software development.

So when we hear that it's hard or even not possible to estimate software development, don't believe it for a moment. Here's a butt-simple guide: How to Estimate Almost Any Software Deliverable.

The next thing we hear is that estimates are the smell of dysfunction. And of course no dysfunctions are named, no root cause of the dysfunction is named, and no corrective actions are named - only the advice to stop estimating, since estimates are evil, used as commitments, and misused to punish developers.

The Real Bottom Line

In business, there is a framing assumption of managerial finance. This framing assumption informs those of us on the business side, spending our money, when making decisions. This is the basis of the microeconomics of decision making.

When it is conjectured that decisions can be made in the presence of uncertainty without estimating the cost and impacts of those decisions, we have to ask: are those making the conjecture informed by any framework based in the processes of business management? It appears not.

Related articles: Estimating Processes in Support of Economic Analysis; Making Conjectures Without Testable Outcomes; Applying the Right Ideas to the Wrong Problem; Root Cause of Project Failure; IT Risk Management
Categories: Project Management

Flaws and Fallacies of #NoEstimates

Herding Cats - Glen Alleman - Sun, 08/09/2015 - 23:39

All the work we do in the projects domain is driven by uncertainty. Uncertainty of some probabilistic future event impacting our project. Uncertainty in the work activities performed while developing a product or service.

Decision making in the presence of these uncertainties is a natural process in all of business.

The decision maker is asked to express her beliefs by assigning probabilities to certain possible states of the system in the future and the resulting outcomes of those states.

What's the chance we'll have this puppy ready for VMWorld in August? What's the probability that when we go live and 300,000 users logon we'll be able to handle the load? What's our test coverage for the upcoming release given we've added 14 new enhancements to the code base this quarter? Questions like that are normal everyday business questions, along with what's the expected delivery date, what's the expected total sunk cost, and what's the expected bookable value measured in Dead Presidents for the system when it goes live?

To answer these and the unlimited number of other business, technical, operational, performance, security, and financial questions, we need to know something about probability and statistics. This knowledge is an essential tool for decision making, no matter the domain.

Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write - H.G. Wells

If we accept the notion that all project work is probabilistic - driven by the underlying statistical processes of time, cost, and technical outcomes, including effectiveness, performance, capabilities, and all the other ...ilities that manifest and determine value after a system is put into initial use - then these conditions are the source of uncertainty, which comes in two types:

  • Reducible - event-based, with a probability of occurrence within a specified time period.
  • Irreducible - naturally occurring variance, described by a probability distribution function of the underlying process.
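To make those two types concrete, here is a minimal Monte Carlo sketch (my own illustration, with made-up numbers): irreducible uncertainty is modeled by sampling each task's duration from a triangular distribution, and a reducible, event-based risk is modeled as a probability of occurrence with a fixed impact.

// Monte Carlo sketch: project duration with irreducible variance and one reducible risk.
function triangular(min, mode, max) {               // irreducible: natural variation
  var u = Math.random();
  var f = (mode - min) / (max - min);
  return u < f
    ? min + Math.sqrt(u * (max - min) * (mode - min))
    : max - Math.sqrt((1 - u) * (max - min) * (max - mode));
}

function simulateOnce() {
  var days = triangular(8, 10, 16) + triangular(4, 5, 9);  // two tasks
  if (Math.random() < 0.2) days += 5;               // reducible: 20% chance of a 5-day event
  return days;
}

var runs = [];
for (var i = 0; i < 10000; i++) runs.push(simulateOnce());
runs.sort(function (a, b) { return a - b; });
console.log("80% confidence: done within " + runs[Math.floor(runs.length * 0.8)].toFixed(1) + " days");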

If you don't accept this - that all project work is probabilistic in nature - stop reading, this Blog is not for you.

If you do accept that all project work is uncertain, then there are some more assumptions we need in order to make sense of the decision making process. The term statistic has two definitions - an old one and a current one. The old one means a fact, referring to numerical facts. A numerical fact is a measurement, a count, or a rank. This number can represent a total, an average or a percentage of several such measures. The term also applies to the broad discipline of statistical manipulation, in the same way accounting applies to entering and balancing accounts.

Statistics in the second sense is a set of methods for obtaining, organizing, and summarizing numerical facts. These facts usually represent partial rather than complete knowledge about a situation; for example, a sample of the population rather than a count of the entire population, as in a census.

These numbers - statistics - are usually subjected to formal statistical analysis to help in our decision making in the presence of uncertainty.

In our software project world, uncertainty is an inherent fact. Software uncertainty is likely much higher than in construction, since requirements in software development are soft, unlike the requirements in interstate highway development. But while domains may differ in their level of uncertainty, estimates are still needed to make decisions in the presence of these uncertainties. Highway development has many uncertainties too - not the least of which are weather and weather delays.

When you measure what you are speaking about and express it in numbers you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind - Lord Kelvin

Decisions are made on data. Otherwise those decisions are just gut feel, intuition, and, at their core, guesses. When you are guessing with other people's money you have a low probability of keeping your job or of the business staying in business.

... a tale told by an idiot, full of sound and fury, signifying nothing - Shakespeare

When we hear personal anecdotes about how to correct a problem and the conjecture that those anecdotes are applicable outside the individual telling the anecdote - beware. Without a test of any conjecture it is just a conjecture. 

He uses statistics as a drunken man uses lampposts - for support rather than illumination - Andrew Lang

We many times confuse a symptom with the cause. When reading about all the failures in IT projects - the probability of failure, the number of failures versus successes - there is rarely, in those naive posts on the topic, any assessment of the cause of the failure. The root cause analysis is not present. The Chaos Report is the most egregious example of this.

There is no merit where there is no trial; and till experience stamps the mark of strength, cowards may pass for heroes, and faith for falsehood - A. Hill

Tossing out anecdotes, platitudes, and misquoted quotes does not make for a credible argument for anything. "I knew a person who did X successfully, therefore you should have the same experience" is common. Or "just try it, you may find it works for you just like it worked for me."

It seems there are no Principles or tested Practices in the approach to improving project success - just platitudes and anecdotes masking chatter as process improvement advice.

I started to write a detailed exposition using this material for the #NoEstimates conjecture that decisions can be made without an estimate. But Steve McConnell's post is much better than anything I could have done. So here's the wrap up...

When it is conjectured that decisions - any decisions, some decisions, self-selected decisions - can be made in the presence of uncertainty without also making an estimate of the outcome of that decision, the cost of that decision, and the impact of that decision, then let's hear how, so we can test it outside personal opinion and anecdote.

References 

It's time for #NoEstimates advocates to provide some principle-based examples of how to make decisions in the presence of uncertainty without estimating. The books below are populist books (books without the heavy math), but they are still capable of conveying the principles of the topic and can be a source of learning.

  1. Flaws and Fallacies in Statistical Thinking, Stephen K. Campbell, Prentice Hall, 1974
  2. The Economics of Iterative Software Development: Steering Toward Better Business Results, Walker Royce, Kurt Bittner, and Mike Perrow, Addison Wesley, 2009.
  3. How Not to be Wrong: The Power of Mathematical Thinking, Jordan Ellenberg, Penguin Press, 2014
  4. Hard Facts, Dangerous Half-Truths & Total Nonsense: Profiting from Evidence-Based Management, Jeffrey Pfeffer and Robert I. Sutton, Harvard Business School Press, 2006.
  5. How to Measure Anything, Finding the Value of Intangibles in Business, 3rd Edition, Douglas W. Hubbard, John Wiley & Sons, 2014.
  6. Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie With Statistics, Gary Smith
  7. Center for Informed Decision Making
  8. Decision Making for the Professional, Peter McNamee and John Celona

Some actual math books on the estimating problem

  1. Probability Methods for Cost Uncertainty Analysis, Paul R. Garvey
  2. Making Hard Decisions: An Introduction to Decision Analysis, 2nd Edition, Robert T. Clemen, Duxbury Press, 1996.
  3. Estimating Software Intensive Systems, Richard D. Stutzke, Addison Wesley, 2005.
  4. Probabilities as Similarity-Weighted Frequencies, Antoine Billot, Itzhak Gilboa, Dov Samet, and David Schmeidler
Related articles: Making Conjectures Without Testable Outcomes; Estimating Processes in Support of Economic Analysis; Applying the Right Ideas to the Wrong Problem; Estimating and Making Decisions in Presence of Uncertainty
Categories: Project Management

SPaMCAST 354 - Allan Kelly, #NoProjects

Software Process and Measurement Cast - Sun, 08/09/2015 - 22:00

This week’s Software Process and Measurement Cast features our interview with Allan Kelly. We talked about #NoProjects and having a focus on delivering a consistent flow of value. The classic project framework causes us to focus on being on-time, on-budget and on-scope, but not on-value. If we don’t focus on delivering the maximum value, we are doing both our customers and ourselves a great disservice.

Allan Kelly advises teams from many different companies and domains on adopting and deepening Agile practices and development in general. He specializes in working with software product companies and aligning products and processes with company strategy. When he is not with clients he writes far too much.  

He holds BSc and MBA degrees, is the author of three books: "Xanpan - team centric Agile Software Development" (https://leanpub.com/xanpan), "Business Patterns for Software Developers" and “Changing Software Development: Learning to be Agile”. In addition he is the originator of Retrospective Dialogue Sheets (http://www.dialoguesheets.com) and a regular conference speaker. He can be found on Twitter as @allankellynet (http://twitter.com/allankellynet) and blogs (http://blog.allankelly.net).

Call to Action!

I have a challenge for the Software Process and Measurement Cast listeners for the next few weeks. I would like you to find one person that you think would like the podcast and introduce them to the cast. This might mean sending them the URL or teaching them how to download podcasts. If you like the podcast and think it is valuable they will be thankful to you for introducing them to the Software Process and Measurement Cast. Thank you in advance!

Re-Read Saturday News

Remember that the Re-Read Saturday of The Mythical Man-Month is in full swing.  This week we tackle the essay titled “Passing the Word”!  Check out the new installment at Software Process and Measurement Blog.

Upcoming Events

Software Quality and Test Management
September 13 – 18, 2015
San Diego, California
http://qualitymanagementconference.com/

I will be speaking on the impact of cognitive biases on teams.  Let me know if you are attending! If you are still deciding on attending let me know because I have a discount code.

Agile Development Conference East
November 8-13, 2015
Orlando, Florida
http://adceast.techwell.com/

I will be speaking on November 12th on the topic of Agile Risk. Let me know if you are going and we will have a SPaMCAST Meetup.

Next SPaMCAST

The next Software Process and Measurement Cast features our essay titled Agile Success. How do we define success with Agile? If we can’t define what success using Agile is and how we can measure it, anyone adopting Agile is bound to wander aimlessly. Wandering aimlessly is bad for your career and potentially for the careers of everyone around you!

Shameless Ad for my book!

Mastering Software Project Management: Best Practices, Tools and Techniques, co-authored by Murali Chemuturi and myself and published by J. Ross Publishing. We have received unsolicited reviews like the following: “This book will prove that software projects should not be a tedious process, neither for you or your team.” Support SPaMCAST by buying the book here. Available in English and Chinese.

Categories: Process Management


Re-Read Saturday: The Mythical Man-Month, Part 6 – Passing the Word


In the sixth essay of The Mythical Man-Month, titled Passing the Word, Brooks tackles one of the largest problems any large project will have: communicating the architecture. Whether you have defined the architecture upfront or just as it is needed, passing the word is critical to ensuring that everyone stays on the same page and that what gets built works and is what is wanted. This essay provides Brooks’ take on how to ensure that a large number of people hear, understand and implement the architects’ decisions. He describes seven interlocking techniques for passing the word:

  1. Written specifications. Many developers value documentation on the same level as reality TV shows; however, as projects and products scale past one or two co-located teams, documentation becomes a valuable tool. Written specifications define limits, appearance, UX and interfaces. In essence, the written specification is the primary output of the architect; it provides everyone with the boundaries of the product and how the user will interact with it. What the written spec does not define is how the guts of the product will work, which is the purview of the developers. Written specifications do not have to mean large paper manuals; tools like wikis have been used to capture and transmit specifications and to solicit interaction.
  2. Formal definitions. Words are great but imprecise, even when everyone involved in an effort shares the same first language (which is less and less true as the metaphorical world shrinks). Formal languages and modeling techniques can be used to document the specification and to capture exceptions and explanations. Alternatively, simulators and prototypes are mechanisms that can be used to capture and document the specifications.
    The problem with having two definitions of the same idea is what happens when they disagree. Brooks points out that the answer is to never have exactly two communication methods; rather, you need a tool to break the tie if a disagreement occurs. Either have one method of communicating the spec or have three (think odd numbers).
  3. Direct incorporation. Direct incorporation builds a structure or framework for the product that cannot be changed by the implementer.  For example, a set of predefined objects or classes. Deviations and changes, when needed, require renaming and recompiling modules and interfaces. I view this as more of a control mechanism; however, the original structure acts as baseline to communicate the architectural vision.
  4. Conferences and courts. This category can be described at a high level in one word – meetings. Brooks suggests two types of meetings to control and communicate change. The first is the conference. A conference is a group meeting held on a periodic basis (weekly or monthly) that includes all architects and representatives from the hardware and software developers. Changes and refinements are reviewed and decisions are made. Consensus drives decisions; however, if consensus cannot be achieved the lead architect decides (appeals to overall project leader are allowed). This type of meeting might be recognized as a type of architectural change control board (CCB). The second type of meeting is the “court.” The court is more of a formal meeting of the architects, representatives of the implementers, management, marketing (if relevant) and other stakeholders to make decisions about any nagging issues on how the architectural specification is to be implemented. Courts are typically held annually or semi-annually.
  5. Multiple implementations. One possible solution to the issue of discrepancies between the specifications and what is implemented is to support multiple implementations. Alternate solutions can move forward and be evaluated. While sometimes possible, in general this solution can generate a significant drain on people and resources.
  6. The telephone log. Questions to the architects come up as implementers interact with the specification. In this technique you capture all questions and answers and publish them so everyone can benefit from the conversations. Wikis make a great tool for capturing and disseminating Q&A content.
  7. The product test. The independent test is a tool to identify discrepancies between the specification and the implementation. Some form of independence, whether as an independent test group or through test driven development, is needed to ensure a consistent translation of the vision into a product. Remember that the final arbiter is the customer/user and their product test will be merciless.

Communication is the single most prevalent problem any large group effort will encounter. In Passing the Word, Brooks provides seven possible mechanisms to ensure that everyone hears the same story and has the chance to develop a clear and consistent understanding of that story.

Do you have other solutions that you can suggest? Please share!

Previous installments of the Re-read of The Mythical Man-Month

Introductions and The Tar Pit

The Mythical Man-Month (The Essay)

The Surgical Team

Aristocracy, Democracy and System Design

The Second-System Effect


Categories: Process Management

Record Linkage: Playing around with Duke

Mark Needham - Sat, 08/08/2015 - 23:50

I’ve become quite interested in record linkage recently and came across the Duke project, which provides some tools to help solve this problem. I thought I’d give it a try.

The typical problem when doing record linkage is that we have two records from different data sets which represent the same entity but don’t have a common key that we can use to merge them together. We therefore need to come up with a heuristic that will allow us to do so.

Duke has a few examples showing it in action and I decided to go with the linking countries one. Here we have countries from Dbpedia and the Mondial database and we want to link them together.

The first thing we need to do is build the project:

export JAVA_HOME=`/usr/libexec/java_home`
mvn clean package -DskipTests

At the time of writing this will put a zip file containing everything we need at duke-dist/target/. Let’s unpack that:

unzip duke-dist/target/duke-dist-1.3-SNAPSHOT-bin.zip

Next we need to download the data files and Duke configuration file:

wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries-dbpedia.csv
wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries.xml
wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries-mondial.csv
wget https://raw.githubusercontent.com/larsga/Duke/master/doc/example-data/countries-test.txt

Now we’re ready to give it a go:

java -cp "duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --testfile=countries-test.txt --testdebug --showmatches countries.xml
 
...
 
NO MATCH FOR:
ID: '7706', NAME: 'guatemala', AREA: '108890', CAPITAL: 'guatemala city',
 
MATCH 0.9825124555160142
ID: '10052', NAME: 'pitcairn islands', AREA: '47', CAPITAL: 'adamstown',
ID: 'http://dbpedia.org/resource/Pitcairn_Islands', NAME: 'pitcairn islands', AREA: '47', CAPITAL: 'adamstown',
 
Correct links found: 200 / 218 (91.7%)
Wrong links found: 0 / 24 (0.0%)
Unknown links found: 0
Percent of links correct 100.0%, wrong 0.0%, unknown 0.0%
Records with no link: 18
Precision 100.0%, recall 91.74311926605505%, f-number 0.9569377990430622
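For readers who haven't seen these metrics before, the summary line appears to follow the standard definitions of precision, recall and F-measure; the sketch below reproduces the reported values from the counts above.

// Reproducing the summary metrics from the counts above.
var correctFound = 200;   // correct links found
var wrongFound = 0;       // wrong links found
var actualLinks = 218;    // links in the test file

var precision = correctFound / (correctFound + wrongFound);  // 1.0    -> 100.0%
var recall = correctFound / actualLinks;                     // 0.9174 -> 91.74%
var fNumber = 2 * precision * recall / (precision + recall); // ~0.9569
console.log(precision, recall, fNumber);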

We can look in countries.xml to see how the similarity between records is being calculated:

  <schema>
    <threshold>0.7</threshold>
...
    <property>
      <name>NAME</name>
      <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
      <low>0.09</low>
      <high>0.93</high>
    </property>
    <property>
      <name>AREA</name>
      <comparator>no.priv.garshol.duke.comparators.NumericComparator</comparator>
      <low>0.04</low>
      <high>0.73</high>
    </property>
    <property>
      <name>CAPITAL</name>
      <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
      <low>0.12</low>
      <high>0.61</high>
    </property>
  </schema>

So we’re working out the similarity of the capital city and country name by calculating their Levenshtein distance, i.e. the minimum number of single-character edits required to change one word into the other.
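For reference, here is a minimal (and unoptimised) sketch of the distance calculation itself - the standard dynamic-programming formulation, not Duke's implementation:

// Standard dynamic-programming Levenshtein distance (illustrative, not Duke's code).
function levenshtein(a, b) {
  var d = [];
  for (var i = 0; i <= a.length; i++) {
    d[i] = [i];                                    // i deletions to reach an empty string
    for (var j = 1; j <= b.length; j++) {
      d[i][j] = (i === 0)
        ? j                                        // j insertions from an empty string
        : Math.min(d[i - 1][j] + 1,                // deletion
                   d[i][j - 1] + 1,                // insertion
                   d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)); // substitution
    }
  }
  return d[a.length][b.length];
}

console.log(levenshtein("adamstown", "adamstown"));      // 0 - identical capitals match perfectly
console.log(levenshtein("cote divoire", "ivory coast")); // large - why a renamed country won't link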

This works very well if there is a typo or a difference in spelling in one of the data sets. However, I was curious what would happen if the country had two completely different names, e.g. Cote d’Ivoire is sometimes known as Ivory Coast. Let’s try changing the country name in one of the files:

"19147","Cote dIvoire","Yamoussoukro","322460"
java -cp "duke-dist-1.3-SNAPSHOT/lib/*" no.priv.garshol.duke.Duke --testfile=countries-test.txt --testdebug --showmatches countries.xml
 
NO MATCH FOR:
ID: '19147', NAME: 'ivory coast', AREA: '322460', CAPITAL: 'yamoussoukro',

I also tried it out with the BBC and ESPN match reports of the Man Utd vs Tottenham match – the BBC references players by surname, while ESPN has their full names.

When I compared the full name against surname using the Levenshtein comparator there were no matches as you’d expect. I had to split the ESPN names up into first name and surname to get the linking to work.

Equally, when I varied the team names to be ‘Man Utd’ rather than ‘Manchester United’ and ‘Tottenham’ rather than ‘Tottenham Hotspur’, that didn’t work either.

I think I probably need to write a domain-specific comparator, but I’m also curious whether I could come up with a bunch of training examples and then train a model to detect what makes two records similar. It’d be less deterministic but perhaps more robust.

Categories: Programming

Refactoring JavaScript from Sync to Async in Safe Baby-Steps

Mistaeks I Hav Made - Nat Pryce - Sat, 08/08/2015 - 17:10
Consider some JavaScript code that gets and uses a value from a synchronous call or built-in data structure:

function to_be_refactored() {
    var x;
    ...
    x = get_x();
    ...use x...
}

Suppose we want to replace this synchronous call with a call to a service that has an asynchronous API (an HTTP fetch, for example). How can we refactor the code from synchronous to asynchronous style in small safe steps?

First, wrap the remainder of the function after the line that gets the value in a “continuation” function that takes the value as a parameter and closes over any other variables in its environment. Pass the value to the continuation function:

function to_be_refactored() {
    var x, cont;
    ...
    x = get_x();
    cont = function(x) {
        ...use x...
    };
    cont(x);
}

Then pull the definition of the continuation function before the code that gets the value:

function to_be_refactored() {
    var x, cont;
    cont = function(x) {
        ...use x...
    };
    ...
    x = get_x();
    cont(x);
}

Now extract the last two lines that get the value and call the continuation into a single function that takes the continuation as a parameter, and pass the continuation to it:

function to_be_refactored() {
    ...
    get_x_and(function(x) {
        ...use x...
    });
}

function get_x_and(cont) {
    cont(get_x());
}

If you have calls to get_x in many places in your code base, move get_x_and into a common module so that it can be called everywhere that get_x is called. Transform the remaining uses of get_x to “continuation passing style”, replacing the calls to get_x with calls to get_x_and.

Finally, replace the implementation of get_x_and with a call to the async service and delete the get_x function (a sketch of what that final step might look like appears at the end of this post). Wouldn’t it be nice if IDEs could do this refactoring automatically?

The Trouble With Shared Mutable State

Dale Hagglund asked via Twitter: “What if cont assumes that some [shared mutable] property remains constant across the async invocation? I’ve always found these very hard to unmake.”

In that case, you’ll have to copy the current value of the shared, mutable property into a local variable that is then closed over by the continuation. E.g.

function to_be_refactored() {
    var x;
    ...
    x = get_x();
    ...use x and shared_mutable_y()...
}

would have to become:

function to_be_refactored() {
    var y;
    ...
    y = shared_mutable_y();
    get_x_and(function(x) {
        ...use x and y...
    });
}
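To make that final step concrete, here is one possible asynchronous implementation of get_x_and. It is only a sketch: the /x endpoint and the shape of its JSON response are assumptions for illustration, not part of the original article.

// Illustrative final step: get_x_and now calls an async service.
// The "/x" endpoint and {"x": ...} response shape are made up for this example.
function get_x_and(cont) {
    var request = new XMLHttpRequest();
    request.open("GET", "/x");
    request.onload = function () {
        cont(JSON.parse(request.responseText).x);  // hand the fetched value to the continuation
    };
    request.send();
}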
Categories: Programming, Testing & QA

More Misconceptions of Waterfall

Herding Cats - Glen Alleman - Sat, 08/08/2015 - 16:47

It is popular in some agile circles to use Waterfall as the stalking horse for every bad management practice in software development. A recent example:

Go/No Go decisions are a residue of waterfall thinking. All software can be built incrementally and most released incrementally.

Nothing in Waterfall prohibits incremental release. In fact, the notion of block release is the basis of most Software Intensive Systems development. From the point of view of the business, capabilities are what they bought: the capability to do something of value in exchange for the cost of that value. Here's an example from the health insurance business. Incremental release of features is of little value if those features don't work together to provide some needed capability to conduct business. A naive approach is the release early and release often platitude of some in the agile domain. Let's say we're building a personnel management system. This includes recruiting, on-boarding, provisioning, benefits signup, time keeping, and payroll. It would not be very useful to release the time-keeping feature if the payroll feature were not ready.

(Figure: a map of business capabilities in the order they are needed)

So before buying into the platitude of release early and often, ask: what does the business need to do business? Then draw a picture like the one above and develop a plan for producing those capabilities in the order they are needed to deliver the needed value. Without this approach, you'll be spending money without producing value and calling that agile.

That way you can stop managing other people's money with platitudes and replace them with actual business management processes. So every time you hear a platitude masquerading as good management, ask: does the person using that platitude work anywhere with high value at risk? If not, then they have probably yet to encounter the actual management of other people's money.

Related articles: Capabilities Based Planning - Part 2; Are Estimates Really The Smell of Dysfunction?
Categories: Project Management

Welcome to The Internet of Compromised Things

Coding Horror - Jeff Atwood - Sat, 08/08/2015 - 11:59

This post is a bit of a public service announcement, so I'll get right to the point:

Every time you use WiFi, ask yourself: could I be connecting to the Internet through a compromised router with malware?

It's becoming more and more common to see malware installed not at the server, desktop, laptop, or smartphone level, but at the router level. Routers have become quite capable, powerful little computers in their own right over the last 5 years, and that means they can, unfortunately, be harnessed to work against you.

I write about this because it recently happened to two people I know.

.@jchris A friend got hit by this on newly paved win8.1 computer. Downloaded Chrome, instantly infected with malware. Very spooky.

— not THE Damien Katz (@damienkatz) May 20, 2015

@codinghorror *no* idea and there’s almost ZERO info out there. Essentially malicious JS adware embedded in every in-app browser

— John O'Nolan (@JohnONolan) August 7, 2015

In both cases, they eventually determined the source of the problem was that the router they were connecting to the Internet through had been compromised.

This is way more evil genius than infecting a mere computer. If you can manage to systematically infect common home and business routers, you can potentially compromise every computer connected to them.

Hilarious meme images I am contractually obligated to add to each blog post aside, this is scary stuff and you should be scared.

Router malware is the ultimate man-in-the-middle attack. For all meaningful traffic sent through a compromised router that isn't HTTPS encrypted, it is 100% game over. The attacker will certainly be sending all that traffic somewhere they can sniff it for anything important: logins, passwords, credit card info, other personal or financial information. And they can direct you to phishing websites at will – if you think you're on the "real" login page for the banking site you use, think again.

Heck, even if you completely trust the person whose router you are using, they could technically be doing this to you. But they probably aren't.

Probably.

In John's case, the attackers inserted annoying ads in all unencrypted web traffic, which is an obvious tell to a sophisticated user. But how exactly would the average user figure out where this junk is coming from (or worse, assume the regular web is just full of ad junk all the time), when even a technical guy like John – founder of the open source Ghost blogging software used on this very blog – was flummoxed?

But that's OK, we're smart users who would only access public WiFi using HTTPS websites, right? Sadly, even if the traffic is HTTPS encrypted, it can still be subverted! There's an extremely technical blow-by-blow analysis at Cryptostorm, but the TL;DR is this:

Compromised router answers DNS req for *.google.com to 3rd party with faked HTTPS cert, you download malware Chrome. Game over.

HTTPS certificate shenanigans. DNS and BGP manipulation. Very hairy stuff.
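
If you're wondering how you would ever notice a swapped certificate, one blunt instrument is manual pinning: record a site's certificate fingerprint while you're on a network you trust, then compare it later from the suspect network. A rough Scala sketch (the expected fingerprint is a placeholder and the helper is mine, not anything from the Cryptostorm write-up):

import java.net.URL
import java.security.MessageDigest
import javax.net.ssl.HttpsURLConnection

// Placeholder value: record the real fingerprint on a network you trust.
val expectedSha256 = "known-good-fingerprint-goes-here"

def leafCertFingerprint(host: String): String = {
  val conn = new URL(s"https://$host/").openConnection().asInstanceOf[HttpsURLConnection]
  conn.connect()
  val leaf = conn.getServerCertificates.head             // the server's own certificate comes first
  val digest = MessageDigest.getInstance("SHA-256").digest(leaf.getEncoded)
  conn.disconnect()
  digest.map(b => f"${b & 0xff}%02x").mkString
}

println(leafCertFingerprint("www.google.com") == expectedSha256)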

How is this possible? Let's start with the weakest link, your router. Or more specifically, the programmers responsible for coding the admin interface to your router.

They must be terribly incompetent coders to let your router get compromised over the Internet, since one of the major selling points of a router is to act as a basic firewall layer between the Internet and you… right?

In their defense, that part of a router generally works as advertised. More commonly, you aren't being attacked from the hardened outside. You're being attacked from the soft, creamy inside.

That's right, the calls are coming from inside your house!

By that I mean you'll visit a malicious website that scripts your own browser to access the web-based admin pages of your router, and reset (or use the default) admin passwords to reconfigure it.

Nasty, isn't it? They attack from the inside using your own browser. But that's not the only way.

  • Maybe you accidentally turned on remote administration, so your router can be modified from the outside.

  • Maybe you left your router's admin passwords at default.

  • Maybe there is a legitimate external exploit for your router and you're running a very old version of firmware.

  • Maybe your ISP provided your router and made a security error in the configuration of the device.

In addition to being kind of terrifying, this does not bode well for the Internet of Things.

Internet of Compromised Things, more like.

OK, so what can we do about this? There's no perfect answer; I think it has to be a defense in depth strategy.

Inside Your Home

Buy a new, quality router. You don't want a router that's years old and hasn't been updated. But on the other hand you also don't want something too new that hasn't been vetted for firmware and/or security issues in the real world.

Also, any router your ISP provides is going to be about as crappy and "recent" as the awful stereo system you get in a new car. So I say stick with well known consumer brands. There are some hardcore folks who think all consumer routers are trash, so YMMV.

I can recommend the Asus RT-AC87U – it did very well in the SmallNetBuilder tests, Asus is a respectable brand, it's been out a year, and for most people, this is probably an upgrade over what you currently have without being totally bleeding edge overkill. I know it is an upgrade for me.

(I am also eagerly awaiting Eero as a domestic best of breed device with amazing custom firmware, and have one pre-ordered, but it hasn't shipped yet.)

Download and install the latest firmware. Ideally, do this before connecting the device to the Internet. But if you connect and then immediately use the firmware auto-update feature, who am I to judge you.

Change the default admin passwords. Don't leave it at the documented defaults, because then it could be potentially scripted and accessed.

Turn off WPS. Turns out the Wi-Fi Protected Setup feature intended to make it "easy" to connect to a router by pressing a button or entering a PIN made it … a bit too easy. This is always on by default, so be sure to disable it.

Turn off UPnP. Since we're talking about attacks that come from "inside your house", UPnP offers zero protection as it has no method of authentication. If you need it for specific apps, you'll find out, and you can forward those ports manually as needed.

Make sure remote administration is turned off. I've never owned a router that had this on by default, but check just to be double plus sure.

For WiFi, turn on WPA2+AES and use a long, strong password. Again, I feel most modern routers get the defaults right these days, but just check. The password is your responsibility, and password strength matters tremendously for wireless security, so be sure to make it a long one – at least 20 characters with all the variability you can muster.
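
If inventing 20+ random characters by hand sounds tedious, any cryptographically seeded generator will do the job. A throwaway Scala sketch (the length and character set are my own arbitrary choices):

import java.security.SecureRandom

// Generate a 24-character passphrase from a mixed character set.
val alphabet: IndexedSeq[Char] =
  ('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9') ++ "!@#$%^&*-_=+".toSeq
val rng = new SecureRandom()
val passphrase = (1 to 24).map(_ => alphabet(rng.nextInt(alphabet.length))).mkString
println(passphrase)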

Pick a unique SSID. Default SSIDs just scream hack me, for I have all defaults and a clueless owner. And no, don't bother "hiding" your SSID, it's a waste of time.

Optional: use less congested channels for WiFi. The default is "auto", but you can sometimes get better performance by picking less used frequencies at the ends of the spectrum. As summarized by official ASUS support reps:

  • Set 2.4 GHz channel bandwidth to 40 MHz, and change the control channel to 1, 6 or 11.

  • Set 5 GHz channel bandwidth to 80 MHz, and change the control channel to 165 or 161.

Experts only: install an open source firmware. I discussed this a fair bit in Everyone Needs a Router, but you have to be very careful which router model you buy, and you'll probably need to stick with older models. There are several which are specifically sold to be friendly to open source firmware.

Outside Your Home

Well, this one is simple. Assume everything you do outside your home, on a remote network or over WiFi is being monitored by IBGs: Internet Bad Guys.

I know, kind of an oppressive way to voyage out into the world, but it's better to start out with a defensive mindset, because you could be connecting to anyone's compromised router or network out there.

But, good news. There are only two key things you need to remember once you're outside, facing down that fiery ball of hell in the sky and armies of IBGs.

  1. Never access anything but HTTPS websites.

    If it isn't available over HTTPS, don't go there!

    You might be OK with HTTP if you are not logging in to the website, just browsing it, but even then IBGs could inject malware in the page and potentially compromise your device. And never, ever enter anything over HTTP you aren't 100% comfortable with bad guys seeing and using against you somehow.

    We've made tremendous progress in HTTPS Everywhere over the last 5 years, and these days most major websites offer (or even better, force) HTTPS access. So if you just want to quickly check your GMail or Facebook or Twitter, you will be fine, because those services all force HTTPS. (A quick way to check whether a given site forces HTTPS is sketched just after this list.)

  2. If you must access non-HTTPS websites, or you are not sure, always use a VPN.

    A VPN encrypts all your traffic, so you no longer have to worry about using HTTPS. You do have to worry about whether or not you trust your VPN provider, but that's a much longer discussion than I want to get into right now.

    It's a good idea to pick a go-to VPN provider so you have one ready and get used to how it works over time. Initially it will feel like a bunch of extra work, and it kinda is, but if you care about your security an encrypt-everything VPN is bedrock. And if you don't care about your security, well, why are you even reading this?
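
As promised above, here's a quick way to check whether a given site forces HTTPS before you lean on rule 1: make a plain HTTP request without following redirects and see whether you get bounced to an https:// URL. A small Scala sketch (the helper is mine):

import java.net.{HttpURLConnection, URL}

// True if the site redirects plain HTTP straight to HTTPS.
def forcesHttps(host: String): Boolean = {
  val conn = new URL(s"http://$host/").openConnection().asInstanceOf[HttpURLConnection]
  conn.setInstanceFollowRedirects(false)      // we want to inspect the redirect itself
  val code = conn.getResponseCode
  val location = Option(conn.getHeaderField("Location")).getOrElse("")
  conn.disconnect()
  code >= 300 && code < 400 && location.startsWith("https://")
}

println(forcesHttps("twitter.com"))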

If it feels like these are both variants of the same rule, always strongly encrypt everything, you aren't wrong. That's the way things are headed. The math is as sound as it ever was – but unfortunately the people and devices, less so.

Be Safe Out There

Until I heard Damien's story and John's story, I had no idea router hardware could be such a huge point of compromise. I didn't realize that you could be innocently visiting a friend's house, and because he happens to be the parent of three teenage boys and the owner of an old, unsecured router that you connect to via WiFi … your life will suddenly get a lot more complicated.

As the amount of stuff we connect to the Internet grows, we have to understand that the Internet of Things is a bunch of tiny, powerful computers, too – and they need the same strong attention to security that our smartphones, laptops, and servers already enjoy.

Categories: Programming

The Deadline to Apply for GTAC 2015 is Monday Aug 10

Google Testing Blog - Fri, 08/07/2015 - 18:33
Posted by Anthony Vallone on behalf of the GTAC Committee


The deadline to apply for GTAC 2015 is this Monday, August 10th, 2015. There is a great deal of interest in both attending and speaking, and we’ve received many outstanding proposals. However, it’s not too late to submit your proposal for consideration. If you would like to speak or attend, be sure to complete the form by Monday.

We will be making regular updates to the GTAC site (developers.google.com/gtac/2015/) over the next several weeks, and you can find conference details there.

For those that have already signed up to attend or speak, we will contact you directly by mid-September.

Categories: Testing & QA

Stuff The Internet Says On Scalability For August 7th, 2015

Hey, it's HighScalability time:


A feather? Brass relief? River valley? Nope. It's frost on Mars!
  • $10 billion: Microsoft data center spend per year; 1: hours from London to New York at mach 4.5; 1+: million Facebook requests per second; 25TB: raw data collected per day at Criteo; 1440: minutes in a day; 2.76: farthest distance a human eye can detect a candle flame in kilometers.

  • Quotable Quotes:
    • @drunkcod: IT is a cost center you say? Ok, let's shut all the servers down until you figure out what part of revenue we contribute to.
    • Beacon 23: I’m here because they ain’t made a computer yet that won’t do something stupid one time out of a hundred trillion. Seems like good odds, but when computers are doing trillions of things a day, that means a whole lot of stupid. 
    • @johnrobb: China factory: Went from 650 employees to 60 w/ robots. 3x production increase.  1/5th defect rate.
    • @twotribes: "Metrics are the internet’s heroin and we’re a bunch of junkies mainlining that black tar straight into the jugular of our organizations."
    • @javame: @adrianco I've seen a 2Tb erlang monolith and I don't want to see that again cc/@martinfowler
    • @micahjay1: Thinking about @a16z podcast about bio v IT ventures. Having done both, big diff is cost to get started and burn rate. No AWS in bio...yet
    • @0xced: XML: 1996 XLink: 1997 XML-RPC: 1998 XML Schema: 1998 JSON: 2001 JSON-LD: 2010 SON-RPC: 2005 JSON Schema: 2009 
    • Inside the failure of Google+: What people failed to understand was Facebook had network effects. It’s like you have this grungy night club and people are having a good time and you build something next door that’s shiny and new, and technically better in some ways, but who wants to leave? People didn't need another version of Facebook.
    • @bdu_p: Old age and treachery will beat youth and skill every time. A failed attempt to replace unix grep 

  • The New World looks a lot like the old Moscow. From The Master of Disguise: My Secret Life in the CIA: we assume constant surveillance. This saturation level of surveillance, which far surpassed anything Western intelligence services attempted in their own democratic societies, had greatly constrained CIA operations in Moscow for decades.

  • How Netflix made their website startup time 70% faster. They removed a lot of server side complexity by moving to mostly client side rendering. Java, Tomcat, Struts, and Tiles were replaced with Node.js and React.js.  They call this Universal JavaScript, JavaScript on the server side and the client side. "Using Universal JavaScript means the rendering logic is simply passed down to the client." Only a bootstrap view is rendered on the server with everything else rendered incrementally on the client.

  • How Facebook fights spam with Haskell. Haskell is used as an expressive, latency-sensitive rules engine. Sitting at the front of the ingestion pipeline, it synchronously handles every single write request to Facebook and Instagram. That's more than one million requests per second. So not so slow. Haskell works well because it's a purely functional, strongly typed language, supports hot swapping, supports implicit concurrency, performs well, and supports interactive development. Haskell is not used for the entire stack, however. It's sandwiched: on the top there's C++ to process messages, and on the bottom there's C++ client code that interacts with other services. Key design decision: rules can't make writes, which means an abstract syntax tree of fetches can be overlapped and batched (a toy sketch of that idea follows at the end of this list).

  • You know how kids these days don't know the basics, like how eggs come from horses or that milk comes from chickens? The disassociation disorder continues. Now Millions of Facebook users have no idea they’re using the internet: A while back, a highly-educated friend and I were driving through an area that had a lot of data centers. She asked me what all of those gigantic blocks of buildings contained. I told her that they were mostly filled with many servers that were used to host all sorts of internet services. It completely blew her mind. She had no idea that the services that she and billions of others used on their phones actually required millions and millions of computers to transmit and process the data.

  • History rererepeats itself. Serialization is still evil. Improving Facebook's performance on Android with FlatBuffers: It took 35 ms to parse a JSON stream of 20 KB...A JSON parser needs to build field mappings before it can start parsing, which can take 100 ms to 200 ms...FlatBuffers is a data format that removes the need for data transformation between storage and the UI...Story load time from disk cache is reduced from 35 ms to 4 ms per story...Transient memory allocations are reduced by 75 percent...Cold start time is improved by 10-15 percent.
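
About that Haskell rules-engine design decision above: because rules are pure reads, the engine can look at everything a batch of rules wants to fetch, deduplicate it, and issue a single batched round trip before evaluating anything. A toy Scala sketch of the idea (nothing like Facebook's actual Haskell code; the rule names and keys are invented):

// Each rule declares the keys it needs; since rules never write,
// all fetches can be collected, deduplicated, and batched up front.
case class Fetch(key: String)
case class Rule(name: String, needs: Seq[Fetch], decide: Map[String, String] => Boolean)

def runRules(rules: Seq[Rule], batchFetch: Seq[Fetch] => Map[String, String]): Seq[String] = {
  val allFetches = rules.flatMap(_.needs).distinct   // overlap and batch
  val data = batchFetch(allFetches)                  // single round trip
  rules.filter(_.decide(data)).map(_.name)           // pure evaluation, no writes
}

val rules = Seq(
  Rule("block-spammy-link", Seq(Fetch("url.reputation")), d => d("url.reputation") == "bad"),
  Rule("rate-limit-author", Seq(Fetch("author.recentPosts"), Fetch("url.reputation")),
    d => d("author.recentPosts").toInt > 100)
)
println(runRules(rules, _ => Map("url.reputation" -> "bad", "author.recentPosts" -> "3")))
// List(block-spammy-link)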

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Finding What To Learn Next

Making the Complex Simple - John Sonmez - Fri, 08/07/2015 - 13:00

Being able to learn things quickly is an amazing skill to have, even more so for developers because of the speed at which technology changes. Most people change careers 15 times throughout their life. Not jobs, careers! So it’s safe to assume that the average developer will have multiple jobs throughout their career. Each job change has the potential […]

The post Finding What To Learn Next appeared first on Simple Programmer.

Categories: Programming

Always #notimplementednovalue . . . Maybe or Maybe Not

Does writing throwaway code to generate information and knowledge have value?

When we talk about implementing software into production at the end of each sprint (or, for the more avant-garde, continuously as it is completed) as a reflection of the value a team delivers to its customer, there is always pushback. Agile practitioners, in particular, are concerned that some of the code is never implemented. If unimplemented code is not perceived to have value, then why are development teams generating code they are not going to put into production? The problem is often exacerbated by the developers' perceived need to get credit for the most tangible output of their work (the code) rather than the knowledge the code generates. In order to understand why some code is never put into production, it is important to understand the typical sources of unimplemented code: research and development (R&D), prototypes, and development aids.

Research and Development (R&D): R&D is defined as the “investigative activities with the intention of making a discovery that can either lead to the development of new products or procedures, or to improvement of existing products or procedures.” R&D is not required to create a new report from SAP or a new database table to support a CRM system. In an IT department doing R&D, researchers run experiments to explore ideas and test hypotheses, not to create shippable products or an installed production base of code. The value in the process is the knowledge that is generated (also known as intellectual property). Credit (e.g. adulation and promotions) accrues for generating the IP rather than for the code.

Prototypes: Prototypes are often used to sort out whether an idea is worth pursuing (this could be considered a special, micro form of R&D) and/or as a technique to generate and validate requirements. Prototypes are preliminary models that, once constructed, are put aside. As with R&D, the goal is less to generate code than to generate information that can be used in subsequent steps of solving a specific problem, and credit (e.g. adulation and promotions) accrues for generating the IP rather than for the code.

Development Aids: Developers and testers often create tools to aid in the construction and testing of functionality. Rarely are these tools developed to be put into production. The value of this code is reflected in the efficiency and quality of the functionality they are created to support.

Whether in an R&D environment or at the team level building prototypes or development aids, does writing throwaway code to generate information and knowledge have value? While this question sounds pedantic, it is a question that gets discussed when a coach begins to push the idea of #NotInProductionNoValue. The answer is to focus the discussion on the information and knowledge generated. In the end, it is the information and knowledge that has value and that will move forward with the project or organization, even when the code is sloughed off like skin after a bad sunburn. Most simply, when testing an assumption keeps you from making a mistake or provides information for making a good decision, doing whatever is needed makes sense. However, it is not the code that has value per se, but rather the information generated.

Side Note: Many IT departments have rebranded themselves as R&D departments. The R&D metaphor is used to suggest that the IT department is identifying products and leading the business. In some startups and cutting-edge technology firms this may well be true; however, the use of the term is generally a misnomer or wishful thinking. Most IT departments are focused on product delivery, i.e. building solutions based on relatively tried and true frameworks and methods. If you doubt the veracity of that statement, just observe the amount of packaged software (e.g. SAP, PeopleSoft) your own organization supports.


Categories: Process Management

Spark: Convert RDD to DataFrame

Mark Needham - Thu, 08/06/2015 - 22:11

As I mentioned in a previous blog post I’ve been playing around with the Databricks Spark CSV library and wanted to take a CSV file, clean it up and then write out a new CSV file containing some of the columns.

I started by processing the CSV file and writing it into a temporary table:

import org.apache.spark.sql.{SQLContext, Row, DataFrame}
 
val sqlContext = new SQLContext(sc)
val crimeFile = "Crimes_-_2001_to_present.csv"
sqlContext.load("com.databricks.spark.csv", Map("path" -> crimeFile, "header" -> "true")).registerTempTable("crimes")

I wanted to get to the point where I could call the following function which writes a DataFrame to disk:

import java.io.File
import org.apache.hadoop.fs.FileUtil

private def createFile(df: DataFrame, file: String, header: String): Unit = {
  FileUtil.fullyDelete(new File(file))                    // remove any previous output
  val tmpFile = "tmp/" + System.currentTimeMillis() + "-" + file
  df.distinct.save(tmpFile, "com.databricks.spark.csv")   // the header parameter isn't used in this excerpt
}

The first file only needs to contain the primary type of crime, which we can extract with the following query:

val rows = sqlContext.sql("select `Primary Type` as primaryType FROM crimes LIMIT 10")
 
rows.collect()
res4: Array[org.apache.spark.sql.Row] = Array([ASSAULT], [ROBBERY], [CRIMINAL DAMAGE], [THEFT], [THEFT], [BURGLARY], [THEFT], [BURGLARY], [THEFT], [CRIMINAL DAMAGE])

Some of the primary types have trailing spaces which I want to get rid of. As far as I can tell Spark’s variant of SQL doesn’t have the LTRIM or RTRIM functions but we can map over ‘rows’ and use the String ‘trim’ function instead:

rows.map { case Row(primaryType: String) => Row(primaryType.trim) }
res8: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[29] at map at DataFrame.scala:776

Now we’ve got an RDD of Rows which we need to convert back to a DataFrame again. ‘sqlContext’ has a function which we might be able to use:

sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => Row(primaryType.trim) })
 
<console>:27: error: overloaded method value createDataFrame with alternatives:
  [A <: Product](data: Seq[A])(implicit evidence$4: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$3: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
 cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row])
              sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => Row(primaryType.trim) })
                         ^

These are the signatures we can choose from:

[Screenshot: the createDataFrame method signatures]

If we want to pass in an RDD of type Row we’re going to have to define a StructType (sketched a little further down) or we can convert each row into something more strongly typed:

case class CrimeType(primaryType: String)
 
sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) })
res14: org.apache.spark.sql.DataFrame = [primaryType: string]
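
For completeness, the StructType route mentioned earlier would look roughly like this (a sketch only, using SQLContext's createDataFrame(rowRDD, schema) overload):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Describe the single string column explicitly instead of using a case class.
val schema = StructType(Seq(StructField("primaryType", StringType, nullable = true)))
val trimmed = rows.map { case Row(primaryType: String) => Row(primaryType.trim) }
val typedDf = sqlContext.createDataFrame(trimmed, schema)

Either way we end up with the same single-column DataFrame.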

Great, we’ve got our DataFrame which we can now plug into the ‘createFile’ function like so:

createFile(
  sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) }),
  "/tmp/crimeTypes.csv",
  "crimeType:ID(CrimeType)")

We can actually do better though!

Since we’ve got an RDD of a specific class we can make use of the ‘rddToDataFrameHolder’ implicit function and then the ‘toDF’ function on ‘DataFrameHolder’. This is what the code looks like:

import sqlContext.implicits._
createFile(
  rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) }.toDF(),
  "/tmp/crimeTypes.csv",
  "crimeType:ID(CrimeType)")

And we’re done!

Categories: Programming