Software Development Blogs: Programming, Software Testing, Agile, Project Management

Feed aggregator

Extreme Programming Explained, Second Edition: Week 6

XP Explained

This week we tackle teams in XP and why XP works, based on the Theory of Constraints, in Extreme Programming Explained, Second Edition (2005). The two chapters are linked by the idea that work is delivered most effectively when teams or organizations achieve a consistent flow.

Chapter 10 – The Whole XP Team

The principle of flow, as described in our re-read of Goldratt’s The Goal, holds that more value is created when a system achieves a smooth and steady stream of output. In order to achieve a state of flow, everyone on the team needs to be linked to the work to reduce delay and friction between steps. Implementing the steps necessary to address complex work within a team is often at odds with how waterfall projects break work down based on specialties. Unless the barriers between specialties are broken down, it is hard to get people to agree that work can be done incrementally in small chunks rather than in specialty-based phases such as planning, analysis, design, and more.

Every “specialty” needs to understand its role in XP.

Testers – XP assumes that programmers using XP take on the responsibility for catching unit-level mistakes. XP uses the concept of test-first programming. In test-first programming, the team begins each cycle (sprint in Scrum) by writing tests that will fail until the code is written. Once the tests are written and executed to prove they will fail, the team writes the code and runs the tests until they pass. This is at least a partial definition of done. As the team uncovers new details, new tests are specified and incorporated into the team’s test suite. When testers are not directly involved in writing and executing tests, they can work on extending automated testing.
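
As a minimal sketch of that test-first rhythm in Java with JUnit (the class and the numbers are hypothetical, not from the book):

// Step 1: write a test that fails because the behavior does not exist yet.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FareCalculatorTest {
    @Test
    public void discountIsAppliedToFaresOverTenDollars() {
        assertEquals(10.80, new FareCalculator().fareWithDiscount(12.00), 0.001);
    }
}

// Step 2: write just enough code to make the test pass, then refactor.
class FareCalculator {
    double fareWithDiscount(double fare) {
        return fare > 10.00 ? fare * 0.90 : fare;
    }
}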

Interaction designers – Interaction designers work with customers to write and clarify stories. Interaction designers deliver analysis of actual usage of the system to decide what the system needs to do next. The interaction designer in XP would also encompass the UX and UI designer roles as they have evolved since XP Explained was written and updated. The designer tends to work a bit in front of the developers to reduce potential delays.

Architects – Architects help the team keep the big picture in mind as development progresses in small incremental steps. Similar to the interaction designer, the architect evolves the big picture just enough ahead of the development team to provide direction (SAFe calls this the architectural runway). Evolving the architecture in small steps and gathering feedback from incremental system testing as development progresses reduces the risk that the project will wander off track.

Project managers – Project managers (PMs) facilitate communication both inside the team and between customers, suppliers, and the rest of the organization. Beck suggests that the PMO act as the team’s historian. Project managers keep the team “plan” synchronized with reality, based on how the team is performing and what is happening outside the team.

Product managers – Product managers write stories, pick themes and stories for the quarterly cycle, pick stories in the weekly cycle, answer questions as development progresses, and help when new information is uncovered. The product manager helps the whole team prioritize work. (Note: this role is different from the concept of the product owner in Scrum.) The product manager should help the team focus on pieces of work that allow the system to be whole at the end of every cycle.

Executives – The executives’ role in XP is to provide an environment in which the team has courage, confidence, and accountability. Beck suggests that executives trust the metrics. The first metric is the number of defects found after development; the fewer the better. The second metric that executives should leverage to build trust in XP is the time lag between idea inception and when the idea begins generating revenue. This metric is also known as “concept to cash” (faster is better).

Technical writers – In XP, the technical writer role generates feedback by asking the question, “How can I explain that?” The tech writer can also help create a closer relationship with users by helping them learn about the product, listening to their feedback, and addressing any confusion between the development team and the user community. Embedding the tech writer role in the XP team allows the team to get feedback on a more timely basis, rather than waiting until much later in the development cycle.

Users – Users help write user stories, provide business intelligence, and make business domain decisions as development progresses. Users must be able to speak for the larger business community; they need to command a broad consensus for the decisions they make. If users can’t command a broad consensus from the business community for the decisions they make, they should let the team work on something else first while they get their ducks in a row.

Programmers – Programmers estimate stories and tasks, break stories down into smaller pieces, write tests, write code, run tests, automate tedious development processes, and gradually improve the design of the system. As with all roles in XP, the most valuable developers combine specialization with a broad set of capabilities.

Human resources – Human resources needs to find a way to hire the right individuals and then to evaluate teams. Evaluating teams requires changing the review process to focus on teams, rather than on individuals.

XP addressed roles that most discussions of Scrum have ignored, but that are needed to deliver a project. Roles should not be viewed as a rigid set of specialties that every project requires at every moment. Teams and organizations need to add and subtract roles as needed. XP team members need to have the flexibility to shift roles as needed to maximize the flow of work.

Chapter 11 – The Theory of Constraints

In order to find opportunities for improvement in the development process using XP, begin by determining which problems are development problems and which are caused outside of the development process. This first step is important because XP is only focused on the software development process (areas like marketing are out of scope). One approach for improving software development is to look at the throughput of the software development process. The theory of constraints (ToC) is a systems thinking approach to process improvement. A simple explanation of the ToC is that the output of any system or process is limited by a very small number of constraints within the process. Using the ToC to measure the throughput of the development process (a system from the point of view of the ToC) provides the basis for identifying constraints, making a change, and then finding the next constraint. Using the ToC as an improvement approach maximizes the output of the overall development process rather than focusing on the local maximization of individual steps.

The theory of constraints is not a perfect fit for software development because software development is more influenced by people and is therefore more variable than the mechanical transformation of raw materials. An over-reliance on concepts like the ToC will tend to overemphasize process and engineering solutions over people solutions, such as a team approach. This is a caution rather than a warning to avoid process approaches. In addition, systems approaches can highlight issues outside of development’s span of control. Getting others in the organization to recognize issues they are not ready to accept or address can cause conflict. Beck ends the chapter with the advice, “If you don’t have executive sponsorship, be prepared to do a better job yourself without recognition or protection.” Unsaid is that you will also have to be prepared for the consequences of your behavior.

Previous installments of Extreme Programming Explained, Second Edition (2005) on Re-read Saturday:

Extreme Programming Explained: Embrace Change Second Edition Week 1, Preface and Chapter 1

Week 2, Chapters 2 – 3

Week 3, Chapters 4 – 5

Week 4, Chapters 6 – 7

Week 5, Chapters 8 – 9


Categories: Process Management

Improvements for smaller app downloads on Google Play

Android Developers Blog - Fri, 07/22/2016 - 18:55

Posted by Anthony Morris, SWE Google Play

Google Play continues to grow rapidly, as Android users installed over 65 billion apps in the last year from the Google Play Store. We’re also seeing developers move to update their apps more frequently to push great new content, patch security vulnerabilities, and iterate quickly on user feedback.

However, many users are sensitive to the amount of data they use, especially if they are not on Wi-Fi. Google Play is investing in improvements to reduce the data that needs to be transferred for app installs and updates, while making data cost more transparent to users.

Read on to understand the updates and learn some tips for ways to optimize the size of your APK.

New Delta algorithm to reduce the size of app updates

For approximately 98% of app updates from the Play Store, only changes (deltas) to APK files are downloaded and merged with the existing files, reducing the size of updates. We recently rolled out a delta algorithm, bsdiff, that further reduces patches by up to 50% or more compared to the previous algorithm. Bsdiff is specifically targeted to produce more efficient deltas of native libraries by taking advantage of the specific ways in which compiled native code changes between versions. To be most effective, native libraries should be stored uncompressed (compression interferes with delta algorithms).

An example from Chrome:

Patch      | Description  | Previous patch size | Bsdiff size
M46 to M47 | major update | 22.8 MB             | 12.9 MB
M47        | minor update | 15.3 MB             | 3.6 MB

Apps that don’t have uncompressed native libraries can see a 5% decrease in size on average, compared to the previous delta algorithm.
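
The post doesn’t prescribe a mechanism for storing native libraries uncompressed; one option (an assumption here, available from API 23) is the android:extractNativeLibs manifest attribute, a minimal sketch:

<!-- AndroidManifest.xml fragment (illustrative sketch, not from the post):
     with extractNativeLibs="false" the platform loads .so files directly
     from the APK, so the build stores them uncompressed and page-aligned. -->
<application android:extractNativeLibs="false">
    <!-- activities, services, and other components as usual -->
</application>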

Applying the delta algorithm to APK Expansion Files to further reduce update size

APK Expansion Files allow you to include additional large files up to 2GB in size (e.g. high resolution graphics or media files) with your app, which is especially popular with games. We have recently expanded our delta and compression algorithms to apply to these APK Expansion Files in addition to APKs, reducing the download size of initial installs by 12%, and updates by 65% on average.

Clearer size information in the Play Store

Alongside the improvements to reduce download size, we also made the information displayed in the Play Store about data use and download sizes clearer. You can now see actual download sizes, not the APK file size, in the Play Store. If you already have an app, you will only see the update size. These changes are rolling out now.

Example 1: Showing new “Download size” of APK

Example 2: Showing new “Update size” of APK

Tips to reduce your download sizes

1. Optimize for the right size measurements: Users care about download size (i.e. how many bytes are transferred when installing/updating an app), and they care about disk size (i.e. how much space the app takes up on disk). It’s important to note that neither of these is the same as the original APK file size, nor are they necessarily correlated with it.

Chrome example:

                         | Compressed Native Library | Uncompressed Native Library
APK size                 | 39 MB                     | 52 MB (+25%)
Download size (install)  | 29 MB                     | 29 MB (no change)
Download size (update)   | 29 MB                     | 21 MB (-29%)
Disk size                | 71 MB                     | 52 MB (-26%)

Chrome found that the initial download size remained the same by not compressing the native library in their APK, even though the APK size increased, because Google Play already performs compression for downloads. They also found that the update size decreased, as deltas are more effective with uncompressed files, and disk size decreased as you no longer need a compressed copy of the native library.

2. Reduce your APK size: Remove unnecessary data from the APK like unused resources and code.

3. Optimize parts of your APK to make them smaller: use more efficient file formats, for example WebP instead of JPEG, or use ProGuard to remove unused code (a build configuration sketch follows).
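
A minimal build.gradle sketch of the code- and resource-shrinking part (assuming the standard Android Gradle plugin; not from the post):

android {
    buildTypes {
        release {
            // Strip unused code with ProGuard and drop unreferenced resources.
            minifyEnabled true
            shrinkResources true
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
}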

Read more about reducing APK sizes and watch the I/O 2016 session ‘Putting Your App on a Diet’ to learn from Wojtek Kaliciński how to reduce the size of your APK.

Categories: Programming

Stuff The Internet Says On Scalability For July 22nd, 2016

Hey, it's HighScalability time:

It's not too late London. There's still time to make this happen


If you like this sort of Stuff then please support me on Patreon.
  • 40%: energy Google saves in datacenters using machine learning; 2.3: times more energy knights in armor spend than when walking; 1000x: energy efficiency of 3D carbon nanotubes over silicon chips; 176,000: searchable documents from the Founding Fathers of the US; 93 petaflops: China’s Sunway TaihuLight; $800m: Azure's quarterly revenue; 500 Terabits per square inch: density when storing a bit with an atom; 2 billion: Uber rides; 46 months: jail time for accessing a database.

  • Quotable Quotes:
    • Lenin: There are decades where nothing happens; and there are weeks where decades happen.
    • Nitsan Wakart: I have it from reliable sources that incorrectly measuring latency can lead to losing ones job, loved ones, will to live and control of bowel movements.
    • Margaret Hamilton: part of the culture on the Apollo program “was to learn from everyone and everything, including from that which one would least expect.”
    • @DShankar: Basically @elonmusk plans to compete with -all vehicle manufacturers (cars/trucks/buses) -all ridesharing companies -all utility companies
    • @robinpokorny: ‘Number one reason for types is to get idea what the hell is going on.’ @swannodette at #curryon
    • Dan Rayburn: Some have also suggested that the wireless carriers are seeing a ton of traffic because of Pokemon Go, but that’s not the case. Last week, Verizon Wireless said that Pokemon Go makes up less than 1% of its overall network data traffic.
    • @timbaldridge: When people say "the JVM is slow" I wonder to what dynamic, GC'd, runtime JIT'd, fully parallel, VM they are comparing it to.
    • @papa_fire: “Burnout is when long term exhaustion meets diminished interest.”  May be the best definition I’ve seen.
    • Sheena Josselyn: Linking two memories was very easy, but trying to separate memories that were normally linked became very difficult
    • @mstine: if your microservices must be deployed as a complete set in a specific order, please put them back in a monolith and save yourself some pain
    • teaearlgraycold: Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
    • Erik Duindam:  I bake minimum viable scalability principles into my app.
    • Hassabis: It [DeepMind] controls about 120 variables in the data centers. The fans and the cooling systems and so on, and windows and other things. They were pretty astounded.
    • @WhatTheFFacts: In 1989, a new blockbuster store was opening in America every 17 hours.
    • praptak: It [SRE] changes the mindset from "Failure? Just log an error, restore some 'good'-ish state and move on to the next cool feature." towards "New cool feature? What possible failures will it cause? How about improving logging and monitoring on our existing code instead?"
    • plusepsilon: I transitioned from using Bayesian models in academia to using machine learning models in industry. One of the core differences in the two paradigms is the "feel" when constructing models. For a Bayesian model, you feel like you're constructing the model from first principles. You set your conditional probabilities and priors and see if it fits the data. I'm sure probabilistic programming languages facilitated that feeling. For machine learning models, it feels like you're starting from the loss function and working back to get the best configuration

  • Isn't it time we admit Dark Energy and Dark Matter are simply optimizations in the algorithms running the sim of our universe? Occam's razor. Even the Eldritch engineers of our creation didn't have enough compute power to simulate an entire universe. So they fudged a bit. What's simpler than making 90 percent of matter in our galaxy invisible?

  • Do you have one of these? Google has a Head of Applied AI.

  • Uber with a great two-article series on their stack (Part uno, Part deux): Our business runs on a hybrid cloud model, using a mix of cloud providers and multiple active data centers...We currently use Schemaless (built in-house on top of MySQL), Riak, and Cassandra...We use Redis for both caching and queuing. Twemproxy provides scalability of the caching layer without sacrificing cache hit rate via its consistent hashing algorithm. Celery workers process async workflow operations using those Redis instances...for logging, we use multiple Kafka clusters...This data is also ingested in real time by various services and indexed into an ELK stack for searching and visualizations...We use Docker containers on Mesos to run our microservices with consistent configurations scalably...Aurora for long-running services and cron jobs...Our service-oriented architecture (SOA) makes service discovery and routing crucial to Uber’s success...we’re moving to a pub-sub pattern (publishing updates to subscribers). HTTP/2 and SPDY more easily enable this push model. Several poll-based features within the Uber app will see a tremendous speedup by moving to push....we’re prioritizing long-term reliability over debuggability...Phabricator powers a lot of internal operations, from code review to documentation to process automation...We search through our code on OpenGrok...We built our own internal deployment system to manage builds. Jenkins does continuous integration. We combined Packer, Vagrant, Boto, and Unison to create tools for building, managing, and developing on virtual machines. We use Clusto for inventory management in development. Puppet manages system configuration...We use an in-house documentation site that autobuilds docs from repositories using Sphinx...Most developers run OSX on their laptops, and most of our production instances run Linux with Debian Jessie...At the lower levels, Uber’s engineers primarily write in Python, Node.js, Go, and Java...We rip out and replace older Python code as we break up the original code base into microservices. An asynchronous programming model gives us better throughput. And lots more.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Mahout/Hadoop: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

Mark Needham - Fri, 07/22/2016 - 14:55

I’ve been working my way through Dragan Milcevski’s mini tutorial on using Mahout to do content based filtering on documents and reached the final step where I needed to read in the generated item-similarity files.

I got the example compiling by using the following Maven dependency:
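
A plausible reconstruction of that dependency (the coordinates are inferred from the mahout-core 0.9.0 release mentioned further down; the Maven Central artifact is versioned 0.9):

<!-- reconstructed: inferred from the mahout-core 0.9.0 release noted below -->
<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.9</version>
</dependency>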


Unfortunately when I ran the code I ran into a version incompatibility problem:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
	at org.apache.hadoop.ipc.Client.call(Client.java:1113)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
	at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
	at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
	at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
	at com.markhneedham.mahout.Similarity.getDocIndex(Similarity.java:86)
	at com.markhneedham.mahout.Similarity.main(Similarity.java:25)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Version 0.9.0 of mahout-core was published in early 2014, so I expect it was built against an earlier version of Hadoop than I’m using (2.7.2).

I tried updating the Hadoop dependencies that appeared in the stack trace, to no avail.


When stepping through the stack trace I noticed that my program was still using an old version of hadoop-core, so with one last throw of the dice I decided to try explicitly excluding that:
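
A plausible reconstruction of that exclusion (assuming the dependency above; hadoop-core is the legacy Hadoop 1.x client artifact):

<!-- reconstructed: exclude the legacy Hadoop 1.x client pulled in by mahout-core -->
<dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.9</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>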


And amazingly it worked. Now, finally, I can see how similar my documents are!

Categories: Programming

Hadoop: DataNode not starting

Mark Needham - Fri, 07/22/2016 - 14:31

In my continued playing with Mahout I eventually decided to give up using my local file system and use a local Hadoop instead since that seems to have much less friction when following any examples.

Unfortunately all my attempts to upload any files from my local file system to HDFS were being met with the following exception:

java.io.IOException: File /user/markneedham/book2.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:928)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:811)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

I eventually realised, from looking at the output of jps, that the DataNode wasn’t actually starting up which explains the error message I was seeing.

A quick look at the log files showed what was going wrong:


2016-07-21 18:58:00,496 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /usr/local/Cellar/hadoop/hdfs/tmp/dfs/data: namenode clusterID = CID-c2e0b896-34a6-4dde-b6cd-99f36d613e6a; datanode clusterID = CID-403dde8b-bdc8-41d9-8a30-fe2dc951575c
2016-07-21 18:58:00,496 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to / Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
        at java.lang.Thread.run(Thread.java:745)
2016-07-21 18:58:00,497 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /
2016-07-21 18:58:00,602 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2016-07-21 18:58:02,607 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-07-21 18:58:02,608 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-07-21 18:58:02,610 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

I’m not sure how my clusterIDs got out of sync, although I expect it’s because I reformatted HDFS without realising at some stage. There are other ways of solving this problem but the quickest for me was to just nuke the DataNode’s data directory which the log file told me sits here:

sudo rm -r /usr/local/Cellar/hadoop/hdfs/tmp/dfs/data/current

I then re-ran the hstart script that I stole from this tutorial and everything, including the DataNode this time, started up correctly:

$ jps
26736 NodeManager
26392 DataNode
26297 NameNode
26635 ResourceManager
26510 SecondaryNameNode

And now I can upload local files to HDFS again. #win!

Categories: Programming

Customer Satisfaction Metrics and Quality

On a scale of fist to five, I’m at a ten.

Quality is partly about the number of defects delivered in a piece of software and partly about how the stakeholders and customers experience the software. Experience is typically measured as customer satisfaction. Customer satisfaction is a measure of how products and services supplied by a company meet or surpass customer expectations. Customer satisfaction is impacted by all three aspects of software quality: functional (what the software does), structural (whether the software meets standards) and process (how the code was built).

Surveys can be used to collect customer- and team-level data. Satisfaction measures whether products, services, behaviors, or the work environment meet expectations.

  1. Asking: Asking the question, “are you happy (or some variant of the word happy) with the results of XYZ project?” is an assessment of satisfaction. The answer to that simple question will indicate whether the people you are asking are “happy”, or whether you need to ask more questions. Asking is a powerful tool and can be as simple as asking a single question of a team or group of customers, or as complicated as using multifactor surveys. Even though just asking whether someone is satisfied and then listening to the answer can provide powerful information, the size of projects or the complexity of software being delivered often dictates a more formal approach, which means that surveys are often used to collect satisfaction data. Product or customer satisfaction is typically measured after a release or on a periodic basis.

    Fist to Five, a simple asking technique: Agile teams measure team-level satisfaction using simple techniques such as Fist-to-Five. Fist-to-Five is a simple asking technique in which team members are asked to vote on how satisfied they are by flashing a number of fingers all at the same time. Showing five fingers means you are very satisfied; a fist (no fingers) means unsatisfied. This form of measurement can be used to assess team satisfaction on a daily basis. (A simple video explanation) I generally post an average score on the wall in the team room in order to track the team’s satisfaction trend.

  2. The Net Promoter metric is a more advanced form of customer satisfaction measure than simply asking, but less complicated than the multifactor indexes that are sometimes generated. Promoters are people who are so satisfied that they will actively spread knowledge to others. Generating the metric begins by asking “how likely are you to recommend the product or organization being measured to a friend or colleague?” I have seen many variants of the net promoter question, but at the heart of it, the question is whether the respondent will recommend the service, product, team or organization. The response is scored using a scale from 1 – 10. Answers of 10 or 9 represent promoters, 7 or 8 are neutral, and all other answers represent detractors. The score is calculated using the following formula: (# of Promoters - # of Detractors) / (Total Promoters + Neutral + Detractors) x 100. If ten people responded to a net promoter question and 5 were promoters, 3 neutral and 2 detractors, the net promoter score is 30 ((5 - 2) / 10 x 100), as sketched below. Over time the goal is to improve the net promoter score, which will increase the chance your work will be recommended.
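
A minimal sketch of that arithmetic (a hypothetical Java helper, not from the post):

class NetPromoter {
    // On the 1 - 10 scale above: 9 and 10 are promoters, 7 and 8 are neutral,
    // and everything else counts as a detractor.
    static int score(int[] responses) {
        int promoters = 0, detractors = 0;
        for (int response : responses) {
            if (response >= 9) promoters++;
            else if (response <= 6) detractors++;
        }
        // (promoters - detractors) / total respondents x 100
        return Math.round(100f * (promoters - detractors) / responses.length);
    }
}

// The example above: 5 promoters, 3 neutral, 2 detractors out of 10 responses.
// NetPromoter.score(new int[] {10, 10, 9, 9, 9, 8, 7, 7, 5, 3}) returns 30.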

Software quality is a nuanced concept that reflects many factors, some of which are functional, structural or process related. Satisfaction is a reflection of quality from a different perspective than measuring defects or code structure. The essence of customer satisfaction is the very simple question, are you happy with what we delivered? Knowing if the team, stakeholders, and customers are happy with what was delivered or the path that was taken to get to that delivery is often just as important as knowing the number of defects that were delivered.

Categories: Process Management

Quote of the Day

Herding Cats - Glen Alleman - Thu, 07/21/2016 - 19:03

A skeptic will question claims, then embrace the evidence. A denier will question claims, then reject the evidence. - Neil deGrasse Tyson

Think of this whenever there is a conjecture that has no testable evidence of the claim. And think even more when those making the conjectured claim refuse to provide evidence. When that is the case, it is appropriate to ignore the conjecture altogether.

Categories: Project Management

Mahout: Exception in thread “main” java.lang.IllegalArgumentException: Wrong FS: file:/… expected: hdfs://

Mark Needham - Thu, 07/21/2016 - 18:57

I’ve been playing around with Mahout over the last couple of days to see how well it works for content based filtering.

I started following a mini tutorial from Stack Overflow but ran into trouble on the first step:

bin/mahout seqdirectory \
--input file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo \
--output file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo-out \
-c UTF-8 \
-chunk 64 \
-prefix mah
16/07/21 21:19:20 INFO AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo], --keyPrefix=[mah], --method=[mapreduce], --output=[file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo-out], --startPhase=[0], --tempDir=[temp]}
16/07/21 21:19:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/21 21:19:20 INFO deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/07/21 21:19:20 INFO deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
16/07/21 21:19:20 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo, expected: hdfs://localhost:8020
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
	at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:156)
	at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:90)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I was trying to run the command against the local file system on my laptop, which should have been possible according to the instructions. I couldn’t find any flag that I could pass to Mahout to tell it not to use HDFS, but I eventually stumbled on someone else experiencing a similar problem.

It turns out that the last time I was playing around with Hadoop, in late 2015, I’d actually configured HDFS as the default file system and had completely forgotten. I needed to comment out the following config:
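
A reconstruction of that property, which lives in core-site.xml (the value comes from the “expected: hdfs://localhost:8020” message in the stack trace above; older installs name the property fs.default.name):

<!-- core-site.xml (reconstructed) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
</property>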



I commented that property out and all was happy with the (Hadoop) world again.

Categories: Programming

Making Release Frictionless, a Business Decision, Part 2

In Part 1, I talked about small stories/chunks of work, checking in all the time so you could build often and see progress. That assumes you know what done means. Project “done” means release criteria. Here are some stories about how I started using release criteria.

Back in the 70s, I worked in a small development group. We had 5 or 6 people, depending on the time of year. We worked alone on our parts of the system. We all integrated into one instrument, but we worked primarily alone. This is back in the days of microcomputers. I wrote assembler, Fortran, or microcode, depending on the part of the system. I still worked on small chunks, “checked in,” as in I made sure I saved my files. No, we had no real version control then.

We had a major release in about 1979 or something like that. I’d been there about 15 months by then. Our customers called the President of the company, complaining about the software. Yes, it was that bad.

Why was it that bad? We had thought we were working towards one goal. Our customers wanted a different goal. If I remember correctly, these were some of the problems (that was a long time ago. I might have forgotten some details.):

  • We did not have a unified approach to how the system asked for information. There was no GUI, but the command line was not consistent.
  • Parts of the system were quite buggy. The calculations were correct, but the information presentation was slow, in the wrong place, or didn’t make sense. Our customers had a difficult time using the system.
  • Some parts of the system were quite slow. Not the instrument, but how the instrument showed the data.
  • The parts didn’t fit together. No one had the responsibility of making sure that the system looked like one system. We all integrated our own parts. No one looked at the whole.

Oops. My boss told us we needed to fix it. I asked the question, “How will we know we are done?” He responded, “When the customers stop calling.” I said, “No, we’re not going to keep shipping more tape to people. What are all the things you want us to do?” He said, “You guys are the software people. You decide.”

I asked my colleagues if it was okay if I developed draft release criteria, so we would know that the release was done. They agreed. I developed them in the next half day, wrote a memo and asked people for a meeting to see if they agreed. (This is in the days before email.)

We met and we changed my strawman criteria to something we could all agree on. We now knew what we had to do. I showed the criteria to my boss. He agreed with them. We worked to the release criteria, sent out a new tape (before the days of disks or CDs!) to all our customers and that project finally finished.

I used the idea of release criteria on every single project since. For me, it’s a powerful idea, to know what done means for the project.

I wrote a release criteria article (see the release criteria search for my release criteria writing) and explained it more in Manage It! Your Guide to Modern, Pragmatic Project Management.

In the 80s, I used it for a project where we did custom machine vision implementations. If I hadn’t, the customer would have continued asking for more features. The customer did anyway, but we could ask for more money every time we changed the release criteria to add more features.

I use release criteria and milestone criteria for any projects and programs longer than three months in duration, so we could see our progress (or lack thereof) earlier, rather than later. To be honest, even if we think the project is only a couple of months, I always ask, “Do we know what done means for this project?” For small projects, I want to make sure we finish and don’t add more to the project. For programs, I want to make sure we all know where we are headed, so we get there.

Here’s how small chunks of work, checking in every day, and release criteria all work together:

  1. Release criteria tell you what done means for the project. Once you know, you can develop scenarios for checking on your “doneness” as often as you like. I like automated tests that we can run with each build. The tests tell us if we are getting closer or farther away from what we want out of our release.
  2. When you work in small chunks, check them in every day and build at least as often as every day, you can check on the build progress. You know if the build is good or not.
  3. If you add the idea of scenarios for testing as you proceed, release becomes a business decision, not a “hardening sprint” or some such.

Here’s a little list that might help you achieve friction-less releases:

  1. What do you need to do to make your stories small? If they are not one day, can you pair, swarm, or mob to finish one story in one day? What would you have to change to do so?
  2. If you have curlicue stories, what can you do to make your stories look like the straight line through the architecture?
  3. What can you do to check in all the time? Is it a personal thing you can do, or do you need to ask your entire team to check in all the time? I don’t know how to really succeed at agile without continuous integration. What prevents you from integrating all the time? (Hint, it might be your story size.)
  4. Do you know what done means for this release (interim and project)? Do you have deliverable-based planning to achieve those releases?

Solve these problems and you may find frictionless release possible.

When you make releasing externally a business decision—because you can release internally any time you want—you will find your project proceeds more smoothly, regardless of whether you are agile.

Reminder: If you want to learn how to make your stories smaller or solve some of the problems of non-frictionless releases, join my Practical Product Owner workshop, starting August 23, 2016. You’ll practice on your projects, so you can see maximum business value from the workshop.

Categories: Project Management

The Ultimate Tester: Wrap-Up

Xebia Blog - Thu, 07/21/2016 - 16:14
To everyone who has read all or some of the past blog posts in this series: thank you so much for reading. I hope I have given you some food for thought on where you can improve as a tester (or developer who tests!). In four blog posts, we explored what it takes to become

Making Release Frictionless, a Business Decision, Part 1

Would you like to release your product at any time? I like it when releases are a business decision, not a result of blood, sweat, and tears. It’s possible, and it might not be easy for you. Here are some stories that show how I did it, long ago and more recently.

Story 1: Many years ago, I was a developer on a moderately complex system. There were three of us working together. We used RCS (yes, it was in the ’80s or something like that). I hated that system. Maybe it was our installation of it. I don’t know. All I know is that it was too easy to lock each other out, and not be able to do a darn thing. My approach was to make sure I could check in my work in as few files as possible (Single Responsibility Principle, although I didn’t know it at the time), and to work on small chunks.

I checked in every day at least before I went to lunch, once in the middle of the afternoon, and before I left for the day. I did not do test-first development, and I didn’t check my tests in at the time. It took me a while to learn that lesson. I only checked in working code—at least, it worked on my machine.

We built almost every day. (No, we hadn’t learned that lesson either.) We could release at least once a week, closer to twice a week. Not friction-less, but close enough for our needs.

Story 2: I learned some lessons, and a few years later, I had graduated to SCCS. I still didn’t like it. Merging was not possible for us, so we each worked on our own small stuff. I still worked on small chunks and checked in at least three times a day. This time, I was smarter, and checked in my tests as I wrote code. I still wrote code first and tests second. However, I worked in really small chunks (small functions and the tests that went with them) and checked them in as a unit. The only time I didn’t do that is if it was lunch or the end of the day. If I was done with code but not tests, I checked in anyway. (No, I was not perfect.) We all had a policy of checking in all our code every day. That way, someone else could take over if one of us got sick.

Each of us did the same thing. This time, we built whenever we wanted a new system. Often, it was a couple of times a day. We told each other, “Don’t go there. That part’s not done, but it shouldn’t break anything.” We had internal releases at least once a day. We released as a demo once a week to our manager.

After that, I worked at a couple of places with home-grown version control systems that look a lot like subversion does now. That was in the later 80s. I became a project manager and program manager.

Story 3: I was a program manager for a 9-team software program. We’d had trouble in the past getting to the point where we could release. I asked teams to do these things: Work towards a program-wide deliverable (release) every month, and use continuous integration. I said, “I want you to check everything in every day and make sure we always have a working build. I want to be able to see the build work every morning when I arrive.” Seven teams said yes. Two teams said no. I explained to the teams they could work in any way they wanted, as long as they could integrate within 24 hours of seeing everyone else’s code. “No problem, JR. We know what we’re doing.”

Well, those two teams didn’t deliver their parts at the first month milestone. They were pissed when I explained they could not work on any more features until they integrated what they had. Until they had everything working, no new features. (I was pissed, too.)

It took them almost three weeks to integrate their four weeks of work. They finally asked for help and a couple of other guys worked with the teams to untangle their code and check everything in.

I learned the value of continuous integration early. Mostly because I was way too lazy (forgetful?, not smart enough?) to be able to keep the details of an entire system in my head for an entire project. I know people who can. I cannot. I used to think it was one of my failings. I now realize many people only think they can keep all the details. They can’t either.

Here’s the technical part of how I got to frictionless releases:

  1. Make the thing you work on small. If you use stories, make the story a one-day or smaller story. I don’t care if the entire team works on it or one person works on it (well, I do care, and that’s a topic for another post), but being able to finish something of value in one day means you can descend into it. You finish it. You come up for air/more work and descend again. You don’t have to remember a ton of stuff related but not directly a part of this feature.
  2. Use continuous integration. Check in all the time. Now that I write books using subversion, I check in whenever I have either several paras/one chunk, or it’s been an hour. I check that the book builds and I fix problems right away, when the work is fresh in my mind. It’s one of the ways I can write fast and write well. Our version control systems are much more sophisticated than the ones I used in the early days. I’m not sure I buy automated merge. I prefer to keep the stories small and cohesive. (See this post on curlicue features. Avoid those by managing to implement by feature.)
  3. Check in all the associated data. I check in automated tests and test data when I write code. I check in bibliographic references when I write books. If you need something else with your work product, do it at the time you create. If I was a developer now, I would check in all my unit tests when I check in the code. If I was really smart, I might even check in the tests first, to do TDD. (TDD helps people design, not test.) If I was a tester, I would definitely check in all the automated tests as soon as possible. I could then ask the developers to run those tests to make sure they didn’t make a mistake. I could do the hard-to-define and necessary exploratory testing. (Yes, I did this as a tester.)

Frictionless releases are not just technical. You have to know what done means for a release. That’s why I started using release criteria back in the 70s. I’ll write a part 2 about release criteria.

Categories: Project Management

Our Answer To the Alert Storm: Introducing Team View Alerts

Xebia Blog - Thu, 07/21/2016 - 11:39
As a Dev or Ops it’s hard to focus on the things that really matter. Applications, systems, tools and other environments are generating notifications at a frequency and amount greater than you are able to cope with. It's a problem for every Dev and Ops professional. Alerts are used to identify trends, spikes or dips

Neo4j: Cypher – Detecting duplicates using relationships

Mark Needham - Wed, 07/20/2016 - 18:32

I’ve been building a graph of computer science papers on and off for a couple of months and now that I’ve got a few thousand loaded in I realised that there are quite a few duplicates.

They’re not duplicates in the sense that there are multiple entries with the same identifier but rather have different identifiers but seem to be the same paper!

e.g. there are a couple of papers titled ‘Authentication in the Taos operating system’:



This is the same paper published in two different journals as far as I can tell.

Now in this case it’s quite easy to just do a string similarity comparison of the titles of these papers and realise that they’re identical. I’ve previously used the excellent dedupe library to do this and there’s also an excellent talk from Berlin Buzzwords 2014 where the author uses locality-sensitive hashing to achieve a similar outcome.

However, I was curious whether I could use any of the relationships these papers have to detect duplicates rather than just relying on string matching.

This is what the graph looks like (papers connected to the papers they reference and to their authors):


We’ll start by writing a query to see how many common references the different Taos papers have:

MATCH (r:Resource {id: "168640"})-[:REFERENCES]->(other)
WITH r, COLLECT(other) as myReferences
UNWIND myReferences AS reference
OPTIONAL MATCH path = (other)-[:REFERENCES]->(reference)
WITH other, COUNT(path) AS otherReferences, SIZE(myReferences) AS myReferences
WITH other, 1.0 * otherReferences / myReferences AS similarity WHERE similarity > 0.5
RETURN other.id, other.title, similarity
ORDER BY similarity DESC
│other.id│other.title                                │similarity│
│168640  │Authentication in the Taos operating system│1         │
│174614  │Authentication in the Taos operating system│1         │

This query:

  • picks one of the Taos papers and finds its references
  • finds other papers which reference those same papers
  • calculates a similarity score based on how many common references they have
  • returns papers that have more than 50% of the same references with the most similar ones at the top

I tried it with other papers to see how it fared:

Performance of Firefly RPC

│other.id│other.title                                                     │similarity        │
│74859   │Performance of Firefly RPC                                      │1                 │
│77653   │Performance of the Firefly RPC                                  │0.8333333333333334│
│110815  │The X-Kernel: An Architecture for Implementing Network Protocols│0.6666666666666666│
│96281   │Experiences with the Amoeba distributed operating system        │0.6666666666666666│
│74861   │Lightweight remote procedure call                               │0.6666666666666666│
│106985  │The interaction of architecture and operating system design     │0.6666666666666666│
│77650   │Lightweight remote procedure call                               │0.6666666666666666│

Authentication in distributed systems: theory and practice

│other.id│other.title                                                │similarity        │
│121160  │Authentication in distributed systems: theory and practice│1                 │
│138874  │Authentication in distributed systems: theory and practice│0.9090909090909091│

Sadly it’s not as simple as finding 100% matches on references! I expect the later revisions of a paper added more content and therefore additional references.

What if we look for author similarity as well?

MATCH (r:Resource {id: "121160"})-[:REFERENCES]->(other)
WITH r, COLLECT(other) as myReferences
UNWIND myReferences AS reference
OPTIONAL MATCH path = (other)-[:REFERENCES]->(reference)
WITH r, other, COUNT(path) AS otherReferences, SIZE(myReferences) AS myReferences
WITH r, other, 1.0 * otherReferences / myReferences AS referenceSimilarity
WHERE referenceSimilarity > 0.5
MATCH (r)<-[:AUTHORED]-(author)
WITH r, other, referenceSimilarity, COLLECT(author) AS myAuthors
UNWIND myAuthors AS author
OPTIONAL MATCH path = (other)<-[:AUTHORED]-(author)
WITH other, referenceSimilarity, COUNT(path) AS otherAuthors, SIZE(myAuthors) AS myAuthors
WITH other, referenceSimilarity, 1.0 * otherAuthors / myAuthors AS authorSimilarity
WHERE authorSimilarity > 0.5
RETURN other.id, other.title, referenceSimilarity, authorSimilarity
ORDER BY (referenceSimilarity + authorSimilarity) DESC
│other.id│other.title                                                │referenceSimilarity│authorSimilarity│
│121160  │Authentication in distributed systems: theory and practice│1                  │1               │
│138874  │Authentication in distributed systems: theory and practice│0.9090909090909091 │1               │

│other.id│other.title                   │referenceSimilarity│authorSimilarity│
│74859   │Performance of Firefly RPC    │1                  │1               │
│77653   │Performance of the Firefly RPC│0.8333333333333334 │1               │

I’m sure I could find some other papers where neither of these similarities worked well but it’s an interesting start.

I think the next step is to build up a training set of pairs of documents that are and aren’t similar to each other. We could then train a classifier to determine whether two documents are identical.

But that’s for another day!

Categories: Programming

Connecting your App to a Wi-Fi Device

Android Developers Blog - Wed, 07/20/2016 - 18:21

Posted by Rich Hyndman, Android Developer Advocate

With the growth of the Internet of Things, connecting Android applications to Wi-Fi enabled devices is becoming more and more common. Whether you’re building an app for a remote viewfinder, to set up a connected light bulb, or to control a quadcopter, if it’s Wi-Fi based you will need to connect to a hotspot that may not have Internet connectivity.

From Lollipop onwards the OS became a little more intelligent, allowing multiple network connections and not routing data to networks that don’t have Internet connectivity. That’s very useful for users as they don’t lose connectivity when they’re near Wi-Fis with captive portals. Data routing APIs were added for developers, so you can ensure that only the appropriate app traffic is routed over the Wi-Fi connection to the external device.

To make the APIs easier to understand, it is good to know that there are three sets of networks available to developers:

  • WifiManager#startScan returns a list of available Wi-Fi networks. They are primarily identified by SSID.
  • WifiManager#getConfiguredNetworks returns a list of the Wi-Fi networks configured on the device, also indexed by SSID, but they are not necessarily currently available.
  • ConnectivityManager#getAllNetworks returns a list of networks the phone is currently interacting with. This is necessary because, from Lollipop onwards, a device may be connected to multiple networks at once: Wi-Fi, LTE, Bluetooth, etc. The current state of each is available by calling ConnectivityManager#getNetworkInfo, and each network is identified by a network ID.

In all versions of Android, you start by scanning for available Wi-Fi networks with WifiManager#startScan, iterating through the ScanResults looking for the SSID of your external Wi-Fi device. Once you’ve found it, you can check whether it is already a configured network using WifiManager#getConfiguredNetworks, iterating through the WifiConfigurations returned and matching on SSID. It’s worth noting that the SSIDs of configured networks are enclosed in double quotes, whilst the SSIDs returned in ScanResults are not.

If your network is configured, you can obtain the network ID from the WifiConfiguration object. Otherwise, you can configure it using WifiManager#addNetwork and keep track of the network ID that is returned.

To connect to the Wi-Fi network, register a BroadcastReceiver that listens for WifiManager.NETWORK_STATE_CHANGED_ACTION and then call WifiManager#enableNetwork(int netId, boolean disableOthers), passing in your network ID. The enableNetwork call disables all the other Wi-Fi access points for the next scan, locates the one you’ve requested, and connects to it. When you receive the network broadcasts, you can check with WifiManager#getConnectionInfo that you’re successfully connected to the correct network. But on Lollipop and above, if that network doesn’t have Internet connectivity, network requests will not be routed to it.
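Pulling those steps together, a minimal sketch of the flow might look like the following. The class and method names are illustrative, an open (unsecured) network is assumed, and permission handling, null checks and the broadcast receiver registration are omitted for brevity:

import android.content.Context;
import android.net.wifi.ScanResult;
import android.net.wifi.WifiConfiguration;
import android.net.wifi.WifiManager;

public final class DeviceConnector {

    /** Finds the device's network by SSID, configures it if necessary, and connects. */
    public static void connectToDevice(Context context, String ssid) {
        WifiManager wifi = (WifiManager)
                context.getApplicationContext().getSystemService(Context.WIFI_SERVICE);

        // 1. Check whether the device's SSID is in range. Scan results arrive
        //    asynchronously after startScan; here we assume a scan has completed.
        boolean inRange = false;
        for (ScanResult result : wifi.getScanResults()) {
            if (ssid.equals(result.SSID)) { // ScanResult SSIDs are unquoted
                inRange = true;
                break;
            }
        }
        if (!inRange) return;

        // 2. Reuse an existing configuration if there is one. Note the double
        //    quotes: configured SSIDs are quoted, ScanResult SSIDs are not.
        String quotedSsid = "\"" + ssid + "\"";
        int networkId = -1;
        for (WifiConfiguration config : wifi.getConfiguredNetworks()) {
            if (quotedSsid.equals(config.SSID)) {
                networkId = config.networkId;
                break;
            }
        }

        // 3. Otherwise configure the network (an open network is assumed here;
        //    a real device may need key management settings).
        if (networkId == -1) {
            WifiConfiguration config = new WifiConfiguration();
            config.SSID = quotedSsid;
            config.allowedKeyManagement.set(WifiConfiguration.KeyMgmt.NONE);
            networkId = wifi.addNetwork(config);
        }

        // 4. Connect. A BroadcastReceiver listening for
        //    WifiManager.NETWORK_STATE_CHANGED_ACTION should already be
        //    registered so we learn when the connection completes.
        if (networkId != -1) {
            wifi.enableNetwork(networkId, true);
        }
    }
}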

Routing network requests

To direct all the network requests from your app to an external Wi-Fi device, call ConnectivityManager#setProcessDefaultNetwork on Lollipop devices, and on Marshmallow call ConnectivityManager#bindProcessToNetwork instead, which is a direct API replacement. Note that these calls require android.permission.INTERNET; otherwise they will just return false.

Alternatively, if you’d like to route some of your app traffic to the Wi-Fi device and some to the Internet over the mobile network:
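A minimal sketch of that per-connection approach uses ConnectivityManager#requestNetwork to ask for a Wi-Fi network and then binds individual connections to it; the class name and device URL are illustrative:

import android.content.Context;
import android.net.ConnectivityManager;
import android.net.Network;
import android.net.NetworkCapabilities;
import android.net.NetworkRequest;

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public final class WifiDeviceRouter {

    /**
     * Routes a single connection over the Wi-Fi network; the rest of the
     * app's traffic keeps using the default (e.g. mobile) network.
     * Requires android.permission.CHANGE_NETWORK_STATE.
     */
    public static void fetchFromDevice(Context context, final URL deviceUrl) {
        ConnectivityManager cm = (ConnectivityManager)
                context.getSystemService(Context.CONNECTIVITY_SERVICE);

        NetworkRequest request = new NetworkRequest.Builder()
                .addTransportType(NetworkCapabilities.TRANSPORT_WIFI)
                .build();

        cm.requestNetwork(request, new ConnectivityManager.NetworkCallback() {
            @Override
            public void onAvailable(Network network) {
                try {
                    // Only this connection is bound to the Wi-Fi device's network.
                    URLConnection connection = network.openConnection(deviceUrl);
                    // ... read the response from the device ...
                } catch (IOException e) {
                    // Handle failure to reach the device.
                }
            }
        });
    }
}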

Now you can keep your users connected whilst they benefit from your innovative Wi-Fi enabled products.

Categories: Programming

Android Developer Story: StoryToys finds success in the β€˜Family’ section on Google Play

Android Developers Blog - Wed, 07/20/2016 - 17:21

Posted by Lily Sheringham, Google Play team

Based in Dublin, Ireland, StoryToys is a leading publisher of interactive books and games for children. Like most kids’ app developers, they faced the challenges of engaging with the right audiences to get their content discovered. Since the launch of the Family section on Google Play, StoryToys has experienced an uplift of 270% in revenue and an increase of 1300% in downloads.

Hear Emmet O’Neill, Chief Product Officer, and Gavin Barrett, Commercial Director, discuss how the Family section creates a trusted and creative space for families to find new content. Also hear how beta testing, localized pricing, and more have allowed StoryToys’ flagship app, My Very Hungry Caterpillar, to significantly increase engagement and revenue.

Learn more about Google Play for Families and get the Playbook for Developers app to stay up-to-date with more features and best practices that will help you grow a successful business on Google Play.

Categories: Programming

Building Highly Scalable V6 Only Cloud Hosting

This is a guest repost by Donatas Abraitis, Lead Systems Engineer at Hostinger International.

This article is about how we built a new, highly scalable cloud hosting solution using IPv6-only communication between commodity servers, the problems we faced with the IPv6 protocol, and how we tackled them to handle more than ten million active users.

Why did we decide to run IPv6-only network?

At Hostinger we care a great deal about innovative technologies, so we decided to build a new project, named Awex, on this protocol. If we can, why not start today? Only the frontend (user-facing) services run in a dual-stack environment; everything else is IPv6-only for east-west traffic.

Categories: Architecture

Assessing Value Produced in Exchange for the Cost to Produce the Value

Herding Cats - Glen Alleman - Wed, 07/20/2016 - 13:35

A common assertion in the Agile community is we focus on Value over Cost.

Both are equally needed. Both must be present to make informed decisions. Both are random variables. As random variables, both need estimates to make informed decisions.

To assess the value produced by a project, we first must have targets to steer toward. A target Value must be measured in units meaningful to the decision makers; Measures of Effectiveness and Performance can monetize this Value.

Value cannot be determined without knowing the cost to produce that Value. This is fundamental to the Microeconomics of Decision making for all business processes.

The Value must be assessed using...

  • Measures of Effectiveness - operational measures of success that are closely related to the achievement of the mission or operational objectives, evaluated in the operational environment under a specific set of conditions.
  • Measures of Performance - measures that characterize physical or functional attributes relating to the system operation, measured or estimated under specific conditions.
  • Key Performance Parameters - measures that represent capabilities and characteristics so significant that failure to meet them can be cause for reevaluation, reassessment, or termination of the program.
  • Technical Performance Measures - attributes that determine how well a system or system element satisfies, or is expected to satisfy, a technical requirement or goal.

Without these measures attached to the Value, there is no way to confirm that the cost to produce the Value will break even. The Return on Investment to deliver the needed Capability is, of course:

ROI = (Value - Cost)/Cost
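For example, with illustrative numbers: a capability that costs $100,000 to produce and delivers $130,000 in monetized Value yields ROI = (130,000 - 100,000) / 100,000 = 0.30, a 30% return.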

So the numerator and the denominator must have the same units of measure - usually dollars, sometimes hours. So when we hear ...

The focus on value is what makes the #NoEstimates idea valuable - ask: in what units of measure is that Value? Are those units of measure meaningful to the decision makers? Are those decision makers accountable for the financial performance of the firm?


Categories: Project Management

Strictly Enforced Verified Boot with Error Correction

Android Developers Blog - Wed, 07/20/2016 - 01:51

Posted by Sami Tolvanen, Software Engineer


Android uses multiple layers of protection to keep users safe. One of these layers is verified boot, which improves security by using cryptographic integrity checking to detect changes to the operating system. Android has alerted users about system integrity since Marshmallow, but starting with devices first shipping with Android 7.0, we require verified boot to be strictly enforcing. This means that a device with a corrupt boot image or verified partition will not boot, or will boot only in a limited capacity with user consent. Such strict checking, however, means that non-malicious data corruption, which previously would have been less visible, can now have a greater impact on functionality.

By default, Android verifies large partitions using the dm-verity kernel driver, which divides the partition into 4 KiB blocks and verifies each block against a signed hash tree as it is read. With dm-verity in enforcing mode, a single corrupted byte therefore makes an entire block inaccessible, and the kernel returns EIO errors to userspace on any access to verified partition data.

This post describes our work in improving dm-verity robustness by introducing forward error correction (FEC), and explains how this allowed us to make the operating system more resistant to data corruption. These improvements are available to any device running Android 7.0 and this post reflects the default implementation in AOSP that we ship on our Nexus devices.

Error-correcting codes

Using forward error correction, we can detect and correct errors in source data by shipping redundant encoding data generated using an error-correcting code. The exact number of errors that can be corrected depends on the code used and the amount of space allocated for the encoding data.

Reed-Solomon is one of the most commonly used error-correcting code families, and is readily available in the Linux kernel, which makes it an obvious candidate for dm-verity. These codes can correct up to ⌊t/2⌋ unknown errors and up to t known errors, also called erasures, when t encoding symbols are added.

A typical RS(255, 223) code that generates 32 bytes of encoding data for every 223 bytes of source data can correct up to 16 unknown errors in each 255 byte block. However, using this code results in ~15% space overhead, which is unacceptable for mobile devices with limited storage. We can decrease the space overhead by sacrificing error correction capabilities. An RS(255, 253) code can correct only one unknown error, but also has an overhead of only 0.8%.
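To see where those figures come from: an RS(n, k) code adds t = n - k encoding symbols per k source symbols, corrects up to ⌊t/2⌋ unknown errors, and costs t/k in extra space. For RS(255, 223), t = 32, so it corrects ⌊32/2⌋ = 16 unknown errors at an overhead of 32/223 ≈ 14.3%; for RS(255, 253), t = 2, so it corrects a single unknown error at an overhead of 2/253 ≈ 0.8%.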

An additional complication is that block-based storage corruption often occurs for an entire block and sometimes spans multiple consecutive blocks. Because Reed-Solomon is only able to recover from a limited number of corrupted bytes within relatively short encoded blocks, a naive implementation is not going to be very effective without a huge space overhead.

Recovering from consecutive corrupted blocks

In the changes we made to dm-verity for Android 7.0, we used a technique called interleaving to allow us to recover not only from a loss of an entire 4 KiB source block, but several consecutive blocks, while significantly reducing the space overhead required to achieve usable error correction capabilities compared to the naive implementation.

Efficient interleaving means mapping each byte in a block to a separate Reed-Solomon code, with each code covering N bytes across the corresponding N source blocks. A trivial interleaving where each code covers a consecutive sequence of N blocks already makes it possible for us to recover from the corruption of up to (255 - N) / 2 blocks, which for RS(255, 223) would mean 64 KiB, for example.

An even better solution is to maximize the distance between the bytes covered by the same code by spreading each code over the entire partition, thereby increasing the maximum number of consecutive corrupted blocks an RS(255, N) code can handle on a partition consisting of T blocks to ⌈T/N⌉ × (255 - N) / 2.

Interleaving with distance D and block size B.

An additional benefit of interleaving, when combined with the integrity verification already performed by dm-verity, is that we can tell exactly where the errors are in each code. Because each byte of the code covers a different source block, and we can verify the integrity of each block using the existing dm-verity metadata, we know which of the bytes contain errors. Being able to pinpoint erasure locations allows us to effectively double our error correction performance to at most ⌈T/N⌉ × (255 - N) consecutive blocks.

For a ~2 GiB partition with 524256 4 KiB blocks and RS(255, 253), the maximum distance between the bytes of a single code is 2073 blocks. Because each code can recover from two erasures, using this method of interleaving allows us to recover from up to 4146 consecutive corrupted blocks (~16 MiB). Of course, if the encoding data itself gets corrupted or we lose more than two of the blocks covered by any single code, we cannot recover anymore.
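As a sanity check on those numbers: ⌈524256 / 253⌉ = 2073, each RS(255, 253) code can repair 255 - 253 = 2 known erasures, and 2073 × 2 = 4146 blocks at 4 KiB per block is roughly 16.2 MiB.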

While making error correction feasible for block-based storage, interleaving does have the side effect of making decoding slower, because instead of reading a single block, we need to read multiple blocks spread across the partition to recover from an error. Fortunately, this is not a huge issue when combined with dm-verity and solid-state storage as we only need to resort to decoding if a block is actually corrupted, which still is rather rare, and random access reads are relatively fast even if we have to correct errors.


Strictly enforced verified boot improves security, but can also reduce reliability by increasing the impact of disk corruption that may occur on devices due to software bugs or hardware issues.

The new error correction feature we developed for dm-verity makes it possible for devices to recover from the loss of up to 16-24 MiB of consecutive blocks anywhere on a typical 2-3 GiB system partition with only 0.8% space overhead and no performance impact unless corruption is detected. This improves the security and reliability of devices running Android 7.0.

Categories: Programming

Four Delivered Defect Metrics

Find the defects before delivery.

One of the strongest indications of the quality of a piece of software is the number of defects found when it is used. In software, defects are generated by a flaw that causes the code to fail to perform as required. Even organizations that don’t spend the time and effort to collect information on defects before the software is delivered will collect information on defects that crop up after delivery. Four classic defect measures are used post-delivery. Each of the four measures is used to improve the functional, structural, and process aspects of the software delivery process. They are:

  1. Simple Backlog or List is the simplest defect measure. Relatively early in my career, I took over a testing group that supported a few hundred developers. During the interview process, I determined that a lot of defects were being found in production; however, no one was quite sure how many were "a lot." Enter the need to capture those defects and generate a simple backlog.
     Variants on the simple backlog include defect aging reports, counts by defect type, and quality of fix (how many times it takes to fix a defect), to name a few.
     Backlogs of delivered defects can be a reflection of all aspects of the development process. Variants that include defect type or insertion point (where the defect came from) can be used to target a specific aspect of quality.
  2. Defect Arrival Rate (post-implementation) is similar to the Defect Discovery and Cumulative Discovery Rates used to monitor defects during software development. This metric is most often used to monitor the implementation of programs or other types of large-scale effort; I used this tool to monitor bank mergers and conversions. Arrival rate metrics collected during the development process are most often a measure of structural quality; however, after functionality is in production it is much more difficult to tie this metric to a specific aspect of quality. More information is needed to determine whether the defect data is tied to functional, structural or process quality.
  3. Defect Removal Efficiency (DRE) is the ratio of defects found and removed before implementation to the total defects found through some period after delivery (typically thirty to ninety days); a worked example follows this list. DRE is often used to evaluate the quality of the process used to develop the software (or other product). DRE includes information from all defect insertion tasks (such as analysis, design, and coding) and all defect removal tasks (reviews, testing and usage).
  4. Mean Time to Failure is the length of time a device or piece of software is expected to operate before it breaks. Failure is typically held to mean that the device or software ceases to operate (defining the word failure is always a bit fraught). This is the most sophisticated of the classic quality metrics (really a measure of reliability) and is most often used to evaluate devices. Mean time to failure is almost always used to evaluate the functional aspect of quality.
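As a worked DRE example (the counts are illustrative): if a team finds and removes 180 defects before release and users report 20 more in the first ninety days, DRE = 180 / (180 + 20) = 90%.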

The number and types of delivered defects are an obvious reflection of software quality. The more defects a user of any sort finds, the lower the quality of the software. Simple counts of defects or backlogs, however, tend to provide less information value once we get past a few defects.

Categories: Process Management