Software Development Blogs: Programming, Software Testing, Agile, Project Management

Methods & Tools

Subscribe to Methods & Tools if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Feed aggregator

Attitude versus Knowledge in Software Development

In her article for the Summer 2016 issue of Methods & Tools, “Hiring for Agility,” Nadia Smith suggested an interesting distinction between recruiting for “Agile”, defined as knowledge and experience of Agile practices, and recruiting for “agility”, defined as attitude, values and behavior. Without focusing on Agile, this approach […]

Announcing Open Registration and Exhibitors for Google Play Indie Games Festival in San Francisco, Sept. 24

Android Developers Blog - Mon, 08/29/2016 - 18:03

Posted by Jamil Moledina, Google Play, Games Strategic Lead

To celebrate the art of the latest innovative indie games, we’re hosting the first Google Play Indie Games Festival in North America on September 24th in San Francisco. At the festival, Android fans and gamers will have a unique opportunity to play new and unreleased indie games from some of the most innovative developers in the US and Canada, as well as vote for their favorite ones.

Registration is now open and the event is free for everyone to enjoy.

We’re also excited to announce the games selected to exhibit and compete at the event. From over 200 submissions, we carefully picked 30 games that promise the most fun and engaging experiences to attendees. Fans will have a chance to play a variety of indie games not yet available publicly.

Check out the full list of games selected here and below.


A Matter of Murder
Antihero (coming soon)
AR Zombie (coming soon)
Armajet (coming soon)
Armor Blitz (coming soon)
Bit Bit Blocks (coming soon)
1979 Revolution: Black Friday (coming soon)
Coffee Pot Terrarium (coming soon)
Crayola® Worlds for Tango (coming soon)
Dog Sled Saga (coming soon)
Endless Mine
Futurable
1. Summer City (coming soon)
Gunhouse (coming soon)
HoloGrid: Monster Battle (coming soon)
Hovercraft: Takedown
HOVR (coming soon)
Maruta 279 (coming soon)
Norman's Night In: The Cave (coming soon)
Numeris
Orbit - Playing with Gravity
Parallyzed
Psychic (coming soon)
Riptide GP: Renegade
Roofbot
Sand Stories (coming soon)
SmashWars VR: Drone Racing
ThreeSwipes
Rainmaker: Ultimate Trading
Zombie Rollerz (coming soon)

Fans will also have the opportunity to vote for their favorite games at the festival, along with an authoritative panel of judges from Google Play and the game industry. They include:

  • Ron Carmel, Co-founder of Indie Fund; co-creator of World of Goo
  • Hyunse Chang, Business Development Manager at Google Play
  • Lina Chen, Co-founder & CEO of Nix Hydra
  • David Edery, CEO of Spry Fox
  • Maria Essig, Partner Manager, Indies at Google Play
  • Noah Falstein, Chief Game Designer at Google
  • Dan Fiden, Chief Strategy Officer of Funplus
  • Emily Greer, CEO of Kongregate
  • Alex Lee, Producer, Program Manager, Daydream & Project Tango at Google
  • Jordan Maron, Gamer and independent YouTuber “CaptainSparklez”

We are also thrilled to announce that veteran game designer and professor Richard Lemarchand will be the emcee for the event. He was lead designer at Crystal Dynamics and Naughty Dog, and is now Associate Chair and Associate Professor at the University of Southern California, School of Cinematic Arts, Interactive Media and Games Division.

The winning developers will receive prizes, such as Google Cloud credits, NVIDIA SHIELD Android TVs and K1 tablets, Razer Forge TV bundles, and more, to recognize their efforts.

Join us for an exciting opportunity to connect with fellow game fans, get inspired, and celebrate the art of indie games. Learn more about the event on the event website.

Categories: Programming

Tango developer workshop brings stories to life

Google Code Blog - Mon, 08/29/2016 - 17:21

Posted by Eitan Marder-Eppstein, Senior Software Engineer for Tango

Technology helps us connect and communicate with others -- from sharing commentary and photos on social media to posting a video with breaking news, digital tools enable us to craft stories and share them with the world.

Tango can enhance storytelling by bringing augmented reality into our surroundings. Recently, the Tango team hosted a three-day developer workshop around how to use this technology to tell incredible stories through mobile devices. The workshop included a wide range of participants, from independent filmmakers and developers to producers and creatives at major media companies. By the end of the workshop, a number of new app prototypes had been created. Here are some of the workshop highlights:

  • The New York Times experimented with ways to connect people with news stories by creating 3D models of the places where the events happened.
  • The Wall Street Journal prototyped an app called ViewPoint to bring location-based stories to life. When you’re in front of a monument, for example, you can see AR content and pictures that someone else took at that site.
  • Line experimented with bringing 3D characters to life. For example, app users could see AR superheroes in front of them, and then their friends could jump into the characters’ costumes.
  • Google’s Mobile Vision Team brought music to life by letting people point their phones at various objects and visualize the vibrations that music makes on them.

We even had an independent developer use Tango to create a real-time video stabilization tool. We’re looking forward to seeing these apps—and many more—come to life. If you want to start building your own storytelling and visual communication apps for augmented reality, check out our developer page and join our G+ community.

Categories: Programming

Book of the Month

Herding Cats - Glen Alleman - Mon, 08/29/2016 - 00:45

Picturing the Uncertain World. This is a book about my favorite subject - the uncertainty in the normal world.

As readers of this blog know, managing in the presence of uncertainty is how adults manage projects. This is called Risk Management.

And as always in order to make decisions in the presence of the uncertainties that create risk, we need to make estimates.

No estimates? No decisions based on probabilistic choices. No probabilistic choices? No understanding of the resulting risks (both epistemic and aleatory). 

No understanding of the probabilistic outcomes and the statistical variances? No Adult Management of risk. (Remember Tim Lister's quote - Risk Management is How Adults Manage Projects.)

So it's this simple: no estimates of the outcome, no risk management. No risk management, no adult management. No estimates means No Adult Management of the naturally occurring and probabilistic risk on all project work.

Here's another Tim Lister presentation showing how estimates and estimating are part of all decision making when spending other people's money.

[Slide from Tim Lister's presentation on estimating and decision making]

Categories: Project Management

SPAMCAST 409 “Delay”

Friends, I have had an issue with my computer.  When I returned from grocery shopping, the machine had stopped functioning.  There is a remote possibility that I will post this evening, but the odds are that we will be back next week.


Categories: Process Management

Extreme Programming Explained, Second Edition: Re-Read Week 11 (Chapters 20 – 21)

XP Explained Cover

This week the re-read of Kent Beck and Cynthia Andres’s Extreme Programing Explained, Second Edition (2005) tackles Chapters 20 and 21.  Chapter 20 is a discussion of applying XP.  The short version is that there is no one perfect way to apply XP, which dovetails nicely with Chapter 21, which addresses the concept of purity and certification.  If there is no one perfect way to apply XP, there can’t be a litmus test for XP purity.

Chapter 20: Applying XP

Software development is rife with waste and inefficiencies even though practitioners have been struggling for years to wring the waste out of the process.  Part of the issue is that people are by their nature chaotic and therefore don’t always work in the most efficient manner. Other parts of the problem are that developing software is complex and that what ends up running in production is often a result of learning and innovation.  Said more simply, software development is not the result of an assembly line process. There are many paths to developing code based on context, people, and the business problem being solved, hence the potential for inefficiencies.

Many organizations make the mistake of mandating the adoption of a methodology such as XP without addressing the pre-existing problems in the organization that lead to inefficiency and waste.  Organizations that have not dealt with the problems that would make embracing the philosophies and principles underpinning XP viable are then shocked when individuals and teams revert to earlier practices they perceive as safer or more comfortable. Beck suggests that strong sponsorship helps avoid reverting away from XP at the organizational level.  At a lower level, to address the inertia created by pre-existing problems that are slow to change, Beck again suggests leading by example; adopting XP yourself and providing tangible outcomes (and stories) is a powerful tool for change.

The concept of continuous improvement connotes a smooth progression towards an ultimate goal. Any experienced practitioner or observer of organizational change will understand that the idea of a smooth progression toward the future is a simplification.  Rather, progress is more a series of jumps as improvements are made, tempered by feedback and then integrated into the organization.  Change can be facilitated by coaching.  Coaches can bring knowledge based on a wider base of experience to the team. Coaches are also a means to change the composition of the team, which can inject the energy needed to motivate change.

The final word on applying XP: if your organization’s values and principles are at odds with XP, don’t apply XP until you have wrestled with the pre-existing culture.

Chapter 21: Purity

How can you tell if a team or organization is practicing XP?  There is no binary purity test.  For example, just practicing pair programming does not mean you are practicing XP, just as being a distributed team does not mean you can’t practice XP. The values, principles, and practices that comprise the definition of XP in Extreme Programing Explained, Second Edition provide potential users of XP with guidance. Beck says, “Saying your team is extreme sets other people’s expectations for your style of communication, your development practices, and the speed and quality of your results.” Whether you call yourself extreme, Agile, or an RUP practitioner, what you call yourself sets expectations based on the values and principles embedded in the methodology.  In the end, purity is a matter of living up to the expectations of the values and principles you espouse.

The discussion of purity in Chapter 21 evoked a rant on certification.  Beck stated, “If a certifying authority isn’t willing to stand behind IT certification, it’s just printing certificates and collecting money.” The suggestion is that, as with board certifications in medicine, unless the certification board is willing to police usage, the certification lacks value.  As a member of the board of an association that tests and certifies practitioners, I find Beck’s comments food for thought.

Based on the premise that XP can be implemented and practiced in many different ways, if the way you are practicing XP upholds the values and principles of XP, you are “doing” XP.  In other words, there can be no litmus test or certification based on XP purity.

Previous installments of Extreme Programing Explained, Second Edition (2005) on Re-read Saturday:

Extreme Programming Explained: Embrace Change Second Edition Week 1, Preface and Chapter 1

Week 2, Chapters 2 – 3

Week 3, Chapters 4 – 5

Week 4, Chapters 6 – 7  

Week 5, Chapters 8 – 9

Week 6, Chapters 10 – 11

Week 7, Chapters 12 – 13

Week 8, Chapters 14 – 15

Week 9, Chapters 16 – 17

Week, 10, Chapters 18 – 19

Remember, we are going to read The Five Dysfunctions of a Team by Patrick Lencioni (Jossey-Bass) next.  This will be a new book for me, therefore an initial read, not a re-read!  Steven Adams suggested the book and it has been on my list for a few years. Click the link (The Five Dysfunctions of a Team), buy a copy, and in a few weeks we will begin to read the book together.


Categories: Process Management

scikit-learn: Clustering and the curse of dimensionality

Mark Needham - Sat, 08/27/2016 - 21:32

In my last post I attempted to cluster Game of Thrones episodes based on character appearances without much success. After I wrote that post I was flicking through the scikit-learn clustering documentation and noticed the following section which describes some of the weaknesses of the K-means clustering algorithm:

Inertia is not a normalized metric: we just know that lower values are better and zero is optimal.

But in very high-dimensional spaces, Euclidean distances tend to become inflated (this is an instance of the so-called “curse of dimensionality”).

Running a dimensionality reduction algorithm such as PCA prior to k-means clustering can alleviate this problem and speed up the computations.

Each episode has 638 dimensions so this is probably the problem we’re seeing. I actually thought the ‘curse of dimensionality’ referred to the greater than linear increase in computation time; I hadn’t realised it could also impact the clustering itself.

As the documentation notes, the K-Means algorithm calculates euclidean distances to work out which cluster episodes should go in. Episodes in the same cluster should have a small euclidean distance and items in different clusters should have larger ones.

I created a little script to help me understand the curse of dimensionality. I’ve got 4 pairs of vectors, of size 4, 6, 100, and 600. Half of the items in the vector match and the other half differ. I calculate the cosine similarity and euclidean distance for each pair of vectors:

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
 
def distances(a, b):
    return np.linalg.norm(a-b), cosine_similarity([a, b])[0][1]
 
def mixed(n_zeros, n_ones):
    return np.concatenate((np.repeat([1], n_ones), np.repeat([0], n_zeros)), axis=0)
 
def ones(n_ones):
    return np.repeat([1], n_ones)
 
print distances(mixed(2, 2), ones(4))
print distances(mixed(3, 3), ones(6))
print distances(mixed(50, 50), ones(100))
print distances(mixed(300, 300), ones(600))
 
(1.4142135623730951, 0.70710678118654746)
(1.7320508075688772, 0.70710678118654768)
(7.0710678118654755, 0.70710678118654757)
(17.320508075688775, 0.70710678118654746)

The euclidean distance for the 600 item vector is roughly 12x larger than for the one containing 4 items (17.32 vs 1.41), despite the pairs having the same cosine similarity score.

Having convinced myself that reducing the dimensionality of the vectors could make a difference I reduced the size of the episodes vectors using the Truncated SVD algorithm before trying K-means clustering again.

First we reduce the dimensionality of the episodes vectors:

from sklearn.decomposition import TruncatedSVD
 
n_components = 2
reducer = TruncatedSVD(n_components=n_components)
reducer.fit(all)
new_all = reducer.transform(all)
print("%d: Percentage explained: %s\n" % (n_components, reducer.explained_variance_ratio_.sum()))
 
2: Percentage explained: 0.124579183633

I’m not sure how much I should be reducing the number of dimensions so I thought 2 would be an interesting place to start. I’m not sure exactly what the output of the reducer.explained_variance_ratio_ attribute means so I need to do some more reading to figure out whether it makes sense to carry on with a dimension of 2.
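For what it's worth, one common way to decide how many dimensions to keep is to look at the cumulative explained variance: fit TruncatedSVD with a generous number of components and keep the smallest count that explains "enough" of the variance. The sketch below is not from the original post; the 50-component probe and the 0.8 cut-off are arbitrary assumptions, and all is the episodes/characters array from earlier.

from sklearn.decomposition import TruncatedSVD
import numpy as np

# Fit with more components than we expect to keep and inspect how much of
# the variance each additional component explains.
probe = TruncatedSVD(n_components=50)
probe.fit(all)

cumulative = np.cumsum(probe.explained_variance_ratio_)
for i, value in enumerate(cumulative):
    print("%d components explain %.3f of the variance" % (i + 1, value))

# Keep the smallest number of components that explains, say, 80% of the
# variance (assuming that threshold is reached at all); 0.8 is arbitrary.
n_components = int(np.argmax(cumulative >= 0.8)) + 1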

For now though let’s try out the clustering algorithm again and see how it gets on:

from sklearn import metrics
from sklearn.cluster import KMeans
 
for n_clusters in range(2, 10):
    km = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=100, n_init=1)
    cluster_labels = km.fit_predict(new_all)
    silhouette_avg = metrics.silhouette_score(new_all, cluster_labels, sample_size=1000)
 
    print n_clusters, silhouette_avg
 
2 0.559681096025
3 0.498456585461
4 0.524704352941
5 0.441580592398
6 0.44703058946
7 0.447895331824
8 0.433698007009
9 0.459874485986

This time our silhouette scores are much better. I came across a tutorial from the Guide to Advanced Data Analysis which includes a table explaining how to interpret this score:

[Table: how to interpret silhouette score values]

We have a couple of cluster sizes which fit in the ‘reasonable structure’ and a few just on the edge of fitting in that category.

I tried varying the number of dimensions and found that 3 worked reasonably well, but after that the silhouette score dropped rapidly. Once we reach 30 dimensions the silhouette score is almost the same as if we hadn’t reduced dimensionality at all.

I haven’t figured out a good way of visualising the results of my experiments where I vary the dimensions and number of clusters so that’s something to work on next. I find it quite difficult to see what’s going on by just staring at the raw numbers.
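One option, sketched below and not part of the original post, would be to run the grid of experiments and draw the silhouette scores as a heatmap. It assumes the all array from earlier and that matplotlib is installed; the particular lists of dimensions and cluster counts are just examples.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

dimensions = [2, 3, 5, 10, 30]
clusters = range(2, 10)
scores = np.zeros((len(dimensions), len(clusters)))

# Score every (number of dimensions, number of clusters) combination.
for i, n_components in enumerate(dimensions):
    reduced = TruncatedSVD(n_components=n_components).fit_transform(all)
    for j, n_clusters in enumerate(clusters):
        labels = KMeans(n_clusters=n_clusters, init='k-means++',
                        max_iter=100, n_init=1).fit_predict(reduced)
        scores[i, j] = metrics.silhouette_score(reduced, labels)

# Draw the grid of scores as a heatmap.
plt.imshow(scores, interpolation='nearest')
plt.xticks(range(len(clusters)), list(clusters))
plt.yticks(range(len(dimensions)), dimensions)
plt.xlabel("number of clusters")
plt.ylabel("number of dimensions")
plt.colorbar(label="silhouette score")
plt.show()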

I also need to read up on the SVD algorithm to understand when it is/isn’t acceptable to reduce dimensions and how much I should be reducing them by.

Any questions/thoughts/advice do let me know in the comments.

Categories: Programming

Stuff The Internet Says On Scalability For August 26th, 2016

Hey, it's HighScalability time:

 

 

The Pixar render farm in 1995 is half of an iPhone (@BenedictEvans)

 

If you like this sort of Stuff then please support me on Patreon.
  • 33.0%: of all retail goods sold online in the US are sold on Amazon;  110.9 million: monthly Amazon unique visitors; 21 cents: cost of 30K batch derived page views on Lambda; 4th: grade level of Buzzfeed articles; $1 trillion: home value threatened by rising sea levels; $1.2B: Uber lost $1.2B on $2.1B in revenue in H1 2016; 1.58 trillion: miles Americans drove through June; 

  • Quotable Quotes:
    • @bendystraw: My best technical skill isn't coding, it's a willingness to ask questions, in front of everyone, about what I don't understand
    • @vmg: "ls is the IDE of producing lists of filenames"
    • @nicklockwood: The hardest problem in computer science is fighting the urge to solve a different, more interesting problem than the one at hand.
    • @RexRizzo: Wired: "Machine learning will TAKE OVER THE WORLD!" Amazon: "We see you bought a wallet. Would you like to buy ANOTHER WALLET?"
    • @viktorklang: "The very existence of Ethernet flow control may come as a shock" - http://jeffq.com/blog/the-ethernet-pause-frame/ 
    • @JoeEmison: 4/ (c) if you need stuff on prem, keep it on prem. No need to make your life harder by hooking it up to some bullshit that doesn't work well
    • @grayj_: Also people envision more than you think. Wright Brothers to cargo flights: 7 yrs. Steam engine to car: 7 yrs.
    • David Wentzlaff: With Piton, we really sat down and rethought computer architecture in order to build a chip specifically for data centres and the cloud
    • @thenewstack: In 2015, there was 1 talk about #microservcies at OSCON; in 2016, there were 30: @dberkholz #CloudNativeDay
    • The Memory Guy: Now for the bad news: This new technology [3D XPoint] will not be a factor in the market if Intel and Micron can’t make it, and last week’s IDF certainly gave little reason for optimism.
    • @Carnage4Life: $19 billion just to link WhatsApp graph with Facebook's is mundane. Expect deeper, more insidious connections coming
    • Seth Lloyd~ The universe is a quantum computer. Biological life is all about extracting meaningful information from a sea of bits.
    • Facebook: To automate such design changes, the team introduced new models to FBNet in which IPs and circuits were allocated using design tools based on predefined rules, and relevant config snippets were generated for deployment.
    • Robert Graham: Despite the fact that everybody and their mother is buying iPhone 0days to hack phones, it's still the most secure phone. Androids are open to any old hacker -- iPhone are open only to nation state hackers.
    • oppositelock: I'm a former Google engineer working at another company now, and we use http/json rpc here. This RPC is the single highest consumer of cpu in our clusters, and our scale isn't all that large. I'm moving over to gRPC asap, for performance reasons.
    • Gary Sims: The purposes and goals of Fuchsia are still a mystery, however it is a serious undertaking. Dart is certainly key, as is Flutter.
    • @mjpt777: "We haven't made all that much progress on parallel computing in all those years." - Barbara Liskov
    • @AnupGhosh_: Just another sleepy August: 1. NSA crown jewels hacked. 2. Apple triple 0-day weaponized. 3. Short selling vulnerabilities for fun & profit.
    • @JoeEmison: Hypothesis: enterprises adopted CloudFoundry because at least it gets up and running (cf OpenStack), but now finding it so inferior to AWS.
    • Robert Metcalfe: I predict the Internet will soon go spectacularly supernova and in 1996 catastrophically collapse.
    • Alan Cooper~ Form follows function to Hell. If you are building something out of bits what does form follows function mean? Function follows the user. If you are focussing on functions you are missing the point. 
    • @etherealmind: I've _never_ seen a successful outsourcing arrangement. And I've work on both sides in more than 10 companies.
    • @musalbas: Schools need to stop spending years teaching kids garbage Microsoft PowerPoint skills and teach them Unix sysadmin skills.
    • Dan Woods: With data lakes there’s no inherent way to prioritize what data is going into the supply chain and how it will eventually be used. The result is like a museum with a huge collection of art, but no curator with the eye to tell what is worth displaying and what’s not.
    • Jay Kreps: Unlike scalability, multi-tenancy is something of a latent variable in the success of systems. You see hundreds of blog posts on benchmarking infrastructure systems—showing millions of requests per second on vast clusters—but far fewer about the work of scaling a system to hundreds or thousands of engineers and use cases. It’s just a lot harder to quantify multi-tenancy than it is to quantify scalability.
    • Jay Kreps: the advantage of Kafka is not just that it can handle that large application but that you can continue to deploy more and more apps to the same cluster as your adoption grows, without needing a siloed cluster for each use. 
    • @vambenepe: My secret superpower is using “reply” in situations where most others would use “reply all”.
    • @tvanfosson: Developer progression: instead of junior to senior 1. Simple and wrong 2. Complicated and wrong 3. Complicated and right 4. Simple and right
    • Maria Konnikova: The real confidence game feeds on the desire for magic, exploiting our endless taste for an existence that is more extraordinary and somehow more meaningful.
    • gpderetta: Apple A9 is a quite sophisticate CPU, there is no reason to believe is not using a state of the art predictor. The Samsung CPU might not have any advantage at all on this area.
    • Chetan Sharma: For 4G, we went from 0% to 25% penetration in 60 months, 25-50% in 21 months, 50-75% in 24 months and by the end of 2020, we will have 95%+ penetration. By 2020, US is likely to be 4 years ahead of Europe and 3 years ahead of China in LTE penetration. In fact, the industry vastly underestimated the growth of 4G in the US market. Will 5G growth curves be any different?

  • You know what's cool? A rubberband powered refrigerator. Or trillions of dollars...in space mining. Space Mining Company Plans to Launch Asteroid-Surveying Spacecraft by 2020. Billionaires get your rockets ready. It's a start: Weighing about 110 pounds, Prospector-1 will be powered by water, expelling superheated vapor to generate thrust. Since water will be the first resource mined from asteroids, this water propulsion system will allow future spacecraft–the ones that do the actual mining–to refuel on the go.

  • False positives in the new fully automated algorithmic driven world are red in tooth and claw. We may need a law. You know that feeling when you use your credit card and you are told it is no longer valid? You are cut off. Some algorithm has decided to isolate you from the world. At least you can call a credit card company. Have you ever tried to call a Cloud Company? Fred Trotter tells a scary story of not being able to face his accuser in Google Intrusion Detection Problem: So today our Google Cloud Account was suspended...Google threatened to shut our cloud account down in 3 days unless we did something…but made it impossible to complete that action...Google Cloud services shut down the entire project...It is not safe to use any part of Google Cloud Services because their threat detection system has a fully automated allergic reaction to anything it has not seen before, and it is capable of taking down all of your cloud services, without limitation. 

  • In the "every car should come with a buggy whip" department we have The Absurd Fight Over Fund Documents You Probably Don't Read. $200 million would be saved if investors got their mutual fund reports online instead of on paper. You guessed it, there's a paper lobby against it. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Leadership and Management Are Not The Same Thing!

Follow me this way!

Almost any substantive discussion of Agile sooner or later turns to leadership.  As teams embrace the principles in the Agile Manifesto that foster self-organization and self-management, they often require a shift away from classic management techniques.  In some cases, as teams begin exploring Agile, the idea of management becomes anathema, while in other cases the concepts of leadership and management are conflated. Leadership and management are not the same thing, and in most organizations both are required.

A manager is a person responsible for controlling, administering and/or directing all or part of an organization.  Alternatively, a leader is a person who provides the motivation and vision that compel others to work with the leader to achieve goals.

Comparing leaders and managers provides even more distinctions.  

Attribute         Leader                  Manager
Power             Informal, Earned        Formal, Hierarchical
People            Followers, Voluntary    Subordinates, Authority
Decision Making   Facilitates             Makes
Vision            Strategic               Tactical
Risk Posture      Taker                   Avoid
Culture           Shapes                  Endorses

The list of attributes could go on; however, we can boil down the difference between a manager and a leader to three critical characteristics: earned power, vision, and followers. Most, if not all, of the rest of the attributes build on that base.  While the difference between a leader and a manager is knife-edge sharp, who plays each role is far murkier.

A leader provides the vision and motivation needed to push the boundaries, change direction and challenge the status quo. Managers, on the other hand, deliver the administration needed for an organization to run; for example, creating and managing budgets, hiring and firing personnel, signing contracts, and other equally important tasks.  Without someone to “handle” these tasks, no amount of leadership will keep an organization going in the long run. What makes a manager in the knowledge economy different is the need to empower subordinates to plan the day-to-day detail and to let leaders lead, all the while providing the environment for the magic to happen.

In the knowledge economy, value depends on the information available and on the abilities of knowledge workers, who are no longer undifferentiated cogs in the machine.  In this new world, management and leadership are not easily separated. Knowledge workers look to their managers and leaders to provide a vision and a purpose. In order to deliver organizational value, managers must provide an organization that facilitates the development of skills and talent while simultaneously inspiring results. In today’s business environment the roles of manager and leader depend on each other.  Both roles are required in any partnership, team or multinational organization. There is no reason why a leader and a manager can’t be the same person playing different roles based on context, but both roles need to be played.

Management / Leadership Thread

  • Five Different Management Styles
  • Leadership versus Management (Current)
  • Management Styles in Agile Teams
  • Management Styles in Scaled Agile
  • Servant Leaders, Revisited

 


Categories: Process Management

scikit-learn: Trying to find clusters of Game of Thrones episodes

Mark Needham - Thu, 08/25/2016 - 23:07

In my last post I showed how to find similar Game of Thrones episodes based on the characters that appear in different episodes. This allowed us to find similar episodes on an episode by episode basis, but I was curious whether there were groups of similar episodes that we could identify.

scikit-learn provides several clustering algorithms that can run over our episode vectors and hopefully find clusters of similar episodes. A clustering algorithm groups similar documents together, where similarity is based on calculating a ‘distance’ between documents. Documents separated by a small distance would be in the same cluster, whereas if there’s a large distance between episodes then they’d probably be in different clusters.

The simplest variant is K-means clustering:

The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified.

The output from the algorithm is a list of labels which correspond to the cluster assigned to each episode.

Let’s give it a try on the Game of Thrones episodes. We’ll start from the 2 dimensional array of episodes/character appearances that we created in the previous post.

>>> all.shape
(60, 638)
 
>>> all
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

We have a 60 (episodes) x 638 (characters) array which we can now plug into the K-means clustering algorithm:

>>> from sklearn.cluster import KMeans
 
>>> n_clusters = 3
>>> km = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=100, n_init=1)
>>> cluster_labels = km.fit_predict(all)
 
>>> cluster_labels
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)

cluster_labels is an array containing a label for each episode in the all array. The spread of these labels is as follows:

>>> import numpy as np
>>> np.bincount(cluster_labels)
array([19, 12, 29])

i.e. 19 episodes in cluster 0, 12 in cluster 1, and 29 in cluster 2.

How do we know if the clustering is any good?

Ideally we’d have some labelled training data which we could compare our labels against, but since we don’t we can measure the effectiveness of our clustering by calculating inter-centroidal separation and intra-cluster variance.

i.e. how close are the episodes to other episodes in the same cluster vs how close are they to episodes in the closest different cluster.

scikit-learn gives us a function that we can use to calculate this score – the silhouette coefficient.

The output of this function is a score between -1 and 1.

  • A score of 1 means that our clustering has worked well and a document is far away from the boundary of another cluster.
  • A score of -1 means that our document should have been placed in another cluster.
  • A score of 0 means that the document is very close to the decision boundary between two clusters.

I tried calculating this coefficient for some different values of K. This is what I found:

from sklearn import metrics
 
for n_clusters in range(2, 10):
    km = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=100, n_init=1)
    cluster_labels = km.fit_predict(all)
 
    silhouette_avg = metrics.silhouette_score(all, cluster_labels, sample_size=1000)
    sample_silhouette_values = metrics.silhouette_samples(all, cluster_labels)
 
    print n_clusters, silhouette_avg
 
2 0.0798610142955
3 0.0648416081725
4 0.0390877994786
5 0.020165277756
6 0.030557856406
7 0.0389677156458
8 0.0590721834989
9 0.0466170527996

The best score we manage here is 0.07 when we set the number of clusters to 2. Even our highest score is much lower than the lowest score on the documentation page!

I tried it out with some higher values of K but only saw a score over 0.5 once I put the number of clusters to 40 which would mean 1 or 2 episodes per cluster at most.

At the moment our episode arrays contain 638 elements so they’re too long to visualise on a 2D silhouette plot. We’d need to apply a dimensionality reduction algorithm before doing that.
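As a rough sketch of what that could look like (not something from the original post), we could project the episodes down to two dimensions with something like TruncatedSVD and plot them coloured by cluster; matplotlib and the all array from above are assumed, and the choice of 3 clusters is arbitrary.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

# Project the 638-dimensional episode vectors down to 2 dimensions so they
# can be drawn on a scatter plot, then cluster the projected points.
reduced = TruncatedSVD(n_components=2).fit_transform(all)
labels = KMeans(n_clusters=3, init='k-means++', max_iter=100,
                n_init=1).fit_predict(reduced)

plt.scatter(reduced[:, 0], reduced[:, 1], c=labels)
plt.xlabel("component 1")
plt.ylabel("component 2")
plt.title("Episodes clustered after reducing to 2 dimensions")
plt.show()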

In summary it looks like character co-occurrence isn’t a good way to cluster episodes. I’m curious what would happen if we flip the array on its head and try and cluster the characters instead, but that’s for another day.

If anyone spots anything that I’ve missed when reading the output of the algorithm let me know in the comments. I’m just learning by experimentation at the moment.

Categories: Programming

Can Software Make You Less Racist?

Coding Horror - Jeff Atwood - Thu, 08/25/2016 - 08:52

I don't think we computer geeks appreciate how profoundly the rise of the smartphone, and Facebook, has changed the Internet audience. It's something that really only happened in the last five years, as smartphones and data plans dropped radically in price and became accessible – and addictive – to huge segments of the population.

People may have regularly used computers in 2007, sure, but that is a very different thing than having your computer in your pocket, 24/7, with you every step of every day, fully integrated into your life. As Jerry Seinfeld noted in 2014:

But I know you got your phone. Everybody here's got their phone. There's not one person here who doesn't have it. You better have it … you gotta have it. Because there is no safety, there is no comfort, there is no security for you in this life any more … unless when you're walking down the street you can feel a hard rectangle in your pants.

It's an addiction that is new to millions – but eerily familiar to us.

From "only nerds will use the Internet" to "everyone stares at their smartphones all day long!" in 20 years. Not bad, team :-).

— Marc Andreessen (@pmarca) January 16, 2015

The good news is that, at this moment, every human being is far more connected to their fellow humans than any human has ever been in the entirety of recorded history.

Spoiler alert: that's also the bad news.

Nextdoor is a Facebook-alike focused on specific neighborhoods. The idea is that you and everyone else on your block would join, and you can privately discuss local events, block parties, and generally hang out like neighbors do. It's a good idea, and my wife started using it a fair amount in the last few years. We feel more connected to our neighbors through the service. But one unfortunate thing you'll find out when using Nextdoor is that your neighbors are probably a little bit racist.

I don't use Nextdoor myself, but I remember Betsy specifically complaining about the casual racism she saw there, and I've also seen it mentioned several times on Twitter by people I follow. They're not the only ones. It became so epidemic that Nextdoor got a reputation for being a racial profiling hub. Which is obviously not good.

Social networking historically trends young, with the early adopters. Facebook launched as a site for college students. But as those networks grow, they inevitably age. They begin to include older people. And those older people will, statistically speaking, be more racist. I apologize if this sounds ageist, but let me ask you something: do you consider your parents a little racist? I will personally admit that one of my parents is definitely someone I would label a little bit racist. It's … not awesome.

The older the person, the more likely they are to have these "old fashioned" notions that the mere presence of differently-colored people on your block is inherently suspicious, and marriage should probably be defined as between a man and a woman.

In one meta-analysis by Jeffrey Lax and Justin Phillips of Columbia University, a majority of 18–29 year old Americans in 38 states support same sex marriage while in only 6 states do less than 45% of 18–29 year olds support same-sex marriage. At the same time not a single state shows support for same-sex marriage greater than 35% amongst those 64 and older

The idea that regressive social opinions correlate with age isn't an opinion; it's a statistical fact.

Support for same-sex marriage in the U.S.

18 - 29 years old    65%
30 - 49 years old    54%
50 - 64 years old    45%
65+ years old        39%

Are there progressive septuagenarians? Sure there are. But not many.

To me, failure to support same-sex marriage is as inconceivable as failing to support interracial marriage. Which was not that long ago, to the tune of the late 60s and early 70s. If you want some truly hair-raising reading, try Loving v. Virginia on for size. Because Virginia is for lovers. Just not those kind of lovers, 49 years ago. In the interests of full disclosure, I am 45 years old, and I graduated from the University of Virginia.

With Nextdoor, you're more connected with your neighbors than ever before. But through that connection you may also find out some regressive things about your neighbors that you'd never have discovered in years of the traditional daily routine of polite waves, hellos from the driveway, and casual sidewalk conversations.

To their immense credit, rather than accepting this status quo, Nextdoor did what any self-respecting computer geek would do: they changed their software. Now, when you attempt to post about a crime or suspicious activity …

… you get smart, just in time nudges to think less about race, and more about behavior.

The results were striking:

Nextdoor claims this new multi-step system has, so far, reduced instances of racial profiling by 75%. It’s also decreased considerably the number of notes about crime and safety. During testing, the number of crime and safety issue reports abandoned before being published rose by 50%. “It’s a fairly significant dropoff,” said Tolia, “but we believe that, for Nextdoor, quality is more important than quantity.”

I'm a huge fan of designing software to help nudge people, at exactly the right time, to be their better selves. And this is a textbook example of doing it right.

Would using Nextdoor and encountering these dialogs make my aforementioned parent a little bit less racist? Probably not. But I like to think they would stop for at least a moment and consider the importance of focusing on the behavior that is problematic, rather than the individual person. This is a philosophy I promoted on Stack Overflow, I continue to promote with Discourse, and I reinforce daily with our three kids. You never, ever judge someone by what they look like. Look at what they do instead.

If you were getting excited about the prospect of validating Betteridge's Law yet again, I'm sorry to disappoint you. I truly do believe software, properly designed software, can not only help us be more civil to each other, but can also help people – maybe even people you love – behave a bit less like racists online.

Categories: Programming


Software Development Conferences Forecast August 2016

From the Editor of Methods & Tools - Wed, 08/24/2016 - 15:34
Here is a list of software development related conferences and events on Agile project management ( Scrum, Lean, Kanban), software testing and software quality, software architecture, programming (Java, .NET, JavaScript, Ruby, Python, PHP), DevOps and databases (NoSQL, MySQL, etc.) that will take place in the coming weeks and that have media partnerships with the Methods […]

Five Different Management Styles

Attention Dogs: delegative management?

During a keynote speech at a conference I recently attended, I listened to Phillip Lew challenge the orthodoxy of Agile’s over-reliance on group decision-making (participative management) styles.  Agile teams are typically built on a presumption that group decision making is all that is needed to deliver value to the customer. Group decision making is often contrasted with classic command and control (autocratic) forms of management. In many cases, autocratic management is portrayed as the antithesis of Agile, which casts the discussion in terms of good and evil.  Casting the discussion of management styles in terms of good and evil is probably an overstatement, and believing that there are only two management styles is a misstatement. Let’s deal with the misstatement first. There are at least five common management styles. Before we can wrestle with whether Agile only works with a purely participative management style, we need to agree on a few definitions.

Autocratic management, also known as directive or command and control, is based on a leader assigning work to subordinates and then overseeing the completion of the work. Taylorism, discussed in the re-read of XP Explained, is a form of autocratic management.

Delegative management is a relatively hands-off approach in which a manager hands off a piece of work to the team or lower level manager and lets them deal with it. The hand-off of work includes the responsibility for accomplishing the work. This form of management can be thought of as fire-and-forget.  Assignments, once made, are left to be performed with a minimum of supervision.

Negotiative management features interactions to reach a common agreement. Negotiative management can be practiced between individuals or groups.  Decisions are often based on the power differentials between the participants and their negotiating abilities.

Participative management encourages all people involved in the work to contribute to organizing work and developing solutions. This management style works best when it leverages the diversity of knowledge and experience of the participants. Decisions under this style of management are typically marked by group consensus. This is the most prominent management style seen at the team level in Agile, which highlights self-organizing and self-managing groups.

Consultative management is a hybrid form of management that combines participative and autocratic forms of management.  The leader involves participants for input, but ultimately makes the final decision.  When I was in the corporate world this was a typical management approach because it preserves a clear chain of responsibility.

None of these management styles is prima facie good or bad.  They are contextually useful.  For example, in the middle of a traffic accident, a driver will not have time to invoke participative management, but rather will need to make and implement decisions autocratically.  Each management type has strengths and weaknesses and can be applied effectively in specific contexts.

 

Next:

  • Leadership versus Management
  • Management Styles in Agile Teams
  • Management Styles in Scaled Agile
  • Servant Leaders, Revisited

Categories: Process Management

The Always On Architecture - Moving Beyond Legacy Disaster Recovery

Failover does not cut it anymore. You need an ALWAYS ON architecture with multiple data centers. -- Martin Van Ryswyk, VP of Engineering at DataStax

Failover, switching to a redundant or standby system when a component fails, has a long and checkered history as a way of dealing with failure. The reason is that your failover mechanism itself becomes a single point of failure that often fails just when it's needed most. Having worked on a few telecom systems that used a failover strategy, I know exactly how stressful failover events can be and how stupid you feel when your failover fails. If you have a double or triple fault in your system, failover is exactly when it will happen. 

For a long time the only real trick we had for achieving fault tolerance was to have a hot, warm, or cold standby (disk, interface, card, server, router, generator, datacenter, etc.) and failover to it when there's a problem. This old style of Disaster Recovery planning is no longer adequate or necessary.

Now, thanks to cloud infrastructures, at least at a software system level, we have an alternative: an always on architecture. Google calls this a natively multihomed architecture. You can distribute data across multiple datacenters in such a way that all your datacenters are always active. Each datacenter can automatically scale capacity up and down depending on what happens to other datacenters. You know, the usual sort of cloud propaganda. Robin Schumacher makes a good case here: Long live Dear CXO – When Will What Happened to Delta Happen to You?
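To make the always on idea concrete, here is a minimal sketch of active-active, multi-datacenter replication with Apache Cassandra (the system behind the DataStax quote above), using the Python driver. The host names, datacenter names, and replication factors are illustrative assumptions, not a recommendation for your deployment.

from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy

# Connect with a datacenter-aware policy; the hosts and datacenter names
# below are made up for illustration.
cluster = Cluster(
    ['cassandra-us-east.example.com', 'cassandra-eu-west.example.com'],
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='us_east'),
)
session = cluster.connect()

# Replicate every row to 3 nodes in each datacenter so that either site can
# keep serving reads and writes on its own if the other one disappears.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app WITH replication = {
        'class': 'NetworkTopologyStrategy', 'us_east': 3, 'eu_west': 3
    }
""")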

Recent Problems With Disaster !Recovery
Categories: Architecture

What Are Story Points?

Mike Cohn's Blog - Tue, 08/23/2016 - 15:00

Story points are a unit of measure for expressing an estimate of the overall effort that will be required to fully implement a product backlog item or any other piece of work.

When we estimate with story points, we assign a point value to each item. The raw values we assign are unimportant. What matters are the relative values. A story that is assigned a 2 should require twice as much effort as a story that is assigned a 1. It should also require two-thirds of the effort of a story that is estimated at 3 story points.

Instead of assigning 1, 2 and 3, that team could have assigned 100, 200 and 300. Or 1 million, 2 million and 3 million. It is the ratios that matter, not the actual numbers.

What Goes Into a Story Point?

Because story points represent the effort to develop a story, a team’s estimate must include everything that can affect the effort. That could include:

  • The amount of work to do
  • The complexity of the work
  • Any risk or uncertainty in doing the work

When estimating with story points, be sure to consider each of these factors. Let’s see how each impacts the effort estimate given by story points.

The Amount of Work to Do

Certainly, if there is more to do of something, the estimate of effort should be larger. Consider the case of developing two web pages. The first page has only one field and a label asking the user to enter a name. The second page has 100 fields, each also simply filled with a bit of text.

The second page is no more complex. There are no interactions among the fields and each is nothing more than a bit of text. There’s no additional risk on the second page. The only difference between these two pages is that there is more to do on the second page.

The second page should be given more story points. It probably doesn’t get 100 times more points even though there are 100 times as many fields. There are, after all, economies of scale and maybe making the second page is only 2 or 3 or 10 times as much effort as the first page.

Risk and Uncertainty

The amount of risk and uncertainty in a product backlog item should affect the story point estimate given to the item.

If a team is asked to estimate a product backlog item and the stakeholder asking for it is unclear about what will be needed, that uncertainty should be reflected in the estimate.

If implementing a feature involves changing a particular piece of old, brittle code that has no automated tests in place, that risk should be reflected in the estimate.

Complexity

Complexity should also be considered when providing a story point estimate. Think back to the earlier example of developing a web page with 100 trivial text fields with no interactions between them.

Now think about another web page also with 100 fields. But some are date fields with calendar widgets that pop up. Some are formatted text fields like phone numbers or Social Security numbers. Other fields do checksum validations as with credit card numbers.

This screen also requires interactions between fields. If the user enters a Visa card, a three-digit CVV field is shown. But if the user enters an American Express card, a four-digit CVV field is shown.

Even though there are still 100 fields on this screen, these fields are harder to implement. They’re more complex. They’ll take more time. There’s more chance the developer makes a mistake and has to back up and correct it.

This additional complexity should be reflected in the estimate provided.

Consider All Factors: Amount of Work, Risk and Uncertainty, and Complexity

It may seem impossible to combine three factors into one number and provide that as an estimate. It’s possible, though, because effort is the unifying factor. Estimators consider how much effort will be required to do the amount of work described by a product backlog item.

Estimators then consider how much effort to include for dealing with the risk and uncertainty inherent in the product backlog item. Usually this is done by considering the risk of a problem occurring and the impact if the risk does occur. So, for example, more will be included in the estimate for a time-consuming risk that is likely to occur than for a minor and unlikely risk.
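
As a back-of-the-envelope illustration of that reasoning (my own numbers, not a formula from the article), you can think of it as weighting each risk's potential extra effort by how likely it is to occur:

# Illustrative numbers only: weight each risk's potential extra effort by its
# probability, then fold that expected amount into the overall estimate.
base_effort = 5  # effort for the work itself, in points
risks = [
    {"name": "brittle legacy code breaks", "probability": 0.6, "extra_effort": 3},
    {"name": "requirements shift late",    "probability": 0.1, "extra_effort": 1},
]

expected_risk_effort = sum(r["probability"] * r["extra_effort"] for r in risks)
print(base_effort + expected_risk_effort)  # 6.9 -- likely rounded to a nearby point value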

Estimators also consider the complexity of the work to be done. Work that is complex will require more thinking, may require more trial-and-error experimentation, perhaps more back-and-forth with a customer, may take longer to validate and may need more time to correct mistakes.

All three factors must be combined.

Consider Everything in the Definition of Done

A story point estimate must include everything involved in getting a product backlog item all the way to done. If a team’s definition of done includes creating automated tests to validate the story (and that would be a good idea), the effort to create those tests should be included in the story point estimate.

Story points can be a hard concept to grasp. But the effort to fully understand that points represent overall effort, as affected by the amount of work, the complexity of that work, and any risk or uncertainty in it, will be worth it.

Connecting Project Benefits to Business Strategy for Success

Herding Cats - Glen Alleman - Tue, 08/23/2016 - 05:01

The current PMI Pulse, titled Delivering Value: Focus on Benefits during Project Execution, provides some guidance on how to manage the benefits side of an IT project. But the article misses the mark on an important concept. Here is a chart from the paper suggesting metrics for the benefits.

But where do these metrics come from?

[Chart from the PMI Pulse report listing measures of project benefits]

The question is where do the measures of the benefits listed in the above chart come from? 

The answer is they come from the strategy of the IT function. Where is the strategy defined? In the Balanced Scorecard. This is how ALL connections are made in enterprise IT projects: Why are we doing something? How will we recognize that it's the right thing to do? What are the measures of the outcomes, and how are they connected to each other, to the top-level strategy, and to the strategic needs of the firm?

When you hear we can't forecast the future benefits of our work, you can count on the firm spending a pile of money for probably not much value. Follow the steps starting on page 47 of the presentation above, build the 4 perspectives, and connect the initiatives.
  • Stakeholder - what does the business need in terms of beneficial outcomes?
  • Internal Processes - what governance processes will be used to produce these outcomes?
  • Learning and Growth - what people, information, and organizational elements will be needed to execute the processes that produce the beneficial outcomes?
  • Budget - what are you willing to spend to achieve these beneficial outcomes?

As always, each of these is a random variable operating in the presence of uncertainty, creating risk that they will not be achieved. As always, this means making estimates of both the beneficial outcomes and the cost to achieve them.

As on all non-trivial projects, estimating is a critical success factor. Uncertainty is unavoidable. Making decisions in the presence of uncertainty is unavoidable. Having some hope that the decision will result in a beneficial outcome requires making estimates of that outcome and choosing the most likely beneficial one.

Anyone telling you otherwise is working in a de-minimis project. 

So Let's Apply These Principles to a Recent Post

A post, Uncertainty of benefits versus costs, has some ideas that need addressing ...

  • Return on an investment is the benefits minus the costs. 
    • And both are random variables subject to reducible and irreducible uncertainties.
    • Start by building a model of these uncertainties.
    • Apply that model and update it with data from the project as it proceeds.
  • Most people focus way too much on costs and not enough on benefits.
    • Why? This is bad management. This is naive management. Stop doing stupid things on purpose.
    • Risk Management (from the underlying uncertainties) is how adults manage projects - Tim Lister.
    • Behave like an adult, manage the risk.
  • If you are working on anything innovative, your benefit uncertainty is crazy high.
    • Says who?
    • If you don't have some statistically confident sense of what the pay off is going to be, you'd better be ready to spend money to find out before you spend all the money.
    • This is naive project management and naive business management.
    • It's counter to the first bullet - ROI = (Value - Cost)/Cost. 
    • Having an acceptable level of confidence in both Value and Cost is part of Adult Management of other people's money.
  • But we can’t estimate based on data, it has to be guesses!
    • No. Estimates are not guesses, unless made by a child. 
    • Estimates ARE based on data. This is called Reference Class Forecasting (see the sketch after this list). Parametric models also use past performance to project future performance.
    • If your cost estimation might be off by +/-50% but your benefit estimation could be off by +/-95% (or more), you're pretty much clueless about what the customer wants, or you're spending money on an R&D project to find out. This is one of those examples conjectured by inexperienced estimators. This is not how it works in any mature firm.
    • Adults don't guess, they estimate.
    • Adults know how to estimate. Lots of books, papers, and tools.
  • So we should all stop doing estimates, right?
    • No - an estimate is a forecast and a commitment.
    • The commitment MUST have a confidence level.
    • We have 80% confidence of launching on or before the 3rd week in November 2014 for 4 astronauts in our vehicle to the International Space Station. This was a VERY innovative system. This is why a contract for $3.5B was awarded. This approach is applicable to ALL projects.
    • Only de minimis projects have no deadline or Not to Exceed target cost.
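
Here is the sketch promised above: a minimal, illustrative reference class forecast. The historical durations are made up. The reference class is the set of comparable past projects, and the confidence level is read straight off that distribution:

# Illustrative only: durations (in weeks) of comparable past projects form the
# reference class; the 80th percentile is the duration we can commit to with
# 80% confidence.
past_durations_weeks = [10, 12, 12, 13, 14, 15, 15, 16, 18, 22]

def percentile(data, p):
    """Percentile by linear interpolation between sorted samples."""
    data = sorted(data)
    k = (len(data) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (k - lo)

print(percentile(past_durations_weeks, 0.80))
# ~16.4 weeks: "we have 80% confidence of finishing on or before week 17"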

All projects are probabilistic. All projects have uncertainty in cost and benefits. Estimating both cost and benefit, continuously updating those estimates, and taking action to correct unfavorable variances from plan, is how adults manage projects.

Categories: Project Management

Taking the final wrapper off of Android 7.0 Nougat

Android Developers Blog - Tue, 08/23/2016 - 00:24

Posted by Dave Burke, VP of Engineering

Android 7.0 Nougat

Today, Android 7.0 Nougat will begin rolling out to users, starting with Nexus devices. At the same time, we’re pushing the Android 7.0 source code to the Android Open Source Project (AOSP), extending public availability of this new version of Android to the broader ecosystem.

We’ve been working together with you over the past several months to get your feedback on this release, and also to make sure your apps are ready for the users who will run them on Nougat devices.

What’s inside Nougat

Android Nougat reflects input from thousands of fans and developers like you, all around the world. There are over 250 major features in Android Nougat, including VR Mode in Android. We’ve worked at all levels of the Android stack in Nougat — from how the operating system reads sensor data to how it sends pixels to the display — to make it especially well suited to providing high quality mobile VR experiences.

Plus, Nougat brings a number of new features to help make Android more powerful, more productive and more secure. It introduces a brand new JIT/AOT compiler to improve software performance, make app installs faster, and reduce the storage they take up. It also adds platform support for Vulkan, a low-overhead, cross-platform API for high-performance 3D graphics. Multi-Window support lets users run two apps at the same time, and Direct Reply lets users reply directly to notifications without having to open the app. As always, Android is built with powerful layers of security and encryption to keep your private data private, and Nougat adds new features like file-based encryption, seamless updates, and Direct Boot.

You can find all of the Nougat developer resources here, including details on behavior changes and new features you can use in your apps. An overview of what's new for developers is available here, and you can explore all of the new user features in Nougat here.

Multi-window mode in Android Nougat

The next wave of users

Starting today and rolling out over the next several weeks, the Nexus 6, Nexus 5X, Nexus 6P, Nexus 9, Nexus Player, Pixel C, and General Mobile 4G (Android One) will get an over-the-air software update to Android 7.0 Nougat. Devices enrolled in the Android Beta Program will also receive this final version.

And there are many tasty devices coming from our partners running Android Nougat, including the upcoming LG V20, which will be the first new smartphone that ships with Android Nougat, right out of the box.

With all of these new devices beginning to run Nougat, now is the time to publish your app updates to Google Play. We recommend compiling against, and ideally targeting, API 24. If you’re still testing some last-minute changes, a great strategy is to use Google Play’s beta testing feature to get early feedback from a small group of users — including those using Android 7.0 Nougat — and then do a staged rollout as you release the updated app to all users.

What’s next for Nougat?

We’re moving Nougat into a new regular maintenance schedule over the coming quarters. In fact, we’ve already started work on the first Nougat maintenance release, which will bring continued refinements and polish, and we’re planning to bring that to you this fall as a developer preview. Stay tuned!

We’ll be closing open bugs logged against Developer Preview builds soon, but please keep the feedback coming! If you still see an issue that you filed in the preview tracker, just file a new issue against Android 7.0 in the AOSP issue tracker.

Thanks for being part of the preview, which we shared earlier this year with an eye towards giving everyone the opportunity to make the next release of Android stronger. Your continued feedback has been extremely beneficial in shaping this final release, not just for users, but for the entire Android ecosystem.

Categories: Programming

Modernizing OAuth interactions in Native Apps for Better Usability and Security

Google Code Blog - Mon, 08/22/2016 - 22:29

Posted by William Denniss, Product Manager, Identity and Authentication

The Identity team is constantly striving to help Google users sign in to third-party applications with their Google account in a secure and seamless way, and to enable users to share select information from their account, such as their calendar or contact information, with other apps when they wish to do so.

Under the hood these interactions happen via OAuth requests, and over the years Google has supported a number of ways for developers to implement OAuth flows with us. With improved security and usability in mind, we will soon be ending the support for one of these ways. In the coming months, we will no longer allow OAuth requests to Google in embedded browsers known as “web-views”, such as the WebView UI element on Android and UIWebView/WKWebView on iOS, and equivalents on Windows and OS X.
Using the device browser for OAuth requests instead of an embedded web-view can improve the usability of your apps significantly: users only need to sign in to Google once per device, improving conversion rates of sign-in and authorization flows in your app. Modern “in-app browser tab” patterns available on some operating systems, such as Chrome Custom Tabs on Android and SFSafariViewController on iOS, offer further UX improvements for browser-based OAuth flows.

In contrast, the outdated method of using embedded browsers for OAuth means a user must sign-in to Google each time, instead of using the existing logged-in session from the device. The device browser also provides improved security as apps are able to inspect and modify content in a web-view, but not content shown in the browser.

To help you migrate, we offer libraries and samples that follow modern best practices which you can use:

  • Google Sign-In for Android and iOS, our recommended SDK for sign-in and OAuth with Google Accounts.
  • AppAuth for Android, iOS, and OS X, an open source OAuth client library that can be used with Google and other OAuth providers. We also offer GTMAppAuth (for iOS and OS X), a library which enables AppAuth support for the Google APIs Client Library for Objective-C, and the GTM Session Fetcher projects.
  • Google Sign-in and OAuth Examples for Windows, examples demonstrating how to use the browser to authenticate Google users in various Windows environments such as Universal Windows Platform (UWP), console and desktop apps.

You can also read protocol-level documentation for our standards-based support of OAuth for Native Apps, and an IETF best current practice draft on this topic.
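
To make the browser-based pattern concrete, here is a minimal, illustrative Python sketch of the protocol-level flow for a desktop app. The libraries listed above handle all of this, plus the redirect and token exchange, for you; the client ID and loopback redirect URI below are placeholders you would register for your own app.

# A sketch of a browser-based OAuth authorization request with PKCE.
# CLIENT_ID and REDIRECT_URI are placeholders, not real values.
import base64, hashlib, os, webbrowser
from urllib.parse import urlencode  # Python 3; on Python 2 use urllib.urlencode

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
CLIENT_ID = "YOUR_CLIENT_ID.apps.googleusercontent.com"
REDIRECT_URI = "http://127.0.0.1:8080/oauth2callback"  # loopback redirect for a desktop app

# PKCE: a random code verifier and its S256 challenge
verifier = base64.urlsafe_b64encode(os.urandom(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode()).digest()).rstrip(b"=").decode()

params = {
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "response_type": "code",
    "scope": "openid email",
    "code_challenge": challenge,
    "code_challenge_method": "S256",
    "state": base64.urlsafe_b64encode(os.urandom(16)).rstrip(b"=").decode(),
}

# Hand the request to the system browser instead of an embedded web-view;
# the app then listens on the loopback address for the authorization code.
webbrowser.open(AUTH_ENDPOINT + "?" + urlencode(params))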

Versions of Google Sign-In on iOS prior to version 3.0 don’t support the current industry best practices of the in-app browser tab, and therefore are also deprecated. If you use Google Sign-In, please update to the latest version to get all the recent security and usability improvements. For now, this policy does not remove our support of WebView on iOS 8, however we may start to display notices encouraging users to upgrade their device for better security.

The rollout schedule for the deprecation of web-views for OAuth requests to Google is as follows. Starting October 20, 2016, we will prevent new OAuth clients from using web-views on platforms with a viable alternative, and will phase in user-facing notices for existing OAuth clients. On April 20, 2017, we will start blocking OAuth requests using web-views for all OAuth clients on platforms where viable alternatives exist.

If you have any questions with the migration, please post to Stack Overflow tagged with “google-oauth”.

Categories: Programming

Neo4j/scikit-learn: Calculating the cosine similarity of Game of Thrones episodes

Mark Needham - Mon, 08/22/2016 - 22:12

A couple of months ago Praveena and I created a Game of Thrones dataset to use in a workshop and I thought it’d be fun to run it through some machine learning algorithms and hopefully find some interesting insights.

The dataset is available as CSV files but for this analysis I’m assuming that it’s already been imported into neo4j. If you want to import the data you can run the tutorial by typing the following into the query bar of the neo4j browser:

:play http://guides.neo4j.com/got

Since we don’t have any training data we’ll be using unsupervised learning methods, and we’ll start simple by calculating the similarity of episodes based on character appearances. We’ll be using scikit-learn‘s cosine similarity function to determine episode similarity.

Christian Perone has an excellent blog post explaining how to use cosine similarity on text documents which is well worth a read. We’ll be using a similar approach here, but instead of building a TF/IDF vector for each document we’re going to create a vector indicating whether a character appeared in an episode or not.

e.g. imagine that we have 3 characters – A, B, and C – and 2 episodes. A and B appear in the first episode and B and C appear in the second episode. We would represent that with the following vectors:

Episode 1 = [1, 1, 0]
Episode 2 = [0, 1, 1]

We could then calculate the cosine similarity between these two episodes like this:

>>> from sklearn.metrics.pairwise import cosine_similarity
>>> one = [1,1,0]
>>> two = [0,1,1]
 
>>> cosine_similarity([one, two])
array([[ 1. ,  0.5],
       [ 0.5,  1. ]])

So this is telling us that Episode 1 is 100% similar to Episode 1, Episode 2 is 100% similar to itself as well, and Episodes 1 and 2 are 50% similar to each other based on the fact that they both have an appearance of Character B.
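
To see where that 0.5 comes from, we can do the calculation by hand: cosine similarity is the dot product of the two vectors divided by the product of their lengths, and these two episodes share exactly one of the two characters each contains.

import math

one = [1, 1, 0]
two = [0, 1, 1]

def length(v):
    return math.sqrt(sum(x * x for x in v))

dot = sum(a * b for a, b in zip(one, two))  # 1: only character B appears in both episodes
print(dot / (length(one) * length(two)))    # 1 / (sqrt(2) * sqrt(2)) = 0.5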

Note that the character names aren’t mentioned anywhere; each character is implicitly represented by a position in the array. This means that when we use our real dataset we need to ensure that the characters are in the same order for each episode, otherwise the calculation will be meaningless!

In neo4j land we have an APPEARED_IN relationship between a character and each episode that they appeared in. We can therefore write the following code using the Python driver to get all pairs of episodes and characters:

from neo4j.v1 import GraphDatabase, basic_auth
driver = GraphDatabase.driver("bolt://localhost", auth=basic_auth("neo4j", "neo"))
session = driver.session()
 
rows = session.run("""
    MATCH (c:Character), (e:Episode)
    OPTIONAL MATCH (c)-[appearance:APPEARED_IN]->(e)
    RETURN e, c, appearance
    ORDER BY e.id, c.id""")

We can iterate through the rows to see what the output looks like:

>>> for row in rows:
        print row
 
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5415 labels=set([u'Character']) properties={u'name': u'Addam Marbrand', u'id': u'/wiki/Addam_Marbrand'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5882 labels=set([u'Character']) properties={u'name': u'Adrack Humble', u'id': u'/wiki/Adrack_Humble'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=6747 labels=set([u'Character']) properties={u'name': u'Aegon V Targaryen', u'id': u'/wiki/Aegon_V_Targaryen'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5750 labels=set([u'Character']) properties={u'name': u'Aemon', u'id': u'/wiki/Aemon'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5928 labels=set([u'Character']) properties={u'name': u'Aeron Greyjoy', u'id': u'/wiki/Aeron_Greyjoy'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5503 labels=set([u'Character']) properties={u'name': u'Aerys II Targaryen', u'id': u'/wiki/Aerys_II_Targaryen'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=6753 labels=set([u'Character']) properties={u'name': u'Alannys Greyjoy', u'id': u'/wiki/Alannys_Greyjoy'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=6750 labels=set([u'Character']) properties={u'name': u'Alerie Tyrell', u'id': u'/wiki/Alerie_Tyrell'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5753 labels=set([u'Character']) properties={u'name': u'Alliser Thorne', u'id': u'/wiki/Alliser_Thorne'}> appearance=None>
<Record e=<Node id=6780 labels=set([u'Episode']) properties={u'season': 1, u'number': 1, u'id': 1, u'title': u'Winter Is Coming'}> c=<Node id=5858 labels=set([u'Character']) properties={u'name': u'Alton Lannister', u'id': u'/wiki/Alton_Lannister'}> appearance=None>

Next we’ll build a ‘matrix’ of episodes/characters. If a character appears in an episode then we’ll put a ‘1’ in the matrix, if not we’ll put a ‘0’:

episodes = {}
for row in rows:
    if episodes.get(row["e"]["id"]) is None:
        if row["appearance"] is None:
            episodes[row["e"]["id"]] = [0]
        else:
            episodes[row["e"]["id"]] = [1]
    else:
        if row["appearance"] is None:
            episodes[row["e"]["id"]].append(0)
        else:
            episodes[row["e"]["id"]].append(1)

Here’s an example of one entry in the matrix:

>>> len(episodes)
60
 
>>> len(episodes[1])
638
 
>>> episodes[1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

From this output we learn that there are 60 episodes and 638 characters in Game of Thrones so far. We can also see which characters appeared in the first episode, although it’s a bit tricky to work out which index in the array corresponds to each character.

The next thing we’re going to do is calculate the cosine similarity between episodes. Let’s start by seeing how similar the first episode is to all the others:

>>> all = episodes.values()
 
>>> cosine_similarity(all[0:1], all)[0]
array([ 1.        ,  0.69637306,  0.48196269,  0.54671752,  0.48196269,
        0.44733753,  0.31707317,  0.42340087,  0.34989921,  0.43314808,
        0.36597766,  0.18421252,  0.30961158,  0.2328101 ,  0.30616181,
        0.41905818,  0.36842504,  0.35338088,  0.18376917,  0.3569686 ,
        0.2328101 ,  0.34539847,  0.25043516,  0.31707317,  0.25329221,
        0.33342786,  0.34921515,  0.2174909 ,  0.2533473 ,  0.28429311,
        0.23026565,  0.22310537,  0.22365301,  0.23816275,  0.28242289,
        0.16070148,  0.24847093,  0.21434648,  0.03582872,  0.21189672,
        0.15460414,  0.17161693,  0.15460414,  0.17494961,  0.1234662 ,
        0.21426863,  0.21434648,  0.18748505,  0.15308091,  0.20161946,
        0.19877675,  0.30920827,  0.21058466,  0.19127301,  0.24607943,
        0.18033393,  0.17734311,  0.16296707,  0.18740851,  0.23995201])

The first entry in the array indicates that episode 1 is 100% similar to episode 1 which is a good start. It’s 69% similar to episode 2 and 48% similar to episode 3. We can sort that array to work out which episodes it’s most similar to:

>>> for idx, score in sorted(enumerate(cosine_similarity(all[0:1], all)[0]), key = lambda x: x[1], reverse = True)[:5]:
        print idx, score
 
0 1.0
1 0.696373059207
3 0.546717521051
2 0.481962692712
4 0.481962692712

Or we can see how similar the last episode of season 6 is compared to the others:

>>> for idx, score in sorted(enumerate(cosine_similarity(all[59:60], all)[0]), key = lambda x: x[1], reverse = True)[:5]:
        print idx, score
 
59 1.0
52 0.500670191678
46 0.449085146211
43 0.448218732478
49 0.446296233312

I found it a bit painful exploring similarities like this so I decided to write them into neo4j instead and then write a query to find the most similar episodes. The following query creates a SIMILAR_TO relationship between episodes and sets a score property on that relationship:

>>> episode_mapping = {}
>>> for idx, episode_id in enumerate(episodes):
        episode_mapping[idx] = episode_id
 
>>> for idx, episode_id in enumerate(episodes):
        similarity_matrix = cosine_similarity(all[idx:idx+1], all)[0]
        for other_idx, similarity_score in enumerate(similarity_matrix):
            other_episode_id = episode_mapping[other_idx]
            print episode_id, other_episode_id, similarity_score
            if episode_id != other_episode_id:
                session.run("""
                    MATCH (episode1:Episode {id: {episode1}}), (episode2:Episode {id: {episode2}})
                    MERGE (episode1)-[similarity:SIMILAR_TO]-(episode2)
                    ON CREATE SET similarity.score = {similarityScore}
                    """, {'episode1': episode_id, 'episode2': other_episode_id, 'similarityScore': similarity_score})
 
    session.close()

The episode_mapping dictionary is needed to map from episode ids to indices e.g. episode 1 is at index 0.

If we want to find the most similar pair of episodes in Game of Thrones we can execute the following query:

MATCH (episode1:Episode)-[similarity:SIMILAR_TO]-(episode2:Episode)
WHERE ID(episode1) > ID(episode2)
RETURN "S" + episode1.season + "E" + episode1.number AS ep1, 
       "S" + episode2.season + "E" + episode2.number AS ep2, 
       similarity.score AS score
ORDER BY similarity.score DESC
LIMIT 10
 
╒═════╤════╤══════════════════╕
│ep1  │ep2 │score             │
╞═════╪════╪══════════════════╡
│S1E2 │S1E1│0.6963730592072543│
├─────┼────┼──────────────────┤
│S1E4 │S1E3│0.6914173051223086│
├─────┼────┼──────────────────┤
│S1E9 │S1E8│0.6869464497590777│
├─────┼────┼──────────────────┤
│S2E10│S2E8│0.6869037302955034│
├─────┼────┼──────────────────┤
│S3E7 │S3E6│0.6819943394704735│
├─────┼────┼──────────────────┤
│S2E7 │S2E6│0.6813598225089799│
├─────┼────┼──────────────────┤
│S1E10│S1E9│0.6796436827080401│
├─────┼────┼──────────────────┤
│S1E5 │S1E4│0.6698105143372364│
├─────┼────┼──────────────────┤
│S1E10│S1E8│0.6624062584864754│
├─────┼────┼──────────────────┤
│S4E5 │S4E4│0.6518358737330705│
└─────┴────┴──────────────────┘

And the least similar?

MATCH (episode1:Episode)-[similarity:SIMILAR_TO]-(episode2:Episode)
WHERE ID(episode1) > ID(episode2)
RETURN "S" + episode1.season + "E" + episode1.number AS ep1, 
       "S" + episode2.season + "E" + episode2.number AS ep2, 
       similarity.score AS score
ORDER BY similarity.score
LIMIT 10
 
╒════╤════╤═══════════════════╕
│ep1 │ep2 │score              │
╞════╪════╪═══════════════════╡
│S4E9│S1E5│0                  │
├────┼────┼───────────────────┤
│S4E9│S1E6│0                  │
├────┼────┼───────────────────┤
│S4E9│S4E2│0                  │
├────┼────┼───────────────────┤
│S4E9│S2E9│0                  │
├────┼────┼───────────────────┤
│S4E9│S2E4│0                  │
├────┼────┼───────────────────┤
│S5E6│S4E9│0                  │
├────┼────┼───────────────────┤
│S6E8│S4E9│0                  │
├────┼────┼───────────────────┤
│S4E9│S4E6│0                  │
├────┼────┼───────────────────┤
│S3E9│S2E9│0.03181423814878889│
├────┼────┼───────────────────┤
│S4E9│S1E1│0.03582871819500093│
└────┴────┴───────────────────┘

The output of this query suggests that there are no common characters between 8 pairs of episodes, which at first glance sounds surprising. Let’s write a query to check that finding:

MATCH (episode1:Episode)<-[:APPEARED_IN]-(character)-[:APPEARED_IN]->(episode2:Episode)
WHERE episode1.season = 4 AND episode1.number = 9 AND episode2.season = 1 AND episode2.number = 5
return episode1, episode2
 
(no changes, no rows)

It’s possible I made a mistake when scraping the data, but from a quick look over the Wiki page I don’t think I have. I found it interesting that Season 4 Episode 9 shows up in 9 of the top 10 least similar pairs of episodes.

Next I’m going to cluster the episodes based on character appearances, but this post is long enough already so that’ll have to wait for another post another day.

Categories: Programming