
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Feed aggregator

ThreadSanitizer: Slaughtering Data Races

Google Testing Blog - Mon, 06/30/2014 - 22:30
by Dmitry Vyukov, Synchronization Lookout, Google, Moscow

Hello,

I work in the Dynamic Testing Tools team at Google. Our team develops tools like AddressSanitizer, MemorySanitizer and ThreadSanitizer which find various kinds of bugs. In this blog post I want to tell you about ThreadSanitizer, a fast data race detector for C++ and Go programs.

First of all, what is a data race? A data race occurs when two threads access the same variable concurrently and at least one of the accesses is a write. Most programming languages provide very weak guarantees, or no guarantees at all, for programs with data races. For example, in C++ any data race renders the behavior of the whole program completely undefined (yes, it can suddenly format the hard drive). Data races are common in concurrent programs, and they are notoriously hard to debug and localize. A typical manifestation of a data race is a program that occasionally crashes with obscure symptoms that differ each time and do not point to any particular place in the source code. Such bugs can take several months of debugging without particular success, since typical debugging techniques do not work. Fortunately, ThreadSanitizer can catch most data races in the blink of an eye. See Chromium issue 15577 for an example of such a data race and issue 18488 for the resolution.
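To make the access pattern concrete, here is a minimal sketch in Python. ThreadSanitizer itself targets C++ and Go, and Python does not have C++'s undefined behavior, so this only illustrates the shape of the bug: several threads doing an unsynchronized read-modify-write on the same variable, next to the lock-protected version. The function names are made up for this illustration.

```python
import threading

def increment_many(counter, iterations, lock=None):
    """Add 1 to counter["value"] `iterations` times, optionally under a lock."""
    for _ in range(iterations):
        if lock is None:
            # Unsynchronized read-modify-write: two threads may read the same
            # old value, and one of the two updates is then lost.
            counter["value"] = counter["value"] + 1
        else:
            # The lock makes the read and the write one indivisible step.
            with lock:
                counter["value"] = counter["value"] + 1

def run(workers, iterations, use_lock):
    """Run `workers` threads against one shared counter and return its final value."""
    counter = {"value": 0}
    lock = threading.Lock() if use_lock else None
    threads = [
        threading.Thread(target=increment_many, args=(counter, iterations, lock))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

# With the lock the result is always workers * iterations.
print(run(4, 10000, use_lock=True))
# Without it the result may come up short, and may differ from run to run.
print(run(4, 10000, use_lock=False))
```

The unlocked variant often produces the right answer anyway, which is exactly why such bugs survive ordinary testing.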

Due to the complex nature of bugs caught by ThreadSanitizer, we don't suggest waiting until product release validation to use the tool. For example, at Google, we've made our tools easily accessible to programmers during development, so that anyone can use the tool for testing if they suspect that new code might introduce a race. For both Chromium and Google's internal server codebase, we continuously run unit tests under the tool. This catches many regressions instantly. The Chromium project has recently started using ThreadSanitizer on ClusterFuzz, a large scale fuzzing system. Finally, some teams also set up periodic end-to-end testing with ThreadSanitizer under a realistic workload, which proves to be extremely valuable. Our team has zero tolerance for races found by the tool and does not consider any race to be benign, as even the most benign-looking races can lead to memory corruption.

Our tools are dynamic (as opposed to static tools). This means that they do not merely "look" at the code and try to surmise where bugs can be; instead they instrument the binary at build time and then analyze the dynamic behavior of the program to catch it red-handed. This approach has its pros and cons. On one hand, the tool does not have any false positives, thus it does not bother a developer with something that is not a bug. On the other hand, in order to catch a bug, the test must expose the bug: the racing data accesses must actually be executed in different threads. This requires writing good multi-threaded tests and makes end-to-end testing especially effective.

As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms (see e.g. this bug in the V8 concurrent garbage collector).

The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).

Interestingly, after integrating the tool into compilers, we've found some bugs in the compilers themselves. For example, LLVM was illegally widening stores, which can introduce very harmful data races into otherwise correct programs. And GCC was injecting unsafe code for initialization of function static variables. Among our other trophies are more than a thousand bugs in Chromium, Firefox, the Go standard library, WebRTC, OpenSSL, and of course in our internal projects.

So what are you waiting for? You know what to do!
Categories: Testing & QA

Complexity is Simple

Software Architecture Zen - Pete Cripp - Mon, 06/30/2014 - 20:18
I was taken with this cartoon and the comments put up by Hugh Macleod last week over at his gapingvoid.com blog so I hope he doesn’t mind me reproducing it here.

Categories: Architecture

Neo4j/R: Grouping meetup members by join timestamp

Mark Needham - Mon, 06/30/2014 - 01:06

I wanted to do some ad-hoc analysis on the join date of members of the Neo4j London meetup group and since cypher doesn’t yet have functions for dealing with dates I thought I’d give R a try.

I started off by executing a cypher query which returned the join timestamp of all the group members using Nicole White’s RNeo4j package:

> library(RNeo4j)
 
> query = "match (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
RETURN membership.joined AS joinDate"
 
> meetupMembers = cypher(graph, query)
 
> meetupMembers[1:5,]
[1] 1.389107e+12 1.376572e+12 1.379491e+12 1.349454e+12 1.383127e+12

I realised that if I was going to do any date manipulation I’d need to translate the timestamp into an R friendly format so I wrote the following function to help me do that:

> timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01")
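For comparison, the same conversion can be sketched in Python. This is only an illustration: the name `timestamp_to_date` is made up here, and the result is pinned to UTC, whereas R's `as.POSIXct` prints in the local time zone unless told otherwise.

```python
from datetime import datetime, timezone

def timestamp_to_date(millis):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # The stored values are in milliseconds; datetime expects seconds.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

print(timestamp_to_date(0).isoformat())  # 1970-01-01T00:00:00+00:00
```

As in the R version, the division by 1000 is the important step; skipping it produces dates tens of thousands of years in the future.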

I added another column to the data frame with this date representation:

> meetupMembers$joined <- timestampToDate(meetupMembers$joinDate)
 
> meetupMembers[1:5,]
      joinDate              joined
1 1.389107e+12 2014-01-07 15:08:40
2 1.376572e+12 2013-08-15 14:13:40
3 1.379491e+12 2013-09-18 08:55:11
4 1.349454e+12 2012-10-05 17:28:04
5 1.383127e+12 2013-10-30 09:59:03

Next I wanted to group those timestamps by the combination of month + year for which the aggregate and format functions came in handy:

> dd = aggregate(meetupMembers$joined, by=list(format(meetupMembers$joined, "%m-%Y")), function(x) length(x))
> colnames(dd) = c("month", "count")
> dd
     month count
1  01-2012     4
2  01-2013    52
3  01-2014    88
4  02-2012     7
5  02-2013    52
6  02-2014    91
7  03-2012    12
8  03-2013    23
9  03-2014    93
10 04-2012     3
11 04-2013    34
12 04-2014   119
13 05-2012     9
14 05-2013    69
15 05-2014   102
16 06-2011    14
17 06-2012     5
18 06-2013    39
19 06-2014   114
20 07-2011     4
21 07-2012    16
22 07-2013    20
23 08-2011     2
24 08-2012    34
25 08-2013    50
26 09-2012    14
27 09-2013    52
28 10-2011     2
29 10-2012    29
30 10-2013    42
31 11-2011     2
32 11-2012    31
33 11-2013    34
34 12-2012     7
35 12-2013    19

I wanted to be able to group by different date formats so I created the following function to make life easier:

groupBy = function(dates, format) {
  dd = aggregate(dates, by= list(format(dates, format)), function(x) length(x))
  colnames(dd) = c("key", "count")
  dd
}
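The same idea (format each date into a key, then count how many dates share that key) can be sketched in Python with `collections.Counter`. The function name and the three sample dates are illustrative only, not the real meetup data.

```python
from collections import Counter
from datetime import datetime

def group_by(dates, fmt):
    """Count dates per formatted key, e.g. "%Y" for year or "%m" for month."""
    counts = Counter(d.strftime(fmt) for d in dates)
    # Sort by key so the output order is stable, as in the R data frame.
    return sorted(counts.items())

sample = [datetime(2013, 8, 15), datetime(2013, 9, 18), datetime(2014, 1, 7)]
print(group_by(sample, "%Y"))  # [('2013', 2), ('2014', 1)]
```

Passing the format string in as a parameter is what lets one function cover the year, month and weekday groupings.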

Now we can find the join dates grouped by year:

> groupBy(meetupMembers$joined, "%Y")
   key count
1 2011    24
2 2012   171
3 2013   486
4 2014   607

or by day of the week:

> groupBy(meetupMembers$joined, "%A")
        key count
1    Friday   135
2    Monday   287
3  Saturday    80
4    Sunday   102
5  Thursday   187
6   Tuesday   286
7 Wednesday   211

or by month:

> groupBy(meetupMembers$joined, "%m")
   key count
1   01   144
2   02   150
3   03   128
4   04   156
5   05   180
6   06   172
7   07    40
8   08    86
9   09    66
10  10    73
11  11    67
12  12    26

I found the ‘by day’ grouping interesting as I had the impression that the huge majority of people joined meetup groups on a Monday but the difference between Monday and Tuesday isn’t significant. 60% of the joins happen between Monday and Wednesday.
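The 60% figure can be checked directly from the weekday counts shown above. A small Python sketch of the arithmetic:

```python
# Weekday counts copied from the groupBy(..., "%A") output.
by_day = {
    "Friday": 135, "Monday": 287, "Saturday": 80, "Sunday": 102,
    "Thursday": 187, "Tuesday": 286, "Wednesday": 211,
}

total = sum(by_day.values())
early_week = sum(by_day[day] for day in ("Monday", "Tuesday", "Wednesday"))
share = early_week / total

print(total, early_week, round(share * 100, 1))  # 1288 784 60.9
```

So 784 of the 1288 joins, a little under 61%, fall on Monday to Wednesday.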

The ‘by month’ grouping is a bit skewed by the fact we’re only half way into 2014 and there have been a lot more people joining this year than in previous years.

If we exclude this year then the spread is more uniform with a slight dip in December:

> groupBy(meetupMembers$joined[format(meetupMembers$joined, "%Y") != "2014"], "%m")
   key count
1   01    56
2   02    59
3   03    35
4   04    37
5   05    78
6   06    58
7   07    40
8   08    86
9   09    66
10  10    73
11  11    67
12  12    26

Next up I think I need to get some charts going on and perhaps compare the distributions of join dates of various London meetup groups against each other.

I’m an absolute R newbie so if anything I’ve done is stupid and can be done better please let me know.

Categories: Programming

Keeping a journal

Gridshore - Sun, 06/29/2014 - 23:34

Today I was reading the first part of a book I got as a gift from one of my customers. The book is called Show Your Work! by Austin Kleon (Show Your Work! @ Amazon). The whole idea of this book is that you must be open and share what you learn and the steps you took to learn it.

I think this fits me like a glove, but I can be more expressive. Therefore I have decided to do things differently. I want to start by writing smaller pieces of the things I want to do that day, or what I accomplished that day, give some excerpts of things I am working on. Not real blog posts or tutorials but more notes that I share with you. Since it is a Sunday I only want to share the book I am reading.


The post Keeping a journal appeared first on Gridshore.

Categories: Architecture, Programming

Diagramming Spring MVC webapps

Coding the Architecture - Simon Brown - Sun, 06/29/2014 - 09:54

Following on from my previous post (Software architecture as code) where I demonstrated how to create a software architecture model as code, I decided to throw together a quick implementation of a Spring component finder that could be used to (mostly) automatically create a model of a Spring MVC web application. Spring has a bunch of annotations (e.g. @Controller, @Component, @Service and @Repository), and these are often used to signify the major building blocks of a web application. To illustrate this, I took the Spring PetClinic application and produced some diagrams for it. First is a context diagram.

A context diagram for the Spring PetClinic application

Next up are the containers, which in this case are just a web server (e.g. Apache Tomcat) and a database (HSQLDB by default).

A container diagram for the Spring PetClinic application

And finally we have a diagram showing the components that make up the web application. These, and their dependencies, were found by scanning the compiled version of the application (I cloned the project from GitHub and ran the Maven build).

A component diagram for the Spring PetClinic web application

Here is the code that I used to generate the model behind the diagrams.

The resulting JSON representing the model was then copy-pasted across into my simple (and very much in progress) diagramming tool. Admittedly the diagrams are lacking on some details (i.e. component responsibilities and arrow annotations, although those can be fixed), but this approach proves you can expend very little effort to get something that is relatively useful. As I've said before, it's all about getting the abstractions right.

Categories: Architecture

Neo4j: Set Based Operations with the experimental Cypher optimiser

Mark Needham - Sun, 06/29/2014 - 09:45

A few months ago I wrote about cypher queries which look for a missing relationship and showed how you could optimise them by re-working the query slightly.

To refresh, we wanted to find all the people in the London office that I hadn’t worked with given this model…

…and this initial query:

MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)

This took on average 7.46 seconds to execute using cypher-query-tuning so we came up with the following version which took 150 ms on average:

MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)
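The faster query is essentially a set difference: collect my colleagues once, then filter the office members against that collection. A minimal Python sketch of that logic, with made-up names and data, might look like this:

```python
def not_yet_colleagues(office_members, my_colleagues, me):
    """People in the office that `me` has not worked with."""
    # Everyone in the office, minus people already worked with, minus myself.
    return set(office_members) - set(my_colleagues) - {me}

office = {"me", "alice", "bob", "carol"}
worked_with = {"alice"}
print(len(not_yet_colleagues(office, worked_with, "me")))  # 2
```

The slow version, by contrast, checks for the COLLEAGUES relationship separately for every candidate instead of building the collection once up front.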

With the release of Neo4j 2.1 we can now make use of Ronja – the experimental Cypher optimiser – which performs much better for certain types of queries. I thought I’d give it a try against this one.

We can use the experimental optimiser by prefixing our query like so:

cypher 2.1.experimental MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)

If we run that through the query tuner we get the following results:

$ python set-based.py
 
cypher 2.1.experimental MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)
Min 0.719580888748 50% 0.723278999329 95% 0.741609430313 Max 0.743646144867
 
 
MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)
Min 0.706955909729 50% 0.715770959854 95% 0.731880950928 Max 0.733670949936

As you can see there’s not much in it – our original query now runs as quickly as the optimised one. Ronja #ftw!
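The Min / 50% / 95% / Max lines come from running each query several times and summarising the timings. The sketch below shows one way such a summary could be produced in Python; it is not the code of cypher-query-tuning, and the simple index-based percentile used here is an assumption.

```python
import time

def percentile(sorted_values, fraction):
    """Pick the value at `fraction` of the way through a sorted, non-empty list."""
    last = len(sorted_values) - 1
    index = max(0, min(last, round(fraction * last)))
    return sorted_values[index]

def summarise(timings):
    """Reduce a list of timings to min, median, 95th percentile and max."""
    ordered = sorted(timings)
    return {
        "min": ordered[0],
        "50%": percentile(ordered, 0.5),
        "95%": percentile(ordered, 0.95),
        "max": ordered[-1],
    }

def time_runs(fn, runs=10):
    """Call `fn` `runs` times and summarise the wall-clock time of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return summarise(timings)

print(time_runs(lambda: sum(range(100000)), runs=5))
```

Reporting percentiles rather than a single average makes it easier to see whether one slow run is skewing the picture.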

Give it a try on your slow queries and see how it gets on. There’ll certainly be some cases where it’s slower but over time it should be faster for a reasonable chunk of queries.

Categories: Programming

Neo4j’s Cypher vs Clojure – Group by and Sorting

Mark Needham - Sun, 06/29/2014 - 03:56

One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

First let’s create some future and some past events based on the current timestamp of 1404006050535:

CREATE (event1:Event {name: "Future Event 1", timestamp: 1414002772427 })
CREATE (event2:Event {name: "Future Event 2", timestamp: 1424002772427 })
CREATE (event3:Event {name: "Future Event 3", timestamp: 1416002772427 })
 
CREATE (event4:Event {name: "Past Event 1", timestamp: 1403002772427 })
CREATE (event5:Event {name: "Past Event 2", timestamp: 1402002772427 })

If we return all the events we see the following:

$ MATCH (e:Event) RETURN e;
==> +------------------------------------------------------------+
==> | e                                                          |
==> +------------------------------------------------------------+
==> | Node[15414]{name:"Future Event 1",timestamp:1414002772427} |
==> | Node[15415]{name:"Future Event 2",timestamp:1424002772427} |
==> | Node[15416]{name:"Future Event 3",timestamp:1416002772427} |
==> | Node[15417]{name:"Past Event 1",timestamp:1403002772427}   |
==> | Node[15418]{name:"Past Event 2",timestamp:1402002772427}   |
==> +------------------------------------------------------------+
==> 5 rows
==> 13 ms

We can achieve the desired grouping and sorting with the following cypher query:

(def sorted-query "MATCH (e:Event)
WITH COLLECT(e) AS events
WITH [e IN events WHERE e.timestamp <= timestamp()] AS pastEvents,
     [e IN events WHERE e.timestamp > timestamp()] AS futureEvents
UNWIND pastEvents AS pastEvent
WITH pastEvent, futureEvents ORDER BY pastEvent.timestamp DESC
WITH COLLECT(pastEvent) as orderedPastEvents, futureEvents
UNWIND futureEvents AS futureEvent
WITH futureEvent, orderedPastEvents ORDER BY futureEvent.timestamp
RETURN COLLECT(futureEvent) AS orderedFutureEvents, orderedPastEvents")

We then use the following function to call through to the Neo4j server using the excellent neocons library:

(ns neo4j-meetup.db
  (:require [clojure.walk :as walk])
  (:require [clojurewerkz.neocons.rest.cypher :as cy])
  (:require [clojurewerkz.neocons.rest :as nr]))
 
(def NEO4J_HOST "http://localhost:7521/db/data/")
 
(defn cypher
  ([query] (cypher query {}))
  ([query params]
     (let [conn (nr/connect! NEO4J_HOST)]
       (->> (cy/tquery query params)
            walk/keywordize-keys))))

We call that function and grab the first row since we know there won’t be any other rows in the result:

(def query-result (->> (db/cypher sorted-query) first))

Now we need to extract the past and future collections so that we can display them on the page which we can do like so:

> (map #(% :data) (query-result :orderedPastEvents))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (map #(% :data) (query-result :orderedFutureEvents))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

An alternative approach is to return the events from cypher and then handle the grouping and sorting in clojure. In that case our query is much simpler:

(def unsorted-query "MATCH (e:Event) RETURN e")

We’ll use the clj-time library to determine the current time:

(def now (clj-time.coerce/to-long (clj-time.core/now)))

First let’s split the events into past and future:

> (def grouped-by-events 
     (->> (db/cypher unsorted-query)
          (map #(->> % :e :data))
          (group-by #(> (->> % :timestamp) now))))
 
> grouped-by-events
{true [{:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1424002772427, :name "Future Event 2"} {:timestamp 1416002772427, :name "Future Event 3"}], 
 false [{:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"}]}

And finally we sort appropriately using these functions:

(defn time-descending [row] (* -1 (->> row :timestamp)))
(defn time-ascending [row] (->> row :timestamp))
 
> (sort-by time-descending (get grouped-by-events false))
({:timestamp 1403002772427, :name "Past Event 1"} {:timestamp 1402002772427, :name "Past Event 2"})
 
> (sort-by time-ascending (get grouped-by-events true))
({:timestamp 1414002772427, :name "Future Event 1"} {:timestamp 1416002772427, :name "Future Event 3"} {:timestamp 1424002772427, :name "Future Event 2"})

I used Clojure to do the sorting and grouping in my project because the query to get the events was a bit more complicated and became very difficult to read with the sorting and grouping mixed in.
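The split-then-sort step is not specific to Clojure. Here is the same logic sketched in Python, using the five events created earlier and the "current" timestamp from the start of the post; the function name is made up for this illustration.

```python
events = [
    {"name": "Future Event 1", "timestamp": 1414002772427},
    {"name": "Future Event 2", "timestamp": 1424002772427},
    {"name": "Future Event 3", "timestamp": 1416002772427},
    {"name": "Past Event 1", "timestamp": 1403002772427},
    {"name": "Past Event 2", "timestamp": 1402002772427},
]

def split_and_sort(events, now):
    """Return (past events, newest first) and (future events, soonest first)."""
    by_time = lambda event: event["timestamp"]
    past = sorted((e for e in events if e["timestamp"] <= now), key=by_time, reverse=True)
    future = sorted((e for e in events if e["timestamp"] > now), key=by_time)
    return past, future

past, future = split_and_sort(events, now=1404006050535)
print([e["name"] for e in past])    # ['Past Event 1', 'Past Event 2']
print([e["name"] for e in future])  # ['Future Event 1', 'Future Event 3', 'Future Event 2']
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the result reproducible.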

Unfortunately cypher doesn’t provide an easy way to sort within a collection so we need our sorting in the row context and then collect the elements back again afterwards.

Categories: Programming

Data Science: Mo’ Data Mo’ Problems

Mark Needham - Sun, 06/29/2014 - 00:35

Over the last couple of years I’ve worked on several proof of concept style Neo4j projects and on a lot of them people have wanted to work with their entire data set which I don’t think makes sense so early on.

In the early parts of a project we’re trying to prove out our approach rather than prove we can handle big data – something that Ashok taught me a couple of years ago on a project we worked on together.

In a Neo4j project that means coming up with an effective way to model and query our data and if we lose track of this it’s very easy to get sucked into working on the big data problem.

This could mean optimising our import scripts to deal with huge amounts of data or working out how to handle different aspects of the data (e.g. variability in shape or encoding) that only seem to reveal themselves at scale.

These are certainly problems that we need to solve but in my experience they end up taking much more time than expected and therefore aren’t the best problem to tackle when time is limited. Early on we want to create some momentum and keep the feedback cycle fast.

We probably want to tackle the data size problem as part of the implementation/production stage of the project, to use Michael Nygard’s terminology.

At this stage we’ll have some confidence that our approach makes sense and then we can put aside the time to set things up properly.

I’m sure there are some types of projects where this approach doesn’t make sense so I’d love to hear about them in the comments so I can spot them in future.

Categories: Programming

Episode 205: Martin Lippert on Eclipse Flux

Eberhard Wolff talks with Martin Lippert of Pivotal about the Eclipse Flux project. This project is in its early stages and has a very interesting goal: it aims to put software development tools into the cloud. It is a lot more than just an IDE (integrated development environment) in a browser. Instead the IDE […]
Categories: Programming

Data Science is the Art of Asking Better Questions

I heard a colleague make a great comment today …

“Data science is the art of asking better questions.

It’s not the art of finding a solution … the data keeps evolving.”

Categories: Architecture, Programming

Software architecture as code

Coding the Architecture - Simon Brown - Tue, 06/24/2014 - 21:22

If you've been following the blog, you will have seen a couple of posts recently about the alignment of software architecture and code. Software architecture vs code talks about the typical gap between how we think about the software architecture vs the code that we write, while An architecturally-evident coding style shows an example of how to ensure that the code does reflect those architectural concepts. The basic summary of the story so far is that things get much easier to understand if your architectural ideas map simply and explicitly into the code.

Regular readers will also know that I'm a big fan of using diagrams to visualise and communicate the architecture of a software system, and this "big picture" view of the world is often hard to see from the thousands of lines of code that make up our software systems. One of the things that I teach people during my sketching workshops is how to sketch out a software system using a small number of simple diagrams, each at very separate levels of abstraction. This is based upon my C4 model, which you can find an introduction to at Simple sketches for diagramming your software architecture. The feedback from people using this model has been great, and many have a follow-up question of "what tooling would you recommend?". My answer has typically been "Visio or OmniGraffle", but it's obvious that there's an opportunity here.

Representing the software architecture model in code

I've had a lot of different ideas over the past few months for how to create, what is essentially, a lightweight modelling tool and for some reason, all of these ideas came together last week while I was at the GOTO Amsterdam conference. I'm not sure why, but I had a number of conversations that inspired me in different ways, so I skipped one of the talks to throw some code together and test out some ideas. This is basically what I came up with...

It's a description of the context and container levels of my C4 model for the techtribes.je system. Hopefully it doesn't need too much explanation if you're familiar with the model, although there are some ways in which the code can be made simpler and more fluent. Since this is code though, we can easily constrain the model and version it. This approach works well for the high-level architectural concepts because there are very few of them, plus it's hard to extract this information from the code. But I don't want to start crafting up a large amount of code to describe the components that reside in each container, particularly as there are potentially lots of them and I'm unsure of the exact relationships between them.

Scanning the codebase for components

If your code does reflect your architecture (i.e. you're using an architecturally-evident coding style), the obvious solution is to just scan the codebase for those components, and use those to automatically populate the model. How do we signify what a "component" is? In Java, we can use annotations...

Identifying those components is then a matter of scanning the source or the compiled bytecode. I've played around with this idea on and off for a few months, using a combination of Java annotations along with annotation processors and libraries including Scannotation, Javassist and JDepend. The Reflections library on Google Code makes this easy to do, and now I have a simple Java program that looks for my component annotation on classes in the classpath and automatically adds those to the model. As for the dependencies between components, again this is fairly straightforward to do with Reflections. I have a bunch of other annotations too, for example to represent dependencies between a component and a container or software system, but the principle is still the same - the architecturally significant elements and their dependencies can mostly be embedded in the code.
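To show the shape of the idea without the Java tooling, here is a rough Python analogue. It is not the code from the post: a decorator stands in for the Java component annotation, a scan of a namespace stands in for scanning the classpath with Reflections, and every name in it is hypothetical.

```python
import inspect

def component(description=""):
    """Hypothetical marker decorator, standing in for a Java component annotation."""
    def mark(cls):
        cls.__component__ = description
        return cls
    return mark

def find_components(namespace):
    """Return {class name: description} for every marked class in `namespace`."""
    found = {}
    for name, value in list(namespace.items()):
        # Check the class's own attributes so unmarked subclasses are skipped.
        if inspect.isclass(value) and "__component__" in vars(value):
            found[name] = vars(value)["__component__"]
    return found

@component("Stores and retrieves tweets")
class TweetRepository:
    pass

@component("Refreshes tweets on a schedule")
class TweetUpdater:
    pass

class StringHelper:  # not architecturally significant, so left unmarked
    pass

# Scan this module's namespace, roughly as the Java version scans the classpath.
print(find_components(globals()))
```

The point is the same as in the Java version: the code declares which classes are components, and the model is derived from those declarations rather than maintained by hand.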

Creating some views

The model itself is useful, but ideally I want to look at that model from different angles, much like the diagrams that I teach people to draw when they attend my sketching workshop. After a little thought about what this means and what each view is constrained to show, I created a simple domain model to represent the context, container and component views...

Again, this is all in code so it's quick to create, versionable and very customisable.

Exporting the model

Now that I have a model of my software system and a number of views that I'd like to see, I could do with drawing some pictures. I could create a diagramming tool in Java that reads the model directly, but perhaps a better approach is to serialize the object model out to an external format so that other tools can use it. And that's what I did, courtesy of the Jackson library. The resulting JSON file is over 600 lines long (you can see it here), but don't forget most of this has been generated automatically by Java code scanning for components and their dependencies.
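As an illustration of the export step, the sketch below serialises a tiny object model to JSON in Python. The post uses Jackson on a Java model; here the standard `json` and `dataclasses` modules stand in for it, and the `Container` and `Component` classes, their fields and the sample values are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Component:
    name: str
    description: str = ""
    uses: list = field(default_factory=list)  # names of components this one depends on

@dataclass
class Container:
    name: str
    technology: str = ""
    components: list = field(default_factory=list)

def to_json(container):
    """Serialise the container and its nested components to a JSON string."""
    return json.dumps(asdict(container), indent=2, sort_keys=True)

web = Container("Web Application", "Apache Tomcat", [
    Component("TweetRepository", "Stores and retrieves tweets"),
    Component("TweetUpdater", "Refreshes tweets on a schedule", uses=["TweetRepository"]),
])
print(to_json(web))
```

Because the output is plain JSON, any diagramming or documentation tool can consume it without depending on the code that produced it.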

Visualising the views

The last question is how to visualise the information contained in the model and there are a number of ways to do this. I'd really like somebody to build a Google Maps or Prezi-style diagramming tool where you can pinch-zoom in and out to see different views of the model, but my UI skills leave something to be desired in that area. For the meantime, I've thrown together a simple diagramming tool using HTML 5, CSS and JavaScript that takes a JSON string and visualises the views contained within it. My vision here is to create a lightweight model visualisation tool rather than a Visio clone where you have to draw everything yourself. I've deployed this app on Pivotal Web Services and you can try it for yourself. You'll have to drag the boxes around to lay out the elements and it's not very pretty, but the concept works. The screenshot that follows shows the techtribes.je context diagram.

A screenshot of a simple context diagram

Thoughts?

All of the C4 model Java code is open source and sitting on GitHub. This is only a few hours of work so far and there are no tests, so think of this as a prototype more than anything else at the moment. I really like the simplicity of capturing a software architecture model in code, and using an architecturally-evident coding style allows you to create large chunks of that model automatically. This also opens up the door to some other opportunities such as automated build plugins, lightweight documentation tooling, etc. Caveats apply with the applicability of this to all software systems, but I'm excited at the possibilities. Thoughts?

Categories: Architecture

Teams Should Go So Fast They Almost Spin Out of Control

Mike Cohn's Blog - Tue, 06/24/2014 - 15:00

Yes, I really did refer to guitarist Alvin Lee in a Certified Scrum Product Owner class last week. Here's why.

I was making a point that Scrum teams should strive to go as fast as they can without going so fast they spin out of control. Alvin Lee of the band Ten Years After was a talented guitarist known for his very fast solos. Lee's ultimate performance was of the song "I'm Going Home" at Woodstock. During the performance, Lee was frequently on the edge of flying out of control, yet he kept it all together for some of the best 11 minutes in rock history.

I want the same of a Scrum team--I want them going so fast they are just on the verge of spinning out of control yet are able to keep it together and deliver something classic and powerful.

Re-watching Ten Years After's Woodstock performance I'm struck by a couple of other lessons, which I didn't mention in class last week:

One: Scrum teams should be characterized by frequent, small hand-offs. A programmer gets eight lines of code working and yells, "Hey, Tester, check it out." The tester has been writing automated tests while waiting for those eight lines and runs the tests. Thirty minutes later the programmer has the next micro-feature coded and ready for testing. Although a good portion of the song is made up of guitar solos, they aren't typically long solos. Lee plays a solo and soon hands the song back to his bandmates, repeating for four separate solos through the song.

Two: Scrum teams should minimize work in progress. While "I'm Going Home" is a long song (clocking in at over eleven minutes), there are frequent "deliveries" of interpolated songs throughout the performance. Listen for "Blue Suede Shoes," "Whole Lotta Shaking" and others, some played for just a few seconds.

OK, I'm probably nuts, and I certainly didn't make all these points in class. But Alvin Lee would have made one great Scrum teammate. Let me know what you think in the comments below.

Is there a future for Map/Reduce?


Google’s Jeffrey Dean and Sanjay Ghemawat filed the patent request and published the map/reduce paper 10 years ago (2004). According to Wikipedia, Doug Cutting and Mike Cafarella created Hadoop, with its own implementation of Map/Reduce, one year later at Yahoo. Both implementations were built for the same purpose: batch indexing of the web.

Back then, the web began its “web 2.0” transition: pages became more dynamic and people began to create more content, so an efficient way to reprocess and build the web index was needed, and map/reduce was it. Web indexing was a great fit for map/reduce since the initial processing of each source (web page) is completely independent of any other, i.e. a very convenient map phase, after which you need to combine the results to build the reverse index. That said, even the core Google algorithm, the famous PageRank, is iterative (so less appropriate for map/reduce), not to mention that as the internet got bigger and updates became more and more frequent, map/reduce wasn’t enough. Again Google (who seem to be consistently a few years ahead of the industry) began coming up with alternatives like Google Percolator or Google Dremel (both papers were published in 2010; Percolator was introduced that year, and Dremel has been used in Google since 2006).
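To see why indexing fits the model so well, here is a toy, single-process sketch in Python of building a reverse index in a map/reduce style. It is not Hadoop or Google code; the function names and the two sample pages are invented for the illustration.

```python
from collections import defaultdict

def map_page(url, text):
    """Map phase: emit (word, url) pairs for one page, independently of all other pages."""
    return [(word, url) for word in set(text.lower().split())]

def reduce_index(pairs):
    """Reduce phase: combine the pairs from every page into word -> sorted list of urls."""
    index = defaultdict(set)
    for word, url in pairs:
        index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}

pages = {"a.example": "big data", "b.example": "fast data"}

# Each map_page call could run on a different machine; only the reduce needs all pairs.
pairs = [pair for url, text in pages.items() for pair in map_page(url, text)]
index = reduce_index(pairs)
print(index["data"])  # ['a.example', 'b.example']
```

An iterative algorithm, by contrast, would have to run this whole map-then-reduce cycle once per iteration, re-reading its input each time.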

So now it is 2014, and it is time for the rest of us to catch up with Google and get over Map/Reduce, for multiple reasons:

  • end-users’ expectations (who hear “big data” but interpret that as  “fast data”)
  • iterative problems like graph algorithms which are inefficient as you need to load and reload the data each iteration
  • continuous ingestion of data (increments coming on as small batches or streams of events) – where joining to existing data can be expensive
  • real-time problems – both queries and processing
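
To illustrate the iterative case from the list above, here is a small PageRank sketch in plain Java. It is an illustration only, not the algorithm as Google runs it: the class name and the damping parameter are my own choices, and it assumes every page has at least one outgoing link and only links to pages that are themselves keys of the map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: PageRank as an in-memory loop over a small link graph. */
class PageRankSketch {

    /**
     * Assumes every page has at least one outgoing link and every link target is a key of `links`.
     * Each pass reads the complete rank table produced by the previous pass.
     */
    static Map<String, Double> rank(Map<String, List<String>> links, int iterations, double damping) {
        int n = links.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0 / n);
        }
        for (int i = 0; i < iterations; i++) {
            // Every page starts the pass with the "random jump" share.
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) {
                next.put(page, (1.0 - damping) / n);
            }
            // Every page hands its damped rank to its link targets in equal shares.
            for (Map.Entry<String, List<String>> entry : links.entrySet()) {
                List<String> targets = entry.getValue();
                double share = damping * ranks.get(entry.getKey()) / targets.size();
                for (String target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b"));
        links.put("b", Arrays.asList("a"));
        links.put("c", Arrays.asList("a"));
        System.out.println(rank(links, 50, 0.85));
    }
}
```

In memory the loop is cheap. With classic map/reduce each pass would typically be its own job that reads the graph and the previous ranks from storage and writes the new ranks back, which is where the load-and-reload cost comes from.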

In my opinion, Map/Reduce is an idea whose time has come and gone. It won’t die in a day or a year: there are still a lot of working systems that use it, and the alternatives are still maturing. I do think, however, that if you need to write or implement something new that would build on map/reduce, you should use another option, or at the very least carefully consider the alternatives.

So how is this change going to happen? Luckily, Hadoop has recently adopted YARN (you can see my presentation on it here), which opens up the possibility of going beyond map/reduce without changing everything … even though, in effect, a lot will change. Note that some of the new options do have migration paths, and we still retain access to all that “big data” we have in Hadoop, as well as extended reuse of some of the ecosystem.

The first type of effort to replace map/reduce is to actually subsume it by offering more flexible batch processing. After all, saying map/reduce is not relevant doesn’t mean that batch processing is not relevant. It does mean that there’s a need for more complex processes. There are two main candidates here, Tez and Spark: Tez offers a nice migration path, as it is replacing map/reduce as the execution engine for both Pig and Hive, and Spark has a compelling offer in combining batch and stream processing (more on this later) in a single engine.

The second type of effort or processing capability that will help kill map/reduce is MPP databases on Hadoop. Like the “flexible batch” approach mentioned above, this replaces a piece of functionality that map/reduce was used for: unleashing the data already processed and stored in Hadoop. The idea here is twofold:

  • To provide fast query capabilities* – by using specialized columnar data format and database engines deployed as daemons on the cluster
  • To provide rich query capabilities – by supporting more and more of the SQL standard and enriching it with analytics capabilities (e.g. via MADlib)
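
As a rough illustration of why a columnar format helps analytical queries, here is a toy comparison in plain Java. It is not how Impala, HAWQ or ORC are implemented; the class, the three-field record layout and the field order are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: summing one field from a row layout and from a column layout. */
class ColumnarSketch {

    /** Row layout: each record is {userId, country, amount}; the sum still walks whole records. */
    static long sumAmountRowWise(List<Object[]> rows) {
        long sum = 0;
        for (Object[] row : rows) {
            sum += (Long) row[2]; // only field 2 is needed, fields 0 and 1 are skipped over
        }
        return sum;
    }

    /** Column layout: all amounts sit together, so the sum scans a single primitive array. */
    static long sumAmountColumnWise(long[] amountColumn) {
        long sum = 0;
        for (long amount : amountColumn) {
            sum += amount;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"u1", "DE", 10L});
        rows.add(new Object[] {"u2", "NL", 5L});
        long[] amounts = {10L, 5L};
        System.out.println(sumAmountRowWise(rows) + " " + sumAmountColumnWise(amounts));
    }
}
```

On disk the effect is larger than in this toy: a column layout lets the engine read only the columns a query touches, and values of one type compress well together.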

Efforts in this arena include Impala from Cloudera, HAWQ from Pivotal (which is essentially Greenplum over HDFS), startups like Hadapt, or even Actian trying to leverage their ParAccel acquisition with the recently announced Actian Vector. Hive is somewhere in the middle, relying on Tez on the one hand and using vectorization and a columnar format (ORC) on the other.

The third type of processing that will help dethrone Map/Reduce is stream processing. Unlike the two previous types of effort, this covers ground that map/reduce can’t cover, even inefficiently. Stream processing is about handling a continuous flow of new data (e.g. events) and processing it (enriching, aggregating, etc.) in seconds or less. The two major contenders in the Hadoop arena seem to be Spark Streaming and Storm though, of course, there are several other commercial and open source platforms that handle this type of processing as well.
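
A minimal sketch of the "aggregating as events arrive" idea, in plain Java rather than Storm or Spark Streaming code. The class name, the fixed (tumbling) window and the event shape (a key plus a timestamp in milliseconds) are assumptions made for the example; timestamps are assumed to be non-negative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: per-key event counts in fixed (tumbling) time windows. */
class TumblingWindowCounter {
    private final long windowMillis;
    // window start time -> (event key -> number of events seen in that window)
    private final Map<Long, Map<String, Long>> countsByWindow = new TreeMap<>();

    TumblingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Called once per incoming event; the aggregate is updated immediately, not in a later batch. */
    void onEvent(String key, long timestampMillis) {
        long windowStart = (timestampMillis / windowMillis) * windowMillis;
        countsByWindow
                .computeIfAbsent(windowStart, start -> new HashMap<>())
                .merge(key, 1L, Long::sum);
    }

    /** Returns the current count for a key in the window starting at windowStart (0 if none). */
    long count(long windowStart, String key) {
        return countsByWindow
                .getOrDefault(windowStart, Collections.<String, Long>emptyMap())
                .getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(1000);
        counter.onEvent("login", 100);
        counter.onEvent("login", 900);
        counter.onEvent("login", 1000);
        System.out.println(counter.count(0, "login") + " " + counter.count(1000, "login"));
    }
}
```

The point of the sketch is the shape of the work: the result is always up to date after each event, whereas a batch job would recompute the counts from stored data at the end of some period. Real stream processors add what is missing here, such as distribution, fault tolerance and handling of late events.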

In summary – Map/Reduce is great. It has served us (as an industry) for a decade but it is now time to move on and bring the richer processing capabilities we have elsewhere to solve our big data problems as well.

Last note – I focused on Hadoop in this post even though there are several other platforms and tools around. I think that regardless of whether Hadoop is the best platform, it is the one becoming the de facto standard for big data (remember Betamax vs. VHS?).

One really, really last note – if you read up to here, and you are a developer living in Israel, and you happen to be looking for a job –  I am looking for another developer to join my Technology Research team @ Amdocs. If you’re interested drop me a note: arnon.rotemgaloz at amdocs dot com or via my twitter/linkedin profiles

*esp. in regard to analytical queries – operational SQL on Hadoop, with efforts like Phoenix, IBM’s BigSQL or Splice Machine, is also happening, but that’s another story

Illustration idea found in James Mickens’s talk at Monitorama 2014 (which is, by the way, a really funny presentation – go watch it) – oh yeah… and Pulp Fiction :)

Categories: Architecture

Hadoop YARN overview

I gave a short overview of Hadoop YARN to our big data development team. The presentation covers the motivation for YARN, how it works and its major weaknesses.

You can watch/download on slideshare

Categories: Architecture

Using Dropwizard in combination with Elasticsearch

Gridshore - Thu, 05/15/2014 - 21:09

How often do you start creating a new application? How often have you thought about configuring an application: where to locate a config file, how to load the file, what format to use? Another thing you regularly do is adding timers to track execution time, management tools to do thread analysis, etc. From a more functional perspective you want a rich client-side application using AngularJS, so you need a REST backend to deliver JSON documents. Does this sound like something you need regularly? Then this blog post is for you. If you never need this, please keep on reading, you might like it.

In this blog post I will create an application that shows you all the available indexes in your elasticsearch cluster. Not very sexy, but I am going to use AngularJS, Dropwizard and elasticsearch. That should be enough to get a lot of you interested.


What is Dropwizard

Dropwizard is a framework that combines a lot of other frameworks that have become the de facto standard in their own domain. We have Jersey for the REST interface, Jetty as a lightweight container, Jackson for JSON parsing, FreeMarker for front-end templates, Metrics for metrics and slf4j for logging. Dropwizard has some utilities to combine these frameworks and enable you as a developer to be very productive in constructing your application. It provides building blocks like lifecycle management, Resources, Views, loading of bundles, configuration and initialization.

Time to jump in and start creating an application.

Structure of the application

The application is set up as a Maven project. To start off we only need one dependency:

<dependency>
    <groupId>io.dropwizard</groupId>
    <artifactId>dropwizard-core</artifactId>
    <version>${dropwizard.version}</version>
</dependency>

If you want to follow along, you can check my github repository:


https://github.com/jettro/dropwizard-elastic

Configure your application

Every application needs configuration. In our case we need to configure how to connect to elasticsearch. In Dropwizard you extend the Configuration class and create a POJO. Using Jackson and Hibernate Validator annotations we configure validation and serialization. In our case the configuration object looks like this:

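import com.fasterxml.jackson.annotation.JsonProperty;
import io.dropwizard.Configuration;
import org.hibernate.validator.constraints.NotEmpty;

public class DWESConfiguration extends Configuration {
    // The transport client used later connects to the transport port (9300 by default), not the HTTP port (9200).
    @NotEmpty
    private String elasticsearchHost = "localhost:9300";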

    @NotEmpty
    private String clusterName = "elasticsearch";

    @JsonProperty
    public String getElasticsearchHost() {
        return elasticsearchHost;
    }

    @JsonProperty
    public void setElasticsearchHost(String elasticsearchHost) {
        this.elasticsearchHost = elasticsearchHost;
    }

    @JsonProperty
    public String getClusterName() {
        return clusterName;
    }

    @JsonProperty
    public void setClusterName(String clusterName) {
        this.clusterName = clusterName;
    }
}

Then you need to create a yml file containing the properties in the configuration as well as some nice values. In my case it looks like this:

elasticsearchHost: localhost:9300
clusterName: jc-play

How often have you started a project by building the configuration mechanism? Usually I start with Maven and quickly move on to Tomcat. Not this time. We have done Maven, and now we have done configuration. Next up is the runner for the application.

Add the runner

This is the class we run to start the application. Internally Jetty is started. We extend the Application class and use the configuration class as its type parameter. This is the class that initializes the complete application: the bundles in use are initialized, and classes are created and passed to other classes.

public class DWESApplication extends Application<DWESConfiguration> {
    private static final Logger logger = LoggerFactory.getLogger(DWESApplication.class);

    public static void main(String[] args) throws Exception {
        new DWESApplication().run(args);
    }

    @Override
    public String getName() {
        return "dropwizard-elastic";
    }

    @Override
    public void initialize(Bootstrap<DWESConfiguration> dwesConfigurationBootstrap) {
    }

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        logger.info("Running the application");
    }
}

When starting this application, we have no success. We get a big error because we did not register any resources.

ERROR [2014-05-14 16:58:34,174] com.sun.jersey.server.impl.application.RootResourceUriRules: 
	The ResourceConfig instance does not contain any root resource classes.

Nothing happens, we just need a resource.

Before we can return something, we need to have something to return. We create a POJO called Index that contains one property called name. For now we just return this object as a JSON object. The following code shows the IndexResource that handles the requests that are related to the indexes.

@Path("/indexes")
@Produces(MediaType.APPLICATION_JSON)
public class IndexResource {

    @GET
    @Timed
    public Index showIndexes() {
        Index index = new Index();
        index.setName("A Dummy Index");

        return index;
    }
}
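
The Index class itself is not shown in this post. A minimal sketch of what such a POJO could look like, assuming nothing more than the single name property described above, is:

```java
/**
 * Illustrative sketch of the Index POJO: a single "name" property with a getter and a setter.
 * In a real project this would be a public class in its own file.
 */
class Index {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.setName("A Dummy Index");
        System.out.println(index.getName());
    }
}
```

Jackson serializes public getters by default, so no extra annotations should be needed for the name property to show up in the JSON response.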

The @GET, @Path and @Produces annotations are JAX-RS annotations, which Dropwizard handles through the Jersey REST library. @Timed is from the Metrics library. Before starting the application we need to register our index resource with Jersey.

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        logger.info("Running the application");
        final IndexResource indexResource = new IndexResource();
        environment.jersey().register(indexResource);
    }

Now we can start the application using the following runner from IntelliJ. Later on we will create the executable jar.

Running the app from IntelliJ

Run the application again; this time it works. You can browse to http://localhost:8080/indexes and see our dummy index as a nice JSON document. There is something in the logs though. I love this message; this is what you get when running the application without health checks.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!    THIS APPLICATION HAS NO HEALTHCHECKS. THIS MEANS YOU WILL NEVER KNOW      !
!     IF IT DIES IN PRODUCTION, WHICH MEANS YOU WILL NEVER KNOW IF YOU'RE      !
!    LETTING YOUR USERS DOWN. YOU SHOULD ADD A HEALTHCHECK FOR EACH OF YOUR    !
!         APPLICATION'S DEPENDENCIES WHICH FULLY (BUT LIGHTLY) TESTS IT.       !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Creating a health check

We add a health check. Since we are creating an application that interacts with elasticsearch, we create a health check for elasticsearch. Don’t think too much about how we connect to elasticsearch yet. We will get there later on.

public class ESHealthCheck extends HealthCheck {

    private ESClientManager clientManager;

    public ESHealthCheck(ESClientManager clientManager) {
        this.clientManager = clientManager;
    }

    @Override
    protected Result check() throws Exception {
        ClusterHealthResponse clusterIndexHealths = clientManager.obtainClient().admin().cluster().health(new ClusterHealthRequest())
                .actionGet();
        switch (clusterIndexHealths.getStatus()) {
            case GREEN:
                return HealthCheck.Result.healthy();
            case YELLOW:
                return HealthCheck.Result.unhealthy("Cluster state is yellow, maybe replication not done? New Nodes?");
            case RED:
            default:
                return HealthCheck.Result.unhealthy("Something is very wrong with the cluster", clusterIndexHealths);

        }
    }
}

Just like the resource handler, the health check needs to be registered. Next to the standard HTTP port for normal users, a second port is exposed for administration. There you can find reports like Metrics, Ping, Threads and Healthcheck.

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        // Use the same ESClientManager that the health check and the resource expect (introduced in the next section).
        ESClientManager esClientManager = new ESClientManager(config.getElasticsearchHost(), config.getClusterName());

        logger.info("Running the application");
        final IndexResource indexResource = new IndexResource(esClientManager);
        environment.jersey().register(indexResource);

        final ESHealthCheck esHealthCheck = new ESHealthCheck(esClientManager);
        environment.healthChecks().register("elasticsearch", esHealthCheck);
    }

You as a reader now have an assignment: start the application and check the admin pages yourself at http://localhost:8081. We are going to connect to elasticsearch in the meantime.

Connecting to elasticsearch

We connect to elasticsearch using the transport client. This is taken care of by the ESClientManager, which makes use of Dropwizard's managed objects: classes implementing the Managed interface have their lifecycle managed by Dropwizard. From the configuration object we take the host(s) and the cluster name. Now we can obtain a client in the start method and pass it to the classes that need it. The first class that needs it is the health check, but we already had a look at that one. Through the ESClientManager other classes have access to the client. The Managed interface mandates a start as well as a stop method.

    @Override
    public void start() throws Exception {
        Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", clusterName).build();

        logger.debug("Settings used for connection to elasticsearch : {}", settings.toDelimitedString('#'));

        TransportAddress[] addresses = getTransportAddresses(host);

        logger.debug("Hosts used for transport client : {}", (Object) addresses);

        this.client = new TransportClient(settings).addTransportAddresses(addresses);
    }

    @Override
    public void stop() throws Exception {
        this.client.close();
    }
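
The getTransportAddresses(host) helper used in the start method is not shown in this post. As a rough sketch of the parsing such a helper needs, here is a plain JDK version. The comma-separated "host:port" format, the class name and the fallback to port 9300 are assumptions for illustration; in the real client manager each result would still have to be wrapped in an elasticsearch TransportAddress.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: turns "host1:9300,host2" into a list of socket addresses. */
class HostParserSketch {
    // Assumption: fall back to the default elasticsearch transport port when no port is given.
    static final int DEFAULT_TRANSPORT_PORT = 9300;

    static List<InetSocketAddress> parse(String hosts) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing commas and stray whitespace
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                result.add(InetSocketAddress.createUnresolved(trimmed, DEFAULT_TRANSPORT_PORT));
            } else {
                String host = trimmed.substring(0, colon);
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                result.add(InetSocketAddress.createUnresolved(host, port));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress address : parse("localhost:9300, es2")) {
            System.out.println(address.getHostString() + " -> " + address.getPort());
        }
    }
}
```

The sketch uses createUnresolved so that parsing does not trigger a DNS lookup; a malformed port will surface as a NumberFormatException, which is probably what you want at startup.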

We need to register our managed class with the lifecycle of the environment in the runner class.

    @Override
    public void run(DWESConfiguration config, Environment environment) throws Exception {
        ESClientManager esClientManager = new ESClientManager(config.getElasticsearchHost(), config.getClusterName());
        environment.lifecycle().manage(esClientManager);
    }	

Next we want to change the IndexResource to use the elasticsearch client to list all indexes.

    public List<Index> showIndexes() {
        IndicesStatusResponse indices = clientManager.obtainClient().admin().indices().prepareStatus().get();

        List<Index> result = new ArrayList<>();
        for (String key : indices.getIndices().keySet()) {
            Index index = new Index();
            index.setName(key);
            result.add(index);
        }
        return result;
    }

Now we can browse to http://localhost:8080/indexes and we get back a nice JSON array. In my case I got this:

[
	{"name":"logstash-tomcat-2014.05.02"},
	{"name":"mymusicnested"},
	{"name":"kibana-int"},
	{"name":"playwithip"},
	{"name":"logstash-tomcat-2014.05.08"},
	{"name":"mymusic"}
]

Creating a better view

Having this REST based interface with json documents is nice, but not if you are human like me (well kind of). So let us add some AngularJS magic to create a slightly better view. The following page can of course also be created with easier view technologies, but I want to demonstrate what you can do with dropwizard.

First we make it possible to use FreeMarker as a template engine. To make this work we need two additional dependencies: dropwizard-views and dropwizard-views-freemarker. The first step is a view class that knows which FreeMarker template to load and provides the fields that your template can read. In our case we want to expose the cluster name.

public class HomeView extends View {
    private final String clusterName;

    protected HomeView(String clusterName) {
        super("home.ftl");
        this.clusterName = clusterName;
    }

    public String getClusterName() {
        return clusterName;
    }
}

Then we have to create the FreeMarker template. This looks like the following code block:

<#-- @ftlvariable name="" type="nl.gridshore.dwes.HomeView" -->
<html ng-app="myApp">
<head>
    <title>DWAS</title>
</head>
<body ng-controller="IndexCtrl">
<p>Underneath a list of indexes in the cluster <strong>${clusterName?html}</strong></p>

<div ng-init="initIndexes()">
    <ul>
        <li ng-repeat="index in indexes">{{index.name}}</li>
    </ul>
</div>

<script src="/assets/js/angular-1.2.16.min.js"></script>
<script src="/assets/js/app.js"></script>
</body>
</html>

By default you put these templates in the resources folder, using the same sub folders as the package of your view class. If you look closely you see some AngularJS code; more on this later on. First we need to map a URL to the view. This is done with a resource class. The following code block shows the HomeResource class that maps “/” to the HomeView.

@Path("/")
@Produces(MediaType.TEXT_HTML)
public class HomeResource {
    private String clusterName;

    public HomeResource(String clusterName) {
        this.clusterName = clusterName;
    }

    @GET
    public HomeView goHome() {
        return new HomeView(clusterName);
    }
}

Notice we now configure it to return text/html. The goHome method is annotated with @GET, so each GET request to the path “/” is mapped to the HomeView class. Now we need to tell Jersey about this mapping. That is done in the runner class.

final HomeResource homeResource = new HomeResource(config.getClusterName());
environment.jersey().register(homeResource);

Using assets

The final part I want to show is how to use the assets bundle from Dropwizard to map a folder “/assets” to a part of the URL. To use this bundle you have to add the following dependency in Maven: dropwizard-assets. Then we can easily map the assets folder in our resources folder to the web assets folder.

    @Override
    public void initialize(Bootstrap<DWESConfiguration> dwesConfigurationBootstrap) {
        dwesConfigurationBootstrap.addBundle(new ViewBundle());
        dwesConfigurationBootstrap.addBundle(new AssetsBundle("/assets/", "/assets/"));
    }

That is it, now you can load the angular javascript file. My very basic sample has one angular controller. This controller uses the $http service to call our /indexes url. The result is used to show the indexes in a list view.

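// Define the module referenced by ng-app="myApp" before attaching the controller.
var myApp = angular.module('myApp', []);

myApp.controller('IndexCtrl', function ($scope, $http) {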
    $scope.indexes = [];

    $scope.initIndexes = function () {
        $http.get('/indexes').success(function (data) {
            $scope.indexes = data;
        });
    };
});

And the result

the very basic screen showing the indexes

Concluding

This was my first go at using Dropwizard, and I must admit I like what I have seen so far. I am not sure if I would create a big application with it; on the other hand, it is really structured. Before moving on I would need to read a bit more about the library and check all of its options. There is a lot more possible than what I have shown you here.

The post Using Dropwizard in combination with Elasticsearch appeared first on Gridshore.

Categories: Architecture, Programming

Yet More Change for the Capitals

DevHawk - Harry Pierson - Sat, 04/26/2014 - 21:13

Six years ago, I was pretty excited about the future for the Washington Capitals. They had just lost their first round match up with the Flyers – which was a bummer – but they had made the playoffs for the first time in 3 seasons. I wrote at the time:

Furthermore, even though they lost, these playoffs are a promise of future success. I tell my kids all the time that the only way to get good at something is to work hard while you’re bad at it. Playoff hockey is no different. Most of the Caps had little or no playoff experience going into this series and it really showed thru the first three games. But they kept at it and played much better over the last four games of the series. They went 2-2 in those games, but the two losses went to overtime. A little more luck (or better officiating) and the Caps are headed to Pittsburgh instead of the golf course.

What a difference six seasons makes. Sure, they won the Presidents’ Trophy in 2010. But the promise of future playoff success has been broken, badly. The Caps have been on a pretty steep decline after getting beat by the eighth seed Canadiens in the first round of the playoffs in 2010. Since then, they’ve switched systems three times and head coaches twice. This year, they missed the playoffs entirely even with Alex Ovechkin racking up a league-leading 51 goals.

Today, the word came down that both the coach and general manager have been let go. As a Caps fan, I’m really torn about this. I mean, I totally agree that the coach and GM had to go – frankly, I was surprised it didn’t happen 7-10 days earlier. But now what do you do? The draft is two months and one day away, free agency starts two days after that. The search for a GM is going to have to be fast. Then the GM will have to make some really important decisions about players at the draft, free agency and compliance buyouts with limited knowledge of the players in our system. Plus, he’ll need to hire a new head coach – preferably before the draft as well.

The one positive note is that the salary cap for the Capitals looks pretty good for next year. The Capitals currently have the second largest amount of cap space / open roster slot in the league. (The Islanders are first with $14.5 million / open roster slot. The Caps have just over $7 million / open roster slot.) They have only a handful of unrestricted free agents to re-sign – with arguably only one “must sign” (Mikhail Grabovski) in the bunch. Of course, this could also be a bug rather than a feature – having that many players under contract may make it harder for the new GM to shape the team in his image.

Whoever the Capitals hire to be GM and coach, I’m not expecting a promising start. It feels like the next season is already a wash, and we’re not even finished with the first round of this year’s playoffs yet.

I guess it could be worse.

I could be a Toronto Leafs fan.

Categories: Architecture, Programming

Brokered WinRT Components Step Three

DevHawk - Harry Pierson - Fri, 04/25/2014 - 16:45

So far, we’ve created two projects, written all of about two lines of code and we have both our brokered component and its proxy/stub ready to go. Now it’s time to build the Windows Runtime app that uses the component. So far, things have been pretty easy – the only really tricky and/or manual step so far has been registering the proxy/stub, and that’s only tricky if you don’t want to run VS as admin. Unfortunately, tying this all together in the app requires a few more manual steps.

But before we get to the manual steps, let’s create the WinRT client app. Again, we’re going to create a new project, but this time we’re going to select “Blank App (Windows)” from the Visual C# -> Store Apps -> Windows App node of the Add New Project dialog. Note, I’m not using “Blank App (Universal)” or “Blank App (Windows Phone)” because the brokered WinRT component feature is not supported on Windows Phone. Call the client app project whatever you like; I’m calling mine “HelloWorldBRT.Client”.

Before we start writing code, we need to reference the brokered component. We can’t reference the brokered component directly or it will load in the sandboxed app process. Instead, the app needs to reference a reference assembly version of the .winmd that gets generated automatically by the proxy/stub project. Remember in the last step when I said Kieran Mockford is an MSBuild wizard? The proxy/stub template project includes a custom target that automatically publishes the reference assembly winmd file used by the client app. When he showed me that, I was stunned – as I said, the man is a wizard. This means all you need to do is right click on the References node of the WinRT Client app project and select Add Reference. In the Reference Manager dialog, add a reference to the proxy/stub project you created in step two.

Now I can add the following code to the top of my App.OnLaunched function. Since this is a simple Hello World walkthru, I’m not going to bother to build any UI. I’m just going to inspect variables in the debugger. Believe me, the less UI I write, the better for everyone involved. Note, I’ve also added the P/Invoke signatures for GetCurrentProcessId and GetCurrentThreadId to the client app, like I did in the brokered component in step one. This way, I can get the process and thread IDs for both the app and broker process and compare them.

var pid = GetCurrentProcessId();
var tid = GetCurrentThreadId();

var c = new HelloWorldBRT.Class();
var bpid = c.CurrentProcessId;
var btid = c.CurrentThreadId;

At this point the app will compile, but if I run it the app will throw a TypeLoadException when it tries to create an instance of HelloWorldBRT.Class. The type can’t be loaded because we’re using the reference assembly .winmd published by the proxy/stub project – it has no implementation details, so it can’t load. In order to be able to load the type, we need to declare HelloWorldBRT.Class as a brokered component in the app’s Package.appxmanifest file. For non-brokered components, Visual Studio does this for you automatically. For brokered components we unfortunately have to do it manually. Every activatable class (i.e. class you can construct via “new”) needs to be registered in the appx manifest this way.

To register HelloWorldBRT.Class, right click the Package.appxmanifest file in the client project, select “Open With” from the context menu and then select “XML (Text) editor” from the Open With dialog. Then you need to insert an inProcessServer extension that includes an ActivatableClass element for each class you can activate (i.e. each class that has a public constructor). Each ActivatableClass element contains an ActivatableClassAttribute element that contains a pointer to the folder where the brokered component is installed. Here’s what I added to the Package.appxmanifest of my HelloWorldBRT.Client app.

<Extensions>
  <Extension Category="windows.activatableClass.inProcessServer">
    <InProcessServer>
      <Path>clrhost.dll</Path>
      <ActivatableClass ActivatableClassId="HelloWorldBRT.Class" 
                        ThreadingModel="both">
        <ActivatableClassAttribute 
             Name="DesktopApplicationPath" 
             Type="string" 
             Value="D:\dev\HelloWorldBRT\Debug\HelloWorldBRT.PS"/>
      </ActivatableClass>
    </InProcessServer>
  </Extension>
</Extensions>

The key thing here is the addition of the DesktopApplicationPath ActivatableClassAttribute. This tells the WinRT activation logic that HelloWorldBRT.Class is a brokered component and where the managed .winmd file with the implementation details is located on the device. Note, you can use multiple brokered components in your side loaded app, but they all have the same DesktopApplicationPath.

Speaking of DesktopApplicationPath, the path I’m using here is the final location of the proxy/stub components generated by the compiler. Frankly, this isn’t a good choice to use in a production deployment. But for the purposes of this walk thru, it’ll be fine.

Now when we run the app, we can load a HelloWorldBRT.Class instance and access the properties. We’re definitely seeing different process IDs when comparing the result of calling GetCurrentProcessId directly in App.OnLaunched vs. the result of calling GetCurrentProcessId in the brokered component. Of course, each run of the app will have different ID values, but this proves that we are loading our brokered component into a different process from where our app code is running.

Now you’re ready to go build your own brokered components! Here’s hoping you’ll find more interesting uses for them than comparing the process IDs of the app and broker processes in the debugger! :)

Categories: Architecture, Programming

Brokered WinRT Components Step Two

DevHawk - Harry Pierson - Fri, 04/25/2014 - 16:43

Now that we have built the brokered component, we have to build a proxy/stub for it. Proxies and stubs are how WinRT method calls are marshalled across process boundaries. If you want to know more – or you have insomnia – feel free to read all the gory details up on MSDN.

Proxies and stubs look like they might be scary, but they’re actually trivial (at least in the brokered component scenario) because 100% of the code is generated for you. It couldn’t be much easier.

Right click the solution node and select Add -> New Project. Alternatively, you can select File -> New -> Project in the Visual Studio main menu, but if you do that make sure you change the default solution from “Create new Solution” to “Add to Solution”. Regardless of how you launch the new project wizard, search for “broker” again, but this time select the “Brokered Windows Runtime ProxyStub” template. Give the project a name – I chose “HelloWorldBRT.PS”.

Once you’ve created the proxy/stub project, you need to set a reference to the brokered component you created in step 1. Since proxies and stubs are native, this is a VC++ project. Adding a reference in a VC++ project is not as straightforward as it is in C# projects. Right click the proxy/stub project, select “Properties” and then select Common Properties -> References from the tree on the left. Press the “Add New Reference…” button to bring up the same Add Reference dialog you’ve seen in managed code projects. Select the brokered component project and press OK.

Remember when I said that 100% of the code for the proxy/stub is generated? I wasn’t kidding – creating the project from the template and referencing the brokered component project is literally all you need to do. Want proof? Go ahead and build now. If you watch the output window, you’ll see a bunch of output go by referencing IDL files and MIDLRT among other stuff. This proxy/stub template has some custom MSBuild tasks that generate the proxy/stub code using winmdidl and midlrt. The process is similar to what is described here. BTW, if you get a chance, check out the proxy/stub project file – it is a work of art. Major props to Kieran Mockford for his MSBuild wizardry.

Unfortunately, it’s not enough just to build the proxy/stub – you also have to register it. The brokered component proxy/stub needs to be registered globally on the machine, which means you have to be running as an admin to do it. VS can register the proxy/stub for you automatically, but that means you have to run VS as an administrator. That always makes me nervous, but if you’re OK with running as admin you can enable proxy/stub registration by right clicking the proxy/stub project file, selecting Properties, navigating to Configuration Properties -> Linker -> General in the tree of the project properties page, and then changing Register Output to “Yes”.

If you don’t like running VS as admin, you can manually register the proxy/stub by running “regsvr32 <proxystub dll>” from an elevated command prompt. Note, you do have to re-register every time the public surface area of your brokered component changes, so letting VS handle registration as admin is definitely the easier route to go.

In the third and final step, we’ll build a client app that accesses our brokered component.

Categories: Architecture, Programming

Brokered WinRT Components Step One

DevHawk - Harry Pierson - Fri, 04/25/2014 - 16:41

In this step, we’ll build the brokered component itself. Frankly, the only thing that makes a brokered component different from a normal WinRT component is some small tweaks to the project file to enable access to the full .NET Runtime and Base Class Library. The brokered component whitepaper describes these tweaks in detail, but the new brokered component template takes care of them for you.

Start by selecting File -> New -> Project in Visual Studio. With the sheer number of templates to choose from these days, I find it’s easier to just search for the one I want. Type “broker” in the search box in the upper left and you’ll end up with two choices – the brokered WinRT component and the brokered WinRT proxy/stub. For now, choose the brokered component. We’ll be adding a brokered proxy/stub in step two. Name the project whatever you want. I named mine “HelloWorldBRT”.

This is probably the easiest step of the three as there’s nothing really special you have to do – just write managed code like you always do. In my keynote demo, this is where I wrote the code that wrapped the existing ADO.NET based data access library. For the purposes of this walkthrough, let’s do something simpler. We’ll use P/Invoke to retrieve the current process and thread IDs. These Win32 APIs are supported for developing WinRT apps and will make it obvious that the component is running in a separate process from the app. Here’s the simple code to retrieve those IDs (hat tip to pinvoke.net for the interop signatures):

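using System.Runtime.InteropServices;

public sealed class Class
{
    // Win32 interop signatures (both functions return a DWORD, hence uint).
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentThreadId();

    [DllImport("kernel32.dll")]
    static extern uint GetCurrentProcessId();

    // ID of the thread the property getter is running on.
    public uint CurrentThreadId
    {
        get { return GetCurrentThreadId(); }
    }

    // ID of the process hosting this component.
    public uint CurrentProcessId
    {
        get { return GetCurrentProcessId(); }
    }
}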

That’s it! I didn’t even bother to change the class name for this simple sample.

Now, to be clear, there’s no reason why this code needs to run in a broker process. As I pointed out, the Win32 functions I’m wrapping here are supported for use in Windows Store apps. For this walkthrough, I’m trying to keep the code simple in order to focus on the specifics of building brokered components. If you want to see an example that actually leverages the fact that it’s running outside of the App Container, check out the NorthwindRT sample.
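If you want to sanity check the P/Invoke wrappers before wiring up the broker, here is a rough sketch of how I’d do it. It is not part of the template: it assumes you paste a copy of the class into a plain Windows console app (the Program class and its Main method are just a hypothetical test harness) and compares the ID the wrapper reports against the one System.Diagnostics reports for the caller.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Copy of the component class from above, compiled into an ordinary console
// app (an assumption for this sketch, not how the brokered project is built).
public sealed class Class
{
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentThreadId();

    [DllImport("kernel32.dll")]
    static extern uint GetCurrentProcessId();

    public uint CurrentThreadId
    {
        get { return GetCurrentThreadId(); }
    }

    public uint CurrentProcessId
    {
        get { return GetCurrentProcessId(); }
    }
}

// Hypothetical harness: not part of the walkthrough's projects.
public static class Program
{
    public static void Main()
    {
        var component = new Class();

        // The caller's process ID as .NET reports it.
        uint callerProcessId = (uint)Process.GetCurrentProcess().Id;

        // What the Win32 wrappers report from inside the component class.
        Console.WriteLine("Caller process ID:    {0}", callerProcessId);
        Console.WriteLine("Component process ID: {0}", component.CurrentProcessId);
        Console.WriteLine("Component thread ID:  {0}", component.CurrentThreadId);
        Console.WriteLine("Same process: {0}", callerProcessId == component.CurrentProcessId);
    }
}
```

Loaded in-process like this, the two process IDs match because the class is running inside the caller. Once the same class is hosted by the broker, the component reports the broker’s process ID, not the app’s, which is exactly the difference this sample is meant to make visible.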

In the next step, we’ll add the proxy/stub that enables this component to communicate across a process boundary.

Categories: Architecture, Programming

Brokered WinRT Components Step-by-Step

DevHawk - Harry Pierson - Fri, 04/25/2014 - 16:40

Based on the feedback I’ve gotten since my keynote appearance @ Build – both in person and via email & twitter – there are a lot of folks who are excited about the Brokered WinRT Component feature. However, I’ve been advising folks to hold off a bit until the new VS templates were ready. Frankly, the developer experience for this feature is a bit rough and the VS template makes the experience much better. Well, hold off no longer! My old team has published the Brokered WinRT Component Project Templates up on the Visual Studio Gallery!

Now that the template is available, I’ve written a step-by-step guide demonstrating how to build a “Hello World” style brokered component. Hopefully, this will help folks in the community take advantage of this cool new feature in Windows 8.1 Update.

To keep it readable, I’ve broken it into three separate posts:

Note, this walkthrough assumes you’re running Windows 8.1 Update, Visual Studio 2013 with Update 2 RC (or later) and the Brokered WinRT Component Project Templates installed.

I hope this series helps you take advantage of brokered WinRT components. If you have any further questions, feel free to drop me an email or hit me up on Twitter.

Categories: Architecture, Programming