Software Development Blogs: Programming, Software Testing, Agile, Project Management

Feed aggregator

Refactoring JavaScript from Sync to Async in Safe Baby-Steps

Mistaeks I Hav Made - Nat Pryce - Sat, 08/08/2015 - 17:10
Consider some JavaScript code that gets and uses a value from a synchronous call or built-in data structure:

function to_be_refactored() {
    var x;
    ...
    x = get_x();
    ...use x...
}

Suppose we want to replace this synchronous call with a call to a service that has an asynchronous API (an HTTP fetch, for example). How can we refactor the code from synchronous to asynchronous style in small, safe steps?

First, wrap the remainder of the function after the line that gets the value in a “continuation” function that takes the value as a parameter and closes over any other variables in its environment. Pass the value to the continuation function:

function to_be_refactored() {
    var x, cont;
    ...
    x = get_x();
    cont = function(x) {
        ...use x...
    };
    cont(x);
}

Then pull the definition of the continuation function before the code that gets the value:

function to_be_refactored() {
    var x, cont;
    cont = function(x) {
        ...use x...
    };
    ...
    x = get_x();
    cont(x);
}

Now extract the last two lines, which get the value and call the continuation, into a single function that takes the continuation as a parameter, and pass the continuation to it:

function to_be_refactored() {
    ...
    get_x_and(function(x) {
        ...use x...
    });
}

function get_x_and(cont) {
    cont(get_x());
}

If you have calls to get_x in many places in your code base, move get_x_and into a common module so that it can be called everywhere that get_x is called. Transform the remaining uses of get_x to “continuation passing style”, replacing the calls to get_x with calls to get_x_and.

Finally, replace the implementation of get_x_and with a call to the async service and delete the get_x function (see the sketch at the end of this post).

Wouldn’t it be nice if IDEs could do this refactoring automatically?

The Trouble With Shared Mutable State

Dale Hagglund asked via Twitter: “What if cont assumes that some [shared mutable] property remains constant across the async invocation? I’ve always found these very hard to unmake.” In that case, you’ll have to copy the current value of the shared, mutable property into a local variable that is then closed over by the continuation. E.g.

function to_be_refactored() {
    var x;
    ...
    x = get_x();
    ...use x and shared_mutable_y()...
}

would have to become:

function to_be_refactored() {
    var y;
    ...
    y = shared_mutable_y();
    get_x_and(function(x) {
        ...use x and y...
    });
}
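For completeness, once every caller goes through get_x_and, its body can be swapped for the async service call. A minimal sketch (the endpoint and response shape are made up, not from the original post):

function get_x_and(cont) {
    // Hypothetical async implementation: fetch x from a service,
    // then pass the parsed result to the continuation.
    var req = new XMLHttpRequest();
    req.open("GET", "/api/x");
    req.onload = function() {
        cont(JSON.parse(req.responseText));
    };
    req.send();
}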
Categories: Programming, Testing & QA

More Misconceptions of Waterfall

Herding Cats - Glen Alleman - Sat, 08/08/2015 - 16:47

It is popular in some agile circles to use Waterfall as the stalking horse for every bad management practice in software development. A recent example:

Go/No Go decisions are a residue of waterfall thinking. All software can built incrementally and most released incrementally.

Nothing in Waterfall prohibits incremental release. In fact, the notion of block release is the basis of most Software Intensive Systems development. From the point of view of the business, capabilities are what it buys: the capability to do something of value in exchange for the cost of that value. Here's an example from the health insurance business. Incremental release of features is of little value if those features don't work together to provide some needed capability to conduct business. A naive approach is the release early and release often platitude of some in the agile domain. Let's say we're building a personnel management system. This includes recruiting, on-boarding, provisioning, benefits signup, time keeping, and payroll. It would not be very useful to release the time keeping feature if the payroll feature were not ready.

(Screenshot: a capabilities chart for the health insurance example.)

So before buying into the platitude of release early and often, ask: what does the business need to do business? Then draw a picture like the one above and develop a Plan for producing those capabilities in the order they are needed to deliver the needed value. Without this approach, you'll be spending money without producing value and calling that agile.

That way you can stop managing other people's money with platitudes and replace them with actual business management processes. So every time you hear a platitude masquerading as good management, ask: does the person using that platitude work anywhere with high value at risk? If not, they probably have yet to encounter the actual management of other people's money.

Related articles: Capabilities Based Planning - Part 2; Are Estimates Really The Smell of Dysfunction?
Categories: Project Management

Welcome to The Internet of Compromised Things

Coding Horror - Jeff Atwood - Sat, 08/08/2015 - 11:59

This post is a bit of a public service announcement, so I'll get right to the point:

Every time you use WiFi, ask yourself: could I be connecting to the Internet through a compromised router with malware?

It's becoming more and more common to see malware installed not at the server, desktop, laptop, or smartphone level, but at the router level. Routers have become quite capable, powerful little computers in their own right over the last 5 years, and that means they can, unfortunately, be harnessed to work against you.

I write about this because it recently happened to two people I know.

.@jchris A friend got hit by this on newly paved win8.1 computer. Downloaded Chrome, instantly infected with malware. Very spooky.

— not THE Damien Katz (@damienkatz) May 20, 2015

@codinghorror *no* idea and there’s almost ZERO info out there. Essentially malicious JS adware embedded in every in-app browser

— John O'Nolan (@JohnONolan) August 7, 2015

In both cases, they eventually determined the source of the problem was that the router they were connecting to the Internet through had been compromised.

This is way more evil genius than infecting a mere computer. If you can manage to systematically infect common home and business routers, you can potentially compromise every computer connected to them.

Hilarious meme images I am contractually obligated to add to each blog post aside, this is scary stuff and you should be scared.

Router malware is the ultimate man-in-the-middle attack. For all meaningful traffic sent through a compromised router that isn't HTTPS encrypted, it is 100% game over. The attacker will certainly be sending all that traffic somewhere they can sniff it for anything important: logins, passwords, credit card info, other personal or financial information. And they can direct you to phishing websites at will – if you think you're on the "real" login page for the banking site you use, think again.

Heck, even if you completely trust the person whose router you are using, they could technically be doing this to you. But they probably aren't.


In John's case, the attackers inserted annoying ads in all unencrypted web traffic, which is an obvious tell to a sophisticated user. But how exactly would the average user figure out where this junk is coming from (or worse, assume the regular web is just full of ad junk all the time), when even a technical guy like John – founder of the open source Ghost blogging software used on this very blog – was flummoxed?

But that's OK, we're smart users who would only access public WiFi using HTTPS websites, right? Sadly, even if the traffic is HTTPS encrypted, it can still be subverted! There's an extremely technical blow-by-blow analysis at Cryptostorm, but the TL;DR is this:

Compromised router answers DNS req for *.google.com to 3rd party with faked HTTPS cert, you download malware Chrome. Game over.

HTTPS certificate shenanigans. DNS and BGP manipulation. Very hairy stuff.

How is this possible? Let's start with the weakest link, your router. Or more specifically, the programmers responsible for coding the admin interface to your router.

They must be terribly incompetent coders to let your router get compromised over the Internet, since one of the major selling points of a router is to act as a basic firewall layer between the Internet and you… right?

In their defense, that part of a router generally works as advertised. More commonly, you aren't being attacked from the hardened outside. You're being attacked from the soft, creamy inside.

That's right, the calls are coming from inside your house!

By that I mean you'll visit a malicious website that scripts your own browser to access the web-based admin pages of your router, and reset (or use the default) admin passwords to reconfigure it.

Nasty, isn't it? They attack from the inside using your own browser. But that's not the only way.

  • Maybe you accidentally turned on remote administration, so your router can be modified from the outside.

  • Maybe you left your router's admin passwords at default.

  • Maybe there is a legitimate external exploit for your router and you're running a very old version of firmware.

  • Maybe your ISP provided your router and made a security error in the configuration of the device.

In addition to being kind of terrifying, this does not bode well for the Internet of Things.

Internet of Compromised Things, more like.

OK, so what can we do about this? There's no perfect answer; I think it has to be a defense in depth strategy.

Inside Your Home

Buy a new, quality router. You don't want a router that's years old and hasn't been updated. But on the other hand you also don't want something too new that hasn't been vetted for firmware and/or security issues in the real world.

Also, any router your ISP provides is going to be about as crappy and "recent" as the awful stereo system you get in a new car. So I say stick with well known consumer brands. There are some hardcore folks who think all consumer routers are trash, so YMMV.

I can recommend the Asus RT-AC87U – it did very well in the SmallNetBuilder tests, Asus is a respectable brand, it's been out a year, and for most people, this is probably an upgrade over what you currently have without being totally bleeding edge overkill. I know it is an upgrade for me.

(I am also eagerly awaiting Eero as a domestic best of breed device with amazing custom firmware, and have one pre-ordered, but it hasn't shipped yet.)

Download and install the latest firmware. Ideally, do this before connecting the device to the Internet. But if you connect and then immediately use the firmware auto-update feature, who am I to judge you.

Change the default admin passwords. Don't leave it at the documented defaults, because then it could be potentially scripted and accessed.

Turn off WPS. Turns out the Wi-Fi Protected Setup feature intended to make it "easy" to connect to a router by pressing a button or entering a PIN made it … a bit too easy. This is always on by default, so be sure to disable it.

Turn off UPnP. Since we're talking about attacks that come from "inside your house", UPnP offers zero protection, as it has no method of authentication. If you need it for specific apps, you'll find out, and you can forward those ports manually as needed.

Make sure remote administration is turned off. I've never owned a router that had this on by default, but check just to be double plus sure.

For WiFi, turn on WPA2+AES and use a long, strong password. Again, I feel most modern routers get the defaults right these days, but just check. The password is your responsibility, and password strength matters tremendously for wireless security, so be sure to make it a long one – at least 20 characters with all the variability you can muster.

Pick a unique SSID. Default SSIDs just scream hack me, for I have all defaults and a clueless owner. And no, don't bother "hiding" your SSID, it's a waste of time.

Optional: use less congested channels for WiFi. The default is "auto", but you can sometimes get better performance by picking less used frequencies at the ends of the spectrum. As summarized by official ASUS support reps:

  • Set 2.4 GHz channel bandwidth to 40 MHz, and change the control channel to 1, 6 or 11.

  • Set 5 GHz channel bandwidth to 80 MHz, and change the control channel to 165 or 161.

Experts only: install an open source firmware. I discussed this a fair bit in Everyone Needs a Router, but you have to be very careful which router model you buy, and you'll probably need to stick with older models. There are several which are specifically sold to be friendly to open source firmware.

Outside Your Home

Well, this one is simple. Assume everything you do outside your home, on a remote network or over WiFi, is being monitored by IBGs: Internet Bad Guys.

I know, kind of an oppressive way to voyage out into the world, but it's better to start out with a defensive mindset, because you could be connecting to anyone's compromised router or network out there.

But, good news. There are only two key things you need to remember once you're outside, facing down that fiery ball of hell in the sky and armies of IBGs.

  1. Never access anything but HTTPS websites.

    If it isn't available over HTTPS, don't go there!

    You might be OK with HTTP if you are not logging in to the website, just browsing it, but even then IBGs could inject malware in the page and potentially compromise your device. And never, ever enter anything over HTTP you aren't 100% comfortable with bad guys seeing and using against you somehow.

    We've made tremendous progress in HTTPS Everywhere over the last 5 years, and these days most major websites offer (or even better, force) HTTPS access. So if you just want to quickly check your GMail or Facebook or Twitter, you will be fine, because those services all force HTTPS.

  2. If you must access non-HTTPS websites, or you are not sure, always use a VPN.

    A VPN encrypts all your traffic, so you no longer have to worry about using HTTPS. You do have to worry about whether or not you trust your VPN provider, but that's a much longer discussion than I want to get into right now.

    It's a good idea to pick a go-to VPN provider so you have one ready and get used to how it works over time. Initially it will feel like a bunch of extra work, and it kinda is, but if you care about your security an encrypt-everything VPN is bedrock. And if you don't care about your security, well, why are you even reading this?

If it feels like these are both variants of the same rule, always strongly encrypt everything, you aren't wrong. That's the way things are headed. The math is as sound as it ever was – but unfortunately the people and devices, less so.

Be Safe Out There

Until I heard Damien's story and John's story, I had no idea router hardware could be such a huge point of compromise. I didn't realize that you could be innocently visiting a friend's house, and because he happens to be the parent of three teenage boys and the owner of an old, unsecured router that you connect to via WiFi … your life will suddenly get a lot more complicated.

As the amount of stuff we connect to the Internet grows, we have to understand that the Internet of Things is a bunch of tiny, powerful computers, too – and they need the same strong attention to security that our smartphones, laptops, and servers already enjoy.

Categories: Programming

Stuff The Internet Says On Scalability For August 7th, 2015

Hey, it's HighScalability time:

A feather? Brass relief? River valley? Nope. It's frost on Mars!
  • $10 billion: Microsoft data center spend per year; 1: hours from London to New York at mach 4.5; 1+: million Facebook requests per second; 25TB: raw data collected per day at Criteo; 1440: minutes in a day; 2.76: farthest distance a human eye can detect a candle flame in kilometers.

  • Quotable Quotes:
    • @drunkcod: IT is a cost center you say? Ok, let's shut all the servers down until you figure out what part of revenue we contribute to.
    • Beacon 23: I’m here because they ain’t made a computer yet that won’t do something stupid one time out of a hundred trillion. Seems like good odds, but when computers are doing trillions of things a day, that means a whole lot of stupid. 
    • @johnrobb: China factory: Went from 650 employees to 60 w/ robots. 3x production increase.  1/5th defect rate.
    • @twotribes: "Metrics are the internet’s heroin and we’re a bunch of junkies mainlining that black tar straight into the jugular of our organizations."
    • @javame: @adrianco I've seen a 2Tb erlang monolith and I don't want to see that again cc/@martinfowler
    • @micahjay1: Thinking about @a16z podcast about bio v IT ventures. Having done both, big diff is cost to get started and burn rate. No AWS in bio...yet
    • @0xced: XML: 1996 XLink: 1997 XML-RPC: 1998 XML Schema: 1998 JSON: 2001 JSON-LD: 2010 JSON-RPC: 2005 JSON Schema: 2009 
    • Inside the failure of Google+: What people failed to understand was Facebook had network effects. It’s like you have this grungy night club and people are having a good time and you build something next door that’s shiny and new, and technically better in some ways, but who wants to leave? People didn't need another version of Facebook.
    • @bdu_p: Old age and treachery will beat youth and skill every time. A failed attempt to replace unix grep 

  • The New World looks a lot like the old Moscow. From The Master of Disguise: My Secret Life in the CIA: we assume constant surveillance. This saturation level of surveillance, which far surpassed anything Western intelligence services attempted in their own democratic societies, had greatly constrained CIA operations in Moscow for decades.

  • How Netflix made their website startup time 70% faster. They removed a lot of server side complexity by moving to mostly client side rendering. Java, Tomcat, Struts, and Tiles were replaced with Node.js and React.js.  They call this Universal JavaScript, JavaScript on the server side and the client side. "Using Universal JavaScript means the rendering logic is simply passed down to the client." Only a bootstrap view is rendered on the server with everything else rendered incrementally on the client.

  • How Facebook fights spam with Haskell. Haskell is used as an expressive, latency sensitive rules engine. Sitting at the front of the ingestion point pipeline, it synchronously handles every single write request to Facebook and Instagram. That's more than one million requests per second. So not so slow. Haskell works well because it's a purely functional strongly typed language, supports hot swapping, supports implicit concurrency, performs well, and supports interactive development. Haskell is not used for the entire stack however. It's sandwiched. On the top there's C++ to process messages and on the bottom there's C++ client code that interacts with other services. Key design decision: rules can't make writes, which means an abstract syntax tree of fetches can be overlapped and batched. 

  • You know how kids these days don't know the basics, like how eggs come from horses or that milk comes from chickens? The disassociation disorder continues. Now Millions of Facebook users have no idea they’re using the internet: A while back, a highly-educated friend and I were driving through an area that had a lot of data centers. She asked me what all of those gigantic blocks of buildings contained. I told her that they were mostly filled with many servers that were used to host all sorts of internet services. It completely blew her mind. She had no idea that the services that she and billions of others used on their phones actually required millions and millions of computers to transmit and process the data.

  • History rererepeats itself. Serialization is still evil. Improving Facebook's performance on Android with FlatBuffers:  It took 35 ms to parse a JSON stream of 20 KB...A JSON parser needs to build field mappings before it can start parsing, which can take 100 ms to 200 ms...FlatBuffers is a data format that removes the need for data transformation between storage and the UI...Story load time from disk cache is reduced from 35 ms to 4 ms per story...Transient memory allocations are reduced by 75 percent...Cold start time is improved by 10-15 percent.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Finding What To Learn Next

Making the Complex Simple - John Sonmez - Fri, 08/07/2015 - 13:00

Being able to learn things quickly is an amazing skill to have—even more so for developers because of the speed of technology. Most people change careers 15 times throughout their life. Not jobs, careers! So it's safe to assume that the average developer will have multiple jobs throughout their career. Each job change has the potential […]

The post Finding What To Learn Next appeared first on Simple Programmer.

Categories: Programming

Always #notimplementednovalue . . . Maybe or Maybe Not

Does writing throwaway code to generate information and knowledge have value?


When we talk about implementing software into production at the end of each sprint (or, for the more avant-garde, continuously as it is completed) as a reflection of the value that a team delivers to its customer, there is always push back. Agile practitioners, in particular, are concerned that some of the code is never implemented. If unimplemented code is not perceived to have value, then why are development teams generating code they are not going to put into production? The problem is often exacerbated by developers' perceived need to get credit for the most tangible output of their work (the code) rather than the knowledge the code generates. In order to understand why some of the code that is created is not put into production, it is important to understand the typical sources of code that does not get implemented: research and development (R&D), prototypes, and development aids.

Research and Development (R&D): R&D is defined as “investigative activities with the intention of making a discovery that can either lead to the development of new products or procedures, or to the improvement of existing products or procedures.” R&D is not required to create a new report from SAP or a new database table to support a CRM system. In an IT department's R&D group, researchers run experiments to explore ideas and test hypotheses, not to create shippable products or an installed production base of code. The value in the process is the knowledge that is generated (also known as intellectual property). Credit (e.g. adulation and promotions) accrues for generating the IP rather than for the code.

Prototypes: Prototypes are often used to sort out whether an idea is worth pursuing (this could be considered a special, micro form of R&D) and/or as a technique to generate and validate requirements. Prototypes are preliminary models that, once constructed, are put aside. As with R&D, the goal is less to generate code than to generate information that can be used in subsequent steps of solving a specific problem. As with R&D, credit (e.g. adulation and promotions) accrues for generating the IP rather than for the code.

Development Aids: Developers and testers often create tools to aid in the construction and testing of functionality. Rarely are these tools developed to be put into production. The value of this code is reflected in the efficiency and quality of the functionality they are created to support.

Whether in an R&D environment or at a team level building prototypes or development aids, does writing throwaway code to generate information and knowledge have value? While this question sounds pedantic, it is a question that gets discussed when a coach begins to push the idea of #NotInProductionNoValue. The answer is to focus the discussion on the information and knowledge generated. In the end, it is the information and knowledge that has value and that will move forward with the project or organization, even when the code is sloughed off like skin after a bad sunburn. Most simply, when testing an assumption keeps you from making a mistake or provides information to make a good decision, doing whatever is needed makes sense. However, it is not the code that has value per se, but rather the information generated.

Side Note: Many IT departments have rebranded themselves as R&D departments. The R&D metaphor is used to suggest that the IT department is identifying products and leading the business. In some startup and cutting-edge technology firms this may well be true; however, the use of the term is generally a misnomer or wishful thinking. Most IT departments are focused on product delivery, i.e. building solutions based on relatively tried-and-true frameworks and methods. If you doubt the veracity of that statement, just observe the amount of packaged software (e.g. SAP, PeopleSoft) your own organization supports.

Categories: Process Management

Spark: Convert RDD to DataFrame

Mark Needham - Thu, 08/06/2015 - 22:11

As I mentioned in a previous blog post I’ve been playing around with the Databricks Spark CSV library and wanted to take a CSV file, clean it up and then write out a new CSV file containing some of the columns.

I started by processing the CSV file and writing it into a temporary table:

import org.apache.spark.sql.{SQLContext, Row, DataFrame}
val sqlContext = new SQLContext(sc)
val crimeFile = "Crimes_-_2001_to_present.csv"
sqlContext.load("com.databricks.spark.csv", Map("path" -> crimeFile, "header" -> "true")).registerTempTable("crimes")

I wanted to get to the point where I could call the following function which writes a DataFrame to disk:

import java.io.File
import org.apache.hadoop.fs.FileUtil

private def createFile(df: DataFrame, file: String, header: String): Unit = {
  FileUtil.fullyDelete(new File(file))
  val tmpFile = "tmp/" + System.currentTimeMillis() + "-" + file
  df.distinct.save(tmpFile, "com.databricks.spark.csv")
  // (the remainder of the function is elided in this excerpt)
}

The first file only needs to contain the primary type of crime, which we can extract with the following query:

val rows = sqlContext.sql("select `Primary Type` as primaryType FROM crimes LIMIT 10")
rows.collect()
res4: Array[org.apache.spark.sql.Row] = Array([ASSAULT], [ROBBERY], [CRIMINAL DAMAGE], [THEFT], [THEFT], [BURGLARY], [THEFT], [BURGLARY], [THEFT], [CRIMINAL DAMAGE])

Some of the primary types have trailing spaces which I want to get rid of. As far as I can tell Spark’s variant of SQL doesn’t have the LTRIM or RTRIM functions but we can map over ‘rows’ and use the String ‘trim’ function instead:

rows.map { case Row(primaryType: String) => Row(primaryType.trim) }
res8: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[29] at map at DataFrame.scala:776

Now we’ve got an RDD of Rows which we need to convert back to a DataFrame again. ‘sqlContext’ has a function which we might be able to use:

sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => Row(primaryType.trim) })
<console>:27: error: overloaded method value createDataFrame with alternatives:
  [A <: Product](data: Seq[A])(implicit evidence$4: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$3: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
 cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row])
              sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => Row(primaryType.trim) })

These are the signatures we can choose from:

(Screenshot: the overloaded createDataFrame signatures.)

If we want to pass in an RDD of type Row we’re going to have to define a StructType or we can convert each row into something more strongly typed:

case class CrimeType(primaryType: String)
sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) })
res14: org.apache.spark.sql.DataFrame = [primaryType: string]
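
For comparison, the StructType route would look roughly like this (my sketch, not from the original post):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StringType, StructType}

// Describe the single string column explicitly so that
// createDataFrame will accept an RDD[Row].
val schema = StructType(Seq(StructField("primaryType", StringType, true)))
val trimmed = rows.map { case Row(primaryType: String) => Row(primaryType.trim) }
val df = sqlContext.createDataFrame(trimmed, schema)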

Great, we've got our DataFrame (via the case class approach), which we can now plug into the ‘createFile’ function like so:

createFile(
  sqlContext.createDataFrame(rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) }),
  ...)

We can actually do better though!

Since we’ve got an RDD of a specific class we can make use of the ‘rddToDataFrameHolder’ implicit function and then the ‘toDF’ function on ‘DataFrameHolder’. This is what the code looks like:

import sqlContext.implicits._

createFile(
  rows.map { case Row(primaryType: String) => CrimeType(primaryType.trim) }.toDF(),
  ...)

And we’re done!

Categories: Programming

Understanding Software Project Size - New Lecture Posted

10x Software Development - Steve McConnell - Thu, 08/06/2015 - 21:36

I've uploaded a new lecture in my Understanding Software Projects lecture series. This lecture focuses on the critical topic of Software Size. If you've ever wondered why some early projects succeed while later similar projects fail, this lecture explains the basic dynamics that cause that. If you've wondered why Scrum projects struggle to scale, I share some insights on that topic. 

I believe this is one of my best lectures in the series so far -- and it's a very important topic. It will be free for the next week, so check it out: https://cxlearn.com.

Lectures posted so far include:  

0.0 Understanding Software Projects - Intro
     0.1 Introduction - My Background
     0.2 Reading the News
     0.3 Definitions and Notations 

1.0 The Software Lifecycle Model - Intro
     1.1 Variations in Iteration 
     1.2 Lifecycle Model - Defect Removal
     1.3 Lifecycle Model Applied to Common Methodologies 
     1.4 Lifecycle Model - Selecting an Iteration Approach  

2.0 Software Size - Introduction (New)
     1.01 Size - Examples of Size     
     2.05 Size - Comments on Lines of Code
     2.1 Size - Staff Sizes 
     2.2 Size - Schedule Basics 
     2.3 Size - Debian Size Claims 

3.0 Human Variation - Introduction

Check out the lectures at http://cxlearn.com!

Are 64% of Features Really Rarely or Never Used?

Mike Cohn's Blog - Wed, 08/05/2015 - 15:00

An oft-cited metric is that 64 percent of features in products are “rarely or never used.” The source for this claim was Jim Johnson, chairman of the Standish Group, who presented it in a keynote at the XP 2002 conference in Sardinia, along with a chart of the supporting data (not reproduced here).

Johnson’s data has been repeated again and again to the extent that those citing it either don’t understand its origins or never bothered to check into them.

The misuse or perhaps just overuse of this data has been bothering me for a while, so I decided to investigate it. I was pretty sure of the facts but didn’t want to rely solely on my memory, so I got in touch with the Standish Group, and they were very helpful in clarifying the data.

The results Jim Johnson presented at XP 2002 and that have been repeated so often were based on a study of four internal applications. Yes, four applications. And, yes, all internal-use applications. No commercial products.

So, if you’re citing this data and using it to imply that every product out there contains 64 percent “rarely or never used features,” please stop. Please be clear that the study was of four internally developed projects at four companies.

MongoDB and WTFs and Anger

Eric.Weblog() - Eric Sink - Mon, 07/20/2015 - 19:00

Recently, Sven Slootweg (joepie91) published a blog entry entitled Why you should never, ever, ever use MongoDB. It starts out with the words "MongoDB is evil" and proceeds to give a list of negative statements about same.

I am not here to respond to each of his statements. He labels them as "facts", and some (or perhaps all) of them surely are. In fact, for now, let's assume that everything he wrote is correct. My point here is not to say that the author is wrong.

Rather, my point here is that this kind of blog entry tells me very little about MongoDB while it tells me a great deal about the emotions of the person who wrote it.

Like I said, it may be true that every WTF the author listed is correct. It is also true that some software has more WTFs than others.

I'm not a MongoDB expert, but I've been digging into it quite a bit, and I could certainly make my own list of its WTFs. And I would also admit that my own exploration of Couchbase has yielded fewer of those moments. Therefore, every single person on the planet who chooses MongoDB instead of Couchbase is making a terrible mistake, right?

Let me briefly shift to a similar situation where I personally have a lot more knowledge: Microsoft SQL Server vs PostgreSQL. For me, it is hard to study SQL Server without several WTF moments. And while PostgreSQL is not perfect, I have found that a careful study there tends to produce more admiration than WTFs.

So, after I discovered that (for example) SQL Server has no support for deferred foreign keys, why didn't I write a blog entry entitled "Why you should never, ever, ever use SQL Server"?

Because I calmed down and looked at the bigger picture.

I think I could make an entirely correct list of negative things about SQL Server that is several pages long. And I suppose if I wanted to do that, and if I were really angry while I was writing it, I would include only the facts that support my feelings, omitting anything positive. For example, my rant blog entry would have no reason to acknowledge that SQL Server is the most widely used relational database server in the world. These kinds of facts merely distract people from my point.

But what would happen if I stopped writing my rant and spent some time thinking about the fact I just omitted?

I just convinced myself that this piece of software is truly horrible, and yet, millions of people are using it every day. How do I explain this?

If I tried to make a complete list of theories that might fit the facts, today's blog entry would get too long. Suffice it to say this: Some of those theories might support an anti-Microsoft rant (for example, maybe Microsoft's field sales team is really good at swindling people), but I'm NOT going to be able to prove that every single person who chose SQL Server has made a horrible mistake. There is no way I can credibly claim that PostgreSQL is the better choice for every single company simply because I admire it. Even though I think (for example) that SQL Server handles NULL and UNIQUE in a broken way, there is some very large group of people for whom SQL Server is a valid and smart choice.

So why would I write a blog entry that essentially claims that all SQL Server users are stupid when that simply cannot be true? I wouldn't. Unless I was really angry.

MongoDB is indisputably the top NoSQL vendor. It is used by thousands of companies who serve millions of users every day. Like all young software serving a large user base, it has bugs and flaws, some of which are WTF-worthy. But it is steadily getting better. Any discussion of its technical deficiencies which does not address these things is mostly just somebody venting emotion.


My initial experience with Rust

Eric.Weblog() - Eric Sink - Mon, 06/08/2015 - 19:00
First, a digression about superhero movies

I am apparently incapable of hating any movie about a comic book superhero.

I can usually distinguish the extremes. Yes, I can tell that "The Dark Knight" was much better than "Elektra". My problem is that I tend to think that the worst movies in this genre are still pretty good.

And I have the same sort of unreasonable affection toward programming languages. I have always been fascinated by languages, compilers, and interpreters. My opinions about such things skew toward the positive simply because I find them so interesting.

I do still have preferences. For example, I tend to like strongly typed languages more. In fact, I think it is roughly true that the stricter a compiler is, the more I like it. But I can easily find things to like in languages that I mostly dislike.

I've spent more of my career writing C than any other language. But in use cases where I need something like C, I am increasingly eager for something more modern.

I started learning Rust with two questions:

  • How successful might Rust become as a viable replacement for C?

  • If I enjoy functional programming, how much of that enjoyment can I retain while coding in Rust?

The context

My exploration of Rust has taken place in one of my side projects: https://github.com/ericsink/LSM

LSM is a key-value database with a log-structured merge tree design. It is conceptually similar to Google LevelDB. I first wrote it in C#. Then I rewrote/ported it to F#. Now I have ported it to Rust. (The Rust port is not yet mentioned in the README for that repo, but it's in the top-level directory called 'rs'.)

For the purpose of learning F# and Rust, my initial experience was the same. The first thing I did in each of these languages was to port LSM. In other words, the F# and Rust ports of LSM are on equal footing. Both of them were written by someone who was a newbie in the language.

Anyway, although Rust and F# are very different languages, I have used F# as a reference point for my learning of Rust, so this blog entry walks that path as well.

This is not to say that I think Rust and F# would typically be used for the same kinds of things. I can give you directions from Denver to Chicago without asserting they are similar. Nonetheless, given that Rust is mostly intended to be a modern replacement for C, it has a surprising number of things in common with F#.

The big comparison table

  • Machine model: F# = Managed, .NET CLR; Rust = Native, LLVM
  • Runtime: F# = CLR; Rust = None
  • Style: F# = Multi-paradigm, functional-first; Rust = Multi-paradigm, imperative-first
  • Syntax family: F# = ML-ish; Rust = C-ish
  • Blocks: F# = Significant whitespace; Rust = Curly braces
  • Exception handling: F# = Yes; Rust = No
  • Strings: F# = .NET (UTF-16); Rust = UTF-8
  • Free allocated memory: F# = Automatic, garbage collector; Rust = Automatic, static analysis
  • Type inference: F# = Yes, but not from method calls; Rust = Yes, but only within functions
  • Functional immutable collections: F# = Yes; Rust = No
  • Currying: F# = Yes; Rust = No
  • Partial application: F# = Yes; Rust = No
  • Compiler strictness: F# = Extremely strict; Rust = Even stricter
  • Tuples: F# = Yes; Rust = Yes
  • Discriminated unions: F# = type Blob = | Stream of Stream | Array of byte[] | Tombstone; Rust = enum Blob { … }
  • Mutability: F# = To be avoided; Rust = Safe to use
  • Lambda expressions: F# = let f = (fun acc item -> acc + item); Rust = let f = |acc, &item| acc + item;
  • Higher-order functions: F# = List.fold f 0 a; Rust = a.iter().fold(0, f)
  • Integer overflow checking: F# = No (unless you open Checked); Rust = Yes
  • Let bindings: F# = let x = 1 / let mutable y = 2; Rust = let x = 1; let mut y = 2;
  • if statements are expressions: F# = Yes; Rust = Yes
  • Unit type: F# = (); Rust = ()
  • Pattern matching: F# = match cur with | Some csr -> csr.IsValid() | None -> false; Rust = match cur { Some(csr) => csr.IsValid(), None => false }
  • Primary collection type: F# = Linked list; Rust = Vector
  • Naming types: F# = CamelCase; Rust = CamelCase
  • Naming functions, etc.: F# = camelCase; Rust = snake_case
  • Warnings about naming conventions: F# = No; Rust = Yes
  • Type for integer literals: F# = Suffix (0uy); Rust = Inference (0) or suffix (0u8)
  • Project file: F# = foo.fsproj (msbuild); Rust = Cargo.toml
  • Testing framework: F# = xUnit, NUnit, etc.; Rust = Built into Cargo
  • Debug prints: F# = printf "%A" foo; Rust = println!("{:?}", foo);

Memory safety

I have written a lot of C code over the years. More than once while in the middle of a project, I have stopped to explore ways of getting the compiler to catch my memory leaks. I tried the Clang static analyzer and Frama-C and Splint and others. It just seemed like there should be a way, even if I had to annotate function signatures with information about who owns a pointer.

So perhaps you can imagine my joy when I first read about Rust.

Even more cool, Rust has taken this set of ideas so much further than the simple feature I tried to envision. Rust doesn't just detect leaks, it also:

  • frees everything for you, like a garbage collector, but it's not.

  • prevents access to something that has been freed.

  • prevents modifying an iterator while it is being used.

  • prevents all memory corruption bugs.

  • automatically disposes other kinds of resources, not just allocated memory.

  • prevents two threads from having simultaneous access to something.

That last bullet is worth repeating: With Rust, you never stare at your code trying to figure out if it's thread safe or not. If it compiles, then it's thread safe.
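
To make that concrete, here is a small sketch (my example, not from LSM). To mutate one value from several threads, the compiler forces you into types like Arc and Mutex that make the sharing provably safe; leave either one out and the code simply will not compile:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc gives shared ownership across threads; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let c = counter.clone();
        handles.push(thread::spawn(move || {
            *c.lock().unwrap() += 1;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("{}", *counter.lock().unwrap()); // prints 4
}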

Safety is Rust's killer feature, and it is very compelling.


Functional programming

If you come to Rust hoping to find a great functional language, you will be disappointed. Rust does have a bunch of functional elements, but it is not really a functional language. It's not even a functional-first hybrid. Nonetheless, Rust has enough cool functional stuff available that it has been described as "ML in C++ clothing".

I did my Rust port of LSM as a line-by-line translation from the F# version. This was not a particularly good approach.

  • Functional programming is all about avoiding mutable things, typically by using recursion, monads, computation expressions, and immutable collections.

  • In Rust, mutability should not be avoided, because it's safe. If you are trying to use mutability in a way that would not be safe, your code will not compile.

So if you're porting code from a more functional language, you can end up with code that isn't very Rusty.

If you are a functional programming fan, you might be skeptical of Rust and its claims. Try to think of it like this: Rust agrees that mutability is a problem -- it is simply offering a different solution to that problem.

Learning curve

I don't know if Rust is the most difficult-to-learn programming language I have seen, but it is running in that race.

Anybody remember back when Joel Spolsky used to talk about how difficult it is for some programmers to understand pointers? Rust is a whole new level above that. Compared to Rust, regular pointers are simplistic.

With Rust, we don't just have pointers. We also have ownership, borrows, and lifetimes.

As you learn Rust, you will reach a point where you think you are starting to understand things. And then you try to return a reference from a function, or store a reference in a struct. Suddenly you have lifetime<'a> annotations<'a> all<'a> over<'a> the<'a> place<'a>.

And why did you put them there? Because you understood something? Heck no. You started sprinkling explicit lifetimes throughout your code because the compiler error messages told you to.
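
For what it's worth, the usual trigger is a function that returns a reference; the annotation ties the lifetime of the result to the lifetime of the inputs. A minimal sketch (my own example, nothing to do with LSM):

// A returned reference must borrow from one of the inputs,
// and the lifetime 'a says exactly that.
fn longer<'a>(first: &'a str, second: &'a str) -> &'a str {
    if first.len() >= second.len() { first } else { second }
}

fn main() {
    let a = String::from("borrow");
    let b = String::from("checker");
    println!("{}", longer(&a, &b));
}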

I'm not saying that Rust isn't worth the pain. I personally think Rust is rather brilliant.

But a little expectation setting is appropriate here. Some programming languages are built for the purpose of making programming easier. (It is a valid goal to want to make software development accessible to a wider group of people.) Rust is not one of those languages.

That said, the Rust team has invested significant effort in excellent documentation (see The Book). And those compiler error messages really are good.

Finally, let me observe that while some things are hard to learn because they are poorly designed, Rust is not one of those things. The deeper I get into this, the more impressed I am. And so far, every single time I thought the compiler was wrong, I was mistaken.

I have found it helpful to try to make every battle with the borrow checker into a learning experience. I do not merely want to end up with the compiler accepting my code. I want to understand more than I did when I started.

Error handling

Rust does not have exceptions for error handling. Instead, error handling is done through the return values of functions.

But Rust actually makes this far less tedious than it might sound. By convention (and throughout the Rust standard library), error handling is done by returning a generic enum type called Result<T,E>. This type can encapsulate either the successful result of the function or an error condition.

On top of this, Rust has a clever macro called try!. Because of this macro, if you read some Rust code, you might think it has exception handling.

// This code was ported from F# which assumes that any Stream
// that supports Seek also can give you its Length.  That method
// isn't part of the Seek trait, but this implementation should
// suffice.
fn seek_len<R>(fs: &mut R) -> io::Result<u64> where R : Seek {
    // remember where we started (like Tell)
    let pos = try!(fs.seek(SeekFrom::Current(0)));

    // seek to the end
    let len = try!(fs.seek(SeekFrom::End(0)));

    // restore to where we were
    let _ = try!(fs.seek(SeekFrom::Start(pos)));

    Ok(len)
}


This function returns std::io::Result<u64>. When it calls the seek() method of the trait object it is given, it uses the try! macro, which will cause an early return of the function if it fails.

In practice, I like Rust's Result type very much.

  • The From and Error traits make it easy to combine different kinds of Result/Error values.

  • The distinction between errors and panics seems very clean.

  • I like having the compiler help me be certain that I am propagating errors everywhere I should be. (I dislike scanning library documentation to figure out if I called something that throws an exception I need to handle.)

Nonetheless, when doing a line-by-line port of F# to Rust, this was probably the most tedious issue. Lots of functions that returned () in F# changed to return Result in Rust.
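
To give a flavor of that plumbing: a custom error enum plus a From impl is what lets try! convert an underlying io::Error into your own error type on the way out. This is a sketch with invented names, not LSM's actual error type:

use std::fs::File;
use std::io;
use std::io::Read;

#[derive(Debug)]
enum DbError {
    Io(io::Error),
}

// try! calls From::from on the error value, so this impl is what
// turns an io::Error into a DbError on early return.
impl From<io::Error> for DbError {
    fn from(err: io::Error) -> DbError {
        DbError::Io(err)
    }
}

fn read_page(path: &str) -> Result<Vec<u8>, DbError> {
    let mut f = try!(File::open(path)); // io::Error becomes DbError::Io here
    let mut buf = Vec::new();
    try!(f.read_to_end(&mut buf));
    Ok(buf)
}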

Type inference

Rust does type inference within functions, but it cannot or will not infer the types of function arguments or function return values.

Very often I miss having the more complete form of type inference one gets in F#. But I do remind myself of certain things:

  • The Rust type system is far more complicated than that of F#. Am I holding a Foo? Or do I have a &Foo (a reference to a Foo)? Am I trying to transfer ownership of this value or not? Being a bit more explicit can be helpful.

  • F# type inference has its weaknesses as well. Most notably, inference doesn't work at all with method calls. This gives the object-oriented features of F# a very odd "feel", as if they don't belong in the language, but it would be unthinkable for a CLR language not to have them.

  • Rust has type inference for integer literals but F# does not.

  • The type inference capabilities of Rust may get smarter in the future.


Iterators

Rust iterators are basically like F# seq (which is an alias for .NET IEnumerable). They are really powerful and provide support for functional idioms like List.map. For example:

fn to_hex_string(ba: &[u8]) -> String {
    let strs: Vec<String> = ba.iter()
        .map(|b| format!("{:02X}", b))
        .collect();
    strs.connect("")
}
  • This function takes a slice (a part of an array) of bytes (u8) and returns its representation as a hex string.

  • Vec is a growable array

  • .iter() means something different than it does in F#. Here, it is the function that returns an iterator for a slice

  • .map() is pretty similar to F#. The argument above is Rust's syntax for a closure.

  • .collect() also means something different than it does in F#. Here, it consumes the iterator and puts all the mapped results into the Vec we asked for.

  • .connect("") is basically a join of all the resulting strings.
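
Putting those pieces together, a quick sanity check of to_hex_string (my example values):

fn main() {
    assert_eq!(to_hex_string(&[0xDE, 0xAD, 0xBE, 0xEF]), "DEADBEEF");
}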

However, there are a few caveats.

In Rust, you have a lot more flexibility about whether you are dealing with "a Foo" or "a reference to a Foo", and most of the time, it's the latter. Overall, this is just more work than it is in F#, and using iterators feels like it magnifies that effect.


Performance

I haven't done the sort of careful benchmarking that is necessary to say a lot about performance, so I will say only a little.

  • I typically use one specific test for measuring performance changes. It writes 10 LSM segments and then merges them all into one, resulting in a data file.

  • On that test, the Rust version is VERY roughly 5 times faster than the F# version.

  • The Rust and F# versions end up producing exactly the same output file.

  • The test is not all that fair to F#. Writing an LSM database in F# was always kind of a square-peg/round-hole endeavor.

  • With Rust, the difference in compiling with or without the optimizer can be huge. For example, that test runs 15 times faster with compiler optimizations than it does without.

  • With Rust, the LLVM optimizer can't really do its job very well if it can't do function inlining. Which it can't do across crates unless you use explicit inline attributes or turn on LTO.

  • In F#, there often seems to be a negative correlation between "idiomatic-ness" and "performance". In other words, the more functional and idiomatic your code, the slower it will run.

  • F# could get a lot faster if it could take better advantage of the ability of the CLR to do value types. For example, in F#, option and tuple always cause heap allocations.

Integer overflow

Integer overflow checking is one of my favorite features of Rust.

In languages or environments without overflow checking, unsigned types are very difficult to use safely, so people generally use signed integers everywhere, even in cases where a negative value makes no sense. Rust doesn't suffer from this silliness.

For example, the following code will panic:

let x: u8 = 255;
let y = x + 2;
println!("{}", y);

That said, I haven't quite figured out how to get overflow checking to happen on casts. I want the following code (or something very much like it) to panic:

let x: u64 = 257;
let y = x as u8;
println!("{}", y);
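
In the meantime, a hand-rolled checked cast can fill the gap. A sketch (the helper is mine, not a standard library function):

fn u64_to_u8(x: u64) -> u8 {
    // Panic on overflow, mirroring what checked arithmetic does.
    assert!(x <= std::u8::MAX as u64, "overflow casting u64 to u8");
    x as u8
}

fn main() {
    let x: u64 = 257;
    let y = u64_to_u8(x); // panics: 257 does not fit in a u8
    println!("{}", y);
}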

Note that, by default, Rust turns off integer overflow checking in release builds, for performance reasons.

  • F# is still probably the most productive and pleasant language I have ever used. But Rust is far better than C in this regard.

  • IMO, the Read, Write, and Seek traits are a much better design than .NET's Stream, which tries to encapsulate all three concepts.

  • 'cargo test' is a nice, easy-to-use testing framework that is built into Cargo. I like it.

  • crates.io is like NuGet for Rust, and it's integrated with Cargo.

  • If 'cargo bench' wants to always report timings in nanoseconds, I wish it would put in a separator every three digits.

  • I actually like the fact that Rust is taking a stance on things like function_names_in_snake_case and TypeNamesInCamelCase, even to the point of issuing compiler warnings for names that do not match the conventions. I don't agree 100% with their style choices, and that's my point. Being opinionated might help avoid a useless discussion about something that never really matters very much anyway.

  • I miss printf-style format strings.

  • I'm not entirely sure I like the automatic dereferencing feature. I kinda wish the compiler wouldn't help me in this manner until I know what I'm doing.

Bottom line

I am seriously impressed with Rust. Then again, I thought that Eric Bana's Hulk movie was pretty good, so you might want to just ignore everything I say.

In terms of maturity and ubiquity, C has no equal. Still, I believe Rust has the potential to become a compelling replacement for C in many situations.

I look forward to using Rust more.


Announcing Zumero for SQL Server, Release 2.0

Eric.Weblog() - Eric Sink - Fri, 05/08/2015 - 19:00

Zumero for SQL Server (ZSS) is a solution for replication and sync between SQL Server and mobile devices. ZSS can be used to create offline-friendly mobile apps for iOS, Android, Windows Phone, PhoneGap, and Xamarin.

Our 2.0 release is a major step forward in the maturity of the product.


  • Compatibility with Azure SQL -- This release offers improved compatibility with Microsoft Azure SQL Database. Whether you prefer cloud or on-prem, ZSS 2.0 is a robust sync solution.

  • Improved filtering -- In the 2.0 release, filters have become more powerful and easier to use. Arcane limitations of the 1.x filtering feature have been lifted. New capabilities include filtering by date, and filtering of incoming writes.

  • Schema change detection -- The handling of schema changes is now friendlier to the use of other SQL tools. In 1.x, schema changes needed to be performed in the ZSS Manager application. In 2.0, we detect and handle most schema changes automatically, allowing you to integrate ZSS without changing the way you do things.

  • Better UI for configuration -- The ZSS Manager application has been improved to include UI for configuration of multiple backend databases, as well as more fine-grained control of which columns in a table are published for sync.

  • Increased Performance -- Perhaps most important of all, ZSS 2.0 is faster. In some situations, it is a LOT faster.

Lots more info at zumero.com.


Microsoft is becoming cool again

Eric.Weblog() - Eric Sink - Thu, 04/30/2015 - 19:00
Isn't "New Microsoft" awesome?

.NET is going open source? And cross-platform? On Github?!?

The news out of Redmond often seems like a mis-timed April fools joke.

But it's real. This is happening. Microsoft is apparently willing to do "whatever it takes" to get developers to love them again.

How did this company change so much, so quickly?

A lot of folks are giving credit to CEO Satya Nadella. And there could be some truth to that. Maaaaaaybe.

Another popular view: Two of the most visible people in this story are:

  • Scott Hanselman (whose last name I cannot type without double-checking the spelling.)

  • and Scott Gu (whose last name is Guenther. Or something like that. I can never remember anything after the first two letters.)

I understand why people think maybe these two guys caused this revolution. They both seem to do a decent job I suppose.

But the truth is that New Microsoft started when Microsoft hired Martin Woodward.

What?!? Who the heck is Martin Woodward?

Martin's current position is Executive Director of the .NET Foundation. Prior to that, he worked as a Program Manager on various developer tools.

Nobody knows who Martin is. Either of the Scotts has two orders of magnitude more Twitter followers.

But I think if you look closely at Martin's five+ year career at Microsoft, you will see a pattern. Every time a piece of Old Microsoft got destroyed in favor of New Microsoft, Martin was nearby.

Don't believe me? Ask anybody in DevDiv how TFS ended up getting support for Git.

It's all about Martin.

So all the credit should go to Martin Woodward then?

Certainly not.

You see, Martin joined Microsoft in late 2009 as part of their acquisition of Teamprise.

And Teamprise was a division of SourceGear.

I hired Martin Woodward (single-handedly, with no help or input from anybody else) in 2005. Four years later, when Microsoft acquired our Teamprise division (which I made happen, all by myself), Martin became a Microsoft employee.

Bottom line: Are you excited about all of the fantastic news coming out of Build 2015 this week? That stuff would never have happened if it were not for ME.

Eric, how can we ever thank you?

So, now that you know that I am the one behind all the terrific things Microsoft is doing, I'm sure you want to express your appreciation. But that won't be necessary. While I understand the sentiment, in lieu of expensive gifts and extravagant favors, I am asking my adoring fans to do two things:

First, try not to forget all the little people at Microsoft who are directly involved in the implementation of change. People like Martin, or the Scotts, or Satya. Even though these folks are making a relatively minor contribution compared to my own, I would like them to be supported in their efforts. Be positive. Don't second-guess their motives. Lay aside any prejudices you might have from Old Microsoft. Believe.

Second, get involved. Interact with New Microsoft as part of the community. Go to GitHub and grab the code. Report an issue. Send a pull request.

Embellishments and revisionist history aside...

Enjoy this! It's a GREAT time to be a developer.


What Mongo-ish API would mobile developers want?

Eric.Weblog() - Eric Sink - Mon, 04/27/2015 - 19:00

A couple weeks ago I blogged about mobile sync for MongoDB.

Updated Status of Elmo

Embeddable Lite Mongo continues to move forward nicely:

  • Progress on indexes:

    • Compound and multikey indexes are supported.
    • Sparse indexes are not done yet.
    • Index key encoding is different from the KeyString stuff that Mongo itself does. For encoding numerics, I did an ugly-but-workable F# port of the encoding used by SQLite4. (A simplified sketch of order-preserving key encoding appears after this list.)
    • Hint is supported, but is poorly tested so far.
    • Explain is supported, partially, and only for version 3 of the wire protocol. More work to do there.
    • The query planner (which has delusions of grandeur for even referring to itself by that term) isn't very smart.
    • Indexes cannot yet be used for sorting.
    • Indexes are currently never used to cover a query.
    • When grabbing index bounds from the query, $elemMatch is ignored. Because of this, and because of the way Mongo multikey indexes work, most index scans are bounded at only one end.
    • The $min and $max query modifiers are supported.
    • The query planner doesn't know how to deal with $or at all.
  • Progress on full-text search:

    • This feature is working for some very basic cases.
    • Phrase search is not implemented yet.
    • Language is currently ignored.
    • The matcher step for $text is not implemented yet at all. Everything within the index bounds will get returned.
    • The tokenizer is nothing more than string.split. No stemming. No stop words.
    • Negations are not implemented yet.
    • Weights are stored in the index entries, but textScore is not calculated yet.
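An aside on that index-key encoding, as promised above: Elmo's real encoding is a port of SQLite4's, which also covers floats and strings, but the core trick of memcmp-ordered keys is easy to illustrate. Here is a minimal Python sketch, not Elmo's actual code, for signed 64-bit integers only:

    import struct

    def encode_int64_key(n):
        # Shift the signed range [-2^63, 2^63) onto the unsigned range
        # [0, 2^64) so negatives sort below positives, then pack big-endian
        # so that a bytewise compare matches numeric order.
        return struct.pack(">Q", n + (1 << 63))

    values = [-5000, -1, 0, 1, 42, 5000]
    keys = [encode_int64_key(v) for v in values]
    assert keys == sorted(keys)  # byte order matches numeric order

A full encoding (like SQLite4's) extends the same idea with type-tag prefixes so that keys of mixed types still compare in a well-defined order.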

I also refactored to get better separation between the CRUD logic and the storage of bson blobs and indexes (making it easier to plug in different storage layers).

Questions about client-side APIs

So, let's assume you are building a mobile app which communicates with your Mongo server in the cloud using a "replicate and sync" approach. In other words, your app is not doing its CRUD operations by making networking/REST calls back to the server. Instead, your app is working directly with a partial clone of the Mongo database that is right there on the mobile device. (And periodically, that partial clone is magically synchronized with the main database on the server.)

What should the API for that "embedded lite mongo" look like?
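As a strawman, purely to make the question concrete: app code against a local replica might look something like the sketch below. Every name in it (the elmo module, open, collection, insert, find, sync) is hypothetical, invented for illustration; this is not a real Elmo or Zumero API.

    # Every name below is hypothetical -- one possible shape, not a real API.
    db = elmo.open("local.db")                  # partial replica on the device
    tasks = db.collection("tasks")

    # CRUD runs against the local clone: no network calls, works offline.
    tasks.insert({"title": "buy milk", "done": False})
    pending = list(tasks.find({"done": False}))

    # Periodically, the replica synchronizes with the server in one step.
    db.sync(server="https://sync.example.com", auth="app-token")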

Obviously, for each development environment, the form of the API should be designed to feel natural or native in that environment. This is the approach taken by Mongo's own client drivers. In fact, as far as I can tell, these drivers don't even share much (or any?) code. For example, the drivers for C# and Java and Ruby are all different, and (unless I'm mistaken) none of them are mere wrappers around something lower level like the C driver. Each one is built and maintained to provide the most pleasant experience to developers in a specific ecosystem.

My knee-jerk reaction here is to say that mobile developers might want the exact same API as presented by their nearest driver. For example, if I am building a mobile app in C# (using the Xamarin tools), there is a good chance my previous Mongo experience is also in C#, so I am familiar with the C# driver, so that's the API I want.

Intuitive as this sounds, it may not be true. Continuing with the C# example, that driver is quite large. Is its size appropriate for use on a mobile device? Is it even compatible with iOS, which requires AOT compilation? (FWIW, I tried compiling this driver as a PCL (Portable Class Library), and it didn't Just Work.)

For Android, the same kinds of questions would need to be asked about the Mongo Java driver.

And then there are Objective-C and Swift (the primary developer platforms for iOS), for which there is no official Mongo driver. But there are a couple of them listed on the Community Supported Drivers page: http://docs.mongodb.org/ecosystem/drivers/community-supported-drivers/.

And we need to consider PhoneGap/Cordova as well. Is the Node.js driver a starting point?

And in all of these cases, if we assume that the mobile API should be the same as the driver's API, how should that be achieved? Fork the driver code and rip out all the networking and replace it with calls to the embedded library?

Or should each mobile platform get a newly-designed API which is specifically for mobile use cases?

Believe it or not, some days I wonder: Suppose I got Elmo running as a server on an Android device, listening on localhost port 27017. Could an Android app talk to it with the Mongo Java driver unchanged? Even if this would work, it would be more like a proof-of-concept than a production solution. Still, when looking for solutions to a problem, the mind goes places...
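For what it's worth, that experiment would be easy to script. Here is what the smoke test might look like with Python's pymongo driver standing in for the Java one (assuming something Mongo-compatible is listening on the standard port); whether an unmodified driver's handshake would actually succeed against Elmo is exactly the open question:

    from pymongo import MongoClient

    # Point a stock driver at the standard port and see what breaks.
    client = MongoClient("mongodb://localhost:27017")
    coll = client.test.smoke

    coll.insert_one({"hello": "elmo"})       # basic write
    print(coll.find_one({"hello": "elmo"}))  # basic read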

So anyway, I've got more questions than answers here, and I would welcome thoughts or opinions.

  • Feel free to post an issue on GitHub: https://github.com/zumero/Elmo/issues

  • Or email me: eric@zumero.com

  • Or Tweet: @eric_sink

  • Or find me at MongoDB World in NYC at the beginning of June.


Mobile Sync for Mongo

Eric.Weblog() - Eric Sink - Mon, 04/13/2015 - 19:00

We here at Zumero have been exploring the possibility of a mobile sync solution for MongoDB.

We first released our Zumero for SQL Server product almost 18 months ago, and today there are bunches of people using mobile apps which sync using our solution.

But not everyone uses SQL Server, so we often wonder what other database backends we should consider supporting. In this blog entry, I want to talk about some progress we've made toward a "Zumero for Mongo" solution and "think out loud" about the possibilities.

Background: Mobile Sync

The basic idea of mobile sync is to keep a partial copy of the database on the mobile device so the app doesn't have to go back to the network for every single CRUD operation. The benefit is an app that is faster, more reliable, and works offline. The flip side of that coin is the need to keep the mobile copy of the database synchronized with the data on the server.

Sync is tricky, but as mobile continues its explosive growth, this approach is gaining momentum.

If the folks at Mongo are already working on something in this area, we haven't seen any sign of it. So we decided to investigate some ideas.

Pieces of the puzzle

In addition to the main database (like SQL Server or MongoDB or whatever), a mobile sync solution has three basic components:

Mobile database
  • Runs on the mobile device as part of the app

  • Probably an embedded database library

  • Keeps a partial replica of the main database

  • Wants to be as similar as possible to the main database

Sync server
  • Monitors changes made by others to the main database

  • Sends incremental changes back and forth between clients and the main database

  • Resolves conflicts, such as when two participants want to change the same data

  • Manages authentication and permissions for mobile clients

  • Filters data so that each client only gets what it needs

Sync client
  • Monitors changes made by the app to the mobile database

  • Talks over the network to the sync server

  • Pushes and pulls incremental changes to keep the mobile database synchronized
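To make the client's job concrete, here is a rough sketch of one pull/push cycle. Everything in it is hypothetical (the HTTP transport, the endpoints, the local-database methods); it illustrates the division of labor above, not the actual Zumero protocol.

    import requests  # assuming an HTTP transport, purely for illustration

    SERVER = "https://sync.example.com"  # hypothetical sync server

    def sync_once(local_db, auth_token):
        headers = {"Authorization": auth_token}

        # Pull: fetch everything newer than our last sync point. The server
        # has already applied permissions and filtering for this client.
        since = local_db.last_synced_version()        # hypothetical local API
        resp = requests.get(SERVER + "/changes",
                            params={"since": since}, headers=headers)
        local_db.apply_remote_changes(resp.json())

        # Push: send local edits; conflict resolution happens server-side.
        outgoing = local_db.pending_local_changes()   # hypothetical local API
        requests.post(SERVER + "/changes", json=outgoing, headers=headers)
        local_db.mark_synced()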

For this blog entry, I want to talk mostly about the mobile database. In our Zumero for SQL Server solution, this role is played by SQLite. There are certainly differences between SQL Server and SQLite, but on the whole, SQLite does a pretty good job pretending to be SQL Server.

What embedded database could play this role for Mongo?

This question has no clear answer, so we've been building a lightweight Mongo-compatible database. Right now it's just a prototype, but its development serves the purpose of helping us explore mobile sync for Mongo.

Embeddable Lite Mongo

Or "Elmo", for short.

Elmo is a database that is designed to be as Mongo-compatible as it can be within the constraints of mobile devices.

In terms of the status of our efforts, let me begin with stuff that does NOT work:

  • Sharding is an example of a Mongo feature that Elmo does not support and probably never will.

  • Elmo also has no plans to support any feature which requires embedding a JavaScript engine, since that would violate Apple's rules for the App Store.

  • We do hope to support full text search ($text, $meta, etc), but this is not yet implemented.

  • Similarly, we have not yet implemented any of the geo features, but we consider them to be within the scope of the project.

  • Elmo does not support capped collections, and we are not yet sure if it should.

Broadly speaking, except for the above, everything works. Mostly:

  • All documents are stored in BSON

  • Except for JS code, all BSON types are supported

  • Comparison and sorting of BSON values (including different types) works

  • All basic CRUD operations are implemented

  • The update command supports all the update operators except $isolated

  • The update command supports upsert as well

  • The findAndModify command includes full support for its various options

  • Basic queries are fully functional, including query operators, projection, and sorting

  • The matcher supports Mongo's notion of query predicates matching any element of an array

  • CRUD operations support resolution of paths into array subobjects, like x.y reaching into {x:[{y:2}]} (an example of this and of the positional operator follows this list)

  • Regex works, with support for the i, s, and m options

  • The positional operator $ works in update and projection

  • Cursors and batchSize are supported

  • The aggregation pipeline is supported, including all expression elements and all stages (except geo)
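Two of those items are easier to see with an example. Against a stock MongoDB server (pymongo shown here; Elmo aims to match this behavior), a dotted path reaches into array elements, and the positional operator $ updates whichever element the query matched:

    from pymongo import MongoClient

    coll = MongoClient().test.demo
    coll.drop()
    coll.insert_one({"x": [{"y": 2}, {"y": 3}]})

    # Path resolution into array subobjects: "x.y" matches {x: [{y: 2}, ...]}.
    assert coll.find_one({"x.y": 2}) is not None

    # Positional operator: "$" refers to the array element the query matched.
    coll.update_one({"x.y": 2}, {"$set": {"x.$.y": 99}})
    assert coll.find_one()["x"][0]["y"] == 99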

More caveats:

  • Support for indexes is being implemented, but they don't actually speed anything up yet.

  • The dbref format is tolerated, but is not [yet] resolved.

  • The $explain feature is not implemented yet.

  • For the purpose of storing BSON blobs, Elmo is currently using SQLite. Changing this later will be straightforward, as we're basically just using SQLite as a key-value store (sketched below), so the API between all of Elmo's CRUD logic and the storage layer is not very wide.
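That narrow storage API is easy to picture. The following is not Elmo's actual schema, just a minimal sketch of the idea: one table mapping opaque keys to BSON blobs, with SQLite doing nothing clever.

    import sqlite3

    conn = sqlite3.connect("elmo_demo.db")
    conn.execute("CREATE TABLE IF NOT EXISTS docs (k BLOB PRIMARY KEY, v BLOB)")

    def put(key, bson_blob):
        # Keys are assumed to be pre-encoded so byte order matches sort order.
        conn.execute("INSERT OR REPLACE INTO docs (k, v) VALUES (?, ?)",
                     (key, bson_blob))

    def get(key):
        row = conn.execute("SELECT v FROM docs WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    put(b"\x01key", b"...bson bytes...")
    assert get(b"\x01key") == b"...bson bytes..."

Because the storage interface is little more than put/get/scan over ordered keys, swapping SQLite for another key-value store later should not disturb the CRUD logic above it.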

Notes on testing:

  • Although mobile-focused Elmo does not need an actual server, it has one, simply so that we can run the jstests suite against it.

  • The only test suite sections we have worked on are jstests/core and jstests/aggregation.

  • Right now, Elmo can pass 311 of the test cases from jstests.

  • We have never tried contacting Elmo with any client driver except the mongo shell. So this probably doesn't work yet.

  • Elmo's server only supports the new style protocol, including OP_QUERY, OP_GET_MORE, OP_KILL_CURSORS, and OP_REPLY. None of the old "fire and forget" messages are implemented.

  • Where necessary to make a test case pass, Elmo tries to return the same error numbers as Mongo itself.

  • All effort thus far has been focused on making Elmo functional, with no effort spent on performance.

How Elmo should work:

  • In general, our spec for Elmo's behavior is the MongoDB documentation plus the jstests suite.

  • In cases where the Mongo docs seem to differ from the actual behavior of Mongo, we try to make Elmo behave like Mongo does.

  • In cases where the Mongo docs are silent, we often stick a proxy in front of the Mongo server and dump all the messages so we can see exactly what is going on (a rough sketch of such a proxy follows this list).

  • We occasionally consult the Mongo server source code for reference purposes, but no Mongo code has been copied into Elmo.
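Such a proxy doesn't need to be elaborate. Below is a rough Python sketch of the trick: accept one connection, forward whole wire messages in both directions, and log each header. The Mongo wire header is four little-endian int32s (messageLength, requestID, responseTo, opCode); a real session dumper would also decode the message bodies.

    import socket
    import struct
    import threading

    LISTEN_PORT = 27018  # point the client here
    MONGO_PORT = 27017   # real mongod

    def read_exact(sock, n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed")
            buf += chunk
        return buf

    def pump(src, dst, label):
        # Forward one direction, one whole wire message at a time.
        try:
            while True:
                head = read_exact(src, 16)
                length, req_id, resp_to, opcode = struct.unpack("<iiii", head)
                body = read_exact(src, length - 16)
                print("%s len=%d reqId=%d respTo=%d opCode=%d"
                      % (label, length, req_id, resp_to, opcode))
                dst.sendall(head + body)
        except (ConnectionError, OSError):
            pass

    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", LISTEN_PORT))
    listener.listen(1)
    client, _ = listener.accept()
    upstream = socket.create_connection(("127.0.0.1", MONGO_PORT))

    threading.Thread(target=pump, args=(client, upstream, "->"), daemon=True).start()
    pump(upstream, client, "<-")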

Notes on the code:

  • Elmo is written in F#, which was chosen because it's an insanely productive environment and we want to move quickly.

  • But while F# is a great language for this exploratory prototype, it may not be the right choice for production, simply because it would confine Elmo use cases to Xamarin, and Miguel's world domination plan is not quite complete yet. :-)

  • The Elmo code is now available on GitHub at https://github.com/zumero/Elmo. Currently the license is GPLv3, which makes it incompatible with production use on mobile platforms, which is okay for now, since Elmo isn't ready for production use anyway. We'll revisit licensing issues later.

Next steps:

  • Our purpose in this blog entry is to start conversations with others who may be interested in mobile sync solutions for Mongo.

  • Feel free to post a question or comment or whatever as an issue on GitHub: https://github.com/zumero/Elmo/issues

  • Or email me: eric@zumero.com

  • Or Tweet: @eric_sink

  • If you're interested in a face-to-face conversation or a demo, we'll be at MongoDB World in NYC at the beginning of June.


Another elasticsearch blog post, now about Shield

Gridshore - Thu, 01/29/2015 - 21:13

I just wrote another piece of text on my other blog. This time I wrote about the recently released elasticsearch plugin called Shield. If you want to learn more about securing your elasticsearch cluster, please head over to my other blog and start reading.


The post Another elasticsearch blog post, now about Shield appeared first on Gridshore.

Categories: Architecture, Programming

New blog posts about bower, grunt and elasticsearch

Gridshore - Mon, 12/15/2014 - 08:45

Two new blog posts I want to point out to you all. I wrote these blog posts on my employer's blog:

The first post is about creating backups of your elasticsearch cluster. Some time ago they introduced the snapshot/restore functionality. Of course you can use the REST endpoint directly, but how much easier is it if you can use a plugin to handle the snapshots? Or, maybe even better, integrate the functionality into your own Java application? That is what this blog post is about: integrating snapshot/restore functionality into your Java application. As a bonus, there are screenshots of my elasticsearch gui project showing the snapshot/restore functionality.

Creating elasticsearch backups with snapshot/restore
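For readers who just want the flavor before clicking through: the underlying REST calls are Elasticsearch's snapshot API, and look roughly like this (repository name, snapshot name, and filesystem path are placeholders; the post linked above drives the same API from the Java client):

    import requests

    ES = "http://localhost:9200"

    # Register a filesystem snapshot repository; the location must be a
    # path that every node in the cluster can reach.
    requests.put(ES + "/_snapshot/my_backup", json={
        "type": "fs",
        "settings": {"location": "/mnt/backups/my_backup"},
    })

    # Take a snapshot and wait for it to finish.
    requests.put(ES + "/_snapshot/my_backup/snapshot_1",
                 params={"wait_for_completion": "true"})

    # Restore the snapshot later.
    requests.post(ES + "/_snapshot/my_backup/snapshot_1/_restore")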

The second blog post I want to bring to your attention is front-end oriented. I already mentioned my elasticsearch gui project, which is an AngularJS application. I have been working on the plugin for a long time and the amount of JavaScript code is increasing. Therefore I wanted to introduce grunt and bower to my project. That is what this blog post is about.

Improve my AngularJS project with grunt

The post New blog posts about bower, grunt and elasticsearch appeared first on Gridshore.

Categories: Architecture, Programming

New blogpost on kibana 4 beta

Gridshore - Tue, 12/02/2014 - 12:55

If you are, like me, interested in elasticsearch and kibana, then you might be interested in a blog post I wrote on my employer's blog about the new Kibana 4 beta. If so, head over to my employer's blog.


The post New blogpost on kibana 4 beta appeared first on Gridshore.

Categories: Architecture, Programming

Big changes

Gridshore - Sun, 10/26/2014 - 08:27

For the first 10 years of my career I worked as a consultant for Capgemini and Accenture. I learned a lot in that time. One of the things I learned was that I wanted something else. I wanted to do something with more impact and more responsibility, together with people who wanted to do challenging projects; not purely to climb the career ladder, but because they like doing cool stuff. Therefore I left Accenture to become part of a company called JTeam. That was over six years ago.

I started as Chief Architect at JTeam. The goal was to become a leader to the other architects and create a team together with Bram. I was lucky that Allard joined me around that time. We share a lot of ideas, which makes it easier to set goals and accomplish them. I got to know a few very good people at JTeam; too bad that some of them left, but that is life.

After a few years, bigger changes took place. Leonard made way for Steven, and the shift to a company that needed to grow started. We took over two companies (Funk and Neteffect); we now had all disciplines of software development available, from front-end to operations. As the company grew, some things had to change. I got more involved in arranging things like internships, tech events, partnerships and human resource management.

We moved into a bigger building and had better opportunities. One of those opportunities was a search solution created by Shay Banon. Then Steven was gone: together with Shay he founded Elasticsearch. We were acquired by Trifork. In this change we lost most of our search expertise, because all of our search people joined the Elasticsearch initiative. Someone had to pick up search at Trifork, and that was me, together with Bram.

For over two years I invested a lot of time in learning, mainly about elasticsearch. I created a number of workshops/trainings and got involved with multiple customers that needed search. I have given trainings at a number of customers, to groups varying between 2 and 15 people. In general they were all really pleased with the trainings I gave.

Having so much focus for a while gave me a lot of time to think. I did not need to think about next steps for the company; I just needed to get more knowledgeable about elasticsearch. In that time I started out on a journey to find out what I wanted. I talked to my management about it and thought about it myself a lot. Then, right before the summer holiday, I had dinner with two people I know through the NLJUG, Hans and Bert. We had a very nice talk, and in the end they gave me an opportunity that I really had to give some good thought. It was really interesting: a challenge, not really a technical challenge, but more an experience that is hard to find. During the summer holiday I convinced myself this was a very interesting direction, and I took the next step.

I had a lunch meeting with my soon-to-be business partner Sander. After around 30 minutes it already felt good. I really feel the energy of creating something new; I feel inspired again. This is the feeling I had been missing for a while. In September we were told that Bram was leaving Trifork. Since he is the person who got me into JTeam back in the day, it felt weird. I understand his reasons to go out and try to start something new. Bram leaving resulted in a vacancy for a CTO, and the management team decided to approach Allard for this role. This was a surprise to me, but a very nice opportunity for Allard, and I know he is going to do a good job. At the end of September, Sander and I presented the draft business plan to the board of Luminis. That afternoon, hands were shaken. It was then that I made the last call and decided to resign from my job at Trifork and take this new opportunity at Luminis.

I feel sad about leaving some people behind. I am going to miss the morning talks in the car with Allard about everything related to the company. I am going to miss doing projects with Roberto (we are a hell of a team). I am going to miss Byron for his capabilities (you make me feel proud that I guided your first steps within Trifork). I am going to miss chasing customers with Henk (we did a good job this past year). And I am going to miss Daphne and the after-lunch walks. To all of you and all the others at Trifork: it is a small world …


Together with Sander, and with the help of all the others at Luminis, we are going to start Luminis Amsterdam. This is going to be a challenge for me, but together with Sander I feel we are going to make it happen. I feel confident that the big changes to come will be good changes.

The post Big changes appeared first on Gridshore.

Categories: Architecture, Programming

Thought after Test Automation Day 2013

Henry Ford said, “Obstacles are those frightful things you see when you take your eyes off the goal.” After going to Test Automation Day last month, I have been figuring out why industrializing testing doesn't work. I deliberately put it in this negative perspective, because I think it does work! But when is it successful? A lot of the time, Ford's remark is indeed the problem: people tend to see obstacles, obstacles born of the thought that it is not feasible to change something. They need to change, but that's not an easy change.

After attending #TAD2013, as it was tagged on Twitter, I saw a huge interest in better testing, faster testing, even cheaper testing by using tools to industrialize. Test automation has long been seen as an interesting option that can enable faster testing. It wasn't always cheaper, especially the first time, but at least it was faster. As I see it, it will also enable better testing. “Better?” you may ask. Test automation itself doesn't enable better testing, but by automating regression tests and simple work, the tester can focus on other areas of quality.


And isn’t that the goal? In the end all people involved in a project want to deliver a high quality product, not full of bugs. But they also tend to see the obstacles. I see them less and less. New tools are so well advanced and automation testers are becoming smarter and smarter that they enable us to look beyond the obstacles. I would like to say look over the obstacles.

At the Test Automation Day I learned some new things, but it also confirmed something I already knew: test automation is here to stay. We don't need to focus on the obstacles; we should focus on the goal.

Categories: Testing & QA