
Configuration Management

Video of my presentation from OSCON 2010

Eric.Weblog() - Eric Sink - Sat, 08/14/2010 - 15:56

For those who are interested, we've posted the video of my presentation at OSCON on YouTube.

I had a few problems displaying my slide deck at the conference.  When I'm speaking at an event, I usually like to use whatever equipment is provided.  To be assured of compatibility between my MacBook Pro and the projector, I would need to bring what seems like 23 different video adapters.  It's easier to just bring my slide deck on a thumb drive.

The email from the conference organizers told us there would be "Dell laptops" in the room.  I remember thinking how boneheaded it was of them to be running Windows at the Open Source convention, but I complied and brought my slides as a PowerPoint file.

And then I got there and discovered that I was the one being a bonehead for assuming that "Dell laptop" == "Windows + Office".  Actually, those Dell laptops were running Linux with OpenOffice.org.  Anyway, OO.org imported my .pptx file, but it botched the formatting in some rather unexpected and entertaining ways.

Moving Forward

Since OSCON ended three weeks ago, folks on our team have been taking their summer vacations, but we've still made some good progress:

  • After hearing lots of (well deserved) complaints from people trying to build 64-bit Veracity, we expanded our continuous integration build farm to do both 32 and 64 bit builds, debug and release, on all our platforms.

  • We had just missed our goal of dogfooding Veracity's bug-tracking features before OSCON, but after another round of improvements to the Web UI stuff, now we're using Veracity not just for source control, but also for project tracking.

  • We implemented Mercurial-style version numbers.  They're specific to one instance of a repo, but still kind of handy.

  • We started work on letting Veracity run through mainstream web servers (instead of only using its embedded web server).

  • We did lots of bug fixes, including some deep polishing and testing work on patterns for include/exclude settings.

  • I've been working in a private branch, focused mostly on improving performance:

    • Every changeset record has a blob list which is used for making things like push/pull and incremental indexing efficient.  For changesets which are a DAG merge (more than one parent), we need to normalize that blob list to ensure that the exact same list is constructed on each side of the merge.  Our previous normalization code was additive.  It walked the DAG back to the lowest common ancestor and added any blob which wasn't present on both sides.  Gradually, this caused those blob lists to keep getting bigger and bigger, which turned out to be a nasty performance problem that gets worse as the repo grows.  So, I switched the normalization code to remove any blob which was present in the blob list of any ancestor.  This is a lot harder to calculate, but it results in a much tighter list.

    • The changeset record for a database DAG includes a delta.  When that changeset is a merge, the delta is calculated against the lowest common ancestor of the two parents.  However, when it comes time to store that delta for later use by the indexing code, it would be better to calculate an equivalent delta against one of the two parents.

    • In a Veracity database, every record has two fields:  recid and rectype.  However, some of our databases just don't need both of these fields.  For example, recid is really only useful if you plan to modify records, but the audit DBs are filled with records that never get modified.  Similarly, if a DB only has one record type, we don't need every single record to have a field reminding us what the name of that type is.  So, I made a bunch of changes to allow a Veracity DB to exclude one or both of these fields.  Eliminating the need to store, retrieve, index and obey these superfluous fields resulted in a nice perf increase.

    • I went through and made dozens of little optimizations in the indexer.  Remember to always use SQLite's prepared statements in loops.  Make sure every blob getting indexed only gets loaded once.  Tune the hash table which represents JSON objects.

    • I found and fixed a few GC rooting bugs in our SpiderMonkey code.  BTW, I can't wait until we can upgrade to the new and improved version of the JS engine.  I greatly dislike the fact that SpiderMonkey doesn't have a wider int.

    • Unfortunately, some of my changes break compatibility, so I've been writing a script to migrate all our data.  This week I'll merge with the trunk and we'll do what we call a "repository reboot".

    • This firehose of detail is mostly just the ramblings of yet another blogger who is under the delusion that anybody cares about the mundane elements of his day.  Which reminds me, Thursday morning for breakfast I had iced coffee with an omelet made of red peppers, Portobello mushrooms, and provolone.  Anyway, on the off chance that anything here wants to get discussed, meet me on the Veracity mailing list.
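For anyone curious about the blob list normalization mentioned in the list above, here is a toy sketch of the subtractive approach: drop any blob that already appears in the blob list of any ancestor.  This is Python for illustration, not Veracity's C code.  The dictionary layout and the names `ancestors` and `normalized_blob_list` are assumptions made up for the example.

```python
# Toy DAG: each changeset lists its parents and the raw set of blobs it
# references.  The layout and every name here are hypothetical.


def ancestors(dag, csid):
    """Return the ids of all ancestors of csid (csid itself excluded)."""
    seen = set()
    stack = list(dag[csid]["parents"])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(dag[cur]["parents"])
    return seen


def normalized_blob_list(dag, csid):
    """Subtractive normalization: keep only blobs that no ancestor lists."""
    inherited = set()
    for anc in ancestors(dag, csid):
        inherited |= dag[anc]["blobs"]
    # Sorting gives the same list no matter which side computes it.
    return sorted(dag[csid]["blobs"] - inherited)


dag = {
    "root": {"parents": [], "blobs": {"b1"}},
    "left": {"parents": ["root"], "blobs": {"b1", "b2"}},
    "right": {"parents": ["root"], "blobs": {"b1", "b3"}},
    "merge": {"parents": ["left", "right"], "blobs": {"b1", "b2", "b3", "b4"}},
}

merge_list = normalized_blob_list(dag, "merge")
```

In this toy DAG the merge changeset ends up listing only the one blob that neither parent (nor the root) already carried, which is the "much tighter list" effect.  A real implementation would presumably avoid re-walking the whole DAG for every changeset; this sketch makes no attempt at that.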

After things settle down just a bit more, we'll be ready to start publishing nightly tarballs.

Slides from my presentation at OSCON 2010

Eric.Weblog() - Eric Sink - Tue, 07/27/2010 - 16:47

Several folks have asked for a copy of the slides from my talk at OSCON last week, so here they are (PDF, 2 MB).  They might be a little hard to follow without the narrative that goes with them.  A videotape of the talk will be posted in a week or so.

Thanks to all who attended my presentation.  The turnout was great, and folks seemed quite enthusiastic about Veracity.

My apologies to the Prophet and SD developers (one of whom attended my talk) for neglecting to mention them.  A silly oversight on my part.

I was especially appreciative of the attendance and expressions of support from several members of the original Subversion development team.  Subversion is one of the most successful version control tools ever, and I watched its early development closely enough to develop an admiration for the folks who built it.  So it was a very pleasant surprise to find a few "celebrities" in attendance at my session.  :-)

Veracity Technology Overview

Eric.Weblog() - Eric Sink - Mon, 07/19/2010 - 22:13

When I encounter a new piece of software, I usually ask, "What's in it?"

Tools and technologies we've been using to build Veracity

C

The core library and the command-line app are written entirely in C.  Some folks won't like our coding conventions.  I'll probably do a whole blog entry sometime to ((apologize for) && (defend)) the liberties we've taken with the C preprocessor.

JavaScript

jQuery

On the browser side of things, Veracity is a web app written in JavaScript using jQuery.

SVG

Burn down charts and other web graphics are done using SVG.

JSON

Veracity uses JSON all over the place.  All serialized structures in the repo are JSON.  Database records and templates are JSON.  We have a bunch of C code for parsing JSON, writing JSON, and dealing with JSON-like data in memory.

SQLite

We use SQLite in several places as a more scalable disk format, and also as an index.
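A minimal sketch of the "SQLite as an index" idea, written in Python because its standard library ships a SQLite binding.  The table name, columns and blob ids are hypothetical, not Veracity's schema.  It also follows the advice elsewhere on this page about prepared statements in loops: the same parameterized SQL string is reused for every row, so it only needs to be compiled once.  In C, the equivalent would be one call to sqlite3_prepare_v2 followed by bind/step/reset inside the loop.

```python
import sqlite3

# In-memory database standing in for an on-disk index file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blob_index (hid TEXT PRIMARY KEY, length INTEGER)")

# Hypothetical blob ids and sizes.
blobs = [("aa11", 120), ("bb22", 4096), ("cc33", 17)]

# One parameterized statement, executed once per row inside one transaction.
insert_sql = "INSERT INTO blob_index (hid, length) VALUES (?, ?)"
with conn:
    for hid, length in blobs:
        conn.execute(insert_sql, (hid, length))

# Look up a single blob through the index.
row = conn.execute(
    "SELECT length FROM blob_index WHERE hid = ?", ("bb22",)
).fetchone()
count = conn.execute("SELECT COUNT(*) FROM blob_index").fetchone()[0]
```

Wrapping the loop in a single transaction matters about as much as reusing the statement, since otherwise every insert pays for its own commit.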

Mongoose

The Veracity command-line app has an embedded web server for personal use.  It's based on Mongoose.

Curl

The client side of push/pull is done by calling libcurl.

UTF-8

ICU

Early on in the Veracity project, we did a lot of work to make sure that stuff was done right with respect to Unicode.  Our preferred encoding is UTF-8.  The ICU library from IBM has been helpful in a number of places.

REST

The Veracity web API is very RESTy.  And of course, everything serialized over the network is in JSON. 

CMake

Our build system is CMake, with which we have a love/hate relationship.  We love it because it generates makefiles, Xcode projects or Visual Studio solutions.  We hate it because its language makes Forth look sane.

CTest

We have a huge suite of automated tests.  CMake's integrated test stuff actually works pretty well.  Just run 'ctest' at the top level directory.

SpiderMonkey

A lot of our tests are written in JavaScript.  We have a command-line executable called 'vscript' which is basically the SpiderMonkey JavaScript engine glued to the main Veracity library.

Continuous Integration

Our CI system rebuilds from scratch and runs the main tests after every checkin, on Mac, Linux and Windows.  Results are published to an internal web page and sent to the team by email.

Scrum

The more we use Scrum, the more I like it.  We're patient with ourselves.  We just try to get a little better in our Scrum practices each iteration.

gcov

lcov

Nightly builds run the entire test suite with code coverage done by gcov.  Our current coverage level is 81%.

vcdiff

For binary deltas, Veracity uses the algorithm described in RFC 3284.  (Actually, the use of binary deltas is currently turned off by default, so if you notice that repositories seem big, that's why.  All the plumbing is done.  We're just not using it yet.)  Anyway, we've got our own implementation of vcdiff.  We may consider switching to Google's open-vcdiff at some point if its performance is better.

zlib

For simple non-deltified compression in repo implementations, Veracity uses zlib.
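As a small illustration of non-deltified compression, here is a round trip through zlib using Python's standard binding.  The blob contents are made up, and nothing here reflects how Veracity actually frames compressed blobs on disk.

```python
import zlib

# A hypothetical blob with plenty of repetition, so it compresses well.
blob = b"int main(void) { return 0; }\n" * 200

# Level 9 asks zlib for its best compression at the cost of speed.
compressed = zlib.compress(blob, 9)
restored = zlib.decompress(compressed)
```

Each blob is compressed on its own here, with no reference to any other version of the file.  That is the difference from the vcdiff deltas described above.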

Valgrind

When coding in C, valgrind is indispensable.

Shark

I do most of my coding on the Mac, so I use the Shark profiler.  Very cool.

emacs

vim

Eclipse

Visual Studio

bash

gdb

Every developer on our team chooses their own tools.  We have a good representation of most of the major religions.

Firefox

Safari

Chrome

Similarly, every developer chooses their own web browser.  I'm not sure what feelings to have upon noticing that nobody is using Internet Explorer.  It seems so wrong.  And yet, so right.

A few notable things we'll probably be using later

.NET

Even though Veracity was not built fundamentally on the .NET platform, we are committed to providing excellent support for Windows developers.  Visual Studio integration is a high priority.

Java

Similarly, we didn't use Java to build the core libraries for Veracity, but we plan to deliver excellent integration into the Java world, including an Eclipse plugin.

IIS

Apache

The embedded web server is fine for personal use on the desktop, but large teams will want to run a real web server for their central repository.  We designed for this case early, but have not yet implemented something like an IIS plugin.

A few notable things we are NOT using (and maybe never will)

C++

Here's another blog article I need to write.  Basically, we only considered two choices:  C, and the C-like subset of C++.  We chose plain C.  I just wish the Microsoft C compiler supported C99.

Flash

Silverlight

GWT

We seriously considered other ways of building our web apps.  We ended up choosing basic HTML/CSS/jQuery/Ajax.  No regrets, but I sometimes wonder how things are going for people using GWT.

NSPR

APR

Sorry folks, in an apparent fit of NIH syndrome, we wrote our own portability layer. And I am completely unrepentant.

XML

JSON won.  What can I say?  I just like curly braces a lot more than angle brackets.

 

Veracity screenshot: Burndown Chart

Eric.Weblog() - Eric Sink - Thu, 07/15/2010 - 17:13

Yesterday I tried to describe Veracity in a thousand words.  Today, let's try a picture.

Veracity's distributed work item tracking feature is one of several things built on that "decentralized database" I mentioned.  This screenshot is Veracity displaying a burndown chart for a Scrum iteration.

The thing on the left is an activity stream.  It's a Twitter-like feature with other notifications mixed in, such as code checkins and comments on work items.

Veracity: The next step in DVCS

Eric.Weblog() - Eric Sink - Wed, 07/14/2010 - 15:00

One week from today, at the O'Reilly Open Source Convention, SourceGear will be making a big announcement.  Today I'm giving you an early preview.  We've been building something new.  :-)

  • It's called Veracity.
  • It's a Distributed Version Control System (DVCS), somewhat like Mercurial or Git.
  • It has some cool new capabilities no other DVCS has.
  • It will be open source, released one week from today under the Apache License, Version 2.0.

This project has been consuming the bulk of my time, and I am glad to finally be able to write and speak about it.  I'll have a lot to say going forward, but for today I just want to answer some questions we anticipate folks will be asking.

Why build yet another DVCS?

At OSCON next week we will be referring to Veracity as "the next step in DVCS".  This description may sound a bit audacious, but it describes exactly what we have built Veracity to be.

Git, Mercurial and Bazaar are all great, but we don't think they are the last word. This model of distributed development is the future of our industry. Things are just getting started.  We're building Veracity to push forward.

So let me try to explain how our vision is different from what is available from the popular DVCS tools today.

Please understand that my intent here is not to criticize existing tools or start a war with their fans (especially because Veracity needs to simmer a bit longer before it's ready). I simply know that the easiest way to explain something new is to compare it to something well-known.

Decentralized Database

Veracity goes beyond versioning of directories and files to provide management of records and fields, with full support for pushing, pulling and merging database changesets, just like source tree changesets.

Veracity's decentralized, template-driven database is used for all kinds of administrative data, including user accounts, tags, commit messages, and history. This database is also the platform on which we are building features like work item tracking.

User accounts

Existing DVCS tools have no real concept of user accounts. Enterprise customers need robust administration features like auditing and permissions. Veracity supports these features with a user system built on its decentralized database engine.

Pluggable storage layers

Veracity wraps all the actual storage of a repository in an API. This allows different implementations to offer different tradeoffs. For example, an organization may want to use an enterprise SQL database to store repository data on a central server, while developer desktop machines may use a simpler filesystem-oriented storage engine.  You can push and pull changesets across different storage implementations seamlessly.
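To make the idea concrete, here is a rough sketch of a storage API with two interchangeable implementations and a push that works across them.  It is Python rather than C, and every name in it (`BlobStore`, `MemoryStore`, `SqliteStore`, `push`) is hypothetical.  An in-memory dictionary stands in for the simple desktop engine, and SQLite stands in for the enterprise SQL database.

```python
import hashlib
import sqlite3
from abc import ABC, abstractmethod


def hash_id(data: bytes) -> str:
    """Identify a blob by the SHA1 of its contents."""
    return hashlib.sha1(data).hexdigest()


class BlobStore(ABC):
    """Hypothetical storage interface: callers see only blobs and ids."""

    @abstractmethod
    def put(self, data: bytes) -> str: ...

    @abstractmethod
    def get(self, hid: str) -> bytes: ...

    @abstractmethod
    def ids(self) -> set: ...


class MemoryStore(BlobStore):
    """Stand-in for a simple desktop storage engine."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        hid = hash_id(data)
        self._blobs[hid] = data
        return hid

    def get(self, hid):
        return self._blobs[hid]

    def ids(self):
        return set(self._blobs)


class SqliteStore(BlobStore):
    """Stand-in for a SQL-backed central server."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS blobs (hid TEXT PRIMARY KEY, data BLOB)"
        )

    def put(self, data):
        hid = hash_id(data)
        with self._db:
            self._db.execute(
                "INSERT OR IGNORE INTO blobs VALUES (?, ?)", (hid, data)
            )
        return hid

    def get(self, hid):
        row = self._db.execute(
            "SELECT data FROM blobs WHERE hid = ?", (hid,)
        ).fetchone()
        return bytes(row[0])

    def ids(self):
        return {r[0] for r in self._db.execute("SELECT hid FROM blobs")}


def push(src: BlobStore, dst: BlobStore) -> int:
    """Copy every blob that dst is missing; return how many were sent."""
    missing = src.ids() - dst.ids()
    for hid in missing:
        dst.put(src.get(hid))
    return len(missing)


desktop = MemoryStore()
server = SqliteStore()
hid = desktop.put(b"hello veracity")
sent = push(desktop, server)
```

Because `push` only talks to the interface, it neither knows nor cares which engine sits on either end.  A real push would move changesets and walk the DAG rather than comparing flat sets of ids.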

Hash functions

Just as with Mercurial and Git, Veracity identifies all repository objects using a cryptographic hash of the contents. Veracity supports SHA1 like current tools, but is ready for the future with full support for SHA2 and Skein, at 256 or even 512 bits.

Veracity's default hash is SHA1.  Our dogfooding repo is SHA2/256.
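Here is a small sketch of content-based identifiers with a selectable hash, using Python's hashlib.  The method names on the left of the table and the `blob_id` helper are invented for this example.  Skein is not in Python's standard library, so the sketch leaves it out.

```python
import hashlib

# Hash-method names mapped to hashlib constructors.  The names are made up
# for this sketch; Skein is omitted because hashlib does not provide it.
HASH_METHODS = {
    "SHA1/160": hashlib.sha1,
    "SHA2/256": hashlib.sha256,
    "SHA2/512": hashlib.sha512,
}


def blob_id(data: bytes, method: str = "SHA1/160") -> str:
    """Return the hex digest that serves as the object's identifier."""
    return HASH_METHODS[method](data).hexdigest()


content = b"hello, world\n"
id_sha1 = blob_id(content)
id_sha256 = blob_id(content, "SHA2/256")
id_sha512 = blob_id(content, "SHA2/512")
```

The same bytes always produce the same id under a given method, and the id gets longer as the hash gets wider: 40 hex digits for SHA1, 64 for SHA2/256, 128 for SHA2/512.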

Robust tracking for renames and directories

Like Bazaar, Veracity assigns every repository object an ID which remains constant when the object is renamed or moved to a different path. This handles the situation where a developer changes both the contents of a file and its path in the same transaction, and is a critical feature for robust merge operations.

Veracity also tracks directories as first-class repository objects, just like files.
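A toy sketch of why a permanent object ID helps.  The tree layout, the field names and the use of a random UUID as the ID are all assumptions for illustration, not Veracity's actual format.  Files and directories both get an ID when they are created, and a rename, a move and a content change all update the same entry.

```python
import uuid


def add_entry(tree, name, parent_id, content_hash, kind="file"):
    """Create an object with a permanent id; the id never changes afterwards."""
    gid = uuid.uuid4().hex
    tree[gid] = {
        "name": name,
        "parent": parent_id,
        "hash": content_hash,
        "kind": kind,
    }
    return gid


def path_of(tree, gid):
    """Rebuild the current path by walking parent ids up to the root."""
    parts = []
    while gid is not None:
        parts.append(tree[gid]["name"])
        gid = tree[gid]["parent"]
    return "/".join(reversed(parts))


tree = {}
src_dir = add_entry(tree, "src", None, None, kind="directory")
lib_dir = add_entry(tree, "lib", None, None, kind="directory")
file_id = add_entry(tree, "util.c", src_dir, "hash-v1")

before = path_of(tree, file_id)

# One transaction: move the file, rename it, and change its contents.
tree[file_id].update(name="utils.c", parent=lib_dir, hash="hash-v2")

after = path_of(tree, file_id)
```

Path and contents both changed, yet the entry is still found under the same ID, so a merge can line up the two versions without guessing from file similarity.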

Cross-Platform C

From the beginning, we wanted to make it easy to integrate Veracity into all kinds of other systems on a wide variety of platforms. So we wrote everything in C, with Windows, Mac OS and Linux all on equal footing. We love Python too, but C is a lowest common denominator that can be ported or integrated everywhere we need to go.

Apache License Version 2.0

Current DVCS tools do not yet have much penetration with enterprise customers.  This is largely due to lack of features and company infrastructure.  But even if Git or Mercurial were enterprise-ready in every other way, many companies will hesitate because of the GPL.

We chose the Apache License Version 2.0 (instead of the GPL) because we wanted there to be no obstacles for Veracity to be adopted in commercial and enterprise scenarios.

Open Source? How are you guys gonna make money?

The core of Veracity will be open source, but we do plan to sell add-on products built on the core.

Does this news mean you are abandoning Vault?

Heck no. Vault is like, 100% of our revenue. And there are still thousands of teams on SourceSafe that need to be rescued from their plight.  :-)

We looked hard at the notion of morphing Vault into a DVCS and decided it just isn't feasible.  If we had forced the square peg into the round hole, the result would either have fallen short of being a true DVCS or it would have been an incredibly painful upgrade for Vault customers.

Vault will continue to be supported and improved for centuries.

Is Veracity ready for people to actually use?

Not yet.

We are dogfooding Veracity here at SourceGear, but if anybody else tries to use it, they'll be frustrated.  File formats, command syntax and APIs are all still in flux.  We have a lot of stuff to finish up before we give it a 1.0 version number.

In the meantime, if you need a DVCS that is ready to use now, Mercurial, Git and Bazaar offer you three great choices.

How can I give feedback?

My blog currently does not have a comments feature, but I would still welcome feedback from anyone who has something to say.  If you want to say something privately, feel free to email me directly (eric@sourcegear.com).  Or you can use Twitter (eric_sink).

We'll be hosting a project mailing list which will be opened next week when the source is released.  And we'll have a "modern" website for the Veracity community a bit later.

Coming Soon...

Remember, this is open source stuff, so it's not real until the source is actually available.  That'll happen a week from today with the "official" announcement.  For now, I just wanted to let you know what's coming.

Going to OSCON

Eric.Weblog() - Eric Sink - Wed, 07/07/2010 - 22:37

Hey folks, I just wanted to let my readers know that I'll be at the O'Reilly Open Source Convention (OSCON) a couple weeks from now in Portland, Oregon.  SourceGear will have an exhibitor booth with, as usual, the very coolest free T-shirts.

Stop by and say hello!  :-)