
Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools


Architecture

Better guesswork for Product Owners

Xebia Blog - Thu, 11/17/2016 - 09:22
Estimation: if there is one concept that is hard to grasp in product development, it is when things will be done. By “done” I don’t mean the releasable increment from the iteration, but rather what will be in it – or, in Product Management speak, “what problem does it solve for our customer?” I am increasingly practicing randori …

The Story of Batching to Streaming Analytics at Optimizely

Our mission at Optimizely is to help decision makers turn data into action. This requires us to move data with speed and reliability. We track billions of user events, such as page views, clicks and custom events, on a daily basis. Providing our customers with immediate access to key business insights about their users has always been our top priority. Because of this, we are constantly innovating on our data ingestion pipeline.

In this article we describe how we transformed our data ingestion pipeline from batching to streaming in order to provide our customers with real-time session metrics.

Motivations 

Unification. Previously, we maintained two data stores for different use cases: HBase was used for computing Experimentation metrics, whereas Druid was used for calculating Personalization results. These two systems were developed with distinct requirements in mind:

Experimentation                Personalization
Instant event ingestion        Delayed event ingestion ok
Query latency in seconds       Query latency in subseconds
Visitor level metrics          Session level metrics

As our business requirements evolved, however, this setup quickly became difficult to scale. Maintaining a Druid + HBase Lambda architecture (see below) to satisfy these business needs became a technical burden for the engineering team. We needed a solution that reduces backend complexity and increases development productivity. More importantly, a unified counting infrastructure creates a generic platform for many of our future product needs.

Consistency. As mentioned above, the two counting infrastructures provide different metrics and computational guarantees. For example, Experimentation results show you the number of visitors who visited your landing page, whereas Personalization shows you the number of sessions instead. We want to bring consistent metrics to our customers and support both types of statistics across our products.

Real-time results. Our session-based results are computed using MR jobs, which can be delayed for hours after the events are received. A real-time solution will provide our customers with a more up-to-date view of their data.

Druid + HBase

In our earlier posts, we introduced our backend ingestion pipeline and how we use Druid and MR to store transactional stats based on user sessions. The biggest benefit we get from Druid is low-latency results at query time. However, it does come with its own set of drawbacks. For example, since segment files are immutable, it is impossible to incrementally update the indexes. As a result, we are forced to reprocess user events within a given time window if we need to fix certain data issues such as out-of-order events. In addition, we had difficulty scaling the number of dimensions and dimension cardinality, and queries spanning long periods of time became expensive.

On the other hand, we also use HBase for our visitor-based computation. We write each event into an HBase cell, which gives us maximum flexibility in terms of the kinds of queries we can run. When a customer needs to find out “how many unique visitors have triggered an add-to-cart conversion”, for example, we do a scan over the range of data for that experiment. Since events are pushed into HBase (through Kafka) in near real time, the data generally reflects the current state of the world. However, our current table schema does not aggregate any of the metadata associated with each event. This metadata includes a generic set of information such as browser type and geolocation details, as well as customer-specific tags used for customized data segmentation. The redundancy of this data prevents us from supporting a large number of custom segmentations, as it increases our storage cost and query scan time.
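To make the contrast with the batch approach concrete, here is a minimal, hypothetical sketch (plain Python, not Optimizely's actual pipeline) of how visitor- and session-level counts can be maintained incrementally as events stream in, instead of being recomputed by scans or MR jobs after the fact. The event fields and the 30-minute session timeout are assumptions for illustration.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed session gap, in seconds


class StreamingCounter:
    """Toy incremental aggregator: one instance per experiment/campaign."""

    def __init__(self):
        self.visitors = set()    # unique visitor ids
        self.sessions = 0        # session-level count
        self.last_seen = {}      # visitor id -> last event timestamp

    def process(self, event):
        """Update visitor- and session-level metrics from a single event.

        `event` is a dict with (assumed) fields: visitor_id, timestamp.
        In a real pipeline these events would arrive from Kafka.
        """
        vid, ts = event["visitor_id"], event["timestamp"]
        self.visitors.add(vid)
        last = self.last_seen.get(vid)
        if last is None or ts - last > SESSION_TIMEOUT:
            self.sessions += 1   # gap exceeded -> new session
        self.last_seen[vid] = ts

    def metrics(self):
        return {"unique_visitors": len(self.visitors), "sessions": self.sessions}


# Usage: feed events as they arrive; the metrics are always current.
counters = defaultdict(StreamingCounter)
for event in [
    {"experiment": "exp-1", "visitor_id": "v1", "timestamp": 0},
    {"experiment": "exp-1", "visitor_id": "v1", "timestamp": 10},
    {"experiment": "exp-1", "visitor_id": "v2", "timestamp": 2_000},
    {"experiment": "exp-1", "visitor_id": "v1", "timestamp": 5_000},  # new session for v1
]:
    counters[event["experiment"]].process(event)

print(counters["exp-1"].metrics())  # {'unique_visitors': 2, 'sessions': 3}
```

A production system would of course keep this state in a fault-tolerant store and might use probabilistic structures (e.g. HyperLogLog) for unique counts, but the shape of the computation is the same.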

SessionDB 
Categories: Architecture

How Urban Airship Scaled to 2.5 Billion Notifications During the U.S. Election

This is a guest post by Urban Airship. Contributors: Adam Lowry, Sean Moran, Mike Herrick, Lisa Orr, Todd Johnson, Christine Ciandrini, Ashish Warty, Nick Adlard, Mele Sax-Barnett, Niall Kelly, Graham Forest, and Gavin McQuillan

Urban Airship is trusted by thousands of businesses looking to grow with mobile. Urban Airship is a seven-year-old SaaS company with a freemium business model, so you can try it for free. For more information, visit www.urbanairship.com. Urban Airship now averages more than one billion push notifications delivered daily. This post highlights Urban Airship notification usage for the 2016 U.S. election, exploring the architecture of the system--the Core Delivery Pipeline--that delivers billions of real-time notifications for news publishers.

2016 U.S. Election

In the 24 hours surrounding Election Day, Urban Airship delivered 2.5 billion notifications—its highest daily volume ever. This is equivalent to 8 notifications per person in the United States or 1 notification for every active smartphone in the world. While Urban Airship powers more than 45,000 apps across every industry vertical, analysis of the election usage data shows that more than 400 media apps were responsible for 60% of this record volume, sending 1.5 billion notifications in a single day as election results were tracked and reported.

 

Notification volume was steady and peaked when the presidential election concluded:

Categories: Architecture

Stuff The Internet Says On Scalability For November 11th, 2016

Hey, it's HighScalability time:

 

Hacking recognition systems with fashion.

 

If you like this sort of Stuff then please support me on Patreon.
  • 9 teraflops: PC GPU performance for VR rendering; 1.75 million requests per second: DDoS attack from cameras; 5GB/mo: average data consumption in the US; ~59.2GB: size of Wikipedia corpus; 50%: slower LTE within the last year; 5.4 million: entries in Microsoft Concept Graph; 20 microseconds: average round-trip latencies between 250,000 machines using direct FPGA-to-FPGA messages (Microsoft); 1.09 billion: Facebook daily active mobile users; 300 minutes: soaring time for an AI controlled glider; 82ms: latency streaming game play on Azure; 

  • Quotable Quotes:
    • AORTA: Apple’s service revenue is now consistently greater than iPad and Mac revenue streams making it the number two revenue stream behind the gargantuan iPhone bucket.
    • @GeertHub: Apple R&D budget: $10 billion NASA science budget: $5 billion One explored Pluto, the other made a new keyboard.
    • Steve Jobs: tie all of our products together, so we further lock customers into our ecosystem
    • @moxie: I think these types of posts are also the inevitable result of people overestimating our organizational capacity based on whatever limited success Signal and Signal Protocol have had. It could be that the author imagines me sitting in a glass skyscraper all day, drinking out of champagne flutes, watching over an enormous engineering team as they add support for animated GIF search as an explicit fuck you to people with serious needs.
    • @jdegoes: Devs don't REALLY hate abstraction—they hate obfuscation. Abstraction discards irrelevant details, retaining an essence governed by laws.
    • @ewolff: There are no stateless applications. It just means state is on the client or in the database.
    • @mjpt777: Pushing simple logic down into the memory controllers is the only way to overcome the bandwidth bottleneck. I'm glad to see it begin.
    • @gigastacey: Moral of @0xcharlie car hacking talk appears to be don't put actuators on the internet w/out thinking about security. #ARMTechCon
    • @markcallaghan: When does MySQL become too slow for analytics? Great topic, maybe hard to define but IO-bound index nested loops join isn't fast.
    • @iAnimeshS: A year's computing on the old Macintosh portable can now be processed in just 5 seconds on the #NewMacBookPro. #AppleEvent
    • @neil_conway: OH: "My philosophy for writing C++ is the same as for using Git: 'I stay in my damn lane.'"
    • qnovo: Yet as big as this figure sounds, and it is big, only 3 gallons of gasoline (11 liters) pack the same amount of energy. Whereas the Tesla battery weighs about 1300 lbs (590 kg), 3 gallons of gasoline weigh a mere 18 lbs (8 kg). This illustrates the concept of energy density: a lithium-ion battery is 74X less dense than gasoline.
    • @kelseyhightower: I'm willing to bet developers spend more time reverse engineering inadequate API documentation than implementing business logic.
    • @sgmansfield: OH: our ci server continues to run out of inodes because each web site uses ~140,000 files in node_modules
    • @relix42: “We use maven to download half the internet and npm to get the other half…”
    • NEIL IRWIN: economic expansions do not die of old age—an old expansion like our current one is not likelier to enter a recession in the next year than a young expansion.
    • @popey: I am in 6 slack channels. 1.5GB RAM consumed by the desktop app. In 100+ IRC channels. 25MB consumed by irssi. The future is rubbish.
    • @SwiftOnSecurity: The only way to improve the security of these IoT devices is market forces. They must not be allowed to profit without fear of repercussions
    • The Ancient One: you think you know how the world works. What if I told you, through the mystic arts, we harness energy and shape reality?
    • @natpryce: "If you have four groups working on a compiler*, you'll get a four-pass compiler" *and you describe the problem in terms of passes
    • @PatrickMcFadin: Free cloud APIs are closing up as investors start looking for a return. Codebender is closing down 
    • We have quotes the likes of which even god has never seen. Read the full article to see them all.

  • The true program is the programmer. Ralph Waldo Emerson: “The true poem is the poet's mind; the true ship is the ship-builder. In the man, could we lay him open, we should see the reason for the last flourish and tendril of his work; as every spine and tint in the sea-shell preexist in the secreting organs of the fish.”

  • Who would have thought something like this was possible? A Regex that only matches itself. As regexes go it's not even all that weird looking. One of the comments asks for a proof of why it works. That would be interesting.

  • Docker in Production: A History of Failure. Generated a lot of heat and some light. Good comments on HN and on reddit (two threads). A lot of the comments say yes, there are problems with Docker, but end up saying something like...tzaman: That's odd, we've been using Docker for about a year in development and half a year in production (on Google Container engine / Kubernetes) and haven't experienced any of the panics, crashes yet (at least not any we could not attribute as a failure on our end).

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Categories: Architecture

Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, Gusto, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?
  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

  • IT Security Engineering. At Gusto we are on a mission to create a world where work empowers a better life. As Gusto's IT Security Engineer you'll shape the future of IT security and compliance. We're looking for a strong IT technical lead to manage security audits and write and implement controls. You'll also focus on our employee, network, and endpoint posture. As Gusto's first IT Security Engineer, you will be able to build the security organization with direct impact to protecting PII and ePHI. Read more and apply here.
Fun and Informative Events
  • Your event here!
Cool Products and Services
  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips.  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a .NET-native in-memory database for analysing large amounts of data. It runs natively on .NET, and provides native .NET, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Categories: Architecture

The QuickBooks Platform

This is a guest post by Siddharth Ram – Chief Architect, Small Business. Siddharth_ram@intuit.com.

The QuickBooks ecosystem is the largest small business SaaS product. The QuickBooks Platform supports bookkeeping, payroll and payment solutions for small businesses, their customers and accountants worldwide. Since QuickBooks is also a compliance & tax filing platform, consistency in reporting is extremely important. Financial reporting requires flexibility in queries – a given report may have dozens of different dimensions that can be tweaked. Collaboration requires multiple edits by employees, accountants and business owners at the same time, leading to potential conflicts. All this leads to solving interesting scaling problems at Intuit.

Solving for scalability requires thinking on multiple time horizons and axes. Scaling is not just about scaling software – it is also about people scalability, process scalability and culture scalability. All these axes are actively worked on at Intuit. Our goal with employees is to create an atmosphere that allows them to do the best work of their lives.

Background
Categories: Architecture

Building on the shoulders of giants: microservices as a redesign strategy

Xebia Blog - Fri, 11/04/2016 - 20:10
With the rise of new-IT-backed companies in almost every segment, from retail to financial institutions, more traditional companies are often forced into change-or-perish strategies. The business strengths of newer competitors are often reinforced by strong, serial startup developers who are able to integrate the experience of previous failures into completely new stacks. Older companies’ …

AzureCon Keynote Announcements: India Regions, GPU Support, IoT Suite, Container Service, and Security Center

ScottGu's Blog - Scott Guthrie - Thu, 10/01/2015 - 06:43

Yesterday we held our AzureCon event and were fortunate to have tens of thousands of developers around the world participate.  During the event we announced several great new enhancements to Microsoft Azure including:

  • General Availability of 3 new Azure regions in India
  • Announcing new N-series of Virtual Machines with GPU capabilities
  • Announcing Azure IoT Suite available to purchase
  • Announcing Azure Container Service
  • Announcing Azure Security Center

We were also fortunate to be joined on stage by several great Azure customers who talked about their experiences using Azure including: Jet.com, Nascar, Alaska Airlines, Walmart, and ThyssenKrupp.

Watching the Videos

All of the talks presented at AzureCon (including the 60 breakout talks) are now available to watch online.  You can browse and watch all of the sessions here.

image

My keynote to kick off the event was an hour long and provided an end-to-end look at Azure and some of the big new announcements of the day.  You can watch it here.

Below are some more details of some of the highlights:

Announcing General Availability of 3 new Azure regions in India

Yesterday we announced the general availability of our new India regions: Mumbai (West), Chennai (South) and Pune (Central).  They are now available for you to deploy solutions into.

This brings our worldwide presence of Azure regions up to 24 regions, more than AWS and Google combined. Over 125 customers and partners have been participating in the private preview of our new India regions.  We are seeing tremendous interest from industry sectors like Public Sector, Banking Financial Services, Insurance and Healthcare whose cloud adoption has been restricted by data residency requirements.  You can all now deploy your solutions too.

Announcing N-series of Virtual Machines with GPU Support

This week we announced our new N-series family of Azure Virtual Machines that enable GPU capabilities.  Featuring NVidia’s best of breed Tesla GPUs, these Virtual Machines will help you run a variety of workloads ranging from remote visualization to machine learning to analytics.

The N-series VMs feature NVidia’s flagship GPU, the K80 which is well supported by NVidia’s CUDA development community. N-series will also have VM configurations featuring the latest M60 which was recently announced by NVidia. With support for M60, Azure becomes the first hyperscale cloud provider to bring the capabilities of NVidia’s Quadro High End Graphics Support to the cloud. In addition, N-series combines GPU capabilities with the superfast RDMA interconnect so you can run multi-machine, multi-GPU workloads such as Deep Learning and Skype Translator Training.

Announcing Azure Security Center

This week we announced the new Azure Security Center—a new Azure service that gives you visibility and control of the security of your Azure resources, and helps you stay ahead of threats and attacks.  Azure is the first cloud platform to provide unified security management with capabilities that help you prevent, detect, and respond to threats.

image

The Azure Security Center provides a unified view of your security state, so your team and/or your organization’s security specialists can get the information they need to evaluate risk across the workloads they run in the cloud.  Based on customizable policy, the service can provide recommendations. For example, the policy might be that all web applications should be protected by a web application firewall. If so, the Azure Security Center will automatically detect when web apps you host in Azure don’t have a web application firewall configured, and provide a quick and direct workflow to get a firewall from one of our partners deployed and configured:

image
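To make the policy-to-recommendation flow concrete, here is a toy sketch (plain Python, not the actual Security Center engine or API) of how a customizable policy such as “all web applications must sit behind a web application firewall” can be evaluated against an inventory of resources to produce recommendations. The resource fields and policy structure are invented for illustration.

```python
# Hypothetical inventory of web apps; `has_waf` is an invented attribute.
web_apps = [
    {"name": "storefront", "has_waf": True},
    {"name": "admin-portal", "has_waf": False},
    {"name": "api-gateway", "has_waf": False},
]

# A "policy" here is just a predicate plus the recommendation to emit on violation.
policies = [
    {
        "id": "require-waf",
        "applies": lambda app: not app["has_waf"],
        "recommendation": "Deploy a web application firewall in front of this app",
    },
]

def evaluate(apps, policies):
    """Return one recommendation per (app, violated policy) pair."""
    findings = []
    for app in apps:
        for policy in policies:
            if policy["applies"](app):
                findings.append((app["name"], policy["recommendation"]))
    return findings

for name, recommendation in evaluate(web_apps, policies):
    print(f"{name}: {recommendation}")
```

The real service layers customizable policy, resource discovery and partner integrations on top of this basic idea; the sketch only shows the rule-evaluation shape described above.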

Of course, even with the best possible protection in place, attackers will still try to compromise systems. To address this problem and adopt an “assume breach” mindset, the Azure Security Center uses advanced analytics, including machine learning, along with Microsoft’s global threat intelligence network to look for and alert on attacks. Signals are automatically collected from your Azure resources, the network, and integrated security partner solutions and analyzed to identify cyber-attacks that might otherwise go undetected. Should an incident occur, security alerts offer insights into the attack and suggest ways to remediate and recover quickly. Security data and alerts can also be piped to existing Security Information and Events Management (SIEM) systems your organization has already purchased and is using on-premises.

image

No other cloud vendor provides the depth and breadth of these capabilities, and they are going to enable you to build even more secure applications in the cloud.

Announcing Azure IoT Suite Available to Purchase

The Internet of Things (IoT) provides tremendous new opportunities for organizations to improve operations, become more efficient at what they do, and create new revenue streams.  We have had a huge interest in our Azure IoT Suite which until this week has been in public preview.  Our customers like Rockwell Automation and ThyssenKrupp Elevators are already connecting data and devices to solve business problems and improve their operations. Many more businesses are poised to benefit from IoT by connecting their devices to collect and analyze untapped data with remote monitoring or predictive maintenance solutions.

In working with customers, we have seen that getting started on IoT projects can be a daunting task starting with connecting existing devices, determining the right technology partner to work with and scaling an IoT project from proof of concept to broad deployment. Capturing and analyzing untapped data is complex, particularly when a business tries to integrate this new data with existing data and systems they already have. 

The Microsoft Azure IoT Suite helps address many of these challenges.  The Microsoft Azure IoT Suite helps you connect and integrate with devices more easily, and capture and analyze untapped device data by using our preconfigured solutions, which are engineered to help you move quickly from proof of concept and testing to broader deployment. Today we support remote monitoring, and soon we will be delivering support for predictive maintenance and asset management solutions.

These solutions reliably capture data in the cloud and analyze the data both in real time and in batch. Once your devices are connected, Azure IoT Suite provides real-time information in an intuitive format that helps you take action from insights. Our advanced analytics then enable you to easily process data, even when it comes from a variety of sources including devices, line-of-business assets, sensors and other systems, and provide rich built-in dashboards and analytics tools for access to the data and insights you need. User permissions can be set to control reporting and share information with the right people in your organization.

Below is an example of the types of built-in dashboard views that you can leverage without having to write any code:

image

To support adoption of the Azure IoT Suite, we are also announcing the new Microsoft Azure Certified for IoT program, an ecosystem of partners whose offerings have been tested and certified to help businesses with their IoT device and platform needs. The first set of partners include Beaglebone, Freescale, Intel, Raspberry Pi, Resin.io, Seeed and Texas Instruments. These partners, along with experienced global solution providers are helping businesses harness the power of the Internet of Things today.  

You can learn more about our approach and the Azure IoT Suite at www.InternetofYourThings.com and partners can learn more at www.azure.com/iotdev.

Announcing Azure IoT Hub

This week we also announced the public preview of our new Azure IoT Hub service which is a fully managed service that enables reliable and secure bi-directional communications between millions of IoT devices and an application back end. Azure IoT Hub offers reliable device-to-cloud and cloud-to-device hyper-scale messaging, enables secure communications using per-device security credentials and access control, and includes device libraries for the most popular languages and platforms.

Providing secure, scalable bi-directional communication from heterogeneous devices to the cloud is a cornerstone of any IoT solution, which Azure IoT Hub addresses in the following ways:

  • Per-device authentication and secure connectivity: Each device uses its own security key to connect to IoT Hub. The application back end is then able to individually whitelist and blacklist each device, enabling complete control over device access.
  • Extensive set of device libraries: Azure IoT device SDKs are available and supported for a variety of languages and platforms such as C, C#, Java, and JavaScript.
  • IoT protocols and extensibility: Azure IoT Hub provides native support of the HTTP 1.1 and AMQP 1.0 protocols for device connectivity. Azure IoT Hub can also be extended via the Azure IoT protocol gateway open source framework to provide support for MQTT v3.1.1.
  • Scale: Azure IoT Hub scales to millions of simultaneously connected devices, and millions of events per second.

Getting started with Azure IoT Hub is easy. Simply navigate to the Azure Preview portal and select Internet of Things -> Azure IoT Hub. Choose the name, pricing tier, number of units and location, and select Create to provision and deploy your IoT Hub:

image

Once the IoT hub is created, you can navigate to Settings and create new shared access policies and modify other messaging settings for granular control.

The bi-directional communication enabled with an IoT Hub provides powerful capabilities in a real world IoT solution such as the control of individual device security credentials and access through the use of a device identity registry.  Once a device identity is in the registry, the device can connect, send device-to-cloud messages to the hub, and receive cloud-to-device messages from backend applications with just a few lines of code in a secure way.
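As a rough illustration of the per-device credential model described above, here is a minimal sketch (plain Python, standard library only) of generating a device-scoped shared access signature of the kind IoT Hub uses. The hub name, device id and key below are placeholders, and a real device would normally rely on one of the device SDKs listed above rather than hand-rolling tokens; treat the exact endpoints and API versions as details to confirm in the documentation.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def device_sas_token(hub_name, device_id, device_key_b64, ttl_seconds=3600):
    """Build a SAS token scoped to a single device identity (illustrative sketch).

    hub_name, device_id and device_key_b64 are hypothetical placeholders;
    the key is the base64 device key from the IoT Hub identity registry.
    """
    resource_uri = f"{hub_name}.azure-devices.net/devices/{device_id}"
    expiry = int(time.time()) + ttl_seconds
    # Sign "<url-encoded resource URI>\n<expiry>" with the base64-decoded device key.
    to_sign = f"{urllib.parse.quote_plus(resource_uri)}\n{expiry}".encode("utf-8")
    key = base64.b64decode(device_key_b64)
    signature = base64.b64encode(hmac.new(key, to_sign, hashlib.sha256).digest())
    return (
        "SharedAccessSignature "
        f"sr={urllib.parse.quote_plus(resource_uri)}"
        f"&sig={urllib.parse.quote_plus(signature.decode())}"
        f"&se={expiry}"
    )

# The token is then presented when the device sends device-to-cloud messages
# (as the Authorization header over HTTP, or the equivalent field over AMQP/MQTT).
print(device_sas_token("myhub", "device-001", base64.b64encode(b"not-a-real-key").decode()))
```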

Learn more about Azure IoT Hub and get started with your own real world IoT solutions.

Announcing the new Azure Container Service

We’ve been working with Docker to integrate Docker containers with both Azure and Windows Server for some time. This week we announced the new Azure Container Service which leverages the popular Apache Mesos project to deliver a customer proven orchestration solution for applications delivered as Docker containers.

image

The Azure Container Service enables users to easily create and manage a Docker enabled Apache Mesos cluster. The container management software running on these clusters is open source, and in addition to the application portability offered by tooling such as Docker and Docker Compose, you will be able to leverage portable container orchestration and management tooling such as Marathon, Chronos and Docker Swarm.

When utilizing the Azure Container Service, you will be able to take advantage of the tight integration with Azure infrastructure management features such as tagging of resources, Role Based Access Control (RBAC), Virtual Machine Scale Sets (VMSS) and the fully integrated user experience in the Azure portal. By coupling the enterprise-class Azure cloud with key open source build, deploy and orchestration software, we maximize customer choice when it comes to containerized workloads.

The service will be available for preview by the end of the year.

Learn More

Watch the AzureCon sessions online to learn more about all of the above announcements – plus a lot more that was covered during the day.  We are looking forward to seeing what you build with what you learn!

Hope this helps,

Scott

Categories: Architecture, Programming

Announcing General Availability of HDInsight on Linux + new Data Lake Services and Language

ScottGu's Blog - Scott Guthrie - Mon, 09/28/2015 - 21:54

Today, I’m happy to announce several key additions to our big data services in Azure, including the General Availability of HDInsight on Linux, as well as the introduction of our new Azure Data Lake and Language services.

General Availability of HDInsight on Linux

Today we are announcing general availability of our HDInsight service on Ubuntu Linux.  HDInsight enables you to easily run managed Hadoop clusters in the cloud.  With today’s release we now allow you to configure these clusters to run using either a Windows Server Operating System or an Ubuntu-based Linux Operating System.

HDInsight on Linux enables even broader support for Hadoop ecosystem partners to run in HDInsight, providing you an even greater choice of preferred tools and applications for running Hadoop workloads. Both Linux and Windows clusters in HDInsight are built on the same standard Hadoop distribution and offer the same set of rich capabilities.

Today’s new release also enables additional capabilities, such as cluster scaling, virtual network integration and script action support. Furthermore, in addition to the Hadoop cluster type, you can now create HBase and Storm clusters on Linux for your NoSQL and real-time processing needs, such as building an IoT application.

Create a cluster

HDInsight clusters running using Linux can now be easily created from the Azure Management portal under the Data + Analytics section.  Simply select Ubuntu from the cluster operating system drop-down, as well as optionally choose the cluster type you wish to create (we support base Hadoop as well as clusters pre-configured for workloads like Storm, Spark, HBase, etc).

image

All HDInsight Linux clusters can be managed by Apache Ambari. Ambari provides the ability to customize configuration settings of your Hadoop cluster while giving you a unified view of the performance and state of your cluster and providing monitoring and alerting within the HDInsight cluster.

image

Installing additional applications and Hadoop components

Similar to HDInsight Windows clusters, you can now customize your Linux cluster by installing additional applications or Hadoop components that are not part of default HDInsight deployment. This can be accomplished using Bash scripts with script action capability.  As an example, you can now install Hue on an HDInsight Linux cluster and easily use it with your workloads:

image

Develop using Familiar Tools

All HDInsight Linux clusters come with SSH connectivity enabled by default. You can connect to the cluster via an SSH client of your choice. Moreover, SSH tunneling can be leveraged to remotely access all of the Hadoop web applications from the browser.

image

New Azure Data Lake Services and Language

We continue to see customers enabling amazing scenarios with big data in Azure including analyzing social graphs to increase charitable giving, analyzing radiation exposure and using the signals from thousands of devices to simulate ways for utility customers to optimize their monthly bills. These and other use cases are resulting in even more data being collected in Azure. In order to be able to dive deep into all of this data, and process it in different ways, you can now use our Azure Data Lake capabilities – which are 3 services that make big data easy.

The first service in the family is available today: Azure HDInsight, our managed Hadoop service that lets you focus on finding insights, and not spend your time having to manage clusters. HDInsight lets you deploy Hadoop, Spark, Storm and HBase clusters, running on Linux or Windows, managed, monitored and supported by Microsoft with a 99.9% SLA.

The other two services, Azure Data Lake Store and Azure Data Lake Analytics introduced below, are available in private preview today and will be available broadly for public usage shortly.

Azure Data Lake Store

Azure Data Lake Store is a hyper-scale HDFS repository designed specifically for big data analytics workloads in the cloud. Azure Data Lake Store solves the big data challenges of volume, variety, and velocity by enabling you to store data of any type, at any size, and process it at any scale. Azure Data Lake Store can support near real-time scenarios such as the Internet of Things (IoT) as well as throughput-intensive analytics on huge data volumes. The Azure Data Lake Store also supports a variety of computation workloads by removing many of the restrictions constraining traditional analytics infrastructure like the pre-definition of schema and the creation of multiple data silos. Once located in the Azure Data Lake Store, Hadoop-based engines such as Azure HDInsight can easily mine the data to discover new insights.

Some of the key capabilities of Azure Data Lake Store include:

  • Any Data: A distributed file store that allows you to store data in its native format, Azure Data Lake Store eliminates the need to transform or pre-define schema in order to store data.
  • Any Size: With no fixed limits to file or account sizes, Azure Data Lake Store enables you to store kilobytes to exabytes with immediate read/write access.
  • At Any Scale: You can scale throughput to meet the demands of your analytic systems including the high throughput needed to analyze exabytes of data. In addition, it is built to handle high volumes of small writes at low latency making it optimal for near real-time scenarios like website analytics, and Internet of Things (IoT).
  • HDFS Compatible: It works out-of-the-box with the Hadoop ecosystem including other Azure Data Lake services such as HDInsight.
  • Fully Integrated with Azure Active Directory: Azure Data Lake Store is integrated with Azure Active Directory for identity and access management over all of your data.
Azure Data Lake Analytics with U-SQL

The new Azure Data Lake Analytics service makes it much easier to create and manage big data jobs. Built on YARN and years of experience running analytics pipelines for Office 365, XBox Live, Windows and Bing, the Azure Data Lake Analytics service is the most productive way to get insights from big data. You can get started in the Azure management portal, querying across data in blobs, Azure Data Lake Store, and Azure SQL DB. By simply moving a slider, you can scale up as much computing power as you’d like to run your data transformation jobs.

image

Today we are introducing a new U-SQL offering in the analytics service, an evolution of the familiar syntax of SQL.  U-SQL allows you to write declarative big data jobs, as well as easily include your own user code as part of those jobs. Inside Microsoft, developers have been using this combination in order to be productive operating on massive data sets of many exabytes of scale, processing mission critical data pipelines. In addition to providing an easy to use experience in the Azure management portal, we are delivering a rich set of tools in Visual Studio for debugging and optimizing your U-SQL jobs. This lets you play back and analyze your big data jobs, understanding bottlenecks and opportunities to improve both performance and efficiency, so that you can pay only for the resources you need and continually tune your operations.

image

Learn More

For more information and to get started, check out the following links:

Hope this helps,

Scott

Categories: Architecture, Programming

Online AzureCon Conference this Tuesday

ScottGu's Blog - Scott Guthrie - Mon, 09/28/2015 - 04:35

This Tuesday, Sept 29th, we are hosting our online AzureCon event – which is a free online event with 60 technical sessions on Azure presented by both the Azure engineering team as well as MVPs and customers who use Azure today and will share their best practices.

I’ll be kicking off the event with a keynote at 9am PDT.  Watch it to learn the latest on Azure, and hear about a lot of exciting new announcements.  We’ll then have some fantastic sessions that you can watch throughout the day to learn even more.

image

Hope to see you there!

Scott

Categories: Architecture, Programming

Better Density and Lower Prices for Azure’s SQL Elastic Database Pools

ScottGu's Blog - Scott Guthrie - Wed, 09/23/2015 - 21:41

A few weeks ago, we announced the preview availability of the new Basic and Premium Elastic Database Pool tiers with our Azure SQL Database service.  Elastic Database Pools enable you to run multiple, isolated and independent databases that can be automatically scaled across a private pool of resources dedicated to just you and your apps.  This provides a great way for software-as-a-service (SaaS) developers to better isolate their individual customers in an economical way.

Today, we are announcing some nice changes to the pricing structure of Elastic Database Pools as well as changes to the density of elastic databases within a pool.  These changes make it even more attractive to use Elastic Database Pools to build your applications.

Specifically, we are making the following changes:

  • Finalizing the eDTU price – With Elastic Database Pools you purchase units of capacity called eDTUs – which you can then use to run multiple databases within a pool.  We have decided not to increase the price of eDTUs as we go from preview to GA.  This means that you’ll be able to pay a much lower price (about 50% less) for eDTUs than many developers expected.
  • Eliminating the per-database fee – In addition to lower eDTU prices, we are also eliminating the per-database fee that we have had during the preview. This means you no longer need to pay a per-database charge to use an Elastic Database Pool, which makes the pricing much more attractive for scenarios where you want to have lots of small databases.
  • Pool density – We are announcing increased density limits that enable you to run many more databases per Elastic Database Pool. See the chart below under “Maximum databases per pool” for specifics. This change will take effect at the time of general availability, but you can design your apps around these numbers.  The increased pool density limits will make Elastic Database Pools even more attractive.

image

 

Below are the updated parameters for each of the Elastic Database Pool options with these new changes:

image

For more information about Azure SQL Database Elastic Database Pools and management tools, see the technical overview here.

Hope this helps,

Scott

Categories: Architecture, Programming

Announcing the Biggest VM Sizes Available in the Cloud: New Azure GS-VM Series

ScottGu's Blog - Scott Guthrie - Wed, 09/02/2015 - 18:51

Today, we’re announcing the release of the new Azure GS-series of Virtual Machine sizes, which enable Azure Premium Storage to be used with Azure G-series VM sizes. These VM sizes are now available to use in both our US and Europe regions.

Earlier this year we released the G-series of Azure Virtual Machines – which provide the largest VM sizes of any public cloud provider.  They provide up to 32 cores of CPU, 448 GB of memory and 6.59 TB of local SSD-based storage.  Today’s release of the GS-series of Azure Virtual Machines enables you to now use these large VMs with Azure Premium Storage – and enables you to perform up to 2,000 MB/sec of storage throughput, more than double that of any other public cloud provider.  Using the G5/GS5 VM size now also offers more than 20 Gbps of network bandwidth, also more than double the network throughput provided by any other public cloud provider.

These new VM offerings provide an ideal solution for your most demanding cloud-based workloads, and are great for relational databases like SQL Server, MySQL and PostgreSQL, as well as other large data warehouse solutions. You can also use the GS-series to significantly scale up the performance of enterprise applications like Dynamics AX.

The G and GS-series of VM sizes are available to use now in our West US, East US-2, and West Europe Azure regions.  You’ll see us continue to expand availability around the world in more regions in the coming months.

GS Series Size Details

The below table provides more details on the exact capabilities of the new GS-series of VM sizes:

Size           Cores   Memory (GB)   Max Disk IOPS   Max Disk Bandwidth (MB per second)
Standard_GS1   2       28            5,000           125
Standard_GS2   4       56            10,000          250
Standard_GS3   8       112           20,000          500
Standard_GS4   16      224           40,000          1,000
Standard_GS5   32      448           80,000          2,000

Creating a GS-Series Virtual Machine

Creating a new GS series VM is very easy.  Simply navigate to the Azure Preview Portal, select New(+) and choose your favorite OS or VM image type:

image

Click the Create button, and then click the pricing tier option and select “View All” to see the full list of VM sizes. Make sure your region is West US, East US 2, or West Europe to select the G-series or the GS-Series:

image

When choosing a GS-series VM size, the portal will create a storage account using Premium Azure Storage. You can select an existing Premium Storage account, as well, to use for the OS disk of the VM:

image

Hitting Create will launch and provision the VM.

Learn More

If you would like more information on the GS-Series VM sizes as well as other Azure VM Sizes then please visit the following page for additional details: Virtual Machine Sizes for Azure.

For more information on Premium Storage, please see: Premium Storage overview. Also, refer to Using Linux VMs with Premium Storage for more details on Linux deployments on Premium Storage.

Hope this helps,

Scott

Categories: Architecture, Programming

Announcing Great New SQL Database Capabilities in Azure

ScottGu's Blog - Scott Guthrie - Thu, 08/27/2015 - 17:13

Today we are making available several new SQL Database capabilities in Azure that enable you to build even better cloud applications.  In particular:

  • We are introducing two new pricing tiers for our  Elastic Database Pool capability.  Elastic Database Pools enable you to run multiple, isolated and independent databases on a private pool of resources dedicated to just you and your apps.  This provides a great way for software-as-a-service (SaaS) developers to better isolate their individual customers in an economical way.
  • We are also introducing new higher-end scale options for SQL Databases that enable you to run even larger databases with significantly more compute + storage + networking resources.

Both of these additions are available to start using immediately.

Elastic Database Pools

If you are a SaaS developer with tens, hundreds, or even thousands of databases, an elastic database pool dramatically simplifies the process of creating, maintaining, and managing performance across these databases within a budget that you control. 

image

A common SaaS application pattern (especially for B2B SaaS apps) is for the SaaS app to use a different database to store data for each customer.  This has the benefit of isolating the data for each customer separately (and enables each customer’s data to be encrypted separately, backed up separately, etc.).  While this pattern is great from an isolation and security perspective, each database can end up having varying and unpredictable resource consumption (CPU/IO/Memory patterns), and because the peaks and valleys for each customer might be difficult to predict, it is hard to know how much resource to provision.  Developers were previously faced with two options: either over-provision database resources based on peak usage and overpay, or under-provision to save cost at the expense of performance and customer satisfaction during peaks.

Microsoft created elastic database pools specifically to help developers solve this problem.  With Elastic Database Pools you can allocate a shared pool of database resources (CPU/IO/Memory), and then create and run multiple isolated databases on top of this pool.  You can set minimum and maximum performance SLA limits of your choosing for each database you add into the pool (ensuring that none of the databases unfairly impacts other databases in your pool).  Our management APIs also make it much easier to script and manage these multiple databases together, as well as optionally execute queries that span across them (useful for a variety of operations).  And best of all, when you add multiple databases to an Elastic Database Pool, you are able to average out the typical utilization load (because each of your customers tends to have different peaks and valleys) and end up requiring far fewer database resources (and spend less money as a result) than you would if you ran each database separately.

The below chart shows a typical example of what we see when SaaS developers take advantage of the Elastic Pool capability.  Each individual database they have has different peaks and valleys in terms of utilization.  As you combine multiple of these databases into an Elastic Pool, the peaks and valleys tend to normalize out (since they often happen at different times) to require much less overall resources than you would need if each database was resourced separately:

databases sharing eDTUs
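The economics described above are easy to sanity-check with a toy calculation. The sketch below uses entirely made-up per-tenant loads (not real data or real eDTU prices) and compares provisioning every database for its own peak against provisioning one pool for the combined peak, which is typically much lower because tenants peak at different times.

```python
import random

random.seed(42)

HOURS = 24
NUM_DBS = 20

# Fake per-hour eDTU demand: each tenant idles most of the day and
# spikes during its own (randomly chosen) busy hour.
def tenant_load():
    busy_hour = random.randrange(HOURS)
    return [50 if h == busy_hour else random.randint(1, 5) for h in range(HOURS)]

loads = [tenant_load() for _ in range(NUM_DBS)]

# Standalone: every database must be sized for its own peak.
standalone_edtus = sum(max(load) for load in loads)

# Pooled: the pool only has to cover the peak of the *summed* load.
combined = [sum(load[h] for load in loads) for h in range(HOURS)]
pooled_edtus = max(combined)

print(f"eDTUs if provisioned separately: {standalone_edtus}")
print(f"eDTUs if pooled:                 {pooled_edtus}")
```

With these assumed numbers the standalone total is 20 databases x 50 eDTUs = 1,000 eDTUs, while the pooled peak comes out far lower, which is exactly the normalization effect the chart illustrates.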

Because Elastic Database Pools are built using our SQL Database service, you also get to take advantage of all of the underlying database as a service capabilities that are built into it: 99.99% SLA, multiple-high availability replica support built-in with no extra charges, no down-time during patching, geo-replication, point-in-time recovery, TDE encryption of data, row-level security, full-text search, and much more.  The end result is a really nice database platform that provides a lot of flexibility, as well as the ability to save money.

New Basic and Premium Tiers for Elastic Database Pools

Earlier this year at the //Build conference we announced our new Elastic Database Pool support in Azure and entered public preview with the Standard Tier edition of it.  The Standard Tier allows individual databases within the elastic pool to burst up to 100 eDTUs (a DTU represents a combination of Compute + IO + Storage performance) for performance. 

Today we are adding additional Basic and Premium Elastic Database Pools to the preview to enable a wider range of performance and cost options.

  • Basic Elastic Database Pools are great for light-usage SaaS scenarios.  Basic Elastic Database Pools allow individual database performance bursts of up to 5 eDTUs.
  • Premium Elastic Database Pools are designed for databases that require the highest performance per database. Premium Elastic Database Pools allow individual database performance bursts of up to 1,000 eDTUs.

Collectively we think these three Elastic Database Pool pricing tier options provide a tremendous amount of flexibility and optionality for SaaS developers to take advantage of, and are designed to enable a wide variety of different scenarios.

Easily Migrate Databases Between Pricing Tiers

One of the cool capabilities we support is the ability to easily migrate an individual database between different Elastic Database Pools (including ones with different pricing tiers).  For example, if you were a SaaS developer you could start a customer out with a trial edition of your application – and choose to run the database that backs it within a Basic Elastic Database Pool to run it super cost effectively.  As the customer’s usage grows you could then auto-migrate them to a Standard database pool without customer downtime.  If the customer grows up to require a tremendous amount of resources you could then migrate them to a Premium Database Pool or run their database as a standalone SQL Database with a huge amount of resource capacity.

This provides a tremendous amount of flexibility and capability, and enables you to build even better applications.

Managing Elastic Database Pools

One of the other nice things about Elastic Database Pools is that the service provides the management capabilities to easily manage large collections of databases without you having to worry about the infrastructure that runs them.

You can create and manage Elastic Database Pools using our Azure Management Portal or via our command-line tools or REST management APIs.  With today’s update we are also adding support so that you can use T-SQL to add/remove new databases to/from an elastic pool.  Today’s update also adds T-SQL support for measuring resource utilization of databases within an elastic pool – making it even easier to monitor and track utilization by database.

image

Elastic Database Pool Tier Capabilities

During the preview, we have been tuning, and will continue to tune, a number of parameters that control the density of Elastic Database Pools.

In particular, the current limits for the number of databases per pool and the number of pool eDTUs is something we plan to steadily increase as we march towards the general availability release.  Our plan is to provide the highest possible density per pool, largest pool sizes, and the best Elastic Database Pool economics while at the same time keeping our 99.99 availability SLA.

Below are the current performance parameters for each of the Elastic Database Pool Tier options in preview today:

 

Basic Elastic Database Pool
  • eDTU range per pool (preview limits): 100-1200 eDTUs
  • Storage range per pool: 10-120 GB
  • Maximum databases per pool (preview limits): 200
  • Estimated monthly pool and add-on eDTU costs (preview prices): starting at $0.2/hr (~$149/pool/mo); each additional eDTU $0.002/hr (~$1.49/mo)
  • Storage per eDTU: 0.1 GB per eDTU
  • eDTU max per database (preview limits): 0-5
  • Storage max per DB: 2 GB
  • Per DB cost (preview prices): $0.0003/hr (~$0.22/mo)

Standard Elastic Database Pool
  • eDTU range per pool (preview limits): 100-1200 eDTUs
  • Storage range per pool: 100-1200 GB
  • Maximum databases per pool (preview limits): 200
  • Estimated monthly pool and add-on eDTU costs (preview prices): starting at $0.3/hr (~$223/pool/mo); each additional eDTU $0.003/hr (~$2.23/mo)
  • Storage per eDTU: 1 GB per eDTU
  • eDTU max per database (preview limits): 0-100
  • Storage max per DB: 250 GB
  • Per DB cost (preview prices): $0.0017/hr (~$1.26/mo)

Premium Elastic Database Pool
  • eDTU range per pool (preview limits): 125-1500 eDTUs
  • Storage range per pool: 63-750 GB
  • Maximum databases per pool (preview limits): 50
  • Estimated monthly pool and add-on eDTU costs (preview prices): starting at $0.937/hr (~$697/pool/mo); each additional eDTU $0.0075/hr (~$5.58/mo)
  • Storage per eDTU: 0.5 GB per eDTU
  • eDTU max per database (preview limits): 0-1000
  • Storage max per DB: 500 GB
  • Per DB cost (preview prices): $0.0084/hr (~$6.25/mo)

We’ll continue to iterate on the above parameters and increase the maximum number of databases per pool as we progress through the preview, and would love your feedback as we do so.

New Higher-Scale SQL Database Performance Tiers

In addition to the enhancements for Elastic Database Pools, we are also today releasing new SQL Database Premium performance tier options for standalone databases. 

Today we are adding a new P4 (500 DTU) and a P11 (1750 DTU) level which provide even higher performance database options for SQL Databases that want to scale-up. The new P11 edition also now supports databases up to 1TB in size.

Developers can now choose from 10 different SQL Database Performance levels.  You can easily scale-up/scale-down as needed at any point without database downtime or interruption.  Each database performance tier supports a 99.99% SLA, multiple-high availability replica support built-in with no extra charges (meaning you don’t need to buy multiple instances to get an SLA – this is built-into each database), no down-time during patching, point-in-time recovery options (restore without needing a backup), TDE encryption of data, row-level security, and full-text search.

image

Learn More

You can learn more about SQL Databases by visiting the http://azure.microsoft.com web-site.  Check out the SQL Database product page to learn more about the capabilities SQL Databases provide, as well as read the technical documentation to learn more how to build great applications using it.

Summary

Today’s database updates enable developers to build even better cloud applications, and to use data to make them even richer and more intelligent.  We are really looking forward to seeing the solutions you build.

Hope this helps,

Scott

Categories: Architecture, Programming

Announcing Windows Server 2016 Containers Preview

ScottGu's Blog - Scott Guthrie - Wed, 08/19/2015 - 17:01

At DockerCon this year, Mark Russinovich, CTO of Microsoft Azure, demonstrated the first ever application built using code running in both a Windows Server Container and a Linux container connected together. This demo helped demonstrate Microsoft's vision that in partnership with Docker, we can help bring the Windows and Linux ecosystems together by enabling developers to build container-based distributed applications using the tools and platforms of their choice.

Today we are excited to release the first preview of Windows Server Containers as part of our Windows Server 2016 Technical Preview 3 release. We’re also announcing great updates from our close collaboration with Docker, including enabling support for the Windows platform in the Docker Engine and a preview of the Docker Engine for Windows. Our Visual Studio Tools for Docker, which we previewed earlier this year, have also been updated to support Windows Server Containers, providing you a seamless end-to-end experience straight from Visual Studio to develop and deploy code to both Windows Server and Linux containers. Last but not least, we’ve made it easy to get started with Windows Server Containers in Azure via a dedicated virtual machine image.

Windows Server Containers

Windows Server Containers create a highly agile Windows Server environment, enabling you to accelerate the DevOps process to efficiently build and deploy modern applications. With today’s preview release, millions of Windows developers will be able to experience the benefits of containers for the first time using the languages of their choice – whether .NET, ASP.NET, PowerShell or Python, Ruby on Rails, Java and many others.

Today’s announcement delivers on the promise we made in partnership with Docker, the fast-growing open platform for distributed applications, to offer container and DevOps benefits to Linux and Windows Server users alike. Windows Server Containers are now part of the Docker open source project, and Microsoft is a founding member of the Open Container Initiative. Windows Server Containers can be deployed and managed either using the Docker client or PowerShell.

Getting Started using Visual Studio

The preview of our Visual Studio Tools for Docker, which enables developers to build and publish ASP.NET 5 Web Apps or console applications directly to a Docker container, has been updated to include support for today’s preview of Windows Server Containers. The extension automates creating and configuring your container host in Azure, building a container image which includes your application, and publishing it directly to your container host. You can download and install this extension, and read more about it, at the Visual Studio Gallery here: http://aka.ms/vslovesdocker.

Once installed, developers can right-click on their projects within Visual Studio and select “Publish”:

image

Doing so will display a Publish dialog which will now include the ability to deploy to a Docker Container (on either a Windows Server or Linux machine):

image

You can choose to deploy to any existing Docker host you already have running:

image

Or use the dialog to create a new Virtual Machine running either Windows Server or Linux with containers enabled.  The below screenshot shows how easy it is to create a new VM hosted on Azure that runs today’s Windows Server 2016 TP3 preview that supports Containers – you can do all of this (and deploy your apps to it) easily without ever having to leave the Visual Studio IDE:

Getting Started Using Azure

In June of last year, at the first DockerCon, we enabled a streamlined Azure experience for creating and managing Docker hosts in the cloud. Up until now these hosts have only run on Linux. With the new preview of Windows Server 2016 supporting Windows Server Containers, we have enabled a parallel experience for Windows users.

Directly from the Azure Marketplace, users can now deploy a Windows Server 2016 virtual machine pre-configured with the container feature enabled and Docker Engine installed. Our quick start guide has all of the details, including screenshots and a walkthrough video, so take a look here: https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/azure_setup.


Once your container host is up and running, the quick start guide includes step-by-step guides for creating and managing containers using both Docker and PowerShell.

Getting Started Locally Using Hyper-V

Creating a virtual machine on your local machine using Hyper-V to act as your container host is now really easy. We’ve published some PowerShell scripts to GitHub that automate nearly the whole process so that you can get started experimenting with Windows Server Containers as quickly as possible. The quick start guide has all of the details at https://msdn.microsoft.com/en-us/virtualization/windowscontainers/quick_start/container_setup.

Once your container host is up and running, the quick start guide includes step-by-step guides for creating and managing containers using both Docker and PowerShell.
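
For example, once the host is reachable, the standard Docker client commands work the same way they do against a Linux host. The short sketch below is illustrative only: the windowsservercore image name and the demo container name are assumptions about the preview environment and may differ in your setup.

# List the container images available on the host
docker images

# Create and start an interactive container from the Windows Server Core base image
# (image and container names are illustrative for the TP3 preview)
docker run -it --name demo windowsservercore cmd

# Back on the host: list containers and capture your changes as a new image
docker ps -a
docker commit demo demoimage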

Additional Information and Resources

A great list of resources including links to past presentations on containers, blogs and samples can be found in the community section of our documentation. We have also set up a dedicated Windows containers forum where you can provide feedback, ask questions and report bugs. If you want to learn more about the technology behind containers I would highly recommend reading Mark Russinovich’s blog on “Containers: Docker, Windows and Trends” that was published earlier this week.

Summary

At the //Build conference earlier this year we talked about our plan to make containers a fundamental part of our application platform, and today’s releases are a set of significant steps in making this a reality. The decision we made to embrace Docker and the Docker ecosystem to enable this in both Azure and Windows Server has generated a lot of positive feedback and we are just getting started.

While there is still more work to be done, users in the Windows Server ecosystem can now begin experiencing the world of containers. I highly recommend you download the Visual Studio Tools for Docker, create a Windows container host in Azure or locally, and try out our PowerShell and Docker support. Most importantly, we look forward to hearing feedback on your experience.

Hope this helps,

Scott

Categories: Architecture, Programming

Another elasticsearch blog post, now about Shield

Gridshore - Thu, 01/29/2015 - 21:13

I just wrote another piece on my other blog. This time I wrote about the recently released elasticsearch plugin called Shield. If you want to learn more about securing your elasticsearch cluster, please head over to my other blog and start reading:

http://amsterdam.luminis.eu/2015/01/29/elasticsearch-shield-first-steps-using-java/

The post Another elasticsearch blog post, now about Shield appeared first on Gridshore.

Categories: Architecture, Programming

New blog posts about bower, grunt and elasticsearch

Gridshore - Mon, 12/15/2014 - 08:45

Two new blog posts I want to point out to you all. I wrote these blog posts on my employer's blog:

The first post is about creating backups of your elasticsearch cluster. Some time ago they introduced the snapshot/restore functionality. Of course you can call the REST endpoint directly, but it is much easier to use a plugin to handle the snapshots, or, maybe even better, to integrate the functionality into your own Java application. That is what this blog post is about: integrating snapshot/restore functionality into your Java application. As a bonus there are screenshots of my elasticsearch gui project showing the snapshot/restore functionality.

Creating elasticsearch backups with snapshot/restore
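
To give a feel for what this snapshot/restore functionality looks like underneath (the Java integration described in the post builds on the same concepts), here is a minimal sketch using the plain REST endpoint. The repository name, filesystem location and snapshot name are examples only, and the location must be a path your cluster is allowed to write to.

# Register a filesystem snapshot repository (example name and location)
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/my_backup" }
}'

# Take a snapshot of all indices and wait for it to finish
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

# Restore that snapshot later if needed
curl -XPOST 'http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore'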

The second blog post I want to bring to your attention is front-end oriented. I already mentioned my elasticsearch gui project; it is an AngularJS application. I have been working on the plugin for a long time and the amount of JavaScript code keeps increasing. Therefore I wanted to introduce grunt and bower to the project. That is what this blog post is about.

Improve my AngularJS project with grunt

The post New blog posts about bower, grunt and elasticsearch appeared first on Gridshore.

Categories: Architecture, Programming

New blogpost on kibana 4 beta

Gridshore - Tue, 12/02/2014 - 12:55

If you are, like me, interested in elasticsearch and kibana, then you might be interested in a blog post I wrote on my employer's blog about the new Kibana 4 beta. If so, head over there:

http://amsterdam.luminis.eu/2014/12/01/experiment-with-the-kibana-4-beta/

The post New blogpost on kibana 4 beta appeared first on Gridshore.

Categories: Architecture, Programming

Big changes

Gridshore - Sun, 10/26/2014 - 08:27

The first 10 years of my career I worked as a consultant for Capgemini and Accenture. I learned a lot in that time. One of the things I learned was that I wanted something else. I wanted to do something with more impact and more responsibility, together with people who wanted to take on challenging projects, not purely to climb the career ladder but because they like doing cool stuff. Therefore I left Accenture to become part of a company called JTeam. That is now over six years ago.

I started as Chief Architect at JTeam. The goal was to become a leader to the other architects and to build a team together with Bram. I was lucky that Allard joined me at that time. We share a lot of ideas, which makes it easier to set goals and accomplish them. I got to know a few very good people at JTeam; too bad that some of them left, but that is life.

After a few years bigger changes took place. Leonard made way for Steven, and the shift toward a company that needed to grow began. We took over two companies (Funk and Neteffect), so we now had all disciplines of software development available, from front-end to operations. As the company grew some things had to change, and I got more involved in arranging things like internships, tech events, partnerships and human resource management.

We moved into a bigger building and had better opportunities. One of those opportunities was a search solution created by Shay Banon. Then Steven was gone: together with Shay he founded Elasticsearch. We were acquired by Trifork, and in that change we lost most of our search expertise because all of our search people joined the elasticsearch initiative. Someone had to pick up search at Trifork, and that was me, together with Bram.

For over two years I invested a lot of time in learning, mainly about elasticsearch. I created a number of workshops and trainings and got involved with multiple customers that needed search. I have given trainings to a number of customers, in groups varying between 2 and 15 people, and in general they were all really pleased with them.

Having so much focus for a while gave me a lot of time to think. I did not need to think about next steps for the company; I just needed to get more knowledgeable about elasticsearch. In that time I started out on a journey to find out what I want. I talked to my management about it and thought about it a lot myself. Then, right before the summer holiday, I had dinner with two people I know through the NLJUG, Hans and Bert. We had a very nice talk, and in the end they gave me an opportunity that I really had to think hard about. It was really interesting, a challenge, not so much a technical challenge, but more an experience that is hard to find. During the summer holiday I convinced myself this was a very interesting direction and I took the next step.

I had a lunch meeting with my soon-to-be business partner Sander. After around 30 minutes it already felt good. I really feel the energy of creating something new; I feel inspired again. This is the feeling I had been missing for a while. In September we were told that Bram was leaving Trifork. Since he is the person who got me into JTeam back in the day, it felt weird. I understand his reasons to go out and try to start something new. Bram leaving resulted in a vacancy for a CTO, and the management team decided to approach Allard for this role. This was a surprise to me, but a very nice opportunity for Allard, and I know he is going to do a good job. At the end of September Sander and I presented the draft business plan to the board of Luminis. That afternoon hands were shaken. It was then that I made the final call and decided to resign from my job at Trifork and take this new opportunity at Luminis.

I feel sad about leaving some people behind. I am going to miss the morning talks in the car with Allard about everything related to the company, I am going to miss doing projects with Roberto (we are a hell of a team), I am going to miss Byron for his capabilities (you make me feel proud that I guided your first steps within Trifork), I am going to miss chasing customers with Henk (we did a good job the past year) and I am going to miss Daphne and the after-lunch walks. To all of you and all the others at Trifork: it is a small world …

Luminis logo

Together with Sander, and with the help of all the others at Luminis, we are going to start Luminis Amsterdam. This is going to be a challenge for me, but together with Sander I feel we are going to make it happen. I feel confident that the big changes to come will be good changes.

The post Big changes appeared first on Gridshore.

Categories: Architecture, Programming

Facebook Has An Architectural Governance Challenge

Just to be clear: I don't work for Facebook, I have no active engagements with Facebook, and my story here is my own and does not necessarily represent that of IBM. I spent a little time at Facebook some time ago, I've talked with a few of its principal developers, and I've studied its architecture. That being said:

Facebook has a looming architectural governance challenge.

When I last visited the company, they had only a hundred or so developers, the bulk of whom fit cozily in one large war room. Honestly, it was all but indistinguishable from a Really Nice college computer lab: nice work desks, great workstations, places where you could fuel up with caffeine and sugar. Dinner was served right there, so you never needed to leave. Were I a twenty-something with only a dog and a futon to my name, it would have been geek heaven. The code base at the time was, by my estimate, small enough to be grokkable, and the major functional bits were not so large and were sufficiently loosely coupled that development could proceed along nearly independent threads of progress.

I'll reserve my opinions of Facebook's development and architectural maturity for now. But I read with interest this article, which reports that Facebook plans to double in size in the coming year.

Oh my, the changes they are a comin'.

Let's be clear: there are certain limited conditions under which the maxim "give me PHP and a place to stand, and I will move the world" holds true. Those conditions include having a) a modest code base, b) with no legacy friction, c) growth, acceptance and limited competition that mask inefficiencies, d) a hyper-energetic, manically focused group of developers, e) who all fit pretty much in the same room. Relax any of those constraints, and Developing Really Really Hard just doesn't cut it any more.

Consider: the moment you break a development organization across offices, you introduce communication and coordination challenges. Add the crossing of time zones, and unless you've got some governance in place, architectural rot will slowly creep in and the flaws in your development culture will be magnified. The subtly different development cultures that evolve in each office will yield subtly different textures of code; it's kind of like the evolutionary drift on which Darwin reported. If your architecture is well-structured, well-syndicated, and well-governed, you can more easily split the work across groups; if your architecture is poorly structured, held in the tribal memory of only a few, and ungoverned, then you can rely on heroics for a while, but that's unsustainable. Your heroes will dig in, burn out, or cash out.

Just to be clear, I'm not picking on Facebook. What's happening here is a threshold of complexity that every growing group must cross. If you are outsourcing to India or China or across the city, if you are growing your staff to the point where the important architectural decisions will no longer fit in One Guy's Head, if you no longer have the time to just rewrite everything, if your growing customer base becomes increasingly intolerant of capricious changes, then, like it or not, you've got to inject more discipline.

Now, I'm not advocating extreme, high-ceremony measures. As a start, there are some fundamentals that will go a long way: establish a well-instrumented and well-automated build and release system; use collaboration tools that channel work but also allow for serendipitous connections; codify and syndicate the system's load-bearing walls/architectural decisions; create a culture of patterns and refactoring.

Remind your developers that what they do, each of them, is valued; remind your developers that there is more to life than coding.

It will be interesting to watch how Facebook metabolizes this growth. Some organizations are successful in so doing; many are not. But I really do wish Facebook success. If they thought the past few years were interesting times, my message to them is that the really interesting times are only now beginning. And I hope they enjoy the journey.
Categories: Architecture

How Watson Works

Earlier this year, I conducted an archeological dig on Watson. I applied the techniques I've developed for the Handbook, which involve the use of the UML, Philippe Kruchten's 4+1 View Model, and IBM's Rational Software Architect. The fruits of this work have proven useful as groups other than Watson's original developers begin to transform the Watson code base for use in other domains.

You can watch my presentation at IBM Innovate on How Watson Works here.
Categories: Architecture