Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Troubleshooting haproxy 502 errors related to malformed/large HTTP headers

Agile Testing - Grig Gheorghiu - Wed, 07/23/2014 - 00:02
We had a situation recently where our web application started to behave strangely. First nginx (which sits in front of the application) started to error out with messages of this type:

upstream sent too big header while reading response header from upstream

A quick Google search revealed that a fix for this is to bump up proxy_buffer_size in nginx.conf, for both http and https traffic, along these lines:

proxy_buffer_size   256k;
proxy_buffers   4 256k;
proxy_busy_buffers_size   256k;

Now nginx was happy when hit directly. However, haproxy was still erroring out with a 502 'bad gateway' return code and a termination state of PH. Here is a snippet from the haproxy log file:

Jul 22 21:27:13 127.0.0.1 haproxy[14317]: 172.16.38.57:53408 [22/Jul/2014:21:27:12.776] www-frontend www-backend/www2:80 1/0/1/-1/898 502 8396 - - PH-- 0/0/0/0/0 0/0 "GET /someurl HTTP/1.1"

Another Google search revealed that PH means that haproxy blocked the response from the backend because its headers were invalid (malformed or, as in our case, too large).
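If you need to look for this pattern across many log lines, a small parser helps. Here is a minimal sketch in Python that pulls the interesting fields out of the log line above. The regular expression and the function name are my own, and it assumes the default haproxy HTTP log format arriving via syslog, so adjust it if your format differs:

```python
import re

# The log line from the post, split only for readability.
LOG_LINE = (
    'Jul 22 21:27:13 127.0.0.1 haproxy[14317]: 172.16.38.57:53408 '
    '[22/Jul/2014:21:27:12.776] www-frontend www-backend/www2:80 '
    '1/0/1/-1/898 502 8396 - - PH-- 0/0/0/0/0 0/0 "GET /someurl HTTP/1.1"'
)

# Assumption: default HTTP log format. The two "\S+ \S+" pairs skip the
# captured cookies and the connection/queue counters.
PATTERN = re.compile(
    r'haproxy\[\d+\]: (?P<client>\S+) \[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>\S+) (?P<timers>\S+) '
    r'(?P<status>\d{3}) (?P<bytes_read>\d+) \S+ \S+ '
    r'(?P<termination_state>\S{4}) \S+ \S+ "(?P<request>[^"]*)"'
)


def parse_haproxy_http_log(line):
    """Return a dict of fields from one log line, or None if it doesn't match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["bytes_read"] = int(fields["bytes_read"])
    return fields


fields = parse_haproxy_http_log(LOG_LINE)
print(fields["status"], fields["termination_state"], fields["timers"])
```

The first two characters of the termination state are the ones that matter here: P says the proxy itself aborted the session, and H says it was still waiting for valid response headers from the server. The -1 in the fourth timer is consistent with that, since haproxy never got a complete response from the backend.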

At this point, an investigation into the web app uncovered a loop in the code that kept adding elements to a cookie included in the response header, so the header kept growing until the proxies in front of the app refused it.
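To get a feel for how quickly this kind of bug gets out of hand, here is a rough sketch in Python that models a cookie growing by one element per loop iteration and reports when the response header block crosses a size limit. Everything in it is made up for illustration: the cookie name, the element format, and especially ASSUMED_LIMIT, which is a placeholder and not a value taken from nginx or haproxy. Check the buffer settings of your own proxies for the real numbers.

```python
# Hypothetical threshold in bytes; NOT an nginx or haproxy default.
ASSUMED_LIMIT = 8192


def header_block_size(headers):
    """Bytes taken by 'Name: value' lines plus the blank line that ends them.

    The status line is not counted, so this slightly underestimates.
    """
    total = 0
    for name, value in headers:
        total += len(f"{name}: {value}\r\n".encode("latin-1"))
    return total + len(b"\r\n")


def cookie_with_items(count):
    """Model of the bug: each pass through the loop adds one more element."""
    return "|".join(f"item{i}" for i in range(count))


def response_headers(count):
    """A made-up response carrying the ever-growing cookie."""
    return [
        ("Content-Type", "text/html"),
        ("Set-Cookie", "cart=" + cookie_with_items(count)),
    ]


def first_oversized_count(limit=ASSUMED_LIMIT):
    """Smallest number of cookie elements that pushes the headers past limit."""
    count = 0
    while header_block_size(response_headers(count)) <= limit:
        count += 1
    return count


count = first_oversized_count()
print(count, header_block_size(response_headers(count)))
```

A check like header_block_size in an integration test would have flagged the problem well before the proxies did.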

Anyway, I leave this here in the hope that somebody will stumble on it and benefit from it.

First experiences with OpenStack

Agile Testing - Grig Gheorghiu - Thu, 07/17/2014 - 21:37
We hit a big milestone this week, as we started to use OpenStack as a private cloud, initially just for QA/integration environments. Up to now we've been creating KVM machines semi-manually, which used to take minutes. Now we cut down that process to seconds, calling the Nova API from the command line, e.g.:

$ nova boot --image precise-image --flavor www --key_name mykey --nic net-id=3eafbd4f-0389-4c5b-93ba-7764742ee8cd www1.qa1

Once an instance is provisioned, we bootstrap it with Chef:

$ knife bootstrap www1.qa1.mydomain.com -x ubuntu --sudo -E qa1 -N www1.qa1 -r "role[base], role[www]"
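Since the two commands always follow the same pattern, they are easy to generate. Here is a minimal sketch in Python that builds both command lines for a given host. The function names and parameters are my own invention, it uses only the flags shown above, and it just builds the argument lists rather than running anything:

```python
import shlex


def nova_boot_command(name, image, flavor, key_name, net_id):
    """Argument list for the 'nova boot' call shown above."""
    return [
        "nova", "boot",
        "--image", image,
        "--flavor", flavor,
        "--key_name", key_name,
        "--nic", f"net-id={net_id}",
        name,
    ]


def knife_bootstrap_command(name, domain, environment, roles, ssh_user="ubuntu"):
    """Argument list for the 'knife bootstrap' call shown above."""
    run_list = ", ".join(f"role[{role}]" for role in roles)
    return [
        "knife", "bootstrap", f"{name}.{domain}",
        "-x", ssh_user, "--sudo",
        "-E", environment,
        "-N", name,
        "-r", run_list,
    ]


boot = nova_boot_command(
    "www1.qa1", "precise-image", "www", "mykey",
    "3eafbd4f-0389-4c5b-93ba-7764742ee8cd",
)
bootstrap = knife_bootstrap_command(
    "www1.qa1", "mydomain.com", "qa1", ["base", "www"],
)

# shlex.join (Python 3.8+) quotes arguments so the lines can be pasted in a shell.
print(shlex.join(boot))
print(shlex.join(bootstrap))
```

To actually run them you could hand each list to subprocess.run(cmd, check=True). A real script would presumably also wait until the instance is up and reachable over SSH before bootstrapping, which this sketch leaves out.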

Our internal network architecture is fairly complex, so my colleague Jeff Roberts spent quite some time bending OpenStack Neutron to his will (in conjunction with Open vSwitch) in order to support our internal VLANs. The OpenStack infrastructure has been stable so far, and it's just such a pleasure to do everything via an API and not to spin VMs up manually. Being back to working with a (private) cloud feels good.

This is just version 1.0 of our OpenStack rollout. Soon we'll start spinning up one environment at a time using chef-metal and fog, and we'll also integrate instance + environment spin-up with Jenkins. Exciting times ahead!

Setting up the hostname in Ubuntu

Agile Testing - Grig Gheorghiu - Fri, 06/13/2014 - 23:04
Most people recommend setting up the hostname on a Linux box so that:

1) running 'hostname' returns the short name (e.g. myhost)
2) running 'hostname -f' returns the FQDN (e.g. myhost.prod.example.com)
3) running 'hostname -d' returns the domain name (e.g. prod.example.com)

After experimenting a bit and also finding this helpful Server Fault post, here's what we did to achieve this (we did it via Chef recipes, but it amounts to the same thing):

  • make sure we have the short name in /etc/hostname:
        myhost
    (also run 'hostname myhost' at the command line)
  • make sure we have the FQDN as the first entry associated with the IP of the server in /etc/hosts:
        10.0.1.10 myhost.prod.example.com myhost myhost.prod
  • make sure we have the domain name set up as the search domain in /etc/resolv.conf:
        search prod.example.com

Reboot the box when you're done to make sure all of this survives reboots.
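The order of names in /etc/hosts is the part that is easiest to get wrong, because the first name after the IP is the one that ends up being treated as the canonical name. Here is a small sketch in Python that checks a hosts line against the layout above and spells out what the three hostname commands should then return. The function names are made up, and the check is purely textual; it does not query the resolver:

```python
def check_hosts_entry(line, short_name, domain):
    """True if the FQDN is the first name on the line and the short name follows."""
    fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
    if len(fields) < 2:
        return False
    names = fields[1:]  # fields[0] is the IP address
    fqdn = f"{short_name}.{domain}"
    return names[0] == fqdn and short_name in names[1:]


def expected_hostname_outputs(short_name, domain):
    """What the three commands should print once the box is set up."""
    return {
        "hostname": short_name,
        "hostname -f": f"{short_name}.{domain}",
        "hostname -d": domain,
    }


good = "10.0.1.10 myhost.prod.example.com myhost myhost.prod"
bad = "10.0.1.10 myhost myhost.prod.example.com"

print(check_hosts_entry(good, "myhost", "prod.example.com"))
print(check_hosts_entry(bad, "myhost", "prod.example.com"))
print(expected_hostname_outputs("myhost", "prod.example.com"))
```

Comparing expected_hostname_outputs against the real command output (for example via subprocess) after the reboot would make a reasonable smoke test.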


Technologies to look into as a sysadmin

Agile Testing - Grig Gheorghiu - Tue, 05/20/2014 - 18:58
These are some of the technologies that I think are either established, or new and promising, but all useful for sysadmins, no matter what their level of expertise is. Some of them I am already familiar with, some are on my TODO list, some I am exploring currently. They all reflect my own taste, so YMMV.

Operating systems
  • Ubuntu

Programming/scripting languages
  • Go
  • Python/Ruby

Configuration management
  • Chef
  • Ansible

Monitoring/graphing/logging/searching
  • Sensu
  • Graphite
  • Logstash
  • ElasticSearch

Load balancer/Web server
  • HAProxy
  • Nginx

Relational databases
  • MySQL
  • PostgreSQL

Non-relational distributed databases
  • Riak
  • Cassandra

Service discovery
  • etcd
  • consul

Virtualization
  • KVM
  • Vagrant
  • Docker

Software defined networking (SDN)
  • Open vSwitch

IaaS
  • OpenStack

PaaS
  • CloudFoundry


This should keep most people in the industry busy for a while ;-)

Dashboards are important!

Agile Testing - Grig Gheorghiu - Fri, 04/25/2014 - 22:21
In this case, they were a factor in my having a discussion with Eric Garcetti, the mayor of Los Angeles, who was visiting our office. He was intrigued by the Graphite dashboards we have on 8 monitors around the Ops area and I explained to him a little bit of what's going on in terms of what we're graphing. I'll let you guess who is the mayor in this photo:


Slides from my remote presentation on "Modern Web development and operations practices" at MSU

Agile Testing - Grig Gheorghiu - Fri, 04/25/2014 - 22:06
Titus Brown was kind enough to invite me to present to his students in the CSE 491 "Web development" class at MSU. I presented remotely, via Google Hangouts, on "Modern Web development and operations practices" and it was a lot of fun. Perhaps unsurprisingly, most of the questions at the end were on how to get a job in this field and be able to play with some of these cool technologies. My answer was to become active in Open Source, beef up your portfolio on GitHub, go to conferences and network (this was actually Titus's idea, but I wholeheartedly agree), and in general be curious and passionate about your field, and success will follow. I posted my slides on Slideshare if you are curious to take a look. Thanks to Dr. Brown for inviting me! :-)

Why does it work in staging but not in production?

Agile Testing - Grig Gheorghiu - Mon, 04/07/2014 - 23:07
This is a question that I am sure every developer and operations engineer out there has faced. There can be multiple answers to it, and I'll try to offer some of the ones we arrived at. They have to do mainly with our Chef workflow, but I think they apply to any other configuration management tool.

1) A Chef cookbook version in staging is different from the version in production

This is a common scenario, and it's supposed to work this way. You do want to test out new versions of your cookbooks in staging first, then update the version of the cookbook in production.

2) A feature flag is turned on in staging but turned off in production

We have Chef attributes defined in attributes/default.rb that serve as feature flags. If a certain attribute is true, some recipe code or template section gets included which wouldn't be included if the attribute were false. The situation can occur where a certain attribute is set to true in the staging environment but is set to false in the production environment, at which point things can get out of sync. Again, this is expected, as you do want to test new features out in staging first, but don't forget to turn them on in production at some point.
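One way to keep an eye on this is to list the flags whose values differ between the two environments. Here is a minimal sketch in Python that models it with plain dictionaries. The flag names are hypothetical, and the merge is a deliberate simplification: it treats attributes as one layer of defaults plus per-environment overrides and ignores the rest of Chef's attribute precedence rules.

```python
def effective_flags(defaults, overrides):
    """Defaults with one environment's overrides applied on top."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def flag_differences(staging, production):
    """Map each flag that differs to its (staging, production) values."""
    names = sorted(set(staging) | set(production))
    return {
        name: (staging.get(name), production.get(name))
        for name in names
        if staging.get(name) != production.get(name)
    }


# Hypothetical flags, standing in for what lives in attributes/default.rb.
defaults = {"enable_new_checkout": False, "enable_gzip": True}
staging = effective_flags(defaults, {"enable_new_checkout": True})
production = effective_flags(defaults, {})

print(flag_differences(staging, production))
```

Anything that shows up in the output is either a feature still being tested or one that somebody forgot to turn on in production.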

3) A block of code or template is included in staging but not in production

We had this situation very recently. Instead of using attributes as feature flags, we were directly comparing the environment against 'stg' or 'prod' inside an if block in a template, and only including that template section if the environment was 'stg'. So things were working perfectly in staging, but mysteriously the template section wasn't even there in production. An added difficulty was that the template in question was peppered with non-indented if blocks, so it took us a while to figure out what was going on.

Two lessons here:

a) Make your templates readable by indenting code blocks.

b) Use attributes as feature flags, and don't compare directly against the current environment. This way, it's easier to always look at the default attribute file and see if a given feature flag is true or false.
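To make lesson b) concrete, here is a toy sketch in Python that contrasts the two approaches. It is only a stand-in for a real Chef template (those are ERB, not Python), and the function names, the flag name and the rendered config lines are all invented for the example:

```python
def render_by_environment(environment):
    """Anti-pattern: the section silently depends on the environment name."""
    lines = ["server {", "    listen 80;"]
    if environment == "stg":
        lines.append("    include new_feature.conf;")
    lines.append("}")
    return "\n".join(lines)


def render_by_flag(flags):
    """Preferred: the section depends on a flag visible in the attribute file."""
    lines = ["server {", "    listen 80;"]
    if flags.get("enable_new_feature", False):
        lines.append("    include new_feature.conf;")
    lines.append("}")
    return "\n".join(lines)


# With the environment check, production can never get the section.
print(render_by_environment("prod"))

# With the flag, turning the feature on in production is a one-line change.
print(render_by_flag({"enable_new_feature": True}))
```

The rendered output is identical when the section is included; what changes is where you have to look to find out why it is or isn't there.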

4) A modification is made to the cookbook version in production directly on the Chef server

I blogged about this issue in the past. Suppose you have an environments file that pins a given cookbook (let's designate it as cookbook C) to 1.0.1 in staging and to 1.0.0 in production. You want to upgrade production to 1.0.1, because it was tested in staging and it worked fine. However, instead of i) modifying the environments/prod.rb file and pinning the cookbook C to 1.0.1, ii) updating the Chef server via "knife environment from file environments/prod.rb" and iii) committing your changes in git, you modify the version of the cookbook C directly on the Chef server with "knife environment edit prod".

Then, the next time you or somebody else modifies environments/prod.rb to bump up another cookbook to the next version, the version of cookbook C in that file is still 1.0.0, so when you upload environments/prod.rb to the Chef server, it will downgrade cookbook C from 1.0.1 to 1.0.0. Chaos will ensue the next time chef-client runs on the nodes that have recipes from cookbook C. Production will be broken, while staging will still happily work.
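A cheap safeguard is to compare what the Chef server currently has pinned against what the environments file is about to upload, and to stop if anything would go backwards. Here is a rough sketch in Python of that comparison. It models the pins as plain dictionaries of bare version strings (real environment files use constraints such as "= 1.0.1"), and how you fetch the server-side pins is left out; the function names and the cookbook names C and D are just for illustration.

```python
def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pin_changes(server_pins, file_pins):
    """Describe what uploading file_pins would do to each cookbook pin."""
    changes = {}
    for cookbook in sorted(set(server_pins) | set(file_pins)):
        on_server = server_pins.get(cookbook)
        in_file = file_pins.get(cookbook)
        if on_server == in_file:
            continue
        if on_server is None:
            kind = "added"
        elif in_file is None:
            kind = "removed"
        elif parse_version(in_file) < parse_version(on_server):
            kind = "downgrade"
        else:
            kind = "upgrade"
        changes[cookbook] = (on_server, in_file, kind)
    return changes


# Cookbook C was bumped directly on the server; the file still says 1.0.0.
# Cookbook D is the one somebody actually meant to bump.
server_pins = {"C": "1.0.1", "D": "2.0.0"}
file_pins = {"C": "1.0.0", "D": "2.0.1"}

changes = pin_changes(server_pins, file_pins)
downgrades = [name for name, change in changes.items() if change[2] == "downgrade"]
print(changes)
print(downgrades)
```

Any entry in downgrades is a strong hint that somebody edited the environment on the server without updating the file in git.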

Here are two other scenarios that are not directly related to staging vs. production, but that have the potential to break production altogether.

You forget to upload the new version of the cookbook to the Chef server

You make all of your modifications to the cookbook, you commit your code to git, but for some reason you forget to upload the cookbook to the Chef server. If you also kept the same version of the cookbook that is in staging (and possibly in production), your modifications won't take effect and you may spend some quality time pulling your hair out.

You upload a cookbook to the Chef server without bumping its version

There is another, even worse, scenario though: you do upload your cookbook to the Chef server, but then you realize that you didn't bump up the version number compared to what is currently pinned to production. As a consequence, all the nodes in production that have recipes from that cookbook will pick up your changes the next time they run chef-client. That's a nasty one. It does happen. So make sure you pay attention to your cookbook versioning process and stick to it!
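A pre-upload check can catch this one. Here is a minimal sketch in Python that, given the version in your working copy and the versions pinned per environment, lists the environments that would be affected by the upload. The function name is made up, the pins are modeled as a plain dictionary of bare version strings, and wiring it into your actual upload step is left to you:

```python
def environments_pinning(version, pinned_versions):
    """Environments whose pin matches the version you are about to upload."""
    return sorted(
        environment
        for environment, pinned in pinned_versions.items()
        if pinned == version
    )


# Versions of one cookbook as pinned in each environment.
pinned_versions = {"stg": "1.0.1", "prod": "1.0.0"}

# Forgot to bump: the working copy still says 1.0.0, which is what prod runs.
print(environments_pinning("1.0.0", pinned_versions))

# Bumped properly: nothing is pinned to 1.0.2 yet, so the upload is harmless.
print(environments_pinning("1.0.2", pinned_versions))
```

If the returned list contains prod, stop and bump the version before uploading.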