Skip to content

Software Development Blogs: Programming, Software Testing, Agile Project Management

Methods & Tools

Subscribe to Methods & Tools
if you are not afraid to read more than one page to be a smarter software developer, software tester or project manager!

Database

Some gotchas around keepalived and iproute2 (part 1)

Agile Testing - Grig Gheorghiu - Fri, 05/31/2013 - 19:09
I should have written this blog post a while ago, while these things were still fresh on my mind. Still, better late than never.

Scenario: 2 bare-metal servers with 6 network ports each, to serve as our HAProxy load balancers in an active/failover configuration based on keepalived (I described how we integrated this with Chef in my previous post).

The architecture we have for the load balancers is as follows

  • 1 network interface (virtual and bonded, see below) is on a 'front-end' VLAN which gets the incoming traffic hitting HAProxy
  • 1 network interface is on a 'back-end' VLAN where the actual servers behind HAProxy live
  • 1 network interface is on an 'ops' VLAN which we want to use for accessing the HAProxy server for monitoring purposes

We (and by the way, when I say we, I mean mostly my colleagues Jeff Roberts and Zmer Andranigian) used Open vSwitch to create a virtual bridge interface and bond 2 physical interfaces on this bridge for each of the 'front-end' and 'back-end' interfaces.

To install Open vSwitch on Ubuntu, use:

# apt-get install openvswitch-switch openvswitch-controller

To create a bridge:

# ovs-vsctl add-br frontend_if

To create a bonded interface with 2 physical NICs (eth0 and eth1) on the frontend_if bridge created above:

# ovs-vsctl add-bond frontend_if frontend_bond eth0 eth1 lacp=active other_config:lacp-time=slow bond_mode=balance-tcp

We did the same for the 'back-end' interface by creating a bridge and bonding eth2 and eth3. We also configured the 'ops' interface as a regular network interface on eth4. To assign IP addresses to frontend_if, backend_if and eth4, we edited /etc/network/interfaces and added stanzas similar to:

auto eth0
iface eth0 inet static
address 0.0.0.0

auto eth1
iface eth1 inet static
address 0.0.0.0



auto frontend_if
iface frontend_if  inet static
        address 172.3.15.36
        netmask 255.255.255.0
        network 172.3.15.0
        broadcast 172.3.15.255
        gateway 172.3.15.1
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 172.3.10.4 172.3.10.8
        dns-search prod-vip.mydomain.com


auto eth2
iface eth2 inet static
address 0.0.0.0

auto eth3
iface eth3 inet static
address 0.0.0.0


auto backend_if
iface backend_if  inet static
        address 172.3.50.36
        netmask 255.255.255.0
        network 172.3.50.0
        broadcast 172.3.50.255
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 172.3.10.4 172.3.10.8
        dns-search prod.mydomain.com


At this point, we wanted to be able to ssh from a remote location into the HAProxy box using any of the 3 IP addresses associated with frontend_if, backend_if, and eth4. The problem was that with the regular routing rules in Linux, there's one default gateway, which in our case was on the same VLAN with frontend_if (172.3.15.1).

The solution was to install and configure the iproute2 package. This allows you to have multiple default gateways, one per interface that you want to configure this way (this blog post on iproute2 commands proved to be very useful).

To configure a default gateway for each of the 2 interfaces we defined above (frontend_if and backend_if), we added the following commands to /etc/rc.local  so that they can be run each time the box gets rebooted:


echo "1 admin" > /etc/iproute2/rt_tables
ip route add 172.3.50.0/24 dev backend_if src 172.3.50.36 table admin
ip route add default via 172.3.50.1 dev backend_if table admin
ip rule add from 172.3.15.36/32 table admin
ip rule add to 172.3.15.36/32 table admin

echo "2 admin2" >> /etc/iproute2/rt_tables
ip route add 172.3.15.0/24 dev frontend_if src 172.3.15.36 table admin2
ip route add default via 172.30.15.1 dev frontend_if table admin2
ip rule add from 172.3.15.36/32 table admin2
ip rule add to 172.3.15.36/32 table admin2


This was working great, but there was another aspect to this setup: we needed to get keepalived working between the 2 HAProxy boxes. Running keepalived means there is a new floating IP, which is a virtual IP address maintained by the keepalived process. In our case, this floating IP (172.3.15.38) was attached to the frontend_if interface, which means we had to add another iproute2 stanza:


echo "3 admin3" >> /etc/iproute2/rt_tables
ip route add 172.3.15.0/24 dev frontend_if src 172.3.15.38 table admin3
ip route add default via 172.30.15.1 dev frontend_if table admin3
ip rule add from 172.3.15.38/32 table admin3
ip rule add to 172.3.15.38/32 table admin3

I'll stop here for now. Stay tuned for part 2, where you can read about our adventures trying to get keepalived to work as we wanted it to. Hint: it involved getting rid of iproute2 policies.

Setting up keepalived with Chef on Ubuntu 12.04

Agile Testing - Grig Gheorghiu - Fri, 04/19/2013 - 21:26
We have 2 servers running HAProxy on Ubuntu 12.04. We want to set them up in an HA configuration, and for that we chose keepalived.

The first thing we did was look for an existing Chef cookbook for keepalived -- luckily, @jtimberman already wrote it. It's a pretty involved cookbook, probably one of the most complex I've seen. The usage instructions are pretty good though. In any case, we ended up writing our own wrapper cookbook on top of keepalived -- let's call it frontend-keepalived.

The usage documentation for the Opscode keepalived cookbook contains a role-based example and a recipe-based example. We took inspiration from both. In our frontend-keepalived/recipes/default.rb file we have:


include_recipe 'keepalived'

node[:keepalived][:check_scripts][:chk_haproxy] = {
  :script => 'killall -0 haproxy',
  :interval => 2,
  :weight => 2
}
node[:keepalived][:instances][:vi_1] = {
  :ip_addresses => '172.30.10.10',
  :interface => 'frontend_if',
  :track_script => 'chk_haproxy',
  :nopreempt => false,
  :advert_int => 1,
  :auth_type => :pass, # :pass or :ah
  :auth_pass => 'mypass'
}


This code overrides the default values for many of the attributes defined in the Opscode keepalived cookbook. It specifies the floating IP address that will be common between the 2 servers that will each run HAProxy (:ip_addresses). It also specifies the network interface where the multicast-based keepalived protocol (:interface) and the 'check script' which tests whether HAProxy is still running on each server.

However, we still needed a way to specify which of the 2 servers is the master and which is the backup (in keepalived parlance), as well as indicating priorities for each server. The usage document in the keep alived cookbook shows this as an example of using a single role to define the master and the backup:


override_attributes(
:keepalived => {
:global => {
:router_ids => {
'node1' => 'MASTER_NODE',
'node2' => 'BACKUP_NODE'
}
}
}
)

We couldn't get this to work (if somebody who did reads this, please leave a comment and tell me how you did it!). Instead, we defined 2 roles, one for the master and one for the backup. Here's the master role:


$ cat frontend-keepalived-master.rb
name "frontend-keepalived-master"
description "install keepalived and set state to MASTER"

override_attributes(
    "keepalived" => {
      "instance_defaults" => {
        "state" => "MASTER",
        "priority" => "101"
}
      }
)

run_list(
  "recipe[frontend-keepalived]"
)

Here's the backup role:


$ cat frontend-keepalived-backup.rb
name "frontend-keepalived-backup"
description "install keepalived and set state to BACKUP"

override_attributes(
    "keepalived" => {
      "instance_defaults" => {
        "state" => "BACKUP",
        "priority" => "100"
}
    }
)

run_list(
  "recipe[frontend-keepalived]"
)

Notice that we override 2 attributes, the state and the priority. The defaults for these are in the Opscode keepalived cookbook, under attributes/default.rb


default['keepalived']['instance_defaults']['state'] = 'MASTER'
default['keepalived']['instance_defaults']['priority'] = 100

This was useful in determining how to specify the stanza overriding them in our roles -- it made us see that we needed to specify the instance_defaults key under keepalived in the role files.

At this point, we added the master role to the Chef run_list of server #1 and the backup role to the Chef run_list of server #2. We had to do one more thing on each server (which we'll add to the default recipe of our frontend-keepalived cookbook): per this very helpful blog post on setting up HAProxy and keepalived, we edited /etc/systctl.conf and added:

net.ipv4.ip_nonlocal_bind=1
then applied it via 'sysctl -p'. This was needed so that HAProxy can listen on the keepalived-created 'floating IP' common to the 2 servers, which is not a real IP tied to an existing local network interface.

Once we ran chef-client on each of the 2 servers, we were able to verify that keepalived does its job by pinging the common floating IP from a 3rd server, then shutting down the network interface 'frontend_if' on each server, with no interruption in the ICMP responses sent from the floating IP. Our next step is to do some heavy-duty testing involving HTTP requests handled by HAProxy, and see that there is no interruption in service when we fail over from one HAProxy server to the other.

UPDATE

My colleague Zmer Andranigian discovered an attribute in the Opscode keepalived cookbook that deals with the sysctl setup. The default value for this attribute is:

default['keepalived']['shared_address'] = false

If this attribute is set to 'true' (for example in one of the 2 roles we defined above), then the keepalived cookbook will create a file called /etc/sysctl.d/60-ip-nonlocal-bind.conf containing:

net.ipv4.ip_nonlocal_bind=1

and will also set it in the running configuration of sysctl.

For reference, the role frontend-keepalived-master would contain the following attributes:


override_attributes(
    "keepalived" => {
      "instance_defaults" => {
        "state" => "MASTER",
        "priority" => "101"
}
      "shared_address" => "true"
   }
)




Using wrapper cookbooks in Chef

Agile Testing - Grig Gheorghiu - Tue, 04/02/2013 - 23:58
Not sure if this is considered a Chef best practice or not -- I would like to get some feedback, hopefully via constructive comments on this blog post. But I've started to see this pattern when creating application-specific Chef cookbooks: take a community cookbook, include it in your own, and customize it for your specific application.

A case in point is the haproxy community cookbook. We have an application that needs to talk to a Riak cluster. After doing some research (read 'googling around') and asking people on Twitter (because Tweeps are always right), it looks like the preferred way of putting a load balancer in front of Riak is to run haproxy on each application server that needs to talk to Riak, and have haproxy listen on 127.0.0.1 on some port number, then load balance those requests to the Riak backend. Here is an example of such an haproxy.cfg file.

So what I did was to create a small cookbook called haproxy-riak, add a default.rb recipe file that just calls

include_recipe haproxy

and then customize the haproxy.cfg file via a template file in my new cookbook (I actually didn't do it via the template yet, only via a hardcoded cookbook file, but my colleague and Chef guru Jeff Roberts is working on templatizing it).

I also added

depends haproxy

to metadata.rb.

I think this is pretty much all that's needed in order to create this 'wrapper' cookbook. This cookbook can then be used by any application that needs to talk to Riak. As a matter of fact, we (i.e. Jeff) are thinking that we should have such a cookbook per application (so we would call it app1-haproxy-riak for example) just so we can do things like search for different Riak clusters that we may have for different types of applications, and populate haproxy.cfg with the search results.

In any case, I would be curious to find out if other people are using this 'pattern', or if they found other ways to apply the DRY principle in their Chef cookbooks. Please leave a comment!

Installing ruby 2.0 on Ubuntu 12.04

Agile Testing - Grig Gheorghiu - Mon, 03/18/2013 - 17:55
I wanted to find a way to install ruby 2.0 on Ubuntu 12.04 via Chef, without going the rvm route, which is harder to automate. After several tries, I finally found a list of .deb packages which are necessary and sufficient as pre-requisites. Writing it down here for my future reference, and maybe it will prove useful to others out there.

I downloaded most of these packages from packages.ubuntu.com (if you google their names you'll find them), then I installed them via 'dpkg -i'. Here they are:


libffi5_3.0.9-1_amd64.deb
libssl0.9.8_0.9.8o-7ubuntu3.1_amd64.deb
zlib1g-dev_1.2.3.4.dfsg-3ubuntu4_amd64.deb

The actual ruby .deb package was built with fpm by my colleague Jeff Roberts courtesy of this blog post.




No snowflakes allowed

Agile Testing - Grig Gheorghiu - Wed, 03/06/2013 - 23:44
"Snowflake" is a term I learned from my colleague Jeff Roberts. It is used in the Chef community (maybe in the configuration management community at large as well) to designate a server/node that is 'unique', i.e. not in configuration management control. In a Chef environment, it means that the node in question was never added to Chef and never had chef-client run on it.

We've all been in situations where it seems overkill to go through the effort of automating the setup of a server. Maybe the server has a unique purpose within our infrastructure. Maybe we didn't feel like spending the time to create Chef recipes for that server. Whatever the reasoning, it seemed low-risk at the time.

Well, I am here to tell you there is danger in this way of thinking. Example: we deployed a server in EC2 manually. We installed the Sensu client on it manually and pointed it at our Sensu server. Everything seemed fine. Then one day we updated our Sensu configuration (via Chef) both on the Sensu server and on all the Sensu clients. Of course, the Sensu configuration on our snowflake server never got updated, since chef-client wasn't running on that server. As a result, the Sensu client wasn't checking in properly with the Sensu server, and the snowflake behaved as if it was falling off the map as far as our monitoring system was concerned. We had to manually update Sensu on the snowflake to bring it in sync with our configuration changes.

Basically, the result of having snowflake servers is that they do fall off the map as far as the overall automation of your infrastructure is concerned. They suffer bitrot, and you end up spending lots of time on their care and feeding, thus defeating the purpose of saving the time to automate them in the first place.

This being said, it's hard to be disciplined enough to run chef-client periodically on every single server in your infrastructure. I've never been able to do that before, but we are doing it now, mostly because of the insistence of Jeff. I do see the advantages of this discipline, and I do recommend it to everybody.

Video and slides for 'Five years of EC2 distilled' talk

Agile Testing - Grig Gheorghiu - Wed, 03/06/2013 - 02:49
On February 19th I gave a talk at the Silicon Valley Cloud Computing Meetup about my experiences and lessons learned while using EC2 for the past 5 years or so. I posted the slides on Slideshare and there's also a video recording of my presentation. I think the talk went pretty well, judging by the many questions I got at the end. Hopefully it will be useful to some people out there who are wondering if EC2 or The Cloud in general is a good fit for their infrastructure (short answer: it very much depends).

I want to thank Sebastian Stadil from Scalr for inviting me to give the talk (he first contacted me about giving a talk like this in -- believe it or not -- early 2008!). I also want to thank Adobe for hosting the meeting, and CDNetworks for sponsoring the meeting.