Agile Testing - Grig Gheorghiu
Did anybody say webscale?

Automatically launching and configuring an EC2 instance with ansible

Fri, 06/26/2015 - 20:53
Ansible makes it easy to take an EC2 instance from soup to nuts: launching the instance and then configuring it. Here's a complete playbook I use for this purpose:

$ cat ec2-launch-instance-api.yml
---
- name: Create a new api EC2 instance
  hosts: localhost
  gather_facts: False
  vars:
    keypair: api
    instance_type: t2.small
    security_group: api-core
    image: ami-5189a661
    region: us-west-2
    vpc_subnet: subnet-xxxxxxx
    name_tag: api01
  tasks:
    - name: Launch instance
      ec2:
        key_name: "{{ keypair }}"
        group: "{{ security_group }}"
        instance_type: "{{ instance_type }}"
        image: "{{ image }}"
        wait: true
        region: "{{ region }}"
        vpc_subnet_id: "{{ vpc_subnet }}"
        assign_public_ip: yes
        instance_tags:
          Name: "{{ name_tag }}"
      register: ec2

    - name: Add Route53 DNS record for this instance (overwrite if needed)
      route53:
        command: create
        zone: mycompany.com
        record: "{{name_tag}}.mycompany.com"
        type: A
        ttl: 3600
        value: "{{item.private_ip}}"
        overwrite: yes
      with_items: ec2.instances

    - name: Add new instance to proper ansible group
      add_host: hostname={{name_tag}} groupname=api-servers ansible_ssh_host={{ item.private_ip }} ansible_ssh_user=ubuntu ansible_ssh_private_key_file=/Users/grig.gheorghiu/.ssh/api.pem
      with_items: ec2.instances

    - name: Wait for SSH to come up
      wait_for: host={{ item.private_ip }} port=22 search_regex=OpenSSH delay=210 timeout=420 state=started
      with_items: ec2.instances

- name: Configure api EC2 instance
  hosts: api-servers
  sudo: True
  gather_facts: True
  roles:
    - base
    - tuning
    - postfix
    - monitoring
    - nginx
    - api


The first thing I do in this playbook is to launch a new EC2 instance, add or update its Route53 DNS A record, add it to an ansible group and wait for it to be accessible via ssh. Then I configure the instance by applying a handful of roles to it. That's it.
Some things to note:
1) Ansible uses boto under the covers, so you need that installed on your local host, and you also need a ~/.boto configuration file with your AWS credentials:
[Credentials]
aws_access_key_id = xxxxx
aws_secret_access_key = yyyyyyyyyy
2) When launching an EC2 instance with ansible via the ansible ec2 module, the hosts variable should point to localhost and gather_facts should be set to False. 
3) The various parameters expected by the EC2 API (keypair name, instance type, VPC subnet, security group, instance name tag etc.) can be set in the vars section and then used in the tasks section in the ec2 stanza.
4) I used the ansible route53 module for managing DNS. This module has a handy property called overwrite, which when set to yes will update a DNS record in place if it exists, or create it if it doesn't.
5) The add_host task is very useful in that it adds the newly created instance to a host group, in my case api-servers. This host group already has a group_vars/api-servers configuration file, where I set various ansible variables used in different roles (mostly secret-type variables such as API keys, user names, passwords etc.). The group_vars directory is NOT checked in.
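For illustration, a group_vars/api-servers file is just a YAML dictionary of variables; a minimal sketch might look like this (the variable names below are hypothetical, not the ones I actually use):

# group_vars/api-servers -- hypothetical example; this directory stays out of version control
api_api_key: xxxxxxxxxx
db_user: api
db_password: yyyyyyyyyy
newrelic_license_key: zzzzzzzzzz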
6) In the final play of the playbook, the [api-servers] group (which consists of only the newly created EC2 instance) gets the respective roles applied to it. Why does this group consist of only the newly created EC2 instance? Because when I run the playbook with ansible-playbook, I point it at an empty hosts file to make sure this group is empty:
$ ansible-playbook -i hosts/myhosts.empty ec2-launch-instance-api.yml
If instead I wanted to also apply the specified roles to my existing EC2 instances in that group, I would specify a hosts file that already has those instances defined in the [api-servers] group.
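To make the difference concrete, the two inventory files might look like this (hypothetical contents):

$ cat hosts/myhosts.empty
# [api-servers] is intentionally empty, so only the freshly launched instance gets configured
[api-servers]

$ cat hosts/myhosts
[api-servers]
api01 ansible_ssh_host=api01.mycompany.com
api02 ansible_ssh_host=api02.mycompany.com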

Deploying monitoring tools with ansible

Fri, 06/26/2015 - 01:10
At my previous job, I used Chef for configuration management. New job, new tools, so I decided to use ansible, which I had played with before. Part of the reason is that I got sick of Ruby-based tools: managing all the gem dependencies and migrating from one Ruby version to another was a nightmare I didn't want to go through again. That's also one reason why at my new job we settled on Go as the main language for our backend API layer.

Back to ansible. Since it's written in Python, it's already got good marks in my book. Plus it doesn't need a server and it's fairly easy to wrap your head around. I've been very happy with it so far.

For external monitoring, we use Pingdom because it works and it's cheap. We also use New Relic for application performance monitoring, but it's very expensive, so I've been looking at ways to supplement it with Open Source tools.

An announcement about telegraf drew my attention the other day: here was a metrics collection tool written in Go and sending its data to InfluxDB, which is a scalable database also written in Go and designed to receive time-series data. Seemed like a perfect fit. I just needed a way to display the data from InfluxDB. Cool, it looked like Grafana supports InfluxDB! It turns out however that Grafana support for the latest InfluxDB version 0.9.0 is experimental, i.e. doesn't really work. Plus telegraf itself has some rough edges in the way it tags the data it sends to InfluxDB. Long story short, after a whole day of banging my head against the telegraf/InfluxDB/Grafana wall, I decided to abandon this toolset.

Instead, I reached again to trusty old Graphite and its loyal companion statsd. I had problems with Graphite not scaling well before, but for now we're not sending it such a huge amount of metrics, so it will do. I also settled on collectd as the OS metric collector. It's small, easy to configure, and very stable. The final piece of the puzzle was a process monitoring and alerting tool. I chose monit for this purpose. Again: simple, serverless, small footprint, widely used, stable, easy to configure.
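To give a sense of how lightweight monit is to configure, here is what a process check might look like (a generic sketch with hypothetical paths and thresholds, not one of our actual checks):

# /etc/monit/conf.d/check_nginx -- hypothetical example
check process nginx with pidfile /var/run/nginx.pid
  start program = "/usr/sbin/service nginx start"
  stop program  = "/usr/sbin/service nginx stop"
  if failed port 80 protocol http then restart
  if 5 restarts within 5 cycles then alert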

This seems like a lot of tools, but it's not really that bad if you have a good solid configuration management system in place -- ansible in my case.

Here are some tips and tricks specific to ansible for dealing with multiple monitoring tools that need to be deployed across various systems.

Use roles extensively

This is of course recommended no matter what configuration management system you use. With ansible, it's easy to use the command 'ansible-galaxy init rolename' to create the directory structure for a new role (a sketch of the generated layout follows the list below). My approach is to create a new role for each major application or tool that I want to deploy. Here are some of the roles I created:

  • a base role that adds users, deals with ssh keys and sudoers.d files, creates directory structures common to all servers, etc.
  • a tuning role that mostly configures TCP-related parameters in sysctl.conf
  • a postfix role that installs and configures postfix to use Amazon SES
  • a go role that installs golang from source and configures GOPATH and GOROOT
  • an nginx role that installs nginx and deploys self-signed SSL certs for development/testing purposes
  • a collectd role that installs collectd and deploys (as an ansible template) a collectd.conf configuration file common to all systems, which sends data to graphite (the system name is customized as {{inventory_hostname}} in the write_graphite plugin)
  • a monit role that installs monit and deploys (again as an ansible template) a monitrc file that monitors resource metrics such as CPU, memory, disk etc. common to all systems
  • an api role that does the heavy lifting for the installation and configuration of the packages that are required by our API layer
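As a point of reference, 'ansible-galaxy init rolename' generates a role skeleton roughly like the following (the standard ansible role layout; the exact set of files varies a bit across ansible versions):

rolename/
  defaults/main.yml     # default variables (lowest precedence)
  files/                # static files deployed with the copy module
  handlers/main.yml     # handlers, e.g. service restarts
  meta/main.yml         # role metadata and dependencies
  tasks/main.yml        # entry point for the role's tasks
  templates/            # Jinja2 templates deployed with the template module
  vars/main.yml         # other role variables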
Use an umbrella 'monitoring' role
At first I was configuring each ansible playbook to use both the monit role and the collectd role. I realized that it's a bit more clear and also easier to maintain if instead playbooks use a more generic monitoring role, which does nothing but list monit and collectd as dependencies in its meta/main.yml file:

dependencies:
  - { role: monit }
  - { role: collectd }
Customize monitoring-related configuration files in other roles
A nice thing about both monit and collectd, and a main reason I chose them, is that they read configuration files from a directory: /etc/monit/conf.d for monit and /etc/collectd/collectd.conf.d for collectd. This makes it easy for each role to add its own configuration files. For example, the api role adds 2 files as custom checks in /etc/monit/conf.d: check_api and check_nginx. It also adds 2 files as custom metric collectors in /etc/collectd/collectd.conf.d: nginx.conf and memcached.conf. The api role does this via a file called tasks/monitoring.yml, which gets included in tasks/main.yml.
As another example, the nginx role also adds its own check_nginx configuration file to /etc/monit/conf.d via a tasks/monitoring.yml file.
The rule of thumb I arrived at is this: each low-level role such as monit and collectd installs the common configuration files needed by all other roles, whereas each higher-level role such as api installs its own custom checks and metric collectors via a monitoring.yml task file. This way, it's easy to see at a glance what each high-level role does for monitoring: just look in its monitoring.yml task file.
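As a rough illustration, a tasks/monitoring.yml file for the api role might look like this (a sketch only; the check and collector file names match the ones mentioned above, but the 'restart monit' and 'restart collectd' handlers are hypothetical):

# roles/api/tasks/monitoring.yml -- sketch
- name: deploy custom monit checks for the api
  copy: src={{ item }} dest=/etc/monit/conf.d/{{ item }}
  with_items:
    - check_api
    - check_nginx
  notify: restart monit

- name: deploy custom collectd collectors for the api
  copy: src={{ item }} dest=/etc/collectd/collectd.conf.d/{{ item }}
  with_items:
    - nginx.conf
    - memcached.conf
  notify: restart collectd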
To wrap this post up, here is an example of a playbook I use to build API servers:
$ cat api-servers.yml
---
- hosts: api-servers
  sudo: yes
  roles:
    - base
    - tuning
    - postfix
    - monitoring
    - nginx
    - api
I call this playbook with:
$ ansible-playbook -i hosts/myhosts api-servers.yml
To make this work, the hosts/myhosts file has a section similar to this:
[api-servers]
api01 ansible_ssh_host=api01.mydomain.com
api02 ansible_ssh_host=api02.mydomain.com




More nxlog logging tricks

Tue, 04/14/2015 - 00:11
In a previous post I talked about "Sending Windows logs to Papertrail with nxlog". In the meantime I had to work through a couple of nxlog issues that weren't quite obvious to solve -- hence this quick post.

Scenario 1: You don't want to send a given log file to Papertrail

My solution:

In this section:

# Monitor MyApp1 log files
<Input MyApp1>
 Module im_file
 File 'C:\\MyApp1\\logs\\*.log'
 Exec $Message = $raw_event;
 Exec if $Message =~ /GET \/ping/ drop();
 Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
 SavePos TRUE
 Recursive TRUE
</Input>

add a line which drops the current log line if the file name contains the pattern you are looking to skip. For example, for a file name called skip_this_one.log (from the same log directory), the new stanza would be:
# Monitor MyApp1 log files
<Input MyApp1>
 Module im_file
 File 'C:\\MyApp1\\logs\\*.log'
 Exec $Message = $raw_event;
 Exec if $Message =~ /GET \/ping/ drop();
 Exec if file_name() =~ /skip_this_one.log/ drop();
 Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
 SavePos TRUE
 Recursive TRUE
</Input>
Scenario 2: You want to prefix certain log lines depending on their directory of origin
Assume you have a test app and a dev app running on the same box, with the same exact log format, but with logs saved in different directories, so that in the Input sections you would have 
File 'C:\\MyTestApp\\logs\\*.log' for the test app and File 'C:\\MyDevApp\\logs\\*.log' for the dev app.
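For context, the two Input sections might look roughly like this (a sketch; the section names MyAppTest and MyAppDev match the routes shown further down, and the rest mirrors the Input section from Scenario 1):

<Input MyAppTest>
 Module im_file
 File 'C:\\MyTestApp\\logs\\*.log'
 Exec $Message = $raw_event;
 Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
 SavePos TRUE
 Recursive TRUE
</Input>

<Input MyAppDev>
 Module im_file
 File 'C:\\MyDevApp\\logs\\*.log'
 Exec $Message = $raw_event;
 Exec if file_name() =~ /.*\\(.*)/ $SourceName = $1;
 SavePos TRUE
 Recursive TRUE
</Input>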
The only solution I found so far was to declare a filewatcher_transformer Processor section for each app. The default filewatcher_transformer section I had before looked like this:

<Processor filewatcher_transformer>
 Module pm_transformer
 # Uncomment to override the program name
 # Exec $SourceName = 'PROGRAM NAME';
 Exec $Hostname = hostname();
 OutputFormat syslog_rfc5424
</Processor>
I created instead these 2 sections:
<Processor filewatcher_transformer_test>
 Module pm_transformer
 # Uncomment to override the program name
 # Exec $SourceName = 'PROGRAM NAME';
 Exec $SourceName = "TEST_" + $SourceName;
 Exec $Hostname = hostname();
 OutputFormat syslog_rfc5424
</Processor>

<Processor filewatcher_transformer_dev>
 Module pm_transformer
 # Uncomment to override the program name
 # Exec $SourceName = 'PROGRAM NAME';
 Exec $SourceName = "DEV_" + $SourceName;
 Exec $Hostname = hostname();
 OutputFormat syslog_rfc5424
</Processor>
As you can see, I chose to prefix $SourceName, which is the name of the log file in this case, with either TEST_ or DEV_ depending on the app.
There is one thing remaining, which is to define a specific route for each app. Before, I had a common route for both apps:
<Route 2>
 Path MyAppTest, MyAppDev => filewatcher_transformer => syslogout
</Route>
I replaced the common route with the following 2 routes, each connecting an app with its respective Processor section.
<Route 2>
 Path MyAppTest => filewatcher_transformer_test => syslogout
</Route>

<Route 3>
 Path MyAppDev => filewatcher_transformer_dev => syslogout
</Route>
At this point, I restarted the nxlog service and I started to see log filenames in Papertrail of the form DEV_errors.log and TEST_errors.log.