Running WordPress on Heroku + Amazon RDS

With the announcement of the Heroku and Facebook partnership yesterday, Heroku quietly confirmed support for two new languages, Python and PHP. As a self-proclaimed polyglot platform, Heroku's Celadon Cedar stack now has a considerable advantage over newcomers like PHP Fog and Orchestra.io because it avoids separate, language-specific products.

To test out Heroku's PHP support (version 5.3.6), I deployed WordPress. One of the current limitations is the lack of native MySQL support, outside of hooking into the Xeround Cloud DB or Amazon RDS. I found one blog post that suggested using the default PostgreSQL backend that Heroku provides together with the 'PG4WP' WordPress plugin, which lets WordPress run on a PostgreSQL database. While this may work, most plugins don't support it and it is more of a band-aid for the platform's limitations. Instead, you can use the Amazon Relational Database Service (RDS) add-on.

Amazon RDS is a service that allows you to set up, operate and scale a dedicated MySQL database server on top of EC2. In addition to standard MySQL features, RDS offers the following functionality:

  • Automated backups
  • Point-in-time recovery
  • Seamless vertical scaling between instance types

The free Amazon RDS add-on lets you connect your Heroku app to an RDS instance and seamlessly use it in place of the standard, Heroku-provided PostgreSQL database. To get started, you should configure the RDS command line toolkit and Heroku gem if you haven't already. Let's start by creating the RDS database instance from your local machine:

rds-create-db-instance --db-instance-identifier [name] \
  --allocated-storage 5 \
  --db-instance-class db.m1.small \
  --engine MySQL5.1 \
  --master-username [user] \
  --master-user-password [pw] \
  --db-name [name] \
  --headers

This will take a few minutes. Once the database is provisioned, add your local IP address to the security group -- assuming your workstation’s public IP is 1.1.1.1:

rds-authorize-db-security-group-ingress default --cidr-ip 1.1.1.1/32

Heroku also needs to be able to access your RDS instance. To allow Heroku’s cloud through the RDS firewall, run the following command:

rds-authorize-db-security-group-ingress default \
    --ec2-security-group-name default \
    --ec2-security-group-owner-id 098166147350

Note: Previously, Heroku recommended using their AWS security group and AWS account ID to grant apps access to other services running on AWS. This approach is no longer recommended and the relevant documentation has been removed. Reasons for no longer recommending this include:

  • Cross-security-group grants do not work with AWS VPC (which is now the default on AWS)
  • It is not safe because it grants access to all apps running on Heroku, not just yours
  • It does not work across AWS regions
  • Heroku may in the future run apps in a VPC, in a different region, or under a different AWS account

If you are using Heroku with an AWS RDS database, I would strongly recommend using SSL to secure database connections. Find links and details in the Amazon RDS Dev Center article.

Now we can begin building the application layer. Since we're going to be using Git for version control, I'd suggest cloning WordPress from the GitHub repository; it is synced from Automattic's SVN repository every 15 minutes, including branches and tags:

git clone git://github.com/WordPress/WordPress.git
cd WordPress

Before we start making any changes to file structure, we should make our own Git repository and start committing. Note here that you'll want to populate the wp-config.php with the MySQL credentials from the rds-create-db-instance command above:

# drop the cloned WordPress history so that git init starts a fresh repository
rm -rf .git
git init
mv wp-config-sample.php wp-config.php
git add .
git commit -m 'initial commit'
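
For the wp-config.php step, here is a minimal sketch of filling in the credentials from the shell instead of editing the file by hand. The fill_wp_config function is a hypothetical helper of mine, not part of WordPress or Heroku. It assumes the stock placeholders that ship in wp-config-sample.php (database_name_here, username_here, password_here and 'localhost') and that none of your values contain the characters |, & or \, since they are handed straight to sed:

```shell
# Hypothetical helper: write a wp-config.php from wp-config-sample.php with the
# database placeholders replaced. Assumes the values contain no '|', '&' or '\'.
fill_wp_config() {
  sample="$1"; target="$2"
  db_name="$3"; db_user="$4"; db_pass="$5"; db_host="$6"
  sed -e "s|database_name_here|$db_name|" \
      -e "s|username_here|$db_user|" \
      -e "s|password_here|$db_pass|" \
      -e "s|'localhost'|'$db_host'|" \
      "$sample" > "$target"
}
```

Used in place of the mv line, a call might look like fill_wp_config wp-config-sample.php wp-config.php databasename user pass rdshostname.amazonaws.com, with the same placeholder values as the add-on URL further down.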

Now, create the app on the Cedar stack and enable the RDS add-on with your MySQL credentials. Once the app has been created, you can deploy:

heroku create --stack cedar
heroku addons:add amazon_rds url=mysql://user:pass@rdshostname.amazonaws.com/databasename
git push heroku master

The output should look something like this:

➜ wp-heroku-test git:(master) git push heroku master
Counting objects: 985, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (965/965), done.
Writing objects: 100% (985/985), 3.65 MiB | 221 KiB/s, done.
Total 985 (delta 66), reused 0 (delta 0)

-----> Heroku receiving push
-----> PHP app detected
-----> Bundling Apache v2.2.19
-----> Bundling PHP v5.3.6
-----> Discovering process types
       Procfile declares types -> (none)
       Default types for PHP   -> web
-----> Compiled slug size is 24.9MB
-----> Launching... done, v4
       http://evening-waterfall-3372.herokuapp.com deployed to Heroku

To git@heroku.com:evening-waterfall-3977.git
 * [new branch]      master -> master

Notes

  • A commenter on Hacker News noted that the slug is still read-only, but the ephemeral filesystem is writable. The slug is what gets deployed to each new dyno that is spawned; the ephemeral filesystem is the individual filesystem on each dyno. So a plugin like WP Super Cache would be able to write to the filesystem, but that cache would only exist on the individual dyno that wrote it.

  • Because media and content uploads would land on a single dyno's ephemeral filesystem and not be shared with the others, I'd suggest using a CDN or Amazon S3 for storing images and attachments.

  • The zlib extension for PHP is not compiled in on the Celadon Cedar stack, so theme and plugin uploads through the WordPress admin panel will fail. To get around this, set up your themes and plugins on your local workstation first, then commit and deploy.

  • Do not include a phpinfo page in the document root as it will contain your database credentials in plain text.

  • If you want to have pretty permalinks, create the .htaccess file on your local machine and populate the mod_rewrite rules prior to deploying.
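
For the permalink note above, a minimal sketch of creating the .htaccess file from the shell before committing. The write_wp_htaccess function name is my own. The rules are the stock block WordPress generates for a site served from the domain root, and I'm assuming mod_rewrite is enabled in the Apache build Heroku bundles:

```shell
# Hypothetical helper: write the stock WordPress rewrite rules into
# <docroot>/.htaccess (defaults to the current directory).
write_wp_htaccess() {
  docroot="${1:-.}"
  cat > "$docroot/.htaccess" <<'EOF'
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
EOF
}
```

Run write_wp_htaccess from the WordPress root, then git add .htaccess, commit and push as usual.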

Live Data Replication Using lsyncd

One of my favorite Linux utilities I've discovered recently is lsyncd, a live syncing (mirror) daemon. Following the traditional Unix philosophy, it does data replication simply and it does it very well. Using some fancy inotify magic, lsyncd will spawn one or more processes to synchronize the targets after changes have been made.

After determining that a client would need multiple web servers running in sync, I evaluated a few different tools that perform data replication and live syncing. This is one of the larger problems I've encountered with scaling web applications horizontally -- storage management among multiple web servers. In traditional deployments, it might make sense to use something like NFS or even DRBD, operating on the block device level. While this works for write-heavy systems under high load, it isn't practical to mirror a whole block device for a high-traffic WordPress site -- I just needed a basic client-server model.

This is where lsyncd really shines. The lsyncd configuration file is written in Lua and super easy to set up. Below is the lsyncd.conf that duplicates /var/www/tjstein.com on the master server to the 4 targets with the same path:

settings = {
   delay        = 1,
   maxProcesses = 5,
   statusFile   = "/tmp/lsyncd.status",
   logfile      = "/var/log/lsyncd.log",
}

targetlist = {
 "10.0.1.23:/var/www/tjstein.com",
 "10.0.1.24:/var/www/tjstein.com",
 "10.0.1.25:/var/www/tjstein.com",
 "10.0.1.26:/var/www/tjstein.com"
}

for _, server in ipairs(targetlist) do
  sync{ default.rsync,
    source="/var/www/tjstein.com",
    rsyncOpts="-rltvupgo",
    target=server
  }
end

To make the rsyncing seamless, you'll need to copy the public key generated on the master server to all of the duplicated nodes so that rsync can log in over SSH without a password prompt. With the client I was working with, we were also using Nginx to load balance across the web nodes. Since we only wanted writes going to the primary master server, we routed all requests to /wp-admin to the primary node. Then, any changes made through the WordPress backend were synced over to the targets in less than 1 second. Here is the Nginx configuration:

upstream backend  {
  server 10.0.1.20; #master
  server 10.0.1.21; #www-01
  server 10.0.1.22; #www-02
  server 10.0.1.23; #www-03
}

upstream admin {
  server 10.0.1.20;
}

server {
  server_name www.tjstein.com tjstein.com;

  location / {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass  http://backend;
  }

  location ~ ^/wp-admin(/|$) {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass  http://admin;
  }
}
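
For the key distribution step mentioned earlier, here is a rough shell sketch. The push_lsyncd_key function and the DRY_RUN switch are hypothetical helpers of mine. The host list is copied from the lsyncd targetlist, and the default key path (~/.ssh/id_rsa, generated without a passphrase if missing) is an assumption:

```shell
# Targets copied from the lsyncd targetlist above.
LSYNCD_TARGETS="10.0.1.23 10.0.1.24 10.0.1.25 10.0.1.26"

# Hypothetical helper: install the master's public key on every target.
# With DRY_RUN set, it only prints the commands it would run.
push_lsyncd_key() {
  key="${1:-$HOME/.ssh/id_rsa}"
  if [ -n "${DRY_RUN:-}" ]; then
    for host in $LSYNCD_TARGETS; do
      echo "ssh-copy-id -i $key.pub $host"
    done
    return 0
  fi
  # Generate a passphrase-less key pair if none exists yet (assumption).
  [ -f "$key" ] || ssh-keygen -t rsa -N '' -f "$key"
  for host in $LSYNCD_TARGETS; do
    ssh-copy-id -i "$key.pub" "$host"
  done
}
```

Run it on the master as the user lsyncd runs under. Setting DRY_RUN=1 first lets you check the commands before it touches any node.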

Notes

You'll want to install lsyncd from source on most distributions. Older versions (pre-2.x) used a gross XML-style configuration; the most recent version, as of 8/30/2011, is 2.0.5.

APC Cache Purge Plugin for WordPress

After a recent WordPress update, I noticed that the version shown in the footer of the admin panel did not reflect the update. I realized that the APC opcode cache had not been flushed and was therefore holding on to cached versions of many updated files. I looked for a way to flush this cache without restarting the web server or PHP daemon (PHP-FPM), and so without compromising service availability.

After some googling, I found a dirty hack that, although helpful, was somewhat limited by design -- it suggests adding the script to the functions.php file of your theme. This means you'd need to manually insert the code for each theme. Also, apc_clear_cache() did not work by default with APC version 3.1.3p1. From this, I created a simple, single-purpose WordPress plugin to flush the APC cache.

Once activated, you'll get a 'Purge APC' option under the Tools menu. Once the option is clicked, the APC cache is flushed and the cache entries are displayed on the page.

Feel free to fork or submit pull requests to the GitHub repository. Here is the plugin source:

<?php
/**
 * @package APC Cache Purge
 * @version 0.1
 */
/*
Plugin Name: APC Cache Purge
Plugin URI: http://tjstein.com
Description: This is a simple, single purpose plugin to flush the APC cache.
Author: TJ Stein, inspired by Kaspars Dambis of konstruktors.com
Version: 0.1
Author URI: http://tjstein.com
License: GPLv2
*/
// Clear one APC cache: 'opcode' (the default) or 'user'.
function apc_purge($cache_type = 'opcode') {
    return apc_clear_cache($cache_type);
}
// Add Purge APC menu under Tools menu
add_action('admin_menu', 'php_apc_info');
        
function php_apc_info() {
    add_submenu_page('tools.php', 'Purge APC', 'Purge APC', 'activate_plugins', 'flush_php_apc', 'php_apc_options');
}
        
function php_apc_options() {
    if (apc_purge() && apc_purge('user'))
        print '<p>Success!</p>';
    else
        print '<p>Clearing Failed!</p>';
    print '<pre>'; print_r(apc_cache_info()); print '</pre>';
}
// Add Purge APC in the favorite actions dropdown
add_filter('favorite_actions', 'clear_apc_link');
        
function clear_apc_link($actions) {
    $actions['tools.php?page=flush_php_apc'] = array('Purge APC', 'edit_posts');
    return $actions;
}
?>

Installation

  • Upload apc-cache-purge.php to the /wp-content/plugins/ directory
  • Activate the plugin through the ‘Plugins’ menu in WordPress
  • When needed, flush the cache by clicking on ‘Purge APC’ under the Tools section

Alsatian Darn

"Alsatian Darn" from the album "Tomboy" (2011) performed by Panda Bear.

Classy Web Development with Sinatra and Heroku

Since my recent switch from WordPress to Jekyll, I've really enjoyed playing around with Ruby. Although I don't have much programming experience, I've been fascinated with one particular Ruby framework, Sinatra.

Sinatra is a DSL (domain-specific language) for quickly creating web applications in Ruby with very little effort. It is a minimalist framework, built right on top of Rack, a standard interface for Ruby web frameworks. Unlike Rails, you won't find many bells and whistles here. There are no models, views or controllers by default, but that's where Sinatra really shines. You can use Sinatra to create lean, focused web applications in just a few lines of code.

So this morning, I published a Sinatra template for Heroku with Haml, Sass & jQuery. The repository includes a base HTML5 template for Sinatra, ready for Heroku deployment. Use bundle install to grab all of the gem dependencies. If you get lost during deployment, check out Getting Started with Heroku.

One of the fun parts of the project was using Haml in place of erb. I'm still pulling up the Haml reference here and there but it's a breeze to work with. Here is an example of the layout page:

!!! 5
%html{html_attrs('en-en')}
  %head
    %meta{:'http-equiv' => "Content-Type", :content => "text/html; charset=utf-8"}
    %meta{:name => "lang", :content => "en"}
    %title Sinatra Template for Heroku w/ Haml, Sass & jQuery | TJ Stein
    %link{:href => "assets/css/style.css", :rel => "stylesheet", :type => "text/css"}/
  %body
    .container
      = yield
    = haml :analytics, :layout => false

The actual application file doesn't have much in it. It loads the required gems and defines the routes, including one that compiles the Sass stylesheet. You can add or modify any of the routes to create new pages or functions:

require 'sinatra'
require 'haml'
require 'sass'

set :haml, :format => :html5

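get '/' do
  haml :index
end

# Defined before the catch-all '/:path' route below: Sinatra uses the first
# route that matches, and '/:path' would otherwise swallow '/style.css'.
get '/style.css' do
  content_type 'text/css', :charset => 'utf-8'
  scss :style
end

get '/:path' do
  haml params[:path].to_sym
end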

If you prefer using a Rakefile to deploy, I've included an example Rakefile with a rake task for deployment to Heroku using rake deploy:

require 'rake'

desc "Deploy to Heroku."
task :deploy do
   require 'heroku'
   require 'heroku/command'
   user, pass = File.read(File.expand_path("~/.heroku/credentials")).split("\n")
   heroku = Heroku::Client.new(user, pass)

   cmd = Heroku::Command::BaseWithApp.new([])
   remotes = cmd.git_remotes(File.dirname(__FILE__) + "/../..")

   remote, app = remotes.detect {|key, value| value == (ENV['APP'] || cmd.app)}

   if remote.nil?
     raise "Could not find a git remote for the '#{ENV['APP']}' app"
   end

   `git push #{remote} master`

   heroku.restart(app)
end

If you end up using the template in your own application or want to contribute, send a pull request or just fork the project. Keep it classy.