Mounting S3 Buckets Using FUSE

Over the past few days, I've been playing around with FUSE and s3fs, a FUSE-based filesystem backed by Amazon S3. Until recently, I had a negative and fairly unfair perception of FUSE, based partly on some of the lousy FUSE-based projects I had come across and partly on my own ignorance of what FUSE is and how it works.

What is FUSE?

FUSE (Filesystem in Userspace) is a framework, made up of a loadable kernel module and a user space library, that lets you develop a filesystem in user space without understanding filesystem internals or learning kernel module programming. You write the filesystem as an ordinary executable that is linked to the FUSE libraries. So, if you're not comfortable hacking on kernel code, FUSE might be a good option for you.

FUSE + Amazon S3

So, now that we have a basic understanding of FUSE, we can put it to work with Amazon's cloud-based storage service, S3. Using a tool like s3fs, you can mount buckets on your local filesystem without much hassle. Your reasons for doing this may vary, but a few good scenarios come to mind:

  • Your server is running low on disk space and you want to expand
  • You want to give multiple servers read/write access to a single filesystem
  • You want to access off-site backups on your local filesystem without ssh/rsync/ftp

To get started, we'll need to install some prerequisites. I've set this up successfully on Ubuntu 10.04 and 10.10 without any issues:

aptitude -y install build-essential libfuse-dev fuse-utils libcurl4-openssl-dev 
aptitude -y install libxml2-dev mime-support pkg-config

Now you'll need to download and compile the s3fs source. As of 2/22/2011, the most recent release, supporting reduced redundancy storage, is 1.40. Check out the Google Code page to be certain you're grabbing the most recent release.

cd /usr/local/src
wget http://s3fs.googlecode.com/files/s3fs-1.40.tar.gz
tar zxvf s3fs-1.40.tar.gz && cd s3fs-1.40
./configure
make && make install

This will install the s3fs binary in /usr/local/bin/s3fs. If /usr/local/bin isn't already in your PATH, you can add it in your .bashrc:

echo 'export PATH="$PATH:/usr/local/bin"' >> ~/.bashrc
source ~/.bashrc

Next, we need to make the allow_other mount option available to non-root users. Using allow_other works fine as root, but for it to work for other users, you need to uncomment user_allow_other in the FUSE configuration file:

perl -p -i -e 's|#user_allow_other|user_allow_other|g;' /etc/fuse.conf
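Before editing the real configuration file, the same substitution can be tried on a scratch copy. The following is only a sketch: it uses sed rather than the perl one-liner, the function name enable_user_allow_other is made up for illustration, and the sample file contents are an assumption about what a stock fuse.conf looks like.

```shell
#!/bin/sh
# enable_user_allow_other CONF_FILE  (hypothetical helper, not part of FUSE)
# Uncomments a line that consists only of "#user_allow_other".
enable_user_allow_other() {
    sed 's/^#[[:space:]]*user_allow_other[[:space:]]*$/user_allow_other/' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# Try it on a scratch file with made-up contents instead of /etc/fuse.conf.
scratch=$(mktemp)
printf '# Allow non-root users to specify the allow_other option\n#user_allow_other\n' > "$scratch"
enable_user_allow_other "$scratch"

# Show the resulting setting line.
grep -n 'user_allow_other' "$scratch"
```

Anchoring the pattern to the whole line keeps the substitution from touching other comments that merely mention the option.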

To make sure the s3fs binary is working, run the following:

root@tjstein.com:~# s3fs
s3fs: missing BUCKET argument
Usage: s3fs BUCKET MOUNTPOINT [OPTION]...

So before you can mount the bucket to your local filesystem, create the bucket in the AWS control panel or using a CLI toolset like s3cmd. Then, create the mount directory on your local machine before mounting the bucket:

s3cmd mb s3://bucketname
mkdir -p /mnt/s3
s3fs bucketname -o use_cache=/tmp -o allow_other /mnt/s3

To allow access to the bucket, you must authenticate using your AWS access key ID and secret access key. You can either pass the credentials to the s3fs command as options or use a password file. Depending on what version of s3fs you are using, the location of the password file may differ; it will most likely reside in your user's home directory or in /etc.
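As a rough sketch of the password file approach: to my knowledge, s3fs expects a single line of the form ACCESS_KEY_ID:SECRET_ACCESS_KEY in a file that only its owner can read. The helper name write_s3fs_passwd, the scratch directory, and the credentials below are placeholders for illustration, so check the documentation for your s3fs release before relying on the exact file name or location.

```shell
#!/bin/sh
# write_s3fs_passwd FILE ACCESS_KEY_ID SECRET_ACCESS_KEY  (hypothetical helper)
# Writes one "id:secret" line and restricts the file to its owner.
write_s3fs_passwd() {
    # umask 077 inside the subshell so the file is never world-readable.
    ( umask 077 && printf '%s:%s\n' "$2" "$3" > "$1" )
    chmod 600 "$1"
}

# Demo in a scratch directory with placeholder (not real) credentials.
demo_dir=$(mktemp -d)
write_s3fs_passwd "$demo_dir/passwd-s3fs" "AKIAEXAMPLEKEY" "exampleSecretKey"
ls -l "$demo_dir/passwd-s3fs"
```

With a file like this in place, the mount command can point at it with something like s3fs bucketname /mnt/s3 -o passwd_file=/path/to/passwd-s3fs, assuming your release supports the passwd_file option.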

I also suggest using the use_cache option. If enabled, s3fs automatically maintains a local cache of files in the folder specified by use_cache. Whenever s3fs needs to read or write a file on S3, it first downloads the entire file to that folder and operates on the local copy. When FUSE release() is called, s3fs re-uploads the file to S3 if it has changed, using MD5 checksums to minimize transfers from S3.
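The checksum idea can be illustrated outside of s3fs. This is not s3fs's actual code, just a minimal sketch of "only transfer when the MD5 differs"; it assumes the md5sum utility from coreutils is available, and the function name needs_upload and the file names are made up.

```shell
#!/bin/sh
# needs_upload CACHED_FILE KNOWN_MD5  (hypothetical helper)
# Exits 0 when the cached copy's MD5 differs from the checksum recorded
# for the remote object, meaning the file would have to be re-uploaded.
needs_upload() {
    local_md5=$(md5sum "$1" | cut -d' ' -f1)
    [ "$local_md5" != "$2" ]
}

cache_dir=$(mktemp -d)
printf 'hello\n' > "$cache_dir/example.txt"
# Pretend this checksum is what was recorded when the object was fetched.
remote_md5=$(md5sum "$cache_dir/example.txt" | cut -d' ' -f1)

# Unmodified cache entry: nothing to transfer.
if needs_upload "$cache_dir/example.txt" "$remote_md5"; then
    echo "changed: upload"
else
    echo "unchanged: skip"
fi

# Modify the cached copy; the checksum no longer matches.
printf 'more\n' >> "$cache_dir/example.txt"
if needs_upload "$cache_dir/example.txt" "$remote_md5"; then
    echo "changed: upload"
else
    echo "unchanged: skip"
fi
```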

To confirm the mount, run mount -l and look for /mnt/s3.
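If you want to check for the mount from a script rather than by eye, one option is to look for the mount point in the kernel's mount table. The sketch below assumes a Linux-style /proc/mounts; the function name is_mounted is made up, and the sample line is an invented stand-in for what an s3fs entry might look like, not captured output.

```shell
#!/bin/sh
# is_mounted MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper)
# Exits 0 if MOUNTPOINT appears as the second field of any line in the
# mounts table (defaults to /proc/mounts).
is_mounted() {
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Demo against a made-up mounts table so it runs without a real mount.
sample=$(mktemp)
printf 's3fs /mnt/s3 fuse.s3fs rw,nosuid,nodev,allow_other 0 0\n' > "$sample"

if is_mounted /mnt/s3 "$sample"; then
    echo "/mnt/s3 is mounted"
else
    echo "/mnt/s3 is not mounted"
fi
```

On a real system you would call is_mounted /mnt/s3 with no second argument.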

Notes

While this method is easy to implement, there are some caveats to be aware of.

  • Because of the distributed nature of S3, you may experience some propagation delay. After a file is created, it may not be immediately available to subsequent file operations. To read more about this "eventual consistency", check out the following post from shlomoswidler.com.

  • You can't update part of an object on S3. If you want to update 1 byte of a 5GB object, you'll have to re-upload the entire object.

  • The software documentation for s3fs is lacking, likely due to a commercial version being available now.

Minimum Viable Frameworks

During the process of redesigning my personal blog, I evaluated several pieces of software including Jekyll, Flask (a Python micro-framework) and Toto among a few others. I decided to go with Jekyll as it seemed to meet most of my requirements:

  • No database; posts are stored in plain text, markdown or some other equivalent
  • No server-side language requirements
  • Can integrate easily with Git
  • Can be themed and extended
  • Has well-written documentation and good community adoption

Since the redesign, I've really embraced this type of site-building platform and I think many others are starting to see its appeal. The popularity of these micro-frameworks has grown tremendously over the last year or two. I've read numerous blog posts from users leaving larger CMS-based blogging platforms like WordPress in favor of more minimalist, git-powered projects. Their reasons for switching vary, but one common theme is apparent: It's simple.

"I really wanted something simpler than WordPress. I didn’t need a CMS. I barely need a blogging engine. I update so infrequently. I want something that creates well formed html (hah), static content and is easy to use."

Source: Harper Reed

A few web service providers are also catching on, offering extremely affordable (and sometimes free) hosting. Amazon recently announced the availability of website hosting on S3, their storage service. Heroku has offered a free, albeit somewhat diminutive, tier for rapid-prototyping, staging, and testing purposes, as well as actually running lightweight apps. Google App Engine also offers cloud-based solutions, which can be pushed further with tools like DryDrop. DryDrop enables you to host your static site on Google App Engine and update it by pushing to GitHub. That's pretty rad.

If you're looking into micro-frameworks, check out the following articles for more information:

Using WebFont Loader

In the process of designing this site, I decided to use some custom fonts via the @font-face CSS rule. In comparison to some of the other web font options (Cufon, sIFR, FLIR), the @font-face method is simple, easy to implement and well supported in most modern browsers. Although the Google Font API is probably the most well-known web font library, I decided to roll my own kit from Font Squirrel and self-host the fonts. The really nice part about Font Squirrel is that they provide all of the different font formats (TTF, EOT, OTF, and SVG), compatible with every browser on the market.

For all of my H1-H6 headings, I declare the font with the following CSS:

@font-face {
    font-family: 'YanoneKaffeesatzRegular';
    src: url('YanoneKaffeesatz-Regular-webfont.eot');
    src: local('☺'),
         url('/fonts/YanoneKaffeesatz-Regular-webfont.woff') format('woff'),
         url('/fonts/YanoneKaffeesatz-Regular-webfont.ttf') format('truetype'),
         url('/fonts/YanoneKaffeesatz-Regular-webfont.svg#webfont1BSMunJa') format('svg');
    font-weight: normal;
    font-style: normal;
}

I then reference the font like I would any other family:

h1, h2, h3, h4, h5, h6 {
    font-family: YanoneKaffeesatzRegular, Arial, Helvetica, sans-serif;
    font-weight: normal;
    color: #111111;
}

Everything looked great besides one small, yet extremely annoying, caveat.

I noticed in Firefox and Opera that for a split second, just before the page finished rendering, unstyled text would be displayed. This drove me crazy. Some Googling told me that this is common and often referred to as FOUT (Flash of Unstyled Text). The fix was pretty trivial: use the WebFont Loader from Google & Typekit.

The WebFont Loader is a JavaScript library that gives you more control over font loading than the Google Font API provides. The key is using its events system to hide the text until the font is ready to be shown. There are quite a few implementation options so I'd suggest checking out some of the following articles for help:

Note: If you see a horizontal scroll-bar after implementing the WebFont Loader on body text, try adding overflow properties:

body {
    font-size: 75%;
    font-family: DroidSansRegular;
    overflow: -moz-scrollbars-vertical;
    overflow-x: hidden;
    overflow-y: scroll;
}

WordCamp LA

In September of last year, I was invited to speak at WordCamp LA on developing fast and scalable servers for WordPress. During the 30 or so minute presentation, I covered quite a few topics. I strayed away from the application layer side of things and really focused on server improvements and software considerations that offered the most performance. I got some really nice feedback and met some great people involved with WordPress in the LA area. The two other WordCamps I attended before (New York and their flagship conference, San Francisco) were extremely well organized and WordCamp LA was no different.

Although I wasn't able to post my slides, I've posted the video here. I look forward to speaking at more events like this and I highly recommend attending a WordCamp event if one comes to your town.