Automatically Compress/Sync Your S3 Hosted Static Website

For those of you who use Amazon Web Services (AWS) Simple Storage Service (S3), you might find yourself wondering how to serve gzip-compressed files for better performance. While the instructions available online work well, manually compressing, renaming, and editing the metadata of all your hosted files can be super tedious; let’s see what we can do about that.

To get set up, we need s3cmd installed and configured. For OS X users, this is available via Homebrew with a simple brew update && brew install s3cmd && s3cmd --configure.

Once that’s set up, we’re pretty much ready to write the script.
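
The original script isn’t reproduced here, so below is a minimal sketch of the approach, assuming s3cmd is already configured; the file extensions, the temporary-directory trick, and the exact s3cmd flags are my own choices rather than the post’s.

```bash
#!/usr/bin/env bash
# gzsync: sketch of a gzip-then-sync helper for an S3-hosted static site.
# Usage: gzsync <site-dir> <s3-bucket-uri>
set -e

SRC="$1"
BUCKET="$2"
TMP=$(mktemp -d)

# Work on a copy so the original files are left untouched.
cp -R "$SRC"/. "$TMP"/

# Gzip the text assets in place, keeping their original names
# (gzip appends .gz, so strip it again).
find "$TMP" -type f \( -name '*.html' -o -name '*.css' -o -name '*.js' \) |
while read -r f; do
  gzip -9 "$f"
  mv "$f.gz" "$f"
done

# Upload the compressed assets with the Content-Encoding header set...
s3cmd sync --acl-public --add-header='Content-Encoding: gzip' \
  --exclude '*' --include '*.html' --include '*.css' --include '*.js' \
  "$TMP"/ "$BUCKET"

# ...then everything else (images, fonts, etc.) as-is.
s3cmd sync --acl-public \
  --exclude '*.html' --exclude '*.css' --exclude '*.js' \
  "$TMP"/ "$BUCKET"

rm -rf "$TMP"
```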

I saved this as a script called gzsync and put it in my scripts directory (which is part of my $PATH), then made it executable with chmod a+x ~/.bin/gzsync.

Now, say we’re in our static website directory that we want to upload to S3.
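
For example (the bucket name here is just a placeholder):

```bash
# Run from inside the site directory; s3://example.com stands in for your bucket.
gzsync . s3://example.com
```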

This would make a clever git post-receive hook to automatically deploy our site, but we’ll save that for next time. For CloudFront users, this would be a good time to invalidate your distribution. That can be scripted as well, and if you would like to contribute it, that would be great!
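
For reference, one way to script an invalidation is with the AWS CLI (not part of the original post; the distribution ID below is a placeholder):

```bash
# Invalidate everything in the distribution; EDFDVBD6EXAMPLE is a placeholder ID.
aws cloudfront create-invalidation --distribution-id EDFDVBD6EXAMPLE --paths "/*"
```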

If you would like to make any modifications to the script and contribute them back, I have put this up as a GitHub Gist.

Sync Google Contacts on Mac OS X with Two Factor Authentication

So you want to synchronize your Google contacts with the contacts in Mac OS X, but you have done the smart thing and enabled two factor authentication for your Google account. Not a problem; here’s how to do it.

Open up Contacts.app, go to Contacts > Preferences… (⌘-,), and select Accounts; there you will see “On My Mac” as an account option.

[screenshot: Contacts.app Accounts preferences]

Click the “Configure…” button, and you will be prompted for your Google account and password. This is where the two factor authentication trick comes into play.

Visit the Google two factor authentication settings page and scroll down to Application-specific passwords. There you will need to give your new key a meaningful name (e.g. “Mac OS X Contacts Sync”) and click Generate password. This will create a random string of characters that you then copy and paste as your password into the Contacts.app preferences screen.

Once you fill that in, proceed with the prompt on the Contacts.app preferences screen. You should now see a sync symbol in your menu bar which has only the option “Sync now”. This will pull all your contacts from your Google account, and you’re set, securely!

Make Desktop Background from Screensaver Defaults in Mac OS X

There are photos in the Mac OS X screensaver that are nicer than those in the default desktop background choices. Naturally, one might want those nice National Geographic photos as their desktop background, and it’s pretty simple to get at them.

Open up the Terminal, and type in the following:
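
The original command isn’t shown, but the usual trick is to link the screensaver’s photo collections into a folder that the desktop background picker can see; the path below is what I’d expect on OS X 10.7/10.8 and may differ on other versions.

```bash
# Symlink the bundled screensaver photo collections into ~/Pictures/Wallpapers.
# (The source path is an assumption for OS X 10.7/10.8.)
ln -s "/System/Library/Frameworks/ScreenSaver.framework/Versions/A/Resources/Default Collections" ~/Pictures/Wallpapers
```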

Then just navigate to Change Desktop Background… and add the Wallpapers folder in your Pictures directory.

[screenshot: Change Desktop Background…]

How to install Python pandas Development Version on Mac OS X

The pandas data analysis module is quickly becoming the go-to tool for data analysis in Python. Certain features, such as joins and sorts, become extremely powerful when dealing with in-memory datasets. Oftentimes, operations that take hours to execute in Excel take only seconds using pandas.

As a recent re-convert to Mac OS X, I wanted to get set up with the development version of pandas on my new machine running Mac OS X 10.8.

To begin, we need to have a few things installed, particularly pip and Homebrew.

If you have not yet installed pip, and have a valid Python installation on your machine, simply run sudo easy_install pip in your terminal.

Once that’s done, we need to install a few system libraries before trying to install our Python packages.
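
The exact brew line from the original isn’t shown; a reasonable guess at the build dependencies for the scientific Python stack at the time would be:

```bash
brew update
# gfortran was needed to build NumPy/SciPy from source back then
# (the formula has since been folded into gcc).
brew install gfortran
```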

This will bring in all the compilers and libraries that we’re going to need to build our stuff later on.

Assuming that you want the following libraries installed at the global Python level, rather than in a virtual environment, you can install the requirements to build pandas in a single line.
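
Something along these lines (the exact package list is my best guess at the pandas build requirements of the time):

```bash
# NumPy plus the pandas build/runtime dependencies.
sudo pip install numpy cython python-dateutil pytz
```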

With that, you should be able to clone the latest pandas repository and install the latest development version.
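
For example (the repository lived under the pydata organization at the time; it has since moved to pandas-dev):

```bash
git clone https://github.com/pydata/pandas.git
cd pandas
python setup.py install
```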

That’s pretty much it. If you have any problems, feel free to leave a comment.

Using Python and the NLTK to Find Haikus in the Public Twitter Stream

So after sitting around mining the public Twitter stream and detecting natural language with Python, I decided to have a little fun with all that data by detecting haikus.

The Natural Language Toolkit (NLTK) in Python, basically the Swiss army knife of natural language processing, allows for more than just natural language detection. The NLTK offers quite a few corpora, including the Carnegie Mellon University (CMU) Pronouncing Dictionary. This corpus contains quite a few features, but the one that piqued my interest was the syllable count for over 125,000 (English) words. With the ability to get the number of syllables for almost every English word, why not see if we can pluck some haikus from the public Twitter stream!
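
As a quick illustration (my own sketch, not code from the post): the syllable count of a CMU dictionary entry can be found by counting its stress-marked vowel phonemes.

```python
import nltk
from nltk.corpus import cmudict

nltk.download('cmudict')          # one-time download of the corpus
pronunciations = cmudict.dict()   # word -> list of pronunciations (phoneme lists)

def syllables(word):
    """Return the syllable count for `word`, or None if it isn't in the dictionary."""
    phones = pronunciations.get(word.lower())
    if not phones:
        return None
    # Vowel phonemes carry a stress digit (0, 1 or 2), so counting them counts syllables.
    return min(len([p for p in pron if p[-1].isdigit()]) for pron in phones)

print(syllables("hello"))   # -> 2
```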

We’re going to feed Python a Tweet as a string and try to figure out whether it is a haiku, doing our best to split it up into haiku form.

Building upon natural language detection with the NLTK, we should first filter out all the Tweets that are probably not English (to speed things up a little bit).
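
A sketch of the stopword-based filter that approach usually boils down to (again my own reconstruction, not the post’s code):

```python
from nltk.corpus import stopwords            # requires nltk.download('stopwords')
from nltk.tokenize import wordpunct_tokenize

def probably_english(text):
    """Guess the language by stopword overlap; True if English scores highest."""
    words = {w.lower() for w in wordpunct_tokenize(text)}
    scores = {lang: len(words & set(stopwords.words(lang))) for lang in stopwords.fileids()}
    return max(scores, key=scores.get) == 'english'
```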

Once we have that out of the way, we can dig into the haiku detection.
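
The post’s own implementation isn’t reproduced here; a minimal sketch of the function described below, reusing the syllables() helper from above and splitting greedily into 5-7-5, might look like this:

```python
def is_haiku(text):
    """Return the Tweet split into its three haiku lines, or False if it isn't one."""
    words = text.split()
    counts = [syllables(w.strip(".,!?;:'\"#@")) for w in words]
    # Bail out if any word is unknown or the total isn't 17 syllables.
    if None in counts or sum(counts) != 17:
        return False
    lines, start = [], 0
    for target in (5, 7, 5):
        total, end = 0, start
        while end < len(words) and total < target:
            total += counts[end]
            end += 1
        if total != target:
            return False      # a word straddles a line boundary
        lines.append(' '.join(words[start:end]))
        start = end
    return lines
```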

So what we have now is a function, is_haiku, that will return a list of the three haiku lines if the given string is a haiku, or False if it’s (probably) not a haiku. I keep saying probably because this script isn’t perfect, but it works most of the time.

After all that hacky code, it’s just a matter of hooking it up to the public Twitter stream. Borrowing from the public Twitter stream mining code, we can pipe every Tweet into the is_haiku function and if it returns a list, add it to our database.
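
However the stream is consumed, the glue ends up looking something like the sketch below; the `storage` argument is purely hypothetical, a stand-in for whatever database layer you use.

```python
def handle_status(text, storage):
    """Run a Tweet's text through is_haiku and keep it if it matches."""
    if probably_english(text):
        result = is_haiku(text)
        if result:
            # `storage` can be as simple as a list, or a real database collection.
            storage.append({'text': text, 'lines': result})
```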

So running this for a while, we actually pick up some pretty entertaining Tweets. I have been running this script for a little while on a micro EC2 instance and created a basic site that shows them in haiku form, as well as a Twitter account that retweets every haiku that it finds.

Some samples of found haikus:

[embedded example Tweets]

So it can be pretty interesting. What this exercise underlines is the public nature of your Tweets. There might be some robot out there mining all that stuff. In fact, every Tweet is archived by the Library of Congress, so be mindful of what you post.

I have posted the full script as a Gist that puts it all together. If you have any improvements or comments, feel free to contribute!