Using Python and the NLTK to Find Haikus in the Public Twitter Stream

So after sitting around mining the public Twitter stream and detecting natural language with Python, I decided to have a little fun with all that data by detecting haikus.

The Natural Language Toolkit (NLTK) in Python, basically the Swiss army knife of natural language processing, allows for more than just natural language detection. The NLTK offers quite a few corpora, including the Carnegie Mellon University (CMU) Pronouncing Dictionary. This corpus contains quite a few features, but the one that piqued my interest was the syllable count for over 125,000 (English) words. With the ability to get the number of syllables for almost every English word, why not see if we can pluck some haikus from the public Twitter stream!
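To make the syllable idea concrete, here is a minimal sketch of how a count can be read off a pronunciation entry. It assumes the dictionary's convention that every vowel phoneme ends in a stress digit (0, 1 or 2), so counting those phonemes gives the number of syllables. The three entries below are a small sample written in the dictionary's format so the sketch runs without NLTK, and the name count_syllables is my own; with NLTK and its cmudict corpus installed, nltk.corpus.cmudict.dict() would supply the full mapping instead.

```python
# Small sample written in the CMU Pronouncing Dictionary's format (an assumption
# for illustration). With NLTK installed, nltk.corpus.cmudict.dict() returns the
# full mapping of word -> list of pronunciations in this same shape.
SAMPLE_PRONUNCIATIONS = {
    "haiku": [["HH", "AY1", "K", "UW0"]],
    "twitter": [["T", "W", "IH1", "T", "ER0"]],
    "stream": [["S", "T", "R", "IY1", "M"]],
}


def count_syllables(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return the syllable count for a word, or None if the word is unknown."""
    entries = pronunciations.get(word.lower())
    if not entries:
        return None
    # Vowel phonemes carry a trailing stress digit; each one marks a syllable.
    # Only the first listed pronunciation is considered here.
    return sum(1 for phoneme in entries[0] if phoneme[-1].isdigit())


print(count_syllables("haiku"))
print(count_syllables("stream"))
```

Words with several pronunciations can have different syllable counts, which is one reason a detector built on this will only be right most of the time.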

We’re going to feed Python a Tweet as a string and try to figure out whether it is a haiku, doing our best to split it up into haiku form.

Building upon natural language detection with the NLTK, we should first filter out all the Tweets that are probably not English (to speed things up a little bit).
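As a rough illustration of that kind of pre-filter, the sketch below checks what fraction of a Tweet's words are common English stopwords. This is not necessarily how the earlier language detection worked: the function name looks_english, the threshold and the tiny stopword list are all assumptions made for the example. A fuller list is available from NLTK via nltk.corpus.stopwords.words("english").

```python
import re

# Hypothetical, deliberately tiny stopword list used only for this sketch.
ENGLISH_STOPWORDS = frozenset({
    "the", "a", "an", "and", "is", "it", "to", "of", "in",
    "i", "you", "that", "this", "for", "on",
})


def looks_english(text, threshold=0.2):
    """Guess whether text is English from its share of English stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return False
    hits = sum(1 for word in words if word in ENGLISH_STOPWORDS)
    return hits / len(words) >= threshold


print(looks_english("This is the start of a haiku"))
print(looks_english("El gato come pescado fresco"))
```

The check is cheap, so it can throw away most non-English Tweets before any syllable lookups happen.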

Once we have that out of the way, we can dig into the haiku detection.

So what we have now is a function, is_haiku, that returns a list of the three haiku lines if the given string is a haiku, or False if it’s (probably) not a haiku. I keep saying probably because this script isn’t perfect, but it works most of the time.
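For readers without the original script in front of them, here is a minimal sketch of a function with the same contract: a list of three lines for a haiku, False otherwise. It is not the script from the Gist. It assumes a plain dictionary of word to syllable count (the hypothetical SYLLABLES table below stands in for counts derived from the CMU Pronouncing Dictionary) and it only accepts text whose words break cleanly at 5, 7 and 5 syllables.

```python
import re

# Hypothetical syllable table for the demo sentence; a real run would derive
# these counts from the CMU Pronouncing Dictionary.
SYLLABLES = {
    "an": 1, "old": 1, "silent": 2, "pond": 1, "a": 1, "frog": 1,
    "jumps": 1, "into": 2, "the": 1, "splash": 1, "silence": 2, "again": 2,
}


def is_haiku(text, syllable_counts=SYLLABLES):
    """Return the three haiku lines as a list, or False if text is not a haiku."""
    targets = (5, 7, 5)
    lines = [[], [], []]
    line_index = 0
    running = 0
    for token in text.split():
        word = re.sub(r"[^a-z']", "", token.lower())
        if not word:
            continue  # token was pure punctuation
        count = syllable_counts.get(word)
        if count is None or line_index >= len(targets):
            return False  # unknown word, or syllables left over after line three
        lines[line_index].append(token)
        running += count
        if running > targets[line_index]:
            return False  # a word straddles a line boundary
        if running == targets[line_index]:
            line_index += 1
            running = 0
    if line_index != len(targets):
        return False  # too few syllables
    return [" ".join(words) for words in lines]


print(is_haiku("An old silent pond, a frog jumps into the pond. Splash! Silence again."))
```

Rejecting any Tweet with an unknown word keeps the sketch simple, at the cost of missing haikus that contain slang, hashtags or usernames.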

After all that hacky code, it’s just a matter of hooking it up to the public Twitter stream. Borrowing from the public Twitter stream mining code, we can pipe every Tweet into the is_haiku function and if it returns a list, add it to our database.
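The hookup itself can be sketched independently of any Twitter client. The example below assumes the Tweets arrive as an iterable of strings and that the detector follows the contract described above (a list of three lines, or False). The function name store_haikus, the table layout and the use of SQLite are assumptions for illustration, not the setup behind the site.

```python
import sqlite3


def store_haikus(tweets, detector, connection):
    """Run each Tweet through detector, save any haiku found, return the count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS haikus (line1 TEXT, line2 TEXT, line3 TEXT)"
    )
    saved = 0
    for tweet in tweets:
        lines = detector(tweet)
        if lines:  # the detector returns False when the Tweet is not a haiku
            connection.execute("INSERT INTO haikus VALUES (?, ?, ?)", tuple(lines))
            saved += 1
    connection.commit()
    return saved


def fake_detector(tweet):
    """Stand-in detector for the demo: treats ' / ' separated text as three lines."""
    parts = tweet.split(" / ")
    return parts if len(parts) == 3 else False


connection = sqlite3.connect(":memory:")
count = store_haikus(["one / two / three", "not a haiku"], fake_detector, connection)
print(count)
```

Passing the detector in as an argument means the real is_haiku function can be swapped in for the stand-in without touching the storage code.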

So running this for a while, we actually pick up some pretty entertaining Tweets. I have been running this script for a little while on a micro EC2 instance and created a basic site that shows them in haiku form, as well as a Twitter account that retweets every haiku that it finds.

Some samples of found haikus,




So it can be pretty interesting. What this exercise underlines is the publicity of your Tweets. There might be some robot out there mining all that stuff. In fact, every Tweet is archived by the Library of Congress, so be mindful of what you post.

I have posted the full script as a Gist that puts it all together. If you have any improvements or comments, feel free to contribute!

Twitter as a Cure for Perfectionism

Perfectionism can cripple productivity in that it will stop you from even getting started. Why do something now if it will not be perfect? Why should I not shave this yak? This problem is particularly insidious in programming and design, in that you can argue that there may very well be a “perfect” way of doing it: the global optimum of some function, or the minimum amount of ink to convey an idea. You can essentially work forever and never achieve perfection. There is a cure, in an unlikely place: Twitter.

I was (and to some degree still am) guilty of perfectionism, but I found that the most therapeutic device in combating it was having a space to spout off half-cocked ideas into the ether and watch them linger. I noticed that no matter how much or how little time I spent crafting these short messages, they always fell kind of short of perfect, and it did not matter.

If you find yourself bored one day, log into a streaming Twitter client (e.g. TweetDeck) and add a trending hashtag column. What you’ll see immediately is that almost no tweet is significant. You could craft the most beautiful, intelligent short poem and post it to an unknown quantity of people. Could no one see it? Perhaps, though unlikely. Could someone see it and move on to the thousands of other messages they’re trying to consume? Absolutely.

This exercise reinforces that letting something go is more important than holding it to perfection. Someone may call you mediocre, but that is just as insignificant. If someone sees it and doesn’t think it’s great, it’s unlikely they’ll do anything but move on (unbeknownst to you). A kind of social nihilism.

This echoes across almost all social networks, where the software really offers no way to be negative (e.g. favorite, like, heart), Hater App excluded.

While this technique may not work in quite the same way if you have over 100k followers, it addresses the real problem at hand: worrying that you’re not as great as you want to be and people will find out. Tweet more, ship more, write more, design more, who cares?

Your website should (probably) not have a unique mobile version

If you are the owner of a higher resolution mobile or tablet device (such as the iOS devices, Google Nexus devices, Samsung devices, etc.), you understand the frustration when a website defaults you to a mobile version or mobile “friendly” version of the site. Even more off-putting is that many of these mobile sites skin themselves to iOS, such as Wunderground.

Screenshot of Wunderground mobile version of the site on Samsung Galaxy Note II

For those weather enthusiasts that are not using an iOS device, it at the very least gives the impression that you do not care about your non-iOS users. Moreover, the information is not presented in such a way that someone with a larger display than a previous generation iOS device can appreciate.

If we take a look at the desktop version of the site on my Galaxy Note II, you can see it is quite pleasing while presenting more information than before.

Screenshot of Wunderground desktop version of the site on Samsung Galaxy Note II





When I was using my iPhone 3G, the mobile site was quite pleasing. But that was years ago now, and average device resolution has improved with time. Back then it was “Browser”, now it’s Firefox Mobile or Chrome (or some other WebKit-based browser, such as Safari). If a mobile user is using Chrome, it is likely that they are using a next-generation device and do not wish to see a mobile version of the site.

You can argue that older devices and even iPhones need these smaller, custom, web versions but many users disagree.

Unless your analytics tell you straight up that the unique mobile versions are worth maintaining, this type of behavior feels like supporting IE 5 and 800×600 did in 2007: something obnoxious that your behind-the-times policy at work makes you do.

The elephant in the room, of course, is that responsive design has more or less made this practice not only in bad taste, but bad design and bad code. It reflects poorly upon you, the developer, when you have a generic mobile version of your site, rather than some nice responsive design that just does it for you. While some sites execute these custom sites beautifully, devices are becoming more advanced, with mobile device resolution increasing with each product launch. Annoying your users for the benefit of late adopters will punish those desirable, more likely to convert, users while rewarding the reluctant.

Configure a USB foot pedal (or remap any key) on Linux

I wanted a USB foot pedal solely for the purposes of push-to-talk (PTT) on Mumble, so I bought one and eventually got it working. Configuring any off-the-beaten-path device on Linux can be kind of a pain, but it does not always have to be. Hopefully this helps someone else with the same problem.

If you are really into it, you can build your own foot pedal or you can simply buy one for about $12 USD online. The pedal I bought was just a generic USB gaming foot pedal on Amazon, but there are fancier ones out there with multiple pedals as well, which we can address later on.

Essentially, the foot pedal will act as a USB keyboard with some kind of default key press. This should work the same for nearly every USB foot pedal, but the default key is not always ideal. In my case, the press was mapped to a lowercase b, which does not really help anyone. So the real fun in configuring this pedal is rebinding that key press to something else, exclusively for the foot pedal. If you are happy with your default foot pedal press, then you can stop here; otherwise, let’s get to rebinding.

Note: For the entirety of this tutorial, I will be using my own unique values returned from various commands we will be using. I have done my best to note those values that you should replace with your own unique results. One other thing, we’re basically going over how to remap a key in Linux under the specific case for a USB foot pedal. You can use this tutorial to remap any key on any keyboard if you wish!

Once you have plugged in your foot pedal, we need to find out how the foot pedal is addressed by the operating system, which we can do by listing the attached USB devices with lsusb.

Looking down the list, I see a device that is probably my foot pedal, Bus 006 Device 004: ID 0c45:7403 Microdia. We will need more information on it than that, so let’s go a little deeper.

Note that the argument -d 0c45:7403 here is my device ID as it shows up in lsusb. It limits the verbose output, which we whittle down to just the parts we need.
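If you would rather pull the IDs out with a script than by eye, the device line shown earlier can be split into its parts. This is only a sketch: the function name parse_lsusb_line and the regular expression are my own, and they assume the line layout matches the one quoted above (bus, device, vendor:product ID, then a name).

```python
import re

# Pattern for one line of lsusb output in the layout quoted in this tutorial.
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d{3}) Device (?P<device>\d{3}): "
    r"ID (?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})\s*(?P<name>.*)"
)


def parse_lsusb_line(line):
    """Return the fields of one lsusb line as a dict, or None if it does not match."""
    match = LSUSB_LINE.match(line.strip())
    return match.groupdict() if match else None


info = parse_lsusb_line("Bus 006 Device 004: ID 0c45:7403 Microdia")
print(info["vendor"], info["product"], info["name"])
```

Here the vendor part (0c45) and product part (7403) are the two halves of the device ID passed to -d.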

Moving on, we need to find how the USB foot pedal is addressed as an input device so that we can remap it.

The output of /lib/udev/findkeyboards tells us what’s plugged in and how we can address it specifically. I suspect that input/event11 is my device, so I will try that out with our next command. If you accidentally choose your primary keyboard, press ESC to get back to the command prompt.

There we can see the scan code that is detected when the foot pedal is pressed; in my case that was 0x70005. Quick note for those with a multiple pedal system: press each pedal individually and note which pedal corresponds to which scan code for later.

Alright, we’re almost there, but we’re about to get a little weird. Using your favorite editor as superuser, open up /lib/udev/rules.d/95-keymap.rules.

At the bottom of this file, we’re going to append a new line that is similar to the others, except with our new device’s configuration.

Very important note here: the ID_VENDOR is set to our result from lsusb before, the idProduct matches the 0x7403 we also got from lsusb, and the new line comes just before the line LABEL="keyboard_end". This will very likely be different from your configuration, so be sure to substitute your unique values in here, as with the entirety of this tutorial.

Save and close the file. We now can remap the key press to something more palatable for PTT stuff on Mumble by creating a keymap file.

Using your favorite editor, create a new keymap file at /lib/udev/keymaps/microdia (substituting your LABEL from before as the filename). In that new file, it is as simple as using the scan code we got before and the new key we want to map it to. In my case, I wanted to map it to the phantom F13 key, so that it never gets in the way.
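For a pedal board with several pedals, it can be less error-prone to generate the file contents than to type them. The sketch below assumes the keymap file format is one "scan code, space, lowercase key name" pair per line, which is how the keymap files shipped with udev are laid out. The function name build_keymap is my own, and the second scan code (0x70006) and its f14 binding are hypothetical values added only to show a two-pedal case.

```python
def build_keymap(bindings):
    """Render a {scan_code: key_name} mapping as keymap file text, one pair per line."""
    lines = [
        "0x{:x} {}".format(scan_code, key_name.lower())
        for scan_code, key_name in sorted(bindings.items())
    ]
    return "\n".join(lines) + "\n"


# 0x70005 -> f13 is the mapping from this tutorial; 0x70006 -> f14 is hypothetical.
print(build_keymap({0x70005: "F13", 0x70006: "F14"}), end="")
```

The printed text can then be pasted into the keymap file, with your own scan codes substituted.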

Save and close the file, and we are basically done. Run the last command to get it up and running,

And we’re done! Note that you will need to reboot your machine for the change to be permanent, but otherwise you should be good.

If you have any problems, questions or suggestions, leave a comment!

College graduates are depressed, and they should (and shouldn’t) be

Recent college graduates, sold a guaranteed future in exchange for unabsolvable debt, are increasingly finding that the emperor has no clothes–working sucks, and not in a whiny small way. Graduating in some of the worst economic conditions in almost a century, young folks are at the whim not of the educated but the established money/power architecture. This is not to say that we should all be black bloc anarchists burning effigies of “fat cats” or the like. Blame cannot be (entirely) placed on a given person or class of people, but on what is essentially the fastest clip of advancement in human history.

The New York Times recently published an article on the lives of 20-somethings in this country (for future reference, the United States). The article resonates with almost everyone I know personally, in that we work long hours for little pay and little hope for advancement. The only industry in which this might not be entirely true is software/tech. However, even within the startup culture you have long hours at below-market pay in the hopes of some big payoff down the road (sound familiar?).

You can focus on marketing, “creative work”, etc. when observing these trends, but it can be seen in what were traditionally solid careers in science, technology, engineering and math (STEM) related fields. Analysis of the current social and economic climates often ignores the role of advancing technology, at least in a high level way.

Beyond just the oft heard complaints of being “always on” with smart phones, etc., the real role that technology is playing is far deeper–it is shifting, as it always has, towards more efficient work. The output of workers since 1947 has increased dramatically, while compensation has stagnated for over 10 years, seen in the chart below:

That major bump in productivity in the mid to late ’90s is no coincidence: computer and Internet technology has vastly increased our productivity, while employers are opting not to increase compensation. And why should they?

Employers have the upper hand in that jobs are scarce while talent is not. Increased productivity also means that fewer workers are needed to complete the same amount of work. Experience is waved over workers new to the workforce as a justification for a lack of compensation and advancement. More insidious than this, of course, are the unpaid, borderline illegal internships that litter the job landscape and further depress wages. Oftentimes those wishing to strive for a high level profession, such as medicine, have to suffer through incredibly harsh hours with no compensation in order to qualify for college applications. Creatives in the film and book industries see this as well.

The underlying trend in all of this is that the young are asked to sacrifice tremendously in the hopes of a payoff later in life. This has been the song of the ages: you reap what you sow. The question lies in what the final payoff really is. It is far from guaranteed to happen, at least in a capacity commensurate with the level of effort.

It is not all gloom and doom, though. Looking forward to the future, technology will continue to improve our lives and make the world better for all. Perhaps this is a period of adjustment, a re-calibration to the new level of what is possible.