The pandas data analysis module is quickly becoming the go-to tool for data analysis in Python. Features such as in-memory joins and sorts are extremely powerful when working with datasets that fit in memory. Often, operations that take hours to run in Excel take only seconds in pandas.
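To make that concrete, here is a minimal sketch of an in-memory join and sort. The tables, column names, and values are made up for illustration, and the code assumes a reasonably recent pandas, where sorting uses DataFrame.sort_values (older releases called it sort).

```python
import pandas as pd

# Two small, made-up tables held entirely in memory
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, 30],
    "amount": [250.0, 75.5, 120.0, 310.25],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "name": ["Acme", "Globex", "Initech"],
})

# In-memory join on the shared key (roughly what a VLOOKUP does in Excel)
joined = orders.merge(customers, on="customer_id", how="left")

# In-memory sort, largest orders first
by_amount = joined.sort_values("amount", ascending=False)
print(by_amount)
```

On a table with millions of rows the same two calls still run in memory, which is where the speedup over a spreadsheet comes from.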
As a recent re-convert to Mac OS X, I wanted to set up the development version of pandas on my new machine running Mac OS X 10.8.
To begin, we need a few things installed, particularly pip and Homebrew. If you have not yet installed pip but already have a working Python installation, simply run the following in your terminal:
$ sudo easy_install pip
Once that’s done, we need to install a few system libraries and compilers with Homebrew before we can build the Python packages.
$ brew update
$ brew install hdf5
$ brew install gcc
$ brew install gfortran
This brings in the compilers and libraries we’ll need later on to build everything, including gfortran for SciPy’s Fortran code and HDF5 for PyTables (the tables package).
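Before moving on, it can be worth confirming that the compilers actually landed on your PATH. This short sketch uses shutil.which, so it assumes Python 3.3 or later; the helper name find_tools is hypothetical. Keep in mind that a gcc on the PATH may be Apple's bundled compiler rather than the Homebrew one, so check the reported path.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# gcc and gfortran are the compilers the brew step is meant to provide
tools = find_tools(["gcc", "gfortran"])
for name, path in tools.items():
    print(name + ": " + (path or "NOT FOUND"))
```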
Assuming you want the following libraries installed globally rather than in a virtual environment, you can install pandas’s build requirements (Cython and NumPy) along with its optional dependencies in a single line.
$ sudo pip install cython numpy numexpr tables scipy matplotlib bottleneck
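If you want to confirm that everything from that pip line is visible to Python before building pandas, a sketch like the following can help. It assumes Python 3.4 or later for importlib.util.find_spec, which locates a module without importing it; the PACKAGES mapping and the missing_modules helper are hypothetical names. Note that the import name does not always match the pip name: cython is imported as Cython.

```python
import importlib.util

# pip package name -> the module name you actually import
PACKAGES = {
    "cython": "Cython",
    "numpy": "numpy",
    "numexpr": "numexpr",
    "tables": "tables",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "bottleneck": "bottleneck",
}


def missing_modules(packages):
    """Return the pip names whose module cannot be found (nothing is imported)."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


missing = missing_modules(PACKAGES)
print("missing:", ", ".join(missing) if missing else "none")
```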
With that done, you should be able to clone the pandas repository and install the latest development version.
$ git clone https://github.com/pydata/pandas.git
$ cd pandas
$ sudo python setup.py install
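Once the install finishes, a quick check shows which pandas Python actually picks up. Run it from outside the cloned pandas directory: from inside the source tree, Python may try to import the unbuilt source checkout instead of the installed package. The exact version string of a development build varies, so treat the output as informational.

```python
import os

import pandas as pd

# Which pandas got imported, and from where on disk
print("pandas version:", pd.__version__)
print("imported from:", os.path.dirname(pd.__file__))
```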
That’s pretty much it. If you have any problems, feel free to leave a comment.