Pandas 0.7.0 released: Python data analysis library

rch · on Feb 10, 2012

Pandas is looking very nice in general, and I'm happy to find HDF5 in there too :)

http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytabl...

joelthelion · on Feb 10, 2012

Ooooh, this is really cool! R is nice, but switching to it is a pain when working in python.

Together with scikits.learn, this could prove really useful in machine learning and data analysis projects.

acslater00 · on Feb 10, 2012

Pandas is literally in my top 3 favorite open-source projects. I use v0.6x regularly and it is absolutely fantastic. Highly recommend. Can't wait to try some of these new features as well.

dshah · on Feb 11, 2012

I'm curious: What are your other two favorites?

radikalus · on Feb 10, 2012

If I were starting fresh, I'm not sure I'd go with R anymore -- but it's so hard to leave R when you've got a toolkit of 50+ packages you need. =(

coda_ · on Feb 10, 2012

Wes, was just on your blog and see you are also into data visualization. Wondering if you have any recommendations for web based charting tools? I've used flot (jquery plugin), but looking for alternatives. Thanks!

wesm · on Feb 11, 2012

Very interested in d3 integration. Some people (http://github.com/mikedewar/D3py) have already started working in that direction. The IPython HTML notebook makes JavaScript visualization combined with pandas a very attractive option going forward, especially if you can come up with a way to have an interactive plot with backend computations being handled by pandas. pandas currently does not emit JSON; I would love to adapt UltraJSON or another library to turn DataFrame objects into JSON quickly and efficiently.
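As a rough illustration of what "turning a table into JSON" for a JavaScript charting library might involve, here is a minimal standard-library sketch. It is not pandas or UltraJSON code: the `columns` dict is a hypothetical stand-in for a DataFrame, and `to_records` is an illustrative helper. It maps NaN to `null` because `json.dumps` would otherwise write a bare `NaN`, which is not valid JSON.

```python
import json
import math

# Hypothetical column-oriented data standing in for a DataFrame.
columns = {
    "date": ["2012-02-08", "2012-02-09", "2012-02-10"],
    "price": [10.5, float("nan"), 11.25],
}

def to_records(cols):
    """Turn {column: [values]} into a list of row dicts, mapping NaN to None."""
    names = list(cols)
    rows = []
    for values in zip(*(cols[name] for name in names)):
        row = {}
        for name, value in zip(names, values):
            is_nan = isinstance(value, float) and math.isnan(value)
            row[name] = None if is_nan else value
        rows.append(row)
    return rows

# A JSON array of row objects, the shape many JS charting libraries accept.
payload = json.dumps(to_records(columns))
print(payload)
```

A row-per-object layout is easy to consume on the JavaScript side, at the cost of repeating the column names in every row.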

Ecio78 · on Feb 11, 2012

I've used http://www.amcharts.com (and its mapping companion http://www.ammap.com). It was originally in Flash, but now there's also a JS version (haven't tried it yet). I think I've read here on HN that a new flot fork has been released with jquery requirements, but I can't find the link.

Cieplak · on Feb 10, 2012

http://mbostock.github.com/d3/

http://raphaeljs.com/

mahmoudimus · on Feb 11, 2012

I like http://www.highcharts.com/

wildmXranat · on Feb 10, 2012

I tried to find information on how fast the operations are in Pandas, but couldn't see any numbers. Does anybody have opinions about that aspect?

wesm · on Feb 10, 2012

I've written quite a bit about performance on my blog: http://blog.wesmckinney.com. The historical (v)benchmarks page is a good resource (but doesn't compare to any other libraries): http://pandas.pydata.org/pandas-docs/vbench/

hhimanshu · on Feb 10, 2012

What are you using to display code on your blog? It's really nice!

wesm · on Feb 10, 2012

Recent posts use the Crayon syntax highlighter for Wordpress. Though I'm thinking about ditching WP eventually for a workflow more like http://jseabold.net/blog/2012/01/project-genesis.html.

gourneau · on Feb 10, 2012

Wes is a rockstar

regularfry · on Feb 11, 2012

> NaN (not a number) is the standard missing data marker used in pandas

That's just wrong.

wesm · on Feb 11, 2012

Is it? For lack of NA bit patterns in NumPy it's either use a special value (like NaN) or use masked arrays. If you choose the latter, I say to you: good luck.

regularfry · on Feb 12, 2012

NaN as commonly used already has a meaning: it's the result of a calculation whose inputs were known, and the calculation is known to be undefined for those specific inputs. "Unknown" means something entirely different: that we don't know what the inputs were, but if we did they are unlikely to have been NaN.

Conflating the two concepts means you can't tell the difference given the result set. It's just a happy accident that "unknown" and NaN have identical propagation rules, but that doesn't mean that it's safe to use one in place of the other. Reading up on it, it looks like Octave and Matlab can treat NaN as "missing data", though, so I guess there's a certain "industry standard behaviour" to follow so as not to surprise users, but it's still less than ideal.
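A small sketch of the conflation, using only plain Python floats. The `missing` value here is an assumed stand-in for a "we never observed this" marker, not anything pandas-specific:

```python
import math

inf = float("inf")
undefined = inf - inf      # inputs were known; the result is undefined
missing = float("nan")     # assumed marker for "input was never observed"

# Both values propagate through arithmetic in the same way...
results = [1.0 + undefined, 1.0 + missing]

# ...and afterwards the only question we can ask is "is it NaN?".
print([math.isnan(r) for r in results])
```

Both entries report as NaN, so nothing in the result set says which one came from an undefined calculation and which from an unknown input.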

In an ideal world, we could define an explicit "missing data" quiet NaN which would have a distinct visual representation - I suspect this is doable with access to the float's fraction (mantissa) bits, since the exponent bits of every NaN are all ones, but I don't know how Python could take advantage of it.
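For what it's worth, a tagged quiet NaN can at least be built and detected from Python with the `struct` module. This is a minimal sketch under assumptions: `MISSING_PAYLOAD` is an arbitrary, hypothetical tag, and `make_missing` / `is_missing` are illustrative helpers. It also assumes IEEE 754 doubles, and it does not claim the payload survives arithmetic, which depends on the hardware.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8000000000000   # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF     # remaining 51 fraction bits
MISSING_PAYLOAD = 0x7A2               # arbitrary tag, purely hypothetical

def float_bits(x):
    """Return the 64-bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def make_missing():
    """Build a quiet NaN whose fraction bits carry MISSING_PAYLOAD."""
    bits = QUIET_NAN_BITS | MISSING_PAYLOAD
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def is_missing(x):
    """True only for NaNs carrying the hypothetical 'missing' tag."""
    return math.isnan(x) and (float_bits(x) & PAYLOAD_MASK) == MISSING_PAYLOAD

NA = make_missing()
print(math.isnan(NA), is_missing(NA), is_missing(float("nan")))
```

The tagged value still behaves as an ordinary NaN everywhere else, so existing code keeps working; only code that inspects the bit pattern can tell the two apart.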

jofer · on Feb 11, 2012

Well, masked arrays are a very good solution for the right problem (i.e. temporarily or permanently flagging data as "bad" while preserving the original data). Not to rehash the old debate, but they're quite handy when you need them.

I do agree that NaNs are a better choice for truly missing data, but I'm biased just because they use less memory. They're not a solution for non-floating-point data, though.

Great job on Pandas, by the way!

atron306 · on Feb 10, 2012

Pandas + statsmodels = #rstats domination. Really like where this project is going.

grogenaut · on Feb 11, 2012

It'd be better if you lost the ragging on c#/c++/java and went positive by accentuating the great abilities of interpreted languages like python for rapid prototyping, which is what you are doing when you are iteratively improving analysis.

FYI it's fun to hear an academic ragging on "unmaintainable code".

wesm · on Feb 11, 2012

Who's the academic you're referring to (if it's me, you're misinformed)?

One of the strengths of Python is that you can use it to build critical production systems (which I've done for many years in the financial industry). You come up against a lot of people who think "Java/C++/C# are the only suitable systems languages".

grogenaut · on Feb 12, 2012

I use python at work heavily. I also equally use Java, C++, C#, Ruby, and shell scripting. I use what's good for what I'm trying to do, and I like having several choices.

I'm merely pointing out that the language bashing is not productive. The writeup should point out the positives and stop trying to turn the differences between languages into a parallel of the state of American political discourse.

orp · on Feb 10, 2012

Is there anything similar to Pandas that runs on the JVM?

ogrisel · on Feb 10, 2012

Incanter: http://incanter.org/

zentrus · on Feb 11, 2012

Is there anything close to this for Ruby?