> NLTK is very well-documented and easy to work with. Agreed. > It's als...

jnbiche · on Jan 17, 2012

> Don't agree. The biggest missing piece is a statistical parser which forms the basis for a lot of further linguistic analysis.

Conceded and agreed. This is the one major gap. But I still maintain it's a remarkably complete toolkit. Plus you get to work in Python, which is a big advantage for me.

What's wrong with the Naive Bayes classifier? Did you submit a patch?

Likewise, I totally agree with you that there are faster/more accurate/more efficient implementations of many of the tools in NLTK. If performance is a must, then you're better off prototyping in NLTK and then switching to a specialized library. But in terms of completeness and ease of use, NLTK is very strong.

EDIT: I'm not sure why abhaga is being downvoted. There was nothing disrespectful in his response to me. Disagreement is an important part of intelligent discussion. Upvoting to counter the downvote(s).

abhaga · on Jan 17, 2012

> What's wrong with the Naive Bayes classifier?

The problem I found is that it mixes up the binomial and the multinomial event models for Naive Bayes (see http://www.cs.cmu.edu/~knigam/papers/multinomial-aaaiws98.pd... for reference). It computes the probabilities as per the binomial event model but doesn't include the probabilities of missing events, i.e. the (1 - p) terms for features that are absent from the document. This was my understanding from reading the source code.
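
To make the distinction concrete, here is a minimal sketch of a binomial (presence/absence) Naive Bayes score, with and without the terms for absent words. This is not NLTK's code; the names `word_presence_prob`, `prior`, `bernoulli_log_score` and `classify`, and all the probabilities, are made up for illustration.

```python
import math

# Hypothetical probabilities that each word appears in a document of a class.
# The numbers are invented purely for illustration.
word_presence_prob = {
    "spam": {"free": 0.8, "meeting": 0.1, "offer": 0.9},
    "ham": {"free": 0.3, "meeting": 0.6, "offer": 0.2},
}
prior = {"spam": 0.5, "ham": 0.5}


def bernoulli_log_score(words, label, include_absent=True):
    """Log of P(label) * P(document | label) under the presence/absence model.

    With include_absent=False the (1 - p) factors for vocabulary words that
    do not occur in the document are dropped, which is the omission
    described above.
    """
    present = set(words)
    score = math.log(prior[label])
    for word, p in word_presence_prob[label].items():
        if word in present:
            score += math.log(p)
        elif include_absent:
            score += math.log(1.0 - p)
    return score


def classify(words, include_absent=True):
    """Return the label with the highest log score."""
    return max(
        prior,
        key=lambda label: bernoulli_log_score(words, label, include_absent),
    )


doc = ["free"]
print(classify(doc, include_absent=True))
print(classify(doc, include_absent=False))
```

With these made-up numbers the two variants disagree on the one-word document: counting the absent words "meeting" and "offer" favours "ham", while ignoring them favours "spam". So dropping the missing-event terms can change the predicted label, not just the scale of the scores.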

> Plus you get to work in Python, which is a big advantage for me.

Indeed. I so wish someone would build a dependency parser on top of pfp so that I can ditch the Stanford parser. I have used https://github.com/dasmith/stanford-corenlp-python for interfacing with the Stanford toolkit, but it is somewhat brittle.

queensnake · on Jan 17, 2012

No SVM support either. I could try to add it I guess; libSVM has Python bindings already.

jnbiche · on Jan 18, 2012

As far as I know, NLTK has no C dependencies other than its general dependency on NumPy. I think they are keeping the toolkit in pure Python on purpose (but I may be wrong about that). That said, there are SVM implementations in pure Python -- PyMVPA, for one.