Difference between Python’s collections.Counter and nltk.probability.FreqDist

I want to calculate the term frequencies of words in a text corpus. For some time I've been using NLTK's word_tokenize followed by probability.FreqDist to do this. word_tokenize returns a list of tokens, which FreqDist converts to a frequency distribution. However, I recently came across the Counter class in collections (collections.Counter), which seems to do exactly the same thing. Both FreqDist and Counter have a most_common(n) method that returns the n most common words. Does anyone know if there's a difference between the two? Is one faster than the other? Are there cases where one would work and the other wouldn't?
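To make the comparison concrete, here is a minimal sketch that counts the same token list both ways. The sample sentence is made up, and str.split stands in for word_tokenize so the snippet runs without NLTK's tokenizer data. The NLTK part is skipped if the package is not installed. As far as I know, FreqDist in NLTK 3.x is a subclass of Counter, which would explain the overlap; treat that as something to verify against your installed version.

```python
from collections import Counter

# Hypothetical sample text; str.split is a stand-in for nltk.word_tokenize.
tokens = "the cat sat on the mat and the dog sat too".split()

counts = Counter(tokens)
print(counts.most_common(2))

try:
    from nltk.probability import FreqDist
except ImportError:
    FreqDist = None  # NLTK not installed; skip the comparison

if FreqDist is not None:
    fd = FreqDist(tokens)
    # Check whether FreqDist builds on Counter in this NLTK version.
    print(isinstance(fd, Counter))
    # Compare the raw counts from both containers.
    print(dict(fd) == dict(counts))
    # FreqDist adds statistics helpers, e.g. relative frequency of a sample.
    print(fd.freq("the"))
```

If the isinstance check prints True, the counting itself is shared and the practical difference is the extra methods FreqDist layers on top, such as freq() and N().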

How can you get the call tree with Python profilers?

I used to use a nice Apple profiler that is built into the System Monitor application. As long as your C++ code was compiled with debug information, you could sample your running application and it would print out an indented tree telling you what percentage of the parent function's time was spent in each function (and how that time split between the function's own body and the other functions it called).
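One partial way to get caller/callee information from the standard library is cProfile together with pstats. The sketch below is only an approximation of the tree described above: print_callees shows one level of the call graph per function rather than a full indented tree, and the functions parent and leaf are hypothetical examples, not taken from the question.

```python
import cProfile
import io
import pstats


def leaf(n):
    # Hypothetical workload so the profiler has something to measure.
    return sum(i * i for i in range(n))


def parent():
    return leaf(10_000) + leaf(20_000)


profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# Collect the report in a string buffer instead of writing to stdout directly.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
# Restrict the report to functions whose name matches "parent" and
# list what each of them called, with call counts and times.
stats.print_callees("parent")
report = buf.getvalue()
print(report)
```

Calling stats.print_callers("leaf") gives the reverse view, listing which functions called leaf. Building a full multi-level tree would mean walking these relationships recursively or using a third-party tool.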