Topic Discovery With Apache Pig and Mallet

Datetime:2016-08-23 02:39:26          Topic: Apache Pig           Share

Only one of two posts from this blog in 2012 but it is a useful one.

From the post:

A common desire when working with natural language is topic discovery. That is, given a set of documents (eg. tweets, blog posts, emails) you would like to discover the topics inherent in those documents. Often this method is used to summarize a large corpus of text so it can be quickly understood what that text is ‘about’. You can go further and use topic discovery as a way to classify new documents or to group and organize the documents you’ve done topic discovery on.

Walks through the use of Pig and Mallet on a newsgroup data set.

I have been thinking about getting one of those unlimited download newsgroup accounts.

Maybe I need to go ahead and start building some newsgroup data sets.





About List