Pairing gender detection with word count statistics

Have you ever wondered which words in the NY Times are more likely to be associated with men and which with women? Neal Caren has!

His blog post, "Using Python to see how the Times writes about men and women," uses the Natural Language Toolkit (NLTK), a friendly natural language processing (NLP) library. He goes through each step in detail and describes his method for the analysis from start to finish.
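To give a feel for the general idea of pairing gender detection with word counts, here is a minimal sketch. It is not Caren's code: it assumes a simple approach where each sentence is labeled by the gendered words it contains and the remaining words are tallied per label. The word lists, function names, and sample text are all hypothetical, and plain regular expressions stand in for NLTK's tokenizers so the snippet runs with only the standard library.

```python
import re
from collections import Counter

# Illustrative word lists (assumption): far shorter than a real analysis would need.
MALE_WORDS = {"he", "him", "his", "man", "men", "mr"}
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "ms", "mrs"}
GENDERED = MALE_WORDS | FEMALE_WORDS


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace.

    Abbreviations such as "Mr." will be split incorrectly; a proper
    sentence tokenizer handles these cases better.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def classify(tokens):
    """Label a sentence by which gendered words it contains."""
    has_male = any(t in MALE_WORDS for t in tokens)
    has_female = any(t in FEMALE_WORDS for t in tokens)
    if has_male and has_female:
        return "both"
    if has_male:
        return "male"
    if has_female:
        return "female"
    return "none"


def gendered_word_counts(text):
    """Count the non-gendered words that appear in each kind of sentence."""
    counts = {label: Counter() for label in ("male", "female", "both", "none")}
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        label = classify(tokens)
        counts[label].update(t for t in tokens if t not in GENDERED)
    return counts


sample = "He scored the winning goal. She chaired the board meeting. The weather was mild."
counts = gendered_word_counts(sample)
print(counts["male"].most_common(3))
print(counts["female"].most_common(3))
```

On a real corpus you would compare how often each word appears in "male" versus "female" sentences, rather than looking at raw counts, so that common words like "the" do not dominate the results.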

If you’re looking to get your feet wet with a bit of text mining, replicating his approach on a news corpus is a great, fun start!

More on NLTK, an open-source, community-driven effort, is available here. There are many quick, relatively easy-to-implement exercises that can be replicated with basic Python knowledge.
