The Common Crawl corpus contains politics-related material, including political speeches, the full text of bills, and news articles that mention politicians. By identifying web pages that relate to politics, you could find out which words are associated with individual politicians.
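One rough way to approach the word-association idea is to count the words that appear near a politician's name. The sketch below is only an illustration under several assumptions: the pages have already been reduced to plain text (for example, from Common Crawl's extracted-text files), the function name `associated_words`, the small `STOPWORDS` set, and the window size are all made up for this example, and "Jane Doe" is a placeholder name rather than a real politician.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "about", "for", "is", "was"}


def associated_words(pages, name, window=5):
    """Count words found within `window` tokens of `name` across plain-text pages."""
    name_tokens = name.lower().split()
    n = len(name_tokens)
    counts = Counter()
    for text in pages:
        # Lowercase and keep alphabetic tokens only (a crude tokenizer).
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == name_tokens:
                start = max(0, i - window)
                end = min(len(tokens), i + n + window)
                # Words before and after the name, excluding the name itself.
                context = tokens[start:i] + tokens[i + n:end]
                counts.update(
                    t for t in context
                    if t not in STOPWORDS and t not in name_tokens
                )
    return counts


# Placeholder input standing in for text extracted from crawled pages.
sample_pages = [
    "Senator Jane Doe spoke about healthcare reform today.",
    "Critics say Jane Doe ignored healthcare costs.",
]
top_words = associated_words(sample_pages, "Jane Doe").most_common(3)
```

Raw co-occurrence counts favor words that are common everywhere, so a fuller analysis would likely compare these counts against each word's frequency across the whole corpus.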

Bollywood

Bollywood is the second-biggest movie industry, after Hollywood, and it has a far-reaching social and economic impact. It would be interesting to use Common Crawl data to analyze movie-watching trends in the south and north of India.