Amazon Web Services

Common Crawl data is stored as a public data set on Amazon Web Services (AWS), making it free to access using your AWS credentials and Elastic MapReduce. If you don’t already have an AWS account, you’ll need to create one to get started.

...

The Access Key ID and Secret Access Key identify you when you access data on Amazon’s cloud. They can be used to authorize actions that cost money, so be sure to keep them in a safe place.
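
One way to keep the keys out of source code is to read them from environment variables at run time. The following is a minimal sketch, not part of the Common Crawl tooling: the class name AwsCredentialsFromEnv and its helper methods are hypothetical, and it assumes the keys have been exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the names conventionally used by AWS command-line tools and SDKs.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of any AWS or Common Crawl library.
public class AwsCredentialsFromEnv {

    // Assumed variable names (the convention used by AWS tools and SDKs).
    public static final String ACCESS_KEY_VAR = "AWS_ACCESS_KEY_ID";
    public static final String SECRET_KEY_VAR = "AWS_SECRET_ACCESS_KEY";

    // Returns the trimmed value of a required variable, or throws if it is missing or blank.
    public static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }

    // Replaces all but the last four characters with '*' so the value can be logged.
    public static String mask(String key) {
        int visible = Math.min(4, key.length());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < key.length() - visible; i++) {
            sb.append('*');
        }
        return sb.append(key.substring(key.length() - visible)).toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String accessKeyId = require(env, ACCESS_KEY_VAR);
        String secretKey = require(env, SECRET_KEY_VAR);

        // Never print the secret itself; report only that it was loaded.
        System.out.println("Access Key ID: " + mask(accessKeyId));
        System.out.println("Secret Access Key loaded (" + secretKey.length() + " characters)");
    }
}
```

With this approach the keys live only in your shell environment, so they are less likely to end up in a shared repository or a pasted code listing.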

Local Development Environment (Java/Hadoop/Eclipse)

Yahoo! provides an excellent tutorial showing how to set up a local MapReduce development environment:

...