...
The entire Common Crawl data set is stored on Amazon S3 as a Public Data Set:
http://aws.amazon.com/datasets/41740
The directory structure is as follows:
Crawl #1 - s3://aws-publicdatasets/common-crawl/crawl-001/
Crawl #2 - s3://aws-publicdatasets/common-crawl/crawl-002/
Crawl #3 - s3://aws-publicdatasets/common-crawl/parse-output/
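
Most S3 tools take the bucket name and the key prefix as separate arguments, so it can help to split the locations above into those two parts. The following is a minimal sketch using only the Python standard library. The names `CRAWL_LOCATIONS` and `split_s3_uri` are hypothetical helpers introduced for illustration and are not part of any Common Crawl or AWS tooling.

```python
from urllib.parse import urlparse

# Crawl locations copied verbatim from the listing above.
CRAWL_LOCATIONS = {
    "Crawl #1": "s3://aws-publicdatasets/common-crawl/crawl-001/",
    "Crawl #2": "s3://aws-publicdatasets/common-crawl/crawl-002/",
    "Crawl #3": "s3://aws-publicdatasets/common-crawl/parse-output/",
}


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix).

    Hypothetical helper: the bucket is the URI's network location and
    the key prefix is the path without its leading slash.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    for name, uri in CRAWL_LOCATIONS.items():
        bucket, prefix = split_s3_uri(uri)
        print(f"{name}: bucket={bucket} prefix={prefix}")
```

All three crawls share the same bucket and differ only in the key prefix, so a listing call in an S3 client of your choice would typically be given the bucket once and each prefix in turn.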
...