Creative Commons Data Dump Jun ’11

Jeff Atwood

The latest version of the Stack Exchange Creative Commons Data Dump is now available. This reflects all public data in …

Stack Overflow
Server Fault
Super User
Stack Apps
all public non-beta Stack Exchange Sites
all corresponding meta sites

… up to Jun 2011.

This month’s Stack Exchange data dump, as always, is hosted at ClearBits! You can subscribe via RSS to be notified every time a new dump is available.

Please read, this is not the usual yadda yadda! Three things:

Because the dumps are quite a bit of work for us, we’re moving to a bi-monthly schedule instead of monthly. Meaning, you can expect dumps every two months instead of every month. If you have an urgent need for more timely data than this, contact us directly, or use the Stack Exchange Data Explorer, which will continue to be updated monthly.
The attribution rules have changed to forbid JavaScript generated attribution links.
As of November 2010, we enhanced the format of the data dump to include more requested fields, full revision history, and many other pending meta requests tagged [data-dump]. That’s why the dump is so much larger, but we did break it out into individual files per site within the torrent, so you can download just the files you need.
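
Because each site ships as its own set of files, you can process one file at a time without loading the whole dump. The following is a minimal Python sketch, not an official tool. It assumes each file is XML made up of `<row>` elements whose fields are stored as attributes; the sample data, the `posthistory` root element, and the helper name `count_attribute` are hypothetical and exist only for illustration. Check the Readme in the torrent for the actual layout.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_attribute(xml_source, attribute):
    """Stream <row> elements and tally the values of one attribute.

    Assumes (not confirmed by the post) that records are <row> elements
    with their fields stored as XML attributes.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            value = elem.get(attribute)
            if value is not None:
                counts[value] += 1
            elem.clear()  # drop the row's data once it has been counted
    return counts


# Tiny hypothetical sample standing in for one per-site file from the torrent.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posthistory>
  <row Id="1" PostHistoryTypeId="2" PostId="1" />
  <row Id="2" PostHistoryTypeId="5" PostId="1" />
  <row Id="3" PostHistoryTypeId="2" PostId="2" />
</posthistory>
"""

counts = count_attribute(io.BytesIO(SAMPLE), "PostHistoryTypeId")
print(counts.most_common())
```

To run this against a real file, pass an open binary file object (or a path) in place of the in-memory sample. Streaming with `iterparse` keeps memory use low, which matters for the larger sites.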

If you’d prefer not to download the torrent and would rather play with the most recent public data in your web browser right now, check out our open source Stack Exchange Data Explorer.

Have fun remixing and reusing; all we ask is for proper attribution.

posted June 19th, 2011 under cc-wiki-dump

6 Comments

Jeff Johnson says:
June 20th, 2011 at 10:29 am
I realize it’s a lot of work to manage these data dumps. Thanks for posting these; they’re fun to analyze and play with.
Ben Towne says:
June 20th, 2011 at 2:19 pm
Thanks for posting this data. I look forward to seeing what I can see in it.
The Readme has a partial key for numeric fields, such as PostHistoryTypeId. Is there a more complete key around? For example, what does it mean when it = 25?
Thanks,
Ben
Carl Partridge says:
July 8th, 2011 at 12:23 pm
I realise that these data dumps are useful, but I have noticed a huge number of sites springing up that essentially just re-publish the content to generate their own advertising revenue.

In addition to ripping you guys off, this also makes it harder to find useful results on a search engine, since half the results are just duplicates of existing Stack Overflow content.

Is there nothing that can be done to limit this?
Ben Towne says:
July 25th, 2011 at 2:09 pm
The field association_id seems to be absent from users.xml. Is there any way to generate it from the data that is present?
Stu Thompson says:
August 10th, 2011 at 6:44 am
As usual, although two months late, the dumps can be downloaded via HTTP at http://bit.ly/98j9jn
Jeff Johnson says:
August 29th, 2011 at 10:32 am
@Carl Partridge: Every Google search I do for anything tech related usually puts stackoverflow.com at the top. What sites are you seeing with what queries?

All of Stack Exchange is licensed under Creative Commons, so there is bound to be the odd search result that shows another site somewhere in the list.
