The latest version of the Stack Exchange Creative Commons Data Dump is now available. This reflects all public data in …

… up to Jun 2011.

Download from ClearBits

This month’s Stack Exchange data dump, as always, is hosted at ClearBits! You can subscribe via RSS to be notified every time a new dump is available.

Please read, this is not the usual yadda yadda! Three things:

  1. Because the dumps are quite a bit of work for us, we’re moving to a bi-monthly schedule instead of monthly. Meaning, you can expect dumps every two months instead of every month. If you have an urgent need for more timely data than this, contact us directly, or use the Stack Exchange Data Explorer, which will continue to be updated monthly.
  2. The attribution rules have changed to forbid JavaScript-generated attribution links.
  3. As of November 2010, we enhanced the format of the data dump to include more requested fields, full revision history, and many other pending meta requests tagged [data-dump]. That’s why the dump is so much larger, but we did break it out into individual files per site within the torrent, so you can download just the files you need.
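
If you do grab a per-site file, here is a minimal Python sketch for getting a first look at it without loading the whole thing into memory. It assumes each record is a `<row>` element that carries its fields as XML attributes; the sample data, the `posthistory` root element, and the function name `count_attribute_values` are made up for illustration, so check the Readme in the torrent for the actual layout.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample that mimics the assumed layout: one <row> per record,
# with the fields stored as attributes.
SAMPLE = (
    b'<posthistory>'
    b'<row Id="1" PostHistoryTypeId="2" />'
    b'<row Id="2" PostHistoryTypeId="2" />'
    b'<row Id="3" PostHistoryTypeId="5" />'
    b'</posthistory>'
)


def count_attribute_values(xml_source, attribute, row_tag="row"):
    """Stream an XML file and tally the values of one attribute across rows.

    xml_source may be a file path or a binary file-like object.
    Rows that lack the attribute are skipped.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == row_tag:
            value = elem.get(attribute)
            if value is not None:
                counts[value] += 1
            # Drop the row's attributes once counted to limit memory use.
            elem.clear()
    return counts


if __name__ == "__main__":
    # Swap the in-memory sample for a path to a real file from the dump.
    print(count_attribute_values(io.BytesIO(SAMPLE), "PostHistoryTypeId"))
```

Attribute values come back as strings, so numeric IDs need an explicit `int()` if you want to sort or compare them as numbers.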

If you’d prefer not to download the torrent and would rather play with the most recent public data in your web browser right now, check out our open source Stack Exchange Data Explorer.

Have fun remixing and reusing; all we ask is for proper attribution.

  1. Jeff Johnson says:

    I realize it’s a lot of work to manage these data dumps. Thanks for posting these, it’s fun to analyze and play with.

  2. Ben Towne says:

    Thanks for posting this data. I look forward to seeing what I can see in it.
    The Readme has a partial key for numeric fields, such as PostHistoryTypeId. Is there a more complete key around? For example, what does it mean when it = 25?
    Thanks,
    Ben

  3. Carl Partridge says:

    I realise that these data dumps are useful, but I have noticed a huge number of sites springing up that essentially just re-publish the content to generate their own advertising revenue.

    In addition to ripping you guys off, this also makes it harder to find useful results on a search engine, since half the results are just duplicates of existing stack overflow content.

    Is there nothing that can be done to limit this?

  4. Ben Towne says:

    The field association_id seems to be absent from users.xml. Is there any way to generate it from the data that is present?

  5. Stu Thompson says:

    As usual, although two months late, the dumps can be downloaded via HTTP at http://bit.ly/98j9jn

  6. Jeff Johnson says:

    @Carl Partridge: Every google search I do for anything tech related usually puts stackoverflow.com at the top. What sites are you seeing with what queries?

    All of Stack Exchange is licensed under Creative Commons, so there is bound to be the odd search result that shows another site somewhere in the list.
