The latest version of the Stack Overflow Creative Commons Data Dump is now available. This reflects all public data in Stack Overflow up to July 2009.

Download the Stack Overflow Creative Commons Data Dump via BitTorrent

New in this month’s dump:

  • all public favorites data is now included, per user.
  • license.txt
  • readme.txt

I have created a category for the monthly data dumps; if you’re interested, you can subscribe to this category via RSS and be notified every time a new dump is available.

Have fun remixing and reusing. Just remember: all we ask is proper attribution!

10 Responses

  1. sam says:

    Thanks Jeff,

    I’ll fix up http://github.com/sambo99/So-Slow/tree/master tomorrow to reflect the new fields.

  2. Stu Thompson says:

    There goes my weekend ;)

  3. Alastair Smith says:

    Would be great to see some examples of re-mixing and re-using this data showcased here.

  4. Jeff Atwood says:

    Give us examples! I’m happy to blog about any cool uses of the data!

  5. Jon Skeet says:

    @Jeff: Have you tried doing an export of the data without the text fields (for posts and comments)? How big is it? Could you stand to put two downloads up?

  6. Stu Thompson says:

    One-click bz2'd tar file link: http://media10.simplex.tv/content/xtendx/stu/stackoverflow/so-export-2009-07.tar.bz2

  7. Jeff Atwood says:

    @Jon — wouldn’t it be relatively simple to strip that out? I’d rather stick with one dump that has everything public, and let people slice and dice it however they see fit.

  8. Sam says:

    Grrr, there is a Comment with UserId="" in there somewhere; I had to add a whole bunch of validation code to work around it.

  9. Sam says:

    OK, my importer now brings in the latest dump; there is an exe on GitHub as well.

    It also brings in the IsWiki field on post.

    See: http://github.com/SamSaffron/So-Slow/tree/master

  10. Chris says:

    Does anyone know if there is a (user friendly) offline reader for the data? Something that doesn’t worry about the user data, just the questions and answers.

    I tried querying the tags with the SQLite version but didn’t have much luck. An SO-thumbdrive edition would be very useful :)
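
For anyone who wants the smaller, text-free version discussed in the comments above, here is a rough sketch of how the stripping might look in Python. It rests on assumptions rather than on the actual dump schema: that each file is a flat list of `<row .../>` elements with fields stored as XML attributes, and that the large text fields are named `Body` (posts) and `Text` (comments). The helper names `strip_text_fields` and `parse_user_id` are made up for illustration. The second helper shows one way to tolerate an empty `UserId=""` value like the one mentioned above.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: rows are <row .../> elements with fields as attributes, and the
# large text fields are called "Body" and "Text". Check these names against
# the real dump (and its readme.txt) before relying on them.
TEXT_ATTRIBUTES = ("Body", "Text")


def strip_text_fields(xml_source: str) -> str:
    """Return the XML with the assumed text attributes removed from every row."""
    root = ET.fromstring(xml_source)
    for row in root.iter("row"):
        for name in TEXT_ATTRIBUTES:
            # pop() with a default is a no-op when the attribute is absent.
            row.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")


def parse_user_id(row: ET.Element):
    """Return UserId as an int, or None when it is missing, empty, or non-numeric."""
    raw = row.get("UserId", "").strip()
    return int(raw) if raw.isdigit() else None
```

This version parses the whole document in memory, which keeps the sketch short but is unlikely to suit the full dump files; a streaming parser such as `xml.etree.ElementTree.iterparse` would be the more realistic choice there.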
