The latest version of the Stack Overflow Creative Commons Data Dump is now available. This reflects all public data in Stack Overflow up to July 2009.
Download the Stack Overflow Creative Commons Data Dump via BitTorrent
New in this month’s dump:
- all public favorites data is now included, per user.
- license.txt
- readme.txt
I have created a category for the monthly data dumps; if you’re interested you can subscribe via RSS to this category and be notified every time a new dump is available.
Have fun remixing and reusing, just remember — all we ask is for proper attribution!
July 7th, 2009 at 5:05 am
Thanks Jeff,
I’ll fix up http://github.com/sambo99/So-Slow/tree/master tomorrow to reflect the new fields.
July 7th, 2009 at 5:12 am
There goes my weekend ;)
July 7th, 2009 at 5:28 am
Would be great to see some examples of re-mixing and re-using this data showcased here.
July 7th, 2009 at 6:34 am
Give us examples! I’m happy to blog about any cool uses of the data!
July 7th, 2009 at 7:49 am
@Jeff: Have you tried doing an export of the data without the text fields (for posts and comments)? How big is it? Could you stand to put two downloads up?
July 7th, 2009 at 2:39 pm
One-click bz2'd tar file link: http://media10.simplex.tv/content/xtendx/stu/stackoverflow/so-export-2009-07.tar.bz2
July 7th, 2009 at 3:05 pm
@Jon — wouldn’t it be relatively simple to strip that out? I’d rather stick with one dump that has everything public, and let people slice and dice it however they see fit.
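For anyone who wants the slimmer version, here is a minimal sketch of stripping the text fields out yourself. It assumes each dump file is XML with one `<row>` element per record and the data stored as attributes, and that the long text lives in attributes named `Body` (posts) and `Text` (comments). Those attribute names, the `strip_text_fields` helper and the sample data are illustrative assumptions, so check them against the readme in the dump before relying on this.

```python
import io
import xml.etree.ElementTree as ET

# Assumed names of the large text attributes; verify against the real dump.
TEXT_ATTRIBUTES = ("Body", "Text")


def strip_text_fields(source, sink, root_tag):
    """Copy <row> elements from source to sink, dropping the text attributes.

    source is a binary file-like object, sink is a text file-like object.
    Rows are streamed one at a time so a large file never sits in memory.
    """
    sink.write(f"<{root_tag}>\n")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        kept = {k: v for k, v in elem.attrib.items() if k not in TEXT_ATTRIBUTES}
        sink.write(ET.tostring(ET.Element("row", kept), encoding="unicode") + "\n")
        elem.clear()  # release the parsed row
    sink.write(f"</{root_tag}>\n")


# Tiny made-up sample standing in for a posts file.
SAMPLE = b'<posts><row Id="1" Title="Hello" Body="&lt;p&gt;long text&lt;/p&gt;" /></posts>'
out = io.StringIO()
strip_text_fields(io.BytesIO(SAMPLE), out, "posts")
slim_rows = ET.fromstring(out.getvalue()).findall("row")
```

With a real file you would pass an open file handle as `source` and a file opened for writing as `sink`.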
July 7th, 2009 at 6:30 pm
Grrr, there is a Comment with UserId="" in there somewhere; I had to add a whole bunch of validation code to work around it.
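A rough sketch of the kind of validation this calls for, assuming the comments file is XML with one `<row>` per comment and the fields stored as attributes (which the `UserId=""` above suggests). The helper names and the sample data are made up for illustration; this is not the code from the importer.

```python
import io
import xml.etree.ElementTree as ET


def parse_user_id(raw):
    """Return the UserId as an int, or None when it is missing or empty."""
    if raw is None or raw.strip() == "":
        return None
    return int(raw)


def iter_comments(source):
    """Yield (comment_id, user_id) for each <row> in a binary file-like object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield elem.get("Id"), parse_user_id(elem.get("UserId"))
            elem.clear()  # release the parsed row


# Made-up sample: the second comment has an empty UserId.
SAMPLE = b'<comments><row Id="1" UserId="13" /><row Id="2" UserId="" /></comments>'
rows = list(iter_comments(io.BytesIO(SAMPLE)))
```

Mapping the empty value to `None` lets the importer store a NULL instead of failing on the integer conversion.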
July 7th, 2009 at 11:13 pm
OK, my importer now brings in the latest dump; there is an exe on GitHub as well.
It also brings in the IsWiki field on post.
See: http://github.com/SamSaffron/So-Slow/tree/master
July 8th, 2009 at 2:08 am
Does anyone know if there is a (user-friendly) offline reader for the data? Something that doesn’t worry about the user data, just the questions and answers.
I tried querying the tags with the SQLite version but didn’t have much luck. An SO-thumbdrive edition would be very useful :)