The latest version of the Stack Overflow Trilogy Creative Commons Data Dump is now available. This reflects all public data in …
- Stack Overflow
- Server Fault
- Super User
… up to November 2009.
Download the Stack Overflow Trilogy Creative Commons Data Dump via BitTorrent
Please note that the Stack Overflow trilogy data dumps are now hosted at LegalTorrents! You can subscribe via RSS and be notified every time a new dump is available.
Have fun remixing and reusing; all we ask for is proper attribution.
November 1st, 2009 at 10:37 pm
I was wondering when you’d update the site, since you had already been retagging requests with status-completed :)
New datasets to play with! :D
November 2nd, 2009 at 2:14 am
Is anyone actually using this?
I’d be curious to know…
November 2nd, 2009 at 4:19 am
Yay! I’m really excited about having the SU/SF data too.
I’m posting the dump to http://bit.ly/RPQYc this morning for those not BitTorrent-inclined. I’ll also have a .tar.bz2 version.
Stu
November 2nd, 2009 at 4:21 am
@Doekman: You can see what people are doing with the statistics on MSO (Meta Stack Overflow):
http://meta.stackoverflow.com/questions/tagged/data-analysis
http://meta.stackoverflow.com/questions/tagged/data-dump
November 2nd, 2009 at 6:52 am
YES! I’ve been waiting for this. I’ve got SPWho2.com all wired up for the other sites; I just have to import and test it.
November 2nd, 2009 at 10:52 am
So the data is complete up to Nov 1 2009 00:00:00 UTC?
November 2nd, 2009 at 11:27 pm
Data is complete up to the time we run the app that produces the dumps. That’s usually sometime on the 1st, or a bit later.
We plan to automate this once all the kinks are worked out.
November 3rd, 2009 at 8:13 am
I just thought of something important… You aren’t dumping Careers.SO data, are you?
November 4th, 2009 at 3:00 am
No, of course not.
November 4th, 2009 at 10:31 pm
I’ve been using this as test data for completely unrelated applications. Not exactly intended, but still useful…
November 5th, 2009 at 5:59 am
I’m using this to find hidden words embedded in the data that could predict the future.
November 9th, 2009 at 10:46 pm
The latest data dump is now online at rdbhost.
http://www.rdbhost.com/rdbadmin/main.html?r0000000767
The data is in a set of tables in a PostgreSQL database. The front end is an admin program, similar to phpMinAdmin, with utilities to view tables as well as an SQL window.
Access is select-only, but open to the public.