The latest version of the Stack Overflow Trilogy Creative Commons Data Dump is now available. This reflects all public data in …

  • Stack Overflow
  • Server Fault
  • Super User

… up to November 2009.

Download the Stack Overflow Trilogy Creative Commons Data Dump via BitTorrent

Please note that the Stack Overflow trilogy data dumps are now hosted at LegalTorrents! You can subscribe via RSS and be notified every time a new dump is available.

Have fun remixing and reusing; all we ask is for proper attribution.

  1. voyager says:

    I was wondering when you’d update the site, as you had been retagging with status-completed on the site :)

    New datasets to play with! :D

  2. Doekman says:

    Are there any people who are actually using this?
    I would be curious…

  3. Stu Thompson says:

    Yea! I’m really excited about having the SU/SF data too.

    I’m posting the dump to http://bit.ly/RPQYc this morning for those not BitTorrent-inclined. I will also have a .tar.bz2 version.

    Stu

  4. Stu Thompson says:

    @Doekman: You can see what people are doing with the statistics on MSO:

    http://meta.stackoverflow.com/questions/tagged/data-analysis

    http://meta.stackoverflow.com/questions/tagged/data-dump

  5. Brent Ozar says:

    YES! I’ve been waiting for this. I’ve got SPWho2.com all wired up for the other sites, just gotta import & test it.

  6. Ether says:

    So the data is complete up to Nov 1 2009 00:00:00 UTC?

  7. Jeff Atwood says:

    Data is complete up to the time we run the app that produces the dumps. That’s usually sometime on the 1st, or a bit later.

    We plan to automate this once all the kinks are worked out.

  8. Kevin Connolly says:

    I just thought of something important… You aren’t dumping Careers.SO data, are you?

  9. Jeff Atwood says:

    no, of course not

  10. Kevin Montrose says:

    I’ve been using this as test data for completely unrelated applications. Not exactly intended, but still useful…

  11. Jason says:

    I’m using this to find hidden words embedded in the data that could predict the future.

  12. David Keeney says:

    The latest data dump is now online at rdbhost.

    http://www.rdbhost.com/rdbadmin/main.html?r0000000767

    The data is in a set of tables in a PostgreSQL database. The front end is an admin program, similar to phpMinAdmin, with utilities to view tables, as well as an SQL window.

    Access is select-only, but open to the public.
