The latest version of the Stack Overflow Trilogy Creative Commons Data Dump is now available. This reflects all public data in …

  • Stack Overflow
  • Server Fault
  • Super User
  • Meta Stack Overflow

… up to March 2010.

Download the Stack Overflow Trilogy Creative Commons Data Dump via BitTorrent

Please note that the Stack Overflow trilogy data dumps are now hosted at LegalTorrents! You can subscribe via RSS and be notified every time a new dump is available.

Have fun remixing and reusing; all we ask is for proper attribution.

  1. Jeff Atwood says:

    New to this dump are the email/IP user Gravatar hashes. The hash is of the email address, if provided; if not, it is of the user's last known IP address.

  2. Brian Gianforcaro says:

    If you want a NoSQL way to play with the dumps, but aren't digging the XML, check out the SO dump importer for MongoDB. It’s super simple, and fast.

    http://github.com/bgianfo/stackoverflow-mongodb

  3. Greg Hewgill says:

    I found an odd problem in the March dump: the comments.xml file appears to be incomplete for three of the four dumps. The date of the last comment for each site is:

    META – 2009-09-03
    SU – 2010-02-24
    SF – 2010-01-21
    SO – 2010-02-28

    It looks like some of the comments.xml files in the last dump are also truncated, but at different dates.

  4. Geoff Dalgas says:

    Good find Greg – the data dump for next month will contain all of the missing comments.

  5. David Keeney says:

    … and if you are looking for an SQL way to play with the dumps, Rdbhost has the data online:

    http://www.rdbhost.com/rdbadmin/main.html?r0000000767

  6. Brent Ozar says:

    … and if you are looking for a Microsoft SQL Server way to play with the dumps, I’ve got the data online too:

    http://www.brentozar.com/archive/2010/02/querying-the-stackoverflow-data-dump/
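
For anyone who wants to repeat the check from Greg Hewgill's comment above, here is a minimal Python sketch that reports the most recent comment date in a comments.xml file. It assumes each comment is stored as a `<row>` element with a `CreationDate` attribute holding an ISO 8601 timestamp; that layout is an assumption here, not something stated in this post, and the function name `last_comment_date` and the sample rows are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def last_comment_date(source):
    """Return the latest CreationDate among <row> elements, or None if there are none.

    `source` may be a file path or a binary file-like object.
    Assumes (hypothetically) rows look like <row Id="..." CreationDate="..." />.
    """
    latest = None
    # iterparse streams the file, so a large dump does not have to fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            created = elem.get("CreationDate")
            # ISO 8601 timestamps in one format sort correctly as plain strings.
            if created and (latest is None or created > latest):
                latest = created
            elem.clear()  # drop the row's contents once we have read it
    return latest


# Tiny made-up sample standing in for a real comments.xml file.
sample = io.BytesIO(
    b'<comments>'
    b'<row Id="1" CreationDate="2010-02-24T09:15:00.000" />'
    b'<row Id="2" CreationDate="2010-02-28T21:40:12.000" />'
    b'</comments>'
)
print(last_comment_date(sample))
```

With a real dump you would pass the path to comments.xml instead of the sample, then compare the result against the dump's stated cutoff date.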
