We decided early on that all user-generated content on Stack Overflow would be under a Creative Commons license.
All those great Stack Overflow questions, answers, and comments, so generously contributed by all of you, are licensed under cc-wiki:
cc-wiki license
You are free:
- to Share — to copy, distribute, and transmit the work
- to Remix — to adapt the work
Under the following conditions:
- Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
The community has selflessly provided all this content in the spirit of sharing and helping each other. In that very same spirit, we are happy to return the favor by providing a database dump of public data.
We always intended to give the contributed content back to the community as a whole. Our primary concern was making sure we didn’t have an AOL-style “incident” where we accidentally release personally identifying information in so-called “sanitized” data. Stack Overflow user Greg Hewgill was kind enough to help us beta test several iterations of the data dump, ensuring that we didn’t release anything except content that is visible on the public website. He also suggested several refinements to the data dump, so that it contains as much useful public information as possible.
Cheers, Greg! Also, thanks to Stack Overflow Valued Associate #00003, Geoff Dalgas, who patiently worked through many iterations of this to get it together on our end.
The current anonymized public data dump is 205 megabytes, 7zipped, and contains these files:
- badges.xml
- comments.xml
- posts.xml
- users.xml
- votes.xml
Updated 06/08/09: the following are fixed in the June (06-09) dump
- Slightly more data (May dump was taken at the end of May)
- ParentID is present for Answers (PostTypeId = 2)
- AcceptedAnswerID is present for Questions (PostTypeId = 1)
- Fixed any invalid XML data in all files
- Named the file .7z so people better understand what compression to use
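To show how the new attributes tie posts together, here’s a minimal Python sketch. The two-row sample is invented, and the attribute names (ParentId, AcceptedAnswerId) follow the same capitalization convention as PostTypeId; treat them as assumptions until you check the real files.

```python
import xml.etree.ElementTree as ET

# Invented two-row sample in the dump's one-<row>-per-record layout.
sample = """<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="2" Title="How do I parse the dump?" />
  <row Id="2" PostTypeId="2" ParentId="1" />
</posts>"""

posts = {row.get("Id"): row.attrib for row in ET.fromstring(sample)}

# Answers (PostTypeId = 2) point at their question via ParentId;
# questions (PostTypeId = 1) point at their accepted answer, if any.
answers = {pid: p for pid, p in posts.items() if p["PostTypeId"] == "2"}
for pid, answer in answers.items():
    question = posts[answer["ParentId"]]
```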
Download the Stack Overflow Creative Commons Data Dump via BitTorrent
Our plan is to create a new data dump every month, reflecting all data in the system up to that month. We will seed the latest and greatest dump (at a low bitrate) as long as we can, ideally permanently.
And yes, it’s still fun to say “data dump”. We look forward to seeing what the community can do with this data!
June 4th, 2009 at 3:08 am
Awesome.
June 4th, 2009 at 3:18 am
Thank you.
June 4th, 2009 at 3:20 am
Cool!
June 4th, 2009 at 3:21 am
Fantastic news. Looking forward to analysing some of this :)
Jon
June 4th, 2009 at 3:31 am
It’s already been said, but: Awesome.
June 4th, 2009 at 3:39 am
Thanks SO team (and Greg), this rocks. Like you, I’m really looking forward to see how this data gets used.
June 4th, 2009 at 3:43 am
Awesome! GJ guys!
June 4th, 2009 at 3:57 am
Wonderful news.
June 4th, 2009 at 4:04 am
Simply Awesome !!
You guys may say that I am being a pessimist, but I never thought that Jeff and Joel would release these “data dumps”. Now all I have to do is simply create a new SORIP.COM and upload the data to it ;)
June 4th, 2009 at 4:07 am
Excellent! I’ve downloaded the torrent and will be seeding for as long as my server holds up.
There’s one bit of data that probably needs explanation. The VoteTypeId field in votes.xml can be one of the following values:
1 AcceptedByOriginator
2 UpMod
3 DownMod
4 Offensive
5 Favorite
6 Close
7 Reopen
8 BountyStart
9 BountyClose
10 Deletion
11 Undeletion
12 Spam
13 InformModerator
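With those ids mapped, tallying votes by type is only a few lines of Python. The three sample rows below are made up, but follow the one-row-per-vote layout of votes.xml:

```python
import xml.etree.ElementTree as ET
from collections import Counter

VOTE_TYPES = {
    1: "AcceptedByOriginator", 2: "UpMod", 3: "DownMod", 4: "Offensive",
    5: "Favorite", 6: "Close", 7: "Reopen", 8: "BountyStart",
    9: "BountyClose", 10: "Deletion", 11: "Undeletion", 12: "Spam",
    13: "InformModerator",
}

# Tiny made-up stand-in for votes.xml.
sample = """<votes>
  <row Id="1" PostId="10" VoteTypeId="2" />
  <row Id="2" PostId="10" VoteTypeId="2" />
  <row Id="3" PostId="10" VoteTypeId="3" />
</votes>"""

# Count votes per human-readable type name.
tally = Counter(
    VOTE_TYPES[int(row.get("VoteTypeId"))] for row in ET.fromstring(sample)
)
```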
June 4th, 2009 at 4:15 am
From Rhino (Bolt - http://thephotoshopper.blogspot.com/2009/05/bolt-out-of-blue.html )
> You’re beyond awesome! You’re… be-awesome!
Still from Rhino:
> the impossible can become possible if you’re awesome!
June 4th, 2009 at 4:32 am
Really cool. Up to 17 seeds already, and I’ll leave mine up permanently too. Setting up to do a quick video on how to data mine it with SQL Server Analysis Services this morning.
June 4th, 2009 at 4:58 am
Hmm… code_swarm here we come :) Now just to get this into a format that will work for that. May try converting to wikimedia style, then use wiki_swarm.
If I get it working, I’ll post a link.
June 4th, 2009 at 5:00 am
Hi,
Sounds like good news, but I’m wondering about the share-alike part. How can companies endorse people using Stack Overflow given the danger of using code they were not supposed to, since very few commercial companies can actually abide by this rule?
June 4th, 2009 at 5:33 am
Awesome.
One small request: Please use the extension .7z so it’s clear what the file format is.
June 4th, 2009 at 5:56 am
Awesome, I can definitely put this to good use. *Almost* as good as an API, so I’ll take what I can get. Will there be a way to automatically retrieve the latest dump when it arrives? Perhaps an RSS feed?
http://cyber.law.harvard.edu/rss/bitTorrent.html
Oh, and as I’m looking over the data, there appears to be a small bug: everyone is a year older in the dump than on the site.
June 4th, 2009 at 6:02 am
Any apps there to search through this? Could be an awesome KB.
June 4th, 2009 at 6:10 am
Hey,
What’s to stop your competitors (i.e. the hyphen one) from uploading this data into their DBs?
June 4th, 2009 at 6:18 am
You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Has this manner been specified anywhere? That is, what is the expected way to attribute the use of this data?
June 4th, 2009 at 6:20 am
1.3Mb/s with only 18 seeds… and you wonder why anyone holding IP rights is effed.
June 4th, 2009 at 6:21 am
Whoops, make that 1.3MB/s
June 4th, 2009 at 6:34 am
And yes, it’s still fun to say “data dump”.
In the spirit of Jeff’s comment…
Would you consider this an excremental improvement to the system?
June 4th, 2009 at 6:48 am
Thanks for the data dump, this is awesome.
Ditto with tweakt. I didn’t read the entry closely enough and was confused why the file wasn’t exploding until I re-read that it was 7zipped.
June 4th, 2009 at 7:13 am
awesome. i’ll get started on the “which porn star are you?” app that correlates your SO score to the popularity of porn stars.
rev 1 will even come with a badge! yay internets!
June 4th, 2009 at 7:19 am
So I’m probably missing something, but how can you license content under a CC license without the original authors’ consent? Is there a EULA when users sign up that says they relinquish rights to their UGC, to the extent permitted by local law, to SO (whoever that may legally be)? I read the post in the first link - that post does not address what the anchor text suggests it does, though. Where do users give up their rights on the content they submit? The FAQ doesn’t seem to address this, nor can I find a ToU on stackoverflow.com (I would expect a link to it in the footer, but I can’t find it anywhere else, either).
June 4th, 2009 at 7:23 am
I just posted a blog entry and tutorial video on how to data mine this with Excel and SQL Server Analysis Services:
http://sqlserverpedia.com/blog/sql-server-tutorial/data-mining-the-stackoverflow-database/
June 4th, 2009 at 7:36 am
Mm, 2.2MB/s.
I’m also a little confused as to how the license pertains to code people get help with on the site — would using code from the site therefore require the same license?
June 4th, 2009 at 7:55 am
Maybe there’s something bad with my download, but posts.xml isn’t well-formed for me. In particular, Python’s xml.parsers.expat is giving me this error:
xml.parsers.expat.ExpatError: reference to invalid character number: line 1, column 7602956
Am I the only one having issues with posts.xml, or should I use a parser which is more accepting of bad XML? (lxml didn’t work either, actually, but it doesn’t give a column number for failure so its output wasn’t particularly interesting.)
Also, I’m finding it really annoying that there’s no data definition, leaving us all to figure out the format of the XML files ourselves; I wish I knew what attributes to expect (and what format they are in) ahead of time.
June 4th, 2009 at 8:04 am
In case the SQLServerPedia site is unavailable (as it already seems to be) you can watch the video tutorial on my site (which, hilariously, has a bigger server) -
http://www.brentozar.com/archive/2009/06/data-mining-the-stackoverflow-database/
June 4th, 2009 at 8:10 am
> anonymized public data dump
Why bother with the anonymization? It’s got to be trivial to reverse. If you want to keep voting records and such secret, you should just not include them in the dump. Remember when AOL released “anonymous” data?
June 4th, 2009 at 9:11 am
This is wonderful. I can’t wait for all the web apps that are bound to pop up and analyze this in all sorts of ways. Thanks, SO dudes.
June 4th, 2009 at 9:23 am
I’ve posted a slightly-meta’ish-question on Stackoverflow about interesting statistics found in the data..
http://stackoverflow.com/questions/951056/what-interesting-stats-have-you-found-from-the-stack-overflow-data-dump
June 4th, 2009 at 10:00 am
What should people like me do who CAN NOT run torrents? I don’t have anywhere I run a computer where I would be allowed to serve out stuff as part of a torrent.
June 4th, 2009 at 11:08 am
Awesome stuff guys, thanks
June 4th, 2009 at 11:35 am
Here’s an interesting essay by Bruce Schneier on anonymous data:
Why “Anonymous” Data Sometimes Isn’t
http://www.schneier.com/essay-200.html
and a more recent blog post:
Identifying People using Anonymous Social Networking Data
http://www.schneier.com/blog/archives/2009/04/identifying_peo.html
Pretty interesting stuff!
June 4th, 2009 at 11:54 am
An XSD would be nice.
June 4th, 2009 at 12:24 pm
> So I’m probably missing something, but how can you license content under a CC-license without the original authors’ consent?
The CC logo and the FAQ clearly explain that content is licensed under CC. You are not relinquishing any rights over your content (IANAL); you still own it and are free to write the same question/answer somewhere else. All you are doing is giving SO the right to use your content under CC.
Code fragments posted in answers would probably be small enough to be used freely in your own code. Me saying use std::stringstream with .str() to convert a long into a string in an answer doesn’t exactly make your software package a derived work.
June 4th, 2009 at 12:38 pm
So can’t someone take this dump and create a competing site?
June 4th, 2009 at 2:18 pm
Yes in theory, but they would have to make it more attractive than SO for visitors, which means better Google rank and there wouldn’t be much point unless you also ran more ads - which would turn visitors away.
Microsoft is doing something like this with their new ‘bing’ search site, they are repackaging wikipedia pages as reference.
June 4th, 2009 at 2:38 pm
> So can’t someone take this dump and create a competing site?
Absolutely. I remember when Wikipedia was new, clone sites came up all the time. Searching for things in Google was annoying because at the time, many of the knockoff clone sites actually had higher pagerank (that didn’t last long).
Although they thought these clone sites were competing in some sense, there was simply no comparison. The live interaction and freshness of content is what makes Wikipedia what it is. It’s the same with Stack Overflow. If somebody were to take the Stack Overflow data, repackage it in almost any way at all (that of course serves their own interest in some way, like with more ads), then it will still ultimately be less interesting than the original site itself. And if you could even ask questions on such a clone site, who would answer them?
June 4th, 2009 at 5:55 pm
Thanks, this is a good contribution to the community.
June 4th, 2009 at 11:20 pm
I think this is a very good initiative, but I also believe that this will create a lot of crap sites that basically just use your data. Today, a search on any major search engine for some concept or somesuch will return a link to a wikipedia page. The same search will also return hundreds of results to other ad-ridden sites that contain basically the same content.
Now, with free content, just like open source in a sense, this isn’t necessarily bad, but it does increase the clutter on the internet, which makes “real” results harder to find. Arguably, if they use the same data as you, the results they deliver are “real” in the same sense as yours. It’s like refactoring code, in a sense: try to avoid duplicate code that does the same thing.
Anyway, nice initiative. =)
June 5th, 2009 at 12:13 am
Hi
Like Daniel (#comment-24179) I am having some trouble parsing this posts.xml.
I have tried the .Net XmlReader and XmlTextReader.
I am a noob when it comes to reading XML but have also tried the former with XmlReaderSettings.CheckCharacters turned off. It gets further but still fails.
The 2 errors I get with XmlReader are:
‘ ’, hexadecimal value 0x1F, is an invalid character. Line 1, position 7602959.
‘<’, hexadecimal value 0x3C, is an invalid attribute character. Line 1, position 44308159.
June 5th, 2009 at 12:39 am
Really wish you hadn’t used torrent - it’s blocked on our corporate network.
And I might be a stick in the mud, but I don’t allow torrent software on my home PCs either.
Any chance of simple FTP?
June 5th, 2009 at 5:10 am
It would be nice if you could publish an XSD for this as well, that would make life much easier.
June 5th, 2009 at 9:39 am
@BCS @.jpg
This torrent appears to be small enough for Torrent Relay: http://lifehacker.com/395857/torrentrelay-downloads-any-torrent-through-your-browser
From my understanding, free users can retrieve <= 800MB files at 200 KBps maximum download, but it won’t seed past completion.
June 5th, 2009 at 11:36 am
What would be a good use for the data?
If I want to search it, I would use SO.
The only benefit I can think of is to create a better search functionality than SO’s. Like with sort options and better filters.
June 5th, 2009 at 2:11 pm
For those that really can’t use bittorrent:
http://tejp.de/files/so-export-2009-05.7z
June 5th, 2009 at 7:30 pm
@jpg, why wouldn’t you “allow” bittorrent software on your home computer? Simple FTP would require potentially costly amounts of bandwidth on the part of Stackoverflow. Using torrents they can utilize the bandwidth of all others who choose to seed and download the data and greatly reduce the strain on their servers.
Even WoW patches are distributed via torrent.
@Abdu, One major possibility I see is the ability to have an “offline” version of Stackoverflow’s content. There are certain instances and places where internet isn’t possible and with this data dump they have access to a very nice programming resource without the need for a connection.
June 5th, 2009 at 7:53 pm
I’m also having problems with invalid characters in posts.xml (using a couple different parsers — including Excel and a Xerces-based interface). Any hints on this?
Also, if you happen to have some data analysis insights but don’t happen to have 750 rep yet, you can’t share those insights on SO. :(
June 6th, 2009 at 3:51 am
I have put a view of the file schema on my blog.
StackOverflow Download Data Schema
Hope it helps!
Jon
June 6th, 2009 at 6:23 am
If anyone is having trouble dealing with the XML files, I’ve copied the data to a sqlite3 database, which can be found here:
http://modos.org/so-export-sqlite-2009-05.torrent
The only changes are in the format of times - the date/time literal has been replaced with the equivalent unix timestamp in most places; in the votes table, only the date was provided, so 00:00 was used.
This is an extremely large database (1.0GB uncompressed), and without indexing queries can take minutes to complete. The file index.sql contains sqlite expressions that will create indexes on each integer field in the database, as well as the badges table name field, as it might be helpful to select or group by the name of the badge.
As indexing noticeably adds to the size of the database, to save bandwidth no indexing has been done beyond what sqlite3 automatically does for integer primary keys. However, you can add indexing by following the directions in the README file.
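If you’d rather add the indexes from Python than run index.sql directly, it’s only a few sqlite3 calls. The table layout below is a guessed miniature of the badges table, not the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the dump's .db file

# Guessed miniature of the badges table.
conn.execute(
    "CREATE TABLE badges (Id INTEGER PRIMARY KEY, UserId INTEGER, Name TEXT)"
)
conn.execute("INSERT INTO badges VALUES (1, 42, 'Teacher')")

# The kind of statements index.sql contains: one index per integer
# field, plus badges.Name so you can group by badge name cheaply.
conn.execute("CREATE INDEX IF NOT EXISTS idx_badges_userid ON badges(UserId)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_badges_name ON badges(Name)")

rows = conn.execute(
    "SELECT Name, COUNT(*) FROM badges GROUP BY Name"
).fetchall()
```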
June 6th, 2009 at 9:07 am
Thanks @nobody for the prepped sqlite db! How did you get around the xml error in posts.xml? I got this when parsing with cElementTree:
SyntaxError: reference to invalid character number: line 1, column 7602956
Cheers,
Luke
June 6th, 2009 at 10:47 am
I just posted at http://stackoverflow.com/questions/960020/how-can-i-know-the-average-reputation-of-the-users-in-so/ a Python 2.5 script that (on my Mac) parses the .xml files (with cElementTree) without problems — they’re Unicode with a byte-order mark at the start, not sure what underlying parser you have that can’t deal with that. Maybe you could try downloading and installing lxml…?
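A stripped-down sketch of that approach, using an invented two-user sample instead of the real users.xml (for the real file, opening it with encoding="utf-8-sig" takes care of the byte-order mark):

```python
import xml.etree.ElementTree as ET

# Invented stand-in for users.xml, which is one <row> per user.
sample = """<users>
  <row Id="1" Reputation="101" />
  <row Id="2" Reputation="299" />
</users>"""

# Average reputation over the sample rows.
reps = [int(row.get("Reputation")) for row in ET.fromstring(sample)]
average = sum(reps) / len(reps)
```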
June 6th, 2009 at 2:53 pm
@Luke Venediger
I wasn’t aware of cElementTree when I wrote my import script, which is more or less regular expressions generating the appropriate "insert into" statements. It basically did a low-level match for column="value" on a row-by-row basis, so it’s very possible for corrupted or invalid data to creep in. In any case, I’ll release it if there’s demand, but it looks like I might be able to get cleaner code and better performance if I go with a library that’s meant for parsing XML, provided I can work around the error that you stumbled across.
June 7th, 2009 at 1:14 am
@Alex: My guess is that the parser your Python script is using just isn’t as strict as the .NET one. If there really *is* a character U+001F in the data, it’s undeniably invalid XML.
I’m going to have a look later today, hopefully (and convert the files to Protocol Buffers if possible, partly to see the difference in size).
June 7th, 2009 at 4:52 am
Yeah, it’s got invalid XML in it. I emailed w/Geoff yesterday. There are comments and posts that use strings that, when output to XML, are hosing things up. I don’t know enough (anything) about SQL-to-XML conversions to help on that one.
June 7th, 2009 at 6:44 am
Dang, yeah, I ran into the same problem while generating XSD’s for these. Looking forward to getting the corrected data.
June 7th, 2009 at 8:01 am
Hmm.. I just went into posts.xml with a hex editor, and it looks like the character at position 7,602,956 is an “s” (0x73). It’s in the body of post id 139921 for future reference. Based on Jon Skeet’s post, I also did a search for 0x1f in the file, but I could not find any occurrences. In short, I can’t find anything wrong with the file, but I am concerned about the integrity of the SQLite DB that I’m distributing, so further help in nailing down the source of the invalid XML would be appreciated.
June 7th, 2009 at 9:32 am
@nobody_: The “position” is likely to be a position in terms of XML characters, not raw bytes. It’s also possible that there’s an entity reference of 0x1F rather than the character appearing directly. It’s unfortunately quite tricky to analyze the problem when there’s so much data, and when it’s all on one line :(
If I can work out exactly where the file has gone wrong and how to fix it, I’ll put up a small patch program. In the meantime, I’m just indexing your SQL database :)
It may well be that *using* the database, it’s a lot easier to find the problems in the XML files…
June 7th, 2009 at 10:55 am
We are getting close to a new export file for the month of June which will resolve the data issues. Look for an update on this blog where we will post a new torrent for all to download.
June 7th, 2009 at 11:10 am
Before the new data dump arrives, however, I believe I’ve found all the problems. (Not sure about posts.xml yet, as I haven’t converted that to protobuf format, but the rest work.)
For both comments.xml and posts.xml, open up the files in your favourite “large file hex editor” (I used “HHD Hex Editor Neo”) and use a regular expression (treating the binary data as ASCII) to replace “&#x[12345678BCEF];” with “?”; ditto for “&#x1.;”. This will get rid of all the entity references which lead to invalid XML characters.
Then find “<pre>” (that’s the “pre” tag in case it’s stripped here) in posts.xml at offset 0x2a45112. Do what you like with this - I changed it to “[pre]”. That should be all it takes.
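The same replacement can be scripted. Here’s a rough Python equivalent that handles hex-form numeric references only, and keeps the tab/LF/CR references that XML 1.0 does allow:

```python
import re

# Numeric character references for code points below U+0020 are
# invalid in XML 1.0, except tab (0x09), LF (0x0A) and CR (0x0D).
CTRL_REF = re.compile(r"&#x(0?[0-9A-Fa-f]|1[0-9A-Fa-f]);")

def sanitize(xml_text):
    def repl(match):
        code = int(match.group(1), 16)
        # Keep the three control characters XML permits.
        return match.group(0) if code in (0x09, 0x0A, 0x0D) else "?"
    return CTRL_REF.sub(repl, xml_text)
```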
Jon
June 7th, 2009 at 1:13 pm
There is also some left angle brackets (U+3008) in http://stackoverflow.com/questions/151744/are-you-using-ascii-art-to-decorate-your-code/151757#151757 that got converted from their Unicode representations to an ASCII left angle bracket instead of their UTF-8 representation. Of course the literal left angle bracket is invalid inside an XML attribute.
I’ve got a very hacky Python script that attempts to sanitise the XML input, it at least makes it possible to parse with the standard Python SAX parser. If there is interest I’ll post the script.
June 7th, 2009 at 3:14 pm
@greg - I’d love to grab that py script from you, if at all possible. I’m getting impatient waiting, and was about to go your route. :)
June 7th, 2009 at 8:31 pm
I managed to clean up the posts.xml file so the .Net XmlReader can parse it. I wrote a little program to add a newline before each row element (it’s all one line otherwise), and then used a text editor to check out the problems.
I then hacked a little util together to produce each “row” as a separate Microsoft Word document. (I am doing this to produce a large set of documents for testing the performance of another app.) Many hours later I aborted it after 340,000 rows and 3.2 GB of Word documents produced. (I say this because if anyone else is interested in this sort of usage, let me know and I can look at sharing stuff.)
Anyway - posts.xml has a PostTypeId attribute which seems to contain “1” or “2”, which I assume represent “question” and “answer” respectively. What isn’t immediately obvious to me is how the answers are mapped to their question. What am I missing?
June 8th, 2009 at 2:01 am
OK, the new June dump is seeded as a torrent now:
- Slightly more data (May dump was taken at the end of May)
- ParentID is present for Answers (PostTypeId = 2)
- AcceptedAnswerID is present for Questions (PostTypeId = 1)
- Fixed any invalid XML data in all files
- Named the file .7z so people better understand what compression to use
June 8th, 2009 at 4:21 am
I haven’t checked the new dump yet, but would it be possible to make future ones create one line per row, instead of the whole XML document being on a single line? On text editors which can cope with large files, that would make it a lot saner to deal with.
I’d like to suggest we create a wiki somewhere with the most interesting queries. (I don’t think an SO question would really be appropriate…)
Oh, and I didn’t *spot* anything to indicate whether a post was community wiki or not. I don’t have the database in front of me at the minute, so I can’t check… but if it’s not there now, could you include it fairly easily?
June 8th, 2009 at 6:09 am
I just wrote a blog entry explaining how to import the data into SQL Server and what the different fields mean:
http://www.brentozar.com/archive/2009/06/how-to-import-the-stackoverflow-xml-into-sql-server/
Jon - if a Post has OwnerUserId = 1, that’s the Community user account, so it’s a wiki.
June 8th, 2009 at 6:15 am
@Brent: Ah, cool. So by counting OwnerUserId=22656 I really was only counting my non-CW posts. Fun.
It’s not often I wish I had better SQL skills, but this kind of data does it. I might generate a Protocol Buffer version without any text in, which should be easily loadable into memory… then I could use LINQ to Objects, which I’m much more comfortable with :)
I wonder if LINQPad has some easy way of loading in data that I could use… otherwise I could just munge Snippy a bit. So sad that I have about 101 other things I really should be doing…
June 8th, 2009 at 7:09 am
Jon - great idea about the wiki. I slapped together a section over at SQLServerPedia:
http://sqlserverpedia.com/wiki/Data_Mining_the_StackOverflow_Database
I’ve got my XML import queries, schema notes and a couple of queries there, and I’ll add more after I get done with my next pet project.
June 8th, 2009 at 8:21 am
@Brent Ozar:
I don’t have the latest dump, so things might have changed, but all my posts have an OwnerUserId of 658, even the ones that are CW. Also, the user id of Community is -1 (Jeff has id 1) and does not own any posts.
@Jon Skeet:
Regarding the Wiki idea, I suggest you contribute to this question dbr set up:
http://stackoverflow.com/questions/951056/what-interesting-stats-can-i-obtain-from-the-stack-overflow-data-dump
I know you said you thought it wasn’t appropriate for a SO question, but I disagree, and I think this is what CW is made for. Besides, it gets the most coverage and visibility if it stays on SO.
June 8th, 2009 at 9:17 am
Fair enough, as there’s already a question there I’ll do what I can with that (when I have time to do anything useful, admittedly - probably not tonight). Unfortunately I’m more likely to be a consumer than a producer on this one.
Jon
June 8th, 2009 at 12:44 pm
It looks like the June dump cleaned up all the remaining XML formatting problems, so that script I mentioned yesterday is no longer needed.
Looking at the latest dump, there appear to be no posts where OwnerUserId=”-1″, so I’m still not certain how to identify community wiki posts.
On another note, the number of questions on a day last week reached 1307, which for the first time exceeds the number of questions per day on launch week (which was 1301). Here’s some simple graphs: http://hewgill.com/~greg/stackoverflow/stats.html
June 8th, 2009 at 2:10 pm
Greg - Sorry, it’s not negative one - that was supposed to be just one. The community ID is 1. If you query the users table it’ll make sense. Nice graphs! I’ve put together a little site showing some of the metrics at:
http://spwho2.com
June 8th, 2009 at 2:25 pm
@Brent Ozar:
Jeff’s ID is 1, Community is -1:
sqlite> select id, displayname from users where id = 1 OR id = -1;
-1|Community
1|Jeff Atwood
June 8th, 2009 at 5:37 pm
It’s great that you all did this. It’s good from a transparency standpoint, and it follows the spirit of the CC license. My question is for users. Judging by the comments, this is a very sought-after feature.
What are you going to do with it? Unless you are trying to create a doomed-to-fail ripoff of SO or something like bigresource.com (this pops up in my search results way too much), I don’t really see the use. You can already get the data on SO. I think a full SO API would be more interesting. But that’s just me. I am interested to know what the uses will be though.
June 8th, 2009 at 6:54 pm
Thanks for the update Jeff!
June 9th, 2009 at 9:37 am
I just created a new sqlite3 file with the June 2009 data:
http://modos.org/so-export-sqlite-2009-06.torrent
This time I decided to include indexing directly, so the file’s a bit larger (~500MB gzipped, 1.6GB uncompressed).
Also, please disregard README~, Emacs can be a bit overzealous with its autosaving. Thanks.
June 10th, 2009 at 2:01 am
Is it possible to upload the tag db table(s) as well next time? I understand if that’s too much though. Great stuff regardless!
June 10th, 2009 at 9:13 am
@Glitz:
All the tag data is there, you can find it in the tags column in the posts table, so you can normalize it into a separate table (or two) if you want. However, I do with that the tags were separated by maybe a space instead of >< - it would definitely make it easier to read at a glance.
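A quick sketch of that normalization step, assuming the tags column really is packed back-to-back in angle brackets as described:

```python
import re

def split_tags(tags):
    # The posts table packs tags like "<python><xml><data-dump>";
    # pull out the names between the angle brackets.
    return re.findall(r"<([^<>]+)>", tags or "")
```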
June 10th, 2009 at 9:15 am
with -> wish
and the >< is supposed to be & gt;& lt;
June 10th, 2009 at 1:11 pm
Is there somewhere else where folks are discussing this? E.g.: I see that the community user (-1) has ~11k down votes. Why?
(This is just so fracking fun!)
June 10th, 2009 at 2:50 pm
@Stu Thompson:
The community user “Own[s] downvotes on spam/evil posts that get permanently deleted.”
From http://stackoverflow.com/users/-1/community .
June 11th, 2009 at 12:45 am
Ah, cool. Thank you mmyers.
I’ve posted my first take on the stats, which looks at up vs. down votes over time, at http://lanai.dietpizza.ch/geekomatic/2009/06/09/1244565360000.html
Hours of entertainment! I’m so glad this data is available now. :)