Join the Stack Overflow Community
Stack Overflow is a community of 6.5 million programmers, just like you, helping each other.
Join them; it only takes a minute:
Sign up

I recently learned how to use heaps and the beauty of heapsort. I decided to compare heapsort with std::sort in C++ and Arrays.sort() in Java. I sorted an array of integers, each randomly generated in the range of <0; 2000000000)

I generated 100,000,000 integers into an array in Java, and ran Arrays.sort(), then generated new random sequence and ran my heapSort(). This is an output of my program in Java:

Arrays.sort time: 10.923 seconds.

Heap sort time: 1.402 seconds.

So heapsort is about 8 times faster.

I then ran similar code in C++, this time using std::vector as my container (due to the fact that std::sort needs two iterators).

C++ results:

Heapsort: 3.213

std::sort: 37.264

So in my program, std::sort is about 12 times slower.

In Java, I measured time using System.currentTimeMilis() and in C++ I used clock() from .

This was tested on Windows 7, Quad-Core Intel i5 2500k, overclocked to 4.8GHz. C++ was compiled with -Wall -pedantic flags.

Can anyone tell me what is going on? Is heapsort really that much faster? Or did I make an mistake in my code? I don't want to flood this post with lots of code, so I will link it at the end of this post.

BTW: Yes, I'm aware that Arrays.sort() is stable and heapsort isn't. Java doesn't have an unstable sort (atleast, I haven't found one). That is why I used std::sort in C++, to see if it had something to do with stability or not.

Source code, both C++ and Java: https://gist.github.com/anonymous/7475399

share|improve this question
    
Is this a bug? int tmp = heap[0]; heap[i] = heap[0]; heap[i] = tmp; – Darthfett Nov 14 '13 at 22:32
    
IIRC Java Arrays.sort() is overloaded. For an array of builtin type the implementation is permitted to use a non-stable sort, but for an array of Object type it must be stable. So, if the implementation chooses to use a stable sort algorithm even for builtin types, you cannot be blamed for comparing it against heapsort. – Steve Jessop Nov 14 '13 at 22:33
3  
"Why am I way more amazing than standard library code that's been deployed and scrutinized millions of times" => double-check for correctness of your program... – Kerrek SB Nov 14 '13 at 22:34
4  
"I generated 100,000,000 integers into an array in Java, and ran Arrays.sort(), then generated new random sequence and ran my heapSort()". Why not generate a sequence, copy it, sort one using each algorithm, and compare results? That will give you a pretty good idea whether your code is correct. – Steve Jessop Nov 14 '13 at 22:35
    
You should (almost) never benchmarking code before you verify that it is actually computing correct results (that is, verify that it is the code you want to benchmark). In view of the comments and answers so far, you apparently skipped that step. – Ted Hopp Nov 15 '13 at 0:09

Your Java code looks bugged to me

int tmp = heap[0];
heap[i] = heap[0];
heap[i] = tmp;

That's not the code to swap two elements.

Does that make a difference to the execution time? I don't know heap sort well enough to be sure.

share|improve this answer
4  
It makes the heapify a no-op. No sorting is taking place. – Captain Giraffe Nov 14 '13 at 22:37

You do not properly swap items in either your Java (as john pointed out), nor your C++ code:

void heapSort(vector<int> & heap, int length)
{
    int heapsize = length;
    buildHeap(heap, heapsize);
    for(int i = heapsize-1; i >= 1; i--)
    {
        int tmp = heap[0];
        heap[i] = heap[0];
        heap[i] = tmp; // overwrote the item you just tried to swap!
        heapsize--;
        heapify(heap, 0, heapsize);
    }
}

In short, your code is "more efficient" because it doesn't do any sorting at all.

share|improve this answer

There is one other issue in your C++ code that relates to how you're generating your random distribution:

int randomval()
{
  double d;
  int result;
  d = rand() / RAND_MAX;
  result = (int) (d * N);
  return result;
}

d is always going to be 0 because you are perform an int division and then implicitly converting it to double afterwards. In short, your randomval function isn't giving you any random values at all.

When you sort this with your own heap sort, the same code path is always executed. In your case, heapify will probably never execute this portion of code:

if (largest != i)
{
    int tmp = heap[i];
    heap[i] = heap[largest];
    heap[largest] = tmp;

    heapify(heap, largest, heapsize);
}

which is why your implementation appears to be faster.

Fixing the random test data with an actual distribution I think you'll find your implementation to be slower:

#include <random>
// snip...
int main()
{
  int length = 10000000;
  std::vector<int> vint1;

  std::default_random_engine gen;
  std::uniform_int_distribution<int> randomval(1, N);
  for (int i = 0; i < length; i++)
  {
        vint1.push_back(randomval(gen));
  }
  std::vector<int> vint2 = vint1; /* so we're sorting same testdata for both */
  // ...

Rerunning the benchmark again shows:

g++ -std=c++0x -Wall -pedantic -O2 heapsorttest.cpp -o heapsorttest.exe
heapsorttest.exe

Heapsort: 5.822s
true

std::sort: 0.936s
true
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.