Why is processing a sorted array faster than processing an unsorted array?

Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data (before the timed region) miraculously makes the loop almost six times faster.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the next loop runs faster.
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i)
    {
        for (unsigned c = 0; c < arraySize; ++c)
        {   // Primary loop
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}
  • Without std::sort(data, data + arraySize);, the code runs in 11.54 seconds.
  • With the sorted data, the code runs in 1.93 seconds.

(Sorting itself takes more time than this single pass over the array, so it wouldn't actually be worth doing if we only needed this sum once for an unknown array.)


Initially, I thought this might be just a language or compiler anomaly, so I tried Java:

import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int[] data = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100000; ++i)
        {
            for (int c = 0; c < arraySize; ++c)
            {   // Primary loop
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

It produced a similar, though less extreme, result.


My first thought was that sorting brings the data into the cache, but then I thought how silly that was because the array was just generated.

  • What is going on?
  • Why is processing a sorted array faster than processing an unsorted array?

The code is summing up some independent terms, so the order should not matter.
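
One way to check whether the data-dependent branch (rather than the order of summation) is what matters is to rewrite the condition without a branch. The following is a minimal sketch of that experiment (my own illustration, not code from the post); it assumes the values are in the range [0, 255] as generated above, so the mask trick is well defined. With a branchless loop body, sorting the array first should make little difference to the timing.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // Toggle this line: with a branchless loop body, sorting should
    // make little difference to the measured time.
    // std::sort(data, data + arraySize);

    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i)
    {
        for (unsigned c = 0; c < arraySize; ++c)
        {
            // mask is 0 when data[c] < 128 and all ones when data[c] >= 128,
            // so the addition contributes either 0 or data[c] with no branch.
            int mask = -(data[c] >= 128);
            sum += data[c] & mask;
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}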



Comments:

  • Right, but the setup cost of sorting the array is O(N log N), so breaking early doesn't help you if the only reason you are sorting the array is to be able to break early. If, however, you have other reasons to pre-sort the array, then yes, this is valuable. (Nov 6 '18 at 12:28)
  • That depends on how many times you sort the data compared to how many times you loop over it. The sort in this example is just an example; it doesn't have to happen right before the loop. (Feb 27 '19 at 12:23)
  • Yes, that's exactly the point I made in my first comment :-) You say "The branch prediction will miss only once." But you are not counting the O(N log N) branch prediction misses inside the sort algorithm, which is actually greater than the O(N) branch prediction misses in the unsorted case. So you would need to use the entire sorted data set O(log N) times to break even (probably closer to 10 log N in practice, depending on the sort algorithm: quicksort suffers from cache misses, while mergesort is more cache-coherent, so you would need closer to 2 log N uses to break even). (Feb 28 '19 at 12:28)
  • One significant optimization, though, would be to do only "half a quicksort": partition the array around the pivot value 128, so that everything below the pivot ends up on one side and everything at or above it on the other, and then sum only the elements on the >= 128 side. This runs in O(N) startup time rather than O(N log N), although there will still be a lot of branch prediction misses, probably on the order of 5 N based on the numbers I gave before, since it's half a quicksort. (Feb 28 '19 at 12:34)
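
To illustrate the idea in the last comment, here is a minimal sketch (my own illustration, not code from the thread) using std::partition, which groups the elements >= 128 at the front of the array in a single O(N) pass; they can then be summed directly without a full sort. It assumes the same data generation as the original program and uses a C++11 lambda for the predicate.

#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <iostream>

int main()
{
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // Move every element >= 128 to the front: a single partition pass,
    // O(N) rather than the O(N log N) of a full sort.
    int* mid = std::partition(data, data + arraySize,
                              [](int x) { return x >= 128; });

    // Sum only the partitioned-off elements; no per-element branch remains.
    long long sum = std::accumulate(data, mid, 0LL);

    std::cout << "sum = " << sum << '\n';
}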