Why is processing a sorted array faster than processing an unsorted array?

Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data (before the timed region) miraculously makes the loop almost six times faster.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the next loop runs faster.
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i)
    {
        for (unsigned c = 0; c < arraySize; ++c)
        {   // Primary loop
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}
  • Without std::sort(data, data + arraySize);, the code runs in 11.54 seconds.
  • With the sorted data, the code runs in 1.93 seconds.

(Sorting itself takes more time than this single pass over the array, so it wouldn't actually be worth doing if we only needed this sum once for an unknown array.)


Initially, I thought this might be just a language or compiler anomaly, so I tried Java:

import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int[] data = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100000; ++i)
        {
            for (int c = 0; c < arraySize; ++c)
            {   // Primary loop
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

It produced a similar, though less extreme, result.


My first thought was that sorting brings the data into the cache, but then I thought how silly that was because the array was just generated.

  • What is going on?
  • Why is processing a sorted array faster than processing an unsorted array?

The code is summing up some independent terms, so the order should not matter.
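
One way to check whether the data-dependent branch (rather than the order of summation) is what matters is to rewrite the condition without a branch. The following is a minimal sketch of that experiment (my own illustration, not code from the post); it assumes the values are in the range [0, 255] as generated above, so the mask trick is well defined. With a branchless loop body, sorting the array first should make little difference to the timing.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main()
{
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // Toggle this line: with a branchless loop body, sorting should
    // make little difference to the measured time.
    // std::sort(data, data + arraySize);

    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i)
    {
        for (unsigned c = 0; c < arraySize; ++c)
        {
            // mask is 0 when data[c] < 128 and all ones when data[c] >= 128,
            // so the addition contributes either 0 or data[c] with no branch.
            int mask = -(data[c] >= 128);
            sum += data[c] & mask;
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}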



Comments:

  • Right, but the setup cost of sorting the array is O(N log N), so breaking early doesn't help you if the only reason you are sorting the array is to be able to break early. If, however, you have other reasons to pre-sort the array, then yes, this is valuable. (Nov 6 '18 at 12:28)
  • That depends on how many times you sort the data compared to how many times you loop over it. The sort in this example is just an example; it doesn't have to happen right before the loop. (Feb 27 '19 at 12:23)
  • Yes, that's exactly the point I made in my first comment :-) You say "The branch prediction will miss only once." But you are not counting the O(N log N) branch prediction misses inside the sort algorithm, which is actually greater than the O(N) branch prediction misses in the unsorted case. So you would need to use the entire sorted data set O(log N) times to break even (probably closer to 10 log N in practice, depending on the sort algorithm: quicksort suffers from cache misses, while mergesort is more cache-coherent, so you would need closer to 2 log N uses to break even). (Feb 28 '19 at 12:28)
  • One significant optimization, though, would be to do only "half a quicksort": partition the array around the pivot value 128, so that everything below the pivot ends up on one side and everything at or above it on the other, and then sum only the elements on the >= 128 side. This runs in O(N) startup time rather than O(N log N), although there will still be a lot of branch prediction misses, probably on the order of 5 N based on the numbers I gave before, since it's half a quicksort. (Feb 28 '19 at 12:34)
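
To illustrate the idea in the last comment, here is a minimal sketch (my own illustration, not code from the thread) using std::partition, which groups the elements >= 128 at the front of the array in a single O(N) pass; they can then be summed directly without a full sort. It assumes the same data generation as the original program and uses a C++11 lambda for the predicate.

#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <iostream>

int main()
{
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    // Move every element >= 128 to the front: a single partition pass,
    // O(N) rather than the O(N log N) of a full sort.
    int* mid = std::partition(data, data + arraySize,
                              [](int x) { return x >= 128; });

    // Sum only the partitioned-off elements; no per-element branch remains.
    long long sum = std::accumulate(data, mid, 0LL);

    std::cout << "sum = " << sum << '\n';
}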