You are not logged in. Your edit will be placed in a queue until it is peer reviewed.

We welcome edits that make the post easier to understand and more valuable for readers. Because community members review edits, please try to make the post substantially better than how you found it, for example, by fixing grammar or adding additional resources and hyperlinks.

Cancel
9
  • 94
    Another observation is that you don't need to sort the array, but you just need to partition it with the value 128. Sorting is n*log(n), whereas partitioning is just linear. Basically it is just one run of the quick sort partitioning step with the pivot chosen to be 128. Unfortunately in C++ there is just nth_element function, which partition by position, not by value. May 11, 2018 at 12:45
  • 33
    @screwnut here's an experiment which would show that partitioning is sufficient: create an unsorted but partitioned array with otherwise random contents. Measure time. Sort it. Measure time again. The two measurements should be basically indistinguishable. (Experiment 2: create a random array. Measure time. Partition it. Measure time again. You should see the same speed-up as sorting. You could roll the two experiments into one.) Oct 5, 2020 at 8:26
  • 31
    Btw. on Apple M1 the code runs in 17 sec unsorted, and in 7 sec sorted, so the branch prediction penalty isn't that bad on risc architecture. Mar 31, 2021 at 9:07
  • 28
    @RomanYavorskyi: It depends on the compiler. If they make branchless asm for this specific test (e.g. as part of vectorizing with SIMD like in Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?, or just with scalar cmov (gcc optimization flag -O3 makes code slower than -O2), then sorted or not doesn't matter. But unpredictable branches are still a very real thing when it's not as simple as counting, so it would be insane to delete this question. Apr 15, 2021 at 6:31
  • 12
    @screwnut: To be fair, though, it still wouldn't be worth partitioning first, because partitioning requires conditional copying or swapping based on the same array[i] > 128 compare. (Unless you're going to be counting multiple times, and you want to partition the bulk of an array so it's still fast, only mispredicting in an unpartitioned part after some appends or modifications). If you can get the compiler to do it, ideally just vectorize with SIMD, or at least use branchless scalar if the data is unpredictable. (See previous comment for links.) Apr 15, 2021 at 6:46