Timeline for Why is processing a sorted array faster than processing an unsorted array?
Current License: CC BY-SA 4.0
13 events
when toggle format | what | by | license | comment | |
---|---|---|---|---|---|
Oct 20, 2020 at 22:33 | history | edited | Luke Hutchison | CC BY-SA 4.0 |
added 20 characters in body
|
Oct 20, 2020 at 22:26 | history | edited | Luke Hutchison | CC BY-SA 4.0 |
added 20 characters in body
|
Feb 21, 2020 at 12:04 | history | edited | Luke Hutchison | CC BY-SA 4.0 |
added 206 characters in body
|
Feb 21, 2020 at 11:58 | history | edited | Luke Hutchison | CC BY-SA 4.0 |
added 206 characters in body
|
Feb 21, 2020 at 11:47 | history | edited | Luke Hutchison | CC BY-SA 4.0 |
added 206 characters in body
|
Feb 21, 2020 at 11:40 | comment | added | Peter Cordes |
(ARM predicated execution truly NOPs the instruction, so you can even use it on loads or stores that would fault, unlike x86 cmov with a memory source operand. Most ISAs, including AArch64, only have ALU select operations. So ARM predication can be powerful, and usable more efficiently than branchless code on most ISAs.)
|
|
Feb 21, 2020 at 11:36 | comment | added | Peter Cordes |
BTW, you could save an instruction in the loop by indexing relative to the end of the array. Before the loop, set up R2 = data + arraySize , then start with R1 = -arraySize . The bottom of the loop becomes adds r1, r1, #1 / bnz inner_loop . Compilers don't use this optimization for some reason :/ But anyway, predicated execution of the add is not fundamentally different in this case from what you can do with branchless code on other ISAs, like x86 cmov . Although it's not as nice: gcc optimization flag -O3 makes code slower than -O2
|
|
Feb 21, 2020 at 11:34 | comment | added | Peter Cordes | Note that the OP is not including the time to sort in their measurement. It's probably an overall loss to sort first before running a branch x86 loop, too, even though the non-sorted case makes the loop run a lot slower. But sorting a big array requires a lot of work. | |
Feb 21, 2020 at 11:09 | history | edited | Luke Hutchison | CC BY-SA 4.0 |
added 206 characters in body
|
May 15, 2018 at 17:08 | comment | added | Luke Hutchison | Not adding the S suffix allows you to have several conditional instructions in a row without worrying that one of them might change the status bits, which might otherwise have the side effect of skipping the rest of the conditional instructions. | |
May 15, 2018 at 17:06 | comment | added | Luke Hutchison | The other innovation in ARM is the addition of the S instruction suffix, also optional on (almost) all instructions, which if absent, prevents instructions from changing status bits (with the exception of the CMP instruction, whose job is to set status bits, so it doesn't need the S suffix). This allows you to avoid CMP instructions in many cases, as long as the comparison is with zero or similar (eg. SUBS R0, R0, #1 will set the Z (Zero) bit when R0 reaches zero). Conditionals and the S suffix incur zero overhead. It's quite a beautiful ISA. | |
May 14, 2018 at 14:01 | history | edited | jpaugh | CC BY-SA 4.0 |
Rearrange for clarity, and prevent fake comments
|
Dec 22, 2017 at 13:13 | history | answered | Luke Hutchison | CC BY-SA 3.0 |