Timeline for Why is processing a sorted array faster than processing an unsorted array?

Current License: CC BY-SA 4.0

13 events

when toggle format	what		by	license	comment
Oct 20, 2020 at 22:33	history	edited	Luke Hutchison	CC BY-SA 4.0	added 20 characters in body
Oct 20, 2020 at 22:26	history	edited	Luke Hutchison	CC BY-SA 4.0	added 20 characters in body
Feb 21, 2020 at 12:04	history	edited	Luke Hutchison	CC BY-SA 4.0	added 206 characters in body
Feb 21, 2020 at 11:58	history	edited	Luke Hutchison	CC BY-SA 4.0	added 206 characters in body
Feb 21, 2020 at 11:47	history	edited	Luke Hutchison	CC BY-SA 4.0	added 206 characters in body
Feb 21, 2020 at 11:40	comment	added	Peter Cordes		(ARM predicated execution truly NOPs the instruction, so you can even use it on loads or stores that would fault, unlike x86 `cmov` with a memory source operand. Most ISAs, including AArch64, only have ALU select operations. So ARM predication can be powerful, and usable more efficiently than branchless code on most ISAs.)
Feb 21, 2020 at 11:36	comment	added	Peter Cordes		BTW, you could save an instruction in the loop by indexing relative to the end of the array. Before the loop, set up `R2 = data + arraySize`, then start with `R1 = -arraySize`. The bottom of the loop becomes `adds r1, r1, #1` / `bnz inner_loop`. Compilers don't use this optimization for some reason :/ But anyway, predicated execution of the add is not fundamentally different in this case from what you can do with branchless code on other ISAs, like x86 `cmov`. Although it's not as nice: gcc optimization flag -O3 makes code slower than -O2
Feb 21, 2020 at 11:34	comment	added	Peter Cordes		Note that the OP is not including the time to sort in their measurement. It's probably an overall loss to sort first before running a branch x86 loop, too, even though the non-sorted case makes the loop run a lot slower. But sorting a big array requires a lot of work.
Feb 21, 2020 at 11:09	history	edited	Luke Hutchison	CC BY-SA 4.0	added 206 characters in body
May 15, 2018 at 17:08	comment	added	Luke Hutchison		Not adding the S suffix allows you to have several conditional instructions in a row without worrying that one of them might change the status bits, which might otherwise have the side effect of skipping the rest of the conditional instructions.
May 15, 2018 at 17:06	comment	added	Luke Hutchison		The other innovation in ARM is the addition of the S instruction suffix, also optional on (almost) all instructions, which if absent, prevents instructions from changing status bits (with the exception of the CMP instruction, whose job is to set status bits, so it doesn't need the S suffix). This allows you to avoid CMP instructions in many cases, as long as the comparison is with zero or similar (eg. SUBS R0, R0, #1 will set the Z (Zero) bit when R0 reaches zero). Conditionals and the S suffix incur zero overhead. It's quite a beautiful ISA.
May 14, 2018 at 14:01	history	edited	jpaugh	CC BY-SA 4.0	Rearrange for clarity, and prevent fake comments
Dec 22, 2017 at 13:13	history	answered	Luke Hutchison	CC BY-SA 3.0

Stack Overflow works best with JavaScript enabled