13 events
when toggle format what by license comment
Oct 20, 2020 at 22:33 history edited Luke Hutchison CC BY-SA 4.0
added 20 characters in body
Oct 20, 2020 at 22:26 history edited Luke Hutchison CC BY-SA 4.0
added 20 characters in body
Feb 21, 2020 at 12:04 history edited Luke Hutchison CC BY-SA 4.0
added 206 characters in body
Feb 21, 2020 at 11:58 history edited Luke Hutchison CC BY-SA 4.0
added 206 characters in body
Feb 21, 2020 at 11:47 history edited Luke Hutchison CC BY-SA 4.0
added 206 characters in body
Feb 21, 2020 at 11:40 comment added Peter Cordes (ARM predicated execution truly NOPs the instruction, so you can even use it on loads or stores that would fault, unlike x86 cmov with a memory source operand. Most ISAs, including AArch64, only have ALU select operations. So ARM predication can be powerful, and usable more efficiently than branchless code on most ISAs.)
Feb 21, 2020 at 11:36 comment added Peter Cordes BTW, you could save an instruction in the loop by indexing relative to the end of the array. Before the loop, set up R2 = data + arraySize, then start with R1 = -arraySize. The bottom of the loop becomes adds r1, r1, #1 / bnz inner_loop. Compilers don't use this optimization for some reason :/ But anyway, predicated execution of the add is not fundamentally different in this case from what you can do with branchless code on other ISAs, like x86 cmov. Although it's not as nice: gcc optimization flag -O3 makes code slower than -O2
Feb 21, 2020 at 11:34 comment added Peter Cordes Note that the OP is not including the time to sort in their measurement. It's probably an overall loss to sort first before running a branch x86 loop, too, even though the non-sorted case makes the loop run a lot slower. But sorting a big array requires a lot of work.
Feb 21, 2020 at 11:09 history edited Luke Hutchison CC BY-SA 4.0
added 206 characters in body
May 15, 2018 at 17:08 comment added Luke Hutchison Not adding the S suffix allows you to have several conditional instructions in a row without worrying that one of them might change the status bits, which might otherwise have the side effect of skipping the rest of the conditional instructions.
May 15, 2018 at 17:06 comment added Luke Hutchison The other innovation in ARM is the addition of the S instruction suffix, also optional on (almost) all instructions, which if absent, prevents instructions from changing status bits (with the exception of the CMP instruction, whose job is to set status bits, so it doesn't need the S suffix). This allows you to avoid CMP instructions in many cases, as long as the comparison is with zero or similar (eg. SUBS R0, R0, #1 will set the Z (Zero) bit when R0 reaches zero). Conditionals and the S suffix incur zero overhead. It's quite a beautiful ISA.
May 14, 2018 at 14:01 history edited jpaugh CC BY-SA 4.0
Rearrange for clarity, and prevent fake comments
Dec 22, 2017 at 13:13 history answered Luke Hutchison CC BY-SA 3.0