Timeline for Why is processing a sorted array faster than processing an unsorted array?

Current License: CC BY-SA 4.0

22 events

when toggle format	what		by	license	comment
S Jan 27, 2021 at 10:03	history	suggested	jakubde	CC BY-SA 4.0	Replaced dead link with a wayback machine variant
Jan 26, 2021 at 22:51	review	Suggested edits
S Jan 27, 2021 at 10:03
May 27, 2019 at 13:08	history	edited	Peter Mortensen	CC BY-SA 4.0	Made compliant with the Jon Skeet Decree - <https://twitter.com/PeterMortensen/status/976400000942034944>.
May 27, 2019 at 13:07	comment	added	Peter Mortensen		What about cache sizes, L1, L2, etc.? Having to go to main memory for a table lookup would be killer. Can you address that in your answer?
Feb 28, 2019 at 12:37	comment	added	Luke Hutchison		Bit shifting is a zero-cost operation in ARM, so you may find the bit-shifted version is faster on ARM.
Feb 27, 2019 at 10:56	history	edited	Neuron	CC BY-SA 4.0	readers generally don't care about what has been added or removed, they just want to read a good post. For those who are more curious, there is the edit history. https://meta.stackexchange.com/a/127655/316262
S Jul 10, 2018 at 8:57	history	suggested	rahul singh Chauhan	CC BY-SA 4.0	corrected spelling
Jul 10, 2018 at 8:43	review	Suggested edits
S Jul 10, 2018 at 8:57
Jan 29, 2016 at 1:24	history	wiki removed	Shog9
Jan 28, 2016 at 20:56	history	made wiki			Post Made Community Wiki by George Stocker
Aug 23, 2015 at 19:12	history	edited	Mark Rogers	CC BY-SA 3.0	textified link - feel free to rollback
Oct 27, 2014 at 23:05	comment	added	steveha		On what architecture would an expression like `(x < node->value)` require a branch to evaluate? All the architectures with which I am familiar have a "flags" register, and it is simple to extract the desired flag value. I guess on the Pentium 4 the flag bit extraction might be slow as IIRC that chip doesn't have dedicated shifting hardware for addresses, but borrows the ALU to shift bits. But I don't know where an actual branch would be needed. Hmm, your examples were conditional... the idea is that once you extract the bit from flags, you can just use indexing with no branch.
Oct 27, 2014 at 21:01	comment	added	Logan Pickup		`i = (x < node->value); node = node->link[i];` doesn't have an explicit branch, but it still contains a comparison; it depends a lot on the target architecture as to whether this can be resolved without a branch or not. Since it can be done on x86 (using either CMOV or LAHF) and ARM (conditional add or move), which are the only architectures I use, this is possibly not important!
Jul 4, 2014 at 14:56	comment	added	Petter		Zain is correct. The “if” is just hidden in the lookup table. The code is faster because the lookup table is hidden OUTSIDE the 100000 iterations. There is nothign to gain from using a lookup table for this problem.
Apr 7, 2014 at 18:58	history	edited	steveha	CC BY-SA 3.0	add more
Apr 2, 2014 at 15:15	comment	added	Falco		Good answer - since a lookup-table can also handle complex cases, where we cannot easily cheat with bit-manipulation
Mar 18, 2014 at 8:52	comment	added	atlaste		@steveha P.S.: another possible answer would be `int c = data[j]; sum += c & -(c >> 7);` which requires no multiplications at all.
Mar 18, 2014 at 8:45	comment	added	atlaste		@steveha Multiplication should be faster, I tried looking it up in the Intel books, but couldn't find it... either way, benchmarking also gives me that result here.
Mar 4, 2014 at 15:38	comment	added	Richard Tingle		Possibly I'm missing something but in your `j` equals 0 or 1 method why don't you just multiply your value by `j` before adding it rather than using the array indexing (possibly should be multiplied by `1-j` rather than `j`)
Jul 29, 2013 at 22:02	comment	added	steveha		I think indexing with the 0/1 value will probably be faster than an integer multiply, but I guess if performance is really critical you should profile it. I agree that small lookup tables are essential to avoid cache pressure, but clearly if you have a bigger cache you can get away with a bigger lookup table, so 4KB is more a rule of thumb than a hard rule. I think you meant `sizeof(int) == 4`? That would be true for 32-bit. My two-year-old cell phone has a 32KB L1 cache, so even a 4K lookup table might work, especially if the lookup values were a byte instead of an int.
Jul 29, 2013 at 12:05	comment	added	atlaste		Right, you can also just use the bit directly and multiply (`data[c]>>7` - which is discussed somewhere here as well); I intentionally left this solution out, but of course you are correct. Just a small note: The rule of thumb for lookup tables is that if it fits in 4KB (because of caching), it'll work - preferably make the table as small as possible. For managed languages I'd push that to 64KB, for low-level languages like C++ and C, I'd probably reconsider (that's just my experience). Since `typeof(int) = 4`, I'd try to stick to max 10 bits.
Jul 22, 2013 at 8:29	history	answered	steveha	CC BY-SA 3.0

toggle format

Collectives™ on Stack Overflow

Timeline for Why is processing a sorted array faster than processing an unsorted array?

Current License: CC BY-SA 4.0