22 events
when toggle format what by license comment
S Jan 27, 2021 at 10:03 history suggested jakubde CC BY-SA 4.0
Replaced dead link with a wayback machine variant
Jan 26, 2021 at 22:51 review Suggested edits
S Jan 27, 2021 at 10:03
May 27, 2019 at 13:08 history edited Peter Mortensen CC BY-SA 4.0
Made compliant with the Jon Skeet Decree - <https://twitter.com/PeterMortensen/status/976400000942034944>.
May 27, 2019 at 13:07 comment added Peter Mortensen What about cache sizes, L1, L2, etc.? Having to go to main memory for a table lookup would be killer. Can you address that in your answer?
Feb 28, 2019 at 12:37 comment added Luke Hutchison Bit shifting is a zero-cost operation in ARM, so you may find the bit-shifted version is faster on ARM.
Feb 27, 2019 at 10:56 history edited Neuron CC BY-SA 4.0
readers generally don't care about what has been added or removed, they just want to read a good post. For those who are more curious, there is the edit history. https://meta.stackexchange.com/a/127655/316262
S Jul 10, 2018 at 8:57 history suggested rahul singh Chauhan CC BY-SA 4.0
corrected spelling
Jul 10, 2018 at 8:43 review Suggested edits
S Jul 10, 2018 at 8:57
Jan 29, 2016 at 1:24 history wiki removed Shog9
Jan 28, 2016 at 20:56 history made wiki Post Made Community Wiki by George Stocker
Aug 23, 2015 at 19:12 history edited Mark Rogers CC BY-SA 3.0
textified link - feel free to rollback
Oct 27, 2014 at 23:05 comment added steveha On what architecture would an expression like (x < node->value) require a branch to evaluate? All the architectures with which I am familiar have a "flags" register, and it is simple to extract the desired flag value. I guess on the Pentium 4 the flag bit extraction might be slow as IIRC that chip doesn't have dedicated shifting hardware for addresses, but borrows the ALU to shift bits. But I don't know where an actual branch would be needed. Hmm, your examples were conditional... the idea is that once you extract the bit from flags, you can just use indexing with no branch.
Oct 27, 2014 at 21:01 comment added Logan Pickup i = (x < node->value); node = node->link[i]; doesn't have an explicit branch, but it still contains a comparison; it depends a lot on the target architecture as to whether this can be resolved without a branch or not. Since it can be done on x86 (using either CMOV or LAHF) and ARM (conditional add or move), which are the only architectures I use, this is possibly not important!
Jul 4, 2014 at 14:56 comment added Petter Zain is correct. The “if” is just hidden in the lookup table. The code is faster because the lookup table is hidden OUTSIDE the 100000 iterations. There is nothign to gain from using a lookup table for this problem.
Apr 7, 2014 at 18:58 history edited steveha CC BY-SA 3.0
add more
Apr 2, 2014 at 15:15 comment added Falco Good answer - since a lookup-table can also handle complex cases, where we cannot easily cheat with bit-manipulation
Mar 18, 2014 at 8:52 comment added atlaste @steveha P.S.: another possible answer would be int c = data[j]; sum += c & -(c >> 7); which requires no multiplications at all.
Mar 18, 2014 at 8:45 comment added atlaste @steveha Multiplication should be faster, I tried looking it up in the Intel books, but couldn't find it... either way, benchmarking also gives me that result here.
Mar 4, 2014 at 15:38 comment added Richard Tingle Possibly I'm missing something but in your j equals 0 or 1 method why don't you just multiply your value by j before adding it rather than using the array indexing (possibly should be multiplied by 1-j rather than j)
Jul 29, 2013 at 22:02 comment added steveha I think indexing with the 0/1 value will probably be faster than an integer multiply, but I guess if performance is really critical you should profile it. I agree that small lookup tables are essential to avoid cache pressure, but clearly if you have a bigger cache you can get away with a bigger lookup table, so 4KB is more a rule of thumb than a hard rule. I think you meant sizeof(int) == 4? That would be true for 32-bit. My two-year-old cell phone has a 32KB L1 cache, so even a 4K lookup table might work, especially if the lookup values were a byte instead of an int.
Jul 29, 2013 at 12:05 comment added atlaste Right, you can also just use the bit directly and multiply (data[c]>>7 - which is discussed somewhere here as well); I intentionally left this solution out, but of course you are correct. Just a small note: The rule of thumb for lookup tables is that if it fits in 4KB (because of caching), it'll work - preferably make the table as small as possible. For managed languages I'd push that to 64KB, for low-level languages like C++ and C, I'd probably reconsider (that's just my experience). Since typeof(int) = 4, I'd try to stick to max 10 bits.
Jul 22, 2013 at 8:29 history answered steveha CC BY-SA 3.0