If you are curious about even more optimizations that can be done to this code, consider this:
Starting with the original loop:
for (unsigned i = 0; i < 100000; ++i)
{
for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
sum += data[j];
}
}
With loop interchange, we can safely change this loop to:
for (unsigned j = 0; j < arraySize; ++j)
{
for (unsigned i = 0; i < 100000; ++i)
{
if (data[j] >= 128)
sum += data[j];
}
}
Then, you can see that the if
conditional is constant throughout the execution of the i
loop, so you can hoist the if
out:
for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
{
for (unsigned i = 0; i < 100000; ++i)
{
sum += data[j];
}
}
}
Then, you see that the inner loop can be collapsed into one single expression, assuming the floating point model allows it (/fp:fast/fp:fast
is thrown, for example)
for (unsigned j = 0; j < arraySize; ++j)
{
if (data[j] >= 128)
{
sum += data[j] * 100000;
}
}
That one is 100,000x000 times faster than before.