An official answer would be from

 1. [Intel][1]
 2. [Intel][2]
 3. [Scientific papers][3]
 4. Books: J.L. Hennessy, D.A. Patterson: Computer architecture: a quantitative approach
 5. Articles in scientific puplications: T.Y. Yeh, Y.N. Patt made a lot of these on branch predictions.

You can also see from this lovely [diagram][4] why the branch predictor gets confused.

[![2-bit state diagram][5]][5]

Each element in the original code is a random value

    data[c] = std::rand() % 256;

so the predictor will change sides as the std::rand() blow.

On the other hand, once it's sorted, the predictor will first move into a state of strongly not taken and when the values change to the high value the predictor will in three runs through change all the way fron strongly not taken to strongly taken.

  [1]: https://software.intel.com/en-us/articles/avoiding-the-cost-of-branch-misprediction
  [2]: https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts
  [3]: https://scholar.google.com/scholar?q=branch%20prediction%20computer%20architecture&hl=da&as_sdt=0&as_vis=1&oi=scholart
  [4]: https://en.wikipedia.org/wiki/Branch_predictor#/media/File:Branch_prediction_2bit_saturating_counter-dia.svg
  [5]: https://i.sstatic.net/pBMV2.png