CENSADA 313 3.5 CONCLUSIONES - Análisis del estado poblacional de aves acuáticas, playeras migr

The high speed of modern microprocessors is obtained by using a pipeline where

instructions are fetched and decoded in several stages before they are executed. However, the pipeline structure has one big problem. Whenever the code has a branch (e.g. an if-else structure), the microprocessor does not know in advance which of the two branches to feed into the pipeline. If the wrong branch is fed into the pipeline then the error is not detected until 10 - 20 clock cycles later and the work it has done by fetching, decoding, and perhaps speculatively executing instructions during this time has been wasted. The consequence is that the microprocessor wastes several clock cycles whenever it feeds a branch into the pipeline and later discovers that it has chosen the wrong branch.

Microprocessor designers have gone to great lengths to reduce this problem. The most important method that is used is branch prediction. Modern microprocessors are using advanced algorithms to predict which way a branch will go based on the past history of that branch and other nearby branches. The algorithms used for branch prediction are different for each type of microprocessor. These algorithms are described in detail in manual 3: "The microarchitecture of Intel, AMD and VIA CPUs".

A branch instruction takes typically 0 - 2 clock cycles in the case that the microprocessor has made the right prediction. The time it takes to recover from a branch misprediction is approximately 12 - 25 clock cycles, depending on the processor. This is called the branch misprediction penalty.

Branches are relatively cheap if they are predicted most of the time, but expensive if they are often mispredicted. A branch that always goes the same way is predicted well, of

course. A branch that goes one way most of the time and rarely the other way is

mispredicted only when it goes the other way. A branch that goes many times one way, then many times the other way is mispredicted only when it changes. A branch that follows a simple periodic pattern can also be predicted quite well if it is inside a loop with few or no other branches. A simple periodic pattern can be, for example, to go one way two times and the other way three times. Then again two times the first way and three times the other way, etc. The worst case is a branch that goes randomly one way or the other with a 50-50 chance of going either way. Such a branch will be mispredicted 50% of the time.

A for-loop or while-loop is also a kind of branch. After each iteration it decides whether to repeat or to exit the loop. The loop-branch is usually predicted well if the repeat count is small and always the same. The maximum loop count that can be predicted perfectly varies between 9 and 64, depending on the processor. Nested loops are predicted well only on some processors. On many processors, a loop that contains several branches is not predicted well.

A switch statements is a kind of branch that can go more than two ways. Switch statements are most efficient if the case labels follow a sequence where each label is equal to the preceding label plus one, because it can be implemented as a table of jump targets. A switch statement with many labels that have values far from each other is inefficient because the compiler must convert it to a branch tree.

On older processors, a switch statement with sequential labels is simply predicted to go the same way as last time it was executed. It is therefore certain to be mispredicted whenever it goes another way than last time. Newer processors are sometimes able to predict a switch statement if it follows a simple periodic pattern or if it is correlated with preceding branches and the number of different targets is small.

The number of branches and switch statements should preferably be kept small in the critical part of a program, especially if the branches are poorly predictable. It may be useful to roll out a loop if this can eliminate branches, as explained in the next paragraph.

The target of branches and function calls are saved in a special cache called the branch target buffer. Contentions in the branch target buffer can occur if a program has many branches or function calls. The consequence of such contentions is that branches can be mispredicted even if they otherwise would be predicted well. Even direct function calls can be mispredicted for this reason. A program with many branches and function calls in the critical part of the code can therefore suffer from mispredictions.

In some cases it is possible to replace a poorly predictable branch by a table lookup. For example:

// Example 7.29a float a; bool b; a = b ? 1.5f : 2.6f;

The ?: operator here is a branch. If it is poorly predictable then replace it by a table lookup: // Example 7.29b

float a; bool b = 0;

const float lookup[2] = {2.6f, 1.5f}; a = lookup[b];

If a bool is used as an array index then it is important to make sure it is initialized or comes from a reliable source so that it can have no other values than 0 or 1. See page 34.

The examples on page 144 and 145 show various ways of reducing the number of branches.

Manual 3: "The microarchitecture of Intel, AMD and VIA CPUs" gives more details on branch predictions in the different microprocessors.

In document Análisis del estado poblacional de aves acuáticas, playeras migratorias y residentes en las piscinas artificiales de Ecuasal, Mar Bravo y Pacoa en la provincia de Santa Elena – Ecuador, noviembre 2009 – mayo 2010 (página 43-49)