
A vector memory access that reads (writes) a set of consecutive scalar elements from (to) memory is usually called a contiguous, adjacent or stride-one vector memory access. Non-contiguous, non-adjacent or non-stride-one vector memory accesses are those that read (write) scalar elements that are not consecutive in memory. Stride-one vector memory accesses are more efficient on the vast majority of SIMD architectures. In fact, some SIMD architectures do not support the non-stride-one counterparts at all.
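As a small illustration of the distinction, the following sketch classifies the element indices touched by the lanes of one vector access. The function names and the sample index lists are hypothetical and chosen only for this example; they are not part of the vectorization algorithm described in this section.

```python
def common_stride(lane_indices):
    """Return the stride shared by all consecutive lanes, or None if irregular."""
    strides = {b - a for a, b in zip(lane_indices, lane_indices[1:])}
    return strides.pop() if len(strides) == 1 else None


def is_stride_one(lane_indices):
    """A vector access is contiguous when every lane is one element after the previous."""
    return common_stride(lane_indices) == 1


# Indices touched by four lanes for three illustrative subscripts.
contiguous = [8, 9, 10, 11]    # a[i]      for i = 8..11
strided = [16, 18, 20, 22]     # a[2 * i]  for i = 8..11
gathered = [3, 0, 7, 1]        # a[b[i]]   with arbitrary b

print(is_stride_one(contiguous), common_stride(strided), common_stride(gathered))
```

Only the first access can be served by a single stride-one vector load; the other two need strided or gather support, which some architectures lack.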

In addition, stride-one vector memory accesses can be qualified as aligned or unaligned depending on whether the memory address that they read from (write to) is aligned to the boundary of the vector length of the architecture. In many architectures, unaligned vector memory accesses are significantly slower than their aligned counterparts due to hardware constraints. For this reason, the vectorization algorithm must try to maximize the use of aligned vector memory accesses. However, using an aligned vector memory access on an unaligned address usually results in a hardware exception that aborts the execution. Thus, the vectorization algorithm must ensure that the address of a particular vector memory access will always be aligned to the vector length boundary at runtime.

3.5. Vectorization Algorithm

In the following sections, we briefly illustrate how memory accesses are analyzed to determine whether they are adjacent and aligned. For the sake of simplicity, we limit the explanation of this analysis to array subscripts, although our implementation is also able to deal with pointer accesses.

Contiguity

Our vectorization algorithm computes the adjacent attribute defined in Section 3.5.6 to state whether a vector access is stride-one or not.

In order to determine whether a scalar memory access performs loads/stores in consecutive positions in memory across loop iterations, the vectorization algorithm analyzes whether the evolution of the access is stride-one or not. To be considered adjacent, the array pointer must be uniform. Otherwise, the memory access is classified as non-adjacent as the array pointer could have different values across the loop iterations. If the array pointer is uniform, then the algorithm analyzes the expression in the subscript of the array. Figure 3.9 shows some of the rules used in this analysis based on the linear and uniform attributes:

linear: the evolution of a linear expression is determined by its step. A linear expression with a step of one is the base of an adjacent array access (lines 2 and 3).

uniform: the value of a uniform variable is always the same within a vector register. Uniform variables cannot cause an adjacent access by themselves (lines 4 and 5).

unqualified: variables that are neither linear nor uniform (line 15) can evolve in an arbitrary way across loop iterations. If a variable of this kind is present in the subscript of an array access, the access is classified as non-adjacent.

The attributes of the sub-expressions involved in the subscript determine the adjacent attribute of a compound expression. Lines 7 to 13 contain the rules for addition and subtraction expressions. As we can see, an array access whose subscript is an addition or subtraction expression will be adjacent if one of the sub-expressions is adjacent and the other one is uniform.

 1: function IS_ADJACENT(expr)
 2:     if IS_LINEAR(expr) and LINEAR_STEP(expr) == 1 then
 3:         return TRUE
 4:     else if IS_UNIFORM(expr) then
 5:         return FALSE
 6:     else
 7:         switch expr.kind do
 8:             case NODECL_ADD(expr1, expr2, type)
 9:                 return (IS_ADJACENT(expr1) and IS_UNIFORM(expr2)) or
10:                        (IS_ADJACENT(expr2) and IS_UNIFORM(expr1))
11:             case NODECL_SUB(expr1, expr2, type)
12:                 return (IS_ADJACENT(expr1) and IS_UNIFORM(expr2)) or
13:                        (IS_ADJACENT(expr2) and IS_UNIFORM(expr1))
14:     end if
15:     return FALSE    ▷ Default case. Arbitrary evolution.
16: end function

Figure 3.9: Some rules used in the computation of the adjacent attribute for the subscript expression of an array subscript
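The rules of Figure 3.9 can be sketched in executable form as follows. This is only a minimal illustration: the `Expr` class, its fields and the recursive `is_uniform` helper are assumptions made for the example and do not reflect the compiler's actual intermediate representation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Expr:
    """Toy subscript expression (hypothetical, not the compiler's own IR)."""
    kind: str                           # "var", "add" or "sub"
    operands: Tuple["Expr", ...] = ()   # two operands for "add" and "sub"
    linear_step: Optional[int] = None   # step if the variable is linear
    uniform: bool = False               # True if the variable is uniform


def is_uniform(expr: Expr) -> bool:
    # Assumption: a compound expression is uniform when both operands are.
    if expr.kind in ("add", "sub"):
        return all(is_uniform(op) for op in expr.operands)
    return expr.uniform


def is_adjacent(expr: Expr) -> bool:
    if expr.linear_step == 1:           # linear with step one
        return True
    if is_uniform(expr):                # uniform alone is never adjacent
        return False
    if expr.kind in ("add", "sub"):     # one adjacent operand, one uniform
        lhs, rhs = expr.operands
        return ((is_adjacent(lhs) and is_uniform(rhs)) or
                (is_adjacent(rhs) and is_uniform(lhs)))
    return False                        # default case: arbitrary evolution


i = Expr("var", linear_step=1)   # loop induction variable
n = Expr("var", uniform=True)    # loop-invariant value
x = Expr("var")                  # unqualified variable

print(is_adjacent(i))                    # subscript i
print(is_adjacent(Expr("add", (i, n))))  # subscript i + n
print(is_adjacent(Expr("add", (i, x))))  # subscript i + x
```

With these definitions, the subscripts `i` and `i + n` are classified as adjacent, while `n`, `i + x` and `i + i` are not.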

Alignment

Our vectorization algorithm computes the aligned attribute defined in Section 3.5.6. Once the array access has been qualified as adjacent, the algorithm analyzes the access to determine whether its address will be aligned to the vector boundary across all the iterations of the loop. The alignment of the memory address accessed by an array subscript is computed as subscripted_alignment + subscript_alignment × sizeof(type). If the resulting alignment is a multiple of the vector length boundary, the memory access is classified as aligned. If the alignment is unknown or is not a multiple of the vector length boundary, the access is classified as unaligned.
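A worked instance of this formula may help. The sketch below assumes a hypothetical 64-byte vector length and 4-byte elements; the function and parameter names are illustrative only, and `None` stands for an unknown alignment.

```python
def is_aligned(subscripted_alignment, subscript_alignment, elem_size, vector_bytes):
    """Apply subscripted_alignment + subscript_alignment * sizeof(type).

    subscripted_alignment: alignment of the array base in bytes, or None if unknown.
    subscript_alignment:   alignment of the subscript in elements, or None if unknown.
    """
    if subscripted_alignment is None or subscript_alignment is None:
        return False  # unknown alignment is conservatively treated as unaligned
    alignment_in_bytes = subscripted_alignment + subscript_alignment * elem_size
    return alignment_in_bytes % vector_bytes == 0


# Assumed target: 64-byte vectors holding 4-byte floats (16 elements per vector).
print(is_aligned(0, 16, 4, 64))    # 0 + 16 * 4 = 64 bytes, on a vector boundary
print(is_aligned(0, 4, 4, 64))     # 0 + 4 * 4 = 16 bytes, not on a boundary
print(is_aligned(0, None, 4, 64))  # unknown subscript alignment
```

Only the first access can safely use an aligned vector instruction under these assumptions.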

Figure 3.10 shows some simplified rules used in the computation of the alignment caused by the subscript expression of the array subscript. This alignment is computed in number of elements. Later, this number is multiplied by the size of the data type of the access to have an alignment in bytes. The alignment of a subscript expression depends on the attributes found on its sub-expressions:

constant: for compile-time constants the algorithm returns their constant value (lines 2 and 3).

suitable: suitable variables will have a runtime value that is a multiple of the vector length in number of elements. The algorithm returns their suitable value (lines 4 and 5).

linear: linear variables require a suitable lower bound to produce an aligned access (lines 6 and 7).

 1: function ALIGNMENT(expr)
 2:     if IS_CONSTANT(expr) then
 3:         return expr.const_value
 4:     else if IS_SUITABLE(expr) then
 5:         return GET_SUITABLE_VALUE(expr)
 6:     else if IS_LINEAR(expr) then
 7:         return ALIGNMENT(expr.lb) + ALIGNMENT(expr.step) × VF
 8:     else
 9:         switch expr.kind do
10:             case NODECL_ADD(expr1, expr2, type)
11:                 return ALIGNMENT(expr1) + ALIGNMENT(expr2)
12:             case NODECL_MUL(expr1, expr2, type)
13:                 return ALIGNMENT(expr1) × ALIGNMENT(expr2)
14:     end if
15:     return UNKNOWN    ▷ Default case. The value of expr is unknown.
16: end function

Figure 3.10: Some simplified rules used in the computation of the alignment (in elements) caused by the subscript expression of an array subscript

Computing the alignment of compound expressions requires a special algebra that defines the results of operations on operands with the UNKNOWN value. For example, the addition and multiplication operations depicted in lines 9 to 13 must be able to combine alignment values where the UNKNOWN value could be present. When one operand of an addition is UNKNOWN, the result is also UNKNOWN. The same happens with multiplication, except when the other operand is suitable: multiplying an UNKNOWN value by a suitable value results in another suitable value.
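The following sketch shows one possible encoding of this algebra together with the rules of Figure 3.10. Several points are assumptions made for the example: expressions are plain tuples, UNKNOWN is encoded as `None`, the vectorization factor `VF` is fixed to 16 elements, a value counts as suitable when it is a multiple of `VF`, and the product of UNKNOWN and a suitable value is represented by the suitable operand itself.

```python
UNKNOWN = None   # assumed encoding of the UNKNOWN alignment value
VF = 16          # assumed vectorization factor, in elements


def align_add(a, b):
    # UNKNOWN is absorbing for addition.
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return a + b


def is_suitable_value(v):
    # Assumption: a known value is suitable when it is a multiple of VF.
    return v is not UNKNOWN and v % VF == 0


def align_mul(a, b):
    if a is not UNKNOWN and b is not UNKNOWN:
        return a * b
    # UNKNOWN times a suitable value is still a multiple of VF.
    for known in (a, b):
        if is_suitable_value(known):
            return known
    return UNKNOWN


def alignment(expr):
    """Alignment, in elements, of a toy subscript expression given as a tuple."""
    kind = expr[0]
    if kind in ("const", "suitable"):   # ("const", 4), ("suitable", 32)
        return expr[1]
    if kind == "linear":                # ("linear", lower_bound, step)
        return align_add(alignment(expr[1]), align_mul(alignment(expr[2]), VF))
    if kind == "add":                   # ("add", expr1, expr2)
        return align_add(alignment(expr[1]), alignment(expr[2]))
    if kind == "mul":                   # ("mul", expr1, expr2)
        return align_mul(alignment(expr[1]), alignment(expr[2]))
    return UNKNOWN                      # default case: value is unknown


i = ("linear", ("const", 0), ("const", 1))   # induction variable starting at 0
x = ("var",)                                 # variable with unknown value

print(alignment(i))                              # 0 + 1 * VF
print(alignment(("add", i, ("const", 1))))       # shifted by one element
print(alignment(("mul", x, ("suitable", 32))))   # UNKNOWN times suitable
print(alignment(("add", x, ("const", 4))))       # UNKNOWN plus constant
```

Under these assumptions, the subscript `i` yields a multiple of `VF`, `i + 1` does not, and only the multiplication by a suitable value recovers a known alignment from an UNKNOWN operand.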