
The VAX vector architecture defines the instruction set, registers, and behavior that all VAX vector implementations, such as the VAX 9000 V-box, must follow. The vector architecture effort started in December 1985. At that time several CPU development projects were well underway, including the VAX 9000 system. Expecting a four- to fivefold performance improvement for vectorizable applications, Digital decided to add vector processing to the VAX 9000 system, even though the system was in an advanced stage of development. A decision also was made to provide a complementary metal oxide semiconductor (CMOS) implementation of the architecture on the VAX 6000 Model 400 system.

Because neither system could tolerate major changes without a major slip in schedule, the architecture required an approach that made few changes to the scalar processor, that part of a VAX processor that executes the regular VAX instruction set. Furthermore, because not all applications and markets can benefit from vector processing, Digital decided not to require vector processing on every new VAX processor. Therefore, vector processing is offered as an optional capability. The scalar processor decodes vector instructions and passes them to its associated vector processor. All processing of vector instructions is handled by the vector processor. Mechanisms are provided for vector-scalar synchronization and for handling of vector exceptions by the scalar processor.

Although the architecture had to account for the implementation constraints of both ongoing CMOS and ECL projects, it had to be general and flexible enough to allow future, more integrated implementations at higher performance. The architecture also had to minimize its impact on the existing VMS and ULTRIX operating systems because major changes could significantly delay software support for vector processing.

Basic Architecture

The VAX vector architecture uses a vector-register-based design pioneered by Seymour Cray.¹ There are 16 vector registers, each of which holds 64 elements; an element is 64 bits. Instructions that operate on longword integers or F_floating point data manipulate only the low-order 32 bits of each element, sometimes referred to as longword elements.

A number of vector control registers control which elements of a vector register are processed by an instruction. The vector length register (VLR) limits the highest-numbered vector register element that is processed by a vector instruction. The vector mask register (VMR) consists of a 64-bit mask, in which each mask bit corresponds to one of the possible element positions in a vector register. When instructions are executed under control of the vector mask register, only those elements for which the corresponding mask bit is true are processed by the instruction. Vector compare instructions set the value of the vector mask register.
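The interaction of VLR and VMR can be illustrated with a small Python model. This is only a sketch under stated assumptions: the function name `masked_add_longword` and the list-based register representation are hypothetical, VLR is treated as a count of elements to process, and masked-off elements (as well as the upper 32 bits of each destination element) are assumed to keep their previous contents.

```python
NUM_ELEMENTS = 64                    # elements per vector register (from the text)
LONGWORD_MASK = 0xFFFFFFFF           # low-order 32 bits of a 64-bit element
QUADWORD_MASK = 0xFFFFFFFFFFFFFFFF   # a full 64-bit element


def masked_add_longword(va, vb, vc, vlr, vmr, masked=True):
    """Toy model of a vector-vector longword add under VLR and VMR control.

    va, vb: source registers (lists of ints); vc: destination register.
    vlr: number of elements to process; vmr: 64-bit integer mask.
    """
    result = list(vc)
    for i in range(min(vlr, NUM_ELEMENTS)):
        if masked and not (vmr >> i) & 1:
            continue  # assumption: masked-off elements keep their old value
        low = (va[i] + vb[i]) & LONGWORD_MASK
        # assumption: the upper 32 bits of the destination are preserved
        result[i] = (vc[i] & QUADWORD_MASK & ~LONGWORD_MASK) | low
    return result


va = list(range(64))
vb = [10] * 64
vc = [0] * 64
out = masked_add_longword(va, vb, vc, vlr=4, vmr=0b0101)
print(out[:6])  # [10, 0, 12, 0, 0, 0]
```

Elements 0 and 2 are updated because their mask bits are set; elements 1 and 3 are masked off, and elements 4 and above lie beyond the vector length.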

The vector count register (VCR) receives the number of elements generated by the compressed IOTA instruction, which is similar to COMPRESSED IOTA on the CRAY-2.¹ All VAX vector instructions use two-byte extended opcodes. Any necessary scalar operands (e.g., base address and stride for vector memory instructions) are specified by standard VAX scalar operand specifiers. The instruction formats allow all VAX vector instructions to be encoded in seven classes. The seven basic instruction groups and their opcodes are shown in Table 1.
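A compressed IOTA operation of the kind mentioned above can be sketched in a few lines of Python. The function name `compressed_iota`, its parameters, and the choice to compute each offset as the element index times a stride are assumptions made for illustration; the text states only that the instruction generates a compressed vector and that VCR receives the number of elements generated.

```python
def compressed_iota(vmr, vlr, stride=1, match=1):
    """Return (offsets, vcr) for the elements whose mask bit equals `match`.

    Offsets are packed toward the start of the result, so the output is
    "compressed": it has one entry per selected element, not per element.
    """
    offsets = [i * stride for i in range(vlr) if ((vmr >> i) & 1) == match]
    return offsets, len(offsets)  # the count is what VCR would receive


offsets, vcr = compressed_iota(vmr=0b10110, vlr=5, stride=4)
print(offsets, vcr)  # [4, 8, 16] 3
```

Mask bits 1, 2, and 4 are set, so three offsets are produced and the modeled VCR value is 3.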

Within each class, all instructions have the same number and types of operands, which allows the scalar processor to use block-decoding techniques. The differences in operation between the individual instructions within a class are irrelevant to the scalar processor and need only be known by the vector processor. Important features of the instruction set are

• Support for random-strided vector memory data through gather (VGATH) and scatter (VSCAT) instructions

• Generation of compressed IOTA vectors (through the IOTA instruction) to be used as offsets to the gather and scatter instructions

• Merging vector registers through the VMERGE instruction

• The ability for any vector instruction to operate under control of the vector mask register
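The gather, scatter, and merge features listed above can be sketched as follows. This is a toy model, not the architecture's definition: the helper names are hypothetical, memory is a Python list with one element per address rather than byte-addressed storage, and the merge rule (take the first source where the mask bit is set, otherwise the second) is an assumption.

```python
def gather(memory, base, offsets):
    """VGATH-style load: read memory[base + offset] for each offset."""
    return [memory[base + off] for off in offsets]


def scatter(memory, base, offsets, values):
    """VSCAT-style store: write each value to memory[base + offset]."""
    for off, val in zip(offsets, values):
        memory[base + off] = val


def merge(va, vb, vmr):
    """VMERGE-style select: element i comes from va if mask bit i is set,
    otherwise from vb (assumed selection rule)."""
    return [a if (vmr >> i) & 1 else b for i, (a, b) in enumerate(zip(va, vb))]


memory = list(range(100, 120))   # toy memory, one element per address
offsets = [1, 4, 7]              # e.g. offsets produced by a compressed IOTA

values = gather(memory, base=2, offsets=offsets)
print(values)                    # [103, 106, 109]

scatter(memory, 2, offsets, [2 * v for v in values])
print(memory[3], memory[6], memory[9])  # 206 212 218

print(merge([1, 2, 3, 4], [10, 20, 30, 40], vmr=0b0110))  # [10, 2, 3, 40]
```

The gather-modify-scatter sequence is the usual pattern for updating a sparse subset of an array without touching the elements in between.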

Additional control information for a vector instruction is provided in the vector control word (shown as cntrl in Table 1), which is a scalar operand to most vector instructions. The control word operand can be specified using any VAX addressing mode. However, VAX compilers generally use immediate mode addressing (that is, place the control word within the instruction stream). The format of the vector control word is shown in Figure 1.

The Va, Vb, and Vc fields indicate the source and destination vector registers to be used by the instruction. These fields also indicate the specific operation to be performed by a vector compare or convert instruction. The MOE bit indicates whether the particular instruction operates under control of the vector mask register. The MTF bit determines what bit value corresponds to "true" for vector mask register bits. It allows a compiler to vectorize if-then-else constructs. The EXC bit is used in vector arithmetic instructions to enable integer overflow and floating underflow exception reporting. The MI bit is used in vector memory load instructions to indicate modify-intent. Figure 2 shows the encoding for some typical VAX vector instructions.
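The role of the MOE and MTF bits in vectorizing an if-then-else construct can be sketched as below. The bit positions used here (MOE in bit 15, MTF in bit 14, EXC or MI in bit 13, Va in bits 11:8, Vb in bits 7:4, Vc in bits 3:0) are an assumption, since Figure 1 is not reproduced in this text; `ControlWord`, `decode_control_word`, and `apply_masked` are hypothetical names.

```python
from typing import NamedTuple


class ControlWord(NamedTuple):
    moe: bool        # operate under control of the vector mask register
    mtf: int         # mask bit value that counts as "true"
    exc_or_mi: bool  # EXC for arithmetic, MI for memory loads
    va: int
    vb: int
    vc: int


def decode_control_word(cntrl: int) -> ControlWord:
    # Assumed bit layout; not taken from Figure 1.
    return ControlWord(
        moe=bool((cntrl >> 15) & 1),
        mtf=(cntrl >> 14) & 1,
        exc_or_mi=bool((cntrl >> 13) & 1),
        va=(cntrl >> 8) & 0xF,
        vb=(cntrl >> 4) & 0xF,
        vc=cntrl & 0xF,
    )


def apply_masked(op, src, dest, vmr, cw, vlr):
    """Apply op element-wise, honoring MOE and MTF from the control word."""
    out = list(dest)
    for i in range(vlr):
        if cw.moe and ((vmr >> i) & 1) != cw.mtf:
            continue  # element not selected; assumed to stay unchanged
        out[i] = op(src[i])
    return out


# Vectorizing:  y[i] = 2 * x[i] if x[i] > 0 else -x[i]
x = [3, -1, 0, 5]
vlr = len(x)
# A vector compare sets one mask bit per element.
vmr = sum(1 << i for i, v in enumerate(x) if v > 0)

then_cw = decode_control_word(0xC012)  # MOE=1, MTF=1: process "true" elements
else_cw = decode_control_word(0x8012)  # MOE=1, MTF=0: process "false" elements

y = [0] * vlr
y = apply_masked(lambda v: 2 * v, x, y, vmr, then_cw, vlr)
y = apply_masked(lambda v: -v, x, y, vmr, else_cw, vlr)
print(y)  # [6, 1, 0, 10]
```

A single compare produces the mask; the then-branch and else-branch reuse it with opposite MTF values, so the mask never has to be complemented explicitly.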

Vector Execution Model

With the addition of vector processing, a typical VAX processor consists of a scalar processor and an associated vector processor; the two are referred to as a scalar/vector pair. A VAX multiprocessor system

Table 1  VAX Vector Instruction Classes

Vector Memory, Constant-stride
opcode cntrl, base, stride

  VLDL      Load longword vector data
  VLDQ      Load quadword vector data
  VSTL      Store longword vector data
  VSTQ      Store quadword vector data

Vector Memory, Random-stride
opcode cntrl, base

  VGATHL    Gather longword vector data
  VGATHQ    Gather quadword vector data
  VSCATL    Scatter longword vector data
  VSCATQ    Scatter quadword vector data

Vector-scalar Single-precision Arithmetic
opcode cntrl, scalar

  VSADDL    Integer longword add
  VSADDF    F_floating add
  VSBICL    Bit clear longword
  VSBISL    Bit set longword
  VSCMPL    Integer longword compare
  VSCMPF    F_floating compare
  VSDIVF    F_floating divide
  VSMULL    Integer longword multiply
  VSMULF    F_floating multiply
  VSSLLL    Shift left logical longword
  VSSRLL    Shift right logical longword
  VSSUBL    Integer longword subtract
  VSSUBF    F_floating subtract
  VSXORL    Exclusive-or longword
  IOTA      Generate compressed IOTA vector

Vector Control Register Read
opcode regnum, destination

  MFVP      Move from vector processor

Vector Control Register Write
opcode regnum, scalar

  MTVP      Move to vector processor

Digital Technical Journal Vol. 2 No. 4 Fall 1990

Vector Processing on the VAX 9000 System

Vector-scalar Double-precision Arithmetic
opcode cntrl, scalar

  VSADDD    D_floating add
  VSADDG    G_floating add
  VSCMPD    D_floating compare
  VSCMPG    G_floating compare
  VSDIVD    D_floating divide
  VSDIVG    G_floating divide
  VSMULD    D_floating multiply
  VSMULG    G_floating multiply
  VSSUBD    D_floating subtract
  VSSUBG    G_floating subtract
  VSMERGE   Merge

Vector-vector Arithmetic
opcode cntrl or regnum

  VVADDL    Integer longword add
  VVADDF    F_floating add
  VVADDD    D_floating add
  VVADDG    G_floating add
  VVBICL    Bit clear longword
  VVBISL    Bit set longword
  VVCMPL    Integer longword compare
  VVCMPF    F_floating compare
  VVCMPD    D_floating compare
  VVCMPG    G_floating compare
  VVCVT     Convert
  VVDIVF    F_floating divide
  VVDIVD    D_floating divide
  VVDIVG    G_floating divide
  VVMERGE   Merge
  VVMULL    Integer longword multiply
  VVMULF    F_floating multiply
  VVMULD    D_floating multiply
  VVMULG    G_floating multiply
  VVSLLL    Shift left logical longword
  VVSRLL    Shift right logical longword
  VVSUBL    Integer longword subtract
  VVSUBF    F_floating subtract
  VVSUBD    D_floating subtract
  VVSUBG    G_floating subtract
  VVXORL    Exclusive-or longword
  VSYNC     Synchronize vector memory access
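Because every instruction in a class shares one operand format, a decoder needs to know only the class to find out how many scalar operands to fetch. The following Python sketch illustrates that idea using the operand formats from Table 1. The dictionary keys, the function name `operands_for`, and the small subset of mnemonics are illustrative choices, not part of the architecture.

```python
# Operand formats per instruction class, as listed in Table 1.
OPERANDS_BY_CLASS = {
    "memory_constant_stride": ("cntrl", "base", "stride"),
    "memory_random_stride": ("cntrl", "base"),
    "vector_scalar_single": ("cntrl", "scalar"),
    "control_register_read": ("regnum", "destination"),
    "control_register_write": ("regnum", "scalar"),
    "vector_scalar_double": ("cntrl", "scalar"),
    "vector_vector": ("cntrl or regnum",),
}

# A few mnemonics from Table 1 mapped to their class (subset for brevity).
CLASS_BY_MNEMONIC = {
    "VLDL": "memory_constant_stride",
    "VSTQ": "memory_constant_stride",
    "VGATHL": "memory_random_stride",
    "VSADDF": "vector_scalar_single",
    "IOTA": "vector_scalar_single",
    "MFVP": "control_register_read",
    "MTVP": "control_register_write",
    "VSMULG": "vector_scalar_double",
    "VVADDL": "vector_vector",
    "VSYNC": "vector_vector",
}


def operands_for(mnemonic):
    """Return the scalar operand names a decoder must fetch for a mnemonic."""
    return OPERANDS_BY_CLASS[CLASS_BY_MNEMONIC[mnemonic]]


for name in ("VLDL", "VGATHL", "VVADDL"):
    print(name, operands_for(name))
```

In this model the scalar side never inspects what the instruction computes; it looks up the class, fetches the listed operands, and hands everything to the vector processor.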
