The VAX vector architecture defines the instruction set, registers, and behavior that all VAX vector implementations, such as the VAX 9000 V-box, must follow. The vector architecture effort started in December 1985. At that time, several CPU development projects were well underway, including the VAX 9000 system. Expecting a four- to fivefold performance improvement for vectorizable applications, Digital decided to add vector processing to the VAX 9000 system, even though the system was in an advanced stage of development. A decision was also made to provide a complementary metal oxide semiconductor (CMOS) implementation of the architecture on the VAX 6000 Model 400 system.
Because neither system could tolerate major changes without a major slip in schedule, the architecture required an approach that made few changes to the scalar processor (the part of a VAX processor that executes the regular VAX instruction set). Furthermore, because not all applications and markets can benefit from vector processing, Digital decided not to require vector processing on every new VAX processor. Therefore, vector processing is offered as an optional capability. The scalar processor decodes vector instructions and passes them to its associated vector processor. All processing of vector instructions is handled by the vector processor. Mechanisms are provided for vector-scalar synchronization and for handling of vector exceptions by the scalar processor.
Although the architecture had to account for the implementation constraints of both ongoing CMOS and ECL projects, it had to be general and flexible enough to allow future, more integrated implementations at higher performance. The architecture also had to minimize its impact on the existing VMS and ULTRIX operating systems because major changes could significantly delay software support for vector processing.
Basic Architecture
The VAX vector architecture uses a vector-register-based design pioneered by Seymour Cray.1 There are 16 vector registers, each of which holds 64 elements; an element is 64 bits wide. Instructions that operate on longword integers or F_floating-point data manipulate only the low-order 32 bits of each element, sometimes referred to as longword elements.
A number of vector control registers control which elements of a vector register are processed by an instruction. The vector length register (VLR) limits the highest-numbered vector register element that is processed by a vector instruction. The vector mask register (VMR) consists of a 64-bit mask, in which each mask bit corresponds to one of the possible element positions in a vector register. When instructions are executed under control of the vector mask register, only those elements for which the corresponding mask bit is true are processed by the instruction. Vector compare instructions set the value of the vector mask register.
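The interaction of the VLR and VMR can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class `VectorState` and the functions `vv_compare_less` and `vv_add` are hypothetical names invented for illustration, the VLR is assumed to hold a count so that elements 0 through VLR-1 are processed, and unselected destination elements are assumed to keep their previous contents.

```python
from dataclasses import dataclass, field

NUM_ELEMENTS = 64  # elements per vector register, as stated in the text


@dataclass
class VectorState:
    """Hypothetical model of the vector control registers (not a real API)."""
    vlr: int = NUM_ELEMENTS  # vector length register (assumed to be a count)
    vmr: list = field(default_factory=lambda: [False] * NUM_ELEMENTS)


def vv_compare_less(state, a, b):
    """Vector compare: set one VMR bit per element within the vector length."""
    for i in range(state.vlr):
        state.vmr[i] = a[i] < b[i]


def vv_add(state, a, b, dest, masked=False):
    """Add elements 0..VLR-1; when masked, skip elements whose VMR bit is false."""
    for i in range(state.vlr):
        if masked and not state.vmr[i]:
            continue  # assumption: the destination element is left unchanged
        dest[i] = a[i] + b[i]
    return dest


# Usage: only the first four elements are active, and only those with a < b are added.
state = VectorState(vlr=4)
a = [1, 5, 2, 8] + [0] * 60
b = [3, 3, 3, 3] + [0] * 60
dest = [0] * NUM_ELEMENTS
vv_compare_less(state, a, b)
vv_add(state, a, b, dest, masked=True)
```

In this model the compare sets mask bits 0 and 2, so the masked add writes only those two elements of `dest`.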
The vector count register (VCR) receives the number of elements generated by the compressed IOTA instruction, which is similar to COMPRESSED IOTA on the CRAY-2.1
All VAX vector instructions use two-byte extended opcodes. Any necessary scalar operands (e.g., base address and stride for vector memory instructions) are specified by standard VAX scalar operand specifiers. The instruction formats allow all VAX vector instructions to be encoded in seven classes. The seven basic instruction groups and their opcodes are shown in Table 1.
Within each class, all instructions have the same number and types of operands, which allows the scalar processor to use block-decoding techniques. The differences in operation between the individual instructions within a class are irrelevant to the scalar processor and need only be known by the vector processor. Important features of the instruction set are:
• Support for random-strided vector memory data through gather (VGATH) and scatter (VSCAT) instructions
• Generation of compressed IOTA vectors (through the IOTA instruction) to be used as offsets to the gather and scatter instructions
• Merging vector registers through the VMERGE instruction
• The ability for any vector instruction to operate under control of the vector mask register
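How a compressed IOTA vector feeds the gather and scatter instructions can be sketched in Python. This is an illustrative model, not the architected definition: the functions `iota`, `gather`, and `scatter` are hypothetical, memory is modeled as a list indexed in element units rather than bytes, and the offset of a selected element i is assumed to be i times the stride.

```python
def iota(vmr, vlr, stride, match=True):
    """Return (offsets, count): packed offsets for elements whose mask bit matches.

    The count models the value that the text says is placed in the VCR.
    """
    offsets = [i * stride for i in range(vlr) if vmr[i] == match]
    return offsets, len(offsets)


def gather(memory, base, offsets):
    """VGATH-like load: fetch memory[base + offset] for each offset."""
    return [memory[base + off] for off in offsets]


def scatter(memory, base, offsets, values):
    """VSCAT-like store: write each value to memory[base + offset]."""
    for off, val in zip(offsets, values):
        memory[base + off] = val


# Usage: double only the positive entries of an array.
memory = [5, -1, 7, -3, 9, 0, 2, -8]
vlr = len(memory)
vmr = [x > 0 for x in memory]             # as a vector compare would set the mask
offsets, vcr = iota(vmr, vlr, stride=1)   # packed offsets of the selected elements
packed = gather(memory, 0, offsets)       # dense vector holding only selected data
scatter(memory, 0, offsets, [2 * v for v in packed])
```

Packing the selected elements into a dense vector lets the arithmetic run on a shorter vector instead of processing every element under a sparse mask.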
Additional control information for a vector instruction is provided in the vector control word (shown as cntrl in Table 1), which is a scalar operand to most vector instructions. The control word operand can be specified using any VAX addressing mode. However, VAX compilers generally use immediate mode addressing (that is, they place the control word within the instruction stream). The format of the vector control word is shown in Figure 1.
The Va, Vb, and Vc fields indicate the source and destination vector registers to be used by the instruction. These fields also indicate the specific operation to be performed by a vector compare or convert instruction. The MOE bit indicates whether the particular instruction operates under control of the vector mask register. The MTF bit determines what bit value corresponds to "true" for vector mask register bits. It allows a compiler to vectorize if-then-else constructs. The EXC bit is used in vector arithmetic instructions to enable integer overflow and floating underflow exception reporting. The MI bit is used in vector memory load instructions to indicate modify-intent. Figure 2 shows the encoding for some typical VAX vector instructions.
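The role of the MOE and MTF bits in vectorizing an if-then-else construct can be sketched as follows. The Python model is an assumption-laden illustration: `ControlWord` and `vs_mul` are hypothetical names, only the two mask-related bits are modeled, and the bit positions of Figure 1 are deliberately not represented.

```python
from dataclasses import dataclass


@dataclass
class ControlWord:
    """Hypothetical subset of the vector control word (mask-related bits only)."""
    moe: bool = False  # operate under control of the vector mask register
    mtf: bool = True   # mask bit value that counts as "true"


def vs_mul(cw, vmr, vlr, src, scalar, dest):
    """Vector-scalar multiply; when MOE is set, skip elements whose mask bit != MTF."""
    for i in range(vlr):
        if cw.moe and vmr[i] != cw.mtf:
            continue  # assumption: unselected elements keep their old value
        dest[i] = src[i] * scalar
    return dest


# Scalar loop being vectorized:
#     if x[i] < 0: y[i] = x[i] * -1
#     else:        y[i] = x[i] * 2
x = [3, -4, 0, -1]
vlr = len(x)
vmr = [v < 0 for v in x]  # one compare sets the mask once
y = [0] * vlr
vs_mul(ControlWord(moe=True, mtf=True), vmr, vlr, x, -1, y)   # "then" branch
vs_mul(ControlWord(moe=True, mtf=False), vmr, vlr, x, 2, y)   # "else" branch
```

Because MTF selects which mask value counts as true, the same mask serves both branches and no second compare or mask inversion is needed.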
Vector Execution Model
With the addition of vector processing, a typical VAX processor consists of a scalar processor and an associated vector processor; the two are referred to as a scalar/vector pair. A VAX multiprocessor system
Table 1 VAX Vector Instruction Classes

Vector Memory, Constant-stride (opcode cntrl, base, stride)
VLDL      Load longword vector data
VLDQ      Load quadword vector data
VSTL      Store longword vector data
VSTQ      Store quadword vector data

Vector Memory, Random-stride (opcode cntrl, base)
VGATHL    Gather longword vector data
VGATHQ    Gather quadword vector data
VSCATL    Scatter longword vector data
VSCATQ    Scatter quadword vector data

Vector-scalar Single-precision Arithmetic (opcode cntrl, scalar)
VSADDL    Integer longword add
VSADDF    F_floating add
VSBICL    Bit clear longword
VSBISL    Bit set longword
VSCMPL    Integer longword compare
VSCMPF    F_floating compare
VSDIVF    F_floating divide
VSMULL    Integer longword multiply
VSMULF    F_floating multiply
VSSLLL    Shift left logical longword
VSSRLL    Shift right logical longword
VSSUBL    Integer longword subtract
VSSUBF    F_floating subtract
VSXORL    Exclusive-or longword
IOTA      Generate compressed IOTA vector

Vector Control Register Read (opcode regnum, destination)
MFVP      Move from vector processor

Vector Control Register Write (opcode regnum, scalar)
MTVP      Move to vector processor
Digital Technical Journal Vol. 2 No. 4 Fall 1990
Vector Processing on the VAX 9000 System
Vector-scalar Double-precision Arithmetic (opcode cntrl, scalar)
VSADDD    D_floating add
VSADDG    G_floating add
VSCMPD    D_floating compare
VSCMPG    G_floating compare
VSDIVD    D_floating divide
VSDIVG    G_floating divide
VSMULD    D_floating multiply
VSMULG    G_floating multiply
VSSUBD    D_floating subtract
VSSUBG    G_floating subtract
VSMERGE   Merge

Vector-vector Arithmetic (opcode cntrl or regnum)
VVADDL    Integer longword add
VVADDF    F_floating add
VVADDD    D_floating add
VVADDG    G_floating add
VVBICL    Bit clear longword
VVBISL    Bit set longword
VVCMPL    Integer longword compare
VVCMPF    F_floating compare
VVCMPD    D_floating compare
VVCMPG    G_floating compare
VVCVT     Convert
VVDIVF    F_floating divide
VVDIVD    D_floating divide
VVDIVG    G_floating divide
VVMERGE   Merge
VVMULL    Integer longword multiply
VVMULF    F_floating multiply
VVMULD    D_floating multiply
VVMULG    G_floating multiply
VVSLLL    Shift left logical longword
VVSRLL    Shift right logical longword
VVSUBL    Integer longword subtract
VVSUBF    F_floating subtract
VVSUBD    D_floating subtract
VVSUBG    G_floating subtract
VVXORL    Exclusive-or longword
VSYNC     Synchronize vector memory access