CN1152300C - Single-instruction-multiple-data processing with combined scalar/vector operations - Google Patents

Publication number
CN1152300C
Authority
CN
China
Prior art keywords
register
vector
data
instruction
scalar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB971174059A
Other languages
Chinese (zh)
Other versions
CN1188275A (en
Inventor
莫塔兹·A·穆罕默德
朴宪哲
利·T·恩格延
罗尼·S·D·旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Publication of CN1188275A
Application granted
Publication of CN1152300C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/24 Systems for the transmission of television signals using pulse code modulation

Abstract

A digital signal processor has scalar registers for scalar values and a set of general-purpose vector registers for vectors, each of which holds multiple data elements. Each vector register has a fixed size but can be partitioned into data elements of a size the user selects for the application. Operations executed by the vector processor combine two or more vector operands to determine a vector quantity, combine a scalar operand and a vector operand to determine a vector quantity, or combine two or more scalar operands to determine a scalar quantity. The scalar registers also facilitate manipulation of individual data elements in a vector register.

Description

Single-instruction-multiple-data processing in a multimedia signal processor, and device therefor
Field of the invention
The present invention relates to digital signal processing, and in particular to methods and devices that process a plurality of data elements in parallel for each instruction, for multimedia functions such as audio and video encoding and decoding.
Background
This patent document is related to, and incorporates by reference, the following concurrently filed patent applications:
U.S. patent application Ser. No. UNKNOWN1, attorney docket No. M-4354, entitled "Multiprocessor Operation in a Multimedia Signal Processor";
U.S. patent application Ser. No. UNKNOWN2, attorney docket No. M-4355, entitled "Single-Instruction-Multiple-Data Processing in a Multimedia Signal Processor";
U.S. patent application Ser. No. UNKNOWN3, attorney docket No. M-4365, entitled "Efficient Context Saving and Restoring in Multiprocessors";
U.S. patent application Ser. No. UNKNOWN4, attorney docket No. M-4366, entitled "System and Method for Handling Software Interrupts with Argument Passing";
U.S. patent application Ser. No. UNKNOWN5, attorney docket No. M-4367, entitled "System and Method for Handling Interrupts and Exception Events in an Asymmetric Multiprocessor Architecture";
U.S. patent application Ser. No. UNKNOWN6, attorney docket No. M-4368, entitled "Methods and Apparatus for Processing Video Data";
U.S. patent application Ser. No. UNKNOWN7, attorney docket No. M-4369, entitled "Single-Instruction-Multiple-Data Processing Using Multiple Banks of Vector Registers".
Programmable digital signal processors (DSPs) for multimedia applications such as real-time video encoding and decoding require considerable processing power to handle large amounts of data within a limited time. Several DSP architectures are well known. The general-purpose architecture employed by most microprocessors typically requires a high operating frequency to provide a DSP with computing power sufficient for real-time video encoding or decoding. This makes such a DSP expensive.
A very long instruction word (VLIW) processor is a DSP having many functional units, most of which perform different, relatively simple tasks. A single instruction for a VLIW DSP may be 128 bytes or longer and has separate parts that separate functional units execute in parallel. VLIW DSPs have high computing power because many functional units can operate in parallel. VLIW DSPs also have relatively low cost because each functional unit is relatively small and simple. A problem with VLIW DSPs is inefficiency in handling input/output control, communication with a host computer, and other functions that do not lend themselves to parallel execution on the multiple functional units of the VLIW DSP. Additionally, software for a VLIW DSP differs from conventional software and can be difficult to develop, because programming tools and programmers familiar with VLIW software architectures are scarce. Accordingly, a DSP that provides reasonable cost, high computing power, and a familiar programming environment is sought for multimedia applications.
Summary of the invention
An object of the present invention is to provide a single-instruction-multiple-data processing method and a device therefor.
According to one aspect of the invention, a processor is provided that comprises: a scalar register adapted to store a single scalar value; a vector register adapted to store a plurality of data elements; and processing circuitry coupled to the scalar register and the vector register, wherein, in response to a single instruction, the processing circuitry performs a plurality of operations in parallel, each operation combining one of the data elements in the vector register with the scalar value in the scalar register.
According to another aspect of the invention, a method of operating processing circuitry to execute an instruction is provided, comprising: reading from a register data elements that constitute components of a vector value; and performing parallel operations that combine a scalar value with each of the data elements to produce a vector result.
According to yet another aspect of the invention, a method of operating a processor comprises: providing scalar registers and vector registers in the processor, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements that constitute components of a vector; assigning to each scalar register a register number that differs from the register numbers assigned to the other scalar registers; assigning to each vector register a register number that differs from the register numbers assigned to the other vector registers, wherein at least some of the register numbers assigned to the vector registers are the same as register numbers assigned to the scalar registers; forming an instruction that includes a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and executing the instruction to transfer data between the scalar register identified by the first operand and a data element in the vector register identified by the second operand.
A multimedia digital signal processor (DSP) according to one aspect of the invention includes a vector processor that operates on vector data (that is, multiple data elements per operand) to provide high processing throughput. The processor uses a single-instruction-multiple-data (SIMD) architecture with a RISC-type instruction set. Programmers can easily adapt to the programming environment of the vector processor because it is similar to the programming environment of the general-purpose processors with which most programmers are familiar.
The DSP includes a set of general-purpose vector registers. Each vector register has a fixed size but is partitioned into separate data elements of user-selectable size. Accordingly, the number of data elements stored in a vector register depends on the size selected for the elements. For example, a 32-byte register can be divided into thirty-two 8-bit data elements, sixteen 16-bit data elements, or eight 32-bit data elements. The data size and type are selected by the instruction that processes the data associated with the vector register, and the execution data path for the instruction performs a number of parallel operations that depends on the data size the instruction indicates.
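The partitioning arithmetic described above can be sketched as follows. This is an illustrative model only: the helper names `elements_per_register` and `split_register` are hypothetical, and placing element 0 in the lowest-order bits is an assumption that the text does not specify.

```python
# Illustrative model of partitioning a fixed-size register into elements.
# REGISTER_BITS matches the 32-byte example in the text; the function names
# and the "element 0 in the lowest bits" ordering are assumptions.
REGISTER_BITS = 32 * 8


def elements_per_register(element_bits, register_bits=REGISTER_BITS):
    """Return how many elements of the given width fit in the register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


def split_register(value, element_bits, register_bits=REGISTER_BITS):
    """Split the register contents (held as an int) into unsigned elements."""
    count = elements_per_register(element_bits, register_bits)
    mask = (1 << element_bits) - 1
    return [(value >> (i * element_bits)) & mask for i in range(count)]
```

Calling `elements_per_register` with widths 8, 16, and 32 gives the three partitionings named in the example, and the same register contents yield different element lists depending on the width the instruction selects.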
Instructions for the vector processor may have vector registers or scalar registers as operands, and operate on multiple data elements of vector registers in parallel, so that computing power is high. An exemplary instruction set for the vector processor includes: coprocessor interface operations; flow control operations; load/store operations; and logic/arithmetic operations. The logic/arithmetic operations include operations that combine the data elements of one vector register with the corresponding data elements of one or more other vector registers to generate the data elements of a resulting data vector. Other logic/arithmetic operations mix the various data elements of one or more vector registers, or combine the data elements of a vector register with a scalar quantity.
An architectural extension of the vector processor adds scalar registers, each of which holds one scalar data element. The combination of scalar and vector registers makes it easy to extend the instruction set of the vector processor to include operations that combine each data element of a vector, in parallel, with the same scalar value. One instruction, for example, multiplies the data elements of a vector by a scalar value. Scalar registers also provide a location for storing a single data element that is to be extracted from or stored into a vector register. Scalar registers are also convenient for passing information between the vector processor and a coprocessor whose architecture provides only scalar registers, and for calculating the effective addresses used by load/store operations.
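As a rough illustration of the combined scalar/vector operations just described, the sketch below models a vector register as a Python list and a scalar register as a plain integer. The function names are hypothetical, and overflow, saturation, and element width are deliberately not modeled.

```python
# Behavioral sketch of scalar/vector operations; names are hypothetical and
# the hardware would perform the per-element work in parallel, not in a loop.

def vector_scalar_multiply(vector, scalar):
    """Combine every element of the vector with the same scalar value."""
    return [element * scalar for element in vector]


def extract_element(vector, index):
    """Copy one data element of a vector register into a scalar."""
    return vector[index]


def insert_element(vector, index, scalar):
    """Return a copy of the vector with one element replaced by the scalar."""
    result = list(vector)
    result[index] = scalar
    return result
```

For instance, `vector_scalar_multiply([1, 2, 3, 4], 3)` scales all four elements by the same value in one conceptual instruction, while `extract_element` and `insert_element` show the scalar register acting as a staging location for a single element.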
According to another aspect of the invention, the vector registers of the vector processor are organized into banks. Each bank can be selected as the "current" bank, while the other bank is the "alternative" bank. A "current bank" bit in a control register of the vector processor indicates the current bank. To reduce the number of bits needed to identify a vector register, some instructions provide only a register number that identifies a vector register in the current bank. Load/store instructions have an additional bit so that they can identify a vector register in either bank. Accordingly, load/store operations can fetch data into the alternative bank while data in the current bank is being operated on. This facilitates software pipelining for image processing and graphics procedures, and reduces processor delays when fetching data, because logic/arithmetic operations can execute out of order with respect to load/store operations that access the alternative register bank. In other instructions, the alternative bank permits the use of double-size vector registers, each of which comprises a vector register from the current bank and the corresponding vector register from the alternative bank. Such double-size registers can be identified from the instruction syntax. A control bit in the vector processor can be set so that the default vector size is either one or two vector registers. The alternative bank also permits the use of fewer explicitly identified operands in the syntax of complex instructions such as shuffle, unshuffle, saturate, and conditional moves that have two source and two destination registers.
The vector processor also implements novel instructions such as quad average, shuffle, unshuffle, pair-wise maximum and exchange, and saturate. These instructions perform operations that are common in multimedia functions such as video encoding and decoding, and each replaces the two or more instructions that other instruction sets would require to implement the same function. The vector processor instruction set thus improves the efficiency and speed of programs in multimedia applications.
Brief description of the drawings
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a multimedia processor according to an embodiment of the invention.
Fig. 2 is a block diagram of the vector processor of the multimedia processor of Fig. 1.
Fig. 3 is a block diagram of an instruction fetch unit of the vector processor of Fig. 2.
Fig. 4 is a block diagram of an instruction fetch unit of the vector processor of Fig. 2.
Figs. 5A, 5B, and 5C show the stages of the execution pipelines used for register-to-register instructions, load instructions, and store instructions of the vector processor of Fig. 2.
Fig. 6A is a block diagram of the execution data path of the vector processor of Fig. 2.
Fig. 6B is a block diagram of the register file for the execution data path of Fig. 6A.
Fig. 6C is a block diagram of the parallel processing logic units for the execution data path of Fig. 6A.
Fig. 7 is a block diagram of the load/store unit of the vector processor of Fig. 2.
Fig. 8 shows the formats of the vector processor instruction set according to an embodiment of the invention.
Detailed description of the embodiments
The same reference numerals used in different figures indicate similar or identical items.
Fig. 1 shows a block diagram of a multimedia signal processor (MSP) 100 according to an embodiment of the invention. Multimedia processor 100 includes a processing core 105 made up of a general-purpose processor 110 and a vector processor 120. Processing core 105 connects to the rest of multimedia processor 100 through a cache subsystem 130 that contains SRAM 160 and 190, ROM 170, and a cache controller 180. Cache controller 180 can configure SRAM 160 as an instruction cache 162 and a data cache 164 for processor 110, and can configure SRAM 190 as an instruction cache 192 and a data cache 194 for vector processor 120.
On-chip ROM 170 contains data and instructions for processors 110 and 120 and can also be configured as a cache. In this embodiment, ROM 170 contains: reset and initialization procedures; self-test diagnostic procedures; interrupt and exception handlers; subroutines for Sound Blaster emulation; subroutines for V.34 modem signal processing; general telephony functions; 2-D and 3-D graphics subroutine libraries; and subroutine libraries for audio and video standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.
Cache subsystem 130 connects processors 110 and 120 to two system buses 140 and 150, and operates as both a cache and a switching station for processors 110 and 120 and the devices coupled to buses 140 and 150. System bus 150 operates at a higher clock frequency than bus 140 and is connected to a memory controller 158, a local bus interface 156, a DMA controller 154, and a device interface 152, which respectively provide interfaces to an external local memory, a local bus of a host computer, direct memory accesses, and various analog-to-digital and digital-to-analog converters. A system timer 142, a UART (universal asynchronous receiver transmitter) 144, a bitstream processor 146, and an interrupt controller 148 are connected to bus 140. The above-referenced patent applications entitled "Multiprocessor Operation in a Multimedia Signal Processor" and "Methods and Apparatus for Processing Video Data" more fully describe the operation of cache subsystem 130 and of exemplary devices that processors 110 and 120 access through cache subsystem 130 and buses 140 and 150.
Processors 110 and 120 execute separate program threads and are structurally different, so that each executes the particular tasks assigned to it more efficiently. Processor 110 is primarily for control functions, such as execution of a real-time operating system, and similar functions that do not require large numbers of repetitive calculations. Accordingly, processor 110 does not need high computing power and can be implemented using a conventional general-purpose processor architecture. Vector processor 120 mostly performs number crunching that involves repetitive operations on the data blocks common in multimedia processing. For high computing power and relatively simple programming, vector processor 120 has a SIMD (single instruction, multiple data) architecture; in this embodiment, most data paths in vector processor 120 are 288 or 576 bits wide to support vector data manipulation. Additionally, the instruction set of vector processor 120 includes instructions particularly suited to multimedia problems.
In this embodiment, processor 110 is a 32-bit RISC processor that operates at 40 MHz and conforms to the architecture of the ARM7 processor, including the register set defined by the ARM7 standard. The architecture and instruction set of the ARM7 RISC processor are described in the "ARM7DM Data Sheet," Document Number ARM DDI 0010G, which is available from Advanced RISC Machines Ltd. The ARM7DM Data Sheet is incorporated by reference herein in its entirety. Appendix A describes the extension of the ARM7 instruction set in this embodiment.
Vector processor 120 operates on both vectors and scalars. In this embodiment, vector processor 120 includes a pipelined RISC engine that operates at 80 MHz. The registers of vector processor 120 include 32-bit scalar registers, 32-bit special-purpose registers, two banks of 288-bit vector registers, and two double-size (that is, 576-bit) vector accumulator registers. Appendix C describes the register set of vector processor 120 in this embodiment. In this embodiment, vector processor 120 includes 32 scalar registers, which instructions identify by 5-bit register numbers ranging from 0 to 31. There are also 64 288-bit vector registers, organized into two banks of 32 vector registers each. Each vector register can be identified by a 1-bit bank number (0 or 1) and a 5-bit vector register number ranging from 0 to 31. Most instructions access only the vector registers in the current bank, as indicated by the default bank bit CBANK stored in control register VCSR of vector processor 120. A second control bit, VEC64, indicates whether register numbers by default identify a double-size vector register consisting of one register from each bank. The syntax of an instruction distinguishes register numbers that identify vector registers from register numbers that identify scalar registers.
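The register-naming scheme above can be illustrated with a small model. The sketch assumes a hypothetical `VCSR` class holding only the CBANK and VEC64 bits, treats the optional load/store bank bit as an absolute bank number, and lists the current-bank half of a double-size register first; none of these details are fixed by the text.

```python
from dataclasses import dataclass


@dataclass
class VCSR:
    """Hypothetical model of the two control bits named in the text."""
    cbank: int = 0       # CBANK: which bank (0 or 1) is the current bank
    vec64: bool = False  # VEC64: register numbers name double-size registers


def select_vector_registers(reg_num, vcsr, bank_bit=None):
    """Return the (bank, number) pairs that a 5-bit register number names.

    bank_bit models the extra bit carried by load/store instructions; treating
    it as an absolute bank number is an assumption made for illustration.
    """
    if not 0 <= reg_num <= 31:
        raise ValueError("vector register numbers are 5 bits (0 to 31)")
    if bank_bit is not None:
        return [(bank_bit, reg_num)]
    if vcsr.vec64:
        # Double-size register: one register from each bank (order assumed).
        return [(vcsr.cbank, reg_num), (1 - vcsr.cbank, reg_num)]
    return [(vcsr.cbank, reg_num)]
```

With CBANK set to 1 and VEC64 clear, register number 5 resolves to bank 1 only; setting VEC64 makes the same 5-bit number name the pair of registers numbered 5 in both banks.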
Each vector register can be partitioned into data elements of programmable size. Table 1 shows the data types supported for data elements within a 288-bit vector register.
Table 1:
  Data type   Data size            Interpretation
  int8        8 bits (byte)        8-bit two's complement between -128 and 127
  int9        9 bits (byte9)       9-bit two's complement between -256 and 255
  int16       16 bits (halfword)   16-bit two's complement between -32,768 and 32,767
  int32       32 bits (word)       32-bit two's complement between -2,147,483,648 and 2,147,483,647
  float       32 bits (word)       32-bit IEEE 754 single-precision format
Appendix D further describes the data sizes and types supported in this embodiment of the invention.
For the int9 data type, 9-bit bytes are packed consecutively in a 288-bit vector register. For the other data types, every ninth bit in a 288-bit vector register is unused. A 288-bit vector register can hold thirty-two 8-bit or 9-bit integer data elements, sixteen 16-bit integer data elements, or eight 32-bit integer or floating-point elements. Additionally, two vector registers can be combined to pack data elements into a double-size vector. In this embodiment of the invention, setting control bit VEC64 in the control and status register VCSR places vector processor 120 in mode VEC64, in which double size (576 bits) is the default size of vector registers.
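The int9 packing described above can be sketched as follows. The register is modeled as one Python integer; putting element 0 in the lowest nine bits is an assumption for illustration, and the function names are hypothetical.

```python
# Sketch of packing thirty-two 9-bit two's-complement elements into a 288-bit
# register. Element 0 occupying the lowest-order slot is an assumption.
SLOT_BITS = 9
SLOTS = 288 // SLOT_BITS  # 32 slots of 9 bits each


def pack_int9(values):
    """Pack 32 integers in the range -256..255 into one 288-bit value."""
    if len(values) != SLOTS:
        raise ValueError("a 288-bit register holds exactly 32 int9 elements")
    register = 0
    for i, value in enumerate(values):
        if not -256 <= value <= 255:
            raise ValueError("int9 values lie between -256 and 255")
        register |= (value & 0x1FF) << (i * SLOT_BITS)
    return register


def unpack_int9(register):
    """Recover the 32 signed int9 elements from a 288-bit value."""
    values = []
    for i in range(SLOTS):
        raw = (register >> (i * SLOT_BITS)) & 0x1FF
        values.append(raw - 0x200 if raw & 0x100 else raw)  # sign-extend
    return values
```

Packing and then unpacking returns the original list, which shows that 32 slots of 9 bits account for all 288 bits; for the 8-bit types the same slots would leave one bit per slot unused.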
Multimedia processor 100 also includes a set of 32-bit extended registers 115 that both processors 110 and 120 can access. Appendix B describes the set of extended registers and their functions in this embodiment of the invention. The extended registers and the scalar and special-purpose registers of vector processor 120 are accessible to processor 110 in some circumstances. Two special "user" extended registers have two read ports so that processors 110 and 120 can read the registers simultaneously. The other extended registers cannot be accessed simultaneously.
Vector processor 120 has two alternative states, VP_RUN and VP_IDLE, which indicate whether vector processor 120 is running or idle. While vector processor 120 is in state VP_IDLE, processor 110 can read or write the scalar and special-purpose registers of vector processor 120. The result of processor 110 reading or writing a register of vector processor 120 while vector processor 120 is in state VP_RUN is undefined.
The extension of the ARM7 instruction set for processor 110 includes instructions that access the extended registers and the scalar or special-purpose registers of vector processor 120. Instructions MFER and MFEP respectively move data from an extended register and from a scalar or special-purpose register of vector processor 120 to a general register in processor 110. Instructions MTER and MTEP respectively move data from a general register in processor 110 to an extended register and to a scalar or special-purpose register of vector processor 120. The TESTSET instruction reads an extended register and sets bit 30 of the extended register to 1. By setting bit 30, instruction TESTSET signals to processor 120 that processor 110 has read (or used) the result that was produced, which facilitates consumer/producer synchronization. Other instructions of processor 110, such as STARTVP and INTVP, control the operating state of vector processor 120.
Processor 110 acts as a master processor that controls the operation of vector processor 120. Dividing control asymmetrically between processors 110 and 120 simplifies the problem of synchronizing them. While vector processor 120 is in state VP_IDLE, processor 110 initializes vector processor 120 by writing an instruction address to the program counter of vector processor 120. Processor 110 then executes a STARTVP instruction, which changes vector processor 120 to state VP_RUN. In state VP_RUN, vector processor 120 fetches instructions through cache subsystem 130 and executes them in parallel with processor 110, which continues to execute its own program. Once started, vector processor 120 continues executing until it encounters an exception, executes a VCJOIN or VCINT instruction whose condition is satisfied, or is interrupted by processor 110. Vector processor 120 can pass the results of program execution to processor 110 by writing the results to an extended register, by writing the results to the address space shared by processors 110 and 120, or by leaving the results in a scalar or special-purpose register that processor 110 accesses when vector processor 120 re-enters state VP_IDLE.
Vector processor 120 does not handle its own exceptions. Upon executing an instruction that causes an exception, vector processor 120 enters state VP_IDLE and sends an interrupt request to processor 110 over a direct line. Vector processor 120 remains in state VP_IDLE until processor 110 executes another STARTVP instruction. Processor 110 is responsible for reading register VISRC of vector processor 120 to determine the nature of the exception, handling the exception, possibly by reinitializing vector processor 120, and then directing vector processor 120 to resume execution if desired.
An INTVP instruction executed by processor 110 interrupts vector processor 120 and causes it to enter idle state VP_IDLE. Instruction INTVP can be used, for example, in a multitasking system to switch the vector processor from one task, such as video encoding, to another task, such as sound card emulation.
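The control protocol between the two processors amounts to a two-state machine, sketched below. The class `VectorProcessorModel` and its method names are hypothetical; raising an error on a program-counter write during VP_RUN is a modeling choice, since the text only says the result of such an access is undefined.

```python
from enum import Enum


class VPState(Enum):
    VP_IDLE = "VP_IDLE"
    VP_RUN = "VP_RUN"


class VectorProcessorModel:
    """Toy model of the VP_IDLE / VP_RUN protocol; not the real hardware."""

    def __init__(self):
        self.state = VPState.VP_IDLE
        self.program_counter = 0
        self.interrupt_pending = False  # interrupt request to processor 110

    def write_program_counter(self, address):
        # The text calls access during VP_RUN undefined; this model rejects it.
        if self.state is not VPState.VP_IDLE:
            raise RuntimeError("register access during VP_RUN is undefined")
        self.program_counter = address

    def startvp(self):
        """STARTVP, executed by processor 110: enter VP_RUN."""
        self.state = VPState.VP_RUN

    def intvp(self):
        """INTVP, executed by processor 110: force VP_IDLE."""
        self.state = VPState.VP_IDLE

    def raise_exception(self):
        """An exception idles the vector processor and interrupts processor 110."""
        self.state = VPState.VP_IDLE
        self.interrupt_pending = True
```

A typical sequence is `write_program_counter`, then `startvp`, then either `intvp` from the master or `raise_exception` from the vector processor itself, after which the model stays idle until the next `startvp`.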
Vector processor instructions VCINT and VCJOIN are flow control instructions that, if the condition indicated by the instruction is satisfied, halt execution by vector processor 120, place vector processor 120 in state VP_IDLE, and send an interrupt request to processor 110 unless such requests are masked. The program counter of vector processor 120 (special-purpose register VPC) then indicates the instruction address following the VCINT or VCJOIN instruction. Processor 110 can check interrupt source register VISRC of vector processor 120 to determine whether a VCINT or VCJOIN instruction caused the interrupt request. Because vector processor 120 has large data buses and is more efficient at saving and restoring its own registers, software executed by vector processor 120 should save and restore the registers during context switching. The above-referenced patent application entitled "Efficient Context Saving and Restoring in Multiprocessors" describes an exemplary system for context switching.
Fig. 2 shows the primary functional blocks of an embodiment of vector processor 120. Vector processor 120 includes an instruction fetch unit (IFU) 210, a decoder 220, a scheduler 230, an execution data path 240, and a load/store unit (LSU) 250. IFU 210 fetches instructions and processes flow control instructions such as branches. Instruction decoder 220 decodes one instruction per cycle, in the order in which the instructions arrive from IFU 210, and writes the field values decoded from each instruction to a FIFO in scheduler 230. Scheduler 230 selects the field values that are issued to execution control registers as required by the steps of the operation being executed. Issue selection depends on operand dependencies and on the availability of processing resources such as execution data path 240 or load/store unit 250. Execution data path 240 executes logic/arithmetic instructions that operate on vector or scalar data. Load/store unit 250 executes load/store instructions that access the address space of vector processor 120.
Fig. 3 shows a block diagram of an embodiment of IFU 210. IFU 210 includes an instruction buffer that is divided into a main instruction buffer 310 and a secondary instruction buffer 312. Main buffer 310 contains eight consecutive instructions, including the instruction corresponding to the current program count. Secondary buffer 312 contains the eight instructions immediately following those in buffer 310. IFU 210 also includes a branch target buffer 314, which contains eight consecutive instructions including the target of the next flow control instruction in buffer 310 or 312. In the present embodiment, vector processor 120 uses a RISC-type instruction set in which every instruction is 32 bits long; buffers 310, 312, and 314 are 8 × 32-bit buffers connected to cache subsystem 130 through a 256-bit instruction bus. IFU 210 can load eight instructions from cache subsystem 130 into any one of buffers 310, 312, or 314 in a single clock cycle. Registers 340, 342, and 344 respectively indicate the base addresses of the instructions loaded in buffers 310, 312, and 314.
Multiplexer 332 selects the current instruction from main instruction buffer 310. If the current instruction is not a flow control instruction and the instruction stored in instruction register 330 is advancing to the decode stage of execution, the current instruction is stored in instruction register 330 and the program count is incremented. After the incremented program count selects the last instruction in buffer 310, the next group of eight instructions is loaded into buffer 310. If buffer 312 contains the required eight instructions, the contents of buffer 312 and register 342 are immediately moved to buffer 310 and register 340, and eight more instructions are prefetched from cache subsystem 130 into secondary buffer 312. Adder 350 determines the address of the next group of instructions from the base address in register 342 and an offset selected by multiplexer 352. The resulting address from adder 350 is stored in register 342 when, or after, the previous address is moved from register 342 to register 340. The calculated address is also sent to cache subsystem 130 together with the request for eight instructions. If a previous call to cache subsystem 130 has not yet delivered the next eight instructions to buffer 312 when buffer 310 requires them, the previously requested instructions are stored directly in buffer 310 as soon as they are received from cache subsystem 130.
If the current instruction is a flow control instruction, IFU 210 handles the instruction by evaluating the condition of the flow control instruction and updating the program count accordingly. IFU 210 stalls if the condition cannot be determined because an earlier instruction that may change the condition has not yet completed. If the branch is not taken, the program counter is incremented and the next instruction is selected as described above. If the branch is taken and branch target buffer 314 contains the target of the branch, the contents of buffer 314 and register 344 are moved to buffer 310 and register 340, so that IFU 210 can continue to supply instructions to decoder 220 without waiting for instructions from cache subsystem 130.
To prefetch instructions for branch target buffer 314, scanner 320 scans buffers 310 and 312 to locate the next flow control instruction following the current program count. If a flow control instruction is found in buffer 310 or 312, scanner 320 determines the offset from the base address of the buffer 310 or 312 containing that instruction to the aligned group of eight instructions that includes the target address of the flow control instruction. Multiplexers 352 and 354 supply adder 350 with the offset from the flow control instruction and the base address from register 340 or 342, and adder 350 generates a new base address for buffer 314. The new base address is passed to cache subsystem 130, which then provides the eight instructions for branch target buffer 314.
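The address arithmetic behind this prefetch can be sketched as follows. This is only an illustration: it assumes byte addressing and that an "aligned group" starts on a 32-byte boundary (8 instructions × 4 bytes), neither of which is stated explicitly above. The function names and the example addresses are hypothetical.

```python
INSTR_BYTES = 4                           # each instruction is 32 bits long
GROUP_INSTRS = 8                          # each buffer holds 8 instructions
GROUP_BYTES = INSTR_BYTES * GROUP_INSTRS  # 32 bytes = one 256-bit transfer


def aligned_group_base(addr: int) -> int:
    """Base address of the aligned 8-instruction group holding addr (assumed alignment)."""
    return addr & ~(GROUP_BYTES - 1)


def target_group_offset(buffer_base: int, target_addr: int) -> int:
    """Offset from a buffer's base address to the aligned group of the branch target."""
    return aligned_group_base(target_addr) - buffer_base


def slot_in_group(addr: int) -> int:
    """Index (0..7) of the instruction at addr within its aligned group."""
    return (addr % GROUP_BYTES) // INSTR_BYTES


base_310 = 0x1000                         # hypothetical base address held in register 340
target = 0x1074                           # hypothetical branch target address
offset = target_group_offset(base_310, target)
new_base_314 = base_310 + offset          # the sum an adder such as 350 would form
```

With these example values the target group starts at 0x1060, and the branch target is the sixth instruction (slot 5) of that group.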
When handling flow control instructions such as the "decrement and branch conditionally" instructions VD1CBR, VD2CBR, and VD3CBR and the "change control register" instruction VCHGCR, IFU 210 can change the values of registers other than the program count. When IFU 210 finds an instruction that is not a flow control instruction, that instruction is passed to instruction register 330 and from there to decoder 220.
As shown in Fig. 4, decoder 220 decodes an instruction by writing control values to the fields of FIFO buffer 410 in scheduler 230. FIFO buffer 410 contains four rows of flip-flops, each of which can hold five fields of information for controlling the execution of one instruction. Rows 0 to 3 hold the information for the oldest to the newest instruction, respectively, and the information in FIFO buffer 410 shifts down to lower rows as older information is removed when instructions complete. Scheduler 230 issues an instruction to an execution stage by selecting the necessary instruction fields to be loaded into control pipeline 420, which includes execution registers 421 to 427. Most instructions can be scheduled for out-of-order issue and execution. In particular, the order of logic/arithmetic operations relative to load/store operations is arbitrary unless there are operand dependencies between the load/store operations and the logic/arithmetic operations. Comparisons of the field values in FIFO buffer 410 indicate whether any operand dependencies exist.
Fig. 5A illustrates a six-stage execution pipeline for an instruction that performs a register-to-register operation without accessing the address space of vector processor 120. In instruction fetch stage 511, IFU 210 fetches an instruction as described above. The fetch stage requires one clock cycle unless IFU 210 is stalled by a pipeline delay, an unresolved branch condition, or a delay in cache subsystem 130 supplying the prefetched instructions. In decode stage 512, decoder 220 decodes the instruction from IFU 210 and writes the information for the instruction to scheduler 230. Decode stage 512 also requires one clock cycle unless no row in FIFO 410 is available for a new operation. The operation can be issued to control pipeline 420 during its first cycle in FIFO 410, but issue may be delayed by the issue of earlier operations.
Execution data path 240 performs register-to-register operations and provides data and addresses for load/store operations. Fig. 6A shows a block diagram of an embodiment of execution data path 240 and is described together with execution stages 514, 515, and 516. Execution register 421 provides signals identifying two registers in register file 610, which are read in one clock cycle during read stage 514. Register file 610 includes 32 scalar registers and 64 vector registers. Fig. 6B is a block diagram of register file 610. Register file 610 has two read ports and two write ports to provide up to two reads and two writes in each clock cycle. Each port includes a select circuit 612, 614, 616, or 618 and a 288-bit data bus 613, 615, 617, or 619. Select circuits such as circuits 612, 614, 616, and 618 are well known in the art and use an address signal WRADDR1, WRADDR2, RDADDR1, or RDADDR2, which decoder 220 derives from a register number (typically 5 bits) provided in the instruction, a bank bit from the instruction or from status and control register VCSR, and the instruction syntax, which indicates whether a register is a vector register or a scalar register. Data that is read can be routed through multiplexer 656 to load/store unit 250, or through multiplexers 622 and 624 to multiplier 620, arithmetic logic unit 630, and accumulator 640. Most operations read two registers, and read stage 514 completes in one cycle. However, some instructions, such as the multiply-and-add instruction VMAD and instructions that operate on double-length vectors, require data from more than two registers, so that read stage 514 takes more than one clock cycle.
During execution stage 515, multiplier 620, arithmetic logic unit 630, and accumulator 640 process the data previously read from register file 610. If several cycles are required to read the necessary data, execution stage 515 can overlap read stage 514. The duration of execution stage 515 depends on the type of the data elements (integer or floating point) and on the amount of data processed (the number of read cycles). Signals from execution registers 422, 423, and 425 control the input data to arithmetic logic unit 630, accumulator 640, and multiplier 620 for the first operations performed during the execution stage. Signals from execution registers 432, 433, and 435 control the second operations performed during execution stage 515.
Fig. 6C shows a block diagram of an embodiment of multiplier 620 and ALU 630. Multiplier 620 is an integer multiplier that contains eight independent 36 × 36-bit multipliers 626. Each multiplier 626 contains four 9 × 9-bit multipliers that are connected together by control circuitry. For 8-bit and 9-bit data element widths, control signals from scheduler 230 disconnect the four 9 × 9-bit multipliers from one another so that each multiplier 626 performs four multiplications and multiplier 620 performs 32 independent multiplications in one cycle. For 16-bit data elements, the control circuitry connects 9 × 9-bit multipliers so that they operate together, and multiplier 620 performs 16 parallel multiplications. For the 32-bit integer data element type, the eight multipliers 626 perform eight parallel multiplications per clock cycle. The multiplication produces a 576-bit result for the 9-bit data element width and a 512-bit result for the other data element widths.
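The result widths quoted above follow from the number of parallel multiplications if each product is assumed to be twice as wide as its operands. The short sketch below checks that arithmetic; the lane counts come from the text, while the double-width assumption and the function names are illustrative only.

```python
# Parallel multiplications per cycle for each data element width, as stated in the text.
PARALLEL_MULTIPLIES = {8: 32, 9: 32, 16: 16, 32: 8}


def result_width_bits(element_bits: int) -> int:
    """Total result width, assuming each product is twice the element width."""
    return PARALLEL_MULTIPLIES[element_bits] * 2 * element_bits


widths = {bits: result_width_bits(bits) for bits in PARALLEL_MULTIPLIES}
```

Under that assumption only the 9-bit case yields 576 bits; the 8-, 16-, and 32-bit cases all yield 512 bits, which matches the figures given for multiplier 620.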
ALU 630 can process the 576-bit or 512-bit result from multiplier 620 in two clock cycles. ALU 630 contains eight independent 36-bit ALUs 636, and each ALU 636 contains a 32 × 32-bit floating point unit for floating point additions and multiplications. Additional circuitry implements integer shift, arithmetic, and logic functions. For integer operations, each ALU 636 contains four units that can independently perform 8-bit and 9-bit operations and that can be linked together in groups of two or four for 16-bit and 32-bit integer data elements.
Accumulator 640 accumulates results and includes two 576-bit registers to provide higher precision for intermediate results.
During write stage 516, the results from the execution stage are stored in register file 610. Two registers can be written in a single clock cycle, and input multiplexers 602 and 605 select the two data values to be written. The duration of write stage 516 for an operation depends on the amount of data to be written as a result of the operation and on competition from LSU 250, which may be completing a load instruction by writing to register file 610. Signals from execution registers 426 and 427 select the registers to which data from logic unit 630, accumulator 640, and multiplier 620 are written.
Fig. 5B illustrates an execution pipeline 520 for executing a load instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 520 are the same as described for a register-to-register operation. Read stage 514 is also the same as described above, except that execution data path 240 uses the data from register file 610 to determine the address for a call to cache subsystem 130. In address stage 525, multiplexers 652, 654, and 656 select the address, which is provided to load/store unit 250 for execution stages 526 and 527. The information for the load operation remains in FIFO 410 during stages 526 and 527 while load/store unit 250 handles the operation.
Fig. 7 shows an embodiment of load/store unit 250. During stage 526, a call is made to cache subsystem 130 to request the data at the address determined in stage 525. The present embodiment uses transaction-based cache calls, in which multiple devices, including processors 110 and 120, can access the local address space through cache subsystem 130. The requested data may not be available for several cycles after the call to cache subsystem 130, but load/store unit 250 can make further calls to the cache subsystem while other calls are pending. Load/store unit 250 is therefore not stalled. The number of clock cycles that cache subsystem 130 requires to provide the requested data depends on whether there is a hit or a miss in data cache 194.
In drive stage 527, cache subsystem 130 asserts a data signal for load/store unit 250. Cache subsystem 130 can provide 256 bits (32 bytes) of data per cycle to load/store unit 250. Byte aligner 710 aligns each of the 32 bytes in a corresponding 9-bit storage location to provide a 288-bit value. The 288-bit format is convenient for multimedia applications such as MPEG encoding and decoding, which sometimes use 9-bit data elements. The 288-bit value is written to read data buffer 720. For write stage 528, scheduler 230 transfers field 4 of FIFO buffer 410 to execution register 426 or 427 so that the 288-bit value in data buffer 720 is written to register file 610.
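A minimal sketch of this byte alignment is given below. It assumes that byte i occupies the low 8 bits of 9-bit slot i and that the ninth bit of each slot is left at zero; the text does not specify the slot order or the value of the extra bit, so both are assumptions, and the function names are hypothetical. The second function performs the inverse conversion back to conventional 8-bit bytes.

```python
SLOT_BITS = 9      # width of each storage location in the 288-bit format
LINE_BYTES = 32    # 256 bits delivered by the cache per cycle


def align_bytes_to_slots(data: bytes) -> int:
    """Place each of 32 bytes in its own 9-bit slot and return the 288-bit value."""
    if len(data) != LINE_BYTES:
        raise ValueError("expected exactly 32 bytes")
    value = 0
    for i, byte in enumerate(data):
        value |= byte << (SLOT_BITS * i)   # ninth bit of each slot stays 0 (assumption)
    return value


def slots_to_bytes(value: int) -> bytes:
    """Inverse conversion: keep the low 8 bits of each 9-bit slot."""
    return bytes((value >> (SLOT_BITS * i)) & 0xFF for i in range(LINE_BYTES))


line = bytes(range(32))
wide = align_bytes_to_slots(line)
```

The round trip `slots_to_bytes(align_bytes_to_slots(line))` returns the original 32 bytes, and the wide value always fits in 288 bits.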
Fig. 5C shows an execution pipeline 530 for executing a store instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 530 are the same as described above. Read stage 514 is also the same as described above, except that the read stage reads the data to be stored and the data used for the address calculation. The data to be stored is written to write data buffer 730 in load/store unit 250. Multiplexers 740 convert the data from the format with 9-bit bytes into the conventional format with 8-bit bytes. The converted data from buffer 730 and the associated address from address calculation stage 525 are sent in parallel to cache subsystem 130 during SRAM stage 536.
In the embodiment of vector processor 120, each instruction is 32 bits long and has one of the nine formats shown in Fig. 8, labeled REAR, REAI, RRRM5, RRRR, RI, CT, RRRM9, RRRM9*, and RRRM9**. Appendix E describes the instruction set of vector processor 120.
Some load, store, and cache operations that use scalar registers to determine an effective address have the REAR format. REAR-format instructions are identified by bits 29-31 being 000b and have three operands identified by three register numbers: register numbers SRb and SRi identify scalar registers, and register number Rn identifies a register that may be a scalar or a vector register, depending on bit D. Bank bit B either identifies a bank for register Rn or, if the default vector register size is double length, indicates whether vector register Rn is a double-length register. Op-code field Opc identifies the operation performed on the operands, and field TT indicates whether the transfer type is a load or a store. A typical REAR-format instruction is instruction VL, which loads register Rn from the address determined by adding the contents of scalar registers SRb and SRi. If bit A is set, the calculated address is stored in scalar register SRb.
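The effective-address rule for a REAR-format load such as VL can be sketched as follows. The function name and the list used to model the scalar register file are hypothetical, and wrapping the sum to 32 bits is an assumption rather than something stated above.

```python
MASK32 = 0xFFFFFFFF


def rear_effective_address(sregs: list, srb: int, sri: int, a_bit: bool) -> int:
    """Compute EA = SRb + SRi; if bit A is set, write the EA back to SRb."""
    ea = (sregs[srb] + sregs[sri]) & MASK32   # 32-bit wraparound is assumed
    if a_bit:
        sregs[srb] = ea                       # address update selected by bit A
    return ea


scalar_regs = [0] * 32          # 32 scalar registers, modeled as a plain list
scalar_regs[1] = 0x1000         # hypothetical base in SR1
scalar_regs[2] = 0x20           # hypothetical index in SR2
ea = rear_effective_address(scalar_regs, srb=1, sri=2, a_bit=True)
```

After this call the effective address is 0x1020 and, because bit A was set, SR1 now also holds 0x1020, so a following access with the same index continues at the next block.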
REAI-format instructions are the same as REAR-format instructions, except that an 8-bit immediate value from field IMM is used in place of the contents of scalar register SRi. The REAR and REAI formats have no data element length field.
The RRRM5 format is used for instructions that have two source operands and one destination operand. These instructions have either three register operands or two register operands and one 5-bit immediate value. The encoding of fields D, S, and M shown in Appendix E determines whether the first source operand Ra is a scalar or a vector register; whether the second source operand Rb/IM5 is a scalar register, a vector register, or a 5-bit immediate value; and whether destination register Rd is a scalar or a vector register.
The RRRR format is used for instructions that have four register operands. Register numbers Ra and Rb indicate source registers. Register number Rd indicates the destination register, and register number Rc indicates either a source or a destination register, depending on field Opc. All of the operands are vector registers unless bit S is set to indicate that register Rb is a scalar register. Field DS indicates the data element length for the vector registers. Field Opc selects the data type for 32-bit data elements.
An RI-format instruction loads an immediate value into a register. Field IMM contains an immediate value of up to 18 bits. Register number Rd indicates the destination register, which is either a vector register in the current bank or a scalar register, depending on bit D. Fields DS and F respectively indicate the length and the type of the data elements. For 32-bit integer data elements, the 18-bit immediate value is sign-extended before it is loaded into register Rd. For floating point data elements, bit 18, bits 17 to 10, and bits 9 to 0 respectively represent the sign, the exponent, and the mantissa of a 32-bit floating point value.
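The two interpretations of the RI immediate can be illustrated with a short sketch. The sign extension and the field split follow the bit positions given above. The expansion to a 32-bit value is an assumption: it supposes an IEEE 754 single-precision layout in which the 8-bit exponent is used unchanged and the 10 mantissa bits become the most significant fraction bits, which the text does not confirm. All function names are hypothetical.

```python
import struct


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def split_float_immediate(imm: int) -> tuple:
    """Split the immediate into (sign, exponent, mantissa) per the stated bit ranges."""
    sign = (imm >> 18) & 0x1        # bit 18
    exponent = (imm >> 10) & 0xFF   # bits 17 to 10
    mantissa = imm & 0x3FF          # bits 9 to 0
    return sign, exponent, mantissa


def expand_float_immediate(imm: int) -> float:
    """Assumed expansion: mantissa bits fill the top of a 23-bit IEEE 754 fraction."""
    sign, exponent, mantissa = split_float_immediate(imm)
    bits32 = (sign << 31) | (exponent << 23) | (mantissa << 13)
    return struct.unpack("<f", struct.pack("<I", bits32))[0]
```

For instance, `sign_extend(0x3FFFF, 18)` gives -1, and under the assumed expansion the immediate 0x1FC00 (sign 0, exponent 127, mantissa 0) corresponds to 1.0.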
The CT format is used for flow control instructions and includes an op-code field Opc, a condition field Cond, and a 23-bit immediate value IMM. A branch is taken when the condition indicated by the condition field is true. The possible condition codes are "always" (unconditional), "less than", "equal", "less than or equal", "greater than", "not equal", "greater than or equal", and "overflow". Bits GT, EQ, LT, and SO in status and control register VCSR are used to evaluate the conditions.
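One plausible way to evaluate these conditions from the VCSR flag bits is sketched below. The mapping from condition names to flags is the natural reading of the names and is an assumption; the numeric encoding of field Cond is not given here, so the sketch uses the condition names directly. The function name is hypothetical.

```python
def condition_true(cond: str, gt: bool, eq: bool, lt: bool, so: bool) -> bool:
    """Evaluate a CT-format condition from the GT, EQ, LT, and SO flag bits."""
    table = {
        "always": True,
        "less than": lt,
        "equal": eq,
        "less than or equal": lt or eq,
        "greater than": gt,
        "not equal": not eq,            # assumed equivalent to "LT or GT"
        "greater than or equal": gt or eq,
        "overflow": so,
    }
    return bool(table[cond])


# Flags as they might be left by a comparison whose first operand was smaller.
flags = dict(gt=False, eq=False, lt=True, so=False)
taken = condition_true("less than or equal", **flags)
```

With the example flags, a "less than or equal" branch is taken and a "greater than" branch is not.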
The RRRM9 format provides either three register operands or two register operands and one 9-bit immediate value. The combination of bits D, S, and M indicates which of the operands are vector registers, scalar registers, or 9-bit immediate values. Field DS indicates the data element length. The RRRM9* and RRRM9** formats are special cases of the RRRM9 format and are distinguished by op-code field Opc. The RRRM9* format replaces source register number Ra with a condition code Cond and an ID field. The RRRM9** format replaces the most significant bits of the immediate value with a condition code Cond and a bit K. RRRM9* and RRRM9** are described further in Appendix E in connection with the conditional move instruction VCMOV, the conditional move with element mask instruction CMOVM, and the compare and set mask instruction CMPV.
Although the invention has been described with reference to specific embodiments, the description is only an example of the application of the invention and should not be taken as a limitation. Various adaptations and combinations of features of the disclosed embodiments remain within the scope of the invention as defined by the following claims.
Appendix A
In an exemplary embodiment, processor 110 is a general-purpose processor that complies with the ARM7 processor standard. For a description of the registers in the ARM7, refer to the ARM architecture document or the ARM7 Data Sheet (document number ARM DDI 0020C, issued December 1994).
To interact with vector processor 120, processor 110: starts and stops the vector processor; tests the vector processor state, including for synchronization; transfers data from a scalar/special register of vector processor 120 to a general register of processor 110; and transfers data from a general register to a scalar/special register of the vector processor. There is no direct means of transfer between a general register and a vector register of the vector processor; such transfers require memory as an intermediary.
Table A.1 describes the extensions to the ARM7 instruction set for interaction with the vector processor.
Table A.1: Extended ARM7 instruction set
Instruction Result
  STARTVP This instruction causes the vector processor to enter the VP_RUN state and has no effect if the vector processor is already in the VP_RUN state. STARTVP is implemented in the coprocessor data operation (CDP) class of the ARM7 architecture. No result is returned to the ARM7, and the ARM7 continues its execution.
  INTVP This instruction causes the vector processor to enter the VP_IDLE state and has no effect if the vector processor is already in the VP_IDLE state. INTVP is implemented in the coprocessor data operation (CDP) class of the ARM7 architecture. No result is returned to the ARM7, and the ARM7 continues its execution.
  TESTSET This instruction reads a user extended register and sets bit 30 of the register to 1 to provide producer/consumer type synchronization between the vector processor and the ARM7 processor. TESTSET is implemented in the coprocessor register transfer (MRC) class of the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
  MFER Moves from an extended register to an ARM7 general register. MFER is implemented in the coprocessor register transfer (MRC) class of the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
  MFVP Moves from a scalar/special register of the vector processor to an ARM7 general register. Unlike the other ARM7 instructions, this instruction should be executed only when the vector processor is in the VP_IDLE state; otherwise the result is undefined. MFVP is implemented in the coprocessor register transfer (MRC) class of the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
  MTER Moves from an ARM7 general register to an extended register. MTER is implemented in the coprocessor register transfer (MCR) class of the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
  MTVP Moves from an ARM7 general register to a scalar/special register of the vector processor. Unlike the other ARM7 instructions, this instruction should be executed only when the vector processor is in the VP_IDLE state; otherwise the result is undefined. MTVP is implemented in the coprocessor register transfer (MCR) class of the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
  CACHE Provides software management of the ARM7 data cache.
  PFTCH Prefetches a cache line into the ARM7 data cache.
  WBACK Writes a cache line from the ARM7 data cache back to memory.
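The TESTSET behavior listed in Table A.1 resembles a classic test-and-set primitive. The sketch below models it in software under two assumptions that the table does not state: that the value returned to the ARM7 is the register content before bit 30 is set, and that the other processor signals by clearing bit 30. The class and method names are hypothetical.

```python
SYNC_BIT = 1 << 30   # bit 30 of the user extended register


class UserExtendedRegister:
    """Software model of one user extended register shared by both processors."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFFFFFF

    def testset(self) -> int:
        """Return the old value and set bit 30 (assumed read-then-set order)."""
        old = self.value
        self.value = old | SYNC_BIT
        return old

    def clear_sync(self) -> None:
        """Hypothetical release step performed by the other processor."""
        self.value &= ~SYNC_BIT & 0xFFFFFFFF


def try_acquire(reg: UserExtendedRegister) -> bool:
    """True if bit 30 was clear before TESTSET, i.e. the caller now owns the flag."""
    return (reg.testset() & SYNC_BIT) == 0


reg = UserExtendedRegister()
first = try_acquire(reg)     # flag was clear, so this attempt succeeds
second = try_acquire(reg)    # flag already set, so this attempt fails
```

A consumer would repeat `try_acquire` until it succeeds, and the producer would clear the bit once the shared data is ready for the next exchange.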
Table A.2 lists the ARM7 exceptions. These exceptions are detected and reported before the faulting instruction is executed. The exception vector addresses are given in hexadecimal notation.
Table A.2: ARM7 exceptions
Exception vector Description
    0x00000000 ARM7 reset
    0x00000004 ARM7 undefined instruction exception
    0x00000004 Vector processor unavailable exception
    0x00000008 ARM7 software interrupt
    0x0000000C ARM7 single step exception
    0x0000000C ARM7 instruction address breakpoint exception
    0x00000010 ARM7 data address breakpoint exception
    0x00000010 ARM7 invalid data address exception
    0x00000018 ARM7 protection violation exception
The following describes the syntax of the extensions to the ARM7 instruction set. For an explanation of the terminology and of the instruction formats, refer to the ARM architecture document or the ARM7 Data Sheet (document number ARM DDI 0020C, issued December 1994).
The ARM architecture provides three instruction formats for the coprocessor interface:
1. Coprocessor data operations (CDP)
2. Coprocessor data transfers (LDC, STC)
3. Coprocessor register transfers (MRC, MCR)
The MSP architectural extensions use all three formats.
The coprocessor data operation format (CDP) is used for operations that do not need to return a result to the ARM7.
CDP format
[Figure: CDP instruction format]
The fields of the CDP format have the following conventions:
Field Meaning
Cond Condition field. This field specifies the condition under which the instruction is executed.
Opc Coprocessor operation code
CRn Coprocessor operand register
CRd Coprocessor destination register
CP# Coprocessor number. The following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor and extended registers
CP Coprocessor information
CRm Coprocessor operand register
The coprocessor data transfer format (LDC, STC) is used to load or store a subset of the registers of the vector processor directly from or to memory. The ARM7 processor is responsible for supplying the word address, and the vector processor supplies or accepts the data and controls the number of words transferred. For more details, refer to the ARM7 Data Sheet.
LDC, STC format
[Figure: LDC/STC instruction format]
The fields of this format have the following conventions:
Field Meaning
    Cond Condition field. This field specifies the condition under which the instruction is executed.
    P Pre/post indexing bit
    U Up/down bit
    N Transfer length. Because the CRd field does not have enough bits, bit N is used as part of the source or destination register identifier.
    W Write-back bit
    L Load/store bit
    Rn Base register
    CRn Coprocessor source/destination register
    CP# Coprocessor number. The following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor and extended registers
    Offset Unsigned 8-bit immediate offset
The coprocessor register transfer format (MRC, MCR) is used to transfer information directly between the ARM7 and the vector processor. This format is used for moves between an ARM7 register and a scalar or special register of the vector processor.
MRC, MCR format
[Figure: MRC/MCR instruction format]
The fields of this format have the following conventions:
Field Meaning
    Cond Condition field. This field specifies the condition under which the instruction is executed.
    Opc Coprocessor operation code
    L Load/store bit. L=0: move to the vector processor; L=1: move from the vector processor
    CRn:CRm Coprocessor source/destination register. Only CRn<1:0>:CRm<3:0> are used.
    Rd ARM source/destination register
    CP# Coprocessor number. The following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor and extended registers
    CP Coprocessor information
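Since the format figure is not reproduced in this text, the following sketch packs the fields listed above using the bit positions of the standard ARM coprocessor register transfer encoding (Cond in bits 31:28, 1110 in bits 27:24, Opc in bits 23:21, L in bit 20, CRn in bits 19:16, Rd in bits 15:12, CP# in bits 11:8, CP in bits 7:5, bit 4 set, CRm in bits 3:0). That the extension uses exactly this layout is an assumption, and the function name is hypothetical.

```python
COND_AL = 0b1110   # "always" condition


def encode_register_transfer(load: bool, cp_num: int, opc: int, rd: int,
                             crn: int, crm: int, cp: int = 0,
                             cond: int = COND_AL) -> int:
    """Encode an MRC (load=True) or MCR (load=False) instruction word."""
    fields = (("cond", cond, 4), ("opc", opc, 3), ("crn", crn, 4), ("rd", rd, 4),
              ("cp_num", cp_num, 4), ("cp", cp, 3), ("crm", crm, 4))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((cond << 28) | (0b1110 << 24) | (opc << 21) | (int(load) << 20)
            | (crn << 16) | (rd << 12) | (cp_num << 8) | (cp << 5) | (1 << 4) | crm)


# MRC p7, 1, r0, c0, c0, 0: move from the vector processor (L=1) into r0.
mrc_word = encode_register_transfer(load=True, cp_num=7, opc=1, rd=0, crn=0, crm=0)
# MCR p7, 2, r3, c1, c5, 0: move from r3 to the coprocessor side (L=0).
mcr_word = encode_register_transfer(load=False, cp_num=7, opc=2, rd=3, crn=1, crm=5)
```

Under this layout the two example words are 0xEE300710 and 0xEE413715; coprocessor number 7 (0111) selects the vector processor and extended registers, as listed in the field table.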
Extended ARM instructions
The extended ARM instructions are described in alphabetical order.
CACHE Cache operation
Format
[Figure: CACHE instruction format]
Assembler syntax
STC{cond} p15, cOpc, <Address>
CACHE{cond} Opc, <Address>
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv} and Opc = {0, 1, 3}. Note that, because the CRn field of the LDC/STC format is used to specify Opc, the decimal representation of the opcode must be preceded by the letter "c" in the first syntax (that is, c0 is used to represent 0). For the address mode syntax, refer to the ARM7 Data Sheet.
Description
This instruction is executed only if Cond is true. Opc<3:0> specifies the following operations:
 Opc<3:0> Meaning
    0000 Write back and invalidate the modified cache line specified by the EA. If the matching line contains unmodified data, the line is invalidated without a write-back. If no cache line containing the EA is found, the data cache remains unchanged.
    0001 Write back and invalidate the modified cache line specified by the index of the EA. If the matching line contains unmodified data, the line is invalidated without a write-back.
    0010 Used by the PFTCH and WBACK instructions
    0011 Invalidate the cache line specified by the EA. The cache line is invalidated (without a write-back) even if the line has been modified. This is a privileged operation; an attempt to use it in user mode causes an ARM7 protection violation.
    Others Reserved
Operation
Refer to the ARM7 Data Sheet for how the EA is calculated.
Exceptions
ARM7 protection violation.
INTVP Interrupt vector processor
Format
[Figure: INTVP instruction format]
Assembler syntax
CDP{cond} p7, 1, c0, c0, c0
INTVP{cond}
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.
Description
This instruction is executed only if Cond is true. It signals the vector processor to halt. The ARM7 does not wait for the vector processor to halt and continues with the next instruction.
An MFER busy-wait loop should be used after this instruction is executed to check whether the vector processor has halted. This instruction has no effect if the vector processor is already in the VP_IDLE state. Bits 19:12, 7:5, and 3:0 are reserved.
Exceptions
Vector processor unavailable.
MFER Move from extended register
Format
[Figure: MFER instruction format]
Assembler syntax
MRC{cond} p7, 2, Rd, cP, cER, 0
MFER{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME refers to the architecturally specified register mnemonic (e.g., PER0 or CSR).
Description
This instruction is executed only if Cond is true. The extended register ER specified by P:ER<3:0> is moved to ARM7 register Rd, as shown in the table below. Refer to section 1.2 for a description of the extended registers.
  ER<3:0>     P=0     P=1
    0000     UER0     PER0
    0001     UER1     PER1
    0010     UER2     PER2
    0011     UER3     PER3
    0100     UER4     PER4
    0101     UER5     PER5
    0110     UER6     PER6
    0111     UER7     PER7
    1000     UER8     PER8
    1001     UER9     PER9
    1010     UER10     PER10
    1011     UER11     PER11
    1100     UER12     PER12
    1101     UER13     PER13
    1110     UER14     PER14
    1111     UER15     PER15
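The table above is regular enough to be expressed as a small lookup. The sketch below derives the register mnemonic from P and ER<3:0> and also forms the combined 5-bit selector P:ER<3:0>; the function names are hypothetical.

```python
def extended_register_name(p: int, er: int) -> str:
    """Mnemonic selected by P and ER<3:0>: UERn when P=0, PERn when P=1."""
    if p not in (0, 1):
        raise ValueError("P must be 0 or 1")
    if not 0 <= er <= 15:
        raise ValueError("ER must be in the range 0..15")
    return f"{'PER' if p else 'UER'}{er}"


def extended_register_selector(p: int, er: int) -> int:
    """Concatenate P and ER<3:0> into one 5-bit selector value."""
    return (p << 4) | (er & 0xF)


name = extended_register_name(1, 12)
```

Here `name` is "PER12", the entry in the P=1 column of row 1100.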
Bits 19:17 and 7:5 are reserved.
Exceptions
Protection violation on an attempt to access PERx in user mode.
MFVP Move from vector processor
Format
[Figure: MFVP instruction format]
Assembler syntax
MRC{cond} p7, 1, Rd, CRn, CRm, 0
MFVP{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME refers to the architecturally specified register mnemonic (e.g., SP0 or VCS).
Description
This instruction is executed only if Cond is true. The vector processor scalar/special register specified by CRn<1:0>:CRm<3:0> is moved to ARM7 register Rd. Refer to section 3.2.3 for the assignment of vector processor register numbers for register transfers.
Bits 7:5 and CRn<3:2> are reserved.
The vector processor register map is shown below. Refer to Table 15 for the vector processor special registers (SP0-SP15).
 CRm<3:0> CRn<1:0>=00 CRn<1:0>=01 CRn<1:0>=10 CRn<1:0>=11
    0000     SR0     SR16     SP0     RASR0
    0001     SR1     SR17     SP0     RASR1
    0010     SR2     SR18     SP0     RASR2
    0011     SR3     SR19     SP0     RASR3
    0100     SR4     SR20     SP0     RASR4
    0101     SR5     SR21     SP0     RASR5
    0110     SR6     SR22     SP0     RASR6
    0111     SR7     SR23     SP0     RASR7
    1000     SR8     SR24     SP0     RASR8
    1001     SR9     SR25     SP0     RASR9
    1010     SR10     SR26     SP0     RASR10
    1011     SR11     SR27     SP0     RASR11
    1100     SR12     SR28     SP0     RASR12
    1101     SR13     SR29     SP0     RASR13
    1110     SR14     SR30     SP0     RASR14
    1111     SR15     SR31     SP0     RASR15
SR0 always reads as 32 bits of zeros, and writes to it are ignored.
Exceptions
Vector processor is unavailable.
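The register selection by CRn<1:0>:CRm<3:0> can be sketched as a small lookup. This is an illustrative Python model only, not part of the architecture definition: the function name `vp_register_name` is hypothetical, and the sketch assumes that CRn<1:0> = 10 selects SP0 through SP15 by CRm<3:0>, in line with the reference to special registers SP0-SP15.

```python
def vp_register_name(crn: int, crm: int) -> str:
    """Name the vector processor register selected by CRn<1:0>:CRm<3:0>."""
    select = crn & 0b11    # CRn<3:2> are reserved and ignored in this sketch
    index = crm & 0b1111   # CRm<3:0>
    if select == 0b00:
        return f"SR{index}"          # scalar registers SR0..SR15
    if select == 0b01:
        return f"SR{index + 16}"     # scalar registers SR16..SR31
    if select == 0b10:
        return f"SP{index}"          # assumption: special registers SP0..SP15
    return f"RASR{index}"            # return address stack registers


if __name__ == "__main__":
    for crn, crm in [(0, 0), (1, 15), (2, 3), (3, 9)]:
        print(crn, crm, vp_register_name(crn, crm))
```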
MTER Move to Extended Register
Format
Assembler syntax
MRC{cond} p7, 2, Rd, cP, cER, 0
MTER{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, P = {0, 1}, and ER = {0, ... 15}. RNAME refers to the architecturally specified register mnemonic (that is, PER0 or CSR).
Description
This instruction is executed only if cond is true. ARM7 register Rd is moved to the extended register ER specified by P:ER<3:0>, as shown in the table below.
  ER<3:0>     P=0     P=1
    0000     UER0     PER0
    0001     UER1     PER1
    0010     UER2     PER2
    0011     UER3     PER3
    0100     UER4     PER4
    0101     UER5     PER5
    0110     UER6     PER6
    0111     UER7     PER7
    1000     UER8     PER8
    1001     UER9     PER9
    1010     UER10     PER10
    1011     UER11     PER11
    1100     UER12     PER12
    1101     UER13     PER13
    1110     UER14     PER14
    1111     UER15     PER15
Bits 19:17 and 7:5 are reserved.
Exceptions
Protection violation on an attempt to access PERx in user mode.
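The P:ER<3:0> selection and the user-mode protection rule can be sketched as follows. This is a hedged Python illustration; the function names and the use of `PermissionError` to stand in for the protection violation are assumptions made for the example, not part of the architecture.

```python
def extended_register_name(p: int, er: int) -> str:
    """Name the extended register selected by P:ER<3:0>."""
    prefix = "PER" if p & 1 else "UER"   # P=1 privileged, P=0 user
    return f"{prefix}{er & 0xF}"


def check_access(p: int, user_mode: bool) -> None:
    """Raise if a privileged extended register is accessed in user mode."""
    if (p & 1) and user_mode:
        # Stand-in for the protection violation exception
        raise PermissionError("protection violation: PERx accessed in user mode")


if __name__ == "__main__":
    print(extended_register_name(0, 1))   # user extended register
    print(extended_register_name(1, 6))   # privileged extended register
    check_access(0, user_mode=True)       # allowed: user register
```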
MTVP Move to Vector Processor
Format
Figure C9711740500311
Assembler syntax
MRC{cond} p7, 1, Rd, CRn, CRm, 0
MTVP{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, CRn = {c0, ... c15}, and CRm = {c0, ... c15}. RNAME refers to the architecturally specified register mnemonic (that is, SP0 or VCSR).
Description
This instruction is executed only if cond is true. ARM7 register Rd is moved to the vector processor scalar/special register specified by CRn<1:0>:CRm<3:0>.
Bits 7:5 and CRn<3:2> are reserved.
The vector processor register map is shown below.
 CRm<3:0> CRn<1:0>=00 CRn<1:0>=01 CRn<1:0>=10 CRn<1:0>=11
    0000     SR0     SR16     SP0     RASR0
    0001     SR1     SR17     SP1     RASR1
    0010     SR2     SR18     SP2     RASR2
    0011     SR3     SR19     SP3     RASR3
    0100     SR4     SR20     SP4     RASR4
    0101     SR5     SR21     SP5     RASR5
    0110     SR6     SR22     SP6     RASR6
    0111     SR7     SR23     SP7     RASR7
    1000     SR8     SR24     SP8     RASR8
    1001     SR9     SR25     SP9     RASR9
    1010     SR10     SR26     SP10     RASR10
    1011     SR11     SR27     SP11     RASR11
    1100     SR12     SR28     SP12     RASR12
    1101     SR13     SR29     SP13     RASR13
    1110     SR14     SR30     SP14     RASR14
    1111     SR15     SR31     SP15     RASR15
Exceptions
Vector processor is unavailable.
PFTCH Prefetch
Format
Figure C9711740500321
Assembler syntax
LDC{cond} p15, 2, <Address>
PFTCH{cond} <Address>
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Refer to the ARM7 data sheet for the addressing mode syntax.
Description
This instruction is executed only if cond is true. The cache line specified by EA is prefetched into the ARM7 data cache.
Operation
Refer to the ARM7 data sheet for how EA is calculated.
Exceptions: none
STARTVP Start Vector Processor
Format
Figure C9711740500331
Assembler syntax
CDP{cond} p7, 0, c0, c0, c0
STARTVP{cond}
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.
Description
This instruction is executed only if cond is true. It signals the vector processor to start execution and automatically clears VISRC<vjp> and VISRC<vip>. The ARM7 does not wait for the vector processor to start execution; it continues with the next instruction.
The state of the vector processor must be initialized to the desired state before this instruction is executed. This instruction has no effect if the vector processor is already in the VP_RUN state.
Bits 19:12, 7:5, and 3:0 are reserved.
Exceptions
Vector processor unavailable.
TESTSET Test and Set
Format
Assembler syntax
MRC{cond} p7, 0, Rd, c0, cER, 0
TESTSET{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, ER = {0, ... 15}, and RNAME refers to the architecturally specified register mnemonic (that is, UER1 or VASYNC).
Description
This instruction is executed only if cond is true. It returns the contents of UERx in Rd and sets UERx<30> to 1. If ARM7 register 15 is specified as the destination register, UERx<30> is returned in the Z bit of the CPSR so that a short busy-wait loop can be implemented.
Currently, only UER1 is defined to work with this instruction.
Bits 19:17 and 7:5 are reserved.
Exceptions: none
Appendix B
The architecture of multimedia processor 100 defines extended registers that processor 110 accesses with the MFER and MTER instructions. The extended registers include privileged extended registers and user extended registers.
The privileged extended registers are used mainly to control the operation of the multimedia signal processor. They are shown in Table B.1.
Table B.1: Privileged extended registers
Number  Mnemonic  Description
    PER0  CTR  Control register
    PER1  PVR  Processor type (version) register
    PER2  VIMSK  Vector interrupt mask register
    PER3  AIABR  ARM7 instruction address breakpoint register
    PER4  ADABR  ARM7 data address breakpoint register
    PER5  SPREG  Scratchpad register
    PER6  STR  Status register
The control register controls the operation of MSP 100. All bits in CTR are cleared on reset. The register is defined in Table B.2.
Table B.2: CTR definition
Bit  Mnemonic  Description
    31-13  Reserved; always read as 0.
    12  VDCI  Vector data cache invalidate bit. When set, the entire vector processor data cache is invalidated. Because the cache invalidate operation usually conflicts with normal cache operation, only one invalidation code sequence is supported.
    11  VDE  Vector data cache enable bit. When cleared, the vector processor data cache is disabled.
    10  VICI  Vector instruction cache invalidate bit. When set, the entire vector processor instruction cache is invalidated. Because the cache invalidate operation usually conflicts with normal cache operation, only one invalidation code sequence is supported.
    9  VICE  Vector instruction cache enable bit. When cleared, the vector processor instruction cache is disabled.
    8  ADCI  ARM7 data cache invalidate bit. When set, the entire ARM7 data cache is invalidated. Because the cache invalidate operation usually conflicts with normal cache operation, only one invalidation code sequence is supported.
    7  ADCE  ARM7 data cache enable bit. When cleared, the ARM7 data cache is disabled.
    6  AICI  ARM7 instruction cache invalidate bit. When set, the entire ARM7 instruction cache is invalidated. Because the cache invalidate operation usually conflicts with normal cache operation, only one invalidation code sequence is supported.
    5  AICE  ARM7 instruction cache enable bit. When cleared, the ARM7 instruction cache is disabled.
    4  APSE  ARM7 processor single-step enable bit. When set, the ARM7 processor takes an ARM7 processor single-step exception after executing one instruction. The single-step function is available only in user or supervisor mode.
    3  SPAE  Scratchpad access enable bit. When set, ARM7 processes are allowed to load from or store to the scratchpad. When cleared, an attempt to load from or store to the scratchpad causes an ARM7 invalid data address exception.
    2  VPSE  Vector processor single-step enable bit. When set, the vector processor takes a vector processor single-step exception after executing one instruction.
    1  VPPE  Vector processor pipeline enable bit. When cleared, the vector processor is configured to operate in non-pipelined mode, in which only one instruction is active in the vector processor execution pipeline at a time.
    0  VPAE  Vector processor access enable bit. When set, ARM7 processes are allowed to execute the extended ARM7 instructions described above. When cleared, ARM7 processes are prevented from executing the extended ARM7 instructions; any such attempt causes a vector processor unavailable exception.
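As a reading aid, the CTR bit assignments can be expressed as a lookup that decodes a 32-bit register value into named flags. This is an illustrative Python sketch; the helper name `decode_ctr` is hypothetical, and the bit positions and mnemonics are taken from Table B.2 as given.

```python
# Bit positions from Table B.2 (bits 31-13 are reserved and read as 0)
CTR_BITS = {
    "VDCI": 12, "VDE": 11, "VICI": 10, "VICE": 9,
    "ADCI": 8, "ADCE": 7, "AICI": 6, "AICE": 5,
    "APSE": 4, "SPAE": 3, "VPSE": 2, "VPPE": 1, "VPAE": 0,
}


def decode_ctr(value: int) -> dict:
    """Map each CTR mnemonic to True/False for the given register value."""
    return {name: bool((value >> bit) & 1) for name, bit in CTR_BITS.items()}


if __name__ == "__main__":
    # After reset all bits are cleared, so every flag decodes as False
    print(decode_ctr(0))
    # Enable vector processor access (VPAE) and pipelining (VPPE)
    flags = decode_ctr(0b11)
    print(flags["VPAE"], flags["VPPE"])
```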
The status register indicates the state of MSP 100. All bits in STR are cleared on reset. The register is defined in Table B.3.
Table B.3: STR definition
Bit  Mnemonic  Description
    31:23  Reserved; always read as 0.
    22  ADAB  ARM7 data address breakpoint exception bit. Set when an ARM7 data address breakpoint match occurs. The exception is reported through the data abort interrupt.
    21  AIDA  ARM7 invalid data address exception. Occurs when an ARM7 load or store instruction attempts to access an address that is undefined or not implemented in the particular MSP implementation, or attempts to access the scratchpad when access is not allowed. The exception is reported through the data abort interrupt.
    20  AIAB  ARM7 instruction address breakpoint exception bit. Set when an ARM7 instruction address breakpoint match occurs. The exception is reported through the prefetch abort interrupt.
    19  AIIA  ARM7 invalid instruction address exception. The exception is reported through the prefetch abort interrupt.
    18  ASTP  ARM7 single-step exception. The exception is reported through the prefetch abort interrupt.
    17  APV  ARM7 protection violation. The exception is reported through the IRQ interrupt.
    16  VPUA  Vector processor unavailable exception. The exception is reported through the coprocessor interrupt.
    15-0  Reserved; always read as 0.
The processor type (version) register identifies the particular processor type within the multimedia signal processor family.
The vector processor interrupt mask register VIMSK controls the reporting of vector processor exceptions to processor 110. Each bit in VIMSK, when set together with the corresponding bit in the VISRC register, enables the exception to interrupt the ARM7. VIMSK does not affect how vector processor exceptions are detected; it affects only whether an exception interrupts the ARM7. All bits in VIMSK are cleared on reset. The register is defined in Table B.4.
Table B.4: VIMSK definition
Bit  Mnemonic  Description
    31     DABE  Data address breakpoint interrupt enable
    30     IABE  Instruction address breakpoint interrupt enable
    29     SSTPE  Single-step interrupt enable
    28-14  Reserved; always read as 0.
    13     FOVE  Floating-point overflow interrupt enable
    12     FINVE  Floating-point invalid operand interrupt enable
    11     FDIVE  Floating-point divide-by-zero interrupt enable
    10     IOVE  Integer overflow interrupt enable
    9     IDIVE  Integer divide-by-zero interrupt enable
    8-7  Reserved; always read as 0.
    6     VIE  VCINT interrupt enable
    5     VJE  VCJOIN interrupt enable
    4-1  Reserved; always read as 0.
    0     CSE  Context switch enable
The ARM7 instruction address breakpoint register aids in debugging ARM7 programs. The register is defined in Table B.5.
Table B.5: AIABR definition
Bit  Mnemonic  Description
31-2  IADR  ARM7 instruction address.
1  Reserved; always read as 0.
0  IABE  Instruction address breakpoint enable. Cleared on reset. If set, an ARM7 instruction address breakpoint exception occurs when the ARM7 instruction access address matches AIABR<31:2> and VCSR<AIAB> is clear; VCSR<AIAB> is then set to indicate the exception. If VCSR<AIAB> is already set when a match occurs, this VCSR<AIAB> is cleared and the match is ignored. The exception is reported before the instruction is executed.
The ARM7 data address breakpoint register aids in debugging ARM7 programs. The register is defined in Table B.6.
Table B.6: ADABR definition
Bit  Mnemonic  Description
  31-2  DADR  ARM7 data address. Undefined on reset.
    1  SABE  Store address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of the ARM7 store access address match ADABR<31:2> and VCSR<ADAB> is clear; VCSR<ADAB> is then set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, this VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the store instruction is executed.
    0  LABE  Load address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of the ARM7 load access address match ADABR<31:2> and VCSR<ADAB> is clear; VCSR<ADAB> is then set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, this VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the load instruction is executed.
The scratchpad register configures the address and size of the scratchpad formed using SRAM in cache subsystem 130. The register is defined in Table B.7.
Table B.7: SPREG definition
Bit  Mnemonic  Description
31-11  SPBASE  Scratchpad base address. Indicates the upper 21 bits of the scratchpad start address. This value must be a 4M-byte offset from the value in the MSP_BASE register.
10-2  Reserved.
1-0  SPSIZE  Scratchpad size:
        00 -> 0K (with 4K vector processor data cache)
        01 -> 2K (with 2K vector processor data cache)
        10 -> 3K (with 1K vector processor data cache)
        11 -> 4K (without vector processor data cache)
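The SPREG fields can be unpacked as shown in the sketch below. It is a hedged Python illustration: the function name `decode_spreg` is hypothetical, and the start address is formed on the assumption that the low 11 address bits are zero, since SPBASE supplies only the upper 21 bits. The remaining vector data cache size follows the pairs listed for SPSIZE.

```python
# SPSIZE encoding -> scratchpad size in KB, from Table B.7
SPSIZE_KB = {0b00: 0, 0b01: 2, 0b10: 3, 0b11: 4}


def decode_spreg(value: int):
    """Return (scratchpad start address, scratchpad KB, vector data cache KB)."""
    start = value & 0xFFFFF800            # SPBASE in bits 31:11, low bits assumed 0
    scratchpad_kb = SPSIZE_KB[value & 0b11]
    cache_kb = 4 - scratchpad_kb          # the two sizes sum to 4K in every encoding
    return start, scratchpad_kb, cache_kb


if __name__ == "__main__":
    start, pad, cache = decode_spreg(0x12345803)
    print(hex(start), pad, cache)
```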
The user extended registers are used mainly for synchronization of processors 110 and 120. The user extended registers are currently defined to have only one bit, mapped to bit 30, so that an instruction such as "MFER R15, UERx" returns the bit value in the Z flag. Bits UERx<31> and UERx<29:0> are always read as 0. The user extended registers are described in Table B.8.
Table B.8: User extended registers
Number  Mnemonic  Description
  UER0  VPSTATE  Vector processor state flag. When set, bit 30 indicates that the vector processor is in the VP_RUN state and is executing instructions. When cleared, it indicates that the vector processor is in the VP_IDLE state and has stopped, with VPC addressing the next instruction to be executed. VPSTATE<30> is cleared on reset.
  UER1  VASYNC  Vector and ARM7 synchronization flag. Bit 30 provides producer/consumer-type synchronization between vector processor 120 and ARM7 processor 110. Vector processor 120 can set or clear this flag with the VMOV instruction. The flag can also be set or cleared by an ARM7 process using the MFER or MTER instruction. In addition, the flag can be read and set with the TESTSET instruction.
Table B.9 shows the state of the extended registers on power-on reset.
Table B.9: Power-on state of the extended registers
Register  Reset state
    CTR  0
    PVR  TBD
    VIMSK  0
    AIABR  AIABR<0> = 0; all other bits undefined
    ADABR  ADABR<0> = 0; all other bits undefined
    STR  0
    VPSTATE  VPSTATE<30> = 0; all other bits undefined
    VASYNC  VASYNC<30> = 0; all other bits undefined
Appendix C
The architectural state of vector processor 120 comprises 32 32-bit scalar registers; two banks of 32 288-bit vector registers; a pair of 576-bit vector accumulator registers; and a set of 32-bit special registers. The scalar, vector, and accumulator registers are intended for general-purpose programming and support many different data types.
The following notation is used here and in later sections: VR denotes a vector register; VRi denotes the i-th vector register (zero-based); VR[i] denotes the i-th data element in vector register VR; VR<a:b> denotes bits a to b of vector register VR; and VR[i]<a:b> denotes bits a to b of the i-th data element in vector register VR.
A vector architecture has the added dimensions of data type and length for the multiple elements within a vector register. Because a vector register has a fixed size, the number of data elements it holds depends on the element length. The MSP architecture defines five element lengths, as shown in Table C.1.
Table C.1: Data element lengths
Length name  Length (bits)
Boolean     1
Byte     8
Byte9     9
Halfword     16
Word     32
The MSP architecture interprets vector data according to the data type and length specified in an instruction. Currently, most arithmetic instructions support the two's complement (integer) format for byte, byte9, halfword, and word element lengths. In addition, most arithmetic instructions support the IEEE 754 single-precision format for the word element length.
A programmer is free to interpret the data in any desired way, as long as the instruction sequence produces a meaningful result. For example, the programmer is free to use byte9 to store 8-bit unsigned numbers, and is equally free to store 8-bit unsigned numbers in byte data elements and operate on them with the supplied two's complement arithmetic instructions, as long as the program can handle "false" overflow results.
There are 32 scalar registers, called SR0 to SR31. A scalar register is 32 bits wide and can hold one data element of any of the defined lengths. Scalar register SR0 is special: it always reads as 32 bits of zeros, and writes to SR0 are ignored. Byte, byte9, and halfword data types are stored in the least significant bits of a scalar register, and the most significant bits have undefined values.
Because the registers have no data type indicator, the programmer must know the data type of the registers used by each instruction. This differs from other architectures, in which a 32-bit register is assumed to contain a 32-bit value. The MSP architecture specifies that a result of data type A correctly modifies only the bits defined for data type A. For example, the result of a byte9 addition modifies only the low 9 bits of the 32-bit destination scalar register. The values of the upper 23 bits are undefined unless otherwise indicated by the instruction.
The 64 vector registers are organized as two banks of 32 registers each. Bank 0 contains the first 32 registers and bank 1 contains the next 32. One of the two banks is set as the current bank and the other as the alternate bank. All vector instructions use the registers in the current bank by default, except the load/store and register move instructions, which can access the vector registers in the alternate bank. The CBANK bit in the vector control and status register VCSR sets bank 0 or bank 1 as the current bank (the other becomes the alternate bank). The vector registers in the current bank are referred to as VR0 to VR31, and those in the alternate bank as VRA0 to VRA31. ...
VRi<575:0> = VR1i<287:0> : VR0i<287:0>
where VR0i and VR1i denote the vector registers numbered VRi in bank 0 and bank 1, respectively. The double-width vector registers, which are 576 bits wide, are referred to as VR0 to VR31.
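The concatenation VR1i<287:0>:VR0i<287:0> can be illustrated with plain integers standing in for register contents. This is a sketch only, written in Python; the helper names `double_width` and `split_double_width` are hypothetical, and register values are modeled as unsigned integers.

```python
BANK_BITS = 288                      # width of one vector register in a bank
BANK_MASK = (1 << BANK_BITS) - 1


def double_width(vr1i: int, vr0i: int) -> int:
    """Form VRi<575:0> with the bank 1 register in the upper 288 bits."""
    return ((vr1i & BANK_MASK) << BANK_BITS) | (vr0i & BANK_MASK)


def split_double_width(vri: int):
    """Recover (VR1i, VR0i) from a 576-bit double-width value."""
    return (vri >> BANK_BITS) & BANK_MASK, vri & BANK_MASK


if __name__ == "__main__":
    wide = double_width(vr1i=5, vr0i=7)
    print(wide.bit_length() <= 2 * BANK_BITS)
    print(split_double_width(wide))
```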
A vector register can hold multiple elements of byte, byte9, halfword, or word length, as shown in Table C.2.
Table C.2: Number of elements per vector register
Element length name  Element length (bits)  Maximum number of elements  Total bits used
Byte9     9     32     288
Byte     8     32     256
Halfword     16     16     256
Word     32     8     256
Mixing element lengths within a vector register is not supported. Except for byte9 elements, only 256 of the 288 bits are used; specifically, every ninth bit is unused. The 32 unused bits in byte, halfword, and word lengths are reserved, and the programmer should make no assumptions about their values.
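The element counts in Table C.2 follow from treating a 288-bit register as 32 nine-bit slots, with each element occupying a whole number of slots. The Python sketch below reproduces the table under that reading; the slot model and the function names are assumptions made for illustration, and the `vec64` option simply doubles the slot count for 576-bit registers.

```python
ELEMENT_BITS = {"byte9": 9, "byte": 8, "halfword": 16, "word": 32}


def elements_per_register(kind: str, vec64: bool = False) -> int:
    """Maximum number of elements of the given kind in one vector register."""
    slots = 64 if vec64 else 32                       # 9-bit slots in 576 or 288 bits
    slots_per_element = -(-ELEMENT_BITS[kind] // 9)   # ceiling division
    return slots // slots_per_element


def bits_used(kind: str, vec64: bool = False) -> int:
    """Total register bits actually occupied by element data."""
    return elements_per_register(kind, vec64) * ELEMENT_BITS[kind]


if __name__ == "__main__":
    for kind in ELEMENT_BITS:
        print(kind, elements_per_register(kind), bits_used(kind))
```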
The vector accumulator registers provide storage for intermediate results that have higher precision than the results in the destination registers. The vector accumulator consists of four 288-bit registers: VAC1H, VAC1L, VAC0H, and VAC0L. The VAC0H:VAC0L pair is used by default by three instructions. Only in VEC64 mode is the VAC1H:VAC1L pair used, to emulate 64-byte9 vector operations. Even if bank 1 is set as the current bank in VEC32 mode, the VAC0H:VAC0L pair is still used.
To produce an extended-precision result with the same number of elements as the source vector register, a pair of registers holds the extended-precision elements, as shown in Table C.3.
Table C.3: Vector accumulator format
Element length  Logical view  VAC format
Byte9  VAC[i]<17:0>  VAC0H[i]<8:0>:VAC0L[i]<8:0> for i = 0..31, and VAC1H[i-32]<8:0>:VAC1L[i-32]<8:0> for i = 32..63
Byte  VAC[i]<15:0>  VAC0H[i]<7:0>:VAC0L[i]<7:0> for i = 0..31, and VAC1H[i-32]<7:0>:VAC1L[i-32]<7:0> for i = 32..63
Halfword  VAC[i]<31:0>  VAC0H[i]<15:0>:VAC0L[i]<15:0> for i = 0..15, and VAC1H[i-16]<15:0>:VAC1L[i-16]<15:0> for i = 16..31
Word  VAC[i]<63:0>  VAC0H[i]<31:0>:VAC0L[i]<31:0> for i = 0..7, and VAC1H[i-8]<31:0>:VAC1L[i-8]<31:0> for i = 8..15
The VAC1H:VAC1L pair is used only in VEC64 mode, in which the number of elements is 64, 32, or 16 for byte9 (and byte), halfword, and word, respectively.
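The logical view in Table C.3 pairs a high half and a low half for each element. A minimal Python sketch of that pairing is given below; it assumes each accumulator register is modeled as a list of per-element integers, and the function name `vac_element` is hypothetical.

```python
def vac_element(vac0h, vac0l, vac1h, vac1l, i: int, bits: int) -> int:
    """Return logical VAC[i], with the high half above the low half.

    `bits` is the element length (9, 8, 16, or 32). Elements beyond the
    first register pair come from VAC1H:VAC1L, as in VEC64 mode.
    """
    n = len(vac0l)                       # elements held by one register pair
    if i < n:
        hi, lo = vac0h[i], vac0l[i]
    else:
        hi, lo = vac1h[i - n], vac1l[i - n]
    mask = (1 << bits) - 1
    return ((hi & mask) << bits) | (lo & mask)


if __name__ == "__main__":
    # Word elements: 8 per register pair, 64-bit logical accumulator elements
    vac0h, vac0l = [0] * 8, [0] * 8
    vac1h, vac1l = [0] * 8, [0] * 8
    vac0h[2], vac0l[2] = 1, 5
    vac1l[1] = 9
    print(hex(vac_element(vac0h, vac0l, vac1h, vac1l, 2, 32)))
    print(vac_element(vac0h, vac0l, vac1h, vac1l, 9, 32))
```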
There are 33 special registers that cannot be loaded directly from memory or stored directly to memory. Sixteen special registers, called RASR0 to RASR15, form an internal return address stack used by the subroutine call and return instructions. The other 17 32-bit special registers are shown in Table C.4.
Table C.4: Special registers
Number  Mnemonic  Description
    SP0     VCSR  Vector control and status register
    SP1     VPC  Vector program counter
    SP2     VEPC  Vector exception program counter
    SP3     VISRC  Vector interrupt source register
    SP4     VIINS  Vector interrupt instruction register
    SP5     VCR1  Vector count register 1
    SP6     VCR2  Vector count register 2
    SP7     VCR3  Vector count register 3
    SP8     VGMR0  Vector global mask register 0
    SP9     VGMR1  Vector global mask register 1
    SP10     VOR0  Vector overflow register 0
    SP11     VOR1  Vector overflow register 1
    SP12     VLABR  Vector instruction address breakpoint register
    SP13     VDABR  Vector data address breakpoint register
    SP14     VMMR0  Vector move mask register 0
    SP15     VMMR1  Vector move mask register 1
    SP16     VASYNC  Vector and ARM7 synchronization register
The vector control and status register VCSR is defined in Table C.5.
Table C.5: VCSR definition
Bit  Mnemonic  Description
  31:18  Reserved.
  17:13  VSP<4:0>  Return address stack pointer. VSP is used by the jump-to-subroutine and return-from-subroutine instructions to track the top of the internal return address stack. Because the return address stack has only 16 entries, VSP<4> is used to detect stack overflow conditions.
  12  SO  Summary overflow status flag. Set when an arithmetic operation results in overflow. Once set, this bit remains set until it is cleared by writing 0 to it.
  11  GT  Greater-than status flag. Set by the VSUBS instruction when SRa > SRb.
  10  EQ  Equal status flag. Set by the VSUBS instruction when SRa = SRb.
  9  LT  Less-than status flag. Set by the VSUBS instruction when SRa < SRb.
  8  SMM  Select move mask. When set, the VMMR0/1 pair becomes the element mask for arithmetic operations.
  7  CEM  Complement element mask. When set, the element mask for arithmetic operations is defined as the one's complement of VGMR0/1 or VMMR0/1, whichever is configured as the element mask. This bit does not change the contents of VGMR0/1 or VMMR0/1; it changes only how these registers are used. The SMM:CEM encodings specify:
        00 - use VGMR0/1 as the element mask for all instructions except VCMOVM.
        01 - use the complement of VGMR0/1 as the element mask for all instructions except VCMOVM.
        10 - use VMMR0/1 as the element mask for all instructions except VCMOVM.
        11 - use the complement of VMMR0/1 as the element mask for all instructions except VCMOVM.
  6  OED  Overflow exception disable. Interpreted together with ISAT, as described for bit 5.
  5  ISAT  Integer saturation mode. The OED:ISAT encodings specify:
        00 - no saturation; an exception is reported when overflow occurs.
        X1 - saturation; no overflow is caused.
        10 - no saturation; no exception is reported when overflow occurs.
  4:3  RMODE  Rounding mode for IEEE 754 floating-point operations:
        00 - round toward negative infinity
        01 - round toward zero
        10 - round to nearest value
        11 - round toward positive infinity
  2  FSAT  Floating-point saturation mode bit (fast IEEE mode).
  1  CBANK  Current bank bit. When set, indicates that bank 1 is the current bank. When cleared, indicates that bank 0 is the current bank. CBANK is ignored when the VEC64 bit is set.
  0  VEC64  64-byte9 vector mode bit. When set, specifies that the vector registers and accumulators are 576 bits wide. The default mode specifies a length of 32 byte9 elements and is called VEC32 mode.
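The VCSR field layout lends itself to a table-driven decoder. The Python sketch below is illustrative only: the helper name `decode_vcsr` is hypothetical, and the field positions are taken from Table C.5 as given, with each entry written as (high bit, low bit).

```python
# (high bit, low bit) for each VCSR field, from Table C.5
VCSR_FIELDS = {
    "VSP": (17, 13), "SO": (12, 12), "GT": (11, 11), "EQ": (10, 10),
    "LT": (9, 9), "SMM": (8, 8), "CEM": (7, 7), "OED": (6, 6),
    "ISAT": (5, 5), "RMODE": (4, 3), "FSAT": (2, 2),
    "CBANK": (1, 1), "VEC64": (0, 0),
}


def decode_vcsr(value: int) -> dict:
    """Extract every VCSR field from a 32-bit register value."""
    fields = {}
    for name, (hi, lo) in VCSR_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    # Bank 1 current, VEC32 mode, rounding mode 10 (round to nearest)
    vcsr = (0b10 << 3) | (1 << 1)
    decoded = decode_vcsr(vcsr)
    print(decoded["CBANK"], decoded["VEC64"], decoded["RMODE"])
```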
The vector program counter register VPC holds the address of the next instruction to be executed by vector processor 120. ARM7 processor 110 should load register VPC before issuing the STARTVP instruction to start operation of vector processor 120.
The vector exception program counter VEPC indicates the address of the instruction most likely to have caused the most recent exception. MSP 100 does not support precise exceptions, hence the term "most likely".
The vector interrupt source register VISRC indicates the interrupt sources to ARM7 processor 110. The appropriate bits are set by hardware when exceptions are detected. Software must clear register VISRC before vector processor 120 resumes execution. Any bit set in register VISRC causes vector processor 120 to enter the VP_IDLE state. If the corresponding interrupt enable bit is set in VIMSK, an interrupt is sent to processor 110. Table C.6 defines the contents of register VISRC.
Table C.6: VISRC definition
Bit  Mnemonic  Description
    31   DAB  Data address breakpoint exception
    30   IAB  Instruction address breakpoint exception
    29   SSTP  Single-step exception
    28-18  Reserved.
    17   IIA  Invalid instruction address exception
    16   IINS  Invalid instruction exception
    15   IDA  Invalid data address exception
    14   UDA  Unaligned data access exception
    13   FOV  Floating-point overflow exception
    12   FINV  Floating-point invalid operand exception
    11   FDIV  Floating-point divide-by-zero exception
    10   IOV  Integer overflow exception
    9   IDIV  Integer divide-by-zero exception
    8   RASO  Return address stack overflow exception
    7   RASU  Return address stack underflow exception
    6   VIP  VCINT exception pending. Executing the STARTVP instruction clears this bit.
    5   VJP  VCJOIN exception pending. Executing the STARTVP instruction clears this bit.
    4-0   VPEV  Vector processor exception vector
The vector interrupt instruction register VIINS is updated with the VCINT or VCJOIN instruction when a VCINT or VCJOIN instruction is executed to interrupt ARM7 processor 110.
The vector count registers VCR1, VCR2, and VCR3 are used by the decrement-and-branch instructions VD1CBR, VD2CBR, and VD3CBR, and are initialized with the number of loop iterations to be executed. When a VD1CBR instruction is executed, register VCR1 is decremented by 1. If the count value is not zero and the condition specified in the instruction matches VFLAG, the branch is taken; otherwise it is not. Register VCR1 is decremented by 1 in either case. Registers VCR2 and VCR3 are used in the same way.
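The decrement-and-branch behavior can be summarized in a short model. This is a sketch in Python under stated assumptions: the function name `decrement_and_branch` is hypothetical, the condition test against VFLAG is reduced to a boolean argument, and wrapping the counter to 32 bits is an assumption based on the special registers being 32 bits wide.

```python
def decrement_and_branch(count: int, condition_met: bool):
    """Model one VDnCBR step: return (new count, branch taken)."""
    count = (count - 1) & 0xFFFFFFFF       # the counter is decremented in every case
    taken = count != 0 and condition_met   # branch needs nonzero count and matching condition
    return count, taken


if __name__ == "__main__":
    count, iterations = 3, 0
    while True:
        iterations += 1                    # loop body would execute here
        count, taken = decrement_and_branch(count, condition_met=True)
        if not taken:
            break
    print(iterations, count)
```

With an initial count of 3 and the condition always met, the loop body in this model runs three times before the branch falls through.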
The vector global mask register VGMR0 indicates the elements of the destination vector register that are to be affected in VEC32 mode, and the elements within VR<287:0> in VEC64 mode. Each bit in VGMR0 controls the update of 9 bits in the vector destination register. Specifically, VGMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR0d<9i+8:9i> in VEC64 mode. Note that VR0d denotes the destination register in bank 0 in VEC64 mode, whereas VRd denotes the destination register in the current bank, which can be either bank 0 or bank 1 in VEC32 mode. The vector global mask register VGMR0 is used in the execution of all instructions except the VCMOVM instruction.
The vector global mask register VGMR1 indicates the elements within VR<575:288> that are to be affected in VEC64 mode. Each bit in register VGMR1 controls the update of 9 bits in the vector destination register in bank 1. Specifically, VGMR1<i> controls the update of VR1d<9i+8:9i>. Register VGMR1 is not used in VEC32 mode, but in VEC64 mode it affects the execution of all instructions except the VCMOVM instruction.
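The rule that mask bit i gates destination bits 9i+8:9i can be sketched as follows. This Python model is illustrative only: the function name `masked_update` is hypothetical, a 288-bit register is modeled as an unsigned integer, and the sketch covers one 32-bit mask register applied to one 288-bit destination.

```python
def masked_update(old: int, new: int, mask: int, slots: int = 32) -> int:
    """Write `new` into `old` only in the 9-bit fields whose mask bit is set."""
    result = old
    for i in range(slots):
        if (mask >> i) & 1:
            field = 0x1FF << (9 * i)                   # bits 9i+8:9i
            result = (result & ~field) | (new & field)
    return result


if __name__ == "__main__":
    all_ones = (1 << 288) - 1
    # Mask bits 0 and 2 set: only fields 0 and 2 of the destination change
    updated = masked_update(old=0, new=all_ones, mask=0b101)
    print(hex(updated))
```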
The vector overflow register VOR0 indicates the elements in VEC32 mode, and the elements within VR<287:0> in VEC64 mode, that contain overflow results after a vector arithmetic operation. This register is not modified by scalar arithmetic operations. Bit VOR0<i> being set indicates that the i-th element of a byte or byte9 operation, the (i idiv 2)-th element of a halfword operation, or the (i idiv 4)-th element of a word data type operation contains an overflow result. For example, bits 1 and 3 would be set to indicate overflow of the first halfword element and the first word element, respectively. The mapping of bits in VOR0 differs from the mapping of bits in VGMR0 or VGMR1.
The vector overflow register VOR1 indicates the elements within VR<575:288> in VEC64 mode that contain overflow results after a vector arithmetic operation. Register VOR1 is not used in VEC32 mode and is not modified by scalar arithmetic operations. Bit VOR1<i> being set indicates that the i-th element of a byte or byte9 operation, the (i idiv 2)-th element of a halfword operation, or the (i idiv 4)-th element of a word data type operation contains an overflow result. For example, bits 1 and 3 would be set to indicate overflow of the first halfword element and the first word element in VR<575:288>, respectively. The mapping of bits in VOR1 differs from the mapping of bits in VGMR0 or VGMR1.
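The idiv mapping from overflow register bits to element indices can be written out directly. The following Python sketch is a reading aid, not a definitive model: the function name `overflowed_elements` is hypothetical, and it only translates set bits of a 32-bit overflow register into zero-based element indices for a given data type.

```python
# Number of overflow-register bit positions that map onto one element
BITS_PER_ELEMENT = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def overflowed_elements(vor: int, kind: str) -> set:
    """Return zero-based indices of elements flagged in a 32-bit VOR value."""
    width = BITS_PER_ELEMENT[kind]
    return {i // width for i in range(32) if (vor >> i) & 1}   # i idiv width


if __name__ == "__main__":
    print(overflowed_elements(0b0010, "halfword"))   # bit 1: first halfword element
    print(overflowed_elements(0b1000, "word"))       # bit 3: first word element
    print(overflowed_elements(0b0101, "byte"))       # bits 0 and 2: byte elements 0 and 2
```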
Vector instruction address breakpoint register VLABRAid debugging vector program. Registers are defined as Table C.7 below.
Table C.7: VLABR Definition
Position Mnemonic Explanation
   31-2 IADR Vector instruction address, the reset is not defined
    1 Reserved bit
    0 IABE Instruction address breakpoints enabled. In the reset is not defined. If set, when the vector refers to Make access address with VLABR <31:2> matches happen "vector instruction to Address Breakpoint "exception, set bit VISRC <IAB> to indicate abnormalities of the different Often before instruction execution reports.
Vector data address breakpoint registers VDABRAid debugging vector program. Registers are defined as Table C.8 representation.
Table C.8: VDABR Definition
Bit  Mnemonic  Explanation
    31-2  DADR  Vector data address. Undefined on reset.
    1     SABE  Store address breakpoint enable. Undefined on reset. If set, a "vector data address breakpoint" exception occurs when a vector store access address matches VDABR<31:2>. Bit VISRC<DAB> is set to indicate the exception. The exception is reported before the store instruction is executed.
    0     LABE  Load address breakpoint enable. Cleared on reset. If set, a "vector data address breakpoint" exception occurs when a vector load access address matches VDABR<31:2>. Bit VISRC<DAB> is set to indicate the exception. The exception is reported before the load instruction is executed.
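A data address breakpoint check along these lines might be sketched as follows. The function name is hypothetical, and the sketch assumes that "matches VDABR<31:2>" means the two low-order address bits are ignored in the comparison, which the text does not state explicitly.

```python
SABE = 1 << 1            # store address breakpoint enable (bit 1)
LABE = 1 << 0            # load address breakpoint enable (bit 0)
ADDR_MASK = 0xFFFFFFFC   # bits 31:2 hold the breakpoint address


def data_breakpoint_hit(vdabr, address, is_store):
    """Return True if the access would raise the data address breakpoint."""
    enable = SABE if is_store else LABE
    if not vdabr & enable:
        return False
    # Assumption: only address bits 31:2 take part in the comparison.
    return (vdabr & ADDR_MASK) == (address & ADDR_MASK)
```

The same comparison would apply to VLABR with its single enable bit IABE.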
Vector move mask register VMMR0 is used by the VCMOVM instruction at all times, and by all instructions when VCSR<SMM> = 1. Register VMMR0 indicates the elements of the destination vector register that will be affected in VEC32 mode, and the elements within VR<287:0> in VEC64 mode. Each bit in VMMR0 controls the update of nine bits of the vector destination register. Specifically, VMMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and the update of VR0d<9i+8:9i> in VEC64 mode. In VEC64 mode, VR0d denotes the destination register in group 0. VRd denotes the destination register in the current group, which in VEC32 mode can be either group 0 or group 1.
Vector move mask register VMMR1 is used by the VCMOVM instruction at all times, and by all instructions when VCSR<SMM> = 1. Register VMMR1 indicates the elements within VR<575:288> that will be affected in VEC64 mode. Each bit in VMMR1 controls the update of nine bits of the vector destination register in group 1. Specifically, VMMR1<i> controls the update of VR1d<9i+8:9i>. Register VMMR1 is not used in VEC32 mode.
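The per-bit control of nine-bit slices described for the move mask registers can be sketched as below. The function name and the modeling of a 288-bit register as a Python integer are assumptions made for illustration only.

```python
def masked_update(dest, result, mask, num_slices=32):
    """Merge 9-bit slices of `result` into `dest` where the mask bit is set.

    Mask bit i controls register bits <9i+8:9i>, as described for VMMR0.
    Registers are modeled as non-negative Python integers (an assumption).
    """
    out = dest
    for i in range(num_slices):
        if (mask >> i) & 1:
            field = 0x1FF << (9 * i)
            out = (out & ~field) | (result & field)
    return out
```

A slice whose mask bit is clear keeps its previous contents, so a mask of zero leaves the destination unchanged.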
Vector and ARM7 synchronization register VASYNC provides producer/consumer-type synchronization between processors 110 and 120. Currently, only bit 30 is defined. While vector processor 120 is in state VP_RUN or VP_IDLE, the ARM7 processor can access register VASYNC using the MFER, MTER, and TESTSET instructions. The ARM7 processor cannot access register VASYNC through the TVP or MFVP instructions, because these instructions cannot access registers beyond the first 16 vector processor special registers. The vector processor accesses register VASYNC through the VMOV instruction.
Table C.9 shows the state of the vector processor at power-on reset.
Table C.9: Vector processor power-on reset state
Register  Reset state
    SR0                  0
    All other registers  Undefined
Before the vector processor can execute instructions, ARM7 processor 110 must initialize the special registers.
Appendix D
Each instruction implies or requires particular data types for its source and destination operands. Some instructions have semantics that apply equally to more than one data type. Some instructions have semantics that take one data type for the source and produce a different data type for the result. This appendix describes the data types supported by the exemplary embodiment. Table 1 of this application describes the supported data types int8, int9, int16, int32, and float. Unsigned integer formats are not supported; an unsigned integer value must first be converted to two's complement format before use. The programmer is free to use the arithmetic instructions with unsigned integers or any other format, as long as overflow is handled properly. The architecture defines overflow only for the two's complement integer and 32-bit floating-point data types. The architecture does not detect the carry out of 8-, 9-, 16-, or 32-bit operations that would be needed to detect unsigned overflow. Table D.1 shows the data lengths supported by load operations.
Data length in memory  Data length in register  Load operation
    8-bit     9-bit     Load 8 bits, sign-extended to 9 bits (for loading 8-bit two's complement)
    8-bit     9-bit     Load 8 bits, zero-extended to 9 bits (for loading unsigned 8-bit)
    16-bit    16-bit    Load 16 bits (for loading 16-bit unsigned or two's complement)
    32-bit    32-bit    Load 32 bits (for loading 32-bit unsigned, two's complement integer, or 32-bit floating point)
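The two 8-bit load variants differ only in how the ninth bit is produced. A minimal sketch, with hypothetical function names borrowed from the .bs9 and .bz9 suffixes, might look like this:

```python
def load_bs9(byte):
    """Load 8 bits and sign-extend to 9 bits (bit 8 copies bit 7)."""
    byte &= 0xFF
    return byte | ((byte & 0x80) << 1)


def load_bz9(byte):
    """Load 8 bits and zero-extend to 9 bits (bit 8 is cleared)."""
    return byte & 0xFF
```

For the byte 0x80, the sign-extended form is 0x180 (that is, -128 as a 9-bit two's complement value), while the zero-extended form is 0x080 (128).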
The architecture requires memory addresses to be aligned on data type boundaries. That is, bytes have no alignment requirement, halfwords must be aligned on halfword boundaries, and words must be aligned on word boundaries.
Table D.2 shows the data lengths supported by store operations.
Table D.2: Data lengths supported by store operations
Data length in register  Data length in memory  Store operation
    8-bit     8-bit     Store 8 bits (stores 8-bit unsigned or two's complement)
    9-bit     8-bit     Truncate to the lower 8 bits and store 8 bits (stores a 9-bit two's complement value in the range 0 to 255 as unsigned)
    16-bit    16-bit    Store 16 bits (stores 16-bit unsigned or two's complement)
    32-bit    32-bit    Store 32 bits
Because more than one data type maps to the registers, whether scalar or vector, some bits in a destination register may have no defined result for some data types. In fact, except for byte 9 operations on a vector destination register and word operations on a scalar destination register, some bits in the destination register have values that are not defined by the operation. The architecture specifies the values of these bits as undefined. Table D.3 shows the undefined bits for each data length.
Table D.3: Undefined bits for each data length
Data length  Vector destination register  Scalar destination register
    Byte      VR<9i+8>, for i = 0 to 31    SR<31:8>
    Byte 9    none                         SR<31:9>
    Halfword  VR<9i+8>, for i = 0 to 31    SR<31:16>
    Word      VR<9i+8>, for i = 0 to 31    none
When programming, the programmer must know the data types of the source and destination registers or memory. Converting elements from one data length to another can change the number of elements stored in a vector register. For example, converting a vector register from the halfword to the word data type requires two vector registers to hold the same number of converted elements. Conversely, converting a vector register from the word data type, which may have a user-defined format, to the halfword format produces the same number of elements in one half of the vector register, with the remaining bits in the other half. In both cases, data type conversion produces an arrangement of converted elements whose length differs from that of the source elements.
Special instructions described in Appendix E, such as VSHFLL and VUNSHFLL, simplify converting a vector of one data length into a vector of a second data length. The basic steps for converting a two's complement data type in vector register VRa from a smaller element length (e.g. int8) to a larger element length (e.g. int16) include: ...
The basic steps for converting two's complement numbers in vector register VRa from a larger element length (e.g. int16) to a smaller element length (e.g. int8) include:
1. Verify that each element of the int16 data type can be represented in byte length. If necessary, saturate the elements at both ends of the range to fit the smaller length.
2. Unshuffle the elements of VRa with another vector VRb into two vectors VRc:VRd. The high half of each source element is transferred to VRc and the low half to VRd, so that the lower half of VRd effectively collects the low halves of all elements of VRa.
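The two narrowing steps can be modeled in software as follows. This is only a sketch of the arithmetic effect: it works on a Python list of signed integers and does not model the VUNSHFLL instruction or the register pair VRc:VRd, and the function name is hypothetical.

```python
def narrow_int16_to_int8(elements):
    """Saturate int16 values to the int8 range, then keep the low byte.

    Step 1 clamps each value to [-128, 127]; step 2 collects the low
    half (8 bits) of every element, returned as raw byte patterns.
    """
    saturated = [max(-128, min(127, e)) for e in elements]
    return [e & 0xFF for e in saturated]
```

For example, 300 saturates to 127 (0x7F) and -300 saturates to -128 (0x80), while values already in range keep their low byte unchanged.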
Special instructions are provided for the following data type conversions: int32 to single-precision floating point; single-precision floating point to fixed point (X.Y notation); single-precision floating point to int32; int8 to int9; int9 to int16; and int16 to int9.
To provide flexibility in vector programming, most vector instructions use an element mask to operate only on selected elements within a vector register. The "vector global mask" registers VGMR0 and VGMR1 identify the elements of the destination register and the vector accumulator that vector instructions will modify. For byte and byte 9 operations, each of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on: a set bit VGMR0<i> indicates that byte-length element i will be affected, where i is 0 to 31. For halfword operations, each pair of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on: set bits VGMR0<2i:2i+1> indicate that element i will be affected, where i is 0 to 15. If only one bit of a pair in VGMR0 is set for a halfword operation, only the bits of the corresponding byte are modified. For word operations, each group of four bits in VGMR0 (or VGMR1) identifies one element to be operated on: set bits VGMR0<4i:4i+3> indicate that element i will be affected, where i is 0 to 7. If not all four bits of a group in VGMR0 are set for a word operation, only the bits of the corresponding bytes are modified.
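The relationship between mask bits and elements can be illustrated with a short sketch. The function name and the data-length labels are hypothetical; the bit positions follow the VGMR0<i>, VGMR0<2i:2i+1>, and VGMR0<4i:4i+3> rules described above.

```python
# Mask bits per element: one per byte of the element.
MASK_BITS = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def element_byte_enables(vgmr, i, data_length):
    """Return one boolean per byte of element i: True if that byte is modified."""
    n = MASK_BITS[data_length]
    return [bool((vgmr >> (n * i + k)) & 1) for k in range(n)]
```

An element is fully updated only when every entry in the returned list is True. With a partially set pair or group, only the bytes whose bits are set are modified, as the text describes.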
For flexibility in vector programming, most MSP instructions support three forms of vector and scalar operation, as follows:
1. vector = vector op vector
2. vector = vector op scalar
3. scalar = scalar op scalar
In case 2, where a scalar register is specified as the B operand, the single element in the scalar register is replicated as many times as needed to match the number of elements in the vector A operand. The replicated elements have the same value as the element in the specified scalar operand. The scalar operand can come from a scalar register or from the instruction, in the form of an immediate operand. For an immediate operand, if the data length specified by the data type is larger than the available immediate field length, appropriate sign extension is applied.
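The replication of the scalar operand in the vector = vector op scalar form can be sketched as follows. The function name is hypothetical, and vectors are modeled as Python lists rather than registers.

```python
import operator


def vector_op_scalar(op, vector_a, scalar_b):
    """Apply `op` element-wise after replicating the scalar to match A."""
    replicated_b = [scalar_b] * len(vector_a)  # every copy has the same value
    return [op(a, b) for a, b in zip(vector_a, replicated_b)]


# Usage: add the scalar 10 to each element of a four-element vector.
example = vector_op_scalar(operator.add, [1, 2, 3, 4], 10)
```

The sketch omits the element mask and the sign extension of immediate operands, both of which the described architecture applies.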
In many multimedia applications, much attention must be paid to the precision of the source, intermediate, and final results. In addition, the integer multiply instructions produce "double precision" intermediate results that can be stored in two vector registers.
Typically, the MSP architecture supports the two's complement integer format for 8-, 9-, 16-, and 32-bit elements and the IEEE 754 single-precision format for 32-bit elements. Overflow is defined as a result that lies outside the range between the most positive and most negative values representable by the specified data type. When overflow occurs, the value written to the destination register is not a valid number. Underflow is defined only for floating-point operations.
Unless otherwise noted, all floating-point operations use one of the four rounding modes specified in bits VCSR<RMODE>. Some instructions use the rounding mode known as round away from zero (round even). These instructions are explicitly noted.
Saturation is an important feature in many multimedia applications. The MSP architecture supports saturation for all four integer types and for floating-point operations. Bit ISAT in register VCSR specifies the integer saturation mode. The floating-point saturation mode, also known as fast IEEE mode, is specified by bit FSAT in VCSR. When saturation mode is enabled, a result beyond the most positive or most negative value is set to the most positive or most negative value, respectively. In this case no overflow occurs, and the overflow bit cannot be set.
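The difference between saturating and non-saturating integer arithmetic can be illustrated with a minimal sketch. The function name and the return convention (a value plus an overflow flag) are assumptions; returning None for an overflowed, non-saturated result simply reflects the statement that the written value is not a valid number.

```python
def add_int(a, b, bits, isat):
    """Add two signed integers of width `bits`.

    Returns (value, overflow). With saturation enabled the result is clamped
    and overflow is never reported; otherwise an out-of-range result yields
    (None, True) because the stored value is not a valid number.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    exact = a + b
    if lo <= exact <= hi:
        return exact, False
    if isat:
        return (hi if exact > hi else lo), False
    return None, True
```

For 8-bit elements, 100 + 100 saturates to 127 when ISAT-style saturation is enabled and reports overflow when it is not.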
Table D.4 lists the precise exceptions, which are detected and reported before the faulting instruction is executed. Exception vector addresses are given in hexadecimal notation.
Table D.4: Precise exceptions
Exception vector  Explanation
    0x00000018  Vector processor instruction address breakpoint exception
    0x00000018  Vector processor data address breakpoint exception
    0x00000018  Vector processor invalid instruction exception
    0x00000018  Vector processor single-step exception
    0x00000018  Vector processor return address stack overflow exception
    0x00000018  Vector processor return address stack underflow exception
    0x00000018  Vector processor VCINT exception
    0x00000018  Vector processor VCJOIN exception
Table D.5 lists the imprecise exceptions, which are detected and reported after some number of instructions that follow the faulting instruction in program order have been executed.
Table D.5: Imprecise exceptions
Exception vector  Explanation
    0x00000018  Vector processor invalid instruction address exception
    0x00000018  Vector processor invalid data address exception
    0x00000018  Vector processor unaligned data access exception
    0x00000018  Vector processor integer overflow exception
    0x00000018  Vector processor floating-point overflow exception
    0x00000018  Vector processor floating-point invalid operand exception
    0x00000018  Vector processor floating-point divide by zero exception
    0x00000018  Vector processor integer divide by zero exception
Appendix E
The vector processor instructions fall into the eleven classes shown in Table E.1.
Table E.1: Vector instruction class summary
Class  Explanation
    Control flow  This class contains instructions that control program flow, including branches and the ARM7 interface.
    Logical (bitwise, masked)  This class contains the bitwise logical instructions. Although the data type is boolean, the logical instructions use the element mask to modify results and therefore require a data type.
    Shift and rotate (element-wise, masked)  This class contains instructions that shift and rotate the bits in each element. This class distinguishes element lengths and is affected by the element mask.
    Arithmetic (element-wise, masked)  This class contains the element-wise arithmetic instructions; that is, the i-th result element is computed from the i-th source elements. This class distinguishes element types and is affected by the element mask.
    Multimedia (element-wise, masked)  This class contains instructions optimized for multimedia applications. This class distinguishes element types and is affected by the element mask.
    Data type conversion (element-wise, unmasked)  This class contains instructions that convert elements from one data type to another. Instructions in this class support a specific set of data types and are not affected by the element mask, since the architecture does not support more than one data type in a register.
    Inter-element arithmetic  This class contains instructions that take two elements from different positions in a vector to produce an arithmetic result.
    Inter-element move  This class contains instructions that take two elements from different positions in a vector to rearrange the elements.
    Load/store  This class contains instructions that load or store registers. These instructions are not affected by the element mask.
    Cache operation  This class contains instructions that control the instruction and data caches. These instructions are not affected by the element mask.
    Register move  This class contains instructions that transfer data between two registers. These instructions are generally not affected by the element mask, but some can optionally be masked.
Table E.2 lists the flow control instructions.
Table E.2: Flow control instructions
Mnemonic  Explanation
    VCBR    Conditional branch
    VCBRI   Conditional branch indirect
    VD1CBR  Decrement VCR1 and conditional branch
    VD2CBR  Decrement VCR2 and conditional branch
    VD3CBR  Decrement VCR3 and conditional branch
    VCJSR   Conditional jump to subroutine
    VCJSRI  Conditional jump to subroutine indirect
    VCRSR   Conditional return from subroutine
    VCINT   Conditional interrupt of ARM7
    VCJOIN  Conditional join with ARM7
    VCCS    Conditional context switch
    VCBARR  Conditional barrier
    VCHGCR  Change control register (VCSR)
The logical class supports the boolean data type and is affected by the element mask. Table E.3 lists the logical instructions.
Table E.3: Logical instructions
Mnemonic  Explanation
    VNOT   NOT: ~B
    VAND   AND: (A & B)
    VCAND  Complement AND: (~A & B)
    VANDC  AND complement: (A & ~B)
    VNAND  NAND: ~(A & B)
    VOR    OR: (A | B)
    VCOR   Complement OR: (~A | B)
    VORC   OR complement: (A | ~B)
    VNOR   NOR: ~(A | B)
    VXOR   XOR: (A ^ B)
    VXNOR  Exclusive NOR: ~(A ^ B)
The shift/rotate class instructions operate on the int8, int9, int16, and int32 data types (not the floating-point data type) and are affected by the element mask. Table E.4 lists the shift/rotate class instructions.
Table E.4: Shift and rotate class
Mnemonic  Explanation
    VDIV2N  Divide by a power of 2
    VLSL    Logical shift left
    VLSR    Logical shift right
    VROL    Rotate left
    VROR    Rotate right
Generally, the arithmetic class instructions support the int8, int9, int16, int32, and floating-point data types and are affected by the element mask. For specific restrictions on unsupported data types, see the detailed description of each instruction below. The VCMPV instruction is not affected by the element mask, because it operates on the element mask. Table E.5 lists the arithmetic class instructions.
Table E.5: Arithmetic class
Mnemonic  Explanation
    VASR    Arithmetic shift right
    VADD    Add
    VAVG    Average
    VSUB    Subtract
    VASUB   Absolute of subtract
    VMUL    Multiply
    VMULA   Multiply to accumulator
    VMULAF  Multiply to accumulator fraction
    VMULF   Multiply fraction
    VMULFR  Multiply fraction and round
    VMULL   Multiply low
    VMAD    Multiply and add
    VMADL   Multiply and add low
    VADAC   Add and accumulate
    VADACL  Add and accumulate low
    VMAC    Multiply and accumulate
    VMACF   Multiply and accumulate fraction
    VMACL   Multiply and accumulate low
    VMAS    Multiply and subtract from accumulator
    VMASF   Multiply and subtract from accumulator fraction
    VMASL   Multiply and subtract from accumulator low
    VSATU   Saturate to upper limit
    VSATL   Saturate to lower limit
    VSUBS   Subtract scalar and set condition
    VCMPV   Compare vector and set mask
    VDIVI   Divide initialize
    VDIVS   Divide step
    VASL    Arithmetic shift left
    VASA    Arithmetic shift accumulator by one bit
The MPEG instructions are a class of instructions specially adapted for MPEG encoding and decoding, but they can be used in various ways. The MPEG instructions support the int8, int9, int16, and int32 data types and are affected by the element mask. Table E.6 lists the MPEG instructions.
Table E.6: MPEG class
Mnemonic  Explanation
    VAAS3     Add and add sign of (-1, 0, 1)
    VASS3     Add and subtract sign of (-1, 0)
    VEXTSGN2  Extract sign of (-1, 1)
    VEXTSGN3  Extract sign of (-1, 0, 1)
    VXORALL   XOR the least significant bit of all elements
Each data type conversion instruction supports specific data types and is not affected by the element mask, because the architecture does not support more than one data type in a register. Table E.7 lists the data type conversion instructions.
Table E.7: Data type conversion class
Mnemonic  Explanation
    VCVTIF  Convert integer to floating point
    VCVTFF  Convert floating point to fixed point
    VROUND  Round floating point to integer (supports the four IEEE rounding modes)
    VCNTLZ  Count leading zeros
    VCVTB9  Convert byte 9 data type
The inter-element arithmetic class instructions support the int8, int9, int16, int32, and floating-point data types.
Table E.8 lists the inter-element arithmetic class instructions.
Table E.8: Inter-element arithmetic class
Mnemonic  Explanation
    VADDH  Add two adjacent elements
    VAVGH  Average two adjacent elements
    VAVGQ  Average four elements
    VMAXE  Maximum exchange of even/odd elements
The inter-element move class instructions support the byte, byte 9, halfword, and word data lengths. Table E.9 lists the inter-element move class instructions.
Table E.9: Inter-element move class
Mnemonic  Explanation
    VESL     Element shift left by one
    VESR     Element shift right by one
    VSHFL    Shuffle to even/odd elements
    VSHFLH    Shuffle to even/odd elements, high
    VSHFLL    Shuffle to even/odd elements, low
    VUNSHFL   Unshuffle even/odd elements
    VUNSHFLH  Unshuffle even/odd elements, high
    VUNSHFLL  Unshuffle even/odd elements, low
In addition to the byte, halfword, and word data lengths, the load/store instructions support special byte 9 related data length operations, and they are not affected by the element mask. Table E.10 lists the load/store class instructions.
Table E.10: Load/store class
Mnemonic  Explanation
    VL     Load
    VLD    Load double
    VLQ    Load quad
    VLCB   Load from circular buffer
    VLR    Load in reverse element order
    VLWS   Load with stride
    VST    Store
    VSTD   Store double
    VSTQ   Store quad
    VSTCB  Store to circular buffer
    VSTR   Store in reverse element order
    VSTWS  Store with stride
Most register move instructions support the int8, int9, int16, int32, and floating-point data types and are not affected by the element mask; only the VCMOVM instruction is affected by the element mask. Table E.11 lists the register move class instructions.
Table E.11: Register move class
Mnemonic  Explanation
    VLI      Load immediate
    VMOV     Move
    VCMOV    Conditional move
    VCMOVM   Conditional move with element mask
    VEXTRT   Extract an element
    VINSERT  Insert an element
Table E.12 lists the cache operation class instructions, which control cache subsystem 130.
Table E.12: Cache operation class
Mnemonic  Explanation
    VCACHE  Cache operation on the data or instruction cache
    VPFTCH  Prefetch into the data cache
    VWBACK  Write back from the data cache
Instruction description terminology
To simplify the description of the instruction set, this appendix uses special terminology throughout. For example, instruction operands are signed two's complement integers of byte, byte 9, halfword, or word length unless otherwise noted. The term "register" refers to a general (scalar or vector) register; other kinds of registers are described explicitly. In assembly language syntax, the suffixes b, b9, h, and w denote both the data length (byte, byte 9, halfword, and word) and the integer data type (int8, int9, int16, and int32). In addition, the terminology and symbols used to describe instruction operands, operations, and assembly language syntax are as follows.
Rd  destination register (vector, scalar, or special)
Ra, Rb  source registers a and b (vector, scalar, or special)
Rc  source or destination register c (vector or scalar)
Rs  store data source register (vector or scalar)
S  32-bit scalar or special register
VR  vector register in the current group
VRA  vector register in the alternate group
VR0  vector register in group 0
VR1  vector register in group 1
VRd  vector destination register (defaults to the current group unless VRA is specified)
VRa, VRb  vector source registers a and b
VRc  vector source or destination register c
VRs  vector store data source register
VAC0H  vector accumulator register 0 high
VAC0L  vector accumulator register 0 low
VAC1H  vector accumulator register 1 high
VAC1L  vector accumulator register 1 low
SRd  scalar destination register
SRa, SRb  scalar source registers a and b
SRb+  update the base register with the effective address
SRs  scalar store data source register
SP  special register
VR[i]  the i-th element of vector register VR
VR[i]<a:b>  bits a to b of the i-th element of vector register VR
VR[i]<msb>  the most significant bit of the i-th element of vector register VR
EA  effective address of the memory access
MEM  memory
BYTE[EA]  the byte in memory addressed by EA
HALF[EA]  the halfword in memory addressed by EA; bits <15:8> are addressed by EA+1
WORD[EA]  the word in memory addressed by EA; bits <31:24> are addressed by EA+3
NumElem  the number of elements for the given data type. In VEC32 mode it is 32, 16, or 8 for the byte and byte 9, halfword, or word data length, respectively; in VEC64 mode it is 64, 32, or 16 for the byte and byte 9, halfword, or word data length, respectively. For scalar operations NumElem is 0.
EMASK[i]  the element mask for the i-th element. For the byte and byte 9, halfword, or word data length, it represents 1, 2, or 4 bits, respectively, in VGMR0/1, ~VGMR0/1, VMMR0/1, or ~VMMR0/1. For scalar operations the element mask is assumed to be set even if EMASK[i] = 0.
MMASK[i]  the element mask for the i-th element. For the byte and byte 9, halfword, or word data length, it represents 1, 2, or 4 bits, respectively, in VMMR0 or VMMR1.
VCSR  vector control and status register
VCSR<x>  one or more bits of VCSR; "x" is the field name
VPC  vector processor program counter
VECSIZE  vector register length: 32 in VEC32 mode and 64 in VEC64 mode
SPAD register
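The NumElem and VECSIZE definitions above can be cross-checked with a small sketch. The function names and data-length labels are hypothetical; the values follow the glossary entries.

```python
# Bytes of vector register occupied per element for each data length.
ELEMENT_BYTES = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def vecsize(vec64):
    """Vector register length: 32 in VEC32 mode, 64 in VEC64 mode."""
    return 64 if vec64 else 32


def num_elem(data_length, vec64, scalar=False):
    """Number of elements for the data type; 0 for scalar operations."""
    if scalar:
        return 0
    return vecsize(vec64) // ELEMENT_BYTES[data_length]
```

In this model NumElem is simply VECSIZE divided by the element length in bytes, which reproduces the 32/16/8 and 64/32/16 figures given for the two modes.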
C programming constructs are used to describe flow-control operations. Exceptions are noted below:
=  assignment
:  concatenation
{x ‖ y}  selection between x and y (not a logical or)
sex  sign-extend to the specified data length
sex_dp  sign-extend to double precision of the specified data length
sign>>  (arithmetic) right shift with sign extension
zex  zero-extend to the specified data length
zero>>  (logical) right shift with zero extension
<<  left shift (zero fill)
trnc7  truncate the leading 7 bits (from a halfword)
trnc1  truncate the leading 1 bit (from a byte 9)
%  modulo operator
|expression|  absolute value of the expression
/  divide (for the floating-point data type, uses one of the four IEEE rounding modes)
//  divide (uses the round away from zero rounding mode)
saturate()  for integer data types, saturate to the most negative or most positive value instead of producing an overflow; for the floating-point data type, saturation can be to positive infinity, positive zero, negative zero, or negative infinity
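The extension and shift operators in the list above can be modeled on unsigned bit patterns as in the following sketch. The function names and parameter lists are hypothetical, since the text defines only the operator names and not their signatures.

```python
def sex(value, from_bits, to_bits):
    """Sign-extend a `from_bits`-wide pattern to `to_bits` bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def zex(value, from_bits):
    """Zero-extend: keep the low `from_bits` bits, upper bits are zero."""
    return value & ((1 << from_bits) - 1)


def sign_shift_right(value, shift, bits):
    """Arithmetic right shift of a `bits`-wide pattern (sign bit replicated)."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):
        value -= 1 << bits  # reinterpret the pattern as a negative number
    return (value >> shift) & mask


def zero_shift_right(value, shift, bits):
    """Logical right shift of a `bits`-wide pattern (zeros shifted in)."""
    return (value & ((1 << bits) - 1)) >> shift
```

For the 8-bit pattern 0x80, the arithmetic shift by one gives 0xC0 while the logical shift gives 0x40.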
The general instruction formats are shown in Figure 8 and described below.
The REAR format is used by load, store, and cache operation instructions. The fields of the REAR format have the meanings given in Table E.13 below.
Table E.13: REAR format
Field  Meaning
    OPC<4:0>  Opcode
    B         Group identifier for register Rn
    D         Destination/source scalar register. When set, Rn<4:0> designates a scalar register.
              In VEC32 mode, the legal values of the B:D encoding are:
                00  Rn is a vector register in the current group
                01  Rn is a scalar register (in the current group)
                10  Rn is a vector register in the alternate group
                11  Undefined
              In VEC64 mode, the legal values of the B:D encoding are:
                00  Only 4, 8, 16, or 32 bytes of vector register Rn are used
                01  Rn is a scalar register
                10  All 64 bytes of vector register Rn are used
                11  Undefined
    TT<1:0>   Transfer type, indicating the specific load or store operation. See the LT and ST encoding tables below.
    C         Cache off. Set this bit to bypass the data cache when accessing memory. This bit is set by the cache-off form of the load and store mnemonics (OFF is appended to the mnemonic).
    A         Address update. Set this bit to update SRb with the effective address. The effective address is computed as SRb + SRi.
    Rn<4:0>   Destination/source register number
    SRb<4:0>  Scalar base register number
    SRi<4:0>  Scalar index register number
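The B:D encodings listed in Table E.13 can be decoded as in the following sketch. The function name and the returned description strings are hypothetical; only the encodings themselves come from the table.

```python
def decode_bd(b, d, vec64):
    """Describe what Rn designates for a given B:D encoding and mode."""
    code = (b << 1) | d
    if code == 0b11:
        raise ValueError("undefined B:D encoding")
    if code == 0b01:
        return "scalar register"
    if vec64:
        if code == 0b10:
            return "vector register, all 64 bytes"
        return "vector register, 4, 8, 16, or 32 bytes"
    if code == 0b10:
        return "vector register, alternate group"
    return "vector register, current group"
```

Raising an error for encoding 11 is a choice made for this sketch; the architecture leaves the result of an undefined encoding unspecified.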
Bits 17:15 are reserved and should be zero to ensure compatibility with future extensions of the architecture. Some encodings of the B:D and TT fields are undefined. The programmer should not use these encodings, because the architecture does not specify the expected result when such an encoding is used. Table E.14 shows the scalar load operations supported in both VEC32 and VEC64 modes (the TT field is encoded as LT).
Table E.14: REAR load operations in VEC32 and VEC64 modes
D:LT  Mnemonic  Meaning
    100  .bs9  Load 8 bits into byte 9 length, sign-extended
    101  .h    Load 16 bits into halfword length
    110  .bz9  Load 8 bits into byte 9 length, zero-extended
    111  .w    Load 32 bits into word length
Table E.15 shows the vector load operations supported in VEC32 mode (encoded in the TT field as LT), when the VCSR<0> bit is cleared.
Table E.15: REAR load operations in VEC32 mode
   D:LT Mnemonic Significance
    000     .4 Load 4 bytes from memory into the lower 4 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 4 byte9s is sign-extended from the corresponding eighth bit.
    001     .8 Load 8 bytes from memory into the lower 8 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 8 byte9s is sign-extended from the corresponding eighth bit.
    010     .16 Load 16 bytes from memory into the lower 16 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 16 byte9s is sign-extended from the corresponding eighth bit.
    011     .32 Load 32 bytes from memory into the lower 32 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 32 byte9s is sign-extended from the corresponding eighth bit.
The B bit selects the current or the alternate group.
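The byte-to-byte9 vector loads described above can be sketched as follows. This is a minimal illustration, not the patented hardware: the register is modeled as a Python list of 9-bit lanes, and the function name `load_bytes_to_byte9` is hypothetical.

```python
def load_bytes_to_byte9(register, memory_bytes):
    """Write memory bytes into the low byte9 lanes; other lanes stay unchanged.

    Assumption: `register` is a list of 9-bit lane values and
    `memory_bytes` holds 4, 8, 16, or 32 values in the range 0..255.
    """
    result = list(register)
    for lane, value in enumerate(memory_bytes):
        bit8 = (value >> 7) & 1                      # eighth bit of the loaded byte
        result[lane] = (bit8 << 8) | (value & 0xFF)  # copy it into the ninth bit
    return result


# A ".4" style load into an 8-lane toy register.
print([hex(v) for v in load_bytes_to_byte9([0x1AA] * 8, [0x7F, 0x80, 0xFF, 0x00])])
```

Only the lanes covered by the load are rewritten, which mirrors the "leave the remaining byte9s unchanged" wording of the table.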
Table E.16 shows the vector load operations supported in VEC64 mode (encoded in the TT field as LT), when the VCSR<0> bit is set.
Table E.16: REAR load operations in VEC64 mode
  B:D:LT Mnemonic Significance
    0000     .4 Load 4 bytes from memory into the lower 4 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 4 byte9s is sign-extended from the corresponding eighth bit.
    0001     .8 Load 8 bytes from memory into the lower 8 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 8 byte9s is sign-extended from the corresponding eighth bit.
    0010     .16 Load 16 bytes from memory into the lower 16 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 16 byte9s is sign-extended from the corresponding eighth bit.
    0011     .32 Load 32 bytes from memory into the lower 32 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 32 byte9s is sign-extended from the corresponding eighth bit.
    1000 Undefined
    1001 Undefined
    1010 Undefined
    1011     .64 Load 64 bytes from memory into the lower 64 byte9s of the register and leave the remaining byte9s unchanged. The ninth bit of each of the 64 byte9s is sign-extended from the corresponding eighth bit.
Bit B is used to indicate a 64-byte vector operation, because the concept of current and alternate groups does not exist in VEC64 mode.
Table E.17 lists the scalar store operations supported in both VEC32 and VEC64 modes (encoded in the TT field as ST).
Table E.17: REAR scalar store operations
   D:ST Mnemonic Significance
    100     .b Store byte or byte9 size as 8 bits (truncating one bit from a byte9)
    101     .h Store halfword size as 16 bits
    110 Undefined
    111     .w Store word size as 32 bits
Table E.18 lists the vector store operations supported in VEC32 mode (encoded in the TT field as ST), when the VCSR<0> bit is cleared.
Table E.18: REAR vector store operations in VEC32 mode
   D:ST Mnemonic Significance
    000     .4 Store 4 bytes from the register to memory. The ninth bit of each of the 4 byte9s in the register is ignored.
    001     .8 Store 8 bytes from the register to memory. The ninth bit of each of the 8 byte9s in the register is ignored.
    010     .16 Store 16 bytes from the register to memory. The ninth bit of each of the 16 byte9s in the register is ignored.
    011     .32 Store 32 bytes from the register to memory. The ninth bit of each of the 32 byte9s in the register is ignored.
Table E.19 lists the vector store operations supported in VEC64 mode (encoded in the TT field as ST), when the VCSR<0> bit is set.
Table E.19: REAR vector store operations in VEC64 mode
B:D:ST Mnemonic Significance
0000 .4 Store 4 bytes from the register to memory. The ninth bit of each of the 4 byte9s in the register is ignored.
0001 .8 Store 8 bytes from the register to memory. The ninth bit of each of the 8 byte9s in the register is ignored.
0010 .16 Store 16 bytes from the register to memory. The ninth bit of each of the 16 byte9s in the register is ignored.
0011 .32 Store 32 bytes from the register to memory. The ninth bit of each of the 32 byte9s in the register is ignored.
1000 Undefined
1001 Undefined
1010 Undefined
1011 .64 Store 64 bytes from the register to memory. The ninth bit of each of the 64 byte9s in the register is ignored.
Bit B is used to indicate a 64-byte vector operation, because the concept of current and alternate groups does not exist in VEC64 mode.
REAI format is used by the load, store, and cache operation instructions. Table E.20 gives the meaning of each field in the REAI format.
Table E.20: REAI format
Field Significance
OPC<4:0> Opcode
    B Group identifier for register Rn. When set in VEC32 mode, Rn<4:0> denotes a vector register number in the alternate group; when set in VEC64 mode, it denotes a full-vector (64-byte) operation.
    D Destination/source scalar register. When set, Rn<4:0> denotes a scalar register. In VEC32 mode, the legal B:D encodings are: 00 Rn is a vector register in the current group; 01 Rn is a scalar register (in the current group); 10 Rn is a vector register in the alternate group; 11 Undefined.
In VEC64 mode, the legal B:D encodings are: 00 Only 4, 8, 16, or 32 bytes of vector register Rn are used; 01 Rn is a scalar register; 10 All 64 bytes of vector register Rn are used; 11 Undefined.
TT<1:0> Transfer type, indicating the specific load or store operation. See the LT and ST encoding tables given for the REAR format.
C Cache off. Set this bit to bypass the data cache for the load or store. This bit is set by the cache-off mnemonic of the load and store instructions (OFF appended to the mnemonic).
A Address update. Set this bit to update SRb with the effective address. The effective address is computed as SRb + IMM<7:0>.
Rn<4:0> Destination / source register number
SRb<4:0> Scalar base register number
IMM<7:0> An 8-bit immediate offset, interpreted as a two's complement number.
The REAR and REAI formats use the same encodings for the transfer types. See the REAR format for further details on the encodings.
RRRM5 format provides three register operands, or two register operands and a 5-bit immediate operand. Table E.21 defines the fields of the RRRM5 format.
Table E.21: RRRM5 format
Field Significance
OP<4:0> Opcode
D Destination scalar register. When set, Rd<4:0> denotes a scalar register; when cleared, Rd<4:0> denotes a vector register.
S Scalar Rb register. When set, Rb<4:0> denotes a scalar register; when cleared, Rb<4:0> denotes a vector register.
DS<1:0> Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 or floating-point data type)
M Modifier for the D:S bits. See the D:S:M encoding tables below.
Rd<4:0> Destination D register number
Ra<4:0> Source A register number
Rb<4:0> or IM5<4:0> Source B register number or a 5-bit immediate, as determined by the D:S:M encoding. The 5-bit immediate is interpreted as an unsigned number.
Bits 19:15 are reserved and must be zero to ensure compatibility with future extensions.
All vector register operands refer to the current group (which can be either group 0 or group 1) unless otherwise stated. Table E.22 lists the D:S:M encodings when DS<1:0> is 00, 01, or 10.
Table E.22: RRRM5 D:S:M encodings when DS is not equal to 11
 D:S:M     Rd     Ra    Rb/IM5 Note
    000     VRd     VRa     VRb Three vector register operands
    001 Undefined
    010     VRd     VRa     SRb The B operand is a scalar register
    011     VRd     VRa     IM5 The B operand is a 5-bit immediate
    100 Undefined
    101 Undefined
    110     SRd     SRa     SRb Three scalar register operands
    111     SRd     SRa     IM5 The B operand is a 5-bit immediate
When DS<1:0> is 11, the D:S:M encodings have the following meanings:
Table E.23: RRRM5 D:S:M encodings when DS is equal to 11
 D:S:M     Rd     Ra   Rb/IM5 Note
    000     VRd     VRa     VRb Three vector register operands (int32)
    001     VRd     VRa     VRb Three vector register operands (float)
    010     VRd     VRa     SRb The B operand is a scalar register (int32)
    011     VRd     VRa     IM5 The B operand is a 5-bit immediate (int32)
    100     VRd     VRa     SRb The B operand is a scalar register (float)
    101     SRd     SRa     SRb Three scalar register operands (float)
    110     SRd     SRa     SRb Three scalar register operands (int32)
    111     SRd     SRa     IM5 The B operand is a 5-bit immediate (int32)
RRRR format provides four register operands. Table E.24 shows the fields of the RRRR format.
Table E.24: RRRR format
Field Significance
    Op<4:0> Opcode
    S Scalar Rb register. When set, Rb<4:0> denotes a scalar register; when cleared, Rb<4:0> denotes a vector register.
    DS<1:0> Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 data type)
    Rc<4:0> Source/destination C register number
    Rd<4:0> Destination D register number
    Ra<4:0> Source A register number
    Rb<4:0> Source B register number
All vector register operands refer to the current group (which can be either group 0 or group 1) unless otherwise stated.
RI format is used only by the load immediate instruction. Table E.25 specifies the fields of the RI format.
Table E.25: RI format
Field Significance
D Destination scalar register. When set, Rd<4:0> denotes a scalar register; when cleared, Rd<4:0> denotes a vector register in the current group.
F Floating-point data type. When set, denotes the floating-point data type and requires DS<1:0> to be 11.
DS<1:0> Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 or floating-point data type)
Rd<4:0> Destination D register number
IMM<18:0> A 19-bit immediate value
Some encodings of the F:DS<1:0> field are undefined. Programmers should not use these encodings, because the architecture does not specify the expected result when such an encoding is used. The value loaded into Rd depends on the data type, as shown in Table E.26.
Table E.26: RI format load values
Format Data Type Register operand
    .b Byte (8 bits) Rd<7:0> := Imm<7:0>
    .b9 Byte9 (9 bits) Rd<8:0> := Imm<8:0>
    .h Halfword (16 bits) Rd<15:0> := Imm<15:0>
    .w Word (32 bits) Rd<31:0> := sign-extended Imm<18:0>
    .f Floating point (32 bits) Rd<31> := Imm<18> (sign); Rd<30:23> := Imm<17:10> (exponent); Rd<22:13> := Imm<9:0> (mantissa); Rd<12:0> := 0
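As an illustration of the .f row of Table E.26, the sketch below expands a 19-bit immediate into an IEEE 754 single-precision value. It assumes the 8-bit exponent occupies Imm<17:10> and the 10-bit mantissa occupies Imm<9:0>. The function name `expand_ri_float` is hypothetical.

```python
import struct


def expand_ri_float(imm19):
    """Expand a 19-bit RI-format immediate into a Python float.

    Assumed layout: Imm<18> sign, Imm<17:10> exponent, Imm<9:0> mantissa.
    The low 13 mantissa bits of the result, Rd<12:0>, are zero.
    """
    sign = (imm19 >> 18) & 0x1
    exponent = (imm19 >> 10) & 0xFF
    mantissa = imm19 & 0x3FF
    bits = (sign << 31) | (exponent << 23) | (mantissa << 13)
    # Reinterpret the 32-bit pattern as a single-precision float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(expand_ri_float(127 << 10))                        # biased exponent 127
print(expand_ri_float((1 << 18) | (127 << 10) | 0x200))  # sign set, top mantissa bit set
```

Because only 10 mantissa bits are available, only values whose low 13 mantissa bits are zero can be loaded this way.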
CT format contains the fields shown in Table E.27.
Table E.27: CT format
Field Significance
Opc<3:0> Opcode
Cond<2:0> Branch condition: 000 unconditional; 001 less than; 010 equal; 011 less than or equal; 100 greater than; 101 not equal; 110 greater than or equal; 111 overflow
IMM<22:0> A 23-bit immediate offset, interpreted as a two's complement number.
The branch conditions use the VCSR[GT:EQ:LT] field. The overflow condition uses the VCSR[SO] bit, which, when set, takes precedence over the GT, EQ, and LT bits. VCCS and VCBARR interpret the Cond<2:0> field differently from the description above; refer to their instruction descriptions for details.
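One way to read the Cond<2:0> encodings is as a mask over the GT, EQ, and LT flags: bit 2 selects GT, bit 1 selects EQ, and bit 0 selects LT, with 000 and 111 as special cases. The sketch below follows that reading. The mask interpretation and the handling of the SO precedence (comparison conditions are treated as false when SO is set) are assumptions made for illustration, and `condition_true` is a hypothetical name.

```python
def condition_true(cond, gt, eq, lt, so):
    """Evaluate a 3-bit Cond code against VCSR flag bits (each 0 or 1)."""
    if cond == 0b000:      # un: unconditional
        return True
    if cond == 0b111:      # ov: overflow, taken from VCSR[SO]
        return so == 1
    if so == 1:            # assumption: SO overrides GT, EQ, and LT
        return False
    flags = (gt << 2) | (eq << 1) | lt
    return (cond & flags) != 0


print(condition_true(0b011, gt=0, eq=1, lt=0, so=0))  # le, with EQ set
print(condition_true(0b101, gt=0, eq=1, lt=0, so=0))  # ne, with EQ set
```

Under this reading, le (011) is true when either EQ or LT is set, and ne (101) is true when either GT or LT is set, which matches the mnemonic order un, lt, eq, le, gt, ne, ge, ov.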
RRRM9 format specifies three register operands, or two register operands and a 9-bit immediate operand. Table E.28 gives the fields of the RRRM9 format.
Table E.28: RRRM9 format
Field Significance
Opc<5:0> Opcode
D Destination scalar register. When set, Rd<4:0> denotes a scalar register; when cleared, Rd<4:0> denotes a vector register.
S Scalar Rb register. When set, Rb<4:0> denotes a scalar register; when cleared, Rb<4:0> denotes a vector register.
DS<1:0> Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 or floating-point data type)
M Modifier for the D:S bits. See the D:S:M encoding tables below.
Rd<4:0> Destination register number
Ra<4:0> Source A register number
Rb<4:0> or IM5<4:0> Source B register number or a 5-bit immediate, as determined by the D:S:M encoding.
IM9<3:0> Together with IM5<4:0>, provides a 9-bit immediate, as determined by the D:S:M encoding.
Bits 19:15 are reserved when the D:S:M encoding does not specify an immediate operand, and must be zero to ensure future compatibility.
All vector register operands refer to the current group (which can be either group 0 or group 1) unless otherwise stated. The D:S:M encodings are the same as those shown in Tables E.22 and E.23 for the RRRM5 format, except that the immediate value is extracted from the immediate fields according to the DS<1:0> encoding, as shown in Table E.29.
Table E.29: RRRM9 format immediate values
DS Matching data type B operand
00     int8 Source B<7:0> := IM9<2:0>:IM5<4:0>
01     int9 Source B<8:0> := IM9<3:0>:IM5<4:0>
10     int16 Source B<15:0> := sex(IM9<3:0>:IM5<4:0>)
11     int32 Source B<31:0> := sex(IM9<3:0>:IM5<4:0>)
The immediate format is not available for the floating-point data type.
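The immediate extraction of Table E.29 can be sketched as follows, taking sex() to mean sign extension. The helper names `sign_extend` and `rrrm9_source_b` are hypothetical, and the returned value is a plain Python integer rather than a hardware register.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def rrrm9_source_b(ds, im9, im5):
    """Build the B operand from the IM9<3:0> and IM5<4:0> fields."""
    combined = ((im9 & 0xF) << 5) | (im5 & 0x1F)  # IM9<3:0>:IM5<4:0>
    if ds == 0b00:                  # int8: IM9<2:0>:IM5<4:0>, 8 bits
        return combined & 0xFF
    if ds == 0b01:                  # int9: all 9 bits, no extension
        return combined & 0x1FF
    return sign_extend(combined, 9)  # int16 and int32: sign-extended


print(rrrm9_source_b(0b00, 0xF, 0x1F))  # 8-bit pattern
print(rrrm9_source_b(0b10, 0xF, 0x1F))  # sign-extended to a negative value
```

For the int8 case the top bit of IM9 is dropped, so the same field contents yield different B operands depending on DS<1:0>.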
The MSP vector instructions are described below in alphabetical order. Notes:
1. Unless otherwise stated, instructions are affected by the element mask. CT format instructions are not affected by the element mask. REAR and REAI format instructions, which comprise the load, store, and cache instructions, are also not affected by the element mask.
2. The 9-bit immediate operand is not available for the floating-point data type.
3. The operation descriptions are given only in vector form. For scalar operations, assume that only one element, element 0, is defined.
4. For the RRRM5 and RRRM9 formats, the following encodings apply to the integer data types (b, b9, h, w):
D:S:M   000   010   011   110   111
    DS   00   01   10   11
5. For the RRRM5 and RRRM9 formats, the following encodings apply to the floating-point data type:
D:S:M  001  100   n/a   101   n/a
    DS   11
6. For all instructions that may cause an overflow, saturation to the int8, int9, int16, or int32 maximum or minimum limit is applied when the VCSR<ISAT> bit is set. Correspondingly, when the VCSR<FSAT> bit is set, floating-point results saturate to -infinity, -0, +0, or +infinity.
7. By syntactic convention, .n can be used in place of .b9 to denote the byte9 data size.
8. For all instructions, the floating-point result returned to the destination register or to the vector accumulator is in IEEE 754 single-precision format. The floating-point result is written to the lower portion of the accumulator, and the upper portion is not modified.
VAAS3 Add and Add Sign of (-1, 0, 1)
Format
Assembler syntax
VAAS3.dt VRd,VRa,VRb
VAAS3.dt VRd,VRa,SRb
VAAS3.dt SRd,SRa,SRb
Where dt = {b, b9, h, w}.
Supported modes
D:S:M     V<-V@V     V<-V@S     S<-S@S
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector/scalar register Ra are added to Rb to produce an intermediate result, to which the sign of Ra is then added; the final result is stored in vector/scalar register Rd.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    if(Ra[i]>0)       extsgn3=1;
    else if(Ra[i]<0)  extsgn3=-1;
    else               extsgn3=0;
    Rd[i]=Ra[i]+Rb[i]+extsgn3;
}
Abnormal
Overflow.
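The VAAS3 operation above can be mirrored in a few lines. This is a simplified sketch: it works on unbounded Python integers, so overflow, saturation, and the element mask are deliberately left out, and the names `sign3` and `vaas3` are hypothetical.

```python
def sign3(x):
    """Return -1, 0, or 1 according to the sign of x (extsgn3 in the pseudocode)."""
    return (x > 0) - (x < 0)


def vaas3(ra, rb):
    """Element-wise Ra + Rb + sign(Ra), without overflow handling."""
    return [a + b + sign3(a) for a, b in zip(ra, rb)]


print(vaas3([5, -5, 0], [1, 1, 1]))
```

The VASS3 instruction described later differs only in subtracting the sign term instead of adding it.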
VADAC Add and Accumulate
Format
Figure C9711740500791
Assembler syntax
VADAC.dt VRc,VRd,VRa,VRb
VADAC.dt SRc,SRd,SRa,SRb
Where dt = {b, b9, h, w}.
Supported modes
  S   VR     SR
  DS int8(b)   int9(b9)   int16(h) int32(w)
Explanation
Each element of the Ra and Rb operands is added to the corresponding double-precision element of the vector accumulator, and the double-precision sum of each element is stored into both the vector accumulator and the destination registers Rc and Rd. Ra and Rb use the specified data type, while VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH and Rc. If Rc = Rd, the result in Rc is undefined.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Aop[i]={VRa[i]‖SRa};
    Bop[i]={VRb[i]‖SRb};
    VACH[i]:VACL[i]=sex(Aop[i]+Bop[i])+VACH[i]:VACL[i];
    Rc[i]=VACH[i];
    Rd[i]=VACL[i];
}
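For a single element, the double-precision accumulation above can be sketched as follows. The sketch assumes the accumulator wraps modulo 2^(2*width) and ignores the element mask. The function name `vadac_element` and the `width` parameter are hypothetical.

```python
def vadac_element(a, b, vach, vacl, width=16):
    """Add signed a and b to the VACH:VACL pair; return (new VACH, new VACL).

    Assumption: wrap-around arithmetic on a 2*width-bit accumulator.
    The returned pair is what Rc (high part) and Rd (low part) would receive.
    """
    mask = (1 << width) - 1
    acc = ((vach & mask) << width) | (vacl & mask)      # VACH:VACL as one value
    acc = (acc + a + b) & ((1 << (2 * width)) - 1)      # two's complement wrap
    return (acc >> width) & mask, acc & mask


print(vadac_element(0x7FFF, 1, vach=0, vacl=0xFFFF))  # carry into the high half
print(vadac_element(-1, 0, vach=0, vacl=0))           # negative sum fills both halves
```

The first call shows why the high half is needed: the low half alone would lose the carry out of bit 15.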
VADACL Add and Accumulate Low
Format
Assembler syntax
VADACL.dt  VRd,VRa,VRb
VADACL.dt  VRd,VRa,SRb
VADACL.dt  VRd,VRa,#IMM
VADACL.dt  SRd,SRa,SRb
VADACL.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Each element of the Ra and Rb/immediate operands is added to the corresponding extended-precision element of the vector accumulator; the extended-precision sum is stored in the vector accumulator, and the lower-precision part is returned to the destination register Rd. Ra and Rb/immediate use the specified data type, while VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper portion of each extended-precision element is stored in VACH.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    VACH[i]:VACL[i]=sex(Ra[i]+Bop[i])+VACH[i]:VACL[i];
    Rd[i]=VACL[i];   
}
VADD Add
Format
Figure C9711740500811
Assembler syntax
VADD.dt VRd,VRa,VRb
VADD.dt VRd,VRa,SRb
VADD.dt VRd,VRa,#IMM
VADD.dt SRd,SRa,SRb
VADD.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M     V<-V@V     V<-V@S     V<-V@I   S<-S@S     S<-S@I
  DS     int8(b)     int9(b9)     int16(h)   int32(w)     float(f)
Explanation
Adds the Ra and Rb/immediate operands and returns the sum to the destination register Rd.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]=Ra[i]+Bop[i];
}
Abnormal
Overflow, floating-point invalid operand.
VADDH Add Two Adjacent Elements
Format
Figure C9711740500821
Assembler syntax
VADDH.dt  VRd,VRa,VRb
VADDH.dt  VRd,VRa,SRb
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M     V<-V@V     V<-V@S
 DS     int8(b)     int9(b9)     int16(h)     int32(w)     float(f)
Explanation
Figure C9711740500822
Operating
for(i=0;i<NumElem-1;i++){
    Rd[i]=Ra[i]+Ra[i+1];
}
Rd[NumElem-1]=Ra[NumElem-1]+{VRb[0]‖SRb};
Abnormal
Overflow, floating-point invalid operand.
Programming notes
This instruction is not affected by the element mask.
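The adjacent-element addition of VADDH can be sketched as below. The sketch ignores overflow and data-type widths. The argument `b_first` stands for VRb[0] or SRb, and both names are hypothetical.

```python
def vaddh(ra, b_first):
    """Add each element of ra to its right-hand neighbour.

    The last element has no neighbour in ra, so it is paired with
    b_first, which stands for VRb[0] or SRb.
    """
    n = len(ra)
    rd = [ra[i] + ra[i + 1] for i in range(n - 1)]
    rd.append(ra[n - 1] + b_first)
    return rd


print(vaddh([1, 2, 3, 4], 10))
```

Chaining the last element into the B operand lets a sequence of VADDH instructions treat several registers as one longer vector.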
VAND AND
Format
Figure C9711740500831
Assembler syntax
VAND.dt VRd,VRa,VRb
VAND.dt VRd,VRa,SRb
VAND.dt VRd,VRa,#IMM
VAND.dt SRd,SRa,SRb
VAND.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M  V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Performs a logical AND of the Ra and Rb/immediate operands and returns the result to the destination register Rd.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=Ra[i]<k>&Bop[i]<k>, k=for all bits in element i;
}
Abnormal
None.
VANDC AND Complement
Format
Assembler syntax
VANDC.dt  VRd,VRa,VRb
VANDC.dt  VRd,VRa,SRb
VANDC.dt  VRd,VRa,#IMM
VANDC.dt  SRd,SRa,SRb
VANDC.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Performs a logical AND of Ra with the complement of the Rb/immediate operand and returns the result to the destination register Rd.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=Ra[i]<k>&~Bop[i]<k>, k=for all bits in element i;
}
Abnormal
None.
VASA Arithmetic Shift Accumulator
Format
Assembler syntax
VASAL.dt
VASAR.dt
Where dt = {b, b9, h, w}, and R denotes the shift direction, left or right.
Supported modes
    R     left     right
    DS     int8(b)     int9(b9)     int16(h)   int32(w)
Explanation
Each data element of the vector accumulator register is shifted left by one bit position with zero fill from the right (if R = 0), or shifted right by one bit position with sign extension (if R = 1). The result is stored in the vector accumulator.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    if(R=1)
        VACOH[i]:VACOL[i]=VACOH[i]:VACOL[i] sign>>1;
    else
        VACOH[i]:VACOL[i]=VACOH[i]:VACOL[i]<<1;
}
Abnormal
Overflow.
VASL Arithmetic Shift Left
Format
Assembler syntax
VASL.dt  VRd,VRa,SRb
VASL.dt  VRd,VRa,#IMM
VASL.dt  SRd,SRa,SRb
VASL.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M     V<-V@S     V<-V@I     S<-S@S   S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each data element of vector/scalar register Ra is shifted left, with zero fill from the right, by the shift amount given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd. For elements that overflow, the result is saturated to the most positive or the most negative value, depending on its sign. The shift amount is defined to be an unsigned integer.
Operating
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
    Rd[i]=saturate(Ra[i]<<shift_amount);
}
Abnormal
None.
Programming notes
Note that the shift amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for correctly specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is larger than the specified data size, the elements are filled with zeros.
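A per-element sketch of the saturating left shift follows. It models elements as signed Python integers and returns zero when the shift amount exceeds the element width, following the programming note above. The function name and the `bits` parameter are hypothetical, and the exact hardware behaviour at the boundary cases is an assumption.

```python
def saturating_shift_left(value, shift, bits):
    """Shift a signed `bits`-wide value left, saturating on overflow."""
    shift %= 32                        # shift amount is a 5-bit number
    if shift > bits:                   # per the note: element is zero-filled
        return 0
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    shifted = value << shift           # exact result on unbounded integers
    return max(lowest, min(highest, shifted))


print(saturating_shift_left(100, 1, 8))    # positive overflow saturates
print(saturating_shift_left(-100, 1, 8))   # negative overflow saturates
print(saturating_shift_left(3, 2, 8))      # fits, no saturation
```

Computing the exact product first and clamping afterwards keeps the sketch short; hardware would detect the overflow from the bits shifted out instead.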
VASR Arithmetic Shift Right
Format
Assembler syntax
VASR.dt  VRd,VRa,SRb
VASR.dt  VRd,VRa,#IMM
VASR.dt  SRd,SRa,SRb
VASR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M     V<-V@S     V<-V@I     S<-S@S   S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each data element of vector/scalar register Ra is arithmetically shifted right, with sign extension at the most significant bit positions, by the shift amount given in the least significant bits of scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd. The shift amount is defined to be an unsigned integer.
Operating
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
    Rd[i]=Ra[i]sign>>shift_amount;
}
Abnormal
None.
Programming notes
Note that the shift amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for correctly specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is larger than the specified data size, the elements are filled with the sign bit.
VASS3 Add and Subtract Sign of (-1, 0, 1)
Format
Assembler syntax
VASS3.dt  VRd,VRa,VRb
VASS3.dt  VRd,VRa,SRb
VASS3.dt  SRd,SRa,SRb
Where dt = {b, b9, h, w}.
Supported modes
 D:S:M     V<-V@V     V<-V@S     S<-S@S
 DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector/scalar register Ra are added to Rb to produce an intermediate result, from which the sign of Ra is then subtracted; the final result is stored in vector/scalar register Rd.
Operating
for(i=0;i<NumElem  && EMASK[i];i++){
    if(Ra[i]>0)      extsgn3=1;
    else if(Ra[i]<0) extsgn3=-1;
    else              extsgn3=0;
    Rd[i]=Ra[i]+Rb[i]-extsgn3;
}
Abnormal
Overflow.
VASUB Absolute Value of Subtraction
Format
Figure C9711740500891
Assembler syntax
VASUB.dt VRd,VRa,VRb
VASUB.dt VRd,VRa,SRb
VASUB.dt VRd,VRa,#IMM
VASUB.dt SRd,SRa,SRb
VASUB.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I    S<-S@S   S<-S@I
 DS   int8(b)   int9(b9)   int16(h)   int32(w)   float(f)
Explanation
The contents of vector/scalar register Rb or the IMM field are subtracted from the contents of vector/scalar register Ra, and the absolute value of the result is stored in vector/scalar register Rd.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]=|Ra[i]-Bop[i]|;
}
Abnormal
Overflow, floating-point invalid operand.
Programming notes
If the result of the subtraction is the most negative number, an overflow occurs after the absolute value operation. If saturation mode is enabled, the result of the absolute value operation is the most positive number.
VAVG Average Two Elements
Format
Figure C9711740500901
Assembler syntax
VAVG.dt  VRd,VRa,VRb
VAVG.dt  VRd,VRa,SRb
VAVG.dt  SRd,SRa,SRb
Where dt = {b, b9, h, w, f}. Use VAVGT to specify the "truncate" rounding mode for integer data types.
Supported modes
D:S:M   V<-V@V   V<-V@S   S<-S@S
  DS   int8(b)   int9(b9)   int16(h)   int32(w)   float(f)
Explanation
The contents of vector/scalar register Ra are added to the contents of vector/scalar register Rb to produce an intermediate result; the intermediate result is then divided by 2, and the final result is stored in vector/scalar register Rd. For integer data types, the rounding mode is truncate if T = 1 and round to zero if T = 0 (default). For the floating-point data type, the rounding mode is specified by VCSR<RMODE>.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]=(Ra[i]+Bop[i])//2;
}
Abnormal
None.
VAVGH Average Two Adjacent Elements
Format
Figure C9711740500911
Assembler syntax
VAVGH.dt  VRd,VRa,VRb
VAVGH.dt  VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Use VAVGHT to specify the "truncate" rounding mode for integer data types.
Supported modes
D:S:M   V<-V@V     V<-V@S
  DS   int8(b)     int9(b9)     int16(h)     int32(w)     float(f)
Explanation
For each element, averages two adjacent elements. For integer data types, the rounding mode is truncate if T = 1 and round to zero if T = 0 (default). For the floating-point data type, the rounding mode is specified by VCSR<RMODE>.
Figure C9711740500912
Operating
for(i=0;i<NumElem-1;i++){
    Rd[i]=(Ra[i]+Ra[i+1])//2;
}
Rd[NumElem-1]=(Ra[NumElem-1]+{VRb[0]‖SRb})//2;
Abnormal
None.
Programming notes
This instruction is not affected by the element mask.
VAVGQ Average of Four Elements
Format
Figure C9711740500921
Assembler syntax
VAVGQ.dt VRd,VRa,VRb 
Where dt = {b, b9, h, w}. Use VAVGQT to specify the "truncate" rounding mode for integer data types.
Supported modes
D:S:M   V<-V@V
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
This instruction is not supported in VEC64 mode.
Computes the average of four elements, as shown below, using the rounding mode specified by T (1 for truncate, 0 for round to zero, which is the default). Note that the leftmost element (Dn-1) is undefined.
Operating
for(i=0;i<NumElem-1;i++){
    Rd[i]=(Ra[i]+Rb[i]+Ra[i+1]+Rb[i+1])//4;
}
Abnormal
None.
VCACHE Cache Operation
Format
Assembler syntax
VCACHE.fc  SRb,SRi
VCACHE.fc  SRb,#IMM
VCACHE.fc  SRb+,SRi
VCACHE.fc  SRb+,#IMM
Where fc = {0,1}.
Explanation
This instruction provides software management of the vector data cache. When part or all of the data cache is configured as scratch pad memory, this instruction has no effect on the scratch pad memory.
The following options are supported:
FC<2:0> Significance
    000 Write back and invalidate the dirty cache line whose tag matches the EA. If the matching line contains data that is not dirty, the line is invalidated without a write-back. If no cache line is found to contain the EA, the data cache is left untouched.
    001 Write back and invalidate the dirty cache line specified by the index of the EA. If the matching line contains data that is not dirty, the line is invalidated without a write-back.
Others Undefined
Operating
Abnormal
None.
Programming notes
This instruction is not affected by the element mask.
VCAND Complement AND
Format
Figure C9711740500941
Assembler syntax
VCAND.dt  VRd,VRa,VRb
VCAND.dt  VRd,VRa,SRb
VCAND.dt  VRd,VRa,#IMM
VCAND.dt  SRd,SRa,SRb
VCAND.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Performs a logical AND of the complement of Ra with the Rb/immediate operand and returns the result to the destination register Rd.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=~Ra[i]<k>&Bop[i]<k>, k=for all bits in element i;
}
Abnormal
None.
VCBARR Conditional Barrier
Format
Figure C9711740500951
Assembler syntax
VCBARR.cond
Where cond = {0-7}. Mnemonics will be assigned to each condition later.
Explanation
Stalls this instruction and all later instructions (those that appear later in program order) for as long as the condition holds. The Cond<2:0> field is interpreted differently from the other conditional instructions in the CT format.
The following conditions are currently defined:
Cond<2:0> Significance
    000 Wait for all previous instructions (those that appear earlier in program order) to finish executing before executing any later instruction.
Others Undefined
Operating
while(Cond=true)
    stall all later instructions;
Abnormal
None.
Programming notes
This instruction is provided so that software can force serialized execution of instructions. It can be used to force precise reporting of an imprecise exception event. For example, if this instruction is used immediately after an arithmetic instruction that can cause an exception event, the event is reported with the program counter addressing this instruction.
VCBR Conditional Branch
Format
Assembler syntax
VCBR.cond  #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, the branch is taken. This is not a delayed branch.
Operating
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
    VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Abnormal
Invalid instruction address.
VCBRI Conditional Branch Indirect
Format
Figure C9711740500971
Assembler syntax
VCBRI.cond SRb
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, the indirect branch is taken. This is not a delayed branch.
Operating
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
    VPC=SRb<31:2>:b'00;
else VPC=VPC+4;
Abnormal
Invalid instruction address.
VCCS Conditional Context Switch
Format
Figure C9711740500981
Assembler syntax
VCCS #Offset
Explanation
If VIMSK<cse> is true, jumps to the context switch subroutine. This is not a delayed branch.
If VIMSK<cse> is true, VPC + 4 (the return address) is saved onto the return address stack. If it is not true, execution continues from VPC + 4.
Operating
If(VIMSK<cse>=1){
    if(VSP<4:0> > 15){
        VISRC<RASO>=1;
        signal ARM7 with RASO exception;
        VP_STATE=VP_IDLE;
    }else{
        RSTACK[VSP<3:0>]=VPC+4;
        VSP<4:0>=VSP<4:0>+1;
        VPC=VPC+sex(Offset<22:0>*4);
    }
}else VPC=VPC+4;
Abnormal
Return address stack overflow.
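The return-address handling shared by the subroutine-style branches can be sketched as a small class. This is an illustrative model only: the 16-entry depth follows from the VSP > 15 check and the RSTACK[VSP<3:0>] indexing, the 32-bit wrap of VPC is an assumption, and the names `ReturnAddressStack` and `call` are hypothetical. The RASO exception is modeled as a Python `OverflowError`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


class ReturnAddressStack:
    """Toy model of RSTACK and VSP as used by the operation above."""

    def __init__(self):
        self.rstack = [0] * 16
        self.vsp = 0

    def call(self, vpc, offset):
        """Push VPC + 4 and return the branch target VPC + sex(Offset) * 4."""
        if self.vsp > 15:
            raise OverflowError("return address stack overflow (RASO)")
        self.rstack[self.vsp & 0xF] = vpc + 4
        self.vsp += 1
        return (vpc + sign_extend(offset, 23) * 4) & 0xFFFFFFFF


stack = ReturnAddressStack()
print(hex(stack.call(0x100, 0x7FFFFF)))  # offset of -1 word
print(hex(stack.rstack[0]))              # saved return address
```

In this model the seventeenth nested call raises the overflow condition, since the sixteen entries are then all in use.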
VCHGCR Change Control Register
Format
Assembler syntax
VCHGCR Mode
Explanation
This instruction changes the operating mode of the vector processor.
The bits in Mode specify the following:
Mode Significance
  bits 1:0 These two bits control the VCSR<CBANK> bit. The encodings specify: 00 - no change; 01 - clear the VCSR<CBANK> bit; 10 - set the VCSR<CBANK> bit; 11 - toggle the VCSR<CBANK> bit
  bits 3:2 These two bits control the VCSR<SMM> bit. The encodings specify: 00 - no change; 01 - clear the VCSR<SMM> bit; 10 - set the VCSR<SMM> bit; 11 - toggle the VCSR<SMM> bit
  bits 5:4 These two bits control the VCSR<CEM> bit. The encodings specify: 00 - no change; 01 - clear the VCSR<CEM> bit; 10 - set the VCSR<CEM> bit; 11 - toggle the VCSR<CEM> bit
Others Undefined
Operating
Abnormal
None.
Programming notes
This instruction gives the hardware a more efficient way to change the control bits in VCSR than a VMOV instruction.
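The two-bit control codes of the Mode operand can be decoded as in the following sketch. The three VCSR bits are passed and returned as plain integers. The function names are hypothetical, and the undefined upper Mode bits are simply ignored here.

```python
def apply_control(current_bit, code):
    """Apply a 2-bit code: 00 keep, 01 clear, 10 set, 11 toggle."""
    if code == 0b00:
        return current_bit
    if code == 0b01:
        return 0
    if code == 0b10:
        return 1
    return current_bit ^ 1


def vchgcr(mode, cbank, smm, cem):
    """Return the new (CBANK, SMM, CEM) bits for a given Mode value."""
    return (
        apply_control(cbank, mode & 0b11),         # Mode bits 1:0
        apply_control(smm, (mode >> 2) & 0b11),    # Mode bits 3:2
        apply_control(cem, (mode >> 4) & 0b11),    # Mode bits 5:4
    )


# Clear CBANK, set SMM, toggle CEM in one instruction.
print(vchgcr(0b111001, cbank=1, smm=0, cem=0))
```

Encoding each control bit with its own two-bit code lets a single instruction update any combination of the three bits without a read-modify-write sequence.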
VCINT Conditional Interrupt ARM7
Format
Assembler syntax
VCINT.cond #ICODE
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, halts execution and, if enabled, interrupts the ARM7.
Operating
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
    VISRC<vip>=1;
    VIINS=[VCINT.cond #ICODE instruction];
    VEPC=VPC;
    if(VIMSK<vie>=1)signal ARM7 interrupt;
    VP_STATE=VP_IDLE;
}
else VPC=VPC+4;
Abnormal
VCINT interrupt.
VCJOIN Conditional Join With ARM7 Task
Format
Assembler syntax
VCJOIN.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, halts execution and, if enabled, interrupts the ARM7.
Operating
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
    VISRC<vjp>=1;
    VIINS=[VCJOIN.cond #Offset instruction];
    VEPC=VPC;
    if(VIMSK<vje>=1)signal ARM7 interrupt;
    VP_STATE=VP_IDLE;
}
else VPC=VPC+4;
Abnormal
VCJOIN interrupted.
VCJSR conditional jump to subroutine
Format
Assembler syntax
VCJSR.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, jump to the subroutine. This is not a delayed branch.
If Cond is true, VPC+4 (the return address) is saved to the return address stack. If not, execution continues from VPC+4.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
    if(VSP<4:0> > 15){
        VISRC<RASO>=1;
        signal ARM7 with RASO exception;
        VP_STATE=VP_IDLE;
    }else{
        RSTACK[VSP<3:0>]=VPC+4;
        VSP<4:0>=VSP<4:0>+1;
        VPC=VPC+sex(Offset<22:0>*4);
    }
}else VPC=VPC+4;
Exception
Return address stack overflow.
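The push onto the return address stack can be modeled as below. This is a rough Python sketch of the VCJSR pseudocode only; the class and function names, the use of a Python exception in place of the RASO signal to the ARM7, and the 32-bit wrap of VPC are assumptions made for illustration.

```python
class ReturnStackOverflow(Exception):
    """Stands in for the RASO exception signalled to the ARM7."""


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


class VectorProcessor:
    STACK_DEPTH = 16  # entries indexed by VSP<3:0>

    def __init__(self, vpc=0):
        self.vpc = vpc
        self.vsp = 0
        self.rstack = [0] * self.STACK_DEPTH

    def vcjsr(self, cond_true, offset):
        """Conditional jump to subroutine with a 23-bit word offset."""
        if not cond_true:
            self.vpc = (self.vpc + 4) & 0xFFFFFFFF
            return
        if self.vsp > 15:
            raise ReturnStackOverflow()
        self.rstack[self.vsp & 0xF] = (self.vpc + 4) & 0xFFFFFFFF
        self.vsp += 1
        # Offset counts instructions, so it is scaled by 4 bytes.
        self.vpc = (self.vpc + sign_extend(offset, 23) * 4) & 0xFFFFFFFF
```

With a 16-entry stack, the seventeenth nested call without an intervening return raises the overflow condition in this model.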
VCJSRI indirect conditional jump to subroutine
Format
Figure C9711740501031
Assembler syntax
VCJSRI.cond  SRb
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, jump indirectly to the subroutine. This is not a delayed branch.
If Cond is true, VPC+4 (the return address) is saved to the return address stack. If not, execution continues from VPC+4.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
    if(VSP<4:0> > 15){
        VISRC<RASO>=1;
        signal ARM7 with RASO exception;
        VP_STATE=VP_IDLE;
    }else{
        RSTACK[VSP<3:0>]=VPC+4;
        VSP<4:0>=VSP<4:0>+1;
        VPC=SRb<31:2>:b′00;
    }
}else VPC=VPC+4;
Exception
Return address stack overflow.
VCMOV conditional move
Format
Figure C9711740501041
Assembler syntax
VCMOV.dt  Rd,Rb,cond
VCMOV.dt  Rd,#IMM,cond
Where dt = {b, b9, h, w, f}, cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type does not support the 9-bit immediate operand.
Supported modes
D:S:M     V<-V     V<-S     V<-I     S<-S     S<-I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:
VR   vector register in the current bank
SR   scalar register
SY   synchronization register
VAC  vector accumulator register (see the VMOV description for the VAC register encoding)
    D:S:M  ID<1:0>=00  ID<1:0>=01  ID<1:0>=10  ID<1:0>=11
    V<-V   VR<-VR  VR<-VAC  VAC<-VR
    V<-S   VR<-SR  VAC<-SR
    V<-I   VR<-I
    S<-S   SR<-SR
    S<-I   SR<-I
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
    for(i=0;i<NumElem;i++)
        Rd[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Exception
None.
Programming notes
This instruction is not affected by the element mask; VCMOVM is affected by the element mask.
For the eight elements, the extended floating-point precision representation in the vector accumulator uses all 576 bits. Therefore, a vector register move involving the accumulator must specify the .b9 data size.
VCMOVM conditional move with element mask
Format
Figure C9711740501061
Assembler syntax
VCMOVM.dt  Rd,Rb,cond
VCMOVM.dt  Rd,#IMM,cond
Where dt = {b, b9, h, w, f}, cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type does not support the 9-bit immediate operand.
Supported modes
D:S:M     V<-V     V<-S     V<-I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:
VR   vector register in the current bank
SR   scalar register
VAC  vector accumulator register (see the VMOV description for the VAC register encoding)
    D:S:M   ID<1:0>=00   ID<1:0>=01   ID<1:0>=10  ID<1:0>=11
    V<-V   VR<-VR   VR<-VAC   VAC<-VR
    V<-S   VR<-SR   VAC<-SR
    V<-I   VR<-I
    S<-S
    S<-I
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
    for(i=0;i<NumElem && MMASK[i];i++)
        Rd[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Exception
None.
Programming notes
This instruction is affected by the VMMR element mask; VCMOV is not affected by the element mask.
For the eight elements, the extended floating-point precision representation in the vector accumulator uses all 576 bits. Therefore, a vector register move involving the accumulator must specify the .b9 data size.
VCMPV compare and set mask
Format
Assembler syntax
VCMPV.dt  VRa,VRb,cond,mask
VCMPV.dt  VRa,SRb,cond,mask
Where dt = {b, b9, h, w, f}, cond = {lt, eq, le, gt, ne, ge}, mask = {VGMR, VMMR}. If no mask is specified, VGMR is assumed.
Supported modes
D:S:M   M<-V@V   M<-V@S
  DS   int8(b)   int9(b9)   int16(h)   int32(w)   float(f)
Explanation
The contents of vector registers VRa and VRb are compared element by element by performing a subtraction (VRa[i]-VRb[i]). If the result of the comparison matches the Cond field of the VCMPV instruction, bit #i of the VGMR register (if K=0) or the VMMR register (if K=1) is set. For example, if the Cond field is less than (LT), the VGMR[i] (or VMMR[i]) bit is set when VRa[i]<VRb[i].
Operation
        for(i=0;i<NumElem;i++){
            Bop[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
            relationship[i]=Ra[i]?Bop[i];
            if(K=1)
                MMASK[i]=(relationship[i]=Cond)?True:False;
            else
                EMASK[i]=(relationship[i]=Cond)?True:False;
        }
Exception
None.
Programming notes
This instruction is not affected by the element mask.
VCNTLZ count leading zeros
Format
Figure C9711740501091
Assembler syntax
VCNTLZ.dt  VRd,VRb
VCNTLZ.dt  SRd,SRb
Where dt = {b, b9, h, w}.
Supported modes
    S     V<-V     S<-S
    DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Counts the number of leading zeros in each element of Rb; the count is returned in Rd.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            Rd[i]=number of leading zeroes(Rb[i]);
        }
Exception
None.
Programming notes
If all bits in an element are zero, the result equals the element size (8, 9, 16 or 32 for byte, byte9, halfword or word, respectively).
The leading-zero count is inversely related to the element position index (if used after a VCMPR instruction). To convert to an element position, subtract the VCNTLZ result from NumElem for the given data type.
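The per-element count, including the all-zero case, can be illustrated with a short Python sketch. The function name `count_leading_zeros` and the use of plain Python integers to stand for register elements are assumptions made for this example.

```python
def count_leading_zeros(value, width):
    """Count leading zero bits of `value` seen as a `width`-bit element.

    width is 8, 9, 16 or 32 for byte, byte9, halfword or word.
    """
    value &= (1 << width) - 1          # keep only the element's bits
    return width - value.bit_length()  # an all-zero element yields width


def vcntlz(elements, width):
    """Apply the count to every element of a vector (element mask ignored)."""
    return [count_leading_zeros(e, width) for e in elements]
```

Because `int.bit_length()` returns 0 for a zero value, an all-zero element naturally yields the element size.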
VCOR or complement
Format
Figure C9711740501101
Assembler syntax
VCOR.dt  VRd,VRa,VRb
VCOR.dt  VRd,VRa,SRb
VCOR.dt  VRd,VRa,#IMM
VCOR.dt  SRd,SRa,SRb
VCOR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
 D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Logically ORs the complement of Ra with Rb or the immediate operand, and returns the result to destination register Rd.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
            Rd[i]<k>=~Ra[i]<k>|Bop[i]<k>, for k = all bits in element i;
        }
Exception
None.
VCRSR conditional return from subroutine
Format
Figure C9711740501111
Assembler syntax
VCRSR.cond
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, return from the subroutine. This is not a delayed branch.
If Cond is true, execution continues from the return address saved on the return address stack. If not, execution continues from VPC+4.
Operation
        If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
            if(VSP<4:0>=0){
                VISRC<RASU>=1;
                signal ARM7 with RASU exception;
                VP_STATE=VP_IDLE;
            }else{
                VSP<4:0>=VSP<4:0>-1;
                VPC=RSTACK[VSP<3:0>];
                VPC<1:0>=b′00;
            }
        }else VPC=VPC+4;
Exception
Invalid instruction address, return address stack underflow.
VCVTB9 byte9 data type conversion
Format
Figure C9711740501121
Assembler syntax
VCVTB9.md  VRd,VRb
VCVTB9.md  SRd,SRb
Where md = {bb9, b9h, hb9}.
Supported modes
  S     V<-V     S<-S
  MD     bb9     b9h     hb9
Explanation
Converts each element in Rb from byte to byte9 (bb9), from byte9 to halfword (b9h), or from halfword to byte9 (hb9).
Operation
        if(md<1:0>=0){//bb9 for byte to byte9 conversion
            VRd=VRb;
            VRd<9i+8>=VRb<9i+7>,i=0 to 31(or 63 in VEC64 mode)}
        else if(md<1:0>=2){//b9h for byte9 to halfword conversion
            VRd=VRb;
            VRd<18i+16:18i+9>=VRb<18i+8>,i=0 to 15(or 31 in VEC64 mode)}
        else if(md<1:0>=3)//hb9 for halfword to byte9 conversion
            VRd<18i+8>=VRb<18i+9>,i=0 to 15(or 31 in VEC64 mode)
        else VRd=undefined;
Exception
None.
Programming notes
Before using this instruction in b9h mode, the programmer must use a shuffle operation to adjust for the reduced number of elements in the vector register. After using this instruction in hb9 mode, the programmer must use an unshuffle operation to adjust for the increased number of elements in the destination vector register. This instruction is not affected by the element mask.
VCVTFF floating-point to fixed-point conversion
Format
Assembler syntax
VCVTFF  VRd,VRa,SRb
VCVTFF  VRd,VRa,#IMM
VCVTFF  SRd,SRa,SRb
VCVTFF  SRd,SRa,#IMM
Supported modes
  D:S:M   V<-V,S   V<-V,I   S<-S,S   S<-S,I
Explanation
The contents of vector/scalar register Ra are converted from 32-bit floating point to a fixed-point real number in <X,Y> format, where the width of Y is specified by Rb (modulo 32) or the IMM field, and the width of X is (32 - width of Y). X denotes the integer part and Y denotes the fractional part. The result is stored in vector/scalar register Rd.
Operation
        Y_size={SRb%32‖IMM<4:0>};
        for(i=0;i<NumElem;i++){
            Rd[i]=convert to <32-Y_size,Y_size> format(Ra[i]);
        }
Exception
Overflow.
Programming notes
This instruction supports only the word data size. Because the architecture does not support multiple data types within a register, this instruction does not use the element mask. This instruction uses the round-toward-zero rounding mode for the integer data type.
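As a worked illustration of the <X,Y> format, the conversion can be sketched in Python as scaling by 2^Y and truncating toward zero. The function name `float_to_fixed`, the two's-complement 32-bit result encoding, and the use of a Python exception for the overflow case are assumptions of this sketch rather than details given in the text.

```python
def float_to_fixed(x, y_size):
    """Convert x to a 32-bit fixed-point value with y_size fraction bits.

    The integer part occupies the remaining 32 - y_size bits.
    """
    y_size %= 32                      # the width of Y is taken modulo 32
    scaled = int(x * (1 << y_size))   # int() truncates toward zero
    if not -(1 << 31) <= scaled < (1 << 31):
        raise OverflowError("value does not fit in 32 bits")
    return scaled & 0xFFFFFFFF        # two's-complement bit pattern
```

For example, 1.5 in <24,8> format becomes 0x180: the integer part 1 sits above the eight fraction bits and the fraction 0.5 sets the top fraction bit.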
VCVTIF integer to floating point conversion
Format
Assembler syntax
VCVTIF  VRd,VRb
VCVTIF  VRd,SRb
VCVTIF  SRd,SRb
Supported modes
D:S:M   V<-V     V<-S   S<-S
Explanation
The contents of vector/scalar register Rb are converted from the int32 data type to the floating-point data type, and the result is stored in vector/scalar register Rd.
Operation
        for(i=0;i<NumElem;i++){
            Rd[i]=convert to floating point format(Rb[i]);
        }
Exception
None.
Programming notes
This instruction supports only the word data size. Because the architecture does not support multiple data types within a register, this instruction does not use the element mask.
VD1CBR decrement VCR1 and conditional branch
Format
Figure C9711740501151
Assembler syntax
VD1CBR.cond  #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
Decrements VCR1 and branches if Cond is true. This is not a delayed branch.
Operation
VCR1=VCR1-1;
If((VCR1>0)&((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)))
     VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exception
Invalid instruction address.
Programming notes
Note that VCR1 is decremented before the branch condition is checked. Executing this instruction when VCR1 is 0 effectively sets the loop count to 2^32-1.
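The decrement-then-test ordering can be shown with a small Python model. It assumes that VCR1 is a 32-bit unsigned counter that wraps on underflow, which is an interpretation of the note above rather than something the text states explicitly; the function names are hypothetical.

```python
def vd1cbr(vcr1, cond_true):
    """Model one VD1CBR: return (new VCR1, whether the branch is taken)."""
    vcr1 = (vcr1 - 1) & 0xFFFFFFFF       # decrement first, wrapping at 32 bits
    taken = vcr1 > 0 and cond_true       # then test the counter and Cond
    return vcr1, taken


def loop_iterations(initial):
    """Count loop-body executions when VD1CBR.un closes the loop."""
    vcr1, count = initial, 0
    while True:
        count += 1                        # loop body runs once per pass
        vcr1, taken = vd1cbr(vcr1, True)
        if not taken:
            return count
```

In this model a counter loaded with 3 runs the loop body three times, while a counter of 0 wraps to 0xFFFFFFFF on the first decrement and the branch is taken.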
VD2CBR decrement VCR2 and conditional branch
Format
Figure C9711740501161
Assembler syntax
VD2CBR.cond  #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
Decrements VCR2 and branches if Cond is true. This is not a delayed branch.
Operation
VCR2=VCR2-1;
If((VCR2>0)&((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)))
     VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exception
Invalid instruction address.
Programming notes
Note that VCR2 is decremented before the branch condition is checked. Executing this instruction when VCR2 is 0 effectively sets the loop count to 2^32-1.
VD3CBR decrement VCR3 and conditional branch
Format
Assembler syntax
VD3CBR.cond  #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
Decrements VCR3 and branches if Cond is true. This is not a delayed branch.
Operation
VCR3=VCR3-1;
If((VCR3>0)&((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)))
     VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exception
Invalid instruction address.
Programming notes
Note that VCR3 is decremented before the branch condition is checked. Executing this instruction when VCR3 is 0 effectively sets the loop count to 2^32-1.
VDIV2N divide by 2^n
Format
Assembler syntax
VDIV2N.dt  VRd,VRa,SRb
VDIV2N.dt  VRd,VRa,#IMM
VDIV2N.dt  SRd,SRa,SRb
VDIV2N.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M     V<-V@S     V<-V@I     S<-S@S   S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector/scalar register Ra are divided by 2^n, where n is the positive integer content of scalar register Rb or of IMM, and the final result is stored in vector/scalar register Rd. This instruction uses truncation (round toward zero) as the rounding mode.
Operation
        N={SRb%32‖IMM<4:0>};
        for(i=0;i<NumElem && EMASK[i];i++){
            Rd[i]=Ra[i]/2^N;
        }
Exception
None.
Programming notes
Note that N is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for correctly specifying a value of N that is less than or equal to the precision of the data size. If N is greater than the precision of the specified data size, the element is filled with the sign bit. This instruction uses the round-toward-zero rounding mode.
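Round-toward-zero division by a power of two differs from a plain arithmetic right shift for negative operands, as the following Python sketch shows. The function name `div2n` is hypothetical, and the sketch covers only the case where N does not exceed the precision of the data size.

```python
def div2n(value, n):
    """Divide a signed integer by 2**n, truncating toward zero."""
    n &= 0x1F                       # N is a 5-bit quantity
    magnitude = abs(value) >> n     # shift the magnitude, not the signed value
    return -magnitude if value < 0 else magnitude
```

For comparison, Python's `-7 >> 1` yields -4 because an arithmetic shift rounds toward negative infinity, whereas `div2n(-7, 1)` yields -3.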
VDIV2N.F floating-point divide by 2^n
Format
Assembler syntax
VDIV2N.f  VRd,VRa,SRb
VDIV2N.f  VRd,VRa,#IMM
VDIV2N.f  SRd,SRa,SRb
VDIV2N.f  SRd,SRa,#IMM
Supported modes
D:S:M   V<-V@S   V<-V@I   S<-S@S   S<-S@I
Explanation
The contents of vector/scalar register Ra are divided by 2^n, where n is the positive integer content of scalar register Rb or of IMM, and the final result is stored in vector/scalar register Rd.
Operation
        N={SRb%32‖IMM<4:0>};
        for(i=0;i<NumElem && EMASK[i];i++){
            Rd[i]=Ra[i]/2^N;
        }
Exception
None.
Programming notes
Note that N is taken as a 5-bit number from SRb or IMM<4:0>.
VDIVI divide initialize, incomplete
Format
Figure C9711740501201
Assembler syntax
VDIVI.ds  VRb
VDIVI.ds  SRb
Where ds = {b, b9, h, w}.
Supported modes
    S     VRb     SRb
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Performs the initial step of a non-restoring signed integer division. The dividend is a double-precision signed integer in the accumulator. If the dividend is a single-precision number, it must be sign-extended to double precision and stored in VAC0H and VAC0L. The divisor is a single-precision signed integer in Rb.
If the sign of the dividend is the same as the sign of the divisor, Rb is subtracted from the high part of the accumulator. If they differ, Rb is added to the high part of the accumulator.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            Bop[i]={VRb[i]‖SRb};
            if(VAC0H[i]<msb>=Bop[i]<msb>)
                VAC0H[i]=VAC0H[i]-Bop[i];
            else
                VAC0H[i]=VAC0H[i]+Bop[i];
        }
Exception
None.
Programming notes
The programmer is responsible for detecting overflow and divide-by-zero conditions before the divide steps.
VDIVS divide step, incomplete
Format
Figure C9711740501211
Assembler syntax
VDIVS.ds  VRb
VDIVS.ds  SRb
Where ds = {b, b9, h, w}.
Supported modes
    S     VRb     SRb
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Performs one iteration step of a non-restoring signed division. This instruction must be executed as many times as the data size (for example, 8 times for int8, 9 times for int9, 16 times for int16 and 32 times for the int32 data type). The VDIVI instruction must be used once before the divide steps to generate the initial partial remainder in the accumulator. The divisor is a single-precision signed integer in Rb. Each step extracts one quotient bit and shifts it into the least significant bit of the accumulator.
If the sign of the partial remainder in the accumulator is the same as the sign of the divisor in Rb, Rb is subtracted from the high part of the accumulator. If they differ, Rb is added to the high part of the accumulator.
If the sign of the resulting partial remainder in the accumulator (the result of the addition or subtraction) is the same as the sign of the divisor, the quotient bit is 1. If not, the quotient bit is 0. The accumulator is shifted left by one bit position, with the quotient bit filled in.
At the end of the divide steps, the remainder is in the high part of the accumulator and the quotient is in the low part of the accumulator. The quotient is in one's complement form.
Operation
VESL element shift left by one
Format
Assembler syntax
VESL.dt  SRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
  S     SRb
  DS   int8(b)   int9(b9) int16(h) int32(w)
Explanation
The elements in vector register Ra are shifted left by one position, filling from scalar register Rb. The leftmost element that is shifted out is returned to scalar register Rc, and the other elements are returned to vector register Rd.
Figure C9711740501222
Operation
VRd[0]=SRb;
for(i=1;i<=NumElem-1;i++)
    VRd[i]=VRa[i-1];
SRc=VRa[NumElem-1];
Exception
None.
Programming notes
This instruction is not affected by the element mask.
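The element shift can be pictured with Python lists, where index 0 is the element that receives the fill value. This is a minimal sketch; the function name `vesl` and the list representation of a vector register are assumptions made for this example.

```python
def vesl(vra, srb):
    """Shift elements up by one position, filling element 0 from srb.

    Returns (src, vrd): the element shifted out and the new vector.
    """
    src = vra[-1]                 # element NumElem-1 is shifted out
    vrd = [srb] + vra[:-1]        # VRd[0]=SRb, VRd[i]=VRa[i-1]
    return src, vrd
```

Chaining such shifts through the scalar register lets a stream of scalars pass through a vector register one element per instruction.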
VESR element shift right by one
Format
Figure C9711740501231
Assembler syntax
VESR.dt  SRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
   S     SRb
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The elements in vector register Ra are shifted right by one position, filling from scalar register Rb. The rightmost element that is shifted out is returned to scalar register Rc, and the other elements are returned to vector register Rd.
Operation
SRc=VRa[0];
for(i=0;i<=NumElem-2;i++)
    VRd[i]=VRa[i+1];
VRd[NumElem-1]=SRb;
Exception
None.
Programming notes
This instruction is not affected by the element mask.
VEXTRT extract an element
Format
Assembler syntax
VEXTRT.dt  SRd,VRa,SRb
VEXTRT.dt  SRd,VRa,#IMM
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
D:S:M     S<-S     S<-I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Extracts an element from vector register Ra and stores it in scalar register Rd. The index of the element is specified by scalar register Rb or the IMM field.
Operation
index32={SRb%32‖IMM<4:0>};
index64={SRb%64‖IMM<5:0>};
index=(VCSR<vec64>)?index64:index32;
SRd=VRa[index];
Exception
None.
Programming notes
This instruction is not affected by the element mask.
VEXTSGN2 extract sign (1, -1)
Format
Figure C9711740501251
Assembler syntax
VEXTSGN2.dt VRd,VRa
VEXTSGN2.dt SRd,SRa
Where dt = {b, b9, h, w}.
Supported modes
    S     V<-V     S<-S
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Computes the sign value of the contents of vector/scalar register Ra element by element, and stores the result in vector/scalar register Rd.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            Rd[i]=(Ra[i]<0)?-1:1;
        }
Exception
None.
VEXTSGN3 extract sign (1, 0, -1)
Format
Figure C9711740501261
Assembler syntax
VEXTSGN3.dt  VRd,VRa
VEXTSGN3.dt  SRd,SRa
Where dt = {b, b9, h, w}.
Supported modes
  S     V<-V     S<-S
  DS   int8(b)   int9(b9) int16(h)   int32(w)
Explanation
Computes the sign value of the contents of vector/scalar register Ra element by element, and stores the result in vector/scalar register Rd.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            if(Ra[i]>0)        Rd[i]=1;
            else if(Ra[i]<0)   Rd[i]=-1;
            else                Rd[i]=0;
        }
Exception
None.
VINSRT insert an element
Format
Figure C9711740501271
Assembler syntax
VINSRT.dt VRd,SRa,SRb
VINSRT.dt VRd,SRa,#IMM
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
D:S:M     V<-S     V<-I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Inserts the element in scalar register Ra into vector register Rd at the index specified by scalar register Rb or the IMM field.
Operation
index32={SRb%32‖IMM<4:0>};
index64={SRb%64‖IMM<5:0>};
index=(VCSR<vec64>)?index64:index32;
VRd[index]=SRa;
Exception
None.
Programming notes
This instruction is not affected by the element mask.
VL load
Format
Figure C9711740501281
Assembler syntax
VL.lt  Rd,SRb,SRi
VL.lt  Rd,SRb,#IMM
VL.lt  Rd,SRb+,SRi
VL.lt  Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64}, Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLOFF for a cache-off load.
Explanation
Loads a vector register in the current or alternate bank, or a scalar register.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd = see the table below:
LT    Load operation
.b    SRd<7:0>=BYTE[EA]
.bz9  SRd<8:0>=zex BYTE[EA]
.bs9  SRd<8:0>=sex BYTE[EA]
.h    SRd<15:0>=HALF[EA]
.w    SRd<31:0>=WORD[EA]
.4    VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 3
.8    VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 7
.16   VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 15
.32   VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 31
.64   VR0d<9i+8:9i>=sex BYTE[EA+i], i=0 to 31;  VR1d<9i+8:9i>=sex BYTE[EA+32+i], i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
VLCB load from circular buffer
Format
Figure C9711740501301
Assembler syntax
VLCB.lt  Rd,SRb,SRi
VLCB.lt  Rd,SRb,#IMM
VLCB.lt  Rd,SRb+,SRi
VLCB.lt  Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64}, Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLCBOFF for a cache-off load.
Explanation
Loads a vector or scalar register from the circular buffer bounded by the BEGIN pointer in SRb+1 and the END pointer in SRb+2.
Before the load and the address update operation, the effective address is adjusted if it is greater than the END address. In addition, for the .h and .w scalar loads the circular buffer bounds must be aligned on halfword and word boundaries, respectively.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
BEGIN=SRb+1;
END=SRb+2;
cbsize=END-BEGIN;
if(EA>END)EA=BEGIN+(EA-END);
if(A=1)SRb=EA;
Rd = see the table below:
LT    Load operation
.bz9  SRd<8:0>=zex BYTE[EA]
.bs9  SRd<8:0>=sex BYTE[EA]
.h    SRd<15:0>=HALF[EA]
.w    SRd<31:0>=WORD[EA]
.4    VRd<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i], i=0 to 3
.8    VRd<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i], i=0 to 7
.16   VRd<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i], i=0 to 15
.32   VRd<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i], i=0 to 31
.64   VR0d<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i], i=0 to 31;  VR1d<9i+8:9i>=sex BYTE[(EA+32+i>END)?EA+32+i-cbsize:EA+32+i], i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
The programmer must ensure that the following condition holds for this instruction to work as expected:
BEGIN<EA<2*END-BEGIN
That is, EA>BEGIN and EA-END<END-BEGIN.
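The address wrap-around can be traced with a small Python model that follows the pseudocode literally. The function name `vlcb_bytes` and the list standing in for memory are assumptions; sign extension to 9 bits and the register write-back are left out to keep the sketch short.

```python
def vlcb_bytes(memory, ea, begin, end, count):
    """Read `count` bytes starting at ea, wrapping past END as in VLCB.

    Returns (adjusted effective address, list of bytes read).
    """
    cbsize = end - begin
    if ea > end:                      # adjust EA before the load
        ea = begin + (ea - end)
    data = []
    for i in range(count):
        addr = ea + i
        if addr > end:                # per-element wrap inside the buffer
            addr -= cbsize
        data.append(memory[addr])
    return ea, data
```

Read this way, the address END itself is still inside the buffer and the address after it wraps to BEGIN+1, which matches the strict inequality EA>BEGIN in the condition above.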
VLD double load
Format
Figure C9711740501321
Assembler syntax
VLD.lt Rd,SRb,SRi
VLD.lt Rd,SRb,#IMM
VLD.lt Rd,SRb+,SRi
VLD.lt Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64}, Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLDOFF for a cache-off load.
Explanation
Loads two vector registers in the current or alternate bank, or two scalar registers.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd:Rd+1 = see the table below:
LT    Load operation
.bz9  SRd<8:0>=zex BYTE[EA];  SRd+1<8:0>=zex BYTE[EA+1]
.bs9  SRd<8:0>=sex BYTE[EA];  SRd+1<8:0>=sex BYTE[EA+1]
.h    SRd<15:0>=HALF[EA];  SRd+1<15:0>=HALF[EA+2]
.w    SRd<31:0>=WORD[EA];  SRd+1<31:0>=WORD[EA+4]
.4    VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 3;  VRd+1<9i+8:9i>=sex BYTE[EA+4+i], i=0 to 3
.8    VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 7;  VRd+1<9i+8:9i>=sex BYTE[EA+8+i], i=0 to 7
.16   VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 15;  VRd+1<9i+8:9i>=sex BYTE[EA+16+i], i=0 to 15
.32   VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 31;  VRd+1<9i+8:9i>=sex BYTE[EA+32+i], i=0 to 31
.64   VR0d<9i+8:9i>=sex BYTE[EA+i], i=0 to 31;  VR1d<9i+8:9i>=sex BYTE[EA+32+i], i=0 to 31;  VR0d+1<9i+8:9i>=sex BYTE[EA+64+i], i=0 to 31;  VR1d+1<9i+8:9i>=sex BYTE[EA+96+i], i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
VLI immediate loading
Format
Figure C9711740501341
Assembler syntax
VLI.dt  VRd,#IMM
VLI.dt  SRd,#IMM
Where dt = {b, b9, h, w, f}.
Explanation
Loads an immediate value into a scalar or vector register.
For a scalar register load, a byte, byte9, halfword or word is loaded according to the data type. For the byte, byte9 and halfword data types, the bytes (byte9s) that are not affected are left unchanged.
Operation
Rd = see the table below:
  DT     Scalar load   Vector load
  .i8    SRd<7:0>=IMM<7:0>;   VRd=32 int8 elements
  .i9    SRd<8:0>=IMM<8:0>;   VRd=32 int9 elements
  .i16   SRd<15:0>=IMM<15:0>;   VRd=16 int16 elements
  .i32   SRd<31:0>=sex IMM<18:0>;   VRd=8 int32 elements
  .f     SRd<31>=IMM<18> (sign);  SRd<30:23>=IMM<17:10> (exponent);  SRd<22:13>=IMM<9:0> (mantissa);  SRd<12:0>=zeroes;   VRd=8 float elements
Exception
None.
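The .f row of the table packs a sign bit, an 8-bit exponent and a 10-bit mantissa into the 19-bit immediate. The Python sketch below expands such an immediate into a 32-bit pattern and reads it back as a number; it assumes the 32-bit layout is the usual IEEE 754 single-precision format, which the text does not state explicitly, and the function name is hypothetical.

```python
import struct


def expand_float_immediate(imm19):
    """Expand a 19-bit VLI .f immediate into a Python float."""
    sign = (imm19 >> 18) & 0x1        # IMM<18>
    exponent = (imm19 >> 10) & 0xFF   # IMM<17:10>
    mantissa = imm19 & 0x3FF          # IMM<9:0>
    # Place the fields at bits 31, 30:23 and 22:13; bits 12:0 stay zero.
    bits = (sign << 31) | (exponent << 23) | (mantissa << 13)
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

Only constants whose mantissa fits in 10 bits can be encoded this way; other values would need a load from memory.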
VLQ load quad
Format
Figure C9711740501351
Assembler syntax
VLQ.lt Rd,SRb,SRi
VLQ.lt Rd,SRb,#IMM
VLQ.lt Rd,SRb+,SRi
VLQ.lt Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64}, Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLQOFF for a cache-off load.
Explanation
Loads four vector registers in the current or alternate bank, or four scalar registers.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd:Rd+1:Rd+2:Rd+3 = see the table below:
  LT    Load operation
  .bz9  SRd<8:0>=zex BYTE[EA];  SRd+1<8:0>=zex BYTE[EA+1];  SRd+2<8:0>=zex BYTE[EA+2];  SRd+3<8:0>=zex BYTE[EA+3]
  .bs9  SRd<8:0>=sex BYTE[EA];  SRd+1<8:0>=sex BYTE[EA+1];  SRd+2<8:0>=sex BYTE[EA+2];  SRd+3<8:0>=sex BYTE[EA+3]
  .h    SRd<15:0>=HALF[EA];  SRd+1<15:0>=HALF[EA+2];  SRd+2<15:0>=HALF[EA+4];  SRd+3<15:0>=HALF[EA+6]
  .w    SRd<31:0>=WORD[EA];  SRd+1<31:0>=WORD[EA+4];  SRd+2<31:0>=WORD[EA+8];  SRd+3<31:0>=WORD[EA+12]
  .4    VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 3;  VRd+1<9i+8:9i>=sex BYTE[EA+4+i], i=0 to 3;  VRd+2<9i+8:9i>=sex BYTE[EA+8+i], i=0 to 3;  VRd+3<9i+8:9i>=sex BYTE[EA+12+i], i=0 to 3
  .8    VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 7;  VRd+1<9i+8:9i>=sex BYTE[EA+8+i], i=0 to 7;  VRd+2<9i+8:9i>=sex BYTE[EA+16+i], i=0 to 7;  VRd+3<9i+8:9i>=sex BYTE[EA+24+i], i=0 to 7
  .16   VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 15;  VRd+1<9i+8:9i>=sex BYTE[EA+16+i], i=0 to 15;  VRd+2<9i+8:9i>=sex BYTE[EA+32+i], i=0 to 15;  VRd+3<9i+8:9i>=sex BYTE[EA+48+i], i=0 to 15
  .32   VRd<9i+8:9i>=sex BYTE[EA+i], i=0 to 31;  VRd+1<9i+8:9i>=sex BYTE[EA+32+i], i=0 to 31;  VRd+2<9i+8:9i>=sex BYTE[EA+64+i], i=0 to 31;  VRd+3<9i+8:9i>=sex BYTE[EA+96+i], i=0 to 31
  .64   VR0d<9i+8:9i>=sex BYTE[EA+i], i=0 to 31;  VR1d<9i+8:9i>=sex BYTE[EA+32+i], i=0 to 31;  VR0d+1<9i+8:9i>=sex BYTE[EA+64+i], i=0 to 31;  VR1d+1<9i+8:9i>=sex BYTE[EA+96+i], i=0 to 31;  VR0d+2<9i+8:9i>=sex BYTE[EA+128+i], i=0 to 31;  VR1d+2<9i+8:9i>=sex BYTE[EA+160+i], i=0 to 31;  VR0d+3<9i+8:9i>=sex BYTE[EA+192+i], i=0 to 31;  VR1d+3<9i+8:9i>=sex BYTE[EA+224+i], i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
VLR reverse loading
Format
Figure C9711740501371
Assembler syntax
VLR.lt Rd,SRb,SRi
VLR.lt Rd,SRb,#IMM
VLR.lt Rd,SRb+,SRi
VLR.lt Rd,SRb+,#IMM
Where lt = {4, 8, 16, 32, 64}, Rd = {VRd, VRAd}. Note that .64 and VRAd cannot be specified together. Use VLROFF for a cache-off load.
Explanation
Loads a vector register in reverse element order. This instruction does not support a scalar destination register.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd = see the table below:
  LT    Load operation
  .4    VRd[31-i]<8:0>=sex BYTE[EA+i], i=0 to 3
  .8    VRd[31-i]<8:0>=sex BYTE[EA+i], i=0 to 7
  .16   VRd[31-i]<8:0>=sex BYTE[EA+i], i=0 to 15
  .32   VRd[31-i]<8:0>=sex BYTE[EA+i], i=0 to 31
  .64   VR0d[31-i]<8:0>=sex BYTE[EA+32+i], i=0 to 31;  VR1d[31-i]<8:0>=sex BYTE[EA+i], i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
VLSL Logical Shift Left
Format
Figure C9711740501381
Assembler syntax
VLSL.dt  VRd,VRa,SRb
VLSL.dt  VRd,VRa,#IMM
VLSL.dt  SRd,SRa,SRb
VLSL.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
 D:S:M     V<-V@S     V<-V@I     S<-S@S   S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each element of vector/scalar register Ra is logically shifted left, with zeros filled in at the least significant bit (LSB) positions. The shift amount is given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.
Operation
        shift_amount={SRb%32‖IMM<4:0>};
        for(i=0;i<NumElem && EMASK[i];i++){
            Rd[i]=Ra[i]<<shift_amount;
        }
Exception
None.
Programming notes
Note that shift_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for correctly specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the element is filled with zeros.
VLSR Logical Shift Right
Format
Figure C9711740501391
Assembler syntax
VLSR.dt  VRd,VRa,SRb
VLSR.dt  VRd,VRa,#IMM
VLSR.dt  SRd,SRa,SRb
VLSR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M     V<-V@S     V<-V@I     S<-S@S   S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each element of vector/scalar register Ra is logically shifted right, with zeros filled in at the most significant bit (MSB) positions. The shift amount is given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.
Operation
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
    Rd[i]=Ra[i] zero>> shift_amount;
}
Exception
None.
Programming notes
Note that shift_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for correctly specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the element is filled with zeros.
VLWS load with stride
Format
Figure C9711740501401
Assembler syntax
VLWS.dt  Rd,SRb,SRi
VLWS.dt  Rd,SRb,#IMM
VLWS.dt  Rd,SRb+,SRi
VLWS.dt  Rd,SRb+,#IMM
Where dt = {4, 8, 16, 32}, Rd = {VRd, VRAd}. Note that the .64 mode is not supported; use VL instead. Use VLWSOFF for a cache-off load.
Explanation
Starting at the effective address, loads 32 bytes from memory into vector register VRd, using scalar register SRb+1 as the stride control register.
LT specifies the block size, that is, the number of consecutive bytes loaded for each block. SRb+1 specifies the stride, that is, the number of bytes separating the starts of two consecutive blocks.
The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and the block size must be multiples of the data size.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Block_size={4‖8‖16‖32};
Stride=SRb+1<31:0>;
for(i=0;i<VECSIZE/Block_size;i++)
    for(j=0;j<Block_size;j++)
        VRd[i*Block_size+j]<8:0>=sex BYTE[EA+i*Stride+j];
Exception
Invalid data address, unaligned accesses.
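The block and stride addressing can be illustrated with a short Python sketch of the two nested loops. The function name `vlws`, the list standing in for memory, and the `vecsize` parameter (32 bytes by default, as in the description) are assumptions; sign extension to 9 bits and alignment checks are omitted.

```python
def vlws(memory, ea, block_size, stride, vecsize=32):
    """Gather vecsize bytes as blocks of block_size spaced stride apart."""
    if stride < block_size:
        raise ValueError("stride must be at least the block size")
    elements = []
    for i in range(vecsize // block_size):   # one pass per block
        for j in range(block_size):          # consecutive bytes in a block
            elements.append(memory[ea + i * stride + j])
    return elements
```

With a block size of 4 and a stride equal to the row pitch of an image, for instance, such a load would gather a 4-byte-wide column of pixels from successive rows.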
VMAC multiply and accumulate
Format
Assembler syntax
VMAC.dt  VRa,VRb
VMAC.dt  VRa,SRb
VMAC.dt  VRa,#IMM
VMAC.dt  SRa,SRb
VMAC.dt  SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int16(h)   int32(w)   float(f)
Explanation
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision element of the intermediate result is added to the corresponding double-precision element of the vector accumulator; the double-precision sum of each element is stored in the vector accumulator.
Ra and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.
For the float data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Aop[i]={VRa[i]‖SRa};
    Bop[i]={VRb[i]‖SRb};
    if(dt==float)VACL[i]=Aop[i]*Bop[i]+VACL[i];
    else VACH[i]:VACL[i]=Aop[i]*Bop[i]+VACH[i]:VACL[i];
}
Exception
Overflow, floating point invalid operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
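The integer path of VMAC, where a double-width accumulator is split into VACH and VACL halves, can be sketched as follows. The function name `vmac_int` and the list-based register model are assumptions made for illustration; overflow reporting and the element mask are not modeled.

```python
def vmac_int(ra, rb, vach, vacl, bits):
    """Multiply-accumulate integer elements into a (VACH, VACL) accumulator pair.

    Each accumulator element is 2*bits wide and is kept as two unsigned
    halves of bits each.
    """
    half = (1 << bits) - 1
    full = (1 << (2 * bits)) - 1
    new_h, new_l = [], []
    for a, b, h, l in zip(ra, rb, vach, vacl):
        acc = ((h & half) << bits) | (l & half)   # rebuild VACH:VACL
        acc = (acc + a * b) & full                # wrap to 2*bits (two's complement)
        new_h.append(acc >> bits)                 # upper half goes to VACH
        new_l.append(acc & half)                  # lower half goes to VACL
    return new_h, new_l


# int8 operands with a 16-bit accumulator: 100*50 = 5000 = 0x1388
h, l = vmac_int([100, -3], [50, 7], [0, 0], [0, 0], bits=8)
print(h, l)  # [19, 255] [136, 235]
```

The second element shows a negative product: -21 is 0xFFEB in 16-bit two's complement, so the upper half holds 0xFF and the lower half 0xEB.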
VMACF fractional multiply and accumulate
Format
Figure C9711740501421
Assembler syntax
VMACF.dt  VRa,VRb
VMACF.dt  VRa,SRb
VMACF.dt  VRa,#IMM
VMACF.dt  SRa,SRb
VMACF.dt  SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M    V<-V@V   V<-V@S   V<-V@I     S<-S@S   S<-S@I
  DS    int8(b)   int16(h)     int32(w)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; each double-precision element of the shifted intermediate result is added to the corresponding double-precision element of the vector accumulator; the double-precision sum of each element is stored in the vector accumulator.
VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    VACH[i]:VACL[i]=((VRa[i]*Bop[i])<<1)+VACH[i]:VACL[i];
}
Exception
Overflow.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMACL multiply and accumulate low
Format
Figure C9711740501431
Assembler syntax
VMACL.dt  VRd,VRa,VRb
VMACL.dt  VRd,VRa,SRb
VMACL.dt  VRd,VRa,#IMM
VMACL.dt  SRd,SRa,SRb
VMACL.dt SRd,SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M   V<-V@V  V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int16(h)   int32(w)   float(f)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision element of the intermediate result is added to the corresponding double-precision element of the vector accumulator; the double-precision sum of each element is stored in the vector accumulator; the lower portion is returned to the destination register VRd.
VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.
For the float data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb};
    if(dt==float)VACL[i]=VRa[i]*Bop[i]+VACL[i];
    else VACH[i]:VACL[i]=VRa[i]*Bop[i]+VACH[i]:VACL[i];
    VRd[i]=VACL[i];
}
Exception
Overflow, floating point invalid operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMAD multiply and add
Format
Assembler syntax
VMAD.dt  VRc,VRd,VRa,VRb
VMAD.dt  SRc,SRd,SRa,SRb
Where dt = {b, h, w}.
Supported modes
  S     VR     SR
  DS     int8(b)     int16(h)     int32(w)
Explanation
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each element of Rc is added to the corresponding double-precision element of the intermediate result; the double-precision sum of each element is stored in the destination registers Rd+1:Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Aop[i]={VRa[i]‖SRa};
    Bop[i]={VRb[i]‖SRb};
    Cop[i]={VRc[i]‖SRc};
    Rd+1[i]:Rd[i]=Aop[i]*Bop[i]+sex_dp(Cop[i]);
}
Exception
None.
VMADL multiply and add low
Format
Figure C9711740501451
Assembler syntax
VMADL.dt  VRc,VRd,VRa,VRb
VMADL.dt  SRc,SRd,SRa,SRb
Where dt = {b, h, w, f}.
Supported modes
    S     VR     SR
  DS     int8(b)     float(f)     int16(h)     int32(w)
Explanation
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each element of Rc is added to the corresponding double-precision element of the intermediate result; the lower portion of the double-precision sum of each element is returned to the destination register Rd.
For the float data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Aop[i]={VRa[i]‖SRa};
    Bop[i]={VRb[i]‖SRb};
    Cop[i]={VRc[i]‖SRc};
    if(dt==float)Lo[i]=Aop[i]*Bop[i]+Cop[i];
    else Hi[i]:Lo[i]=Aop[i]*Bop[i]+sex_dp(Cop[i]);
    Rd[i]=Lo[i];
}
Exception
Overflow, floating point invalid operand.
VMAS multiply and subtract from accumulator
Format
Assembler syntax
VMAS.dt  VRa,VRb
VMAS.dt  VRa,SRb
VMAS.dt  VRa,#IMM
VMAS.dt  SRa,SRb
VMAS.dt  SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M     V<-V@V  V<-V@S     V<-V@I     S<-S@S     S<-S@I
  DS     int8(b)     int16(h)     int32(w)     float(f)
Explanation
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision element of the intermediate result is subtracted from the corresponding double-precision element of the vector accumulator; the double-precision result of each element is stored in the vector accumulator.
Ra and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.
For the float data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb};
    if(dt==float)VACL[i]=VACL[i]-VRa[i]*Bop[i];
    else VACH[i]:VACL[i]=VACH[i]:VACL[i]-VRa[i]*Bop[i];
}
Exception
Overflow, floating point invalid operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMASF fractional multiply and subtract from accumulator
Format
Figure C9711740501471
Assembler syntax
VMASF.dt  VRa,VRb
VMASF.dt  VRa,SRb
VMASF.dt  VRa,#IMM
VMASF.dt  SRa,SRb
VMASF.dt  SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M     V<-V@V  V<-V@S     V<-V@I     S<-S@S  S<-S@I
  DS     int8(b)     int16(h)     int32(w)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; each double-precision element of the shifted intermediate result is subtracted from the corresponding double-precision element of the vector accumulator; the double-precision result of each element is stored in the vector accumulator.
VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    VACH[i]:VACL[i]=VACH[i]:VACL[i]-VRa[i]*Bop[i];
}
Exception
Overflow.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMASL multiply and subtract from accumulator low
Format
Assembler syntax
VMASL.dt  VRd,VRa,VRb
VMASL.dt  VRd,VRa,SRb
VMASL.dt  VRd,VRa,#IMM
VMASL.dt  SRd,SRa,SRb
VMASL.dt  SRd,SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M     V<-V@V  V<-V@S     V<-V@I     S<-S@S     S<-S@I
 DS     int8(b)     int16(h)     int32(w)     float(f)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision element of the intermediate result is subtracted from the corresponding double-precision element of the vector accumulator; the double-precision result of each element is stored in the vector accumulator; the lower portion is stored in the destination register VRd.
VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH.
For the float data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb};
    if(dt==float)VACL[i]=VACL[i]-VRa[i]*Bop[i];
    else VACH[i]:VACL[i]=VACH[i]:VACL[i]-VRa[i]*Bop[i];
    VRd[i]=VACL[i];
}
Exception
Overflow, floating point invalid operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMAXE pairwise maximum and exchange
Format
Assembler syntax
VMAXE.dt VRd,VRb
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M    V<-V
  DS   int8(b)   int9(b9)   int16(h)   int32(w)   float(f)
Explanation
VRa should be the same as VRb. When VRa differs from VRb, the result is undefined.
The even/odd pairs of data elements in vector register Rb are compared; the larger value of each pair is stored in the even position and the smaller value in the odd position of vector register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i=i+2){
    VRd[i]=(VRb[i]>VRb[i+1])?VRb[i]:VRb[i+1];
    VRd[i+1]=(VRb[i]>VRb[i+1])?VRb[i+1]:VRb[i];
}
Exception
None.
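The pairwise compare-and-exchange performed by VMAXE can be sketched in Python. The function name `vmaxe` and the list model of a vector register are hypothetical, and the element mask is ignored.

```python
def vmaxe(vrb):
    """Place the larger of each even/odd pair at the even index, the smaller at the odd index."""
    out = list(vrb)
    for i in range(0, len(vrb) - 1, 2):
        larger = max(vrb[i], vrb[i + 1])
        smaller = min(vrb[i], vrb[i + 1])
        out[i], out[i + 1] = larger, smaller
    return out


print(vmaxe([3, 9, 7, 2, 5, 5]))  # [9, 3, 7, 2, 5, 5]
```

Operations of this kind are commonly used as one stage of a sorting network.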
VMOV move
Format
Assembler syntax
VMOV.dt Rd,Rb
Where dt = {b, b9, h, w, f}. Rd and Rb denote architecturally specified register names.
Note that .w and .f specify the same operation.
Supported modes
Explanation
Moves the contents of register Rb to register Rd. The Group field specifies the source and destination register groups. The register group notation is:
VR     current bank vector register
VRA    alternate bank vector register
SR     scalar register
SP     special-purpose register
RASR   return address stack register
VAC    vector accumulator register (see the VAC register encoding table below)
Group<3:0>   Source group   Destination group   Note
0000                                            Reserved
0001         VR             VRA
0010         VRA            VR
0011         VRA            VRA
0100                                            Reserved
0101                                            Reserved
0110         VRA            VAC
0111         VAC            VRA
1000                                            Reserved
1001         SR             VRA
1010                                            Reserved
1011                                            Reserved
1100         SR             SP
1101         SP             SR
1110         SR             RASR
1111         RASR           SR
Note that a vector register cannot be moved to a scalar register with this instruction. The VEXTRT instruction is provided for that purpose.
The VAC register is encoded as follows:
R<2:0>   Register   Note
000                 Undefined
001      VAC0L
010      VAC0H
011      VAC0       Specifies both VAC0H:VAC0L. When specified as the source, the register pair VRd+1:VRd is updated. VRd must be an even-numbered register.
100                 Undefined
101      VAC1L
110      VAC1H
111      VAC1       Specifies both VAC1H:VAC1L. When specified as the source, the register pair VRd+1:VRd is updated. VRd must be an even-numbered register.
All others: undefined
Operation
Rd=Rb
Exception
Setting an exception status bit in VCSR or VISRC causes the corresponding exception.
Programming notes
This instruction is not affected by the element mask. Note that the alternate bank concept does not exist in VEC64 mode; in VEC64 mode this instruction cannot be used to move to or from alternate bank registers.
VMUL multiply
Format
Assembler syntax
VMUL.dt VRc,VRd,VRa,VRb
VMUL.dt SRc,SRd,SRa,SRb
Where dt = {b, h, w}.
Supported modes
    S     VR     SR
    DS     int8(b)     int16(h)     int32(w)
Explanation
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision result; the double-precision result of each element is returned to the destination registers Rc:Rd.
Ra and Rb use the specified data type, whereas Rc:Rd use the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The upper portion of each double-precision element is stored in Rc.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Aop[i]={VRa[i]‖SRa};
    Bop[i]={VRb[i]‖SRb};
    Hi[i]:Lo[i]=Aop[i]*Bop[i];
    Rc[i]=Hi[i];
    Rd[i]=Lo[i];
}
Exception
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead. It also does not support the float data type, because the extended result is not a supported data type.
VMULA multiply to accumulator
Format
Figure C9711740501531
Assembler syntax
VMULA.dt  VRa,VRb
VMULA.dt  VRa,SRb
VMULA.dt  VRa,#IMM
VMULA.dt  SRa,SRb
VMULA.dt  SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M     V@V     V@S     V@I     S@S     S@I
  DS     int8(b)     int16(h)     int32(w)   float(f)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision result; this result is written to the accumulator.
For the float data type, all operands and results are single precision.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb};
    if(dt==float)VACL[i]=VRa[i]*Bop[i];
    else VACH[i]:VACL[i]=VRa[i]*Bop[i];
}
Exception
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULAF multiply fraction to accumulator
Format
Figure C9711740501541
Assembler syntax
VMULAF.dt  VRa,VRb
VMULAF.dt  VRa,SRb
VMULAF.dt  VRa,#IMM
VMULAF.dt  SRa,SRb
VMULAF.dt  SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M     V@V     V@S     V@I     S@S     S@I
  DS     int8(b)     int16(h)     int32(w)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; the result is written to the accumulator.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    VACH[i]:VACL[i]=(VRa[i]*Bop[i])<<1;
}
Exception
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULF multiply fraction
Format
Assembler syntax
VMULF.dt  VRd,VRa,VRb
VMULF.dt  VRd,VRa,SRb
VMULF.dt  VRd,VRa,#IMM
VMULF.dt  SRd,SRa,SRb
VMULF.dt  SRd,SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M    V<-V@V   V<-V@S     V<-V@I    S<-S@S   S<-S@I
  DS     int8(b)     int16(h)    int32(w)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; the upper portion of the result is returned to the destination register VRd+1 and the lower portion to the destination register VRd. VRd must be an even-numbered register.
Operating
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Hi[i]:Lo[i]=(VRa[i]*Bop[i])<<1;
    VRd+1[i]=Hi[i];
    VRd[i]=Lo[i];
}
Exception
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULFR multiply fraction and round
Format
Assembler syntax
VMULFR.dt  VRd,VRa,VRb
VMULFR.dt  VRd,VRa,SRb
VMULFR.dt  VRd,VRa,#IMM
VMULFR.dt  SRd,SRa,SRb
VMULFR.dt  SRd,SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M  V<-V@V  V<-V@S   V<-V@I   S<-S@S  S<-S@I
 DS   int8(b)   int16(h)   int32(w)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; the shifted intermediate result is rounded to the upper portion; the upper portion is returned to the destination register VRd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Hi[i]:Lo[i]=(VRa[i]*Bop[i])<<1;
    if(Lo[i]<msb>==1) Hi[i]=Hi[i]+1;
    VRd[i]=Hi[i];
}
Exception
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
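The fractional multiply with rounding can be sketched in Python for signed fixed-point operands. This is an illustration only: the name `vmulfr`, the default 16-bit width (Q15 format), and the list model are assumptions, and the element mask is not modeled.

```python
def vmulfr(ra, rb, bits=16):
    """Fractional multiply: shift the double-width product left by 1, round, keep the upper half."""
    half = (1 << bits) - 1
    full = (1 << (2 * bits)) - 1
    out = []
    for a, b in zip(ra, rb):
        prod = ((a * b) << 1) & full            # Hi:Lo = (a*b) << 1, double width
        hi, lo = prod >> bits, prod & half
        if lo >> (bits - 1):                    # most significant bit of Lo set: round up
            hi = (hi + 1) & half
        if hi >> (bits - 1):                    # reinterpret Hi as a signed value
            hi -= 1 << bits
        out.append(hi)
    return out


# In Q15, 16384 represents 0.5, so 0.5 * 0.5 gives 8192 (0.25).
print(vmulfr([16384, 1, -16384], [16384, 16384, 16384]))  # [8192, 1, -8192]
```

The left shift by one removes the duplicated sign bit that appears when two signed fractions are multiplied, so the upper half is again a fraction in the same format.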
VMULL multiply low
Format
Figure C9711740501571
Assembler syntax
VMULL.dt  VRd,VRa,VRb
VMULL.dt  VRd,VRa,SRb
VMULL.dt  VRd,VRa,#IMM
VMULL.dt  SRd,SRa,SRb
VMULL.dt  SRd,SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
 DS   int8(b)   int16(h)   int32(w)   float(f)
Explanation
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision result; the lower portion of the result is returned to the destination register VRd.
For the float data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb};
    if(dt==float)Lo[i]=VRa[i]*Bop[i];
    else Hi[i]:Lo[i]=VRa[i]*Bop[i];
    VRd[i]=Lo[i];
}
Exception
Overflow, floating point invalid operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VNAND NAND
Format
Figure C9711740501581
Assembler syntax
VNAND.dt  VRd,VRa,VRb
VNAND.dt  VRd,VRa,SRb
VNAND.dt  VRd,VRa,#IMM
VNAND.dt  SRd,SRa,SRb
VNAND.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Each bit of each element in Ra is logically NANDed with the corresponding bit in Rb or the immediate operand; the result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=~(Ra[i]<k> & Bop[i]<k>), for k=all bits in element i;
}
Exception
None.
VNOR NOR
Format
Assembler syntax
VNOR.dt  VRd,VRa,VRb
VNOR.dt  VRd,VRa,SRb
VNOR.dt  VRd,VRa,#IMM
VNOR.dt  SRd,SRa,SRb
VNOR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M     V<-V@V     V<-V@S     V<-V@I     S<-S@S  S<-S@I
 DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each bit of each element in Ra is logically NORed with the corresponding bit in Rb or the immediate operand; the result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=~(Ra[i]<k>|Bop[i]<k>), for k=all bits in element i;
}
Exception
None.
VOR OR
Format
Assembler syntax
VOR.dt  VRd,VRa,VRb
VOR.dt  VRd,VRa,SRb
VOR.dt  VRd,VRa,#IMM
VOR.dt  SRd,SRa,SRb
VOR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M    V<-V@V    V<-V@S    V<-V@I     S<-S@S   S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each bit of each element in Ra is logically ORed with the corresponding bit in Rb or the immediate operand; the result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=Ra[i]<k>|Bop[i]<k>, for k=all bits in element i;
}
Exception
None.
VORC OR complement
Format
Figure C9711740501611
Assembler syntax
VORC.dt  VRd,VRa,VRb
VORC.dt  VRd,VRa,SRb
VORC.dt  VRd,VRa,#IMM
VORC.dt  SRd,SRa,SRb
VORC.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Each bit of each element in Ra is logically ORed with the complement of the corresponding bit in Rb or the immediate operand; the result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=Ra[i]<k>|~Bop[i]<k>, for k=all bits in element i;
}
Exception
None.
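The bitwise definitions of VNAND, VNOR, and VORC can be sketched on single elements as follows. The function names and the explicit `bits` width parameter are assumptions made for illustration; Python integers are unbounded, so each result is masked to the element width.

```python
def vnand(a, b, bits):
    """Per-bit NAND of two elements of the given width."""
    return ~(a & b) & ((1 << bits) - 1)


def vnor(a, b, bits):
    """Per-bit NOR of two elements of the given width."""
    return ~(a | b) & ((1 << bits) - 1)


def vorc(a, b, bits):
    """Per-bit OR of a with the complement of b."""
    return (a | ~b) & ((1 << bits) - 1)


a, b = 0b1100, 0b1010
print(format(vnand(a, b, 4), "04b"))  # 0111
print(format(vnor(a, b, 4), "04b"))   # 0001
print(format(vorc(a, b, 4), "04b"))   # 1101
```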
VPFTCH prefetch
Format
Figure C9711740501621
Assembler syntax
VPFTCH.ln  SRb,SRi
VPFTCH.ln  SRb,#IMM
VPFTCH.ln  SRb+,SRi
VPFTCH.ln SRb+,#IMM
Where ln = {1,2,4,8}.
Explanation
Prefetches multiple vector data cache lines starting at the effective address. The number of cache lines is specified as follows:
LN<1:0> = 00: prefetch one 64-byte cache line
LN<1:0> = 01: prefetch two 64-byte cache lines
LN<1:0> = 10: prefetch four 64-byte cache lines
LN<1:0> = 11: prefetch eight 64-byte cache lines
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with the 64-byte boundary.
Operation
Exception
Invalid data address exception.
Programming notes
EA<31:0> denotes a byte address in local memory.
VPFTCHSP prefetch to scratch pad
Format
Assembler syntax
VPFTCHSP.ln  SRp,SRb,SRi
VPFTCHSP.ln  SRp,SRb,#IMM
VPFTCHSP.ln  SRp,SRb+,SRi
VPFTCHSP.ln  SRp,SRb+,#IMM
Where ln = {1,2,4,8}. Note that VPFTCH and VPFTCHSP have the same opcode.
Explanation
Transfers multiple 64-byte blocks from memory to the scratch pad. The effective address gives the starting address in memory, and SRp provides the starting address in the scratch pad. The number of 64-byte blocks is specified as follows:
LN<1:0> = 00: transfer one 64-byte block
LN<1:0> = 01: transfer two 64-byte blocks
LN<1:0> = 10: transfer four 64-byte blocks
LN<1:0> = 11: transfer eight 64-byte blocks
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with the 64-byte boundary. If the scratch pad pointer address in SRp does not fall on a 64-byte boundary, it is also truncated to align with the 64-byte boundary. The aligned scratch pad pointer address is incremented by the number of bytes transferred.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A==1)SRb=EA;
Num_bytes={64‖128‖256‖512};
Mem_adrs=EA<31:6>:6b′000000;
SRp=SRp<31:6>:6b′000000;
for(i=0;i<Num_bytes;i++)
    SPAD[SRp++]=MEM[Mem_adrs+i];
Exception
Invalid data address exception.
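The address truncation and block copy in the VPFTCHSP operation can be sketched in Python. The function name `vpftchsp`, the byte-array model of memory and scratch pad, and the integer `ln` argument standing for LN<1:0> are assumptions for illustration.

```python
def vpftchsp(mem, spad, srp, ea, ln):
    """Copy 64 << ln bytes from mem to spad after aligning both addresses to 64 bytes.

    Returns the updated scratch pad pointer.
    """
    num_bytes = 64 << ln      # LN<1:0> = 0..3 selects 64, 128, 256, or 512 bytes
    mem_adrs = ea & ~0x3F     # clear the low six address bits
    srp &= ~0x3F
    for i in range(num_bytes):
        spad[srp] = mem[mem_adrs + i]
        srp += 1
    return srp


mem = bytes(range(256)) * 4
spad = bytearray(1024)
end = vpftchsp(mem, spad, srp=70, ea=130, ln=1)
print(end)  # 192
```

In this example the effective address 130 is truncated to 128 and the scratch pad pointer 70 to 64, so 128 bytes land in scratch pad offsets 64 through 191.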
VROL Rotate Left
Format
Assembler syntax
VROL.dt  VRd,VRa,SRb
VROL.dt  VRd,VRa,#IMM
VROL.dt  SRd,SRa,SRb
VROL.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M   V<-V@S   V<-V@I   S<-S@S   S<-S@I
 DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Each data element of vector/scalar register Ra is rotated left by the number of bits given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.
Operating
rotate_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
    Rd[i]=Ra[i]rotate_left rotate_amount;
}
Exception
None.
Programming notes
Note that the rotate amount is taken as a five-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a rotate amount that is less than or equal to the number of bits in the data length. If the rotate amount is greater than the specified data length, the result is undefined.
Note that rotating left by n bits is equivalent to rotating right by ElemSize-n bits, where ElemSize denotes the number of bits in the given data length.
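The equivalence between a left rotate by n bits and a right rotate by ElemSize-n bits can be checked with a short Python sketch. The helper names `rotl` and `rotr` and the explicit `size` parameter are hypothetical.

```python
def rotl(x, n, size):
    """Rotate the low size bits of x left by n (0 <= n <= size)."""
    mask = (1 << size) - 1
    x &= mask
    return ((x << n) | (x >> (size - n))) & mask


def rotr(x, n, size):
    """Rotate the low size bits of x right by n (0 <= n <= size)."""
    mask = (1 << size) - 1
    x &= mask
    return ((x >> n) | (x << (size - n))) & mask


print(hex(rotl(0x96, 3, 8)))  # 0xb4
print(hex(rotr(0x96, 5, 8)))  # 0xb4
```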
VROR Rotate Right
Format
Figure C9711740501651
Assembler syntax
VROR.dt VRd,VRa,SRb
VROR.dt VRd,VRa,#IMM
VROR.dt SRd,SRa,SRb
VROR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M     V<-V@S     V<-V@I     S<-S@S    S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
Each data element of vector/scalar register Ra is rotated right by the number of bits given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.
Operating
rotate_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
    Rd[i]=Ra[i]rotate_right rotate_amount;
}
Exception
None.
Programming notes
Note that the rotate amount is taken as a five-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a rotate amount that is less than or equal to the number of bits in the data length. If the rotate amount is greater than the specified data length, the result is undefined.
Note that rotating right by n bits is equivalent to rotating left by ElemSize-n bits, where ElemSize denotes the number of bits in the given data length.
VROUND round floating point to integer
Format
Figure C9711740501661
Assembler syntax
VROUND.rm  VRd,VRb
VROUND.rm  SRd,SRb
Where rm = {ninf, zero, near, pinf}.
Supported modes
D:S:M     V<-V     S<-S
Explanation
The contents of vector/scalar register Rb, in floating-point data format, are rounded to the nearest 32-bit integer (word), and the result is stored in vector/scalar register Rd. The rounding mode is specified in RM.
RM<1:0>   Mode   Meaning
00        ninf   Round toward -∞
01        zero   Round toward zero
10        near   Round to nearest even
11        pinf   Round toward +∞
Operation
for(i=0;i<NumElem;i++){
    Rd[i]=Convert to int32(Rb[i]);
}
Exception
None.
Programming notes
This instruction is not affected by the element mask.
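The four rounding modes can be illustrated with Python's standard library. The mapping below is a sketch: the dictionary `ROUND` and the function `vround` are hypothetical names, and clamping of results to the 32-bit integer range is not modeled. Python's built-in `round` rounds halfway cases to the nearest even integer, which matches the "near" mode.

```python
import math

# Rounding mode name -> Python function with the same behavior.
ROUND = {
    "ninf": math.floor,   # toward minus infinity
    "zero": math.trunc,   # toward zero
    "near": round,        # to nearest, ties to even
    "pinf": math.ceil,    # toward plus infinity
}


def vround(rb, rm):
    """Round every floating-point element of rb to an integer using mode rm."""
    return [ROUND[rm](x) for x in rb]


values = [2.5, -2.5, 3.5, -0.7]
for mode in ("ninf", "zero", "near", "pinf"):
    print(mode, vround(values, mode))
```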
VSATL saturate to lower limit
Format
Assembler syntax
VSATL.dt  VRd,VRa,VRb
VSATL.dt  VRd,VRa,SRb
VSATL.dt  VRd,VRa,#IMM
VSATL.dt  SRd,SRa,SRb
VSATL.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}. Note that the .f data type is not supported with the 9-bit immediate.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S  S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)   float(f)
Explanation
Each data element of vector/scalar register Ra is checked against the corresponding lower limit in vector/scalar register Rb or the IMM field. If the value of the data element is smaller than the lower limit, it is set equal to the lower limit, and the final result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]=(Ra[i]<Bop[i])?Bop[i]:Ra[i];
}
Exception
None.
VSATU saturate to upper limit
Format
Assembler syntax
VSATU.dt  VRd,VRa,VRb
VSATU.dt  VRd,VRa,SRb
VSATU.dt  VRd,VRa,#IMM
VSATU.dt  SRd,SRa,SRb
VSATU.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}. Note that the .f data type is not supported with the 9-bit immediate.
Supported modes
D:S:M    V<-V@V     V<-V@S     V<-V@I     S<-S@S     S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)     float(f)
Explanation
Each data element of vector/scalar register Ra is checked against the corresponding upper limit in vector/scalar register Rb or the IMM field. If the value of the data element is greater than the upper limit, it is set equal to the upper limit, and the final result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]=(Ra[i]>Bop[i])?Bop[i]:Ra[i];
}
Exception
None.
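Applying VSATL and then VSATU clamps each element to a range. The Python sketch below shows the scalar-limit form only; the function names and the list model are assumptions, and the element mask is ignored.

```python
def vsatl(ra, lower):
    """Raise every element below the lower limit to that limit."""
    return [lower if x < lower else x for x in ra]


def vsatu(ra, upper):
    """Lower every element above the upper limit to that limit."""
    return [upper if x > upper else x for x in ra]


# Clamp to the unsigned 8-bit range by chaining the two operations.
print(vsatu(vsatl([-20, 100, 300], 0), 255))  # [0, 100, 255]
```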
VSHFL shuffle
Format
Assembler syntax
VSHFL.dt  VRc,VRd,VRa,VRb
VSHFL.dt  VRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
  S     VRb     SRb
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
The contents of vector registers Ra and Rb are shuffled, and the result is stored in vector registers Rc:Rd, as shown below:
Operation
Exception
None.
Programming notes
This instruction does not use the element mask.
VSHFLH shuffle high
Format
Figure C9711740501701
Assembler syntax
VSHFLH.dt  VRd,VRa,VRb
VSHFLH.dt  VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
    S     VRb     SRb
    DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector registers Ra and Rb are shuffled, and the upper portion of the result is stored in vector register Rd, as shown below:
Operation
Exception
None.
Programming notes
This instruction does not use the element mask.
VSHFLL shuffle low
Format
Figure C9711740501711
Assembler syntax
VSHFLL.dt  VRd,VRa,VRb
VSHFLL.dt  VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
   S     VRb     SRb
  DS     int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector registers Ra and Rb are shuffled, and the lower portion of the result is stored in vector register Rd, as shown below:
Figure C9711740501712
Operation
Exception
None.
Programming notes
This instruction does not use the element mask.
VST store
Format
Assembler syntax
VST.st  Rs,SRb,SRi
VST.st  Rs,SRb,#IMM
VST.st  Rs,SRb+,SRi
VST.st  Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64}, Rs = {VRs, VRAs, SRs}.
Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTOFF for a cache-off store.
Explanation
Stores a vector or scalar register.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A==1)SRb=EA;
MEM[EA]=see the table below:
ST     Store operation
.b     BYTE[EA]=SRs<7:0>
.h     HALF[EA]=SRs<15:0>
.w     WORD[EA]=SRs<31:0>
.4     BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 3
.8     BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 7
.16    BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 15
.32    BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 31
.64    BYTE[EA+i]=VR0s<9i+7:9i>, i=0 to 31
       BYTE[EA+32+i]=VR1s<9i+7:9i>, i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
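The bit ranges <9i+7:9i> in the store table select the low eight bits of each 9-bit slot in the vector register. A minimal Python sketch of that extraction follows; it assumes, purely for illustration, that the register is modeled as one large integer with slot i occupying bits 9i+8 down to 9i, and the names `pack_slots` and `vst_bytes` are hypothetical.

```python
def pack_slots(values):
    """Pack 9-bit values into one integer, slot i at bits 9i+8..9i."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & 0x1FF) << (9 * i)
    return reg


def vst_bytes(vrs, count):
    """Return the bytes a store would write: bits <9i+7:9i> of each slot."""
    return bytes((vrs >> (9 * i)) & 0xFF for i in range(count))


reg = pack_slots([0x101, 0x0FF, 0x1AB, 0x07F])
print(vst_bytes(reg, 4).hex())  # 01ffab7f
```

The ninth bit of every slot is dropped by the store, which is why 0x101 is written as 0x01.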
VSTCB store to circular buffer
Format
Assembler syntax
VSTCB.st  Rs,SRb,SRi
VSTCB.st  Rs,SRb,#IMM
VSTCB.st  Rs,SRb+,SRi
VSTCB.st  Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64}, Rs = {VRs, VRAs, SRs}.
Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTCBOFF for a cache-off store.
Explanation
Stores a vector or scalar register to the circular buffer bounded by the BEGIN pointer in SRb+1 and the END pointer in SRb+2.
The effective address is adjusted if it is greater than the END address, before the store and before the address update operation. In addition, the circular buffer bounds must be aligned on halfword and word boundaries for .h and .w scalar stores, respectively.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
BEGIN=SRb+1;
END=SRb+2;
cbsize=END-BEGIN;
if(EA>END)EA=BEGIN+(EA-END);
if(A==1)SRb=EA;
MEM[EA]=see the table below:
ST     Store operation
.b     BYTE[EA]=SRs<7:0>;
.h     HALF[EA]=SRs<15:0>;
.w     WORD[EA]=SRs<31:0>;
.4     BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 3
.8     BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 7
.16    BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 15
.32    BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 31
.64    BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VR0s<9i+7:9i>, i=0 to 31
       BYTE[(EA+32+i>END)?EA+32+i-cbsize:EA+32+i]=VR1s<9i+7:9i>, i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
The programmer must ensure the following condition for this instruction to work as expected:
BEGIN<EA<2*END-BEGIN
That is, EA>BEGIN and EA-END<END-BEGIN
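The wraparound addressing of the circular-buffer store can be sketched in Python by following the pseudocode literally. The function name `vstcb`, the byte-array memory model, and the sample addresses are assumptions; note that, as in the pseudocode, the comparison is `>` so a byte at address END itself is written without wrapping.

```python
def vstcb(mem, data, ea, begin, end):
    """Store data bytes starting at ea, wrapping addresses past end back into the buffer.

    Returns the (possibly adjusted) effective address.
    """
    cbsize = end - begin
    if ea > end:                    # adjust the effective address first
        ea = begin + (ea - end)
    for i, value in enumerate(data):
        addr = ea + i
        if addr > end:              # wrap by one buffer length
            addr -= cbsize
        mem[addr] = value
    return ea


mem = bytearray(64)
vstcb(mem, bytes(range(1, 9)), ea=28, begin=16, end=32)
print(list(mem[28:33]), list(mem[17:20]))  # [1, 2, 3, 4, 5] [6, 7, 8]
```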
VSTD store double
Format
Assembler syntax
VSTD.st  Rs,SRb,SRi
VSTD.st  Rs,SRb,#IMM
VSTD.st  Rs,SRb+,SRi
VSTD.st  Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64}, Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTDOFF for a cache-off store.
Explanation
Stores two vector registers from the current or alternate bank, or two scalar registers.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A==1)SRb=EA;
MEM[EA]=see the table below:
ST     Store operation
.b     BYTE[EA]=SRs<7:0>
       BYTE[EA+1]=SRs+1<7:0>
.h     HALF[EA]=SRs<15:0>
       HALF[EA+2]=SRs+1<15:0>
.w     WORD[EA]=SRs<31:0>
       WORD[EA+4]=SRs+1<31:0>
.4     BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 3
       BYTE[EA+4+i]=VRs+1<9i+7:9i>, i=0 to 3
.8     BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 7
       BYTE[EA+8+i]=VRs+1<9i+7:9i>, i=0 to 7
.16    BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 15
       BYTE[EA+16+i]=VRs+1<9i+7:9i>, i=0 to 15
.32    BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 31
       BYTE[EA+32+i]=VRs+1<9i+7:9i>, i=0 to 31
.64    BYTE[EA+i]=VR0s<9i+7:9i>, i=0 to 31
       BYTE[EA+32+i]=VR1s<9i+7:9i>, i=0 to 31
       BYTE[EA+64+i]=VR0s+1<9i+7:9i>, i=0 to 31
       BYTE[EA+96+i]=VR1s+1<9i+7:9i>, i=0 to 31
Exception
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
VSTQ store quad
Format
Assembler syntax
VSTQ.st  Rs,SRb,SRi
VSTQ.st  Rs,SRb,#IMM
VSTQ.st  Rs,SRb+,SRi
VSTQ.st  Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4,8,16,32,64}, Rs = {VRs, VRAs, SRs}.
Attention. B and. B9t specify the same operation, .64 and VRAs can not be specified together. On the Cache-off Storage usage VSTQOFF.
Explanation
Storage from the current or alternative set of four vector registers or four scalar register.
Operation
EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A = 1) SRb = EA;
MEM[EA] = see the table below:
ST     Store operation
.b     BYTE[EA] = SRs<7:0>
       BYTE[EA+1] = SRs+1<7:0>
       BYTE[EA+2] = SRs+2<7:0>
       BYTE[EA+3] = SRs+3<7:0>
.h     HALF[EA] = SRs<15:0>
       HALF[EA+2] = SRs+1<15:0>
       HALF[EA+4] = SRs+2<15:0>
       HALF[EA+6] = SRs+3<15:0>
.w     WORD[EA] = SRs<31:0>
       WORD[EA+4] = SRs+1<31:0>
       WORD[EA+8] = SRs+2<31:0>
       WORD[EA+12] = SRs+3<31:0>
.4     BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 3
       BYTE[EA+4+i] = VRs+1<9i+7:9i>, i = 0 to 3
       BYTE[EA+8+i] = VRs+2<9i+7:9i>, i = 0 to 3
       BYTE[EA+12+i] = VRs+3<9i+7:9i>, i = 0 to 3
.8     BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 7
       BYTE[EA+8+i] = VRs+1<9i+7:9i>, i = 0 to 7
       BYTE[EA+16+i] = VRs+2<9i+7:9i>, i = 0 to 7
       BYTE[EA+24+i] = VRs+3<9i+7:9i>, i = 0 to 7
.16    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 15
       BYTE[EA+16+i] = VRs+1<9i+7:9i>, i = 0 to 15
       BYTE[EA+32+i] = VRs+2<9i+7:9i>, i = 0 to 15
       BYTE[EA+48+i] = VRs+3<9i+7:9i>, i = 0 to 15
.32    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 31
       BYTE[EA+32+i] = VRs+1<9i+7:9i>, i = 0 to 31
       BYTE[EA+64+i] = VRs+2<9i+7:9i>, i = 0 to 31
       BYTE[EA+96+i] = VRs+3<9i+7:9i>, i = 0 to 31
.64    BYTE[EA+i] = VR0s<9i+7:9i>, i = 0 to 31
       BYTE[EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31
       BYTE[EA+64+i] = VR0s+1<9i+7:9i>, i = 0 to 31
       BYTE[EA+96+i] = VR1s+1<9i+7:9i>, i = 0 to 31
       BYTE[EA+128+i] = VR0s+2<9i+7:9i>, i = 0 to 31
       BYTE[EA+160+i] = VR1s+2<9i+7:9i>, i = 0 to 31
       BYTE[EA+192+i] = VR0s+3<9i+7:9i>, i = 0 to 31
       BYTE[EA+224+i] = VR1s+3<9i+7:9i>, i = 0 to 31
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
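The bit-field notation VRs<9i+7:9i> in the store table can be read as taking the low 8 bits of each 9-bit element. The following is a minimal Python sketch of the byte-sized rows (.4 to .32) only, assuming a register is modelled as one integer with element i held in bits 9i+8..9i. The helper names pack9 and vstq_bytes are hypothetical and do not come from the patent.

```python
def pack9(elements):
    """Pack 9-bit elements into one integer; element i occupies bits 9i+8..9i."""
    value = 0
    for i, e in enumerate(elements):
        value |= (e & 0x1FF) << (9 * i)
    return value


def vstq_bytes(mem, ea, regs, n):
    """Store n bytes from each of four registers.

    Mirrors BYTE[EA + r*n + i] = VRs+r<9i+7:9i> for r = 0..3 and i = 0..n-1.
    Alignment checks and exceptions are not modelled (assumption).
    """
    for r, reg in enumerate(regs):
        for i in range(n):
            # Bits 9i+7..9i: the low 8 bits of element i; the ninth bit is dropped.
            mem[ea + r * n + i] = (reg >> (9 * i)) & 0xFF


mem = bytearray(64)
regs = [pack9([0x100 | (r * 16 + i) for i in range(32)]) for r in range(4)]
vstq_bytes(mem, 8, regs, 4)
```

In this example the four registers contribute four bytes each, so sixteen consecutive bytes starting at address 8 are written and the ninth bit of every element is discarded.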
VSTR Store Reverse
Format
Figure C9711740501791
Assembler syntax
VSTR.st  Rs,SRb,SRi
VSTR.st  Rs,SRb,#IMM
VSTR.st  Rs,SRb+,SRi
VSTR.st  Rs,SRb+,#IMM
Where st = {4, 8, 16, 32, 64} and Rs = {VRs, VRAs}. Note that .64 and VRAs cannot be specified together. For cache-off stores, use VSTROFF.
Explanation
Stores the elements of a vector register in reverse order. This instruction does not support a scalar source register.
Operation
EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A = 1) SRb = EA;
MEM[EA] = see the table below:
ST     Store operation
.b     BYTE[EA+i] = VRs[31-i]<7:0>, for i = 0 to 31
.h     HALF[EA+i] = VRs[15-i]<15:0>, for i = 0 to 15
.w     WORD[EA+i] = VRs[7-i]<31:0>, for i = 0 to 7
.4     BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 3
.8     BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 7
.16    BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 15
.32    BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 31
.64    BYTE[EA+32+i] = VR0s[31-i]<7:0>, i = 0 to 31
       BYTE[EA+i] = VR1s[31-i]<7:0>, i = 0 to 31
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
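The reversed indexing in the table, BYTE[EA+i] = VRs[31-i]<7:0>, can be sketched in Python as follows. This is an illustration only: the register is modelled as a list of 32 element values, the function name vstr_bytes is hypothetical, and the .64 two-bank case and address exceptions are left out.

```python
def vstr_bytes(mem, ea, elements, n):
    """Store n bytes taken from the top of the register downwards.

    Mirrors BYTE[EA + i] = VRs[31 - i]<7:0> for i = 0..n-1.
    """
    last = len(elements) - 1  # 31 for a 32-element register
    for i in range(n):
        mem[ea + i] = elements[last - i] & 0xFF


mem = bytearray(16)
vstr_bytes(mem, 4, list(range(32)), 8)
```

With element values 0 to 31, the eight bytes written at address 4 are 31, 30, and so on down to 24.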
VSTWS Store With Stride
Format
Figure C9711740501801
Assembler syntax
VSTWS.st  Rs,SRb,SRi
VSTWS.st  Rs,SRb,#IMM
VSTWS.st  Rs,SRb+,SRi
VSTWS.st  Rs,SRb+,#IMM
Where st = {8, 16, 32} and Rs = {VRs, VRAs}. Note that .64 mode is not supported; use VST instead. For cache-off stores, use VSTWSOFF.
Explanation
Starting at the effective address, stores 32 bytes from vector register VRs to memory, using scalar register SRb+1 as the stride control register.
ST specifies the block size, the number of consecutive bytes stored from each block. SRb+1 specifies the stride, the number of bytes separating the starts of two consecutive blocks.
The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and block size must be multiples of the data size.
Operation
EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A = 1) SRb = EA;
Block-size = {4 ‖ 8 ‖ 16 ‖ 32};
Stride = SRb+1<31:0>;
for (i = 0; i < VECSIZE/Block-size; i++)
    for (j = 0; j < Block-size; j++)
        BYTE[EA + i*Stride + j] = VRs{i*Block-size + j}<7:0>;
Exceptions
Invalid data address, unaligned access.
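The nested loop above scatters consecutive blocks of the register at stride-sized intervals in memory. A minimal Python sketch, assuming a 32-byte register modelled as a list of byte values; the function name vstws and the ValueError check are illustrative choices, and alignment exceptions are not modelled:

```python
VECSIZE = 32  # bytes stored per vector register, as stated in the description


def vstws(mem, ea, vr, block_size, stride):
    """Store VECSIZE bytes as blocks of block_size bytes spaced stride bytes apart."""
    if stride < block_size:
        raise ValueError("stride must be equal to or greater than the block size")
    for i in range(VECSIZE // block_size):      # one iteration per block
        for j in range(block_size):             # bytes within the block
            mem[ea + i * stride + j] = vr[i * block_size + j] & 0xFF


mem = bytearray(64)
vstws(mem, 0, list(range(32)), block_size=8, stride=16)
```

With a block size of 8 and a stride of 16, the four blocks land at offsets 0, 16, 32 and 48, and the 8-byte gaps between them are left untouched.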
VSUB Subtract
Format
Assembler syntax
VSUB.dt  VRd,VRa,VRb
VSUB.dt  VRd,VRa,SRb
VSUB.dt  VRd,VRa,#IMM
VSUB.dt  SRd,SRa,SRb
VSUB.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
DS   int8(b)   int9(b9)   int16(h)   int32(w)   float(f)
Explanation
Subtracts the contents of vector/scalar register Rb from the contents of vector/scalar register Ra and stores the result in vector/scalar register Rd.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            Bop[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
            Rd[i]=Ra[i]-Bop[i];
        }
Exceptions
Overflow, floating point invalid operand.
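The three vector forms differ only in where the second operand comes from: a vector register, a scalar register, or an immediate, with the scalar or immediate applied to every element. A minimal Python sketch of that behaviour follows. It assumes the loop condition is meant as a per-element mask (masked-off elements of Rd keep their old value) and ignores element width, overflow and floating point; the function name vsub is hypothetical.

```python
def vsub(rd, ra, b, emask):
    """Return the new Rd, with Rd[i] = Ra[i] - Bop[i] wherever EMASK[i] is set.

    b is either a list (vector operand) or a single number
    (scalar register or immediate), which is broadcast to all elements.
    """
    bop = b if isinstance(b, list) else [b] * len(ra)
    return [a - x if m else d for d, a, x, m in zip(rd, ra, bop, emask)]


vector_result = vsub([9, 9, 9, 9], [10, 20, 30, 40], [1, 2, 3, 4], [1, 0, 1, 1])
scalar_result = vsub([0, 0, 0, 0], [10, 20, 30, 40], 5, [1, 1, 1, 1])
```

The second call shows the combined scalar/vector case: one scalar value is subtracted from each data element in a single operation.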
VSUBS Subtract and Set
Format
Figure C9711740501821
Assembler syntax
VSUBS.dt  SRd,SRa,SRb
VSUBS.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M     S<-S@S     S<-S@I
  DS     int8(b)     int9(b9)     int16(h)     int32(w)     float(f)
Explanation
Subtracts SRb from SRa, stores the result in SRd, and sets the VFLAG bits in VCSR.
Operation
Bop = {SRb ‖ sex(IMM<8:0>)};
SRd = SRa - Bop;
VCSR<lt,eq,gt> = status(SRa - Bop);
Exceptions
Overflow, floating point invalid operand.
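The flag update can be sketched as below. The patent does not spell out status(); this sketch assumes it sets lt, eq and gt according to whether the difference is negative, zero or positive, and it ignores element width and overflow. The function name vsubs and the dictionary representation of the flags are hypothetical.

```python
def vsubs(sra, bop):
    """Return (SRd, flags) for SRd = SRa - Bop with lt/eq/gt derived from the result."""
    result = sra - bop
    flags = {"lt": result < 0, "eq": result == 0, "gt": result > 0}
    return result, flags


srd, vcsr_flags = vsubs(3, 5)
```

Exactly one of the three flags is set after each call, which is what makes the instruction usable as a compare.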
VUNSHFL Unshuffle
Format
Figure C9711740501831
Assembler syntax
VUNSHFL.dt  VRc,VRd,VRa,VRb
VUNSHFL.dt  VRc,VRd,VRa,SRb 
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S     VRb     SRb
DS    int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector register VRa are unshuffled with Rb into vector registers VRc:VRd, as shown below:
Figure C9711740501832
Operation
Exceptions
None.
Programming notes
This instruction does not use the element mask.
VUNSHFLH Unshuffle High
Format
Assembler syntax
VUNSHFLH.dt  VRd,VRa,VRb
VUNSHFLH.dt  VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S     VRb     SRb
DS    int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector register VRa are unshuffled with Rb; the high part of the result is returned to vector register VRd, as follows:
Operation
Exceptions
None.
Programming notes
This instruction does not use the element mask.
VUNSHFLL Unshuffle Low
Format
Figure C9711740501851
Assembler syntax
VUNSHFLL.dt  VRd,VRa,VRb
VUNSHFLL.dt  VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S     VRb     SRb
DS    int8(b)     int9(b9)     int16(h)     int32(w)
Explanation
The contents of vector register VRa are unshuffled with Rb; the low part of the result is returned to vector register VRd, as follows:
Figure C9711740501852
Operation
Exceptions
None.
Programming notes
This instruction does not use the element mask.
VWBACKSP Writeback from Scratch Pad
Format
Figure C9711740501871
Assembler syntax
VWBACKSP.ln  SRp,SRb,SRi
VWBACKSP.ln  SRp,SRb,#IMM
VWBACKSP.ln  SRp,SRb+,SRi
VWBACKSP.ln  SRp,SRb+,#IMM
Where ln = {1, 2, 4, 8}. Note that VWBACK and VWBACKSP use the same opcode.
Explanation
Transfers multiple 64-byte blocks from the scratch pad to memory. The effective address gives the starting address in memory, and SRp gives the starting address in the scratch pad. The number of 64-byte blocks is specified as follows:
LN<1:0> = 00: transfer one 64-byte block
LN<1:0> = 01: transfer two 64-byte blocks
LN<1:0> = 10: transfer four 64-byte blocks
LN<1:0> = 11: transfer eight 64-byte blocks
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary. If the scratch pad pointer address in SRp does not fall on a 64-byte boundary, it is also truncated to align with a 64-byte boundary. The aligned scratch pad pointer address is incremented by the number of bytes transferred.
Operation
    EA=SRb+{SRi‖sex(IMM<7:0>)};
    if(A=1)SRb=EA;
    Num_bytes={64‖128‖256‖512};
    Mem_adrs=EA<31:6>:6b′000000;
    SRp=SRp<31:6>:6b′000000;
    for(i=0;i<Num_bytes;i++)
        MEM[Mem_adrs+i]=SPAD[SRp++];
Exceptions
Invalid data address exception.
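The address truncation and the LN encoding can be sketched in Python as follows. This is a sketch under stated assumptions: the transfer runs from scratch pad to memory as the Explanation states, both memories are modelled as bytearrays, and the function name vwbacksp is hypothetical. Address exceptions are not modelled.

```python
BLOCK = 64  # bytes per block


def vwbacksp(mem, spad, ea, srp, ln):
    """Copy 1, 2, 4 or 8 blocks from scratch pad to memory; return the updated SRp.

    ln is the 2-bit LN field (0..3), giving 64, 128, 256 or 512 bytes.
    """
    num_bytes = BLOCK << ln
    mem_adrs = ea & ~(BLOCK - 1)   # EA<31:6> followed by six zero bits
    srp &= ~(BLOCK - 1)            # SRp<31:6> followed by six zero bits
    for i in range(num_bytes):
        mem[mem_adrs + i] = spad[srp]
        srp += 1
    return srp


spad = bytearray(range(256))
mem = bytearray(512)
new_srp = vwbacksp(mem, spad, ea=70, srp=130, ln=1)
```

Here the unaligned addresses 70 and 130 are truncated to 64 and 128, two blocks (128 bytes) are copied, and the returned pointer has advanced by the number of bytes transferred.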
VXNOR Exclusive NOR
Format
Assembler syntax
VXNOR.dt  VRd,VRa,VRb
VXNOR.dt  VRd,VRa,SRb
VXNOR.dt  VRd,VRa,#IMM
VXNOR.dt  SRd,SRa,SRb
VXNOR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Computes the logical exclusive NOR of the contents of vector/scalar register Ra and the contents of vector/scalar register Rb, and stores the result in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
    Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
    Rd[i]<k>=~(Ra[i]<k>^Bop[i]<k>),for k=all bits in element i;
}
Exceptions
None.
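Because exclusive NOR is the bitwise complement of XOR, the result has to be limited to the element width. A minimal Python sketch, assuming unsigned element values and an explicit width parameter (8, 9, 16 or 32 for the b, b9, h and w data types); the function name vxnor is hypothetical and element masking is omitted:

```python
def vxnor(ra, bop, width):
    """Return the element-wise XNOR of two equal-length lists, limited to width bits."""
    mask = (1 << width) - 1
    return [~(a ^ b) & mask for a, b in zip(ra, bop)]


result = vxnor([0b1100, 0xF], [0b1010, 0xF], 4)
```

Bits that agree in both operands come out as 1, so identical operands yield an all-ones element.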
VXOR XOR
Format
Figure C9711740501891
Assembler syntax
VXOR.dt  VRd,VRa,VRb
VXOR.dt  VRd,VRa,SRb
VXOR.dt  VRd,VRa,#IMM
VXOR.dt  SRd,SRa,SRb
VXOR.dt  SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
  DS   int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
Computes the logical exclusive OR of the contents of vector/scalar register Ra and the contents of vector/scalar register Rb, and stores the result in vector/scalar register Rd.
Operation
        for(i=0;i<NumElem && EMASK[i];i++){
            Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
            Rd[i]<k>=Ra[i]<k>^Bop[i]<k>,for k=all bits in element i;
        }
Exceptions
None.
VXORALL XOR All Elements
Format
Figure C9711740501901
Assembler syntax
VXORALL.dt SRd,VRb
Where dt = {b, b9, h, w}. Note that .b and .b9 specify the same operation.
Supported modes
DS    int8(b)   int9(b9)   int16(h)   int32(w)
Explanation
The least significant bits of all elements of VRb are XORed together, and the 1-bit result is returned in the least significant bit of SRd. This instruction is not affected by the element mask.
Operation
Exceptions
None.
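XORing the least significant bits of all elements amounts to computing their parity. A minimal Python sketch, assuming the register is modelled as a list of integers; the function name vxorall is hypothetical, and only the single result bit is returned because the text does not say what happens to the other bits of SRd:

```python
from functools import reduce
from operator import xor


def vxorall(vrb):
    """Return the XOR of the least significant bit of every element (0 or 1)."""
    return reduce(xor, (e & 1 for e in vrb), 0)


bit = vxorall([1, 3, 4, 7])
```

The result is 1 when an odd number of elements have their least significant bit set and 0 otherwise.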
VWBACK Writeback
Format
Figure C9711740501861
Assembler syntax
VWBACK.ln  SRb,SRi
VWBACK.ln  SRb,#IMM
VWBACK.ln  SRb+,SRi
VWBACK.ln  SRb+,#IMM
Where ln = {1,2,4,8}.
Explanation
The cache line in the vector data cache whose index is specified by EA (as opposed to the line whose tag matches EA) is written back to memory if it contains modified data. If more than one cache line is specified, the subsequent consecutive cache lines are also written back to memory if they contain modified data. The number of cache lines is specified as follows:
LN<1:0> = 00: write one 64-byte cache line
LN<1:0> = 01: write two 64-byte cache lines
LN<1:0> = 10: write four 64-byte cache lines
LN<1:0> = 11: write eight 64-byte cache lines
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary.
Operation
Exceptions
Invalid data address exception.
Programming notes
EA<31:0> indicates a byte address in local memory.

Claims (9)

1. A processor comprising:
a scalar register adapted to store a single scalar value;
a vector register adapted to store a plurality of data elements; and
processing circuitry coupled to the scalar register and the vector register, wherein the processing circuitry, in response to a single instruction, performs a plurality of operations in parallel, each operation combining one of the data elements from the vector register with the scalar value from the scalar register.
2. A method of operating processing circuitry to execute an instruction, the method comprising:
reading from a register the data elements that constitute the components of a vector value; and
performing parallel operations that combine a scalar value with each of the data elements to produce a vector result.
3. The method of claim 2, wherein performing the parallel operations comprises multiplying each of the data elements by the scalar value to generate a vector data result.
4. The method of claim 2, wherein performing the parallel operations comprises adding the scalar value to each of the data elements to generate a vector data result.
5. The method of claim 2, further comprising reading, from another register, the scalar value to be combined with the data elements, wherein the other register is adapted to store a single scalar value.
6. The method of claim 2, further comprising extracting from the instruction the scalar value to be combined with the data elements.
7. A method of operating a processor, the method comprising:
providing in the processor a plurality of scalar registers and a plurality of vector registers, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements that constitute the components of a vector;
assigning to each scalar register a register number that differs from the register numbers assigned to the other scalar registers;
assigning to each vector register a register number that differs from the register numbers assigned to the other vector registers, wherein the register numbers assigned to at least some of the vector registers are the same as register numbers assigned to scalar registers;
forming an instruction that includes a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and
executing the instruction to transfer data between the scalar register identified by the first operand and a data element of the vector register identified by the second operand.
8. The method of claim 7, wherein:
forming the instruction further comprises including a third operand that identifies a data element in a vector; and
executing the instruction transfers data between the scalar register identified by the first operand and the data element, identified by the third operand, of the vector register identified by the second operand.
9. The method of claim 7, wherein:
forming the instruction further comprises including a third operand that identifies another scalar register; and
executing the instruction transfers data between the scalar register identified by the first operand and the data element, identified by the value stored in the other scalar register, of the vector register identified by the second operand.
CNB971174059A 1996-08-19 1997-08-19 Single-instruction-multiple-data processing with combined scalar/vector operations Expired - Fee Related CN1152300C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US69958596A 1996-08-19 1996-08-19
US699585 1996-08-19
US699,585 1996-08-19

Publications (2)

Publication Number Publication Date
CN1188275A CN1188275A (en) 1998-07-22
CN1152300C true CN1152300C (en) 2004-06-02

Family

ID=24809983

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB971174059A Expired - Fee Related CN1152300C (en) 1996-08-19 1997-08-19 Single-instruction-multiple-data processing with combined scalar/vector operations

Country Status (6)

Country Link
JP (1) JPH10143494A (en)
KR (1) KR100267089B1 (en)
CN (1) CN1152300C (en)
DE (1) DE19735349B4 (en)
FR (1) FR2752629B1 (en)
TW (1) TW346595B (en)

Also Published As

Publication number Publication date
FR2752629B1 (en) 2005-08-26
DE19735349A1 (en) 1998-04-02
KR19980018065A (en) 1998-06-05
TW346595B (en) 1998-12-01
JPH10143494A (en) 1998-05-29
FR2752629A1 (en) 1998-02-27
KR100267089B1 (en) 2000-11-01
CN1188275A (en) 1998-07-22
DE19735349B4 (en) 2006-12-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20040602

Termination date: 20090819