CN1152300C - Single-instruction-multiple-data processing with combined scalar/vector operations

Info

Publication number: CN1152300C
Application number: CNB971174059A
Authority: CN
Inventors: 莫塔兹·Ａ·穆罕默德; 朴宪哲; Sd; 利·T·恩格延; 罗尼·S·D·旺
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 1996-08-19
Filing date: 1997-08-19
Publication date: 2004-06-02
Anticipated expiration: 2017-08-19
Also published as: FR2752629B1; DE19735349A1; KR19980018065A; TW346595B; JPH10143494A; FR2752629A1; KR100267089B1; CN1188275A; DE19735349B4

Abstract

The digital signal processor has scalar registers for scalar values and a set of general-purpose vector registers for vectors that contain multiple data elements. Each vector register has a fixed size but can be partitioned into data elements of a size the user selects for the application. The operations executed by the vector processor manipulate two or more vector operands to determine a vector quantity, combine a scalar operand and a vector operand to determine a vector quantity, or combine two or more scalar operands to determine a scalar quantity. The scalar registers also facilitate manipulation of individual data elements in a vector register.

Description

Single-instruction-multiple-data processing in a multimedia signal processor and apparatus therefor

Field of the invention

The present invention relates to digital signal processing, and in particular to methods and apparatus that process a plurality of data elements in parallel for each instruction of a multimedia function (such as audio and video encoding and decoding).

Background of the invention

This patent document is related to, and incorporates by reference, the following concurrently filed patent applications:

U.S. patent application serial number UNKNOWN1, attorney docket M-4354, entitled "Multiprocessor Operation in a Multimedia Signal Processor";

U.S. patent application serial number UNKNOWN2, attorney docket M-4355, entitled "Single-Instruction-Multiple-Data Processing in a Multimedia Signal Processor";

U.S. patent application serial number UNKNOWN3, attorney docket M-4365, entitled "Efficient Context Saving and Restoring in Multiprocessors";

U.S. patent application serial number UNKNOWN4, attorney docket M-4366, entitled "System and Method for Handling Software Interrupts with Argument Passing";

U.S. patent application serial number UNKNOWN5, attorney docket M-4367, entitled "System and Method for Handling Interrupts and Exception Events in an Asymmetric Multiprocessor Architecture";

U.S. patent application serial number UNKNOWN6, attorney docket M-4368, entitled "Methods and Apparatus for Processing Video Data"; and

U.S. patent application serial number UNKNOWN7, attorney docket M-4369, entitled "Single-Instruction-Multiple-Data Processing Using Multiple Banks of Vector Registers".

Programmable digital signal processors (DSPs) for multimedia applications, such as real-time video encoding and decoding, require considerable processing power to handle a large amount of data within a limited time. Several DSP architectures are well known. A general-purpose architecture, such as that used in most microprocessors, typically requires a high operating frequency to give a DSP enough computing power for real-time video encoding or decoding. This makes such a DSP expensive.

A very long instruction word (VLIW) processor is a DSP with many functional units, most of which perform different, relatively simple tasks. A single instruction for a VLIW DSP may be 128 bytes or longer and has separate parts that independent functional units execute in parallel. VLIW DSPs have high computing power because many functional units can operate in parallel. VLIW DSPs also have relatively low cost because each functional unit is relatively small and simple. A problem with VLIW DSPs is their inefficiency in handling input/output control, communication with a host computer, and other functions that do not lend themselves to parallel execution on the multiple functional units of a VLIW DSP. In addition, VLIW software differs from conventional software and is difficult to develop, because programming tools and programmers familiar with VLIW software architectures are scarce. Accordingly, a DSP that provides reasonable cost, high computing power, and a familiar programming environment is sought for multimedia applications.

Summary of the invention

An object of the present invention is to provide a method and apparatus for single-instruction-multiple-data processing.

According to one aspect of the invention, a processor is provided that comprises: a scalar register adapted to store a single scalar value; a vector register adapted to store a plurality of data elements; and processing circuitry coupled to the scalar register and the vector register, wherein the processing circuitry, in response to a single instruction, performs a plurality of operations in parallel, each of which combines one of the data elements in the vector register with the scalar value in the scalar register.

According to another aspect of the invention, a method of operating processing circuitry to execute an instruction is provided, comprising: reading from a register the data elements that constitute the components of a vector value; and performing parallel operations that combine a scalar value with each of the data elements to produce a vector result.

According to yet another aspect of the invention, a method of operating a processor comprises: providing scalar registers and vector registers in the processor, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements that constitute the components of a vector; assigning to each scalar register a register number that differs from the register numbers assigned to the other scalar registers; assigning to each vector register a register number that differs from the register numbers assigned to the other vector registers, wherein at least some of the register numbers assigned to the vector registers are the same as register numbers assigned to the scalar registers; forming an instruction that includes a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and executing the instruction to transfer data between the scalar register identified by the first operand and a data element in the vector register identified by the second operand.

A multimedia digital signal processor (DSP) according to one aspect of the invention includes a vector processor that operates on vector data (that is, each operand has multiple data elements) to provide high throughput. The processor uses a single-instruction-multiple-data (SIMD) architecture with a RISC-type instruction set. Programmers can easily adapt to the programming environment of the vector processor because it is similar to the programming environment of the general-purpose processors with which most programmers are familiar.

The DSP includes a set of general-purpose vector registers. Each vector register has a fixed length but is partitioned into separate data elements of user-selectable length. The number of data elements stored in a vector register therefore depends on the length selected for the elements. For example, a 32-byte register can be divided into thirty-two 8-bit data elements, sixteen 16-bit data elements, or eight 32-bit data elements. The data length and type are selected by the instruction that processes the data associated with the vector register, and the execution data path for the instruction performs a number of parallel operations that depends on the data length the instruction indicates.
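As an illustration of the partitioning just described, the following minimal sketch reinterprets the same 32 bytes as 8-bit, 16-bit, or 32-bit elements. The function name `split_elements`, the signed interpretation, and the little-endian byte order are assumptions made for the example; the text above does not specify them.

```python
import struct

# Assumed mapping from element width (bits) to a signed struct format code.
_FORMAT_CODES = {8: "b", 16: "h", 32: "i"}

def split_elements(raw: bytes, element_bits: int) -> list:
    """Interpret a fixed-length register image as elements of the given width.

    Little-endian byte order is an assumption of this sketch.
    """
    code = _FORMAT_CODES[element_bits]
    count = len(raw) * 8 // element_bits
    return list(struct.unpack(f"<{count}{code}", raw))

# A 32-byte register image holding the byte values 0..31.
register = bytes(range(32))

for bits in (8, 16, 32):
    # The element count shrinks as the selected element width grows.
    print(bits, len(split_elements(register, bits)))
```

The register contents never change; only the element width chosen by the (hypothetical) instruction determines how many parallel operations the data represents.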

An instruction for the vector processor can have vector registers or scalar registers as operands, and operates on multiple data elements of the vector registers in parallel to increase computing power. An exemplary instruction set for the vector processor of the invention includes: coprocessor interface operations; flow control operations; load/store operations; and logic/arithmetic operations. The logic/arithmetic operations include operations that combine the data elements of one vector register with the corresponding data elements of one or more other vector registers to produce the data elements of a result vector. Other logic/arithmetic operations mix data elements from one or more vector registers, or combine the data elements of a vector register with a scalar.

An architectural extension of the vector processor adds scalar registers, each of which holds one scalar data element. The combination of scalar and vector registers makes it easy to extend the instruction set of the vector processor to include operations that combine each data element of a vector with the same scalar value in parallel. One instruction, for example, multiplies the data elements of a vector by a scalar value. Scalar registers also provide a place to store a single data element that has been extracted from a vector register or is to be stored into a vector register. Scalar registers are also convenient for passing information between the vector processor and a coprocessor whose architecture provides only scalar registers, and for calculating the effective addresses used by load/store operations.
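The combined scalar/vector operations described above can be sketched as follows. This is a behavioral illustration only: the function names are hypothetical, Python lists stand in for vector registers, and overflow handling (wrapping or saturation) is deliberately left out because the passage does not describe it.

```python
def vector_times_scalar(vector, scalar):
    """Combine every data element of a vector with the same scalar value."""
    return [element * scalar for element in vector]

def extract_element(vector, index):
    """Copy one data element of a vector into a scalar."""
    return vector[index]

def insert_element(vector, index, scalar):
    """Return a copy of the vector with one data element replaced by a scalar."""
    result = list(vector)
    result[index] = scalar
    return result

v = [1, 2, 3, 4]
print(vector_times_scalar(v, 3))   # each element multiplied by the scalar
print(extract_element(v, 2))       # single element moved to a scalar
print(insert_element(v, 0, 9))     # scalar written into one element
```

In hardware the multiplications would occur in parallel in response to one instruction; the list comprehension only models the result.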

According to another aspect of the invention, the vector registers of the vector processor are organized into banks. Each bank can be selected as the "current" bank, while the other bank is the "alternative" bank. A "current bank" bit in a control register of the vector processor indicates which bank is current. To reduce the number of bits required to identify a vector register, some instructions provide only a register number that identifies a vector register in the current bank. Load/store instructions have an additional bit so that they can identify a vector register in either bank. Load/store operations can therefore fetch data into the alternative bank while other operations work on data in the current bank. This facilitates software pipelining in image processing and graphics procedures, and reduces processor delays when fetching data, because logic/arithmetic operations can execute out of order with respect to load/store operations that access the alternative register bank. For other instructions, the alternative bank allows the use of double-length vector registers, each of which consists of a vector register from the current bank and the corresponding vector register from the alternative bank. Such double-length registers can be identified from the instruction syntax. A control bit in the vector processor can be set so that the default vector length is either one or two vector registers. The alternative bank also allows fewer explicitly identified operands in the syntax of complex instructions such as shuffle, unshuffle, saturate, and conditional move instructions that have two source registers and two destination registers.

The vector processor also implements novel instructions such as quad average, shuffle, unshuffle, pair-wise maximum and exchange, and saturate. These instructions perform operations that are common in multimedia functions (for example, video encoding and decoding), and each replaces two or more instructions that other instruction sets would need to implement the same function. The vector processor instruction set thus improves the efficiency and speed of programs in multimedia applications.

Brief description of the drawings

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:

Fig. 1 is the block diagram of multimedia processor according to an embodiment of the invention.

Fig. 2 is the block diagram of vector processor of the multimedia processor of Fig. 1.

Fig. 3 is a block diagram of an instruction fetch unit of the vector processor of Fig. 2.

Fig. 4 is a block diagram of an instruction fetch unit of the vector processor of Fig. 2.

Figs. 5A, 5B and 5C show the stages of the execution pipelines that the vector processor of Fig. 2 uses for register-to-register instructions, load instructions, and store instructions.

Fig. 6 A is the block diagram of execution data path of the vector processor of Fig. 2.

Fig. 6 B is the block diagram of the register file (register file) of Fig. 6 A execution data path.

Fig. 6 C is the block diagram of the parallel processing logic unit of Fig. 6 A execution data path.

Fig. 7 is the block diagram of load/store unit of the vector processor of Fig. 2.

Fig. 8 is the form of the vector processor instruction set of one embodiment of the invention.

Detailed description of the embodiments

The same reference numerals used in different figures denote similar or identical items.

Fig. 1 shows a block diagram of a multimedia signal processor (MSP) 100 according to one embodiment of the invention. Multimedia processor 100 includes a processing core 105 made up of a general-purpose processor 110 and a vector processor 120. Processing core 105 connects to the remainder of multimedia processor 100 through a cache subsystem 130, which includes SRAMs 160 and 190, a ROM 170, and a cache controller 180. Cache controller 180 can configure SRAM 160 as an instruction cache 162 and a data cache 164 for processor 110, and configure SRAM 190 as an instruction cache 192 and a data cache 194 for vector processor 120.

On-chip ROM 170 contains data and instructions for processors 110 and 120 and can be configured as a cache. In the present embodiment, ROM 170 contains: reset and initialization procedures; self-test diagnostic procedures; interrupt and exception handlers; sound blaster emulation subroutines; V.34 modem signal processing subroutines; general telephony functions; 2-D and 3-D graphics subroutine libraries; and subroutine libraries for audio and video standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.

Cache subsystem 130 connects processors 110 and 120 to two system buses 140 and 150, and serves both as a cache and as a switching station for processors 110 and 120 and the devices coupled to buses 140 and 150. System bus 150 operates at a higher clock frequency than bus 140 and is connected to a memory controller 158, a local bus interface 156, a DMA controller 154, and a device interface 152, which respectively provide interfaces to an external local memory, the local bus of a host computer, direct memory access (DMA), and various analog-to-digital and digital-to-analog converters. A system timer 142, a UART (universal asynchronous receiver transmitter) 144, a bitstream processor 146, and an interrupt controller 148 are connected to bus 140. The above-referenced patent applications entitled "Multiprocessor Operation in a Multimedia Signal Processor" and "Methods and Apparatus for Processing Video Data" more fully describe the operation of cache subsystem 130 and exemplary devices that processors 110 and 120 access through cache subsystem 130 and buses 140 and 150.

Processors 110 and 120 execute independent program threads and are structurally different so that each can more efficiently perform the particular tasks assigned to it. Processor 110 is primarily for control functions, such as running a real-time operating system and similar functions that do not require large numbers of repetitive calculations. Processor 110 therefore does not need high computing power and can be implemented with a conventional general-purpose processor architecture. Vector processor 120 mostly performs number crunching, which involves the repetitive operations on blocks of data that are common in multimedia processing. To provide high computing power with relatively simple programming, vector processor 120 has a SIMD (single instruction multiple data) architecture; in the present embodiment, most data paths in vector processor 120 are 288 or 576 bits wide to support vector data manipulation. In addition, the instruction set of vector processor 120 includes instructions particularly suited to multimedia problems.

In the present embodiment, processor 110 is a 32-bit RISC processor that operates at 40 MHz and conforms to the architecture of the ARM7 processor, including the register set defined by the ARM7 standard. The architecture and instruction set of the ARM7 RISC processor are described in the "ARM7DM Data Sheet", document number ARM DDI 0010G, which is available from Advanced RISC Machines Ltd. The ARM7DM Data Sheet is incorporated herein by reference in its entirety. Appendix A describes the extensions to the ARM7 instruction set in the present embodiment.

Vector processor 120 operates on both vectors and scalars. In the present embodiment, vector processor 120 includes a pipelined RISC engine that operates at 80 MHz. The registers of vector processor 120 include 32-bit scalar registers, 32-bit special-purpose registers, two banks of 288-bit vector registers, and two double-length (that is, 576-bit) vector accumulator registers. Appendix C describes the register set of vector processor 120 in the present embodiment. In the present embodiment, processor 120 includes 32 scalar registers, which instructions identify by 5-bit register numbers ranging from 0 to 31. There are also sixty-four 288-bit vector registers, organized into two banks of 32 vector registers each. Each vector register can be identified by a 1-bit bank number (0 or 1) and a 5-bit vector register number ranging from 0 to 31. Most instructions access only the vector registers in the current bank, as indicated by a default bank bit CBANK stored in a control register VCSR of vector processor 120. A second control bit, VEC64, indicates whether a register number by default identifies a double-length vector register consisting of one register from each bank. The syntax of an instruction distinguishes register numbers that identify vector registers from register numbers that identify scalar registers.
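The register-naming scheme above can be sketched as a small resolver that maps a 5-bit register number to physical (bank, number) pairs. The class and function names, the optional explicit bank argument (modeling the extra bank bit of load/store instructions), and the ordering of the two halves of a double-length register are assumptions of this sketch, not details taken from the text.

```python
from dataclasses import dataclass

REGISTERS_PER_BANK = 32  # 5-bit register numbers, 0 to 31

@dataclass
class VCSR:
    """Only the two control bits discussed above; other fields are omitted."""
    cbank: int = 0       # CBANK: which bank is the current bank
    vec64: bool = False  # VEC64: register numbers name double-length registers

def resolve_vector_register(number, vcsr, explicit_bank=None):
    """Return the physical (bank, number) pairs named by a register number."""
    if not 0 <= number < REGISTERS_PER_BANK:
        raise ValueError("register number must fit in 5 bits")
    if explicit_bank is not None:
        # Hypothetical model of an instruction that carries its own bank bit.
        return [(explicit_bank, number)]
    if vcsr.vec64:
        # One register from each bank; the ordering here is an assumption.
        return [(vcsr.cbank, number), (1 - vcsr.cbank, number)]
    return [(vcsr.cbank, number)]

print(resolve_vector_register(5, VCSR(cbank=1)))
print(resolve_vector_register(5, VCSR(cbank=0, vec64=True)))
print(resolve_vector_register(5, VCSR(cbank=0), explicit_bank=1))
```

The point of the sketch is that the same 5-bit number can name one register in the current bank, a register in either bank, or a pair of registers, depending on the control bits and the instruction form.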

Each vector register can be partitioned into data elements of programmable length. Table 1 shows the data types supported for data elements in a 288-bit vector register.

Table 1:

Data type	Data length	Interpretation
int8	8 bits (byte)	8-bit two's complement, between -128 and 127
int9	9 bits (byte9)	9-bit two's complement, between -256 and 255
int16	16 bits (halfword)	16-bit two's complement, between -32,768 and 32,767
int32	32 bits (word)	32-bit two's complement, between -2,147,483,648 and 2,147,483,647
float	32 bits (word)	32-bit IEEE 754 single-precision format

Appendix D provides further description of the data lengths and types supported in the embodiment of the invention.

For the int9 data type, the 9-bit bytes are packed consecutively in a 288-bit vector register. For the other data types, every ninth bit in the 288-bit vector register is unused. A 288-bit vector register can hold thirty-two 8-bit or 9-bit integer data elements, sixteen 16-bit integer data elements, or eight 32-bit integer or floating-point elements. In addition, two vector registers can be combined to pack the data elements of a double-length vector. In the embodiment of the invention, setting control bit VEC64 in control and status register VCSR places vector processor 120 in mode VEC64, in which double length (576 bits) is the default length of a vector register.
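The element counts quoted above follow from treating the 288-bit register as thirty-two 9-bit slots. The short sketch below works through that arithmetic; the slot model and all names are illustrative assumptions rather than the patent's own description of the hardware layout.

```python
REGISTER_BITS = 288
SLOT_BITS = 9
SLOTS_PER_REGISTER = REGISTER_BITS // SLOT_BITS  # 32 nine-bit slots

# Number of 9-bit slots each element occupies, and the bits it actually uses.
SLOTS_PER_ELEMENT = {"int8": 1, "int9": 1, "int16": 2, "int32": 4, "float": 4}
USED_BITS = {"int8": 8, "int9": 9, "int16": 16, "int32": 32, "float": 32}

def elements_per_register(data_type, vec64=False):
    """Element count for a single (or, in VEC64 mode, double-length) register."""
    slots = SLOTS_PER_REGISTER * (2 if vec64 else 1)
    return slots // SLOTS_PER_ELEMENT[data_type]

def unused_bits(data_type):
    """Bits of a single 288-bit register left unused by the data type."""
    return REGISTER_BITS - elements_per_register(data_type) * USED_BITS[data_type]

for name in SLOTS_PER_ELEMENT:
    print(name, elements_per_register(name), unused_bits(name))
```

Only int9 uses all 288 bits; every other type leaves 32 bits idle, which matches the statement that every ninth bit is unused.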

Multimedia processor 100 also contains a set of 32-bit extended registers 115 that both processors 110 and 120 can access. Appendix B describes the set of extended registers and their functions in the embodiment of the invention. The extended registers and the scalar and special-purpose registers of vector processor 120 are accessible to processor 110 in some circumstances. Two special "user" extended registers have two read ports so that processors 110 and 120 can read the registers simultaneously. The other extended registers cannot be accessed simultaneously.

Vector processor 120 has two alternative states, VP_RUN and VP_IDLE, which indicate whether vector processor 120 is running or idle. When vector processor 120 is in state VP_IDLE, processor 110 can read or write the scalar and special-purpose registers of vector processor 120. The result of processor 110 reading or writing a register of vector processor 120 while vector processor 120 is in state VP_RUN is undefined.

The extensions to the ARM7 instruction set for processor 110 include instructions that access the extended registers and the scalar or special-purpose registers of vector processor 120. Instructions MFER and MFEP move data from an extended register and from a scalar or special-purpose register of vector processor 120, respectively, to a general register in processor 110. Instructions MTER and MTEP move data from a general register in processor 110 to an extended register and to a scalar or special-purpose register of vector processor 120, respectively. The TESTSET instruction reads an extended register and sets bit 30 of that register to 1. By setting bit 30, instruction TESTSET signals to processor 120 that processor 110 has read (or used) the result produced, which facilitates user/producer synchronization. Other instructions of processor 110, such as STARTVP and INTVP, control the operating state of vector processor 120.
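The read-and-set behavior of TESTSET can be modeled in a few lines. The class below is a hypothetical stand-in for one extended register; the 32-bit width mask is an assumption, and the atomicity that real hardware would have to guarantee is not modeled.

```python
BIT_30 = 1 << 30

class ExtendedRegister:
    """Hypothetical model of a single extended register."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF  # assumed 32-bit width

    def testset(self):
        """Return the value read, then set bit 30 to mark it as consumed."""
        old = self.value
        self.value = old | BIT_30
        return old

reg = ExtendedRegister(5)        # producer leaves a result with bit 30 clear
result = reg.testset()           # consumer reads it; bit 30 becomes 1
consumed = bool(reg.value & BIT_30)
print(result, consumed)
```

A producer polling bit 30 could treat a set bit as permission to overwrite the register with its next result, which is one way the described synchronization might be used.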

Processor 110 acts as a master processor that controls the operation of vector processor 120. This asymmetric division of control between processors 110 and 120 simplifies the problem of synchronizing the two processors. While vector processor 120 is in state VP_IDLE, processor 110 initializes vector processor 120 by writing an instruction address to the program counter of vector processor 120. Processor 110 then executes a STARTVP instruction, which changes vector processor 120 to state VP_RUN. In state VP_RUN, vector processor 120 fetches instructions through cache subsystem 130 and executes them in parallel with processor 110, which continues to execute its own program. Once started, vector processor 120 continues executing until it encounters an exception, executes a VCJOIN or VCINT instruction whose condition is satisfied, or is interrupted by processor 110. Vector processor 120 can pass the results of program execution to processor 110 by writing the results to an extended register, by writing the results to an address space shared by processors 110 and 120, or by leaving the results in a scalar or special-purpose register that processor 110 accesses when vector processor 120 re-enters state VP_IDLE.

Vector processor 120 does not handle its own exceptions. When it executes an instruction that causes an exception, vector processor 120 enters state VP_IDLE and sends an interrupt request to processor 110 over a direct line. Vector processor 120 remains in state VP_IDLE until processor 110 executes another STARTVP instruction. Processor 110 is responsible for reading register VISRC of vector processor 120 to determine the nature of the exception, handling the exception (possibly by reinitializing vector processor 120), and then directing vector processor 120 to resume execution if required.

An INTVP instruction executed by processor 110 interrupts vector processor 120 and causes vector processor 120 to enter idle state VP_IDLE. Instruction INTVP can be used, for example, in a multitasking system to switch the vector processor from one task, such as video coding, to another task, such as sound card emulation.

Vector processor instructions VCINT and VCJOIN are flow control instructions that, if the condition indicated by the instruction is satisfied, halt execution by vector processor 120, place vector processor 120 in state VP_IDLE, and issue an interrupt request to processor 110 unless such requests are masked. The program counter of vector processor 120 (special-purpose register VPC) indicates the address of the instruction following the VCINT or VCJOIN instruction. Processor 110 can check the interrupt source register VISRC of vector processor 120 to determine whether a VCINT or VCJOIN instruction caused the interrupt request. Because vector processor 120 has wide data buses and is more efficient at saving and restoring its own registers, software executed by vector processor 120 should save and restore the registers during context switching. The above-referenced patent application entitled "Efficient Context Saving and Restoring in Multiprocessors" describes an exemplary system for context switching.
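The run/idle transitions described in the preceding paragraphs can be summarized as a small state machine. This is a behavioral sketch only: the class and method names are hypothetical, VISRC is modeled as a plain string rather than its real bit encoding, and a write to the program counter in state VP_RUN (which the text calls undefined) is simply rejected here.

```python
from enum import Enum

class VPState(Enum):
    VP_IDLE = "VP_IDLE"
    VP_RUN = "VP_RUN"

class VectorProcessorModel:
    """Hypothetical model of the control interface seen by processor 110."""

    def __init__(self):
        self.state = VPState.VP_IDLE
        self.vpc = 0                  # program counter (register VPC)
        self.visrc = None             # interrupt source (register VISRC)
        self.interrupt_request = False

    def write_program_counter(self, address):
        if self.state is not VPState.VP_IDLE:
            raise RuntimeError("register access is undefined in VP_RUN")
        self.vpc = address

    def startvp(self):
        """STARTVP: processor 110 starts execution."""
        self.state = VPState.VP_RUN

    def intvp(self):
        """INTVP: processor 110 forces the idle state."""
        self.state = VPState.VP_IDLE

    def exception(self, cause):
        """An exception idles the vector processor and interrupts processor 110."""
        self._stop(cause, masked=False)

    def vcint(self, condition, masked=False):
        """VCINT: stop only if the instruction's condition is satisfied."""
        if condition:
            self._stop("VCINT", masked)

    def _stop(self, cause, masked):
        self.state = VPState.VP_IDLE
        self.visrc = cause
        self.interrupt_request = not masked

vp = VectorProcessorModel()
vp.write_program_counter(0x1000)
vp.startvp()
vp.vcint(condition=True)
print(vp.state.value, vp.visrc, vp.interrupt_request)
```

VCJOIN would follow the same path as `vcint` with a different cause recorded; it is omitted to keep the sketch short.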

Fig. 2 shows a functional block diagram of an embodiment of vector processor 120. Vector processor 120 includes an instruction fetch unit (IFU) 210, a decoder 220, a scheduler 230, an execution data path 240, and a load/store unit (LSU) 250. IFU 210 fetches instructions and processes flow control instructions such as branches. Instruction decoder 220 decodes one instruction per cycle, in the order of arrival from IFU 210, and writes the field values decoded from the instruction to a FIFO in scheduler 230. Scheduler 230 selects the field values to issue to execution control registers as required for the steps of an operation. Issue selection depends on operand dependencies and on the availability of processing resources such as execution data path 240 or load/store unit 250. Execution data path 240 executes logic/arithmetic instructions that operate on vector or scalar data. Load/store unit 250 executes load/store instructions that access the address space of vector processor 120.

Fig. 3 shows a block diagram of an embodiment of IFU 210. IFU 210 includes an instruction buffer that is divided into a main instruction buffer 310 and a secondary instruction buffer 312. Main buffer 310 holds eight consecutive instructions, including the instruction corresponding to the current program count. Secondary buffer 312 holds the eight instructions that immediately follow the instructions in buffer 310. IFU 210 also includes a branch target buffer 314, which holds eight consecutive instructions including the target of the next flow control instruction in buffer 310 or 312. In the present embodiment, vector processor 120 uses a RISC-type instruction set in which each instruction is 32 bits long, each of buffers 310, 312, and 314 is an 8×32-bit buffer, and IFU 210 is connected to cache subsystem 130 by a 256-bit instruction bus. IFU 210 can load eight instructions from cache subsystem 130 into any one of buffers 310, 312, or 314 in a single clock cycle. Registers 340, 342, and 344 indicate the base addresses of the instructions loaded in buffers 310, 312, and 314, respectively.

A multiplexer 332 selects the current instruction from main instruction buffer 310. If the current instruction is not a flow control instruction and the instruction stored in instruction register 330 is advancing to the decode stage of execution, the current instruction is stored in instruction register 330 and the program count is incremented. Once incrementing the program count has selected the last instruction in buffer 310, the next group of eight instructions is loaded into buffer 310. If buffer 312 contains the desired eight instructions, the contents of buffer 312 and register 342 are immediately moved to buffer 310 and register 340, and eight more instructions are prefetched from cache subsystem 130 into secondary buffer 312. An adder 350 determines the address of the next group of instructions from the base address in register 342 and an offset selected by a multiplexer 352. The resulting address from adder 350 is stored in register 342 when, or after, the previous address is moved from register 342 to register 340. The calculated address is also sent to cache subsystem 130 together with a request for eight instructions. If a previous request to cache subsystem 130 has not yet delivered the next eight instructions to buffer 312 when buffer 310 requires them, the previously requested instructions are stored in buffer 310 immediately upon receipt from cache subsystem 130.
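The interplay of main buffer 310 and secondary buffer 312 can be sketched as a two-level prefetch queue. The model below is a simplification under stated assumptions: byte addressing with 4-byte instructions, a hypothetical `fetch_group` callable standing in for cache subsystem 130, and a cache that always answers immediately (so the stall case in the last sentence above is not modeled).

```python
INSTRUCTION_BYTES = 4
GROUP_SIZE = 8
GROUP_BYTES = INSTRUCTION_BYTES * GROUP_SIZE  # 32 bytes = 256-bit instruction bus

class InstructionBuffers:
    """Hypothetical model of buffers 310/312 and base registers 340/342."""

    def __init__(self, fetch_group, base=0):
        self.fetch_group = fetch_group          # base address -> list of 8 instructions
        self.main_base = base                   # register 340
        self.main = fetch_group(base)           # buffer 310
        self.secondary_base = base + GROUP_BYTES               # register 342
        self.secondary = fetch_group(self.secondary_base)      # buffer 312

    def instruction_at(self, program_count):
        """Select the current instruction from the main buffer."""
        index = (program_count - self.main_base) // INSTRUCTION_BYTES
        return self.main[index]

    def advance(self):
        """Move 312 into 310, then prefetch the following group into 312."""
        self.main, self.main_base = self.secondary, self.secondary_base
        self.secondary_base += GROUP_BYTES      # the adder's next base address
        self.secondary = self.fetch_group(self.secondary_base)

# Toy instruction memory: the "instruction" at byte address a is the number a // 4.
def toy_fetch(base):
    return [(base + INSTRUCTION_BYTES * i) // INSTRUCTION_BYTES for i in range(GROUP_SIZE)]

buffers = InstructionBuffers(toy_fetch)
last_of_first_group = buffers.instruction_at(28)
buffers.advance()
first_of_second_group = buffers.instruction_at(32)
print(last_of_first_group, first_of_second_group, buffers.secondary_base)
```

Keeping the next group resident in the secondary buffer is what lets sequential fetch continue without waiting on the cache at each group boundary.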

If the current instruction is a flow control instruction, IFU 210 processes the instruction by evaluating the condition of the flow control instruction and updating the program count following the flow control instruction. IFU 210 stalls if the condition cannot be determined because a preceding instruction that may change the condition has not completed. If a branch is not taken, the program counter is incremented and the next instruction is selected as described above. If a branch is taken and branch target buffer 314 contains the target of the branch, the contents of buffer 314 and register 344 are moved to buffer 310 and register 340, so that IFU 210 can continue supplying instructions to decoder 220 without waiting for instructions from cache subsystem 130.

To prefetch instructions for branch target buffer 314, a scanner 320 scans buffers 310 and 312 to locate the next flow control instruction following the current program count. If a flow control instruction is found in buffer 310 or 312, scanner 320 determines the offset from the base address of the buffer 310 or 312 that contains the instruction to an aligned group of eight instructions that includes the target address of the flow control instruction. Multiplexers 352 and 354 supply adder 350 with the offset from the flow control instruction and the base address from register 340 or 342, and adder 350 generates a new base address for buffer 314. The new base address is passed to cache subsystem 130, which in turn provides the eight instructions for branch target buffer 314.
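The base-address calculation for the branch target buffer amounts to rounding the target address down to an aligned group boundary and expressing the result as an offset from the base of the buffer that holds the branch. The sketch below assumes byte addresses, 4-byte instructions, non-negative target addresses, and a displacement measured from the branch instruction's own address; none of these conventions is stated in the text, and the function name is hypothetical.

```python
INSTRUCTION_BYTES = 4
GROUP_BYTES = 8 * INSTRUCTION_BYTES  # an aligned group of eight instructions

def branch_target_group(buffer_base, index_in_buffer, displacement):
    """Return (new_base, offset) for the aligned group holding the branch target.

    buffer_base      base address of the buffer that holds the branch
    index_in_buffer  position (0 to 7) of the branch within that buffer
    displacement     assumed byte displacement relative to the branch address
    """
    branch_address = buffer_base + index_in_buffer * INSTRUCTION_BYTES
    target = branch_address + displacement
    target_group = target - (target % GROUP_BYTES)   # round down to alignment
    offset = target_group - buffer_base              # the scanner's offset
    return buffer_base + offset, offset              # the adder's sum

print(branch_target_group(64, 3, 100))   # forward branch
print(branch_target_group(64, 0, -40))   # backward branch
```

The returned base is what would be sent to the cache with a request for eight instructions; the offset is kept separate only to mirror the scanner/adder split described above.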

Processing flow control instructions such as " decrement and conditional jump " instruction VD1CBR, VD2CBR and VD3CBR, when reaching " change control register " instruction VCHGCR, IFU210 can change the value of the register except programmed counting. When IFU 210 found the instruction of a non-flow control instructions, command register 330 was delivered in this instruction, and from there to decoder 220.

As shown in Fig. 4, decoder 220 decodes an instruction by writing control values into the fields of FIFO buffer 410 of scheduler 230. FIFO buffer 410 contains 4 rows of flip-flops, each of which can hold 5 fields of information for controlling the execution of one instruction. Rows 0 to 3 hold the information for the oldest to the newest instruction, respectively. As older information is removed when instructions complete, the information in FIFO buffer 410 shifts down to lower rows. Scheduler 230 issues an instruction to the execution stage by selecting the necessary instruction fields and loading them into control pipeline 420, which contains execution registers 421 to 427. Most instructions can be scheduled for out-of-order issue and execution. In particular, the order of logic/arithmetic operations relative to load/store operations is arbitrary unless there is an operand dependency between a load/store operation and a logic/arithmetic operation. Comparisons of field values in FIFO buffer 410 indicate whether an operand dependency exists.

Fig. 5A shows the 6-stage execution pipeline for an instruction that performs a register-to-register operation without accessing the address space of vector processor 120. In instruction fetch stage 511, IFU 210 fetches an instruction as described above. The fetch stage takes 1 clock cycle unless IFU 210 stalls because of a pipeline delay, an unresolved branch condition, or a delay in cache subsystem 130 supplying the prefetched instructions. In decode stage 512, decoder 220 decodes the instruction from IFU 210 and writes the information for the instruction to scheduler 230. Decode stage 512 also takes one clock cycle unless no row in FIFO 410 is available for a new operation. An operation can be issued to control pipeline 420 during its first cycle in FIFO 410, but may be delayed by the issue of older operations.

Execution data path 240 performs register-to-register operations and supplies data and addresses for load/store operations. Fig. 6A shows a block diagram of an embodiment of execution data path 240 and is described together with execution stages 514, 515, and 516. Execution register 421 supplies signals identifying two registers in register file 610, and register file 610 is read during one clock cycle in read stage 514. Register file 610 contains 32 scalar registers and 64 vector registers. Fig. 6B is a block diagram of register file 610. Register file 610 has 2 read ports and 2 write ports, allowing 2 reads and 2 writes in each clock cycle. Each port includes a select circuit 612, 614, 616, or 618 and a 288-bit data bus 613, 615, 617, or 619. Select circuits such as circuits 612, 614, 616, and 618 are well known in the art. They use an address signal WRADDR1, WRADDR2, RDADDR1, or RDADDR2, which decoder 220 derives from a register number (typically 5 bits) provided in the instruction, a bank bit from the instruction or from control and status register VCSR, and the instruction syntax indicating whether the register is a vector register or a scalar register. Data read can be routed through multiplexer 656 to load/store unit 250, or through multiplexers 622 and 624 to multiplier 620, arithmetic logic unit 630, or accumulator 640. Most operations read 2 registers, and read stage 514 completes in one cycle. However, some instructions, such as the multiply-and-add instruction VMAD and instructions that operate on double-length vectors, require data from more than 2 registers, so that read stage 514 takes more than one clock cycle.

In execution stage 515, multiplier 620, arithmetic logic unit 630, and accumulator 640 process the data previously read from register file 610. If multiple cycles are needed to read the required data, execution stage 515 can overlap read stage 514. The duration of execution stage 515 depends on the type (integer or floating point) and the quantity (number of read cycles) of data elements processed. Signals from execution registers 422, 423, and 425 control the input of data to ALU 630, accumulator 640, and multiplier 620 for the first step of operations in the execution stage. Signals from execution registers 432, 433, and 435 control the second step of operations in execution stage 515.

Fig. 6C shows a block diagram of an embodiment of multiplier 620 and ALU 630. Multiplier 620 is an integer multiplier containing 8 independent 36×36-bit multipliers 626. Each multiplier 626 contains four 9×9-bit multipliers linked together by control circuitry. For 8-bit and 9-bit data element widths, control signals from scheduler 230 disconnect the four 9×9-bit multipliers from each other, so that each multiplier 626 performs 4 multiplications and multiplier 620 performs 32 independent multiplications in one cycle. For 16-bit data elements, the control circuitry links pairs of 9×9-bit multipliers to operate together, and multiplier 620 performs 16 parallel multiplications. For the 32-bit integer data element type, the 8 multipliers 626 perform 8 parallel multiplications per clock cycle. The multiplication yields a 576-bit result for the 9-bit data element width and a 512-bit result for the other data sizes.
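The lane counts and result widths quoted above follow from simple arithmetic, which the following minimal sketch reproduces. It is only an illustrative model, not the circuit: the function name `multiplier_lanes` and the assumption that each product is exactly twice as wide as its operands are introduced here for illustration.

```python
# Illustrative model (an assumption, not the patent's circuit) of how
# multiplier 620 is partitioned according to the data element width.
NUM_MULTIPLIERS_626 = 8   # independent 36x36-bit multipliers 626
UNITS_PER_626 = 4         # 9x9-bit multipliers inside each multiplier 626


def multiplier_lanes(element_bits):
    """Return (parallel multiplications per cycle, total result bits)."""
    # Number of 9x9-bit units linked together for one multiplication.
    units_per_multiply = {8: 1, 9: 1, 16: 2, 32: 4}[element_bits]
    lanes = NUM_MULTIPLIERS_626 * UNITS_PER_626 // units_per_multiply
    # Assumption: each product is twice as wide as its operands.
    return lanes, lanes * 2 * element_bits


if __name__ == "__main__":
    for width in (8, 9, 16, 32):
        print(width, multiplier_lanes(width))
```

Under these assumptions only the 9-bit element width fills all 576 result bits; the 8-, 16-, and 32-bit widths each produce 512 bits, which matches the figures given in the text.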

ALU 630 can process the 576-bit or 512-bit result from multiplier 620 in 2 clock cycles. ALU 630 contains 8 independent 36-bit ALUs 636, each of which includes a 32×32-bit floating point unit for floating point addition and multiplication. Additional circuitry implements integer shift, arithmetic, and logic functions. For integer operations, each ALU 636 contains 4 units that can independently perform 8-bit and 9-bit operations and that can be linked together in groups of 2 or 4 for 16-bit and 32-bit integer data elements.

Accumulator 640 accumulates results and contains two 576-bit registers to provide higher precision for intermediate results.

In write stage 516, results from the execution stage are stored in register file 610. Two registers can be written in one clock cycle, and input multiplexers 602 and 605 select the 2 data values to be written. The duration of write stage 516 for an operation depends on the amount of data to be written as a result of the operation and on contention from LSU 250, which may be completing a load instruction by writing to register file 610. Signals from execution registers 426 and 427 select the registers to which data from logic unit 630, accumulator 640, and multiplier 620 are written.

Fig. 5B shows execution pipeline 520 for executing a load instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 520 are the same as described for register-to-register operations. Read stage 514 is also the same as described above, except that execution data path 240 uses the data from register file 610 to determine an address for a call to cache subsystem 130. In address stage 525, multiplexers 652, 654, and 656 select the address, which is supplied to load/store unit 250 for execution stages 526 and 527. While load/store unit 250 processes the operation during stages 526 and 527, the information for the load operation remains in FIFO 410.

Fig. 7 shows an embodiment of load/store unit 250. During stage 526, a call is made to cache subsystem 130 to request the data at the address determined in stage 525. The present embodiment uses transaction-based cache calls, in which multiple devices, including processors 110 and 120, can access the local address space through cache subsystem 130. The requested data may not be available for several cycles after a call to cache subsystem 130, but load/store unit 250 can make calls to the cache subsystem while other calls are pending. Load/store unit 250 is therefore unlikely to stall. The number of clock cycles cache subsystem 130 needs to supply the requested data depends on whether there is a hit or a miss in data cache 194.

In drive stage 527, cache subsystem 130 asserts a data signal for load/store unit 250. Cache subsystem 130 can supply 256 bits (32 bytes) of data to load/store unit 250 per cycle. Byte aligner 710 aligns each of the 32 bytes in a corresponding 9-bit storage location to provide a 288-bit value. The 288-bit format is convenient for multimedia applications such as MPEG encoding and decoding, which sometimes use 9-bit data elements. The 288-bit value is written to read data buffer 720. For write stage 528, scheduler 230 transfers field 4 of FIFO buffer 410 to execution register 426 or 427, and the 288-bit value in data buffer 720 is written to register file 610.
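The widening from 256 to 288 bits can be pictured with the following minimal sketch. It is an illustration under stated assumptions rather than a description of byte aligner 710: the text does not say where a byte sits inside its 9-bit location or how locations are ordered, so the sketch assumes the byte occupies the low 8 bits of its slot, the ninth bit is zero, and byte 0 occupies the least significant slot. The name `align_bytes` is hypothetical.

```python
SLOT_BITS = 9      # width of one storage location in the 288-bit value
LINE_BYTES = 32    # 256 bits delivered by the cache subsystem per cycle


def align_bytes(data: bytes) -> int:
    """Place each of 32 bytes in its own 9-bit slot of a 288-bit integer.

    Assumptions (not stated in the text): the byte fills the low 8 bits
    of its slot, the ninth bit is 0, and byte 0 is the lowest slot.
    """
    if len(data) != LINE_BYTES:
        raise ValueError("expected exactly 32 bytes")
    value = 0
    for index, byte in enumerate(data):
        value |= byte << (index * SLOT_BITS)
    return value


if __name__ == "__main__":
    packed = align_bytes(bytes(range(32)))
    # Slot 5 holds byte 5 under the assumed layout.
    print((packed >> (5 * SLOT_BITS)) & 0x1FF)
```

Keeping one spare bit per slot is what lets the same register layout hold either 8-bit or 9-bit data elements.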

Fig. 5C shows execution pipeline 530 for executing a store instruction. Fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 530 are the same as described above. Read stage 514 is also the same as described above, except that the read stage reads both the data to be stored and the data used for the address calculation. The data to be stored is written to write data buffer 730 in load/store unit 250. Multiplexer 740 converts the data from the 9-bit-byte format to the conventional 8-bit-byte format. The converted data from buffer 730 and the associated address from address calculation stage 525 are sent in parallel to cache subsystem 130 during SRAM stage 536.
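The store-side conversion from 9-bit bytes back to conventional 8-bit bytes can be sketched as below. This is a hedged illustration only: the text does not state how multiplexer 740 treats the ninth bit of each location, so the sketch simply drops it, and it assumes that slot 0 is the least significant 9 bits of the 288-bit value. The name `to_octets` is hypothetical.

```python
SLOT_BITS = 9    # width of one storage location in the 288-bit value
SLOTS = 32       # number of 9-bit locations in one vector register


def to_octets(value: int) -> bytes:
    """Convert a 288-bit value of 9-bit slots into 32 conventional bytes.

    Assumption: the ninth (top) bit of each slot is discarded, and
    slot 0 is the least significant 9 bits of the value.
    """
    return bytes((value >> (i * SLOT_BITS)) & 0xFF for i in range(SLOTS))


if __name__ == "__main__":
    # Build a 288-bit value whose slot i holds the number i.
    value = sum(i << (i * SLOT_BITS) for i in range(SLOTS))
    print(to_octets(value) == bytes(range(32)))
```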

In the exemplary embodiment of the vector processor, each instruction is 32 bits long and has one of the 9 formats shown in Fig. 8, labeled REAR, REAI, RRRM5, RRRR, RI, CT, RRRM9, RRRM9*, and RRRM9**. Appendix E describes the instruction set of vector processor 120.

Some load, store, and cache operations that use scalar registers when determining an effective address have the REAR format. REAR format instructions are identified by bits 29-31 being 000b and have 3 operands identified by 3 register numbers: register numbers SRb and SRi identify scalar registers, and register number Rn identifies a scalar or vector register, depending on bit D. Bank bit B either identifies a bank for register Rn or, if the default vector register size is double length, indicates whether vector register Rn is a double-length register. Opcode field Opc identifies the operation performed on the operands, and field TT indicates the transfer type, load or store. A typical REAR format instruction is instruction VL, which loads register Rn from the address determined by adding the contents of scalar registers SRb and SRi. If bit A is set, the calculated address is stored in scalar register SRb.

REAI format instructions are the same as REAR instructions, except that an 8-bit immediate value from field IMM is used in place of the contents of scalar register SRi. The REAR and REAI formats have no data element size field.

The RRRM5 format is for instructions having 2 source operands and a destination operand. These instructions have either 3 register operands or 2 register operands and a 5-bit immediate value. The encoding of fields D, S, and M, shown in Appendix E, determines whether the first source operand Ra is a scalar or vector register; whether the second source operand Rb/IM5 is a scalar register, a vector register, or a 5-bit immediate value; and whether destination register Rd is a scalar or vector register.

The RRRR format is for instructions having 4 register operands. Register numbers Ra and Rb indicate source registers. Register number Rd indicates a destination register, and register number Rc indicates a source or a destination register, depending on field Opc. All of the operands are vector registers unless bit S is set to indicate that register Rb is a scalar register. Field DS indicates the data element size for the vector registers. Field Opc selects the data type for 32-bit data elements.

RI format instructions load an immediate value into a register. Field IMM contains an immediate value of up to 18 bits. Register number Rd indicates the destination register, which is either a vector register in the current bank or a scalar register, depending on bit D. Fields DS and F indicate the size and type of the data elements, respectively. For 32-bit integer data elements, the 18-bit immediate value is sign-extended before being loaded into register Rd. For floating point data elements, bit 18, bits 17 to 10, and bits 9 to 0 indicate the sign, exponent, and mantissa, respectively, of a 32-bit floating point value.
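The two immediate expansions described for the RI format can be sketched as follows. This is a minimal illustration, not the processor's defined behavior. The integer case follows the text directly. The floating point case assumes that the 8-bit exponent and 10-bit mantissa map onto the exponent field and the most significant fraction bits of an IEEE 754 single-precision value, with the remaining fraction bits zero; the text does not confirm this layout. The function names are hypothetical.

```python
import struct


def sign_extend_18(imm: int) -> int:
    """Sign-extend an 18-bit immediate to a Python integer."""
    imm &= 0x3FFFF
    return imm - (1 << 18) if imm & (1 << 17) else imm


def expand_float_immediate(imm: int) -> float:
    """Expand a 19-bit sign/exponent/mantissa immediate to a float.

    Assumption: bit 18 is the sign, bits 17-10 the exponent, and
    bits 9-0 the top ten fraction bits of an IEEE 754 single.
    """
    sign = (imm >> 18) & 0x1
    exponent = (imm >> 10) & 0xFF
    mantissa = imm & 0x3FF
    bits = (sign << 31) | (exponent << 23) | (mantissa << 13)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(sign_extend_18(0x3FFFF))
    print(expand_float_immediate(127 << 10))
```

Under the assumed layout, ten mantissa bits give roughly three decimal digits of precision, which is adequate for loading simple constants.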

The CT format is for flow control instructions and includes opcode field Opc, condition field Cond, and a 23-bit immediate value IMM. A branch is taken when the condition indicated by the condition field is true. The possible condition codes are "always" (unconditional), "less than", "equal", "less than or equal", "greater than", "not equal", "greater than or equal", and "overflow". Bits GT, EQ, LT, and SO in status and control register VCSR are used to evaluate the condition.
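One plausible way to evaluate the eight conditions from the four VCSR bits is sketched below. The text gives neither the numeric encoding of field Cond nor the exact flag equations, so the condition names are used as dictionary keys and the equations are the conventional ones (for example, "less than or equal" as LT or EQ). Both are assumptions made for illustration, and the name `branch_taken` is hypothetical.

```python
def branch_taken(cond: str, gt: bool = False, eq: bool = False,
                 lt: bool = False, so: bool = False) -> bool:
    """Evaluate a CT-format condition from the VCSR bits GT, EQ, LT, SO.

    The flag equations are conventional assumptions; the numeric
    encoding of the Cond field is not modeled here.
    """
    checks = {
        "always": True,
        "less than": lt,
        "equal": eq,
        "less than or equal": lt or eq,
        "greater than": gt,
        "not equal": not eq,
        "greater than or equal": gt or eq,
        "overflow": so,
    }
    return checks[cond]


if __name__ == "__main__":
    # After a comparison that set EQ, a "less than or equal" branch is taken.
    print(branch_taken("less than or equal", eq=True))
```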

Format RRRM9 provides either 3 register operands or 2 register operands and a 9-bit immediate value. The combination of bits D, S, and M indicates which operands are vector registers, scalar registers, or 9-bit immediate values. Field DS indicates the data element size. The RRRM9* and RRRM9** formats are special cases of the RRRM9 format and are distinguished by opcode field Opc. The RRRM9* format replaces source register number Ra with condition code Cond and an ID field. The RRRM9** format replaces the most significant bits of the immediate value with condition code Cond and bit K. Further description of RRRM9* and RRRM9** is given in Appendix E in connection with the conditional move instruction VCMOV, the conditional move with element mask instruction CMOVM, and the compare and set mask instruction CMPV.

Although the present invention has been described with reference to specific embodiments, the description is only an example of the application of the invention and should not be taken as a limitation. Various adaptations and combinations of features of the disclosed embodiments remain within the scope of the invention as defined by the following claims.

Appendix A

In the exemplary embodiment, processor 110 is a general-purpose processor conforming to the ARM7 processor standard. For a description of the registers in the ARM7, refer to the ARM architecture document or the ARM7 data sheet (document number ARM DDI 0020C, issued December 1994).

To interact with vector processor 120, processor 110: starts and stops the vector processor; tests the vector processor state, including the synchronization state; transfers data from the scalar/special registers of vector processor 120 to the general registers of processor 110; and transfers data from the general registers to the scalar/special registers of the vector processor. There is no direct means of transfer between the general registers and the vector registers of the vector processor; such transfers require memory as an intermediary.

Table A.1 describes the extensions to the ARM7 instruction set for interaction with the vector processor.

Table A.1: Extended ARM7 instruction set

Instruction	Result
STARTVP	Causes the vector processor to enter the VP_RUN state; has no effect if the vector processor is already in the VP_RUN state. STARTVP is executed as a coprocessor data operation (CDP) class instruction in the ARM7 architecture. No result is returned to the ARM7, and the ARM7 continues its execution.
INTVP	Causes the vector processor to enter the VP_IDLE state; has no effect if the vector processor is already in the VP_IDLE state. INTVP is executed as a coprocessor data operation (CDP) class instruction in the ARM7 architecture. No result is returned to the ARM7, and the ARM7 continues its execution.
TESTSET	Reads a user extended register and sets bit 30 of the register to 1 to provide producer/consumer type synchronization between the vector processor and the ARM7 processor. TESTSET is executed as a coprocessor register transfer (MRC) class instruction in the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
MFER	Moves from an extended register to an ARM general register. MFER is executed as a coprocessor register transfer (MRC) class instruction in the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
MFVP	Moves from a vector processor scalar/special register to an ARM7 general register. Unlike other ARM7 instructions, this instruction should be executed only when the vector processor is in the VP_IDLE state; otherwise, the result is undefined. MFVP is executed as a coprocessor register transfer (MRC) class instruction in the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
MTER	Moves from an ARM7 general register to an extended register. MTER is executed as a coprocessor register transfer (MCR) class instruction in the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
MTVP	Moves from an ARM7 general register to a vector processor scalar/special register. Unlike other ARM7 instructions, this instruction should be executed only when the vector processor is in the VP_IDLE state; otherwise, the result is undefined. MTVP is executed as a coprocessor register transfer (MCR) class instruction in the ARM7 architecture. The ARM7 is stalled until the instruction is executed (the register is transferred).
CACHE	Provides software management of the ARM7 data cache.
PFTCH	Prefetches a cache line into the ARM7 data cache.
WBACK	Writes back a cache line from the ARM7 data cache to memory.

Table A.2 lists the ARM7 exceptions. These exceptions are detected and reported before the faulting instruction is executed. The exception vector addresses are given in hexadecimal notation.

Table A.2: ARM7 exceptions

Exception vector	Description
0x00000000	ARM7 reset
0x00000004	ARM7 undefined instruction exception
0x00000004	Vector processor unavailable exception
0x00000008	ARM7 software interrupt
0x0000000C	ARM7 single-step exception
0x0000000C	ARM7 instruction address (IA) breakpoint exception
0x00000010	ARM7 data address breakpoint exception
0x00000010	ARM7 invalid data address exception
0x00000018	ARM7 protection violation exception

The syntax of the extensions to the ARM7 instruction set is described below. For an explanation of the terms and instruction formats, refer to the ARM architecture document or the ARM7 data sheet (document number ARM DDI 0020C, issued December 1994).

The ARM structure provides 3 kinds of instruction formats for coprocessor interface:

1. Coprocessor data operations (CDP)

2. Coprocessor data transfers (LDC, STC)

3. Coprocessor register transfers (MRC, MCR)

The MSP architecture extensions use all three formats.

The coprocessor data operation (CDP) format is used for operations that need not return a result to the ARM7.

CDP format

30 25 20 15 10 5 0

The fields of the CDP format have the following conventions:

Field	Meaning
Cond	Condition field. This field specifies the condition for execution of the instruction.
Opc	Coprocessor operation code.
CRn	Coprocessor operand register.
CRd	Coprocessor destination register.
CP#	Coprocessor number. The following coprocessor numbers are currently used: 1111 - ARM7 data cache; 0111 - vector processor, extended registers.
CP	Coprocessor information.
CRm	Coprocessor operand register.

The coprocessor data transfer format (LDC, STC) is used to load or store a subset of the vector processor's registers directly from or to memory. The ARM7 processor is responsible for supplying the word address, and the vector processor supplies or accepts the data and controls the number of words transferred. Refer to the ARM7 data sheet for more detail.

LDC, STC format

30 25 20 15 10 5 0

The fields of the format have the following conventions:

Field	Meaning
Cond	Condition field. This field specifies the condition for execution of the instruction.
P	Pre/post indexing bit.
U	Up/down bit.
N	Transfer length. Because the CRd field does not have enough bits, bit N is used as part of the source or destination register identifier.
W	Write-back bit.
L	Load/store bit.
Rn	Base register.
CRn	Coprocessor source/destination register.
CP#	Coprocessor number. The following coprocessor numbers are currently used: 1111 - ARM7 data cache; 0111 - vector processor, extended registers.
Offset	Unsigned 8-bit immediate offset.

The coprocessor register transfer format (MRC, MCR) is used to transfer information directly between the ARM7 and the vector processor. This format is used for transfers between an ARM7 register and a scalar or special register of the vector processor.

MRC, MCR format

30 25 20 15 10 5 0

The fields of this format have the following conventions:

Field	Meaning
Cond	Condition field. This field specifies the condition for execution of the instruction.
Opc	Coprocessor operation code.
L	Load/store bit. L=0: move to vector processor. L=1: move from vector processor.
CRn:CRm	Coprocessor source/destination register. Only CRn<1:0>:CRm<3:0> are used.
Rd	ARM source/destination register.
CP#	Coprocessor number. The following coprocessor numbers are currently used: 1111 - ARM7 data cache; 0111 - vector processor, extended registers.
CP	Coprocessor information.

Extended ARM instructions

The extended ARM instructions are described in alphabetical order.

CACHE Cache operation

Format

30 25 20 15 10 5 0

Assembler syntax

STC{cond} p15, cOpc, <Address>

CACHE{cond} Opc, <Address>

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv} and Opc = {0, 1, 3}. Note that because the CRn field of the LDC/STC format is used to specify Opc, the decimal representation of the opcode must be preceded by the letter "c" in the first syntax (i.e., use c0 instead of 0). Refer to the ARM7 data sheet for the address mode syntax.

Description

This instruction is executed only if Cond is true. Opc<3:0> specifies the following operations:

Opc<3:0>	Meaning
0000	Write back and invalidate the dirty cache line specified by EA. If the matching line contains clean data, the line is invalidated without write-back. If no cache line containing EA is found, the data cache remains unchanged.
0001	Write back and invalidate the dirty cache line specified by the index of EA. If the matching line contains clean data, the line is invalidated without write-back.
0010	Used by the PFTCH and WBACK instructions.
0011	Invalidate the cache line specified by EA. The cache line is invalidated (without write-back) even if the line is dirty. This is a privileged operation and causes an ARM7 protection violation if attempted in user mode.
Other	Reserved.

Operation

Refer to the ARM7 data sheet for how EA is calculated.

Exceptions

ARM7 protection violation.

INTVP Interrupt vector processor

Format

30 25 20 15 10 5 0

Assembler syntax

CDP{cond} p7, 1, c0, c0, c0

INTVP{cond}

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.

Description

This instruction is executed only if Cond is true. This instruction signals the vector processor to stop. The ARM7 continues executing the next instruction without waiting for the vector processor to stop.

An MFER busy-wait loop should be used to determine whether the vector processor has stopped after this instruction is executed. This instruction has no effect if the vector processor is already in the VP_IDLE state. Bits 19:12, 7:5, and 3:0 are reserved.

Exceptions

Vector processor unavailable.

MFER Move from extended register

Format

30 25 20 15 10 5 0

Assembler syntax

MRC{cond} p7, 2, Rd, cP, cER, 0

MFER{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, P = {0, 1}, ER = {0, ... 15}, and RNAME refers to the architecturally specified register mnemonic (i.e., PER0 or CSR).

Description

This instruction is executed only if Cond is true. ARM7 register Rd is loaded from the extended register ER specified by P:ER<3:0>, as shown in the table below. Refer to section 1.2 for a description of the extended registers.

ER<3:0>	P=0	P=1
0000	UER0	PER0
0001	UER1	PER1
0010	UER2	PER2
0011	UER3	PER3
0100	UER4	PER4
0101	UER5	PER5
0110	UER6	PER6
0111	UER7	PER7
1000	UER8	PER8
1001	UER9	PER9
1010	UER10	PER10
1011	UER11	PER11
1100	UER12	PER12
1101	UER13	PER13
1110	UER14	PER14
1111	UER15	PER15
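The selection rule in the table above is regular enough to express as a short lookup. The following is a minimal sketch; the function name `extended_register` is hypothetical and only mirrors the table, with P choosing between user (UER) and privileged (PER) extended registers and ER<3:0> giving the register index.

```python
def extended_register(p: int, er: int) -> str:
    """Return the extended register name selected by P:ER<3:0>."""
    if p not in (0, 1):
        raise ValueError("P must be 0 or 1")
    if not 0 <= er <= 15:
        raise ValueError("ER must be in the range 0..15")
    prefix = "PER" if p == 1 else "UER"
    return f"{prefix}{er}"


if __name__ == "__main__":
    print(extended_register(0, 0b0001))   # user extended register 1
    print(extended_register(1, 0b1100))   # privileged extended register 12
```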

Bits 19:17 and 7:5 are reserved.

Exceptions

Protection violation when attempting to access PERx in user mode.

MFVP Move from vector processor

Format

Assembler syntax

MRC{cond} p7, 1, Rd, CRn, CRm, 0

MFVP{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, CRn = {c0, ... c15}, CRm = {c0, ... c15}, and RNAME refers to the architecturally specified register mnemonic (i.e., SP0 or VCS).

Description

This instruction is executed only if Cond is true. ARM7 register Rd is loaded from the vector processor scalar/special register CRn<1:0>:CRm<3:0>. Refer to section 3.2.3 for the assignment of vector processor register numbers for register transfers.

Bits 7:5 and CRn<3:2> are reserved.

The vector processor register map is shown below. Refer to Table 15 for the vector processor special registers (SP0-SP15).

CRm<3:0>	CRn<1:0>=00	CRn<1:0>=01	CRn<1:0>=10	CRn<1:0>=11
0000	SR0	SR16	SP0	RASR0
0001	SR1	SR17	SP1	RASR1
0010	SR2	SR18	SP2	RASR2
0011	SR3	SR19	SP3	RASR3
0100	SR4	SR20	SP4	RASR4
0101	SR5	SR21	SP5	RASR5
0110	SR6	SR22	SP6	RASR6
0111	SR7	SR23	SP7	RASR7
1000	SR8	SR24	SP8	RASR8
1001	SR9	SR25	SP9	RASR9
1010	SR10	SR26	SP10	RASR10
1011	SR11	SR27	SP11	RASR11
1100	SR12	SR28	SP12	RASR12
1101	SR13	SR29	SP13	RASR13
1110	SR14	SR30	SP14	RASR14
1111	SR15	SR31	SP15	RASR15

SR0 always reads as 32 bits of zeros, and writes to it are ignored.
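The register map can be read as a 6-bit register number formed from CRn<1:0> and CRm<3:0>. The following minimal sketch shows that reading. It assumes that the CRn<1:0>=10 column enumerates special registers SP0 to SP15 in order, as the reference to special registers SP0-SP15 suggests; the function name `vp_register` is hypothetical.

```python
def vp_register(crn: int, crm: int) -> str:
    """Return the vector processor register selected by CRn<1:0>:CRm<3:0>.

    Assumption: the CRn<1:0>=10 column runs SP0..SP15 in order.
    """
    if not 0 <= crn <= 3:
        raise ValueError("only CRn<1:0> is used; CRn<3:2> is reserved")
    if not 0 <= crm <= 15:
        raise ValueError("CRm must be in the range 0..15")
    if crn == 0b00:
        return f"SR{crm}"          # scalar registers SR0..SR15
    if crn == 0b01:
        return f"SR{16 + crm}"     # scalar registers SR16..SR31
    if crn == 0b10:
        return f"SP{crm}"          # special registers
    return f"RASR{crm}"            # CRn<1:0> = 11


if __name__ == "__main__":
    print(vp_register(0b01, 0b0011))   # SR19
```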

Exceptions

Vector processor unavailable.

MTER Move to extended register

Format

30 25 20 15 10 5 0

Assembler syntax

MCR{cond} p7, 2, Rd, cP, cER, 0

MTER{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, P = {0, 1}, ER = {0, ... 15}, and RNAME refers to the architecturally specified register mnemonic (i.e., PER0 or CSR).

Description

This instruction is executed only if Cond is true. ARM7 register Rd is moved to the extended register ER specified by P:ER<3:0>, as shown in the table below.

ER<3:0>	P=0	P=1
0000	UER0	PER0
0001	UER1	PER1
0010	UER2	PER2
0011	UER3	PER3
0100	UER4	PER4
0101	UER5	PER5
0110	UER6	PER6
0111	UER7	PER7
1000	UER8	PER8
1001	UER9	PER9
1010	UER10	PER10
1011	UER11	PER11
1100	UER12	PER12
1101	UER13	PER13
1110	UER14	PER14
1111	UER15	PER15

Bits 19:17 and 7:5 are reserved.

Exceptions

Protection violation when attempting to access PERx in user mode.

MTVP Move to vector processor

Format

30 25 20 15 10 5 0

Assembler syntax

MCR{cond} p7, 1, Rd, CRn, CRm, 0

MTVP{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, CRn = {c0, ... c15}, CRm = {c0, ... c15}, and RNAME refers to the architecturally specified register mnemonic (i.e., SP0 or VCS).

Description

This instruction is executed only if Cond is true. ARM7 register Rd is moved to the vector processor scalar/special register CRn<1:0>:CRm<3:0>.

Bits 7:5 and CRn<3:2> are reserved.

The vector processor register map is as follows.

Exceptions

Vector processor unavailable.

PFTCH Prefetch

Format

30 25 20 15 10 5 0

Assembler syntax

LDC{cond} p15, 2, <Address>

PFTCH{cond} <Address>

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Refer to the ARM7 data sheet for the address mode syntax.

Description

This instruction is executed only if Cond is true. The cache line specified by EA is prefetched into the ARM7 data cache.

Operation

Refer to the ARM7 data sheet for how EA is calculated.

Exceptions: None

STARTVP Start vector processor

Format

30 25 20 15 10 5 0

Assembler syntax

CDP{cond} p7, 0, c0, c0, c0

STARTVP{cond}

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.

Description

This instruction is executed only if Cond is true. This instruction signals the vector processor to start execution and automatically clears VISRC<vjp> and VISRC<vip>. The ARM7 continues executing the next instruction without waiting for the vector processor to start execution.

The state of the vector processor must be initialized to the desired state before this instruction is executed. This instruction has no effect if the vector processor is already in the VP_RUN state.

Bits 19:12, 7:5, and 3:0 are reserved.

Exceptions

Vector processor unavailable.

TESTSET Test and set

Format

30 25 20 15 10 5 0

Assembler syntax

MRC{cond} p7, 0, Rd, c0, cER, 0

TESTSET{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ... r15}, ER = {0, ... 15}, and RNAME refers to the architecturally specified register mnemonic (i.e., UER1 or VASYNC).

Description

This instruction is executed only if Cond is true. This instruction returns the contents of UERx to Rd and sets UERx<30> to 1. If ARM7 register 15 is specified as the destination register, UERx<30> is returned in the Z bit of the CPSR so that a short busy-wait loop can be implemented.

Current, only have UER1 to be prescribed in company with reading instruction works.

Position 19:17 and 7:5 keep.

Unusually: nothing

Appendix B

The architecture of multimedia processor 100 defines extended registers that processor 110 accesses with the MFER and MTER instructions. The extended registers comprise privileged extended registers and user extended registers.

The privileged extended registers are mainly used to control the operation of the multimedia signal processor. They are shown in Table B.1.

Table B.1: Privileged extended registers

Number	Mnemonic	Explanation
PER0	CTR	Control register
PER1	PVR	Processor version register
PER2	VIMSK	Vector interrupt mask register
PER3	AIABR	ARM7 instruction address breakpoint register
PER4	ADABR	ARM7 data address breakpoint register
PER5	SPREG	Scratchpad register
PER6	STR	Status register

The control register CTR controls the operation of MSP 100. All bits in CTR are cleared on reset. The register is defined in Table B.2.

Table B.2: CTR definition

Bit	Mnemonic	Explanation
31-13		Reserved; always read as 0.
12	VDCI	Vector data cache invalidate bit. When set, the entire vector processor data cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
11	VDE	Vector data cache enable bit. When cleared, the vector processor data cache is disabled.
10	VICI	Vector instruction cache invalidate bit. When set, the entire vector processor instruction cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
9	VICE	Vector instruction cache enable bit. When cleared, the vector processor instruction cache is disabled.
8	ADCI	ARM7 data cache invalidate bit. When set, the entire ARM7 data cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
7	ADCE	ARM7 data cache enable bit. When cleared, the ARM7 data cache is disabled.
6	AICI	ARM7 instruction cache invalidate bit. When set, the entire ARM7 instruction cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
5	AICE	ARM7 instruction cache enable bit. When cleared, the ARM7 instruction cache is disabled.
4	APSE	ARM7 processor single-step enable bit. When set, the ARM7 processor takes an ARM7 processor single-step exception after executing one instruction. The single-step function is available only in user or supervisor mode.
3	SPAE	Scratchpad access enable bit. When set, ARM7 processes are allowed to load from or store to the scratchpad. When cleared, an attempt to load from or store to the scratchpad causes an ARM7 invalid data address exception.
2	VPSE	Vector processor single-step enable bit. When set, the vector processor takes a vector processor single-step exception after executing one instruction.
1	VPPE	Vector processor pipeline enable bit. When cleared, the vector processor is configured to operate in non-pipelined mode, in which only one instruction is active in the vector processor execution pipeline at a time.
0	VPAE	Vector processor access enable bit. When set, ARM7 processes are allowed to execute the extended ARM7 instructions described above. When cleared, ARM7 processes are prevented from executing the extended ARM7 instructions, and any such attempt causes a vector processor unavailable exception.
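
As an illustration of how software might compose and inspect a CTR value, the following sketch encodes the bit positions from Table B.2. The helper names `ctr_set` and `ctr_flags` are hypothetical and are not part of the described architecture; the sketch only models the bit layout.

```python
# Bit positions taken from Table B.2; the helper names are hypothetical.
CTR_BITS = {
    "VDCI": 12, "VDE": 11, "VICI": 10, "VICE": 9,
    "ADCI": 8, "ADCE": 7, "AICI": 6, "AICE": 5,
    "APSE": 4, "SPAE": 3, "VPSE": 2, "VPPE": 1, "VPAE": 0,
}

def ctr_set(value, *names):
    """Return `value` with the named CTR bits set."""
    for name in names:
        value |= 1 << CTR_BITS[name]
    return value & 0x1FFF  # bits 31-13 are reserved and read as 0

def ctr_flags(value):
    """Return the sorted mnemonics of all bits set in `value`."""
    return sorted(n for n, b in CTR_BITS.items() if (value >> b) & 1)

# Enable vector processor access, pipelining, and the vector instruction cache.
ctr = ctr_set(0, "VPAE", "VPPE", "VICE")
```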

The status register STR indicates the state of MSP 100. All bits in STR are cleared on reset. The register is defined in Table B.3.

Table B.3: STR definition

Bit	Mnemonic	Explanation
31-23		Reserved; always read as 0.
22	ADAB	ARM7 data address breakpoint exception bit. Set when an ARM7 data address breakpoint match occurs. The exception is reported through the data abort interrupt.
21	AIDA	ARM7 invalid data address exception. Occurs when an ARM7 load or store instruction attempts to access an address that is undefined or not implemented in the particular MSP implementation, or attempts to access the scratchpad when that is not permitted. The exception is reported through the data abort interrupt.
20	AIAB	ARM7 instruction address breakpoint exception bit. Set when an ARM7 instruction address breakpoint match occurs. The exception is reported through the prefetch abort interrupt.
19	AIIA	ARM7 invalid instruction address exception. The exception is reported through the prefetch abort interrupt.
18	ASTP	ARM7 single-step exception. The exception is reported through the prefetch abort interrupt.
17	APV	ARM7 protection violation. The exception is reported through the IRQ interrupt.
16	VPUA	Vector processor unavailable exception. The exception is reported through the coprocessor unavailable interrupt.
15-0		Reserved; always read as 0.

The processor version register PVR identifies the particular processor in the multimedia signal processor family.

The vector processor interrupt mask register VIMSK controls the reporting of vector processor exceptions to processor 110. Each bit in VIMSK, when set together with the corresponding bit in the VISRC register, enables the exception to interrupt the ARM7. VIMSK does not affect how vector processor exceptions are detected, only whether an exception interrupts the ARM7. All bits in VIMSK are cleared on reset. The register is defined in Table B.4.

Table B.4: VIMSK Definition

Bit	Mnemonic	Explanation
31	DABE	Data address breakpoint interrupt enable
30	IABE	Instruction address breakpoint interrupt enable
29	SSTPE	Single-step interrupt enable
28-14		Reserved; always read as 0.
13	FOVE	Floating-point overflow interrupt enable
12	FINVE	Floating-point invalid operand interrupt enable
11	FDIVE	Floating-point divide-by-zero interrupt enable
10	IOVE	Integer overflow interrupt enable
9	IDIVE	Integer divide-by-zero interrupt enable
8-7		Reserved; always read as 0.
6	VIE	VCINT interrupt enable
5	VJE	VCJOIN interrupt enable
4-1		Reserved; always read as 0.
0	CSE	Context switch enable
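
The masking rule stated above (an exception interrupts the ARM7 only when both the VISRC bit and the corresponding VIMSK bit are set) amounts to a bitwise AND. The sketch below is a minimal model under that reading; the function name `enabled_sources` is hypothetical, and the bit positions used for the constants are those of the integer overflow and floating-point overflow entries in Table B.4.

```python
# Minimal model: an exception source interrupts the ARM7 only if its bit is
# set in both VISRC and VIMSK. `enabled_sources` is a hypothetical helper.
IOV = 1 << 10   # integer overflow (bit 10)
FOV = 1 << 13   # floating-point overflow (bit 13)

def enabled_sources(visrc, vimsk):
    """Return the VISRC bits that are allowed to interrupt the ARM7."""
    return visrc & vimsk

visrc = IOV | FOV        # both exceptions were detected
vimsk = IOV              # only integer overflow is enabled
pending = enabled_sources(visrc, vimsk)
```

Here `pending` contains only the integer overflow bit: the floating-point overflow is still recorded in VISRC but does not interrupt the ARM7.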

The ARM7 instruction address breakpoint register AIABR aids in debugging ARM7 programs. The register is defined in Table B.5.

Table B.5: AIABR Definition

Bit	Mnemonic	Explanation
31-2	IADR	ARM7 instruction address.
1		Reserved; always read as 0.
0	IABE	Instruction address breakpoint enable. Cleared on reset. If set, an ARM7 instruction address breakpoint exception occurs when the ARM7 instruction access address matches AIABR<31:2> and VCSR<AIAB> is clear; VCSR<AIAB> is then set to indicate the exception. If VCSR<AIAB> is already set when a match occurs, VCSR<AIAB> is cleared and the match is ignored. The exception is reported before the instruction is executed.

The ARM7 data address breakpoint register ADABR aids in debugging ARM7 programs. The register is defined in Table B.6.

Table B.6: ADABR Definition

Bit	Mnemonic	Explanation
31-2	DADR	ARM7 data address. Undefined at reset.
1	SABE	Store address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of the ARM7 store access address match ADABR<31:2> and VCSR<ADAB> is clear; VCSR<ADAB> is then set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the store instruction is executed.
0	LABE	Load address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of the ARM7 load access address match ADABR<31:2> and VCSR<ADAB> is clear; VCSR<ADAB> is then set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the load instruction is executed.

The scratchpad register SPREG configures the address and size of the scratchpad formed using SRAM in cache subsystem 130. The register is defined in Table B.7.

Table B.7: SPREG Definition

Bit	Mnemonic	Explanation
31-11	SPBASE	Scratchpad base address: the upper 21 bits of the scratchpad start address. This value must be at a 4M-byte offset from the value in the MSP_BASE register.
10-2		Reserved.
1-0	SPSIZE	Scratchpad size: 00 -> 0K (with 4K vector processor data cache); 01 -> 2K (with 2K vector processor data cache); 10 -> 3K (with 1K vector processor data cache); 11 -> 4K (without vector processor data cache).
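
The SPREG fields can be decoded as in the following sketch. It assumes only the layout in Table B.7 (SPBASE in bits 31-11, SPSIZE in bits 1-0); the function name `decode_spreg` and the returned tuple are hypothetical, and the data cache size is simply the value the table pairs with each scratchpad size.

```python
# SPSIZE encoding from Table B.7: (scratchpad KB, vector data cache KB).
SPSIZE_KB = {0b00: (0, 4), 0b01: (2, 2), 0b10: (3, 1), 0b11: (4, 0)}

def decode_spreg(spreg):
    """Return (scratchpad start address, scratchpad KB, data cache KB)."""
    spbase = (spreg >> 11) & 0x1FFFFF      # bits 31-11: upper 21 address bits
    scratch_kb, cache_kb = SPSIZE_KB[spreg & 0b11]
    return spbase << 11, scratch_kb, cache_kb

# Hypothetical register value: base 0x00400000, SPSIZE = 10.
base, scratch_kb, cache_kb = decode_spreg(0x00400002)
```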

The user extended registers are mainly used for synchronization between processors 110 and 120. Currently, each user extended register is defined to have only one bit, mapped to bit 30, so that an instruction such as "MFER R15, UERx" returns the bit value in the Z flag. Bits UERx<31> and UERx<29:0> always read as 0. The user extended registers are described in Table B.8.

Table B.8: User extended registers

Number	Mnemonic	Explanation
UER0	VPSTATE	Vector processor state flag. When set, bit 30 indicates that the vector processor is in the VP_RUN state and is executing instructions. When cleared, it indicates that the vector processor is in the VP_IDLE state and has halted with VPC addressing the next instruction to be executed. VPSTATE<30> is cleared on reset.
UER1	VASYNC	Vector and ARM7 synchronization flag. Bit 30 provides producer/consumer type synchronization between vector processor 120 and ARM7 processor 110. Vector processor 120 can set or clear this flag with the VMOV instruction. The flag can also be set or cleared by an ARM7 process using the MFER or MTER instruction. In addition, the flag can be read and set with the TESTSET instruction.

Table B.9 shows the state of the extended registers at power-on reset.

Table B.9: Power-on state of the extended registers

Register	Reset state
CTR	0
PVR	TBD
VIMSK	0
AIABR	AIABR<0> = 0, all others undefined
ADABR	ADABR<0> = 0, all others undefined
STR	0
VPSTATE	VPSTATE<30> = 0, all others undefined
VASYNC	VASYNC<30> = 0, all others undefined

Appendix C

The architectural state of vector processor 120 comprises 32 32-bit scalar registers; two banks of 32 288-bit vector registers; a pair of 576-bit vector accumulator registers; and a set of 32-bit special-purpose registers. The scalar, vector, and accumulator registers are intended for general-purpose programming and support many different data types.

The following notation is used here and in later sections: VR denotes a vector register; VRi denotes the i-th vector register (zero-based); VR[i] denotes the i-th data element in vector register VR; VR<a:b> denotes bits a through b of vector register VR; and VR[i]<a:b> denotes bits a through b of the i-th data element in vector register VR.
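
The bit-field and element notation can be made concrete with two small helpers. This is a sketch only: the names `bits` and `element` are hypothetical, and placing element 0 in the least significant bits of the register is an assumption made for the example.

```python
def bits(value, a, b):
    """VR<a:b>: bits a down to b of `value`, with a >= b."""
    return (value >> b) & ((1 << (a - b + 1)) - 1)

def element(vr, i, width):
    """VR[i]: the i-th element of `width` bits.

    Assumes element 0 occupies the least significant bits of the register.
    """
    return bits(vr, width * i + width - 1, width * i)

# A register whose 9-bit element 1 is 0x1FF and whose element 0 is 0x0A5.
vr = (0x1FF << 9) | 0x0A5
low_nibble_pair = bits(element(vr, 0, 9), 7, 4)   # VR[0]<7:4>
```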

Because a vector register holds multiple elements, a vector architecture has the additional dimensions of data type and data length. Because a vector register has a fixed size, the number of data elements it holds depends on the element length. The MSP architecture defines five element lengths, as shown in Table C.1.

Table C.1: Data element lengths

Length name	Length (bits)
Boolean	1
Byte	8
Byte 9	9
Halfword	16
Word	32

The MSP architecture interprets vector data according to the data type and length specified in the instruction. Currently, most arithmetic instructions support the two's complement (integer) format for the byte, byte 9, halfword, and word element lengths. In addition, most arithmetic instructions support the IEEE 754 single-precision format for the word element length.

A programmer is free to interpret the data in any desired way, as long as the instruction sequence produces a meaningful result. For example, the programmer is free to store 8-bit unsigned numbers in byte 9 elements, and is equally free to store 8-bit unsigned numbers in byte data elements and operate on them with the supplied two's complement arithmetic instructions, as long as the program can handle "false" overflow results.

There are 32 scalar registers, named SR0 through SR31. Each scalar register is 32 bits wide and can hold one data element of any of the defined lengths. Scalar register SR0 is special: it always reads as 32 zero bits, and writes to SR0 are ignored. The byte, byte 9, and halfword data types are stored in the least significant bits of a scalar register, and the most significant bits have undefined values.

Because the registers have no data type indicator, the programmer must know the data type held in the registers used by each instruction. This differs from other architectures, in which a 32-bit register is assumed to contain a 32-bit value. The MSP architecture specifies that a result of data type A correctly modifies only the bits defined for data type A. For example, the result of a byte 9 add modifies only the low 9 bits of the 32-bit destination scalar register. The values of the upper 23 bits are undefined unless otherwise indicated for an instruction.

The 64 vector registers are organized into two banks of 32 registers each. Bank 0 contains the first 32 registers and bank 1 contains the second 32 registers. One of the two banks is set as the current bank and the other as the alternate bank. All vector instructions use the registers in the current bank by default, except the load/store and register move instructions, which can access the vector registers in the alternate bank. The CBANK bit in the vector control and status register VCSR selects bank 0 or bank 1 as the current bank (the other becomes the alternate bank). The vector registers in the current bank are designated VR0 through VR31, and those in the alternate bank VRA0 through VRA31. ...

VRi<575:0> = VR₁i<287:0> : VR₀i<287:0>

Here VR₀i and VR₁i denote the vector register numbered VRi in bank 0 and bank 1, respectively. The double-width vector registers are referred to as VR0 through VR31.
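
The concatenation that forms a double-width register can be modelled with Python integers, which are wide enough to hold 576 bits. The sketch below is illustrative only; `vec64_register` and the two lists standing in for the register banks are hypothetical names.

```python
MASK288 = (1 << 288) - 1

def vec64_register(bank0, bank1, i):
    """Model VRi<575:0>: bank 1 register i above bank 0 register i."""
    return ((bank1[i] & MASK288) << 288) | (bank0[i] & MASK288)

# Two hypothetical banks of 32 registers, each held as a Python integer.
bank0 = [0] * 32
bank1 = [0] * 32
bank0[3], bank1[3] = 0xAAAA, 0x5555
wide = vec64_register(bank0, bank1, 3)
```

The low 288 bits of `wide` come from bank 0 and the high 288 bits from bank 1, matching the formula above.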

A vector register can hold multiple elements of byte, byte 9, halfword, or word length, as shown in Table C.2.

Table C.2: Number of elements per vector register

Element length name	Element length (bits)	Maximum number of elements	Total number of bits used
Byte 9	9	32	288
Byte	8	32	256
Halfword	16	16	256
Word	32	8	256

Mixing element lengths within a register is not supported. Except for byte 9 elements, only 256 of the 288 bits are used. In particular, every ninth bit is unused. The 32 unused bits in the byte, halfword, and word lengths are reserved, and the programmer should make no assumptions about their values.
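
The element counts in Table C.2 follow directly from the register width and the number of usable bits. The following sketch reproduces them; the names `ELEMENT_BITS` and `max_elements` are hypothetical.

```python
# Element lengths in bits, as listed in Table C.2.
ELEMENT_BITS = {"byte 9": 9, "byte": 8, "halfword": 16, "word": 32}

def max_elements(name):
    """Elements per 288-bit register: only byte 9 uses every ninth bit."""
    usable_bits = 288 if name == "byte 9" else 256
    return usable_bits // ELEMENT_BITS[name]

counts = {name: max_elements(name) for name in ELEMENT_BITS}
```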

The vector accumulator registers provide storage for intermediate results that have higher precision than the result in the destination register. The vector accumulator consists of four 288-bit registers: VAC1H, VAC1L, VAC0H, and VAC0L. The VAC0H:VAC0L pair is used by default by three instructions. The VAC1H:VAC1L pair is used only in VEC64 mode, to emulate 64-element byte 9 vector operations. Even when bank 1 is set as the current bank in VEC32 mode, the VAC0H:VAC0L pair is used.

To produce an extended-precision result with the same number of elements as the source vector registers, each extended-precision element is held across a pair of registers, as shown in Table C.3.

Table C.3: Vector accumulator format

Element length	Logical view	VAC format
Byte 9	VAC[i]<17:0>	VAC0H[i]<8:0>:VAC0L[i]<8:0> for i = 0..31 and VAC1H[i-32]<8:0>:VAC1L[i-32]<8:0> for i = 32..63
Byte	VAC[i]<15:0>	VAC0H[i]<7:0>:VAC0L[i]<7:0> for i = 0..31 and VAC1H[i-32]<7:0>:VAC1L[i-32]<7:0> for i = 32..63
Halfword	VAC[i]<31:0>	VAC0H[i]<15:0>:VAC0L[i]<15:0> for i = 0..15 and VAC1H[i-16]<15:0>:VAC1L[i-16]<15:0> for i = 16..31
Word	VAC[i]<63:0>	VAC0H[i]<31:0>:VAC0L[i]<31:0> for i = 0..7 and VAC1H[i-8]<31:0>:VAC1L[i-8]<31:0> for i = 8..15

The VAC1H:VAC1L pair is used only in VEC64 mode, in which the number of elements is 64, 32, or 16 for byte 9 (and byte), halfword, and word, respectively.
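
The high/low split of an extended-precision accumulator element can be sketched as follows. The helpers `split_element` and `vac_element` are hypothetical and model only the logical view VAC[i] = VACH[i]:VACL[i] for one element; they say nothing about how the hardware stores the halves.

```python
def split_element(value, width):
    """Split a double-width value into (high, low) halves of `width` bits."""
    mask = (1 << width) - 1
    return (value >> width) & mask, value & mask

def vac_element(high, low, width):
    """Logical accumulator element: the high half above the low half."""
    mask = (1 << width) - 1
    return ((high & mask) << width) | (low & mask)

# A 32-bit product of two halfword operands, held as two 16-bit halves.
product = 0x1234 * 0x0100
hi, lo = split_element(product, 16)
```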

There are 33 special-purpose registers that cannot be loaded directly from memory or stored directly to memory. Sixteen special-purpose registers, called RASR0 through RASR15, form an internal return address stack used by the subroutine call and return instructions. The other 17 32-bit special-purpose registers are shown in Table C.4.

Table C.4: Special-purpose registers

Number	Mnemonic	Explanation
SP0	VCSR	Vector control and status register
SP1	VPC	Vector program counter
SP2	VEPC	Vector exception program counter
SP3	VISRC	Vector interrupt source register
SP4	VIINS	Vector interrupt instruction register
SP5	VCR1	Vector count register 1
SP6	VCR2	Vector count register 2
SP7	VCR3	Vector count register 3
SP8	VGMR0	Vector global mask register 0
SP9	VGMR1	Vector global mask register 1
SP10	VOR0	Vector overflow register 0
SP11	VOR1	Vector overflow register 1
SP12	VLABR	Vector instruction address breakpoint register
SP13	VDABR	Vector data address breakpoint register
SP14	VMMR0	Vector move mask register 0
SP15	VMMR1	Vector move mask register 1
SP16	VASYNC	Vector and ARM7 synchronization register

The vector control and status register VCSR is defined in Table C.5.

Table C.5: VCSR Definition

Bit	Mnemonic	Explanation
31:18		Reserved.
17:13	VSP<4:0>	Return address stack pointer. VSP is used by the jump-to-subroutine and return-from-subroutine instructions to track the top of the internal return address stack. Because the return address stack has only 16 entries, VSP<4> is used to detect stack overflow conditions.
12	SO	Summary overflow status flag. Set when the result of an arithmetic operation overflows. Once set, this bit remains set until it is cleared by writing 0 to it.
11	GT	Greater-than status flag. Set by the VSUBS instruction when SRa > SRb.
10	EQ	Equal status flag. Set by the VSUBS instruction when SRa = SRb.
9	LT	Less-than status flag. Set by the VSUBS instruction when SRa < SRb.
8	SMM	Select move mask. When this bit is set, the VMMR0/1 pair becomes the element mask for arithmetic operations.
7	CEM	Complement element mask. When this bit is set, the element mask is defined as the one's complement of VGMR0/1 or VMMR0/1, whichever is configured as the element mask for arithmetic operations. This bit does not change the contents of VGMR0/1 or VMMR0/1; it changes only how these registers are used. The SMM:CEM encoding specifies: 00 - use VGMR0/1 as the element mask for all instructions except VCMOVM. 01 - use the one's complement of VGMR0/1 as the element mask for all instructions except VCMOVM. 10 - use VMMR0/1 as the element mask for all instructions except VCMOVM. 11 - use the one's complement of VMMR0/1 as the element mask for all instructions except VCMOVM. ...
6	OED	Controls overflow exception reporting; see the OED:ISAT encoding under ISAT.
5	ISAT	Integer saturation mode. The OED:ISAT bit combination specifies: 00 - no saturation; an exception is reported on overflow. x1 - saturation; no overflow is caused. 10 - no saturation; no exception is reported on overflow.
4:3	RMODE	IEEE 754 floating-point rounding mode. 00 - round toward negative infinity. 01 - round toward zero. 10 - round to nearest. 11 - round toward positive infinity.
2	FSAT	Floating-point saturation mode bit (fast IEEE mode).
1	CBANK	Current bank bit. When set, bank 1 is the current bank. When cleared, bank 0 is the current bank. CBANK is ignored when the VEC64 bit is set.
0	VEC64	64-element byte 9 vector mode bit. When set, the vector registers and accumulators are specified to have 576 bits. The default mode specifies a length of 32 byte 9 elements and is called VEC32 mode.

The vector program counter register VPC holds the address of the next instruction to be executed by vector processor 120. ARM7 processor 110 should load register VPC before issuing the STARTVP instruction to start operation of vector processor 120.

The vector exception program counter VEPC indicates the address of the instruction most likely to have caused the most recent exception. MSP 100 does not support precise exceptions, hence the term "most likely".

The vector interrupt source register VISRC indicates the interrupt source to ARM7 processor 110. The appropriate bits are set by hardware when an exception is detected. Software must clear register VISRC before vector processor 120 resumes execution. Any bit set in register VISRC causes vector processor 120 to enter the VP_IDLE state. If the corresponding interrupt enable bit in VIMSK is set, an interrupt is sent to processor 110. Table C.6 defines the contents of register VISRC.

Table C.6: VISRC Definition

Bit	Mnemonic	Explanation
31	DAB	Data address breakpoint exception
30	IAB	Instruction address breakpoint exception
29	SSTP	Single-step exception
28-18		Reserved
17	IIA	Invalid instruction address exception
16	IINS	Invalid instruction exception
15	IDA	Invalid data address exception
14	UDA	Unaligned data access exception
13	FOV	Floating-point overflow exception
12	FINV	Floating-point invalid operand exception
11	FDIV	Floating-point divide-by-zero exception
10	IOV	Integer overflow exception
9	IDIV	Integer divide-by-zero exception
8	RASO	Return address stack overflow exception
7	RASU	Return address stack underflow exception
6	VIP	VCINT exception pending. Executing the STARTVP instruction clears this bit.
5	VJP	VCJOIN exception pending. Executing the STARTVP instruction clears this bit.
4-0	VPEV	Vector processor exception vector

The vector interrupt instruction register VIINS is updated with the VCINT or VCJOIN instruction when a VCINT or VCJOIN instruction is executed to interrupt ARM7 processor 110.

The vector count registers VCR1, VCR2, and VCR3 are used by the decrement-and-branch instructions VD1CBR, VD2CBR, and VD3CBR, and are initialized with the number of loop iterations to be performed. When a VD1CBR instruction is executed, register VCR1 is decremented by 1. If the count is not zero and the condition specified in the instruction matches VFLAG, the branch is taken; otherwise, the branch is not taken. Register VCR1 is decremented by 1 in either case. Registers VCR2 and VCR3 are used in the same way.
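
The decrement-and-branch behaviour can be sketched as a single step function. This is an illustrative model under two assumptions that the text does not state explicitly: the zero test is applied to the count after it has been decremented, and the count wraps as a 32-bit value. The name `vdcbr` is hypothetical.

```python
def vdcbr(count, condition_matches):
    """Return (new_count, branch_taken) for one decrement-and-branch step.

    Assumption: the zero test uses the already-decremented count.
    """
    count = (count - 1) & 0xFFFFFFFF   # the count register is 32 bits wide
    return count, (count != 0 and condition_matches)

# A loop whose count register is initialised to 3 runs its body three times.
count, iterations = 3, 0
while True:
    iterations += 1                    # loop body
    count, taken = vdcbr(count, True)
    if not taken:
        break
```

Note that the count is decremented even when the condition does not match, as the text specifies.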

The vector global mask register VGMR0 indicates the elements of the destination vector register that are affected in VEC32 mode, and the elements within VR<287:0> that are affected in VEC64 mode. Each bit in VGMR0 controls the update of 9 bits in the vector destination register. Specifically, VGMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR₀d<9i+8:9i> in VEC64 mode. Note that VR₀d denotes the destination register in bank 0 in VEC64 mode, while VRd denotes the destination register in the current bank, which in VEC32 mode can be either bank 0 or bank 1. Register VGMR0 is used in the execution of all instructions except VCMOVM.

The vector global mask register VGMR1 indicates the elements within VR<575:288> that are affected in VEC64 mode. Each bit in register VGMR1 controls the update of 9 bits in the vector destination register in bank 1. Specifically, VGMR1<i> controls the update of VR₁d<9i+8:9i>. Register VGMR1 is not used in VEC32 mode, but in VEC64 mode it affects the execution of all instructions except VCMOVM.
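
The effect of a global mask on a destination register can be illustrated with a per-slot update: mask bit i decides whether bits 9i+8 down to 9i of the destination take the new value. The sketch below models one 288-bit register with a 32-bit mask; the function name `masked_update` is hypothetical.

```python
def masked_update(old, new, mask, slots=32):
    """Update 9-bit slot i of `old` from `new` only where mask bit i is set."""
    result = old
    for i in range(slots):
        if (mask >> i) & 1:
            slot = 0x1FF << (9 * i)            # bits 9i+8 down to 9i
            result = (result & ~slot) | (new & slot)
    return result

old = (0x001 << 9) | 0x002                      # slot 1 = 0x001, slot 0 = 0x002
new = (0x0AA << 9) | 0x0BB                      # slot 1 = 0x0AA, slot 0 = 0x0BB
out = masked_update(old, new, mask=0b01)        # only slot 0 is enabled
```

With only mask bit 0 set, slot 0 takes the new value while slot 1 keeps its old contents.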

The vector overflow register VOR0 indicates the elements in VEC32 mode, and the elements within VR<287:0> in VEC64 mode, that contain an overflow result after a vector arithmetic operation. This register is not modified by scalar arithmetic operations. Bit i of VOR0 being set indicates that the i-th element for byte and byte 9, the (i idiv 2)-th element for halfword, or the (i idiv 4)-th element for word data type operations contains an overflow result. For example, bits 1 and 3 may be set to indicate overflow of the first halfword element and the first word element, respectively. The mapping of bits in VOR0 differs from the mapping of bits in VGMR0 or VGMR1.

The vector overflow register VOR1 indicates the elements within VR<575:288> in VEC64 mode that contain an overflow result after a vector arithmetic operation. Register VOR1 is not used in VEC32 mode and is not modified by scalar arithmetic operations. Bit i of VOR1 being set indicates that the i-th element for byte or byte 9, the (i idiv 2)-th element for halfword, or the (i idiv 4)-th element for word data type operations contains an overflow result. For example, bits 1 and 3 may be set to indicate overflow of the first halfword element or the first word element, respectively, in VR<575:288>. The mapping of bits in VOR1 differs from the mapping of bits in VGMR0 or VGMR1.

The vector instruction address breakpoint register VLABR aids in debugging vector programs. The register is defined in Table C.7.

Table C.7: VLABR Definition

Bit	Mnemonic	Explanation
31-2	IADR	Vector instruction address. Undefined at reset.
1		Reserved bit.
0	IABE	Instruction address breakpoint enable. Undefined at reset. If set, a vector instruction address breakpoint exception occurs when the vector instruction access address matches VLABR<31:2>, and bit VISRC<IAB> is set to indicate the exception. The exception is reported before the instruction is executed.

The vector data address breakpoint register VDABR aids in debugging vector programs. The register is defined in Table C.8.

Table C.8: VDABR Definition

Bit	Mnemonic	Explanation
31-2	DADR	Vector data address. Undefined at reset.
1	SABE	Store address breakpoint enable. Undefined at reset. If set, a vector data address breakpoint exception occurs when the vector store access address matches VDABR<31:2>, and bit VISRC<DAB> is set to indicate the exception. The exception is reported before the store instruction is executed.
0	LABE	Load address breakpoint enable. Cleared on reset. If set, a vector data address breakpoint exception occurs when the vector load access address matches VDABR<31:2>, and bit VISRC<DAB> is set to indicate the exception. The exception is reported before the load instruction is executed.

The vector move mask register VMMR0 is used at all times by the VCMOVM instruction, and by all instructions when VCSR<SMM> = 1. Register VMMR0 indicates the elements of the destination vector register that are affected in VEC32 mode, and the elements within VR<287:0> that are affected in VEC64 mode. Each bit in VMMR0 controls the update of 9 bits in the vector destination register. Specifically, VMMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR₀d<9i+8:9i> in VEC64 mode. In VEC64 mode VR₀d denotes the destination register in bank 0, while VRd denotes the destination register in the current bank, which in VEC32 mode can be either bank 0 or bank 1.

The vector move mask register VMMR1 is used at all times by the VCMOVM instruction, and by all instructions when VCSR<SMM> = 1. Register VMMR1 indicates the elements within VR<575:288> that are affected in VEC64 mode. Each bit in VMMR1 controls the update of 9 bits in the vector destination register in bank 1. Specifically, VMMR1<i> controls the update of VR₁d<9i+8:9i>. Register VMMR1 is not used in VEC32 mode.

The vector and ARM7 synchronization register VASYNC provides producer/consumer type synchronization between processors 110 and 120. Currently, only bit 30 is defined. An ARM7 process can access register VASYNC with the MFER, MTER, and TESTSET instructions while vector processor 120 is in the VP_RUN or VP_IDLE state. Register VASYNC cannot be accessed by an ARM7 process through the MTVP or MFVP instructions, because these instructions cannot access registers beyond the first 16 vector processor special-purpose registers. A vector process accesses register VASYNC through the VMOV instruction.

Table C.9 shows the state of the vector processor at power-on reset.

Table C.9: Vector processor state at power-on reset

Register	Reset state
Register	Reset state	SR0	0
All other registers	Undefined	SR0	0

Before the vector processor can execute instructions, ARM7 processor 110 initializes the special registers.

Appendix D

Each instruction implies or requires particular data types for its source and destination operands. Some instructions have semantics that apply equally to more than one data type. Some instructions have semantics that take one data type for the source and produce a different data type for the result. This appendix describes the data types supported in the exemplary embodiment. Table 1 of this application describes the supported data types int8, int9, int16, int32 and float. Unsigned integer formats are not supported; an unsigned integer value must first be converted to two's complement format before it is used. The programmer is free to use the arithmetic instructions with unsigned integers or any other format of choice, as long as overflow is handled properly. The architecture defines overflow only for two's complement integers and the 32-bit floating-point data type. The architecture does not detect the carry out of 8-, 9-, 16- or 32-bit operations that would be necessary to detect unsigned overflow. Table D.1 shows the data sizes supported by load operations.

Table D.1: Data sizes supported by load operations

Data size in memory	Data size in register	Load operation
8-bit	9-bit	Load 8 bits, sign-extended to 9 bits (for loading 8-bit two's complement)
8-bit	9-bit	Load 8 bits, zero-extended to 9 bits (for loading unsigned 8-bit)
16-bit	16-bit	Load 16 bits (for loading 16-bit unsigned or two's complement)
32-bit	32-bit	Load 32 bits (for loading 32-bit unsigned, two's complement integer or 32-bit floating point)
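The two 8-bit load variants in Table D.1 differ only in how the ninth bit is filled. The following is a minimal Python sketch, assuming register elements are modeled as plain integers; the helper names are illustrative and are not part of the architecture.

```python
def load8_zero_extend_to_9(byte: int) -> int:
    """Model loading an unsigned 8-bit value into a 9-bit element."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return byte  # bit 8 stays 0


def load8_sign_extend_to_9(byte: int) -> int:
    """Model loading an 8-bit two's complement value into a 9-bit element."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    sign = (byte >> 7) & 1          # bit 7 is the sign of the 8-bit value
    return byte | (sign << 8)       # copy it into bit 8


def int9_value(bits: int) -> int:
    """Interpret a 9-bit pattern as a two's complement integer."""
    return bits - 0x200 if bits & 0x100 else bits
```

With these helpers, the memory byte 0xFF reads back as -1 after a sign-extending load and as 255 after a zero-extending load.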

The architecture requires memory addresses to be aligned on the boundary of the data type. That is, a byte has no alignment requirement; a halfword must be aligned on a halfword boundary; a word must be aligned on a word boundary.
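The alignment rule can be expressed as a simple address check. This is a sketch only, assuming byte addressing with 2-byte halfwords and 4-byte words (consistent with the 16-bit and 32-bit sizes in Table D.1); the function name is hypothetical.

```python
# Size in bytes of each memory data type (assumed: byte-addressed memory).
ALIGNMENT = {"byte": 1, "halfword": 2, "word": 4}


def is_aligned(address: int, size: str) -> bool:
    """Return True if the address satisfies the alignment rule for the size."""
    return address % ALIGNMENT[size] == 0
```

An access for which `is_aligned` returns False corresponds to the unaligned data access case.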

Table D.2 shows the data sizes supported by store operations.

Table D.2: Data sizes supported by store operations

Data size in register	Data size in memory	Store operation
8-bit	8-bit	Store 8 bits (stores 8-bit unsigned or two's complement)
9-bit	8-bit	Truncate to the lower 8 bits and store 8 bits (stores a 9-bit two's complement value between 0 and 255, unsigned)
16-bit	16-bit	Store 16 bits (stores 16-bit unsigned or two's complement)
32-bit	32-bit	Store 32 bits
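The 9-bit to 8-bit store in Table D.2 simply drops the ninth bit. A minimal sketch, assuming a 9-bit element is held in a Python integer (the function name is illustrative):

```python
def store_byte9_as_8(bits9: int) -> int:
    """Truncate a 9-bit element to its lower 8 bits for storing to memory."""
    if not 0 <= bits9 <= 0x1FF:
        raise ValueError("expected a 9-bit value")
    return bits9 & 0xFF
```

A 9-bit two's complement value in the range 0 to 255 has its ninth bit clear, so it is stored unchanged as an unsigned byte.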

Because more than one data type is mapped to the registers, whether scalar or vector, some bits in the destination register may have no defined result for some data types. In fact, except for byte9 data size operations on a vector destination register and word data size operations on a scalar destination register, there are bits in the destination register whose values are not defined by the operation. For these bits, the architecture specifies that their values are undefined. Table D.3 shows the undefined bits for each data size.

Table D.3: Undefined bits for each data size

Data size	Vector destination register	Scalar destination register
Byte	VR<9i+8>, for i = 0 to 31	SR<31:8>
Byte9	none	SR<31:9>
Halfword	VR<9i+8>, for i = 0 to 31	SR<31:16>
Word	VR<9i+8>, for i = 0 to 31	none

When programming, the programmer must be aware of the data types of the source and destination registers or memory. Converting from one element size to another potentially results in a different number of elements being stored in a vector register. For example, converting a vector register from the halfword to the word data type requires two vector registers to hold the same number of converted elements. Conversely, converting from the word data type, which may have a user-defined format in the vector register, to the halfword format produces the same number of elements in one half of the vector register, with the remaining bits in the other half. In either case, data type conversion raises an architectural issue with the arrangement of the converted elements, whose size differs from that of the source elements.

The special instructions described in Appendix E, such as VSHFLL and VUNSHFLL, simplify the conversion of a vector with one data size into a vector with a second data size. The basic steps for converting two's complement data in vector register VRa from a smaller element size (e.g. int8) to a larger element size (e.g. int16) are: ...

The basic steps for converting two's complement data in vector register VRa from a larger element size (e.g. int16) to a smaller element size (e.g. int8) are:

1. Verify that each element of the int16 data type can be represented in byte size. If necessary, saturate the elements at both ends so that they fit the smaller size.

2. Unshuffle the elements of VRa with another vector VRb into two vectors VRc:VRd. The high half of each element in VRa:VRb is transferred to VRc and the low half to VRd, so that the lower half of VRd effectively collects the low halves of all elements of VRa.

Special instructions are provided for the following data type conversions: int32 to single-precision floating point; single-precision floating point to fixed point (X.Y notation); single-precision floating point to int32; int8 to int9; int9 to int16; and int16 to int9.

To provide flexibility in vector programming, most vector instructions use an element mask to operate only on selected elements within a vector register. The "vector global mask registers" VGMR0 and VGMR1 identify the elements in the destination register and the vector accumulator that are modified by vector instructions. For byte and byte9 data size operations, each of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on. Bit VGMR0<i> being set indicates that element i of byte size is affected, where i is 0 to 31. For halfword data size operations, each pair of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on. Bits VGMR0<2i:2i+1> being set indicates that element i is affected, where i is 0 to 15. If only one bit of a pair in VGMR0 is set for a halfword data size operation, only the bits of the corresponding byte are modified. For word data size operations, each set of four bits in VGMR0 (or VGMR1) identifies one element to be operated on. Bits VGMR0<4i:4i+3> being set indicates that element i is affected, where i is 0 to 7. If not all bits in a set of four in VGMR0 are set for a word data size operation, only the bits of the corresponding bytes are modified.
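The mask rules above amount to one mask bit per byte position, grouped into 1, 2 or 4 bits per element. The following Python sketch illustrates this under simplifying assumptions: a register is modeled as a sequence of 8-bit bytes (the ninth bits are ignored), mask bit k is taken to govern byte k, and all names are hypothetical.

```python
# Number of mask bits that govern one element of each data size.
BITS_PER_ELEMENT = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def element_mask_bits(vgmr: int, i: int, size: str) -> int:
    """Return the mask bits that govern element i, as a small integer."""
    n = BITS_PER_ELEMENT[size]
    return (vgmr >> (n * i)) & ((1 << n) - 1)


def masked_update(dest: bytes, result: bytes, vgmr: int) -> bytes:
    """Write result into dest only at byte positions whose mask bit is set."""
    return bytes(
        r if (vgmr >> k) & 1 else d
        for k, (d, r) in enumerate(zip(dest, result))
    )
```

For example, a halfword operation with only one bit of a pair set modifies just the corresponding byte of that element, which is what `masked_update` does when it works byte by byte.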

For flexibility in vector programming, most MSP instructions support three forms of vector and scalar operations, as follows:

1. vector = vector op vector

2. vector = vector op scalar

3. scalar = scalar op scalar

In case 2, where a scalar register is specified as the B operand, the single element in the scalar register is replicated as many times as needed to match the number of elements in the vector A operand. The replicated elements all have the same value as the element in the specified scalar operand. The scalar operand can come from a scalar register or from the instruction, in the form of an immediate operand. In the case of an immediate operand, appropriate sign extension is applied if the specified data type uses a data size larger than the available immediate field size.
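The replication of a scalar operand can be sketched as follows. This is an illustration only: vectors are modeled as Python lists, the function names are hypothetical, and the 5-bit immediate width used in the usage note is an arbitrary assumption, since the text above does not give the field width.

```python
def sign_extend(value: int, from_bits: int) -> int:
    """Sign-extend the low from_bits bits of value to a Python integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return value - (1 << from_bits) if value & sign_bit else value


def vector_op_scalar(op, vec_a, scalar_b):
    """Compute vector = vector op scalar by replicating the scalar operand."""
    replicated = [scalar_b] * len(vec_a)   # one copy per element of A
    return [op(a, b) for a, b in zip(vec_a, replicated)]
```

For instance, `vector_op_scalar(lambda a, b: a + b, [1, 2, 3], sign_extend(0x1F, 5))` adds -1 to every element when the immediate field is assumed to be 5 bits wide.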

In many multimedia applications, close attention must be paid to the precision of the source, intermediate and final results. In addition, the integer multiply instructions produce "double precision" intermediate results that can be stored in two vector registers.

In general, the MSP architecture supports two's complement integer formats for 8-, 9-, 16- and 32-bit elements, and the IEEE 754 single-precision format for 32-bit elements. An overflow is defined as a result beyond the most positive or most negative value that the specified data type can represent. When an overflow occurs, the value written to the destination register is not a valid number. Underflow is defined only for floating-point operations.

Unless otherwise noted, all floating-point operations use one of the four rounding modes specified in bits VCSR<RMODE>. Some instructions use the rounding mode known as round away from zero (round even). These instructions are explicitly noted.

Saturation is an important feature in many multimedia applications. The MSP architecture supports saturation in all four integer and floating-point operations. Bit ISAT in register VCSR specifies the integer saturation mode. The floating-point saturation mode, also known as fast IEEE mode, is specified by bit FSAT in VCSR. When saturation mode is enabled, a result beyond the most positive or most negative value is set to the most positive or most negative value, respectively. In this case no overflow can occur, and the overflow bit cannot be set.
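Integer saturation can be illustrated with a short clamp to the two's complement range of the element size. This is a minimal sketch with hypothetical function names; it models only the arithmetic result, not the VCSR bits.

```python
def saturate_int(value: int, bits: int) -> int:
    """Clamp value to the two's complement range of a bits-wide element."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def add_saturating(a: int, b: int, bits: int) -> int:
    """Add two elements and saturate instead of overflowing."""
    return saturate_int(a + b, bits)
```

With 8-bit elements, adding 100 and 100 yields 127 rather than a wrapped value, and the same sum fits without clamping in a 9-bit element.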

Table D.4 lists the precise exceptions, which are detected and reported before the faulting instruction executes. The exception vector addresses are given in hexadecimal notation.

Table D.4: Precise exceptions

Exception vector	Explanation
0x00000018	Vector processor instruction address breakpoint exception
0x00000018	Vector processor data address breakpoint exception
0x00000018	Vector processor invalid instruction exception
0x00000018	Vector processor single step exception
0x00000018	Vector processor return address stack overflow exception
0x00000018	Vector processor return address stack underflow exception
0x00000018	Vector processor VCINT exception
0x00000018	Vector processor VCJOIN exception

Table D.5 lists the imprecise exceptions, which are detected and reported after some number of instructions that follow the faulting instruction in program order have been executed.

Table D.5: Imprecise exceptions

Exception vector	Explanation
0x00000018	Vector processor invalid instruction address exception
0x00000018	Vector processor invalid data address exception
0x00000018	Vector processor unaligned data access exception
0x00000018	Vector processor integer overflow exception
0x00000018	Vector processor floating-point overflow exception
0x00000018	Vector processor floating-point invalid operand exception
0x00000018	Vector processor floating-point divide by zero exception
0x00000018	Vector processor integer divide by zero exception

Appendix E

The vector processor instructions fall into the eleven classes shown in Table E.1.

Table E.1: Vector instruction class summary

Class	Explanation
Control flow	This class includes the instructions for branching and for interfacing with the ARM7 to control program flow.
Logical (bitwise, masked)	This class includes the bitwise logical instructions. Although the underlying data type is Boolean, the logical instructions use the element mask to modify the results and therefore require a data type.
Shift and rotate (element-wise, masked)	This class includes the instructions that shift and rotate the bits within each element. The class distinguishes element sizes and is affected by the element mask.
Arithmetic (element-wise, masked)	This class includes the element-wise arithmetic instructions; that is, the result of the i-th element is computed from the i-th elements of the sources. The class distinguishes element types and is affected by the element mask.
Multimedia (element-wise, masked)	This class includes instructions optimized for multimedia applications. The class distinguishes element types and is affected by the element mask.
Data type conversion (element-wise, unmasked)	This class includes the instructions that convert elements from one data type to another. The instructions in this class support a specified set of data types and are not subject to the element mask, since the architecture does not support more than one data type in a register.
Inter-element arithmetic	This class includes the instructions that take two elements from different positions in a vector to produce an arithmetic result.
Inter-element move	This class includes the instructions that take two elements from different positions in a vector to rearrange the elements.
Load/store	This class includes the instructions that load or store registers. These instructions are not affected by the element mask.
Cache operation	This class includes the instructions that control the instruction and data caches. These instructions are not affected by the element mask.
Register move	This class includes the instructions that transfer data between two registers. These instructions are generally not affected by the element mask, but some have the option of being masked.

Table E.2 lists the flow control instructions.

Table E.2: Flow control instructions

Mnemonic	Explanation
VCBR	Conditional branch
VCBRI	Conditional branch indirect
VD1CBR	Decrement VCR1 and conditional branch
VD2CBR	Decrement VCR2 and conditional branch
VD3CBR	Decrement VCR3 and conditional branch
VCJSR	Conditional jump to subroutine
VCJSRI	Conditional jump to subroutine indirect
VCRSR	Conditional return from subroutine
VCINT	Conditional interrupt of ARM7
VCJOIN	Conditional join with ARM7
VCCS	Conditional context switch
VCBARR	Conditional barrier
VCHGCR	Change control register (VCSR)

The logical class supports the Boolean data type and is affected by the element mask. Table E.3 lists the logical instructions.

Table E.3: Logical instructions

Mnemonic	Explanation
VNOT	NOT: ~B
VAND	AND: (A & B)
VCAND	Complement AND: (~A & B)
VANDC	AND complement: (A & ~B)
VNAND	NAND: ~(A & B)
VOR	OR: (A | B)
VCOR	Complement OR: (~A | B)
VORC	OR complement: (A | ~B)
VNOR	NOR: ~(A | B)
VXOR	XOR: (A ^ B)
VXNOR	Exclusive NOR: ~(A ^ B)

The shift/rotate class instructions operate on the int8, int9, int16 and int32 data types (not the floating-point data type) and are affected by the element mask. Table E.4 lists the shift/rotate class instructions.

Table E.4: Shift and rotate class

Mnemonic	Explanation
VDIV2N	Divide by a power of 2
VLSL	Logical shift left
VLSR	Logical shift right
VROL	Rotate left
VROR	Rotate right

In general, the arithmetic class instructions support the int8, int9, int16, int32 and floating-point data types and are affected by the element mask. For specific limitations on unsupported data types, see the detailed description of each instruction below. The VCMPV instruction is not affected by the element mask, since it operates on the element mask itself. Table E.5 lists the arithmetic class instructions.

Table E.5: Arithmetic class

Mnemonic	Explanation
VASR	Arithmetic shift right
VADD	Add
VAVG	Average
VSUB	Subtract
VASUB	Absolute value of subtract
VMUL	Multiply
VMULA	Multiply to accumulator
VMULAF	Multiply to accumulator, fraction
VMULF	Multiply fraction
VMULFR	Multiply fraction and round
VMULL	Multiply low
VMAD	Multiply and add
VMADL	Multiply and add low
VADAC	Add and accumulate
VADACL	Add and accumulate low
VMAC	Multiply and accumulate
VMACF	Multiply and accumulate, fraction
VMACL	Multiply and accumulate low
VMAS	Multiply and subtract from accumulator
VMASF	Multiply and subtract from accumulator, fraction
VMASL	Multiply and subtract from accumulator low
VSATU	Saturate to upper limit
VSATL	Saturate to lower limit
VSUBS	Subtract scalar and set condition
VCMPV	Compare vectors and set mask
VDIVI	Divide initialize
VDIVS	Divide step
VASL	Arithmetic shift left
VASA	Arithmetic shift accumulator

The MPEG instructions are a class of instructions specially suited to MPEG encoding and decoding, but they may be used in a variety of ways. The MPEG instructions support the int8, int9, int16 and int32 data types and are affected by the element mask. Table E.6 lists the MPEG instructions.

Table E.6: MPEG class

Mnemonic	Explanation
VAAS3	Add and add sign of (-1, 0, 1)
VASS3	Add and subtract sign of (-1, 0, 1)
VEXTSGN2	Extract sign of (-1, 1)
VEXTSGN3	Extract sign of (-1, 0, 1)
VXORALL	XOR the least significant bits of all elements

Each data type conversion instruction supports specific data types and is not affected by the element mask, since the architecture does not support more than one data type in a register. Table E.7 lists the data type conversion instructions.

Table E.7: Data type conversion class

Mnemonic	Explanation
VCVTIF	Convert integer to floating point
VCVTFF	Convert floating point to fixed point
VROUND	Round floating point to integer (supports the four IEEE rounding modes)
VCNTLZ	Count leading zeros
VCVTB9	Convert byte9 data type

The inter-element arithmetic class instructions support the int8, int9, int16, int32 and floating-point data types.

Table E.8 lists the inter-element arithmetic class instructions.

Table E.8: Inter-element arithmetic class

Mnemonic	Explanation
VADDH	Add two adjacent elements
VAVGH	Average two adjacent elements
VAVGQ	Average four elements
VMAXE	Maximum, exchange even/odd elements

The inter-element move class instructions support the byte, byte9, halfword and word data sizes. Table E.9 lists the inter-element move class instructions.

Table E.9: Inter-element move class

Mnemonic	Explanation
VESL	Element shift left by one
VESR	Element shift right by one
VSHFL	Shuffle even/odd elements
VSHFLH	Shuffle even/odd elements, high
VSHFLL	Shuffle even/odd elements, low
VUNSHFL	Unshuffle even/odd elements
VUNSHFLH	Unshuffle even/odd elements, high
VUNSHFLL	Unshuffle even/odd elements, low

In addition to the byte, halfword and word data sizes, the load/store instructions support special byte9-related data size operations, and they are not affected by the element mask. Table E.10 lists the load/store class instructions.

Table E.10: Load/store class

Mnemonic	Explanation
VL	Load
VLD	Load double word
VLQ	Load quadword
VLCB	Load from circular buffer
VLR	Load elements in reverse order
VLWS	Load with stride
VST	Store
VSTD	Store double word
VSTQ	Store quadword
VSTCB	Store to circular buffer
VSTR	Store elements in reverse order
VSTWS	Store with stride

Most register move instructions support the int8, int9, int16, int32 and floating-point data types and are not affected by the element mask. Only the VCMOVM instruction is affected by the element mask. Table E.11 lists the register move class instructions.

Table E.11: Register move class

Mnemonic	Explanation
VLI	Load immediate
VMOV	Move
VCMOV	Conditional move
VCMOVM	Conditional move with element mask
VEXTRT	Extract an element
VINSERT	Insert an element

Table E.12 lists the cache operation class instructions, which control cache subsystem 130.

Table E.12: Cache operation class

Mnemonic	Explanation
VCACHE	Cache operation on the data or instruction cache
VPFTCH	Prefetch into the data cache
VWBACK	Write back from the data cache

Instruction description nomenclature

To simplify the description of the instruction set, special terminology is used throughout this appendix. For example, instruction operands are signed two's complement integers of byte, byte9, halfword or word size, unless otherwise noted. The term "register" refers to the general-purpose (scalar or vector) registers; other kinds of registers are described explicitly. In the assembly language syntax, the suffixes b, b9, h and w denote both the data sizes (byte, byte9, halfword and word) and the integer data types (int8, int9, int16 and int32). In addition, the terminology and symbols used to describe instruction operands, operations and the assembly language syntax are as follows.

Rd	destination register (vector, scalar or special-purpose)

Ra, Rb	source registers a and b (vector, scalar or special-purpose)

Rc	source or destination register c (vector or scalar)

Rs	store data source register (vector or scalar)

S	32-bit scalar or special-purpose register

VR	vector register in the current bank

VRA	vector register in the alternate bank

VR0	vector register in bank 0

VR1	vector register in bank 1

VRd	vector destination register (defaults to the current bank unless VRA is specified)

VRa, VRb	vector source registers a and b

VRc	vector source or destination register c

VRs	vector store data source register

VAC0H	vector accumulator register 0, high

VAC0L	vector accumulator register 0, low

VAC1H	vector accumulator register 1, high

VAC1L	vector accumulator register 1, low

SRd	scalar destination register

SRa, SRb	scalar source registers a and b

SRb+	update the base register with the effective address

SRs	scalar store data source register

SP	special-purpose register

VR[i]	the i-th element of vector register VR

VR[i]<a:b>	bits a to b of the i-th element of vector register VR

VR[i]<msb>	the most significant bit of the i-th element of vector register VR

EA	effective address for a memory access

MEM	memory

BYTE[EA]	the byte in memory addressed by EA

HALF[EA]	the halfword in memory addressed by EA; bits <15:8> are addressed by EA+1

WORD[EA]	the word in memory addressed by EA; bits <31:24> are addressed by EA+3

NumElem	the number of elements for the given data type. In VEC32 mode it is 32, 16 or 8 for the byte and byte9, halfword or word data sizes, respectively; in VEC64 mode it is 64, 32 or 16 for the byte and byte9, halfword or word data sizes, respectively. For scalar operations, NumElem is 0.

EMASK[i]	the element mask for the i-th element. It represents 1, 2 or 4 bits in VGMR0/1, ~VGMR0/1, VMMR0/1 or ~VMMR0/1 for the byte and byte9, halfword or word data sizes, respectively. For scalar operations, the element mask is assumed to be set even if EMASK[i] = 0.

MMASK[i]	the element mask for the i-th element. It represents 1, 2 or 4 bits in VMMR0 or VMMR1 for the byte and byte9, halfword or word data sizes, respectively.

VCSR	vector control and status register

VCSR<x>	one or more bits of VCSR; "x" is the field name

VPC	vector processor program counter

VECSIZE	vector register size: 32 in VEC32 mode and 64 in VEC64 mode

SPAD	scratch pad
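The NumElem and VECSIZE definitions above can be checked with a small calculation. This is a sketch under the assumption that each element occupies 1, 2 or 4 byte positions of the vector register; the function and table names are illustrative.

```python
# Vector register size in byte positions for each mode (from VECSIZE).
VECSIZE = {"VEC32": 32, "VEC64": 64}

# Byte positions occupied by one element of each data size.
BYTES_PER_ELEMENT = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def num_elem(mode: str, size: str, scalar: bool = False) -> int:
    """Return NumElem for the given mode and data size (0 for scalar ops)."""
    if scalar:
        return 0
    return VECSIZE[mode] // BYTES_PER_ELEMENT[size]
```

The values returned match the list in the NumElem definition: 32, 16 and 8 in VEC32 mode, and 64, 32 and 16 in VEC64 mode.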

C programming constructs are used to describe the control flow of the operations. Some exceptions are noted below:

=	assignment

:	concatenation

{X ‖ Y}	selection between X and Y (not a logical OR)

sex	sign-extend to the specified data size

sex_dp	sign-extend to double precision of the specified data size

sign>>	(arithmetic) shift right with sign extension

zex	zero-extend to the specified data size

zero>>	(logical) shift right with zero extension

<<	shift left (zero fill)

trnc7	truncate the leading 7 bits (from a halfword)

trnc1	truncate the leading 1 bit (from a byte9)

%	modulo operator

|expression|	absolute value of the expression

/	divide (for the floating-point data type, uses one of the four IEEE rounding modes)

//	divide (uses the round away from zero rounding mode)

saturate()	for integer data types, saturate to the most negative or most positive value instead of producing an overflow; for the floating-point data type, saturation can be to positive infinity, positive zero, negative zero or negative infinity.

The general instruction formats are shown in Figure 8 and described below.

The REAR format is used by the load, store and cache operation instructions. The fields of the REAR format have the meanings given in Table E.13.

Table E.13: REAR format

Field	Meaning
OPC<4:0>	Opcode
B	Bank identifier for register Rn
D	Destination/source scalar register. When set, Rn<4:0> denotes a scalar register. In VEC32 mode, the legal values of the B:D encoding are: 00, Rn is a vector register in the current bank; 01, Rn is a scalar register (in the current bank); 10, Rn is a vector register in the alternate bank; 11, undefined. In VEC64 mode, the legal values of the B:D encoding are: 00, only 4, 8, 16 or 32 bytes of vector register Rn are used; 01, Rn is a scalar register; 10, all 64 bytes of vector register Rn are used; 11, undefined.
TT<1:0>	Transfer type, indicating the specific load or store operation. See the LT and ST encoding tables below.
C	Cache off. Set this bit to bypass the data cache for the load or store. This bit is set with the cache-off form of the load and store mnemonics (concatenate OFF to the mnemonic).
A	Address update. Set this bit to update SRb with the effective address. The effective address is computed as SRb + SRi.
Rn<4:0>	Destination/source register number
SRb<4:0>	Scalar base register number
SRi<4:0>	Scalar index register number

Bits 17:15 are reserved and should be zero to ensure compatibility with future extensions of the architecture. Some encodings of the B:D and TT fields are undefined. The programmer should not use these encodings, because the architecture does not specify the expected result when such an encoding is used. Table E.14 shows the scalar load operations supported in both VEC32 and VEC64 modes (the TT field is encoded as LT).

Table E.14: REAR load operations in VEC32 and VEC64 modes

D:LT	Mnemonic	Meaning
100	.bs9	Load 8 bits into byte9 size, sign-extended
101	.h	Load 16 bits into halfword size
110	.bz9	Load 8 bits into byte9 size, zero-extended
111	.w	Load 32 bits into word size

Table E.15 shows the vector load operations supported in VEC32 mode (the TT field is encoded as LT), when bit VCSR<0> is cleared.

Table E.15: REAR load operations in VEC32 mode

D:LT	Mnemonic	Meaning
000	.4	Load 4 bytes from memory into the lower 4 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 4 byte9s is sign-extended from the corresponding 8th bit.
001	.8	Load 8 bytes from memory into the lower 8 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 8 byte9s is sign-extended from the corresponding 8th bit.
010	.16	Load 16 bytes from memory into the lower 16 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 16 byte9s is sign-extended from the corresponding 8th bit.
011	.32	Load 32 bytes from memory into the lower 32 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 32 byte9s is sign-extended from the corresponding 8th bit.
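The partial vector loads of Table E.15 can be modeled as follows. This is a simplified sketch, assuming a VEC32 register is a list of 32 nine-bit integers and that memory byte i lands in element i; the function name is hypothetical.

```python
def vector_load(register, memory, count):
    """Load count bytes into the lower count byte9 elements of a register.

    register: list of 32 nine-bit integers (left unmodified; a copy is returned)
    memory:   bytes-like object holding at least count bytes
    count:    4, 8, 16 or 32, matching the .4, .8, .16 and .32 mnemonics
    """
    if count not in (4, 8, 16, 32):
        raise ValueError("unsupported transfer size")
    if len(memory) < count:
        raise ValueError("not enough bytes in memory")
    updated = list(register)
    for i in range(count):
        byte = memory[i]
        # The 9th bit is a copy of the 8th bit (sign extension).
        updated[i] = byte | ((byte >> 7) << 8)
    return updated  # elements count..31 keep their previous values
```

Calling `vector_load(reg, data, 4)` changes only elements 0 to 3 and leaves the other 28 elements as they were.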

Bit B is used to indicate the current or the alternate bank.

Table E.16 shows the vector load operations supported in VEC64 mode (the TT field is encoded as LT), when bit VCSR<0> is set.

Table E.16: REAR load operations in VEC64 mode

B:D:LT	Mnemonic	Meaning
0000	.4	Load 4 bytes from memory into the lower 4 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 4 byte9s is sign-extended from the corresponding 8th bit.
0001	.8	Load 8 bytes from memory into the lower 8 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 8 byte9s is sign-extended from the corresponding 8th bit.
0010	.16	Load 16 bytes from memory into the lower 16 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 16 byte9s is sign-extended from the corresponding 8th bit.
0011	.32	Load 32 bytes from memory into the lower 32 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 32 byte9s is sign-extended from the corresponding 8th bit.
1000	Undefined
1001	Undefined
1010	Undefined
1011	.64	Load 64 bytes from memory into the lower 64 byte9s of the register and leave the remaining byte9s unchanged. The 9th bit of each of the 64 byte9s is sign-extended from the corresponding 8th bit.

Bit B is used to indicate a 64-byte vector operation, because the concept of current and alternate groups does not exist in VEC64 mode.

Table E.17 lists the scalar store operations supported in both VEC32 and VEC64 modes (the TT field is encoded as ST).

Table E.17: REAR scalar store operations

D:ST	Mnemonic	Meaning
100	.b	Store a byte or byte9 as 8 bits (one bit is truncated from a byte9)
101	.h	Store a halfword as 16 bits
110		Undefined
111	.w	Store a word as 32 bits

Table E.18 lists the vector store operations supported in VEC32 mode (the TT field is encoded as ST), that is, when bit VCSR<0> is cleared.

Table E.18: REAR vector store operations in VEC32 mode

D:ST	Mnemonic	Meaning
000	.4	Store 4 bytes from the register to memory. The ninth bit of each of the 4 byte9 elements in the register is ignored.
001	.8	Store 8 bytes from the register to memory. The ninth bit of each of the 8 byte9 elements in the register is ignored.
010	.16	Store 16 bytes from the register to memory. The ninth bit of each of the 16 byte9 elements in the register is ignored.
011	.32	Store 32 bytes from the register to memory. The ninth bit of each of the 32 byte9 elements in the register is ignored.

Table E.19 lists the vector store operations supported in VEC64 mode (the TT field is encoded as ST), that is, when bit VCSR<0> is set.

Table E.19: REAR vector store operations in VEC64 mode

B:D:ST	Mnemonic	Meaning
0000	.4	Store 4 bytes from the register to memory. The ninth bit of each of the 4 byte9 elements in the register is ignored.
0001	.8	Store 8 bytes from the register to memory. The ninth bit of each of the 8 byte9 elements in the register is ignored.
0010	.16	Store 16 bytes from the register to memory. The ninth bit of each of the 16 byte9 elements in the register is ignored.
0011	.32	Store 32 bytes from the register to memory. The ninth bit of each of the 32 byte9 elements in the register is ignored.
1000		Undefined
1001		Undefined
1010		Undefined
1011	.64	Store 64 bytes from the register to memory. The ninth bit of each of the 64 byte9 elements in the register is ignored.

Bit B is used to indicate a 64-byte vector operation, because the concept of current and alternate groups does not exist in VEC64 mode.

The REAI format is used by the load, store, and cache operation instructions. Table E.20 shows the meaning of each field in the REAI format.

Table E.20: REAI format

Field	Meaning
OPC<4:0>	Opcode
B	Group identifier for register Rn. When set in VEC32 mode, Rn<4:0> denotes a vector register number in the alternate group; when set in VEC64 mode, it denotes a full-vector (64-byte) operation.
D	Destination/source scalar register. When set, Rn<4:0> denotes a scalar register. Legal values of the B:D encoding in VEC32 mode: 00 Rn is a vector register in the current group; 01 Rn is a scalar register (in the current group); 10 Rn is a vector register in the alternate group; 11 undefined. Legal values of the B:D encoding in VEC64 mode: 00 only 4, 8, 16, or 32 bytes of vector register Rn are used; 01 Rn is a scalar register; 10 all 64 bytes of vector register Rn are used; 11 undefined.
TT<1:0>	Transfer type (encoded as in the REAR format).
C	Cache off. Set this bit to bypass the data cache on loads. This bit is set by the cache-off mnemonics of the load and store instructions (OFF appended to the mnemonic).
A	Address update. Set this bit to update SRb with the effective address. The effective address is computed as SRb + IMM<7:0>.
Rn<4:0>	Destination/source register number
SRb<4:0>	Scalar base register number
IMM<7:0>	An 8-bit immediate offset, interpreted as a two's complement number.

The REAR and REAI formats use the same encoding for the transfer type. See the REAR format for further details on the encoding.
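The effective-address rule of the REAI format (SRb + IMM<7:0>, with optional write-back when the A bit is set) can be sketched as below. This is a minimal sketch under assumptions not stated in the text: addresses are taken to be 32 bits wide and the immediate is not scaled. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def reai_effective_address(srb, imm8, a_bit):
    """Return (effective_address, new_srb) for an REAI-format access.

    Assumes a 32-bit address space and an unscaled 8-bit offset.
    """
    ea = (srb + sign_extend(imm8, 8)) & 0xFFFFFFFF
    new_srb = ea if a_bit else srb  # the A bit requests the base-register update
    return ea, new_srb


if __name__ == "__main__":
    print([hex(v) for v in reai_effective_address(0x1000, 0xFC, 1)])
```

With a base of 0x1000 and an immediate of 0xFC (that is, -4), the effective address is 0x0FFC, and SRb is overwritten only when the A bit is set.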

The RRRM5 format provides three register operands, or two register operands and a 5-bit immediate operand. Table E.21 defines the fields of the RRRM5 format.

Table E.21: RRRM5 format

Field	Meaning
Op<4:0>	Opcode
D	Destination scalar register. When set, Rd<4:0> denotes a scalar register; when cleared, Rd<4:0> denotes a vector register.
S	Scalar Rb register. When set, Rb<4:0> is a scalar register; when cleared, Rb<4:0> is a vector register.
DS<1:0>	Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 or floating-point data type).
M	Modifier for the D:S bits; see the D:S:M encoding tables below.
Rd<4:0>	Destination D register number
Ra<4:0>	Source A register number
Rb<4:0> or IM5<4:0>	Source B register number or a 5-bit immediate, depending on the D:S:M encoding. The 5-bit immediate is treated as an unsigned number.

Bits 19:15 are reserved and must be zero to ensure compatibility with future extensions.

All vector register operands refer to the current group (which may be either group 0 or group 1) unless stated otherwise. Table E.22 lists the D:S:M encodings when DS<1:0> is 00, 01, or 10.

Table E.22: D:S:M encodings of the RRRM5 format when DS is not equal to 11

D:S:M	Rd	Ra	Rb/IM5	Note
000	VRd	VRa	VRb	Three vector register operands
001				Undefined
010	VRd	VRa	SRb	B operand is a scalar register
011	VRd	VRa	IM5	B operand is a 5-bit immediate
100				Undefined
101				Undefined
110	SRd	SRa	SRb	Three scalar register operands
111	SRd	SRa	IM5	B operand is a 5-bit immediate

When DS<1:0> is 11, the D:S:M encodings have the following meanings:

Table E.23: D:S:M encodings of the RRRM5 format when DS is equal to 11

D:S:M	Rd	Ra	Rb/IM5	Note
000	VRd	VRa	VRb	Three vector register operands (int32)
001	VRd	VRa	VRb	Three vector register operands (float)
010	VRd	VRa	SRb	B operand is a scalar register (int32)
011	VRd	VRa	IM5	B operand is a 5-bit immediate (int32)
100	VRd	VRa	SRb	B operand is a scalar register (float)
101	SRd	SRa	SRb	Three scalar register operands (float)
110	SRd	SRa	SRb	Three scalar register operands (int32)
111	SRd	SRa	IM5	B operand is a 5-bit immediate (int32)

The RRRR format provides four register operands. Table E.24 shows the fields of the RRRR format.

Table E.24: RRRR format

Field	Meaning
Op<4:0>	Opcode
S	Scalar Rb register. When set, Rb<4:0> is a scalar register; when cleared, Rb<4:0> is a vector register.
DS<1:0>	Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 data type).
Rc<4:0>	Source/destination C register number
Rd<4:0>	Destination D register number
Ra<4:0>	Source A register number
Rb<4:0>	Source B register number

All vector register operands refer to the current group (which may be either group 0 or group 1) unless stated otherwise.

The RI format is used only by the load immediate instruction. Table E.25 specifies the fields of the RI format.

Table E.25: RI format

Field	Meaning
D	Destination scalar register. When set, Rd<4:0> denotes a scalar register; when cleared, Rd<4:0> denotes a vector register in the current group.
F	Floating-point data type. When set, indicates the floating-point data type and requires DS<1:0> to be 11.
DS<1:0>	Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 or floating-point data type).
Rd<4:0>	Destination D register number
IMM<18:0>	A 19-bit immediate value

Certain encodings of the F:DS<1:0> field are undefined. Programmers should not use these encodings, because the architecture does not specify the result when such an encoding is used. The value loaded into Rd depends on the data type, as shown in Table E.26.

Table E.26: Values loaded by the RI format

Format	Data type	Register operand
.b	Byte (8 bits)	Rd<7:0> := Imm<7:0>
.b9	Byte9 (9 bits)	Rd<8:0> := Imm<8:0>
.h	Halfword (16 bits)	Rd<15:0> := Imm<15:0>
.w	Word (32 bits)	Rd<31:0> := sign-extended Imm<18:0>
.f	Floating point (32 bits)	Rd<31> := Imm<18> (sign); Rd<30:23> := Imm<17:10> (exponent); Rd<22:13> := Imm<9:0> (mantissa); Rd<12:0> := 0
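The .f row of Table E.26 can be sketched as follows. The sketch assumes that the 19-bit immediate packs 1 sign bit, 8 exponent bits, and 10 mantissa bits (Imm<18>, Imm<17:10>, Imm<9:0>), which together account for all 19 bits; the function names are hypothetical.

```python
import struct


def ri_float_immediate(imm19):
    """Expand a 19-bit RI-format immediate into a 32-bit single-precision pattern."""
    sign = (imm19 >> 18) & 0x1
    exponent = (imm19 >> 10) & 0xFF
    mantissa = imm19 & 0x3FF
    # The 10 mantissa bits fill Rd<22:13>; Rd<12:0> is zero.
    return (sign << 31) | (exponent << 23) | (mantissa << 13)


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    print(bits_to_float(ri_float_immediate(127 << 10)))
```

An immediate with exponent field 127 and a zero mantissa expands to the bit pattern 0x3F800000, which is 1.0 in single precision.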

The CT format contains the fields shown in Table E.27.

Table E.27: CT format

Field	Meaning
Opc<3:0>	Opcode
Cond<2:0>	Branch condition: 000 unconditional; 001 less than; 010 equal; 011 less than or equal; 100 greater than; 101 not equal; 110 greater than or equal; 111 overflow.
IMM<22:0>	A 23-bit immediate word offset, interpreted as a two's complement number.

The branch conditions use the VCSR[GT:EQ:LT] field. The overflow condition uses the VCSR[SO] bit, which, when set, takes precedence over the GT, EQ, and LT bits. VCCS and VCBARR interpret the Cond<2:0> field differently from the above; refer to their instruction descriptions for details.

The RRRM9 format specifies three register operands, or two register operands and a 9-bit immediate operand. Table E.28 gives the fields of the RRRM9 format.

Table E.28: RRRM9 format

Field	Meaning
Opc<5:0>	Opcode
D	Destination scalar register. When set, Rd<4:0> denotes a scalar register; when cleared, Rd<4:0> denotes a vector register.
S	Scalar Rb register. When set, Rb<4:0> is a scalar register; when cleared, Rb<4:0> is a vector register.
DS<1:0>	Data size, encoded as: 00 byte (for the int8 data type); 01 byte9 (for the int9 data type); 10 halfword (for the int16 data type); 11 word (for the int32 or floating-point data type).
M	Modifier for the D:S bits; see the D:S:M encoding tables below.
Rd<4:0>	Destination register number
Ra<4:0>	Source A register number
Rb<4:0> or IM5<4:0>	Source B register number or a 5-bit immediate, depending on the D:S:M encoding.
IM9<3:0>	Together with IM5<4:0>, provides a 9-bit immediate, depending on the D:S:M encoding.

Bits 19:15 are reserved when the D:S:M encoding does not specify an immediate operand, and must be zero to ensure future compatibility.

All vector register operands refer to the current group (which may be either group 0 or group 1) unless stated otherwise. The D:S:M encodings are the same as those shown in Tables E.22 and E.23 for the RRRM5 format, except that the immediate value is extracted from the immediate fields according to the DS<1:0> encoding, as shown in Table E.29.

Table E.29: Immediate values of the RRRM9 format

DS	Matching data type	B operand
00	int8	Source B<7:0> := IM9<2:0>:IM5<4:0>
01	int9	Source B<8:0> := IM9<3:0>:IM5<4:0>
10	int16	Source B<15:0> := sex(IM9<3:0>:IM5<4:0>)
11	int32	Source B<31:0> := sex(IM9<3:0>:IM5<4:0>)

The immediate format is not available for the floating-point data type.
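The construction of the B operand in Table E.29 can be sketched as below, where `sex` is taken to mean sign extension. The function names are hypothetical, and the result is returned as an unsigned bit pattern of the element width, which is a modelling choice rather than something the text prescribes.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def rrrm9_immediate(ds, im9, im5):
    """Build the source B bit pattern from IM9<3:0> and IM5<4:0> for a given DS."""
    raw9 = ((im9 & 0xF) << 5) | (im5 & 0x1F)  # IM9<3:0>:IM5<4:0>
    if ds == 0b00:
        return raw9 & 0xFF          # int8 uses IM9<2:0>:IM5<4:0>
    if ds == 0b01:
        return raw9                 # int9 uses all nine bits
    width = 16 if ds == 0b10 else 32
    return sign_extend(raw9, 9) & ((1 << width) - 1)


if __name__ == "__main__":
    print(hex(rrrm9_immediate(0b11, 0x8, 0x00)))
```

For DS = 11, the nine-bit pattern 0x100 (that is, -256) is widened to 0xFFFFFF00.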

The MSP vector instructions are described below in alphabetical order. Note:

1. Unless otherwise indicated, instructions are affected by the element mask. CT-format instructions are not affected by the element mask. REAR- and REAI-format instructions, which comprise the load, store, and cache instructions, are also not affected by the element mask.

2. The 9-bit immediate operand is not available for the floating-point data type.

3. The operation description is given only for the vector form. For scalar operations, assume that only one element, element 0, is defined.

4. For the RRRM5 and RRRM9 formats, the following encodings are used for the integer data types (b, b9, h, w):

D:S:M	000	010	011	110	111
DS	00	01	10	11

5. For the RRRM5 and RRRM9 formats, the following encodings are used for the floating-point data type:

D:S:M	001	100	n/a	101	n/a
DS	11

6. For all instructions that can cause an overflow, when the VCSR<ISAT> bit is set, the result saturates to the int8, int9, int16, or int32 maximum or minimum limit. Similarly, when the VCSR<FSAT> bit is set, floating-point results saturate to -infinity, -0, +0, or +infinity.

7. By syntax rule, .n can be used in place of .b9 to denote the byte9 data size.

8. For all instructions, floating-point results returned to the destination register or to the vector accumulator are in IEEE 754 single-precision format. Floating-point results are written to the lower portion of the accumulator; the upper portion is not changed.
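The integer saturation described in the notes above (the VCSR<ISAT> case) can be sketched as follows. The table `INT_BITS` and the function `saturate` are illustrative names, and the behaviour when VCSR<ISAT> is cleared is deliberately not modelled because the text does not state it.

```python
# Element widths in bits for the integer data types named in the text.
INT_BITS = {"b": 8, "b9": 9, "h": 16, "w": 32}


def saturate(value, dt):
    """Clamp `value` to the signed range of data type `dt`."""
    bits = INT_BITS[dt]
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


if __name__ == "__main__":
    print(saturate(200, "b"), saturate(-300, "b9"))
```

An int8 result of 200 is clamped to 127, and an int9 result of -300 is clamped to -256.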

VAAS3 Add and add sign (-1, 0, 1)

Format

Assembler syntax

VAAS3.dt VRd, VRa, VRb

VAAS3.dt VRd, VRa, SRb

VAAS3.dt SRd, SRa, SRb

Where dt = {b, b9, h, w}.

Supported modes

D:S:M	V<-V@V	V<-V@S		S<-S@S
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

The contents of vector/scalar register Ra are added to Rb to produce an intermediate result; the sign of Ra is then added to the intermediate result, and the final result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    if (Ra[i] > 0) extsgn3 = 1;

    else if (Ra[i] < 0) extsgn3 = -1;

    else extsgn3 = 0;

    Rd[i] = Ra[i] + Rb[i] + extsgn3;

}

Exception

Overflow.
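The VAAS3 operation can be sketched as below. The sketch reads the loop condition as "process every element whose mask bit is set", in line with the note that instructions are affected by the element mask; that reading, the list-based register model, and the omission of overflow handling are assumptions.

```python
def vaas3(ra, rb, rd, emask):
    """Return the new Rd: Ra + Rb + sign(Ra) for each unmasked element."""
    out = list(rd)
    for i, (a, b) in enumerate(zip(ra, rb)):
        if not emask[i]:
            continue  # masked elements keep their previous value
        extsgn3 = (a > 0) - (a < 0)  # 1, -1, or 0
        out[i] = a + b + extsgn3
    return out


if __name__ == "__main__":
    print(vaas3([3, -2, 0, 5], [1, 1, 1, 1], [9, 9, 9, 9], [1, 1, 1, 0]))
```

In this example the last element is masked and therefore keeps its previous value of 9.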

VADAC Add and accumulate

Format

Assembler syntax

VADAC.dt VRc, VRd, VRa, VRb

VADAC.dt SRc, SRd, SRa, SRb

Where dt = {b, b9, h, w}.

Supported modes

S	VR	SR
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

Each element of the Ra and Rb operands is added to the corresponding double-precision element of the vector accumulator, and the double-precision sum of each element is stored in both the vector accumulator and the destination registers Rc and Rd. Ra and Rb use the specified data type, whereas VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper portion of each double-precision element is stored in VACH and Rc. If Rc = Rd, the result in Rc is undefined.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Aop[i] = {VRa[i] || SRa};

    Bop[i] = {VRb[i] || SRb};

    VACH[i]:VACL[i] = sex(Aop[i] + Bop[i]) + VACH[i]:VACL[i];

    Rc[i] = VACH[i];

    Rd[i] = VACL[i];

}

VADACL Add and accumulate low

Format

Assembler syntax

VADACL.dt VRd, VRa, VRb

VADACL.dt VRd, VRa, SRb

VADACL.dt VRd, VRa, #IMM

VADACL.dt SRd, SRa, SRb

VADACL.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}.

Supported modes

D:S:M	V<-V@V	V<-V@S	V<-V@I	S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

Each element of the Ra and Rb/immediate operands is added to the corresponding extended-precision element of the vector accumulator; the extended-precision sum is stored in the vector accumulator, and the lower-precision portion is returned to destination register Rd. Ra and Rb/immediate use the specified data type, whereas VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper portion of each extended-precision element is stored in VACH.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

    VACH[i]:VACL[i] = sex(Ra[i] + Bop[i]) + VACH[i]:VACL[i];

    Rd[i] = VACL[i];

}

VADD Add

Format

Assembler syntax

VADD.dt VRd, VRa, VRb

VADD.dt VRd, VRa, SRb

VADD.dt VRd, VRa, #IMM

VADD.dt SRd, SRa, SRb

VADD.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w, f}.

Supported modes

D:S:M	V<-V@V	V<-V@S	V<-V@I	S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)	float (f)

Explanation

Adds the Ra and Rb/immediate operands and returns the result to destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

    Rd[i] = Ra[i] + Bop[i];

}

Exception

Overflow, floating-point invalid operand.

VADDH Add two adjacent elements

Format

Assembler syntax

VADDH.dt VRd, VRa, VRb

VADDH.dt VRd, VRa, SRb

Where dt = {b, b9, h, w, f}.

Supported modes

D:S:M	V<-V@V	V<-V@S
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)	float (f)

Explanation

Operation

for (i = 0; i < NumElem - 1; i++) {

    Rd[i] = Ra[i] + Ra[i+1];

}

Rd[NumElem-1] = Ra[NumElem-1] + {VRb[0] || SRb};

Exception

Overflow, floating-point invalid operand.

Programming notes

This instruction is not affected by the element mask.
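The VADDH operation above can be sketched as follows. The register is modelled as a Python list, `b_first` stands for either VRb[0] or SRb, and overflow handling is omitted; all of these are simplifying assumptions.

```python
def vaddh(ra, b_first):
    """Add each element of Ra to its neighbour; the last element pairs with b_first."""
    n = len(ra)
    rd = [ra[i] + ra[i + 1] for i in range(n - 1)]
    rd.append(ra[n - 1] + b_first)
    return rd


if __name__ == "__main__":
    print(vaddh([1, 2, 3, 4], 10))
```

Only the last result uses the B operand; every other result is formed from Ra alone.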

VAND AND

Format

Assembler syntax

VAND.dt VRd, VRa, VRb

VAND.dt VRd, VRa, SRb

VAND.dt VRd, VRa, #IMM

VAND.dt SRd, SRa, SRb

VAND.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

D:S:M	V<-V@V	V<-V@S	V<-V@I	S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

Logically ANDs the Ra and Rb/immediate operands and returns the result to destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

    Rd[i]<k> = Ra[i]<k> & Bop[i]<k>, for k = all bits in element i;

}

Exception

None.

VANDC AND complement

Format

Assembler syntax

VANDC.dt VRd, VRa, VRb

VANDC.dt VRd, VRa, SRb

VANDC.dt VRd, VRa, #IMM

VANDC.dt SRd, SRa, SRb

VANDC.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

Explanation

Logically ANDs Ra with the complement of the Rb/immediate operand and returns the result to destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

    Rd[i]<k> = Ra[i]<k> & ~Bop[i]<k>, for k = all bits in element i;

}

Exception

None.

VASA Arithmetic shift accumulator

Format

Assembler syntax

VASAL.dt

VASAR.dt

Where dt = {b, b9, h, w}, and R indicates the shift direction, left or right.

Supported modes

R	left	right
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

Each data element of the vector accumulator register is shifted left by one bit position with zeros filled in from the right (if R = 0), or shifted right by one bit position with sign extension (if R = 1). The result is stored in the vector accumulator.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    if (R == 1)

        VACOH[i]:VACOL[i] = VACOH[i]:VACOL[i] sign>> 1;

    else

        VACOH[i]:VACOL[i] = VACOH[i]:VACOL[i] << 1;

}

Exception

Overflow.

VASL Arithmetic shift left

Format

Assembler syntax

VASL.dt VRd, VRa, SRb

VASL.dt VRd, VRa, #IMM

VASL.dt SRd, SRa, SRb

VASL.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}.

Supported modes

D:S:M		V<-V@S	V<-V@I	S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

Each data element of vector/scalar register Ra is shifted left, with zeros filled in from the right, by the shift amount given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd. For elements that overflow, the result is saturated to the maximum positive or maximum negative value according to its sign. The shift amount is defined as an unsigned integer.

Operation

shift_amount = {SRb % 32 || IMM<4:0>};

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Rd[i] = saturate(Ra[i] << shift_amount);

}

Exception

None.

Programming notes

Note that shift_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with zeros.
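The saturating left shift can be sketched as below for shift amounts that stay within the data size. The names `INT_BITS` and `vasl` are illustrative, the element mask is not modelled, and the zero-fill behaviour for oversized shift amounts mentioned in the programming note is deliberately left out.

```python
# Element widths in bits for the integer data types named in the text.
INT_BITS = {"b": 8, "b9": 9, "h": 16, "w": 32}


def vasl(ra, shift_source, dt):
    """Shift each element left by (shift_source % 32) and saturate to dt's range."""
    bits = INT_BITS[dt]
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    shift_amount = shift_source % 32  # only five bits of the source are used
    return [max(lowest, min(highest, a << shift_amount)) for a in ra]


if __name__ == "__main__":
    print(vasl([1, 100, -100, 0], 1, "b"))
```

For int8, shifting 100 left by one overflows and saturates to 127, while -100 saturates to -128.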

VASR Arithmetic shift right

Format

Assembler syntax

VASR.dt VRd, VRa, SRb

VASR.dt VRd, VRa, #IMM

VASR.dt SRd, SRa, SRb

VASR.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}.

Supported modes

D:S:M		V<-V@S	V<-V@I	S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

Each data element of vector/scalar register Ra is arithmetically shifted right, with sign extension at the most significant bit position, by the shift amount given in the least significant bits of scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd. The shift amount is defined as an unsigned integer.

Operation

shift_amount = {SRb % 32 || IMM<4:0>};

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Rd[i] = Ra[i] sign>> shift_amount;

}

Exception

None.

Programming notes

Note that shift_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with the sign bit.

VASS3 Add and subtract sign (-1, 0, 1)

Format

Assembler syntax

VASS3.dt VRd, VRa, VRb

VASS3.dt VRd, VRa, SRb

VASS3.dt SRd, SRa, SRb

Where dt = {b, b9, h, w}.

Supported modes

Explanation

The contents of vector/scalar register Ra are added to Rb to produce an intermediate result; the sign of Ra is then subtracted from the intermediate result, and the final result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    if (Ra[i] > 0) extsgn3 = 1;

    else if (Ra[i] < 0) extsgn3 = -1;

    else extsgn3 = 0;

    Rd[i] = Ra[i] + Rb[i] - extsgn3;

}

Exception

Overflow.

VASUB Absolute value of subtraction

Format

Assembler syntax

VASUB.dt VRd, VRa, VRb

VASUB.dt VRd, VRa, SRb

VASUB.dt VRd, VRa, #IMM

VASUB.dt SRd, SRa, SRb

VASUB.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w, f}.

Supported modes

D:S:M	V<-V@V	V<-V@S	V<-V@I	S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)	float (f)

Explanation

The contents of vector/scalar register Rb or the IMM field are subtracted from the contents of vector/scalar register Ra, and the absolute value of the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {Rb[i] || SRb || sex(IMM<8:0>)};

    Rd[i] = |Ra[i] - Bop[i]|;

}

Exception

Overflow, floating-point invalid operand.

Programming notes

If the result of the subtraction is the most negative number, an overflow occurs after the absolute value operation. If saturation mode is enabled, the result of the absolute value operation is the most positive number.
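The corner case in the programming note can be sketched as below for the integer data types with saturation enabled. The sketch computes the difference exactly and clamps only the final absolute value; how the hardware treats an intermediate difference that itself exceeds the element range is not stated in the text, so that part is an assumption, as are the names `INT_BITS` and `vasub`.

```python
# Element widths in bits for the integer data types named in the text.
INT_BITS = {"b": 8, "b9": 9, "h": 16, "w": 32}


def vasub(ra, rb, dt):
    """Return |Ra - Rb| per element, saturated to the most positive value of dt."""
    highest = (1 << (INT_BITS[dt] - 1)) - 1
    return [min(highest, abs(a - b)) for a, b in zip(ra, rb)]


if __name__ == "__main__":
    print(vasub([10, -128, 5], [3, 0, 9], "b"))
```

A difference of -128 in int8 has no positive counterpart, so the result saturates to 127.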

VAVG Average of two elements

Format

Assembler syntax

VAVG.dt VRd, VRa, VRb

VAVG.dt VRd, VRa, SRb

VAVG.dt SRd, SRa, SRb

Where dt = {b, b9, h, w, f}. For integer data types, use VAVGT to specify the "truncate" rounding mode.

Supported modes

D:S:M	V<-V@V	V<-V@S		S<-S@S
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)	float (f)

Explanation

The contents of vector/scalar register Ra are added to the contents of vector/scalar register Rb to produce an intermediate result; the intermediate result is then divided by 2, and the final result is stored in vector/scalar register Rd. For integer data types, the rounding mode is truncate if T = 1, and round away from zero if T = 0 (the default). For the floating-point data type, the rounding mode is specified by VCSR<RMODE>.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {Rb[i] || SRb || sex(IMM<8:0>)};

    Rd[i] = (Ra[i] + Bop[i]) // 2;

}

Exception

None.

VAVGH Average of two adjacent elements

Format

Assembler syntax

VAVGH.dt VRd, VRa, VRb

VAVGH.dt VRd, VRa, SRb

Where dt = {b, b9, h, w, f}. For integer data types, use VAVGHT to specify the "truncate" rounding mode.

Supported modes

D:S:M	V<-V@V	V<-V@S
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)	float (f)

Explanation

For each element, averages two adjacent elements. For integer data types, the rounding mode is truncate if T = 1, and round away from zero if T = 0 (the default). For the floating-point data type, the rounding mode is specified by VCSR<RMODE>.

Operation

for (i = 0; i < NumElem - 1; i++) {

    Rd[i] = (Ra[i] + Ra[i+1]) // 2;

}

Rd[NumElem-1] = (Ra[NumElem-1] + {VRb[0] || SRb}) // 2;

Exception

None.

Programming notes

This instruction is not affected by the element mask.

VAVGQ Average of four elements

Format

Assembler syntax

VAVGQ.dt VRd, VRa, VRb

Where dt = {b, b9, h, w}. For integer data types, use VAVGQT to specify the "truncate" rounding mode.

Supported modes

D:S:M	V<-V@V
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

This instruction is not supported in VEC64 mode.

As shown below, the average of four elements is computed using the rounding mode specified by T (1 for truncate, 0 for round away from zero, which is the default). Note that the leftmost element (D_n-1) is undefined.

Operation

for (i = 0; i < NumElem - 1; i++) {

    Rd[i] = (Ra[i] + Rb[i] + Ra[i+1] + Rb[i+1]) // 4;

}

Exception

None.

VCACHE Cache operation

Format

Assembler syntax

VCACHE.fc SRb, SRi

VCACHE.fc SRb, #IMM

VCACHE.fc SRb+, SRi

VCACHE.fc SRb+, #IMM

Where fc = {0, 1}.

Explanation

This instruction provides software management of the vector data cache. When part or all of the data cache is configured as scratch pad memory, this instruction has no effect on the scratch pad memory.

The following options are supported:

FC<2:0>	Meaning
000	Write back and invalidate the dirty cache line whose tag matches the EA. If the matching line contains data that is not dirty, the line is invalidated without a write-back. If no cache line is found to contain the EA, the data cache remains untouched.
001	Write back and invalidate the dirty cache line specified by the index of the EA. If the matching line contains data that is not dirty, the line is invalidated without a write-back.
Other	Undefined

Operation

Exception

None.

Programming notes

This instruction is not affected by the element mask.

VCAND Complement AND

Format

Assembler syntax

VCAND.dt VRd, VRa, VRb

VCAND.dt VRd, VRa, SRb

VCAND.dt VRd, VRa, #IMM

VCAND.dt SRd, SRa, SRb

VCAND.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

Explanation

Logically ANDs the complement of Ra with the Rb/immediate operand and returns the result to destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

    Rd[i]<k> = ~Ra[i]<k> & Bop[i]<k>, for k = all bits in element i;

}

Exception

None.

VCBARR Conditional barrier

Format

Assembler syntax

VCBARR.cond

Where cond = {0-7}. Mnemonics will be assigned to each condition later.

Explanation

Stalls this instruction and all later instructions (those that appear later in program order) as long as the condition holds. The Cond<2:0> field is interpreted differently from the other conditional instructions in the CT format.

The following conditions are currently defined:

Cond<2:0>	Meaning
000	Wait for all previous instructions (those that appear earlier in program order) to finish execution before executing any later instruction.
Other	Undefined

Operation

while (Cond == true)

    stall all later instructions;

Exception

None.

Programming notes

This instruction is provided for software to force serialized execution of instructions. It can be used to force precise reporting of an imprecise exception event. For example, if this instruction is used immediately after an arithmetic instruction that can cause an exception event, the event is reported with the program counter addressing this instruction.

VCBR Conditional branch

Format

Assembler syntax

VCBR.cond #Offset

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, the branch is taken. This is not a delayed branch.

Operation

if ((Cond == VCSR[SO, GT, EQ, LT]) | (Cond == un))

    VPC = VPC + sex(Offset<22:0> * 4);

else VPC = VPC + 4;

Exception

Invalid instruction address.
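The branch-target arithmetic of VCBR can be sketched as below. The sketch assumes a 32-bit VPC and takes the evaluated condition as a boolean input rather than modelling VCSR; both choices, and the function names, are assumptions made for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def vcbr_next_pc(vpc, offset23, taken):
    """Return the next VPC: the branch target if taken, otherwise VPC + 4."""
    if not taken:
        return (vpc + 4) & 0xFFFFFFFF
    # The 23-bit offset counts 4-byte instruction words.
    return (vpc + sign_extend(offset23, 23) * 4) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(vcbr_next_pc(0x1000, 0x7FFFFF, True)))
```

An offset field of all ones is -1 word, so a taken branch from 0x1000 lands on 0x0FFC.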

VCBRI Indirect conditional branch

Format

Assembler syntax

VCBRI.cond SRb

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, the indirect branch is taken. This is not a delayed branch.

Operation

if ((Cond == VCSR[SO, GT, EQ, LT]) | (Cond == un))

    VPC = SRb<31:2>:b'00;

else VPC = VPC + 4;

Exception

Invalid instruction address.

VCCS Conditional context switch

Format

Assembler syntax

VCCS #Offset

Explanation

If VIMSK<cse> is true, jumps to the context switch subroutine. This is not a delayed branch.

If VIMSK<cse> is true, VPC + 4 (the return address) is saved on the return address stack. If not, execution continues from VPC + 4.

Operation

if (VIMSK<cse> == 1) {

    if (VSP<4:0> > 15) {

        VISRC<RASO> = 1;

        signal ARM7 with RASO exception;

        VP_STATE = VP_IDLE;

    } else {

        RSTACK[VSP<3:0>] = VPC + 4;

        VSP<4:0> = VSP<4:0> + 1;

        VPC = VPC + sex(Offset<22:0> * 4);

    }

} else VPC = VPC + 4;

Exception

Return address stack overflow.
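The return address stack handling in the operation above can be sketched as follows. The sketch assumes a 16-entry stack indexed by VSP<3:0>, a 5-bit stack pointer, and a 32-bit VPC; the function name and the tuple it returns are hypothetical.

```python
STACK_DEPTH = 16  # assumed from the VSP<3:0> index used in the operation


def push_return_address(rstack, vsp, vpc):
    """Push VPC + 4 onto the return address stack.

    Returns (new_vsp, overflow). On overflow the stack is left untouched,
    mirroring the branch that raises the RASO exception.
    """
    if vsp > STACK_DEPTH - 1:
        return vsp, True
    rstack[vsp & 0xF] = (vpc + 4) & 0xFFFFFFFF
    return (vsp + 1) & 0x1F, False


if __name__ == "__main__":
    stack = [0] * STACK_DEPTH
    print(push_return_address(stack, 0, 0x100), hex(stack[0]))
```

Once the stack pointer has reached 16, a further push reports an overflow instead of writing an entry.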

VCHGCR Change control register

Format

Assembler syntax

VCHGCR Mode

Explanation

This instruction changes the operating mode of the vector processor.

Each bit field in Mode is specified as follows:

Mode	Meaning
bits 1:0	These two bits control the VCSR<CBANK> bit. The encodings specify: 00 - no change; 01 - clear the VCSR<CBANK> bit; 10 - set the VCSR<CBANK> bit; 11 - toggle the VCSR<CBANK> bit.
bits 3:2	These two bits control the VCSR<SMM> bit. The encodings specify: 00 - no change; 01 - clear the VCSR<SMM> bit; 10 - set the VCSR<SMM> bit; 11 - toggle the VCSR<SMM> bit.
bits 5:4	These two bits control the VCSR<CEM> bit. The encodings specify: 00 - no change; 01 - clear the VCSR<CEM> bit; 10 - set the VCSR<CEM> bit; 11 - toggle the VCSR<CEM> bit.
Other	Undefined

Operation

Exception

None.

Programming notes

This instruction is provided for hardware to change the control bits in VCSR more efficiently than a VMOV instruction can.
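The two-bit control encodings of the Mode operand can be sketched as below. The VCSR control bits are modelled as a Python dictionary, which is purely an illustrative choice, and the function names are hypothetical.

```python
def apply_mode_field(bit, code):
    """Apply one 2-bit Mode encoding to a single control bit."""
    if code == 0b00:
        return bit      # no change
    if code == 0b01:
        return 0        # clear
    if code == 0b10:
        return 1        # set
    return bit ^ 1      # 11: toggle


def vchgcr(flags, mode):
    """Return the updated CBANK, SMM, and CEM bits for a 6-bit Mode operand."""
    names = ("CBANK", "SMM", "CEM")  # Mode bits 1:0, 3:2, 5:4 respectively
    return {
        name: apply_mode_field(flags[name], (mode >> (2 * k)) & 0b11)
        for k, name in enumerate(names)
    }


if __name__ == "__main__":
    print(vchgcr({"CBANK": 0, "SMM": 1, "CEM": 1}, 0b110110))
```

In the example, Mode 110110 sets CBANK, clears SMM, and toggles CEM in a single instruction.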

VCINT condition interrupt ARM7

Format

Assembler syntax

VCINT.cond #ICODE

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, when enabled, execution stops and interrupts ARM7.

Operating

If((Cond＝VCSR[SO，GT.EQ，LT])|(Cond＝un)){

VISRC<vip>＝1；

VIINS＝[VCINT.cond#ICODE instruction]；

VEPC＝VPC；

if(VIMSK<vie>＝1)signal ARM7 interrupt；

VP_STATE＝VP_IDLE；

}

else VPC＝VPC+4；

Abnormal

VCINT interrupt.

VCJOIN Conditional Join with ARM7 Task

Format

Assembler syntax

VCJOIN.cond #Offset

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, halt execution and, if enabled, interrupt ARM7.

Operating

If((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)){

VISRC<vjp>＝1；

VIINS＝[VCJOIN.cond#Offset instruction]；

VEPC＝VPC；

if(VIMSK<vje>＝1)signal ARM7 interrupt；

VP_STATE＝VP_IDLE；

}

else VPC＝VPC+4；

Abnormal

VCJOIN interrupt.

VCJSR conditional jump to subroutine

Format

Assembler syntax

VCJSR.cond #Offset

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, then jump to the subroutine. This is not a delayed branch.

If Cond is true, VPC+4 (the return address) is saved onto the return address stack. If it is not true, execution continues from VPC+4.

Operating

If((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)){

if(VSP<4：0>＞15){

VISRC<RASO>＝1；

signal ARM7 with RASO exception；

VP_STATE＝VP_IDLE；

}else{

RSTACK[VSP<3：0>]＝VPC+4；

VSP<4：0>＝VSP<4：0>+1；

VPC＝VPC+sex(Offset<22：0>*4)；

}

}else VPC＝VPC+4；

Abnormal

Return address stack overflow.
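The conditional jump to subroutine can be modeled in a few lines of Python. This is a sketch under assumptions, not the hardware behavior: the class `VectorProcessor`, the state name `"VP_RUN"`, and the helper `sign_extend` are hypothetical. It also assumes the overflow test compares the full 5-bit stack pointer against 15 and that the 23-bit offset is sign-extended before being scaled by 4.

```python
STACK_DEPTH = 16  # RSTACK is indexed by VSP<3:0>, so 16 entries

class VectorProcessor:
    def __init__(self, vpc=0):
        self.vpc = vpc
        self.vsp = 0                      # 5-bit stack pointer VSP<4:0>
        self.rstack = [0] * STACK_DEPTH   # return address stack
        self.state = "VP_RUN"             # hypothetical running-state name
        self.raso = 0                     # models VISRC<RASO>

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def vcjsr(vp, cond_true, offset):
    """One VCJSR step: push VPC+4 and jump, or fall through."""
    if not cond_true:
        vp.vpc += 4
        return
    if vp.vsp > 15:                       # stack full: raise RASO, go idle
        vp.raso = 1
        vp.state = "VP_IDLE"
        return
    vp.rstack[vp.vsp & 0xF] = vp.vpc + 4  # save the return address
    vp.vsp = (vp.vsp + 1) & 0x1F
    vp.vpc = (vp.vpc + sign_extend(offset, 23) * 4) & 0xFFFFFFFF
```

The model makes the two outcomes of a taken jump explicit: either the return address is pushed and VPC moves to the target, or the stack is already full and the processor idles with the overflow flag set.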

VCJSRI indirect conditional jump to subroutine

Format

Assembler syntax

VCJSRI.cond SRb

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, jump indirectly to the subroutine. This is not a delayed branch.

If Cond is true, VPC+4 (the return address) is saved onto the return address stack. If it is not true, execution continues from VPC+4.

Operating

If((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)){

if(VSP<4：0>＞15){

VISRC<RASO>＝1；

signal ARM7 with RASO exception；

VP_STATE＝VP_IDLE；

}else{

RSTACK[VSP<3：0>]＝VPC+4；

VSP<4：0>＝VSP<4：0>+1；

VPC＝SRb<31：2>：b′00；

}

}else VPC＝VPC+4；

Abnormal

Return address stack overflow.

VCMOV Conditional Move

Format

Assembler syntax

VCMOV.dt Rd，Rb，cond

VCMOV.dt Rd，#IMM，cond

Where dt = {b, b9, h, w, f} and cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type does not support the 9-bit immediate operand.

Supported modes

D：S：M	V＜-V	V＜-S	V＜-I	S＜-S	S＜-I
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:

VR: current-bank vector register

SR: scalar register

SY: synchronization register

VAC: vector accumulator register (see the VMOV description for the VAC register encoding)

D：S：M	ID<1：0>＝00	ID<1：0>＝01	ID<1：0>＝10	ID<1：0>＝11
V＜-V	VR＜-VR	VR＜-VAC	VAC＜-VR
V＜-S	VR＜-SR		VAC＜-SR
V＜-I	VR＜-I
S＜-S	SR＜-SR
S＜-I	SR＜-I

Operating

If((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un))

for(i＝0；i＜NumElem；i++)

Rd[i]＝{Rb[i]‖SRb‖sex(IMM<8：0>)}；

Abnormal

None.

Programming notes

This instruction is not affected by the element mask; VCMOVM is affected by the element mask.

The extended floating-point precision representation in the vector accumulator uses all 576 bits for the 8 elements. Therefore, a move of a vector register that involves the accumulator must specify the .b9 data size.

VCMOVM Conditional Move with Element Mask

Format

Assembler syntax

VCMOVM.dt Rd，Rb，cond

VCMOVM.dt Rd，#IMM，cond

Where dt = {b, b9, h, w, f} and cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type does not support the 9-bit immediate operand.

Supported modes

D：S：M	V＜-V	V＜-S	V＜-I
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:

VR: current-bank vector register

SR: scalar register

D：S：M	ID<1：0>＝00	ID<1：0>＝01	ID<1：0>＝10	ID<1：0>＝11
V＜-V	VR＜-VR	VR＜-VAC	VAC＜-VR
V＜-S	VR＜-SR		VAC＜-SR
V＜-I	VR＜-I
S＜-S
S＜-I

Operating

If((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un))

for(i＝0；i＜NumElem && MMASK[i]；i++)

Rd[i]＝{Rb[i]‖SRb‖sex(IMM<8：0>)}；

Abnormal

None.

Programming notes

This instruction is affected by the VMMR element mask; VCMOV is not affected by the element mask.

The extended floating-point precision representation in the vector accumulator uses all 576 bits for the 8 elements. Therefore, a move of a vector register that involves the accumulator must specify the .b9 data size.
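The difference between VCMOV and VCMOVM is easiest to see side by side. The Python sketch below is an illustration only: the function names and list-based registers are hypothetical, and it assumes that the `&& MMASK[i]` term in the pseudo-code acts as a per-element predicate (an element is written only when its mask bit is set) rather than as a loop-termination test.

```python
def vcmov(rd, rb, cond_true):
    """Unmasked conditional move: copy every element when Cond holds."""
    return list(rb) if cond_true else list(rd)

def vcmovm(rd, rb, cond_true, mmask):
    """Masked conditional move: copy only elements whose mask bit is set."""
    if not cond_true:
        return list(rd)
    return [b if m else d for d, b, m in zip(rd, rb, mmask)]
```

With the same operands, `vcmov` replaces the whole destination while `vcmovm` leaves the unmasked destination elements untouched.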

VCMPV Compare and Set Mask

Format

Assembler syntax

VCMPV.dt VRa，VRb，cond，mask

VCMPV.dt VRa，SRb，cond，mask

Where dt = {b, b9, h, w, f}, cond = {lt, eq, le, gt, ne, ge}, and mask = {VGMR, VMMR}. If no mask is specified, VGMR is assumed.

Supported modes

D：S：M	M＜-V@V	M＜-V@S
DS	int8(b)	int9(b9)	int16(h)	int32(w)	float(f)

Explanation

The contents of vector registers VRa and VRb are compared element by element by performing a subtraction (VRa[i] - VRb[i]). If the result of the comparison matches the Cond field of the VCMPV instruction, bit #i of the VGMR register (if K = 0) or the VMMR register (if K = 1) is set. For example, if the Cond field is less than (LT), the VGMR[i] (or VMMR[i]) bit is set when VRa[i] < VRb[i].

Operating

for(i＝0；i＜NumElem；i++){

Bop[i]＝{Rb[i]‖SRb‖sex(IMM<8：0>)}；

relationship[i]＝Ra[i]？Bop[i]；

if(K＝1)

MMASK[i]＝(relationship[i]＝Cond)？True：False；

else

EMASK[i]＝(relationship[i]＝Cond)？True：False；

}

Abnormal

None.

Programming notes

This instruction is not affected by the element mask.
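A compact way to picture the compare-and-set-mask operation is as a function that returns one mask bit per element. The sketch below is hypothetical in its naming (`vcmpv`, `CONDS`) and models registers as Python lists; the choice between VGMR and VMMR as the destination is left to the caller.

```python
import operator

# Condition mnemonics from the assembler syntax mapped to comparisons.
CONDS = {"lt": operator.lt, "eq": operator.eq, "le": operator.le,
         "gt": operator.gt, "ne": operator.ne, "ge": operator.ge}

def vcmpv(ra, rb, cond):
    """Return the mask bits for Ra ? Rb; rb is a list or a scalar."""
    bop = rb if isinstance(rb, list) else [rb] * len(ra)
    test = CONDS[cond]
    return [1 if test(a, b) else 0 for a, b in zip(ra, bop)]
```

For instance, comparing a vector against a scalar threshold with `ge` yields a mask that a later masked instruction such as VCMOVM could consume.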

VCNTLZ count leading zeros

Format

Assembler syntax

VCNTLZ.dt VRd，VRb

VCNTLZ.dt SRd，SRb

Where dt = {b, b9, h, w}.

Supported modes

S	V＜-V	S＜-S
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Counts the number of leading zeros in each element of Rb and returns the count in Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝number of leading zeroes(Rb[i])；

}

Abnormal

None.

Programming notes

If all bits in an element are zero, the result equals the element size (8, 9, 16, or 32 for byte, byte9, halfword, or word, respectively).

The leading-zero count has an inverse relationship to the element position index (when used after a VCMPR instruction). To convert the count to an element position, subtract the VCNTLZ result from NumElem for the given data type.
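The leading-zero count for the four element sizes can be checked with a short Python sketch. The helper name `count_leading_zeros` is hypothetical; the only behavior taken from the text is that an all-zero element yields the element size.

```python
def count_leading_zeros(value, width):
    """Count leading zero bits of value, viewed as a width-bit element."""
    value &= (1 << width) - 1          # keep only the bits of the element
    return width - value.bit_length()  # all-zero input gives width

def vcntlz(rb, width):
    """Apply the count to every element of a list-based register."""
    return [count_leading_zeros(v, width) for v in rb]
```

Using `int.bit_length` avoids an explicit loop over bits while giving the same result as a bit-by-bit scan from the most significant bit.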

VCOR OR Complement

Format

Assembler syntax

VCOR.dt VRd，VRa，VRb

VCOR.dt VRd，VRa，SRb

VCOR.dt VRd，VRa，#IMM

VCOR.dt SRd，SRa，SRb

VCOR.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

Explanation

Performs a logical OR of the complement of Ra with Rb or the immediate operand, and returns the result in destination register Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]<k>＝~Ra[i]<k>|Bop[i]<k>，k＝for all bits in element i；

}

Abnormal

None.

VCRSR Conditional Return from Subroutine

Format

Assembler syntax

VCRSR.cond

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

If Cond is true, return from the subroutine. This is not a delayed branch.

If Cond is true, execution continues from the return address saved on the return address stack. If it is not true, execution continues from VPC+4.

Operating

If((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)){

if(VSP<4：0>＝0){

VISRC<RASU>＝1；

signal ARM7 with RASU exception；

VP_STATE＝VP_IDLE；

}else{

VSP<4：0>＝VSP<4：0>-1；

VPC＝RSTACK[VSP<3：0>]；

VPC<1：0>＝b′00；

}

}else VPC＝VPC+4；

Abnormal

Invalid instruction address, return address stack underflow.

VCVTB9 byte9 data type conversion

Format

Assembler syntax

VCVTB9.md VRd，VRb

VCVTB9.md SRd，SRb

Where md = {bb9, b9h, hb9}.

Supported modes

S	V＜-V	S＜-S
MD	bb9		b9h	hb9

Explanation

Converts each element in Rb from byte to byte9 (bb9), from byte9 to halfword (b9h), or from halfword to byte9 (hb9).

Operating

if(md<1：0>＝0){//bb9 for byte to byte9 conversion

VRd＝VRb；

VRd<9i+8>＝VRb<9i+7>，i＝0 to 31(or 63 in VEC64 mode)}

else if(md<1：0>＝2){//b9h for byte9 to halfword conversion

VRd＝VRb；

VRd<18i+16：18i+9>＝VRb<18i+8>，i＝0 to 15(or 31 in VEC64 mode)}

else if(md<1：0>＝3)//hb9 for halfword to byte9 conversion

VRd<18i+8>＝VRb<18i+9>，i＝0 to 15(or 31 in VEC64 mode)

else VRd＝undefined；

Abnormal

None.

Programming notes

Before using this instruction with the b9h mode, the programmer must use a shuffle operation to adjust for the reduced number of elements in the vector register. After using this instruction with the hb9 mode, the programmer must use an unshuffle operation to adjust for the increased number of elements in the destination vector register. This instruction is not affected by the element mask.

VCVTFF floating-point to fixed-point conversion

Format

Assembler syntax

VCVTFF VRd，VRa，SRb

VCVTFF VRd，VRa，#IMM

VCVTFF SRd，SRa，SRb

VCVTFF SRd，SRa，#IMM

Supported modes

D：S：M	V＜-V，S	V＜-V，I	S＜-S，S	S＜-S，I

Explanation

The contents of vector/scalar register Ra are converted from 32-bit floating point to a fixed-point real number of format <X.Y>, where the width of Y is specified by Rb (modulo 32) or by the IMM field, and the width of X is (32 - width of Y). X denotes the integer part and Y denotes the fractional part. The result is stored in vector/scalar register Rd.

Operating

Y_size＝{SRb％32‖IMM<4：0>}；

for(i＝0；i＜NumElem；i++){

Rd[i]＝convert to<32-Y_size.Y_size>format(Ra[i])；

}

Abnormal

Overflow.

Programming notes

This instruction supports only the word data size. Because the architecture does not support multiple data types within a register, this instruction does not use the element mask. This instruction uses the round-toward-zero rounding mode for integer data types.
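The <X.Y> conversion can be sketched in Python as a scale-and-truncate step. Several points are assumptions rather than statements from the source: the result is treated as a signed 32-bit value, the Overflow exception is modeled by raising `OverflowError`, and the function name `vcvtff` operating on a Python list is hypothetical.

```python
import math

def vcvtff(ra, y_size):
    """Convert floats to 32-bit <X.Y> fixed point, truncating toward zero."""
    y_size %= 32                                 # Y width is taken modulo 32
    out = []
    for x in ra:
        fixed = math.trunc(x * (1 << y_size))    # scale by 2**Y, drop fraction
        if not -(1 << 31) <= fixed < (1 << 31):  # assumed signed 32-bit range
            raise OverflowError(f"{x} does not fit in <{32 - y_size}.{y_size}>")
        out.append(fixed)
    return out
```

Note that truncation toward zero means -0.3 with four fraction bits becomes -4 (that is, -0.25) rather than -5, which a floor-based conversion would give.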

VCVTIF integer to floating point conversion

Format

Assembler syntax

VCVTIF VRd，VRb

VCVTIF VRd，SRb

VCVTIF SRd，SRb

Supported modes

D：S：M	V＜-V	V＜-S	S＜-S

Explanation

The contents of vector/scalar register Rb are converted from the int32 data type to the floating-point data type, and the result is stored in vector/scalar register Rd.

Operating

for(i＝0；i＜NumElem；i++){

Rd[i]＝convert to floating point format(Rb[i])；

}

Abnormal

None.

Programming notes

This instruction supports only the word data size. Because the architecture does not support multiple data types within a register, this instruction does not use the element mask.

VD1CBR Decrement VCR1 and Conditional Branch

Format

Assembler syntax

VD1CBR.cond #Offset

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

Decrement VCR1 and, if Cond is true, branch. This is not a delayed branch.

Operating

VCR1＝VCR1-1；

If((VCR1＞0)&((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)))

VPC＝VPC+sex(Offset<22：0>*4)；

else VPC＝VPC+4；

Abnormal

Invalid instruction address.

Programming notes

Note that VCR1 is decremented before the branch condition is checked. Executing this instruction when VCR1 is 0 effectively sets the loop count to 2³²-1.
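The wrap-around behavior in the programming note follows from decrementing before testing. A minimal Python sketch, assuming VCR1 is an unsigned 32-bit counter and that the caller passes an already sign-extended word offset (the function name `vd1cbr` and the tuple return are hypothetical):

```python
MASK32 = 0xFFFFFFFF

def vd1cbr(vcr1, vpc, cond_true, offset):
    """Return (new_vcr1, new_vpc) for one decrement-and-branch step."""
    vcr1 = (vcr1 - 1) & MASK32             # decrement first, wrap at 32 bits
    if vcr1 > 0 and cond_true:
        vpc = (vpc + 4 * offset) & MASK32  # taken: branch to the loop head
    else:
        vpc = (vpc + 4) & MASK32           # not taken: fall through
    return vcr1, vpc
```

Starting from a counter of 0, the decrement wraps to 0xFFFFFFFF, which is greater than zero, so the branch is taken and the loop runs for the full 2³²-1 count.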

VD2CBR Decrement VCR2 and Conditional Branch

Format

Assembler syntax

VD2CBR.cond #Offset

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

Decrement VCR2 and, if Cond is true, branch. This is not a delayed branch.

Operating

VCR2＝VCR2-1；

If((VCR2＞0)&((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)))

VPC＝VPC+sex(Offset<22：0>*4)；

else VPC＝VPC+4；

Abnormal

Invalid instruction address.

Programming notes

Note that VCR2 is decremented before the branch condition is checked. Executing this instruction when VCR2 is 0 effectively sets the loop count to 2³²-1.

VD3CBR Decrement VCR3 and Conditional Branch

Format

Assembler syntax

VD3CBR.cond #Offset

Where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Explanation

Decrement VCR3 and, if Cond is true, branch. This is not a delayed branch.

Operating

VCR3＝VCR3-1；

If((VCR3＞0)&((Cond＝VCSR[SO，GT，EQ，LT])|(Cond＝un)))

VPC＝VPC+sex(Offset<22：0>*4)；

else VPC＝VPC+4；

Abnormal

Invalid instruction address.

Programming notes

Note that VCR3 is decremented before the branch condition is checked. Executing this instruction when VCR3 is 0 effectively sets the loop count to 2³²-1.

VDIV2N Divide by 2ⁿ

Format

Assembler syntax

VDIV2N.dt VRd，VRa，SRb

VDIV2N.dt VRd，VRa，#IMM

VDIV2N.dt SRd，SRa，SRb

VDIV2N.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

The contents of vector/scalar register Ra are divided by 2ⁿ, where n is the positive integer contents of scalar register Rb or of the IMM field, and the final result is stored in vector/scalar register Rd. This instruction uses truncation (round toward zero) as the rounding mode.

Operating

N＝{SRb％32‖IMM<4：0>}；

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝Ra[i]/2 ^N；

}

Abnormal

None.

Programming notes

Note that N is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for correctly specifying an N value that is less than or equal to the precision of the data size. If N is greater than the precision of the specified data size, the elements are filled with the sign bit. This instruction uses the round-toward-zero rounding mode.
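Rounding toward zero distinguishes this division from a plain arithmetic right shift, which rounds negative values toward minus infinity. The Python sketch below illustrates the difference for N within the precision of the data size; the function name `div2n` is hypothetical, and the sign-fill case for oversized N described above is not modeled.

```python
def div2n(value, n):
    """Divide a signed integer by 2**n, rounding toward zero."""
    n &= 0x1F                        # N is taken from the low five bits
    magnitude = abs(value) >> n      # shift the magnitude, then restore sign
    return -magnitude if value < 0 else magnitude
```

For example, `div2n(-7, 1)` gives -3, whereas the arithmetic shift `-7 >> 1` gives -4 in Python.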

VDIV2N.F Floating-Point Divide by 2ⁿ

Format

Assembler syntax

VDIV2N.f VRd，VRa，SRb

VDIV2N.f VRd，VRa，#IMM

VDIV2N.f SRd，SRa，SRb

VDIV2N.f SRd，SRa，#IMM

Supported modes

D：S：M	V＜-V@S	V＜-V@I	S＜-S@S	S＜-S@I

Explanation

The contents of vector/scalar register Ra are divided by 2ⁿ, where n is the positive integer contents of scalar register Rb or of the IMM field, and the final result is stored in vector/scalar register Rd.

Operating

N＝{SRb％32‖IMM<4：0>}；

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝Ra[i]/2 ^N；

}

Abnormal

None.

Programming notes

Note that N is taken as a 5-bit number from SRb or IMM<4:0>.

VDIVI Divide Initialize Incomplete

Format

Assembler syntax

VDIVI.ds VRb

VDIVI.ds SRb

Where ds = {b, b9, h, w}.

Supported modes

S	VRb	SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Performs the initial step of a non-restoring signed integer division. The dividend is a double-precision signed integer in the accumulator. If the dividend is a single-precision number, it must be sign-extended to double precision and stored in VACOH and VACOL. The divisor is a single-precision signed integer in Rb.

If the sign of the dividend is the same as the sign of the divisor, Rb is subtracted from the high part of the accumulator. If the signs differ, Rb is added to the high part of the accumulator.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb}；

if(VACOH[i]<msb>＝Bop[i]<msb>)

VACOH[i]＝VACOH[i]-Bop[i]；

else

VACOH[i]＝VACOH[i]+Bop[i]；

}

Abnormal

None.

Programming notes

The programmer is responsible for detecting overflow or divide-by-zero conditions before the divide steps.

VDIVS Divide Step Incomplete

Format

Assembler syntax

VDIVS.ds VRb

VDIVS.ds SRb

Where ds = {b, b9, h, w}.

Supported modes

S	VRb	SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Performs one iterative step of a non-restoring signed division. This instruction must be executed as many times as the data size (for example, 8 times for int8, 9 times for int9, 16 times for int16, and 32 times for the int32 data type). The VDIVI instruction must be used once before the divide steps to produce the initial partial remainder in the accumulator. The divisor is a single-precision signed integer in Rb. Each step extracts one quotient bit and shifts it into the least significant bit of the accumulator.

If the sign of the partial remainder in the accumulator is the same as the sign of the divisor in Rb, Rb is subtracted from the high part of the accumulator. If the signs differ, Rb is added to the high part of the accumulator.

If the sign of the resulting partial remainder in the accumulator (the result of the addition or subtraction) is the same as the sign of the divisor, the quotient bit is 1. If not, the quotient bit is 0. The accumulator is shifted left by one bit position, with the quotient bit filled in.

At the end of the divide steps, the remainder is in the high part of the accumulator and the quotient is in the low part of the accumulator. The quotient is in one's complement form.

Operating

VESL Element Shift Left by One

Format

Assembler syntax

VESL.dt SRc，VRd，VRa，SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S		SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Shifts the elements in vector register Ra left by one position, filling in from scalar register Rb. The leftmost element that is shifted out is returned in scalar register Rc, and the other elements are returned in vector register Rd.

Operating

VRd[0]＝SRb；

for(i＝1；i＜NumElem；i++)

VRd[i]＝VRa[i-1]；

SRc＝VRa[NumElem-1]；

Abnormal

None.

Programming notes

This instruction is not affected by the element mask.

VESR Element Shift Right by One

Format

Assembler syntax

VESR.dt SRc，VRd，VRa，SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S		SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Shifts the elements in vector register Ra right by one position, filling in from scalar register Rb. The rightmost element that is shifted out is returned in scalar register Rc, and the other elements are returned in vector register Rd.

Operating

SRc＝VRa[0]；

for(i＝0；i＜NumElem-1；i++)

VRd[i]＝VRa[i+1]；

VRd[NumElem-1]＝SRb；

Abnormal

None.

Programming notes

This instruction is not affected by the element mask.
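The two element shifts are mirror images of each other, which a list-based Python sketch makes visible. The function names are hypothetical, and the sketch assumes the intended semantics stated in the explanations: VESL moves every element to the next higher index and returns the element from index NumElem-1, while VESR moves every element to the next lower index and returns the element from index 0.

```python
def vesl(vra, srb):
    """Element shift left: return (VRd, SRc) with SRb entering at index 0."""
    return [srb] + vra[:-1], vra[-1]

def vesr(vra, srb):
    """Element shift right: return (VRd, SRc) with SRb entering at the top."""
    return vra[1:] + [srb], vra[0]
```

Chaining the scalar output of one shift into the scalar input of the next lets several vector registers behave as one long shift register, for example when sliding a filter window across a data stream.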

VEXTRT extract an element

Format

Assembler syntax

VEXTRT.dt SRd，VRa，SRb

VEXTRT.dt SRd，VRa，#IMM

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

D：S：M				S＜-S	S＜-I
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Extracts an element from vector register Ra, at the index specified by scalar register Rb or by the IMM field, and stores it in scalar register Rd.

Operating

index32＝{SRb％32‖IMM<4：0>}；

index64＝{SRb％64‖IMM<5：0>}；

index＝(VCSR<vec64>)？index64：index32；

SRd＝VRa[index]；

Abnormal

None.

Programming notes

This instruction is not affected by the element mask.

VEXTSGN2 Extract Sign of (1, -1)

Format

Assembler syntax

VEXTSGN2.dt VRd，VRa

VEXTSGN2.dt SRd，SRa

Where dt = {b, b9, h, w}.

Supported modes

S	V＜-V	S＜-S
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Computes the sign value of the contents of vector/scalar register Ra element by element, and stores the result in vector/scalar register Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝(Ra[i]＜0)？-1：1；

}

Abnormal

None.

VEXTSGN3 Extract Sign of (1, 0, -1)

Format

Assembler syntax

VEXTSGN3.dt VRd，VRa

VEXTSGN3.dt SRd，SRa

Where dt = {b, b9, h, w}.

Supported modes

S	V＜-V	S＜-S
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

if(Ra[i]＞0) Rd[i]＝1；

else if(Ra[i]＜0) Rd[i]＝-1；

else Rd[i]＝0；

}

Abnormal

None.

VINSRT insert an element

Format

Assembler syntax

VINSRT.dt VRd，SRa，SRb

VINSRT.dt VRd，SRa，#IMM

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

D：S：M		V＜-S	V＜-I
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Inserts the element in scalar register Ra into vector register Rd at the index specified by scalar register Rb or by the IMM field.

Operating

index32＝{SRb％32‖IMM<4：0>}；

index64＝{SRb％64‖IMM<5：0>}；

index＝(VCSR<Vec64>)？index64：index32；

VRd[index]＝SRa；

Abnormal

None.

Programming notes

This instruction is not affected by the element mask.

VL load

Format

Assembler syntax

VL.lt Rd，SRb，SRi

VL.lt Rd，SRb，#IMM

VL.lt Rd，SRb+，SRi

VL.lt Rd，SRb+，#IMM

Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLOFF for cache-off loads.

Explanation

Loads a vector register in the current or alternate bank, or a scalar register.

Operating

EA＝SRb+{SRi‖sex(IMM<7：0>)}；

if(A＝1)SRb＝EA；

Rd = see the table below:

LT	Load operation
.b	SR_d<7：0>＝BYTE[EA]
.bz9	SR_d<8：0>＝zex BYTE[EA]
.bs9	SR_d<8：0>＝sex BYTE[EA]
.h	SR_d<15：0>＝HALF[EA]
.w	SR_d<31：0>＝WORD[EA]
.4	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 3
.8	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 7
.16	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 15
.32	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 31
.64	VR_0d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 31 VR_1d<9i+8：9i>＝sex BYTE[EA+32+i]，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VLCB Load from Circular Buffer

Format

Assembler syntax

VLCB.lt Rd，SRb，SRi

VLCB.lt Rd，SRb，#IMM

VLCB.lt Rd，SRb+，SRi

VLCB.lt Rd，SRb+，#IMM

Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLCBOFF for cache-off loads.

Explanation

Loads a vector or scalar register from the circular buffer bounded by the BEGIN pointer in SR_b+1 and the END pointer in SR_b+2.

If the effective address is greater than the END address, the effective address is adjusted before the load and the address update operation. In addition, the circular buffer bounds must be aligned on halfword and word boundaries for the .h and .w scalar loads, respectively.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

BEGIN＝SR_b+1；

END＝SR_b+2；

cbsize＝END-BEGIN；

if(EA＞END)EA＝BEGIN+(EA-END)；

if(A＝1)SR_b＝EA；

R_d = see the table below:

LT	Load operation
.bz9	SR_d<8：0>＝zex BYTE[EA]
.bs9	SR_d<8：0>＝sex BYTE[EA]
.h	SR_d<15：0>＝HALF[EA]
.w	SR_d<31：0>＝WORD[EA]
.4	VR_d<9i+8：9i>＝sex BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]，i＝0 to 3
.8	VR_d<9i+8：9i>＝sex BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]，i＝0 to 7
.16	VR_d<9i+8：9i>＝sex BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]，i＝0 to 15
.32	VR_d<9i+8：9i>＝sex BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]，i＝0 to 31
.64	VR_0d<9i+8：9i>＝sex BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]，i＝0 to 31 VR_1d<9i+8：9i>＝sex BYTE[(EA+32+i＞END)？EA+32+i-cbsize：EA+32+i]，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

The programmer must ensure that the following condition holds for this instruction to work as expected:

BEGIN＜EA＜2*END-BEGIN

That is, EA > BEGIN and EA - END < END - BEGIN.
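The address wrapping used by the circular-buffer load can be traced with a small Python sketch that follows the pseudo-code literally. The memory is modeled as a Python list indexed by byte address, and the names `circular_address` and `vlcb_bytes` are hypothetical; sign extension to 9 bits is omitted to keep the focus on addressing.

```python
def circular_address(ea, begin, end):
    """Adjust an effective address that has run past END."""
    return begin + (ea - end) if ea > end else ea

def vlcb_bytes(memory, ea, begin, end, count):
    """Read `count` bytes starting at ea, wrapping past END by cbsize."""
    cbsize = end - begin
    ea = circular_address(ea, begin, end)
    out = []
    for i in range(count):
        addr = ea + i - cbsize if ea + i > end else ea + i
        out.append(memory[addr])
    return out
```

Because only a single wrap is applied, the condition BEGIN < EA < 2*END-BEGIN stated above is what guarantees that the adjusted address lands back inside the buffer.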

VLD Load Double

Format

Assembler syntax

VLD.lt Rd，SRb，SRi

VLD.lt Rd，SRb，#IMM

VLD.lt Rd，SRb+，SRi

VLD.lt Rd，SRb+，#IMM

Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLDOFF for cache-off loads.

Explanation

Loads two vector registers in the current or alternate bank, or two scalar registers.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

R_d：R_d+1 = see the table below:

LT	Load operation
.bz9	SR_d<8：0>＝zex BYTE[EA] SR_d+1<8：0>＝zex BYTE[EA+1]
.bs9	SR_d<8：0>＝sex BYTE[EA] SR_d+1<8：0>＝sex BYTE[EA+1]
.h	SR_d<15：0>＝HALF[EA] SR_d+1<15：0>＝HALF[EA+2]
.w	SR_d<31：0>＝WORD[EA] SR_d+1<31：0>＝WORD[EA+4]
.4	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 3 VR_d+1<9i+8：9i>＝sex BYTE[EA+4+i]，i＝0 to 3
.8	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 7 VR_d+1<9i+8：9i>＝sex BYTE[EA+8+i]，i＝0 to 7
.16	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 15 VR_d+1<9i+8：9i>＝sex BYTE[EA+16+i]，i＝0 to 15
.32	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 31 VR_d+1<9i+8：9i>＝sex BYTE[EA+32+i]，i＝0 to 31
.64	VR_0d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 31 VR_1d<9i+8：9i>＝sex BYTE[EA+32+i]，i＝0 to 31 VR_0d+1<9i+8：9i>＝sex BYTE[EA+64+i]，i＝0 to 31 VR_1d+1<9i+8：9i>＝sex BYTE[EA+96+i]，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VLI Load Immediate

Format

Assembler syntax

VLI.dt VRd，#IMM

VLI.dt SRd，#IMM

Where dt = {b, b9, h, w, f}.

Explanation

Loads an immediate value into a scalar or vector register.

For a scalar register load, a byte, byte9, halfword, or word is loaded according to the data type. For the byte, byte9, and halfword data types, the unaffected bytes (byte9s) are not modified.

Operating

Rd = see the table below:

DT	Scalar load	Vector load
.i8	SR_d<7：0>＝IMM<7：0>	VR_d＝32 int8 elements
.i9	SR_d<8：0>＝IMM<8：0>	VR_d＝32 int9 elements
.i16	SR_d<15：0>＝IMM<15：0>	VR_d＝16 int16 elements
.i32	SR_d<31：0>＝sex IMM<18：0>	VR_d＝8 int32 elements
.f	SR_d<31>＝IMM<18>(sign) SR_d<30：23>＝IMM<17：10>(exponent) SR_d<22：13>＝IMM<9：0>(mantissa) SR_d<12：0>＝zeroes	VR_d＝8 float elements

Abnormal

None.
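The .f row of the table packs a floating-point constant into a 19-bit immediate. The Python sketch below expands such an immediate using the bit positions listed in the table; it assumes the 32-bit result is interpreted as an IEEE 754 single-precision value, and the function name `vli_float` is hypothetical.

```python
import struct

def vli_float(imm19):
    """Expand a 19-bit immediate into a single-precision float."""
    imm19 &= 0x7FFFF
    sign = (imm19 >> 18) & 0x1         # IMM<18>    -> bit 31
    exponent = (imm19 >> 10) & 0xFF    # IMM<17:10> -> bits 30:23
    mantissa = imm19 & 0x3FF           # IMM<9:0>   -> bits 22:13
    bits = (sign << 31) | (exponent << 23) | (mantissa << 13)  # 12:0 zero
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

Only the top 10 mantissa bits can be specified, so constants such as 1.0, 1.5, or -2.0 load exactly, while values that need more mantissa bits must be loaded from memory instead.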

VLQ Load Quad

Format

Assembler syntax

VLQ.lt Rd，SRb，SRi

VLQ.lt Rd，SRb，#IMM

VLQ.lt Rd，SRb+，SRi

VLQ.lt Rd，SRb+，#IMM

Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLQOFF for cache-off loads.

Explanation

Loads four vector registers in the current or alternate bank, or four scalar registers.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

R_d：R_d+1：R_d+2：R_d+3 = see the table below:

LT	Load operation
.bz9	SR_d<8：0>＝zex BYTE[EA] SR_d+1<8：0>＝zex BYTE[EA+1] SR_d+2<8：0>＝zex BYTE[EA+2] SR_d+3<8：0>＝zex BYTE[EA+3]
.bs9	SR_d<8：0>＝sex BYTE[EA] SR_d+1<8：0>＝sex BYTE[EA+1] SR_d+2<8：0>＝sex BYTE[EA+2] SR_d+3<8：0>＝sex BYTE[EA+3]
.h	SR_d<15：0>＝HALF[EA] SR_d+1<15：0>＝HALF[EA+2] SR_d+2<15：0>＝HALF[EA+4] SR_d+3<15：0>＝HALF[EA+6]
.w	SR_d<31：0>＝WORD[EA] SR_d+1<31：0>＝WORD[EA+4] SR_d+2<31：0>＝WORD[EA+8] SR_d+3<31：0>＝WORD[EA+12]
.4	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 3 VR_d+1<9i+8：9i>＝sex BYTE[EA+4+i]，i＝0 to 3 VR_d+2<9i+8：9i>＝sex BYTE[EA+8+i]，i＝0 to 3 VR_d+3<9i+8：9i>＝sex BYTE[EA+12+i]，i＝0 to 3
.8	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 7 VR_d+1<9i+8：9i>＝sex BYTE[EA+8+i]，i＝0 to 7 VR_d+2<9i+8：9i>＝sex BYTE[EA+16+i]，i＝0 to 7 VR_d+3<9i+8：9i>＝sex BYTE[EA+24+i]，i＝0 to 7
.16	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 15 VR_d+1<9i+8：9i>＝sex BYTE[EA+16+i]，i＝0 to 15 VR_d+2<9i+8：9i>＝sex BYTE[EA+32+i]，i＝0 to 15 VR_d+3<9i+8：9i>＝sex BYTE[EA+48+i]，i＝0 to 15
.32	VR_d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 31 VR_d+1<9i+8：9i>＝sex BYTE[EA+32+i]，i＝0 to 31 VR_d+2<9i+8：9i>＝sex BYTE[EA+64+i]，i＝0 to 31 VR_d+3<9i+8：9i>＝sex BYTE[EA+96+i]，i＝0 to 31
.64	VR_0d<9i+8：9i>＝sex BYTE[EA+i]，i＝0 to 31 VR_1d<9i+8：9i>＝sex BYTE[EA+32+i]，i＝0 to 31 VR_0d+1<9i+8：9i>＝sex BYTE[EA+64+i]，i＝0 to 31 VR_1d+1<9i+8：9i>＝sex BYTE[EA+96+i]，i＝0 to 31 VR_0d+2<9i+8：9i>＝sex BYTE[EA+128+i]，i＝0 to 31 VR_1d+2<9i+8：9i>＝sex BYTE[EA+160+i]，i＝0 to 31 VR_0d+3<9i+8：9i>＝sex BYTE[EA+192+i]，i＝0 to 31 VR_1d+3<9i+8：9i>＝sex BYTE[EA+224+i]，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VLR Load Reverse

Format

Assembler syntax

VLR.lt Rd，SRb，SRi

VLR.lt Rd，SRb，#IMM

VLR.lt Rd，SRb+，SRi

VLR.lt Rd，SRb+，#IMM

Where lt = {4, 8, 16, 32, 64} and Rd = {VRd, VRAd}. Note that .64 and VRAd cannot be specified together. Use VLROFF for cache-off loads.

Explanation

Loads a vector register in reverse element order. This instruction does not support a scalar destination register.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

Rd = see the table below:

LT	Load operation
.4	VR_d[31-i]<8：0>＝sex BYTE[EA+i]，i＝0 to 3
.8	VR_d[31-i]<8：0>＝sex BYTE[EA+i]，i＝0 to 7
.16	VR_d[31-i]<8：0>＝sex BYTE[EA+i]，i＝0 to 15
.32	VR_d[31-i]<8：0>＝sex BYTE[EA+i]，i＝0 to 31
.64	VR_0d[31-i]<8：0>＝sex BYTE[EA+32+i]，i＝0 to 31 VR_1d[31-i]<8：0>＝sex BYTE[EA+i]，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VLSL Logical Shift Left

Format

Assembler syntax

VLSL.dt VRd，VRa，SRb

VLSL.dt VRd，VRa，#IMM

VLSL.dt SRd，SRa，SRb

VLSL.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

Each element of vector/scalar register Ra is logically shifted left, with zeros filled into the least significant bit (LSB) positions. The shift amount is given in scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd.

Operating

shift_amount＝{SRb％32‖IMM<4：0>}；

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝Ra[i]＜＜shift_amount；

}

Abnormal

None.

Programming notes

Note that shift_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for correctly specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with zeros.

VLSR Logical Shift Right

Format

Assembler syntax

VLSR.dt VRd，VRa，SRb

VLSR.dt VRd，VRa，#IMM

VLSR.dt SRd，SRa，SRb

VLSR.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

Each element of vector/scalar register Ra is logically shifted right, with zeros filled into the most significant bit (MSB) positions. The shift amount is given in scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd.

Operating

shift_amount＝{SRb％32‖IMM<4：0>}；

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝Ra[i]zero＞＞shift_amount；

}

Abnormal

None.

Programming notes

VLWS Load with Stride

Format

Assembler syntax

VLWS.dt Rd，SRb，SRi

VLWS.dt Rd，SRb，#IMM

VLWS.dt Rd，SRb+，SRi

VLWS.dt Rd，SRb+，#IMM

Where dt = {4, 8, 16, 32} and Rd = {VRd, VRAd}. Note that the .64 mode is not supported; use VL instead. Use VLWSOFF for cache-off loads.

Explanation

Starting at the effective address, loads 32 bytes from memory into vector register VRd, using scalar register SR_b+1 as the stride control register.

LT specifies the block size, which is the number of consecutive bytes loaded for each block. SR_b+1 specifies the stride, which is the number of bytes separating the starts of two consecutive blocks.

The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and the block size must be multiples of the data size.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

Block-size＝{4‖8‖16‖32}；

Stride＝SR_b+1<31：0>；

for(i＝0；i＜VECSIZE/Block-size；i++)

for(j＝0；j＜Block-size；j++)

VRd[i*Block-size+j]<8：0>＝sex BYTE[EA+i*Stride+j]；

Abnormal

Invalid data address, unaligned accesses.
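The two nested loops of the strided load gather fixed-size blocks that are spaced `Stride` bytes apart. A minimal Python sketch, assuming VECSIZE is 32 bytes as the explanation states and modeling memory as a list indexed by byte address (the function name `vlws` is hypothetical, and sign extension to 9 bits is omitted):

```python
VECSIZE = 32  # bytes gathered per load, per the explanation above

def vlws(memory, ea, block_size, stride):
    """Gather VECSIZE bytes as blocks of block_size spaced stride apart."""
    if block_size not in (4, 8, 16, 32):
        raise ValueError("block size must be 4, 8, 16 or 32")
    if stride < block_size:
        raise ValueError("stride must be at least the block size")
    out = []
    for i in range(VECSIZE // block_size):       # one iteration per block
        for j in range(block_size):              # consecutive bytes in block
            out.append(memory[ea + i * stride + j])
    return out
```

With a block size of 4 and a stride of 16, for example, the load picks up the first 4 bytes of each 16-byte row, which is the kind of access pattern needed to fetch a narrow column of a two-dimensional image.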

VMAC multiply and accumulate

Format

Assembler syntax

VMAC.dt VRa，VRb

VMAC.dt VRa，SRb

VMAC.dt VRa，#IMM

VMAC.dt SRa，SRb

VMAC.dt SRa，#IMM

Where dt = {b, h, w, f}.

Supported modes

D：S：M	V＜-V@V	V＜-V@S	V＜-V@I	S＜-S@S	S＜-S@I
DS	int8(b)		int16(h)	int32(w)	float(f)

Explanation

Each element of Ra is multiplied by each element of Rb to produce a double-precision intermediate result. Each double-precision element of the intermediate result is added to the corresponding double-precision element of the vector accumulator, and the double-precision sum of each element is stored in the vector accumulator.

Ra and Rb use the specified data type, while VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high part of each double-precision element is stored in VACH.

For the floating-point data type, all operands and results are single precision.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Aop[i]＝{VRa[i]‖SRa}；

Bop[i]＝{VRb[i]‖SRb}；

if(dt＝float)VACL[i]＝Aop[i]*Bop[i]+VACL[i]；

else VACH[i]：VACL[i]＝Aop[i]*Bop[i]+VACH[i]：VACL[i]；

}
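The split VACH:VACL accumulator for the integer data types can be modelled as follows. This is a sketch under assumptions made here: the function name `vmac`, the list-based registers, the convention that the high part is kept signed and the low part unsigned, wrap-around on overflow, and the reading of EMASK as a per-element enable are all choices of this example.

```python
def vmac(va, vb, vach, vacl, bits, emask=None):
    """Integer multiply-accumulate into a split accumulator (hypothetical model).

    vach holds the signed high part and vacl the unsigned low part of each
    2*bits-wide accumulator element; both lists are updated in place.
    """
    mask = (1 << bits) - 1
    wide = 1 << (2 * bits)
    for i in range(len(va)):
        if emask is not None and not emask[i]:
            continue                              # masked-off element unchanged
        acc = (vach[i] << bits) + vacl[i]         # rebuild VACH:VACL
        acc = (acc + va[i] * vb[i]) % wide        # wrap to 2*bits bits
        if acc >= wide // 2:
            acc -= wide                           # reinterpret as signed
        vach[i] = acc >> bits                     # arithmetic shift keeps the sign
        vacl[i] = acc & mask


vach, vacl = [0, 0], [0, 0]
vmac([100, -128], [100, 127], vach, vacl, bits=8)
```

For int8 inputs, the products 10000 and -16256 do not fit in 8 bits, which is why the accumulator element is twice as wide as the operands.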

Abnormal

Overflow, invalid floating-point operand.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMACF fractional multiply and accumulate

Format

Assembler syntax

VMACF.dt VRa，VRb

VMACF.dt VRa，SRb

VMACF.dt VRa，#IMM

VMACF.dt SRa，SRb

VMACF.dt SRa，#IMM

Where dt = {b, h, w}.

Supported modes

D：S：M	V＜-V@V	V＜-V@S	V＜-V@I	S＜-S@S	S＜-S@I
DS	int8(b)		int16(h)	int32(w)

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; each shifted double-precision intermediate result is added to the corresponding double-precision element of the vector accumulator; and each double-precision sum is stored in the vector accumulator.

VRa and Rb use the specified data type, while VAC uses the corresponding double-precision data type (16, 32 and 64 bits for int8, int16 and int32, respectively). The high part of each double-precision element is stored in VACH.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

VACH[i]：VACL[i]＝((VRa[i]*Bop[i])＜＜1)+VACH[i]：VACL[i]；

}
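The one-bit left shift makes sense when the operands are read as signed fractions. The sketch below assumes a Q15 interpretation of int16 values (1 sign bit, 15 fraction bits); that format name and the helpers `to_q15` and `vmacf_q15` are assumptions of this example, since the text only says "fractional". The product of two Q15 values carries two sign bits, and shifting it left by one bit yields a Q31 value whose upper half is again Q15.

```python
def to_q15(x):
    """Encode a fraction in [-1, 1) as a 16-bit Q15 integer (assumed format)."""
    return int(round(x * (1 << 15)))


def vmacf_q15(acc, a, b):
    """Accumulate the Q31 product of two Q15 values: ((a * b) << 1) + acc."""
    return acc + ((a * b) << 1)


acc = 0
acc = vmacf_q15(acc, to_q15(0.5), to_q15(0.25))   # adds 0.125
acc = vmacf_q15(acc, to_q15(0.25), to_q15(0.5))   # adds 0.125 again
high_half = acc >> 16                             # upper 16 bits, as held in VACH
```

Overflow handling is left out of the sketch; the instruction itself reports overflow as an exception.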

Abnormal

Overflow.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMACL multiply and accumulate low

Format

Assembler syntax

VMACL.dt VRd，VRa，VRb

VMACL.dt VRd，VRa，SRb

VMACL.dt VRd，VRa，#IMM

VMACL.dt SRd，SRa，SRb

VMACL.dt SRd，SRa，#IMM

Where dt = {b, h, w, f}.

Supported modes

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision intermediate result is added to the corresponding double-precision element of the vector accumulator; each double-precision sum is stored in the vector accumulator; and the low part is returned to destination register VRd.

VRa and Rb use the specified data type, while VAC uses the corresponding double-precision data type (16, 32 and 64 bits for int8, int16 and int32, respectively). The high part of each double-precision element is stored in VACH.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb}；

if(dt＝float)VACL[i]＝VRa[i]*Bop[i]+VACL[i]；

else VACH[i]：VACL[i]＝VRa[i]*Bop[i]+VACH[i]：VACL[i]；

VRd[i]＝VACL[i]；

}

Abnormal

Overflow, invalid floating-point operand.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMAD multiplication and addition

Format

Assembler syntax

VMAD.dt VRc，VRd，VRa，VRb

VMAD.dt SRc，SRd，SRa，SRb

Where dt = {b, h, w}.

Supported modes

S	VR	SR
DS	int8(b)		int16(h)	int32(w)

Explanation

Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision intermediate result is added to the corresponding element of Rc; and each double-precision sum is stored in the destination registers Rd+1:Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Aop[i]＝{VRa[i]‖SRa}；

Bop[i]＝{VRb[i]‖SRb}；

Cop[i]＝{VRc[i]‖SRc}；

Rd+1[i]：Rd[i]＝Aop[i]*Bop[i]+sex_dp(Cop[i])；

}

Abnormal

None.

VMADL low multiplication and addition

Format

Assembler syntax

VMADL.dt VRc，VRd，VRa，VRb

VMADL.dt SRc，SRd，SRa，SRb

Where dt = {b, h, w, f}.

Supported modes

S	VR	SR
DS	int8(b)	float(f)	int16(h)	int32(w)

Explanation

Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision intermediate result is added to the corresponding element of Rc; and the low part of each double-precision sum is returned to destination register Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Aop[i]＝{VRa[i]‖SRa}；

Bop[i]＝{VRb[i]‖SRb}；

Cop[i]＝{VRc[i]‖SRc}；

if(dt＝float)Lo[i]＝Aop[i]*Bop[i]+Cop[i]；

else Hi[i]：Lo[i]＝Aop[i]*Bop[i]+sex_dp(Cop[i])；

Rd[i]＝Lo[i]；

}

Abnormal

Overflow, invalid floating-point operand.

VMAS multiply and subtract from accumulator

Format

Assembler syntax

VMAS.dt VRa，VRb

VMAS.dt VRa，SRb

VMAS.dt VRa，#IMM

VMAS.dt SRa，SRb

VMAS.dt SRa，#IMM

Where dt = {b, h, w, f}.

Supported modes

Explanation

Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision intermediate result is subtracted from the corresponding double-precision element of the vector accumulator; and each double-precision result is stored in the vector accumulator.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb}；

if(dt＝float)VACL[i]＝VACL[i]-VRa[i]*Bop[i]；

else VACH[i]：VACL[i]＝VACH[i]：VACL[i]-VRa[i]*Bop[i]；

}

Abnormal

Overflow, invalid floating-point operand.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMASF fractional multiply and subtract from accumulator

Format

Assembler syntax

VMASF.dt VRa，VRb

VMASF.dt VRa，SRb

VMASF.dt VRa，#IMM

VMASF.dt SRa，SRb

VMASF.dt SRa，#IMM

Where dt = {b, h, w}.

Supported modes

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; each shifted double-precision intermediate result is subtracted from the corresponding double-precision element of the vector accumulator; and each double-precision result is stored in the vector accumulator.

VRa and Rb use the specified data type, while VAC uses the corresponding double-precision data type (16, 32 and 64 bits for int8, int16 and int32, respectively). The high part of each double-precision element is stored in VACH.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

VACH[i]：VACL[i]＝VACH[i]：VACL[i]-((VRa[i]*Bop[i])＜＜1)；

}

Abnormal

Overflow.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMASL low multiply and subtract from accumulator

Format

Assembler syntax

VMASL.dt VRd，VRa，VRb

VMASL.dt VRd，VRa，SRb

VMASL.dt VRd，VRa，#IMM

VMASL.dt SRd，SRa，SRb

VMASL.dt SRd，SRa，#IMM

Where dt = {b, h, w, f}.

Supported modes

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; each double-precision intermediate result is subtracted from the corresponding double-precision element of the vector accumulator; each double-precision result is stored in the vector accumulator; and the low part is stored in destination register VRd.

VRa and Rb use the specified data type, while VAC uses the corresponding double-precision data type (16, 32 and 64 bits for int8, int16 and int32, respectively). The high part of each double-precision element is stored in VACH.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb}；

if(dt＝float)VACL[i]＝VACL[i]-VRa[i]*Bop[i]；

else VACH[i]：VACL[i]＝VACH[i]：VACL[i]-VRa[i]*Bop[i]；

VRd[i]＝VACL[i]；

}

Abnormal

Overflow, invalid floating-point operand.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMAXE pairwise maximum and exchange

Format

Assembler syntax

VMAXE.dt VRd，VRb

Where dt = {b, b9, h, w, f}.

Supported modes

D：S：M	V＜-V
DS	int8(b)	int9(b9)	int16(h)	int32(w)	float(f)

Explanation

VRa should be the same as VRb. When VRa differs from VRb, the result is undefined.

The even/odd pairs of data elements in vector register Rb are compared; the larger value of each pair is stored in the even position of vector register Rd, and the smaller value is stored in the odd position.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i＝i+2){

VRd[i]＝(VRb[i]＞VRb[i+1])？VRb[i]：VRb[i+1]；

VRd[i+1]＝(VRb[i]＞VRb[i+1])？VRb[i+1]：VRb[i]；

}
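The pairwise compare-and-exchange can be illustrated with a short Python sketch. The function name `vmaxe` and the list-based register are assumptions of this example, and element masking is left out for brevity.

```python
def vmaxe(vb):
    """Within each even/odd pair, place the larger value in the even slot
    and the smaller value in the odd slot (hypothetical model)."""
    vd = list(vb)
    for i in range(0, len(vb) - 1, 2):
        larger = max(vb[i], vb[i + 1])
        smaller = min(vb[i], vb[i + 1])
        vd[i], vd[i + 1] = larger, smaller
    return vd


sorted_pairs = vmaxe([3, 7, 9, 2, 5, 5, -1, 0])
```

Each pair ends up in descending order, which is the basic step of a sorting network.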

Abnormal

None.

VMOV transfer

Format

Assembler syntax

VMOV.dt Rd，Rb

Where dt = {b, b9, h, w, f}. Rd and Rb denote architecturally specified register names.

Note that .w and .f specify the same operation.

Supported modes

Explanation

Moves the contents of register Rb to register Rd. The Group field specifies the source and destination register groups. The register group notation is:

VR: current bank vector register

VRA: alternate bank vector register

SR: scalar register

SP: special register

RASR: return address stack register

VAC: vector accumulator register (see the VAC register encoding table below)

Group<3:0>	Source group	Destination group	Note
0000			Reserved
0001	VR	VRA
0010	VRA	VR
0011	VRA	VRA
0100			Reserved
0101			Reserved
0110	VRA	VAC
0111	VAC	VRA
1000			Reserved
1001	SR	VRA
1010			Reserved
1011			Reserved
1100	SR	SP
1101	SP	SR
1110	SR	RASR
1111	RASR	SR

Note that this instruction cannot be used to move a vector register to a scalar register. The VEXTRT instruction is provided for that purpose.

The VAC registers are encoded as shown in the following table:

R<2：0>	Register	Note
000	Undefined
001	VAC0L
010	VAC0H
011	VAC0	Specifies both VAC0H:VAC0L. When specified as the source, the register pair VRd+1:VRd is updated. VRd must be an even-numbered register.
100	Undefined
101	VAC1L
110	VAC1H
111	VAC1	Specifies both VAC1H:VAC1L. When specified as the source, the register pair VRd+1:VRd is updated. VRd must be an even-numbered register.
Other	Undefined

Operating

Rd＝Rb

Abnormal

Setting an exception status bit in VCSR or VISRC causes the corresponding exception.

Programming notes

This instruction is not affected by the element mask. Note that the concept of an alternate bank does not exist in VEC64 mode, so in VEC64 mode this instruction cannot be used to move data to or from registers in the alternate bank.

VMUL multiply

Format

Assembler syntax

VMUL.dt VRc，VRd，VRa，VRb

VMUL.dt SRc，SRd，SRa，SRb

Where dt = {b, h, w}.

Supported modes

S	VR	SR
DS	int8(b)		int16(h)	int32(w)

Explanation

Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision result, and each double-precision result is returned to the destination registers Rc:Rd.

Ra and Rb use the specified data type, while Rc:Rd uses the corresponding double-precision data type (16, 32 and 64 bits for int8, int16 and int32, respectively). The high part of each double-precision element is stored in Rc.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Aop[i]＝{VRa[i]‖SRa}；

Bop[i]＝{VRb[i]‖SRb}；

Hi[i]：Lo[i]＝Aop[i]*Bop[i]；

Rc[i]＝Hi[i]；

Rd[i]＝Lo[i]；

}
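The split of each double-precision product into a high register and a low register can be sketched as follows. The function name `vmul`, the list-based registers, and the convention that the high part is signed and the low part unsigned are assumptions of this example.

```python
def vmul(va, vb, bits):
    """Return (rc, rd): the high and low parts of each element product
    (hypothetical model of the Rc:Rd register pair)."""
    mask = (1 << bits) - 1
    rc, rd = [], []
    for a, b in zip(va, vb):
        product = a * b                # fits in 2*bits bits for in-range inputs
        rc.append(product >> bits)     # high part (arithmetic shift keeps sign)
        rd.append(product & mask)      # low part
    return rc, rd


rc, rd = vmul([300, -300], [300, 300], bits=16)
```

Recombining each pair as `(hi << bits) + lo` gives back the full product.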

Abnormal

None.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead. It also does not support the floating-point data type, because the extended result data type is not supported.

VMULA multiply to accumulator

Format

Assembler syntax

VMULA.dt VRa，VRb

VMULA.dt VRa，SRb

VMULA.dt VRa，#IMM

VMULA.dt SRa，SRb

VMULA.dt SRa，#IMM

Where dt = {b, h, w, f}.

Supported modes

D：S：M	V@V	V@S	V@I	S@S	S@I
DS	int8(b)		int16(h)	int32(w)	float(f)

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision result, which is written to the accumulator.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb}；

if(dt＝＝float)VACL[i]＝VRa[i]*Bop[i]；

else VACH[i]：VACL[i]＝VRa[i]*Bop[i]；

}

Abnormal

None.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMULAF fractional multiply to accumulator

Format

Assembler syntax

VMULAF.dt VRa，VRb

VMULAF.dt VRa，SRb

VMULAF.dt VRa，#IMM

VMULAF.dt SRa，SRb

VMULAF.dt SRa，#IMM

Where dt = {b, h, w}.

Supported modes

D：S：M	V@V	V@S	V@I	S@S	S@I
DS	int8(b)		int16(h)	int32(w)

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; and the result is written to the accumulator.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

VACH[i]：VACL[i]＝(VRa[i]*Bop[i])＜＜1；

}

Abnormal

None.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMULF fractional multiply

Format

Assembler syntax

VMULF.dt VRd，VRa，VRb

VMULF.dt VRd，VRa，SRb

VMULF.dt VRd，VRa，#IMM

VMULF.dt SRd，SRa，SRb

VMULF.dt SRd，SRa，#IMM

Where dt = {b, h, w}.

Supported modes

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; the high part of the result is returned to destination register VRd+1, and the low part is returned to destination register VRd. VRd must be an even-numbered register.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Hi[i]：Lo[i]＝(VRa[i]*Bop[i])＜＜1；

VRd+1[i]＝Hi[i]；

VRd[i]＝Lo[i]；

}

Abnormal

None.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMULFR fractional multiply and round

Format

Assembler syntax

VMULFR.dt VRd，VRa，VRb

VMULFR.dt VRd，VRa，SRb

VMULFR.dt VRd，VRa，#IMM

VMULFR.dt SRd，SRa，SRb

VMULFR.dt SRd，SRa，#IMM

Where dt = {b, h, w}.

Supported modes

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result; the double-precision intermediate result is shifted left by one bit; the shifted intermediate result is rounded into the high part; and the high part is returned to destination register VRd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Hi[i]：Lo[i]＝(VRa[i]*Bop[i])＜＜1；

if(Lo[i]<msb>＝＝1) Hi[i]＝Hi[i]+1；

VRd[i]＝Hi[i]；

}
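The rounding step of this operation adds one to the high part whenever the most significant bit of the low part is set. A minimal Python sketch, assuming 16-bit elements by default; the function name `vmulfr` is introduced here, and wrap-around or saturation of an out-of-range result is not modelled.

```python
def vmulfr(a, b, bits=16):
    """Fractional multiply with rounding into the high part
    (hypothetical model; no overflow handling)."""
    product = (a * b) << 1                 # shifted double-precision product
    hi = product >> bits                   # high part (arithmetic shift)
    lo = product & ((1 << bits) - 1)       # low part
    if lo >> (bits - 1):                   # msb of the low part set?
        hi += 1                            # round the high part up
    return hi


rounded_up = vmulfr(3, 16384)      # 3 * 0.5 in Q15 terms: 1.5 rounds to 2
rounded_neg = vmulfr(-3, 16384)    # -1.5 rounds to -1
```

Without the rounding step the first example would truncate to 1.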

Abnormal

None.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMULL multiply low

Format

Assembler syntax

VMULL.dt VRd，VRa，VRb

VMULL.dt VRd，VRa，SRb

VMULL.dt VRd，VRa，#IMM

VMULL.dt SRd，SRa，SRb

VMULL.dt SRd，SRa，#IMM

Where dt = {b, h, w, f}.

Supported modes

Explanation

Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision result, and the low part of the result is returned to destination register VRd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb}；

if(dt＝float)Lo[i]＝VRa[i]*Bop[i]；

else Hi[i]：Lo[i]＝VRa[i]*Bop[i]；

VRd[i]＝Lo[i]；

}

Abnormal

Overflow, invalid floating-point operand.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VNAND NAND

Format

Assembler syntax

VNAND.dt VRd，VRa，VRb

VNAND.dt VRd，VRa，SRb

VNAND.dt VRd，VRa，#IMM

VNAND.dt SRd，SRa，SRb

VNAND.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

Explanation

Performs a logical NAND of each bit of each element of Ra with the corresponding bit of the Rb/immediate operand; the result is returned in Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]<k>＝~(Ra[i]<k> & Bop[i]<k>)，for k＝all bits in element i；

}

Abnormal

None.

VNOR NOR

Format

Assembler syntax

VNOR.dt VRd，VRa，VRb

VNOR.dt VRd，VRa，SRb

VNOR.dt VRd，VRa，#IMM

VNOR.dt SRd，SRa，SRb

VNOR.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

D：S：M	V＜-V@V	V＜-V@S	V＜-V@I	S＜-S@S	S＜-S@I
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

Performs a logical NOR of each bit of each element of Ra with the corresponding bit of the Rb/immediate operand; the result is returned in Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]<k>＝~(Ra[i]<k>|Bop[i]<k>)，for k＝all bits in element i；

}

Abnormal

None.

VOR OR

Format

Assembler syntax

VOR.dt VRd，VRa，VRb

VOR.dt VRd，VRa，SRb

VOR.dt VRd，VRa，#IMM

VOR.dt SRd，SRa，SRb

VOR.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

Explanation

Performs a logical OR of each bit of each element of Ra with the corresponding bit of the Rb/immediate operand; the result is returned in Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]<k>＝Ra[i]<k>|Bop[i]<k>，for k＝all bits in element i；

}

Abnormal

None.

VORC OR complement

Format

Assembler syntax

VORC.dt VRd，VRa，VRb

VORC.dt VRd，VRa，SRb

VORC.dt VRd，VRa，#IMM

VORC.dt SRd，SRa，SRb

VORC.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes

Explanation

Performs a logical OR of each bit of each element of Ra with the complement of the corresponding bit of the Rb/immediate operand; the result is returned in Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]<k>＝Ra[i]<k>|~Bop[i]<k>，for k＝all bits in element i；

}
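The four bitwise instructions VNAND, VNOR, VOR and VORC differ only in the Boolean function applied to each bit. A compact Python sketch for a single element follows; the function name `vector_logic` and the string operation codes are assumptions of this example. The final mask keeps the result within the element width, which matters for the complemented forms because Python integers are unbounded.

```python
def vector_logic(op, a, b, bits):
    """Apply a bitwise operation to two element values of the given width
    (hypothetical single-element model)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    if op == "nand":
        result = ~(a & b)
    elif op == "nor":
        result = ~(a | b)
    elif op == "or":
        result = a | b
    elif op == "orc":
        result = a | ~b
    else:
        raise ValueError("unknown operation: " + op)
    return result & mask          # discard bits above the element width


results = {op: vector_logic(op, 0b1100, 0b1010, bits=4)
           for op in ("nand", "nor", "or", "orc")}
```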

Abnormal

None.

VPFTCH prefetch

Format

Assembler syntax

VPFTCH.ln SRb，SRi

VPFTCH.ln SRb，#IMM

VPFTCH.ln SRb+，SRi

VPFTCH.ln SRb+，#IMM

Where ln = {1,2,4,8}.

Explanation

Prefetches multiple vector data cache lines starting at the effective address. The number of cache lines is specified as follows:

LN<1:0> = 00: prefetch one 64-byte cache line

LN<1:0> = 01: prefetch two 64-byte cache lines

LN<1:0> = 10: prefetch four 64-byte cache lines

LN<1:0> = 11: prefetch eight 64-byte cache lines

If the effective address does not fall on a 64-byte boundary, it is first truncated to align with the 64-byte boundary.

Operating

Abnormal

Invalid data address exception.

Programming notes

EA<31:0> specifies a byte address in local memory.

VPFTCHSP prefetch to temporary memory

Format

Assembler syntax

VPFTCHSP.ln SRp，SRb，SRi

VPFTCHSP.ln SRp，SRb，#IMM

VPFTCHSP.ln SRp，SRb+，SRi

VPFTCHSP.ln SRp，SRb+，#IMM

Where ln = {1, 2, 4, 8}. Note that VPFTCH and VPFTCHSP have the same opcode.

Explanation

Transfers multiple 64-byte blocks from memory to temporary memory. The effective address gives the starting address in memory, and SRp provides the starting address in temporary memory. The number of 64-byte blocks is specified as follows:

LN<1:0> = 00: transfer one 64-byte block

LN<1:0> = 01: transfer two 64-byte blocks

LN<1:0> = 10: transfer four 64-byte blocks

LN<1:0> = 11: transfer eight 64-byte blocks

If the effective address does not fall on a 64-byte boundary, it is first truncated to align with the 64-byte boundary. If the temporary memory pointer address in SRp does not fall on a 64-byte boundary, it is likewise truncated to align with the 64-byte boundary. The aligned temporary memory pointer address is incremented by the number of bytes transferred.

Operating

EA＝SRb+{SRi‖sex(IMM<7：0>)}；

if(A＝1)SRb＝EA；

Num_bytes＝{64‖128‖256‖512}；

Mem_adrs＝EA<31：6>：6b′000000；

SRp＝SRp<31：6>：6b′000000；

for(i＝0；i＜Num_bytes；i++)

SPAD[SRp++]＝MEM[Mem_adrs+i]；
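The address truncation and block copy of this operation can be sketched in Python. The function name `vpftchsp`, the byte-array models of memory and temporary memory, and the choice to return the updated pointer are assumptions of this example; the base-register update controlled by A is omitted.

```python
def vpftchsp(mem, spad, srp, ea, ln):
    """Copy 64 << ln bytes from mem to spad after aligning both addresses
    down to a 64-byte boundary; return the updated pointer
    (hypothetical model)."""
    num_bytes = 64 << ln            # LN<1:0> = 0..3 selects 64, 128, 256, 512
    mem_adrs = ea & ~0x3F           # clear the low six address bits
    srp &= ~0x3F
    for i in range(num_bytes):
        spad[srp] = mem[mem_adrs + i]
        srp += 1
    return srp


mem = bytes(range(256)) * 4         # 1 KiB of sample data
spad = bytearray(1024)
new_srp = vpftchsp(mem, spad, srp=70, ea=130, ln=1)
```

Here the unaligned addresses 130 and 70 are truncated to 128 and 64 before 128 bytes are copied.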

Abnormal

Invalid data address exception.

VROL Rotate Left

Format

Assembler syntax

VROL.dt VRd，VRa，SRb

VROL.dt VRd，VRa，#IMM

VROL.dt SRd，SRa，SRb

VROL.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

Each data element of vector/scalar register Ra is rotated left by the number of bits given in scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd.

Operating

rotate_amount＝{SRb％32‖IMM<4：0>}；

for(i＝0；i<NumElem && EMASK[i]；i++){

Rd[i]＝Ra[i]rotate_left rotate_amount；

}

Abnormal

None.

Programming notes

Note that rotate_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for correctly specifying a rotate amount that is less than or equal to the number of bits in the data size. If the rotate amount is greater than the specified data size, the result is undefined.

Note that rotating left by n bits is equivalent to rotating right by ElemSize-n bits, where ElemSize denotes the number of bits in the given data size.
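The equivalence between a left rotate by n bits and a right rotate by ElemSize-n bits can be checked with a small Python sketch. The function names `rotate_left` and `rotate_right` are introduced here; raising an error for an out-of-range amount is a choice of this example, since the text only says that the result is undefined in that case.

```python
def rotate_left(x, n, bits):
    """Rotate the low `bits` bits of x left by n positions (0 <= n <= bits)."""
    if not 0 <= n <= bits:
        raise ValueError("rotate amount exceeds the element size")
    mask = (1 << bits) - 1
    x &= mask
    n %= bits                                  # rotating by `bits` is the identity
    return ((x << n) | (x >> (bits - n))) & mask


def rotate_right(x, n, bits):
    """Rotate the low `bits` bits of x right by n positions (0 <= n <= bits)."""
    if not 0 <= n <= bits:
        raise ValueError("rotate amount exceeds the element size")
    mask = (1 << bits) - 1
    x &= mask
    n %= bits
    return ((x >> n) | (x << (bits - n))) & mask


rotated = rotate_left(0b10010110, 3, 8)
same = rotate_right(0b10010110, 8 - 3, 8)
```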

VROR Rotate Right

Format

Assembler syntax

VROR.dt VRd，VRa，SRb

VROR.dt VRd，VRa，#IMM

VROR.dt SRd，SRa，SRb

VROR.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

Each data element of vector/scalar register Ra is rotated right by the number of bits given in scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd.

Operating

rotate_amount＝{SRb％32‖IMM<4：0>}；

for(i＝0；i＜NumElem && EMASK[i]；i++){

Rd[i]＝Ra[i]rotate_right rotate_amount；

}

Abnormal

None.

Programming notes

Note that rotate_amount is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for correctly specifying a rotate amount that is less than or equal to the number of bits in the data size. If the rotate amount is greater than the specified data size, the result is undefined.

Note that rotating right by n bits is equivalent to rotating left by ElemSize-n bits, where ElemSize denotes the number of bits in the given data size.

VROUND round floating point to integer

Format

Assembler syntax

VROUND.rm VRd，VRb

VROUND.rm SRd，SRb

Where rm = {ninf, zero, near, pinf}.

Supported modes

D：S：M	V＜-V	S＜-S

Explanation

The contents of vector/scalar register Rb, in floating-point data format, are rounded to the nearest 32-bit integer (word), and the result is stored in vector/scalar register Rd. The rounding mode is specified in RM.

RM<1：0>	Mode	Meaning
00	ninf	Round toward -∞
01	zero	Round toward zero
10	near	Round to nearest even
11	pinf	Round toward +∞

Operating

for(i＝0；i＜NumElem；i++){

Rd[i]＝Convert to int32(Rb[i])；

}
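The four rounding modes can be illustrated with Python's standard library. The function name `vround` and the string mode names used as arguments are assumptions of this example, and the range limits of a 32-bit result are not modelled. Python's built-in `round` resolves ties to the even neighbour, which matches the `near` mode.

```python
import math


def vround(x, mode):
    """Round a float to an integer using one of the four RM modes
    (hypothetical model; no int32 range check)."""
    if mode == "ninf":
        return math.floor(x)      # toward minus infinity
    if mode == "zero":
        return math.trunc(x)      # toward zero
    if mode == "near":
        return round(x)           # to nearest, ties to even
    if mode == "pinf":
        return math.ceil(x)       # toward plus infinity
    raise ValueError("unknown rounding mode: " + mode)


modes = ("ninf", "zero", "near", "pinf")
neg = [vround(-2.5, m) for m in modes]
pos = [vround(3.5, m) for m in modes]
```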

Abnormal

None.

Programming notes

This instruction is not affected by the element mask.

VSATL saturate to lower limit

Format

Assembler syntax

VSATL.dt VRd，VRa，VRb

VSATL.dt VRd，VRa，SRb

VSATL.dt VRd，VRa，#IMM

VSATL.dt SRd，SRa，SRb

VSATL.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w}. Note that the .f data type is not supported with the 9-bit immediate.

Supported modes

Explanation

Each data element of vector/scalar register Ra is checked against the corresponding lower limit in vector/scalar register Rb or in the IMM field. If the value of the data element is smaller than the lower limit, it is set equal to the lower limit, and the final result is stored in vector/scalar register Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]＝(Ra[i]＜Bop[i])？Bop[i]：Ra[i]；

}

Abnormal

None.

VSATU saturate to upper limit

Format

Assembler syntax

VSATU.dt VRd，VRa，SRb

VSATU.dt VRd，VRa，#IMM

VSATU.dt SRd，SRa，SRb

VSATU.dt SRd，SRa，#IMM

Where dt = {b, b9, h, w, f}. Note that the .f data type is not supported with the 9-bit immediate.

Supported modes

Explanation

Each data element of vector/scalar register Ra is checked against the corresponding upper limit in vector/scalar register Rb or in the IMM field. If the value of the data element is greater than the upper limit, it is set equal to the upper limit, and the final result is stored in vector/scalar register Rd.

Operating

for(i＝0；i＜NumElem && EMASK[i]；i++){

Bop[i]＝{VRb[i]‖SRb‖sex(IMM<8：0>)}；

Rd[i]＝(Ra[i]＞Bop[i])？Bop[i]：Ra[i]；

}
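Used together, VSATL and VSATU clamp each element to a range. A minimal Python sketch of the vector-with-scalar-limit forms follows; the function names `vsatl` and `vsatu` mirror the mnemonics, but the list-based model is an assumption of this example and element masking is omitted.

```python
def vsatl(va, lower):
    """Raise every element below `lower` to `lower` (lower-limit saturation)."""
    return [lower if a < lower else a for a in va]


def vsatu(va, upper):
    """Lower every element above `upper` to `upper` (upper-limit saturation)."""
    return [upper if a > upper else a for a in va]


# Clamp intermediate values to the unsigned 8-bit range 0..255.
clamped = vsatu(vsatl([-20, 100, 300], 0), 255)
```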

Abnormal

None.

VSHFL shuffle

Format

Assembler syntax

VSHFL.dt VRc，VRd，VRa，VRb

VSHFL.dt VRc，VRd，VRa，SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S	VRb	SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

The contents of vector registers Ra and Rb are shuffled, and the result is stored in vector registers Rc:Rd, as shown below:

Operating

Abnormal

None.

Programming notes

This instruction does not use the element mask.

VSHFLH shuffle high

Format

Assembler syntax

VSHFLH.dt VRd，VRa，VRb

VSHFLH.dt VRd，VRa，SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S	VRb	SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

The contents of vector registers Ra and Rb are shuffled, and the high part of the result is stored in vector register Rd, as shown below:

Operating

Abnormal

None.

Programming notes

This instruction does not use the element mask.

VSHFLL shuffle low

Format

Assembler syntax

VSHFLL.dt VRd，VRa，VRb

VSHFLL.dt VRd，VRa，SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S	VRb	SRb
DS	int8(b)	int9(b9)	int16(h)	int32(w)

Explanation

The contents of vector registers Ra and Rb are shuffled, and the low part of the result is stored in vector register Rd, as shown below:

Operating

Abnormal

None.

Programming notes

This instruction does not use the element mask.

VST store

Format

Assembler syntax

VST.st Rs，SRb，SRi

VST.st Rs，SRb，#IMM

VST.st Rs，SRb+，SRi

VST.st Rs，SRb+，#IMM

Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.

Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. For a cache-off store, use VSTOFF.

Explanation

Stores a vector or scalar register.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

MEM[EA]＝value given by the following table:

ST	Store operation
.b	BYTE[EA]＝SR_s<7：0>
.h	HALF[EA]＝SR_s<15：0>
.w	WORD[EA]＝SR_s<31：0>
.4	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 3
.8	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 7
.16	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 15
.32	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 31
.64	BYTE[EA+i]＝VR_0s<9i+7：9i>，i＝0 to 31； BYTE[EA+32+i]＝VR_1s<9i+7：9i>，i＝0 to 31
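The bit-range notation <9i+7:9i> in the store table suggests that each vector element occupies a 9-bit slot and that a byte store writes the low 8 bits of each slot. The following Python sketch illustrates that reading; modelling the register as a single Python integer and the helper names `pack9` and `vst_bytes` are assumptions of this example.

```python
def pack9(elems):
    """Pack element values into consecutive 9-bit slots of one integer
    (hypothetical register model)."""
    vr = 0
    for i, e in enumerate(elems):
        vr |= (e & 0x1FF) << (9 * i)
    return vr


def vst_bytes(vr, n):
    """Return the n bytes a store would write: bits 9i+7..9i of each slot."""
    return bytes((vr >> (9 * i)) & 0xFF for i in range(n))


vr = pack9([0x1FF, 0x001, 0x180, 0x07F])
stored = vst_bytes(vr, 4)
```

The ninth bit of each slot is dropped by the store, so 0x1FF is written as 0xFF and 0x180 as 0x80.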

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VSTCB store to circular buffer

Format

Assembler syntax

VSTCB.st Rs，SRb，SRi

VSTCB.st Rs，SRb，#IMM

VSTCB.st Rs，SRb+，SRi

VSTCB.st Rs，SRb+，#IMM

Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.

Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. For a cache-off store, use VSTCBOFF.

Explanation

Stores a vector or scalar register to a circular buffer bounded by the BEGIN pointer in SR_b+1 and the END pointer in SR_b+2.

If the effective address is greater than the END address, it is adjusted before the store and the address update. In addition, for the .h and .w scalar stores, the circular buffer boundaries must be aligned on halfword and word boundaries, respectively.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

BEGIN＝SR_b+1；

END＝SR_b+2；

cbsize＝END-BEGIN；

if(EA＞END)EA＝BEGIN+(EA-END)；

if(A＝1)SR_b＝EA；

MEM[EA]＝value given by the following table:

ST	Store operation
.b	BYTE[EA]＝SR_s<7：0>；
.h	HALF[EA]＝SR_s<15：0>；
.w	WORD[EA]＝SR_s<31：0>；
.4	BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]＝VR_s<9i+7：9i>，i＝0 to 3
.8	BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]＝VR_s<9i+7：9i>，i＝0 to 7
.16	BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]＝VR_s<9i+7：9i>，i＝0 to 15
.32	BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]＝VR_s<9i+7：9i>，i＝0 to 31
.64	BYTE[(EA+i＞END)？EA+i-cbsize：EA+i]＝VR_0s<9i+7：9i>，i＝0 to 31； BYTE[(EA+32+i＞END)？EA+32+i-cbsize：EA+32+i]＝VR_1s<9i+7：9i>，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

The programmer must ensure that the following condition holds for this instruction to work as expected:

BEGIN＜EA＜2*END-BEGIN

That is, EA＞BEGIN and EA-END＜END-BEGIN.
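The wrap-around addressing of the circular-buffer store can be sketched in Python. The sketch follows the comparisons of the operation literally, so an address equal to END is written as is and the address END+1 wraps to BEGIN+1; the function name `vstcb`, the byte-array memory model and the returned effective address are assumptions of this example.

```python
def vstcb(mem, data, ea, begin, end):
    """Store the bytes of `data` into a circular buffer and return the
    adjusted effective address (hypothetical model)."""
    cbsize = end - begin
    if ea > end:
        ea = begin + (ea - end)        # adjust the starting address once
    for i, byte in enumerate(data):
        addr = ea + i
        if addr > end:
            addr -= cbsize             # wrap back into the buffer
        mem[addr] = byte
    return ea


mem = bytearray(32)
ea = vstcb(mem, b"ABCD", ea=14, begin=8, end=16)
```

In this example the first three bytes land at addresses 14 to 16 and the fourth wraps to address 9.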

VSTD store double

Format

Assembler syntax

VSTD.st Rs，SRb，SRi

VSTD.st Rs，SRb，#IMM

VSTD.st Rs，SRb+，SRi

VSTD.st Rs，SRb+，#IMM

Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. For a cache-off store, use VSTDOFF.

Explanation

Stores two vector registers from the current or alternate bank, or two scalar registers.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

MEM[EA]＝value given by the following table:

ST	Store operation
.b	BYTE[EA]＝SR_s<7：0>； BYTE[EA+1]＝SR_s+1<7：0>
.h	HALF[EA]＝SR_s<15：0>； HALF[EA+2]＝SR_s+1<15：0>
.w	WORD[EA]＝SR_s<31：0>； WORD[EA+4]＝SR_s+1<31：0>
.4	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 3； BYTE[EA+4+i]＝VR_s+1<9i+7：9i>，i＝0 to 3
.8	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 7； BYTE[EA+8+i]＝VR_s+1<9i+7：9i>，i＝0 to 7
.16	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 15； BYTE[EA+16+i]＝VR_s+1<9i+7：9i>，i＝0 to 15
.32	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 31； BYTE[EA+32+i]＝VR_s+1<9i+7：9i>，i＝0 to 31
.64	BYTE[EA+i]＝VR_0s<9i+7：9i>，i＝0 to 31； BYTE[EA+32+i]＝VR_1s<9i+7：9i>，i＝0 to 31； BYTE[EA+64+i]＝VR_0s+1<9i+7：9i>，i＝0 to 31； BYTE[EA+96+i]＝VR_1s+1<9i+7：9i>，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VSTQ store quad

Format

Assembler syntax

VSTQ.st Rs，SRb，SRi

VSTQ.st Rs，SRb，#IMM

VSTQ.st Rs，SRb+，SRi

VSTQ.st Rs，SRb+，#IMM

Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.

Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. For a cache-off store, use VSTQOFF.

Explanation

Stores four vector registers from the current or alternate bank, or four scalar registers.

Operating

EA＝SR_b+{SR_i‖sex(IMM<7：0>)}；

if(A＝1)SR_b＝EA；

MEM[EA]＝value given by the following table:

ST	Store operation
.b	BYTE[EA]＝SR_s<7：0>； BYTE[EA+1]＝SR_s+1<7：0>； BYTE[EA+2]＝SR_s+2<7：0>； BYTE[EA+3]＝SR_s+3<7：0>
.h	HALF[EA]＝SR_s<15：0>； HALF[EA+2]＝SR_s+1<15：0>； HALF[EA+4]＝SR_s+2<15：0>； HALF[EA+6]＝SR_s+3<15：0>
.w	WORD[EA]＝SR_s<31：0>； WORD[EA+4]＝SR_s+1<31：0>； WORD[EA+8]＝SR_s+2<31：0>； WORD[EA+12]＝SR_s+3<31：0>
.4	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 3； BYTE[EA+4+i]＝VR_s+1<9i+7：9i>，i＝0 to 3； BYTE[EA+8+i]＝VR_s+2<9i+7：9i>，i＝0 to 3； BYTE[EA+12+i]＝VR_s+3<9i+7：9i>，i＝0 to 3
.8	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 7； BYTE[EA+8+i]＝VR_s+1<9i+7：9i>，i＝0 to 7； BYTE[EA+16+i]＝VR_s+2<9i+7：9i>，i＝0 to 7； BYTE[EA+24+i]＝VR_s+3<9i+7：9i>，i＝0 to 7
.16	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 15； BYTE[EA+16+i]＝VR_s+1<9i+7：9i>，i＝0 to 15； BYTE[EA+32+i]＝VR_s+2<9i+7：9i>，i＝0 to 15； BYTE[EA+48+i]＝VR_s+3<9i+7：9i>，i＝0 to 15
.32	BYTE[EA+i]＝VR_s<9i+7：9i>，i＝0 to 31； BYTE[EA+32+i]＝VR_s+1<9i+7：9i>，i＝0 to 31； BYTE[EA+64+i]＝VR_s+2<9i+7：9i>，i＝0 to 31； BYTE[EA+96+i]＝VR_s+3<9i+7：9i>，i＝0 to 31
.64	BYTE[EA+i]＝VR_0s<9i+7：9i>，i＝0 to 31； BYTE[EA+32+i]＝VR_1s<9i+7：9i>，i＝0 to 31； BYTE[EA+64+i]＝VR_0s+1<9i+7：9i>，i＝0 to 31； BYTE[EA+96+i]＝VR_1s+1<9i+7：9i>，i＝0 to 31； BYTE[EA+128+i]＝VR_0s+2<9i+7：9i>，i＝0 to 31； BYTE[EA+160+i]＝VR_1s+2<9i+7：9i>，i＝0 to 31； BYTE[EA+192+i]＝VR_0s+3<9i+7：9i>，i＝0 to 31； BYTE[EA+224+i]＝VR_1s+3<9i+7：9i>，i＝0 to 31

Abnormal

Invalid data address, unaligned accesses.

Programming notes

This instruction is not affected by the element mask.

VSTR store reverse

Format

Assembler syntax

VSTR.st Rs，SRb，SRi

VSTR.st Rs，SRb，#IMM

VSTR.st Rs，SRb+，SRi

VSTR.st Rs，SRb+，#IMM

Where st = {4, 8, 16, 32, 64} and Rs = {VRs, VRAs}. Note that .64 and VRAs cannot be specified together. For a cache-off store, use VSTROFF.

Explanation

Stores a vector register with its elements in reverse order. This instruction does not support a scalar source register.

Operation

EA = SR_b + {SR_i || sex(IMM<7:0>)};

if (A == 1) SR_b = EA;

MEM[EA] = see the table below:

ST	Store operation
.b	BYTE[EA+i] = VR_s[31-i]<7:0>, for i = 0 to 31
.h	HALF[EA+i] = VR_s[15-i]<15:0>, for i = 0 to 15
.w	WORD[EA+i] = VR_s[7-i]<31:0>, for i = 0 to 7
.4	BYTE[EA+i] = VR_s[31-i]<7:0>, i = 0 to 3
.8	BYTE[EA+i] = VR_s[31-i]<7:0>, i = 0 to 7
.16	BYTE[EA+i] = VR_s[31-i]<7:0>, i = 0 to 15
.32	BYTE[EA+i] = VR_s[31-i]<7:0>, i = 0 to 31
.64	BYTE[EA+32+i] = VR_0s[31-i]<7:0>, i = 0 to 31; BYTE[EA+i] = VR_1s[31-i]<7:0>, i = 0 to 31
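
A minimal Python sketch of the byte rows of the table above is given here for illustration. It assumes the vector register is represented as a list of 32 byte elements and memory as a bytearray; the name store_reversed is hypothetical.

```python
def store_reversed(mem, ea, vr, count):
    """BYTE[EA+i] = VR[31-i]<7:0> for i = 0 .. count-1."""
    last = len(vr) - 1                  # 31 for a 32-element register
    for i in range(count):
        mem[ea + i] = vr[last - i] & 0xFF


vr = list(range(100, 132))              # element k holds the value 100 + k
mem = bytearray(40)
store_reversed(mem, 4, vr, 8)           # the .8 case: eight bytes are stored
```

In this model the highest-numbered element lands at the lowest address, which is the reversal the table describes.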

Exceptions

Invalid data address, unaligned access.

Programming notes

This instruction is not affected by the element mask.

VSTWS Store With Stride

Format

Assembler syntax

VSTWS.st Rs, SRb, SRi

VSTWS.st Rs, SRb, #IMM

VSTWS.st Rs, SRb+, SRi

VSTWS.st Rs, SRb+, #IMM

Where st = {8, 16, 32} and Rs = {VRs, VRAs}. Note that .64 mode is not supported; use VST instead. Use VSTWSOFF for cache-off stores.

Explanation

Starting at the effective address, stores 32 bytes from vector register VRs to memory, using scalar register SR_b+1 as the stride control register.

ST indicates the block size, the number of consecutive bytes stored from each block. SR_b+1 indicates the stride, the number of bytes separating the starts of two consecutive blocks.

The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and the block size must be multiples of the data size.

Operation

EA = SR_b + {SR_i || sex(IMM<7:0>)};

if (A == 1) SR_b = EA;

Block_size = {4 || 8 || 16 || 32};

Stride = SR_b+1<31:0>;

for (i = 0; i < VECSIZE/Block_size; i++)

for (j = 0; j < Block_size; j++)

BYTE[EA + i*Stride + j] = VRs{i*Block_size + j}<7:0>;
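
The nested loop above can be exercised with the following Python sketch, offered only as an illustration. It assumes a 32-byte vector register represented as a list of byte values and memory represented as a bytearray; the name store_with_stride and the ValueError check are assumptions made for the example, not part of the specification.

```python
def store_with_stride(mem, ea, vr_bytes, block_size, stride):
    """Store the register in blocks of block_size bytes, stride bytes apart."""
    if stride < block_size:
        # The text requires the stride to be at least the block size.
        raise ValueError("stride must be at least the block size")
    vecsize = len(vr_bytes)             # 32 in the described processor
    for i in range(vecsize // block_size):
        for j in range(block_size):
            mem[ea + i * stride + j] = vr_bytes[i * block_size + j] & 0xFF


vr = list(range(1, 33))                 # bytes 1..32
mem = bytearray(128)
store_with_stride(mem, 0, vr, block_size=8, stride=16)
```

With a block size of 8 and a stride of 16, the four 8-byte blocks start at offsets 0, 16, 32 and 48, and the bytes between blocks are left untouched.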

Exceptions

Invalid data address, unaligned access.

VSUB Subtract

Format

Assembler syntax

VSUB.dt VRd, VRa, VRb

VSUB.dt VRd, VRa, SRb

VSUB.dt VRd, VRa, #IMM

VSUB.dt SRd, SRa, SRb

VSUB.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w, f}.

Supported modes

Explanation

Subtracts the contents of vector/scalar register Rb from the contents of vector/scalar register Ra; the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

Bop[i] = {Rb[i] || SRb || sex(IMM<8:0>)};

Rd[i] = Ra[i] - Bop[i];

}
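
By way of a hedged example, the loop above may be modelled in Python as follows. The sketch reads EMASK[i] as a per-element predicate (masked elements of Rd keep their previous value), which is an interpretation of the pseudocode rather than something it states outright. Overflow and floating-point handling are ignored, and the name vsub is hypothetical.

```python
def vsub(rd, ra, b_operand, emask):
    """Return the new Rd: Ra[i] - Bop[i] where EMASK[i] is set, old Rd[i] elsewhere.

    b_operand is either a list (vector register Rb) or a single number
    (scalar register SRb or an already sign-extended immediate).
    """
    vector_b = isinstance(b_operand, (list, tuple))
    result = list(rd)
    for i in range(len(ra)):
        if not emask[i]:
            continue                    # masked element: leave Rd[i] unchanged
        bop = b_operand[i] if vector_b else b_operand   # scalar is broadcast
        result[i] = ra[i] - bop
    return result


vector_form = vsub([0, 0, 0, 0], [10, 20, 30, 40], [1, 2, 3, 4], [1, 1, 0, 1])
scalar_form = vsub([7, 7], [5, 6], 3, [1, 0])
```

The scalar form shows the combined scalar/vector case: the single value 3 is applied to every unmasked element of Ra.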

Exceptions

Overflow, floating-point invalid operand.

VSUBS Subtract and Set

Format

Assembler syntax

VSUBS.dt SRd, SRa, SRb

VSUBS.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w, f}.

Supported modes

D:S:M				S<-S@S	S<-S@I
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)	float (f)

Explanation

Subtracts SRb from SRa, stores the result in SRd, and sets the VFLAG bits in VCSR.

Operation

Bop = {SRb || sex(IMM<8:0>)};

SRd = SRa - Bop;

VCSR<lt, eq, gt> = status(SRa - Bop);
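
The following Python sketch illustrates one plausible reading of the operation above. It assumes that status() compares the difference with zero to derive the lt, eq and gt flags; the specification does not define status() in this passage, so that reading, the dictionary representation of the flags, and the name vsubs are all assumptions. Overflow is ignored.

```python
def vsubs(sra, bop):
    """Return (SRd, flags) for SRd = SRa - Bop.

    The flags are assumed to reflect the sign of the difference.
    """
    diff = sra - bop
    flags = {"lt": diff < 0, "eq": diff == 0, "gt": diff > 0}
    return diff, flags


srd, vflag = vsubs(5, 9)
```

Exactly one of the three flags is set for any pair of operands under this reading.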

Exceptions

Overflow, floating-point invalid operand.

VUNSHFL Unshuffle

Format

Assembler syntax

VUNSHFL.dt VRc, VRd, VRa, VRb

VUNSHFL.dt VRc, VRd, VRa, SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S	VRb	SRb
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

The contents of vector register VRa are unshuffled with Rb into vector registers VRc:VRd, as follows:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VUNSHFLH Unshuffle High

Format

Assembler syntax

VUNSHFLH.dt VRd, VRa, VRb

VUNSHFLH.dt VRd, VRa, SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S	VRb	SRb
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

The contents of vector register VRa are unshuffled with Rb; the high part of the result is returned in vector register VRd, as follows:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VUNSHFLL Unshuffle Low

Format

Assembler syntax

VUNSHFLL.dt VRd, VRa, VRb

VUNSHFLL.dt VRd, VRa, SRb

Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S	VRb	SRb
DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

The contents of vector register VRa are unshuffled with Rb; the low part of the result is returned in vector register VRd, as follows:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VWBACKSP Write Back from Temporary Memory

Format

Assembler syntax

VWBACKSP.ln SRp, SRb, SRi

VWBACKSP.ln SRp, SRb, #IMM

VWBACKSP.ln SRp, SRb+, SRi

VWBACKSP.ln SRp, SRb+, #IMM

Where ln = {1, 2, 4, 8}. Note that VWBACK and VWBACKSP use the same opcode.

Explanation

Transfers multiple 64-byte blocks from the temporary memory to memory. The effective address gives the starting address in memory, and SRp gives the starting address in the temporary memory. The number of 64-byte blocks is specified as follows:

LN<1:0> = 00: transfer one 64-byte block

LN<1:0> = 01: transfer two 64-byte blocks

LN<1:0> = 10: transfer four 64-byte blocks

LN<1:0> = 11: transfer eight 64-byte blocks

If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary. If the temporary memory pointer address in SRp does not fall on a 64-byte boundary, it is likewise truncated to align with a 64-byte boundary. The aligned temporary memory pointer address is incremented by the number of bytes transferred.

Operation

EA = SRb + {SRi || sex(IMM<7:0>)};

if (A == 1) SRb = EA;

Num_bytes = {64 || 128 || 256 || 512};

Mem_adrs = EA<31:6> : 6b'000000;

SRp = SRp<31:6> : 6b'000000;

for (i = 0; i < Num_bytes; i++)

MEM[Mem_adrs + i] = SPAD[SRp++];
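
The alignment and block-count rules can be illustrated with the Python sketch below. It follows the direction given in the Explanation (temporary memory to memory), represents both memories as bytearrays, and takes LN as the two-bit field value 0 to 3. The name writeback_from_scratchpad and the returned pointer are assumptions made for the example.

```python
BLOCK = 64                              # transfer granularity in bytes


def writeback_from_scratchpad(mem, spad, ea, srp, ln):
    """Copy 64 << ln bytes from the temporary memory to memory.

    Both addresses are first truncated to a 64-byte boundary. The updated
    temporary memory pointer is returned.
    """
    num_bytes = BLOCK << ln             # LN = 0..3 -> 64, 128, 256, 512
    mem_addr = ea & ~(BLOCK - 1)        # EA<31:6> : 6b'000000
    srp &= ~(BLOCK - 1)                 # SRp<31:6> : 6b'000000
    for i in range(num_bytes):
        mem[mem_addr + i] = spad[srp]
        srp += 1
    return srp


spad = bytearray(i % 251 for i in range(512))
mem = bytearray(1024)
new_srp = writeback_from_scratchpad(mem, spad, ea=200, srp=70, ln=1)
```

In this run the effective address 200 is truncated to 192 and the pointer 70 to 64, so two blocks (128 bytes) are copied and the pointer advances to 192.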

Exceptions

Invalid data address exception.

VXNOR Exclusive NOR

Format

Assembler syntax

VXNOR.dt VRd, VRa, VRb

VXNOR.dt VRd, VRa, SRb

VXNOR.dt VRd, VRa, #IMM

VXNOR.dt SRd, SRa, SRb

VXNOR.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

Logically XNORs the contents of vector/scalar register Ra with the contents of vector/scalar register Rb; the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

Rd[i]<k> = ~(Ra[i]<k> ^ Bop[i]<k>), for k = all bits in element i;

}
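
A short Python sketch of the bitwise operation above follows, for illustration. Because Python integers are unbounded, the complement is masked to the element width; the WIDTH table and the name vxnor are assumptions for the example, and elements are treated as unsigned bit patterns. EMASK[i] is again read as a per-element predicate.

```python
WIDTH = {"b": 8, "b9": 9, "h": 16, "w": 32}   # element width in bits per dt


def vxnor(rd, ra, b_operand, emask, dt):
    """Return the new Rd: ~(Ra[i] ^ Bop[i]) limited to the element width."""
    mask = (1 << WIDTH[dt]) - 1
    vector_b = isinstance(b_operand, (list, tuple))
    result = list(rd)
    for i in range(len(ra)):
        if emask[i]:
            bop = b_operand[i] if vector_b else b_operand
            result[i] = ~(ra[i] ^ bop) & mask   # complement every bit k
    return result


as_bytes = vxnor([0, 0], [0b1100, 0xFF], [0b1010, 0xFF], [1, 1], "b")
as_nine_bit = vxnor([0], [0b1100], [0b1010], [1], "b9")
```

The same inputs give 0xF9 for 8-bit elements and 0x1F9 for 9-bit elements, which shows why the element width matters for the complemented result.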

Exceptions

None.

VXOR XOR

Format

Assembler syntax

VXOR.dt VRd, VRa, VRb

VXOR.dt VRd, VRa, SRb

VXOR.dt VRd, VRa, #IMM

VXOR.dt SRd, SRa, SRb

VXOR.dt SRd, SRa, #IMM

Where dt = {b, b9, h, w}.

Supported modes

Explanation

Logically XORs the contents of vector/scalar register Ra with the contents of vector/scalar register Rb; the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

Bop[i] = {VRb[i] || SRb || sex(IMM<8:0>)};

Rd[i]<k> = Ra[i]<k> ^ Bop[i]<k>, for k = all bits in element i;

}

Exceptions

None.

VXORALL XOR all the elements

Format

Assembler syntax

VXORALL.dt SRd, VRb

Where dt = {b, b9, h, w}. Note that .b and .b9 specify the same operation.

Supported modes

DS	int8 (b)	int9 (b9)	int16 (h)	int32 (w)

Explanation

The least significant bits of all elements of VRb are XORed together, and the 1-bit result is returned in the least significant bit of SRd. This instruction is not affected by the element mask.
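
As an illustration, the reduction described above amounts to the parity of the elements' least significant bits. The Python sketch below assumes the vector register is a list of integers and returns only the 1-bit result; what happens to the remaining bits of SRd is not stated here and is not modelled. The name vxorall is hypothetical.

```python
from functools import reduce
from operator import xor


def vxorall(vrb):
    """XOR together the least significant bit of every element of VRb."""
    return reduce(xor, (element & 1 for element in vrb), 0)


odd_count = vxorall([1, 2, 3, 5])       # LSBs 1, 0, 1, 1
even_count = vxorall([1, 2, 3])         # LSBs 1, 0, 1
```

The result is 1 when an odd number of elements have their least significant bit set, and 0 otherwise.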

Operation

Exceptions

None.

VWBACK Write Back

Format

Assembler syntax

VWBACK.ln SRb, SRi

VWBACK.ln SRb, #IMM

VWBACK.ln SRb+, SRi

VWBACK.ln SRb+, #IMM

Where ln = {1,2,4,8}.

Explanation

The cache line in the vector data cache whose index is specified by EA (as opposed to the line whose tag matches EA) is written back to memory if it contains modified data. If more than one cache line is specified, the subsequent sequential cache lines are written back to memory if they contain modified data. The number of cache lines is specified as follows:

LN<1:0> = 00: write back one 64-byte cache line

LN<1:0> = 01: write back two 64-byte cache lines

LN<1:0> = 10: write back four 64-byte cache lines

LN<1:0> = 11: write back eight 64-byte cache lines

If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary.

Operation

Exceptions

Invalid data address exception.

Programming notes

EA<31:0> specifies a byte address in local memory.

Claims

1. A processor comprising:

a scalar register adapted to store a single scalar value;

a vector register for storing a plurality of data elements; and

a processing circuit coupled to said scalar register and said vector register, wherein the processing circuit, in response to a single instruction, performs a plurality of operations in parallel, each operation combining one of the data elements in said vector register with the scalar value in said scalar register.

2. A method of operating a processing circuit to execute an instruction, the method comprising:

reading from a register the data elements that constitute the components of a vector value; and

performing parallel operations that combine a scalar value with each of the data elements to produce a vector result.

3. The method of claim 2, wherein performing said parallel operations comprises multiplying each of said data elements by said scalar value to generate a vector data result.

4. The method of claim 2, wherein performing said parallel operations comprises adding said scalar value to each of said data elements to generate a vector data result.

5. The method of claim 2, further comprising reading from another register said scalar value to be combined with said data elements, wherein the other register is adapted to store a single scalar value.

6. The method of claim 2, further comprising extracting from the instruction said scalar value to be combined with said data elements.

7. A method of operating a processor, the method comprising:

providing in said processor a plurality of scalar registers and a plurality of vector registers, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements that constitute the components of a vector;

assigning to each scalar register a register number that differs from the register numbers assigned to the other scalar registers;

assigning to each vector register a register number that differs from the register numbers assigned to the other vector registers, wherein the register numbers assigned to at least some of said vector registers are the same as register numbers assigned to the scalar registers;

forming an instruction that includes a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and

executing said instruction to transfer data between the scalar register identified by said first operand and a data element of the vector register identified by said second operand.

The method of claim 7, wherein:

said instruction as formed further includes a third operand that identifies a data element within a vector; and wherein

executing said instruction transfers data between the scalar register identified by said first operand and the data element, identified by said third operand, in the vector register identified by said second operand.

10. The method of claim 7, wherein:

said instruction as formed further includes a third operand that identifies another scalar register; and wherein

executing said instruction transfers data between the scalar register identified by said first operand and the data element, in the vector register identified by said second operand, that is identified by the value stored in said other scalar register.