CN1152300C - Single-instruction-multiple-data processing with combined scalar/vector operations - Google Patents
- Publication number: CN1152300C
- Authority: CN (China)
- Legal status: Expired - Fee Related (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G06F9/30036: Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F17/10: Complex mathematical operations
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
- G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
- H04N7/24: Systems for the transmission of television signals using pulse code modulation
Abstract
The digital signal processor has scalar registers for scalar values and a set of general-purpose vector registers for vectors whose elements make up the multiple data. Each vector register has a fixed size but can be partitioned by the user into data elements of the size preferred for the application. Operations executed by the vector processor combine two or more vector operands to produce a vector result, combine a scalar operand and a vector operand to produce a vector result, or combine two or more scalar operands to produce a scalar result. The scalar registers also facilitate manipulation of individual data elements in a vector register.
Description
Field of the invention
The present invention relates to digital signal processing and, in particular, to methods and devices that process multiple data elements in parallel for each instruction of a multimedia function such as audio and video encoding and decoding.
Background of the invention
This patent document is related to, and incorporates by reference, the following concurrently filed patent applications:
U.S. patent application serial No. UNKNOWN1, attorney docket M-4354, entitled "Multiprocessor Operation in a Multimedia Signal Processor";
U.S. patent application serial No. UNKNOWN2, attorney docket M-4355, entitled "Single-Instruction-Multiple-Data Processing in a Multimedia Signal Processor";
U.S. patent application serial No. UNKNOWN3, attorney docket M-4365, entitled "Efficient Context Saving and Restoring in Multiprocessors";
U.S. patent application serial No. UNKNOWN4, attorney docket M-4366, entitled "System and Method for Handling Software Interrupts with Argument Passing";
U.S. patent application serial No. UNKNOWN5, attorney docket M-4367, entitled "System and Method for Handling Interrupts and Exception Events in an Asymmetric Multiprocessor Architecture";
U.S. patent application serial No. UNKNOWN6, attorney docket M-4368, entitled "Methods and Apparatus for Processing Video Data";
U.S. patent application serial No. UNKNOWN7, attorney docket M-4369, entitled "Single-Instruction-Multiple-Data Processing Using Multiple Banks of Vector Registers"; and
Programmable digital signal processors (DSPs) for multimedia applications, such as real-time video encoding and decoding, need considerable processing power to process large amounts of data in limited time. Several DSP architectures are well known. The general-purpose architecture used by most microprocessors typically requires a high operating frequency to give a DSP enough computing power for real-time video encoding or decoding. This makes such a DSP expensive.
A very long instruction word (VLIW) processor is a type of DSP with many functional units, most of which perform different, relatively simple tasks. A single VLIW DSP instruction can be 128 bytes or longer and has separate parts that the functional units execute independently and in parallel. VLIW DSPs have high computing power because many functional units can work concurrently. VLIW DSPs also have relatively low cost because each functional unit is relatively small and simple. A problem with VLIW DSPs is their inefficiency in handling I/O control, communication with a host computer, and other functions that are not suited to parallel execution by the multiple functional units of the VLIW DSP. In addition, VLIW software differs from conventional software and is difficult to develop, because programming tools and programmers familiar with VLIW software architectures are lacking. Accordingly, a DSP that offers reasonable cost, high computing power, and a familiar programming environment is sought for multimedia applications.
Summary of the invention
An object of this invention is to provide a single-instruction-multiple-data processing method and device.
According to one aspect of the invention, a processor includes: a scalar register adapted to store a single scalar value; a vector register adapted to store a plurality of data elements; and processing circuitry coupled to the scalar register and the vector register, wherein, in response to a single instruction, the processing circuitry performs multiple operations in parallel, each operation combining one data element of the vector register with the scalar value in the scalar register.
According to a further aspect of the invention, a method for executing an instruction in a processing circuit includes: reading the data elements of a register that form the components of a vector value; and performing parallel operations, each of which combines a scalar value with a respective data element, to produce a vector result.
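As a minimal sketch of the scalar-vector combination described above, the Python below models the parallel lanes sequentially: every data element of a vector register is combined with the same scalar value, producing one result element per input element. The function names and the 16-bit wrap-around on overflow are assumptions for illustration; the text does not say whether results wrap or saturate.

```python
from typing import Callable, List


def vector_scalar_op(vreg: List[int], scalar: int,
                     op: Callable[[int, int], int]) -> List[int]:
    """Combine every data element of a vector register with one scalar value.

    The hardware performs these operations in parallel; this model simply
    produces one result element per input element.
    """
    return [op(element, scalar) for element in vreg]


def wrap_int16(value: int) -> int:
    """Wrap to a 16-bit two's-complement value (assumed overflow behavior)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


# Hypothetical "multiply vector by scalar" on int16 elements.
v = [1, -2, 300, 20000]
result = vector_scalar_op(v, 3, lambda a, s: wrap_int16(a * s))
print(result)  # [3, -6, 900, -5536]
```

The last lane shows why the overflow policy matters: 20000 x 3 does not fit in 16 bits, so the assumed wrap-around yields -5536.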
According to another aspect of the invention, a method of operating a processor includes: providing scalar registers and vector registers in the processor, each scalar register being adapted to store a single scalar value and each vector register being adapted to store a plurality of data elements that form the components of a vector; assigning each scalar register a register number that differs from the register numbers assigned to the other scalar registers; assigning each vector register a register number that differs from the register numbers assigned to the other vector registers, wherein at least some of the register numbers assigned to the vector registers are the same as register numbers assigned to the scalar registers; forming an instruction that includes a first operand and a second operand, the first operand being a register number that identifies a scalar register and the second operand being a register number that identifies a vector register; and executing the instruction to transfer data between the scalar register identified by the first operand and a data element in the vector register identified by the second operand.
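The following sketch illustrates the point of the method above: scalar and vector registers have separate number spaces, so the same number (here 5) can name both a scalar register and a vector register, and the operand position decides which file is meant. The element index argument, the 8-element vector view, and the function names are hypothetical; the text does not say how the target data element is selected.

```python
from dataclasses import dataclass, field
from typing import List

NUM_REGS = 32   # 5-bit register numbers, 0..31
ELEMENTS = 8    # assumed view: each vector register as 8 data elements


@dataclass
class RegisterFiles:
    # Separate namespaces: scalar register 5 and vector register 5 are different registers.
    scalar: List[int] = field(default_factory=lambda: [0] * NUM_REGS)
    vector: List[List[int]] = field(
        default_factory=lambda: [[0] * ELEMENTS for _ in range(NUM_REGS)])


def move_scalar_to_element(rf: RegisterFiles, sreg: int, vreg: int, index: int) -> None:
    """Hypothetical transfer: operand 1 names a scalar register, operand 2 a vector register."""
    rf.vector[vreg][index] = rf.scalar[sreg]


def move_element_to_scalar(rf: RegisterFiles, sreg: int, vreg: int, index: int) -> None:
    """Hypothetical transfer in the opposite direction."""
    rf.scalar[sreg] = rf.vector[vreg][index]


rf = RegisterFiles()
rf.scalar[5] = 42
move_scalar_to_element(rf, sreg=5, vreg=5, index=3)  # same number, different files
print(rf.vector[5])  # [0, 0, 0, 42, 0, 0, 0, 0]
move_element_to_scalar(rf, sreg=7, vreg=5, index=3)
print(rf.scalar[7])  # 42
```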
A multimedia digital signal processor (DSP) according to one aspect of the invention includes a vector processor that operates on vector data (that is, operands with multiple data elements each) to provide high throughput. The processor uses a SIMD architecture with a RISC-type instruction set. Programmers can adapt easily to the vector processor's programming environment because it is similar to that of the general-purpose processors most programmers are familiar with.
The DSP includes a set of general-purpose vector registers. Each vector register has a fixed length but is divided into separate data elements whose length the user selects. The number of data elements stored in a vector register therefore depends on the length selected for the elements. For example, a 32-byte register can be divided into 32 8-bit data elements, 16 16-bit data elements, or 8 32-bit data elements. The data length and type are selected by the instruction that processes the data in a vector register, and the execution data path performs a number of parallel operations that depends on the data length the instruction indicates.
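To show how one fixed-length register yields different element counts, this sketch reinterprets the same 32 bytes as signed 8-, 16-, or 32-bit elements using Python's standard `struct` module. The little-endian byte order and the function name are assumptions; the text only fixes the register size and the element counts.

```python
import struct


def split_register(reg: bytes, element_bits: int) -> tuple:
    """View one 32-byte register as signed elements of the selected width."""
    if len(reg) != 32:
        raise ValueError("register must be 32 bytes")
    codes = {8: "b", 16: "h", 32: "i"}  # signed 8-, 16- and 32-bit integers
    count = 32 * 8 // element_bits      # 32, 16 or 8 elements
    return struct.unpack("<%d%s" % (count, codes[element_bits]), reg)


reg = bytes(range(32))
for bits in (8, 16, 32):
    print(bits, len(split_register(reg, bits)))  # 8 32 / 16 16 / 32 8
```

An instruction that selects 16-bit elements would therefore drive 16 parallel operations over this register, and one that selects 8-bit elements would drive 32.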
Instructions of the vector processor can take vector registers or scalar registers as operands and operate on multiple data elements of multiple vector registers in parallel to increase computing power. An exemplary instruction set for the vector processor of the invention includes: coprocessor interface operations; flow control operations; load/store operations; and logical/arithmetic operations. The logical/arithmetic operations include operations that combine the data elements of one vector register with the corresponding data elements of one or more other vector registers to produce the data elements of a result vector. Other logical/arithmetic operations mix various data elements of one or more vector registers, or combine the data elements of a vector register with a scalar.
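A minimal model of the vector-vector case described above: element i of the result comes from combining element i of each source register. The function name is hypothetical, and lists stand in for vector registers.

```python
from typing import Callable, List


def vector_vector_op(va: List[int], vb: List[int],
                     op: Callable[[int, int], int]) -> List[int]:
    """Combine corresponding elements of two vector registers lane by lane."""
    if len(va) != len(vb):
        raise ValueError("operands must have the same number of elements")
    return [op(a, b) for a, b in zip(va, vb)]


print(vector_vector_op([1, 2, 3, 4], [10, 20, 30, 40], lambda a, b: a + b))  # [11, 22, 33, 44]
print(vector_vector_op([5, 9, 2, 7], [6, 1, 8, 7], max))                    # [6, 9, 8, 7]
```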
One architectural extension of the vector processor adds scalar registers, each holding a single scalar data element. Combining scalar and vector registers makes it easy to extend the vector processor's instruction set with operations that combine each data element of a vector with the same scalar value in parallel. For example, one instruction multiplies the data elements of a vector by a scalar value. The scalar registers also provide a location for individual data elements that are extracted from a vector register or that are to be inserted into a vector register. The scalar registers are also convenient for passing information between the vector processor and a coprocessor whose architecture provides only scalar registers, and for computing the effective addresses used by load/store operations.
According to a further aspect of the invention, the vector registers of the vector processor are organized into banks. Each bank can be selected as the "current" bank, in which case another bank serves as the "alternate" bank. A "current bank" bit in a control register of the vector processor indicates which bank is current. To reduce the number of bits needed to identify a vector register, some instructions provide only a register number, which identifies a vector register in the current bank. Load/store instructions have an extra bit to identify a vector register in either bank. Load/store operations can therefore fetch data into the alternate bank while other operations process data in the current bank. This facilitates software pipelining for image and graphics processing and reduces processor delays when fetching data, because logical/arithmetic operations can be performed out of order relative to load/store operations that access the alternate register bank. In other instructions, the alternate bank allows the use of double-length vector registers, each consisting of a vector register from the current bank and the corresponding vector register from the alternate bank. Such double-length registers can be distinguished by instruction syntax. A control bit in the vector processor can be set so that the default vector length is either one or two vector registers. The alternate bank also allows fewer explicitly identified operands in the syntax of complex instructions, such as shuffle, unshuffle, saturate, and conditional moves that have two source and two destination registers.
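The software-pipelining benefit of two banks can be sketched as double buffering: while block k is processed in the current bank, block k+1 is loaded into the alternate bank, and the roles then swap. This is a sequential Python model under assumed names (`process_blocks`, a summation standing in for the arithmetic work); real hardware overlaps the load and the arithmetic rather than running them one after another.

```python
from typing import List, Optional


def process_blocks(blocks: List[List[int]]) -> List[int]:
    """Sequential model of double buffering with two register banks."""
    banks: List[Optional[List[int]]] = [None, None]  # bank 0 and bank 1
    cbank = 0                                        # models the current-bank bit
    results = []
    if blocks:
        banks[cbank] = blocks[0]                     # prologue: first load
    for k in range(len(blocks)):
        if k + 1 < len(blocks):
            banks[1 - cbank] = blocks[k + 1]         # load/store into the alternate bank
        results.append(sum(banks[cbank]))            # arithmetic on the current bank
        cbank = 1 - cbank                            # swap current and alternate
    return results


print(process_blocks([[1, 2], [3, 4], [5, 6]]))  # [3, 7, 11]
```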
The vector processor also implements novel instructions such as average quad, shuffle, unshuffle, pair-wise maximum and exchange, and saturate. These instructions perform operations that are common in multimedia functions such as video encoding and decoding, and each replaces the two or more instructions that other instruction sets need to implement the same function. The vector processor instruction set thus improves the efficiency and speed of multimedia applications.
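The exact semantics of these instructions are not given in this section, so the sketch below assumes common interpretations for three of them: shuffle interleaves two vectors, unshuffle separates even- and odd-indexed elements into two results (which matches the two-destination form mentioned earlier), and saturate clamps each element to a signed range. Average quad and pair-wise maximum and exchange are not modeled.

```python
from typing import List, Tuple


def shuffle(a: List[int], b: List[int]) -> List[int]:
    """Assumed semantics: interleave two vectors as a0, b0, a1, b1, ..."""
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def unshuffle(v: List[int]) -> Tuple[List[int], List[int]]:
    """Assumed inverse of shuffle: even-indexed and odd-indexed elements."""
    return v[0::2], v[1::2]


def saturate(v: List[int], bits: int) -> List[int]:
    """Clamp each element to the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [min(max(x, lo), hi) for x in v]


s = shuffle([1, 2, 3], [10, 20, 30])
print(s)                             # [1, 10, 2, 20, 3, 30]
print(unshuffle(s))                  # ([1, 2, 3], [10, 20, 30])
print(saturate([-300, 5, 200], 8))   # [-128, 5, 127]
```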
Brief description of the drawings
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a multimedia processor according to an embodiment of the invention.
Fig. 2 is a block diagram of the vector processor of the multimedia processor of Fig. 1.
Fig. 3 is a block diagram of the instruction fetch unit of the vector processor of Fig. 2.
Fig. 4 is a block diagram of the fetch unit of the vector processor of Fig. 2.
Figs. 5A, 5B and 5C show the stages of the execution pipelines that the vector processor of Fig. 2 uses for register-to-register instructions, load instructions, and store instructions, respectively.
Fig. 6A is a block diagram of the execution data path of the vector processor of Fig. 2.
Fig. 6B is a block diagram of the register file of the execution data path of Fig. 6A.
Fig. 6C is a block diagram of the parallel processing logic units of the execution data path of Fig. 6A.
Fig. 7 is a block diagram of the load/store unit of the vector processor of Fig. 2.
Fig. 8 shows the instruction formats of the vector processor instruction set of an embodiment of the invention.
Detailed description of the preferred embodiments
The same reference numerals in different figures denote similar or identical items.
Fig. 1 shows a block diagram of a multimedia signal processor (MSP) 100 according to an embodiment of the invention. Multimedia processor 100 includes a processing core 105 made up of a general-purpose processor 110 and a vector processor 120. Processing core 105 connects to the rest of multimedia processor 100 through a cache memory subsystem 130, which includes SRAMs 160 and 190, ROM 170, and a cache controller 180. Cache controller 180 can configure SRAM 160 as an instruction cache 162 and a data cache 164 for processor 110, and SRAM 190 as an instruction cache 192 and a data cache 194 for vector processor 120.
On-chip ROM 170 contains data and instructions for processors 110 and 120 and can also be configured as a cache. In this embodiment, ROM 170 contains: reset and initialization procedures; self-test diagnostic procedures; interrupt and exception handlers; Sound Blaster emulation subroutines; V.34 modem signal processing subroutines; general telephony functions; 2-D and 3-D graphics subroutine libraries; and subroutine libraries for audio and video standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.
Cache subsystem 130 connects processors 110 and 120 to two system buses 140 and 150 and acts as a cache and switching station for processors 110 and 120 and the devices coupled to buses 140 and 150. System bus 150 operates at a higher clock frequency than bus 140 and is connected to memory controller 158, local bus interface 156, DMA controller 154, and device interface 152, which provide interfaces for external local memory, the local bus of a host computer, direct memory access, and various analog-to-digital and digital-to-analog converters, respectively. A system timer 142, a UART (universal asynchronous receiver/transmitter) 144, a bitstream processor 146, and an interrupt controller 148 are connected to bus 140. The above-referenced patent applications entitled "Multiprocessor Operation in a Multimedia Signal Processor" and "Methods and Apparatus for Processing Video Data" more fully describe the operation of cache subsystem 130 and of the exemplary devices that processors 110 and 120 access through cache subsystem 130 and buses 140 and 150.
Processors 110 and 120 execute separate program threads and differ in architecture so that each can more efficiently perform the particular tasks assigned to it. Processor 110 is mainly used for control functions, such as running a real-time operating system, and similar functions that do not require large amounts of repetitive computation. Processor 110 therefore does not need high computing power and can be implemented with a conventional general-purpose processor architecture. Vector processor 120 mainly performs number crunching, the repetitive operations on data blocks that are common in multimedia processing. To provide high computing power with relatively simple programming, vector processor 120 has a SIMD (single instruction, multiple data) architecture; in this embodiment, most data paths in vector processor 120 are 288 or 576 bits wide to support vector data manipulation. In addition, the instruction set of vector processor 120 includes instructions that are particularly useful for multimedia problems.
In this embodiment, processor 110 is a 32-bit RISC processor that operates at 40 MHz and conforms to the architecture of the ARM7 processor, including the register set defined by the ARM7 standard. The architecture and instruction set of the ARM7 RISC processor are described in "ARM7DM Data Sheet", Document Number ARM DDI 0010G, available from Advanced RISC Machines Ltd. The ARM7DM Data Sheet is incorporated herein by reference in its entirety. Appendix A describes the extensions to the ARM7 instruction set in this embodiment.
Vector processor 120 operates on both vectors and scalars. In this embodiment, vector data processor 120 includes a pipelined RISC engine that operates at 80 MHz. The registers of vector processor 120 include 32 scalar registers, 32 special-purpose registers, two banks of 288-bit vector registers, and two double-length (576-bit) vector accumulator registers. Appendix C describes the register set of vector processor 120 in this embodiment. In this embodiment, processor 120 has 32 scalar registers, which instructions identify by 5-bit register numbers ranging from 0 to 31. There are also 64 288-bit vector registers, organized into two banks of 32 vector registers each. Each vector register can be identified by a 1-bit bank number (0 or 1) and a 5-bit vector register number ranging from 0 to 31. Most instructions access only the vector registers in the current bank, as indicated by the default bank bit CBANK stored in control register VCSR of vector processor 120. A second control bit, VEC64, indicates whether a register number by default identifies a double-length vector register consisting of one register from each bank. The syntax of an instruction distinguishes register numbers that identify vector registers from register numbers that identify scalar registers.
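The addressing rules above can be summarized as a small resolver: a 5-bit register number selects a register in the bank named by CBANK, a load/store's extra bank bit overrides CBANK, and with VEC64 set the number names a double-length pair. `CBANK`, `VEC64`, and `VCSR` come from the text; the Python function, the (bank, number) return format, and the order of the two halves (current bank first) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VCSR:
    cbank: int = 0       # CBANK: default (current) bank bit
    vec64: bool = False  # VEC64: register numbers name double-length pairs by default


def resolve_vector_operand(reg_num: int, vcsr: VCSR,
                           explicit_bank: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return the (bank, register number) pairs a 5-bit register number selects."""
    if not 0 <= reg_num <= 31:
        raise ValueError("vector register numbers are 5 bits (0..31)")
    if explicit_bank is not None:
        # Load/store instructions carry an extra bit that can name either bank.
        return [(explicit_bank, reg_num)]
    if vcsr.vec64:
        # Double length (2 x 288 = 576 bits); half order is an assumption.
        return [(vcsr.cbank, reg_num), (1 - vcsr.cbank, reg_num)]
    # Ordinary instructions: the register in the current bank.
    return [(vcsr.cbank, reg_num)]


print(resolve_vector_operand(7, VCSR(cbank=1)))                   # [(1, 7)]
print(resolve_vector_operand(7, VCSR(vec64=True)))                # [(0, 7), (1, 7)]
print(resolve_vector_operand(7, VCSR(cbank=1), explicit_bank=0))  # [(0, 7)]
```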
Each vector register can be divided into multiple data elements of programmable length. Table 1 shows the data types supported for data elements in a 288-bit vector register.
Table 1:
Data type | Data length | Description |
int8 | 8 bits (byte) | 8-bit two's complement between -128 and 127 |
int9 | 9 bits (byte9) | 9-bit two's complement between -256 and 255 |
int16 | 16 bits (half-word) | 16-bit two's complement between -32,768 and 32,767 |
int32 | 32 bits (word) | 32-bit two's complement between -2,147,483,648 and 2,147,483,647 |
float | 32 bits (word) | 32-bit IEEE 754 single-precision format |
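The integer ranges in Table 1 follow from the general rule that an n-bit two's complement value spans -2^(n-1) to 2^(n-1) - 1. The short check below reproduces the table's bounds; `twos_complement_range` is an illustrative helper, not part of the described design.

```python
def twos_complement_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a `bits`-wide two's complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


for name, bits in [("int8", 8), ("int9", 9), ("int16", 16), ("int32", 32)]:
    lo, hi = twos_complement_range(bits)
    print(f"{name}: {lo} .. {hi}")
```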
Appendix D further describes the data lengths and types supported in embodiments of the invention.
For the int9 data type, 9-bit bytes are packed contiguously in a 288-bit vector register; for the other data types, every ninth bit of the 288-bit vector register is unused. A 288-bit vector register can hold 32 8-bit or 9-bit integer data elements, 16 16-bit integer data elements, or 8 32-bit integer or floating-point elements. In addition, two vector registers can be combined to hold the data elements of a double-length vector. In an embodiment of the present invention, setting control bit VEC64 in control and status register VCSR places vector processor 120 in mode VEC64, in which double length (576 bits) is the default vector register length.
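The "every ninth bit is unused" layout can be illustrated by treating a 288-bit register as 32 9-bit slots. In this sketch, which is an illustration under stated assumptions and not the hardware layout, each 8-bit byte sits in the low 8 bits of a slot with slot 0 at the least significant end, and 16-bit and 32-bit elements occupy 2 and 4 slots respectively. The names `SLOTS_PER_ELEMENT`, `elements_per_register`, and `pack_int8` are hypothetical.

```python
REGISTER_BITS = 288
SLOT_BITS = 9  # one (possibly 9-bit) byte per slot

# Assumed number of 9-bit slots each element type occupies.
SLOTS_PER_ELEMENT = {"int8": 1, "int9": 1, "int16": 2, "int32": 4, "float": 4}


def elements_per_register(dtype: str, double_length: bool = False) -> int:
    """How many elements of `dtype` fit in one (or a double-length) register."""
    bits = REGISTER_BITS * (2 if double_length else 1)
    return bits // (SLOT_BITS * SLOTS_PER_ELEMENT[dtype])


def pack_int8(values: list[int]) -> int:
    """Pack byte values into a 288-bit integer, one per 9-bit slot,
    leaving the ninth (top) bit of every slot zero."""
    if len(values) > elements_per_register("int8"):
        raise ValueError("a 288-bit register holds at most 32 byte elements")
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & 0xFF) << (i * SLOT_BITS)
    return reg
```

With these assumptions the element counts come out as 32, 16, and 8 for byte, half-word, and word elements, and 64 bytes for a double-length (576-bit) vector, matching the counts given above.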
Multimedia processor 100 also includes a set of 32 extended registers 115 that both processors 110 and 120 can access. Appendix B describes the extended register set and its functions in embodiments of the invention. The extended registers and, in some cases, the scalar and special registers of vector processor 120 are accessible to processor 110. Two special-purpose "user" extended registers have two read ports, which allow processors 110 and 120 to read the register simultaneously. The other extended registers cannot be accessed simultaneously.
Vector processor 120 has two alternative states, VP_RUN and VP_IDLE, which indicate whether vector processor 120 is running or idle. When vector processor 120 is in state VP_IDLE, processor 110 can read or write the scalar and special registers of vector processor 120. However, the result of processor 110 reading or writing a register of vector processor 120 while vector processor 120 is in state VP_RUN is undefined.
The extensions to the ARM7 instruction set of processor 110 include instructions that access the extended registers and the scalar or special registers of vector processor 120. Instructions MFER and MFVP move data from an extended register and from a scalar or special register of vector processor 120, respectively, into a general register of processor 110. Instructions MTER and MTVP move data from a general register of processor 110 into an extended register and into a scalar or special register of vector processor 120, respectively. The TESTSET instruction reads an extended register and sets bit 30 of that extended register to 1. By setting bit 30, instruction TESTSET signals to processor 120 that processor 110 has read (or used) the result processor 120 produced, which facilitates producer/consumer synchronization. Other instructions of processor 110, such as STARTVP and INTVP, control the operating state of vector processor 120.
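The TESTSET handshake can be modeled in software as an atomic read-then-set of a flag bit. This is a toy sketch, not the hardware protocol: the class name, the reserved use of bit 30 as a "consumed" flag for every write, and the rule that the producer writes only after the flag is set are assumptions made for illustration.

```python
import threading

CONSUMED_BIT = 1 << 30  # bit 30, set by TESTSET


class UserExtendedRegister:
    """Toy model of a shared 'user' extended register with a TESTSET operation.
    Assumption: bit 30 is reserved as the 'consumed' flag, so results use other bits."""

    def __init__(self) -> None:
        self._value = CONSUMED_BIT  # start 'consumed' so the producer may write
        self._lock = threading.Lock()

    def write(self, value: int) -> None:
        """Producer (vector processor side): publish a new, unconsumed result."""
        with self._lock:
            self._value = value & ~CONSUMED_BIT

    def testset(self) -> int:
        """Consumer (ARM7 side): atomically read the register and set bit 30."""
        with self._lock:
            old = self._value
            self._value |= CONSUMED_BIT
            return old

    def read(self) -> int:
        with self._lock:
            return self._value
```

A producer that polls `read()` and waits until bit 30 is set before calling `write()` will never overwrite a result the consumer has not yet taken, because the read and the flag update happen as one indivisible step.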
Processor 110 acts as the master processor and controls the operation of vector processor 120. Dividing control asymmetrically between processors 110 and 120 simplifies the problem of synchronizing them. When vector processor 120 is in state VP_IDLE, processor 110 initializes vector processor 120 by writing an instruction address into the program counter of vector processor 120. Processor 110 then executes a STARTVP instruction, which changes vector processor 120 to state VP_RUN. In state VP_RUN, vector processor 120 fetches instructions through cache subsystem 130 and executes them in parallel with processor 110, which continues executing its own program. Once started, vector processor 120 continues executing until it encounters an exception, executes a VCJOIN or VCINT instruction whose condition is satisfied, or is interrupted by processor 110. Vector processor 120 can pass the results of program execution to processor 110 by writing them to an extended register, by writing them to the address space shared by processors 110 and 120, or by leaving them in a scalar or special register that processor 110 accesses after vector processor 120 re-enters state VP_IDLE.
Vector processor 120 does not handle its own exceptions. When an instruction causes an exception, vector processor 120 enters state VP_IDLE and sends an interrupt request to processor 110 over a dedicated line. Vector processor 120 remains in state VP_IDLE until processor 110 executes another STARTVP instruction. Processor 110 is responsible for reading register VISRC of vector processor 120 to determine the nature of the exception, handling the exception, possibly by reinitializing vector processor 120, and then directing vector processor 120 to resume execution as required.
An INTVP instruction executed by processor 110 interrupts vector processor 120 and places it in idle state VP_IDLE. Instruction INTVP can be used, for example, in a multitasking system to switch the vector processor from one task, such as video encoding, to another task, such as sound card emulation.
Vector processor instructions VCINT and VCJOIN are flow control instructions. If the condition indicated by the instruction is satisfied, these instructions stop execution by vector processor 120, place vector processor 120 in state VP_IDLE, and issue an interrupt request to processor 110 unless the request is masked. The program counter of vector processor 120 (special register VPC) then indicates the address of the instruction following the VCINT or VCJOIN instruction. Processor 110 can check interrupt source register VISRC of vector processor 120 to determine whether a VCINT or VCJOIN instruction caused the interrupt request. Because vector processor 120 has wide data buses and saves and restores its registers more efficiently, software executed by vector processor 120 should save and restore the registers during a context switch. The above-referenced patent application entitled "Efficient Context Saving and Restoring in Multiprocessors" describes an exemplary context-switching system.
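The run/idle protocol described in the preceding paragraphs can be summarized as a two-state machine. The sketch below is a simplified software model under stated assumptions: the class and method names are hypothetical, VISRC is reduced to a string naming the cause, and a write to VPC while running raises an error to flag the "undefined result" case rather than imitating real hardware behavior.

```python
from enum import Enum


class VPState(Enum):
    VP_IDLE = "VP_IDLE"
    VP_RUN = "VP_RUN"


class VectorProcessorModel:
    """Hypothetical model of the VP_IDLE/VP_RUN protocol."""

    def __init__(self) -> None:
        self.state = VPState.VP_IDLE
        self.vpc = 0              # program counter (special register VPC)
        self.visrc = None         # simplified interrupt source (VISRC)
        self.irq_pending = False  # interrupt request line to processor 110
        self.irq_masked = False   # masks VCINT/VCJOIN requests only

    def write_vpc(self, addr: int) -> None:
        # Processor 110 may only access VP registers while the VP is idle.
        if self.state is not VPState.VP_IDLE:
            raise RuntimeError("result is undefined while in VP_RUN")
        self.vpc = addr

    def startvp(self) -> None:
        self.state = VPState.VP_RUN  # no effect if already running

    def intvp(self) -> None:
        self.state = VPState.VP_IDLE  # processor 110 forces the VP idle

    def exception(self, cause: str) -> None:
        # The VP does not handle exceptions itself: it idles and interrupts.
        self.state = VPState.VP_IDLE
        self.visrc = cause
        self.irq_pending = True

    def vcint_or_vcjoin(self, name: str, condition: bool, next_addr: int) -> None:
        if self.state is VPState.VP_RUN and condition:
            self.vpc = next_addr  # VPC points past the VCINT/VCJOIN
            self.state = VPState.VP_IDLE
            self.visrc = name
            if not self.irq_masked:
                self.irq_pending = True
```

Typical use mirrors the text: `write_vpc`, then `startvp`, and after an interrupt processor 110 inspects `visrc` before deciding whether to call `startvp` again.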
Fig. 2 shows a functional block diagram of an embodiment of vector processor 120. Vector processor 120 includes an instruction fetch unit (IFU) 210, a decoder 220, a scheduler 230, an execution data path 240, and a load/store unit (LSU) 250. IFU 210 fetches instructions and processes flow control instructions such as branches. Instruction decoder 220 decodes one instruction per cycle, in the order in which instructions arrive from IFU 210, and writes the field values decoded from each instruction into a FIFO in scheduler 230. Scheduler 230 selects the field values sent to the execution control registers as required by the operation steps being executed. Issue selection depends on operand dependencies and on the availability of processing resources such as execution data path 240 or load/store unit 250. Execution data path 240 executes logical/arithmetic instructions that operate on vector or scalar data. Load/store unit 250 executes load/store instructions that access the address space of vector processor 120.
Fig. 3 shows a block diagram of an embodiment of IFU 210. IFU 210 includes an instruction buffer divided into a main instruction buffer 310 and a secondary instruction buffer 312. Main buffer 310 holds 8 consecutive instructions, including the instruction corresponding to the current program count. Secondary buffer 312 holds the 8 instructions that immediately follow the instructions in buffer 310. IFU 210 also includes a branch target buffer 314, which holds 8 consecutive instructions including the target of the next flow control instruction in buffer 310 or 312. In the present embodiment, vector processor 120 uses a RISC-type instruction set in which every instruction is 32 bits long; buffers 310, 312, and 314 are each 8 × 32-bit buffers and are connected to cache subsystem 130 by a 256-bit instruction bus. Within one clock cycle, IFU 210 can load 8 instructions from cache subsystem 130 into any one of buffers 310, 312, and 314. Registers 340, 342, and 344 indicate the base addresses of the instructions loaded in buffers 310, 312, and 314, respectively.
Multiplexer 332 selects the current instruction from main instruction buffer 310. If the current instruction is not a flow control instruction and the instruction held in instruction register 330 advances to the decode stage of execution, the current instruction is stored in instruction register 330 and the program count is incremented. When the incremented program count has passed the last instruction in buffer 310, the next group of 8 instructions is loaded into buffer 310. If buffer 312 holds the required 8 instructions, the contents of buffer 312 and register 342 are immediately moved into buffer 310 and register 340, and another 8 instructions are prefetched from cache subsystem 130 into secondary buffer 312. Adder 350 determines the address of the next group of instructions from the base address in register 342 and an offset selected by multiplexer 352. The address produced by adder 350 is stored in register 342 when or after the previous address moves from register 342 to register 340. The calculated address is also sent to cache subsystem 130 together with a request for 8 instructions. If, when buffer 310 requests the next 8 instructions, cache subsystem 130 has not yet supplied them to buffer 312 in response to the previous request, the previously requested instructions are stored directly in buffer 310 as soon as they arrive from cache subsystem 130.
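The buffer bookkeeping above reduces to simple address arithmetic. The sketch assumes byte addresses and 8-instruction groups aligned on 32-byte boundaries (8 × 4 bytes, one 256-bit bus transfer); the helper names and the `FetchBuffers` class are hypothetical stand-ins for registers 340 and 342 and adder 350.

```python
INSN_BYTES = 4                          # 32-bit instructions
GROUP_INSNS = 8                         # instructions per buffer
GROUP_BYTES = INSN_BYTES * GROUP_INSNS  # 32 bytes = one 256-bit transfer


def group_base(pc: int) -> int:
    """Base address of the aligned 8-instruction group containing `pc`."""
    return pc & ~(GROUP_BYTES - 1)


def slot_in_group(pc: int) -> int:
    """Index (0-7) that multiplexer 332 would select within the buffer."""
    return (pc // INSN_BYTES) % GROUP_INSNS


class FetchBuffers:
    """Tracks the base addresses held in registers 340 (main) and 342 (secondary)."""

    def __init__(self, pc: int) -> None:
        self.main_base = group_base(pc)
        self.secondary_base = self.main_base + GROUP_BYTES

    def advance_group(self) -> int:
        """Promote buffer 312 to buffer 310, then compute the next prefetch
        address as base + offset (adder 350). Returns that prefetch address."""
        self.main_base = self.secondary_base
        self.secondary_base += GROUP_BYTES
        return self.secondary_base
```

Because the secondary buffer always holds the next sequential group, straight-line code can cross a group boundary without waiting on cache subsystem 130, as long as the prefetch has already completed.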
If the current instruction is a flow control instruction, IFU 210 processes it by evaluating the instruction's condition and then updating the program count. IFU 210 stalls if the condition cannot be determined because a preceding instruction that may change the condition has not completed. If the branch is not taken, the program counter is incremented and the next instruction is selected as described above. If the branch is taken and branch target buffer 314 holds the branch target, the contents of buffer 314 and register 344 are moved into buffer 310 and register 340, so that IFU 210 can continue supplying instructions to decoder 220 without waiting for instructions from cache subsystem 130.
To prefetch instructions for branch target buffer 314, scanner 320 scans buffers 310 and 312 to locate the next flow control instruction following the current program count. If a flow control instruction is found in buffer 310 or 312, scanner 320 determines the offset from the base address of the buffer (310 or 312) that contains the instruction to an aligned group of 8 instructions that contains the target address of the flow control instruction. Multiplexers 352 and 354 supply adder 350 with this offset and with the base address from register 340 or 342, and adder 350 produces a new base address for buffer 314. The new base address is sent to cache subsystem 130, which in turn supplies 8 instructions to branch target buffer 314.
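The offset computation performed for the branch target buffer can be sketched as follows. The text does not specify how branch targets are encoded, so this assumes a PC-relative displacement counted in instructions from the branch itself, byte addressing, and 32-byte-aligned groups; `btb_request` is a hypothetical name.

```python
INSN_BYTES = 4
GROUP_BYTES = 32  # 8 instructions × 4 bytes


def btb_request(buffer_base: int, insn_index: int, displacement_insns: int) -> tuple[int, int]:
    """Return (offset, new_base) for prefetching the group that holds a branch target.

    buffer_base        -- base address in register 340 or 342
    insn_index         -- position (0-7) of the branch found by scanner 320
    displacement_insns -- assumed PC-relative displacement, in instructions
    """
    branch_addr = buffer_base + insn_index * INSN_BYTES
    target = branch_addr + displacement_insns * INSN_BYTES
    new_base = target & ~(GROUP_BYTES - 1)  # aligned group containing the target
    offset = new_base - buffer_base         # what the adder adds to the base
    return offset, new_base
```

Producing an aligned base rather than the exact target lets one 256-bit fetch fill buffer 314 with the whole group around the target.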
When processing flow control instructions such as the "decrement and conditional branch" instructions VD1CBR, VD2CBR, and VD3CBR and the "change control register" instruction VCHGCR, IFU 210 can change the values of registers other than the program count. When IFU 210 finds an instruction that is not a flow control instruction, the instruction is passed to instruction register 330 and from there to decoder 220.
As shown in Fig. 4, decoder 220 decodes an instruction by writing control values into the fields of FIFO buffer 410 of scheduler 230. FIFO buffer 410 contains 4 rows of flip-flops, and each row can hold 5 fields of information for controlling the execution of one instruction. Rows 0 through 3 hold the information for the oldest through the newest instructions, respectively; as older information is removed when its instruction completes, the information in FIFO buffer 410 moves down to lower rows. Scheduler 230 issues an instruction to the execution stage by selecting the required instruction fields and loading them into control pipeline 420, which includes execution registers 421 to 427. Most instructions can be scheduled for out-of-order issue and execution. In particular, logical/arithmetic operations and load/store operations can be ordered arbitrarily unless an operand dependency exists between a load/store operation and a logical/arithmetic operation. Comparisons of the field values in FIFO buffer 410 indicate whether any operand dependencies exist.
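A dependency check of the kind described for FIFO buffer 410 can be sketched by comparing register fields between rows. This is a generic hazard test (read-after-write, write-after-write, write-after-read), not the patent's specific comparison logic; the `FifoRow` fields are hypothetical, and every older row is assumed to be still pending.

```python
from dataclasses import dataclass, field


@dataclass
class FifoRow:
    """Hypothetical subset of the fields one FIFO row might hold."""
    name: str
    dests: set[str] = field(default_factory=set)
    srcs: set[str] = field(default_factory=set)


def can_issue(fifo: list[FifoRow], i: int) -> bool:
    """Row i (0 = oldest) may issue ahead of older rows only if it has no
    operand dependency on any of them."""
    row = fifo[i]
    for older in fifo[:i]:
        if older.dests & row.srcs:   # read-after-write
            return False
        if older.dests & row.dests:  # write-after-write
            return False
        if older.srcs & row.dests:   # write-after-read
            return False
    return True
```

For example, with a pending load writing `v1`, an add that reads `v1` must wait, while an unrelated subtract further back in the FIFO may issue first.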
Fig. 5A illustrates the 6-stage execution pipeline of an instruction that performs a register-to-register operation without accessing the address space of vector processor 120. In instruction fetch stage 511, IFU 210 fetches an instruction as described above. The fetch stage takes 1 clock cycle unless IFU 210 is stalled by a pipeline delay, an unresolved branch condition, or a delay in cache subsystem 130 supplying prefetched instructions. In decode stage 512, decoder 220 decodes the instruction from IFU 210 and writes information about the instruction into scheduler 230. Decode stage 512 also takes one clock cycle unless no row of FIFO 410 is available for a new operation. The operation can be issued to control pipeline 420 during its first cycle in FIFO 410, but issue can be delayed by the issue of earlier operations.
Execution data path 240 performs register-to-register operations and provides data and addresses for load/store operations. Fig. 6A shows a block diagram of an embodiment of execution data path 240, illustrated together with execution stages 514, 515, and 516. Execution register 421 provides signals identifying two registers in register file 610, which are read during one clock cycle in read stage 514. Register file 610 contains 32 scalar registers and 64 vector registers. Fig. 6B is a block diagram of register file 610. Register file 610 has 2 read ports and 2 write ports, providing 2 reads and 2 writes in each clock cycle. Each port includes a selection circuit 612, 614, 616, or 618 and a 288-bit data bus 613, 615, 617, or 619. Selection circuits such as circuits 612, 614, 616, and 618 are well known in the art. They use address signals WRADDR1, WRADDR2, RDADDR1, or RDADDR2, which decoder 220 derives from the register number (generally 5 bits) provided in the instruction, a bank bit from the instruction or from control and status register VCSR, and the instruction syntax indicating whether the register is a vector register or a scalar register. Data read from the register file can be routed to load/store unit 250, or through multiplexers 622 and 624 to multiplier 620, arithmetic logic unit 630, and accumulator 640, or through multiplexer 656. Most operations read 2 registers, and read stage 514 completes in one cycle. However, some instructions, such as the multiply-and-add instruction VMAD and instructions that operate on double-length vectors, need data from more than 2 registers, so their read stage 514 takes more than one clock cycle.
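With two read ports, the length of read stage 514 is the number of source registers divided by two, rounded up. The helper below is an illustrative estimate that ignores port conflicts with other operations; the register counts in the assertions (for example, four registers for two double-length sources) are assumptions, not figures from the text.

```python
import math

READ_PORTS = 2  # register file 610 provides 2 reads per clock cycle


def read_cycles(source_registers: int) -> int:
    """Clock cycles read stage 514 needs to fetch `source_registers` registers."""
    return max(1, math.ceil(source_registers / READ_PORTS))
```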
In execution stage 515, multiplier 620, ALU 630, and accumulator 640 process the data previously read from register file 610. If reading the necessary data takes multiple cycles, execution stage 515 can overlap read stage 514. The duration of execution stage 515 depends on the type (integer or floating-point) and the quantity (number of read cycles) of the data elements processed. Signals from execution registers 422, 423, and 425 control the input of data to ALU 630, accumulator 640, and multiplier 620 for the first operation step of the execution stage. Signals from execution registers 432, 433, and 435 control the second operation step of execution stage 515.
Fig. 6C shows a block diagram of an embodiment of multiplier 620 and ALU 630. Multiplier 620 is an integer multiplier that contains 8 independent 36 × 36-bit multipliers 626. Each multiplier 626 contains four 9 × 9-bit multipliers connected by control circuitry. For 8-bit and 9-bit data element widths, control signals from scheduler 230 break the connections between the four 9 × 9 multipliers, so that each multiplier 626 performs 4 multiplications and multiplier 620 performs 32 independent multiplications in one cycle. For 16-bit data elements, the control circuitry links the 9 × 9 multipliers so that they operate together, and multiplier 620 performs 16 parallel multiplications. For the 32-bit integer data element type, the 8 multipliers 626 perform 8 parallel multiplications each clock cycle. The multiplication results total 576 bits for 9-bit data elements and 512 bits for the other data lengths.
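The result widths quoted above follow from the number of parallel products and the fact that an n-bit by n-bit product is 2n bits wide. The table-driven helper below reproduces the 576-bit and 512-bit figures; `multiplier_config` is an illustrative name.

```python
MULTIPLIERS_626 = 8       # 36 × 36-bit multipliers in multiplier 620
NINE_BY_NINE_PER_626 = 4  # 9 × 9-bit multipliers inside each multiplier 626


def multiplier_config(element_bits: int) -> tuple[int, int]:
    """Return (parallel multiplications per cycle, total result bits)."""
    if element_bits in (8, 9):
        products = MULTIPLIERS_626 * NINE_BY_NINE_PER_626  # links broken: 32
    elif element_bits == 16:
        products = 16
    elif element_bits == 32:
        products = MULTIPLIERS_626  # one product per multiplier 626
    else:
        raise ValueError("supported element widths are 8, 9, 16 and 32 bits")
    # Each n-bit × n-bit product is 2n bits wide.
    return products, products * 2 * element_bits
```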
In write stage 516, the results from the execution stage are stored in register file 610. Two registers can be written within one clock cycle; input multiplexers 602 and 605 select the 2 data values to be written. The duration of write stage 516 for an operation depends on the amount of data the operation produces and on competition from LSU 250, which may be writing to register file 610 to complete a load instruction. Signals from execution registers 426 and 427 select the registers to which data from logic unit 630, accumulator 640, and multiplier 620 are written.
Fig. 5B illustrates execution pipeline 520 for a load instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 520 are the same as for the register-to-register operation described above. Read stage 514 is also the same as described above, except that execution data path 240 uses data from register file 610 to determine an address for accessing cache subsystem 130. In address stage 525, multiplexers 652, 654, and 656 select the address, which is provided to load/store unit 250 for stages 526 and 527. While load/store unit 250 processes the operation during stages 526 and 527, information about the load operation remains in FIFO 410.
Fig. 7 shows an embodiment of load/store unit 250. During stage 526, load/store unit 250 accesses cache subsystem 130 to request the data at the address determined in stage 525. The present embodiment uses transaction-based cache accesses, in which multiple devices, including processors 110 and 120, can access the local address space through cache subsystem 130. The requested data may not be available until several cycles after cache subsystem 130 is accessed, but load/store unit 250 can issue further accesses to the cache subsystem while other accesses are pending. Consequently, load/store unit 250 rarely stalls. The number of clock cycles cache subsystem 130 needs to supply the requested data depends on whether the access hits or misses in data cache 194.
In drive stage 527, cache subsystem 130 asserts data signals to load/store unit 250. Cache subsystem 130 can provide 256 bits (32 bytes) of data per cycle to load/store unit 250, and byte aligner 710 aligns each of the 32 bytes into a corresponding 9-bit storage location to provide a 288-bit value. The 288-bit format is convenient for multimedia applications such as MPEG encoding and decoding, which sometimes use 9-bit data elements. The 288-bit value is written into read data buffer 720. In write stage 528, scheduler 230 transfers field 4 of FIFO buffer 410 to execution register 426 or 427, and the 288-bit value in data buffer 720 is written to register file 610.
Fig. 5C shows execution pipeline 530 used for a store instruction. Fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 530 are the same as described earlier. Read stage 514 is also the same as described earlier, except that it reads both the data to be stored and the data used for address calculation. The data to be stored is written into write data buffer 730 in load/store unit 250. Multiplexer 740 converts data in 9-bit byte format into the conventional 8-bit byte format. The converted data from buffer 730 and the corresponding address from address calculation stage 525 are sent in parallel to cache subsystem 130 during SRAM stage 536.
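The load-side byte aligner and the store-side 9-bit-to-8-bit conversion are inverse mappings between 32 memory bytes and a 288-bit register value. The sketch assumes byte i of the 256-bit transfer lands in 9-bit slot i (slot 0 least significant) with a zero ninth bit on loads, and that stores simply drop the ninth bit; the text does not state the slot ordering or how the ninth bit is filled, and the function names are hypothetical.

```python
def align_load(data: bytes) -> int:
    """Byte aligner 710 sketch: 32 bytes -> 288-bit value, one byte per 9-bit slot."""
    if len(data) != 32:
        raise ValueError("the cache delivers 256 bits (32 bytes) per cycle")
    value = 0
    for i, b in enumerate(data):
        value |= b << (9 * i)  # assumed: ninth bit of each slot left as zero
    return value


def convert_store(value: int) -> bytes:
    """Multiplexer 740 sketch: 288-bit value in 9-bit bytes -> 32 8-bit bytes."""
    return bytes((value >> (9 * i)) & 0xFF for i in range(32))
```

Under these assumptions a load followed by a store returns the original 32 bytes unchanged.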
In an embodiment of the vector processor, each instruction is 32 bits long and has one of the 9 formats shown in Fig. 8, labeled REAR, REAI, RRRM5, RRRR, RI, CT, RRRM9, RRRM9*, and RRRM9**. Appendix E describes the instruction set of vector processor 120.
Some load, store, and cache operations that use scalar registers to determine an effective address have the REAR format. REAR-format instructions are identified by bits 29-31 having the value 000b and have 3 operands identified by 3 register numbers: register numbers SRb and SRi identify scalar registers, and register number Rn identifies a scalar or vector register, depending on bit D. Bank bit B either identifies a bank for register Rn or, when the default vector register size is double length, indicates whether vector register Rn is double length. Opcode field Opc identifies the operation performed on the operands, and field TT indicates whether the transfer type is a load or a store. A typical REAR-format instruction is instruction VL, which loads register Rn from the address obtained by adding the contents of scalar registers SRb and SRi. If bit A is set, the calculated address is stored in scalar register SRb.
REAI-format instructions are the same as REAR instructions, except that an 8-bit immediate value from field IMM replaces the contents of scalar register SRi. The REAR and REAI formats have no data element length field.
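The REAR/REAI addressing rule (base plus index or immediate, with optional write-back controlled by bit A) can be sketched as below. Assumptions: 32-bit wrap-around addresses, a Python list standing in for the 32 scalar registers, and an IMM value already interpreted by the caller (the text does not say whether it is signed). `is_rear_format` and `effective_address` are hypothetical names.

```python
MASK32 = 0xFFFF_FFFF


def is_rear_format(insn: int) -> bool:
    """REAR-format instructions have 000b in bits 29-31."""
    return ((insn >> 29) & 0b111) == 0b000


def effective_address(regs: list[int], srb: int, index_value: int, a_bit: bool) -> int:
    """Add scalar register SRb to the index (SRi contents for REAR, the IMM
    value for REAI); if bit A is set, write the address back into SRb."""
    ea = (regs[srb] + index_value) & MASK32
    if a_bit:
        regs[srb] = ea  # post-update of the base register
    return ea
```

The write-back option lets a loop walk through memory with one instruction per access, since the next load starts from the updated base in SRb.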
The RRRM5 format is used for instructions that have 2 source operands and one destination operand. These instructions have either 3 register operands, or 2 register operands and a 5-bit immediate value. The encoding of fields D, S, and M, shown in Appendix E, determines whether the first source operand Ra is a scalar or vector register; whether the second source operand Rb/IM5 is a scalar register, a vector register, or a 5-bit immediate value; and whether destination register Rd is a scalar or vector register.
The RRRR format is used for instructions that have 4 register operands. Register numbers Ra and Rb indicate source registers. Register number Rd indicates the destination register, and register number Rc indicates a source or destination register, depending on field Opc. All operands are vector registers unless bit S is set to indicate that register Rb is a scalar register. Field DS indicates the data element length of the vector registers. Field Opc selects the data type of 32-bit data elements.
RI-format instructions load an immediate value into a register. Field IMM contains an immediate value of up to 18 bits. Register number Rd indicates the destination register, which is either a vector register in the current bank or a scalar register, depending on bit D. Fields DS and F indicate the length and the type of the data element, respectively. For 32-bit integer data elements, the 18-bit immediate value is sign-extended before it is loaded into register Rd. For floating-point data elements, bit 18, bits 17 to 10, and bits 9 to 0 represent the sign, exponent, and mantissa, respectively, of a 32-bit floating-point value.
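The two RI immediate interpretations can be sketched as follows. Sign extension of an 18-bit integer is standard; for the floating-point case, the text gives only the field positions, so placing the 10 mantissa bits in the most significant bits of the 23-bit IEEE 754 fraction (low bits zero) is our assumption. The helper names are hypothetical.

```python
import struct


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def ri_int32(imm: int) -> int:
    """32-bit integer element: sign-extend the 18-bit immediate."""
    return sign_extend(imm, 18)


def ri_float(imm: int) -> float:
    """Float element: bit 18 = sign, bits 17-10 = exponent, bits 9-0 = mantissa.
    Assumption: the mantissa fills the top 10 bits of the 23-bit fraction."""
    sign = (imm >> 18) & 0x1
    exponent = (imm >> 10) & 0xFF
    mantissa = imm & 0x3FF
    bits = (sign << 31) | (exponent << 23) | (mantissa << 13)
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

With this layout the full 8-bit exponent is preserved, so the immediate covers the same range as a single-precision value at reduced precision.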
The CT format is used for flow control instructions and contains an opcode field Opc, a condition field Cond, and a 23-bit immediate value IMM. A branch is taken when the condition indicated by the condition field is true. The possible condition codes are "always" (unconditional), "less than", "equal", "less than or equal", "greater than", "not equal", "greater than or equal", and "overflow". Bits GT, EQ, LT, and SO in control and status register VCSR are used to evaluate the conditions.
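Evaluating a CT-format condition from the VCSR status bits amounts to a small lookup. The mnemonics used as keys below are hypothetical (the text gives only the condition names, not their encodings), and the "overflow" condition is assumed to test bit SO directly.

```python
def condition_true(cond: str, gt: bool, eq: bool, lt: bool, so: bool) -> bool:
    """Evaluate a CT-format condition from VCSR bits GT, EQ, LT and SO.
    The mnemonic keys are illustrative, not the actual Cond encodings."""
    table = {
        "always": True,     # unconditional
        "lt": lt,           # less than
        "eq": eq,           # equal
        "le": lt or eq,     # less than or equal
        "gt": gt,           # greater than
        "ne": not eq,       # not equal
        "ge": gt or eq,     # greater than or equal
        "ov": so,           # overflow (assumed to read SO)
    }
    return table[cond]
```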
The RRRM9 format provides either 3 register operands, or 2 register operands and a 9-bit immediate value. The combination of bits D, S, and M indicates which operands are vector registers, scalar registers, or 9-bit immediate values. Field DS indicates the data element length. The RRRM9* and RRRM9** formats are special cases of the RRRM9 format and are distinguished by opcode field Opc. The RRRM9* format replaces source register number Ra with a condition code Cond and an ID field. The RRRM9** format replaces the most significant bits of the immediate value with condition code Cond and bit K. The RRRM9* and RRRM9** formats are described further in Appendix E in connection with the conditional move instruction VCMOV, the element-masked conditional move CMOVM, and the compare-and-set-mask instruction CMPV.
Although the invention has been described with reference to particular embodiments, the description is only an example of the invention's application and should not be taken as a limitation. Various adaptations and combinations of features of the disclosed embodiments are within the scope of the invention as defined by the following claims.
Appendix A
In an exemplary embodiment, processor 110 is a general-purpose processor conforming to the ARM7 processor standard. For descriptions of ARM7 registers, refer to the ARM Architecture Reference or the ARM7 Data Sheet (document number ARM DDI 0020C, issued December 1994).
To interact with vector processor 120, processor 110: starts and stops the vector processor; tests the vector processor state, including synchronization state; transfers data from the scalar/special registers of vector processor 120 to the general registers of processor 110; and transfers data from the general registers to the scalar/special registers of the vector processor. There is no direct transfer path between the general registers and the vector registers of the vector processor; such transfers require memory as an intermediary.
Table A.1 describes the ARM7 instruction set extensions for interacting with the vector processor.
Table A.1: Extended ARM7 instruction set
Instruction | Result |
STARTVP | This instruction puts the vector processor in the VP_RUN state; it has no effect if the vector processor is already in the VP_RUN state. STARTVP executes as a coprocessor data operation (CDP) class instruction in the ARM7 architecture; no result is returned to the ARM7, which continues its execution. |
INTVP | This instruction puts the vector processor in the VP_IDLE state; it has no effect if the vector processor is already in the VP_IDLE state. INTVP executes as a coprocessor data operation (CDP) class instruction in the ARM7 architecture; no result is returned to the ARM7, which continues its execution. |
TESTSET | This instruction reads a user extended register, and register-
MFER | Transfers data from an extended register to an ARM general register. In the ARM7 architecture, MFER executes as a coprocessor register transfer (MRC) class instruction. The ARM7 is blocked until the instruction completes (the register is transferred). |
MFVP | Transfers data from a scalar/special register of the vector processor to an ARM7 general register. Unlike other ARM7 instructions, this instruction executes only when the vector processor is in the VP_IDLE state; otherwise, its result is undefined. In the ARM7 architecture, MFVP executes as a coprocessor register transfer (MRC) class instruction. The ARM7 is blocked until the instruction completes (the register is transferred). |
MTER | Transfers data from an ARM7 general register to an extended register. In the ARM7 architecture, MTER executes as a coprocessor register transfer (MCR) class instruction. The ARM7 is blocked until the instruction completes (the register is transferred). |
MTVP | Transfers data from an ARM7 general register to a scalar/special register of the vector processor. Unlike other ARM7 instructions, this instruction executes only when the vector processor is in the VP_IDLE state; otherwise, its result is undefined. In the ARM7 architecture, MTVP executes as a coprocessor register transfer (MCR) class instruction. The ARM7 is blocked until the instruction completes (the register is transferred). |
CACHE | Provides software management of the ARM7 data cache. |
PFTCH | Prefetches a cache line into the ARM7 data cache. |
WBACK | Writes a cache line from the ARM7 data cache back to memory. |
Table A.2 lists the ARM7 exceptions, which are detected and reported before the faulting instruction executes. Exception vector addresses are given in hexadecimal notation.
Table A.2: ARM7 exceptions
Exception vector | Description |
0x00000000 | ARM7 reset |
0x00000004 | ARM7 undefined instruction exception |
0x00000004 | Vector processor unavailable exception |
0x00000008 | ARM7 software interrupt |
0x0000000C | ARM7 single-step exception |
0x0000000C | ARM7 instruction address breakpoint exception |
0x00000010 | ARM7 data address breakpoint exception |
0x00000010 | ARM7 invalid data address exception |
0x00000018 | ARM7 protection violation exception |
The following describes the syntax of the ARM7 instruction set extensions. For terminology and instruction formats, refer to the ARM Architecture Reference or the ARM7 Data Sheet (document number ARM DDI 0020C, issued December 1994).
The ARM architecture provides 3 instruction formats for the coprocessor interface:
1. Coprocessor data operations (CDP)
2. Coprocessor data transfers (LDC, STC)
3. Coprocessor register transfers (MRC, MCR)
The MSP architecture extensions use all three formats.
The coprocessor data operation (CDP) format is used for operations that do not need to return a result to the ARM7.
CDP format
The CDP format fields have the following meanings:
Field | Meaning |
Cond | Condition field; specifies the condition under which the instruction executes |
Opc | Coprocessor operation code |
CRn | Coprocessor operand register |
CRd | Coprocessor destination register |
CP# | Coprocessor number; the following coprocessor numbers are currently used: 1111 = ARM7 data cache, 0111 = vector processor and extended registers |
CP | Coprocessor information |
CPm | Coprocessor operand register |
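For reference, the fields in the table above occupy the standard ARM CDP bit positions: Cond in bits 31-28, the constant 1110 in bits 27-24, Opc in 23-20, CRn in 19-16, CRd in 15-12, CP# in 11-8, CP in 7-5, a zero in bit 4, and the operand register in 3-0. The encoder below is a sketch of that layout; the opcode values that the MSP extensions assign to STARTVP and INTVP are not given here, so the example uses zeros with coprocessor number 0111.

```python
def encode_cdp(cond: int, opc: int, crn: int, crd: int,
               cp_num: int, cp_info: int, crm: int) -> int:
    """Assemble an ARM coprocessor data operation (CDP) instruction word."""
    return ((cond & 0xF) << 28        # condition field
            | 0b1110 << 24            # CDP identifier, bit 4 stays 0
            | (opc & 0xF) << 20       # coprocessor operation code
            | (crn & 0xF) << 16       # coprocessor operand register
            | (crd & 0xF) << 12       # coprocessor destination register
            | (cp_num & 0xF) << 8     # CP#: 0b0111 = vector processor
            | (cp_info & 0x7) << 5    # coprocessor information
            | (crm & 0xF))            # second coprocessor operand register


COND_AL = 0xE  # ARM "always" condition
word = encode_cdp(COND_AL, 0, 0, 0, 0b0111, 0, 0)
```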
The coprocessor data transfer format (LDC, STC) is used to load or store a subset of the vector processor's registers directly to or from memory. The ARM7 processor is responsible for supplying the word address, while the vector processor supplies or receives the data and controls the number of words transferred. For more details, refer to the ARM7 Data Sheet.
LDC/STC format
The format fields have the following meanings:
Field | Meaning |
Cond | Condition field; specifies the condition under which the instruction executes |
P | Pre/post-indexing flag bit |
U | Up/down bit |
N | Transfer length. Because the CRd field has too few bits, bit N is also used as part of the source or destination register identifier. |
W | Write-back bit |
L | Load/store bit |
Rn | Base register |
CRn | Coprocessor source/destination register |
CP# | Coprocessor number; the following coprocessor numbers are currently used: 1111 = ARM7 data cache, 0111 = vector processor and extended registers |
Offset | Unsigned 8-bit immediate offset |
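These fields map onto the standard ARM LDC/STC layout: Cond in bits 31-28, the constant 110 in bits 27-25, then P, U, N, W, and L in bits 24-20, Rn in 19-16, the coprocessor register in 15-12 (listed as CRn in the table above and called CRd in the description of bit N), CP# in 11-8, and the 8-bit offset in 7-0. The encoder is a sketch of that standard layout; how the vector processor combines N with the register field is not specified, so the parameters are passed through unchanged.

```python
def encode_ldc_stc(cond: int, p: int, u: int, n: int, w: int, load: int,
                   rn: int, cr: int, cp_num: int, offset: int) -> int:
    """Assemble an ARM coprocessor data transfer (LDC when load=1, STC when load=0)."""
    return ((cond & 0xF) << 28
            | 0b110 << 25              # coprocessor data transfer identifier
            | (p & 1) << 24            # pre/post indexing
            | (u & 1) << 23            # up/down (add or subtract offset)
            | (n & 1) << 22            # transfer length / extra register bit
            | (w & 1) << 21            # write-back
            | (load & 1) << 20         # load/store
            | (rn & 0xF) << 16         # ARM base register
            | (cr & 0xF) << 12         # coprocessor source/destination register
            | (cp_num & 0xF) << 8      # coprocessor number
            | (offset & 0xFF))         # unsigned 8-bit offset
```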
The coprocessor register transfer format (MRC, MCR) is used to transfer information directly between the ARM7 and the vector processor. This format is used for transfers between ARM7 registers and the scalar or special registers of the vector processor.
MRC/MCR format
30 25 20 15 10 5 0
The format fields have the following meanings:
Field | Meaning |
Cond | Condition field; specifies the condition under which the instruction executes |
Opc | Coprocessor operation code |
L | Load/store bit: L=0 moves to the vector processor; L=1 moves from the vector processor |
CRn:CRm | Coprocessor source/destination register. Only CRn<1:0>:CRm<3:0> are used |
Rd | ARM source/destination register |
CP# | Coprocessor number. The following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor and extended registers |
CP | Coprocessor information |
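To make the field layout concrete, the following sketch packs an MRC or MCR word. It assumes the standard ARM coprocessor register transfer encoding (cond in bits 31:28, Opc in 23:21, L in bit 20, CRn in 19:16, Rd in 15:12, CP# in 11:8, CP in 7:5, bit 4 set, CRm in 3:0), which this document leaves to the ARM7 data sheet; `encode_mrc_mcr` is a hypothetical name.

```python
# Standard ARM condition-code encodings.
COND = {"eq": 0x0, "ne": 0x1, "cs": 0x2, "cc": 0x3, "mi": 0x4, "pl": 0x5,
        "vs": 0x6, "vc": 0x7, "hi": 0x8, "ls": 0x9, "ge": 0xA, "lt": 0xB,
        "gt": 0xC, "le": 0xD, "al": 0xE, "nv": 0xF}


def encode_mrc_mcr(load: bool, opc: int, rd: int, crn: int, crm: int,
                   cp_num: int = 7, cp_info: int = 0, cond: str = "al") -> int:
    """Encode MRC (load=True, L=1, move from coprocessor) or MCR (load=False, L=0)."""
    word = COND[cond] << 28
    word |= 0b1110 << 24            # coprocessor instruction class
    word |= (opc & 0x7) << 21       # coprocessor operation code
    word |= int(load) << 20         # L bit
    word |= (crn & 0xF) << 16       # CRn
    word |= (rd & 0xF) << 12        # ARM source/destination register
    word |= (cp_num & 0xF) << 8     # CP#: 7 (0111) selects the vector processor
    word |= (cp_info & 0x7) << 5    # CP field
    word |= 1 << 4                  # distinguishes register transfers from CDP
    word |= crm & 0xF               # CRm
    return word
```

For example, MFER r3,PER2 is written MRC p7,2,r3,c1,c2,0 (P=1, ER=2), which this sketch encodes as 0xEE513712.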
Extended ARM instructions
The extended ARM instructions are described below in alphabetical order.
CACHE: cache operation
Format: [bit-field diagram not reproduced]
The assembler syntax
STC{cond}p15,cOpc,<Address>
CACHE{cond}Opc,<Address>
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv} and Opc = {0, 1, 3}. Note that because the CRn field of the LDC/STC format is used to specify Opc, the decimal representation of the opcode must be prefixed with the letter "c" in the first syntax (that is, c0 represents 0). For the addressing-mode syntax, refer to the ARM7 data sheet.
Explanation
This instruction executes only when Cond is true. Opc<3:0> specifies the following operations:
Opc<3:0> | Meaning |
0000 | Write back and invalidate the dirty cache line specified by EA. If the matching line contains clean data, the line is invalidated without write-back. If no cache line containing EA is found, the data cache is left unchanged. |
0001 | Write back and invalidate the dirty cache line specified by EA used as an index. If the matching line contains clean data, the line is invalidated without write-back. |
0010 | Used by the PFTCH and WBACK instructions |
0011 | Invalidate the cache line specified by EA. The line is invalidated (not written back) even if it is dirty. This is a privileged operation; attempting it in user mode causes an ARM7 protection violation. |
Other | Reserved |
Operation
Refer to the ARM7 data sheet for how EA is calculated.
Exceptions
ARM7 protection violation.
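The Opc table can be summarized as a small decision function. This is an illustrative model only: `CacheLine` and `cache_op` are hypothetical, the model does not show how the line is located (opcodes 0 and 1 differ only in how EA selects the line), and it applies the user-mode check for opcode 3 before the lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheLine:
    dirty: bool  # True if the line holds modified data


def cache_op(opc: int, line: Optional[CacheLine], user_mode: bool = False) -> list[str]:
    """Return the actions taken on the matching line for CACHE opcodes 0, 1 and 3."""
    if opc == 3 and user_mode:
        # Opcode 3 is privileged: user-mode use is an ARM7 protection violation.
        raise PermissionError("ARM7 protection violation")
    if opc not in (0, 1, 3):
        raise ValueError("opcode 2 is used by PFTCH/WBACK; other values are reserved")
    if line is None:
        return []  # no matching line: the data cache is left unchanged
    if opc == 3:
        return ["invalidate"]  # discard the line even if dirty, no write-back
    # Opcodes 0 and 1: write back only dirty data, then invalidate.
    return (["write-back"] if line.dirty else []) + ["invalidate"]
```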
INTVP: interrupt vector processor
The assembler syntax
CDP{cond}p7,1,c0,c0,c0
INTVP{cond}
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.
Explanation
This instruction executes only when Cond is true. It signals the vector processor to stop. The ARM7 does not wait for the vector processor to stop; it continues executing the next instruction.
After this instruction executes, an MFER busy-wait loop should be used to check whether the vector processor has stopped. If the vector processor is in the VP_IDLE state, this instruction has no effect. Bits 19:12, 7:5 and 3:0 are reserved.
Exceptions
Vector processor unavailable.
MFER: move from extended register
Format: [bit-field diagram not reproduced]
The assembler syntax
MRC{cond}p7,2,Rd,cP,cER,0
MFER{cond}Rd,RNAME
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0 ... r15}, P = {0, 1}, ER = {0 ... 15}, and RNAME is the architecturally specified register mnemonic (for example, PER0 or CSR).
Explanation
This instruction executes only when Cond is true. ARM7 register Rd is loaded from the extended register specified by P:ER<3:0>, as shown in the table below. For a description of the extended registers, see Section 1.2.
ER<3:0> | P=0 | P=1 |
0000 | UER0 | PER0 |
0001 | UER1 | PER1 |
0010 | UER2 | PER2 |
0011 | UER3 | PER3 |
0100 | UER4 | PER4 |
0101 | UER5 | PER5 |
0110 | UER6 | PER6 |
0111 | UER7 | PER7 |
1000 | UER8 | PER8 |
1001 | UER9 | PER9 |
1010 | UER10 | PER10 |
1011 | UER11 | PER11 |
1100 | UER12 | PER12 |
1101 | UER13 | PER13 |
1110 | UER14 | PER14 |
1111 | UER15 | PER15 |
Bits 19:17 and 7:5 are reserved.
Exceptions
A protection violation occurs when an access to PERx is attempted in user mode.
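The P:ER<3:0> mapping in the table above reduces to a prefix choice plus an index. A minimal sketch with hypothetical helper names, which also models the user-mode restriction on PERx:

```python
def extended_register_name(p: int, er: int) -> str:
    """Map the P:ER<3:0> operand of MFER/MTER to an extended register name."""
    if p not in (0, 1) or not 0 <= er <= 15:
        raise ValueError("P must be 0 or 1 and ER must be in 0..15")
    return f"{'PER' if p else 'UER'}{er}"


def access_allowed(p: int, user_mode: bool) -> bool:
    """PERx (P=1) accesses from user mode cause a protection violation."""
    return not (p == 1 and user_mode)
```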
MFVP: move from vector processor
Format: [bit-field diagram not reproduced]
The assembler syntax
MRC{cond}p7,1,Rd,CRn,CRm,0
MFVP{cond}Rd,RNAME
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0 ... r15}, CRn = {c0 ... c15}, CRm = {c0 ... c15}, and RNAME is the architecturally specified register mnemonic (for example, SP0 or VCSR).
Explanation
This instruction executes only when Cond is true. ARM7 register Rd is loaded from the vector processor scalar or special register specified by CRn<1:0>:CRm<3:0>. For the assignment of vector processor register numbers for register transfers, see Section 3.2.3.
Bits 7:5 and CRn<3:2> are reserved.
The vector processor register mapping is shown below. For the vector processor special registers (SP0-SP15), see Table 15.
CRm<3:0> | CRn<1:0>=00 | CRn<1:0>=01 | CRn<1:0>=10 | CRn<1:0>=11 |
0000 | SR0 | SR16 | SP0 | RASR0 |
0001 | SR1 | SR17 | SP1 | RASR1 |
0010 | SR2 | SR18 | SP2 | RASR2 |
0011 | SR3 | SR19 | SP3 | RASR3 |
0100 | SR4 | SR20 | SP4 | RASR4 |
0101 | SR5 | SR21 | SP5 | RASR5 |
0110 | SR6 | SR22 | SP6 | RASR6 |
0111 | SR7 | SR23 | SP7 | RASR7 |
1000 | SR8 | SR24 | SP8 | RASR8 |
1001 | SR9 | SR25 | SP9 | RASR9 |
1010 | SR10 | SR26 | SP10 | RASR10 |
1011 | SR11 | SR27 | SP11 | RASR11 |
1100 | SR12 | SR28 | SP12 | RASR12 |
1101 | SR13 | SR29 | SP13 | RASR13 |
1110 | SR14 | SR30 | SP14 | RASR14 |
1111 | SR15 | SR31 | SP15 | RASR15 |
SR0 always reads as 32 zero bits, and writes to it are ignored.
Exceptions
Vector processor unavailable.
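The MFVP/MTVP register mapping can likewise be decoded from CRn<1:0> and CRm<3:0>. This sketch assumes that the special-register column maps CRm=n to SPn, as the reference to SP0-SP15 indicates; `vp_register_name` is hypothetical and ignores the reserved bits CRn<3:2>.

```python
def vp_register_name(crn: int, crm: int) -> str:
    """Map CRn<1:0>:CRm<3:0> of MFVP/MTVP to a vector processor register name."""
    group = crn & 0b11  # CRn<3:2> are reserved and ignored
    idx = crm & 0xF
    if group == 0b00:
        return f"SR{idx}"       # scalar registers SR0-SR15
    if group == 0b01:
        return f"SR{idx + 16}"  # scalar registers SR16-SR31
    if group == 0b10:
        return f"SP{idx}"       # special registers SP0-SP15
    return f"RASR{idx}"         # return address stack RASR0-RASR15
```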
MTER: move to extended register
The assembler syntax
MCR{cond}p7,2,Rd,cP,cER,0
MTER{cond}Rd,RNAME
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0 ... r15}, P = {0, 1}, ER = {0 ... 15}, and RNAME is the architecturally specified register mnemonic (for example, PER0 or CSR).
Explanation
This instruction executes only when Cond is true. The extended register specified by P:ER<3:0> is loaded from ARM7 register Rd, as shown in the table below.
ER<3:0> | P=0 | P=1 |
0000 | UER0 | PER0 |
0001 | UER1 | PER1 |
0010 | UER2 | PER2 |
0011 | UER3 | PER3 |
0100 | UER4 | PER4 |
0101 | UER5 | PER5 |
0110 | UER6 | PER6 |
0111 | UER7 | PER7 |
1000 | UER8 | PER8 |
1001 | UER9 | PER9 |
1010 | UER10 | PER10 |
1011 | UER11 | PER11 |
1100 | UER12 | PER12 |
1101 | UER13 | PER13 |
1110 | UER14 | PER14 |
1111 | UER15 | PER15 |
Bits 19:17 and 7:5 are reserved.
Exceptions
A protection violation occurs when an access to PERx is attempted in user mode.
MTVP: move to vector processor
The assembler syntax
MCR{cond}p7,1,Rd,CRn,CRm,0
MTVP{cond}Rd,RNAME
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0 ... r15}, CRn = {c0 ... c15}, CRm = {c0 ... c15}, and RNAME is the architecturally specified register mnemonic (for example, SP0 or VCSR).
Explanation
This instruction executes only when Cond is true. The vector processor scalar or special register specified by CRn<1:0>:CRm<3:0> is loaded from ARM7 register Rd.
Bits 7:5 and CRn<3:2> are reserved.
The vector processor register mapping is as follows:
CRm<3:0> | CRn<1:0>=00 | CRn<1:0>=01 | CRn<1:0>=10 | CRn<1:0>=11 |
0000 | SR0 | SR16 | SP0 | RASR0 |
0001 | SR1 | SR17 | SP1 | RASR1 |
0010 | SR2 | SR18 | SP2 | RASR2 |
0011 | SR3 | SR19 | SP3 | RASR3 |
0100 | SR4 | SR20 | SP4 | RASR4 |
0101 | SR5 | SR21 | SP5 | RASR5 |
0110 | SR6 | SR22 | SP6 | RASR6 |
0111 | SR7 | SR23 | SP7 | RASR7 |
1000 | SR8 | SR24 | SP8 | RASR8 |
1001 | SR9 | SR25 | SP9 | RASR9 |
1010 | SR10 | SR26 | SP10 | RASR10 |
1011 | SR11 | SR27 | SP11 | RASR11 |
1100 | SR12 | SR28 | SP12 | RASR12 |
1101 | SR13 | SR29 | SP13 | RASR13 |
1110 | SR14 | SR30 | SP14 | RASR14 |
1111 | SR15 | SR31 | SP15 | RASR15 |
Exceptions
Vector processor unavailable.
PFTCH: prefetch
Format: [bit-field diagram not reproduced]
The assembler syntax
LDC{cond}p15,2,<Address>
PFTCH{cond}<Address>
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. For the addressing-mode syntax, refer to the ARM7 data sheet.
Explanation
This instruction executes only when Cond is true. The cache line specified by EA is prefetched into the ARM7 data cache.
Operation
Refer to the ARM7 data sheet for how EA is calculated.
Exceptions: none
STARTVP: start vector processor
Format: [bit-field diagram not reproduced]
The assembler syntax
CDP{cond}p7,0,c0,c0,c0
STARTVP{cond}
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.
Explanation
This instruction executes only when Cond is true. It signals the vector processor to start execution and automatically clears VISRC<vjp> and VISRC<vip>. The ARM7 does not wait for the vector processor to start; it continues executing the next instruction.
The vector processor must be initialized to the desired state before this instruction executes. If the vector processor is already in the VP_RUN state, this instruction has no effect.
Bits 19:12, 7:5 and 3:0 are reserved.
Exceptions
Vector processor unavailable.
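The STARTVP/INTVP handshake can be pictured as a two-state machine. The toy model below is hypothetical and not cycle-accurate: it takes the VIP and VJP bit positions (6 and 5) from the VISRC definition in Appendix C, and it stops the vector processor immediately on INTVP, whereas the real stop is asynchronous and the ARM7 must poll with MFER.

```python
VIP = 1 << 6  # VISRC<6>: VCINT exception pending
VJP = 1 << 5  # VISRC<5>: VCJOIN exception pending


class VectorProcessorModel:
    """Hypothetical two-state model of STARTVP/INTVP (not cycle-accurate)."""

    def __init__(self) -> None:
        self.state = "VP_IDLE"
        self.visrc = 0

    def startvp(self) -> None:
        if self.state == "VP_RUN":
            return  # STARTVP has no effect while the vector processor runs
        self.visrc &= ~(VIP | VJP)  # automatically clear VISRC<vip> and VISRC<vjp>
        self.state = "VP_RUN"

    def intvp(self) -> None:
        if self.state == "VP_IDLE":
            return  # INTVP has no effect while the vector processor is idle
        self.state = "VP_IDLE"  # simplification: the real stop is asynchronous
```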
TESTSET: test and set
Format: [bit-field diagram not reproduced]
The assembler syntax
MRC{cond}p7,0,Rd,c0,cER,0
TESTSET{cond}Rd,RNAME
where Cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0 ... r15}, ER = {0 ... 15}, and RNAME is the architecturally specified register mnemonic (for example, UER1 or VASYNC).
Explanation
This instruction executes only when Cond is true. It returns the contents of UERx in Rd and sets UERx<30> to 1. If ARM7 register 15 is specified as the destination register, UERx<30> is returned in the Z bit of the CPSR, so that a short busy-wait loop can be implemented.
Currently, only UER1 is defined to work with this instruction.
Bits 19:17 and 7:5 are reserved.
Exceptions: none
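TESTSET is the building block for a simple lock on a user extended register such as UER1. The sketch assumes that the read and the setting of bit 30 happen atomically (a Python lock stands in for that) and that software releases the flag by writing 0, for example with MTER; the class and function names are hypothetical.

```python
import threading


class UserExtendedRegister:
    """Hypothetical model of a user extended register (only bit 30 defined)."""

    FLAG = 1 << 30

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()  # stands in for the assumed hardware atomicity

    def testset(self) -> int:
        """TESTSET: return the old contents and set bit 30."""
        with self._lock:
            old = self.value
            self.value |= self.FLAG
            return old

    def write(self, value: int) -> None:
        """MTER-style write; only bit 30 is kept, other bits read as 0."""
        with self._lock:
            self.value = value & self.FLAG


def try_acquire(reg: UserExtendedRegister) -> bool:
    """True if the flag was clear (with Rd = r15 this old bit lands in CPSR Z)."""
    return not (reg.testset() & UserExtendedRegister.FLAG)
```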
Appendix B
The architecture of multimedia processor 100 defines extended registers that processor 110 accesses with the MFER and MTER instructions. The extended registers comprise privileged extended registers and user extended registers.
The privileged extended registers are used mainly to control the operation of the multimedia signal processor. They are shown in Table B.1.
Table B.1: Privileged extended registers
Number | Mnemonic | Explanation |
PER0 | CTR | Control register |
PER1 | PVR | Processor type register |
PER2 | VIMSK | Vector interrupt mask register |
PER3 | AIABR | ARM7 instruction address breakpoint register |
PER4 | ADABR | ARM7 data address breakpoint register |
PER5 | SPREG | Scratchpad register |
PER6 | STR | Status register |
The control register CTR controls the operation of MSP100. All bits in CTR are cleared on reset. The register definition is shown in Table B.2.
Table B.2: CTR definition
Bit | Mnemonic | Explanation |
31-13 | Reserved, always read as 0 | |
12 | VDCI | Vector data cache invalidate bit. When set, the entire vector processor data cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported. |
11 | VDE | Vector data cache enable bit. When cleared, the vector processor data cache is disabled. |
10 | VICI | Vector instruction cache invalidate bit. When set, the entire vector processor instruction cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported. |
9 | VICE | Vector instruction cache enable bit. When cleared, the vector processor instruction cache is disabled. |
8 | ADCI | ARM7 data cache invalidate bit. When set, the entire ARM7 data cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported. |
7 | ADCE | ARM7 data cache enable bit. When cleared, the ARM7 data cache is disabled. |
6 | AICI | ARM7 instruction cache invalidate bit. When set, the entire ARM7 instruction cache is invalidated. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported. |
5 | AICE | ARM7 instruction cache enable bit. When cleared, the ARM7 instruction cache is disabled. |
4 | APSE | ARM7 processor single-step enable bit. When set, an ARM7 single-step exception occurs after the ARM7 processor executes an instruction. The single-step function is available only in user or supervisor mode. |
3 | SPAE | Scratchpad access enable bit. When set, the ARM7 processor is allowed to load from and store to the scratchpad. When cleared, an attempt to load from or store to the scratchpad generates an ARM7 invalid data address exception. |
2 | VPSE | Vector processor single-step enable bit. When set, a vector processor single-step exception occurs after the vector processor executes an instruction. |
1 | VPPE | Vector processor pipeline enable bit. When cleared, the vector processor is configured to operate in non-pipelined mode, in which only one instruction is active in the vector processor execution pipeline. |
0 | VPAE | Vector processor access enable bit. When set, the ARM7 processor can execute the extended ARM7 instructions described above. When cleared, the ARM7 processor is prevented from executing the extended ARM7 instructions, and any such attempt generates a vector processor unavailable exception. |
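Because every CTR bit is a single flag, it is convenient to build and inspect CTR values by mnemonic. A hypothetical helper pair, using the bit positions from Table B.2:

```python
CTR_BITS = {
    12: "VDCI", 11: "VDE", 10: "VICI", 9: "VICE",
    8: "ADCI", 7: "ADCE", 6: "AICI", 5: "AICE",
    4: "APSE", 3: "SPAE", 2: "VPSE", 1: "VPPE", 0: "VPAE",
}


def decode_ctr(value: int) -> list[str]:
    """List the mnemonics of the set CTR bits, highest bit first (31-13 reserved)."""
    return [name for bit, name in sorted(CTR_BITS.items(), reverse=True)
            if (value >> bit) & 1]


def encode_ctr(*names: str) -> int:
    """Build a CTR value from bit mnemonics, e.g. encode_ctr("VPAE", "VPPE")."""
    lookup = {name: bit for bit, name in CTR_BITS.items()}
    return sum(1 << lookup[name] for name in set(names))
```

Since CTR resets to 0, VPAE starts cleared, so the extended ARM7 instructions raise the vector processor unavailable exception until software sets it.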
The status register STR indicates the state of MSP100. All bits in STR are cleared on reset. The register definition is shown in Table B.3.
Table B.3: STR definition
Bit | Mnemonic | Explanation |
31:23 | Reserved, always read as 0 | |
22 | ADAB | ARM7 data address breakpoint exception bit. Set when an ARM7 data address breakpoint match occurs; the exception is reported through the data abort interrupt. |
21 | AIDA | ARM7 invalid data address exception. Generated when an ARM7 load or store instruction attempts to access an undefined address or one not implemented in the particular MSP implementation, or attempts to access a scratchpad memory that is not enabled. This exception is reported through the data abort interrupt. |
20 | AIAB | ARM7 instruction address breakpoint exception bit. Set when an ARM7 instruction address breakpoint match occurs. This exception is reported through the prefetch abort interrupt. |
19 | AIIA | ARM7 invalid instruction address exception. This exception is reported through the prefetch abort interrupt. |
18 | ASTP | ARM7 single-step exception. This exception is reported through the prefetch abort interrupt. |
17 | APV | ARM7 protection violation. The exception is reported through the IRQ interrupt. |
16 | VPUA | Vector processor unavailable exception. The exception is reported through the coprocessor unavailable interrupt. |
15-0 | Reserved, always read as 0 | |
The processor type (version) register PVR identifies the specific processor type within the multimedia signal processor family.
The vector processor interrupt mask register VIMSK controls the reporting of vector processor exceptions to processor 110. When a bit in VIMSK is set together with the corresponding bit in the VISRC register, the exception generates an interrupt to the ARM7. VIMSK does not affect how vector processor exceptions are detected; it only determines whether an exception interrupts the ARM7. All bits in VIMSK are cleared on reset. The register definition is shown in Table B.4.
Table B.4: VIMSK Definition
Bit | Mnemonic | Explanation |
31 | DABE | Data address break interrupt enable |
30 | LABE | Instruction address break interrupt enable |
29 | SSTPE | Single-step interrupt enable |
28-14 | Reserved - always read as 0. | |
13 | FOVE | Floating point overflow interrupt enable |
12 | FINVE | Illegal floating point operand interrupt enable |
11 | FDIVE | Floating-point division by zero interrupt enable |
10 | IOVE | Integer overflow interrupt enable |
9 | IDIVE | Integer divide by zero interrupt is enabled |
8-7 | Reserved - always read as 0 | |
6 | VIE | VCINT interrupt enable |
5 | VJE | VCJOIN interrupt enable |
4-1 | Reserved - always read as 0 | |
0 | CSE | Context switch enable |
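The rule that an exception interrupts the ARM7 only when both its VISRC bit and the matching VIMSK bit are set is a bitwise AND. In this sketch the VIMSK/VISRC bit pairing is inferred from Tables B.4 and C.6, CSE (bit 0) is left out because VISRC<4:0> holds the VPEV vector instead, and `arm7_interrupts` is a hypothetical name.

```python
# Bit positions whose mnemonics pair up between VIMSK (Table B.4) and VISRC (Table C.6).
MASKABLE = {31: "DAB", 30: "LAB", 29: "SSTP", 13: "FOV", 12: "FINV",
            11: "FDIV", 10: "IOV", 9: "IDIV", 6: "VIP", 5: "VJP"}


def arm7_interrupts(visrc: int, vimsk: int) -> list[str]:
    """Return the pending VISRC sources that VIMSK allows to interrupt the ARM7."""
    return [name for bit, name in MASKABLE.items()
            if (visrc >> bit) & 1 and (vimsk >> bit) & 1]
```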
The ARM7 instruction address breakpoint register AIABR assists in debugging ARM7 programs. The register definition is shown in Table B.5.
Table B.5: AIABR definition
Bit | Mnemonic | Explanation |
31-2 | LADR | |
1 | Reserved, always read as 0 | |
0 | LABE | Instruction address breakpoint enable. Cleared on reset. If set, an ARM7 instruction address breakpoint exception occurs when an ARM7 instruction access address matches AIABR<31:2> and VCSR<AIAB> is clear; VCSR<AIAB> is then set to indicate the exception. If VCSR<AIAB> is already set when a match occurs, VCSR<AIAB> is cleared and the match is ignored. The exception is reported before the instruction executes. |
The ARM7 data address breakpoint register ADABR assists in debugging ARM7 programs. The register definition is shown in Table B.6.
Table B.6: ADABR definition
Bit | Mnemonic | Explanation |
31-2 | DADR | ARM data address. Undefined at reset. |
1 | SABE | Store address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of an ARM7 store access address match ADABR<31:2> and VCSR<ADAB> is clear. VCSR<ADAB> is set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the store instruction executes. |
0 | LABE | Load address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of an ARM7 load access address match ADABR<31:2> and VCSR<ADAB> is clear. VCSR<ADAB> is set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the load instruction executes. |
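The ADABR match condition can be written as a predicate. This sketch is hypothetical: it compares the upper 30 bits of the access address with ADABR<31:2>, selects SABE or LABE by access type, and models the already-pending VCSR<ADAB> case simply as "no new exception".

```python
def data_breakpoint_hit(adabr: int, address: int, is_store: bool, adab_pending: bool) -> bool:
    """Return True if an ARM7 access raises a data address breakpoint exception.

    ADABR<31:2> holds the word address, ADABR<1> (SABE) enables store matches
    and ADABR<0> (LABE) enables load matches.
    """
    enabled = (adabr >> 1) & 1 if is_store else adabr & 1
    match = (address >> 2) == (adabr >> 2)  # compare the upper 30 address bits
    return bool(enabled) and match and not adab_pending
```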
The scratchpad register SPREG configures the address and size of the scratchpad, a high-speed memory formed from SRAM in cache subsystem 130. The register definition is shown in Table B.7.
Table B.7: SPREG definition
Bit | Mnemonic | Explanation |
31-11 | SPBASE | Scratchpad base address: the upper 21 bits of the scratchpad start address. This value must be offset by 4 MB from the value of the MSP_BASE register. |
10-2 | Reserved | |
1-0 | SPSIZE | Scratchpad size: 00 -> 0K (vector processor with 4K data cache); 01 -> 2K (vector processor with 2K data cache); 10 -> 3K (vector processor with 1K data cache); 11 -> 4K (no vector processor data cache) |
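SPREG decoding follows directly from Table B.7. In this hypothetical helper, every SPSIZE encoding splits the same 4 KB between the scratchpad and the vector processor data cache.

```python
# SPSIZE encoding -> (scratchpad KB, vector processor data cache KB), from Table B.7.
SPSIZE_KB = {0b00: (0, 4), 0b01: (2, 2), 0b10: (3, 1), 0b11: (4, 0)}


def decode_spreg(value: int) -> dict:
    """Decode SPREG into scratchpad base address and the SRAM split."""
    scratch_kb, vdcache_kb = SPSIZE_KB[value & 0b11]
    return {
        "base": value & 0xFFFFF800,  # SPREG<31:11>: upper 21 address bits
        "scratchpad_kb": scratch_kb,
        "vp_dcache_kb": vdcache_kb,
    }
```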
User extended registers are used mainly for synchronization between processors 110 and 120. Each user extended register currently has only one defined bit, mapped to bit 30, so an instruction such as "MFER R15, UERx" returns that bit value in the Z flag. Bits UERx<31> and UERx<29:0> always read as 0. The user extended registers are described in Table B.8.
Table B.8: User extended registers
Number | Mnemonic | Explanation |
UER0 | VPSTATE | Vector register status flag. When set, |
UER1 | VASYNC | Vector and ARM7 synchronization flag. |
Table B.9 shows the state of the extended registers at power-on reset.
Table B.9: Extended register power-on reset state
Register | Reset state |
CTR | 0 |
PVR | TBD |
VIMSK | 0 |
AIABR | AIABR<0> = 0, other bits undefined |
ADABR | ADABR<0> = 0, other bits undefined |
STR | 0 |
VPSTATE | VPSTATE<30> = 0, other bits undefined |
VASYNC | VASYNC<30> = 0, other bits undefined |
Appendix C
The architectural state of vector processor 120 comprises 32 32-bit scalar registers; two banks of 32 288-bit vector registers; a pair of 576-bit vector accumulator registers; and a set of 32 special-purpose registers. The scalar, vector and accumulator registers are intended for general-purpose programming and support many different data types.
The following notation is used in this and later sections: VR denotes a vector register; VRi denotes the i-th vector register (zero-based); VR[i] denotes the i-th data element in vector register VR; VR<a:b> denotes bits a through b of vector register VR; and VR[i]<a:b> denotes bits a through b of the i-th data element of VR.
Besides its data type, vector data has an additional dimension, the data element length, which determines the number of elements in a vector register. Because vector registers have a fixed size, the number of data elements they hold depends on the element length. The MSP architecture defines the five element lengths shown in Table C.1.
Table C.1: Data element lengths
Length name | Length (bits) |
Boolean | 1 |
Byte | 8 |
Byte 9 | 9 |
Halfword | 16 |
Word | 32 |
The MSP architecture interprets vector data according to the data type and length specified in the instruction. Most arithmetic instructions support the two's complement (integer) format for byte, byte 9, halfword and word element lengths. In addition, for most arithmetic instructions, the word element length supports the IEEE 754 single-precision format.
A programmer may interpret the data in any desired way, as long as the instruction sequence produces meaningful results. For example, a programmer is free to store 8-bit unsigned numbers in byte 9 elements, or equally free to store 8-bit unsigned data in byte elements and operate on them with the provided two's complement arithmetic instructions, as long as the program can handle "false" overflow results.
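The "false" overflow mentioned above is easy to reproduce. The hypothetical helper below adds two bit patterns as two's complement numbers of a given width: the unsigned byte sum 100 + 50 = 150 is correct as an unsigned value but overflows as an 8-bit signed value, while the same sum fits in a 9-bit byte 9 element.

```python
def add_twos_complement(a: int, b: int, bits: int) -> tuple[int, bool]:
    """Add two `bits`-wide patterns as two's complement values.

    Returns the wrapped result pattern and whether signed overflow occurred.
    """
    mask = (1 << bits) - 1

    def to_signed(x: int) -> int:
        x &= mask
        return x - (1 << bits) if x >> (bits - 1) else x

    total = to_signed(a) + to_signed(b)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return total & mask, not lo <= total <= hi
```

Here `add_twos_complement(100, 50, 8)` reports overflow although the unsigned pattern 150 is correct, whereas `add_twos_complement(100, 50, 9)` does not.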
There are 32 scalar registers, SR0 to SR31. Scalar registers are 32 bits wide and can hold a data element of any of the defined lengths. Scalar register SR0 is special: it always reads as 32 zero bits, and writes to SR0 are ignored. Byte, byte 9 and halfword data are stored in the least significant bits of a scalar register, and the remaining most significant bits have undefined values.
Because registers carry no data type indicator, the programmer must know the data type used by each instruction. This differs from other architectures, in which a 32-bit register always contains a 32-bit value. The MSP architecture specifies that an operation on a given data type correctly modifies only the bits defined for that data type. For example, a byte 9 add modifies only the low 9 bits of the 32-bit destination scalar register; the values of the upper 23 bits are undefined unless the instruction specifies otherwise.
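Since the upper bits of a scalar register are undefined after a byte, byte 9 or halfword operation, software that needs the value must mask and sign-extend the low bits itself. A minimal sketch, assuming the two's complement interpretation; `read_element` is hypothetical:

```python
def read_element(sr_value: int, bits: int) -> int:
    """Return the signed element held in the low `bits` bits of a 32-bit scalar register.

    The upper 32 - bits bits are undefined after a narrow operation, so they are
    masked off rather than trusted.
    """
    low = sr_value & ((1 << bits) - 1)
    return low - (1 << bits) if low >> (bits - 1) else low
```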
The 64 vector registers are organized into two banks of 32 registers each. Bank 0 contains the first 32 registers and bank 1 the next 32. One bank is designated the current bank and the other the alternate bank. All vector instructions use registers in the current bank by default, except the load/store and register transfer instructions, which can also access vector registers in the alternate bank. The CBANK bit in the vector control and status register VCSR selects bank 0 or bank 1 as the current bank (the other becomes the alternate bank). Vector registers in the current bank are designated VR0 to VR31, and those in the alternate bank VRA0 to VRA31.
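In VEC32 mode the names VRn and VRAn resolve to physical registers through CBANK. The sketch numbers the 64 registers 0-63 with bank 0 first, as described above; the helper names are hypothetical, and VEC64 mode (in which CBANK is ignored) is not modelled.

```python
def physical_bank(cbank: int, alternate: bool) -> int:
    """Bank (0 or 1) named by VRn (alternate=False) or VRAn (alternate=True)."""
    return (cbank ^ int(alternate)) & 1


def physical_index(cbank: int, n: int, alternate: bool = False) -> int:
    """Index 0..63 of VRn/VRAn, with bank 0 holding registers 0..31."""
    if not 0 <= n <= 31:
        raise ValueError("register number must be in 0..31")
    return physical_bank(cbank, alternate) * 32 + n
```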
...
VRi<575:0> = VR1i<287:0> : VR0i<287:0>
Here VR0i and VR1i denote vector register VRi in bank 0 and bank 1, respectively.
These double-width vector registers are called VR0 to VR31.
A vector register can hold multiple elements of byte, byte 9, halfword or word length, as shown in Table C.2.
Table C.2: Number of elements per vector register
Element length name | Element length (bits) | Maximum number of elements | Total number of bits used |
Byte 9 | 9 | 32 | 288 |
Byte | 8 | 32 | 256 |
Halfword | 16 | 16 | 256 |
Word | 32 | 8 | 256 |
Mixing element lengths within one register is not supported. Except for byte 9 elements, only 256 of the 288 bits are used; in particular, every ninth bit is unused. The 32 unused bits in byte, halfword and word lengths are reserved, and programmers should make no assumptions about their values.
The vector accumulator registers provide storage for intermediate results with higher precision than the result in the destination register. The vector accumulator consists of four 288-bit registers: VAC1H, VAC1L, VAC0H and VAC0L. The VAC0H:VAC0L pair is used by default by three instructions. Only in VEC64 mode is the VAC1H:VAC1L pair used, to emulate 64-element byte 9 vector operations. Even when bank 1 is set as the current bank in VEC32 mode, the VAC0H:VAC0L pair is still used.
To produce extended-precision results with the same number of elements as the source vector registers, each extended-precision element is held across a pair of registers, as shown in Table C.3.
Table C.3: Vector accumulator format
Element length | Logical view | VAC format |
Byte 9 | VAC[i]<17:0> | VAC0H[i]<8:0>:VAC0L[i]<8:0> for i = 0..31 and VAC1H[i-32]<8:0>:VAC1L[i-32]<8:0> for i = 32..63 |
Byte | VAC[i]<15:0> | VAC0H[i]<7:0>:VAC0L[i]<7:0> for i = 0..31 and VAC1H[i-32]<7:0>:VAC1L[i-32]<7:0> for i = 32..63 |
Halfword | VAC[i]<31:0> | VAC0H[i]<15:0>:VAC0L[i]<15:0> for i = 0..15 and VAC1H[i-16]<15:0>:VAC1L[i-16]<15:0> for i = 16..31 |
Word | VAC[i]<63:0> | VAC0H[i]<31:0>:VAC0L[i]<31:0> for i = 0..7 and VAC1H[i-8]<31:0>:VAC1L[i-8]<31:0> for i = 8..15 |
The VAC1H:VAC1L pair is used only in VEC64 mode, in which case the number of elements is 64 for byte 9 (and byte), 32 for halfword and 16 for word.
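Table C.3 amounts to an index split across the two register pairs plus a high/low split of each double-width element. A sketch with hypothetical names, using the per-register element counts from the table:

```python
ELEMENTS_PER_REGISTER = {"byte9": 32, "byte": 32, "halfword": 16, "word": 8}
ELEMENT_BITS = {"byte9": 9, "byte": 8, "halfword": 16, "word": 32}


def vac_location(kind: str, i: int) -> tuple[str, str, int]:
    """Return (high register, low register, element index) that hold VAC[i]."""
    per_reg = ELEMENTS_PER_REGISTER[kind]
    if not 0 <= i < 2 * per_reg:
        raise ValueError("element index out of range")
    pair = i // per_reg  # pair 1 (VAC1H:VAC1L) is used only in VEC64 mode
    return f"VAC{pair}H", f"VAC{pair}L", i % per_reg


def split_vac(kind: str, value: int) -> tuple[int, int]:
    """Split a double-width accumulator element into its (high, low) halves."""
    n = ELEMENT_BITS[kind]
    value &= (1 << 2 * n) - 1  # the 2n-bit logical view, e.g. VAC[i]<17:0> for byte 9
    return value >> n, value & ((1 << n) - 1)
```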
There are 33 special-purpose registers, which cannot be loaded directly from memory or stored directly to memory. Sixteen of them, called RASR0 to RASR15, form an internal subroutine return address stack used by the call and return instructions. The other 17 32-bit special-purpose registers are shown in Table C.4.
Table C.4: Special-purpose registers
Number | Mnemonic | Explanation |
SP0 | VCSR | Vector control and status register |
SP1 | VPC | Vector program counter |
SP2 | VEPC | Vector exception program counter |
SP3 | VISRC | Vector interrupt source register |
SP4 | VIINS | Vector interrupt instruction register |
SP5 | VCR1 | Vector count register 1 |
SP6 | VCR2 | Vector count register 2 |
SP7 | VCR3 | Vector count register 3 |
SP8 | VGMR0 | Vector global mask register 0 |
SP9 | VGMR1 | Vector global mask register 1 |
SP10 | VOR0 | Vector overflow register 0 |
SP11 | VOR1 | Vector overflow register 1 |
SP12 | VLABR | Vector instruction address breakpoint register |
SP13 | VDABR | Vector data address breakpoint register |
SP14 | VMMR0 | Vector move mask register 0 |
SP15 | VMMR1 | Vector move mask register 1 |
SP16 | VASYNC | Vector and ARM7 synchronization register |
The vector control and status register VCSR is defined in Table C.5.
Table C.5: VCSR definition
Bit | Mnemonic | Explanation |
31:18 | Reserved | |
17:13 | VSP<4:0> | Return address stack pointer. VSP is used by the subroutine call and return instructions to keep track of the internal return address stack. The return address stack has only 16 entries; VSP<4> is used to detect a stack overflow condition. |
12 | SO | Summary overflow status flag. Set when the result of an arithmetic operation overflows. Once set, this bit remains unchanged until it is cleared by writing 0 to it. |
11 | GT | Greater-than status flag. Set by the VSUBS instruction when SRa > SRb. |
10 | EQ | Equal status flag. Set by the VSUBS instruction when SRa = SRb. |
9 | LT | Less-than status flag. Set by the VSUBS instruction when SRa < SRb. |
8 | SMM | Select move mask. When set, VMMR0/1 becomes the element mask for arithmetic operations. |
7 | CEM | Complement element mask. When set, the element mask is defined as the 1's complement of VGMR0/1 or VMMR0/1, whichever is configured as the element mask for arithmetic operations. This bit does not change the contents of VGMR0/1 or VMMR0/1; it changes only how these registers are used. The SMM:CEM encodings specify: 00 - VGMR0/1 is the element mask for all instructions except VCMOVM; 01 - the complement of VGMR0/1 is the element mask for all instructions except VCMOVM; 10 - VMMR0/1 is the element mask for all instructions except VCMOVM; 11 - the complement of VMMR0/1 is the element mask for all instructions except VCMOVM. ... |
6 | OED | Overflow exception disable. Used together with ISAT; see the OED:ISAT encodings under ISAT. |
5 | ISAT | Integer saturation mode. The OED:ISAT bit combinations are defined as: 00 - no saturation; an overflow exception is reported when overflow occurs. x1 - saturation; overflow does not cause an exception. 10 - no saturation; an overflow exception is not reported when overflow occurs. |
4:3 | RMODE | Rounding mode for IEEE 754 floating-point operations: 00 - round toward negative infinity; 01 - round toward zero; 10 - round to nearest; 11 - round toward positive infinity. |
2 | FSAT | Floating-point saturation mode bit (IEEE fast mode). |
1 | CBANK | Current bank bit. When set, bank 1 is the current bank; when cleared, bank 0 is the current bank. CBANK is ignored when the VEC64 bit is set. |
0 | VEC64 | 64 byte 9 vector mode bit. When set, specifies that the vector registers and ... |
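The VCSR fields, together with the SMM:CEM and OED:ISAT encodings, can be decoded as follows. This is a hypothetical helper; the complemented-mask reading of CEM follows the description above, and the truncated VEC64 definition is used only to ignore CBANK.

```python
def decode_vcsr(v: int) -> dict:
    """Decode the VCSR fields defined in Table C.5."""
    f = {
        "VSP": (v >> 13) & 0x1F,
        "SO": (v >> 12) & 1, "GT": (v >> 11) & 1, "EQ": (v >> 10) & 1,
        "LT": (v >> 9) & 1, "SMM": (v >> 8) & 1, "CEM": (v >> 7) & 1,
        "OED": (v >> 6) & 1, "ISAT": (v >> 5) & 1,
        "RMODE": ("-inf", "zero", "nearest", "+inf")[(v >> 3) & 0b11],
        "FSAT": (v >> 2) & 1, "CBANK": (v >> 1) & 1, "VEC64": v & 1,
    }
    # Element mask for all instructions except VCMOVM (SMM:CEM encoding).
    f["mask"] = ("~" if f["CEM"] else "") + ("VMMR0/1" if f["SMM"] else "VGMR0/1")
    # Integer overflow policy (OED:ISAT encoding); x1 means saturate.
    if f["ISAT"]:
        f["overflow"] = "saturate"
    else:
        f["overflow"] = "wrap, no exception" if f["OED"] else "wrap, report exception"
    # CBANK is ignored in VEC64 mode.
    f["current_bank"] = None if f["VEC64"] else f["CBANK"]
    return f
```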
The vector program counter register VPC holds the address of the next instruction to be executed by vector processor 120. The ARM7 processor 110 should load VPC before issuing the STARTVP instruction to start vector processor 120.
The vector exception program counter VEPC indicates the address of the most recent instruction that most likely caused an exception. MSP100 does not support precise exceptions, hence the phrase "most likely".
The vector interrupt source register VISRC indicates the interrupt source to the ARM7 processor 110. The appropriate bit is set by hardware when an exception is detected. Software must clear VISRC before vector processor 120 resumes execution. Any bit set in VISRC causes vector processor 120 to enter the VP_IDLE state. If the corresponding interrupt enable bit in VIMSK is set, an interrupt is sent to processor 110. Table C.6 defines the contents of VISRC.
Table C.6: VISRC definition
Bit | Mnemonic | Explanation |
31 | DAB | Data address breakpoint exception |
30 | LAB | Instruction address breakpoint exception |
29 | SSTP | Single-step exception |
28-18 | Reserved | |
17 | IIA | Invalid instruction address exception |
16 | IINS | Invalid instruction exception |
15 | IDA | Invalid data address exception |
14 | UDA | Unaligned data access exception |
13 | FOV | Floating-point overflow exception |
12 | FINV | Invalid floating-point operand exception |
11 | FDIV | Floating-point divide-by-zero exception |
10 | IOV | Integer overflow exception |
9 | IDIV | Integer divide-by-zero exception |
8 | RASO | Return address stack overflow exception |
7 | RASU | Return address stack underflow exception |
6 | VIP | VCINT exception pending; cleared by executing the STARTVP instruction |
5 | VJP | VCJOIN exception pending; cleared by executing the STARTVP instruction |
4-0 | VPEV | Vector processor exception vector |
Vector interrupt instruction register VIINS: updated with the VCINT or VCJOIN instruction when that instruction is executed to interrupt ARM7 processor 110.
Vector count registers VCR1, VCR2 and VCR3: used by the "decrement and branch" instructions VD1CBR, VD2CBR and VD3CBR, and initialized with the number of loop iterations to execute. When the VD1CBR instruction is executed, register VCR1 is decremented by 1. If the count value is nonzero and the condition specified in the instruction matches VFLAG, the branch is taken; otherwise, the branch is not taken. Register VCR1 is decremented by 1 in either case. Registers VCR2 and VCR3 are used in the same way.
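The decrement-and-branch behavior can be modeled in a few lines of C. This is a hedged sketch, not the hardware definition: the text does not say whether the zero test uses the count before or after the decrement, so the sketch assumes after (a count of n then runs a bottom-tested loop body n times), and the `cond_met` flag stands in for "the instruction's condition matches VFLAG". `vd1cbr` and `loop_iterations` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of VD1CBR. ASSUMPTION: the nonzero test is applied to the count
   after it has been decremented. Returns true if the branch is taken. */
static bool vd1cbr(uint32_t *vcr1, bool cond_met)
{
    *vcr1 -= 1u;                        /* decremented whether or not we branch */
    return (*vcr1 != 0u) && cond_met;
}

/* How many times a loop body runs when VD1CBR (condition always true)
   closes a loop whose count register starts at n (n >= 1). */
static unsigned loop_iterations(uint32_t n)
{
    uint32_t vcr1 = n;
    unsigned runs = 0;
    do {
        runs++;                         /* loop body */
    } while (vd1cbr(&vcr1, true));
    return runs;
}
```

Under this assumption `loop_iterations(3)` returns 3; if the hardware instead tested the count before decrementing, the same loop would run one extra time.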
Vector global mask register VGMR0: indicates the elements of the destination vector register affected in VEC32 mode, and the elements within VR<287:0> affected in VEC64 mode. Each bit in VGMR0 controls the update of 9 bits of the destination vector register. Specifically, VGMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR0d<9i+8:9i> in VEC64 mode. Note that VR0d refers to the destination register in bank 0 in VEC64 mode, while VRd refers to the destination register in the current bank, which in VEC32 mode may be either bank 0 or bank 1. Register VGMR0 affects the execution of all instructions except VCMOVM.
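The rule "VGMR0<i> controls VRd<9i+8:9i>" amounts to a per-slice write enable. The sketch below is a hypothetical C model, not the hardware: a 288-bit register is stored as 32 nine-bit slices in `uint16_t` slots, and `vreg288` and `masked_write` are illustrative names.

```c
#include <stdint.h>

#define SLICES 32  /* 288 bits = 32 slices of 9 bits */

/* Hypothetical model: slice i holds register bits <9i+8:9i>. */
typedef struct { uint16_t slice[SLICES]; } vreg288;

/* Write `result` into `dst`, updating slice i only when VGMR0<i> is set. */
static void masked_write(vreg288 *dst, const vreg288 *result, uint32_t vgmr0)
{
    for (int i = 0; i < SLICES; i++) {
        if ((vgmr0 >> i) & 1u)
            dst->slice[i] = result->slice[i] & 0x1FFu;  /* keep 9 bits */
    }
}
```

With a mask of 0x80000001, only slices 0 and 31 (bits <8:0> and <287:279>) change; every other slice keeps its old value.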
Vector global mask register VGMR1: indicates the elements within VR<575:288> affected in VEC64 mode. Each bit in register VGMR1 controls the update of 9 bits of the destination vector register in bank 1. Specifically, VGMR1<i> controls the update of VR1d<9i+8:9i>. Register VGMR1 is not used in VEC32 mode; in VEC64 mode it affects the execution of all instructions except VCMOVM.
Vector overflow register VOR0: indicates the elements in VEC32 mode, and the elements within VR<287:0> in VEC64 mode, that contain an overflow result after a vector arithmetic operation. The register is not modified by scalar arithmetic. A set bit VOR0<i> indicates that byte or byte9 element i, halfword element (i idiv 2), or word element (i idiv 4) of the operation's data type contains an overflow result. For example, bits 1 and 3 may be set to indicate overflow in the first halfword element and the first word element, respectively. The bit mapping in VOR0 differs from the bit mapping in VGMR0 or VGMR1.
Vector overflow register VOR1: indicates the elements within VR<575:288> in VEC64 mode that contain an overflow result after a vector arithmetic operation. Register VOR1 is not used in VEC32 mode and is not modified by scalar arithmetic. A set bit VOR1<i> indicates that byte or byte9 element i, halfword element (i idiv 2), or word element (i idiv 4) of the operation's data type contains an overflow result. For example, bits 1 and 3 may be set to indicate overflow in the first halfword element and the first word element within VR<575:288>, respectively. The bit mapping in VOR1 differs from the bit mapping in VGMR0 or VGMR1.
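The "i idiv 2" and "i idiv 4" rules can be read in both directions: from a VOR bit to the element it reports, and from an element to the group of VOR bits that may report it. The following is a small C sketch; treating any set bit in an element's group as an overflow is an assumption (the text only says a given bit "may" be set), and the function names are hypothetical.

```c
#include <stdint.h>

/* slots_per_elem: 1 = byte/byte9, 2 = halfword, 4 = word. */
static unsigned vor_element_of_bit(unsigned bit, unsigned slots_per_elem)
{
    return bit / slots_per_elem;        /* i, i idiv 2 or i idiv 4 */
}

/* ASSUMPTION: element e overflowed if any VOR bit in its group is set. */
static int vor_element_overflowed(uint32_t vor, unsigned e, unsigned slots_per_elem)
{
    uint32_t group = (slots_per_elem == 4) ? 0xFu
                   : (slots_per_elem == 2) ? 0x3u : 0x1u;
    return ((vor >> (e * slots_per_elem)) & group) != 0;
}
```

This matches the example in the text: bit 1 maps to halfword element 0 and bit 3 maps to word element 0.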
Vector instruction address breakpoint register VLABR: aids in debugging vector programs. The register is defined in Table C.7.
Table C.7: VLABR Definition
Position | Mnemonic | Explanation |
31-2 | IADR | Vector instruction address. Undefined at reset. |
1 | Reserved | |
0 | IABE | Instruction address breakpoint enable. Undefined at reset. If set, a "vector instruction address breakpoint" exception occurs when a vector instruction access address matches VLABR<31:2>, and bit VISRC<IAB> is set to indicate the exception. The exception is reported before the instruction executes. |
Vector data address breakpoint register VDABR: aids in debugging vector programs. The register is defined in Table C.8.
Table C.8: VDABR Definition
Position | Mnemonic | Explanation |
31-2 | DADR | Vector data address. Undefined at reset. |
1 | SABE | Store address breakpoint enable. Undefined at reset. If set, a "vector data address breakpoint" exception occurs when a vector store access address matches VDABR<31:2>, and bit VISRC<DAB> is set to indicate the exception. The exception is reported before the store instruction executes. |
0 | LABE | Load address breakpoint enable. Cleared on reset. If set, a "vector data address breakpoint" exception occurs when a vector load access address matches VDABR<31:2>, and bit VISRC<DAB> is set to indicate the exception. The exception is reported before the load instruction executes. |
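Both breakpoint registers compare an access address against bits <31:2> and gate the match with enable bits. The following is a hedged C sketch of that check, assuming address bits 1:0 are ignored in the comparison (the registers store only <31:2>); `iab_hit` and `dab_hit` are hypothetical names, and the exception side effects (setting VISRC<IAB> or VISRC<DAB>) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_MASK 0xFFFFFFFCu  /* bits <31:2> */

/* VLABR: IADR in <31:2>, IABE in bit 0. */
static bool iab_hit(uint32_t vlabr, uint32_t fetch_addr)
{
    return (vlabr & 1u) && ((fetch_addr & ADDR_MASK) == (vlabr & ADDR_MASK));
}

/* VDABR: DADR in <31:2>, SABE in bit 1 (stores), LABE in bit 0 (loads). */
static bool dab_hit(uint32_t vdabr, uint32_t data_addr, bool is_store)
{
    bool enabled = is_store ? ((vdabr >> 1) & 1u) : (vdabr & 1u);
    return enabled && ((data_addr & ADDR_MASK) == (vdabr & ADDR_MASK));
}
```

For example, VDABR = 0x2002 (SABE set, LABE clear) triggers on a store to 0x2000 but not on a load from the same address.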
Vector move mask register VMMR0: always used by the VCMOVM instruction, and used by all instructions when VCSR<SMM> = 1. Register VMMR0 indicates the elements of the destination vector register affected in VEC32 mode, and the elements within VR<287:0> affected in VEC64 mode. Each bit in VMMR0 controls the update of 9 bits of the destination vector register. Specifically, VMMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR0d<9i+8:9i> in VEC64 mode. In VEC64 mode, VR0d refers to the destination register in bank 0; VRd refers to the destination register in the current bank, which in VEC32 mode may be in bank 0 or bank 1.
Vector move mask register VMMR1: always used by the VCMOVM instruction, and used by all instructions when VCSR<SMM> = 1. Register VMMR1 indicates the elements within VR<575:288> affected in VEC64 mode. Each bit in VMMR1 controls the update of 9 bits of the destination vector register in bank 1. Specifically, VMMR1<i> controls the update of VR1d<9i+8:9i>. Register VMMR1 is not used in VEC32 mode.
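Taken together, the VGMR and VMMR descriptions define which mask register governs an instruction. A minimal C sketch of that selection rule follows; `select_mask` is a hypothetical helper, and it treats the VCSR<SMM> bit as an input because its position is not given in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* VCMOVM always uses VMMR; other instructions use VMMR when VCSR<SMM> = 1
   and VGMR otherwise. Applies per bank (VGMR0/VMMR0 or VGMR1/VMMR1). */
static uint32_t select_mask(bool is_vcmovm, bool smm, uint32_t vgmr, uint32_t vmmr)
{
    return (is_vcmovm || smm) ? vmmr : vgmr;
}
```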
Vector and ARM7 synchronization register VASYNC: provides producer/consumer-type synchronization between processors 110 and 120. Currently only bit 30 is defined. ARM7 processor 110 can access register VASYNC using the MFER, MTER and TESTSET instructions while vector processor 120 is in the VP_RUN or VP_IDLE state. ARM7 processor 110 cannot access register VASYNC through the TVP or MFVP instructions, because these instructions cannot access special registers beyond the first 16 vector processor special registers. Vector processor instructions access register VASYNC through the VMOV instruction.
Table C.9 shows the power-on reset state of the vector processor.
Table C.9: Vector processor power-on reset state
Register | Reset state |
SR0 | 0 |
All other registers | Undefined |
ARM7 processor 110 initializes the special registers before the vector processor can execute instructions.
Appendix D
Each instruction implies or requires data types for its source and destination operands. Some instructions have semantics that apply equally to more than one data type. Some instructions have semantics that take one data type for the source and produce a different data type for the result. This appendix describes the data types supported by the exemplary embodiment. Table 1 of this application describes the supported data types int8, int9, int16, int32 and float. Unsigned integer formats are not supported; an unsigned integer value must first be converted to two's complement format before use. The programmer is free to use the arithmetic instructions with unsigned integers or with any other format, as long as overflow is handled properly. The architecture defines overflow only for two's complement integers and for 32-bit floating-point data. It does not perform the detection needed for unsigned overflow in 8-, 9-, 16- or 32-bit operations. Table D.1 shows the data lengths supported by load operations.
Table D.1: Data lengths supported by load operations
Memory data length | Register data length | Load operation |
8-bit | 9-bit | Load 8 bits, sign-extend to 9 bits (for loading 8-bit two's complement) |
8-bit | 9-bit | Load 8 bits, zero-extend to 9 bits (for loading 8-bit unsigned) |
16-bit | 16-bit | Load 16 bits (for loading 16-bit unsigned or two's complement) |
32-bit | 32-bit | Load 32 bits (for loading 32-bit unsigned, 32-bit two's complement integer or 32-bit floating point) |
The architecture requires memory addresses to be aligned according to the data type: bytes have no alignment requirement, halfwords must be aligned on halfword boundaries, and words must be aligned on word boundaries.
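The two 8-bit-to-9-bit load rows of Table D.1 and the alignment rule translate directly into bit operations. The C sketch below is illustrative only: it holds a 9-bit register element in the low bits of a `uint16_t`, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit memory byte -> 9-bit element, sign-extended (8-bit two's complement). */
static uint16_t load8_sign_extend_to_9(uint8_t b)
{
    return (b & 0x80u) ? (uint16_t)(b | 0x100u) : b;
}

/* 8-bit memory byte -> 9-bit element, zero-extended (8-bit unsigned). */
static uint16_t load8_zero_extend_to_9(uint8_t b)
{
    return b;  /* bit 8 stays 0 */
}

/* Alignment: size is 1 (byte), 2 (halfword) or 4 (word). */
static bool is_aligned(uint32_t addr, unsigned size)
{
    return (addr & (size - 1u)) == 0u;
}
```

The same memory byte 0xFF thus becomes 0x1FF (-1) with sign extension or 0x0FF (255) with zero extension.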
Table D.2 shows the data lengths supported by store operations.
Table D.2: Data lengths supported by store operations
Register data length | Memory data length | Store operation |
8-bit | 8-bit | Store 8 bits (for storing 8-bit unsigned or two's complement) |
9-bit | 8-bit | Truncate to the lower 8 bits and store 8 bits (for storing a 9-bit two's complement value, whether it represents a signed value or an unsigned value from 0 to 255) |
16-bit | 16-bit | Store 16 bits (for storing 16-bit unsigned or two's complement) |
32-bit | 32-bit | Store 32 bits |
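The byte9-to-byte store simply drops bit 8. A short C sketch (hypothetical function name, 9-bit element held in a `uint16_t`) shows why this works for both interpretations: -1 (0x1FF) and 255 (0x0FF) both store as the byte 0xFF.

```c
#include <stdint.h>

/* 9-bit register element (low 9 bits of v) -> 8-bit memory byte. */
static uint8_t store9_to_8(uint16_t v)
{
    return (uint8_t)(v & 0xFFu);  /* discard bit 8 */
}
```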
Because more than one data type can be mapped to a scalar or vector register, some bits of the destination register may have undefined results for some data types. In fact, except for byte9 data length operations with a vector destination register and word data length operations with a scalar destination register, some bits in the destination register have values that are not defined by the operation. The architecture requires the values of these bits to be undefined. Table D.3 shows the undefined bits for each data length.
Table D.3: Undefined bits by data length
Data length | Vector destination register | Scalar destination register |
Byte | VR<9i+8>, for i = 0 to 31 | SR<31:8> |
Byte9 | none | SR<31:9> |
Halfword | VR<9i+8>, for i = 0 to 31 | SR<31:16> |
Word | VR<9i+8>, for i = 0 to 31 | none |
When programming, the programmer must know the data types of the source and destination registers or memory. Converting from one data type to another with a different element length can result in a different number of elements being stored in a vector register. For example, converting a vector register from halfword to word data type requires two vector registers to hold the same number of converted elements. Conversely, converting a vector register from word format to halfword format produces the same number of elements in one half of the vector register, with the remaining bits in the other half. In both cases, the data type conversion produces an arrangement of converted elements whose length differs from that of the source elements.
Special instructions described in Appendix E, such as VSHFLL and VUNSHFLL, simplify converting a vector of one data length into a vector of another data length. The basic steps for converting two's complement data in vector register VRa from a smaller element length (e.g., int8) to a larger element length (e.g., int16) are: ...
The basic steps for converting two's complement numbers in vector register VRa from a larger element length (int16) to a smaller element length (int8) are:
1. Verify that each int16 element can be represented in byte length. If necessary, saturate the elements at either end of the range so that they fit the smaller length.
2. Unshuffle the elements of VRa with another vector VRb into two vectors VRc:VRd. The upper half of each element of VRa:VRb goes to VRc and the lower half goes to VRd, so the lower half of VRd effectively collects the lower halves of all elements of VRa.
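The two narrowing steps can be sketched in C as follows. This is a hypothetical model, not the VUNSHFL definition: it uses VEC32 sizes (16 halfwords, 32 bytes), ignores the ninth bit of each byte slot, and assumes the low byte of each halfword comes first, so that "even bytes" are the low halves.

```c
#include <stdint.h>

#define NHALF 16  /* halfword elements per VEC32 register */
#define NBYTE 32  /* byte elements per VEC32 register     */

/* Step 1: saturate an int16 element into the int8 range. */
static int16_t sat_to_int8_range(int16_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return x;
}

/* Step 2: unshuffle VRa:VRb, viewed as 64 bytes, into VRc (odd bytes,
   upper halves) and VRd (even bytes, lower halves). */
static void narrow16_to_8(const int16_t vra[NHALF], const int16_t vrb[NHALF],
                          int8_t vrc[NBYTE], int8_t vrd[NBYTE])
{
    for (int i = 0; i < 2 * NHALF; i++) {
        int16_t h = (i < NHALF) ? vra[i] : vrb[i - NHALF];
        uint16_t u = (uint16_t)sat_to_int8_range(h);
        vrd[i] = (int8_t)(u & 0xFFu);  /* lower half -> VRd */
        vrc[i] = (int8_t)(u >> 8);     /* upper half -> VRc */
    }
}
```

After the call, `vrd[0..15]` holds the saturated int8 values of VRa, and VRc holds only sign bytes (0x00 or 0xFF), which the program can discard.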
Special instructions are provided for the following data type conversions: int32 to single-precision floating point; single-precision floating point to fixed point (X.Y notation); single-precision floating point to int32; int8 to int9; int9 to int16; and int16 to int9.
To provide flexibility in vector programming, most vector instructions use an element mask so that they operate only on selected elements within a vector register. The vector global mask registers VGMR0 and VGMR1 identify the elements of the destination register and vector accumulator that are modified by vector instructions. For byte and byte9 data length operations, each of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on; a set bit VGMR0<i> indicates that byte-length element i is affected, where i is 0 to 31. For halfword data length operations, each pair of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on; set bits VGMR0<2i:2i+1> indicate that element i is affected, where i is 0 to 15. If only one bit of a pair in VGMR0 is set for a halfword data length operation, only the byte corresponding to that bit is modified. For word data length operations, each group of four bits in VGMR0 (or VGMR1) identifies one element to be operated on; set bits VGMR0<4i:4i+3> indicate that element i is affected, where i is 0 to 7. If not all four bits of a group in VGMR0 are set for a word data length operation, only the bytes corresponding to the set bits are modified.
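The per-element mask rule, including the partial-update case, can be modeled as "extract the element's group of mask bits, then use each bit as a byte enable". The following is a hedged C sketch: it uses 8-bit bytes within a 32-bit word element (the ninth bit of each byte slot is ignored), and `element_byte_enables` and `masked_word_update` are hypothetical names.

```c
#include <stdint.h>

/* Mask bits granted to element i when each element spans `slots` byte
   slots (1 = byte/byte9, 2 = halfword, 4 = word). */
static unsigned element_byte_enables(uint32_t vgmr, unsigned i, unsigned slots)
{
    unsigned all = (1u << slots) - 1u;
    return (vgmr >> (i * slots)) & all;
}

/* Apply the enables to a 32-bit word element: only enabled bytes change. */
static uint32_t masked_word_update(uint32_t old, uint32_t result, unsigned enables)
{
    uint32_t keep = 0;
    for (unsigned b = 0; b < 4; b++)
        if (enables & (1u << b))
            keep |= 0xFFu << (8 * b);
    return (result & keep) | (old & ~keep);
}
```

For example, if only bit 0 of word element 0's group is set, `masked_word_update(0x11223344, 0xAABBCCDD, 0x1)` changes only the low byte and yields 0x112233DD.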
For flexibility in vector programming, most MSP instructions support three forms of vector and scalar operation:
1. vector = vector op vector
2. vector = vector op scalar
3. scalar = scalar op scalar
In case 2, where a scalar register is specified as the B operand, the single element in the scalar register is replicated as many times as needed to match the number of elements in the vector operand. The replicated elements have the same value as the element of the specified scalar operand. The scalar operand may come from a scalar register or from the instruction itself as an immediate operand. For an immediate operand, if the data length specified by the data type is larger than the available immediate field length, appropriate sign extension is applied.
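Case 2 with an immediate operand combines two steps: sign-extend the immediate to the element length, then replicate it across all elements. The C sketch below illustrates this for halfword elements in VEC32 mode. The 5-bit immediate width used in the example is purely hypothetical (the field length is not given here), overflow handling is omitted, and `sign_extend` and `vadd_vs` are illustrative names.

```c
#include <stdint.h>

#define NUM_HALF 16  /* halfword elements in VEC32 mode */

/* Sign-extend the low `bits` bits (1..31) of an immediate field. */
static int32_t sign_extend(uint32_t field, unsigned bits)
{
    uint32_t m = 1u << (bits - 1);
    field &= (1u << bits) - 1u;
    return (int32_t)((field ^ m) - m);
}

/* vector = vector op scalar: the scalar B operand is used for every element. */
static void vadd_vs(const int16_t a[NUM_HALF], int16_t scalar_b, int16_t d[NUM_HALF])
{
    for (int i = 0; i < NUM_HALF; i++)
        d[i] = (int16_t)(a[i] + scalar_b);  /* wraps; overflow handling omitted */
}
```

For instance, a 5-bit field 0x1E sign-extends to -2, which is then added to every element of `a`.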
In many multimedia applications, particular attention is paid to the precision of the sources and the final result. For this reason, the integer multiply instructions produce "double-precision" intermediate results that can be stored in two vector registers.
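The double-precision idea is that an int16 × int16 product needs 32 bits, so the full product is kept as a high half and a low half in two registers. The following is a minimal C sketch; which register receives which half is an assumption, and `vmul_full` is a hypothetical name.

```c
#include <stdint.h>

/* Full-precision halfword multiply: each 32-bit product is split into
   an upper-half vector and a lower-half vector. */
static void vmul_full(const int16_t a[], const int16_t b[], int n,
                      int16_t hi[], uint16_t lo[])
{
    for (int i = 0; i < n; i++) {
        int32_t p = (int32_t)a[i] * b[i];  /* exact: always fits in 32 bits */
        hi[i] = (int16_t)(p >> 16);        /* upper 16 bits */
        lo[i] = (uint16_t)(p & 0xFFFF);    /* lower 16 bits */
    }
}
```

For example, -300 × 300 = -90000 = 0xFFFEA070, giving a high half of -2 (0xFFFE) and a low half of 0xA070.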
Typically, the MSP architecture supports 8-, 9-, 16- and 32-bit elements in two's complement integer format and 32-bit elements in IEEE 754 single-precision format. Overflow is defined as a result outside the range representable by the specified data type, that is, above its maximum positive value or below its maximum negative value. When an overflow occurs, the value written to the destination register is not a valid number. Underflow is defined only for floating-point operations.
Unless otherwise noted, all floating-point operations use one of the four rounding modes specified in bits VCSR<RMODE>. Some instructions use the well-known round-to-zero (round-even) rounding mode; these instructions are explicitly noted.
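The four VCSR<RMODE> encodings can be modeled as rounding a value to an integer. The following is a hedged C sketch, not the hardware algorithm: it assumes a finite input within the int32 range, implements round-to-nearest with ties to even (the IEEE 754 default), avoids `<math.h>` so it needs no libm, and uses hypothetical names.

```c
#include <stdint.h>

/* RMODE encodings from VCSR<4:3>. */
enum { RM_NEG_INF = 0, RM_ZERO = 1, RM_NEAREST = 2, RM_POS_INF = 3 };

/* Round x (assumed finite and within int32 range) to an integer. */
static int32_t round_rmode(double x, unsigned rmode)
{
    int64_t t = (int64_t)x;                    /* C conversion truncates toward zero */
    int64_t fl = (x < (double)t) ? t - 1 : t;  /* floor(x) */
    double frac = x - (double)fl;              /* in [0, 1) */
    switch (rmode) {
    case RM_NEG_INF: return (int32_t)fl;
    case RM_ZERO:    return (int32_t)t;
    case RM_POS_INF: return (int32_t)(frac > 0.0 ? fl + 1 : fl);
    case RM_NEAREST:
    default:
        if (frac > 0.5) return (int32_t)(fl + 1);
        if (frac < 0.5) return (int32_t)fl;
        return (int32_t)((fl % 2 == 0) ? fl : fl + 1);  /* tie: choose even */
    }
}
```

With x = -2.5, the four modes give -3 (toward negative infinity), -2 (toward zero), -2 (nearest, tie to even) and -2 (toward positive infinity).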
Saturation is an important feature in many multimedia applications. The MSP architecture supports saturation for all four integer data types and for floating-point operations. Bit ISAT in register VCSR specifies the integer saturation mode. The floating-point saturation mode, also called IEEE fast mode, is specified by bit FSAT in VCSR. When saturation mode is enabled, a result that exceeds the maximum positive or maximum negative value is set to the maximum positive or maximum negative value, respectively. In this case, no overflow occurs and the overflow bit is not set.
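The OED:ISAT combinations from the VCSR table and the saturation rule above can be sketched for one int16 addition. Three assumptions apply: the overflow (VOR) bit is still set in the non-reporting 10 mode, the non-saturated result is shown as the wrapped value (the text only says it is not a valid number), and `add16` and `add16_result` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int16_t value;
    bool overflow;         /* would set this element's VOR bit     */
    bool raise_exception;  /* would report the integer overflow exception */
} add16_result;

static add16_result add16(int16_t a, int16_t b, bool oed, bool isat)
{
    int32_t wide = (int32_t)a + b;
    add16_result r = { (int16_t)wide, false, false };  /* wrapped value */
    bool out_of_range = wide > INT16_MAX || wide < INT16_MIN;
    if (isat) {                           /* x1: saturate, never overflow */
        if (wide > INT16_MAX) r.value = INT16_MAX;
        if (wide < INT16_MIN) r.value = INT16_MIN;
    } else if (out_of_range) {            /* 00 or 10: no saturation */
        r.overflow = true;
        r.raise_exception = !oed;         /* 00 reports, 10 does not */
    }
    return r;
}
```

For example, 30000 + 10000 yields 32767 with no overflow when ISAT = 1, and flags an overflow (reported only when OED = 0) when ISAT = 0.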
Table D.4 lists the precise exceptions, which are detected and reported before the faulting instruction executes. Exception vector addresses are given in hexadecimal notation.
Table D.4: Precise exceptions
Exception vector | Explanation |
0x00000018 | Vector processor instruction address breakpoint exception |
0x00000018 | Vector processor data address breakpoint exception |
0x00000018 | Vector processor invalid instruction exception |
0x00000018 | Vector processor single-step exception |
0x00000018 | Vector processor return address stack overflow exception |
0x00000018 | Vector processor return address stack underflow exception |
0x00000018 | Vector processor VCINT exception |
0x00000018 | Vector processor VCJOIN exception |
Table D.5 lists the imprecise exceptions, which are detected and reported only after some instruction later in the program than the faulting instruction has executed.
Table D.5: Imprecise exceptions
Exception vector | Explanation |
0x00000018 | Vector processor invalid instruction address exception |
0x00000018 | Vector processor invalid data address exception |
0x00000018 | Vector processor unaligned data access exception |
0x00000018 | Vector processor integer overflow exception |
0x00000018 | Vector processor floating-point overflow exception |
0x00000018 | Vector processor floating-point invalid operand exception |
0x00000018 | Vector processor floating-point divide-by-zero exception |
0x00000018 | Vector processor integer divide-by-zero exception |
Appendix E
The vector processor instructions fall into the eleven categories summarized in Table E.1.
Table E.1: Summary of vector instruction classes
Category | Explanation |
Control flow | Instructions that control program flow, including branches and instructions that interface with ARM7. |
Logical (bitwise, masked) | Bitwise logical instructions. Although the data type is Boolean, logical instructions use the element mask to modify results, which requires a data type. |
Shift and rotate (element-wise, masked) | Instructions that shift and rotate the bits of each element. This class distinguishes element length and is affected by the element mask. |
Arithmetic (element-wise, masked) | Element-wise arithmetic instructions: the i-th element of the result is computed from the i-th elements of the sources. This class distinguishes element type and is affected by the element mask. |
Multimedia (element-wise, masked) | Instructions optimized for multimedia applications. This class distinguishes element type and is affected by the element mask. |
Data type conversion (element-wise, unmasked) | Instructions that convert elements from one data type to another. Instructions in this class support a specified set of data types and are not affected by the element mask, because the architecture does not support more than one data type in a register. |
Inter-element arithmetic | Instructions that take two elements from different positions of a vector to produce an arithmetic result. |
Inter-element transfer | Instructions that take two elements from different positions of a vector to rearrange elements. |
Load/store | Instructions that load or store registers. These instructions are not affected by the element mask. |
Cache operation | Instructions that control the instruction and data caches. These instructions are not affected by the element mask. |
Register transfer | Instructions that transfer data between two registers. These instructions are usually not affected by the element mask, but some can optionally use it. |
Table E.2 lists the flow control instructions.
Table E.2: Flow control instructions
Mnemonic | Explanation |
VCBR | Conditional branch |
VCBRI | Conditional branch indirect |
VD1CBR | Decrement VCR1 and conditional branch |
VD2CBR | Decrement VCR2 and conditional branch |
VD3CBR | Decrement VCR3 and conditional branch |
VCJSR | Conditional jump to subroutine |
VCJSRI | Conditional jump to subroutine indirect |
VCRSR | Conditional return from subroutine |
VCINT | Conditional interrupt of ARM7 |
VCJOIN | Conditional join with ARM7 |
VCCS | Conditional context switch |
VCBARR | Conditional barrier |
VCHGCR | Change control register (VCSR) |
The logical class supports the Boolean data type and is affected by the element mask. Table E.3 lists the logical instructions.
Table E.3: Logical instructions
Mnemonic | Explanation |
VNOT | NOT: ~B |
VAND | AND: (A & B) |
VCAND | Complement AND: (~A & B) |
VANDC | AND complement: (A & ~B) |
VNAND | NAND: ~(A & B) |
VOR | OR: (A | B) |
VCOR | Complement OR: (~A | B) |
VORC | OR complement: (A | ~B) |
VNOR | NOR: ~(A | B) |
VXOR | XOR: (A ^ B) |
VXNOR | XNOR: ~(A ^ B) |
Shift/rotate class instructions operate on the int8, int9, int16 and int32 data types (not on floating-point data types) and are affected by the element mask. Table E.4 lists the shift/rotate class instructions.
Table E.4: Shift/rotate class
Mnemonic | Explanation |
VDIV2N | Divide by a power of 2 |
VLSL | Logical shift left |
VLSR | Logical shift right |
VROL | Rotate left |
VROR | Rotate right |
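Because shift and rotate distinguish element length, a rotate on an int9 element wraps at bit 8 rather than at bit 31. A hedged C sketch of width-aware rotation follows; taking the rotate amount modulo the element width is an assumption, and `rotl_w` and `rotr_w` are hypothetical names.

```c
#include <stdint.h>

/* Rotate the low `width` bits (1..32) of x left by n. */
static uint32_t rotl_w(uint32_t x, unsigned n, unsigned width)
{
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    x &= mask;
    n %= width;                        /* ASSUMPTION: amount taken modulo width */
    if (n == 0)
        return x;
    return ((x << n) | (x >> (width - n))) & mask;
}

/* Rotate right expressed as the complementary left rotation. */
static uint32_t rotr_w(uint32_t x, unsigned n, unsigned width)
{
    n %= width;
    return rotl_w(x, (width - n) % width, width);
}
```

For example, rotating the int9 value 0x100 left by one gives 0x001, whereas a 32-bit rotate of the same value would give 0x200.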
In general, arithmetic class instructions support the int8, int9, int16, int32 and floating-point data types and are affected by the element mask. For restrictions on unsupported data types, see the detailed description of each instruction below. The VCMPV instruction is not affected by the element mask, because it operates on the element mask itself. Table E.5 lists the arithmetic class instructions.
Table E.5: Arithmetic class
Mnemonic | Explanation |
VASR | Arithmetic shift right |
VADD | Add |
VAVG | Average |
VSUB | Subtract |
VASUB | Subtract and take absolute value |
VMUL | Multiply |
VMULA | Multiply to accumulator |
VMULAF | Multiply to accumulator, fractional |
VMULF | Multiply fractional |
VMULFR | Multiply fractional and round |
VMULL | Multiply low |
VMAD | Multiply and add |
VMADL | Multiply and add low |
VADAC | Add and accumulate |
VADACL | Add and accumulate low |
VMAC | Multiply and accumulate |
VMACF | Multiply and accumulate fractional |
VMACL | Multiply and accumulate low |
VMAS | Multiply and subtract from accumulator |
VMASF | Multiply and subtract from accumulator, fractional |
VMASL | Multiply and subtract from accumulator, low |
VSATU | Saturate to upper limit |
VSATL | Saturate to lower limit |
VSUBS | Subtract scalar and set condition |
VCMPV | Compare vectors and set mask |
VDIVI | Divide initialize |
VDIVS | Divide step |
VASL | Arithmetic shift left |
VASA | Arithmetic shift accumulator |
MPEG instructions are a class of instructions especially suited to MPEG encoding and decoding, but they can be used in a variety of ways. MPEG instructions support the int8, int9, int16 and int32 data types and are affected by the element mask. Table E.6 lists the MPEG instructions.
Table E.6: MPEG class
Mnemonic | Explanation |
VAAS3 | Add and add sign (-1, 0, 1) |
VASS3 | Add and subtract sign (-1, 0, 1) |
VEXTSGN2 | Extract sign (-1, 1) |
VEXTSGN3 | Extract sign (-1, 0, 1) |
VXORALL | XOR the least significant bits of all elements |
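The sign-extraction and XOR-all operations have simple scalar models. The following is a hedged C sketch per element: the two-valued variant's treatment of zero is not specified here, so mapping 0 to +1 is an assumption, and all names are hypothetical.

```c
#include <stdint.h>

/* (-1, 0, 1) sign, as in the VEXTSGN3 description. */
static int sign3(int32_t x) { return (x > 0) - (x < 0); }

/* (-1, 1) sign. ASSUMPTION: zero is treated as positive. */
static int sign2(int32_t x) { return (x < 0) ? -1 : 1; }

/* XOR of the least significant bits of all n elements (VXORALL). */
static unsigned xor_all_lsb(const int32_t *v, int n)
{
    unsigned acc = 0;
    for (int i = 0; i < n; i++)
        acc ^= (unsigned)v[i] & 1u;
    return acc;
}
```

For the elements {1, 2, 3, 5}, the least significant bits are 1, 0, 1, 1, so the XOR-all result is 1.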
Each data type conversion instruction supports specific data types and is not affected by the element mask, because the architecture does not support more than one data type in a register. Table E.7 lists the data type conversion class instructions.
Table E.7: Data type conversion class
Mnemonic | Explanation |
VCVTIF | Convert integer to floating point |
VCVTFF | Convert floating point to fixed point |
VROUND | Round floating point to integer (supports the four IEEE rounding modes) |
VCNTLZ | Count leading zeros |
VCVTB9 | Convert byte9 data type |
Inter-element arithmetic class instructions support the int8, int9, int16, int32 and floating-point data types. Table E.8 lists the inter-element arithmetic class instructions.
Table E.8: Inter-element arithmetic class
Mnemonic | Explanation |
VADDH | Add two adjacent elements |
VAVGH | Average two adjacent elements |
VAVGQ | Average four elements |
VMAXE | Maximum exchange of even/odd elements |
Inter-element transfer class instructions support byte, byte9, halfword and word data lengths. Table E.9 lists the inter-element transfer class instructions.
Table E.9: Inter-element transfer class
Mnemonic | Explanation |
VESL | Shift elements left by one |
VESR | Shift elements right by one |
VSHFL | Even/odd element shuffle |
VSHFLH | High even/odd element shuffle |
VSHFLL | Low even/odd element shuffle |
VUNSHFL | Even/odd element unshuffle |
VUNSHFLH | High even/odd element unshuffle |
VUNSHFLL | Low even/odd element unshuffle |
In addition to byte, halfword and word data lengths, load/store instructions specifically support byte9 data length operations, and they are not affected by the element mask. Table E.10 lists the load/store class instructions.
Table E.10: Load/store class
Mnemonic | Explanation |
VL | Load |
VLD | Load double word |
VLQ | Load quad word |
VLCB | Load from circular buffer |
VLR | Load with elements in reverse order |
VLWS | Load with stride |
VST | Store |
VSTD | Store double word |
VSTQ | Store quad word |
VSTCB | Store to circular buffer |
VSTR | Store with elements in reverse order |
VSTWS | Store with stride |
Most register transfer instructions support the int8, int9, int16, int32 and floating-point data types and are not affected by the element mask; only the VCMOVM instruction is affected by the element mask. Table E.11 lists the register transfer class instructions.
Table E.11: Register transfer class
Mnemonic | Explanation |
VLI | Load immediate |
VMOV | Move |
VCMOV | Conditional move |
VCMOVM | Conditional move with element mask |
VEXTRT | Extract an element |
VINSERT | Insert an element |
Table E.12 lists the cache operation class instructions, which control cache subsystem 130.
Table E.12: Cache operation class
Mnemonic | Explanation |
VCACHE | Cache operation on the data or instruction cache |
VPFTCH | Prefetch into the data cache |
VWBACK | Write back from the data cache |
Instruction description conventions
To simplify the description of the instruction set, this appendix uses special terminology. For example, instruction operands are signed two's complement integers of byte, byte9, halfword or word length unless otherwise noted. The term "register" refers to a general-purpose (scalar or vector) register; other types of registers are explicitly identified. In assembly language syntax, the suffixes b, b9, h and w denote both the data length (byte, byte9, halfword and word) and the integer data type (int8, int9, int16 and int32). The terms and symbols used to describe instruction operands, operations and assembly language syntax are as follows.
Rd destination register (vector, scalar, or special-purpose)
Ra, Rb source registers a and b (vector, scalar, or special-purpose)
Rc source or destination register c (vector or scalar)
Rs store-data source register (vector or scalar)
S 32-bit scalar or special-purpose register
VR vector register in the current bank
VRA vector register in the alternate bank
VR0 vector register in bank 0
VRd vector destination register (in the current bank unless VRA is specified)
VRa, VRb vector source registers a and b
VRc vector source or destination register c
VRs vector store-data source register
VAC0H vector accumulator register 0, high part
VAC0L vector accumulator register 0, low part
VAC1H vector accumulator register 1, high part
VAC1L vector accumulator register 1, low part
SRd scalar destination register
SRa, SRb scalar source registers a and b
SRb+ base register SRb, updated with the effective address
SRs scalar store-data source register
SP special-purpose register
VR[i] the i-th element of vector register VR
VR[i]<a:b> bits a through b of the i-th element of vector register VR
VR[i]<msb> the most significant bit of the i-th element of vector register VR
EA effective address of a memory access
MEM memory
BYTE[EA] the byte at memory address EA
HALF[EA] the halfword at memory address EA; the byte at address EA+1 supplies bits <15:8>.
WORD[EA] the word at memory address EA; the byte at address EA+3 supplies bits <31:24>.
NumElem the number of elements for the given data type. In VEC32 mode it is 32, 16, or 8 for byte and byte9, halfword, or word data lengths, respectively; in VEC64 mode it is 64, 32, or 16, respectively. For scalar operations, NumElem is 0.
EMASK[i] the element mask for the i-th element. For byte and byte9, halfword, or word data lengths, it represents 1, 2, or 4 bits, respectively, in VGMR0/1, ~VGMR0/1, VMMR0/1, or ~VMMR0/1. For scalar operations, the element mask is treated as set even if EMASK[i] = 0.
MMASK[i] the element mask for the i-th element in VMMR0 or VMMR1. For byte and byte9, halfword, or word data lengths, it represents 1, 2, or 4 bits, respectively.
VCSR vector control and status register
VCSR<x> one or more bits of VCSR, where "x" is the field name
VPC vector processor program counter
VECSIZE vector register size: 32 in VEC32 mode, 64 in VEC64 mode
SPAD scratch-pad memory
C programming constructs are used to describe the control flow of operations, with the following exceptions:
= assignment
: concatenation
{X ‖ Y} select either X or Y (not a logical OR)
sex sign extension to the specified data length
sex_dp sign extension to double the specified data length
sign>> arithmetic (sign-extending) right shift
zex zero extension to the specified data length
zero>> logical (zero-filling) right shift
<< left shift (filling with zeros)
trnc7 truncate the 7 most significant bits (from a halfword)
trnc1 truncate the most significant bit (from a byte9)
% modulo operator
|expression| absolute value of the expression
/ division (for floating-point data types, uses one of the four IEEE rounding modes)
// division (rounding toward zero)
saturate() for integer types, saturates to the most negative or most positive value instead of overflowing; for floating-point data types, saturates to positive infinity, positive zero, negative zero, or negative infinity.
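As a reading aid for the notation above, the following C sketch models a few of the operators (sex, saturate, trnc1), the byte order implied by HALF[EA] and WORD[EA], and the NumElem element counts. The function names mirror the notation, but their signatures are hypothetical, and byte9 values are assumed to be held in ordinary C integers.

```c
#include <stdint.h>

/* sex(v, bits): sign-extend the low `bits` bits (1..32) of v to 32 bits. */
static int32_t sex(uint32_t v, unsigned bits) {
    int64_t field = (int64_t)(v & (uint32_t)(((uint64_t)1 << bits) - 1u));
    int64_t sign = (int64_t)1 << (bits - 1);   /* weight of the field's sign bit */
    return (int32_t)((field ^ sign) - sign);
}

/* saturate(v, bits): clamp an exact result to the signed range of `bits`
   bits instead of letting it overflow (integer case only). */
static int32_t saturate(int64_t v, unsigned bits) {
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    int64_t lo = -((int64_t)1 << (bits - 1));
    return (int32_t)(v > hi ? hi : (v < lo ? lo : v));
}

/* trnc1: drop the most significant (ninth) bit of a byte9 value. */
static uint8_t trnc1(int32_t byte9) {
    return (uint8_t)byte9;   /* conversion to unsigned keeps the low 8 bits */
}

/* HALF[EA] and WORD[EA]: the byte at the higher address is more significant. */
static uint16_t half_at(const uint8_t *mem, uint32_t ea) {
    return (uint16_t)(mem[ea] | ((uint16_t)mem[ea + 1] << 8));
}

static uint32_t word_at(const uint8_t *mem, uint32_t ea) {
    return (uint32_t)mem[ea] | ((uint32_t)mem[ea + 1] << 8) |
           ((uint32_t)mem[ea + 2] << 16) | ((uint32_t)mem[ea + 3] << 24);
}

/* NumElem for vector operations: elem_bytes is 1 (byte, byte9),
   2 (halfword) or 4 (word). */
static unsigned num_elem(int vec64_mode, unsigned elem_bytes) {
    return (vec64_mode ? 64u : 32u) / elem_bytes;
}
```

For example, sex(0x1FF, 9) yields -1 and saturate(300, 9) yields 255, the largest int9 value.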
The general instruction formats are shown in Figure 8 and described below.
REAR format: used by the load, store, and cache operation instructions. The fields of the REAR format have the meanings given in Table E.13.
Table E.13: REAR format
Field | Significance |
OPC<4:0> | Opcode |
B | Bank identifier for register Rn |
D | Destination/source scalar register. When set, Rn<4:0> designates a scalar register. Legal B:D encodings in VEC32 mode: 00 Rn is a vector register in the current bank; 01 Rn is a scalar register (in the current bank); 10 Rn is a vector register in the alternate bank; 11 undefined. Legal B:D encodings in VEC64 mode: 00 only 4, 8, 16, or 32 bytes of vector register Rn are used; 01 Rn is a scalar register; 10 all 64 bytes of vector register Rn are used; 11 undefined. |
TT<1:0> | Transfer type, indicating the specific load or store operation. See the LT and ST encoding tables below. |
C | Cache off. Set this bit to bypass the data cache on a load. The cache-off mnemonic of load and store instructions sets this bit (OFF appended to the mnemonic). |
A | Address update. Set this bit to update SRb with the effective address. The effective address is calculated as SRb + SRi. |
Rn<4:0> | Destination/source register number |
SRb<4:0> | Scalar base register number |
SRi<4:0> | Scalar index register number |
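The B:D encodings in the D row can be decoded as in the following C sketch. The enum names and the function are hypothetical; only the mapping from encoding to meaning is taken from Table E.13, and the bit positions of B and D within the instruction word (Figure 8) are not modelled.

```c
/* Hypothetical names for the operand kinds selected by B:D (Table E.13). */
typedef enum {
    RN_VECTOR_CURRENT_BANK,   /* VEC32: 00 */
    RN_SCALAR,                /* both modes: 01 */
    RN_VECTOR_ALTERNATE_BANK, /* VEC32: 10 */
    RN_VECTOR_PARTIAL,        /* VEC64: 00, only 4/8/16/32 bytes used */
    RN_VECTOR_FULL64,         /* VEC64: 10, all 64 bytes used */
    RN_UNDEFINED              /* 11 in either mode */
} rn_kind;

static rn_kind decode_bd(int vec64_mode, unsigned b, unsigned d) {
    switch (((b & 1u) << 1) | (d & 1u)) {
    case 0:  return vec64_mode ? RN_VECTOR_PARTIAL : RN_VECTOR_CURRENT_BANK;
    case 1:  return RN_SCALAR;
    case 2:  return vec64_mode ? RN_VECTOR_FULL64 : RN_VECTOR_ALTERNATE_BANK;
    default: return RN_UNDEFINED;
    }
}
```

Keeping the mode as an explicit parameter reflects the fact that the same B:D bits mean different things depending on VCSR<0>.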
Bits 17:15 are reserved and should be zero to ensure compatibility with future extensions of the architecture. Some encodings of the B:D and TT fields are undefined. Programmers should not use these encodings, because the architecture does not specify the expected result when they are used. Table E.14 shows the scalar load operations supported in both VEC32 and VEC64 modes (with the TT field encoded as LT).
Table E.14: REAR scalar load operations in VEC32 and VEC64 modes
D:LT | Mnemonic | Significance |
100 | .bs9 | Load 8 bits, sign-extended to byte9 length |
101 | .h | Load 16 bits as halfword length |
110 | .bz9 | Load 8 bits, zero-extended to byte9 length |
111 | .w | Load 32 bits as word length |
Table E.15 shows the vector load operations supported in VEC32 mode (with the TT field encoded as LT), in which the VCSR<0> bit is clear.
Table E.15: VEC32 mode REAR load operation
D:LT | Mnemonic | Significance |
000 | .4 | Load 4 bytes from memory into the low 4 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 4 byte9s is sign-extended from the corresponding eighth bit. |
001 | .8 | Load 8 bytes from memory into the low 8 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 8 byte9s is sign-extended from the corresponding eighth bit. |
010 | .16 | Load 16 bytes from memory into the low 16 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 16 byte9s is sign-extended from the corresponding eighth bit. |
011 | .32 | Load 32 bytes from memory into the low 32 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 32 byte9s is sign-extended from the corresponding eighth bit. |
The B bit indicates the current or alternate bank.
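A VEC32 vector load can be modelled in C roughly as follows. This is a sketch, not the hardware behaviour: it assumes that byte9 element i receives the byte at EA + i and represents each byte9 as an int16_t in the range -256 to 255.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC32_BYTE9_ELEMS 32   /* a VEC32-mode vector register holds 32 byte9s */

/* Sketch of a VEC32 .4/.8/.16/.32 vector load: n bytes (4, 8, 16 or 32)
   are written into byte9 elements 0..n-1 (assumed ordering). */
static void load_bytes_to_byte9s(int16_t reg[VEC32_BYTE9_ELEMS],
                                 const uint8_t *mem, size_t n) {
    for (size_t i = 0; i < n && i < VEC32_BYTE9_ELEMS; i++) {
        /* bit 9 repeats bit 8, i.e. the byte is sign-extended */
        reg[i] = (int16_t)((mem[i] ^ 0x80) - 0x80);
    }
    /* elements n..31 keep their previous contents */
}
```

The XOR-and-subtract idiom sign-extends the byte without relying on implementation-defined conversions of out-of-range values to signed types.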
Table E.16 shows the vector load operations supported in VEC64 mode (with the TT field encoded as LT), in which the VCSR<0> bit is set.
Table E.16: VEC64 mode REAR load operations
B:D:LT | Mnemonic | Significance |
0000 | .4 | Load 4 bytes from memory into the low 4 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 4 byte9s is sign-extended from the corresponding eighth bit. |
0001 | .8 | Load 8 bytes from memory into the low 8 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 8 byte9s is sign-extended from the corresponding eighth bit. |
0010 | .16 | Load 16 bytes from memory into the low 16 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 16 byte9s is sign-extended from the corresponding eighth bit. |
0011 | .32 | Load 32 bytes from memory into the low 32 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 32 byte9s is sign-extended from the corresponding eighth bit. |
1000 | Undefined | |
1001 | Undefined | |
1010 | Undefined | |
1011 | .64 | Load 64 bytes from memory into the low 64 byte9s of the register, leaving the remaining byte9s unchanged. The ninth bit of each of the 64 byte9s is sign-extended from the corresponding eighth bit. |
The B bit is used to indicate 64-byte vector operations, because the concept of current and alternate banks does not exist in VEC64 mode.
Table E.17 lists the scalar store operations supported in VEC32 and VEC64 modes (with the TT field encoded as ST).
Table E.17: REAR scalar store operations
D:ST | Mnemonic | Significance |
100 | .b | Store a byte or byte9 length as 8 bits (the ninth bit of a byte9 is truncated) |
101 | .h | Store a halfword length as 16 bits |
110 | Undefined | |
111 | .w | Store a word length as 32 bits |
Table E.18 lists the vector store operations supported in VEC32 mode (with the TT field encoded as ST), in which the VCSR<0> bit is clear.
Table E.18: VEC32 mode REAR vector store operations
D:ST | Mnemonic | Significance |
000 | .4 | Store 4 bytes from the register to memory; the ninth bit of each of the register's 4 byte9s is ignored. |
001 | .8 | Store 8 bytes from the register to memory; the ninth bit of each of the register's 8 byte9s is ignored. |
010 | .16 | Store 16 bytes from the register to memory; the ninth bit of each of the register's 16 byte9s is ignored. |
011 | .32 | Store 32 bytes from the register to memory; the ninth bit of each of the register's 32 byte9s is ignored. |
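The store direction simply discards the ninth bit. The C sketch below mirrors the load sketch for Table E.15 and makes the same hypothetical assumptions: byte9 elements are held in int16_t, and element i is written to memory address EA + i.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of a VEC32 .4/.8/.16/.32 vector store: byte9 elements 0..n-1 are
   written to memory bytes EA..EA+n-1 (assumed ordering), keeping only the
   low 8 bits of each element. */
static void store_byte9s_to_bytes(uint8_t *mem, const int16_t reg[32], size_t n) {
    for (size_t i = 0; i < n && i < 32; i++) {
        mem[i] = (uint8_t)reg[i];   /* the ninth bit is ignored */
    }
}
```

Conversion to uint8_t is defined as reduction modulo 256, which is exactly "ignore the ninth bit" for a 9-bit two's complement value.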
Table E.19 lists the vector store operations supported in VEC64 mode (with the TT field encoded as ST), in which the VCSR<0> bit is set.
Table E.19: VEC64 mode REAR vector store operations
B:D:ST | Mnemonic | Significance |
0000 | .4 | Store 4 bytes from the register to memory; the ninth bit of each of the register's 4 byte9s is ignored. |
0001 | .8 | Store 8 bytes from the register to memory; the ninth bit of each of the register's 8 byte9s is ignored. |
0010 | .16 | Store 16 bytes from the register to memory; the ninth bit of each of the register's 16 byte9s is ignored. |
0011 | .32 | Store 32 bytes from the register to memory; the ninth bit of each of the register's 32 byte9s is ignored. |
1000 | Undefined | |
1001 | Undefined | |
1010 | Undefined | |
1011 | .64 | Store 64 bytes from the register to memory; the ninth bit of each of the register's 64 byte9s is ignored. |
The B bit is used to indicate 64-byte vector operations, because the concept of current and alternate banks does not exist in VEC64 mode.
REAI format: used by the load, store, and cache operation instructions. Table E.20 shows the meaning of each field in the REAI format.
Table E.20: REAI format
Field | Significance |
OPC<4:0> | Opcode |
B | Bank identifier for register Rn. When set in VEC32 mode, Rn<4:0> denotes a vector register number in the alternate bank; when set in VEC64 mode, it denotes a full-vector (64-byte) operation. |
D | Destination/source scalar register. When set, Rn<4:0> designates a scalar register. Legal B:D encodings in VEC32 mode: 00 Rn is a vector register in the current bank; 01 Rn is a scalar register (in the current bank); 10 Rn is a vector register in the alternate bank; 11 undefined. Legal B:D encodings in VEC64 mode: 00 only 4, 8, 16, or 32 bytes of vector register Rn are used; 01 Rn is a scalar register; 10 all 64 bytes of vector register Rn are used; 11 undefined. |
TT<1:0> | Transfer type, indicating the specific load or store operation; encoded as in the REAR format (LT and ST encoding tables). |
C | Cache off. Set this bit to bypass the data cache on a load. The cache-off mnemonic of load and store instructions sets this bit (OFF appended to the mnemonic). |
A | Address update. Set this bit to update SRb with the effective address. The effective address is calculated as SRb + IMM<7:0>. |
Rn<4:0> | Destination / source register number |
SRb<4:0> | Scalar base register number |
IMM<7:0> | 8-bit immediate offset, interpreted as a two's complement number. |
The REAR and REAI formats use the same transfer-type encodings. See the REAR format for further details of the encoding.
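The effective-address rules for the two memory formats can be sketched in C as below. The helper names are hypothetical, and the sketch assumes that the access always uses SRb + offset and that the A bit only controls whether that sum is written back into SRb.

```c
#include <stdint.h>

/* REAR: EA = SRb + SRi. If the A bit is set, SRb is updated with EA. */
static uint32_t rear_ea(uint32_t *srb, uint32_t sri, int a_bit) {
    uint32_t ea = *srb + sri;             /* wraps modulo 2^32 */
    if (a_bit) *srb = ea;
    return ea;
}

/* REAI: EA = SRb + IMM<7:0>, with IMM treated as two's complement. */
static uint32_t reai_ea(uint32_t *srb, uint8_t imm8, int a_bit) {
    int32_t offset = (int32_t)(imm8 ^ 0x80) - 0x80;   /* -128..127 */
    uint32_t ea = *srb + (uint32_t)offset;
    if (a_bit) *srb = ea;
    return ea;
}
```

With SRb = 0x1000, an IMM<7:0> of 0xFC addresses 0x0FFC, four bytes below the base.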
RRRM5 format: provides three register operands, or two register operands and a 5-bit immediate operand. Table E.21 defines the fields of the RRRM5 format.
Table E.21: RRRM5 format
Field | Significance |
OP<4:0> | Opcode |
D | Destination scalar register. When set, Rd<4:0> designates a scalar register; when clear, Rd<4:0> designates a vector register. |
S | Scalar Rb register. When set, Rb<4:0> designates a scalar register; when clear, Rb<4:0> designates a vector register. |
DS<1:0> | Data width, encoded as: 00 byte (int8 data type); 01 byte9 (int9 data type); 10 halfword (int16 data type); 11 word (int32 or floating-point data type) |
M | Modifier for the D:S bits; see the D:S:M encoding tables below. |
Rd<4:0> | Destination register D number |
Ra<4:0> | Source register A number |
Rb<4:0> or IM5<4:0> | Source register B number or 5-bit immediate, depending on the D:S:M encoding; the 5-bit immediate is treated as an unsigned number. |
Bits 19:15 are reserved and must be zero to ensure compatibility with future extensions.
All vector register operands refer to the current bank (which can be bank 0 or bank 1) unless stated otherwise. Table E.22 lists the D:S:M encodings when DS<1:0> is 00, 01, or 10.
Table E.22: RRRM5 D:S:M encodings when DS is not 11
Coding | Rd | Ra | Rb/IM5 | Note |
000 | VRd | VRa | VRb | Three vector register operands |
001 | Undefined | |||
010 | VRd | VRa | SRb | B operand is a scalar register |
011 | VRd | VRa | IM5 | B operand is a 5-bit immediate |
100 | Undefined | |||
101 | Undefined | |||
110 | SRd | SRa | SRb | Three scalar register operands |
111 | SRd | SRa | IM5 | B operand is a 5-bit immediate |
When DS<1:0> is 11, the D:S:M encodings have the following meanings:
Table E.23: RRRM5 D:S:M encodings when DS is 11
D:S:M | Rd | Ra | Rb/IM5 | Note |
000 | VRd | VRa | VRb | Three vector register operands (int32) |
001 | VRd | VRa | VRb | Three vector register operands (float) |
010 | VRd | VRa | SRb | B operand is a scalar register (int32) |
011 | VRd | VRa | IM5 | B operand is a 5-bit immediate (int32) |
100 | VRd | VRa | SRb | B operand is a scalar register (float) |
101 | SRd | SRa | SRb | Three scalar register operands (float) |
110 | SRd | SRa | SRb | Three scalar register operands (int32) |
111 | SRd | SRa | IM5 | B operand is a 5-bit immediate (int32) |
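Tables E.22 and E.23 can be expressed as two lookup tables indexed by the 3-bit D:S:M value, as in the following C sketch. The struct and function names are hypothetical, a NULL rd marks an undefined encoding, and row 101 of Table E.23 is read as having SRd as its destination.

```c
#include <stddef.h>

/* Operand roles for one D:S:M value; NULL rd marks an undefined encoding. */
typedef struct { const char *rd, *ra, *rb, *type; } dsm_form;

/* ds: DS<1:0>; dsm: the 3-bit D:S:M value (Tables E.22 and E.23). */
static dsm_form decode_dsm(unsigned ds, unsigned dsm) {
    static const dsm_form narrow[8] = {   /* DS = 00, 01, 10 */
        {"VRd", "VRa", "VRb", "int"}, {NULL, NULL, NULL, NULL},
        {"VRd", "VRa", "SRb", "int"}, {"VRd", "VRa", "IM5", "int"},
        {NULL, NULL, NULL, NULL},     {NULL, NULL, NULL, NULL},
        {"SRd", "SRa", "SRb", "int"}, {"SRd", "SRa", "IM5", "int"}};
    static const dsm_form word[8] = {     /* DS = 11 */
        {"VRd", "VRa", "VRb", "int32"}, {"VRd", "VRa", "VRb", "float"},
        {"VRd", "VRa", "SRb", "int32"}, {"VRd", "VRa", "IM5", "int32"},
        {"VRd", "VRa", "SRb", "float"}, {"SRd", "SRa", "SRb", "float"},
        {"SRd", "SRa", "SRb", "int32"}, {"SRd", "SRa", "IM5", "int32"}};
    return ((ds & 3u) == 3u ? word : narrow)[dsm & 7u];
}
```

The encodings that are undefined for the narrower widths (001, 100, 101) are exactly the ones reused for floating-point forms when DS is 11.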
RRRR format: provides four register operands. Table E.24 shows the fields of the RRRR format.
Table E.24: RRRR format
Field | Significance |
Op<4:0> | Opcode |
S | Scalar Rb register. When set, Rb<4:0> designates a scalar register; when clear, Rb<4:0> designates a vector register. |
DS<1:0> | Data length, encoded as: 00 byte (int8 data type); 01 byte9 (int9 data type); 10 halfword (int16 data type); 11 word (int32 data type) |
Rc<4:0> | Source/destination register C number |
Rd<4:0> | Destination register D number |
Ra<4:0> | Source A register number |
Rb<4:0> | Source B register number |
All vector register operands refer to the current bank (which can be bank 0 or bank 1) unless stated otherwise.
RI format: used only by the load immediate instruction. Table E.25 specifies the fields of the RI format.
Table E.25: RI format
Field | Significance |
D | Destination scalar register. When set, Rd<4:0> designates a scalar register; when clear, Rd<4:0> designates a vector register in the current bank. |
F | Floating-point data type. When set, indicates a floating-point data type and requires DS<1:0> to be 11. |
DS<1:0> | Data length, encoded as: 00 byte (int8 data type); 01 byte9 (int9 data type); 10 halfword (int16 data type); 11 word (int32 or floating-point data type) |
Rd<4:0> | Destination register D number |
IMM<18:0> | 19-bit immediate value |
Some encodings of the F:DS<1:0> fields are undefined. Programmers should not use these encodings, because the architecture does not specify the consequences of using them. The value loaded into Rd depends on the data type, as shown in Table E.26.
Table E.26: RI format load value
Format | Data Type | Register operand |
.b | Byte (8) | Rd<7:0>:=Imm<7:0> |
.b9 | Byte (9) | Rd<8:0>:=Imm<8:0> |
.h | Half-word (16) | Rd<15:0>:=Imm<15:0> |
.w | Word (32) | Rd<31:0> := sex(IMM<18:0>) |
.f | Floating point (32) | Rd<31> := Imm<18> (sign); Rd<30:23> := Imm<17:10> (exponent); Rd<22:13> := Imm<9:0> (mantissa); Rd<12:0> := 0 |
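The .w and .f rows of Table E.26 can be sketched in C as follows. The function names are hypothetical; the field split (1 sign bit, 8 exponent bits at Imm<17:10>, 10 mantissa bits at Imm<9:0>) is the only split that fills 19 bits and an IEEE 754 single-precision exponent field exactly.

```c
#include <stdint.h>

/* .f: pack sign, 8-bit exponent and 10-bit mantissa from IMM<18:0> into an
   IEEE 754 single-precision bit pattern, clearing the 13 low mantissa bits. */
static uint32_t ri_float_bits(uint32_t imm19) {
    uint32_t sign     = (imm19 >> 18) & 0x1u;    /* Imm<18>    */
    uint32_t exponent = (imm19 >> 10) & 0xFFu;   /* Imm<17:10> */
    uint32_t mantissa = imm19 & 0x3FFu;          /* Imm<9:0>   */
    return (sign << 31) | (exponent << 23) | (mantissa << 13);
}

/* .w: sign-extend IMM<18:0> to 32 bits. */
static int32_t ri_word(uint32_t imm19) {
    imm19 &= 0x7FFFFu;
    return (int32_t)(imm19 ^ 0x40000u) - 0x40000;
}
```

For instance, an immediate of 0x1FC00 (sign 0, exponent 0x7F, mantissa 0) produces 0x3F800000, the single-precision encoding of 1.0; only floats whose low 13 mantissa bits are zero can be loaded exactly this way.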
CT format: contains the fields shown in Table E.27.
Table E.27: CT format
Field | Significance |
Opc<3:0> | Opcode |
Cond<2:0> | Branch condition: 000 unconditional; 001 less than; 010 equal; 011 less than or equal; 100 greater than; 101 not equal; 110 greater than or equal; 111 overflow |
IMM<22:0> | 23-bit immediate offset, expressed as a two's complement number of instructions |
Branch conditions use the VCSR[GT:EQ:LT] fields. The overflow condition uses the VCSR[SO] bit, which, when set, takes precedence over the GT, EQ, and LT bits. VCCS and VCBARR interpret the Cond<2:0> field differently from the description above; see their instruction descriptions for details.
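Evaluating Cond<2:0> against the VCSR flags might look like the following C sketch. The struct and enum names are hypothetical, deriving le, ne, and ge from the GT, EQ, and LT bits is an assumption, and the precedence of SO over the other bits is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical view of the VCSR flag bits used by CT-format branches. */
typedef struct { bool so, gt, eq, lt; } vcsr_flags;

/* Cond<2:0> values from Table E.27. */
enum { COND_UN, COND_LT, COND_EQ, COND_LE, COND_GT, COND_NE, COND_GE, COND_OV };

static bool cond_holds(unsigned cond, vcsr_flags f) {
    switch (cond & 7u) {
    case COND_UN: return true;
    case COND_LT: return f.lt;
    case COND_EQ: return f.eq;
    case COND_LE: return f.lt || f.eq;
    case COND_GT: return f.gt;
    case COND_NE: return !f.eq;
    case COND_GE: return f.gt || f.eq;
    default:      return f.so;   /* COND_OV: overflow */
    }
}
```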
RRRM9 format: specifies three register operands, or two register operands and a 9-bit immediate operand. Table E.28 gives the fields of the RRRM9 format.
Table E.28: RRRM9 format
Field | Significance |
Opc<5:0> | Opcode |
D | Destination scalar register. When set, Rd<4:0> designates a scalar register; when clear, Rd<4:0> designates a vector register. |
S | Scalar Rb register. When set, Rb<4:0> designates a scalar register; when clear, Rb<4:0> designates a vector register. |
DS<1:0> | Data width, encoded as: 00 byte (int8 data type); 01 byte9 (int9 data type); 10 halfword (int16 data type); 11 word (int32 or floating-point data type) |
M | Modifier for the D:S bits; see the D:S:M encoding description below. |
Rd<4:0> | Destination register number |
Ra<4:0> | Source register A number |
Rb<4:0> or IM5<4:0> | Source register B number or 5-bit immediate, depending on the D:S:M encoding. |
IM9<3:0> | Together with IM5<4:0>, supplies a 9-bit immediate, depending on the D:S:M encoding. |
Bits 19:15 are reserved when the D:S:M encoding does not specify an immediate operand, and must then be 0 to ensure future compatibility.
All vector register operands refer to the current bank (which can be bank 0 or bank 1) unless stated otherwise. The D:S:M encodings are the same as those shown for the RRRM5 format in Tables E.22 and E.23, except that the immediate value is extracted according to the DS<1:0> encoding, as shown in Table E.29.
Table E.29: RRRM9 format literal value
DS | Matching data types | B operand |
00 | int8 | Source B<7:0> := IM9<2:0>:IM5<4:0> |
01 | int9 | Source B<8:0> := IM9<3:0>:IM5<4:0> |
10 | int16 | Source B<15:0> := sex(IM9<3:0>:IM5<4:0>) |
11 | int32 | Source B<31:0> := sex(IM9<3:0>:IM5<4:0>) |
Immediate formats are not available for floating-point data types.
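Table E.29 can be read as the following C sketch, which builds source operand B from the IM9 and IM5 fields. The function name is hypothetical; for int8 and int9 it returns the raw element bit pattern, while for int16 and int32 it returns the sign-extended value.

```c
#include <stdint.h>

/* Build source operand B from IM9<3:0> and IM5<4:0> per Table E.29.
   ds: 0 = int8, 1 = int9, 2 = int16, 3 = int32. */
static int32_t rrrm9_imm(unsigned ds, unsigned im9, unsigned im5) {
    unsigned raw9 = ((im9 & 0xFu) << 5) | (im5 & 0x1Fu);   /* IM9<3:0>:IM5<4:0> */
    switch (ds & 3u) {
    case 0:  return (int32_t)(raw9 & 0xFFu);             /* IM9<2:0>:IM5<4:0> */
    case 1:  return (int32_t)raw9;                       /* 9-bit pattern */
    default: return (int32_t)(raw9 ^ 0x100u) - 0x100;    /* sex() to 16 or 32 bits */
    }
}
```

For int16 and int32, the immediate therefore covers the range -256 to 255.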
The MSP vector instructions are described below in alphabetical order. Notes:
1 Unless otherwise noted, instructions are affected by the element mask. CT-format instructions are not affected by the element mask. REAR- and REAI-format instructions, which comprise the load, store, and cache instructions, are not affected by the element mask either.
2 9-bit immediate operands are not available for floating-point data types.
3 The Operation descriptions give only the vector form. For scalar operations, assume that only one element, element 0, is defined.
4 For the RRRM5 and RRRM9 formats, the following encodings are used for integer data types (b, b9, h, w):
D:S:M | 000 | 010 | 011 | 110 | 111 |
DS | 00 | 01 | 10 | 11 |
5 For the RRRM5 and RRRM9 formats, the following encodings are used for floating-point data types:
D:S:M | 001 | 100 | n/a | 101 | n/a |
DS | 11 |
6 For all instructions that can cause an overflow, saturation to the int8, int9, int16, or int32 maximum or minimum limit is applied when the VCSR<ISAT> bit is set. Likewise, when the VCSR<FSAT> bit is set, floating-point results saturate to -infinity, -0, +0, or +infinity.
7 By syntactic convention, .n may be used instead of .b9 to denote the byte9 data length.
8 For all instructions, floating-point results returned to a destination register or to the vector accumulator are in IEEE 754 single-precision format. Floating-point results are written to the low part of the accumulator; the high part is unchanged.
VAAS3 add and add sign (-1, 0, 1)
Format
Assembler syntax
VAAS3.dt VRd,VRa,VRb
VAAS3.dt VRd,VRa,SRb
VAAS3.dt SRd,SRa,SRb
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | S<-S@S | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
The contents of vector/scalar register Ra are added to Rb to produce an intermediate result; the sign of Ra (-1, 0, or +1) is then added to the intermediate result, and the final result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
if(Ra[i]>0) extsgn3=1;
else if(Ra[i]<0) extsgn3=-1;
else extsgn3=0;
Rd[i]=Ra[i]+Rb[i]+extsgn3;
}
Exceptions
Overflow.
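A per-element C sketch of VAAS3.b follows. It assumes that results wrap modulo 2^8 when VCSR<ISAT> is clear (the text above only defines the saturating case), and it applies the element mask by skipping masked-off elements; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap an exact result to int8 (modulo 2^8); assumed behaviour with ISAT clear. */
static int8_t wrap8(int v) {
    int w = ((v % 256) + 256) % 256;
    return (int8_t)(w >= 128 ? w - 256 : w);
}

static int8_t sat8(int v) { return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v)); }

/* One VAAS3.b element: Ra + Rb + sign(Ra), where sign() is -1, 0 or +1. */
static int8_t vaas3_b(int8_t ra, int8_t rb, bool isat) {
    int extsgn3 = (ra > 0) - (ra < 0);
    int exact = ra + rb + extsgn3;
    return isat ? sat8(exact) : wrap8(exact);
}

/* Vector form; masked-off elements of rd are left unchanged (assumed). */
static void vaas3_vec_b(int8_t *rd, const int8_t *ra, const int8_t *rb,
                        const bool *emask, int num_elem, bool isat) {
    for (int i = 0; i < num_elem; i++)
        if (emask[i]) rd[i] = vaas3_b(ra[i], rb[i], isat);
}
```

VASS3 differs only in subtracting extsgn3 instead of adding it.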
VADAC add and accumulate
Format
Assembler syntax
VADAC.dt VRc,VRd,VRa,VRb
VADAC.dt SRc,SRd,SRa,SRb
Where dt = {b, b9, h, w}.
Supported modes
S | VR | SR | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
The sum of corresponding elements of Ra and Rb is added to the corresponding double-precision element of the vector accumulator, and each double-precision sum is stored in the vector accumulator and in destination registers Rc and Rd. Ra and Rb use the specified data type, while VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The high part of each double-precision element is stored in VACH and Rc, and the low part in VACL and Rd. If Rc = Rd, the result in Rc is undefined.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Aop[i]={VRa[i]‖SRa};
Bop[i]={VRb[i]‖SRb};
VACH[i]:VACL[i]=sex(Aop[i]+Bop[i])+VACH[i]:VACL[i];
Rc[i]=VACH[i];
Rd[i]=VACL[i];
}
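One int8 element of VADAC can be sketched in C as below, with the 16-bit double-precision accumulator element split into 8-bit VACH and VACL halves. The struct and function names are hypothetical, and wrapping on accumulator overflow is an assumption of the sketch.

```c
#include <stdint.h>

/* 16-bit accumulator element for int8 data: high half VACH, low half VACL. */
typedef struct { uint8_t vach, vacl; } acc16;

static void vadac_b(acc16 *acc, int8_t ra, int8_t rb, uint8_t *rc, uint8_t *rd) {
    int32_t cur = (acc->vach << 8) | acc->vacl;   /* VACH:VACL as 0..65535 */
    if (cur >= 0x8000) cur -= 0x10000;            /* reinterpret as signed 16-bit */
    int32_t sum = cur + (ra + rb);                /* sex(Aop + Bop) + VACH:VACL */
    uint16_t bits = (uint16_t)sum;                /* keep 16 bits (modulo 2^16) */
    acc->vach = (uint8_t)(bits >> 8);
    acc->vacl = (uint8_t)bits;
    *rc = acc->vach;                              /* Rc[i] = VACH[i] */
    *rd = acc->vacl;                              /* Rd[i] = VACL[i] */
}
```

Starting from an accumulator value of 255 and adding 100 + 100 gives 455 (0x01C7), so Rc receives 0x01 and Rd receives 0xC7.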
VADACL add and accumulate low
Format
Assembler syntax
VADACL.dt VRd,VRa,VRb
VADACL.dt VRd,VRa,SRb
VADACL.dt VRd,VRa,#IMM
VADACL.dt SRd,SRa,SRb
VADACL.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
The sum of each element of Ra and the corresponding element of the Rb/immediate operand is added to the corresponding extended-precision element of the vector accumulator. The extended-precision sum is stored in the vector accumulator, and its low-precision (lower) half is returned to destination register Rd. Ra and Rb/immediate use the specified data type, while VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The high part of each extended-precision element is stored in VACH.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
VACH[i]:VACL[i]=sex(Ra[i]+Bop[i])+VACH[i]:VACL[i];
Rd[i]=VACL[i];
}
VADD add
Format
Assembler syntax
VADD.dt VRd,VRa,VRb
VADD.dt VRd,VRa,SRb
VADD.dt VRd,VRa,#IMM
VADD.dt SRd,SRa,SRb
VADD.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Explanation
Adds the Ra and Rb/immediate operands and returns the sum to destination register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]=Ra[i]+Bop[i];
}
Exceptions
Overflow, floating-point invalid operand.
VADDH add two adjacent elements
Format
Assembler syntax
VADDH.dt VRd,VRa,VRb
VADDH.dt VRd,VRa,SRb
Where dt = {b, b9, h, w, f}.
Supported modes
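D:S:M | V<-V@V | V<-V@S | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Explanation
Adds each element i of Ra to the adjacent element i+1 and stores the sum in element i of Rd; the last element of Ra is added to element 0 of VRb or to SRb.
Operation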
for(i=0;i<NumElem-1;i++){
Rd[i]=Ra[i]+Ra[i+1];
}
Rd[NumElem-1]=Ra[NumElem-1]+{VRb[0]‖SRb};
Exceptions
Overflow, floating-point invalid operand.
Programming notes
This instruction is not affected by the element mask.
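The VADDH pairing pattern can be sketched in C for halfword elements as follows. The parameter b0 stands for VRb[0] or SRb, the function names are hypothetical, and results wrap modulo 2^16 because saturation is not modelled.

```c
#include <stdint.h>

/* Wrap an exact sum to int16 (saturation not modelled). */
static int16_t wrap16(int32_t v) {
    int32_t w = ((v % 65536) + 65536) % 65536;
    return (int16_t)(w >= 32768 ? w - 65536 : w);
}

/* VADDH.h over num_elem elements: element i is added to element i+1;
   the last element is paired with b0 (VRb[0] or SRb). The element mask
   is not consulted, matching the programming note. */
static void vaddh_h(int16_t *rd, const int16_t *ra, int16_t b0, int num_elem) {
    for (int i = 0; i < num_elem - 1; i++)
        rd[i] = wrap16((int32_t)ra[i] + ra[i + 1]);
    rd[num_elem - 1] = wrap16((int32_t)ra[num_elem - 1] + b0);
}
```

With Ra = {1, 2, 3, 4} and b0 = 10, the result is {3, 5, 7, 14}.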
VAND logical AND
Format
Assembler syntax
VAND.dt VRd,VRa,VRb
VAND.dt VRd,VRa,SRb
VAND.dt VRd,VRa,#IMM
VAND.dt SRd,SRa,SRb
VAND.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Performs a bitwise logical AND of Ra and the Rb/immediate operand and returns the result to destination register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=Ra[i]<k>&Bop[i]<k>, for all bits k in element i;
}
Exceptions
None.
VANDC logical AND with complement
Format
Assembler syntax
VANDC.dt VRd,VRa,VRb
VANDC.dt VRd,VRa,SRb
VANDC.dt VRd,VRa,#IMM
VANDC.dt SRd,SRa,SRb
VANDC.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Performs a bitwise logical AND of Ra and the complement of the Rb/immediate operand and returns the result to destination register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=Ra[i]<k>&~Bop[i]<k>, for all bits k in element i;
}
Exceptions
None.
VASA arithmetic shift of accumulator
Format
Assembler syntax
VASAL.dt
VASAR.dt
Where dt = {b, b9, h, w}, and R specifies the shift direction, left or right.
Supported modes
R | left | right | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Each data element of the vector accumulator register is shifted left by one bit position, with zero filled in from the right (if R = 0), or shifted right by one bit position with sign extension (if R = 1). The result is stored in the vector accumulator.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
if(R==1)
VAC0H[i]:VAC0L[i]=VAC0H[i]:VAC0L[i] sign>> 1;
else
VAC0H[i]:VAC0L[i]=VAC0H[i]:VAC0L[i]<<1;
}
Exceptions
Overflow.
VASL arithmetic left shift
Format
Assembler syntax
VASL.dt VRd,VRa,SRb
VASL.dt VRd,VRa,#IMM
VASL.dt SRd,SRa,SRb
VASL.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Each data element of vector/scalar register Ra is shifted left, with zeros filled in from the right, by the shift amount given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd. For elements that overflow, the result saturates to the maximum positive or maximum negative value according to the element's sign. The shift amount is defined as an unsigned integer.
Operation
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=saturate(Ra[i]<<shift_amount);
}
Exceptions
None.
Programming notes
Note that shift_amount is the 5-bit value taken from SRb or IMM<4:0>. For byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount less than or equal to the number of bits in the data length. If the shift amount is greater than the specified data length, the element is filled with zeros.
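One VASL.b element can be sketched in C directly from the Operation pseudocode (shift by SRb % 32, then saturate). The function name is hypothetical. The sketch multiplies by a power of two because left-shifting a negative int is undefined in C, and it does not model the zero fill that the programming note describes for out-of-range shift amounts, so it is only meaningful for shifts of at most 8 bits.

```c
#include <stdint.h>

/* One VASL.b element: saturate(Ra << (SRb % 32)). */
static int8_t vasl_b(int8_t ra, uint32_t srb) {
    unsigned shift_amount = srb % 32u;
    int64_t v = (int64_t)ra * ((int64_t)1 << shift_amount);
    return (int8_t)(v > 127 ? 127 : (v < -128 ? -128 : v));
}
```

For example, 64 shifted left by 1 would be 128, which saturates to 127.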
VASR arithmetic shift right
Format
Assembler syntax
VASR.dt VRd,VRa,SRb
VASR.dt VRd,VRa,#IMM
VASR.dt SRd,SRa,SRb
VASR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Each data element of vector/scalar register Ra is shifted right arithmetically, with the sign bit extended into the most significant bit positions, by the shift amount given in the least significant bits of scalar register Rb or the IMM field; the result is stored in vector/scalar register Rd. The shift amount is specified as an unsigned integer.
Operation
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i]sign>>shift_amount;
}
Exceptions
None.
Programming notes
Note that shift_amount is the 5-bit value taken from SRb or IMM<4:0>. For byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount less than or equal to the number of bits in the data length. If the shift amount is greater than the specified data length, the element is filled with the sign bit.
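A portable C sketch of one VASR.h element follows; the function name is hypothetical. Floor division by 2^shift gives the same result as a sign-extending right shift without relying on the implementation-defined behaviour of >> on negative values, and shifts of 16 or more fill the element with the sign bit as the programming note describes.

```c
#include <stdint.h>

/* One VASR.h element: arithmetic right shift by SRb % 32. */
static int16_t vasr_h(int16_t ra, uint32_t srb) {
    unsigned shift_amount = srb % 32u;
    if (shift_amount >= 16u)
        return (int16_t)(ra < 0 ? -1 : 0);   /* every bit becomes a copy of the sign */
    int32_t divisor = (int32_t)1 << shift_amount;
    int32_t q = ra / divisor;                /* C division truncates toward zero */
    if (ra % divisor != 0 && ra < 0) q -= 1; /* adjust to round toward minus infinity */
    return (int16_t)q;
}
```

Note that -7 shifted right by 1 gives -4, not -3: an arithmetic shift rounds toward minus infinity.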
VASS3 add and subtract sign (-1, 0, 1)
Format
Assembler syntax
VASS3.dt VRd,VRa,VRb
VASS3.dt VRd,VRa,SRb
VASS3.dt SRd,SRa,SRb
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | S<-S@S | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
The contents of vector/scalar register Ra are added to Rb to produce an intermediate result, and the sign of Ra (-1, 0, or +1) is subtracted from the intermediate result; the final result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
if(Ra[i]>0) extsgn3=1;
else if(Ra[i]<0) extsgn3=-1;
else extsgn3=0;
Rd[i]=Ra[i]+Rb[i]-extsgn3;
}
Exceptions
Overflow.
VASUB absolute value of difference
Format
Assembler syntax
VASUB.dt VRd,VRa,VRb
VASUB.dt VRd,VRa,SRb
VASUB.dt VRd,VRa,#IMM
VASUB.dt SRd,SRa,SRb
VASUB.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Explanation
The contents of vector/scalar register Rb or the IMM field are subtracted from the contents of vector/scalar register Ra, and the absolute value of the result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]=|Ra[i]-Bop[i]|;
}
Exceptions
Overflow, floating-point invalid operand.
Programming notes
If the subtraction result is the most negative number, an overflow occurs after the absolute-value operation. If saturation mode is enabled, the result of the absolute-value operation is the largest positive number.
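The overflow case in the programming note can be illustrated with the C sketch below for one VASUB.b element. The function name is hypothetical, and the sketch assumes the difference is formed at full precision before the absolute value is taken, then saturated (ISAT set) or wrapped to 8 bits (ISAT clear).

```c
#include <stdbool.h>
#include <stdint.h>

/* One VASUB.b element: |Ra - Bop|, saturated or wrapped to int8. */
static int8_t vasub_b(int8_t ra, int8_t b, bool isat) {
    int diff = ra - b;                        /* exact difference */
    int mag = diff < 0 ? -diff : diff;        /* |Ra - Bop|, 0..255 */
    if (isat) return (int8_t)(mag > 127 ? 127 : mag);
    return (int8_t)(mag >= 128 ? mag - 256 : mag);   /* wrap modulo 2^8 */
}
```

With Ra = -128 and Bop = 0, the magnitude 128 does not fit in int8: it saturates to 127, or wraps back to -128 without saturation.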
VAVG average of two elements
Format
Assembler syntax
VAVG.dt VRd,VRa,VRb
VAVG.dt VRd,VRa,SRb
VAVG.dt SRd,SRa,SRb
Where dt = {b, b9, h, w, f}. For integer data types, use VAVGT to specify the "truncate" rounding mode.
Supported modes
D:S:M | V<-V@V | V<-V@S | S<-S@S | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Explanation
The contents of vector/scalar register Ra are added to the contents of vector/scalar register Rb to produce an intermediate result; the intermediate result is then divided by 2, and the final result is stored in vector/scalar register Rd. For integer data types, the rounding mode is truncate if T = 1 and round toward zero if T = 0 (the default). For floating-point data types, the rounding mode is specified by VCSR<RMODE>.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]=(Ra[i]+Bop[i])//2;
}
Exceptions
None.
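The two integer rounding modes differ only when the sum is odd and negative. The C sketch below shows one VAVG.b element; the function name is hypothetical, and reading "truncate" as dropping the low bit of the sum (an arithmetic shift right, rounding toward minus infinity) is an assumption, since the text does not define it further.

```c
#include <stdbool.h>
#include <stdint.h>

/* One VAVG.b element. t == false: "//" (round toward zero, the default).
   t == true: "truncate", modelled as floor(sum / 2) (assumption). */
static int8_t vavg_b(int8_t ra, int8_t rb, bool t) {
    int sum = ra + rb;            /* exact, -256..254 */
    int q = sum / 2;              /* C division rounds toward zero */
    if (t && sum % 2 != 0 && sum < 0)
        q -= 1;                   /* floor instead of toward zero */
    return (int8_t)q;
}
```

For Ra = 3 and Rb = -4, the default mode gives 0, while the truncate mode as modelled here gives -1.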
VAVGH average of two adjacent elements
Format
Assembler syntax
VAVGH.dt VRd,VRa,VRb
VAVGH.dt VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. For integer data types, use VAVGHT to specify the "truncate" rounding mode.
Supported modes
D:S:M | V<-V@V | V<-V@S | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Explanation
For each element, computes the average of the element and its adjacent element. For integer data types, the rounding mode is truncate if T = 1 and round toward zero if T = 0 (the default). For floating-point data types, the rounding mode is specified by VCSR<RMODE>.
Operation
for(i=0;i<NumElem-1;i++){
Rd[i]=(Ra[i]+Ra[i+1])//2;
}
Rd[NumElem-1]=(Ra[NumElem-1]+{VRb[0]‖SRb})//2;
Exceptions
None.
Programming notes
This instruction is not affected by the element mask.
VAVGQ average of four elements
Format
Assembler syntax
VAVGQ.dt VRd,VRa,VRb
Where dt = {b, b9, h, w}. For integer data types, use VAVGQT to specify the "truncate" rounding mode.
Supported modes
D:S:M | V<-V@V | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
This instruction is not supported in VEC64 mode.
As shown below, the instruction computes the average of four elements using the rounding mode specified by T (1 for truncate, 0 for round toward zero, the default). Note that the leftmost element (D[n-1]) is undefined.
Operation
for(i=0;i<NumElem-1;i++){
Rd[i]=(Ra[i]+Rb[i]+Ra[i+1]+Rb[i+1])//4;
}
Exceptions
None.
VCACHE Cache Operation
Format
Assembler syntax
VCACHE.fc SRd,SRi
VCACHE.fc SRb,#IMM
VCACHE.fc SRb+,SRi
VCACHE.fc SRb+,#IMM
Where fc = {0,1}.
Explanation
This instruction provides software management of the vector data cache. When part or all of the data cache is configured as scratch-pad memory, this instruction has no effect on the scratch-pad memory.
The following options are supported:
FC<2:0> | Significance |
000 | Write back and invalidate the dirty cache line whose tag matches EA. If the matching line contains clean (unmodified) data, invalidate it without write-back. If no cache line contains EA, the data cache is left untouched. |
001 | Write back and invalidate the dirty cache line at the index specified by EA. If the matching line contains clean data, invalidate it without write-back. |
Other | Undefined |
Operation
Exceptions
None.
Programming notes
This instruction is not affected by the element mask.
VCAND logical AND of complement
Format
Assembler syntax
VCAND.dt VRd,VRa,VRb
VCAND.dt VRd,VRa,SRb
VCAND.dt VRd,VRa,#IMM
VCAND.dt SRd,SRa,SRb
VCAND.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Performs a bitwise logical AND of the complement of Ra and the Rb/immediate operand and returns the result to destination register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=~Ra[i]<k>&Bop[i]<k>, for all bits k in element i;
}
Abnormal
None.
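The C sketch below illustrates the word-sized immediate form of VCAND, including the `sex(IMM<8:0>)` sign extension of the 9-bit immediate. The names `sex9` and `vcand_imm_w` are hypothetical, and element masking (EMASK) is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the 9-bit immediate field IMM<8:0> to 32 bits. */
int32_t sex9(uint32_t imm)
{
    imm &= 0x1FFu;
    return (imm & 0x100u) ? (int32_t)imm - 0x200 : (int32_t)imm;
}

/* Hypothetical model of VCAND.w with an immediate operand:
 * every element receives ~ra[i] & sex(IMM<8:0>). */
void vcand_imm_w(int32_t *rd, const int32_t *ra, uint32_t imm, size_t n)
{
    int32_t bop = sex9(imm);
    for (size_t i = 0; i < n; i++)
        rd[i] = ~ra[i] & bop;
}
```

VCOR follows the same pattern with `|` in place of `&`.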
VCBARR Conditional barrier
Format
Assembler syntax
VCBARR.cond
Where cond = {0-7}. Mnemonics for each condition will be assigned later.
Explanation
Stalls this instruction and all subsequent instructions (those appearing later in program order) for as long as the condition remains true. The Cond<2:0> field is interpreted differently from that of the other conditional instructions in the CT format.
The following conditions are currently defined:
Cond<2:0> | Meaning |
000 | Before executing any later instruction, wait for all earlier instructions (those appearing earlier in program order) to finish execution. |
Other | Undefined |
Operation
while(Cond=true)
stall all later instructions;
Exceptions
None.
Programming notes
This instruction lets software force a sequence of instructions to execute serially. It can be used to force imprecise exceptions to be reported precisely. For example, if this instruction immediately follows an arithmetic instruction that can raise an exception, the exception is reported with the program counter addressing that instruction.
VCBR conditional branch
Format
Assembler syntax
VCBR.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, branch. This is not a delayed branch.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exceptions
Invalid instruction address.
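The taken-branch target `VPC+sex(Offset<22:0>*4)`, which VCBR shares with VCCS, VCJSR and VD1CBR through VD3CBR, can be sketched in C as follows. Sign-extending the 23-bit word offset and then scaling by 4 gives the same value as scaling the field first and sign-extending the 25-bit result. `branch_target` is a hypothetical helper name.

```c
#include <stdint.h>

/* Taken-branch target: Offset<22:0> is a signed word offset. */
uint32_t branch_target(uint32_t vpc, uint32_t offset_field)
{
    uint32_t off = offset_field & 0x7FFFFFu;            /* keep Offset<22:0> */
    int32_t soff = (off & 0x400000u) ? (int32_t)off - 0x800000
                                     : (int32_t)off;     /* sign-extend */
    return vpc + (uint32_t)soff * 4u;                   /* wraps modulo 2^32 */
}
```

An all-ones offset field (-1 words) therefore moves VPC back by 4 bytes.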
VCBRI indirect conditional branch
Format
Assembler syntax
VCBRI.cond SRb
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, branch indirectly to the address in SRb. This is not a delayed branch.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
VPC=SRb<31:2>:b'00;
else VPC=VPC+4;
Exceptions
Invalid instruction address.
VCCS Conditional context switch
Format
Assembler syntax
VCCS #Offset
Explanation
If VIMSK<cse> is true, jump to the context switch routine. This is not a delayed branch.
If VIMSK<cse> is true, VPC+4 (the return address) is saved on the return address stack. Otherwise, execution continues from VPC+4.
Operation
If(VIMSK<cse>=1){
if(VSP<4:0>>15){
VISRC<RASO>=1;
signal ARM7 with RASO exception;
VP_STATE=VP_IDLE;
}else{
RSTACK[VSP<3:0>]=VPC+4;
VSP<4:0>=VSP<4:0>+1;
VPC=VPC+sex(Offset<22:0>*4);
}
}else VPC=VPC+4;
Exceptions
Return address stack overflow.
VCHGCR change control register
Format
Assembler syntax
VCHGCR Mode
Explanation
This instruction changes the operating mode of the vector processor.
Each field of Mode is specified as follows:
Mode | Meaning |
bits 1:0 | These two bits control the VCSR<CBANK> bit. Encoding: 00 - no change; 01 - clear VCSR<CBANK>; 10 - set VCSR<CBANK>; 11 - toggle VCSR<CBANK> |
bits 3:2 | These two bits control the VCSR<SMM> bit. Encoding: 00 - no change; 01 - clear VCSR<SMM>; 10 - set VCSR<SMM>; 11 - toggle VCSR<SMM> |
bits 5:4 | These two bits control the VCSR<CEM> bit. Encoding: 00 - no change; 01 - clear VCSR<CEM>; 10 - set VCSR<CEM>; 11 - toggle VCSR<CEM> |
Other | Undefined |
Operation
Exceptions
None.
Programming notes
This instruction gives the hardware a more efficient way than a VMOV instruction to change control bits in VCSR.
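The 2-bit encoding in the table (00 no change, 01 clear, 10 set, 11 toggle) can be sketched in C as below. The bit positions chosen for CBANK, SMM and CEM inside VCSR are hypothetical placeholders, since their real positions are defined in the VCSR description rather than here; `vchgcr` and `apply_field` are illustrative names.

```c
#include <stdint.h>

/* Apply one 2-bit VCHGCR control field to a single VCSR bit. */
static uint32_t apply_field(uint32_t vcsr, unsigned code, unsigned bitpos)
{
    uint32_t m = 1u << bitpos;
    switch (code & 3u) {
    case 1: return vcsr & ~m;   /* 01: clear  */
    case 2: return vcsr | m;    /* 10: set    */
    case 3: return vcsr ^ m;    /* 11: toggle */
    default: return vcsr;       /* 00: no change */
    }
}

/* Hypothetical VCSR bit positions, for illustration only. */
enum { CBANK_BIT = 0, SMM_BIT = 1, CEM_BIT = 2 };

uint32_t vchgcr(uint32_t vcsr, uint32_t mode)
{
    vcsr = apply_field(vcsr, mode & 3u, CBANK_BIT);        /* Mode<1:0> */
    vcsr = apply_field(vcsr, (mode >> 2) & 3u, SMM_BIT);   /* Mode<3:2> */
    vcsr = apply_field(vcsr, (mode >> 4) & 3u, CEM_BIT);   /* Mode<5:4> */
    return vcsr;
}
```

Because each field has its own "no change" code, one VCHGCR can update any subset of the three bits without first reading VCSR, which is the efficiency gain over a VMOV-based read-modify-write.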
VCINT Conditional interrupt to ARM7
Format
Assembler syntax
VCINT.cond #ICODE
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, execution stops and, if enabled, the ARM7 is interrupted.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
VISRC<vip>=1;
VIINS=[VCINT.cond#ICODE instruction];
VEPC=VPC;
if(VIMSK<vie>=1)signal ARM7 interrupt;
VP_STATE=VP_IDLE;
}
else VPC=VPC+4;
Exceptions
VCINT interrupt.
VCJOIN Conditional join with ARM7 task
Format
Assembler syntax
VCJOIN.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, execution stops and, if enabled, the ARM7 is interrupted.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
VISRC<vjp>=1;
VIINS=[VCJOIN.cond#Offset instruction];
VEPC=VPC;
if(VIMSK<vje>=1)signal ARM7 interrupt;
VP_STATE=VP_IDLE;
}
else VPC=VPC+4;
Exceptions
VCJOIN interrupt.
VCJSR conditional jump to subroutine
Format
Assembler syntax
VCJSR.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, jump to the subroutine. This is not a delayed branch.
If Cond is true, VPC+4 (the return address) is saved on the return address stack. Otherwise, execution continues from VPC+4.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
if(VSP<4:0>>15){
VISRC<RASO>=1;
signal ARM7 with RASO exception;
VP_STATE=VP_IDLE;
}else{
RSTACK[VSP<3:0>]=VPC+4;
VSP<4:0>=VSP<4:0>+1;
VPC=VPC+sex(Offset<22:0>*4);
}
}else VPC=VPC+4;
Exceptions
Return address stack overflow.
VCJSRI indirect conditional jump to subroutine
Format
Assembler syntax
VCJSRI.cond SRb
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, jump indirectly to the subroutine at the address in SRb. This is not a delayed branch.
If Cond is true, VPC+4 (the return address) is saved on the return address stack. Otherwise, execution continues from VPC+4.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
if(VSP<4:0>>15){
VISRC<RASO>=1;
signal ARM7 with RASO exception;
VP_STATE=VP_IDLE;
}else{
RSTACK[VSP<3:0>]=VPC+4;
VSP<4:0>=VSP<4:0>+1;
VPC=SRb<31:2>:b'00;
}
}else VPC=VPC+4;
Exceptions
Return address stack overflow.
VCMOV Conditional move
Format
Assembler syntax
VCMOV.dt Rd,Rb,cond
VCMOV.dt Rd,#IMM,cond
Where dt = {b, b9, h, w, f}, cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type does not support the 9-bit immediate operand.
Supported modes
D:S:M | V<-V | V<-S | V<-I | S<-S | S<-I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:
VR: vector register in the current bank
SR: scalar register
SY: synchronization register
VAC: vector accumulator register (see the VMOV description for the VAC register encoding)
D:S:M | ID<1:0>=00 | ID<1:0>=01 | ID<1:0>=10 | ID<1:0>=11 |
V<-V | VR<-VR | VR<-VAC | VAC<-VR | |
V<-S | VR<-SR | VAC<-SR | ||
V<-I | VR<-I | |||
S<-S | SR<-SR | |||
S<-I | SR<-I |
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
for(i=0;i<NumElem;i++)
Rd[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Exceptions
None.
Programming notes
This instruction is not affected by element masking; VCMOVM is affected by element masking.
The extended floating-point precision representation in the vector accumulator uses all 576 bits for the eight elements. Therefore, any move involving the vector accumulator registers must specify the .b9 data length.
VCMOVM Conditional move with element mask
Format
Assembler syntax
VCMOVM.dt Rd,Rb,cond
VCMOVM.dt Rd,#IMM,cond
Where dt = {b, b9, h, w, f}, cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type does not support the 9-bit immediate operand.
Supported modes
D:S:M | V<-V | V<-S | V<-I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:
VR: vector register in the current bank
SR: scalar register
VAC: vector accumulator register (see the VMOV description for the VAC register encoding)
D:S:M | ID<1:0>=00 | ID<1:0>=01 | ID<1:0>=10 | ID<1:0>=11 |
V<-V | VR<-VR | VR<-VAC | VAC<-VR | |
V<-S | VR<-SR | VAC<-SR | ||
V<-I | VR<-I | |||
S<-S | ||||
S<-I |
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un))
for(i=0;i<NumElem && MMASK[i];i++)
Rd[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Exceptions
None.
Programming notes
This instruction is affected by the VMMR element mask; VCMOV is not affected by element masking.
The extended floating-point precision representation in the vector accumulator uses all 576 bits for the eight elements. Therefore, any move involving the vector accumulator registers must specify the .b9 data length.
VCMPV Compare and set mask
Format
Assembler syntax
VCMPV.dt VRa,VRb,cond,mask
VCMPV.dt VRa,SRb,cond,mask
Where dt = {b, b9, h, w, f}, cond = {lt, eq, le, gt, ne, ge}, mask = {VGMR, VMMR}. If no mask is specified, VGMR is assumed.
Supported modes
D:S:M | M<-V@V | M<-V@S | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Explanation
Compares the contents of vector registers VRa and VRb element by element by performing a subtraction (VRa[i]-VRb[i]). If the result of the comparison matches the Cond field of the VCMPV instruction, bit i of VGMR (if K=0) or VMMR (if K=1) is set. For example, if the Cond field is less than (LT), bit VGMR[i] (or VMMR[i]) is set when VRa[i]<VRb[i].
Operation
for(i=0;i<NumElem;i++){
Bop[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
relationship[i]=Ra[i]?Bop[i];
if(K=1)
MMASK[i]=(relationship[i]=Cond)?True:False;
else
EMASK[i]=(relationship[i]=Cond)?True:False;
}
Exceptions
None.
Programming notes
This instruction is not affected by element masking.
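A C sketch of VCMPV on word elements is given below. It returns the mask as a bitmap in which bit i is set when the condition holds and cleared otherwise, matching `(relationship[i]=Cond)?True:False`. The enum names and `vcmpv_w` are hypothetical, and the difference is computed in 64 bits so the sketch compares exact values; how the hardware treats an overflowing subtraction is not stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Conditions accepted by VCMPV (names mirror the cond mnemonics). */
typedef enum { COND_LT, COND_EQ, COND_LE, COND_GT, COND_NE, COND_GE } cond_t;

/* Hypothetical model of VCMPV.w VRa,VRb,cond for up to 32 elements. */
uint32_t vcmpv_w(const int32_t *ra, const int32_t *rb, size_t n, cond_t c)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n && i < 32; i++) {
        int64_t diff = (int64_t)ra[i] - rb[i];   /* subtraction-based compare */
        int hit;
        switch (c) {
        case COND_LT: hit = diff < 0;  break;
        case COND_EQ: hit = diff == 0; break;
        case COND_LE: hit = diff <= 0; break;
        case COND_GT: hit = diff > 0;  break;
        case COND_NE: hit = diff != 0; break;
        default:      hit = diff >= 0; break;    /* COND_GE */
        }
        if (hit)
            mask |= 1u << i;                     /* set bit i, else leave 0 */
    }
    return mask;
}
```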
VCNTLZ count leading zeros
Format
Assembler syntax
VCNTLZ.dt VRd,VRb
VCNTLZ.dt SRd,SRb
Where dt = {b, b9, h, w}.
Supported modes
S | V<-V | S<-S | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Counts the number of leading zeros in each element of Rb; the counts are returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=number of leading zeroes(Rb[i]);
}
Exceptions
None.
Programming notes
If all bits of an element are zero, the result equals the element length (8, 9, 16, or 32 for byte, byte9, halfword, or word, respectively).
The leading zero count is inversely related to the element position index (when this instruction is used after a VCMPR instruction). To convert the count to an element position for a given data type, subtract the VCNTLZ result from NumElem.
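The per-element count, including the rule that an all-zero element yields the element length, can be sketched in C as follows. `clz_width` is a hypothetical name, and only the low `width` bits of the argument are examined, as one element of that size would be.

```c
#include <stdint.h>

/* Leading-zero count of the low `width` bits of x (8, 9, 16 or 32).
 * An all-zero element returns width, as the programming note states. */
unsigned clz_width(uint32_t x, unsigned width)
{
    unsigned n = 0;
    for (int bit = (int)width - 1; bit >= 0; bit--) {
        if (x & (1u << bit))
            break;          /* first 1 bit from the top ends the count */
        n++;
    }
    return n;
}
```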
VCOR Complement OR
Format
Assembler syntax
VCOR.dt VRd,VRa,VRb
VCOR.dt VRd,VRa,SRb
VCOR.dt VRd,VRa,#IMM
VCOR.dt SRd,SRa,SRb
VCOR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Performs a bitwise OR of the complement of Ra with Rb or the immediate operand, and returns the result to the destination register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=~Ra[i]<k>|Bop[i]<k>,k=for all bits in element i;
}
Exceptions
None.
VCRSR Conditional return from subroutine
Format
Assembler syntax
VCRSR.cond
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
If Cond is true, return from the subroutine. This is not a delayed branch.
If Cond is true, execution continues from the return address popped from the return address stack. Otherwise, execution continues from VPC+4.
Operation
If((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)){
if(VSP<4:0>=0){
VISRC<RASU>=1;
signal ARM7 with RASU exception;
VP_STATE=VP_IDLE;
}else{
VSP<4:0>=VSP<4:0>-1;
VPC=RSTACK[VSP<3:0>];
VPC<1:0>=b'00;
}
}else VPC=VPC+4;
Exceptions
Invalid instruction address, return address stack underflow.
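The return address stack manipulated by VCCS, VCJSR and VCJSRI (push) and by VCRSR (pop) can be modeled in C as below. This is a sketch under the pseudocode's assumptions: 16 entries indexed by VSP<3:0>, a 5-bit VSP, overflow when VSP exceeds 15 and underflow when VSP is 0. The type and function names are hypothetical, and returning `false` stands in for raising RASO or RASU.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t rstack[16];
    unsigned vsp;            /* VSP<4:0>, 0..16 */
} ras_t;

/* Push VPC+4 as the return address (VCCS/VCJSR/VCJSRI). */
bool ras_push(ras_t *r, uint32_t vpc)
{
    if (r->vsp > 15)                      /* return address stack overflow */
        return false;
    r->rstack[r->vsp & 0xFu] = vpc + 4;   /* RSTACK[VSP<3:0>]=VPC+4 */
    r->vsp++;
    return true;
}

/* Pop the return address into *vpc (VCRSR). */
bool ras_pop(ras_t *r, uint32_t *vpc)
{
    if (r->vsp == 0)                      /* return address stack underflow */
        return false;
    r->vsp--;
    *vpc = r->rstack[r->vsp & 0xFu] & ~3u;   /* force VPC<1:0>=b'00 */
    return true;
}
```

Sixteen nested calls therefore succeed and the seventeenth reports overflow.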
VCVTB9 byte9 data type conversion
Format
Assembler syntax
VCVTB9.md VRd,VRb
VCVTB9.md SRd,SRb
Where md = {bb9, b9h, hb9}.
Supported modes
S | V<-V | S<-S | ||
MD | bb9 | b9h | hb9 |
Explanation
Converts each element in Rb from byte to byte9 (bb9), from byte9 to halfword (b9h), or from halfword to byte9 (hb9).
Operation
if(md<1:0>=0){//bb9 for byte to byte9 conversion
VRd=VRb;
VRd<9i+8>=VRb<9i+7>,i=0 to 31(or 63 in VEC64 mode)}
else if(md<1:0>=2){//b9h for byte9 to halfword conversion
VRd=VRb;
VRd<18i+16:18i+9>=VRb<18i+8>,i=0 to 15(or 31 in VEC64 mode)}
else if(md<1:0>=3)//hb9 for halfword to byte9 conversion
VRd<18i+8>=VRb<18i+9>,i=0 to 15(or 31 in VEC64 mode)
else VRd=undefined;
Exceptions
None.
Programming notes
Before this instruction is used in b9h mode, the programmer must use a shuffle operation to adjust the vector register for the decrease in the number of elements. After this instruction is used in hb9 mode, the programmer must use an unshuffle operation to adjust the destination vector register for the increase in the number of elements. This instruction is not affected by element masking.
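The bb9 case (`VRd<9i+8>=VRb<9i+7>`) amounts to sign-extending each byte into its 9-bit lane. The C sketch below models one lane held in the low 9 bits of a `uint16_t`; `bb9_lane` and `int9_value` are hypothetical helper names.

```c
#include <stdint.h>

/* One lane of VCVTB9.bb9: bits <7:0> are kept and bit 8 is replaced
 * by bit 7, which sign-extends the byte to byte9. */
uint16_t bb9_lane(uint16_t lane9)
{
    uint16_t low = lane9 & 0xFFu;
    uint16_t b7 = (low >> 7) & 1u;
    return (uint16_t)(low | (b7 << 8));
}

/* Interpret a 9-bit lane as a two's-complement int9 value. */
int int9_value(uint16_t lane9)
{
    lane9 &= 0x1FFu;
    return (lane9 & 0x100u) ? (int)lane9 - 0x200 : (int)lane9;
}
```

After conversion, a byte 0x80 reads back as -128 when interpreted as int9, whereas before conversion bit 8 held whatever was left in the lane.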
VCVTFF floating-point to fixed-point conversion
Format
Assembler syntax
VCVTFF VRd,VRa,SRb
VCVTFF VRd,VRa,#IMM
VCVTFF SRd,SRa,SRb
VCVTFF SRd,SRa,#IMM
Supported modes
D:S:M | V<-V,S | V<-V,I | S<-S,S | S<-S,I |
Explanation
Converts the contents of vector/scalar register Ra from 32-bit floating-point format to a fixed-point real number in <X,Y> format, where the length of Y is specified by Rb (modulo 32) or by the IMM field, and the length of X is (32 - length of Y). X denotes the integer part and Y the fractional part. The result is stored in vector/scalar register Rd.
Operation
Y_size={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem;i++){
Rd[i]=convert to <32-Y_size,Y_size> format(Ra[i]);
}
Exceptions
Overflow.
Programming notes
This instruction supports only the word data length. Because the architecture does not support multiple data types in a register, this instruction does not use element masking. For integer data types, this instruction uses the round-toward-zero rounding mode.
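A single-element sketch of the conversion in C follows. It assumes the <X,Y> result is a signed two's-complement 32-bit value with Y fraction bits and that rounding is toward zero; `float_to_fixed` is a hypothetical name, and returning `false` stands in for the overflow exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert f to <32-Y,Y> fixed point, truncating toward zero. */
bool float_to_fixed(float f, unsigned y_size, int32_t *out)
{
    y_size %= 32;                                   /* Y_size = SRb % 32 */
    /* Scaling by 2^Y in double is exact for any float input. */
    double scaled = (double)f * (double)(1u << y_size);
    /* Reject values whose truncation does not fit in 32 bits (and NaN). */
    if (!(scaled > -2147483649.0 && scaled < 2147483648.0))
        return false;
    *out = (int32_t)scaled;    /* C conversion truncates toward zero */
    return true;
}
```

With Y=16, 1.5 becomes 98304 (1.5 x 65536); with Y=1, -2.75 scales to -5.5 and truncates to -5.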
VCVTIF integer to floating point conversion
Format
Assembler syntax
VCVTIF VRd,VRb
VCVTIF VRd,SRb
VCVTIF SRd,SRb
Supported modes
D:S:M | V<-V | V<-S | S<-S |
Explanation
Converts the contents of vector/scalar register Rb from int32 to the floating-point data type; the result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem;i++){
Rd[i]=convert to floating point format(Rb[i]);
}
Exceptions
None.
Programming notes
This instruction supports only the word data length. Because the architecture does not support multiple data types in a register, this instruction does not use element masking.
VD1CBR Decrement VCR1 and branch conditionally
Format
Assembler syntax
VD1CBR.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
Decrements VCR1 and branches if Cond is true and the decremented VCR1 is greater than zero. This is not a delayed branch.
Operation
VCR1=VCR1-1;
If((VCR1>0)&((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)))
VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exceptions
Invalid instruction address.
Programming notes
Note that VCR1 is decremented before the branch condition is checked. Executing this instruction when VCR1 is 0 effectively sets the loop count to 2^32-1.
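The decrement-then-test behavior, and why a starting value of 0 behaves like 2^32-1, can be sketched in C as below. The sketch assumes `VCR1>0` is an unsigned comparison, which is what makes the wrap from 0 to 0xFFFFFFFF keep the loop running; the function names are hypothetical, and VD2CBR and VD3CBR behave the same way with VCR2 and VCR3.

```c
#include <stdint.h>

/* One execution of VD1CBR with the condition already evaluated. */
uint32_t vd1cbr_step(uint32_t *vcr1, int cond_true, uint32_t vpc,
                     int32_t word_offset)
{
    *vcr1 = *vcr1 - 1;                  /* decrement first; 0 wraps */
    if (*vcr1 > 0 && cond_true)
        return vpc + (uint32_t)word_offset * 4u;   /* branch taken */
    return vpc + 4;                                /* fall through */
}

/* Taken branches for a loop closed by VD1CBR.un with initial count n:
 * n-1 for n >= 1, and 2^32-1 when n is 0 (unsigned wrap). */
uint32_t vd1cbr_taken_branches(uint32_t initial_vcr1)
{
    return initial_vcr1 - 1u;
}
```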
VD2CBR Decrement VCR2 and branch conditionally
Format
Assembler syntax
VD2CBR.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
Decrements VCR2 and branches if Cond is true and the decremented VCR2 is greater than zero. This is not a delayed branch.
Operation
VCR2=VCR2-1;
If((VCR2>0)&((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)))
VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exceptions
Invalid instruction address.
Programming notes
Note that VCR2 is decremented before the branch condition is checked. Executing this instruction when VCR2 is 0 effectively sets the loop count to 2^32-1.
VD3CBR Decrement VCR3 and branch conditionally
Format
Assembler syntax
VD3CBR.cond #Offset
Where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Explanation
Decrements VCR3 and branches if Cond is true and the decremented VCR3 is greater than zero. This is not a delayed branch.
Operation
VCR3=VCR3-1;
If((VCR3>0)&((Cond=VCSR[SO,GT,EQ,LT])|(Cond=un)))
VPC=VPC+sex(Offset<22:0>*4);
else VPC=VPC+4;
Exceptions
Invalid instruction address.
Programming notes
Note that VCR3 is decremented before the branch condition is checked. Executing this instruction when VCR3 is 0 effectively sets the loop count to 2^32-1.
VDIV2N Divide by 2^n
Format
Assembler syntax
VDIV2N.dt VRd,VRa,SRb
VDIV2N.dt VRd,VRa,#IMM
VDIV2N.dt SRd,SRa,SRb
VDIV2N.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Divides the contents of vector/scalar register Ra by 2^n, where n is the positive integer contained in scalar register Rb or in the IMM field, and stores the result in vector/scalar register Rd. This instruction uses truncation (round toward zero) as its rounding mode.
Operation
N={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i]/2^N;
}
Exceptions
None.
Programming notes
Note that N is a 5-bit value taken from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a value of N less than or equal to the precision of the data length. If N is greater than the precision of the specified data length, the element is filled with the sign bit. This instruction uses the round-toward-zero rounding mode.
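For word elements, the truncating division can be sketched in C as follows. C integer division already truncates toward zero, whereas an arithmetic right shift of a negative value rounds toward minus infinity (for example, -7 >> 1 gives -4, not -3). `div2n_trunc` is a hypothetical name, and the narrower data types are not modeled.

```c
#include <stdint.h>

/* One VDIV2N.w element: a / 2^n, truncated toward zero. */
int32_t div2n_trunc(int32_t a, unsigned n)
{
    n &= 31u;                                   /* 5-bit N */
    return (int32_t)((int64_t)a / ((int64_t)1 << n));
}
```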
VDIV2N.F Floating-point divide by 2^n
Format
Assembler syntax
VDIV2N.f VRd,VRa,SRb
VDIV2N.f VRd,VRa,#IMM
VDIV2N.f SRd,SRa,SRb
VDIV2N.f SRd,SRa,#IMM
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
Explanation
Divides the contents of vector/scalar register Ra by 2^n, where n is the positive integer contained in scalar register Rb or in the IMM field, and stores the result in vector/scalar register Rd.
Operation
N={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i]/2^N;
}
Exceptions
None.
Programming notes
Note that N is a 5-bit value taken from SRb or IMM<4:0>.
VDIVI Non-restoring divide initialization
Format
Assembler syntax
VDIVI.ds VRb
VDIVI.ds SRb
Where ds = {b, b9, h, w}.
Supported modes
S | VRb | SRb | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Performs the initialization step of a non-restoring signed integer division. The dividend is a double-precision signed integer in the accumulator. If the dividend is a single-precision number, it must be sign-extended to double precision and stored in VAC0H and VAC0L. The divisor is the single-precision signed integer in Rb.
If the sign of the dividend is the same as the sign of the divisor, Rb is subtracted from the accumulator high part. Otherwise, Rb is added to the accumulator high part.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb};
if(VAC0H[i]<msb>=Bop[i]<msb>)
VAC0H[i]=VAC0H[i]-Bop[i];
else
VAC0H[i]=VAC0H[i]+Bop[i];
}
Exceptions
None.
Programming notes
The programmer is responsible for checking for overflow and division-by-zero conditions in the division steps.
VDIVS Non-restoring divide step
Format
Assembler syntax
VDIVS.ds VRb
VDIVS.ds SRb
Where ds = {b, b9, h, w}.
Supported modes
S | VRb | SRb | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Performs one iteration step of a non-restoring signed division. This instruction must be executed as many times as the data length (for example, 8 times for int8, 9 for int9, 16 for int16, and 32 for the int32 data type). The VDIVI instruction must be used once before the division steps to generate the initial partial remainder in the accumulator. The divisor is the single-precision signed integer in Rb. Each step extracts one quotient bit and shifts it into the least significant bit of the accumulator.
If the sign of the partial remainder in the accumulator is the same as the sign of the divisor in Rb, Rb is subtracted from the accumulator high part. If they differ, Rb is added to the accumulator high part.
If the sign of the resulting partial remainder (the result of the addition or subtraction) is the same as the sign of the divisor, the quotient bit is 1; otherwise, the quotient bit is 0. The accumulator is shifted left by one bit position and the vacated bit is filled with the quotient bit.
At the end of the division steps, the remainder is in the accumulator high part and the quotient is in the accumulator low part. The quotient is in 1's complement form.
Operation
VESL Element shift left by one
Format
Assembler syntax
VESL.dt SRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | SRb | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Shifts the elements of vector register Ra left by one position, filling the vacated position from scalar register Rb. The leftmost element, which is shifted out, is returned in scalar register Rc; the other elements are returned in vector register Rd.
Operation
VRd[0]=SRb;
for(i=1;i<NumElem;i++)
VRd[i]=VRa[i-1];
SRc=VRa[NumElem-1];
Exceptions
None.
Programming notes
This instruction is not affected by element masking.
VESR Element shift right by one
Format
Assembler syntax
VESR.dt SRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | SRb | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Shifts the elements of vector register Ra right by one position, filling the vacated position from scalar register Rb. The rightmost element, which is shifted out, is returned in scalar register Rc; the other elements are returned in vector register Rd.
Operation
SRc=VRa[0];
for(i=0;i<NumElem-1;i++)
VRd[i]=VRa[i+1];
VRd[NumElem-1]=SRb;
Exceptions
None.
Programming notes
This instruction is not affected by element masking.
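A C sketch of both element shifts on word elements is given below. Following the operations above, element 0 is the rightmost, so VESL moves elements toward higher indices and VESR toward lower ones. The function names are hypothetical, the return value plays the role of SRc, and the loop directions are chosen so that the destination may be the same array as the source.

```c
#include <stddef.h>
#include <stdint.h>

/* VESL: vrd[i] = vra[i-1], vrd[0] = srb; returns vra[n-1]. n >= 1. */
int32_t vesl(int32_t *vrd, const int32_t *vra, int32_t srb, size_t n)
{
    int32_t out = vra[n - 1];
    for (size_t i = n - 1; i > 0; i--)   /* descending: safe in place */
        vrd[i] = vra[i - 1];
    vrd[0] = srb;
    return out;
}

/* VESR: vrd[i] = vra[i+1], vrd[n-1] = srb; returns vra[0]. n >= 1. */
int32_t vesr(int32_t *vrd, const int32_t *vra, int32_t srb, size_t n)
{
    int32_t out = vra[0];
    for (size_t i = 0; i + 1 < n; i++)   /* ascending: safe in place */
        vrd[i] = vra[i + 1];
    vrd[n - 1] = srb;
    return out;
}
```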
VEXTRT extract an element
Format
Assembler syntax
VEXTRT.dt SRd,VRa,SRb
VEXTRT.dt SRd,VRa,#IMM
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | S<-S | S<-I | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Extracts an element from vector register Ra and stores it in scalar register Rd. The element index is specified by scalar register Rb or the IMM field.
Operation
index32={SRb%32‖IMM<4:0>};
index64={SRb%64‖IMM<5:0>};
index=(VCSR<vec64>)?index64:index32;
SRd=VRa[index];
Exceptions
None.
Programming notes
This instruction is not affected by element masking.
VEXTSGN2 Extract sign (1, -1)
Format
Assembler syntax
VEXTSGN2.dt VRd,VRa
VEXTSGN2.dt SRd,SRa
Where dt = {b, b9, h, w}.
Supported modes
S | V<-V | S<-S | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Computes the sign value of each element of vector/scalar register Ra; the result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=(Ra[i]<0)?-1:1;
}
Exceptions
None.
VEXTSGN3 Extract sign (1, 0, -1)
Format
Assembler syntax
VEXTSGN3.dt VRd,VRa
VEXTSGN3.dt SRd,SRa
Where dt = {b, b9, h, w}.
Supported modes
S | V<-V | S<-S | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Computes the sign value of each element of vector/scalar register Ra; the result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
if(Ra[i]>0) Rd[i]=1;
else if(Ra[i]<0) Rd[i]=-1;
else Rd[i]=0;
}
Exceptions
None.
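The two sign-extraction variants differ only in how zero is treated: VEXTSGN2 maps it to 1, VEXTSGN3 to 0. A per-element C sketch with hypothetical function names follows.

```c
#include <stdint.h>

/* VEXTSGN2: two-valued sign; zero is treated as positive. */
int32_t extsgn2(int32_t x) { return (x < 0) ? -1 : 1; }

/* VEXTSGN3: three-valued sign. (x > 0) - (x < 0) yields 1, 0 or -1
 * without an explicit branch and cannot overflow. */
int32_t extsgn3(int32_t x) { return (x > 0) - (x < 0); }
```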
VINSRT insert an element
Format
Assembler syntax
VINSRT.dt VRd,SRa,SRb
VINSRT.dt VRd,SRa,#IMM
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-S | V<-I | |||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Inserts the element in scalar register Ra into vector register Rd at the index specified by scalar register Rb or the IMM field.
Operation
index32={SRb%32‖IMM<4:0>};
index64={SRb%64‖IMM<5:0>};
index=(VCSR<vec64>)?index64:index32;
VRd[index]=SRa;
Exceptions
None.
Programming notes
This instruction is not affected by element masking.
VL load
Format
Assembler syntax
VL.lt Rd,SRb,SRi
VL.lt Rd,SRb,#IMM
VL.lt Rd,SRb+,SRi
VL.lt Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64}, Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and .64 and VRAd cannot be specified together. For cache-off loads, use VLOFF.
Explanation
Loads a vector register in the current or alternate bank, or a scalar register.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd = see the table below:
LT | Load operation |
.b | SR d<7:0>=BYTE[EA] |
.bz9 | SR d<8:0>=zex BYTE[EA] |
.bs9 | SR d<8:0>=sex BYTE[EA] |
.h | SR d<15:0>=HALF[EA] |
.w | SR d<31:0>=WORD[EA] |
.4 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 3 |
.8 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 7 |
.16 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 15 |
.32 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 31 |
.64 | VR 0d<9i+8:9i>=sex BYTE[EA+i],i=0 to 31 VR 1d<9i+8:9i>=sex BYTE[EA+32+i],i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
VLCB Load from circular buffer
Format
Assembler syntax
VLCB.lt Rd,SRb,SRi
VLCB.lt Rd,SRb,#IMM
VLCB.lt Rd,SRb+,SRi
VLCB.lt Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4,8,16,32,64}, Rd = {VRd, VRAd,
SRd}. Note that .b and .bs9 specify the same operation, and .64 and VRAd cannot be specified together. For cache-off loads, use VLCBOFF.
Explanation
Loads a vector or scalar register from the circular buffer defined by the BEGIN pointer in SRb+1 and the END pointer in SRb+2 (the two scalar registers following SRb).
Before the address update and the load operation, the effective address is adjusted if it is greater than the END address. In addition, for .h and .w scalar loads, the circular buffer boundaries must be aligned to halfword and word boundaries, respectively.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
BEGIN=SRb+1;
END=SRb+2;
cbsize=END-BEGIN;
if(EA>END)EA=BEGIN+(EA-END);
if(A=1)SRb=EA;
Rd = see the following table:
LT | Load operation |
.bz9 | SR d<8:0>=zex BYTE[EA] |
.bs9 | SR d<8:0>=sex BYTE[EA] |
.h | SR d<15:0>=HALF[EA] |
.w | SR d<31:0>=WORD[EA] |
.4 | VR d<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i],i=0 to 3 |
.8 | VR d<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i],i=0 to 7 |
.16 | VR d<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i],i=0 to 15 |
.32 | VR d<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i],i=0 to 31 |
.64 | VR 0d<9i+8:9i>=sex BYTE[(EA+i>END)?EA+i-cbsize:EA+i],i=0 to 31 VR 1d<9i+8:9i>=sex BYTE[(EA+32+i>END)?EA+32+i-cbsize:EA+32+i],i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
The programmer must ensure that the following condition holds for this instruction to work as expected:
BEGIN < EA < 2*END-BEGIN
That is, EA > BEGIN and EA-END < END-BEGIN.
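The VLCB address arithmetic can be sketched in C as below: the effective address wraps once back toward BEGIN when it passes END, and each byte of a vector load wraps individually by `cbsize`. The function names and the `begin`/`end` parameters (standing in for SRb+1 and SRb+2) are hypothetical. The sketch mirrors the pseudocode literally, so an address exactly equal to END is not wrapped; a single wrap is enough only under the condition stated above.

```c
#include <stdint.h>

/* EA computation with the one-time wrap: if(EA>END)EA=BEGIN+(EA-END). */
uint32_t vlcb_ea(uint32_t srb, uint32_t begin, uint32_t end, int32_t offset)
{
    uint32_t ea = srb + (uint32_t)offset;
    if (ea > end)
        ea = begin + (ea - end);
    return ea;
}

/* Address of byte i of a vector load: (EA+i>END)?EA+i-cbsize:EA+i. */
uint32_t vlcb_byte_addr(uint32_t ea, uint32_t i, uint32_t begin, uint32_t end)
{
    uint32_t cbsize = end - begin;
    uint32_t a = ea + i;
    return (a > end) ? a - cbsize : a;
}
```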
VLD double load
Format
Assembler syntax
VLD.lt Rd,SRb,SRi
VLD.lt Rd,SRb,#IMM
VLD.lt Rd,SRb+,SRi
VLD.lt Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4,8,16,32,64}, Rd = {VRd, VRAd,
SRd}. Note that .b and .bs9 specify the same operation, and .64 and VRAd cannot be specified together. For cache-off loads, use VLDOFF.
Explanation
Loads two vector registers in the current or alternate bank, or two scalar registers.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd:Rd+1 = see the table below:
LT | Load operation |
.bz9 | SR d<8:0>=zex BYTE[EA] SR d+1<8:0>=zex BYTE[EA+1] |
.bs9 | SR d<8:0>=sex BYTE[EA] SR d+1<8:0>=sex BYTE[EA+1] |
.h | SR d<15:0>=HALF[EA] SR d+1<15:0>=HALF[EA+2] |
.w | SR d<31:0>=WORD[EA] SR d+1<31:0>=WORD[EA+4] |
.4 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 3 VR d+1<9i+8:9i>=sex BYTE[EA+4+i],i=0 to 3 |
.8 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 7 VR d+1<9i+8:9i>=sex BYTE[EA+8+i],i=0 to 7 |
.16 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 15 VR d+1<9i+8:9i>=sex BYTE[EA+16+i],i=0 to 15 |
.32 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 31 VR d+1<9i+8:9i>=sex BYTE[EA+32+i],i=0 to 31 |
.64 | VR 0d<9i+8:9i>=sex BYTE[EA+i],i=0 to 31 VR 1d<9i+8:9i>=sex BYTE[EA+32+i],i=0 to 31 VR 0d+1<9i+8:9i>=sex BYTE[EA+64+i],i=0 to 31 VR 1d+1<9i+8:9i>=sex BYTE[EA+96+i],i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
VLI Load immediate
Format
Assembler syntax
VLI.dt VRd,#IMM
VLI.dt SRd,#IMM
Where dt = {b, b9, h, w, f}.
Explanation
Loads an immediate value into a scalar or vector register.
For a scalar register load, a byte, byte9, halfword, or word is loaded according to the data type. For the byte, byte9, and halfword data types, the bytes (byte9s) that are not affected remain unchanged.
Operation
Rd = see the table below:
DT | Scalar load | Vector load |
.i8 | SR d<7:0>=IMM<7:0> | VR d=32 int8 elements |
.i9 | SR d<8:0>=IMM<8:0> | VR d=32 int9 elements |
.i16 | SR d<15:0>=IMM<15:0> | VR d=16 int16 elements |
.i32 | SR d<31:0>=sex IMM<18:0> | VR d=8 int32 elements |
.f | SR d<31>=IMM<18>(sign) SR d<30:23>=IMM<17:10>(exponent) SR d<22:13>=IMM<9:0>(mantissa) SR d<12:0>=zeroes | VR d=8 float elements |
Exceptions
None.
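The .f row packs a 19-bit immediate into an IEEE-754 single-precision pattern: sign from IMM<18>, the 8-bit exponent from IMM<17:10>, the top 10 mantissa bits from IMM<9:0>, and the low 13 bits cleared. The C sketch below reproduces that packing; the function names are hypothetical, and `memcpy` is used to reinterpret the bits without violating aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Expand the 19-bit VLI.f immediate into single-precision bits. */
uint32_t vli_f_bits(uint32_t imm)
{
    uint32_t sign = (imm >> 18) & 0x1u;    /* IMM<18>    */
    uint32_t exp  = (imm >> 10) & 0xFFu;   /* IMM<17:10> */
    uint32_t man  = imm & 0x3FFu;          /* IMM<9:0>   */
    return (sign << 31) | (exp << 23) | (man << 13);
}

float vli_f_value(uint32_t imm)
{
    uint32_t bits = vli_f_bits(imm);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, an immediate of 0x1FC00 (exponent 127, mantissa 0) expands to 0x3F800000, which is 1.0f.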
VLQ Quad load
Format
Assembler syntax
VLQ.lt Rd,SRb,SRi
VLQ.lt Rd,SRb,#IMM
VLQ.lt Rd,SRb+,SRi
VLQ.lt Rd,SRb+,#IMM
Where lt = {b, bz9, bs9, h, w, 4,8,16,32,64}, Rd = {VRd, VRAd,
SRd}. Note that .b and .bs9 specify the same operation, and .64 and VRAd cannot be specified together. For cache-off loads, use VLQOFF.
Explanation
Loads four vector registers in the current or alternate bank, or four scalar registers.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd:Rd+1:Rd+2:Rd+3 = see the table below:
LT | Load operation |
.bz9 | SR d<8:0>=zex BYTE[EA] SR d+1<8:0>=zex BYTE[EA+1] SR d+2<8:0>=zex BYTE[EA+2] SR d+3<8:0>=zex BYTE[EA+3] |
.bs9 | SR d<8:0>=sex BYTE[EA] SR d+1<8:0>=sex BYTE[EA+1] SR d+2<8:0>=sex BYTE[EA+2] SR d+3<8:0>=sex BYTE[EA+3] |
.h | SR d<15:0>=HALF[EA] SR d+1<15:0>=HALF[EA+2] SR d+2<15:0>=HALF[EA+4] SR d+3<15:0>=HALF[EA+6] |
.w | SR d<31:0>=WORD[EA] SR d+1<31:0>=WORD[EA+4] SR d+2<31:0>=WORD[EA+8] SR d+3<31:0>=WORD[EA+12] |
.4 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 3 VR d+1<9i+8:9i>=sex BYTE[EA+4+i],i=0 to 3 VR d+2<9i+8:9i>=sex BYTE[EA+8+i],i=0 to 3 VR d+3<9i+8:9i>=sex BYTE[EA+12+i],i=0 to 3 |
.8 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 7 VR d+1<9i+8:9i>=sex BYTE[EA+8+i],i=0 to 7 VR d+2<9i+8:9i>=sex BYTE[EA+16+i],i=0 to 7 VR d+3<9i+8:9i>=sex BYTE[EA+24+i],i=0 to 7 |
.16 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 15 VR d+1<9i+8:9i>=sex BYTE[EA+16+i],i=0 to 15 VR d+2<9i+8:9i>=sex BYTE[EA+32+i],i=0 to 15 VR d+3<9i+8:9i>=sex BYTE[EA+48+i],i=0 to 15 |
.32 | VR d<9i+8:9i>=sex BYTE[EA+i],i=0 to 31 VR d+1<9i+8:9i>=sex BYTE[EA+32+i],i=0 to 31 VR d+2<9i+8:9i>=sex BYTE[EA+64+i],i=0 to 31 VR d+3<9i+8:9i>=sex BYTE[EA+96+i],i=0 to 31 |
.64 | VR 0d<9i+8:9i>=sex BYTE[EA+i],i=0 to 31 VR 1d<9i+8:9i>=sex BYTE[EA+32+i],i=0 to 31 VR 0d+1<9i+8:9i>=sex BYTE[EA+64+i],i=0 to 31 VR 1d+1<9i+8:9i>=sex BYTE[EA+96+i],i=0 to 31 VR 0d+2<9i+8:9i>=sex BYTE[EA+128+i],i=0 to 31 VR 1d+2<9i+8:9i>=sex BYTE[EA+160+i],i=0 to 31 VR 0d+3<9i+8:9i>=sex BYTE[EA+192+i],i=0 to 31 VR 1d+3<9i+8:9i>=sex BYTE[EA+224+i],i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
VLR Reverse load
Format
Assembler syntax
VLR.lt Rd,SRb,SRi
VLR.lt Rd,SRb,#IMM
VLR.lt Rd,SRb+,SRi
VLR.lt Rd,SRb+,#IMM
Where lt = {4, 8, 16, 32, 64}, Rd = {VRd, VRAd}. Note that .64 and VRAd cannot be specified together. For cache-off loads, use VLROFF.
Explanation
Loads the elements of a vector register in reverse order. This instruction does not support a scalar destination register.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Rd = see the table below:
LT | Load operation |
.4 | VR d[31-i]<8:0>=sex BYTE[EA+i],i=0 to 3 |
.8 | VR d[31-i]<8:0>=sex BYTE[EA+i],i=0 to 7 |
.16 | VR d[31-i]<8:0>=sex BYTE[EA+i],i=0 to 15 |
.32 | VR d[31-i]<8:0>=sex BYTE[EA+i],i=0 to 31 |
.64 | VR 0d[31-i]<8:0>=sex BYTE[EA+32+i],i=0 to 31 VR 1d[31-i]<8:0>=sex BYTE[EA+i],i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
VLSL Logical Shift Left
Format
Assembler syntax
VLSL.dt VRd,VRa,SRb
VLSL.dt VRd,VRa,#IMM
VLSL.dt SRd,SRa,SRb
VLSL.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Explanation
Shifts each element of vector/scalar register Ra logically left, filling the least significant bit (LSB) positions with zeros. The shift amount is given by scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.
Operation
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i]<<shift_amount;
}
Exceptions
None.
Programming notes
Note that shift_amount is a 5-bit value taken from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount less than or equal to the number of bits in the data length. If the shift amount is greater than the specified data length, the element is filled with zeros.
VLSR Logical Shift Right
Format
Assembler syntax
VLSR.dt VRd,VRa,SRb
VLSR.dt VRd,VRa,#IMM
VLSR.dt SRd,SRa,SRb
VLSR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each element of vector/scalar register Ra is logically shifted right, and zeros are filled into the most significant bit (MSB) positions. The shift amount is given by scalar register Rb or by the IMM field. The result is stored in vector/scalar register Rd.
Operation
shift_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i] zero>> shift_amount;
}
Exceptions
None.
Programming notes
Note that shift_amount is taken as a five-bit value from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for specifying a shift amount no larger than the number of bits in the data length. If the shift amount is greater than the data length, the element is filled with zeros.
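The five-bit shift amount and zero-fill behaviour of VLSL and VLSR can be illustrated on a single 16-bit (halfword) lane. This is a sketch, not the hardware datapath. The function names are hypothetical, and only the SRb form is modelled (SRb % 32).

```c
#include <stdint.h>

/* Sketch of one halfword (16-bit) lane of VLSL.h and VLSR.h.
   Only five bits of the shift operand are used (SRb % 32);
   amounts of 16 or more shift every bit out, leaving zero. */
uint16_t vlsl_h_lane(uint16_t a, uint32_t srb)
{
    uint32_t amount = srb % 32;  /* five-bit shift amount */
    return (amount >= 16) ? 0 : (uint16_t)((uint32_t)a << amount);
}

uint16_t vlsr_h_lane(uint16_t a, uint32_t srb)
{
    uint32_t amount = srb % 32;
    /* logical shift: zeros enter from the MSB side */
    return (amount >= 16) ? 0 : (uint16_t)(a >> amount);
}
```

Because only five bits are kept, an SRb value of 33 behaves like a shift by 1.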
VLWS stride load
Format
Assembler syntax
VLWS.dt Rd,SRb,SRi
VLWS.dt Rd,SRb,#IMM
VLWS.dt Rd,SRb+,SRi
VLWS.dt Rd,SRb+,#IMM
Where dt = {4, 8, 16, 32} and Rd = {VRd, VRAd}. Note that the .64 mode is not supported; use VL instead. For cache-off loads, use VLWSOFF.
Description
Loads 32 bytes from memory into vector register VRd. The load starts at the effective address and uses scalar register SRb+1 as the stride control register.
LT specifies the block size, which is the number of consecutive bytes loaded for each block. SRb+1 specifies the stride, which is the distance between the starts of two consecutive blocks.
The stride must be equal to or greater than the block size. EA must be aligned to the data length. Both the stride and the block size must be multiples of the data length.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Block_size={4‖8‖16‖32};
Stride=SRb+1<31:0>;
for(i=0;i<VECSIZE/Block_size;i++)
for(j=0;j<Block_size;j++)
VRd[i*Block_size+j]<8:0>=sex BYTE[EA+i*Stride+j];
Exceptions
Invalid data address, unaligned access.
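The nested loop of the VLWS operation above gathers fixed-size blocks at a regular stride. The following C sketch makes several assumptions for illustration. VECSIZE is taken to be 32 elements, and each nine-bit element is held in `int16_t`. `vlws_model` is a hypothetical name, and EA and the stride are passed in directly.

```c
#include <stdint.h>

#define VECSIZE 32  /* assumed number of byte elements per vector register */

/* Hypothetical model of VLWS: VECSIZE/block_size blocks are loaded, each of
   block_size consecutive bytes; block i starts at ea + i*stride. Each byte
   is sign-extended into its element. */
void vlws_model(int16_t vrd[VECSIZE], const uint8_t *mem, uint32_t ea,
                int block_size, uint32_t stride)
{
    for (int i = 0; i < VECSIZE / block_size; i++)
        for (int j = 0; j < block_size; j++)
            vrd[i * block_size + j] =
                (int16_t)(int8_t)mem[ea + (uint32_t)i * stride + (uint32_t)j];
}
```

Consider a block size of 8 and a stride of 16. The sketch then reads bytes 0 to 7, 16 to 23, 32 to 39 and 48 to 55, which is a typical pattern for picking columns out of a row-major image.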
VMAC multiply and accumulate
Format
Assembler syntax
VMAC.dt VRa,VRb
VMAC.dt VRa,SRb
VMAC.dt VRa,#IMM
VMAC.dt SRa,SRb
VMAC.dt SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) | float(f) |
Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each intermediate result is added to the corresponding double-precision element of the vector accumulator, and each double-precision sum is stored in the vector accumulator.
Ra and Rb use the specified data type. VAC uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in VACH.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Aop[i]={VRa[i]‖SRa};
Bop[i]={VRb[i]‖SRb};
if(dt==float)VACL[i]=Aop[i]*Bop[i]+VACL[i];
else VACH[i]:VACL[i]=Aop[i]*Bop[i]+VACH[i]:VACL[i];
}
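The split of a double-precision accumulator element into VACH (high half) and VACL (low half) can be made concrete for one int16 lane. This is a sketch under stated assumptions. The function name is hypothetical, arithmetic wraps modulo 2^32, and the overflow exception listed below is not modelled.

```c
#include <stdint.h>

/* Sketch of one int16 lane of VMAC.h. The 32-bit accumulator element is
   held as two 16-bit halves, VACH (high) and VACL (low). The 32-bit product
   is added modulo 2^32; overflow detection is not modelled. */
void vmac_h_lane(int16_t a, int16_t b, uint16_t *vach, uint16_t *vacl)
{
    uint32_t acc = ((uint32_t)*vach << 16) | *vacl;  /* VACH:VACL */
    acc += (uint32_t)((int32_t)a * (int32_t)b);      /* double-precision product */
    *vach = (uint16_t)(acc >> 16);
    *vacl = (uint16_t)acc;
}
```

Starting from zero, accumulating 1000 × 1000 leaves VACH:VACL = 0x000F:0x4240 (1,000,000). This is a value that would not fit in a single 16-bit element.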
Exceptions
Overflow, invalid floating-point operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMACF fractional multiply and accumulate
Format
Assembler syntax
VMACF.dt VRa,VRb
VMACF.dt VRa,SRb
VMACF.dt VRa,#IMM
VMACF.dt SRa,SRb
VMACF.dt SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, which is shifted left by one bit. Each shifted intermediate result is added to the corresponding double-precision element of the vector accumulator, and each double-precision sum is stored in the vector accumulator.
VRa and Rb use the specified data type. VAC uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in VACH.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
VACH[i]:VACL[i]=((VRa[i]*Bop[i])<<1)+VACH[i]:VACL[i];
}
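The one-bit left shift in VMACF is what fixed-point code needs when the operands are read as signed fractions. Under that reading, two Q15 values multiply to a Q30 product, and doubling it gives Q31. The Q15/Q31 interpretation is our assumption, since the source only specifies the shift. The sketch below passes the 32-bit VACH:VACL pair as a single `uint32_t`, uses a hypothetical function name, and does not model overflow detection.

```c
#include <stdint.h>

/* Sketch of one int16 lane of VMACF.h, reading operands as Q15 fractions:
   the 32-bit product is doubled (<<1, giving Q31) and added to the 32-bit
   accumulator VACH:VACL. Arithmetic wraps modulo 2^32; overflow is not
   modelled. */
uint32_t vmacf_h_lane(int16_t a, int16_t b, uint32_t acc)
{
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);
    return acc + (product << 1);
}
```

For example, 0.5 × 0.5 in Q15 (0x4000 × 0x4000) accumulates 0x20000000, which is 0.25 in Q31.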
Exceptions
Overflow.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMACL multiply and accumulate low
Format
Assembler syntax
VMACL.dt VRd,VRa,VRb
VMACL.dt VRd,VRa,SRb
VMACL.dt VRd,VRa,#IMM
VMACL.dt SRd,SRa,SRb
VMACL.dt SRd,SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) | float(f) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each intermediate result is added to the corresponding double-precision element of the vector accumulator. Each double-precision sum is stored in the vector accumulator, and its low part is returned to destination register VRd.
VRa and Rb use the specified data type. VAC uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in VACH.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb};
if(dt==float)VACL[i]=VRa[i]*Bop[i]+VACL[i];
else VACH[i]:VACL[i]=VRa[i]*Bop[i]+VACH[i]:VACL[i];
VRd[i]=VACL[i];
}
Exceptions
Overflow, invalid floating-point operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMAD multiply and add
Format
Assembler syntax
VMAD.dt VRc,VRd,VRa,VRb
VMAD.dt SRc,SRd,SRa,SRb
Where dt = {b, h, w}.
Supported modes
S | VR | SR | ||
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each intermediate result is added to the corresponding element of Rc, sign-extended to double precision. Each double-precision sum is stored in the destination register pair Rd+1:Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Aop[i]={VRa[i]‖SRa};
Bop[i]={VRb[i]‖SRb};
Cop[i]={VRc[i]‖SRc};
Rd+1[i]:Rd[i]=Aop[i]*Bop[i]+sex_dp(Cop[i]);
}
Exceptions
None.
VMADL multiply and add low
Format
Assembler syntax
VMADL.dt VRc,VRd,VRa,VRb
VMADL.dt SRc,SRd,SRa,SRb
Where dt = {b, h, w, f}.
Supported modes
S | VR | SR | ||
DS | int8(b) | float(f) | int16(h) | int32(w) |
Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each intermediate result is added to the corresponding element of Rc. The low part of each double-precision sum is returned to destination register Rd.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Aop[i]={VRa[i]‖SRa};
Bop[i]={VRb[i]‖SRb};
Cop[i]={VRc[i]‖SRc};
if(dt==float)Lo[i]=Aop[i]*Bop[i]+Cop[i];
else Hi[i]:Lo[i]=Aop[i]*Bop[i]+sex_dp(Cop[i]);
Rd[i]=Lo[i];
}
Exceptions
Overflow, invalid floating-point operand.
VMAS multiply and subtract from accumulator
Format
Assembler syntax
VMAS.dt VRa,VRb
VMAS.dt VRa,SRb
VMAS.dt VRa,#IMM
VMAS.dt SRa,SRb
VMAS.dt SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) | float(f) |
Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each intermediate result is subtracted from the corresponding double-precision element of the vector accumulator, and each double-precision difference is stored in the vector accumulator.
Ra and Rb use the specified data type. VAC uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in VACH.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb};
if(dt==float)VACL[i]=VACL[i]-VRa[i]*Bop[i];
else VACH[i]:VACL[i]=VACH[i]:VACL[i]-VRa[i]*Bop[i];
}
Exceptions
Overflow, invalid floating-point operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMASF fractional multiply and subtract from accumulator
Format
Assembler syntax
VMASF.dt VRa,VRb
VMASF.dt VRa,SRb
VMASF.dt VRa,#IMM
VMASF.dt SRa,SRb
VMASF.dt SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, which is shifted left by one bit. Each shifted intermediate result is subtracted from the corresponding double-precision element of the vector accumulator, and each double-precision difference is stored in the vector accumulator.
VRa and Rb use the specified data type. VAC uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in VACH.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
VACH[i]:VACL[i]=VACH[i]:VACL[i]-((VRa[i]*Bop[i])<<1);
}
Exceptions
Overflow.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMASL low multiply and subtract from accumulator
Format
Assembler syntax
VMASL.dt VRd,VRa,VRb
VMASL.dt VRd,VRa,SRb
VMASL.dt VRd,VRa,#IMM
VMASL.dt SRd,SRa,SRb
VMASL.dt SRd,SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) | float(f) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. Each intermediate result is subtracted from the corresponding double-precision element of the vector accumulator. Each double-precision difference is stored in the vector accumulator, and its low part is stored to destination register VRd.
VRa and Rb use the specified data type. VAC uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in VACH.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb};
if(dt==float)VACL[i]=VACL[i]-VRa[i]*Bop[i];
else VACH[i]:VACL[i]=VACH[i]:VACL[i]-VRa[i]*Bop[i];
VRd[i]=VACL[i];
}
Exceptions
Overflow, invalid floating-point operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMAXE pairwise maximum and exchange
Format
Assembler syntax
VMAXE.dt VRd,VRb
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M | V<-V | ||||
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Description
VRa should be the same register as VRb; if they differ, the result is undefined.
Each even/odd pair of data elements in vector register Rb is compared. For each pair, the larger value is stored in the even position of vector register Rd, and the smaller value in the odd position.
Operation
for(i=0;i<NumElem && EMASK[i];i=i+2){
VRd[i]=(VRb[i]>VRb[i+1])?VRb[i]:VRb[i+1];
VRd[i+1]=(VRb[i]>VRb[i+1])?VRb[i+1]:VRb[i];
}
Exceptions
None.
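The pairwise exchange in VMAXE can be sketched on an array of int32 elements (the .w case). The function name is hypothetical, and element masking is not modelled. Because both inputs of a pair are read before either output is written, the sketch also works when `vrd` and `vrb` are the same array.

```c
#include <stdint.h>

/* Sketch of VMAXE.w: for each even/odd pair of vrb, the larger value goes
   to the even slot of vrd and the smaller value to the odd slot. */
void vmaxe_model(int32_t *vrd, const int32_t *vrb, int num_elem)
{
    for (int i = 0; i + 1 < num_elem; i += 2) {
        int32_t x = vrb[i], y = vrb[i + 1];  /* read both before writing */
        vrd[i]     = (x > y) ? x : y;
        vrd[i + 1] = (x > y) ? y : x;
    }
}
```

Applied to {3, 7, 9, -2}, the result is {7, 3, 9, -2}. An operation of this shape is a common building block for compare-and-swap sorting networks.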
VMOV move
Format
Assembler syntax
VMOV.dt Rd,Rb
Where dt = {b, b9, h, w, f}. Rd and Rb specify architectural register names.
Note that .w and .f specify the same operation.
Supported modes
Description
Moves the contents of register Rb to register Rd. The Group field specifies the source and destination register groups, using the following notation:
VR current-group vector register
VRA alternate-group vector register
SR scalar register
SP special register
RASR return address stack register
VAC vector accumulator register (see the VAC register encoding table below)
Group <3:0> | Source group | Destination group | Note |
0000 | Reserved | | |
0001 | VR | VRA | |
0010 | VRA | VR | |
0011 | VRA | VRA | |
0100 | Reserved | | |
0101 | Reserved | | |
0110 | VRA | VAC | |
0111 | VAC | VRA | |
1000 | Reserved | | |
1001 | SR | VRA | |
1010 | Reserved | | |
1011 | Reserved | | |
1100 | SR | SP | |
1101 | SP | SR | |
1110 | SR | RASR | |
1111 | RASR | SR | |
Note that this instruction cannot be used to move a vector register to a scalar register; the VEXTRT instruction is provided for that purpose.
The VAC registers are encoded as follows:
R<2:0> | Register | Note |
000 | Undefined | |
001 | VAC0L | |
010 | VAC0H | |
011 | VAC0 | Specifies both VAC0H:VAC0L. When specified as the source, the register pair VRd+1:VRd is updated; VRd must be an even-numbered register. |
100 | Undefined | |
101 | VAC1L | |
110 | VAC1H | |
111 | VAC1 | Specifies both VAC1H:VAC1L. When specified as the source, the register pair VRd+1:VRd is updated; VRd must be an even-numbered register. |
Other | Undefined | |
Operation
Rd=Rb
Exceptions
Setting an exception event status bit in VCSR or VISRC causes the corresponding exception.
Programming notes
This instruction is not affected by element masking. Note that the concept of an alternate group does not exist in VEC64 mode. In VEC64 mode, therefore, this instruction cannot be used to move data to or from alternate-group registers.
VMUL multiply
Format
Assembler syntax
VMUL.dt VRc,VRd,VRa,VRb
VMUL.dt SRc,SRd,SRa,SRb
Where dt = {b, h, w}.
Supported modes
S | VR | SR | ||
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of Ra is multiplied by the corresponding element of Rb to produce a double-precision result. Each double-precision result is returned to the destination register pair Rc:Rd.
Ra and Rb use the specified data type. Rc:Rd uses the corresponding double-precision data type: 16, 32 and 64 bits for int8, int16 and int32 respectively. The high part of each double-precision element is stored in Rc.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Aop[i]={VRa[i]‖SRa};
Bop[i]={VRb[i]‖SRb};
Hi[i]:Lo[i]=Aop[i]*Bop[i];
Rc[i]=Hi[i];
Rd[i]=Lo[i];
}
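The Hi:Lo split in the VMUL operation above can be shown for a single int8 lane. Here the 16-bit product is divided between Rc (high byte) and Rd (low byte). The function name is hypothetical.

```c
#include <stdint.h>

/* Sketch of one int8 lane of VMUL.b: the 16-bit product is split,
   with the high byte going to Rc and the low byte to Rd. */
void vmul_b_lane(int8_t a, int8_t b, uint8_t *rc, uint8_t *rd)
{
    /* the int8 x int8 product always fits in 16 bits */
    uint16_t product = (uint16_t)((int16_t)a * (int16_t)b);
    *rc = (uint8_t)(product >> 8);  /* high part -> Rc */
    *rd = (uint8_t)product;         /* low part  -> Rd */
}
```

For example, 100 × 100 = 10000 = 0x2710, so Rc receives 0x27 and Rd receives 0x10.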
Exceptions
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead. It also does not support the floating-point data type, because the required extended (double-precision) data type is not supported.
VMULA multiply to accumulator
Format
Assembler syntax
VMULA.dt VRa,VRb
VMULA.dt VRa,SRb
VMULA.dt VRa,#IMM
VMULA.dt SRa,SRb
VMULA.dt SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M | V@V | V@S | V@I | S@S | S@I |
DS | int8(b) | int16(h) | int32(w) | float(f) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision result, which is written to the accumulator.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb};
if(dt==float)VACL[i]=VRa[i]*Bop[i];
else VACH[i]:VACL[i]=VRa[i]*Bop[i];
}
Exceptions
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULAF fractional multiply to accumulator
Format
Assembler syntax
VMULAF.dt VRa,VRb
VMULAF.dt VRa,SRb
VMULAF.dt VRa,#IMM
VMULAF.dt SRa,SRb
VMULAF.dt SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M | V@V | V@S | V@I | S@S | S@I |
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result. The intermediate result is shifted left by one bit and written to the accumulator.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
VACH[i]:VACL[i]=(VRa[i]*Bop[i])<<1;
}
Exceptions
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULF fractional multiply
Format
Assembler syntax
VMULF.dt VRd,VRa,VRb
VMULF.dt VRd,VRa,SRb
VMULF.dt VRd,VRa,#IMM
VMULF.dt SRd,SRa,SRb
VMULF.dt SRd,SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, which is shifted left by one bit. The high part of the result is returned to destination register VRd+1 and the low part to destination register VRd. VRd must be an even-numbered register.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Hi[i]:Lo[i]=(VRa[i]*Bop[i])<<1;
VRd+1[i]=Hi[i];
VRd[i]=Lo[i];
}
Exceptions
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULFR fractional multiply and round
Format
Assembler syntax
VMULFR.dt VRd,VRa,VRb
VMULFR.dt VRd,VRa,SRb
VMULFR.dt VRd,VRa,#IMM
VMULFR.dt SRd,SRa,SRb
VMULFR.dt SRd,SRa,#IMM
Where dt = {b, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision intermediate result, which is shifted left by one bit. The shifted intermediate result is rounded to its high part, and the high part is returned to destination register VRd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Hi[i]:Lo[i]=(VRa[i]*Bop[i])<<1;
if(Lo[i]<msb>==1) Hi[i]=Hi[i]+1;
VRd[i]=Hi[i];
}
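The rounding step of VMULFR adds one to the high part whenever the most significant bit of the discarded low part is set. The sketch below shows this for one int16 lane, with the operands read as Q15 fractions; that reading is an interpretation, not stated by the source. The function name is hypothetical, and arithmetic wraps modulo 2^32.

```c
#include <stdint.h>

/* Sketch of one int16 lane of VMULFR.h (operands read as Q15 fractions). */
int16_t vmulfr_h_lane(int16_t a, int16_t b)
{
    /* double-precision product shifted left by one bit */
    uint32_t shifted = (uint32_t)((int32_t)a * (int32_t)b) << 1;
    uint16_t hi = (uint16_t)(shifted >> 16);
    uint16_t lo = (uint16_t)shifted;
    if (lo & 0x8000u)            /* MSB of the low part set: round up */
        hi = (uint16_t)(hi + 1);
    return (int16_t)hi;          /* high part goes to VRd */
}
```

For 1 × 0x4000, the shifted product is 0x00008000. Truncation alone would yield 0, but the rounding step yields 1.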
Exceptions
None.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VMULL multiply low
Format
Assembler syntax
VMULL.dt VRd,VRa,VRb
VMULL.dt VRd,VRa,SRb
VMULL.dt VRd,VRa,#IMM
VMULL.dt SRd,SRa,SRb
VMULL.dt SRd,SRa,#IMM
Where dt = {b, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int16(h) | int32(w) | float(f) |
Description
Each element of VRa is multiplied by the corresponding element of Rb to produce a double-precision result. The low part of the result is returned to destination register VRd.
For the floating-point data type, all operands and results are single precision.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb};
if(dt==float)Lo[i]=VRa[i]*Bop[i];
else Hi[i]:Lo[i]=VRa[i]*Bop[i];
VRd[i]=Lo[i];
}
Exceptions
Overflow, invalid floating-point operand.
Programming notes
This instruction does not support the int9 data type; use the int16 data type instead.
VNAND NAND
Format
Assembler syntax
VNAND.dt VRd,VRa,VRb
VNAND.dt VRd,VRa,SRb
VNAND.dt VRd,VRa,#IMM
VNAND.dt SRd,SRa,SRb
VNAND.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each bit of each element in Ra is logically NANDed with the corresponding bit of Rb or of the immediate operand. The result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=~(Ra[i]<k> & Bop[i]<k>), for k = all bits in element i;
}
Exceptions
None.
VNOR NOR
Format
Assembler syntax
VNOR.dt VRd,VRa,VRb
VNOR.dt VRd,VRa,SRb
VNOR.dt VRd,VRa,#IMM
VNOR.dt SRd,SRa,SRb
VNOR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each bit of each element in Ra is logically NORed with the corresponding bit of Rb or of the immediate operand. The result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=~(Ra[i]<k> | Bop[i]<k>), for k = all bits in element i;
}
Exceptions
None.
VOR OR
Format
Assembler syntax
VOR.dt VRd,VRa,VRb
VOR.dt VRd,VRa,SRb
VOR.dt VRd,VRa,#IMM
VOR.dt SRd,SRa,SRb
VOR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each bit of each element in Ra is logically ORed with the corresponding bit of Rb or of the immediate operand. The result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=Ra[i]<k> | Bop[i]<k>, for k = all bits in element i;
}
Exceptions
None.
VORC OR with complement
Format
Assembler syntax
VORC.dt VRd,VRa,VRb
VORC.dt VRd,VRa,SRb
VORC.dt VRd,VRa,#IMM
VORC.dt SRd,SRa,SRb
VORC.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each bit of each element in Ra is logically ORed with the complement of the corresponding bit of Rb or of the immediate operand. The result is returned in Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k>=Ra[i]<k> | ~Bop[i]<k>, for k = all bits in element i;
}
Exceptions
None.
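VNAND, VNOR, VOR and VORC operate bit by bit within each element, so the only subtlety is keeping the complemented results inside the element width. The sketch below models 9-bit (b9) elements held in `uint16_t` and masks each result to nine bits. The mask and the function names are assumptions for illustration.

```c
#include <stdint.h>

#define MASK9 0x1FFu  /* assumed width of a b9 element */

/* Sketches of one b9 lane of the bitwise instructions above. */
uint16_t vnand9(uint16_t a, uint16_t b) { return (uint16_t)(~(a & b) & MASK9); }
uint16_t vnor9(uint16_t a, uint16_t b)  { return (uint16_t)(~(a | b) & MASK9); }
uint16_t vor9(uint16_t a, uint16_t b)   { return (uint16_t)((a | b) & MASK9); }
uint16_t vorc9(uint16_t a, uint16_t b)  { return (uint16_t)((a | ~b) & MASK9); }
```

Without the mask, the complement in VNAND, VNOR and VORC would set bits above bit 8 in the `uint16_t` container.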
VPFTCH prefetch
Format
Assembler syntax
VPFTCH.ln SRb,SRi
VPFTCH.ln SRb,#IMM
VPFTCH.ln SRb+,SRi
VPFTCH.ln SRb+,#IMM
Where ln = {1,2,4,8}.
Description
Prefetches multiple vector data cache lines, starting at the effective address. The number of cache lines is specified as follows:
LN<1:0> = 00: prefetch one 64-byte cache line
LN<1:0> = 01: prefetch two 64-byte cache lines
LN<1:0> = 10: prefetch four 64-byte cache lines
LN<1:0> = 11: prefetch eight 64-byte cache lines
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary.
Operation
Exceptions
Invalid data address.
Programming notes
EA<31:0> specifies a byte address in local memory.
VPFTCHSP prefetch to scratch pad memory
Format
Assembler syntax
VPFTCHSP.ln SRp,SRb,SRi
VPFTCHSP.ln SRp,SRb,#IMM
VPFTCHSP.ln SRp,SRb+,SRi
VPFTCHSP.ln SRp,SRb+,#IMM
Where ln = {1, 2, 4, 8}. Note that VPFTCH and VPFTCHSP have the same opcode.
Description
Transfers multiple 64-byte blocks from memory to the scratch pad memory. The effective address gives the starting address in memory, and SRp provides the starting address in the scratch pad memory. The number of 64-byte blocks is specified as follows:
LN<1:0> = 00: transfer one 64-byte block
LN<1:0> = 01: transfer two 64-byte blocks
LN<1:0> = 10: transfer four 64-byte blocks
LN<1:0> = 11: transfer eight 64-byte blocks
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary. If the scratch pad memory pointer in SRp does not fall on a 64-byte boundary, it is also truncated to a 64-byte boundary. The aligned scratch pad memory pointer is then incremented by the number of bytes transferred.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Num_bytes={64‖128‖256‖512};
Mem_adrs=EA<31:6>:6b′000000;
SRp=SRp<31:6>:6b′000000;
for(i=0;i<Num_bytes;i++)
SPAD[SRp++]=MEM[Mem_adrs+i];
Exceptions
Invalid data address.
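The VPFTCHSP operation above can be mirrored in C. Both addresses are truncated to 64-byte boundaries, 64 << LN bytes are copied, and the scratch pad pointer ends up advanced by the transfer size. The flat `mem` and `spad` arrays and the function name are hypothetical stand-ins for the memory system, and the effective-address calculation is left to the caller.

```c
#include <stdint.h>

/* Sketch of VPFTCHSP: ea is the effective address, *srp the scratch pad
   pointer, ln the LN<1:0> field (0..3 -> 64, 128, 256 or 512 bytes). */
void vpftchsp_model(uint8_t *spad, const uint8_t *mem,
                    uint32_t ea, uint32_t *srp, unsigned ln)
{
    uint32_t num_bytes = 64u << (ln & 3u);
    uint32_t mem_adrs = ea & ~(uint32_t)63;  /* EA<31:6>:000000 */
    uint32_t p = *srp & ~(uint32_t)63;       /* SRp<31:6>:000000 */
    for (uint32_t i = 0; i < num_bytes; i++)
        spad[p++] = mem[mem_adrs + i];       /* SPAD[SRp++] = MEM[...] */
    *srp = p;                                /* pointer left after the block */
}
```

With EA = 130, SRp = 70 and LN = 0, memory bytes 128 to 191 land at scratch pad addresses 64 to 127, and SRp becomes 128.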
VROL Rotate Left
Format
Assembler syntax
VROL.dt VRd,VRa,SRb
VROL.dt VRd,VRa,#IMM
VROL.dt SRd,SRa,SRb
VROL.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each data element of vector/scalar register Ra is rotated left by the number of bits given in scalar register Rb or in the IMM field. The result is stored in vector/scalar register Rd.
Operation
rotate_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i] rotate_left rotate_amount;
}
Exceptions
None.
Programming notes
Note that rotate_amount is taken as a five-bit value from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for specifying a rotate amount no larger than the number of bits in the data length. If the rotate amount is greater than the data length, the result is undefined.
Note that rotating left by n bits is equivalent to rotating right by ElemSize-n bits, where ElemSize is the number of bits in the given data length.
VROR Rotate Right
Format
Assembler syntax
VROR.dt VRd,VRa,SRb
VROR.dt VRd,VRa,#IMM
VROR.dt SRd,SRa,SRb
VROR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@S | V<-V@I | S<-S@S | S<-S@I | |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Each data element of vector/scalar register Ra is rotated right by the number of bits given in scalar register Rb or in the IMM field. The result is stored in vector/scalar register Rd.
Operation
rotate_amount={SRb%32‖IMM<4:0>};
for(i=0;i<NumElem && EMASK[i];i++){
Rd[i]=Ra[i] rotate_right rotate_amount;
}
Exceptions
None.
Programming notes
Note that rotate_amount is taken as a five-bit value from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for specifying a rotate amount no larger than the number of bits in the data length. If the rotate amount is greater than the data length, the result is undefined.
Note that rotating right by n bits is equivalent to rotating left by ElemSize-n bits, where ElemSize is the number of bits in the given data length.
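The left/right equivalence noted for VROL and VROR can be checked on a 16-bit element. The sketch handles only rotate amounts from 0 to 16, since larger amounts are undefined on the hardware. It widens to 32 bits so that a shift by 16 stays well defined in C. The function names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 16-bit element left by n bits, 0 <= n <= 16. */
uint16_t rol16(uint16_t a, unsigned n)
{
    uint32_t x = a;  /* widen so that shifts by 16 are well defined */
    return (uint16_t)((x << n) | (x >> (16 - n)));
}

/* Rotate right by n bits, expressed as a left rotate by ElemSize - n. */
uint16_t ror16(uint16_t a, unsigned n)
{
    return rol16(a, 16 - n);
}
```

For example, rotating 0x1234 right by 4 bits gives 0x4123, which is the same as rotating it left by 12.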
VROUND round floating point to integer
Format
Assembler syntax
VROUND.rm VRd,VRb
VROUND.rm SRd,SRb
Where rm = {ninf, zero, near, pinf}.
Supported modes
D:S:M | V<-V | S<-S |
Description
The floating-point contents of vector/scalar register Rb are converted to a 32-bit integer (word), and the result is stored in vector/scalar register Rd. The rounding mode is specified in RM:
RM<1:0> | Mode | Meaning |
00 | ninf | Round toward -∞ |
01 | zero | Round toward zero |
10 | near | Round to nearest, ties to even |
11 | pinf | Round toward +∞ |
Operation
for(i=0;i<NumElem;i++){
Rd[i]=Convert_to_int32(Rb[i]);
}
Exceptions
None.
Programming notes
This instruction is not affected by element masking.
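The four RM modes in the VROUND table can be illustrated for one lane. C's library rounding functions depend on libm and on the current floating-point environment, so this sketch implements the modes directly with integer conversion instead. It assumes the input is finite and within int32 range. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* RM<1:0> encodings from the table above. */
enum round_mode { RM_NINF = 0, RM_ZERO = 1, RM_NEAR = 2, RM_PINF = 3 };

/* Sketch of one VROUND lane: float -> int32 under the selected mode. */
int32_t vround_lane(float x, enum round_mode rm)
{
    int32_t t = (int32_t)x;                   /* C conversion truncates toward zero */
    int32_t fl = (x < (float)t) ? t - 1 : t;  /* floor */
    switch (rm) {
    case RM_ZERO: return t;
    case RM_NINF: return fl;
    case RM_PINF: return (x > (float)t) ? t + 1 : t;
    default: {                                /* RM_NEAR: ties go to the even integer */
        float frac = x - (float)fl;           /* 0 <= frac < 1 */
        if (frac > 0.5f) return fl + 1;
        if (frac < 0.5f) return fl;
        return (fl % 2 == 0) ? fl : fl + 1;
    }
    }
}
```

Under the near mode, 2.5 rounds to 2 and 3.5 rounds to 4. Rounding ties to the even integer in this way avoids a systematic upward bias.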
VSATL saturate to lower limit
Format
Assembler syntax
VSATL.dt VRd,VRa,VRb
VSATL.dt VRd,VRa,SRb
VSATL.dt VRd,VRa,#IMM
VSATL.dt SRd,SRa,SRb
VSATL.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}. Note that the 9-bit immediate is not supported for the .f data type.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Description
Each data element of vector/scalar register Ra is checked against the corresponding lower limit in vector/scalar register Rb or in the IMM field. If a data element is smaller than the lower limit, it is set equal to the lower limit. The final result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]=(Ra[i]<Bop[i])?Bop[i]:Ra[i];
}
Exceptions
None.
VSATU saturate to upper limit
Format
Assembler syntax
VSATU.dt VRd,VRa,VRb
VSATU.dt VRd,VRa,SRb
VSATU.dt VRd,VRa,#IMM
VSATU.dt SRd,SRa,SRb
VSATU.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}. Note that the 9-bit immediate is not supported for the .f data type.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Description
Each data element of vector/scalar register Ra is checked against the corresponding upper limit in vector/scalar register Rb or in the IMM field. If a data element is greater than the upper limit, it is set equal to the upper limit. The final result is stored in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]=(Ra[i]>Bop[i])?Bop[i]:Ra[i];
}
Exceptions
None.
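VSATL and VSATU are one-sided clamps, so applying them in sequence confines each element to a range. The sketch below shows one int32 lane of each, together with the combined clamp, for example to the 0 to 255 pixel range. The function names are hypothetical.

```c
#include <stdint.h>

/* Sketches of one int32 lane of VSATL and VSATU. */
int32_t vsatl_lane(int32_t a, int32_t lower) { return (a < lower) ? lower : a; }
int32_t vsatu_lane(int32_t a, int32_t upper) { return (a > upper) ? upper : a; }

/* Applying VSATL and then VSATU clamps a value into [lower, upper]. */
int32_t clamp_lane(int32_t a, int32_t lower, int32_t upper)
{
    return vsatu_lane(vsatl_lane(a, lower), upper);
}
```

This sketch assumes lower <= upper; the source does not say what happens otherwise.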
VSHFL shuffle
Format
Assembler syntax
VSHFL.dt VRc,VRd,VRa,VRb
VSHFL.dt VRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | VRb | SRb | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
The contents of vector registers Ra and Rb are shuffled, and the result is stored in the vector register pair Rc:Rd, as shown below:
Operation
Exceptions
None.
Programming notes
This instruction does not use element masking.
VSHFLH shuffle high
Format
Assembler syntax
VSHFLH.dt VRd,VRa,VRb
VSHFLH.dt VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | VRb | SRb | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
The contents of vector registers Ra and Rb are shuffled, and the high part of the result is placed in vector register Rd, as shown below:
Operation
Exceptions
None.
Programming notes
This instruction does not use element masking.
VSHFLL shuffle low
Format
Assembler syntax
VSHFLL.dt VRd,VRa,VRb
VSHFLL.dt VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | VRb | SRb | ||
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
The contents of vector registers Ra and Rb are shuffled, and the low part of the result is placed in vector register Rd, as shown below:
Operation
Exceptions
None.
Programming notes
This instruction does not use element masking.
VST store
Format
Assembler syntax
VST.st Rs,SRb,SRi
VST.st Rs,SRb,#IMM
VST.st Rs,SRb+,SRi
VST.st Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.
Note that .b and .b9t specify the same operation, and that .64 cannot be specified together with VRAs. For cache-off stores, use VSTOFF.
Description
Stores a vector or scalar register.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
MEM[EA] is written according to the following table:
ST | Store operation |
.b | BYTE[EA]=SRs<7:0> |
.h | HALF[EA]=SRs<15:0> |
.w | WORD[EA]=SRs<31:0> |
.4 | BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 3 |
.8 | BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 7 |
.16 | BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 15 |
.32 | BYTE[EA+i]=VRs<9i+7:9i>, i=0 to 31 |
.64 | BYTE[EA+i]=VR0s<9i+7:9i>, i=0 to 31; BYTE[EA+32+i]=VR1s<9i+7:9i>, i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
VSTCB store to circular buffer
Format
Assembler syntax
VSTCB.st Rs,SRb,SRi
VSTCB.st Rs,SRb,#IMM
VSTCB.st Rs,SRb+,SRi
VSTCB.st Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.
Note that .b and .b9t specify the same operation, and that .64 cannot be specified together with VRAs. For cache-off stores, use VSTCBOFF.
Description
Stores a vector or scalar register to a circular buffer. The buffer is bounded by the BEGIN pointer in SRb+1 and the END pointer in SRb+2.
Before the store and the address update, the effective address is adjusted if it is greater than the END address. In addition, for .h and .w scalar stores, the circular buffer boundaries must be aligned to halfword and word boundaries respectively.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
BEGIN=SRb+1;
END=SRb+2;
cbsize=END-BEGIN;
if(EA>END)EA=BEGIN+(EA-END);
if(A=1)SRb=EA;
MEM[EA] is written according to the following table:
ST | Store operation |
.b | BYTE[EA]=SRs<7:0>; |
.h | HALF[EA]=SRs<15:0>; |
.w | WORD[EA]=SRs<31:0>; |
.4 | BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 3 |
.8 | BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 7 |
.16 | BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 15 |
.32 | BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VRs<9i+7:9i>, i=0 to 31 |
.64 | BYTE[(EA+i>END)?EA+i-cbsize:EA+i]=VR0s<9i+7:9i>, i=0 to 31; BYTE[(EA+32+i>END)?EA+32+i-cbsize:EA+32+i]=VR1s<9i+7:9i>, i=0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by element masking.
The programmer must ensure that the following condition holds for this instruction to work as intended:
BEGIN<EA<2*END-BEGIN
that is, EA>BEGIN and EA-END<END-BEGIN.
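The address arithmetic in the VSTCB pseudocode can be sketched in C. EA is wrapped once before the store. Any byte address EA+i that passes END is then moved back by cbsize = END - BEGIN. Only the low eight bits of each element are stored, matching VRs<9i+7:9i>. The sketch makes several assumptions. Memory is a flat array, each 9-bit element is held in `uint16_t`, and `vstcb_model` is a hypothetical name. Only the vector .4 to .32 cases are covered, and the comparison uses `>` exactly as written in the pseudocode.

```c
#include <stdint.h>

/* Sketch of VSTCB.n following the pseudocode: n is 4, 8, 16 or 32. */
void vstcb_model(uint8_t *mem, const uint16_t *vrs, int n,
                 uint32_t ea, uint32_t begin, uint32_t end)
{
    uint32_t cbsize = end - begin;
    if (ea > end)                       /* initial wrap of EA */
        ea = begin + (ea - end);
    for (int i = 0; i < n; i++) {
        uint32_t a = ea + (uint32_t)i;
        /* per-byte wrap, then store the low 8 bits of element i */
        mem[(a > end) ? a - cbsize : a] = (uint8_t)(vrs[i] & 0xFFu);
    }
}
```

With BEGIN = 16, END = 40 and EA = 36, an 8-byte store writes addresses 36 to 40 and then wraps to 17, 18 and 19.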
VSTD store double
Format
Assembler syntax
VSTD.st Rs,SRb,SRi
VSTD.st Rs,SRb,#IMM
VSTD.st Rs,SRb+,SRi
VSTD.st Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.
Note that .b and .b9t specify the same operation, and that .64 cannot be specified together with VRAs. For cache-off stores, use VSTDOFF.
Explanation
From the current or alternative storage group from two vector registers or two scalar register.
Operating
EA=SR
b+{SR
i‖sex(IMM<7:0>)};
if(A=1)SR
b=EA;
MEM [EA] = table:
ST | Store operation |
.b | BYTE[EA] = SRs<7:0>; BYTE[EA+1] = SRs+1<7:0> |
.h | HALF[EA] = SRs<15:0>; HALF[EA+2] = SRs+1<15:0> |
.w | WORD[EA] = SRs<31:0>; WORD[EA+4] = SRs+1<31:0> |
.4 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 3; BYTE[EA+4+i] = VRs+1<9i+7:9i>, i = 0 to 3 |
.8 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 7; BYTE[EA+8+i] = VRs+1<9i+7:9i>, i = 0 to 7 |
.16 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 15; BYTE[EA+16+i] = VRs+1<9i+7:9i>, i = 0 to 15 |
.32 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 31; BYTE[EA+32+i] = VRs+1<9i+7:9i>, i = 0 to 31 |
.64 | BYTE[EA+i] = VR0s<9i+7:9i>, i = 0 to 31; BYTE[EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31; BYTE[EA+64+i] = VR0s+1<9i+7:9i>, i = 0 to 31; BYTE[EA+96+i] = VR1s+1<9i+7:9i>, i = 0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
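The bit range VRs<9i+7:9i> suggests that each vector register holds 9-bit element slots and that a byte store keeps only the low eight bits of slot i. Assuming that layout, the following Python sketch models the .4 to .32 rows of the VSTD table; passing four registers instead of two reproduces the corresponding VSTQ rows below. Register images are modeled as Python integers, the helper names `pack9` and `store_registers` are hypothetical, and the .64 and scalar forms are not modeled.

```python
from typing import Dict, List


def pack9(elements: List[int]) -> int:
    """Pack 9-bit element values into one register image (slot i at bits 9i+8:9i)."""
    reg = 0
    for i, value in enumerate(elements):
        reg |= (value & 0x1FF) << (9 * i)
    return reg


def store_registers(mem: Dict[int, int], ea: int, regs: List[int], n: int) -> None:
    """Store n bytes from each register in regs at consecutive addresses.

    Models the .4/.8/.16/.32 rows: BYTE[EA + k*n + i] = VRs+k<9i+7:9i>.
    """
    for k, reg in enumerate(regs):   # VRs, VRs+1, ...
        for i in range(n):           # i = 0 to n-1
            mem[ea + k * n + i] = (reg >> (9 * i)) & 0xFF
```

Because only bits 9i+7:9i are stored, the ninth bit of each element (for example the top bit of 0x101) is dropped.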
VSTQ Store Quad
Format
Assembler syntax
VSTQ.st Rs,SRb,SRi
VSTQ.st Rs,SRb,#IMM
VSTQ.st Rs,SRb+,SRi
VSTQ.st Rs,SRb+,#IMM
Where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.
Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. For cache-off stores, use VSTQOFF.
Description
Stores four vector registers from the current or alternate bank, or four scalar registers.
Operation
EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
MEM[EA] = (see table):
ST | Store operation |
.b | BYTE[EA] = SRs<7:0>; BYTE[EA+1] = SRs+1<7:0>; BYTE[EA+2] = SRs+2<7:0>; BYTE[EA+3] = SRs+3<7:0> |
.h | HALF[EA] = SRs<15:0>; HALF[EA+2] = SRs+1<15:0>; HALF[EA+4] = SRs+2<15:0>; HALF[EA+6] = SRs+3<15:0> |
.w | WORD[EA] = SRs<31:0>; WORD[EA+4] = SRs+1<31:0>; WORD[EA+8] = SRs+2<31:0>; WORD[EA+12] = SRs+3<31:0> |
.4 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 3; BYTE[EA+4+i] = VRs+1<9i+7:9i>, i = 0 to 3; BYTE[EA+8+i] = VRs+2<9i+7:9i>, i = 0 to 3; BYTE[EA+12+i] = VRs+3<9i+7:9i>, i = 0 to 3 |
.8 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 7; BYTE[EA+8+i] = VRs+1<9i+7:9i>, i = 0 to 7; BYTE[EA+16+i] = VRs+2<9i+7:9i>, i = 0 to 7; BYTE[EA+24+i] = VRs+3<9i+7:9i>, i = 0 to 7 |
.16 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 15; BYTE[EA+16+i] = VRs+1<9i+7:9i>, i = 0 to 15; BYTE[EA+32+i] = VRs+2<9i+7:9i>, i = 0 to 15; BYTE[EA+48+i] = VRs+3<9i+7:9i>, i = 0 to 15 |
.32 | BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 31; BYTE[EA+32+i] = VRs+1<9i+7:9i>, i = 0 to 31; BYTE[EA+64+i] = VRs+2<9i+7:9i>, i = 0 to 31; BYTE[EA+96+i] = VRs+3<9i+7:9i>, i = 0 to 31 |
.64 | BYTE[EA+i] = VR0s<9i+7:9i>, i = 0 to 31; BYTE[EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31; BYTE[EA+64+i] = VR0s+1<9i+7:9i>, i = 0 to 31; BYTE[EA+96+i] = VR1s+1<9i+7:9i>, i = 0 to 31; BYTE[EA+128+i] = VR0s+2<9i+7:9i>, i = 0 to 31; BYTE[EA+160+i] = VR1s+2<9i+7:9i>, i = 0 to 31; BYTE[EA+192+i] = VR0s+3<9i+7:9i>, i = 0 to 31; BYTE[EA+224+i] = VR1s+3<9i+7:9i>, i = 0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
VSTR Store Reverse
Format
Assembler syntax
VSTR.st Rs,SRb,SRi
VSTR.st Rs,SRb,#IMM
VSTR.st Rs,SRb+,SRi
VSTR.st Rs,SRb+,#IMM
Where st = {4, 8, 16, 32, 64} and Rs = {VRs, VRAs}. Note that .64 and VRAs cannot be specified together. For cache-off stores, use VSTROFF.
Description
Stores the elements of a vector register in reverse order. This instruction does not support a scalar source register.
Operation
EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
MEM[EA] = (see table):
ST | Store operation |
.b | BYTE[EA+i] = VRs[31-i]<7:0>, for i = 0 to 31 |
.h | HALF[EA+i] = VRs[15-i]<15:0>, for i = 0 to 15 |
.w | WORD[EA+i] = VRs[7-i]<31:0>, for i = 0 to 7 |
.4 | BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 3 |
.8 | BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 7 |
.16 | BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 15 |
.32 | BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 31 |
.64 | BYTE[EA+32+i] = VR0s[31-i]<7:0>, i = 0 to 31; BYTE[EA+i] = VR1s[31-i]<7:0>, i = 0 to 31 |
Exceptions
Invalid data address, unaligned access.
Programming notes
This instruction is not affected by the element mask.
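The two widest rows of the reverse-store table can be modeled directly. The following Python sketch is a hypothetical illustration (the name `store_reverse` is invented here) that follows the .32 and .64 rows literally; each register is modeled as a list of its 32 element values, and only the low byte of each element is stored.

```python
from typing import Dict, List, Optional


def store_reverse(mem: Dict[int, int], ea: int, vr0: List[int],
                  vr1: Optional[List[int]] = None) -> None:
    """Hypothetical model of the .32 (vr1 is None) and .64 rows of the VSTR table.

    vr0 and vr1 hold the 32 element values of VR0s and VR1s.
    """
    for i in range(32):
        if vr1 is None:
            # .32: BYTE[EA+i] = VRs[31-i]<7:0>
            mem[ea + i] = vr0[31 - i] & 0xFF
        else:
            # .64: BYTE[EA+32+i] = VR0s[31-i]<7:0>; BYTE[EA+i] = VR1s[31-i]<7:0>
            mem[ea + 32 + i] = vr0[31 - i] & 0xFF
            mem[ea + i] = vr1[31 - i] & 0xFF
```

If VR0s is treated as elements 0 to 31 and VR1s as elements 32 to 63 of one 64-element vector, the .64 form stores that whole vector in reverse byte order, which is why VR1s is written first.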
VSTWS Store with Stride
Format
Assembler syntax
VSTWS.st Rs,SRb,SRi
VSTWS.st Rs,SRb,#IMM
VSTWS.st Rs,SRb+,SRi
VSTWS.st Rs,SRb+,#IMM
Where st = {8, 16, 32} and Rs = {VRs, VRAs}. Note that the .64 mode is not supported; use VST instead. For cache-off stores, use VSTWSOFF.
Description
Starting at the effective address, stores 32 bytes from vector register VRs to memory, using scalar register SRb+1 as the stride control register.
ST specifies the block size, that is, the number of consecutive bytes stored from each block. SRb+1 specifies the stride, that is, the distance between the starts of two consecutive blocks.
The stride must be greater than or equal to the block size. EA must be aligned to the data size, and the stride and block size must be multiples of the data size.
Operation
EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
Block_size = {4 ‖ 8 ‖ 16 ‖ 32};
Stride = SRb+1<31:0>;
for (i = 0; i < VECSIZE/Block_size; i++)
    for (j = 0; j < Block_size; j++)
        BYTE[EA + i*Stride + j] = VRs[i*Block_size + j]<7:0>;
Exceptions
Invalid data address, unaligned access.
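The nested loop in the Operation can be run as a small model. The following Python sketch is a hypothetical illustration: the name `store_with_stride` is invented, VECSIZE is assumed to be 32 bytes (the Description says the instruction stores 32 bytes), and memory is modeled as a dictionary from byte address to value.

```python
from typing import Dict, List

VECSIZE = 32  # assumption: bytes stored per instruction, per the Description


def store_with_stride(mem: Dict[int, int], ea: int, stride: int,
                      block_size: int, vrs: List[int]) -> None:
    """Hypothetical model of the VSTWS operation loop."""
    if block_size not in (4, 8, 16, 32):
        raise ValueError("block size must be 4, 8, 16 or 32")
    if stride < block_size:
        raise ValueError("stride must be >= block size")
    for i in range(VECSIZE // block_size):      # one iteration per block
        for j in range(block_size):             # consecutive bytes within a block
            mem[ea + i * stride + j] = vrs[i * block_size + j] & 0xFF
```

With a block size of 8 and a stride of 16, the 32 bytes land in four 8-byte blocks at EA, EA+16, EA+32 and EA+48, leaving an 8-byte gap after each block.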
VSUB Subtract
Format
Assembler syntax
VSUB.dt VRd,VRa,VRb
VSUB.dt VRd,VRa,SRb
VSUB.dt VRd,VRa,#IMM
VSUB.dt SRd,SRa,SRb
VSUB.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Description
Subtracts the contents of vector/scalar register Rb from the contents of vector/scalar register Ra and stores the result in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={Rb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]=Ra[i]-Bop[i];
}
Exceptions
Overflow, invalid floating-point operand.
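The V<-V@S and V<-V@I modes are the combined scalar/vector case: one scalar (or sign-extended 9-bit immediate) is subtracted from every element. The following Python sketch is a hypothetical model of that case for integer data. It rests on two assumptions not stated in the text: the loop condition `i<NumElem && EMASK[i]` is read as "skip elements whose mask bit is clear" (masked-off elements keep their old Rd value), and integer results wrap modulo 2**bits. The names `sex` and `vsub_scalar` are illustrative.

```python
from typing import List


def sex(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value (the sex() of the pseudocode)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def vsub_scalar(rd: List[int], ra: List[int], bop: int,
                emask: List[bool], bits: int = 8) -> List[int]:
    """Hypothetical model of VSUB in the V<-V@S / V<-V@I modes.

    Assumptions: elements whose EMASK bit is clear keep their old Rd
    value, and integer results wrap modulo 2**bits.
    """
    result = list(rd)
    for i in range(len(ra)):
        if emask[i]:
            result[i] = sex(ra[i] - bop, bits)   # Rd[i] = Ra[i] - Bop
    return result
```

For the immediate form, Bop is sex(IMM<8:0>), so an encoded immediate of 0x1F6 acts as -10 and the subtraction effectively adds 10.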
VSUBS Subtract and Set
Format
Assembler syntax
VSUBS.dt SRd,SRa,SRb
VSUBS.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w, f}.
Supported modes
D:S:M | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) | float(f) |
Description
Subtracts SRb from SRa, places the result in SRd, and sets the VFLAG bits in VCSR.
Operation
Bop={SRb‖sex(IMM<8:0>)};
SRd=SRa-Bop;
VCSR<lt,eq,gt>=status(SRa-Bop);
Exceptions
Overflow, invalid floating-point operand.
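The flag-setting step can be sketched as follows. This Python model is hypothetical: the name `vsubs` is reused for readability only, and it assumes that status() reports the sign of the wrapped integer difference, which the text does not specify.

```python
from typing import Dict, Tuple


def sex(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def vsubs(sra: int, bop: int, bits: int = 32) -> Tuple[int, Dict[str, bool]]:
    """Hypothetical model of VSUBS: returns SRd and the VCSR<lt,eq,gt> bits.

    Assumption: status() reports the sign of the wrapped difference.
    """
    srd = sex(sra - bop, bits)
    return srd, {"lt": srd < 0, "eq": srd == 0, "gt": srd > 0}
```

The assumption matters on overflow: for 8-bit data, -128 - 1 wraps to 127 and this model sets gt, whereas a flag derived from comparing SRa with Bop directly would set lt.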
VUNSHFL Unshuffle
Format
Assembler syntax
VUNSHFL.dt VRc,VRd,VRa,VRb
VUNSHFL.dt VRc,VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | VRb | SRb |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
The contents of vector register VRa are unshuffled with Rb, and the result is placed in the vector register pair VRc:VRd, as follows:
Operation
Exceptions
None.
Programming notes
This instruction does not use the element mask.
VUNSHFLH Unshuffle High
Format
Assembler syntax
VUNSHFLH.dt VRd,VRa,VRb
VUNSHFLH.dt VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | VRb | SRb |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
The contents of vector register VRa are unshuffled with Rb, and the high part of the result is returned to vector register VRd, as follows:
Operation
Exceptions
None.
Programming notes
This instruction does not use the element mask.
VUNSHFLL Unshuffle Low
Format
Assembler syntax
VUNSHFLL.dt VRd,VRa,VRb
VUNSHFLL.dt VRd,VRa,SRb
Where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.
Supported modes
S | VRb | SRb |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
The contents of vector register VRa are unshuffled with Rb, and the low part of the result is returned to vector register VRd, as follows:
Operation
Exceptions
None.
Programming notes
This instruction does not use the element mask.
VWBACKSP Write Back from Scratch Pad
Format
Assembler syntax
VWBACKSP.ln SRp,SRb,SRi
VWBACKSP.ln SRp,SRb,#IMM
VWBACKSP.ln SRp,SRb+,SRi
VWBACKSP.ln SRp,SRb+,#IMM
Where ln = {1, 2, 4, 8}. Note that VWBACK and VWBACKSP use the same opcode.
Description
Transfers multiple 64-byte blocks from the scratch-pad memory to memory. The effective address gives the starting address in memory, and SRp gives the starting address in the scratch-pad memory. The number of 64-byte blocks is specified as follows:
LN<1:0> = 00: transfer one 64-byte block
LN<1:0> = 01: transfer two 64-byte blocks
LN<1:0> = 10: transfer four 64-byte blocks
LN<1:0> = 11: transfer eight 64-byte blocks
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary. If the scratch-pad pointer address in SRp does not fall on a 64-byte boundary, it is likewise truncated to align with a 64-byte boundary. The aligned scratch-pad pointer address is incremented by the number of bytes transferred.
Operation
EA=SRb+{SRi‖sex(IMM<7:0>)};
if(A=1)SRb=EA;
Num_bytes={64‖128‖256‖512};
Mem_adrs=EA<31:6>:6b′000000;
SRp=SRp<31:6>:6b′000000;
for (i = 0; i < Num_bytes; i++)
    MEM[Mem_adrs + i] = SPAD[SRp++];
Exceptions
Invalid data address.
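The alignment and block-count rules can be checked with a byte-array model. The following Python sketch is hypothetical: the name `write_back_from_spad` is invented, memory and scratch pad are modeled as `bytearray`s, the transfer direction follows the Description (scratch pad to memory), and the updated SRp is returned instead of being written to a register.

```python
def write_back_from_spad(mem: bytearray, spad: bytearray, ea: int,
                         srp: int, blocks: int) -> int:
    """Hypothetical model of VWBACKSP.ln; returns the updated SRp.

    Direction follows the Description: scratch pad -> memory.
    """
    if blocks not in (1, 2, 4, 8):               # .ln = {1, 2, 4, 8}
        raise ValueError("ln must be 1, 2, 4 or 8")
    num_bytes = 64 * blocks                      # {64 | 128 | 256 | 512}
    mem_adrs = ea & ~0x3F & 0xFFFFFFFF           # EA<31:6> : 6b'000000
    srp = srp & ~0x3F & 0xFFFFFFFF               # SRp<31:6> : 6b'000000
    mem[mem_adrs:mem_adrs + num_bytes] = spad[srp:srp + num_bytes]
    return srp + num_bytes                       # SRp after num_bytes increments
```

For example, EA = 0x85 and SRp = 0x41 are truncated to 0x80 and 0x40, so a .1 write-back copies scratch-pad bytes 0x40 to 0x7F into memory at 0x80 to 0xBF and leaves SRp at 0x80.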
VXNOR Exclusive NOR
Format
Assembler syntax
VXNOR.dt VRd,VRa,VRb
VXNOR.dt VRd,VRa,SRb
VXNOR.dt VRd,VRa,#IMM
VXNOR.dt SRd,SRa,SRb
VXNOR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Performs a bitwise exclusive NOR of the contents of vector/scalar register Ra and vector/scalar register Rb, and stores the result in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k> = ~(Ra[i]<k> ^ Bop[i]<k>), for k = all bits in element i;
}
Exceptions
VXOR Exclusive OR
Format
Assembler syntax
VXOR.dt VRd,VRa,VRb
VXOR.dt VRd,VRa,SRb
VXOR.dt VRd,VRa,#IMM
VXOR.dt SRd,SRa,SRb
VXOR.dt SRd,SRa,#IMM
Where dt = {b, b9, h, w}.
Supported modes
D:S:M | V<-V@V | V<-V@S | V<-V@I | S<-S@S | S<-S@I |
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
Performs a bitwise exclusive OR of the contents of vector/scalar register Ra and vector/scalar register Rb, and stores the result in vector/scalar register Rd.
Operation
for(i=0;i<NumElem && EMASK[i];i++){
Bop[i]={VRb[i]‖SRb‖sex(IMM<8:0>)};
Rd[i]<k> = Ra[i]<k> ^ Bop[i]<k>, for k = all bits in element i;
}
Exceptions
None.
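VXOR and VXNOR differ only in a final inversion, so one model covers both. The following Python sketch is hypothetical: the name `vxor` and the `invert` flag are invented, elements are treated as unsigned bit patterns of the given width (9 bits for int9), masked-off elements are assumed to keep their old Rd value, and a single integer for `rb` stands for the V<-V@S and V<-V@I forms, where one scalar is combined with every element.

```python
from typing import List, Union


def vxor(rd: List[int], ra: List[int], rb: Union[int, List[int]],
         emask: List[bool], bits: int = 9, invert: bool = False) -> List[int]:
    """Hypothetical model of VXOR (invert=False) and VXNOR (invert=True).

    rb may be a list (V<-V@V) or a single scalar/immediate (V<-V@S, V<-V@I).
    """
    width_mask = (1 << bits) - 1
    result = list(rd)
    for i in range(len(ra)):
        if not emask[i]:
            continue                                # masked-off elements keep Rd
        bop = rb[i] if isinstance(rb, list) else rb
        value = (ra[i] ^ bop) & width_mask          # bitwise XOR within the element
        result[i] = (~value & width_mask) if invert else value
    return result
```

Masking the inverted value to the element width keeps VXNOR results inside a 9-bit slot rather than producing a negative Python integer.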
VXORALL Exclusive OR All Elements
Format
Assembler syntax
VXORALL.dt SRd,VRb
Where dt = {b, b9, h, w}. Note that .b and .b9 specify the same operation.
Supported modes
DS | int8(b) | int9(b9) | int16(h) | int32(w) |
Description
XORs together the least significant bits of all elements of VRb and returns the result in the least significant bit of SRd. This instruction is not affected by the element mask.
Operation
Exceptions
None.
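Since the Operation for VXORALL is not reproduced here, the following Python sketch models only what the Description states: the result is the parity of the elements' least significant bits. It is hypothetical, the name `vxorall` is reused for readability, and it assumes the remaining bits of SRd are cleared.

```python
from typing import List


def vxorall(vrb: List[int]) -> int:
    """Hypothetical model of VXORALL: XOR of every element's LSB.

    Assumption: the remaining bits of SRd are cleared.
    """
    parity = 0
    for element in vrb:        # all elements; the element mask is ignored
        parity ^= element & 1  # accumulate least significant bits
    return parity
```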
VWBACK Write Back
Format
Assembler syntax
VWBACK.ln SRb,SRi
VWBACK.ln SRb,#IMM
VWBACK.ln SRb+,SRi
VWBACK.ln SRb+,#IMM
Where ln = {1,2,4,8}.
Description
The vector data cache line whose index is specified by the EA (as opposed to the line whose tag matches the EA) is written back to memory if it contains modified data. If more than one cache line is specified, the subsequent consecutive cache lines are also written back to memory if they contain modified data. The number of cache lines is specified as follows:
LN<1:0> = 00: write back one 64-byte cache line
LN<1:0> = 01: write back two 64-byte cache lines
LN<1:0> = 10: write back four 64-byte cache lines
LN<1:0> = 11: write back eight 64-byte cache lines
If the effective address does not fall on a 64-byte boundary, it is first truncated to align with a 64-byte boundary.
Operation
Exceptions
Invalid data address.
Programming notes
EA<31:0> specifies a byte address in local memory.
Claims (9)
1. A processor comprising:
a scalar register adapted to store a single scalar value;
a vector register for storing a plurality of data elements; and
a processing circuit coupled to the scalar register and the vector register, wherein the processing circuit, in response to a single instruction, performs a plurality of operations in parallel, each operation combining one data element in the vector register with the scalar value in the scalar register.
2. A method of operating a processing circuit to execute an instruction, the method comprising:
reading, from a register, data elements that make up a vector value; and
performing parallel operations, each operation combining a scalar value with one of the data elements to produce a vector result.
3. The method as claimed in claim 2, wherein performing the parallel operations comprises multiplying the scalar value by each of the data elements to generate a vector data result.
4. The method as claimed in claim 2, wherein performing the parallel operations comprises adding the scalar value to each of the data elements to generate a vector data result.
5. The method as claimed in claim 2, further comprising reading, from another register, the scalar value that is combined with the data elements, wherein the other register is for storing a single scalar value.
6. The method as claimed in claim 2, further comprising extracting from the instruction the scalar value that is combined with the data elements.
7. A method of operating a processor, the method comprising:
providing a plurality of scalar registers and a plurality of vector registers in the processor, wherein each scalar register is for storing a single scalar value and each vector register is adapted to store a plurality of data elements that make up a vector;
assigning to each scalar register a register number that is different from the register numbers assigned to the other scalar registers;
assigning to each vector register a register number that is different from the register numbers assigned to the other vector registers, wherein the register numbers assigned to at least some of the vector registers are the same as register numbers assigned to the scalar registers;
forming an instruction that includes a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and
executing the instruction by transferring data between the scalar register identified by the first operand and a data element of the vector register identified by the second operand.
8. The method as claimed in claim 7, wherein:
forming the instruction further comprises including a third operand that identifies a data element in the vector; and
executing the instruction transfers data between the scalar register identified by the first operand and the data element, identified by the third operand, of the vector register identified by the second operand.
9. The method as claimed in claim 7, wherein:
forming the instruction further comprises including a third operand that identifies another scalar register; and
executing the instruction transfers data between the scalar register identified by the first operand and the data element of the vector register identified by the second operand that is identified by the value stored in the other scalar register.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US69958596A | 1996-08-19 | 1996-08-19 | |
US699585 | 1996-08-19 | ||
US699,585 | 1996-08-19 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1188275A (en) | 1998-07-22 |
CN1152300C (en) | 2004-06-02 |
Family ID: 24809983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB971174059A Expired - Fee Related CN1152300C (en) | 1996-08-19 | 1997-08-19 | Single-instruction-multiple-data processing with combined scalar/vector operations |
Country Status (6)
Country | Link |
---|---|
JP (1) | JPH10143494A (en) |
KR (1) | KR100267089B1 (en) |
CN (1) | CN1152300C (en) |
DE (1) | DE19735349B4 (en) |
FR (1) | FR2752629B1 (en) |
TW (1) | TW346595B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103002276B (en) * | 2011-03-31 | 2017-10-03 | ViXS Systems Inc. | Multi-format video decoder and coding/decoding method |
WO2013095597A1 (en) * | 2011-12-22 | 2013-06-27 | Intel Corporation | Systems, apparatuses, and methods for performing an absolute difference calculation between corresponding packed data elements of two vector registers |
US9792115B2 (en) * | 2011-12-23 | 2017-10-17 | Intel Corporation | Super multiply add (super MADD) instructions with three scalar terms |
CN102750133B (en) * | 2012-06-20 | 2014-07-30 | The 58th Research Institute of China Electronics Technology Group Corporation | 32-Bit triple-emission digital signal processor supporting SIMD |
KR102179385B1 (en) | 2013-11-29 | 2020-11-16 | 삼성전자주식회사 | Method and processor for implementing instruction and method and apparatus for encoding instruction and medium thereof |
GB2543303B (en) * | 2015-10-14 | 2017-12-27 | Advanced Risc Mach Ltd | Vector data transfer instruction |
US10108581B1 (en) * | 2017-04-03 | 2018-10-23 | Google Llc | Vector reduction processor |
US11409692B2 (en) * | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
CN114116513B (en) * | 2021-12-03 | 2022-07-29 | PLA Strategic Support Force Information Engineering University | Register mapping method and device from multi-instruction set architecture to RISC-V instruction set architecture |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5081573A (en) * | 1984-12-03 | 1992-01-14 | Floating Point Systems, Inc. | Parallel processing system |
US5001662A (en) * | 1989-04-28 | 1991-03-19 | Apple Computer, Inc. | Method and apparatus for multi-gauge computation |
JPH04336378A (en) * | 1991-05-14 | 1992-11-24 | Nec Corp | Information processor |
US5669013A (en) * | 1993-10-05 | 1997-09-16 | Fujitsu Limited | System for transferring M elements X times and transferring N elements one time for an array that is X*M+N long responsive to vector type instructions |
DE69519449T2 (en) * | 1994-05-05 | 2001-06-21 | Conexant Systems Inc | Space pointer data path |
1997
- 1997-04-04 KR KR1019970012609A patent/KR100267089B1/en not_active IP Right Cessation
- 1997-08-14 DE DE19735349A patent/DE19735349B4/en not_active Expired - Fee Related
- 1997-08-18 FR FR9710440A patent/FR2752629B1/en not_active Expired - Fee Related
- 1997-08-19 CN CNB971174059A patent/CN1152300C/en not_active Expired - Fee Related
- 1997-08-19 JP JP9222417A patent/JPH10143494A/en active Pending
- 1997-08-19 TW TW086111965A patent/TW346595B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
FR2752629B1 (en) | 2005-08-26 |
DE19735349A1 (en) | 1998-04-02 |
KR19980018065A (en) | 1998-06-05 |
TW346595B (en) | 1998-12-01 |
JPH10143494A (en) | 1998-05-29 |
FR2752629A1 (en) | 1998-02-27 |
KR100267089B1 (en) | 2000-11-01 |
CN1188275A (en) | 1998-07-22 |
DE19735349B4 (en) | 2006-12-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20040602 Termination date: 20090819 |