CN102750133A

CN102750133A - 32-Bit triple-emission digital signal processor supporting SIMD

Info

Publication number: CN102750133A
Application number: CN2012102058120A
Authority: CN
Inventors: 屈凌翔; 张庆文; 黄嵩人; 杨晓刚
Original assignee: CETC 58 Research Institute
Current assignee: CETC 58 Research Institute
Priority date: 2012-06-20
Filing date: 2012-06-20
Publication date: 2012-10-24
Anticipated expiration: 2032-06-20
Also published as: CN102750133B

Abstract

The invention discloses a 32-bit triple-issue digital signal processor (DSP) supporting SIMD (Single Instruction, Multiple Data). It comprises three pipelines issued in parallel: a data access pipeline, an integer arithmetic pipeline and a vector arithmetic pipeline. Each pipeline has its own decode and execution units and supports SIMD operations. The processor consists mainly of a program memory interface unit, a data memory interface unit, an instruction fetch unit, a pipeline control unit, a system bus, a data access pipeline unit, an integer arithmetic pipeline unit, a vector arithmetic pipeline unit, data registers, address registers, vector registers, a coprocessor interface unit and a floating-point unit, all connected by circuitry. Because the three pipelines execute in parallel, the parallel processing capability of the DSP is improved. In addition, the processor can execute four 16-bit multiply-add operations in parallel in a single cycle, and can perform five data operations and one data access operation simultaneously, which enhances the data processing capability of the DSP.

Description

32-bit triple-issue digital signal processor supporting SIMD

Technical field

The present invention relates to digital signal processors, and in particular to a 32-bit triple-issue DSP supporting SIMD.

Background

A DSP processes digital signals in real time. Its data processing capability far exceeds that of a general-purpose processor, and it plays an important role in fields such as data communication and multimedia processing. As communication and multimedia technology develop rapidly, the demands on DSP data processing capability keep rising. There are three main ways to raise that capability. The first is to increase the DSP clock frequency. The second is to adopt a multicore architecture, which raises the processing power of the whole circuit. The third is to adopt a multiple-issue architecture, which raises the parallel processing capability of the DSP core. As DSP clock frequencies climb, raising them further becomes ever more costly and difficult. A multicore architecture relaxes the requirements on each DSP core, but it greatly increases the design difficulty of the whole SoC. Therefore, besides higher clock frequencies and multicore architectures, multiple-issue architectures are becoming increasingly common. The present invention adopts a triple-issue structure in which three pipelines can execute in parallel. In particular, a separate vector processing pipeline is provided to raise multiply-add throughput. It executes fully independently of the other two pipelines and mainly supports parallel data operations.

Summary of the invention

The objective of the invention is to overcome the shortcomings of the prior art and to improve the parallel computing capability and multiply-add capability of the processor. To this end, it provides a 32-bit triple-issue DSP supporting SIMD, intended for multimedia processing and data communication.

According to the technical solution provided by the invention, the 32-bit triple-issue DSP supporting SIMD comprises three pipelines issued in parallel: a data access pipeline, an integer arithmetic pipeline and a vector arithmetic pipeline. Each pipeline has its own decode and execution units and supports SIMD operations.

The data access pipeline comprises an access decode unit, a storage control unit, an address arithmetic unit and an address generation unit. Their roles and connections are as follows:

- The access decode unit decodes the access-class instructions issued by the instruction fetch unit.
- The access decode unit and the storage control unit are connected to the address generation unit and the address arithmetic unit through control signals. The storage control unit controls the operation of these two units according to the decode result.
- The address arithmetic unit computes on address data taken from address registers, data registers or immediates, and saves the result in an address register.
- The address generation unit generates memory access addresses according to the addressing mode and related information.
- The address generation unit and the address arithmetic unit are connected to the address registers and data registers through a data bus. They read operands from these registers and then either write the result back to an address register, or generate a memory access address and send it to the data memory interface.

Only the data access pipeline reads and writes external memory; the other two pipelines operate on registers only.
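The patent says only that the address generation unit forms memory addresses according to the addressing mode. Elsewhere it mentions a rich set of DSP addressing modes but does not list them. The Python sketch below is therefore illustrative only. The function `agen` and its three modes are assumptions: base plus offset, post-increment and circular buffering, all common in DSPs. It shows how one access can produce two things at once: a memory address for the data memory interface, and an updated value to write back to an address register.

```python
# Hypothetical address-generation step for the data access pipeline.
# The three modes below are assumptions, not the patent's list of modes.

MASK32 = 0xFFFF_FFFF

def agen(mode, base, offset=0, step=0, buf_start=0, buf_len=0):
    """Return (memory_address, new_base) for one access.

    "offset":   access base + offset; base register unchanged
    "postinc":  access base, then base += step
    "circular": access base, then advance by step, wrapping inside
                [buf_start, buf_start + buf_len)
    """
    if mode == "offset":
        return (base + offset) & MASK32, base
    if mode == "postinc":
        return base, (base + step) & MASK32
    if mode == "circular":
        if buf_len <= 0:
            raise ValueError("circular mode needs a positive buffer length")
        new_base = buf_start + (base - buf_start + step) % buf_len
        return base, new_base & MASK32
    raise ValueError(f"unknown addressing mode: {mode!r}")
```

For example, `agen("circular", 0x100C, step=4, buf_start=0x1000, buf_len=16)` returns `(0x100C, 0x1000)`. The access uses the current pointer, and the updated pointer wraps back to the start of the buffer.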

The integer arithmetic pipeline comprises an ALU decode unit, an arithmetic control unit, a bit processing unit, an arithmetic logic unit (ALU) and a coprocessor interface unit. The ALU decode unit and the arithmetic control unit are connected to the ALU and the bit processing unit through control signals and control their operation. The ALU and the bit processing unit are connected to the data registers through a data bus. They read operands from the data registers and write results back to them. The bit processing unit performs bit operations, including bit insertion, bit extraction, bit replacement and shifts. The coprocessor interface unit handles data exchange with the coprocessor.
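The following sketch illustrates the bit insertion and extraction performed by the bit processing unit, on 32-bit words. Several details are assumptions, since the patent defines neither the instruction encoding nor the semantics:

- the function names;
- describing a field by its lowest bit position and width;
- zero extension on extract.

```python
# Sketch of 32-bit bit-field extract/insert; field description and
# zero extension are assumptions, not the patent's definition.

MASK32 = 0xFFFF_FFFF

def _field_mask(width):
    if not 1 <= width <= 32:
        raise ValueError("width must be 1..32")
    return (1 << width) - 1

def bit_extract(word, pos, width):
    """Return `width` bits of `word` starting at bit `pos`, zero-extended."""
    if pos < 0 or pos + width > 32:
        raise ValueError("field lies outside the 32-bit word")
    return (word >> pos) & _field_mask(width)

def bit_insert(word, value, pos, width):
    """Replace `width` bits of `word` at bit `pos` with the low bits of `value`."""
    if pos < 0 or pos + width > 32:
        raise ValueError("field lies outside the 32-bit word")
    m = _field_mask(width) << pos
    return ((word & ~m) | ((value << pos) & m)) & MASK32
```

Under these assumptions, bit replacement is the same operation as insertion. Only the bits inside the field change.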

The vector arithmetic pipeline comprises a multiply-add decode unit, a multiply-add control unit and a vector arithmetic unit. The multiply-add decode unit and the multiply-add control unit are connected to the vector arithmetic unit through control signals and control its operation. The vector arithmetic unit is connected to the data registers and vector registers through a data bus. It reads operands from the data or vector registers and writes results back to them. It contains an operand extraction unit, four 16×32-bit multipliers and two 64-bit accumulators. It is mainly used for parallel multiply, multiply-add and multiply-add/subtract operations, and supports SIMD.

The processor further comprises:

An instruction fetch unit shared by the three pipelines. Its program counter implements sequential execution and jumps of the program, and it predicts data conflicts. The unit reads program data through the program memory interface into its instruction buffer and inspects the instruction codes in the buffer. According to the result, it sends 32-bit or 16-bit instruction codes to the corresponding pipelines.

A floating-point unit (FPU), which acts as a coprocessor and is connected to the integer arithmetic pipeline through the coprocessor interface unit. It has its own floating-point instruction set and performs floating-point operations through the integer arithmetic pipeline. It returns results to the data registers through the same pipeline.

A register file comprising sixteen 32-bit address registers, sixteen 32-bit data registers, sixteen 32-bit vector registers and special function registers. The registers are used as follows:

- address registers: address arithmetic and memory address generation;
- data registers: integer and vector arithmetic;
- vector registers: the SIMD operations of vector arithmetic.

Each register file lets different pipelines read and write in parallel, through different ports, at the same time.

The program memory interface is connected to the instruction fetch unit through an address bus and a program bus. It receives the fetch address sent by the instruction fetch unit and returns instruction data to that unit over the program bus. The instruction fetch unit is connected through an instruction bus to the access decode unit, the ALU decode unit and the multiply-add decode unit. It can send three instructions to these three decode units in parallel. The data registers, vector registers and address registers are connected to the data memory interface unit through a data bus. They exchange data with external data memory through the data memory interface. The pipeline control unit is connected to each execution unit through pipeline control signals. It controls the execution of each unit and receives feedback from each.

The 32-bit triple-issue DSP supporting SIMD has dedicated multimedia processing instructions and dedicated Viterbi decoding instructions. It supports 16-bit instructions and floating-point instructions, and supports both fixed-point and floating-point arithmetic; the 16-bit instructions are a subset of the 32-bit instructions. It has separate data and program caches.

Each pipeline has its own dedicated instructions, and two bits of the instruction opcode are specifically designed to distinguish these three instruction classes. In the instruction fetch stage, the instruction fetch unit pre-decodes these two bits to determine the instruction class. It then sends each instruction to the corresponding pipeline.
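The sketch below shows the pre-decode step. The patent states only that two dedicated opcode bits distinguish the three classes, so the following are assumptions:

- the two class bits are the two least significant opcode bits (`CLASS_SHIFT`);
- the encoding in `CLASS_TABLE` is hypothetical;
- two bits give four codes but there are only three classes, so the fourth code is treated as reserved.

```python
# Pre-decode sketch: two opcode bits select the target pipeline.
# CLASS_SHIFT and CLASS_TABLE are hypothetical; the patent does not
# say which bits are used or how they are encoded.

from enum import Enum

class Pipe(Enum):
    DATA_ACCESS = "data access"
    INTEGER = "integer arithmetic"
    VECTOR = "vector arithmetic"

CLASS_SHIFT = 0      # assumed: class bits are opcode bits [1:0]
CLASS_TABLE = {      # assumed encoding; 0b11 left reserved
    0b00: Pipe.DATA_ACCESS,
    0b01: Pipe.INTEGER,
    0b10: Pipe.VECTOR,
}

def predecode(opcode):
    """Return the pipeline an instruction should be sent to."""
    bits = (opcode >> CLASS_SHIFT) & 0b11
    if bits not in CLASS_TABLE:
        raise ValueError(f"reserved class bits {bits:02b}")
    return CLASS_TABLE[bits]
```

Because only two bits are examined, classification needs no full decode. This is consistent with doing it in the fetch stage.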

The invention uses a hierarchical memory organization:

- The first level, closest to the execution units, is the internal register file. It comprises sixteen 32-bit address registers, sixteen 32-bit data registers and sixteen 32-bit vector registers.
- The second level is the cache.
- The third level is the data memory.

Registers can be combined as follows:

- two 32-bit address registers form a 64-bit address register pair;
- two 32-bit vector registers form a 64-bit vector register pair;
- two 32-bit data registers form a 64-bit data register pair;
- four vector registers, or four data registers, form a 128-bit register group that supports SIMD operations.
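The register combinations above can be sketched as views over a flat array of 32-bit registers. The following are assumptions, and the helper names are hypothetical:

- which register supplies the low half;
- the alignment rule: pairs start at even indices, groups at multiples of four.

```python
# Sketch: 32-bit registers viewed as 64-bit pairs and 128-bit groups.
# Low-word placement and alignment rules are assumptions.

MASK32 = 0xFFFF_FFFF

def read_pair(regs, n):
    """64-bit value of the pair (regs[n+1]:regs[n]); regs[n] is the low word."""
    if n % 2:
        raise ValueError("a pair must start at an even register")
    return ((regs[n + 1] & MASK32) << 32) | (regs[n] & MASK32)

def read_group(regs, n):
    """128-bit value of regs[n..n+3]; regs[n] is the least significant word."""
    if n % 4:
        raise ValueError("a group must start at a multiple of 4")
    value = 0
    for i in reversed(range(4)):
        value = (value << 32) | (regs[n + i] & MASK32)
    return value

def write_group(regs, n, value):
    """Split a 128-bit value across regs[n..n+3]."""
    if n % 4:
        raise ValueError("a group must start at a multiple of 4")
    for i in range(4):
        regs[n + i] = (value >> (32 * i)) & MASK32
```

With this layout, a 128-bit group is simply two adjacent 64-bit pairs.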

The invention uses a five-stage pipeline: fetch, decode, execute 1, execute 2 and write-back. Decode and execution each have three independent sets of hardware.

The three pipelines share the instruction fetch unit. They execute independently and in parallel in the decode, execute 1, execute 2 and write-back stages. Their responsibilities are:

- data access pipeline: address arithmetic, memory access and unconditional jumps;
- integer arithmetic pipeline: addition and subtraction, logic operations, comparisons, shifts, floating-point operations, bit operations and conditional jumps;
- vector arithmetic pipeline: mainly single or multiple parallel multiply, multiply-add, multiply-add-subtract and multiply-subtract-add operations.
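One simple way to form issue bundles, given that each of the three pipelines accepts at most one instruction per cycle, is sketched below. It assumes strict program order and ignores data dependencies and stalls; in the described design those would be handled by the pipeline control unit. `form_bundles` and the one-letter pipeline tags ("D", "I", "V") are hypothetical.

```python
# Sketch of in-order issue grouping for three pipelines: each cycle takes
# instructions in program order until a pipeline would be used twice or
# three instructions are grouped. Dependencies are deliberately ignored.

MAX_ISSUE = 3

def form_bundles(pipes):
    """pipes: pipeline tag per instruction in program order.
    Returns bundles as lists of instruction indices."""
    bundles, current, used = [], [], set()
    for idx, pipe in enumerate(pipes):
        if pipe in used or len(current) == MAX_ISSUE:
            bundles.append(current)        # close the current cycle's bundle
            current, used = [], set()
        current.append(idx)
        used.add(pipe)
    if current:
        bundles.append(current)
    return bundles
```

For the sequence D, I, V, V, I, D, D, this yields three cycles: [0, 1, 2], [3, 4, 5] and [6]. Code interleaving the three classes reaches the full three-instruction issue width.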

The instruction fetch unit pre-decodes instructions. According to the pre-decode result, it decides whether to issue each instruction to the data access, integer arithmetic or vector arithmetic pipeline. It contains a 16×16-bit instruction buffer and examines the instructions at the buffer's exit to determine the width and number of instructions to issue. It can issue up to three 32-bit instructions at once. When the buffer holds 128 bits of data or fewer, the instruction fetch unit reads in 128 bits of program data through the program memory interface.
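The buffer refill rule can be checked with a small occupancy model. Requesting 128 bits only when 128 bits or fewer remain guarantees that a refill never overflows the 256-bit buffer. The model makes these assumptions:

- fetched data arrives in the same cycle;
- issue is limited only by the bits in the buffer and the three-instruction maximum (pipeline classes are ignored);
- the names are hypothetical.

```python
# Occupancy model of the 16 x 16-bit instruction buffer.
# Same-cycle fetch latency and class-agnostic issue are assumptions.

BUF_BITS = 16 * 16   # 256-bit instruction buffer
FETCH_BITS = 128     # one read through the program memory interface
MAX_ISSUE = 3        # at most three instructions per cycle

def simulate(lengths):
    """lengths: instruction sizes in bits (16 or 32) in program order.
    Returns, per cycle, the list of sizes issued in that cycle."""
    unfetched = sum(lengths)   # program bits not yet read into the buffer
    occupancy = 0              # bits currently held in the buffer
    queue = list(lengths)      # instructions not yet issued
    schedule = []
    while queue:
        # Refill rule: request 128 bits when 128 bits or fewer remain.
        if occupancy <= BUF_BITS - FETCH_BITS and unfetched > 0:
            chunk = min(FETCH_BITS, unfetched)
            occupancy += chunk
            unfetched -= chunk
        assert occupancy <= BUF_BITS, "refill overflowed the buffer"
        # Issue whole instructions from the buffer head, at most three.
        issued = []
        while queue and len(issued) < MAX_ISSUE and queue[0] <= occupancy:
            occupancy -= queue[0]
            issued.append(queue.pop(0))
        schedule.append(issued)
    return schedule
```

For ten 16-bit instructions, the model issues 3, 3, 3 and 1 instructions over four cycles.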

The vector arithmetic pipeline has its own instruction decode unit, instruction control unit and vector arithmetic unit, and a dedicated instruction set. The vector arithmetic unit contains an operand extraction unit, four 16×32-bit multipliers and two 64-bit accumulators (ACC). The dedicated instruction set includes SIMD instructions supporting parallel multiply, multiply-add, multiply-subtract and multiply-add/subtract operations. At most four 16-bit, or two 32-bit, multiply, multiply-add, multiply-subtract or multiply-add/subtract operations can execute in parallel.
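The patent does not explain how the four 16×32-bit multipliers provide the two 32-bit multiply lanes. One plausible arrangement, shown here as an assumption, works as follows:

- each 32-bit multiplicand is split into a signed high halfword and an unsigned low halfword;
- the two 16×32 products are shifted and added, which gives the exact 32×32 product;
- each lane then accumulates into one of the two 64-bit accumulators.

Signedness, 64-bit wrap-around and the function names are also assumptions.

```python
# Sketch: two 32-bit multiply-add/subtract lanes built from 16x32 multiplies.
# The split and lane-to-accumulator mapping are assumptions.

def wrap(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mul32_via_16x32(x, y):
    """Signed 32x32 -> 64-bit product built from two 16x32 partial products."""
    lo = x & 0xFFFF      # low halfword of x, treated as unsigned
    hi = x >> 16         # high halfword of x, signed (arithmetic shift)
    return wrap(((hi * y) << 16) + lo * y, 64)

def mac32x2(accs, xs, ys, subtract=False):
    """Two 32-bit multiply-add (or multiply-subtract) lanes, one per
    64-bit accumulator: acc[i] + x[i]*y[i] or acc[i] - x[i]*y[i]."""
    sign = -1 if subtract else 1
    return [wrap(acc + sign * mul32_via_16x32(x, y), 64)
            for acc, x, y in zip(accs, xs, ys)]
```

Under this split, four 16×32 multipliers serve either four 16-bit lanes or two 32-bit lanes. That would match the "four 16-bit or two 32-bit" limit stated above.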

The invention has vector processing instructions dedicated to the vector processing pipeline. They operate on the 128-bit register groups Xn and XVn, so that a single instruction can perform four 16-bit multiply-add operations in parallel. Xn is a data register group formed by four 32-bit data registers, and XVn is a register group formed by four 32-bit vector registers.

The invention is a low-power, high-speed, high-performance DSP for embedded systems. It is mainly used in embedded applications such as wireless communication, image processing and real-time control. It adopts a superscalar RISC instruction architecture and supports:

- issuing three instructions in parallel per clock cycle;
- decoding of 16-bit and 32-bit instructions;
- single-cycle 4-MAC operation;
- a rich set of DSP addressing modes;
- SIMD operations;
- floating-point operations.

The advantages of the invention are as follows. The invention is a 32-bit fixed-point/floating-point signal processor supporting single-instruction multiple-data (SIMD) and triple issue. It issues different instructions in parallel to the corresponding execution units, supports parallel operation of three pipelines, and also supports SIMD instructions. Parallel execution of three pipelines improves the parallel processing capability of the DSP. An independent vector arithmetic unit is added, which performs four 16-bit multiply-add operations in parallel in a single cycle. The integer arithmetic unit and the data access unit execute in parallel with the vector pipeline. Together, the invention can carry out five data operations and one data access operation simultaneously, raising the data processing capability of the DSP.

Description of the drawings

Fig. 1 is the basic structural block diagram of the DSP of the invention.

Fig. 2 is the memory hierarchy diagram of the DSP of the invention.

Fig. 3 is the structure diagram of the instruction fetch unit of the DSP of the invention.

Fig. 4 is the basic block diagram of the vector arithmetic unit of the DSP of the invention.

Fig. 5 is the data flow diagram of the multiport register file of the DSP of the invention.

Embodiment

The invention is further described below with reference to the drawings and embodiments.

The DSP of the invention uses a triple-issue, five-stage pipeline structure with three pipelines executing in parallel. Each pipeline has five stages: fetch, decode, execute 1, execute 2 and write-back. The instruction fetch unit is shared by the three pipelines, while decode and execution have three independent sets of hardware.

The instruction set of the DSP contains 16-bit and 32-bit instructions; the 16-bit instructions are a subset of the 32-bit ones. Besides general-purpose DSP instructions, the instruction set includes four groups of special instructions:

- Dedicated multimedia processing instructions, such as vector averaging, matrix operations and byte extraction, which accelerate multimedia processing.
- Dedicated Viterbi decoding instructions, such as bit de-interleaving, bit operations and Viterbi traceback, which accelerate Viterbi decoding.
- Vector processing instructions, mainly used for SIMD operations. They operate on 128-bit data and support single-cycle 4-MAC operation.
- Floating-point instructions, which support floating-point arithmetic.
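The patent names vector averaging and byte extraction but does not define them. The sketch below shows one common interpretation, on 32-bit words holding four unsigned byte lanes. The lane order (lane 0 in the least significant byte), round-half-up averaging and the function names are assumptions.

```python
# Sketch of byte-lane multimedia operations on 32-bit words.
# Lane order and the rounding rule are assumptions.

def extract_byte(word, k, signed=False):
    """Return byte lane k (0..3) of a 32-bit word, zero- or sign-extended."""
    if not 0 <= k <= 3:
        raise ValueError("lane must be 0..3")
    b = (word >> (8 * k)) & 0xFF
    return b - 0x100 if signed and b & 0x80 else b

def avg_bytes(a, b):
    """Lane-wise (a + b + 1) >> 1 over four unsigned byte lanes."""
    out = 0
    for k in range(4):
        lane = (extract_byte(a, k) + extract_byte(b, k) + 1) >> 1
        out |= lane << (8 * k)
    return out
```

Because the average of two bytes never exceeds 255, each lane result fits back into its byte without saturation.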

As shown in Fig. 1, the invention comprises three pipelines issued in parallel: a data access pipeline, an integer arithmetic pipeline and a vector arithmetic pipeline. Each pipeline has its own decode and execution units and supports SIMD operations.

The invention consists mainly of a program memory interface unit, a data memory interface unit, an instruction fetch unit, a pipeline control unit, a system bus, a data access pipeline unit, an integer arithmetic pipeline unit, a vector arithmetic pipeline unit, data registers, address registers, vector registers, a coprocessor interface unit and a floating-point unit, connected by circuitry. The three pipeline units comprise:

- data access pipeline unit: an access decode unit, a storage control unit, an address arithmetic unit and an address generation unit;
- integer arithmetic pipeline unit: an ALU decode unit, an arithmetic control unit, an ALU and a bit processing unit;
- vector arithmetic pipeline unit: a multiply-add decode unit, a multiply-add control unit and a vector arithmetic unit.

The floating-point unit, acting as a coprocessor, is connected to the integer arithmetic pipeline through the coprocessor interface unit. The program memory interface unit and the data memory interface unit are the interfaces for exchanging data with the outside. The program memory interface contains a configurable program cache, and the data memory interface contains a configurable data cache. The pipeline control unit handles pipeline state management and exception handling. The general-purpose register file consists of sixteen 32-bit address registers, sixteen 32-bit data registers and sixteen 32-bit vector registers. It forms the first level of storage, closest to the processor; the second level is the cache and the third level is the data memory.

The connections between the modules are shown in Fig. 1 and are as follows:

- The program memory interface is connected to the instruction fetch unit through an address bus and a program bus. It receives the fetch address sent by the instruction fetch unit and returns instruction data over the program bus.
- The instruction fetch unit is connected through an instruction bus to the access decode unit, the ALU decode unit, the multiply-add decode unit and their associated instruction control units. It can send three instructions to these three decode units in parallel.
- The access decode unit and the storage control unit are connected to the address generation unit and the address arithmetic unit through control signals and control their operation.
- The address generation unit and the address arithmetic unit are connected to the address and data registers through a data bus. They read operands from these registers and then either write the result back to an address register, or generate a memory access address and send it to the data memory interface.
- The ALU decode unit and the arithmetic control unit are connected to the ALU and the bit processing unit through control signals and control their operation.
- The ALU and the bit processing unit are connected to the data registers through a data bus. They read operands from the data registers and write results back to them.
- The multiply-add decode unit and the multiply-add control unit are connected to the vector arithmetic unit through control signals and control its operation.
- The vector arithmetic unit is connected to the data and vector registers through a data bus. It reads operands from the data or vector registers and writes results back to them.
- The data, vector and address registers are connected to the data memory interface unit through a data bus. They exchange data with external data memory through the data memory interface.
- The pipeline control unit is connected to each execution unit through pipeline control signals. It controls the execution of each unit and receives feedback from each.
- The floating-point unit is connected to the integer arithmetic pipeline through the coprocessor interface. It receives instructions and data through the integer arithmetic pipeline, performs the corresponding operations, and returns results to the data registers through the same pipeline.

The DSP core reads instructions from external memory through the program memory interface into the instruction buffer of the instruction fetch unit. It pre-decodes the instructions at the buffer exit to determine their type, then sends them to the corresponding pipelines, issuing at most three instructions in parallel. When the decode unit and instruction control unit of a pipeline receive an instruction from the fetch unit, they decode it and generate the relevant control signals. They also determine the type and source of the operands and send the operands to the execution unit. The execution units of the integer and vector pipelines compute on the operands under the control signals generated by the decode units. They write the results to the relevant registers in the write-back stage. The execution unit of the data access pipeline either performs address arithmetic or generates the memory address of a data access. In the write-back stage, it saves an address arithmetic result to the corresponding address register, or reads from or writes to external memory at the generated address.

The program memory interface connects the external program bus to the internal instruction fetch unit. It contains a configurable program cache that can be enabled or disabled. The data memory interface connects the internal general-purpose registers to the external data bus and provides the path for data exchange. It contains a configurable data cache.

The instruction fetch unit is shared by the three pipelines. It reads instructions through the program memory interface, pre-decodes them and sends them to the corresponding pipelines.

The data access pipeline comprises an access decode unit, a storage control unit, an address arithmetic unit and an address generation unit:

- The access decode unit decodes the access-class instructions issued by the instruction fetch unit.
- The storage control unit controls the address arithmetic and address generation units according to the decode result.
- The address arithmetic unit computes on address data from address registers, data registers or immediates and saves the result in an address register.
- The address generation unit generates memory access addresses according to the addressing mode and related information.

The data access pipeline handles address computation and data access. It exchanges data between the internal general-purpose registers and external memory, and also moves data between the data registers and the address registers.

The integer arithmetic pipeline comprises an ALU decode unit, an arithmetic control unit, a bit processing unit, an arithmetic logic unit (ALU) and a coprocessor interface unit. The bit processing unit executes bit-operation instructions, including bit insertion, bit extraction, bit replacement and shifts. The coprocessor interface unit handles data exchange with the coprocessor. The integer arithmetic pipeline receives and executes the instructions sent by the instruction fetch unit and writes the results back to the data registers. It is mainly responsible for three kinds of operations:

- arithmetic operations such as addition and subtraction;
- logic operations such as AND, OR and NOT;
- bit operations such as shifts.

The vector arithmetic pipeline comprises a multiply-add decode unit, a multiply-add control unit and a vector arithmetic unit. The multiply-add decode unit decodes the instructions sent by the instruction fetch unit and generates the associated control signals. The multiply-add control unit controls the vector arithmetic unit according to the decode result. The vector arithmetic unit contains an operand extraction unit, four 16×32-bit multipliers and two 64-bit accumulators (ACC). It is mainly used for parallel multiply, multiply-add and multiply-add/subtract operations, and supports SIMD. The vector arithmetic pipeline mainly performs multi-lane parallel computation under a single instruction. It can execute up to four 16-bit multiply-add operations in parallel.

The pipeline control unit generates the control signals that govern the operation of each pipeline. It receives the current state of each pipeline and evaluates interrupts, traps and similar events. From this it generates the next state of each pipeline, which controls that pipeline's next-stage operation.

As shown in Fig. 2, the invention uses a hierarchical memory organization:

- The level closest to the execution units is the internal register file. It comprises sixteen 32-bit address registers, sixteen 32-bit data registers and sixteen 32-bit vector registers.
- The second level is the cache.
- The third level is the data memory.

The program memory interface contains a program cache, and the data memory interface contains a data cache. Each memory interface can be configured to use or bypass its cache (CACHE). The general-purpose registers can be combined:

- two 32-bit address registers form a 64-bit address register pair;
- two 32-bit vector registers form a 64-bit vector register pair;
- two 32-bit data registers form a 64-bit data register pair;
- four vector registers, or four data registers, form a 128-bit register group that supports SIMD operations.

As shown in Fig. 3, the instruction fetch unit reads instructions from outside into the instruction buffer (BUF), pre-decodes them and sends them to the corresponding pipelines. The instruction fetch unit comprises:

- A 256-bit instruction BUF, which holds the instructions read in. When the BUF contains fewer than 128 bits, the fetch unit generates a fetch request and reads 128 bits of instruction data into it.
- A fetch request generation unit, which produces the fetch request signal according to the pipeline state and the number of bits in the BUF.
- A fetch cancel signal generation unit, which decides from the pipeline state whether an outstanding fetch should be cancelled and, if so, generates a fetch cancel signal.
- A loop instruction processing unit and a branch/jump instruction processing unit, which handle loop instructions and branch/jump instructions respectively and compute the new PC value.
- A fetch address generation unit, which generates the fetch address for the next cycle from the old PC value and the results of the loop and branch/jump instructions.
- A pre-decode unit, which contains integer arithmetic, vector arithmetic and data access instruction generation logic, among others. It mainly examines the low-order bits of the BUF to determine how many instructions, and of what width, to read from it. It reads at most three 32-bit instructions at a time and sends them to the corresponding pipelines.
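The patent states only that the next fetch address is derived from the old PC and the results of the loop and branch/jump units. The sketch below of the next fetch-address selection therefore rests on these assumptions:

- byte addressing;
- a 16-byte (128-bit) sequential fetch step;
- the priority order: taken branch, then loop-back, then sequential;
- returning a cancel flag on redirection, as a simplification of the fetch cancel signal generation unit.

```python
# Sketch of next fetch-address selection. Step size and priority order
# are assumptions, not taken from the patent.

from dataclasses import dataclass
from typing import Optional

FETCH_BYTES = 16   # assumed: one 128-bit fetch, byte-addressed

@dataclass
class Redirects:
    branch_target: Optional[int] = None   # set when a branch/jump is taken
    loop_target: Optional[int] = None     # set when a loop end jumps back

def next_fetch(pc: int, r: Redirects):
    """Return (next_fetch_address, cancel_inflight_fetch)."""
    if r.branch_target is not None:
        return r.branch_target, True
    if r.loop_target is not None:
        return r.loop_target, True
    return pc + FETCH_BYTES, False
```

Any redirect makes the sequentially fetched data useless. Under this model, every redirect also raises the cancel flag.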

As shown in Fig. 4, the vector arithmetic unit mainly comprises data forwarding logic, an operand extraction unit, four 16×32-bit multipliers, four levels of carry-save adders (CSA) and two 64-bit ACCs. The multiply-add decode unit and multiply-add control unit decode the instruction and generate the control signals for the vector arithmetic unit. Operands pass through the data forwarding logic and are then sent to the operand extraction unit. They come mainly from the data registers, immediates or the previous cycle's results. An instruction operand is at most 128 bits and is taken from register group Xn or XVn. The operand extraction unit extracts operands according to the decode result and sends them to the multipliers and ACCs.

Consider, for example, a SIMD instruction that performs four 16-bit multiply-adds in parallel:

- The operand extraction unit splits the 64-bit input operands into 16-bit halfwords and delivers them to the four multipliers, which perform four 16-bit multiplications in parallel.
- It splits the 128-bit input operand into 32-bit words and delivers them to the two ACCs, which perform four 32-bit additions in parallel.
- The result formation logic combines the outputs of the two ACCs into a 128-bit vector result, which is written back to the 128-bit register group Xn or XVn.

The vector arithmetic unit can therefore execute four 16-bit multiply-adds of the form 32 + 16×16 in parallel. The four resulting 32-bit values are combined into 128 bits of data and written back to the register group.
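The four-lane multiply-add described above, where each 32-bit lane computes 32 + 16×16, can be modelled on Python integers. Signed halfwords, wrap-around instead of saturation, and lane 0 in the least significant bits are assumptions. The function name is hypothetical, and the CSA tree and forwarding logic are not modelled.

```python
# Lane-wise model of the 4-way SIMD multiply-add:
# r[i] = acc[i] + a[i] * b[i], with 16-bit a/b lanes and 32-bit acc/r lanes.

MASK32 = 0xFFFF_FFFF

def _signed(value, bits):
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def simd_mac4(acc128, a64, b64):
    """Return the packed 128-bit result of four 32 + 16x16 multiply-adds."""
    result = 0
    for i in range(4):
        a = _signed(a64 >> (16 * i), 16)          # halfword i of the 64-bit source
        b = _signed(b64 >> (16 * i), 16)
        acc = (acc128 >> (32 * i)) & MASK32       # word i of the 128-bit operand
        lane = (acc + a * b) & MASK32             # wrap to 32 bits
        result |= lane << (32 * i)                # repack into 128 bits
    return result
```

Each lane is independent, so the loop corresponds to the four multipliers and ACC additions working side by side in one cycle.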

As shown in Figure 5, the general-purpose register file of the present invention is a multi-port memory bank. The general-purpose registers comprise address registers, data registers, and vector registers; all three register files are multi-port banks that support parallel read and write operations. All three groups of registers support data exchange with external memory, and they also allow different pipelines to read data from them simultaneously for computation. The address registers can be read by the data access pipeline for address arithmetic; the data registers can be read by all three pipelines (integer arithmetic, vector operation, and data access) for computation; the vector registers can be read by the vector operation pipeline for computation.
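The read-access rules in the paragraph above amount to a small permission table, sketched here as a checker for one cycle's read requests. The pipeline and register-file labels are hypothetical names chosen for illustration; the number of read ports per bank is not given in the text, so the sketch simply assumes every permitted read can be served in parallel, and writes (including exchanges with external memory) are not modelled.

```python
# Which pipelines may read each register file, per the Figure 5 description.
# Labels are illustrative, not identifiers from the patent.
READ_ACCESS = {
    "address": frozenset({"data_access"}),
    "data": frozenset({"data_access", "integer", "vector"}),
    "vector": frozenset({"vector"}),
}


def disallowed_reads(requests: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (pipeline, register_file) read requests the description does not permit.

    `requests` lists the reads issued in one cycle; the multi-port banks are
    assumed (not stated in the text) to serve all permitted reads in parallel.
    """
    return [(pipe, rf) for pipe, rf in requests if pipe not in READ_ACCESS[rf]]
```

A cycle in which the data access pipeline reads an address register while the integer and vector pipelines both read data registers passes the check; an integer instruction reading a vector register does not.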

Claims

1. A 32-bit triple-issue digital signal processor supporting SIMD, characterized in that it comprises three pipelines issued in parallel: a data access pipeline, an integer arithmetic pipeline, and a vector operation pipeline; each pipeline has an independent decode unit and execution unit and supports SIMD operations;

It further comprises:

2. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that it has dedicated multimedia processing instructions and dedicated Viterbi decoding instructions, supports 16-bit instructions and floating-point instructions, and supports both fixed-point and floating-point arithmetic, wherein the 16-bit instructions are a subset of the 32-bit instructions.

3. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that it has an independent data cache and an independent program cache.

4. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that each pipeline has its own dedicated instructions, and two bits of the instruction opcode are specifically designated to distinguish these three classes of instructions; in the instruction fetch stage, the instruction fetch unit pre-decodes these two bits to determine the instruction class and then dispatches each instruction to the corresponding pipeline according to its class.

5. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that it adopts a hierarchical memory organization: the first level, closest to the execution units, is the internal register file, comprising sixteen 32-bit address registers, sixteen 32-bit data registers, and sixteen 32-bit vector registers; the second level is the cache; and the third level is the data memory. Two 32-bit address registers can form a 64-bit address register pair; two 32-bit vector registers can form a 64-bit vector register pair; two 32-bit data registers can form a 64-bit data register pair; and four vector registers or four data registers can form a 128-bit data register group, used to support SIMD operations.
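The pairing and grouping of 32-bit registers in claim 5 can be illustrated with a small register-bank model. This is a sketch under assumptions the claim does not state: pairs are taken to start at even register indices, 128-bit groups at multiples of four, and the lowest-numbered register is taken to hold the least significant 32 bits. The class and method names are hypothetical.

```python
class RegisterBank:
    """Sixteen 32-bit registers, also accessible as 64-bit pairs or 128-bit groups.

    Assumed (not stated in the claim): pairs start at even indices, groups at
    multiples of four, and the lowest-numbered register holds the low 32 bits.
    """

    COUNT = 16
    MASK32 = (1 << 32) - 1

    def __init__(self) -> None:
        self.regs = [0] * self.COUNT

    def _check(self, n: int, width: int) -> None:
        # width = number of 32-bit registers: 1 (single), 2 (pair), 4 (group)
        if width not in (1, 2, 4) or n % width or not 0 <= n <= self.COUNT - width:
            raise ValueError(f"invalid {32 * width}-bit access at register {n}")

    def read(self, n: int, width: int = 1) -> int:
        """Concatenate `width` consecutive registers starting at `n`."""
        self._check(n, width)
        value = 0
        for i in range(width):
            value |= self.regs[n + i] << (32 * i)
        return value

    def write(self, n: int, value: int, width: int = 1) -> None:
        """Split `value` into 32-bit words and store them starting at register `n`."""
        self._check(n, width)
        for i in range(width):
            self.regs[n + i] = (value >> (32 * i)) & self.MASK32
```

Writing a 128-bit value to the group starting at register 4 makes its low word readable as register 4 alone and its upper 64 bits readable as the pair starting at register 6; a pair access starting at an odd register is rejected under the assumed alignment rule.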

6. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that it adopts a 5-stage pipeline structure consisting of fetch, decode, execute 1, execute 2, and write-back, wherein the decode and execute parts each have three independent sets of components.

7. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 6, characterized in that the three pipelines share the instruction fetch unit and execute independently and in parallel in the decode, execute 1, execute 2, and write-back stages; wherein the data access pipeline is responsible for address arithmetic, memory access, and unconditional jumps; the integer arithmetic pipeline is responsible for addition and subtraction, logical operations, compare operations, shift operations, floating-point operations, bit manipulation, and conditional jumps; and the vector operation pipeline is mainly responsible for executing single or multiple parallel multiply, multiply-add, multiply-add-subtract, and multiply-subtract-add operations.
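The division of work in claims 6 and 7 can be visualised with an idealised, stall-free timing table: all instructions fetched together pass through the shared fetch stage in the same cycle, then proceed through their own decode, execute 1, execute 2, and write-back stages in lockstep. The operation labels below are hypothetical shorthand for the categories listed in claim 7, and the model ignores hazards, forwarding, and stalls.

```python
STAGES = ("IF", "ID", "EX1", "EX2", "WB")

# Hypothetical labels for the operation categories of claim 7, mapped to pipelines.
ROUTING = {
    "address_arith": "data_access", "load_store": "data_access", "uncond_jump": "data_access",
    "add_sub": "integer", "logic": "integer", "compare": "integer", "shift": "integer",
    "float": "integer", "bit": "integer", "cond_jump": "integer",
    "mul": "vector", "mac": "vector", "mul_add_sub": "vector", "mul_sub_add": "vector",
}


def timeline(packets: list[list[str]]) -> dict[int, list[tuple[str, str, str]]]:
    """Ideal stage-per-cycle table for a sequence of issue packets.

    packets[k] lists the operations fetched together in cycle k; each packet may
    contain at most one operation per pipeline. Returns {cycle: [(op, pipeline, stage)]}.
    """
    table: dict[int, list[tuple[str, str, str]]] = {}
    for k, packet in enumerate(packets):
        pipes = [ROUTING[op] for op in packet]
        if len(set(pipes)) != len(pipes):
            raise ValueError("at most one operation per pipeline per packet")
        for op, pipe in zip(packet, pipes):
            for s, stage in enumerate(STAGES):
                # Without stalls, an op fetched in cycle k is in stage s at cycle k + s.
                table.setdefault(k + s, []).append((op, pipe, stage))
    return table
```

With a load, an add, and a multiply-add fetched in cycle 0 followed by an unconditional jump in cycle 1, all three first-packet operations write back together in cycle 4 while the jump is in execute 2, which is the overlap that triple issue is meant to deliver.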

8. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that the instruction fetch unit has an instruction pre-decode function and, according to the pre-decode result, decides whether to issue each instruction to the data access pipeline, the integer arithmetic pipeline, or the vector operation pipeline; the instruction fetch unit contains a 16×16-bit (256-bit) instruction buffer; the instruction fetch unit examines the instructions at the exit of the instruction buffer and decides the number of bits and the number of instructions to issue, and can issue at most three 32-bit instructions simultaneously; when the data in the instruction buffer is less than or equal to 128 bits, the instruction fetch unit reads in 128 bits of program data through the program memory interface.

9. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that the vector operation pipeline has an independent instruction decode and instruction control unit, an independent vector operation unit, and a dedicated instruction set; the vector operation unit comprises an operand extraction unit, four 16×32 multiplication units, and two 64-bit ACCs; the dedicated instruction set includes SIMD instructions supporting parallel multiply, multiply-add, multiply-subtract, multiply-add/subtract, and similar operations; and at most four 16-bit or two 32-bit multiplications, multiply-adds, multiply-subtracts, or multiply-add/subtracts can be executed in parallel.
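Claim 9 says four 16×32 multipliers support either four 16-bit or two 32-bit multiplications, but does not say how a 32-bit multiplication is formed. One plausible decomposition, offered here purely as an assumption, splits one 32-bit operand into a signed high halfword and an unsigned low halfword, so each 32×32 product uses two 16×32 multipliers and two such products use all four. The sketch also shows one hypothetical 32-bit multiply-accumulate lane into a 64-bit ACC, wrapping on overflow.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul32_via_two_16x32(a: int, b: int) -> int:
    """Signed 32x32 -> 64-bit product from two 16x32 partial products (hypothetical).

    a = a_hi * 2**16 + a_lo with a_hi signed and a_lo unsigned, so
    a * b = (a_hi * b) * 2**16 + a_lo * b.
    """
    a, b = to_signed(a, 32), to_signed(b, 32)
    a_lo = a & 0xFFFF       # unsigned low halfword
    a_hi = a >> 16          # signed high halfword (Python >> is arithmetic)
    partial_lo = a_lo * b   # first 16x32 multiplier
    partial_hi = a_hi * b   # second 16x32 multiplier
    return (partial_hi << 16) + partial_lo


def mac32(acc64: int, a: int, b: int) -> int:
    """One 32-bit multiply-accumulate lane into a 64-bit ACC, wrapping on overflow (assumed)."""
    return (to_signed(acc64, 64) + mul32_via_two_16x32(a, b)) & ((1 << 64) - 1)
```

Note that the low-half partial product treats its 16-bit input as unsigned, so hardware following this scheme would need multipliers that accept an unsigned (or 17-bit signed) halfword; the patent does not say whether its 16×32 units do.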

10. The 32-bit triple-issue digital signal processor supporting SIMD according to claim 1, characterized in that it has vector processing instructions dedicated to the vector processing pipeline; the vector processing instructions support operations on the 128-bit data register groups Xn and XVn, so that a single instruction can execute four 16-bit multiply-add operations in parallel; wherein Xn is a data register group composed of four 32-bit data registers, and XVn is a data register group composed of four 32-bit vector registers.