CN102750133B

CN102750133B - 32-bit triple-issue digital signal processor supporting SIMD

Info

Publication number: CN102750133B
Application number: CN201210205812.0A
Authority: CN
Inventors: 屈凌翔; 张庆文; 黄嵩人; 杨晓刚
Original assignee: CETC 58 Research Institute
Current assignee: CETC 58 Research Institute
Priority date: 2012-06-20
Filing date: 2012-06-20
Publication date: 2014-07-30
Anticipated expiration: 2032-06-20
Also published as: CN102750133A

Abstract

The invention discloses a 32-bit triple-issue digital signal processor supporting SIMD (Single Instruction, Multiple Data). It comprises three pipelines that issue in parallel: a data access pipeline, an integer arithmetic pipeline, and a vector arithmetic pipeline. Each pipeline has its own independent decode and execution units and supports SIMD operation. The processor is mainly composed of a program memory interface unit, a data memory interface unit, an instruction fetch unit, a pipeline control unit, a system bus, a data access pipeline unit, an integer arithmetic pipeline unit, a vector arithmetic pipeline unit, data registers, address registers, vector registers, a coprocessor interface unit, and a floating-point unit, all connected by circuitry. Because the three pipelines execute in parallel, the parallel processing capability of the DSP (Digital Signal Processor) is improved. In addition, the processor can execute four groups of 16-bit multiply-add operations in parallel in a single cycle, and can perform five groups of data operations and one group of data access simultaneously, which enhances the data processing capability of the DSP.

Description

32-bit triple-issue digital signal processor supporting SIMD

Technical field

The present invention relates to digital signal processors, and specifically to a 32-bit triple-issue DSP supporting SIMD.

Background art

A DSP processes digital signals in real time and offers data-processing capability far beyond that of a general-purpose processor, so it plays an important role in fields such as data communication and multimedia processing. With the rapid development of communication and multimedia technology, the demands on DSP data-processing capability keep rising. The main ways to raise that capability are increasing the DSP clock frequency, adopting a multi-core architecture to raise the processing capability of the whole circuit, and adopting a multiple-issue architecture to raise the parallel processing capability of the DSP core. As DSP clock frequencies climb, the cost and difficulty of raising them further also grow; and although a multi-core architecture relaxes the requirements on each DSP core, it greatly increases the design difficulty of the whole SoC. Therefore, alongside higher clock frequencies and multi-core architectures, multiple-issue architectures are becoming increasingly common. The present invention adopts a 3-issue structure in which 3 pipelines can execute in parallel. In particular, to raise multiply-add capability, a separate vector processing pipeline is provided; it executes completely independently of the other two pipelines and mainly supports parallel data operations.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art and improve the parallel computing capability and multiply-add capability of the processor by providing a 32-bit triple-issue digital signal processor supporting SIMD, aimed at the fields of multimedia processing and data communication.

According to the technical solution provided by the invention, the 32-bit triple-issue digital signal processor supporting SIMD comprises 3 pipelines that issue in parallel: a data access pipeline, an integer arithmetic pipeline, and a vector operation pipeline. Each pipeline has independent decode and execution units and supports SIMD operation;

The data access pipeline comprises an access decode unit, a storage control unit, an address arithmetic unit, and an address generation unit. The access decode unit and the storage control unit are connected to the address generation unit and the address arithmetic unit by control signals and control their operation. The address generation unit and the address arithmetic unit are connected to the address registers and the data registers by a data bus; they read data from the address registers and data registers, perform the computation, and either write the result back to an address register or generate a memory access address that is sent to the data memory interface. The access decode unit decodes the access-class instructions issued by the instruction fetch unit. The storage control unit controls the operation of the address arithmetic unit and the address generation unit according to the access decode result. The address arithmetic unit computes on address data from the address registers, data registers, or immediates, and stores the result in an address register. The address generation unit generates the memory access address according to the addressing mode and related information. Only the data access pipeline reads and writes external memory; the other two pipelines operate only on registers;

The integer arithmetic pipeline comprises an ALU decode unit, an arithmetic control unit, a bit processing unit, an ALU, and a coprocessor interface unit. The ALU decode unit and the arithmetic control unit are connected to the ALU and the bit processing unit by control signals and control their operation. The ALU and the bit processing unit are connected to the data registers by a data bus; they read data from the data registers, process it, and write the result back to the data registers. The bit processing unit executes bit operations, including bit insertion, bit extraction, bit replacement, and shift operations. The coprocessor interface unit is responsible for data exchange with the coprocessor;

The vector operation pipeline comprises a multiply-add decode unit, a multiply-add control unit, and a vector operation unit. The multiply-add decode unit and the multiply-add control unit are connected to the vector operation unit by control signals and control its operation. The vector operation unit is connected to the data registers and the vector registers by a data bus; it reads data from the data registers or vector registers, processes it, and writes the result back to the data registers or vector registers. The vector operation unit contains an operand extraction unit, four 16*32 multiplier units, and two 64-bit accumulators; it is mainly used to execute parallel multiply, multiply-add, and multiply-add-subtract operations, and supports SIMD;

The processor further comprises:

An instruction fetch unit, shared by the 3 pipelines, which controls its program counter to implement sequential execution and jumps of the program and to predict data conflicts. It reads program data through the program memory interface into its instruction buffer, examines the instruction codes in the buffer, and, according to the result, sends 32-bit or 16-bit instruction codes to the corresponding pipelines;

A floating-point unit, which acts as a coprocessor and is connected to the integer arithmetic pipeline through the coprocessor interface unit. It has an independent floating-point instruction set, carries out floating-point operations by way of the integer arithmetic pipeline, and returns results to the data registers through the integer arithmetic pipeline;

A register file comprising sixteen 32-bit address registers, sixteen 32-bit data registers, sixteen 32-bit vector registers, and special function registers. The address registers are used for address arithmetic and memory access address generation, the data registers for integer and vector operations, and the vector registers for the SIMD operations of vector computation; all of them allow different pipelines to read and write data in parallel through different ports;

The program memory interface is connected to the instruction fetch unit by an address bus and a program bus; it receives the fetch address sent by the instruction fetch unit and returns instruction data to the instruction fetch unit over the program bus. The instruction fetch unit is connected by an instruction bus to the access decode unit, the ALU decode unit, and the multiply-add decode unit, and can send 3 instructions to these 3 decode units in parallel. The data registers, vector registers, and address registers are connected to the data memory interface unit by a data bus and exchange data with external data memory through the data memory interface. The pipeline control unit is connected to each execution unit by pipeline control signals; it controls the execution of each execution unit and receives feedback from each.

The 32-bit triple-issue digital signal processor supporting SIMD provides dedicated multimedia processing instructions and dedicated Viterbi decoding instructions, supports 16-bit instructions and floating-point instructions, and supports both fixed-point and floating-point operations; the 16-bit instructions are a subset of the 32-bit instructions. It has independent data and program caches.

Each pipeline has its own dedicated instructions, and two bits of the instruction opcode are specifically reserved to distinguish these three classes of instruction. In the instruction fetch stage, the instruction fetch unit pre-decodes these two bits to determine the instruction class and then sends the instruction to the corresponding pipeline.
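
The pre-decode step can be illustrated with a minimal Python sketch. The text states only that two opcode bits distinguish the three instruction classes; the bit position (the two low-order bits), the code values, and the rule that each pipeline accepts one instruction per cycle in program order are all assumptions made for illustration, as are the names `predecode` and `dispatch`.

```python
from enum import Enum


class Pipeline(Enum):
    DATA_ACCESS = "data access"
    INTEGER = "integer arithmetic"
    VECTOR = "vector operation"


# Hypothetical encoding: the source does not say which values map to which class.
CLASS_BITS = {
    0b00: Pipeline.DATA_ACCESS,
    0b01: Pipeline.INTEGER,
    0b10: Pipeline.VECTOR,
}


def predecode(instruction_word: int) -> Pipeline:
    """Return the target pipeline from the two (assumed low-order) class bits."""
    class_field = instruction_word & 0b11
    if class_field not in CLASS_BITS:
        raise ValueError(f"unassigned class field {class_field:#04b}")
    return CLASS_BITS[class_field]


def dispatch(words):
    """Assign up to three instruction words to pipelines for one cycle.

    Assumption: issue stops at the first word whose pipeline is already taken.
    """
    slots = {}
    for word in words[:3]:
        target = predecode(word)
        if target in slots:
            break
        slots[target] = word
    return slots
```

With this hypothetical encoding, three consecutive words of different classes fill all three pipelines in one cycle, while two consecutive words of the same class issue in separate cycles.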

The present invention adopts a hierarchical memory organization. The first level, closest to the execution units, is the internal register file, comprising sixteen 32-bit address registers, sixteen 32-bit data registers, and sixteen 32-bit vector registers; the second level is the cache; the third level is the data memory. Two 32-bit address registers can form a 64-bit address register pair, two 32-bit vector registers can form a 64-bit vector register pair, two 32-bit data registers can form a 64-bit data register pair, and 4 vector registers or 4 data registers can form a 128-bit data register group to support SIMD operations.

The present invention adopts a 5-stage pipeline structure: instruction fetch, decode, execute 1, execute 2, and write-back. The decode and execute stages each have 3 independent sets of hardware.

The 3 pipelines share the instruction fetch unit and carry out the decode, execute 1, execute 2, and write-back stages independently and in parallel. The data access pipeline is responsible for address arithmetic, memory access, and unconditional jumps. The integer arithmetic pipeline is responsible for addition and subtraction, logical operations, comparisons, shifts, floating-point operations, bit operations, and conditional jumps. The vector operation pipeline is mainly responsible for executing single or multiple parallel multiply, multiply-add, multiply-add-subtract, and multiply-subtract-add operations.

The instruction fetch unit has an instruction pre-decode function and, according to the pre-decode result, determines whether an instruction is issued to the data access pipeline, the integer arithmetic pipeline, or the vector operation pipeline. The instruction fetch unit contains a 16*16-bit instruction buffer; it examines the instructions at the buffer outlet to determine the width and number of instructions to issue, and can issue at most three 32-bit instructions simultaneously. When the data in the instruction buffer is less than or equal to 128 bits, the instruction fetch unit reads in 128 bits of program data through the program memory interface.

The vector operation pipeline has independent instruction decode and instruction control units, an independent vector operation unit, and a dedicated instruction set. The vector operation unit contains an operand extraction unit, four 16*32 multiplier units, and two 64-bit accumulators (ACC). The dedicated instruction set includes SIMD instructions and supports parallel multiply, multiply-add, multiply-subtract, multiply-add-subtract, and similar operations. At most 4 groups of 16-bit or 2 groups of 32-bit multiply, multiply-add, multiply-subtract, or multiply-add-subtract operations can execute in parallel.

The present invention has vector-processing-class instructions dedicated to the vector processing pipeline. These instructions can operate on the 128-bit data register groups Xn and XVn, so that a single instruction can execute 4 groups of 16-bit multiply-add operations in parallel. Here Xn is a data register group made up of four 32-bit data registers, and XVn is a data register group made up of four 32-bit vector registers.

The present invention is a low-power, high-speed, high-performance digital signal processor for embedded applications, mainly used in embedded systems such as wireless communication, image processing, and real-time control. It adopts a superscalar RISC instruction architecture and supports parallel issue of 3 instructions in a single clock cycle, 16/32-bit instruction decoding, single-cycle 4-MAC operation, a rich set of DSP addressing modes, SIMD operation, and floating-point operation.

The advantages of the present invention are as follows. It is a 32-bit fixed-point/floating-point signal processor supporting single instruction, multiple data (SIMD) and triple issue: it issues different instructions in parallel to the corresponding execution units, supports parallel operation of 3 pipelines, and also supports SIMD-class instructions. Parallel execution of the 3 pipelines improves the parallel processing capability of the DSP. The added independent vector operation unit can execute 4 groups of 16-bit multiply-add operations in parallel in a single cycle; together with the integer arithmetic unit and the data access unit, which execute in parallel with the vector pipeline, the invention can carry out 5 groups of data operations and 1 group of data access at the same time, raising the data processing capability of the DSP.

Brief description of the drawings

Fig. 1 is a basic structural block diagram of the DSP of the present invention.

Fig. 2 is a diagram of the multi-level memory hierarchy of the DSP of the present invention.

Fig. 3 is a structural diagram of the instruction fetch unit of the DSP of the present invention.

Fig. 4 is a basic block diagram of the vector operation unit of the DSP of the present invention.

Fig. 5 is a data flow diagram of the multi-port register file of the DSP of the present invention.

Detailed description of the embodiments

The invention is further described below with reference to the drawings and embodiments.

The digital signal processor of the present invention adopts a 3-issue, 5-stage pipeline structure with 3 pipelines executing in parallel; each pipeline has the 5 stages of instruction fetch, decode, execute 1, execute 2, and write-back. The instruction fetch unit is shared by the 3 pipelines, while the decode and execute stages have 3 independent sets of hardware.
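
As a rough illustration of the 5-stage flow, the following Python sketch reports which stage an instruction occupies in a given cycle. The stage names come from the text; the stall-free timing model (one stage per cycle, a new issue packet every cycle) and the function names are assumptions, not a description of the actual hardware timing.

```python
# Stage names as listed in the text.
STAGES = ["fetch", "decode", "execute 1", "execute 2", "write-back"]


def stage_at(fetch_cycle: int, cycle: int):
    """Stage occupied in `cycle` by an instruction fetched in `fetch_cycle`.

    Returns None when the instruction is not in flight. Assumes no stalls.
    """
    index = cycle - fetch_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None


def cycles_to_complete(num_packets: int) -> int:
    """Cycles needed to retire `num_packets` back-to-back issue packets, no stalls."""
    return 0 if num_packets == 0 else len(STAGES) + num_packets - 1
```

Under this idealized model, four consecutive issue packets of up to three instructions each complete in eight cycles.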

The instruction set of the digital signal processor of the present invention comprises 16-bit instructions and 32-bit instructions, the 16-bit instructions being a subset of the 32-bit instructions. Besides the general DSP instructions, the instruction set also provides dedicated multimedia processing instructions, dedicated Viterbi decoding instructions, vector processing instructions, and floating-point instructions. The dedicated multimedia processing instructions include vector averaging, matrix operations, byte extraction, and similar instructions for accelerating multimedia processing. The dedicated Viterbi decoding instructions include bit interleaving, bit separation, Viterbi traceback, and similar instructions for accelerating Viterbi decoding. The vector-processing-class instructions are mainly used for SIMD operations; they support operations on 128-bit data and single-cycle 4-MAC operation. The floating-point instructions support floating-point operations.

As shown in Fig. 1, the present invention comprises 3 pipelines that issue in parallel: a data access pipeline, an integer arithmetic pipeline, and a vector operation pipeline. Each pipeline has independent decode and execution units and supports SIMD operation.

The present invention is mainly formed by connecting, through circuitry, a program memory interface unit, a data memory interface unit, an instruction fetch unit, a pipeline control unit, a system bus, a data access pipeline unit, an integer arithmetic pipeline unit, a vector operation pipeline unit, data registers, address registers, vector registers, a coprocessor interface unit, and a floating-point unit. The data access pipeline unit comprises an access decode unit, a storage control unit, an address arithmetic unit, and an address generation unit; the integer arithmetic pipeline unit comprises an ALU decode unit, an arithmetic control unit, an ALU, and a bit processing unit; the vector operation pipeline unit comprises a multiply-add decode unit, a multiply-add control unit, and a vector operation unit. The floating-point unit is connected to the integer arithmetic pipeline as a coprocessor through the coprocessor interface unit. The program memory interface unit and the data memory interface unit are the interfaces for exchanging data with the outside; the program memory interface contains a configurable program cache, and the data memory interface contains a configurable data cache. The pipeline control unit handles pipeline state management and exception handling. The general-purpose register file consists of sixteen 32-bit address registers, sixteen 32-bit data registers, and sixteen 32-bit vector registers and forms the first-level storage closest to the processor; the second level is the cache, and the third level is the data memory.

The connections between the modules of the present invention are shown in Fig. 1. The program memory interface is connected to the instruction fetch unit by an address bus and a program bus; it receives the fetch address sent by the instruction fetch unit and returns instruction data to the instruction fetch unit over the program bus. The instruction fetch unit is connected by an instruction bus to the access decode unit, the ALU decode unit, the multiply-add decode unit, and the associated instruction control units, and can send 3 instructions to these 3 decode units in parallel. The access decode unit and the storage control unit are connected to the address generation unit and the address arithmetic unit by control signals and control their operation. The address generation unit and the address arithmetic unit are connected to the address registers and the data registers by a data bus; they read data from the address registers and data registers, perform the computation, and either write the result back to an address register or generate a memory access address that is sent to the data memory interface. The ALU decode unit and the arithmetic control unit are connected to the ALU and the bit processing unit by control signals and control their operation. The ALU and the bit processing unit are connected to the data registers by a data bus; they read data from the data registers, process it, and write the result back to the data registers. The multiply-add decode unit and the multiply-add control unit are connected to the vector operation unit by control signals and control its operation. The vector operation unit is connected to the data registers and the vector registers by a data bus; it reads data from the data registers or vector registers, processes it, and writes the result back to the data registers or vector registers. The data registers, vector registers, and address registers are connected to the data memory interface unit by a data bus and exchange data with external data memory through the data memory interface. The pipeline control unit is connected to each execution unit by pipeline control signals; it controls the execution of each execution unit and receives feedback from each. The floating-point unit is connected to the integer arithmetic pipeline through the coprocessor interface; it reads instructions and data through the integer arithmetic pipeline, performs the corresponding operations, and returns results to the data registers through the integer arithmetic pipeline.

The DSP core reads instructions from external memory through the program memory interface into the instruction buffer of the instruction fetch unit, pre-decodes the instructions at the buffer outlet to determine their type, and sends them to the corresponding pipelines; at most 3 instructions can be sent in parallel. After receiving an instruction from the fetch unit, the decode unit and instruction control unit in each pipeline decode it, generate the relevant control signals, determine the type and source of the operands, and send the operands to the execution unit. Under the control signals generated by the decode unit, the execution units in the integer arithmetic pipeline and the vector operation pipeline compute on the operands, and the results are written to the corresponding registers in the write-back stage. The execution unit of the data access pipeline performs address arithmetic or generates the memory access address for a data access operation; in the write-back stage it either saves the address arithmetic result to the corresponding address register or reads or writes external memory at the generated address.

The program memory interface connects the external program bus to the internal instruction fetch unit. It contains a configurable program cache, which can be enabled or disabled. The data memory interface connects the internal general-purpose registers to the external data bus and provides the interface for data exchange; it contains a configurable data cache.

The instruction fetch unit is shared by the 3 pipelines. It reads in instructions through the program memory interface, pre-decodes them, and sends them to the corresponding pipelines.

The data access pipeline comprises an access decode unit, a storage control unit, an address arithmetic unit, and an address generation unit. The access decode unit decodes the access-class instructions issued by the instruction fetch unit. The storage control unit controls the operation of the address arithmetic unit and the address generation unit according to the access decode result. The address arithmetic unit computes on address data from the address registers, data registers, or immediates, and stores the result in an address register. The address generation unit generates the memory access address according to the addressing mode and related information. The data access pipeline is responsible for address computation and data access: it carries out data exchange between the internal general-purpose registers and external memory, and is also responsible for data exchange between the data registers and the address registers.

The integer arithmetic pipeline comprises an ALU decode unit, an arithmetic control unit, a bit processing unit, an arithmetic logic unit (ALU), and a coprocessor interface unit. The bit processing unit executes bit operation instructions, including bit insertion, bit extraction, bit replacement, and shift operations. The coprocessor interface unit is responsible for data exchange with the coprocessor. The integer arithmetic pipeline receives and executes instructions sent from the fetch unit and writes the results back to the data registers; it is mainly responsible for arithmetic operations such as addition and subtraction, logical operations such as AND, OR, and NOT, and bit operations such as shifts.
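
To make the bit operations concrete, here is a small Python sketch of bit-field extraction and insertion on a 32-bit register value. The text names these operations but does not define their operands or encodings, so the `(pos, width)` field convention, the unsigned extraction, and the function names are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF  # data registers are 32 bits wide


def bit_extract(value: int, pos: int, width: int) -> int:
    """Return the unsigned `width`-bit field of `value` that starts at bit `pos`."""
    return (value >> pos) & ((1 << width) - 1)


def bit_insert(value: int, field: int, pos: int, width: int) -> int:
    """Return `value` with the `width`-bit field at bit `pos` replaced by `field`."""
    mask = ((1 << width) - 1) << pos
    return ((value & ~mask) | ((field << pos) & mask)) & MASK32
```

For example, `bit_extract(0xABCD1234, 8, 8)` selects the byte `0x12`, and `bit_insert(0xABCD1234, 0xFF, 8, 8)` replaces that byte with `0xFF`.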

The vector operation pipeline comprises a multiply-add decode unit, a multiply-add control unit, and a vector operation unit. The multiply-add decode unit decodes the instructions sent from the fetch unit and generates the relevant control signals; the multiply-add control unit controls the operation of the vector operation unit according to the decode result. The vector operation unit contains an operand extraction unit, four 16*32 multiplier units, and two 64-bit accumulators (ACC); it is mainly used to execute parallel multiply, multiply-add, multiply-add-subtract, and similar operations, and supports SIMD. The vector operation pipeline is mainly responsible for single-instruction, multi-way parallel computation and can execute at most 4 groups of 16-bit multiply-add operations in parallel.

The pipeline control unit generates the control signals that govern the operation of each pipeline. It receives the current state of each pipeline, evaluates interrupts, traps, and similar events, generates the next state of each pipeline, and controls the next-stage operation of each pipeline.

As shown in Fig. 2, the present invention adopts a hierarchical memory organization. The level closest to the execution units is the internal register file, comprising sixteen 32-bit address registers, sixteen 32-bit data registers, and sixteen 32-bit vector registers; the second level is the cache; the third level is the data memory. The program memory interface contains a program cache and the data memory interface contains a data cache; each memory interface can be configured to use or bypass its cache. The general-purpose registers can be used in combination: two 32-bit address registers can form a 64-bit address register pair, two 32-bit vector registers can form a 64-bit vector register pair, two 32-bit data registers can form a 64-bit data register pair, and 4 vector registers or 4 data registers can form a 128-bit data register group to support SIMD operations.
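
The register combination can be sketched in Python as plain concatenation of 32-bit values. The text does not state which register of a pair or group holds the low-order bits, so placing the first register in the low bits is an assumption, and the helper names `combine` and `split` are hypothetical.

```python
WORD_BITS = 32
WORD_MASK = 0xFFFFFFFF


def combine(registers):
    """Concatenate 32-bit registers into one wide value.

    Assumption: the first register in the list occupies the low-order bits.
    """
    value = 0
    for i, reg in enumerate(registers):
        value |= (reg & WORD_MASK) << (WORD_BITS * i)
    return value


def split(value: int, count: int):
    """Split a wide value back into `count` 32-bit registers (low word first)."""
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(count)]
```

Two registers passed to `combine` give a 64-bit pair, and four registers give a 128-bit group of the kind used by the SIMD instructions.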

As shown in Fig. 3, the instruction fetch unit of the present invention reads instructions from outside into the instruction BUF (buffer), pre-decodes them, and sends them to the corresponding pipelines. The instruction fetch unit comprises a 256-bit instruction BUF, a pre-decode unit, a fetch request generation unit, a fetch cancel signal generation unit, a fetch address generation unit, a loop instruction processing unit, and a branch/jump instruction processing unit. The instruction BUF holds the instructions that have been read in; when the data in the instruction BUF falls below 128 bits, the fetch unit generates a fetch request signal and reads 128 bits of instruction data into the instruction BUF. The fetch request generation unit produces the fetch request signal according to the pipeline state and the number of data bits in the instruction BUF. The fetch cancel signal generation unit decides from the pipeline state whether the fetch should be cancelled and, if so, generates the fetch cancel signal. The loop instruction processing unit and the branch/jump instruction processing unit handle loop instructions and branch/jump instructions respectively and compute the new PC value. The fetch address generation unit generates the fetch address for the next cycle from the previous PC value and the results of the loop and branch/jump instructions. The instruction pre-decode unit comprises integer arithmetic instruction generation logic, vector operation instruction generation logic, data access instruction generation logic, and so on. It is mainly responsible for examining the low-order bits of the instruction BUF to determine the number and width of the instructions to read out; at most three 32-bit instructions can be read out simultaneously and sent to the corresponding pipelines.
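
The buffer management described above can be sketched as follows in Python. The 256-bit capacity, the 128-bit fetch size, the 128-bit threshold, and the 3-instruction limit come from the text; everything else is a simplification. In particular, the sketch ignores pipeline state and fetch cancellation, takes instruction widths as a given list rather than decoding them, and uses hypothetical function names.

```python
BUFFER_BITS = 256  # instruction BUF capacity (from the text)
FETCH_BITS = 128   # bits read per fetch request (from the text)
MAX_ISSUE = 3      # at most 3 instructions leave the buffer per cycle (from the text)


def needs_fetch(occupancy_bits: int) -> bool:
    """A fetch request is raised when fewer than 128 bits are buffered."""
    return occupancy_bits < FETCH_BITS


def refill(occupancy_bits: int) -> int:
    """Return the buffer occupancy after an optional 128-bit fetch."""
    if needs_fetch(occupancy_bits):
        occupancy_bits += FETCH_BITS
    assert occupancy_bits <= BUFFER_BITS
    return occupancy_bits


def issue(occupancy_bits: int, lengths):
    """Return the widths of the instructions issued this cycle.

    `lengths` lists the widths (16 or 32 bits) of the next instructions in
    program order; issue stops when the buffer does not hold a whole instruction.
    """
    issued = []
    available = occupancy_bits
    for length in lengths[:MAX_ISSUE]:
        if length not in (16, 32):
            raise ValueError("instructions are 16 or 32 bits wide")
        if length > available:
            break
        issued.append(length)
        available -= length
    return issued
```

Because a fetch is requested only when fewer than 128 bits remain, adding 128 bits can never overflow the 256-bit buffer, which is what the assertion in `refill` checks.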

As shown in Fig. 4, the vector operation unit of the present invention mainly comprises data forwarding logic, an operand extraction unit, four 16*32 multiplier units, a 4-level CSA, and two 64-bit ACCs. The multiply-add decode unit and the multiply-add control unit decode the instruction and generate the control signals for the vector operation unit. Operands processed by the data forwarding logic are sent to the operand extraction unit. Operands come mainly from the data registers, immediates, or the result of the previous cycle; an instruction operand is at most 128 bits wide and comes from the data register group Xn or XVn. The operand extraction unit extracts operands according to the instruction decode result and delivers them to the multiplier units and the ACCs. For example, when executing a SIMD instruction that performs 4 groups of 16-bit multiply-add in parallel, the operand extraction unit delivers the 64-bit input operands, in units of 16-bit half-words, to the 4 multiplier units, which perform 4 parallel 16-bit multiplications; it then delivers the 128-bit input operand, in units of 32-bit words, to the two ACCs, which perform 4 parallel 32-bit additions. The results of the two ACCs are processed by the result generation logic and combined into a 128-bit vector result, which is written back to the 128-bit data register group Xn or XVn. The vector operation unit can therefore execute 4 groups of 16-bit multiply-add in parallel, each computing 32+16*16, and combine the 4 resulting 32-bit values into 128-bit data that is written back to the data register group.
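
The 4-way multiply-add (32+16*16 per lane) can be modelled with a short Python sketch. The lane structure follows the text: two 64-bit operands split into 16-bit half-words, one 128-bit operand split into 32-bit words, and a 128-bit result. The signed interpretation of the operands, wrap-around on 32-bit overflow (no saturation), the placement of lane 0 in the low-order bits, and all function names are assumptions.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack(values, bits: int) -> int:
    """Pack integers into adjacent `bits`-wide lanes, lane 0 in the low bits."""
    out = 0
    for i, v in enumerate(values):
        out |= (v & ((1 << bits) - 1)) << (bits * i)
    return out


def unpack(value: int, bits: int, count: int):
    """Unpack `count` signed `bits`-wide lanes, lane 0 first."""
    return [to_signed(value >> (bits * i), bits) for i in range(count)]


def simd_mac4(acc128: int, a64: int, b64: int) -> int:
    """Per lane: 32-bit word + 16-bit half-word * 16-bit half-word, wrapped to 32 bits."""
    result = 0
    for lane in range(4):
        a = to_signed(a64 >> (16 * lane), 16)
        b = to_signed(b64 >> (16 * lane), 16)
        c = to_signed(acc128 >> (32 * lane), 32)
        lane_result = (c + a * b) & 0xFFFFFFFF
        result |= lane_result << (32 * lane)
    return result
```

A usage example: with addends `[10, 20, 30, 40]`, multiplicands `[1, 2, 3, -4]`, and multipliers `[5, 6, 7, 8]`, `unpack(simd_mac4(pack([10, 20, 30, 40], 32), pack([1, 2, 3, -4], 16), pack([5, 6, 7, 8], 16)), 32, 4)` gives `[15, 32, 51, 8]`.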

As shown in Fig. 5, the general-purpose register file of the present invention is a multi-port memory. The general-purpose registers comprise address registers, data registers, and vector registers; all three register files are multi-port memories that support parallel reads and writes. All three groups of registers support data exchange with external memory and also allow different pipelines to read data from them simultaneously for computation. The address registers allow the data access pipeline to read data for address arithmetic; the data registers allow the integer arithmetic, vector operation, and data access pipelines to read data for computation; the vector registers allow the vector operation pipeline to read data for computation.
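
The operand-read relationships just described can be summarized as a small lookup table in Python. The table reflects only the computation reads listed in the paragraph above; exchange with external memory and write-back paths are not modelled, and the bank and pipeline labels and function names are shorthand chosen for this sketch.

```python
# Which pipelines read operands from which register bank, as stated in the text.
READERS = {
    "address": {"data access"},
    "data": {"data access", "integer", "vector"},
    "vector": {"vector"},
}


def can_read(pipeline: str, bank: str) -> bool:
    """True if `pipeline` reads computation operands from register bank `bank`."""
    return pipeline in READERS[bank]


def banks_for(pipeline: str):
    """Sorted list of register banks that `pipeline` reads operands from."""
    return sorted(bank for bank, readers in READERS.items() if pipeline in readers)
```

The table makes it easy to see that the data registers are the only bank shared by all three pipelines, which is why they need the most read ports.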

Claims

1. A 32-bit triple-issue digital signal processor supporting SIMD, characterized in that it comprises 3 pipelines that issue in parallel: a data access pipeline, an integer arithmetic pipeline, and a vector operation pipeline, each pipeline having independent decode and execution units and supporting SIMD operation;

the data access pipeline comprises an access decoding unit, a storage control unit, an address arithmetic unit, and an address generation unit; the access decoding unit and the storage control unit are connected to the address generation unit and the address arithmetic unit by control signals and control the operation of the address generation unit and the address arithmetic unit; the address generation unit and the address arithmetic unit are connected to the address registers and the data registers by a data bus, read data from the address registers and the data registers for computation, and either write the result back to an address register or generate a memory access address that is sent to the data memory interface; the access decoding unit decodes the access-class instructions issued by the instruction fetch unit; the storage control unit controls the operation of the address arithmetic unit and the address generation unit according to the access decoding result; the address arithmetic unit performs calculations on address data from the address registers, the data registers, or immediates, and stores the result in an address register; the address generation unit generates the memory access address according to the addressing mode; only the data access pipeline reads and writes external memory, and the other two pipelines operate only on registers;

the integer arithmetic pipeline comprises an ALU decoding unit, an arithmetic control unit, a bit processing unit, an ALU, and a coprocessor interface unit; the ALU decoding unit and the arithmetic control unit are connected to the ALU and the bit processing unit by control signals and control the operation of the ALU and the bit processing unit; the ALU and the bit processing unit are connected to the data registers by a data bus, read data from the data registers for processing, and write the result back to the data registers; the bit processing unit executes bit operations, including bit insertion, bit extraction, bit replacement, and bit shifting; the coprocessor interface unit is responsible for exchanging data with the coprocessor;

and further comprising:

an instruction fetch unit, which is shared by the three pipelines, controls the program counter, implements sequential execution and jumps of the program, and predicts data conflicts; program data is read through the program memory interface into the instruction buffer of the instruction fetch unit, the instruction codes in the instruction buffer are examined, and according to the result the 32-bit or 16-bit instruction codes are sent to the corresponding pipelines;

a register file, comprising sixteen 32-bit address registers, sixteen 32-bit data registers, sixteen 32-bit vector registers, and special function registers; the address registers are used for address arithmetic and memory access address generation, the data registers are used for integer arithmetic and vector arithmetic, and the vector registers are used to support the SIMD operations of vector arithmetic, all of which allow different pipelines to read and write data in parallel through different ports; the program memory interface is connected to the instruction fetch unit by an address bus and a program bus, receives the fetch address sent by the instruction fetch unit, and sends instruction data to the instruction fetch unit over the program bus; the instruction fetch unit is connected by an instruction bus to the access decoding unit, the ALU decoding unit, and the multiply-accumulate decoding unit, and can send three instructions to these three decoding units in parallel; the data registers, vector registers, and address registers are connected to the data memory interface unit by a data bus and exchange data with external data memory through the data memory interface; the pipeline control unit is connected to each execution unit by pipeline control signals, controls the execution of each execution unit, and receives feedback from each execution unit.

2. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that it has dedicated multimedia processing instructions and dedicated Viterbi decoding instructions, supports 16-bit instructions and floating-point instructions, and supports fixed-point and floating-point operations, wherein the 16-bit instructions are a subset of the 32-bit instructions.

3. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that it has an independent data cache and an independent program cache.

4. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that each pipeline has its own dedicated instructions, and two bits in the instruction opcode are specifically designed to distinguish these three classes of instruction; in the instruction fetch stage, the instruction fetch unit pre-decodes these two bits to determine the instruction class and then sends the instruction to the corresponding pipeline.
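The two-bit pre-decode of claim 4 can be sketched as follows. The claim states only that two opcode bits distinguish the three instruction classes; the bit position (`CLASS_SHIFT`, here the top two bits of a 32-bit word), the numeric encoding of each class, and the names `Pipeline` and `predecode` are all hypothetical choices made for illustration.

```python
from enum import Enum


class Pipeline(Enum):
    # Hypothetical encoding: the claim does not assign values to the classes.
    DATA_ACCESS = 0
    INTEGER = 1
    VECTOR = 2


CLASS_SHIFT = 30  # assumed position of the two class bits in a 32-bit instruction


def predecode(instr32):
    """Return the pipeline selected by the two class bits of an instruction word.

    Raises ValueError for the fourth bit pattern, which this sketch
    leaves unassigned.
    """
    code = (instr32 >> CLASS_SHIFT) & 0b11
    return Pipeline(code)
```

Under these assumptions, `predecode(0x80000000)` selects the vector arithmetic pipeline and `predecode(0x40000001)` selects the integer arithmetic pipeline.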

5. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that it adopts a hierarchical memory organization: the first level, closest to the execution units, is the internal register file, comprising sixteen 32-bit address registers, sixteen 32-bit data registers, and sixteen 32-bit vector registers; the second level is the cache; the third level is the data memory; two 32-bit address registers can form a 64-bit address register pair, two 32-bit vector registers can form a 64-bit vector register pair, two 32-bit data registers can form a 64-bit data register pair, and four vector registers or four data registers can form a 128-bit data register group for supporting SIMD operations.
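The register pairing and grouping of claim 5 can be modeled with a short sketch. The class `RegisterFile` and its methods are hypothetical. The sketch assumes that a pair or group is formed from consecutive registers, that the first register index is aligned to the group size, and that the lowest-numbered register holds the least significant 32 bits; the claim specifies none of these details.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Sixteen 32-bit registers that can be read or written as pairs or groups."""

    def __init__(self, count=16):
        self.regs = [0] * count

    def _check(self, first, n):
        # Alignment to the group size is an assumption of this sketch.
        if n not in (1, 2, 4) or first % n != 0 or first + n > len(self.regs):
            raise ValueError("invalid register pair/group")

    def read_group(self, first, n):
        """Read n consecutive registers as one (32*n)-bit value."""
        self._check(first, n)
        value = 0
        for i in range(n):
            value |= (self.regs[first + i] & MASK32) << (32 * i)
        return value

    def write_group(self, first, n, value):
        """Write a (32*n)-bit value into n consecutive registers."""
        self._check(first, n)
        for i in range(n):
            self.regs[first + i] = (value >> (32 * i)) & MASK32
```

Writing a 128-bit value with `write_group(4, 4, value)` fills registers 4 to 7, and `read_group(4, 2)` then returns the low 64 bits as a register pair.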

6. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that it adopts a 5-stage pipeline structure consisting of fetch, decode, execute 1, execute 2, and write-back, wherein the decode and execute stages each have three independent sets of components.

7. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 6, characterized in that the three pipelines share the instruction fetch unit and execute independently and in parallel in the decode, execute 1, execute 2, and write-back stages; wherein the data access pipeline is responsible for address arithmetic, memory access, and unconditional jumps; the integer arithmetic pipeline is responsible for addition and subtraction, logical operations, comparison operations, shift operations, floating-point operations, bit operations, and conditional jumps; and the vector arithmetic pipeline is mainly responsible for executing single or multiple parallel multiply, multiply-add, multiply-add-subtract, and multiply-subtract-add operations.

8. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that the instruction fetch unit has an instruction pre-decode function and determines from the pre-decode result whether an instruction is issued to the data access pipeline, the integer arithmetic pipeline, or the vector arithmetic pipeline; the instruction fetch unit contains a 16*16-bit instruction buffer; the instruction fetch unit examines the instructions at the outlet of the instruction buffer to determine the width and number of instructions to issue, and can issue at most three 32-bit instructions simultaneously; when the data in the instruction buffer is less than or equal to 128 bits, the instruction fetch unit reads in 128 bits of program data through the program memory interface.
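The buffer refill and issue rules of claim 8 can be modeled as follows. This is a simplified sketch: the class `InstructionBuffer` and the helper `is_32bit` are hypothetical, the rule that the top bit of the first half-word marks a 32-bit instruction is an invented placeholder (the claim does not say how instruction width is encoded), and conflicts between instructions that target the same pipeline are ignored.

```python
from collections import deque

HALFWORD_BITS = 16
BUFFER_HALFWORDS = 16          # 16*16-bit buffer from claim 8
REFILL_THRESHOLD_BITS = 128    # refill when the buffer holds <= 128 bits
FETCH_BITS = 128               # program data read per refill
MAX_ISSUE = 3                  # at most three instructions per cycle


def is_32bit(halfword):
    """Hypothetical width rule: top bit set means a 32-bit instruction."""
    return bool(halfword & 0x8000)


class InstructionBuffer:
    def __init__(self, program):
        self.program = deque(program)  # program memory as 16-bit half-words
        self.buf = deque()

    def refill(self):
        """Read 128 bits of program data when 128 bits or fewer remain."""
        if len(self.buf) * HALFWORD_BITS <= REFILL_THRESHOLD_BITS:
            for _ in range(FETCH_BITS // HALFWORD_BITS):
                if not self.program:
                    break
                self.buf.append(self.program.popleft())
        assert len(self.buf) <= BUFFER_HALFWORDS

    def issue(self):
        """Issue up to three 16-bit or 32-bit instructions from the buffer outlet."""
        self.refill()
        issued = []
        while len(issued) < MAX_ISSUE and self.buf:
            need = 2 if is_32bit(self.buf[0]) else 1
            if len(self.buf) < need:
                break
            issued.append(tuple(self.buf.popleft() for _ in range(need)))
        return issued
```

Because a refill happens only when at most 8 half-words remain and adds at most 8 more, the occupancy never exceeds the 16 half-words of the buffer, which matches the 16*16-bit size stated in the claim.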

9. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that the vector arithmetic pipeline has an independent instruction decoding and instruction control unit, an independent vector arithmetic unit, and a dedicated instruction set; the vector arithmetic unit comprises an operand extraction unit, four 16*32 multiplication units, and two 64-bit accumulators; the dedicated instruction set includes SIMD instructions and supports parallel multiply, multiply-add, multiply-subtract, and multiply-add-subtract operations; at most four groups of 16-bit or two groups of 32-bit multiply, multiply-add, multiply-subtract, or multiply-add-subtract operations are executed in parallel.
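Claim 9 does not state how the two-group 32-bit mode maps onto the four 16*32 multiplication units. One plausible reading, shown here purely as an assumption, is that each 32*32 product is built from two 16*32 partial products. The function name `mul32_via_16x32` is hypothetical, and the sketch demonstrates only the arithmetic identity, not the circuit of the invention.

```python
def mul32_via_16x32(a, b):
    """Compute a signed 32x32 product from two 16x32 partial products.

    a is split into a signed upper half-word and an unsigned lower
    half-word, so that a == (a_hi << 16) + a_lo.
    """
    a_hi = a >> 16          # signed upper 16 bits (arithmetic shift)
    a_lo = a & 0xFFFF       # unsigned lower 16 bits
    partial_hi = a_hi * b   # would occupy one 16*32 multiplier (assumed)
    partial_lo = a_lo * b   # would occupy a second 16*32 multiplier (assumed)
    return (partial_hi << 16) + partial_lo
```

Under this reading, two 32-bit multiplications would occupy all four multipliers, which is consistent with the limit of two 32-bit groups per instruction.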

10. The 32-bit triple-issue digital signal processor supporting SIMD as claimed in claim 1, characterized in that it has vector-processing-class instructions dedicated to the vector arithmetic pipeline; the vector-processing-class instructions support operations on the 128-bit data register groups Xn and XVn, so that a single instruction can execute four parallel 16-bit multiply-accumulate operations; wherein Xn is a data register group formed from four 32-bit data registers, and XVn is a data register group formed from four 32-bit vector registers.