CN100461095C

CN100461095C - Design method for a media-enhanced pipelined multiplication unit supporting multiple modes

Info

Publication number: CN100461095C
Application number: CNB2007101569132A
Authority: CN
Inventors: 严晓浪; 孟建熠; 葛海通
Original assignee: Zhejiang University ZJU; Hangzhou C Sky Microsystems Co Ltd
Current assignee: Zhejiang University ZJU; Hangzhou C Sky Microsystems Co Ltd
Priority date: 2007-11-20
Filing date: 2007-11-20
Publication date: 2009-02-11
Anticipated expiration: 2027-11-20
Also published as: CN101162423A

Abstract

The invention discloses a design method for a media-enhanced pipelined multiplication unit supporting multiple modes. In addition to ordinary multiplication and multiply-accumulate operations, the invention provides a single-instruction multiple-data (SIMD) mode for data-parallel workloads in media applications, a multiple-instruction multiple-data (MIMD) mode for workloads without data parallelism, and a high-precision mode for high-precision data operations. These four multiplication modes support the complex digital signal processing algorithms used in media applications. The four modes can be switched dynamically by instructions, which effectively reduces the time needed to switch between algorithms. The multiplication unit uses a pipeline designed separately for each kind of multiplication, which raises the speed of all multiplications and greatly improves computational efficiency. When the multiplication unit is applied to an embedded processor, it improves the processor's digital signal processing capability and widens its range of applications.

Description

A design method for a multi-mode media-enhanced pipelined multiplication unit

Technical field

The present invention relates to a design method for a multi-mode media-enhanced pipelined multiplication unit.

Background technology

With the continuing development of multimedia encoding and decoding, communication technology, and very-large-scale integration (VLSI) design capability, multimedia audio and video applications based on embedded processors and DSPs have advanced enormously. Consumer electronics such as mobile phones, MP3 players, digital television, and digital broadcasting have become common, inexpensive consumer goods.

At the same time, as technology advances, users demand ever more of media products in real-time performance, complexity, and audio and video quality. This pushes multimedia technology to develop further and forces innovation in how digital signal processing is implemented; hardware-accelerated encoding and decoding has become the inevitable choice for digital signal processing today. Digital signal processing uses many algorithms for different application characteristics, such as FIR, DCT, and IDCT, but analysis shows that the varied algorithms used across media processing all reduce to multiplication and multiply-accumulate operations. Implementing and accelerating multiplication and multiply-accumulate is therefore the most critical part of any hardware codec or DSP implementation.

To increase the flexibility of hardware systems and raise their processing capability, digital signal processors (DSPs) are widely used in audio and video systems. A DSP accelerates multiplication and multiply-accumulate operations in hardware, which significantly improves processing capability, and because it runs software, the various audio and video standards can be implemented in software, which holds great promise for product flexibility. However, DSP products are aimed mainly at computation rather than control, which leaves them ill-suited to today's multitasking workloads and steadily rising programming complexity.

Embedded microprocessors cost less and are well suited to running complex control programs, but their digital signal processing capability is comparatively weak. Current multimedia applications need both strong digital signal processing capability, to sample and analyze various signals, and strong control capability, to support multitasking. Current trends show microprocessors and DSPs gradually converging. Adding a corresponding DSP enhancement unit to the microprocessor has therefore become urgent.

The multi-mode media-enhanced pipelined multiplication unit proposed in this patent is a media codec enhancement unit for embedded processors, applied in the field of digital signal processing. In essence, it is a simple extension to the embedded processor that preserves the processor's control capability while effectively raising its digital signal processing capability, so that it provides part of the functionality of a DSP.

Summary of the invention

The purpose of this invention is to provide a design method for a multi-mode media-enhanced pipelined multiplication unit.

The design method for the multi-mode media-enhanced pipelined multiplication unit has the following features:

1) ordinary single-instruction single-data (SISD) multiplication, single-instruction multiple-data (SIMD) multiplication, multiple-instruction multiple-data (MIMD) multiplication, or high-precision multiplication is selected dynamically according to the type of assembly instruction;

2) the four kinds of multiplication above adopt a pipelined design;

3) the four multiplication operations above are each separated into an operand preparation stage and a data operation stage, and each stage is processed individually;

4) every data path of the multi-mode media-enhanced pipelined multiplication unit can perform both multiplication and multiply-accumulate operations;

5) for the high-precision multiplication data path of the multiplication unit, overall performance is improved by increasing the depth of the pipeline;

6) the multi-mode media-enhanced pipelined multiplication unit provides two operation types, integer and fractional;

7) in multiply-accumulate operations, the intermediate result obtained from the multiplication is extended with guard bits to enlarge the range of the addition;

8) an instruction retirement buffer is provided to support fast retirement of multiply and multiply-accumulate instructions;

9) the high register and the low register are physically independent and logically linked.

Dynamic selection of ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, or high-precision multiplication according to the type of assembly instruction: assembly instructions supporting the corresponding modes are added to the media processor instruction set; the user simply invokes instructions of different types, and the hardware automatically selects the corresponding mode to perform the operation. SISD mode means that a single instruction operates on a single datum; SIMD mode means that a single instruction operates on multiple data simultaneously; MIMD mode means that multiple instructions execute simultaneously and operate on multiple data; high-precision multiplication means a multiply operation whose operands are wider than those of ordinary multiplication and whose result is correspondingly more precise.

The four kinds of multiplication adopt a pipelined design: ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, and high-precision multiplication are each divided into several steps, and in every clock cycle the pipeline hardware at each stage executes different operations of different instructions in parallel.

The four multiplication operations are each separated into an operand preparation stage and a data operation stage, with each stage processed individually: operand preparation and data operation are independent of each other at the abstract level, so they are assigned to different pipeline stages and operate in parallel.

Every data path of the multi-mode media-enhanced pipelined multiplication unit can perform both multiplication and multiply-accumulate operations: each data path can perform multiplication as an independent data operation logic unit and also supports multiply-accumulate, which reuses the multiplication functional unit.

The multi-mode media-enhanced pipelined multiplication unit provides integer and fractional operation types: according to the user's configuration, the multiplier performs fractional multiplication or integer multiplication by changing the control logic of the hardware data path.

In multiply-accumulate operations, the intermediate result obtained from the multiplication is extended with guard bits to enlarge the range of the addition: during accumulation, each intermediate product is sign-extended accordingly; the sign extension increases the precision of the addition and guarantees that the accumulated result does not overflow within a certain range.

An instruction retirement buffer is provided to support fast retirement of multiply and multiply-accumulate instructions: a dedicated retirement buffer is designed into the multiplication unit, mainly to control fast retirement of instructions without intervention by the main pipeline.

The high register and the low register are physically independent and logically linked: physically, the two registers are independent and receive the results of different data paths; logically, in high-precision multiplication and multiply-accumulate, where the result width increases, the high register holds the upper part of the result and the low register holds the lower part, so the two remain logically connected.

The present invention is an advanced multiplication unit structure that meets the needs of complex digital signal processing algorithms in media applications. The structure supports ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, and high-precision multiplication, providing flexible support for complex algorithms. Its pipelined design greatly raises the computational throughput of the multiplication unit and improves processing speed.

Description of drawings

Fig. 1 shows the pipeline partitioning of the multi-mode multiplication unit;

Fig. 2 is a schematic of the operand preparation unit hardware;

Fig. 3 is a schematic of the data path hardware;

Fig. 4 is a circuit diagram of the fast retirement mechanism hardware.

Embodiment

The multi-mode media-enhanced pipelined multiplication unit is a hardware execution unit designed to enhance media algorithms on a general-purpose embedded processor. It is only an execution unit, serving as an effective complement to a complex embedded processor.

Dynamic selection of ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, or high-precision multiplication according to the type of assembly instruction: assembly instructions supporting the corresponding modes are added to the media processor instruction set; the user simply invokes instructions of different types, and the hardware automatically selects the corresponding mode to perform the operation. SISD mode means that a single instruction operates on a single datum; SIMD mode means that a single instruction operates on multiple data simultaneously; MIMD mode means that multiple instructions execute simultaneously and operate on multiple data; high-precision multiplication means a multiply operation whose operands are wider than those of ordinary multiplication and whose result is correspondingly more precise. To use this execution unit, the general-purpose embedded processor must first extend its instruction set, adding the various media-enhancement instructions to the original instruction set. These instructions fall into the following types:

● normal-precision multiply and multiply-accumulate instructions

● high-precision multiply and multiply-accumulate instructions

● single-instruction multiple-data (SIMD) instructions

● multiple-instruction multiple-data (MIMD) instructions

Normal-precision multiply and multiply-accumulate instructions perform ordinary operations; they are generally two-cycle instructions used in ordinary situations. High-precision multiply and multiply-accumulate instructions have operands wider than those of the ordinary instructions, usually twice as wide. They are used where high data accuracy is required (computation on high-fidelity video and audio data). A single SIMD multiply instruction can perform multiplication and multiply-accumulate on several data items separately. It is generally used where data parallelism is good, for example dual-channel symmetric data, where one instruction operates on the data delivered by both channels. A MIMD instruction is a special instruction in which several atomic instructions each operate on different data. It is generally used where data parallelism is poor but computational throughput must still be raised. These instructions complement one another and can be chosen flexibly for different applications to speed up the core computations of media algorithms.
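The mode selection described above can be pictured as a lookup from instruction type to operating mode. The following is only a minimal sketch: the patent does not name its instructions, so every mnemonic in `OPCODE_TO_MODE` is hypothetical, as are the names `Mode` and `select_mode`.

```python
from enum import Enum


class Mode(Enum):
    SISD = "normal-precision multiply / multiply-accumulate"
    HIGH_PRECISION = "high-precision multiply / multiply-accumulate"
    SIMD = "single instruction, multiple data"
    MIMD = "multiple instruction, multiple data"


# Hypothetical mnemonics, invented for illustration only.
OPCODE_TO_MODE = {
    "mul": Mode.SISD, "mac": Mode.SISD,
    "mulh": Mode.HIGH_PRECISION, "mach": Mode.HIGH_PRECISION,
    "muls": Mode.SIMD, "macs": Mode.SIMD,
    "mulm": Mode.MIMD, "macm": Mode.MIMD,
}


def select_mode(mnemonic: str) -> Mode:
    """Return the operating mode implied by the instruction type."""
    return OPCODE_TO_MODE[mnemonic]
```

In this sketch the mode is a pure function of the decoded instruction, which mirrors the statement that the user only chooses the instruction type and the hardware picks the mode.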

The four kinds of multiplication adopt a pipelined design: ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, and high-precision multiplication are each divided into several steps, and in every clock cycle the pipeline hardware at each stage executes different operations of different instructions in parallel. In hardware design, ordinary multiplication usually uses the shift-and-add method, and the delay of the hardware circuit is large; to raise the throughput of the multiplication unit effectively, pipelining must be used. In the multi-mode media-enhanced pipelined multiplication unit, after careful study and analysis, each data path is given a reasonable pipeline partitioning, as shown in Fig. 1. The overall pipeline is divided into 3 stages; because high-precision multiplication and multiply-accumulate are comparatively slow, that path needs 1 additional pipeline stage.
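The throughput benefit of the 3-stage pipeline (4 stages for high precision) can be estimated with a small model. This is a sketch under assumptions not stated in the patent: one independent instruction is issued per cycle and the pipeline never stalls. The function names are illustrative.

```python
BASE_STAGES = 3                  # overall pipeline depth given in the text
EXTRA_HIGH_PRECISION_STAGES = 1  # extra stage on the high-precision path


def pipeline_depth(high_precision: bool = False) -> int:
    """Number of pipeline stages an instruction passes through."""
    return BASE_STAGES + (EXTRA_HIGH_PRECISION_STAGES if high_precision else 0)


def total_cycles(n_instructions: int, high_precision: bool = False) -> int:
    """Cycles to finish n back-to-back independent instructions.

    Assumes an ideal pipeline: one issue per cycle, no stalls.
    """
    if n_instructions == 0:
        return 0
    return pipeline_depth(high_precision) + n_instructions - 1
```

Under these assumptions, ten ordinary multiplications finish in 12 cycles rather than the 30 a non-pipelined 3-cycle unit would need, and the extra high-precision stage adds only one cycle of latency to the whole sequence.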

The four multiplication operations are each separated into an operand preparation stage and a data operation stage, with each stage processed individually: operand preparation and data operation are independent of each other at the abstract level, so they are assigned to different pipeline stages and operate in parallel. Because the multiplication unit must support multiplication and multiply-accumulate in several modes, operand preparation is much more complex than in an ordinary multiplication unit, so it is implemented as a separate pipeline stage. The embedded processor's decode unit decodes the instruction to obtain the corresponding control and data information. Because SIMD and MIMD modes must be supported, two or more operand preparation units are needed. The two operand preparation units work in parallel; the circuit is shown in Fig. 2. The operand preparation hardware is a selector-based (multiplexer-based) circuit. According to the decoded information, it gates the corresponding data path and readies the operands for each mode. In SIMD and MIMD modes, the operand preparation hardware must additionally slice and splice multiple data items. These operations are controlled by the logic obtained from decoding.
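The slicing step of operand preparation in SIMD mode might look like the sketch below. The widths are assumptions: the patent does not give register or lane sizes, so 32-bit source registers holding two signed 16-bit lanes are used here purely for illustration, and `prepare_simd_operands` is a hypothetical name.

```python
LANE_BITS = 16                     # assumed lane width
LANE_MASK = (1 << LANE_BITS) - 1


def to_signed(value: int, bits: int = LANE_BITS) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def prepare_simd_operands(rx: int, ry: int):
    """Slice two 32-bit registers into one operand pair per data path.

    The low halves feed data path 1 and the high halves feed data path 2.
    """
    low_pair = (to_signed(rx & LANE_MASK), to_signed(ry & LANE_MASK))
    high_pair = (to_signed(rx >> LANE_BITS), to_signed(ry >> LANE_BITS))
    return low_pair, high_pair
```

For example, `prepare_simd_operands(0x0003FFFE, 0xFFFF0005)` yields the pairs `(-2, 5)` and `(3, -1)`, which the two data paths could then multiply in the same cycle.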

Every data path of the multi-mode media-enhanced pipelined multiplication unit can perform both multiplication and multiply-accumulate operations: each data path can perform multiplication as an independent data operation logic unit and also supports multiply-accumulate, which reuses the multiplication functional unit. To meet the accumulation needs of the various media algorithms, each of the three data paths has its own hardware accumulator. Each accumulator is an adder with guard bits; its hardware cost is small, yet it allows accumulation to proceed directly after the multiplication result is produced, which greatly improves hardware utilization and raises overall accumulation performance. Once the operands are ready, the corresponding data and control signals enter the pipeline registers, ready for the data operation. The multi-mode media-enhanced pipelined multiplier has 3 data paths; the hardware structure of each data path is shown in Fig. 3.

Data path 1 implements ordinary multiplication and multiply-accumulate and is the most basic data path. At the top of the path is a multiplier that supports both signed and unsigned operation. This multiplier is the core component of the arithmetic unit of the whole media-enhanced pipelined multiplier; the result of the multiply operation is produced in it. In SIMD and MIMD operations, data path 1 takes part as one of the computation paths.

Data path 2 is an auxiliary data path. In ordinary multiplication and multiply-accumulate it takes no part in the computation and stays dormant. In SIMD and MIMD operation this path is woken automatically and takes part in the computation, working together with data path 1. Two arithmetic operations are thus carried out in a single hardware clock cycle, which effectively improves the efficiency of data operations. Its basic structure is similar to that of data path 1.

Data path 3 supports high-precision multiplication and multiply-accumulate. In general, this path supports operations at twice the precision of the ordinary data paths. Because the operand width increases, the delay of this path is larger, so an extra pipeline stage is added to it so that it does not limit the overall clock frequency. This is why the pipeline partitioning shown in Fig. 1 divides this path further.

The multi-mode media-enhanced pipelined multiplication unit provides integer and fractional operation types: according to the user's configuration, the multiplier performs fractional multiplication or integer multiplication by changing the control logic of the hardware data path. In the media-enhanced pipelined multiplication unit, every mode supports both fixed-point fractional and integer multiplication. To support fixed-point fractional multiplication, the operands are simply multiplied as integers and the product is then shifted left to obtain the fractional result. The hardware cost is small: a simple data selection gives all 3 data paths support for both integer and fixed-point fractional operation.
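The "integer multiply, then shift left" rule can be illustrated with a common fixed-point convention. The patent does not state the fractional format or the shift amount, so this sketch assumes signed Q15 operands (1 sign bit, 15 fraction bits) and a one-bit left shift that yields a Q31 product; the helper names are illustrative.

```python
Q15_ONE = 1 << 15  # scale factor of the assumed Q15 format


def q15(x: float) -> int:
    """Encode a real number in [-1, 1) as a Q15 integer."""
    return int(round(x * Q15_ONE))


def mul_int(a: int, b: int) -> int:
    """Integer mode: the plain product."""
    return a * b


def mul_frac(a: int, b: int) -> int:
    """Fractional mode: Q15 x Q15 -> Q31.

    The integer product has 30 fraction bits, so one left shift aligns it
    to Q31. Saturation of the (-1) x (-1) corner case is not modeled.
    """
    return mul_int(a, b) << 1


def q31_to_float(x: int) -> float:
    """Decode a Q31 integer back to a real number."""
    return x / (1 << 31)
```

With these assumptions, `q31_to_float(mul_frac(q15(0.5), q15(0.5)))` evaluates to `0.25`, and both modes share the same multiplier, differing only in the final shift.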

In multiply-accumulate operations, the intermediate result obtained from the multiplication is extended with guard bits to enlarge the range of the addition: during accumulation, each intermediate product is sign-extended accordingly; the sign extension increases the precision of the addition and guarantees that the accumulated result does not overflow within a certain range. During accumulation, intermediate results are quite likely to overflow, and overflow degrades data accuracy. Overflow occurs because the width used for accumulation is insufficient. To improve the processor's data accuracy during accumulation, the multi-mode media-enhanced multiplication unit adds 8 guard bits. The 8 guard bits guarantee that, in the worst case (every accumulated value is the maximum), 256 consecutive accumulations do not overflow. The guard bits can be reconfigured according to the requirements of the software algorithm.
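The 256-accumulation bound follows from the guard bits: 8 extra bits multiply the representable range by 2^8 = 256. The sketch below checks this numerically. The 32-bit product width is an assumption made for illustration (the patent gives only the number of guard bits), and the function names are hypothetical.

```python
PRODUCT_BITS = 32   # assumed width of each signed product
GUARD_BITS = 8      # guard bits stated in the text
ACC_BITS = PRODUCT_BITS + GUARD_BITS


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's-complement number."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def accumulate(products):
    """Sum signed products in an accumulator widened by the guard bits.

    Raises OverflowError as soon as the running sum leaves the accumulator
    range, which models the overflow the guard bits are meant to prevent.
    """
    acc = 0
    for p in products:
        if not fits_signed(p, PRODUCT_BITS):
            raise ValueError("product does not fit the assumed product width")
        acc += p
        if not fits_signed(acc, ACC_BITS):
            raise OverflowError("accumulator overflow")
    return acc
```

Under these assumptions, 256 accumulations of the largest positive product (2^31 - 1) or of the most negative product (-2^31) stay within the 40-bit accumulator, while a 257th maximal accumulation overflows.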

In the accumulator design, an instruction retirement buffer is provided to support fast retirement of multiply and multiply-accumulate instructions: a dedicated retirement buffer is designed into the multiplication unit, mainly to control fast retirement of instructions without intervention by the main pipeline. To improve the overall performance of the multiplication unit, the unit must support a fast retirement mechanism. Multiplication, and high-precision multiplication in particular, uses a deeper pipeline, and the deeper the pipeline, the more easily it makes the processor's main pipeline wait. To reduce stalls in the main pipeline, the multiplication unit proposed in this patent has built-in hardware supporting fast retirement. The great majority of multiply and multiply-accumulate operations retire through the fast retirement buffer, which raises retirement speed and thus improves the overall performance of the embedded processor. The hardware implementation of the fast retirement buffer is shown in Fig. 4. An instruction that can be retired quickly is pushed into the fast retirement buffer when the main pipeline issues the operation to the multiplication unit. From the main pipeline's point of view, the instruction has then been handed over to the multiplication unit's fast retirement buffer, so the main pipeline retires it and continues with subsequent instructions. The fast retirement buffer, once it has received the instruction from the main pipeline, takes responsibility for its execution and retirement. It continuously monitors the retirement information sent by the main pipeline. If the instruction in the current entry has already been retired by the main pipeline, the fast retirement buffer only needs to wait until the multiplication result is valid to complete the fast retirement of the instruction; it does not need to send a retirement request back to the main pipeline and have the main pipeline control the retirement.
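The hand-off between the main pipeline and the fast retirement buffer can be modeled behaviorally. This is a loose sketch, not the circuit of Fig. 4: the in-order FIFO organization, the instruction tags, and all class and method names are assumptions introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: int                       # hypothetical instruction identifier
    retired_by_main: bool = False  # set when the main pipeline retires it
    result: Optional[int] = None   # set when the multiplier result is valid


class FastRetireBuffer:
    """Behavioral model of a fast retirement buffer (assumed in-order FIFO)."""

    def __init__(self):
        self.entries = deque()
        self.completed = []

    def push(self, tag: int) -> None:
        """Main pipeline issues a fast-retirable instruction to the unit."""
        self.entries.append(Entry(tag))

    def note_main_retired(self, tag: int) -> None:
        """Record retirement information observed from the main pipeline."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.retired_by_main = True

    def note_result(self, tag: int, value: int) -> None:
        """Record that the multiplication result for tag is now valid."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.result = value

    def drain(self) -> None:
        """Finish the oldest entries without asking the main pipeline again."""
        while (self.entries and self.entries[0].retired_by_main
               and self.entries[0].result is not None):
            entry = self.entries.popleft()
            self.completed.append((entry.tag, entry.result))
```

In this model the main pipeline never waits for the multiplier: it calls `push` and retires the instruction at once, and the buffer completes the entry on its own when the result arrives.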

The high register and the low register are physically independent and logically linked: physically, the two registers are independent and receive the results of different data paths; logically, in high-precision multiplication and multiply-accumulate, where the result width increases, the high register holds the upper part of the result and the low register holds the lower part, so the two remain logically connected. To support fast retirement and fast access to results, two registers for holding results are provided inside the multiplication unit. In simple multiplication and multiply-accumulate, only the low register is used. In high-precision multiplication and multiply-accumulate, the two registers are used together to hold the high and low parts of the result. In SIMD and MIMD multiplication and multiply-accumulate, the two registers are used in parallel to store the results of the two data paths.
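The joint use of the two result registers in high-precision mode amounts to splitting a wide result into an upper and a lower word. The sketch below assumes 32-bit registers and a 64-bit signed result, neither of which is stated in the patent; `split_result` and `join_result` are illustrative names.

```python
REG_BITS = 32                       # assumed register width
REG_MASK = (1 << REG_BITS) - 1
WIDE_BITS = 2 * REG_BITS


def split_result(product: int):
    """Split a signed wide result into (high register, low register) words."""
    raw = product & ((1 << WIDE_BITS) - 1)   # two's-complement bit pattern
    return raw >> REG_BITS, raw & REG_MASK


def join_result(high: int, low: int) -> int:
    """Reassemble the signed wide result from the two register words."""
    raw = (high << REG_BITS) | low
    return raw - (1 << WIDE_BITS) if raw >> (WIDE_BITS - 1) else raw
```

In SIMD or MIMD mode the same two registers would instead each hold one data path's independent result, so no splitting or joining is involved.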

Claims

1. A design method for a multi-mode media-enhanced pipelined multiplication unit, characterized as follows:

2) the four kinds of multiplication above adopt a pipelined design;

2. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that the dynamic selection of ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, or high-precision multiplication according to the type of assembly instruction is as follows: assembly instructions supporting the corresponding modes are added to the media processor instruction set; the user simply invokes instructions of different types, and the hardware automatically selects the corresponding mode to perform the operation; SISD mode means that a single instruction operates on a single datum; SIMD mode means that a single instruction operates on multiple data simultaneously; MIMD mode means that multiple instructions execute simultaneously and operate on multiple data; high-precision multiplication means a multiply operation whose operands are wider than those of ordinary multiplication and whose result is correspondingly more precise.

3. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that the four kinds of multiplication adopt a pipelined design as follows: ordinary SISD multiplication, SIMD multiplication, MIMD multiplication, and high-precision multiplication are each divided into several steps, and in every clock cycle the pipeline hardware at each stage executes different operations of different instructions in parallel.

4. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that the four multiplication operations are each separated into an operand preparation stage and a data operation stage, with each stage processed individually, as follows: operand preparation and data operation are independent of each other at the abstract level, so they are assigned to different pipeline stages and operate in parallel.

5. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that every data path of the multi-mode media-enhanced pipelined multiplication unit can perform both multiplication and multiply-accumulate operations as follows: each data path can perform multiplication as an independent data operation logic unit and also supports multiply-accumulate, which reuses the multiplication functional unit.

6. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that the multi-mode media-enhanced pipelined multiplication unit provides integer and fractional operation types as follows: according to the user's configuration, the multiplier performs fractional multiplication or integer multiplication by changing the control logic of the hardware data path.

7. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that, in multiply-accumulate operations, the intermediate result obtained from the multiplication is extended with guard bits to enlarge the range of the addition as follows: during accumulation, each intermediate product is sign-extended accordingly; the sign extension increases the precision of the addition and guarantees that the accumulated result does not overflow within a certain range.

8. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that the instruction retirement buffer is provided to support fast retirement of multiply and multiply-accumulate instructions as follows: a dedicated retirement buffer is designed into the multiplication unit, mainly to control fast retirement of instructions without intervention by the main pipeline.

9. The design method for a multi-mode media-enhanced pipelined multiplication unit according to claim 1, characterized in that the high register and the low register are physically independent and logically linked as follows: physically, the two registers are independent and receive the results of different data paths; logically, in high-precision multiplication and multiply-accumulate, where the result width increases, the high register holds the upper part of the result and the low register holds the lower part, so the two remain logically connected.