CN100410873C - Separating the saturated add/subtract function to improve the timing of the critical execute stage of a processor pipeline - Google Patents

Publication number
CN100410873C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2006100670992A
Other languages
Chinese (zh)
Other versions
CN1821954A (en)
Inventor
大卫A·鲍德鲁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc

Landscapes

  • Advance Control (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention relates to a method and an apparatus for separating the saturated add/subtract function from an execution unit in order to improve the timing of the critical execute stage of a pipelined processor. Saturated add/subtract is one of the arithmetic and logical operations that takes relatively long in the execution unit. The invention separates this time-consuming function from the execution unit and allows it to complete in more than one pipeline stage time. Separating it reduces the stage time required by the execution unit, which is the critical stage of a pipelined processor, and so raises the overall execution speed of the processor.

Description

Separating the saturated add/subtract function to improve the timing of the critical execute stage of a processor pipeline
Technical field
The present invention relates to a processor, and in particular to a pipelined processor in which the saturated add/subtract function is separated from the critical execution unit to improve its timing.
Background
Integrated-circuit process technology continues to advance, and the semiconductor devices integrated on a chip have shrunk considerably, so fabricated circuits are increasingly dense. Because the propagation delay between integrated semiconductor devices keeps falling, integrated circuits can run at ever higher clock rates.
As devices shrink and clock rates rise, circuit architecture has a growing influence on circuit performance, that is, on execution speed. The execution speed of the processor that executes instructions in an electronic device can determine the execution speed of the device as a whole. A pipelined processor can execute several instructions at the same time in different blocks, or pipeline stages, of the processor. Each step in the pipeline completes one part of an instruction and, like an assembly line in a factory, different steps complete different parts of different instructions in parallel. Each such step is called a pipeline stage or segment. The stages are connected one after another to form a pipeline: an instruction enters at one end, is processed in each stage in turn, and leaves at the other end. Note that in a pipelined processor a stage corresponds to a unit; for example, the execute stage corresponds to the execution unit. The term "stage" denotes a step of the pipeline, while the term "unit" denotes the hardware contained in that stage.
A synchronous processor operates according to clock cycles. In general, an instruction moves from one stage of the processor pipeline to the next on each clock. The stage that takes longest to process an instruction sets the critical timing, and this critical timing should be reduced as far as possible to raise the processor's execution speed. The execute stage usually takes longer than the other stages, so ways to improve its timing are needed. The execute stage contains an arithmetic logic unit (ALU), which forms the computational core of the processor and performs basic arithmetic and logical operations such as integer addition, subtraction, logical AND, and logical OR. One of the most time-consuming critical ALU operations is saturated addition and subtraction. When saturation occurs, the result is forced to the maximum or minimum value, for example all bits 1 or all bits 0. Because extra logic is needed to perform saturated add/subtract, it is one of the slowest functions in the ALU. Accordingly, an architecture that improves the timing of saturated add/subtract is needed in order to shorten the execute stage and thereby raise the processor's execution speed.
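The saturating behaviour described above can be illustrated with a minimal Python sketch. It assumes unsigned operands of a fixed width; the `bits` parameter and the function names are illustrative and do not come from the patent. The wrapping add is included only for contrast.

```python
def wrapping_add(a: int, b: int, bits: int = 8) -> int:
    """Conventional add: the carry out of the top bit is discarded."""
    return (a + b) & ((1 << bits) - 1)


def saturating_add(a: int, b: int, bits: int = 8) -> int:
    """Saturated add: on overflow the result is forced to all ones."""
    max_value = (1 << bits) - 1
    total = a + b
    return max_value if total > max_value else total


def saturating_sub(a: int, b: int, bits: int = 8) -> int:
    """Saturated subtract: on underflow the result is forced to all zeros."""
    diff = a - b
    return 0 if diff < 0 else diff
```

The extra comparison and selection after the raw sum is the additional logic that makes the saturated form slower than a plain add in hardware.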
Summary of the invention
In view of the above background, and to meet the needs of industry, the present invention provides a processor that achieves what the conventional processors described above do not.
The present invention mainly provides a method and an apparatus for improving the timing of a processor's execution unit. In one embodiment, the invention provides an apparatus in a pipelined processor for improving the timing of the processor's execute stage. The apparatus comprises an execution unit having an arithmetic logic unit and a first multiplexer, a multi-cycle saturated adder/subtractor, and a second multiplexer. The arithmetic logic unit receives conventional add/subtract instructions and operands from the stage preceding the execute stage. A conventional adder/subtractor is part of the arithmetic logic unit and completes an addition or subtraction within one pipeline stage time. The first multiplexer receives the output of the arithmetic logic unit and produces a data path. The multi-cycle saturated adder/subtractor receives saturated add/subtract instructions and operands from the stage preceding the execute stage and performs the saturated operation over multiple pipeline stage times to produce a saturated result. The second multiplexer receives the data path and the saturated result and outputs to the stage following the execute stage.
Another object of the present invention is to provide an apparatus in a pipelined processor for improving the timing of the processor's execute stage. The apparatus comprises a decode unit, a multi-cycle saturated adder/subtractor, and a multiplexer. The decode unit separates saturated add/subtract instructions and their operands from conventionally decoded instructions. The multi-cycle saturated adder/subtractor receives the saturated add/subtract instructions and their operands from the decode unit and performs the saturated operation over multiple pipeline stage times. The multiplexer receives the result of the saturated operation and outputs it to the stage following the execute stage.
Another object of the present invention is to provide a method of improving processor timing. The method comprises determining whether a saturated add/subtract instruction has been received; when a saturated add/subtract instruction is received, performing the saturated operation over multiple pipeline stage times to produce a saturated result; and when a non-saturated add/subtract instruction is received, performing the non-saturated operation within one pipeline stage time to produce a non-saturated result.
Another object of the present invention is to provide a further method of improving processor timing. The method comprises receiving an instruction and operands; decoding the received instruction to produce a decoded instruction; determining whether the decoded instruction is a saturated add/subtract instruction; when it is, performing the saturated operation over multiple pipeline stage times to produce a saturated result; and when a non-saturated add/subtract instruction is received, performing the non-saturated operation within one pipeline stage time to produce a non-saturated result.
The subject of the present invention is a processor. For a thorough understanding of the invention, detailed steps and components are set out in the following description. Evidently, practice of the invention is not limited to the specific details familiar to those skilled in processor design. Well-known components or steps are not described in detail, to avoid limiting the invention unnecessarily. Preferred embodiments are described in detail below; beyond these, the invention may also be practiced widely in other embodiments, and its scope is not limited by them but is defined by the claims that follow.
Description of drawings
To further explain the technical content of the present invention, a detailed description follows with reference to the embodiments and the accompanying drawings, in which:
Fig. 1 is a block diagram of a conventional processor with five pipeline stages;
Fig. 2 is a block diagram of a decode unit of a prior-art pipelined processor;
Fig. 3 is a block diagram of an execution unit of a prior-art pipelined processor;
Fig. 4 is a block diagram of a decode unit according to the present invention;
Fig. 5 is a block diagram of an execution unit with the saturated add/subtract function separated, in a pipelined processor according to the present invention;
Fig. 6 is a timing diagram of conventional arithmetic and logical operations, saturated add/subtract, and the execution unit in a prior-art pipelined processor;
Fig. 7 is a timing diagram of conventional arithmetic and logical operations, saturated add/subtract, and the execution unit in a pipelined processor according to the present invention;
Fig. 8A is an operational flowchart of a pipelined processor according to the present invention in which the saturated add/subtract function is separated from the execution unit;
Fig. 8B is an operational flowchart of a pipelined processor according to the present invention in which the saturated add/subtract function is separated from the execution unit;
Fig. 9A is an operational flowchart of a pipelined processor according to the present invention in which the saturated add/subtract function is separated from the execution unit; and
Fig. 9B is an operational flowchart of a pipelined processor according to the present invention in which the saturated add/subtract function is separated from the execution unit.
Embodiment
Fig. 1 is a block diagram of a processor with five pipeline stages. The teaching and description of the present invention also apply to other pipeline architectures with a different number or different types of stages. The architecture in Fig. 1 has an instruction fetch unit 110, a decode unit 120, an execution unit 130, a memory access unit 140, and a register write-back unit 150. Except where specifically described here, these units or logic blocks operate conventionally and are known to those skilled in the art, so they are not described in detail.
As is known in the art, the instruction fetch unit 110 fetches instructions sequentially according to the value or content of a program counter in the register file 160, or fetches instructions from memory according to exception vectors, branch instructions, and return instructions. The instruction fetch unit 110 also determines the return address for all exception vectors and branch-and-link instructions, and writes or stores that return address in an appropriate register of the register file 160.
The decode unit 120 decodes the instructions passed from the instruction fetch unit 110 and generates sufficient control signals for the execution unit 130 to execute them. The architecture of the decode unit 120 varies with the processor design, but those skilled in the art know the general operation and organization of a typical decode unit. Likewise, the architecture of the execution unit 130 varies with the processor design. In general, the execution unit 130 contains the logic used to execute instructions, and it executes them according to the control signals from the decode unit 120. The memory access unit 140 interfaces with external data memory and accesses data as required by the instructions that the execution unit 130 executes. Not every instruction needs to access memory, but for those that do, the memory access unit 140 performs the access to external memory.
Finally, the register write-back unit 150 stores or writes the result of an executed instruction to the appropriate register in the register file 160.
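The overlap between the five stages can be pictured with a small Python sketch. It is only an idealized model, assuming one instruction enters per clock with no stalls; the stage names and the `pipeline_schedule` helper are illustrative, not part of the patent.

```python
from typing import Dict, List, Optional

# Stage order follows units 110, 120, 130, 140, 150 of Fig. 1.
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def pipeline_schedule(num_instructions: int) -> List[Dict[str, Optional[int]]]:
    """For each clock cycle, report which instruction index sits in each stage."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row: Dict[str, Optional[int]] = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            row[stage] = index if 0 <= index < num_instructions else None
        schedule.append(row)
    return schedule
```

For three instructions, the row for cycle 2 shows instruction 2 in fetch, instruction 1 in decode, and instruction 0 in execute, which is the parallelism the description relies on.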
Because the five stages of the pipelined processor operate in parallel, reducing the time of the most critical stage effectively raises the processor's speed. In general, the execution unit 130 takes longer than the other four stages of the pipeline. The execution unit 130 contains an arithmetic logic unit that performs all numerical computations, such as addition and multiplication, and comparison operations. One of the most time-consuming arithmetic and logical operations is saturated add/subtract. Saturation arises when the result of an addition or subtraction exceeds the number of bits the conventional adder/subtractor can handle. Saturated add/subtract is normally applied only to data and not to addresses, because addresses are needed more urgently. When saturation occurs, the result is forced to the maximum or minimum value, that is, all bits 1 or all bits 0. Because extra logic is needed, saturated add/subtract is one of the slowest functions in the arithmetic logic unit. Given its complexity and long latency, moving this function out of the execution unit 130 reduces the stage time the execution unit 130 requires. Because the execute stage, that is, the execution unit 130, is the most critical stage of the pipeline, reducing its required time raises the execution speed of the pipelined processor.
Fig. 2 is a block diagram of a decode unit 210 of a prior-art pipelined processor. The decode unit 210 receives instructions from its preceding stage, the instruction fetch unit 110, and decodes them. It then generates sufficient control signals for the execution unit 130 to execute the decoded instructions. The receive block 211 of the decode unit 210 receives instructions and operands, the decode block 212 decodes the received instructions, and the forward block 213 passes the decoded instructions and operands out of the decode unit 210 to its following stage, the execution unit 130, for execution.
Fig. 3 is a block diagram of an execution unit 310 of a prior-art pipelined processor. As shown in Fig. 1, the execution unit 310 is the pipeline stage between the decode unit 120 and the memory access unit 140 of the pipelined processor. The decode unit 120 generates sufficient control signals for the execution unit 310 to execute an instruction. The decode unit, which in the five-stage pipeline is normally the stage preceding the one shown in Fig. 3, provides decoded instructions and operands to the arithmetic logic unit 320 in the execution unit 310 for arithmetic and logical operations. A typical execution unit 310 performs addition, subtraction, shift, and logical operations. The teaching and ideas of the present invention also apply to execution units 310 with other functions. The result computed by the arithmetic logic unit 320 first enters a multiplexer 330 and is then passed to the following stage, the memory access unit 140 of the pipelined processor. The decode unit 120 also provides the select signal that the multiplexer 330 needs to choose its output for the following stage. Alternatively, if the output data must be stored in the register file 160, the output of the multiplexer 330 reaches the register file 160 through the memory access unit 140 and the register write-back unit 150. Another possibility is that the output is fed back to the execution unit 310 itself, for example when successive multiplications must be computed.
Fig. 4 is a block diagram of a decode unit 420 according to the present invention. In the decode unit 420, the invention separates saturated add/subtract instructions from the other decoded instructions. The instructions and operands entering the decode unit 420 are the same as those entering the decode unit 210 of Fig. 2. The receive block 421 of the decode unit 420 receives instructions and operands. After an instruction is decoded in the decode block 422, the decoded instructions are divided into conventional instructions and saturated add/subtract instructions and sent to the forward blocks 423 and 424 respectively. The operands are sent in parallel to the execution unit 510 and the multi-cycle saturated adder/subtractor 540 shown in Fig. 5. Decoded conventional instructions go to the execution unit 510 for execution, and decoded saturated add/subtract instructions go to the multi-cycle saturated adder/subtractor 540 for the saturated operation.
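The split performed by the decode unit 420 can be sketched in Python as a simple routing step. The mnemonics `sat_add` and `sat_sub`, the `DecodedInstruction` type, and the `decode_and_split` helper are hypothetical names chosen for illustration; the patent does not define an instruction encoding.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

SATURATING_OPS = {"sat_add", "sat_sub"}  # hypothetical mnemonics


@dataclass(frozen=True)
class DecodedInstruction:
    op: str
    operands: Tuple[int, int]


def decode_and_split(
    instructions: Iterable[Tuple[str, int, int]]
) -> Tuple[List[DecodedInstruction], List[DecodedInstruction]]:
    """Decode each instruction, then route it to one of two forward paths."""
    conventional: List[DecodedInstruction] = []  # forward block 423 -> unit 510
    saturating: List[DecodedInstruction] = []    # forward block 424 -> adder 540
    for op, a, b in instructions:
        decoded = DecodedInstruction(op, (a, b))
        if decoded.op in SATURATING_OPS:
            saturating.append(decoded)
        else:
            conventional.append(decoded)
    return conventional, saturating
```

The two lists stand in for the two forward blocks; in hardware the routing would be a control signal rather than a queue.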
Fig. 5 is a block diagram of an execution unit 510 with the saturated add/subtract function separated, in a pipelined processor according to the present invention. Because the complex saturated add/subtract operation, which needs a longer execution time than conventional instructions, is separated from the arithmetic logic unit 520, the execution unit 510 can complete its arithmetic and logical operations in a shorter time. The decode unit 420 of the preceding stage is responsible for determining whether the instruction and operands to be executed are a conventional or a saturated add/subtract operation. The instruction and operands of a conventional operation are fed into the execution unit 510, which has the arithmetic logic unit 520 and a first multiplexer 530, and the result is passed to a second multiplexer 550. If the instruction and operands are a saturated add/subtract operation, the decoded instruction and operands are fed into the saturated adder/subtractor 540, which takes longer than one conventional stage time. The result computed by the multi-cycle saturated adder/subtractor 540 can enter the second multiplexer 550 to be merged into the data path of the execution unit 510, or it can proceed independently. The result of merging the saturated operation with the data path of the execution unit 510 can be sent to the memory access unit 140 of the pipeline or forwarded on to the register write-back unit 150.
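The data path of Fig. 5 can be modelled functionally in Python. This is a behavioural sketch only: it ignores timing, assumes 8-bit unsigned operands, and uses hypothetical operation names; the function names simply echo the reference numerals 520, 530, 540, and 550.

```python
BITS = 8                  # assumed operand width
MASK = (1 << BITS) - 1


def alu_520(a: int, b: int) -> dict:
    """Conventional ALU: every non-saturated result, wrapped to BITS bits."""
    return {
        "add": (a + b) & MASK,
        "sub": (a - b) & MASK,
        "and": a & b,
        "or": a | b,
    }


def first_mux_530(alu_outputs: dict, op: str) -> int:
    """Select the ALU output that forms the normal data path."""
    return alu_outputs[op]


def saturated_adder_540(op: str, a: int, b: int) -> int:
    """Separate saturated adder/subtractor: clamp instead of wrapping."""
    if op == "sat_add":
        return min(a + b, MASK)
    return max(a - b, 0)


def second_mux_550(data_path, saturated_result, select_saturated: bool):
    """Merge the saturated result into the normal data path."""
    return saturated_result if select_saturated else data_path


def execute(op: str, a: int, b: int) -> int:
    if op in ("sat_add", "sat_sub"):
        return second_mux_550(None, saturated_adder_540(op, a, b), True)
    return second_mux_550(first_mux_530(alu_520(a, b), op), None, False)
```

For example, `execute("add", 200, 100)` wraps around, while `execute("sat_add", 200, 100)` clamps to the all-ones value.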
The timing diagrams of Fig. 6 and Fig. 7 show how the stage time required by the execution unit of a processor can be reduced. Fig. 6 is a timing diagram of conventional arithmetic and logical operations (all arithmetic and logical operations other than saturated add/subtract), saturated add/subtract, and the execution unit in a prior-art pipelined processor. The second timing line shows the time taken by the slowest conventional arithmetic or logical operation in the execution unit; the third shows the time taken by saturated add/subtract. The diagram shows that saturated add/subtract takes longer than the conventional operations. Because it takes longer than any other function, it determines the stage time of the execution unit.
Compare Fig. 6 with Fig. 7, which is a timing diagram of conventional arithmetic and logical operations, saturated add/subtract, and the execution unit according to the present invention. Referring again to Fig. 5, because the invention separates the saturated adder/subtractor 540 from the execution unit 510, a second multiplexer 550 must be added. The delay that this second multiplexer 550 adds (dashed line a in Fig. 7) is included in the pipeline stage time together with the slowest conventional operation. Comparing the execution-unit timing in Fig. 6 and Fig. 7, the execution unit of the invention is faster than the conventional one: once the saturated adder/subtractor 540 is separated from the execution unit 510, the stage time of the execution unit is set by the conventional arithmetic and logical operations rather than by the slowest operation, saturated add/subtract. Because every stage in the pipeline has the same stage time, the stage time of the other stages can be shortened along with that of the execute stage, so separating the saturated adder/subtractor from the execution unit speeds up the pipelined processor as a whole.
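The timing argument can be made concrete with a short Python calculation. All delay figures below are invented for illustration (the patent gives no numbers), and the model simply assumes that the clock period equals the slowest stage time.

```python
from typing import List


def execute_time_conventional(slowest_alu_ps: int, saturated_ps: int, mux_ps: int) -> int:
    """Saturated add/subtract sits inside the ALU, so it sets the stage time."""
    return max(slowest_alu_ps, saturated_ps) + mux_ps


def execute_time_separated(slowest_alu_ps: int, mux_ps: int, second_mux_ps: int) -> int:
    """Only conventional operations remain, plus the added second multiplexer."""
    return slowest_alu_ps + mux_ps + second_mux_ps


def clock_period(stage_times_ps: List[int]) -> int:
    """Every stage shares one clock, so the slowest stage sets the period."""
    return max(stage_times_ps)


# Hypothetical delays in picoseconds for the four non-execute stages.
OTHER_STAGES_PS = [850, 800, 820, 780]

before = clock_period(OTHER_STAGES_PS + [execute_time_conventional(800, 1100, 50)])
after = clock_period(OTHER_STAGES_PS + [execute_time_separated(800, 50, 50)])
```

With these assumed figures the period falls from 1150 ps to 900 ps. The gain exists only when the second multiplexer's delay is smaller than the gap between the saturated operation and the slowest conventional operation.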
Fig. 8A is an operational flowchart of a pipelined processor in which the saturated add/subtract function is separated from the conventional execution unit. Step 810 determines whether the received instruction is a saturated add/subtract instruction. If it is, the flow goes to step 820, which performs the saturated add/subtract operation over more than one pipeline stage time to produce the saturated result; this is possible because the complex saturated add/subtract function has been separated from the conventional arithmetic logic unit. The flow then continues to Fig. 8B through connector A. If the received instruction is not a saturated add/subtract instruction, the flow goes to step 830, in which the execution unit performs the non-saturated operation within one pipeline stage time and produces its result on the normal data path.
In Fig. 8B, step 840 follows connector A of Fig. 8A. In step 840 a multiplexer merges the saturated result produced in step 820 with the normal data path from step 830. Finally, in step 850, the output of the multiplexer is passed to the memory access unit 140 or the subsequent register write-back unit 150 of the pipelined processor.
Fig. 9A is an operational flowchart of a pipelined processor in which the saturated add/subtract function is separated from the execution unit. In step 910, the decode unit receives an instruction and operands from the preceding stage of the pipelined processor. The received instruction is decoded in step 920, and the decoded instruction is sent to the stage following the decode unit in the pipeline, that is, the execution unit. Step 930 then determines whether the received instruction is a saturated add/subtract instruction. If so, the flow goes to step 940: after receiving the decoded saturated add/subtract instruction and operands, the saturated adder/subtractor performs the saturated operation over multiple stage times. The flow then continues to Fig. 9B through connector B. If the result of step 930 is false, then in step 950 the execution unit completes the non-saturated operation within one pipeline stage time.
Fig. 9B continues from Fig. 9A with the operational flow of the pipelined processor in which the saturated add/subtract function is separated from the execution unit. Step 960 receives the saturated result from step 940 and the normal data path from step 950, and merges the saturated result into the normal data path. In step 970, the merged output is passed to the memory access unit 140 or the subsequent register write-back unit 150 of the pipelined processor. In this embodiment, the saturated adder/subtractor is no longer the bottleneck in shortening the execute stage. Because every other operation performed by the arithmetic logic unit takes less time than saturated add/subtract, the stage time required by the execution unit can be shortened, and with it the stage time of the pipelined processor as a whole.
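Steps 910 to 970 can be traced with a compact Python sketch. The textual instruction format, the mnemonics, and the assumption that a saturated operation takes exactly two stage times are all hypothetical; the patent says only that it takes more than one.

```python
SATURATED_STAGE_TIMES = 2  # assumption: the patent only requires more than one


def decode(text: str):
    """Steps 910-920: receive an instruction with operands and decode it."""
    op, a, b = text.split()
    return op, int(a), int(b)


def run_instruction(text: str, bits: int = 8) -> dict:
    mask = (1 << bits) - 1
    op, a, b = decode(text)
    if op in ("sat_add", "sat_sub"):           # step 930: saturated instruction?
        raw = a + b if op == "sat_add" else a - b
        result = min(max(raw, 0), mask)        # step 940: clamp, multi-stage
        stage_times = SATURATED_STAGE_TIMES
    elif op in ("add", "sub"):                 # step 950: one stage time
        raw = a + b if op == "add" else a - b
        result = raw & mask                    # conventional wrap-around
        stage_times = 1
    else:
        raise ValueError(f"unsupported operation: {op}")
    # Steps 960-970: both paths merge into one output for the next stage.
    return {"result": result, "stage_times": stage_times}
```

Calling `run_instruction("sat_add 200 100")` follows the step 940 branch, while `run_instruction("add 200 100")` follows step 950 and finishes in a single stage time.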
Evidently, many modifications and variations of the present invention are possible in light of the above embodiments. It should therefore be understood that, within the scope of the appended claims, the invention may be practiced widely in embodiments other than those described in detail above. The foregoing describes only preferred embodiments of the invention and is not intended to limit its claims; all equivalent changes or modifications made without departing from the disclosed spirit should be included in the following claims.

Claims (14)

1. A pipelined processor, characterized by comprising:
an execution unit, comprising:
an arithmetic logic unit that receives a non-saturated add/subtract instruction and operands from the stage preceding the execution unit and completes the operation of the non-saturated add/subtract instruction within one pipeline stage time;
a first multiplexer that receives the operation output of the arithmetic logic unit and produces a data path;
a multi-cycle saturated adder/subtractor that receives a saturated add/subtract instruction and operands from the stage preceding the execution unit and performs the saturated add/subtract operation over multiple pipeline stage times to produce a saturated result; and
a second multiplexer that receives the data path and the saturated result and outputs to the second stage following the execution unit.
2. The pipelined processor of claim 1, characterized in that the saturated add/subtract instruction is separated from the non-saturated add/subtract instruction in the stage preceding the execution unit.
3. The pipelined processor of claim 1, characterized in that the unit stage time required by the execution unit is determined by the arithmetic logic unit.
4. The pipelined processor of claim 1, characterized in that the stage preceding the execution unit is a decode stage comprising a decode unit.
5. The pipelined processor of claim 1, characterized in that the non-saturated add/subtract instruction comprises conventional add/subtract instructions, logic instructions, and shift instructions.
6. A pipelined processor, characterized by comprising:
a decode unit that separates a saturating add/subtract instruction and its operands from a conventional decoded instruction;
a multi-cycle saturating adder/subtractor that receives the saturating add/subtract instruction and its operands from the decode unit, and performs the operation of the saturating add/subtract instruction over multiple pipeline stage times; and
a multiplexer that receives an execution result of the saturating add/subtract instruction and outputs it to the third stage following the decode unit.
7. The pipelined processor according to claim 6, characterized by further comprising:
an arithmetic logic unit that receives a non-saturating add/subtract instruction from the decode unit, and completes the operation of this saturating add/subtract instruction within the pipeline stage time.
8. The pipelined processor according to claim 7, characterized in that the non-saturating add/subtract instruction comprises conventional add/subtract instructions, logic instructions, and shift instructions.
9. The pipelined processor according to claim 6, characterized in that, when the multi-cycle saturating adder/subtractor is separated from the arithmetic logic unit, the stage time required by the stage following the decode unit is determined by the arithmetic logic unit.
10. The pipelined processor according to claim 9, characterized in that the stage following the decode unit is an execution stage comprising an execution unit.
11. A method for improving the execution stage time of a processor pipeline, characterized by comprising:
upon receiving a saturating add/subtract instruction, performing a saturating add/subtract operation over multiple pipeline stage times to produce a saturation operation result; and
upon receiving a non-saturating add/subtract instruction, performing a non-saturating add/subtract operation within one pipeline stage time to produce a non-saturating add/subtract result.
12. The method for improving the execution stage time of a processor pipeline according to claim 11, characterized by further comprising:
merging the saturation operation result into the conventional data path; and
using the merged result in the second stage following the execution stage.
13. The method for improving the execution stage time of a processor pipeline according to claim 11, characterized in that the stage time required by the execution stage is determined by the operation time of the non-saturating add/subtract instruction.
14. The method for improving the execution stage time of a processor pipeline according to claim 11, characterized in that, before the steps of receiving the saturating add/subtract instruction and the non-saturating add/subtract instruction, the method further comprises:
receiving an instruction and an operand;
decoding the received instruction to produce a decoded instruction; and
determining whether the decoded instruction is the saturating add/subtract instruction.
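As an informal illustration of the method of claims 11 to 14, and not part of the claims themselves, the routing can be sketched in software. The sketch below makes several assumptions that do not appear in the patent: 16-bit signed operands, the mnemonics `add`, `sub`, `sadd` and `ssub`, and a latency of two stage times for the saturating path. It models behaviour only, not hardware timing.

```python
from dataclasses import dataclass

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1  # assumed 16-bit signed operands


def wrap16(value):
    """Two's-complement wrap-around, as a non-saturating adder would produce."""
    return (value + (1 << 15)) % (1 << 16) - (1 << 15)


def saturate16(value):
    """Clamp to the representable range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))


@dataclass
class Decoded:
    op: str  # hypothetical mnemonics: "add", "sub", "sadd", "ssub"
    a: int
    b: int


def is_saturating(instr):
    """Decision made after decoding: does this instruction need saturation?"""
    return instr.op in ("sadd", "ssub")


def execute(instr):
    """Return (result, pipeline stage times consumed) for one decoded instruction."""
    raw = instr.a - instr.b if instr.op in ("sub", "ssub") else instr.a + instr.b
    if is_saturating(instr):
        return saturate16(raw), 2  # multi-stage path; 2 is an assumed latency
    return wrap16(raw), 1          # single-stage ALU path


for instr in (Decoded("add", 30000, 10000), Decoded("sadd", 30000, 10000)):
    print(instr.op, execute(instr))
```

The non-saturating path wraps on overflow and finishes in one stage time, while the saturating path clamps to the range limits and takes longer. That extra clamp step is what the method moves off the single-stage path.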
CNB2006100670992A 2005-04-12 2006-04-04 Separate saturation add-minus function to improve key-performming moment of processor pipe Active CN100410873C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US67026905P 2005-04-12 2005-04-12
US60/670,269 2005-04-12

Publications (2)

Publication Number Publication Date
CN1821954A CN1821954A (en) 2006-08-23
CN100410873C true CN100410873C (en) 2008-08-13

Family

ID=36923344

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100670992A Active CN100410873C (en) 2005-04-12 2006-04-04 Separate saturation add-minus function to improve key-performming moment of processor pipe

Country Status (2)

Country Link
CN (1) CN100410873C (en)
TW (1) TWI314287B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8316071B2 (en) * 2009-05-27 2012-11-20 Advanced Micro Devices, Inc. Arithmetic processing unit that performs multiply and multiply-add operations with saturation and method therefor
US11055137B2 (en) 2019-05-08 2021-07-06 Imam Abdulrahman Bin Faisal University CPU scheduling methods based on relative time quantum for dual core environments

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1421001A (en) * 1999-11-12 2003-05-28 太阳微系统公司 Optimization of N-base typed arithmetic expressions
US20040167954A1 (en) * 2003-02-21 2004-08-26 Infineon Technologies North America Corp. Overflow detection system for multiplication
US20050060359A1 (en) * 2003-07-23 2005-03-17 Schulte Michael J. Arithmetic unit for addition or subtraction with preliminary saturation detection

Also Published As

Publication number Publication date
TWI314287B (en) 2009-09-01
CN1821954A (en) 2006-08-23
TW200636575A (en) 2006-10-16

Similar Documents

Publication Publication Date Title
US20020169942A1 (en) VLIW processor
AU618142B2 (en) Tightly coupled multiprocessor instruction synchronization
EP0208870B1 (en) Vector data processor
US7136989B2 (en) Parallel computation processor, parallel computation control method and program thereof
JP2539974B2 (en) Register read control method in information processing apparatus
US11513804B2 (en) Pipeline flattener with conditional triggers
CN100410873C (en) Separate saturation add-minus function to improve key-performming moment of processor pipe
CN100451951C (en) 5+3 levels pipeline structure and method in RISC CPU
US6092183A (en) Data processor for processing a complex instruction by dividing it into executing units
KR101545701B1 (en) A processor and a method for decompressing instruction bundles
US7975128B2 (en) Apparatuses and programs for implementing a forwarding function
CN100356318C (en) Methods and apparatus for instruction alignment
US6832334B2 (en) Computer register watch
CN111124490A (en) Precision-loss-free low-power-consumption MFCC extraction accelerator using POSIT
US7831806B2 (en) Determining target addresses for instruction flow changing instructions in a data processing apparatus
US5819081A (en) Method of executing a branch instruction of jumping to a subroutine in a pipeline control system
CN102289363A (en) Method for controlling data stream and computer system
CN102063290B (en) Systematized RISC CPU (Reduced Instruction-Set Computer Central Processing unit) production line control method
CN101923386A (en) Method and device for reducing CPU power consumption and low power consumption CPU
JP2002268877A (en) Clock control method and information processor using the same
CN101615114A (en) Finish the microprocessor realizing method of multiplication twice, addition twice and displacement twice
CN101930281B (en) Method and device for reducing power consumption of CPU and low-power CPU
US20030018883A1 (en) Microcode branch prediction indexing to macrocode instruction addresses
JP3414579B2 (en) Programmable controller
JPH052485A (en) Pipeline control system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant