CN1142485C - Correlation delay eliminating method for streamline control


Info

Publication number
CN1142485C
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CNB011315695A
Other languages
Chinese (zh)
Other versions
CN1349160A (en)
Inventor
戴葵
王志英
沈立
王蓉晖
王蕾
张春元
王明仕
王进
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CNB011315695A
Publication of CN1349160A
Application granted
Publication of CN1142485C
Expired - Fee Related


Landscapes

  • Advance Control (AREA)

Abstract

The present invention discloses a method for eliminating control-dependence delays in a pipeline. Its goal is to eliminate pipeline control-dependence delays effectively and improve microprocessor performance while keeping the hardware simple and low-power. In the technical solution, a compiler determines all possible target addresses of each branch instruction and inserts prefetch instructions; two instruction fetch units read in all possible successor instructions of the current branch instruction in advance; and a selector, according to the decode result of the current branch instruction, chooses the instructions supplied by one of the fetch units for decoding and execution. The prefetch instructions, "fetch addr1, addr2", "fetch addr", and "fetch stack", correspond to different classes of branch instructions and are executed by the instruction fetch units. Prefetching is performed in units of basic blocks. The hardware has low complexity and low power consumption, the control-dependence delay elimination rate is high, and few prefetched instructions are wasted. Microprocessors designed with the invention have a high performance-to-price ratio.

Description

Method for eliminating control-dependence delays in a pipeline
Technical field: The present invention relates to methods for eliminating pipeline control-dependence delays in microprocessor design, and in particular to embedded microprocessor designs that require low power consumption and a simple hardware implementation.
Background: Current methods for eliminating pipeline control-dependence delays in microprocessor design fall roughly into two classes: branch prediction and delayed branching. Branch prediction exploits the locality of program execution, using statistics on the outcomes of previously executed branch instructions to predict whether the next branch will be taken. Its effectiveness depends not only on prediction accuracy but also on the overhead of making predictions. The resulting control-dependence delay depends on the pipeline structure, the prediction method, and the recovery strategy used after a misprediction. The drawback of this approach is that prediction needs substantial hardware support, such as prediction tables and misprediction-recovery logic, which is costly to implement and consumes considerable power. The main idea of delayed branching is to execute instructions that are independent of the branch during the control-dependence stall cycles, thereby hiding those cycles. The drawbacks of this approach are that not all branch delay slots can be filled, and the scheduled instructions are not guaranteed to be ones that must actually execute; when they are not needed, performance does not really improve.
Embedded microprocessors are used mainly in household appliances, mobile phones, microcontrollers and similar fields, where low power consumption is required. The methods above either cannot meet the requirements of low power and low complexity, or eliminate only a small fraction of control-dependence delays and so cannot fully improve performance. Even in general-purpose microprocessor design, these methods cannot eliminate pipeline control-dependence delays efficiently.
Summary of the invention: The technical problem addressed by this invention is to eliminate pipeline control-dependence delays efficiently and improve microprocessor performance, while keeping the hardware implementation of an embedded microprocessor simple and low-power.
The technical solution of the invention is as follows. The compiler in the compilation system determines all possible target addresses of each branch instruction and inserts prefetch instructions. The instruction fetch module contains two instruction fetch units that read in, ahead of time, all possible successor instructions of the current branch instruction. The instruction decode and execute module contains a selector which, based on the decode result of the current branch instruction, chooses the instructions supplied by one of the fetch units and passes them to the instruction decode unit and the instruction execution unit for decoding and execution, thereby eliminating the pipeline control-dependence delay. To date, no report of using this method to eliminate pipeline control-dependence delays has appeared in China or abroad.
The invention uses eight terms: pipeline, branch instruction, pipeline hazard, control dependence, control-dependence delay, instruction prefetch, prefetch instruction, and basic block. They are defined as follows:
(1) Pipeline: the process of executing an instruction is divided into several sub-processes, each of which can run concurrently with the other sub-processes in its own dedicated functional stage; this is the pipelining technique for instruction execution, and a pipeline is its concrete implementation. Different microprocessor designs divide the pipeline into different numbers of stages. A typical pipeline has five stages: instruction fetch, decode, execute, memory access, and write-back.
(2) Branch instruction: any instruction that changes the value of the program counter is called a branch instruction. There are four classes: conditional branch instructions, direct unconditional branch instructions, indirect unconditional branch instructions, and function/procedure return instructions (for example, a return statement).
(3) Pipeline hazard: a pipeline stall caused by dependences between instructions is called a pipeline hazard.
(4) Control dependence: a pipeline hazard caused by a branch instruction is called a control dependence.
(5) Control-dependence delay: the number of clock cycles for which the pipeline stalls because of a control dependence.
(6) Instruction prefetch: the operation of fetching an instruction from memory before it is executed.
(7) Prefetch instruction: an instruction responsible for carrying out a prefetch.
(8) Basic block: the basic building unit of a program. It has exactly one entry (its first statement) and one exit (its last statement), contains at least two instructions, and its exit is a branch instruction. A program can always be divided into a number of basic blocks.
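Definition (8) can be illustrated with a short Python sketch. It assumes a toy instruction format: the `Instr` record and the mnemonics `beq`, `j`, `jr` and `ret` are hypothetical and not taken from the patent. It closes a basic block at every branch, which is the rule the compiler flow in this document uses; a textbook partition would additionally start a new block at every branch target, a refinement omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mnemonics standing in for the four branch classes of definition (2).
BRANCH_OPS = {"beq", "j", "jr", "ret"}


@dataclass
class Instr:
    addr: int                      # byte address of the instruction
    op: str                        # mnemonic
    target: Optional[int] = None   # target encoded in the instruction, if any


def split_basic_blocks(program: List[Instr]) -> List[List[Instr]]:
    """Split an instruction sequence into basic blocks that each end in a branch."""
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in program:
        current.append(ins)
        if ins.op in BRANCH_OPS:   # a branch is the single exit of the block
            blocks.append(current)
            current = []
    if current:                    # trailing non-branch code (a well-formed program ends in a return)
        blocks.append(current)
    return blocks


program = [Instr(0, "add"), Instr(4, "beq", 16), Instr(8, "sub"),
           Instr(12, "j", 0), Instr(16, "add"), Instr(20, "ret")]
print([[i.addr for i in b] for b in split_basic_blocks(program)])
# [[0, 4], [8, 12], [16, 20]]
```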
The invention is implemented as follows:
1. The compiler in the compilation system determines all possible target addresses of each branch instruction and inserts prefetch instructions.
2. When the program starts running, one instruction fetch unit in the instruction fetch module reads in the first basic block while the other fetch unit is idle.
For each basic block in the program:
(a) The fetch unit responsible for reading in the basic block reads its instructions one by one and checks whether each is a prefetch instruction. If the instruction just read is a prefetch instruction, it is sent to the other fetch unit, which executes it; otherwise the instruction is sent to the decode unit in the instruction decode and execute module for decoding.
(b) After the last instruction of the basic block, which is a branch instruction, has been decoded, the selector in the instruction decode and execute module uses the decode result to choose the basic block held in one of the fetch units as the successor of the current basic block:
1) Let the two fetch units be IF0 and IF1. Suppose the pipeline is currently executing instructions from IF0, while IF1 executes the prefetch instruction and prefetches the successor target instructions of the current basic block. If execution proceeds sequentially, i.e. the instruction just decoded is not a branch instruction or is a branch instruction whose branch condition is False, the instructions in IF0 are selected for decoding. If the branch is taken, i.e. the instruction just decoded is a branch instruction whose branch condition is True, the instructions in IF1 are selected for decoding.
2) If the pipeline is currently executing instructions from IF1, the selection policy is exactly reversed: if execution proceeds sequentially (the instruction just decoded is not a branch instruction, or is a branch instruction whose branch condition is False), the instructions in IF1 are selected for decoding; if the branch is taken (the instruction just decoded is a branch instruction whose branch condition is True), the instructions in IF0 are selected for decoding.
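Because rules 1) and 2) are mirror images, the selection in step (b) amounts to a one-bit state machine. The Python sketch below models only the selector's choice, not the fetch hardware; the class and method names are hypothetical, and a direct unconditional branch or a return is treated as a branch whose condition is True.

```python
class Selector:
    """Tracks which fetch unit (0 for IF0, 1 for IF1) feeds the decoder."""

    def __init__(self) -> None:
        self.active = 0  # at program start IF0 supplies instructions; IF1 is idle

    @property
    def prefetcher(self) -> int:
        """The other unit, which executes prefetch instructions."""
        return 1 - self.active

    def after_decode(self, is_branch: bool, condition: bool = False) -> int:
        """Apply rules 1) and 2): swap units only when a branch is taken."""
        if is_branch and condition:
            self.active = 1 - self.active  # the taken target sits in the prefetching unit
        return self.active                 # sequential flow keeps the same unit


sel = Selector()
trace = [sel.after_decode(False),        # ordinary instruction: stay on IF0
         sel.after_decode(True, True),   # taken branch: switch to IF1
         sel.after_decode(True, False),  # not-taken branch: stay on IF1
         sel.after_decode(True, True)]   # taken branch: back to IF0
print(trace)  # [0, 1, 1, 0]
```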
Thus, while the program runs, one fetch unit supplies instructions to the instruction execution unit while the other performs instruction prefetch. The two fetch units work in parallel, so that by the time a branch instruction has been decoded, all of its possible target instructions are already held in the two fetch units.
Compared with a conventional compiler, the compiler of this invention adds two functions: determining all possible target addresses of each branch instruction, and inserting into each basic block a prefetch instruction chosen according to the branch type. The compiler inserts prefetch instructions as follows: it reads the instructions of the program code one by one; encountering a branch instruction marks the end of the current basic block, and a prefetch instruction matching the branch type is inserted after the first instruction of the basic block containing that branch.
Branch instructions fall into four classes. According to their different characteristics, the invention defines three prefetch instructions corresponding to the different branch classes: "fetch addr1, addr2", "fetch addr", and "fetch stack".
(1) Conditional branch instructions: the branch may be taken or not taken, so there are two successor basic blocks, and both successors of the current basic block should be prefetched. There are two possible target addresses: one is encoded in the instruction, and the other is the address of the instruction following the branch. The prefetch instruction "fetch addr1, addr2" is inserted after the first instruction of the basic block containing the branch; it prefetches the two basic blocks starting at addresses addr1 and addr2. addr1 is obtained by decoding the branch instruction; addr2 is the address of the instruction following it, equal to the branch instruction's address plus the instruction length (in bytes).
(2) Direct unconditional branch instructions: the branch is always taken and has only one successor basic block, whose target address is known at compile time, so only the target basic block needs to be prefetched. The single possible target address is encoded in the instruction. The prefetch instruction "fetch addr" is inserted after the first instruction of the basic block containing the branch; it prefetches the basic block starting at address addr, which is obtained by decoding the branch instruction.
(3) Indirect unconditional branch instructions: the branch is always taken, but because the target address is held in a register, it usually cannot be determined at compile time, so this class of branch instruction is not handled.
(4) Procedure return statements: the branch is always taken and has only one successor basic block. Such statements usually occur when returning from a procedure call. Because procedure calls may be nested, the invention uses a stack to hold the return addresses of procedure calls (i.e. the prefetch addresses): on each procedure call the return address is pushed onto the top of the stack, and at prefetch time the prefetch address is taken from the top of the stack. The prefetch instruction "fetch stack" is inserted after the first instruction of the basic block containing the procedure return instruction; it prefetches the basic block starting at the address on the top of the stack.
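The four cases can be combined into a single insertion pass. The Python sketch below is illustrative only: it assumes a fixed 4-byte instruction length, the toy `Instr` record and mnemonics (`beq`, `j`, `jr`, `ret`) are hypothetical, and prefetch instructions are emitted as text. It also ignores the fact that inserting instructions shifts later addresses; a real compiler would emit symbolic labels and let the assembler resolve them.

```python
from dataclasses import dataclass
from typing import List, Optional

INSTR_LEN = 4  # assumed fixed instruction length in bytes

# Hypothetical mnemonics for the four branch classes.
BRANCH_KIND = {"beq": "conditional", "j": "direct", "jr": "indirect", "ret": "return"}


@dataclass
class Instr:
    addr: int
    op: str
    target: Optional[int] = None  # target encoded in the instruction, if any


def prefetch_for(branch: Instr) -> Optional[str]:
    """Choose the prefetch instruction for a basic block ending in `branch`."""
    kind = BRANCH_KIND[branch.op]
    if kind == "conditional":
        addr1 = branch.target             # taken target, obtained by decoding the branch
        addr2 = branch.addr + INSTR_LEN   # fall-through: branch address + instruction length
        return f"fetch {addr1:#x}, {addr2:#x}"
    if kind == "direct":
        return f"fetch {branch.target:#x}"
    if kind == "return":
        return "fetch stack"              # address taken from the return-address stack at run time
    return None                           # indirect: target in a register, not handled


def insert_prefetches(program: List[Instr]) -> List[str]:
    """Scan in order; at each branch, place its prefetch after the block's first instruction."""
    out: List[str] = []
    block: List[Instr] = []
    for ins in program:
        block.append(ins)
        if ins.op in BRANCH_KIND:         # a branch ends the current basic block
            text = [i.op for i in block]
            pf = prefetch_for(ins)
            if pf is not None:
                text.insert(1, pf)        # basic blocks have at least two instructions
            out.extend(text)
            block = []
    out.extend(i.op for i in block)
    return out


program = [Instr(0, "add"), Instr(4, "beq", 16), Instr(8, "sub"), Instr(12, "j", 0),
           Instr(16, "add"), Instr(20, "jr"), Instr(24, "mul"), Instr(28, "ret")]
for line in insert_prefetches(program):
    print(line)
```

In the sample program the conditional branch at address 4 gets "fetch 0x10, 0x8" (its encoded target and the fall-through address 4 + 4), the indirect branch gets no prefetch, and the return gets "fetch stack".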
Unlike other instructions, prefetch instructions are executed by the instruction fetch units. Different RISC (reduced instruction set computing) instruction sets may encode the prefetch instructions differently, but any encoding that achieves the same function as this invention falls within its scope of protection. Prefetching is performed in units of basic blocks; apart from part of the successor instructions prefetched for a conditional branch that turns out not to be taken, every prefetched instruction will be executed.
If all possible target instructions have been read in before the branch instruction finishes decoding, the correct successor can be selected for decoding and execution based on the branch's decode result. The invention was implemented on the FastDLX simulator (a standard CPU simulator) and tested with the SPECint95 benchmark suite provided by the System Performance Evaluation Cooperative (SPEC). Excluding indirect unconditional branch instructions (which make up a small fraction of programs), the probability that the branch target instructions have already been read in when branch decoding completes reaches 99.3%; including them, the probability reaches 93%.
The invention has the following advantages:
(1) Low hardware complexity and low power consumption. Compared with branch prediction techniques, the invention dispenses with complex branch-prediction hardware and misprediction-recovery hardware and uses only simple control logic, which greatly reduces the difficulty and complexity of the hardware implementation.
(2) High elimination rate of control-dependence delays. The invention selects the branch target instructions directly from the branch's decode result, and instruction prefetch ensures that most instructions have already been read in by the time the branch finishes decoding, so most control-dependence delays are eliminated.
(3) Prefetching is performed in units of basic blocks: the successor basic blocks are prefetched while the current basic block executes, so prefetch operations are issued early, leaving enough time for them to complete. At the same time, apart from part of the successor instructions prefetched for a conditional branch that is not taken, all prefetched instructions will be executed, which effectively reduces useless prefetches.
While keeping the hardware implementation of an embedded microprocessor simple and low-power, the invention efficiently eliminates pipeline control-dependence delays and improves microprocessor performance. It can also be applied to general-purpose microprocessor design.
Description of drawings:
Fig. 1 is a flowchart of how the compiler of the invention inserts prefetch instructions;
Fig. 2 shows the overall logical structure of the invention;
Fig. 3 is a space-time diagram of a non-branch instruction in the 5-stage pipeline of a conventional microprocessor;
Fig. 4 is a space-time diagram of a branch instruction in the 5-stage pipeline of a conventional microprocessor;
Fig. 5 is a space-time diagram of a branch instruction in a 5-stage pipeline using the invention;
Fig. 6 shows test results of the invention on the SPECint95 benchmark programs;
Fig. 7 compares the performance of the invention with other control-dependence delay elimination methods.
Embodiment:
Fig. 1 is the flowchart by which the compiler of the invention inserts prefetch instructions. The compiler reads the instructions of the program code one by one. Encountering a branch instruction marks the end of the current basic block, and a prefetch instruction matching the branch type is inserted after the first instruction of the basic block containing that branch. Because a program always ends with a procedure return statement, the compiler is guaranteed to encounter a branch instruction at the end of the last basic block as well.
Fig. 2 shows the overall logical structure of the invention. It consists of the compilation system, the instruction fetch module, and the instruction decode and execute module:
The compilation system is mainly responsible for determining all possible target addresses of each branch instruction and inserting prefetch instructions according to the branch type. The compiler reads the instructions of the source program one by one; when an instruction is a branch, it inserts the corresponding prefetch instruction, chosen by branch type, after the first instruction of the basic block containing that branch. Specifically: a conditional branch has two successor basic blocks and therefore two prefetch addresses, so "fetch addr1, addr2" is inserted, where addr1 is obtained by decoding the branch and addr2 is the address of the instruction following it, equal to the branch address plus the instruction length (in bytes); a direct unconditional branch has one successor basic block and one prefetch address, so "fetch addr" is inserted, where addr is obtained by decoding the instruction; a procedure return statement has one successor basic block whose prefetch address is held on the top of the stack, so "fetch stack" is inserted. The compiled program code is stored in memory.
The instruction fetch module is mainly responsible for supplying instructions to the instruction decode and execute module and for performing instruction prefetch; its function is carried out by two instruction fetch units. Fetch unit IF0 reads instructions from the instruction cache through port 0, and fetch unit IF1 reads instructions from the instruction cache through port 1. When the program starts, the instructions supplied by IF0 are selected for decoding and execution and IF1 is idle; when IF0 reads in a prefetch instruction, it forwards it to IF1, which performs the prefetch. While the program runs, which fetch unit fetches and which prefetches is determined by the decode result of each branch instruction. If IF0 is fetching and IF1 is prefetching, and the instruction just decoded is not a branch or is a branch that is not taken, their roles stay the same; otherwise IF1 takes over fetching and IF0 prefetching. Symmetrically, if IF1 is fetching and IF0 is prefetching, a non-branch or not-taken branch leaves the roles unchanged; otherwise IF0 takes over fetching and IF1 prefetching.
The instruction decode and execute module is mainly responsible for decoding and executing instructions, which are supplied by one of the fetch units in the instruction fetch module. The decode result of a branch instruction is sent to the selector and to both fetch units, while a decoded non-branch instruction is sent to the instruction execution unit for execution. The selector chooses, according to the branch's decode result, which fetch unit's instructions are decoded and executed. When the program starts, it selects the instructions in IF0. Thereafter it decides from each branch's decode result: if it is using the instructions in IF0 and the instruction just decoded is not a branch or is a branch that is not taken, it continues using IF0; otherwise it switches to IF1. If it is using the instructions in IF1, a non-branch or not-taken branch keeps IF1 selected; otherwise it switches to IF0. During execution, whenever a procedure call occurs, the return address is pushed onto the top of the stack for use in prefetching.
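The return-address stack used by "fetch stack" can be sketched as follows. The patent states only that each procedure call pushes its return address and that prefetching reads the address at the top of the stack; popping the entry when the return instruction executes is an assumption of this sketch, as are the class and method names.

```python
from typing import List


class ReturnAddressStack:
    """Holds return addresses of nested procedure calls for 'fetch stack'."""

    def __init__(self) -> None:
        self._entries: List[int] = []

    def on_call(self, return_addr: int) -> None:
        self._entries.append(return_addr)  # each call pushes its return address

    def fetch_stack(self) -> int:
        return self._entries[-1]           # prefetch address = top of stack (not removed)

    def on_return(self) -> int:
        return self._entries.pop()         # assumed: the executed return pops its entry


ras = ReturnAddressStack()
ras.on_call(0x100)             # outer call
ras.on_call(0x200)             # nested call
print(hex(ras.fetch_stack()))  # 0x200: the inner procedure returns first
ras.on_return()
print(hex(ras.fetch_stack()))  # 0x100
```

A stack rather than a single register is what makes nested calls work: each return prefetches the address pushed by the most recent unfinished call.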
As shown in Fig. 3, assume the pipeline has five stages: instruction fetch (IF), decode (ID), execute (EX), memory access (MEM), and write-back (WB). Instruction p is the first instruction executed after instruction i, instruction p+1 is the second, and so on. Because i is not a branch instruction, the fetch unit reads p while i is being decoded. The ID stage of i therefore overlaps the IF stage of p, and the EX stage of i overlaps the ID stage of p, so there is no control-dependence delay.
As shown in Fig. 4, instruction i is a branch instruction. In the same pipeline as in Fig. 3, the address of instruction p can be determined only after i has finished decoding, so the IF stage of p overlaps the EX stage of i, and the pipeline suffers a control-dependence delay of one clock cycle.
As shown in Fig. 5, suppose instruction i is a branch instruction, instruction i1 is the instruction executed when the branch is not taken, and instruction i2 is the instruction executed when the branch is taken; instruction q is the second instruction executed after the branch, instruction q+1 the third, and so on. With the invention, while i is being decoded, the two fetch units read i1 and i2 respectively. Once i has finished decoding, i1 or i2 is selected for decoding according to the decode result, so the EX stage of i overlaps the ID stage of i1 or i2 and the original control-dependence delay is eliminated.
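The cycle accounting of Figs. 3 to 5 can be reproduced with a small model. This Python sketch is a simplification under stated assumptions: one instruction enters IF per cycle, a branch is resolved at the end of its ID stage, and there are no other hazards. Without the invention, the successor of a branch enters IF only in the branch's EX cycle (Fig. 4); with two fetch units, the successor (i1 or i2) has already been fetched and enters ID in that cycle (Fig. 5). The function names are hypothetical.

```python
from typing import List


def if_cycles(is_branch: List[bool], dual_fetch: bool) -> List[int]:
    """Cycle in which each instruction of an executed sequence enters the IF stage."""
    cycles: List[int] = []
    next_if = 0
    for branch in is_branch:
        cycles.append(next_if)
        next_if += 1
        if branch and not dual_fetch:
            next_if += 1  # successor address known only after ID: one stall cycle
    return cycles


def total_cycles(is_branch: List[bool], dual_fetch: bool) -> int:
    """Cycles until the last instruction leaves WB in a 5-stage pipeline."""
    return if_cycles(is_branch, dual_fetch)[-1] + 5


seq = [True, False, False]                # branch i followed by two instructions
print(if_cycles(seq, dual_fetch=False))   # [0, 2, 3]  Fig. 4: one bubble
print(if_cycles(seq, dual_fetch=True))    # [0, 1, 2]  Fig. 5: no bubble
```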
The invention has been successfully implemented in the pipeline of the Yinhe (Milky Way) TS-1 embedded microprocessor IP core, where it effectively eliminates pipeline control-dependence delays. Fig. 6 shows the results obtained with the SPECint95 benchmark suite. In the figure, the vertical axis lists the individual SPECint95 benchmarks and the horizontal axis gives the control-dependence delay elimination rate: gcc 88.80%, ijpeg 83.32%, compress 88.55%, perl 88.69%, m88ksim 92.99%, li 85.09%, vortex 87.47%, and go 86.16%. The average elimination rate of pipeline control-dependence delays reaches 87.8%.
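As a quick arithmetic check on the Fig. 6 figures, the plain (unweighted) mean of the eight per-benchmark rates can be computed as below. It comes to about 87.63%, slightly below the reported 87.8%. The patent does not say how its average was formed, so the reported value may be weighted, for example by each benchmark's dynamic branch count; that is a guess, not something the source states.

```python
# Per-benchmark control-dependence delay elimination rates (%) reported for Fig. 6.
rates = {
    "gcc": 88.80, "ijpeg": 83.32, "compress": 88.55, "perl": 88.69,
    "m88ksim": 92.99, "li": 85.09, "vortex": 87.47, "go": 86.16,
}

unweighted_mean = sum(rates.values()) / len(rates)
print(f"unweighted mean: {unweighted_mean:.2f}%")  # unweighted mean: 87.63%
```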
Fig. 7 shows the performance obtained with different control-dependence delay elimination methods. CPI (cycles per instruction) is the average number of clock cycles needed to execute one instruction; the conditional branch delay, unconditional branch delay, and average branch delay are all measured in clock cycles. "Pipeline stall" denotes the case where no branch-delay elimination technique is used, and "two instruction fetch units" denotes the case using the invention. With the invention, the conditional branch delay falls to 0.05 cycles, the unconditional branch delay to 0.09 cycles, the average branch delay to 0.06 cycles, and the effective CPI to 1.01, all well below the results obtained with the other methods.

Claims (4)

1. A method for eliminating pipeline control-dependence delays, whose overall logical structure comprises a compilation system, an instruction fetch module, and an instruction decode and execute module, characterized in that: the compiler in the compilation system determines all possible target addresses of each branch instruction and inserts prefetch instructions according to the branch type; in addition to supplying instructions to the instruction decode and execute module, the instruction fetch module also performs instruction prefetch, its function being carried out by two instruction fetch units IF0 and IF1; the instruction decode and execute module contains a selector which, according to the decode result of the current branch instruction, selects the instructions supplied by one fetch unit and passes them to the instruction decode unit and the instruction execution unit for decoding and execution; the method is implemented as follows: 1) the compiler in the compilation system determines all possible target addresses of each branch instruction and inserts prefetch instructions; 2) when the program starts running, one instruction fetch unit in the instruction fetch module reads in the first basic block while the other fetch unit is idle; for each basic block in the program: (a) the fetch unit responsible for reading in the basic block reads its instructions one by one and checks whether each is a prefetch instruction; if the instruction just read is a prefetch instruction, it is sent to the other fetch unit, which executes it; otherwise the instruction is sent to the decode unit for decoding; (b) after the last instruction of the basic block, which is a branch instruction, has been decoded, the selector in the instruction decode and execute module uses the branch's decode result to select the basic block in one of the fetch units as the successor of the current basic block:
I. if the pipeline is currently executing instructions from IF0 while IF1 executes the prefetch instruction and prefetches the successor target instructions of the current basic block: when execution proceeds sequentially, i.e. the instruction just decoded is not a branch instruction or is a branch instruction whose branch condition is False, the instructions in IF0 are selected for decoding; when the branch is taken, i.e. the instruction just decoded is a branch instruction whose branch condition is True, the instructions in IF1 are selected for decoding;
II. if the pipeline is currently executing instructions from IF1, the selection policy is exactly reversed: when execution proceeds sequentially, i.e. the instruction just decoded is not a branch instruction or is a branch instruction whose branch condition is False, the instructions in IF1 are selected for decoding; when the branch is taken, i.e. the instruction just decoded is a branch instruction whose branch condition is True, the instructions in IF0 are selected for decoding.
2. The method for eliminating pipeline control-dependence delays according to claim 1, characterized in that the compiler inserts prefetch instructions as follows: it reads the instructions of the program code one by one; encountering a branch instruction marks the end of the current basic block, and a prefetch instruction matching the branch type is inserted after the first instruction of the basic block containing that branch.
3. correlation delay eliminating method for streamline control according to claim 1, it is characterized in that described prefetched instruction is that corresponding different branch instructions designs, they comprise fetchaddr1, three kinds of addr2, fetch addr, fetch stack, different branch instructions will be used different prefetched instructions:
(1) Conditional branch instruction: the branch may succeed or fail, so there are two successor basic blocks, and both successor basic blocks of the current basic block should be prefetched at the same time. There are two possible transfer addresses: one is stored in the instruction, and the other is the address of the instruction that follows this instruction. In this case the prefetch instruction fetch addr1, addr2 is inserted after the first instruction of the basic block containing this branch instruction, prefetching the two basic blocks that begin at addresses addr1 and addr2. addr1 is obtained by decoding this branch instruction; addr2 is the address of the instruction following it, whose value is the branch instruction address plus the instruction length;
(2) Direct unconditional branch instruction: the branch always succeeds and there is only one successor basic block. Its branch target address can be obtained at compile time, so it suffices to prefetch the target basic block. There is only one possible transfer address, and it is stored in the instruction. In this case the prefetch instruction fetch addr is inserted after the first instruction of the basic block containing this branch instruction, prefetching the basic block that begins at address addr; addr is obtained by decoding this branch instruction;
(3) Indirect unconditional branch instruction: the branch always succeeds, but because the branch target address is held in a register it usually cannot be obtained at compile time, so branch instructions of this class are not handled;
(4) Procedure return statement: the branch always succeeds and there is only one successor basic block. Statements of this kind usually appear when a procedure call returns. Because procedure calls may be nested, the present invention uses a stack to hold the return addresses of procedure calls, which serve as the prefetch addresses. On each procedure call the return address is saved at the top of the stack, and when prefetching, the prefetch address is taken from the top of the stack. In this case the prefetch instruction fetch stack is inserted after the first instruction of the basic block containing this procedure return instruction, prefetching the basic block that begins at the address held at the top of the stack.
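Claim 3 maps each branch type to a prefetch instruction. Together with the return-address stack used by fetch stack, this might look like the following Python sketch. The `BranchKind` names, the tuple form of the prefetch, and the `ReturnStack` class are hypothetical, because the patent does not give an encoding. Addresses are plain integers. Popping the stack on return is also an assumption: the text only says that calls push and that prefetching reads the top.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class BranchKind(Enum):
    CONDITIONAL = auto()      # (1) may succeed or fail; target stored in the instruction
    DIRECT_UNCOND = auto()    # (2) always succeeds; target stored in the instruction
    INDIRECT_UNCOND = auto()  # (3) always succeeds; target held in a register
    RETURN = auto()           # (4) procedure return


def choose_prefetch(kind: BranchKind, branch_addr: int, instr_len: int,
                    target: Optional[int] = None) -> Optional[Tuple]:
    """Return the prefetch to insert for a branch, or None if the branch is not handled."""
    if kind in (BranchKind.CONDITIONAL, BranchKind.DIRECT_UNCOND) and target is None:
        raise ValueError("this branch kind needs its decoded target address")
    if kind is BranchKind.CONDITIONAL:
        addr1 = target                   # taken path, decoded from the branch
        addr2 = branch_addr + instr_len  # fall-through path: the next instruction
        return ("fetch", addr1, addr2)
    if kind is BranchKind.DIRECT_UNCOND:
        return ("fetch", target)
    if kind is BranchKind.RETURN:
        return ("fetch stack",)
    return None                          # indirect: target unknown at compile time


class ReturnStack:
    """Return addresses of nested procedure calls, newest on top."""

    def __init__(self) -> None:
        self._stack: List[int] = []

    def on_call(self, return_addr: int) -> None:
        self._stack.append(return_addr)  # each call saves its return address on top

    def prefetch_address(self) -> int:
        return self._stack[-1]           # fetch stack prefetches from the top entry

    def on_return(self) -> int:
        # Assumption: the entry is popped when the procedure returns, so the
        # enclosing call's return address becomes the top again.
        return self._stack.pop()


if __name__ == "__main__":
    print(choose_prefetch(BranchKind.CONDITIONAL, 100, 4, target=200))  # ('fetch', 200, 104)
    rs = ReturnStack()
    rs.on_call(0x10)
    rs.on_call(0x20)                   # nested call
    print(hex(rs.prefetch_address()))  # 0x20
```

A LIFO stack is the natural fit for case (4). Nested calls return in reverse order, so the top entry is always the return address of the innermost active procedure.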
4. The method for eliminating pipeline control-dependence delays according to claim 1, characterized in that instruction prefetching is carried out in units of basic blocks, and, except for the part of the successor instructions prefetched for a conditional branch instruction when the branch fails, all prefetched instructions will be executed.
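Under claim 3, a conditional branch prefetches both successor basic blocks. Whichever block the branch does not go to is therefore fetched but not executed. A direct unconditional branch or a return prefetches only the block that will run, and an indirect branch prefetches nothing. The following Python sketch shows that accounting. It assumes block lengths are counted in instructions. The event format and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def prefetch_accounting(events: Iterable[Tuple[str, int, int, bool]]) -> Tuple[int, int]:
    """Return (executed, wasted) counts of prefetched instructions.

    Each event is (kind, taken_block_len, fallthrough_block_len, taken), where kind is
    "cond", "direct", "return" or "indirect".
    """
    executed = wasted = 0
    for kind, taken_len, fall_len, taken in events:
        if kind == "cond":
            # Both successors were prefetched; only the one actually followed is executed.
            used, unused = (taken_len, fall_len) if taken else (fall_len, taken_len)
            executed += used
            wasted += unused
        elif kind in ("direct", "return"):
            executed += taken_len  # single successor, always executed
        # "indirect": no prefetch is inserted, so nothing is counted
    return executed, wasted


if __name__ == "__main__":
    trace = [("cond", 5, 3, True), ("cond", 5, 3, False),
             ("direct", 4, 0, True), ("return", 2, 0, True), ("indirect", 7, 0, True)]
    print(prefetch_accounting(trace))  # (14, 8)
```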
CNB011315695A 2001-11-28 2001-11-28 Correlation delay eliminating method for streamline control Expired - Fee Related CN1142485C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB011315695A CN1142485C (en) 2001-11-28 2001-11-28 Correlation delay eliminating method for streamline control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB011315695A CN1142485C (en) 2001-11-28 2001-11-28 Correlation delay eliminating method for streamline control

Publications (2)

Publication Number Publication Date
CN1349160A CN1349160A (en) 2002-05-15
CN1142485C true CN1142485C (en) 2004-03-17

Family

ID=4670694

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB011315695A Expired - Fee Related CN1142485C (en) 2001-11-28 2001-11-28 Correlation delay eliminating method for streamline control

Country Status (1)

Country Link
CN (1) CN1142485C (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7111125B2 (en) * 2002-04-02 2006-09-19 Ip-First, Llc Apparatus and method for renaming a data block within a cache
JP3627725B2 (en) 2002-06-24 2005-03-09 セイコーエプソン株式会社 Information processing apparatus and electronic apparatus
US7000095B2 (en) * 2002-09-06 2006-02-14 Mips Technologies, Inc. Method and apparatus for clearing hazards using jump instructions
US7613906B2 (en) * 2005-08-12 2009-11-03 Qualcomm Incorporated Advanced load value check enhancement
CN101782847B (en) * 2009-01-20 2013-04-24 瑞昱半导体股份有限公司 Data storage method and processor using same
CN102117198B (en) * 2009-12-31 2015-07-15 上海芯豪微电子有限公司 Branch processing method
CN106990942A (en) * 2011-06-29 2017-07-28 上海芯豪微电子有限公司 branch processing method and system
CN103838550B (en) * 2012-11-26 2018-01-02 上海芯豪微电子有限公司 A kind of branch process system and method
CN103608768B (en) * 2013-04-01 2017-03-01 华为技术有限公司 A kind of data prefetching method, relevant apparatus and system
CN103902252B (en) * 2014-03-28 2016-08-31 中国航天科技集团公司第九研究院第七七一研究所 A kind of analysis method for instruction pipeline dependency
WO2016155623A1 (en) * 2015-03-30 2016-10-06 上海芯豪微电子有限公司 Information-push-based information system and method
CN105260256B (en) * 2015-10-27 2018-03-23 首都师范大学 A kind of fault detect of duplication redundancy streamline and backing method
CN110780925B (en) * 2019-09-02 2021-11-16 芯创智(北京)微电子有限公司 Pre-decoding system and method of instruction pipeline
CN112416438A (en) * 2020-12-08 2021-02-26 王志平 Method for realizing pre-branching of assembly line
CN113076136A (en) * 2021-04-23 2021-07-06 中国人民解放军国防科技大学 Safety protection-oriented branch instruction execution method and electronic device
CN116112580B (en) * 2022-11-23 2024-04-26 国网智能电网研究院有限公司 Hardware pipeline GTP data distribution method and device for power low-delay service

Also Published As

Publication number Publication date
CN1349160A (en) 2002-05-15

Similar Documents

Publication Publication Date Title
CN1142485C (en) Correlation delay eliminating method for streamline control
CN1222868C (en) Method and apparatus for multi-thread pipelined instruction decoder
US10268480B2 (en) Energy-focused compiler-assisted branch prediction
CN1191524C (en) Pretaking using future branch-path information obtained by predicting from branch
CN1308826C (en) System and method for CPI scheduling in SMT processor
CN1129843C (en) Use composite data processor system and instruction system
CN1308825C (en) System and method for CPI load balancing in SMT processors
CN1292343C (en) Apparatus and method for exception responses within processor and processing pipeline
CN1222985A (en) Method relating to handling of conditional jumps in multi-stage pipeline arrangement
Gordon-Ross et al. Exploiting fixed programs in embedded systems: A loop cache example
CN1571954A (en) Method and apparatus for performing compiler transformation of software code using fastforward regions and value specialization
CN1682181A (en) Data processing system having an external instruction set and an internal instruction set
CN1504881A (en) Java execution equipment and java execution method
WO2013112282A1 (en) Method and apparatus for register spill minimization
CN1173262C (en) Optimized bytecode interpreter of virtual machine instructions
Park et al. Microarchitecture-aware code generation for deep learning on single-isa heterogeneous multi-core mobile processors
CN1473294A (en) hardware loop
CN1716202A (en) Be association of activity and inertia incomplete disposal route of static information and device in the binary translation
CN1860436A (en) Method and system for processing a loop of instructions
CN100345117C (en) Floating-point operation process for X8b in binary translation
CN1100293C (en) Instruction control splice method and device
TWI379230B (en) Instruction mode identification apparatus and instruction mode identification method
CN1269036C (en) Methods and apparatus for generating speculative helper thread spawn-target points
CN1542608A (en) Microcontroller
CN1755631A (en) Library function call disposal route in the binary translation

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee