CN1266592C - Dynamic VLIW command dispatching method according to determination delay - Google Patents

Dynamic VLIW command dispatching method according to determination delay

Info

Publication number
CN1266592C
Authority
CN
China
Prior art keywords
instruction
vliw
delay
cache
earliest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200310110566
Other languages
Chinese (zh)
Other versions
CN1545026A (en)
Inventor
王志英
沈立
戴葵
张春元
鲁建壮
李云照
陆洪毅
王蕾
王进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN 200310110566
Publication of CN1545026A
Application granted
Publication of CN1266592C
Anticipated expiration
Expired - Fee Related

Abstract

The present invention discloses a dynamic VLIW instruction scheduling method based on determined delays. Its goal is to solve two problems of VLIW microprocessors: they cannot eliminate dynamic delays, and they lack binary code compatibility. In the technical scheme, the pipeline core is divided into a front end and a back end: the front end comprises an instruction fetch module, an instruction decode module, and an instruction dispatch module, and the back end comprises the functional units (FU) and the reorder buffer (ROB). Based on statistical information, a determined execution delay is assigned to each type of operation in a VLIW instruction, and the earliest available time of each register is stored. At run time, the start time of each operation is determined from the earliest available times of its registers, and its completion time is determined from the operation's delay. At the same time, the optimization result of the VLIW compiler guarantees that no dependences exist among the operations in the same VLIW instruction. The invention thus builds on the parallelism exposed by the VLIW compiler and uses a simple instruction scheduling mechanism to determine the execution time of operations dynamically, which improves microprocessor performance and solves the binary code compatibility problem.

Description

Dynamic VLIW instruction scheduling method based on determined delays
Technical field: The present invention relates to instruction scheduling methods in microprocessor design, and in particular to a dynamic instruction scheduling method for VLIW (Very Long Instruction Word) microprocessor design.
Background art: In microprocessor design, instruction scheduling can be performed statically by the compiler at compile time or dynamically by hardware at run time, and each approach has advantages and drawbacks. In general, with the static approach all scheduling work is done by the compiler, so little or no extra hardware is needed. Its drawback is that it cannot effectively handle dynamic delays that arise at run time (such as delays caused by branches or memory accesses), so performance is limited. Its typical representative is the VLIW microprocessor, for example the MultiFlow microprocessor of the MultiFlow company. The dynamic approach can make full use of run-time information to schedule instruction execution and better reduces the delays caused by branches and memory accesses. Its drawback is that it requires complex hardware support and has a high hardware cost. Its representative is the superscalar architecture, for example the x86 series microprocessors of Intel.
Studies show that the performance of superscalar microprocessors is approaching its limit, whereas VLIW microprocessors show good performance advantages and development prospects and have become a focus of current architecture research and microprocessor design. The VLIW architecture packs the operations of several instructions into one very long instruction, and the operations within the same VLIW instruction can execute in parallel, hence the name. The VLIW architecture uses multiple independent functional units to execute the operations of an instruction in parallel, and the task of selecting which instructions can issue together is done by the compiler. However, current VLIW microprocessors still cannot make full use of run-time information to eliminate dynamic delays, and they also have a serious binary code compatibility problem. Prior art such as the split-issue mechanism (Split-Issue) and DISVLIW (Dynamically Instruction Scheduled VLIW) does not integrate the hardware instruction scheduling mechanism well with the VLIW architecture: although it solves the binary code compatibility problem to some extent, the hardware largely repeats the work of the compiler, design complexity is very high, and performance is not effectively improved. How to implement dynamic scheduling of VLIW instructions with a simple hardware mechanism is a key problem that urgently needs to be solved.
Summary of the invention: The object of the present invention is to solve the problems that VLIW microprocessors cannot effectively use run-time information to eliminate dynamic delays and that they lack binary code compatibility, and to overcome the shortcomings of the prior art, in which the hardware repeats much of the compiler's work and design complexity is high while performance is not effectively improved. The invention integrates a hardware instruction scheduling mechanism into the VLIW architecture to improve its tolerance of dynamic delays, uses the parallelism exposed by the VLIW compiler to reduce hardware complexity, improves the performance of VLIW microprocessors, eliminates dynamic delays, and solves the binary code compatibility problem.
The technical scheme of the present invention is as follows: based on statistical information, a determined execution delay is assigned to each type of operation in a VLIW instruction, and the earliest available time of each register is stored. At run time, the start time of each operation is determined from the earliest available times of its registers, and its completion time is determined from the operation's delay. At the same time, the optimization result of the VLIW compiler guarantees that no dependences exist among the operations in the same VLIW instruction, so the execution times of several operations can be determined simultaneously without checking for dependences among them, which reduces hardware complexity. To date there has been no report, in China or abroad, of VLIW instructions being dynamically scheduled with this method.
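The timing rule in the technical scheme can be illustrated with a minimal Python sketch. It is not the patented hardware: the names `Op` and `schedule_bundle`, the register numbers, and the latencies in the example are assumptions made for illustration. An operation starts when the last of its source registers becomes available, and its destination register becomes available at start time plus the determined delay.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Op:
    name: str                # label used only for display (hypothetical)
    srcs: Tuple[int, ...]    # source register numbers
    dst: Optional[int]       # destination register, None if there is none
    latency: int             # determined delay assigned by operation type


def schedule_bundle(rat: Dict[int, int], bundle: List[Op], now: int) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for each op and update the RAT in place."""
    # Every op reads the RAT before any op updates it. This is safe only
    # because the compiler guarantees that the ops of one VLIW instruction
    # are independent of each other, so no cross-checking is needed.
    starts = [max([now] + [rat.get(r, 0) for r in op.srcs]) for op in bundle]
    result = []
    for op, start in zip(bundle, starts):
        end = start + op.latency
        if op.dst is not None:
            rat[op.dst] = end    # earliest time the destination is available
        result.append((op.name, start, end))
    return result


rat: Dict[int, int] = {}
first = schedule_bundle(rat, [Op("load r1", (), 1, 1), Op("mul r2", (3, 4), 2, 3)], now=0)
second = schedule_bundle(rat, [Op("add r5", (1, 2), 5, 1)], now=1)
print(first)   # [('load r1', 0, 1), ('mul r2', 0, 3)]
print(second)  # [('add r5', 3, 4)]
```

In the example the `add` cannot start before cycle 3 because register 2 is produced by the three-cycle `mul`; no comparison between operations is performed, only table lookups.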
The concrete scheme is as follows. The whole VLIW microprocessor system comprises four parts: the pipeline core, the Cache system, the memory controller, and the memory. The pipeline core executes instructions and writes execution results back to memory; it is connected to the Cache system by the instruction/data bus and the address bus. The Cache system is the high-speed cache that holds instructions and data and consists of two parts, the instruction Cache and the data Cache. The memory controller provides the interface between the memory and the Cache system: when an instruction or data item needed by the pipeline core is not in the Cache, the memory controller reads it from memory into the Cache. The memory holds instructions and data. The three modules Cache system, memory controller, and memory are connected to one another by an instruction bus, a data bus, and an address bus.
The VLIW microprocessor system uses the pipeline core to perform dynamic instruction scheduling. The pipeline core is designed as follows: it is divided into a front end and a back end. The front end comprises three modules, the instruction fetch module, the instruction decode module, and the instruction dispatch module; it is responsible for fetching instructions from memory, decoding the fetched instructions, and determining the execution time of each operation in an instruction from the decode results. The back end comprises two modules, the functional units (Function Unit, FU) that execute operations and the reorder buffer (Reorder Buffer, ROB); it is responsible for executing operations and confirming their execution results. The instruction fetch module is connected to the instruction Cache by the instruction bus and the address bus and is responsible for fetching instructions from the instruction Cache. The instruction decode module contains the VLIW decoder and is connected to the instruction fetch module by the instruction bus; it decodes instructions, determines the delay of each operation from the operation type obtained by decoding, and determines the earliest executable time of each operation from the information recorded in the RAT. The instruction dispatch module comprises the register available table (Register Available Table, RAT) and the function unit selector (Function Unit Selector, FUS). The RAT records the earliest available time of each register; the FUS assigns each operation to a functional unit according to the decode results, determines the operation's earliest execution time, and updates the earliest available time of the corresponding register (if any) in the RAT according to the operation's delay. The functional units of the back end comprise ALU units, memory access units, floating-point units, and branch units; their exact numbers vary with the microprocessor design, and they can execute several operations simultaneously. So as not to block the decoding of other instructions, each functional unit has an instruction queue that holds all operations waiting to execute on that unit. The ROB stores the execution results of operations and confirms them in instruction fetch order; confirmed results are written back to registers or memory, the remaining results are discarded, and all operations in the instruction queues that are waiting to use a discarded result are cancelled.
The concrete steps by which the pipeline core performs dynamic instruction scheduling are:
1. The instruction fetch module fetches a VLIW instruction from the instruction Cache and, for each instruction, allocates an entry in the reorder buffer ROB to hold the instruction's execution results. The sequence number I of this entry is recorded, and I uniquely identifies the instruction.
2. The instruction decode module decodes the VLIW instruction, determines the delay of each operation in the instruction from its type, and determines the earliest execution time of each operation from the register available times recorded in the RAT. It also assigns each operation two numbers, I and p: I identifies the instruction that contains the operation, and p gives the operation's position within instruction I, so that <I, p> uniquely identifies an operation. The operation delay is determined as follows:
(a) For Load operations: current microprocessor designs all use a memory hierarchy to build a complex storage system, and a miss at any level of that hierarchy affects the delay, so the delay of a Load operation is not a fixed value at run time. Simulation statistics show that the best results are obtained when the delay of a Load operation is assumed to be the delay of a data Cache hit. The present invention therefore adopts this assumption and sets the delay of a Load operation to the data Cache hit delay, which is one clock cycle.
(b) For Store operations: the run-time delay is also not fixed, but a Store operation has no destination register and therefore does not affect the execution of other types of instructions. Current microprocessors all use a write buffer, which shortens the delay of a Store operation to one clock cycle, so the present invention assumes a delay of one clock cycle.
(c) For all other types of operations, the delay is a fixed value that is determined once the microprocessor pipeline design is determined.
3. For each operation in the VLIW instruction, the instruction dispatch module of the front end uses the operation type obtained by the decode module to find all functional units that can execute the operation, determines the operation's earliest execution time on each of those functional units, and assigns the operation to the functional unit that can execute it earliest; at the same time it updates the earliest available time of the operation's destination register (if any) in the RAT. The earliest execution time of an operation on a functional unit is determined as follows: for each functional unit, the operation's position in that unit's instruction queue is determined from the earliest execution time obtained by decoding; if that position is occupied by another operation, the first free slot after it is used instead. To avoid pipeline stalls caused by waiting for long-delay instructions to complete, the length of the instruction queue is set to the longest delay among all operations.
4. In each clock cycle, each functional unit of the pipeline back end checks whether all source operands required by the operation at the head of its instruction queue are ready; if so, it executes the operation, otherwise it waits.
5. The reorder buffer confirms the execution results of the operations of each instruction in instruction fetch order and writes confirmed results back to registers or memory. If the result of an operation is cancelled, all operations in the instruction queues that are waiting for that result are flushed.
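The delay rules (a) to (c) of step 2 amount to a lookup by operation type. The Python sketch below illustrates this under stated assumptions: the one-cycle Load and Store delays come from the text, while the `OpType` names and the ALU, floating-point, and branch values in `FIXED_LATENCY` are hypothetical placeholders for whatever a concrete pipeline design fixes.

```python
from enum import Enum


class OpType(Enum):
    LOAD = "load"
    STORE = "store"
    ALU = "alu"
    FP = "fp"
    BRANCH = "branch"


# Hypothetical values: rule (c) only says these are fixed once the
# pipeline design is fixed; the numbers here are placeholders.
FIXED_LATENCY = {OpType.ALU: 1, OpType.FP: 4, OpType.BRANCH: 1}


def determined_latency(op_type: OpType) -> int:
    """Return the determined delay, in clock cycles, for an operation type."""
    if op_type is OpType.LOAD:
        return 1  # rule (a): assume the access hits in the data Cache
    if op_type is OpType.STORE:
        return 1  # rule (b): the write buffer shortens stores to one cycle
    return FIXED_LATENCY[op_type]  # rule (c): fixed by the pipeline design


for t in OpType:
    print(t.value, determined_latency(t))
```

Because a Load that misses in the data Cache takes longer than the assumed delay, the head-of-queue readiness check in step 4 is what keeps dependent operations from executing too early.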
As the above process shows, once decoding finishes the present invention can use run-time information to dynamically adjust the execution time of each operation in a VLIW instruction and determine its completion time, thereby achieving dynamic instruction scheduling.
The present invention achieves the following technical effects:
(1) Low hardware implementation complexity and low power consumption. Compared with conventional dynamic instruction scheduling techniques, the present invention uses the RAT to record the earliest available time of each register; for each operation, the instruction dispatch unit determines the operation's earliest execution time from the information in the RAT and places it in the waiting queue of the functional unit that can execute it earliest. This avoids complex dependence detection mechanisms such as an instruction window. In addition, the VLIW compiler guarantees that no dependences exist among the operations in the same VLIW instruction, so several operations can be processed simultaneously without checking for dependences among them. This avoids complex instruction dependence detection hardware; the execution time of an operation is determined entirely by its type, which greatly reduces the difficulty and complexity of the hardware implementation.
(2) Substantially improved VLIW microprocessor performance. With the dynamic instruction scheduling method of the present invention, a VLIW microprocessor can re-determine the execution order of instructions according to the pipeline delays that arise dynamically at run time, which effectively eliminates the adverse effect of these delays and improves the performance of the VLIW microprocessor.
(3) At run time the instruction dispatch unit can re-determine the execution time of each operation according to the actual number and delays of the functional units in the microprocessor and assign the operation to a functional unit, which solves the binary code compatibility problem faced by the VLIW architecture.
(4) Each functional unit has an instruction queue that holds the instructions waiting to execute on that unit, so the normal decoding of other instructions is not blocked. The length of the instruction queue is set to the longest delay among all operations, which avoids pipeline stalls caused by waiting for long-delay instructions to complete.
The present invention makes full use of the parallelism exposed by the VLIW compiler and uses a simple instruction scheduling mechanism to determine the execution time of operations dynamically. It improves VLIW microprocessor performance and effectively solves the binary code compatibility problem faced by the VLIW architecture.
Tests were run with the SPECint95 benchmark suite and the Unix core benchmark suite provided by the System Performance Evaluation Cooperative Consortium. When the present invention is simulated in a VLIW simulator in which each instruction contains 4 operations, an average IPC (Instructions per Cycle, the average number of instructions executed per cycle) of 2.217 is obtained.
Description of drawings:
Fig. 1 is a logic block diagram of the dynamic VLIW microprocessor system of the present invention.
Fig. 2 is a logic diagram of the pipeline core of the dynamic VLIW microprocessor of the present invention.
Fig. 3 is a logic diagram of the dynamic instruction scheduling mechanism of the present invention.
Fig. 4 shows the performance test results of the dynamic instruction scheduling mechanism of the present invention.
Embodiment:
Fig. 1 is a logic block diagram of the dynamic VLIW microprocessor system of the present invention. The whole VLIW microprocessor system comprises four parts: the pipeline core, the Cache system, the memory controller, and the memory. The pipeline core executes instructions and writes execution results back to memory; it is connected to the Cache system by the instruction/data bus and the address bus. The Cache system is the high-speed cache that holds instructions and data and consists of two parts, the instruction Cache and the data Cache. The memory controller provides the interface between the memory and the Cache system: when an instruction or data item needed by the pipeline core is not in the Cache, the memory controller reads it from memory into the Cache. The memory holds instructions and data. The three modules Cache system, memory controller, and memory are connected to one another by an instruction bus, a data bus, and an address bus.
Fig. 2 is a logic diagram of the pipeline core of the dynamic VLIW microprocessor of the present invention. The pipeline core is divided into a front end and a back end (outlined with dashed lines). The front end comprises the instruction fetch module, the instruction decode module, and the instruction dispatch module, which are connected by an internal bus. The instruction fetch module fetches instructions and sends them to the instruction decode module over the internal bus. The decoder of the instruction decode module decodes each instruction and sends the decode results to the instruction dispatch module over the internal bus; the decode results comprise the operation type, the operation delay, and the source and destination registers of each operation. According to the decode results, the instruction dispatch module assigns the operations of the instruction to the waiting queues IQ of the back-end functional units FU1 to FUn. The back end uses multiple independent functional units and can execute several operations simultaneously. Execution results are held temporarily in the reorder buffer ROB and confirmed by it; confirmed results are written back to memory or registers, and the remaining results are cancelled.
Fig. 3 is a schematic diagram of the dynamic instruction scheduling logic in the pipeline core of the VLIW microprocessor of the present invention. The dynamic instruction scheduling logic consists of three modules: the instruction decode module, the instruction dispatch module, and the instruction execution module.
The instruction decode module contains the VLIW decoder and is connected to the instruction fetch module by an internal bus. It decodes instructions, determines the delay of each operation in the VLIW instruction from the operation type obtained by decoding, and determines the earliest execution time of each operation from the register available times recorded in the RAT. So that execution results can be confirmed in order, it also assigns each operation two numbers, I and p: I identifies the instruction and p is the operation's position within the instruction, so I and p together uniquely identify each operation.
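The <I, p> numbering can be sketched in a few lines of Python. This is an illustration only: the names `OpId` and `tag_operations` and the string placeholders for operations are assumptions, not part of the patent.

```python
from typing import List, NamedTuple, Tuple


class OpId(NamedTuple):
    I: int  # sequence number of the VLIW instruction (its ROB entry)
    p: int  # position of the operation inside instruction I


def tag_operations(I: int, bundle: List[str]) -> List[Tuple[OpId, str]]:
    """Attach a unique <I, p> identifier to every operation of one instruction."""
    return [(OpId(I, p), text) for p, text in enumerate(bundle)]


tagged = tag_operations(7, ["add", "load", "nop", "br"])
print(tagged[1])  # (OpId(I=7, p=1), 'load')

# Named tuples compare field by field, so sorting identifiers by (I, p)
# recovers fetch order even if operations finished out of order.
finished = [OpId(8, 0), OpId(7, 3), OpId(7, 1)]
print(sorted(finished))  # [OpId(I=7, p=1), OpId(I=7, p=3), OpId(I=8, p=0)]
```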
The instruction dispatch module comprises the register available table RAT and the function unit selector FUS. From the instruction decode results it determines the earliest execution time of each operation in the VLIW instruction and the earliest available time of the operation's destination register, and it places the operation in the waiting queue of the corresponding functional unit. The RAT has the same structure as the register file; each entry records the earliest available time of one physical register. The FUS is connected to the RAT by an internal bus and writes the earliest available times of registers into the RAT.
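The queue placement performed by the FUS can be sketched as follows. This is a simplified Python model under stated assumptions: the names `FUQueue` and `dispatch` are hypothetical, slot index i is taken to mean "i cycles from now", and the queue length of 4 stands in for the longest operation delay of some design.

```python
from typing import List, Optional, Tuple


class FUQueue:
    """Instruction queue of one functional unit; slot i means 'i cycles from now'."""

    def __init__(self, length: int) -> None:
        self.slots: List[Optional[str]] = [None] * length

    def first_free_slot(self, earliest: int) -> Optional[int]:
        # Start at the slot given by the earliest execution time and,
        # if it is occupied, search onward for the first free slot.
        for i in range(earliest, len(self.slots)):
            if self.slots[i] is None:
                return i
        return None  # no free slot in this queue


def dispatch(op: str, earliest: int, candidates: List[FUQueue]) -> Optional[Tuple[int, int]]:
    """Place op on the candidate unit that can run it soonest; return (unit, slot)."""
    best: Optional[Tuple[int, int]] = None
    for idx, queue in enumerate(candidates):
        slot = queue.first_free_slot(earliest)
        if slot is not None and (best is None or slot < best[1]):
            best = (idx, slot)
    if best is not None:
        candidates[best[0]].slots[best[1]] = op
    return best


alu0, alu1 = FUQueue(4), FUQueue(4)
alu0.slots[1] = "x"                     # slot 1 of unit 0 is already taken
a = dispatch("add", 1, [alu0, alu1])    # unit 1 is free at slot 1
b = dispatch("sub", 1, [alu0, alu1])    # both units now offer slot 2; first wins
c = dispatch("or", 4, [alu0, alu1])     # beyond the queue length: not placed
print(a, b, c)  # (1, 1) (0, 2) None
```

What the front end does when no slot is free (the `None` case) is not spelled out in the text, so the sketch simply reports it.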
The instruction execution module comprises the functional units and the reorder buffer ROB, which are interconnected by an internal data bus. It is responsible for executing operations and writing confirmed execution results back to registers or memory: the functional units execute operations, and the reorder buffer confirms results. The ROB has the same structure as the register file; the information it records includes the operation identifiers I and p and the operation's execution result.
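In-order confirmation by the ROB, together with flushing of operations that wait on a cancelled result, can be sketched as below. The Python model is an assumption-laden illustration: the class `ReorderBuffer`, its method names, and the `waits_on` field of queued operations are hypothetical, and the conditions under which a result is cancelled are left to the caller.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

OpId = Tuple[int, int]  # the <I, p> identifier of an operation


class ReorderBuffer:
    def __init__(self) -> None:
        # Insertion order of the dict is the instruction fetch order.
        self.entries: "OrderedDict[OpId, dict]" = OrderedDict()

    def allocate(self, op_id: OpId) -> None:
        self.entries[op_id] = {"done": False, "cancelled": False, "value": None}

    def complete(self, op_id: OpId, value: Optional[int] = None, cancelled: bool = False) -> None:
        self.entries[op_id].update(done=True, cancelled=cancelled, value=value)

    def commit(self, queues: Dict[str, List[dict]]) -> List[Tuple[OpId, Optional[int]]]:
        """Confirm finished results in fetch order; return those to write back."""
        written_back = []
        while self.entries:
            op_id = next(iter(self.entries))      # oldest entry
            entry = self.entries[op_id]
            if not entry["done"]:
                break                             # older result missing: stop here
            del self.entries[op_id]
            if entry["cancelled"]:
                # Flush every queued operation that waits on this result.
                for queue in queues.values():
                    queue[:] = [w for w in queue if op_id not in w["waits_on"]]
            else:
                written_back.append((op_id, entry["value"]))
        return written_back


rob = ReorderBuffer()
for op_id in [(0, 0), (0, 1), (1, 0)]:
    rob.allocate(op_id)
queues = {"alu": [{"id": (2, 0), "waits_on": {(1, 0)}},
                  {"id": (2, 1), "waits_on": set()}]}

rob.complete((0, 1), value=5)
early = rob.commit(queues)          # (0, 0) is not finished, nothing confirmed
rob.complete((0, 0), value=3)
rob.complete((1, 0), cancelled=True)
late = rob.commit(queues)
print(early, late)  # [] [((0, 0), 3), ((0, 1), 5)]
```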
Fig. 4 shows the performance test results of the dynamic instruction scheduling mechanism of the present invention. It compares the performance of the present invention with that of two other dynamic instruction scheduling mechanisms, DL1 and DL6, which assume a Load operation delay of 1 and 6 respectively. The vertical axis is IPC (Instructions per Cycle, the average number of instructions executed per cycle), and the horizontal axis is the instruction window size used by the other dynamic instruction scheduling mechanisms. Tests were run with the SPECint95 benchmark suite and the Unix core benchmark suite provided by the System Performance Evaluation Cooperative Consortium; the selected benchmarks are li, compress, and m88ksim from SPECint95, and lex.c, wc.c, and grep.c from Unix. When the present invention is simulated in a VLIW simulator in which each instruction contains 4 operations, an average IPC of 2.217 is obtained.

Claims (3)

1. A dynamic VLIW instruction scheduling method based on determined delays, wherein the VLIW (very long instruction word) microprocessor system comprises a pipeline core, a Cache system, a memory controller, and a memory; the pipeline core executes instructions and writes execution results back to memory, and is connected to the Cache system by the instruction/data bus and the address bus; the Cache system is the high-speed cache that holds instructions and data and consists of two parts, the instruction Cache and the data Cache; the memory controller provides the interface between the memory and the Cache system, and when an instruction or data item needed by the pipeline core is not in the Cache, the memory controller reads it from memory into the Cache; the memory holds instructions and data; the three modules Cache system, memory controller, and memory are connected to one another by an instruction bus, a data bus, and an address bus; characterized in that the pipeline core is used to perform dynamic instruction scheduling, and the pipeline core is designed as follows: the pipeline core is divided into a front end and a back end; the front end comprises the instruction fetch module, the instruction decode module, and the instruction dispatch module, and is responsible for fetching instructions from memory, decoding the fetched instructions, and determining the execution time of each operation in an instruction from the decode results; the back end comprises two modules, the functional units FU that execute operations and the reorder buffer ROB, and is responsible for executing operations and confirming their execution results; the instruction fetch module is connected to the instruction Cache by the instruction bus and the address bus and is responsible for fetching instructions from the instruction Cache; the instruction decode module contains the VLIW decoder and is connected to the instruction fetch module by the instruction bus, decodes instructions, determines the delay of each operation from the operation type obtained by decoding, and determines the earliest executable time of each operation from the information recorded in the register available table RAT; the instruction dispatch module comprises the register available table RAT (Register Available Table) and the function unit selector FUS; the RAT records the earliest available time of each register; the FUS assigns each operation to a functional unit according to the decode results, determines the operation's earliest execution time, and updates the earliest available time of the corresponding register in the RAT according to the operation's delay; the functional units of the back end comprise ALU units, memory access units, floating-point units, and branch units, and each functional unit has an instruction queue that holds all operations waiting to execute on that unit; the ROB stores the execution results of operations and confirms them in instruction fetch order, writes confirmed results back to registers or memory, discards the remaining results, and at the same time cancels all operations in the instruction queues that are waiting to use a discarded result; the concrete method by which the pipeline core performs dynamic instruction scheduling is:
1.1 the instruction fetch module fetches a VLIW instruction from the instruction Cache and, for each instruction, allocates an entry in the reorder buffer ROB to hold the instruction's execution results; the sequence number I of this entry is recorded, and I uniquely identifies the instruction;
1.2 the instruction decode module decodes the VLIW instruction, determines the delay of each operation in the instruction from its type, determines the earliest execution time of each operation from the register available times recorded in the RAT, and assigns each operation two numbers, I and p, where I identifies the instruction that contains the operation and p gives the operation's position within instruction I, so that <I, p> uniquely identifies an operation;
1.3 for each operation in the VLIW instruction, the instruction dispatch module uses the operation type obtained by the decode module to find all functional units that can execute the operation, determines the operation's earliest execution time on each of those functional units, and assigns the operation to the functional unit that can execute it earliest, while updating the earliest available time of the operation's destination register in the RAT;
1.4 in each clock cycle, each functional unit of the pipeline back end checks whether all source operands required by the operation at the head of its instruction queue are ready; if so, it executes the operation, otherwise it waits;
1.5 the reorder buffer ROB confirms the execution results of the operations of each instruction in instruction fetch order and writes confirmed results back to registers or memory; if the result of an operation is cancelled, all operations in the instruction queues that are waiting for that result are flushed.
2. The dynamic VLIW instruction scheduling method based on determined delays according to claim 1, characterized in that the method of determining the operation delay from the type of each operation in the instruction is:
2.1 the delay of a Load operation is set to the delay of a data Cache hit, which is one clock cycle;
2.2 the delay of a Store operation is assumed to be one clock cycle;
2.3 for all other types of operations, the delay is a fixed value that is determined once the microprocessor pipeline design is determined.
3. The dynamic VLIW instruction scheduling method based on determined delays according to claim 1 or 2, characterized in that the method of determining the earliest execution time of an operation on each functional unit is: for each functional unit, the operation's position in that unit's instruction queue is determined from the earliest execution time obtained by decoding; if that position is occupied by another operation, the first free slot after it is used instead; and the length of the instruction queue is set to the longest delay among all operations, to avoid pipeline stalls caused by waiting for long-delay instructions to complete.
CN 200310110566 2003-11-26 2003-11-26 Dynamic VLIW command dispatching method according to determination delay Expired - Fee Related CN1266592C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200310110566 CN1266592C (en) 2003-11-26 2003-11-26 Dynamic VLIW command dispatching method according to determination delay

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200310110566 CN1266592C (en) 2003-11-26 2003-11-26 Dynamic VLIW command dispatching method according to determination delay

Publications (2)

Publication Number Publication Date
CN1545026A CN1545026A (en) 2004-11-10
CN1266592C true CN1266592C (en) 2006-07-26

Family

ID=34335664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200310110566 Expired - Fee Related CN1266592C (en) 2003-11-26 2003-11-26 Dynamic VLIW command dispatching method according to determination delay

Country Status (1)

Country Link
CN (1) CN1266592C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702118B (en) * 2009-11-12 2012-08-29 中国人民解放军国防科学技术大学 Pipeline control method for an incompletely lock-step VLIW processor
CN110162339A (en) * 2019-05-20 2019-08-23 江南大学 Method for reducing microprocessor soft error susceptibility by adjusting the issue queue

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100409181C (en) * 2006-05-25 2008-08-06 西北工业大学 Precise exception pipeline scheduling method for floating point processing unit
US7496866B2 (en) * 2006-06-22 2009-02-24 International Business Machines Corporation Method for optimizing of pipeline structure placement
CN101782847B (en) * 2009-01-20 2013-04-24 瑞昱半导体股份有限公司 Data storage method and processor using same
CN101853151B (en) * 2009-05-19 2013-06-26 威盛电子股份有限公司 Device and method adaptive to microprocessor
JP2011090592A (en) * 2009-10-26 2011-05-06 Sony Corp Information processing apparatus and instruction decoder for the same
CN101751244B (en) * 2010-01-04 2013-05-08 清华大学 Microprocessor
CN102567137B (en) * 2010-12-27 2013-09-25 北京国睿中数科技股份有限公司 System and method for restoring contents of RAT (register alias table) by using ROB (reorder buffer) when branch prediction fails
CN102799419B (en) * 2012-09-05 2014-10-22 无锡江南计算技术研究所 Register writing conflict detection method and device, and processor
GB2569276B (en) * 2017-10-20 2020-10-14 Graphcore Ltd Compiler method
CN113805944B (en) * 2021-11-18 2022-02-25 北京微核芯科技有限公司 Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor

Also Published As

Publication number Publication date
CN1545026A (en) 2004-11-10

Similar Documents

Publication Publication Date Title
US8266413B2 (en) Processor architecture for multipass processing of instructions downstream of a stalled instruction
CN1310155C (en) Appts. for memory communication during runhead execution
JP3548132B2 (en) Method and apparatus for flushing pipeline stages in a multithreaded processor
US5699536A (en) Computer processing system employing dynamic instruction formatting
US5918005A (en) Apparatus region-based detection of interference among reordered memory operations in a processor
TW446912B (en) Methods and apparatus for reordering load operations in a computer processing system
CN100361073C (en) Method and apparatus for multi-thread pipelined instruction decoder
JP4553936B2 (en) Techniques for setting command order in an out-of-order DMA command queue
CN1266592C (en) Dynamic VLIW command dispatching method according to determination delay
US20070043934A1 (en) Early misprediction recovery through periodic checkpoints
US20140095848A1 (en) Tracking Operand Liveliness Information in a Computer System and Performing Function Based on the Liveliness Information
KR20180036490A (en) Pipelined processor with multi-issue microcode unit having local branch decoder
CN1103960C (en) Method relating to handling of conditional jumps in multi-stage pipeline arrangement
CN103907090B (en) Method and device for reducing hardware costs for supporting miss lookahead
CN1790256A (en) Branch lookahead prefetch for microprocessors
US7650485B1 (en) Structure and method for achieving very large lookahead instruction window via non-sequential instruction fetch and issue
CN101535951A (en) Methods and apparatus for recognizing a subroutine call
JPH07152559A (en) Superscalar pipeline-type processor with reinforced pipe control and register conversion function
US6381691B1 (en) Method and apparatus for reordering memory operations along multiple execution paths in a processor
CN1227584C (en) Method and apparatus for constructing pre-scheduled instruction cache
WO2021078630A1 (en) Decoupled access-execute processing
US11210098B2 (en) Variable latency instructions
US10990398B2 (en) Mechanism for interrupting and resuming execution on an unprotected pipeline processor
JP2001142699A (en) Transfer mechanism of instruction data in pipeline processor
CN1601462A (en) Extended register space device of processor and method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee