CN106610816A - Avoidance method for conflict between instruction sets in RISC-CPU and avoidance system thereof - Google Patents
- Publication number
- CN106610816A (application number CN201611246947.6A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- conflict
- cpu
- risc
- relevant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30134—Register stacks; shift registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30141—Implementation provisions of register files, e.g. ports
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
- G06F9/3869—Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
The invention discloses a method for avoiding conflicts between instruction sets in a RISC-CPU, and an avoidance system thereof. The method comprises the following steps. Step one: the conflict type is determined according to the data dependence relationships between different instruction sets in the RISC-CPU. Step two: for the current instruction, it is judged whether the instruction accesses the "register file" or "memory"; if so, step three is performed, otherwise analysis continues with the next instruction. Step three: a "coherent window" is defined for the current instruction, and "coherent instructions" are searched for within that window; if one exists, the process enters step four, otherwise there is no read/write conflict. Step four: a conflict resolution method is selected according to the conflict type between the instruction sets, so that the data conflict is resolved with a concrete strategy while pipeline throughput efficiency is maintained.
Description
Technical field
The present invention relates to the field of computer processing technology, and in particular to a method and system for avoiding conflicts between instruction sets in a RISC-CPU.
Background art
In the design of an embedded RISC-CPU, the whole chain from the selection of the CPU hardware instruction set and the design of custom instructions through to the design of the corresponding compiler is extremely important.
The related art is as follows. The invention patent application No. CN200810191060, "Reducing instruction conflicts in a processor", filed by "generation meaning method (Beijing) semiconductor research and development Co., Ltd.", resolves instruction conflicts in the instruction issue stage: two instructions are selected for dispatch to the subsequent parallel functional units, they are checked for conflicts, and one of them is chosen by arbitration. That method selects a conflict-free instruction from a multi-issue instruction window.
The invention patent application No. CN200710087737 of Intel Corporation, "Technique to perform memory disambiguation", gives a solution for conflict avoidance between adjacent memory access instructions, but it does not address the definition of a coherence window within or between the remaining related instruction sets, nor conflict avoidance between relevant instructions.
The invention patent application No. CN201310054280 of Xidian University, "An assembler design method based on a very long instruction word ASIP", completes instruction scheduling through the design of the assembler, and uses methods such as register renaming to remove the write-after-write (WAW) and write-after-read (WAR) conflicts caused by out-of-order execution of instructions.
In addition to leaving the coherence window and the conflict avoidance between relevant instructions unaddressed, the Intel application No. CN200710087737 noted above targets a different number of pipeline stages from the embedded RISC-CPU of this patent, and it does not mention the CPU hardware engineering design strategy proposed here, which proceeds from a prototype CPU based on an instruction subset to the full instruction set.
It can be seen that the prior art neither provides a method for resolving RAW dependence conflicts from the perspective of the instruction system as a whole, nor proposes a CPU hardware design engineering strategy that proceeds from a CPU prototype with a locally minimal instruction set to the full instruction set.
Summary of the invention
To remedy the deficiencies of the prior art, the invention discloses a method and system for avoiding conflicts between instruction sets in a RISC-CPU. It provides a mechanism for resolving conflicts between instruction sets, and an intelligent-control hardware device for resolving read/write conflicts in a RISC-CPU.
To achieve the above object, the concrete scheme of the invention is as follows:
A method for avoiding conflicts between instruction sets in a RISC-CPU, comprising the following steps:
Step one: the conflict type is determined according to the data dependence relationships between different instruction sets in the RISC-CPU;
Step two: for the current instruction, it is judged whether the instruction needs to access the "register file" or "memory"; if so, step three is performed, otherwise analysis continues with the next instruction;
Step three: a "coherence window" is defined for the current instruction, and "relevant instructions" are searched for inside the "coherence window"; if a "relevant instruction" exists, step four is entered, otherwise it is judged that there is no read/write conflict;
Step four: a conflict resolution method is selected according to the conflict type between the instruction sets, and the data conflict is resolved with a specific strategy while pipeline throughput efficiency is maintained.
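The four steps above can be sketched in Python as a scan over an in-order instruction stream. This is only an illustration under stated assumptions: the `Instr` record, the `CLASS_OF` mapping, the fixed window of four instructions and the conflict-type labels are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Instr:
    op: str                  # mnemonic, e.g. "ADD"
    dest: Optional[str]      # register or memory location written, if any
    srcs: tuple = ()         # registers or memory locations read

# Hypothetical mapping from mnemonic to instruction class (step one).
CLASS_OF = {"ADD": "data", "XOR": "data", "LDR": "mem", "STR": "mem", "B": "branch"}

def find_raw_conflicts(stream, window=4):
    """Return (writer_index, reader_index, conflict_type) for each RAW pair."""
    found = []
    for i, cur in enumerate(stream):
        # Step two: an instruction that writes nothing cannot create a RAW conflict.
        if cur.dest is None:
            continue
        # Step three: search for relevant instructions inside the coherence window.
        for j in range(i + 1, min(i + 1 + window, len(stream))):
            later = stream[j]
            if cur.dest in later.srcs:
                # Step four: the conflict type is the key used to pick a strategy.
                found.append((i, j, f"RAW:{CLASS_OF[cur.op]}->{CLASS_OF[later.op]}"))
            if later.dest == cur.dest:
                break  # location overwritten; later readers depend on the new writer
    return found

program = [
    Instr("ADD", "r1", ("r2", "r3")),
    Instr("XOR", "r4", ("r1", "r5")),
    Instr("STR", "mem[r6]", ("r1", "r6")),
    Instr("B", None),
]
conflicts = find_raw_conflicts(program)
```

Here `conflicts` pairs the ADD with both the XOR and the STR that read `r1`, which mirrors the ADD example the description gives for Fig. 2.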
Further, in step one, the architecture, instruction set and pipeline stages of the RISC-CPU are statistically analyzed and classified, and the full instruction set is divided into a data processing instruction set, a memory access instruction set, a branch instruction set and an immediate instruction set.
Further, in step one, after the four instruction sets are obtained, the conflicts between them are statistically analyzed: the read/write conflict types between the instruction sets are traversed and classified, which yields the conflicts introduced by data dependences between instructions. First, a later "dependent instruction" reads stale register data or overwrites newly written data that has not yet been read, so that the data dependences of the original instruction order are broken and a functional error results. Second, the original instruction pipeline is interrupted, which includes the drop in instruction throughput caused by freezing the pipeline or inserting NOP instructions into it.
Further, the conflict types are classified as conflicts within the data processing, memory access and immediate instruction sets, and conflicts between them.
Further, in step three, the "coherence window" is defined as follows. For superscalar instruction processing, out-of-order execution of instructions need not be considered. If the current instruction operates on a specific register in the register file or on some location in memory, for example by writing a register, its window is the sequence of instructions, counted forward from the current instruction along the following instruction stream, that is involved from the time the current instruction is fetched until its result is committed.
Further, in step three, a "relevant instruction" is defined as follows. Based on the "coherence window" defined above, and without considering out-of-order execution, if the "current instruction" needs to write a register or an offset address in memory, then the instructions after the current instruction are searched within the range of the "coherence window" for one or more instructions that also need to write or read the location used by the "current instruction". These instructions have a data dependence on the current instruction and may give rise to data read/write conflicts, which interrupt the pipeline and reduce instruction throughput efficiency.
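The two definitions above can be illustrated with a small Python sketch. It assumes an in-order pipeline that fetches `issue_width` instructions per cycle and commits in stage `commit_stage`, with the current instruction first in its fetch group; those parameters, the `Instr` tuple and both function names are hypothetical and serve only to make the "fetch until commit" window concrete.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str
    writes: Optional[str]   # location written, or None
    reads: tuple            # locations read

def coherence_window(stream, index, commit_stage=5, issue_width=1):
    """Instructions fetched after stream[index] and before its result commits."""
    # Assumption: the current instruction is first in its fetch group, so
    # commit_stage * issue_width - 1 younger instructions are fetched by then.
    in_flight = commit_stage * issue_width - 1
    return stream[index + 1 : index + 1 + in_flight]

def relevant_instructions(stream, index, **pipeline):
    """Instructions in the window that read or write the current instruction's target."""
    target = stream[index].writes
    if target is None:
        return []
    return [ins for ins in coherence_window(stream, index, **pipeline)
            if target in ins.reads or target == ins.writes]

program = [
    Instr("ADD", "r1", ("r2", "r3")),
    Instr("XOR", "r4", ("r1", "r5")),
    Instr("STR", "mem", ("r1", "r6")),
    Instr("LDR", "r7", ("r8",)),
    Instr("SUB", "r9", ("r7", "r2")),
    Instr("AND", "r10", ("r1", "r2")),   # reads r1 but lies outside the window
]
relevant = [ins.op for ins in relevant_instructions(program, 0)]
```

With the default five-stage, single-issue assumption the window of the ADD holds the next four instructions, so `relevant` contains XOR and STR but not the later AND.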
Further, the methods of conflict resolution inside the "coherence window" include: (1) the logical operation result available before the commit stage of the current instruction is fed directly into the arithmetic unit of the following relevant instruction, instead of waiting until the current instruction has committed and only then feeding the final result into the arithmetic unit of the subsequent instruction; (2) a delayed version is made of the operation result of the relevant instruction that follows the current instruction, and the output of the delayed version is used as a backup register resource for resolving read/write conflicts.
Further, in step four, the conflict resolution methods specifically include:
(1) early forwarding of intermediate operation data before the final operation result is committed: the intermediate data does not wait for the final commit stage but is fed directly into the input of the next instruction;
(2) forwarding of a delayed version of the data before the final operation result is committed;
(3) keeping a pipeline-delayed version of the intermediate calculation result alongside the original version: an extra delayed-version register, delayed by one beat or several beats, is added in the pipeline stage of the next relevant instruction, while the register version in the original normal pipeline stage is retained;
(4) not freezing the pipeline;
(5) not inserting NOP instructions.
Further, specifically:
RAW conflict resolution for ADD and XOR instructions: the operation result of the ADD instruction is fed forward early to the XOR instruction as its input, without waiting for the ADD instruction to commit before the XOR executes;
RAW conflict resolution for STORE and LOAD instructions: the result of the STORE serves as the input to the LOAD address calculation, by way of a delay register added ahead of the LOAD instruction;
RAW conflict resolution for ADD and STORE instructions: the operation result of the ADD instruction is fed directly into the next stage as the offset address used to calculate the address of the STORE instruction;
RAW conflict resolution for LOAD and ADD instructions: the result of the LOAD instruction is used directly as the input of the ADD, and a version of the ADD intermediate result delayed by one beat is used as the offset address for the DMEM access.
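The ADD/XOR case can be made concrete with a small functional model in Python. It is a sketch, not the hardware of the patent: the register names, the dictionary register file and the `execute_pair` helper are hypothetical, and the model only contrasts a forwarded operand with a stale one read before the ADD commits.

```python
def execute_pair(regs, forward):
    """Run ADD r1, r2, r3 immediately followed by XOR r4, r1, r5."""
    regs = dict(regs)                              # leave the caller's state untouched
    add_result = regs["r2"] + regs["r3"]           # ADD in its execute stage
    # The XOR executes before the ADD has committed r1 to the register file.
    xor_operand = add_result if forward else regs["r1"]   # forwarded vs. stale value
    xor_result = xor_operand ^ regs["r5"]
    regs["r1"] = add_result                        # ADD commits
    regs["r4"] = xor_result                        # XOR commits
    return regs

initial = {"r1": 0, "r2": 5, "r3": 6, "r4": 0, "r5": 3}
with_forwarding = execute_pair(initial, forward=True)
without_forwarding = execute_pair(initial, forward=False)
```

With forwarding, `r4` is computed from the new sum in `r1`; without it, the XOR uses the old value of `r1`, which is the functional error that a RAW conflict produces when nothing stalls or forwards.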
Further, in step one, after all read/write conflicts that may exist have been classified, they are stored in a local archive table: the conflict retrieval table, which contains an index and the conflict type. Using superscalar techniques, multiple instructions are fetched from the instruction memory and stored in an instruction buffer window, where the definition of the "coherence window" and the search for "relevant instructions" are carried out in advance. With the "coherence window" and "relevant instructions" defined, the instruction stream is searched for read/write conflicts; if a dependence conflict exists between instructions, the conflict types between instruction sets stored locally in advance are looked up and the conflict is quickly avoided ahead of time.
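A conflict retrieval table of this kind could be sketched in Python as a dictionary keyed by the producer and consumer instructions. The index values, the key layout and the strategy strings are hypothetical; only the four instruction pairs and their resolutions are taken from the description.

```python
# (producer, consumer) -> (index, conflict type, resolution strategy)
CONFLICT_TABLE = {
    ("ADD", "XOR"): (0, "RAW", "forward the ADD result early to the XOR input"),
    ("STORE", "LOAD"): (1, "RAW", "delay register ahead of the LOAD"),
    ("ADD", "STORE"): (2, "RAW", "forward the ADD result as the STORE offset address"),
    ("LOAD", "ADD"): (3, "RAW", "LOAD result to ADD input; ADD result delayed one beat for DMEM"),
}

def lookup_conflict(producer, consumer):
    """Return the stored entry for a pair of instructions, or None if no conflict is recorded."""
    return CONFLICT_TABLE.get((producer, consumer))

entry = lookup_conflict("ADD", "XOR")
missing = lookup_conflict("XOR", "JMP")
```

Because the table is built in advance, the search at run time reduces to a single lookup per pair of relevant instructions found in the buffer window.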
A system for avoiding conflicts between instruction sets in a RISC-CPU comprises a conflict resolution control unit applied in the RISC-CPU. The conflict resolution control unit is used to realize the following: after all read/write conflicts that may exist have been classified, they are stored in a local archive table, the conflict retrieval table, which contains an index and the conflict type; using superscalar techniques, multiple instructions are fetched from the instruction memory and stored in an instruction buffer window, where the definition of the "coherence window" and the search for "relevant instructions" are carried out in advance; with the "coherence window" and "relevant instructions" defined, the instruction stream is searched for read/write conflicts, and if a dependence conflict exists between instructions, the conflict types between instruction sets stored locally in advance are looked up and the conflict is quickly avoided ahead of time.
Beneficial effects of the present invention:
For the instruction set of the current RISC-CPU, the present invention statistically analyzes and classifies, in advance, the conflict types that exist between instructions. The results of the search for "relevant instructions" then control the "conflict search and resolution" unit, so that the pipeline runs smoothly inside the coherence window.
The invention provides an effective strategy for resolving conflicts between instruction sets in RISC-CPU design. It defines the "coherence window" and the "relevant instruction" of an instruction in order to resolve the read/write, write/read and write/write conflicts among the "relevant instructions" inside the "coherence window". Taking the resolution of RAW conflicts as an example, this document analyzes the conflicts between several major instruction sets. The engineering method quantifies and defines the dependence of instruction conflicts and resolves them; by adding a hardware unit for locating and resolving read/write conflicts, it effectively avoids the instruction conflicts that exist between different instruction sets of the RISC-CPU while improving pipeline throughput efficiency.
Description of the drawings
Fig. 1 shows the classification of the instruction sets of the RISC-CPU;
Fig. 2 illustrates the "coherence window" and "relevant instructions";
Fig. 3 shows the background and workflow of the present invention;
Fig. 4 shows the added hardware unit for locating and resolving conflicts;
Fig. 5 shows RAW conflict resolution for ADD and XOR instructions;
Fig. 6 shows RAW conflict resolution for STORE and LOAD instructions;
Fig. 7 shows RAW conflict resolution for ADD and STORE instructions;
Fig. 8 shows RAW conflict resolution for LOAD and ADD instructions.
Specific embodiment:
The present invention is described in detail below with reference to the accompanying drawings:
As shown in Fig. 1, the present invention proposes an effective method for resolving CPU instruction conflicts. In the design, all instructions are divided into four major classes (data processing instructions, branch instructions, immediate instructions and memory access instructions). A representative instruction subset is extracted from each class, and the RAW (read-after-write) direct data dependence conflicts within each subset and between subsets are resolved. All potential RAW conflicts are identified in advance by statistical analysis and are then located and resolved, so that the pipeline runs smoothly.
The RISC-CPU design proposed by the present invention divides the full instruction set into four instruction classes: data processing (Data-Processing), memory access (STORE/LOAD), branch (Branch) and immediate (Immediate). Conflict resolution is explained using RAW (read-after-write) conflicts as the example, both inside each instruction class (except the branch class) and between classes. In total there are several kinds of RAW conflict (within the data processing, memory access and immediate instruction sets, and between them) for which the dependence conflicts caused by data dependences must be resolved.
Definition of the "coherence window": for superscalar instruction processing, out-of-order execution of instructions need not be considered. If the current instruction operates on a specific register in the register file or on some location in memory, for example by writing a register, its window is the sequence of instructions, counted forward from the current instruction along the following instruction stream, that is involved from the time the current instruction is fetched until its result is committed.
Definition of a "relevant instruction": based on the "coherence window" defined above, and without considering out-of-order execution, if the "current instruction" needs to write a register or an offset address in memory, then the instructions after the current instruction are searched within the range of the "coherence window" for one or more instructions that also need to write or read the location used by the "current instruction". These instructions have a data dependence on the current instruction and may give rise to data read/write conflicts, which interrupt the pipeline and reduce instruction throughput efficiency (CPI).
For each case, the "coherence window" and the "relevant instructions" are given. Fig. 2 shows examples of coherence windows, such as those of the ADD instruction (heavy solid line) and the XOR instruction (fine dashed line). Fig. 2 also shows examples of relevant instructions: the relevant instructions of the ADD instruction are the XOR and STORE instructions. Within each instruction set of the same kind, for example the data processing instruction subset with the ADD and XOR instructions shown in Tables 1 and 2, RAW conflicts are resolved as shown in Fig. 3. The memory access instructions (LOAD/STORE) are shown in Tables 3, 4, 5 and 6:
Table 1: Instruction format of the data processing (Data-Processing instruction) instruction set
Table 2: Representatives of the data processing instruction set (ADD addition instruction, XOR exclusive-OR instruction)
Table 3: Memory-to-register (LOAD) instruction format
Table 4: Representative of the LOAD instruction class subset
LDR | RD←[RS] | LDR RD,[RS] |
Table 5: Register-to-memory (STORE) instruction format
Table 6: Representative of the STORE instruction class subset
STR | RD→[RS] | STR RD,[RS] |
RAW conflicts are then resolved as shown in Fig. 5.
Between instruction sets of different kinds, RAW conflicts are resolved as shown in Figs. 6, 7 and 8.
For the remaining instruction sets (immediate instructions and branch instructions), the RAW conflicts between them are not analyzed further.
The methods of conflict resolution inside the "coherence window" include: (1) the logical operation result available before the commit stage of the current instruction is fed directly into the arithmetic unit of the following relevant instruction, instead of waiting until the current instruction has committed and only then feeding the final result into the arithmetic unit of the subsequent instruction; (2) a delayed version is made of the operation result of the relevant instruction that follows the current instruction, and the output of the delayed version is used as a backup register resource for resolving read/write conflicts.
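The delayed-version register of method (2) can be modeled in Python as a normal pipeline register plus a short chain of older copies. This is a behavioral sketch only: the class name, the `clock` and `delayed` methods and the reset value are hypothetical, and the model says nothing about how the patent's hardware implements the delay.

```python
from collections import deque

class DelayedVersionRegister:
    """Normal pipeline register plus copies delayed by 1..depth beats."""

    def __init__(self, depth=1, reset=0):
        self.current = reset                              # original, undelayed version
        self._history = deque([reset] * depth, maxlen=depth)

    def clock(self, value):
        """Advance one beat: keep the old value as a delayed copy, latch the new one."""
        self._history.appendleft(self.current)            # oldest copy falls off the end
        self.current = value

    def delayed(self, beats=1):
        """Value that the normal register held `beats` beats ago."""
        return self._history[beats - 1]

reg = DelayedVersionRegister(depth=2)
for result in (10, 20, 30):
    reg.clock(result)
```

After the three beats, `reg.current` still provides the normal pipeline value while `reg.delayed(1)` and `reg.delayed(2)` provide the backup versions, so the original data path is left intact.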
This design analyzes only the RAW conflicts between data processing and memory access instructions. Conflicts between the other instruction sets, and the avoidance of the remaining WAR/WAW conflicts, are not illustrated with specific examples.
The instruction conflict resolution of the present invention is explained with reference to Figs. 3 and 4. In RISC-CPU hardware design, read/write data dependences may exist between adjacent instructions: accesses to the same register in the register file, or to the same offset address in memory, conflict in their read/write order (RAW/WAR/WAW), which stalls the pipeline and reduces efficiency. In RISC-CPU design, the conflicts that exist between instruction sets interrupt the pipeline and reduce instruction throughput efficiency. The classic MIPS pipeline has five stages: stage 1, instruction fetch (fetch); stage 2, instruction decode (decode); stage 3, execute (execute); stage 4, memory access (memory access, to the internal memory, covering the two major classes load and store); and stage 5, write-back (Write Back, in which the operation result is written back to the register file, or register bank, of the RISC-CPU). In the classic MIPS architecture, the longest instructions occupy all five pipeline stages (for example a memory access instruction such as STORE, followed by write-back to the register file). An ordinary data operation instruction that does not access memory occupies four stages (for example the ADD instruction, whose final sum is written to a register in the register file with no memory access stage in between). Some instructions complete in the decode stage (for example the jump instruction JMP). Conditional branch instructions change the order of the instruction stream and so cause non-sequential execution. It can be seen that, in RISC-CPU design, because different instruction sets occupy different numbers of pipeline stages and the ordering of instructions is highly variable, reads and writes to the same register in the register file or to the same memory offset address are very likely to produce data dependence conflicts.
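The stage timing described above can be tabulated with a short Python sketch. It assumes an ideal in-order pipeline that fetches one instruction per cycle and, for simplicity, sends every instruction through all five stages; the function names and the rule that a register written in WB can be read in ID of the same cycle are assumptions made for illustration.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def timeline(mnemonics):
    """Return [(mnemonic, {stage: cycle})] for one instruction fetched per cycle."""
    return [
        (name, {stage: start + k for k, stage in enumerate(STAGES)})
        for start, name in enumerate(mnemonics, start=1)
    ]

def raw_hazard(schedule, producer, consumer):
    """True if the consumer reads registers (ID) before the producer writes back (WB)."""
    return schedule[consumer][1]["ID"] < schedule[producer][1]["WB"]

schedule = timeline(["ADD", "XOR", "STR", "LDR", "SUB"])
xor_too_early = raw_hazard(schedule, 0, 1)   # XOR directly behind the ADD
ldr_too_early = raw_hazard(schedule, 0, 3)   # LDR three instructions behind the ADD
```

Under these assumptions an instruction one or two slots behind its producer decodes before the producer's write-back, which is the RAW situation the patent sets out to resolve without stalling.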
By analyzing the conflicts that register read/write data dependences can cause between different instruction sets in the classic MIPS architecture, the present invention defines a "coherence window" for the current instruction and finds the "relevant instructions" of the current instruction within that window. Using the data dependences between different instruction sets in the RISC-CPU, analyzed in advance, it determines the conflict type and avoids the loss of pipeline efficiency. The conflicts are: (1) a later "dependent instruction" reads stale register data or overwrites newly written data that has not yet been read, so that the data dependences of the original instruction order are broken and a functional error results; (2) the original instruction pipeline is interrupted (including the drop in instruction throughput caused by freezing the pipeline or inserting NOP instructions). Simply freezing the pipeline or delaying it as a whole weakens the advantage of pipelining and reduces instruction throughput efficiency.
In the examples analyzed in the present invention, the following strategies are used to resolve data conflicts while maintaining pipeline throughput efficiency: (1) intermediate operation data is fed forward early, before the final operation result is committed; (2) a delayed version of the data is fed forward before the final operation result is committed; (3) a pipeline-delayed version of the intermediate calculation result is kept alongside the original version; (4) the pipeline is not frozen; (5) NOP instructions are not inserted. For example, in method (1) the intermediate operation data is fed forward early (without waiting for the final commit stage) directly into the input of the next instruction. In method (3), an extra delayed-version register (delayed by one beat or several beats) is added in the pipeline stage of the next relevant instruction, while the register version in the original normal pipeline stage is retained.
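The benefit of forwarding over NOP insertion or freezing can be estimated with a few lines of Python. The sketch assumes the classic five-stage timing with an ALU producer, operands read in ID without forwarding and needed in EX with forwarding, and a register file that can be written and read in the same cycle; the `bubbles` function and these timing rules are illustrative assumptions, not figures from the patent.

```python
IF, ID, EX, MEM, WB = range(1, 6)   # cycle in which an instruction fetched in cycle 1 is in each stage

def bubbles(distance, forwarding):
    """Idle cycles a consumer `distance` instructions behind an ALU producer must wait."""
    if forwarding:
        # Operand is needed in the consumer's EX and is available one cycle after the producer's EX.
        needed, ready = EX + distance, EX + 1
    else:
        # Operand is read in the consumer's ID and is available in the producer's WB cycle.
        needed, ready = ID + distance, WB
    return max(0, ready - needed)

stalls_nop = [bubbles(d, forwarding=False) for d in (1, 2, 3)]
stalls_fwd = [bubbles(d, forwarding=True) for d in (1, 2, 3)]
```

In this model a directly adjacent consumer costs two idle cycles when it has to wait for write-back and none when the intermediate result is forwarded, which is the throughput argument behind strategies (4) and (5).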
For the instruction set of the current RISC-CPU, the conflict types that exist between instructions are statistically analyzed and classified in advance; the results of the search for "relevant instructions" then control the "conflict search and resolution" unit, so that the pipeline runs smoothly inside the coherence window. Figs. 4 and 6 use early forwarding of intermediate calculation results; Figs. 5 and 7 use the delayed-version method for calculation results. The existing method is the NOP mechanism, which pushes the pipeline back; this patent does not use it, nor does it use the method of freezing the pipeline. The delayed version in this patent retains the original pipeline registers and, for the relevant-instruction conflict classification results known in advance, keeps backup delayed versions. This resolves local instruction conflicts inside the "coherence window" without disturbing the data transfer between pipeline stages of the original normal version. This patent makes full use of data forwarding, delayed versions and other methods to keep the pipeline smooth inside the coherence window. It manages instruction conflict resolution systematically, from statistical classification through location to resolution. The conflict resolution framework of this patent can also absorb new conflict resolution techniques, making the overall instruction conflict resolution system more complete.
Explanation of Fig. 1: Fig. 1 shows conflict resolution among the instruction sets (data processing, immediate, branch and memory access).
Explanation of Fig. 2: Fig. 2 gives the "coherence windows" of two instructions, the current ADD instruction and the current XOR instruction. For the ADD instruction it also gives the "relevant instructions" with which a data read/write conflict exists: XOR and STORE are the relevant instructions of the ADD instruction.
Explanation of Fig. 3: Fig. 3 gives the overall flow of the method of the present invention for resolving instruction conflicts among "relevant instructions" based on the "coherence window". The method can be added to a RISC-CPU hard core as a module for the intelligent location and resolution of instruction read/write conflicts. Of course, once the architecture, number of pipeline stages, kinds of instruction set, instruction formats and so on of the RISC-CPU being designed have been determined, the RISC-CPU hard core has to be developed again accordingly. Traditional methods do not treat conflict resolution from the perspective of the system, with quantified location and resolution.
The patent first performs an advance statistical analysis of the read/write conflicts between
instructions of different kinds or of the same kind, and classifies the conflicts strictly. Once
all read/write conflicts that may exist have been classified, they are stored in a local archive
table: the conflict retrieval table (index, conflict type). Using superscalar techniques, multiple
instructions are fetched from the instruction memory and stored in the instruction buffer window,
where the "coherence window" is defined and the "relevant instructions" are searched for in
advance. With the "coherence window" and "relevant instruction" strictly defined, the instruction
stream is searched for read/write conflicts; if a dependence conflict between instructions exists,
the conflict type stored locally in advance for the instruction sets is looked up and the conflict
is avoided quickly and in advance (by adding a conflict resolution control unit). Tables 1, 2 and 3
of the invention list several instructions: Table 1 gives the data format and instruction
mnemonics of the data processing instructions, and Tables 2 and 3 give the data formats and
instruction mnemonics of the LOAD/STORE memory access instructions.
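The conflict retrieval table described above can be pictured with a minimal sketch. The names used here (`InstrClass`, `Strategy`, `CONFLICT_TABLE`, `lookup_conflict`) are illustrative assumptions and do not appear in the patent; the four entries only mirror the four RAW cases explained for Fig. 5 to Fig. 8, and a real table would be derived from the statistical analysis of the actual instruction set.

```python
from enum import Enum


class InstrClass(Enum):
    DATA_PROC = "data processing"
    MEM_ACCESS = "memory access"
    BRANCH = "branch"
    IMMEDIATE = "immediate"


class Strategy(Enum):
    FORWARD_EARLY = "feed intermediate result forward before commit"
    DELAYED_COPY = "use a delayed copy held in an extra register"


# Hypothetical conflict retrieval table.
# Index: (class of the earlier instruction, class of the later relevant instruction).
# Value: (conflict type, strategy chosen for it).
CONFLICT_TABLE = {
    (InstrClass.DATA_PROC, InstrClass.DATA_PROC): ("RAW", Strategy.FORWARD_EARLY),   # ADD -> XOR
    (InstrClass.MEM_ACCESS, InstrClass.MEM_ACCESS): ("RAW", Strategy.DELAYED_COPY),  # STORE -> LOAD
    (InstrClass.DATA_PROC, InstrClass.MEM_ACCESS): ("RAW", Strategy.FORWARD_EARLY),  # ADD -> STORE
    (InstrClass.MEM_ACCESS, InstrClass.DATA_PROC): ("RAW", Strategy.DELAYED_COPY),   # LOAD -> ADD
}


def lookup_conflict(earlier: InstrClass, later: InstrClass):
    """Return (conflict type, strategy) for the pair, or None if no entry is stored."""
    return CONFLICT_TABLE.get((earlier, later))
```

Because the table is filled in before execution, resolving a detected conflict reduces to a single lookup keyed on the two instruction classes.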
Fig. 5 and Fig. 7 show results being fed forward in advance, before the result of the preceding
instruction is committed; Fig. 6 and Fig. 8 show delayed-version output of the intermediate
calculation results of the subsequent relevant instruction. Both approaches are intended to keep
the pipeline from being interrupted. Outputting the result of a relevant instruction with
different numbers of delay beats requires some extra registers; these registers hold delayed
versions of the calculation result and do not destroy the results of the original pipeline.
Other conflict resolution techniques can be incorporated into the technical framework of the
present invention.
If the current instruction does not access the register file or the memory, the next instruction
is analyzed and the "coherence window" slides down; if there is no "relevant instruction" in the
"coherence window", there is no read/write conflict and the analysis ends.
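The search for relevant instructions inside a sliding window can be sketched as follows. This is a simplified model under stated assumptions: only register operands are tracked (memory locations are omitted), the window is a fixed number of following instructions rather than the fetch-to-commit span, and the names `Instr`, `find_relevant` and `scan` as well as the register names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str] = None  # register written, if any
    srcs: tuple = ()            # registers read


def find_relevant(stream, i, window):
    """Return the later instructions in the window that read or write
    the register written by stream[i]."""
    cur = stream[i]
    if cur.dest is None:
        # Simplification: an instruction that writes no register has
        # nothing for later instructions to conflict with here.
        return []
    hits = []
    for later in stream[i + 1 : i + 1 + window]:
        if cur.dest in later.srcs or cur.dest == later.dest:
            hits.append(later)
    return hits


def scan(stream, window):
    """Slide the window over the stream; map instruction index to the
    mnemonics of its relevant instructions (only when any exist)."""
    report = {}
    for i in range(len(stream)):
        hits = find_relevant(stream, i, window)
        if hits:
            report[i] = [h.op for h in hits]
    return report


# Example modelled on the Fig. 2 description: XOR and STORE depend on ADD.
stream = [
    Instr("ADD", "r1", ("r2", "r3")),
    Instr("XOR", "r4", ("r1", "r5")),
    Instr("STORE", None, ("r1", "r6")),
    Instr("NOP"),
]
print(scan(stream, window=3))  # {0: ['XOR', 'STORE']}
```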
Explanation to Fig. 5: the operation result of the ADD instruction is fed forward to XOR as an
input in advance, without waiting for the ADD instruction to be committed, which keeps the
pipeline smooth.
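The effect of early feedforward in the ADD/XOR case can be illustrated with a small behavioural sketch. It is not a hardware description; the register names, the operand values and the function names `alu` and `run_add_then_xor` are assumptions chosen only for illustration.

```python
def alu(op: str, a: int, b: int) -> int:
    """Minimal ALU supporting the two operations of the example."""
    if op == "ADD":
        return a + b
    if op == "XOR":
        return a ^ b
    raise ValueError(f"unsupported op: {op}")


def run_add_then_xor(forward: bool) -> int:
    # Register file contents before either instruction commits (arbitrary values).
    regs = {"r1": 0, "r2": 5, "r3": 3, "r5": 6}

    # ADD r1, r2, r3 has executed, but its result still sits in a pipeline
    # register and has not been committed to the register file.
    add_result = alu("ADD", regs["r2"], regs["r3"])

    # XOR r4, r1, r5 executes one beat later. With feedforward its r1 operand
    # comes from the pipeline register; without it, from the stale register file.
    r1_operand = add_result if forward else regs["r1"]
    return alu("XOR", r1_operand, regs["r5"])


print(run_add_then_xor(forward=True))   # 14, uses the fresh ADD result (8 ^ 6)
print(run_add_then_xor(forward=False))  # 6, reads the stale r1 (0 ^ 6)
```

Without feedforward, the only ways to get the correct value would be to wait for the commit, which is exactly the freeze or NOP insertion that the description sets out to avoid.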
Explanation to Fig. 6: the result of STORE is used as the input to the address calculation of
LOAD; a delay register is added before the LOAD instruction, which keeps the pipeline smooth.
Explanation to Fig. 7: the operation result of the ADD instruction is fed directly into the next
stage as the offset address used to calculate the access address of the STORE instruction, which
keeps the pipeline smooth.
Explanation to Fig. 8: the result of the LOAD instruction is sent directly to the following
dependent instruction: the LOAD result is used directly as the input of ADD, and a version of the
ADD intermediate result delayed by one beat is used as the offset address for the DMEM access,
which keeps the pipeline smooth.
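The "delayed version" idea, an extra register that holds the previous beat's value while the original pipeline register keeps updating normally, can be modelled in a few lines. The class name `StageWithDelayedCopy` and its attributes are hypothetical, and the sketch assumes a delay of exactly one beat.

```python
class StageWithDelayedCopy:
    """One pipeline stage register plus an extra one-beat delayed copy."""

    def __init__(self):
        self.current = None  # original pipeline register, updated every beat
        self.delayed = None  # extra register: the value from one beat earlier

    def clock(self, new_value):
        # On each beat the old value moves into the delayed copy before the
        # original register is overwritten, so neither path disturbs the other.
        self.delayed = self.current
        self.current = new_value


stage = StageWithDelayedCopy()
stage.clock(10)  # beat 1: first intermediate result arrives
stage.clock(20)  # beat 2: next result overwrites the original register
print(stage.current, stage.delayed)  # 20 10
```

A later relevant instruction can read `delayed` while the normal pipeline continues to consume `current`, which is why the original data flow is left intact.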
The present invention uses the "coherence window" and "relevant instructions" to find and avoid
instruction conflicts. It is an effective and simple strategy for resolving conflicts between the
various instruction sets, and it is beneficial for resolving the common conflicts between
RISC-CPU hardware instructions.
The intelligent conflict finding and resolving unit is added to the RISC-CPU design and can be
applied to superscalar CPU designs. The invention keeps the pipeline smooth through a combination
of techniques: feeding data forward in advance, not freezing the pipeline, not inserting NOP
instructions, and using delayed feedforward or multi-beat delayed versions of intermediate
calculation results.
Although the specific embodiments of the present invention have been described above with
reference to the accompanying drawings, they do not limit the scope of protection of the
invention. Those of ordinary skill in the art should understand that modifications or variations
made on the basis of the technical scheme of the invention without creative work still fall
within the scope of protection of the invention.
Claims (10)
1. A method for avoiding conflicts between instruction sets in a RISC-CPU, characterized by
comprising the following steps:
Step one: determine the conflict types according to the data dependence relations between the
different instruction sets in the RISC-CPU;
Step two: for the current instruction, judge whether it needs to access the "register file" or
the "memory"; if so, proceed to step three; otherwise, continue by analyzing the next instruction;
Step three: for the current instruction, define a "coherence window", search for "relevant
instructions" within the "coherence window", and judge whether a "relevant instruction" exists;
if so, proceed to step four; otherwise, conclude that there is no read/write conflict;
Step four: according to the conflict type between instruction sets, select a conflict resolution
method, and resolve the data conflict with the specific strategy while maintaining pipeline
throughput efficiency.
2. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1,
characterized in that, in step one, statistical analysis and classification are performed on the
architecture, instruction set and pipeline stages of the RISC-CPU, and all instructions are
divided into a data processing instruction set, a memory access instruction set, a branch
instruction set and an immediate instruction set.
3. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 2,
characterized in that, in step one, after the four instruction sets are obtained, the conflicts
between instruction sets are statistically analyzed, and the read/write conflict types between
the instruction sets are traversed and classified, to obtain the conflicts introduced by existing
instruction data dependences: a later "dependent instruction" reads old register data or
overwrites newly written data that has not yet been read, i.e. the data dependence relation of
the original instruction sequence is destroyed, causing functional errors; and the original
instruction pipeline is interrupted, including the reduction in instruction throughput caused by
freezing the pipeline or inserting NOP instructions into the pipeline.
4. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1
or 3, characterized in that the conflict types are specifically classified as: conflicts within
and between the data processing, memory access and immediate instruction sets;
Preferably, RAW conflict resolution for ADD and XOR instructions: the operation result of the ADD
instruction is fed forward to XOR as an input in advance, without waiting for the ADD instruction
to be committed;
RAW conflict resolution for STORE and LOAD instructions: the result of STORE is used as the input
to the address calculation of LOAD, and a delay register is added before the LOAD instruction;
RAW conflict resolution for ADD and STORE instructions: the operation result of the ADD
instruction is fed directly into the next stage as the offset address used to calculate the
access address of the STORE instruction;
RAW conflict resolution for LOAD and ADD instructions: the result of the LOAD instruction is used
directly as the input of ADD, and a version of the ADD intermediate result delayed by one beat is
used as the offset address for the DMEM access.
5. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1,
characterized in that, in step three, the "coherence window" is defined as follows: for
superscalar instruction processing, out-of-order execution of instructions is not considered; if
the current instruction needs to write a register, that is, if it operates on a specific register
in the register file or on a certain location in memory, the window is the instruction sequence,
counted from the current instruction forward through the subsequent instruction stream, that is
involved from the fetch of the instruction until the result of the current instruction is
committed.
6. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 5,
characterized in that, in step three, the "relevant instruction" is defined as follows: based on
the defined "coherence window", and without considering out-of-order execution of instructions,
if the "current instruction" needs to write a certain register or a certain offset address of
memory, then, searching from the current instruction toward the subsequent instructions within
the "coherence window", the relevant instructions are the one or more instructions that also need
to write or read what the "current instruction" writes; these instructions, which have a data
dependence relation, may have data read/write conflicts that interrupt the pipeline and reduce
instruction throughput efficiency.
7. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1,
characterized in that, within the "coherence window", the conflict resolution methods include:
(1) the logic operation result of the current instruction, before its commit stage, is fed
directly into the arithmetic unit of the following relevant instruction, without waiting until
the current instruction has been committed and only then feeding the final result into the
arithmetic unit of the subsequent instruction; (2) the operation result of the relevant
instruction following the current instruction is kept as a delayed version, and the output of the
delayed version is used as a backup register resource for resolving the read/write conflict.
8. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1,
characterized in that, in step four, the conflict resolution methods specifically include:
(1) before the final operation result is committed, intermediate operation data is fed forward in
advance: it is fed directly to the input of the next instruction without having to reach the
final commit stage;
(2) delayed-version feedforward of data before the final operation result is committed;
(3) the pipeline delayed version of the intermediate calculation result is stored alongside the
original operation version: in the pipeline stage of the next relevant instruction, an extra
delayed-version register is added, delaying by one beat or several beats, while the register
version in the original, normal pipeline stage is retained;
(4) the original pipeline is not frozen;
(5) the method of inserting NOP instructions is not used.
9. The method for avoiding conflicts between instruction sets in a RISC-CPU as claimed in claim 1,
characterized in that, in step one, after all read/write conflicts that may exist have been
classified, they are stored in a local archive table: the conflict retrieval table, which
includes an index and a conflict type; using superscalar techniques, multiple instructions are
fetched from the instruction memory and stored in the instruction buffer window, where the
"coherence window" is defined and the "relevant instructions" are searched for in advance; with
the "coherence window" and "relevant instruction" defined, the instruction stream is searched for
read/write conflicts; if a dependence conflict between instructions exists, the conflict type
stored locally in advance for the instruction sets is looked up, and the conflict is avoided
quickly and in advance.
10. A system for avoiding conflicts between instruction sets in a RISC-CPU, characterized by
comprising a conflict resolution control unit applied in the RISC-CPU, the conflict resolution
control unit being used to: after all read/write conflicts that may exist have been classified,
store them in a local archive table: the conflict retrieval table, which includes an index and a
conflict type; using superscalar techniques, fetch multiple instructions from the instruction
memory and store them in the instruction buffer window, where the "coherence window" is defined
and the "relevant instructions" are searched for in advance; with the "coherence window" and
"relevant instruction" defined, search the instruction stream for read/write conflicts; and, if a
dependence conflict between instructions exists, look up the conflict type stored locally in
advance for the instruction sets and avoid the conflict quickly and in advance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611246947.6A CN106610816B (en) | 2016-12-29 | 2016-12-29 | The bypassing method and system to conflict between instruction set in a kind of RISC-CPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106610816A true CN106610816A (en) | 2017-05-03 |
CN106610816B CN106610816B (en) | 2018-10-30 |
Family
ID=58636378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611246947.6A Active CN106610816B (en) | 2016-12-29 | 2016-12-29 | The bypassing method and system to conflict between instruction set in a kind of RISC-CPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106610816B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5627982A (en) * | 1991-06-04 | 1997-05-06 | Matsushita Electric Industrial Co., Ltd. | Apparatus for simultaneously scheduling instructions from plural instruction stream into plural instruction executions units |
CN101067781A (en) * | 2006-03-07 | 2007-11-07 | 英特尔公司 | Technique to perform memory disambiguation |
CN101770357A (en) * | 2008-12-31 | 2010-07-07 | 世意法(北京)半导体研发有限责任公司 | Method for reducing instruction conflict in processor |
CN103116485A (en) * | 2013-01-30 | 2013-05-22 | 西安电子科技大学 | Assembler designing method based on specific instruction set processor for very long instruction words |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109189715A (en) * | 2018-08-16 | 2019-01-11 | 算丰科技(北京)有限公司 | Programmable artificial intelligence accelerator execution unit and artificial intelligence accelerated method |
CN111221573A (en) * | 2018-11-26 | 2020-06-02 | 深圳云天励飞技术有限公司 | Management method of register access time sequence, processor, electronic equipment and computer readable storage medium |
US11256444B2 (en) | 2020-07-10 | 2022-02-22 | Hon Hai Precision Industry Co., Ltd. | Method for processing read/write data, apparatus, and computer readable storage medium thereof |
TWI758778B (en) * | 2020-07-10 | 2022-03-21 | 鴻海精密工業股份有限公司 | Data read-write processing method, apparatus, and computer readable storage medium thereof |
CN113312087A (en) * | 2021-06-17 | 2021-08-27 | 东南大学 | Cache optimization method based on RISC processor constant pool layout analysis and integration |
CN113312087B (en) * | 2021-06-17 | 2024-06-11 | 东南大学 | Cache optimization method based on RISC processor constant pool layout analysis and integration |
CN114238182A (en) * | 2021-12-20 | 2022-03-25 | 北京奕斯伟计算技术有限公司 | Processor, data processing method and device |
CN114238182B (en) * | 2021-12-20 | 2023-10-20 | 北京奕斯伟计算技术股份有限公司 | Processor, data processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106610816B (en) | 2018-10-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||