CN108287730A

CN108287730A - A processor pipeline structure

Info

Publication number: CN108287730A
Application number: CN201810338781.3A
Authority: CN
Inventors: 胡振波
Original assignee: Wuhan Silicon Integrated Co Ltd
Current assignee: Wuhan Silicon Integrated Co Ltd
Priority date: 2018-03-14
Filing date: 2018-04-16
Publication date: 2018-07-17
Anticipated expiration: 2038-04-16
Also published as: CN108287730B

Abstract

The invention discloses a processor pipeline structure comprising an instruction storage unit, a fetch unit, an execution unit, a memory access unit and a write-back unit. The write-back unit comprises a first write-back module and a second write-back module. The first write-back module arbitrates the write-back order of the long-cycle instruction results output by the execution unit or the memory access unit, so that the write-back order matches the order in which the corresponding long-cycle instructions were dispatched. The second write-back module arbitrates the write-back order between the single-cycle instruction results output by the execution unit and the long-cycle instruction results output by the first write-back module, with long-cycle instructions having the higher priority. By improving the internal structure of the pipeline units at each stage, the invention solves the problem that existing processor pipeline structures cannot simultaneously achieve low power consumption, low cost with small area, and high performance.

Description

A processor pipeline structure

Technical field

The invention belongs to the field of processor hardware design and, more particularly, relates to an ultra-low-power, high-performance processor pipeline structure.

Background art

In recent years, with the continuous advance of integrated circuit manufacturing processes, processor integration and performance have kept improving, and power consumption has kept rising accordingly. With the widespread use of mobile devices and the rapid development of the Internet of Things, demand for low-power, low-cost, small-area, high-performance processors keeps growing. Achieving high performance while reducing power consumption and cost has become a new research focus for designers.

The invention patent with grant publication number CN 104699463 B discloses a novel pipeline architecture that builds a register stack to reduce the large amount of dynamic power caused by register toggling in the data path. That pipeline structure is mainly suited to reducing the dynamic power caused by register toggling during large-scale, high-speed transfers. It is of little use where the transfer scale is small and the error rate is low, and it also increases design complexity and processor area, which raises cost.

The invention patent with grant publication number CN 101464721 B discloses a design method for controlling performance and power consumption. It monitors the performance of a pipelined processor and, when a drop in processor throughput is detected, reconfigures the pipeline to switch from a high-performance mode to a low-performance mode, thereby reducing power consumption. The system and its design are very complex and are not suitable for low-cost, small-area processor designs. The patent acknowledges this in its own detailed description, noting that development is very complex and time-consuming.

The invention patent with grant publication number CN 103218029 B discloses a pipeline structure that controls the supply voltage. It changes the register structure of an existing pipeline and adds error-correction circuits inside the registers and outside the pipeline, so that the supply voltage can be lowered further and adjusted in real time according to the number of errors, further reducing core power consumption. That design does not consider reducing processor area or cost: although it reduces power consumption to some extent, it also increases the complexity and cost of the system.

In summary, although existing processors reduce power consumption to some extent, they do so at the price of greater system complexity, larger processor area and higher cost.

Summary of the invention

In view of the above defects of, or needs for improvement in, the prior art, the present invention provides an ultra-low-power, high-performance processor pipeline structure that improves the internal structure of the pipeline units at each stage. Its purpose is to solve the problem that existing processor pipeline structures cannot simultaneously achieve low power consumption, low cost with small area, and high performance.

To achieve the above object, according to one aspect of the present invention, a processor pipeline structure is provided, comprising an instruction storage unit, a fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit.

The fetch unit fetches one instruction from the instruction storage unit within one clock cycle. The execution unit decodes and executes the instructions output by the fetch unit, and the results of instruction execution are written back to the register file by the write-back unit.

The write-back unit comprises a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module. The second end of the second write-back module is connected to the fourth end of the execution unit.

The first write-back module arbitrates the write-back order of the long-cycle instruction results output by the execution unit or the memory access unit, so that the write-back order matches the dispatch order of the corresponding long-cycle instructions. The second write-back module arbitrates the write-back order between the single-cycle instruction results output by the execution unit and the long-cycle instruction results output by the first write-back module, with long-cycle instructions having the higher priority.
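
The two arbitration levels described above can be illustrated with a small behavioural sketch. It is not the patented circuit: the function names, the instruction tags and the use of a Python deque are assumptions made purely for illustration. The first level releases a long-cycle result only when it belongs to the oldest dispatched long-cycle instruction; the second level gives the register-file write to a long-cycle result whenever one is available.

```python
from collections import deque

def first_writeback(dispatch_order, completed):
    """First level: release long-cycle results strictly in dispatch order.

    dispatch_order: deque of instruction tags, oldest first (hypothetical tags).
    completed: dict mapping tag -> result for long-cycle instructions that
               have finished executing, possibly out of order.
    Returns (tag, result) for the oldest instruction if it has finished,
    otherwise None.
    """
    if dispatch_order and dispatch_order[0] in completed:
        tag = dispatch_order.popleft()
        return tag, completed.pop(tag)
    return None

def second_writeback(long_result, single_result):
    """Second level: a long-cycle result has priority; a single-cycle
    result is written back only when no long-cycle result is presented."""
    if long_result is not None:
        return long_result
    return single_result
```

For example, if a multiply was dispatched before a load but the load finishes first, `first_writeback` returns nothing until the multiply has completed. What happens to a single-cycle result that loses the second-level arbitration is not stated in the text above and is not modelled here.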

Preferably, in the above processor pipeline structure, the fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module and an instruction register.

The first end of the instruction register is connected to the first end of the instruction storage unit, and its second end is connected to the first end of the execution unit. The first end of the partial decode module is connected to the second end of the instruction storage unit, its second end to the first end of the branch prediction module, and its third end to the first end of the PC generation module. The second end of the branch prediction module is connected to the second end of the PC generation module. The third end of the PC generation module is connected to the first end of the first program counter, its fourth end to the second end of the first program counter, its fifth end to the third end of the instruction storage unit, and its sixth end to the third end of the execution unit. The third end of the first program counter is connected to the first end of the second program counter. The second end of the second program counter is connected to the third end of the execution unit.

The partial decode module decodes the current instruction fetched from the instruction storage unit to determine whether it is an ordinary instruction or a branch/jump instruction. If it is an ordinary instruction, the partial decode module sends it directly to the PC generation module, which generates the address of the next instruction to be fetched from the current instruction and the current instruction address sent by the first program counter.

If it is a branch/jump instruction, the partial decode module sends it to the branch prediction module. The branch prediction module obtains the jump target address of the current instruction by static prediction, and the PC generation module generates the address of the next instruction to be fetched from that jump target address.

The partial decode module, the branch prediction module and the PC generation module are combinational logic, so decoding the current instruction, predicting the branch and generating the next fetch address are all completed within the same clock cycle.

Preferably, in the above processor pipeline structure, the execution unit comprises a decode module, a dispatch module, an instruction tracking module, a single-cycle instruction operation module, a long-cycle instruction operation module and a delivery module.

The first end of the decode module is connected to the second end of the instruction register, its second end to the second end of the second program counter, and its third end to the first end of the dispatch module. The second end of the dispatch module is connected to the first end of the instruction tracking module, its third end to the first end of the single-cycle instruction operation module, its fourth end to the first end of the long-cycle instruction operation module, and its fifth end to the first end of the memory access unit. The second end of the long-cycle instruction operation module is connected to the first end of the first write-back module. The second end of the instruction tracking module is connected to the second end of the first write-back module. The second end of the single-cycle instruction operation module is connected to the second end of the second write-back module, its third end to the second end of the memory access unit, and its fourth end to the first end of the delivery module. The second end of the delivery module is connected to the third end of the second write-back module, and its third end to the sixth end of the PC generation module.

The instruction tracking module stores information on long-cycle instructions that have been dispatched but not yet written back. When dispatching an instruction, the dispatch module compares the information of the instruction being dispatched with the information of each long-cycle instruction stored in the instruction tracking module, to determine whether the current instruction has a data dependency on a long-cycle instruction that has been dispatched but not yet written back. If not, the instruction is dispatched normally. If so, dispatch is stalled and resumes only after the related long-cycle instruction has finished executing and the data dependency has been resolved.
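
The dispatch-time comparison can be sketched as a check of the candidate instruction's register indexes against every tracked long-cycle instruction. This is a minimal sketch under assumptions: the `Outstanding` record and the `must_stall` function are hypothetical names, and the check covers the read-after-write and write-after-write cases only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass(frozen=True)
class Outstanding:
    """Hypothetical tracking entry for one long-cycle instruction that
    has been dispatched but not yet written back."""
    srcs: Tuple[int, ...]   # source operand register indexes
    dest: Optional[int]     # result register index, None if there is none

def must_stall(srcs: Iterable[int], dest: Optional[int],
               outstanding: Iterable[Outstanding]) -> bool:
    """Return True if the instruction about to be dispatched depends on
    any tracked long-cycle instruction."""
    srcs = tuple(srcs)
    for entry in outstanding:
        if entry.dest is None:
            continue
        raw = entry.dest in srcs                       # reads a pending result
        waw = dest is not None and dest == entry.dest  # overwrites a pending result
        if raw or waw:
            return True
    return False
```

With one tracked instruction that will write register 5, a candidate that reads register 5 or also writes register 5 is held back, while a candidate that touches neither is dispatched normally.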

Preferably, in the above processor pipeline structure, the delivery module comprises an exception judgment submodule and a branch prediction resolution submodule.

The first end of the branch prediction resolution submodule and the first end of the exception judgment submodule are both connected to the fourth end of the single-cycle instruction operation module, and the second ends of both submodules are connected to the sixth end of the PC generation module. The third end of the exception judgment submodule is connected to the third end of the second write-back module.

The branch prediction resolution submodule uses the operation result of the single-cycle instruction operation module to judge whether the next fetch address generated by the PC generation module is correct. If it is, no action is taken. If it is not, the submodule clears the wrong address, generates a new next fetch address and feeds it back to the PC generation module.

The exception judgment submodule uses the operation result of the single-cycle instruction operation module to judge whether an error occurred while the current instruction was executing. If not, no action is taken. If so, the submodule clears the current instruction address, generates a new address and feeds it back to the PC generation module.

Preferably, in the above processor pipeline structure, the instruction tracking module comprises multiple entries for storing long-cycle instruction information. Each entry stores the information of one long-cycle instruction, including its source operand register indexes and its result register index.

Preferably, in the above processor pipeline structure, the instruction tracking module is implemented as a FIFO. When the first write-back module writes back multiple long-cycle instructions, it arbitrates their write-back order according to the order indicated by the FIFO's read pointer. After a long-cycle instruction has been written back, the instruction tracking module deletes the information corresponding to that instruction.
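
A FIFO with explicit read and write pointers can be sketched as a circular buffer. The class below is an illustrative model, not the patented hardware: the class name, the method names and the depth are assumptions. The entry at the read pointer is the only one allowed to write back next, and popping it corresponds to deleting the instruction's information after write-back.

```python
class TrackingFifo:
    """Hypothetical circular-buffer model of the instruction tracking FIFO."""

    def __init__(self, depth):
        self.entries = [None] * depth
        self.rd = 0      # read pointer: oldest tracked instruction
        self.wr = 0      # write pointer: next free entry
        self.count = 0

    def full(self):
        return self.count == len(self.entries)

    def push(self, info):
        """Record a long-cycle instruction at dispatch time."""
        if self.full():
            raise OverflowError("tracking FIFO is full")
        self.entries[self.wr] = info
        self.wr = (self.wr + 1) % len(self.entries)
        self.count += 1

    def head(self):
        """Entry that must write back next, or None if the FIFO is empty."""
        return self.entries[self.rd] if self.count else None

    def pop(self):
        """Remove the oldest entry once it has been written back."""
        if self.count == 0:
            raise IndexError("tracking FIFO is empty")
        info = self.entries[self.rd]
        self.entries[self.rd] = None
        self.rd = (self.rd + 1) % len(self.entries)
        self.count -= 1
        return info
```

A full FIFO would presumably have to stall dispatch of further long-cycle instructions; the text above does not specify this, so the sketch simply raises an error.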

Preferably, in the above processor pipeline structure, the single-cycle instruction operation module is also used to generate the memory access address.

The memory access unit serves as the control module for memory access. Based on the above memory access address, it determines by address decoding whether to obtain the corresponding instruction from the instruction storage unit or the corresponding data from the data storage unit.
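
Address-based selection of this kind can be sketched as a range check. The base addresses and sizes below are invented for illustration only, since the text gives no memory map, and the "unmapped" outcome is likewise an assumption.

```python
# Assumed memory map, for illustration only (not from the patent).
INSTR_BASE, INSTR_SIZE = 0x8000_0000, 0x1_0000
DATA_BASE, DATA_SIZE = 0x9000_0000, 0x1_0000

def select_target(addr):
    """Decide which storage the memory access address falls in."""
    if INSTR_BASE <= addr < INSTR_BASE + INSTR_SIZE:
        return "instruction_storage"
    if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
        return "data_storage"
    return "unmapped"  # handling of other addresses is not described
```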

Preferably, in the above processor pipeline structure, the instruction storage unit is implemented as an instruction tightly coupled memory (ITCM).

In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects:

(1) In the processor pipeline structure provided by the invention, the first and second write-back modules implement a two-level write-back of instruction results. The first write-back module works together with the instruction tracking module to write back the different long-cycle instructions so that their write-back order strictly matches their dispatch order, which keeps the hardware simple and reduces processor area. The second write-back module arbitrates the write-back order of all single-cycle and long-cycle instructions, with long-cycle instructions having priority. In idle cycles when no long-cycle instruction is writing back, single-cycle instructions can write back freely. The two-level write-back strategy separates the delivery (commit) of a long-cycle instruction from its write-back, so that even while a multi-cycle instruction is executing, the pipeline is not blocked and subsequent single-cycle instructions can still write back and be delivered smoothly, which improves processor performance.

(2) In the processor pipeline structure provided by the invention, the instruction tracking module and the dispatch module work together to solve the data dependency problem. The instruction tracking module stores information on long-cycle instructions that have been dispatched but not yet written back. When dispatching an instruction, the dispatch module compares the information of the instruction being dispatched with the information of each long-cycle instruction in the instruction tracking module, to determine whether the current instruction has a RAW or WAW dependency on a long-cycle instruction that has been dispatched but not yet written back. If there is no data dependency, the instruction is dispatched normally. If there is one, dispatch is stalled and resumes only after the related long-cycle instruction has finished executing and the dependency has been resolved. The invention thus resolves data dependencies by stalling the pipeline, without bypassing (forwarding) long-cycle results directly to the subsequent instructions waiting to be dispatched, which reduces the power consumption and area of the processor.

(3) The processor pipeline structure provided by the invention uses a single-cycle-access ITCM as the instruction memory, so the fetch unit can fetch one instruction from the ITCM in one cycle. Replacing a traditional cache with an ITCM meets the real-time requirements of an ultra-low-power, small-area processor and reduces the cost and area of the processor. The partial decode module, the branch prediction module and the PC generation module are combinational logic, so the fetch unit completes instruction read, partial decode, branch prediction and generation of the PC of the next instruction to be fetched in a single cycle. This achieves back-to-back instruction fetch and greatly improves processor performance.

Description of the drawings

Fig. 1 is an overall architecture diagram of a processor pipeline structure provided by an embodiment of the present invention;

Fig. 2 is a structure diagram of the fetch unit of a processor pipeline structure provided by an embodiment of the present invention;

Fig. 3 is a structure diagram of the execution unit of a processor pipeline structure provided by an embodiment of the present invention;

Fig. 4 is a structure diagram of the write-back unit of a processor pipeline structure provided by an embodiment of the present invention.

Detailed description

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein only explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.

The processor pipeline structure provided by the embodiment of the present invention is mainly suited to low-cost, small-area processors designed for embedded ultra-low-power scenarios, such as our company's self-developed Hummingbird E200 processor core based on the RISC-V architecture. The pipeline structure comprises multiple pipeline stages, specifically an instruction storage unit, a fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit.

The present invention divides the pipeline into stages mainly by clock cycle. The instruction storage unit and the fetch unit belong to the first pipeline stage: the instruction storage unit stores instructions, and the fetch unit fetches instructions from it continuously, without interruption. The execution unit and the write-back unit belong to the second pipeline stage: the execution unit decodes and executes the instructions output by the fetch unit, and the write-back unit writes the results of instruction execution back to the general register file (Register File, Regfile). Because the decoding, execution and write-back of an instruction take place in the same clock cycle, the execution unit and the write-back unit are both assigned to the second stage of the pipeline.

Some single-cycle instructions can be completed with just the above two pipeline stages, whereas some long-cycle instructions need the memory access function of the memory access unit, which belongs to the third pipeline stage. The result output by the memory access unit is nevertheless written back to the general register file through the write-back unit in the second pipeline stage. The ultra-low-power, high-performance processor pipeline structure provided by this embodiment is therefore a variable-length pipeline structure. Compared with existing linear pipeline structures, it has fewer pipeline stages, which can reduce processor area and cost.

The fetch unit fetches instructions from the instruction storage unit. To improve processor performance, instruction fetch must be both "fast" and "continuous". The pipeline structure of this embodiment is mainly suited to low-cost, small-area processors designed for embedded ultra-low-power scenarios, and the program code size for embedded processor cores of this class is small. The invention therefore uses an instruction tightly coupled memory (Instruction Tightly Coupled Memory; ITCM) as the instruction storage unit. The fetch unit can fetch one instruction from the ITCM in one clock cycle, which achieves fast instruction fetch. Compared with a traditional I-Cache, this reduces processor cost and area while still meeting the real-time requirements of an ultra-low-power, small-area processor.

The fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module and an instruction register.

The first end of the instruction register is connected to the first end of the ITCM, and its second end is connected to the first end of the execution unit. The first end of the partial decode module is connected to the second end of the ITCM, its second end to the first end of the branch prediction module, and its third end to the first end of the PC generation module. The second end of the branch prediction module is connected to the second end of the PC generation module. The third end of the PC generation module is connected to the first end of the first program counter, its fourth end to the second end of the first program counter, its fifth end to the third end of the ITCM, and its sixth end to the second end of the execution unit. The third end of the first program counter is connected to the first end of the second program counter. The second end of the second program counter is connected to the third end of the execution unit.

The instruction register stores the instruction fetched from the ITCM in a given clock cycle (called the current instruction) and sends it to the execution unit in the next clock cycle. The second program counter receives the current instruction address sent by the first program counter and sends it to the execution unit in the next clock cycle. The execution unit thus receives the current instruction and the current instruction address at the same time.
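
The one-cycle delay introduced by the instruction register and the second program counter can be modelled as a pair of registers updated together on each clock edge. This is only a behavioural sketch: the class name and the `tick` method are assumptions, and the instruction words in the example are arbitrary values.

```python
class FetchToExecuteRegs:
    """Hypothetical model of the instruction register and second program
    counter, which are updated on the same clock edge."""

    def __init__(self):
        self.instr = None   # instruction register
        self.pc = None      # second program counter

    def tick(self, fetched_instr, fetch_pc):
        """Advance one clock: output what was captured last cycle, then
        capture the instruction and address fetched this cycle."""
        out = (self.instr, self.pc)
        self.instr, self.pc = fetched_instr, fetch_pc
        return out
```

Calling `tick` with the instruction fetched at address 0x0 returns nothing useful in the first cycle. The next call returns that instruction together with address 0x0, so the pair always arrives at the execution unit in step.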

The partial decode module decodes the current instruction fetched from the ITCM to determine whether it is an ordinary instruction or a branch/jump instruction. If it is an ordinary instruction, the partial decode module sends it directly to the PC generation module, which generates the address of the next instruction to be fetched (the PC value) from the current instruction and the current instruction address sent by the first program counter. If it is a branch/jump instruction, the partial decode module sends it to the branch prediction module. The branch prediction module obtains the jump target address of the current instruction by static prediction, and the PC generation module generates the address of the next instruction to be fetched from that jump target address and sends it to both the first program counter and the ITCM.

The partial decode module, the branch prediction module and the PC generation module are combinational logic, so decoding the current instruction, predicting the branch, generating the next fetch address and obtaining the current instruction are all completed within the same clock cycle. The fetch unit provided by this embodiment can therefore both obtain the current instruction and generate the next fetch address within one clock cycle, which achieves back-to-back instruction fetch and improves processor performance.

The branch prediction module uses a simple, flexible static prediction method: a conditional branch that jumps backward is predicted taken, and a conditional branch that jumps forward is predicted not taken. The details are as follows:

1. For conditional direct jump instructions, such as the conditional branch instructions BEQ and BNE, the above static prediction method is used (a backward jump is predicted taken, otherwise the branch is predicted not taken). The jump target address is obtained by adding the instruction's PC to the offset given by its immediate.

2. For unconditional direct jump instructions, such as jal, the jump is always taken, so its direction need not be predicted. The jump target address is obtained by adding the instruction's PC to the offset given by its immediate.

3. For unconditional indirect jump instructions, such as jalr, the jump is always taken, so its direction need not be predicted. The base address needed to compute the jump target address comes from the operand indexed by rs1, which must be read from the general register file and may also have a RAW data dependency on an instruction currently being executed in the execution unit. The invention takes different approaches depending on the rs1 index. If the index refers to the constant register, the constant is used directly without reading any register. If the index refers to the commonly used link register, the register is taken out directly over a dedicated wire, and to prevent a read-after-write data dependency it must be determined that the write-back unit in the second pipeline stage is not writing back to that register. Specifically:

3.1. If the rs1 index is x0, the constant 0 is used directly for the base address calculation (the RISC-V architecture defines x0 as the constant 0), with no need to read from the Regfile.

3.2. If the rs1 index is x1, then, because x1 is commonly used as the link register for function-return jump instructions, x1 is wired directly out of the execution unit's Regfile without occupying a Regfile read port. To prevent a RAW data dependency caused by an instruction currently in the execution unit needing to write back the link register, the branch prediction module must determine whether any instruction is currently writing back the link register.

3.3. If the rs1 index is any register other than x0 and x1 (xn for short), xn must be read from the Regfile through a Regfile read port, and it must be determined whether the read port is currently idle so that there is no resource conflict. In addition, to prevent a RAW data dependency caused by an instruction currently in the execution unit needing to write back xn, the branch prediction module must determine whether any instruction is currently writing back to the Regfile.
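
The prediction rules listed above can be summarised in a short sketch. The function names, the `kind` strings and the `read_port` callback are assumptions for illustration, and addresses are assumed to be 32 bits wide. Clearing the lowest bit of a jalr target follows the RISC-V ISA rather than the text above. The pending-write and read-port-idle checks are deliberately left out, because the text does not say what the fetch unit does when they fail.

```python
MASK32 = 0xFFFF_FFFF  # assumes 32-bit addresses

def predict_taken(kind, imm):
    """Static direction prediction. kind is 'branch', 'jal', 'jalr' or 'other';
    imm is the sign-extended offset as a Python int."""
    if kind in ("jal", "jalr"):
        return True            # unconditional jumps are always taken
    if kind == "branch":
        return imm < 0         # backward taken, forward not taken
    return False

def jalr_base(rs1, x1_wire, read_port):
    """Choose the base address source according to the rs1 index."""
    if rs1 == 0:
        return 0               # x0 is the constant 0, no register read
    if rs1 == 1:
        return x1_wire         # link register, taken over a dedicated wire
    return read_port(rs1)      # any other register uses a Regfile read port

def predicted_target(kind, pc, imm, rs1=0, x1_wire=0, read_port=lambda i: 0):
    """Jump target address for a branch, jal or jalr instruction."""
    if kind == "jalr":
        base = jalr_base(rs1, x1_wire, read_port)
        return ((base + imm) & ~1) & MASK32
    return (pc + imm) & MASK32
```

For instance, a BEQ at address 0x100 with an offset of -8 is predicted taken with target 0xF8, while the same instruction with an offset of +8 is predicted not taken.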

The execution unit comprises a decode module, a dispatch module, an instruction tracking module, a single-cycle instruction operation module, a long-cycle instruction operation module and a delivery module.

The first end of the decode module is connected to the second end of the instruction register, its second end to the second end of the second program counter, and its third end to the first end of the dispatch module. The second end of the dispatch module is connected to the first end of the instruction tracking module, its third end to the first end of the single-cycle instruction operation module, its fourth end to the first end of the long-cycle instruction operation module, and its fifth end to the first end of the memory access unit. The second end of the instruction tracking module is connected to the first end of the write-back unit. The second end of the single-cycle instruction operation module is connected to the second end of the write-back unit, its third end to the second end of the memory access unit, and its fourth end to the first end of the delivery module. The second end of the long-cycle instruction operation module is connected to the third end of the write-back unit. The second end of the delivery module is connected to the fourth end of the write-back unit, and its third end to the sixth end of the PC generation module.

The decode module decodes the received current instruction and current instruction address to obtain the operand register indexes, and reads the corresponding operand data from the Read-Regfile according to those indexes.

The dispatch module dispatches the operand data obtained by the decode module to different operation units for execution according to the instruction type. The single-cycle instruction operation module is mainly used to compute and execute single-cycle instructions, and the long-cycle instruction operation module is mainly used to compute and execute long-cycle instructions. The execution results of both single-cycle and long-cycle instructions are written back to the Write-Regfile by the write-back unit.

The delivery module delivers the computation result of the single-cycle instruction operation module to the PC generation module. The delivery module comprises an exception judgment submodule and a branch prediction resolution submodule.

The first end of the branch prediction resolution submodule and the first end of the exception judgment submodule are both connected to the fourth end of the single-cycle instruction operation module, and the second ends of both submodules are connected to the sixth end of the PC generation module. The third end of the exception judgment submodule is connected to the fourth end of the write-back unit.

The branch prediction resolution submodule uses the computation result of the single-cycle instruction operation module to judge whether the next fetch address generated by the PC generation module is correct. If it is, no action is taken. If it is not, the submodule clears the wrong address, generates a new next fetch address and feeds it back to the PC generation module. The exception judgment submodule uses the computation result of the single-cycle instruction operation module to judge whether an error occurred while the current instruction was executing. If not, no action is taken. If so, it clears the current instruction address, generates a new address and feeds it back to the PC generation module. The PC generation module sends the new address to the ITCM, and the fetch unit fetches again from the ITCM and sends the instruction to the execution unit for decoding and execution.
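
The check performed when a branch is resolved can be sketched as a comparison between the predicted next fetch address and the address implied by the actual outcome. The function name and the fixed 4-byte instruction length are assumptions for illustration. The exception path, which also feeds a new address back to the PC generation module, is not modelled.

```python
def resolve_next_pc(predicted_next_pc, pc, taken, target, inst_len=4):
    """Return the corrected fetch address on a misprediction, or None if
    the predicted next fetch address was already correct.

    inst_len=4 assumes fixed-length 32-bit instructions.
    """
    actual_next_pc = target if taken else pc + inst_len
    if actual_next_pc == predicted_next_pc:
        return None            # prediction correct, no action taken
    return actual_next_pc      # fed back so that fetch restarts here
```

A backward branch at 0x100 that was predicted taken to 0xF8 but is actually not taken yields the redirect address 0x104. If it is actually taken, the function returns `None` and fetch continues undisturbed.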

Since some long-cycle instructions may encounter errors during execution, the write-back unit needs an interface to the exception judgment submodule in order to trigger an exception. If an exception is raised, the execution result of the long-cycle instruction is not written back to Write-Regfile.

In a micro-architecture based on in-order dispatch, the dispatch module must check, each time it dispatches an instruction, whether there is a data dependency between that instruction and instructions that were dispatched earlier but have not yet been written back. Data dependencies are classified as write-after-read (WAR), read-after-write (RAW), and write-after-write (WAW):

1. WAR dependency: The pipeline structure provided by the invention is intended for processors whose micro-architecture dispatches in order and writes back in order, and an instruction reads its source operands from the general-purpose register file at the moment it is dispatched. Therefore "a subsequently executed instruction writing back to Write-Regfile" can never occur before "a preceding instruction reading its operands from Read-Regfile", and a data hazard caused by a WAR dependency cannot occur.

2. RAW dependency: The instruction being dispatched is in the second stage of the pipeline. If the previously dispatched instruction (the preceding instruction) is a single-cycle instruction, whose write-back also takes place in the second stage of the pipeline, then it has already completed execution and written its result back to Write-Regfile, so the instruction being dispatched cannot have a data hazard caused by a RAW dependency on it. If the preceding instruction is a long-cycle instruction, which needs multiple cycles before it can write back its result, the instruction being dispatched may have a RAW dependency on it.

3. WAW dependency: The instruction being dispatched is in the second stage of the pipeline. If the preceding instruction is a single-cycle instruction, then it has already completed execution and written its result back to Write-Regfile, so the instruction being dispatched cannot have a data hazard caused by a WAW dependency on it. If the preceding instruction is a long-cycle instruction, which needs multiple cycles before it can write back its result, the instruction being dispatched may have a WAW dependency on it.

In summary, in the pipeline structure provided by the invention, the instruction being dispatched can have RAW and WAW dependencies only on long-cycle instructions that have not yet finished executing.

To detect RAW and WAW dependencies between the instruction currently being dispatched and preceding long-cycle instructions, the invention provides an instruction tracking module in the execution unit. This module stores information about long-cycle instructions that have been dispatched but not yet written back; the information includes, but is not limited to, the source operand register indices and the result register index of each long-cycle instruction.

The instruction tracking module is preferably implemented as a first-in, first-out (FIFO) queue. Each time the dispatch module dispatches a long-cycle instruction, an entry is allocated for that instruction in the instruction tracking module to store its source operand register indices and result register index. After the write-back unit writes the execution result of the long-cycle instruction back to Write-Regfile, the instruction tracking module removes the corresponding entry, so the module always holds information about exactly those long-cycle instructions that have been dispatched but not yet written back. When dispatching an instruction, the dispatch module compares the source operand register indices and result register index of the current instruction with the information in each entry of the instruction tracking module to determine whether the current instruction has a RAW or WAW dependency on a long-cycle instruction that has been dispatched but not yet written back. If there is no data dependency, the instruction is dispatched normally; if there is one, dispatch is paused and resumes only after the relevant long-cycle instruction finishes executing and the data dependency is released. The FIFO depth defaults to two entries, so information for two long-cycle instructions can be stored at the same time. The number of entries is preferably no more than four; otherwise the operating speed of the processor is reduced.
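The dependency check described above can be sketched in Python as follows. This is an illustrative software model only, assuming the FIFO stores one entry per outstanding long-cycle instruction; the class and function names (`Entry`, `InstructionTracker`, `can_dispatch`) are hypothetical, and stalling a long-cycle instruction when the FIFO is full is an assumption that the text does not state explicitly.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """One FIFO entry: register indices of an outstanding long-cycle instruction."""
    src_regs: Tuple[int, ...]
    dst_reg: Optional[int]


class InstructionTracker:
    def __init__(self, depth: int = 2) -> None:
        self.depth = depth                 # default depth of two entries
        self.fifo: Deque[Entry] = deque()  # oldest entry at the left

    def is_full(self) -> bool:
        return len(self.fifo) >= self.depth

    def has_hazard(self, src_regs: Sequence[int], dst_reg: Optional[int]) -> bool:
        """True if the candidate has a RAW or WAW dependency on any entry."""
        for entry in self.fifo:
            if entry.dst_reg is None:
                continue
            if entry.dst_reg in src_regs:   # RAW: reads a pending result
                return True
            if dst_reg is not None and entry.dst_reg == dst_reg:  # WAW
                return True
        return False

    def allocate(self, src_regs: Sequence[int], dst_reg: Optional[int]) -> None:
        """Allocate an entry when a long-cycle instruction is dispatched."""
        if self.is_full():
            raise RuntimeError("tracker full")
        self.fifo.append(Entry(tuple(src_regs), dst_reg))

    def retire_oldest(self) -> Entry:
        """Remove the oldest entry once its result has been written back."""
        return self.fifo.popleft()


def can_dispatch(tracker: InstructionTracker, src_regs: Sequence[int],
                 dst_reg: Optional[int], is_long_cycle: bool) -> bool:
    """Dispatch only when no dependency exists (and, by assumption, when a
    long-cycle instruction can still get a free entry)."""
    if tracker.has_hazard(src_regs, dst_reg):
        return False
    if is_long_cycle and tracker.is_full():
        return False
    return True
```

Reading the same source registers as an outstanding long-cycle instruction is not treated as a hazard, which matches the WAR analysis above: only the pending result register can cause a conflict.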

For conflicts caused by data dependencies, the pipeline structure provided by the invention stalls the pipeline rather than directly bypassing the result of a long-cycle instruction to subsequent instructions waiting to be dispatched, which reduces the power consumption and area of the processor.

Some long-cycle instructions, such as Load and Store instructions and "A" extension instructions, need the memory access function of the memory access unit. After the dispatch module dispatches such an instruction to the single-cycle instruction computing module, that module computes the memory access address and sends it to the memory access unit. The memory access unit, acting as the control module for memory access, determines from the address whether to obtain the corresponding instruction from the instruction storage component or the corresponding data from the data storage component.
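As a rough illustration of the address-based selection performed by the memory access unit, the following Python sketch routes an access by comparing the address against two address windows. The base addresses, sizes, and names (`ITCM_BASE`, `DTCM_BASE`, `route_access`) are purely hypothetical, since the text gives no memory map, and raising an error for an unmapped address is likewise an assumption of the sketch.

```python
# Hypothetical address windows; the source does not give a memory map.
ITCM_BASE, ITCM_SIZE = 0x8000_0000, 0x0001_0000
DTCM_BASE, DTCM_SIZE = 0x9000_0000, 0x0001_0000


def route_access(addr: int) -> str:
    """Select the storage component that serves a memory access address."""
    if ITCM_BASE <= addr < ITCM_BASE + ITCM_SIZE:
        return "instruction_storage"
    if DTCM_BASE <= addr < DTCM_BASE + DTCM_SIZE:
        return "data_storage"
    # Behaviour for unmapped addresses is not described in the source.
    raise ValueError(f"unmapped address: {addr:#x}")
```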

The write-back unit includes a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the long-cycle instruction computing module, its second end is connected to the second end of the instruction tracking module, its third end is connected to the first end of the second write-back module, and its fourth end is connected to the third end of the memory access unit. The second end of the second write-back module is connected to the second end of the single-cycle instruction computing module, and its third end is connected to the third end of the exception judgment submodule.

The first write-back module mainly arbitrates the write-back of long-cycle instruction execution results. As shown in FIG. 4, the result of a long-cycle instruction, after being processed by the long-cycle instruction computing module or the memory access unit, first enters the first write-back module. The long-cycle instruction results received by the first write-back module may also come from the multiplier-divider, the FPU, the EAI coprocessor, and so on. In theory, these long-cycle instructions need not be written back strictly in dispatch order: dispatch order only has to be followed when a register conflict occurs, and write-back can otherwise be out of order. For simplicity of hardware, however, this embodiment writes back long-cycle instruction results strictly in the order in which the instructions were dispatched. Because different long-cycle instructions take different numbers of cycles to execute, and the cycle count of some is even dynamic, the relative order of these instructions cannot easily be determined, so the order among them must be recorded in advance.

The instruction tracking module provided in this embodiment records information about long-cycle instructions. Each time the dispatch module dispatches a long-cycle instruction, an entry is allocated for it in the instruction tracking module to record its information. The FIFO pointer of this entry serves as the instruction tag (ITAG) of the long-cycle instruction. After being dispatched, the long-cycle instruction carries its ITAG with it until its result is written back.

The instruction tracking module and the first write-back module cooperate to complete the write-back of all long-cycle instructions. The result of a long-cycle instruction received by the first write-back module includes the ITAG of that instruction. Because the instruction tracking module is a FIFO, its read pointer points to the entry that entered the module first. The first write-back module sends the result of the long-cycle instruction corresponding to that entry to the second write-back module, and at the same time the instruction tracking module deletes the entry. The first write-back module thus determines the write-back order of long-cycle instruction results according to the order indicated by the instruction tracking module, which ensures that the write-back order is strictly consistent with the dispatch order.
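The cooperation between the tracking FIFO and the first write-back module can be modelled with the following Python sketch. It is a simplified illustration, not the hardware of the embodiment: the class name `FirstWriteBack` and its methods are hypothetical, ITAGs are modelled as unique integers rather than recycled FIFO pointers, and at most one result is forwarded per call to `step`.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class FirstWriteBack:
    """Forwards long-cycle results strictly in dispatch order."""

    def __init__(self) -> None:
        self.order: Deque[int] = deque()   # ITAGs in dispatch order (the FIFO)
        self.pending: Dict[int, int] = {}  # ITAG -> completed result value

    def dispatch(self, itag: int) -> None:
        """Record a newly dispatched long-cycle instruction."""
        self.order.append(itag)

    def complete(self, itag: int, value: int) -> None:
        """A functional unit finished; results may arrive in any order."""
        self.pending[itag] = value

    def step(self) -> Optional[Tuple[int, int]]:
        """Forward the result at the FIFO read pointer if it is ready,
        removing its entry; otherwise forward nothing this cycle."""
        if self.order and self.order[0] in self.pending:
            itag = self.order.popleft()
            return itag, self.pending.pop(itag)
        return None
```

For example, if the instruction with ITAG 1 finishes before the one with ITAG 0, `step` returns `None` until ITAG 0 completes, after which the two results are forwarded in dispatch order.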

The second write-back module mainly receives the results of single-cycle instructions sent by the single-cycle instruction computing module and the results of long-cycle instructions that have passed arbitration in the first write-back module, and arbitrates the write-back order of all instructions by priority. Because a long-cycle instruction takes longer to execute, it is at an earlier position in the program flow than the single-cycle instruction that is writing back at the same time, so the write-back of a long-cycle instruction has higher priority than the write-back of a single-cycle instruction. In idle cycles in which no long-cycle instruction is writing back, single-cycle instructions can write back freely; that is, a single-cycle instruction at a later position in the program flow can write back before a long-cycle instruction at an earlier position, provided there is no data dependency between them. The pipeline structure provided by this embodiment of the invention therefore also has out-of-order write-back capability.
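The priority rule of the second write-back module can be expressed as a short Python function. This is a minimal sketch assuming a single register-file write port and at most one request of each kind per cycle; the names `WriteBack` and `arbitrate` and the returned stall flag are hypothetical conveniences, not elements named in the text.

```python
from typing import Optional, Tuple

# A write-back request modelled as (destination register index, value).
WriteBack = Tuple[int, int]


def arbitrate(long_cycle: Optional[WriteBack],
              single_cycle: Optional[WriteBack]
              ) -> Tuple[Optional[WriteBack], bool]:
    """Pick the request granted this cycle.

    Returns (granted request, single-cycle request must wait).
    A long-cycle result always wins; a single-cycle result is written
    back only in a cycle with no long-cycle write-back.
    """
    if long_cycle is not None:
        return long_cycle, single_cycle is not None
    return single_cycle, False
```

When both requests are present, the long-cycle result is granted and the flag reports that the single-cycle result has to wait; when only the single-cycle request is present, it is granted immediately, which is the out-of-order write-back case described above.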

Compared with existing processor pipeline structures, the processor pipeline structure provided by the invention improves the internal structure of each pipeline stage. In the instruction fetch unit, the partial decode module, the branch prediction module, and the PC generation module are implemented as combinational logic, so that fetching the current instruction and generating the next instruction fetch address are completed within one clock cycle; this enables back-to-back instruction fetch and improves processor performance. In the execution unit, an instruction tracking module is provided and data dependencies are resolved by stalling the pipeline, without directly bypassing the results of long-cycle instructions to subsequent instructions waiting to be dispatched; this reduces the power consumption and area of the processor. The write-back unit is divided into a first write-back module and a second write-back module, and this two-level write-back strategy separates the delivery of long-cycle instructions from their write-back, so that even when a long-cycle instruction taking many cycles is being executed the pipeline is not blocked and subsequent single-cycle instructions can still be written back and delivered smoothly, which improves processor performance. The invention thus solves the problem that existing processor pipeline structures cannot simultaneously achieve low power consumption, low cost and small area, and high performance.

Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A processor pipeline structure, comprising an instruction storage unit, an instruction fetch unit, an execution unit, a memory access unit, and a write-back unit, characterized in that the first end of the instruction fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit; the second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit; the second end of the memory access unit is connected to the second end of the write-back unit;

the instruction fetch unit fetches one instruction from the instruction storage unit within one clock cycle; the execution unit decodes and executes the instruction output by the instruction fetch unit, and the result of instruction execution is written back to the register file by the write-back unit;

the write-back unit includes a first write-back module and a second write-back module; the first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module; the second end of the second write-back module is connected to the fourth end of the execution unit;

the first write-back module arbitrates the write-back order of the long-cycle instruction execution results output by the execution unit or the memory access unit, so that the write-back order is consistent with the dispatch order of the corresponding long-cycle instructions; the second write-back module arbitrates the write-back order of the single-cycle instruction results output by the execution unit and the multi-cycle instruction execution results output by the first write-back module, with long-cycle instructions having the higher priority.

2. The processor pipeline structure according to claim 1, characterized in that the instruction fetch unit includes a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module, and an instruction register;

the first end of the instruction register is connected to the first end of the instruction storage unit, and its second end is connected to the first end of the execution unit; the first end of the partial decode module is connected to the second end of the instruction storage unit, its second end is connected to the first end of the branch prediction module, and its third end is connected to the first end of the PC generation module; the second end of the branch prediction module is connected to the second end of the PC generation module; the third end of the PC generation module is connected to the first end of the first program counter, its fourth end is connected to the second end of the first program counter, its fifth end is connected to the third end of the instruction storage unit, and its sixth end is connected to the third end of the execution unit; the third end of the first program counter is connected to the first end of the second program counter; the second end of the second program counter is connected to the third end of the execution unit;

the partial decode module decodes the current instruction fetched from the instruction storage unit to determine whether the current instruction is an ordinary instruction or a branch/jump instruction; if it is an ordinary instruction, the partial decode module sends the current instruction directly to the PC generation module, and the PC generation module generates the address of the next instruction to be fetched according to the current instruction and the current instruction address sent by the first program counter;

if it is a branch/jump instruction, the partial decode module sends the current instruction to the branch prediction module; the branch prediction module obtains the jump target address of the current instruction by static prediction, and the PC generation module generates the address of the next instruction to be fetched according to the jump target address obtained by the branch prediction module;

the partial decode module, the branch prediction module, and the PC generation module are combinational logic structures, and decoding of the current instruction, branch prediction, and generation of the next instruction fetch address are completed within the same clock cycle.

3. The processor pipeline structure according to claim 1 or 2, characterized in that the execution unit includes a decoding module, a dispatch module, an instruction tracking module, a single-cycle instruction computing module, a long-cycle instruction computing module, and a delivery module;

the first end of the decoding module is connected to the second end of the instruction register, its second end is connected to the second end of the second program counter, and its third end is connected to the first end of the dispatch module; the second end of the dispatch module is connected to the first end of the instruction tracking module, its third end is connected to the first end of the single-cycle instruction computing module, its fourth end is connected to the first end of the long-cycle instruction computing module, and its fifth end is connected to the first end of the memory access unit; the second end of the long-cycle instruction computing module is connected to the first end of the first write-back module; the second end of the instruction tracking module is connected to the second end of the first write-back module; the second end of the single-cycle instruction computing module is connected to the second end of the second write-back module, its third end is connected to the second end of the memory access unit, and its fourth end is connected to the first end of the delivery module; the second end of the delivery module is connected to the third end of the second write-back module, and its third end is connected to the sixth end of the PC generation module;

the instruction tracking module stores information about long-cycle instructions that have been dispatched but not yet written back; when dispatching an instruction, the dispatch module compares the information of the instruction currently being dispatched with the information of each long-cycle instruction stored in the instruction tracking module to determine whether the current instruction has a data dependency on a long-cycle instruction that has been dispatched but not yet written back; if not, the instruction is dispatched normally; if so, dispatch is paused and resumes only after the relevant long-cycle instruction finishes executing and the data dependency is released.

4. The processor pipeline structure according to claim 3, characterized in that the delivery module includes an exception judgment submodule and a branch prediction analysis submodule;

the first end of the branch prediction analysis submodule and the first end of the exception judgment submodule are both connected to the fourth end of the single-cycle instruction computing module; the second end of the branch prediction analysis submodule and the second end of the exception judgment submodule are both connected to the sixth end of the PC generation module; the third end of the exception judgment submodule is connected to the third end of the second write-back module;

the branch prediction analysis submodule judges, according to the operation result of the single-cycle instruction computing module, whether the address of the next instruction to be fetched generated by the PC generation module is correct; if so, no action is taken; if not, it discards the wrong address, generates a new next instruction fetch address, and feeds it back to the PC generation module;

the exception judgment submodule judges, according to the operation result of the single-cycle instruction computing module, whether an error occurred during execution of the current instruction; if not, no action is taken; if so, it discards the current instruction address, generates a new address, and feeds it back to the PC generation module.

5. The processor pipeline structure according to claim 3, characterized in that the instruction tracking module includes a plurality of entries for storing long-cycle instruction information, each entry storing the information of one long-cycle instruction, and the information includes the source operand register indices and the result register index.

6. The processor pipeline structure according to claim 5, characterized in that the instruction tracking module is implemented as a FIFO; when performing write-back for a plurality of long-cycle instructions, the first write-back module arbitrates the write-back order of the different long-cycle instructions according to the order indicated by the read pointer of the FIFO; after a long-cycle instruction is written back, the instruction tracking module deletes the information corresponding to that long-cycle instruction.

7. The processor pipeline structure according to claim 3, characterized in that the single-cycle instruction computing module is further used to generate a memory access address;

the memory access unit, acting as the control module for memory access, determines from the memory access address whether to obtain the corresponding instruction from the instruction storage component or the corresponding data from the data storage component.

8. The processor pipeline structure according to claim 1, characterized in that the instruction storage unit is implemented using an instruction tightly coupled memory.