CN113946368A

CN113946368A - Three-level pipeline architecture based on RISC-V instruction set, processor and data processing method

Info

Publication number: CN113946368A
Application number: CN202111275421.1A
Authority: CN
Inventors: 赵翠华; 张海金; 杨靓; 娄冕; 李红桥; 李磊; 罗敏涛; 黄巾
Original assignee: Xian Microelectronics Technology Institute
Current assignee: Xian Microelectronics Technology Institute
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-01-18
Anticipated expiration: 2041-10-29
Also published as: CN113946368B

Abstract

The invention provides a three-stage pipeline architecture based on the RISC-V instruction set, comprising an instruction fetch stage module, a decode stage module, an execution stage module and a register file. The second pipeline stage of the earlier two-stage design is split into a second and a third stage, which reduces the logic in the second stage and helps raise the clock frequency. The decode stage checks the source and destination registers of the current instruction against the destination registers of instructions still in the pipeline, and uses the result to control the instruction flow into the execution stage: if a dependency exists, the pipeline is stalled; otherwise the instruction in the decode stage is issued to the execution stage. This ensures functionally correct execution under out-of-order commit. The architecture executes long-cycle instructions in parallel and commits them out of order: long-cycle instructions such as load/store and division may execute in parallel with ALU instructions and with other long-cycle instructions when there is no resource conflict, which improves processor performance.

Description

Three-level pipeline architecture based on RISC-V instruction set, processor and data processing method

Technical Field

The invention belongs to the field of low-power-consumption processor core architectures, and relates to a three-level pipeline architecture based on a RISC-V instruction set, a processor and a data processing method.

Background

At present, commercial low-power processor cores are mainly ARM Cortex-M series cores. Their pipelines are generally short, with two or three stages, but such commercial cores suffer from high licensing cost and low performance. Patent application No. 201810933214.2 discloses a two-stage pipeline architecture based on the RISC-V instruction set, which provides a low-power processor architecture in which the first stage performs instruction fetch and the second stage performs decode, execute and write-back. The second stage therefore contains a large amount of logic, which limits the overall clock frequency of the processor.

Disclosure of Invention

To address the problems in the prior art, the invention provides a three-stage pipeline architecture based on the RISC-V instruction set, a processor and a data processing method, in which the logic of the second pipeline stage is reduced, which helps raise the overall clock frequency of the processor.

The invention is realized by the following technical scheme:

a three-level pipeline architecture based on RISC-V instruction set is characterized by comprising an instruction fetching level module, a decoding level module, an execution level module and a register file;

the instruction fetching stage module is used for acquiring a current instruction, receiving an instruction skipping indicating signal, generating a PC value to be accessed according to the current instruction and the instruction skipping indicating signal, and outputting the current instruction and the PC value to the decoding stage module;

the decoding stage module is used for carrying out command analysis on the received current instruction to obtain an instruction analysis indicating signal, a source register rs1, a source register rs2 and a destination register of the current instruction, meanwhile, whether the source register rs1, the source register rs2 or the destination register of the current instruction are related to a destination register of an execution instruction in the execution stage module is judged according to the source register rs1, the source register rs2, the destination register of the current instruction and the received write control and instruction completion indicating signal, if so, the pipeline is halted, if not, the instruction analysis indicating signal is output to the execution stage module, the related indicating signal is output to the execution stage module, and read control is output to a register file; the register file outputs read data 1 and read data 2 to the execution level module;

and the execution stage module is used for generating an operand 1 and an operand 2 according to the related indication signal, the write data, the read data 1 and the read data 2, executing the instruction according to the operand 1, the operand 2 and the instruction analysis indication signal, outputting an instruction completion indication signal to the decoding stage module, outputting an instruction jump indication signal to the instruction fetching stage module, obtaining the write data and the write control, outputting the write data and the write control to the register file, and outputting the write control to the decoding stage module.

Preferably, the instruction fetching stage module comprises a branch prediction unit, a PC generation unit and an instruction buffer;

the PC generating unit generates a PC value to be accessed according to the instruction branch prediction result, the instruction jump indicating signal and the instruction filling state in the instruction buffer; the instruction buffer acquires a current instruction based on the PC value and writes the current instruction into the instruction buffer; the branch prediction unit carries out branch prediction on the instruction based on the current instruction in the instruction buffer, generates an instruction branch prediction result and outputs the instruction branch prediction result to the PC generation unit; the instruction buffer outputs the current instruction to the decoding level module, and the PC generating unit outputs the PC value to the decoding level module.

Preferably, the decoding stage module comprises a command parsing unit, an operand decoding unit and a related judgment circuit;

the command analysis unit is used for receiving the current instruction output by the instruction fetching level module and carrying out command analysis on the current instruction to obtain an instruction analysis indicating signal;

the operand decoding unit is used for receiving the PC value output by the instruction fetching stage module and decoding the instruction to obtain a source register rs1, a source register rs2 and a destination register of the current instruction and outputting the source register rs1, the source register rs2 and the destination register to the related judging circuit;

and the related judging circuit is used for judging whether the source register rs1, the source register rs2 or the destination register of the current instruction is related to the destination register of the execution instruction in the execution stage module or not according to the source register rs1, the source register rs2 and the destination register of the current instruction and the received write control and instruction completion indicating signals, if so, the pipeline is halted, and if not, the instruction analysis indicating signal is output to the execution stage module, the related indicating signal is output to the execution stage module, and the read control is output to the register file.

Further, the related judgment circuit makes five kinds of judgment: whether the source register rs1 of the current instruction is related to the destination register of an executing instruction; whether the source register rs2 of the current instruction is related to the destination register of an executing instruction; whether the destination register of the current instruction is related to the destination register of an executing instruction; whether the x1 register is related to the destination register of an executing instruction; and whether the current instruction is a REM instruction and the executing instruction is a DIV instruction such that the two form a DIV/REM instruction sequence.

Further, two registers are judged to be related when they are the same register.

Preferably, the execution stage module comprises an operand generation circuit, an execution unit and a write control circuit;

the operand generating circuit is used for receiving the related indication signals output by the decoding stage module, the read data 1 and the read data 2 output by the register file and the write data output by the write control circuit, obtaining the operand 1 and the operand 2 after processing and outputting the operand 1 and the operand 2 to the execution unit;

the execution unit is used for executing the instruction according to the received instruction analysis indicating signal, the operand 1 and the operand 2, outputting the obtained operation result and the instruction completion indicating signal to the write control circuit, outputting the instruction completion indicating signal to the decoding stage module, and outputting the instruction jump indicating signal to the instruction fetching stage module;

and the write control circuit is used for outputting write data according to the received operation result and the instruction completion indicating signal, outputting the write control to the register file, outputting the write control to the decoding stage module, and outputting the write data to the operand generation circuit.

Furthermore, the write control circuit adopts an idle-slot insertion strategy for writing the destination registers of long-cycle instructions that are committed out of order.

Preferably, corresponding inter-stage registers are arranged among the instruction-fetching stage module, the decoding stage module and the execution stage module, and are respectively an instruction-fetching stage register, a decoding stage register and an execution stage register.

Preferably, the three-stage pipeline formed by the instruction fetching stage module, the decoding stage module and the execution stage module advances between stages either in a global pipeline hold mode or in a valid/ready handshake mode between adjacent pipeline stages.

A processor comprising said RISC-V instruction set based three-stage pipeline architecture.

A data processing method is based on the three-stage pipeline architecture based on the RISC-V instruction set, and comprises the following three stages of pipelines:

instruction fetch stage: acquiring a current instruction, receiving an instruction jump indicating signal, generating a PC value to be accessed according to the current instruction and the instruction jump indicating signal, and outputting the current instruction and the PC value to the decoding stage;

decoding stage: performing command analysis on the received current instruction to obtain an instruction analysis indicating signal and the source register rs1, the source register rs2 and the destination register of the current instruction; meanwhile, judging, according to the source register rs1, the source register rs2 and the destination register of the current instruction and the received write control and instruction completion indicating signals, whether the source register rs1, the source register rs2 or the destination register of the current instruction is related to the destination register of an executing instruction in the execution stage; if so, stalling the pipeline; if not, outputting the instruction analysis indicating signal and the related indicating signal to the execution stage and outputting read control to the register file; the register file outputs read data 1 and read data 2 to the execution stage;

execution stage: generating an operand 1 and an operand 2 according to the related indicating signal, the write data, the read data 1 and the read data 2, executing the instruction according to the operand 1, the operand 2 and the instruction analysis indicating signal, outputting an instruction completion indicating signal to the decoding stage, outputting an instruction jump indicating signal to the instruction fetch stage, obtaining write data and write control, outputting the write data and the write control to the register file, and outputting the write control to the decoding stage.

Compared with the prior art, the invention has the following beneficial technical effects:

Based on the three-stage pipeline architecture of the RISC-V instruction set, the invention splits the original second pipeline stage into a second and a third stage, which reduces the logic of the second stage and helps raise the clock frequency. The decoding stage module provides a decode circuit that supports parallel execution of long-cycle instructions and controls the instruction flow into the execution stage. The destination registers of instructions that are executing but not yet committed are recorded dynamically, in real time, as the pipeline advances. The source and destination registers of the current instruction are checked against these in-flight destination registers to control the instruction flow into the execution stage: if a dependency exists, the pipeline is stalled; otherwise the instruction in the decoding stage is issued to the execution stage. This ensures functionally correct execution under out-of-order commit. The architecture executes long-cycle instructions in parallel and commits them out of order: long-cycle instructions such as load/store and division may execute in parallel with ALU instructions and with other long-cycle instructions when there is no resource conflict, which improves processor performance. The pipeline architecture is straightforward to implement, reduces the cost of the processor core architecture, has simple control logic, is general and extensible, and allows the number of instructions executing in parallel to be chosen to match the hardware execution resources actually available in the execution stage.

Further, the execution stage provides register write control logic that supports out-of-order commit. When a long-cycle instruction finishes executing and needs to write its destination register, an idle-slot insertion strategy is used: the write is performed in a cycle in which no other register write takes place. This strategy reduces the pipeline stall cycles that would otherwise be added for long-cycle instruction writes and speeds up instruction execution.

Drawings

FIG. 1 is a block diagram of a three-level pipeline architecture based on the RISC-V instruction set according to the present invention;

FIG. 2 shows the inter-stage advancement modes: (a) global pipeline advancement, (b) valid/ready handshake advancement between pipeline stages;

FIG. 3 is a correlation determination circuit;

FIG. 4 is a write control circuit.

Detailed Description

The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.

The three-stage pipeline architecture based on the RISC-V instruction set is described as follows. First, as shown in FIG. 1, the architecture includes an instruction fetch stage module, a decode stage module and an execution stage module. The instruction fetch stage module mainly comprises a branch prediction unit, a PC (program counter) generation unit, an instruction buffer and an instruction fetch access interface; the decode stage module mainly comprises a command analysis unit, an operand decoding unit and a related judgment circuit; the execution stage module mainly comprises an operand generation circuit, an execution unit (LD/ST queue, divider, multiplier, ALU, CSR), a data access interface and a write control circuit.

Second, inter-stage registers are arranged among the instruction fetch stage module, the decode stage module and the execution stage module of the three-stage pipeline, namely an instruction fetch stage register, a decode stage register and an execution stage register. Advancement between the three pipeline stages can use either a global pipeline hold mode or a valid/ready handshake mode between adjacent stages, as shown in FIG. 2. FIG. 2(a) shows global advancement: the three pipeline stages advance together under a global pipeline control signal. FIG. 2(b) shows valid/ready handshake advancement: when the current stage has completed its work and the next stage can accept new work, the work of the current stage is transferred to the next stage.
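The two advancement modes can be contrasted with a small behavioral model. The following is a minimal sketch in Python, not the patented circuit: the `Stage` class, the `busy` flag and both step functions are hypothetical names introduced for illustration, and each stage is assumed to hold at most one instruction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """One pipeline stage holding at most one item (hypothetical model)."""
    name: str
    item: Optional[str] = None  # instruction held by the stage, None if empty
    busy: bool = False          # True while the stage is still working on item

    @property
    def valid(self) -> bool:
        # A stage offers its item downstream only once its work is done.
        return self.item is not None and not self.busy


def step_global(stages: List[Stage], incoming: Optional[str], hold: bool) -> Optional[str]:
    """Global mode: every stage advances together unless `hold` is asserted."""
    if hold:
        return None
    retired = stages[-1].item
    for i in range(len(stages) - 1, 0, -1):
        stages[i].item = stages[i - 1].item
    stages[0].item = incoming
    return retired


def step_handshake(stages: List[Stage], incoming: Optional[str]) -> Optional[str]:
    """Handshake mode: an item moves only if its stage is valid and the next is ready."""
    retired = None
    last = stages[-1]
    if last.valid:
        retired, last.item = last.item, None
    # Walk backwards so a slot freed this cycle can be refilled this cycle,
    # while each item still moves by at most one stage.
    for i in range(len(stages) - 2, -1, -1):
        cur, nxt = stages[i], stages[i + 1]
        if cur.valid and nxt.item is None:
            nxt.item, cur.item = cur.item, None
    if incoming is not None and stages[0].item is None:
        stages[0].item = incoming
    return retired
```

In the handshake model a busy execution stage blocks only the stages behind it that have nowhere to go, whereas the global model freezes all three stages at once.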

The PC generating unit in the instruction fetching level module is connected with the instruction fetching access interface, the branch prediction unit and the instruction buffer; the instruction fetching access interface is connected with the PC generating unit and the instruction buffer, and the branch prediction unit is connected with the PC generating unit and the instruction buffer. The PC generating unit generates a PC value to be accessed according to the instruction branch prediction result, the instruction jump indicating signal and the instruction filling state in the instruction buffer, the instruction buffer acquires an instruction through an instruction fetching access interface and a bus based on the PC value, and writes the instruction into the instruction buffer; the branch prediction unit carries out branch prediction on the instruction based on the instruction in the instruction buffer, generates an instruction branch prediction result and transmits the instruction branch prediction result to the PC generation unit; the instruction buffer and PC generation unit takes the instruction and the PC value as the input of the instruction fetching stage register and transmits the input to the decoding stage module.
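The text names the three inputs of the PC generating unit but does not state how they are prioritized. The following Python sketch shows one plausible selection order, purely as an assumption: a jump indication from the execution stage wins, a full instruction buffer holds the fetch address, a predicted target comes next, and sequential fetch is the default. The function name, its parameters and the fixed `step` of 4 bytes are all hypothetical.

```python
from typing import Optional


def next_pc(pc: int,
            jump_target: Optional[int],
            predicted_target: Optional[int],
            buffer_full: bool,
            step: int = 4) -> int:
    """Choose the next fetch address (assumed priority, not from the source)."""
    if jump_target is not None:       # redirect signalled by the execution stage
        return jump_target
    if buffer_full:                   # no room in the buffer: hold the address
        return pc
    if predicted_target is not None:  # branch predicted taken
        return predicted_target
    return pc + step                  # sequential fetch
```

A real design that supports 16-bit compressed instructions would also need to vary the sequential increment, which this sketch leaves out.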

Third, the output of the instruction fetch stage register is connected to the command analysis unit and the operand decoding unit. The command analysis unit parses the 32-bit or 16-bit instruction and outputs the instruction analysis indicating signal as an input of the decode stage register. The operand decoding unit decodes the 32-bit or 16-bit instruction and outputs the source register rs1, the source register rs2 and the destination register to the inputs of the related judgment circuit. The write control signal output by the execution stage register and the instruction completion indicating signal output by the execution unit in the execution stage module are also inputs of the related judgment circuit. The related judgment circuit outputs the read control signal as an input of the register file, and the related indicating signal as an input of the decode stage register.

Fourthly, the related indication signal of the decoding stage register, the read data 1 and the read data 2 output by the register file and the write data of the write control circuit of the execution stage are used as the input of an operand generating circuit, the output of the operand generating circuit is used as the input of an execution unit, the instruction analysis indication signal of the decoding stage register is used as the input of the execution unit, the output operation result and the instruction completion indication signal of the execution unit are used as the input of a write control circuit, the output instruction jump indication signal of the execution unit is used as the input of a PC generating unit, and the output write data and the write control of the write control circuit are connected with the input of the register file and are also used as the input of the execution stage register.

Fifth, the inputs of the register file are the write data and write control output by the write control circuit and the read control output by the related judgment circuit; the read data 1 and the read data 2 output by the register file are connected to the inputs of the operand generation circuit. For writes, the write operation completes at the next rising clock edge after the write address and write data are presented; for reads, the read operation completes at the next rising clock edge after the read address is presented.

Sixth, given the execution units of the execution stage, the execution stage pipeline can hold at the same time up to a fixed number (2 in this embodiment) of Load/Store instructions, 1 DIV instruction, 1 REM instruction, and the 1 single-cycle ALU, MUL or CSR instruction of the current execution stage. Under this long-cycle instruction execution strategy, register dependency conflicts and resource conflicts must be resolved to keep the execution semantics correct. Resource conflicts introduced by the execution stage require pipeline control in cooperation with the execution stage; for example, if a DIV instruction is already in the pipeline and another DIV instruction arrives, the later DIV instruction can be accepted only after the earlier one has finished executing. The following analysis focuses on register dependency conflicts.
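The resource-conflict rule described above can be sketched as a capacity check per instruction class. This is an illustrative Python model only: the `CAPACITY` table, the class names `"ldst"`, `"div"`, `"rem"` and the function name are assumptions, with the limit of 2 load/store instructions taken from the embodiment.

```python
from collections import Counter

# Per-class capacity of the execution stage. The value 2 for load/store follows
# the embodiment; the table itself is a hypothetical representation.
CAPACITY = {"ldst": 2, "div": 1, "rem": 1}


def has_resource_conflict(kind: str, in_flight: Counter) -> bool:
    """Return True if an instruction of class `kind` must wait in decode."""
    limit = CAPACITY.get(kind)
    if limit is None:
        # Classes without an entry (ALU, MUL, CSR here) are not tracked
        # as long-cycle instructions in this sketch.
        return False
    return in_flight[kind] >= limit
```

For example, with one DIV already executing, a second DIV reports a conflict while a load/store is still accepted until two are in flight.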

Register dependencies mainly comprise RAW and WAW dependencies. For RAW (read after write), the decode stage checks whether the source register rs1 or rs2 of the current instruction is the same as the destination register rd of an instruction executing in the execution stage; if so, a dependency exists and no inter-stage transfer takes place, that is, the pipeline is stalled. To improve instruction execution efficiency, data bypassing is used. For a long-cycle instruction, the register dependency is released in the cycle in which the instruction writes rd, and the registered operation result can be accessed directly in the next cycle. For WAW (write after write), because the processor commits instructions out of order, it is also necessary to check whether the destination register rd of the current instruction is the same as the destination register rd of an instruction executing in the execution stage; if so, a dependency exists and the pipeline must stall and wait, and the dependency is released in a similar way to the RAW case. Based on this register dependency analysis, together with the DIV/REM instruction sequence check and the x1-based instruction jump (jal) in the instruction fetch stage, the related judgment circuit of the decode stage is as shown in FIG. 3 and covers the following five cases:

Before the five cases are explained, the signals used are defined as follows:

rs1: source register rs1 of the current instruction;

rs2: source register rs2 of the current instruction;

rd: destination register rd of the current instruction;

x1: the x1 register;

ld1_rd: rd of the load sequence 1 instruction;

ld1_valid: indicates that the load sequence 1 instruction is currently executing;

ld2_rd: rd of the load sequence 2 instruction;

ld2_valid: indicates that the load sequence 2 instruction is currently executing;

div_rd: rd of the division instruction;

div_valid: indicates that a division instruction is currently executing;

rem_rd: rd of the REM instruction;

rem_valid: indicates that the REM instruction is currently executing;

rf_rd: write address of the register file;

rf_rd_we: write enable of the register file;

longinst: indicates a non-long-cycle instruction that writes to the register file;

de_rf_rd: destination register of the current instruction;

de_rf_rd_we: write enable for the destination register of the current instruction.

(1) Data dependency between rs1 of the current instruction and an execution-stage instruction that writes rd. The decision logic is:

{ld1_rd|5{ld1_valid}==rs1}|

{ld2_rd|5{ld2_valid}==rs1}|

{div_rd|5{div_valid}==rs1}|

{rem_rd|5{rem_valid}==rs1}|

{rf_rd|5{rf_rd_we&&not longinst}==rs1}

(2) Data dependency between rs2 of the current instruction and an execution-stage instruction that writes rd. The decision logic is:

{ld1_rd|5{ld1_valid}==rs2}|

{ld2_rd|5{ld2_valid}==rs2}|

{div_rd|5{div_valid}==rs2}|

{rem_rd|5{rem_valid}==rs2}|

{rf_rd|5{rf_rd_we&&not longinst}==rs2}

(3) Data dependency between rd of the current instruction and a multi-cycle instruction in the execution stage that writes rd and has not yet finished executing. The decision logic is:

{ld1_rd|5{ld1_valid}==rd}|

{ld2_rd|5{ld2_valid}==rd}|

{div_rd|5{div_valid}==rd}|

{rem_rd|5{rem_valid}==rd}

(4) Data dependency between x1 and an execution-stage instruction that writes rd. The decision logic is:

{ld1_rd|5{ld1_valid}==x1}|

{ld2_rd|5{ld2_valid}==x1}|

{div_rd|5{div_valid}==x1}|

{rem_rd|5{rem_valid}==x1}|

{rf_rd|5{rf_rd_we&&not longinst}==x1}|

{de_rf_rd|5{de_rf_rd_we}==x1}

(5) The current instruction is a REM instruction and a DIV instruction is executing in the execution stage, such that the two form a DIV/REM instruction sequence. For the sequence DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2, the two instructions must use the same source register indices in the same order, and the index of the DIV result register rdq must differ from the indices of rs1 and rs2. In that case the REM instruction is issued directly, and the execution stage obtains the REM result directly from the DIV result.
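The intent of the five cases can be sketched behaviorally in Python. This is a reading of the logic above under stated assumptions, not the circuit of FIG. 3: a register counts as related only when the tracked slot's valid flag is set, the `Slot` class and function names are hypothetical, `rf_write` stands for the register selected by the rf_rd / rf_rd_we term when that term is active, and the extra de_rf_rd term of case (4) is omitted. Whether a match leads to a stall or to a bypass, and any special handling of x0, is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """One tracked long-cycle instruction (ld1, ld2, div or rem)."""
    rd: int        # destination register index
    valid: bool    # True while the instruction is still executing
    rs1: int = 0   # source indices, only needed for the DIV slot
    rs2: int = 0


def reg_related(reg: int, slots: List[Slot], rf_write: Optional[int] = None) -> bool:
    """Cases (1), (2) and (4): does `reg` match a pending or current rd write?"""
    pending = any(s.valid and s.rd == reg for s in slots)
    return pending or rf_write == reg


def rd_related(rd: int, slots: List[Slot]) -> bool:
    """Case (3): WAW check against unfinished long-cycle instructions only."""
    return any(s.valid and s.rd == rd for s in slots)


def div_rem_pair(div: Slot, rem_rs1: int, rem_rs2: int) -> bool:
    """Case (5): the REM can reuse the result of the executing DIV."""
    same_sources = (div.rs1, div.rs2) == (rem_rs1, rem_rs2)
    rdq_clear = div.rd not in (div.rs1, div.rs2)
    return div.valid and same_sources and rdq_clear
```

Calling `reg_related` with rs1, rs2 or the constant 1 (for x1) covers cases (1), (2) and (4) with the same function, which mirrors how the three logic expressions differ only in the compared register.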

Seventh, the write control circuit of the execution stage outputs write data and write control, where the write control comprises the write enable and the write address. An idle-slot insertion strategy is used for writing the destination registers of long-cycle instructions committed out of order: the destination register of a long-cycle instruction is written in a cycle in which no other register write takes place. This strategy reduces the pipeline stall cycles that would otherwise be added for long-cycle instruction writes and speeds up instruction execution. The specific logic structure is shown in FIG. 4.

The write enable of the write control is the output of mux1. The write enable of the current execution-stage instruction drives the select terminal of mux1; the constant 1 is connected to terminal 1 of mux1, and terminal 0 of mux1 is connected to the output of mux2. The signal indicating that a long-cycle instruction on which a dependency exists has completed drives the select terminal of mux2; the constant 1 is connected to terminal 1 of mux2, and terminal 0 of mux2 is connected to the output of mux3. The signal indicating that a completed long-cycle instruction exists drives the select terminal of mux3; the constant 1 is connected to terminal 1 of mux3, and the constant 0 is connected to terminal 0 of mux3.

The write address is the output of mux4. The write enable of the current execution-stage instruction drives the select terminal of mux4; the rd of the current execution-stage instruction is connected to terminal 1 of mux4, and terminal 0 of mux4 is connected to the output of mux5. The signal indicating that a long-cycle instruction on which a dependency exists has completed drives the select terminal of mux5; the rd of that completed long-cycle instruction is connected to terminal 1 of mux5, and the rd of a completed long-cycle instruction, chosen according to a fixed order, is connected to terminal 0 of mux5.

The write data is the output of mux6. The write enable of the current execution-stage instruction drives the select terminal of mux6; the write data of the current execution-stage instruction is connected to terminal 1 of mux6, and terminal 0 of mux6 is connected to the output of mux7. The signal indicating that a long-cycle instruction on which a dependency exists has completed drives the select terminal of mux7; the write data of that completed long-cycle instruction is connected to terminal 1 of mux7, and the write data of a completed long-cycle instruction, chosen according to the same fixed order, is connected to terminal 0 of mux7.
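The three mux chains described above share one priority order, which can be sketched as a single selection function. This Python model is an illustration under assumptions: the `Write` tuple and `select_write` are hypothetical names, and the "certain sequence" for completed long-cycle instructions is modeled as first-in, first-out order, which the source does not specify.

```python
from typing import List, NamedTuple, Optional


class Write(NamedTuple):
    rd: int    # register-file write address
    data: int  # register-file write data


def select_write(current: Optional[Write],
                 dependent_done: Optional[Write],
                 done_queue: List[Write]) -> Optional[Write]:
    """Pick the single register-file write for this cycle.

    Priority mirrors the mux chains:
      1. the current execution-stage instruction (mux1/mux4/mux6 select),
      2. a completed long-cycle instruction that another instruction
         depends on (mux2/mux5/mux7 select),
      3. any other completed long-cycle instruction, oldest first (assumed).
    Returns None when the write enable would be 0.
    """
    if current is not None:
        return current
    if dependent_done is not None:
        return dependent_done
    if done_queue:
        # Idle slot: no other write this cycle, so a waiting result is inserted.
        return done_queue.pop(0)
    return None
```

A completed long-cycle result therefore stays queued while the current instruction writes, and is written in the first cycle in which the write port is otherwise idle.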

The invention discloses a processor, which comprises a three-level pipeline architecture based on a RISC-V instruction set.

The invention also provides a data processing method, which is based on the three-stage pipeline architecture based on the RISC-V instruction set and comprises the following three-stage pipeline:

The invention has been applied to a RISC-V processor core and has completed FPGA verification. The architecture has the advantages of clear implementation method, simple control logic, strong universality and expandability, and suitability for the design of low-power-consumption and miniaturized processor cores.

Claims

1. A three-level pipeline architecture based on RISC-V instruction set is characterized by comprising an instruction fetching level module, a decoding level module, an execution level module and a register file;

2. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein the instruction fetch stage module comprises a branch prediction unit, a PC generation unit and an instruction buffer;

3. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein the decode stage module comprises a command parsing unit, an operand decoding unit and a dependent decision circuit;

4. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 3, wherein the dependent decision circuit makes five kinds of judgment: whether the source register rs1 of the current instruction is related to the destination register of an executing instruction; whether the source register rs2 of the current instruction is related to the destination register of an executing instruction; whether the destination register of the current instruction is related to the destination register of an executing instruction; whether the x1 register is related to the destination register of an executing instruction; and whether the current instruction is a REM instruction and the executing instruction is a DIV instruction such that the two form a DIV/REM instruction sequence.

5. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein the execution stage module comprises operand generation circuitry, an execution unit and write control circuitry;

6. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 5, wherein the write control circuit employs an idle-slot insertion strategy for writing the destination registers of long-cycle instructions that are committed out of order.

7. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein corresponding inter-stage registers are arranged among the instruction fetch stage module, the decode stage module and the execution stage module, namely an instruction fetch stage register, a decode stage register and an execution stage register.

8. The RISC-V instruction set based three-stage pipeline architecture of claim 1, wherein the three-stage pipeline consisting of the fetch stage module, the decode stage module and the execute stage module advances between stages using either a global pipeline hold mode or a valid/ready handshake mode between adjacent pipeline stages.

9. A processor comprising a RISC-V instruction set based three-stage pipeline architecture according to any of claims 1-8.

10. A data processing method, characterized in that a RISC-V instruction set based three-stage pipeline architecture according to any of claims 1-9, comprising a three-stage pipeline: