CN113946368A - Three-level pipeline architecture based on RISC-V instruction set, processor and data processing method - Google Patents


Info

Publication number
CN113946368A
Authority
CN
China
Prior art keywords
instruction
stage
register
execution
outputting
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111275421.1A
Other languages
Chinese (zh)
Other versions
CN113946368B (en
Inventor
赵翠华
张海金
杨靓
娄冕
李红桥
李磊
罗敏涛
黄巾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Application filed by Xian Microelectronics Technology Institute
Priority to CN202111275421.1A
Publication of CN113946368A
Application granted
Publication of CN113946368B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30069 Instruction skipping instructions, e.g. SKIP
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a three-stage pipeline architecture based on the RISC-V instruction set, comprising an instruction fetch stage module, a decode stage module, an execution stage module and a register file. The second stage of the original two-stage pipeline is split into a second and a third stage, which reduces the logic in the second stage and allows a higher clock frequency. The decode stage checks the source and destination registers of the current instruction for correlation (dependency) against the destination registers of instructions already in the pipeline, and thereby controls the instruction flow reaching the execution stage: if a correlation exists, the pipeline stalls; if not, the decode stage instruction is sent to the execution stage. This ensures functionally correct execution under out-of-order commit. The architecture executes long-cycle instructions in parallel and commits them out of order, so that long-cycle instructions with longer execution times, such as load/store and division, can execute in parallel with ALU instructions and other long-cycle instructions when there is no resource conflict, which improves processor performance.

Description

Three-level pipeline architecture based on RISC-V instruction set, processor and data processing method
Technical Field
The invention belongs to the field of low-power-consumption processor core architectures, and relates to a three-level pipeline architecture based on a RISC-V instruction set, a processor and a data processing method.
Background
At present, commercial low-power processor cores are mainly Cortex-M series cores of the ARM architecture. Their pipelines are generally short, with 2 or 3 stages, but these commercial cores have high licensing costs and low performance. Patent application No. 201810933214.2 discloses a two-stage pipeline architecture based on the RISC-V instruction set, which provides a low-power processor architecture. In that two-stage pipeline, the first stage performs instruction fetch and the second stage performs decode, execute and write-back; the second stage therefore contains a large amount of logic, which hinders raising the overall clock frequency of the processor.
Disclosure of Invention
To address the problems in the prior art, the invention provides a three-stage pipeline architecture based on the RISC-V instruction set, a processor and a data processing method, in which the logic of the second pipeline stage is reduced, which helps raise the overall clock frequency of the processor.
The invention is realized through the following technical solution:
A three-stage pipeline architecture based on the RISC-V instruction set, comprising an instruction fetch stage module, a decode stage module, an execution stage module and a register file;
the instruction fetch stage module is configured to acquire the current instruction, receive an instruction jump indication signal, generate the PC value to be accessed according to the current instruction and the instruction jump indication signal, and output the current instruction and the PC value to the decode stage module;
the decode stage module is configured to parse the received current instruction to obtain an instruction analysis indication signal together with source register rs1, source register rs2 and the destination register of the current instruction; at the same time, based on rs1, rs2 and the destination register of the current instruction and on the received write control and instruction completion indication signal, it judges whether rs1, rs2 or the destination register of the current instruction is correlated with the destination register of an instruction being executed in the execution stage module; if so, the pipeline stalls; if not, it outputs the instruction analysis indication signal and the correlation indication signal to the execution stage module and outputs read control to the register file; the register file outputs read data 1 and read data 2 to the execution stage module;
and the execution stage module is configured to generate operand 1 and operand 2 according to the correlation indication signal, the write data, read data 1 and read data 2; execute the instruction according to operand 1, operand 2 and the instruction analysis indication signal; output the instruction completion indication signal to the decode stage module and the instruction jump indication signal to the instruction fetch stage module; and obtain the write data and write control, outputting both to the register file and outputting the write control to the decode stage module.
Preferably, the instruction fetch stage module comprises a branch prediction unit, a PC generation unit and an instruction buffer;
the PC generation unit generates the PC value to be accessed according to the instruction branch prediction result, the instruction jump indication signal and the instruction fill state of the instruction buffer; the instruction buffer fetches the current instruction based on the PC value and stores it; the branch prediction unit performs branch prediction on the current instruction in the instruction buffer, generates the instruction branch prediction result and outputs it to the PC generation unit; the instruction buffer outputs the current instruction to the decode stage module, and the PC generation unit outputs the PC value to the decode stage module.
Preferably, the decode stage module comprises a command parsing unit, an operand decoding unit and a correlation judgment circuit;
the command parsing unit is configured to receive the current instruction output by the instruction fetch stage module and parse it to obtain the instruction analysis indication signal;
the operand decoding unit is configured to receive the PC value output by the instruction fetch stage module, decode the instruction to obtain source register rs1, source register rs2 and the destination register of the current instruction, and output them to the correlation judgment circuit;
and the correlation judgment circuit is configured to judge, based on rs1, rs2 and the destination register of the current instruction and on the received write control and instruction completion indication signal, whether rs1, rs2 or the destination register of the current instruction is correlated with the destination register of an instruction being executed in the execution stage module; if so, the pipeline stalls; if not, it outputs the instruction analysis indication signal and the correlation indication signal to the execution stage module and outputs the read control to the register file.
Further, the judgments made by the correlation judgment circuit fall into the following five cases: whether source register rs1 of the current instruction is correlated with the destination register of the executing instruction; whether source register rs2 of the current instruction is correlated with the destination register of the executing instruction; whether the destination register of the current instruction is correlated with the destination register of the executing instruction; whether the x1 register is correlated with the destination register of the executing instruction; and whether the current instruction is a REM instruction while the executing instruction is a DIV instruction, such that the pair matches the DIV/REM instruction sequence.
Further, two registers are judged to be correlated if they are the same register.
Preferably, the execution stage module comprises an operand generation circuit, an execution unit and a write control circuit;
the operand generation circuit is configured to receive the correlation indication signal output by the decode stage module, read data 1 and read data 2 output by the register file, and the write data output by the write control circuit, and to process them into operand 1 and operand 2, which it outputs to the execution unit;
the execution unit is configured to execute the instruction according to the received instruction analysis indication signal, operand 1 and operand 2, output the resulting operation result and the instruction completion indication signal to the write control circuit, output the instruction completion indication signal to the decode stage module, and output the instruction jump indication signal to the instruction fetch stage module;
and the write control circuit is configured to generate the write data according to the received operation result and instruction completion indication signal, output the write data and write control to the register file, output the write control to the decode stage module, and output the write data to the operand generation circuit.
Further, the write control circuit uses an idle-slot insertion strategy for writing the destination register of a long-cycle instruction that commits out of order.
Preferably, corresponding inter-stage registers are arranged among the instruction fetch stage module, the decode stage module and the execution stage module, namely the instruction fetch stage register, the decode stage register and the execution stage register.
Preferably, inter-stage advance in the three-stage pipeline formed by the instruction fetch stage module, the decode stage module and the execution stage module uses either a global pipeline hold mode or a valid/ready handshake between pipeline stages.
A processor comprising said RISC-V instruction set based three-stage pipeline architecture.
A data processing method based on the above three-stage pipeline architecture based on the RISC-V instruction set, comprising the following three pipeline stages:
instruction fetch stage: acquiring the current instruction, receiving an instruction jump indication signal, generating the PC value to be accessed according to the current instruction and the instruction jump indication signal, and outputting the current instruction and the PC value to the decode stage;
decode stage: parsing the received current instruction to obtain an instruction analysis indication signal together with source register rs1, source register rs2 and the destination register of the current instruction; at the same time, judging, based on rs1, rs2 and the destination register of the current instruction and on the received write control and instruction completion indication signal, whether rs1, rs2 or the destination register of the current instruction is correlated with the destination register of an instruction being executed in the execution stage; if so, stalling the pipeline; otherwise, outputting the instruction analysis indication signal and the correlation indication signal to the execution stage and outputting read control to the register file; the register file outputs read data 1 and read data 2 to the execution stage;
execution stage: generating operand 1 and operand 2 according to the correlation indication signal, the write data, read data 1 and read data 2; executing the instruction according to operand 1, operand 2 and the instruction analysis indication signal; outputting the instruction completion indication signal to the decode stage and the instruction jump indication signal to the instruction fetch stage; and obtaining the write data and write control, outputting both to the register file and outputting the write control to the decode stage.
Compared with the prior art, the invention has the following beneficial technical effects:
In the three-stage pipeline architecture based on the RISC-V instruction set, the second stage of the original pipeline is split into a second and a third stage, which reduces the logic in the second stage and helps raise the clock frequency. The decode stage module provides a decoding circuit that supports parallel execution of long-cycle instructions and controls the instruction flow reaching the execution stage. Based on the dynamic pipeline, the destination registers of instructions that are executing but not yet committed are recorded dynamically in real time. The source and destination registers of the current instruction are checked for correlation against these destination registers in the pipeline to control the instruction flow reaching the execution stage: if a correlation exists, the pipeline stalls; if not, the decode stage instruction is sent to the execution stage. This ensures functionally correct execution under out-of-order commit. The architecture executes long-cycle instructions in parallel and commits them out of order, so that long-cycle instructions with longer execution times, such as load/store and division, can execute in parallel with ALU instructions and other long-cycle instructions when there is no resource conflict, which improves processor performance. The pipeline architecture is straightforward to implement, reduces the cost of the processor core architecture, has simple control logic, and is general and extensible: the number of instructions executed in parallel can be set to match the hardware execution resources actually available in the execution stage.
Further, the execution stage provides register write control logic that supports out-of-order commit. When a long-cycle instruction finishes executing and must write its destination register, an insertion strategy is used: the destination register of the long-cycle instruction is written in a cycle in which no other register write takes place. This insertion strategy reduces the pipeline stall cycles that would otherwise be needed for long-cycle instruction writes and speeds up instruction execution.
Drawings
FIG. 1 is a block diagram of a three-level pipeline architecture based on the RISC-V instruction set according to the present invention;
FIG. 2 shows the inter-stage advance modes: (a) global pipeline inter-stage advance, (b) handshake-based pipeline inter-stage advance;
FIG. 3 is the correlation judgment circuit;
FIG. 4 is the write control circuit.
Detailed Description
The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.
The three-stage pipeline architecture based on the RISC-V instruction set is described as follows. First, as shown in FIG. 1, the architecture comprises an instruction fetch stage module, a decode stage module and an execution stage module. The instruction fetch stage module mainly comprises a branch prediction unit, a PC (program counter) generation unit, an instruction buffer and an instruction fetch access interface; the decode stage module mainly comprises a command parsing unit, an operand decoding unit and a correlation judgment circuit; the execution stage module mainly comprises an operand generation circuit, an execution unit (LD/ST queue, divider, multiplier, ALU, CSR), a data access interface and a write control circuit.
Second, corresponding inter-stage registers are arranged among the instruction fetch stage module, the decode stage module and the execution stage module of the three-stage pipeline, namely the instruction fetch stage register, the decode stage register and the execution stage register. Inter-stage advance in the three-stage pipeline can use either a global pipeline hold mode or a valid/ready handshake between pipeline stages, as shown in FIG. 2. FIG. 2(a) shows global inter-stage advance: all three pipeline stages advance together under a single global pipeline control signal. FIG. 2(b) shows valid/ready handshake advance: when the current stage has completed its work and the next stage can accept new work, the work of the current stage is transferred to the next stage.
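The two advance modes can be contrasted with a small behavioural model. The Python sketch below is illustrative only and is not taken from the patent: the `StageReg` class and both step functions are hypothetical names, and the rule that a stage is "ready" when it is empty or is itself draining in the same cycle is an assumption about how the handshake is wired.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageReg:
    """One inter-stage register with a valid flag (hypothetical model)."""
    valid: bool = False
    payload: Optional[str] = None


def global_hold_step(regs: List[StageReg], new_instr: Optional[str], hold: bool) -> bool:
    """Global mode: every stage advances together unless the single hold signal is set."""
    if hold:
        return False
    for i in reversed(range(1, len(regs))):
        regs[i] = StageReg(regs[i - 1].valid, regs[i - 1].payload)
    regs[0] = StageReg(new_instr is not None, new_instr)
    return new_instr is not None


def handshake_step(regs: List[StageReg], new_instr: Optional[str], sink_ready: bool) -> bool:
    """Handshake mode: a stage passes its work on only if the next stage is ready.

    Assumption: a stage is ready when it is empty or when it hands its own
    work downstream in this cycle. Returns True if new_instr was accepted.
    """
    n = len(regs)
    ready = [False] * n
    downstream_ready = sink_ready
    for i in reversed(range(n)):
        ready[i] = (not regs[i].valid) or downstream_ready
        downstream_ready = ready[i]
    # Update from the last stage backwards so each copy reads the old upstream value.
    for i in reversed(range(n)):
        if ready[i]:
            if i == 0:
                regs[i] = StageReg(new_instr is not None, new_instr)
            else:
                regs[i] = StageReg(regs[i - 1].valid, regs[i - 1].payload)
    return ready[0] and new_instr is not None
```

In the handshake model a stalled last stage back-pressures earlier stages only once they are full, whereas the global hold freezes all three stages at once.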
In the instruction fetch stage module, the PC generation unit is connected to the instruction fetch access interface, the branch prediction unit and the instruction buffer; the instruction fetch access interface is connected to the PC generation unit and the instruction buffer; and the branch prediction unit is connected to the PC generation unit and the instruction buffer. The PC generation unit generates the PC value to be accessed according to the instruction branch prediction result, the instruction jump indication signal and the instruction fill state of the instruction buffer. Based on the PC value, the instruction buffer fetches an instruction through the instruction fetch access interface and the bus and stores it. The branch prediction unit performs branch prediction on the instruction in the instruction buffer, generates the instruction branch prediction result and passes it to the PC generation unit. The instruction buffer and the PC generation unit supply the instruction and the PC value as inputs to the instruction fetch stage register, which passes them to the decode stage module.
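As a rough illustration of how the three inputs of the PC generation unit might be combined, the Python sketch below selects the next PC. The priority order (execution-stage jump first, then buffer-full hold, then predicted target, then sequential) is an assumption and is not stated in the text; the function name, its parameters and the 32-bit PC width are likewise hypothetical.

```python
def next_pc(pc: int,
            instr_len: int,
            buffer_full: bool,
            jump_valid: bool,
            jump_target: int,
            pred_taken: bool,
            pred_target: int) -> int:
    """Pick the next fetch address (assumed priority, 32-bit PC assumed)."""
    mask = 0xFFFFFFFF
    if jump_valid:
        # Instruction jump indication signal from the execution stage redirects fetch.
        return jump_target & mask
    if buffer_full:
        # Instruction buffer cannot take more instructions: keep the current PC.
        return pc & mask
    if pred_taken:
        # Branch prediction unit predicts a taken branch.
        return pred_target & mask
    # Sequential fetch: 4 bytes for a 32-bit instruction, 2 for a 16-bit one.
    return (pc + instr_len) & mask
```

For example, `next_pc(0x100, 4, False, False, 0, False, 0)` gives `0x104`, while an asserted jump overrides both the prediction and a full buffer.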
Third, the output of the instruction fetch stage register is connected to the command parsing unit and the operand decoding unit. The command parsing unit parses the 32-bit or 16-bit instruction and outputs the instruction analysis indication signal as an input to the decode stage register. The operand decoding unit decodes the 32-bit or 16-bit instruction and outputs source register rs1, source register rs2 and the destination register, which are connected to the inputs of the correlation judgment circuit. The write control signal output by the execution stage register and the instruction completion indication signal output by the execution unit in the execution stage module are also inputs to the correlation judgment circuit. The correlation judgment circuit outputs the read control signal as an input to the register file and the correlation indication signal as an input to the decode stage register.
Fourth, the correlation indication signal from the decode stage register, read data 1 and read data 2 output by the register file, and the write data from the write control circuit of the execution stage are the inputs of the operand generation circuit, whose output is an input of the execution unit. The instruction analysis indication signal from the decode stage register is also an input of the execution unit. The operation result and instruction completion indication signal output by the execution unit are inputs of the write control circuit, and the instruction jump indication signal output by the execution unit is an input of the PC generation unit. The write data and write control output by the write control circuit are connected to the inputs of the register file and are also inputs of the execution stage register.
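The operand generation circuit can be pictured as a pair of bypass multiplexers. The Python sketch below is a minimal illustration under the assumption that the indication signal from the decode stage register carries one flag per source operand and that a set flag selects the write data instead of the register-file read data; the text lists the inputs but does not spell out this encoding, and the names used here are hypothetical.

```python
from typing import NamedTuple, Tuple


class BypassIndication(NamedTuple):
    """Assumed per-operand form of the indication signal from the decode stage register."""
    rs1_hit: bool  # operand 1 should take the value being written this cycle
    rs2_hit: bool  # operand 2 should take the value being written this cycle


def generate_operands(ind: BypassIndication,
                      read_data1: int,
                      read_data2: int,
                      write_data: int) -> Tuple[int, int]:
    """Select each operand from the bypassed write data or the register-file read data."""
    operand1 = write_data if ind.rs1_hit else read_data1
    operand2 = write_data if ind.rs2_hit else read_data2
    return operand1, operand2
```

With no flag set the operands are simply read data 1 and read data 2; with `rs1_hit` set, operand 1 is forwarded from the write control circuit.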
Fifth, the inputs of the register file are the write data and write control output by the write control circuit and the read control output by the correlation judgment circuit; read data 1 and read data 2 output by the register file are connected to the inputs of the operand generation circuit. For writes, the write operation completes at the next rising clock edge after the write address and write data are presented; for reads, the read operation completes at the next rising clock edge after the read address is presented, at which point the read data is available.
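The timing described above can be modelled with a small clocked register file. The Python sketch below is an assumption-laden illustration: the class and method names are hypothetical, the 32-bit register width is assumed, and the choice that a read presented in the same cycle as a write to the same register returns the old value is an assumption (it is one reason a bypass path is useful). The hard-wired zero of x0 follows the RISC-V specification.

```python
class RegisterFile:
    """Synchronous register file: reads and writes take effect at the next rising edge."""

    def __init__(self, num_regs: int = 32) -> None:
        self.regs = [0] * num_regs
        self.read_data1 = 0
        self.read_data2 = 0
        self._pending_read = (0, 0)
        self._pending_write = None

    def present(self, raddr1: int, raddr2: int,
                we: bool = False, waddr: int = 0, wdata: int = 0) -> None:
        """Drive read control and write control for the current cycle."""
        self._pending_read = (raddr1, raddr2)
        self._pending_write = (waddr, wdata) if we else None

    def rising_edge(self) -> None:
        """Complete the presented read and write at the clock edge."""
        raddr1, raddr2 = self._pending_read
        # Assumption: reads sample the array before this edge's write lands.
        self.read_data1 = self.regs[raddr1]
        self.read_data2 = self.regs[raddr2]
        if self._pending_write is not None:
            waddr, wdata = self._pending_write
            if waddr != 0:  # x0 is hard-wired to zero in RISC-V
                self.regs[waddr] = wdata & 0xFFFFFFFF
            self._pending_write = None
```

Presenting a write to x5 and a read of x5 in the same cycle returns the old contents after the first edge; the new value is visible to a read presented one cycle later.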
Sixth, given the execution units of the execution stage, the execution stage pipeline can hold at most a fixed number (2 in this embodiment) of Load/Store instructions, 1 DIV instruction, 1 REM instruction, and the 1 single-cycle ALU/MUL/CSR instruction currently in the execution stage. Under this execution strategy for long-cycle instructions, register correlation conflicts and resource conflicts must be resolved to keep the execution semantics of long-cycle instructions correct. Resource conflicts introduced by the execution stage require pipeline control in cooperation with the execution stage: for example, if a DIV instruction is in the pipeline and another DIV instruction arrives, the later DIV instruction can be accepted only after the earlier one has finished executing. The analysis below focuses on register correlation conflicts.
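The resource limits of this embodiment can be expressed as a simple capacity check. The Python sketch below is illustrative only: the `CAPACITY` table mirrors the numbers given in the text (2 Load/Store, 1 DIV, 1 REM), while the function name, the string tags for instruction kinds and the dictionary of in-flight counts are hypothetical.

```python
from typing import Dict

# Long-cycle capacities from the embodiment described in the text.
CAPACITY: Dict[str, int] = {"ldst": 2, "div": 1, "rem": 1}

# Single-cycle kinds occupy only the current execution-stage slot.
SINGLE_CYCLE = {"alu", "mul", "csr"}


def has_free_resource(kind: str, in_flight: Dict[str, int]) -> bool:
    """Return True if the execution stage has a free unit for this instruction kind.

    in_flight maps a long-cycle kind to the number of such instructions
    that are executing but have not yet completed.
    """
    if kind in SINGLE_CYCLE:
        return True
    return in_flight.get(kind, 0) < CAPACITY[kind]
```

A second DIV arriving while one DIV is in flight fails the check and must wait, whereas a second Load/Store is still accepted. Register correlation is checked separately.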
Register correlation (dependency) handling mainly covers RAW and WAW correlation. For RAW (read after write), the decode stage checks whether source register rs1/rs2 of the current instruction is the same as the destination register rd of an instruction executing in the execution stage; if so, a correlation exists and the instruction is not transferred to the next stage, i.e. the pipeline stalls. To improve instruction execution efficiency, data bypassing is used. For a long-cycle instruction, the register correlation is released in the cycle in which the long-cycle instruction writes rd, and the registered operation result can be accessed directly in the next cycle. For WAW (write after write), because the processor pipeline commits out of order, the decode stage must also check whether the destination register rd of the current instruction is the same as the destination register rd of an instruction executing in the execution stage; if so, a correlation exists and the pipeline must stall and wait. The correlation is released in a way similar to the RAW case. Based on the above register correlation analysis, together with the DIV/REM instruction sequence judgment and the x1-based instruction jump of jal in the instruction fetch stage, the decode stage correlation judgment circuit is as shown in FIG. 3 and covers the following five cases.
Before the five cases are described, the signals used are defined as follows:
rs1: source register rs1 of the current instruction;
rs2: source register rs2 of the current instruction;
rd: destination register rd of the current instruction;
x1: the x1 register;
ld1_rd: rd of the load sequence 1 instruction;
ld1_valid: indicates that the load sequence 1 instruction is currently executing;
ld2_rd: rd of the load sequence 2 instruction;
ld2_valid: indicates that the load sequence 2 instruction is currently executing;
div_rd: rd of the division instruction;
div_valid: indicates that a division instruction is currently executing;
rem_rd: rd of the REM instruction;
rem_valid: indicates that the REM instruction is currently executing;
rf_rd: write address of the register file;
rf_rd_we: write enable of the register file;
longinst: represents a non-long-cycle instruction that writes to the register file;
de_rf_rd: destination register of the current instruction;
de_rf_rd_we: write enable for the destination register of the current instruction.
(1) Judging whether rs1 of the current instruction has a data correlation with an instruction in the execution stage that writes rd. The judgment logic is:
{ld1_rd|5{ld1_valid}==rs1}|
{ld2_rd|5{ld2_valid}==rs1}|
{div_rd|5{div_valid}==rs1}|
{rem_rd|5{rem_valid}==rs1}|
{rf_rd|5{rf_rd_we&&not longinst}==rs1}
(2) Judging whether rs2 of the current instruction has a data correlation with an instruction in the execution stage that writes rd. The judgment logic is:
{ld1_rd|5{ld1_valid}==rs2}|
{ld2_rd|5{ld2_valid}==rs2}|
{div_rd|5{div_valid}==rs2}|
{rem_rd|5{rem_valid}==rs2}|
{rf_rd|5{rf_rd_we&&not longinst}==rs2}
(3) Judging whether rd of the current instruction has a data correlation with a multi-cycle instruction in the execution stage that writes rd and has not finished executing. The judgment logic is:
{ld1_rd|5{ld1_valid}==rd}|
{ld2_rd|5{ld2_valid}==rd}|
{div_rd|5{div_valid}==rd}|
{rem_rd|5{rem_valid}==rd}
(4) Judging whether x1 has a data correlation with an instruction in the execution stage that writes rd. The judgment logic is:
{ld1_rd|5{ld1_valid}==x1}|
{ld2_rd|5{ld2_valid}==x1}|
{div_rd|5{div_valid}==x1}|
{rem_rd|5{rem_valid}==x1}|
{rf_rd|5{rf_rd_we&&not longinst}==x1}|
{de_rf_rd|5{de_rf_rd_we}==x1}
(5) The current instruction is a REM instruction and a DIV instruction is executing in the execution stage, so that the pair matches the DIV/REM instruction sequence. For the sequence DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2, the source register indices of the two instructions and their order must be identical, and the index of the DIV result register rdq must not equal the index of rs1 or rs2. In that case the REM instruction is issued directly, and the execution stage obtains the REM result directly from the DIV computation.
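The five cases can be summarised in executable form. The Python sketch below is one reading of the judgment logic, not the patent's circuit: each term is interpreted as "the unit is valid and its rd equals the register under test", and `longinst` is assumed to be high when the register-file write in that cycle belongs to a long-cycle instruction, so that `rf_rd_we and not longinst` marks a write by a non-long-cycle instruction. The requirement that DIV and REM agree in signedness is read from the DIV[U]/REM[U] notation. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

X1 = 1  # index of the x1 register


@dataclass
class InFlight:
    """Valid flag and rd of one long-cycle unit (ld1, ld2, div or rem)."""
    valid: bool = False
    rd: int = 0


def long_op_hit(reg: int, long_ops: Sequence[InFlight]) -> bool:
    """True if any executing long-cycle instruction will write reg."""
    return any(op.valid and op.rd == reg for op in long_ops)


def rf_write_hit(reg: int, rf_rd: int, rf_rd_we: bool, longinst: bool) -> bool:
    """True if a non-long-cycle instruction writes reg to the register file this cycle."""
    return rf_rd_we and not longinst and rf_rd == reg


def correlation_cases(rs1: int, rs2: int, rd: int,
                      long_ops: Sequence[InFlight],
                      rf_rd: int, rf_rd_we: bool, longinst: bool,
                      de_rf_rd: int, de_rf_rd_we: bool) -> Dict[str, bool]:
    """Evaluate cases (1) to (4) for the current decode-stage instruction."""
    return {
        "rs1": long_op_hit(rs1, long_ops) or rf_write_hit(rs1, rf_rd, rf_rd_we, longinst),
        "rs2": long_op_hit(rs2, long_ops) or rf_write_hit(rs2, rf_rd, rf_rd_we, longinst),
        # Case (3) checks only the long-cycle units, as in the listed logic.
        "rd": long_op_hit(rd, long_ops),
        "x1": (long_op_hit(X1, long_ops)
               or rf_write_hit(X1, rf_rd, rf_rd_we, longinst)
               or (de_rf_rd_we and de_rf_rd == X1)),
    }


def div_rem_sequence(div_unsigned: bool, div_rd: int, div_rs1: int, div_rs2: int,
                     rem_unsigned: bool, rem_rs1: int, rem_rs2: int) -> bool:
    """Case (5): REM may reuse the in-flight DIV when the pair forms the DIV/REM sequence."""
    return (div_unsigned == rem_unsigned
            and div_rs1 == rem_rs1
            and div_rs2 == rem_rs2
            and div_rd not in (div_rs1, div_rs2))
```

A decode stage built on this sketch would act on whichever of the four flags applies to the current instruction and would let a matching REM through when `div_rem_sequence` holds.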
Seventh, the write control circuit outputs write data and write control based on the execution stage, the write control comprising write enable and write data. For writes to the destination registers of long-cycle instructions that are delivered out of order, a gap-insertion strategy is adopted: the destination register of a long-cycle instruction is written in a cycle in which no other register write operation takes place. This strategy reduces the pipeline stall cycles that would otherwise have to be added for long-cycle instruction writes and speeds up instruction execution. The specific logic structure is shown in FIG. 4.
The write enable of the write control is connected to the output of mux1. The write enable of the current instruction in the execution stage is connected to the select terminal of mux1, constant 1 is connected to terminal 1 of mux1, and terminal 0 of mux1 is connected to the output of mux2. The signal indicating that a dependency exists and the related long-cycle instruction has completed is connected to the select terminal of mux2, constant 1 is connected to terminal 1 of mux2, and terminal 0 of mux2 is connected to the output of mux3. The signal indicating that a completed long-cycle instruction exists is connected to the select terminal of mux3, constant 1 is connected to terminal 1 of mux3, and constant 0 is connected to terminal 0 of mux3.
The write address is connected to the output of mux4. The write enable of the current instruction in the execution stage is connected to the select terminal of mux4, rd of the current instruction in the execution stage is connected to terminal 1 of mux4, and terminal 0 of mux4 is connected to the output of mux5. The signal indicating that a dependency exists and the related long-cycle instruction has completed is connected to the select terminal of mux5, rd of the related completed long-cycle instruction is connected to terminal 1 of mux5, and rd of the completed long-cycle instruction selected according to a certain order is connected to terminal 0 of mux5.
The write data is connected to the output of mux6. The write enable of the current instruction in the execution stage is connected to the select terminal of mux6, the write data of the current instruction in the execution stage is connected to terminal 1 of mux6, and terminal 0 of mux6 is connected to the output of mux7. The signal indicating that a dependency exists and the related long-cycle instruction has completed is connected to the select terminal of mux7, the write data of the related completed long-cycle instruction is connected to terminal 1 of mux7, and the write data of the completed long-cycle instruction selected according to a certain order is connected to terminal 0 of mux7.
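The priority chain formed by mux1 to mux7 can be sketched as follows. This is an illustrative Python model under stated assumptions, not the circuit of FIG. 4 itself. The function and parameter names (mux, write_control, cur_*, dep_*, seq_*) are hypothetical, with dep_* standing for the related completed long-cycle instruction and seq_* for the completed long-cycle instruction chosen according to a certain order.

```python
def mux(sel, in1, in0):
    """Two-input multiplexer: terminal 1 when sel is true, otherwise terminal 0."""
    return in1 if sel else in0


def write_control(cur_we, cur_rd, cur_data,
                  dep_done, dep_rd, dep_data,
                  any_done, seq_rd, seq_data):
    """Return (write_enable, write_address, write_data) for one cycle.

    Priority: the current execution-stage instruction first, then a completed
    long-cycle instruction that something depends on, then any other completed
    long-cycle instruction, which fills an otherwise idle write slot.
    """
    # mux1 <- mux2 <- mux3: write enable
    write_en = mux(cur_we, True, mux(dep_done, True, mux(any_done, True, False)))
    # mux4 <- mux5: write address
    write_addr = mux(cur_we, cur_rd, mux(dep_done, dep_rd, seq_rd))
    # mux6 <- mux7: write data
    write_data = mux(cur_we, cur_data, mux(dep_done, dep_data, seq_data))
    return write_en, write_addr, write_data
```

When the current instruction does not write a register, a completed long-cycle result takes the free write port in the same cycle, which is what avoids an extra stall. When write enable is false, the address and data outputs of this model carry no meaning.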
The invention also discloses a processor comprising the three-stage pipeline architecture based on the RISC-V instruction set.
The invention also provides a data processing method based on the above three-stage pipeline architecture based on the RISC-V instruction set, comprising the following three pipeline stages:
instruction fetch stage pipeline: acquiring a current instruction, receiving an instruction jump indication signal, generating the PC value to be accessed according to the current instruction and the instruction jump indication signal, and outputting the current instruction and the PC value to the decoding stage pipeline;
decoding stage pipeline: parsing the received current instruction to obtain an instruction analysis indication signal and the source register rs1, source register rs2 and destination register of the current instruction; meanwhile, judging, according to the source register rs1, the source register rs2 and the destination register of the current instruction and the received write control and instruction completion indication signal, whether the source register rs1, the source register rs2 or the destination register of the current instruction is related to the destination register of the instruction being executed in the execution stage pipeline; if so, stalling the pipeline; otherwise, outputting the instruction analysis indication signal and the related indication signal to the execution stage pipeline and outputting read control to the register file; the register file outputs read data 1 and read data 2 to the execution stage pipeline;
execution stage pipeline: generating operand 1 and operand 2 according to the related indication signal, the write data, read data 1 and read data 2; executing the instruction according to operand 1, operand 2 and the instruction analysis indication signal; outputting an instruction completion indication signal to the decoding stage pipeline and an instruction jump indication signal to the instruction fetch stage pipeline; and obtaining write data and write control, outputting the write data and the write control to the register file, and outputting the write control to the decoding stage pipeline.
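The operand generation step of the execution stage can be illustrated with a short Python sketch. It rests on an assumption the text does not spell out: that the related indication signal for a source register selects the write data in place of the register file read data. The name generate_operands and its parameters are hypothetical.

```python
def generate_operands(rs1_related, rs2_related, write_data, read_data1, read_data2):
    """Select each operand from the write data or from the register file read data.

    Assumption: a set related flag means the value being written this cycle
    is the one the source register needs, so it is used directly.
    """
    operand1 = write_data if rs1_related else read_data1
    operand2 = write_data if rs2_related else read_data2
    return operand1, operand2
```

For example, generate_operands(True, False, 99, 1, 2) yields (99, 2): operand 1 takes the value being written, while operand 2 comes from the register file.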
The invention has been applied to a RISC-V processor core and has completed FPGA verification. The architecture has the advantages of clear implementation method, simple control logic, strong universality and expandability, and suitability for the design of low-power-consumption and miniaturized processor cores.

Claims (10)

1. A three-level pipeline architecture based on RISC-V instruction set is characterized by comprising an instruction fetching level module, a decoding level module, an execution level module and a register file;
the instruction fetching stage module is used for acquiring a current instruction, receiving an instruction jump indicating signal, generating a PC value to be accessed according to the current instruction and the instruction jump indicating signal, and outputting the current instruction and the PC value to the decoding stage module;
the decoding stage module is used for carrying out command analysis on the received current instruction to obtain an instruction analysis indicating signal, a source register rs1, a source register rs2 and a destination register of the current instruction, meanwhile, whether the source register rs1, the source register rs2 or the destination register of the current instruction are related to a destination register of an execution instruction in the execution stage module is judged according to the source register rs1, the source register rs2, the destination register of the current instruction and the received write control and instruction completion indicating signal, if so, the pipeline is halted, if not, the instruction analysis indicating signal is output to the execution stage module, the related indicating signal is output to the execution stage module, and read control is output to a register file; the register file outputs read data 1 and read data 2 to the execution level module;
and the execution stage module is used for generating an operand 1 and an operand 2 according to the related indication signal, the write data, the read data 1 and the read data 2, executing the instruction according to the operand 1, the operand 2 and the instruction analysis indication signal, outputting an instruction completion indication signal to the decoding stage module, outputting an instruction jump indication signal to the instruction fetching stage module, obtaining the write data and the write control, outputting the write data and the write control to the register file, and outputting the write control to the decoding stage module.
2. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein the instruction fetch stage module comprises a branch prediction unit, a PC generation unit and an instruction buffer;
the PC generating unit generates a PC value to be accessed according to the instruction branch prediction result, the instruction jump indicating signal and the instruction filling state in the instruction buffer; the instruction buffer acquires a current instruction based on the PC value and writes the current instruction into the instruction buffer; the branch prediction unit carries out branch prediction on the instruction based on the current instruction in the instruction buffer, generates an instruction branch prediction result and outputs the instruction branch prediction result to the PC generation unit; the instruction buffer outputs the current instruction to the decoding level module, and the PC generating unit outputs the PC value to the decoding level module.
3. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein the decode stage module comprises a command parsing unit, an operand decoding unit and a dependent decision circuit;
the command analysis unit is used for receiving the current instruction output by the instruction fetching level module and carrying out command analysis on the current instruction to obtain an instruction analysis indicating signal;
the operand decoding unit is used for receiving the PC value output by the instruction fetching stage module and decoding the instruction to obtain a source register rs1, a source register rs2 and a destination register of the current instruction and outputting the source register rs1, the source register rs2 and the destination register to the related judging circuit;
and the related judging circuit is used for judging whether the source register rs1, the source register rs2 or the destination register of the current instruction is related to the destination register of the execution instruction in the execution stage module or not according to the source register rs1, the source register rs2 and the destination register of the current instruction and the received write control and instruction completion indicating signals, if so, the pipeline is halted, and if not, the instruction analysis indicating signal is output to the execution stage module, the related indicating signal is output to the execution stage module, and the read control is output to the register file.
4. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 3, wherein the judgments made by the dependent decision circuit are divided into five cases: a judgment of whether the source register rs1 of the current instruction is related to the destination register of the execution instruction; a judgment of whether the source register rs2 of the current instruction is related to the destination register of the execution instruction; a judgment of whether the destination register of the current instruction is related to the destination register of the execution instruction; a judgment of whether the x1 register is related to the register written by the execution instruction; and, when the current instruction is a REM instruction and the execution instruction is a DIV instruction, a judgment of whether the two conform to a DIV/REM instruction sequence.
5. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein the execution stage module comprises operand generation circuitry, an execution unit and write control circuitry;
the operand generating circuit is used for receiving the related indication signals output by the decoding stage module, the read data 1 and the read data 2 output by the register file and the write data output by the write control circuit, obtaining the operand 1 and the operand 2 after processing and outputting the operand 1 and the operand 2 to the execution unit;
the execution unit is used for executing the instruction according to the received instruction analysis indicating signal, the operand 1 and the operand 2, outputting the obtained operation result and the instruction completion indicating signal to the write control circuit, outputting the instruction completion indicating signal to the decoding stage module and outputting the instruction jump indicating signal to the instruction fetching stage module;
and the write control circuit is used for outputting write data according to the received operation result and the instruction completion indicating signal, outputting the write control to the register file, outputting the write control to the decoding stage module, and outputting the write data to the operand generation circuit.
6. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 5, wherein the write control circuit employs a gap-insertion strategy for writing the destination registers of long-cycle instructions that are delivered out of order.
7. A RISC-V instruction set based three-stage pipeline architecture as claimed in claim 1, wherein corresponding inter-stage registers are provided for the instruction fetching stage module, the decoding stage module and the execution stage module, namely an instruction fetch stage register, a decode stage register and an execution stage register, respectively.
8. The RISC-V instruction set based three-stage pipeline architecture of claim 1, wherein advancement between the stages of the three-stage pipeline formed by the instruction fetching stage module, the decoding stage module and the execution stage module uses either a global pipeline hold signal or a valid/ready handshake between pipeline stages.
9. A processor comprising a RISC-V instruction set based three-stage pipeline architecture according to any of claims 1-8.
10. A data processing method, characterized in that it is based on a RISC-V instruction set based three-stage pipeline architecture according to any of claims 1-9 and comprises the following three pipeline stages:
instruction fetch stage pipeline: acquiring a current instruction, receiving an instruction jump indicating signal, generating a PC value to be accessed according to the current instruction and the instruction jump indicating signal, and outputting the current instruction and the PC value to the decoding stage pipeline;
decoding stage pipeline: parsing the received current instruction to obtain an instruction analysis indicating signal and the source register rs1, source register rs2 and destination register of the current instruction; meanwhile, judging, according to the source register rs1, the source register rs2 and the destination register of the current instruction and the received write control and instruction completion indicating signal, whether the source register rs1, the source register rs2 or the destination register of the current instruction is related to the destination register of the execution instruction in the execution stage pipeline; if so, stalling the pipeline; otherwise, outputting the instruction analysis indicating signal and the related indicating signal to the execution stage pipeline and outputting read control to the register file; the register file outputs read data 1 and read data 2 to the execution stage pipeline;
execution stage pipeline: generating an operand 1 and an operand 2 according to the related indicating signal, the write data, the read data 1 and the read data 2; executing the instruction according to the operand 1, the operand 2 and the instruction analysis indicating signal; outputting an instruction completion indicating signal to the decoding stage pipeline and an instruction jump indicating signal to the instruction fetch stage pipeline; and obtaining write data and write control, outputting the write data and the write control to the register file, and outputting the write control to the decoding stage pipeline.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111275421.1A CN113946368B (en) 2021-10-29 2021-10-29 Three-stage pipeline architecture, processor and data processing method based on RISC-V instruction set

Publications (2)

Publication Number Publication Date
CN113946368A (en) 2022-01-18
CN113946368B (en) 2024-04-30

Family

ID=79337210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275421.1A Active CN113946368B (en) 2021-10-29 2021-10-29 Three-stage pipeline architecture, processor and data processing method based on RISC-V instruction set

Country Status (1)

Country Link
CN (1) CN113946368B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881194A (en) * 2023-09-01 2023-10-13 腾讯科技(深圳)有限公司 Processor, data processing method and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4206062A1 (en) * 1991-03-01 1992-09-03 Mitsubishi Electric Corp Pipeline data processor for instructions - has source operands stored in registers and data in memory for execution using ALU circuits via selector circuit
EP1770507A2 (en) * 2005-09-30 2007-04-04 Fujitsu Ltd. Pipeline processing based on RISC architecture
CN103984530A (en) * 2014-05-15 2014-08-13 中国航天科技集团公司第九研究院第七七一研究所 Assembly line structure and method for improving execution efficiency of store command
CN108287730A (en) * 2018-03-14 2018-07-17 武汉市聚芯微电子有限责任公司 A kind of processor pipeline structure
CN109144573A (en) * 2018-08-16 2019-01-04 胡振波 Two-level pipeline framework based on RISC-V instruction set
CN109918130A (en) * 2019-01-24 2019-06-21 中山大学 A kind of four level production line RISC-V processors with rapid data bypass structure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
折如义;李炳辉;姜佩贺;: "三级流水线RISC-V处理器设计与验证", 电子技术应用, no. 05, 6 May 2020 (2020-05-06), pages 50 - 55 *
邓天传;胡振波;: "一种超低功耗的RISC-V处理器流水线结构", 电子技术应用, no. 06, 6 June 2019 (2019-06-06), pages 56 - 59 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881194A (en) * 2023-09-01 2023-10-13 腾讯科技(深圳)有限公司 Processor, data processing method and computer equipment
CN116881194B (en) * 2023-09-01 2023-12-22 腾讯科技(深圳)有限公司 Processor, data processing method and computer equipment

Also Published As

Publication number Publication date
CN113946368B (en) 2024-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant