CN116974633A - Three-stage pipeline micro-architecture for risc-v processor - Google Patents

Info

Publication number
CN116974633A
Authority
CN
China
Prior art keywords
instruction
address
register
unit
csr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310931116.6A
Other languages
Chinese (zh)
Inventor
乔树山
王建超
游恒
尚德龙
周玉梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Intelligent Technology Research Institute
Original Assignee
Zhongke Nanjing Intelligent Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Nanjing Intelligent Technology Research Institute
Priority to CN202310931116.6A
Publication of CN116974633A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The invention discloses a three-stage pipeline micro-architecture for RISC-V processors, belonging to the technical field of integrated circuit structures, comprising an instruction fetch unit, a decoding unit and an execution unit. The instruction fetch unit fetches instructions from its instruction memory and sends them to the decoding unit; the decoding unit parses each instruction and sends the parsed instruction to the execution unit; the execution unit performs the corresponding operation according to the parsed instruction, reads the required data from its data memory while performing the operation, completes the memory access in the same clock cycle after the operation is executed, and then writes the operation result back to the general register in the decoding unit. Compared with the prior art, the invention has a simpler logic structure, meets the performance requirements of edge Internet of Things devices while reducing power consumption and area, and is therefore better suited to the processor requirements of the edge Internet of Things market.

Description

Three-stage pipeline micro-architecture for risc-v processor
Technical Field
The invention relates to a three-stage pipeline micro-architecture for a RISC-V processor, belonging to the technical field of integrated circuit structures.
Background
Increasing the number of pipeline stages consumes more registers and therefore costs more area. In addition, a branch misprediction is handled by flushing the pipeline, so the resulting processor delay grows as the number of pipeline stages increases.
The early classic pipeline was a five-stage pipeline consisting of instruction fetch, decode, execute, memory access, and write-back. Modern processors often have much deeper pipelines, with ten, twenty, or more stages, so that they can run at higher clock frequencies. However, edge Internet of Things devices often do not need such high performance; instead they place stricter requirements on low power consumption and small area. A processor with lower power consumption and smaller area is therefore required, provided its performance is sufficient.
Disclosure of Invention
The invention aims to provide a three-stage pipeline micro-architecture for a RISC-V processor that addresses the high power consumption and large area of processors used in edge Internet of Things devices in the prior art.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
the invention provides a three-stage pipeline micro-architecture for RISC-V processors, comprising: an instruction fetch unit, a decoding unit and an execution unit;
the instruction fetch unit fetches instructions from its instruction memory and sends the instructions to the decoding unit;
the decoding unit parses the instruction and sends the parsed instruction to the execution unit;
the execution unit performs the corresponding operation according to the parsed instruction, reads the required data from its data memory while performing the operation, completes the memory access in the same clock cycle after the operation is executed, and writes the operation result back to the general register in the decoding unit after the operation is executed.
Further, the instruction fetch unit comprises an instruction address register and an instruction memory, wherein the output of the instruction address register is connected to the input of the instruction memory, the instruction address register sends an instruction address to the instruction memory, and the instruction memory fetches the corresponding instruction according to the instruction address and sends the instruction to the decoding unit.
Further, the instruction address register holds the value 0 after reset. After sending an instruction address, it monitors whether a jump signal or a stall signal is received: if neither is received, the value of the instruction address register is increased by 4 and the new instruction address is sent; if a jump signal is received, the multiplexer that delivers the jump signal sends the jump address contained in the jump signal to the instruction address register; and if a stall signal is received, the value of the instruction address register remains unchanged.
Further, a cycle variable for counting is provided in the instruction address register, and the cycle variable is incremented on each rising clock edge.
Further, the decoding unit includes an instruction decoder and a general register. The instruction decoder determines the type and operation code of the instruction from the opcode, funct3 and funct7 fields of the instruction and, once parsing is complete, sends the instruction to the execution unit. The general register has two read ports and one write-back port: the read ports output the corresponding operands according to the register indexes in the instruction, and the write-back port receives the operation result.
Further, the execution unit comprises an arithmetic logic operation module, a CSR module, an address generation module and a multiplier-divider;
the address generation module is used for generating a write-back address, a read operation address and a write operation address;
the arithmetic logic operation module is used for performing logic, addition and subtraction, and shift operations according to the parsed instruction, and writing the operation result back to the general register according to the write-back address after the operation is completed;
the multiplier-divider is used for performing multiplication and division operations according to the parsed instruction, and writing the operation result back to the general register according to the write-back address after the operation is completed;
the CSR module comprises a CSR register group and a CSR read-write control module, wherein the CSR register group updates the corresponding CSR register according to the write operation address, the CSR register group returns the value of the corresponding CSR register according to the read operation address, and the CSR read-write control module is used for controlling the read and write operations of the CSR register group.
Furthermore, the execution unit also outputs the opcode of the instruction to the multiplexer as its control signal, so that the multiplexer outputs either the jump address or the sequential (non-jump) instruction address to the instruction address register.
Compared with the prior art, the invention has the following beneficial effects:
the three-stage pipeline micro-architecture for a RISC-V processor provided by the invention adopts a three-stage pipeline structure to achieve the design goals of low power consumption and small area while preserving the functions of the processor, and performs memory access and write-back in the pipeline stage corresponding to the execution unit.
Drawings
FIG. 1 is a schematic diagram of a three-stage pipeline micro-architecture for a RISC-V processor provided by an embodiment of the invention;
FIG. 2 is a schematic illustration of a pipeline provided by an embodiment of the present invention;
FIG. 3 is a simulated waveform diagram of a three-stage pipeline provided by an embodiment of the present invention;
FIG. 4 is a diagram of the RISC-V instruction format provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a portion of the RV32I integer instructions according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a simulation waveform of a decoding unit according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of simulation waveforms of an execution unit according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a verification result provided by an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and the following examples are only for more clearly illustrating the technical aspects of the present invention, and are not to be construed as limiting the scope of the present invention.
As shown in FIG. 2, the present invention provides a three-stage pipeline micro-architecture for a RISC-V processor comprising an instruction fetch unit, a decoding unit and an execution unit. The instruction fetch unit fetches instructions from its instruction memory and sends them to the decoding unit; the decoding unit parses each instruction and sends the parsed instruction to the execution unit; the execution unit performs the corresponding operation according to the parsed instruction, reads the required data from its data memory while performing the operation, completes the memory access in the same clock cycle after the operation is executed, and writes the operation result back to the general register in the decoding unit after the operation is executed.
As shown in FIG. 1, the three-stage pipeline includes:
The first stage of the pipeline is instruction fetch, IF (Instruction Fetch), the process of fetching an instruction from the instruction memory (Instruction Memory).
The second stage of the pipeline is instruction decode, ID (Instruction Decode), the process of parsing the instruction read from the instruction memory.
The third stage of the pipeline is instruction execute, EX (Instruction Execute), the process of performing the specific operation according to the parsed instruction, including reading the data needed for the operation from the data memory (Data Memory) and saving the result of the operation to memory or to a register.
As shown in FIG. 1, this embodiment uses a 32-bit RISC-V processor, and the three-stage pipeline micro-architecture is implemented as follows. First, in the instruction fetch stage, the program counter PC (Program Counter) holds the address of the instruction to be executed; the instruction is read from the instruction memory (Instruction Memory) into the IF_ID register, the PC value is also passed to the IF_ID register, and the PC advances by 4. Second, in the instruction decode stage, the instruction content is decoded to identify the specific instruction and the registers it involves, for example whether a register needs to be written and which one. If the instruction reads the register file (RegFile), the read result is sent to the ID_EX register; if it contains an immediate, the immediate is decoded and sign-extended, and the extended result is sent to the ID_EX register. Finally, in the instruction execute stage, unlike a traditional five-stage pipeline, the execute stage of this design performs not only the operation itself but also the memory access and write-back steps of a traditional processor. The operation performed depends on the current instruction: for an add instruction, the values of Reg1 and Reg2 are added; for a memory load instruction, the memory data at the corresponding address is read; for a jump instruction, a jump signal is issued, which triggers the selector to choose the next PC address. The result of the operation is then written back into the RegFile.
The instruction fetch unit mainly comprises an instruction address register and an instruction memory. The instruction address register stores the instruction address, and the instruction memory stores the instructions to be executed. The output of the instruction address register is connected to the address input of the instruction memory, so that the instruction at the corresponding address can be read out into the IF_ID register. The value of the instruction address register is 0 after reset, and once a clock rising edge arrives, the instruction at address 0 of the instruction memory is read into the IF_ID register. When a jump instruction is encountered in the execute stage, a jump signal is generated; jump instructions are divided into conditional direct jumps, unconditional direct jumps and unconditional indirect jumps. When a divide instruction is encountered in the execute stage, a stall signal is generated. The instruction address register checks whether a jump or stall signal has been received: if neither is received, the value of the instruction address register is increased by 4 and the above operation is repeated; if a jump signal is detected, the multiplexer outputs the jump address to the instruction address register; and if a stall signal is detected, the value of the instruction address register remains unchanged.
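The next-address selection described above can be sketched in Python as a small behavioural model. It is an illustration only, not the RTL of the embodiment: the function name `next_pc` and its parameters are hypothetical, and giving the jump priority over the stall is an assumption, since the text does not say which signal wins when both are asserted.

```python
RESET_PC = 0x0000_0000   # the text states the register holds 0 after reset
INSTR_BYTES = 4          # each 32-bit instruction occupies 4 bytes
MASK32 = 0xFFFF_FFFF


def next_pc(pc: int, jump: bool = False, jump_addr: int = 0, stall: bool = False) -> int:
    """Return the instruction address register value after the next clock edge."""
    if jump:
        # Assumption: a jump takes priority over a stall.
        return jump_addr & MASK32
    if stall:
        # Hold the current address, e.g. while a divide is in progress.
        return pc
    # Sequential fetch: advance by one instruction, wrapping at 32 bits.
    return (pc + INSTR_BYTES) & MASK32


pc = RESET_PC
pc = next_pc(pc)                              # sequential: 0 -> 4
pc = next_pc(pc, stall=True)                  # stalled: stays at 4
pc = next_pc(pc, jump=True, jump_addr=0x40)   # jump taken: 4 -> 0x40
```

Modelling the register as a pure function of its current value and the two control signals mirrors the multiplexer in front of the instruction address register.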
The decoding unit mainly comprises an instruction decoder and a general register. The 32-bit base integer instructions of the RISC-V processor have six formats; the lowest seven bits of each instruction are the operation code (opcode), of which the lowest two bits are the instruction indication code. An instruction also contains the operand register addresses rs1 and rs2 and an immediate imm, that is, three operands are provided. The instruction decoder determines the type and operation code of the instruction from the opcode, funct3 and funct7 fields of the instruction Instr[31:0]; the remaining part of the instruction contains the register indexes of the two operands or an immediate, and the two operands are obtained by extending the immediate and by reading the general register. The general register (RegFile) has two read ports and one write-back port. The read ports output the corresponding operands according to the register indexes in the instruction, and the write-back port writes data back to the general register according to the write-back address obtained from the execution unit.
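As a rough illustration of the field extraction performed by the instruction decoder, the following Python sketch slices a 32-bit instruction word into its fields. The bit positions follow the standard RV32I encoding rather than anything stated in this document, and the names `decode_fields` and `sign_extend` are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value & (sign - 1)) - (value & sign)


def decode_fields(instr: int) -> dict:
    """Split a 32-bit RISC-V instruction word into its standard fields."""
    return {
        "opcode": instr & 0x7F,            # Instr[6:0]
        "rd": (instr >> 7) & 0x1F,         # Instr[11:7], write-back register index
        "funct3": (instr >> 12) & 0x7,     # Instr[14:12]
        "rs1": (instr >> 15) & 0x1F,       # Instr[19:15], first read port index
        "rs2": (instr >> 20) & 0x1F,       # Instr[24:20], second read port index
        "funct7": (instr >> 25) & 0x7F,    # Instr[31:25]
        "imm_i": sign_extend(instr >> 20, 12),  # I-type immediate, sign-extended
    }


r_type = decode_fields(0x002081B3)   # add  x3, x1, x2
i_type = decode_fields(0xFFF00093)   # addi x1, x0, -1
```

Only the I-type immediate is reconstructed here; the other immediate formats scatter their bits differently and would each need their own extraction.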
The execution unit comprises an arithmetic logic operation module, a control and status register unit (CSR module), an address generation module and a multiplier-divider, and additionally implements the memory access and write-back functions of a traditional five-stage pipeline processor. The arithmetic logic operation module mainly performs logic, addition and subtraction, and shift operations, and the opcode is used as the control signal of a multiplexer that outputs either the jump address or the sequential address to the instruction address register. In the execute stage, the operation selected by the operation code of the instruction is carried out, and the result is sent to the corresponding module or stored in a register. The CSR module comprises a CSR register group and a CSR read-write control module; the CSR read-write control module carries out CSR read and write instructions, receiving write operations from the execution unit and returning the data read from the registers to the execution unit.
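The three operation classes handled by the arithmetic logic operation module (logic, addition and subtraction, shifts) can be sketched as a 32-bit behavioural model. The function `alu`, the helper `to_signed32` and the operation names are hypothetical labels chosen for illustration; taking the shift amount from the low five bits of the second operand follows the RV32I convention and is not stated in this document.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x8000_0000 else value


def alu(op: str, a: int, b: int) -> int:
    """Compute one ALU result, truncated to 32 bits."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # RV32I uses only the low 5 bits as the shift amount
    operations = {
        "add": lambda: a + b,
        "sub": lambda: a - b,
        "and": lambda: a & b,
        "or": lambda: a | b,
        "xor": lambda: a ^ b,
        "sll": lambda: a << shamt,               # logical left shift
        "srl": lambda: a >> shamt,               # logical right shift
        "sra": lambda: to_signed32(a) >> shamt,  # arithmetic right shift
    }
    if op not in operations:
        raise ValueError(f"unsupported ALU operation: {op}")
    return operations[op]() & MASK32


wrapped = alu("add", 0xFFFF_FFFF, 1)       # overflow wraps to 0
arith = alu("sra", 0x8000_0000, 4)         # sign bit is replicated
```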
The main functions of these modules are as follows:
Program counter (instruction address register): the module has a cycle variable for counting, which is incremented on each rising clock edge.
Write operation of a CSR register: the module updates the corresponding CSR register according to the write address and data received from the execution unit.
Read operation of a CSR register: the module returns the value of the corresponding CSR register according to the read operation address received from the execution unit.
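The CSR read and write operations listed above can be sketched as a small register group addressed by a 12-bit CSR address. This is a simplified model under stated assumptions: the class `CsrFile` and the helper `csr_read_write` are hypothetical, unimplemented registers read as 0 here, and the address 0x340 (`mscratch` in the RISC-V privileged specification) is used only as an example, since this document does not list which CSRs are implemented.

```python
MASK32 = 0xFFFF_FFFF


class CsrFile:
    """Behavioural model of a CSR register group with 12-bit addresses."""

    def __init__(self) -> None:
        self._regs = {}

    def read(self, addr: int) -> int:
        # Assumption: registers that were never written read as 0.
        return self._regs.get(addr & 0xFFF, 0)

    def write(self, addr: int, data: int) -> None:
        self._regs[addr & 0xFFF] = data & MASK32


def csr_read_write(csrs: CsrFile, addr: int, wdata: int) -> int:
    """Read the old CSR value, then write the new one, as a CSRRW-style access would."""
    old = csrs.read(addr)
    csrs.write(addr, wdata)
    return old


MSCRATCH = 0x340  # example address only, taken from the privileged specification
csrs = CsrFile()
first = csr_read_write(csrs, MSCRATCH, 0x1234)   # returns the initial value
second = csr_read_write(csrs, MSCRATCH, 0x5678)  # returns the value written before
```

Returning the old value from the combined access reflects the split in the text: the write path updates the register group, while the read path supplies data back to the execution unit.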
Memory access refers to reading data from, or writing data into, the data memory at the memory address generated when a memory access instruction is executed. Because the processor is designed as a three-stage pipeline, the data memory is accessed in the same clock cycle in which instruction execution finishes, and the operation result is written back to the corresponding component. Write-back is the last step of the pipeline: the operation result of the instruction is written back to the general register file (RegFile).
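To illustrate how address generation, data memory access and write-back collapse into a single execute step, here is a minimal Python sketch. It assumes word-sized accesses only and models the data memory as a dictionary keyed by byte address; the function `execute_mem` and its parameters are hypothetical, and treating register 0 as hard-wired to zero follows the RISC-V convention rather than this document.

```python
MASK32 = 0xFFFF_FFFF


def execute_mem(kind, regs, dmem, rd, base, offset, store_data=0):
    """Perform one load or store in a single step and return the address used."""
    addr = (base + offset) & MASK32          # address generation
    if kind == "load":
        data = dmem.get(addr, 0)             # data memory read (0 if never written)
        if rd != 0:                          # x0 stays zero in RISC-V
            regs[rd] = data                  # write-back in the same step
    elif kind == "store":
        dmem[addr] = store_data & MASK32     # data memory write
    else:
        raise ValueError(f"unsupported access kind: {kind}")
    return addr


regs = [0] * 32
dmem = {}
execute_mem("store", regs, dmem, rd=0, base=0x100, offset=4, store_data=0xDEADBEEF)
execute_mem("load", regs, dmem, rd=5, base=0x100, offset=4)
```

A single function call stands in for the single clock cycle: nothing is latched between computing the address, touching memory and updating the register file.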
As shown in FIG. 3, the three groups of inst_addr_i/inst_i signals, from top to bottom, correspond to the instruction address and instruction content seen by the instruction fetch unit, the decoding unit and the execution unit respectively. After the reset signal is pulled high, the instruction 00000D13 at address 0 is read out, passed to the decoding unit through the IF_ID register at the next rising clock edge, and passed to the execution unit through the ID_EX register at the clock edge after that.
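The stage-by-stage movement visible in the waveform can be reproduced with a short Python model of the two pipeline registers. This is a sketch under simplifying assumptions: no jumps or stalls occur, the names `run_pipeline` and `Slot` are hypothetical, and the two filler instructions after 00000D13 are the standard RISC-V NOP (0x00000013), which is not taken from the figure.

```python
from typing import List, Optional, Tuple

# (instruction address, instruction word), or None for an empty stage
Slot = Optional[Tuple[int, int]]


def run_pipeline(imem: List[int], cycles: int) -> List[dict]:
    """Record what each stage holds in every cycle of a three-stage pipeline."""
    pc = 0                 # instruction address register, 0 after reset
    if_id: Slot = None     # pipeline register between fetch and decode
    id_ex: Slot = None     # pipeline register between decode and execute
    trace = []
    for _ in range(cycles):
        index = pc // 4
        fetched: Slot = (pc, imem[index]) if index < len(imem) else None
        trace.append({"IF": fetched, "ID": if_id, "EX": id_ex})
        # Rising clock edge: every instruction advances by one stage.
        id_ex = if_id
        if_id = fetched
        if fetched is not None:
            pc += 4
    return trace


program = [0x00000D13, 0x00000013, 0x00000013]
trace = run_pipeline(program, cycles=3)
```

In this model the instruction at address 0 appears in the fetch stage in cycle 0, in the decode stage in cycle 1 and in the execute stage in cycle 2, matching the one-edge-per-stage behaviour described for the waveform.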
The instruction 00000D13 (addi) is fetched and decoded. The RISC-V instruction format is shown in FIG. 4, and a portion of the RV32I integer instructions is shown in FIG. 5.
The simulation waveform of the decoding unit is shown in FIG. 6. The opcode is 0010011 and funct3 is 000, which identifies an addi instruction. The RegFile is still empty and the immediate is 0, so the op1 and op2 waveforms are both 0, and the task of the decoding unit is complete.
The simulation waveform of the execution unit is shown in FIG. 7. The operation performed is an addition and both operands are 0, so the result is also 0; reg_wdata is written to address 0x1a of the RegFile, and the task of the execution unit is complete.
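The values read off the waveforms can be checked by hand-decoding the instruction word. The Python sketch below decodes 00000D13 with the standard RV32I I-type layout and performs the addition against an all-zero register file; the function names `run_addi` and `sign_extend_12` are hypothetical, while `op1`, `op2` and `reg_wdata` reuse the signal names mentioned in the text.

```python
def sign_extend_12(value: int) -> int:
    """Sign-extend a 12-bit immediate to a Python integer."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def run_addi(instr: int, regs: list) -> tuple:
    """Decode and execute one addi instruction against the register file."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    if opcode != 0b0010011 or funct3 != 0b000:
        raise ValueError("not an addi instruction")
    rd = (instr >> 7) & 0x1F
    rs1 = (instr >> 15) & 0x1F
    op1 = regs[rs1]                          # first operand from the read port
    op2 = sign_extend_12(instr >> 20)        # second operand from the immediate
    reg_wdata = (op1 + op2) & 0xFFFF_FFFF    # result presented for write-back
    if rd != 0:                              # x0 stays zero in RISC-V
        regs[rd] = reg_wdata
    return rd, op1, op2, reg_wdata


regs = [0] * 32                              # register file still empty after reset
rd, op1, op2, reg_wdata = run_addi(0x00000D13, regs)
print(hex(rd), op1, op2, reg_wdata)          # prints: 0x1a 0 0 0
```

The destination index 0x1a is register x26, so the instruction is `addi x26, x0, 0`, which agrees with the write address and the zero result reported for the execution unit.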
Instruction compatibility is tested with the test programs maintained officially by the RISC-V organization, which verify whether a processor design conforms to the specification. The verification results are shown in FIG. 8; all tests pass.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (7)

1. A three-stage pipeline micro-architecture for a RISC-V processor, comprising: an instruction fetch unit, a decoding unit and an execution unit;
the instruction fetch unit fetches instructions from its instruction memory and sends the instructions to the decoding unit;
the decoding unit parses the instruction and sends the parsed instruction to the execution unit;
the execution unit performs the corresponding operation according to the parsed instruction, reads the required data from its data memory while performing the operation, completes the memory access in the same clock cycle after the operation is executed, and writes the operation result back to the general register in the decoding unit after the operation is executed.
2. The three-stage pipeline micro-architecture for a RISC-V processor according to claim 1, wherein the instruction fetch unit includes an instruction address register and an instruction memory, an output of the instruction address register is connected to an input of the instruction memory, the instruction address register sends an instruction address to the instruction memory, and the instruction memory fetches the corresponding instruction according to the instruction address and sends the instruction to the decoding unit.
3. The three-stage pipeline micro-architecture for a RISC-V processor according to claim 2, wherein the instruction address register holds the value 0 after reset and, after sending an instruction address, monitors whether a jump signal or a stall signal is received; if neither a jump signal nor a stall signal is received, the value of the instruction address register is increased by 4 and the instruction address is sent again; if a jump signal is received, the multiplexer that delivers the jump signal sends the jump address contained in the jump signal to the instruction address register; and if a stall signal is received, the value of the instruction address register remains unchanged.
4. The three-stage pipeline micro-architecture for a RISC-V processor according to claim 2, wherein a cycle variable for counting is provided in the instruction address register, and the cycle variable is incremented on each rising clock edge.
5. The three-stage pipeline micro-architecture for a RISC-V processor according to claim 1, wherein the decoding unit includes an instruction decoder and a general register; the instruction decoder determines the type and operation code of an instruction from the opcode, funct3 and funct7 fields of the instruction, and the parsed instruction is sent to the execution unit; the general register includes two read ports and a write-back port, the read ports output the corresponding operands according to the register indexes in the instruction, and the write-back port receives the operation result.
6. The three-stage pipeline micro-architecture for a RISC-V processor according to claim 1, wherein the execution unit includes an arithmetic logic operation module, a CSR module, an address generation module, and a multiplier-divider;
the address generation module is used for generating a write-back address, a read operation address and a write operation address;
the arithmetic logic operation module is used for performing logic, addition and subtraction, and shift operations according to the parsed instruction, and writing the operation result back to the general register according to the write-back address after the operation is completed;
the multiplier-divider is used for performing multiplication and division operations according to the parsed instruction, and writing the operation result back to the general register according to the write-back address after the operation is completed;
the CSR module comprises a CSR register group and a CSR read-write control module, wherein the CSR register group updates the corresponding CSR register according to the write operation address, the CSR register group returns the value of the corresponding CSR register according to the read operation address, and the CSR read-write control module is used for controlling the read and write operations of the CSR register group.
7. The three-stage pipeline micro-architecture for a RISC-V processor according to claim 1, wherein the execution unit further outputs the opcode of the instruction to a multiplexer as its control signal, causing the multiplexer to output either the jump address or the sequential (non-jump) instruction address to the instruction address register.
CN202310931116.6A 2023-07-27 2023-07-27 Three-stage pipeline micro-architecture for risc-v processor Pending CN116974633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310931116.6A CN116974633A (en) 2023-07-27 2023-07-27 Three-stage pipeline micro-architecture for risc-v processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310931116.6A CN116974633A (en) 2023-07-27 2023-07-27 Three-stage pipeline micro-architecture for risc-v processor

Publications (1)

Publication Number Publication Date
CN116974633A (en) 2023-10-31

Family

ID=88484483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310931116.6A Pending CN116974633A (en) 2023-07-27 2023-07-27 Three-stage pipeline micro-architecture for risc-v processor

Country Status (1)

Country Link
CN (1) CN116974633A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117193861A (en) * 2023-11-07 2023-12-08 芯来智融半导体科技(上海)有限公司 Instruction processing method, apparatus, computer device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117193861A (en) * 2023-11-07 2023-12-08 芯来智融半导体科技(上海)有限公司 Instruction processing method, apparatus, computer device and storage medium
CN117193861B (en) * 2023-11-07 2024-03-15 芯来智融半导体科技(上海)有限公司 Instruction processing method, apparatus, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination