CN117472445B - Superscalar processing system, method and related equipment based on emission buffering

Info

Publication number: CN117472445B
Application number: CN202311800171.8A
Authority: CN
Inventors: 刘宇翔; 倪磊
Original assignee: Ruisixinke Shenzhen Technology Co ltd
Current assignee: Ruisixinke Shenzhen Technology Co ltd
Priority date: 2023-12-26
Filing date: 2023-12-26
Publication date: 2024-04-23
Anticipated expiration: 2043-12-26
Also published as: CN117472445A

Abstract

The invention is applicable to the technical field of processors, and in particular relates to a superscalar processing system, a superscalar processing method and related equipment based on transmit buffering. The superscalar processing system comprises: an instruction fetching unit, used for acquiring instructions to be executed from a memory according to a processor cycle; a decoding unit, used for decoding and compiling the instruction acquired by the instruction fetching unit to obtain an instruction to be executed; a transmit buffer unit, used for temporarily storing the instruction to be executed and transmitting it to the execution unit for execution according to a preset transmission rule; an execution unit, used for executing the received instruction to be executed on a processor pipeline; and a write-back unit, used for acquiring the execution result of the execution unit for the instruction to be executed and writing the execution result back into a write-back register of the processor. The invention reduces the frequency of pipeline flushing and avoids the performance problems caused by stalling or flushing the whole pipeline.

Description

Superscalar processing system, method and related equipment based on emission buffering

Technical Field

The invention is applicable to the technical field of processors, and particularly relates to a superscalar processing system and method based on emission buffering and related equipment.

Background

RISC-V, the fifth-generation reduced instruction set computer (RISC) architecture, is a free, open and extensible instruction set architecture, and a processor system based on the RISC-V instruction set can be optimized for different scenarios such as industrial control and Internet of Things devices. Existing processor systems use a number of specific methods to handle instructions that require special operations on the processor pipeline, such as system instructions and control and status register (CSR) instructions.

One method is to stall the whole pipeline directly when an instruction that requires a special operation is detected, blocking the execution of subsequent instructions until the special-operation instruction has finished executing. This can result in a long pipeline stall and reduces the overall throughput of the processor.

Another method is to trigger a pipeline flush signal when an instruction that requires a special operation is detected and part of the pipeline contents needs to be flushed: the subsequent instructions already in the pipeline are discarded, and pipeline execution is then restarted. This method reduces dead time compared with the pipeline stall method, but requires complex control logic.

In addition, other methods insert no-operations into the processor system through hardware, that is, they insert no-operation (NOP) instructions, which perform no operation, after the instruction that requires a special operation, so as to fill the pipeline and keep it stable. Although this helps maintain the correct execution order of the pipeline, it requires the cooperation of a software compiler and also lengthens the instruction stream.

In summary, conventional methods may be inefficient and may have a considerable impact on the pipeline when processing special-operation instructions in a processor system.

Disclosure of Invention

The invention provides a superscalar processing system, a superscalar processing method and related equipment based on transmit buffering, and aims to solve the processing efficiency problem caused by the inflexible handling of certain instructions in existing processor systems.

To solve the above technical problem, in a first aspect, the present invention provides a superscalar processing system based on transmit buffering, including:

an instruction fetching unit, configured to acquire instructions to be executed from the memory according to the processor cycle;

a decoding unit, configured to decode and compile the instruction acquired by the instruction fetching unit to obtain an instruction to be executed;

a transmit buffer unit, configured to temporarily store the instruction to be executed and transmit the instruction to be executed to the execution unit for execution according to a preset transmission rule;

an execution unit, configured to execute the received instruction to be executed on a processor pipeline;

and a write-back unit, configured to acquire the execution result of the execution unit for the instruction to be executed and write the execution result back into a write-back register of the processor.

Still further, the transmit buffer unit includes:

a queue subunit, configured to temporarily store the instruction to be executed according to a preset queue;

and a transmit check subunit, configured to transmit the instruction to be executed sent by the decoding unit to the execution unit for execution according to the preset transmission rule, or to temporarily store the instruction to be executed sent by the decoding unit in the queue subunit.

Further, the preset transmission rule specifically includes:

determining a target instruction type, and judging whether the current instruction to be executed transmitted from the decoding unit matches the target instruction type:

if not, transmitting the instruction to be executed to the execution unit for execution;

and if so, taking the current instruction to be executed as a target instruction, temporarily storing the target instruction and the subsequent instructions to be executed in the queue subunit, and, after all instructions in the execution unit and the write-back unit have finished executing, selecting the target instruction from the queue subunit and transmitting it to the execution unit for execution.

Further, the preset queue is a first-in first-out queue.

Still further, the target instruction type includes: a first type of instruction, associated with control and status register (CSR) instructions, and a second type of instruction, associated with system instructions.

Still further, the transmit check subunit is further configured to:

after the write-back unit obtains the execution result of the target instruction:

if the target instruction is the first type of instruction, flush the processor pipeline, and select the instruction to be executed from the queue subunit and transmit it to the execution unit for execution;

and if the target instruction is the second type of instruction, select the instruction to be executed from the queue subunit and transmit it to the execution unit for execution.

In a second aspect, the present invention also provides a superscalar processing method based on transmit buffer implemented by a superscalar processing system based on transmit buffer as described above, the superscalar processing method comprising the steps of:

S1, acquiring an instruction to be executed from a memory according to a processor cycle through an instruction fetching unit;

S2, decoding and compiling the instruction acquired by the instruction fetching unit through a decoding unit to obtain an instruction to be executed;

S3, temporarily storing the instruction to be executed in a transmitting buffer unit, and transmitting the instruction to be executed to an executing unit for executing according to a preset transmitting rule;

S4, executing the received instruction to be executed on a processor pipeline through an execution unit;

S5, obtaining an execution result of the execution unit on the instruction to be executed through a write-back unit, and writing the execution result back into a write-back register of the processor.

Further, the step S3 specifically includes:

transmitting the instruction to be executed sent by the decoding unit to the execution unit for execution through a transmit check subunit according to the preset transmission rule, or temporarily storing the instruction to be executed sent by the decoding unit in a queue subunit according to a preset queue.

In a third aspect, the present invention also provides a computer device comprising: a memory, a processor, and a transmit buffer based superscalar processing program stored on the memory and executable on the processor, wherein the processor, when executing the transmit buffer based superscalar processing program, implements the steps of the transmit buffer based superscalar processing method according to any one of the above embodiments.

In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon a transmit buffer based superscalar processing program which when executed by a processor implements the steps in a transmit buffer based superscalar processing method as in any of the above embodiments.

The beneficial effects of the invention are as follows. The invention provides a superscalar processing system with a transmit buffer structure. Through the transmit buffer design, operation instructions that may affect the pipeline performance of the processor are handled by combining a queue with instruction checking, which reduces the frequency of pipeline flushes and avoids the performance problems caused by stalling or flushing the whole pipeline. At the same time, the design simplifies the pipeline control logic in the processor, reduces system power consumption and improves the overall performance of the processor.

Drawings

FIG. 1 is a schematic diagram of a transmit buffer based superscalar processing system provided by the present invention;

FIG. 2 is a schematic diagram of a transmit buffer unit according to the present invention;

FIG. 3 is a flow block diagram of the steps of a superscalar processing method based on transmit buffering according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to FIG. 1, which is a schematic structural diagram of a superscalar processing system based on transmit buffering, the superscalar processing system 100 based on transmit buffering includes:

an instruction fetching unit 101, configured to acquire an instruction to be executed from the memory according to a processor cycle;

a decoding unit 102, configured to decode and compile the instruction acquired by the instruction fetching unit 101 to obtain an instruction to be executed;

a transmit buffer unit 103, configured to temporarily store the instruction to be executed and transmit the instruction to be executed to an execution unit 104 for execution according to a preset transmission rule;

an execution unit 104, configured to execute the received instruction to be executed on a processor pipeline;

and a write-back unit 105, configured to acquire the execution result of the execution unit 104 for the instruction to be executed and write the execution result back into a write-back register of the processor.

Specifically, the transmit buffer unit 103 (an issue buffer) in the embodiment of the present application is disposed between the decoding unit 102 and the execution unit 104. In existing processor systems, an instruction to be executed is sent directly to the execution unit 104 once the decoding unit 102 has decoded and compiled it; in the present application, the transmit buffer unit 103 processes each instruction to be executed before it is sent to the execution unit 104. Referring to FIG. 2, the transmit buffer unit 103 includes:

a queue subunit 1031, configured to temporarily store the instruction to be executed according to a preset queue;

and a transmit check subunit 1032, configured to transmit the instruction to be executed sent by the decoding unit 102 to the execution unit 104 for execution according to the preset transmission rule, or to temporarily store the instruction to be executed sent by the decoding unit 102 in the queue subunit 1031.

The preset transmission rule specifically includes:

determining a target instruction type, and judging whether the current instruction to be executed transmitted from the decoding unit 102 matches the target instruction type:

if not, transmitting the instruction to be executed to the execution unit 104 for execution;

if so, taking the current instruction to be executed as a target instruction, temporarily storing the target instruction and the subsequent instructions to be executed in the queue subunit 1031, and, after all instructions in the execution unit 104 and the write-back unit 105 have finished executing, selecting the target instruction from the queue subunit 1031 and transmitting it to the execution unit 104 for execution.

The preset queue is a first-in first-out (FIFO) queue. A FIFO queue preserves the execution order of the processor instructions: once the target instruction has entered the queue, the subsequent instructions are still executed normally in the order in which the decoding unit 102 sent them, even when no subsequent pipeline flush is required.
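As a minimal illustrative sketch, and not the hardware of the application itself, the preset transmission rule and the first-in first-out queue can be modeled in Python as follows. The names `Kind`, `Instr`, `IssueBuffer`, `accept` and `on_pipeline_drained` are hypothetical and do not appear in the application. The sketch assumes that the decoding unit delivers one instruction at a time and that a separate signal reports when the execution unit and the write-back unit hold no unfinished instructions.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Kind(Enum):
    NORMAL = auto()   # ordinary instruction, not a target type
    CSR = auto()      # first type: control and status register related
    SYSTEM = auto()   # second type: system-instruction related


@dataclass
class Instr:
    name: str
    kind: Kind = Kind.NORMAL


class IssueBuffer:
    """Hypothetical model of the transmit buffer unit 103."""

    def __init__(self) -> None:
        self.queue: deque = deque()   # queue subunit 1031 (FIFO)

    def accept(self, instr: Instr) -> list:
        """Transmit check: return the instructions issued immediately."""
        if not self.queue and instr.kind is Kind.NORMAL:
            return [instr]            # not a target type: issue directly
        # A target instruction, or anything arriving behind queued
        # instructions, is held so that program order is preserved.
        self.queue.append(instr)
        return []

    def on_pipeline_drained(self) -> Optional[Instr]:
        """Call when the execution and write-back units are empty."""
        if self.queue and self.queue[0].kind is not Kind.NORMAL:
            return self.queue.popleft()   # issue the waiting target
        return None
```

For example, after `accept(Instr("add"))`, `accept(Instr("csrrw", Kind.CSR))` and `accept(Instr("sub"))`, only `add` has been issued, and the queue holds `csrrw` followed by `sub` until `on_pipeline_drained()` releases `csrrw`.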

The target instruction type includes: a first type of instruction, associated with control and status register (CSR) instructions, and a second type of instruction, associated with system instructions. In processor systems implemented on the RISC-V instruction set, operations associated with CSR instructions may cause long dead times, while operations associated with system instructions require suspending execution of the pipeline. For the different instruction types, the transmit check subunit 1032 is further configured to:

after the write-back unit 105 obtains the execution result of the target instruction:

if the target instruction is the first type of instruction, flush the processor pipeline, and select the instruction to be executed from the queue subunit 1031 and transmit it to the execution unit 104 for execution;

if the target instruction is the second type of instruction, select the instruction to be executed from the queue subunit 1031 and transmit it to the execution unit 104 for execution.
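The type-dependent handling above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the decoder of the application: the function names `classify` and `after_write_back` are hypothetical, and the mapping of the two target types onto RISC-V encodings is an assumption of this sketch. It relies on the RISC-V convention that the SYSTEM major opcode is `1110011` and that the Zicsr instructions (CSRRW, CSRRS, CSRRC and their immediate forms) use `funct3` values 1, 2, 3, 5, 6 and 7; all other SYSTEM encodings are grouped here as the second type. Releasing queued instructions only up to the next target instruction is also an assumption.

```python
from collections import deque

OPCODE_SYSTEM = 0b1110011          # RISC-V SYSTEM major opcode, bits 6:0
CSR_FUNCT3 = {1, 2, 3, 5, 6, 7}    # Zicsr: CSRRW/S/C and CSRRWI/SI/CI


def classify(word: int) -> str:
    """Return 'csr', 'system' or 'normal' for a 32-bit instruction word."""
    if word & 0x7F != OPCODE_SYSTEM:
        return "normal"
    funct3 = (word >> 12) & 0x7
    # Assumption: every non-CSR SYSTEM encoding counts as the second type.
    return "csr" if funct3 in CSR_FUNCT3 else "system"


def after_write_back(kind: str, queue: deque) -> dict:
    """Actions once the write-back unit holds the target's result."""
    flush = kind == "csr"          # first type: flush the pipeline first
    released = []
    # Release queued instructions in FIFO order. A further target
    # instruction stays queued until the pipeline has drained again.
    while queue and classify(queue[0]) == "normal":
        released.append(queue.popleft())
    return {"flush": flush, "released": released}
```

With the words `0x30009073` (`csrw mstatus, ra`), `0x00000073` (`ecall`) and `0x00000013` (`nop`), `classify` returns `'csr'`, `'system'` and `'normal'` respectively.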

By adding the transmit buffer structure and the corresponding instruction checking scheme to the processor system, the instruction control logic can be implemented more simply while still meeting the pipeline operation requirements of the processor system when it executes specific instructions.

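The embodiment of the invention also provides a superscalar processing method based on transmit buffering, which is implemented by the superscalar processing system based on transmit buffering described above. Referring to FIG. 3, which is a flow block diagram of the steps of the method, the superscalar processing method comprises the following steps:

S1, acquiring an instruction to be executed from a memory according to a processor cycle through an instruction fetching unit;

S2, decoding and compiling the instruction acquired by the instruction fetching unit through a decoding unit to obtain an instruction to be executed;

S3, temporarily storing the instruction to be executed in a transmit buffer unit, and transmitting the instruction to be executed to an execution unit for execution according to a preset transmission rule;

S4, executing the received instruction to be executed on a processor pipeline through the execution unit;

S5, acquiring the execution result of the execution unit for the instruction to be executed through a write-back unit, and writing the execution result back into a write-back register of the processor.

Step S3 specifically comprises:

transmitting the instruction to be executed sent by the decoding unit to the execution unit for execution through a transmit check subunit according to the preset transmission rule, or temporarily storing the instruction to be executed sent by the decoding unit in a queue subunit according to a preset queue.

Specifically, the preset transmission rule includes:

determining a target instruction type, and judging whether the current instruction to be executed transmitted from the decoding unit matches the target instruction type:

if not, transmitting the instruction to be executed to the execution unit for execution;

if so, taking the current instruction to be executed as a target instruction, temporarily storing the target instruction and the subsequent instructions to be executed in the queue subunit, and, after all instructions in the execution unit and the write-back unit have finished executing, selecting the target instruction from the queue subunit and transmitting it to the execution unit for execution;

after the target instruction completes execution, either flushing all pipeline contents or continuing execution from the transmit buffer, depending on the type of the target instruction.

The preset queue is a first-in first-out queue.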

The superscalar processing method can be implemented on the superscalar processing system described in the above embodiments and achieves the same technical effects through the corresponding system units, which are not described again here.
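To make the flow of steps S1 to S5 concrete, the following Python sketch traces a short program through a simplified model. It is an assumption-laden illustration and not the claimed implementation: the function `run` and the `(name, kind)` tuple format are hypothetical, timing is reduced to one fetch and one write-back per loop iteration, and only a single instruction is fetched per step, whereas a real superscalar design handles several.

```python
from collections import deque

TARGET_KINDS = {"csr", "system"}   # the two target instruction types


def run(program):
    """Trace steps S1 to S5 for a list of (name, kind) pairs.

    Returns the write-back order and the names of the instructions
    that triggered a pipeline flush.
    """
    fetch = deque(program)   # S1: instructions still in memory
    queue = deque()          # queue subunit (FIFO)
    in_flight = deque()      # instructions in execute / write-back
    retired, flushes = [], []

    def target_busy():
        return any(kind in TARGET_KINDS for _, kind in in_flight)

    while fetch or queue or in_flight:
        # S4 + S5: the oldest in-flight instruction executes and writes back.
        if in_flight:
            name, kind = in_flight.popleft()
            retired.append(name)
            if kind == "csr":
                flushes.append(name)   # first type: flush after write-back
        # S3: issue from the queue once no target is still in flight.
        if queue and not target_busy():
            head_kind = queue[0][1]
            # A target additionally waits for an empty pipeline.
            if head_kind not in TARGET_KINDS or not in_flight:
                in_flight.append(queue.popleft())
        # S1 + S2: fetch and decode one instruction per step.
        if fetch:
            instr = fetch.popleft()
            if queue or target_busy() or instr[1] in TARGET_KINDS:
                queue.append(instr)      # target, or behind a target
            else:
                in_flight.append(instr)  # ordinary instruction bypasses
    return retired, flushes
```

Calling `run` on the sequence `add`, `csrrw` (kind `"csr"`), `sub`, `ecall` (kind `"system"`), `or` writes the instructions back in program order and records a single flush, for `csrrw`. The point of the design is visible in the model: ordinary instructions never wait unless a target instruction is ahead of them.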

Referring to FIG. 4, which is a schematic structural diagram of a computer device according to an embodiment of the present invention, the computer device 200 includes: a memory 202, a processor 201, and a transmit buffer based superscalar processing program stored on the memory 202 and executable on the processor 201.

The processor 201 invokes the transmit buffer based superscalar processing program stored in the memory 202 and executes the steps of the transmit buffer based superscalar processing method provided in the embodiment of the present invention. Referring to FIG. 3, these are steps S1 to S5 described above, including the specific form of step S3 and the preset transmission rule; the preset queue is a first-in first-out queue.

The computer device 200 provided in the embodiment of the present invention can implement the steps of the superscalar processing method based on transmit buffering in the above embodiment and achieves the same technical effects; reference is made to the description of the above embodiment, which is not repeated here.

The embodiment of the invention also provides a computer readable storage medium, on which a superscalar processing program based on transmission buffer is stored, and when the superscalar processing program based on transmission buffer is executed by a processor, the process and the steps in the superscalar processing method based on transmission buffer provided by the embodiment of the invention are realized, and the same technical effects can be realized, so that repetition is avoided and redundant description is omitted.

Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM) or the like.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The embodiments of the present invention have been illustrated and described above in connection with the drawings, and represent what is presently considered to be the most practical and preferred embodiments. It is to be understood that the invention is not limited to the disclosed embodiments but, on the contrary, is intended to cover the various equivalent modifications and arrangements included within the spirit and scope of the appended claims.

Claims

1. A superscalar processing system based on transmit buffering, comprising:

The transmission buffer unit is used for temporarily storing the instruction to be executed and transmitting the instruction to be executed to the execution unit for execution according to a preset transmission rule, and the transmission buffer unit comprises:

The transmit check subunit is used for transmitting the instruction to be executed sent by the decoding unit to the execution unit for execution according to the preset transmission rule, or temporarily storing the instruction to be executed sent by the decoding unit into the queue subunit;

the write-back unit is used for acquiring an execution result of the execution unit on the instruction to be executed and writing the execution result back into a write-back register of the processor;

The preset transmitting rule specifically comprises the following steps:

2. The transmit buffer based superscalar processing system of claim 1 wherein said pre-set queue is a first-in-first-out queue.

3. The transmit buffer based superscalar processing system of claim 2 wherein said target instruction type comprises: a first type of instruction associated with the status and control register instructions and a second type of instruction associated with the system instructions.

4. The transmit buffer-based superscalar processing system of claim 3, wherein said transmit check subunit is further configured to:

5. A transmit buffer based superscalar processing method implemented on the basis of a transmit buffer based superscalar processing system as claimed in any one of claims 1-4, characterized in that said superscalar processing method comprises the steps of:

6. The method for processing superscalar based on transmit buffering of claim 5, wherein step S3 is specifically:

7. A computer device, comprising: a memory, a processor and a transmit buffer based superscalar processing program stored on said memory and executable on said processor, said processor implementing the steps in the transmit buffer based superscalar processing method as claimed in any one of claims 5 and 6 when executing said transmit buffer based superscalar processing program.

8. A computer readable storage medium, wherein a transmission buffer based superscalar processing program is stored on the computer readable storage medium, which when executed by a processor, implements the steps of the transmission buffer based superscalar processing method according to any one of claims 5 and 6.