CN115080121A - Instruction processing method and device, electronic equipment and computer-readable storage medium - Google Patents

Instruction processing method and device, electronic equipment and computer-readable storage medium Download PDF

Info

Publication number
CN115080121A
Authority
CN
China
Prior art keywords
instruction
cache
cache unit
indication information
rob
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210784294.6A
Other languages
Chinese (zh)
Inventor
郭向飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eswin Computing Technology Co Ltd
Original Assignee
Beijing Eswin Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eswin Computing Technology Co Ltd filed Critical Beijing Eswin Computing Technology Co Ltd
Priority to CN202210784294.6A priority Critical patent/CN115080121A/en
Publication of CN115080121A publication Critical patent/CN115080121A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

Embodiments of the present application provide an instruction processing method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of computers. The method includes: receiving first indication information, where the first indication information indicates that the instruction processing flow corresponding to a first instruction in the processor has finished executing; and, in response to the first indication information, allocating a target cache unit for the first instruction in a first cache. That is, in the embodiments of the present application, a target cache unit is allocated to the first instruction in the first cache only when first indication information signaling completion of the first instruction's processing flow is received. This contrasts with the prior-art approach, in which an ROB cache unit is already allocated to an instruction while its processing is still incomplete, even though the instruction is stored into the ROB cache only after its processing flow finishes. The embodiments of the present application thereby reduce the occupation of cache resources, increase the number of pending instructions flowing through the instruction processing pipeline, and improve instruction processing efficiency.

Description

Instruction processing method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an instruction processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Currently, the superscalar design approach is commonly used in processor design. A superscalar processor supports multiple-issue, out-of-order execution; that is, in a superscalar processor, instructions are executed out of order but retired in order. Most superscalar processors guarantee in-order retirement by ordering instructions through a Reorder Buffer (ROB) module. However, in common ROB designs, ROB cache resources are allocated while the instruction is still being processed, which wastes cache resources.
Disclosure of Invention
The present application aims to overcome at least one of the above technical drawbacks, in particular the waste of cache resources.
According to an aspect of the present application, there is provided an instruction processing method including:
receiving first indication information; the first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed;
responding to the first indication information, and allocating a target cache unit for the first instruction in a first cache; the instructions stored in the first cache are arranged according to an instruction generation sequence.
Optionally, after allocating the target cache unit for the first instruction in the first cache, the method further includes:
storing the first instruction to the target cache unit.
Optionally, the method further includes:
if a second instruction generated before the first instruction is stored in the first cache, deleting the first instruction and the second instruction according to the instruction generation order.
Optionally, after receiving the instruction generation message, the method further includes:
adding an identity to the first instruction, wherein the identity indicates an instruction generation sequence of the first instruction.
Optionally, the allocating a target cache unit for the first instruction in the first cache includes:
determining the target cache unit corresponding to the identity according to the correspondence between the identity and the cache unit identifier.
According to another aspect of the present application, there is provided an instruction processing apparatus including:
the receiving module is used for receiving first indication information; the first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed;
the allocation module is used for responding to the first indication information and allocating a target cache unit for the first instruction in a first cache; the instructions stored in the first cache are arranged according to an instruction generation sequence.
Optionally, the apparatus further comprises:
a storage module to store the first instruction to a target cache unit after the target cache unit is allocated for the first instruction in the first cache.
Optionally, the apparatus further comprises:
an instruction deletion module, configured to delete the first instruction and the second instruction according to the instruction generation order when a second instruction generated before the first instruction is stored in the first cache.
According to another aspect of the present application, there is provided an electronic device including:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to execute the instruction processing method according to any one of the first aspect of the present application.
For example, in a third aspect of the present application, there is provided a computing device including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the corresponding operation of the instruction processing method as shown in the first aspect of the application.
According to yet another aspect of the present application, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the instruction processing method of any one of the first aspects of the present application.
For example, in a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, where a computer program is stored, and when the program is executed by a processor, the computer program implements the instruction processing method shown in the first aspect of the present application.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided in the various alternative implementations of the first aspect described above.
The beneficial effect that technical scheme that this application provided brought is:
the embodiment of the application receives first indication information; responding to the first indication information, and allocating a target cache unit for the first instruction in a first cache; the first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed; that is to say, in the embodiment of the present application, when first indication information that an instruction processing flow corresponding to a first instruction is executed is received, a target cache unit may be allocated to the first instruction in a first cache; compared with the prior art, the processing mode that the ROB cache unit is already allocated to the instruction when the instruction processing is not completed, and the instruction is stored in the ROB cache after the instruction processing flow is completed; according to the embodiment of the application, the occupation of cache resources is reduced, the number of the to-be-processed instructions flowing on the instruction processing pipeline is increased, and the instruction processing efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of an instruction processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an application scenario of an instruction processing method according to an embodiment of the present application;
fig. 3 is a schematic view of an application scenario of an instruction processing method according to an embodiment of the present application;
fig. 4 is a schematic view of an application scenario of an instruction processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an instruction processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device for instruction processing according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
With the advent of the big-data era, demands on computer processing speed keep growing. To improve processor performance, processing instructions in a pipelined manner has been proposed. Pipelining decomposes each instruction into several stages and overlaps the operations of those stages, so that several instructions are processed in parallel. The instructions in a program are still executed in program order, but several instructions can be fetched in advance: while the current instruction has not finished executing, the early stages of subsequent instructions are started ahead of time, which significantly improves processor efficiency.
When a Central Processing Unit (CPU) contains more than one pipeline and can complete more than one instruction per clock cycle, this is known as superscalar technology. Superscalar processors are now the mainstream. In a superscalar processor, a single processor core performs instruction-level parallel operation, so a superscalar processor achieves higher throughput at the same clock frequency. The instruction pipeline of a superscalar processor may include a fetch (Fetch) stage, a decode (Decoder) stage, a dispatch (Dispatch) stage, an issue (Issue) stage, an execute (Execute) stage, a retire (Retire) stage, and so on. In the fetch stage, multiple instructions can be fetched from the instruction cache (I-Cache); in the decode stage, multiple instructions may be decoded in order (in-order) in the same clock cycle and then issued out of order (out-of-order) in the issue stage, so that the execute stage can execute them out of order. To allow out-of-order execution results to be retired in order, a Reorder Buffer (ROB) module is introduced into the instruction pipeline; the ROB module reorders the out-of-order execution results. After instructions are executed out of order, their results are committed to the ROB module out of order, but the ROB module retires these results in the order of its cache entries.
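The out-of-order-completion, in-order-retirement behavior described above can be sketched with a toy model (a simplified illustration only, not the implementation of this application; the `ReorderBuffer` class and its methods are hypothetical names): entries are allocated in program order, execution results may arrive in any order, and retirement only ever pops the oldest entry once it has completed.

```python
from collections import deque

class ReorderBuffer:
    """Toy ROB: entries allocated in program order, retired in order."""
    def __init__(self):
        self.entries = deque()   # each entry: [instr_id, completed?]

    def allocate(self, instr_id):
        # Entries are appended in program (generation) order.
        self.entries.append([instr_id, False])

    def complete(self, instr_id):
        # Execution results may arrive out of order.
        for entry in self.entries:
            if entry[0] == instr_id:
                entry[1] = True
                return

    def retire(self):
        # Retire only from the head, and only completed entries,
        # so retirement always follows program order.
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired

rob = ReorderBuffer()
for i in range(4):           # program order: 0, 1, 2, 3
    rob.allocate(i)
rob.complete(2)              # out-of-order completion
assert rob.retire() == []    # head (0) not done: nothing retires yet
rob.complete(0)
rob.complete(1)
print(rob.retire())          # [0, 1, 2]; instruction 3 is still pending
```

Even though instruction 2 finished first, it cannot leave the buffer before 0 and 1; that ordering guarantee is exactly what makes out-of-order execution invisible to software.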
That is, superscalar technology is employed in most current processors. Such a processor supports multiple-issue, out-of-order execution, but high-performance out-of-order superscalar processors generally preserve the property of out-of-order execution with in-order retirement, so that software does not perceive the processor's internal execution details, which greatly reduces software debugging difficulty. It should be noted that in most processor designs, the ROB module orders instructions to guarantee in-order retirement. This ROB design requires ROB resources to be allocated at the pipeline stage preceding instruction reordering (typically the dispatch stage), because the ROB module must record the original order of the instruction stream before it goes out of order. However, the allocated ROB resources are not actually used until the instruction has further passed through the issue, execute, and retire stages, which wastes resources to a certain extent.
Based on this, the embodiments of the present application provide an instruction processing method: first indication information is received, the first indication information indicating that the instruction processing flow corresponding to a first instruction in the processor has finished executing; and in response to the first indication information, a target cache unit is allocated for the first instruction in a first cache. That is, a target cache unit is allocated to the first instruction only once the first indication information signaling completion of its processing flow is received. Compared with the prior-art approach, in which an ROB cache unit is already allocated to an instruction while its processing is incomplete and the instruction is stored into the ROB cache only after its processing flow finishes, the embodiments of the present application reduce the occupation of cache resources, increase the number of pending instructions flowing through the instruction processing pipeline, and improve instruction processing efficiency.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides an instruction processing method, which is optionally applied to an electronic device. For convenience of description, the embodiments of the present application will be described below by taking the method as an example applied to a processor; for example, the processor may be a CPU or a Graphics Processing Unit (GPU), or the like.
In the embodiments of the present application, the processor is the core component that performs instruction fetch, decode, issue, execution, and so on. Unless otherwise noted, the processor described herein may specifically be a superscalar processor.
A superscalar processor is generally characterized by "out-of-order execution with in-order retirement"; correspondingly, it is equipped with an ROB module, and sequencing the instruction stream through the ROB module guarantees in-order retirement of instructions. In typical designs, the ROB module serves multiple purposes: it may provide instruction sequencing, a retirement instruction cache, and so on, and in some structures it also undertakes register renaming. However, this approach not only increases the design complexity of the ROB module but also wastes resources to a certain extent and reduces resource utilization. For this reason, the embodiments of the present application decouple the instruction sequencing function from the retirement instruction cache function. Other functions may likewise be decoupled from the ROB module; for example, register renaming may be handled by an independent register renaming module to relieve the ROB module of that task. The embodiments of the present application are not limited in this respect; the following description takes decoupling the instruction sequencing function from the ROB module as an example.
It will be appreciated that in a superscalar processor, the instructions of the instruction stream are operated on in order during the fetch, decode, and dispatch stages, and enter out-of-order operation from the issue stage onward, so that they can be executed out of order in the execute stage. To implement the instruction sequencing function, the embodiments of the present application therefore need to record the original order of the instruction stream before it goes out of order. In the related art, the ROB module allocates resources in advance to record this original order; that is, because of instruction ordering, the related art begins allocating ROB resources at the pipeline stage preceding instruction reordering (typically the dispatch stage), which wastes resources to a certain extent. In the embodiments of the present application, by contrast, a target cache unit is allocated to the first instruction in the first cache only when first indication information is received indicating that the instruction processing flow corresponding to the first instruction has finished executing; this reduces the cache resources occupied, increases the number of pending instructions flowing through the instruction processing pipeline, and improves instruction processing efficiency. Specifically, the method may include the following steps:
s101: first indication information is received. The first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed completely.
Optionally, the embodiments of the present application may be applied to the field of computer technology, and in particular to a processing scenario in which a Reorder Buffer (ROB; hereinafter also referred to as the ROB cache) stores the first instruction.
The first instruction may comprise any instruction in a processor. For example, the first instruction may include a data read instruction, a data write instruction, a data operation instruction, an object program launch instruction, an object program exit instruction, and so on.
The first indication information is indication information indicating that the execution of the instruction processing flow corresponding to the first instruction is finished.
Specifically, completion of the instruction processing flow of the first instruction may be determined in either of the following ways:
Mode 1: determined by generation of the first instruction's processing result. For example, if the first instruction is a data operation instruction, then when the data operation result corresponding to the first instruction is generated, the instruction processing flow of the first instruction can be determined to have finished executing.
Mode 2: determined by execution of the processing steps of the instruction processing flow. For example, if the instruction processing flow of the first instruction includes 5 processing steps executed in order, then when the 5th processing step finishes, the instruction processing flow of the first instruction can be determined to have finished executing. As an example, the instruction processing flow of the embodiment of the present application is described with reference to fig. 2: the flow shown in the figure includes 5 processing steps, namely an instruction fetch step, a decode step, a dispatch step, an issue step, and an execute step.
Instruction fetch step (Instruction Fetch Unit, IFU): fetches an instruction from the I-Cache, using the value of the Program Counter (PC) register as the address.
Decode step (Decoder): decodes the fetched instruction and reads the register file according to the decoding result to obtain the instruction's source operands.
Dispatch step (Dispatcher): sends the decoded instructions to the issue module in the original order specified by the program.
Issue step (Issue): sends instructions in the issue queue to the execution module. Specifically, during execution of the instruction pipeline, an instruction that has been fetched, decoded, and dispatched is pushed to the issue module and buffered in the issue module's issue queue.
Execute step (Execute, or LSU for memory accesses): executes the instruction according to the decoding result.
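As a rough illustration of "Mode 2" completion detection, the five steps above can be modeled as a sequence, with the first indication information raised only after the final step has run; `PIPELINE_STEPS` and `process` are hypothetical names invented for this sketch, not terms from the application.

```python
PIPELINE_STEPS = ("fetch", "decode", "dispatch", "issue", "execute")

def process(instr_id):
    """Walk an instruction through the 5 steps, recording after each one
    whether the 'first indication information' would be raised."""
    events = []
    for i, step in enumerate(PIPELINE_STEPS):
        # The indication is raised only once the final (5th) step has run.
        indication_raised = (i == len(PIPELINE_STEPS) - 1)
        events.append((step, indication_raised))
    return events

events = process(instr_id=7)
print(events[-1])   # ('execute', True): the flow has finished executing
```

Only after this indication fires would the method of S102 allocate a target cache unit for the instruction.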
S102: responding to the first indication information, and allocating a target cache unit for the first instruction in a first cache; the instructions stored in the first cache are arranged according to an instruction generation sequence.
The first cache is a cache used to store instructions whose instruction processing flow has finished executing; optionally, in the embodiments of the present application, the first cache may be the ROB cache.
In high-performance superscalar processors, out-of-order instruction execution is typically employed to achieve better Instruction-Level Parallelism (ILP) and thus improve processor performance. To support precise exceptions, hardware speculation, register renaming, and other functions during out-of-order execution, a Reorder Buffer (ROB) mechanism is usually used: the ROB makes instructions exit the instruction processing flow in order, ensuring the correctness of out-of-order program execution.
For example, in a practical processing scenario, an ROB cache unit may be allocated for an instruction after it is decoded, completing register renaming and order preservation; in the instruction-completion stage, the instruction whose processing flow has finished executing is stored into the ROB cache; the instructions then exit the ROB cache in generation order, and General-Purpose Registers (GPRs) and memory are updated in order after instructions exit, ensuring precise exceptions and hardware speculation.
However, in the above scenario the ROB cache unit is already allocated to the instruction at the decode stage, i.e. while instruction processing is still incomplete, yet the instruction is not stored into the ROB cache until its processing flow finishes. Allocating the ROB cache unit in advance, before the instruction processing flow has finished, therefore occupies cache resources and reduces the number of pending instructions that can flow through the instruction processing pipeline.
To reduce cache-resource occupation, increase the number of pending instructions on the instruction processing pipeline, and improve instruction processing efficiency, in the embodiments of the present application a target cache unit is allocated to the first instruction in the first cache only when first indication information is received indicating that the instruction processing flow corresponding to that instruction (i.e., the first instruction of the present application) has finished executing.
The target cache unit is a cache unit corresponding to the first instruction. In the first cache, e.g., the ROB cache, the stored instructions are arranged in an instruction generation order.
Optionally, in a practical processing scenario, since the instructions stored in the ROB cache are arranged in instruction generation order, the embodiments of the present application may allocate the target cache unit to the first instruction according to the correspondence between the Identity (ID) of the first instruction and the identifier of a cache unit.
Specifically, an instruction ID may be assigned to each instruction when it is generated, identifying the instruction generation order; the cache unit identifier of the first instruction's cache unit in the ROB cache is recorded; and the target cache unit corresponding to the first instruction is then determined from the correspondence between the first instruction's instruction ID and that cache unit identifier.
For example, the instruction ID of the first generated instruction (e.g., the first generated instruction within a preset timing period) may be "0", i.e., the first generated instruction is instruction 0; the cache unit identifier of the cache unit of the instruction 0 in the ROB cache is 10, that is, the instruction 0 corresponds to the cache unit 10; since the instructions stored in the ROB cache are arranged in the instruction generation order, the instructions generated after instruction 0 may be stored in a cache unit after cache unit 10. For example, instruction 1 is stored in cache unit 11, instruction 2 is stored in cache unit 12, instruction 3 is stored in cache unit 13, and so on. In addition, in some embodiments, the cache units corresponding to the instructions may not be adjacent; for example, instruction 1 is stored in cache unit 11, instruction 2 is stored in cache unit 13, instruction 3 is stored in cache unit 15, and so on.
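The ID-to-unit correspondence in this example can be sketched as follows (a hypothetical illustration using the same numbers as above; `target_cache_unit`, `store`, and the `stride` parameter are invented for this sketch and are not from the application):

```python
def target_cache_unit(instr_id, base=10, stride=1):
    # stride=1 reproduces the adjacent layout (instruction 0 -> unit 10,
    # 1 -> 11, 2 -> 12, ...); a larger stride models non-adjacent layouts.
    return base + stride * instr_id

rob_cache = {}

def store(instr_id, info):
    unit = target_cache_unit(instr_id)
    rob_cache[unit] = info           # the unit holds the instruction's info
    return unit

print(store(0, {"id": 0, "state": "done"}))   # instruction 0 -> cache unit 10
print(store(3, {"id": 3, "state": "done"}))   # instruction 3 -> cache unit 13
```

Because the mapping is a pure function of the instruction ID, the unit can be computed at completion time, with no need to reserve it at dispatch.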
In the embodiments of the present application, first indication information is received, and in response to it a target cache unit is allocated for the first instruction in the first cache; the first indication information indicates that the instruction processing flow corresponding to the first instruction in the processor has finished executing. In other words, the target cache unit is allocated only once completion of the first instruction's processing flow has been signaled. Compared with the prior-art approach, in which an ROB cache unit is already allocated while instruction processing is incomplete and the instruction is stored into the ROB cache only after its processing flow finishes, the embodiments of the present application reduce the occupation of cache resources, increase the number of pending instructions flowing through the instruction processing pipeline, and improve instruction processing efficiency.
In an embodiment of the application, after allocating the target cache unit for the first instruction in the first cache, the method further includes:
storing the first instruction to the target cache unit.
After the target cache unit corresponding to the first instruction is determined, the first instruction is stored to the target cache unit. It can be understood that, in the embodiment of the present application, what is stored in the target cache unit may be the instruction information corresponding to the first instruction, for example, the ID of the first instruction, its processing object, and its processing state.
In one embodiment of the present application, the method further comprises:
and if a second instruction generated before the first instruction is stored in the first cache, deleting the first instruction and the second instruction according to the instruction generation sequence.
Specifically, the second instruction includes an instruction generated before the first instruction.
When the second instruction is stored in the first cache (that is, the instruction processing flow corresponding to the second instruction has finished executing, so the second instruction has been stored in the first cache), the first instruction and the second instruction may be deleted in instruction generation order.
It can be understood that, in the embodiment of the present application, an instruction is stored in the ROB cache only after its instruction processing flow has finished executing; instructions are then retired from the ROB cache in order, following the order of generation.
In an embodiment of the application, after receiving the instruction generation message, the method further includes:
adding an identity to the first instruction, wherein the identity indicates an instruction generation sequence of the first instruction.
In an embodiment of the present application, said allocating a target cache unit for the first instruction in the first cache comprises:
and determining a target cache unit corresponding to the identity identification according to the corresponding relation between the identity identification and the cache unit identification.
Optionally, in an actual processing scenario, because the instructions stored in the ROB cache are arranged in instruction generation order, in the embodiment of the present application a target cache unit may be allocated to the first instruction according to the correspondence between the Identity (ID) of the first instruction and the identifier of a cache unit.
Specifically, an instruction ID may be assigned to each instruction after it is generated, to identify the order of instruction generation; the cache unit identifier of the first instruction's cache unit in the ROB cache is recorded; the target cache unit corresponding to the first instruction is then determined according to the correspondence between the instruction ID of the first instruction and the cache unit identifier corresponding to the first instruction.
For example, the instruction ID of the first generated instruction (e.g., the first instruction generated within a preset timing period) may be "0", i.e., the first generated instruction is instruction 0; the cache unit identifier of instruction 0's cache unit in the ROB cache is 10, that is, instruction 0 corresponds to cache unit 10. Since the instructions stored in the ROB cache are arranged in instruction generation order, the instructions generated after instruction 0 may be stored in cache units after cache unit 10: for example, instruction 1 in cache unit 11, instruction 2 in cache unit 12, instruction 3 in cache unit 13, and so on. In addition, in some embodiments, the cache units corresponding to consecutive instructions may not be adjacent; for example, instruction 1 may be stored in cache unit 11, instruction 2 in cache unit 13, instruction 3 in cache unit 15, and so on.
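The ID-to-cache-unit correspondence above can be sketched as a simple mapping. This is an illustrative model only: it assumes the adjacent-entry embodiment and a circular ROB addressed modulo its capacity, and the names `oldest_id`, `oldest_entry`, and `rob_size` are not from the patent.

```python
def rob_entry_for(instr_id, oldest_id, oldest_entry, rob_size):
    """Map an instruction ID to its ROB cache-unit identifier.

    Instructions are arranged in generation order, so an instruction's
    entry is offset from the oldest un-retired instruction's entry by
    the difference of their IDs (wrapping around the circular ROB).
    """
    return (oldest_entry + (instr_id - oldest_id)) % rob_size

# Instruction 0 occupies cache unit 10, so instruction 1 maps to
# cache unit 11, instruction 3 to cache unit 13, and so on.
assert rob_entry_for(0, 0, 10, 32) == 10
assert rob_entry_for(1, 0, 10, 32) == 11
assert rob_entry_for(3, 0, 10, 32) == 13
```

In the non-adjacent embodiment mentioned above, a different mapping (e.g., an explicit table) would replace this arithmetic.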
In the embodiment of the present application, first indication information is received, and in response to it a target cache unit is allocated for the first instruction in a first cache; the first indication information indicates that the instruction processing flow corresponding to the first instruction in the processor has finished executing. In other words, the target cache unit is allocated only once this first indication information is received. Compared with the prior-art approach, in which an ROB cache unit is already allocated to an instruction before its processing is complete and the instruction is stored into the ROB cache after its processing flow finishes, the embodiment of the present application reduces the occupation of cache resources, increases the number of pending instructions flowing through the instruction processing pipeline, and improves instruction processing efficiency.
The following describes an instruction retirement scenario according to an embodiment of the present application by using two examples:
example one, a retirement scenario of an instruction is described with reference to fig. 3.
The instruction processing flow shown in the figure includes 6 processing steps: an instruction fetch step, a decode step, a dispatch step, an issue step, an execute step, and a retire step.
Instruction fetch step (Instruction Fetch Unit, IFU): fetches the instruction from the I-Cache, using the value of the Program Counter (PC) register as the address.
Decode step (Decoder): decodes the fetched instruction, and reads the register file according to the decode result to obtain the instruction's source operands.
Dispatch step (Dispatcher): sends the decoded instructions to the issue module in the original order specified by the program.
Issue step (Issue): sends instructions in the issue queue to the execution module. Specifically, during execution of the instruction pipeline, an instruction that has been fetched, decoded, and dispatched is pushed to the issue module and buffered in the issue module's issue queue.
Execute step (Execute, or the LSU for memory accesses): executes the instruction according to the decode result.
Retire step (Retire): retires the instruction after its execution is finished.
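The six processing steps above can be modeled as an ordered sequence of pipeline stages through which each instruction flows. This is a schematic sketch, not the patent's implementation; the stage names are illustrative.

```python
from enum import Enum

class Stage(Enum):
    FETCH = 1     # IFU: fetch from the I-Cache at the PC address
    DECODE = 2    # Decoder: decode and read source operands
    DISPATCH = 3  # Dispatcher: forward in original program order
    ISSUE = 4     # Issue: buffer in the issue queue, send to execution
    EXECUTE = 5   # Execute/LSU: execute per the decode result
    RETIRE = 6    # Retire: remove in order after execution completes

PIPELINE = list(Stage)  # Enum iteration preserves definition order
assert [s.name for s in PIPELINE] == [
    "FETCH", "DECODE", "DISPATCH", "ISSUE", "EXECUTE", "RETIRE"]
```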
As shown in fig. 3, the identifiers of the cache units in the ROB cache start from 10 and increase from top to bottom: 11, 12, 13, 14, and so on. Assume that instructions carrying instruction IDs 0, 1, 2, 3, and 4, respectively, are flowing in the instruction pipeline.
At time t0, no instruction has completed its instruction processing flow (i.e., the instruction processing state shown in the figure is 0). At time t1, instruction 1 completes its instruction processing flow (i.e., the instruction processing state shown in the figure is 1), so instruction 1 may be stored into a cache unit; since the oldest instruction, generated first (i.e., instruction 0, corresponding to the oldest id in the figure), is to be stored in cache unit 10 (i.e., the cache unit 10 corresponding to the oldest rob entry in the figure), instruction 1 may be stored in cache unit 11 according to the correspondence between instruction identifiers and cache unit identifiers. At time t2, instruction 3 completes its instruction processing flow and, by the same correspondence, may be stored in cache unit 13. At time t3, instruction 2 completes its instruction processing flow and may be stored in cache unit 12. At time t4, instruction 0 completes its instruction processing flow and may be stored in cache unit 10. At this point, instruction 0, instruction 1, and instruction 2, all generated before instruction 3, are stored in cache units, so instruction 0, instruction 1, instruction 2, and instruction 3 may be retired, that is, deleted from the ROB cache. Accordingly, the oldest id is updated to 4, and the oldest rob entry is updated to 14.
It should be further noted that, in the embodiment of the present application, the oldest instruction refers to the instruction that is ordered first, in the original order of the instruction stream, among all instructions that have not yet been retired; the value of the instruction identification information carried by the oldest instruction is the smallest. That is, a smaller instruction id value indicates an older instruction, ordered earlier in the original order of the instruction stream; conversely, a larger instruction id value indicates a newer instruction, ordered later in the original order of the instruction stream. In general, the value of the instruction id carried by the oldest instruction is zero, and the value of the corresponding oldest cache identifier in the ROB cache is also zero, but this is not specifically limited.
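The timeline of example one can be replayed with a minimal sketch: completed instructions are written into the ROB out of order, and retirement removes the contiguous completed prefix starting from the oldest un-retired instruction. The data structure and function names here are illustrative assumptions, not the patent's implementation.

```python
def retire_ready(rob, oldest_id):
    """Retire, in order, the contiguous completed prefix of the ROB
    starting from the oldest un-retired instruction ID.

    rob: dict mapping instruction ID -> completed entry.
    Returns (retired_ids, new_oldest_id).
    """
    retired = []
    while oldest_id in rob:
        retired.append(oldest_id)
        del rob[oldest_id]
        oldest_id += 1
    return retired, oldest_id

rob = {}
rob[1] = "done"   # t1: instruction 1 completes, stored at entry 11
rob[3] = "done"   # t2: instruction 3 completes, stored at entry 13
rob[2] = "done"   # t3: instruction 2 completes, stored at entry 12
# Nothing retires yet: instruction 0 (the oldest) is still pending.
assert retire_ready(dict(rob), 0) == ([], 0)
rob[0] = "done"   # t4: instruction 0 completes, stored at entry 10
retired, oldest = retire_ready(rob, 0)
assert retired == [0, 1, 2, 3] and oldest == 4
```

This matches the figure: the oldest id advances to 4 (and, correspondingly, the oldest rob entry to 14) only after instructions 0 through 3 have all completed.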
Example two, a retirement scenario of an instruction with an exception during execution of the instruction is described with reference to fig. 4.
The instruction processing flow shown in fig. 4 includes 6 processing steps: an instruction fetch step, a decode step, a dispatch step, an issue step, an execute step, and a retire step.
As shown in fig. 4, the identifiers of the cache units in the ROB cache start from 10 and increase from top to bottom: 11, 12, 13, 14, and so on. Assume that instructions carrying instruction IDs 0, 1, 2, 3, and 4, respectively, are flowing in the instruction pipeline.
At time t0, no instruction has completed its instruction processing flow (i.e., the instruction processing state shown in the figure is 0). At time t1, instruction 1 completes its instruction processing flow (i.e., the instruction processing state shown in the figure is 1), so instruction 1 may be stored into a cache unit; since the instruction generated first (i.e., instruction 0, corresponding to the oldest id in the figure) is to be stored in cache unit 10 (i.e., the cache unit 10 corresponding to the oldest rob entry in the figure), instruction 1 may be stored in cache unit 11 according to the correspondence between instruction identifiers and cache unit identifiers. At time t2, instruction 3 completes its instruction processing flow and may be stored in cache unit 13. At time t3, instruction 2 completes its instruction processing flow and may be stored in cache unit 12; however, an exception occurred during the processing of instruction 2, so it may be marked as an exception instruction. At time t4, instruction 0 completes its instruction processing flow and may be stored in cache unit 10. At this point, instruction 0, instruction 1, and instruction 2, all generated before instruction 3, are stored in cache units, but since instruction 2 is an exception instruction, only instruction 0 and instruction 1 may be retired, that is, deleted from the ROB cache.
In this case, since instruction 2 has an exception, the instructions generated after instruction 2 may also be regarded as exception instructions, so instruction 2 and the instructions 3 and 4 generated after it may be deleted; newly generated instructions are then identified starting from instruction ID "2".
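The exception case of example two can be sketched under the same illustrative model: retirement stops before the exceptional instruction, the exceptional instruction and every younger instruction are flushed, and new instruction numbering restarts at the exceptional instruction's ID. All names are assumptions for illustration.

```python
def retire_or_flush(rob, oldest_id):
    """Retire the completed in-order prefix, stopping at an exception.

    rob: dict mapping instruction ID -> state ("done", "exception", ...).
    Returns (retired_ids, flushed_ids, next_id): on an exception, the
    exceptional instruction and all younger instructions are flushed,
    and new instructions are numbered starting from the exception's ID.
    """
    retired = []
    while oldest_id in rob and rob[oldest_id] != "exception":
        retired.append(oldest_id)
        del rob[oldest_id]
        oldest_id += 1
    if rob.get(oldest_id) == "exception":
        flushed = sorted(i for i in list(rob) if i >= oldest_id)
        for i in flushed:
            del rob[i]
        return retired, flushed, oldest_id
    return retired, [], oldest_id

# State at t4 in fig. 4: instruction 2 carries an exception mark.
rob = {0: "done", 1: "done", 2: "exception", 3: "done", 4: "pending"}
retired, flushed, next_id = retire_or_flush(rob, 0)
assert retired == [0, 1]      # only instructions 0 and 1 retire
assert flushed == [2, 3, 4]   # instruction 2 and younger are flushed
assert next_id == 2           # new instructions restart at ID "2"
```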
In short, in the embodiment of the present application, the instruction sorting function contained in a conventional ROB module is decoupled from the retired-instruction caching function, so that the decoupled ROB module retains only the retired-instruction caching function, while the instruction sorting function is provided by a separate tag module. Because the instruction sorting function is decoupled from the conventional ROB module, the allocation of a cache unit can be deferred until after instruction execution completes; overall, this establishes serial, upstream-downstream relations between the cache module and caches such as the issue queue, load queue, and store queue, avoids the coexistence mode of the related art, and effectively improves resource utilization.
An embodiment of the present application provides an instruction processing apparatus, and as shown in fig. 5, the instruction processing apparatus 50 may include: a receiving module 501, an allocating module 502, wherein,
a receiving module 501, configured to receive first indication information; the first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed;
an allocating module 502, configured to allocate a target cache unit for the first instruction in a first cache in response to the first indication information; the instructions stored in the first cache are arranged according to an instruction generation sequence.
In one embodiment of the present application, the apparatus further comprises:
a storage module to store the first instruction to a target cache unit after the target cache unit is allocated for the first instruction in the first cache.
In one embodiment of the present application, the apparatus further comprises:
and the instruction deleting module is used for deleting the first instruction and the second instruction according to the instruction generating sequence under the condition that the second instruction generated before the first instruction is stored in the first cache.
In one embodiment of the present application, the apparatus further comprises:
an identity adding module, configured to add an identity to the first instruction after the instruction generation message is received, wherein the identity indicates the instruction generation order of the first instruction.
In an embodiment of the present application, the allocating module is specifically configured to determine, according to a correspondence between the identity identifier and the cache unit identifier, a target cache unit corresponding to the identity identifier.
In short, compared with the related art, the instruction processing apparatus of the embodiment of the present application mainly includes:
(1) A mark adding module is added, which can mark each instruction to be executed with an identity indicating the order of the instructions; the mark flows with the instruction through the pipeline until the instruction is retired. As an independent module, it can operate at any stage before the out-of-order stage of the instruction pipeline; for example, in the IFU stage, when an instruction is fetched from the I-Cache, the marking module follows the PC value that determines the instruction's address and completes the marking of the instruction, i.e., adds the corresponding instruction identification information.
(2) The module obtained after decoupling retains only the retired-instruction caching function; other implementation logic, such as exception marking and pipeline flushing, is basically consistent with the ROB module before decoupling. After the functional decoupling of the ROB module, however, the ROB entry is allocated when instruction execution completes.
(3) One implementation is as follows: a current-oldest-identifier register is used to record the instruction identification information of the oldest instruction among the instructions that have not yet been retired. Illustratively, after a typical power-on reset the register value is 0, i.e., it identifies the instruction numbered 0. This register assists the allocation module in completing the allocation process of writing, for an executed instruction, the corresponding instruction information into its ROB entry.
(4) Another implementation is as follows: a mapping between instruction identification information and cache unit identification information maintains the correspondence between the instruction identification information carried by the current instruction and the cache unit identification information of the ROB entry where the current instruction is located, assisting the allocation module in completing the allocation process of writing, for the executed instruction, the corresponding instruction information into the ROB entry of the cache module.
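Implementation (4) can be sketched as an explicit map that is recorded when an instruction's entry is determined and consulted when the executed instruction must be written into the cache module. The class and method names are illustrative assumptions, not the patent's terminology.

```python
class AllocationMap:
    """Maintains the instruction-ID -> ROB-entry correspondence so the
    allocation module can write an executed instruction's information
    into the ROB entry of the cache module."""

    def __init__(self):
        self.id_to_entry = {}

    def record(self, instr_id, entry_id):
        # Record the mapping when the instruction's entry is determined.
        self.id_to_entry[instr_id] = entry_id

    def entry_for(self, instr_id):
        # Look up the entry when the executed instruction is written back.
        return self.id_to_entry[instr_id]

m = AllocationMap()
m.record(0, 10)   # instruction 0 occupies ROB entry 10
m.record(1, 11)   # instruction 1 occupies ROB entry 11
assert m.entry_for(1) == 11
```

Unlike the arithmetic mapping of implementation (3), this table also accommodates the embodiment in which consecutive instructions occupy non-adjacent cache units.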
The apparatus of the embodiment of the present application may execute the method provided by the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the apparatus of the embodiments of the present application correspond to the steps in the method of the embodiments of the present application, and for the detailed functional description of the modules of the apparatus, reference may be specifically made to the description in the corresponding method shown in the foregoing, and details are not repeated here.
An embodiment of the present application provides an electronic device, including a memory and a processor; at least one program is stored in the memory and, when executed by the processor, implements the instruction processing method described above: receiving first indication information, and in response to it allocating a target cache unit for the first instruction in a first cache, where the first indication information indicates that the instruction processing flow corresponding to the first instruction in the processor has finished executing. Compared with the prior-art approach, in which an ROB cache unit is already allocated to an instruction before its processing is complete and the instruction is stored into the ROB cache after its processing flow finishes, this reduces the occupation of cache resources, increases the number of pending instructions flowing through the instruction processing pipeline, and improves instruction processing efficiency.
In an alternative embodiment, an electronic device is provided, as shown in fig. 6, the electronic device 4000 shown in fig. 6 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 4003 is used for storing application program codes (computer programs) for executing the present scheme, and is controlled by the processor 4001 to execute. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile phones, notebook computers, multimedia players, desktop computers, and the like.
The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as desired, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in this application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of this application are also within the protection scope of the embodiments of this application without departing from the technical idea of this application.

Claims (10)

1. An instruction processing method, comprising:
receiving first indication information; the first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed;
responding to the first indication information, and allocating a target cache unit for the first instruction in a first cache; the instructions stored in the first cache are arranged according to an instruction generation sequence.
2. The method of claim 1, wherein after allocating the target cache location for the first instruction in the first cache, the method further comprises:
storing the first instruction to the target cache unit.
3. The instruction processing method of claim 1, further comprising:
and if a second instruction generated before the first instruction is stored in the first cache, deleting the first instruction and the second instruction according to the instruction generation sequence.
4. The method according to any one of claims 1 to 3, wherein after receiving the instruction generation message, the method further comprises:
adding an identity to the first instruction, wherein the identity indicates an instruction generation sequence of the first instruction.
5. The method of claim 1, wherein said allocating a target cache location in a first cache for the first instruction comprises:
and determining a target cache unit corresponding to the identity identification according to the corresponding relation between the identity identification and the cache unit identification.
6. An instruction processing apparatus, comprising:
the receiving module is used for receiving first indication information; the first indication information indicates that an instruction processing flow corresponding to a first instruction in the processor is executed;
the allocation module is used for responding to the first indication information and allocating a target cache unit for the first instruction in a first cache; the instructions stored in the first cache are arranged according to an instruction generation sequence.
7. The instruction processing apparatus according to claim 6, wherein the apparatus further comprises:
a storage module to store the first instruction to a target cache unit after the target cache unit is allocated for the first instruction in the first cache.
8. The instruction processing apparatus according to claim 6, wherein said apparatus further comprises:
and the instruction deleting module is used for deleting the first instruction and the second instruction according to the instruction generating sequence under the condition that the second instruction generated before the first instruction is stored in the first cache.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: executing the instruction processing method according to any of claims 1 to 5.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the instruction processing method of any one of claims 1 to 5.
CN202210784294.6A 2022-06-28 2022-06-28 Instruction processing method and device, electronic equipment and computer-readable storage medium Pending CN115080121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210784294.6A CN115080121A (en) 2022-06-28 2022-06-28 Instruction processing method and device, electronic equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN115080121A true CN115080121A (en) 2022-09-20

Family

ID=83258642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210784294.6A Pending CN115080121A (en) 2022-06-28 2022-06-28 Instruction processing method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115080121A (en)

Similar Documents

Publication Publication Date Title
CN108027769B (en) Initiating instruction block execution using register access instructions
US7117345B2 (en) Non-stalling circular counterflow pipeline processor with reorder buffer
US8074060B2 (en) Out-of-order execution microprocessor that selectively initiates instruction retirement early
CN107771318B (en) Mapping instruction blocks to instruction windows based on block size
KR20180021812A (en) Block-based architecture that executes contiguous blocks in parallel
CN108027733B (en) Storing invalidates in a target field
CN110825437B (en) Method and apparatus for processing data
CN114356420B (en) Instruction pipeline processing method and device, electronic device and storage medium
CN112214241B (en) Method and system for distributed instruction execution unit
CN113590197A (en) Configurable processor supporting variable-length vector processing and implementation method thereof
CN116302106A (en) Apparatus, method, and system for facilitating improved bandwidth of branch prediction units
US20140095814A1 (en) Memory Renaming Mechanism in Microarchitecture
CN114579312A (en) Instruction processing method, processor, chip and electronic equipment
US11314516B2 (en) Issuing instructions based on resource conflict constraints in microprocessor
US20100306513A1 (en) Processor Core and Method for Managing Program Counter Redirection in an Out-of-Order Processor Pipeline
CN110515659B (en) Atomic instruction execution method and device
JP2008527559A (en) Processor and instruction issuing method thereof
US20090164757A1 (en) Method and Apparatus for Performing Out of Order Instruction Folding and Retirement
CN112540792A (en) Instruction processing method and device
CN115080121A (en) Instruction processing method and device, electronic equipment and computer-readable storage medium
CN111857830B (en) Method, system and storage medium for designing path for forwarding instruction data in advance
US7783692B1 (en) Fast flag generation
US20040128484A1 (en) Method and apparatus for transparent delayed write-back
US20140201505A1 (en) Prediction-based thread selection in a multithreading processor
US20190171461A1 (en) Skip ahead allocation and retirement in dynamic binary translation based out-of-order processors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination