CN115269011A - Instruction execution unit, processing unit and related device and method - Google Patents


Publication number
CN115269011A
CN115269011A (application CN202210667362.0A)
Authority
CN
China
Prior art keywords
instruction
speculative
flushing
execution unit
pipeline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210667362.0A
Other languages
Chinese (zh)
Inventor
刘畅
陈昊文
江滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou C Sky Microsystems Co Ltd
Original Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingtouge Shanghai Semiconductor Co Ltd filed Critical Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202210667362.0A priority Critical patent/CN115269011A/en
Publication of CN115269011A publication Critical patent/CN115269011A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The embodiments of the present application provide an instruction execution unit, a processing unit, and related apparatuses and methods. The scheme is applicable to various chips based on a CISC instruction set, a RISC instruction set (in particular the RISC-V instruction set), or a VLIW instruction set architecture, such as Internet of Things chips and audio/video chips. The instruction execution unit includes: an acquiring subunit, configured to acquire a pipeline flushing request for the instruction execution unit when a speculative failure occurs; and a flushing subunit, configured to respond to the pipeline flushing request and flush the pipeline of the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution delay and the architecture of the instruction execution unit, where the first flushing mode clears speculative instructions on the wrong path before the instructions are executed, and the second flushing mode prevents speculative instructions on the wrong path from producing an execution effect. This scheme can improve the efficiency of pipeline flushing.

Description

Instruction execution unit, processing unit, and related apparatus and method
Technical Field
The embodiment of the application relates to the technical field of chips, in particular to an instruction execution unit, a processing unit, a related device and a related method.
Background
In the design of high performance processors, the concept of speculative execution is introduced in order to improve the performance of the processor. The processor can predict the program flow and the branch direction through a speculative execution technology, and execute the instruction in the corresponding program flow according to the prediction result. When a program flow is mispredicted or a branch is mispredicted, a speculative failure may occur, and at this time, there may be an instruction that needs to generate an execution effect on a correct path or an instruction that does not need to generate an execution effect on a wrong path inside the processor. When a speculative failure occurs, the processor needs to generate a pipeline flushing action, and while keeping the instruction on the correct path, the processor clears the instruction on the wrong path, so as to ensure the correctness of the overall execution behavior.
At present, for an out-of-order processor, after a pipeline flushing request is generated, the speculative instructions located on the wrong path can be determined only after all instructions in the processor have finished executing. Only then can the processor ensure that the speculative instructions on the wrong path produce no execution effect while the speculative instructions on the correct path do, completing the pipeline flush.
However, waiting for all instructions in the processor to finish executing before suppressing the execution effect of wrong-path speculative instructions takes a long time, which results in low pipeline flushing efficiency.
Disclosure of Invention
Embodiments of the present disclosure provide an instruction execution unit, a processing unit, and related apparatuses and methods, to at least solve or mitigate the above-mentioned problems.
According to a first aspect of embodiments of the present application, there is provided an instruction execution unit, including: the acquiring subunit is used for acquiring a pipeline flushing request aiming at the instruction execution unit when a speculative failure occurs; and the flushing subunit is used for responding to the pipeline flushing request, and performing pipeline flushing on the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution delay and the architecture of the instruction execution unit, wherein the first flushing mode clears the speculative instructions on the wrong path before the instructions are executed, and the second flushing mode enables the speculative instructions on the wrong path not to generate the execution effect.
According to a second aspect of embodiments of the present application, there is provided a processing unit comprising: the instruction fetching unit is used for acquiring an instruction to be executed; the instruction decoding unit is used for decoding the instruction to be executed; the instruction transmitting unit is used for sending the decoded instruction to be executed to the instruction executing unit; at least one instruction execution unit according to the first aspect above.
According to a third aspect of embodiments herein, there is provided a computing device comprising: the processing unit of the second aspect; a memory coupled to the processing unit and storing the instructions to be executed.
According to a fourth aspect of embodiments of the present application, there is provided a pipeline flushing method, including: when a speculative failure occurs, acquiring a pipeline flushing request aiming at an instruction execution unit; and responding to the pipeline flushing request, and performing pipeline flushing on the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution delay and the architecture of the instruction execution unit, wherein the first flushing mode clears the speculative instructions on the wrong path before the instructions are executed, and the second flushing mode enables the speculative instructions on the wrong path not to generate the execution effect.
According to the pipeline flushing scheme provided by the embodiments of the present application, when a speculative failure occurs, the acquiring subunit acquires a pipeline flushing request, and the flushing subunit responds to the pipeline flushing request by flushing the pipeline of the instruction execution unit through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit. Because the flushing subunit selects the flushing mode according to the execution delay and the architecture of the instruction execution unit, it can flush instruction execution units of different architectures while increasing the flushing speed, thereby improving the efficiency of pipeline flushing.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present application, and other drawings can be derived from them by those skilled in the art.
FIG. 1 is a schematic diagram of a computing device to which one embodiment of the present application applies;
FIG. 2 is a schematic view of a processing unit of one embodiment of the present application;
FIG. 3 is a schematic diagram of an instruction execution unit of one embodiment of the present application;
FIG. 4 is a schematic diagram of an instruction sequence of one embodiment of the present application;
FIG. 5 is a flow diagram of a pipeline flushing method of one embodiment of the present application.
Detailed Description
The present application is described below based on embodiments, but the present application is not limited to these embodiments. In the following detailed description of the present application, some specific details are set forth. It will be apparent to one skilled in the art that the present application may be practiced without these specific details. Well-known methods, procedures, and components have not been described in detail so as not to obscure the present application. The figures are not necessarily drawn to scale.
First, the following explanations apply to some terms that appear in the description of the embodiments of the present application.
Speculative execution: an optimization technique in which a processor predicts the program flow and branch direction according to available information and, based on the prediction result, uses idle time to execute subsequent instructions in the program flow in advance; the instructions executed in advance may or may not be used subsequently.
Correct path: the path that the program flow will execute when a speculative failure occurs. The speculative instructions located on the correct path need to produce an execution effect, and they are older than the speculative failure instruction that caused the failure.
Wrong path: the path that the program flow will not execute when a speculative failure occurs. The speculative instructions located on the wrong path need not produce an execution effect, and they are newer than the speculative failure instruction that caused the failure.
Speculative failure instruction: the speculative instruction that causes the speculative failure. Speculative instructions newer than the speculative failure instruction need not complete their execution effect, while speculative instructions older than it must.
Pipeline flushing: the behavior by which, when a speculative failure occurs, the processor clears the speculative instructions located on the wrong path inside it.
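As an illustrative sketch (not part of the patented scheme), the correct-path/wrong-path classification defined above can be modeled by comparing instruction age against the speculative failure instruction; the sequence-number encoding below is a hypothetical simplification in which a smaller number means an older instruction in program order.

```python
def classify_paths(in_flight, fail_seq):
    """Split in-flight instruction sequence numbers into correct/wrong path.

    Instructions older than the failing instruction are on the correct path
    and must complete their execution effect; instructions newer than it are
    on the wrong path and must be cleared by the pipeline flush.
    """
    correct = [seq for seq in in_flight if seq < fail_seq]
    wrong = [seq for seq in in_flight if seq > fail_seq]
    return correct, wrong

# Instruction 3 is older than the failing instruction 5 (correct path);
# instructions 7 and 9 are newer (wrong path).
correct, wrong = classify_paths([3, 5, 7, 9], fail_seq=5)
```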
Computing device
FIG. 1 shows a schematic block diagram of a computing device 10. Computing device 10 may be built on various types of processing units and driven by an operating system such as WINDOWS, UNIX, or Linux. Further, computing device 10 may be implemented in hardware and/or software in a PC, desktop, notebook, server, or mobile communication device, among others.
As shown in FIG. 1, computing device 10 may include one or more processing units 12 and memory 14. Memory 14 in computing device 10 may serve as main storage (referred to simply as main memory or memory) for storing instruction information and/or data information represented by data signals. For example, memory 14 may store data provided by processing unit 12 (e.g., operation results), and may also be used to facilitate data exchange between processing unit 12 and external storage device 16 (also referred to as secondary or external storage).
In some cases, processing unit 12 needs to access memory 14 over bus 11 to retrieve data in memory 14 or to modify data in memory 14. To alleviate the speed gap between processing unit 12 and memory 14 due to the slow access speed of memory 14, computing device 10 also includes a cache memory 18 communicatively coupled to bus 11, cache memory 18 being used to cache some program data or message data in memory 14 that may be repeatedly called. The cache Memory 18 may be implemented by a type of storage device such as a Static Random Access Memory (SRAM). The Cache memory 18 may have a multi-level structure, for example, a three-level Cache structure having a first-level Cache (L1 Cache), a second-level Cache (L2 Cache), and a third-level Cache (L3 Cache), and the Cache memory 18 may also have a Cache structure with more than three levels or other types of Cache structures. In some embodiments, a portion of cache memory 18 (e.g., a level one cache, or both a level one cache and a level two cache) may be integrated within processing unit 12 or in the same system on a chip as processing unit 12.
Based on this, the processing unit 12 may include an instruction execution unit 121, a memory management unit 122, and the like. The instruction execution unit 121, when executing some instructions that require a memory modification, initiates a write access request that specifies the write data and corresponding physical address that need to be written to the memory. The memory management unit 122 is configured to translate the virtual addresses specified by the instructions into physical addresses mapped by the virtual addresses, where the physical addresses specified by the write access request may be consistent with the physical addresses specified by the corresponding instructions.
The information exchange between the memory 14 and the cache 18 may be organized in data blocks. In some embodiments, the cache memory 18 and the memory 14 may be divided into data blocks by the same spatial size, and a data block may be the smallest unit of data exchange (including one or more data of a preset length) between the cache memory 18 and the memory 14. For the sake of brevity and clarity, each data block in the cache memory 18 is referred to below simply as a cache block (or may be referred to as a cacheline or cache line), and different cache blocks have different cache block addresses. Each data block in the memory 14 is simply referred to as a memory block, and different memory blocks have different memory block addresses. The cache block address and/or the memory block address may include a physical address tag used to locate the data block.
Due to space and resource constraints, the cache memory 18 cannot cache the entire contents of the memory 14, i.e., the storage capacity of the cache memory 18 is generally smaller than that of the memory 14, and the cache block addresses provided by the cache memory 18 cannot correspond one-to-one to the memory block addresses provided by the memory 14. When the processing unit 12 needs to access the memory, it first accesses the cache memory 18 via the bus 11 to determine whether the contents to be accessed are already stored in the cache memory 18. If so, the cache memory 18 hits, and the processing unit 12 calls the contents to be accessed directly from the cache memory 18. If not, the cache memory 18 misses, and the processing unit 12 needs to access the memory 14 via the bus 11 to look up the corresponding information there. Because the access rate of the cache memory 18 is very fast, the efficiency of the processing unit 12 can be significantly improved when the cache memory 18 hits, thereby also improving the performance and efficiency of the overall computing device 10.
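The hit/miss flow described above can be sketched as follows; this is a hypothetical simulation (the function name, dictionary-based cache, and addresses are illustrative assumptions, not the structure of cache memory 18).

```python
def read(address, cache, memory):
    """Return (value, hit) for one access; fill the cache on a miss."""
    if address in cache:          # cache hit: serve the content directly from the cache
        return cache[address], True
    value = memory[address]       # cache miss: fetch from main memory over the bus
    cache[address] = value        # cache the block for future repeated accesses
    return value, False

memory = {0x100: 42}
cache = {}
first = read(0x100, cache, memory)   # first access misses and fills the cache
second = read(0x100, cache, memory)  # second access hits
```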
In addition, computing device 10 may also include input/output devices such as storage device 16, a display device, an audio device, a mouse/keyboard, and the like. The storage device 16 may be a hard disk, an optical disk, and a flash memory, etc. coupled to the bus 11 through respective interfaces for information access. The display device may be coupled to the bus 11 via a corresponding graphics card for displaying in accordance with display signals provided by the bus 11.
Computing device 10 may also include a communication device 17, and in turn computing device 10 may communicate with a network or other devices in a variety of ways. The communication device 17 may include one or more communication modules and the communication device 17 may include a wireless communication module adapted for a particular wireless communication protocol. For example, the communication device 17 may include a WLAN module for implementing WiFi communication compliant with the 802.11 standard established by the Institute of Electrical and Electronics Engineering (IEEE). The communication device 17 may include a WWAN module for wireless wide area communication conforming to a cellular or other wireless wide area protocol. The communication device 17 may also include a bluetooth module or other communication module using other protocols, or other custom type communication modules. The communication device 17 may also be a port for serial transmission of data.
It should be noted that the structure of the computing device 10 may vary from computing device 10 to computing device 10 depending on the motherboard, operating system, and instruction set architecture. For example, many computing devices today are provided with an input/output control hub connected between the bus 11 and various input/output devices, and the input/output control hub may be integrated within the processing unit 12 or separate from the processing unit 12.
Processing unit
FIG. 2 is a schematic block diagram of processing unit 12 of one embodiment of the present application. As shown in FIG. 2, each processing unit 12 may include one or more processor cores 120 for processing instructions, and the processing and execution of instructions may be controlled by a user (e.g., via an application program) and/or a system platform. Each processor core 120 may be configured to process a specific instruction set, which may support Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC), or Very Long Instruction Word (VLIW) based computing; in particular, the processor cores 120 may be adapted to process the RISC-V instruction set. Different processor cores 120 may each process different or the same instruction sets. The processor core 120 may also include other processing modules, such as a Digital Signal Processor (DSP). As an example, processor core 1 through processor core m are shown in FIG. 2, m being a positive integer.
Cache memory 18, shown in FIG. 1, may be fully or partially integrated within processing unit 12. Depending on the architecture, the cache memory 18 may be a single-level or multi-level internal cache (e.g., the level 1 through level 3 caches L1 through L3 shown in FIG. 2, collectively referred to as 18 in FIG. 2) located within and/or outside of the respective processor cores 120, and may include instruction-oriented and data-oriented caches. Various components in processing unit 12 may share at least a portion of the cache memory; for example, processor cores 1 through m may share the third-level cache L3. Processing unit 12 may also include an external cache (not shown), and other cache structures may also be external to processing unit 12.
As shown in FIG. 2, processing unit 12 may include a Register File 126, and Register File 126 may include a plurality of registers for storing different types of data and/or instructions, which may be of different types, such as Register File 126 may include integer registers, floating point registers, status registers, instruction registers, pointer registers, and the like. The registers in register file 126 may be implemented using general purpose registers, or may be designed specifically for the actual needs of processing unit 12.
The processing Unit 12 may include a Memory Management Unit (MMU) 122 for implementing the translation of virtual addresses to physical addresses. A part of entries in the page table are cached in the memory management unit 122, and the memory management unit 122 may also obtain uncached entries from the memory. One or more memory management units 122 may be disposed in each processor core 120, and the memory management units 122 in different processor cores 120 may be synchronized with the memory management units 122 located in other processing units or processor cores, such that each processing unit or processor core may share a unified virtual storage system.
The processing unit 12 is used to execute sequences of instructions (i.e., programs). The execution of each instruction by processing unit 12 includes fetching the instruction from the memory storing it, decoding the fetched instruction, executing the decoded instruction, saving the execution result, and so on, looping until all instructions in the instruction sequence have been executed or a halt instruction is encountered.
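The fetch-decode-execute loop just described can be sketched with a toy instruction sequence; the two-field instruction format and the `ADD`/`HALT` operations below are hypothetical simplifications, not the instruction set of processing unit 12.

```python
def run(program):
    """Execute a toy instruction sequence until it ends or HALT is reached."""
    pc, acc = 0, 0
    while pc < len(program):
        op, arg = program[pc]          # fetch the instruction at the current address
        pc += 1                        # compute the next instruction fetch address
        if op == "ADD":                # decode and execute
            acc += arg                 # save the execution result
        elif op == "HALT":             # halt instruction ends the loop early
            break
    return acc

# The trailing ADD is never executed because HALT is encountered first.
result = run([("ADD", 2), ("ADD", 3), ("HALT", 0), ("ADD", 100)])
```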
To implement the above process, the processing unit 12 may include an instruction fetch unit 124, an instruction decode unit 125, an instruction issue unit (not shown), an instruction execution unit 121, a retirement unit 123, and the like.
Instruction fetch unit 124 acts as a boot engine for processing unit 12 to move instructions from memory 14 into an instruction register (which may be one of register files 126 shown in FIG. 2 for holding instructions) and to receive a next instruction fetch address or to compute a next instruction fetch address based on an instruction fetch algorithm, which may be incrementing the address or decrementing the address based on the instruction length.
After fetching the instruction, the processing unit 12 enters an instruction decoding stage, and the instruction decoding unit 125 decodes the fetched instruction according to a predetermined instruction format to obtain operand fetch information required by the fetched instruction, so as to prepare for the operation of the instruction execution unit 121. The operand fetch information may include a pointer to an immediate, register, or other software/hardware capable of providing source operands.
An instruction issue unit is typically present in the high performance processing unit 12 between the instruction decode unit 125 and the instruction execution unit 121 for scheduling and control of instructions to efficiently distribute individual instructions to different instruction execution units 121, enabling parallel operation of multiple instructions. After an instruction is fetched, decoded and dispatched to the corresponding instruction execution unit 121, the corresponding instruction execution unit 121 starts executing the instruction, i.e. performs the operation indicated by the instruction, and the corresponding function is performed.
The retirement unit 123 (also referred to as an instruction retirement unit or an instruction write-back unit) is mainly configured to write back an execution result generated by the instruction execution unit 121 to a corresponding storage location (e.g., a register inside the processing unit 12), so that a subsequent instruction can quickly obtain the corresponding execution result from the storage location.
For different classes of instructions, different instruction execution units 121 may be provided in the processing unit 12 accordingly. The instruction execution unit 121 may be an operation unit (e.g., including an arithmetic logic unit, a shaping processing unit, a vector operation unit, etc. for performing operations according to operands and outputting operation results), a memory execution unit (e.g., for accessing a memory according to an instruction to read data in the memory or write specified data to the memory, etc.), a coprocessor, etc. In the processing unit 12, the respective instruction execution units 121 may run in parallel and output the corresponding execution results.
Instruction execution unit 121, when executing a certain type of instruction (e.g., a memory access instruction), needs to access memory 14 to obtain information stored in memory 14 or to provide data that needs to be written into memory 14. It should be noted that the instruction execution Unit 121 for executing the access instruction may also be referred to as a memory execution Unit, and the memory execution Unit may be a Load Store Unit (LSU) and/or other units for memory access.
After the access instruction is fetched by instruction fetch unit 124, instruction decode unit 125 may decode the access instruction so that the source operand of the access instruction may be fetched. The decoded access instruction is provided to the corresponding instruction execution unit 121, and the instruction execution unit 121 may perform a corresponding operation on a source operand of the access instruction (e.g., an arithmetic logic unit performs an operation on the source operand stored in a register) to obtain address information corresponding to the access instruction, and initiate a corresponding request, such as an address translation request, a write access request, and the like, according to the address information.
The source operand of the access instruction generally includes an address operand, and the instruction execution unit 121 performs an operation on the address operand to obtain the virtual address or physical address corresponding to the access instruction. When the memory management unit 122 is disabled, the instruction execution unit 121 may directly obtain the physical address of the memory access instruction through a logical operation. When the memory management unit 122 is enabled, the corresponding instruction execution unit 121 initiates an address translation request according to the virtual address corresponding to the access instruction, where the address translation request includes the virtual address corresponding to the address operand of the access instruction; the memory management unit 122 responds to the address translation request and translates the virtual address in the request into a physical address according to the entry matching the virtual address, so that the instruction execution unit 121 can access the cache memory 18 and/or the memory 14 according to the translated physical address.
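The address translation performed by the memory management unit can be sketched as a page-table lookup; this is an illustrative assumption (the 4 KiB page size and the flat page-number-to-frame-number map are hypothetical, not the entry format cached in memory management unit 122).

```python
PAGE_SIZE = 4096  # hypothetical 4 KiB pages

def translate(virtual_addr, page_table):
    """Translate a virtual address using a page-number -> frame-number map."""
    page = virtual_addr // PAGE_SIZE
    offset = virtual_addr % PAGE_SIZE
    frame = page_table[page]           # a missing entry would raise (akin to a page fault)
    return frame * PAGE_SIZE + offset  # physical address keeps the in-page offset

page_table = {1: 7}                    # virtual page 1 maps to physical frame 7
physical = translate(4096 + 16, page_table)
```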
Depending on the function, the memory access instruction may include a load instruction and a store instruction. The execution of the load instruction typically does not require modification of information in memory 14 or cache 18, and the instruction execution unit 121 need only read data stored in memory 14, cache 18, or an external storage device according to the address operand of the load instruction. Unlike load instructions, where the source operands of a store instruction include not only address operands, but also data information, the execution of the store instruction typically requires modification of information in memory 14 and/or cache 18. The data information of the store instruction may point to write data, and the source of the write data may be the execution result of an instruction such as an arithmetic instruction, a load instruction, etc., or may be data provided by a register or other storage unit in the processing unit 12, or may be an immediate.
The processing unit 12 may predict the program flow and the branch direction through speculative execution, and execute the subsequent instructions in the program flow in advance at the idle time according to the prediction result, and when a speculative failure occurs, the processing unit 12 needs to flush through a pipeline, clear the speculative instructions located on the internal error path, and retain the speculative instructions on the correct path, so as to ensure the correctness of the overall execution behavior. When a speculative failure occurs, the instruction execution unit 121 within the pipeline flushing range needs to perform pipeline flushing to clear the speculative instructions located on the wrong path inside and complete the execution effect of the speculative instructions on the correct path.
The processing unit 12 may further include a retirement unit 123, and when a speculative failure occurs, the retirement unit 123 may determine a pipeline flushing range according to a cause of the speculative failure, and further send a pipeline flushing request to the instruction execution unit 121 located within the pipeline flushing range, so that the instruction execution unit 121 located within the pipeline flushing range performs pipeline flushing, clears speculative instructions on an incorrect path, and retains speculative instructions on a correct path, thereby ensuring that the overall execution behavior of the processing unit 12 is correct.
The embodiments of the present application mainly focus on the pipeline flushing process of the instruction execution unit 121, and the process of the pipeline flushing will be described in detail later.
Instruction execution unit
When a speculative failure occurs, the instruction execution units within the pipeline flushing range need to perform pipeline flushing, and while the pipeline is being flushed, the preceding stages cannot send new instructions to the instruction execution unit. A pipeline flush therefore creates a large number of execution bubbles in the processing unit, which affects its performance. Since pipeline flushes cannot be eliminated, reducing the time required for each pipeline flush becomes a viable way to improve the performance of a processing unit.
Currently, when an instruction execution unit flushes the pipeline, it must determine the age of its internal instructions relative to the speculative failure instruction according to the depth of the pipeline, then use that age relationship to identify the speculative instructions on the wrong path and those on the correct path, and finally clear the former while completing the execution effect of the latter. However, determining instruction age from pipeline depth is only applicable to some types of processing units: an in-order processing unit can determine instruction age from pipeline depth, whereas an out-of-order processing unit must wait for all instructions in the processor to finish executing before it can ensure that wrong-path speculative instructions have no effect, so pipeline flushing on an out-of-order processing unit is inefficient.
The embodiments of the present application address this low pipeline flushing efficiency of out-of-order processing units, mainly through the instruction execution unit 121. The internal structure of the instruction execution unit 121 and the implementation of the embodiments are discussed in detail below.
FIG. 3 is a diagram illustrating an internal structure of an instruction execution unit according to an embodiment of the present application. As shown in fig. 3, the instruction execution unit 121 includes a fetch subunit 1211 and a flushing subunit 1212. Upon a speculative failure, the fetch subunit 1211 may obtain a pipeline flushing request for the instruction execution unit 121 in which it is located. In response to that request, the flushing subunit 1212 may flush the pipeline of the instruction execution unit 121 through a first flushing mode and/or a second flushing mode, chosen according to the execution delay and the architecture of the instruction execution unit 121. The first flushing mode clears the speculative instructions on the wrong path before they are executed; the second flushing mode lets the speculative instructions on the wrong path execute but produce no execution effect.
When a speculative failure occurs, the retirement unit 123 in the processing unit 12 determines a pipeline flushing range according to the cause of the speculative failure, and sends a pipeline flushing request to the instruction execution unit 121 located in the pipeline flushing range. An instruction execution unit 121 that is within the pipeline flush range may receive a pipeline flush request.
The first flushing mode clears the speculative instructions on the wrong path before they are executed, so the pipeline flush does not have to wait for all instructions inside the instruction execution unit 121 to finish executing, which shortens the time each flush takes.
The second flushing mode lets the speculative instructions on the wrong path execute without producing an execution effect: after all instructions inside the instruction execution unit 121 have finished executing, the relative age of the instructions can be determined from the execution results, so the second flushing mode is applicable to instruction execution units 121 of various architectures. However, because the second flushing mode determines instruction age only after execution completes, a long-delay instruction inside the instruction execution unit 121 forces the flush to wait for that instruction to finish, so the flush takes a long time. During the flush, the preceding stage cannot issue new instructions to the instruction execution unit 121, so the processing unit 12 accumulates execution bubbles and its performance suffers.
In this embodiment, when a speculative failure occurs, the fetch subunit 1211 obtains a pipeline flushing request, and in response the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit 121. The first flushing mode clears the speculative instructions on the wrong path before they execute, so it flushes quickly, but it is only applicable to some architecture types; the second flushing mode makes the speculative instructions on the wrong path produce no execution effect and is applicable to instruction execution units of various architecture types. By selecting between the two modes according to the execution delay and architecture of the instruction execution unit 121, the flushing subunit 1212 can flush instruction execution units of different architectures while also increasing the flushing speed, thereby improving pipeline flushing efficiency.
In a possible implementation, when the pipeline of the instruction execution unit 121 is flushed through the first flushing mode, the speculative failure instruction that caused the speculative failure is determined first. Then, according to the pipeline depth of the instruction execution unit 121, the relative age of each speculative instruction in the instruction execution unit 121 with respect to the speculative failure instruction is determined. From that relative age, the first speculative instructions on the wrong path and the second speculative instructions on the correct path can be identified; the first speculative instructions are then cleared, and the second speculative instructions continue to execute.
When the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through the first flushing mode, it may first determine the speculative failure instruction that caused the speculative failure. A speculative failure may be caused by a program flow prediction error, such as an interrupt or exception, or by a branch prediction error. When an interrupt or exception occurs, if the currently executed instruction is a speculative instruction, that instruction is determined to be the speculative failure instruction; if it is not, the next speculative instruction to be executed is determined to be the speculative failure instruction. When a branch prediction error occurs, the speculative instruction at the branch position of the program flow is determined to be the speculative failure instruction.
Alternatively, the relative age of the speculative instructions in the instruction execution unit 121 may be determined according to the instruction IDs. An ID is assigned to each instruction when it is issued, and the IDs increase (or decrease) monotonically in issue order, so the relative age of any two instructions can be determined by comparing their IDs.
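As a software sketch of this ID scheme (the class and function names, and the simple integer-counter IDs, are illustrative assumptions, not the patent's actual circuit design), monotonically increasing IDs assigned at issue time let any two in-flight instructions be ordered by a single comparison:

```python
# Hypothetical model: tag instructions with IDs that grow in issue order,
# then decide relative age by comparing IDs.

class IssueQueue:
    def __init__(self):
        self._next_id = 0

    def issue(self, opcode):
        """Assign the next ID in issue order and return the tagged instruction."""
        inst = {"opcode": opcode, "id": self._next_id}
        self._next_id += 1
        return inst


def is_newer(inst, reference):
    """An instruction issued later (larger ID) is newer than the reference."""
    return inst["id"] > reference["id"]


q = IssueQueue()
a = q.issue("A")
b = q.issue("B")   # speculative instruction 1
c = q.issue("C")   # speculative instruction 2
assert is_newer(c, b)       # C was issued after B, so it is newer
assert not is_newer(a, b)   # A was issued before B
```

In hardware the counter would wrap, so a real design needs a wrap-aware comparison; the unbounded Python integer sidesteps that here.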
FIG. 4 is a diagram of an instruction sequence (program flow) of one embodiment of the present application. As shown in fig. 4, the instruction execution unit 121 currently executes the instruction a, and executes the instruction B, the instruction C, the instruction D, and the instruction E in advance through speculative execution, where the instruction B is a speculative instruction 1, the instruction C is a speculative instruction 2, the instruction D is a speculative instruction 3, and the instruction E is a speculative instruction 4. If instruction execution unit 121 generates an interrupt or exception while executing instruction A, instruction B (speculative instruction 1) is determined to be a speculative failure instruction. If instruction execution unit 121 generates an interrupt or exception while executing instruction C, instruction C (speculative instruction 2) is determined to be a speculative failure instruction.
For an instruction execution unit 121 that processes instructions in order, after determining the speculative failure instruction, the flushing subunit 1212 may determine, according to the pipeline depth of the instruction execution unit 121, the relative age of each speculative instruction with respect to the speculative failure instruction, determine the speculative instructions newer than the speculative failure instruction to be the first speculative instructions on the wrong path, and determine the speculative instructions older than the speculative failure instruction to be the second speculative instructions on the correct path.
It should be noted that the first speculative instruction may include a plurality of instructions, and is not limited to one instruction. The second speculative instruction may also include a plurality of instructions, not limited to one instruction.
As shown in fig. 4, if instruction B (speculative instruction 1) is determined to be the speculative failure instruction, then speculative instruction 2 (instruction C), speculative instruction 3 (instruction D), and speculative instruction 4 (instruction E), which are newer than instruction B, are determined to be first speculative instructions. If instruction C (speculative instruction 2) is determined to be the speculative failure instruction, then speculative instruction 3 (instruction D) and speculative instruction 4 (instruction E), which are newer than instruction C, are determined to be first speculative instructions, and speculative instruction 1 (instruction B), which is older than instruction C, is determined to be a second speculative instruction.
In this embodiment of the present application, for an instruction execution unit 121 whose instruction age can be determined from pipeline depth, the flushing subunit 1212, after determining the speculative failure instruction, determines the relative age of each speculative instruction with respect to the speculative failure instruction according to the pipeline depth, determines the speculative instructions newer than the speculative failure instruction to be first speculative instructions and those older to be second speculative instructions, then clears the first speculative instructions and lets the second speculative instructions continue to execute, completing the pipeline flush. Because the first speculative instructions are cleared before they execute, the flush can complete without waiting for the instructions inside the instruction execution unit 121 to finish, which improves pipeline flushing efficiency.
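The first flushing mode can be sketched as a simple partition around the failure instruction's ID (a software analogue under the ID-based age assumption above, not the patent's hardware):

```python
# Hypothetical sketch of the first flushing mode: speculative instructions
# newer than the failure instruction are on the wrong path and are cleared
# before execution; older ones are on the correct path and keep executing.

def first_flush_mode(speculative, fail_id):
    """Partition in-flight speculative instructions by issue-order ID.

    Returns (correct_path, wrong_path): instructions older than the failure
    instruction are kept, newer ones are cleared.
    """
    correct = [i for i in speculative if i["id"] < fail_id]   # keep, keep executing
    wrong = [i for i in speculative if i["id"] > fail_id]     # clear before execution
    return correct, wrong


# The FIG. 4 scenario: B, C, D, E issued in order with IDs 1..4.
insts = [{"name": n, "id": k} for k, n in enumerate("BCDE", start=1)]
# Speculative failure at instruction C (id 2): D and E are wrong path, B correct.
correct, wrong = first_flush_mode(insts, fail_id=2)
assert [i["name"] for i in correct] == ["B"]
assert [i["name"] for i in wrong] == ["D", "E"]
```

The point of the mode is that this partition needs no execution results, so the flush completes as soon as the wrong-path entries are dropped from the pipeline registers.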
In a possible implementation, when the pipeline of the instruction execution unit 121 is flushed through the second flushing mode, after all instructions in the instruction execution unit 121 have finished executing, the flushing subunit 1212 determines, according to the relative age of the instructions, the third speculative instructions on the wrong path and the fourth speculative instructions on the correct path, so that the third speculative instructions produce no execution effect while the fourth speculative instructions do.
When the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through the second flushing mode, after the instructions in the instruction execution unit 121 have finished executing, the flushing subunit 1212 determines the relative age of the instructions according to their execution results, and from that relative age identifies the third speculative instructions on the wrong path and the fourth speculative instructions on the correct path. Because the flushing subunit 1212 determines instruction age from execution results, instruction execution units of various architecture types, such as in-order and out-of-order instruction execution units, can perform pipeline flushing through the second flushing mode.
It should be appreciated that in an out-of-order processing unit, the instruction execution unit 121 executes instructions out of order, then writes them back and reorders them after execution completes, so the relative age of the instructions can be determined once execution is complete. In an in-order processing unit, the instruction execution unit 121 executes instructions sequentially and the executed instructions remain in order, so the relative age of the instructions can likewise be determined after execution.
When the pipeline of the instruction execution unit 121 is flushed through the second flushing mode, both the third speculative instructions on the wrong path and the fourth speculative instructions on the correct path are executed, but only the fourth speculative instructions produce an execution effect, which ensures the correctness of the overall execution behavior. Both complete execution; however, the results of the third speculative instructions are not written back and produce no other execution effect, while the fourth speculative instructions are written back normally and produce their execution effects.
As shown in fig. 4, instruction C (speculative instruction 2) is a branch instruction, and speculative execution predicts that the program flow takes the branch formed by instruction D (speculative instruction 3) and instruction E (speculative instruction 4). If the branch of instruction C (speculative instruction 2) is mispredicted, that is, instruction F should be executed after instruction C but speculative execution predicted instruction D, then instruction C, instruction D, and instruction E are determined to be third speculative instructions on the wrong path, and instruction B is determined to be a fourth speculative instruction on the correct path. Instructions B, C, D, and E all complete execution, but only the execution result of instruction B is written back normally; the execution results of instructions C, D, and E are not written back, so instruction B produces an execution effect while instructions C, D, and E do not.
In this embodiment, for instruction execution units 121 of various architecture types, the flushing subunit 1212 may determine the relative age of the instructions after all instructions in the instruction execution unit 121 have finished executing, identify the third speculative instructions on the wrong path and the fourth speculative instructions on the correct path, and make only the fourth speculative instructions produce an execution effect, thereby completing the pipeline flush. Because the speculative instructions on the correct and wrong paths are determined after execution completes, this approach is suitable for pipeline flushing on both in-order and out-of-order processing units, which ensures the applicability of pipeline flushing.
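A minimal software analogue of the second flushing mode (the register-file dictionary, instruction format, and the "ID at or after the failure means wrong path" rule are illustrative assumptions) is to let every instruction finish and simply suppress write-back for wrong-path results:

```python
# Hypothetical sketch of the second flushing mode: all in-flight instructions
# execute to completion; only correct-path results are written back, so
# wrong-path instructions produce no execution effect.

def second_flush_mode(instructions, fail_id, regfile):
    """Execute everything, then write back only correct-path results."""
    for inst in instructions:
        result = inst["op"](regfile)          # executes regardless of path
        if inst["id"] < fail_id:              # older than failure: correct path
            regfile[inst["dest"]] = result    # write-back is the execution effect
        # newer (wrong path): result is silently dropped
    return regfile


regs = {"r1": 0, "r2": 0}
prog = [
    {"id": 1, "dest": "r1", "op": lambda r: 10},   # correct path (older)
    {"id": 3, "dest": "r2", "op": lambda r: 99},   # wrong path (newer)
]
second_flush_mode(prog, fail_id=2, regfile=regs)
assert regs == {"r1": 10, "r2": 0}   # wrong-path result never written back
```

Note how the wrong-path instruction still "runs" (its `op` is called), matching the text's point that this mode waits for execution to complete and only then discards the effect.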
In one possible implementation, when the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit 121, if the average execution delay of the instruction execution unit 121 is greater than a preset delay threshold, or the instruction execution unit 121 uses an in-order processing architecture, the flushing subunit 1212 flushes the pipeline through the first flushing mode.
Because the first flushing mode can identify the speculative instructions on the wrong and correct paths from their relative age before the instructions execute, the pipeline flush can complete without waiting for the instructions in the instruction execution unit 121 to finish. This shortens the time each flush takes, which improves performance particularly for an instruction execution unit 121 with a large execution delay.
For an instruction execution unit 121 with an in-order processing architecture, the relative age of the instructions can be determined from the pipeline depth, so no complex circuit structure needs to be designed for determining instruction age. This preserves the pipeline flushing speed while avoiding extra design complexity, keeping the cost and area of the processing unit 12 low.
For example, the instruction execution units 121 included in the processing unit 12 may include a vector processing unit, which has a relatively large, fixed execution delay; the flushing subunit 1212 then flushes through the first flushing mode. During the flush, the instructions in the pipeline are examined and the speculative instructions on the wrong path are cleared.
It should be understood that the average execution delay of the instruction execution unit 121 may be determined from the execution delays of the instructions that unit is responsible for executing: if the larger proportion of those instructions have an execution delay greater than the delay threshold, the average execution delay of the unit is deemed greater than the delay threshold. For example, if 90% of the instructions executed by the instruction execution unit 121 have an execution delay above the threshold and 10% below it, the average execution delay of the unit is determined to be greater than the threshold.
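The majority-proportion rule above can be sketched as follows (the instruction-mix tuples, the 0.5 majority cutoff, and all cycle counts are made-up illustrative values, not from the patent):

```python
# Hypothetical sketch: decide whether a unit's "average execution delay"
# exceeds the threshold by checking what fraction of its instruction mix
# individually exceeds that threshold.

def average_delay_exceeds(instruction_mix, threshold, majority=0.5):
    """instruction_mix: list of (fraction_of_instructions, delay_cycles).

    The unit's average delay is deemed above the threshold when instructions
    whose own delay exceeds it make up more than `majority` of the mix.
    """
    over = sum(frac for frac, delay in instruction_mix if delay > threshold)
    return over > majority


# 90% of executed instructions exceed the threshold -> deemed above it.
assert average_delay_exceeds([(0.9, 8), (0.1, 2)], threshold=5)
# Only 20% exceed it -> deemed below it.
assert not average_delay_exceeds([(0.2, 8), (0.8, 2)], threshold=5)
```

In a real design this judgment would be made once at design time from the unit's instruction-class latencies rather than computed at run time.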
In the embodiment of the present application, when the average execution delay of the instruction execution unit 121 is greater than the delay threshold, or the instruction execution unit 121 uses an in-order processing architecture, the flushing subunit 1212 flushes its pipeline through the first flushing mode, which preserves the pipeline flushing speed while reducing the design complexity of the processing unit 12, keeping its cost and area low.
In a possible implementation, when the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit 121, if the average execution delay of the instruction execution unit 121 is less than or equal to the preset delay threshold and the instruction execution unit 121 uses an out-of-order processing architecture, the flushing subunit 1212 flushes the pipeline through the second flushing mode.
For an instruction execution unit 121 with an out-of-order processing architecture, if its execution delay is small, the pipeline flushing speed has little effect on its performance, so there is no need to design a complex circuit structure just to speed up flushing. Instead, after all instructions in the instruction execution unit 121 have finished executing, the relative age of the instructions is determined, the speculative instructions on the wrong path are made to produce no execution effect while those on the correct path produce theirs, and the pipeline flush completes. This reduces the design complexity of the instruction execution unit 121 while preserving its performance.
For example, the instruction execution units 121 included in the processing unit 12 may include an integer processing unit, which has a relatively small, fixed execution delay; the flushing subunit 1212 then flushes through the second flushing mode. After all instructions in the instruction execution unit 121 have finished executing, the flushing subunit 1212 determines whether each speculative instruction lies on the correct path, writes back the speculative instructions on the correct path so that they produce their execution effects, and clears the speculative instructions on the wrong path.
In the embodiment of the present application, for an out-of-order instruction execution unit 121 whose average execution delay is less than or equal to the delay threshold, the flushing subunit 1212 flushes the pipeline through the second flushing mode, so no circuit structure for determining instruction age needs to be designed into the instruction execution unit 121. This preserves the pipeline flushing speed while reducing the design complexity of the processing unit 12, keeping its cost and area low.
It should be noted that when a speculative failure occurs, not only the instruction execution unit 121 but also its preceding and following stages need to perform pipeline flushing, and several instruction execution units 121 in the processing unit 12 may need to flush at once. During the flush, the preceding stage does not issue new instructions to the instruction execution unit 121, so merely improving the flushing efficiency of one particular instruction execution unit 121 may not improve the performance of the processing unit 12. Therefore, when the execution delay of an instruction execution unit 121 is small, there is no need to design a complex circuit structure so that it can flush through the first flushing mode.
In one possible implementation, when the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit 121, if the execution delay of the instruction execution unit 121 dynamically changes above and below the delay threshold, the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 through a combination of the first flushing mode and the second flushing mode.
Specifically, for the N stages of the pipeline included in the instruction execution unit 121, each pipeline stage after the checkpoint is flushed through the first flushing mode, and each pipeline stage before the checkpoint is flushed through the second flushing mode. N is a positive integer greater than or equal to 3, the checkpoint is the Mth stage of the N-stage pipeline, M is a positive integer greater than 1 and less than N, and only instructions determined to require execution can pass the checkpoint.
The checkpoint is set at one stage of the multi-stage pipeline included in the instruction execution unit 121, with at least one pipeline stage before it and at least one after it. When an instruction reaches the checkpoint, it can be determined whether the instruction needs to be executed: a non-speculative instruction can pass the checkpoint to the next pipeline stage without having been executed, whereas a speculative instruction must be executed first, after which a speculative instruction on the correct path can pass the checkpoint and one on the wrong path cannot. Each pipeline stage after the checkpoint is flushed through the first flushing mode, and each pipeline stage before the checkpoint is flushed through the second flushing mode.
For example, the instruction execution units 121 included in the processing unit 12 may include a memory access unit, whose execution delay is indeterminate; the flushing subunit 1212 then flushes by combining the first flushing mode and the second flushing mode. The instruction execution unit 121 sets a checkpoint that an instruction may pass only once it is determined that the instruction needs to be executed. During the flush, the remaining instructions pass the checkpoint in turn for judgment: an instruction judged to be a speculative instruction is cancelled, while one judged to be non-speculative passes the checkpoint normally. Once all instructions have been judged at the checkpoint, the flush is complete.
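The combined mode can be sketched under the ID-based age assumption used earlier (the two-list pipeline model and the rule that the failure instruction's ID is known at the checkpoint are illustrative simplifications of the hardware described here):

```python
# Hypothetical sketch of the combined mode: stages after the checkpoint are
# flushed immediately by age comparison (first mode), while instructions still
# before the checkpoint drain through it one by one and are kept or cancelled
# by the same age check as they arrive (second mode).

def combined_flush(before_checkpoint, after_checkpoint, fail_id):
    """Return the instructions that survive the flush."""
    # First mode: stages after the checkpoint are cleared outright if newer
    # than the speculative failure instruction.
    survivors = [i for i in after_checkpoint if i["id"] < fail_id]
    # Second mode: earlier stages drain through the checkpoint sequentially;
    # instructions older than the failure pass, newer ones are cancelled.
    for inst in before_checkpoint:
        if inst["id"] < fail_id:
            survivors.append(inst)
    return survivors


after = [{"name": "X", "id": 1}, {"name": "Y", "id": 5}]
before = [{"name": "Z", "id": 2}, {"name": "W", "id": 7}]
surv = combined_flush(before, after, fail_id=4)
assert [i["name"] for i in surv] == ["X", "Z"]   # newer Y and W are dropped
```

The key property the sketch preserves is that the flush proceeds continuously: nothing waits for every in-flight instruction to complete before wrong-path entries start being dropped.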
In this embodiment, a checkpoint is set in the multi-stage pipeline of the instruction execution unit 121; when an instruction reaches the checkpoint, it is determined whether the instruction needs to be executed, and only instructions that need to be executed can pass. The pipeline can therefore be flushed continuously, without waiting for all instructions in the instruction execution unit 121 to finish executing before determining which speculative instructions lie on the correct and wrong paths, which improves pipeline flushing efficiency.
In one possible implementation, when the flushing subunit 1212 flushes by combining the first flushing mode and the second flushing mode, the speculative instruction located at the checkpoint when the fetch subunit 1211 received the pipeline flushing request may be determined to be the speculative failure instruction. As the speculative instructions in the pipeline stages before the checkpoint pass the checkpoint in turn, the flushing subunit 1212 judges the relative age of each with respect to the speculative failure instruction: a speculative instruction older than the speculative failure instruction produces its execution effect, and one newer than the speculative failure instruction does not.
In this embodiment of the present application, after the speculative failure instruction is determined, as the speculative instructions in the pipeline stages before the checkpoint pass the checkpoint, the flushing subunit 1212 judges their relative age with respect to the speculative failure instruction and thereby distinguishes the speculative instructions on the wrong path from those on the correct path, so that the former produce no execution effect and the latter do. This improves the flushing speed of the pipeline while ensuring the flushing effect.
It should be understood that the flushing subunit 1212 flushes the pipeline of the instruction execution unit 121 by combining the first flushing mode with the second flushing mode mainly for instruction execution units 121 that execute instructions out of order; in an out-of-order processing unit 12, some or all of the instruction execution units 121 can execute instructions out of order.
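The mode-selection policy described across the preceding implementations can be condensed into one function (the string labels, boolean inputs, and the three worked examples are illustrative assumptions mirroring the vector, integer, and memory access unit examples above):

```python
# Hypothetical summary of the selection policy: choose a flushing mode from
# the unit's execution-delay behavior and its architecture.

def choose_flush_mode(avg_delay_over_threshold, in_order, delay_varies):
    if delay_varies:                  # delay moves above and below the threshold
        return "combined"             # first + second mode around a checkpoint
    if avg_delay_over_threshold or in_order:
        return "first"                # clear wrong path before execution
    return "second"                   # let wrong path finish with no effect


assert choose_flush_mode(True, False, False) == "first"     # e.g. vector unit
assert choose_flush_mode(False, True, False) == "first"     # in-order architecture
assert choose_flush_mode(False, False, False) == "second"   # e.g. integer unit
assert choose_flush_mode(False, False, True) == "combined"  # e.g. memory access unit
```

This is a design-time decision in the patent's framing: each instruction execution unit 121 is built for one of these modes rather than switching among them at run time.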
Pipeline flushing method
FIG. 5 is a flowchart of a pipeline flushing method of one embodiment of the present application, which may be performed by the instruction execution unit 121 in the above-described embodiment. As shown in fig. 5, the pipeline flushing method includes the following steps:
step 501, when a speculative failure occurs, acquiring a pipeline flushing request aiming at an instruction execution unit;
and step 502, in response to the pipeline flushing request, performing pipeline flushing on the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution delay and the architecture of the instruction execution unit, wherein the first flushing mode clears the speculative instructions on the wrong path before they are executed, and the second flushing mode makes the speculative instructions on the wrong path produce no execution effect.
In the embodiment of the application, when a speculative failure occurs, pipeline flushing can be performed on the instruction execution unit through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit. The first flushing mode clears the speculative instructions on the wrong path before they are executed, so it flushes quickly; the second flushing mode makes the speculative instructions on the wrong path produce no execution effect. Together they allow the flushing speed to be increased while still supporting pipeline flushing on instruction execution units of different architectures, thereby improving pipeline flushing efficiency.
In one possible implementation, the first flushing mode includes: determining the speculative failure instruction that caused the speculative failure; determining, according to the pipeline depth of the instruction execution unit, the relative age of each speculative instruction in the instruction execution unit with respect to the speculative failure instruction; determining, from that relative age, the first speculative instructions on the wrong path and the second speculative instructions on the correct path; clearing the first speculative instructions; and letting the second speculative instructions continue to execute. The second flushing mode includes: after all instructions in the instruction execution unit have finished executing, determining, according to the relative age of the instructions, the third speculative instructions on the wrong path and the fourth speculative instructions on the correct path, so that the third speculative instructions produce no execution effect and the fourth speculative instructions do.
In one possible implementation, performing pipeline flushing on the instruction execution unit through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit includes: when the average execution delay of the instruction execution unit is greater than a preset delay threshold, or the instruction execution unit uses an in-order processing architecture, performing pipeline flushing on the instruction execution unit through the first flushing mode.
In one possible implementation, performing pipeline flushing on the instruction execution unit through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit includes: when the average execution delay of the instruction execution unit is less than or equal to the delay threshold and the instruction execution unit has an out-of-order processing architecture, performing pipeline flushing on the instruction execution unit through the second flushing mode.
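The selection rule in the two implementations above can be sketched as a single decision function. The string labels and the `delay_varies` flag are illustrative assumptions; in a real design the choice would typically be fixed per execution unit at design time rather than computed at run time.

```python
def choose_flush_mode(avg_delay, threshold, in_order, delay_varies):
    """Sketch of the flushing-mode selection rule: high latency or an
    in-order unit favors clearing wrong-path instructions early (first
    mode); a low-latency out-of-order unit favors draining and then
    suppressing wrong-path effects (second mode)."""
    if delay_varies:
        return "hybrid"   # checkpoint split: first mode after, second mode before
    if avg_delay > threshold or in_order:
        return "first"    # clear wrong-path instructions before execution
    return "second"       # drain, then suppress wrong-path effects
```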
In one possible implementation, performing pipeline flushing on the instruction execution unit through the first flushing mode and/or the second flushing mode according to the execution delay and the architecture of the instruction execution unit includes: when the execution delay of the instruction execution unit varies dynamically above and below the delay threshold, for the N pipeline stages of the instruction execution unit, performing pipeline flushing through the first flushing mode on each pipeline stage after a checkpoint, and through the second flushing mode on each pipeline stage before the checkpoint, where N is a positive integer greater than or equal to 3, the checkpoint is the Mth stage of the N pipeline stages, M is a positive integer greater than 1 and less than N, and an instruction that has passed the checkpoint is determined to be executed.
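The checkpoint split above can be sketched as a simple partition of the N stages. The 1-indexed stage numbering and the function name are illustrative assumptions.

```python
def split_stages(n, m):
    """Partition an N-stage pipeline at checkpoint stage M (1-indexed):
    stages before the checkpoint are flushed via the second mode, and
    stages after it via the first mode. Requires N >= 3 and 1 < M < N,
    as in the text above."""
    assert n >= 3 and 1 < m < n
    before = list(range(1, m))           # flushed via the second mode
    after = list(range(m + 1, n + 1))    # flushed via the first mode
    return before, after
```

For example, a 5-stage pipeline with the checkpoint at stage 3 flushes stages 1 and 2 via the second mode and stages 4 and 5 via the first mode.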
In one possible implementation, performing pipeline flushing through the second flushing mode on each pipeline stage before the checkpoint includes: determining the speculative instruction located at the checkpoint when the acquisition subunit receives the pipeline flushing request to be the speculative-failure instruction; as the speculative instructions in the stages before the checkpoint pass through the checkpoint in turn, comparing the age of each instruction passing the checkpoint with that of the speculative-failure instruction; if the instruction passing the checkpoint is older than the speculative-failure instruction, letting it produce its execution effect; and if it is newer, suppressing its execution effect.
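The drain-through-checkpoint behavior can be sketched as follows. Sequence numbers stand in for hardware age tags, and the treatment of instructions exactly as old as the failure instruction is an illustrative boundary choice; both are assumptions, not the patented circuit.

```python
def drain_through_checkpoint(arriving_seqs, fail_seq):
    """Second-mode flushing of the stages before the checkpoint: the
    instruction at the checkpoint when the flush request arrives is the
    speculative-failure instruction (fail_seq). As the remaining
    speculative instructions pass the checkpoint in turn, those older
    than the failure instruction commit their effect; newer ones are
    suppressed. Returns (committed, suppressed) sequence numbers."""
    committed, suppressed = [], []
    for seq in arriving_seqs:
        (committed if seq < fail_seq else suppressed).append(seq)
    return committed, suppressed
```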
It should be noted that, since the pipeline flushing method is described in detail in the instruction execution unit part of the foregoing embodiments with reference to the structural diagrams, the specific processes can be found in the descriptions of those embodiments and are not repeated here.
Computer storage medium
The present application also provides a computer-readable storage medium storing instructions that cause a machine to perform a pipeline flushing method as described herein. Specifically, a system or apparatus may be provided with a storage medium on which software program code implementing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code read from the storage medium itself realizes the functions of any of the above embodiments; hence the program code and the storage medium storing it form part of the present application.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Computer program product
Embodiments of the present application further provide a computer program product comprising computer instructions that instruct a computing device to perform operations corresponding to any of the above method embodiments.
Commercial value of embodiments of the present application
To address the adaptability problem of pipeline flushing, a dual-mode pipeline flushing acceleration mechanism is adopted. The first flushing mode directly flushes and clears all wrong-path speculative instructions in the instruction execution unit; the second flushing mode, after all instructions in the instruction execution unit have finished executing, determines whether each instruction lies on the correct path or the wrong path and discards the execution results of wrong-path speculative execution. According to the characteristics of each instruction execution unit in the processing unit, the first flushing mode, the second flushing mode, or a combination of the two can be selected for pipeline flushing, so that design complexity and flushing speed can be balanced in the design of the processing unit, the performance loss during pipeline flushing is reduced, and the overall performance of the processing unit is improved.
It should be understood that the embodiments in this specification are described in a progressive manner; the same or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the method embodiments are substantially similar to the methods described in the apparatus and system embodiments, their description is brief, and the relevant points can be found in the partial descriptions of the other embodiments.
It should be understood that the above description describes particular embodiments of the present specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
It should be understood that describing an element herein in the singular, or showing only one instance of it in the figures, does not limit the number of that element to one. Further, modules or elements described or illustrated herein as separate may be combined into a single module or element, and a module or element described or illustrated herein as single may be split into multiple modules or elements.
It is also to be understood that the terms and expressions employed herein are used as terms of description and not of limitation, and that the embodiment or embodiments of the specification are not limited to those terms and expressions. The use of such terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications may be made within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims are to be regarded as covering all such equivalents.

Claims (15)

1. An instruction execution unit, comprising:
an acquisition subunit, configured to acquire, when a speculative failure occurs, a pipeline flushing request for the instruction execution unit;
and a flushing subunit, configured to respond to the pipeline flushing request by performing pipeline flushing on the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution delay and the architecture of the instruction execution unit, wherein the first flushing mode clears speculative instructions on the wrong path before they are executed, and the second flushing mode causes speculative instructions on the wrong path to produce no execution effect.
2. The instruction execution unit of claim 1, wherein,
the first flushing mode comprises: determining a speculative-failure instruction that caused the speculative failure, determining, according to the pipeline depth of the instruction execution unit, an age relationship between each speculative instruction in the instruction execution unit and the speculative-failure instruction, determining, according to the age relationship, a first speculative instruction located on a wrong path and a second speculative instruction located on a correct path in the instruction execution unit, clearing the first speculative instruction, and causing the second speculative instruction to continue executing;
the second flushing mode comprises: after each instruction in the instruction execution unit has finished executing, determining, according to the age relationship of the instructions, a third speculative instruction located on a wrong path and a fourth speculative instruction located on a correct path in the instruction execution unit, causing the third speculative instruction to produce no execution effect and the fourth speculative instruction to produce its execution effect.
3. The instruction execution unit of claim 2, wherein,
the flushing subunit is configured to perform pipeline flushing on the instruction execution unit through the first flushing mode when the average execution delay of the instruction execution unit is greater than a preset delay threshold or the instruction execution unit has an in-order processing architecture.
4. The instruction execution unit of claim 2, wherein,
the flushing subunit is configured to perform pipeline flushing on the instruction execution unit through the second flushing mode when the average execution delay of the instruction execution unit is less than or equal to the delay threshold and the instruction execution unit has an out-of-order processing architecture.
5. The instruction execution unit of claim 3 or 4, wherein,
the flushing subunit is configured to, when the execution delay of the instruction execution unit varies dynamically above and below the delay threshold, for the N pipeline stages of the instruction execution unit, perform pipeline flushing through the first flushing mode on each pipeline stage after a checkpoint and through the second flushing mode on each pipeline stage before the checkpoint, wherein N is a positive integer greater than or equal to 3, the checkpoint is the Mth stage of the N pipeline stages, M is a positive integer greater than 1 and less than N, and an instruction that has passed the checkpoint is determined to be executed.
6. The instruction execution unit of claim 5, wherein,
the system comprises an acquisition subunit, a pipeline flushing request acquiring subunit, a check point judging subunit and a flushing subunit, wherein the acquisition subunit is used for acquiring a speculative instruction of a pipeline in a check point, the flushing subunit is used for determining the speculative instruction at the check point as a speculative failure instruction when the acquisition subunit receives the pipeline flushing request, judging the new and old relationship between the speculative instruction passing the check point and the speculative failure instruction when the speculative instruction at each stage before the check point sequentially passes through the check point, if the speculative instruction passing the check point is older than the speculative failure instruction, enabling the speculative instruction passing the check point to generate an execution effect, and if the speculative instruction passing the check point is newer than the speculative failure instruction, enabling the speculative instruction passing the check point not to generate the execution effect.
7. A processing unit, comprising:
an instruction fetch unit, configured to acquire an instruction to be executed;
an instruction decode unit, configured to decode the instruction to be executed;
an instruction issue unit, configured to send the decoded instruction to be executed to the instruction execution unit; and
at least one instruction execution unit according to any one of claims 1-6.
8. The processing unit of claim 7, wherein the processing unit further comprises:
and a retirement unit, configured to, when a speculative failure occurs, determine a pipeline flushing range according to the cause of the speculative failure and send a pipeline flushing request to each instruction execution unit within the pipeline flushing range.
9. A computing device, comprising:
a processing unit according to claim 7 or 8;
a memory coupled to the processing unit and storing the instructions to be executed.
10. A method of pipeline flushing, comprising:
when a speculative failure occurs, acquiring a pipeline flushing request for an instruction execution unit; and
in response to the pipeline flushing request, performing pipeline flushing on the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution delay and the architecture of the instruction execution unit, wherein the first flushing mode clears speculative instructions on the wrong path before they are executed, and the second flushing mode causes speculative instructions on the wrong path to produce no execution effect.
11. The pipeline flushing method of claim 10,
the first flushing mode comprises: determining a speculative-failure instruction that caused the speculative failure, determining, according to the pipeline depth of the instruction execution unit, an age relationship between each speculative instruction in the instruction execution unit and the speculative-failure instruction, determining, according to the age relationship, a first speculative instruction located on a wrong path and a second speculative instruction located on a correct path in the instruction execution unit, clearing the first speculative instruction, and causing the second speculative instruction to continue executing;
the second flushing mode comprises: after each instruction in the instruction execution unit has finished executing, determining, according to the age relationship of the instructions, a third speculative instruction located on a wrong path and a fourth speculative instruction located on a correct path in the instruction execution unit, causing the third speculative instruction to produce no execution effect and the fourth speculative instruction to produce its execution effect.
12. The pipeline flushing method of claim 11, wherein the pipeline flushing the instruction execution unit with a first flush-through mode and/or a second flush-through mode according to an execution latency of the instruction execution unit and an architecture of the instruction execution unit comprises:
when the average execution delay of the instruction execution unit is greater than a preset delay threshold, or the instruction execution unit has an in-order processing architecture, performing pipeline flushing on the instruction execution unit through the first flushing mode.
13. The pipeline flushing method of claim 11, wherein the pipeline flushing the instruction execution unit with a first flush-through mode and/or a second flush-through mode according to an execution latency of the instruction execution unit and an architecture of the instruction execution unit comprises:
when the average execution delay of the instruction execution unit is less than or equal to the delay threshold and the instruction execution unit has an out-of-order processing architecture, performing pipeline flushing on the instruction execution unit through the second flushing mode.
14. The pipeline flushing method of claim 12 or 13, wherein the pipeline flushing the instruction execution unit through a first flushing mode and/or a second flushing mode according to the execution latency of the instruction execution unit and the architecture of the instruction execution unit comprises:
when the execution delay of the instruction execution unit varies dynamically above and below the delay threshold, for the N pipeline stages of the instruction execution unit, performing pipeline flushing through the first flushing mode on each pipeline stage after a checkpoint and through the second flushing mode on each pipeline stage before the checkpoint, wherein N is a positive integer greater than or equal to 3, the checkpoint is the Mth stage of the N pipeline stages, M is a positive integer greater than 1 and less than N, and an instruction that has passed the checkpoint is determined to be executed.
15. The pipeline flushing method of claim 14, wherein the pipeline flushing through the second flushing pattern for each stage of the pipeline prior to the checkpoint comprises:
determining the speculative instruction located at the checkpoint when the acquisition subunit receives the pipeline flushing request to be the speculative-failure instruction; as the speculative instructions in the pipeline stages before the checkpoint pass through the checkpoint in turn, comparing the age of each speculative instruction passing the checkpoint with that of the speculative-failure instruction; if the speculative instruction passing the checkpoint is older than the speculative-failure instruction, causing it to produce its execution effect; and if the speculative instruction passing the checkpoint is newer than the speculative-failure instruction, causing it to produce no execution effect.
CN202210667362.0A 2022-06-14 2022-06-14 Instruction execution unit, processing unit and related device and method Pending CN115269011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210667362.0A CN115269011A (en) 2022-06-14 2022-06-14 Instruction execution unit, processing unit and related device and method


Publications (1)

Publication Number Publication Date
CN115269011A (en) 2022-11-01

Family

ID=83758913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210667362.0A Pending CN115269011A (en) 2022-06-14 2022-06-14 Instruction execution unit, processing unit and related device and method

Country Status (1)

Country Link
CN (1) CN115269011A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240301

Address after: 310052 Room 201, floor 2, building 5, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: C-SKY MICROSYSTEMS Co.,Ltd.

Country or region after: China

Address before: 201208 floor 5, No. 2, Lane 55, Chuanhe Road, No. 366, Shangke Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Pingtouge (Shanghai) semiconductor technology Co.,Ltd.

Country or region before: China