CN111078294A - Instruction processing method and device of processor and storage medium - Google Patents


Info

Publication number
CN111078294A
CN111078294A
Authority
CN
China
Prior art keywords: unit, instruction, preset parameters, matching, bypass
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911157269.XA
Other languages: Chinese (zh)
Other versions: CN111078294B (en)
Inventors: 周玉龙, 刘同强, 李拓, 邹晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911157269.XA
Publication of CN111078294A
Application granted
Publication of CN111078294B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 9/3802 — Arrangements for executing machine instructions; concurrent instruction execution, e.g. pipeline, look ahead; instruction prefetching
    • G06F 9/3867 — Concurrent instruction execution using instruction pipelines
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an instruction processing method for a processor, comprising the following steps: an instruction fetching unit sends a matching request for preset parameters to a bypass unit; the bypass unit matches the preset parameters based on the matching request; and, in response to the preset parameters being matched successfully, the bypass unit sends the decoding result of the instruction corresponding to the matched preset parameters to an execution unit. The disclosed method adds a bypass operation between instruction fetching and decoding, pre-fetches and pre-decodes instructions, and, for instructions that hit in the bypass module, skips the instruction fetching and decoding modules and proceeds directly to execution. This greatly reduces the time spent on instruction fetching and decoding, improves pipeline efficiency, and thereby improves the execution efficiency of the processor.

Description

Instruction processing method and device of processor and storage medium
Technical Field
The present invention relates to the field of processors, and in particular, to a method, an apparatus, and a storage medium for processing instructions of a processor.
Background
To improve CPU efficiency, most modern microprocessors adopt a pipelined design, and for higher-performance CPUs the pipeline design becomes the key factor determining performance, as in multi-issue superscalar pipeline techniques. In such designs, the instruction execution path is divided into the 5 stages shown in fig. 1: fetch (Ifetch), decode (Dec), execute (Exec), memory access (Mem), and register write-back (WB).
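The throughput benefit of this staged division can be illustrated with a toy cycle count (this model is not from the patent; it simply counts cycles for an ideal, stall-free 5-stage pipeline):

```python
# Ideal 5-stage pipeline cycle count: the first instruction takes one
# cycle per stage, and each later instruction retires one cycle after
# its predecessor. A purely illustrative model with no stalls.

STAGES = ["Ifetch", "Dec", "Exec", "Mem", "WB"]

def pipeline_cycles(n_instructions: int) -> int:
    """Cycles to run n instructions through the 5-stage pipeline."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1
```

For example, 10 instructions finish in 14 cycles rather than the 50 a non-pipelined design would need, which is the efficiency gain the pipeline structure targets.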
In a typical pipeline design, instruction fetching and decoding are separate, each with its own processing flow, and a large amount of data must be exchanged between them. When the pipeline stalls for some reason, instruction fetching and decoding stall as well, waiting for the lower pipeline stages to finish. This makes the design both complicated and inefficient.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides an instruction processing method for a processor, including the steps of:
sending a matching request of preset parameters to the bypass unit through an instruction fetching unit;
the bypass unit matches preset parameters based on the preset parameter matching request;
and, in response to the preset parameters being matched successfully, the bypass unit sends the decoding result of the instruction corresponding to the matched preset parameters to the execution unit.
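The three steps above can be sketched as a small Python model (all class and variable names are illustrative assumptions; the patent does not prescribe an implementation). The bypass unit's cache is reduced to a PC-to-decode-result mapping:

```python
# Hypothetical sketch of the bypass hit path: the fetch unit sends a
# match request for a preset parameter (a PC value); on a hit, the
# bypass unit forwards the cached decode result straight to execute,
# skipping the fetch and decode modules entirely.

class BypassUnit:
    def __init__(self):
        self.decode_cache = {}  # PC value -> cached decode result

    def match(self, pc):
        """Return the cached decode result on a hit, or None on a miss."""
        return self.decode_cache.get(pc)

executed = []  # stands in for the execution unit's input queue

def execution_unit(decoded):
    executed.append(decoded)

bypass = BypassUnit()
bypass.decode_cache[0x1000] = "decoded:add r1,r2,r3"  # pre-decoded entry

result = bypass.match(0x1000)   # step 1-2: match request and lookup
if result is not None:          # step 3: hit, execute directly
    execution_unit(result)
```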
In some embodiments, further comprising:
in response to the preset parameters not being matched, the bypass unit returns an instruction fetching request to the instruction fetching unit;
the instruction fetching unit acquires a corresponding instruction from an instruction register by using the unmatched preset parameters based on the instruction fetching request and sends the instruction to a decoding unit for decoding the instruction;
the decoding unit sends the decoded instruction to the execution unit and sends the decoded instruction and the corresponding preset parameter to the bypass unit.
In some embodiments, the bypass unit performs matching of the preset parameter based on the preset parameter matching request, further comprising:
receiving the matching request through a matching unit of the bypass unit;
the matching unit matches the preset parameters against the preset parameters cached in a first caching unit of the bypass unit.
In some embodiments, the matching unit performs matching of preset parameters according to preset parameters cached by a first cache unit in the bypass unit, and further includes:
the matching unit sends a first reading request to a first scheduling unit in the bypass unit;
and the first scheduling unit allows the matching unit to read the preset parameters cached by the first caching unit in response to receiving the first reading request.
In some embodiments, in response to the matching of the preset parameter being successful, the bypass unit sends a decoding result of an instruction corresponding to the matched preset parameter to the execution unit, further comprising:
and the matching unit, in response to the preset parameters being matched successfully, reads the decoding result of the corresponding instruction cached in a second cache unit of the bypass unit and sends the decoding result to the execution unit.
In some embodiments, reading the decoding result of the corresponding instruction cached in a second cache unit of the bypass unit further comprises:
the matching unit sends a second read request to a second scheduling unit in the bypass unit;
the second scheduling unit allows the matching unit to read the decoding result of the corresponding instruction cached by the second caching unit in response to receiving the second read request.
In some embodiments, the decoding unit sends the decoded instruction and the corresponding preset parameter to the bypass unit, further comprising:
the decoding unit sends the decoded instruction and the corresponding preset parameters to an updating unit of the bypass unit;
the updating unit sends a first write request and a second write request to the first scheduling unit and the second scheduling unit respectively;
after receiving the first write request, the first scheduling unit allows the updating unit to write the preset parameters into the first cache unit;
and after receiving the second write request, the second scheduling unit allows the updating unit to write the decoded instruction and the corresponding preset parameters into the second cache unit.
In some embodiments, further comprising:
in response to the first scheduling unit and/or the second scheduling unit receiving a read request and a write request at the same time, the first scheduling unit and/or the second scheduling unit responds to one of the requests and then responds to the other;
judging whether the number of the cached preset parameters in the first cache unit reaches a threshold value and/or whether the number of the cached decoded instructions and the corresponding preset parameters in the second cache unit reaches the threshold value;
and in response to the threshold value being reached, replacing the cached preset parameters in the first cache unit and/or the cached decoded instructions and the corresponding preset parameters in the second cache unit with the new decoded instructions and the corresponding preset parameters by using a preset algorithm.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of any of the instruction processing methods described above.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any one of the instruction processing methods described above.
The invention has at least the following beneficial technical effect: the disclosed method adds a bypass operation between instruction fetching and decoding, pre-fetches and pre-decodes instructions, and, for instructions that hit in the bypass module, skips the instruction fetching and decoding modules and proceeds directly to execution. This greatly reduces the time spent on instruction fetching and decoding, improves pipeline efficiency, and thereby improves the execution efficiency of the processor.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other embodiments from these drawings without creative effort.
FIG. 1 is a flow diagram of a standard five-stage pipeline in the prior art;
FIG. 2 is a flowchart illustrating a method for processing instructions of a processor according to an embodiment of the present invention;
FIG. 3 is a block diagram of a 5-stage pipeline architecture for a processor provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention; subsequent embodiments do not repeat this note.
According to an aspect of the present invention, an embodiment of the present invention proposes an instruction processing method of a processor, as shown in fig. 2, which may include performing the following steps: s1, the instruction fetching unit sends a matching request of preset parameters to the bypass unit; s2, the bypass unit matches the preset parameters based on the preset parameter matching request; s3, in response to the preset parameter matching success, the bypass unit sends the decoding result of the instruction corresponding to the matched preset parameter to the execution unit.
The disclosed method adds a bypass operation between instruction fetching and decoding, pre-fetches and pre-decodes instructions, and, for instructions that hit in the bypass module, skips the instruction fetching and decoding modules and proceeds directly to execution. This greatly reduces the time spent on instruction fetching and decoding, improves pipeline efficiency, and thereby improves the execution efficiency of the processor.
In some embodiments, fig. 3 illustrates a block diagram of a 5-stage pipeline architecture of a processor. As shown in fig. 3, an instruction fetch unit (Ifetch) may include an instruction fetch module and a PC value processing module, a decode unit (Dec) may include a decode module and a branch prediction module, an execution unit (Exec) may include an execution module, a memory access unit (Mem) may include a memory access module, and a write-back unit (WB) may include a write-back module.
The branch prediction module is configured to predict which branch will be taken before the branch instruction is executed, so as to improve the performance of the processor's instruction pipeline; the execution module is configured to dispatch the instruction to a specific arithmetic unit for execution according to the instruction's operation type; the memory-access module is configured to access the data memory; and the write-back module is configured to write the execution result back to a register. It should be noted that the above modules are all conventional modules in a 5-stage processor pipeline, and the invention is not limited thereto.
It should be noted that, in the embodiment of the present invention, the preset parameter may be a PC value (the address of the next instruction to be executed), which may be generated by the PC value processing module, or may be another parameter.
In some embodiments, the method proposed by the present invention may further include:
in response to the preset parameters not being matched, the bypass unit returns an instruction fetching request to the instruction fetching unit;
the instruction fetching unit acquires a corresponding instruction from an instruction register by using the unmatched preset parameters based on the instruction fetching request and sends the instruction to a decoding unit for decoding the instruction; the decoding unit sends the decoded instruction to the execution unit and sends the decoded instruction and the corresponding preset parameter to the bypass unit.
Specifically, if the lower-level pipeline needs to fetch an instruction, matching is first attempted in the bypass unit. If the match succeeds, the decoding result is sent directly to the execution unit for execution; if the match fails, the corresponding instruction is fetched from the instruction register and then decoded, without passing through the bypass unit again. After the decoding unit finishes decoding, the decoding result is sent to the execution unit and, at the same time, to the bypass unit, where it is stored (replacing an older entry if necessary).
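The combined hit/miss flow described above can be sketched as follows (an illustrative model only: the first and second cache units are merged into one dictionary for brevity, and all names are hypothetical):

```python
# Hedged sketch of the full flow: on a bypass miss the instruction is
# fetched and decoded normally, the result goes to the execution unit,
# and the bypass caches the (PC, decode result) pair so that the next
# occurrence of the same PC hits and skips fetch/decode.

instruction_memory = {0x2000: "sub r4,r5,r6"}
bypass_cache = {}   # PC -> decode result (both cache units, merged here)
executed = []       # stands in for the execution unit's input queue

def decode(instr):
    return "decoded:" + instr

def process(pc):
    hit = bypass_cache.get(pc)
    if hit is not None:                 # bypass hit: execute directly
        executed.append(hit)
        return "hit"
    instr = instruction_memory[pc]      # miss: fetch the instruction
    result = decode(instr)              # decode as usual
    executed.append(result)             # send to the execution unit...
    bypass_cache[pc] = result           # ...and update the bypass unit
    return "miss"
```

The first time a PC is processed it takes the miss path and populates the bypass; every later occurrence of the same PC takes the hit path.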
In some embodiments, the bypass unit performs matching of the preset parameter based on the preset parameter matching request, further comprising:
receiving, by a matching unit of the bypass units, a matching request;
and the matching unit is used for matching preset parameters according to the preset parameters cached by the first caching unit in the bypass unit.
Specifically, as shown in fig. 3, when the matching unit receives a read-instruction request from the PC value processing module, it matches the PC value in the request against the PC values cached in the first caching unit.
In some embodiments, the matching unit performs matching of preset parameters according to preset parameters cached by a first cache unit in the bypass unit, and further includes:
the matching unit sends a first reading request to a first scheduling unit in the bypass unit;
and the first scheduling unit allows the matching unit to read the preset parameters cached by the first caching unit in response to receiving the first reading request.
In some embodiments, in response to the matching of the preset parameter being successful, the bypass unit sends a decoding result of an instruction corresponding to the matched preset parameter to the execution unit, further comprising:
and the matching unit responds to the successful matching of the preset parameters, reads the decoding result of the corresponding instruction in the cache of the second cache unit in the bypass unit, and sends the decoding result to the execution unit.
In some embodiments, reading a decoded result of the corresponding instruction in a second cache unit cache in the bypass unit further comprises:
the matching unit sends a second read request to a second scheduling unit in the bypass unit;
the second scheduling unit allows the matching unit to read the decoding result of the corresponding instruction cached by the second caching unit in response to receiving the second read request.
Specifically, as shown in fig. 3, if the matching succeeds, a second read request is sent to the second scheduling unit; after receiving it, the second scheduling unit allows the matching unit to read the decoding result of the corresponding instruction, and the matching unit then sends the decoding result to the execution module.
In some embodiments, the decoding unit sends the decoded instruction and the corresponding preset parameter to the bypass unit, further comprising:
the decoding unit sends the decoded instruction and the corresponding preset parameters to an updating unit of the bypass unit;
the updating unit sends a first write request and a second write request to the first scheduling unit and the second scheduling unit respectively;
after receiving the first write request, the first scheduling unit allows the updating unit to write the preset parameters into the first cache unit;
and after receiving the second write request, the second scheduling unit allows the updating unit to write the decoded instruction and the corresponding preset parameters into the second cache unit.
Specifically, as shown in fig. 3, if the matching unit fails to find a match, a read-instruction operation is issued to the instruction fetching module; that is, the instruction fetching module obtains the corresponding instruction from the instruction memory according to the PC value and sends it to the decoding module for decoding, and the decoding result is then sent to the execution module and to the bypass unit.
In some embodiments, the update unit in the bypass unit sends the first write request and the second write request to the first scheduling unit and the second scheduling unit, respectively, to write the PC value into the first buffer unit and write the PC value and the decoding result into the second buffer unit.
In some embodiments, in response to the first scheduling unit and/or the second scheduling unit receiving a read request and a write request at the same time, the first scheduling unit and/or the second scheduling unit responds to one of the requests and then responds to the other request.
Specifically, the first scheduling unit and/or the second scheduling unit is responsible for scheduling the requests from the matching unit and the updating unit, and for completing the read and write operations on the first and second cache units. When the first scheduling unit and/or the second scheduling unit receives a read request from the matching module and a write request from the updating module at the same time, it responds to one of the requests first and then to the other.
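One possible reading of this serialization rule, sketched in Python (the read-before-write order chosen here is an assumption for illustration; the text only requires that simultaneous requests be served one after the other):

```python
# Toy scheduler guarding one cache unit; all names are hypothetical.
# When a read (from the matching unit) and a write (from the updating
# unit) arrive in the same cycle, it serves them strictly one after
# the other, never both at once.

class SchedulingUnit:
    def __init__(self):
        self.cache = {}  # PC value -> cached entry

    def arbitrate(self, read=None, write=None):
        """Serve a simultaneous read and write sequentially
        (read first here, by assumption); return the served order."""
        served = []
        if read is not None:
            served.append(("read", self.cache.get(read)))
        if write is not None:
            key, value = write
            self.cache[key] = value
            served.append(("write", key))
        return served
```

Note that with the read-first policy a read and a write to the same PC in the same cycle return the old (possibly missing) entry; a write-first policy would return the freshly written one.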
In some embodiments, the method further includes determining whether the number of the preset parameters cached in the first cache unit reaches a threshold and/or whether the number of the decoded instructions and the corresponding preset parameters cached in the second cache unit reaches a threshold;
and in response to the threshold value being reached, replacing the cached preset parameters in the first cache unit and/or the cached decoded instructions and the corresponding preset parameters in the second cache unit with the new decoded instructions and the corresponding preset parameters by using a preset algorithm.
Specifically, after receiving the decoding result from the decoding module, the updating unit sends a request to the first and/or second scheduling unit to add the result to, or replace an entry in, the first caching unit and/or the second caching unit. Once the cached data in the first cache unit and/or the second cache unit has reached the threshold, new data is cached by evicting the least recently used entry according to a least-recently-used (LRU) algorithm.
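The least-recently-used replacement described here can be sketched with Python's `OrderedDict` (an illustrative model; the capacity threshold of 4 and all names are assumptions, not taken from the patent):

```python
from collections import OrderedDict

# LRU-replaced bypass cache sketch: OrderedDict keeps entries in
# recency order, so the least recently used entry is always first
# and can be evicted once the capacity threshold is reached.

class LRUBypassCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()   # PC value -> decode result

    def lookup(self, pc):
        if pc in self.entries:
            self.entries.move_to_end(pc)   # mark as most recently used
            return self.entries[pc]
        return None

    def insert(self, pc, decode_result):
        if pc in self.entries:
            self.entries.move_to_end(pc)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[pc] = decode_result
```

For example, with capacity 2, inserting PCs 1 and 2, touching 1, and then inserting 3 evicts PC 2, the least recently used entry.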
The disclosed method adds a bypass operation between instruction fetching and decoding, pre-fetches and pre-decodes instructions, and, for instructions that hit in the bypass module, skips the instruction fetching and decoding modules and proceeds directly to execution. This greatly reduces the time spent on instruction fetching and decoding, improves pipeline efficiency, and thereby improves the execution efficiency of the processor.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer apparatus 501, including:
at least one processor 520; and
the memory 510, the memory 510 stores a computer program 511 that can be run on the processor, and the processor 520 executes the program to perform the steps of any of the above instruction processing methods of the processor.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 5, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of the instruction processing method of the processor as any one of the above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program to instruct related hardware to implement the methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbering of the embodiments disclosed herein is merely for description and does not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the embodiments of the invention, technical features in the above embodiments or in different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. An instruction processing method of a processor, comprising the steps of:
sending, by an instruction fetching unit, a matching request for preset parameters to a bypass unit;
matching, by the bypass unit, the preset parameters based on the matching request; and
in response to the preset parameters being matched successfully, sending, by the bypass unit, a decoding result of an instruction corresponding to the matched preset parameters to an execution unit.
2. The method of claim 1, further comprising:
in response to the preset parameters not being matched, returning, by the bypass unit, an instruction fetching request to the instruction fetching unit;
obtaining, by the instruction fetching unit, a corresponding instruction from an instruction register using the unmatched preset parameters based on the instruction fetching request, and sending the instruction to a decoding unit for decoding; and
sending, by the decoding unit, the decoded instruction to the execution unit, and sending the decoded instruction and the corresponding preset parameters to the bypass unit.
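Purely as an illustrative sketch (not part of the claimed subject matter; all names such as `BypassUnit` and `process` are assumed rather than taken from the patent), the hit/miss flow of claims 1 and 2 might look like:

```python
class BypassUnit:
    """Hypothetical bypass unit: caches decode results keyed by a preset
    parameter (e.g. an instruction address)."""
    def __init__(self):
        self.decoded = {}  # preset parameter -> cached decode result

    def match(self, param):
        # Claim 1: match the preset parameter against cached entries.
        return self.decoded.get(param)

    def update(self, param, decode_result):
        # Claim 2: after a miss, the decode unit fills the bypass.
        self.decoded[param] = decode_result


def process(param, bypass, fetch, decode, execute):
    hit = bypass.match(param)
    if hit is not None:
        # Claim 1: on a hit, the cached decode result goes straight
        # to the execution unit, skipping fetch and decode.
        return execute(hit)
    # Claim 2: on a miss, fetch and decode as usual, then update the bypass.
    instruction = fetch(param)
    result = decode(instruction)
    bypass.update(param, result)
    return execute(result)
```

On a hit the fetch and decode stages are skipped entirely, which is the latency saving the bypass unit is meant to provide.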
3. The method of claim 2, wherein matching, by the bypass unit, the preset parameters based on the matching request further comprises:
receiving the matching request by a matching unit of the bypass unit; and
matching, by the matching unit, the preset parameters against the preset parameters cached by a first cache unit in the bypass unit.
4. The method of claim 3, wherein matching, by the matching unit, the preset parameters against the preset parameters cached by the first cache unit in the bypass unit further comprises:
sending, by the matching unit, a first read request to a first scheduling unit in the bypass unit; and
in response to receiving the first read request, allowing, by the first scheduling unit, the matching unit to read the preset parameters cached by the first cache unit.
5. The method of claim 4, wherein in response to the preset parameters being matched successfully, sending, by the bypass unit, the decoding result of the instruction corresponding to the matched preset parameters to the execution unit further comprises:
in response to the preset parameters being matched successfully, reading, by the matching unit, the decoding result of the corresponding instruction cached by a second cache unit in the bypass unit, and sending the decoding result to the execution unit.
6. The method of claim 5, wherein reading the decoding result of the corresponding instruction cached by the second cache unit in the bypass unit further comprises:
sending, by the matching unit, a second read request to a second scheduling unit in the bypass unit; and
in response to receiving the second read request, allowing, by the second scheduling unit, the matching unit to read the decoding result of the corresponding instruction cached by the second cache unit.
7. The method of claim 6, wherein sending, by the decoding unit, the decoded instruction and the corresponding preset parameters to the bypass unit further comprises:
sending, by the decoding unit, the decoded instruction and the corresponding preset parameters to an updating unit of the bypass unit;
sending, by the updating unit, a first write request and a second write request to the first scheduling unit and the second scheduling unit, respectively;
after receiving the first write request, allowing, by the first scheduling unit, the updating unit to write the preset parameters into the first cache unit; and
after receiving the second write request, allowing, by the second scheduling unit, the updating unit to write the decoded instruction and the corresponding preset parameters into the second cache unit.
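As a further illustration only (the class and method names below are hypothetical, not the patent's), the update path of claim 7 — the updating unit writing the preset parameters and the decode result into the two caches through their respective scheduling units — can be sketched as:

```python
class SchedulingUnit:
    """Hypothetical scheduling unit: grants read/write access to one cache."""
    def __init__(self, cache):
        self.cache = cache  # plain dict standing in for a cache unit

    def write(self, key, value):
        # Granting a write request: the updating unit stores an entry.
        self.cache[key] = value

    def read(self, key):
        # Granting a read request: the matching unit reads an entry.
        return self.cache.get(key)


class UpdatingUnit:
    """Claim 7 sketch: forwards a decoded instruction and its preset
    parameters to both caches via the first and second scheduling units."""
    def __init__(self, first_sched, second_sched):
        self.first_sched = first_sched
        self.second_sched = second_sched

    def update(self, preset_param, decoded_instruction):
        # First write request: record the preset parameter in the first cache.
        self.first_sched.write(preset_param, True)
        # Second write request: cache the decode result under that parameter.
        self.second_sched.write(preset_param, decoded_instruction)
```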
8. The method of claim 7, further comprising:
in response to the first scheduling unit and/or the second scheduling unit receiving a read request and a write request at the same time, responding, by the first scheduling unit and/or the second scheduling unit, to one of the requests first and then to the other;
determining whether the number of preset parameters cached in the first cache unit reaches a threshold and/or whether the number of decoded instructions and corresponding preset parameters cached in the second cache unit reaches a threshold; and
in response to the threshold being reached, replacing, using a preset algorithm, the preset parameters cached in the first cache unit and/or the decoded instructions and corresponding preset parameters cached in the second cache unit with a new decoded instruction and its corresponding preset parameters.
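The arbitration and replacement behavior of claim 8 can likewise be sketched; note the claim leaves both the serving order and the "preset algorithm" unspecified, so the reads-before-writes policy and oldest-entry eviction below are assumptions for illustration only:

```python
from collections import OrderedDict

class ScheduledCache:
    """Claim 8 sketch: a scheduling unit guarding a bounded cache.
    Serializes simultaneous read/write requests and, when the entry
    count reaches the threshold, evicts the oldest entry (a stand-in
    for the patent's unspecified preset replacement algorithm)."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = OrderedDict()

    def handle(self, requests):
        results = []
        # Serve reads before writes when both arrive together
        # (an assumed policy; the claim only requires one then the other).
        for kind, key, value in sorted(requests, key=lambda r: r[0] != "read"):
            if kind == "read":
                results.append(self.entries.get(key))
            else:
                if key not in self.entries and len(self.entries) >= self.threshold:
                    self.entries.popitem(last=False)  # evict the oldest entry
                self.entries[key] = value
                self.entries.move_to_end(key)
        return results
```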
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable on the processor, characterized in that the processor performs the steps of the method according to any one of claims 1-8 when executing the program.
10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method of any one of claims 1-8.
CN201911157269.XA 2019-11-22 2019-11-22 Instruction processing method and device of processor and storage medium Active CN111078294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911157269.XA CN111078294B (en) 2019-11-22 2019-11-22 Instruction processing method and device of processor and storage medium

Publications (2)

Publication Number Publication Date
CN111078294A true CN111078294A (en) 2020-04-28
CN111078294B CN111078294B (en) 2022-08-19

Family

ID=70311397


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407239A (en) * 2021-06-09 2021-09-17 中山大学 Pipeline processor based on asynchronous single-rail circuits
CN117331603A (en) * 2023-09-18 2024-01-02 中国人民解放军军事科学院国防科技创新研究院 Depth pipeline forward bypass based on priority determination

Citations (2)

Publication number Priority date Publication date Assignee Title
US6212603B1 (en) * 1998-04-09 2001-04-03 Institute For The Development Of Emerging Architectures, L.L.C. Processor with apparatus for tracking prefetch and demand fetch instructions serviced by cache memory
CN109918130A (en) * 2019-01-24 2019-06-21 Sun Yat-sen University A four-stage pipeline RISC-V processor with a fast data bypass structure


Non-Patent Citations (1)

Title
Chen Xian et al., "Research and Design of an Eight-Stage Pipeline for the CPU in a DSP", Electronic World *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant