CN114356416B - Processor, control method and device thereof, electronic equipment and storage medium - Google Patents

Processor, control method and device thereof, electronic equipment and storage medium

Info

Publication number
CN114356416B
Authority
CN
China
Prior art keywords
instruction
instruction execution
execution unit
execution units
dispatched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111671303.2A
Other languages
Chinese (zh)
Other versions
CN114356416A (en
Inventor
张克松
张俊建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202111671303.2A priority Critical patent/CN114356416B/en
Publication of CN114356416A publication Critical patent/CN114356416A/en
Application granted granted Critical
Publication of CN114356416B publication Critical patent/CN114356416B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Power Sources (AREA)
  • Advance Control (AREA)

Abstract

A control method for a processor, a control apparatus for a processor, an electronic device, and a computer-readable storage medium. The processor includes a plurality of instruction execution units of different types, and the control method includes: acquiring attribute information of a plurality of instructions to be dispatched, wherein the attribute information represents the type of instruction execution unit to which each instruction to be dispatched is to be dispatched; determining, from the plurality of instruction execution units and based on the attribute information of the plurality of instructions to be dispatched, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current moment; and determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units, where M is a positive integer. With this method, the clock can be turned off at the level of an instruction execution unit, which greatly reduces the dynamic power consumption of each instruction execution unit whose clock is turned off, thereby reducing the power consumption of the instruction execution units as a whole and, in turn, the overall power consumption of the processor.

Description

Processor, control method and device thereof, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to a control method for a processor, a control apparatus for a processor, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, electronic devices are becoming increasingly varied, and the processor is a core component of an electronic device. Processors include, for example, the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit). The power consumption of a processor affects its performance, and current processor designs take the ratio of performance to power consumption into account. Excessive power consumption can cause the chip to overheat and be damaged, and it increases the design difficulty of the system in which the chip is used. An increase in the power consumption of the processor also increases the power consumption of the entire system, so that energy efficiency cannot be guaranteed. Therefore, how to reduce the power consumption of the processor is a very important issue in processor design.
Disclosure of Invention
At least one embodiment of the present disclosure provides a control method for a processor including a plurality of instruction execution units of different types, the method including: obtaining attribute information of a plurality of instructions to be dispatched, wherein the attribute information represents the type of instruction execution unit to which each instruction to be dispatched is to be dispatched; determining, from the plurality of instruction execution units and based on the attribute information of the plurality of instructions to be dispatched, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current time; and determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units, where M is a positive integer.
For example, the control method provided by an embodiment of the present disclosure further includes: turning off the clock of the instruction execution unit to be hibernated, so that the instruction execution unit to be hibernated transitions from an active state to a sleep state.
For example, the control method provided by an embodiment of the present disclosure further includes: determining, from the instruction execution units in the sleep state and based on the attribute information of the plurality of instructions to be dispatched, at least one instruction execution unit to which an instruction is dispatched during the at least one instruction dispatch process as an instruction execution unit to be activated; and turning on the clock of the instruction execution unit to be activated, so that the instruction execution unit to be activated transitions from the sleep state to the active state.
For example, in the control method provided in an embodiment of the present disclosure, determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units includes: determining, according to an instruction release queue, at least one instruction execution unit among the M candidate instruction execution units that has released all instructions dispatched to it, as the at least one instruction execution unit to be hibernated.
For example, in the control method provided in an embodiment of the present disclosure, obtaining attribute information of a plurality of instructions to be dispatched includes: acquiring the attribute information of the plurality of instructions to be dispatched from an instruction dispatch queue.
For example, in the control method provided in an embodiment of the present disclosure, the at least one instruction dispatch process is the process of dispatching the first P instructions to be dispatched in the instruction dispatch queue; and determining, from the plurality of instruction execution units, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current time includes: determining, from the plurality of instruction execution units, M instruction execution units to which no instruction is dispatched while the first P instructions to be dispatched are dispatched, as the M candidate instruction execution units in the dispatch idle state, where P is an integer greater than or equal to 0.
For example, in the control method provided by an embodiment of the present disclosure, the at least one instruction dispatch process is the process of dispatching all instructions to be dispatched in the instruction dispatch queue; and determining, from the plurality of instruction execution units, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current time includes: determining, from the plurality of instruction execution units, M instruction execution units to which no instruction is dispatched while all the instructions to be dispatched are dispatched, as the M candidate instruction execution units in the dispatch idle state.
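The two windowing variants above (the first P queued instructions, or the whole queue) can be illustrated with a minimal sketch. It assumes that each queued instruction is represented only by its attribute, that is, the type of execution unit it needs; the function name `dispatch_idle_units` and the unit names (`fixed`, `float`, `vector`, `mem`) are hypothetical and chosen for illustration, not taken from the disclosure.

```python
from typing import Iterable, Optional, Sequence, Set


def dispatch_idle_units(
    queue: Sequence[str],
    all_units: Iterable[str],
    window: Optional[int] = None,
) -> Set[str]:
    """Return the units that receive no instruction while the inspected
    part of the dispatch queue is dispatched.

    queue  -- unit type required by each queued instruction, head first
    window -- P, the number of leading instructions to inspect;
              None means the whole queue is inspected
    """
    inspected = queue if window is None else queue[:window]
    needed = set(inspected)            # unit types that will be dispatched to
    return set(all_units) - needed     # everything else is dispatch-idle


# Hypothetical unit types and queue contents (assumptions for illustration).
UNITS = ["fixed", "float", "vector", "mem"]
QUEUE = ["fixed", "mem", "fixed", "float", "mem"]

first_three = dispatch_idle_units(QUEUE, UNITS, window=3)
whole_queue = dispatch_idle_units(QUEUE, UNITS)
```

Under these assumptions a shorter window yields more candidates (`float` and `vector` for P = 3) than the whole queue does (`vector` only), because fewer upcoming instructions are taken into account.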
For example, in the control method provided in an embodiment of the present disclosure, the plurality of instructions to be dispatched are a loop body instruction sequence, and the at least one instruction dispatch process is the process of dispatching the loop body instruction sequence; and determining, from the plurality of instruction execution units, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current time includes: determining idle probabilities respectively corresponding to the plurality of instruction execution units based on the loop body instruction sequence; and taking the instruction execution units whose idle probability is greater than or equal to a probability threshold as the M candidate instruction execution units in the dispatch idle state.
For example, in the control method provided by an embodiment of the present disclosure, the probability threshold is 100%, or the probability threshold is greater than or equal to 50% and less than 100%.
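As a rough illustration of threshold-based selection, the sketch below assumes one plausible statistic: the idle probability of a unit is the fraction of observed loop iterations in which no instruction was dispatched to that unit. This definition, the function names, and the sample data are assumptions made for illustration; the statistical method actually used by the embodiments is described with reference to FIG. 9.

```python
from typing import Dict, Iterable, Sequence, Set


def idle_probabilities(
    iterations: Sequence[Sequence[str]],
    all_units: Iterable[str],
) -> Dict[str, float]:
    """Fraction of observed loop iterations in which each unit got no work.

    iterations -- for each observed iteration, the unit types required by
                  the loop body instructions dispatched in that iteration
    """
    if not iterations:
        raise ValueError("at least one observed iteration is required")
    units = list(all_units)
    idle_counts = {unit: 0 for unit in units}
    for body in iterations:
        used = set(body)
        for unit in units:
            if unit not in used:
                idle_counts[unit] += 1
    return {unit: idle_counts[unit] / len(iterations) for unit in units}


def candidate_units(probs: Dict[str, float], threshold: float = 1.0) -> Set[str]:
    """Units whose idle probability is at least the threshold."""
    return {unit for unit, p in probs.items() if p >= threshold}


# Hypothetical observations: the 'float' unit is used in one iteration of four.
observed = [
    ["fixed", "mem"],
    ["fixed", "mem"],
    ["fixed", "float", "mem"],
    ["fixed", "mem"],
]
probs = idle_probabilities(observed, ["fixed", "float", "vector", "mem"])
strict = candidate_units(probs, threshold=1.0)
relaxed = candidate_units(probs, threshold=0.5)
```

With a threshold of 100% only the never-used `vector` unit qualifies; lowering the threshold to 50% also admits `float`, which is idle in three of the four observed iterations.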
For example, the control method provided by an embodiment of the present disclosure further includes: for each instruction execution unit to be hibernated whose idle probability is greater than or equal to the probability threshold and less than 100%, turning off the clock of that instruction execution unit based on the position, in the instruction dispatch queue, of the instruction corresponding to that instruction execution unit and on the instruction release status of that instruction execution unit.
For example, in the control method provided in an embodiment of the present disclosure, turning on the clock of the instruction execution unit to be activated includes: turning on the clock of the instruction execution unit to be activated according to the position, in the instruction dispatch queue, of the instruction to be dispatched that corresponds to the instruction execution unit to be activated.
For example, in the control method provided in an embodiment of the present disclosure, turning on the clock of the instruction execution unit to be activated according to the position, in the instruction dispatch queue, of the instruction to be dispatched that corresponds to the instruction execution unit to be activated includes: in response to an instruction to be dispatched that corresponds to the instruction execution unit to be activated appearing at the last position of the instruction dispatch queue, turning on the clock of the instruction execution unit to be activated, wherein the last position of the instruction dispatch queue is the tail position at which the queue receives newly added instructions.
For example, in the control method provided by an embodiment of the present disclosure, turning on the clock of the instruction execution unit to be activated according to the position, in the instruction dispatch queue, of the instruction to be dispatched that corresponds to the instruction execution unit to be activated includes: in response to the instruction to be dispatched that corresponds to the instruction execution unit to be activated advancing from the last position of the instruction dispatch queue to a preset position of the instruction dispatch queue, turning on the clock of the instruction execution unit to be activated.
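The two position-based wake-up variants can be compared with a small sketch. It assumes that the queue is a list of required unit types with the head at index 0, and that a preset position is expressed as a 0-based index counted from the head; `units_to_wake` and its parameters are hypothetical names, not part of the disclosure.

```python
from typing import Optional, Sequence, Set


def units_to_wake(
    queue: Sequence[str],
    sleeping: Set[str],
    wake_position: Optional[int] = None,
) -> Set[str]:
    """Sleeping units whose clock should be turned on.

    wake_position=None models the tail trigger: a unit wakes as soon as an
    instruction for it enters the queue at the last position.
    wake_position=k models the preset-position trigger: a unit wakes once an
    instruction for it has advanced to index k or closer to the head.
    """
    if not queue:
        return set()
    last = len(queue) - 1
    limit = last if wake_position is None else min(wake_position, last)
    visible = set(queue[: limit + 1])   # instructions at or ahead of the trigger
    return visible & sleeping


sleeping = {"vector", "float"}
queue = ["fixed", "fixed", "mem", "vector"]

wake_at_tail = units_to_wake(queue, sleeping)                    # tail trigger
wake_at_preset = units_to_wake(queue, sleeping, wake_position=1)
# After the two instructions at the head are dispatched, 'vector' has advanced.
wake_after_advance = units_to_wake(queue[2:], sleeping, wake_position=1)
```

In this model the tail trigger wakes a unit earliest, while a preset position nearer the head keeps the unit asleep for longer before its instruction is dispatched.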
For example, the control method provided by an embodiment of the present disclosure further includes: in a multi-thread operation mode, determining a plurality of groups of instruction execution units to be hibernated that respectively correspond to a plurality of threads, wherein each group includes at least one instruction execution unit to be hibernated; determining the instruction execution units in the intersection of the plurality of groups of instruction execution units to be hibernated; and turning off the clocks of the instruction execution units in the intersection.
For example, the control method provided by an embodiment of the present disclosure further includes: in a multi-thread operation mode, determining a plurality of groups of instruction execution units to be activated that respectively correspond to a plurality of threads; determining all instruction execution units contained in the plurality of groups of instruction execution units to be activated; and turning on the clocks of all of those instruction execution units.
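The multi-thread rule can be sketched with plain set operations: a unit may be put to sleep only if every thread agrees it can sleep (intersection), whereas a unit must be woken if any thread needs it (union). The function names and the per-thread sets below are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set


def units_to_gate(per_thread_sleep: Iterable[Set[str]]) -> Set[str]:
    """Units whose clock may be turned off: those that every thread
    has marked as 'to be hibernated' (set intersection)."""
    groups = list(per_thread_sleep)
    return set.intersection(*groups) if groups else set()


def units_to_ungate(per_thread_wake: Iterable[Set[str]]) -> Set[str]:
    """Units whose clock must be turned on: those that any thread
    has marked as 'to be activated' (set union)."""
    return set().union(*per_thread_wake)


# Hypothetical per-thread decisions for two hardware threads.
sleep_groups = [{"float", "vector"}, {"vector", "mem"}]
wake_groups = [{"float"}, {"mem"}]

gated = units_to_gate(sleep_groups)
ungated = units_to_ungate(wake_groups)
```

Here only `vector` is gated, because the first thread still needs `mem` and the second still needs `float`; both `float` and `mem` are ungated because each is required by at least one thread.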
For example, in the control method provided by an embodiment of the present disclosure, the processor has a first instruction fetch mode and a second instruction fetch mode, and determining a plurality of groups of instruction execution units to be hibernated that respectively correspond to a plurality of threads includes: for a thread corresponding to the first instruction fetch mode, determining, from the plurality of instruction execution units, M candidate instruction execution units to which no instruction is dispatched during at least one instruction dispatch process starting from the current time, and determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units; and for a thread corresponding to the second instruction fetch mode, determining idle probabilities respectively corresponding to the plurality of instruction execution units based on a loop body instruction sequence, determining M candidate instruction execution units whose idle probability is greater than or equal to a probability threshold, and determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units.
At least one embodiment of the present disclosure provides a control apparatus for a processor including a plurality of instruction execution units of different types, the control apparatus including an attribute acquisition module, a first determination module, and a second determination module. The attribute acquisition module is configured to obtain attribute information of a plurality of instructions to be dispatched, wherein the attribute information represents the type of instruction execution unit to which each instruction to be dispatched is to be dispatched; the first determination module is configured to determine, from the plurality of instruction execution units and based on the attribute information of the plurality of instructions to be dispatched, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current time; and the second determination module is configured to determine at least one instruction execution unit to be hibernated from the M candidate instruction execution units, M being a positive integer.
At least one embodiment of the present disclosure provides a processor including the control device provided in any one of the embodiments of the present disclosure.
At least one embodiment of the present disclosure provides an electronic device comprising a processor; a memory including one or more computer program modules; wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the control method provided by any embodiment of the disclosure.
At least one embodiment of the present disclosure provides a computer-readable storage medium for storing non-transitory computer-readable instructions that, when executed by a computer, may implement a control method provided by any embodiment of the present disclosure.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.
FIG. 1 illustrates a schematic diagram of a processor provided by at least one embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of the transmission of clock signals provided by at least one embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of one power consumption distribution of a processor;
FIG. 4 is a flowchart illustrating a control method for a processor according to at least one embodiment of the present disclosure;
FIG. 5A illustrates a schematic diagram of another processor provided by at least one embodiment of the present disclosure;
FIG. 5B is a schematic diagram illustrating a selection process of an instruction execution unit according to at least one embodiment of the present disclosure;
FIG. 6 illustrates another transmission schematic of clock signals provided by at least one embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of another processor provided by at least one embodiment of the present disclosure;
FIG. 8 is a diagram illustrating an execution of a loop statement provided by at least one embodiment of the present disclosure;
FIG. 9 is a diagram illustrating a statistical method of idle probability of an instruction execution unit during execution of a loop body instruction sequence, according to at least one embodiment of the present disclosure;
FIG. 10 illustrates a schematic diagram of another processor provided by at least one embodiment of the present disclosure;
FIG. 11 illustrates a schematic block diagram of a control apparatus for a processor according to at least one embodiment of the present disclosure;
FIG. 12 is a schematic block diagram of another control apparatus for a processor provided in at least one embodiment of the present disclosure;
FIG. 13 illustrates a schematic block diagram of a processor provided by at least one embodiment of the present disclosure;
FIG. 14 is a schematic block diagram of an electronic device according to some embodiments of the present disclosure;
FIG. 15 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure; and
FIG. 16 is a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Fig. 1 illustrates a schematic diagram of a processor provided in at least one embodiment of the present disclosure.
As shown in FIG. 1, an instruction execution pipeline of a processor may include an instruction dispatch unit 110 and an instruction execution unit 120. The instruction dispatch unit 110 includes an instruction dispatch queue, and the instruction execution unit 120 includes a plurality of instruction execution units C1, C2, C3, …, Cn of different types, where n is a positive integer. The instruction execution units C1, C2, C3, …, Cn are used to execute different types of instructions, or several instruction execution units cooperate to complete instructions of the same type. For example, the instructions allocated to the instruction execution units C1, C2, C3, …, Cn have different attribute information, and the attribute information represents the type of instruction execution unit to which an instruction to be dispatched needs to be dispatched. Each instruction execution unit may include a plurality of components or circuits, for example, registers, arithmetic devices (e.g., for addition, subtraction, multiplication, and division), memory access components (e.g., devices that read and write memory access addresses), and the like.
The processor may be a single-core processor or a multi-core processor, and each processor core of the multi-core processor may include the instruction dispatching unit 110, the instruction executing unit 120, and the like, and may further include, for example, a branch prediction unit, an instruction fetching unit, a decoding unit, a register renaming unit, and the like, which is not limited by the present disclosure.
For example, the instruction dispatch queue 111 contains a plurality of instructions to be dispatched, arranged in time order. The instructions in the queue are dispatched sequentially from front to back; as the instructions at the front are dispatched, the instructions behind them move forward, and new instruction information is filled into the instruction dispatch queue after passing through the decoding unit. Each instruction to be dispatched carries attribute information, such as a fixed-point, floating-point, or vector attribute, that corresponds to an instruction execution unit. Based on the attribute of each instruction to be dispatched, the instruction dispatch queue 111 may dispatch the instructions in order to the corresponding instruction execution units C1, C2, C3, …, Cn.
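The queue behaviour described above can be modelled, very loosely, as a first-in, first-out structure whose entries carry an attribute that selects the target unit. The following sketch is a software analogy only; the class names, the mnemonics, and the use of a per-unit list to record dispatched instructions are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Instruction:
    name: str       # hypothetical mnemonic, for illustration only
    attribute: str  # type of execution unit the instruction must go to


class DispatchQueue:
    """FIFO model: decoded instructions enter at the tail, dispatch at the head."""

    def __init__(self) -> None:
        self._entries: Deque[Instruction] = deque()

    def fill(self, instr: Instruction) -> None:
        """Append a newly decoded instruction at the tail of the queue."""
        self._entries.append(instr)

    def dispatch(self, units: Dict[str, List[str]]) -> Instruction:
        """Remove the head instruction and hand it to the matching unit."""
        instr = self._entries.popleft()
        units[instr.attribute].append(instr.name)
        return instr

    def __len__(self) -> int:
        return len(self._entries)


# Each unit is modelled as the list of instruction names dispatched to it.
units: Dict[str, List[str]] = {"fixed": [], "float": [], "vector": []}
queue = DispatchQueue()
for name, attr in [("add", "fixed"), ("fmul", "float"), ("sub", "fixed")]:
    queue.fill(Instruction(name, attr))
while len(queue):
    queue.dispatch(units)
```

After the queue drains, the `vector` unit has received nothing, which is exactly the kind of unit the control method later treats as a candidate for having its clock turned off.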
For example, during the operation of the processor, a system clock is required to synchronize the operations of the components inside the processor. In the processor core, clock signals with the same frequency and the same phase are adopted, so that all functional components in the processor core are ensured to run according to the same rhythm.
FIG. 2 illustrates a schematic diagram of the transmission of clock signals provided by at least one embodiment of the present disclosure.
As shown in FIG. 2, the clock signal may be transmitted to the components (e.g., registers) included in the respective instruction execution units C1, C2, C3, …, Cn to control the synchronous operation of those components. For example, in order to reduce power consumption, clock gating (CG, hereinafter also referred to as "gating") may be provided for at least some components in an instruction execution unit. Clock gating can turn off the clocks of components that are not currently in use, for example, by temporarily cutting off the clock signal of a register that is not used in the current clock cycle. This disables the transfer function of the register, prevents useless data from entering the next stage of logic, and avoids a series of unnecessary logic toggles, thereby reducing power consumption. Accordingly, an appropriate amount of gating can be provided inside each instruction execution unit to reduce its power consumption.
FIG. 3 shows a schematic diagram of one power consumption distribution for a typical processor.
As shown in FIG. 3, the power consumption of the processor mainly comprises combinational logic (combinatorial), registers (Flops), gating (Gate), clocks (Clocks), custom memory modules (Macro), and latches (Latch), with most of the power consumption distributed among combinational logic, registers, gating, and clocks. The clock accounts for about 7% of the power consumption, and gating, although it serves as a power control measure, itself accounts for a relatively high share of about 15%. That is, although the role of gating is to reduce power consumption, gating itself also consumes power.
Therefore, adding clock gating to registers as a power control method can reduce the dynamic power consumption of the registers, but the gating circuits also introduce additional power consumption of their own. The number of gating circuits added therefore needs to be moderate, which means that gating cannot be guaranteed to cover all registers. If too many gates are placed in an instruction execution unit, the overall power consumption of the processor may increase instead of decreasing. Moreover, as the circuit complexity of the instruction execution unit grows, the additional power consumed by the clock and the gates inside the instruction execution unit also grows, so the power consumption of the processor cannot be reduced effectively.
Embodiments of the present disclosure provide a control method for a processor, a control apparatus for a processor, an electronic device, and a computer-readable storage medium. The embodiments reduce the power consumption of a processor from the perspective of the processor microarchitecture. Whereas existing gating targets individual register devices, gating in the embodiments of the present disclosure primarily targets each instruction execution unit in the processor. The control method for the processor includes: acquiring attribute information of a plurality of instructions to be dispatched, wherein the attribute information represents the type of instruction execution unit to which each instruction to be dispatched is to be dispatched; determining, from a plurality of instruction execution units and based on the attribute information of the plurality of instructions to be dispatched, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current moment; and determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units, M being a positive integer.
With this control method, the clock can be turned off at the level of an instruction execution unit, which greatly reduces the dynamic power consumption of each instruction execution unit whose clock is turned off, thereby reducing the power consumption of the instruction execution units as a whole and, in turn, the overall power consumption of the processor.
Fig. 4 shows a flowchart of a control method for a processor according to at least one embodiment of the present disclosure.
As shown in fig. 4, the control method may include steps S210 to S230.
Step S210: acquire attribute information of a plurality of instructions to be dispatched.
Step S220: determine, from the plurality of instruction execution units and based on the attribute information of the plurality of instructions to be dispatched, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch process starting from the current time.
Step S230: determine at least one instruction execution unit to be hibernated from the M candidate instruction execution units.
FIG. 5A illustrates a schematic diagram of another processor provided by at least one embodiment of the present disclosure. The processor to which the control method is applied may be, for example, the processor shown in FIG. 5A, which adds a control device 300 to the processor shown in FIG. 1; the control method may be executed by the control device 300. As described above, the processor may be a single-core processor or a multi-core processor, and each processor core of a multi-core processor may individually include the instruction dispatch unit 110, the instruction execution unit 120, and the like, and may further include, for example, a branch prediction unit, an instruction fetch unit, a decoding unit, a register renaming unit, and the like, which is not limited by the present disclosure.
For example, in step S210, the attribute information characterizes the type of instruction execution unit to which the instruction to be dispatched is to be dispatched. The plurality of instructions to be dispatched may be, for example, all or a portion of the instructions in the instruction dispatch queue. The attribute information is, for example, fixed point, floating point, vector, or other attributes corresponding to the instruction execution unit.
For example, in step S220, the instruction execution units that will respectively execute the plurality of instructions to be dispatched may be determined from the attribute information of each instruction to be dispatched. It can then be determined which instruction execution units are in a dispatch idle state (for example, instruction execution units to which no instruction is dispatched) while at least some of the instructions to be dispatched are executed. Each instruction execution unit in the dispatch idle state is taken as a candidate instruction execution unit, thereby obtaining M candidate instruction execution units, where M is an integer greater than or equal to 0.
For example, in step S230, all M candidate instruction execution units may be taken as instruction execution units to be hibernated, or only some of the M candidate instruction execution units may be selected as instruction execution units to be hibernated. For example, according to the instruction release queue, the candidate instruction execution units that satisfy a release condition may be selected from the M candidate instruction execution units as the instruction execution units to be hibernated, which will be described in detail below.
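Steps S210 to S230 can be strung together in a compact sketch. It assumes that the dispatch queue is a list of required unit names, and that the instruction release queue can be summarised as the list of units that still hold dispatched but not yet released instructions; both representations and the function name `select_units_to_sleep` are illustrative assumptions, not the structures used by the embodiments.

```python
from typing import Iterable, Sequence, Set, Tuple


def select_units_to_sleep(
    dispatch_queue: Sequence[str],
    release_queue: Sequence[str],
    all_units: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (candidate units, units to be hibernated).

    dispatch_queue -- unit required by each instruction waiting to be dispatched
    release_queue  -- unit holding each dispatched, not yet released instruction
    """
    # S210: read the attribute (target unit) of every instruction to be dispatched.
    attributes = list(dispatch_queue)
    # S220: candidates are the units that none of those instructions will use.
    candidates = set(all_units) - set(attributes)
    # S230: keep only candidates that have released everything dispatched to them.
    still_busy = set(release_queue)
    to_sleep = candidates - still_busy
    return candidates, to_sleep


# Hypothetical state: C5 and C6 have upcoming work; C3 and C4 are still
# finishing instructions that were dispatched earlier.
all_units = ["C1", "C2", "C3", "C4", "C5", "C6"]
candidates, to_sleep = select_units_to_sleep(
    dispatch_queue=["C5", "C6", "C5"],
    release_queue=["C3", "C4", "C3"],
    all_units=all_units,
)
```

In this example the candidates are C1 to C4, but only C1 and C2 are selected for sleep, because C3 and C4 have not yet released all of their dispatched instructions.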
Fig. 5B is a schematic diagram illustrating a selection process of an instruction execution unit according to at least one embodiment of the present disclosure. As shown in fig. 5B, a plurality of instruction execution units (e.g., C1, C2, C3, …, Cn) included in the instruction execution portion may be categorized into set 1. After step S220, a number of candidate instruction execution units (e.g., C1, C2, C3, C4) are selected, which may be categorized into set 2. After step S230, a number of instruction execution units to be hibernated (e.g., C1, C2) are further selected, which may be categorized into set 3.
For example, after determining the instruction execution units to be hibernated, the clock of each instruction execution unit to be hibernated may be turned off to cause the instruction execution units to be hibernated to transition from an active state to a sleep state.
Fig. 6 illustrates a schematic diagram of another transmission clock signal provided by at least one embodiment of the present disclosure.
As shown in fig. 6, for example, a clock switch device 122 may be provided for each of the instruction execution units C1, C2, C3, …, Cn. During execution of at least a portion of the instructions to be dispatched, the clock of each instruction execution unit to be hibernated is turned off by its clock switch device, that is, the clock is cut off so that the instruction execution unit is in a clock-off state. As a result, the clock gating inside the instruction execution unit to be hibernated no longer needs to be controlled, which reduces the power consumption of the clock gating; the dynamic power consumption of the registers is reduced because the clock is turned off; the dynamic power consumption of the combinational logic is reduced because no level toggling occurs; and the clock power consumption is reduced because clock transmission is blocked. The clock switch device 122 may be implemented as clock gating, for example.
According to the control method provided by the embodiments of the present disclosure, the instruction execution units that will be in the dispatch idle state during the upcoming dispatch of a plurality of instructions are determined according to the attribute information of the instructions to be dispatched, and the clocks of at least part of the instruction execution units in the dispatch idle state are cut off. Because most program behavior is targeted, with instructions concentrated on specific instruction execution units within a given period of time, the circuit power consumption of the idle instruction execution units (including the sequential circuits, combinational circuits, clock circuits, and gating circuits) can be greatly reduced, so that the overall power consumption of the processor can be reduced.
For example, an appropriate amount of clock gating (CG) may also be provided within the instruction execution unit. When the clock of the instruction execution unit is in the off state, all devices in the instruction execution unit are off, so the power consumed by the gating circuits is reduced compared with the prior art. When the clock of the instruction execution unit is in the on state, the clock gating inside the instruction execution unit can turn off the clocks of individual components as needed, which also reduces power consumption to a certain extent.
Fig. 7 illustrates a schematic diagram of another processor provided by at least one embodiment of the present disclosure.
As shown in fig. 7, compared to the processor shown in fig. 5A, the processor may further include an instruction release unit 130 including an instruction release queue 131. For example, the instruction release queue 131 includes a plurality of to-be-released instructions. After the instructions in the instruction dispatch queue 111 are dispatched to the corresponding instruction execution units, the dispatched instructions are added to the instruction release queue 131 as to-be-released instructions. After the current instruction and all previous instructions have been completely executed, the instruction release unit may release the completed instructions from the instruction release queue 131 and retire them from the instruction execution pipeline of the processor. Taking instruction i as an example, after the instruction dispatch unit dispatches instruction i to an instruction execution unit j, instruction i is added to the instruction release queue 131 and remains there until the instruction execution unit j has finished executing instruction i and all instructions before instruction i in the instruction release queue have been executed, at which point the instruction release unit may release instruction i from the instruction release queue 131. That is, if instruction i is still in the instruction release queue 131, instruction i has been dispatched but not released; it may still be executing or may already have finished executing. To ensure the functional correctness of the processor, the execution unit corresponding to an instruction is considered to be in a busy state as long as the instruction has not been released. If instruction i has been released from the instruction release queue 131, instruction i has been executed.
For example, in some examples, at least one instruction execution unit whose dispatched instructions have all been released may be determined, based on the instruction release queue, from the M candidate instruction execution units as the at least one instruction execution unit to be hibernated. For example, after the M candidate instruction execution units are determined in step S220, the instruction release queue may be used to determine which candidate instruction execution units have dispatched instructions that are not yet released and which candidate instruction execution units have finished all of their instructions, and the instruction execution units to be hibernated are determined accordingly. In this way, no instruction is in execution in an instruction execution unit whose clock is turned off, so instruction execution is not interrupted and the correct function of the processor is ensured.
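The release-queue check described above can be sketched as follows. This is a minimal illustrative model, not the disclosed hardware: the class `ReleaseQueue`, the `Entry` record, the function `units_to_hibernate`, and the unit names are all hypothetical, and instruction names are assumed to be unique.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    """One dispatched-but-not-yet-released instruction (hypothetical model)."""
    name: str
    unit: str           # execution unit the instruction was dispatched to
    done: bool = False  # set once the unit has finished executing it


class ReleaseQueue:
    """In-order release queue: an entry leaves only after all older entries."""

    def __init__(self):
        self._entries = deque()

    def dispatch(self, name, unit):
        self._entries.append(Entry(name, unit))

    def complete(self, name):
        for entry in self._entries:
            if entry.name == name:
                entry.done = True

    def release(self):
        """Release finished instructions in order, stopping at the first unfinished one."""
        released = []
        while self._entries and self._entries[0].done:
            released.append(self._entries.popleft().name)
        return released

    def busy_units(self):
        """Units that still own at least one unreleased instruction."""
        return {entry.unit for entry in self._entries}


def units_to_hibernate(candidates, release_queue):
    """Keep only the candidate units with no unreleased instruction."""
    return set(candidates) - release_queue.busy_units()
```

In this sketch a unit counts as busy even if its instruction has finished executing but an older instruction has not, which matches the conservative rule stated above that an unreleased instruction keeps its execution unit busy.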
For example, in some examples, in step S220, the at least one instruction dispatch process may be a dispatch process of all instructions to be dispatched in the instruction dispatch queue. M instruction execution units to which no instruction is dispatched in the process of dispatching all instructions to be dispatched can be determined from the plurality of instruction execution units as M candidate instruction execution units in a dispatching idle state.
For example, there are K to-be-dispatched instructions in the instruction dispatch queue (K is greater than or equal to 0), with K being 100, for example. The attribute information of the first 50 of the 100 to-be-dispatched instructions is fixed point, and these instructions are executed by the fixed-point instruction execution unit; the attribute information of the last 50 to-be-dispatched instructions is floating point, and these instructions are executed by the floating-point instruction execution unit. The remaining instruction execution units (e.g., the vector instruction execution unit, the matrix instruction execution unit, the memory access instruction execution unit, etc.) other than the fixed-point instruction execution unit and the floating-point instruction execution unit may be taken as candidate instruction execution units. While these 100 instructions are processed, no instruction is dispatched to the vector instruction execution unit, the matrix instruction execution unit, the memory access instruction execution unit, or the other remaining instruction execution units, so the instruction execution units to be hibernated can be determined in combination with the instruction release condition, and their clocks can be temporarily turned off.
For example, in other examples, in step S220, the at least one instruction dispatch process may be a dispatch process of the first P instructions to be dispatched in the instruction dispatch queue. M instruction execution units to which no instruction is dispatched during the dispatch of the first P to-be-dispatched instructions can be determined from the plurality of instruction execution units as the M candidate instruction execution units in the dispatch idle state. For example, the N instruction execution units, among the plurality of instruction execution units, that correspond to the attribute information of the first P to-be-dispatched instructions in the instruction dispatch queue 111 may be determined first, and the remaining instruction execution units other than these N instruction execution units are taken as candidate instruction execution units, where P and N are integers greater than or equal to 0.
For example, there are K to-be-dispatched instructions in the instruction dispatch queue, with K being 100, for example. The attribute information of the first 30 of the 100 to-be-dispatched instructions is fixed point, and these instructions are executed by the fixed-point instruction execution unit; the attribute information of the middle 40 to-be-dispatched instructions is floating point, and these instructions are executed by the floating-point instruction execution unit; the attribute information of the last 30 to-be-dispatched instructions is vector, and these instructions are executed by the vector instruction execution unit. If P is 70, the first P to-be-dispatched instructions are the first 30 and the middle 40 to-be-dispatched instructions. During the dispatch of these first 70 to-be-dispatched instructions, no instruction is dispatched to the instruction execution units other than the fixed-point instruction execution unit and the floating-point instruction execution unit (for example, the vector instruction execution unit, the matrix instruction execution unit, the memory access instruction execution unit, and the like), so these serve as candidate instruction execution units. The instruction execution units to be hibernated can then be determined in combination with the instruction release condition, and their clocks can be temporarily turned off.
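The two examples above (the whole queue, and the first P instructions) can be reproduced with a short sketch. It assumes, for illustration only, that each instruction's attribute information is a string naming exactly one execution unit; the function `candidate_units` and the unit names are hypothetical.

```python
def candidate_units(dispatch_queue, all_units, p=None):
    """Return the units to which none of the first p queued instructions is dispatched.

    dispatch_queue: attribute strings ordered from front (dispatched first) to back.
    p: number of instructions to look at; None means the whole queue.
    """
    window = dispatch_queue if p is None else dispatch_queue[:p]
    needed = set(window)  # the N units that will receive instructions
    return set(all_units) - needed


UNITS = {"fixed", "float", "vector", "matrix", "mem"}

# K = 100: 30 fixed-point, then 40 floating-point, then 30 vector instructions.
queue = ["fixed"] * 30 + ["float"] * 40 + ["vector"] * 30

first_70 = candidate_units(queue, UNITS, p=70)  # vector unit is still a candidate
whole_queue = candidate_units(queue, UNITS)     # vector unit is needed, so it is not
```

With P equal to 70 the vector unit remains a candidate because the vector instructions lie outside the window; looking at the whole queue removes it from the candidates.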
Regarding the expressions "front" and "back" of the instruction dispatch queue used in the embodiments of the present disclosure, "front" denotes the end of the instruction dispatch queue from which instructions are dispatched, and "back" denotes the end at which received instructions enter the instruction dispatch queue. In general, an instruction located toward the "front" entered the instruction dispatch queue earlier than an instruction located toward the "back", and is dispatched to an instruction execution unit earlier than an instruction located toward the "back".
For example, the control method may further include: determining at least one instruction execution unit for instruction dispatch in at least one instruction dispatch process from the instruction execution units in the dormant state as an instruction execution unit to be activated based on the attribute information of the plurality of instructions to be dispatched; and turning on a clock of the instruction execution unit to be activated to convert the instruction execution unit to be activated from the sleep state to the activation state.
For example, based on the attribute information of a plurality of instructions to be dispatched, an instruction execution unit to be activated is determined from the instruction execution units currently in the state of closing the clock, the instruction execution unit to be activated corresponds to the attribute information of at least one instruction to be dispatched in the instruction dispatch queue, and the corresponding clock is opened with a certain probability.
For example, in some examples, the clock of an instruction execution unit to be activated is turned on based on the position, in the instruction dispatch queue, of the instruction to be dispatched that corresponds to the instruction execution unit to be activated.
For example, the clock of the instruction execution unit to be activated may be turned on in response to the instruction to be dispatched corresponding to the instruction execution unit to be activated appearing in the last position of the instruction dispatch queue. The last position of the instruction dispatch queue is the end position at which the instruction dispatch queue receives new instructions. That is, the clock of the respective instruction execution unit to be activated may be turned on as soon as an instruction corresponding to that instruction execution unit is present in the instruction dispatch queue. For example, the vector instruction execution unit and the matrix instruction execution unit are currently in a clock-off state. Once a to-be-dispatched instruction whose attribute information is vector appears in the instruction dispatch queue, the vector instruction execution unit can serve as an instruction execution unit to be activated, and its clock is turned on; once a to-be-dispatched instruction whose attribute information is matrix appears in the instruction dispatch queue, the matrix instruction execution unit can serve as an instruction execution unit to be activated, and its clock is turned on. In this way, the clock of the instruction execution unit corresponding to each to-be-dispatched instruction in the instruction dispatch queue is guaranteed to be on when that instruction is dispatched, so that the execution unit can execute the corresponding instruction without omission.
For example, in other examples, the clock of the instruction execution unit to be activated may be turned on in response to the instruction to be dispatched corresponding to the instruction execution unit to be activated advancing from the last position of the instruction dispatch queue to a predetermined position of the instruction dispatch queue.
For example, in response to an instruction to be dispatched corresponding to an instruction execution unit to be activated reaching a predetermined position of the instruction dispatch queue, the clock of the instruction execution unit to be activated is turned on. For example, the instruction dispatch queue may have 100 instruction positions in which 100 to-be-dispatched instructions may be arranged, and the predetermined position may be the 10th instruction position, the 50th instruction position, the 70th instruction position, etc., counted from front to back. The predetermined position may be determined according to practical circumstances and is not limited by the present disclosure. Taking the 10th instruction position counted from front to back as the predetermined position, suppose the vector instruction execution unit is currently in a clock-off state: when a to-be-dispatched instruction whose attribute information is vector appears in the instruction dispatch queue and moves forward from the last position to the 10th instruction position, the clock of the vector instruction execution unit is turned on. In this way, on the one hand, the clock of the corresponding instruction execution unit is guaranteed to be on when each to-be-dispatched instruction in the instruction dispatch queue is dispatched; on the other hand, the time during which the instruction execution unit is turned off is extended as much as possible, which further reduces the power consumption of the corresponding execution unit.
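Both wake-up policies (waking as soon as a matching instruction enters the queue, and waking only once it reaches a predetermined position) can be sketched with one function. The function `units_to_wake` and the string attributes are hypothetical, and the sketch assumes that an instruction at or in front of the predetermined position counts as having reached it.

```python
def units_to_wake(dispatch_queue, sleeping_units, predetermined_position=None):
    """Return the sleeping units whose clock should be turned on now.

    dispatch_queue: attribute strings; index 0 is the front (next to dispatch).
    predetermined_position: 1-based position counted from the front. None means
        a unit is woken as soon as a matching instruction is anywhere in the queue.
    """
    if predetermined_position is None:
        window = dispatch_queue
    else:
        window = dispatch_queue[:predetermined_position]
    return {attr for attr in window if attr in sleeping_units}


sleeping = {"vector", "matrix"}
queue = ["fixed"] * 12 + ["vector"]           # vector instruction at position 13

wake_on_arrival = units_to_wake(queue, sleeping)       # wakes the vector unit at once
wake_at_10 = units_to_wake(queue, sleeping, 10)        # too far back: nothing to wake
wake_later = units_to_wake(queue[3:], sleeping, 10)    # after 3 dispatches: position 10
```

A smaller predetermined position keeps the unit asleep longer but leaves less time for the clock to come up before dispatch, which is the trade-off the paragraph above describes.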
For example, the plurality of instructions to be dispatched include a sequence of loop body instructions, each sequence of loop body instructions including a number of instructions. The at least one instruction dispatch process in step S220 can be a process of dispatching a loop body instruction sequence.
Fig. 8 illustrates an execution diagram of a loop statement provided by at least one embodiment of the present disclosure.
As shown in fig. 8, for example, a statement X1, a judgment statement X2, and a statement X3 exist in the program. The judgment statement X2 is executed after the statement X1 is executed. When the judgment result is the first judgment result (for example, no), execution returns to the statement X1, after which the judgment statement X2 is executed again; as long as the judgment result is still the first judgment result, execution returns to the statement X1 again. When the judgment result becomes the second judgment result (for example, yes), the loop ends and execution proceeds downward to the statement X3.
For example, a loop body instruction sequence may be the several instructions corresponding to one loop, that is, the plurality of instructions in one loop may form one instruction sequence, and this instruction sequence has a high probability of repeating within a certain period of time. For example, one loop needs to execute (n + 1) instructions: OP_0, OP_1, OP_2, OP_3, …, OP_n. The instructions OP_0 to OP_n then form a loop body instruction sequence, and the instructions of the loop body are repeated in the instruction dispatch queue, where the specific number of repetitions depends on the number of iterations of the loop.
For example, in the case of loop execution, since loop bodies have a nested phenomenon, a plurality of loop body instruction sequences arranged in sequence exist in an instruction dispatch queue, and if attributes are determined for all instructions in the instruction dispatch queue one by one, and an instruction execution unit capable of turning off a clock is determined according to the attribute of each instruction, additional power consumption overhead is caused. Thus, for the case of loop body instructions, the disclosed embodiments also provide another way to determine instruction execution units that can turn off clocks.
For example, in step S220, idle probabilities respectively corresponding to the multiple instruction execution units may be determined based on the loop body instruction sequence; and taking the instruction execution unit with the idle probability larger than or equal to the probability threshold value as M candidate instruction execution units in the dispatching idle state.
Fig. 9 illustrates a schematic diagram of a method for statistics of idle probability of an instruction execution unit during execution of a loop body instruction sequence according to at least one embodiment of the present disclosure.
As shown in FIG. 9, OP_0 to OP_n represent a loop body instruction sequence, and the instructions OP_0 to OP_n are executed cyclically. In step S220, each instruction execution unit may be taken in turn as the object, and its idle probability during execution of the loop body instruction sequence may be determined. For example, an idle value ExeEmpty((i-1)-i) of the instruction execution unit between each two adjacent instructions may be counted, where the idle value is "1" if the instruction execution unit is in an idle state, and is greater than or equal to "0" otherwise (equal to "0" in the embodiments of the present disclosure). Then, the idle probability of the instruction execution unit can be calculated using the following equation:
$$P(\mathrm{Empty}) = \frac{\sum_{i=1}^{n} \mathrm{ExeEmpty}\big((i-1)-i\big) + \mathrm{ExeEmpty}(n-0)}{N}$$
where P(Empty) is the idle probability of the instruction execution unit, N is the number of instructions contained in the loop body instruction sequence, i is an integer greater than or equal to 1 and less than or equal to n, ExeEmpty((i-1)-i) is the idle value of the instruction execution unit during the period from executing instruction OP_(i-1) to executing instruction OP_i, and ExeEmpty(n-0) is the idle value of the instruction execution unit during the period from executing instruction OP_n to executing instruction OP_0. The prior art can already detect the number of instructions in a loop body, and the embodiments of the present disclosure can use the prior art for this purpose, which is not described herein again.
For example, the loop body instruction sequence includes 10 instructions, of which the first 9 are executed by the fixed-point instruction execution unit and the last one is executed by the floating-point instruction execution unit. The idle probability corresponding to the fixed-point instruction execution unit is then 0.1, the idle probability corresponding to the floating-point instruction execution unit is 0.9, and the idle probabilities corresponding to the remaining instruction execution units (e.g., the vector instruction execution unit, the matrix instruction execution unit, the memory access instruction execution unit, and the like) are all 1. The greater the idle probability corresponding to an instruction execution unit, the longer that instruction execution unit is considered to be idle during execution of the loop body instruction sequence. If the idle probability reaches 1, the instruction execution unit is not used at all during execution of the loop body instruction sequence. If the idle probability is not 1 but is close to 1, the instruction execution unit is used for only a small fraction of the time during execution of the loop body instruction sequence.
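The worked example above can be checked numerically with a small sketch. It rests on one labeled assumption that the text does not spell out: each of the N intervals (including the wrap-around interval from OP_n back to OP_0) is treated as busy for the unit executing the instruction at the start of the interval and idle for every other unit. The function `idle_probability` and the unit names are hypothetical.

```python
def idle_probability(loop_body, unit):
    """Fraction of the N loop-body intervals during which `unit` is idle.

    loop_body: list of unit names; loop_body[k] is the unit that executes OP_k.
    Assumption: the interval starting at OP_k (the last one wraps around from
    OP_n to OP_0) has idle value 0 for the unit executing OP_k and 1 otherwise.
    """
    n_instructions = len(loop_body)  # N in the text
    idle_values = [0 if executing == unit else 1 for executing in loop_body]
    return sum(idle_values) / n_instructions


# 10 instructions: OP_0..OP_8 on the fixed-point unit, OP_9 on the floating-point unit.
loop_body = ["fixed"] * 9 + ["float"]

p_fixed = idle_probability(loop_body, "fixed")    # 1 idle interval out of 10
p_float = idle_probability(loop_body, "float")    # 9 idle intervals out of 10
p_vector = idle_probability(loop_body, "vector")  # never used in the loop
```

Under this assumption the sketch yields the values quoted in the text for the fixed-point, floating-point, and unused units.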
For example, in some examples, the probability threshold is 100% (i.e., the probability threshold is 1), that is, only the instruction execution units whose idle probability reaches 1 are clocked off during execution of the loop body instruction sequence (provided that instructions dispatched into the instruction execution units have all been released). In this way, all the instruction execution units that need to be used can be kept in the clock-on state all the time during execution of the loop body instruction sequence, and frequent switching of the execution unit clock-on/off states can be avoided.
For example, in other examples, the probability threshold is greater than or equal to 50% and less than 100% (i.e., the probability threshold is between 0.5 and 1). For example, with a probability threshold of 0.8, the clocks of the instruction execution units with an idle probability greater than or equal to 0.8 and less than or equal to 1 may be turned off during execution of the loop body instruction sequence. Following the above example, if the idle probability corresponding to the fixed-point instruction execution unit is 0.1, the idle probability corresponding to the floating-point instruction execution unit is 0.9, and the idle probabilities corresponding to the remaining instruction execution units (e.g., the vector instruction execution unit, the matrix instruction execution unit, the access instruction execution unit, etc.) are all 1, it may be considered that the clocks of the floating-point instruction execution unit and the remaining instruction execution units (e.g., the vector instruction execution unit, the matrix instruction execution unit, the access instruction execution unit, etc.) are turned off. In this way, instruction execution units which are not used or are rarely used during the execution of the loop instruction sequence in the loop body are set as candidate instruction execution units, and the instruction execution units to be dormant can be further obtained from the candidate instruction execution units in combination with the instruction release condition, and the clock of the instruction execution units to be dormant is turned off. Thus, it is possible to avoid switching the clock on/off state of the instruction execution unit too frequently.
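The effect of the probability threshold on the candidate set can be illustrated as follows, using the idle probabilities from the running example. The function `candidates_by_threshold` and the dictionary keys are hypothetical names chosen for this sketch.

```python
def candidates_by_threshold(idle_probabilities, threshold):
    """Return the units whose idle probability is at least the threshold."""
    return {unit for unit, p in idle_probabilities.items() if p >= threshold}


# Idle probabilities from the 10-instruction loop body example.
idle_probabilities = {
    "fixed": 0.1,
    "float": 0.9,
    "vector": 1.0,
    "matrix": 1.0,
    "mem": 1.0,
}

strict = candidates_by_threshold(idle_probabilities, 1.0)   # only never-used units
relaxed = candidates_by_threshold(idle_probabilities, 0.8)  # also rarely used units
```

With a threshold of 1 the floating-point unit stays clocked for the whole loop; with a threshold of 0.8 it also becomes a candidate, at the cost of having to be woken for the one instruction per iteration that needs it.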
For example, for each instruction execution unit to be hibernated with an idle probability equal to 100%, the corresponding clocks may be all turned off before the loop body instruction is executed. That is, for an instruction execution unit with an idle probability of 100%, the clock of the corresponding instruction execution unit is kept in a closed state during the execution of the loop body until the loop body is skipped, and then the clock open/closed state of the instruction execution unit is determined according to the monitoring of the instruction dispatch queue.
For example, for each instruction execution unit to be dormant with the idle probability greater than or equal to the probability threshold and less than 100%, the clock of the instruction execution unit to be dormant is turned on or off based on the position of the instruction corresponding to the instruction execution unit to be dormant in the instruction dispatch queue and the instruction release condition in the instruction execution unit to be dormant.
For example, following the example above, the loop body instruction sequence comprises 10 instructions (e.g., OP_0 to OP_9). The first 9 instructions (e.g., OP_0 to OP_8) are executed by the fixed-point instruction execution unit, and the last instruction (e.g., OP_9) is executed by the floating-point instruction execution unit; the floating-point instruction execution unit has, for example, an idle probability greater than the probability threshold and less than 100%. The instruction dispatch queue is arranged, for example, as [OP_0, OP_1, OP_2, OP_3, …, OP_9, OP_0, OP_1, OP_2, OP_3, …, OP_9, OP_0, OP_1, OP_2, OP_3, …, OP_9, …]. While the instructions OP_0, OP_1, OP_2, OP_3, …, OP_8 are being dispatched, the floating-point instruction execution unit may be in a clock-off state. When instruction OP_9 is about to be dispatched (i.e., OP_9 will be dispatched at the next instruction dispatch), the floating-point instruction execution unit serves as an instruction execution unit to be activated, and its clock is turned on so that instruction OP_9 can be dispatched to it. After the floating-point instruction execution unit completes execution of instruction OP_9, its clock may be turned off again until instruction OP_9 of the next loop iteration is about to be dispatched, at which point the clock of the floating-point instruction execution unit is turned on again. In this manner, the clocks of some of the instruction execution units are dynamically turned on and off during execution of the loop body instruction sequence.
According to the control method, the idle probability of each instruction execution unit is determined according to the loop body instruction sequence, and the instruction execution units whose clocks need to be turned off are determined according to the idle probability. In this way, the granularity of the judgment is raised to the level of the loop body instruction sequence: only the execution units required by one iteration of the loop need to be determined, that is, the idle instruction execution units can be found from a single iteration. There is no need to obtain the instruction attributes one by one and judge the idle instruction execution units again for every iteration, so the effectiveness of power consumption control can be improved.
For example, in some embodiments, in the case of a multi-thread operating mode, multiple groups of instruction execution units to be hibernated, corresponding respectively to multiple threads, are determined, where each group includes at least one instruction execution unit to be hibernated; the intersection instruction execution units of the multiple groups of instruction execution units to be hibernated are determined; and the clocks of the intersection instruction execution units are turned off.
For example, in a case where multiple threads run simultaneously, each thread includes an instruction dispatch queue belonging to the thread, in this case, a corresponding instruction execution unit to be dormant may be determined for each thread, results of all threads are summarized, and instruction execution units to be dormant under multiple threads are arbitrated and determined.
For example, two threads (e.g., thread 0 and thread 1) run simultaneously, and according to an instruction to be dispatched in the thread 0, the floating point instruction execution unit, the vector instruction execution unit, the matrix instruction execution unit and the memory access instruction execution unit are determined as instruction execution units to be dormant in a future period. According to the instruction to be dispatched in the thread 1, determining that the vector instruction execution unit, the matrix instruction execution unit and the access instruction execution unit are the instruction execution units to be dormant in a future period of time. The intersection of the instruction execution units to be dormant of the two threads may be used as the instruction execution unit to be dormant, for example, the intersection of the two threads is a vector instruction execution unit, a matrix instruction execution unit, and an access instruction execution unit, and clocks of the three instruction execution units may be turned off.
For example, in the case of a multi-thread operating mode, multiple groups of instruction execution units to be activated, corresponding respectively to multiple threads, may be determined; all instruction execution units contained in the multiple groups of instruction execution units to be activated are determined; and the clocks of all of these instruction execution units are turned on.
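The multi-thread arbitration described above reduces to set operations: a clock may be turned off only if every thread agrees (intersection), and it is turned on if any thread needs the unit (union). The functions `units_to_clock_off` and `units_to_clock_on` and the unit names are hypothetical.

```python
def units_to_clock_off(per_thread_hibernate):
    """Units that every thread is willing to hibernate (set intersection)."""
    groups = [set(group) for group in per_thread_hibernate]
    return set.intersection(*groups) if groups else set()


def units_to_clock_on(per_thread_activate):
    """Units that at least one thread wants activated (set union)."""
    return set().union(*per_thread_activate)


# Thread 0 and thread 1 from the example above.
thread0_sleep = {"float", "vector", "matrix", "mem"}
thread1_sleep = {"vector", "matrix", "mem"}

clock_off = units_to_clock_off([thread0_sleep, thread1_sleep])
clock_on = units_to_clock_on([{"float"}, {"vector"}])
```

The asymmetry is deliberate in this sketch: turning a clock off is safe only when no thread needs the unit, whereas turning it on is required as soon as one thread does.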
For example, the processor has a first instruction fetching mode and a second instruction fetching mode, and for a thread corresponding to the first instruction fetching mode, M candidate instruction execution units to which no instruction is dispatched in at least one instruction dispatching process from the current time are determined from the plurality of instruction execution units, and at least one instruction execution unit to be dormant is determined from the M candidate instruction execution units. And for the thread corresponding to the second instruction fetching mode, determining idle probabilities corresponding to the plurality of instruction execution units respectively based on the loop body instruction sequence, determining M candidate instruction execution units with idle probabilities larger than or equal to a probability threshold, and determining at least one instruction execution unit to be dormant from the M candidate instruction execution units.
Fig. 10 illustrates a schematic diagram of another processor provided by at least one embodiment of the present disclosure.
As shown in fig. 10, the processor may further include a first instruction fetching component 141 and a second instruction fetching component 142. The first instruction fetching component 141 is, for example, an IC (Instruction Cache) path instruction fetching component, and the second instruction fetching component 142 is, for example, an OC (Micro-Operation Cache) path instruction fetching component. The first instruction fetching mode may be fetching instructions using the first instruction fetching component 141, and the second instruction fetching mode may be fetching instructions using the second instruction fetching component 142. Further, the processor may include an instruction decoding unit 150, an instruction fetch path determining unit 160, and an address generation unit and branch prediction unit 170. The address generation unit and branch prediction unit 170 may be used to predict the jump direction and jump destination address of a branch instruction. The instruction fetch path determining unit 160 may be configured to determine the instruction fetch path, that is, to select whether the IC path or the OC path is entered. The IC path instruction fetching component may be used to fetch instructions from the instruction cache. The IC stores instruction information before decoding; after decoding, this information is translated into machine-recognizable microinstructions, which are typically written into the OC for later use by the OC fetch path. The OC path instruction fetching component may be configured to fetch instructions from the decoded instruction information (i.e., microinstructions) that has already been stored, and the system switches to the IC path instruction fetching component upon a miss in the OC path instruction fetching component. Loop body instructions are mostly fetched from the OC path. The instruction decoding unit 150 is configured to decode the instruction data obtained by the IC path instruction fetching component and convert program instructions into microinstructions recognizable by the processor.
The instruction dispatching unit 110 dispatches the instructions to the instruction execution units C1, C2, C3, …, Cn according to the execution conditions of the downstream instruction execution units C1, C2, C3, …, Cn and in combination with the attributes of the instructions that currently need to be dispatched.
For example, in the single thread mode, the thread reads instructions in either the first instruction fetching mode or the second instruction fetching mode; at any given time, the thread can read instructions in only one instruction fetching mode. If the thread switches to the first instruction fetching mode, the attribute information of each instruction to be dispatched in the instruction dispatch queue can be examined one by one, the candidate instruction execution units are determined according to the attribute information of all or some of the instructions to be dispatched in the instruction dispatch queue, and the instruction execution units to be dormant are then determined. Because most of the instructions corresponding to the second instruction fetching mode are loop body instruction sequences, if the thread switches to the second instruction fetching mode, the candidate instruction execution units can be determined in the manner described above for loop body instruction sequences, and the instruction execution units to be dormant can then be determined.
For example, in the single thread mode, the instruction execution units to be dormant/activated may be determined only for the second instruction fetching mode, so as to avoid frequently turning clocks on and off and to ensure that power consumption control is effective. Alternatively, in the single thread mode, the instruction execution units to be dormant/activated may be determined only for the first instruction fetching mode; since some programs rarely exhibit second-instruction-fetching-mode behavior when running, power consumption can still be reduced by using the first instruction fetching mode.
For example, whether in the simultaneous multi-threading mode or the single thread mode, the sleep/activation of the corresponding instruction execution units may be controlled only with respect to the second instruction fetching mode.
Another embodiment of the present disclosure also provides a control apparatus for a processor. According to the attribute information of the instructions to be dispatched, the control apparatus identifies the instruction execution units that will be in a dispatch idle state during the dispatch of a number of upcoming instructions, and turns off the clock of at least some of those instruction execution units. Because the clock is turned off at the level of the instruction execution unit, the power consumption of the corresponding instruction execution units, and thus the overall power consumption of the processor, can be reduced. Because program behavior is targeted, with instruction activity concentrated on specific instruction execution units within a period of time, the circuit power consumption of the instruction execution units (including sequential circuits, combinational circuits, clock circuits, and gating circuits) can be greatly reduced, and therefore the overall power consumption of the processor can be reduced.
Fig. 11 illustrates a schematic block diagram of a control apparatus 300 for a processor according to at least one embodiment of the present disclosure.
For example, as shown in fig. 11, the control device 300 includes an attribute acquisition unit 310, a first determination unit 320, and a second determination unit 330.
The attribute obtaining unit 310 is configured to obtain attribute information of a plurality of instructions to be dispatched, wherein the attribute information characterizes a type of instruction execution unit to which the instructions to be dispatched are to be dispatched. The attribute acquisition unit 310 may perform, for example, step S210 described in fig. 4.
The first determining unit 320 is configured to determine, from among the plurality of instruction execution units, M candidate instruction execution units that are in a dispatch idle state during at least one dispatch of an instruction from a current time, based on attribute information of a plurality of instructions to be dispatched, M being a positive integer. The first determination unit 320 may perform, for example, step S220 described in fig. 4.
The second determination unit 330 is configured to determine at least one instruction execution unit to be hibernated from the M candidate instruction execution units. The second determination unit 330 may perform, for example, step S230 described in fig. 4.
For example, the attribute obtaining unit 310, the first determining unit 320, and the second determining unit 330 may be hardware, software, firmware, and any feasible combination thereof. For example, the attribute obtaining unit 310, the first determining unit 320, and the second determining unit 330 may be dedicated or general circuits, chips, or devices, and may also be a combination of a processor and a memory. The embodiments of the present disclosure are not limited in this regard to the specific implementation forms of the above units.
It should be noted that, in the embodiment of the present disclosure, each unit of the control device 300 corresponds to each step of the aforementioned control method, and for specific functions of the control device 300, reference may be made to the related description of the control method, which is not described herein again. The components and configuration of the control device 300 shown in fig. 11 are exemplary only, and not limiting, and the control device 300 may include other components and configurations as desired.
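The division of work among the attribute acquisition unit 310, the first determination unit 320, and the second determination unit 330 can be sketched in Python as three small functions. This is a simplified software model, not the disclosed hardware: the `Instruction` class, the unit type names, and the `in_flight` counter map are assumptions made for illustration, and the rule used in the last step (sleep only units whose dispatched instructions have all been released) is just one of the options mentioned in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    unit_type: str  # attribute information: type of the target execution unit


def acquire_attributes(pending):
    """Unit 310 (step S210): collect the attribute of each pending instruction."""
    return [ins.unit_type for ins in pending]


def determine_candidates(all_units, attributes):
    """Unit 320 (step S220): units that receive no instruction are dispatch idle."""
    needed = set(attributes)
    return [unit for unit in all_units if unit not in needed]


def determine_units_to_sleep(candidates, in_flight):
    """Unit 330 (step S230): keep candidates with no unreleased instructions.

    in_flight maps a unit to the number of dispatched but unreleased
    instructions (a hypothetical bookkeeping structure).
    """
    return [unit for unit in candidates if in_flight.get(unit, 0) == 0]


units = ["ALU", "FPU", "LSU", "BRU"]
pending = [Instruction("add", "ALU"), Instruction("load", "LSU")]
attrs = acquire_attributes(pending)
candidates = determine_candidates(units, attrs)
to_sleep = determine_units_to_sleep(candidates, {"FPU": 2})
```

Here the FPU and BRU are dispatch idle, but only the BRU is selected for sleep because the FPU still holds two unreleased instructions.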
Fig. 12 is a schematic block diagram of another control apparatus for a processor according to at least one embodiment of the present disclosure.
For example, as shown in fig. 12, the control apparatus 300 may further include a clock control unit 340, where the clock control unit 340 is configured to turn off the clock of the instruction execution unit to be hibernated, so that the instruction execution unit to be hibernated transitions from an active state to a sleep state. Furthermore, the clock control unit 340 may also be configured to turn on the clock of an instruction execution unit. For example, the clock control unit may include a plurality of sub-clock control units for turning the clocks of the plurality of instruction execution units on and off; upon receiving a clock off/on signal for one or more instruction execution units, the clock control unit turns off/on the clock of the corresponding instruction execution units.
For example, as shown in fig. 12, the control apparatus 300 may further include a release monitoring unit 350, where the release monitoring unit 350 is configured to determine, from the M candidate instruction execution units and according to the instruction release queue, at least one instruction execution unit whose dispatched instructions have all been released, as the at least one instruction execution unit to be hibernated.
For example, as shown in fig. 12, the attribute acquisition unit 310 is configured to acquire attribute information of a plurality of instructions to be dispatched from the instruction dispatch queue.
For example, the at least one instruction dispatch process is a process of dispatching the first P to-be-dispatched instructions in the instruction dispatch queue. The first determining unit 320 is further configured to determine, from the plurality of instruction execution units, M instruction execution units to which no instruction is dispatched in the process of dispatching the first P to-be-dispatched instructions, as the M candidate instruction execution units in the dispatch idle state, where P is an integer greater than or equal to 0.
For example, the at least one instruction dispatch process is a process of dispatching all of the instructions to be dispatched in the instruction dispatch queue. The first determining unit 320 is further configured to determine, from the plurality of instruction execution units, M instruction execution units to which no instruction is dispatched in the process of dispatching all of the instructions to be dispatched, as the M candidate instruction execution units in the dispatch idle state.
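Both window choices (the first P instructions, or the whole instruction dispatch queue) can be expressed with one small Python function. This is a sketch under assumptions: the queue is modeled as a list of target unit types with index 0 as the next instruction to dispatch, and the function name and unit type names are hypothetical.

```python
def candidates_for_first_p(all_units, dispatch_queue, p):
    """Return the units that receive no instruction among the first p entries.

    dispatch_queue is a list of target unit types, with index 0 being the
    next instruction to be dispatched (an assumed convention).
    """
    if p < 0:
        raise ValueError("p must be greater than or equal to 0")
    busy = set(dispatch_queue[:p])
    return [unit for unit in all_units if unit not in busy]


units = ["ALU", "FPU", "LSU", "BRU"]
queue = ["ALU", "ALU", "LSU", "FPU"]

first_two = candidates_for_first_p(units, queue, 2)
whole_queue = candidates_for_first_p(units, queue, len(queue))
```

A smaller P yields more candidates but a shorter look-ahead, so a unit may have to be woken again soon; using the whole queue is the most conservative choice. With P equal to 0 the window is empty and every unit is a candidate in this model.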
For example, the plurality of instructions to be dispatched are a loop body instruction sequence, and the at least one instruction dispatch process is a process of dispatching the loop body instruction sequence. The first determining unit 320 is further configured to: determine idle probabilities respectively corresponding to the plurality of instruction execution units based on the loop body instruction sequence; and take the instruction execution units whose idle probability is greater than or equal to a probability threshold as the M candidate instruction execution units in the dispatch idle state. For example, in some examples, the probability threshold is 100%. In other examples, the probability threshold is greater than or equal to 50% and less than 100%.
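The threshold comparison can be sketched in Python as follows. This passage does not restate how the idle probability is computed, so the sketch assumes, purely for illustration, that the idle probability of a unit is the fraction of loop body instructions that are not dispatched to that unit; the function names and unit type names are likewise hypothetical.

```python
from collections import Counter
from fractions import Fraction


def idle_probabilities(all_units, loop_body):
    """Assumed definition: share of loop body instructions NOT sent to a unit."""
    if not loop_body:
        raise ValueError("loop body must contain at least one instruction")
    used = Counter(loop_body)  # loop_body is a list of target unit types
    total = len(loop_body)
    return {unit: 1 - Fraction(used[unit], total) for unit in all_units}


def candidates_by_threshold(probabilities, threshold):
    """Units whose idle probability is greater than or equal to the threshold."""
    return [unit for unit, prob in probabilities.items() if prob >= threshold]


units = ["ALU", "FPU", "LSU", "BRU"]
loop_body = ["ALU", "ALU", "LSU", "ALU"]
probs = idle_probabilities(units, loop_body)

strict = candidates_by_threshold(probs, Fraction(1))      # threshold of 100%
relaxed = candidates_by_threshold(probs, Fraction(1, 2))  # threshold of 50%
```

With a threshold of 100% only units that the loop never uses become candidates; with a lower threshold, a unit that is used occasionally (the LSU here) also becomes a candidate, which is why such units need the additional clock on/off handling based on queue position and release status.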
For example, as shown in fig. 12, the control apparatus 300 may further include a fourth determination unit 370, the fourth determination unit 370 being configured to: for each instruction execution unit to be dormant whose idle probability is greater than or equal to the probability threshold and less than 100%, turn on or turn off the clock of the instruction execution unit to be dormant based on the position, in the instruction dispatch queue, of the instruction corresponding to the instruction execution unit to be dormant and on the instruction release condition in the instruction execution unit to be dormant.
For example, as shown in fig. 12, the control apparatus may further include a third determining unit 360, and the third determining unit 360 may be configured to: determine, from the instruction execution units in the sleep state and based on the attribute information of the plurality of instructions to be dispatched, at least one instruction execution unit to which an instruction is dispatched in the at least one instruction dispatch process, as an instruction execution unit to be activated; and turn on the clock of the instruction execution unit to be activated, so that the instruction execution unit to be activated transitions from the sleep state to the active state.
For example, the third determining unit 360 may be further configured to turn on the clock of the instruction execution unit to be activated according to the position, in the instruction dispatch queue, of the instruction to be dispatched corresponding to the instruction execution unit to be activated.
For example, the third determining unit 360 may be further configured to turn on the clock of the instruction execution unit to be activated in response to the instruction to be dispatched corresponding to the instruction execution unit to be activated appearing at the last position of the instruction dispatch queue, where the last position of the instruction dispatch queue is the tail position at which the instruction dispatch queue receives newly added instructions.
For example, the third determining unit 360 may be further configured to turn on the clock of the instruction execution unit to be activated in response to the instruction to be dispatched corresponding to the instruction execution unit to be activated advancing from the last position of the instruction dispatch queue to a preset position of the instruction dispatch queue.
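The two wake-up triggers described for the third determining unit 360 (wake when the instruction enters at the tail, or wake only once it has advanced to a preset position) can be compared with a short Python sketch. The snapshot-based model, the indexing convention (index 0 is the head, the last index is the tail), and the function name are assumptions for illustration only.

```python
def units_to_wake(dispatch_queue, sleeping_units, preset_position=None):
    """Return the sleeping units whose clock should be turned on.

    dispatch_queue: list of target unit types, index 0 = head (dispatched
    next), last index = tail (where new instructions enter).
    preset_position: None models the "wake at the tail" policy, under which
    any queued instruction has already passed the tail; an integer k models
    the "wake at a preset position" policy, under which only instructions
    that have advanced to index k or closer to the head count.
    """
    if preset_position is None:
        window = dispatch_queue
    else:
        window = dispatch_queue[: preset_position + 1]
    return {unit for unit in window if unit in sleeping_units}


queue = ["ALU", "LSU", "ALU", "FPU"]  # the FPU instruction has just entered at the tail
sleeping = {"FPU", "BRU"}

wake_at_tail = units_to_wake(queue, sleeping)
wake_at_position_1 = units_to_wake(queue, sleeping, preset_position=1)
```

Waking at the tail gives the unit the most time to become ready, while waking at a preset position keeps the clock off for longer at the cost of a tighter wake-up margin.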
For example, the first determining unit 320, the second determining unit 330, the third determining unit 360 and the fourth determining unit 370 are all located in an algorithm module.
For example, the control apparatus may further comprise a multithreading module (not shown in the figures) configured to: in the case of a multi-thread running mode, determine a plurality of groups of instruction execution units to be dormant corresponding respectively to a plurality of threads, where each group of instruction execution units to be dormant includes at least one instruction execution unit to be dormant; determine the instruction execution units in the intersection of the plurality of groups of instruction execution units to be dormant; and turn off the clocks of the instruction execution units in the intersection.
For example, the multithreading module is further configured to: in the case of a multi-thread running mode, determine a plurality of groups of instruction execution units to be activated corresponding respectively to a plurality of threads; determine all instruction execution units included in the plurality of groups of instruction execution units to be activated; and turn on the clocks of all of these instruction execution units.
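The multi-thread rules can be sketched with Python sets. The sketch reads "all instruction execution units" of the groups to be activated as the union of those groups, which is an interpretation rather than something stated explicitly here; the function names and unit type names are hypothetical.

```python
def units_to_sleep_smt(per_thread_sleep_sets):
    """A unit may sleep only if every thread agrees: set intersection."""
    sets = [set(group) for group in per_thread_sleep_sets]
    return set.intersection(*sets) if sets else set()


def units_to_wake_smt(per_thread_wake_sets):
    """A unit must wake if any thread needs it: set union (assumed reading)."""
    return set().union(*per_thread_wake_sets)


sleep_sets = [{"FPU", "BRU"}, {"FPU", "LSU"}]  # thread 0, thread 1
wake_sets = [{"ALU"}, {"LSU", "ALU"}]

smt_sleep = units_to_sleep_smt(sleep_sets)
smt_wake = units_to_wake_smt(wake_sets)
```

The asymmetry is deliberate: turning a clock off is safe only when no thread needs the unit, whereas turning it on is required as soon as a single thread does.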
For example, a processor has a first instruction fetch mode and a second instruction fetch mode. For example, the processor includes a first instruction fetch unit and a second instruction fetch unit; in the first instruction fetch mode, instructions are fetched using the first instruction fetch unit, and in the second instruction fetch mode, instructions are fetched using the second instruction fetch unit. The multithreading module may be further configured to: for a thread corresponding to the first instruction fetch mode, determine, from the plurality of instruction execution units, M candidate instruction execution units to which no instruction is dispatched in at least one instruction dispatch process from the current time, and determine at least one instruction execution unit to be dormant from the M candidate instruction execution units; and for a thread corresponding to the second instruction fetch mode, determine idle probabilities respectively corresponding to the plurality of instruction execution units based on the loop body instruction sequence, determine M candidate instruction execution units whose idle probability is greater than or equal to a probability threshold, and determine at least one instruction execution unit to be dormant from the M candidate instruction execution units.
For example, the control apparatus may further include a clock status monitoring module (not shown in the figure) configured to monitor the clock on/off status of each instruction execution unit. For example, the current clock status may be queried and confirmed from the clock status monitoring module before the step of turning a clock on or off is performed. For example, for an instruction execution unit whose clock needs to be turned off, the clock status monitoring module may be used to determine whether the clock of that instruction execution unit is currently on, and the clock is turned off only after this has been confirmed. After the step of turning the clock on or off is performed, the latest state of the clock may be written back to the clock status monitoring module.
The control apparatus of the embodiment of the present disclosure identifies, according to the attribute information of the instructions to be dispatched, the instruction execution units that will be in a dispatch idle state during the dispatch of a number of upcoming instructions, and turns off the clock of at least some of those instruction execution units. Because most program behavior is targeted, with instruction activity concentrated on specific instruction execution units within a period of time, the circuit power consumption of the instruction execution units (including sequential circuits, combinational circuits, clock circuits, and gating circuits) can be greatly reduced, so that the overall power consumption of the processor can be reduced.
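The query, act, update sequence around the clock status monitoring module can be sketched in Python. The class `ClockStatusMonitor`, the `set_clock` helper, and the `gate` callback that stands in for the actual clock gating hardware are all hypothetical names introduced for this sketch.

```python
class ClockStatusMonitor:
    """Tracks whether the clock of each instruction execution unit is on."""

    def __init__(self, units):
        self._on = {unit: True for unit in units}  # assume all clocks start on

    def is_on(self, unit):
        return self._on[unit]

    def update(self, unit, on):
        self._on[unit] = on


def set_clock(monitor, unit, turn_on, gate):
    """Query the monitor first, act only if needed, then record the new state.

    gate(unit, turn_on) stands in for the real clock control signal.
    Returns True if the clock state was actually changed.
    """
    if monitor.is_on(unit) == turn_on:
        return False  # already in the requested state, nothing to do
    gate(unit, turn_on)
    monitor.update(unit, turn_on)
    return True


signals = []
monitor = ClockStatusMonitor(["ALU", "FPU"])
changed_first = set_clock(monitor, "FPU", False, lambda u, on: signals.append((u, on)))
changed_again = set_clock(monitor, "FPU", False, lambda u, on: signals.append((u, on)))
```

Checking the recorded state first avoids issuing redundant clock off/on signals, so the second request in the example above sends nothing to the gate.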
Another embodiment of the present disclosure also provides a processor.
Fig. 13 illustrates a schematic block diagram of a processor 400 provided by at least one embodiment of the present disclosure.
For example, as shown in fig. 13, the processor 400 includes the control device 300 as described above. Further, the processor 400 may also include an instruction dispatching unit 110, an instruction executing unit 120, an instruction releasing unit 130, and the like.
At least one embodiment of the present disclosure also provides an electronic device comprising a processor and a memory, the memory including one or more computer program modules. One or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the control method described above. The electronic equipment can reduce the power consumption of the whole instruction execution component, and further can reduce the whole power consumption of the processor.
Fig. 14 is a schematic block diagram of an electronic device provided in some embodiments of the present disclosure. As shown in fig. 14, the electronic device 500 includes a processor 510 and a memory 520. Memory 520 is used to store non-transitory computer readable instructions (e.g., one or more computer program modules). The processor 510 is configured to execute non-transitory computer readable instructions, which when executed by the processor 510 may perform one or more of the steps of the control method described above. The memory 520 and the processor 510 may be interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, processor 510 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capabilities and/or program execution capabilities. For example, the Central Processing Unit (CPU) may be an X86 or ARM architecture or the like. The processor 510 may be a general-purpose processor or a special-purpose processor that may control other components in the electronic device 500 to perform desired functions.
For example, memory 520 may include any combination of one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by processor 510 to implement various functions of electronic device 500. Various applications, as well as various data used and/or generated by the applications, and the like, may also be stored in the computer-readable storage medium.
It should be noted that, in the embodiment of the present disclosure, reference may be made to the above description about the control method for specific functions and technical effects of the electronic device 500, and details are not repeated here.
Fig. 15 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 600 is, for example, suitable for implementing the control method provided by the embodiments of the present disclosure. The electronic device 600 may be a terminal device or the like. It should be noted that the electronic device 600 shown in fig. 15 is only one example, and does not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 15, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 610 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 620 or a program loaded from a storage means 680 into a Random Access Memory (RAM) 630. In the RAM 630, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 610, the ROM 620, and the RAM 630 are connected to each other by a bus 640. An input/output (I/O) interface 650 is also connected to bus 640.
Generally, the following devices may be connected to the I/O interface 650: input devices 660 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 670 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, or the like; storage 680 including, for example, magnetic tape, hard disk, etc.; and a communication device 690. The communication device 690 may allow the electronic apparatus 600 to communicate with other electronic apparatuses wirelessly or by wire to exchange data. While fig. 15 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided, and that the electronic device 600 may alternatively be implemented or provided with more or less means.
For example, according to an embodiment of the present disclosure, the above-described control method may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program comprising program code for performing the control method described above. In such embodiments, the computer program may be downloaded and installed from a network through communication device 690, or installed from storage device 680, or installed from ROM 620. When executed by the processing device 610, the computer program may implement the functions defined in the control method provided by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a computer-readable storage medium for storing non-transitory computer-readable instructions that when executed by a computer implement the control method described above. By using the computer readable storage medium, the power consumption of the whole instruction execution unit can be reduced, and the whole power consumption of the processor can be further reduced.
Fig. 16 is a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure. As shown in fig. 16, a computer-readable storage medium 700 is used to store non-transitory computer-readable instructions 710. For example, the non-transitory computer readable instructions 710, when executed by a computer, may perform one or more steps in accordance with the control methods described above.
For example, the storage medium 700 may be applied to the electronic device 500 described above. The storage medium 700 may be, for example, the memory 520 in the electronic device 500 shown in fig. 14. For example, the related description about the storage medium 700 may refer to the corresponding description of the memory 520 in the electronic device 500 shown in fig. 14, and will not be repeated here.
The following points need to be explained:
(1) The drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to common designs.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is only a specific embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and the scope of the present disclosure should be subject to the scope of the claims.

Claims (20)

1. A control method for a processor comprising a plurality of instruction execution units of different types, the method comprising:
acquiring attribute information of a plurality of instructions to be dispatched, wherein the attribute information represents the type of an instruction execution unit to which the instructions to be dispatched are dispatched;
determining M candidate instruction execution units in an idle dispatching state in at least one instruction dispatching process from the current time from the plurality of instruction execution units based on the attribute information of the plurality of instructions to be dispatched;
determining at least one instruction execution unit to be hibernated from the M candidate instruction execution units;
wherein the plurality of instructions to be dispatched comprises a loop body instruction sequence, the at least one instruction dispatch process comprises a process of dispatching the loop body instruction sequence, and determining from among the plurality of instruction execution units, M candidate instruction execution units that are in a dispatch idle state during at least one instruction dispatch from a current time, comprises:
determining idle probabilities respectively corresponding to the plurality of instruction execution units based on the loop body instruction sequence;
taking the instruction execution units with the idle probability greater than or equal to a probability threshold as the M candidate instruction execution units in the dispatching idle state;
wherein M is a positive integer.
2. The method of claim 1, further comprising:
and closing the clock of the instruction execution unit to be dormant so that the instruction execution unit to be dormant is converted into a dormant state from an activated state.
3. The method of claim 1, further comprising:
determining, from the instruction execution units in the dormant state and based on the attribute information of the plurality of instructions to be dispatched, at least one instruction execution unit to which an instruction is dispatched in at least one instruction dispatch process from the current time, as an instruction execution unit to be activated;
and turning on a clock of the instruction execution unit to be activated so as to convert the instruction execution unit to be activated from a sleep state to an activated state.
4. The method of claim 1, wherein determining at least one instruction execution unit to sleep from the M candidate instruction execution units comprises:
determining, from the M candidate instruction execution units, at least one instruction execution unit whose dispatched instructions have all been released, as the at least one instruction execution unit to be dormant.
5. The method of any of claims 1-4, wherein obtaining attribute information for a plurality of instructions to be dispatched comprises:
and acquiring attribute information of a plurality of instructions to be dispatched from the instruction dispatching queue.
6. The method of claim 5, wherein the at least one instruction dispatch process comprises a process that dispatches a first P to-be-dispatched instructions in the instruction dispatch queue; determining M candidate instruction execution units from the plurality of instruction execution units that are in a dispatch idle state during at least one instruction dispatch from a current time, comprising:
determining M instruction execution units from the plurality of instruction execution units to which no instruction is dispatched in dispatching the first P to-be-dispatched instructions as the M candidate instruction execution units in a dispatch idle state,
wherein P is an integer greater than or equal to 0.
7. The method of claim 5, wherein the at least one instruction dispatch process comprises a process that dispatches all of the instructions to be dispatched in the instruction dispatch queue; determining M candidate instruction execution units from the plurality of instruction execution units that are in a dispatch idle state during at least one instruction dispatch from a current time, comprising:
determining M instruction execution units from the plurality of instruction execution units to which no instruction is dispatched in the process of dispatching the all to-be-dispatched instructions as the M candidate instruction execution units in the dispatching idle state.
8. The method of claim 1, wherein,
the probability threshold is 100%, or,
the probability threshold is greater than or equal to 50% and less than 100%.
9. The method of claim 8, further comprising:
for each instruction execution unit to be dormant with the idle probability being greater than or equal to the probability threshold and smaller than 100%, closing the clock of the instruction execution unit to be dormant based on the position of the instruction corresponding to the instruction execution unit to be dormant in an instruction dispatching queue and the instruction release condition in the instruction execution unit to be dormant.
10. The method of claim 3, wherein opening a clock of the instruction execution unit to be activated comprises:
and according to the position of the instruction to be dispatched corresponding to the instruction execution unit to be activated in the instruction dispatching queue, opening the clock of the instruction execution unit to be activated.
11. The method of claim 10, wherein opening a clock of the instruction execution unit to be activated according to a position of an instruction to be dispatched corresponding to the instruction execution unit to be activated in an instruction dispatch queue comprises:
in response to an instruction to be dispatched corresponding to the instruction execution unit to be activated appearing at the last position of the instruction dispatch queue, turning on the clock of the instruction execution unit to be activated,
wherein the last position of the instruction dispatch queue is the tail position at which the instruction dispatch queue receives newly added instructions.
12. The method of claim 10, wherein opening a clock of the instruction execution unit to be activated according to a position of an instruction to be dispatched corresponding to the instruction execution unit to be activated in an instruction dispatch queue comprises:
in response to the instruction to be dispatched corresponding to the instruction execution unit to be activated advancing from the last position of the instruction dispatch queue to a preset position of the instruction dispatch queue, turning on the clock of the instruction execution unit to be activated.
13. The method of any of claims 1 to 4, further comprising:
under the condition of a multi-thread running mode, determining a plurality of groups of instruction execution units to be dormant corresponding to a plurality of threads respectively, wherein each group of instruction execution units to be dormant comprises at least one instruction execution unit to be dormant;
determining the instruction execution units in the intersection of the plurality of groups of instruction execution units to be dormant;
turning off the clocks of the instruction execution units in the intersection.
14. The method of claim 3, further comprising:
under the condition of a multi-thread operation mode, determining a plurality of groups of instruction execution units to be activated, which respectively correspond to a plurality of threads;
determining all instruction execution units of the plurality of groups of instruction execution units to be activated;
and opening the clocks of all the instruction execution units.
15. The method of claim 13, wherein the processor has a first instruction fetch mode and a second instruction fetch mode, and
determining the plurality of groups of instruction execution units to be dormant respectively corresponding to the plurality of threads comprises:
for a thread corresponding to the first instruction fetch mode, determining, from the plurality of instruction execution units, M candidate instruction execution units to which no instruction is dispatched in at least one instruction dispatch process from the current time, and determining at least one instruction execution unit to be dormant from the M candidate instruction execution units; and
for a thread corresponding to the second instruction fetch mode, determining idle probabilities respectively corresponding to the plurality of instruction execution units based on a loop body instruction sequence, determining M candidate instruction execution units whose idle probabilities are greater than or equal to a probability threshold, and determining at least one instruction execution unit to be dormant from the M candidate instruction execution units.
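The loop-body branch above can be sketched as follows. The claims do not define how the idle probability is computed, so this example assumes one plausible definition: the fraction of instructions in the loop body that are not dispatched to a given unit. The function names, unit names, and threshold value are all hypothetical.

```python
from collections import Counter


def idle_probabilities(loop_body, units):
    """Assumed definition: 1 - (share of loop-body instructions using the unit).

    loop_body: list of target-unit names, one entry per instruction.
    """
    counts = Counter(loop_body)
    total = len(loop_body)
    if total == 0:
        # An empty loop body uses no unit at all.
        return {u: 1.0 for u in units}
    return {u: 1.0 - counts[u] / total for u in units}


def candidate_units(loop_body, units, threshold):
    """Units whose idle probability is >= the probability threshold."""
    probs = idle_probabilities(loop_body, units)
    return [u for u in units if probs[u] >= threshold]


loop_body = ["alu", "alu", "load", "alu"]
units = ["alu", "load", "fpu"]
probs = idle_probabilities(loop_body, units)
candidates = candidate_units(loop_body, units, threshold=0.75)
```

With this assumed definition, a unit that the loop never uses has idle probability 1.0 and is always a candidate, while the threshold controls how aggressively rarely used units are also considered.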
16. A control apparatus for a processor, the processor including a plurality of instruction execution units of different types, the control apparatus comprising:
an attribute acquisition module configured to acquire attribute information of a plurality of instructions to be dispatched, wherein the attribute information indicates the type of instruction execution unit to which each instruction to be dispatched is to be dispatched;
a first determining module configured to determine, from the plurality of instruction execution units, M candidate instruction execution units that are in a dispatch idle state in at least one instruction dispatch process from a current time, based on the attribute information of the plurality of instructions to be dispatched;
a second determining module configured to determine at least one instruction execution unit to be dormant from the M candidate instruction execution units;
wherein the plurality of instructions to be dispatched comprises a loop body instruction sequence, the at least one instruction dispatch process comprises a process of dispatching the loop body instruction sequence, and the first determining module is further configured to:
determine idle probabilities respectively corresponding to the plurality of instruction execution units based on the loop body instruction sequence; and
take the instruction execution units whose idle probability is greater than or equal to a probability threshold as the M candidate instruction execution units in the dispatch idle state;
wherein M is a positive integer.
17. A processor comprising the control device of claim 16.
18. The processor of claim 17, further comprising:
a plurality of instruction execution units configured to execute instructions to be dispatched;
an instruction dispatch queue configured to store the instructions to be dispatched in execution order; and
an instruction release queue configured to store instructions to be released.
19. An electronic device, comprising:
a processor;
a memory including one or more computer program modules;
wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the control method of any one of claims 1-15.
20. A computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, implement the control method of any one of claims 1-15.
CN202111671303.2A 2021-12-31 2021-12-31 Processor, control method and device thereof, electronic equipment and storage medium Active CN114356416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111671303.2A CN114356416B (en) 2021-12-31 2021-12-31 Processor, control method and device thereof, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111671303.2A CN114356416B (en) 2021-12-31 2021-12-31 Processor, control method and device thereof, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114356416A CN114356416A (en) 2022-04-15
CN114356416B true CN114356416B (en) 2023-04-07

Family

ID=81105648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111671303.2A Active CN114356416B (en) 2021-12-31 2021-12-31 Processor, control method and device thereof, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114356416B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221403A (en) * 2019-12-27 2020-06-02 中国农业大学 SoC system and method capable of allocating sleep mode control
CN113590197A (en) * 2021-07-30 2021-11-02 中国人民解放军国防科技大学 Configurable processor supporting variable-length vector processing and implementation method thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7930564B2 (en) * 2006-07-31 2011-04-19 Intel Corporation System and method for controlling processor low power states
US8230247B2 (en) * 2011-12-30 2012-07-24 Intel Corporation Transferring architectural functions of a processor to a platform control hub responsive to the processor entering a deep sleep state
US9405872B2 (en) * 2014-06-08 2016-08-02 Synopsys, Inc. System and method for reducing power of a circuit using critical signal analysis
CN112147931B (en) * 2020-09-22 2022-06-24 哲库科技(北京)有限公司 Control method, device and equipment of signal processor and storage medium

Also Published As

Publication number Publication date
CN114356416A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
JP4287799B2 (en) Processor system and thread switching control method
KR100973951B1 (en) Unaligned memory access prediction
JP5853216B2 (en) Integrated circuit, computer system, and control method
TW201923561A (en) Scheduling tasks in a multi-threaded processor
JP2009053861A (en) Program execution control device
KR20140113444A (en) Processors, methods, and systems to relax synchronization of accesses to shared memory
US7971040B2 (en) Method and device for saving and restoring a set of registers of a microprocessor in an interruptible manner
US9069565B2 (en) Processor and control method of processor
JP2000132390A (en) Processor and branch prediction unit
GB2287108A (en) Method and apparatus for avoiding writeback conflicts between execution units sharing a common writeback path
CN114201219B (en) Instruction scheduling method, instruction scheduling device, processor and storage medium
CN111381869B (en) Micro-operation cache using predictive allocation
KR20150079429A (en) Apparatus for handling processor read-after-write hazards with cache misses and operation method thereof
CN114168202B (en) Instruction scheduling method, instruction scheduling device, processor and storage medium
US9710269B2 (en) Early conditional selection of an operand
JPWO2008155794A1 (en) Information processing device
CN114356416B (en) Processor, control method and device thereof, electronic equipment and storage medium
CN108021563B (en) Method and device for detecting data dependence between instructions
CN116048627B (en) Instruction buffering method, apparatus, processor, electronic device and readable storage medium
US7065636B2 (en) Hardware loops and pipeline system using advanced generation of loop parameters
KR100431975B1 (en) Multi-instruction dispatch system for pipelined microprocessors with no branch interruption
US10963260B2 (en) Branch predictor
CN113918225A (en) Instruction prediction method, instruction data processing apparatus, processor, and storage medium
US10831232B2 (en) Computer architecture allowing recycling of instruction slack time
CN116113940A (en) Graph calculation device, graph processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant