WO2011161884A1 - Integrated circuit, computer system, and control method
- Publication number
- WO2011161884A1 (PCT/JP2011/003185)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- loop
- processor
- unit
- instruction
- power
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
Definitions
- the present invention relates to an integrated circuit having a processor, a computer system, and a control method, and more particularly to an integrated circuit, a computer system, and a control method that reduce power consumption when the processor executes a busy wait.
- busy wait is used when waiting for input from a user or when synchronizing between a plurality of processors (or a plurality of logical processors, processes, threads, etc.) operating in parallel.
- when the busy wait is performed, a loop process that repeatedly checks the value of a specific variable (for example, a synchronization variable) is executed in the processor. When the value of the specific variable changes to the set value, the loop process ends and the original process is executed.
- the use of the busy wait enables synchronous processing in which, among a plurality of processors operating in parallel, one process is completed before another process is started, and it is therefore widely used.
- busy wait has the disadvantage of wasting processor resources.
- the number of loop iterations repeated to monitor the synchronization variable may range from several hundred to several tens of thousands. Therefore, the busy wait is a wasteful control method from the viewpoint of power consumption.
- Patent Document 1 discloses a method for reducing wasteful power consumption at the time of spin wait, which is a kind of busy wait.
- the spin wait is used for, for example, synchronization processing between multiprocessors, and FIG. 33 assumes a situation in which two processors sequentially execute processing divided into a front stage and a rear stage.
- a spin lock (a type of interlock) is configured by the setting unit 1101 on the first processor side, the verification unit 1102 on the second processor side, and the variable 1110 for synchronization processing.
- the setting unit 1101 of the first processor writes “0” in the synchronization variable 1110, and then performs the previous process S1111.
- the first processor writes “1” in the synchronization variable 1110 when the previous process S1111 is completed.
- the verification unit 1102 of the second processor cannot start the subsequent process S1122 until the value of the synchronization variable 1110 becomes “1”. Therefore, the determination in step S1121 is repeated until the synchronization variable 1110 becomes “1” (that is, the loop is repeated to wait), the spin wait state is entered, and power is wasted.
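As an illustrative sketch only (not part of the disclosure), the spin wait described above can be modeled with two threads standing in for the two processors. The function name `run_spin_wait` and the `state` dictionary are hypothetical; the dictionary entry `sync` plays the role of the synchronization variable 1110.

```python
import threading


def run_spin_wait():
    """Model the two-processor spin wait; threads stand in for processors."""
    state = {"sync": 0, "spins": 0}   # "sync" models synchronization variable 1110
    order = []

    def first_processor():
        state["sync"] = 0                    # setting unit 1101 writes "0"
        order.append("front stage S1111")    # front-stage process
        state["sync"] = 1                    # written when the front stage completes

    def second_processor():
        # Verification unit 1102: the check of step S1121 is repeated until "1".
        while state["sync"] != 1:
            state["spins"] += 1              # each pass is wasted work (and power)
        order.append("rear stage S1122")     # rear-stage process

    waiter = threading.Thread(target=second_processor)
    setter = threading.Thread(target=first_processor)
    waiter.start()
    setter.start()
    setter.join()
    waiter.join()
    return order, state["spins"]
```

The rear stage can only be recorded after the front stage, whatever the thread scheduling; the spin count varies from run to run and represents the wasted iterations.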
- Patent Document 1 discloses a method of detecting that a processor has executed an instruction sequence used for spin wait, and at that time putting the processor in a power saving state.
- FIG. 34 is a diagram illustrating a configuration of the spin wait detection unit described in Patent Document 1.
- the spin wait detection unit 1222 includes an executed instruction sequence buffer 1234 of the processor, a spin wait instruction sequence storage unit 1236, and a comparison unit 1238 that compares instruction sequences included in both.
- the spin wait instruction string storage unit 1236 stores an interlock instruction string (for example, test_and_set and compare_and_swap) specific to the spin wait.
- when the comparison unit 1238 detects that the executed instruction sequence buffer 1234 includes the interlock instruction sequence stored in the spin wait instruction sequence storage unit 1236, the spin wait instruction detection signal 1241 is output.
- the processor is put into a power saving state using the spin wait instruction detection signal 1241 as a trigger.
- the busy wait is detected based on the execution of a specific instruction sequence (test_and_set, etc.) stored in the spinwait instruction sequence storage unit 1236.
- a busy wait using a different instruction sequence cannot be detected.
- even if various instruction sequences are stored in the spin wait instruction sequence storage unit 1236 and each of them is compared with the executed instruction sequence, there is a limit to what such a comparison can cover.
- that is, since the busy waits that can be detected are limited, there is a problem that the situations in which power consumption can be reduced are also limited.
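The limitation described above can be seen in a minimal sketch of pattern-based detection, assuming the executed instruction sequence buffer is modeled as a list of mnemonics. The names `INTERLOCK_SEQUENCES`, `contains_sequence`, and `spin_wait_detected` are hypothetical and only illustrate the comparison idea of Patent Document 1.

```python
# Stored interlock sequences (models the spin wait instruction sequence storage unit 1236).
INTERLOCK_SEQUENCES = [("test_and_set",), ("compare_and_swap",)]


def contains_sequence(executed, pattern):
    """Return True if `pattern` occurs as a contiguous run inside `executed`."""
    n = len(pattern)
    return any(
        tuple(executed[i:i + n]) == tuple(pattern)
        for i in range(len(executed) - n + 1)
    )


def spin_wait_detected(executed, stored=INTERLOCK_SEQUENCES):
    """Models the comparison unit 1238: any stored sequence found triggers detection."""
    return any(contains_sequence(executed, p) for p in stored)
```

A loop built from `test_and_set` is detected, whereas a busy wait built only from ordinary load, compare, and branch instructions is not, which is the gap the present invention addresses.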
- the present invention solves the above-described problems in the conventional technology, and aims to provide an integrated circuit, a computer system, and a control method that can detect the execution of a wide variety of busy waits and reduce the power consumed by the execution of busy waits in a processor.
- an integrated circuit according to the present invention is an integrated circuit including a processor, and comprises: a loop detection unit that detects that the processor is executing a loop process that repeatedly executes a loop including one or more instructions;
- a loop propagation dependency analyzing unit that detects, in the loop processing detected by the loop detection unit, a loop propagation dependency, that is, a dependency between instructions that spans two different iterations of the loop; and
- a power control unit that performs power saving control for reducing power consumption due to the execution of the loop processing when the loop propagation dependency is not detected by the loop propagation dependency analyzing unit.
- the loop detection unit can detect the loop processing, and the loop propagation dependency analysis unit can determine whether the loop processing is for busy wait. Therefore, a busy wait loop can be detected without comparison with a specific instruction sequence (for example, an interlock instruction sequence). Therefore, it is possible to easily detect a busy wait loop constituted by an instruction sequence other than a specific instruction sequence and perform power saving control during the busy wait. As a result, it is possible to detect more various busy wait executions and reduce power wasted due to busy wait execution.
- FIG. 1 is a diagram schematically illustrating a configuration of a computer system 1300 including an integrated circuit according to a first embodiment.
- FIG. 3 is a diagram schematically illustrating a configuration of a computer system 1300 including an integrated circuit according to a first embodiment.
- 3 is a diagram illustrating an instruction set 1400 of a processor 1301 included in the integrated circuit according to the first embodiment;
- FIG. 3 is a diagram illustrating a register set 1500 of a processor 1301 in Embodiment 1.
- FIG. 6 is a diagram illustrating an instruction sequence 1600 according to the first embodiment.
- FIG. 6 is a diagram illustrating an example of an instruction sequence 2100 that constitutes a loop according to Embodiment 1.
- FIG. 6 is a flowchart showing processing of a loop propagation dependency analyzing unit 1701 in the first embodiment.
- 6 is a diagram illustrating data in a dependency relationship analysis buffer 2200 used by a loop propagation dependency analysis unit 1701 in the first embodiment.
- FIG. 6 is a diagram illustrating an example of an instruction sequence forming a loop in Embodiment 1.
- FIG. 11 is a diagram illustrating data in a dependency relationship analysis buffer 2700 used by a loop propagation dependency analysis unit 1701 in the first embodiment.
- 3 is a flowchart showing processing of a first power control unit 1703 in the first embodiment.
- FIG. 6 is a flowchart showing processing of a loop escape detection unit 1704 in the first embodiment.
- 6 is a flowchart showing processing of a second power control unit 1705 in the first embodiment.
- 6 is a diagram showing an electronic circuit that constitutes a part of a loop propagation dependence analyzing unit 1701 in the first embodiment.
- FIG. 6 is a diagram showing an electronic circuit that constitutes a part of a loop propagation dependence analyzing unit 1701 in the first embodiment.
- FIG. 6 is a diagram showing an electronic circuit that constitutes a part of a loop propagation dependence analyzing unit 1701 in the first embodiment.
- a diagram schematically showing the configuration of a computer system 2800 according to a modification.
- FIG. 18 is a diagram schematically showing a configuration of a multi-thread compatible computer system 2900 in the second embodiment.
- FIG. 10 is a diagram schematically showing a configuration of a register group 2910 in the second embodiment.
- 10 is a flowchart showing processing of a first power control unit 3013 in the second embodiment.
- 10 is a flowchart showing processing of a second power control unit 3015 in the second embodiment.
- 10 is a flowchart showing processing of a first power control unit 3013 in the second embodiment.
- 10 is a flowchart showing processing of a second power control unit 3015 in the second embodiment.
- FIG. 10 is a diagram schematically showing a configuration of a computer system 4000 in the third embodiment.
- FIG. 14 is a flowchart illustrating an operation related to power saving control of the program counter monitoring unit 4100 according to the third embodiment.
- 10 is a flowchart illustrating an operation of a bus monitoring unit 4100 according to the third embodiment.
- 14 is a flowchart illustrating an operation related to the end of power saving control of the program counter monitoring unit 4100 according to the third embodiment.
- FIG. 18 is a diagram schematically showing a configuration of a computer system 4500 in the fourth embodiment. The remaining figures illustrate another instruction set 1400A, a further instruction set 1400B, an example of the busy wait that forms the background art of the present invention, and the configuration of the spin wait detection unit of Patent Document 1.
- FIG. 1 is a block diagram schematically showing the configuration of the computer system 1300.
- the computer system 1300 includes a processor 1301, a main memory 1302 (RAM, ROM, etc.), an I / O device 1303 (input / output device), a power supply device 1304, and a bus 1305.
- the processor 1301 is formed in an integrated circuit.
- the processor 1301, the main memory 1302, the I / O device 1303, and the power supply device 1304 are connected to each other via a bus 1305.
- the power supply device 1304 supplies power 1306 and a clock 1307 to the processor 1301.
- the power supply device 1304 includes a regulator, and can change the voltage of the power 1306 supplied to the processor 1301.
- the power supply device 1304 includes a clock generation circuit and a frequency divider circuit, and can change the frequency of the clock 1307 supplied to the processor 1301.
- the processor 1301 includes a register group 1310, a program counter 1311 included in the register group 1310, an instruction fetch / decode unit 1320, an issue unit 1330, an instruction sequence holding unit 1340, an execution unit 1350, a retirement unit 1360, an instruction cache 1370, and a data cache 1380.
- the processor 1301 also includes a power saving control device (reference numerals 1701 to 1705 and 1900 in FIG. 1), which performs power saving control when busy wait loop processing is executed. This power saving control device will be described later.
- FIG. 2 shows the computer system 1300 in a simplified manner, omitting the illustration of the power saving control device. Based on this figure, the basic operation of the computer system 1300 will be described.
- the instruction fetch / decode unit 1320 reads from the instruction cache 1370 an instruction sequence that the processor 1301 may execute based on the value of the program counter 1311. That is, so-called instruction prefetching is performed.
- the instruction fetch / decode unit 1320 decodes the read instruction sequence and temporarily stores it in the instruction sequence holding unit 1340.
- the instruction string holding unit 1340 holds a predetermined number of instructions, and a new instruction overwrites the oldest instruction. That is, the instruction sequence holding unit 1340 is configured as a ring buffer. Therefore, instructions that have already been issued by the issuing unit 1330 (described next) remain in the instruction string holding unit 1340.
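The ring-buffer behavior of the instruction string holding unit can be sketched as follows. This is a software model under assumptions, not the hardware structure of the disclosure: the class name `InstructionHoldingBuffer` and its methods are hypothetical, and each stored item is an (address, instruction) pair.

```python
from collections import deque


class InstructionHoldingBuffer:
    """Fixed-capacity buffer in which a new instruction overwrites the oldest one."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest item automatically when full
        self._entries = deque(maxlen=capacity)

    def store(self, address, instruction):
        """Store a decoded instruction together with its address."""
        self._entries.append((address, instruction))

    def contents(self):
        """Return the held instructions, oldest first."""
        return list(self._entries)
```

For example, with a capacity of 3, storing a fourth instruction discards the first one while the three most recent instructions, issued or not, remain readable.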
- the issuing unit 1330 sends the instruction sequence stored in the instruction sequence holding unit 1340 to an execution unit 1350 that is ready for execution. That is, an instruction is issued. For example, when issuing an operation instruction, the issuing unit 1330 takes the value of a specified operand (source register) from the register group 1310 and sends it to the execution unit 1350 together with the instruction.
- the execution unit 1350 performs various processes, including arithmetic processing such as the four basic arithmetic operations and bit operations on values stored in the register group 1310, floating point processing, load and store processing via the data cache 1380 and the bus 1305, and branching. The execution result of the processing performed by the execution unit 1350 is sent to the retirement unit 1360.
- the processor 1301 is provided with a plurality of execution units 1350, and the plurality of execution units 1350 process instruction sequences in parallel.
- the issuing unit 1330 sequentially sends an instruction sequence to a free execution unit 1350 among the plurality of execution units 1350.
- the retirement unit 1360 writes the execution result of the execution unit 1350 to the register group 1310 after confirming that the preparation for writing to the register group 1310 is completed. Normally, the retirement unit 1360 writes the execution results to the register group 1310 in the order of instruction addresses.
- the instruction cache 1370 and the data cache 1380 temporarily hold information read from the main memory 1302 and information scheduled to be written to the main memory 1302.
- FIG. 3 is a diagram exemplifying an assembler code instruction 1401 included in the instruction set, an outline 1402 of each instruction 1401, and an operation content 1403.
- the instruction set 1400 includes an ADD instruction 1411 for addition, a SUB instruction 1412 for subtraction, a CMP instruction 1413 for comparison, a MOV instruction 1414 for data movement, a BNE instruction 1415 for conditional branching, a JMP instruction 1416 for jumping, an LDR instruction 1417 for reading information from the main memory 1302, the I / O device 1303, and the power supply device 1304 connected to the bus 1305, an STR instruction 1418 for writing information to the main memory 1302, the I / O device 1303, and the power supply device 1304 connected to the bus 1305, and a NOP instruction 1419 that increments the program counter without performing an operation.
- the register on the left side of the assignment symbol, that is, the assignment destination register, is referred to as the destination register or DST register.
- the register on the right side of the assignment symbol, that is, the assignment source register is referred to as a source register or an SRC register.
- the DST register and the SRC register are specific examples of “variables”.
- the register set 1500 includes a general-purpose register 1511 that is generally used for calculation, a program counter (PC) 1512 indicating the address of the instruction being executed by the processor, and a condition flag register (CFR) 1513 that is used for determining a conditional branch.
- the instruction set 1400 and the register set 1500 are illustrated showing only the portions necessary for explanation, with the rest omitted.
- other instruction sets and register sets may be used.
- the present invention can be implemented in ARM, MIPS, x86, mn10300 instruction sets, etc. widely used by those skilled in the art.
- the instruction string holding unit 1340 stores an address 1601 where an instruction is stored and an instruction 1602 in association with each other.
- the power saving control device includes a loop detection unit 1701, a loop propagation dependence analysis unit 1702, a first power control unit 1703, a loop escape detection unit 1704, and a second power control unit 1705. Further, the power saving control device includes a loop range storage unit 1900 provided in the register group 1310.
- the power saving control device is roughly divided into two components.
- the first component detects a busy wait execution state, performs power saving control, and performs processing to put the computer system 1300 into a power saving state.
- the second component performs processing to return the computer system 1300 to the state before the power saving control is performed by detecting the end of the busy wait and ending the power saving control.
- the first component and the second component will be described in this order.
- the first component includes a loop detection unit 1701, a loop propagation dependency analysis unit 1702, and a first power control unit 1703.
- the loop detection unit 1701 detects the execution of the loop processing, and the loop propagation dependency analysis unit 1702 determines whether the processing is busy wait loop processing, calculation loop processing, or the like.
- the first power control unit 1703 performs power saving control.
- in step S1801, the loop detection unit 1701 determines whether a branch instruction has been established (that is, whether the branch has been taken). If the determination result of step S1801 is YES (branch is established), the process proceeds to step S1803. If the determination result of step S1801 is NO (branch is not established), the determination of step S1801 is performed again.
- the determination is made based on the execution result of the jump instruction (JMP instruction 1416) or the conditional branch instruction (BNE instruction 1415) and the value of the condition flag register (CFR) 1513 sent from the execution unit 1350 to the retirement unit 1360.
- the loop detection unit 1701 determines that the branch instruction is established when the execution result is an execution content in which the value of the program counter 1311 is rewritten to a predetermined address.
- the loop detection unit 1701 includes a branch instruction storage unit that holds various branch instructions (JMP instruction, BNE instruction, etc.), and an instruction comparison unit that compares the executed instruction stored in the retirement unit 1360 with the various branch instructions. When an executed branch instruction is stored in the retirement unit 1360, the instruction comparison unit detects the branch instruction.
- the loop detection unit 1701 refers to the value of the CFR 1513 of the register group 1310. For example, when the value of the CFR 1513 is not “0”, it is determined that the branch is established, and when it is “0”, it is determined that the branch is not established.
- the value of the CFR 1513 is written by the retirement unit 1360 as, for example, the execution result of the comparison instruction (CMP) preceding the conditional branch instruction.
- in step S1803, the loop detection unit 1701 determines whether or not the branch destination address is an address preceding the branch instruction. If the determination result in step S1803 is YES (if the branch destination address is an address preceding the branch instruction), the process advances to step S1805. If the determination result in step S1803 is NO (if the branch destination address is not an address preceding the branch instruction), the process returns to step S1801.
- the execution result sent from the execution unit 1350 to the retirement unit 1360 includes the address of the branch instruction and the branch destination address, and the above determination is performed by comparing these two addresses. When the branch destination address is an address preceding the branch instruction (an address having a value smaller than the address of the branch instruction), it is determined that the processor is in a loop execution state.
- the set value can be set to 10, for example.
- in step S1804, the loop detection unit 1701 extracts the start address 1911 and the end address 1912 of the loop.
- the loop detection unit 1701 extracts the branch destination address of the established branch instruction as the loop start address 1911. Next, the loop detection unit 1701 extracts the address at which the established branch instruction is stored as the loop end address 1912.
- in step S1805, the loop detection unit 1701 outputs the loop start address 1911 and the loop end address 1912 extracted in step S1804 to the loop range storage unit 1900.
- the loop range storage unit 1900 includes a register (a type of memory) that stores the start address 1911 and the end address 1912.
- in step S1807, the loop detection unit 1701 outputs a dependency analysis execution instruction 1711 to the loop propagation dependency analysis unit 1702.
- the execution instruction 1711 is made, for example, by setting the voltage of the signal line connecting the loop detection unit 1701 and the loop propagation dependence analysis unit 1702 to a high level.
- the loop detection unit 1701 may use processes other than those shown here as long as it can detect that the processor is executing a loop. For example, when the address held in the program counter 1311 is updated and the address is decreased, it may be determined that the loop execution state is set.
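The loop detection described above (an established branch whose destination address is smaller than its own address marks a loop) can be sketched in software as follows. This is a simplified model under assumptions: the names `BranchResult` and `detect_loop` are hypothetical, and any check of a repeat count against a set value is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BranchResult:
    """Execution result of a branch instruction as seen at the retirement unit."""
    branch_address: int   # address at which the branch instruction is stored
    target_address: int   # branch destination address
    taken: bool           # True if the branch is established


def detect_loop(result: BranchResult) -> Optional[Tuple[int, int]]:
    """Return (loop start address, loop end address), or None if no loop is detected."""
    if not result.taken:
        return None                      # branch not established
    if result.target_address >= result.branch_address:
        return None                      # not a backward branch
    # Start address = branch destination; end address = the branch itself.
    return (result.target_address, result.branch_address)
```

For the BNE instruction at address 1010 branching back to address 1004, the sketch yields the loop range (0x1004, 0x1010); a forward or unestablished branch yields no loop.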
- (1-2) Loop propagation dependency analyzing unit 1702
- how the loop propagation dependency analyzing unit 1702 discriminates between a loop used for busy wait and a loop that is not a busy wait (for example, a loop for calculation) will be described using specific instruction sequences.
- a value is read from the address indicated by the R2 register to the R0 register by the LDR instruction at address 1008.
- the CMP instruction at address 100c compares the value of the R0 register with the value of the R1 register, in which a value has been stored in advance. If the values match, the BNE instruction at address 1010 exits the loop. That is, the loop process ends because the branch is not established. On the other hand, if the values do not match, the BNE instruction at address 1010 branches to address 1004. That is, the branch is established.
- the instruction sequence 2100 shown in FIG. 8 performs a process of repeatedly reading and comparing values from a fixed address (address indicated by the R2 register).
- the value stored at the address pointed to by the R2 register corresponds to a busy wait synchronization variable.
- the instruction sequence 2100 is used to wait for key input, and the subsequent processing is executed when, for example, the key is pressed (when the synchronization variable is rewritten). However, when the key is not pressed (when the synchronization variable is not rewritten), the process of confirming the value of the synchronization variable (address indicated by the R2 register) is repeatedly performed, which wastes power. Therefore, it is desirable to suppress power consumption due to execution of such loop processing (instruction sequence 2100).
- in step S2001, the loop propagation dependency analyzing unit 1702 starts analysis processing at the timing when the execution instruction 1711 is received from the loop detecting unit 1701.
- in step S2003, the loop propagation dependency analyzing unit 1702 acquires the loop start address 1911 and the end address 1912 from the loop range storage unit 1900.
- the start address 1911 and the end address 1912 of the loop are acquired in the same way as the issuing unit 1330 acquires a value from a register, with the register of the loop range storage unit 1900 as the designated register.
- in step S2005, the loop propagation dependency analyzing unit 1702 acquires the instruction sequence within the loop range from the instruction sequence holding unit 1340. As described above, instructions already issued by the issuing unit 1330 remain in the instruction string holding unit 1340. Further, the number of instructions that the instruction string holding unit 1340 can store is made large enough that the instruction sequence within the loop range remains in the instruction string holding unit 1340 even after the loop is detected.
- alternatively, since the program counter 1311 is rewritten to the start address of the loop after execution of the branch instruction, the instruction sequence in the loop range may be fetched and decoded again by the instruction fetch / decode unit 1320 and stored in the instruction string holding unit 1340.
- in step S2007, the loop propagation dependency analyzing unit 1702 stores the acquired instruction sequence 2100 in the dependency analysis buffer 2200 provided in the loop propagation dependency analyzing unit 1702.
- the instruction sequence 2100 constituting the loop is stored twice in succession, that is, for two iterations of the loop.
- an entry is provided for each instruction included in the loop.
- the entry is a memory that stores an instruction and information associated with the instruction.
- the information stored in each entry of the dependency analysis buffer 2200 includes the mnemonic 2201 of the instruction, the identifier of the DST register (destination register) 2202 of the instruction, the identifier of the SRC register (source register) 2203 of the instruction, an entry number 2204 that identifies the position of the instruction in the dependency analysis buffer 2200, an iteration number 2205 that identifies whether the instruction belongs to the preceding loop or the subsequent loop, the entry number 2206 of the dependency source instruction, and an iteration straddling determination result 2207 between the instruction and the dependency source instruction.
- in step S2007, part of this information is stored in the dependency relationship analysis buffer 2200.
- the information stored at this point includes the mnemonic 2201, the identifier of the DST register 2202, and the identifier of the SRC register 2203.
- in steps S2009 to S2017, the loop propagation dependency analyzing unit 1702 repeats the following processing for each entry of the dependency analysis buffer 2200, in order from the head to the tail.
- in step S2011, the loop propagation dependency analyzing unit 1702 assigns an entry number 2204 to each entry.
- the entry number 2204 is “1” for the first entry and increases by one for each subsequent entry: “2”, “3”, “4”, “5”, “6”, “7”, “8”.
- in step S2013, the loop propagation dependency analyzing unit 1702 assigns an iteration number 2205 to each entry to identify whether it belongs to the preceding loop or the subsequent loop.
- the identifier indicating the preceding loop is “1” and the identifier indicating the subsequent loop is “2”.
- the iteration number 2205 may be information other than “1” and “2” as long as it can identify whether the entry belongs to the preceding loop or the subsequent loop.
- in step S2015, the loop propagation dependency analyzing unit 1702 writes the entry number 2206 of the dependency source instruction.
- the dependency source instruction is the nearest preceding instruction whose DST register is the SRC register of the instruction in question, that is, the instruction on which it has a read-after-write (RAW) dependency.
- in step S2015, information sufficient to determine only the presence or absence of loop propagation dependency may be generated, without checking for dependencies within a single loop iteration. Specifically, it is only necessary to generate, for each entry with an iteration number 2205 of “2”, information for determining whether an instruction with an iteration number 2205 of “1” is its dependency source instruction.
- in step S2016, the loop propagation dependence analyzing unit 1702 checks whether there is loop propagation dependence. Specifically, the entry number 2206 of the dependency source instruction is referred to, and it is determined whether the iteration number to which the dependency source instruction belongs is “1” and the iteration number 2205 of the instruction is “2”. If the determination result is YES, loop propagation dependence exists, and “YES” is written in the iteration straddling determination result 2207 with the dependency source instruction (actually, the value is set to “1”). If the determination result is NO, there is no loop propagation dependency, and “NO” is written in the iteration straddling determination result 2207 with the dependency source instruction (actually, the value is set to “0”).
- in step S2017, when the processing has been completed for all entries, the repetition ends and the process proceeds to step S2019.
- FIG. 10 shows the state of the dependency relationship analysis buffer 2200 immediately before the determination in step S2019 by the loop propagation dependency analysis unit 1702 when the processor 1301 executes the instruction sequence 2100.
- in step S2019, the loop propagation dependency analyzing unit 1702 determines whether there is an entry whose iteration straddling determination result 2207 with the dependency source instruction is “YES”.
- if the determination result in step S2019 is YES, the loop is not a busy wait loop, so the process proceeds to step S2021.
- in step S2021, the loop propagation dependency analyzing unit 1702 ends the process without outputting the execution instruction 1712 to the first power control unit 1703. Therefore, power saving control is not performed.
- if the determination result in step S2019 is NO, the loop is a busy wait loop, so the process proceeds to step S2023.
- in step S2023, the loop propagation dependence analyzing unit 1702 outputs an execution instruction 1712 for executing power saving control to the first power control unit 1703, and ends the process. Therefore, power saving control is performed.
- the execution instruction 1712 is made, for example, by setting the voltage of the signal line connecting the loop propagation dependency analyzing unit 1702 and the first power control unit 1703 to a high level.
- When the information stored in the dependency relationship analysis buffer 2200 is as in the example of FIG. 10, there is no loop propagation dependency, and the determination result in step S2019 is “NO”. That is, none of the entries No. 5 to No. 8 has “YES” as its iteration straddling determination result 2207 with the dependency source instruction.
- Therefore, the loop process subjected to the dependency analysis is a busy wait loop process and is a target of power saving control.
- The process accordingly proceeds to step S2023, and as a result, an execution instruction 1712 is output to the first power control unit 1703.
- As described above, the loop propagation dependency analyzing unit 1702 performs the analysis processing shown in the flowchart of FIG. 9 and determines that there is no loop propagation dependency when no source register of an instruction belonging to the subsequent loop iteration reads a value written by execution of an instruction belonging to the preceding iteration. Conversely, it determines that there is loop propagation dependency when a source register of an instruction belonging to the subsequent iteration reads a value written by execution of an instruction belonging to the preceding iteration.
- In the instruction sequence 2600, the ADD instruction at address 2004 adds 4 to the value of the R2 register and writes the result back to the R2 register.
- The LDR instruction at address 2008 reads a value into the R0 register from the address indicated by the R2 register.
- The CMP instruction at address 200c compares the value of the R0 register with the value of the R1 register, in which the set value is stored in advance. If the values match, the BNE instruction at address 2010 does not branch and the loop is exited. If the values do not match, the BNE instruction at address 2010 branches to address 2004.
- The instruction sequence 2600 shown in FIG. 11 differs from the instruction sequence 2100 shown in FIG. 9 in that it does not repeatedly read a value from one fixed address and compare it. Because the ADD instruction at address 2004 adds to the R2 register in every iteration, the address used by the LDR instruction changes every time. That is, the instruction sequence 2600 searches a plurality of addresses for one that stores the same value as the R1 register, and is not an instruction sequence used for a busy wait loop. Therefore, this loop processing should be excluded from the power saving control target.
- FIG. 12 shows the state of the dependency relationship analysis buffer 2700 when the loop propagation dependency analyzing unit 1702 performs the dependency analysis on the instruction sequence 2600 shown in FIG. 11. Note that the state of the dependency relationship analysis buffer 2700 shown is the state immediately before the determination in step S2019 is performed.
- In FIG. 12, the SRC register 2203 of an entry belonging to the second iteration is the R2 register, which is the same register as the DST register 2202 of entry No. 1. No entry between entry No. 1 and that entry has the R2 register as its DST register 2202. That is, for that entry, the iteration straddling determination result 2207 with the dependency source instruction becomes YES. Therefore, the determination result in step S2019 of the loop propagation dependency analyzing unit 1702 is YES (loop propagation dependency exists).
- As a result, the execution instruction 1712 is not output to the first power control unit 1703, and the first power control unit 1703 does not perform power saving control of the processor 1301 at this timing.
- Thus, loop processing used for busy wait is selectively detected as a power saving control target, while loop processing not used for busy wait is not detected.
- This follows from the determination of step S2019 performed in the processing of the loop propagation dependency analyzing unit 1702. In step S2019, it is determined that the source register of an instruction belonging to the subsequent iteration is not written by execution of an instruction belonging to the preceding iteration.
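- The distinction between the two kinds of loop can be illustrated with a minimal Python sketch. It is not the circuit or procedure of the embodiment. The `Instr` type, the function name `has_loop_propagation_dependency`, and the three-instruction encoding of a busy wait loop (LDR, CMP, BNE, with the R2 register holding the address of the synchronization variable) are assumptions made for illustration, and the condition flags are left out for brevity. Two consecutive iterations are laid out as entries, and a dependency is reported when a source register read in iteration 2 was last written in iteration 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    mnemonic: str
    dst: Optional[str]      # DST register written by the instruction, if any
    src: Tuple[str, ...]    # SRC registers read by the instruction


def has_loop_propagation_dependency(loop_body):
    """Return True if an instruction of iteration 2 reads a register whose
    most recent writer belongs to iteration 1."""
    # Lay out two consecutive iterations, like the entries of the analysis buffer.
    entries = [(1, ins) for ins in loop_body] + [(2, ins) for ins in loop_body]
    last_writer_iteration = {}  # register name -> iteration of its latest writer
    for iteration, ins in entries:
        for reg in ins.src:
            if iteration == 2 and last_writer_iteration.get(reg) == 1:
                return True
        if ins.dst is not None:
            last_writer_iteration[ins.dst] = iteration
    return False


# Assumed busy wait loop: the address in R2 never changes inside the loop.
busy_wait = [
    Instr("LDR", "R0", ("R2",)),
    Instr("CMP", None, ("R0", "R1")),
    Instr("BNE", None, ()),
]

# Instruction sequence 2600: R2 is advanced by 4 in every iteration.
scan = [
    Instr("ADD", "R2", ("R2",)),
    Instr("LDR", "R0", ("R2",)),
    Instr("CMP", None, ("R0", "R1")),
    Instr("BNE", None, ()),
]

# A loop is a power saving target only when no dependency crosses iterations.
busy_wait_is_target = not has_loop_propagation_dependency(busy_wait)
scan_is_target = not has_loop_propagation_dependency(scan)
```

- In this sketch only the assumed busy wait loop qualifies as a power saving target, because the ADD instruction of the second loop reads the R2 value written by the ADD instruction of the previous iteration.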
- the entry number in FIG. 6 SRC register 2203 is an R2 register. After the value of this R2 register is read into the R0 register, it is compared with the value of the R1 register. In this case, the R2 register corresponds to the synchronization variable. It can be seen that there is no instruction for setting the R2 register as the DST register 2202 in the loop, and the R2 register as the synchronization variable cannot be rewritten due to an internal factor of the loop. That is, it can be determined that this is a busy wait loop.
- As described above, loop processing used for busy wait is selectively detected from among the various loop processes executed by the processor, and power saving control can be performed selectively while a loop used for busy wait is being executed.
- As the processing of the loop propagation dependency analyzing unit 1702, a procedure for determining whether the SRC register 2203 of an instruction in the subsequent iteration is used as the DST register 2202 of an instruction in the preceding iteration has been described as an example.
- However, the loop propagation dependency analyzing unit 1702 may be implemented using other processing, as long as it can detect that the value of a variable used in the subsequent iteration cannot be rewritten by executing the instructions of the preceding iteration.
- The processing procedure of the flowchart of FIG. 9 is an example intended to explain simply how the presence or absence of loop propagation dependency is analyzed, and the presence or absence of loop propagation dependency may be determined by a method other than the above processing procedure (specific examples are described later).
- In the dependency relationship analysis buffers 2200 and 2700, for example, at least one of the mnemonic 2201, the entry number 2204, the iteration number 2205, the entry number 2206 of the dependency source instruction, and the like may be omitted.
- In step S2301, the first power control unit 1703 waits until an execution instruction 1712 is received from the loop propagation dependency analyzing unit 1702.
- When the first power control unit 1703 receives the execution instruction 1712, it performs power saving control in step S2303. Specifically, the first power control unit 1703 transmits, to the power supply device 1304, power control information 1731 for shifting the power supply mode of the power supply device 1304 from the normal power mode to the power saving mode.
- As a result, the frequency of the clock 1307 supplied to the processor 1301 by the power supply device 1304 is reduced, and the voltage of the power 1306 supplied to the processor 1301 is lowered.
- For example, the clock frequency is reduced to 12.5% of the maximum frequency, and the voltage is reduced to 70% of the maximum voltage.
- The power supply device 1304 can switch the power supply mode between the normal power mode and the power saving mode.
- In the normal power mode, the power supply device 1304 generates a clock 1307 having a set frequency (for example, the maximum frequency) with its clock generation circuit and supplies the generated clock to the processor 1301.
- The power supply device 1304 also supplies power 1306 having a set voltage (for example, the maximum voltage) to the processor 1301.
- When the power supply device 1304 receives the power control information 1731 from the first power control unit 1703, it enters the power saving mode and supplies the power 1306 and the clock 1307 with the voltage and the clock frequency reduced from the set values, as in the above example. Note that a plurality of types of power saving modes may be provided, and the degree of power saving may differ among them.
- In step S2305, the first power control unit 1703 outputs a loop escape detection execution instruction 1713 to the loop escape detection unit 1704.
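- To give a rough sense of what the example figures above (12.5% of the maximum frequency, 70% of the maximum voltage) could mean, the following sketch applies the common first-order approximation that the dynamic power of CMOS logic scales with the clock frequency times the square of the supply voltage. This power model is an assumption brought in for illustration and is not stated in the source. Leakage power is ignored, and the function name `relative_dynamic_power` is hypothetical.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """First-order estimate: dynamic power is proportional to f * V**2.

    Both arguments are ratios relative to the normal power mode (1.0 = unchanged).
    """
    return freq_ratio * voltage_ratio ** 2


# Example figures from the text: clock at 12.5%, voltage at 70% of the maximum.
power_saving_ratio = relative_dynamic_power(0.125, 0.70)
print(f"estimated dynamic power relative to normal mode: {power_saving_ratio:.3%}")
```

- Under this assumed model the dynamic power during the busy wait would fall to roughly 6% of its normal value. The actual saving depends on the device and on static power, which the sketch does not model.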
- Second component: The loop escape detection unit 1704 (an example of a loop end detection unit) and the second power control unit 1705, which are the second components of the power saving control device, will now be described.
- The loop escape detection unit 1704 and the second power control unit 1705 respectively perform processing for detecting that the processor 1301 has exited the busy wait loop processing, and power control for terminating the power saving control.
- (2-1) Loop escape detection unit: The operation of the loop escape detection unit 1704 will be described with reference to the flowchart of FIG.
- In step S2401, the loop escape detection unit 1704 waits until an execution instruction 1713 is received from the first power control unit 1703.
- Upon receipt of the execution instruction 1713, the loop escape detection unit 1704 acquires the loop start address 1911 and the loop end address 1912 from the loop range storage unit 1900 in step S2403.
- The start address 1911 and the end address 1912 are stored in a memory included in the loop escape detection unit 1704. Note that the start address 1911 and the end address 1912 acquired by the loop detection unit 1701 may be held in the memory.
- In step S2405, the loop escape detection unit 1704 determines whether a branch is taken, based on the execution result of the branch instruction (and the value of the condition flag register (CFR)). This determination is the same as the processing performed by the loop detection unit 1701 in step S1801.
- The loop escape detection unit 1704 includes a branch instruction storage unit and an instruction comparison unit. The branch instruction storage unit and the instruction comparison unit may be shared by the loop detection unit 1701 and the loop escape detection unit 1704.
- If the determination result of step S2405 is YES (the branch is taken), the process proceeds to step S2407. If the determination result of step S2405 is NO (the branch is not taken), the process proceeds to step S2408.
- In step S2407, to handle the case where the loop includes a conditional branch instruction in addition to the branch instruction located at the end of the loop, the loop escape detection unit 1704 determines whether the branch jumps outside the loop range. Specifically, it determines whether the jump destination address of the branch taken in step S2405 is outside the range between the loop start address 1911 and the loop end address 1912 acquired from the loop range storage unit 1900. That is, when the branch destination address is smaller than the start address 1911 or larger than the end address 1912, it is determined that the branch destination is outside the loop range.
- If the determination result in step S2407 is YES, the loop process has ended, and the process proceeds to step S2409 to end the power saving control. If the determination result is NO, the loop process continues, and the process returns to step S2405 to continue the detection process.
- In step S2408, the loop escape detection unit 1704 determines whether the conditional branch instruction located at the end of the loop has failed to branch. Specifically, it determines whether the address of the conditional branch instruction is the same as the end address 1912 of the loop. If the determination result in step S2408 is YES, the loop process has ended, and the process proceeds to step S2409. If the determination result of step S2408 is NO, the loop process continues, and the process returns to step S2405.
- In step S2409, the loop escape detection unit 1704 outputs an execution instruction 1714 to the second power control unit 1705 in order to end the power saving control.
- The execution instruction 1714 is issued, for example, by setting the voltage of the signal line connecting the loop escape detection unit 1704 and the second power control unit 1705 to a high level.
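- The decision made in steps S2405 to S2408 can be summarized in a short Python sketch. The `BranchEvent` type, the function name `loop_exited`, and the example addresses are hypothetical. The sketch only mirrors the comparisons described above and does not model the retirement unit or the signal lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchEvent:
    address: int   # address of the executed branch instruction
    taken: bool    # result of step S2405: was the branch taken?
    target: int    # jump destination address (meaningful only when taken)


def loop_exited(event: BranchEvent, start_address: int, end_address: int) -> bool:
    """Return True when the branch event indicates that the loop has ended."""
    if event.taken:
        # Step S2407: a taken branch leaves the loop if it jumps outside the range.
        return event.target < start_address or event.target > end_address
    # Step S2408: a not-taken branch ends the loop if it is the branch
    # located at the end address of the loop.
    return event.address == end_address


# Hypothetical loop range used only for this example.
START, END = 0x2004, 0x2010

back_edge = BranchEvent(address=0x2010, taken=True, target=0x2004)   # loop continues
fall_through = BranchEvent(address=0x2010, taken=False, target=0)    # loop ends
jump_out = BranchEvent(address=0x2008, taken=True, target=0x3000)    # loop ends

results = [loop_exited(e, START, END) for e in (back_edge, fall_through, jump_out)]
```

- Each event for which `loop_exited` returns True corresponds to the point at which the execution instruction 1714 would be output in step S2409.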
- In the above description, the loop escape detection unit 1704 is arranged in the retirement unit 1360 and detects whether the loop processing has ended based on the execution result of the branch instruction.
- However, if another method or apparatus can detect the end of the loop processing, that method or apparatus may be used.
- One such method detects the end of loop processing when the value of the program counter 1311 falls outside the loop range. In this case, in order to improve the stability of the loop end detection process, it may be determined that the loop process has ended when, for example, the value of the program counter 1311 becomes larger than a value obtained by adding a predetermined value to the end address of the loop.
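- The program counter based alternative can be sketched as follows. The function name `pc_left_loop` and the `margin` parameter (standing in for the predetermined value) are hypothetical. Treating a program counter value below the start address as an exit without any margin is an assumption of this sketch, since the text only describes the margin on the end address side.

```python
def pc_left_loop(pc: int, start_address: int, end_address: int, margin: int = 0) -> bool:
    """Return True when the program counter value is regarded as outside the loop.

    margin is the predetermined value added to the end address so that a value
    slightly beyond the loop end is not immediately treated as an exit.
    """
    if margin < 0:
        raise ValueError("margin must not be negative")
    return pc < start_address or pc > end_address + margin


# Hypothetical loop range and margin used only for this example.
inside = pc_left_loop(0x2008, 0x2004, 0x2010, margin=0x10)
just_past_end = pc_left_loop(0x2014, 0x2004, 0x2010, margin=0x10)
far_away = pc_left_loop(0x3000, 0x2004, 0x2010, margin=0x10)
```

- With a margin, a program counter value a little past the end address is not yet reported as an exit, which is one way the added predetermined value could stabilize the detection.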
- In step S2501, the second power control unit 1705 waits until it receives an execution instruction 1714 from the loop escape detection unit 1704.
- Upon receiving the execution instruction 1714, the second power control unit 1705 performs control to end the power saving control and return to the normal power control. Specifically, the second power control unit 1705 transmits, to the power supply device 1304, power control information 1751 for returning the power supply mode of the power supply device 1304 from the power saving mode to the normal power mode. As a result, the power supply device 1304 increases, for example, the frequency of the clock 1307 and the voltage of the power 1306 supplied to the processor 1301, which had been reduced by the first power control unit 1703 (for example, returns them to their original values).
- The operations of the loop detection unit 1701, the loop propagation dependency analyzing unit 1702, the first power control unit 1703, the loop escape detection unit 1704, and the second power control unit 1705, which constitute the power saving control device according to Embodiment 1 of the present invention, have been described above.
- The first power control unit 1703 and the second power control unit 1705 constitute the “power control unit”.
- Alternatively, the “power control unit” may be configured by the first power control unit 1703 alone, without including the second power control unit 1705.
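- The interplay of the execution instructions 1712, 1713, and 1714 can be pictured as a small state machine. The following Python sketch is only an illustration under stated assumptions: the class `PowerControlModel`, its method names, and the `PowerMode` enumeration are hypothetical, and the real units are described as hardware that signals over dedicated lines.

```python
from enum import Enum


class PowerMode(Enum):
    NORMAL = "normal power mode"
    POWER_SAVING = "power saving mode"


class PowerControlModel:
    """Toy model of the first and second power control units."""

    def __init__(self):
        self.mode = PowerMode.NORMAL
        self.escape_detection_running = False

    def on_instruction_1712(self):
        # First power control unit: shift to the power saving mode, then start
        # loop escape detection (execution instruction 1713).
        self.mode = PowerMode.POWER_SAVING
        self.escape_detection_running = True

    def on_instruction_1714(self):
        # Second power control unit: return to the normal power mode.
        self.mode = PowerMode.NORMAL
        self.escape_detection_running = False


model = PowerControlModel()
history = [model.mode]
model.on_instruction_1712()   # busy wait loop detected
history.append(model.mode)
model.on_instruction_1714()   # loop escape detected
history.append(model.mode)
```

- The recorded `history` moves from the normal power mode to the power saving mode and back, matching the sequence of a busy wait loop being detected and then exited.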
- the loop propagation dependency analyzing unit 1702 can analyze the presence / absence of loop propagation dependency using, for example, digital electronic circuits shown in FIGS. 16 and 17.
- FIG. 16 is a diagram showing a flow dependence detection circuit 2851 that detects flow dependence.
- FIG. 17 is a diagram showing a preliminary detection circuit 2853 that preliminarily detects both flow dependency and loop propagation dependency in order to detect loop propagation dependency.
- In this example, four instructions (instructions 1 to 4) are included in the loop.
- A circuit may also be configured to detect the dependency relationship among a number of instructions other than four, for example, among five or more instructions.
- The flow dependency detection circuit 2851 includes an instruction buffer 2855 that stores the instruction sequence included in the loop, and comparison circuits 2857 that compare the values of the DST register (destination register) 2202 and the SRC register (source register) 2203 between the instructions stored in the instruction buffer 2855.
- The uppermost instruction is the head of the loop.
- The instruction buffer 2855 stores the instruction sequence within the loop range, obtained from the instruction string holding unit 1340. Note that the instruction string holding unit 1340 may be used as the instruction buffer 2855.
- the comparison circuit 2857 outputs “1” when the values of the DST register 2202 and the SRC register 2203 are equal, and outputs “0” when the values are different. That is, the comparison circuit 2857 outputs “1” when there is a dependency relationship.
- One SRC register 2203 may be compared with a plurality of DST registers 2202 (for example, the SRC register 2203 of instruction 4). In this case, if even one of the plurality of comparison circuits 2857 outputs “1”, it is determined that there is flow dependency (“1”). On the other hand, if all of the plurality of comparison circuits 2857 output “0”, it is determined that there is no flow dependency (“0”).
- The determination results for the SRC registers 2203a and 2203b of the n-th instruction are denoted Sna and Snb, respectively.
- For example, the determination results for the SRC registers 2203a and 2203b of instruction 2 are S2a and S2b, respectively. Since instruction 1 has no flow dependency, its determination result is not shown.
- The preliminary detection circuit 2853 shown in FIG. 17 includes three instruction buffers 2861a, 2861b, and 2861c that store instruction sequences included in the loop, and comparison circuits 2857 that compare the values of the DST register 2202 and the SRC register 2203 between the two instruction buffers 2861a and 2861b.
- The instruction buffer 2861c stores the instruction sequence within the loop range when the i-th iteration of the loop is executed. Then, when the (i+1)-th iteration is executed, the stored instruction sequence is moved to the instruction buffer 2861a. As a result, when the (i+1)-th iteration is executed, the instruction sequence of the loop is stored in the two instruction buffers 2861a and 2861b.
- The comparison circuit 2857 is the same as the comparison circuit of the flow dependency detection circuit 2851. However, in the preliminary detection circuit 2853, the combination of connections between the DST register 2202 and the SRC register 2203 differs from that of the flow dependency detection circuit 2851. Specifically, the DST registers 2202 of the instruction buffer 2861a, which stores the instructions of the preceding iteration, and the SRC registers 2203 of the instruction buffer 2861b, which stores the instructions of the subsequent iteration, are compared in all combinations.
- The determination results for the SRC registers 2203a and 2203b of the instruction buffer 2861b are denoted Rna and Rnb, in the same manner as in the flow dependency detection circuit 2851.
- When any of the SRC registers 2203 is determined to be dependent by the preliminary detection circuit 2853 and determined not to be dependent by the flow dependency detection circuit 2851, it is determined that there is loop propagation dependency. That is, loop propagation dependency is detected.
- For example, the preliminary detection circuit 2853 determines that the SRC register 2203 of entry number 5 depends on the DST register 2202 of entry number 1. Further, since the SRC register 2203 of entry number 5 has no flow dependency within the loop, the flow dependency detection circuit 2851 does not detect a dependency. Therefore, for the SRC register 2203 of entry number 5, the determination result R1a of the preliminary detection circuit 2853 is “1”, and the determination result S1a of the flow dependency detection circuit 2851 is “0”. In such a case, loop propagation dependency is detected. In FIG. 16, as described above, the determination result S1a of instruction 1 is omitted, but since no flow dependency arises for instruction 1, the determination result S1a of instruction 1 is “0”.
- the loop propagation dependency analyzing unit 1702 can be configured by combining the preliminary detection circuit 2853 and the flow dependency detection circuit 2851.
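- The way the two circuits combine can be modelled in software. The following Python sketch is an illustrative model, not the circuit itself. Instructions are represented as hypothetical `(dst, srcs)` tuples, the function names are assumptions, and the busy wait loop encoding is assumed rather than taken from the figures. The flow results S compare each SRC register with the DST registers of earlier instructions in the same iteration, the preliminary results R compare it with every DST register of the preceding iteration, and loop propagation dependency is reported where R is 1 and S is 0.

```python
def flow_results(body):
    """S values: 1 if a SRC register matches the DST of an earlier instruction."""
    results = []
    for n, (_, srcs) in enumerate(body):
        earlier_dsts = {dst for dst, _ in body[:n] if dst is not None}
        results.append([int(src in earlier_dsts) for src in srcs])
    return results


def preliminary_results(body):
    """R values: 1 if a SRC register matches any DST of the preceding iteration."""
    all_dsts = {dst for dst, _ in body if dst is not None}
    return [[int(src in all_dsts) for src in srcs] for _, srcs in body]


def loop_propagation_detected(body):
    """Detected when some SRC register has R == 1 while S == 0."""
    s_rows = flow_results(body)
    r_rows = preliminary_results(body)
    return any(
        r == 1 and s == 0
        for r_row, s_row in zip(r_rows, s_rows)
        for r, s in zip(r_row, s_row)
    )


# (dst, srcs) per instruction; None means no destination register.
scan_loop = [("R2", ("R2",)), ("R0", ("R2",)), (None, ("R0", "R1")), (None, ())]
busy_wait_loop = [("R0", ("R2",)), (None, ("R0", "R1")), (None, ())]

scan_detected = loop_propagation_detected(scan_loop)
busy_wait_detected = loop_propagation_detected(busy_wait_loop)
```

- For the first instruction of `scan_loop` the model yields R = 1 and S = 0, which corresponds to the R1a and S1a values discussed above. The assumed busy wait loop produces no such pair.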
- The preliminary detection circuit 2853 of FIG. 17 includes three instruction buffers 2861a, 2861b, and 2861c, but a single instruction buffer may be used instead.
- For the instructions stored in the instruction buffer 2861a, only the DST register 2202 is connected to the comparison circuit 2857, and the SRC register 2203 is not connected to the comparison circuit 2857.
- For the instructions stored in the instruction buffer 2861b, only the SRC register 2203 is connected to the comparison circuit 2857, and the DST register 2202 is not connected to the comparison circuit 2857. Therefore, for example, only the instructions stored in the instruction buffer 2861b may be used, as long as all combinations of the DST register 2202 and the SRC registers 2203a and 2203b can be compared.
- FIG. 18 shows a simple loop propagation dependency detection circuit 2871 used for simply detecting loop propagation dependency.
- The simple loop propagation dependency detection circuit 2871 includes an instruction buffer 2855 and comparison circuits 2857, as does the flow dependency detection circuit 2851. Each comparison circuit 2857 is connected to the DST register 2202 and an SRC register 2203 of one and the same instruction. That is, the simple loop propagation dependency detection circuit 2871 determines whether the DST register 2202 and the SRC register 2203 are the same within one instruction, and outputs the determination results Qna and Qnb (n is 1 to 4 in the figure).
- The simple loop propagation dependency detection circuit 2871 detects loop propagation dependency in a simplified manner, based on the presence of an operation that increments (or decrements) the value of a register serving as a variable, as found in a general calculation loop.
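- The simplified check can likewise be modelled in a few lines of Python. This is a sketch under assumptions: the `(dst, srcs)` tuple representation and the function names are hypothetical, and the example loops are the ones assumed for illustration rather than taken from the figures. Each Q value is 1 when an instruction uses the same register as both DST and SRC, as an increment such as adding 4 to R2 does.

```python
def simple_results(body):
    """Q values: 1 if an instruction's SRC register equals its own DST register."""
    return [
        [int(dst is not None and src == dst) for src in srcs]
        for dst, srcs in body
    ]


def simple_loop_propagation_detected(body):
    """Report a dependency when any instruction updates a register in place."""
    return any(q == 1 for row in simple_results(body) for q in row)


# (dst, srcs) per instruction; None means no destination register.
scan_loop = [("R2", ("R2",)), ("R0", ("R2",)), (None, ("R0", "R1")), (None, ())]
busy_wait_loop = [("R0", ("R2",)), (None, ("R0", "R1")), (None, ())]

scan_q = simple_results(scan_loop)
busy_wait_flagged = simple_loop_propagation_detected(busy_wait_loop)
```

- This check is cheaper than the combined circuits because it needs no comparison across instructions, but it only recognizes dependencies that arise from an in-place update within a single instruction.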
- the loop propagation dependency analyzing unit 1702 may be configured by a processor that executes a program that performs the processing illustrated in FIG.
- the processor that executes the program may be a simple processor such as a microprogram sequencer that executes a microprogram.
- a simple processor can be formed in the processor 1301.
- The processor that executes the program may be a separate processor formed in the same integrated circuit as the processor 1301, or may be the processor 1301 itself. Note that loop processing executed by a processor that is separate from the processor 1301 and executes the program can be excluded from detection and dependency analysis.
- Each component of the power saving control device other than the loop propagation dependency analyzing unit 1702 (the loop detection unit 1701 and the like) may be configured by an electronic circuit, or by a processor that executes a program performing the processing of that component.
- the power saving control and the control for terminating the power saving control and returning to the normal power control may be as follows, for example, in addition to the above-described one.
- the power saving control is performed by reducing the frequency of the clock 1307 supplied to the processor 1301 or reducing the voltage of the power 1306 supplied to the processor 1301.
- the control to end the power saving control and return to the normal power control is performed by increasing the frequency of the clock 1307 supplied to the processor 1301 or increasing the voltage of the power 1306 supplied to the processor 1301.
- a processor 1301, a main memory 1302, an I / O device group 1303, and a power supply device 1304 are connected to each other via a bus 1305, as in FIG.
- power 1306 and a clock 1307 are supplied from the power supply device 1304 to the processor 1301.
- FIG. 19 specifically shows an example of devices that can become the I / O device group 1303.
- The antenna 2810 is a device that receives broadcast waves and radio waves of a mobile telephone network.
- the tuner 2811 converts the analog radio wave received by the antenna 2810 into a digital signal.
- the decoder 2812 decodes the digital signal converted by the tuner.
- codecs that the decoder 2812 decodes include MPEG2, MPEG4-AVC, MPEG4-MVC, and the like.
- The OSD generator 2813 combines the video data decoded by the decoder 2812 and the screen data generated by the processor 1301 into a single set of video data.
- the video display 2814 displays the video data synthesized by the OSD generator 2813 on the screen.
- Examples of the video display 2814 include a liquid crystal display, a plasma display, an organic EL display, and an LED display.
- the speaker 2816 performs audio output of the audio data decoded by the decoder 2812.
- the semiconductor memory read / write device 2821, HDD 2822 (hard disk storage device), and optical disc read / write device 2823 can be used as storage devices.
- the network communication device 2825 receives a network signal from the outside of the computer system 2800.
- Examples of the network communication device 2825 include an Ethernet adapter and a wireless LAN adapter.
- The remote control receiver 2826 receives an infrared control signal from the outside.
- Which of the devices described here as candidates for the I/O device group 1303 are mounted on the computer system 2800 is arbitrary. Further, a device different from the devices described here may be mounted. For example, input devices such as a keyboard, a mouse, and a touch panel can also belong to the I/O device group 1303.
- A computer system 2800 equipped with some of these I/O devices may be built as a personal computer, mainframe, TV, VCR, HDD recorder, mobile phone, car navigation system, landline phone, copy machine, network relay device, mobile terminal with touch panel, or the like.
- control targets of the first power control unit 1703 and the second power control unit 1705 are not limited to the voltage of the power 1306 and the frequency of the clock 1307.
- the control target of the first power control unit 1703 and the second power control unit 1705 may be anything that controls the power supplied to the processor 1301 or the computer systems 1300 and 2800.
- the computer system 2900 according to the second embodiment differs from the computer system 1300 according to the first embodiment in the following points.
- each register group 2910 includes a thread identifier register 2912 and a time slice register 2913.
- the computer system 2900 includes a thread switching unit 2920 that switches threads operating in the processor 2901.
- The first power control unit and the second power control unit have a function of outputting time slice setting information to the register group 2910 and a function of controlling the number of execution units 1350.
- FIG. 21 shows first and second register groups 2910a and 2910b corresponding to the first and second threads, respectively.
- Each register group 2910 includes a thread identifier register 2912 and a time slice register 2913 in addition to the components of the register group 1310 described with reference to FIG.
- The thread identifier register 2912 is a register that stores a thread identifier for identifying a specific thread from among a plurality of existing threads. For example, “1” is written in the thread identifier register 2912a belonging to the first register group 2910a corresponding to the first thread, and “2” is written in the thread identifier register 2912b belonging to the second register group 2910b corresponding to the second thread.
- The time slice register 2913 stores the time allocated to each thread, that is, a time slice. For example, when time slices are assigned at a rate of 100 clock cycles to the first thread and 100 clock cycles to the second thread, “100” is stored in the time slice register 2913a of the first register group 2910a corresponding to the first thread. Further, “100” is stored in the time slice register 2913b of the second register group 2910b corresponding to the second thread.
- the thread switching unit 2920 assigns the execution time of the processor 2901 to each thread in a time-sharing manner according to the value of the time slice register 2913 of each thread.
- the thread switching unit 2920 will be specifically described.
- the thread switching unit 2920 is connected to the register group 2910 and the instruction fetch / decode unit 1320 directly or via a bus. Then, the thread switching unit 2920 reads the value of the program counter 1311 of the register group 2910 corresponding to the currently executed thread and transfers it to the instruction fetch / decode unit 1320. The thread switching unit 2920 also transfers the thread identifier together with the value of the program counter 1311.
- the thread switching unit 2920 transfers the value “1” of the thread identifier and the value of the program counter 1311a of the first register group 2910a to the instruction fetch / decode unit 1320. Thereby, the instruction of the program that executes the first thread is fetched / decoded.
- The information in the thread identifier register 2912, that is, the value of the thread identifier (for example, “1”), is attached to the instruction sequence fetched by the instruction fetch/decode unit 1320.
- timing of thread switching is notified by a timer / counter (not shown) provided outside the processor 2901. This will be specifically described below.
- the time slice stored in the time slice register 2913a is set in the timer / counter, and the timer / counter is started.
- the timer / counter notifies the thread switching unit 2920 when the set time slice has elapsed.
- The thread switching unit 2920 then transfers, to the instruction fetch/decode unit 1320, the thread identifier value and the program counter value of the register group corresponding to the next thread, for example, the thread identifier value “2” and the value of the program counter 1311b of the second register group 2910b.
- the address of the instruction fetched by the instruction fetch / decode unit 1320 is changed to the address of the instruction of the program executing the second thread.
- the instruction of the program executing the second thread is fetched / decoded, and the operating thread is switched.
- the timer / counter is reset after the set time has elapsed, and is started after the time slice of the thread to be executed next, for example, the second thread is set.
- a timer / counter may be provided in the processor 2901.
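- The time-sharing behavior described above can be sketched in Python as a simple round-robin over register groups. The `RegisterGroup` type, the function name `run_threads`, and the cycle counts are hypothetical. The sketch only models which thread runs for how many clock cycles, not instruction fetch or the timer hardware.

```python
from dataclasses import dataclass


@dataclass
class RegisterGroup:
    thread_id: int    # value of the thread identifier register
    time_slice: int   # value of the time slice register, in clock cycles


def run_threads(groups, total_cycles):
    """Return a list of (thread_id, cycles) in the order the threads run.

    Each thread runs until its time slice elapses, then the next register
    group is selected, wrapping around after the last one.
    """
    if not groups or any(g.time_slice <= 0 for g in groups):
        raise ValueError("at least one group with a positive time slice is required")
    timeline = []
    elapsed = 0
    index = 0
    while elapsed < total_cycles:
        group = groups[index]
        # The timer/counter is loaded with the time slice of the running thread.
        cycles = min(group.time_slice, total_cycles - elapsed)
        timeline.append((group.thread_id, cycles))
        elapsed += cycles
        index = (index + 1) % len(groups)  # switch to the next thread
    return timeline


# Two threads with 100-cycle time slices, as in the example in the text.
groups = [RegisterGroup(thread_id=1, time_slice=100),
          RegisterGroup(thread_id=2, time_slice=100)]
timeline = run_threads(groups, total_cycles=250)
```

- Changing the `time_slice` value of one register group changes the share of processor time that thread receives, which is the lever the time slice setting information could act on.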
- the time slice register 2913 and the thread switching unit 2920 constitute a “thread management unit”.
- the “thread management unit” may include a thread identifier register 2912.
- the issue unit 1330 reads the value of the register group 2910 corresponding to the thread identifier assigned to the instruction sequence from the plurality of register groups 2910 when issuing instructions such as operations.
- the retirement unit 1360 writes the execution result of the execution unit 1350 back to the register group 2910 corresponding to the thread identifier assigned to the instruction sequence from the plurality of register groups 2910.
- the loop range storage unit 1900 stores the thread identifier in association with the loop range.
- a loop range and a thread identifier can be stored for a plurality of threads.
- When the loop detection unit 1701 detects a loop process, it writes the thread identifier together with the loop range in the loop range storage unit 1900.
- the loop propagation dependency analyzing unit 1702 acquires the thread identifier of the thread in which the loop processing is detected together with the loop range from the loop range storage unit 1900. Then, the loop propagation dependency analyzing unit 1702 performs a loop propagation dependency analyzing process on the instruction string in the loop range of the target thread. If there is no loop propagation dependency, the loop propagation dependency analyzing unit 1702 outputs an execution instruction 1712 to the first power control unit 3013. Here, the loop propagation dependency analyzing unit 1702 gives information for identifying a thread to the execution instruction 1712.
- For example, a plurality of signal lines connecting the loop propagation dependency analyzing unit 1702 and the first power control unit 3013 correspond to the plurality of threads, respectively, and the information identifying the thread is given by raising the voltage on the signal line corresponding to the thread that was subjected to the analysis processing.
- the loop escape detection unit 1704 receives the execution instruction 1713 together with the thread identifier from the first power control unit 3013. Then, the loop escape detection unit 1704 acquires the loop range corresponding to the thread identifier from the loop range storage unit 1900. Then, the end of the loop process is detected based on the execution result of the branch instruction of the target thread.
- the loop escape detection unit 1704 can detect the end of loop processing for a plurality of threads. When the end of the loop processing is detected for any thread, the thread identifier is sent to the second power control unit 3015 together with the execution instruction 1714.
- the first power control unit 3013 and the second power control unit 3015 are provided in the retirement unit 1360. Thus, writing to the register group 2910 can be easily performed using the function of the retirement unit 1360.
- The first power control unit 3013 has a function of outputting time slice setting information 3103 to the register group 2910 and a function of outputting an operation number reduction instruction 3107 to the execution unit 1350.
- In addition to the function of outputting power control information 1751 to the power supply device 1304, the second power control unit 3015 has a function of outputting time slice setting information 3105 to the register group 2910 and a function of outputting an operation number increase instruction 3109 to the execution unit 1350.
- In step S3201, the first power control unit 3013 determines whether or not the execution instruction 1712 has been received from the loop propagation dependency analyzing unit 1702. If the determination result is YES, the process proceeds to step S3203. If the determination result is NO, the determination in step S3201 is executed again. As described above, the execution instruction 1712 includes information for identifying a thread.
- In step S3203, the first power control unit 3013 outputs the time slice setting information 3103 to the register group 2910 corresponding to the thread specified by the execution instruction 1712 (the register group 2910 corresponding to the currently operating thread), so that the time slice of that thread after the change is smaller than the time slice before the change.
- For example, when the thread identifier is "2", the first power control unit 3013 changes the value of the time slice register 2913b of the second register group 2910b, whose thread identifier register 2912 is "2", from "100" to "50".
- In step S3205, the first power control unit 3013 outputs the power control information 1731 to the power supply device 1304. For example, an instruction is given to reduce the frequency of the clock 1307 to three quarters.
- In step S3207, the first power control unit 3013 outputs an execution instruction 1713, together with the designated thread identifier, to the loop escape detection unit 1704.
- FIG. 23 shows a flowchart of the operation of the second power control unit 3015.
- In step S3301, the second power control unit 3015 determines whether or not the execution instruction 1714 has been received from the loop escape detection unit 1704. If the determination result is YES, the process proceeds to step S3303. If the determination result is NO, step S3301 is executed again. As described above, the execution instruction 1714 includes a thread identifier.
- In step S3303, the second power control unit 3015 outputs the time slice setting information 3105 to the register group 2910 corresponding to the thread indicated by the thread identifier (the register group 2910 corresponding to the currently operating thread), so that the time slice of that thread after the change is larger than the time slice before the change (for example, so that it returns to the time slice before it was decreased by the first power control unit 3013).
- For example, when the thread identifier is "2", the second power control unit 3015 returns the value of the time slice register 2913b of the second register group 2910b, whose thread identifier register 2912 is "2" among the plurality of register groups 2910, from "50" to "100".
- In step S3305, the second power control unit 3015 outputs the power control information 1751 to the power supply device 1304.
- the power supply device 1304 increases (restores) the frequency of the clock 1307 supplied to the processor 2901 that has been reduced by the first power control unit 3013.
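As an illustration only, the two flows above (steps S3203 to S3207 and steps S3303 to S3305) can be sketched in Python. The class and method names are hypothetical, and halving the time slice is an assumption that merely reproduces the "100" to "50" example; the description only requires that the time slice become smaller.

```python
from dataclasses import dataclass, field


@dataclass
class PowerSavingController:
    """Hypothetical model of the first and second power control units."""
    time_slice: dict                  # thread id -> time slice register value
    clock_scale: float = 1.0          # clock 1307 relative to its normal frequency
    watched: list = field(default_factory=list)  # threads under loop escape detection
    saved: dict = field(default_factory=dict)    # time slices before the reduction

    def enter_busy_wait(self, thread_id: int) -> None:
        """First power control unit: steps S3203 to S3207."""
        self.saved[thread_id] = self.time_slice[thread_id]
        self.time_slice[thread_id] //= 2   # S3203 (assumed halving, e.g. 100 -> 50)
        self.clock_scale = 0.75            # S3205: clock reduced to three quarters
        self.watched.append(thread_id)     # S3207: hand over to loop escape detection

    def leave_busy_wait(self, thread_id: int) -> None:
        """Second power control unit: steps S3303 and S3305."""
        self.time_slice[thread_id] = self.saved.pop(thread_id)  # S3303: restore
        self.clock_scale = 1.0                                   # S3305: restore clock
        self.watched.remove(thread_id)


ctl = PowerSavingController(time_slice={1: 100, 2: 100})
ctl.enter_busy_wait(2)
print(ctl.time_slice, ctl.clock_scale)
ctl.leave_busy_wait(2)
print(ctl.time_slice, ctl.clock_scale)
```

The sketch keeps the previous time slice so that leaving the busy wait restores exactly the value that was in effect before the reduction.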
- Consider the processor resources (processor processing capacity) allocated to a busy-waiting thread (the second thread in the above example) and to a thread not in the busy wait state, that is, a thread performing normal processing (the first thread in the above example).
- The processor resource allocated to each thread can be generally expressed by the expression "(time slice of each thread ÷ total sum of time slices of all threads) × frequency of clock 1307".
- For both the first and second threads, the frequency of the clock 1307 during power saving control decreases to three-fourths of the frequency during normal power control.
- For the first thread, "time slice of each thread ÷ total sum of time slices of all threads" increases to 4/3 of its value during normal power control (from 100/200 to 100/150). For the second thread, it decreases to 2/3 of its value during normal power control (from 100/200 to 50/150).
- For the first thread, which is not in the busy wait state, the allocated processor resource relative to normal power control is the product of the clock frequency ratio (three-quarters) and the time slice allocation ratio (four-thirds), which is "1". That is, even if power saving control is performed, the processor resource allocated to the first thread does not decrease.
- the processor resources allocated to the second thread are halved, and waste of power is suppressed.
- Because the processor resource allocated to the thread in the busy wait state is selectively reduced, the power consumed by the processor 2901 in executing the busy wait, and hence the power consumed by the computer system 2900, can be reduced.
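The arithmetic above can be checked with a short Python sketch. The function name is hypothetical; the time slices (100 and 50) and the three-quarters clock ratio are the example values from the description.

```python
from fractions import Fraction


def processor_resource(slices, thread, clock_scale):
    """(time slice / sum of all time slices) x relative clock frequency."""
    return Fraction(slices[thread], sum(slices.values())) * clock_scale


normal = {1: 100, 2: 100}   # time slices during normal power control
saving = {1: 100, 2: 50}    # thread 2 is busy-waiting
clock = Fraction(3, 4)      # clock 1307 reduced to three quarters

for thread in (1, 2):
    before = processor_resource(normal, thread, Fraction(1))
    after = processor_resource(saving, thread, clock)
    print(thread, after / before)
```

The ratio is 1 for the first thread and 1/2 for the second thread, matching the statement that only the busy-waiting thread loses processor resource.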
- the above power saving control can be considered as follows.
- The processor resource corresponding to the decrease in the time slice allocated to the thread in the busy wait state is "(50 ÷ 200) × frequency f" using the above expression. That is, the decrease in the time slice corresponds to a quarter of the processor resource during normal power control.
- By reducing the frequency of the clock 1307 by a quarter to match this one-quarter of the processor resource, the power consumption of the processor 2901 can be reduced without reducing the processor resources of threads that are not in the busy wait state. Further, when a specific thread in the busy wait state leaves the busy wait state, restoring the frequency of the clock 1307 allows the processor resources allocated to the specific thread to be increased without decreasing the processor resources of other threads.
- the power saving control described above reduces the frequency of the clock 1307 according to the decrease in the time slice allocated to the busy-wait state thread.
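A minimal sketch of this relationship, with a hypothetical helper name: if the clock is scaled by the ratio of the new total of time slices to the old total, every thread whose time slice is unchanged keeps the same processor resource.

```python
from fractions import Fraction


def matching_clock_scale(slices_before, slices_after):
    """Clock ratio that keeps the resource of unchanged threads constant."""
    return Fraction(sum(slices_after.values()), sum(slices_before.values()))


scale = matching_clock_scale({1: 100, 2: 100}, {1: 100, 2: 50})
print(scale)      # relative clock frequency during power saving control
print(1 - scale)  # reduction, equal to the 50/200 share given up by thread 2
```

With the example values the scale is 3/4, so the reduction of 1/4 equals the time slice share released by the busy-waiting thread.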
- the power saving control may be performed by reducing the number of execution units 1350.
- In step S3401, the first power control unit 3013 determines whether or not the execution instruction 1712 has been received from the loop propagation dependency analyzing unit 1702. If the determination result is YES, the process proceeds to step S3403. If the determination result is NO, the process returns to step S3401. As described above, the execution instruction 1712 includes information for identifying a thread.
- In step S3403, the first power control unit 3013 outputs the time slice setting information 3103 to the register group 2910 corresponding to the thread specified by the execution instruction 1712 (the register group 2910 corresponding to the currently operating thread). For example, when the identifier of the currently executing thread is "2", the first power control unit 3013 changes the value of the time slice register 2913b of the second register group 2910b, whose thread identifier register 2912 is "2", from "100" to "50".
- In step S3405, the first power control unit 3013 outputs to the execution unit 1350 an operation number reduction instruction 3107 for reducing the number of execution units 1350 in operation. For example, when there are four execution units 1350, the instruction reduces the number of operating units to three.
- the first power control unit 3013 is connected to the issuing unit 1330 by a signal line. Then, the first power control unit 3013 instructs the issuing unit 1330 to stop issuing instructions to some of the plurality of execution units 1350. For example, the instruction transmits a signal to a signal line corresponding to the execution unit 1350 to be stopped among the signal lines corresponding to each of the plurality of execution units 1350. Thereby, the issuing of instructions to some execution units 1350 is stopped.
- an electronic switch composed of a transistor is provided in a power supply path that supplies power to each execution unit 1350.
- the first power control unit 3013 can cut off the power supply to the execution unit 1350 and reduce the number of operations by turning off the electronic switch corresponding to the execution unit 1350 to be stopped.
- the power supply to the execution unit 1350 is cut off after instructing the issue unit 1330 to stop issuing instructions.
- In step S3407, the first power control unit 3013 outputs an execution instruction 1713 to the loop escape detection unit 1704.
- In step S3501, the second power control unit 3015 determines whether or not the execution instruction 1714 has been received from the loop escape detection unit 1704. If the determination result is YES, the process proceeds to step S3503. If the determination result is NO, the process of step S3501 is executed again. As described above, the execution instruction 1714 includes a thread identifier.
- In step S3503, the second power control unit 3015 outputs the time slice setting information 3105 to the register group 2910 corresponding to the thread indicated by the thread identifier (the register group 2910 corresponding to the currently operating thread). For example, when the thread identifier is "2", the value of the time slice register 2913b of the second register group 2910b, whose thread identifier register 2912 is "2", is returned from "50" to "100".
- In step S3505, the second power control unit 3015 outputs to the execution unit 1350 an operation number increase instruction 3109 for increasing the number of execution units 1350 in operation.
- For example, the execution unit 1350 is instructed to return the number of operating units to four, which is the number before the first power control unit 3013 reduced it.
- The operation number increase instruction 3109 is carried out in the reverse order of the operation number reduction instruction 3107 described above. That is, after the electronic switch is turned on, the issuing unit 1330 is instructed to resume issuing instructions.
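The ordering constraint described above (stop issuing before cutting power, restore power before resuming issue) can be sketched as follows. The class is a hypothetical software model written for illustration; in the description the mechanism is a signal line to the issuing unit 1330 and a transistor switch in the power supply path.

```python
class ExecutionUnitGate:
    """Hypothetical model: per-unit issue enable and power switch state."""

    def __init__(self, n_units: int) -> None:
        self.issue_enabled = [True] * n_units
        self.powered = [True] * n_units
        self.log = []  # records the order of actions for inspection

    def stop(self, unit: int) -> None:
        # Reduction: first stop issuing instructions, then cut the power.
        self.issue_enabled[unit] = False
        self.log.append(("stop_issue", unit))
        self.powered[unit] = False
        self.log.append(("power_off", unit))

    def resume(self, unit: int) -> None:
        # Increase: reverse order, power on first, then resume issuing.
        self.powered[unit] = True
        self.log.append(("power_on", unit))
        self.issue_enabled[unit] = True
        self.log.append(("resume_issue", unit))

    def running(self) -> int:
        """Number of units that are both powered and receiving instructions."""
        return sum(p and i for p, i in zip(self.powered, self.issue_enabled))


gate = ExecutionUnitGate(4)
gate.stop(3)
print(gate.running())
gate.resume(3)
print(gate.running())
```

Keeping the two orders mirrored ensures that no instruction is ever issued to a unit that is not powered.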
- Consider the processor resources (processor processing capacity) allocated to a thread that is executing a busy wait (the second thread in the above example) and to a thread that is not executing a busy wait, that is, a thread performing normal processing (the first thread in the above example).
- The processor resource allocated to each thread is generally "(time slice of each thread ÷ total sum of time slices of all threads) × number of execution units 1350 in operation".
- For the first thread, the number of execution units 1350 in operation during power saving control is reduced to three-fourths of the number during normal power control, but "time slice of each thread ÷ total sum of time slices of all threads" increases to 4/3 of its value during normal power control (from 100/200 to 100/150), so the product of the two is "1". For the second thread, on the other hand, the product of the two is one-half.
- That is, the processor resource allocated to the thread that is not executing a busy loop, namely the first thread, does not decrease even if the power saving control is executed.
- Because the processing capacity allocated to the thread that is executing the busy loop is selectively reduced, the power consumed by the processor 2901 in executing the busy wait, and hence the power consumed by the computer system 2900, can be reduced.
- the above power saving control can be considered as follows.
- The processor resource corresponding to the decrease in the time slice allocated to the busy-waiting thread is "(50 ÷ 200) × number of execution units in operation" using the above expression. That is, the decrease in the time slice corresponds to a quarter of the processor resource during normal power control.
- By reducing the number of execution units 1350 in operation by a quarter to match this one-quarter of the processor resource, the power consumption of the processor 2901 can be reduced without reducing the processor resources of threads that are not in the busy wait state. Further, when a specific thread in the busy wait state leaves the busy wait state, increasing the number of execution units 1350 in operation allows the processor resources allocated to the specific thread to be increased without decreasing the processor resources of other threads.
- the power saving control described above reduces the number of execution units to be operated in accordance with the decrease in the time slice allocated to the busy wait state thread.
- The processor resource (processing capacity) allocated to each thread is "(time slice of each thread ÷ total sum of time slices of all threads) × frequency of clock 1307 × number of execution units 1350 in operation".
- At least one of the frequency of the clock 1307 and the number of execution units 1350 in operation can be reduced according to the reduction in the time slice allocated to the thread in the busy wait state.
- processor resource of a thread that is not executing a busy loop during power saving control may be increased. Even in this case, the power consumed by the execution of the busy loop can be reduced.
- The power saving control described in power saving control 1 and power saving control 2, and the control for ending the power saving control and returning to normal power control, may instead be performed, for example, as follows.
- Power saving control is performed by reducing the voltage of the power 1306 supplied to the processor 2901, or by reducing both that voltage and the frequency of the clock 1307. The control to end the power saving control and return to normal power control is then performed by increasing the voltage of the power 1306 supplied to the processor 2901, or by increasing both that voltage and the frequency of the clock 1307.
- Alternatively, power saving control is performed by reducing the number of execution units 1350 in operation and the voltage of the power 1306 supplied to the processor 2901, or by reducing the number of execution units 1350 in operation, that voltage, and the frequency of the clock 1307. The control to end the power saving control and return to normal power control is then performed by increasing the number of execution units 1350 in operation and the voltage of the power 1306 supplied to the processor 2901, or by increasing the number of execution units 1350 in operation, that voltage, and the frequency of the clock 1307.
- The control targets of the first power control unit 3013 and the second power control unit 3015 are not limited to the voltage of the power 1306, the frequency of the clock 1307, and the number of execution units 1350 in operation.
- the control target of the first power control unit 3013 and the second power control unit 3015 may be anything that controls the power supplied to the processor 2901 or the computer system 2900.
- the computer system 4000 according to the third embodiment differs from the computer system 1300 according to the first embodiment in the following points.
- the computer system 4000 includes a program counter monitoring unit 4100 instead of the loop detection unit 1701 and the loop escape detection unit 1704.
- the computer system 4000 includes a bus monitoring unit 4200 instead of the loop propagation dependency analyzing unit 1702.
- The processor 4020 in the integrated circuit 4010 has a configuration in which the power saving control device (the loop detection unit 1701, the loop propagation dependency analyzing unit 1702, the first power control unit 1703, the loop escape detection unit 1704, and the second power control unit 1705) is removed from the processor 1301 of the first embodiment.
- The issuing unit 1330A has a configuration in which the loop propagation dependency analyzing unit 1702 is removed from the issuing unit 1330 of the first embodiment, and the retirement unit 1360A has a configuration in which the loop detection unit 1701 and the loop escape detection unit 1704 are removed from the retirement unit 1360 of the first embodiment.
- the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 1703, and the second power control unit 1705 constitute a power saving control device. Note that some or all of the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 1703, and the second power control unit 1705 may be provided in the processor.
- Program counter monitoring unit 4100 (operation related to power saving control) The operation related to the power saving control of the program counter monitoring unit 4100 will be described with reference to the flowchart of FIG. However, the flowchart of FIG. 27 focuses on the fact that the value of the program counter 1311 is repeated in a certain pattern while the processor 4020 executes the loop processing.
- In step S4501, the program counter monitoring unit 4100 monitors whether the value of the program counter 1311 is repeated in a certain pattern. If the determination result in step S4501 is NO (the value of the program counter 1311 is not repeated in a certain pattern), the process returns to step S4501. If the determination result in step S4501 is YES (the value of the program counter 1311 is repeated in a certain pattern), the process proceeds to step S4503.
- In step S4503, the program counter monitoring unit 4100 outputs an execution instruction 1711A for executing bus monitoring to the bus monitoring unit 4200.
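The description does not say how the repetition of the program counter value is recognized. As one possible illustration of the check in step S4501, the following Python sketch looks for the shortest pattern that fills the tail of a recorded trace; the function name, the `min_repeats` threshold, and the intermediate addresses are assumptions.

```python
def repeating_period(pc_trace, min_repeats=3):
    """Return the shortest period p such that the last p * min_repeats
    program counter values are one p-long pattern repeated, else None."""
    n = len(pc_trace)
    for p in range(1, n // min_repeats + 1):
        window = pc_trace[n - p * min_repeats:]
        pattern = window[:p]
        if all(window[i] == pattern[i % p] for i in range(len(window))):
            return p
    return None


# Hypothetical trace: straight-line code, then a three-instruction loop.
trace = [1000, 1004, 2004, 2008, 2010, 2004, 2008, 2010, 2004, 2008, 2010]
print(repeating_period(trace))
print(repeating_period([1000, 1004, 1008, 1012, 1016, 1020]))
```

A result other than `None` corresponds to the YES branch of step S4501, after which the execution instruction 1711A would be output.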
- Bus monitoring unit 4200 The operation of the bus monitoring unit 4200 will be described with reference to the flowchart of FIG. However, the flowchart in FIG. 28 focuses on the fact that there is a case where the address related to reading does not change if the processor 4020 executes loop processing and there is no loop propagation dependency between the loops.
- the bus monitoring unit 4200 determines in step S4601 whether the execution instruction 1711A has been received from the program counter monitoring unit 4100. If the determination result of step S4601 is YES, the process proceeds to step S4603. If the determination result of step S4601 is NO, the determination of step S4601 is executed again.
- In step S4603, the bus monitoring unit 4200 monitors the bus 1305 and determines whether there is a change in the address related to reading. If the determination result in step S4603 is YES (there is a change in the address related to reading), the process proceeds to step S4605. If the determination result in step S4603 is NO (there is no change in the address related to reading), the process proceeds to step S4607.
- In step S4605, the bus monitoring unit 4200 does not output the execution instruction 1712 for executing power saving control to the first power control unit 1703.
- In step S4607, the bus monitoring unit 4200 outputs the execution instruction 1712 for executing power saving control to the first power control unit 1703.
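The decision in steps S4603 to S4607 can be sketched as a single predicate over the read addresses observed on the bus while the loop repeats. The function name is hypothetical, and treating an empty observation as inconclusive is an assumption not stated in the description.

```python
def should_start_power_saving(read_addresses):
    """True (step S4607) if all observed reads target one address,
    False (step S4605) if the read address changes.

    An empty observation is treated as inconclusive and returns False;
    this choice is an assumption.
    """
    return len(set(read_addresses)) == 1


print(should_start_power_saving([0x8000, 0x8000, 0x8000]))  # polling one variable
print(should_start_power_saving([0x8000, 0x8004, 0x8008]))  # walking through data
```

A loop that polls one synchronization variable keeps reading the same address, whereas a loop that processes an array does not, which is the distinction the bus monitoring unit 4200 relies on.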
- When the first power control unit 1703 receives the execution instruction 1712 from the bus monitoring unit 4200, it transmits the power control information 1731 to the power supply device 1304 and outputs an execution instruction 1713A for detecting the end of power saving control to the program counter monitoring unit 4100. Note that the power saving control by the first power control unit 1703 can use each power saving control described in the first embodiment.
- Program counter monitoring unit 4100 (operation related to power saving control termination) The operation related to the end of the power saving control of the program counter monitoring unit 4100 will be described with reference to the flowchart of FIG. 29. Note that the flowchart of FIG. 29 relies on the fact that, once the processor 4020 finishes executing the loop processing, the value of the program counter 1311 deviates from the constant pattern it repeated during execution of the loop processing.
- the program counter monitoring unit 4100 determines whether or not the execution instruction 1713A is received from the first power control unit 1703 in step S4701. If the determination result of step S4701 is YES, the process proceeds to step S4703. If the determination result of step S4701 is NO, the determination of step S4701 is executed again.
- In step S4703, the program counter monitoring unit 4100 monitors whether the value of the program counter 1311 is repeated in the same constant pattern as the constant pattern at the time of the determination in step S4501. If the determination result in step S4703 is YES (the value of the program counter 1311 is repeated in the constant pattern), it is determined that the processor 4020 continues to execute the loop process, and the process returns to step S4703. If the determination result in step S4703 is NO (the value of the program counter 1311 is not repeated in the constant pattern), it is considered that the processor 4020 has finished executing the loop process, and the process proceeds to step S4705.
- In step S4705, the program counter monitoring unit 4100 outputs an execution instruction 1714 for ending the power saving control to the second power control unit 1705.
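As a hedged illustration of step S4703, the following sketch compares incoming program counter values against the pattern recorded when the loop was detected and reports the first value that breaks it. The function name and the addresses are hypothetical.

```python
def first_deviation(pattern, pc_values):
    """Index of the first program counter value that breaks the cyclic
    repetition of `pattern`, or None if the loop is still running."""
    for i, pc in enumerate(pc_values):
        if pc != pattern[i % len(pattern)]:
            return i
    return None


loop_pattern = [2004, 2008, 2010]  # hypothetical pattern found in step S4501
print(first_deviation(loop_pattern, [2004, 2008, 2010, 2004, 2008, 2010]))
print(first_deviation(loop_pattern, [2004, 2008, 2010, 2004, 2008, 2010, 2014]))
```

A result of `None` corresponds to the YES branch (the loop continues); any index corresponds to the NO branch, after which the execution instruction 1714 would be output in step S4705.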
- When the second power control unit 1705 receives the execution instruction 1714 from the program counter monitoring unit 4100, it transmits power control information 1735 to the power supply device 1304.
- The control by which the second power control unit 1705 ends the power saving control and returns to normal power control can use the corresponding control described in the first embodiment.
- In FIG. 30, the same components as those in the computer system 2900 in FIG. 20 or the computer system 4000 in FIG. 26 are denoted by the same reference numerals, and description thereof is omitted.
- the computer system 4500 according to the fourth embodiment is different from the computer system 2900 according to the second embodiment in the following points.
- the computer system 4500 includes a program counter monitoring unit 4100 instead of the loop detection unit 1701 and the loop escape detection unit 1704.
- the computer system 4500 includes a bus monitoring unit 4200 instead of the loop propagation dependency analyzing unit 1702.
- The processor 4520 in the integrated circuit 4510 has a configuration in which the power saving control device (the loop detection unit 1701, the loop propagation dependency analyzing unit 1702, the first power control unit 3013, the loop escape detection unit 1704, and the second power control unit 3015) is removed from the processor 2901 of the second embodiment.
- The program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 3013, and the second power control unit 3015 constitute a power saving control device. Note that some or all of the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 3013, and the second power control unit 3015 may be provided in the processor.
- Program counter monitoring unit 4100 (operation related to power saving control) The program counter monitoring unit 4100 receives the value of the program counter 1311 of each thread and information indicating the currently operating thread from the thread switching unit 2920.
- Because the processor 4520 operates while switching threads, the program counter monitoring unit 4100 monitors the program counter 1311 in units of threads.
- When the program counter monitoring unit 4100 detects that the value of the program counter 1311 of the thread corresponding to the information received from the thread switching unit 2920 is repeated in a certain pattern, it outputs an execution instruction 1711A for executing bus monitoring to the bus monitoring unit 4200.
- the execution instruction 1711A includes information indicating a thread in which the value of the program counter 1311 is repeated in a certain pattern.
- Bus monitoring unit 4200 Information indicating the currently operating thread is input from the thread switching unit 2920 to the bus monitoring unit 4200.
- the bus monitoring unit 4200 monitors the bus 1305 for the thread indicated by the information included in each execution instruction 1711A received from the program counter monitoring unit 4100.
- When the bus monitoring unit 4200 detects that there is no change in the address related to reading of the currently operating thread received from the thread switching unit 2920, it outputs an execution instruction 1712 for executing power saving control to the first power control unit 3013.
- This execution instruction 1712 includes information indicating the thread for which it was detected that there is no change in the address related to reading.
- The first power control unit 3013 outputs an execution instruction 1713A for detecting the end of power saving control to the program counter monitoring unit 4100.
- the power saving control by the first power control unit 3013 can use each power saving control described in the second embodiment.
- the time slice number reduction control by the first power control unit 3013 can use the time slice number reduction control described in the second embodiment.
- Program counter monitoring unit 4100 (operation related to power saving control termination) The program counter monitoring unit 4100 monitors the program counter 1311 for the thread indicated by the information included in each execution instruction 1713A received from the first power control unit 3013. When the program counter monitoring unit 4100 detects that the value of the program counter 1311 of the currently operating thread received from the thread switching unit 2920 is no longer repeated in a certain pattern, it outputs an execution instruction 1714 for ending the power saving control to the second power control unit 3015. This execution instruction 1714 includes information indicating the thread whose program counter 1311 value is no longer repeated in a certain pattern.
- The control by which the second power control unit 3015 ends the power saving control and returns to normal power control can use the corresponding control described in the second embodiment.
- the increase control of the number of time slices by the second power control unit 3015 can use the increase control of the number of time slices described in the second embodiment.
- the branch instruction included in the instruction sequence before execution read into the instruction cache 1370 can be detected by the loop detection unit, and the detected loop process can be analyzed in advance for the presence or absence of loop propagation dependency.
- the branch instruction address or loop range can be stored in the buffer. As a result, when the instruction at the address stored in the buffer is executed, it can be detected that the busy wait loop is executed.
- the loop detection unit can be provided at a place other than the retirement unit 1360, and the loop propagation dependence analysis part can also be provided at a place other than the issuing unit 1330.
- a unit for fetching instructions from the instruction cache may be provided separately from the instruction fetch / decode unit 1320.
- the number of running units is reduced during power saving control.
- The loop detection unit 1701 detects the loop process based on the execution result of the branch instruction or the like.
- the retirement unit 1360 may be provided with a reorder buffer for storing issued instructions and addresses of the instructions.
- the issue of the branch instruction can be detected based on the branch instruction sent from the issue unit 1330 to the reorder buffer. Thereafter, the execution of the branch instruction can be detected when the execution result of the branch instruction is sent to the retirement unit or when the execution result of the branch instruction is retired (for example, when the program counter is rewritten).
- The power supply device may include a clock frequency storage unit that stores the frequency of the clock supplied to the processor, and the processor may be able to read the clock frequency from the clock frequency storage unit.
- the first example and the second example are given as examples of the mounting method in this case.
- the clock frequency storage unit is implemented as a memory-mapped register.
- the clock frequency storage unit is assigned to a specific address on the bus.
- the processor reads the clock frequency from the clock frequency storage unit via the bus.
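The first example can be modeled in software as follows. This is only a sketch: the address `0xFFFF0000`, the class names, and the 1 GHz starting frequency are hypothetical and do not come from the description.

```python
CLOCK_FREQ_ADDR = 0xFFFF0000  # hypothetical bus address, not from the source


class PowerSupplyDevice:
    """Holds the clock frequency storage unit (modeled as one attribute)."""

    def __init__(self, clock_hz: int) -> None:
        self.clock_hz = clock_hz


class Bus:
    """Maps addresses to reader callbacks, like memory-mapped registers."""

    def __init__(self) -> None:
        self._readers = {}

    def map_register(self, address: int, reader) -> None:
        self._readers[address] = reader

    def read(self, address: int) -> int:
        return self._readers[address]()


supply = PowerSupplyDevice(clock_hz=1_000_000_000)
bus = Bus()
bus.map_register(CLOCK_FREQ_ADDR, lambda: supply.clock_hz)

supply.clock_hz = supply.clock_hz * 3 // 4  # power saving control lowers the clock
print(bus.read(CLOCK_FREQ_ADDR))            # the processor sees the current value
```

Because the read goes through the bus on every access, software running on the processor always observes the frequency currently in effect rather than a cached value.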
- The clock frequency is acquired by the processor when the processor executes a specific instruction.
- When the processor executes the specific instruction in the execution unit, the processor reads the clock frequency from the clock frequency storage unit in the power supply device.
- FIG. 31 shows an example of an instruction set 1400A having specific instructions.
- The instruction set 1400A shown in FIG. 31 is obtained by adding a READCLK instruction 1420 to the instruction set 1400 described in FIG. 4. The READCLK instruction 1420 corresponds to the specific instruction that the processor executes to read the clock frequency from the clock frequency storage unit.
- the time slice may be read by a processor.
- the time slice is acquired by the processor when the processor executes a specific instruction.
- the processor reads the time slice when a specific instruction is executed in the execution unit.
- FIG. 32 shows an example of an instruction set 1400B having specific instructions.
- The instruction set 1400B shown in FIG. 32 is obtained by adding a READCLK instruction 1420 and a READTS instruction 1421 to the instruction set 1400 described in FIG. 4. The READTS instruction 1421 corresponds to the specific instruction that the processor executes to read the time slice.
- An integrated circuit is an integrated circuit including a processor, and includes: a loop detection unit that detects loop processing in which the processor repeatedly executes a loop including one or more instructions; a loop propagation dependency analyzing unit that detects, in the loop processing, a loop propagation dependency between instructions that spans two iterations of the loop with different execution counts; and a power control unit that performs power saving control to reduce the power consumed by execution of the loop processing when the loop propagation dependency analyzing unit detects no loop propagation dependency in the loop processing detected by the loop detection unit.
- The loop detection unit detects the loop processing, and the loop propagation dependency analysis unit determines whether the loop processing is for busy wait. A busy wait loop can therefore be detected without comparison against a specific instruction sequence (for example, an interlock instruction sequence), so a busy wait loop built from any other instruction sequence can also be detected easily and power saving control can be performed during the busy wait. As a result, a wider variety of busy wait executions can be detected and the power wasted by busy wait execution can be reduced.
- Loop propagation dependency means that, in the loop processing, a dependency arises between an instruction executed in the i-th iteration of the loop and an instruction executed in the j-th iteration (j > i).
- For example, loop propagation dependency exists when a value written to a variable in one iteration is read from that variable in a later iteration. If the loop processing has loop propagation dependency, it is determined to be loop processing that performs a computation or the like rather than a busy wait, and it is excluded from power saving control.
- a loop is configured between addresses 2004 and 2010.
- The value of the register (R2 register) written by the ADD instruction at address 2004 is read by the same instruction (the ADD instruction at address 2004) in the next iteration. The loop formed by the instruction sequence 2600 shown in FIG. 11 is therefore a loop in which a value written to a specific register by an instruction in the i-th iteration is read from that register by an instruction in the j-th iteration (j > i), so loop propagation dependency exists in this loop. A loop having loop propagation dependency is determined not to be for busy wait and is excluded from power saving control.
- the register value written in the preceding loop is not read out in the following loop.
- The only register written in the loop is R0, and the value written to the R0 register in a preceding iteration is not read in a succeeding iteration. The loop formed by the instruction sequence 2100 shown in FIG. 8 is therefore a loop in which a value written to a specific register by an instruction in the i-th iteration is not read by any instruction in the j-th iteration (j > i), so there is no loop propagation dependency in this loop. A loop having no loop propagation dependency is determined to be for busy wait and is subject to power saving control.
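The distinction drawn above between the two loops can be illustrated with a minimal sketch. It assumes a straight-line loop body and tracks registers only (memory is ignored); the `Instr` type, the mnemonics, and the two example loops are hypothetical stand-ins and are not the instruction sequences of FIG. 8 or FIG. 11.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Instr:
    """One instruction, reduced to the registers it reads and writes."""
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def has_loop_propagation_dependency(body: Sequence[Instr]) -> bool:
    """Return True if a value written in one iteration can be read in a later one.

    Assumption: the loop body is straight-line code run top to bottom each time.
    """
    written_in_loop = set().union(*(i.writes for i in body))
    defined_this_iteration = set()
    for instr in body:
        # A read not preceded by a write in the same iteration sees whatever
        # an earlier iteration left behind in that register.
        carried = (instr.reads - defined_this_iteration) & written_in_loop
        if carried:
            return True
        defined_this_iteration |= instr.writes
    return False


# Hypothetical busy-wait loop: R0 is reloaded before it is read each iteration.
busy_wait = [
    Instr("LD  R0, (R1)", frozenset({"R1"}), frozenset({"R0"})),
    Instr("CMP R0, 0", frozenset({"R0"}), frozenset({"FLAGS"})),
    Instr("BNE loop", frozenset({"FLAGS"}), frozenset()),
]

# Hypothetical computation loop: R2 is read before it is rewritten.
accumulate = [
    Instr("ADD R2, R2, R1", frozenset({"R2", "R1"}), frozenset({"R2"})),
    Instr("CMP R2, R3", frozenset({"R2", "R3"}), frozenset({"FLAGS"})),
    Instr("BLT loop", frozenset({"FLAGS"}), frozenset()),
]
```

Under these assumptions the first loop would be treated as a busy wait candidate and the second would be excluded from power saving control.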
- the detection of the loop processing may be performed after execution of an instruction included in the loop, or may be performed before execution. Furthermore, the loop processing may be detected from the instruction sequence stored in the instruction cache.
- The loop propagation dependency detection process may be performed before or after the execution of the instructions included in the loop, as long as it is performed after the loop detection process.
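- A control method according to one aspect of the present invention is a control method for a computer system including a processor, and comprises: a loop detection step of detecting that loop processing, which repeatedly executes a loop consisting of one or more instructions, is being executed in the processor; a loop propagation dependency detection step of detecting a loop propagation dependency, that is, a dependency between instructions that spans two iterations of the loop with different execution counts; and a power control step of performing power saving control to reduce the power consumed by executing the loop processing when no loop propagation dependency is detected by the loop propagation dependency detection step in the loop processing detected by the loop detection step.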
- A computer system according to one aspect of the present invention is a computer system including a processor, and comprises: a loop detection unit that detects that loop processing, which repeatedly executes a loop consisting of one or more instructions, is being executed in the processor; a loop propagation dependency analysis unit that detects a loop propagation dependency, that is, a dependency between instructions that spans two iterations of the loop with different execution counts; and a power control unit that performs power saving control to reduce the power consumed by executing the loop processing when the loop propagation dependency analysis unit detects no loop propagation dependency in the loop processing detected by the loop detection unit.
- The integrated circuit according to an aspect of the present invention may further include a loop end detection unit that detects that the execution of the loop processing has ended, and the power control unit may end the power saving control when, in a state where power saving control is being performed, the loop end detection unit detects the end of the loop processing.
- The loop propagation dependency analysis unit can determine that loop propagation dependency exists when, in the loop processing, a value written to a variable in a first loop is read from that variable in a second loop executed after the first loop.
- The loop detection unit can detect that the processor has entered a loop execution state when an instruction that branches to a preceding address is executed in the processor. Thus, when a program is executed in the processor, loop processing can be detected easily based on the execution of a branch instruction.
- The integrated circuit may further include a loop range storage unit that stores a loop range. When the loop detection unit detects a branch instruction that branches to a preceding address, it outputs the loop range to the loop range storage unit and instructs the loop propagation dependency analysis unit to perform dependency analysis, and the loop propagation dependency analysis unit performs the loop propagation dependency detection process on the instruction sequence within the loop range stored in the loop range storage unit.
- The loop end detection unit can detect that the execution of the loop processing in the processor has ended when a branch instruction that branches outside the loop range stored in the loop range storage unit is executed, or when the conditional branch instruction located at the end of the loop range is not taken.
- the loop range storage unit may store information including a loop start address and a loop end address.
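The interplay of loop detection, the stored loop range, and loop end detection described above can be sketched as follows. This is a simplified illustration, not the circuit of the embodiments: the class name, method name, and returned strings are hypothetical, and it tracks a single loop at a time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LoopTracker:
    """Holds one loop range as (start_address, end_address), or None."""
    loop_range: Optional[Tuple[int, int]] = None

    def on_branch(self, pc: int, target: int, taken: bool) -> str:
        """Process one executed branch at address `pc` and report the event."""
        if self.loop_range is None:
            # Loop detection: a taken branch to a preceding address.
            if taken and target < pc:
                self.loop_range = (target, pc)
                return "loop_detected"
            return "none"

        start, end = self.loop_range
        # Loop end, case 1: a taken branch leaves the stored range.
        if taken and not (start <= target <= end):
            self.loop_range = None
            return "loop_ended"
        # Loop end, case 2: the branch at the end of the range is not taken.
        if pc == end and not taken:
            self.loop_range = None
            return "loop_ended"
        return "in_loop"


tracker = LoopTracker()
events = [
    tracker.on_branch(pc=2010, target=2004, taken=True),   # backward branch
    tracker.on_branch(pc=2010, target=2004, taken=True),   # next iteration
    tracker.on_branch(pc=2010, target=2004, taken=False),  # falls through
]
```

In this sketch the "loop_detected" event is where dependency analysis would be requested, and "loop_ended" is where power saving control would be ended.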
- The processor may include an instruction sequence holding unit that holds a fetched instruction sequence, and the loop propagation dependency analysis unit can perform the loop propagation dependency detection process on the instruction sequence within the loop range held in the instruction sequence holding unit.
- the instruction string stored in the instruction string holding unit may include a prefetched instruction.
- When performing power saving control, the power control unit can perform control to reduce the frequency of the clock supplied to the processor, and when ending the power saving control, it can perform control to increase the frequency of the clock supplied to the processor. Reducing the clock frequency reduces the power wasted by busy wait.
- When performing power saving control, the power control unit can perform control to reduce the voltage of the power supplied to the processor, and when ending the power saving control, it can perform control to increase the voltage of the power supplied to the processor. Reducing the supply voltage reduces the power wasted by busy wait.
- When performing power saving control, the power control unit can perform control to reduce the voltage of the power supplied to the processor and to reduce the frequency of the clock supplied to the processor, and when ending the power saving control, it can perform control to increase the voltage of the power supplied to the processor and to increase the frequency of the clock supplied to the processor. This reduces the power wasted by busy wait.
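To give a rough sense of why lowering the clock frequency and the supply voltage saves power, the sketch below uses the common first-order CMOS model in which dynamic power is proportional to V² × f. That model is a general engineering approximation and is not stated in this document; the function name and the example ratios are illustrative assumptions.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float) -> float:
    """Dynamic power relative to the baseline, assuming P is proportional to V^2 * f.

    Both arguments are ratios to the normal operating point (1.0 = unchanged).
    Static (leakage) power is ignored in this simplified model.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Halving only the clock frequency during a busy wait.
clock_only = relative_dynamic_power(voltage_ratio=1.0, frequency_ratio=0.5)

# Halving the clock and also lowering the voltage to 80% of nominal.
clock_and_voltage = relative_dynamic_power(voltage_ratio=0.8, frequency_ratio=0.5)
```

Under this model, reducing both quantities together (roughly 0.32 of baseline in the second case) saves more than reducing the clock alone (0.5 of baseline).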
- The loop propagation dependency analysis unit can determine that loop propagation dependency exists when, in the loop processing, the variable written by one instruction is the same as a variable that is read.
- The processor may include a plurality of execution units. When performing power saving control, the power control unit performs control to stop some of the plurality of execution units, and when ending the power saving control, it performs control to resume the stopped execution units. Stopping some of the execution units reduces the power wasted by busy wait.
- The processor may include a thread management unit that manages the time slice allocated to each of a plurality of threads. When performing power saving control, the power control unit instructs the thread management unit to decrease the time slice allocated to a busy-waiting thread, that is, a thread executing loop processing in which no loop propagation dependency is detected; when ending the power saving control, it instructs the thread management unit to increase the time slice allocated to that thread. Reducing the time slice allocated to the busy-waiting thread reduces the power wasted by busy wait.
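The time-slice adjustment described above can be sketched as a small bookkeeping class. The class and method names, the tick units, and the divisor of 4 are all hypothetical; the document does not specify by how much the time slice is decreased.

```python
from typing import Dict


class ThreadManager:
    """Holds the time slice (in arbitrary ticks) assigned to each thread."""

    def __init__(self, time_slices: Dict[int, int]) -> None:
        self.time_slices = dict(time_slices)
        self._saved: Dict[int, int] = {}

    def decrease(self, thread_id: int, divisor: int = 4) -> None:
        """Shrink a busy-waiting thread's slice (the divisor is an assumption)."""
        if thread_id in self._saved:
            return  # power saving is already active for this thread
        self._saved[thread_id] = self.time_slices[thread_id]
        self.time_slices[thread_id] = max(1, self.time_slices[thread_id] // divisor)

    def restore(self, thread_id: int) -> None:
        """Give the thread its original slice back when the loop ends."""
        if thread_id in self._saved:
            self.time_slices[thread_id] = self._saved.pop(thread_id)


manager = ThreadManager({0: 100, 1: 100})
manager.decrease(1)                 # thread 1 was found to be busy-waiting
reduced = manager.time_slices[1]
manager.restore(1)                  # thread 1 left its busy-wait loop
restored = manager.time_slices[1]
```

Saving the original value on entry keeps the restore step exact even if the reduction policy changes.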
- When performing power saving control, the power control unit may instruct the thread management unit to decrease the time slice allocated to the busy-waiting thread and perform control to reduce the frequency of the clock supplied to the processor; when ending the power saving control, it may instruct the thread management unit to increase the time slice allocated to the thread and perform control to increase the frequency of the clock supplied to the processor. As a result, the power wasted by busy wait can be reduced and the power consumption of the processor can be lowered.
- The processor may include a plurality of execution units. When performing power saving control, the power control unit may instruct the thread management unit to decrease the time slice allocated to the busy-waiting thread and perform control to stop some of the plurality of execution units; when ending the power saving control, it may instruct the thread management unit to increase the time slice allocated to the thread and perform control to resume the stopped execution units. As a result, the power wasted by busy wait can be reduced and the power consumption of the processor can be lowered.
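- The processor may include a plurality of execution units. When performing power saving control, the power control unit may instruct the thread management unit to decrease the time slice allocated to the busy-waiting thread, reduce the frequency of the clock supplied to the processor, and stop some of the plurality of execution units; when ending the power saving control, it may instruct the thread management unit to increase the time slice allocated to the thread, increase the frequency of the clock supplied to the processor, and resume the stopped execution units.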
- When performing power saving control, the power control unit may instruct the thread management unit to decrease the time slice allocated to the busy-waiting thread, reduce the frequency of the clock supplied to the processor, and reduce the voltage of the power supplied to the processor; when ending the power saving control, it may instruct the thread management unit to increase the time slice allocated to the thread, increase the frequency of the clock supplied to the processor, and increase the voltage of the power supplied to the processor.
- When performing power saving control, the power control unit may decrease the frequency of the clock supplied to the processor according to the decrease in the time slice allocated to the busy-waiting thread.
- When performing power saving control, the power control unit may decrease the number of operating execution units among the plurality of execution units according to the decrease in the time slice allocated to the busy-waiting thread.
- An integrated circuit according to another aspect is an integrated circuit including a processor, and comprises: a first monitoring unit that monitors whether the counter value of a program counter in the processor repeats in a constant pattern; a second monitoring unit that monitors, on the bus to which the processor is connected, whether the addresses read by the processor remain unchanged; and a power control unit that performs power saving control to reduce the power consumed by the processor when the first monitoring unit detects that the counter value repeats in a constant pattern and the second monitoring unit detects that the addresses read by the processor do not change.
- A control method according to another aspect is a control method for a computer system including a processor, and comprises: a first monitoring step of monitoring whether the counter value of a program counter in the processor repeats in a constant pattern; a second monitoring step of monitoring, on the bus to which the processor is connected, whether the addresses read by the processor remain unchanged; and a power control step of performing power saving control to reduce the power consumed by the processor when the first monitoring step detects that the counter value repeats in a constant pattern and the second monitoring step detects that the addresses read by the processor do not change.
- A computer system according to another aspect is a computer system including a processor, and comprises: a first monitoring unit that monitors whether the counter value of a program counter in the processor repeats in a constant pattern; a second monitoring unit that monitors, on the bus to which the processor is connected, whether the addresses read by the processor remain unchanged; and a power control unit that performs power saving control to reduce the power consumed by the processor when the first monitoring unit detects that the counter value repeats in a constant pattern and the second monitoring unit detects that the addresses read by the processor do not change.
- While power saving control is being performed, the power control unit can end the power saving control when the first monitoring unit detects that the counter value no longer repeats in a constant pattern.
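The two monitoring conditions above can be approximated in software as follows. This is only an illustrative sketch working on recorded traces: the function names, the window parameters (`max_period`, `repeats`), and the rule that all observed read addresses must be identical are assumptions, not details taken from the embodiments.

```python
from typing import Optional, Sequence


def repeating_period(pc_trace: Sequence[int], max_period: int = 16,
                     repeats: int = 3) -> Optional[int]:
    """Return the shortest period p for which the last `repeats * p` program
    counter values repeat with period p, or None if there is no such period."""
    for p in range(1, max_period + 1):
        window = repeats * p
        if len(pc_trace) < window:
            break  # longer periods need even more samples
        tail = pc_trace[-window:]
        if all(tail[i] == tail[i + p] for i in range(window - p)):
            return p
    return None


def read_addresses_unchanged(read_addrs: Sequence[int]) -> bool:
    """True if every observed read on the bus targeted the same address."""
    return len(set(read_addrs)) == 1


def should_save_power(pc_trace: Sequence[int], read_addrs: Sequence[int]) -> bool:
    """Both conditions must hold before power saving control is started."""
    return (repeating_period(pc_trace) is not None
            and read_addresses_unchanged(read_addrs))


polling_pcs = [100, 104, 108] * 3        # same three addresses, three times
polling_reads = [0x8000, 0x8000, 0x8000]  # always the same variable
copying_reads = [0x8000, 0x8004, 0x8008]  # a loop walking through memory
```

Once `repeating_period` starts returning `None` for the running trace, the same check can serve as the trigger for ending power saving control.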
- The integrated circuit according to the present invention includes a processor and has a function of detecting that the processor is executing a busy wait and a function of performing power saving control during busy wait execution. It is therefore useful when installed in a computer system.
- Examples of computer systems to which the present invention can be applied include personal computers, mainframes (general-purpose large computers), televisions, video decks, HDD recorders, mobile phones, car navigation systems, landline phones, copy machines, network relay devices, mobile terminals with touch panels, and game machines.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Power Sources (AREA)
Abstract
Description
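<Configuration of computer system 1300>
The configuration of a computer system including the integrated circuit according to Embodiment 1 of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram schematically showing the configuration of the computer system 1300.
FIG. 2 shows the computer system 1300 in simplified form, with the power saving control device omitted. The basic operation of the computer system 1300 will be described based on this figure.
The instruction set of the processor 1301 will be described with reference to FIG. 3. FIG. 3 illustrates the assembler-code instructions 1401 included in the instruction set, together with the outline 1402 and the operation content 1403 of each instruction 1401.
The register set of the processor according to the present embodiment, that is, the configuration of the register group 1310, will be described with reference to FIG. 4.
An example of the data structure of the instruction sequence 1600 held in the instruction sequence holding unit 1340 will be described with reference to FIG. 5. The instruction sequence holding unit 1340 stores each instruction 1602 in association with the address 1601 at which the instruction was stored.
Next, the configuration of the power saving control device will be described with reference to FIG. 1.
The first component consists of the loop detection unit 1701, the loop propagation dependency analysis unit 1702, and the first power control unit 1703. The loop detection unit 1701 detects the execution of loop processing, and the loop propagation dependency analysis unit 1702 determines whether the loop processing is for busy wait or is loop processing for computation or the like. If the loop processing is for busy wait, the first power control unit 1703 performs power saving control. The operation of the first component is described below with reference to the drawings.
The operation of the loop detection unit 1701 will be described with reference to the flowchart of FIG. 6.
Here, how the loop propagation dependency analysis unit 1702 distinguishes loops used for busy wait from loops that are not for busy wait (for example, loops for computation) will be explained using concrete instruction sequences.
First, the instruction sequence of FIG. 8 is given as an example of a busy-wait loop instruction sequence analyzed by the loop propagation dependency analysis unit 1702.
Next, the instruction sequence 2600 of FIG. 11 is given as an example of an instruction sequence forming a loop in which the loop propagation dependency analysis unit 1702 detects loop propagation dependency.
From the above two examples, it can be understood that loop processing used for busy wait is selectively detected as a target of power saving control, while loop processing that is not for busy wait is not detected as such a target.
Next, the operation of the first power control unit 1703 will be described with reference to the flowchart of FIG. 13.
The second component of the power saving control device, consisting of the loop exit detection unit 1704 (an example of the loop end detection unit) and the second power control unit 1705, will now be described. The loop exit detection unit 1704 detects that the processor 1301 has exited the busy-wait loop processing, and the second power control unit 1705 performs the power control that ends the power saving control.
The operation of the loop exit detection unit 1704 will be described with reference to the flowchart of FIG. 14.
Next, the operation of the second power control unit 1705 will be described with reference to the flowchart of FIG. 15.
The loop propagation dependency analysis unit 1702 can, for example, analyze the presence or absence of loop propagation dependency using the digital electronic circuits shown in FIGS. 16 and 17.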
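<I/O device>
A modification of the computer system according to Embodiment 1 is shown in FIG. 19. In FIG. 19, components identical to those in FIG. 1 are denoted by the same reference signs, and their description is omitted.
The configuration of the multithread-capable computer system 2900 in Embodiment 2 will be described with reference to FIG. 20.
The data structure of the register group 2910 will be described with reference to FIG. 21.
The first power control unit 3013 and the second power control unit 3015 are provided in the retirement unit 1360. This makes it easy to perform operations such as writing to the register group 2910 by using the functions of the retirement unit 1360.
A flowchart of the operation of the first power control unit 3013 is shown in FIG. 22.
In the above example, the first power control unit 3013 and the second power control unit 3015 of the computer system 2900 perform power saving control by controlling the frequency of the clock 1307, thereby suppressing the power wasted in the computer system 2900.
When the above two types of power saving control are combined, the processor resource (processing capacity) allocated to each thread is "time slice of the thread ÷ sum of the time slices of all threads × frequency of the clock 1307 × number of operating execution units 1350".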
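The product given above for the processor resource allocated to a thread can be written out as a small function. The function and parameter names and the example numbers are illustrative assumptions; only the form of the product comes from the text.

```python
from typing import Sequence


def processor_resource(time_slice: float, all_time_slices: Sequence[float],
                       clock_hz: float, active_units: int) -> float:
    """Share of processing capacity for one thread:
    time slice / sum of all time slices * clock frequency * active execution units.
    """
    return time_slice / sum(all_time_slices) * clock_hz * active_units


# Hypothetical numbers: three threads, a 100 MHz clock, two execution units.
slices = [10, 10, 20]
normal = processor_resource(slices[0], slices, clock_hz=100e6, active_units=2)

# The same thread after halving the clock and stopping one execution unit.
saving = processor_resource(slices[0], slices, clock_hz=50e6, active_units=1)
```

Because the three factors multiply, reducing the time slice, the clock frequency, and the number of operating execution units together compounds the reduction in capacity given to a busy-waiting thread.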
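The configuration of the computer system in Embodiment 3 will be described with reference to FIG. 26.
The processor 4020 in the integrated circuit 4010 has the configuration of the processor 1301 of Embodiment 1 with the power saving control device (the loop detection unit 1701, the loop propagation dependency analysis unit 1702, the first power control unit 1703, the loop exit detection unit 1704, and the second power control unit 1705) removed. The issue unit 1330A has the configuration of the issue unit 1330 of Embodiment 1 with the loop propagation dependency analysis unit 1702 removed, and the retirement unit 1360A has the configuration of the retirement unit 1360 of Embodiment 1 with the loop detection unit 1701 and the loop exit detection unit 1704 removed.
In Embodiment 3, the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 1703, and the second power control unit 1705 constitute the power saving control device. Some or all of the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 1703, and the second power control unit 1705 may be provided in the processor.
The operation of the program counter monitoring unit 4100 relating to power saving control will be described with reference to the flowchart of FIG. 27. The flowchart of FIG. 27 is based on the observation that the value of the program counter 1311 repeats in a constant pattern while the processor 4020 executes loop processing.
The operation of the bus monitoring unit 4200 will be described with reference to the flowchart of FIG. 28. The flowchart of FIG. 28 is based on the observation that, when the processor 4020 executes loop processing and there is no loop propagation dependency between iterations, the addresses being read may not change.
The operation of the program counter monitoring unit 4100 relating to the end of power saving control will be described with reference to the flowchart of FIG. 29. The flowchart of FIG. 29 is based on the observation that, after the processor 4020 finishes executing the loop processing, the value of the program counter 1311 departs from the constant repeating pattern seen during execution of the loop processing.
The configuration of the computer system in Embodiment 4 will be described with reference to FIG. 30.
The processor 4520 in the integrated circuit 4510 has the configuration of the processor 2901 of Embodiment 2 with the power saving control device (the loop detection unit 1701, the loop propagation dependency analysis unit 1702, the first power control unit 3013, the loop exit detection unit 1704, and the second power control unit 3015) removed.
In Embodiment 4, the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 3013, and the second power control unit 3015 constitute the power saving control device. Some or all of the program counter monitoring unit 4100, the bus monitoring unit 4200, the first power control unit 3013, and the second power control unit 3015 may be provided in the processor.
The program counter monitoring unit 4100 receives the value of the program counter 1311 of each thread, and also receives information indicating the currently running thread from the thread switching unit 2920.
The bus monitoring unit 4200 receives information indicating the currently running thread from the thread switching unit 2920.
The program counter monitoring unit 4100 monitors the program counter 1311 for the thread indicated by the information included in each execution instruction 1713A received from the first power control unit 3013. When the program counter monitoring unit 4100 detects that the value of the program counter 1311 of the currently running thread, as indicated by the thread switching unit 2920, is not repeating in a constant pattern, it outputs an execution instruction 1714 for ending power saving control to the second power control unit 3015. This execution instruction 1714 includes information indicating the thread whose program counter 1311 value has stopped repeating in a constant pattern.
(1) The description of the above embodiments and modifications is in all respects merely illustrative of the present invention and is not intended to limit its scope. It goes without saying that various improvements and modifications can be made without departing from the scope of the present invention.
An integrated circuit according to one aspect of the present invention is an integrated circuit including a processor, and comprises: a loop detection unit that detects that loop processing, which repeatedly executes a loop consisting of one or more instructions, is being executed in the processor; a loop propagation dependency analysis unit that detects a loop propagation dependency, that is, a dependency between instructions that spans two iterations of the loop with different execution counts; and a power control unit that performs power saving control to reduce the power consumed by executing the loop processing when the loop propagation dependency analysis unit detects no loop propagation dependency in the loop processing detected by the loop detection unit.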
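1301 Processor
1302 Main memory
1303 I/O device
1304 Power supply device
1305 Bus
1306 Power
1307 Clock
1310 Register group
1311 Program counter
1320 Instruction fetch/decode unit
1330 Issue unit
1340 Instruction sequence holding unit
1350 Execution unit
1360 Retirement unit
1370 Instruction cache
1380 Data cache
1400 Instruction set
1500 Register set
1512 Program counter
1513 Condition flag register
1701 Loop detection unit
1702 Loop propagation dependency analysis unit
1703 First power control unit
1704 Loop exit detection unit
1705 Second power control unit
1900 Loop range storage unit
2200 Dependency analysis buffer
2202 DST register
2203 SRC register
2700 Dependency analysis buffer
2800 Computer system
2851 Flow dependency detection circuit
2853 Preliminary detection circuit
2855 Instruction buffer
2857 Comparison circuit
2861 Instruction buffer
2871 Simple loop propagation dependency detection circuit
2900 Computer system
2901 Processor
2910 Register group
2912 Thread identifier register
2913 Time slice register
2920 Thread switching unit
3013 First power control unit
3015 Second power control unit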
Claims (27)
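- An integrated circuit including a processor, comprising:
a loop detection unit that detects that loop processing, which repeatedly executes a loop consisting of one or more instructions, is executed in the processor;
a loop propagation dependency analysis unit that detects a loop propagation dependency in which a dependency between instructions in the loop processing spans two loops with different execution counts; and
a power control unit that performs power saving control to reduce the power consumed by executing the loop processing when no loop propagation dependency is detected by the loop propagation dependency analysis unit in the loop processing detected by the loop detection unit.
- The integrated circuit according to claim 1, further comprising a loop end detection unit that detects that execution of the loop processing has ended,
wherein the power control unit ends the power saving control when, in a state where power saving control is being performed, the loop end detection unit detects the end of the loop processing.
- The integrated circuit according to claim 2, wherein the loop propagation dependency analysis unit determines that loop propagation dependency exists when, in the loop processing, a value written to a variable in a first loop is read from the variable in a second loop executed after the first loop.
- The integrated circuit according to claim 3, wherein the loop detection unit detects that the processor has entered a loop execution state when an instruction that branches to a preceding address is executed in the processor.
- The integrated circuit according to claim 4, further comprising a loop range storage unit that stores a loop range,
wherein the loop detection unit, upon detecting a branch instruction that branches to a preceding address, outputs the loop range to the loop range storage unit and instructs the loop propagation dependency analysis unit to perform dependency analysis, and
the loop propagation dependency analysis unit performs the loop propagation dependency detection process on the instruction sequence within the loop range stored in the loop range storage unit.
- The integrated circuit according to claim 5, wherein the loop end detection unit
detects that execution of the loop processing in the processor has ended when a branch instruction that branches outside the loop range stored in the loop range storage unit is executed, or when the execution result of a conditional branch instruction located at the end of the loop range is that the branch is not taken.
- The integrated circuit according to claim 5, wherein the loop range storage unit stores information including a start address of the loop and an end address of the loop.
- The integrated circuit according to claim 5, wherein the processor includes an instruction sequence holding unit that holds a fetched instruction sequence, and
the loop propagation dependency analysis unit performs the loop propagation dependency detection process on the instruction sequence within the loop range held in the instruction sequence holding unit.
- The integrated circuit according to claim 8, wherein the instruction sequence stored in the instruction sequence holding unit includes prefetched instructions.
- The integrated circuit according to claim 2, wherein the power control unit performs control to decrease the frequency of the clock supplied to the processor when performing power saving control, and performs control to increase the frequency of the clock supplied to the processor when ending power saving control.
- The integrated circuit according to claim 2, wherein the power control unit performs control to lower the voltage of the power supplied to the processor when performing power saving control, and performs control to raise the voltage of the power supplied to the processor when ending power saving control.
- The integrated circuit according to claim 2, wherein the power control unit performs control to lower the voltage of the power supplied to the processor and to decrease the frequency of the clock supplied to the processor when performing power saving control, and performs control to raise the voltage of the power supplied to the processor and to increase the frequency of the clock supplied to the processor when ending power saving control.
- The integrated circuit according to claim 1, wherein the loop propagation dependency analysis unit
determines that loop propagation dependency exists when, in the loop processing, the variable written by one instruction is the same as a variable that is read.
- The integrated circuit according to claim 2, wherein the processor includes a plurality of execution units, and
the power control unit performs control to stop some of the plurality of execution units when performing power saving control, and performs control to resume execution of the stopped execution units when ending power saving control.
- The integrated circuit according to claim 2, wherein the processor includes a thread management unit that manages a time slice allocated to each of a plurality of threads, and
the power control unit,
when performing power saving control, instructs the thread management unit to decrease the time slice allocated to a busy-waiting thread that is executing loop processing in which no loop propagation dependency is detected, and,
when ending power saving control, instructs the thread management unit to increase the time slice allocated to the thread.
- The integrated circuit according to claim 15, wherein the power control unit,
when performing power saving control, instructs the thread management unit to decrease the time slice allocated to the busy-waiting thread and performs control to decrease the frequency of the clock supplied to the processor, and,
when ending power saving control, instructs the thread management unit to increase the time slice allocated to the thread and performs control to increase the frequency of the clock supplied to the processor.
- The integrated circuit according to claim 15, wherein the processor includes a plurality of execution units, and
the power control unit,
when performing power saving control, instructs the thread management unit to decrease the time slice allocated to the busy-waiting thread and performs control to stop some of the plurality of execution units, and,
when ending power saving control, instructs the thread management unit to increase the time slice allocated to the thread and performs control to resume execution of the stopped execution units.
- The integrated circuit according to claim 15, wherein the processor includes a plurality of execution units, and
the power control unit,
when performing power saving control, instructs the thread management unit to decrease the time slice allocated to the busy-waiting thread, decreases the frequency of the clock supplied to the processor, and performs control to stop some of the plurality of execution units, and,
when ending power saving control, instructs the thread management unit to increase the time slice allocated to the thread, increases the frequency of the clock supplied to the processor, and performs control to resume execution of the stopped execution units.
- The integrated circuit according to claim 15, wherein the power control unit,
when performing power saving control, instructs the thread management unit to decrease the time slice allocated to the busy-waiting thread, decreases the frequency of the clock supplied to the processor, and performs control to lower the voltage of the power supplied to the processor, and,
when ending power saving control, instructs the thread management unit to increase the time slice allocated to the thread, increases the frequency of the clock supplied to the processor, and performs control to raise the voltage of the power supplied to the processor.
- The integrated circuit according to claim 16, wherein the power control unit,
when performing power saving control, decreases the frequency of the clock supplied to the processor according to the amount by which the time slice allocated to the busy-waiting thread is decreased.
- The integrated circuit according to claim 17, wherein the power control unit,
when performing power saving control, decreases the number of operating execution units among the plurality of execution units according to the amount by which the time slice allocated to the busy-waiting thread is decreased.
- A control method for a computer system including a processor, comprising:
a loop detection step of detecting that loop processing, which repeatedly executes a loop consisting of one or more instructions, is executed in the processor;
a loop propagation dependency detection step of detecting a loop propagation dependency in which a dependency between instructions in the loop processing spans two loops with different execution counts; and
a power control step of performing power saving control to reduce the power consumed by executing the loop processing when no loop propagation dependency is detected by the loop propagation dependency detection step in the loop processing detected by the loop detection step.
- A computer system including a processor, comprising:
a loop detection unit that detects that loop processing, which repeatedly executes a loop consisting of one or more instructions, is executed in the processor;
a loop propagation dependency analysis unit that detects a loop propagation dependency in which a dependency between instructions in the loop processing spans two loops with different execution counts; and
a power control unit that performs power saving control to reduce the power consumed by executing the loop processing when no loop propagation dependency is detected by the loop propagation dependency analysis unit in the loop processing detected by the loop detection unit.
- An integrated circuit including a processor, comprising:
a first monitoring unit that monitors whether a counter value of a program counter in the processor is repeating in a constant pattern;
a second monitoring unit that monitors, on a bus to which the processor is connected, whether the addresses read by the processor remain unchanged; and
a power control unit that performs power saving control to reduce the power consumed by the processor when the first monitoring unit detects that the counter value is repeating in a constant pattern and the second monitoring unit detects that the addresses read by the processor do not change.
- The integrated circuit according to claim 24, wherein the power control unit ends the power saving control when, while power saving control is being performed, the first monitoring unit detects that the counter value no longer repeats in a constant pattern.
- A control method for a computer system including a processor, comprising:
a first monitoring step of monitoring whether a counter value of a program counter in the processor is repeating in a constant pattern;
a second monitoring step of monitoring, on a bus to which the processor is connected, whether the addresses read by the processor remain unchanged; and
a power control step of performing power saving control to reduce the power consumed by the processor when the first monitoring step detects that the counter value is repeating in a constant pattern and the second monitoring step detects that the addresses read by the processor do not change.
- A computer system including a processor, comprising:
a first monitoring unit that monitors whether a counter value of a program counter in the processor is repeating in a constant pattern;
a second monitoring unit that monitors, on a bus to which the processor is connected, whether the addresses read by the processor remain unchanged; and
a power control unit that performs power saving control to reduce the power consumed by the processor when the first monitoring unit detects that the counter value is repeating in a constant pattern and the second monitoring unit detects that the addresses read by the processor do not change.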
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012521283A JP5853216B2 (ja) | 2010-06-25 | 2011-06-07 | Integrated circuit, computer system, and control method |
US13/395,985 US8918664B2 (en) | 2010-06-25 | 2011-06-07 | Integrated circuit, computer system, and control method, including power saving control to reduce power consumed by execution of a loop |
CN201180003832.3A CN102576318B (zh) | 2010-06-25 | 2011-06-07 | Integrated circuit, computer system, and control method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010144799 | 2010-06-25 | ||
JP2010-144799 | 2010-06-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011161884A1 true WO2011161884A1 (ja) | 2011-12-29 |
Family
ID=45371091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/003185 WO2011161884A1 (ja) | Integrated circuit, computer system, and control method | 2010-06-25 | 2011-06-07 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8918664B2 (ja) |
JP (1) | JP5853216B2 (ja) |
CN (1) | CN102576318B (ja) |
WO (1) | WO2011161884A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022177955A (ja) * | 2021-05-19 | 2022-12-02 | 株式会社ユニバーサルエンターテインメント | Gaming machine |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9459871B2 (en) * | 2012-12-31 | 2016-10-04 | Intel Corporation | System of improved loop detection and execution |
US9087570B2 (en) | 2013-01-17 | 2015-07-21 | Micron Technology, Inc. | Apparatuses and methods for controlling a clock signal provided to a clock tree |
EP2972781A4 (en) | 2013-03-15 | 2016-10-19 | Intel Corp | METHOD AND SYSTEMS FOR VECTORIZING SCALAR COMPUTER PROGRAM GRINDINGS WITH GRINDING DEPENDENCIES |
GB2514618B (en) * | 2013-05-31 | 2020-11-11 | Advanced Risc Mach Ltd | Data processing systems |
US11789769B2 (en) | 2013-09-20 | 2023-10-17 | Qualcomm Incorporated | System and method for generation of event driven, tuple-space based programs |
US10564949B2 (en) * | 2013-09-20 | 2020-02-18 | Reservoir Labs, Inc. | System and method for generation of event driven, tuple-space based programs |
JP6183251B2 (ja) * | 2014-03-14 | 2017-08-23 | 株式会社デンソー | Electronic control device |
US9524011B2 (en) | 2014-04-11 | 2016-12-20 | Apple Inc. | Instruction loop buffer with tiered power savings |
US9952863B1 (en) | 2015-09-01 | 2018-04-24 | Apple Inc. | Program counter capturing |
US10437483B2 (en) | 2015-12-17 | 2019-10-08 | Samsung Electronics Co., Ltd. | Computing system with communication mechanism and method of operation thereof |
US10579125B2 (en) * | 2016-02-27 | 2020-03-03 | Intel Corporation | Processors, methods, and systems to adjust maximum clock frequencies based on instruction type |
US11194573B1 (en) | 2018-02-09 | 2021-12-07 | Rigetti & Co, Llc | Streaming execution for a quantum processing system |
US11132233B2 (en) * | 2018-05-07 | 2021-09-28 | Micron Technology, Inc. | Thread priority management in a multi-threaded, self-scheduling processor |
US11296999B2 (en) * | 2018-06-26 | 2022-04-05 | Telefonaktiebolaget Lm Ericsson (Publ) | Sliding window based non-busy looping mode in cloud computing |
JP7125525B2 (ja) * | 2021-05-19 | 2022-08-24 | 株式会社ユニバーサルエンターテインメント | Gaming machine |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0869411A (ja) * | 1994-08-30 | 1996-03-12 | Toshiba Corp | Semiconductor device |
JPH0991136A (ja) * | 1995-09-25 | 1997-04-04 | Toshiba Corp | Signal processing device |
JP2009053861A (ja) * | 2007-08-24 | 2009-03-12 | Panasonic Corp | Program execution control device |
JP2009069921A (ja) * | 2007-09-11 | 2009-04-02 | Hitachi Ltd | Multiprocessor system |
US20090113191A1 (en) * | 2007-10-25 | 2009-04-30 | Ronald Hall | Apparatus and Method for Improving Efficiency of Short Loop Instruction Fetch |
JP2009146243A (ja) * | 2007-12-17 | 2009-07-02 | Hitachi Ltd | Power-performance optimizing compiler utilizing substrate bias control, and processor system |
JP2010066892A (ja) * | 2008-09-09 | 2010-03-25 | Renesas Technology Corp | Data processor and data processing system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2912713B2 (ja) | 1991-01-30 | 1999-06-28 | エヌティエヌ株式会社 | High-temperature durable grease composition |
JPH09114660A (ja) * | 1995-10-18 | 1997-05-02 | Hitachi Ltd | Data processing device |
JP4253796B2 (ja) | 2001-11-08 | 2009-04-15 | 富士通株式会社 | Computer and control method |
US7873820B2 (en) * | 2005-11-15 | 2011-01-18 | Mips Technologies, Inc. | Processor utilizing a loop buffer to reduce power consumption |
US7721127B2 (en) * | 2006-03-28 | 2010-05-18 | Mips Technologies, Inc. | Multithreaded dynamic voltage-frequency scaling microprocessor |
-
2011
- 2011-06-07 US US13/395,985 patent/US8918664B2/en not_active Expired - Fee Related
- 2011-06-07 JP JP2012521283A patent/JP5853216B2/ja not_active Expired - Fee Related
- 2011-06-07 CN CN201180003832.3A patent/CN102576318B/zh not_active Expired - Fee Related
- 2011-06-07 WO PCT/JP2011/003185 patent/WO2011161884A1/ja active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022177955A (ja) * | 2021-05-19 | 2022-12-02 | 株式会社ユニバーサルエンターテインメント | Gaming machine |
JP7278632B2 (ja) | 2021-05-19 | 2023-05-22 | 株式会社ユニバーサルエンターテインメント | Gaming machine |
Also Published As
Publication number | Publication date |
---|---|
CN102576318A (zh) | 2012-07-11 |
US20120179924A1 (en) | 2012-07-12 |
JP5853216B2 (ja) | 2016-02-09 |
CN102576318B (zh) | 2016-03-30 |
US8918664B2 (en) | 2014-12-23 |
JPWO2011161884A1 (ja) | 2013-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5853216B2 (ja) | Integrated circuit, computer system, and control method | |
US10551896B2 (en) | Method and apparatus for dynamic clock and voltage scaling in a computer processor based on program phase | |
CN108027767B (zh) | Register read/write ordering | |
JP5043560B2 (ja) | Program execution control device | |
EP3686741B1 (en) | Backward compatibility testing of software in a mode that disrupts timing | |
US10078357B2 (en) | Power gating functional units of a processor | |
TWI461908B (zh) | 於即時指令追蹤紀錄中之除錯動作的選擇性紀錄技術 | |
US20070234094A1 (en) | Methods and apparatus to control power consumption within a processor | |
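JP2003516570A (ja) | Method and apparatus for entering and exiting multiple threads within a multithreaded processor | |
KR20140113444A (ko) | Processors, methods, and systems for relaxing synchronization of accesses to shared memory | |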
US20170161075A1 (en) | Increasing processor instruction window via seperating instructions according to criticality | |
KR20030051380A (ko) | Microprocessor | |
US8037366B2 (en) | Issuing instructions in-order in an out-of-order processor using false dependencies | |
Rengasamy et al. | Critics critiquing criticality in mobile apps | |
US20020156999A1 (en) | Mixed-mode hardware multithreading | |
US8095780B2 (en) | Register systems and methods for a multi-issue processor | |
US20140201505A1 (en) | Prediction-based thread selection in a multithreading processor | |
US11880231B2 (en) | Accurate timestamp or derived counter value generation on a complex CPU | |
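CN118245187A (zh) | Thread scheduling method and apparatus, electronic device, and storage medium | |
KR20240128829A (ko) | Optimization of captured loops in a processor for optimizing loop replay performance | |
JP2008186854A (ja) | Semiconductor integrated circuit | |
JPWO2007004323A1 (ja) | Information processing device | |
CN114356416A (zh) | Processor and control method and apparatus therefor, electronic device, and storage medium |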
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180003832.3 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11797773 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012521283 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13395985 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 11797773 Country of ref document: EP Kind code of ref document: A1 |