WO2006011189A1

WO2006011189A1 - Parallel computer

Info

Publication number: WO2006011189A1
Application number: PCT/JP2004/010610
Authority: WO
Inventors: Atsuo Ozaki
Original assignee: Mitsubishi Denki Kabushiki Kaisha
Priority date: 2004-07-26
Filing date: 2004-07-26
Publication date: 2006-02-02
Also published as: JPWO2006011189A1; JP4082439B2

Abstract

It is possible to parallel-process a task within the processing limit time of the task while reducing the power consumption. A parallel computer divides a task into a plurality of processing units and executes the divided processing units in parallel. The parallel computer includes: task dividing means (11) for dividing a task into a plurality of processing units which can be executed in separate processors and outputting the divided processing units as sub-tasks; a sub-task attribute information file (12) for holding attribute information on the sub-tasks divided by the task dividing means (11); processors (14-1 to 14-N) configured so that the power consumption amount is controlled from outside and executing the sub-tasks divided by the task dividing means (11); and processor control means (13) for distributing the sub-tasks divided by the task dividing means (11), to the processors (14-1 to 14-N) according to the sub-task attribute information held in the sub-task attribute information file (12), instructing execution of the sub-tasks, and controlling the power consumption amount of the processors (14-1 to 14-N).

Description

Specification

Parallel computer

Technical field

[0001] The present invention relates to a parallel computer in which a single task is divided into a plurality of execution units and each execution unit is processed in parallel by a plurality of processors, and in particular to a technique for saving power consumption while maintaining the processing capability of the entire parallel computer.

[0002] Further, the present invention relates to a technique for saving power consumption of the entire parallel computer while satisfying a restriction on processing completion time imposed on each task.

Background art

[0003] Portable information devices such as mobile phones and notebook computers are required to be lightweight. However, these devices often incorporate large-capacity batteries to drive processors with a high operating frequency for a long time. The large capacity and the heavy weight of the battery are a major problem in reducing the weight of portable information devices.

[0004] A technique is known that changes the operating frequency of the processor according to the type and content of processing, thereby extending battery life while reducing the capacity and weight of the battery. This is based on the principle that power consumption can be saved by operating the processor at a low operating frequency.

[0005] Even in portable information devices, there is a demand for processing with time constraints, such as multimedia data processing, and real-time processing is often required, as in an embedded system. For example, the technique of Japanese Patent Application Laid-Open No. 2002-99432 (hereinafter referred to as Patent Document 1) is known as a power-saving technique that appropriately changes the operating frequency while targeting processing with a restriction on processing time.

[0006] In this technique, task scheduling is performed while determining whether the processing time requirement of each time-constrained task is satisfied, and when there is slack in the processing time requirement of the tasks as a whole, the processor operating frequency and power supply voltage are changed to save power.

[0007] As a technique for speeding up processing, in addition to operating the processor at a high operating frequency, a method of performing parallel processing by combining a plurality of processors is often used. For example, the technique of Japanese Patent Application Laid-Open No. 2002-215599 (hereinafter referred to as Patent Document 2) is known as a technique for reducing power consumption by controlling the operating frequency of each processor constituting such a multiprocessor system.

[0008] In the method of Patent Document 2, when a plurality of processors are used to process a plurality of tasks and some of the processors complete processing earlier than the others, power consumption is reduced by keeping the power supply voltage low in accordance with the processing completion time of the other processors.

Disclosure of the invention

Problems to be solved by the invention

However, the method in Patent Document 2 is based on the processing completion time of other processors, and is not based on the time constraint of the processing itself. Therefore, the method disclosed in Patent Document 2 cannot be applied to a system that has time constraints on processing.

[0010] On the other hand, the method of Patent Document 1 presupposes a system configured with a single processor. When applied to a multiprocessor system, it clearly cannot be used unless there is no dependency relationship between tasks, which are its minimum processing units, or the influence of such a dependency relationship can be ignored.

[0011] In the field of parallel computers, parallel arithmetic algorithms in which the processors cooperate to solve a single problem (task) have been widely studied. However, these research results cannot be exploited either with the method of Patent Document 1 alone or with a combination of the methods of Patent Documents 1 and 2.

[0012] The present invention was made to solve such problems, and its object is to provide a computer that completes a single task by parallel processing within a requested processing time while reducing power consumption.

Means for solving the problem

[0013] A parallel computer according to the present invention is a parallel computer that divides a task into a plurality of processing units and executes the divided processing units in parallel, and comprises:

task dividing means for dividing the task into a plurality of processing units that can be executed by individual processors and outputting the divided processing units as a plurality of subtasks;

a subtask attribute information file that holds attribute information of the subtasks divided by the task dividing means;

a plurality of processors that are configured so that their power consumption is controlled from outside and that execute the subtasks divided by the task dividing means; and

processor control means for distributing, based on the subtask attribute information held in the subtask attribute information file, the subtasks divided by the task dividing means to the plurality of processors, instructing execution of the subtasks, and controlling the power consumption of the plurality of processors.

[0014] Here, the concept of a subtask naturally includes a partial instruction code sequence formed by dividing the instruction code sequence constituting a task. It also includes the case where, instead of dividing the instruction code itself, the data to be processed by the task is divided into a plurality of processing units.

The invention's effect

As described above, according to the parallel computer of the present invention, the power consumption of each processor is controlled while the subtasks are distributed to a plurality of processors based on the attribute information of the subtasks divided from the task. As a result, the power consumption can be reduced while satisfying the task execution time constraints.

Brief Description of Drawings

FIG. 1 is a block diagram showing a configuration of a parallel computer according to Embodiment 1 of the present invention;

FIG. 2 is a diagram showing the characteristics of the processor of the parallel computer according to Embodiment 1 of the present invention;

FIG. 3 is a flowchart of the parallel computer according to Embodiment 1 of the present invention;

FIG. 4 is a diagram for explaining a method for selecting an execution method according to the first embodiment of the present invention;

FIG. 5 is a diagram showing the relationship of boundary values to be considered when selecting among the execution methods.

FIG. 6 is a diagram showing the relationship between the number of processors and power consumption.

Best mode for carrying out the invention

[0017] Embodiment 1.

FIG. 1 is a block diagram showing a configuration of a parallel computer according to Embodiment 1 of the present invention. In the figure, a task input terminal 10 is an input terminal for inputting a task to be processed by the parallel computer. Here, a task is a unit of work inside the central processing unit (CPU). The term “work” here refers to a predetermined processing unit composed of a combination of computer instruction codes, and the size of a task is often determined from the viewpoint of ease of handling by computer operators and system administrators. However, no matter what processing unit is used to configure one task, the features of the present invention are not lost.

In the figure, it is assumed that a task input terminal 10 is provided so that a task can be input from outside. However, the computer may be configured to autonomously acquire tasks stored in an external storage device under the control of the operating system. Computer systems having such a configuration are very common and need not be explained again here.

The task dividing means 11 is a part that divides a single task input from the task input terminal 10 into a plurality of subtasks.

[0020] The subtask attribute information file 12 is a file for storing additional information about each subtask, and consists of data stored in a random access memory (RAM), a fixed disk device, or another storage device, storage element, or storage circuit. Note that the subtask attribute information file 12 need not physically exist on its own. For example, the attribute information may be stored in the executable program file of a task input from the task input terminal 10 (a binary program file that stores instruction codes and static data), and that file may be handled as the subtask attribute information file 12.

[0021] The control processor 13, serving as the processor control means, refers to the subtask attribute information file 12, distributes the subtasks divided by the task dividing means 11 to the plurality of arithmetic processors 14-1 to 14-N, and then instructs each processor to which a subtask has been distributed to process that subtask. The control processor 13 also controls the power consumption of the arithmetic processors 14-1 to 14-N, with the aim of reducing power consumption while satisfying the task execution time constraints.

[0022] Note that a subtask may be formed either by dividing the instruction code string of the task into instruction code strings having a smaller number of steps, or by dividing the data to be processed by the task into smaller pieces of data. Strictly speaking, when subtasks are formed by dividing instruction code strings one should speak of executing subtasks, and when they are formed by dividing data one should speak of processing subtasks. To simplify the notation, however, the expression “process a subtask” is used uniformly here, and it includes the meaning of “execute a subtask”.

The arithmetic processors 14-1 to 14-N are arithmetic devices or circuits that process the subtasks divided by the task dividing means 11. Furthermore, the arithmetic processors 14-1 to 14-N are configured so that their power consumption can be controlled from outside. As one method of controlling power consumption, the arithmetic processors 14-1 to 14-N may themselves have an interface that directly changes the power consumption, and the power consumption is changed via this interface. Alternatively, if the processors decode and execute the instruction codes of each subtask based on a clock signal input from outside, the power consumption may be changed by changing this clock signal.

FIG. 2 is a diagram showing an example of the characteristics of the arithmetic processors 14-1 to 14-N. As shown in the figure, each of the arithmetic processors 14-1 to 14-N can select at least three operation states: a “high-speed operation state”, a “standard operation state”, and an “idle state”. In the high-speed operation state, the processor operates at an operating frequency of 380 MHz and a voltage of 1.8 V and consumes 0.5 W. In the standard operation state, it operates at an operating frequency of 152 MHz and a voltage of 1.0 V, and its power consumption is 0.053 W. In the idle state, the operating frequency is 33 MHz, the voltage is 1.0 V, and the power consumption is 0.0115 W.

[0025] As the characteristics in this figure show, it is generally known that the power consumption per unit time of an electronic circuit increases as its operating frequency increases. When leakage power is ignored, the relationship between power consumption P, operating frequency F, and power supply voltage V is given by equation (1), where t is the signal transition rate and C is the capacitance.

[0026] $P = t \, C \, F \, V^{2}$ (1)
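As a rough numerical check of equation (1), the following minimal Python sketch evaluates the ratio of power consumption between two operation states under the assumption that the product t·C is the same in both states. The operating points are the FIG. 2 values quoted above; the names `STATES`, `dynamic_power`, and `predicted_ratio` are hypothetical and introduced only for illustration.

```python
# Operating points quoted from FIG. 2: (frequency [Hz], voltage [V], power [W]).
STATES = {
    "high_speed": (380e6, 1.8, 0.5),
    "standard": (152e6, 1.0, 0.053),
    "idle": (33e6, 1.0, 0.0115),
}


def dynamic_power(t, c, f, v):
    """Equation (1): P = t * C * F * V**2, with leakage power ignored."""
    return t * c * f * v ** 2


def predicted_ratio(state_a, state_b):
    """Power ratio P_a / P_b from equation (1), assuming t*C is identical in both states."""
    fa, va, _ = STATES[state_a]
    fb, vb, _ = STATES[state_b]
    return (fa * va ** 2) / (fb * vb ** 2)


def listed_ratio(state_a, state_b):
    """Power ratio P_a / P_b taken directly from the powers listed in FIG. 2."""
    return STATES[state_a][2] / STATES[state_b][2]


if __name__ == "__main__":
    print(predicted_ratio("high_speed", "standard"))
    print(listed_ratio("high_speed", "standard"))
```

For the high-speed and standard states, equation (1) with a constant t·C gives a ratio of about 8.1, while the listed powers give about 9.4, so the equation should be read as an approximation of the trend rather than an exact fit to FIG. 2.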

[0027] Note that the arithmetic processors 14-1 to 14-N are described here as being configured, for example, to switch among three operation states consisting of a “high-speed operation state”, a “standard operation state”, and an “idle state”. However, the processors that can be used in the present invention are not limited to such an example.

[0028] Since the clock speed can vary with the temperature of the environment in which the computer is placed, practical commercially available processors have a margin for variation of the external clock. Such a processor operates faster when the external clock is raised and slower when the external clock is lowered. Therefore, even when using a commercially available processor that, unlike the processor in the above example, does not actively support multiple operation states, the features of the present invention can be used by actively exploiting this margin for external clock variation. Processors whose power consumption can be reduced have recently come into wide use in mobile applications and are well known in the art, so no further detail is given here.

[0029] Note that the arithmetic processors 14-1 to 14-N should not be construed as being limited to, for example, independent LSI components. For example, a vector processor is a single arithmetic device but can execute a plurality of operations in parallel, and the configuration of the computer shown in FIG. 1 includes such a configuration. Needless to say, the control processor 13 and the arithmetic processors 14-1 to 14-N can each be replaced with a complete computer such as a personal computer or a workstation. That is, the present invention can also be applied to a parallel computing system in which a plurality of computers are combined.

It should be noted that the task dividing means 11 may be configured as an independent control circuit or control device, or may be configured as a computer program executed by the control processor 13.

[0031] Further, if the control processor 13 is regarded as the fetch unit of a general processor architecture, and the task and subtasks are regarded as a machine-language instruction code and microcode in the processor, the entire system shown in FIG. 1 can be regarded as a single processor. In this case, array processing by a vector operation is regarded as a task, and the processing of each element of the array is regarded as a plurality of subtasks. The task dividing means 11 then corresponds to a compiler (language processor) that supports vector processing instructions, a so-called vectorizing compiler, and to a decoder that decodes vector operation instructions into microcode. Such compiler technology is already known. In addition, instead of processing units determined at the level of the processor architecture, the relationship between processes and threads may be considered to correspond to the relationship between tasks and subtasks. In this case, the relationship between tasks and subtasks is defined flexibly based on the system design. In this way, the configuration in FIG. 1 can be applied at various levels.

[0032] Next, the operation of the parallel computer according to the first embodiment of the present invention will be described. FIG. 3 is a flowchart showing the operation of this parallel computer. When a task to be executed is input from the task input terminal 10, the task dividing means 11 divides the task into subtasks (step ST101). Subsequently, the control processor 13 acquires a task processing time limit T (step ST102). The processing time limit T is a value predetermined by the system.

[0033] For example, in the case of processes and threads, T is determined by the purpose of the user or the system. If the system performs signal processing on input signals (for example, some kind of observation values) that arrive at fixed intervals (the sampling time), the sampling time, which is the period at which these signals are acquired, corresponds to the processing time limit T.

[0034] Further, the processing time limit T may be determined from the configuration of the parallel computer without the processing time limit being determined from the external specification. For example, when configuring a processor that completes most instructions within one clock with an external clock, the time limit corresponding to one external clock is the processing time limit T.

[0035] Subsequently, the control processor 13 calculates the task processing time tmin for the case where the arithmetic processors 14-1 to 14-N are set to the high-speed operation state (step ST103). To realize this step, the estimated processing completion time of each subtask must be known in advance. Therefore, for example, the processing time of each subtask on any of the arithmetic processors 14-1 to 14-N in the high-speed operation state and in the standard operation state is measured in advance and stored in the subtask attribute information file 12. The control processor 13 then obtains the processing time of each subtask according to its type and calculates the task processing time tmin.

[0036] Note that the processing time of a subtask may be measured for only one of the high-speed operation state and the standard operation state. In that case, the processing time in the other state may be approximated by multiplying the measured processing time by the ratio of the operating frequency of the measured state to the operating frequency of the other state.
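The frequency-ratio approximation described above can be sketched as follows. This is a minimal illustration, not the method of the specification itself: the function names are hypothetical, and the single-processor task time is assumed here to be the plain sum of the subtask times, with scheduling and communication overheads ignored.

```python
def scale_processing_time(measured_time, measured_freq_hz, target_freq_hz):
    """Approximate the processing time at target_freq_hz from a time measured at measured_freq_hz.

    Assumes the processing time is inversely proportional to the operating frequency.
    """
    return measured_time * measured_freq_hz / target_freq_hz


def single_processor_task_time(subtask_times):
    """Task time when one processor runs every subtask in sequence (assumption: no overhead)."""
    return sum(subtask_times)


if __name__ == "__main__":
    # Subtask times measured in the standard operation state (152 MHz), in milliseconds.
    standard_times_ms = [10.0, 20.0, 5.0]
    high_speed_times_ms = [
        scale_processing_time(t, 152e6, 380e6) for t in standard_times_ms
    ]
    print(single_processor_task_time(standard_times_ms))
    print(single_processor_task_time(high_speed_times_ms))
```

With the FIG. 2 frequencies, a subtask measured at 152 MHz is estimated to take 152/380 = 0.4 times as long at 380 MHz.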

[0037] As a result, if tmin is less than the processing time limit T (step ST104: Yes), the processing capacity of the arithmetic processors 14-1 to 14-N exceeds that required by the subtasks to be processed. Since there is processing capacity to spare, the process moves to the power saving processing from step ST105 onward.

[0038] On the other hand, if tmin does not fall below the processing time limit T, priority must be given to high-speed processing rather than to power saving, so the arithmetic processors 14-1 to 14-N are set to the high-speed operation state (step ST106: execution method 1). The process then proceeds to step ST111. The processing from step ST111 onward will be described later.

[0039] In step ST105, the control processor 13 calculates the task processing time tstd for the case where any one of the arithmetic processors 14-1 to 14-N is set to the standard operation state and all subtasks are executed only by that processor. As with tmin in step ST103, tstd is calculated from the processing times of the subtasks. If tstd exceeds T (step ST107: Yes), no single one of the arithmetic processors 14-1 to 14-N can satisfy the requirement of completing the task within the processing time limit T, so the process proceeds to parallel processing using a plurality of processors from step ST109 onward.

[0040] On the other hand, if tstd does not exceed T, a single processor alone can satisfy the requirement of completing the task within the processing time limit T. Therefore one processor, for example the arithmetic processor 14-1, is set to the standard operation state (step ST108), and the remaining processors, that is, the arithmetic processors 14-2 to 14-N, are set to the idle state.

By so doing, it is possible to simultaneously achieve a reduction in power consumption while satisfying the requirement for real-time processing of completing task processing within a predetermined processing time limit.
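The branching of steps ST104 and ST107 described above can be summarized in a short Python sketch. It takes tmin, tstd, and T as already-computed inputs; the `Branch` enum and the `choose_branch` function are hypothetical names introduced for illustration and are not part of the specification.

```python
from enum import Enum


class Branch(Enum):
    ALL_HIGH_SPEED = "set processors to high-speed state (step ST106, execution method 1)"
    SINGLE_STANDARD = "one processor in standard state, others idle (step ST108)"
    PARALLEL_SELECTION = "select a further execution method (step ST109)"


def choose_branch(t_min, t_std, limit):
    """Mirror the comparisons of steps ST104 and ST107.

    t_min: task time with the processors in the high-speed operation state
    t_std: task time with one processor in the standard operation state
    limit: processing time limit T
    """
    # Step ST104: no slack even at high speed, so speed takes priority over power saving.
    if not t_min < limit:
        return Branch.ALL_HIGH_SPEED
    # Step ST107: one standard-state processor alone cannot meet the limit.
    if t_std > limit:
        return Branch.PARALLEL_SELECTION
    # Step ST108: one standard-state processor is sufficient.
    return Branch.SINGLE_STANDARD


if __name__ == "__main__":
    print(choose_branch(t_min=0.3, t_std=0.9, limit=1.0))
    print(choose_branch(t_min=0.3, t_std=2.0, limit=1.0))
    print(choose_branch(t_min=1.2, t_std=5.0, limit=1.0))
```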

On the other hand, if tstd exceeds T, one of the following processing methods (execution methods 3 to 5) is selected based on the nature of the subtasks and the characteristics of each processor (operating frequency, power consumption), and the number of processors n and the operating frequency used for subtask processing are calculated according to the selected method (step ST109).

[0042] Execution method 3:

One of the arithmetic processors 14-1 to 14-N is selected, the operating frequency of the selected processor is set to the operating frequency β of the high-speed operation state, and the subtasks are executed by this processor. The arithmetic processors other than the selected one are set to the idle state.

[0043] Execution method 4:

From the arithmetic processors 14-1 to 14-N, n processors (2 ≤ n ≤ N) are selected, the operating frequency of the selected processors is set to the operating frequency α of the standard operation state, and the subtasks are executed by the selected n processors. The arithmetic processors other than the selected n processors are set to the idle state.

[0044] Execution method 5:

From the arithmetic processors 14-1 to 14-N, m processors (m < n ≤ N) are selected, the operating frequency of the selected processors is set to the operating frequency β of the high-speed operation state, and the subtasks are executed by the selected m processors. All processors other than the selected m processors are set to the idle state.

[0045] Next, a method for selecting one of execution method 3, execution method 4, and execution method 5 will be described.

FIG. 4 shows example time charts of execution method 3 and execution method 4 within the processing constraint time (T). Since the two differ only in the part within the bold frame, the power consumption of this part is compared. In the case of FIG. 4, the processing time of execution method 4 is longer than that of execution method 3, so the processing constraint time (T) can be expressed as equation (2). Here, Tc (= TS + TR) is the time required for one communication process, which is the sum of the transmission processing time TS and the reception processing time TR. Tα is the execution time when the processing data is processed by a single processor at the operating frequency α, and n is the number of processors.

[0047] $T = (n-1) \, T_c + \dfrac{T_\alpha}{n}$ (2)

[0048] Equation (3) shows the power consumption C2 [W·s] of execution method 3 in this case. Here, the first term of the final form of equation (3) is the power consumed in processing the data at the operating frequency β, and the remaining second term is the power consumed by the idle processors (FIG. 4: the arithmetic processors other than 14-1) and by the processor that performed the data processing (FIG. 4: arithmetic processor 14-1) during its idle period after the data processing. Here, k = α/β.

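[0049] $C_2 = P_\beta \, k \, T_\alpha + k \, P_\gamma \, T_\alpha \, (n-1) + n \, P_\gamma \left[ T_\alpha \left( \dfrac{1}{n} - k \right) + (n-1) \, T_c \right]$

$\quad = P_\beta \, k \, T_\alpha + P_\gamma \left[ (1-k) \, T_\alpha + n \, (n-1) \, T_c \right]$ (3)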

[0050] Similarly, equation (4) shows the power consumption C3 [W·s] of execution method 4 in this case. Here, the first term of the final form of equation (4) is the sum of the power consumed by communication processing and the power consumed in all idle states, and the second term is the power consumed by data processing.

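[0051] $C_3 = (n-1) \, P_\alpha \, T_c + \dfrac{1}{n} \, P_\alpha \, T_\alpha + (n-1) \left[ P_\alpha \, T_c + \dfrac{1}{n} \, P_\alpha \, T_\alpha + (n-2) \, P_\gamma \, T_c \right]$

$\quad = (n-1) \left[ 2 \, P_\alpha + (n-2) \, P_\gamma \right] T_c + P_\alpha \, T_\alpha$ (4)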
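Equations (2) to (4) can be checked numerically. The following Python sketch implements both the expanded and the simplified forms of C2 and C3 so that their agreement can be confirmed for arbitrary parameter values. The function and parameter names are hypothetical; `p_alpha`, `p_beta`, and `p_gamma` stand for the power consumption in the standard, high-speed, and idle states, and the sample values are the FIG. 2 figures with an assumed Tc and Tα.

```python
def time_limit(n, tc, t_alpha):
    """Equation (2): T = (n - 1) * Tc + T_alpha / n."""
    return (n - 1) * tc + t_alpha / n


def c2_expanded(n, tc, t_alpha, k, p_beta, p_gamma):
    """Equation (3), first line: energy of execution method 3."""
    busy = p_beta * k * t_alpha                        # one processor working at frequency beta
    others_idle = k * p_gamma * t_alpha * (n - 1)      # remaining n - 1 processors idle meanwhile
    all_idle = n * p_gamma * (t_alpha * (1 / n - k) + (n - 1) * tc)  # all idle until T
    return busy + others_idle + all_idle


def c2_simplified(n, tc, t_alpha, k, p_beta, p_gamma):
    """Equation (3), second line."""
    return p_beta * k * t_alpha + p_gamma * ((1 - k) * t_alpha + n * (n - 1) * tc)


def c3_expanded(n, tc, t_alpha, p_alpha, p_gamma):
    """Equation (4), first line: energy of execution method 4."""
    first = (n - 1) * p_alpha * tc + p_alpha * t_alpha / n
    rest = (n - 1) * (p_alpha * tc + p_alpha * t_alpha / n + (n - 2) * p_gamma * tc)
    return first + rest


def c3_simplified(n, tc, t_alpha, p_alpha, p_gamma):
    """Equation (4), second line."""
    return (n - 1) * (2 * p_alpha + (n - 2) * p_gamma) * tc + p_alpha * t_alpha


if __name__ == "__main__":
    # FIG. 2 powers; Tc and T_alpha are assumed values chosen only for illustration.
    n, tc, t_alpha = 4, 0.01, 1.0
    k, p_alpha, p_beta, p_gamma = 152 / 380, 0.053, 0.5, 0.0115
    print(time_limit(n, tc, t_alpha))
    print(c2_expanded(n, tc, t_alpha, k, p_beta, p_gamma),
          c2_simplified(n, tc, t_alpha, k, p_beta, p_gamma))
    print(c3_expanded(n, tc, t_alpha, p_alpha, p_gamma),
          c3_simplified(n, tc, t_alpha, p_alpha, p_gamma))
```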

[0052] Setting C2 = C3, equation (5) can be derived from equations (3) and (4). C2 = C3 holds when the power consumption of the two execution methods is equal. The parameter values satisfying C2 = C3 form the boundary, and when the parameters take any other values, one of the two execution methods is advantageous. Here, ρ represents the ratio of communication processing time to data processing time (Tc/Tα).

[0053] $\rho = \dfrac{k \, P_\beta - P_\alpha + P_\gamma \, (1-k)}{2 \, (n-1) \, (P_\alpha - P_\gamma)}$ (5)

[0054] By comparing ρ calculated from equation (5) with ρ3, the value of ρ for the power-saving execution of the target task, the superiority or inferiority of execution methods 3 and 4 can be determined: execution method 3 should be applied if ρ < ρ3, and execution method 4 should be applied if ρ > ρ3. The discussion so far relates to the case where the processing time of execution method 4 is longer than that of execution method 3, as in FIG. 4. In the opposite case, equations (3) and (4) take a different form, but the same equation (5) is derived. However, when n = 2 or 3, depending on the magnitude relationship between the transmission processing time TS and the reception processing time TR, the time chart of execution method 4 shown in FIG. 4 may not hold. Even so, assuming TS = TR, the boundary is given by equation (5) even when n = 2 or 3.
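The boundary between execution methods 3 and 4 can be illustrated with a short Python sketch. The boundary expression used here is the one obtained by setting C2 = C3 in equations (3) and (4), namely ρ = {k·Pβ − Pα + Pγ·(1 − k)} / {2·(n − 1)·(Pα − Pγ)}. The function names are hypothetical, and `rho_task` stands for the communication-to-processing ratio Tc/Tα of the task being scheduled.

```python
def rho_boundary(n, k, p_alpha, p_beta, p_gamma):
    """Value of rho = Tc / T_alpha at which execution methods 3 and 4 consume equal energy."""
    numerator = k * p_beta - p_alpha + p_gamma * (1 - k)
    denominator = 2 * (n - 1) * (p_alpha - p_gamma)
    return numerator / denominator


def prefer_method4(rho_task, n, k, p_alpha, p_beta, p_gamma):
    """True when parallel standard-speed execution (method 4) uses less energy than method 3.

    Method 4 wins when the task's own ratio lies below the boundary value.
    """
    return rho_task < rho_boundary(n, k, p_alpha, p_beta, p_gamma)


if __name__ == "__main__":
    # Parameters taken from FIG. 2; k = alpha / beta = 152 MHz / 380 MHz.
    k, p_alpha, p_beta, p_gamma = 152 / 380, 0.053, 0.5, 0.0115
    for n in (2, 5, 10, 20):
        print(n, rho_boundary(n, k, p_alpha, p_beta, p_gamma))
    print(prefer_method4(0.05, 20, k, p_alpha, p_beta, p_gamma))
```

The boundary falls as n grows, because each additional processor adds communication and idle cost to execution method 4.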

[0055] FIG. 5 shows the value of ρ with respect to the number of arithmetic processors (n ≥ 2) when the values in FIG. 2 are given to the parameters on the right side of equation (5). The superiority or inferiority of execution methods 3 and 4 can be determined from FIG. 5 by analyzing the target task and finding the optimal number of processors for power saving in execution method 4 and the value of ρ in that case.

[0056] FIG. 6 shows the ratio (E3/E4) of the power consumption of execution method 3 to that of execution method 4 for suitable values of ρ (≤ 0.05). If ρ ≤ 0.05, the effect of parallel processing is always obtained when the number of processors is in the range of 2 to 20. From this result (FIG. 6), it can be confirmed that when the value of ρ is constant, this ratio decreases as the number of processors increases. Conversely, if ρ decreases, this ratio increases. Therefore, if ρ becomes smaller as the number of arithmetic processors increases, the rate of decrease of this ratio becomes smaller in that situation.

[0057] In this way, the relationship between the number of processors and ρ as shown in FIG. 5 is obtained in advance from the ratio of communication processing time to data processing time and from the operating frequency and power consumption of the arithmetic processors, and is stored in a storage area such as the subtask attribute information file 12. In step ST109, the control processor 13 selects execution method 3 or execution method 4 from the relationship of equation (5).

[0058] In the above example, the communication processing for distributing the subtasks to the arithmetic processors 14-1 to 14-N was taken as the dependency relationship between subtasks. It is easy to extend this to other dependency relationships and derive relationships corresponding to equations (3) to (5).

[0059] Regarding the selection between execution methods 3 and 5, since both methods use the same operating frequency, execution method 3 is selected if both methods complete within the time limit. This is because less power is consumed when fewer processors are used.

[0060] Further, regarding the selection of execution methods 4 and 5, if execution method 4 requires more processing time than execution method 5, the power consumption C5 of execution method 5 is as follows.

[0061] $C_5 = (m-1) \left[ 2 \, P_\beta + (m-2) \, P_\gamma \right] k \, T_c + P_\beta \, T_\beta + P_\gamma \left\{ T_c \left[ n \, (n-1) - k \, m \, (m-1) \right] + T_\alpha \, (1-k) \right\}$ (6)

[0062] Here, the first and second terms of equation (6) are the power consumed by the processors to which processing is assigned, and the third and fourth terms are the power consumed by the idle processors and by the processors that were assigned processing but are idle while waiting after their processing is complete.

[0063] Therefore, the difference in power consumption C5 − C3 between execution method 5 and execution method 4 is as follows.

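[0064] $C_5 - C_3 = T_c \left\{ 2 \, k \, (m-1) \, (P_\beta - P_\gamma) - 2 \, (n-1) \, (P_\alpha - P_\gamma) \right\} + T_\alpha \left\{ k \, (P_\beta - P_\gamma) - (P_\alpha - P_\gamma) \right\}$ (7)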
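The consistency of equations (4), (6), and (7) can be confirmed numerically with the Python sketch below. It assumes Tβ = k·Tα, that is, that the data processing time at the high-speed frequency β is the time at α scaled by k = α/β; this assumption is what makes equation (7) follow from equations (4) and (6). All function names are hypothetical.

```python
def c3(n, tc, t_alpha, p_alpha, p_gamma):
    """Equation (4): energy of execution method 4 (n processors at frequency alpha)."""
    return (n - 1) * (2 * p_alpha + (n - 2) * p_gamma) * tc + p_alpha * t_alpha


def c5(n, m, tc, t_alpha, k, p_beta, p_gamma):
    """Equation (6): energy of execution method 5 (m processors at frequency beta)."""
    t_beta = k * t_alpha  # assumption: processing time scales with k = alpha / beta
    busy = (m - 1) * (2 * p_beta + (m - 2) * p_gamma) * k * tc + p_beta * t_beta
    idle = p_gamma * (tc * (n * (n - 1) - k * m * (m - 1)) + t_alpha * (1 - k))
    return busy + idle


def c5_minus_c3(n, m, tc, t_alpha, k, p_alpha, p_beta, p_gamma):
    """Equation (7): a negative value means execution method 5 consumes less energy."""
    comm = tc * (2 * k * (m - 1) * (p_beta - p_gamma) - 2 * (n - 1) * (p_alpha - p_gamma))
    proc = t_alpha * (k * (p_beta - p_gamma) - (p_alpha - p_gamma))
    return comm + proc


if __name__ == "__main__":
    # FIG. 2 powers; n, m, Tc, and T_alpha are assumed values for illustration only.
    n, m, tc, t_alpha = 8, 3, 0.02, 1.0
    k, p_alpha, p_beta, p_gamma = 152 / 380, 0.053, 0.5, 0.0115
    direct = c5(n, m, tc, t_alpha, k, p_beta, p_gamma) - c3(n, tc, t_alpha, p_alpha, p_gamma)
    print(direct, c5_minus_c3(n, m, tc, t_alpha, k, p_alpha, p_beta, p_gamma))
```

Setting m = 1 in equation (6) reproduces the simplified form of equation (3), which matches the observation that execution method 3 is the single-processor case of high-speed execution.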

[0065] Use this equation (7) to determine the superiority or inferiority of execution method 4 and execution method 5. Even if execution method 5 requires more processing time than execution method 4, equation (6) is different, but the same equation (7) is derived.

[0066] Finally, the control processor 13 distributes the subtasks to the arithmetic processors 14-1 to 14-N based on the execution method determined in step ST109, and instructs the execution of the subtasks (step ST110).

Thus, according to the parallel computer of the first embodiment of the present invention, the task is divided into subtasks, one of the execution methods described above is selected based on the dependency relationship between the subtasks, and the subtasks are executed accordingly. The total power consumption of the plurality of processors can therefore be reduced while satisfying the processing constraint time of the task.

[0068] In the above description, it is assumed that the control processor 13 is a dedicated processor for distributing subtasks. However, the load on the control processor 13 may be lower than that on the arithmetic processors 14-1 to 14-N. The control processor 13 may therefore be configured to also serve as one of the arithmetic processors 14-1 to 14-N.

Industrial applicability

The present invention can be widely applied to a computer processing system for parallel operations such as a parallel computer system having a plurality of computers in a cluster configuration or a parallel processing processor having a plurality of operation instruction processing units.

Claims

The scope of the claims

[1] In a parallel computer that divides a task into multiple processing units and executes the divided processing units in parallel.

A task dividing means for dividing the task into a plurality of processing units that can be executed by individual processors, and outputting the divided processing units as a plurality of subtasks;

A parallel computer characterized by comprising:

[2] In the parallel computer according to claim 1,

The subtask attribute information file holds the processing time of each subtask as attribute information.

[3] In the parallel computer according to claim 2,

Some or all of the plurality of processors are configured as a first processor that operates in a standard operating state, and

a second processor that can be set to operate in either the standard operating state or an idle state that consumes less power than the standard operating state,

The processor control means obtains the processing time of the subtasks from the subtask attribute information file, calculates the task processing time on the assumption that the subtasks are processed using only the first processor, and, when the calculated task processing time is shorter than a predetermined time, sets the second processor to the idle state, distributes the subtasks only to the first processor, and instructs execution of the subtasks.

[4] In the parallel computer according to claim 3,

The first processor is configured as a processor that can be set to either the standard operating state or a high-speed operating state in which it operates at a higher speed than in the standard operating state.

The processor control means obtains the processing time of the subtasks from the subtask attribute information file, calculates the task processing time on the assumption that the subtasks are processed using only the first processor set to the high-speed operating state, and, when the calculated task processing time is shorter than the predetermined time, sets the second processor to the idle state, sets the first processor to the high-speed operating state, distributes the subtasks only to the first processor, and instructs execution of the subtasks.

[5] In the parallel computer according to claim 1,

The subtask attribute information file stores a dependency relationship between each subtask and other subtasks.

[6] In the parallel computer according to claim 5,

The subtask attribute information file stores the relationship between the number of processors and the time to distribute subtasks to these processors as a dependency between the subtask and other subtasks.

The processor control means distributes the subtasks to the plurality of processors and controls the power consumption of the processors, based on the relationship, held in the subtask attribute information file, between the number of processors and the time for distributing the subtasks to these processors.