WO2012105230A1 - Distributed system, device, method, and program - Google Patents
- Publication number
- WO2012105230A1 (PCT/JP2012/000605)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- normal
- power saving
- state
- nodes
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/329—Power saving characterised by the action undertaken by task scheduling
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to a distributed system having a plurality of nodes, and more particularly to a distributed system that saves power of the entire distributed system when the plurality of nodes are operated as one distributed system.
- a distributed system has several tens to thousands of nodes (computers and storage devices), each including a processor and a storage medium, which are connected by a network and used as one system. It provides processing capacity and storage capacity that cannot be obtained with a single node.
- one example is a distributed storage system, in which nodes are connected to a network and data is stored and used on an HDD (hard disk drive) or in memory.
- the choice of the computer on which data is placed and the computer that performs processing is implemented by software or special hardware, and operation is changed dynamically according to the system status. Resource usage is thereby adjusted to improve performance for system users (client computers).
- the power consumption of one node constituting the distributed system is about 150 W.
- the power consumption of the entire system therefore becomes enormous, which is a problem.
- the amount of load on the system changes from moment to moment, and there is not always a load that makes full use of all nodes.
- the number of nodes required in the system varies with time, such as using 10% of the entire system at some times and 90% of the entire system at other times.
- Patent Document 1 describes a cluster system that reduces power consumption by shifting an idle node that is not executing a job to a suspended state.
- jobs are assigned to multiple nodes and executed, and when a job is completed, the next job is assigned to multiple nodes; such job assignment is managed by the system.
- Patent Document 2 discloses a job management method for executing a job given to the system with low power consumption in a computer system such as the above-described supercomputer.
- in this method, the status of jobs submitted to the system is held, the future job execution times and the number of nodes necessary for job execution are determined, the nodes necessary for each job are prepared before the job starts, and nodes that are not required for job execution are stopped, thereby reducing the power consumption of the entire system.
- the node stop used in the above-mentioned prior art usually takes a stop state defined by ACPI (Advanced Configuration and Power Interface) (Non-Patent Document 1).
- in these techniques, the time required for node return is concealed by returning stopped nodes in advance based on the job execution schedule.
- this is effective for a system such as a supercomputer that processes jobs in an execution queue, where one job is processed cooperatively by several hundred nodes over a period on the order of several tens of minutes (parallel processing).
- however, when processing requests to the system cannot be predicted, as in a distributed storage system, or when there are many small jobs, or when a large job is divided into small tasks and distributed to many nodes (distributed processing), the time required for return cannot be concealed.
- in a distributed system having a plurality of nodes, the present invention suppresses the power consumption of the entire distributed system by shifting nodes to a stopped state while also suppressing the reduction in processing performance when the load increases. It is an object of the present invention to provide such a distributed system, information processing apparatus, distribution method, and distributed program.
- the distributed system includes a normal node having a plurality of power saving states with different return times to a normal operation state, and a management node that assigns a job to the normal node and causes it to execute the job.
- the management node includes a node selection unit that selects, from among normal nodes in a power saving state, a normal node to which a job is to be assigned, and a node control unit that controls the normal node selected by the node selection unit to return to the normal operation state.
- the node selection unit selects normal nodes in order, starting from the power saving state with the shortest time to return to the normal operation state.
- the distributed system may also comprise a normal node including a power supply control means and a task execution means, and a management node including a job reception means that receives a job execution command, a job management means that decomposes the job received by the job reception means into one or a plurality of tasks and causes one or a plurality of normal nodes to execute the tasks, and a node power control means that manages and controls the power state of the normal nodes.
- the job management means determines the number of normal nodes that execute the tasks obtained by decomposing the job, according to the amount of jobs accepted by the management node.
- when the number of normal nodes in the normal activation state is less than the number of normal nodes determined by the job management means, the node power control means uses an activation node selection function to select normal nodes from one or more of the multi-stage power saving states until the number of normal nodes needed to execute the tasks is satisfied, and issues an activation instruction to the selected normal nodes.
- the job management means issues a task execution instruction to normal nodes that can execute a task and to normal nodes that have returned to the normal activation state in accordance with the activation instruction issued by the node power control means.
- the task execution means of the normal node executes a task in accordance with the task execution instruction issued by the job management means.
- an information processing apparatus according to the present invention assigns a job to a normal node and causes it to execute the job in a distributed system. It includes node selection means for selecting, from among normal nodes in a power saving state, a normal node to which a job is to be assigned, the normal nodes having a plurality of power saving states with different return times to the normal operation state, and node control means for controlling the normal node selected by the node selection means to return to the normal operation state.
- the node selection means selects normal nodes in order, starting from the power saving state with the shortest time to return to the normal operation state.
- the distribution method according to the present invention is a distribution method in which a job is assigned to a normal node and executed. A normal node to which a job is to be assigned is selected from among normal nodes in a power saving state, the normal nodes having a plurality of power saving states with different return times to the normal operation state, and the selected normal node is controlled to return to the normal operation state.
- when the normal node is selected, the selection is made in order, starting from the normal nodes in the power saving state with the shortest time to return to the normal operation state.
- a distributed program according to the present invention is a program for assigning a job to a normal node and causing it to execute the job in a distributed system in which normal nodes have a plurality of power saving states with different return times to the normal operation state.
- the program causes a computer to execute node selection processing for selecting, from among normal nodes in a power saving state, a normal node to which a job is to be assigned, and node control processing for controlling the selected normal node to return to the normal operation state.
- in the node selection processing, the selection is made in order, starting from the normal nodes in the power saving state with the shortest time to return to the normal operation state.
- according to the present invention, the power consumption of the entire distributed system is suppressed by shifting nodes to a stopped state, while the deterioration of processing performance when the load increases is also suppressed.
- FIG. 1 is a block diagram illustrating a configuration example of a distributed system according to the present embodiment.
- the distributed system includes one or more client nodes 100, one or more management nodes 200, and one or more normal nodes 300 connected to a network having an access path determination unit.
- one client node 100 and one management node 200 are shown, but a plurality of client nodes 100 and management nodes 200 may be included.
- each normal node 300 has its own node number (node 001 to node XXX). Specifically, the normal node 300 stores the node number in its storage unit.
- the client node 100 is a node that requests job execution.
- a job execution request issued by the client node 100 is transmitted to the management node 200 via the network.
- the management node 200 includes a job reception unit 210, a job control unit 220, and a node power supply control unit 230.
- the job receiving unit 210 has a function of receiving a job execution request issued by the client node 100. Specifically, the job receiving unit 210 receives a job execution request transmitted from the client node 100 via the network. Hereinafter, the client node 100 is said to issue a job execution request; specifically, this means that the client node 100 transmits a job execution request to the management node 200 via the network.
- the job control unit 220 has a function of dividing the job received by the job receiving unit 210 into unit tasks that can each be executed by a normal node 300 and requesting the normal nodes 300 to execute the tasks. Specifically, the job control unit 220 divides the job indicated by the job execution request received by the job receiving unit 210 into tasks that can be executed by the normal nodes 300, and then sends a task execution request to the normal nodes 300 via the network.
- the node power control unit 230 has functions of managing the power state of the normal node 300, determining the normal node 300 that performs power control such as node stop / return, and issuing a power control request. Specifically, the node power control unit 230 transmits a power control request to the normal node 300 that performs power control via the network.
- hereinafter, a power control request is said to be issued; more specifically, the power control request is transmitted to the normal node 300 via the network.
- the power control request issued by the management node 200 is either a stop command for stopping the normal node 300 in a power saving state (hereinafter also referred to as a node stop command) or a return command for returning the normal node 300 from a power saving state to an activated state (hereinafter also referred to as a node return command). The activated state is, for example, the execution state or the idle state described later, and is also referred to as the normal operation state or the normal activation state.
- FIG. 2 is a block diagram illustrating a configuration example of the management node 200.
- the job control unit 220 includes a job decomposition unit 221, a task placement determination unit 222, and a command notification unit 223.
- the job decomposition unit 221 has a function of dividing a job into unit tasks that can each be executed by a normal node 300. Specifically, the job decomposition unit 221 divides the job indicated by the job execution request received by the job receiving unit 210 into units of tasks that can be executed by the normal node 300.
- the task placement determination unit 222 has a function of determining which normal node 300 is to execute the task decomposed by the job decomposition unit 221.
- the command notification unit 223 has a function of notifying the normal node 300 of a task execution command and a power control request according to the determination of the task placement determination unit 222.
- hereinafter, the management node 200 is said to issue a task execution instruction; specifically, the management node 200 transmits the task execution instruction to the normal node 300 via the network.
- the node power control unit 230 includes a power control node determination unit 231.
- the power control node determination unit 231 has a function of determining the normal node 300 to be subjected to power control and the power state to which it is to be shifted. It does so when a power control request (or a node activation request described later) is received from the job control unit 220, or at every predetermined period.
- the management node 200 includes a storage unit 240.
- the storage unit 240 is shared by the job control unit 220 and the node power supply control unit 230. It stores task placement information for managing the task distribution status and the task execution status of each normal node 300, and node state information for managing the power state of each normal node 300.
- the normal node 300 includes a communication unit 310, a task execution unit 320, a return command receiving unit 330, and a power supply control unit 340.
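- as an illustration of the two kinds of information held in the storage unit 240, the following is a minimal sketch. The class name `StorageUnit`, its method names, and the state labels ("idle", "executing", "stop_level_1") are assumptions made for this example and are not defined in the original text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageUnit:
    """Hypothetical in-memory stand-in for storage unit 240."""
    # Task placement information: node number -> task in execution (None if none).
    task_placement: Dict[str, Optional[str]] = field(default_factory=dict)
    # Node state information: node number -> power state label (labels are assumed).
    node_state: Dict[str, str] = field(default_factory=dict)

    def assign_task(self, node: str, task: str) -> None:
        """Record that a task was placed on a node."""
        self.task_placement[node] = task
        self.node_state[node] = "executing"

    def complete_task(self, node: str) -> None:
        """Record a task completion notification: the node no longer runs a task."""
        self.task_placement[node] = None
        self.node_state[node] = "idle"

    def idle_nodes(self) -> List[str]:
        """Node numbers currently in the idle state."""
        return sorted(n for n, s in self.node_state.items() if s == "idle")
```

- keeping both tables in one object mirrors the statement that the storage unit is used by both the job control unit 220 and the node power supply control unit 230; a real implementation could just as well separate them.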
- the communication unit 310 has a function of receiving a task execution command issued by the management node 200 and a stop command in the power control request.
- the task execution unit 320 has a function of executing a task based on the task execution command received by the communication unit 310.
- the return command receiving unit 330 has a function of receiving a return command in the power control request issued by the management node 200.
- the power supply control unit 340 has a function of performing power supply control in accordance with a stop command received by the communication unit 310 and a return command received by the return command receiving unit 330.
- a single normal node 300 can process a predetermined number of tasks simultaneously; here, to simplify the explanation, it is assumed that one task can be executed at a time.
- FIG. 3 is a state transition diagram for power control of a normal node.
- the normal node 300 can shift to three types of power saving states (stop state level 1, stop state level 2, and stop state level 3) with different power saving effects.
- one example of the power saving states is the set of states defined by ACPI (Advanced Configuration and Power Interface). For example, the normal node 300 shifts to the following states:
- stop state level 1: S1 defined by ACPI (processor power supply stopped)
- stop state level 2: S3 defined by ACPI (power is supplied only to the memory)
- stop state level 3: S4 defined by ACPI (memory contents saved, all power supply stopped)
- electricity is supplied to the return command receiving unit 330, and the return command receiving unit 330 can always receive the return command.
- even if the normal node 300 is in the task execution state rather than the idle state, it can hold data indicating the execution state in a storage device such as a memory or a disk device and shift to a power saving state via the idle state.
- the power saving effect increases in the order of S1, S3, S4. That is, it is smallest for S1, next smallest for S3, and largest for S4.
- the state transition time for shifting from the idle state to a power saving state, or for returning from a power saving state to the idle state, increases in the order of S1, S3, S4. That is, S1 is the shortest, S3 is the next shortest, and S4 is the longest. In other words, a power saving state with a longer state transition time has a higher power saving effect.
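- the ordering of the stop states can be captured as a small enumeration. The following is a minimal sketch; the names `PowerState`, `ACPI_STATE`, and `fastest_to_return` are illustrative assumptions, and no concrete power or transition-time figures are assumed.

```python
from enum import IntEnum


class PowerState(IntEnum):
    """Power states of a normal node; a larger value means a deeper stop state."""
    IDLE = 0          # activated, not executing a task
    STOP_LEVEL_1 = 1  # ACPI S1: processor power supply stopped
    STOP_LEVEL_2 = 2  # ACPI S3: power supplied only to the memory
    STOP_LEVEL_3 = 3  # ACPI S4: memory contents saved, all power supply stopped


# Hypothetical lookup from stop state level to the ACPI state named in the text.
ACPI_STATE = {
    PowerState.STOP_LEVEL_1: "S1",
    PowerState.STOP_LEVEL_2: "S3",
    PowerState.STOP_LEVEL_3: "S4",
}


def fastest_to_return(states):
    """Order stop states from shortest to longest return time (S1, S3, S4)."""
    return sorted(s for s in states if s != PowerState.IDLE)
```

- because deeper levels both save more power and take longer to return, a single integer ordering is enough to express both properties in this sketch.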
- FIG. 4 shows an example of the power consumption of each state and the time taken for each state transition.
- task placement information and node state information held by the management node 200 in the distributed system will be described.
- the task placement information and the node state information are stored in the storage unit 240 of the management node 200.
- FIG. 5 is a sequence diagram showing the flow of job execution.
- the client node 100 transmits a job execution request to the management node 200.
- the management node 200 that has received the job execution request from the client node 100 causes the job disassembly unit 221 to disassemble the received job into one or more tasks [job disassembly process shown in FIG. 5].
- the accepted job is a job indicated by the received job execution request.
- because some of the normal nodes 300 are stopped in a low power consumption state, the management node 200 determines whether or not the disassembled tasks can be executed by the activated normal nodes 300 alone. When it determines that they cannot, the management node 200 returns some or all of the normal nodes 300 that are stopped in the low power consumption state [node return process].
- the task placement determination unit 222 notifies the node power supply control unit 230 of the number of nodes to be returned together with the node activation request. Then, the node power supply control unit 230 determines a return target node and issues a node return command to the determined normal node 300 to be returned. Thereafter, the normal node 300 that has received the return command performs a node return process, and returns a node return response to the management node 200 after the return.
- the node return process is, for example, controlling the normal node 300 to shift from the stopped state to the idle state.
- the normal node 300 that has received the task execution instruction executes the task in accordance with the task execution instruction, and transmits a task completion notification to the management node 200 after the task execution is completed.
- the management node 200 determines whether to stop the normal node 300 that has completed the task in a low power consumption state [power control determination processing]. When it is determined to stop in the low power consumption state, the management node 200 issues a node stop command to the corresponding normal node 300. Then, the normal node 300 that has received the node stop command stops in a low power consumption state. On the other hand, if it is determined in the power supply control determination process that the normal node 300 is not to be stopped, the management node 200 does nothing and the corresponding normal node 300 also stands by in an idle state.
- job disassembly processing, task placement determination processing, node return processing, and power control determination processing in the management node 200 will be described.
- a processing sequence between the management node 200 and the normal node 300 when the management node 200 issues each command of a node return command, a task execution command, and a node stop command will be described.
- the job disassembling unit 221 disassembles one or more jobs into at most (total number of normal nodes 300 − number of normal nodes 300 executing tasks) tasks.
- the job decomposing unit 221 decomposes the job so that one task is the unit processed by one normal node 300. For example, when the job is to merge number sequences written in N files and sort them into one file, one conceivable method is to decompose it into two kinds of tasks: N−1 sort tasks and one (sort and file merge) task.
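- the sort-and-merge example can be sketched as follows. This is only an illustration: the function name, the task labels, and the choice of which file the final task sorts are assumptions, since the text does not restrict the decomposition method.

```python
from typing import List, Tuple

Task = Tuple[str, List[str]]  # (task kind, input files)


def decompose_sort_merge_job(files: List[str]) -> List[Task]:
    """Split an N-file sort-and-merge job into N-1 sort tasks plus one sort-and-merge task."""
    if not files:
        return []
    *to_sort, last = files
    # N-1 plain sort tasks, one per file.
    tasks: List[Task] = [("sort", [f]) for f in to_sort]
    # The final task sorts the remaining file and merges all sorted outputs
    # (assigning the last file to it is an arbitrary choice for this sketch).
    tasks.append(("sort_and_merge", [last]))
    return tasks
```

- with three input files this yields two sort tasks and one sort-and-merge task, so each task can be handed to one normal node.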
- the task disassembling method in the job disassembling unit 221 is not limited.
- the task placement determination unit 222 determines the normal nodes 300 that execute the tasks decomposed in the job decomposition process. Specifically, it refers to the node state information stored in the storage unit 240 and determines whether or not the number of tasks is equal to or less than the number of normal nodes 300 in the idle state. If so, it selects, from among the normal nodes 300 in the idle state, as many nodes as there are tasks as task execution nodes.
- if the number of tasks exceeds the number of normal nodes 300 in the idle state, the task placement determination unit 222 selects all the idle normal nodes 300 as task execution nodes. The remaining (number of tasks − number of normal nodes in the idle state) normal nodes 300 are activated by the node return process and then selected as task execution nodes.
- a normal node 300 selected as a task execution node starts task execution according to the task execution instruction immediately after the task placement determination process if it is in the idle state, or after transitioning to the idle state in the node return process if it is in a stopped state.
- the method for determining which of the normal nodes 300 in the idle state to select as a node for executing the task is not limited. Further, the determination method as to which task is allocated to which normal node 300 is not limited.
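- the task placement determination can be sketched as below. The function name `place_tasks` and the choice of taking idle nodes in list order are assumptions for illustration, since the text leaves the selection method open.

```python
from typing import List, Tuple


def place_tasks(num_tasks: int, idle_nodes: List[str]) -> Tuple[List[str], int]:
    """Return (idle nodes chosen to run tasks, number of stopped nodes to return)."""
    if num_tasks <= len(idle_nodes):
        # Enough idle nodes: pick as many as there are tasks (list order, by assumption).
        return idle_nodes[:num_tasks], 0
    # Not enough: use every idle node and request the shortfall from stopped nodes.
    return list(idle_nodes), num_tasks - len(idle_nodes)
```

- the second return value corresponds to the number of nodes that would be passed along with the node activation request.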
- FIG. 6 shows a procedure in which the node power supply control unit 230 receives a node activation request, selects the normal nodes 300 that are in the stopped state for the number of nodes to be activated, and issues a return command to the normal nodes 300.
- the normal nodes 300 in the stopped state are selected in order from stop state level 1, that is, starting from the nodes with the lowest power saving effect and the fastest return, and the return command is issued to them.
- the node power control unit 230 receives the node activation request output from the task placement determination unit 222 (step S10).
- the node power supply control unit 230 receives a value n indicating the number of normal nodes to be activated together with the node activation request.
- with x initially set to n and the stop state level k initially set to 1, when the number of normal nodes 300 at stop state level k is smaller than x (step S12), the node power supply control unit 230 issues a return command to all the normal nodes 300 at stop state level k (step S13). The node power supply control unit 230 then sets x to x − (number of normal nodes 300 at stop state level k) and repeats steps S12 and S13 in order of increasing stop state level until stop state level 3 is reached (steps S14, S15, and S16). Thereafter, the node power supply control unit 230 waits for reception of n − x node return responses (step S19) and completes the node activation process.
- otherwise, when the number of normal nodes 300 at stop state level k is x or more, the node power supply control unit 230 issues a return command to x of the normal nodes 300 at stop state level k (step S17). Thereafter, the node power supply control unit 230 waits for reception of n node return responses (steps S18 and S19) and completes the node activation process.
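- the selection of nodes to return can be sketched as a loop over the stop state levels. This is a minimal sketch under assumptions: the function name, the dictionary layout, and the choice of which x nodes to take within a level are not specified in the original.

```python
from typing import Dict, List


def select_nodes_to_return(n: int, stopped: Dict[int, List[str]]) -> List[str]:
    """Pick up to n stopped nodes, starting from stop state level 1.

    `stopped` maps a stop state level (1, 2, 3) to the node numbers at that level.
    """
    selected: List[str] = []
    x = n  # nodes still to be selected
    for k in (1, 2, 3):
        if x <= 0:
            break
        at_level = stopped.get(k, [])
        if len(at_level) < x:
            # Fewer than x nodes at level k: return all of them, then try the next level.
            selected.extend(at_level)
            x -= len(at_level)
        else:
            # Level k alone covers the remainder: return x of its nodes and finish.
            selected.extend(at_level[:x])
            x = 0
    return selected
```

- if fewer than n stopped nodes exist, the sketch simply returns all of them; the number of return responses to wait for is then the length of the returned list.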
- the power supply control node determination unit 231 issues a return command specifying the node number of the normal node 300 to be returned.
- the issued return command is notified to the return command receiving unit 330 of the normal node 300 to be returned via the command notification unit 223.
- when the return command receiving unit 330 of the normal node 300 receives the return command, it starts energizing the node and outputs a return request to the power supply control unit 340. The power supply control unit 340 then performs the node return process.
- when the node return process by the power supply control unit 340 is completed, the normal node 300 uses the communication unit 310 to notify the management node 200 of a node return response indicating that the node return process is completed, together with its own node number.
- when the power supply control node determination unit 231 of the management node 200 receives the node return response, it rewrites the node state information stored in the storage unit 240. Specifically, it rewrites the node state information to indicate that the normal node 300 that sent the node return response is activated.
- FIG. 8 is a sequence diagram illustrating an example of a flow from when the task placement determination unit 222 issues a task execution instruction to when the normal node 300 completes task execution.
- after the task placement information is rewritten, the task placement determination unit 222 notifies the normal node 300 that executes the task of the task execution command via the command notification unit 223.
- the normal node 300 executes the task in the task execution unit 320 according to the received task execution command.
- the normal node 300 uses the communication unit 310 to notify the management node 200 of the task completion notification together with the task execution result.
- when the task placement determination unit 222 of the management node 200 receives the task completion notification, it rewrites the task placement information to the effect that the task has been completed. That is, the task placement determination unit 222 rewrites the task placement information so as to indicate that the target normal node 300 is not executing a task. When the rewriting of the task placement information is completed, the management node 200 executes the power supply control determination process.
- there is also a method of calculating the required number of nodes from the number of nodes in each state at that time, for example:
- idle state nodes: (total number of nodes − number of execution state nodes) × 0%
- stop state level 1: (total number of nodes − number of execution state nodes) × 10%
- stop state level 2: (total number of nodes − number of execution state nodes) × 30%
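- the percentage-based calculation can be sketched as follows. The percentages are the example values from the text; the function name, the dictionary keys, and rounding down to a whole number of nodes are assumptions of this sketch.

```python
from typing import Dict

# Example percentages taken from the text; the key names are hypothetical.
EXAMPLE_PERCENT = {"idle": 0, "stop_level_1": 10, "stop_level_2": 30}


def required_node_counts(total: int, executing: int,
                         percent: Dict[str, int] = EXAMPLE_PERCENT) -> Dict[str, int]:
    """Required nodes per state = (total - executing) * percent / 100, rounded down."""
    spare = max(total - executing, 0)
    # Integer arithmetic avoids floating-point rounding surprises.
    return {state: spare * p // 100 for state, p in percent.items()}
```

- for instance, with 100 nodes of which 40 are executing tasks, the sketch yields 0 idle nodes, 6 nodes at stop state level 1, and 18 nodes at stop state level 2.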
- FIG. 9 is a flowchart illustrating an example of a processing procedure of power supply control determination processing.
- the power supply control node determination unit 231 compares, in order from stop state level 1 to stop state level 2, the number of nodes stopped at stop state level k with the number set in advance for each stop state level.
- the power supply control node determination unit 231 determines whether or not the number of idle nodes is smaller than the set value of stop state level k (here, stop state level 1) (steps S22 and S23).
- when it is determined in step S23 that the number of idle nodes is greater than or equal to the set value, the power supply control node determination unit 231 repeats the process of step S23 until stop state level 3 is reached (steps S25 and S26). Thereafter, the power control node determination unit 231 determines to shift to stop state level 3, issues a stop command to the target normal node 300 (step S27), and completes the power control determination process.
- FIG. 10 is a sequence diagram illustrating an example of a flow from when the power supply control node determination unit 231 of the node power supply control unit 230 issues a stop command until the normal node 300 stops in the power saving state.
- the power supply control node determination unit 231 first changes the node state information so that the state of the normal node 300 to be stopped indicates the stopped state. Thereafter, the power control node determination unit 231 notifies the stop target normal node 300 of the stop command via the command notification unit 223.
- the normal node 300 that has received the stop command by the communication unit 310 outputs a stop command to the power supply control unit 340 and controls to stop at the specified stop level according to the stop command.
- the distributed system of this embodiment executes a job.
- at a timing asynchronous with job execution, the power supply control node determination unit 231 changes normal nodes 300 in a stopped state to a stopped state with a shorter return time or to the idle state, so as to satisfy the required number of nodes set for the idle state and for each stop state level [node number adjustment process].
- FIG. 11 is a flowchart illustrating an example of a procedure in the case of adding n normal nodes 300 to the idle state or the stopped state level L in the node number adjustment process.
- the idle state is described as a stop state level 0.
- the power supply control node determination unit 231 determines, in descending order of stop state level, whether the number of normal nodes 300 at stop state level k (initially stop state level 3) is smaller than x (initially n) (steps S31 and S32).
- when the number is smaller than x, the power supply control node determination unit 231 changes all the normal nodes 300 at stop state level k to stop state level L (step S33). It then sets x to x − (number of normal nodes 300 at stop state level k) and repeats steps S32 and S33 in descending order of stop state level until stop state level L is reached (steps S34, S35, and S36). Thereafter, the power supply control node determination unit 231 completes the node number adjustment process.
- otherwise, the power supply control node determination unit 231 changes x of the normal nodes 300 at stop state level k to stop state level L (step S37). Thereafter, the node power supply control unit 230 sets x to 0 (step S38) and completes the node number adjustment process.
- the n normal nodes 300 among the normal nodes 300 in the higher stop state level can be shifted to the stop state level L.
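The node number adjustment procedure of FIG. 11 can be illustrated with a minimal Python sketch. The function name `adjust_node_counts`, the dictionary layout, and the `max_level` parameter are assumptions made for illustration and do not appear in the specification; the step labels in the comments are only an approximate mapping to the flowchart.

```python
def adjust_node_counts(nodes_by_level, target_level, n, max_level=3):
    """Move up to n nodes from deeper stop state levels to target_level.

    nodes_by_level maps a stop state level (0 = idle) to a list of node ids.
    Returns x, the number of nodes that could not be supplied.
    """
    x = n
    k = max_level                          # S31: start at the highest level
    while k > target_level and x > 0:
        at_k = nodes_by_level.get(k, [])
        if len(at_k) < x:                  # S32: fewer than x nodes at level k
            moved = list(at_k)             # S33: shift all of them
            x -= len(moved)                # S34: reduce the remaining demand
        else:
            moved = at_k[:x]               # S37: shift exactly x of them
            x = 0                          # S38: demand satisfied
        nodes_by_level[k] = at_k[len(moved):]
        nodes_by_level.setdefault(target_level, []).extend(moved)
        k -= 1                             # S35/S36: next lower level
    return x


levels = {0: ["a"], 1: [], 2: ["b", "c"], 3: ["d"]}
remaining = adjust_node_counts(levels, target_level=1, n=2)
print(levels, remaining)
```

In this usage example, the single node at level 3 is shifted first and the remaining demand is met from level 2, which mirrors the descending scan described above.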
- when a normal node 300 in the stopped state is shifted to a lower stop state level, for example, it is first returned to the idle state by a return command (for example, as shown in FIG. 7), and, after the return response is received, it is stopped at the target stop state level by a stop command (for example, as shown in FIG. 10).
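The two-step transition (return to idle, then stop at the target level) might be sketched as follows. The class `NormalNode`, its method names, and the string `"return_response"` are hypothetical stand-ins for the return and stop command exchange; they are not taken from the specification.

```python
class NormalNode:
    """Toy stand-in for a normal node; level 0 is idle, 1..3 are stop state levels."""

    def __init__(self, node_id, level):
        self.node_id = node_id
        self.level = level

    def handle_return_command(self):
        # Return to the idle state and acknowledge with a return response.
        self.level = 0
        return "return_response"

    def handle_stop_command(self, level):
        # Stop at the stop state level named in the command.
        self.level = level


def shift_to_lower_level(node, target_level):
    """Move a stopped node to a lower stop state level via the idle state."""
    if not 0 <= target_level < node.level:
        raise ValueError("target level must be lower than the current level")
    response = node.handle_return_command()
    if response != "return_response":
        raise RuntimeError("node did not acknowledge the return command")
    if target_level > 0:
        # Only issue a stop command if the target is not the idle state itself.
        node.handle_stop_command(target_level)
    return node.level
```

The stop command is sent only after the return response arrives, so the node is never asked to stop while it is still resuming.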
- the timing at which the node number adjustment process is performed is arbitrary; for example, it is desirable to perform it once a predetermined period has elapsed since the last power control request (stop command or return command) was issued.
- the distributed system according to the present invention suppresses the power consumption of the entire distributed system by shifting nodes to the stopped state; when the load becomes large, it can suppress a decrease in processing performance by returning nodes in order, starting from those at the lowest stop state level, which return most quickly, and having them execute tasks.
- in the above embodiment, the normal node 300 is assumed to execute only one task at a time.
- however, a plurality of tasks may be executed simultaneously. In this case, all assigned tasks need to be managed in the task placement information.
- when the task placement determination unit 222 determines task placement in this case, a method may be used in which tasks are preferentially placed on nodes that are active and have a small number of running tasks.
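One possible reading of this placement rule is a greedy assignment to the active node with the fewest running tasks. The sketch below assumes this interpretation; `NodeInfo`, `place_tasks`, and the tie-breaking by list order are illustrative choices, not part of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    node_id: str
    active: bool
    running_tasks: list = field(default_factory=list)


def place_tasks(tasks, nodes):
    """Assign each task to the active node that currently runs the fewest tasks."""
    active = [n for n in nodes if n.active]
    if not active:
        raise ValueError("no active node available")
    placement = {}
    for task in tasks:
        # min() returns the first node in list order when counts are tied.
        target = min(active, key=lambda n: len(n.running_tasks))
        target.running_tasks.append(task)   # keep the task placement information current
        placement[task] = target.node_id
    return placement
```

Because every assignment is recorded in `running_tasks`, the structure doubles as the task placement information that must track all assigned tasks when nodes run several tasks at once.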
- FIG. 12 is a block diagram illustrating a minimum configuration example of the distributed system.
- the distributed system includes, as minimum components, a normal node 10 having a plurality of power saving states with different return times to the normal operation state, and a management node 20 that assigns jobs to the normal node 10 and causes it to execute them.
- the management node 20 includes a node selection unit 21 and a node control unit 22.
- the node selection unit 21 selects, from among the normal nodes 10 in a power saving state, a normal node 10 to which a job is to be assigned for execution.
- the node selection unit 21 selects normal nodes 10 in order, starting from those in the power saving state with the shortest time to return to the normal operation state among the plurality of power saving states. The node control unit 22 then controls the normal node 10 selected by the node selection unit 21 to return to the normal operation state.
- with this configuration, the power consumption of the entire distributed system is suppressed by shifting nodes to the stopped state, and when the load increases, a decrease in processing performance can be suppressed by returning nodes in order, starting from those at the lower stop state levels, which recover more quickly, and having them execute tasks.
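The minimum configuration can be sketched as two small functions standing in for the node selection unit 21 and the node control unit 22. The state labels and the return times in `RETURN_TIME_S` are hypothetical values chosen only to make the ordering concrete; the specification does not give numeric return times.

```python
# Hypothetical return times (seconds) for three illustrative power saving states.
RETURN_TIME_S = {"cpu_off": 0.1, "suspend_to_ram": 5.0, "suspend_to_disk": 60.0}


def select_nodes(power_saving_nodes, count):
    """Pick `count` node ids, shortest return time first.

    power_saving_nodes maps node id -> name of its current power saving state.
    """
    ranked = sorted(power_saving_nodes.items(), key=lambda kv: RETURN_TIME_S[kv[1]])
    return [node_id for node_id, _ in ranked[:count]]


def return_to_normal(selected, send_return_command):
    """Ask each selected node to return to the normal operation state."""
    for node_id in selected:
        send_return_command(node_id)


sent = []
chosen = select_nodes(
    {"n1": "suspend_to_disk", "n2": "cpu_off", "n3": "suspend_to_ram"}, 2
)
return_to_normal(chosen, sent.append)
print(chosen)
```

Sorting by return time rather than by a fixed level number keeps the selection rule valid even if a deployment defines more or fewer power saving states.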
- the distributed system includes a normal node (for example, realized by the normal node 300) having a plurality of power saving states with different return times to the normal operation state (for example, the execution state and the idle state), and a management node (for example, realized by the management node 200) that assigns jobs to the normal node and causes it to execute them. The management node includes node selection means (for example, realized by the task placement determination unit 222 and the power control node determination unit 231) for selecting, from among the normal nodes in a power saving state, a normal node to which a job is to be assigned for execution, and node control means (for example, power control node determination) for controlling the normal node selected by the node selection means to return to the normal operation state. The node selection means selects normal nodes in order, starting from those in the power saving state with the shortest time to return to the normal operation state among the plurality of power saving states.
- a normal node may include at least a processor that executes arithmetic processing, a memory that stores information, and a non-volatile storage device, and may have, as power saving states to which it can transition, at least three kinds of power saving states: a first power saving state (for example, stop state level 1) in which only the power of the processor is stopped; a second power saving state (for example, stop state level 2) in which the operation context is stored in the memory and power supply to everything other than the memory is stopped; and a third power saving state (for example, stop state level 3) in which the operation context is stored in the non-volatile storage device and all power supply is stopped. When selecting, from the normal nodes in a power saving state, a normal node to which a job is to be assigned for execution, the node selection means may be configured to select with priority in the order of normal nodes in the first power saving state, normal nodes in the second power saving state, and normal nodes in the third power saving state.
- the distributed system may include a normal node (for example, realized by the normal node 300) including power control means (for example, realized by the power control unit 340) and task execution means (for example, realized by the task execution unit 320), and a management node (for example, realized by the management node 200) including job receiving means (for example, realized by the job receiving unit 210) that receives a job execution command, job management means (for example, realized by the job control unit 220) that divides the job received by the job receiving means into one or a plurality of tasks and causes one or a plurality of normal nodes to execute the tasks, and node power supply control means (for example, realized by the node power supply control unit 230) that manages and controls the power state of the normal nodes.
- the power control means has a function of shifting the normal node to power saving states of a plurality of stages that differ from each other in power consumption during the power saving state and in the time for returning from the power saving state to the normal startup state, and a function of returning a normal node in a power saving state to the normal startup state.
- the job management means determines, according to the amount of jobs accepted by the management node, the number of normal nodes that are to execute the tasks into which the job has been divided.
- the node power supply control means has a startup node selection function that, when the number of normal nodes that are in the normal startup state and can execute tasks is less than the number of normal nodes determined by the job management means, selects, from the normal nodes in one or more of the power saving states of the plurality of stages, enough normal nodes to satisfy the number of normal nodes that are to execute the tasks. In the startup node selection function, normal nodes are selected as the nodes to be returned to the normal startup state in order, starting from those in the power saving state with the shortest time to return to the normal startup state, and a startup command, which is an instruction to shift to the normal startup state, is issued to the selected normal nodes (for example, the power management node determination unit 231 executes this processing).
- the job management means issues a task execution command instructing task execution to the normal nodes that are in the normal startup state and can execute tasks, and to the normal nodes that have returned to the normal startup state according to the startup command issued by the node power supply control means.
- the task execution means executes the task in accordance with the task execution command issued by the job management means.
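The interaction between the job management means and the node power supply control means can be sketched as a single planning function. The rule used here to derive the required node count (`ceil(num_tasks / tasks_per_node)`) is an assumption for illustration, since the specification only says the count depends on the amount of jobs; `plan_job` and the dictionary keys are likewise hypothetical.

```python
import math

RETURN_ORDER = (1, 2, 3)  # stop state levels, shortest return time first


def plan_job(num_tasks, nodes, tasks_per_node=1):
    """Decide which nodes receive startup commands and which receive task commands.

    nodes maps node id -> "idle" (normal startup state) or a stop state level 1..3.
    """
    # Assumed rule: one node per `tasks_per_node` tasks.
    required = math.ceil(num_tasks / tasks_per_node)
    ready = [nid for nid, state in nodes.items() if state == "idle"]
    shortfall = max(0, required - len(ready))

    # Startup node selection: fill the shortfall from the fastest-returning level first.
    to_start = []
    for level in RETURN_ORDER:
        for nid, state in nodes.items():
            if len(to_start) == shortfall:
                break
            if state == level:
                to_start.append(nid)

    executors = (ready + to_start)[:required]
    return {"startup_commands": to_start, "task_commands": executors}
```

For example, with one idle node and stopped nodes at levels 1, 2, and 3, a three-task job would wake the level 1 and level 2 nodes and leave the level 3 node powered down.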
- a normal node may be equipped with at least a processor that executes arithmetic processing, a memory that stores information, and a non-volatile storage device, and the power control means may include, as the power saving states that it controls, at least three kinds of power saving states: a first power saving state (for example, stop state level 1) in which only the power of the processor is stopped; a second power saving state (for example, stop state level 2) in which the operation context is stored in the memory and power supply to everything other than the memory is stopped; and a third power saving state (for example, stop state level 3) in which the operation context is stored in the non-volatile storage device and all power supply is stopped.
- the distributed system according to the present invention can be applied to distributed computers, distributed databases, distributed storage, parallel data processing systems, parallel file systems, parallel databases, data grids, and cluster computers.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Power Sources (AREA)
Abstract
Description
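20 Management node
21 Node selection means
22 Node control means
100 Client node
200 Management node
210 Job receiving unit
220 Job control unit
221 Job decomposition unit
222 Task placement determination unit
223 Command notification unit
230 Node power supply control unit
231 Power supply control node determination unit
240 Storage unit
300 Normal node
310 Communication unit
320 Task execution unit
330 Return command receiving unit
340 Power supply control unit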
Claims (10)
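- A distributed system comprising: a normal node having a plurality of power saving states with different return times to a normal operation state; and
a management node that assigns a job to the normal node and causes the normal node to execute the job, wherein
the management node includes:
node selection means for selecting, from among normal nodes in the power saving states, a normal node to which the job is to be assigned for execution; and
node control means for controlling the normal node selected by the node selection means to return to the normal operation state, and
the node selection means selects normal nodes in order, starting from a normal node in a power saving state whose time to return to the normal operation state is short among the plurality of power saving states.
- The distributed system according to claim 1, wherein the normal node includes at least a processor that executes arithmetic processing, a memory that stores information, and a non-volatile storage device, and has, as power saving states to which it can transition, at least three kinds of power saving states: a first power saving state in which only the power of the processor is stopped; a second power saving state in which an operation context is stored in the memory and power supply to everything other than the memory is stopped; and a third power saving state in which the operation context is stored in the non-volatile storage device and all power supply is stopped, and
the node selection means, when selecting from among the normal nodes in the power saving states a normal node to which a job is to be assigned for execution, selects with priority in the order of a normal node in the first power saving state, a normal node in the second power saving state, and a normal node in the third power saving state.
- A distributed system comprising: a normal node including power control means and task execution means; and
a management node including job receiving means for receiving a job execution command, job management means for dividing the job received by the job receiving means into one or a plurality of tasks and causing one or a plurality of the normal nodes to execute the tasks, and node power supply control means for managing and controlling the power state of the normal nodes, wherein
the power control means has a function of shifting the normal node to power saving states of a plurality of stages that differ from each other in power consumption during the power saving state and in the time to return from the power saving state to a normal startup state, and a function of returning the normal node in a power saving state to the normal startup state,
the job management means determines, according to the amount of jobs received by the management node, the number of normal nodes that are to execute the tasks into which the job has been divided,
the node power supply control means has a startup node selection function of selecting, when the number of normal nodes that are in the normal startup state and capable of task execution among the normal nodes is less than the number of normal nodes determined by the job management means to execute the tasks, normal nodes from among the normal nodes in one or more of the power saving states of the plurality of stages in a number sufficient to satisfy the number of normal nodes that are to execute the tasks; in the startup node selection function, selects normal nodes as the normal nodes to be returned to the normal startup state in order, starting from a normal node in a power saving state whose time to return to the normal startup state is short among the power saving states of the plurality of stages; and issues, to the selected normal nodes, a startup command that is a command instructing a shift to the normal startup state,
the job management means issues a task execution command instructing task execution to the normal nodes that are in the normal startup state and capable of task execution among the normal nodes and to the normal nodes that have returned to the normal startup state in accordance with the startup command issued by the node power supply control means, and
the task execution means executes the tasks in accordance with the task execution command issued by the job management means.
- The distributed system according to claim 3, wherein the normal node includes at least a processor that executes arithmetic processing, a memory that stores information, and a non-volatile storage device, and
the power control means includes, as the power saving states that it controls, at least three kinds of power saving states: a first power saving state in which only the power of the processor is stopped; a second power saving state in which an operation context is stored in the memory and power supply to everything other than the memory is stopped; and a third power saving state in which the operation context is stored in the non-volatile storage device and all power supply is stopped.
- The distributed system according to claim 4, wherein the node power supply control means selects the normal nodes to be returned to the normal startup state from the normal nodes in the first power saving state, next from the normal nodes in the second power saving state, and next from the normal nodes in the third power saving state.
- The distributed system according to any one of claims 3 to 5, wherein the node power supply control means determines to shift a normal node that has completed task execution to any one of the power saving states of the plurality of stages, and issues, to the normal node, a node stop command instructing a shift to the determined power saving state.
- The distributed system according to claim 6, wherein the node power supply control means, when issuing the node stop command to a normal node that has completed task execution, determines to shift the normal node to the power saving state whose time to return to the normal startup state is short among the power saving states in which the number of nodes is less than a predetermined number set in advance for each power saving state.
- An information processing apparatus that assigns a job to a normal node in a distributed system and causes the normal node to execute the job, the apparatus comprising:
node selection means for selecting, from among normal nodes that are in a power saving state out of the normal nodes having a plurality of power saving states with different return times to a normal operation state, a normal node to which the job is to be assigned for execution; and
node control means for controlling the normal node selected by the node selection means to return to the normal operation state, wherein
the node selection means selects normal nodes in order, starting from a normal node in a power saving state whose time to return to the normal operation state is short among the plurality of power saving states.
- A distribution method for assigning a job to a normal node and causing the normal node to execute the job, the method comprising:
selecting, from among normal nodes that are in a power saving state out of the normal nodes having a plurality of power saving states with different return times to a normal operation state, a normal node to which the job is to be assigned for execution;
controlling the selected normal node to return to the normal operation state; and
when selecting, from among the normal nodes in the power saving state, the normal node to which the job is to be assigned for execution, selecting normal nodes in order, starting from a normal node in a power saving state whose time to return to the normal operation state is short among the plurality of power saving states.
- A distribution program for assigning a job to a normal node in a distributed system and causing the normal node to execute the job, the program causing
a computer to execute:
node selection processing of selecting, from among normal nodes that are in a power saving state out of the normal nodes having a plurality of power saving states with different return times to a normal operation state, a normal node to which the job is to be assigned for execution; and
node control processing of controlling the selected normal node to return to the normal operation state, wherein,
in the node selection processing, the computer is caused to select normal nodes in order, starting from a normal node in a power saving state whose time to return to the normal operation state is short among the plurality of power saving states.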
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012555746A JP5786870B2 (ja) | 2011-02-02 | 2012-01-31 | 分散システム、装置、方法及びプログラム |
US13/981,643 US9201707B2 (en) | 2011-02-02 | 2012-01-31 | Distributed system, device, method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011020949 | 2011-02-02 | ||
JP2011-020949 | 2011-02-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012105230A1 true WO2012105230A1 (ja) | 2012-08-09 |
Family
ID=46602454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/000605 WO2012105230A1 (ja) | 2011-02-02 | 2012-01-31 | 分散システム、装置、方法及びプログラム |
Country Status (3)
Country | Link |
---|---|
US (1) | US9201707B2 (ja) |
JP (1) | JP5786870B2 (ja) |
WO (1) | WO2012105230A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018123707A1 (ja) * | 2016-12-27 | 2018-07-05 | 日立オートモティブシステムズ株式会社 | 制御装置 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006343955A (ja) * | 2005-06-08 | 2006-12-21 | Canon Inc | 情報処理装置およびその制御方法 |
JP2010165193A (ja) * | 2009-01-16 | 2010-07-29 | Fujitsu Ltd | 負荷分散装置、負荷分散方法および負荷分散プログラム |
JP2011257834A (ja) * | 2010-06-07 | 2011-12-22 | Ricoh Co Ltd | 分散処理システム |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10187636A (ja) * | 1996-12-24 | 1998-07-21 | Nec Corp | 消費電力低減クラスタシステム |
JPH11170666A (ja) * | 1997-12-12 | 1999-06-29 | Ricoh Co Ltd | ネットワーク用プリンタ装置 |
US6901522B2 (en) * | 2001-06-07 | 2005-05-31 | Intel Corporation | System and method for reducing power consumption in multiprocessor system |
JP2003162515A (ja) | 2001-11-22 | 2003-06-06 | Fujitsu Ltd | クラスタシステム |
WO2003073185A2 (en) * | 2002-02-28 | 2003-09-04 | Zetacon Corporation | Predictive control system and method |
US6977528B2 (en) * | 2002-09-03 | 2005-12-20 | The Regents Of The University Of California | Event driven dynamic logic for reducing power consumption |
US7334142B2 (en) * | 2004-01-22 | 2008-02-19 | International Business Machines Corporation | Reducing power consumption in a logically partitioned data processing system with operating system call that indicates a selected processor is unneeded for a period of time |
US7088141B2 (en) * | 2004-10-14 | 2006-08-08 | International Business Machines Corporation | Multi-threshold complementary metal-oxide semiconductor (MTCMOS) bus circuit and method for reducing bus power consumption via pulsed standby switching |
US7539882B2 (en) * | 2005-05-30 | 2009-05-26 | Rambus Inc. | Self-powered devices and methods |
US7490254B2 (en) * | 2005-08-02 | 2009-02-10 | Advanced Micro Devices, Inc. | Increasing workload performance of one or more cores on multiple core processors |
JP4370336B2 (ja) | 2007-03-09 | 2009-11-25 | 株式会社日立製作所 | 低消費電力ジョブ管理方法及び計算機システム |
US8006111B1 (en) * | 2007-09-21 | 2011-08-23 | Emc Corporation | Intelligent file system based power management for shared storage that migrates groups of files based on inactivity threshold |
JP5207792B2 (ja) * | 2008-02-19 | 2013-06-12 | キヤノン株式会社 | 情報処理装置及び情報処理方法 |
JP5423879B2 (ja) * | 2010-03-29 | 2014-02-19 | 日本電気株式会社 | データアクセス場所選択システム、方法およびプログラム |
US8516284B2 (en) * | 2010-11-04 | 2013-08-20 | International Business Machines Corporation | Saving power by placing inactive computing devices in optimized configuration corresponding to a specific constraint |
-
2012
- 2012-01-31 US US13/981,643 patent/US9201707B2/en active Active
- 2012-01-31 WO PCT/JP2012/000605 patent/WO2012105230A1/ja active Application Filing
- 2012-01-31 JP JP2012555746A patent/JP5786870B2/ja active Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018123707A1 (ja) * | 2016-12-27 | 2018-07-05 | 日立オートモティブシステムズ株式会社 | 制御装置 |
JP2018106472A (ja) * | 2016-12-27 | 2018-07-05 | 日立オートモティブシステムズ株式会社 | 制御装置 |
Also Published As
Publication number | Publication date |
---|---|
US20130312004A1 (en) | 2013-11-21 |
JPWO2012105230A1 (ja) | 2014-07-03 |
US9201707B2 (en) | 2015-12-01 |
JP5786870B2 (ja) | 2015-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4370336B2 (ja) | 低消費電力ジョブ管理方法及び計算機システム | |
US11983198B2 (en) | Multi-cluster warehouse | |
US9977689B2 (en) | Dynamic scaling of management infrastructure in virtual environments | |
JP2008257578A (ja) | 情報処理装置、スケジューラおよび情報処理置のスケジュール制御方法 | |
US8893148B2 (en) | Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks | |
US8312464B2 (en) | Hardware based dynamic load balancing of message passing interface tasks by modifying tasks | |
US20090063885A1 (en) | System and Computer Program Product for Modifying an Operation of One or More Processors Executing Message Passing Interface Tasks | |
US20090064165A1 (en) | Method for Hardware Based Dynamic Load Balancing of Message Passing Interface Tasks | |
JP2007041720A (ja) | ジョブステップ実行プログラムおよびジョブステップ実行方法 | |
US20120036383A1 (en) | Power supply for networked host computers and control method thereof | |
JP2003067351A (ja) | 分散型コンピュータの構成制御システム | |
JP2008257572A (ja) | 論理区画に動的に資源割り当てを行うストレージシステム及びストレージシステムの論理分割方法 | |
US20090064166A1 (en) | System and Method for Hardware Based Dynamic Load Balancing of Message Passing Interface Tasks | |
JP4912927B2 (ja) | タスク割当装置、及びタスク割当方法 | |
KR101392584B1 (ko) | 리소스 모니터링을 이용한 동적 데이터 처리 장치 및 그 방법 | |
US10928883B2 (en) | System management device | |
WO2016092856A1 (ja) | 情報処理装置、情報処理システム、タスク処理方法、及び、プログラムを記憶する記憶媒体 | |
JP2008217575A (ja) | ストレージ装置及びその構成最適化方法 | |
JP2016012344A (ja) | アプリケーションを実行する方法及びリソースマネジャ | |
JP5810918B2 (ja) | スケジューリング装置、スケジューリング方法及びプログラム | |
JP2016126426A (ja) | マルチコアシステム、マルチコアプロセッサ、並列処理方法及び並列処理制御プログラム | |
JP5786870B2 (ja) | 分散システム、装置、方法及びプログラム | |
CN100397345C (zh) | 用于管理资源元素队列的方法和控制器 | |
JPWO2011118424A1 (ja) | マシン稼動計画作成装置、マシン稼動計画作成方法、及びマシン稼動計画作成用プログラム | |
JP2018151968A (ja) | 管理装置、分散システム、管理方法、及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12741541; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2012555746; Country of ref document: JP; Kind code of ref document: A |
 | WWE | Wipo information: entry into national phase | Ref document number: 13981643; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 12741541; Country of ref document: EP; Kind code of ref document: A1 |