CN101881996A - Parallel memory system check-point power consumption optimization method - Google Patents

Parallel memory system check-point power consumption optimization method

Publication number
CN101881996A
Authority
CN
China
Prior art keywords
storage server
object storage
power consumption
state
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010229535
Other languages
Chinese (zh)
Other versions
CN101881996B (en)
Inventor
陈娟
杨灿群
黄春
董勇
易会战
王锋
杜云飞
赵克佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-07-19
Filing date
2010-07-19
Publication date
2010-11-10
Application filed by National University of Defense Technology
Priority to CN2010102295358A
Publication of CN101881996A
Application granted
Publication of CN101881996B
Legal status: Expired - Fee Related

Landscapes

  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a check-point power consumption optimization method for a parallel storage system. The technical problem it addresses is how to optimize the power consumption of a parallel storage system based on the characteristics of checkpoint operations. The technical scheme is as follows: a server working-state set is built for each object storage server to represent that server's working state; each element of the set represents a process that the object storage server is serving, and the more elements the set contains, the more processes the server is providing checkpoint service to. When an object storage server receives a power-state setting request, it decides whether to execute the request according to the state of its working-state set. With this method, power states can be set adaptively according to whether each object storage server that serves checkpoint operations is busy or idle, which reduces the power consumption of idle servers and eliminates conflicts among multiple power-state setting commands sent to the same object storage server.

Description

A check-point power consumption optimization method for a parallel storage system
Technical field
The present invention relates to power consumption optimization for parallel storage systems, and in particular to a method that optimizes the power consumed by checkpoint operations in a parallel storage system by providing the storage servers with multi-level power states.
Background
The parallel storage system is an important component of a massively parallel computer system, and the power consumed by its large volume of file read and write operations accounts for a large share of the power consumption of the whole computer system. Checkpointing is an important means of improving the availability of high-performance computing systems. In such systems, large-scale scientific computing applications often run for a long time and, because of their scale, occupy a large amount of computing resources, which significantly increases the probability that a hardware fault occurs while they run. To ensure that a program can run to completion and to improve the effectiveness of its execution, checkpoint operations are usually performed while the application runs, saving the running state of each process and the shared memory of the application. If the system fails, the application can be restored from the most recently saved image file, which improves system availability. A checkpoint operation creates a separate image file for each computing process, and each image file involves reading and writing a large amount of data; this heavy read and write load causes the power consumption of the object storage servers to rise sharply. It is therefore both necessary and effective to optimize the power consumption of the storage servers according to the characteristics of checkpoint operations.
Checkpoint operations are performed at intervals: the user checkpoints the whole application once every fixed period. Because of this interval property, the storage servers that hold the checkpoint image files are not working all the time, and for part of the time they serve no requests at all. In the present invention, checkpoint image files are stored in a dedicated partition of the parallel storage system, called the checkpoint image partition, whose operations are kept separate from other file reads and writes; file accesses that are not part of a checkpoint operation cannot use the object storage servers belonging to this partition. Consequently, when an object storage server has no checkpoint operation to handle, it is idle and wastes power, and during this period it can be set to a low power state to save power.
Exploiting the checkpoint image partition and the interval property of checkpoint operations to reduce the power consumption of idle object storage servers, by means such as lowering the processor frequency and putting storage devices into a low power state, lowers the energy consumed while the storage system runs and is one of the important means of optimizing storage system power consumption.
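As a rough illustration of the saving that the interval property makes available, consider a single checkpoint interval on one object storage server. The symbols here are assumptions introduced for this sketch and do not appear in the source: $T$ is the length of the interval, $t_c$ the time the server spends serving checkpoint operations within it, $P_n$ its power draw in the normal power state and $P_l$ its power draw in the low power state. The cost of switching between states is ignored.

```latex
E_{\text{base}} = P_n T, \qquad
E_{\text{opt}} = P_n t_c + P_l (T - t_c), \qquad
E_{\text{base}} - E_{\text{opt}} = (P_n - P_l)(T - t_c)
```

Under these assumptions the saving grows with both the idle time $T - t_c$ and the gap between the two power levels, which is why a dedicated checkpoint partition with long idle periods is an attractive target.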
At present, power optimization methods for storage systems work mainly at the storage device level, including putting devices to sleep, adjusting disk rotation speed, and reducing the number of disk seeks. There is also power control for data backup servers, which lowers the power consumption of a server during backup when a backup is required. Very little work optimizes power consumption according to the characteristics of checkpoint operations; optimization for checkpointing has mainly targeted performance. Many storage systems are now equipped with object storage servers dedicated to checkpoint operations, and ignoring their power optimization is a considerable waste of storage resources.
Summary of the invention
The technical problem to be solved by the present invention is how to optimize the power consumption of a parallel storage system based on the characteristics of checkpoint operations. Specifically, this involves two questions: how to insert power-state setting commands, and how to resolve conflicts among multiple power-state setting commands that relate to multiple checkpoint operations. When an application is large or many jobs are running, the number of checkpoint image files exceeds the number of object storage servers, so several image files are stored on the same object storage server; different computing nodes may then send power-state setting commands to that server repeatedly, which produces conflicts.
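The conflict can be made concrete with a small sketch. It assumes a hypothetical server that obeys every power-state command as it arrives; the names `NaiveServer`, `begin_checkpoint`, `end_checkpoint` and `in_conflict` are invented for illustration and are not part of the patent.

```python
from enum import Enum


class PowerState(Enum):
    NORMAL = "normal"
    LOW = "low"


class NaiveServer:
    """Hypothetical server that applies every power-state command unconditionally."""

    def __init__(self):
        self.state = PowerState.LOW
        self.active_writers = set()  # tracked only so the conflict can be detected

    def begin_checkpoint(self, process_id):
        self.active_writers.add(process_id)
        self.state = PowerState.NORMAL

    def end_checkpoint(self, process_id):
        self.active_writers.discard(process_id)
        self.state = PowerState.LOW  # applied even if other processes are still writing

    def in_conflict(self):
        # Conflict: the server is in low power while a checkpoint is still in progress.
        return self.state is PowerState.LOW and bool(self.active_writers)


server = NaiveServer()
server.begin_checkpoint("A")
server.begin_checkpoint("B")
server.end_checkpoint("A")  # A finishes first and requests the low power state
conflict_after_a = server.in_conflict()
```

Here process B is still writing its image file when A's command puts the shared server into the low power state, which is the situation the working-state set is meant to rule out.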
The technical scheme of the present invention is as follows. A server working-state set is constructed for each object storage server to represent the working state of that server. Each element of the set represents a process that the server is serving; the more elements the set contains, the more processes the object storage server is providing checkpoint service to. When an object storage server receives a power-state setting request, it decides from the state of its working-state set whether the request needs to be executed, which avoids repeated and conflicting settings by multiple power-state commands. The specific technical scheme is:
Step 1: Define two power states for the object storage server: the normal power state and the low power state. Before a checkpoint operation is performed, the object storage server is set to the normal power state. After the checkpoint operation completes, the computing node sends a low-power-state setting command to the object storage server, and the object storage server is set to the low power state.
Step 2: For a parallel storage system with $N$ object storage servers, where $N$ is a positive integer, construct a server working-state set $G_j$ for each object storage server $j$ ($1 \le j \le N$). The set $G_j$ reflects the current working state of the object storage server. Each element of $G_j$ is a process identifier $I$, obtained by concatenating a job number and a process number, and represents a process to which object storage server $j$ is providing checkpoint service. Initially $G_j$ is empty.
Each of the following steps is carried out on every object storage server $j$.
Step 3: Object storage server $j$ waits for an incoming power-state setting request $R$, $R \in \{R_{Normal}, R_{Down}\}$, where $R_{Normal}$ is a request to set the object storage server to the normal power state and $R_{Down}$ is a request to set it to the low power state.
Step 4: After receiving a power-state setting request $R$, the object storage server obtains the job number and process number that correspond to the request and concatenates them to form a process identifier $I$. For example, if the job number is 1000 and the process number is 500, then $I$ is 1000500.
Step 5: If $R = R_{Normal}$, go to step 6; otherwise, go to step 10.
Step 6: The request $R$ asks for the object storage server to be set to the normal power state, which means that object storage server $j$ needs to serve process $I$. Add $I$ to the server working-state set $G_j$, that is, $G_j = G_j \cup \{I\}$.
Step 7: Query the current power state of object storage server $j$. If it is in the low power state, go to step 8; otherwise, go to step 9.
Step 8: Object storage server $j$ executes request $R$: the server is set to the normal power state and its recorded power state is updated to normal. Go to step 14.
Step 9: Ignore request $R$ and go to step 14.
Step 10: The request $R$ asks for the object storage server to be set to the low power state, which means that object storage server $j$ has finished serving process $I$. Remove $I$ from the server working-state set $G_j$, that is, $G_j = G_j - \{I\}$.
Step 11: Check whether $G_j$ is now empty. If so, go to step 12; otherwise, go to step 13.
Step 12: Object storage server $j$ executes request $R$: the server is set to the low power state and its recorded power state is updated to low. Go to step 14.
Step 13: Checkpoint operations of other processes still need service, so ignore request $R$.
Step 14: Check whether a new checkpoint service request is about to arrive. If so, go to step 3; otherwise, go to step 15.
Step 15: End.
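The request-handling logic of steps 4 to 13 can be sketched for a single object storage server as follows. This is a minimal illustration, not the patented implementation: the names `PowerState`, `Request`, `ObjectStorageServer`, `handle` and `make_process_id` are hypothetical, and the initial power state is assumed to be low because the source does not specify it.

```python
from enum import Enum


class PowerState(Enum):
    NORMAL = "normal"
    LOW = "low"


class Request(Enum):
    NORMAL = "R_Normal"
    DOWN = "R_Down"


def make_process_id(job_number, process_number):
    # Step 4: concatenate job number and process number, e.g. 1000 and 500 -> "1000500".
    return f"{job_number}{process_number}"


class ObjectStorageServer:
    def __init__(self):
        self.working_set = set()     # G_j, initially empty (step 2)
        self.state = PowerState.LOW  # assumed initial state, not given in the source

    def handle(self, request, job_number, process_number):
        """Process one request; return True if it was executed, False if ignored."""
        pid = make_process_id(job_number, process_number)
        if request is Request.NORMAL:
            self.working_set.add(pid)             # step 6: G_j = G_j U {I}
            if self.state is PowerState.LOW:      # step 7
                self.state = PowerState.NORMAL    # step 8
                return True
            return False                          # step 9
        self.working_set.discard(pid)             # step 10: G_j = G_j - {I}
        if not self.working_set:                  # step 11
            self.state = PowerState.LOW           # step 12
            return True
        return False                              # step 13


server = ObjectStorageServer()
results = [
    server.handle(Request.NORMAL, 1000, 500),  # executed: low -> normal
    server.handle(Request.NORMAL, 1000, 501),  # ignored: already normal
    server.handle(Request.DOWN, 1000, 500),    # ignored: 1000501 is still being served
    server.handle(Request.DOWN, 1000, 501),    # executed: the set is now empty
]
```

One observation on the sketch rather than on the source: plain concatenation can map different pairs to the same identifier (job 1 with process 23 and job 12 with process 3 both give "123"), so an implementation might prefer a fixed-width encoding or a `(job_number, process_number)` tuple as the set element.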
This method achieves the following effects:
1) Power states can be set adaptively according to whether each object storage server that serves checkpoint operations is busy or idle, which reduces the power consumption of idle servers.
2) For large job sizes and multi-job workloads, the invention defines a working-state set for each object storage server; querying the state of this set eliminates conflicts among multiple power-state setting commands sent to the object storage server.
Description of drawings
Fig. 1 shows the structure of an object storage system with a checkpoint image partition. Partition 1 stores program data; partition 2 is the checkpoint image partition and stores the checkpoint image files.
Fig. 2 is the overall flow chart of the present invention.
Embodiment
Step 1), define two power states for the object storage server: the normal power state and the low power state.
Step 2), construct a server working-state set $G_j$ for object storage server $j$; initially $G_j$ is empty.
Step 3), object storage server $j$ continuously waits for an incoming power-state setting request $R$, $R \in \{R_{Normal}, R_{Down}\}$.
Step 4), after the object storage server receives a power-state setting request $R$, the job number and process number are concatenated to form a process identifier $I$.
Step 5), if $R = R_{Normal}$, go to step 6); otherwise, go to step 10).
Step 6), $G_j = G_j \cup \{I\}$.
Step 7), if object storage server $j$ is currently in the low power state, go to step 8); otherwise, go to step 9).
Step 8), set object storage server $j$ to the normal power state and update its recorded power state to normal; go to step 14).
Step 9), ignore request $R$ and go to step 14).
Step 10), $G_j = G_j - \{I\}$.
Step 11), check whether $G_j$ is now empty; if so, go to step 12); otherwise, go to step 13).
Step 12), set object storage server $j$ to the low power state and update its recorded power state to low; go to step 14).
Step 13), ignore request $R$.
Step 14), check whether a new checkpoint service request is about to arrive; if so, go to step 3); otherwise, go to step 15).
Step 15), end.
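The embodiment's outer loop (steps 3 and 14) can be sketched as a replay of a request stream against several object storage servers at once. This is an illustrative sketch under stated assumptions: `run_servers` and the request tuple layout are hypothetical, every server is assumed to start in the low power state, and a transition is recorded only when the state actually changes, which is a simplification of step 12.

```python
from collections import deque


def run_servers(n_servers, requests):
    """Replay power-state requests against n_servers object storage servers.

    requests: iterable of (j, kind, job_number, process_number), where
    1 <= j <= n_servers and kind is "normal" or "down".
    Returns the list of (j, new_state) transitions that were carried out.
    """
    working_sets = {j: set() for j in range(1, n_servers + 1)}  # G_j for each server
    states = {j: "low" for j in range(1, n_servers + 1)}        # assumed initial state
    transitions = []
    pending = deque(requests)
    while pending:                                  # step 14: is another request waiting?
        j, kind, job, proc = pending.popleft()      # step 3: take the next request
        pid = f"{job}{proc}"                        # step 4: concatenated identifier
        if kind == "normal":
            working_sets[j].add(pid)                # step 6
            if states[j] == "low":                  # steps 7 and 8
                states[j] = "normal"
                transitions.append((j, "normal"))
        else:
            working_sets[j].discard(pid)            # step 10
            if not working_sets[j] and states[j] != "low":  # steps 11 and 12
                states[j] = "low"
                transitions.append((j, "low"))
    return transitions


# Job 7 has three processes; processes 1 and 2 share server 1, process 3 uses server 2.
requests = [
    (1, "normal", 7, 1),
    (2, "normal", 7, 3),
    (1, "normal", 7, 2),
    (1, "down", 7, 1),
    (2, "down", 7, 3),
    (1, "down", 7, 2),
]
transitions = run_servers(2, requests)
```

In this replay, server 1 stays in the normal power state until both of its processes have finished, while server 2 drops to low power as soon as its only process is done.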

Claims (1)

1. A check-point power consumption optimization method for a parallel storage system, characterized by comprising the following steps:
Step 1: define two power states for the object storage server, the normal power state and the low power state; before a checkpoint operation is performed, the object storage server is set to the normal power state; after the checkpoint operation completes, the computing node sends a low-power-state setting command to the object storage server, and the object storage server is set to the low power state;
Step 2: for a parallel storage system with $N$ object storage servers, construct a server working-state set $G_j$ for each object storage server $j$; each element of $G_j$ is a process identifier $I$, obtained by concatenating a job number and a process number, and represents a process to which object storage server $j$ is providing checkpoint service; initially $G_j$ is empty; $N$ is a positive integer and $1 \le j \le N$;
Each object storage server $j$ then performs the following:
Step 3: object storage server $j$ waits for an incoming power-state setting request $R$, $R \in \{R_{Normal}, R_{Down}\}$, where $R_{Normal}$ is a request to set the object storage server to the normal power state and $R_{Down}$ is a request to set it to the low power state;
Step 4: after receiving a power-state setting request $R$, the object storage server obtains the job number and process number that correspond to the request and concatenates them to form a process identifier $I$;
Step 5: if $R = R_{Normal}$, go to step 6; otherwise, go to step 10;
Step 6: the request $R$ asks for the object storage server to be set to the normal power state, which means that object storage server $j$ needs to serve process $I$; add $I$ to the server working-state set $G_j$, that is, $G_j = G_j \cup \{I\}$;
Step 7: query the current power state of object storage server $j$; if it is in the low power state, go to step 8; otherwise, go to step 9;
Step 8: object storage server $j$ executes request $R$, the server is set to the normal power state, and its recorded power state is updated to normal; go to step 14;
Step 9: ignore request $R$ and go to step 14;
Step 10: the request $R$ asks for the object storage server to be set to the low power state, which means that object storage server $j$ has finished serving process $I$; remove $I$ from the server working-state set $G_j$, that is, $G_j = G_j - \{I\}$;
Step 11: check whether $G_j$ is now empty; if so, go to step 12; otherwise, go to step 13;
Step 12: object storage server $j$ executes request $R$, the server is set to the low power state, and its recorded power state is updated to low; go to step 14;
Step 13: checkpoint operations of other processes still need service, so ignore request $R$;
Step 14: check whether a new checkpoint service request is about to arrive; if so, go to step 3; otherwise, go to step 15;
Step 15: end.
CN2010102295358A 2010-07-19 2010-07-19 Parallel memory system check-point power consumption optimization method Expired - Fee Related CN101881996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102295358A CN101881996B (en) 2010-07-19 2010-07-19 Parallel memory system check-point power consumption optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102295358A CN101881996B (en) 2010-07-19 2010-07-19 Parallel memory system check-point power consumption optimization method

Publications (2)

Publication Number Publication Date
CN101881996A (en) 2010-11-10
CN101881996B (en) 2011-07-27

Family

ID=43054027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102295358A Expired - Fee Related CN101881996B (en) 2010-07-19 2010-07-19 Parallel memory system check-point power consumption optimization method

Country Status (1)

Country Link
CN (1) CN101881996B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915257A (en) * 2012-09-28 2013-02-06 曙光信息产业(北京)有限公司 TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1584787A (en) * 2003-08-19 2005-02-23 英特尔公司 Power conservation in the absence of AC power
WO2008016162A1 (en) * 2006-08-02 2008-02-07 Kabushiki Kaisha Toshiba Memory system and memory chip

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915257A (en) * 2012-09-28 2013-02-06 曙光信息产业(北京)有限公司 TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method
CN102915257B (en) * 2012-09-28 2017-02-08 曙光信息产业(北京)有限公司 TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method

Also Published As

Publication number Publication date
CN101881996B (en) 2011-07-27

Similar Documents

Publication Publication Date Title
US11345020B2 (en) Robot cluster scheduling system
US8943353B2 (en) Assigning nodes to jobs based on reliability factors
CN102111337B (en) Method and system for task scheduling
Ananthanarayanan et al. Why let resources idle? Aggressive cloning of jobs with Dolly
CN109240825B (en) Elastic task scheduling method, device, equipment and computer readable storage medium
CN103067425A (en) Creation method of virtual machine, management system of virtual machine and related equipment thereof
CN102958166A (en) Resource allocation method and resource management platform
US20110107344A1 (en) Multi-core apparatus and load balancing method thereof
CN108351783A (en) The method and apparatus that task is handled in multinuclear digital information processing system
CN103593242A (en) Resource sharing control system based on Yarn frame
CN102713854A (en) Method and apparatus for saving and restoring container state
WO2014168913A1 (en) Database management system with database hibernation and bursting
CN101713970A (en) Method and systems for restarting a flight control system
US20200183703A1 (en) Systems and methods for selecting a target host for migration of a virtual machine
US11169844B2 (en) Virtual machine migration to multiple destination nodes
US20230325082A1 (en) Method for setting up and expanding storage capacity of cloud without disruption of cloud services and electronic device employing method
CN105808346A (en) Task scheduling method and device
CN105095112A (en) Method and device for controlling caches to write and readable storage medium of non-volatile computer
CN101881996B (en) Parallel memory system check-point power consumption optimization method
CN109783304B (en) Energy-saving scheduling method and corresponding device for data center
CN103957229A (en) Active updating method, device and server for physical machines in IaaS cloud system
US20190243673A1 (en) System and method for timing out guest operating system requests from hypervisor level
JP2007328413A (en) Method for distributing load
CN112328402A (en) High-efficiency self-adaptive space-based computing platform architecture and implementation method thereof
WO2015111067A1 (en) Dynamically patching kernels using storage data structures

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110727

Termination date: 20160719
