CN102176696B

CN102176696B - Multi-computer system

Info

Publication number: CN102176696B
Application number: CN201110046897.8A
Authority: CN
Inventors: 李麟; 刘瑞贤; 张晋锋
Original assignee: Dawning Information Industry Beijing Co Ltd
Current assignee: Zhongke Shuguang International Information Industry Co ltd; Dawning Information Industry Co Ltd
Priority date: 2011-02-25
Filing date: 2011-02-25
Publication date: 2013-03-20
Anticipated expiration: 2031-02-25
Also published as: CN102176696A

Abstract

To free administrators from the machine room, simplify cluster management, and address the excessive power consumption that results from growing server performance and server counts, the invention provides a multi-computer system comprising a plurality of working groups. Each working group contains a plurality of single computers grouped according to a preset policy. Each single computer reports its power consumption information and load information to its working group once per preset acquisition period. Each working group allocates energy resources to its single computers according to that information and maintains a dynamic resource pool; when the load of a single computer changes suddenly, that computer draws energy resources from the pool. The working group also uses the power consumption and load information to forecast the load for the next acquisition period, and uses the forecast when allocating energy resources.

Description

Multicomputer system

Technical field

The present invention relates generally to the field of computers and, more specifically, to a multi-computer system.

Background technology

A multi-computer system in which many homogeneous or heterogeneous computing nodes are connected by a network and presented as a single system image is also called a cluster. A cluster offers a high performance-to-price ratio, resource sharing, high flexibility, good scalability, and high fault tolerance. In recent years, with the development of computer technology, building supercomputers or super-servers as clusters has become a popular trend. Cluster scale has grown from a few nodes to hundreds or even thousands of nodes, and managing and monitoring a cluster has become correspondingly complex and challenging.

At the same time, the cluster must be monitored effectively so that the administrator can manage the whole system easily through a graphical interface. The monitoring system should provide easy-to-use, extensible tools that help the administrator monitor the working state of the entire cluster, so that the cluster runs efficiently and stably.

However, as server performance and server counts increase, the electric power that clusters consume has risen steadily in recent years. With resources increasingly scarce, managing and studying cluster power consumption has high social and economic value. New focal points have therefore emerged: how to extract valid data from massive, unprocessed monitoring information; how to process and analyze that information; how to regulate the distribution of server energy consumption dynamically according to load; and how to distribute load dynamically according to load conditions. There is also a demand for saving energy.

Summary of the invention

To free administrators from the machine room and simplify cluster management, and to address problems such as the excessive power consumption that accompanies growing server performance and server counts, the invention provides a multi-computer system comprising a plurality of working groups, each of which contains a plurality of units (single computers) grouped according to a predetermined policy. Each unit reports its own power consumption information and load information to the working group it belongs to once per predetermined acquisition period. The working group allocates energy resources to its units according to the power consumption information and load information. The working group has a dynamic resource pool; when the load of one of the units changes suddenly, that unit uses energy resources from the dynamic resource pool. In addition, the working group predicts the load for the next predetermined acquisition period from the power consumption information and load information, and uses the prediction for energy resource allocation.

The predetermined policy is that units running the same service belong to the same working group.

The load information comprises CPU utilization, CPU frequency, memory utilization, bandwidth utilization, and disk I/O access rate.

The prediction comprises: step 1, calculating a first output error of the network connecting the multi-computer system; step 2, performing one training pass and using the learning rate to calculate the weights and thresholds of the network and a second output error of the network after training; step 3, when the ratio of the second output error to the first output error is greater than a predefined parameter, decreasing the learning rate by one step, and otherwise increasing the learning rate by one step; step 4, returning to step 2 until the ratio of the second output error to the first output error is less than the predefined parameter.

The weights are calculated with the following formula:

$$W_i(t_n) = A_1 \cdot \text{CPU utilization}(t_n) + A_2 \cdot \text{memory utilization}(t_n) + A_3 \cdot \text{bandwidth utilization}(t_n) + A_4 \cdot \text{disk I/O access rate}(t_n)$$

where $A_1$ is the constant factor for CPU utilization at time $t_n$, $A_2$ is the constant factor for memory utilization at time $t_n$, $A_3$ is the constant factor for bandwidth utilization at time $t_n$, and $A_4$ is the constant factor for disk I/O access rate at time $t_n$.

The resources in the dynamic resource pool are quantified as preset power, and when a unit uses energy resources from the dynamic resource pool, the unit reports its usage time to the working group it belongs to.

When the usage time expires, the working group that the unit belongs to orders the unit to return the energy resources.

Alternatively, when the usage time expires, the working group that the unit belongs to asks the unit whether it will return the energy resources; if the unit still needs them, it continues to use the energy resources, and otherwise it returns them.

When the energy resources in the dynamic resource pool are insufficient, the working group locks the dynamic resource pool.

Other features and advantages of the invention are set forth in the description that follows; in part they will become apparent from the description, or may be learned by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structure particularly pointed out in the written description, the claims, and the accompanying drawing.

Description of drawings

The accompanying drawing described here provides a further understanding of the invention and forms part of this application. The illustrative embodiments of the invention and their description explain the invention and do not unduly limit it. In the drawing:

Fig. 1 is a block diagram of a multi-computer system according to the invention.

Embodiment

Embodiments of the invention are described in detail below with reference to the accompanying drawing.

The system provided by the invention is a multi-level distributed control system based on correlation, and each layer adopts a different control strategy.

Information about the cluster, such as CPU utilization, CPU frequency, memory utilization, bandwidth utilization, disk I/O access rate, and power consumption, is monitored and scheduled at several levels: the unit layer, the working group layer, and the cluster layer. Historical data is collected and stored, and the collected data is processed and analyzed. Based on load conditions, the system dynamically regulates how server energy consumption is distributed and dynamically distributes load in a reasonable way, finding a mapping from load to computers. This raises CPU utilization and overall resource utilization, enables effective sharing of high-performance resources, and reduces cluster energy consumption.

A non-limiting embodiment of the invention is described below with reference to Fig. 1.

Unit layer 101:

The unit layer monitors information such as power consumption and load in real time and provides this information to the upper layer. Each node also executes the commands issued by the upper layer. The load information comprises CPU utilization, CPU frequency, memory utilization, bandwidth utilization, and disk I/O access rate.
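As a minimal sketch of the per-period report described above, the following Python models one sample and a history store on the working group side. The names `UnitReport` and `WorkingGroupHistory`, and all field names and units, are illustrative assumptions; the text specifies only which quantities are reported.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UnitReport:
    """One sample that a unit sends to its working group each acquisition period."""
    unit_id: str
    timestamp: float              # sampling instant t_n, in seconds (assumed unit)
    power_watts: float            # power consumption information (assumed unit)
    cpu_utilization: float        # fraction in [0, 1]
    cpu_frequency_mhz: float
    memory_utilization: float     # fraction in [0, 1]
    bandwidth_utilization: float  # fraction in [0, 1]
    disk_io_rate: float           # fraction in [0, 1]


@dataclass
class WorkingGroupHistory:
    """Keeps every report per unit so later stages can analyze historical data."""
    reports: Dict[str, List[UnitReport]] = field(default_factory=dict)

    def receive(self, report: UnitReport) -> None:
        # Append the report to the history of the unit that sent it.
        self.reports.setdefault(report.unit_id, []).append(report)

    def latest(self, unit_id: str) -> UnitReport:
        # Most recent report received from the given unit.
        return self.reports[unit_id][-1]
```

Keeping the full history per unit, rather than only the latest sample, is what makes the later prediction step possible, since it needs a time series of load values.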

Working group layer 103:

Nodes are organized into working groups by service: different services belong to different working groups, and the nodes within a working group complete a service jointly by sharing resources, so the working group must make load-balancing decisions. The working group supports two kinds of load. The first is predictable load, which must be distributed to the nodes in a balanced way according to the service. The second is unpredictable load, for which a dynamic resource pool is provided to absorb sudden load changes at the unit layer. The resource pool is quantified: for example, if the resource size per minute is set to A, the quantified pool is described by the pair (resource size, time), where the resource size is A and the time is synchronized with the units. When the load at the unit layer changes suddenly, a unit can apply to the working group for resources and for a usage time. When the usage time expires, the working group asks whether the resources will be returned; if the unit still needs them, it can continue to use them. If the resources are insufficient, an instruction is triggered that locks the dynamic resource pool.
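The request, expiry inquiry, and locking behaviour of the dynamic resource pool can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `DynamicResourcePool`, `Lease`, the watt-based capacity, and the rule that returned resources unlock the pool are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Lease:
    power: float       # amount of pooled power granted to the unit
    expires_at: float  # time at which the working group asks about returning it


@dataclass
class DynamicResourcePool:
    capacity: float                 # total pooled power (assumed unit: watts)
    locked: bool = False
    leases: Dict[str, Lease] = field(default_factory=dict)

    @property
    def available(self) -> float:
        return self.capacity - sum(lease.power for lease in self.leases.values())

    def request(self, unit_id: str, power: float, now: float, duration: float) -> bool:
        """Grant power for `duration`; lock the pool when it cannot cover the request."""
        if self.locked or power > self.available:
            self.locked = True
            return False
        self.leases[unit_id] = Lease(power, now + duration)
        return True

    def on_expiry(self, unit_id: str, now: float, still_needed: bool,
                  extension: float) -> bool:
        """Ask the unit at expiry; return True if the unit keeps the resources."""
        lease = self.leases[unit_id]
        if now < lease.expires_at:
            return True                      # usage time has not expired yet
        if still_needed:
            lease.expires_at = now + extension
            return True
        del self.leases[unit_id]
        self.locked = False                  # assumption: returned power unlocks the pool
        return False
```

In this sketch a unit that still needs the resources simply has its usage time extended, which corresponds to the "continue to use" branch in the text.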

Cluster layer 105:

Multiple working groups are organized together by application or by region. Resources are allocated reasonably according to the services and their priorities, ensuring that the lower layers run stably, reasonably, and efficiently. The cluster layer also provides remote backup of resources.

Monitoring information is processed and load is distributed as follows:

The cluster's CPU utilization, CPU frequency, memory utilization, bandwidth utilization, disk I/O access rate, power consumption, and similar information are monitored at the unit, working group, and cluster levels to obtain the relevant data. The data obtained is then processed.

First, from the cluster monitoring information, the overall utilization of cluster resources such as CPU, memory, bandwidth, and disk is calculated. At the same time, usage is tallied for the total CPU computing capacity $E$ of the cluster and for the CPU computing capacity $E_i$ of each unit.

Second, based on the results above, the cluster load at time $t_n + \Delta t$ is predicted, where $\Delta t$ is the acquisition period of the monitoring information.
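One way to tally these figures is sketched below, assuming that the overall CPU utilization is the capacity-weighted mean of the per-unit utilizations. The text does not state the aggregation rule, so that choice and the function name `cluster_cpu_summary` are assumptions.

```python
from typing import Iterable, Tuple


def cluster_cpu_summary(units: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (E, overall CPU utilization) from (E_i, utilization_i) pairs.

    E is the sum of the per-unit capacities E_i. The overall utilization is
    the capacity-weighted mean of the per-unit utilizations (an assumption).
    """
    pairs = list(units)
    total_capacity = sum(capacity for capacity, _ in pairs)
    if total_capacity == 0:
        return 0.0, 0.0
    used = sum(capacity * utilization for capacity, utilization in pairs)
    return total_capacity, used / total_capacity
```

Weighting by capacity means a heavily loaded small node does not distort the cluster figure as much as a heavily loaded large node.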

This embodiment uses a neural network prediction method. Artificial neural networks are capable of self-organization, self-adaptation, and self-learning, and they handle well the uncertainty and nonlinearity that arise from the many influencing factors in a time series, which makes neural networks one of the most promising prediction techniques. A BP (back-propagation) network is suitable for prediction problems. Once the network type is chosen, the structure and parameters of the network must be selected: the number of layers, the number of nodes per layer, the initial weights, the thresholds, the learning algorithm, the learning rate, and so on. Most of these parameters are chosen by experience and trial and error. The guiding principle for choosing the number of nodes is to use as few hidden nodes as possible while still reflecting the input-output relationship correctly, so that the network is as simple as possible.

The standard BP algorithm is widely used in practice, but it has shortcomings: it converges slowly, it suffers from the local minimum problem, and there is no generally accepted theoretical guidance for setting the network's structural and operational parameters, which are usually chosen by experience. This embodiment improves the standard BP algorithm by using an adaptive learning-rate modification algorithm to accelerate network convergence. The detailed process is:

First, the output error of the network is calculated.

Then, after each training pass, the weights and thresholds of the network are calculated using the current learning rate, and the network output error at that point is calculated. If the ratio of the current output error to the previous output error is greater than the predefined parameter perfect_inc, the learning rate is decreased by one unit step; otherwise the learning rate is increased by one unit step.

Finally, the weights and thresholds of the network are recalculated until the output error is less than the parameter perfect_inc.
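The learning-rate adjustment rule can be sketched in Python as follows. For brevity the sketch trains a single linear neuron (weights plus one threshold) by batch gradient descent instead of a full BP network, so it illustrates only the adaptation rule. The stopping test `error_goal`, the cap `max_epochs`, the floor of one step on the learning rate, and all default values are assumptions that do not come from the text.

```python
from typing import List, Sequence, Tuple


def adapt_learning_rate(lr: float, prev_err: float, err: float,
                        perfect_inc: float, step: float) -> float:
    """Decrease the rate by one step if err/prev_err exceeds perfect_inc, else increase it."""
    if prev_err > 0 and err / prev_err > perfect_inc:
        return max(lr - step, step)  # assumption: never go below one step
    return lr + step


def output_error(w: Sequence[float], b: float,
                 xs: Sequence[Sequence[float]], ys: Sequence[float]) -> float:
    """Mean squared output error of one linear neuron with weights w and threshold b."""
    total = 0.0
    for x, y in zip(xs, ys):
        out = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += (out - y) ** 2
    return total / len(xs)


def train(xs: Sequence[Sequence[float]], ys: Sequence[float],
          lr: float = 0.1, step: float = 0.05, perfect_inc: float = 1.0,
          error_goal: float = 1e-6,
          max_epochs: int = 1000) -> Tuple[List[float], float, float, float]:
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    prev_err = output_error(w, b, xs, ys)      # first output error
    err = prev_err
    for _ in range(max_epochs):
        if err < error_goal:                   # hypothetical stopping test
            break
        # One training pass: gradient of the mean squared error at the current rate.
        diffs = [sum(wi * xi for wi, xi in zip(w, x)) + b - y for x, y in zip(xs, ys)]
        grad_w = [2 * sum(d * x[j] for d, x in zip(diffs, xs)) / n for j in range(len(w))]
        grad_b = 2 * sum(diffs) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
        err = output_error(w, b, xs, ys)       # output error after training
        lr = adapt_learning_rate(lr, prev_err, err, perfect_inc, step)
        prev_err = err                         # the new error becomes the reference
    return w, b, lr, err
```

Because the rate keeps growing while the error falls, the rule can overshoot on harder data before it backs off; a practical implementation would likely bound the rate from above as well.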

Finally, based on the predicted load and with minimum cluster power consumption as the objective, load is distributed reasonably and power is allocated according to the load on each node, thereby saving energy. The load distribution strategy is as follows:

1. When the cluster's nodes are first put into use, the system administrator assigns each node an initial weight $W_i^0$ according to the node's hardware configuration; in general, the higher a node's performance, the higher its initial weight. As node load changes, the node weights are continuously adjusted dynamically.

2. CPU utilization, memory utilization, bandwidth utilization, and disk I/O access rate are the factors of the calculation formula. New weights are calculated from the monitoring information currently collected for each node. Because the proportions of the parameters need to be adjusted appropriately for different applications while the system is running, a constant factor $A_i$ is set for each parameter, with $\sum A_i = 1$. The weight of each node $N_i$ at time $t_n$ can then be described as:

$$W_i(t_n) = A_1 \cdot \text{CPU utilization}(t_n) + A_2 \cdot \text{memory utilization}(t_n) + A_3 \cdot \text{bandwidth utilization}(t_n) + A_4 \cdot \text{disk I/O access rate}(t_n)$$

3. From the dynamic weight of each node and the predicted cluster load at time $t_n + \Delta t$, the load of each node can be distributed reasonably, and power is then allocated dynamically according to the load on each node, thereby saving energy.
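The weight formula and one possible use of it can be sketched as follows. The factor values in `FACTORS` are hypothetical, and the text does not say how the weights map to load shares, so `distribute_load` assumes that the predicted load is shared in proportion to each node's headroom, $1 - W_i$. All four inputs are taken to be fractions in [0, 1].

```python
from typing import Dict, Sequence

FACTORS = (0.4, 0.3, 0.2, 0.1)  # hypothetical A_1..A_4; they must sum to 1


def node_weight(cpu: float, mem: float, bandwidth: float, disk_io: float,
                factors: Sequence[float] = FACTORS) -> float:
    """W_i(t_n): weighted sum of the four utilization figures at time t_n."""
    if abs(sum(factors) - 1.0) > 1e-9:
        raise ValueError("constant factors A_i must sum to 1")
    a1, a2, a3, a4 = factors
    return a1 * cpu + a2 * mem + a3 * bandwidth + a4 * disk_io


def distribute_load(predicted_load: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Assumption: share the predicted load in proportion to headroom 1 - W_i."""
    headroom = {node: max(0.0, 1.0 - w) for node, w in weights.items()}
    total = sum(headroom.values())
    if total == 0:
        # Every node is saturated, so fall back to an even split.
        return {node: predicted_load / len(weights) for node in weights}
    return {node: predicted_load * h / total for node, h in headroom.items()}
```

For example, with weights of 0.25 and 0.75 for two nodes, the less busy node receives three quarters of the predicted load under this assumption.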

The above are merely preferred embodiments of the invention and do not limit the invention; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A multi-computer system, characterized by comprising:

a plurality of working groups, each of which contains a plurality of units according to a predetermined policy,

wherein each of the plurality of units reports its own power consumption information and load information to the working group it belongs to once per predetermined acquisition period, the load information comprising CPU utilization, CPU frequency, memory utilization, bandwidth utilization, and disk I/O access rate;

wherein the working group allocates energy resources to the plurality of units it contains according to the power consumption information and the load information, and the working group has a dynamic resource pool, such that when the load of one unit among the plurality of units changes suddenly, that unit uses the energy resources in the dynamic resource pool;

and wherein the working group predicts, from the power consumption information and the load information, the load for the next predetermined acquisition period for use in energy resource allocation, the prediction comprising:

step 1: calculating a first output error of the network connecting the multi-computer system;

step 2: performing one training pass, and using the learning rate to calculate the weights and thresholds of the network and a second output error of the network after training;

step 3: when the ratio of the second output error to the first output error is greater than a predefined parameter, decreasing the learning rate by one step, and otherwise increasing the learning rate by one step;

step 4: returning to step 2 until the ratio of the second output error to the first output error is less than the predefined parameter.

2. The system according to claim 1, characterized in that the predetermined policy is that units running the same service belong to the same working group.

3. The system according to claim 1, characterized in that the weights are calculated with the following formula:

$$W_i(t_n) = A_1 \cdot \text{CPU utilization}(t_n) + A_2 \cdot \text{memory utilization}(t_n) + A_3 \cdot \text{bandwidth utilization}(t_n) + A_4 \cdot \text{disk I/O access rate}(t_n)$$

where $A_1$ is the constant factor for the CPU utilization at time $t_n$, $A_2$ is the constant factor for the memory utilization at time $t_n$, $A_3$ is the constant factor for the bandwidth utilization at time $t_n$, and $A_4$ is the constant factor for the disk I/O access rate at time $t_n$.

4. The system according to claim 1, characterized in that the resources in the dynamic resource pool are quantified as preset power, and when the one unit uses the energy resources in the dynamic resource pool, the one unit reports its usage time to the working group it belongs to.

5. The system according to claim 4, characterized in that, when the usage time expires, the working group that the one unit belongs to orders the one unit to return the energy resources.

6. The system according to claim 4, characterized in that, when the usage time expires, the working group that the one unit belongs to asks the one unit whether it will return the energy resources; if the one unit still needs them, it continues to use the energy resources, and otherwise it returns the energy resources.

7. The system according to claim 1, characterized in that, when the energy resources in the dynamic resource pool are insufficient, the working group locks the dynamic resource pool.