CN102176696B - Multi-computer system - Google Patents


Info

Publication number
CN102176696B
Authority
CN
China
Prior art keywords
working group
unit
load
energy resource
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110046897.8A
Other languages
Chinese (zh)
Other versions
CN102176696A (en
Inventor
李麟
刘瑞贤
张晋锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Shuguang International Information Industry Co ltd
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN201110046897.8A
Publication of CN102176696A
Application granted
Publication of CN102176696B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

To free administrators from the machine room, make cluster management easier, and address problems such as the excessive power consumption caused by the growing performance and number of servers, the invention provides a multi-computer system comprising a plurality of working groups, each of which contains a plurality of single computers according to a preset policy. Each single computer reports its power consumption information and load information to its working group at a preset acquisition period. Each working group allocates energy resources to the single computers it contains according to the power consumption information and load information, and each working group has a dynamic resource pool. When the load of one of the single computers changes suddenly, that computer uses energy resources from the dynamic resource pool. The working group also predicts the load for the next preset acquisition period from the power consumption information and load information, and uses the prediction to allocate energy resources.

Description

Multi-computer system
Technical field
The present invention relates generally to the field of computers and, more specifically, to a multi-computer system.
Background
When many homogeneous or heterogeneous computing nodes are connected by a network so that they present a single system image, the resulting multi-computer system is also called a cluster. A cluster offers a high performance-to-price ratio, resource sharing, high flexibility, good scalability, and high fault tolerance. In recent years, with the development of computer technology, building supercomputers or superservers from clusters has become a popular trend. Clusters have grown from a few nodes to hundreds or even thousands of nodes, and managing and monitoring them has become increasingly complex and challenging.
At the same time, the cluster must be monitored effectively so that the administrator can manage the whole system easily through a graphical interface. The monitoring system should provide easy-to-use, extensible tools that help the administrator monitor the working state of the whole cluster and thereby keep the cluster running efficiently and stably.
However, as server performance and server counts have risen, the electric power that servers consume has climbed steadily in recent years. Now that resource scarcity is increasingly serious, managing and studying cluster power consumption has considerable social and economic value. New focal points have therefore emerged: how to extract valid data from massive, unprocessed monitoring information; how to process and analyze the monitored information; how to regulate the distribution of server energy consumption dynamically according to load; and how to distribute load dynamically according to load conditions. Saving energy is a requirement as well.
Summary of the invention
To free the administrator from the machine room and make cluster management easier, and to address problems such as the excessive power consumption caused by the growing performance and number of servers, the invention provides a multi-computer system comprising a plurality of working groups, each of which contains a plurality of units (single computers) according to a predetermined policy. Each of the units reports its own power consumption information and load information to the working group to which it belongs at a predetermined acquisition period. The working group allocates energy resources to the units it contains according to the power consumption information and load information. The working group has a dynamic resource pool, and when the load of one of the units changes suddenly, that unit uses energy resources from the dynamic resource pool. The working group also predicts the load for the next predetermined acquisition period from the power consumption information and load information, and uses the prediction to allocate energy resources.
Wherein, the predetermined policy is that units running the same business are placed in the same working group.
Wherein, the load information comprises CPU utilization, CPU frequency, memory utilization, bandwidth utilization, and disk I/O access rate.
Wherein, the prediction comprises: step 1, calculating a first output error of the network connecting the multi-computer system; step 2, performing one round of training and using a learning rate to calculate the weights and thresholds of the network and a second output error of the network after the training; step 3, decreasing the learning rate by one step when the ratio of the second output error to the first output error is greater than a predetermined parameter, and otherwise increasing the learning rate by one step; step 4, returning to step 2 until the ratio of the second output error to the first output error is less than the predetermined parameter.
Wherein, the weights are calculated with the following formula:
$$W_i(t_n) = A_1 \cdot \text{CPU utilization}(t_n) + A_2 \cdot \text{memory utilization}(t_n) + A_3 \cdot \text{bandwidth utilization}(t_n) + A_4 \cdot \text{disk I/O access rate}(t_n)$$
where $A_1$ is the constant coefficient for the CPU utilization at $t_n$, $A_2$ is the constant coefficient for the memory utilization at $t_n$, $A_3$ is the constant coefficient for the bandwidth utilization at $t_n$, and $A_4$ is the constant coefficient for the disk I/O access rate at $t_n$.
Wherein, the resources in the dynamic resource pool are quantized as a preset power, and when a unit uses energy resources from the dynamic resource pool, the unit reports a usage time to the working group to which it belongs.
Wherein, when the usage time expires, the working group to which the unit belongs orders the unit to return the energy resources.
Wherein, when the usage time expires, the working group to which the unit belongs asks the unit whether it will return the energy resources; if the unit indicates that it still needs them, it continues to use the energy resources; otherwise, it returns them.
Wherein, when the energy resources in the dynamic resource pool are insufficient, the working group locks the dynamic resource pool.
Other features and advantages of the present invention are set forth in the description that follows and will in part be apparent from the description or be learned by practicing the invention. The objects and other advantages of the invention can be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
Description of drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows a block diagram of a multi-computer system according to the present invention.
Embodiments
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
The system provided by the present invention is a correlation-based, multi-level distributed control system in which each layer adopts a different control strategy.
Information such as the cluster's CPU utilization, CPU frequency, memory utilization, bandwidth utilization, disk I/O access rate, and power consumption is monitored and scheduled at several levels: the unit layer, the working group layer, and the cluster layer. Historical data is collected and stored, and the collected data is processed and analyzed. The distribution of server energy consumption is regulated dynamically according to load, and load is distributed dynamically and reasonably according to load conditions. The aim is to find a mapping from load to computers that improves CPU utilization and overall resource utilization, shares high-performance resources effectively, reduces cluster energy consumption, and saves energy.
One non-limiting embodiment of the present invention is described below with reference to Fig. 1.
Unit layer 101:
This layer monitors information such as power consumption and load in real time and provides that information to the upper layer. The node also executes the commands issued by the upper layer. The load information comprises CPU utilization, CPU frequency, memory utilization, bandwidth utilization, and disk I/O access rate.
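The report that a unit sends upward each acquisition period could be modeled as a small record. The following is only a minimal sketch: the class name `UnitReport`, the field names, the units of measure, the JSON encoding, and the assumption that utilization figures (including the disk I/O access rate) are normalized to the range 0 to 1 are all illustrative choices that the patent does not specify.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UnitReport:
    """One sample reported by a unit to its working group (hypothetical layout)."""
    unit_id: str
    timestamp: float              # sampling time t_n, seconds since the epoch
    power_watts: float            # power consumption information
    cpu_utilization: float        # assumed normalized to 0.0-1.0
    cpu_frequency_mhz: float
    memory_utilization: float     # assumed normalized to 0.0-1.0
    bandwidth_utilization: float  # assumed normalized to 0.0-1.0
    disk_io_access_rate: float    # assumed normalized to 0.0-1.0

    def validate(self) -> None:
        """Reject samples whose normalized fields fall outside [0, 1]."""
        for name in ("cpu_utilization", "memory_utilization",
                     "bandwidth_utilization", "disk_io_access_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.power_watts < 0:
            raise ValueError("power_watts must be non-negative")


def encode_report(report: UnitReport) -> str:
    """Validate a report and serialize it for transmission to the upper layer."""
    report.validate()
    return json.dumps(asdict(report), sort_keys=True)
```

A working group would decode such messages once per acquisition period and keep them as the historical data used for prediction.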
Working group layer 103:
Multiple nodes are grouped into a working group by business. Different businesses belong to different working groups, and the nodes in a working group jointly complete a business task by sharing resources, so the working group must make load-balancing decisions. A working group supports two kinds of load. The first is predictable load, which must be distributed evenly across the nodes according to the business. The second is unpredictable load, for which a dynamic resource pool is provided to absorb sudden load changes at the unit layer. The resource pool is quantized: for example, if the resource size per minute is set to A, the quantized pool is described by the pair (resource size, time), where the resource size is A and the time is synchronized with the units. When the load at the unit layer changes suddenly, the unit can apply to the working group for resources and for a usage time. When the usage time expires, the unit is asked whether it will return the resources; if it still needs them, it can continue to use them. If the resources are insufficient, an instruction is triggered to lock the dynamic resource pool.
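The lease cycle of the dynamic resource pool (apply, expire, ask, extend or return, lock when short) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the class `DynamicResourcePool`, its method names, the rule that a locked pool refuses new requests, and the rule that returning resources unlocks the pool are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicResourcePool:
    """Pool of quantized power that units can lease for a limited usage time."""
    capacity: float                              # total pool power (assumed watts)
    locked: bool = False
    leases: dict = field(default_factory=dict)   # unit_id -> (amount, expires_at)

    @property
    def available(self) -> float:
        """Power not currently leased to any unit."""
        return self.capacity - sum(amount for amount, _ in self.leases.values())

    def request(self, unit_id: str, amount: float, now: float, duration: float) -> bool:
        """Grant a lease if enough power remains; otherwise lock the pool."""
        if self.locked or unit_id in self.leases:
            return False
        if amount > self.available:
            self.locked = True                   # resources are insufficient
            return False
        self.leases[unit_id] = (amount, now + duration)
        return True

    def on_expiry(self, unit_id: str, now: float, still_needed: bool,
                  extension: float = 0.0) -> str:
        """When the usage time is up, extend the lease or take the power back."""
        amount, expires_at = self.leases[unit_id]
        if now < expires_at:
            return "active"
        if still_needed:
            self.leases[unit_id] = (amount, now + extension)
            return "extended"
        del self.leases[unit_id]
        self.locked = False                      # assumption: a return unlocks the pool
        return "returned"
```

In this sketch the `still_needed` flag stands for the unit's answer when the working group asks whether it will return the resources.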
Cluster layer 105:
Multiple working groups are organized together by application or by region. Resources are allocated reasonably according to the different businesses and their priorities, which keeps the lower layers running stably, reasonably, and effectively. This layer also provides remote resource backup.
The monitoring information is processed and the load is distributed as follows.
Information such as the cluster's CPU utilization, CPU frequency, memory utilization, bandwidth utilization, disk I/O access rate, and power consumption is monitored at the unit, working group, and cluster levels as described above to obtain the relevant data. The data obtained is then processed.
First, the overall utilization of cluster resources (CPU, memory, bandwidth, disk, and so on) is calculated from the cluster monitoring information, and the total CPU computing capability E of the cluster and the usage of the CPU computing capability E_i of each unit are tallied.
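The CPU part of this aggregation step might look like the sketch below. The patent does not define how E_i is measured or how overall utilization is averaged, so two assumptions are made here: each unit supplies a numeric `capability` for E_i, and overall utilization is the capability-weighted mean. The function name and dictionary keys are hypothetical.

```python
def cluster_cpu_summary(units):
    """Return (E, used capability, overall CPU utilization) for a cluster.

    Each entry in `units` is a dict with 'capability' (E_i, in any consistent
    unit) and 'cpu_utilization' (assumed normalized to 0.0-1.0).
    """
    total_capability = sum(u["capability"] for u in units)            # E
    used_capability = sum(u["capability"] * u["cpu_utilization"] for u in units)
    if total_capability == 0:
        return 0.0, 0.0, 0.0
    return total_capability, used_capability, used_capability / total_capability


units = [
    {"capability": 10.0, "cpu_utilization": 0.5},
    {"capability": 30.0, "cpu_utilization": 0.25},
]
total, used, overall = cluster_cpu_summary(units)
```

The same pattern would apply to memory, bandwidth, and disk if per-unit capacities for those resources are available.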
Second, the cluster load at time t_n + Δt is predicted from the above results, where Δt is the acquisition period of the monitoring information.
This embodiment uses a neural network prediction method. Artificial neural networks are capable of self-organization, self-adaptation, and self-learning, and they handle well the uncertainty and nonlinearity of the many influencing factors in a time series, which has made neural networks one of the most promising prediction techniques. A BP (backpropagation) network is suitable for prediction problems. Once the network type is fixed, the structure and parameters of the network must be chosen: the number of layers, the number of nodes per layer, the initial weights, the thresholds, the learning algorithm, the learning rate, and so on. Most of these parameters are chosen by experience and trial and error. The principle for choosing the number of nodes is to use as few hidden nodes as possible while still reflecting the input/output relationship correctly, so that the network stays as simple as possible.
The standard BP algorithm is widely used in practice, but it converges slowly, suffers from the local minimum problem, and has no generally accepted theory to guide the setting of the network's structural and operating parameters, which are usually chosen by experience. This embodiment improves the standard BP algorithm by adaptively modifying the learning rate to accelerate network convergence. The detailed process is:
First, the output error of the network is calculated.
Then, after each round of training, the current learning rate is used to calculate the weights and thresholds of the network, and the network output error at that moment is calculated. If the ratio of the current output error to the previous output error is greater than the predefined parameter perfect_inc, the learning rate is decreased by one unit step; otherwise, the learning rate is increased by one unit step.
Finally, the weights and thresholds of the network are recalculated until the output error is less than the parameter perfect_inc.
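The adaptive learning-rate rule above can be sketched in a few lines. To keep the example short and self-contained, a single linear neuron (one weight vector plus a threshold) stands in for the BP network, so this is not a full multi-layer implementation. The function names, the default `perfect_inc=1.04`, the `min_lr`/`max_lr` clamps, the separate stopping tolerance `target_error`, and the `max_epochs` cap are assumptions that the patent does not specify.

```python
def adjust_learning_rate(lr, prev_error, new_error, step, perfect_inc,
                         min_lr=1e-4, max_lr=0.5):
    """Lower lr by one step if the error ratio exceeds perfect_inc, else raise it."""
    ratio = new_error / prev_error if prev_error > 0 else 0.0
    lr = lr - step if ratio > perfect_inc else lr + step
    return min(max(lr, min_lr), max_lr)          # clamps are an assumption


def mean_squared_error(samples, weights, threshold):
    """Output error of the neuron over all (inputs, target) samples."""
    total = 0.0
    for inputs, target in samples:
        output = sum(w * x for w, x in zip(weights, inputs)) + threshold
        total += (output - target) ** 2
    return total / len(samples)


def train(samples, lr=0.05, step=0.01, perfect_inc=1.04,
          target_error=1e-4, max_epochs=10_000):
    """Train one linear neuron; returns (weights, threshold, error, lr)."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    error = mean_squared_error(samples, weights, threshold)   # first output error
    for _ in range(max_epochs):
        if error < target_error:
            break
        # One round of training: batch gradient of the mean squared error.
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for inputs, target in samples:
            residual = sum(w * x for w, x in zip(weights, inputs)) + threshold - target
            for j, x in enumerate(inputs):
                grad_w[j] += 2.0 * residual * x / len(samples)
            grad_b += 2.0 * residual / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        threshold -= lr * grad_b
        new_error = mean_squared_error(samples, weights, threshold)  # second output error
        lr = adjust_learning_rate(lr, error, new_error, step, perfect_inc)
        error = new_error
    return weights, threshold, error, lr


# Toy data: the next load is a linear function of the current load.
samples = [([0.0], 0.1), ([0.5], 0.5), ([1.0], 0.9)]
weights, threshold, error, lr = train(samples)
```

The upper clamp matters in this sketch: without it, an additive increase can push the learning rate past the stability limit of plain gradient descent, after which the error may oscillate instead of shrinking.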
Finally, based on the predicted load and with the goal of minimizing the power consumed by the cluster, load is distributed reasonably and power is allocated according to the load on each node, thereby saving energy. The following strategy is used to distribute load:
1. When a cluster node is first put into service, the system administrator sets an initial weight $W_i^{0}$ for it according to its hardware configuration. In general, the higher a node's performance, the higher its initial weight. As the node's load changes, its weight is adjusted dynamically.
2. CPU utilization, memory utilization, bandwidth utilization, and disk I/O access rate are the factors of the calculation formula. New weights are calculated from the monitoring information currently collected for each node. Because the proportions of the parameters need suitable adjustment for different applications while the system runs, a constant coefficient $A_i$ is set for each parameter, with $\sum A_i = 1$. The weight of each node $N_i$ at time $t_n$ can then be expressed as:
$$W_i(t_n) = A_1 \cdot \text{CPU utilization}(t_n) + A_2 \cdot \text{memory utilization}(t_n) + A_3 \cdot \text{bandwidth utilization}(t_n) + A_4 \cdot \text{disk I/O access rate}(t_n)$$
3. From the dynamic weight of each node and the predicted cluster load at time $t_n + \Delta t$, the load of each node can be distributed reasonably, and power is then allocated dynamically according to each node's load, thereby saving energy.
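The weight formula in step 2 is a plain weighted sum and can be computed directly. The text does not say how the weights translate into per-node load shares in step 3, so the `distribute_load` function below illustrates one assumed policy only: predicted load is split in proportion to each node's headroom, 1 - W_i. All function names and sample coefficients are hypothetical.

```python
def node_weight(metrics, factors):
    """W_i(t_n) = A1*cpu + A2*memory + A3*bandwidth + A4*disk_io.

    `metrics` are the four utilization figures at t_n (assumed 0.0-1.0);
    `factors` are the constant coefficients A1..A4, which must sum to 1.
    """
    if len(metrics) != 4 or len(factors) != 4:
        raise ValueError("expected four metrics and four factors")
    if abs(sum(factors) - 1.0) > 1e-9:
        raise ValueError("constant factors must sum to 1")
    return sum(a * m for a, m in zip(factors, metrics))


def distribute_load(predicted_load, weights):
    """Assumed policy: share load in proportion to headroom (1 - W_i)."""
    headroom = [max(0.0, 1.0 - w) for w in weights]
    total = sum(headroom)
    if total == 0:
        # Every node is saturated; fall back to an even split.
        return [predicted_load / len(weights)] * len(weights)
    return [predicted_load * h / total for h in headroom]


factors = (0.4, 0.3, 0.2, 0.1)                     # A1..A4, illustrative values
w_busy = node_weight((0.9, 0.7, 0.6, 0.5), factors)
w_idle = node_weight((0.2, 0.3, 0.1, 0.1), factors)
shares = distribute_load(100.0, [w_busy, w_idle])
```

Under this assumed policy the lightly loaded node receives the larger share; a deployment that treats a higher weight as higher capacity, as the initial weights in step 1 suggest, would invert the proportion.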
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the invention admits various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A multi-computer system, characterized by comprising:
a plurality of working groups, each of which contains a plurality of units according to a predetermined policy,
wherein each of the plurality of units reports its own power consumption information and load information to the working group to which it belongs at a predetermined acquisition period, the load information comprising CPU utilization, CPU frequency, memory utilization, bandwidth utilization, and disk I/O access rate;
wherein the working group allocates energy resources to the plurality of units it contains according to the power consumption information and the load information, and the working group has a dynamic resource pool, such that when the load of one unit of the plurality of units changes suddenly, said one unit uses energy resources from the dynamic resource pool;
and wherein the working group predicts, from the power consumption information and the load information, the load for the next predetermined acquisition period for use in allocating energy resources, the prediction comprising:
step 1: calculating a first output error of the network connecting the multi-computer system;
step 2: performing one round of training, and using a learning rate to calculate the weights and thresholds of the network and a second output error of the network after the training;
step 3: decreasing the learning rate by one step when the ratio of the second output error to the first output error is greater than a predetermined parameter, and otherwise increasing the learning rate by one step;
step 4: returning to step 2 until the ratio of the second output error to the first output error is less than the predetermined parameter.
2. The system according to claim 1, characterized in that the predetermined policy is that units running the same business are placed in the same working group.
3. The system according to claim 1, characterized in that the weights are calculated with the following formula:
$$W_i(t_n) = A_1 \cdot \text{CPU utilization}(t_n) + A_2 \cdot \text{memory utilization}(t_n) + A_3 \cdot \text{bandwidth utilization}(t_n) + A_4 \cdot \text{disk I/O access rate}(t_n)$$
where $A_1$ is the constant coefficient for the CPU utilization at $t_n$, $A_2$ is the constant coefficient for the memory utilization at $t_n$, $A_3$ is the constant coefficient for the bandwidth utilization at $t_n$, and $A_4$ is the constant coefficient for the disk I/O access rate at $t_n$.
4. The system according to claim 1, characterized in that the resources in the dynamic resource pool are quantized as a preset power, and when said one unit uses energy resources from the dynamic resource pool, said one unit reports a usage time to the working group to which it belongs.
5. The system according to claim 4, characterized in that, when the usage time expires, the working group to which said one unit belongs orders said one unit to return the energy resources.
6. The system according to claim 4, characterized in that, when the usage time expires, the working group to which said one unit belongs asks said one unit whether it will return the energy resources; if said one unit indicates that it still needs them, it continues to use the energy resources; otherwise, it returns the energy resources.
7. The system according to claim 1, characterized in that, when the energy resources in the dynamic resource pool are insufficient, the working group locks the dynamic resource pool.
CN201110046897.8A 2011-02-25 2011-02-25 Multi-computer system Active CN102176696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110046897.8A CN102176696B (en) 2011-02-25 2011-02-25 Multi-computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110046897.8A CN102176696B (en) 2011-02-25 2011-02-25 Multi-computer system

Publications (2)

Publication Number Publication Date
CN102176696A CN102176696A (en) 2011-09-07
CN102176696B true CN102176696B (en) 2013-03-20

Family

ID=44519802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110046897.8A Active CN102176696B (en) 2011-02-25 2011-02-25 Multi-computer system

Country Status (1)

Country Link
CN (1) CN102176696B (en)


Also Published As

Publication number Publication date
CN102176696A (en) 2011-09-07


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: 100193 Beijing, Haidian District, northeast Wang West Road, building 8, No. 36

Applicant after: Dawning Information Industry (Beijing) Co.,Ltd.

Address before: 100084 Beijing Haidian District City Mill Street No. 64

Applicant before: Dawning Information Industry (Beijing) Co.,Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181213

Address after: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee after: Zhongke Shuguang International Information Industry Co.,Ltd.

Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190315

Address after: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee after: Zhongke Shuguang International Information Industry Co.,Ltd.

Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220725

Address after: 266000 room 211, zone a, software park, No. 169, Songling Road, Laoshan District, Qingdao City, Shandong Province

Patentee after: Zhongke Shuguang International Information Industry Co.,Ltd.

Patentee after: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee before: Zhongke Shuguang International Information Industry Co.,Ltd.