CN102325054A - Self-adaptive adjusting method for hierarchy management of distributed type calculation management platform cluster

Info

Publication number: CN102325054A
Application number: CN201110316673A
Authority: CN
Inventors: 王胜明; 徐泰山; 方勇杰; 徐健; 许剑冰; 郭剑; 邵伟; 张劲中; 万芳茹; 卢耀华
Original assignee: Nanjing NARI Group Corp; State Grid Electric Power Research Institute
Current assignee: Nari Technology Co Ltd
Priority date: 2011-10-18
Filing date: 2011-10-18
Publication date: 2012-01-18

Abstract

The invention provides a self-adaptive adjustment method for the hierarchical management of cluster node resources, which belongs to the field of distributed computing. The method is applied to a distributed computing management platform. The platform divides a system into a plurality of working domains according to the needs of the application functions and organizes computation with the working domain as the unit, so as to meet the different work period requirements of different application functions. Nodes can be switched between working domains in mutual support, which improves the standby capability of the system. When the number of normally running working domains or the number of cluster nodes changes, a cluster node optimized allocation module on an application server determines the cluster node allocation of all working domains according to parameters such as the reference work period, minimum reserved node count and maximum reserved node count of each working domain, with the objective of minimizing the number of cluster nodes adjusted, so that the computing capability of the computer cluster is fully utilized.

Description

A self-adaptive adjustment method for hierarchical management of the cluster of a distributed computing management platform

Technical field

The invention belongs to the field of distributed computing. More precisely, it is a self-adaptive adjustment method for hierarchical management of the cluster of a distributed computing management platform, which can be used for, but is not limited to, electric power systems.

Background technology

In the field of distributed computing, and particularly in fields that require a large amount of computation such as power system security and stability analysis, the scale of the power grid keeps growing as the power system develops. Because online systems demand fast response times, computing speed has become the bottleneck that restricts online applications. Parallel computing is an effective way to carry out online analysis of large-scale, complex power grids. Managing a large-scale cluster effectively, so that it suits the different computing functions of online analysis and delivers the greatest possible computing performance while reliability is guaranteed, has become the key to improving system computing performance in the power field.

Document one, "Distributed parallel computing platform and computing task allocation method thereof" (application number: 200810239104.2), provides a distributed parallel computing platform system and a method for allocating its computing tasks. In this system the distributed parallel computing platform receives the computation input files and forms the allocation scheme for online and offline tasks. The method divides the cluster into an offline cluster and an online cluster according to the data source: the online cluster computes only online tasks, while the offline cluster can compute both online and offline tasks. Switching the role of a cluster node requires manual configuration. The method adopts a client/server architecture in which the cluster nodes uniformly receive computation data and other instructions from the server, and each cluster node filters and processes the computation data independently.

Document two, "Fault-tolerance method using cluster nodes that back each other up" (application number: 02159479.1), proposes a fault-tolerance method in which cluster nodes back each other up. The cluster nodes are connected to, communicate with and back up one another through a heartbeat ring. The master node assigns a newly added node its position in the cluster and returns the information on the services that the new node is to take on. The new node starts the processes on which the services depend one by one and sets the corresponding service IP addresses; if a start fails, the master node selects another node to start the service. When a node in the cluster finds that an adjacent node is abnormal, it checks with that adjacent node to confirm, and the master node takes over the failed service. This method mainly solves the hot standby problem of nodes in cluster management, but it is not suited to the differentiated management of the different computing functions of a power system parallel computing cluster.

Document three, "A cluster application management system and application management method thereof" (application number: 201010286186), proposes a cluster application management system for large-scale cluster management. The system comprises an execution engine module and a database module. The database module stores the result of each application in real time and maintains a monitoring table, which records change information for the results of all applications associated with multiple applications. The execution engine module executes each application in the cluster and writes the result of each application to the database module in real time. It also reads the monitoring table in the database module periodically at a set interval; each time it has read the change information in the monitoring table, it judges from the results read whether the trigger condition of each application is satisfied and, if so, triggers the corresponding application. That invention also provides a corresponding cluster application management method. It can reduce the number of database access connections and the associated overhead, can handle various complex logical relationships between applications, and is easier to manage and operate.

Of the three methods above, document one does not consider the effect on the system of unexpected events during operation of a power system distributed computing management platform (such as a change in computation scale or the abnormal operation of a cluster node). The cluster is divided only according to the data source, and the computing function of each cluster node is fixed at initial allocation and is not adjusted automatically as the node resource demand of the computing functions changes during computation, so the computing resources cannot be fully used. Documents two and three do not consider the parallel scheduling relationships between the different application functions of power system parallel computing, and cannot cope with the variability in the computation requirements, data sources and computation processes of the different computing functions of a power system. None of the three methods therefore solves the cluster management problem of a power system distributed computing management platform well, and the computing resources of the cluster nodes cannot be fully utilized.

Summary of the invention

The technical problem to be solved by the invention is to overcome the limitations of the prior art and, taking into account the computing characteristics and work periods of the different computing functions and the resulting constantly changing demand for computing resources, to provide a self-adaptive adjustment method for hierarchical management of a parallel computing cluster that can be used in electric power systems.

In the invention, the computer cluster is managed hierarchically according to the levels "system - working domain - cluster node". The system level distinguishes different data sources, and the working domain level satisfies the different work period requirements of the computing functions within a system. The cluster nodes of a system are divided into different working domains. Data processing and result collection are carried out with the working domain as the unit, and data processing, task scheduling and result return are independent between working domains. Each working domain has its own management node. The cluster nodes of a system are adjusted dynamically among multiple working domains by the following method, whose concrete steps are:

1. The distributed computing management platform is deployed on the cluster nodes to implement functions such as node role identification, data communication between nodes, task scheduling and management, data acquisition and result return;

2. The cluster node self-adaptive adjustment program is deployed on the application server to monitor the state of the working domains and computing nodes and to manage the optimized allocation of the cluster nodes;

3. When the running state of a working domain or the number of cluster nodes changes, the cluster node self-adaptive adjustment program on the application server readjusts the allocation of cluster nodes to the working domains according to parameters such as the reference work period, minimum reserved node count and maximum reserved node count of each working domain, and distributes the adjusted allocation information to the cluster nodes;

4. The distributed computing management platform running on each cluster node switches the node dynamically between working domains according to the modified information on the working domain to which the node belongs.
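The three-level hierarchy and the per-domain parameters named in the steps above can be pictured with a small data model. The following is a minimal sketch only: the class names, field names and the convention that a lower switch_priority value is switched away first are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterNode:
    name: str
    domain_id: int = 0         # working domain 0 holds unassigned nodes
    switch_priority: int = 0   # hypothetical: lower value is switched away first


@dataclass
class WorkingDomain:
    domain_id: int
    ref_period: float                 # reference work period
    min_nodes: int = 0                # minimum reserved node count (default 0)
    max_nodes: Optional[int] = None   # None stands for "total number of cluster nodes"


@dataclass
class System:
    domains: Dict[int, WorkingDomain] = field(default_factory=dict)
    nodes: List[ClusterNode] = field(default_factory=list)

    def nodes_of(self, domain_id: int) -> List[ClusterNode]:
        """Cluster nodes currently assigned to one working domain."""
        return [n for n in self.nodes if n.domain_id == domain_id]

    def switch(self, node_name: str, new_domain_id: int) -> None:
        """Move one node to another working domain (step 4 of the method)."""
        for node in self.nodes:
            if node.name == node_name:
                node.domain_id = new_domain_id
                return
        raise KeyError(node_name)
```

In this sketch the application server would own the System object and call switch() for each node whose working domain changes; how the change is actually delivered to the node (network messages in the source) is left out.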

Assume that a given number of working domains and a given number of cluster nodes are running normally, and that a reference work period is set for each working domain i. The current assignment of cluster nodes to working domains is represented by an assignment matrix; cluster nodes in the unassigned state belong to working domain 0 by default. Each element of the matrix gives the assignment relationship between the i-th cluster node and the j-th working domain: 1 means that the i-th node is switched from another working domain into working domain j; 0 means that it does not belong to working domain j; -1 means that the i-th node is switched out of working domain j for use by another working domain. The optimized allocation is described by formulas (1) to (7):

Formula (1) is the objective function: the optimized assignment of cluster nodes to working domains satisfies the constraints of formulas (4) to (7) while minimizing the number of cluster nodes adjusted.

Formula (2) gives the number of cluster nodes assigned to the i-th working domain.

Formula (3) ensures that a cluster node is assigned to only one working domain (when the number of cluster nodes is less than the sum of the maximum reserved node counts of all working domains).

Formula (4) ensures that each working domain is allocated cluster node resources according to its reference work period.

Formula (5) ensures that, as far as possible, all available cluster nodes are assigned to working domains (if the number of cluster nodes exceeds the sum of the maximum reserved node counts of all working domains, some nodes remain unassigned).

Formula (6) is the minimum computing resource requirement of a working domain during normal operation: the minimum number of cluster nodes assigned to the i-th working domain (default 0).

Formula (7) is the maximum computing resource configuration of a working domain during normal operation: the maximum number of cluster nodes assigned to the i-th working domain (default: the total number of cluster nodes).
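The formula bodies did not survive in this text. One formulation that is consistent with the verbal descriptions of formulas (1) to (7) is sketched here. Every symbol is an assumption introduced for illustration: n cluster nodes, m working domains plus working domain 0, a_{ij} and b_{ij} as 0/1 assignment indicators before and after adjustment (a simplification of the 1/0/-1 switching states of the source), n_j as the node count of working domain j, w_j as a share weight derived from the reference work period, and n_j^{min}, n_j^{max} as the reserved node bounds. The exact relation between w_j and the reference work period in (4) is not recoverable and is left open.

```latex
\begin{align}
\min_{B}\ & \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=0}^{m}\lvert b_{ij}-a_{ij}\rvert \tag{1}\\
n_j &= \sum_{i=1}^{n} b_{ij}, \quad j=0,\dots,m \tag{2}\\
\sum_{j=0}^{m} b_{ij} &= 1, \quad i=1,\dots,n \tag{3}\\
\frac{n_j}{\sum_{k=1}^{m} n_k} &\approx \frac{w_j}{\sum_{k=1}^{m} w_k}, \quad j=1,\dots,m \tag{4}\\
\sum_{j=1}^{m} n_j &= \min\Bigl(n,\ \sum_{j=1}^{m} n_j^{\max}\Bigr) \tag{5}\\
n_j &\ge n_j^{\min}, \quad j=1,\dots,m \tag{6}\\
n_j &\le n_j^{\max}, \quad j=1,\dots,m \tag{7}
\end{align}
```

In this sketch every moved node changes two entries of the indicator matrix, which is why the objective carries the factor 1/2.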

The optimized allocation of cluster nodes is solved by an iterative computation, whose concrete steps are as follows:

(1) Two sets are maintained: the set of working domains to be adjusted, which initially contains all normally running working domains, and the set of working domains whose allocation is complete, which is initially empty. Judge whether the number of normally running cluster nodes satisfies the minimum computing resource requirement of the normally running working domains, i.e. whether formula (8) holds. If it holds, go to (2). If it does not hold, select from the set of working domains to be adjusted the working domain d_k with the lowest allocation priority and add it to the set of working domains whose allocation is complete, which leaves the new set to be adjusted {d_1, ..., d_(k-1), d_(k+1), ..., d_r}; set the number of cluster nodes allocated to working domain d_k to 0, and return to (1);

(2) Based on the reference work period of each working domain in the set of working domains to be adjusted, calculate according to formula (9) the number of cluster nodes that each working domain is expected to receive, rounding down (if the result is zero, the value 1 is used); the surplus nodes are distributed to the working domains in descending order of priority;

(3) If the set of working domains to be adjusted is non-empty, check in turn, according to formula (7), whether the node count pre-allocated to each working domain satisfies the maximum reserved node count constraint. If a working domain does not satisfy it, set the final node count of that working domain to its maximum reserved node count and move the working domain from the set to be adjusted to the set of working domains whose allocation is complete. The number of nodes taking part in the allocation, reduced by the maximum reserved node count of that working domain, is used as the number of nodes taking part in the allocation in the next iteration; return to (2);

(4) If the set of working domains to be adjusted is non-empty, check in turn, according to formula (6), whether the node count pre-allocated to each working domain satisfies the minimum reserved node count constraint. If a working domain does not satisfy it, set the final node count of that working domain to its minimum reserved node count and move the working domain from the set to be adjusted to the set of working domains whose allocation is complete. As in step (3), the number of nodes taking part in the allocation is reduced by the minimum reserved node count of that working domain and used in the next iteration;

(5) If the set of working domains to be adjusted is non-empty and the node count allocated to every working domain in it satisfies both the maximum and the minimum reserved node count constraints, add all working domains in the set to be adjusted to the set of working domains whose allocation is complete, which leaves the set to be adjusted empty;

(6) By comparing the number of cluster nodes that each working domain held under the original normal operation with the number allocated to it in the set of working domains whose allocation is complete, pick out the set D_Del of working domains whose node count decreases and the set of working domains whose node count increases. For D_Del, determine the number of working domains it contains and the number of nodes that each of its working domains needs to give up; for the node-increase set, determine the number of working domains it contains and the number of nodes that each of its working domains needs to gain. Nodes in the unassigned state are handled as belonging to working domain 0 by default; working domain 0 belongs by default to the set D_Del, and the number of nodes it needs to give up is the number of nodes in working domain 0;

(7) Initialize the optimized assignment matrix of the cluster nodes to the assignment matrix before adjustment; treat nodes in the unassigned state as pre-assigned nodes of working domain 0, and set the corresponding positions in the assignment matrix to 1;

(8) Sort the nodes of each working domain in the set of working domains whose node count decreases by switching priority from low to high, pick out the lowest-priority nodes, as many as that working domain needs to give up, and set the corresponding elements of the assignment matrix to -1. Pick out all nodes whose element in the assignment matrix is -1 to form the set of nodes to be adjusted, sort them by node priority from low to high, and assign them one by one to the working domains in the set of working domains whose node count increases (each node is adjusted only once), ensuring that each such working domain finally gains the number of nodes it needs. Set to 1 the element of the assignment matrix that corresponds to each reassigned node and its new working domain, which forms the adjusted assignment matrix of the cluster nodes.

(9) According to the assignment matrix of the cluster nodes, update in the database or text file the number of the working domain to which each node belongs (for a node with a -1 element, its working domain is changed to the working domain whose column element in the same row of the assignment matrix is 1), and update the node count of each working domain to the adjusted node count.
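Steps (1) to (5) of the iterative procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the Domain class, the weight field (a share weight derived from the reference work period, whose exact form in formula (9) is not recoverable), the convention that a larger priority value means higher priority, and the rule that surplus nodes go first to working domains with a zero share are all hypothetical. The sketch also checks the minimum bound before the maximum bound, following the order in claim 3, because with that order the nodes still available always cover the minimum requirements of the remaining working domains.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Domain:
    name: str
    weight: float    # share weight derived from the reference work period (assumed)
    min_nodes: int   # minimum reserved node count
    max_nodes: int   # maximum reserved node count
    priority: int    # larger value = higher allocation priority (assumed)


def allocate(total_nodes: int, domains: List[Domain]) -> Dict[str, int]:
    """Return the node count per working domain; leftover nodes stay unassigned."""
    pending = sorted(domains, key=lambda d: d.priority, reverse=True)
    done: Dict[str, int] = {}
    free = total_nodes

    # Step (1): while the minimum requirement cannot be met, give the
    # lowest-priority domain zero nodes and take it out of the adjustment.
    while pending and sum(d.min_nodes for d in pending) > free:
        done[pending.pop().name] = 0

    while pending:
        # Step (2): proportional share, rounded down.
        total_w = sum(d.weight for d in pending)
        share = {d.name: math.floor(free * d.weight / total_w) for d in pending}
        surplus = free - sum(share.values())
        # Surplus nodes go first to domains that got zero, then by priority.
        order = sorted(pending, key=lambda d: share[d.name] > 0)
        for i in range(surplus):
            share[order[i % len(order)].name] += 1

        # Steps (3)-(4): fix one violated bound, then recompute the rest.
        low = next((d for d in pending if share[d.name] < d.min_nodes), None)
        high = next((d for d in pending if share[d.name] > d.max_nodes), None)
        fixed = low or high
        if fixed is None:
            done.update(share)   # Step (5): every bound holds, accept the shares.
            break
        done[fixed.name] = fixed.min_nodes if fixed is low else fixed.max_nodes
        free -= done[fixed.name]
        pending.remove(fixed)
    return done
```

For example, with 10 nodes and two domains whose weights are 3 and 1 and whose bounds are not binding, the sketch assigns 8 and 2 nodes. Positive weights, unique domain names and max_nodes not less than min_nodes are assumed.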

During computation, the reference work period of each working domain can be corrected online according to the actual online computation period. Formula (10) is the weighted average of the online work period of working domain i; it is calculated from the current round P, the weighted average of the online work period of working domain i over the previous P-1 rounds, and the computation time of working domain i in the current round. Formula (11) additionally supports a correction coefficient k for the reference work period, set according to a ratio preset by the user, and uses the manually set reference work period (in the same unit as the online work period). If no correction according to the online work period is needed, the correction coefficient k can be set to 0.
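The online correction described for formulas (10) and (11) can be illustrated with a short sketch. The formula bodies are not available in this text, so both functions are assumptions: update_average uses an equally weighted running mean over the rounds, and corrected_period blends the manually set period with the online average so that k = 0 leaves the manual value unchanged, which matches the remark about k in the text. The function names are hypothetical.

```python
def update_average(prev_avg: float, current_time: float, round_no: int) -> float:
    """Running mean over rounds 1..round_no (assumed form of formula (10))."""
    if round_no < 1:
        raise ValueError("round_no starts at 1")
    return ((round_no - 1) * prev_avg + current_time) / round_no


def corrected_period(manual_period: float, online_avg: float, k: float) -> float:
    """Blend of manual and online periods (assumed form of formula (11))."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return (1.0 - k) * manual_period + k * online_avg


# Three rounds with computation times of 10, 14 and 12 time units.
avg = 0.0
for round_no, elapsed in enumerate([10.0, 14.0, 12.0], start=1):
    avg = update_average(avg, elapsed, round_no)
reference = corrected_period(manual_period=20.0, online_avg=avg, k=0.5)
```

The running form needs only the previous average and the round number, so no history of computation times has to be stored per working domain.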

In addition, besides being allocated according to the reference work period as in formula (4), the computing node resources of the working domains can also be allocated with the computing performance improvement efficiency of a working domain as the allocation objective function, in which case nodes are allocated to the working domain whose computing performance improves the most.
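A greedy allocation is one way to read this alternative objective. The sketch below hands out nodes one at a time to the working domain that currently gains the most. It is an assumption, not the method of the source: the gain callback, the WORKLOAD table and the performance model (computation time equal to workload divided by node count) are hypothetical, and the reserved node bounds are ignored for brevity.

```python
import heapq
from typing import Callable, Dict, List


def greedy_allocate(total_nodes: int, domains: List[str],
                    gain: Callable[[str, int], float]) -> Dict[str, int]:
    """Give each node to the domain with the largest marginal gain."""
    counts = {d: 0 for d in domains}
    # Max-heap via negated gains; gain(d, n) is the benefit of node n + 1.
    heap = [(-gain(d, 0), d) for d in domains]
    heapq.heapify(heap)
    for _ in range(total_nodes):
        if not heap:
            break
        _, best = heapq.heappop(heap)
        counts[best] += 1
        heapq.heappush(heap, (-gain(best, counts[best]), best))
    return counts


# Hypothetical performance model: time = workload / nodes.
WORKLOAD = {"A": 120.0, "B": 60.0}


def example_gain(domain: str, n: int) -> float:
    if n == 0:
        return float("inf")   # a domain with no node cannot compute at all
    return WORKLOAD[domain] / n - WORKLOAD[domain] / (n + 1)


result = greedy_allocate(5, ["A", "B"], example_gain)
```

With this model the domain with the larger workload receives the extra node, because its computation time still drops faster.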

The invention is based on the idea of multi-level management. The cluster nodes are managed according to the three-level structure "system - working domain - cluster node", the node redundancy mechanism is extended from within a single working domain, as is traditional, to between working domains, and the optimized allocation of cluster node resources is carried out automatically according to the running state of the working domains and cluster nodes in the system. This improves system reliability while also improving the utilization efficiency of cluster node resources, and effectively avoids resources standing idle in working domains that run aperiodically.

Description of drawings

The accompanying drawings described here provide a further description of the invention and form part of this application, but do not limit the invention. In the drawings:

Fig. 1 is an example diagram of the system structure for the self-adaptive adjustment method for hierarchical management of a power system parallel computing cluster.

Fig. 2 is an example diagram of the processing logic of the cluster node optimized allocation module.

Embodiment

To make the purpose, technical scheme and advantages of the embodiment of the invention clearer, the invention is described in further detail below with reference to the embodiment and the drawings. The invention is not, however, limited to the example given.

Fig. 1 shows the system structure of the self-adaptive adjustment method for hierarchical management of a power system parallel computing cluster and the information that needs to be exchanged. The node and working domain information and the files (including data and results) that need to be exchanged between the application server and the cluster nodes, and among the cluster nodes, are all exchanged through network messages.

Fig. 2 shows a schematic of the processing logic of the cluster node optimized allocation module on the application server. Based on the working domain and cluster node information pre-configured by the user and the associated constraints, this module adjusts the number of cluster nodes allocated to each working domain so as to improve the overall computing performance of the system. The concrete steps are as follows:

(1) Step 1 of Fig. 2 describes how the working domains and cluster nodes that take part in the adjustment are screened during the iterative computation. At initial start-up, all running working domains and cluster nodes are selected, and the cluster nodes currently taking part in the adjustment are checked against the minimum reserved node count of each working domain to see whether they satisfy the minimum reserved node constraints of all working domains. If not, the working domain with the lowest priority is selected and added directly to the set of working domains whose allocation is complete, and the minimum reserved node constraint check is repeated for the remaining working domains and cluster nodes. This finally yields the set of working domains to be adjusted and the set of cluster nodes that take part in the iterative adjustment.

(2) Step 2 of Fig. 2 describes how the number of nodes that each working domain is expected to receive is calculated from the reference work periods. Because the node count calculated for a working domain may not be an integer, the result is rounded down by discarding the fractional part, and the difference between the sum of the rounded results and the actual number of nodes is distributed evenly among the working domains that need more nodes.

(3) Step 3 of Fig. 2 describes how the expected node counts are checked against the user-preset maximum and minimum reserved node count limits. If a working domain does not satisfy its maximum or minimum reserved node count limit, the number of nodes allocated to it is set to its maximum reserved node count (when the constraint of formula (7) is not satisfied) or to its minimum reserved node count (when the constraint of formula (6) is not satisfied), and the working domain is added to the set of working domains whose allocation is complete. The number of nodes taking part in the allocation is reduced by the node count of the working domain whose allocation has been completed, the iteration flag is set again, and step (1) is repeated. If the expected node counts of all working domains in the set to be adjusted satisfy the maximum and minimum reserved node count limits of their working domains, the procedure goes directly to step (4).

(4) In step 4 of Fig. 2, the number of nodes by which each working domain must be adjusted is obtained from the difference between its expected node count and the number of nodes it currently holds. The set of working domains whose node count decreases after the adjustment (unassigned nodes belong to working domain 0 by default, and working domain 0 belongs to this set) and the set of working domains whose node count increases are picked out. For each working domain in the node-decrease set, the nodes it holds are sorted by node switching priority, and the lowest-priority cluster nodes, as many as that working domain must give up, are selected as nodes to be adjusted; together they form the set of nodes to be adjusted. All nodes in this set are sorted by node switching priority from low to high and, according to the number of nodes that each working domain in the node-increase set needs to gain, are assigned one by one to the working domains in that set (each node is switched only once). Finally, the working domain recorded for each adjusted node in the local database or file is updated, and the number of cluster nodes actually held by each working domain is updated at the same time.
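The node selection in step 4 can be sketched as follows. The sketch replaces the 1/0/-1 assignment matrix of the source with a plain node-to-domain mapping, which is a simplification. The function name, the argument layout, the tie-break by node name and the choice to serve growing working domains in ascending domain number are assumptions made for illustration.

```python
from typing import Dict, List


def plan_switches(current: Dict[str, int], priority: Dict[str, int],
                  target: Dict[int, int]) -> Dict[str, int]:
    """Map each node to its working domain after adjustment (0 = unassigned)."""
    members: Dict[int, List[str]] = {}
    for node, dom in current.items():
        members.setdefault(dom, []).append(node)

    pool: List[str] = []        # nodes to be adjusted
    need: Dict[int, int] = {}   # nodes each growing domain still has to gain
    for dom in sorted(set(members) | set(target)):
        have = sorted(members.get(dom, []), key=lambda n: (priority[n], n))
        want = 0 if dom == 0 else target.get(dom, 0)
        if len(have) > want:
            pool.extend(have[:len(have) - want])   # lowest priority leaves first
        elif len(have) < want:
            need[dom] = want - len(have)

    # Hand out the released nodes, lowest switching priority first.
    pool.sort(key=lambda n: (priority[n], n))
    result = dict(current)
    for dom in sorted(need):
        take, pool = pool[:need[dom]], pool[need[dom]:]
        for node in take:
            result[node] = dom
    for node in pool:
        result[node] = 0        # nodes nobody needs stay unassigned
    return result
```

Each node appears in the pool at most once, so every node is switched at most once, as the text requires. A working domain that is missing from target is treated as stopped and releases all of its nodes.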

The specific embodiment described above only explains the purpose, technical scheme and beneficial effects of the invention in further detail and is not intended to limit the scope of protection of the invention. Any modification made on the basis of the principle of the invention shall fall within the scope of protection of the invention.

Claims

1. A self-adaptive adjustment method for hierarchical management of the cluster of a distributed computing management platform, wherein the objects managed and adjusted by the method comprise: cluster nodes, each consisting of one or more computers and used to run the distributed computing management platform and the related computing programs, a cluster node being the smallest unit for which the distributed computing management platform schedules and manages computing resources, and the information exchange of all cluster nodes being carried out over a local area network and supporting the TCP and UDP protocols; working domains, also called computing domains, each being a set of cluster nodes grouped together to implement certain related application functions and being the basic unit for which the work period of an application function of the distributed computing management platform is set; and an application server, responsible for generating computation data, processing computation results, and collecting and publishing node and working domain state information; the self-adaptive adjustment method of cluster nodes based on hierarchical management being applied to a distributed computing management platform of an electric power system, wherein a cluster node self-adaptive adjustment program runs on the application server to monitor the operation of all working domains and cluster nodes in the system, and the distributed computing management platform runs on every cluster node, identifies the working domain to which the node belongs, and implements the independent computation of each working domain and the updating of its running state; the concrete steps being as follows:

1) the system is divided into a plurality of working domains according to the needs of the application functions; each working domain, as a unit, carries out the data processing, computation scheduling and result collection of its own computing functions; there is no exchange of data or control information between working domains, so as to meet the demands of different application functions for different work periods;

2) when a running working domain stops or a new working domain is added, the cluster node self-adaptive adjustment program on the application server dynamically adjusts the cluster node allocation information of each working domain according to parameters such as the reference work period, minimum reserved node count and maximum reserved node count of each working domain;

3) when a running cluster node stops or a cluster node is added, the cluster node self-adaptive adjustment program on the application server dynamically adjusts the cluster node allocation information of each working domain according to parameters such as the reference work period, minimum reserved node count and maximum reserved node count of each working domain;

4) the distributed computing management platform running on each cluster node switches the node dynamically between working domains according to the modified information on the working domain to which the node belongs.

2. The self-adaptive adjustment method for hierarchical management of the cluster of a distributed computing management platform according to claim 1, characterized in that in step 1) the redundancy mechanism of the distributed computing management platform is extended from within a single working domain to the whole cluster.

3. The self-adaptive adjustment method for hierarchical management of the cluster of a distributed computing management platform according to claim 1, characterized in that in steps 2) and 3), with the objective of minimizing the number of cluster nodes adjusted, the hierarchical optimized allocation of cluster nodes among the working domains is carried out according to the running state of the working domains and cluster nodes, using formulas (1) to (7), in which the following quantities appear: the number of normally running working domains; the number of normally running cluster nodes; the matrix of assignment relationships between cluster nodes and working domains, in the form of the assignment matrix before adjustment and the optimized assignment matrix that satisfies the constraints of formulas (4) to (7) with the minimum number of cluster nodes adjusted; the number of cluster nodes assigned to the i-th working domain; the assignment relationship between the i-th cluster node and the j-th working domain (1 means that the i-th node is switched from another working domain into working domain j; 0 means that it does not belong to working domain j; -1 means that the i-th node is switched out of working domain j for use by another working domain; working domain 0 is a special working domain to which all cluster nodes in the unassigned state belong); the reference work period of the i-th working domain; the minimum number of cluster nodes assigned to the i-th working domain, 0 by default; and the maximum number of cluster nodes assigned to the i-th working domain, by default the total number of cluster nodes; formula (3) ensures that a cluster node is assigned to only one working domain; formula (4) ensures that each working domain is allocated cluster node resources according to its reference work period; formula (5) ensures that, as far as possible, all available cluster node resources are allocated to working domains for use (if the number of cluster nodes exceeds the sum of the maximum reserved node counts of all working domains, some nodes remain unassigned); formula (6) is the minimum computing resource requirement of a working domain during normal operation; formula (7) is the maximum computing resource configuration of a working domain during normal operation; the concrete solution is as follows:

1) an approximate solution for the cluster node allocation is obtained according to formula (4) and formula (5);

2) it is then judged whether the constraints of formula (6) and formula (7) are satisfied; if both are satisfied, the approximate solution is a feasible solution, and the procedure goes directly to iv);

3) if formula (6) cannot be satisfied, the working domain with the lowest priority is picked out, the remaining working domains and cluster nodes are solved again according to formula (4) and formula (5), and the procedure returns to i);

4) if formula (7) cannot be satisfied, the working domain that does not satisfy the condition is picked out and allocated its maximum node count, the remaining working domains and cluster nodes are solved again according to formula (4) and formula (5), and the procedure returns to i);

If a feasible solution is obtained according to i), the target node count of every working domain is compared with its current node count, and the set D_Del of working domains that need to give up nodes and the set of working domains that need to gain nodes are picked out; for each of the two sets, the number of working domains in the set and the number of nodes that each working domain in it needs to give up or gain, respectively, are determined; nodes in the unassigned state are handled as belonging to working domain 0 by default, and working domain 0 belongs by default to the set D_Del;

5) the optimized assignment matrix of the cluster nodes is initialized to the assignment matrix before adjustment; nodes in the unassigned state are treated as pre-assigned nodes of working domain 0, and the corresponding positions in the optimized assignment matrix are set to 1; the number of nodes that working domain 0 needs to give up is the number of nodes in working domain 0;

6) the nodes of each working domain in the set of working domains that need to give up nodes are sorted by priority from low to high, the lowest-priority nodes, as many as that working domain needs to give up, are picked out, and the corresponding elements of the optimized assignment matrix of the cluster nodes are set to -1; all nodes that have a -1 state in the matrix are picked out to form the set of nodes to be adjusted, sorted by node priority from low to high, and assigned one by one to the working domains in the set of working domains that need to gain nodes (each node is assigned only once, satisfying the constraint of formula (3)), ensuring that each such working domain finally gains the number of nodes it needs; the element of the optimized assignment matrix that corresponds to each reassigned node and its new working domain is set to 1, thereby forming the optimized assignment matrix of the cluster nodes;

7) according to the adjusted optimized assignment matrix of the cluster nodes, the number of the working domain to which each cluster node belongs is updated in the database or text file (for a node with a -1 element, its working domain is changed to the working domain whose column element in the same row of the optimized assignment matrix is 1), and the node count of each working domain is updated to the adjusted number of cluster nodes.