CN1302412C

CN1302412C - Computer group system and its operation managing method

Info

Publication number: CN1302412C
Application number: CNB2003101035870A
Authority: CN
Inventors: 叶庆华; 赵玉萍; 张喜青; 柳书广; 肖利民
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2003-11-11
Filing date: 2003-11-11
Publication date: 2007-02-28
Anticipated expiration: 2023-11-11
Also published as: CN1617126A

Abstract

The present invention discloses a computer cluster system comprising more than one cluster node pool and more than one job scheduling device. A cluster node pool is a set of some of the computer nodes in a computer cluster and is used to complete the parallel tasks loaded onto it. The job scheduling device processes the parallel tasks submitted by users and loads them onto different node pools according to the requirements of the parallel tasks and the job load. The present invention can manage a single cluster and can also be easily extended to manage multiple cluster systems, so that the system resources and jobs of each cluster system are managed and scheduled uniformly. In addition, the present invention helps provide a single system image for multiple cluster systems, which reduces the difficulty of using them.

Description

A computer cluster system and job management method thereof

Technical field

The present invention relates to a computer cluster system and a job management method thereof, in particular to a computer cluster system that adopts node pools and its job management method, and belongs to the technical field of computer clusters.

Background technology

A cluster system is a group of independent computers interconnected by a high-speed network and managed as a single system, so that the resources of every computer in the cluster are fully used for the parallel processing of complex computations. With the development of science and networks, the demands on computing speed and processing power keep rising, and the number of nodes in a cluster keeps growing; at the same time, more and more organizations own several computer clusters. How to make use of existing clusters, protect the original investment, exploit the computing power of old equipment, and achieve job management and resource sharing across multiple clusters, so as to form a uniformly managed computing environment with greater processing power, is a very practical problem. Solving it also provides useful experience for how cluster systems can later support grid systems.

Besides providing powerful processing power, another important benefit of a cluster system is the single system image. That is, a user can log in to the login node of the cluster and, through the cluster system software, use the cluster as conveniently as a single machine. For a multi-cluster system, users likewise want a system with a single system image. In addition, to support batch jobs, that is, to give the system the ability to schedule batch jobs automatically, a system that supports multiple clusters becomes necessary.

The prior art contains many job management systems for a single cluster, but these systems cannot support multiple clusters. A job management system that supports multiple clusters and provides the single system image of a cluster system, unified job management, and unified resource scheduling would be of great significance for improving the utilization of multi-cluster systems.

Summary of the invention

The technical problem to be solved by the present invention is to propose a system suitable for multiple clusters, which can not only support one large cluster but also provide unified job management, resource sharing, and unified scheduling for a multi-cluster system made up of several heterogeneous clusters.

Another technical problem to be solved by the present invention is to propose a job processing method applied to the above cluster system.

A computer cluster system comprises more than one cluster node pool and a job scheduler;

The cluster node pool is a set of some of the computing nodes and is used to complete the parallel tasks loaded onto the node pool;

The job scheduler processes the parallel tasks submitted by users and loads each parallel task onto a node pool according to the requirements of the task.

A node pool may consist of computing nodes of one cluster, or of nodes from different clusters.

A job management method for a computer cluster system comprises the following steps:

Step 1: according to the resource information of each node pool, determine whether a node pool exists that satisfies the resource request of the job. If such a node pool exists, perform step 2; otherwise, perform step 3.

Step 2: if only one node pool satisfies the resource requirement, select that node pool to run the parallel task and perform step 4.

If more than one node pool satisfies the resource requirement, select a node pool with a lighter load, according to the load of each cluster, as the node pool to run the job, and perform step 4.

Step 3: if no single node pool can satisfy the resource requirement of the job, select the node pool resources needed by the job one pool at a time, according to the load of the nodes in each cluster, until the resources satisfy the demand or all node pool resources have been allocated, and run the parallel task on them; alternatively, load the task directly onto the global node pool.

Step 4: schedule the job onto the selected node pool, then load and run it there.

In the above job management method, the following precedes step 1: when a job is submitted, if a node pool is specified, the job is scheduled to run in the cluster corresponding to that node pool; if no node pool is specified, step 1 is performed.
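By way of illustration only, the control flow of the pre-step and steps 1 to 4 can be sketched in Python as follows. This is a minimal sketch under simplifying assumptions, not the patented implementation: the names Pool, free_nodes, load, choose_pool, and the pool name "p0" are hypothetical, a job is assumed to request only a number of nodes, step 2 is resolved by taking the lightest candidate (one valid way to pick a lighter-loaded pool), and step 3 uses the alternative of loading the task directly onto the global node pool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    name: str
    free_nodes: int  # hypothetical single resource: number of idle nodes
    load: float      # hypothetical load figure, lower means lighter


def choose_pool(pools: List[Pool], needed: int,
                specified: Optional[str] = None,
                global_pool: str = "p0") -> str:
    """Return the name of the node pool on which the job should run."""
    # Pre-step: a pool named at submission time is used directly.
    if specified is not None:
        return specified
    # Step 1: find the single pools that can hold the whole job.
    fitting = [p for p in pools
               if p.name != global_pool and p.free_nodes >= needed]
    if fitting:
        # Step 2: one candidate, or the lighter-loaded of several.
        return min(fitting, key=lambda p: p.load).name
    # Step 3 (simple variant): fall back to the global node pool.
    return global_pool
    # Step 4 (not shown): load and run the job on the chosen pool.
```

With two pools of 8 and 16 free nodes, a job needing 20 nodes would be directed to the global pool, while a job needing 6 nodes would go to whichever of the two pools has the lower load.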

The present invention has the following advantages:

1. It provides unified resource management, unified job management, and unified job scheduling for a multi-cluster system. The cluster job management system proposed by the present invention can manage a single cluster, providing resource management, permission control, and job scheduling for a single cluster system, and can also be easily extended to a multi-cluster system, where it uniformly manages the system resources and jobs of each cluster system and uniformly schedules the resources and jobs across the multi-cluster system.

2. It provides a single system image for a multi-cluster system. By deploying the proposed cluster job management system, multiple clusters present users with the single system image of one multi-cluster system, which greatly eases use of the system.

3. It reduces the difficulty of using a multi-cluster system. The proposed cluster job management system not only solves the problem of managing multiple clusters uniformly but also lets a multi-cluster system be used like a single cluster, which simplifies use. Users already familiar with a cluster environment can easily use a multi-cluster system.

Description of drawings

Fig. 1 is a schematic diagram of the node pool system for a single cluster according to the present invention;

Fig. 2 is a schematic diagram of the node pool system for multiple clusters according to the present invention;

Fig. 3 is a flow chart of the job management of the present invention.

Embodiment

A cluster is a loosely coupled structure. Each node in the cluster is an independent unit whose hardware configuration and operating system can be chosen flexibly according to user needs: a node can be a PC or a workstation, and the operating system can be any commercial operating system. The nodes are independent of one another, yet they can work together under the management and configuration of the cluster software and present a single system image to the outside.

A multi-cluster system is a system made up of more than one cluster. Usually the clusters are heterogeneous and have different system configurations.

The computer cluster system of the present invention comprises more than one cluster node pool and a job scheduler.

The cluster system of the present invention uses the concept of partitioning to set up multiple node pools. Each node pool corresponds to a node set (set of nodes) and to the set of jobs submitted to that node pool. The job scheduler uniformly manages the parallel tasks (that is, jobs) requested by users and loads them onto different node pools as required, so that the parallel tasks are processed.
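The relationship described above, in which each node pool pairs a node set with a job set and a single scheduler loads jobs onto pools, might be modelled as in the following Python sketch. The class and attribute names (NodePool, JobScheduler, node_set, job_set, load) are assumptions chosen for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class NodePool:
    name: str
    node_set: Set[str]                                 # nodes of this pool
    job_set: List[str] = field(default_factory=list)   # jobs submitted to it


class JobScheduler:
    """One scheduler manages every pool (hypothetical minimal model)."""

    def __init__(self, pools: Iterable[NodePool]) -> None:
        self.pools: Dict[str, NodePool] = {p.name: p for p in pools}

    def load(self, job_id: str, pool_name: str) -> NodePool:
        # Record the job in the job set of the pool it is loaded onto.
        pool = self.pools[pool_name]
        pool.job_set.append(job_id)
        return pool
```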

A user can submit a job directly on a computing node or over a network (for example, through a web page on the Internet); alternatively, the cluster system includes a login node dedicated to interacting with users, submitting jobs, and returning job results, and jobs are submitted on the login node. This is the same as in existing clusters.

Because a node pool of the present invention may contain some of the computing nodes of one cluster or computing nodes of different clusters, this logical division provides unified resource management, unified job management, and unified job scheduling for a cluster system made up of multiple clusters. Therefore, the cluster system of the present invention not only provides resource management and job scheduling for a single cluster system but can also be easily extended to a cluster system made up of multiple clusters, uniformly managing the system resources and jobs of each cluster system and uniformly scheduling the resources and jobs across the multi-cluster system.

For a cluster system made up of multiple clusters, a unified job scheduler loads the parallel tasks, so the single system image of one multi-cluster system can easily be provided to users, which greatly eases use of the system.

A job in a node pool can run only on the node set of that node pool. A node with strong computing power can therefore be placed logically in two or more different node pools, so that parallel tasks loaded onto either node pool can run on this node; in other words, the node sets of different node pools may intersect. This makes full use of the computing power of each node and improves the overall computing power of the cluster system.
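As a small illustration of intersecting node sets, the following Python sketch places one powerful node in two pools. The node names and the helpers pools_of and may_run are hypothetical and serve only to make the set relationship concrete.

```python
from typing import Dict, Set

# Hypothetical node names; "big1" is a powerful node placed in both pools.
pool_nodes: Dict[str, Set[str]] = {
    "p1": {"n1", "n2", "big1"},
    "p2": {"n3", "n4", "big1"},
}


def pools_of(node: str) -> Set[str]:
    """Return the names of all pools whose node set contains the node."""
    return {name for name, nodes in pool_nodes.items() if node in nodes}


def may_run(pool: str, node: str) -> bool:
    """A job in a pool may run only on nodes in that pool's node set."""
    return node in pool_nodes[pool]


# The intersection of the two node sets is the shared node.
shared = pool_nodes["p1"] & pool_nodes["p2"]
```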

For a cluster system, the division into node pools is made as required; the division can be based on the task volume the cluster is designed for and on the user groups it targets. A cluster system made up of multiple clusters generally includes at least the node pools formed from the node set of each cluster.

Each node pool in the cluster system can be given access control permissions in order to control the resources of the cluster system. A node pool has its own permission control and allows only specific users to submit jobs to it from specific nodes. The purpose is to guarantee the availability of resources for certain special users, so that the cluster can provide differentiated services and is easier to manage.
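One possible way to express such per-pool permission control is sketched below in Python. The patent does not specify a data format, so the PoolAcl class, its two fields, and the example user and node names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PoolAcl:
    # Hypothetical fields: who may submit, and from which nodes.
    allowed_users: Set[str] = field(default_factory=set)
    allowed_submit_nodes: Set[str] = field(default_factory=set)

    def permits(self, user: str, submit_node: str) -> bool:
        """Allow a submission only if both the user and the node are listed."""
        return (user in self.allowed_users
                and submit_node in self.allowed_submit_nodes)
```

A scheduler following the description would consult such a check before scheduling a job onto a pool that the user named at submission time.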

For a cluster system, the node pools can also include multi-pool node pools, each made up of more than one node pool, which are used to complete parallel tasks with a larger workload. A multi-pool node pool can be generated dynamically by the job scheduler when a parallel task is allocated, or it can be set up in advance. A multi-pool node pool has its own job set and permission control.

A global node pool can be set up for the cluster system of the present invention. The global node pool is a special multi-pool node pool that contains all node pools; its node set contains all node resources in the cluster system.
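Viewed as sets, a multi-pool node pool is the union of the node sets of its member pools, and the global node pool is the union over all pools. The following Python sketch shows this under the assumption that pools are stored as named frozensets; the function names multi_pool and global_pool are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable


def multi_pool(pool_nodes: Dict[str, FrozenSet[str]],
               members: Iterable[str]) -> FrozenSet[str]:
    """Node set of a multi-pool node pool: union of its member pools."""
    return frozenset().union(*(pool_nodes[name] for name in members))


def global_pool(pool_nodes: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """The global pool is the multi-pool node pool over every pool."""
    return multi_pool(pool_nodes, pool_nodes.keys())
```

Because the union is computed on demand, the same helper covers both a multi-pool node pool set up in advance and one generated dynamically when a task is allocated.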

In particular, for a cluster system made up of multiple clusters, when a user needs the resources of all clusters for a joint computation, the job can be submitted to the global node pool, and the job scheduler selects nodes for the job from the node resources of every cluster. When the user's job has a low degree of parallelism and can be completed within a single cluster, the job is submitted to the node pool of the corresponding cluster, and the job scheduler selects node resources for the job from the node set of that node pool. In this way, small jobs are placed within a single cluster as far as possible, which improves job efficiency as much as possible, while large jobs can use the node resources of several clusters to meet their needs.

Fig. 1 shows an example of a cluster system made up of a single cluster. The cluster has y*n nodes and is divided into three node pools: a global node pool p0 and two disjoint node pools p1 and p2. An access control list is set for each cluster node pool; for example, p1 may be used only by users in group A, p2 only by users in group B, and p0 by users in group C. Jobs of group A users can therefore be scheduled and run only on the node set managed by node pool p1, and jobs of group B users only on the node set managed by node pool p2, whereas jobs of group C users may be scheduled and run on any node. With this kind of node pool division and the corresponding access permission settings, computing resources can be controlled flexibly.
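The Fig. 1 arrangement can be written out as data, as in the Python sketch below. The concrete grid size (2 by 4), the node naming, and the even split between p1 and p2 are assumptions, since the text states only that the cluster has y*n nodes and that p1 and p2 are disjoint; reachable_nodes is a hypothetical helper.

```python
from typing import Dict, Set

# Assumed sizes for illustration: y = 2 rows of n = 4 nodes.
Y, N = 2, 4
all_nodes = [f"node{r}_{c}" for r in range(Y) for c in range(N)]

half = len(all_nodes) // 2
p1: Set[str] = set(all_nodes[:half])   # assumed split; the text only
p2: Set[str] = set(all_nodes[half:])   # requires p1 and p2 to be disjoint
p0: Set[str] = set(all_nodes)          # global node pool

pools: Dict[str, Set[str]] = {"p0": p0, "p1": p1, "p2": p2}
acl: Dict[str, Set[str]] = {"p0": {"C"}, "p1": {"A"}, "p2": {"B"}}


def reachable_nodes(group: str) -> Set[str]:
    """Nodes on which jobs of users in the given group may be scheduled."""
    nodes: Set[str] = set()
    for pool_name, groups in acl.items():
        if group in groups:
            nodes |= pools[pool_name]
    return nodes
```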

Fig. 2 shows an example of a cluster system made up of multiple clusters. The cluster system contains N clusters, whose node counts are x*n, y*j, ..., z*v respectively. The node set of each cluster forms one node pool, giving pools P1 to PN, and a global node pool P0 is also created. Setting the access permissions of the node pools controls the set of users allowed to access each cluster.

The job management method of the cluster system of the present invention, shown in Fig. 3, comprises the following steps:

Before step 1, the method also includes the following: when a job is submitted, if a node pool is specified, the job is scheduled to run on the node set corresponding to that node pool (provided the permission settings of the node pool allow access by this user). If no node pool is specified, step 1 is performed.

The job scheduler first determines whether the node pool specified for the job allows the submitting user to use it, and generally also determines whether the node from which the user submits meets the configured requirements. If the permission settings of the node pool are satisfied, the job scheduler schedules the job to run on the node set corresponding to that node pool.

In step 1, the job scheduler uses the following steps to determine whether the resources of a node pool, or of a combination of node pools, satisfy the demand of the parallel task:

First, for each node pool, obtain resource information such as its nodes.

Second, for each resource requested by the job, determine whether the amount of that resource held by the node pool is greater than or equal to the amount requested by the job. If so, the node pool can satisfy this resource request; otherwise, the node pool cannot satisfy the resource requirement of the job. If a node pool can satisfy all resources requested by the job, that node pool is one that satisfies the job's resource requirement.
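The resource test just described amounts to a per-resource comparison, which can be sketched in Python as follows. Representing resources as a mapping from a resource name to a count is an assumption, as are the names Resources, pool_satisfies, and satisfying_pools and the example resource kinds.

```python
from typing import Dict, List

Resources = Dict[str, int]  # hypothetical form: resource name -> amount


def pool_satisfies(pool: Resources, request: Resources) -> bool:
    """True if the pool holds at least the requested amount of every resource."""
    # A resource kind the pool does not list counts as an amount of zero.
    return all(pool.get(kind, 0) >= amount
               for kind, amount in request.items())


def satisfying_pools(pools: Dict[str, Resources],
                     request: Resources) -> List[str]:
    """Names of all pools that satisfy the job's whole resource request."""
    return [name for name, res in pools.items()
            if pool_satisfies(res, request)]
```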

The job scheduler manages the loading of all parallel tasks, so it can judge the load of each node pool from the load state of the tasks. In step 2, the job scheduler can therefore select a node pool from the more lightly loaded node pools to complete the loaded parallel task; this helps load balancing and can improve the overall efficiency of the cluster. A lightly loaded node pool, rather than necessarily the most lightly loaded one, is selected because both the load of the node pools and the simplicity of the load-balancing algorithm must be considered. For example, a threshold method can be used: if the load of a node pool reaches or exceeds the threshold, no new parallel task is loaded onto it.
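The threshold method mentioned above could look like the following Python sketch, which accepts the first pool found below the threshold instead of searching for the minimum. The threshold value 0.8, the load scale, and the function name pick_lightly_loaded are assumptions; the patent does not fix how load is measured.

```python
from typing import Dict, Optional

LOAD_THRESHOLD = 0.8  # hypothetical value on an assumed 0..1 load scale


def pick_lightly_loaded(loads: Dict[str, float],
                        threshold: float = LOAD_THRESHOLD) -> Optional[str]:
    """Return the first pool whose load is below the threshold, else None.

    A pool whose load reaches or exceeds the threshold receives no new task.
    """
    for name, load in loads.items():
        if load < threshold:
            return name
    return None
```

The design trades optimality for simplicity: the chosen pool is lightly loaded but not necessarily the lightest, which matches the reasoning given in the paragraph above.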

If no single node pool can satisfy the resource requirement of the job, a combination of different node pools is selected, one pool at a time according to the load of the nodes in each cluster, to provide the resources the job needs; that is, a multi-pool node pool is generated dynamically to satisfy the job's resource requirement. Alternatively, the global node pool can be called directly to handle the parallel task, as shown in Fig. 3, which is simpler. The load of a node pool is judged in the same way as in step 1, and whether the resources satisfy the job's demand is also judged in the same way as in step 1.
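The dynamic generation of a multi-pool node pool might proceed as in this Python sketch, which adds pools in order of increasing load until the accumulated resources cover the request or every pool has been used. It assumes the pools are disjoint, so that their resource counts can simply be added, and the name combine_pools and the resource representation are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def combine_pools(pools: Dict[str, Dict[str, int]],
                  loads: Dict[str, float],
                  request: Dict[str, int]) -> Tuple[List[str], bool]:
    """Pick pools from lightest to heaviest load until the request is met.

    Returns the chosen pool names and whether the request was satisfied.
    Assumes disjoint pools, so resource amounts are additive.
    """
    chosen: List[str] = []
    total: Counter = Counter()
    for name in sorted(pools, key=lambda n: loads[n]):
        chosen.append(name)
        total.update(pools[name])  # add this pool's resources to the sum
        if all(total[kind] >= amount for kind, amount in request.items()):
            return chosen, True
    # Every pool has been used and the request is still not covered.
    return chosen, False
```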

In step 4, parallel tasks are loaded onto a node pool in the same way that existing single-cluster job systems load parallel tasks: within the determined node set, the nodes on which the job will run are chosen by a node selection policy (that is, job scheduling), and the job is then loaded onto the selected nodes and run. This is not described further here; see the relevant literature, for example [1] Rajkumar Buyya, translated by Zheng Weimin et al., High Performance Cluster Computing: Architectures and Systems (Volume 1) (Publishing House of Electronics Industry, June 2001); [2] Kai Hwang and Zhiwei Xu, Scalable Parallel Computing: Technology, Architecture, Programming (China Machine Press, May 2000).

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention may still be modified or equivalently replaced, and any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A computer cluster system, characterized by comprising more than one cluster node pool and a job scheduler;

The cluster node pool is a set of some of the computing nodes and is used to complete the parallel tasks loaded onto the node pool; the node pool may comprise computing nodes of one cluster or computing nodes of different clusters; any computing node may belong to more than one node pool;

2. The computer cluster system according to claim 1, characterized in that a user's parallel task can be submitted on a computing node or over a network; or the system further comprises a login node, and the user's parallel task is submitted through the login node.

3. The computer cluster system according to claim 1 or 2, characterized in that each node pool has its own permission control.

4. The computer cluster system according to claim 1 or 2, characterized in that the node pools can further comprise a multi-pool node pool made up of more than one node pool.

5. The computer cluster system according to claim 4, characterized in that the node pools comprise a global node pool whose node set can contain all node resources in multiple cluster systems.

6. A job management method for a computer cluster system, characterized by comprising the following steps:

7. The job management method for a computer cluster system according to claim 6, characterized in that, before step 1, the method further comprises: when a job is submitted, if a node pool is specified, scheduling the job to run in the cluster corresponding to that node pool; if no node pool is specified, performing step 1.

8. The job management method for a computer cluster system according to claim 6 or 7, characterized in that, in step 1, the job scheduler uses the following steps to determine whether the node pool resources satisfy the demand of the parallel task:

Step 21: for each node pool, obtain the node resource information of that pool;

Step 22: for each resource requested by the job, determine whether the amount of that resource held by the node pool is greater than or equal to the amount requested by the job; if so, the node pool can satisfy this resource request; otherwise, the node pool cannot satisfy the resource requirement of the job;

Step 23: if a node pool can satisfy all resources requested by the job, that node pool is one that satisfies the job's resource requirement.