CN103713974A

CN103713974A - High-performance job scheduling management node dual-computer reinforcement method and device

Info

Publication number: CN103713974A
Application number: CN201410007013.1A
Authority: CN
Inventors: 马四腾
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2014-01-07
Filing date: 2014-01-07
Publication date: 2014-04-09
Anticipated expiration: 2034-01-07
Also published as: CN103713974B

Abstract

The invention provides a dual-computer reinforcement method for a high-performance job scheduling management node, which simultaneously monitors the heartbeat information and the operating system resources of the main management node. When a fault is detected in either the heartbeat information or the operating system resources of the main management node, management node switching is started. The invention also provides a corresponding device. The method and the device achieve dual-computer reinforcement of the job scheduling management node and also monitor operating system resources, which effectively overcomes the shortcomings of traditional methods.

Description

A high-performance job scheduling management node dual-computer reinforcement method and device

Technical field

The present invention relates to the field of computer technology, and in particular to dual-computer reinforcement of a job scheduling management node.

Background art

At present, advances in network computing technology have promoted the development and widespread use of cluster systems. Connecting high-performance workstations or personal computers (PCs) into a cluster over a high-speed network, according to a certain structure, enables parallel computing and delivers the performance of mainframes and parallel machines at very low cost. However, as high-performance computing cluster applications continue to grow in scale, cluster management problems arise as well. A job scheduling system is mainly responsible for receiving the job requests submitted by users and, according to specific scheduling rules and the users' job requirements, selecting suitable computing resources to complete the users' jobs. With the help of a job scheduling system, a high-performance computing cluster appears to its users as one large server with many CPUs that multiple users can use at the same time. The job scheduling system manages the jobs submitted by users and allocates resources to each job reasonably, which ensures that the computing power of the cluster is fully used and that job results are obtained as quickly as possible. The importance of the job scheduling system is therefore self-evident.

Traditional reinforcement approaches either deploy the management node on a single machine or use a heartbeat scheme for dual-computer reinforcement. Both approaches have defects. With a single-machine deployment, once the management node fails, the job scheduling system of the whole cluster stops working, the cluster's jobs can no longer be scheduled reasonably and efficiently, job execution stalls, and system operating efficiency is seriously affected. With a heartbeat scheme, the design of the heartbeat software itself prevents it from monitoring the job scheduling system at the resource level; once a resource fails, resource switching cannot be carried out effectively, so the jobs of the whole cluster again cannot be scheduled reasonably and efficiently, and system operating efficiency is seriously affected. Because both reinforcement approaches have critical shortcomings, how to reinforce the job scheduling system more effectively has become a technical problem that urgently needs to be solved.

Summary of the invention

The present invention proposes a dual-computer reinforcement method and device for a high-performance job scheduling management node. On the one hand, it avoids the single point of failure caused by single-machine deployment; on the other hand, it monitors operating system resources, which effectively makes up for the deficiencies of traditional methods.

A high-performance job scheduling management node dual-computer reinforcement method, comprising:

Step 1: mounting the shared directory of an NFS server on the job scheduling dual-computer management nodes, and starting heartbeat monitoring and resource monitoring;

Step 2: the heartbeat monitoring and the resource monitoring respectively monitoring the heartbeat information and the operating system resources of the current main management node;

Step 3: judging whether the heartbeat information or the operating system resources of the current main management node have failed, and if so, starting management node switching.
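
The judgment in Step 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `NodeStatus` structure, the `needs_switch` function, and the use of a heartbeat timeout to decide that the heartbeat has failed are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Snapshot of the main management node (hypothetical structure)."""
    last_heartbeat: float   # time the last heartbeat was received, in seconds
    failed_resources: list  # names of OS resources currently reported as failed


def needs_switch(status: NodeStatus, now: float, heartbeat_timeout: float) -> bool:
    """Return True if management node switching should be started.

    Switching is started when EITHER the heartbeat is stale OR at least one
    operating system resource has failed, mirroring the "or" in Step 3.
    """
    heartbeat_failed = (now - status.last_heartbeat) > heartbeat_timeout
    resource_failed = len(status.failed_resources) > 0
    return heartbeat_failed or resource_failed
```

For example, `needs_switch(NodeStatus(100.0, []), now=101.0, heartbeat_timeout=5.0)` evaluates to `False`, while the same call with `now=110.0`, or with a non-empty `failed_resources` list, evaluates to `True`.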

A high-performance job scheduling management node dual-computer reinforcement device, comprising:

A heartbeat monitoring module, configured to monitor the heartbeat information of the current main management node and to report heartbeat failure information to the resource monitoring module;

A resource monitoring module, configured to monitor the operating system resources of the current main management node, and to start management node switching when it receives heartbeat failure information or judges that the operating system resources have failed.

The beneficial effect of the present invention is that it achieves dual-computer reinforcement of the job scheduling management node while also monitoring operating system resources, which effectively makes up for the deficiencies of traditional methods.

Brief description of the drawings

Fig. 1 is a block diagram of the operating principle of the high-performance job scheduling management node dual-computer reinforcement method proposed by the present invention.

Fig. 2 is a flow chart of the high-performance job scheduling management node dual-computer reinforcement method proposed by the present invention.

Fig. 3 is a schematic block diagram of the high-performance job scheduling management node dual-computer reinforcement device proposed by the present invention.

Detailed description

Fig. 1 shows a block diagram of the operating principle of the method proposed by the present invention. The method runs on management node 1 (the main management node) and management node 2. The heartbeat monitoring module monitors the heartbeat information of the main management node in real time and, when it finds that the heartbeat of the main management node has failed, reports this to the resource monitoring module. The resource monitoring module monitors the operating system resources on the main management node in real time; when it finds that an operating system resource has failed, or when it receives a main management node heartbeat failure reported by the heartbeat monitoring module, it starts the management node switching process, making management node 2 the main management node.
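
The interaction between the two modules described above can be sketched in Python as follows. This is an illustrative model only: the class and method names, the timeout-based heartbeat check, and the in-memory role swap are assumptions made here, and a real deployment would rely on the cluster software rather than on code like this.

```python
class ResourceMonitor:
    """Watches OS resources on the main node and decides when to switch."""

    def __init__(self, main: str, standby: str):
        self.main = main
        self.standby = standby
        self.switch_log = []  # reason for each switchover, kept for inspection

    def on_heartbeat_failure(self) -> None:
        # Called by the heartbeat monitor when the main node's heartbeat is lost.
        self._switch("heartbeat failure")

    def check_resources(self, resource_ok: dict) -> None:
        # resource_ok maps a resource name to True (healthy) or False (failed).
        failed = sorted(name for name, ok in resource_ok.items() if not ok)
        if failed:
            self._switch("resource failure: " + ", ".join(failed))

    def _switch(self, reason: str) -> None:
        # Swap roles so that the former standby becomes the main node.
        self.main, self.standby = self.standby, self.main
        self.switch_log.append(reason)


class HeartbeatMonitor:
    """Tracks the main node's heartbeat and reports failures."""

    def __init__(self, resource_monitor: ResourceMonitor, timeout: float):
        self.resource_monitor = resource_monitor
        self.timeout = timeout
        self.last_seen = 0.0

    def record_heartbeat(self, now: float) -> None:
        self.last_seen = now

    def check(self, now: float) -> None:
        if now - self.last_seen > self.timeout:
            self.resource_monitor.on_heartbeat_failure()
            # Restart the timeout window for the new main node.
            self.last_seen = now
```

In this sketch the heartbeat monitor never switches nodes itself; it only reports to the resource monitor, which is the single place where switching is started, matching the division of responsibilities described for Fig. 1.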

Fig. 2 shows a flow chart of the high-performance job scheduling management node dual-computer reinforcement method proposed by the present invention, which comprises:

Step 1: Mount the shared directory of the NFS server on the job scheduling dual-computer management nodes, and start heartbeat monitoring (corosync) and resource monitoring (pacemaker). The heartbeat monitoring and the resource monitoring each monitor management node 1 and management node 2, where management node 1 serves as the main management node and management node 2 serves as the standby node; together, management node 1 and management node 2 form the job scheduling dual-computer nodes. The user can configure the heartbeat monitoring and resource monitoring parameters in advance, for example the monitoring timeout (timeout) and monitoring interval (interval) of a resource, and the grouping and start order of resources. STONITH also needs to be configured, so that resource availability is guaranteed to the greatest extent.

Step 2: The heartbeat monitoring and the resource monitoring respectively monitor the heartbeat information and the operating system resources of the current main management node.
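
The parameters that Step 1 lets the user configure in advance can be modelled with a small Python sketch. It does not use the corosync or pacemaker configuration formats; the `MonitorConfig` class, its field names, and the resource names in the usage note are hypothetical and only show which settings are involved and how they might be sanity-checked.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorConfig:
    """Hypothetical container for the user-configurable monitoring parameters."""
    timeout_s: float   # how long a monitor operation may run before it counts as failed
    interval_s: float  # how often the monitor operation is run
    group: list = field(default_factory=list)  # grouped resources, listed in start order
    stonith_enabled: bool = True  # the text states that STONITH needs to be configured

    def validate(self) -> list:
        """Return a list of human-readable problems; an empty list means OK."""
        problems = []
        if self.timeout_s <= 0:
            problems.append("timeout must be positive")
        if self.interval_s <= 0:
            problems.append("interval must be positive")
        if len(set(self.group)) != len(self.group):
            problems.append("a resource appears more than once in the group")
        if not self.stonith_enabled:
            problems.append("STONITH is not configured")
        return problems
```

A configuration such as `MonitorConfig(timeout_s=30, interval_s=10, group=["nfs_mount", "virtual_ip", "scheduler"])` passes validation with an empty problem list, whereas disabling STONITH or supplying a non-positive timeout is reported as a problem.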

Fig. 3 shows a high-performance job scheduling management node dual-computer reinforcement device proposed by the present invention, the device comprising:

Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims

1. A high-performance job scheduling management node dual-computer reinforcement method, characterized by comprising:

2. The method of claim 1, characterized in that:

The user configures the heartbeat monitoring and resource monitoring parameters in advance, the parameters including the monitoring timeout (timeout) and the monitoring interval (interval).

3. A high-performance job scheduling management node dual-computer reinforcement device, characterized by comprising: