CN103279386A - Method for achieving high availability of computer operation scheduling system

Info

Publication number: CN103279386A
Application number: CN2013102290979A
Authority: CN
Inventors: 马四腾
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2013-06-09
Filing date: 2013-06-09
Publication date: 2013-09-04

Abstract

The invention provides a method for achieving high availability of a computer job scheduling system. High-availability software such as Heartbeat is combined with job scheduling software and Network File System (NFS) shared storage to make the job scheduling system highly available. In the traditional scheme, the job scheduling system of a high-performance computer is installed on a single server. Once that server crashes, the job scheduling system fails, the high-performance computer can no longer schedule jobs, running work stops, and system resources are wasted. That deployment is simple and flexible, but its availability is poor. For such an important application, availability should be the first consideration.

Description

A high-availability method for a job scheduling system

Technical field

The present invention relates to the field of computer application technology, and specifically to a high-availability method for a job scheduling system.

Background technology

At present, network-based computing technology has driven the development and widespread use of cluster systems. High-performance workstations or PCs are connected into a cluster over a high-speed network in a given topology to carry out parallel computation, so that the performance of a mainframe or parallel machine can be obtained at very low cost. The software that manages these workstations or PCs is the cluster management system studied here, and job scheduling is one of the key technologies in a cluster management system.

Job scheduling is a key factor in treating user jobs fairly, improving system response time, and thereby improving system performance. In view of the important role of job scheduling in cluster management systems, we have studied job scheduling for cluster systems and have proposed and designed a practical, stable, and reliable job scheduling strategy. Practical application shows that it is a good solution to the key problem of cluster job scheduling.

Summary of the invention

The purpose of the present invention is to provide a high-availability method for a job scheduling system.

The purpose of the invention is achieved as follows. Two servers, called server 1 and server 2, are used. Identical Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software acts as the high-availability resource manager, and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled. Once the software has been deployed successfully, the two servers work in Active-Standby mode during operation.

The Heartbeat software mainly manages two resources: a virtual IP address and the job scheduling service. During normal operation, Heartbeat places all resources on the primary node, server 1, while the standby node, server 2, waits. A user request reaches the primary node directly by accessing the virtual IP. If server 1 goes down, Heartbeat detects its state over the heartbeat link and switches all resources to the standby node, server 2, which then becomes the primary node. Because server 2 is working normally, the service remains usable. Server 1 is then restarted; Heartbeat again detects over the heartbeat link that server 1 is healthy, and server 1 acts as the standby node, ready to take over the resources that Heartbeat switches to it if server 2 fails. The resources therefore always run on one of the two nodes, which achieves high availability.
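The resource hand-over described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Heartbeat implementation: the class names `Node` and `ActiveStandbyPair`, the resource labels, and the method names are hypothetical and chosen for illustration. It models the two managed resources moving to the surviving node on failure, and a recovered node rejoining as standby without taking the resources back.

```python
from dataclasses import dataclass, field

# The two resources the text says are managed (labels are illustrative).
RESOURCES = ("virtual_ip", "job_scheduling_service")


@dataclass
class Node:
    """Hypothetical model of one server in the pair."""
    name: str
    alive: bool = True
    resources: set = field(default_factory=set)


class ActiveStandbyPair:
    """Hypothetical model of the Active-Standby behavior described in the text."""

    def __init__(self, primary: Node, standby: Node) -> None:
        self.active = primary
        self.standby = standby
        # During normal operation all resources sit on the primary node.
        self.active.resources.update(RESOURCES)

    def node_failed(self, node: Node) -> None:
        """Mark a node as down; if it was active, move resources to the standby."""
        node.alive = False
        if node is self.active and self.standby.alive:
            self.standby.resources.update(node.resources)
            node.resources.clear()
            # The former standby is now the primary node.
            self.active, self.standby = self.standby, self.active

    def node_recovered(self, node: Node) -> None:
        """A restarted node rejoins as standby; resources stay where they are."""
        node.alive = True
```

For example, after `pair.node_failed(server1)` the object passed as the standby holds both resources, and a later `pair.node_recovered(server1)` leaves it in the standby role until the other node fails.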

The beneficial effects of the invention are as follows. Two servers (called server 1 and server 2) are used; identical Heartbeat high-availability software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software acts as the high-availability resource manager, and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled. After a successful deployment with this method, the two servers work in Active-Standby (primary/standby) mode during operation.

Using two servers together with high-availability software to manage the resources yields a highly available job scheduling system.

Description of drawings

Fig. 1 is a schematic diagram of the physical node connections;

Fig. 2 is a schematic diagram of server 1 and server 2 both in the normal working state;

Fig. 3 is a schematic diagram of the working state when server 1 is down;

Fig. 4 is a schematic diagram of the working state when server 1 recovers.

Embodiment

The method of the present invention is explained below with reference to the accompanying drawings.

Two servers (called server 1 and server 2) are used; identical Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software acts as the high-availability resource manager, and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled. After a successful deployment with this method, the two servers work in Active-Standby (primary/standby) mode during operation.
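The role of the shared storage can be sketched as follows. The specification does not say how the job scheduling software lays out its data, so everything here is an assumption made for illustration: the `SharedJobStore` class, the `jobs.json` file name, and the record fields are hypothetical. The sketch only shows why a directory visible to both servers lets whichever node is active see the jobs submitted before a switch-over.

```python
import json
import os
import tempfile
from pathlib import Path


class SharedJobStore:
    """Hypothetical job list kept in a directory both servers can see.

    In the scheme described in the text this directory would be on the NFS
    share; here it is just any path passed in by the caller.
    """

    def __init__(self, shared_dir: str) -> None:
        self.path = Path(shared_dir) / "jobs.json"  # file name is an assumption

    def load(self) -> list:
        """Return the stored job records, or an empty list if none exist yet."""
        if not self.path.exists():
            return []
        with self.path.open() as f:
            return json.load(f)

    def save(self, jobs: list) -> None:
        """Write to a temporary file, then rename it over the old one."""
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def submit(self, job_id: str, command: str) -> None:
        """Append one queued job record to the shared list."""
        jobs = self.load()
        jobs.append({"id": job_id, "command": command, "state": "queued"})
        self.save(jobs)
```

A store opened by the node that takes over reads back what the previous active node wrote. Caching and locking behavior over a real NFS mount depend on the mount options and are not addressed by this sketch.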

The Heartbeat software mainly manages two resources: a virtual IP address and the job scheduling service. During normal operation, Heartbeat places all resources on the primary node (server 1), while the standby node (server 2) waits. A user request reaches the primary node (server 1) directly by accessing the virtual IP. If the primary node (server 1) goes down, Heartbeat detects its state over the heartbeat link (usually a network cable) and switches all resources to the standby node (server 2), which then becomes the primary node. Because server 2 is working normally, the service remains usable. Server 1 is then restarted; Heartbeat detects over the heartbeat link that server 1 is healthy again, and server 1 acts as the standby node, ready to take over the resources that Heartbeat switches to it if server 2 fails. The resources therefore always run on one of the two nodes, which achieves high availability.
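Detection over the heartbeat link amounts to noticing that the peer has been silent for too long. The following is a minimal sketch of that idea, assuming a simple timeout rule; the specification does not give the detection rule, and the class name `HeartbeatMonitor`, the `deadtime` parameter, and the explicit `now` timestamps are illustrative choices rather than the Heartbeat software's own interface.

```python
class HeartbeatMonitor:
    """Hypothetical timeout-based liveness check for the peer node."""

    def __init__(self, deadtime: float) -> None:
        # Seconds of silence after which the peer is treated as down (assumed rule).
        self.deadtime = deadtime
        self.last_seen = None

    def beat(self, now: float) -> None:
        """Record that a heartbeat from the peer arrived at time `now`."""
        self.last_seen = now

    def peer_is_dead(self, now: float) -> bool:
        """Return True once the peer has been silent for longer than `deadtime`."""
        if self.last_seen is None:
            # No heartbeat seen yet, so there is nothing to judge against.
            return False
        return now - self.last_seen > self.deadtime
```

The standby node would call `beat` whenever a heartbeat arrives and poll `peer_is_dead` periodically; a `True` result is the point at which the resources would be switched over. Passing timestamps explicitly keeps the sketch deterministic; a real monitor would read a monotonic clock.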

Everything other than the technical features described in this specification is known to those skilled in the art.

Claims

1. A high-availability method for a job scheduling system, characterized in that the concrete steps are as follows: two servers, called server 1 and server 2, are used; identical Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage, wherein the Heartbeat software acts as the high-availability resource manager and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled; and after the software is deployed successfully, the two servers work in Active-Standby mode during operation;