CN103279386A - Method for achieving high availability of computer operation scheduling system - Google Patents

Method for achieving high availability of computer operation scheduling system

Info

Publication number
CN103279386A
Authority
CN
China
Prior art keywords
server
heartbeat
software
host node
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102290979A
Other languages
Chinese (zh)
Inventor
马四腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN2013102290979A priority Critical patent/CN103279386A/en
Publication of CN103279386A publication Critical patent/CN103279386A/en
Pending legal-status Critical Current

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention provides a method for achieving high availability of a computer job scheduling system. High-availability software such as Heartbeat is combined with job scheduling software and Network File System (NFS) shared storage to make the job scheduling system highly available. In the traditional scheme, the job scheduling system of a high-performance computer is deployed on a single server. Once that server crashes, the job scheduling system fails, the high-performance computer can no longer schedule jobs, running work stops, and system resources are wasted. This single-server deployment is simple and flexible, but its availability is poor. For such an important application, availability should be the first consideration.

Description

A high-availability method for a job scheduling system
Technical field
The present invention relates to the field of computer application technology, and specifically to a high-availability method for a job scheduling system.
Background art
At present, network-based computing technology has promoted the development and widespread use of cluster systems. High-performance workstations or PCs are connected into a cluster over a high-speed network in a certain topology to perform parallel computing, which obtains the performance of mainframes and parallel machines at very low cost. The software system that manages these workstations or PCs is the cluster management system studied here, and job scheduling is one of the key technologies in a cluster management system.
Job scheduling is a key factor in treating user jobs fairly, improving system response time, and thereby improving system performance. In view of the important role of job scheduling in cluster management systems, we have studied job scheduling for cluster systems and have proposed and designed a practical, stable, and reliable job scheduling strategy. Practical application shows that it is a good solution to the key problem of cluster job scheduling.
Summary of the invention
The purpose of the present invention is to provide a high-availability method for a job scheduling system.
The purpose of the invention is achieved in the following manner. The specific steps are as follows: two servers are used, called server 1 and server 2 respectively; the same Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software acts as the high-availability resource manager, and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled. Once the software has been deployed successfully, the two servers work in Active-Standby mode during operation.
The Heartbeat software mainly manages two resources: the virtual IP address and the job scheduling service. During normal operation, the Heartbeat software places all resources on the primary node, server 1, while the standby node, server 2, waits. When a user sends an access request, the request reaches the primary node, server 1, directly through the virtual IP address. If the primary node, server 1, goes down, the Heartbeat software detects the state of server 1 over the heartbeat link and immediately switches all resources to the standby node, server 2, which then becomes the primary node. Because server 2 is working normally, normal use of the service is not affected. When server 1 is then restarted, the Heartbeat software detects over the heartbeat link that server 1 is healthy again, and server 1 serves as the standby node, so that if server 2 fails it can take over the resources that the Heartbeat software switches to it. As a result, the resources are always running on one of the nodes, which achieves high availability.
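The switching behaviour described above can be sketched as a small state model. This is only an illustrative sketch, not the Heartbeat software's implementation: the class names `Node` and `ActiveStandbyPair`, the resource labels, and the rule for what happens when both servers are down are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# The two managed resources named in the text (labels are hypothetical).
RESOURCES = ("virtual_ip", "job_scheduling_service")


@dataclass
class Node:
    name: str
    alive: bool = True
    resources: set = field(default_factory=set)


class ActiveStandbyPair:
    """Toy model of two servers working in Active-Standby mode."""

    def __init__(self, primary: Node, standby: Node):
        self.primary = primary
        self.standby = standby
        # During normal operation all resources sit on the primary node.
        self.primary.resources = set(RESOURCES)

    def node_failed(self, node: Node) -> None:
        node.alive = False
        node.resources.clear()
        if node is self.primary and self.standby.alive:
            # Switch every resource to the standby, which becomes the primary.
            self.standby.resources = set(RESOURCES)
            self.primary, self.standby = self.standby, self.primary

    def node_recovered(self, node: Node) -> None:
        # A recovered node rejoins as standby; resources are not moved back.
        node.alive = True
        if not self.primary.alive:
            # Assumption: if the other node is still down, promote this one.
            self.primary, self.standby = self.standby, self.primary
        if not self.primary.resources:
            self.primary.resources = set(RESOURCES)
```

For example, after `pair = ActiveStandbyPair(Node("server1"), Node("server2"))`, calling `pair.node_failed(pair.primary)` leaves both resources on server 2, and a later `pair.node_recovered(...)` for server 1 brings it back as the standby without moving the resources.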
The beneficial effects of the invention are as follows. Two servers are used (called server 1 and server 2 respectively); the same Heartbeat high-availability software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software acts as the high-availability resource manager, and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled. After successful deployment by this method, the two servers work in Active-Standby mode during operation.
Using two servers together with high-availability software to manage the resources realizes a high-availability method for the job scheduling system.
Description of drawings
Fig. 1 is a schematic diagram of the physical node connections;
Fig. 2 is a schematic diagram of server 1 and server 2 both in the normal working state;
Fig. 3 is a schematic diagram of the working state when server 1 is down;
Fig. 4 is a schematic diagram of the working state when server 1 recovers.
Embodiment
The method of the present invention is explained below with reference to the accompanying drawings.
Two servers are used (called server 1 and server 2 respectively); the same Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software acts as the high-availability resource manager, and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled. After successful deployment by this method, the two servers work in Active-Standby mode during operation.
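The role of the shared storage can be illustrated with a minimal sketch in which both servers read and write job records in one shared directory. The patent does not specify how job information is stored, so the `SharedJobStore` class, the one-JSON-file-per-job layout, and the field names are assumptions; any directory (for instance a temporary one) stands in for the NFS mount here.

```python
import json
import os
import tempfile
from pathlib import Path


class SharedJobStore:
    """Keeps basic job information as JSON files in a shared directory.

    In a deployment like the one described, shared_dir would be a path on
    the NFS export mounted by both servers (an assumption of this sketch).
    """

    def __init__(self, shared_dir):
        self.dir = Path(shared_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def submit(self, job_id: str, info: dict) -> None:
        # Write to a temporary file first and then rename it, so that a
        # reader on the other server does not pick up a half-written record.
        fd, tmp = tempfile.mkstemp(dir=self.dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(info, f)
        os.replace(tmp, self.dir / f"{job_id}.json")

    def pending_jobs(self) -> dict:
        # Either server can list the jobs, whichever one accepted them.
        return {
            p.stem: json.loads(p.read_text())
            for p in sorted(self.dir.glob("*.json"))
        }
```

Because both nodes see the same records, the node that takes over the job scheduling service after a switch can continue from the jobs that the failed node had accepted.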
The Heartbeat software mainly manages two resources: the virtual IP address and the job scheduling service. During normal operation, the Heartbeat software places all resources on the primary node (server 1), while the standby node (server 2) waits. When a user sends an access request, the request reaches the primary node (server 1) directly through the virtual IP address. If the primary node (server 1) goes down, the Heartbeat software detects the state of the primary node over the heartbeat link (usually a network cable) and immediately switches all resources to the standby node (server 2), which then becomes the primary node. Because server 2 is working normally, normal use of the service is not affected while server 1 is being restarted. Once server 1 is back, the Heartbeat software detects over the heartbeat link that server 1 is healthy, and server 1 serves as the standby node, so that if server 2 fails it can take over the resources that the Heartbeat software switches to it. As a result, the resources are always running on one of the nodes, which achieves high availability.
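How a node's state might be judged from the heartbeat link can be sketched with a simple timeout rule: a node counts as down when no heartbeat has arrived from it within a fixed interval. This is an illustrative sketch rather than the Heartbeat software's actual algorithm; `HeartbeatMonitor`, `choose_active`, and the `deadtime` value are hypothetical, and timestamps are passed in explicitly so the logic can be followed without real clocks or network traffic.

```python
class HeartbeatMonitor:
    """Tracks the time of the last heartbeat received from each node."""

    def __init__(self, deadtime: float):
        self.deadtime = deadtime  # seconds of silence before a node is considered down
        self.last_seen = {}

    def beat(self, node: str, now: float) -> None:
        # Record a heartbeat from the node at time `now`.
        self.last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        seen = self.last_seen.get(node)
        return seen is not None and (now - seen) <= self.deadtime


def choose_active(monitor, current_active, peer, now):
    """Keep the current active node while it is alive; otherwise switch to the peer.

    Returns None when neither node has been heard from recently.
    """
    if monitor.is_alive(current_active, now):
        return current_active
    if monitor.is_alive(peer, now):
        return peer
    return None
```

With a `deadtime` of 10 seconds, a server whose last heartbeat is 15 seconds old is treated as down and the peer is chosen. When the failed server starts sending heartbeats again, `choose_active` keeps the node that is currently active, which matches the description of server 1 returning as the standby.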
Except for the technical features described in the specification, everything else is known to those skilled in the art.

Claims (1)

1. A high-availability method for a job scheduling system, characterized in that the specific steps are as follows: two servers are used, called server 1 and server 2 respectively; the same Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage, wherein the Heartbeat software acts as the high-availability resource manager and the NFS shared storage lets the two servers share the basic information of the jobs to be scheduled; once the software has been deployed successfully, the two servers work in Active-Standby mode during operation;
the Heartbeat software mainly manages two resources, the virtual IP address and the job scheduling service; during normal operation, the Heartbeat software places all resources on the primary node, server 1, while the standby node, server 2, waits; when a user sends an access request, the request reaches the primary node, server 1, directly through the virtual IP address; if the primary node, server 1, goes down, the Heartbeat software detects the state of server 1 over the heartbeat link and immediately switches all resources to the standby node, server 2, which then becomes the primary node; because server 2 is working normally, normal use of the service is not affected; when server 1 is then restarted, the Heartbeat software detects over the heartbeat link that server 1 is healthy again, and server 1 serves as the standby node, so that if server 2 fails it can take over the resources that the Heartbeat software switches to it; as a result, the resources are always running on one of the nodes, which achieves high availability.
CN2013102290979A 2013-06-09 2013-06-09 Method for achieving high availability of computer operation scheduling system Pending CN103279386A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102290979A CN103279386A (en) 2013-06-09 2013-06-09 Method for achieving high availability of computer operation scheduling system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013102290979A CN103279386A (en) 2013-06-09 2013-06-09 Method for achieving high availability of computer operation scheduling system

Publications (1)

Publication Number Publication Date
CN103279386A true CN103279386A (en) 2013-09-04

Family

ID=49061921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102290979A Pending CN103279386A (en) 2013-06-09 2013-06-09 Method for achieving high availability of computer operation scheduling system

Country Status (1)

Country Link
CN (1) CN103279386A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787244A (en) * 1995-02-13 1998-07-28 Fujitsu Limited Information retrieval system
CN101043310A (en) * 2007-04-27 2007-09-26 北京佳讯飞鸿电气有限责任公司 Image backup method for dual-core control of core controlled system
CN101262147A (en) * 2008-04-18 2008-09-10 深圳南瑞科技有限公司 Dual-machine switching device for remote workstation
CN102096602A (en) * 2009-12-15 2011-06-15 中国移动通信集团公司 Task scheduling method, and system and equipment thereof
CN102629906A (en) * 2012-03-30 2012-08-08 浪潮电子信息产业股份有限公司 Design method for improving cluster business availability by using cluster management node as two computers
CN103106126A (en) * 2013-01-16 2013-05-15 浪潮电子信息产业股份有限公司 High-availability computer system based on virtualization

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601347B (en) * 2013-10-30 2018-02-13 北京临近空间飞行器系统工程研究所 A kind of highly reliable data publication storage method
CN104601347A (en) * 2013-10-30 2015-05-06 北京临近空间飞行器系统工程研究所 High reliability data release storage system and method
CN103713974A (en) * 2014-01-07 2014-04-09 浪潮(北京)电子信息产业有限公司 High-performance job scheduling management node dual-computer reinforcement method and device
CN103905250A (en) * 2014-03-21 2014-07-02 浪潮电子信息产业股份有限公司 Method for optimizing and managing cluster state
CN103905250B (en) * 2014-03-21 2018-02-23 浪潮电子信息产业股份有限公司 A kind of method of optimum management cluster state
CN103986755A (en) * 2014-05-12 2014-08-13 浪潮电子信息产业股份有限公司 Implementation method of high-security full-redundancy parallel file system
CN105141456A (en) * 2015-08-25 2015-12-09 山东超越数控电子有限公司 Method for monitoring high-availability cluster resource
CN105183591A (en) * 2015-09-07 2015-12-23 浪潮(北京)电子信息产业有限公司 High-availability cluster implementation method and system
CN105337762A (en) * 2015-09-28 2016-02-17 浪潮(北京)电子信息产业有限公司 File sharing method supporting automatic failover
CN105468446A (en) * 2015-11-20 2016-04-06 浪潮电子信息产业股份有限公司 Linux based method for realizing high availability of HPC job scheduling
CN106789350A (en) * 2017-01-23 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of back-level server virtualization system host node High Availabitity
CN108234271A (en) * 2017-10-25 2018-06-29 国云科技股份有限公司 A kind of cloud platform service network IP management methods
CN111651291A (en) * 2020-04-23 2020-09-11 国网河南省电力公司电力科学研究院 Shared storage cluster brain crack prevention method, system and computer storage medium
CN111651291B (en) * 2020-04-23 2023-02-03 国网河南省电力公司电力科学研究院 Method, system and computer storage medium for preventing split brain of shared storage cluster
WO2022037171A1 (en) * 2020-08-20 2022-02-24 苏州浪潮智能科技有限公司 Data request method, device and medium

Similar Documents

Publication Publication Date Title
CN103279386A (en) Method for achieving high availability of computer operation scheduling system
CN104965850A (en) Database high-available implementation method based on open source technology
US20150100826A1 (en) Fault domains on modern hardware
CN103064742A (en) Automatic deployment system and method of hadoop cluster
CN105187512A (en) Method and system for load balancing of virtual machine clusters
US10785350B2 (en) Heartbeat in failover cluster
CN105045531A (en) Buffer synchronization mechanism between double storage controllers
CN103716372A (en) Digital library-as-a-service cloud computing platform construction method
CN103457775A (en) High-availability virtual machine pooling management system based on roles
CN105721582A (en) Multi-node file backup system
CN102868727A (en) Method for realizing high availability of logical volume
CN103942128A (en) Double-computer reinforcing method for high-performance job scheduling management node
CN104123183A (en) Cluster assignment dispatching method and device
CN103533068A (en) Independent and balanced task distribution cluster system based on IP
CN102820998A (en) Dual-fault-tolerant service system applicable to office applications and data storage method of dual-fault-tolerant service system
CN104468710A (en) Mixed big data processing system and method
CN104270272B (en) A kind of electric energy quality monitoring data management scheme based on mobile Agent
US8621260B1 (en) Site-level sub-cluster dependencies
CN109408597A (en) A kind of power grid metering big data storage system and its creation method
CN103209219A (en) Distributed cluster file system
CN105302817A (en) Distributed file system management method and apparatus
CN103209218A (en) Management system for disaster-tolerant all-in-one machine
CN103259845A (en) Improvement method of data backup task based on network interruption
US9432476B1 (en) Proxy data storage system monitoring aggregator for a geographically-distributed environment
CN104765798A (en) System and method for achieving Mysql remote synchronous fault tolerance enhancement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130904
