CN103279386A - Method for achieving high availability of computer operation scheduling system - Google Patents
- Publication number
- CN103279386A
- Authority
- CN
- China
- Prior art keywords
- server
- heartbeat
- software
- host node
- normal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Hardware Redundancy (AREA)
Abstract
The invention provides a method for achieving high availability of a computer job (operation) scheduling system. High-availability software such as Heartbeat is combined with job scheduling software and Network File System (NFS) shared storage to make the job scheduling system highly available. In the traditional scheme, the job scheduling system of a high-performance computer is deployed on a single server. Once that server crashes, the job scheduling system fails, the high-performance computer can no longer schedule jobs, running work stops, and system resources are wasted. This deployment is simple and flexible, but its availability is poor. For such an important application, availability should be the first consideration.
Description
Technical field
The present invention relates to the field of computer application technology, and specifically to a method for achieving high availability of a job scheduling system.
Background technology
At present, network-based computing technology has driven the development and widespread use of cluster systems. High-performance workstations or PCs are connected into a cluster over a high-speed network in a given topology to perform parallel computation, obtaining the performance of a mainframe or parallel machine at very low cost. The software that manages these workstations or PCs is the cluster management system studied here, and job scheduling is one of its key technologies.
Job scheduling is a key factor in treating user jobs fairly, improving system response time, and thereby improving system performance. In view of the importance of job scheduling in a cluster management system, we studied job scheduling for cluster systems and proposed and designed a practical, stable, and reliable job scheduling strategy. Practical application shows that it is a good solution to the key problem of cluster job scheduling.
Summary of the invention
The purpose of the present invention is to provide a method for achieving high availability of a job scheduling system.
The purpose of the invention is achieved as follows. Two servers, referred to as server 1 and server 2, are used. The same Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage. The Heartbeat software is used to manage the high-availability resources, and the NFS shared storage lets the two servers share the essential information about the jobs to be scheduled. Once the software is successfully deployed, the two servers run in Active-Standby mode.
The Heartbeat software mainly manages two resources: a virtual IP address and the job scheduling service. During normal operation, Heartbeat places all resources on the primary node, server 1, while the standby node, server 2, waits. A user request addressed to the virtual IP address reaches the primary node, server 1, directly. If server 1 goes down, Heartbeat detects its state over the heartbeat link and switches all resources to the standby node, server 2, which then becomes the primary node. Because server 2 is working normally, the service remains usable. When server 1 is then restarted, Heartbeat detects over the heartbeat link that server 1 is healthy again, and server 1 acts as the standby node so that, if server 2 fails, it can take over the resources that Heartbeat switches over. The resources are therefore always running on one of the two nodes, which achieves high availability.
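The switching behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names `Node` and `ActiveStandbyPair`, the `check` method, and the resource labels are hypothetical and are not part of the Heartbeat software or of the patent text. It models which node holds the two managed resources, not how Heartbeat itself is implemented.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two resources named in the description.
RESOURCES = ("virtual_ip", "job_scheduling_service")


@dataclass
class Node:
    name: str
    alive: bool = True
    resources: set = field(default_factory=set)


class ActiveStandbyPair:
    """Toy model of two servers running in Active-Standby mode."""

    def __init__(self, first: Node, second: Node) -> None:
        self.active = first
        self.standby = second
        # During normal operation all resources sit on the active node.
        self.active.resources.update(RESOURCES)

    def check(self) -> str:
        """Run one monitoring pass and return the name of the active node."""
        if not self.active.alive and self.standby.alive:
            # Move every resource to the standby node, then swap the roles.
            self.standby.resources.update(self.active.resources)
            self.active.resources.clear()
            self.active, self.standby = self.standby, self.active
        return self.active.name


if __name__ == "__main__":
    server1, server2 = Node("server1"), Node("server2")
    pair = ActiveStandbyPair(server1, server2)
    print(pair.check())      # both servers healthy
    server1.alive = False    # server 1 goes down
    print(pair.check())
    server1.alive = True     # server 1 restarts and becomes the standby
    print(pair.check())
```

In this sketch a restarted server does not reclaim the resources; it simply waits as the standby, which matches the sequence described in the text.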
The beneficial effects of the invention are as follows. Two servers (referred to as server 1 and server 2) are used; the same Heartbeat high-availability software and job scheduling software are deployed on both, together with NFS shared storage. Heartbeat is used to manage the high-availability resources, and the NFS shared storage lets the two servers share the essential information about the jobs to be scheduled. After successful deployment with this method, the two servers run in Active-Standby mode.
Using two servers together with high-availability software that manages the resources, the method achieves high availability of the job scheduling system.
Description of drawings
Fig. 1 is a schematic diagram of the physical node connections;
Fig. 2 is a schematic diagram of server 1 and server 2 both in the normal working state;
Fig. 3 is a schematic diagram of the working state when server 1 is down;
Fig. 4 is a schematic diagram of the working state when server 1 has recovered.
Embodiment
The method of the present invention is explained below with reference to the accompanying drawings.
Two servers (referred to as server 1 and server 2) are used. The same Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage. Heartbeat is used to manage the high-availability resources, and the NFS shared storage lets the two servers share the essential information about the jobs to be scheduled. After successful deployment with this method, the two servers run in Active-Standby mode.
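The role of the shared storage can be sketched as follows. This is an illustration only: the patent does not specify how the job information is stored, so the `SharedJobStore` class, the `jobs.json` file name, and the record fields are all assumptions. A temporary directory stands in for the NFS mount point that both servers would see.

```python
import json
import tempfile
from pathlib import Path


class SharedJobStore:
    """Keeps job records in a directory that both servers can see.

    In a real deployment the directory would be an NFS mount; here any
    directory works, which keeps the sketch runnable on one machine.
    """

    def __init__(self, shared_dir: str) -> None:
        self.path = Path(shared_dir) / "jobs.json"  # hypothetical file name

    def load(self) -> list:
        """Return all job records, or an empty list if none exist yet."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def submit(self, job_id: str, command: str) -> None:
        """Append a queued job and write the list back to shared storage."""
        jobs = self.load()
        jobs.append({"id": job_id, "command": command, "state": "queued"})
        # Write to a temporary file first, then rename it over the old one,
        # so a reader does not pick up a half-written file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(jobs), encoding="utf-8")
        tmp.replace(self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        SharedJobStore(shared).submit("job-1", "run.sh")  # written by server 1
        print(SharedJobStore(shared).load())               # read by server 2
```

Because the job records live outside either server, the node that takes over the job scheduling service can read the same queue that the failed node was working from.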
The Heartbeat software mainly manages two resources: a virtual IP address and the job scheduling service. During normal operation, Heartbeat places all resources on the primary node (server 1), while the standby node (server 2) waits. A user request addressed to the virtual IP address reaches the primary node (server 1) directly. If the primary node (server 1) goes down, Heartbeat detects its state over the heartbeat link (usually a network cable) and switches all resources to the standby node (server 2), which then becomes the primary node. Because server 2 is working normally, the service remains usable. When server 1 is then restarted, Heartbeat detects over the heartbeat link that server 1 is healthy again, and server 1 acts as the standby node so that, if server 2 fails, it can take over the resources that Heartbeat switches over. The resources are therefore always running on one of the two nodes, which achieves high availability.
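Detection of a failed peer over the heartbeat link is commonly done with a timeout: if no heartbeat message has arrived within some interval, the peer is treated as down. The sketch below shows that idea only. The `HeartbeatMonitor` class and its `deadtime` parameter are hypothetical names chosen for this illustration, and the patent does not state which timeout rule is used. Timestamps are passed in explicitly so the logic can be checked without waiting in real time.

```python
class HeartbeatMonitor:
    """Tracks when the peer node was last heard from."""

    def __init__(self, deadtime: float) -> None:
        # Seconds of silence after which the peer is considered down.
        self.deadtime = deadtime
        self.last_seen = None

    def record_heartbeat(self, now: float) -> None:
        """Call whenever a heartbeat message arrives from the peer."""
        self.last_seen = now

    def peer_is_alive(self, now: float) -> bool:
        """Return True if a heartbeat arrived within the last `deadtime` seconds."""
        if self.last_seen is None:
            return False  # nothing received yet
        return (now - self.last_seen) <= self.deadtime


if __name__ == "__main__":
    monitor = HeartbeatMonitor(deadtime=10.0)
    monitor.record_heartbeat(now=100.0)
    print(monitor.peer_is_alive(now=105.0))  # within the timeout
    print(monitor.peer_is_alive(now=111.0))  # silence longer than the timeout
```

A standby node would run a check like `peer_is_alive` periodically and take over the virtual IP address and the job scheduling service once it returns `False`.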
Apart from the technical features described in this specification, everything else is technology known to those skilled in the art.
Claims (1)
1. A method for achieving high availability of a job scheduling system, characterized in that the specific steps are as follows: two servers, referred to as server 1 and server 2, are used; the same Heartbeat software and job scheduling software are deployed on both, together with NFS shared storage, wherein the Heartbeat software is used to manage the high-availability resources and the NFS shared storage lets the two servers share the essential information about the jobs to be scheduled; after the software is successfully deployed, the two servers run in Active-Standby mode;
the Heartbeat software mainly manages two resources, a virtual IP address and the job scheduling service; during normal operation, the Heartbeat software places all resources on the primary node, server 1, and the standby node, server 2, waits; a user request addressed to the virtual IP address reaches the primary node, server 1, directly; if the primary node, server 1, goes down, the Heartbeat software detects the state of server 1 over the heartbeat link and switches all resources to the standby node, server 2, which then becomes the primary node; because server 2 is working normally, the service remains usable; when server 1 is then restarted, the Heartbeat software detects over the heartbeat link that server 1 is healthy again, and server 1 acts as the standby node so that, if server 2 fails, it takes over the resources that the Heartbeat software switches over; the resources are thereby always running on one of the two nodes, achieving high availability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013102290979A CN103279386A (en) | 2013-06-09 | 2013-06-09 | Method for achieving high availability of computer operation scheduling system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013102290979A CN103279386A (en) | 2013-06-09 | 2013-06-09 | Method for achieving high availability of computer operation scheduling system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103279386A (en) | 2013-09-04 |
Family
ID=49061921
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2013102290979A (Pending) CN103279386A (en) | 2013-06-09 | 2013-06-09 | Method for achieving high availability of computer operation scheduling system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103279386A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103713974A (en) * | 2014-01-07 | 2014-04-09 | 浪潮(北京)电子信息产业有限公司 | High-performance job scheduling management node dual-computer reinforcement method and device |
CN103905250A (en) * | 2014-03-21 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Method for optimizing and managing cluster state |
CN103986755A (en) * | 2014-05-12 | 2014-08-13 | 浪潮电子信息产业股份有限公司 | Implementation method of high-security full-redundancy parallel file system |
CN104601347A (en) * | 2013-10-30 | 2015-05-06 | 北京临近空间飞行器系统工程研究所 | High reliability data release storage system and method |
CN105141456A (en) * | 2015-08-25 | 2015-12-09 | 山东超越数控电子有限公司 | Method for monitoring high-availability cluster resource |
CN105183591A (en) * | 2015-09-07 | 2015-12-23 | 浪潮(北京)电子信息产业有限公司 | High-availability cluster implementation method and system |
CN105337762A (en) * | 2015-09-28 | 2016-02-17 | 浪潮(北京)电子信息产业有限公司 | File sharing method supporting automatic failover |
CN105468446A (en) * | 2015-11-20 | 2016-04-06 | 浪潮电子信息产业股份有限公司 | Linux based method for realizing high availability of HPC job scheduling |
CN106789350A (en) * | 2017-01-23 | 2017-05-31 | 郑州云海信息技术有限公司 | A kind of method and device of back-level server virtualization system host node High Availabitity |
CN108234271A (en) * | 2017-10-25 | 2018-06-29 | 国云科技股份有限公司 | A kind of cloud platform service network IP management methods |
CN111651291A (en) * | 2020-04-23 | 2020-09-11 | 国网河南省电力公司电力科学研究院 | Shared storage cluster brain crack prevention method, system and computer storage medium |
WO2022037171A1 (en) * | 2020-08-20 | 2022-02-24 | 苏州浪潮智能科技有限公司 | Data request method, device and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787244A (en) * | 1995-02-13 | 1998-07-28 | Fujitsu Limited | Information retrieval system |
CN101043310A (en) * | 2007-04-27 | 2007-09-26 | 北京佳讯飞鸿电气有限责任公司 | Image backup method for dual-core control of core controlled system |
CN101262147A (en) * | 2008-04-18 | 2008-09-10 | 深圳南瑞科技有限公司 | Dual-machine switching device for remote workstation |
CN102096602A (en) * | 2009-12-15 | 2011-06-15 | 中国移动通信集团公司 | Task scheduling method, and system and equipment thereof |
CN102629906A (en) * | 2012-03-30 | 2012-08-08 | 浪潮电子信息产业股份有限公司 | Design method for improving cluster business availability by using cluster management node as two computers |
CN103106126A (en) * | 2013-01-16 | 2013-05-15 | 浪潮电子信息产业股份有限公司 | High-availability computer system based on virtualization |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104601347B (en) * | 2013-10-30 | 2018-02-13 | 北京临近空间飞行器系统工程研究所 | A kind of highly reliable data publication storage method |
CN104601347A (en) * | 2013-10-30 | 2015-05-06 | 北京临近空间飞行器系统工程研究所 | High reliability data release storage system and method |
CN103713974A (en) * | 2014-01-07 | 2014-04-09 | 浪潮(北京)电子信息产业有限公司 | High-performance job scheduling management node dual-computer reinforcement method and device |
CN103905250A (en) * | 2014-03-21 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Method for optimizing and managing cluster state |
CN103905250B (en) * | 2014-03-21 | 2018-02-23 | 浪潮电子信息产业股份有限公司 | A kind of method of optimum management cluster state |
CN103986755A (en) * | 2014-05-12 | 2014-08-13 | 浪潮电子信息产业股份有限公司 | Implementation method of high-security full-redundancy parallel file system |
CN105141456A (en) * | 2015-08-25 | 2015-12-09 | 山东超越数控电子有限公司 | Method for monitoring high-availability cluster resource |
CN105183591A (en) * | 2015-09-07 | 2015-12-23 | 浪潮(北京)电子信息产业有限公司 | High-availability cluster implementation method and system |
CN105337762A (en) * | 2015-09-28 | 2016-02-17 | 浪潮(北京)电子信息产业有限公司 | File sharing method supporting automatic failover |
CN105468446A (en) * | 2015-11-20 | 2016-04-06 | 浪潮电子信息产业股份有限公司 | Linux based method for realizing high availability of HPC job scheduling |
CN106789350A (en) * | 2017-01-23 | 2017-05-31 | 郑州云海信息技术有限公司 | A kind of method and device of back-level server virtualization system host node High Availabitity |
CN108234271A (en) * | 2017-10-25 | 2018-06-29 | 国云科技股份有限公司 | A kind of cloud platform service network IP management methods |
CN111651291A (en) * | 2020-04-23 | 2020-09-11 | 国网河南省电力公司电力科学研究院 | Shared storage cluster brain crack prevention method, system and computer storage medium |
CN111651291B (en) * | 2020-04-23 | 2023-02-03 | 国网河南省电力公司电力科学研究院 | Method, system and computer storage medium for preventing split brain of shared storage cluster |
WO2022037171A1 (en) * | 2020-08-20 | 2022-02-24 | 苏州浪潮智能科技有限公司 | Data request method, device and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103279386A (en) | Method for achieving high availability of computer operation scheduling system | |
CN104965850A (en) | Database high-available implementation method based on open source technology | |
US20150100826A1 (en) | Fault domains on modern hardware | |
CN103064742A (en) | Automatic deployment system and method of hadoop cluster | |
CN105187512A (en) | Method and system for load balancing of virtual machine clusters | |
US10785350B2 (en) | Heartbeat in failover cluster | |
CN105045531A (en) | Buffer synchronization mechanism between double storage controllers | |
CN103716372A (en) | Digital library-as-a-service cloud computing platform construction method | |
CN103457775A (en) | High-availability virtual machine pooling management system based on roles | |
CN105721582A (en) | Multi-node file backup system | |
CN102868727A (en) | Method for realizing high availability of logical volume | |
CN103942128A (en) | Double-computer reinforcing method for high-performance job scheduling management node | |
CN104123183A (en) | Cluster assignment dispatching method and device | |
CN103533068A (en) | Independent and balanced task distribution cluster system based on IP | |
CN102820998A (en) | Dual-fault-tolerant service system applicable to office applications and data storage method of dual-fault-tolerant service system | |
CN104468710A (en) | Mixed big data processing system and method | |
CN104270272B (en) | A kind of electric energy quality monitoring data management scheme based on mobile Agent | |
US8621260B1 (en) | Site-level sub-cluster dependencies | |
CN109408597A (en) | A kind of power grid metering big data storage system and its creation method | |
CN103209219A (en) | Distributed cluster file system | |
CN105302817A (en) | Distributed file system management method and apparatus | |
CN103209218A (en) | Management system for disaster-tolerant all-in-one machine | |
CN103259845A (en) | Improvement method of data backup task based on network interruption | |
US9432476B1 (en) | Proxy data storage system monitoring aggregator for a geographically-distributed environment | |
CN104765798A (en) | System and method for achieving Mysql remote synchronous fault tolerance enhancement |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130904 |
WD01 | Invention patent application deemed withdrawn after publication | |