CN103713974A - High-performance job scheduling management node dual-computer reinforcement method and device - Google Patents
High-performance job scheduling management node dual-computer reinforcement method and device
- Publication number
- CN103713974A (application CN201410007013.1A)
- Authority
- CN
- China
- Prior art keywords
- management node
- heartbeat
- job scheduling
- operating system
- resource
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a dual-computer reinforcement method for a high-performance job scheduling management node that simultaneously monitors the heartbeat information and the operating system resources of the main management node. When a fault is detected in either the heartbeat information or the operating system resources of the main management node, management node switching is started. The invention also provides a corresponding device. The method and device achieve dual-computer reinforcement of the job scheduling management node and add monitoring of operating system resources, which effectively overcomes the shortcomings of traditional methods.
Description
Technical field
The present invention relates to the field of computer technology, and specifically to dual-computer reinforcement of a job scheduling management node.
Background
Network computing technology has driven the development and widespread adoption of cluster systems. Connecting high-performance workstations or personal computers (PCs) into a cluster over a high-speed network, in a defined topology, enables parallel computing and delivers the performance of mainframes and parallel machines at very low cost. However, as high-performance computing clusters continue to grow in scale, cluster management problems have followed. A job scheduling system is mainly responsible for receiving the job requests that users submit and, according to specific scheduling rules and the users' job requirements, selecting suitable computing resources to run the users' jobs. With the help of a job scheduling system, a high-performance computing cluster looks to its users like one large server with many CPUs that multiple users can use at the same time. The job scheduling system manages the jobs that users submit and allocates resources to each job sensibly, so that the computing power of the cluster is fully used and results are obtained as quickly as possible. The importance of the job scheduling system is therefore self-evident.
Traditional reinforcement approaches are either to deploy the management node on a single machine or to use a heartbeat scheme for dual-computer reinforcement. Both have defects. With a single-machine deployment, once the management node fails, the job scheduling system of the entire cluster stops working; the cluster's jobs can no longer be scheduled sensibly and efficiently, running jobs stall, and system efficiency is severely affected. With a heartbeat scheme, the design of the heartbeat software itself means that it cannot monitor the job scheduling system at the resource level; once a resource that should be monitored fails, resources cannot be switched over effectively, so the cluster's jobs again cannot be scheduled sensibly and efficiently, and system efficiency is severely affected. Because both approaches have critical shortcomings, how to reinforce the job scheduling system more effectively has become a technical problem that urgently needs to be solved.
Summary of the invention
The present invention proposes a dual-computer reinforcement method and device for a high-performance job scheduling management node. On the one hand, it avoids the single point of failure caused by single-machine deployment; on the other hand, it provides monitoring of operating system resources, which effectively makes up for the shortcomings of traditional methods.
A dual-computer reinforcement method for a high-performance job scheduling management node, comprising:
Step 1: mounting the shared directory of an NFS server on the dual-computer job scheduling management nodes, and starting heartbeat monitoring and resource monitoring;
Step 2: the heartbeat monitoring and the resource monitoring monitor the heartbeat information and the operating system resources of the current main management node, respectively;
Step 3: determining whether the heartbeat information or the operating system resources of the current main management node have failed, and if so, starting management node switching.
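The decision in Step 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `NodeStatus`, `should_switch`, and `run_once` are hypothetical, and the two boolean flags stand in for whatever the heartbeat monitoring and resource monitoring actually report.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeStatus:
    """Hypothetical snapshot of what the two monitors saw on the main node."""
    heartbeat_ok: bool   # outcome of heartbeat monitoring (Step 2)
    resources_ok: bool   # outcome of operating system resource monitoring (Step 2)


def should_switch(status: NodeStatus) -> bool:
    """Step 3: switch if the heartbeat OR the OS resources have failed."""
    return not (status.heartbeat_ok and status.resources_ok)


def run_once(main: str, standby: str, status: NodeStatus) -> Tuple[str, str]:
    """Return the (main, standby) pair after one monitoring pass."""
    if should_switch(status):
        # Roles are swapped: the former standby becomes the main management node.
        return standby, main
    return main, standby
```

For example, `run_once("node1", "node2", NodeStatus(heartbeat_ok=True, resources_ok=False))` returns `("node2", "node1")`: a resource fault alone is enough to trigger the switch, which is the point that distinguishes this method from a heartbeat-only scheme.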
A dual-computer reinforcement device for a high-performance job scheduling management node, comprising:
A heartbeat monitoring module, configured to monitor the heartbeat information of the current main management node and to report heartbeat failure messages to the resource monitoring module;
A resource monitoring module, configured to monitor the operating system resources of the current main management node and to start management node switching when it receives a heartbeat failure message or determines that the operating system resources have failed.
The beneficial effect of the invention is that it achieves dual-computer reinforcement of the job scheduling management node while also monitoring operating system resources, which effectively makes up for the shortcomings of traditional methods.
Brief description of the drawings
Fig. 1 is a block diagram of the operating principle of the dual-computer reinforcement method for a high-performance job scheduling management node proposed by the present invention.
Fig. 2 is a flowchart of the dual-computer reinforcement method for a high-performance job scheduling management node proposed by the present invention.
Fig. 3 is a schematic block diagram of the dual-computer reinforcement device for a high-performance job scheduling management node proposed by the present invention.
Detailed description of the embodiments
Fig. 1 shows the operating principle of the method proposed by the present invention. The method runs on management node 1 (the main management node) and management node 2. The heartbeat monitoring module monitors the heartbeat information of the main management node in real time and, when it finds that the heartbeat of the main management node has failed, reports this to the resource monitoring module. The resource monitoring module monitors the operating system resources on the main management node in real time. When it finds that an operating system resource has failed, or when it receives a main management node heartbeat fault reported by the heartbeat monitoring module, it starts the management node switching process, which makes management node 2 the main management node.
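The reporting path described for Fig. 1 (heartbeat module reports to the resource module, and only the resource module starts the switch) can be sketched as two small Python classes. All class, method, and node names here are hypothetical and chosen for illustration; the sketch assumes the monitors are fed observations by a caller and does not model real network heartbeats.

```python
class ResourceMonitor:
    """Owns the switching decision, as the resource monitoring module does."""

    def __init__(self, main: str, standby: str) -> None:
        self.main = main
        self.standby = standby
        self.switch_log = []  # (old_main, new_main, reason) tuples

    def on_heartbeat_failure(self, node: str) -> None:
        # Called by the heartbeat module; only a failure of the main node matters.
        if node == self.main:
            self._switch("heartbeat failure")

    def check_resources(self, resources_ok: bool) -> None:
        # Called with the result of an operating system resource check.
        if not resources_ok:
            self._switch("operating system resource failure")

    def _switch(self, reason: str) -> None:
        self.switch_log.append((self.main, self.standby, reason))
        self.main, self.standby = self.standby, self.main


class HeartbeatMonitor:
    """Watches heartbeats and reports failures; it never switches by itself."""

    def __init__(self, resource_monitor: ResourceMonitor) -> None:
        self.resource_monitor = resource_monitor

    def observe(self, node: str, alive: bool) -> None:
        if not alive:
            self.resource_monitor.on_heartbeat_failure(node)
```

Keeping the switch inside one module means both failure sources converge on a single code path, so the two monitors cannot start conflicting switchovers.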
Fig. 2 shows the flowchart of the dual-computer reinforcement method for a high-performance job scheduling management node proposed by the present invention. The method comprises:
Step 1: The shared directory of the NFS server is mounted on the dual-computer job scheduling management nodes, and heartbeat monitoring (corosync) and resource monitoring (pacemaker) are started. The heartbeat monitoring and the resource monitoring each monitor management node 1 and management node 2, where management node 1 serves as the main management node and management node 2 as the standby node; together, management node 1 and management node 2 form the dual-computer job scheduling node pair. The user can configure the heartbeat monitoring and resource monitoring parameters in advance, for example the monitor timeout (timeout) and monitor interval (interval) of each configured resource, and the grouping and start order of resources. STONITH must also be configured, so that the availability of resources is guaranteed to the greatest possible extent.
Step 2: The heartbeat monitoring and the resource monitoring monitor the heartbeat information and the operating system resources of the current main management node, respectively.
Step 3: It is determined whether the heartbeat information or the operating system resources of the current main management node have failed, and if so, management node switching is started.
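Step 1 names two user-configurable monitor parameters, timeout and interval. The sketch below shows one plausible way such parameters drive a liveness check. It is an assumption-laden illustration: `MonitorParams`, `heartbeat_expired`, and `next_check_time` are hypothetical names, the units are assumed to be seconds, and the sketch does not reproduce how corosync or pacemaker interpret their own settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorParams:
    """Hypothetical container for the two parameters named in Step 1."""
    interval: float  # assumed: seconds between two consecutive checks
    timeout: float   # assumed: seconds of silence before a check counts as failed

    def __post_init__(self) -> None:
        if self.interval <= 0 or self.timeout <= 0:
            raise ValueError("interval and timeout must be positive")


def heartbeat_expired(last_seen: float, now: float, params: MonitorParams) -> bool:
    """True when no heartbeat has arrived within the timeout window."""
    return (now - last_seen) > params.timeout


def next_check_time(last_check: float, params: MonitorParams) -> float:
    """Time at which the next monitoring pass is due."""
    return last_check + params.interval
```

With `MonitorParams(interval=10.0, timeout=30.0)`, a heartbeat last seen at t=100 is still considered valid at t=120 and expired at t=131. A shorter timeout detects faults sooner but raises the risk of switching on a transient delay, which is why these values are left to the user.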
Fig. 3 shows the dual-computer reinforcement device for a high-performance job scheduling management node proposed by the present invention. The device comprises:
A heartbeat monitoring module, configured to monitor the heartbeat information of the current main management node and to report heartbeat failure messages to the resource monitoring module;
A resource monitoring module, configured to monitor the operating system resources of the current main management node and to start management node switching when it receives a heartbeat failure message or determines that the operating system resources have failed.
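The patent does not say which operating system resources the resource monitoring module inspects. As a purely illustrative sketch, the Python below treats the mounted shared directory as one such resource and checks that its path exists and has free space. The function names, the choice of resource, and the 5% threshold are all assumptions made for this example.

```python
import os
import shutil


def disk_free_fraction(path: str) -> float:
    """Fraction of free space on the file system that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report zero size; treat them as having no space.
        return 0.0
    return usage.free / usage.total


def resources_healthy(path: str, min_free_fraction: float = 0.05) -> bool:
    """Assumed health rule: the path must exist and have enough free space."""
    if not os.path.exists(path):
        return False
    return disk_free_fraction(path) >= min_free_fraction
```

A caller would pass the mount point of the shared directory, for example a hypothetical `/mnt/shared`, and feed the boolean result to whatever component decides on switching.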
Of course, the present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the scope of protection of the claims of the present invention.
Claims (3)
1. A dual-computer reinforcement method for a high-performance job scheduling management node, characterized by comprising:
Step 1: mounting the shared directory of an NFS server on the dual-computer job scheduling management nodes, and starting heartbeat monitoring and resource monitoring;
Step 2: the heartbeat monitoring and the resource monitoring monitor the heartbeat information and the operating system resources of the current main management node, respectively;
Step 3: determining whether the heartbeat information or the operating system resources of the current main management node have failed, and if so, starting management node switching.
2. The method of claim 1, characterized in that:
the user configures the heartbeat monitoring and resource monitoring parameters in advance, the parameters comprising a monitor timeout (timeout) and a monitor interval (interval).
3. A dual-computer reinforcement device for a high-performance job scheduling management node, characterized by comprising:
A heartbeat monitoring module, configured to monitor the heartbeat information of the current main management node and to report heartbeat failure messages to the resource monitoring module;
A resource monitoring module, configured to monitor the operating system resources of the current main management node and to start management node switching when it receives a heartbeat failure message or determines that the operating system resources have failed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410007013.1A CN103713974B (en) | 2014-01-07 | 2014-01-07 | High-performance job scheduling management node dual-computer reinforcement method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410007013.1A CN103713974B (en) | 2014-01-07 | 2014-01-07 | High-performance job scheduling management node dual-computer reinforcement method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103713974A true CN103713974A (en) | 2014-04-09 |
CN103713974B CN103713974B (en) | 2016-02-17 |
Family
ID=50406975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410007013.1A Active CN103713974B (en) | High-performance job scheduling management node dual-computer reinforcement method and device | 2014-01-07 | 2014-01-07 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103713974B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942128A (en) * | 2014-04-29 | 2014-07-23 | 浪潮电子信息产业股份有限公司 | Double-computer reinforcing method for high-performance job scheduling management node |
CN104123183A (en) * | 2014-07-28 | 2014-10-29 | 浪潮(北京)电子信息产业有限公司 | Cluster assignment dispatching method and device |
CN105141456A (en) * | 2015-08-25 | 2015-12-09 | 山东超越数控电子有限公司 | Method for monitoring high-availability cluster resource |
CN105260377A (en) * | 2015-09-01 | 2016-01-20 | 浪潮(北京)电子信息产业有限公司 | Updating method and system based on hierarchical storage |
CN105743995A (en) * | 2016-04-05 | 2016-07-06 | 北京轻元科技有限公司 | Transplantable high-available container cluster deploying and managing system and method |
CN106708881A (en) * | 2015-11-17 | 2017-05-24 | 华为技术有限公司 | Interaction method and device based on network file system |
CN107819619A (en) * | 2017-11-02 | 2018-03-20 | 郑州云海信息技术有限公司 | A kind of continual method of access for realizing NFS |
CN109062184A (en) * | 2018-08-10 | 2018-12-21 | 中国船舶重工集团公司第七〇九研究所 | Two-shipper emergency and rescue equipment, failure switching method and rescue system |
CN109542471A (en) * | 2018-11-28 | 2019-03-29 | 郑州云海信息技术有限公司 | A kind of installation method and device of calculate node |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101179432A (en) * | 2007-12-13 | 2008-05-14 | 浪潮电子信息产业股份有限公司 | Method of implementing high availability of system in multi-machine surroundings |
US20090193071A1 (en) * | 2008-01-30 | 2009-07-30 | At&T Knowledge Ventures, L.P. | Facilitating Deployment of New Application Services in a Next Generation Network |
CN103227838A (en) * | 2013-05-10 | 2013-07-31 | 中国工商银行股份有限公司 | Multi-load equalization processing device and method |
CN103279386A (en) * | 2013-06-09 | 2013-09-04 | 浪潮电子信息产业股份有限公司 | Method for achieving high availability of computer operation scheduling system |
CN103297543A (en) * | 2013-06-24 | 2013-09-11 | 浪潮电子信息产业股份有限公司 | Job scheduling method based on computer cluster |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942128A (en) * | 2014-04-29 | 2014-07-23 | 浪潮电子信息产业股份有限公司 | Double-computer reinforcing method for high-performance job scheduling management node |
CN104123183B (en) * | 2014-07-28 | 2017-11-14 | 浪潮(北京)电子信息产业有限公司 | Cluster job scheduling method and apparatus |
CN104123183A (en) * | 2014-07-28 | 2014-10-29 | 浪潮(北京)电子信息产业有限公司 | Cluster assignment dispatching method and device |
CN105141456A (en) * | 2015-08-25 | 2015-12-09 | 山东超越数控电子有限公司 | Method for monitoring high-availability cluster resource |
CN105260377A (en) * | 2015-09-01 | 2016-01-20 | 浪潮(北京)电子信息产业有限公司 | Updating method and system based on hierarchical storage |
CN105260377B (en) * | 2015-09-01 | 2019-02-12 | 浪潮(北京)电子信息产业有限公司 | A kind of upgrade method and system based on classification storage |
CN106708881A (en) * | 2015-11-17 | 2017-05-24 | 华为技术有限公司 | Interaction method and device based on network file system |
CN106708881B (en) * | 2015-11-17 | 2020-08-25 | 华为技术有限公司 | Interaction method and device based on network file system |
CN105743995A (en) * | 2016-04-05 | 2016-07-06 | 北京轻元科技有限公司 | Transplantable high-available container cluster deploying and managing system and method |
CN105743995B (en) * | 2016-04-05 | 2019-10-18 | 北京轻元科技有限公司 | A kind of system and method for the deployment of portable High Availabitity and management container cluster |
CN107819619A (en) * | 2017-11-02 | 2018-03-20 | 郑州云海信息技术有限公司 | A kind of continual method of access for realizing NFS |
CN109062184A (en) * | 2018-08-10 | 2018-12-21 | 中国船舶重工集团公司第七〇九研究所 | Two-shipper emergency and rescue equipment, failure switching method and rescue system |
CN109062184B (en) * | 2018-08-10 | 2021-05-14 | 中国船舶重工集团公司第七一九研究所 | Double-machine emergency rescue equipment, fault switching method and rescue system |
CN109542471A (en) * | 2018-11-28 | 2019-03-29 | 郑州云海信息技术有限公司 | A kind of installation method and device of calculate node |
Also Published As
Publication number | Publication date |
---|---|
CN103713974B (en) | 2016-02-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103713974B (en) | High-performance job scheduling management node dual-computer reinforcement method and device | |
TWI603266B (en) | Resource adjustment methods and systems for virtual machines | |
CN102984012B (en) | Management method and system for service resources | |
CN106406905B (en) | Configuration method and system for SETUP option of BIOS of server | |
WO2016058318A1 (en) | Elastic virtual machine (vm) resource scaling method, apparatus and system | |
CN112948063B (en) | Cloud platform creation method and device, cloud platform and cloud platform implementation system | |
CN102394774A (en) | Service state monitoring and failure recovery method for controllers of cloud computing operating system | |
CN105159769A (en) | Distributed job scheduling method suitable for heterogeneous computational capability cluster | |
CN103942128A (en) | Double-computer reinforcing method for high-performance job scheduling management node | |
KR20200078328A (en) | Systems and methods of monitoring software application processes | |
CN105812169A (en) | Host and standby machine switching method and device | |
CN109842526B (en) | Disaster recovery method and device | |
CN104660694A (en) | Method and apparatus for calling service | |
CN102025776A (en) | Disaster tolerant control method, device and system | |
CN101262479B (en) | A network file share method, server and network file share system | |
CN103312541A (en) | Management method of high-availability mutual backup cluster | |
CN103152420B (en) | A kind of method avoiding single-point-of-failofe ofe Ovirt virtual management platform | |
CN107579850B (en) | Wired and wireless hybrid networking method based on SDN control for cloud data center | |
CN105141691A (en) | System and method for automatically expanding virtual machine cluster under cloud computing | |
CN107529180B (en) | Base station cloud test environment construction device and method | |
CN109995554A (en) | The control method and cloud dispatch control device of multi-stage data center active-standby switch | |
CN108154343B (en) | Emergency processing method and system for enterprise-level information system | |
CN107005434A (en) | A kind of method, device and the equipment of synchronous virtual network function VNF states | |
CN105302276A (en) | Design method for limiting power consumption of SmartRack whole cabinet | |
CN109117320A (en) | Power distribution automation main station failure disaster tolerance processing system and method based on cloud platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |