CN105812159B - A cloud platform monitoring and alarm method - Google Patents

A cloud platform monitoring and alarm method

Publication number
CN105812159B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410841470.0A
Other languages
Chinese (zh)
Other versions
CN105812159A (en)
Inventor
刘冬
喻之斌
贝振东
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201410841470.0A
Publication of CN105812159A
Application granted
Publication of CN105812159B
Anticipated expiration

Abstract

The present invention relates to a cloud platform monitoring and alarm method, comprising: for each node, broadcasting the running status of the node, receiving the running status broadcast by other nodes, and periodically updating a locally maintained list of the monitoring and alarm service running status of all nodes; determining a master node according to the monitoring and alarm service running status list; the node determined to be the master node performing the duties of the master node; and maintaining a monitoring and alarm task message queue through a message queue protocol. The invention is simple and reliable to implement and offers good scalability, high availability, and fault tolerance.

Description

A cloud platform monitoring and alarm method
Technical field
The present invention relates to a cloud platform monitoring and alarm method.
Background art
The maturation of virtualization technology and its combination with grid computing gave rise to the cloud computing platform. A cloud computing platform combines massive infrastructure, data storage, various platforms, and software into a shared, cooperating resource pool, and on that basis abstracts layered services, providing users with services such as infrastructure (IaaS), platform (PaaS), and software (SaaS) on a pay-per-use basis.
Monitoring is an important component of a cloud computing platform. It is a prerequisite for many operations in the platform, such as network analysis, system management, job scheduling, load balancing, event prediction, and fault detection and recovery. It helps the platform quantify resource usage dynamically, detect service defects, discover user usage patterns, and support the decisions of the resource scheduling module, and so plays an important role in improving the platform's quality of service. A cloud computing platform includes not only the underlying storage, network, and computing resources, but also the virtual resources built on them and the cloud platform that results from abstracting and integrating these resources. A cloud platform carries large numbers of heterogeneous, dynamic, and complex resources in a distributed environment; monitoring and managing them efficiently and dynamically is what guarantees high-quality service.
Well-known cloud computing platforms in the industry currently have their own monitoring solutions for monitoring and alarming on the platform. However, these solutions often use a single-node model, in which one node is responsible for monitoring the entire cloud computing platform. As the cloud platform keeps expanding, the monitoring load grows, and scalability and fault tolerance are poor, making it difficult to guarantee that the platform's monitoring and alarm tasks are executed efficiently. A monitoring and alarm system for a cloud platform should therefore not only carry out monitoring tasks efficiently but also offer good scalability, high availability, and fault tolerance.
Summary of the invention
In view of this, it is necessary to provide a cloud platform monitoring and alarm method.
The present invention provides a cloud platform monitoring and alarm method, characterized in that the method comprises the following steps: a. for each node: broadcasting the running status of the node, receiving the running status broadcast by other nodes, and periodically updating a locally maintained list of the monitoring and alarm service running status of all nodes; b. determining a master node according to the monitoring and alarm service running status list; c. the node determined to be the master node performing the duties of the master node; d. maintaining a monitoring and alarm task message queue through a message queue protocol.
The method further comprises step e: after the master node fails, re-determining a master node and redistributing the tasks.
The monitoring and alarm service running status list includes the running status information and time information of all nodes.
Step b specifically comprises: judging, according to the running status information and time information in the monitoring and alarm service running status list, whether this node is the earliest-started node among the nodes whose current running status is normal; and if so, sending a broadcast message notifying all nodes that this node acts as the master node.
Step c specifically comprises: after several new monitoring and alarm tasks are created, the master node distributing the new tasks evenly among all nodes; and when a new slave node joins or a node is removed, the master node redistributing all monitoring and alarm tasks among the nodes.
After several new monitoring and alarm tasks are created, the master node distributing them evenly among all nodes specifically comprises: after the new tasks are created, they are submitted to the monitoring and alarm message queue, and the master node obtains the new task information from the message queue; the master node assigns the monitoring and alarm tasks to the nodes using a suitable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
When a new slave node joins, the master node redistributing all monitoring and alarm tasks among the nodes specifically comprises: the master node judging, according to the running status and report time of each node in the running status list of all nodes, whether a node has newly joined; if a node has newly joined, the master node accessing the message queue that maintains the monitoring and alarm tasks and obtaining the new monitoring and alarm tasks; and the master node assigning tasks to the newly joined node using a suitable algorithm, achieving load balancing.
When a node is removed, the master node redistributing all monitoring and alarm tasks among the nodes specifically comprises: the master node judging, according to the running status and report time of each node in the running status list of all nodes, whether a node has been removed; if a node has been removed, the master node accessing the message queue that maintains the monitoring and alarm tasks and obtaining the tasks for which that node was responsible; and the master node assigning those tasks to all normally running nodes using a suitable algorithm, achieving load balancing.
Step d specifically comprises: all nodes can access the monitoring and alarm task message queue and obtain monitoring and alarm tasks from it; newly created monitoring and alarm tasks are inserted at the tail of the queue; and monitoring and alarm tasks that have been executed are deleted from the queue.
Step e specifically comprises: if the period during which the other nodes receive no status update from the current master node exceeds a specified value, judging that the current master node has a problem; all nodes checking their locally maintained monitoring and alarm service running status lists and selecting the currently earliest-started node as the new master node; and the new master node reading the monitoring and alarm task message queue, obtaining all current monitoring and alarm tasks, and redistributing them sensibly among all nodes for execution.
The cloud platform monitoring and alarm method of the present invention uses a distributed architecture, is simple and reliable to implement, can efficiently complete large-scale cloud platform monitoring and alarm tasks, and offers good scalability, high availability, and fault tolerance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the operating environment of the cloud platform monitoring and alarm method of the present invention;
Fig. 2 is a flowchart of the cloud platform monitoring and alarm method of the present invention;
Fig. 3 is an operation flowchart of the preferred embodiment of step S3 of the present invention when several new monitoring and alarm tasks are created;
Fig. 4 is an operation flowchart of the preferred embodiment of step S3 of the present invention when a new slave node joins;
Fig. 5 is an operation flowchart of the preferred embodiment of step S3 of the present invention when a node is removed.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Refer to Fig. 1, which is a schematic diagram of the operating environment of the cloud platform monitoring and alarm method of the present invention.
The operating environment of the cloud platform monitoring and alarm method uses a master-slave architecture comprising a master node and several slave nodes. A monitoring and alarm service process runs on each node, and together the nodes complete the monitoring and alarm tasks of the cloud platform. The master node and the slave nodes, and the slave nodes among themselves, communicate through the Remote Procedure Call Protocol (RPC) and the Advanced Message Queuing Protocol (AMQP).
Refer to Fig. 2, which is an operation flowchart of a preferred embodiment of the cloud platform monitoring and alarm method of the present invention.
Step S1: the monitoring and alarm service of each node periodically broadcasts the running status of the node through the remote procedure call protocol, informing the other nodes whether the node is running normally. At the same time, each node receives the running status broadcast by the other nodes and periodically updates its locally maintained list of the monitoring and alarm service running status of all nodes. Specifically:
The monitoring and alarm service of each node periodically broadcasts the running status of the node to all nodes through the remote procedure call protocol. The broadcast running status includes the current time information, so as to inform the other nodes whether the node is running normally at that moment.
The monitoring and alarm service of each node receives the running status broadcast by the other nodes through the remote procedure call protocol and periodically updates its locally maintained list of the monitoring and alarm service running status of all nodes. The list records the current running status information and time information of all nodes.
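The status list described in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `NodeStatus`, `StatusList`, and `on_broadcast` are hypothetical, the RPC transport is left out, and the assumption that each broadcast carries the node's start time (used later to pick the earliest-started node) is ours.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeStatus:
    node_id: str
    running: bool        # whether the monitoring and alarm service reports itself healthy
    start_time: float    # assumed field: when the service on that node started
    report_time: float   # timestamp carried by the broadcast


class StatusList:
    """Locally maintained list holding the latest status reported by every node."""

    def __init__(self) -> None:
        self.entries: Dict[str, NodeStatus] = {}

    def on_broadcast(self, status: NodeStatus) -> None:
        # Keep only the newest report per node; ignore stale or reordered messages.
        current = self.entries.get(status.node_id)
        if current is None or status.report_time >= current.report_time:
            self.entries[status.node_id] = status


table = StatusList()
table.on_broadcast(NodeStatus("node-a", True, start_time=100.0, report_time=110.0))
table.on_broadcast(NodeStatus("node-b", True, start_time=105.0, report_time=111.0))
# An older report for node-a arrives late and is ignored.
table.on_broadcast(NodeStatus("node-a", True, start_time=100.0, report_time=109.0))
```

Every node would run the same update logic on the broadcasts it receives, so each node ends up with its own copy of the list.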
Step S2: the monitoring and alarm service of each node periodically checks the node running status list it maintains and judges whether this node is the currently earliest-started node. If it is, the node sends a broadcast message notifying all nodes that it acts as the master node. Specifically:
The monitoring and alarm service of each node periodically checks its locally maintained node running status list and, according to the running status information and time information in the list, judges whether this node is the earliest-started node among the nodes whose current running status is normal;
If this node is the earliest-started node among the nodes whose current running status is normal, it sends a broadcast message notifying all nodes that this node acts as the master node.
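The election rule in step S2 can be sketched as a pure function over the local status list. The tuple layout, the `timeout` parameter used to decide whether a node still counts as normal, and the tie-break on node ID are all assumptions made for this illustration; the text only specifies "earliest-started among normally running nodes".

```python
from typing import Dict, Optional, Tuple

# node_id -> (running, start_time, last_report_time); this layout is an assumption
StatusList = Dict[str, Tuple[bool, float, float]]


def elect_master(status: StatusList, now: float, timeout: float) -> Optional[str]:
    """Return the earliest-started node among those currently considered healthy."""
    healthy = [
        (start, node_id)
        for node_id, (running, start, last_report) in status.items()
        if running and now - last_report <= timeout
    ]
    if not healthy:
        return None
    # Ties on start time are broken by node ID so every node picks the same master.
    return min(healthy)[1]


def should_announce(self_id: str, status: StatusList, now: float, timeout: float) -> bool:
    """A node broadcasts that it is the master only if it elects itself."""
    return elect_master(status, now, timeout) == self_id


status = {
    "node-a": (True, 100.0, 200.0),
    "node-b": (True, 90.0, 150.0),   # started earliest, but its last report is too old
    "node-c": (True, 95.0, 199.0),
}
master = elect_master(status, now=201.0, timeout=10.0)
```

Because every node evaluates the same deterministic rule over (approximately) the same list, no extra voting round is needed in this sketch; nodes with diverging lists could briefly disagree, which the text does not address.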
Step S3: after a node is determined to be the master node, it must perform the duties of the master node. The main task of the master node is to distribute the monitoring and alarm tasks sensibly among the slave nodes. After several new monitoring and alarm tasks are created, the master node distributes them evenly among all nodes. When a new slave node joins or a node is removed, the master node redistributes all monitoring and alarm tasks among the nodes so that the task load is balanced.
Step S4: a monitoring and alarm task message queue is maintained through the message queue protocol; the queue contains all current monitoring and alarm tasks. Specifically:
The monitoring and alarm task message queue holds all monitoring and alarm tasks currently being executed. All nodes can access this message queue and obtain monitoring and alarm tasks from it;
When a new monitoring and alarm task is created, it is inserted at the tail of the message queue;
When a monitoring and alarm task has been executed, it is deleted from the message queue.
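The three queue operations of step S4 can be illustrated with an in-memory stand-in. In the described system the queue lives behind an AMQP broker; the `TaskQueue` class below is a hypothetical local model of its behaviour only and does not use any AMQP client library.

```python
from collections import deque
from typing import Deque, List


class TaskQueue:
    """In-memory stand-in for the shared monitoring and alarm task message queue."""

    def __init__(self) -> None:
        self._tasks: Deque[str] = deque()

    def add(self, task_id: str) -> None:
        # Newly created tasks are inserted at the tail of the queue.
        self._tasks.append(task_id)

    def remove(self, task_id: str) -> None:
        # Executed tasks are deleted; raises ValueError if the task is unknown.
        self._tasks.remove(task_id)

    def snapshot(self) -> List[str]:
        # Any node may read the current tasks, in insertion order.
        return list(self._tasks)


queue = TaskQueue()
for task_id in ("t1", "t2", "t3"):
    queue.add(task_id)
queue.remove("t2")
remaining = queue.snapshot()
```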
Step S5: after the master node fails, it can no longer broadcast the running status of its local monitoring and alarm service. When the other nodes check their locally maintained running status lists of all nodes, they select the currently earliest-started node as the master node, which obtains all tasks from the monitoring and alarm message queue and redistributes them among all nodes. Specifically:
When the current master node fails for some reason, it can no longer broadcast the running status of its local monitoring and alarm service, so the other nodes receive no status update from the current master node. Once this period exceeds a specified value, it can be judged that the current master node has a problem;
All nodes check their locally maintained monitoring and alarm service running status lists and select the currently earliest-started node as the new master node;
The new master node reads the monitoring and alarm task message queue, obtains all current monitoring and alarm tasks, and redistributes them sensibly among all nodes for execution.
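The failover sequence of step S5 (detect the silent master, pick the earliest-started surviving node, redistribute every task) can be sketched as below. The function names, the status layout, and the round-robin split are assumptions chosen for illustration; the text only asks for a "reasonable" redistribution.

```python
from typing import Dict, List, Optional, Tuple

# node_id -> (start_time, last_report_time); this layout is an assumption
StatusList = Dict[str, Tuple[float, float]]


def master_failed(status: StatusList, master_id: str, now: float, threshold: float) -> bool:
    """True if no status update from the master arrived within the threshold."""
    entry = status.get(master_id)
    return entry is None or now - entry[1] > threshold


def elect_new_master(status: StatusList, now: float, threshold: float) -> Optional[str]:
    """Earliest-started node among those that reported recently enough."""
    alive = [(start, node) for node, (start, last) in status.items() if now - last <= threshold]
    return min(alive)[1] if alive else None


def redistribute(tasks: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Round-robin split of every task in the queue across the given nodes."""
    if not nodes:
        raise ValueError("at least one node is required")
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for index, task_id in enumerate(tasks):
        plan[nodes[index % len(nodes)]].append(task_id)
    return plan


status = {"node-m": (10.0, 100.0), "node-a": (20.0, 128.0), "node-b": (15.0, 129.0)}
now, threshold = 130.0, 10.0

if master_failed(status, "node-m", now, threshold):
    new_master = elect_new_master(status, now, threshold)
    alive_nodes = sorted(n for n, (_, last) in status.items() if now - last <= threshold)
    plan = redistribute(["t1", "t2", "t3"], alive_nodes)
```

Here `node-m` has been silent for 30 time units against a threshold of 10, so `node-b`, the earliest-started of the two survivors, takes over.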
Refer to Fig. 3, which is an operation flowchart of the preferred embodiment of step S3 of the cloud platform monitoring and alarm method of the present invention in which, after several new monitoring and alarm tasks are created, the master node distributes the new tasks to the slave nodes.
Step S311: the master node periodically monitors the monitoring and alarm task message queue.
Step S312: after several new monitoring and alarm tasks are created, they are submitted to the monitoring and alarm message queue, and the master node obtains the new task information from the message queue.
Step S313: the master node assigns the monitoring and alarm tasks to the nodes using a suitable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
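The text does not name the "reasonable algorithm" used in step S313. One simple candidate is least-loaded assignment, sketched below with a heap; `assign_new_tasks` and its return value (the task IDs that would be sent to each node) are hypothetical names for this illustration.

```python
import heapq
from typing import Dict, List


def assign_new_tasks(assignments: Dict[str, List[str]], new_tasks: List[str]) -> Dict[str, List[str]]:
    """Give each new task to the node that currently holds the fewest tasks.

    Mutates `assignments` and returns the task IDs to send to each node.
    """
    if not assignments:
        raise ValueError("no nodes available")
    heap = [(len(tasks), node) for node, tasks in assignments.items()]
    heapq.heapify(heap)
    sent: Dict[str, List[str]] = {node: [] for node in assignments}
    for task_id in new_tasks:
        load, node = heapq.heappop(heap)          # least-loaded node, ties by node ID
        assignments[node].append(task_id)
        sent[node].append(task_id)
        heapq.heappush(heap, (load + 1, node))
    return sent


assignments = {"node-a": ["t1", "t2"], "node-b": [], "node-c": ["t3"]}
sent = assign_new_tasks(assignments, ["t4", "t5", "t6"])
```

With these inputs every node ends up holding two tasks, which is the balanced outcome the step aims for.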
Refer to Fig. 4, which is an operation flowchart of the preferred embodiment of the master node's handling, in step S3 of the cloud platform monitoring and alarm method of the present invention, of a new slave node joining.
Step S321: the master node periodically maintains and updates its local list of the running status of all nodes.
Step S322: the master node judges, according to the running status and report time of each node in the running status list of all nodes, whether a node has newly joined.
Step S323: if a node has newly joined, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the new monitoring and alarm tasks.
Step S324: the master node assigns tasks to the newly joined node using a suitable algorithm, achieving load balancing.
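One possible reading of steps S321 to S324 is sketched below: the master detects a newly joined node by comparing the nodes it already knows with those in the current status list, then moves tasks from the busiest nodes onto the newcomer until loads differ by at most one. Both helper names and the rebalancing rule are assumptions; the text leaves the algorithm open.

```python
from typing import Dict, List, Set


def find_new_nodes(known: Set[str], healthy_now: Set[str]) -> Set[str]:
    """Nodes present in the current status list that the master has not seen before."""
    return healthy_now - known


def rebalance_onto(assignments: Dict[str, List[str]], new_node: str) -> List[str]:
    """Move tasks from the busiest nodes to new_node until loads differ by at most one."""
    assignments.setdefault(new_node, [])
    moved: List[str] = []
    while True:
        busiest = max(assignments, key=lambda n: len(assignments[n]))
        if len(assignments[busiest]) - len(assignments[new_node]) <= 1:
            return moved
        task_id = assignments[busiest].pop()
        assignments[new_node].append(task_id)
        moved.append(task_id)


assignments = {"node-a": ["t1", "t2", "t3"], "node-b": ["t4", "t5"]}
joined = find_new_nodes({"node-a", "node-b"}, {"node-a", "node-b", "node-c"})
moved = rebalance_onto(assignments, "node-c")
```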
Refer to Fig. 5, which is an operation flowchart of the preferred embodiment of the master node's handling, in step S3 of the cloud platform monitoring and alarm method of the present invention, of a node being removed.
Step S331: the master node periodically maintains and updates its local list of the running status of all nodes.
Step S332: the master node judges, according to the running status and report time of each node in the running status list of all nodes, whether a node has been removed.
Step S333: if a node has been removed, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the tasks for which that node was responsible.
Step S334: the master node assigns those tasks to all normally running nodes using a suitable algorithm, achieving load balancing.
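Steps S333 and S334 can be sketched as reassigning the removed node's tasks to whichever remaining node is least loaded at that moment. The function name `reassign_orphans` and the least-loaded rule are assumptions for illustration, not the algorithm specified by the text.

```python
from typing import Dict, List


def reassign_orphans(assignments: Dict[str, List[str]], removed: str) -> Dict[str, List[str]]:
    """Hand the removed node's tasks to the least-loaded remaining nodes.

    Mutates `assignments` and returns the task IDs each remaining node received.
    """
    orphans = assignments.pop(removed, [])
    if orphans and not assignments:
        raise RuntimeError("no healthy node left to take over the tasks")
    received: Dict[str, List[str]] = {node: [] for node in assignments}
    for task_id in orphans:
        # Pick the node with the fewest tasks; ties are broken by node ID.
        target = min(assignments, key=lambda n: (len(assignments[n]), n))
        assignments[target].append(task_id)
        received[target].append(task_id)
    return received


assignments = {"node-a": ["t1"], "node-b": ["t2", "t3", "t4"], "node-c": ["t5", "t6"]}
received = reassign_orphans(assignments, "node-b")
```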
Although the present invention has been described with reference to the currently preferred embodiments, those skilled in the art should understand that the above preferred embodiments are only intended to illustrate the present invention and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (8)

1. A cloud platform monitoring and alarm method, characterized in that the method comprises the following steps:
a. for each node: broadcasting the running status of the node, receiving the running status broadcast by other nodes, and periodically updating a locally maintained list of the monitoring and alarm service running status of all nodes;
b. determining a master node according to the monitoring and alarm service running status list;
c. the node determined to be the master node performing the duties of the master node;
d. maintaining a monitoring and alarm task message queue through a message queue protocol;
wherein step b specifically comprises:
judging, according to the running status information and time information in the monitoring and alarm service running status list, whether this node is the earliest-started node among the nodes whose current running status is normal;
if this node is the earliest-started node among the nodes whose current running status is normal, sending a broadcast message notifying all nodes that this node acts as the master node;
and step c specifically comprises:
after several new monitoring and alarm tasks are created, the master node distributing the new monitoring and alarm tasks evenly among all nodes;
when a new slave node joins or a node is removed, the master node redistributing all monitoring and alarm tasks among the nodes.
2. The method according to claim 1, characterized in that the method further comprises step e:
after the master node fails, re-determining a master node and redistributing the tasks.
3. The method according to claim 1, characterized in that the monitoring and alarm service running status list includes the running status information and time information of all nodes.
4. The method according to claim 1, characterized in that, after several new monitoring and alarm tasks are created, the master node distributing the new monitoring and alarm tasks evenly among all nodes specifically comprises:
after several new monitoring and alarm tasks are created, they are submitted to the monitoring and alarm message queue, and the master node obtains the new task information from the message queue;
the master node assigns the monitoring and alarm tasks to the nodes using a suitable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
5. The method according to claim 1, characterized in that, when a new slave node joins, the master node redistributing all monitoring and alarm tasks among the nodes specifically comprises:
the master node judging, according to the running status and report time of each node in the running status list of all nodes, whether a node has newly joined;
if a node has newly joined, the master node accessing the message queue that maintains the monitoring and alarm tasks and obtaining the new monitoring and alarm tasks;
the master node assigning tasks to the newly joined node using a suitable algorithm, achieving load balancing.
6. The method according to claim 1, characterized in that, when a node is removed, the master node redistributing all monitoring and alarm tasks among the nodes specifically comprises:
the master node judging, according to the running status and report time of each node in the running status list of all nodes, whether a node has been removed;
if a node has been removed, the master node accessing the message queue that maintains the monitoring and alarm tasks and obtaining the monitoring and alarm tasks for which that node was responsible;
the master node assigning those tasks to all normally running nodes using a suitable algorithm, achieving load balancing.
7. The method according to claim 1, characterized in that step d specifically comprises:
all nodes can access the monitoring and alarm task message queue and obtain monitoring and alarm tasks from it;
newly created monitoring and alarm tasks are inserted at the tail of the monitoring and alarm task message queue;
monitoring and alarm tasks that have been executed are deleted from the monitoring and alarm task message queue.
8. The method according to claim 2, characterized in that step e specifically comprises:
if the period during which the other nodes receive no status update from the current master node exceeds a specified value, judging that the current master node has a problem;
all nodes checking their locally maintained monitoring and alarm service running status lists and selecting the currently earliest-started node as the new master node;
the new master node reading the monitoring and alarm task message queue, obtaining all current monitoring and alarm tasks, and redistributing them sensibly among all nodes for execution.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410841470.0A CN105812159B (en) 2014-12-30 2014-12-30 A kind of cloud platform monitoring alarm method

Publications (2)

Publication Number Publication Date
CN105812159A CN105812159A (en) 2016-07-27
CN105812159B true CN105812159B (en) 2019-06-04

Family

ID=56980157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410841470.0A Active CN105812159B (en) 2014-12-30 2014-12-30 A kind of cloud platform monitoring alarm method

Country Status (1)

Country Link
CN (1) CN105812159B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107395458B (en) * 2017-07-31 2020-05-22 东软集团股份有限公司 System monitoring method and device
CN107608285B (en) * 2017-09-01 2019-10-08 北京南凯自动化系统工程有限公司 A kind of comprehensive monitoring system
CN109144737A (en) * 2018-10-09 2019-01-04 郑州云海信息技术有限公司 Controller management method, apparatus and storage medium in a kind of distributed cluster system
CN112685199B (en) * 2020-12-30 2023-10-20 董小君 Message queue repairing method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1512729A (en) * 2002-12-31 2004-07-14 联想(北京)有限公司 Method for network equipment self adaption load equalization
CN101924650A (en) * 2010-08-04 2010-12-22 浙江省电力公司 Method for implementing services and intelligent server autonomy of failure information system
CN102882909A (en) * 2011-07-15 2013-01-16 易云捷讯科技(北京)有限公司 Cloud computing service monitoring system and method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589499B2 (en) * 2002-05-10 2013-11-19 Silicon Graphics International Corp. Real-time storage area network

Also Published As

Publication number Publication date
CN105812159A (en) 2016-07-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant