CN105812159B

CN105812159B - Cloud platform monitoring and alarm method

Info

Publication number: CN105812159B
Application number: CN201410841470.0A
Authority: CN
Inventors: 刘冬; 喻之斌; 贝振东; 须成忠
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2014-12-30
Filing date: 2014-12-30
Publication date: 2019-06-04
Anticipated expiration: 2034-12-30
Also published as: CN105812159A

Abstract

The present invention relates to a cloud platform monitoring and alarm method, comprising: for each node, broadcasting the running status of the node itself, receiving the running status broadcast by other nodes, and periodically updating the locally maintained running status list of the monitoring alarm services of all nodes; determining the master node according to the monitoring alarm service running status list; the node determined to be the master node performing the duties of the master node; and maintaining the monitoring alarm task message queue through a message queue protocol. The invention is simple and reliable to implement and offers good scalability, high availability, and fault tolerance.

Description

Cloud platform monitoring and alarm method

Technical field

The present invention relates to a cloud platform monitoring and alarm method.

Background art

The maturing of virtualization technology and its combination with grid computing gave rise to the cloud computing platform. A cloud computing platform combines huge infrastructure, data storage, various platforms, and software into a shared, cooperating resource pool, and on this basis abstracts layered services, providing users with infrastructure (IaaS), platform (PaaS), and software (SaaS) services on a pay-per-use basis.

Monitoring is an important component of a cloud computing platform. It is the prerequisite for many operations in the platform, such as network analysis, system management, job scheduling, load balancing, event prediction, and fault detection and recovery. It helps the platform dynamically quantify resource usage, detect service defects, discover user usage patterns, and support the decisions of the resource scheduling module, and so plays a significant role in improving the platform's quality of service. A cloud computing platform includes not only the underlying storage, network, and computing resources, but also the virtual resources built on them and the cloud platform formed by abstracting and integrating these resources. A cloud platform hosts large numbers of heterogeneous, dynamic, and complex resources in distributed environments, and monitoring and managing them efficiently and dynamically is what guarantees high-quality service.

The well-known cloud computing platforms in the industry each have their own monitoring solution for monitoring and alarming on the platform. However, these solutions usually adopt a single-node model, in which one node is responsible for monitoring the entire cloud computing platform. As the cloud platform keeps expanding, the monitoring load grows, and scalability and fault tolerance are poor, which makes it difficult to guarantee that the platform's monitoring alarm tasks are executed efficiently. A cloud platform monitoring and alarm system must therefore not only carry out monitoring tasks efficiently but also offer good scalability, high availability, and fault tolerance.

Summary of the invention

In view of this, it is necessary to provide a cloud platform monitoring and alarm method.

The present invention provides a cloud platform monitoring and alarm method, characterized in that the method comprises the following steps: a. for each node: broadcasting the running status of the node itself, receiving the running status broadcast by other nodes, and periodically updating the locally maintained running status list of the monitoring alarm services of all nodes; b. determining the master node according to the monitoring alarm service running status list; c. the node determined to be the master node performing the duties of the master node; d. maintaining the monitoring alarm task message queue through a message queue protocol.

The method further comprises step e: after the master node fails, redetermining the master node and redistributing the tasks.

The monitoring alarm service running status list comprises the running status information and time information of all nodes.

Step b specifically comprises: according to the running status information and time information in the monitoring alarm service running status list, judging whether this node is the earliest-started node among the nodes whose current running status is normal; if it is, sending a broadcast message notifying all nodes that this node serves as the master node.

Step c specifically comprises: after several new monitoring alarm tasks are created, the master node distributes the new monitoring alarm tasks evenly to all nodes; when a new slave node joins or a node is removed, the master node redistributes all monitoring alarm tasks to the nodes.

The even distribution of new monitoring alarm tasks to all nodes by the master node, after several new monitoring alarm tasks are created, specifically comprises: after several new monitoring alarm tasks are created, they are submitted to the monitoring alarm message queue, and the master node obtains the new task information from the message queue; the master node distributes the monitoring alarm tasks to the nodes using a reasonable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.

The redistribution of all monitoring alarm tasks to the nodes by the master node, when a new slave node joins, specifically comprises: the master node judges whether there is a newly added node according to the running status and report time of each node in the running status list of all nodes; if there is a newly added node, the master node accesses the message queue that maintains the monitoring alarm tasks and obtains the new monitoring alarm tasks; the master node assigns tasks to the newly added node using a reasonable algorithm, achieving load balancing.

The redistribution of all monitoring alarm tasks to the nodes by the master node, when a node is removed, specifically comprises: the master node judges whether a node has been removed according to the running status and report time of each node in the running status list of all nodes; if a node has been removed, the master node accesses the message queue that maintains the monitoring alarm tasks and obtains the monitoring alarm tasks that the node was responsible for; the master node assigns these tasks to all normally running nodes using a reasonable algorithm, achieving load balancing.

Step d specifically comprises: all nodes can access the monitoring alarm task message queue and obtain monitoring alarm tasks from it; a newly created monitoring alarm task is inserted at the tail of the monitoring alarm task message queue; a monitoring alarm task that has been executed is deleted from the monitoring alarm task message queue.

Step e specifically comprises: if the period during which the other nodes fail to receive status update information from the current master node exceeds a specified value, the running status of the current master node is judged to be faulty; all nodes check their locally maintained monitoring alarm service running status list and select the currently earliest-started node as the new master node; the new master node reads the monitoring alarm task message queue, obtains all current monitoring alarm tasks, and redistributes them reasonably to all nodes for execution.

The cloud platform monitoring and alarm method of the present invention adopts a distributed architecture and is simple and reliable to implement. It can efficiently complete the monitoring alarm tasks of a huge cloud platform, and it offers good scalability, high availability, and good fault tolerance.

Brief description of the drawings

Fig. 1 is a schematic diagram of the running environment of the cloud platform monitoring and alarm method of the present invention;

Fig. 2 is a flow chart of the cloud platform monitoring and alarm method of the present invention;

Fig. 3 is a flow chart of the preferred embodiment of step S3 of the present invention when several new monitoring alarm tasks are created;

Fig. 4 is a flow chart of the preferred embodiment of step S3 of the present invention when a new slave node joins;

Fig. 5 is a flow chart of the preferred embodiment of step S3 of the present invention when a node is removed.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic diagram of the running environment of the cloud platform monitoring and alarm method of the present invention.

The running environment of the cloud platform monitoring and alarm method adopts a master-slave architecture comprising a master node and several slave nodes. A monitoring alarm service process runs on each node, and together the nodes complete the monitoring alarm tasks of the cloud platform. The master node and the slave nodes, as well as the slave nodes among themselves, communicate through the remote procedure call protocol (Remote Procedure Call Protocol, RPC) and the message queue protocol (Advanced Message Queuing Protocol, AMQP).

Fig. 2 is a flow chart of the preferred embodiment of the cloud platform monitoring and alarm method of the present invention.

Step S1: the monitoring alarm service of each node periodically broadcasts the running status of the node through the remote procedure call protocol, informing the other nodes whether the node is running normally. At the same time, each node receives the running status broadcast by the other nodes and periodically updates its locally maintained running status list of the monitoring alarm services of all nodes. Specifically:

The monitoring alarm service of each node periodically broadcasts the running status of the node to all nodes through the remote procedure call protocol. The broadcast running status includes the current time information, so as to inform the other nodes whether the node is running normally at that moment.

The monitoring alarm service of each node receives the running status broadcast by the other nodes through the remote procedure call protocol and periodically updates its locally maintained running status list of the monitoring alarm services of all nodes. The list records the current running status information and time information of all nodes.
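
The locally maintained list of step S1 can be illustrated with a minimal Python sketch. The class names `NodeStatus` and `StatusList`, the field names, and the rule of ignoring reports older than the one already stored are assumptions made for illustration; the patent only states that each broadcast carries running status information and time information. The RPC transport is left out, and `on_broadcast` stands in for the handler that a received broadcast would invoke.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    node_id: str
    healthy: bool        # running status reported by the node
    start_time: float    # when the node's service started (assumed field)
    report_time: float   # timestamp carried by the broadcast


class StatusList:
    """Locally maintained list of the latest status reported by every node."""

    def __init__(self):
        self.entries = {}

    def on_broadcast(self, status):
        # Keep only the newest report for each node (assumed policy).
        known = self.entries.get(status.node_id)
        if known is None or status.report_time >= known.report_time:
            self.entries[status.node_id] = status


statuses = StatusList()
statuses.on_broadcast(NodeStatus("node-a", True, 100.0, 105.0))
statuses.on_broadcast(NodeStatus("node-b", True, 101.0, 106.0))
statuses.on_broadcast(NodeStatus("node-a", True, 100.0, 104.0))  # older report, ignored
```

Every node would hold one such list and feed it with both its own status and the statuses it receives, so that all nodes converge on the same view.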

Step S2: the monitoring alarm service of each node periodically checks the node running status list it maintains and judges whether this node is the currently earliest-started node. If it is, the node sends a broadcast message notifying all nodes that it serves as the master node. Specifically:

The monitoring alarm service of each node periodically checks its locally maintained node running status list and, according to the running status information and time information in the list, judges whether this node is the earliest-started node among the nodes whose current running status is normal;

If this node is the earliest-started node among the nodes whose current running status is normal, it sends a broadcast message notifying all nodes that this node serves as the master node.
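
The election rule of step S2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the status list is modeled as a plain dictionary mapping a node ID to a `(healthy, start_time)` pair, the function name `elect_master` is hypothetical, and breaking ties on equal start times by node ID is an added assumption that the patent does not specify.

```python
def elect_master(status_list):
    """Return the ID of the earliest-started healthy node, or None if there is none.

    status_list maps node_id -> (healthy, start_time); this layout is assumed.
    """
    candidates = [
        (start_time, node_id)
        for node_id, (healthy, start_time) in status_list.items()
        if healthy
    ]
    if not candidates:
        return None
    # min() compares start_time first, then node_id as a tie-breaker (assumption).
    return min(candidates)[1]


status_list = {
    "node-a": (False, 100.0),  # started first but not running normally
    "node-b": (True, 101.0),
    "node-c": (True, 102.0),
}
local_id = "node-b"
i_am_master = elect_master(status_list) == local_id
```

Because every node applies the same rule to its own copy of the list, only the node for which `i_am_master` is true needs to send the broadcast announcing itself as master.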

Step S3: after a node is determined to be the master node, it must perform the duties of the master node. The main task of the master node is to distribute monitoring alarm tasks reasonably to the slave nodes. After several new monitoring alarm tasks are created, the master node distributes the new monitoring alarm tasks evenly to all nodes. When a new slave node joins or a node is removed, the master node redistributes all monitoring alarm tasks to the nodes so that the task load is balanced.

Step S4: a monitoring alarm task message queue is maintained through the message queue protocol, and the message queue contains all current monitoring alarm tasks. Specifically:

The monitoring alarm task message queue maintains all monitoring alarm tasks that are currently being executed. All nodes can access this message queue and obtain monitoring alarm tasks from it;

When a new monitoring alarm task is created, it is inserted at the tail of the message queue;

When a monitoring alarm task has been executed, it is deleted from the message queue.
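
The queue behavior of step S4 can be illustrated with an in-process stand-in. The sketch below uses Python's `collections.deque` rather than a real AMQP broker, so it shows only the ordering and deletion rules described above; the class name `TaskQueue` and its method names are hypothetical.

```python
from collections import deque


class TaskQueue:
    """In-memory stand-in for the monitoring alarm task message queue."""

    def __init__(self):
        self._tasks = deque()

    def create(self, task_id):
        # A newly created task is inserted at the tail of the queue.
        self._tasks.append(task_id)

    def snapshot(self):
        # Any node may read the current tasks, in creation order.
        return list(self._tasks)

    def mark_executed(self, task_id):
        # An executed task is deleted; deque.remove raises ValueError if absent.
        self._tasks.remove(task_id)


queue = TaskQueue()
for task_id in ("t1", "t2", "t3"):
    queue.create(task_id)
queue.mark_executed("t2")
```

In a deployment the queue would live in a message broker reachable by all nodes, which is what lets a newly elected master recover the full task set after a failure.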

Step S5: after the master node fails, it can no longer broadcast the running status of its local monitoring alarm service. When the other nodes check their locally maintained running status list of all nodes, they select the currently earliest-started node as the master node, which obtains all tasks from the monitoring alarm message queue and redistributes them to all nodes. Specifically:

When the current master node fails for some reason, it can no longer broadcast the running status of its local monitoring alarm service, so the other nodes cannot receive status update information from the current master node. Once this period exceeds a specified value, the running status of the current master node is judged to be faulty;

All nodes check their locally maintained monitoring alarm service running status list and select the currently earliest-started node as the new master node;

The new master node reads the monitoring alarm task message queue, obtains all current monitoring alarm tasks, and redistributes them reasonably to all nodes for execution.
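
The failure detection and re-election of step S5 can be sketched as follows. The function names, the dictionary layout `node_id -> (start_time, last_report_time)`, the tie-break by node ID, and the choice to apply the same timeout to every node when picking the successor are illustrative assumptions; the patent only specifies that a silence longer than a specified value marks the master as faulty and that the earliest-started node takes over.

```python
def master_has_failed(last_report_time, now, timeout):
    """The master is judged faulty once its silence exceeds the specified value."""
    return now - last_report_time > timeout


def fail_over(status_list, master_id, now, timeout):
    """Return the master to use: the current one, or the earliest-started live node.

    status_list maps node_id -> (start_time, last_report_time); layout is assumed.
    """
    if not master_has_failed(status_list[master_id][1], now, timeout):
        return master_id
    alive = {
        node_id: status
        for node_id, status in status_list.items()
        if now - status[1] <= timeout
    }
    if not alive:
        return None
    # Earliest start time wins; node_id breaks ties (assumption).
    return min(alive, key=lambda node_id: (alive[node_id][0], node_id))


status_list = {
    "node-a": (100.0, 200.0),  # current master, silent since t=200
    "node-b": (101.0, 229.0),
    "node-c": (102.0, 230.0),
}
new_master = fail_over(status_list, "node-a", now=231.0, timeout=10.0)
```

After taking over, the new master would read the task queue and redistribute every task, as described in the last paragraph of step S5.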

Fig. 3 is a flow chart of the preferred embodiment of step S3 of the cloud platform monitoring and alarm method of the present invention in which, after several new monitoring alarm tasks are created, the master node distributes the new monitoring alarm tasks to the slave nodes.

Step S311: the master node periodically monitors the message queue of monitoring alarm tasks.

Step S312: after several new monitoring alarm tasks are created, they are submitted to the monitoring alarm message queue, and the master node obtains the new task information from the message queue.

Step S313: the master node distributes the monitoring alarm tasks to the nodes using a reasonable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
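
The patent does not name the "reasonable algorithm" of step S313. One simple candidate is round-robin assignment, sketched below as an assumption; the function name `distribute` and the returned dictionary of task IDs per node are hypothetical.

```python
def distribute(task_ids, node_ids):
    """Assign task IDs to nodes in round-robin order (an assumed algorithm)."""
    if not node_ids:
        raise ValueError("at least one node is required")
    ordered = sorted(node_ids)            # fixed order makes the result repeatable
    assignment = {node_id: [] for node_id in ordered}
    for index, task_id in enumerate(task_ids):
        assignment[ordered[index % len(ordered)]].append(task_id)
    return assignment


assignment = distribute(["t1", "t2", "t3", "t4", "t5"], ["node-a", "node-b"])
```

With round-robin, the task counts of any two nodes differ by at most one, which matches the even distribution described for step S3. The master would then send each node its list of task IDs.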

Fig. 4 is a flow chart of the preferred embodiment of the processing performed by the master node in step S3 of the cloud platform monitoring and alarm method of the present invention when a new slave node joins.

Step S321: the master node periodically maintains and updates its local running status list of all nodes.

Step S322: the master node judges whether there is a newly added node according to the running status and report time of each node in the running status list of all nodes.

Step S323: if there is a newly added node, the master node accesses the message queue that maintains the monitoring alarm tasks and obtains the new monitoring alarm tasks.

Step S324: the master node assigns tasks to the newly added node using a reasonable algorithm, achieving load balancing.
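
The handling of a joining node in steps S321 to S324 can be sketched as follows. The approach shown, moving tasks from the most loaded nodes to the new node until the loads differ by at most one, is an assumed balancing rule, since the patent leaves the algorithm open. The function names `find_new_nodes` and `add_node` are hypothetical.

```python
def find_new_nodes(assignment, healthy_nodes):
    """Nodes that report as healthy but hold no assignment entry yet."""
    return sorted(set(healthy_nodes) - set(assignment))


def add_node(assignment, new_node):
    """Move tasks to new_node until no node holds over one task more than it."""
    result = {node_id: list(tasks) for node_id, tasks in assignment.items()}
    result[new_node] = []
    while True:
        # Most loaded node; sorting first makes ties resolve by node ID.
        busiest = max(sorted(result), key=lambda node_id: len(result[node_id]))
        if len(result[busiest]) - len(result[new_node]) <= 1:
            return result
        result[new_node].append(result[busiest].pop())


assignment = {"node-a": ["t1", "t3", "t5"], "node-b": ["t2", "t4"]}
joined = find_new_nodes(assignment, ["node-a", "node-b", "node-c"])
rebalanced = add_node(assignment, joined[0])
```

Moving only the surplus tasks keeps most assignments stable; a full recomputation over all tasks would also satisfy the description, at the cost of more reassignment.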

Fig. 5 is a flow chart of the preferred embodiment of the processing performed by the master node in step S3 of the cloud platform monitoring and alarm method of the present invention when a node is removed.

Step S331: the master node periodically maintains and updates its local running status list of all nodes.

Step S332: the master node judges whether a node has been removed according to the running status and report time of each node in the running status list of all nodes.

Step S333: if a node has been removed, the master node accesses the message queue that maintains the monitoring alarm tasks and obtains the monitoring alarm tasks that the node was responsible for.

Step S334: the master node assigns these tasks to all normally running nodes using a reasonable algorithm, achieving load balancing.
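
The handling of a removed node in steps S331 to S334 can be sketched in the same style. Handing each orphaned task to the currently least loaded surviving node is an assumed rule, not one stated in the patent, and the function name `remove_node` is hypothetical.

```python
def remove_node(assignment, removed_node):
    """Reassign the removed node's tasks to the least loaded surviving nodes."""
    result = {
        node_id: list(tasks)
        for node_id, tasks in assignment.items()
        if node_id != removed_node
    }
    if not result:
        raise ValueError("no running node is left to take over the tasks")
    for task_id in assignment.get(removed_node, []):
        # Least loaded node; sorting first makes ties resolve by node ID.
        least = min(sorted(result), key=lambda node_id: len(result[node_id]))
        result[least].append(task_id)
    return result


assignment = {
    "node-a": ["t1", "t4"],
    "node-b": ["t2", "t5"],
    "node-c": ["t3", "t6"],
}
after_removal = remove_node(assignment, "node-b")
```

Only the tasks of the removed node move, so the surviving nodes keep the tasks they are already executing.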

Although the present invention has been described with reference to the current preferred embodiments, those skilled in the art should understand that the preferred embodiments above are only used to illustrate the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A cloud platform monitoring and alarm method, characterized in that the method comprises the following steps:

a. for each node: broadcasting the running status of the node itself, receiving the running status broadcast by other nodes, and periodically updating the locally maintained running status list of the monitoring alarm services of all nodes;

b. determining the master node according to the monitoring alarm service running status list;

c. the node determined to be the master node performing the duties of the master node;

d. maintaining the monitoring alarm task message queue through a message queue protocol;

wherein step b specifically comprises:

according to the running status information and time information in the monitoring alarm service running status list, judging whether this node is the earliest-started node among the nodes whose current running status is normal;

if this node is the earliest-started node among the nodes whose current running status is normal, sending a broadcast message notifying all nodes that this node serves as the master node;

and step c specifically comprises:

after several new monitoring alarm tasks are created, the master node distributing the new monitoring alarm tasks evenly to all nodes;

when a new slave node joins or a node is removed, the master node redistributing all monitoring alarm tasks to the nodes.

2. The method according to claim 1, characterized in that the method further comprises step e:

after the master node fails, redetermining the master node and redistributing the tasks.

3. The method according to claim 1, characterized in that the monitoring alarm service running status list comprises the running status information and time information of all nodes.

4. The method according to claim 1, characterized in that the master node distributing the new monitoring alarm tasks evenly to all nodes after several new monitoring alarm tasks are created specifically comprises:

after several new monitoring alarm tasks are created, submitting them to the monitoring alarm message queue, the master node obtaining the new task information from the message queue;

the master node distributing the monitoring alarm tasks to the nodes using a reasonable algorithm and sending the relevant task IDs to each slave node, achieving load balancing.

5. The method according to claim 1, characterized in that the master node redistributing all monitoring alarm tasks to the nodes when a new slave node joins specifically comprises:

the master node judging whether there is a newly added node according to the running status and report time of each node in the running status list of all nodes;

if there is a newly added node, the master node accessing the message queue that maintains the monitoring alarm tasks and obtaining the new monitoring alarm tasks;

the master node assigning tasks to the newly added node using a reasonable algorithm, achieving load balancing.

6. The method according to claim 1, characterized in that the master node redistributing all monitoring alarm tasks to the nodes when a node is removed specifically comprises:

the master node judging whether a node has been removed according to the running status and report time of each node in the running status list of all nodes;

if a node has been removed, the master node accessing the message queue that maintains the monitoring alarm tasks and obtaining the monitoring alarm tasks that the node was responsible for;

the master node assigning these tasks to all normally running nodes using a reasonable algorithm, achieving load balancing.

7. The method according to claim 1, characterized in that step d specifically comprises:

all nodes being able to access the monitoring alarm task message queue and obtain monitoring alarm tasks from it;

inserting a newly created monitoring alarm task at the tail of the monitoring alarm task message queue;

deleting a monitoring alarm task that has been executed from the monitoring alarm task message queue.

8. The method according to claim 2, characterized in that step e specifically comprises:

if the period during which the other nodes fail to receive status update information from the current master node exceeds a specified value, judging that the running status of the current master node is faulty;

all nodes checking their locally maintained monitoring alarm service running status list and selecting the currently earliest-started node as the new master node;

the new master node reading the monitoring alarm task message queue, obtaining all current monitoring alarm tasks, and redistributing them reasonably to all nodes for execution.