CN105812159B - A kind of cloud platform monitoring alarm method - Google Patents
- Publication number: CN105812159B
- Application number: CN201410841470.0A
- Authority: CN (China)
- Prior art keywords: node, monitoring alarm, master node, task, master
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to a cloud platform monitoring and alarm method, comprising: for each node, broadcasting the operating status of the node, receiving the operating status broadcast by other nodes, and periodically updating the locally maintained list of the monitoring and alarm service operating status of all nodes; determining the master node according to the monitoring and alarm service operating status list; the node determined to be the master node fulfilling the responsibilities of the master node; and maintaining a monitoring and alarm task message queue through a message queue protocol. The invention is simple and reliable to implement and offers good scalability, high availability, and fault tolerance.
Description
Technical field
The present invention relates to a cloud platform monitoring and alarm method.
Background
The maturing of virtualization technology and its combination with grid computing gave rise to cloud computing platforms. A cloud computing platform pools large-scale infrastructure, data storage, platforms, and software into a shared, cooperating resource pool and, on that basis, abstracts layered services, offering users infrastructure (IaaS), platform (PaaS), and software (SaaS) services on a pay-per-use basis.
Monitoring is an important component of a cloud computing platform. It is a prerequisite for many operations on the platform, such as network analysis, system management, job scheduling, load balancing, event prediction, and fault detection and recovery. It helps the platform quantify resource usage dynamically, detect service defects, discover user usage patterns, and support the decisions of the resource scheduling module, and so plays an important role in improving the platform's quality of service. A cloud computing platform comprises not only the underlying storage, network, and computing resources, but also the virtual resources built on them and the cloud platform that results from abstracting and integrating these resources. A cloud platform hosts large numbers of heterogeneous, dynamic, and complex resources in a distributed environment; monitoring and managing them efficiently and dynamically is what guarantees high-quality service.
The well-known cloud computing platforms in the industry currently have their own monitoring solutions for monitoring and alarming on the platform. However, these solutions often use a single-node model, in which one node is responsible for monitoring the entire cloud computing platform. As the cloud platform keeps expanding, the monitoring load grows, and scalability and fault tolerance are poor, so it is difficult to guarantee that the platform's monitoring and alarm tasks execute efficiently. A monitoring and alarm system for a cloud platform should therefore not only carry out monitoring tasks efficiently but also offer good scalability, high availability, and fault tolerance.
Summary of the invention
In view of this, it is necessary to provide a cloud platform monitoring and alarm method.
The present invention provides a cloud platform monitoring and alarm method, characterized in that the method comprises the following steps: a. for each node: broadcasting the operating status of the node, receiving the operating status broadcast by other nodes, and periodically updating the locally maintained list of the monitoring and alarm service operating status of all nodes; b. determining the master node according to the monitoring and alarm service operating status list; c. the node determined to be the master node fulfilling the responsibilities of the master node; d. maintaining a monitoring and alarm task message queue through a message queue protocol.
The method further includes step e: after the master node fails, redetermining the master node and redistributing the tasks.
The monitoring and alarm service operating status list includes the operating status information and time information of all nodes.
Step b specifically includes: judging, according to the operating status information and time information in the monitoring and alarm service operating status list, whether this node is the earliest-started node among the nodes whose current operating status is normal; and, if it is, sending a broadcast message notifying all nodes that this node will serve as the master node.
Step c specifically includes: after several new monitoring and alarm tasks are created, the master node distributes the new tasks evenly among all nodes; when a new slave node is added or a node is deleted, the master node redistributes all monitoring and alarm tasks among the nodes.
Distributing the new monitoring and alarm tasks evenly among all nodes after they are created specifically includes: after several new monitoring and alarm tasks are created, they are submitted to the monitoring and alarm message queue, and the master node obtains the new task information from the message queue; the master node distributes the monitoring and alarm tasks among the nodes using a suitable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
Redistributing the monitoring and alarm tasks when a new slave node is added specifically includes: the master node judges whether there is a newly added node according to the operating status and report time of each node in the status list of all nodes; if there is a newly added node, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the new monitoring and alarm tasks; the master node assigns tasks to the newly added node using a suitable algorithm, achieving load balancing.
Redistributing the monitoring and alarm tasks when a node is deleted specifically includes: the master node judges whether a node has been deleted according to the operating status and report time of each node in the status list of all nodes; if a node has been deleted, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the monitoring and alarm tasks that the node was responsible for; the master node assigns these tasks to all normally running nodes using a suitable algorithm, achieving load balancing.
Step d specifically includes: all nodes can access the monitoring and alarm task message queue and obtain monitoring and alarm tasks from it; newly created monitoring and alarm tasks are inserted at the tail of the queue; monitoring and alarm tasks that have been executed are deleted from the queue.
Step e specifically includes: if the period during which the other nodes receive no status update from the current master node exceeds a specified value, the current master node is judged to have failed; all nodes check their locally maintained monitoring and alarm service operating status lists and select the node that currently started earliest as the new master node; the new master node reads the monitoring and alarm task message queue, obtains all current monitoring and alarm tasks, and redistributes them among all nodes for execution.
The cloud platform monitoring and alarm method of the present invention uses a distributed architecture, is simple and reliable to implement, can efficiently complete the large volume of monitoring and alarm tasks of a cloud platform, and offers good scalability, high availability, and fault tolerance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the operating environment of the cloud platform monitoring and alarm method of the present invention;
Fig. 2 is a flow chart of the cloud platform monitoring and alarm method of the present invention;
Fig. 3 is a flow chart of a preferred embodiment of step S3 of the present invention when several new monitoring and alarm tasks are created;
Fig. 4 is a flow chart of a preferred embodiment of step S3 of the present invention when a new slave node is added;
Fig. 5 is a flow chart of a preferred embodiment of step S3 of the present invention when a node is deleted.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic diagram of the operating environment of the cloud platform monitoring and alarm method of the present invention.
The operating environment of the cloud platform monitoring and alarm method uses a master-slave architecture comprising a master node and several slave nodes. A monitoring and alarm service process runs on every node, and together the nodes complete the monitoring and alarm tasks of the cloud platform. The master node and the slave nodes, and the slave nodes among themselves, communicate through a remote procedure call protocol (Remote Procedure Call Protocol, RPC) and a message queue protocol (Advanced Message Queuing Protocol, AMQP).
Fig. 2 is a flow chart of a preferred embodiment of the cloud platform monitoring and alarm method of the present invention.
Step S1: the monitoring and alarm service of each node periodically broadcasts the operating status of the node via the remote procedure call protocol, informing the other nodes whether it is running normally. At the same time, each node receives the operating status broadcast by the other nodes and periodically updates its locally maintained list of the monitoring and alarm service operating status of all nodes. Specifically:
The monitoring and alarm service of each node periodically broadcasts the node's operating status to all nodes via the remote procedure call protocol. The broadcast status includes the current time, so that the other nodes know whether the node is running normally at that moment.
The monitoring and alarm service of each node receives the operating status broadcast by the other nodes via the remote procedure call protocol and periodically updates its locally maintained list of the monitoring and alarm service operating status of all nodes. The list records the latest operating status information and time information of all nodes.
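The locally maintained status list of step S1 can be pictured with a minimal sketch. It assumes that each broadcast carries a node identifier, the service start time, a health flag, and the report timestamp; the names `NodeStatus`, `StatusList`, and `update` are hypothetical, since the patent does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeStatus:
    node_id: str        # hypothetical identifier of the broadcasting node
    started_at: float   # time at which the node's service started (seconds)
    healthy: bool       # whether the node reports that it is running normally
    reported_at: float  # timestamp carried by the broadcast


class StatusList:
    """Per-node table holding the latest status heard from every node."""

    def __init__(self) -> None:
        self.entries: Dict[str, NodeStatus] = {}

    def update(self, status: NodeStatus) -> bool:
        """Record a broadcast; ignore it if an equal or newer report is stored."""
        current = self.entries.get(status.node_id)
        if current is not None and current.reported_at >= status.reported_at:
            return False
        self.entries[status.node_id] = status
        return True
```

Discarding reports older than the stored one is a design choice of this sketch, made so that delayed broadcasts cannot overwrite fresher information.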
Step S2: the monitoring and alarm service of each node periodically checks its locally maintained node status list and judges whether this node is the node that currently started earliest; if so, it sends a broadcast message notifying all nodes that this node will serve as the master node. Specifically:
The monitoring and alarm service of each node periodically checks the locally maintained node status list and, according to the operating status information and time information in the list, judges whether this node is the earliest-started node among the nodes whose current operating status is normal;
If this node is the earliest-started node among the nodes whose current operating status is normal, it sends a broadcast message notifying all nodes that it will serve as the master node.
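The election rule of step S2 can be sketched as follows. The status-list layout (node ID mapped to start time and health flag) and the function names are assumptions for illustration, and breaking ties on start time by node ID is also an assumption, since the patent does not say how ties are resolved.

```python
from typing import Dict, Optional, Tuple

# Hypothetical status-list layout: node_id -> (started_at, healthy)
Status = Tuple[float, bool]


def elect_master(status_list: Dict[str, Status]) -> Optional[str]:
    """Return the earliest-started healthy node, or None if no node is healthy."""
    healthy = [(started, node_id)
               for node_id, (started, ok) in status_list.items() if ok]
    if not healthy:
        return None
    # Ties on start time fall back to node ID order (an assumption of this sketch).
    return min(healthy)[1]


def should_announce(self_id: str, status_list: Dict[str, Status]) -> bool:
    """A node broadcasts that it is master only if the rule selects itself."""
    return elect_master(status_list) == self_id
```

Because every node applies the same deterministic rule to its own copy of the list, nodes with consistent lists reach the same result without a separate voting round.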
Step S3: once a node is determined to be the master node, it must fulfil the responsibilities of the master node. The main task of the master node is to distribute monitoring and alarm tasks sensibly among the slave nodes. After several new monitoring and alarm tasks are created, the master node distributes them evenly among all nodes. When a new slave node is added or a node is deleted, the master node redistributes all monitoring and alarm tasks among the nodes to balance the task load.
Step S4: a monitoring and alarm task message queue is maintained through the message queue protocol; the queue contains all current monitoring and alarm tasks. Specifically:
The monitoring and alarm task message queue holds all monitoring and alarm tasks currently being executed. All nodes can access this message queue and obtain monitoring and alarm tasks from it;
When a new monitoring and alarm task is created, it is inserted at the tail of the message queue;
When a monitoring and alarm task has been executed, it is deleted from the message queue.
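The three queue rules of step S4 can be illustrated with an in-memory stand-in. This is only a sketch: a real deployment would sit on an AMQP broker, and the class `TaskQueue` and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class TaskQueue:
    """In-memory stand-in for the AMQP-backed task queue (illustration only)."""

    def __init__(self) -> None:
        self._tasks: Deque[str] = deque()

    def create(self, task_id: str) -> None:
        # Newly created tasks are inserted at the tail of the queue.
        self._tasks.append(task_id)

    def complete(self, task_id: str) -> None:
        # Executed tasks are deleted; raises ValueError if the task is absent.
        self._tasks.remove(task_id)

    def current(self) -> List[str]:
        # Any node may read the tasks currently held in the queue.
        return list(self._tasks)
```

Returning a copy from `current` keeps readers from mutating the queue by accident, which mirrors the idea that nodes read tasks while creation and deletion go through dedicated operations.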
Step S5: after the master node fails, it can no longer broadcast its local monitoring and alarm operating status. When the other nodes check their locally maintained status lists of all nodes, they select the node that currently started earliest as the master node; that node obtains all tasks from the monitoring and alarm message queue and redistributes them among all nodes. Specifically:
When the current master node fails for some reason, it can no longer broadcast its local monitoring and alarm operating status, so the other nodes stop receiving status updates from it. Once this period exceeds a specified value, the current master node is judged to have failed;
All nodes check their locally maintained monitoring and alarm service operating status lists and select the node that currently started earliest as the new master node;
The new master node reads the monitoring and alarm task message queue, obtains all current monitoring and alarm tasks, and redistributes them among all nodes for execution.
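The failure detection and re-election of step S5 can be sketched under a few assumptions: the status list maps each node ID to its start time and the time of its last report, and `timeout` stands for the "specified value" in the text. All names here are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: node_id -> (started_at, last_report_time)
Entry = Tuple[float, float]


def master_failed(master_id: str, status_list: Dict[str, Entry],
                  now: float, timeout: float) -> bool:
    """The master counts as failed once its last report is older than timeout."""
    entry = status_list.get(master_id)
    return entry is None or now - entry[1] > timeout


def elect_replacement(status_list: Dict[str, Entry],
                      now: float, timeout: float) -> Optional[str]:
    """Pick the earliest-started node among those that reported recently."""
    live = [(started, node_id)
            for node_id, (started, seen) in status_list.items()
            if now - seen <= timeout]
    return min(live)[1] if live else None
```

The stale master drops out of the candidate set through the same timeout test, so the replacement is simply the earliest-started node that is still reporting.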
Fig. 3 is a flow chart of a preferred embodiment of step S3 of the cloud platform monitoring and alarm method of the present invention, in which the master node distributes new monitoring and alarm tasks to the slave nodes after several new monitoring and alarm tasks are created.
Step S311: the master node periodically monitors the monitoring and alarm task message queue.
Step S312: after several new monitoring and alarm tasks are created, they are submitted to the monitoring and alarm message queue, and the master node obtains the new task information from the message queue.
Step S313: the master node distributes the monitoring and alarm tasks among the nodes using a suitable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
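The patent leaves the distribution algorithm of step S313 open. One simple candidate, shown here purely as an assumption, is round-robin assignment of task IDs; the function name `distribute_evenly` is hypothetical.

```python
from typing import Dict, List


def distribute_evenly(task_ids: List[str],
                      node_ids: List[str]) -> Dict[str, List[str]]:
    """Round-robin assignment; node loads differ by at most one task."""
    if not node_ids:
        raise ValueError("no nodes available")
    assignment: Dict[str, List[str]] = {n: [] for n in node_ids}
    for i, task_id in enumerate(task_ids):
        assignment[node_ids[i % len(node_ids)]].append(task_id)
    return assignment
```

For example, five tasks spread over two nodes yield three tasks on the first node and two on the second. The master would then send each node its list of task IDs.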
Fig. 4 is a flow chart of a preferred embodiment of how the master node handles the addition of a new slave node in step S3 of the cloud platform monitoring and alarm method of the present invention.
In step S321, the master node periodically maintains and updates its local list of the operating status of all nodes.
In step S322, the master node judges whether there is a newly added node according to the operating status and report time of each node in the status list of all nodes.
In step S323, if there is a newly added node, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the new monitoring and alarm tasks.
In step S324, the master node assigns tasks to the newly added node using a suitable algorithm, achieving load balancing.
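Steps S322 to S324 can be sketched as a set comparison followed by a rebalancing pass. The approach below, moving tasks from the busiest node to the new node, is one possible "suitable algorithm" and not something the patent prescribes; `detect_added` and `rebalance_onto` are hypothetical names.

```python
from typing import Dict, List, Set


def detect_added(previous: Set[str], current: Set[str]) -> Set[str]:
    """Nodes present in the current status list but not in the previous one."""
    return current - previous


def rebalance_onto(assignment: Dict[str, List[str]],
                   new_node: str) -> Dict[str, List[str]]:
    """Move tasks to new_node until the busiest node holds at most one more."""
    result = {n: list(tasks) for n, tasks in assignment.items()}
    result.setdefault(new_node, [])
    while True:
        busiest = max(result, key=lambda n: len(result[n]))
        if len(result[busiest]) - len(result[new_node]) <= 1:
            return result
        # Take one task from the busiest node and hand it to the new node.
        result[new_node].append(result[busiest].pop())
```

Moving only as many tasks as needed keeps most existing assignments in place, which limits the churn a full reassignment would cause.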
Fig. 5 is a flow chart of a preferred embodiment of how the master node handles the deletion of a node in step S3 of the cloud platform monitoring and alarm method of the present invention.
In step S331, the master node periodically maintains and updates its local list of the operating status of all nodes.
In step S332, the master node judges whether a node has been deleted according to the operating status and report time of each node in the status list of all nodes.
In step S333, if a node has been deleted, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the monitoring and alarm tasks that the node was responsible for.
In step S334, the master node assigns these tasks to all normally running nodes using a suitable algorithm, achieving load balancing.
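Steps S333 and S334 can be sketched as handing each task of the deleted node to whichever surviving node currently has the fewest tasks. This least-loaded rule is an assumption chosen for illustration, and `reassign_orphans` is a hypothetical name.

```python
from typing import Dict, List


def reassign_orphans(assignment: Dict[str, List[str]],
                     removed: str) -> Dict[str, List[str]]:
    """Give each task of the removed node to the least-loaded surviving node."""
    result = {n: list(t) for n, t in assignment.items() if n != removed}
    if not result:
        raise ValueError("no surviving nodes")
    for task_id in assignment.get(removed, []):
        least = min(result, key=lambda n: len(result[n]))
        result[least].append(task_id)
    return result
```

Only the orphaned tasks move, so tasks on healthy nodes keep running where they are.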
Although the present invention has been described with reference to the presently preferred embodiments, those skilled in the art will understand that the above preferred embodiments only illustrate the invention and do not limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims.
Claims (8)
1. A cloud platform monitoring and alarm method, characterized in that the method comprises the following steps:
a. for each node: broadcasting the operating status of the node, receiving the operating status broadcast by other nodes, and periodically updating the locally maintained list of the monitoring and alarm service operating status of all nodes;
b. determining the master node according to the monitoring and alarm service operating status list;
c. the node determined to be the master node fulfilling the responsibilities of the master node;
d. maintaining a monitoring and alarm task message queue through a message queue protocol;
wherein step b specifically includes:
judging, according to the operating status information and time information in the monitoring and alarm service operating status list, whether this node is the earliest-started node among the nodes whose current operating status is normal;
if this node is the earliest-started node among the nodes whose current operating status is normal, sending a broadcast message notifying all nodes that this node will serve as the master node;
and step c specifically includes:
after several new monitoring and alarm tasks are created, the master node distributing the new monitoring and alarm tasks evenly among all nodes;
when a new slave node is added or a node is deleted, the master node redistributing all monitoring and alarm tasks among the nodes.
2. The method of claim 1, characterized in that the method further includes step e:
after the master node fails, redetermining the master node and redistributing the tasks.
3. The method of claim 1, characterized in that the monitoring and alarm service operating status list includes the operating status information and time information of all nodes.
4. The method of claim 1, characterized in that the master node distributing the new monitoring and alarm tasks evenly among all nodes after several new monitoring and alarm tasks are created specifically includes:
after several new monitoring and alarm tasks are created, they are submitted to the monitoring and alarm message queue, and the master node obtains the new task information from the message queue;
the master node distributes the monitoring and alarm tasks among the nodes using a suitable algorithm and sends the relevant task IDs to each slave node, achieving load balancing.
5. The method of claim 1, characterized in that the master node redistributing all monitoring and alarm tasks among the nodes when a new slave node is added specifically includes:
the master node judges whether there is a newly added node according to the operating status and report time of each node in the status list of all nodes;
if there is a newly added node, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the new monitoring and alarm tasks;
the master node assigns tasks to the newly added node using a suitable algorithm, achieving load balancing.
6. The method of claim 1, characterized in that the master node redistributing all monitoring and alarm tasks among the nodes when a node is deleted specifically includes:
the master node judges whether a node has been deleted according to the operating status and report time of each node in the status list of all nodes;
if a node has been deleted, the master node accesses the message queue that maintains the monitoring and alarm tasks and obtains the monitoring and alarm tasks that the node was responsible for;
the master node assigns these tasks to all normally running nodes using a suitable algorithm, achieving load balancing.
7. The method of claim 1, characterized in that step d specifically includes:
all nodes can access the monitoring and alarm task message queue and obtain monitoring and alarm tasks from it;
newly created monitoring and alarm tasks are inserted at the tail of the monitoring and alarm task message queue;
monitoring and alarm tasks that have been executed are deleted from the monitoring and alarm task message queue.
8. The method of claim 2, characterized in that step e specifically includes:
if the period during which the other nodes receive no status update from the current master node exceeds a specified value, the current master node is judged to have failed;
all nodes check their locally maintained monitoring and alarm service operating status lists and select the node that currently started earliest as the new master node;
the new master node reads the monitoring and alarm task message queue, obtains all current monitoring and alarm tasks, and redistributes them among all nodes for execution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410841470.0A CN105812159B (en) | 2014-12-30 | 2014-12-30 | A kind of cloud platform monitoring alarm method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105812159A CN105812159A (en) | 2016-07-27 |
CN105812159B (en) | 2019-06-04 |
Family
ID=56980157
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410841470.0A (Active; CN105812159B) | 2014-12-30 | 2014-12-30 | A kind of cloud platform monitoring alarm method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105812159B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107395458B (en) * | 2017-07-31 | 2020-05-22 | 东软集团股份有限公司 | System monitoring method and device |
CN107608285B (en) * | 2017-09-01 | 2019-10-08 | 北京南凯自动化系统工程有限公司 | A kind of comprehensive monitoring system |
CN109144737A (en) * | 2018-10-09 | 2019-01-04 | 郑州云海信息技术有限公司 | Controller management method, apparatus and storage medium in a kind of distributed cluster system |
CN112685199B (en) * | 2020-12-30 | 2023-10-20 | 董小君 | Message queue repairing method and device, computer equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512729A (en) * | 2002-12-31 | 2004-07-14 | 联想(北京)有限公司 | Method for network equipment self adaption load equalization |
CN101924650A (en) * | 2010-08-04 | 2010-12-22 | 浙江省电力公司 | Method for implementing services and intelligent server autonomy of failure information system |
CN102882909A (en) * | 2011-07-15 | 2013-01-16 | 易云捷讯科技(北京)有限公司 | Cloud computing service monitoring system and method thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8589499B2 (en) * | 2002-05-10 | 2013-11-19 | Silicon Graphics International Corp. | Real-time storage area network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |