CN105376100B - A distributed alarm rule evaluation method for cloud platform resource monitoring - Google Patents
- Publication number
- CN105376100B
- Authority
- CN
- China
- Prior art keywords
- service
- alarm
- responsible
- alarm regulation
- warning rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The present invention relates to the technical field of cloud platform resource monitoring, and in particular to a distributed alarm rule evaluation method for cloud platform resource monitoring. The method first collects monitoring data periodically, then sets monitoring rules, and then starts all distributed alarm rule evaluation services. Next, each alarm rule evaluation service broadcasts its own service status information and determines whether its own start time is the earliest: the service with the earliest start time becomes the master service and executes the master-service flow of alarm rule evaluation, while every other service executes the non-master flow. The invention addresses the shortcomings that arise when a single service checks all alarm rules: weak processing capacity, processing delay, and the fact that an abnormal exit of the single alarm rule evaluation service causes the resource monitoring service of the entire cloud platform to fail, which cannot meet the strict high-availability requirements of resource monitoring in a production environment. The invention can be applied in the field of cloud computing resource monitoring.
Description
Technical field
The present invention relates to the technical field of cloud platform resource monitoring, and in particular to a distributed alarm rule evaluation method for cloud platform resource monitoring.
Background art
Cloud computing resources are of many kinds, and the cloud platform is responsible for monitoring all of them. When facing varied application scenarios, a large amount of manpower and material resources must be spent developing cloud platform business functions to support monitoring flows that are similar to one another but differ in resource type and detail. User needs therefore cannot be met quickly, which brings the following problems:
First, the cost and time of duplicated development: each resource type to be monitored must be developed for separately.
Second, resource monitoring cannot adapt to the needs of different usage scenarios.
Third, monitoring of a specific resource cannot be suspended dynamically, so unwanted repeated email notifications are often sent, which disturbs the user's normal work.
Fourth, secondary development is inefficient, and business processes cannot be shared.
Fifth, a conventional single monitoring service has weak processing capacity.
Sixth, processing delay is large when a single service processes a large number of monitoring rules.
Seventh, an abnormal exit of the single alarm rule evaluation service causes the resource monitoring service of the entire cloud platform to fail, which cannot meet the strict high-availability requirements of resource monitoring in a production environment.
To meet the resource monitoring needs of various business scenarios at lower cost, a distributed alarm rule evaluation method for cloud platform resource monitoring is needed. By deploying multiple distributed alarm rule evaluation services, users can monitor the many kinds of resources on a cloud platform, and the monitoring rules are processed in a distributed manner. This effectively addresses the weak processing capacity, large processing delay, and lack of high availability of a single monitoring service.
Summary of the invention
The technical problem solved by the present invention is to provide a distributed alarm rule evaluation method for cloud platform resource monitoring. It addresses the weak processing capacity of single-service alarm rule evaluation, the large processing delay when a single monitoring service processes alarm rules, the inability of a single service to meet users' high-availability requirements for the monitoring service, and the failure of the entire cloud platform monitoring service when that single service exits abnormally.
The technical solution by which the present invention solves the above technical problem is as follows.
The method includes the following steps:
Step 1: periodically collect monitoring data;
Step 2: set monitoring rules;
Step 3: start all distributed alarm rule evaluation services;
Step 4: each alarm rule evaluation service broadcasts its own service status information;
Step 5: each service determines whether its own start time is the earliest; if so, go to step 6; if not, go to step 12;
Step 6: set itself as the master service;
Step 7: query the total number of monitoring rules and, according to the algorithm, assign the lists of alarm rules for which itself and all non-master services are responsible;
Step 8: poll the list of alarm rules it is responsible for;
Step 9: perform alarm evaluation and alarm tasks;
Step 10: broadcast its own service status information;
Step 11: enter the next cycle and go to step 7;
Step 12: set itself as a non-master service;
Step 13: poll the list of alarm rules it is responsible for;
Step 14: perform alarm evaluation and alarm tasks;
Step 15: check whether the master service is alive; if so, go to step 16; if not, go to step 5;
Step 16: broadcast its own service status information;
Step 17: enter the next cycle and go to step 13.
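The master selection in steps 4 to 6 and step 12 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `ServiceStatus` record, its field names, and the tie-break on `service_id` are assumptions introduced here, since the text states only that the service with the earliest start time becomes the master.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceStatus:
    """Status record that one evaluation service broadcasts to its peers."""
    service_id: str    # hypothetical unique name of the service
    start_time: float  # service start time, e.g. seconds since the epoch


def is_master(own: ServiceStatus, peers: list[ServiceStatus]) -> bool:
    """Return True if `own` has the earliest start time among all services."""
    candidates = [own, *peers]
    # Assumption: ties on start_time are broken by service_id so that
    # exactly one service considers itself the master.
    earliest = min(candidates, key=lambda s: (s.start_time, s.service_id))
    return earliest.service_id == own.service_id


a = ServiceStatus("eval-a", 100.0)
b = ServiceStatus("eval-b", 105.5)
print(is_master(a, [b]))  # eval-a started first, so it would run steps 6 to 11
print(is_master(b, [a]))  # eval-b would run the non-master steps 12 to 17
```

Each service evaluates the same function over the same broadcast data, so no separate coordinator is needed in this sketch.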
According to business needs, the user sets the content of an alarm rule: monitored item, filter condition, data statistics method, start time, end time, statistics interval, threshold comparison mode, alarm trigger action, whether to repeat the alarm trigger action, and whether the rule is enabled.
The monitored items include CPU utilization, CPU load, disk utilization, network upstream byte count, network upstream rate, network downstream byte count, network downstream rate, disk I/O read/write bytes per second, node network connection status, connection status of application service ports, physical machine temperature, motherboard fan speed, node uptime, utilization of each cloud storage pool, overall cloud storage utilization, and the total number of images of each type.
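For illustration, the rule content listed above could be held in a record such as the one below. All field names, the string encodings of the statistics method and comparison mode, and the sample values are assumptions made for this sketch; the text names the items of a rule but does not define a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmRule:
    """One alarm rule; field names are illustrative, not from the patent."""
    monitored_item: str            # e.g. "cpu_utilization"
    filter_condition: dict         # which resources the rule applies to
    statistic: str                 # data statistics method, e.g. "avg" or "max"
    start_time: str                # start of the statistics window
    end_time: str                  # end of the statistics window
    interval_seconds: int          # statistics interval
    comparison: str                # threshold comparison mode, e.g. ">"
    threshold: float               # value the statistic is compared against
    actions: list = field(default_factory=list)  # alarm trigger actions
    repeat_actions: bool = False   # whether to repeat the trigger action
    enabled: bool = True           # whether the rule is in effect


rule = AlarmRule(
    monitored_item="cpu_utilization",
    filter_condition={"node": "compute-01"},  # hypothetical node name
    statistic="avg",
    start_time="2015-12-09T00:00:00",
    end_time="2015-12-09T00:05:00",
    interval_seconds=60,
    comparison=">",
    threshold=90.0,
    actions=["log", "email"],
)
print(rule.monitored_item, rule.comparison, rule.threshold)
```

Setting `enabled` to False corresponds to suspending monitoring of a resource without deleting its rule.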
In step 7, the master service queries the total number of alarm rules and assigns them according to the number of non-master services. If the rules can be divided evenly, they are all assigned to the non-master services. If not, the great majority of the alarm rules are assigned to the non-master services and the master service takes the small remainder; the number of alarm rules handled by the master service must be fewer than the number handled by each non-master service.
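One possible reading of this assignment rule is sketched below. The text does not spell out the algorithm, so the details are assumptions: each non-master service first receives an equal share, the master keeps the remainder, and leftover rules are pushed to the least-loaded non-master service whenever the master's share would not be strictly smaller. The function name `assign_rules` and its arguments are hypothetical.

```python
from itertools import islice


def assign_rules(rule_ids, master_id, worker_ids):
    """Split rule_ids between the master and the non-master services."""
    assignment = {master_id: []}
    if not worker_ids:
        # Assumption: with no non-master service, the master evaluates everything.
        assignment[master_id] = list(rule_ids)
        return assignment

    share = len(rule_ids) // len(worker_ids)
    remaining = iter(rule_ids)
    for worker in worker_ids:
        # Equal share for every non-master service.
        assignment[worker] = list(islice(remaining, share))
    leftovers = list(remaining)

    # The master keeps the leftovers only while its share stays strictly
    # smaller than the share of every non-master service.
    while leftovers and len(leftovers) >= min(len(assignment[w]) for w in worker_ids):
        least_loaded = min(worker_ids, key=lambda w: len(assignment[w]))
        assignment[least_loaded].append(leftovers.pop())
    assignment[master_id] = leftovers
    return assignment


rules = [f"rule-{i}" for i in range(10)]
plan = assign_rules(rules, "master", ["worker-1", "worker-2", "worker-3"])
for service, assigned in plan.items():
    print(service, len(assigned))
```

With an evenly divisible rule count the master receives no rules, which matches the first case in the text and leaves it free for coordination.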
In alarm evaluation and alarm tasks, the statistical result of the monitoring data is queried according to the rule, and the threshold of each rule is compared with its statistical result to determine whether the alarm condition is met; if it is met, the alarm tasks are triggered. Alarm tasks typically include writing an alarm log record, calling the URL address provided by the user in the rule, and sending an email notification; one or more of these tasks may be selected for execution.
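The comparison and task-triggering logic described above might look like the following sketch. The dictionary layout of a rule, the comparison symbols, and the handler names are assumptions for illustration; the handlers here only print, whereas a real service would write a log record, call the user-supplied URL, or send an email.

```python
import operator

# Assumed encoding of the threshold comparison mode.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def evaluate_rule(rule, statistic_value, handlers):
    """Run the rule's alarm tasks if the statistic meets the alarm condition.

    Returns the names of the tasks that were triggered.
    """
    compare = COMPARATORS[rule["comparison"]]
    if not compare(statistic_value, rule["threshold"]):
        return []  # alarm condition not met, nothing to do
    triggered = []
    for name in rule["actions"]:
        handlers[name](rule, statistic_value)  # execute the selected task
        triggered.append(name)
    return triggered


# Placeholder handlers standing in for the log and email tasks.
handlers = {
    "log": lambda rule, value: print("ALARM LOG", rule["item"], value),
    "email": lambda rule, value: print("EMAIL", rule["item"], value),
}
cpu_rule = {"item": "cpu_utilization", "comparison": ">",
            "threshold": 90.0, "actions": ["log", "email"]}
evaluate_rule(cpu_rule, 95.2, handlers)
```

Keeping the tasks in a name-to-handler table lets each rule select one or several of them, as the text allows.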
In step 15, each non-master service periodically checks whether the master service is alive. If it is, the non-master service continues its non-master flow; if not, a new master service is selected.
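A liveness check of this kind could be based on the time of the last status broadcast received from each service, as in the sketch below. The timeout value, the `last_seen` and `start_times` dictionaries, and the function names are assumptions; the text says only that non-master services check periodically and that a new master is selected when the master is gone.

```python
def master_alive(last_seen, master_id, now, timeout=30.0):
    """True if the master's last status broadcast is recent enough."""
    seen = last_seen.get(master_id)
    return seen is not None and (now - seen) <= timeout


def next_master(start_times, last_seen, now, timeout=30.0):
    """Pick the earliest-started service among those still broadcasting."""
    alive = [s for s in start_times if master_alive(last_seen, s, now, timeout)]
    if not alive:
        return None
    # Same criterion as the initial selection: earliest start time wins;
    # the tie-break on the service name is an assumption.
    return min(alive, key=lambda s: (start_times[s], s))


start_times = {"eval-a": 100.0, "eval-b": 105.0, "eval-c": 110.0}
last_seen = {"eval-a": 500.0, "eval-b": 590.0, "eval-c": 595.0}
now = 600.0
if not master_alive(last_seen, "eval-a", now):
    print("new master:", next_master(start_times, last_seen, now))
```

Reusing the earliest-start-time criterion for re-selection mirrors the return to step 5 when the master is found to be gone.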
Through distributed alarm rule evaluation, the present invention avoids the weak processing capacity of a single alarm rule evaluation service, avoids the large processing delay when a single monitoring service processes alarm rules, and avoids the inability of a single service to meet users' high-availability requirements for the monitoring service. A user who needs higher availability of alarm evaluation only has to start more distributed alarm rule evaluation services. Monitoring of various cloud platform resources is thus achieved at lower cost: operations staff combine the monitoring rules they need according to their own business scenarios, without further development, and user needs are met quickly. The present invention differs from general rule-based cloud platform monitoring systems, whose shortcoming is that they cannot use distributed alarm rule evaluation.
Brief description of the drawings
The present invention is further described below with reference to the drawings:
Fig. 1 is the flow chart;
Fig. 2 is the logical structure diagram of the present invention.
Specific embodiment
The present invention has many embodiments. One implementation is described here, taking a private cloud platform as an example. As shown in Figs. 1 and 2, the specific implementation process is as follows:
1. Periodically collect monitoring data.
2. Set monitoring rules.
3. Start all distributed alarm rule evaluation services.
4. Each alarm rule evaluation service broadcasts its own service status information.
5. Each service determines whether its own start time is the earliest; if so, it executes the master service flow, otherwise it executes the non-master service flow.
6. Set itself as the master service.
7. Query the total number of monitoring rules and, according to the algorithm, assign the lists of alarm rules for which itself and all non-master services are responsible.
8. Poll the list of alarm rules it is responsible for.
9. Perform alarm evaluation and alarm tasks.
10. Broadcast its own service status information, using the method of step 4.
11. Enter the next cycle, using the method of step 7.
12. Set itself as a non-master service.
13. Poll the list of alarm rules it is responsible for, using the method of step 8.
14. Perform alarm evaluation and alarm tasks, using the method of step 9.
16. Broadcast its own service status information, using the method of step 4.
17. Enter the next cycle and continue executing the non-master service flow.
18. The whole flow ends.
Claims (5)
1. A distributed alarm rule evaluation method for cloud platform resource monitoring, characterized in that the method includes the following steps:
Step 1: periodically collect monitoring data;
Step 2: set alarm rules;
Step 3: start all distributed alarm rule evaluation services;
Step 4: each alarm rule evaluation service broadcasts its own service status information;
Step 5: each service determines whether its own start time is the earliest; if so, go to step 6; if not, go to step 12;
Step 6: set itself as the master service;
Step 7: query the total number of alarm rules and, according to the algorithm, assign the lists of alarm rules for which itself and all non-master services are responsible;
Step 8: poll the list of alarm rules it is responsible for;
Step 9: perform alarm evaluation and alarm tasks; in the alarm evaluation and alarm tasks, the statistical result of the monitoring data is queried according to the rule, the threshold of each rule is compared with its statistical result to determine whether the alarm condition is met, and the alarm tasks are triggered if it is met; alarm tasks typically include writing an alarm log record, calling the URL address provided by the user in the rule, and sending an email notification, and one or more of these tasks may be selected for execution;
Step 10: broadcast its own service status information;
Step 11: enter the next cycle and go to step 7;
Step 12: set itself as a non-master service;
Step 13: poll the list of alarm rules it is responsible for;
Step 14: perform alarm evaluation and alarm tasks;
Step 15: check whether the master service is alive; if so, go to step 16; if not, go to step 5;
Step 16: broadcast its own service status information;
Step 17: enter the next cycle and go to step 13.
2. The distributed alarm rule evaluation method according to claim 1, characterized in that: the user sets alarm rules according to business needs, and the content of an alarm rule includes: monitored item, filter condition, data statistics method, start time, end time, statistics interval, threshold comparison mode, alarm trigger action, whether to repeat the alarm trigger action, and whether the rule is enabled;
the monitored items include CPU utilization, CPU load, disk utilization, network upstream byte count, network upstream rate, network downstream byte count, network downstream rate, disk I/O read/write bytes per second, node network connection status, connection status of application service ports, physical machine temperature, motherboard fan speed, node uptime, utilization of each cloud storage pool, overall cloud storage utilization, and the total number of images.
3. The distributed alarm rule evaluation method according to claim 1, characterized in that: in step 7, the master service queries the total number of alarm rules and assigns them according to the number of non-master services; if the rules can be divided evenly, they are assigned to the non-master services; if not, the great majority of the alarm rules are assigned to the non-master services and the master service itself handles the small remainder, and the number of alarm rules it handles must be fewer than the number handled by each non-master service.
4. The distributed alarm rule evaluation method according to claim 2, characterized in that: in step 7, the master service queries the total number of alarm rules and assigns them according to the number of non-master services; if the rules can be divided evenly, they are assigned to the non-master services; if not, the great majority of the alarm rules are assigned to the non-master services and the master service itself handles the small remainder, and the number of alarm rules it handles must be fewer than the number handled by each non-master service.
5. The distributed alarm rule evaluation method according to any one of claims 1 to 4, characterized in that: in step 15, each non-master service periodically checks whether the master service is alive; if so, it continues executing the non-master service flow; if not, a new master service is selected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510902578.0A CN105376100B (en) | 2015-12-09 | 2015-12-09 | A distributed alarm rule evaluation method for cloud platform resource monitoring |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510902578.0A CN105376100B (en) | 2015-12-09 | 2015-12-09 | A distributed alarm rule evaluation method for cloud platform resource monitoring |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105376100A (en) | 2016-03-02 |
CN105376100B (en) | 2019-05-21 |
Family
ID=55377927
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510902578.0A Active CN105376100B (en) | 2015-12-09 | 2015-12-09 | A distributed alarm rule evaluation method for cloud platform resource monitoring |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105376100B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106301919A (en) * | 2016-08-17 | 2017-01-04 | 浪潮电子信息产业股份有限公司 | The warning system of a kind of privatization cloud platform and its implementation |
CN107453951A (en) * | 2017-08-15 | 2017-12-08 | 郑州云海信息技术有限公司 | A kind of storage pool monitoring method and device |
CN108270618B (en) * | 2017-12-30 | 2021-07-16 | 华为技术有限公司 | Alarm determination method, device and alarm system |
CN108833414B (en) * | 2018-06-20 | 2019-03-15 | 重庆市地理信息中心 | A kind of online service abnormality monitoring method |
CN108920327A (en) * | 2018-06-27 | 2018-11-30 | 郑州云海信息技术有限公司 | A kind of cloud computing alarm method and device |
CN109710486A (en) * | 2018-11-28 | 2019-05-03 | 国云科技股份有限公司 | A method of the customized example warning strategies based on cloudy platform |
CN109728938A (en) * | 2018-12-11 | 2019-05-07 | 国云科技股份有限公司 | A kind of method of assessment system service level |
CN110933512B (en) * | 2019-10-23 | 2022-05-06 | 视联动力信息技术股份有限公司 | Load determination method and device based on video network |
CN111431733B (en) * | 2020-02-20 | 2021-06-22 | 拉扎斯网络科技(上海)有限公司 | Service alarm coverage information evaluation method and device |
CN114285642B (en) * | 2021-12-24 | 2023-07-18 | 苏州浪潮智能科技有限公司 | Control management method and device for host sensitive service and port in cloud platform |
CN114826871B (en) * | 2022-02-23 | 2024-04-12 | 浪潮软件集团有限公司 | Cloud platform monitoring alarm processing function test method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101695049A (en) * | 2009-11-10 | 2010-04-14 | 杭州华三通信技术有限公司 | Method and device for processing businesses in monitoring system |
CN103841100A (en) * | 2014-02-18 | 2014-06-04 | 河海大学 | System for having access to flood-prevention early warning service based on Android tablet terminal and construction method |
CN104184819A (en) * | 2014-08-29 | 2014-12-03 | 城云科技(杭州)有限公司 | Multi-hierarchy load balancing cloud resource monitoring method |
CN104410512A (en) * | 2014-10-28 | 2015-03-11 | 国云科技股份有限公司 | Resource monitoring alarm framework suitable for cloud computation and method thereof |
CN104657250A (en) * | 2014-12-16 | 2015-05-27 | 无锡华云数据技术服务有限公司 | Monitoring method for monitoring performance of cloud host |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007048653A2 (en) * | 2005-10-26 | 2007-05-03 | International Business Machines Corporation | A method and system for systems management tasks on endpoints |
Also Published As
Publication number | Publication date |
---|---|
CN105376100A (en) | 2016-03-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105376100B (en) | A kind of distributed warning rule evaluation method suitable for cloud platform monitoring resource | |
CN109471705B (en) | Task scheduling method, device and system, and computer device | |
CN111049705B (en) | Method and device for monitoring distributed storage system | |
US9870269B1 (en) | Job allocation in a clustered environment | |
CN105471671A (en) | Method for customizing monitoring rules of cloud platform resources | |
EP2503733B1 (en) | Data collecting method, data collecting apparatus and network management device | |
US20180004568A1 (en) | Distributed task system and service processing method based on internet of things | |
US9588813B1 (en) | Determining cost of service call | |
CN103905533A (en) | Distributed type alarm monitoring method and system based on cloud storage | |
CN106713396B (en) | Server scheduling method and system | |
US9535749B2 (en) | Methods for managing work load bursts and devices thereof | |
CN110716800B (en) | Task scheduling method and device, storage medium and electronic equipment | |
CN110928655A (en) | Task processing method and device | |
CN105159769A (en) | Distributed job scheduling method suitable for heterogeneous computational capability cluster | |
CN111880939A (en) | Container dynamic migration method and device and electronic equipment | |
CN110727508A (en) | Task scheduling system and scheduling method | |
CN106034047B (en) | Data processing method and device | |
Simoncelli et al. | Stream-monitoring with blockmon: convergence of network measurements and data analytics platforms | |
Thamsen et al. | Mary, Hugo, and Hugo*: Learning to schedule distributed data‐parallel processing jobs on shared clusters | |
Dayarathna et al. | Energy consumption analysis of data stream processing: a benchmarking approach | |
CN107391262B (en) | Job scheduling method and device | |
CN107609129B (en) | Log real-time processing system | |
CN110333930A (en) | Digital Platform system | |
CN109086128A (en) | Method for scheduling task and device | |
JP2014115684A (en) | System resources managing method for virtual systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 523808 19th Floor, Cloud Computing Center, Chinese Academy of Sciences, No. 1 Kehui Road, Songshan Lake Hi-tech Industrial Development Zone, Dongguan City, Guangdong Province; Applicant after: G-Cloud Technology Co., Ltd.; Address before: 523808 No. 14 Building, Songke Garden, Songshan Lake Science and Technology Industrial Park, Dongguan City, Guangdong Province; Applicant before: G-Cloud Technology Co., Ltd. |
GR01 | Patent grant | ||