CN105376100B - A distributed alarm rule evaluation method suitable for cloud platform resource monitoring - Google Patents


Info

Publication number
CN105376100B
Authority
CN
China
Prior art keywords
service
alarm
responsible
alarm regulation
warning rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510902578.0A
Other languages
Chinese (zh)
Other versions
CN105376100A (en)
Inventor
马桂成 (Ma Guicheng)
杨松 (Yang Song)
季统凯 (Ji Tongkai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-09
Filing date
2015-12-09
Publication date
2019-05-21
Application filed by G Cloud Technology Co Ltd
Priority to CN201510902578.0A
Publication of CN105376100A
Application granted
Publication of CN105376100B
Current legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The present invention relates to the technical field of cloud platform resource monitoring, and in particular to a distributed alarm rule evaluation method suitable for cloud platform resource monitoring. The invention first collects monitoring data periodically and then sets monitoring rules; next, all distributed alarm rule evaluation services are started. Each alarm rule evaluation service then broadcasts its own service status information and determines whether its own start time is the earliest: if so, it acts as the master service and executes the master alarm rule evaluation process; otherwise it executes the non-master alarm rule evaluation process. The invention overcomes the shortcomings of having a single service check all alarm rules: weak processing capacity, processing delay, failure of the entire cloud platform's resource monitoring service when that single alarm rule evaluation service exits abnormally, and inability to meet the strict high-availability requirements of resource monitoring in production environments. It can be applied in the field of cloud computing resource monitoring.

Description

A distributed alarm rule evaluation method suitable for cloud platform resource monitoring
Technical field
The present invention relates to the technical field of cloud platform resource monitoring, and in particular to a distributed alarm rule evaluation method suitable for cloud platform resource monitoring.
Background art
Cloud computing resources are extremely numerous, and the cloud platform is responsible for monitoring all of them. When facing many different application scenarios, a great deal of manpower and material resources must be spent developing cloud platform business functions to support monitoring processes that are similar in outline but differ in resource type and detail, and such platforms cannot respond quickly to user demand. This causes the following problems:
First, repeated development costs money and time: every resource type to be monitored requires development specific to that resource.
Second, resource monitoring cannot adapt to the needs of different usage scenarios.
Third, monitoring of a specific resource cannot be suspended dynamically, so unnecessary duplicate email notifications are often sent, disrupting users' normal work.
Fourth, secondary development is inefficient, and business processes cannot be shared.
Fifth, a conventional single monitoring service has weak processing capacity.
Sixth, processing delay is large when a single service handles a large number of monitoring rules.
Seventh, if the single alarm rule evaluation service exits abnormally, the resource monitoring service of the entire cloud platform fails, so the strict high-availability requirements of resource monitoring services in production environments cannot be met.
To meet the resource monitoring needs of various business scenarios at lower cost, a distributed alarm rule evaluation method suitable for cloud platform resource monitoring is needed. By deploying multiple distributed alarm rule evaluation services, users can monitor a wide variety of cloud platform resources and process monitoring rules in a distributed manner, effectively solving the problems of a single monitoring service: weak processing capacity, large processing delay, and lack of high availability.
Summary of the invention
The technical problem solved by the present invention is to provide a distributed alarm rule evaluation method suitable for cloud platform resource monitoring, which addresses the following problems: a single monitoring service has weak alarm rule evaluation capacity; a single monitoring service has a large delay when processing alarm rules; a single service cannot meet users' high-availability requirements for the monitoring service; and an abnormal exit of a single service causes the entire cloud platform monitoring service to fail.
The technical solution adopted by the present invention to solve the above technical problems is as follows.
The method comprises the following steps:
Step 1: Periodically collect monitoring data;
Step 2: Set monitoring rules;
Step 3: Start all distributed alarm rule evaluation services;
Step 4: Each alarm rule evaluation service broadcasts its own service status information;
Step 5: Determine whether its own service start time is the earliest; if so, execute step 6, otherwise execute step 12;
Step 6: Set itself as the master service;
Step 7: Query the total number of monitoring rules and, according to an allocation algorithm, assign the lists of alarm rules for which itself and all non-master services are responsible;
Step 8: Poll the list of alarm rules it is responsible for;
Step 9: Perform alarm evaluation and alarm tasks;
Step 10: Broadcast its own service status information;
Step 11: Enter the next cycle and execute step 7;
Step 12: Set itself as a non-master service;
Step 13: Poll the list of alarm rules it is responsible for;
Step 14: Perform alarm evaluation and alarm tasks;
Step 15: Check whether the master service is alive; if so, execute step 16, otherwise execute step 5;
Step 16: Broadcast its own service status information;
Step 17: Enter the next cycle and execute step 13.
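Steps 4 to 6 and 12 amount to a leader election by earliest start time. The Python sketch below shows that decision under stated assumptions: the `ServiceStatus` fields, the functions `elect_master` and `role_of`, and the tie-break on `service_id` are all hypothetical, since the patent only speaks of "service status information" and does not say how equal start times are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceStatus:
    """Status a service broadcasts in steps 4, 10 and 16 (fields are hypothetical)."""
    service_id: str
    start_time: float  # e.g. a Unix timestamp taken when the service started


def elect_master(statuses: list[ServiceStatus]) -> str:
    """Step 5: the service with the earliest start time becomes the master.

    Ties on start_time are broken by service_id; this tie-break is an
    assumption, the patent does not specify one.
    """
    if not statuses:
        raise ValueError("no service has broadcast its status yet")
    earliest = min(statuses, key=lambda s: (s.start_time, s.service_id))
    return earliest.service_id


def role_of(my_id: str, statuses: list[ServiceStatus]) -> str:
    """Steps 6 and 12: decide which process this service executes."""
    return "master" if elect_master(statuses) == my_id else "non-master"


if __name__ == "__main__":
    seen = [
        ServiceStatus("svc-b", 100.0),
        ServiceStatus("svc-a", 120.0),
        ServiceStatus("svc-c", 100.0),
    ]
    print(elect_master(seen))      # svc-b (earliest start, wins the tie on id)
    print(role_of("svc-a", seen))  # non-master
```

Because every service applies the same deterministic rule, all services agree on the master as long as they have received the same set of broadcasts; the patent does not discuss what happens when services see different sets.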
According to business needs, the user sets the content of an alarm rule: the monitored item, filter conditions, data statistics method, start time, end time, statistics time interval, threshold comparison method, alarm-triggered action, whether to repeat the alarm-triggered action, and whether the alarm rule is in effect.
The monitored items include CPU utilization, CPU load, disk utilization, network uplink bytes, network uplink rate, network downlink bytes, network downlink rate, disk I/O read/write bytes per second, node network connection status, application service port connection status, physical machine temperature, motherboard fan speed, node uptime, utilization of each cloud storage pool, overall cloud storage utilization, and the total number of images of each type.
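The rule content listed above maps naturally onto a record type. The sketch below is one possible shape, assuming Python dataclasses; the class and field names, the action names "log", "url" and "email", and the validation checks are hypothetical illustrations rather than the patent's data model, and only a few of the monitored items are enumerated.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MonitoredItem(Enum):
    """A few of the monitored items listed above; the value strings are hypothetical."""
    CPU_UTILIZATION = "cpu_utilization"
    CPU_LOAD = "cpu_load"
    DISK_UTILIZATION = "disk_utilization"
    NETWORK_UPLINK_RATE = "network_uplink_rate"
    NETWORK_DOWNLINK_RATE = "network_downlink_rate"
    STORAGE_POOL_UTILIZATION = "storage_pool_utilization"


ALLOWED_ACTIONS = frozenset({"log", "url", "email"})  # the three alarm task kinds


@dataclass
class AlarmRule:
    """One alarm rule holding the fields the user sets (names are hypothetical)."""
    rule_id: int
    item: MonitoredItem
    statistic: str             # data statistics method, e.g. "avg" or "max"
    start_time: float          # start of the evaluation window
    end_time: float            # end of the evaluation window
    interval_seconds: int      # time interval of the statistics
    comparison: str            # threshold comparison method, e.g. ">"
    threshold: float
    actions: tuple[str, ...]   # alarm-triggered actions
    filter_condition: dict[str, str] = field(default_factory=dict)
    repeat_actions: bool = False  # whether to repeat the triggered actions
    enabled: bool = True          # whether the rule is in effect
    callback_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject rules that could never be evaluated sensibly.
        if self.end_time <= self.start_time:
            raise ValueError("end_time must be later than start_time")
        if self.interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        unknown = set(self.actions) - ALLOWED_ACTIONS
        if unknown:
            raise ValueError(f"unknown alarm action(s): {sorted(unknown)}")
        if "url" in self.actions and not self.callback_url:
            raise ValueError("a 'url' action needs a callback_url")
```

Validating in `__post_init__` means a malformed rule is rejected when it is set (step 2) rather than failing later inside an evaluation cycle.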
In step 7, the master service queries the total number of alarm rules and takes the number of non-master services into account: if the rules can be divided evenly, they are assigned to the non-master services; if not, the vast majority of the alarm rules are assigned to the non-master services and the master service is responsible for a small portion itself, and the number of alarm rules it is responsible for must be fewer than the number each non-master service is responsible for.
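One way to realize the allocation rule of step 7 is a `divmod` split: each non-master service receives an equal share and the master keeps the remainder. The sketch below assumes that reading; `allocate_rules` is a hypothetical name, and the two edge-case branches (no non-master services, and a remainder that is not smaller than a share, e.g. 5 rules over 3 services) are assumptions because the patent does not cover them.

```python
def allocate_rules(
    rule_ids: list[int], non_master_ids: list[str]
) -> tuple[list[int], dict[str, list[int]]]:
    """Return (master_rules, {non_master_id: rules}) following step 7.

    If the rules divide evenly among the non-master services, the master keeps
    none. Otherwise each non-master gets an equal share and the master keeps the
    remainder, which must be smaller than each share. The branches for "no
    non-master services" and "remainder not smaller than a share" are
    assumptions; the patent does not cover them.
    """
    k = len(non_master_ids)
    if k == 0:
        return list(rule_ids), {}  # master alone: it evaluates every rule
    share, remainder = divmod(len(rule_ids), k)
    plan: dict[str, list[int]] = {}
    if remainder == 0 or remainder < share:
        for i, sid in enumerate(non_master_ids):
            plan[sid] = rule_ids[i * share:(i + 1) * share]
        return rule_ids[k * share:], plan  # empty when the split is even
    # Too few rules to keep the master's share strictly smaller:
    # spread them round-robin over the non-master services only.
    plan = {sid: [] for sid in non_master_ids}
    for i, rid in enumerate(rule_ids):
        plan[non_master_ids[i % k]].append(rid)
    return [], plan
```

For example, `allocate_rules(list(range(1, 11)), ["a", "b", "c"])` gives the master `[10]` and the non-master services `[1, 2, 3]`, `[4, 5, 6]` and `[7, 8, 9]`. Keeping the master's share small leaves it headroom for the coordination work of steps 7 and 10.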
During alarm evaluation and alarm task execution, the statistical result of the monitoring data is queried according to the rule, and the threshold of each rule is compared with its statistical result to determine whether the alarm condition is met; if it is, an alarm task is triggered. Alarm tasks generally include writing an alarm log record, calling the URL address provided by the user in the rule, and sending an email notification, and one or more of these tasks can be selected for execution.
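The evaluate-then-dispatch logic just described can be sketched as follows. This is a minimal illustration, assuming the statistic is computed in-process over a list of samples; the names `Rule`, `evaluate`, `dispatch`, `COMPARATORS` and `STATISTICS`, the supported operator and statistic sets, and the "no data means no alarm" choice are all hypothetical.

```python
import operator
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Threshold comparison methods and statistics a rule may name (assumed sets).
COMPARATORS: dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}
STATISTICS: dict[str, Callable[[list[float]], float]] = {
    "avg": mean, "max": max, "min": min,
}


@dataclass
class Rule:
    rule_id: int
    statistic: str            # key into STATISTICS
    comparison: str           # key into COMPARATORS
    threshold: float
    actions: tuple[str, ...]  # any of "log", "url", "email"


def evaluate(rule: Rule, samples: list[float]) -> bool:
    """Compute the rule's statistic over the samples and compare it with the threshold."""
    if not samples:
        return False  # no data in the window: treated as "no alarm" (assumption)
    value = STATISTICS[rule.statistic](samples)
    return COMPARATORS[rule.comparison](value, rule.threshold)


def dispatch(rule: Rule, handlers: dict[str, Callable[[Rule], None]]) -> list[str]:
    """Run every alarm task the rule selects and return their names in order."""
    for action in rule.actions:
        handlers[action](rule)
    return list(rule.actions)
```

In a real deployment `handlers` would map "log" to an alarm-log writer, "url" to an HTTP call to the user's address, and "email" to a mail sender; those integrations are deliberately left out of the sketch.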
In step 15, each non-master service periodically checks whether the master service is alive; if it is, the non-master service process continues; otherwise a new master service is selected.
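The patent does not say how a non-master service decides that the master is alive. A common approach, assumed here, is a heartbeat timeout on the status broadcasts of steps 10 and 16: if no broadcast has arrived from the master within `timeout` seconds, the non-master returns to step 5 and re-runs the earliest-start-time election over the services still broadcasting. `Peer`, `is_alive`, `reelect` and `non_master_check` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    service_id: str
    start_time: float
    last_heartbeat: float  # arrival time of the latest status broadcast from this peer


def is_alive(peer: Peer, now: float, timeout: float) -> bool:
    """Liveness test (assumed): alive if a broadcast arrived within `timeout` seconds."""
    return now - peer.last_heartbeat <= timeout


def reelect(peers: list[Peer], now: float, timeout: float) -> Optional[str]:
    """Back to step 5: among the peers still alive, the earliest start time wins."""
    alive = [p for p in peers if is_alive(p, now, timeout)]
    if not alive:
        return None
    return min(alive, key=lambda p: (p.start_time, p.service_id)).service_id


def non_master_check(master: Peer, peers: list[Peer], now: float, timeout: float) -> Optional[str]:
    """One step-15 check: keep the current master, or pick a new one if it has gone quiet."""
    if is_alive(master, now, timeout):
        return master.service_id
    return reelect(peers, now, timeout)
```

The timeout should be comfortably longer than one evaluation cycle, otherwise a master that is merely slow would be replaced.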
Through distributed alarm rule evaluation, the present invention avoids the weak processing capacity of a single alarm rule evaluation service, avoids the large delay of a single monitoring service processing alarm rules, and avoids the inability of a single service to meet users' high-availability requirements for the monitoring service. If users need higher availability for alarm evaluation, they only need to start more distributed alarm rule evaluation services. Monitoring of various cloud platform resources is thus achieved at lower cost: operations staff combine the monitoring rules they need for their own business scenarios without further development, responding quickly to user demand. The present invention differs from general rule-based cloud platform monitoring systems, which have the shortcoming of being unable to use distributed alarm rule evaluation.
Description of the drawings
The present invention is further described below with reference to the drawings:
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a logical structure diagram of the present invention.
Detailed description of the embodiments
The present invention has many embodiments; one of them is illustrated here using a private cloud platform as an example. As shown in Figs. 1 and 2, the specific implementation process of the present invention is as follows:
1. Periodically collect monitoring data.
2. Set monitoring rules.
3. Start all distributed alarm rule evaluation services.
4. Each alarm rule evaluation service broadcasts its own service status information.
5. Determine whether its own service start time is the earliest; if so, execute the master service process, otherwise execute the non-master service process.
6. Set itself as the master service.
7. Query the total number of monitoring rules and, according to the allocation algorithm, assign the lists of alarm rules for which itself and all non-master services are responsible.
8. Poll the list of alarm rules it is responsible for.
9. Perform alarm evaluation and alarm tasks.
10. Broadcast its own service status information, calling the method of step 4.
11. Enter the next cycle, calling the method of step 7.
12. Set itself as a non-master service.
13. Poll the list of alarm rules it is responsible for, calling the method of step 8.
14. Perform alarm evaluation and alarm tasks, calling the method of step 9.
16. Broadcast its own service status information, calling the method of step 4.
17. Enter the next cycle and continue executing the non-master service process.
18. The whole process ends.
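To show how the master and non-master processes interact across cycles, including failover, the embodiment can be simulated in a single process. This is a simplified sketch under assumptions: `Service` and `run_cycle` are hypothetical, liveness is a boolean flag rather than real status broadcasts, and rules are spread round-robin over the non-master services instead of following the exact allocation rule of step 7.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    service_id: str
    start_time: float
    alive: bool = True  # stands in for "still broadcasting its status"
    assigned: list[int] = field(default_factory=list)


def run_cycle(services: list[Service], rule_ids: list[int]) -> str:
    """Simulate one cycle of the embodiment and return the master's id."""
    alive = [s for s in services if s.alive]
    if not alive:
        raise RuntimeError("no alarm rule evaluation service is running")
    # Items 4-6 and 12: the earliest start time among live services is master.
    master = min(alive, key=lambda s: (s.start_time, s.service_id))
    others = [s for s in alive if s is not master]
    for s in services:
        s.assigned = []  # rules are reassigned from scratch every cycle
    # Item 7 (simplified): round-robin over the non-masters; a lone master
    # keeps every rule.
    targets = others or [master]
    for i, rid in enumerate(rule_ids):
        targets[i % len(targets)].assigned.append(rid)
    # Items 8-9 and 13-14 would poll and evaluate each service's `assigned` list here.
    return master.service_id


if __name__ == "__main__":
    cluster = [Service("a", 100.0), Service("b", 110.0), Service("c", 120.0)]
    rules = [1, 2, 3, 4, 5, 6]
    print(run_cycle(cluster, rules), [s.assigned for s in cluster])
    # a [[], [1, 3, 5], [2, 4, 6]]
    cluster[0].alive = False  # the master exits abnormally
    print(run_cycle(cluster, rules), [s.assigned for s in cluster])
    # b [[], [], [1, 2, 3, 4, 5, 6]]
```

The second cycle illustrates the availability argument of the patent: when the master exits, the next-earliest service takes over and every rule is still evaluated.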

Claims (5)

1. A distributed alarm rule evaluation method suitable for cloud platform resource monitoring, characterized in that the method comprises the following steps:
Step 1: Periodically collect monitoring data;
Step 2: Set alarm rules;
Step 3: Start all distributed alarm rule evaluation services;
Step 4: Each alarm rule evaluation service broadcasts its own service status information;
Step 5: Determine whether its own service start time is the earliest; if so, execute step 6, otherwise execute step 12;
Step 6: Set itself as the master service;
Step 7: Query the total number of alarm rules and, according to an algorithm, assign the lists of alarm rules for which itself and all non-master services are responsible;
Step 8: Poll the list of alarm rules it is responsible for;
Step 9: Perform alarm evaluation and alarm tasks, wherein the statistical result of the monitoring data is queried according to the rule, the threshold of each rule is compared with its statistical result to determine whether the alarm condition is met, and an alarm task is triggered if it is; alarm tasks generally include writing an alarm log record, calling the URL address provided by the user in the rule, and sending an email notification, and one or more of these tasks can be selected for execution;
Step 10: Broadcast its own service status information;
Step 11: Enter the next cycle and execute step 7;
Step 12: Set itself as a non-master service;
Step 13: Poll the list of alarm rules it is responsible for;
Step 14: Perform alarm evaluation and alarm tasks;
Step 15: Check whether the master service is alive; if so, execute step 16, otherwise execute step 5;
Step 16: Broadcast its own service status information;
Step 17: Enter the next cycle and execute step 13.
2. The distributed alarm rule evaluation method according to claim 1, characterized in that the user sets alarm rules according to business needs, and the content of an alarm rule includes: the monitored item, filter conditions, data statistics method, start time, end time, statistics time interval, threshold comparison method, alarm-triggered action, whether to repeat the alarm-triggered action, and whether the rule is in effect;
The monitored items include CPU utilization, CPU load, disk utilization, network uplink bytes, network uplink rate, network downlink bytes, network downlink rate, disk I/O read/write bytes per second, node network connection status, application service port connection status, physical machine temperature, motherboard fan speed, node uptime, utilization of each cloud storage pool, overall cloud storage utilization, and the total number of images.
3. The distributed alarm rule evaluation method according to claim 1, characterized in that in step 7 the master service queries the total number of alarm rules and, according to the number of non-master services, assigns them to the non-master services if they can be divided evenly; if not, it assigns the vast majority of the alarm rules to the non-master services and is itself responsible for a small portion, and the number of alarm rules it is responsible for must be fewer than the number each non-master service is responsible for.
4. The distributed alarm rule evaluation method according to claim 2, characterized in that in step 7 the master service queries the total number of alarm rules and, according to the number of non-master services, assigns them to the non-master services if they can be divided evenly; if not, it assigns the vast majority of the alarm rules to the non-master services and is itself responsible for a small portion, and the number of alarm rules it is responsible for must be fewer than the number each non-master service is responsible for.
5. The distributed alarm rule evaluation method according to any one of claims 1 to 4, characterized in that step 15 is: each non-master service periodically checks whether the master service is alive; if so, it continues executing the non-master service process; otherwise a new master service is selected.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510902578.0A CN105376100B (en) 2015-12-09 2015-12-09 A kind of distributed warning rule evaluation method suitable for cloud platform monitoring resource

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510902578.0A CN105376100B (en) 2015-12-09 2015-12-09 A kind of distributed warning rule evaluation method suitable for cloud platform monitoring resource

Publications (2)

Publication Number Publication Date
CN105376100A CN105376100A (en) 2016-03-02
CN105376100B true CN105376100B (en) 2019-05-21

Family

ID=55377927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510902578.0A Active CN105376100B (en) 2015-12-09 2015-12-09 A kind of distributed warning rule evaluation method suitable for cloud platform monitoring resource

Country Status (1)

Country Link
CN (1) CN105376100B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106301919A (en) * 2016-08-17 2017-01-04 浪潮电子信息产业股份有限公司 The warning system of a kind of privatization cloud platform and its implementation
CN107453951A (en) * 2017-08-15 2017-12-08 郑州云海信息技术有限公司 A kind of storage pool monitoring method and device
CN108270618B (en) * 2017-12-30 2021-07-16 华为技术有限公司 Alarm determination method, device and alarm system
CN108833414B (en) * 2018-06-20 2019-03-15 重庆市地理信息中心 A kind of online service abnormality monitoring method
CN108920327A (en) * 2018-06-27 2018-11-30 郑州云海信息技术有限公司 A kind of cloud computing alarm method and device
CN109710486A (en) * 2018-11-28 2019-05-03 国云科技股份有限公司 A method of the customized example warning strategies based on cloudy platform
CN109728938A (en) * 2018-12-11 2019-05-07 国云科技股份有限公司 A kind of method of assessment system service level
CN110933512B (en) * 2019-10-23 2022-05-06 视联动力信息技术股份有限公司 Load determination method and device based on video network
CN111431733B (en) * 2020-02-20 2021-06-22 拉扎斯网络科技(上海)有限公司 Service alarm coverage information evaluation method and device
CN114285642B (en) * 2021-12-24 2023-07-18 苏州浪潮智能科技有限公司 Control management method and device for host sensitive service and port in cloud platform
CN114826871B (en) * 2022-02-23 2024-04-12 浪潮软件集团有限公司 Cloud platform monitoring alarm processing function test method and system

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101695049A (en) * 2009-11-10 2010-04-14 杭州华三通信技术有限公司 Method and device for processing businesses in monitoring system
CN103841100A (en) * 2014-02-18 2014-06-04 河海大学 System for having access to flood-prevention early warning service based on Android tablet terminal and construction method
CN104184819A (en) * 2014-08-29 2014-12-03 城云科技(杭州)有限公司 Multi-hierarchy load balancing cloud resource monitoring method
CN104410512A (en) * 2014-10-28 2015-03-11 国云科技股份有限公司 Resource monitoring alarm framework suitable for cloud computation and method thereof
CN104657250A (en) * 2014-12-16 2015-05-27 无锡华云数据技术服务有限公司 Monitoring method for monitoring performance of cloud host

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2007048653A2 (en) * 2005-10-26 2007-05-03 International Business Machines Corporation A method and system for systems management tasks on endpoints

Also Published As

Publication number Publication date
CN105376100A (en) 2016-03-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 523808 19th Floor, Cloud Computing Center, Chinese Academy of Sciences, No. 1 Kehui Road, Songshan Lake Hi-tech Industrial Development Zone, Dongguan City, Guangdong Province

Applicant after: G-Cloud Technology Co., Ltd.

Address before: 523808 No. 14 Building, Songke Garden, Songshan Lake Science and Technology Industrial Park, Dongguan City, Guangdong Province

Applicant before: G-Cloud Technology Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant