CN112866020A - Cloud center intelligent alarm processing system and method - Google Patents

Publication number
CN112866020A
Authority
CN
China
Prior art keywords
alarm
resource
index
indexes
duty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110036592.2A
Other languages
Chinese (zh)
Inventor
杨继伟
魏金雷
于颜华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-12
Filing date
2021-01-12
Publication date
2021-05-28
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110036592.2A
Publication of CN112866020A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677 Localisation of faults
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a cloud center intelligent alarm processing system and method, belonging to the technical field of IT operation and maintenance. The system comprises resource management, index definition, duty management, alarm rules, alarm view templates, alarms and alarm views. Resource management manages the resources, which include virtual machines, physical machines, switches, storage devices, virtual devices, middleware and application software. Index definition defines the indexes, which are the items collected by data acquisition and include CPU, memory and network access traffic. Duty management schedules the daily on-duty personnel. When an abnormality occurs, an alarm is triggered in real time according to the alarm rules, and an alarm view is generated from the alarm view template; alarm view templates correspond one-to-one to alarm rules. The system helps to troubleshoot problems quickly, find the root cause of faults and improve operation and maintenance efficiency, and is well suited to wide adoption.

Description

Cloud center intelligent alarm processing system and method
Technical Field
The invention relates to the technical field of IT operation and maintenance, and particularly provides a cloud center intelligent alarm processing system and method.
Background
With the rapid development of cloud computing, cloud centers are being built in many places. A cloud center is equipped with a cloud management system that keeps its services running normally, and with an operation and maintenance management system that keeps the system stable over the long term. The operation and maintenance management system generally manages the various devices in the cloud center. When the system becomes abnormal, discovering and resolving the abnormality in time is a difficult problem for the operation and maintenance management system.
Existing cloud center operation and maintenance management systems depend heavily on operation and maintenance personnel, and the monitoring data of each resource is collected separately, with no connection between the data sets. When an alarm or a fault occurs, the problem has to be investigated manually through many separate searches, which costs time and labor.
Disclosure of Invention
The technical task of the invention is to provide a cloud center intelligent alarm processing system that associates the resources of the cloud center with one another and also associates alarms with one another, which helps to troubleshoot problems quickly, find the root cause of faults and improve operation and maintenance efficiency.
The invention further provides a cloud center intelligent alarm processing method.
To achieve this purpose, the invention provides the following technical solution:
A cloud center intelligent alarm processing system comprises resource management, index definition, duty management, alarm rules, alarm view templates, alarms and alarm views. Resource management manages the resources, which include virtual machines, physical machines, switches, storage devices, virtual devices, middleware and application software. Index definition defines the indexes, which are the items collected by data acquisition and include CPU, memory and network access traffic. Duty management schedules the daily on-duty personnel. When an abnormality occurs, an alarm is triggered in real time according to the alarm rules. An alarm view is generated from the alarm view template, and alarm view templates correspond one-to-one to alarm rules. An alarm is triggered when an index is outside its threshold range, and the alarm view is generated according to the alarm rule associated with the alarm information.
Preferably, the attributes of a resource in resource management include a code, a name, a resource type and associated resources. Resources are either entered into the system manually or discovered and entered automatically. Each resource can have its own specific attributes; the attributes are persisted in a database, and each kind of resource can be maintained in its own table. The associated resources of a resource are the resource it belongs to and the resources it is connected to. For example, if a virtual machine runs on a physical machine, the physical machine is the resource the virtual machine belongs to, so the physical machine is an associated resource of the virtual machine. Likewise, if a physical machine is connected to the network through a switch, the switch is one of the associated resources of the physical machine. One resource can have multiple associated resources, and associated resources are configured simply by configuring the associated resource type.
Preferably, the attributes of an index in index definition include a code, a name, the resource to which the index belongs, and a unit. The code in the index definition must be identical to the code that the data acquisition layer uses for that index.
Preferably, an alarm rule comprises a rule name, the index used by the rule, the threshold of the index, and a processing suggestion for when the index is outside the threshold range. An alarm rule may define the alarm levels severe, primary, secondary, general and warning, with an alarm threshold set for each level.
Preferably, the alarm view template comprises associated indexes, an index display form, a time range and an on-duty setting.
An associated index is an index associated with the index in the alarm rule. If the index in the alarm rule is a.1 and it belongs to resource A, an associated index may be another index of resource A or an index of another resource. The other resources are generally the associated resources of resource A, that is, resources that resource A belongs to or is connected to. Any index that has some business connection with the alarm and can help investigate it may be set as an associated index. When associated indexes are configured, candidates are taken first from the other indexes of resource A and from the indexes of the resources associated with resource A. The index display form is the form in which an associated index is displayed, for example a graph or a histogram. The time range is the period for which the performance data of the associated indexes is shown; it may be set to the hour before the alarm was generated, the two hours before it, and so on. The on-duty setting refers to the on-duty personnel, and their contact information, within that time range.
Duty management is an indispensable module of the cloud center, through which the cloud center schedules its daily on-duty personnel. When the cloud center system raises an alarm, or a fault cannot be repaired automatically, duty management makes it possible to find and contact the person on duty that day as quickly as possible, so that the problem is solved promptly and the cloud center system keeps running healthily. The duty management of the invention mainly maintains the schedule of daily operation and maintenance duty personnel, who can be contacted quickly by telephone or mail.
A cloud center intelligent alarm processing method manages resources and configures associated resources for them; defines indexes for each resource, the indexes being consistent with the indexes used for data acquisition; maintains the on-duty personnel of each day and their contact information through duty management; configures alarm rules so that, when performance data is outside the threshold range, an alarm is triggered and an alarm processing suggestion is given; and dynamically generates an alarm view from an alarm view template, the view displaying performance charts of the associated indexes within a specified time range, showing whether those indexes raised alarms, listing the on-duty personnel and their contact information for that time range, and giving processing suggestions.
Preferably, associated resources are configured for each resource, each index belongs to one resource, an alarm rule is set for the index, the alarm rule corresponds to an alarm view template, and an alarm view is dynamically generated when an alarm is triggered.
Preferably, each resource is configured with a resource type, and its associated resources are the resources that it belongs to or is connected to.
Compared with the prior art, the cloud center intelligent alarm processing method has the following outstanding beneficial effects: by associating resources, associating alarms and constructing an alarm view, the method helps to locate problems quickly when the cloud center system is abnormal, find the root cause of faults, improve operation and maintenance efficiency and keep the system running stably over the long term, and it is well suited to wide adoption.
Drawings
FIG. 1 is an architecture diagram of the cloud center intelligent alarm processing system according to the present invention;
FIG. 2 is a topology diagram of resource indexes and resource associations in the cloud center intelligent alarm processing system according to the present invention;
FIG. 3 is a structural relationship diagram of the cloud center intelligent alarm processing system according to the present invention.
Detailed Description
The cloud center intelligent alarm processing system and method of the present invention will be further described in detail with reference to the accompanying drawings and embodiments.
Examples
As shown in FIG. 1, the cloud center intelligent alarm processing system of the present invention includes resource management, index definition, duty management, alarm rules, alarm view templates, alarms and alarm views.
Resource management manages the resources, which include virtual machines, physical machines, switches, storage devices, virtual devices, middleware and application software. The attributes of a resource include a code, a name, a resource type and associated resources. Resources are either entered into the system manually or discovered and entered automatically. Each resource can have its own specific attributes; the attributes are persisted in a database, and each kind of resource can be maintained in its own table. The associated resources of a resource are the resource it belongs to and the resources it is connected to. For example, if a virtual machine runs on a physical machine, the physical machine is the resource the virtual machine belongs to, so the physical machine is an associated resource of the virtual machine. Likewise, if a physical machine is connected to the network through a switch, the switch is one of the associated resources of the physical machine. One resource can have multiple associated resources, and associated resources are configured simply by configuring the associated resource type.
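The resource attributes and the association relation described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names `Resource` and `ResourceRegistry`, the field names and the sample codes such as `vm-01` are hypothetical, and associations are stored as explicit resource codes rather than derived from a configured resource type as the description suggests.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    code: str                      # unique code of the resource
    name: str
    resource_type: str             # e.g. "virtual_machine", "physical_machine", "switch"
    # Codes of the resources this one belongs to or is connected to (assumed representation).
    associated_codes: List[str] = field(default_factory=list)
    # Resource-specific attributes that would be persisted alongside the common ones.
    extra: Dict[str, str] = field(default_factory=dict)


class ResourceRegistry:
    """In-memory stand-in for the database the description mentions."""

    def __init__(self) -> None:
        self._by_code: Dict[str, Resource] = {}

    def add(self, resource: Resource) -> None:
        self._by_code[resource.code] = resource

    def associated(self, code: str) -> List[Resource]:
        """Return the registered resources associated with the given resource."""
        owner = self._by_code[code]
        return [self._by_code[c] for c in owner.associated_codes if c in self._by_code]


registry = ResourceRegistry()
registry.add(Resource("sw-01", "access switch", "switch"))
registry.add(Resource("pm-01", "host 1", "physical_machine", ["sw-01"]))
registry.add(Resource("vm-01", "web server", "virtual_machine", ["pm-01"]))

vm_related = [r.code for r in registry.associated("vm-01")]   # the physical machine
pm_related = [r.code for r in registry.associated("pm-01")]   # the switch
```

Here the virtual machine resolves to its physical machine and the physical machine to its switch, mirroring the two examples in the paragraph above.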
Index definition defines the indexes. An index is an item collected by data acquisition, such as CPU, memory or network access traffic. As shown in FIG. 2, each resource may have multiple indexes: resource A has indexes a.1, a.2 and a.3; resource B has indexes b.1 and b.2; resource C has indexes c.1 and c.2; and resource D has index d.1. The basic attributes of an index definition are a code, a name, the resource to which the index belongs, and a unit. The code in the index definition must be identical to the code that the data acquisition layer uses for that index.
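As a rough sketch of the index definitions, the FIG. 2 layout can be written down as data and checked against the codes reported by the data acquisition layer. The index names and units below are invented for illustration, and `IndexDefinition`, `indexes_by_resource` and `unknown_to_collector` are hypothetical helpers, not names from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class IndexDefinition:
    code: str           # must match the code used by the data acquisition layer
    name: str
    resource_code: str  # resource to which the index belongs
    unit: str


# Layout taken from FIG. 2; names and units are placeholders.
definitions = [
    IndexDefinition("a.1", "CPU usage", "A", "%"),
    IndexDefinition("a.2", "memory usage", "A", "%"),
    IndexDefinition("a.3", "network traffic", "A", "Mbit/s"),
    IndexDefinition("b.1", "CPU usage", "B", "%"),
    IndexDefinition("b.2", "memory usage", "B", "%"),
    IndexDefinition("c.1", "port traffic", "C", "Mbit/s"),
    IndexDefinition("c.2", "packet loss", "C", "%"),
    IndexDefinition("d.1", "disk usage", "D", "%"),
]


def indexes_by_resource(defs: Iterable[IndexDefinition]) -> Dict[str, List[str]]:
    """Group index codes by the resource they belong to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for d in defs:
        grouped[d.resource_code].append(d.code)
    return dict(grouped)


def unknown_to_collector(defs: Iterable[IndexDefinition], collector_codes: Set[str]) -> List[str]:
    """Return defined codes that the data acquisition layer does not report."""
    return sorted(d.code for d in defs if d.code not in collector_codes)


grouped = indexes_by_resource(definitions)
missing = unknown_to_collector(definitions, {"a.1", "a.2", "a.3", "b.1", "b.2", "c.1", "c.2"})
```

A non-empty `missing` list would point to a definition whose code does not match the acquisition layer, which is the inconsistency the paragraph above rules out.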
Duty management schedules the daily on-duty personnel. It is an indispensable module of the cloud center, through which the cloud center arranges who is on duty each day. When the cloud center system raises an alarm, or a fault cannot be repaired automatically, duty management makes it possible to find and contact the person on duty that day as quickly as possible, so that the problem is solved promptly and the cloud center system keeps running healthily. The duty management of the invention mainly maintains the schedule of daily operation and maintenance duty personnel, who can be contacted quickly by telephone or mail.
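A duty schedule of this kind can be sketched as a lookup from a calendar day to the people on duty and their contact details. The classes `DutyEntry` and `DutyRoster`, the person names and the contact values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class DutyEntry:
    day: date
    person: str
    phone: str
    email: str


class DutyRoster:
    """Keeps the daily on-duty schedule and answers who is on duty on a given day."""

    def __init__(self) -> None:
        self._entries: Dict[date, List[DutyEntry]] = {}

    def assign(self, entry: DutyEntry) -> None:
        self._entries.setdefault(entry.day, []).append(entry)

    def on_duty(self, day: date) -> List[DutyEntry]:
        # Return a copy so callers cannot modify the stored schedule.
        return list(self._entries.get(day, []))


roster = DutyRoster()
roster.assign(DutyEntry(date(2021, 1, 12), "operator-1", "000-0001", "operator-1@example.com"))
roster.assign(DutyEntry(date(2021, 1, 13), "operator-2", "000-0002", "operator-2@example.com"))

today = [e.person for e in roster.on_duty(date(2021, 1, 12))]
nobody = roster.on_duty(date(2021, 1, 14))
```

An empty result, as for the last lookup, would indicate a gap in the schedule that the operations team needs to fill.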
When an abnormality occurs, an alarm is triggered in real time according to the alarm rules. An alarm rule includes a rule name, the index used by the rule, the threshold of the index, and a processing suggestion for when the index is outside the threshold range. An alarm rule may define the alarm levels severe, primary, secondary, general and warning, with an alarm threshold set for each level.
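One possible reading of a rule with per-level thresholds is sketched below. It assumes that each threshold is an upper limit, that the five levels are ordered from most to least severe as listed, and that the most severe exceeded level wins; the patent does not spell out any of these points, and `AlarmRule` and its sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed ordering, most severe first, following the order in the description.
LEVELS = ["severe", "primary", "secondary", "general", "warning"]


@dataclass
class AlarmRule:
    name: str
    index_code: str
    thresholds: Dict[str, float]  # level -> upper limit; levels may be omitted
    suggestion: str               # processing suggestion shown with the alarm

    def evaluate(self, value: float) -> Optional[str]:
        """Return the most severe level whose limit is exceeded, or None if in range."""
        for level in LEVELS:
            limit = self.thresholds.get(level)
            if limit is not None and value > limit:
                return level
        return None


# Rule wa.1 on index a.1 with suggestion ha.1; the limits are illustrative only.
rule = AlarmRule(
    name="wa.1",
    index_code="a.1",
    thresholds={"severe": 95.0, "secondary": 85.0, "warning": 70.0},
    suggestion="ha.1",
)

levels = [rule.evaluate(v) for v in (96.0, 90.0, 75.0, 50.0)]
```

A rule that also needs a lower limit could store a pair of bounds per level instead of a single number; the evaluation loop would stay the same.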
An alarm view is generated from the alarm view template, and alarm view templates correspond one-to-one to alarm rules. The alarm view template comprises associated indexes, an index display form, a time range and an on-duty setting. An associated index is an index associated with the index in the alarm rule. If the index in the alarm rule is a.1 and it belongs to resource A, an associated index may be another index of resource A or an index of another resource. The other resources are generally the associated resources of resource A, that is, resources that resource A belongs to or is connected to. Any index that has some business connection with the alarm and can help investigate it may be set as an associated index. When associated indexes are configured, candidates are taken first from the other indexes of resource A and from the indexes of the resources associated with resource A. The index display form is the form in which an associated index is displayed, for example a graph or a histogram. The time range is the period for which the performance data of the associated indexes is shown; it may be set to the hour before the alarm was generated, the two hours before it, and so on. The on-duty setting refers to the on-duty personnel, and their contact information, within that time range.
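The template fields and the way candidate associated indexes are offered can be sketched as follows. The sketch assumes that resource A is associated with resource B, which the text does not state, and the names `AlarmViewTemplate` and `candidate_indexes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlarmViewTemplate:
    rule_name: str                 # one template per alarm rule
    associated_indexes: List[str]
    display_form: str              # e.g. "graph" or "histogram"
    hours_before: int              # time range: hours before the alarm was generated
    show_duty: bool                # whether on-duty personnel are shown in the view


def candidate_indexes(
    alarm_index: str,
    index_owner: Dict[str, str],
    associations: Dict[str, List[str]],
) -> List[str]:
    """Other indexes of the owning resource plus indexes of its associated resources."""
    owner = index_owner[alarm_index]
    allowed = {owner, *associations.get(owner, [])}
    return [code for code, res in index_owner.items() if res in allowed and code != alarm_index]


# Index-to-resource mapping from FIG. 2; the A -> B association is an assumption.
index_owner = {
    "a.1": "A", "a.2": "A", "a.3": "A",
    "b.1": "B", "b.2": "B",
    "c.1": "C", "c.2": "C",
    "d.1": "D",
}
associations = {"A": ["B"]}

candidates = candidate_indexes("a.1", index_owner, associations)
template = AlarmViewTemplate("wa.1", ["a.2", "b.1", "b.2"], "graph", 1, True)
```

The operator then picks the associated indexes for the template from `candidates`; indexes outside that list could still be added by hand where a business connection exists.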
According to the configured alarm rule, an alarm is triggered when an index is outside its threshold range. The alarm information is displayed in a color that depends on the alarm level set in the alarm rule, together with the threshold and the value the index had when the alarm was generated. From the alarm rule associated with the alarm information, and the alarm view template associated with that rule, the alarm view for the alarm information can be generated dynamically. Generating the alarm view draws on the performance data, the duty management system, the alarm rule and the alarm view template; the structural relationship between the alarm view and these modules is shown in FIG. 3.
The generated alarm view comprises a time range, associated index charts, on-duty personnel and alarm processing suggestions.
(1) Time range. If the time range set in the alarm view template is the hour before the alarm was generated, and the alarm was generated at 10:10:00 on day z of month y of year xxxx, the time range displayed in the alarm view is from 09:10:00 to 10:10:00 on that day.
(2) Associated index charts. According to the associated indexes and the display form set in the alarm view template, the performance data of each associated index within the time range given in (1) is read and a chart is generated dynamically. If an associated index has its own alarm rule and raised an alarm within the time range given in (1), the alarm is marked in the chart.
(3) On-duty personnel. If the on-duty setting is enabled in the alarm view template, the on-duty personnel within the time range given in (1), and their contact information, are read from the duty system.
(4) Alarm processing suggestions. Each alarm rule is configured with an alarm processing suggestion. The alarm view lists, item by item, the processing suggestion of the alarm itself and the processing suggestions of the associated indexes that also raised alarms. Suppose the alarm rule configured for index a.1 is wa.1 and its processing suggestion is ha.1. The associated indexes in the alarm view template corresponding to wa.1 are a.2, b.1 and b.2, and the processing suggestions in their alarm rules are ha.2, hb.1 and hb.2 respectively. If, when wa.1 triggers an alarm, a.2 and b.1 have raised no alarm and b.2 has raised an alarm within the time range specified by the alarm view template of wa.1, the alarm processing suggestions displayed are ha.1 and hb.2.
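Items (1) and (4) above can be worked through in a short sketch that reproduces the example with wa.1, ha.1 and hb.2. The helper names `view_window` and `suggestions_for_view`, the concrete date and the alarm timestamps are assumptions made for illustration; the patent leaves the date as a placeholder.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Window = Tuple[datetime, datetime]


def view_window(alarm_time: datetime, hours_before: int) -> Window:
    """Return (start, end) of the time range that ends at the alarm time."""
    return alarm_time - timedelta(hours=hours_before), alarm_time


def suggestions_for_view(
    own_suggestion: str,
    associated: List[str],
    suggestion_by_index: Dict[str, str],
    alarm_events: List[Tuple[str, datetime]],
    window: Window,
) -> List[str]:
    """The alarm's own suggestion plus those of associated indexes that alarmed in the window."""
    start, end = window
    shown = [own_suggestion]
    for code in associated:
        alarmed = any(c == code and start <= t <= end for c, t in alarm_events)
        if alarmed:
            shown.append(suggestion_by_index[code])
    return shown


alarm_time = datetime(2021, 1, 12, 10, 10, 0)      # hypothetical date, 10:10:00 as in item (1)
window = view_window(alarm_time, hours_before=1)   # 09:10:00 to 10:10:00

# b.2 alarmed inside the window; a.2 alarmed earlier, outside it; b.1 never alarmed.
events = [
    ("b.2", datetime(2021, 1, 12, 9, 40, 0)),
    ("a.2", datetime(2021, 1, 12, 8, 0, 0)),
]
shown = suggestions_for_view(
    "ha.1",
    ["a.2", "b.1", "b.2"],
    {"a.2": "ha.2", "b.1": "hb.1", "b.2": "hb.2"},
    events,
    window,
)
```

With these inputs `shown` contains ha.1 and hb.2, matching the example in item (4). The same window would also bound the performance data read for the charts in item (2) and the duty lookup in item (3).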
The cloud center intelligent alarm processing method manages resources and configures associated resources for them; defines indexes for each resource, the indexes being consistent with the indexes used for data acquisition; maintains the on-duty personnel of each day and their contact information through duty management; configures alarm rules so that, when performance data is outside the threshold range, an alarm is triggered and an alarm processing suggestion is given; and dynamically generates an alarm view from an alarm view template, the view displaying performance charts of the associated indexes within a specified time range, showing whether those indexes raised alarms, listing the on-duty personnel and their contact information for that time range, and giving processing suggestions.
In the method, associated resources are configured for each resource, each index belongs to one resource, an alarm rule is set for the index, the alarm rule corresponds to an alarm view template, and an alarm view is dynamically generated when an alarm is triggered. Each resource is configured with a resource type, and its associated resources are the resources that it belongs to or is connected to.
The embodiments described above are merely preferred embodiments of the present invention. Ordinary changes and substitutions made by those skilled in the art within the technical scope of the present invention fall within the protection scope of the present invention.

Claims (8)

1. A cloud center intelligent alarm processing system, characterized in that: the system comprises resource management, index definition, duty management, alarm rules, alarm view templates, alarms and alarm views, wherein the resource management manages resources, the resources comprising virtual machines, physical machines, switches, storage devices, virtual devices, middleware and application software; the index definition defines indexes, the indexes being the items collected by data acquisition and comprising CPU, memory and network access traffic; the duty management schedules daily on-duty personnel; when an abnormality occurs, an alarm is triggered in real time according to the alarm rules; an alarm view is generated from an alarm view template, the alarm view templates corresponding one-to-one to the alarm rules; and an alarm is triggered when an index is outside its threshold range, the alarm view being generated according to the alarm rule associated with the alarm information.
2. The cloud center intelligent alarm processing system according to claim 1, characterized in that: the attributes of a resource in the resource management include a code, a name, a resource type and associated resources.
3. The cloud center intelligent alarm processing system according to claim 2, characterized in that: the attributes of an index in the index definition comprise a code, a name, the resource to which the index belongs, and a unit.
4. The cloud center intelligent alarm processing system according to claim 3, characterized in that: the alarm rule comprises a rule name, the index used by the rule, the threshold of the index, and a processing suggestion for when the index is outside the threshold range.
5. The cloud center intelligent alarm processing system according to claim 4, characterized in that: the alarm view template comprises associated indexes, an index display form, a time range and an on-duty setting.
6. A cloud center intelligent alarm processing method, characterized in that: the method manages resources and configures associated resources for them; defines indexes for each resource, the indexes being consistent with the indexes used for data acquisition; maintains the on-duty personnel of each day and their contact information through duty management; configures alarm rules so that, when performance data is outside the threshold range, an alarm is triggered and an alarm processing suggestion is given; and dynamically generates an alarm view from an alarm view template, the view displaying performance charts of the associated indexes within a specified time range, showing whether those indexes raised alarms, listing the on-duty personnel and their contact information for that time range, and giving processing suggestions.
7. The cloud center intelligent alarm processing method according to claim 6, characterized in that: associated resources are configured for each resource, each index belongs to one resource, an alarm rule is set for the index, the alarm rule corresponds to an alarm view template, and an alarm view is dynamically generated when an alarm is triggered.
8. The cloud center intelligent alarm processing method according to claim 7, characterized in that: each resource is configured with a resource type, and its associated resources are the resources that it belongs to or is connected to.
CN202110036592.2A 2021-01-12 2021-01-12 Cloud center intelligent alarm processing system and method Pending CN112866020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110036592.2A CN112866020A (en) 2021-01-12 2021-01-12 Cloud center intelligent alarm processing system and method

Publications (1)

Publication Number Publication Date
CN112866020A true CN112866020A (en) 2021-05-28

Family

ID=76002885

Country Status (1)

Country Link
CN (1) CN112866020A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103746849A (en) * 2014-01-14 2014-04-23 浪潮电子信息产业股份有限公司 IT (information technology) operation and maintenance management system based on mobile intelligent terminal
CN104410535A (en) * 2014-12-23 2015-03-11 浪潮电子信息产业股份有限公司 Intelligent monitoring and alarming method for cloud resources
CN108829558A (en) * 2018-05-22 2018-11-16 郑州云海信息技术有限公司 A kind of intelligent operation management method and system of data center's alarm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210528