CN114443437A

CN114443437A - Alarm root cause output method, apparatus, device, medium, and program product

Info

Publication number: CN114443437A
Application number: CN202210110134.3A
Authority: CN
Inventors: 何城
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2022-01-28
Filing date: 2022-01-28
Publication date: 2022-05-06

Abstract

The disclosure provides an alarm root cause output method which can be applied to the technical field of big data. The alarm root cause output method comprises the following steps: determining an alarm rule of upstream monitoring data; identifying an alarm triggering state of a current monitoring data item relative to a historical monitoring data item in the upstream monitoring data based on an alarm rule; and traversing root cause equipment of the alarm root cause in the equipment topological structure according to the alarm triggering state of the current monitoring data item, and outputting the alarm root cause of the root cause equipment. The present disclosure also provides an alarm root cause output device, apparatus, storage medium, and program product.

Description

Alarm root cause output method, apparatus, device, medium, and program product

Technical Field

The present disclosure relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for outputting an alarm root cause.

Background

At present, in the operation and maintenance of a data center system, for example in the operation and maintenance of machine room infrastructure, the current operating conditions of the various infrastructures of the system need to be collected and analyzed through sensors and internet of things technology, and the number of monitored sensor points ranges from hundreds of thousands to millions depending on the scale of the data center. When a core node fails, the downstream equipment triggers a large number of alarms at the same time. For example, machine room infrastructure alarms are based on internet of things data acquisition technology, and alarm aggregation scenes are constructed by integrating tree structure retrieval algorithms to prompt operation and maintenance personnel with the alarm root cause.

Disclosure of Invention

In view of at least one of the above technical problems in the determination of an alarm root cause of a data center room infrastructure, the present disclosure provides an alarm root cause output method, apparatus, device, medium, and program product for increasing the alarm root cause location processing speed.

According to a first aspect of the present disclosure, there is provided an alarm root cause output method, including: determining an alarm rule of upstream monitoring data; identifying an alarm triggering state of a current monitoring data item relative to a historical monitoring data item in the upstream monitoring data based on an alarm rule; and traversing root cause equipment of the alarm root cause in the equipment topological structure according to the alarm triggering state of the current monitoring data item, and outputting the alarm root cause of the root cause equipment.

According to the embodiment of the present disclosure, before determining the alarm rule of the upstream monitoring data, the method further includes: determining equipment standard information corresponding to the upstream monitoring data; and configuring the topological structure of the equipment according to the equipment standard information.

According to the embodiment of the disclosure, in determining the device standard information corresponding to the upstream monitoring data, the method includes: standardizing equipment type information corresponding to the upstream monitoring data; standardizing monitoring point location information corresponding to the equipment type information based on the standardized equipment type information to generate equipment standard information; the equipment type information comprises an equipment name and a corresponding equipment number; the monitoring point location information includes a monitoring point location name corresponding to the device name and a corresponding monitoring point location number.

According to an embodiment of the present disclosure, configuring the device topology according to the device standard information includes: configuring, through device topology editing, the device topology association corresponding to the upstream monitoring data according to the device standard information, so as to complete the configuration of the device topology structure.

According to the embodiment of the disclosure, the method for determining the alarm rule of the upstream monitoring data comprises the following steps: establishing a rule item of an alarm rule according to equipment standard information corresponding to the upstream monitoring data; generating an alarm condition corresponding to the rule item to determine an alarm rule; the alarm conditions comprise alarm level, alarm range, alarm equipment type, alarm monitoring point position, alarm triggering condition, alarm recovery condition and alarm topological relation.

According to the embodiment of the present disclosure, before identifying the alarm triggering state of the current monitoring data item in the upstream monitoring data relative to the historical monitoring data item based on the alarm rule, the method further includes: cleaning and standardizing the received upstream monitoring data to generate upstream standardized data; identifying a data change status of a current normalized data item in the upstream normalized data relative to a historical normalized data item; and determining warehousing monitoring data in the upstream standardized data according to the data change state of the current standardized data item.
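The pre-processing described above (cleaning and standardizing, detecting a change relative to the historical item, and selecting the data to be warehoused) can be illustrated with a minimal sketch. It assumes that only items whose value changed since the last stored value are warehoused; the disclosure does not state the selection criterion this precisely. The function names `clean` and `select_for_warehousing` and the record layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (device name, monitoring point name), hypothetical layout

def clean(raw_items: Iterable[Tuple[str, str, object]]) -> List[Tuple[Key, float]]:
    """Drop records whose value is not numeric and normalize the names."""
    cleaned = []
    for device, point, value in raw_items:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # malformed record, discarded during cleaning
        cleaned.append(((device.strip().lower(), point.strip().lower()), number))
    return cleaned

def select_for_warehousing(items: Iterable[Tuple[Key, float]],
                           last_seen: Dict[Key, float]) -> List[Tuple[Key, float]]:
    """Keep only items whose value differs from the last stored value (assumption)."""
    changed = []
    for key, value in items:
        if last_seen.get(key) != value:
            changed.append((key, value))
            last_seen[key] = value  # the current item becomes the historical item
    return changed

raw = [
    (" Storage Battery Pack ", "Voltage", "220"),
    ("storage battery pack", "voltage", "220.0"),  # unchanged, skipped
    ("storage battery pack", "voltage", None),     # malformed, dropped
    ("storage battery pack", "voltage", "210"),    # changed, kept
]
warehoused = select_for_warehousing(clean(raw), last_seen={})
```

Filtering on change before the alarm check keeps the number of items that reach rule matching small, which is consistent with the stated goal of increasing processing speed.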

According to the embodiment of the disclosure, in identifying the alarm triggering state of the current monitoring data item relative to the historical monitoring data item in the upstream monitoring data based on the alarm rule, the method includes: inquiring an alarm rule matched with the current monitoring data item according to the equipment standard information corresponding to the current monitoring data item in the warehousing monitoring data; and responding to the inquired alarm rule, and performing trigger check on the current monitoring data item to determine the alarm trigger state of the current monitoring data item relative to the historical monitoring data item.

According to the embodiment of the disclosure, in response to the queried alarm rule, performing trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item, the method includes: determining that the alarm triggering state of the equipment monitoring point position corresponding to the historical monitoring data item is not triggered through triggering check; and responding to the triggerless alarm triggering state, and when the monitoring point position monitoring information of the equipment corresponding to the current monitoring data item meets the alarm triggering condition of the alarm rule, newly adding the alarm triggering state of the current monitoring data.

According to the embodiment of the present disclosure, in response to the queried alarm rule, performing trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item, the method further includes: through trigger check, determining that the alarm trigger state of the equipment monitoring point position corresponding to the historical monitoring data item is triggered; and responding to the triggered alarm triggering state, and updating the alarm triggering state of the current monitoring data when the monitoring information of the equipment monitoring point position corresponding to the current monitoring data item meets the alarm triggering condition of the alarm rule.

According to the embodiment of the present disclosure, in response to the queried alarm rule, performing trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item, the method further includes: based on the alarm triggering state, when the alarm change corresponding to the current monitoring data item does not exist, updating the alarm information corresponding to the current monitoring data item; or based on the alarm triggering state, when the alarm change corresponding to the current monitoring data item exists, newly adding the alarm information corresponding to the current monitoring data item.

According to the embodiment of the present disclosure, traversing root cause devices of alarm root causes in a device topology according to the alarm trigger state of the current monitoring data item, for outputting the alarm root causes of the root cause devices, includes: determining the position state information of the current monitoring equipment corresponding to the current monitoring data item in the equipment topological structure according to the alarm triggering state of the current monitoring data item; and traversing the upstream equipment in the equipment topological structure according to the position state information of the current monitoring equipment, and outputting root cause equipment.

According to the embodiment of the present disclosure, traversing the upstream devices in the device topology according to the position state information of the current monitoring device and outputting the root cause device includes: determining, during the traversal of the upstream devices, the last device found in an alarm-triggered state as the final trigger device; and outputting the final trigger device as the root cause device based on the alarm aggregation information of the final trigger device.

According to the embodiment of the present disclosure, traversing root cause devices of alarm root causes in a device topology according to the alarm trigger state of the current monitoring data item, for outputting the alarm root causes of the root cause devices, includes: updating the aggregation alarm information of the root cause equipment; and outputting the updated aggregated alarm information as an alarm root cause.

A second aspect of the present disclosure provides an alarm root cause output apparatus, including a rule determining module, a state identifying module, and a device traversing module. The rule determining module is used for determining an alarm rule of the upstream monitoring data; the state identification module is used for identifying an alarm triggering state of a current monitoring data item relative to a historical monitoring data item in the upstream monitoring data based on an alarm rule; and the equipment traversing module is used for traversing root cause equipment of the alarm root cause in the equipment topological structure according to the alarm triggering state of the current monitoring data item, and is used for outputting the alarm root cause of the root cause equipment.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described alarm root cause output method.

A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described alarm root cause output method.

A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above alarm root cause output method.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of an alarm root cause output method, apparatus, device, medium, and program product according to embodiments of the disclosure;

FIG. 2 schematically illustrates a flow diagram of an alarm root cause output method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a composition diagram of a device topology according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a composition diagram of alarm rules, in accordance with an embodiment of the present disclosure;

FIG. 5 schematically illustrates a data processing flow diagram for upstream-binned monitoring data, in accordance with an embodiment of the present disclosure;

FIG. 6 schematically illustrates a flow diagram of an identification process of an alarm triggered state of a currently monitored data item, in accordance with an embodiment of the present disclosure;

FIG. 7 schematically illustrates an output flow diagram of a root cause device according to an embodiment of the present disclosure;

FIG. 8 is a block diagram schematically illustrating an alarm root cause output apparatus according to an embodiment of the present disclosure; and

FIG. 9 schematically illustrates a block diagram of an electronic device adapted to implement an alarm root cause output method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that these descriptions are illustrative only and are not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

In order to realize alarm root cause positioning output, two traditional modes mainly exist in the prior art.

In the first mode, the characteristic values or key information of each alarm are extracted, the key information is grouped and classified by preset rules, and alarms of the same class are aggregated into one reminder. This mode mainly applies techniques such as standardization of alarm information and regular matching analysis of alarm information. However, this solution mainly analyzes and classifies the alarm information; it relies on sorting out business rules in advance and on long-term operation and maintenance refinement of alarm grouping information and characteristic values, requires accumulated expert experience, and cannot solve the alarm aggregation problem automatically or semi-automatically by technical means.

In the second mode, alarm aggregation is completed through big data analysis and condition judgment: after an upstream system generates an alarm, the system receives the alarm, identifies it through big data analysis, compares it with historical data, evaluates preset conditions, and obtains the root cause of the alarm through big data analysis and scene elimination. However, this solution depends on the analysis and computing capability of big data; if the volume of analyzed data is large, the processing time is long, so the requirement of rapid identification and positioning of alarm information cannot be met and a certain delay exists. Meanwhile, the judgment conditions must be set exhaustively and the parameters must be continuously modified according to the analysis results, so the method cannot be put into operation and maintenance quickly.

It should be noted that the alarm root cause output method and device disclosed by the present disclosure may be used in the field of big data technology and artificial intelligence technology, and may also be used in the financial field and any field other than the financial field.

In the technical scheme of the disclosure, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing and applying the data including the personal information of the user are all in accordance with the regulations of related laws and regulations, necessary confidentiality measures are taken, and the public order and good custom are not violated. Wherein, before the personal information of the user is obtained or collected, the authorization or the consent of the user is obtained.

The embodiment of the disclosure provides an alarm root cause output method, which includes: determining an alarm rule of upstream monitoring data; identifying an alarm triggering state of a current monitoring data item relative to a historical monitoring data item in the upstream monitoring data based on an alarm rule; and traversing root cause equipment of the alarm root cause in the equipment topological structure according to the alarm triggering state of the current monitoring data item, and outputting the alarm root cause of the root cause equipment.

Fig. 1 schematically illustrates an application scenario diagram of an alarm root cause output method, apparatus, device, medium, and program product according to embodiments of the present disclosure.

As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the alarm root cause output method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the alarm root cause output device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The alarm root cause output method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the alarm root cause output device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.

The alarm root cause output method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 6 based on the scenario described in fig. 1.

Fig. 2 schematically shows a flow chart of an alarm root cause output method according to an embodiment of the present disclosure.

As shown in fig. 2, the alarm root cause output method of the embodiment includes operations S201 to S203.

In operation S201, an alarm rule of upstream monitoring data is determined;

in operation S202, an alarm trigger state of a current monitoring data item relative to a historical monitoring data item in the upstream monitoring data is identified based on the alarm rule;

in operation S203, the root cause device of the alarm root cause is found by traversing the device topology according to the alarm trigger state of the current monitoring data item, and the alarm root cause of the root cause device is output.

The upstream monitoring data is the monitoring data of each device in the operation and maintenance system in the embodiment of the present disclosure. Specifically, each device in the operation and maintenance system has a matched monitoring module or monitoring device that monitors the operation of the device at scheduled times, periodically, or in real time, and forms monitoring data. The set of monitoring data formed by all the devices in the operation and maintenance system may be used as the upstream monitoring data. The alarm rule is the alarm setting condition matched with each monitoring data item in the upstream monitoring data; when the monitoring data corresponding to a monitoring data item is abnormal and the alarm setting condition is met, a monitoring alarm is raised for that monitoring data item. Therefore, through the alarm rule of the upstream monitoring data, alarm matching can be carried out on each data item in the upstream monitoring data.

For each monitoring data item in the upstream monitoring data, since state monitoring is performed at scheduled times, periodically, or even in real time, alarm triggering monitoring needs to be performed on the monitoring data item at different times. The current monitoring data item is the data item in the upstream monitoring data on which alarm triggering monitoring is currently being carried out. The historical monitoring data item is the same data item as the current monitoring data item, taken from the upstream monitoring data for which alarm triggering monitoring has already been completed. In other words, the current monitoring data item and the historical monitoring data item have the same data item name and the same device, but different contents: the content of the current monitoring data item is the content at the current monitoring trigger time, and the content of the historical monitoring data item is the content at a historical time for which the monitoring trigger has been completed.

The alarm triggering state of the current monitoring data item is identified according to the alarm rule. The current data content of the current monitoring data item can be compared with the alarm triggering data content of the matching data item in the alarm rule to identify whether the current monitoring data item is in the alarm-triggered state. For example, when the current monitoring data item is a voltage data item of the storage battery pack device and the monitoring value (that is, the data content) of the current voltage data item is 220V, if the alarm triggering data content of the voltage data item matched in the corresponding alarm rule is also 220V, the two data contents are consistent, no alarm abnormality is identified, and the alarm triggering state of the current voltage data item is marked as not triggered, that is, no alarm processing is performed. Correspondingly, when the two data contents are inconsistent, an alarm abnormality is identified, and the alarm triggering state of the current voltage data item is marked as triggered, that is, alarm processing is performed. Therefore, the alarm triggering state is the data abnormality feedback state of each monitoring data item in the upstream monitoring data as matched against the alarm rule: the data item is marked as triggered when its data is abnormal, and as not triggered otherwise.

For example, the current monitoring data item at 11:00 am on December 11, 2021 is the voltage data item of the storage battery pack device; the monitoring value of the current voltage data item is 220V, no abnormality exists, and the alarm triggering state is not triggered. Correspondingly, the historical monitoring data item at 10:00 am on December 11, 2021 is the voltage data item of the storage battery pack device; the monitoring value of the historical voltage data item is 210V, an abnormality is detected, and the alarm triggering state is triggered.
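The comparison in the voltage example can be sketched as follows. This is a minimal illustration, assuming the alarm rule stores one reference content per device and monitoring point and that any difference from it counts as abnormal, as in the example; the names `MonitoringItem` and `trigger_state` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class MonitoringItem:
    device: str   # e.g. "storage battery pack"
    point: str    # monitoring data item, e.g. "voltage"
    value: float  # monitored value, i.e. the data content

def trigger_state(item: MonitoringItem,
                  rule_contents: Dict[Tuple[str, str], float]) -> str:
    """Mark the item as triggered when its content differs from the rule content."""
    reference = rule_contents.get((item.device, item.point))
    if reference is None:
        return "no_rule"  # no alarm rule matches this data item
    return "not_triggered" if item.value == reference else "triggered"

rule_contents = {("storage battery pack", "voltage"): 220.0}
current = MonitoringItem("storage battery pack", "voltage", 220.0)     # 11:00 am
historical = MonitoringItem("storage battery pack", "voltage", 210.0)  # 10:00 am
```

With these inputs the current item evaluates to not triggered and the historical item to triggered, matching the example.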

The alarm triggering state of the current monitoring data item can be judged according to the alarm triggering state of the historical monitoring data item. For example, according to the alarm rule, when the alarm trigger state of the historical monitoring data item is triggered, if the monitoring data content of the current monitoring data item is the same as that of the historical monitoring data item, the alarm trigger state of the current monitoring data item can be identified as triggered. Conversely, when the two contents are not the same, the monitoring data content of the current monitoring data item is further checked against the alarm rule. In this way, a relation between the historical monitoring data item and the current monitoring data item is established, and the current monitoring data item is judged by means of the alarm state of the historical monitoring data item, which facilitates parallel processing of the data and increases the data processing speed.
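The shortcut described above can be sketched as follows. The sketch assumes the historical item is kept as a (value, triggered) pair and that the alarm rule is a predicate on the monitored value; both representations and the name `identify_state` are illustrative assumptions.

```python
from typing import Callable, Optional, Tuple

# Hypothetical record of the historical item: (monitored value, was it triggered)
History = Optional[Tuple[float, bool]]

def identify_state(value: float, history: History,
                   rule: Callable[[float], bool]) -> bool:
    """Return True when the current monitoring data item is in the triggered state."""
    if history is not None:
        past_value, past_triggered = history
        if past_triggered and value == past_value:
            # Same content as a triggered historical item: reuse its state
            return True
    # Otherwise fall back to checking the content against the alarm rule
    return rule(value)

# Example rule taken from the voltage example: any value other than 220V is abnormal
voltage_rule = lambda v: v != 220.0
```

For instance, `identify_state(210.0, (210.0, True), voltage_rule)` returns `True` without evaluating the rule, while `identify_state(220.0, (210.0, True), voltage_rule)` falls through to the rule and returns `False`.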

The device topology is the topology of all upstream devices in operation in the operation and maintenance system of the present disclosure. The topological incidence relation of all upstream devices can be established through the device topological structure, and traversal can be executed on alarm triggering of the current monitoring data item through the topological incidence relation so as to inquire root cause devices of alarm root causes. For the current monitoring data item in each upstream monitoring data, once the alarm triggering state corresponding to the current monitoring data item is a trigger, that is, the content of the current monitoring data item is abnormal, the monitoring abnormality may also exist on other corresponding upstream devices having a topological association relationship with the current monitoring data item. That is, the upstream device of the current device corresponding to the current monitoring data item is abnormal, which may cause the corresponding current monitoring data item of the current device to have an abnormal state triggered by the alarm. By traversing the upstream device having the topological association relationship with the current device, the device where the root anomaly causing the anomaly of the current monitoring data is located, namely the root cause device, can be found. The abnormal data item and the content of the abnormal data item in the abnormal state, which cause the alarm triggering of the current monitoring data of the current device, can be used as the alarm root cause.
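The upstream traversal can be illustrated with a minimal sketch. It assumes that each device has at most one upstream device (a tree), and that the walk stops at the first upstream device that is not in the triggered state; the disclosure does not fix either detail. The device chain and the name `find_root_cause` are hypothetical.

```python
from typing import Dict, Optional, Set

def find_root_cause(device: str,
                    upstream: Dict[str, Optional[str]],
                    triggered: Set[str]) -> Optional[str]:
    """Walk upstream from a triggered device and return the last triggered device."""
    if device not in triggered:
        return None  # no alarm on this device, nothing to trace
    root = device
    seen = {device}  # guards against a misconfigured cyclic topology
    parent = upstream.get(root)
    while parent is not None and parent in triggered and parent not in seen:
        root = parent
        seen.add(parent)
        parent = upstream.get(root)
    return root

# Hypothetical topology: each device maps to its upstream device
upstream = {
    "precision air conditioner": "low-voltage cabinet",
    "low-voltage cabinet": "high-voltage cabinet",
    "high-voltage cabinet": None,
}
triggered = {"precision air conditioner", "low-voltage cabinet"}
root = find_root_cause("precision air conditioner", upstream, triggered)
```

Here the walk ends at the low-voltage cabinet, because its own upstream device is not alarming, so the low-voltage cabinet is reported as the root cause device.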

Therefore, the alarm root cause output method of the embodiment of the disclosure uses the topological association relationship between the devices in the operation and maintenance system to establish the links between upstream and downstream devices, and standardizes the devices and alarm information that may cause alarm storms. Compared with the prior-art approaches of aggregating alarms by configuring alarm rule relationships one by one or by big data analysis, the method locates root cause devices quickly and outputs alarm root causes accurately without depending on expert experience or big data analysis. It thereby avoids the heavy workload and poor flexibility of managing and configuring alarm rules one by one, overcomes the conventional defect that associations cannot be updated quickly, improves the output efficiency, accuracy, and speed of the alarm root cause, and improves operation and maintenance safety and efficiency.

As shown in fig. 2, before determining the alarm rule of the upstream monitoring data in operation S201, according to an embodiment of the present disclosure, the method further includes:

determining equipment standard information corresponding to the upstream monitoring data;

and configuring the topological structure of the equipment according to the equipment standard information.

The equipment standard information is the equipment information after the basic information of all the equipment of the operation and maintenance system corresponding to the upstream monitoring data is standardized. The devices mainly include operation and maintenance monitoring devices such as machine room data center infrastructure, and basic information of the operation and maintenance monitoring devices mainly includes device type information, device basic information, device space information, monitoring point location information and the like of the operation and maintenance monitoring devices. In addition, the standardization process may mainly include initial management of data to facilitate basic data configuration of upstream monitoring data.

After the data initialization is performed on the upstream monitoring data, the standardization of the basic information of the corresponding devices can be realized, and the device standard information is formed, such as the unification of the standard fields of the data items, so that the subsequent processing of the data items can be performed according to the unified standard fields in the later period.

Further, based on the device standard information, the topological association relationship among the devices can be established, so that a corresponding device topological structure is formed based on the topological association relationship. This facilitates later root cause tracing of the alarm root causes of abnormal data items, thereby enabling searching and positioning along the upstream and downstream topological relationships.

As shown in fig. 2, in determining the device standard information corresponding to the upstream monitoring data according to the embodiment of the present disclosure, the method includes:

standardizing equipment type information corresponding to the upstream monitoring data;

standardizing monitoring point location information corresponding to the equipment type information based on standardized equipment type information to generate equipment standard information;

the equipment type information comprises an equipment name and a corresponding equipment number; the monitoring point location information includes a monitoring point location name corresponding to the device name and a corresponding monitoring point location number.

The device basic information corresponding to the upstream monitoring data is initialized, which includes standardizing the device type information, the monitoring point location information, and the device naming, thereby completing the standardization of upstream and downstream device information.

Firstly, in standardizing the device type information and the monitoring point location information of the devices, it is considered that infrastructure devices in a machine room center are of multiple device types, such as a storage battery pack, a high-voltage cabinet, a low-voltage cabinet, a precision air conditioner, a UPS and the like. The device type information can therefore be used as the type or category information of the operation and maintenance devices, including device names. Each type of device may have multiple monitoring point locations, and each monitoring point location may feed back a different monitoring data item. For example, the monitoring point locations of a storage battery pack include monitoring data items such as voltage, internal resistance, temperature, and discharge state, where the monitoring values and monitoring contents of these specific data items may be used as the data contents of the data items. Therefore, the monitoring point location information may be the information of the monitoring data item corresponding to each monitoring point location of the device. After the standardization of the monitoring point location information is completed, monitoring alarm rules can be set for the monitoring point locations under the corresponding device type in the subsequent alarm rule configuration process. For example, the alarm rule may specify that an alarm is given when the monitoring value of the voltage data item of a storage battery pack is less than 0.

According to the device type division, the devices in the machine room center can be divided into types such as a transformer, a high-voltage cabinet, a low-voltage cabinet, UPS equipment, a storage battery pack, a precision air conditioner, a temperature and humidity sensor, gas detection equipment and the like. To standardize the type information, each type of device may be named and numbered uniformly, for example according to a numbering rule of 4 characters composed of letters and digits; for example, the number of the storage battery pack is BA01. The numbering rule may be adapted to actual situations. Therefore, the device type information can also include the device number corresponding to the device name, and the standardization of the type information of each device can be realized.

After the standardization of the device type information is completed, the monitoring point locations corresponding to each device type number can be divided; for example, the monitoring point locations of the storage battery pack device include monitoring voltage, monitoring internal resistance, monitoring temperature rise, monitoring discharge state and the like. The monitoring point location information therefore includes the monitoring point location names corresponding to the device names. Further, the monitoring point locations corresponding to the devices may also be numbered uniformly according to a numbering rule of 4 characters composed of letters or digits; for example, the battery voltage monitoring point location of the storage battery pack device is numbered 1001. The numbering rule may be adapted to actual conditions. Therefore, monitoring point location numbers can be defined according to the monitoring point location names corresponding to the device names, and the monitoring point location information corresponding to each device type is standardized.

Further, by means of the standardization of the device type information and the monitoring point location information, unified naming and numbering of the device types and the monitoring point locations is realized. On this basis, the devices themselves can be named uniformly, and the unified device name can also reflect the professional information and space information of the device. For example, the devices can be named according to the numbering rule building (1 character) - floor (3 characters) - room (5 characters) - profession (2 characters) - device type (4 characters) - device number code (8 characters). For example, the 2nd battery of the 5th group of the power-profession storage battery pack in battery room 1 on basement floor 2 may be named A-B02-BAR01-01-BA01-00050002. Accordingly, uniform naming of the corresponding monitoring point locations can also be achieved.

Therefore, the device standard information can be formed according to the standardization of the device type information and the device monitoring point location information, so that the unique readability of the device is ensured.
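The naming rule described above can be illustrated with a minimal Python sketch. The segment widths (1-3-5-2-4-8) and the sample name come from the text; the function and class names, and the assumption that segments consist of upper-case letters and digits with a purely numeric device number, are hypothetical choices made only for this illustration.

```python
import re
from typing import NamedTuple

# Segment widths follow the rule in the text:
# building(1)-floor(3)-room(5)-profession(2)-device type(4)-device number(8).
# The character classes are an assumption; the text only fixes the widths.
DEVICE_NAME_RE = re.compile(
    r"^(?P<building>[A-Z0-9]{1})-(?P<floor>[A-Z0-9]{3})-(?P<room>[A-Z0-9]{5})"
    r"-(?P<profession>[A-Z0-9]{2})-(?P<device_type>[A-Z0-9]{4})"
    r"-(?P<device_no>[0-9]{8})$"
)


class DeviceName(NamedTuple):
    building: str
    floor: str
    room: str
    profession: str
    device_type: str
    device_no: str


def parse_device_name(name: str) -> DeviceName:
    """Split a unified device name into its segments, or raise ValueError."""
    match = DEVICE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a unified device name: {name!r}")
    return DeviceName(**match.groupdict())


def format_device_name(device: DeviceName) -> str:
    """Join the segments back into the unified device name."""
    return "-".join(device)
```

Parsing `"A-B02-BAR01-01-BA01-00050002"` with this sketch yields the floor segment `B02`, the device type `BA01`, and the device number `00050002`; a name that does not follow the widths is rejected.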

Fig. 3 schematically illustrates a composition diagram of a device topology according to an embodiment of the present disclosure.

As shown in fig. 2 and 3, in configuring a device topology according to device standard information according to an embodiment of the present disclosure, the method includes:

and configuring equipment topology association corresponding to the upstream monitoring data according to the equipment standard information through equipment topology editing to complete the configuration of an equipment topology structure.

Through an equipment topology editing program or tool, uniform equipment naming numbers in equipment standard information of each equipment in the machine room data center are uniformly edited, and an equipment topology structure with equipment topology association relation can be formed. Specifically, in the equipment topology editing, the topology editing of the unified naming number of the equipment can be performed on the maintained equipment through visual interaction, the association between the upper-level equipment and the lower-level equipment is completed, the upper-level topology association and the lower-level topology association of the equipment are formed, and the tree topology structure of the machine room infrastructure equipment is formed after the configuration is completed. A plurality of topological structures can be defined according to actual needs and association relations between professions and equipment, and one piece of equipment can be in a plurality of topological structures.

As shown in fig. 3, in the device topology 300, the high-voltage distribution cabinet 310 may be associated, as an upstream device, with the downstream transformers 320 and 330; the transformer 320 may be associated, as an upstream device, with the downstream low-voltage distribution cabinets 340, 350 and 360; and the low-voltage distribution cabinet 340 may be associated, as an upstream device, with the downstream UPS devices 370, 380 and 390. Thus, a device topology 300 with topological association relationships can be formed. Therefore, when a data item of a monitoring point location of any device is abnormal, the device corresponding to the monitoring point location and all upstream and downstream devices that may cause the data item abnormality of the device can be searched and located, so as to determine the influence range and the most upstream abnormal device.

By means of the equipment topological structure, root cause tracing can be carried out on the alarm root causes of all abnormal data items in the later period, and therefore searching and positioning of the upstream and downstream topological relations and alarm rule configuration of monitoring data items of corresponding equipment are achieved.
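As an illustration of searching the upstream and downstream topological relationships, the following Python sketch encodes the tree of fig. 3 as child-to-parent links. The string labels (such as `"HV-310"`) and the helper names are hypothetical, and picking the highest abnormal device on the path to the root is a simplification for this example, not necessarily the traversal used by the disclosure.

```python
from typing import Dict, List, Optional, Set

# Child -> parent links of the Fig. 3 tree; labels are illustrative only.
UPSTREAM: Dict[str, Optional[str]] = {
    "HV-310": None,
    "TR-320": "HV-310", "TR-330": "HV-310",
    "LV-340": "TR-320", "LV-350": "TR-320", "LV-360": "TR-320",
    "UPS-370": "LV-340", "UPS-380": "LV-340", "UPS-390": "LV-340",
}


def upstream_chain(device: str) -> List[str]:
    """Devices from the immediate parent of `device` up to the root."""
    chain: List[str] = []
    parent = UPSTREAM[device]
    while parent is not None:
        chain.append(parent)
        parent = UPSTREAM[parent]
    return chain


def downstream(device: str) -> List[str]:
    """All devices below `device`, in breadth-first order (influence range)."""
    result: List[str] = []
    frontier = [device]
    while frontier:
        current = frontier.pop(0)
        children = [c for c, p in UPSTREAM.items() if p == current]
        result.extend(children)
        frontier.extend(children)
    return result


def most_upstream_abnormal(device: str, abnormal: Set[str]) -> Optional[str]:
    """Highest abnormal device on the path from `device` to the root."""
    path = [device] + upstream_chain(device)
    hits = [d for d in path if d in abnormal]
    return hits[-1] if hits else None
```

With UPS 370, low-voltage cabinet 340 and transformer 320 all marked abnormal, this sketch reports the transformer as the most upstream abnormal device on the path from UPS 370.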

FIG. 4 schematically illustrates a composition diagram of alarm rules, according to an embodiment of the disclosure.

As shown in fig. 2 to 4, according to an embodiment of the present disclosure, determining the alarm rule of the upstream monitoring data in operation S201 includes:

establishing a rule item of an alarm rule according to equipment standard information corresponding to the upstream monitoring data;

generating an alarm condition corresponding to the rule item to determine an alarm rule;

the alarm conditions comprise alarm levels, alarm ranges, alarm equipment types, alarm monitoring point positions, alarm triggering conditions, alarm recovery conditions and alarm topological relations.

After the basic configuration of the equipment standard information and the equipment topological structure is completed, alarm rules can be set for data items of monitoring points corresponding to the equipment standard information, the alarm rules can be input and matched in batches corresponding to different data items, and alarm judgment is performed on abnormal data items of the received upstream monitoring data through the matched alarm rules.

The alarm rule corresponding to each data item has a corresponding unified name, and the name can be defined according to the alarm condition, the corresponding monitoring point position, the corresponding equipment type and the topological association relationship of the topological equipment. Specifically, a rule item may be newly created according to the device standard information corresponding to the data item of the upstream monitoring data, and the rule item may uniquely define the alarm condition of the data item to form a data item alarm rule.

As shown in fig. 4, the alarm condition includes an alarm device type 410 (which may be embodied by a device name, etc.), an alarm level 420, an alarm effective area 430 (i.e., the alarm range), an alarm association topology 440 (i.e., the alarm topological relationship), an alarm recovery condition 450, an alarm trigger condition 460, and a monitoring index 470 (i.e., the alarm monitoring point location). Thus, a systematic definition of alarm rules can be achieved.

The alarm device type 410 setting: configures the device type, such as a storage battery pack, to which the data item corresponding to the upstream monitoring data belongs.

The alarm level 420 setting: defines the level of the abnormality alarm of an abnormal data item, such as early warning, general alarm, serious alarm and the like, and can be set according to the operation and maintenance requirements.

The effective area 430 setting: defines the spatial range of the monitored devices to which the data item alarm rule applies, which can be selected as required, such as the whole machine room, a whole building or a whole floor. The default is the whole machine room; for example, if the storage battery pack device type is selected, the rule refers to all storage battery packs of the whole machine room, and if a narrower range is selected, the rule refers to the storage battery packs within that range, and so on.

The monitoring index 470 setting: defines the monitoring point location related to the alarm under the device type, such as the voltage under the storage battery pack.

The association topology 440 setting: selects a device topology of upstream and downstream device relationships, such as the device topology 300 defined above in fig. 3. If the trigger condition 460 described below is triggered, the aggregation computation of the triggered associations is completed based on this device topology.

The recovery condition 450 setting: when the recovery condition is reached, the alarm is exited. Its definition may take the same form as the trigger condition; for example, a storage battery pack voltage greater than 0 indicates that the fault has recovered.

The trigger condition 460 setting: a threshold and a comparison relationship (greater than, less than, greater than or equal to, or less than or equal to) are set for the monitoring point location of the data item corresponding to the monitoring index. After the value of the data content of the corresponding data item in the upstream monitoring data, or a defined corresponding value, is compared with the threshold, it can be determined whether the data content is abnormal, whether the data item is an abnormal data item, whether the corresponding device is an abnormal device, and whether other upstream and downstream devices having a topological association relationship with it are also abnormal devices.

Therefore, through the alarm rule setting corresponding to each data item, batch alarm rule matching can be directly carried out on the data items, the judgment standard for the abnormity and the normal of each data item is established, the matching speed of the alarm rule in the later period is accelerated, the alarm aggregation processing efficiency is improved, and the judgment accuracy of data item alarm is ensured.
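The alarm conditions of fig. 4 can be pictured as one record per rule item. The Python sketch below is only an illustration under stated assumptions: the class and field names are hypothetical, and the sample values (level `"serious"`, topology name `"power-tree"`) are made up; only the battery-voltage thresholds follow the examples in the text.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict

# The four comparison relationships named for the trigger condition.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le,
}


@dataclass(frozen=True)
class Condition:
    comparator: str   # one of the keys of COMPARATORS
    threshold: float

    def holds(self, value: float) -> bool:
        """True when `value` stands in the configured relation to the threshold."""
        return COMPARATORS[self.comparator](value, self.threshold)


@dataclass(frozen=True)
class AlarmRule:
    device_type: str       # alarm device type 410
    level: str             # alarm level 420
    effective_area: str    # alarm effective area 430
    topology: str          # alarm association topology 440
    recovery: Condition    # alarm recovery condition 450
    trigger: Condition     # alarm trigger condition 460
    monitoring_point: str  # monitoring index 470


# Example from the text: alarm when battery voltage < 0, recovered when > 0.
battery_voltage_rule = AlarmRule(
    device_type="BA01",
    level="serious",                      # hypothetical value
    effective_area="whole machine room",
    topology="power-tree",                # hypothetical topology name
    recovery=Condition(">", 0.0),
    trigger=Condition("<", 0.0),
    monitoring_point="1001",
)
```

Note that with the example thresholds taken literally, a reading of exactly 0 satisfies neither the trigger nor the recovery condition; a real configuration would presumably choose the two relations so that this case is covered.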

Fig. 5 schematically illustrates a data processing flow diagram of warehousing monitoring data according to an embodiment of the present disclosure.

As shown in fig. 2 to 5, according to an embodiment of the present disclosure, before identifying an alarm trigger state of a current monitoring data item in the upstream monitoring data with respect to a historical monitoring data item based on an alarm rule in operation S202, the method further includes:

cleaning and standardizing the received upstream monitoring data to generate upstream standardized data;

identifying a data change status of a current normalized data item in the upstream normalized data relative to a historical normalized data item;

and determining warehousing monitoring data in the upstream standardized data according to the data change state of the current standardized data item.

As shown in fig. 5, upstream monitoring data provided by an upstream monitoring system may be received through the internet of things technology, where the upstream monitoring data may include monitoring data streams of monitoring devices of systems such as a power monitoring system (e.g., a high-voltage cabinet, a low-voltage cabinet, a UPS, a storage battery pack, etc.), an operation monitoring system, and a power and environment monitoring system (e.g., a precision air conditioner, etc.), as in operation S501.

Specifically, the received upstream monitoring data includes not only each monitoring data item, but also the data content corresponding to each data item, such as the specific value of a monitoring point location. For example, for the storage battery numbered A-B02-BAR01-01-BA01-00050002, the reading at monitoring point location 1001 (voltage) is 320 V.

For example, after the real-time upstream monitoring data collected and transmitted through the internet of things is received, it may be checked whether the device related to the upstream monitoring data belongs to the devices corresponding to the standardized device standard information, that is, whether the standardized device information maintenance operation has been completed for the device, as in operation S502. If the query confirms that the standardization has been completed for the device and the corresponding device standard information has been generated, the next step is continued; otherwise, the data is directly discarded, as in operation S521. For example, if upstream monitoring data is received for a device uniformly named C-B01-BAR11-01-BA01-00000001, but the basic information of that device has not undergone information standardization and no corresponding device standard information has been obtained, the upstream monitoring data is discarded. In this way, primary screening of the upstream monitoring data to be processed is completed, the amount of useless data is preliminarily reduced, redundant data is screened out, and data processing is accelerated.

On the basis of the primary screening result, further data cleaning and standardization are performed on the received upstream monitoring data. A monitoring data item in the upstream monitoring data generally includes the data item generation time, the unique device naming identification information, the device monitoring point location information, and the corresponding monitoring value. Therefore, after the corresponding device is determined to be a maintained device, the remaining attribute information of the device needs to be supplemented according to the device standard information, for example the device type, the device space information, and the device monitoring point location information that are not included in the upstream monitoring data. This completes the cleaning and standardization of the upstream monitoring data and generates the corresponding upstream standardized data, that is, the upstream monitoring data after data supplementation and standardization, as in operation S503. In this way, the upstream monitoring data is processed more precisely, so that the upstream standardized data can be matched accurately in the subsequent alarm processing, which facilitates accurate alarm aggregation.
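The screening and supplementation steps (operations S502, S521 and S503) may be sketched as follows. The registry contents, the dictionary keys and the function name are assumptions made for illustration; the disclosure does not specify a data format.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry of devices whose information has been standardized.
DEVICE_REGISTRY: Dict[str, Dict[str, str]] = {
    "A-B02-BAR01-01-BA01-00050002": {
        "device_type": "BA01",
        "space": "A/B02/BAR01",
    },
}

# Hypothetical monitoring point names, keyed by (device type, point number).
POINT_NAMES: Dict[Tuple[str, str], str] = {
    ("BA01", "1001"): "battery voltage",
}


def clean_and_standardize(raw: dict) -> Optional[dict]:
    """Return the supplemented record, or None if the device is not maintained."""
    info = DEVICE_REGISTRY.get(raw["device"])
    if info is None:
        return None                      # discard, as in operation S521
    record = dict(raw)                   # keep time, device, point, value
    record.update(info)                  # supplement device type and space
    record["point_name"] = POINT_NAMES.get(
        (info["device_type"], raw["point"]), "unknown"
    )
    return record
```

A reading from the registered storage battery is returned with its device type and space information filled in, while a reading from the unregistered device C-B01-BAR11-01-BA01-00000001 yields `None` and is dropped.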

After the data cleaning and standardization are completed, the data storage stage is entered and the upstream standardized data is stored to facilitate real-time calling; for example, the data is passed to alarm rule matching and enters the alarm logic process, as in operation S505.

After data cleaning, the warehousing logic of the corresponding database first queries the current value of the device and monitoring point location of each current standardized data item in the upstream standardized data, and compares it with the historical value of the same device and monitoring point location in the corresponding historical standardized data item. When the current value is unchanged relative to the historical value, the data change state of the current standardized data item is determined to be unchanged, and the data generation time of the historical standardized data item is directly updated to the data generation time of the current standardized data item. This completes the warehousing of the data and forms the warehousing monitoring data, as in operation S504.

On the contrary, when the current value changes relative to the historical value, the data change state of the current standardized data item is determined to be changed, a data item record is added in a data table in the database, and the current standardized data item, the corresponding equipment and monitoring point data and the data generation time are simultaneously filled into the added data item record, so that the record of the changed current standardized data item is put into a warehouse, and the warehouse monitoring data is formed.

Therefore, the upstream monitoring data is cleaned and standardized and then warehoused. During warehousing, for an unchanged data item only the generation time of its data content is updated, while for a changed data item the data content and its generation time are newly added. This greatly reduces the amount of upstream monitoring data to be stored and subsequently processed, while keeping a record of every change and updating the log time of unchanged data. The processing accuracy and integrity of the data are thus maintained, data omission and loss are prevented, and data input to the later alarm aggregation process is facilitated.
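The warehousing logic described above (update the generation time when the value is unchanged, add a record when it changes) can be sketched with an in-memory table. The class name and record layout are hypothetical; treating the first reading of a device and monitoring point location as a change is also an assumption, since the text does not cover that case.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (unified device name, monitoring point number)


class Warehouse:
    """In-memory stand-in for the database table of warehousing monitoring data."""

    def __init__(self) -> None:
        self.rows: Dict[Key, List[dict]] = {}

    def store(self, device: str, point: str, value: float, time: str) -> str:
        """Store one standardized data item and return its data change state."""
        history = self.rows.setdefault((device, point), [])
        if history and history[-1]["value"] == value:
            # Unchanged: only refresh the generation time of the latest record.
            history[-1]["time"] = time
            return "unchanged"
        # Changed (or first reading, by assumption): add a new record.
        history.append({"value": value, "time": time})
        return "changed"
```

Storing the same voltage twice leaves a single record with the later time, whereas a different voltage adds a second record, which mirrors the reduction in stored data described above.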

FIG. 6 schematically illustrates a flow chart of an identification process of an alarm triggered state of a currently monitored data item according to an embodiment of the present disclosure.

As shown in fig. 2 to 6, according to an embodiment of the present disclosure, in identifying an alarm trigger state of a current monitoring data item in upstream monitoring data relative to a historical monitoring data item based on an alarm rule in operation S202, the method includes:

inquiring an alarm rule matched with the current monitoring data item according to equipment standard information corresponding to the current monitoring data item in the warehousing monitoring data;

and responding to the inquired alarm rule, and performing trigger check on the current monitoring data item to determine the alarm trigger state of the current monitoring data item relative to the historical monitoring data item.

For the warehousing monitoring data that has been cleaned and stored, each data item needs to be matched with a corresponding alarm rule. For the current monitoring data item in the warehousing monitoring data, the alarm rule is queried and matched according to the unified naming information of the device, the device type information, the corresponding monitoring point location information and the like in the corresponding device standard information. The query finds the rule items in the alarm rules that match the current monitoring data item; each rule item defines an alarm condition for the current monitoring data item, which facilitates the alarm trigger check described below. The alarm rule query matches the rule items to the corresponding data items according to preset alarm rule matching conditions, that is, according to the matching relationship between the device type information, the monitoring point location information and the like and the rule items.

After the query matching of the alarm rule is completed, a trigger check can be executed according to the matched alarm rule to determine whether the data content of the current monitoring data item meets the trigger condition of the alarm rule. If the trigger condition is met, the trigger check succeeds; otherwise, it fails. The alarm trigger state of a current monitoring data item whose trigger check succeeds may be alarm triggered, and that of a current monitoring data item whose trigger check fails may be alarm not triggered.

Therefore, the judgment of the alarm triggering state of the current monitoring data item can be realized, the alarm aggregation processing process can be further facilitated, and the alarm aggregation processing precision is ensured.

Further, by means of the trigger check, alarm judgment can be performed on the cleaned warehousing monitoring data: the streaming data is calculated and judged against the alarm rules matched in real time, and an alarm is triggered once the alarm condition is met.

As shown in fig. 6, after the device warehousing monitoring data is received, the alarm rule is queried by means of the device standard information corresponding to the current monitoring data item in the warehousing monitoring data, to check whether rule item information of an alarm rule is configured for the device corresponding to the current monitoring data item and its monitoring point location. If so, the alarm trigger check is entered; if not, the process ends and exits, as in operations S601-S602. Further, if an alarm rule is configured for the device monitoring data, a trigger check is performed to check the trigger state corresponding to the alarm rule. If the alarm rule has already been triggered, that is, an existing alarm has been generated under the alarm rule, the existing alarm processing stage is entered; if the alarm rule has not been triggered before, the newly added alarm processing stage is entered, as in operations S603-S607.
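The branching of fig. 6 described above can be summarized in a short Python sketch. It is a simplification under stated assumptions: the rule table holds only a "less than" threshold per device type and monitoring point location, and the names `RULES`, `route` and the returned labels are hypothetical.

```python
from typing import Dict, Set, Tuple

RuleKey = Tuple[str, str]    # (device type number, monitoring point number)
AlarmKey = Tuple[str, str]   # (unified device name, monitoring point number)

# Hypothetical rule table: an alarm triggers when the value is below the threshold.
RULES: Dict[RuleKey, float] = {("BA01", "1001"): 0.0}


def route(record: dict, triggered: Set[AlarmKey]) -> str:
    """Decide which processing stage a warehoused data item enters."""
    threshold = RULES.get((record["device_type"], record["point"]))
    if threshold is None:
        return "no_rule"            # no rule configured: end and exit
    if (record["device"], record["point"]) in triggered:
        return "existing_alarm"     # rule already triggered by history
    if record["value"] < threshold:
        return "new_alarm"          # not triggered before, condition now met
    return "not_triggered"          # condition not met: end
```

Here `triggered` stands for the set of device monitoring point locations whose alarm rule was already triggered by historical monitoring data items.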

As shown in fig. 2 to fig. 6, according to an embodiment of the present disclosure, in response to a queried alarm rule, performing a trigger check on a current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to a historical monitoring data item, includes:

determining that the alarm triggering state of the equipment monitoring point position corresponding to the historical monitoring data item is not triggered through triggering check;

and responding to the untriggered alarm triggering state, and when the monitoring point location monitoring information of the equipment corresponding to the current monitoring data item meets the alarm triggering condition of the alarm rule, newly adding the alarm triggering state of the current monitoring data.

If an alarm rule is configured for the device warehousing monitoring data, a trigger check is performed to check the trigger state of the alarm rule corresponding to the current monitoring data item. If the alarm rule has not been triggered by the historical monitoring data item, that is, no alarm has been generated under the alarm rule, the newly added alarm processing procedure is entered, as in operations S603-S606.

A trigger check is performed on the historical monitoring data item corresponding to the current monitoring data item. If the monitoring data content of the historical monitoring data item does not meet the alarm trigger condition of the matched alarm rule, the alarm trigger state of the device monitoring point location corresponding to the historical monitoring data item is not triggered.

A trigger check is then performed on the monitoring data content of the current monitoring data item. If that content meets the alarm trigger condition of the matched alarm rule, the alarm trigger state of the device monitoring point location corresponding to the current monitoring data item is triggered.

In this case, the alarm trigger state of the current monitoring data item has changed compared with that of the historical monitoring data item, so the current monitoring data item and its data content are newly added to the corresponding data table in the database.

For an alarm rule that has not been triggered by the historical monitoring data item of the warehousing monitoring data, the trigger check process is entered to judge whether the current monitoring data item meets the trigger condition. If so, an alarm processing process is added under the alarm rule, the alarm state is monitored continuously, the alarm information, the first alarm time and the like are recorded, and the state of the alarm rule is modified to triggered. If the trigger condition is not met, the process ends, as in operations S606-S607.

Therefore, a newly triggered current monitoring data item can be added, the integrity of the triggered monitoring data is ensured, and omission or loss of necessary abnormal data is avoided, thereby ensuring the accuracy of abnormality-triggered alarms.

As shown in fig. 2 to fig. 6, according to the embodiment of the present disclosure, in response to the queried alarm rule, performing trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item, further includes:

through trigger check, determining that the alarm trigger state of the equipment monitoring point position corresponding to the historical monitoring data item is triggered;

and responding to the triggered alarm triggering state, and updating the alarm triggering state of the current monitoring data when the monitoring information of the equipment monitoring point position corresponding to the current monitoring data item meets the alarm triggering condition of the alarm rule.

If an alarm rule is configured for the device warehousing monitoring data, a trigger check is performed to check the trigger state of the alarm rule corresponding to the current monitoring data item. If the alarm rule has already been triggered by the historical monitoring data item, that is, an existing alarm has been generated under the alarm rule, the existing alarm processing procedure is entered, as in operations S603-S605.

A trigger check is performed on the historical monitoring data item corresponding to the current monitoring data item. If the monitoring data content of the historical monitoring data item meets the alarm trigger condition of the matched alarm rule, the alarm trigger state of the device monitoring point location corresponding to the historical monitoring data item is triggered.

In this case, the alarm trigger state of the current monitoring data item may or may not have changed compared with that of the historical monitoring data item. When the data content that triggers the alarm for the current monitoring data item is consistent with the data content that triggered the alarm for the historical monitoring data item, the current monitoring data item and the generation time of its data content are updated in the corresponding data table in the database, so that the data generation time of the current monitoring data item is recorded in the data table.

If an alarm has been triggered by the historical monitoring data item in the device warehousing monitoring data, it is judged whether the latest current monitoring data item still meets the trigger condition. If the trigger condition is met and the alarm persists, the latest alarm time is recorded and replaces the alarm time of the historical monitoring data item, as in operation S605.

Therefore, monitoring data items under an already triggered alarm rule can be screened, and the alarm time and/or data generation time corresponding to the monitoring data items can be updated, thereby further screening the monitoring data items.

based on the alarm triggering state, when the alarm change corresponding to the current monitoring data item does not exist, updating the alarm information corresponding to the current monitoring data item; or

And based on the alarm triggering state, when the alarm change corresponding to the current monitoring data item exists, newly adding the alarm information corresponding to the current monitoring data item.

And if the latest current monitoring data item does not meet the triggering condition and no corresponding triggering alarm change exists, updating the data content corresponding to the current monitoring data item and the alarm information (such as alarm time, alarm rule and the like) matched with the current monitoring data item, and replacing the data content and the alarm information of the historical monitoring data item.

If the latest current monitoring data item meets the triggering condition and a corresponding alarm change exists, the data content of the current monitoring data item and the matching alarm information (such as the alarm time and the alarm rule) are added to the data table alongside the data content and alarm information of the historical monitoring data item.

As shown in fig. 6, when the alarm record is updated in operation S608, if an alarm process already exists for the historical monitoring data item and the alarm of the current monitoring data item has not changed, the latest occurrence time of the alarm, the trigger count, and the like are updated; if the alarm of the current monitoring data item has recovered, the alarm recovery time, the alarm state, and the like are updated. Conversely, if the alarm of the current monitoring data item has changed, a new alarm record is added, containing alarm information such as the first occurrence time of the alarm and the alarm level state. After the record is updated, the aggregated alarm program module is entered, as in operation S609.

If the latest current monitoring data item does not meet the triggering condition, it is judged whether it meets the recovery condition. If so, the latest alarm state and alarm time are recorded, the record is updated, the alarm rule daemon is canceled, and the alarm rule state is changed to untriggered. If the latest current monitoring data item meets neither the triggering condition nor the recovery condition, the process ends.
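The record handling described above (refresh an unchanged alarm, add a new one, or mark recovery) might be sketched as below. The `AlarmRecord` fields and the `update_record` function are assumptions made for illustration; the disclosure does not specify a record layout, and the daemon cancellation step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AlarmRecord:
    first_time: float
    latest_time: float
    trigger_count: int = 1
    state: str = "triggered"
    recovery_time: Optional[float] = None


def update_record(records: Dict[str, AlarmRecord],
                  point_id: str,
                  value: float,
                  now: float,
                  trigger: Callable[[float], bool],
                  recover: Callable[[float], bool]) -> None:
    """Apply one current monitoring data item to the alarm records."""
    record = records.get(point_id)
    active = record is not None and record.state == "triggered"
    if trigger(value):
        if active:
            # Alarm unchanged: refresh the latest time and the trigger count.
            record.latest_time = now
            record.trigger_count += 1
        else:
            # Alarm change: add a new alarm record.
            records[point_id] = AlarmRecord(first_time=now, latest_time=now)
    elif active and recover(value):
        # Alarm recovered: record the recovery time and reset the state.
        record.state = "recovered"
        record.recovery_time = now
    # Otherwise neither condition is met and the records stay as they are.
```

Keeping the trigger and recovery predicates separate mirrors the text, where an item can fail both conditions and leave the alarm state untouched.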

Therefore, real-time parallel judgment of each data item in the data stream of the warehouse-in monitoring data can be realized, the data volume of the data stream can be screened and updated, the data processing volume is greatly reduced, the resource consumption and the data processing delay are reduced, the data processing speed is improved, the integrity of the alarm data is ensured, the omission and loss of abnormal data are avoided, and the accuracy of data alarm aggregation is improved.

Fig. 7 schematically illustrates an output flow diagram of a root cause device according to an embodiment of the present disclosure.

As shown in fig. 2 to 7, according to an embodiment of the present disclosure, traversing root cause devices of alarm root causes in a device topology according to an alarm trigger state of a current monitoring data item in operation S203, for outputting the alarm root causes of the root cause devices, includes:

determining the position state information of the current monitoring equipment corresponding to the current monitoring data item in the equipment topological structure according to the alarm triggering state of the current monitoring data item;

and traversing the upstream equipment in the equipment topological structure according to the position state information of the current monitoring equipment, and outputting root cause equipment.

In order to meet the alarm aggregation processing of the aggregated alarm scene and complete the continuous monitoring and updating of the alarm state, the equipment corresponding to the current monitoring data item can be positioned according to the equipment topology association relation defined by the equipment topology structure. As shown in fig. 3, if the current monitoring data item is already in the triggered alarm triggering state, the UPS device 370 corresponding to the current monitoring data item may be located, and the tracking is performed according to the upstream device low-voltage power distribution cabinet 340 associated with the UPS device 370. Therefore, the position of the device in the device topology structure can be defined by means of the device topology association relationship, for example, the device high voltage distribution cabinet 310 is an original upstream device, and the defined position is (0, 0); transformers 320 and 330, which are downstream devices of the high-voltage distribution cabinet 310, are in a first-stage association relationship, and can be respectively positioned at positions (1, 0) and (1, 1); correspondingly, the low-voltage power distribution cabinets 340, 350, and 360 as the downstream devices of the transformer 320 are in the second-stage association relationship, and can be respectively located at positions (2, 0), (2, 1), and (2, 2); UPS devices 370, 380, 390 that are downstream devices of the low voltage distribution cabinet 340 are in a third level association and may be located at positions (3, 0), (3, 1) and (3, 2), respectively.

Therefore, the location position, i.e. the location state information, of the current monitoring device in the device topology structure of the current monitoring data item can be understood as an associated coordinate defined by the device management level. The definition of the associated position of each device can be realized through the position state information of each device, thereby being beneficial to positioning or tracing each device. The corresponding alarm information about the current monitoring data item is input, and the aggregated alarm check is performed, so that the associated topology condition of the alarm rule can be further judged, as in operations S701 to S703.
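One way to derive such coordinates is a breadth-first walk over the downstream relationships of fig. 3. The sketch below is illustrative only: it assumes the second coordinate is the order in which a device is reached within its level, which reproduces the positions listed above but is not stated as the rule in the disclosure. The names `DOWNSTREAM` and `assign_positions` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Downstream relationships of fig. 3, keyed by device reference numeral.
DOWNSTREAM: Dict[int, List[int]] = {
    310: [320, 330],
    320: [340, 350, 360],
    340: [370, 380, 390],
}


def assign_positions(root: int,
                     downstream: Dict[int, List[int]]) -> Dict[int, Tuple[int, int]]:
    """Give each device a (level, index-within-level) coordinate."""
    positions: Dict[int, Tuple[int, int]] = {}
    next_index: Dict[int, int] = {}  # next free index on each level
    queue = deque([(root, 0)])
    while queue:
        device, level = queue.popleft()
        index = next_index.get(level, 0)
        positions[device] = (level, index)
        next_index[level] = index + 1
        for child in downstream.get(device, []):
            queue.append((child, level + 1))
    return positions


positions = assign_positions(310, DOWNSTREAM)
```

With this input, `positions[340]` is `(2, 0)` and `positions[390]` is `(3, 2)`, matching the coordinates given for fig. 3.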

When the current monitoring data item is judged to be in the triggered alarm triggering state, the abnormal influence range in the whole equipment topological structure can be determined according to the tracing of upstream equipment. As shown in fig. 3, when the UPS device 370 is a current monitoring device corresponding to a current monitoring data item, and the data item is already in an alarm triggering state for triggering an alarm, the UPS device 370 is traversed according to the location state information associated with the data item in the device topology structure, as shown in operations S704-S706. And if no corresponding equipment topological structure is set, directly processing the operation and maintenance data.

Therefore, when the alarm triggering state of the current monitoring data item is triggered, the position of the current monitoring equipment corresponding to the current monitoring data item in the equipment topological structure can be obtained, and the position is marked as alarm triggering, so that the root cause equipment corresponding to the position of the equipment topological structure can be traversed and traced by means of the position of the equipment topological structure.

As shown in fig. 2 to fig. 7, according to an embodiment of the present disclosure, in the device topology, performing traversal on the upstream device according to the location state information of the current monitoring device, and outputting a root cause device, the method includes:

determining the last final trigger equipment in an alarm trigger state in the traversal execution process of the upstream equipment;

and outputting the final trigger equipment as root cause equipment based on the alarm aggregation information of the final trigger equipment.

In the embodiment of the present disclosure, alarm aggregation is a data processing technique that aggregates and converges the alarms of various upstream systems or upstream devices through a series of algorithms and finds the root cause node that triggered the alarms, so as to reduce alarm storms.

Specifically, the upper-level device of the current monitoring device is traversed, and it is determined whether that upper-level device is also in the triggered alarm trigger state. If so, the traversal continues upward until a device whose alarm is not triggered is reached. The last device found in the triggered state during the traversal is output as the final trigger device, and its alarm information is determined at the same time; that is, the final trigger device is the root cause device, as in operations S704 to S706.

The alarm information of the final trigger device may be used to form the alarm aggregation information, and the alarm aggregation information may be an alarm information record set of all triggered devices determined by the traversal process in the device topology structure. According to the alarm aggregation information, the output of the final trigger equipment can be realized, and the output is root cause equipment.
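The upstream traversal can be sketched as a walk over a parent map that stops at the first upstream device whose alarm is not triggered. The device names, the `UPSTREAM` map, and the `find_root_cause` function are hypothetical and only illustrate the idea under the assumption that each device has at most one upstream device.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical parent map: each device points to its upstream device.
UPSTREAM: Dict[str, Optional[str]] = {
    "hv_cabinet_310": None,
    "transformer_320": "hv_cabinet_310",
    "lv_cabinet_340": "transformer_320",
    "ups_370": "lv_cabinet_340",
}


def find_root_cause(device: str,
                    triggered: Set[str],
                    upstream: Dict[str, Optional[str]]) -> Tuple[str, List[str]]:
    """Walk upstream while devices are in the triggered state.

    Returns the last triggered device (the root cause device) and the
    chain of triggered devices visited on the way.
    """
    if device not in triggered:
        raise ValueError("traversal starts from a device whose alarm is triggered")
    chain = [device]
    parent = upstream.get(device)
    while parent is not None and parent in triggered:
        chain.append(parent)
        parent = upstream.get(parent)
    return chain[-1], chain
```

The returned chain plays the role of the alarm information record set of all triggered devices found during the traversal.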

Therefore, compared with the traditional mode that the workload is huge and the flexibility is poor due to the one-by-one management configuration of the alarm rules, the workload of data processing can be greatly reduced through the topological association relation defined by the topological structure of the equipment, the data processing is more flexible, and the quick update of the association information can be met, so that the association is more favorable for the application.

Therefore, by means of the matching of the alarm rules, the triggering alarm check is realized, the corresponding alarm equipment is identified based on the triggering alarm check result (namely the alarm trigger state) of each data item, the tracing output of the root cause equipment is realized according to the topological correlation state of the alarm equipment, the large-area generation of various alarm storms can be basically avoided, the operation and maintenance stability in the alarm aggregation process is ensured, the operation and maintenance safety is obviously improved, and the operation and maintenance efficiency is improved.

As shown in fig. 2 to 7, according to an embodiment of the present disclosure, traversing root cause devices of alarm root causes in a device topology according to an alarm trigger state of a current monitoring data item in operation S203, for outputting the alarm root causes of the root cause devices, further includes:

updating the aggregation alarm information of the root cause equipment;

and outputting the updated aggregated alarm information as an alarm root cause.

If it is determined that the existing alarm aggregation records contain no root cause device corresponding to the current monitoring device, the dimension of the current root cause device may be increased and alarm aggregation information may be added according to the topological association relationship of the device topology in which the current monitoring device is located. Aggregated alarm information such as the aggregated topology information, the root cause device type, the name of the matched alarm rule, the alarm occurrence time, and the alarm aggregation count is recorded, as in operation S707. Finally, the alarm aggregation record is output as the alarm root cause and forwarded for dimension processing, as in operation S708. When the alarm information of the final root cause device indicates that the alarm has recovered, the alarm aggregation state can be changed to recovered.
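A possible shape for the aggregation record is sketched below. The `AggregationRecord` fields, `aggregate`, and `mark_recovered` are assumptions chosen to mirror the items listed above (root cause device, rule name, occurrence time, aggregation count, state); the disclosure does not define a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AggregationRecord:
    root_device: str
    rule_name: str
    first_time: float
    aggregated_devices: Set[str] = field(default_factory=set)
    aggregation_count: int = 0
    state: str = "triggered"


def aggregate(records: Dict[str, AggregationRecord],
              root: str,
              device: str,
              rule_name: str,
              now: float) -> AggregationRecord:
    """Attach an alarm from `device` to the aggregation record of `root`."""
    record = records.get(root)
    if record is None:
        # No record exists for this root cause device yet: add one.
        record = AggregationRecord(root_device=root, rule_name=rule_name, first_time=now)
        records[root] = record
    if device != root:
        record.aggregated_devices.add(device)
    record.aggregation_count += 1
    return record


def mark_recovered(records: Dict[str, AggregationRecord], root: str) -> None:
    """Set the aggregation state to recovered once the root alarm recovers."""
    if root in records:
        records[root].state = "recovered"
```

Keying the records by root cause device means that every downstream alarm traced to the same device updates one record instead of producing a separate alarm.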

Therefore, the method of the embodiment of the present disclosure can standardize the equipment which may cause an alarm storm and the alarm information by using the upstream and downstream association relationship of the machine room infrastructure, match the alarm rule for the corresponding data item through the equipment topology association setting on the premise of not depending on expert judgment and data analysis, and combine alarm aggregation to quickly find the alarm root cause equipment, so that the method is suitable for the operation and maintenance monitoring scene with a tighter upstream and downstream relationship, can significantly improve the operation and maintenance safety, and improve the operation and maintenance efficiency.

As shown in fig. 3, to further embody the beneficial effects of the above technical solution, the following specific embodiments are further provided for those skilled in the art to more fully understand the above method of the embodiment of the present disclosure.

First, the device standard information and the topology association information of the current monitoring devices are maintained. Specifically, the high-voltage power distribution cabinet 310 is taken as the root node; its downstream devices are the transformers 320 and 330; the low-voltage power distribution cabinets 340, 350, and 360 are arranged below the transformer 320, followed further downstream by the UPS devices 370, 380, and 390. Device naming and topology settings are standardized according to the information standardization scheme.

Then, the alarm rule information of each device is configured so that an alarm is raised when the switch state of the current monitoring device is 0 (for example, with the alarm rule set to "0-abnormal, 1-normal"), and the alarm rule is matched to the corresponding devices shown in fig. 3.

Current monitoring data collected through Internet of Things technology is then received, cleaned, and stored to form the warehousing monitoring data. As shown in fig. 3, the topology position of the transformer 320 is (1, 0), the topology positions of the low-voltage distribution cabinets 340, 350, and 360 are (2, 0), (2, 1), and (2, 2), and the topology positions of the UPS devices 370, 380, and 390 are (3, 0), (3, 1), and (3, 2). If the states of these devices change to 0, alarms are triggered.

The system judges the equipment alarm triggering and generates an alarm.

After the transformer 320 (1, 0) raises an alarm, no alarming device is found further upstream, so the transformer 320 (1, 0) is the final aggregation alarm device. During topology traversal, the other 6 devices, such as the low-voltage power distribution cabinet 340 (2, 0), are all traced back to the transformer 320 (1, 0). The system therefore judges the alarm generated by the transformer 320 (1, 0) to be the root cause alarm, judges the alarms generated by the other 6 devices to be aggregated alarms, and outputs the root cause alarm.
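This specific embodiment can be replayed as a small worked example. The sketch assumes the fig. 3 topology described in this embodiment and the "0-abnormal, 1-normal" switch rule; the names `UPSTREAM`, `STATES`, and `root_of` are hypothetical.

```python
from typing import Dict, List, Optional

# Upstream device of each device in fig. 3, keyed by reference numeral.
UPSTREAM: Dict[int, Optional[int]] = {
    310: None, 320: 310, 330: 310,
    340: 320, 350: 320, 360: 320,
    370: 340, 380: 340, 390: 340,
}

# Switch states in this embodiment: 0 is abnormal, 1 is normal.
STATES: Dict[int, int] = {
    310: 1, 320: 0, 330: 1,
    340: 0, 350: 0, 360: 0,
    370: 0, 380: 0, 390: 0,
}


def root_of(device: int) -> int:
    """Follow upstream devices for as long as they are also alarming."""
    current = device
    parent = UPSTREAM[current]
    while parent is not None and STATES[parent] == 0:
        current = parent
        parent = UPSTREAM[current]
    return current


alarming = [device for device, state in STATES.items() if state == 0]
groups: Dict[int, List[int]] = {}
for device in alarming:
    groups.setdefault(root_of(device), []).append(device)

for root, members in groups.items():
    aggregated = [device for device in members if device != root]
    print(f"root cause alarm: {root}, aggregated alarms: {aggregated}")
# prints: root cause alarm: 320, aggregated alarms: [340, 350, 360, 370, 380, 390]
```

Seven alarms collapse into one root cause alarm on the transformer 320 with six aggregated alarms, which is the convergence effect the embodiment describes.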

Therefore, in the operation and maintenance process of the infrastructure of the data center machine room, if the upstream equipment fails or operates abnormally, the downstream equipment of the equipment represented by the power system, such as the transformer, the high-voltage power distribution cabinet, the low-voltage power distribution cabinet, the UPS and the like, is affected, and when the upstream main node equipment triggers an alarm, the associated downstream equipment generates an alarm storm, which is not beneficial to the rapid positioning and analysis of the alarm. By the method for outputting the alarm root cause, the root cause can be quickly positioned when a large number of alarms and alarm storms occur based on the equipment topological structure of the machine room infrastructure and through collection and topological analysis of equipment alarm information, so that the effects of alarm aggregation and alarm convergence are achieved.

Based on the alarm root cause output method, the disclosure also provides an alarm root cause output device. The apparatus will be described in detail below with reference to fig. 8.

Fig. 8 schematically shows a block diagram of a structure of an alarm root cause output device according to an embodiment of the present disclosure.

As shown in fig. 8, the alarm root cause output apparatus 800 of this embodiment includes a rule determination module 810, a state identification module 820, and a device traversal module 830.

The rule determination module 810 is used to determine alarm rules for upstream monitoring data. In an embodiment, the rule determining module 810 may be configured to perform the operation S201 described above, which is not described herein again.

The state identification module 820 is used for identifying the alarm trigger state of the current monitoring data item relative to the historical monitoring data item in the upstream monitoring data based on the alarm rule. In an embodiment, the state identification module 820 may be configured to perform the operation S202 described above, which is not described herein again.

The device traversal module 830 is used for traversing root cause devices of the alarm root cause in the device topology according to the alarm trigger state of the current monitoring data item and outputting the alarm root cause of the root cause device. In an embodiment, the device traversal module 830 may be configured to perform the operation S203 described above, which is not described herein again.

According to an embodiment of the present disclosure, any of the rule determination module 810, the state identification module 820, and the device traversal module 830 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the rule determination module 810, the state identification module 820, and the device traversal module 830 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware. Alternatively, at least one of the rule determination module 810, the state identification module 820, and the device traversal module 830 may be implemented at least in part as a computer program module that, when executed, may perform corresponding functions.

As shown in fig. 9, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

Electronic device 900 may also include input/output (I/O) interface 905, input/output (I/O) interface 905 also connected to bus 904, according to an embodiment of the present disclosure. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the method provided by the embodiment of the disclosure.

The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming language includes, but is not limited to, programming languages such as Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that various combinations and/or incorporations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, various combinations and/or incorporations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. An alarm root cause output method, comprising:

determining an alarm rule of upstream monitoring data;

identifying an alarm trigger state of a current monitoring data item relative to a historical monitoring data item in the upstream monitoring data based on the alarm rule; and

traversing root cause equipment of the alarm root cause in an equipment topological structure according to the alarm triggering state of the current monitoring data item, and outputting the alarm root cause of the root cause equipment.

2. The method of claim 1, wherein prior to the determining the alarm rule for upstream monitoring data, further comprising:

and configuring the topological structure of the equipment according to the standard information of the equipment.

3. The method of claim 2, wherein the determining of the device standard information corresponding to the upstream monitoring data includes:

standardizing monitoring point location information corresponding to the equipment type information based on the standardized equipment type information to generate equipment standard information;

the equipment type information comprises an equipment name and a corresponding equipment number; the monitoring point location information comprises a monitoring point location name corresponding to the equipment name and a corresponding monitoring point location number.

4. The method of claim 2, wherein said configuring the device topology according to the device standard information comprises:

and configuring the device topology association corresponding to the upstream monitoring data according to the device standard information through device topology editing to complete the configuration of the device topology structure.

5. The method of claim 2, wherein the determining the alarm rule of the upstream monitoring data comprises:

establishing a rule item of the alarm rule according to equipment standard information corresponding to the upstream monitoring data;

generating an alarm condition corresponding to the rule item to determine the alarm rule;

6. The method of claim 1, wherein prior to said identifying an alarm trigger state of a current monitoring data item relative to a historical monitoring data item in said upstream monitoring data based on said alarm rule, further comprising:

identifying a data change status of a current normalized data item relative to a historical normalized data item in the upstream normalized data;

7. The method of claim 6, wherein in the identifying an alarm trigger state of a current monitoring data item in the upstream monitoring data relative to a historical monitoring data item based on the alarm rule comprises:

inquiring the alarm rule matched with the current monitoring data item according to the equipment standard information corresponding to the current monitoring data item in the warehousing monitoring data;

8. The method of claim 7, wherein performing, in response to the queried alarm rule, a trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item comprises:

determining that the alarm triggering state of the equipment monitoring point position corresponding to the historical monitoring data item is not triggered through the triggering check;

responding to the alarm triggering state which is not triggered, and adding the alarm triggering state of the current monitoring data when the monitoring point position monitoring information of the equipment corresponding to the current monitoring data item meets the alarm triggering condition of the alarm rule.

9. The method of claim 7, wherein performing, in response to the queried alarm rule, a trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item further comprises:

determining, through the trigger check, that the alarm trigger state of the equipment monitoring point corresponding to the historical monitoring data item is triggered; and

in response to the alarm trigger state being triggered, updating the alarm trigger state of the current monitoring data item when the monitoring information of the equipment monitoring point corresponding to the current monitoring data item meets the alarm trigger condition of the alarm rule.

10. The method of claim 7, wherein performing, in response to the queried alarm rule, a trigger check on the current monitoring data item to determine an alarm trigger state of the current monitoring data item relative to the historical monitoring data item further comprises:

based on the alarm trigger state, adding alarm information corresponding to the current monitoring data item when an alarm change corresponding to the current monitoring data item exists.
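
The trigger check of claims 8 to 10 can be sketched as follows. This is an illustrative reading under stated assumptions, not the applicant's implementation: the in-memory `states` dictionary stands in for the historical alarm trigger states, `alarm_log` stands in for the stored alarm information, and the function name and return values are hypothetical. The claims do not say what happens when the condition is not met, so this sketch simply leaves the stored state unchanged in that case.

```python
from typing import Callable, Dict, List, Optional


def trigger_check(
    point_id: str,
    value: float,
    condition: Callable[[float], bool],
    states: Dict[str, dict],
    alarm_log: List[dict],
) -> Optional[str]:
    """Compare the current item against the historical trigger state.

    Returns "added" when a new trigger state is created (claim 8),
    "updated" when an existing trigger state is refreshed (claim 9),
    and None when the alarm condition is not met (assumption).
    """
    if not condition(value):
        # Not covered by the claims: keep the historical state as it is.
        return None

    previous = states.get(point_id)
    if previous is None:
        # Historical state is "not triggered": add the trigger state and,
        # because the alarm has changed, add alarm information (claim 10).
        states[point_id] = {"triggered": True, "value": value}
        alarm_log.append({"point_id": point_id, "value": value})
        return "added"

    # Historical state is "triggered": update the existing trigger state.
    previous["value"] = value
    return "updated"


# Usage with a hypothetical over-temperature condition.
states: Dict[str, dict] = {}
alarm_log: List[dict] = []
over_temp = lambda v: v > 30
first = trigger_check("rack1.temp", 35, over_temp, states, alarm_log)
second = trigger_check("rack1.temp", 40, over_temp, states, alarm_log)
```

In this sketch only the first crossing of the threshold produces new alarm information; later readings that still meet the condition refresh the stored state without creating a duplicate alarm.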

11. The method of claim 1, wherein traversing root cause devices of the alarm root cause in a device topology according to the alarm trigger state of the current monitoring data item and outputting the alarm root cause of the root cause devices comprises:

traversing the upstream equipment in the equipment topological structure according to the position state information of the current monitoring equipment, and outputting the root cause equipment.

12. The method of claim 11, wherein traversing the upstream device in the device topology according to the location status information of the current monitoring device, and outputting the root cause device comprises:

determining, during the traversal of the upstream device, the last device found in an alarm trigger state as the final trigger device;

and outputting the final trigger equipment as the root cause equipment based on the alarm aggregation information of the final trigger equipment.

13. The method of claim 11, wherein traversing root cause devices of the alarm root cause in a device topology according to the alarm trigger state of the current monitoring data item and outputting the alarm root cause of the root cause devices further comprises:

updating the aggregation alarm information of the root cause equipment;

and outputting the updated aggregated alarm information as the alarm root cause.
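
The upstream traversal and aggregation of claims 11 to 13 might look like the sketch below. It assumes a tree topology in which each device has at most one upstream device, stored in a hypothetical `upstream_of` mapping, and a hypothetical `triggered` set of devices currently in an alarm trigger state. The claims do not state whether the traversal stops at the first non-alarming device; this sketch walks the whole upstream chain and keeps the last triggered device it sees.

```python
from typing import Dict, List, Optional, Set


def find_root_cause(
    device: str,
    upstream_of: Dict[str, Optional[str]],
    triggered: Set[str],
) -> str:
    """Walk upstream from `device` and return the last triggered device seen."""
    root = device
    visited = {device}
    current = upstream_of.get(device)
    while current is not None and current not in visited:
        visited.add(current)  # guards against accidental cycles in the mapping
        if current in triggered:
            root = current    # candidate final trigger device
        current = upstream_of.get(current)
    return root


def output_root_cause(
    device: str,
    upstream_of: Dict[str, Optional[str]],
    triggered: Set[str],
    aggregated: Dict[str, List[str]],
) -> dict:
    """Update the aggregated alarm information of the root cause device."""
    root = find_root_cause(device, upstream_of, triggered)
    members = aggregated.setdefault(root, [])
    if device not in members:
        members.append(device)
    return {"root_cause_device": root, "aggregated_alarms": list(members)}


# Usage: two racks fed by one PDU, which is fed by one UPS (hypothetical names).
topology = {"rack1": "pdu1", "rack2": "pdu1", "pdu1": "ups1", "ups1": None}
alarming = {"rack1", "rack2", "pdu1", "ups1"}
aggregated: Dict[str, List[str]] = {}
result1 = output_root_cause("rack1", topology, alarming, aggregated)
result2 = output_root_cause("rack2", topology, alarming, aggregated)
```

With this layout both rack alarms are folded into a single aggregated entry under the UPS, which is the effect the aggregation step is meant to achieve when a core node fails.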

14. An alarm root cause output device, comprising:

the rule determining module is used for determining an alarm rule of the upstream monitoring data;

the state identification module is used for identifying the alarm triggering state of the current monitoring data item relative to the historical monitoring data item in the upstream monitoring data based on the alarm rule; and

the equipment traversing module is used for traversing root cause equipment of the alarm root cause in an equipment topological structure according to the alarm triggering state of the current monitoring data item, and for outputting the alarm root cause of the root cause equipment.

15. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-13.

16. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 13.

17. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 13.