CN109408325B - Method and device for performing alarm operation - Google Patents

Publication number
CN109408325B
Authority
CN
China
Prior art keywords
storage unit
operation parameter
preset
current value
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811147264.4A
Other languages
Chinese (zh)
Other versions
CN109408325A (en)
Inventor
朱东阳
徐一鸣
张亚风
丁晨
李静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Alarm Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure relates to a method and a device for performing an alarm operation, and belongs to the technical field of data storage. The method comprises the following steps: if the current value of the first operation parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, acquiring the current value of the first operation parameter of the associated storage unit corresponding to the first storage unit; and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit, wherein the prediction alarm range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit. With the method and the device, it can be effectively predicted whether the problem observed in the first operation parameter of the first storage unit is also occurring, or about to occur, in a wider scope of storage units.

Description

Method and device for performing alarm operation
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a method and an apparatus for performing an alarm operation.
Background
With the development of science and technology, storage technology is applied more and more widely. A storage system with a large storage space also has a complicated structure. The storage system may include a plurality of Regions (cloud regions), each Region may include a plurality of AZs (Availability Zones), each AZ may include a plurality of PODs (Points of Delivery), each POD may include a plurality of servers, each server may further include a plurality of VMs (Virtual Machines), and each VM may include a database for storing user data. While the storage system is running, the operation parameter values of each VM can be monitored, and alarm processing is performed when the operation parameter value of any VM is found to be within a preset alarm range. After checking the prompt information generated by the alarm processing, system maintenance personnel can perform the corresponding maintenance operation on the abnormal VM.
In carrying out the present disclosure, the inventors found that at least the following problems exist:
in some cases, for example during network congestion, when the operating parameter value of a VM is within its preset alarm range, the operating parameter values of the server to which the VM belongs, and even of the larger-scope storage units to which that server belongs (such as the POD, AZ, and Region), may also be within their corresponding preset alarm ranges because of the congestion. Alarming only on the individual VM therefore cannot prevent the same problem in the wider scope of storage units in a timely and effective manner.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides the following technical solutions:
in a first aspect, a method of performing an alarm operation is provided, the method comprising:
monitoring whether the current value of the first operation parameter of each storage unit is within a preset alarm range corresponding to each storage unit;
if the current value of the first operation parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, acquiring the current value of the first operation parameter of the associated storage unit corresponding to the first storage unit;
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within a preset prediction alarm range corresponding to the second storage unit, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit, wherein the prediction alarm range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit.
In a possible implementation manner, if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, the alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit includes:
if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and the value of the first operation parameter of the second storage unit has been continuously increasing within a preset historical duration, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit.
In a possible implementation manner, if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and the value of the first operation parameter of the second storage unit has been continuously increasing within the preset historical duration, the alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit includes:
if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and, for adjacent time points within the preset historical duration, the value of the first operation parameter of the second storage unit acquired at the later time point is greater than the value acquired at the earlier time point, alarming the first operation parameters of the first storage unit and the second storage unit.
In a possible implementation manner, if the current value of the first operation parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, the method further includes:
acquiring a current numerical value of at least one associated operation parameter corresponding to a first operation parameter of a first storage unit;
if the current value of a second operation parameter among the at least one associated operation parameter is within a preset fault range, determining target fault information corresponding to the first operation parameter and the second operation parameter according to a preset correspondence among first operation parameters, associated operation parameters, and fault information;
and displaying the target fault information in the process of alarming.
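The fault-information lookup described above can be sketched as a table lookup. This is only an illustrative sketch: the parameter names, the fault ranges, the fault descriptions, and the helper `target_fault_info` are all hypothetical and are not specified in the disclosure.

```python
# Hypothetical correspondence: (first parameter, associated parameter) -> fault information.
FAULT_TABLE = {
    ("slow_sql_count", "disk_rw_speed"): "disk I/O bottleneck",
    ("slow_sql_count", "cpu_usage"): "CPU saturation",
}

# Hypothetical preset fault ranges, expressed as predicates on the current value.
FAULT_RANGES = {
    "disk_rw_speed": lambda v: v < 50.0,  # assumed unit: MB/s
    "cpu_usage": lambda v: v > 0.9,       # assumed unit: fraction of one CPU
}


def target_fault_info(first_param, associated_values):
    """Return fault information for each associated parameter whose
    current value lies in its preset fault range."""
    infos = []
    for name, value in associated_values.items():
        in_fault_range = FAULT_RANGES.get(name)
        if in_fault_range is not None and in_fault_range(value):
            info = FAULT_TABLE.get((first_param, name))
            if info is not None:
                infos.append(info)
    return infos
```

The returned strings stand in for the target fault information that would be displayed alongside the alarm.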
In one possible implementation, the associated storage unit includes each upper level storage unit of the first storage unit.
In one possible implementation, the associated storage unit further includes the lower-level storage units of each upper-level storage unit other than the first storage unit.
In a possible implementation manner, the storage units include, in order from the upper level to the lower level, a cloud region (Region), an availability zone (AZ), a point of delivery (POD), and a virtual machine (VM).
In a second aspect, an apparatus for performing an alarm operation is provided, the apparatus comprising at least one module for implementing the method for performing an alarm operation provided in the first aspect.
In a third aspect, a server is provided that includes a processor and a memory, the processor being configured to execute instructions stored in the memory; the processor executes the instructions to implement the method for performing the alarm operation provided by the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, comprising instructions which, when run on a server, cause the server to perform the method of the first aspect described above.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a server, cause the server to perform the method of the first aspect described above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
with the method provided by the embodiments of the disclosure, when the value of the first operation parameter of the first storage unit is within the preset alarm range, it can be determined whether the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit. Because the prediction alarm range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit, it can be effectively predicted whether the problem observed in the first operation parameter of the first storage unit is also occurring, or about to occur, in a wider scope of storage units.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. In the drawings:
FIG. 1 is a schematic diagram illustrating the structure of a storage system in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating the structure of a server in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of performing an alarm operation in accordance with one exemplary embodiment;
FIG. 4 is a flow diagram illustrating a method of performing an alarm operation in accordance with one exemplary embodiment;
FIG. 5 is a flow diagram illustrating a method of performing an alarm operation in accordance with one exemplary embodiment;
FIG. 6 is a schematic diagram illustrating the structure of an apparatus for performing an alarm operation in accordance with an exemplary embodiment.
The foregoing drawings show specific embodiments of the disclosure, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The disclosed embodiments provide a method for performing an alarm operation, which may be implemented by a storage system. As shown in FIG. 1, the storage system may include a plurality of Regions, each Region may include a plurality of AZs, each AZ may include a plurality of PODs, each POD may include a plurality of servers, each server may further include a plurality of VMs, and each VM may further include a database for storing user data.
As shown in FIG. 2, the server may include a processor 21, a transmitter 22, and a receiver 23, and the receiver 23 and the transmitter 22 may each be connected to the processor 21. The receiver 23 may be used to receive messages or data, the transmitter 22 may be used to transmit messages or data, and the transmitter 22 and the receiver 23 may be network cards. The server may also include an acceleration component (which may be referred to as an accelerator); when the acceleration component is a network acceleration component, it may be a network card. The processor 21 may be the control center of the server and connects the various parts of the entire server, such as the receiver 23 and the transmitter 22, through various interfaces and lines. In the present invention, the processor 21 may be a Central Processing Unit (CPU). Optionally, the processor 21 may include one or more processing units; the processor 21 may integrate an application processor, which primarily handles the operating system, and a modem processor, which primarily handles wireless communications. The processor 21 may also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, or another programmable logic device. The server may further include a memory 24, which may be used to store software programs and modules; the processor 21 executes the various functional applications and data processing of the server by reading the software code and modules stored in the memory 24.
An exemplary embodiment of the present disclosure provides a method for performing an alarm operation, and as shown in FIG. 3, a processing flow of the method may include the following steps:
Step S210, monitoring whether the current value of the first operating parameter of each storage unit is within the preset alarm range corresponding to each storage unit.
In an implementation, the storage units may include, in order from the upper level to the lower level, a Region, an AZ, a POD, a VM, a database, and the like. The operating parameters may include parameters such as CPU (Central Processing Unit) usage, hard disk read/write speed, and network bandwidth. A corresponding alarm range may be set for each operating parameter of each storage unit. For example, the alarm range for the CPU usage of a VM may be set to greater than ninety percent.
In practical application, the current values of the operating parameters of the storage units can be acquired according to a preset period, and it is judged whether the current value of each operating parameter of each storage unit is within the preset alarm range corresponding to that storage unit. For example, the current value of the CPU usage of the VM may be obtained according to a preset period, and it may be determined whether the current value of the CPU usage of the VM is greater than ninety percent.
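One polling pass of step S210 can be sketched as follows. This is a minimal sketch under stated assumptions: the `Range` class, the `units_in_alarm` helper, the unit identifiers, and the open-interval convention are hypothetical; only the "CPU usage greater than ninety percent" example comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """An open interval (low, high); unbounded sides default to infinity."""
    low: float = float("-inf")
    high: float = float("inf")

    def contains(self, value):
        return self.low < value < self.high


def units_in_alarm(current_values, alarm_ranges):
    """Both mappings are keyed by (unit_id, parameter_name).

    Returns the keys whose current value lies in the preset alarm range.
    """
    return [
        key
        for key, value in current_values.items()
        if key in alarm_ranges and alarm_ranges[key].contains(value)
    ]
```

In a deployment this function would be called once per preset period with freshly sampled values; how the samples are collected is outside the scope of the sketch.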
Step S220, if the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, obtaining the current value of the first operating parameter of the associated storage unit corresponding to the first storage unit.
In implementation, if the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, it may be determined that the current value of the first operating parameter of the first storage unit is in an abnormal range, and that some fault may have occurred in the first storage unit to cause this.
When the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, the associated storage unit corresponding to the first storage unit can be determined according to a preset correspondence between storage units and other storage units, and the current value of the first operating parameter of the associated storage unit corresponding to the first storage unit is obtained.
Optionally, a causal relationship library may be established, and the causal relationship library stores the correspondence between storage units and other storage units. When the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, the associated storage unit corresponding to the first storage unit can be looked up in the causal relationship library, and the current value of the first operating parameter of the associated storage unit corresponding to the first storage unit is obtained.
For example, suppose a large number of slow logs or slow SQL (Structured Query Language) statements exist in the database. The slow logs or slow SQL may be caused by a disk read-write speed of the VM that is too low and a CPU utilization rate that is too high. These in turn may be caused by a disk read-write speed of the server that is too low and a CPU utilization rate of the server that is too high, which may be caused by an average read-write speed of the POD that is too low, which may in turn be caused by an average read-write speed of the AZ that is too low.
Based on the above cause and effect relationship, the following chain may be stored in the causal relationship library: slow log or slow SQL → disk read-write speed and CPU utilization of the VM → disk read-write speed and CPU utilization of the server → average read-write speed of the POD → average read-write speed of the AZ. When a large number of slow logs or slow SQL are found in the database, the first operating parameter of the associated storage unit corresponding to the slow logs or the slow SQL can be looked up in the causal relationship library.
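The causal relationship library can be pictured as a mapping from a symptom to its possible upstream causes. The sketch below encodes the slow-SQL chain from the example; the dictionary layout, the key names, and the `causal_chain` helper are assumptions made for illustration, not a structure prescribed by the disclosure.

```python
from collections import deque

# Hypothetical encoding: (storage unit level, parameter) -> list of possible causes.
CAUSAL_LIBRARY = {
    ("database", "slow_sql"): [("vm", "disk_rw_speed"), ("vm", "cpu_usage")],
    ("vm", "disk_rw_speed"): [("server", "disk_rw_speed")],
    ("vm", "cpu_usage"): [("server", "cpu_usage")],
    ("server", "disk_rw_speed"): [("pod", "avg_rw_speed")],
    ("server", "cpu_usage"): [("pod", "avg_rw_speed")],
    ("pod", "avg_rw_speed"): [("az", "avg_rw_speed")],
}


def causal_chain(symptom, library=CAUSAL_LIBRARY):
    """Breadth-first walk returning every upstream (level, parameter)
    pair reachable from the symptom, in discovery order."""
    seen = {symptom}
    order = []
    queue = deque([symptom])
    while queue:
        for cause in library.get(queue.popleft(), []):
            if cause not in seen:
                seen.add(cause)
                order.append(cause)
                queue.append(cause)
    return order
```

Looking up `("database", "slow_sql")` yields the VM-level parameters first, then the server, POD, and AZ levels, which is the order in which their current values would be fetched.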
If the causal relationship library shows that the first operating parameters of the associated storage unit corresponding to the slow logs or slow SQL are the disk read-write speed and the CPU utilization rate of the VM, the current values of the disk read-write speed and the CPU utilization rate of the VM are obtained. If these current values are also within their respective alarm ranges, or are trending toward their respective alarm ranges, it can be determined that the disk read-write speed of the VM is too low and its CPU utilization rate is too high, and that this is what causes the large number of slow logs or slow SQL in the database. Moreover, it follows that the large number of slow logs or slow SQL is not confined to this one database: databases across a wider scope may already have, or may soon generate, a large number of slow logs or slow SQL.
Step S230, if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, an alarm is given for the first operating parameter of the first storage unit and the first operating parameter of the second storage unit.
The prediction alarm range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit.
In an implementation, the prediction alarm range corresponding to the second storage unit may be obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit. For example, if the alarm range for the CPU utilization of the VM is greater than ninety percent, the prediction alarm range for the CPU utilization of the VM may be greater than eighty percent. In other words, even if the current value of the first operating parameter of the second storage unit is not yet within the corresponding alarm range but is about to reach it, the current value can still be determined to be in an abnormal range.
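The range boundary expansion can be illustrated with the ninety-to-eighty percent example. The sketch assumes a fixed additive margin and open intervals; the disclosure does not say how the expanded boundary is chosen, so `expand_range`, `in_range`, and the margin value are hypothetical.

```python
import math


def expand_range(alarm_range, margin):
    """Widen an open interval (low, high) by `margin` on both sides.

    Infinite bounds stay infinite, so a one-sided range such as
    (90, +inf) only moves its finite boundary.
    """
    low, high = alarm_range
    return (low - margin, high + margin)


def in_range(value, rng):
    low, high = rng
    return low < value < high


# Alarm range "CPU utilization greater than 90%" and an assumed margin of 10 points.
alarm = (90.0, math.inf)
prediction = expand_range(alarm, 10.0)
```

A value of 85 percent falls outside `alarm` but inside `prediction`, which is the case the prediction alarm range is meant to catch.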
If the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, it can be determined that not only the current value of the first operating parameter of the first storage unit is in an abnormal range, but also a storage unit in a range larger than the first storage unit also has a similar problem.
After all abnormal storage units are determined, alarming can be carried out on the first operation parameters of all abnormal storage units, so that system maintenance personnel can carry out corresponding maintenance operation on the abnormal storage units after checking prompt information generated based on alarming.
Optionally, step S230 may include: if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and the value of the first operating parameter of the second storage unit has been continuously increasing within a preset historical duration, alarming the first operating parameter of the first storage unit and the first operating parameter of the second storage unit.
In practice, in addition to determining whether the current value of the first operating parameter of the second storage unit is about to reach the alarm range, it may also be determined whether the value is continuously increasing. Specifically, it may be determined whether the value of the first operating parameter of the second storage unit has kept growing within the preset historical duration; if so, it can be predicted that the value is likely to enter the alarm range corresponding to the second storage unit shortly. Therefore, if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and the value of the first operating parameter of the second storage unit has been continuously increasing within the preset historical duration, an alarm is given for the first operating parameter of the first storage unit and the first operating parameter of the second storage unit.
Optionally, the above step may include: if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and, for any two adjacent detection time points within the preset historical duration, the value of the first operating parameter of the second storage unit acquired at the later detection time point is greater than the value acquired at the earlier detection time point, alarming the first operating parameters of the first storage unit and the second storage unit.
In an implementation, the values of the first operating parameter of the second storage unit may be obtained periodically, so that all values within the preset historical duration are available. It is then determined whether, for each pair of adjacent detection time points, the value obtained at the later detection time point is greater than the value obtained at the earlier one; in other words, whether the value of the first operating parameter of the second storage unit keeps growing. If it does, it can be determined that the value is trending toward the alarm range corresponding to the second storage unit, and an alarm can be given for the first operating parameter of the first storage unit and the first operating parameter of the second storage unit.
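The adjacent-time-point comparison can be sketched as a pairwise check over the sampled history. The function names, the choice to require at least two samples, and the one-sided prediction threshold are assumptions for illustration.

```python
def strictly_increasing(samples):
    """True if every later sample is greater than the one before it.

    Assumption: at least two samples are needed to call it a trend.
    """
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )


def should_alarm(samples, prediction_low):
    """`samples` holds the values collected over the preset historical
    duration, oldest first; the last entry is the current value.

    Alarm only if the current value is above the (assumed one-sided)
    prediction threshold and the history is strictly increasing.
    """
    return (
        bool(samples)
        and samples[-1] > prediction_low
        and strictly_increasing(samples)
    )
```

For a parameter where smaller values are more abnormal, the same check would use `later < earlier` and a threshold on the other side.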
Alternatively, if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and the value of the first operating parameter of the second storage unit has been continuously decreasing within the preset historical duration, an alarm is given for the first operating parameter of the first storage unit and the first operating parameter of the second storage unit.
If a smaller value of the first operating parameter of the second storage unit indicates a more abnormal state, it can be judged whether the current value of the first operating parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and whether the value has been continuously decreasing within the preset historical duration. If both conditions hold, an alarm is given for the first operating parameter of the first storage unit and the first operating parameter of the second storage unit.
Optionally, the associated storage unit may include each upper-level storage unit of the first storage unit. The storage units can include, in order from the upper level to the lower level, a Region, an AZ, a POD, a VM, and a database. The associated storage unit may further include the lower-level storage units of each upper-level storage unit other than the first storage unit.
In practice, to predict, from the abnormal current value of the first operating parameter of the first storage unit, whether storage units of a wider scope than the first storage unit will have the same problem, the upper-level storage unit of the first storage unit can be determined, the current value of the first operating parameter of that upper-level storage unit can be obtained, and it can be judged whether this current value is within the corresponding prediction alarm range. If it is, the current value of the first operating parameter of the storage unit one level above that upper-level storage unit is obtained, and it is judged whether that value is within its corresponding prediction alarm range. These steps are repeated until all abnormal storage units have been found.
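The level-by-level upward check can be sketched as a loop over parent links. The sketch assumes the walk stops at the first upper-level unit that is not within its prediction alarm range, which the text implies but does not state outright; the `parent_of` mapping, the unit names, and the `in_prediction_range` callback are hypothetical.

```python
def abnormal_ancestors(unit, parent_of, in_prediction_range):
    """Climb from `unit` toward the root, collecting each upper-level
    unit whose current value is within its prediction alarm range.

    Stops at the first upper-level unit that is not in range, or at the root.
    """
    found = []
    current = parent_of.get(unit)
    while current is not None and in_prediction_range(current):
        found.append(current)
        current = parent_of.get(current)
    return found


# Hypothetical hierarchy: VM -> server -> POD -> AZ -> Region.
PARENT_OF = {
    "vm-1": "server-1",
    "server-1": "pod-1",
    "pod-1": "az-1",
    "az-1": "region-1",
}
```

The callback hides how the current value and the prediction alarm range of each unit are obtained, so the same loop works for any operating parameter.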
To predict, based on the abnormal current value of the first operating parameter of the first storage unit, whether a storage unit with a wider range than the first storage unit has the same problem, it is also possible, in addition to directly checking the upper-level storage unit of the first storage unit, to determine whether the current value of the first operating parameter of the same-level storage units of the first storage unit is within the corresponding alarm range.
Specifically, if the value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, the value of the first operating parameter of the same-level storage unit corresponding to the first storage unit is acquired. If the value of the first operating parameter of the same-level storage unit is within the corresponding preset prediction alarm range, the value of the first operating parameter of the upper-level storage unit corresponding to the first storage unit is acquired. If the value of the first operating parameter of the upper-level storage unit is within the corresponding preset prediction alarm range, the upper-level storage unit is taken as the first storage unit, and the step of acquiring the value of the first operating parameter of the same-level storage unit corresponding to the first storage unit is executed again. An alarm is then raised for the first operating parameter of the first storage unit, of each same-level storage unit within its corresponding preset prediction alarm range, and of each upper-level storage unit within its corresponding preset prediction alarm range.
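By way of example only, the same-level-then-upper-level procedure may be sketched as follows. The dictionary-based hierarchy, the function name `collect_alarm_units`, and all unit names and values are hypothetical. The sketch also assumes that the search stops when no same-level unit is within its prediction alarm range, which the text above implies but does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]  # assumed representation: closed interval [low, high]


def in_range(value: float, rng: Range) -> bool:
    return rng[0] <= value <= rng[1]


def collect_alarm_units(
    first: str,
    parent_of: Dict[str, Optional[str]],
    values: Dict[str, float],
    alarm_ranges: Dict[str, Range],
    prediction_ranges: Dict[str, Range],
) -> List[str]:
    """Return the units to alarm on: the first unit, the same-level units in
    their prediction alarm range, and the upper-level units in theirs."""
    if not in_range(values[first], alarm_ranges[first]):
        return []
    to_alarm = [first]
    current = first
    while True:
        parent = parent_of.get(current)
        if parent is None:
            break  # top of the hierarchy reached
        peers = [u for u, p in parent_of.items() if p == parent and u != current]
        abnormal_peers = [u for u in peers if in_range(values[u], prediction_ranges[u])]
        if not abnormal_peers:
            break  # assumption: no abnormal same-level unit ends the search
        to_alarm.extend(abnormal_peers)
        if not in_range(values[parent], prediction_ranges[parent]):
            break
        to_alarm.append(parent)
        current = parent  # the upper-level unit becomes the new "first" unit
    return to_alarm


# Hypothetical topology and readings.
parent_of = {"vm1": "server1", "vm2": "server1",
             "server1": "pod1", "server2": "pod1", "pod1": None}
values = {"vm1": 95.0, "vm2": 85.0, "server1": 82.0, "server2": 50.0, "pod1": 81.0}
alarm = {u: (90.0, 100.0) for u in values}
prediction = {u: (80.0, 100.0) for u in values}

result = collect_alarm_units("vm1", parent_of, values, alarm, prediction)
print(result)
```

Here the search escalates from `vm1` to `server1` because `vm2` is within its prediction alarm range, and then stops because `server2` is not.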
For example, a large number of slow logs or slow SQL exist in the database, the slow logs or slow SQL may be caused by too low a disk read-write speed of the VM and too high a CPU utilization rate, the too low a disk read-write speed of the VM and the too high a CPU utilization rate may be caused by too low a disk read-write speed of the server and too high a CPU utilization rate, the too low a disk read-write speed of the server and the too high a CPU utilization rate may be caused by too low an average read-write speed of the POD, and the too low an average read-write speed of the POD may be caused by too low an average read-write speed of the AZ.
Based on the above causal relationships, the chain "slow log or slow SQL -> disk read-write speed and CPU utilization of the VM -> disk read-write speed and CPU utilization of the server -> average read-write speed of the POD -> average read-write speed of the AZ" may be stored in the causal relationship library. When a large number of slow logs or slow SQL statements are found in the database, the first operating parameter of the associated storage unit corresponding to the slow logs or slow SQL can be looked up in the causal relationship library.
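For illustration, the causal relationship library may be pictured as a mapping from each symptom to its possible cause one level up. The dictionary layout and the names `CAUSAL_LIBRARY` and `causal_chain` are hypothetical; the disclosure does not specify how the library is stored.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (storage unit level, operating parameters)

# Hypothetical encoding of the cause-and-effect chain described above:
# each key is a symptom, each value is its possible cause one level up.
CAUSAL_LIBRARY: Dict[Node, Node] = {
    ("database", "slow log or slow SQL"):
        ("VM", "disk read-write speed, CPU utilization"),
    ("VM", "disk read-write speed, CPU utilization"):
        ("server", "disk read-write speed, CPU utilization"),
    ("server", "disk read-write speed, CPU utilization"):
        ("POD", "average read-write speed"),
    ("POD", "average read-write speed"):
        ("AZ", "average read-write speed"),
}


def causal_chain(symptom: Node) -> List[Node]:
    """Follow the library from a symptom up to the widest known cause."""
    chain: List[Node] = []
    seen = {symptom}
    node = symptom
    while node in CAUSAL_LIBRARY:
        node = CAUSAL_LIBRARY[node]
        if node in seen:  # guard against an accidental cycle in the library
            break
        seen.add(node)
        chain.append(node)
    return chain


chain = causal_chain(("database", "slow log or slow SQL"))
for level, parameters in chain:
    print(level, "->", parameters)
```

Each entry returned by `causal_chain` names the storage unit level and the operating parameters whose current values would be checked next.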
If the first operating parameters of the associated storage unit found in the causal relationship library for the slow logs or slow SQL are the disk read-write speed and the CPU utilization of the VM, the current values of the disk read-write speed and the CPU utilization of the VM are acquired. If these current values are within their respective alarm ranges, or are trending toward their respective alarm ranges, it can be determined that the disk read-write speed of the VM is too low and its CPU utilization is too high, and that this is the cause of the large number of slow logs or slow SQL (structured query language) statements in the database.
If the disk read-write speed of one VM is too low and its CPU utilization is too high, it can be determined whether the other VMs on the same server show the same condition. If the disk read-write speed of the other VMs on the same server is too low and their CPU utilization is too high, or if their disk read-write speed is becoming steadily lower and their CPU utilization steadily higher, it can be determined that the same abnormality also exists in the other VMs on the same server.
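One possible reading of the trend check above compares every pair of adjacent samples in a history window. This is a sketch under that assumption; the function names and the sample histories are hypothetical.

```python
from typing import Sequence


def strictly_increasing(samples: Sequence[float]) -> bool:
    """True if every sample is larger than the one taken just before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def strictly_decreasing(samples: Sequence[float]) -> bool:
    """True if every sample is smaller than the one taken just before it."""
    return len(samples) >= 2 and all(b < a for a, b in zip(samples, samples[1:]))


def vm_trending_abnormal(disk_speed_history: Sequence[float],
                         cpu_history: Sequence[float]) -> bool:
    """Disk read-write speed keeps falling while CPU utilization keeps rising."""
    return strictly_decreasing(disk_speed_history) and strictly_increasing(cpu_history)


# Hypothetical histories for two other VMs on the same server.
worsening = vm_trending_abnormal([120.0, 100.0, 90.0, 70.0], [60.0, 70.0, 78.0, 85.0])
fluctuating = vm_trending_abnormal([120.0, 100.0, 90.0, 70.0], [60.0, 70.0, 65.0, 85.0])
print(worsening, fluctuating)
```

A single dip in the CPU history is enough for the second VM not to be flagged, which makes this a strict criterion; a tolerance could be added if noisy samples are expected.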
If the upper-level storage unit of the VM found in the causal relationship library is the server to which the VM belongs, the disk read-write speed and the CPU utilization of the server can be acquired, and it is judged whether each is within its corresponding prediction alarm range. In application, an exception may first be found in the database, and exceptions are then searched for in sequence in the VM, the server, the POD, the AZ, the Region, or even wider storage units, until the exception in the widest storage unit is found. In this way, it can be predicted whether a similar exception has occurred or will occur in the databases within the wider storage unit. A Region is a distributed storage resource pool shared by a plurality of AZs; if the Region is abnormal, the user data of those AZs can be affected, and therefore a plurality of databases in the Region can become abnormal.
With the method provided by the embodiments of the present disclosure, when the current value of the first operating parameter of the first storage unit is within the preset alarm range, it can be determined whether the current value of the first operating parameter of a second storage unit among the associated storage units is within the preset prediction alarm range corresponding to the second storage unit, where the prediction alarm range corresponding to the second storage unit is obtained by expanding the boundaries of the alarm range corresponding to the second storage unit. In this way, it can be effectively predicted whether storage units in a wider range, other than the first storage unit, have the same problem.
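As an illustration of the range boundary expansion mentioned above, a prediction alarm range may be derived from an alarm range by moving its boundaries outward. This is only a sketch: the use of fixed margins, the function `expand_range`, and the numeric values are assumptions, since this passage does not state how far the boundaries are expanded.

```python
from typing import Tuple

Range = Tuple[float, float]  # assumed representation: closed interval [low, high]


def expand_range(alarm: Range, lower_margin: float = 0.0,
                 upper_margin: float = 0.0) -> Range:
    """Return a prediction alarm range that contains the alarm range,
    with each boundary moved outward by the given (assumed) margin."""
    low, high = alarm
    return (low - lower_margin, high + upper_margin)


def in_range(value: float, rng: Range) -> bool:
    return rng[0] <= value <= rng[1]


# Hypothetical CPU utilization ranges, in percent.
cpu_alarm = (90.0, 100.0)
cpu_prediction = expand_range(cpu_alarm, lower_margin=10.0)

second_unit_cpu = 85.0
not_yet_alarming = not in_range(second_unit_cpu, cpu_alarm)
predicted_at_risk = in_range(second_unit_cpu, cpu_prediction)
print(cpu_prediction, not_yet_alarming, predicted_at_risk)
```

Because the prediction alarm range is wider than the alarm range, a second storage unit can be flagged before its own alarm range is reached.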
Based on the same inventive concept, an exemplary embodiment of the present disclosure provides a method for performing an alarm operation. It differs from the above embodiments in that, if the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, then in addition to predicting whether storage units in a wider range are also abnormal, target fault information corresponding to the first storage unit may be determined. As shown in fig. 4, the processing flow of the method may include the following steps:
Step S310, acquire the current value of at least one associated operating parameter corresponding to the first operating parameter of the first storage unit.
Step S320, if the current value of a second operating parameter among the at least one associated operating parameter is within a preset fault range, determine the target fault information corresponding to the first operating parameter and the second operating parameter according to a preset correspondence among first operating parameters, associated operating parameters, and fault information.
Step S330, display the target fault information in the process of alarming.
In implementation, a key feature index set for each fault can be extracted from the historical data through a preset data mining algorithm, and the causality among all feature indexes in the key feature index set is then pruned. Specifically, feature indexes without effect variables can be removed from the key feature index set, and a complete feature index library is established. Meanwhile, a minimum feature index library can be established according to a maximum information gain rule.
The minimum feature index library may store the correspondence among the first operating parameter, the associated operating parameters, and the fault information, so that when the current value of the first operating parameter of the first storage unit is not within the normal range, at least one associated operating parameter (minimum feature index) corresponding to the first operating parameter of the first storage unit can be looked up in the minimum feature index library. Each fault has a plurality of feature indexes; the minimum feature index is the one among them that yields the maximum information gain.
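As a hedged illustration of the maximum information gain rule, the sketch below computes the standard entropy-based information gain of each feature index with respect to a fault label and selects the largest. The disclosure does not give the exact formula or data, so the definition used here is the common textbook one, and the history records and all names are hypothetical toy values.

```python
import math
from collections import Counter
from typing import Dict, List

Record = Dict[str, bool]


def entropy(labels: List[bool]) -> float:
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records: List[Record], feature: str, label: str = "fault") -> float:
    """Entropy of the label minus its entropy after splitting on the feature."""
    base = entropy([r[label] for r in records])
    remainder = 0.0
    for value in {r[feature] for r in records}:
        subset = [r[label] for r in records if r[feature] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def rec(delay: bool, cpu: bool, lag: bool, fault: bool) -> Record:
    return {"sync_delay_abnormal": delay, "cpu_abnormal": cpu,
            "sync_lag_abnormal": lag, "fault": fault}


# Hypothetical history: whether each feature index was abnormal, and whether
# a primary-standby synchronization fault was actually confirmed.
history = [
    rec(True, True, True, True),
    rec(True, True, False, False),
    rec(True, False, True, True),
    rec(True, False, False, False),
    rec(False, False, False, False),
    rec(True, True, False, False),
]

features = ["sync_delay_abnormal", "cpu_abnormal", "sync_lag_abnormal"]
gains = {f: information_gain(history, f) for f in features}
minimum_feature_index = max(gains, key=gains.get)
print(minimum_feature_index)
```

In this toy history the synchronization lag data volume separates fault from non-fault records perfectly, so it is selected as the minimum feature index.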
After the minimum characteristic index is obtained, the current value of the minimum characteristic index can be obtained, and if the current value of the minimum characteristic index is within a preset fault range, target fault information corresponding to the minimum characteristic index is determined. And displaying the target fault information in the alarming process so that a system maintenance worker can perform corresponding maintenance operation on the abnormal storage unit after checking the prompt information generated based on the alarm.
For example, as shown in fig. 5, the primary-standby synchronization delay between VMs may be monitored: a delay within 200 microseconds is normal, and a delay greater than 200 microseconds is abnormal. If the primary-standby synchronization delay between the VMs is abnormal, the CPU utilization needs to be checked: a CPU utilization of less than ninety percent is normal, and a CPU utilization of more than ninety percent is abnormal. If the CPU utilization is abnormal, the synchronization lag data volume needs to be checked: a volume of less than 1 MB is normal, and a volume of more than 1 MB indicates a primary-standby synchronization fault.
The complete characteristic indexes may include the primary and standby synchronization delay between VMs, the CPU usage rate, and the amount of synchronization lag data. The minimum characteristic indicator may include an amount of synchronization lag data.
In application, the primary-standby synchronization delay between the VMs can be monitored as the first operating parameter of the first storage unit. When this delay is greater than 200 microseconds, the current value of the first operating parameter of the first storage unit is within the alarm range corresponding to the first storage unit, that is, it is abnormal. At this point, the minimum feature index corresponding to the first operating parameter of the first storage unit, namely the synchronization lag data volume, can be acquired; the CPU utilization does not need to be checked in this process. If the synchronization lag data volume is greater than 1 MB, it can be determined that the VM has a primary-standby synchronization fault.
If the synchronization lag data volume is less than 1 MB, it can be determined that, even though the CPU utilization is high and the primary-standby synchronization delay between the VMs is long, the actual situation may simply be that the volume of data to be synchronized between the primary and standby VMs is very large, which inevitably leads to high CPU utilization and a long synchronization delay. This is a normal phenomenon rather than a fault.
In short, as long as the synchronization lag data volume is within the preset fault range, the target fault information corresponding to the first storage unit can be determined without referring to the other associated operating parameters.
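The shortcut described above, in which only the minimum feature index is consulted once the first operating parameter is abnormal, may be sketched as follows. The thresholds come from the example (200 microseconds and 1 MB); the function name `diagnose`, the `FAULT_INFO` table, and the choice of 1 MB = 2**20 bytes are assumptions made for this sketch.

```python
from typing import Optional

SYNC_DELAY_ALARM_US = 200            # example above: a delay above 200 microseconds is abnormal
SYNC_LAG_FAULT_BYTES = 1024 * 1024   # example above: above 1 MB; 2**20 bytes is an assumption

# Hypothetical correspondence table:
# (first operating parameter, associated operating parameter) -> fault information
FAULT_INFO = {
    ("sync_delay", "sync_lag_volume"): "primary-standby synchronization fault",
}


def diagnose(sync_delay_us: float, sync_lag_bytes: float) -> Optional[str]:
    """Return the target fault information, or None if no fault is indicated."""
    if sync_delay_us <= SYNC_DELAY_ALARM_US:
        return None  # first operating parameter is not in its alarm range
    if sync_lag_bytes > SYNC_LAG_FAULT_BYTES:
        # Minimum feature index is in its fault range; the CPU utilization is not consulted.
        return FAULT_INFO[("sync_delay", "sync_lag_volume")]
    return None  # long delay but small lag: treated as normal heavy load


print(diagnose(1000, 2 * 1024 * 1024))
print(diagnose(500, 100 * 1024))
```

The CPU utilization never appears as an input, which reflects the point that the minimum feature index alone is sufficient for this fault.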
In a practical application, for example, tenant F rents two HA (High Availability, dual-machine) clusters, cluster A and cluster B, in a storage system and establishes four databases in total. The current value of the first operating parameter of both cluster A and cluster B is within the corresponding alarm range. At this point, the minimum feature index corresponding to the first operating parameter can be acquired, and the target fault information corresponding to cluster A and cluster B can be determined according to the minimum feature index.
The specific operating parameters of cluster A and cluster B of tenant F are as follows:
(1) In the primary database in cluster A, the CPU utilization of the VM is ninety percent, the memory utilization is eighty-five percent, the primary-standby synchronization delay between the VMs is 1 millisecond, and the synchronization lag data volume is 2 MB.
(2) In the standby database in cluster A, the CPU utilization of the VM is fifty percent, the memory utilization is forty-five percent, the primary-standby synchronization delay between the VMs is 1 millisecond, and the synchronization lag data volume is 2 MB.
(3) In the primary database in cluster B, the CPU utilization of the VM is forty percent, the memory utilization is twenty percent, the primary-standby synchronization delay between the VMs is 3 milliseconds, and the synchronization lag data volume is 100 KB.
(4) In the standby database in cluster B, the CPU utilization of the VM is twenty percent, the memory utilization is thirty percent, the primary-standby synchronization delay between the VMs is 3 milliseconds, and the synchronization lag data volume is 100 KB.
Tenant Y has one HA cluster, cluster C, which includes two databases. The specific operating parameters of cluster C of tenant Y are as follows:
(1) In the primary database in cluster C, the CPU utilization of the VM is ninety percent, the memory utilization is eighty-five percent, the primary-standby synchronization delay between the VMs is 2 milliseconds, and the synchronization lag data volume is 100 KB.
(2) In the standby database in cluster C, the CPU utilization of the VM is fifty percent, the memory utilization is forty-five percent, the primary-standby synchronization delay between the VMs is 2 milliseconds, and the synchronization lag data volume is 2 MB.
According to the fault ranges corresponding to the feature indexes, it can be determined that all feature indexes of cluster A are within the corresponding fault ranges and that the minimum feature index is also within its fault range, so cluster A has a primary-standby synchronization fault.
Similarly, according to the failure ranges corresponding to the characteristic indexes, it can be determined that some characteristic indexes in the cluster B are within the corresponding failure ranges, but the minimum characteristic index is within the corresponding failure range, so that the cluster B also has a primary-backup synchronous failure.
Then, it may be determined that the failures of the primary and standby databases in cluster A are related to each other, that the failures of the primary and standby databases in cluster B are related to each other, and that all four failures are related to one another. Because cluster A and cluster B run on different VMs, these four failures should be caused by a failure of the server.
Subsequently, after it is determined that the server has failed, the databases in the server whose first operating parameter is within the alarm range can be counted; if the count is greater than a threshold set empirically by an expert, it can be determined that the server failure is caused by a network problem.
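The counting step above may be illustrated as follows. The predicate-based interface, the function names, the sample delays, and the threshold value of 3 are all hypothetical; the disclosure only states that the threshold is given empirically by an expert.

```python
from typing import Callable, Dict


def count_alarming(values: Dict[str, float],
                   is_alarming: Callable[[float], bool]) -> int:
    """Count the databases whose first operating parameter is in the alarm range."""
    return sum(1 for v in values.values() if is_alarming(v))


def failure_due_to_network(values: Dict[str, float],
                           is_alarming: Callable[[float], bool],
                           expert_threshold: int) -> bool:
    """Attribute the server failure to a network problem if the count
    exceeds the expert-given threshold."""
    return count_alarming(values, is_alarming) > expert_threshold


# Hypothetical primary-standby synchronization delays, in microseconds,
# for all databases hosted on the failed server.
delays_us = {"db1": 1000.0, "db2": 1000.0, "db3": 3000.0, "db4": 3000.0, "db5": 150.0}

alarming = count_alarming(delays_us, lambda d: d > 200.0)
network_problem = failure_due_to_network(delays_us, lambda d: d > 200.0, expert_threshold=3)
print(alarming, network_problem)
```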
When the databases in the server are searched for those whose first operating parameter is within the alarm range, it may be found that the primary-standby synchronization delay between the VMs of cluster C is 2 milliseconds, which is within the alarm range. Cluster C may then be marked as a risk cluster, and in subsequent operation, attention is focused on risk clusters such as cluster C.
With the method provided by the embodiments of the present disclosure, when the current value of the first operating parameter of the first storage unit is within the preset alarm range, it can be determined whether the current value of the first operating parameter of a second storage unit among the associated storage units is within the preset prediction alarm range corresponding to the second storage unit, where the prediction alarm range corresponding to the second storage unit is obtained by expanding the boundaries of the alarm range corresponding to the second storage unit. In this way, it can be effectively predicted whether storage units in a wider range, other than the first storage unit, have the same problem.
Yet another exemplary embodiment of the present disclosure provides an apparatus for performing an alarm operation, as shown in fig. 6, including:
The monitoring module 610 is configured to monitor whether the current value of the first operating parameter of each storage unit is within the preset alarm range corresponding to that storage unit, and may specifically implement the monitoring function in step S210 and other implicit steps.
The obtaining module 620 is configured to obtain the current value of the first operating parameter of the associated storage unit corresponding to the first storage unit when the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, and may specifically implement the obtaining function in step S220 and other implicit steps.
The alarming module 630 is configured to alarm the first operating parameter of the first storage unit and the first operating parameter of the second storage unit when the current value of the first operating parameter of the second storage unit in the associated storage unit is within a preset prediction alarming range corresponding to the second storage unit, where the prediction alarming range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarming range corresponding to the second storage unit, and specifically, the alarming function in step S230 and other implicit steps may be implemented.
Optionally, the alarm module 630 is configured to:
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the prediction alarm range corresponding to the preset second storage unit and the value of the first operation parameter of the second storage unit is in a continuous increase state within the preset historical time, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit.
Optionally, the alarm module 630 is configured to:
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the prediction alarm range corresponding to the preset second storage unit, and the value of the first operation parameter of the second storage unit acquired at the later detection time point of any two adjacent detection time points in the preset historical time is larger than the value acquired at the earlier detection time point, alarming the first operation parameters of the first storage unit and the second storage unit.
Optionally, the obtaining module 620 is further configured to obtain a current value of at least one associated operating parameter corresponding to the first operating parameter of the first storage unit;
the device further comprises:
the determining module is used for determining target fault information corresponding to the first operating parameter and the second operating parameter according to the corresponding relation among the preset first operating parameter, the associated operating parameter and the fault information when the current value of the second operating parameter in the at least one associated operating parameter is within the preset fault range;
and the display module is used for displaying the target fault information in the process of alarming.
Optionally, the associated storage unit includes each upper level storage unit of the first storage unit.
Optionally, the associated storage unit further includes other lower storage units than the first storage unit of the upper storage units.
Optionally, the storage units sequentially include, from the upper level to the lower level, a region (Region), an availability zone (AZ), a point of delivery (POD), and a virtual machine (VM).
It should be noted that the monitoring module 610, the obtaining module 620, and the alarming module 630 may be implemented by a processor, or implemented by a processor and a memory.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
With the device provided by the embodiments of the present disclosure, when the current value of the first operating parameter of the first storage unit is within the preset alarm range, it can be determined whether the current value of the first operating parameter of a second storage unit among the associated storage units is within the preset prediction alarm range corresponding to the second storage unit, where the prediction alarm range corresponding to the second storage unit is obtained by expanding the boundaries of the alarm range corresponding to the second storage unit. In this way, it can be effectively predicted whether storage units in a wider range, other than the first storage unit, have the same problem.
It should be noted that the device for performing an alarm operation provided in the above embodiment is illustrated only with the above division of functional modules when performing an alarm operation. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to perform all or part of the functions described above. In addition, the device for performing an alarm operation and the method for performing an alarm operation provided by the above embodiments belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (22)

1. A method of performing an alarm operation, the method comprising:
monitoring whether the current value of the first operation parameter of each storage unit is within a preset alarm range corresponding to each storage unit;
if the current value of the first operation parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit, acquiring the current value of the first operation parameter of the associated storage unit corresponding to the first storage unit;
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within a preset prediction alarm range corresponding to the second storage unit, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit, wherein the prediction alarm range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit.
2. The method of claim 1, wherein alarming the first operating parameter of the first storage unit and the second storage unit if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the predicted alarm range corresponding to the preset second storage unit comprises:
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within a prediction alarm range corresponding to the preset second storage unit and the value of the first operation parameter of the second storage unit is in a continuous increase state within the preset historical time, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit.
3. The method according to claim 2, wherein the alarming the first operating parameter of the first storage unit and the first operating parameter of the second storage unit if the current value of the first operating parameter of the second storage unit in the associated storage unit is within the predicted alarm range corresponding to the preset second storage unit and the value of the first operating parameter of the second storage unit is in a continuously increasing state within the preset historical time period comprises:
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within a prediction alarm range corresponding to a preset second storage unit, and the value of the first operation parameter of the second storage unit acquired at the later detection time point of any two adjacent detection time points in a preset historical time is larger than the value of the first operation parameter of the second storage unit acquired at the earlier detection time point, alarming the first operation parameters of the first storage unit and the second storage unit.
4. The method of claim 1, wherein if the current value of the first operating parameter in the first storage unit is within the alarm range corresponding to the preset first storage unit, the method further comprises:
acquiring a current numerical value of at least one associated operation parameter corresponding to a first operation parameter of a first storage unit;
if the current value of the second operation parameter in the at least one associated operation parameter is within a preset fault range, determining target fault information corresponding to the first operation parameter and the second operation parameter according to the corresponding relation of the preset first operation parameter, the associated operation parameter and the fault information;
and displaying the target fault information in the process of alarming.
5. The method of claim 1, wherein the associated storage unit comprises each upper level storage unit of the first storage unit.
6. The method according to claim 5, wherein the associated storage unit further comprises other lower storage units of the respective upper storage units except the first storage unit.
7. The method according to claim 1, wherein the storage units comprise, from the upper level to the lower level, a region (Region), an availability zone (AZ), a point of delivery (POD), and a virtual machine (VM).
8. An apparatus for performing an alarm operation, the apparatus comprising:
the monitoring module is used for monitoring whether the current value of the first operating parameter of each storage unit is within a preset alarm range corresponding to each storage unit;
the acquisition module is used for acquiring the current value of the first operating parameter of the associated storage unit corresponding to the first storage unit when the current value of the first operating parameter of the first storage unit is within the preset alarm range corresponding to the first storage unit;
and the alarm module is used for alarming the first operation parameters of the first storage unit and the second storage unit when the current value of the first operation parameter of the second storage unit in the associated storage unit is within a preset prediction alarm range corresponding to the second storage unit, wherein the prediction alarm range corresponding to the second storage unit is obtained by performing range boundary expansion adjustment on the alarm range corresponding to the second storage unit.
9. The apparatus of claim 8, wherein the alert module is configured to:
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within a prediction alarm range corresponding to the preset second storage unit and the value of the first operation parameter of the second storage unit is in a continuous increase state within the preset historical time, alarming the first operation parameter of the first storage unit and the first operation parameter of the second storage unit.
10. The apparatus of claim 9, wherein the alert module is configured to:
and if the current value of the first operation parameter of the second storage unit in the associated storage unit is within a prediction alarm range corresponding to a preset second storage unit, and the value of the first operation parameter of the second storage unit acquired at the later detection time point of any two adjacent detection time points in a preset historical time is larger than the value of the first operation parameter of the second storage unit acquired at the earlier detection time point, alarming the first operation parameters of the first storage unit and the second storage unit.
11. The apparatus according to claim 8, wherein the obtaining module is further configured to obtain a current value of at least one associated operating parameter corresponding to the first operating parameter of the first storage unit;
the device further comprises:
the determining module is used for determining target fault information corresponding to the first operating parameter and the second operating parameter according to the corresponding relation among the preset first operating parameter, the associated operating parameter and the fault information when the current value of the second operating parameter in the at least one associated operating parameter is within the preset fault range;
and the display module is used for displaying the target fault information in the process of alarming.
12. The apparatus of claim 8, wherein the associated storage unit comprises each upper level storage unit of the first storage unit.
13. The apparatus of claim 12, wherein the associated storage unit further comprises the lower-level storage units, other than the first storage unit, of the upper-level storage units.
14. The apparatus according to claim 8, wherein the storage units comprise, from upper level to lower level, a cloud region (Region), an availability zone (AZ), a point of delivery (POD), and a virtual machine (VM).
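The hierarchy in claims 12 to 14 can be pictured as a tree. The following is a sketch under stated assumptions: the unit names are invented, and "lower storage units of the upper storage units" is read as the direct children of each upper-level unit, which is only one possible interpretation of the claim.

```python
from typing import Dict, List, Optional

# Hypothetical topology: child -> parent (Region > AZ > POD > VM).
PARENT: Dict[str, Optional[str]] = {
    "region-1": None,
    "az-1": "region-1",
    "az-2": "region-1",
    "pod-1": "az-1",
    "pod-2": "az-1",
    "vm-1": "pod-1",
    "vm-2": "pod-1",
}


def upper_units(unit: str) -> List[str]:
    """Walk up the tree and collect every upper-level storage unit."""
    result = []
    parent = PARENT[unit]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def associated_units(unit: str, include_lower: bool = False) -> List[str]:
    """Upper-level units of `unit` (claim 12); optionally also the direct
    children of those upper-level units, excluding `unit` itself (claim 13)."""
    uppers = upper_units(unit)
    associated = list(uppers)
    if include_lower:
        for name, parent in PARENT.items():
            if parent in uppers and name != unit and name not in associated:
                associated.append(name)
    return associated
```

For the virtual machine `vm-1`, the upper-level units are `pod-1`, `az-1`, and `region-1`; with `include_lower=True` the result also contains `az-2`, `pod-2`, and `vm-2`.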
15. A server, comprising a processor and a memory, wherein:
the processor is configured to: monitor whether the current value of the first operation parameter of each storage unit is within the preset alarm range, stored in the memory, corresponding to that storage unit; if the current value of the first operation parameter of a first storage unit is within the preset alarm range, stored in the memory, corresponding to the first storage unit, acquire the current value of the first operation parameter of the associated storage unit corresponding to the first storage unit; and if the current value of the first operation parameter of a second storage unit in the associated storage unit is within the preset prediction alarm range, stored in the memory, corresponding to the second storage unit, raise an alarm for the first operation parameter of the first storage unit and the first operation parameter of the second storage unit, wherein the prediction alarm range corresponding to the second storage unit is obtained by expanding the boundary of the alarm range corresponding to the second storage unit.
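The flow recited in claim 15 can be sketched end to end as follows. This is an illustrative reading, not the claimed implementation: expanding both boundaries by a fixed `margin`, treating ranges as inclusive, and always alarming the first storage unit once it is inside its own alarm range are assumptions, as are all names.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def expand_range(alarm_range: Range, margin: float) -> Range:
    """Derive a prediction alarm range by widening the alarm range.
    Assumption: both boundaries move outward by the same margin."""
    low, high = alarm_range
    return (low - margin, high + margin)


def in_range(value: float, rng: Range) -> bool:
    return rng[0] <= value <= rng[1]


def units_to_alarm(
    current: Dict[str, float],
    alarm_ranges: Dict[str, Range],
    associated: Dict[str, List[str]],
    margin: float,
) -> List[str]:
    """Return the storage units whose first operation parameter should be alarmed."""
    alarms: List[str] = []
    for unit, value in current.items():
        if not in_range(value, alarm_ranges[unit]):
            continue  # this unit is not a "first storage unit"
        if unit not in alarms:
            alarms.append(unit)
        # Check associated units against their widened (prediction) range.
        for other in associated.get(unit, []):
            predicted = expand_range(alarm_ranges[other], margin)
            if in_range(current[other], predicted) and other not in alarms:
                alarms.append(other)
    return alarms
```

For example, with an alarm range of 80 to 100 for every unit and a margin of 10, a VM at 92 triggers its own alarm, and its POD at 75 is alarmed as well because 75 falls inside the widened range of 70 to 110, while an AZ at 40 is left alone.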
16. The server of claim 15, wherein the processor is configured to:
if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and the value of the first operation parameter of the second storage unit has been continuously increasing within a preset historical duration, raise an alarm for the first operation parameter of the first storage unit and the first operation parameter of the second storage unit.
17. The server of claim 16, wherein the processor is configured to:
if the current value of the first operation parameter of the second storage unit in the associated storage unit is within the preset prediction alarm range corresponding to the second storage unit, and, for any two adjacent detection time points within a preset historical duration, the value of the first operation parameter of the second storage unit acquired at the later detection time point is larger than the value acquired at the earlier detection time point, raise an alarm for the first operation parameter of the first storage unit and the first operation parameter of the second storage unit.
18. The server of claim 15, wherein the processor is further configured to:
acquire a current value of at least one associated operation parameter corresponding to the first operation parameter of the first storage unit;
if the current value of a second operation parameter among the at least one associated operation parameter is within a preset fault range, determine target fault information corresponding to the first operation parameter and the second operation parameter according to a preset correspondence among first operation parameters, associated operation parameters, and fault information; and
display the target fault information while the alarm is being raised.
19. The server according to claim 15, wherein the associated storage unit includes each upper storage unit of the first storage unit.
20. The server according to claim 19, wherein the associated storage unit further includes the lower-level storage units, other than the first storage unit, of the upper-level storage units.
21. The server according to claim 15, wherein the storage units comprise, from upper level to lower level, a cloud region (Region), an availability zone (AZ), a point of delivery (POD), and a virtual machine (VM).
22. A computer-readable storage medium comprising instructions that, when executed on a server, cause the server to perform the method of any of claims 1-7.
CN201811147264.4A 2018-09-29 2018-09-29 Method and device for performing alarm operation Active CN109408325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811147264.4A CN109408325B (en) 2018-09-29 2018-09-29 Method and device for performing alarm operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811147264.4A CN109408325B (en) 2018-09-29 2018-09-29 Method and device for performing alarm operation

Publications (2)

Publication Number Publication Date
CN109408325A CN109408325A (en) 2019-03-01
CN109408325B CN109408325B (en) 2020-11-03

Family

ID=65465664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811147264.4A Active CN109408325B (en) 2018-09-29 2018-09-29 Method and device for performing alarm operation

Country Status (1)

Country Link
CN (1) CN109408325B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103178991A (en) * 2011-12-21 2013-06-26 中国移动通信集团黑龙江有限公司 Method and system for analyzing multiple-network relation
CN103701627A (en) * 2012-09-27 2014-04-02 北京搜狐新媒体信息技术有限公司 Cloud computing platform fault detection method, cloud computing platform fault detection method, solving method and solving device
CN104243184A (en) * 2013-06-06 2014-12-24 中国移动通信集团河北有限公司 Alarm information processing method and apparatus
CN106681882A (en) * 2015-11-06 2017-05-17 上海瑞致软件有限公司 IT-service concentrated monitoring and managing system based on Apriori algorithm
CN108255661A (en) * 2016-12-29 2018-07-06 北京京东尚科信息技术有限公司 A kind of method and system for realizing Hadoop cluster monitorings

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102195070B1 (en) * 2014-10-10 2020-12-24 삼성에스디에스 주식회사 System and method for detecting and predicting anomalies based on analysis of time-series data

Also Published As

Publication number Publication date
CN109408325A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN108923952B (en) Fault diagnosis method, equipment and storage medium based on service monitoring index
CN110784355B (en) Fault identification method and device
CN113641526B (en) Alarm root cause positioning method and device, electronic equipment and computer storage medium
CN110601900A (en) Network fault early warning method and device
CN110674014A (en) Method and device for determining abnormal query request
CN113312371A (en) Processing method, equipment and system for execution plan
CN111314158B (en) Big data platform monitoring method, device, equipment and medium
CN114328102A (en) Equipment state monitoring method, device, equipment and computer readable storage medium
KR20170084445A (en) Method and apparatus for detecting abnormality using time-series data
CN112799923A (en) System abnormality reason determining method, device, equipment and storage medium
CN109408325B (en) Method and device for performing alarm operation
CN112965791B (en) Timing task detection method, device, equipment and storage medium
CN106294721B (en) Cluster data counting and exporting methods and devices
CN113472881B (en) Statistical method and device for online terminal equipment
CN115438244A (en) Database health degree assessment method and device
CN115529219A (en) Alarm analysis method and device, computer readable storage medium and electronic equipment
CN115277220A (en) Industrial control network traffic safety classification method and system and readable storage device
CN112732517B (en) Disk fault alarm method, device, equipment and readable storage medium
CN113312197A (en) Method and apparatus for determining batch faults, computer storage medium and electronic device
CN109426559B (en) Command issuing method and device, storage medium and processor
US20240004765A1 (en) Data processing method and apparatus for distributed storage system, device, and storage medium
CN114124758B (en) Flow monitoring method and device
CN115599312B (en) Big data processing method and AI system based on storage cluster
CN111596641B (en) Fault analysis method and device, storage medium and equipment for emergency diesel engine of nuclear power station
CN113300948B (en) Method, device, system and storage medium for service survivability analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220215

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.