CN115150249A - Storage system warning method, device, equipment and storage medium - Google Patents

Publication number
CN115150249A
Authority
CN
China
Prior art keywords
alarm
information
repairing
maintenance terminal
acquired
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210778324.2A
Other languages
Chinese (zh)
Inventor
赵晓青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Data Technology Co Ltd filed Critical Jinan Inspur Data Technology Co Ltd
Priority to CN202210778324.2A priority Critical patent/CN115150249A/en
Publication of CN115150249A publication Critical patent/CN115150249A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0609 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on severity or priority
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H04L41/065 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The application discloses a storage system warning method, device, equipment and storage medium, relating to the technical field of data storage. The method comprises: acquiring alarm information reported by a cluster, and matching the alarm information against target configuration information, which is custom-configured by a user in advance and carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal to which the alarm information is allocated; determining whether identification information indicating that the alarm information has been received is acquired within a first preset time interval; if the identification information is acquired, repairing the alarm item based on an operation instruction sent by the operation and maintenance terminal, with artificial intelligence automatically learning the corresponding repair process; and if the identification information is not acquired, automatically repairing the alarm item through artificial intelligence and processing the alarm information according to the repair result. The technical scheme improves the efficiency and pertinence with which operation and maintenance personnel handle problems, saves problem-handling time, and ensures efficient alarming and problem handling for the storage cluster.

Description

Storage system warning method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a method, an apparatus, a device, and a storage medium for warning in a storage system.
Background
With the rapid development of cloud computing and big data technology, the data accumulated in production and daily life is growing exponentially, and mass storage technology has become an indispensable part of the development of the internet. In a distributed storage system, massive amounts of data need to be monitored and managed, so critical information and faults need to trigger alarms. However, as the data storage scale and the volume of operations keep increasing, the problems and alarms that may arise under complex conditions also increase. If every alarm is sent to every operation and maintenance person, the burden of identifying and handling alarms grows, and operation and maintenance personnel may even fail to acquire and handle the relevant alarms in time because there are too many of them, which may lead to a serious cluster accident and inestimable loss.
In summary, how to reduce the time for operation and maintenance personnel to identify different alarms and ensure the efficiency of alarm and problem processing of the storage cluster is the problem to be solved at present.
Disclosure of Invention
In view of this, an object of the present invention is to provide a storage system alarm method, apparatus, device and storage medium, which can reduce the time for operation and maintenance personnel to identify different alarms, and ensure the efficiency of alarm and problem processing of a storage cluster. The specific scheme is as follows:
In a first aspect, the present application discloses a storage system alarm method, including:
acquiring alarm information reported by a cluster, and matching the alarm information against target configuration information that is custom-configured by a user in advance and carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal corresponding to the alarm information;
distributing the alarm information to the corresponding operation and maintenance terminal, and determining whether identification information, sent by the operation and maintenance terminal and indicating that the alarm information has been received, is acquired within a first preset time interval;
if the identification information is acquired within the first preset time interval, repairing the alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal, and automatically learning the corresponding repair process through artificial intelligence;
and if the identification information is not acquired within the first preset time interval, automatically repairing the alarm item through the artificial intelligence, so as to process the alarm information according to the repair result.
Optionally, the storage system alarm method further includes:
determining the alarm items corresponding to a preset number of different pieces of alarm information, and ranking the alarm items by priority to obtain first configuration information;
determining a configuration level corresponding to the alarm item, and performing priority ordering on the configuration level to obtain second configuration information;
and determining the target configuration information by using the first configuration information and the second configuration information.
Optionally, the allocating the alarm information to the corresponding operation and maintenance terminal includes:
and distributing the alarm information to the operation and maintenance terminal according to a preset distribution rule based on the alarm priority and the alarm level.
Optionally, if the identification information is obtained within the first preset time interval, repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal includes:
and if the identification information is acquired within the first preset time interval, stopping pushing the alarm information, and repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal.
Optionally, if the identification information is not obtained within the first preset time interval, automatically repairing the alarm item through the artificial intelligence includes:
and if the identification information is not acquired within the first preset time interval, automatically repairing the alarm item through the artificial intelligence according to a list of default repairing steps of the alarm item, and judging whether the automatic repairing is successful.
Optionally, the determining whether the automatic repair is successful includes:
when the automatic repair is successful, sending a notification of the repair completion after the repair is completed;
and when the automatic repair fails, sending a repair failure notification, and pushing the alarm information to the operation and maintenance terminals allocated to other alarm priorities until the identification information is acquired.
Optionally, the pushing the alarm information to the operation and maintenance terminals allocated to other alarm priorities until the identification information is acquired includes:
pushing the alarm information to an operation and maintenance terminal allocated to another alarm priority, and determining whether the identification information is acquired within a second preset time interval; if the identification information is not acquired within the second preset time interval, continuing to push the alarm information to the operation and maintenance terminals allocated to the remaining alarm priorities until the identification information is acquired.
In a second aspect, the present application discloses a storage system warning device, including:
the information acquisition module is used for acquiring alarm information reported by the cluster, and matching the alarm information with target configuration information which is configured by a user in a user-defined manner in advance and carries an alarm priority and an alarm level so as to determine an operation and maintenance terminal corresponding to the alarm information;
the information distribution module is used for distributing the alarm information to the corresponding operation and maintenance terminal, and determining whether identification information, sent by the operation and maintenance terminal and indicating that the alarm information has been received, is acquired within a first preset time interval;
the first alarm item repairing module is used for repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal when the identification information is acquired within the first preset time interval, and automatically learning a corresponding repairing process through artificial intelligence;
and the second alarm item repairing module is used for automatically repairing the alarm item through the artificial intelligence when the identification information is not acquired within the first preset time interval so as to process the alarm information according to a repairing result.
In a third aspect, the present application discloses an electronic device comprising a processor and a memory; wherein the memory is used for storing a computer program which is loaded and executed by the processor to implement the storage system alert method as described above.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the storage system alert method as previously described.
In the present application, alarm information reported by a cluster is first acquired and matched against target configuration information that is custom-configured by a user in advance and carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal to which the alarm information is allocated. The alarm information is then distributed to that terminal, and it is determined whether identification information, sent by the terminal and indicating that the alarm information has been received, is acquired within a first preset time interval. If the identification information is acquired within the first preset time interval, the alarm item corresponding to the alarm information is repaired based on an operation instruction sent by the operation and maintenance terminal, and artificial intelligence automatically learns the corresponding repair process. If it is not acquired within the first preset time interval, the alarm item is automatically repaired through the artificial intelligence, and the alarm information is processed according to the repair result. In this way, a user at an operation and maintenance terminal can custom-configure the alarm items that the user is responsible for, and alarms are classified and distributed to the relevant operation and maintenance terminals according to the pre-configured alarm priority and alarm level.
During manually driven repair at the operation and maintenance terminal, the artificial intelligence automatically learns and corrects its repair procedure. Accurate alarm pushing through the isolation and repair mechanism reduces the time the operation and maintenance terminal spends identifying different alarms, improves the efficiency and pertinence of problem handling, and saves the time needed to identify alarms and handle problems, which improves the product service level and overall competitiveness. Artificial intelligence repair ensures efficient alarming and problem handling for the storage cluster and makes the alarm mechanism of the storage system more intelligent. The mechanism also enhances the overall stability of the storage system and saves maintenance cost to a certain extent.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a storage system alarming method disclosed in the present application;
FIG. 2 is a schematic diagram illustrating a user-defined configuration process for an alert of a storage system according to the present disclosure;
FIG. 3 is a flow chart of a particular storage system alert method disclosed herein;
FIG. 4 is a schematic diagram of an alarm isolation repair process of a storage system according to the present disclosure;
FIG. 5 is a schematic structural diagram of an alarm device of a storage system according to the present disclosure;
FIG. 6 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
At present, as the data storage scale and the volume of operations keep increasing, the problems and alarms that may arise under complex conditions also increase. If every alarm is sent to every operation and maintenance person, the burden of identifying alarms and handling problems grows, and operation and maintenance personnel may even fail to acquire and handle the relevant alarms in time because there are too many of them, which may lead to a serious cluster accident and inestimable loss.
Therefore, the storage system alarm scheme is provided, the time for operation and maintenance personnel to identify different alarms can be reduced, and the efficiency of alarm and problem processing of the storage cluster is guaranteed.
An embodiment of the invention discloses a storage system warning method which, as shown in FIG. 1, comprises the following steps:
Step S11: acquiring alarm information reported by the cluster, and matching the alarm information against target configuration information that is custom-configured by a user in advance and carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal corresponding to the alarm information.
In the embodiment of the application, when the cluster generates an alarm and reports it, the alarm information reported by the cluster is acquired. Because a user can define in advance the priorities and alarm levels corresponding to different alarm items, the alarm information is matched against the target configuration information that the user has custom-configured in advance and that carries the alarm priority and the alarm level, which determines which alarm item or items each operation and maintenance terminal handles.
It should be noted that, during custom configuration, a user can configure one or more alarm items that the user is responsible for receiving and rank them by priority, and can likewise configure alarm levels and rank them by priority. Specifically, the alarm items corresponding to a preset number of different pieces of alarm information are determined, and the alarm items are ranked by priority to obtain first configuration information; the configuration levels corresponding to the alarm items are determined, and the configuration levels are ranked by priority to obtain second configuration information; and the target configuration information is determined from the first configuration information and the second configuration information.
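As a minimal sketch of how the first and second configuration information might be combined, the following Python fragment builds one terminal's target configuration from two ordered lists. The names `TargetConfig` and `build_target_config`, the terminal name, and the alarm item names are hypothetical and do not come from the application, which does not specify a data layout.

```python
from dataclasses import dataclass


@dataclass
class TargetConfig:
    """Hypothetical container for one terminal's user-defined configuration."""
    terminal: str
    item_rank: dict   # alarm item -> rank (0 = highest priority)
    level_rank: dict  # alarm level -> rank (0 = highest priority)


def build_target_config(terminal, items_by_priority, levels_by_priority):
    # First configuration information: alarm items in priority order.
    first = {item: rank for rank, item in enumerate(items_by_priority)}
    # Second configuration information: alarm levels in priority order.
    second = {level: rank for rank, level in enumerate(levels_by_priority)}
    # Target configuration information combines both.
    return TargetConfig(terminal, first, second)


config = build_target_config(
    "terminal-A",
    items_by_priority=["disk", "network"],
    levels_by_priority=["high risk", "serious"],
)
print(config.item_rank)   # {'disk': 0, 'network': 1}
print(config.level_rank)  # {'high risk': 0, 'serious': 1}
```

Storing ranks rather than raw lists makes later priority comparisons a dictionary lookup; this is a design choice of the sketch only.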
FIG. 2 shows a specific flow of user-defined configuration. After the flow starts, the user configures the alarm modules that the user personally receives; one or more responsible alarm modules may be configured. It can be understood that an alarm module is the alarm item corresponding to the alarm information. The configured alarm modules are then ranked by priority. Next, the alarm levels to be received are configured, for example high risk, serious, normal and slight, and the received alarm levels are then ranked by priority. As a result, when the cluster generates an alarm, the alarm is classified according to the user-defined configuration information, including the alarm priority and the alarm level, and distributed to the operation and maintenance terminals responsible for the related alarm items; an alarm is not pushed to a terminal to which it is unrelated.
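The classification and filtering described above can be illustrated with a short Python sketch. The function name `rank_terminals`, the dictionary keys, and the rule "compare the alarm item rank first, then the level rank" are assumptions made for illustration; the application only states that a preset allocation rule based on alarm priority and alarm level is used.

```python
def rank_terminals(alarm, configs):
    """Return the terminals responsible for the alarm, best-ranked first.

    `alarm` is a dict with hypothetical keys "item" and "level";
    `configs` maps terminal name -> (items_by_priority, levels_by_priority).
    """
    candidates = []
    for terminal, (items, levels) in configs.items():
        # Unrelated alarms are not pushed to this terminal.
        if alarm["item"] not in items or alarm["level"] not in levels:
            continue
        # A lower index means a higher user-defined priority.
        key = (items.index(alarm["item"]), levels.index(alarm["level"]))
        candidates.append((key, terminal))
    return [terminal for _, terminal in sorted(candidates)]


configs = {
    "terminal-A": (["disk", "network"], ["high risk", "serious"]),
    "terminal-B": (["network", "disk"], ["high risk", "serious", "normal"]),
    "terminal-C": (["memory"], ["slight"]),
}
alarm = {"item": "disk", "level": "serious"}
print(rank_terminals(alarm, configs))  # ['terminal-A', 'terminal-B']
```

Here terminal-C never sees the disk alarm because it configured neither that item nor that level.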
Step S12: and distributing the alarm information to the corresponding operation and maintenance terminal, and judging whether the identification information which is sent by the operation and maintenance terminal and has received the alarm information is acquired within a first preset time interval.
In the embodiment of the application, after the alarm item to be handled is pushed to the determined operation and maintenance terminal, it is determined whether identification information, sent by the operation and maintenance terminal and indicating that the alarm information has been received, is acquired within a first preset time interval. It can be understood that the alarm information is allocated to the operation and maintenance terminal according to a preset allocation rule based on the alarm priority and the alarm level. For example, according to the alarm items and alarm levels that each user has taken responsibility for, the alarm is sent to the operation and maintenance terminal whose priority for that alarm item and alarm level ranks first; the system then checks whether an operation and maintenance person clicks to accept the alarm information at that terminal within 30 minutes, and proceeds accordingly.
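The acknowledgement check can be sketched as a comparison of timestamps. In the Python fragment below, the function name `acknowledged_in_time` and the inclusive treatment of the interval boundary are assumptions; only the 30-minute example value comes from the text.

```python
from datetime import datetime, timedelta

FIRST_INTERVAL = timedelta(minutes=30)  # example value from the text


def acknowledged_in_time(pushed_at, ack_at, interval=FIRST_INTERVAL):
    """True if an acknowledgement arrived within `interval` of the push.

    `ack_at` is None when no operator clicked to accept the alarm.
    """
    if ack_at is None:
        return False
    return timedelta(0) <= ack_at - pushed_at <= interval


pushed = datetime(2022, 7, 4, 9, 0)
print(acknowledged_in_time(pushed, datetime(2022, 7, 4, 9, 12)))  # True
print(acknowledged_in_time(pushed, datetime(2022, 7, 4, 9, 45)))  # False
print(acknowledged_in_time(pushed, None))                          # False
```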
Step S13: and if the identification information is acquired within the first preset time interval, repairing the alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal, and automatically learning the corresponding repairing process through artificial intelligence.
In the embodiment of the application, if the identification information indicating that the operation and maintenance terminal has received the alarm is acquired within the first preset time interval, the alarm item is repaired according to the repair instruction sent by the operation and maintenance terminal. During this process, artificial intelligence (AI) automatically learns the repair process, so that when the same alarm later goes unhandled in time, the artificial intelligence can handle it. This makes the alarm mechanism of the storage system more intelligent and improves the efficiency of problem solving.
Illustratively, when a cluster alarm is generated and reported, the alarm is first distributed to the operation and maintenance terminal ranked first for that alarm's priority. When the operation and maintenance person at that terminal clicks to accept the alarm, a received identifier is returned to the system, after which the alarm is no longer pushed, including to other related personnel. During manually driven repair, the artificial intelligence (AI) automatically learns from the process and corrects its default repair procedure.
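The application does not specify how the artificial intelligence learns a repair process. As a deliberately simple stand-in, the Python sketch below records the steps an operator used and prefers them over the default list on later lookups. The class `RepairKnowledge`, its methods, and the step names are hypothetical; a real system might use a statistical or machine-learning model instead of this record-and-replay approach.

```python
class RepairKnowledge:
    """Hypothetical store of repair steps per alarm item."""

    def __init__(self, default_steps):
        self._default = dict(default_steps)  # factory default step lists
        self._learned = {}                   # steps observed from operators

    def learn(self, alarm_item, operator_steps):
        # Record the operator's repair as the preferred procedure.
        self._learned[alarm_item] = list(operator_steps)

    def steps_for(self, alarm_item):
        # Prefer learned steps; fall back to the default list.
        return self._learned.get(alarm_item, self._default.get(alarm_item, []))


kb = RepairKnowledge({"disk": ["remount volume"]})
print(kb.steps_for("disk"))  # ['remount volume']
kb.learn("disk", ["isolate disk", "rebuild array"])
print(kb.steps_for("disk"))  # ['isolate disk', 'rebuild array']
```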
Step S14: and if the identification information is not acquired within the first preset time interval, automatically repairing the alarm item through the artificial intelligence so as to process the alarm information according to a repairing result.
In the embodiment of the application, if the identification information indicating that the operation and maintenance terminal has received the alarm is not acquired within the first preset time interval, the artificial intelligence attempts to repair the alarm automatically. It can be understood that each alarm ships from the factory with a default list of repair steps; the artificial intelligence attempts automatic repair according to this default list and sends a notification after the repair is completed. If the repair fails, the alarm continues to be sent together with a prompt that automatic repair has failed and manual repair is required. When the alarm continues to be pushed, it is distributed to the person who is responsible for the alarm and has the second-highest priority, it is determined whether the identification information is acquired within a second preset time interval, and so on.
Illustratively, when an alarm is pushed to the operation and maintenance terminal ranked first for the alarm priority and alarm level, if the system has not received the received identifier after 30 minutes, the system triggers automatic artificial intelligence repair of the alarm and sends a notification after the repair is completed. If the repair fails, the alarm continues to be sent together with a prompt that automatic repair has failed and manual repair is required. When the alarm continues to be pushed, it is distributed to the person who is responsible for the alarm and has the second-highest priority; if no received identifier has arrived from anyone after ten minutes, the alarm is pushed to the person with the third-highest priority, and so on.
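The push, automatic repair and escalation sequence in this example can be simulated as follows. This is a sketch under stated assumptions: the function `handle_alarm`, the `ack_delays` mapping, and the log messages are hypothetical, the terminal list is assumed to be non-empty, and the loop simply stops when the list is exhausted, whereas the text describes pushing until an acknowledgement is obtained.

```python
def handle_alarm(ranked_terminals, ack_delays, auto_repair,
                 first_interval=30, second_interval=10):
    """Simulate the push / auto-repair / escalation sequence.

    `ack_delays` maps terminal -> minutes until acknowledgement
    (a missing entry means the terminal never acknowledges).
    Returns a list of event strings.
    """
    log = []
    first, *rest = ranked_terminals
    log.append(f"push to {first}")
    delay = ack_delays.get(first)
    if delay is not None and delay <= first_interval:
        log.append(f"acknowledged by {first}; stop pushing")
        return log
    # No acknowledgement in the first interval: try automatic repair.
    if auto_repair():
        log.append("auto repair succeeded; notify")
        return log
    log.append("auto repair failed; manual intervention needed")
    # Escalate to the next-priority terminals until one acknowledges.
    for terminal in rest:
        log.append(f"push to {terminal}")
        delay = ack_delays.get(terminal)
        if delay is not None and delay <= second_interval:
            log.append(f"acknowledged by {terminal}; stop pushing")
            return log
    log.append("no acknowledgement from any terminal")
    return log


events = handle_alarm(
    ["terminal-A", "terminal-B", "terminal-C"],
    ack_delays={"terminal-C": 4},
    auto_repair=lambda: False,
)
print("\n".join(events))
```

In this run terminal-A and terminal-B never respond and the simulated automatic repair fails, so the alarm reaches terminal-C, which acknowledges within the ten-minute window.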
It should be noted that, when the same operation and maintenance terminal receives alarm items with different priorities and different levels, the terminal may process them in the order of its own customized processing preference, which is not specifically limited herein. For example, suppose there are operation and maintenance terminals A, B and C. Terminal A is allocated task 1a, task 2b and task 3a; terminal B is allocated task 1b, task 2c and task 3b; terminal C is allocated task 1c, task 2a and task 3c. The task priorities, from high to low, are task 1, task 2 and task 3, and the level priorities, from high to low, are a, b and c. During processing, tasks with a higher task priority may be distributed to their operation and maintenance terminals first, and tasks are then processed according to task level.
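One possible reading of this example is a two-key sort: task priority first, then level priority. The Python sketch below applies that ordering to the allocation of terminals A, B and C; the variable names and the two-character task encoding are assumptions made for illustration, and the application leaves the actual processing order open.

```python
TASK_PRIORITY = {"1": 0, "2": 1, "3": 2}   # task 1 > task 2 > task 3
LEVEL_PRIORITY = {"a": 0, "b": 1, "c": 2}  # level a > level b > level c

assignments = {
    "A": ["1a", "2b", "3a"],
    "B": ["1b", "2c", "3b"],
    "C": ["1c", "2a", "3c"],
}

# Flatten to (task, terminal) pairs, then sort by task priority and level.
pairs = [(task, term) for term, tasks in assignments.items() for task in tasks]
ordered = sorted(
    pairs,
    key=lambda p: (TASK_PRIORITY[p[0][0]], LEVEL_PRIORITY[p[0][1]]),
)
print(ordered)
# [('1a', 'A'), ('1b', 'B'), ('1c', 'C'), ('2a', 'C'), ('2b', 'A'),
#  ('2c', 'B'), ('3a', 'A'), ('3b', 'B'), ('3c', 'C')]
```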
The embodiment of the application discloses a specific storage system warning method, which is shown in fig. 3 and comprises the following steps:
Step S21: acquiring alarm information reported by the cluster, and matching the alarm information against target configuration information that is custom-configured by a user in advance and carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal corresponding to the alarm information.
Step S22: distributing the alarm information to the corresponding operation and maintenance terminal, and determining whether identification information, sent by the operation and maintenance terminal and indicating that the alarm information has been received, is acquired within a first preset time interval.
For more specific processing procedures of the step S21 and the step S22, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
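The matching in step S21 can be pictured with a minimal sketch. The patent does not specify a data model, so the `ConfigEntry` fields, the terminal names, and the rule "same module and same level, ordered by priority" are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEntry:
    """One user-configured rule (hypothetical layout, not from the patent)."""
    terminal: str   # identifier of an operation and maintenance terminal
    module: str     # alarm item / module the terminal is responsible for
    level: str      # alarm level the terminal is responsible for
    priority: int   # 1 means "notify first"


def match_terminals(alarm, config):
    """Return the terminals responsible for this alarm, highest priority first."""
    hits = [e for e in config
            if e.module == alarm["module"] and e.level == alarm["level"]]
    return [e.terminal for e in sorted(hits, key=lambda e: e.priority)]


config = [
    ConfigEntry("ops-b", "disk", "critical", 2),
    ConfigEntry("ops-a", "disk", "critical", 1),
    ConfigEntry("ops-c", "network", "warning", 1),
]
alarm = {"module": "disk", "level": "critical"}
targets = match_terminals(alarm, config)
```

Under these assumptions, the first element of `targets` would be the terminal that receives the alarm in step S22, and the remaining elements would be candidates for later escalation.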
Step S23: If the identification information is obtained within the first preset time interval, stop pushing the alarm information, repair the alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal, and automatically learn the corresponding repair process through artificial intelligence.
In this embodiment of the application, after the cluster reports the alarm information, the system determines the operation and maintenance terminal to which the alarm information is assigned. If the system obtains, within the first preset time interval, the identification information sent by the operation and maintenance terminal to indicate that the alarm information has been received, the alarm item corresponding to the alarm information is going to be handled, so the alarm information is not pushed to any other operation and maintenance terminal. While the alarm item is being repaired by manual intervention, the artificial intelligence can automatically learn the corresponding repair process, so that it can later handle the same alarm information when that information is not handled in time. This makes the alarm mechanism of the storage system more intelligent and improves the efficiency of problem solving.
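The patent does not say how the artificial intelligence learns a repair process. As a deliberately simple stand-in, the sketch below just records the steps an operator used in a successful manual repair and stores them as the new default list for that alarm item. The class name `RepairStepStore`, its methods, and the step names are hypothetical.

```python
class RepairStepStore:
    """Default repair steps per alarm item (a stand-in for the learning component)."""

    def __init__(self, factory_defaults):
        # Copy the lists so later updates do not mutate the caller's data.
        self._steps = {item: list(steps) for item, steps in factory_defaults.items()}

    def steps_for(self, alarm_item):
        """Return the current default steps, or an empty list if none are known."""
        return list(self._steps.get(alarm_item, []))

    def learn(self, alarm_item, operator_steps, succeeded):
        """Adopt the operator's steps only when the manual repair succeeded."""
        if succeeded and operator_steps:
            self._steps[alarm_item] = list(operator_steps)


store = RepairStepStore({"disk_offline": ["rescan_bus"]})
store.learn("disk_offline", ["rescan_bus", "restart_osd"], succeeded=True)
store.learn("disk_offline", ["reboot_node"], succeeded=False)  # ignored
```

A real system would presumably generalize across alarms rather than overwrite a list; the point here is only that manual repairs feed back into the steps used for later automatic repair.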
Step S24: If the identification information is not obtained within the first preset time interval, automatically repair the alarm item through the artificial intelligence according to the list of default repair steps for the alarm item, and judge whether the automatic repair is successful.
In this embodiment of the application, if the system does not obtain, within the first preset time interval, the identification information sent by the operation and maintenance terminal to indicate that the alarm information has been received, the artificial intelligence attempts automatic repair. Specifically, it repairs the alarm item according to the list of default repair steps for that alarm item and judges whether the automatic repair is successful. If the repair succeeds, the repair is complete; if it does not, manual intervention is needed.
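A minimal sketch of step S24 follows, assuming that each default repair step can be modeled as a named callable returning `True` on success. This representation, and the choice to stop at the first failing step, are assumptions; the patent only states that the steps are taken from a default list and that success is judged afterwards.

```python
def auto_repair(steps):
    """Run the repair steps in order and stop at the first failure.

    steps: list of (name, action) pairs, where action() returns True on success.
    Returns (True, None) on success, or (False, name_of_failed_step).
    """
    for name, action in steps:
        try:
            ok = action()
        except Exception:
            # An exception in a step is treated the same as a failed step.
            ok = False
        if not ok:
            return False, name
    return True, None


steps_ok = [("rescan", lambda: True), ("remount", lambda: True)]
steps_bad = [("rescan", lambda: True), ("remount", lambda: False), ("verify", lambda: True)]
result_ok = auto_repair(steps_ok)
result_bad = auto_repair(steps_bad)
```

Returning the name of the failed step gives the repair failure notification something concrete to report to the personnel who take over.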
Step S25: When the automatic repair succeeds, send a repair completion notification after the repair is completed; when the automatic repair fails, send a repair failure notification and push the alarm information to the operation and maintenance terminals assigned to the other alarm priorities until the identification information is obtained.
In this embodiment of the application, if the artificial intelligence successfully repairs the alarm item according to the factory-provided list of default repair steps for the alarm item, completion of the repair is reported. Correspondingly, if the artificial intelligence fails to repair the alarm item according to that list, the repair failure is reported and manual intervention is needed; at the same time, the alarm information is pushed to the operation and maintenance terminals assigned to the other alarm priorities until it is received and handled.
In this embodiment of the application, pushing the alarm information to the operation and maintenance terminals assigned to the other alarm priorities until the identification information is obtained includes: pushing the alarm information to an operation and maintenance terminal assigned to another alarm priority and judging whether the identification information is obtained within a second preset time interval; and, if the identification information is not obtained within the second preset time interval, continuing to push the alarm information to the operation and maintenance terminals assigned to the other alarm priorities until the identification information is obtained. It is understood that the second preset time interval can be set arbitrarily and may be the same as or different from the first preset time interval.
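The escalation loop described here can be sketched as follows. `wait_for_ack` is a hypothetical callback that pushes the alarm to one terminal and reports whether the identification information arrived within the timeout. Returning `None` once every terminal has been tried is also an assumption, since the text does not say what happens when the list is exhausted.

```python
def escalate(terminals_by_priority, wait_for_ack, second_interval_s):
    """Push to each remaining terminal in priority order until one acknowledges.

    wait_for_ack(terminal, timeout_s) -> bool is supplied by the caller.
    Returns the acknowledging terminal, or None if nobody acknowledged.
    """
    for terminal in terminals_by_priority:
        if wait_for_ack(terminal, second_interval_s):
            return terminal
    return None


# Demo with a fake transport: only "ops-c" ever acknowledges.
pushed = []


def fake_wait(terminal, timeout_s):
    pushed.append(terminal)
    return terminal == "ops-c"


acked_by = escalate(["ops-b", "ops-c", "ops-d"], fake_wait, 1800)
```

In the demo, "ops-d" is never contacted because pushing stops as soon as "ops-c" acknowledges, which mirrors the "until the identification information is obtained" condition.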
Fig. 4 is a flow chart of alarm information processing. When the cluster generates an alarm, the alarm module and the alarm level of the alarm are identified, and the personnel matching that alarm module and alarm level are obtained from the configuration information. The alarm is first distributed to the person who has first-level priority for that alarm module and alarm level. If the corresponding operation and maintenance person receives the alarm and clicks to accept it, a received identifier is returned to the system and the alarm is no longer pushed to other related personnel. If the system does not receive the received identifier within 30 minutes, the alarm AI automatic repair module is triggered and the system attempts automatic repair; when the repair is completed, a notification is sent. If the repair fails, the alarm continues to be raised and pushed, and it is additionally distributed to the person who is responsible for the alarm with the second-highest priority. If no received identifier has been received from anyone after a further 30 minutes, pushing continues and the alarm is pushed to the person with the third-highest priority, and so on. Once someone has accepted the alarm, the artificial intelligence can automatically learn from the manual repair process and correct the default repair steps.
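To make the timing in Fig. 4 concrete, the sketch below lists the events for the worst case in which nobody acknowledges and the automatic repair fails. The function name, the event wording, and the simplification that the automatic repair takes no time are assumptions; only the 30-minute interval comes from the text.

```python
def escalation_timeline(priority_count, interval_min=30):
    """Return (minute, event) pairs for the no-acknowledgment, repair-fails case."""
    events = [
        (0, "push to priority 1"),
        (interval_min, "no acknowledgment: attempt automatic repair"),
    ]
    # After the failed repair, each further priority is tried one interval apart.
    for k in range(2, priority_count + 1):
        events.append(((k - 1) * interval_min, f"push to priority {k}"))
    return events


timeline = escalation_timeline(3)
for minute, event in timeline:
    print(f"t+{minute:>3} min: {event}")
```

With three priority tiers, the third tier is reached 60 minutes after the alarm was first pushed, so the interval directly bounds how long an unacknowledged alarm waits at each tier.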
In this method, alarm information reported by the cluster is first obtained and matched against target configuration information that the user has custom-configured in advance and that carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal to which the alarm information is assigned. The alarm information is then distributed to that operation and maintenance terminal, and it is judged whether identification information, sent by the operation and maintenance terminal to indicate that the alarm information has been received, is obtained within a first preset time interval. If the identification information is obtained within the first preset time interval, pushing of the alarm information stops, the alarm item corresponding to the alarm information is repaired based on an operation instruction sent by the operation and maintenance terminal, and the corresponding repair process is automatically learned through artificial intelligence. If the identification information is not obtained within the first preset time interval, the alarm item is automatically repaired through the artificial intelligence according to the list of default repair steps for the alarm item, and it is judged whether the automatic repair is successful. When the automatic repair succeeds, a repair completion notification is sent after the repair is completed; when the automatic repair fails, a repair failure notification is sent and the alarm information is pushed to the operation and maintenance terminals assigned to the other alarm priorities until the identification information is obtained.
In this way, a user at an operation and maintenance terminal can custom-configure the alarm items that the user is responsible for, and alarm items are classified and distributed to the relevant operation and maintenance terminals according to the alarm priority and alarm level configured in advance. During repair by manual intervention at the operation and maintenance terminal, the artificial intelligence can automatically learn and correct its repair steps. Through the alarm isolation and repair mechanism and the precise pushing of alarms, the operation and maintenance terminal spends less time identifying different alarms and handles problems more efficiently and in a more targeted way, which saves the time needed to identify and handle alarms and improves the product service level and overall competitiveness. Repair by artificial intelligence safeguards the efficiency of alarm and problem handling in the storage cluster and makes the alarm mechanism of the storage system more intelligent. The mechanism enhances the overall stability of the storage system and saves maintenance cost to a certain extent.
Correspondingly, an embodiment of the present application further discloses a storage system alarm device. As shown in Fig. 5, the device includes:
the information acquisition module 11, configured to acquire alarm information reported by a cluster and to match the alarm information against target configuration information that a user has custom-configured in advance and that carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal to which the alarm information is assigned;
the information distribution module 12, configured to distribute the alarm information to the corresponding operation and maintenance terminal and to judge whether identification information, sent by the operation and maintenance terminal to indicate that the alarm information has been received, is obtained within a first preset time interval;
the first alarm item repairing module 13, configured to repair, when the identification information is obtained within the first preset time interval, the alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal, and to automatically learn the corresponding repair process through artificial intelligence;
and the second alarm item repairing module 14, configured to automatically repair the alarm item through the artificial intelligence when the identification information is not obtained within the first preset time interval, so that the alarm information is handled according to the repair result.
For more specific working processes of the modules, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
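One way to picture how the four modules cooperate is a small composition sketch. The class `AlarmDevice`, its field names, and the callable signatures are hypothetical; the patent describes the modules functionally and does not prescribe an interface. Falling back to automatic repair when no terminal matches is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlarmDevice:
    acquire: Callable[[dict], List[str]]        # module 11: alarm -> terminals by priority
    distribute: Callable[[dict, str], bool]     # module 12: True if acknowledged in time
    manual_repair: Callable[[dict, str], str]   # module 13: repair driven by the terminal
    auto_repair: Callable[[dict], str]          # module 14: repair driven by the AI

    def handle(self, alarm):
        terminals = self.acquire(alarm)
        # Assumption: with no matching terminal, go straight to automatic repair.
        if terminals and self.distribute(alarm, terminals[0]):
            return self.manual_repair(alarm, terminals[0])
        return self.auto_repair(alarm)


def build_device(ack):
    """Build a device whose terminals always (ack=True) or never acknowledge."""
    return AlarmDevice(
        acquire=lambda alarm: ["ops-a", "ops-b"],
        distribute=lambda alarm, terminal: ack,
        manual_repair=lambda alarm, terminal: f"manual:{terminal}",
        auto_repair=lambda alarm: "auto",
    )


acked = build_device(ack=True).handle({"module": "disk"})
unacked = build_device(ack=False).handle({"module": "disk"})
```

Passing the modules in as callables keeps the sketch independent of any particular transport or repair backend.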
Therefore, according to the scheme of this embodiment, alarm information reported by the cluster is first obtained and matched against target configuration information that the user has custom-configured in advance and that carries an alarm priority and an alarm level, so as to determine the operation and maintenance terminal to which the alarm information is assigned. The alarm information is then distributed to that operation and maintenance terminal, and it is judged whether identification information, sent by the operation and maintenance terminal to indicate that the alarm information has been received, is obtained within a first preset time interval. If the identification information is obtained within the first preset time interval, the alarm item corresponding to the alarm information is repaired based on an operation instruction sent by the operation and maintenance terminal, and the corresponding repair process is automatically learned through artificial intelligence. If the identification information is not obtained within the first preset time interval, the alarm item is automatically repaired through the artificial intelligence, and the alarm information is then handled according to the repair result. In this way, a user at an operation and maintenance terminal can custom-configure the alarm items that the user is responsible for, and alarm items are classified and distributed to the relevant operation and maintenance terminals according to the alarm priority and alarm level configured in advance.
During repair by manual intervention at the operation and maintenance terminal, the artificial intelligence can automatically learn and correct its repair steps. Through the alarm isolation and repair mechanism and the precise pushing of alarms, the operation and maintenance terminal spends less time identifying different alarms and handles problems more efficiently and in a more targeted way, which saves the time needed to identify and handle alarms and improves the product service level and overall competitiveness. Repair by artificial intelligence safeguards the efficiency of alarm and problem handling in the storage cluster and makes the alarm mechanism of the storage system more intelligent. The mechanism enhances the overall stability of the storage system and saves maintenance cost to a certain extent.
Further, an embodiment of the present application discloses an electronic device. Fig. 6 is a block diagram of an electronic device 20 according to an exemplary embodiment; its content should not be construed as limiting the scope of the application.
Fig. 6 is a schematic structural diagram of the electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the storage system alarm method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may be a computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22 is used as a carrier for storing resources, and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like, the resources stored thereon may include an operating system 221, a computer program 222, data 223, and the like, and the data 223 may include various data. The storage means may be a transient storage or a permanent storage.
The operating system 221 is used for managing and controlling each hardware device on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program that performs the storage system alarm method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.
Further, an embodiment of the present application discloses a computer-readable storage medium for storing a computer program. The computer-readable storage medium may be a random access memory (RAM), internal memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a magnetic disk, an optical disk, or any other form of storage medium known in the art. When executed by a processor, the computer program implements the aforementioned storage system alarm method. For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not described herein again.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be understood by cross-reference. Because the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant details can be found in the description of the method.
The steps of a storage system alarm method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The storage system alarm method, device, equipment and storage medium provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the invention. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as a limitation of the invention.

Claims (10)

1. A storage system alarming method is characterized by comprising the following steps:
acquiring alarm information reported by a cluster, and matching the alarm information with target configuration information which is configured by a user in a user-defined manner in advance and carries an alarm priority and an alarm level so as to determine an operation and maintenance terminal corresponding to the alarm information;
distributing the alarm information to the corresponding operation and maintenance terminal, and judging whether identification information which is sent by the operation and maintenance terminal and receives the alarm information is acquired within a first preset time interval;
if the identification information is acquired within the first preset time interval, repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal, and automatically learning a corresponding repairing process through artificial intelligence;
and if the identification information is not acquired within the first preset time interval, automatically repairing the alarm item through the artificial intelligence so as to process the alarm information according to a repairing result.
2. The storage system alarming method of claim 1, further comprising:
determining alarm items corresponding to different preset alarm information, and performing priority sequencing on the alarm items to obtain first configuration information;
determining a configuration level corresponding to the alarm item, and performing priority ordering on the configuration level to obtain second configuration information;
and determining the target configuration information by using the first configuration information and the second configuration information.
3. The storage system alarming method of claim 1, wherein the allocating the alarming information to the corresponding operation and maintenance terminal comprises:
and distributing the alarm information to the operation and maintenance terminal according to a preset distribution rule based on the alarm priority and the alarm level.
4. The storage system alarm method according to claim 1, wherein if the identification information is obtained within the first preset time interval, repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal includes:
and if the identification information is acquired within the first preset time interval, stopping pushing the alarm information, and repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal.
5. The storage system alarm method according to any one of claims 1 to 4, wherein if the identification information is not acquired within the first preset time interval, automatically repairing the alarm item through the artificial intelligence includes:
and if the identification information is not acquired within the first preset time interval, automatically repairing the alarm item through the artificial intelligence according to a list of default repairing steps of the alarm item, and judging whether the automatic repairing is successful.
6. The storage system alarming method of claim 5, wherein the determining whether the automatic repair is successful comprises:
when the automatic repair is successful, sending a notification of the repair completion after the repair is completed;
and when the automatic repair fails, sending a repair failure notification, and pushing the alarm information to other operation and maintenance terminals correspondingly distributed to the alarm priority until the identification information is obtained.
7. The storage system alarming method according to claim 6, wherein the pushing the alarming information to an operation and maintenance terminal correspondingly allocated to another alarming priority until the identification information is acquired comprises:
and pushing the alarm information to other operation and maintenance terminals correspondingly distributed with the alarm priority, judging whether the identification information is acquired or not within a second preset time interval, and if the identification information is not acquired within the second preset time interval, continuing to push the alarm information to other operation and maintenance terminals correspondingly distributed with the alarm priority until the identification information is acquired.
8. A storage system alert device, comprising:
the information acquisition module is used for acquiring alarm information reported by the cluster, and matching the alarm information with target configuration information which is configured by a user in a user-defined manner in advance and carries an alarm priority and an alarm level so as to determine an operation and maintenance terminal corresponding to the alarm information;
the information distribution module is used for distributing the alarm information to the corresponding operation and maintenance terminal and judging whether identification information which is sent by the operation and maintenance terminal and receives the alarm information is acquired within a first preset time interval;
the first alarm item repairing module is used for repairing an alarm item corresponding to the alarm information based on an operation instruction sent by the operation and maintenance terminal when the identification information is acquired within the first preset time interval, and automatically learning a corresponding repairing process through artificial intelligence;
and the second alarm item repairing module is used for automatically repairing the alarm item through the artificial intelligence when the identification information is not acquired within the first preset time interval so as to process the alarm information according to a repairing result.
9. An electronic device, comprising a processor and a memory; wherein the memory is for storing a computer program that is loaded and executed by the processor to implement the storage system alert method of any of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements a storage system alert method as claimed in any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210778324.2A CN115150249A (en) 2022-06-29 2022-06-29 Storage system warning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115150249A true CN115150249A (en) 2022-10-04

Family

ID=83409501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210778324.2A Pending CN115150249A (en) 2022-06-29 2022-06-29 Storage system warning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115150249A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240425

Address after: Room 1801, 18th Floor, Jiyun Investment Building, No. 278 Xinyi Road, Zhengdong New District, Zhengzhou City, Henan Province, 450047

Applicant after: Zhengzhou Inspur Data Technology Co.,Ltd.

Country or region after: China

Address before: 250101 room s311, building S05, Inspur Science Park, No. 1036, Inspur Road, Jinan pilot Free Trade Zone, Jinan, Shandong Province

Applicant before: Jinan Inspur Data Technology Co., Ltd.

Country or region before: China