CN114116282A - Method and device for reporting and repairing network attached storage faults - Google Patents


Info

Publication number
CN114116282A
Authority
CN
China
Prior art keywords
alarm
alarm event
error
reporting
error code
Legal status
Granted
Application number
CN202111342238.9A
Other languages
Chinese (zh)
Other versions
CN114116282B (en)
Inventor
郑强
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111342238.9A (CN114116282B)
Publication of CN114116282A
Application granted
Publication of CN114116282B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/0793 Remedial or corrective actions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis


Abstract

The present invention provides a method, system, device, and storage medium for reporting and repairing network attached storage faults. The method includes: acquiring the alarm information file of a network attached storage and filling alarm data information into the alarm information file; sequentially determining, according to the filled alarm data information, whether each alarm event triggers an alarm; in response to an alarm event triggering an alarm, calling a reporting function to report the alarm event; and, according to the identifier of the occurrence of the alarm event, calling a repair function in a failure mode library to repair the alarm event. The invention displays network attached storage alarms so that they are visible to the user, which allows faults to be handled efficiently and keeps the system stable. Some alarms can also be repaired automatically without manual intervention, so the repair is transparent to the user and user acceptance increases.

Description

Method and device for reporting and repairing network attached storage faults
Technical Field
The present invention relates to the field of storage, and in particular, to a method, a system, a device, and a storage medium for reporting and repairing a network attached storage failure.
Background
In the big data era, storage must be ever more reliable and problems must be located ever more accurately. However, when the NAS (Network Attached Storage) service of the existing MCS system (a simplified Linux system based on the Linux kernel) fails, the GUI (Graphical User Interface) shows no alarm event information related to the NAS service. Users therefore cannot obtain fault information in time, faults cannot be assessed and handled promptly, and latent risks to the stable operation of the system remain.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, a system, a computer device, and a computer-readable storage medium for reporting and repairing a network attached storage failure. Network attached storage alarms are displayed visually on the user's page, and when such an alarm occurs it is repaired automatically, which reduces manual intervention, increases user acceptance, and improves system stability.
Based on the above object, one aspect of the embodiments of the present invention provides a method for reporting and repairing a network attached storage failure, including the following steps: acquiring the alarm information file of the network attached storage and filling alarm data information into the alarm information file; sequentially determining, according to the filled alarm data information, whether each alarm event triggers an alarm; in response to an alarm event triggering an alarm, calling a reporting function to report the alarm event; and calling, according to the identifier of the alarm event, a repair function in a failure mode library to repair the alarm event.
In some embodiments, calling the reporting function to report the alarm event includes: activating, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and checking whether any other manager has activated the error; and, in response to no other manager having activated the error, mapping the error code to the node's real error code and setting an error flag.
In some embodiments, the method further comprises: in response to an alarm event not triggering an alarm, calling a clearing function to clear the alarm event.
In some embodiments, calling the clearing function to clear the alarm event comprises: clearing the error code information in the cache and determining whether the error code is a preset value; and, in response to the error code being the preset value, clearing the current mode of the platform main process and setting the platform main process to normal mode.
In another aspect of the embodiments of the present invention, a system for reporting and repairing a network attached storage fault is provided, including: an acquisition module, configured to acquire the alarm information file of the network attached storage and fill alarm data information into the alarm information file; a judging module, configured to sequentially determine, according to the filled alarm data information, whether each alarm event triggers an alarm; a reporting module, configured to call, in response to an alarm event triggering an alarm, a reporting function to report the alarm event; and a repair module, configured to call, according to the identifier of the alarm event, a repair function in the failure mode library to repair the alarm event.
In some embodiments, the reporting module is configured to: activate, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and check whether any other manager has activated the error; and, in response to no other manager having activated the error, map the error code to the node's real error code and set an error flag.
In some embodiments, the system further comprises a clearing module configured to call, in response to an alarm event not triggering an alarm, a clearing function to clear the alarm event.
In some embodiments, the clearing module is further configured to: clear the error code information in the cache and determine whether the error code is a preset value; and, in response to the error code being the preset value, clear the current mode of the platform main process and set the platform main process to normal mode.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, storing a computer program that, when executed by a processor, implements the steps of the above method.
The invention has the following beneficial technical effects: network attached storage alarms are displayed visually on the user's page and repaired automatically when they occur, which reduces manual intervention, increases user acceptance, and improves system stability.
Drawings
To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other embodiments from these drawings without creative effort.
Fig. 1 is a schematic diagram of an embodiment of a method for reporting and repairing a network attached storage failure according to the present invention;
Fig. 2 is a schematic diagram of an embodiment of a system for reporting and repairing a network attached storage failure according to the present invention;
Fig. 3 is a schematic diagram of the hardware structure of an embodiment of a computer device for reporting and repairing a network attached storage failure according to the present invention;
Fig. 4 is a schematic diagram of an embodiment of a computer storage medium for reporting and repairing a network attached storage failure according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities that share the same name, or two non-identical parameters. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
A first aspect of an embodiment of the present invention provides an embodiment of a method for reporting and repairing a network attached storage fault. Fig. 1 is a schematic diagram illustrating an embodiment of a method for reporting and repairing a network attached storage failure according to the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
S1, acquiring the alarm information file of the network attached storage, and filling alarm data information into the alarm information file;
S2, sequentially determining, according to the filled alarm data information, whether each alarm event triggers an alarm;
S3, in response to an alarm event triggering an alarm, calling a reporting function to report the alarm event; and
S4, calling, according to the identifier of the alarm event, a repair function in a failure mode library to repair the alarm event.
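Steps S2 to S4 can be sketched as a single pass over the filled alarm data. This is a minimal illustration under stated assumptions, not the patented implementation: `AlarmEvent`, `run_cycle`, the `report` callback, and the dictionary standing in for the failure mode library are hypothetical names, and S1 (acquiring and filling the alarm information file) is assumed to have already produced the event list. The optional clearing branch for non-triggering events is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlarmEvent:
    """One entry of the filled alarm data (hypothetical schema)."""
    alarm_id: str    # identifier used to look up the failure mode library
    triggered: bool  # True if this event currently raises an alarm


def run_cycle(
    events: List[AlarmEvent],
    report: Callable[[AlarmEvent], None],
    repair_library: Dict[str, Callable[[AlarmEvent], bool]],
) -> List[str]:
    """Run S2-S4 once over already-filled events; return IDs that were repaired."""
    repaired: List[str] = []
    for event in events:                 # S2: judge each event in turn
        if not event.triggered:
            continue
        report(event)                    # S3: report the triggering event
        repair = repair_library.get(event.alarm_id)  # S4: look up by identifier
        if repair is not None and repair(event):
            repaired.append(event.alarm_id)
    return repaired
```

Keying the library by alarm identifier keeps S4 a dictionary lookup; an event with no registered repair is still reported and is left for manual handling.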
Multiple fault sensors are embedded in the network attached storage virtual machine. When a fault occurs, the sensors quickly capture it and report it to the MCS system. For example, the MCS system collects alarms for NAS node failover, the NFS (Network File System) service, the CIFS (Common Internet File System) service, the FTP (File Transfer Protocol) service, the Minioss service, NAS restart faults, NAS Ethernet port faults, file system capacity, and so on. The implementation flow is as follows:
The flow is implemented by the daemon vm_daemon.py, which calls nas_alarmd once every 5 seconds. nas_alarmd connects to the virtual machine over SSH (Secure Shell) and executes nas_alarm.py to query it; queries are made per node. nas_alarmd obtains the states of NAS node failover, the NFS, CIFS, FTP, and Minioss services, restart, the network card, and the file system in the virtual machine, and, if the query succeeds, writes the results to a FIFO file that the MCS alarm code reads.
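The polling flow can be sketched as follows, assuming that `nas_alarm.py` prints one JSON object of service states and that the MCS side reads newline-delimited records from a FIFO. The helper names `query_node`, `encode_record`, and `poll_forever`, the JSON format, and the FIFO path are hypothetical, and the sketch folds the roles of `vm_daemon.py` and `nas_alarmd` into one loop. It uses only the Python standard library and needs a POSIX system for `os.mkfifo`.

```python
import json
import os
import subprocess
import time
from typing import Dict, List, Optional

POLL_INTERVAL_S = 5  # from the description: the query runs once every 5 seconds


def query_node(host: str, timeout: float = 10.0) -> Optional[Dict[str, str]]:
    """Run nas_alarm.py on one NAS VM over SSH; return parsed states or None on failure.

    Assumption: nas_alarm.py prints a JSON object such as {"nfs": "ok", "ftp": "down"}.
    """
    try:
        proc = subprocess.run(
            ["ssh", host, "python3", "nas_alarm.py"],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return json.loads(proc.stdout)
    except (subprocess.SubprocessError, OSError, json.JSONDecodeError):
        return None  # timeouts, non-zero exit, missing ssh, or bad output


def encode_record(node: str, states: Dict[str, str]) -> str:
    """One JSON line per node, so the reader can consume the FIFO line by line."""
    return json.dumps({"node": node, "states": states}, sort_keys=True) + "\n"


def poll_forever(hosts: List[str], fifo_path: str) -> None:
    """Query every node each cycle and write successful results to the FIFO."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)  # POSIX only
    while True:
        # Opening a FIFO for writing blocks until a reader (the MCS side) opens it.
        with open(fifo_path, "w") as fifo:
            for host in hosts:
                states = query_node(host)
                if states is not None:  # write only when the query succeeded
                    fifo.write(encode_record(host, states))
        time.sleep(POLL_INTERVAL_S)
```

Skipping failed queries mirrors the description, which writes the FIFO only when the query succeeds.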
The alarm information file of the network attached storage is acquired, and alarm data information is filled into it. Whether each alarm event triggers an alarm is then determined in turn according to the filled alarm data information.
When an alarm event triggers an alarm, a reporting function is called to report the alarm event. Alarm detection and processing in the MCS system are performed by the EC module and the PL module. The EC module reads the NAS alarm information file, evaluates the alarm events in turn, and fills in information such as error records, state data, and activation flags; it then processes the alarm events in turn according to the filled information, calling the alarm reporting function if an alarm exists and the alarm clearing function otherwise. The PL module sorts error codes according to the received alarm event information and reports the alarm events. The specific process is as follows: the MCS checks whether the event state is "starting" and exits if it is; the MCS reads the NAS alarm information state file, checks whether the acquired information is valid, and exits if it is not; the MCS evaluates the NAS alarm information in turn and fills in error records, state data, activation flags, and so on; finally, it processes the alarm events in turn according to the filled alarm data, calling ecmgr_sensor_report_node_error to report the alarm for each event that has one and ecmgr_sensor_clear_node_error to clear the alarm for each event that has none.
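The EC module's pre-checks and report/clear dispatch can be sketched as below. This is a hedged illustration: the two-column `error_code state` file format, `AlarmRecord`, `parse_status`, and `ec_process` are assumptions, and the `report`/`clear` callbacks only stand in for `ecmgr_sensor_report_node_error` and `ecmgr_sensor_clear_node_error`, whose real signatures the text does not give. The PL module's error code sorting is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AlarmRecord:
    """Per-event data the EC module fills in (hypothetical fields)."""
    error_code: int  # error record
    state: str       # state data
    active: bool     # activation flag


def parse_status(lines: List[str]) -> Optional[List[AlarmRecord]]:
    """Parse lines like '522 fault' (hex code, state); None if any line is invalid."""
    records: List[AlarmRecord] = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            return None
        code_text, state = parts
        try:
            code = int(code_text, 16)
        except ValueError:
            return None
        records.append(AlarmRecord(code, state, active=(state != "ok")))
    return records


def ec_process(event_state: str, lines: List[str],
               report: Callable[[int], None], clear: Callable[[int], None]) -> bool:
    """Return False on early exit, True once every event has been dispatched."""
    if event_state == "starting":  # exit while the event state is still starting
        return False
    records = parse_status(lines)
    if not records:                # invalid or empty status file: exit
        return False
    for rec in records:            # report active alarms, clear the rest
        (report if rec.active else clear)(rec.error_code)
    return True
```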
In some embodiments, calling the reporting function to report the alarm event includes: activating, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and checking whether any other manager has activated the error; and, in response to no other manager having activated the error, mapping the error code to the node's real error code and setting an error flag. It is then checked whether the error code is 0x522: if so, the platform main process is forcibly set to 522 mode; if not, a function is called to report the alarm. The error code is cached so that the error information is not lost if the IO (input/output) process exits.
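The reporting path (activation, cross-manager check, code mapping, error flag, 0x522 check, and caching) might look like the following sketch. `NodeErrorReporter`, the manager names, and the `code_map` table are hypothetical; the cache is an in-memory set here, whereas the real system needs a cache that outlives the IO process. The sketch also assumes that the 0x522 check applies to the mapped code and that an error already activated by another manager is simply not reported again.

```python
from typing import Dict, List, Set

SPECIAL_CODE = 0x522  # error code named in the description


class NodeErrorReporter:
    """Hypothetical model of the reporting path; all state is kept in memory."""

    def __init__(self, code_map: Dict[int, int]) -> None:
        self.code_map = code_map               # event error code -> node real error code
        self.active: Dict[str, Set[int]] = {}  # manager name -> activated error codes
        self.error_flags: Set[int] = set()
        self.cache: Set[int] = set()           # stand-in for the persistent error-code cache
        self.platform_mode = "normal"
        self.reported: List[int] = []          # codes passed on to the alarm report call

    def report(self, manager: str, code: int) -> None:
        # Activate the error in the manager that owns this alarm event.
        self.active.setdefault(manager, set()).add(code)
        # Check whether any other manager has already activated the same error.
        if any(code in codes for name, codes in self.active.items() if name != manager):
            return  # assumption: already raised elsewhere, so do not report twice
        real_code = self.code_map.get(code, code)  # unmapped codes pass through
        self.error_flags.add(real_code)
        self.cache.add(real_code)  # record the code in the cache
        if real_code == SPECIAL_CODE:
            self.platform_mode = "522"  # force the platform main process into 522 mode
        else:
            self.reported.append(real_code)
```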
In some embodiments, the method further comprises: in response to an alarm event not triggering an alarm, calling a clearing function to clear the alarm event.
In some embodiments, calling the clearing function to clear the alarm event comprises: clearing the error code information in the cache and determining whether the error code is a preset value; and, in response to the error code being the preset value, clearing the current mode of the platform main process and setting the platform main process to normal mode. Specifically, it is checked whether the error code is 0x522: if so, the 522 mode of the platform main process is cleared; if not, the platform main process is set to normal mode.
Calling the clearing function to clear the alarm event further includes: activating, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and checking whether any other manager has activated the error; and, in response to no other manager having activated the error, mapping the error code to the node's real error code.
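A sketch of the clearing path follows. It is an interpretation, not the patented code: `NodeErrorClearer` is hypothetical, and because the translated text says the clearing path also "activates" the error, the sketch assumes this means withdrawing the current manager's activation and clearing only when no other manager still holds the error. The preset value is taken to be 0x522, as the description suggests.

```python
from typing import Dict, List, Set

PRESET_CODE = 0x522  # preset value, taken from the 0x522 check in the description


class NodeErrorClearer:
    """Hypothetical model of the clearing path."""

    def __init__(self, code_map: Dict[int, int], active: Dict[str, Set[int]],
                 cache: Set[int], platform_mode: str = "normal") -> None:
        self.code_map = code_map  # event error code -> node real error code
        self.active = active      # manager name -> activated error codes
        self.cache = cache        # cached node error codes
        self.platform_mode = platform_mode
        self.cleared: List[int] = []

    def clear(self, manager: str, code: int) -> None:
        # Assumption: the clearing path first withdraws this manager's activation.
        self.active.get(manager, set()).discard(code)
        # Keep the error if any other manager still has it active.
        if any(code in codes for codes in self.active.values()):
            return
        real_code = self.code_map.get(code, code)
        self.cache.discard(real_code)  # clear the error code information in the cache
        if real_code == PRESET_CODE:
            self.platform_mode = "normal"  # clear 522 mode, back to normal mode
        self.cleared.append(real_code)
```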
Finally, according to the identifier of the alarm event, a repair function in the failure mode library is called to repair the alarm event.
Information on NAS alarm events can be displayed in the alarm interface of the GUI front end. The interface lists the error code, timestamp, status, description, object type, object ID, and object name of each current alarm event, and right-clicking an alarm event allows operations such as viewing its properties, clearing the log, and running a fix. For some alarms, a big-data background script is registered, and the automatic repair module is then called to locate and repair the fault automatically. In principle, the automatic repair module calls the repair routine in the failure mode library that corresponds to the alarm's identifier.
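The failure mode library can be pictured as a registry keyed by alarm identifier. In this minimal sketch, `FailureModeLibrary`, the alarm identifier `nfs_service_down`, and `restart_nfs` are hypothetical placeholders; a real repair routine would act on the NAS node instead of printing a message.

```python
from typing import Callable, Dict, Optional

RepairFn = Callable[[str], bool]  # takes a node name, returns True if repaired


class FailureModeLibrary:
    """Maps an alarm identifier to an automatic repair function (hypothetical registry)."""

    def __init__(self) -> None:
        self._repairs: Dict[str, RepairFn] = {}

    def register(self, alarm_id: str) -> Callable[[RepairFn], RepairFn]:
        """Decorator that registers a repair function under an alarm identifier."""
        def decorator(fn: RepairFn) -> RepairFn:
            self._repairs[alarm_id] = fn
            return fn
        return decorator

    def repair(self, alarm_id: str, node: str) -> Optional[bool]:
        """Run the registered repair; None means no automatic repair exists."""
        fn = self._repairs.get(alarm_id)
        return None if fn is None else fn(node)


library = FailureModeLibrary()


@library.register("nfs_service_down")  # hypothetical alarm identifier
def restart_nfs(node: str) -> bool:
    # Placeholder: a real repair would restart NFS on `node` and re-check its state.
    print(f"restarting NFS on {node}")
    return True
```

Returning `None` for unknown identifiers lets the caller fall back to the manual "run fix" operation in the GUI for alarms that have no automatic repair.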
It should be particularly noted that the steps in each embodiment of the foregoing method for reporting and repairing a network attached storage failure may be interchanged, replaced, added, or deleted. Methods for reporting and repairing a network attached storage failure obtained through such reasonable permutations and combinations therefore also fall within the protection scope of the present invention, and the protection scope of the present invention should not be limited to the embodiments.
Based on the above object, a second aspect of the embodiments of the present invention provides a system for reporting and repairing a network attached storage failure. As shown in Fig. 2, the system 200 includes the following modules: an acquisition module, configured to acquire the alarm information file of the network attached storage and fill alarm data information into the alarm information file; a judging module, configured to sequentially determine, according to the filled alarm data information, whether each alarm event triggers an alarm; a reporting module, configured to call, in response to an alarm event triggering an alarm, a reporting function to report the alarm event; and a repair module, configured to call, according to the identifier of the alarm event, a repair function in the failure mode library to repair the alarm event.
In some embodiments, the reporting module is configured to: activate, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and check whether any other manager has activated the error; and, in response to no other manager having activated the error, map the error code to the node's real error code and set an error flag.
In some embodiments, the system further comprises a clearing module configured to call, in response to an alarm event not triggering an alarm, a clearing function to clear the alarm event.
In some embodiments, the clearing module is further configured to: clear the error code information in the cache and determine whether the error code is a preset value; and, in response to the error code being the preset value, clear the current mode of the platform main process and set the platform main process to normal mode.
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executed by the processor to perform the following steps: S1, acquiring the alarm information file of the network attached storage and filling alarm data information into the alarm information file; S2, sequentially determining, according to the filled alarm data information, whether each alarm event triggers an alarm; S3, in response to an alarm event triggering an alarm, calling a reporting function to report the alarm event; and S4, calling, according to the identifier of the alarm event, a repair function in a failure mode library to repair the alarm event.
In some embodiments, calling the reporting function to report the alarm event includes: activating, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and checking whether any other manager has activated the error; and, in response to no other manager having activated the error, mapping the error code to the node's real error code and setting an error flag.
In some embodiments, the steps further comprise: in response to an alarm event not triggering an alarm, calling a clearing function to clear the alarm event.
In some embodiments, calling the clearing function to clear the alarm event comprises: clearing the error code information in the cache and determining whether the error code is a preset value; and, in response to the error code being the preset value, clearing the current mode of the platform main process and setting the platform main process to normal mode.
Fig. 3 is a schematic diagram of a hardware structure of an embodiment of the computer device for reporting and repairing the network attached storage failure according to the present invention.
Taking the device shown in fig. 3 as an example, the device includes a processor 301 and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or by other means; Fig. 3 takes a bus connection as an example.
The memory 302 is used as a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method for reporting and repairing a network attached storage failure in the embodiment of the present application. The processor 301 executes various functional applications and data processing of the server by running the nonvolatile software programs, instructions and modules stored in the memory 302, that is, a method for reporting and repairing a network attached storage failure is realized.
The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a method of network-attached storage failure reporting and repairing, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 302 optionally includes memory located remotely from processor 301, which may be connected to a local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Computer instructions 303 corresponding to one or more methods for reporting and repairing a network attached storage failure are stored in the memory 302, and when executed by the processor 301, perform the method for reporting and repairing a network attached storage failure in any of the above-described method embodiments.
Any embodiment of the computer device executing the method for reporting and repairing the network attached storage failure can achieve the same or similar effects as any corresponding method embodiment.
The invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the method for reporting and repairing a network attached storage failure.
Fig. 4 is a schematic diagram of an embodiment of a computer storage medium for reporting and repairing the network attached storage failure according to the present invention. Taking the computer storage medium as shown in fig. 4 as an example, the computer readable storage medium 401 stores a computer program 402 which, when executed by a processor, performs the method as described above.
Finally, it should be noted that, as one of ordinary skill in the art can appreciate, all or part of the processes in the methods of the foregoing embodiments may be implemented by instructing relevant hardware by a computer program, and the program of the method for reporting and repairing a network-attached storage failure may be stored in a computer-readable storage medium, and when executed, may include the processes of the foregoing embodiments of the methods. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for reporting and repairing a network attached storage fault, comprising the following steps:
obtaining an alarm information file of the network attached storage, and filling alarm data information into the alarm information file;
sequentially determining, according to the filled alarm data information, whether each alarm event triggers an alarm;
in response to an alarm event being able to trigger an alarm, calling a reporting function to report the alarm event; and
calling, according to the identifier indicating the occurrence of the alarm event, a repair function in a failure mode library to repair the alarm event.

2. The method according to claim 1, wherein calling the reporting function to report the alarm event comprises:
activating, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and checking whether any other manager has already activated the error; and
in response to no other manager having activated the error, mapping the error code to the real error code of the node and setting an error flag.

3. The method according to claim 1, further comprising:
in response to an alarm event not being able to trigger an alarm, calling a clearing function to clear the alarm event.

4. The method according to claim 3, wherein calling the clearing function to clear the alarm event comprises:
clearing the error code information in the cache, and determining whether the error code is a preset value; and
in response to the error code being the preset value, clearing the current mode of the platform main process and setting the platform main process to the normal mode.

5. A system for reporting and repairing a network attached storage fault, comprising:
an obtaining module configured to obtain an alarm information file of the network attached storage and fill alarm data information into the alarm information file;
a determining module configured to sequentially determine, according to the filled alarm data information, whether each alarm event triggers an alarm;
a reporting module configured to, in response to an alarm event being able to trigger an alarm, call a reporting function to report the alarm event; and
a repair module configured to call, according to the identifier indicating the occurrence of the alarm event, a repair function in a failure mode library to repair the alarm event.

6. The system according to claim 5, wherein the reporting module is configured to:
activate, in the manager corresponding to the alarm event, the error corresponding to the alarm event, and check whether any other manager has already activated the error; and
in response to no other manager having activated the error, map the error code to the real error code of the node and set an error flag.

7. The system according to claim 5, further comprising a clearing module configured to:
in response to an alarm event not being able to trigger an alarm, call a clearing function to clear the alarm event.

8. The system according to claim 7, wherein the clearing module is further configured to:
clear the error code information in the cache, and determine whether the error code is a preset value; and
in response to the error code being the preset value, clear the current mode of the platform main process and set the platform main process to the normal mode.

9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, wherein the instructions, when executed by the processor, implement the steps of the method according to any one of claims 1 to 4.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
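The claimed flow (claims 1 to 4, mirrored by the modules of claims 5 to 8) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patented implementation: the JSON layout of the alarm information file, the `code_map` table from local codes to node real error codes, the `PRESET_CODE` value, the mode names, and every class and function name are hypothetical; the patent only specifies the steps. The failure mode library is modeled as a dictionary from event identifier to repair function, which is one plausible reading of "calling a repair function according to the identifier".

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical "preset value" (claims 4 and 8); the patent does not disclose it.
PRESET_CODE = 0x1F


@dataclass
class AlarmEvent:
    name: str        # identifier used to look up the failure mode library
    manager: str     # manager that owns this alarm (claim 2)
    error_code: int  # local error code of the alarm
    triggered: bool  # result of evaluating the filled alarm data


@dataclass
class Platform:
    code_map: Dict[int, int]  # hypothetical local code -> node real error code
    mode: str = "normal"      # current mode of the platform main process
    active: Dict[str, Set[int]] = field(default_factory=dict)  # manager -> codes
    error_flags: Set[int] = field(default_factory=set)
    cache: Dict[str, int] = field(default_factory=dict)  # event name -> code

    def report(self, ev: AlarmEvent) -> bool:
        """Claim 2: activate the error; map and flag it only if no other manager has."""
        seen_elsewhere = any(
            ev.error_code in codes
            for mgr, codes in self.active.items()
            if mgr != ev.manager
        )
        self.active.setdefault(ev.manager, set()).add(ev.error_code)
        self.cache[ev.name] = ev.error_code
        if seen_elsewhere:
            return False  # another manager already activated this error
        self.error_flags.add(self.code_map.get(ev.error_code, ev.error_code))
        return True

    def clear(self, ev: AlarmEvent) -> None:
        """Claim 4: drop the cached code; a preset code resets the main-process mode."""
        code = self.cache.pop(ev.name, ev.error_code)
        if code == PRESET_CODE:
            self.mode = "normal"


def load_alarm_events(alarm_file_text: str) -> List[AlarmEvent]:
    """Parse a filled alarm information file (assumed here to be a JSON list)."""
    return [AlarmEvent(**item) for item in json.loads(alarm_file_text)]


def process_alarms(
    platform: Platform,
    events: List[AlarmEvent],
    failure_modes: Dict[str, Callable[[AlarmEvent], None]],
) -> List[str]:
    """Claims 1 and 3: judge each event in order, then report and repair, or clear."""
    repaired: List[str] = []
    for ev in events:
        if ev.triggered:
            platform.report(ev)
            repair = failure_modes.get(ev.name)  # look up by event identifier
            if repair is not None:
                repair(ev)
                repaired.append(ev.name)
        else:
            platform.clear(ev)
    return repaired


# Usage: one alarm that fires and one that does not.
alarm_file = json.dumps([
    {"name": "disk_offline", "manager": "disk_mgr", "error_code": 7, "triggered": True},
    {"name": "share_lost", "manager": "nas_mgr", "error_code": 0x1F, "triggered": False},
])
platform = Platform(code_map={7: 0x2007}, mode="maintenance")
library = {"disk_offline": lambda ev: print(f"repairing {ev.name}")}
repaired = process_alarms(platform, load_alarm_events(alarm_file), library)
print(repaired, platform.mode)
```

Running it prints `repairing disk_offline` and then `['disk_offline'] normal`: the triggered alarm is reported, flagged under its mapped node code, and repaired, while the non-triggered alarm carrying the (assumed) preset code returns the main process to normal mode. The cross-manager check in `report` is what prevents the same error from being flagged twice when two managers raise it.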
Application CN202111342238.9A (priority date 2021-11-12, filing date 2021-11-12): Method and device for reporting and repairing network additional storage faults. Status: Active, CN114116282B (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111342238.9A CN114116282B (en) 2021-11-12 2021-11-12 Method and device for reporting and repairing network additional storage faults

Publications (2)

Publication Number Publication Date
CN114116282A (en) 2022-03-01
CN114116282B CN114116282B (en) 2023-08-18

Family

ID=80379036

Country Status (1)

Country Link
CN (1) CN114116282B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339297A (en) * 2016-09-14 2017-01-18 Zhengzhou Yunhai Information Technology Co., Ltd. Method and system for warning failures of storage system in real time
CN108763038A (en) * 2018-08-08 2018-11-06 Ping An Technology (Shenzhen) Co., Ltd. Management method, device, computer equipment and the storage medium of alarm data
CN110688280A (en) * 2019-09-25 2020-01-14 China Construction Bank Corporation Management system, method, equipment and storage medium of alarm event
CN112035319A (en) * 2020-08-31 2020-12-04 Inspur Cloud Information Technology Co., Ltd. Monitoring alarm system for multi-path state
CN112131201A (en) * 2020-09-18 2020-12-25 Suzhou Inspur Intelligent Technology Co., Ltd. Method, system, equipment and medium for high availability of network additional storage
WO2021136247A1 (en) * 2019-12-31 2021-07-08 Huawei Technologies Co., Ltd. Alarm processing method and apparatus, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115473788A (en) * 2022-08-29 2022-12-13 Suzhou Inspur Intelligent Technology Co., Ltd. A storage alarm test method, device, equipment, and storage medium
CN115473788B (en) * 2022-08-29 2023-08-11 Suzhou Inspur Intelligent Technology Co., Ltd. A storage alarm test method, device, equipment, and storage medium
CN115842710A (en) * 2022-11-22 2023-03-24 Agricultural Bank of China Limited Service side data processing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN107241229B (en) A business monitoring method and device based on an interface testing tool
CN112631913B (en) Method, device, equipment and storage medium for monitoring operation faults of application program
CN114116282B (en) Method and device for reporting and repairing network additional storage faults
CN111680104B (en) Data synchronization method, device, computer equipment and readable storage medium
CN112261114A (en) Data backup system and method
CN110659186A (en) Alarm information reporting method and device
CN112231130B (en) Method, system, equipment and medium for positioning fault according to log
CN117667258A (en) Restarting method and device of embedded system, electronic equipment and readable storage medium
CN115333923B (en) Fault point tracing analysis method, device, equipment and medium
CN112764990A (en) Target process monitoring method and device and computer equipment
JP2012003651A (en) Virtualized environment motoring device, and monitoring method and program for the same
TWI518680B (en) Method for maintaining file system of computer system
EP4557682A1 (en) Method and apparatus for processing service alarm based on cdn, device, and medium
CN112068935A (en) Method, device and equipment for monitoring deployment of kubernets program
CN110502581A (en) Distributed database system monitoring method and device
CN106161087A (en) The network interface card error event collection method of a kind of linux system and system
CN110231921B (en) Log printing method, device, equipment and computer readable storage medium
CN110798347A (en) Service state detection method, device, equipment and storage medium
CN112817623B (en) Method and device for publishing application program, mobile terminal and readable storage medium
CN116662285A (en) Storage method and device of server log, storage medium and electronic device
CN115733740A (en) Log detection method and device, computer equipment and computer readable storage medium
CN114629786A (en) Log real-time analysis method, device, storage medium and system
CN119814529A (en) Fault alarm method, device, computer equipment and storage medium
CN115952006B (en) Resource leak detection method, system, device, server and storage medium
CN119862167A (en) File operation tracking method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant