CN112084097A - Disk warning method and device - Google Patents


Info

Publication number
CN112084097A
Authority
CN
China
Prior art keywords
information
disk
current
file
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011021727.XA
Other languages
Chinese (zh)
Inventor
王立伟
张翔
余冬玲
石一飞
范鹏
王洪
陈东平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202011021727.XA priority Critical patent/CN112084097A/en
Publication of CN112084097A publication Critical patent/CN112084097A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/32Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324Display of status information
    • G06F11/327Alarm or error message display

Abstract

The invention discloses a disk warning method comprising the following steps: acquiring the current-period log file of each disk controller and the state information of the disks connected to that disk controller, wherein the current-period log file comprises normal information and abnormal information; writing the abnormal information in each current-period log file, the state information corresponding to the abnormal information, the abnormal state information, and the normal information corresponding to the abnormal state information into the current inspection file according to a preset rule; comparing the information in the current inspection file with the information in the previous period's inspection file item by item until all information in the current inspection file has been compared; writing the information that fails the comparison into the current alarm file; and sending out alarm information based on the current alarm file. Implementing the embodiments of the specification ensures that every alarm sent corresponds to a fault that newly appeared during automatic inspection, so that fault information is not buried in a large number of repeated alarms and the fault discovery rate is improved.

Description

Disk warning method and device
Technical Field
The invention relates to the technical field of server fault monitoring and alarming, in particular to a disk alarming method and device.
Background
With the rapid development of data centers, disk-intensive servers increasingly exert the advantages of high capacity, low cost, flexible expansion and high reliability in the aspect of mass data storage. On hardware, the disk intensive server adopts an equipment redundancy design, provides a hot plugging technology, can replace disks, power supplies, fans and the like on line, and adopts an RAID mechanism to correspondingly protect databases, files, shared resource information and the like. When one disk fails, the server sends an alarm, only the failed disk needs to be replaced, and the disk array performs data verification and recovery through an RAID mechanism without influencing data reading and writing of the system.
In the prior art, the fault states of disk-intensive servers of some manufacturers and models cannot be collected by a hardware management platform through BMC or IPMI; instead, operation and maintenance engineers inspect them manually. There is no method at the operating-system level for discovering faults automatically and promptly or for giving early warning of impending faults, and no effective solution currently fills this gap in the alarm channel.
Because the number of the disks of the disk intensive server is large, the disk failure is a high-occurrence failure scene, and when the failure occurs, warning information needs to be obtained in time for processing, so that the risk of data loss caused by overlong failure state duration or untimely discovery is avoided. The current fault finding mode depends on manual inspection of an operation and maintenance engineer, excessive consumption of labor cost of the operation and maintenance engineer is caused, and the risk of missed fault detection, false fault detection or untimely fault finding exists.
Therefore, so that a disk fault is still discovered when a single alarm channel fails, a disk warning method and device are urgently needed: a disk fault alarm mechanism for disk-intensive servers that is designed at the operating-system level, collects disk state logs at regular intervals, predicts impending faults, and obtains fault alarm information promptly by monitoring the disk state periodically, so that operation and maintenance personnel can handle the fault.
Disclosure of Invention
In view of the foregoing problems in the prior art, an object of the present invention is to provide a disk warning method and device that inspect disk state automatically at the operating-system level and send alarms only for newly appearing faults.
In a first aspect, the present invention provides a disk alarm method, where the method is applied to a disk-intensive server, and the method includes:
traversing each disk controller in the server in the current period to acquire a log file in the current period of each disk controller and state information of a disk connected with the disk controller, wherein the log file in the current period comprises: normal information and abnormal information, the state information including: normal state information and abnormal state information;
writing abnormal information in each current period log file, the state information corresponding to the abnormal information, the abnormal state information and normal information corresponding to the abnormal state information into a current routing inspection file according to a preset rule;
comparing the information in the current inspection file with the information in the inspection file in the previous period one by one until all the information in the current inspection file is compared;
writing the information which is not passed through the comparison into the current alarm file;
and sending out alarm information based on the current alarm file so that a person subscribing the alarm information can obtain the abnormal information corresponding to the disk and/or the content of the abnormal state information corresponding to the disk controller.
In a second aspect, the present invention provides a disk warning device, including:
the information acquisition module is configured to perform traversal of each disk controller in the server in the current period to acquire a current period log file of each disk controller and state information of a disk connected with the disk controller, where the current period log file includes: normal information and abnormal information, the state information including: normal state information and abnormal state information;
the first writing module is configured to write the abnormal information in each current period log file, the state information corresponding to the abnormal information, the abnormal state information and the normal information corresponding to the abnormal state information into a current routing inspection file according to a preset rule;
the comparison module is configured to compare the information in the current inspection file with the information in the inspection file in the previous period one by one until the comparison of all the information in the current inspection file is completed;
the second writing module is configured to write the information which is not passed through the comparison into the current alarm file;
and the warning module is configured to execute sending out warning information based on the current warning file so that a person who subscribes the warning information can acquire the abnormal information corresponding to the disk and/or the content of the abnormal state information corresponding to the disk controller.
In a third aspect, the present invention provides a computer-readable storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the disk alarm method as described above.
In a fourth aspect, the present invention provides a disk warning device, including at least one processor, and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements the disk alert method as described above by executing the instructions stored by the memory.
The disk warning method and the disk warning device provided by the invention have the following beneficial effects:
Implementing the embodiments of this specification replaces manual inspection by operation and maintenance engineers, previously the only way to find faults and abnormalities on disk-intensive servers of some manufacturers and models, with automatic periodic inspection and alarming. This avoids the risk of missed, false, or late fault detection, saves operation and maintenance labor cost, and improves the efficiency of operation and maintenance work. In addition, every alarm sent corresponds to a fault that newly appeared during automatic inspection, so fault information is not buried in a large number of repeated alarms, operation and maintenance personnel no longer receive the same fault information repeatedly, new faults can be discovered and handled in time, and the fault discovery rate is improved.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description of the embodiment or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flow chart of a first disk alarm method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a second disk warning method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a third disk warning method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a fourth disk warning method according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a fifth disk warning method according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a sixth disk warning method according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a disk warning device according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a disk warning device according to an embodiment of the present invention.
Reference numerals: 110, information acquisition module; 120, first writing module; 130, comparison module; 140, second writing module; 150, alarm module.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
Redundant Array of Independent Disks (RAID) means "an array with redundancy capability made up of independent disks".
The disk array is a disk group with a large capacity formed by combining a plurality of disks, and the performance of the whole disk system is improved by utilizing the additive effect generated by providing data by the individual disks. With this technique, data is divided into a plurality of sectors, each of which is stored on a respective hard disk.
The disk array can also use parity checking: when any hard disk in the array fails, the data can still be read, and during reconstruction the data is recomputed and written to a new hard disk.
As shown in fig. 1, fig. 1 is a schematic flowchart of a first disk alarm method provided in an embodiment of the present invention, and the present invention provides a disk alarm method, where the method is applied to a disk intensive server, and the method includes:
S102, traversing each disk controller in the server in the current period to acquire the current-period log file of each disk controller and the state information of the disks connected to the disk controller, wherein the current-period log file comprises normal information and abnormal information, and the state information comprises normal state information and abnormal state information.
In a specific implementation, the server may have several disk controllers, each connected to several disks. A disk controller mainly consists of: a control logic circuit and a microprocessor connected to the computer system bus; a read/write data decoding and encoding circuit, which separates read data and compensates write data; a data error detection and correction circuit; a logic circuit that controls data transmission, serial-to-parallel conversion, formatting, and the like according to commands sent by the computer; a read-only memory that stores the basic input/output program of the disks; and a buffer for data exchange. The current-period log file records the commands the disk controller received and the result of executing each of them. The state information characterizes the state of the disks connected to the disk controller and may include normal state information and abnormal state information. Abnormal state information indicates that a disk is in an abnormal state, which may be a fault state or an unknown state. A fault state is a disk fault that can be identified; an unknown state is one in which the disk cannot work normally but the fault cannot be identified.
It is understood that traversing each disk controller may be implemented by a preset patrol script, and the patrol script may periodically execute traversing each disk controller.
S104, writing the abnormal information in the log file of each current period, the state information corresponding to the abnormal information, the abnormal state information and the normal information corresponding to the abnormal state information into a current inspection file according to a preset rule.
In a specific implementation process, the current patrol file may be a readable and writable file established before traversing each disk controller in the server in the current period. The abnormal information in the log file in the current period and the abnormal state information in the state information can be determined according to a keyword matching mode.
The abnormal information, the state information corresponding to the abnormal information, the abnormal state information, and the normal information corresponding to the abnormal state information may be written into the current inspection file in order of the occurrence times of the abnormal information and the abnormal state information.
Alternatively, these items may first be written into the current inspection file and then sorted by the occurrence times of the abnormal information and the abnormal state information.
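The keyword matching and time ordering described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the `Record` type, its fields, the three-keyword list, and the output line format are all assumptions made for the example, and the sketch keeps only the abnormal records rather than every category of information the method writes.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical, shortened keyword list used only for this sketch.
ABNORMAL_KEYWORDS = ("Failed", "Missing", "UBad")


@dataclass(frozen=True)
class Record:
    time: datetime    # occurrence time of the log line or state reading
    controller: str   # identifier of the disk controller (assumed field)
    text: str         # raw log line or state description


def is_abnormal(record, keywords=ABNORMAL_KEYWORDS):
    """Keyword matching: a record is abnormal if it contains any keyword."""
    return any(keyword in record.text for keyword in keywords)


def build_inspection_lines(records):
    """Keep the abnormal records and order them by occurrence time."""
    abnormal = sorted((r for r in records if is_abnormal(r)), key=lambda r: r.time)
    return [f"{r.time.isoformat()} {r.controller} {r.text}" for r in abnormal]
```

Sorting before writing corresponds to the first variant; writing first and sorting the file afterwards, as in the second variant, produces the same final order.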
S106, comparing the information in the current inspection file with the information in the inspection file in the previous period one by one until the comparison of all the information in the current inspection file is completed.
S108, writing the information that fails the comparison into the current alarm file.
In a specific implementation, each item in the current inspection file may be compared with the items in the previous period's inspection file by keyword matching, and each item that fails the comparison, that is, has no match in the previous period's file, is written into the current alarm file as soon as it is found, until all items in the current inspection file have been compared.
Alternatively, the items may be compared in the same way, and the items that fail the comparison are written into the current alarm file only after all items in the current inspection file have been compared.
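A minimal Python sketch of this comparison step follows. It assumes that each inspection entry is one line of text and that entries carry no per-period timestamp, so an unchanged fault produces an identical line in consecutive periods. Exact line equality stands in for the keyword matching mentioned above, and the function names are hypothetical.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a file, or an empty list if it is absent."""
    p = Path(path)
    if not p.exists():
        return []
    return [ln.strip() for ln in p.read_text(encoding="utf-8").splitlines() if ln.strip()]


def new_entries(current, previous):
    """Entries of the current period that have no match in the previous period."""
    seen = set(previous)
    return [line for line in current if line not in seen]


def update_alarm_file(current_path, previous_path, alarm_path):
    """Write the entries that fail the comparison into the alarm file."""
    fresh = new_entries(read_lines(current_path), read_lines(previous_path))
    Path(alarm_path).write_text("".join(line + "\n" for line in fresh), encoding="utf-8")
    return fresh
```

This sketch follows the second variant, in which the alarm file is written once after all entries have been compared. On the first run, when no previous inspection file exists, every abnormal entry is treated as new.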
It is understood that the current alarm file has a specific file name, for example check_media_disk_log. The file name of the current inspection file may be checkdisk.
S110, sending out alarm information based on the current alarm file, so that a person subscribing to the alarm information can obtain the content of the abnormal information corresponding to the disk and/or the abnormal state information corresponding to the disk controller.
In a specific implementation, the alarm information may be generated from the abnormal information or the abnormal state information in the current alarm file and pushed to the terminal devices that subscribe to it. Subscribers thereby obtain the abnormal information for the disk and/or the abnormal state information for the disk controller that occurred in the current period but not in the previous period.
Implementing the embodiments of this specification replaces manual inspection by operation and maintenance engineers, previously the only way to find faults and abnormalities on disk-intensive servers of some manufacturers and models, with automatic periodic inspection and alarming. This avoids the risk of missed, false, or late fault detection, saves operation and maintenance labor cost, and improves the efficiency of operation and maintenance work. In addition, every alarm sent corresponds to a fault that newly appeared during automatic inspection, so fault information is not buried in a large number of repeated alarms, operation and maintenance personnel no longer receive the same fault information repeatedly, new faults can be discovered and handled in time, and the fault discovery rate is improved.
On the basis of the foregoing embodiment, in an embodiment of this specification, as shown in fig. 2, fig. 2 is a schematic flowchart of a second disk alarm method provided in an embodiment of the present invention, and as shown in fig. 2, after traversing each disk controller in a server in a current period to obtain a log file of the current period of each disk controller and state information of a disk connected to the disk controller, the method includes:
S202, outputting the current period log file to a corresponding folder according to the communication address of the disk controller, and adding a current time tag to obtain a current period archive file.
In a specific implementation process, because the communication addresses of each disk controller are different, a corresponding number of archived files can be established according to the number of the communication addresses of the disk controllers, each archived file corresponds to one disk controller, the corresponding current-period log file can be output to the archived file in the corresponding folder according to the communication addresses of the disk controllers, and the current-period archived file can be obtained by adding the current time tag. Each current period archive file comprises an execution command and an execution condition of the corresponding disk controller.
S204, recording the state information of the disks corresponding to the disk controller, according to a preset category, into a file corresponding to the current period archive file.
In a specific implementation process, the preset category may be set by the state information, and the preset category may include: abnormal, unknown and normal.
Illustratively, the number of the disk controllers is a, each disk controller is connected with b disks, so that a folders are established, each folder is filed with a corresponding current period filed file and b corresponding state information, and each state information is stored in a corresponding class of file.
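The archiving layout can be illustrated with a short Python sketch. The folder and file naming, the timestamp format, and the shape of `disk_states` (disk name mapped to a category and a raw state) are assumptions for the example. The controller address is assumed to be usable as a folder name.

```python
from datetime import datetime
from pathlib import Path

CATEGORIES = ("abnormal", "unknown", "normal")  # the preset categories


def archive_controller_log(root, controller_addr, log_text, disk_states, now=None):
    """Archive one controller's log and its disks' states under a time tag.

    disk_states maps a disk name to a (category, state) pair.
    """
    now = now or datetime.now()
    folder = Path(root) / controller_addr          # one folder per controller
    folder.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%d%H%M%S")           # current time tag
    archive = folder / f"log_{stamp}.txt"
    archive.write_text(log_text, encoding="utf-8")
    for category in CATEGORIES:                    # one state file per category
        entries = [
            f"{disk} {state}"
            for disk, (cat, state) in sorted(disk_states.items())
            if cat == category
        ]
        (folder / f"{category}_{stamp}.txt").write_text(
            "".join(entry + "\n" for entry in entries), encoding="utf-8"
        )
    return archive
```

With a controllers, calling this function once per controller in each period yields a folders, each holding the time-tagged log and the state files for that controller's b disks.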
In the implementation of the embodiment of the present specification, through traversing the states of the disk controllers and the connected disks, the log files of the current period of each disk controller and the state information of the connected disks are collected, and the collected information is stored and filed, so that the operation condition of the full life cycle of the disk intensive server can be comprehensively grasped, and the management of the full life cycle of the disk intensive server is realized. Meanwhile, the value of the data can be exerted by keeping the historical records of the operation and maintenance data, and data support is provided for follow-up fault statistics and analysis and improvement of intelligent management and operation and maintenance capabilities.
On the basis of the foregoing embodiment, in an embodiment of this specification, fig. 3 is a flowchart illustrating a third disk warning method according to an embodiment of the present invention, and as shown in fig. 3, after sending warning information based on the current warning file, the method includes:
S302, emptying the information in the current alarm file.
In a specific implementation process, after the alarm information is sent out, the information in the current alarm file can be emptied.
The implementation of the embodiment of the specification can effectively avoid the situation that the current alarm file has too much information and abnormal state information or abnormal information outside the current period exists, and the reliability of the embodiment of the specification is improved.
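As a rough Python sketch of this step, the alarm file can be truncated once every entry has been handed to the sender. The `send` callback is a hypothetical stand-in for whatever push channel is used.

```python
from pathlib import Path


def send_and_clear(alarm_path, send):
    """Send every entry in the alarm file, then empty the file."""
    path = Path(alarm_path)
    lines = []
    if path.exists():
        lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    for line in lines:
        send(line)                         # push one alarm to subscribers
    path.write_text("", encoding="utf-8")  # nothing is carried into the next period
    return len(lines)
```

Because the file is emptied only after the loop finishes, an exception raised by `send` leaves the entries in place for a later retry.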
On the basis of the foregoing embodiment, in an embodiment of the present specification, fig. 4 is a schematic flow chart of a fourth disk alarm method provided in the embodiment of the present invention, and as shown in fig. 4, the disk includes a virtual disk group and a physical disk group;
the traversing each disk controller in the server in the current period to acquire the log file of the current period of each disk controller and the state information of the disk connected with the disk controller comprises:
S302, obtaining the disk topology information on the disk array, and obtaining node information of different physical disk groups, physical slot position information of each physical disk group, and the correspondence between each virtual disk group and the physical disk groups.
In a specific implementation process, because the disk array is provided with a plurality of physical disk groups and a plurality of virtual disk groups, before traversing the disk controller, communication connection should be established with the corresponding physical disk groups and virtual disk groups, different communication addresses are allocated, and corresponding address information is allocated to a plurality of node information in the server.
S304, acquiring the current period log file of each disk controller and the working condition information of the virtual disk group and the physical disk group connected with the disk controllers based on the disk topology information, wherein the working condition information comprises: state information, node information, and physical slot position information.
In a specific implementation process, the current period log file of each disk controller, and the working condition information of the virtual disk group and the physical disk group connected to the disk controller may be acquired based on the disk topology information, where the working condition information includes: state information, node information, and physical slot position information.
Correspondingly, writing the abnormal information in each current period log file and the state information corresponding to the abnormal information, the abnormal state information and the normal information corresponding to the abnormal state information into a current routing inspection file according to a preset rule comprises:
and writing the abnormal information in each current period log file, the working condition information corresponding to the abnormal information, the working condition information containing the abnormal state information and the normal information corresponding to the abnormal state information into a current routing inspection file according to a preset rule.
In a specific implementation, the inspection script may call the storcli64 tool to manage and acquire the hardware topology information on the RAID card, find the physical slot position of each disk, acquire the physical disks belonging to each virtual disk group (VD), and acquire the state information of the physical disk groups and of the virtual disk groups. When a fault or an abnormality occurs, the abnormal state information of the physical disk group, the machine node information, the physical slot position information of the faulty disk, and the like are recorded into the current inspection file check_megaraid_disk_log.
Implementing the embodiments of this specification addresses an operation and maintenance pain point: for some manufacturers and models, the running state of the disks cannot be managed effectively through a unified hardware management platform. The hardware state information of the server is instead acquired at the operating-system layer, which avoids the risk that faults are not discovered and handled in time when a single alarm channel fails, and protects the integrity of the data.
On the basis of the foregoing embodiment, in an embodiment of this specification, fig. 5 is a schematic flowchart of a fifth disk warning method provided in an embodiment of the present invention, and as shown in fig. 5, sending warning information based on the current warning file includes:
S502, comparing the information in the current alarm file with the abnormal keywords in the preset abnormal keyword list.
In a specific implementation, the abnormal keyword list may include the following keywords: Dgrd, Pdgd, OfLn, UBad, Failed, Missing, Offln, uncorrectable errors, uncorrectable medium errors, uncorrectable double medium errors, Background information analysis Failed, systematic Check completed with uncorrectable data, control Read imputing block, rebuilt Failed, Unable access device, Bad block table, Controller calculated a surface error, and the like, each keyword corresponding to an exception category. It will be appreciated that the keywords may also be other character strings.
S504, one or more disk forewarning messages matched with the abnormal keywords are obtained, and the forewarning messages comprise: exception information and/or exception status information.
S506, analyzing the current routing inspection file to obtain physical slot position information of the abnormal disk corresponding to the one or more disk pre-warning information respectively;
S508, sending the alarm information according to the physical slot position information and the early-warning information.
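Steps S502 to S508 can be sketched in Python as follows. The `key=value` line format, the `disk` and `slot` field names, and the shortened keyword list are assumptions made for the example; the actual file layout is not specified here.

```python
import re

# Shortened keyword list; matching is a case-sensitive substring test.
ABNORMAL_KEYWORDS = ("Dgrd", "Pdgd", "OfLn", "UBad", "Failed", "Missing")
FIELD = re.compile(r"(\w+)=(\S+)")  # hypothetical key=value line format


def parse_fields(line):
    """Parse 'disk=sdc slot=252:3 state=Failed' into a dictionary."""
    return dict(FIELD.findall(line))


def build_alarms(alarm_lines, inspection_lines, keywords=ABNORMAL_KEYWORDS):
    """Match keywords (S502, S504), look up slots (S506), format alarms (S508)."""
    slots = {
        fields["disk"]: fields["slot"]
        for fields in map(parse_fields, inspection_lines)
        if "disk" in fields and "slot" in fields
    }
    alarms = []
    for line in alarm_lines:
        keyword = next((k for k in keywords if k in line), None)
        if keyword is None:
            continue  # no abnormal keyword matched, so no early warning
        disk = parse_fields(line).get("disk", "unknown")
        alarms.append(f"[{keyword}] disk {disk} at slot {slots.get(disk, 'unknown')}: {line}")
    return alarms
```

For example, an alarm line `disk=sdc state=Failed` combined with an inspection line `disk=sdc slot=252:3 state=Failed` yields one alarm that names slot `252:3`.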
The implementation of the embodiment of the description can comprehensively master the operation condition of the full life cycle of the disk-intensive server by carrying out the historical record of the routing inspection log on the state of the server, thereby realizing the management of the full life cycle. Meanwhile, the value of the data can be exerted by keeping the historical records of the operation and maintenance data, and data support is provided for follow-up fault statistics and analysis and improvement of intelligent management and operation and maintenance capabilities.
On the basis of the above embodiment, in an embodiment of the present specification, the alarm information includes one or more of the following: an exception description, the time at which the abnormal state information occurred, the name of the abnormal disk, the physical slot position information, and a suggested solution.
In a specific implementation, the description of the anomaly may be as follows:
For the virtual disk group (VD), there may be three kinds of state information, namely "err", "unknown", and "ok", where err denotes a fault/abnormal state, unknown denotes an unknown state, and ok denotes a normal state. The abnormal states of a virtual disk group include the following:
Dgrd: the virtual disk group is in the "Dgrd" state, and it is necessary to check whether a member disk of the RAID group is abnormal;
Pdgd: the virtual disk group is in the "Pdgd" state, and it is necessary to check whether a member disk of the RAID group is abnormal;
OfLn: the virtual disk group is in the "OfLn" state and its data is unavailable; it is necessary to check whether multiple disks have failed.
For the physical disk group (PD), there may be three kinds of state information, namely "err", "unknown", and "ok", where err denotes a fault/abnormal state, unknown denotes an unknown state, and ok denotes a normal state. The abnormal states of a physical disk include the following:
UBad: the disk is in the "UBad" state, and it is necessary to check whether the individual disk is faulty;
Failed: the disk is in the "Failed" state and needs to be replaced;
Missing: the disk is in the "Missing" state, and it is necessary to check whether the disk has failed or has been pulled out;
Offln: the disk is in the "Offln" state, and it is necessary to check whether the disk was previously configured for RAID.
The disk controller has the following abnormal states:
uncorrectable errors: the background initialization completed but uncorrectable errors exist, and the state of the physical disk needs to be checked;
Unrecoverable medium error: an unrecoverable medium error has occurred, and the state of the physical disk needs to be checked;
uncorrectable double medium errors: uncorrectable double medium errors have occurred, and the state of the physical disk needs to be checked;
Background Initialization failed: the background initialization failed, and the state of the physical disk needs to be checked;
Consistency Check completed with uncorrectable data: the consistency check completed, but uncorrectable data exists and needs to be repaired; the problem is not necessarily hardware-related, but the disk should be checked for faults at the same time;
Patrol Read encountering bad block: a patrol read encountered a bad block, and the state of the disk needs to be checked;
Rebuild failed: the rebuild failed because of a disk error, and it is necessary to check whether the disk state is "Failed" or "Missing";
Unable to access device: the disk cannot be accessed, and the state of the disk needs to be checked to determine whether there is a fault;
Bad block table: the disk's bad block table is full; check whether the disk state is "Failed" and whether the bad block table in the disk's SMART information is full;
Controller encountered a fatal error and was reset: the disk controller has failed, and the state of the disk controller needs to be checked.
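The pairing of each abnormal state with a check or remedy, together with the alarm fields listed earlier (occurrence time, disk name, slot, suggested solution), could be held in a simple lookup table. The sketch below is illustrative only: the table covers a few of the states above, and the message layout, the function name `format_alarm`, and the fallback suggestion are assumptions rather than anything the specification prescribes.

```python
from datetime import datetime

# Suggested actions for a few of the abnormal states described above.
SUGGESTIONS = {
    "Dgrd": "Check whether a RAID group member disk is abnormal.",
    "OfLn": "Data is unavailable; check whether multiple disks have failed.",
    "Failed": "Replace the disk.",
    "Missing": "Check whether the disk has failed or has been pulled out.",
}

# Hypothetical fallback used when a state has no dedicated entry.
DEFAULT_SUGGESTION = "Check the state of the physical disk."


def format_alarm(keyword, disk, slot, when):
    """Build one alarm line with time, disk name, slot, state and suggestion."""
    suggestion = SUGGESTIONS.get(keyword, DEFAULT_SUGGESTION)
    return (
        f"[{when:%Y-%m-%d %H:%M:%S}] disk={disk} slot={slot} "
        f"state={keyword}: {suggestion}"
    )
```

Keeping the suggestions in data rather than in code means a new keyword can be supported by adding one table entry.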
On the basis of the foregoing embodiment, in an embodiment of this specification, FIG. 6 is a schematic flowchart of a sixth disk warning method provided in an embodiment of the present invention. As shown in FIG. 6, the method further includes:
S702, replacing the information in the inspection file of the previous period with the information in the current inspection file, and clearing the information in the current inspection file.
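Step S702 amounts to rotating two files. A minimal sketch, assuming both inspection files are plain UTF-8 text files (the function name and the file paths are hypothetical):

```python
from pathlib import Path


def rotate_inspection(current: Path, previous: Path) -> None:
    """S702: overwrite the previous-period file with the current one, then empty the current file."""
    previous.write_text(current.read_text(encoding="utf-8"), encoding="utf-8")
    # Truncate rather than delete, so the next period can write to the same path.
    current.write_text("", encoding="utf-8")
```

After the rotation, the next period's comparison runs against exactly what was inspected in this period.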
Implementing this embodiment of the specification can provide the computing capacity needed for generating the alarm information and helps ensure that the generated alarm information is accurate.
In another aspect, an embodiment of the present disclosure provides a disk warning device. FIG. 7 is a schematic structural diagram of a disk warning device provided in an embodiment of the present disclosure. As shown in FIG. 7, the device includes:
the information obtaining module 110 is configured to perform traversal of each disk controller in the server in the current period to obtain a log file of the current period of each disk controller and status information of a disk connected to the disk controller, where the log file of the current period includes: normal information and abnormal information, the state information including: normal state information and abnormal state information;
a first writing module 120, configured to write the abnormal information in each current period log file and the state information corresponding to the abnormal information, the abnormal state information, and the normal information corresponding to the abnormal state information into a current inspection file according to a preset rule;
the comparison module 130 is configured to compare the information in the current inspection file with the information in the inspection file in the previous period one by one until the comparison of all the information in the current inspection file is completed;
a second writing module 140 configured to write the information that fails to pass the comparison into the current alarm file;
and the alarm module 150 is configured to send out alarm information based on the current alarm file, so that a person who subscribes to the alarm information can learn the content of the abnormal information corresponding to the disk and/or the abnormal state information corresponding to the disk controller.
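The work of the comparison module 130 and the second writing module 140 can be sketched as a set difference. The specification does not define what it means for an entry to fail the comparison; the sketch below assumes an entry fails when it is absent from the previous period's inspection file, so that only newly appearing abnormalities reach the alarm file. The function name is hypothetical.

```python
def failed_comparison(current_lines, previous_lines):
    """Return the current-period entries that have no identical entry in the previous period."""
    previous = {line.strip() for line in previous_lines}
    # Preserve the order of the current inspection file in the result.
    return [line for line in current_lines if line.strip() not in previous]
```

Under this reading, an abnormality that persists across periods is reported once rather than in every period.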
On the basis of the above embodiments, in an embodiment of the present specification, the device further includes:
the first filing module is configured to output the current period log file to a corresponding folder according to the communication address of the disk controller and add a current time tag to obtain a current period filing file;
and the second filing module is configured to record the state information of the disk corresponding to the disk controller into a file corresponding to the current period filing file according to a preset category.
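The two filing modules can be sketched as follows. The folder layout (one folder per controller communication address), the `log_<timestamp>.txt` naming scheme, and the per-category file suffix are all assumptions chosen for illustration.

```python
from datetime import datetime
from pathlib import Path


def archive_log(root: Path, controller_addr: str, log_text: str, now: datetime) -> Path:
    """First filing module: store the log under the controller's folder with a time tag."""
    folder = root / controller_addr
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"log_{now:%Y%m%d_%H%M%S}.txt"
    target.write_text(log_text, encoding="utf-8")
    return target


def record_states(archive: Path, states_by_category: dict) -> list:
    """Second filing module: write one state file per preset category next to the archive."""
    written = []
    for category, lines in states_by_category.items():
        target = archive.with_name(f"{archive.stem}_{category}.txt")
        target.write_text("\n".join(lines), encoding="utf-8")
        written.append(target)
    return written
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the log archive and its state files tied to the same time tag.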
On the basis of the above embodiments, in an embodiment of the present specification, the device further includes:
and the first clearing module is configured to clear the information in the current alarm file.
On the basis of the above embodiment, in an embodiment of the present specification, the information obtaining module 110 includes:
the topological relation acquisition unit is configured to execute acquisition of disk topological information on the disk array, and obtain node information of different physical disk groups, physical slot position information of each physical disk group and a corresponding relation between each virtual disk group and the physical disk group;
a working condition information obtaining unit configured to perform obtaining of a current period log file of each disk controller and working condition information of a virtual disk group and a physical disk group connected to the disk controllers based on the disk topology information, where the working condition information includes: state information, node information and physical slot position information;
correspondingly, the first writing module comprises:
and the writing unit is configured to write the abnormal information in each current-period log file, the working condition information corresponding to the abnormal information, the working condition information containing the abnormal state information, and the normal information corresponding to the abnormal state information into the current inspection file according to a preset rule.
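The disk topology and working condition information handled by these units could be modelled with two small record types. This is a sketch under assumptions: the class and field names are hypothetical, and "Onln" is used as an example of a normal physical disk state even though the specification does not name one.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDisk:
    name: str
    node: str    # node information of the physical disk group
    slot: str    # physical slot position information
    state: str   # state information, e.g. "Failed"


@dataclass
class VirtualDiskGroup:
    name: str
    state: str
    members: list = field(default_factory=list)  # physical disks backing this group


def abnormal_members(vd, ok_states=("Onln",)):
    """Return the member disks whose state is not in the assumed set of normal states."""
    return [pd for pd in vd.members if pd.state not in ok_states]
```

Keeping the virtual-to-physical correspondence in the record lets a degraded virtual disk group be traced directly to the slot of the faulty member disk.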
On the basis of the above embodiments, in an embodiment of the present specification, the alarm module 150 includes:
the comparison unit is configured to compare the information in the current alarm file with the abnormal keywords in a preset abnormal keyword list;
an early-warning information obtaining unit configured to obtain one or more pieces of disk early-warning information matched with the abnormal keywords, the early-warning information including: abnormal information and/or abnormal state information;
the analysis unit is configured to parse the current inspection file and obtain the physical slot position information of the abnormal disk corresponding to each of the one or more pieces of disk early-warning information;
and the warning unit is configured to send out the alarm information according to the physical slot position information and the early-warning information.
On the basis of the above embodiments, in an embodiment of the present specification, the device further includes:
and the second clearing module is configured to replace the information in the inspection file in the previous period with the information in the current inspection file and clear the information in the current inspection file.
In another aspect, the present specification provides a computer readable storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement a disk alarm method as described above.
In another aspect, an embodiment of the present specification provides a disk warning apparatus. FIG. 8 is a schematic structural diagram of a disk warning apparatus according to an embodiment of the present invention. As shown in FIG. 8, the apparatus includes at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements the disk alarm method described above by executing the instructions stored in the memory.
Since the technical effects of the disk warning device, the computer-readable storage medium, and the disk warning apparatus are the same as those of the disk warning method, they are not described herein again.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The implementation principle and the generated technical effect of the testing method provided by the embodiment of the invention are the same as those of the system embodiment, and for the sake of brief description, the corresponding contents in the system embodiment can be referred to where the method embodiment is not mentioned.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above functions, if implemented in the form of software functional units and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium, which includes instructions for causing a computer (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A disk alarm method, applied to a disk-intensive server, comprising:
traversing each disk controller in the server in the current period to acquire a log file in the current period of each disk controller and state information of a disk connected with the disk controller, wherein the log file in the current period comprises: normal information and abnormal information, the state information including: normal state information and abnormal state information;
writing the abnormal information in each current period log file, the state information corresponding to the abnormal information, the abnormal state information and the normal information corresponding to the abnormal state information into a current inspection file according to a preset rule;
comparing the information in the current inspection file with the information in the inspection file in the previous period one by one until all the information in the current inspection file is compared;
writing the information that fails the comparison into a current alarm file;
and sending out alarm information based on the current alarm file, so that personnel subscribing to the alarm information can learn the content of the abnormal information corresponding to the disk and/or the abnormal state information corresponding to the disk controller.
2. The method of claim 1, wherein traversing each disk controller in the server in the current period to obtain the current period log file of each disk controller and the state information of the disk connected to the disk controller comprises:
outputting the current period log file to a corresponding folder according to the communication address of the disk controller, and adding a current time tag to obtain a current period filing file;
and recording the state information of the disk corresponding to the disk controller into a file corresponding to the current period archived file according to a preset category.
3. The method of claim 1, further comprising, after sending out alarm information based on the current alarm file:
and clearing the information in the current alarm file.
4. The method of claim 1, the disks comprising a virtual disk group and a physical disk group;
the traversing each disk controller in the server in the current period to acquire the log file of the current period of each disk controller and the state information of the disk connected with the disk controller comprises:
acquiring disk topology information on a disk array to obtain node information of different physical disk groups, physical slot position information of each physical disk group and a corresponding relation between each virtual disk group and the physical disk group;
acquiring a current period log file of each disk controller and working condition information of a virtual disk group and a physical disk group connected with the disk controllers based on the disk topology information, wherein the working condition information comprises: state information, node information and physical slot position information;
correspondingly, writing the abnormal information in each current period log file and the state information corresponding to the abnormal information, the abnormal state information and the normal information corresponding to the abnormal state information into a current inspection file according to a preset rule comprises:
and writing the abnormal information in each current period log file, the working condition information corresponding to the abnormal information, the working condition information containing the abnormal state information and the normal information corresponding to the abnormal state information into a current inspection file according to a preset rule.
5. The method of claim 4, wherein sending out alarm information based on the current alarm file comprises:
comparing the information in the current alarm file with the abnormal keywords in a preset abnormal keyword list;
acquiring one or more pieces of disk early-warning information matched with the abnormal keywords, wherein the early-warning information comprises: abnormal information and/or abnormal state information;
parsing the current inspection file to acquire the physical slot position information of the abnormal disk corresponding to each of the one or more pieces of disk early-warning information;
and sending out the alarm information according to the physical slot position information and the early-warning information.
6. The method of claim 5, wherein the alarm information comprises one or more of the following: an exception description, the time at which the abnormal state information occurred, the name of the abnormal disk, the physical slot position information, and a suggested solution.
7. The method of claim 5, further comprising:
replacing the information in the inspection file of the previous period with the information in the current inspection file, and clearing the information in the current inspection file.
8. A disk warning device comprising:
an information obtaining module (110) configured to perform traversal of each disk controller in a current-period server to obtain a current-period log file of each disk controller and status information of a disk connected to the disk controller, where the current-period log file includes: normal information and abnormal information, the state information including: normal state information and abnormal state information;
a first writing module (120) configured to write the abnormal information in each current period log file and the state information corresponding to the abnormal information, the abnormal state information and the normal information corresponding to the abnormal state information into a current inspection file according to a preset rule;
the comparison module (130) is configured to compare the information in the current inspection file with the information in the inspection file in the previous period one by one until the comparison of all the information in the current inspection file is completed;
a second writing module (140) configured to write the information that fails to pass the comparison into the current alarm file;
and the alarm module (150) is configured to execute sending out alarm information based on the current alarm file so that a person subscribing to the alarm information can know the content of the abnormal information corresponding to the disk and/or the abnormal state information corresponding to the disk controller.
9. A computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the disk alert method as claimed in any one of claims 1 to 7.
10. A disk alert device includes at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the disk alert method as claimed in any one of claims 1 to 7 by executing the instructions stored by the memory.
CN202011021727.XA 2020-09-25 2020-09-25 Disk warning method and device Pending CN112084097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011021727.XA CN112084097A (en) 2020-09-25 2020-09-25 Disk warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011021727.XA CN112084097A (en) 2020-09-25 2020-09-25 Disk warning method and device

Publications (1)

Publication Number Publication Date
CN112084097A true CN112084097A (en) 2020-12-15

Family

ID=73739903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011021727.XA Pending CN112084097A (en) 2020-09-25 2020-09-25 Disk warning method and device

Country Status (1)

Country Link
CN (1) CN112084097A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114428709A (en) * 2022-01-17 2022-05-03 广州鲁邦通物联网科技股份有限公司 SDS state detection method and system in cloud management platform

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106681930A (en) * 2017-01-23 2017-05-17 北京思特奇信息技术股份有限公司 Distributed automatic application operation abnormity detecting method and system
CN107423194A (en) * 2017-06-30 2017-12-01 阿里巴巴集团控股有限公司 Front end abnormality alarming processing method, apparatus and system
CN108737170A (en) * 2018-05-09 2018-11-02 中国银行股份有限公司 A kind of batch daily record abnormal data alarm method and device
CN109684141A (en) * 2018-12-19 2019-04-26 郑州云海信息技术有限公司 A kind of disk failure diagnostic method, device, terminal and readable storage medium storing program for executing
CN110187997A (en) * 2019-06-06 2019-08-30 深信服科技股份有限公司 A kind of disk method for monitoring abnormality, device, equipment and medium
US10467075B1 (en) * 2015-11-19 2019-11-05 American Megatrends International, Llc Systems, devices and methods for predicting disk failure and minimizing data loss
CN110442495A (en) * 2019-07-30 2019-11-12 杭州安恒信息技术股份有限公司 The method for automating cruising inspection system exception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination