CN103207820A - Method and device for fault positioning of hard disk on basis of raid card log - Google Patents


Info

Publication number
CN103207820A
Authority
CN
China
Prior art keywords
hard disk
raid card
state
log
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100460087A
Other languages
Chinese (zh)
Other versions
CN103207820B (en)
Inventor
刘亮
王雁鹏
王晓静
魏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310046008.7A
Publication of CN103207820A
Application granted
Publication of CN103207820B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method for locating hard disk faults based on the raid card log. The method includes the following steps: the raid card pushes its log to an asynchronous event processing engine; a monitoring tool analyzes the current state of the hard disk and, if a logical disk is in the degraded or offline state, determines that the hard disk has failed; the engine analyzes the log to obtain log information related to disk drops and pushes that information to the memory of a server to generate a local raid card log; the monitoring tool extracts multiple state-change event records of the physical disk to obtain the final state of the hard disk; and the current state of the hard disk is compared with its final state, and if the two do not match, the physical disk is determined to have dropped out. The method can achieve full coverage of hard disk runtime fault detection, substantially improves the accuracy of hard disk monitoring and detection, and improves the operation and maintenance efficiency of the server. The invention further provides a device for locating hard disk faults based on the raid card log.

Description

Method and device for locating hard disk faults based on the raid card log
Technical field
The present invention relates to the technical field of information storage, and in particular to a method and device for locating hard disk faults based on the raid card log.
Background
For fault detection of hard disks attached to the LSI (Large-Scale Integration) type raid (Redundant Arrays of Inexpensive Disks, disk array) cards used in enterprise servers, the prior art uses libraries or tools provided by the raid card manufacturer to read the state and failure counts of each hard disk/SSD (Solid State Disk) under the raid card. When a disk state is abnormal, or a failure count exceeds a threshold, a fault alarm is triggered. However, when a hard disk/SSD fails suddenly so that the raid card system can no longer identify it, the raid card controller kicks that disk out of the raid array and no longer records any state or fault information about it. As a result, existing techniques fail to report the fault of a hard disk that has physically dropped out.
Summary of the invention
The present invention aims to solve at least one of the technical problems described above.
To this end, one object of the present invention is to propose a hard disk fault locating method based on the raid card log that achieves fuller coverage of hard disk runtime fault detection, substantially improves the accuracy of hard disk monitoring and detection, and improves server operation and maintenance efficiency.
Another object of the present invention is to propose a hard disk fault locating device based on the raid card log.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a hard disk fault locating method based on the raid card log, wherein an asynchronous real-time push interface is provided between the disk array (raid) card and the server, and an asynchronous event processing engine is provided in the server. The hard disk fault locating method comprises the following steps: the raid card pushes the raid card log in real time to the asynchronous event processing engine through the asynchronous real-time push interface; a monitoring tool analyzes the current state of the hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, determines that the hard disk has failed; when the hard disk is determined to have failed, the asynchronous event processing engine analyzes the raid card log to obtain log information related to disk drops, and pushes that log information to the memory of the server to generate a local raid card log; the monitoring tool extracts multiple state-change event records of the physical disk from the local raid card log, and obtains the final state of the hard disk from those records; and the monitoring tool compares the current state of the hard disk with its final state and, if the two do not match, determines that the physical disk of the hard disk has dropped out.
The hard disk fault locating method based on the raid card log according to the embodiment of the present invention combines the current running health information of the hard disk with analysis of the raid card log. It achieves fuller coverage of hard disk runtime fault detection, substantially improves the accuracy of hard disk monitoring and detection, and improves the operation and maintenance efficiency of the server.
In addition, the hard disk fault locating method based on the raid card log according to the above embodiment of the present invention may also have the following additional technical features:
In an embodiment of the present invention, if the current state of the hard disk matches its final state, the hard disk is determined to have failed.
In an embodiment of the present invention, after obtaining the log information related to disk drops, the asynchronous event processing engine further performs the following step: formatting the log information related to disk drops and pushing the formatted log information to the memory of the server.
In an embodiment of the present invention, the state transitions of the hard disk recorded in the state-change event records include: a transition from the normal state to a fault state, a transition from a fault state to the normal state, and a transition from a fault state to an abnormal state.
In an embodiment of the present invention, obtaining the final state of the hard disk from the multiple state-change event records comprises the following step: analyzing the times of the multiple state-change event records, taking the record with the latest time, and obtaining the final state of the hard disk from it.
An embodiment of the second aspect of the present invention further proposes a hard disk fault locating device based on the raid card log, comprising: a monitoring tool, a raid card, a server, and an asynchronous real-time push interface. The asynchronous real-time push interface is located between the raid card and the server, and the raid card is configured to push the raid card log in real time to the server through the asynchronous real-time push interface. The server comprises an asynchronous event processing engine, which is configured to receive the raid card log through the asynchronous real-time push interface and, when the hard disk fails, to analyze the raid card log to obtain log information related to disk drops and push that log information to the memory of the server to generate a local raid card log. The monitoring tool is configured to analyze the current state of the hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, to determine that the hard disk has failed; to extract multiple state-change event records of the physical disk from the local raid card log and obtain the final state of the hard disk from those records; and to compare the current state of the hard disk with its final state and, if the two do not match, to determine that the physical disk of the hard disk has dropped out.
The hard disk fault locating device based on the raid card log according to the embodiment of the present invention combines the current running health information of the hard disk with analysis of the raid card log. It achieves fuller coverage of hard disk runtime fault detection, substantially improves the accuracy of hard disk monitoring and detection, and improves the operation and maintenance efficiency of the server.
In addition, the hard disk fault locating device based on the raid card log according to the above embodiment of the present invention may also have the following additional technical features:
In an embodiment of the present invention, the monitoring tool determines that the hard disk has failed when it finds that the current state of the hard disk matches its final state.
In an embodiment of the present invention, the asynchronous event processing engine is further configured to format the log information related to disk drops and push the formatted log information to the memory of the server.
In an embodiment of the present invention, the state transitions of the hard disk recorded in the state-change event records include: a transition from the normal state to a fault state, a transition from a fault state to the normal state, and a transition from a fault state to an abnormal state.
In an embodiment of the present invention, the monitoring tool analyzes the times of the multiple state-change event records, takes the record with the latest time, and obtains the final state of the hard disk from it.
Additional aspects and advantages of the present invention are given in part in the following description; in part they will become apparent from the description, or be learned through practice of the present invention.
Description of drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a hard disk fault locating method based on the raid card log according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of raid card asynchronous event pushing in a hard disk fault locating method based on the raid card log according to another embodiment of the present invention;
Fig. 3 shows the raid card asynchronous event push architecture of a hard disk fault locating method based on the raid card log according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a physical disk state-change record in the raid card log, for a hard disk fault locating method based on the raid card log according to an embodiment of the present invention;
Fig. 5 is a flowchart of a hard disk fault locating method based on the raid card log according to another embodiment of the present invention; and
Fig. 6 is a structural diagram of a hard disk fault locating device based on the raid card log according to an embodiment of the present invention.
Embodiment
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and cannot be construed as limiting it.
In the description of the present invention, it should be understood that orientation or position terms such as "center", "longitudinal", "lateral", "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" refer to orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention. In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "installed", "linked", and "connected" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.
The hard disk fault locating method based on the raid card log according to the embodiments of the present invention is described in detail below with reference to Figs. 1 to 5.
As shown in Fig. 1, in the hard disk fault locating method based on the raid card log according to an embodiment of the present invention, an asynchronous real-time push interface is provided between the disk array (raid) card and the server, and an asynchronous event processing engine is provided in the server. The method comprises the following steps:
Step S101, the raid card pushes the raid card log in real time, through the asynchronous real-time push interface, to the asynchronous event processing engine in the server.
Specifically, the complete raid card log records all events that occur on the raid card, including each event's sequence number in the log, time of occurrence, event description, and event data. An asynchronous mechanism provides real-time communication between the server and the raid card controller: as soon as an event occurs on the raid card, the raid card controller, while storing the log entry in its own memory, also pushes it through the asynchronous event push interface to the asynchronous event processing engine running on the server, and the asynchronous event processing engine analyzes and processes the event information.
Step S102, the monitoring tool analyzes the current state of the hard disk; if the logical disk of the hard disk is in the degraded state or the offline state, the hard disk is determined to have failed.
Specifically, the logical disk of the hard disk has three states, optimal, degraded, and offline, which reflect respectively the normal, degraded, and offline condition of the current raid card logical disk. In other words, the three states can be reduced to two: normal and faulty. When the monitoring tool finds that a logical disk is in the degraded or offline state, it determines that a physical disk belonging to that logical disk has failed. The monitoring tool may be, but is not limited to, the MegaCli tool.
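The check in step S102 can be illustrated with a minimal Python sketch. The state names optimal, degraded, and offline come from the description; the function name `logical_disk_faulty` is an assumption made for illustration, and obtaining the state string from the monitoring tool is outside the scope of the sketch.

```python
# Logical disk states that step S102 treats as a fault.
FAULT_STATES = {"degraded", "offline"}


def logical_disk_faulty(state: str) -> bool:
    """Return True if the logical disk state indicates a fault (hypothetical helper)."""
    return state.strip().lower() in FAULT_STATES


if __name__ == "__main__":
    for s in ("Optimal", "Degraded", "Offline"):
        print(s, logical_disk_faulty(s))
```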
Step S103, when the hard disk is determined to have failed, the asynchronous event processing engine analyzes the raid card log to obtain log information related to disk drops, and pushes that log information to the memory of the server to generate a local raid card log. Specifically, after obtaining the log information related to disk drops, the asynchronous event processing engine analyzes, filters, and formats it, and pushes the formatted log information to the memory of the server to generate the local raid card log. This facilitates real-time queries and real-time pushing, so that key information is obtained in real time with zero impact on server performance.
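The filter-and-format part of step S103 might look like the following Python sketch. The event fields (sequence number, time, description, data) follow the description of the raid card log; the `RaidEvent` class, the keyword filter in `DROP_KEYWORDS`, and the formatted dictionary layout are all assumptions, since the text does not give the actual event type codes or the local log format.

```python
from dataclasses import dataclass


@dataclass
class RaidEvent:
    seq: int          # sequence number of the event in the log
    time: str         # time of occurrence
    description: str  # event description
    data: str         # event data


# Hypothetical filter: the real drop-related event types are not listed in the text.
DROP_KEYWORDS = ("state change",)


def process_event(event: RaidEvent, local_log: list) -> bool:
    """Keep only drop-related events, format them, and append to the in-memory log."""
    text = event.description.lower()
    if not any(keyword in text for keyword in DROP_KEYWORDS):
        return False  # filtered out, nothing stored
    local_log.append(
        {"seq": event.seq, "time": event.time, "description": event.description}
    )
    return True


if __name__ == "__main__":
    log = []
    process_event(RaidEvent(1, "2013-01-01 10:00:00", "Fan speed changed", ""), log)
    process_event(RaidEvent(2, "2013-01-01 10:00:05", "PD state change", ""), log)
    print(log)
```

Keeping the local log as a plain in-memory list mirrors the text's "memory of the server"; a real engine would likely bound its size.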
Step S104, the monitoring tool extracts multiple state-change event records of the physical disk from the local raid card log, and obtains the final state of the hard disk from those records.
Specifically, the state transitions of the hard disk recorded in the state-change event records include: a transition from the normal state to a fault state, a transition from a fault state to the normal state, and a transition from a fault state to an abnormal state. The final state of the hard disk is obtained from the multiple state-change event records as follows: the times of the records are analyzed, the record with the latest time is taken, and the final state of the hard disk is read from it.
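The "take the record with the latest time" rule of step S104 can be sketched in Python as follows. The record layout, a tuple of (disk identifier, timestamp, new state), is an assumption for illustration; the actual record format is the one shown in Fig. 4 and is not reproduced here.

```python
from datetime import datetime


def final_states(records):
    """Map each disk id to the new state of its latest state-change record.

    `records` is an iterable of (disk_id, timestamp, new_state) tuples
    (hypothetical layout); records need not be sorted by time.
    """
    latest = {}
    for disk_id, timestamp, new_state in records:
        if disk_id not in latest or timestamp >= latest[disk_id][0]:
            latest[disk_id] = (timestamp, new_state)
    return {disk_id: state for disk_id, (_, state) in latest.items()}


if __name__ == "__main__":
    records = [
        ("pd0", datetime(2013, 1, 1, 9, 0), "online"),
        ("pd0", datetime(2013, 1, 1, 10, 0), "failed"),
        ("pd1", datetime(2013, 1, 1, 9, 30), "online"),
    ]
    print(final_states(records))
```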
Step S105, the monitoring tool compares the current state of the hard disk with its final state; if the two do not match, the physical disk of the hard disk is determined to have dropped out. Further, if the current state of the hard disk matches its final state, the hard disk is determined to have failed.
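The decision in step S105 reduces to a single comparison, shown in the Python sketch below. The function name `classify` and the returned labels are assumptions; representing an unreadable current state as `None` is likewise an illustrative choice rather than something the text specifies.

```python
def classify(current_state, final_state):
    """Apply the step S105 rule to one disk of a faulty logical disk.

    Mismatch between the state read by the monitoring tool and the final
    state from the log means the physical disk dropped out; a match means
    an ordinary disk fault.
    """
    if current_state != final_state:
        return "physical disk dropped"
    return "disk fault"


if __name__ == "__main__":
    print(classify(None, "failed"))      # current state unavailable
    print(classify("failed", "failed"))  # states agree
```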
The raid card carries a flash memory that permanently stores the various logs produced at run time, and these are not lost on power failure. Events that occur while the raid card is running, including any disk drop, and the corresponding state changes are all recorded in the flash. The log stored on the raid card can therefore cover hard disk faults very comprehensively. In the above example, for an LSI-type raid card, the MegaCli tool can be used to collect the health parameters of the raid card controller, the physical disks, the logical disks, and so on. For example, the logical disk has three states, optimal, degraded, and offline, which reflect respectively the normal, degraded, and offline condition of the current raid card logical disk; in other words, the three states can be reduced to normal and faulty. If a logical disk is in the degraded or offline state, it can be concluded that a physical disk belonging to that logical disk must have a fault. Correspondingly, physical disk values such as media error, predictive failure, and firmware state reflect the running state of the current physical disk. Firmware state values such as online, failed, unconfigure_good (abnormal), and unconfigure_bad (faulty) reflect whether the current physical disk is normal or abnormal. By combining the state information of the logical disks and the physical disks, it is possible to determine effectively whether the raid card is currently running normally and which disk has a problem.
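How the physical disk values might be combined is sketched below in Python. The firmware state names are taken from the text; the function name `physical_disk_abnormal`, its parameters, and the default threshold of 0 are assumptions, since the text only says that an alarm is triggered when a failure count exceeds a threshold without giving its value.

```python
# Only "online" is described as the normal firmware state in the text.
NORMAL_FIRMWARE_STATES = {"online"}


def physical_disk_abnormal(firmware_state, media_errors=0,
                           predictive_failures=0, threshold=0):
    """Return True if a physical disk looks abnormal (hypothetical helper).

    `threshold` is an assumed, configurable limit for both counters.
    """
    if firmware_state.strip().lower() not in NORMAL_FIRMWARE_STATES:
        return True
    return media_errors > threshold or predictive_failures > threshold


if __name__ == "__main__":
    print(physical_disk_abnormal("online"))
    print(physical_disk_abnormal("online", media_errors=3))
    print(physical_disk_abnormal("unconfigure_bad"))
```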
For a raid card on which no physical disk drop has occurred, the detection means described above can detect hard disk faults accurately and in real time. When a physical disk drop does occur, however, the raid card controller kicks the disk out of the array, so the running state information of the dropped disk can no longer be obtained in real time, and the fault cannot be located with the above means. Because the raid card controller records event information in the raid card log in a timely manner, including physical disk drop events, the raid card log can be obtained and analyzed to recover the hard disk running state information that can no longer be read in real time, and thereby to locate the faulty hard disk that has physically dropped out.
The complete raid card log records all events that occur on the raid card, including each event's sequence number in the log, time of occurrence, event description, and event data. When raid card events occur frequently on a server, the volume of log information is very large, and frequently reading the log into memory would affect server performance. To address this, an asynchronous mechanism provides real-time communication between the server and the raid card controller: as soon as an event occurs on the raid card, the raid card controller, while storing the event log in flash, also pushes it through the asynchronous event push interface to the asynchronous event processing engine running on the server. The asynchronous event processing engine analyzes, filters, and formats the information in real time, and the formatted log information is stored on the server's local hard disk for convenient real-time querying and data mining. This asynchronous event communication architecture between the raid card and the server pushes log information to the server in real time and obtains key information in real time with zero impact on server performance. Log entries pushed asynchronously are appended incrementally to the local log for use in fault locating. The push process is illustrated in Figs. 2 and 3.
With reference to Fig. 2 and Fig. 3, when an event occurs on the raid card, the raid card reads the event's parameters from its RAM. On the one hand, the raid card writes the parameter information to its flash to generate the raid card log; on the other hand, and at the same time, the raid card pushes the data to the asynchronous event push interface, and the asynchronous event processing engine of the server receives and processes the data and stores the formatted data in server memory to generate the local raid card log.
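The push architecture of Figs. 2 and 3 resembles a producer/consumer arrangement, which the following Python sketch models with a thread-safe queue standing in for the asynchronous event push interface. This is an illustration under assumptions only: the function names, the dictionary event layout, and the `None` end-of-stream marker are hypothetical, and the real interface between the raid card firmware and the server is not described at this level in the text.

```python
import queue
import threading


def raid_card(push_interface, events):
    """Producer: push each event as it occurs, then signal the end of the demo."""
    for event in events:
        push_interface.put(event)
    push_interface.put(None)  # hypothetical end-of-stream marker for the demo


def event_engine(push_interface, local_log):
    """Consumer: receive pushed events, format them, and store them in memory."""
    while True:
        event = push_interface.get()
        if event is None:
            break
        local_log.append(f"{event['time']} {event['description']}")


def run_demo(events):
    push_interface = queue.Queue()
    local_log = []
    engine = threading.Thread(target=event_engine, args=(push_interface, local_log))
    engine.start()
    raid_card(push_interface, events)
    engine.join()
    return local_log


if __name__ == "__main__":
    demo_events = [
        {"time": "2013-01-01 10:00:00", "description": "PD state change"},
        {"time": "2013-01-01 10:00:05", "description": "LD state change"},
    ]
    print(run_demo(demo_events))
```

Because the engine blocks on the queue rather than polling the card's log, the server does no work between events, which is the property the text attributes to the asynchronous design.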
The raid card log contains more than 200 types of event information, of which 5 types are relevant to locating disk drops. The 2 most critical of these are the state-change records of logical disks and of physical disks. State-change information records how the running state of a hard disk changes, including from the normal state to a fault state, from a fault state to the normal state, and from a fault state (unconfigure_bad) to another abnormal state (unconfigure_good). The event descriptions (Event Description) of raid card logical disks and physical disks are recorded in a fixed format; Fig. 4 shows an example, a record of a physical disk state change in the raid card log.
Each hard disk may accumulate many state-change records over its running life, and only the last state-change record stores the disk's current running state. By analyzing the event description records in this format, the final running state of each hard disk is obtained. Thus, even when the current running state of a physically dropped hard disk cannot be obtained in real time, the events archived in the raid card log make it possible to locate the physical disk that has dropped out, which improves fault detection coverage.
Fig. 5 is a flowchart of a hard disk fault locating method based on the raid card log according to another embodiment of the present invention.
As shown in Fig. 5, the hard disk fault locating method based on the raid card log according to another embodiment of the present invention comprises the following steps:
Step S501, run the monitoring tool. The monitoring tool may be, but is not limited to, the MegaCli tool. The MegaCli tool can collect the health parameters of the raid card controller, the physical disks, the logical disks, and so on.
Step S502, analyze the current disk state. The logical disk of the hard disk has three states, optimal, degraded, and offline, which reflect respectively the normal, degraded, and offline condition of the current raid card logical disk; that is, the three states can be reduced to two, normal and faulty. When the monitoring tool finds that a logical disk is in the degraded or offline state, it determines that a physical disk belonging to that logical disk has failed.
Step S503, determine whether any logical disk is in the degraded or offline state. That is, determine whether the monitoring tool has detected a logical disk in the degraded or offline state; if so, execute step S504, otherwise execute step S505.
Step S504, generate the local raid card log. That is, when a logical disk is detected in the degraded or offline state, the logical disk has failed. The asynchronous event processing engine then analyzes the raid card log to obtain log information related to disk drops, analyzes, filters, and formats that information, and pushes the formatted log information to the memory of the server to generate the local raid card log. This facilitates real-time queries and real-time pushing, so that key information is obtained in real time with zero impact on server performance.
Step S505, no fault. That is, when no logical disk is found in the degraded or offline state, the hard disk has no fault.
Step S506, extract the physical disk state-change event records according to their format. The raid card log contains more than 200 types of event information, of which 5 types are relevant to locating disk drops. The 2 most critical of these are the state-change records of logical disks and of physical disks. State-change information records how the running state of a hard disk changes, including from the normal state to a fault state, from a fault state to the normal state, and from a fault state (unconfigure_bad) to another abnormal state (unconfigure_good). The event descriptions (Event Description) of raid card logical disks and physical disks are recorded in a fixed format, and the physical disk state-change event records must be extracted according to that fixed format.
Step S507, resolve the final state of each hard disk. Each hard disk may accumulate many state-change records over its running life, and only the last state-change record stores the disk's current running state. The times of the event description records in this format are analyzed, the state-change event record with the latest time is taken for each hard disk, and the final running state of the hard disk is obtained from it.
Step S508, match against the hard disks whose current running state can be obtained. That is, the monitoring tool compares the current state of each hard disk with its final state.
Step S509, determine whether the current running state of the hard disk matches its final running state. That is, determine whether the current running state of the hard disk matches the final state of the hard disk stored in the raid card log. If so, execute step S510; otherwise execute step S511.
Step S510, hard disk fault; the fault is detected. That is, when the current running state of the hard disk matches its final running state, the hard disk is determined to have failed, and the specific location of the fault is identified so that it can be handled.
Step S511, the hard disk has physically dropped out; the fault is detected. That is, when the current running state of the hard disk does not match its final running state, the physical disk of the hard disk is determined to have dropped out, and the faulty hard disk that has physically dropped out can be located.
Step S512, LSI-type raid card. That is, the method targets raid cards of the LSI type.
Step S513, LSI-type raid card message interface. That is, an asynchronous real-time push interface is provided between the raid card and the server.
Step S514, an event occurs on the raid card.
Step S515, the asynchronous event log push daemon filters messages and asynchronously pushes key information. That is, when an event occurs on the raid card, the raid card controller, while storing the event log in flash, also pushes it through the asynchronous event push interface to the asynchronous event processing engine running on the server. The asynchronous event processing engine analyzes, filters, and formats the information in real time, and the formatted log information is stored on the server's local hard disk for convenient real-time querying and data mining. This asynchronous event communication architecture between the raid card and the server pushes log information to the server in real time and obtains key information in real time with zero impact on server performance. Log entries pushed asynchronously are appended incrementally to the local log for use in fault locating. Execution then continues with step S506.
The hard disk fault locating method based on the raid card log according to the embodiment of the present invention combines the current running health information of the hard disk with analysis of the raid card log. It achieves fuller coverage of hard disk runtime fault detection, substantially improves the accuracy of hard disk monitoring and detection, and improves the operation and maintenance efficiency of the server.
Fig. 6 is a structural diagram of a device for locating hard disk faults based on the RAID card log according to an embodiment of the invention.
As shown in Fig. 6, the device 600 for locating hard disk faults based on the RAID card log according to an embodiment of the invention comprises: a monitoring tool 610, a RAID card 620, a server 630, and an asynchronous real-time push interface 640, wherein the asynchronous real-time push interface 640 is arranged between the RAID card 620 and the server 630.
Specifically, the RAID card 620 pushes the RAID card log to the server 630 in real time through the asynchronous real-time push interface 640.
The server 630 comprises an asynchronous event processing engine, which receives the log of the RAID card 620 through the asynchronous real-time push interface 640 and, when the hard disk fails, analyzes the log of the RAID card 620 to obtain the log information related to disk drops, and pushes that log information to the memory of the server 630 to generate a local RAID card log. Specifically, the asynchronous event processing engine formats the log information related to disk drops and pushes the formatted log information to the memory of the server 630.
The monitoring tool 610 analyzes the current state of the hard disk. If the logical disk of the hard disk is in the degraded state or the offline state, the monitoring tool determines that the hard disk has failed. It then captures multiple transition event records of the physical disk from the local RAID card log, analyzes the times of these records, and takes the transition event record with the latest time to obtain the final state of the hard disk. The monitoring tool compares the current state of the hard disk with the final state; if the two do not match, it determines that the physical disk of the hard disk has dropped. Further, when the monitoring tool 610 finds that the current state and the final state of the hard disk match, it determines that the hard disk has failed. The transition event records record the state transitions of the hard disk, which include: normal state to fault state, fault state to normal state, and fault state to abnormal state. The monitoring tool may be, but is not limited to, the MegaCli tool.
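Selecting the final state from the transition event records can be sketched as follows. This is only an illustration under stated assumptions: the `Transition` record type, its field names, and the state strings are hypothetical, since the description does not define the record layout of the local RAID card log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One transition event record (hypothetical layout)."""
    time: float       # event timestamp
    pd: int           # physical disk identifier
    old_state: str    # state before the transition
    new_state: str    # state after the transition


def final_state(records, pd):
    """Return the state after the latest transition of disk `pd`.

    Returns None when the log holds no transition record for that disk.
    """
    mine = [r for r in records if r.pd == pd]
    if not mine:
        return None
    # The record with the latest time determines the final state.
    return max(mine, key=lambda r: r.time).new_state
```

Taking the maximum by timestamp, rather than the last record in file order, keeps the result correct even if records were appended out of order.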
In the above example, the RAID card 620 pushes its log in real time, through the asynchronous real-time push interface 640, to the asynchronous event processing engine of the server 630. The monitoring tool 610 analyzes the current state of the hard disk and, when it finds the logical disk of the hard disk in a state such as degraded, determines that the hard disk has failed. The asynchronous event processing engine then analyzes the log of the RAID card 620 to obtain the information related to disk drops and pushes it to the memory of the server 630 to generate the local RAID card log, for convenient real-time querying and data mining. The monitoring tool 610 then captures multiple transition event records of the physical disk from the generated local RAID card log, obtains the final state of the hard disk from them, and compares it with the current state of the hard disk. If the two match, the hard disk has failed; if they do not match, the physical disk of the hard disk has dropped. In this way it is possible to locate which specific hard disk has failed and which specific hard disk has physically dropped.
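The decision logic described above can be summarized in a short Python sketch. The function names, the state strings, and the three result labels (`"healthy"`, `"dropped"`, `"failed"`) are assumptions introduced for illustration; the description only states the comparison rule, not how results are represented.

```python
def logical_disk_faulty(ld_state):
    """A logical disk in the degraded or offline state indicates a fault."""
    return ld_state.lower() in ("degraded", "offline")


def diagnose(ld_state, current_pd_state, final_pd_state):
    """Classify a disk from its logical-disk state and two physical-disk states.

    current_pd_state: state reported now by the monitoring tool.
    final_pd_state:   state after the latest transition in the local log.
    """
    if not logical_disk_faulty(ld_state):
        return "healthy"
    if current_pd_state != final_pd_state:
        # The log and the live state disagree: the physical disk dropped.
        return "dropped"
    # The log and the live state agree: the hard disk itself has failed.
    return "failed"
```

For example, `diagnose("Degraded", "missing", "failed")` yields `"dropped"`, while `diagnose("Offline", "failed", "failed")` yields `"failed"`.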
The device for locating hard disk faults based on the RAID card log according to the embodiment of the invention combines the current operating health information of the hard disk with analysis of the RAID card log. It can thereby achieve more complete coverage in detecting hard disk operating faults, substantially improve the accuracy of hard disk monitoring and detection, and improve the operation and maintenance efficiency of the server.
Any process or method described in the flowcharts or otherwise described herein may be understood as representing a module, segment, or portion of code that comprises one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the invention may be implemented in hardware, software, firmware, or a combination thereof. In the embodiments above, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments above can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, a description that refers to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such terms, as used in this specification, do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

Claims (10)

1. A method for locating hard disk faults based on a RAID card log, characterized in that an asynchronous real-time push interface is provided between a disk array (RAID) card and a server, and an asynchronous event processing engine is provided in the server, the hard disk fault locating method comprising the following steps:
the RAID card pushes the RAID card log in real time to the asynchronous event processing engine through the asynchronous real-time push interface;
a monitoring tool analyzes the current state of the hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, determines that the hard disk has failed;
when the hard disk is determined to have failed, the asynchronous event processing engine analyzes the RAID card log to obtain the log information related to disk drops, and pushes the log information related to disk drops to the memory of the server to generate a local RAID card log;
the monitoring tool captures multiple transition event records of the physical disk of the disk from the local RAID card log, and obtains the final state of the hard disk from the multiple transition event records; and
the monitoring tool compares the current state of the hard disk with the final state and, if the current state and the final state of the hard disk do not match, determines that the physical disk of the hard disk has dropped.
2. The hard disk fault locating method of claim 1, characterized in that, if the current state and the final state of the hard disk match, the hard disk is determined to have failed.
3. The hard disk fault locating method of claim 1, characterized in that, after obtaining the log information related to disk drops, the asynchronous event processing engine further performs the following step:
formatting the log information related to disk drops, and pushing the formatted log information to the memory of the server.
4. The hard disk fault locating method of claim 1, characterized in that the transition event records record the state transitions of the hard disk, which include: normal state to fault state, fault state to normal state, and fault state to abnormal state.
5. The hard disk fault locating method of claim 1, characterized in that obtaining the final state of the hard disk from the multiple transition event records comprises the following step:
analyzing the times of the multiple transition event records, and taking the transition event record with the latest time to obtain the final state of the hard disk.
6. A device for locating hard disk faults based on a RAID card log, characterized by comprising: a monitoring tool, a RAID card, a server, and an asynchronous real-time push interface, wherein the asynchronous real-time push interface is located between the RAID card and the server,
the RAID card is configured to push the RAID card log to the server in real time through the asynchronous real-time push interface;
the server comprises an asynchronous event processing engine, which is configured to receive the log of the RAID card through the asynchronous real-time push interface and, when the hard disk fails, to analyze the RAID card log to obtain the log information related to disk drops, and to push the log information related to disk drops to the memory of the server to generate a local RAID card log;
the monitoring tool is configured to analyze the current state of the hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, to determine that the hard disk has failed; to capture multiple transition event records of the physical disk of the disk from the local RAID card log and obtain the final state of the hard disk from the multiple transition event records; and to compare the current state of the hard disk with the final state and, if the current state and the final state of the hard disk do not match, to determine that the physical disk of the hard disk has dropped.
7. The device of claim 6, characterized in that the monitoring tool determines that the hard disk has failed when it finds that the current state and the final state of the hard disk match.
8. The device of claim 6, characterized in that the asynchronous event processing engine is further configured to format the log information related to disk drops and to push the formatted log information to the memory of the server.
9. The device of claim 6, characterized in that the transition event records record the state transitions of the hard disk, which include: normal state to fault state, fault state to normal state, and fault state to abnormal state.
10. The device of claim 6, characterized in that the monitoring tool analyzes the times of the multiple transition event records and takes the transition event record with the latest time to obtain the final state of the hard disk.
CN201310046008.7A 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log Active CN103207820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310046008.7A CN103207820B (en) 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310046008.7A CN103207820B (en) 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log

Publications (2)

Publication Number Publication Date
CN103207820A true CN103207820A (en) 2013-07-17
CN103207820B CN103207820B (en) 2016-06-29

Family

ID=48755049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310046008.7A Active CN103207820B (en) 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log

Country Status (1)

Country Link
CN (1) CN103207820B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995772A (en) * 2014-06-10 2014-08-20 浪潮电子信息产业股份有限公司 RAID card log completely-storing method based on LINUX operation system
CN105045689A (en) * 2015-06-25 2015-11-11 浪潮电子信息产业股份有限公司 Method for using RAID card to perform hard disk batch detection, monitoring and alerting
CN105068901A (en) * 2015-07-27 2015-11-18 浪潮电子信息产业股份有限公司 Disk detection method
CN105117172A (en) * 2015-08-31 2015-12-02 北京神州云科数据技术有限公司 RAID (Redundant Arrays of Inexpensive Disks) historical non-identification record storage method
CN105223889A (en) * 2015-10-13 2016-01-06 浪潮电子信息产业股份有限公司 A kind of method being applicable to the automatic monitoring PMC RAID card daily record of producing line
CN106250258A (en) * 2016-07-29 2016-12-21 北京云集智造科技有限公司 A kind of disk failure localization method and device
CN107515827A (en) * 2017-08-21 2017-12-26 湖南国科微电子股份有限公司 Storage method, device and the SSD of the self-defined daily records of PCIE SSD
CN107577545A (en) * 2016-07-05 2018-01-12 北京金山云网络技术有限公司 A kind of failed disk detection and restorative procedure and device
CN107766191A (en) * 2017-11-03 2018-03-06 郑州云海信息技术有限公司 The automatic detecting storage information of Linux systems and the method for testing of health status
CN108763020A (en) * 2018-05-23 2018-11-06 郑州云海信息技术有限公司 It is a kind of fall disk capture the method and monitor card of storage management card daily record automatically
CN108984119A (en) * 2018-06-28 2018-12-11 郑州云海信息技术有限公司 A kind of asynchronous method, apparatus and controlled terminal for obtaining RAID card information
CN111625390A (en) * 2020-05-28 2020-09-04 深圳市晶讯软件通讯技术有限公司 Embedded equipment fault recovery method and device, embedded equipment and storage medium
CN112162705A (en) * 2020-09-30 2021-01-01 新浪网技术(中国)有限公司 RAID (redundant array of independent disk) set fault automatic offline repair reporting method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004061681A1 (en) * 2002-12-26 2004-07-22 Fujitsu Limited Operation managing method and operation managing server
CN101359959A (en) * 2008-09-17 2009-02-04 中兴通讯股份有限公司 Information acquisition method for fault locating analysis
CN101887387A (en) * 2010-04-07 2010-11-17 山东高效能服务器和存储研究院 Method for remotely intelligently monitoring and analyzing RAID faults
CN102662787A (en) * 2012-04-20 2012-09-12 浪潮电子信息产业股份有限公司 Method for protecting system disk RAID (redundant array of independent disks)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995772A (en) * 2014-06-10 2014-08-20 浪潮电子信息产业股份有限公司 RAID card log completely-storing method based on LINUX operation system
CN105045689A (en) * 2015-06-25 2015-11-11 浪潮电子信息产业股份有限公司 Method for using RAID card to perform hard disk batch detection, monitoring and alerting
CN105068901A (en) * 2015-07-27 2015-11-18 浪潮电子信息产业股份有限公司 Disk detection method
CN105117172B (en) * 2015-08-31 2019-04-02 深圳神州数码云科数据技术有限公司 A kind of disk array history falls the store method of disk record
CN105117172A (en) * 2015-08-31 2015-12-02 北京神州云科数据技术有限公司 RAID (Redundant Arrays of Inexpensive Disks) historical non-identification record storage method
CN105223889A (en) * 2015-10-13 2016-01-06 浪潮电子信息产业股份有限公司 A kind of method being applicable to the automatic monitoring PMC RAID card daily record of producing line
CN107577545A (en) * 2016-07-05 2018-01-12 北京金山云网络技术有限公司 A kind of failed disk detection and restorative procedure and device
CN107577545B (en) * 2016-07-05 2021-02-02 北京金山云网络技术有限公司 Method and device for detecting and repairing fault disk
CN106250258A (en) * 2016-07-29 2016-12-21 北京云集智造科技有限公司 A kind of disk failure localization method and device
CN107515827A (en) * 2017-08-21 2017-12-26 湖南国科微电子股份有限公司 Storage method, device and the SSD of the self-defined daily records of PCIE SSD
CN107515827B (en) * 2017-08-21 2021-07-27 湖南国科微电子股份有限公司 PCIE SSD custom log storage method and device and SSD
CN107766191A (en) * 2017-11-03 2018-03-06 郑州云海信息技术有限公司 The automatic detecting storage information of Linux systems and the method for testing of health status
CN108763020A (en) * 2018-05-23 2018-11-06 郑州云海信息技术有限公司 It is a kind of fall disk capture the method and monitor card of storage management card daily record automatically
CN108984119A (en) * 2018-06-28 2018-12-11 郑州云海信息技术有限公司 A kind of asynchronous method, apparatus and controlled terminal for obtaining RAID card information
CN111625390A (en) * 2020-05-28 2020-09-04 深圳市晶讯软件通讯技术有限公司 Embedded equipment fault recovery method and device, embedded equipment and storage medium
CN111625390B (en) * 2020-05-28 2024-03-26 深圳市晶讯技术股份有限公司 Embedded equipment fault recovery method and device, embedded equipment and storage medium
CN112162705A (en) * 2020-09-30 2021-01-01 新浪网技术(中国)有限公司 RAID (redundant array of independent disk) set fault automatic offline repair reporting method and system

Also Published As

Publication number Publication date
CN103207820B (en) 2016-06-29

Similar Documents

Publication Publication Date Title
CN103207820A (en) Method and device for fault positioning of hard disk on basis of raid card log
CN100504795C (en) Computer RAID array early-warning system and method
CN105468484B (en) Method and apparatus for locating a fault in a storage system
US10198196B2 (en) Monitoring health condition of a hard disk
CN102591591B (en) Disk detection system, disk detection method and network store system
US9047922B2 (en) Autonomous event logging for drive failure analysis
WO2016107402A1 (en) Magnetic disk fault prediction method and device based on prediction model
TW201629766A (en) Storage device lifetime monitoring system and storage device lifetime monitoring method thereof
CN105224888B (en) A kind of data of magnetic disk array protection system based on safe early warning technology
US20100332189A1 (en) Embedded microcontrollers classifying signatures of components for predictive maintenance in computer servers
CN111104293A (en) Method, apparatus and computer program product for supporting disk failure prediction
US20050210161A1 (en) Computer device with mass storage peripheral (s) which is/are monitored during operation
CN103049345B (en) Based on Disk State transition detection method and the device of asynchronous mechanism
Huang et al. Characterizing disk health degradation and proactively protecting against disk failures for reliable storage systems
CN109597731A (en) A kind of state test method of processor
CN107943654A (en) A kind of method of quick determining server environmental temperature monitoring abnormal cause
CN105372584A (en) Microswitch testing method, device and system
US8451019B2 (en) Method of detecting failure and monitoring apparatus
US8161324B2 (en) Analysis result stored on a field replaceable unit
CN107807862A (en) Detect the method, apparatus and server of hard disk failure point
JP4627327B2 (en) Abnormality judgment device
CN112084097A (en) Disk warning method and device
JP2011180673A (en) Apparatus for diagnosis of disk deterioration
CN114706720B (en) Method, system, equipment and storage medium for judging slow disk of distributed storage system
CN113986142B (en) Disk fault monitoring method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant