CN103207820A - Method and device for fault positioning of hard disk on basis of raid card log

Info

Publication number: CN103207820A
Application number: CN2013100460087A
Authority: CN
Inventors: 刘亮; 王雁鹏; 王晓静; 魏伟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2013-02-05
Filing date: 2013-02-05
Publication date: 2013-07-17
Anticipated expiration: 2033-02-05
Also published as: CN103207820B

Abstract

The invention provides a method for locating hard disk faults based on a RAID card log. The method includes the following steps: a RAID card pushes its log to an asynchronous event processing engine; a monitoring tool analyzes the current state of the hard disk and, if a logical disk is in the degraded or offline state, determines that the hard disk has a fault; the engine analyzes the log to obtain log information related to disk drops and pushes that information to the memory of a server to generate a local RAID card log; the monitoring tool captures multiple state transition event records of the physical disk to obtain the final state of the hard disk; and the current state of the hard disk is compared with the final state, and if the two do not match, the physical disk is determined to have dropped from the array. The method can achieve full coverage in detecting hard disk operating faults, substantially improves the accuracy of hard disk monitoring and detection, and improves the operation and maintenance efficiency of the server. The invention further provides a device for locating hard disk faults based on a RAID card log.

Description

Method and device for locating hard disk faults based on RAID card logs

Technical field

The present invention relates to the technical field of information storage, and in particular to a method and device for locating hard disk faults based on RAID card logs.

Background technology

For fault detection of hard disks attached to the LSI (Large-Scale Integration) type RAID (Redundant Arrays of Inexpensive Disks) cards used in enterprise servers, the prior art uses libraries or tools provided by the RAID card manufacturer to read the state and failure counts of each hard disk or SSD (Solid State Disk) under the RAID card. When a disk state is abnormal, or a failure count exceeds a threshold, a fault alarm is triggered. However, when a hard disk or SSD fails suddenly and the RAID card can no longer recognize it, the RAID card controller kicks the disk out of the RAID array and no longer records any state or failure information for it. As a result, existing techniques fail to report hard disks that have physically dropped from the array.

Summary of the invention

The present invention aims to solve at least one of the above technical problems.

To this end, one object of the present invention is to propose a method for locating hard disk faults based on RAID card logs that achieves more complete coverage in detecting hard disk operating faults, substantially improves the accuracy of hard disk monitoring and detection, and improves server operation and maintenance (O&M) efficiency.

Another object of the present invention is to propose a device for locating hard disk faults based on RAID card logs.

To achieve the above objects, an embodiment of the first aspect of the present invention proposes a method for locating hard disk faults based on RAID card logs, wherein an asynchronous real-time push interface is provided between the disk array (RAID) card and a server, and an asynchronous event processing engine is provided in the server. The method comprises the following steps: the RAID card pushes the RAID card log in real time to the asynchronous event processing engine through the asynchronous real-time push interface; a monitoring tool analyzes the current state of the hard disk and, if a logical disk of the hard disk is in the degraded state or the offline state, determines that the hard disk has a fault; when the hard disk is determined to have a fault, the asynchronous event processing engine analyzes the RAID card log to obtain log information related to disk drops, and pushes that log information to the memory of the server to generate a local RAID card log; the monitoring tool captures multiple state transition event records of the physical disk of the hard disk from the local RAID card log, and obtains the final state of the hard disk from those records; and the monitoring tool compares the current state of the hard disk with the final state and, if the two do not match, determines that the physical disk of the hard disk has dropped.

The method for locating hard disk faults based on RAID card logs according to embodiments of the present invention combines the current operating health information of the hard disk with analysis of the RAID card log. It thereby achieves more complete coverage in detecting hard disk operating faults, substantially improves the accuracy of hard disk monitoring and detection, and improves server O&M efficiency.

In addition, the method for locating hard disk faults based on RAID card logs according to the above embodiment of the present invention may also have the following additional technical features:

In an embodiment of the present invention, if the current state of the hard disk matches the final state, the hard disk is determined to have a fault.

In an embodiment of the present invention, after obtaining the log information related to disk drops, the asynchronous event processing engine further performs the following step: formatting the log information related to disk drops, and pushing the formatted log information to the memory of the server.

In an embodiment of the present invention, the state transitions of the hard disk recorded in the state transition event records include: a transition from a normal state to a fault state, a transition from a fault state to a normal state, and a transition from a fault state to an abnormal state.

In an embodiment of the present invention, obtaining the final state of the hard disk from the multiple state transition event records comprises the following step: analyzing the times of the records, taking the record with the latest time, and obtaining the final state of the hard disk from it.

An embodiment of the second aspect of the present invention further proposes a device for locating hard disk faults based on RAID card logs, comprising a monitoring tool, a RAID card, a server, and an asynchronous real-time push interface. The asynchronous real-time push interface is located between the RAID card and the server, and the RAID card is configured to push the RAID card log in real time to the server through the asynchronous real-time push interface. The server comprises an asynchronous event processing engine, which is configured to receive the RAID card log through the asynchronous real-time push interface and, when the hard disk has a fault, to analyze the RAID card log to obtain log information related to disk drops and push that log information to the memory of the server to generate a local RAID card log. The monitoring tool is configured to analyze the current state of the hard disk and, if a logical disk of the hard disk is in the degraded state or the offline state, to determine that the hard disk has a fault; to capture multiple state transition event records of the physical disk of the hard disk from the local RAID card log and obtain the final state of the hard disk from those records; and to compare the current state of the hard disk with the final state and, if the two do not match, to determine that the physical disk of the hard disk has dropped.

The device for locating hard disk faults based on RAID card logs according to embodiments of the present invention combines the current operating health information of the hard disk with analysis of the RAID card log. It thereby achieves more complete coverage in detecting hard disk operating faults, substantially improves the accuracy of hard disk monitoring and detection, and improves server O&M efficiency.

In addition, the device for locating hard disk faults based on RAID card logs according to the above embodiment of the present invention may also have the following additional technical features:

In an embodiment of the present invention, the monitoring tool determines that the hard disk has a fault when it observes that the current state of the hard disk matches the final state.

In an embodiment of the present invention, the asynchronous event processing engine is further configured to format the log information related to disk drops and to push the formatted log information to the memory of the server.

In an embodiment of the present invention, the monitoring tool analyzes the times of the multiple state transition event records, takes the record with the latest time, and obtains the final state of the hard disk from it.

Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.

Description of drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a method for locating hard disk faults based on RAID card logs according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of RAID card asynchronous event pushing in a method for locating hard disk faults based on RAID card logs according to another embodiment of the present invention;

Fig. 3 shows the RAID card asynchronous event push framework of a method for locating hard disk faults based on RAID card logs according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a physical disk state change record in the RAID card log in a method for locating hard disk faults based on RAID card logs according to an embodiment of the present invention;

Fig. 5 is a flowchart of a method for locating hard disk faults based on RAID card logs according to another embodiment of the present invention; and

Fig. 6 is a structural diagram of a device for locating hard disk faults based on RAID card logs according to an embodiment of the present invention.

Embodiment

Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements having identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and cannot be construed as limiting it.

In the description of the present invention, it should be understood that orientation or position terms such as "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" are based on the orientations or positions shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention. In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "linked", and "connected" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or an internal connection between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.

The method for locating hard disk faults based on RAID card logs according to embodiments of the present invention is described in detail below with reference to Figs. 1 to 5.

As shown in Fig. 1, in the method for locating hard disk faults based on RAID card logs according to an embodiment of the present invention, an asynchronous real-time push interface is provided between the disk array (RAID) card and the server, and an asynchronous event processing engine is provided in the server. The method comprises the following steps:

Step S101: the RAID card pushes the RAID card log in real time, through the asynchronous real-time push interface, to the asynchronous event processing engine in the server.

Specifically, the complete RAID card log records all events that occur on the RAID card, including each event's sequence number in the log, time of occurrence, event description, and event data. An asynchronous mechanism provides real-time communication between the server and the RAID card controller: as soon as an event occurs on the RAID card, the RAID card controller stores the log entry in its own memory and at the same time pushes it through the asynchronous event push interface to the asynchronous event processing engine running on the server, which analyzes and processes the event information.

Step S102: the monitoring tool analyzes the current state of the hard disk; if a logical disk of the hard disk is in the degraded state or the offline state, the hard disk is determined to have a fault.

Specifically, a logical disk has three states, optimal, degraded, and offline, which reflect the normal, degraded, and offline conditions of the current RAID card logical disk respectively. In other words, the three states map onto two conditions, normal and faulty. When the monitoring tool observes that a logical disk is in the degraded or offline state, it determines that a physical disk corresponding to that logical disk has a fault. The monitoring tool may be, but is not limited to, the MegaCli tool.
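By way of illustration only, the determination in step S102 can be sketched as follows. This is a minimal sketch, not the monitoring tool's actual interface: the names `LogicalState`, `logical_disk_faulty`, and `faulty_logical_disks` are hypothetical, and the state strings are assumed to have already been read from the monitoring tool by some other means.

```python
from __future__ import annotations

from enum import Enum


class LogicalState(Enum):
    """The three logical disk states named in the description."""
    OPTIMAL = "optimal"
    DEGRADED = "degraded"
    OFFLINE = "offline"


# Degraded and offline are both treated as the "faulty" condition.
FAULT_STATES = {LogicalState.DEGRADED, LogicalState.OFFLINE}


def logical_disk_faulty(state: str) -> bool:
    """Return True if the reported logical disk state indicates a fault."""
    return LogicalState(state.strip().lower()) in FAULT_STATES


def faulty_logical_disks(states: dict[str, str]) -> list[str]:
    """Return the IDs (hypothetical keys) of logical disks that are degraded or offline."""
    return sorted(ld for ld, s in states.items() if logical_disk_faulty(s))
```

For example, `faulty_logical_disks({"ld0": "Optimal", "ld1": "Offline"})` selects only `ld1`, which is the condition under which the method proceeds to step S103.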

Step S103: when the hard disk is determined to have a fault, the asynchronous event processing engine analyzes the RAID card log to obtain log information related to disk drops, and pushes it to the memory of the server to generate a local RAID card log. Specifically, after obtaining the log information related to disk drops, the engine analyzes, filters, and formats it, and pushes the formatted log information to the memory of the server to generate the local RAID card log. This supports real-time query and real-time push, so that key information is obtained in real time with zero impact on server performance.
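The filter-and-format behavior of step S103 can be sketched as a small worker that consumes pushed events from a queue. This is an illustrative sketch under stated assumptions, not the engine described in the patent: the class `AsyncEventEngine`, the event dictionary keys, the type names in `DROP_RELATED_TYPES`, and the pipe-separated output format are all hypothetical.

```python
import queue
import threading

# Hypothetical type names. The description says only that a handful of the
# 200+ event types relate to disk drops, chiefly state changes of logical
# and physical disks.
DROP_RELATED_TYPES = {"pd_state_change", "ld_state_change"}


class AsyncEventEngine:
    """Consumes pushed events on a worker thread and keeps an in-memory local log."""

    def __init__(self):
        self._inbox = queue.Queue()
        self.local_log = []  # stands in for the local RAID card log in server memory
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def push(self, event):
        """Called by the push interface; enqueues the event and returns immediately."""
        self._inbox.put(event)

    def close(self):
        """Stop the worker after all queued events have been processed."""
        self._inbox.put(None)
        self._worker.join()

    def _run(self):
        while True:
            event = self._inbox.get()
            if event is None:
                break
            # Filter: keep only events related to disk drops.
            if event["type"] in DROP_RELATED_TYPES:
                self.local_log.append(self._format(event))

    @staticmethod
    def _format(event):
        # Format: one line per event (assumed layout).
        return f'{event["seq"]}|{event["time"]}|{event["type"]}|{event["description"]}'
```

Because `push` only enqueues, the producer side never waits for analysis or formatting, which is the property the description relies on to keep the impact on the server low.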

Step S104: the monitoring tool captures multiple state transition event records of the physical disk from the local RAID card log, and obtains the final state of the hard disk from those records.

Specifically, the state transitions of the hard disk recorded in the state transition event records include: a transition from a normal state to a fault state, a transition from a fault state to a normal state, and a transition from a fault state to an abnormal state. The final state of the hard disk is obtained from the multiple records as follows: the times of the records are analyzed, the record with the latest time is taken, and the final state of the hard disk is obtained from it.
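Selecting the record with the latest time can be sketched as below. The record layout is an assumption made for illustration: the `Transition` fields and the `disk_id` values are hypothetical, since the fixed record format of Fig. 4 is not reproduced in the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transition:
    """One state transition event record (hypothetical field names)."""
    disk_id: str       # assumed identifier of the physical disk
    time: datetime     # time of occurrence of the event
    old_state: str
    new_state: str


def final_states(records: list[Transition]) -> dict[str, str]:
    """Return, per disk, the new state carried by its latest transition record."""
    latest: dict[str, Transition] = {}
    for rec in records:
        seen = latest.get(rec.disk_id)
        # Keep the record with the latest timestamp for each disk.
        if seen is None or rec.time >= seen.time:
            latest[rec.disk_id] = rec
    return {disk: rec.new_state for disk, rec in latest.items()}
```

The records need not arrive in time order; only the timestamp decides which record supplies the final state.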

Step S105: the monitoring tool compares the current state of the hard disk with the final state; if the two do not match, the physical disk of the hard disk is determined to have dropped. Further, if the current state of the hard disk matches the final state, the hard disk is determined to have a fault.
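The comparison in step S105 can be sketched as follows. The function names and verdict strings are hypothetical. The sketch also assumes that a disk whose current state cannot be read at all is represented as `None` and counts as a mismatch; the description implies this for dropped disks but does not spell out the representation. The comparison is meant to run only after a logical disk fault has already been detected in step S102.

```python
from __future__ import annotations

from typing import Optional


def locate_fault(current: Optional[str], final: str) -> str:
    """Compare a disk's current state with its final state from the log."""
    if current is not None and current == final:
        # States match: the disk is still visible and is itself faulty.
        return "disk_fault"
    # States differ, or no current state is available (assumed): disk dropped.
    return "physical_disk_dropped"


def locate_faults(current_states: dict[str, str],
                  final_states: dict[str, str]) -> dict[str, str]:
    """Apply the comparison to every disk that has a final state in the log."""
    return {
        disk: locate_fault(current_states.get(disk), final)
        for disk, final in final_states.items()
    }
```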

The RAID card carries a flash memory that persistently stores the various logs produced during operation; these are not lost on power failure. Events during RAID card operation, including any disk drop and the corresponding state change, are all recorded in the flash. The log stored on the RAID card can therefore cover hard disk faults very comprehensively. In the above example, for an LSI-type RAID card, the MegaCli tool can be used to capture health parameters of the RAID card controller, the physical disks, and the logical disks. For example, a logical disk has three states, optimal, degraded, and offline, which reflect the normal, degraded, and offline conditions of the current RAID card logical disk respectively; in other words, the three states map onto the normal and faulty conditions. If a logical disk is in the degraded or offline state, it can be determined that a physical disk corresponding to that logical disk must have a fault. Correspondingly, values of a physical disk such as media error, predictive failure, and firmware state reflect the running state of the current physical disk; firmware state values such as online, failed, unconfigure_good (abnormal), and unconfigure_bad (faulty) reflect the normal and abnormal states of the current physical disk. By combining the state information of the logical disks and the physical disks, it can be effectively determined whether the RAID card is currently running normally and which logical disk has a problem.

For RAID cards on which no physical disk drop has occurred, the above detection means can detect hard disk faults accurately and in real time. When a physical disk drop occurs, however, the RAID card controller kicks the disk out of the array and no longer records its state, so the running state information of the dropped disk cannot be obtained in real time and the fault cannot be located by the above means. Because the RAID card controller records event information, including physical disk drop events, in the RAID card log in a timely manner, the RAID card log can be obtained and analyzed to recover the running state information that cannot be read directly, thereby locating the faulty hard disk that has physically dropped.

Because the complete RAID card log records all events that occur on the RAID card, including each event's sequence number in the log, time of occurrence, event description, and event data, the volume of log information is very large when RAID card events occur frequently on the server, and frequently reading the log into memory can affect server performance. To address this, an asynchronous mechanism provides real-time communication between the server and the RAID card controller: as soon as an event occurs on the RAID card, the controller stores the event log in flash and pushes it through the asynchronous event push interface to the asynchronous event processing engine running on the server. The engine analyzes, filters, and formats the information in real time, and the formatted log information is stored on the server's local hard disk for convenient real-time query and data mining. This asynchronous event communication framework between the RAID card and the server pushes log information to the server in real time, obtaining key information in real time with zero impact on server performance. The log increments pushed asynchronously are stored in the local log for use in fault location. The push process is illustrated in Figs. 2 and 3. With reference to Figs. 2 and 3, when an event occurs on the RAID card, the RAID card reads the relevant parameters of the event from its RAM. On the one hand, the RAID card writes the parameter information to its flash to generate the RAID card log; on the other hand, the RAID card simultaneously pushes the data to the asynchronous event push interface, and the asynchronous event processing engine of the server receives and processes the data and stores the formatted data in server memory to generate the local RAID card log.

There are more than 200 types of event information in the RAID card log, of which 5 types are relevant to locating disk drops; the 2 most critical are the state change records of logical disks and of physical disks. State change information records changes in the running state of a hard disk, including normal state to fault state, fault state to normal state, and one fault state (unconfigure_bad) to another abnormal state (unconfigure_good). The event description (Event Description) of logical disk and physical disk events is recorded in a fixed format; Fig. 4, for example, shows one record of a physical disk state change in the RAID card log.
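The three kinds of state change named above can be illustrated with a small classifier. The mapping follows the state names used in this description (online as normal, failed and unconfigure_bad as fault states, unconfigure_good as an abnormal state); the names `STATE_CLASS` and `classify_transition` and the label format are hypothetical, and states not named in the text are labeled "unknown" rather than guessed.

```python
from __future__ import annotations

# Mapping taken from the description: online is normal, failed and
# unconfigure_bad are fault states, unconfigure_good is an abnormal state.
STATE_CLASS = {
    "online": "normal",
    "failed": "fault",
    "unconfigure_bad": "fault",
    "unconfigure_good": "abnormal",
}


def classify_transition(old_state: str, new_state: str) -> str:
    """Label a state change by class, e.g. 'normal->fault'."""
    old_class = STATE_CLASS.get(old_state, "unknown")
    new_class = STATE_CLASS.get(new_state, "unknown")
    return f"{old_class}->{new_class}"
```

Under this mapping, a change from unconfigure_bad to unconfigure_good is labeled `fault->abnormal`, matching the third kind of transition listed in the text.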

Each hard disk may have many state change records over its operating life, and only the last state change record holds the current running state of that disk. By analyzing event description records of this format, the final running state of each hard disk is obtained. Thus, even when the current running state of a physically dropped hard disk cannot be obtained in real time, the event records in the RAID card log make it possible to locate the physical disk that has dropped, which improves fault detection coverage.

Fig. 5 is a flowchart of a method for locating hard disk faults based on RAID card logs according to another embodiment of the present invention.

As shown in Fig. 5, the method for locating hard disk faults based on RAID card logs according to another embodiment of the present invention comprises the following steps:

Step S501: run the monitoring tool. The monitoring tool may be, but is not limited to, the MegaCli tool. The MegaCli tool can be used to capture health parameters of the RAID card controller, the physical disks, and the logical disks.

Step S502: analyze the current disk state. A logical disk has three states, optimal, degraded, and offline, which reflect the normal, degraded, and offline conditions of the current RAID card logical disk respectively; that is, the three states map onto two conditions, normal and faulty. When the monitoring tool observes that a logical disk is in the degraded or offline state, it determines that a physical disk corresponding to that logical disk has a fault.

Step S503: determine whether any logical disk is in the degraded or offline state. That is, determine whether the monitoring tool has detected a logical disk in the degraded or offline state; if so, execute step S504, otherwise execute step S505.

Step S504: generate the local RAID card log. That is, when a logical disk is detected in the degraded or offline state, the logical disk has a fault. The asynchronous event processing engine then analyzes the RAID card log to obtain log information related to disk drops, analyzes, filters, and formats that information, and pushes the formatted log information to the memory of the server to generate the local RAID card log. This supports real-time query and real-time push, so that key information is obtained in real time with zero impact on server performance.

Step S505: no fault. That is, when no logical disk is determined to be in the degraded or offline state, the hard disk has no fault.

Step S506: capture physical disk state change event records according to the fixed format. There are more than 200 types of event information in the RAID card log, of which 5 types are relevant to locating disk drops; the 2 most critical are the state change records of logical disks and of physical disks. State change information records changes in the running state of a hard disk, including normal state to fault state, fault state to normal state, and one fault state (unconfigure_bad) to another abnormal state (unconfigure_good). The event description (Event Description) of logical disk and physical disk events is recorded in a fixed format, and the physical disk state change event records are captured according to this fixed format.

Step S507: resolve the final state of each hard disk. Each hard disk may have many state change records over its operating life, and only the last state change record holds the current running state of that disk. The times of the event description records of this format are analyzed, the state transition event record with the latest time is taken for each hard disk, and the final running state of the hard disk is obtained from it.

Step S508: match against the hard disks whose current running state can be obtained. That is, the monitoring tool compares the current state of each hard disk with its final state.

Step S509: determine whether the current running state of the hard disk matches the final running state. That is, determine whether the current running state of the hard disk matches the final state of the hard disk stored in the RAID card log. If so, execute step S510; otherwise execute step S511.

Step S510: hard disk fault; fault detected. That is, when the current running state of the hard disk matches the final running state, the hard disk is determined to have a fault, and the specific location of the fault is detected and handled.

Step S511: physical disk drop; fault detected. That is, when the current running state of the hard disk does not match the final running state, the physical disk of the hard disk is determined to have dropped, and the faulty hard disk that has physically dropped can be located.

Step S512: LSI-type RAID card. That is, the method is directed to RAID cards of the LSI type.

Step S513: LSI-type RAID card message interface. That is, an asynchronous real-time push interface is provided between the RAID card and the server.

Step S514: an event occurs on the RAID card.

Step S515: the asynchronous event log push daemon filters messages and asynchronously pushes key information. That is, when an event occurs on the RAID card, the RAID card controller stores the event log in flash and pushes it through the asynchronous event push interface to the asynchronous event processing engine running on the server. The engine analyzes, filters, and formats the information in real time, and the formatted log information is stored on the server's local hard disk for convenient real-time query and data mining. This asynchronous event communication framework between the RAID card and the server pushes log information to the server in real time, obtaining key information in real time with zero impact on server performance. The log increments pushed asynchronously are stored in the local log for use in fault location. Execution then continues with step S506.

As shown in Fig. 6, a device 600 for locating hard disk faults based on RAID card logs according to an embodiment of the present invention comprises a monitoring tool 610, a RAID card 620, a server 630, and an asynchronous real-time push interface 640, wherein the asynchronous real-time push interface 640 is arranged between the RAID card 620 and the server 630.

Specifically, the RAID card 620 is configured to push the RAID card log in real time to the server 630 through the asynchronous real-time push interface 640.

The server 630 comprises an asynchronous event processing engine, which is configured to receive the log of the RAID card 620 through the asynchronous real-time push interface 640 and, when the hard disk has a fault, to analyze the log of the RAID card 620 to obtain log information related to disk drops and push that log information to the memory of the server 630 to generate a local RAID card log. Specifically, the asynchronous event processing engine formats the log information related to disk drops and pushes the formatted log information to the memory of the server 630.

The monitoring tool 610 is configured to analyze the current state of the hard disk and, if a logical disk of the hard disk is in the degraded state or the offline state, to determine that the hard disk has a fault. It is further configured to capture multiple state transition event records of the physical disk from the local RAID card log, analyze the times of those records, and take the record with the latest time to obtain the final state of the hard disk; and to compare the current state of the hard disk with the final state and, if the two do not match, to determine that the physical disk of the hard disk has dropped. Further, when the monitoring tool 610 observes that the current state of the hard disk matches the final state, it determines that the hard disk has a fault. The state transitions of the hard disk recorded in the state transition event records include: normal state to fault state, fault state to normal state, and fault state to abnormal state. The monitoring tool may be, but is not limited to, the MegaCli tool.

In the above example, the RAID card 620 pushes its log in real time, through the asynchronous real-time push interface 640, to the asynchronous event processing engine of the server 630. The monitoring tool 610 analyzes the current state of the hard disk, and when it finds that the logical disk of the hard disk is in a state such as degraded, it determines that the hard disk has failed. The asynchronous event processing engine then analyzes the log of the RAID card 620 to obtain the information related to disk drops and pushes it to the memory of the server 630 to generate the local RAID card log, which supports real-time queries and data mining. The monitoring tool 610 then captures multiple state-change event records of the physical disk from the generated local RAID card log, derives the final state of the hard disk from them, and compares it with the current state of the hard disk. If the two match, the hard disk has failed; if they do not match, the physical disk of the hard disk has dropped. In this way it is possible to locate exactly which hard disk has failed and exactly which hard disk has physically dropped.
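The decision made by the monitoring tool in this example can be sketched as follows. This is only an illustration under stated assumptions: the `StateChange` record layout, the state strings, the disk identifiers and the function names `final_state` and `diagnose` are hypothetical, and the `"healthy"` and `"unknown"` results are added here to keep the sketch total; they are not part of the embodiment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class StateChange:
    time: datetime   # time of the state-change event record
    disk_id: str     # hypothetical physical-disk identifier
    new_state: str   # state the disk entered, e.g. "normal" or "fault"


def final_state(records: List[StateChange], disk_id: str) -> Optional[str]:
    """Return the state from the latest record of one physical disk."""
    own = [r for r in records if r.disk_id == disk_id]
    if not own:
        return None
    return max(own, key=lambda r: r.time).new_state


def diagnose(logical_state: str, current_state: str,
             records: List[StateChange], disk_id: str) -> str:
    """Classify one disk following the comparison described above."""
    # Only a degraded or offline logical disk counts as a failure.
    if logical_state not in ("degraded", "offline"):
        return "healthy"   # assumption: not named in the embodiment
    last = final_state(records, disk_id)
    if last is None:
        return "unknown"   # assumption: no records captured for this disk
    # Mismatch between current and final state: the physical disk dropped.
    if current_state != last:
        return "dropped"
    # Match: the hard disk has failed.
    return "faulty"
```

For instance, if the latest record for a disk says it returned to `"normal"` while the monitoring tool currently sees it as `"fault"`, `diagnose` reports `"dropped"`; if both agree on `"fault"`, it reports `"faulty"`.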

Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code comprising one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention belong.

The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system comprising a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that each part of the invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.

Those of ordinary skill in the art will appreciate that all or part of the steps of the methods of the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

In this specification, a description referring to the terms "an embodiment", "some embodiments", "an example", "a specific example", "some examples" or the like means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A hard disk fault locating method based on a RAID card log, characterized in that an asynchronous real-time push interface is provided between a disk array (RAID) card and a server, and an asynchronous event processing engine is provided in the server, the hard disk fault locating method comprising the following steps:

the RAID card pushes the RAID card log in real time to the asynchronous event processing engine through the asynchronous real-time push interface;

a monitoring tool analyzes the current state of a hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, determines that the hard disk has failed;

when it is determined that the hard disk has failed, the asynchronous event processing engine analyzes the RAID card log to obtain the log information related to disk drops, and pushes the log information related to disk drops to the memory of the server to generate a local RAID card log;

the monitoring tool captures multiple state-change event records of the physical disk of the disk from the local RAID card log, and obtains the final state of the hard disk from the multiple state-change event records; and

the monitoring tool compares the current state of the hard disk with the final state and, if the current state and the final state of the hard disk do not match, determines that the physical disk of the hard disk has dropped.

2. The hard disk fault locating method according to claim 1, characterized in that if the current state and the final state of the hard disk match, it is determined that the hard disk has failed.

3. The hard disk fault locating method according to claim 1, characterized in that, after obtaining the log information related to disk drops, the asynchronous event processing engine further performs the following step:

formatting the log information related to disk drops, and pushing the formatted log information to the memory of the server.

4. The hard disk fault locating method according to claim 1, characterized in that the state-change event records record the state changes of the hard disk, including: a change from the normal state to the faulty state, a change from the faulty state to the normal state, and a change from the faulty state to the abnormal state.

5. The hard disk fault locating method according to claim 1, characterized in that obtaining the final state of the hard disk from the multiple state-change event records comprises the following step:

analyzing the times of the multiple state-change event records, and taking the state-change event record with the latest time to obtain the final state of the hard disk.

6. A hard disk fault locating device based on a RAID card log, characterized by comprising: a monitoring tool, a RAID card, a server and an asynchronous real-time push interface, wherein the asynchronous real-time push interface is located between the RAID card and the server,

the RAID card is configured to push the RAID card log in real time to the server through the asynchronous real-time push interface;

the server comprises an asynchronous event processing engine, the asynchronous event processing engine being configured to receive the log of the RAID card through the asynchronous real-time push interface and, when the hard disk fails, to analyze the RAID card log to obtain the log information related to disk drops and to push the log information related to disk drops to the memory of the server to generate a local RAID card log;

the monitoring tool is configured to analyze the current state of a hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, to determine that the hard disk has failed; to capture multiple state-change event records of the physical disk of the disk from the local RAID card log and obtain the final state of the hard disk from the multiple state-change event records; and to compare the current state of the hard disk with the final state and, if the current state and the final state of the hard disk do not match, to determine that the physical disk of the hard disk has dropped.

7. The device according to claim 6, characterized in that the monitoring tool determines that the hard disk has failed when it finds that the current state and the final state of the hard disk match.

8. The device according to claim 6, characterized in that the asynchronous event processing engine is further configured to format the log information related to disk drops and to push the formatted log information to the memory of the server.

9. The device according to claim 6, characterized in that the state-change event records record the state changes of the hard disk, including: a change from the normal state to the faulty state, a change from the faulty state to the normal state, and a change from the faulty state to the abnormal state.

10. The device according to claim 6, characterized in that the monitoring tool analyzes the times of the multiple state-change event records and takes the state-change event record with the latest time to obtain the final state of the hard disk.