CN103207820B - Hard disk fault locating method and device based on raid card log - Google Patents

Info

Publication number
CN103207820B
Authority
CN
China
Prior art keywords
hard disk
raid card
state
log
transition
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310046008.7A
Other languages
Chinese (zh)
Other versions
CN103207820A (en)
Inventor
刘亮
王雁鹏
王晓静
魏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310046008.7A
Publication of CN103207820A
Application granted
Publication of CN103207820B

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The present invention proposes a hard disk fault locating method based on the raid card log, comprising the following steps: the raid card pushes its log to an asynchronous event processing engine; a monitoring tool analyzes the current state of the hard disk and, if the logical disk is in the degraded or offline state, judges that the hard disk has failed; the engine analyzes the log to obtain the log information related to disk drops and pushes it to the memory of the server to generate a local raid card log; the monitoring tool captures multiple state-transition event records of the physical disk from the local log and derives the final state of the hard disk from them; and the current state is compared with the final state, and if they do not match, the physical disk has dropped. Embodiments of the invention give more complete coverage in detecting hard disk operating faults, greatly improve the accuracy of hard disk monitoring and detection, and improve the operation and maintenance (O&M) efficiency of servers. The invention also proposes a hard disk fault locating device based on the raid card log.

Description

Hard disk fault locating method and device based on raid card log
Technical field
The present invention relates to the technical field of information storage, and in particular to a method and a device for locating hard disk faults based on the raid card log.
Background
For fault detection of the hard disks attached to LSI (Large-Scale Integration) type raid (Redundant Arrays of Inexpensive Disks) cards used in enterprise servers, the prior art uses the library or tool provided by the raid card vendor to read the state and the failure counts of each hard disk or SSD (Solid State Disk) under the raid card. When a disk state is abnormal, or a failure count exceeds a threshold, a fault alarm is triggered. However, when a hard disk or SSD fails so severely that the raid card system can no longer recognize it, the raid card controller kicks the disk out of the raid array and stops recording any state or fault information about it. As a result, existing techniques fail to report faults on hard disks that have already physically dropped.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, one object of the present invention is to propose a hard disk fault locating method based on the raid card log that gives more complete coverage in detecting hard disk operating faults, substantially improves the accuracy of hard disk monitoring and detection, and improves server O&M efficiency.
Another object of the present invention is to propose a hard disk fault locating device based on the raid card log.
To achieve these objects, an embodiment of the first aspect of the present invention proposes a hard disk fault locating method based on the raid card log, wherein an asynchronous real-time push interface is arranged between the disk array (raid) card and a server, and the server is provided with an asynchronous event processing engine. The method comprises the following steps: the raid card pushes the raid card log in real time to the asynchronous event processing engine through the asynchronous real-time push interface; a monitoring tool analyzes the current state of the hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, judges that the hard disk has failed; when the hard disk is judged to have failed, the asynchronous event processing engine analyzes the raid card log to obtain the log information related to disk drops and pushes that information to the memory of the server to generate a local raid card log; the monitoring tool captures multiple state-transition event records of the physical disk of the hard disk from the local raid card log and obtains the final state of the hard disk from those records; and the monitoring tool compares the current state of the hard disk with the final state and, if they do not match, judges that the physical disk of the hard disk has dropped.
By combining the current operating health information of the hard disk with an analysis of the raid card log, the hard disk fault locating method based on the raid card log according to embodiments of the present invention gives more complete coverage in detecting hard disk operating faults, greatly improves the accuracy of hard disk monitoring and detection, and improves server O&M efficiency.
In addition, the hard disk fault locating method based on the raid card log according to the above embodiment of the present invention may also have the following additional technical features:
In an embodiment of the present invention, if the current state of the hard disk matches the final state, the hard disk is judged to have failed.
In an embodiment of the present invention, after obtaining the log information related to disk drops, the asynchronous event processing engine further performs the following step: formatting the log information related to disk drops and pushing the formatted log information to the memory of the server.
In an embodiment of the present invention, the state-transition event records record the state transitions of the hard disk, including: a transition from the normal state to a fault state, a transition from a fault state to the normal state, and a transition from a fault state to an abnormal state.
In an embodiment of the present invention, obtaining the final state of the hard disk from the multiple state-transition event records comprises the following steps: analyzing the times of the records, selecting the record with the latest time, and obtaining the final state of the hard disk from it.
An embodiment of the second aspect of the present invention further proposes a hard disk fault locating device based on the raid card log, comprising a monitoring tool, a raid card, a server and an asynchronous real-time push interface, wherein the asynchronous real-time push interface is located between the raid card and the server. The raid card is configured to push the raid card log to the server in real time through the asynchronous real-time push interface. The server includes an asynchronous event processing engine, which is configured to receive the raid card log through the asynchronous real-time push interface and, when the hard disk fails, to analyze the raid card log to obtain the log information related to disk drops and push that information to the memory of the server to generate a local raid card log. The monitoring tool is configured to analyze the current state of the hard disk and, if the logical disk of the hard disk is in the degraded state or the offline state, judge that the hard disk has failed; to capture multiple state-transition event records of the physical disk of the hard disk from the local raid card log and obtain the final state of the hard disk from those records; and to compare the current state of the hard disk with the final state and, if they do not match, judge that the physical disk of the hard disk has dropped.
By combining the current operating health information of the hard disk with an analysis of the raid card log, the hard disk fault locating device based on the raid card log according to embodiments of the present invention gives more complete coverage in detecting hard disk operating faults, greatly improves the accuracy of hard disk monitoring and detection, and improves server O&M efficiency.
In addition, the hard disk fault locating device based on the raid card log according to the above embodiment of the present invention may also have the following additional technical features:
In an embodiment of the present invention, when the monitoring tool finds that the current state of the hard disk matches the final state, it judges that the hard disk has failed.
In an embodiment of the present invention, the asynchronous event processing engine is further configured to format the log information related to disk drops and push the formatted log information to the memory of the server.
In an embodiment of the present invention, the state-transition event records record the state transitions of the hard disk, including: a transition from the normal state to a fault state, a transition from a fault state to the normal state, and a transition from a fault state to an abnormal state.
In an embodiment of the present invention, the monitoring tool analyzes the times of the multiple state-transition event records, selects the record with the latest time, and obtains the final state of the hard disk from it.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a hard disk fault locating method based on the raid card log according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of raid card asynchronous event pushing in a hard disk fault locating method based on the raid card log according to another embodiment of the present invention;
Fig. 3 shows the raid card asynchronous event push architecture of a hard disk fault locating method based on the raid card log according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a physical disk state change record in the raid card log of a hard disk fault locating method based on the raid card log according to an embodiment of the present invention;
Fig. 5 is a flow chart of a hard disk fault locating method based on the raid card log according to another embodiment of the present invention; and
Fig. 6 is a structural diagram of a hard disk fault locating device based on the raid card log according to an embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it.
In the description of the present invention, it should be understood that orientation or position terms such as "center", "longitudinal", "transverse", "up", "down", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positions shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention. In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "connected with" and "connected" should be understood broadly: for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or internal between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
The hard disk fault locating method based on the raid card log according to embodiments of the present invention is described in detail below with reference to Figs. 1 to 5.
As shown in Fig. 1, in the hard disk fault locating method based on the raid card log according to an embodiment of the present invention, an asynchronous real-time push interface is arranged between the disk array (raid) card and the server, and the server is provided with an asynchronous event processing engine. The method comprises the following steps:
Step S101: the raid card pushes the raid card log in real time, through the asynchronous real-time push interface, to the asynchronous event processing engine in the server.
Specifically, the full raid card log records all event information occurring on the raid card, including each event's sequence number in the log, its time of occurrence, its event description and its event data. An asynchronous mechanism is used for real-time communication between the server and the raid card controller: once a raid card event occurs, the raid card controller stores the event log in its memory and, at the same time, uses the asynchronous event push interface to push the event to the asynchronous event processing engine running on the server, which analyzes and processes the event information.
Step S102: the monitoring tool analyzes the current state of the hard disk; if the logical disk of the hard disk is in the degraded state or the offline state, the hard disk is judged to have failed.
Specifically, the logical disk of the hard disk has three states, optimal, degraded and offline, which reflect the normal, degraded and offline conditions of the current raid card logical disk respectively; in other words, the three states map onto two conditions, normal and faulty. When the monitoring tool detects that a logical disk is in the degraded or offline state, the physical disk corresponding to that logical disk is judged to have failed. The monitoring tool may be, but is not limited to, the MegaCli tool.
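The check in step S102 can be sketched in a few lines of Python. This is only an illustration: the enum and the function name `logical_disk_has_fault` are hypothetical, and the state string is assumed to have already been read from the monitoring tool's output.

```python
from enum import Enum


class LogicalDiskState(Enum):
    """The three logical disk states named in the description."""
    OPTIMAL = "optimal"
    DEGRADED = "degraded"
    OFFLINE = "offline"


def logical_disk_has_fault(state: str) -> bool:
    """Return True when the logical disk is degraded or offline (step S102)."""
    parsed = LogicalDiskState(state.strip().lower())  # ValueError on unknown states
    return parsed in (LogicalDiskState.DEGRADED, LogicalDiskState.OFFLINE)
```

Any state other than the three listed raises a `ValueError` in this sketch, so an unexpected value is not silently treated as healthy.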
Step S103: when the hard disk is judged to have failed, the asynchronous event processing engine analyzes the raid card log to obtain the log information related to disk drops, and pushes that information to the memory of the server to generate a local raid card log. Specifically, after obtaining the log information related to disk drops, the asynchronous event processing engine analyzes, filters and formats it, and pushes the formatted log information to the memory of the server to generate the local raid card log. This facilitates real-time querying and real-time pushing, and key information is obtained in real time with zero impact on server performance.
Step S104: the monitoring tool captures multiple state-transition event records of the physical disk of the hard disk from the local raid card log, and obtains the final state of the hard disk from those records.
Specifically, the state-transition event records record the state transitions of the hard disk, including: from the normal state to a fault state, from a fault state to the normal state, and from a fault state to an abnormal state. The final state of the hard disk is obtained from the multiple records as follows: the times of the records are analyzed, the record with the latest time is selected, and the final state of the hard disk is read from it.
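Selecting the latest record per disk might look like the following minimal sketch. The `TransitionRecord` fields and the function name `final_states` are assumptions made for illustration; the description does not define a record layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TransitionRecord:
    """One state-transition event record (field names are hypothetical)."""
    disk_id: str
    time: datetime
    old_state: str
    new_state: str


def final_states(records):
    """Map each disk to the new_state of its most recent transition record."""
    latest = {}
    for rec in records:
        seen = latest.get(rec.disk_id)
        # Keep the record with the latest time for every disk.
        if seen is None or rec.time >= seen.time:
            latest[rec.disk_id] = rec
    return {disk_id: rec.new_state for disk_id, rec in latest.items()}
```

The records do not need to arrive in time order, because the comparison is made on the timestamp rather than on position in the log.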
Step S105: the monitoring tool compares the current state of the hard disk with the final state; if they do not match, the physical disk of the hard disk is judged to have dropped. Further, if the current state and the final state match, the hard disk is judged to have failed.
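The comparison in step S105 reduces to a single branch, sketched below. The function name `locate_fault` and the use of `None` for a disk whose current state can no longer be read are assumptions of this sketch, not part of the described method.

```python
from typing import Optional


def locate_fault(current_state: Optional[str], final_state: str) -> str:
    """Compare the current state with the final state from the log (step S105).

    current_state is None when the monitoring tool can no longer read the
    disk's state (an assumption of this sketch).
    """
    if current_state is not None and current_state.lower() == final_state.lower():
        return "disk fault"          # states match
    return "physical disk drop"      # states do not match
```

For example, `locate_fault("failed", "failed")` reports a disk fault, while `locate_fault(None, "failed")` reports a physical disk drop.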
The raid card carries a flash chip that permanently stores the various logs produced at run time, so they are not lost on power failure. Events during raid card operation, including any disk drop, and the corresponding state changes are all recorded in the flash. The log stored on the raid card can therefore cover hard disk faults very comprehensively. In the above embodiment, for an LSI type raid card, the MegaCli tool can be used to capture health parameters of the raid card controller, the physical disks, the logical disks and so on. For example, a logical disk has three states, optimal, degraded and offline, which reflect the normal, degraded and offline conditions of the current raid card logical disk respectively; in other words, the three states map onto the normal and faulty conditions. If a logical disk is in the degraded or offline state, it can be concluded that the physical disk corresponding to it must have a fault. Correspondingly, values of the physical disk such as mediaError, predictivefailure and firmwarestate reflect the running status of the current physical disk; firmwarestate takes state values such as online, failed, unconfigure_good (abnormal) and unconfigure_bad (fault), which reflect the normal and abnormal states of the current physical disk. By combining the state information of the logical disks and the physical disks, it is possible to judge effectively whether the raid card is currently operating normally and which logical disk has a problem.
For a raid card on which no physical disk drop has occurred, the above detection means can detect hard disk faults accurately and in real time. However, when a physical disk drop occurs, the raid card controller kicks the disk out of the array, so the running-state information of the dropped faulty disk can no longer be obtained in real time, and the fault on that disk cannot be located by the above means. Because the raid card controller records event information, including physical disk drop events, in the raid card log in a timely manner, the raid card log can be obtained and analyzed to recover the hard disk running-state information that cannot be obtained in real time, thereby locating the faulty hard disk on which the physical drop occurred.
Because the full raid card log records all event information occurring on the raid card, including each event's sequence number in the log, time of occurrence, event description and event data, the volume of log information is very large when raid card events occur frequently on a server, and frequently reading the log into memory would affect server performance. To address this, an asynchronous mechanism is used for real-time communication between the server and the raid card controller: once a raid card event occurs, the raid card controller stores the event log in flash and, at the same time, uses the asynchronous event push interface to push the event to the asynchronous event processing engine running on the server. The engine analyzes, filters and formats the information in real time and stores the formatted log information on the server's local hard disk, which facilitates real-time querying and data mining. This asynchronous event communication architecture between the raid card and the server pushes log information to the local side in real time, so that key information is obtained in real time with zero impact on server performance. Log increments are pushed asynchronously and stored in the local log for use in locating faults. The push scheme is shown in Figs. 2 and 3. With reference to Figs. 2 and 3, when an event occurs on the raid card, the raid card reads the event's parameters from its RAM. On the one hand, the raid card writes the parameter information to its flash to generate the raid card log; on the other hand, at the same time, the raid card pushes the data to the asynchronous event push interface, and the server's asynchronous event processing engine receives and processes the data and stores the formatted data in server memory to generate the local raid card log.
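The push-and-filter arrangement can be illustrated with a small `asyncio` sketch in which a queue stands in for the asynchronous event push interface and a list stands in for the local raid card log. The event dictionaries, the type names in `DROP_RELATED_TYPES` and the `seq|time|description` output format are all hypothetical; the description does not specify them.

```python
import asyncio

# Hypothetical event type names; the description does not list the
# drop-related event types.
DROP_RELATED_TYPES = {"pd_state_change", "ld_state_change"}


async def event_engine(queue, local_log):
    """Consume pushed events, keep only drop-related ones, format and store them."""
    while True:
        event = await queue.get()
        if event is None:  # sentinel: the producer has finished
            break
        if event["type"] in DROP_RELATED_TYPES:
            local_log.append(f'{event["seq"]}|{event["time"]}|{event["description"]}')


async def run_engine(events):
    """Stand-in for the raid card: push each event onto the queue as it occurs."""
    queue = asyncio.Queue()
    local_log = []
    engine = asyncio.create_task(event_engine(queue, local_log))
    for event in events:
        await queue.put(event)
    await queue.put(None)
    await engine
    return local_log


events = [
    {"seq": 1, "time": "2013-02-05T10:00:00", "type": "patrol_read",
     "description": "Patrol Read complete"},
    {"seq": 2, "time": "2013-02-05T10:05:00", "type": "pd_state_change",
     "description": "PD 08 state change: online -> failed"},
]
local_log = asyncio.run(run_engine(events))
```

Only the increment that passes the filter reaches the local log, so a consumer never has to re-read the full raid card log.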
There are more than 200 event types in the raid card log, of which 5 are related to locating disk drops; the 2 most critical types are the state change records of logical disks and physical disks. A state change record describes a change in a hard disk's running state, such as from the normal state to a fault state, from a fault state to the normal state, or from a fault state (unconfigure_bad) to another, abnormal state (unconfigure_good). The event descriptions (EventDescription) of raid card logical disk and physical disk records have a fixed format; Fig. 4, for example, shows records of physical disk state changes in the raid card log.
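Because the event description has a fixed format, a regular expression is enough to extract the transition. Fig. 4 is not reproduced here, so the line shape used below ("State change on PD ... from X(n) to Y(n)") is an assumption for illustration only, as are the pattern and function names.

```python
import re

# Assumed line shape; the actual fixed format is the one shown in Fig. 4.
PD_STATE_CHANGE = re.compile(
    r"State change on PD (?P<pd>\S+) "
    r"from (?P<old>[A-Za-z_]+)\(\w+\) to (?P<new>[A-Za-z_]+)\(\w+\)"
)


def parse_pd_state_change(description):
    """Return (pd, old_state, new_state), or None if the text is not a state change."""
    match = PD_STATE_CHANGE.search(description)
    if match is None:
        return None
    return match.group("pd"), match.group("old").lower(), match.group("new").lower()
```

Descriptions of other event types simply return `None`, which makes the function usable as a filter over the whole log.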
Every hard disk accumulates many state change records over its operating life, and only the last state change record holds the disk's current running state. By analyzing event description records of this format, the final running state of every hard disk is obtained. Thus, even when the current running state of a physically dropped hard disk cannot be obtained in real time, the events in the raid card log make it possible to determine which physical disk has dropped, which improves fault detection coverage.
Fig. 5 is a flow chart of a hard disk fault locating method based on the raid card log according to another embodiment of the present invention.
As shown in Fig. 5, the hard disk fault locating method based on the raid card log according to another embodiment of the present invention comprises the following steps:
Step S501: run the monitoring tool. The monitoring tool may be, but is not limited to, the MegaCli tool, which can capture health parameters of the raid card controller, the physical disks, the logical disks and so on.
Step S502: analyze the current disk state. The logical disk of the hard disk has three states, optimal, degraded and offline, which reflect the normal, degraded and offline conditions of the current raid card logical disk respectively; that is, the three states map onto two conditions, normal and faulty. When the monitoring tool detects that a logical disk is in the degraded or offline state, the physical disk corresponding to that logical disk is judged to have failed.
Step S503: determine whether a logical disk is in the degraded or offline state. That is, determine whether the monitoring tool has detected a logical disk in the degraded or offline state; if so, perform step S504, otherwise perform step S505.
Step S504: generate the local raid card log. That is, when a logical disk is detected in the degraded or offline state, the logical disk has a fault; the asynchronous event processing engine then analyzes the raid card log to obtain the log information related to disk drops, analyzes, filters and formats that information, and pushes the formatted log information to the memory of the server to generate the local raid card log. This facilitates real-time querying and real-time pushing, and key information is obtained in real time with zero impact on server performance.
Step S505: no fault. That is, when no logical disk is found in the degraded or offline state, the hard disk has no fault.
Step S506: capture physical disk state change event records by format. There are more than 200 event types in the raid card log, of which 5 are related to locating disk drops; the 2 most critical types are the state change records of logical disks and physical disks. A state change record describes a change in a hard disk's running state, such as from the normal state to a fault state, from a fault state to the normal state, or from a fault state (unconfigure_bad) to another, abnormal state (unconfigure_good). The event descriptions (EventDescription) of raid card logical disk and physical disk records have a fixed format, and the physical disk state change event records are captured according to that fixed format.
Step S507: resolve the final state of each hard disk. Every hard disk accumulates many state change records over its operating life, and only the last state change record holds the disk's current running state. The times of the event description records of this format are analyzed to find, for every hard disk, the state-transition event record with the latest time, from which the disk's final running state is obtained.
Step S508: match against the hard disks whose current running state can be obtained. That is, the monitoring tool compares the current state of each hard disk with its final state.
Step S509: determine whether the current running state of the hard disk matches the final running state. That is, determine whether the current running state of the hard disk matches the final state of the hard disk stored in the raid card log; if so, perform step S510, otherwise perform step S511.
Step S510: hard disk fault; the fault is detected. That is, when the current running state of the hard disk matches the final running state, the hard disk is judged to have failed, and the specific location of the fault is detected and handled.
Step S511: physical disk drop; the fault is detected. That is, when the current running state of the hard disk does not match the final running state, the physical disk of the hard disk is judged to have dropped, so that the faulty hard disk on which the physical drop occurred can be located.
Step S512: LSI type raid card. That is, the method targets raid cards of the LSI type.
Step S513: LSI type raid card message interface. That is, an asynchronous real-time push interface is arranged between the raid card and the server.
Step S514: a raid card event occurs. That is, an event occurs on the raid card.
Step S515: the asynchronous event log push daemon filters messages and asynchronously pushes the key information. That is, when an event occurs on the raid card, the raid card controller stores the event log in flash and, at the same time, uses the asynchronous event push interface to push the event to the asynchronous event processing engine running on the server. The engine analyzes, filters and formats the information in real time and stores the formatted log information on the server's local hard disk, which facilitates real-time querying and data mining. This asynchronous event communication architecture between the raid card and the server pushes log information to the local side in real time, so that key information is obtained in real time with zero impact on server performance. Log increments are pushed asynchronously and stored in the local log for use in locating faults. The flow then continues with step S506.
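The decision path of Fig. 5 (steps S503 to S511) can be condensed into one function, shown here as a rough sketch under stated assumptions. The function name `locate_faults`, the tuple layout of the records and the use of ISO 8601 timestamp strings (so that sorting orders them by time) are hypothetical choices, and the match/mismatch rule is applied literally to every disk that has a transition record.

```python
FAULT_LOGICAL_STATES = {"degraded", "offline"}


def locate_faults(logical_state, current_pd_states, transition_records):
    """Follow steps S503 to S511 for one logical disk.

    logical_state: state reported by the monitoring tool for the logical disk
    current_pd_states: {disk_id: state} for disks whose state is still readable
    transition_records: [(timestamp, disk_id, new_state)] from the local log,
        with ISO 8601 timestamp strings (an assumption of this sketch)
    """
    # S503 / S505: no degraded or offline logical disk means no fault.
    if logical_state.lower() not in FAULT_LOGICAL_STATES:
        return {}

    # S506 / S507: the record with the latest time gives each disk's final state.
    final_states = {}
    for _, disk_id, new_state in sorted(transition_records):
        final_states[disk_id] = new_state

    # S508 to S511: compare the current state with the final state.
    verdicts = {}
    for disk_id, final_state in final_states.items():
        if current_pd_states.get(disk_id) == final_state:
            verdicts[disk_id] = "disk fault"          # S510
        else:
            verdicts[disk_id] = "physical disk drop"  # S511
    return verdicts
```

A disk that is missing from `current_pd_states` cannot match its final state, so it is reported as a physical disk drop, which mirrors the case where the controller has kicked the disk out of the array.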
By combining the current operating health information of the hard disk with an analysis of the raid card log, the hard disk fault locating method based on the raid card log according to embodiments of the present invention gives more complete coverage in detecting hard disk operating faults, greatly improves the accuracy of hard disk monitoring and detection, and improves server O&M efficiency.
Fig. 6 is a structural diagram of a hard disk fault locating device based on the raid card log according to an embodiment of the present invention.
As shown in Fig. 6, the hard disk fault locating device 600 based on the raid card log according to an embodiment of the present invention comprises a monitoring tool 610, a raid card 620, a server 630 and an asynchronous real-time push interface 640, wherein the asynchronous real-time push interface 640 is arranged between the raid card 620 and the server 630.
Specifically, the raid card 620 is configured to push the raid card log to the server 630 in real time through the asynchronous real-time push interface 640.
Server 630 includes asynchronous event and processes engine, for being received the daily record of raid card 620 by asynchronous real time propelling movement interface 640, and when hard disk breaks down, it is analyzed the daily record of raid card 620 obtaining the log information relevant to falling dish, and the internal memory of server 630 will be pushed to generate this locality raid card log in falling the log information that dish is correlated with.Specifically, asynchronous event processes engine and formats process by the log information for falling dish relevant, and the log information after formatting being processed pushes to the internal memory of server 630.
The monitoring tool 610 analyzes the current state of the hard disk. If a logical disk of the hard disk is in the degraded state or the offline state, the monitoring tool determines that the hard disk has failed. It then captures a plurality of transition log records of the physical disks of the hard disk from the local RAID card log, analyzes the times of these records, and takes the transition log record with the latest time to obtain the final state of the hard disk. The monitoring tool compares the current state of the hard disk with the final state; if the two do not match, it determines that a physical disk of the hard disk has dropped. Further, when the monitoring tool 610 finds that the current state and the final state of the hard disk match, it determines that the hard disk has failed. The transition log records record the state transitions of the hard disk, including: a transition from the normal state to the fault state, a transition from the fault state to the normal state, and a transition from the fault state to the abnormal state. The monitoring tool may be, but is not limited to, the MegaCli tool.
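The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transition` record layout, the state names, and the result labels are hypothetical, and the sketch does not call MegaCli or any other real tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transition:
    """One transition log record (hypothetical field names)."""
    time: float      # timestamp of the event
    slot: int        # physical disk slot
    old_state: str   # state before the transition
    new_state: str   # state after the transition


def final_state(records: Sequence[Transition]) -> Optional[str]:
    """Return the state reached by the latest transition, or None if empty."""
    if not records:
        return None
    return max(records, key=lambda r: r.time).new_state


def diagnose(current_state: str, records: Sequence[Transition]) -> str:
    """Compare the current state with the final state taken from the log."""
    end = final_state(records)
    if end is None:
        return "unknown"  # no transition records to compare against
    # Match: the disk has failed; mismatch: the physical disk has dropped.
    return "disk fault" if current_state == end else "disk dropped"


if __name__ == "__main__":
    records = [
        Transition(time=1.0, slot=3, old_state="normal", new_state="fault"),
        Transition(time=2.0, slot=3, old_state="fault", new_state="normal"),
    ]
    print(final_state(records))
    print(diagnose("fault", records))
```

Taking the record with the maximum timestamp, rather than the last line in the log, keeps the result correct even if the pushed records arrive out of order.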
In the above example, the RAID card 620 pushes its log in real time to the asynchronous event processing engine of the server 630 through the asynchronous real-time push interface 640, and the monitoring tool 610 analyzes the current state of the hard disk. When the monitoring tool finds that a logical disk of the hard disk is in a state such as degraded, it determines that the hard disk has failed. The asynchronous event processing engine then analyzes the log of the RAID card 620 to obtain the information related to the disk drop and pushes it to the memory of the server 630 to generate a local RAID card log, which supports real-time queries and data mining. The monitoring tool 610 next captures a plurality of transition log records of the physical disks of the hard disk from the generated local RAID card log, obtains the final state of the hard disk from them, and compares it with the current state of the hard disk. If the two match, the hard disk has failed; if they do not match, a physical disk of the hard disk has dropped. In this way it is possible to locate exactly which hard disk has failed and which hard disk has physically dropped.
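As a rough end-to-end illustration of this flow, the following Python sketch first checks the logical disk state and only then classifies each physical disk slot that has transition records. The tuple layout `(time, slot, new_state)`, the state names, and the function name `locate` are assumptions for illustration only.

```python
# Logical-disk states that trigger fault location (names are assumptions).
FAILED_LOGICAL_STATES = {"degraded", "offline"}


def locate(logical_state, current_states, records):
    """Classify each slot that has transition records.

    logical_state  -- state of the logical disk, e.g. "degraded"
    current_states -- dict mapping slot -> state currently reported
    records        -- iterable of (time, slot, new_state) tuples
    """
    if logical_state not in FAILED_LOGICAL_STATES:
        return {}  # logical disk is healthy, nothing to locate

    # Keep only the latest transition per slot.
    latest = {}
    for time, slot, new_state in records:
        if slot not in latest or time >= latest[slot][0]:
            latest[slot] = (time, new_state)

    result = {}
    for slot, (_, end_state) in latest.items():
        current = current_states.get(slot)  # None if the slot is not reported
        result[slot] = "fault" if current == end_state else "dropped"
    return result


if __name__ == "__main__":
    records = [(1, 3, "failed"), (2, 3, "online"), (1, 5, "failed")]
    current = {3: "failed", 5: "failed"}
    print(locate("degraded", current, records))
```

A slot that the monitoring tool no longer reports at all is treated here as a mismatch, which is one possible reading of a dropped disk; the source does not specify this case.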
The hard disk fault locating device based on RAID card logs according to embodiments of the present invention combines the current health information of the running hard disk with analysis of the RAID card log. It can thereby cover hard disk operating faults more completely, greatly improve the accuracy of hard disk monitoring and detection, and improve the operation and maintenance efficiency of the server.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following technologies well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (10)

1. A hard disk fault locating method based on RAID card logs, characterized in that an asynchronous real-time push interface is arranged between a disk array (RAID) card and a server, an asynchronous event processing engine is provided in the server, and the hard disk fault locating method comprises the following steps:
the RAID card pushes the RAID card log in real time to the asynchronous event processing engine through the asynchronous real-time push interface;
a monitoring tool analyzes the current state of the hard disk and, if a logical disk of the hard disk is in a degraded state or an offline state, determines that the hard disk has failed;
when it is determined that the hard disk has failed, the asynchronous event processing engine analyzes the RAID card log to obtain log information related to the disk drop, and pushes the log information related to the disk drop to the memory of the server to generate a local RAID card log;
the monitoring tool captures a plurality of transition log records of the physical disks of the hard disk in the local RAID card log, and obtains the final state of the hard disk from the plurality of transition log records; and
the monitoring tool compares the current state of the hard disk with the final state and, if the current state and the final state of the hard disk do not match, determines that a physical disk of the hard disk has dropped.
2. The hard disk fault locating method according to claim 1, characterized in that if the current state and the final state of the hard disk match, it is determined that the hard disk has failed.
3. The hard disk fault locating method according to claim 1, characterized in that, after obtaining the log information related to the disk drop, the asynchronous event processing engine further performs the following step:
formatting the log information related to the disk drop, and pushing the formatted log information to the memory of the server.
4. The hard disk fault locating method according to claim 1, characterized in that the transition log records record the state transitions of the hard disk, including: a transition from the normal state to the fault state, a transition from the fault state to the normal state, and a transition from the fault state to the abnormal state.
5. The hard disk fault locating method according to claim 1, characterized in that obtaining the final state of the hard disk from the plurality of transition log records comprises the following step:
analyzing the times of the plurality of transition log records and taking the transition log record with the latest time to obtain the final state of the hard disk.
6. A hard disk fault locating device based on RAID card logs, characterized by comprising: a monitoring tool, a RAID card, a server, and an asynchronous real-time push interface, wherein the asynchronous real-time push interface is located between the RAID card and the server,
the RAID card is configured to push the RAID card log to the server in real time through the asynchronous real-time push interface;
the server includes an asynchronous event processing engine, and the asynchronous event processing engine is configured to receive the RAID card log through the asynchronous real-time push interface and, when the hard disk fails, to analyze the RAID card log to obtain log information related to the disk drop and push the log information related to the disk drop to the memory of the server to generate a local RAID card log;
the monitoring tool is configured to analyze the current state of the hard disk; if a logical disk of the hard disk is in a degraded state or an offline state, to determine that the hard disk has failed; to capture a plurality of transition log records of the physical disks of the hard disk in the local RAID card log; to obtain the final state of the hard disk from the plurality of transition log records; and to compare the current state of the hard disk with the final state and, if the current state and the final state of the hard disk do not match, to determine that a physical disk of the hard disk has dropped.
7. The device according to claim 6, characterized in that the monitoring tool determines that the hard disk has failed when it finds that the current state and the final state of the hard disk match.
8. The device according to claim 6, characterized in that the asynchronous event processing engine is further configured to format the log information related to the disk drop and to push the formatted log information to the memory of the server.
9. The device according to claim 6, characterized in that the transition log records record the state transitions of the hard disk, including: a transition from the normal state to the fault state, a transition from the fault state to the normal state, and a transition from the fault state to the abnormal state.
10. The device according to claim 6, characterized in that the monitoring tool analyzes the times of the plurality of transition log records and takes the transition log record with the latest time to obtain the final state of the hard disk.
CN201310046008.7A 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log Active CN103207820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310046008.7A CN103207820B (en) 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310046008.7A CN103207820B (en) 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log

Publications (2)

Publication Number Publication Date
CN103207820A CN103207820A (en) 2013-07-17
CN103207820B true CN103207820B (en) 2016-06-29

Family

ID=48755049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310046008.7A Active CN103207820B (en) 2013-02-05 2013-02-05 The Fault Locating Method of hard disk and device based on raid card log

Country Status (1)

Country Link
CN (1) CN103207820B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995772A (en) * 2014-06-10 2014-08-20 浪潮电子信息产业股份有限公司 RAID card log completely-storing method based on LINUX operation system
CN105045689A (en) * 2015-06-25 2015-11-11 浪潮电子信息产业股份有限公司 Method for monitoring and alarming hard disks by using RAID card batch detection
CN105068901A (en) * 2015-07-27 2015-11-18 浪潮电子信息产业股份有限公司 Method for detecting magnetic disc
CN105117172B (en) * 2015-08-31 2019-04-02 深圳神州数码云科数据技术有限公司 A kind of disk array history falls the store method of disk record
CN105223889A (en) * 2015-10-13 2016-01-06 浪潮电子信息产业股份有限公司 Method for automatically monitoring PMC RAID card log suitable for production line
CN107577545B (en) * 2016-07-05 2021-02-02 北京金山云网络技术有限公司 Method and device for detecting and repairing fault disk
CN106250258B (en) * 2016-07-29 2019-03-29 北京云集智造科技有限公司 A kind of disk failure localization method and device
CN107515827B (en) * 2017-08-21 2021-07-27 湖南国科微电子股份有限公司 PCIE SSD custom log storage method and device and SSD
CN107766191A (en) * 2017-11-03 2018-03-06 郑州云海信息技术有限公司 The automatic detecting storage information of Linux systems and the method for testing of health status
CN108763020A (en) * 2018-05-23 2018-11-06 郑州云海信息技术有限公司 It is a kind of fall disk capture the method and monitor card of storage management card daily record automatically
CN108984119A (en) * 2018-06-28 2018-12-11 郑州云海信息技术有限公司 A kind of asynchronous method, apparatus and controlled terminal for obtaining RAID card information
CN111625390B (en) * 2020-05-28 2024-03-26 深圳市晶讯技术股份有限公司 Embedded equipment fault recovery method and device, embedded equipment and storage medium
CN112162705A (en) * 2020-09-30 2021-01-01 新浪网技术(中国)有限公司 RAID (redundant array of independent disk) set fault automatic offline repair reporting method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004061681A1 (en) * 2002-12-26 2004-07-22 Fujitsu Limited Operation managing method and operation managing server
CN101359959A (en) * 2008-09-17 2009-02-04 中兴通讯股份有限公司 Information acquisition method for fault locating analysis
CN101887387A (en) * 2010-04-07 2010-11-17 山东高效能服务器和存储研究院 Method for remotely intelligently monitoring and analyzing RAID faults
CN102662787A (en) * 2012-04-20 2012-09-12 浪潮电子信息产业股份有限公司 Method for protecting system disk RAID (redundant array of independent disks)

Also Published As

Publication number Publication date
CN103207820A (en) 2013-07-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant