CN104158843A

CN104158843A - Storage unit invalidation detecting method and device for distributed file storage system

Info

Publication number: CN104158843A
Application number: CN201410333913.5A
Authority: CN
Inventors: 李璐
Original assignee: SHENZHEN ZHONGBO KECHUANG INFORMATION TECHNOLOGY Co Ltd
Current assignee: Beijing Toyou Feiji Electronics Co., Ltd.
Priority date: 2014-07-14
Filing date: 2014-07-14
Publication date: 2014-11-19
Anticipated expiration: 2034-07-14
Also published as: CN104158843B

Abstract

The invention discloses a storage unit failure detection method for a distributed file storage system. The method comprises the following steps: acquiring, in sequence, the operation identifiers of the storage units of each node; and, when the operation identifier of a storage unit cannot be acquired, recording that storage unit as a failed storage unit and then either continuing with the other storage units of the same node or moving on, in sequence, to the storage units of the other nodes. The invention further discloses a storage unit failure detection device for a distributed file storage system. The method and device acquire the operation identifiers of the storage units of the nodes in sequence in order to determine and record failed storage units. Failed storage units in the nodes of a distributed file storage system can therefore be detected effectively and maintained by the user in time, which ensures the reliability of the system.

Description

Storage unit failure detection method and device for a distributed file storage system

Technical field

The present invention relates to failure detection in distributed file systems, and in particular to a storage unit failure detection method and device for a distributed file storage system.

Background art

In recent years, networked distributed storage has become a new trend in storage technology, and the distributed file system is an indispensable part of any large-scale distributed storage system. Data are spread across the storage units of different storage nodes. If several storage units fail and become unavailable, the data still exist in storage units on other nodes, so the accessing node can still read them normally. This is what gives the data high reliability. However, backups do not remove the risk entirely. As failed storage units keep accumulating, data may eventually be lost and can no longer be accessed normally, at which point the distributed file storage system fails and becomes unavailable.

There is therefore an urgent need for a way to detect storage unit failures. Such a scheme would find failed storage units in a distributed file storage system in time, so that they can be replaced promptly and the high reliability of the system is preserved.

Summary of the invention

The main purpose of the present invention is to solve the technical problem that a distributed file storage system cannot detect its failed storage units.

To achieve this purpose, the present invention provides a storage unit failure detection method for a distributed file storage system, comprising the following steps:

Acquiring, in sequence, the operation identifier of each storage unit of each node;

When the operation identifier of a storage unit cannot be acquired, recording that storage unit as a failed storage unit, and then either continuing to acquire the operation identifiers of the other storage units of the node where it is located, or acquiring, in sequence, the operation identifiers of the storage units of the other nodes.

Preferably, the step of recording a storage unit as a failed storage unit when its operation identifier cannot be acquired comprises:

When the operation identifier of a storage unit cannot be acquired, restarting that storage unit;

If the storage unit fails to restart within a preset first time interval, recording it as a failed storage unit.

Preferably, the method comprises further steps after the failed storage unit has been recorded and acquisition has continued, either with the other storage units of the same node or with the storage units of the other nodes:

Determining the number of failed storage units in the distributed file storage system;

When the number of failed storage units in the distributed file storage system is greater than a first threshold, determining that the distributed file storage system has failed.

Preferably, before the step of acquiring, in sequence, the operation identifiers of the storage units of each node, the method further comprises:

Controlling the nodes in the distributed file storage system to send detection packets to one another;

Taking each node of the distributed file storage system in turn as the second node, with the other nodes as first nodes, to determine the validity of the second node;

Determining, within the preset first time interval, the number of first nodes that have not received a response packet. A response packet is the reply the second node feeds back to the detection packet sent by a first node;

When the number of first nodes that have not received a response packet is greater than a preset second threshold, recording the second node as a failed node and shielding it.

Preferably, after the step of recording the second node as a failed node and shielding it, the method further comprises:

Determining the number of failed nodes in the distributed file storage system;

When the number of failed nodes in the distributed file storage system is less than a preset third threshold, determining that the distributed file storage system is valid.

In addition, to achieve the same purpose, the present invention also provides a storage unit failure detection device for a distributed file storage system, comprising:

An acquisition module, configured to acquire, in sequence, the operation identifiers of the storage units of each node. When the operation identifier of a storage unit cannot be acquired, it either continues with the other storage units of the node where that unit is located, or acquires, in sequence, the operation identifiers of the storage units of the other nodes;

A logging module, configured to record a storage unit as a failed storage unit when its operation identifier cannot be acquired.

Preferably, the logging module comprises:

A restart unit, configured to restart a storage unit when its operation identifier cannot be acquired;

A record unit, configured to record the storage unit as a failed storage unit if it fails to restart within the preset first time interval.

Preferably, the storage unit failure detection device further comprises:

A first determination module, configured to obtain the number of failed storage units in the distributed file storage system;

A second determination module, configured to determine that the distributed file storage system has failed when the number of failed storage units is greater than the first threshold;

A control module, configured to control the nodes of the distributed file storage system to send detection packets to one another;

A node validity detection module, configured to take each node of the distributed file storage system in turn as the second node, with the other nodes as first nodes, to determine the validity of the second node;

A third determination module, configured to determine, within the preset first time interval, the number of first nodes that have not received a response packet. A response packet is the reply the second node feeds back to the detection packet sent by a first node;

A shielding module, configured to record the second node as a failed node and shield it when the number of first nodes that have not received a response packet is greater than the preset second threshold;

A fourth determination module, configured to determine the number of failed nodes in the distributed file storage system;

A fifth determination module, configured to determine that the distributed file storage system is valid when the number of failed nodes is less than the preset third threshold.

The storage unit failure detection method and device of the present invention acquire the operation identifiers of the storage units in each node in sequence, and use them to determine and record failed storage units. Failed storage units in the nodes of a distributed file storage system can therefore be detected effectively and maintained by the user in time, which ensures the reliability of the system.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a first embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention;

Fig. 2 is a schematic flowchart of a second embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention;

Fig. 3 is a schematic flowchart of a third embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention;

Fig. 4 is a schematic flowchart of a fourth embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention;

Fig. 5 is a schematic flowchart of a fifth embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention;

Fig. 6 is a functional block diagram of a first embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention;

Fig. 7 is a functional block diagram of a second embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention;

Fig. 8 is a functional block diagram of a third embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention;

Fig. 9 is a functional block diagram of a fourth embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention;

Fig. 10 is a functional block diagram of a fifth embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention.

How the invention achieves its purpose, together with its functional features and advantages, is described further below in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description of the embodiments

It should be understood that the specific embodiments described herein are intended only to explain the present invention, not to limit it.

The present invention provides a storage unit failure detection method for a distributed file storage system (hereinafter simply "the storage unit failure detection method").

Referring to Fig. 1, which is a schematic flowchart of the first embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention.

In the first embodiment, the storage unit failure detection method comprises:

Step S10: acquire, in sequence, the operation identifiers of the storage units of each node.

In a distributed file storage system, each node contains a plurality of storage units. While a storage unit is running, it has a uniquely corresponding operation identifier, for example the process ID of the running process. While the system runs and each node works, the operation identifier of every running storage unit can be obtained in its node. If a storage unit in a node is not running, its operation identifier cannot be obtained there. Acquiring the operation identifiers of the storage units of each node in sequence therefore separates the storage units that are running in a node from those that are not, that is, from the faulty storage units.
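The text gives the process ID of the running process as an example of an operation identifier, but it does not say how that ID is read. The sketch below assumes, hypothetically, that each storage unit runs as its own process and writes its PID to a PID file. An operation identifier then "cannot be acquired" when the file is missing or garbled, or when the process it names is gone. The function name `get_operation_id` and the PID-file convention are illustrative assumptions. The `os.kill(pid, 0)` liveness check is POSIX-only.

```python
import os
from typing import Optional


def get_operation_id(pid_file: str) -> Optional[int]:
    """Return the PID of a storage unit's process, or None if it cannot be acquired.

    Hypothetical convention: the storage unit's daemon writes its PID to pid_file.
    POSIX only: os.kill(pid, 0) sends no signal, it just checks the process exists.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None  # no PID file, or unreadable / non-numeric contents
    if pid <= 0:
        return None  # 0 or negative values would address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return None  # stale PID file: the storage unit is not running
    except PermissionError:
        return pid  # the process exists but belongs to another user
    return pid
```

A storage unit that is not running then simply yields `None`, which is the "acquisition failed" condition used in step S20.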

Step S20: when the operation identifier of a storage unit cannot be acquired, record that storage unit as a failed storage unit. Then either continue with the other storage units of the node where it is located, or acquire, in sequence, the operation identifiers of the storage units of the other nodes.

If the operation identifier of a storage unit cannot be acquired, the storage unit is not running. It is faulty and unusable, so it is recorded as a failed storage unit, and acquisition continues with the other storage units of the same node or with the storage units of other nodes. After a failed storage unit is recorded, a maintenance request can be sent to a maintenance terminal, such as a server or a terminal carried by maintenance personnel. The request serves as a timely reminder to repair or replace the failed storage unit, which keeps the distributed file storage system reliable.
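Steps S10 and S20 amount to a nested loop that records each failure and keeps going rather than stopping at the first one. The following is an illustrative sketch only. The node-to-units mapping and the `get_operation_id(node, unit)` callback are hypothetical stand-ins for however a real system looks up operation identifiers. Sending maintenance requests is left to the caller.

```python
from typing import Callable, Dict, List, Optional, Tuple


def scan_storage_units(
    nodes: Dict[str, List[str]],
    get_operation_id: Callable[[str, str], Optional[int]],
) -> List[Tuple[str, str]]:
    """Steps S10/S20: visit every unit of every node in order and record failures."""
    failed: List[Tuple[str, str]] = []
    for node, units in nodes.items():  # nodes in sequence (dicts keep insertion order)
        for unit in units:  # storage units of this node in sequence
            if get_operation_id(node, unit) is None:
                failed.append((node, unit))  # record as a failed storage unit
            # either way, move on to the next unit and then to the next node
    return failed
```

The returned list could then drive one maintenance request per failed unit.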

The storage unit failure detection method of this embodiment acquires the operation identifiers of the storage units in each node in sequence, and uses them to determine and record failed storage units. Failed storage units in the nodes of the distributed file storage system can therefore be detected effectively and maintained by the user in time, which ensures the reliability of the system.

Referring to Fig. 2, which is a schematic flowchart of the second embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention.

The second embodiment is based on the first embodiment. In the second embodiment, the part of step S20 that records a storage unit as failed when its operation identifier cannot be acquired comprises:

Step S21: when the operation identifier of a storage unit cannot be acquired, restart that storage unit.

Some storage units stop running because of faults that a restart can clear, after which they resume normal operation. For this reason, a storage unit whose operation identifier cannot be acquired is first restarted, and some storage units recover immediately. This keeps as many storage units running as possible, maximizes the reliability of the distributed file storage system, and reduces the maintenance workload of the maintenance personnel.

Step S22: if the storage unit fails to restart within the preset first time interval, record it as a failed storage unit.

A storage unit whose fault can be cleared by restarting usually restarts successfully within a specified time, the first time interval. A storage unit whose fault cannot be cleared this way does not. The result of the restart is returned once the first time interval has elapsed after the faulty storage unit was restarted.

If the returned result is a restart failure, meaning the storage unit did not restart within the preset first time interval, the storage unit is recorded as a failed storage unit. If the returned result is a restart success, the storage unit has resumed normal operation. Its operation identifier can now be acquired, so it is judged to be valid. Acquisition then continues in sequence with the other storage units of the same node, or with the storage units of the other nodes.
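Steps S21 and S22 can be sketched as "restart, then wait up to the first time interval for an operation identifier to reappear". The `restart` and `get_operation_id` callbacks and the `poll_every` parameter are hypothetical. The sketch also assumes a polling strategy that returns as soon as the unit is back, whereas the text describes returning the result after the interval has elapsed. Both give the same success or failure outcome.

```python
import time
from typing import Callable, Optional


def restart_and_check(
    restart: Callable[[], None],
    get_operation_id: Callable[[], Optional[int]],
    first_interval: float,
    poll_every: float = 0.5,
) -> bool:
    """Step S21: restart the unit. Step S22: report whether it came back within
    first_interval seconds (False means: record it as a failed storage unit)."""
    restart()
    deadline = time.monotonic() + first_interval
    while True:
        if get_operation_id() is not None:
            return True  # restarted successfully: the unit is valid again
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # did not restart within the first time interval
        time.sleep(min(poll_every, remaining))  # never sleep past the deadline
```

`time.monotonic()` is used so that wall-clock adjustments cannot shorten or extend the interval.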

In the storage unit failure detection method of this embodiment, a storage unit whose operation identifier cannot be acquired is restarted. Storage units with faults that a restart can clear resume normal operation immediately. Storage units whose faults a restart cannot clear are recorded as failed storage units. This keeps as many storage units running as possible, ensures the reliability of the distributed file storage system, and reduces the maintenance workload of the maintenance personnel.

Referring to Fig. 3, which is a schematic flowchart of the third embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention.

The third embodiment is based on the first or the second embodiment. In the third embodiment, the storage unit failure detection method further comprises, after step S20:

Step S30: determine the number of failed storage units in the distributed file storage system.

Once the operation identifiers of the storage units of every node have been acquired in sequence and all failed storage units have been recorded, the total number of recorded failed storage units is determined.

Step S40: when the number of failed storage units in the distributed file storage system is greater than the first threshold, determine that the distributed file storage system has failed.

The preset first threshold is preferably half of the total number of storage units on all available (non-failed) nodes in the distributed file storage system. When the number of failed storage units exceeds the first threshold, data transmission and access in the system are considered prone to errors, for example data that cannot be accessed or data that are returned incorrectly, and the reliability of the stored data is low. The distributed file storage system is therefore determined to have failed and stops running.
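Step S40 with the preferred first threshold reduces to a strict comparison against half the number of storage units on the available nodes. A minimal sketch follows. The function name is hypothetical, and the `first_threshold` override is an assumption added only to reflect that the threshold is "preferably", not necessarily, one half.

```python
from typing import Optional


def storage_system_failed(
    failed_units: int,
    units_on_available_nodes: int,
    first_threshold: Optional[float] = None,
) -> bool:
    """Step S40: the system has failed when failed units exceed the first threshold."""
    if first_threshold is None:
        first_threshold = units_on_available_nodes / 2  # preferred value in the text
    return failed_units > first_threshold  # strictly greater, as in the text
```

With 10 units on available nodes, 5 failed units do not trigger a system failure, but 6 do.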

In the storage unit failure detection method of this embodiment, when the number of failed storage units in the distributed file storage system exceeds the first threshold, the system is determined to have failed and is stopped. This prevents continued operation from causing data loss and abnormal data access.

Referring to Fig. 4, which is a schematic flowchart of the fourth embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention.

The fourth embodiment is based on any one of the first to third embodiments. In the fourth embodiment, the storage unit failure detection method further comprises, before step S10:

Step S50: control the nodes in the distributed file storage system to send detection packets to one another.

In this embodiment, the nodes can be made to send detection packets to one another, so that the nodes of the distributed file storage system check each other's running state.

Step S60: take each node in the distributed file storage system in turn as the second node, with the other nodes as first nodes, to determine the validity of the second node.

For example, suppose the distributed file storage system has four nodes A, B, C and D. With B as the second node, A, C and D are the first nodes, and the method determines whether B is valid. After B has been checked, C can be checked next according to a preset order, and so on until all nodes have been checked.
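The round-robin order in step S60 can be written as a generator that yields each second node together with its first nodes. This is an illustrative sketch. The function name is hypothetical, and the "preset order" is assumed to be the order of the input list.

```python
from typing import Iterator, List, Tuple


def probe_rounds(nodes: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Step S60: yield (second_node, first_nodes) for each node in the preset order."""
    for second in nodes:
        firsts = [n for n in nodes if n != second]  # every other node probes it
        yield second, firsts
```

For nodes A, B, C and D, the second round yields B with first nodes A, C and D, matching the example above.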

Step S70: determine the number of first nodes that have not received a response packet within the preset first time interval. A response packet is the reply the second node feeds back to the detection packet sent by a first node.

In this embodiment, whenever the second node receives a packet, it parses the packet to determine its type. If the packet is a detection packet, the second node feeds a response packet back to the first node that sent it. A first node may still not receive the response from the second node, for several reasons: (a) the communication link fails; (b) the first node is faulty and does not send the detection packet; (c) the second node is faulty and does not send the response packet.
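On the second node's side, the behaviour described above is "parse the type and answer only detection packets". The text does not define a packet format. The sketch below assumes a hypothetical dictionary with `type`, `from` and `seq` fields, and echoes the sequence number so a first node could match a response to its probe.

```python
from typing import Optional


def handle_packet(packet: dict, own_id: str) -> Optional[dict]:
    """Second-node side of step S70 (hypothetical packet format)."""
    if packet.get("type") != "detect":
        return None  # not a detection packet: no response is fed back
    # Reply to the sender, echoing its sequence number for matching.
    return {"type": "response", "from": own_id, "to": packet.get("from"), "seq": packet.get("seq")}
```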

In this embodiment, the number of first nodes that have not received a response packet from the second node can be determined in either of two ways.

(a) When a first node does not receive a response packet within the preset first time interval, the second node is recorded as untrusted with respect to that first node, and the identifier of the first node (such as its name or code) is recorded. The number of recorded first-node identifiers is the number of first nodes that have not received a response packet from the second node.

(b) When a first node does not receive a response packet within the preset first time interval, the second node is recorded as an untrusted node. This recording can be implemented in several ways. For example, a trusted-node database and an untrusted-node database can be set up, and the identifier of the second node (such as its name or code) is added to the untrusted-node database when the node is recorded as untrusted. Alternatively, an untrusted flag is attached to the second node each time it is recorded as untrusted, and the number of such records is taken as the number of first nodes that have not received a response packet from the second node.
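Option (a) above can be modelled as a map from each second node to the set of first nodes that reported a missing response. The class and method names are hypothetical. Using a set is a design choice: it makes repeated reports from the same first node count once. The counting variant of option (b) would instead increment a counter on every report.

```python
from collections import defaultdict
from typing import Dict, Set


class UntrustedNodeRecord:
    """Option (a): for each second node, the first nodes that got no response."""

    def __init__(self) -> None:
        self._reporters: Dict[str, Set[str]] = defaultdict(set)

    def report_missing_response(self, second: str, first: str) -> None:
        # `second` is now untrusted with respect to `first`; remember who said so.
        self._reporters[second].add(first)

    def missing_count(self, second: str) -> int:
        # Number of first nodes that did not receive a response from `second`.
        return len(self._reporters.get(second, ()))
```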

Step S80: when the number of first nodes that have not received a response packet is greater than the preset second threshold, record the second node as a failed node and shield it.

A failed node cannot be used, so detecting storage unit failures inside it is pointless, and a node may contain many storage units. To make the storage unit failure detection method more efficient, this embodiment shields every detected failed node. The operation identifiers of a shielded node's storage units are not acquired, which avoids meaningless detection. The second threshold can be set by the user and is preferably half the number of first nodes. The second node is then recorded as a failed node and shielded only when most first nodes have not received its response packet.
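Step S80 and its effect on the later unit scan can be sketched as two small functions. The names are hypothetical. The second threshold defaults to the preferred value of half the number of first nodes, compared strictly as in the text.

```python
from typing import List, Set


def shield_if_failed(second: str, missing: int, first_count: int, shielded: Set[str]) -> bool:
    """Step S80: record `second` as a failed node and shield it when more than
    the second threshold of first nodes received no response from it."""
    second_threshold = first_count / 2  # preferred value; user-configurable per the text
    if missing > second_threshold:
        shielded.add(second)
        return True
    return False


def nodes_to_scan(nodes: List[str], shielded: Set[str]) -> List[str]:
    """Shielded nodes are skipped, so their units' operation IDs are never requested."""
    return [n for n in nodes if n not in shielded]
```

With three first nodes, two missing responses (more than 1.5) shield the second node, while a single missing response, for example from one bad link, does not.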

In the storage unit failure detection method of this embodiment, failed nodes in the distributed file storage system are detected and shielded before the operation identifiers of the storage units of each node are acquired in sequence. The operation identifiers of the storage units on failed nodes are not acquired, and no storage unit failure detection is performed on those nodes, which significantly improves the efficiency of storage unit failure detection.

Referring to Fig. 5, which is a schematic flowchart of the fifth embodiment of the storage unit failure detection method for a distributed file storage system according to the present invention.

The fifth embodiment is based on the fourth embodiment. In the fifth embodiment, the storage unit failure detection method further comprises, after step S80 and before step S10:

Step S90: determine the number of failed nodes in the distributed file storage system.

Step S100: when the number of failed nodes in the distributed file storage system is less than the preset third threshold, determine that the distributed file storage system is valid.

In this embodiment, the preset third threshold is preferably half the number of nodes in the distributed file storage system. When most nodes are unavailable, that is, when most are failed nodes, the system is considered unable to transfer data and is determined to have failed. The system is then unavailable, and detecting storage unit failures in it has no meaning. Only when the number of failed nodes is less than the third threshold is the distributed file storage system determined to be valid, and only then is storage unit failure detection meaningful. After failed nodes are recorded and the system is determined to have failed, a maintenance request can be sent to a maintenance terminal, such as a server or a terminal carried by maintenance personnel, so that the failed nodes and the distributed file storage system are restored promptly.
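Steps S90 and S100 act as a gate in front of step S10. A minimal sketch, assuming the preferred third threshold of half the node count and a caller-supplied `scan` function (hypothetical) that performs the unit-level detection on the remaining nodes.

```python
from typing import Callable, List, Optional, Set, TypeVar

T = TypeVar("T")


def system_valid(failed_nodes: int, total_nodes: int) -> bool:
    """Step S100: valid only if failed nodes are fewer than the third threshold."""
    third_threshold = total_nodes / 2  # preferred value in the text
    return failed_nodes < third_threshold


def run_detection(
    nodes: List[str], shielded: Set[str], scan: Callable[[List[str]], T]
) -> Optional[T]:
    """Run unit-level detection (step S10 onward) only on a valid system."""
    if not system_valid(len(shielded), len(nodes)):
        return None  # system considered failed: skip storage unit detection
    return scan([n for n in nodes if n not in shielded])
```

Returning `None` marks the case in which a maintenance request for the whole system would be sent instead.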

In the storage unit failure detection method of this embodiment, the method first determines whether the distributed file storage system is available, before acquiring the operation identifiers of the storage units of each node in sequence. Failure detection is performed on the storage units of the nodes only when the system is available. This avoids meaningless storage unit failure detection on a distributed file storage system that has already failed.

The present invention also provides a storage unit failure detection device for a distributed file storage system (hereinafter simply "the storage unit failure detection device").

Referring to Fig. 6, which is a functional block diagram of the first embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention.

In the first embodiment, the storage unit failure detection device comprises:

Acquisition module 10, configured to acquire, in sequence, the operation identifiers of the storage units of each node. When the operation identifier of a storage unit cannot be acquired, it either continues with the other storage units of the node where that unit is located, or acquires, in sequence, the operation identifiers of the storage units of the other nodes;

In a distributed file storage system, each node contains a plurality of storage units. While a storage unit is running, it has a uniquely corresponding operation identifier, for example the process ID of the running process. While the system runs and each node works, the operation identifier of every running storage unit can be obtained in its node. If a storage unit in a node is not running, its operation identifier cannot be obtained there. Acquisition module 10 acquires the operation identifiers of the storage units of each node in sequence, and so separates the storage units that are running in a node from those that are not, that is, from the faulty storage units.

Logging module 20, configured to record a storage unit as a failed storage unit when its operation identifier cannot be acquired.

If acquisition module 10 cannot obtain the operation identifier of a storage unit, the storage unit is not running. It is faulty and unusable, so logging module 20 records it as a failed storage unit, and acquisition module 10 continues with the other storage units of the same node or with the storage units of other nodes. After logging module 20 records a failed storage unit, a maintenance request can be sent to a maintenance terminal, such as a server or a terminal carried by maintenance personnel. The request serves as a timely reminder to repair or replace the failed storage unit, which keeps the distributed file storage system reliable.

In the storage unit failure detection device of this embodiment, acquisition module 10 acquires the operation identifiers of the storage units in each node in sequence to determine failed storage units, and logging module 20 records them. Failed storage units in the nodes of the distributed file storage system can therefore be detected effectively and maintained by the user in time, which ensures the reliability of the system.

Referring to Fig. 7, which is a functional block diagram of the second embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention.

The second embodiment is based on the first embodiment. In the second embodiment, logging module 20 of the storage unit failure detection device comprises:

Restart unit 21, configured to restart a storage unit when its operation identifier cannot be acquired;

Some storage units stop running because of faults that a restart can clear, after which they resume normal operation. For this reason, when acquisition module 10 cannot acquire the operation identifier of a storage unit, restart unit 21 restarts it, and some storage units recover immediately. This keeps as many storage units running as possible, maximizes the reliability of the distributed file storage system, and reduces the maintenance workload of the maintenance personnel.

Record unit 22, configured to record the storage unit as a failed storage unit if it fails to restart within the preset first time interval.

A storage unit whose fault restart unit 21 can clear usually restarts successfully within a specified time, the first time interval. A storage unit whose fault cannot be cleared by restarting does not. The result of the restart is returned once the first time interval has elapsed after the faulty storage unit was restarted.

If the returned result is a restart failure, meaning the storage unit did not restart within the preset first time interval, record unit 22 records the storage unit as a failed storage unit. If the returned result is a restart success, the storage unit has resumed normal operation. Acquisition module 10 can now acquire its operation identifier and judges the unit to be valid. It then continues in sequence with the other storage units of the same node, or with the storage units of the other nodes.

In the storage unit failure detection device of this embodiment, restart unit 21 restarts any storage unit whose operation identifier cannot be acquired. Storage units with faults that a restart can clear resume normal operation immediately. Record unit 22 records the storage units whose faults a restart cannot clear as failed storage units. This keeps as many storage units running as possible, ensures the reliability of the distributed file storage system, and reduces the maintenance workload of the maintenance personnel.
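The module structure of the first and second device embodiments can be mirrored as two cooperating classes. In this sketch, acquisition module 10 walks the units, and logging module 20 delegates to restart unit 21 before record unit 22 records a failure. The class names follow the text, but the callback types `Lookup` and `Restart` are hypothetical interfaces, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces: (node, unit) -> operation ID or None,
# and (node, unit) -> True if the unit restarted within the first time interval.
Lookup = Callable[[str, str], Optional[int]]
Restart = Callable[[str, str], bool]


@dataclass
class LoggingModule:
    """Logging module 20, composed of restart unit 21 and record unit 22."""
    restart_unit: Restart
    failed: List[Tuple[str, str]] = field(default_factory=list)

    def on_lookup_failed(self, node: str, unit: str) -> None:
        if not self.restart_unit(node, unit):  # restart unit 21
            self.failed.append((node, unit))  # record unit 22


@dataclass
class AcquisitionModule:
    """Acquisition module 10: visits every unit of every node in order."""
    lookup: Lookup
    logging_module: LoggingModule

    def scan(self, nodes: Dict[str, List[str]]) -> None:
        for node, units in nodes.items():
            for unit in units:
                if self.lookup(node, unit) is None:
                    self.logging_module.on_lookup_failed(node, unit)
                # either way, continue with the next unit, then the next node
```

Keeping restart and recording behind the logging module means acquisition module 10 only has to report lookup failures, which matches the division of responsibilities described above.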

Referring to Fig. 8, Fig. 8 is a functional block diagram of a third embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention.

The third embodiment is based on the first or second embodiment. In the third embodiment, the storage unit failure detection device further includes:

a first determination module 30, configured to determine the number of failed storage units in the distributed file storage system;

After the acquisition module 10 has acquired the operation marks of the storage units of each node in turn and the recording module 20 has recorded all failed storage units, the first determination module 30 determines the total number of recorded failed storage units.

a second determination module 40, configured to determine that the distributed file storage system has failed when the number of failed storage units in the distributed file storage system is greater than a first threshold.

The preset first threshold is preferably half of the total number of storage units on all available (non-failed) nodes in the distributed file storage system. When the number of failed storage units exceeds the first threshold, the second determination module 40 considers that data transfer and data access in the distributed file storage system are prone to anomalies (for example, data cannot be accessed, or the accessed data is incorrect) and that the reliability of its data is low. It therefore determines that the distributed file storage system has failed, and the system is taken out of service.
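The counting and threshold check performed by the first and second determination modules might look like the sketch below. It assumes, hypothetically, that failed storage units are recorded per node in a dictionary, and it uses the preferred first threshold from the description (half the storage units on all available nodes); the function names are illustrative only.

```python
from typing import Dict, List


def count_failed_units(failed_by_node: Dict[str, List[str]]) -> int:
    """First determination module (sketch): total number of recorded failed storage units."""
    return sum(len(units) for units in failed_by_node.values())


def system_failed_by_units(failed_count: int, units_on_available_nodes: List[int]) -> bool:
    """Second determination module (sketch): failed if the count exceeds the first threshold.

    Assumption: first threshold = half the storage units on all available nodes,
    the preferred value given in the description.
    """
    first_threshold = sum(units_on_available_nodes) / 2
    return failed_count > first_threshold


failed = {"A": ["a1", "a2"], "B": [], "C": ["c4"]}
n = count_failed_units(failed)
print(n, system_failed_by_units(n, [4, 4, 4]))  # 3 False
```

The comparison is strict ("greater than"), matching the wording of the second determination module, so a failure count exactly equal to half of the storage units does not yet mark the system as failed.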

Referring to Fig. 9, Fig. 9 is a functional block diagram of a fourth embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention.

The fourth embodiment is based on any one of the first to third embodiments. In the fourth embodiment, the storage unit failure detection device further includes:

a control module 50, configured to control the nodes of the distributed file storage system to send detection packets to one another;

In this embodiment, the control module 50 controls the nodes to exchange detection packets, so that the nodes of the distributed file storage system can detect one another's running state.

a node validity detection module 60, configured to take each node of the distributed file storage system in turn as a second node and the other nodes as first nodes, so as to determine the validity of the second node;

For example, if the distributed file storage system has four nodes A, B, C and D, the node validity detection module 60 takes node B as the second node and nodes A, C and D as first nodes to judge whether node B is valid. After that, the node validity detection module 60 continues, in a preset order, to judge whether node C is valid, and so on until all nodes have been checked.
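The rotation of roles described above can be expressed as a small generator. This is an illustrative sketch: `detection_rounds` is a hypothetical name, and the order of the input list stands in for the unspecified "preset order".

```python
from typing import Iterator, List, Tuple


def detection_rounds(nodes: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (second_node, first_nodes) pairs, one per node, in the given order.

    The order of `nodes` stands in for the patent's "preset order" (assumption).
    """
    for second in nodes:
        # Every other node acts as a first node that probes the second node.
        first_nodes = [n for n in nodes if n != second]
        yield second, first_nodes


for second, firsts in detection_rounds(["A", "B", "C", "D"]):
    print(second, firsts)
```

Each node is judged exactly once, and in each round all remaining nodes act as probers, which is what allows a majority vote on the second node's validity.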

a third determination module 70, configured to determine, within the preset first time interval, the number of first nodes that have not received a response packet, the response packet being fed back by the second node in response to the detection packet sent by the first node;

In this embodiment, the third determination module 70 can determine the number of first nodes that have not received a response packet from the second node in either of the following ways. (a) When a first node does not receive a response packet within the preset first time interval, the second node is recorded as an untrusted node with respect to that first node, and the identifier of that first node (such as its name and code) is recorded; the number of recorded first-node identifiers is the number of first nodes that have not received a response packet from the second node. (b) When a first node does not receive a response packet within the preset first time interval, the second node is recorded as an untrusted node. This recording can be implemented in several ways. For example, a trusted-node database and an untrusted-node database are established, and when the second node is recorded as an untrusted node, its identifier (such as its name and code) is added to the untrusted-node database. Alternatively, when the second node is recorded as an untrusted node, an untrusted mark is added to it, and the number of times the second node has been recorded as an untrusted node is obtained; that number is the number of first nodes that have not received a response packet from the second node.
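Both counting schemes can be kept side by side in one small ledger, as sketched below. The class and function names are hypothetical, and the real exchange of detection and response packets with a timeout is replaced by a caller-supplied predicate `responded_in_time(first, second)`; a production system would instead send packets over the network and wait up to the first time interval.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class UntrustedLedger:
    """Hypothetical bookkeeping for both counting schemes described above."""

    def __init__(self) -> None:
        # Scheme (a): second node -> identifiers of first nodes that got no response.
        self.no_response_from: Dict[str, Set[str]] = defaultdict(set)
        # Scheme (b): second node -> number of times it was recorded as untrusted.
        self.untrusted_count: Dict[str, int] = defaultdict(int)

    def record_timeout(self, first: str, second: str) -> None:
        # Ignore duplicate reports so schemes (a) and (b) stay in agreement.
        if first not in self.no_response_from[second]:
            self.no_response_from[second].add(first)
            self.untrusted_count[second] += 1


def probe(second: str, first_nodes: List[str],
          responded_in_time: Callable[[str, str], bool],
          ledger: UntrustedLedger) -> int:
    """Each first node probes the second node; count first nodes with no timely response."""
    for first in first_nodes:
        if not responded_in_time(first, second):
            ledger.record_timeout(first, second)
    return len(ledger.no_response_from[second])


down = {"B"}
# Assumption: a probe succeeds only if neither endpoint is down.
ok = lambda first, second: first not in down and second not in down
ledger = UntrustedLedger()
print(probe("B", ["A", "C", "D"], ok, ledger))  # 3
print(probe("C", ["A", "B", "D"], ok, ledger))  # 1
```

The second call shows why a single missing response is not enough to condemn a node: node C is healthy, but the failed node B cannot complete its probe of C.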

a shielding module 80, configured to record the second node as a failed node and shield the second node when the number of first nodes that have not received a response packet is greater than a preset second threshold.

A failed node cannot be used, so detecting storage unit failures in it is meaningless, and the more storage units a node contains, the more work such detection wastes. To improve the efficiency of the storage unit failure detection method, the shielding module 80 therefore shields each detected failed node so that the operation marks of its storage units are not acquired, avoiding meaningless detection. The second threshold can be set by the user; preferably it is half of the number of first nodes, so that the shielding module 80 records the second node as a failed node and shields it only when most of the first nodes have not received a response packet from the second node.
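A sketch of the shielding decision and of how shielded nodes would be excluded from operation-mark acquisition is given below. The second threshold uses the preferred value from the description (half the number of first nodes); `should_shield` and `nodes_to_scan` are hypothetical helper names.

```python
from typing import Dict, List, Set


def should_shield(missing_responses: int, num_first_nodes: int) -> bool:
    """Shielding module (sketch): shield when missing responses exceed the second threshold.

    Assumption: second threshold = half the number of first nodes (the preferred value).
    """
    second_threshold = num_first_nodes / 2
    return missing_responses > second_threshold


def nodes_to_scan(nodes: List[str], shielded: Set[str]) -> List[str]:
    """Only unshielded nodes are passed on for operation-mark acquisition."""
    return [n for n in nodes if n not in shielded]


# Missing-response counts per second node, each probed by 3 first nodes.
missing: Dict[str, int] = {"A": 0, "B": 3, "C": 1, "D": 0}
shielded = {node for node, m in missing.items() if should_shield(m, 3)}
print(sorted(shielded), nodes_to_scan(["A", "B", "C", "D"], shielded))
```

With three first nodes the threshold is 1.5, so node C (one missing response, likely caused by the failed node B) stays in the scan while node B is shielded.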

In the storage unit failure detection device proposed in this embodiment, before the acquisition module 10 acquires the operation marks of the storage units of each node in turn, the failed nodes in the distributed file storage system are detected first and shielded by the shielding module 80. The operation marks of the storage units of failed nodes are not acquired, so storage unit failure detection is not performed on failed nodes, which significantly improves the efficiency of storage unit failure detection.

Referring to Fig. 10, Fig. 10 is a functional block diagram of a fifth embodiment of the storage unit failure detection device for a distributed file storage system according to the present invention.

The fifth embodiment is based on the fourth embodiment. In the fifth embodiment, the storage unit failure detection device further includes:

a fourth determination module 90, configured to determine the number of failed nodes in the distributed file storage system;

a fifth determination module 100, configured to determine that the distributed file storage system is valid when the number of failed nodes in the distributed file storage system is less than a preset third threshold.

In this embodiment, the preset third threshold is preferably half of the number of nodes in the distributed file storage system. When most nodes of the distributed file storage system are unavailable (that is, most nodes are failed nodes), the fifth determination module 100 considers that the system cannot perform data transfer and determines that it has failed. The system is then unavailable, and failure detection of its storage units is no longer meaningful. Only when the number of failed nodes is less than the third threshold does the fifth determination module 100 determine that the distributed file storage system is valid, and only then is failure detection of its storage units meaningful. After failed nodes are recorded and the distributed file storage system is determined to have failed, a maintenance request can be sent to a maintenance terminal (such as a server or a terminal carried by maintenance personnel), so that the failed nodes and the distributed file storage system are restored to normal in time.
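The validity gate of the fifth determination module can be sketched as follows, assuming the preferred third threshold (half the number of nodes). The maintenance-request channel is not specified in the description, so it is modeled here as a hypothetical callback `send_maintenance_request`; the message text is likewise illustrative.

```python
from typing import Callable, List


def system_valid(failed_nodes: int, total_nodes: int) -> bool:
    """Fifth determination module (sketch): valid when failed nodes < third threshold.

    Assumption: third threshold = half the number of nodes (the preferred value).
    """
    third_threshold = total_nodes / 2
    return failed_nodes < third_threshold


def gate_unit_detection(failed_nodes: int, total_nodes: int,
                        send_maintenance_request: Callable[[str], None]) -> bool:
    """Return True if storage unit detection should proceed; otherwise request maintenance."""
    if system_valid(failed_nodes, total_nodes):
        return True
    # Hypothetical notification hook standing in for the maintenance terminal.
    send_maintenance_request(
        f"system failed: {failed_nodes} of {total_nodes} nodes failed")
    return False


requests: List[str] = []
print(gate_unit_detection(1, 4, requests.append))  # True
print(gate_unit_detection(2, 4, requests.append))  # False
print(requests)
```

Because the comparison is strict ("less than"), a system in which exactly half of the nodes have failed is treated as failed, and storage unit detection is skipped.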

In the storage unit failure detection device of this embodiment, before the acquisition module 10 acquires the operation marks of the storage units of each node in turn, the fifth determination module 100 first determines whether the distributed file storage system is available, and failure detection is performed on the storage units of its nodes only when the system is determined to be available. This avoids meaningless storage unit failure detection when the distributed file storage system has failed.

The above are only preferred embodiments of the present invention and do not thereby limit the scope of its claims. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims

1. A storage unit failure detection method for a distributed file storage system, characterized in that the method comprises the following steps:

acquiring, in turn, the operation marks of the storage units of each node;

2. The storage unit failure detection method for a distributed file storage system according to claim 1, characterized in that the step of recording a storage unit as a failed storage unit when acquisition of its operation mark fails comprises:

3. The storage unit failure detection method for a distributed file storage system according to claim 1, characterized in that, after the step of recording a storage unit as a failed storage unit when acquisition of its operation mark fails and continuing to acquire, in turn, the operation marks of the other storage units in that node, or acquiring, in turn, the operation marks of the storage units of the other nodes, the method further comprises:

4. The storage unit failure detection method for a distributed file storage system according to any one of claims 1 to 3, characterized in that, before the step of acquiring, in turn, the operation marks of the storage units of each node, the method further comprises:

5. The storage unit failure detection method for a distributed file storage system according to claim 4, characterized in that, after the step of recording the second node as a failed node and shielding the second node when the number of first nodes that have not received a response packet is greater than a preset second threshold, the method further comprises:

6. A storage unit failure detection device for a distributed file storage system, characterized in that the device comprises:

7. The storage unit failure detection device for a distributed file storage system according to claim 6, characterized in that the recording module comprises:

8. The storage unit failure detection device for a distributed file storage system according to claim 6, characterized in that the device further comprises:

9. The storage unit failure detection device for a distributed file storage system according to any one of claims 6 to 8, characterized in that the device further comprises:

10. The storage unit failure detection device for a distributed file storage system according to claim 9, characterized in that the device further comprises: