CN107547252A

CN107547252A - Network failure processing method and device

Info

Publication number: CN107547252A
Application number: CN201710515775.6A
Authority: CN
Inventors: 马春燕; 陈杰
Original assignee: New H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2018-01-05
Anticipated expiration: 2037-06-29
Also published as: CN107547252B

Abstract

The invention discloses a network failure processing method and device. A distributed storage system includes a service network and a storage network that are deployed separately. A cluster monitor node (MON) and multiple object storage devices (OSDs) are connected to the service network for communication, and the OSDs are also connected to the storage network for communication. The method is applied to an OSD and includes: detecting the storage network link between the OSD and the OSD paired with it, to obtain link state information for the OSD's own side; and, if the link state on the OSD's own side is found to be abnormal, actively terminating the OSD's daemon process, so that the OSD cannot send to the MON, over the service network, information that the OSD paired with it has failed. Because an OSD in the storage system terminates its own daemon process when it detects an abnormal link state on its own side, false reports that other OSDs have failed are prevented, which solves the problem of OSD flapping in the storage network.

Description

Network failure processing method and device

Technical field

The present invention relates to the field of communication technology, and in particular to a network failure processing method and device.

Background

Ceph is an open-source distributed storage system. It has become one of the most widely used storage systems and is currently one of the most popular open-source storage projects. Ceph offers high performance, high reliability, and high scalability, and includes object storage devices (Object Storage Device, OSD) and cluster monitor nodes (Monitor, MON). An OSD provides storage resources: it can provide storage when its state is normal (the up state) and cannot be read or written normally when its state is abnormal (the down state). Each OSD has its own daemon process (OSD daemon), which carries out all of the OSD's logical functions, including communicating with the MON and with other OSDs to maintain and update the system state. The MON receives the state reports sent by OSDs and updates and distributes the OSD state information (OSDMap), in order to maintain the global state of the whole Ceph cluster.

However, when a Ceph cluster is used in a production environment, network connectivity is one of the most important factors affecting its operation. The network structure recommended for Ceph clusters deploys the service network (public network) and the storage network (cluster network) separately. The service network mainly carries user data, the state information exchanged between OSD and MON, between MON and MON, and between OSDs, and heartbeat communication. The storage network is mainly used for heartbeat communication between OSDs and for cluster-internal data such as recovery, replication, and scrubbing. In robustness and reliability testing, a transient network interruption or other network problem can trigger OSD flapping in the storage network: some or all OSDs are repeatedly set to the up or down state, which interrupts service.

Summary of the invention

This application provides a network failure processing method and device, to solve the storage network OSD flapping problem in Ceph caused by transient network interruptions or other network problems.

According to one aspect of this application, a network failure processing method is provided. A distributed storage system includes a service network and a storage network that are deployed separately; a cluster monitor node (MON) and multiple object storage devices (OSDs) are connected to the service network for communication, and the multiple OSDs are also connected to the storage network for communication. The method is applied to an OSD and includes:

detecting the storage network link between the OSD and the OSD paired with it, to obtain link state information for the OSD's side;

if the link state on the OSD's side is found to be abnormal, actively terminating the OSD's daemon process, to prevent the OSD from sending to the MON, over the service network, information that the OSD paired with it has failed.

According to another aspect of this application, a network failure processing device is provided. A distributed storage system includes a service network and a storage network that are deployed separately; a cluster monitor node (MON) and multiple object storage devices (OSDs) are connected to the service network for communication, and the multiple OSDs are also connected to the storage network for communication. The device is applied to an OSD and includes:

a link detection unit, configured to detect the storage network link between the OSD where the device is located (the host OSD) and the OSD paired with the host OSD, to obtain link state information for the host OSD's side;

a processing unit, configured to actively terminate the host OSD's daemon process when the link state on the host OSD's side is found to be abnormal, to prevent the host OSD from sending to the MON, over the service network, information that the OSD paired with the host OSD has failed.

The beneficial effect of this application is as follows. The network failure processing method of this application is applied in the OSDs of a Ceph storage network. The OSD detects the storage network link between itself and the OSD paired with it and obtains link state information for its own side. When it detects that the link state on its own side is abnormal, it actively terminates its daemon process, so that it no longer undergoes state recovery. This prevents false reports that the OSD paired with it has failed and solves the problem of OSD flapping in the storage network.

Brief description of the drawings

Fig. 1 is a schematic diagram of the Ceph network architecture and communication;

Fig. 2 is a schematic diagram of the communication process between peer OSDs;

Fig. 3 is a schematic flowchart of the network failure processing method of one embodiment of this application;

Fig. 4 is a schematic diagram of a network path in which peer OSDs are connected through multi-level routing;

Fig. 5 is a schematic structural diagram of the network failure processing device of one embodiment of this application;

Fig. 6 is a schematic structural diagram of the network failure processing device of another embodiment of this application;

Fig. 7 is a schematic structural diagram of the network failure processing device of yet another embodiment of this application;

Fig. 8 is a schematic structural diagram of the hardware of an OSD of one embodiment of this application.

Detailed description of the embodiments

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with this application. Rather, they are only examples of devices and methods consistent with some aspects of this application, as detailed in the appended claims.

The terms used in this application are for the purpose of describing specific embodiments only and are not intended to limit this application. The singular forms "a", "said", and "the" used in this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, and so on may be used in this application to describe various pieces of information, the information should not be limited by these terms. The terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be called second information, and similarly second information may also be called first information. Depending on the context, the word "if" as used here may be interpreted as "when", "at the time of", or "in response to determining".

To help those skilled in the art better understand the technical scheme of this application, the structure and operating principle of Ceph, the open-source distributed storage system that forms the application background, are introduced first. As shown in Fig. 1, the service network of a Ceph cluster (Front in the figure) and the storage network (Back in the figure) are deployed separately. The service network mainly carries user data, the map information exchanged between OSD and MON, between MON and MON, and between OSDs, and heartbeat communication. The storage network is mainly used for heartbeat communication between OSDs and for cluster-internal data such as recovery, replication, and scrubbing.

The OSD state checking process is described as follows:

1. The OSD actively reports its own state to the MON

For example, the default is to report once every 900 s, so after an OSD fails, waiting for it to report to the MON by itself takes too long. Moreover, if this reporting interval is shortened, the load on the MON of processing OSD state reports grows linearly with cluster size. Usually only the OSDs that have a problem are of interest, and most of the time these reports serve little practical purpose, so shortening the interval is not effective.

2. The OSD detects the state of its paired (peer) OSDs

Heartbeats can be established between OSDs. For example, OSDs that carry the same placement group (Placement Group, PG) establish a peer relationship, or an OSD establishes peer relationships with the OSDs whose IDs immediately precede and follow its own. Based on the state of the heartbeat communication, an OSD reports the state of its peer OSDs.

Specifically, the detection strategy between peer OSDs is as follows:

Each OSD starts a thread that sends a heartbeat message to each peer OSD at intervals (0.5 to 5.9 s). If the cluster is configured with both public_network and cluster_network, each heartbeat message is sent on both links at the same time.

As shown in Fig. 2, OSD.a sends a MOSDPing::PING heartbeat message (carrying OSD.a's send timestamp and osdmap version number) to its peer OSD, OSD.b. Under normal circumstances OSD.b replies to OSD.a with a MOSDPing::PING_REPLY message (the timestamp carried in the reply is still the one sent by OSD.a, and the reply also carries OSD.b's own osdmap version). When OSD.a receives OSD.b's reply, it records OSD.b's heartbeat information.
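The exchange in Fig. 2 can be sketched as follows. This is only an illustration of the message contents described above, not Ceph's implementation; the names `Ping`, `PingReply`, and `HeartbeatPeer` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ping:
    sender: str
    stamp: float      # sender's transmit timestamp
    map_epoch: int    # sender's osdmap version


@dataclass(frozen=True)
class PingReply:
    sender: str
    stamp: float      # echoed unchanged from the Ping
    map_epoch: int    # replier's own osdmap version


@dataclass
class HeartbeatPeer:
    """Hypothetical stand-in for one OSD's heartbeat bookkeeping."""
    name: str
    map_epoch: int
    last_reply: dict = field(default_factory=dict)  # peer name -> echoed stamp

    def make_ping(self, now: float) -> Ping:
        return Ping(self.name, now, self.map_epoch)

    def on_ping(self, ping: Ping) -> PingReply:
        # The reply keeps the original timestamp and adds the replier's map version.
        return PingReply(self.name, ping.stamp, self.map_epoch)

    def on_reply(self, reply: PingReply) -> None:
        # Record the peer's heartbeat information.
        self.last_reply[reply.sender] = reply.stamp
```

Echoing the sender's timestamp lets the sender compare against its own clock only, so the two OSDs do not need synchronized clocks for the timeout check.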

If OSD.a does not receive OSD.b's REPLY message within a preset time (20 s by default, configurable), it adds OSD.b to a failure_queue queue and reports it to the MON. When an OSD has been reported 3 times by its peer OSDs, the MON updates the osdmap and sets that OSD's state to down.

Afterwards, the OSD that was set to down runs a self-check and attempts to rebind its network interface card. If the binding succeeds, its state is set to up; if it fails, the OSD is further set to down and out, indicating that the OSD has failed completely and no longer carries any PG.
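The timeout and reporting rule described above can be sketched as follows. The 20 s grace period and the threshold of 3 reports come from the text; the function and class names are hypothetical, and the sketch simply counts reports per target OSD.

```python
GRACE_SECONDS = 20.0       # default preset time from the text (configurable)
REPORTS_TO_MARK_DOWN = 3   # number of peer reports from the text


def timed_out_peers(last_reply, now, grace=GRACE_SECONDS):
    """Return the peers whose last reply is older than the grace period."""
    return sorted(p for p, t in last_reply.items() if now - t > grace)


class Monitor:
    """Hypothetical MON that marks an OSD down after enough failure reports."""

    def __init__(self, threshold=REPORTS_TO_MARK_DOWN):
        self.threshold = threshold
        self.reports = {}   # target OSD -> number of failure reports received
        self.down = set()

    def report_failure(self, reporter, target):
        # `reporter` is kept for readability; this sketch only counts reports.
        self.reports[target] = self.reports.get(target, 0) + 1
        if self.reports[target] >= self.threshold:
            self.down.add(target)
        return target in self.down
```

Note that nothing in this rule checks whether the reporter itself is the one with the broken link, which is the gap the rest of the description addresses.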

However, this detection strategy has a problem: Ceph can detect a hardware fault in the local network interface card, but it cannot detect abnormal conditions on the cluster networking links, such as an unplugged network cable or a transient network interruption. Take a network failure at node B in Fig. 1 as an example. Because they cannot receive PING_REPLY messages from their peer OSDs, OSD.a and OSD.b, and OSD.b and OSD.c, report each other as down. For example, when the MON has received three consecutive reports from peer OSDs that OSD.b is down, it updates the OSDMap and sets OSD.b to down. After being set to down, OSD.b starts a self-check and attempts to rebind its network interface card. If the card is faulty, the binding fails and OSD.b's state does not change. But if the cable was unplugged, the network was briefly interrupted, or the port at the far end (the switch end of the connection) failed or has no cable attached, OSD.b rebinds the card successfully and its state becomes up again. Similarly, OSD.a and OSD.c are also reported down by OSD.b, then rebind their cards and return to the up state. As a result, multiple OSDs in the cluster go up and down repeatedly, producing prolonged OSD flapping, which inevitably interrupts upper-layer services.

Based on this, the technical idea of this application is as follows. To address the OSD flapping that transient network interruptions or other network problems cause in prior-art Ceph storage networks, a link detection mechanism is added to the OSDs of the Ceph storage network. The mechanism lets an OSD detect the link state on its own side and obtain link state information; when the OSD detects that the link state on its own side is abnormal, it actively terminates its own daemon process. The added link detection mechanism identifies the OSD on the side where the fault really exists and terminates its daemon process, so that this OSD no longer undergoes state recovery. This prevents it from falsely reporting that its paired OSDs have failed and solves the problem of OSD flapping in the storage network.

The implementation of the network failure processing scheme of this application is described in detail below with reference to embodiments.

Fig. 3 shows a schematic flowchart of the network failure processing method of one embodiment of this application. A distributed storage system includes a service network and a storage network that are deployed separately; a cluster monitor node (MON) and multiple object storage devices (OSDs) are connected to the service network for communication, and the multiple OSDs are also connected to the storage network for communication. The method is applied to an OSD and includes the following steps:

Step S110: detect the storage network link between the OSD and the OSD paired with it, to obtain link state information for the OSD's side.

Step S120: if the link state on the OSD's side is found to be abnormal, actively terminate the OSD's daemon process, to prevent the OSD from sending to the MON, over the service network, information that the OSD paired with it has failed.

Through this state self-check, once the OSD detects that its own link state is abnormal, it actively terminates its daemon process and is prevented from performing state recovery. This avoids the OSD falsely reporting a network abnormality of its peer OSDs, avoids OSD flapping, and ensures that upper-layer services are delivered normally and data flows effectively.

In the scheme of this application, link detection includes detection of the network interface and detection of the routers on the network path.

For simpler networking, for example where there is only one level of router between OSDs and all of them are connected to the same router, checking the network interface state is sufficient to detect the link state. In this case, detecting the storage network link between the OSD and the OSD paired with it, to obtain link state information for the OSD's side, includes:

detecting the network interface of the OSD, to obtain network interface state information for the OSD.

Correspondingly, actively terminating the OSD's daemon process if the link state on the OSD's side is found to be abnormal, to prevent the OSD from sending to the MON, over the service network, information that the OSD paired with it has failed, includes:

through a self-check of the OSD's network interface state, if the network interface state of the OSD is found to be abnormal, actively terminating the OSD's daemon process, to prevent the OSD from sending to the MON, over the service network, information that the OSD paired with it has failed.

Specifically, a periodic network detection mechanism can be added to the OSD: every 6 seconds, before checking for heartbeat communication timeouts, the OSD first checks whether its network interface is normal (abnormal cases include, for example, a network card driver failure, a cable that is not plugged in or is damaged, or a cable unplugged at the switch end). When the OSD detects that its own network interface is abnormal, that is, that the state of its directly connected link is abnormal, it exits its own process. It therefore does not falsely report a heartbeat communication abnormality of its peer OSDs, which solves the OSD flapping problem.
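One possible way to perform such an interface check on Linux is sketched below. The text does not say how the interface state is obtained; reading `operstate` and `carrier` under `/sys/class/net/<iface>/` is an assumption made here for illustration, and the function names and the injectable `terminate` callback are hypothetical.

```python
import os
import signal
from pathlib import Path


def interface_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the interface reports operstate 'up' and carrier '1'."""
    base = Path(sysfs_root) / iface
    try:
        operstate = (base / "operstate").read_text().strip()
        carrier = (base / "carrier").read_text().strip()
    except OSError:
        # Interface missing, or carrier unreadable (e.g. administratively down).
        return False
    # Assumption: physical NICs report "up"; some virtual devices report "unknown".
    return operstate == "up" and carrier == "1"


def check_and_maybe_exit(iface, sysfs_root="/sys/class/net", terminate=None):
    """Run one check; terminate the current process if the link is abnormal."""
    if interface_is_up(iface, sysfs_root):
        return True
    if terminate is None:
        # Default action: ask this process to stop before it reports any peer.
        def terminate():
            os.kill(os.getpid(), signal.SIGTERM)
    terminate()
    return False
```

In a daemon, `check_and_maybe_exit` would be called on a 6-second timer ahead of the heartbeat timeout check, so that a locally broken link ends the process before any peer is reported.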

In addition, large-scale clusters need multi-level routing for networking, which also isolates faults. The heartbeat between OSDs therefore no longer passes through a single router and may cross multiple routers. As shown in Fig. 4, OSD1 is connected to center router C through router A, and OSD2 is connected to center router C through router B; that is, there are three levels of routers on the network path between OSD1 and OSD2, and router C is the center router to which OSD1 and OSD2 are both connected. When router A fails, OSD1 cannot receive OSD2's heartbeat response and, after the timeout, reports to the MON that OSD2 is down. After receiving the message, the MON issues an OSDMap that sets OSD2 to down. OSD2 then finds that its own network interface is normal, reports to the MON, has its state updated to up, and sends a heartbeat message to OSD1. Because router A has failed, however, OSD2 cannot receive OSD1's heartbeat response, so after the timeout OSD2 reports to the MON that OSD1 is down, and the MON sets OSD1 to down after receiving the message. This repeats, producing the flapping problem.

The key is to find the side that has really failed. If the number of router levels is known, together with the router at which the heartbeat message is blocked, the OSD that has really failed can be determined and the false-report problem solved. Based on this, in some embodiments of this application, the method further includes:

storing cluster network topology information on the OSD. The cluster network topology information includes the number of router levels in the cluster network and the position of the center router. From this information, the OSD obtains the position of the center router to which it and the OSD paired with it are both connected, and thereby determines which levels of routers lie on its own side.

Detecting the storage network link between the OSD and the OSD paired with it, to obtain link state information for the OSD's side, further includes:

after heartbeat communication between the OSD and the OSD paired with it becomes abnormal, the OSD sends IP packets level by level and receives the messages returned by the routers at each level; if it fails to receive the message returned by the router at some level, it determines that this router has failed.

Correspondingly, actively terminating the OSD's daemon process if the link state on the OSD's side is found to be abnormal, to prevent the OSD from sending to the MON, over the service network, information that the OSD paired with it has failed, further includes:

according to the cluster network topology information, if the OSD detects that the failed router lies between the OSD and the center router, the OSD actively terminates its daemon process, to prevent it from sending to the MON, over the service network, information that the OSD paired with it has failed. The added cluster network topology information makes it possible to identify the OSD that has really failed, so that the failed OSD is removed from the working cluster and does not falsely report other OSDs as down, which solves the OSD flapping problem.
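The side decision described above can be sketched as follows. The text specifies only that the topology information holds the number of router levels and the center router's position; the `Topology` layout and the 1-based hop numbering counted from the local OSD are assumptions made here. The text does not say what happens when the center router itself is the one that fails; this sketch treats that case as not being on the local side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical per-OSD view of the path to its peer."""
    levels: int        # number of router levels on the path to the peer
    center_level: int  # 1-based hop index of the shared center router


def fault_is_on_own_side(topology: Topology, failed_hop: int) -> bool:
    """True if the failed router lies between this OSD and the center router."""
    if not 1 <= failed_hop <= topology.levels:
        raise ValueError("failed_hop is outside the known path")
    # Hops before the center router belong to this OSD's side of the path.
    return failed_hop < topology.center_level
```

For the path in Fig. 4 as seen from OSD1 (router A at hop 1, center router C at hop 2, router B at hop 3), `fault_is_on_own_side(Topology(3, 2), 1)` is true, so OSD1 would terminate its own daemon process rather than report OSD2.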

The cluster network topology information is created when the network is built. After the router hierarchy of the cluster network changes, for example when a new OSD is added to the cluster or an OSD is replaced, the cluster network topology information is updated. For this situation, the method further includes:

receiving, over the service network, updated cluster network topology information sent by the OSD paired with the OSD and storing it, and sending updated cluster network topology information to the OSD paired with the OSD; and/or receiving, over the service network, updated cluster network topology information sent by the MON and storing it.

Preferably, in some embodiments of this application, the Internet Control Message Protocol (ICMP) can be used to discover the routers on the network path level by level. Specifically, the IP packet conforms to ICMP, and its time-to-live (Time To Live, TTL) field specifies the maximum number of devices the packet can pass through in transit before it is discarded.

Sending IP packets level by level and receiving the messages returned by the routers at each level, as described above, then includes:

setting the initial TTL of the IP packet to 1 and sending it; each time the message returned by a router at one level is received, increasing the TTL of the IP packet by 1 and sending it again.

Taking OSD1 in Fig. 4 as an example, OSD1 sends an IP packet with an initial TTL of 1 to the destination OSD2 (in practice, three 40-byte packets are sent each time, containing the source address, the destination address, and the time the packet was sent). When router A, the first router on the path, receives this packet, it decrements the TTL by 1. The TTL is now 0, so router A discards the packet and returns an ICMP time exceeded message (containing the source address of the IP packet, the full content of the IP packet, and the router's IP address). When OSD1 receives this message, it knows that router A is on the path. It then sends another IP packet with a TTL of 2 and discovers the second router, center router C. The route is traced in this way, increasing the TTL of the outgoing IP packet by 1 each time to discover the next router.

Suppose router A fails. When the first IP packet sent by OSD1 (TTL 1) times out, OSD1 knows from the cluster network topology information that the router that timed out is router A, which is on OSD1's own side. OSD1 therefore cancels its report that OSD2 is down and exits its daemon process. Meanwhile, OSD2 sends an IP packet with a TTL of 1 toward OSD1, and router B returns a message; OSD2 then sends an IP packet with a TTL of 2, and router C returns a message successfully; OSD2 continues with a packet whose TTL is 3, which times out because router A has failed. From the cluster network topology information, OSD2 knows that router C is the center router and that it is not faulty, so the fault is on the OSD1 side. OSD2 reports to the MON that OSD1 is down, and after receiving the message the MON issues an OSDMap that sets OSD1 to down. In this way, OSD1 actively exits its daemon process when its own link fails and avoids falsely reporting its peer OSD as down, which solves the OSD flapping problem.
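The router A scenario above can be replayed with a small simulation. Sending real ICMP probes needs raw sockets and elevated privileges, so this sketch replaces the network with a hypothetical `make_probe` function that answers for each TTL; all names are illustrative, and the path lists are taken from Fig. 4.

```python
def first_silent_hop(probe, max_hops):
    """Probe with TTL 1, 2, ... and return the first TTL that gets no reply."""
    for ttl in range(1, max_hops + 1):
        if probe(ttl) is None:
            return ttl
    return None


def make_probe(path, failed):
    """Simulated network: `path` lists the routers in order from the sender."""
    def probe(ttl):
        # A packet reaches hop `ttl` only if every router up to it is working.
        if any(router in failed for router in path[:ttl]):
            return None
        return path[ttl - 1]   # router that would return "time exceeded"
    return probe


def action(path, center, failed):
    """Decide what the probing OSD does after its heartbeat has timed out."""
    hop = first_silent_hop(make_probe(path, failed), len(path))
    if hop is None:
        return "no router fault found"
    center_hop = path.index(center) + 1
    return "exit daemon" if hop < center_hop else "report peer down"


osd1_action = action(["A", "C", "B"], center="C", failed={"A"})
osd2_action = action(["B", "C", "A"], center="C", failed={"A"})
```

With router A failed, `osd1_action` is `"exit daemon"` (the probe with TTL 1 is silent, before the center router) and `osd2_action` is `"report peer down"` (the probe with TTL 3 is silent, beyond the center router), which matches the behavior described for OSD1 and OSD2.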

Corresponding to the above method, this application also discloses a network failure processing device. A distributed storage system includes a service network and a storage network that are deployed separately; a cluster monitor node (MON) and multiple object storage devices (OSDs) are connected to the service network for communication, and the multiple OSDs are also connected to the storage network for communication. The device is applied to an OSD. Referring to Fig. 5, in terms of function, the network failure processing device 200 includes:

a link detection unit 210, configured to detect the storage network link between the OSD where the device is located (the host OSD) and the OSD paired with the host OSD, to obtain link state information for the host OSD's side;

a processing unit 220, configured to actively terminate the host OSD's daemon process when the link state on the host OSD's side is found to be abnormal, to prevent the host OSD from sending to the MON, over the service network, information that the OSD paired with the host OSD has failed.

Further, referring to Fig. 6, in another embodiment of this application, the link detection unit 210 includes:

a network interface detection unit 211, configured to detect whether the network interface of the host OSD is normal, to obtain network interface state information for the host OSD.

The processing unit 220 is specifically configured to actively terminate the host OSD's daemon process when the network interface state of the host OSD is found to be abnormal, to prevent the host OSD from sending to the MON, over the service network, information that the OSD paired with the host OSD has failed.

Referring further to Fig. 7, in yet another embodiment of this application, the device also includes:

a storage unit 230, configured to store cluster network topology information. The cluster network topology information includes the number of router levels in the cluster network and the position of the center router. From this information, the device obtains the position of the center router to which the host OSD and the OSD paired with the host OSD are both connected, and thereby determines which levels of routers lie on the host OSD's side.

The link detection unit 210 further includes:

a router detection unit 212, configured to, after heartbeat communication between the host OSD and the OSD paired with the host OSD becomes abnormal, send IP packets level by level and receive the messages returned by the routers at each level, and, if the message returned by the router at some level is not received, determine that this router has failed.

The processing unit 220 is further configured to determine the position of the failed router according to the cluster network topology information and, when the failed router is found to lie between the host OSD and the center router, to actively terminate the host OSD's daemon process, to prevent the host OSD from sending to the MON, over the service network, information that the OSD paired with the host OSD has failed.

Referring again to Fig. 7, in some embodiments of this application, the device further includes:

an update unit 240, configured to, after the router hierarchy of the cluster network changes, receive over the service network the updated cluster network topology information sent by the OSD paired with the host OSD and pass it to the storage unit 230 for storage, and send the updated cluster network topology information to the OSD paired with the host OSD; and/or receive over the service network the updated cluster network topology information sent by the MON and pass it to the storage unit 230 for storage.

Specifically, the IP packets sent by the router detection unit 212 conform to the Internet Control Message Protocol (ICMP). The router detection unit 212 sets the initial TTL of the IP packet to 1 and sends it; each time the message returned by a router at one level is received, it increases the TTL of the IP packet by 1 and sends it again. When the message returned by the router at some level is not received, it determines that this router has failed.

The network failure processing device provided by this application can be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the processor 810 reads the machine-executable instructions corresponding to the network failure processing device 200 from the non-volatile memory 850 into the memory 840 and runs them. At the hardware level, Fig. 8 shows a hardware structure diagram of the device of this application. In addition to the processor 810, internal bus 820, network interface 830, memory 840, and non-volatile memory 850 shown in Fig. 8, other hardware may be included according to the actual functions of the OSD, which is not described further here.

In various embodiments, the non-volatile memory 850 may be a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as a CD or DVD), a similar storage medium, or a combination of these. The memory 840 may be RAM (Random Access Memory), volatile memory, non-volatile memory, or flash memory.

Further, the non-volatile memory 850 and the memory 840 serve as machine-readable storage media, on which the machine-executable instructions corresponding to the network failure processing device 200 and executed by the processor 810 can be stored.

Because the device embodiment essentially corresponds to the method embodiment, refer to the description of the method embodiment for the related parts. The device embodiment described above is only illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. The terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

A kind of 1. network failure processing method, it is characterised in that in distributed memory system, including the service network being deployed separately With storage net, clustered control node M ON and multiple object storage device OSD are connected to the service network and communicated, the multiple OSD is also connected to the storage net and communicated, and this method is applied to the OSD, including：

The OSD and the storage net connecting link between the OSD of OSD pairings are detected, obtains the Link State letter of the OSD sides Breath；

If detecting there is abnormal, active termination OSD finger daemon, to prevent the OSD from leading in the Link State of the OSD sides Cross the information for the OSD failures that service network is sent to MON and the OSD is matched.
2. The method according to claim 1, characterized in that detecting the storage network connection link between the OSD and the OSD paired with the OSD, to obtain the link state information of the OSD side, comprises:

detecting the network interface of the OSD, to obtain network interface state information of the OSD;

correspondingly, if an abnormality is detected in the link state of the OSD side, actively terminating the daemon of the OSD, to prevent the OSD from sending to the MON, through the service network, information indicating that the OSD paired with the OSD has failed, comprises:

if the network interface state of the OSD is detected to be abnormal, actively terminating the daemon of the OSD, to prevent the OSD from sending to the MON, through the service network, information indicating that the OSD paired with the OSD has failed.
3. The method according to claim 1 or 2, characterized in that the method further comprises:

storing cluster network topology information, the cluster network topology information including the number of router levels of the cluster network and position information of a center router; the OSD obtaining, according to the cluster network topology information, the position of the center router to which the OSD and the OSD paired with the OSD are commonly connected, so as to determine which levels of routers are located on the OSD side;

the detecting of the storage network connection link between the OSD and the OSD paired with the OSD, to obtain the link state information of the OSD side, further comprises:

after heartbeat communication between the OSD and the OSD paired with the OSD becomes abnormal, sending IP packets level by level and receiving the message returned by the router at each level, and if the message returned by the router at a certain level is not received, determining that this router has failed;

correspondingly, if an abnormality is detected in the link state of the OSD side, actively terminating the daemon of the OSD, to prevent the OSD from sending to the MON, through the service network, information indicating that the OSD paired with the OSD has failed, further comprises:

according to the cluster network topology information, if the failed router is detected to be located between the OSD and the center router, actively terminating the daemon of the OSD, to prevent the OSD from sending to the MON, through the service network, information indicating that the OSD paired with the OSD has failed.
4. The method according to claim 3, characterized in that the method further comprises: after the router hierarchy of the cluster network changes,

receiving, through the service network, updated cluster network topology information sent by the OSD paired with the OSD and storing it, and sending the updated cluster network topology information to the OSD paired with the OSD;

and/or receiving, through the service network, updated cluster network topology information sent by the MON and storing it.
5. The method according to claim 3, characterized in that the IP packets are IP packets conforming to the Internet Control Message Protocol (ICMP); and sending the IP packets level by level and receiving the message returned by the router at each level comprises:

setting the initial value of the time to live (TTL) of the IP packet to 1 and sending the packet, and, each time the message returned by the router at one level is received, incrementing the TTL value of the IP packet by 1 and sending the packet again.
6. A network failure processing device, characterized in that a distributed storage system comprises a service network and a storage network that are deployed separately, a cluster monitoring node (MON) and a plurality of object storage devices (OSDs) are connected to the service network for communication, and the plurality of OSDs are also connected to the storage network for communication, the device being applied to an OSD and comprising:

a link detection unit, configured to detect the storage network connection link between the OSD where the device is located and an OSD paired with that OSD, to obtain link state information of the side of the OSD where the device is located;

a processing unit, configured to, when an abnormality is detected in the link state of the side of the OSD where the device is located, actively terminate the daemon of that OSD, to prevent that OSD from sending to the MON, through the service network, information indicating that the OSD paired with it has failed.
7. The device according to claim 6, characterized in that the link detection unit comprises:

a network interface detection unit, configured to detect whether the network interface of the OSD where the device is located is normal, to obtain network interface state information of that OSD;

the processing unit is specifically configured to, when the network interface state of the OSD where the device is located is detected to be abnormal, actively terminate the daemon of that OSD, to prevent that OSD from sending to the MON, through the service network, information indicating that the OSD paired with it has failed.
8. The device according to claim 6 or 7, characterized in that the device further comprises:

a storage unit, configured to store cluster network topology information, the cluster network topology information including the number of router levels of the cluster network and position information of a center router; the device obtaining, according to the cluster network topology information, the position of the center router to which the OSD where the device is located and the OSD paired with that OSD are commonly connected, so as to determine which levels of routers are located on the side of the OSD where the device is located;

the link detection unit further comprises:

a router detection unit, configured to, after heartbeat communication between the OSD where the device is located and the OSD paired with that OSD becomes abnormal, send IP packets level by level and receive the message returned by the router at each level, and, if the message returned by the router at a certain level is not received, determine that this router has failed;

the processing unit is further configured to, according to the cluster network topology information, when the failed router is detected to be located between the OSD where the device is located and the center router, actively terminate the daemon of that OSD, to prevent that OSD from sending to the MON, through the service network, information indicating that the OSD paired with it has failed.
9. The device according to claim 8, characterized in that the device further comprises:

an updating unit, configured to, after the router hierarchy of the cluster network changes,

receive, through the service network, updated cluster network topology information sent by the OSD paired with the OSD where the device is located, send it to the storage unit for storage, and send the updated cluster network topology information to the OSD paired with the OSD where the device is located;

and/or receive, through the service network, updated cluster network topology information sent by the MON, and send it to the storage unit for storage.
10. The device according to claim 8, characterized in that the IP packets sent by the router detection unit are IP packets conforming to the Internet Control Message Protocol (ICMP); specifically, the router detection unit sets the initial value of the time to live (TTL) of the IP packet to 1 and sends the packet, and, each time the message returned by the router at one level is received, increments the TTL value of the IP packet by 1 and sends the packet again.
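As an illustration of the network interface check recited in claims 2 and 7, the following is a minimal sketch and not the claimed implementation. It assumes a Linux host, where the kernel exposes an interface's operational state in `/sys/class/net/<interface>/operstate`; the claims do not specify how the interface state is obtained. The names `Action`, `interface_is_up`, and `decide` are hypothetical, and treating every state other than `up` as abnormal is a conservative assumption of this sketch.

```python
import os
from enum import Enum


class Action(Enum):
    KEEP_RUNNING = "keep_running"
    TERMINATE_OWN_DAEMON = "terminate_own_daemon"


def interface_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True only if the kernel reports the interface as operationally up."""
    path = os.path.join(sysfs_root, iface, "operstate")
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip() == "up"
    except OSError:
        # Assumption: a missing or unreadable interface counts as abnormal.
        return False


def decide(storage_iface: str, sysfs_root: str = "/sys/class/net") -> Action:
    """Choose self-termination when the OSD's own storage interface is abnormal.

    Terminating the local daemon means this OSD never reports its paired
    OSD as failed to the MON over the service network.
    """
    if interface_is_up(storage_iface, sysfs_root):
        return Action.KEEP_RUNNING
    return Action.TERMINATE_OWN_DAEMON
```

How the daemon is actually stopped once `TERMINATE_OWN_DAEMON` is chosen is left open here, as it is in the claims.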
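The level-by-level probing recited in claims 5 and 10 follows the same idea as the traceroute technique: a packet sent with TTL n expires at the n-th router, which then returns an ICMP message. The sketch below shows only the TTL loop. The function names are hypothetical, and the actual packet transmission is abstracted behind a `probe` callable, because sending ICMP packets typically needs raw-socket privileges and is not detailed in the claims. A simulated probe is included so the loop can be exercised without a network.

```python
from typing import Callable, Optional


def find_failed_level(probe: Callable[[int], bool], router_levels: int) -> Optional[int]:
    """Probe routers level by level and return the first level that does not answer.

    probe(ttl) is assumed to send one packet with the given TTL and return
    True if the router at that level returned a message in time.
    Returns None when every level answered.
    """
    ttl = 1  # initial TTL of 1: the packet expires at the first-level router
    while ttl <= router_levels:
        if not probe(ttl):
            return ttl  # no message from this level: treat this router as failed
        ttl += 1  # a message was received, so probe one level further
    return None


def make_simulated_probe(failed_level: Optional[int]) -> Callable[[int], bool]:
    """Build a stand-in probe for testing: levels at or beyond the failure stay silent."""
    def probe(ttl: int) -> bool:
        return failed_level is None or ttl < failed_level
    return probe
```

For example, `find_failed_level(make_simulated_probe(3), 5)` walks TTL values 1, 2, 3 and stops at level 3, the first router that returns nothing.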
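The topology-based decision recited in claims 3 and 8 can be sketched as follows, under stated assumptions. Router levels are assumed to be numbered from 1 starting at the router nearest to this OSD, and the center router's position is assumed to be stored as a level number on the same scale. The claims do not fix this encoding, and they do not say how a failure of the center router itself is handled; this sketch treats it as not being on the local side. `ClusterTopology` and `failure_is_on_local_side` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTopology:
    router_levels: int  # number of router levels in the cluster network
    center_level: int   # level of the center router, counted from this OSD

    def __post_init__(self) -> None:
        if not 1 <= self.center_level <= self.router_levels:
            raise ValueError("center_level must lie within 1..router_levels")


def failure_is_on_local_side(topology: ClusterTopology, failed_level: int) -> bool:
    """Return True if the failed router lies between this OSD and the center router.

    When this returns True, the OSD would terminate its own daemon instead of
    reporting the paired OSD as failed.
    """
    if not 1 <= failed_level <= topology.router_levels:
        raise ValueError("failed_level must lie within 1..router_levels")
    # Assumption: only levels strictly before the center router count as local.
    return failed_level < topology.center_level
```

With five router levels and the center router at level 3, a failure at level 1 or 2 is attributed to the local side, while a failure at level 3, 4, or 5 is not.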