CN102904806B - Deadlock free fault-tolerant self-adaptation routing method of computer system - Google Patents

Info

Publication number
CN102904806B
CN102904806B CN201210371076.6A CN201210371076A CN102904806B CN 102904806 B CN102904806 B CN 102904806B CN 201210371076 A CN201210371076 A CN 201210371076A CN 102904806 B CN102904806 B CN 102904806B
Authority
CN
China
Prior art keywords
node
fault
network
sub
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210371076.6A
Other languages
Chinese (zh)
Other versions
CN102904806A (en)
Inventor
向东
张研
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201210371076.6A priority Critical patent/CN102904806B/en
Publication of CN102904806A publication Critical patent/CN102904806A/en
Application granted granted Critical
Publication of CN102904806B publication Critical patent/CN102904806B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention provides a deadlock-free fault-tolerant adaptive routing method for a computer system. The method comprises the following steps: first, the nodes in the computer system are numbered; next, each node is judged to be a locally safe node or not; then an initialization strategy for fault information is defined, the fault information is collected, and fault information tables are established, comprising an adjacent-network fault information table and a local fault information table; finally, the fault information tables are updated and maintained. The method locates failed nodes efficiently and quickly without suspending network transmission, collects and updates fault information quickly and accurately, and improves the safety of the computer system.

Description

A deadlock-free fault-tolerant adaptive routing method for a computer system
Technical field
The present invention relates to the technical field of fault handling in computer systems, and in particular to a deadlock-free fault-tolerant adaptive routing method for a computer system.
Background art
High-end fault-tolerant computers, usually also called servers, combine powerful information-processing capability with the high reliability of large-scale server systems, and are widely used in banking, telecommunications management, e-government and many other mission-critical tasks. Such a computer generally contains 8 to 64 processors and offers reliability of up to 99.999% while running large-scale computing tasks. High-speed information processing and high reliability are the two most important characteristics of a fault-tolerant computer. Fault tolerance refers to the ability of the network to keep operating when components fail; fault-tolerance techniques often come at the cost of a large reduction in performance.
Recently, a novel fault-tolerant computer system has appeared. It contains 4*N computing nodes, each of which provides substantial computing power. The system can be divided into two subnetworks, each containing 2*N computing nodes. Each subnetwork contains N switches that are connected to one another, and each switch connects two computing nodes. Each switch is also connected to one switch in the other subnetwork. The degree of each switch is N+2: it is connected to two local computing nodes and N adjacent switches.
Modern routers are very robust and do not fail often in operation. In some environments, however, failures must be anticipated and located no matter how small the probability of component failure is, for example in defense and military application systems and space delivery systems. Such operating environments are characterized by component failure rate, failure recovery capability and the system's ability to degrade gracefully, and this has driven the development of fault-tolerant routing algorithms for direct networks. A routing algorithm determines the sequence of channels that a message packet occupies on its way from the source node to the destination node. The efficiency of the routing strategy has a very important influence on a direct network. For high-end fault-tolerant computer systems, the fault-tolerant routing algorithm plays an even more important role in system performance.
Because the structure of this new fault-tolerant computer system is entirely new and differs from any previous network structure, no corresponding routing algorithm exists. The system therefore needs an efficient deadlock-free routing strategy. In particular, because the system is intended for mission-critical operation, providing it with an efficient deadlock-free fault-tolerant adaptive routing method is very important.
Summary of the invention
The present invention aims to solve at least the technical problems existing in the prior art, and in particular proposes a deadlock-free fault-tolerant adaptive routing method for a computer system.
To achieve the above object, the invention provides a deadlock-free fault-tolerant adaptive routing method for a computer system, comprising the following steps:
S1: number the nodes in the computer system;
S2: judge whether each node is a locally safe node;
S3: define the initialization strategy for fault information, collect the fault information, and establish the fault information tables, which comprise an adjacent-network fault information table and a local fault information table;
S4: update and maintain the fault information tables.
The deadlock-free fault-tolerant adaptive routing method of the present invention locates failed nodes efficiently and quickly without suspending network transmission, collects and updates fault information quickly and accurately, and improves the safety of the computer system.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from that description, or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken together with the drawings, in which:
Fig. 1 is a structural diagram of the computer network of the present invention;
Fig. 2 shows one fault state of the computer network of the present invention;
Fig. 3 is a diagram of the network after part of the fault state shown in Fig. 2 has been repaired;
Fig. 4 is a diagram of the computer network of Fig. 2 after new faults have occurred.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.
In the description of the present invention, unless otherwise specified and limited, the terms "mounted", "connected" and "connection" are to be understood broadly. A connection may, for example, be mechanical or electrical, may be internal to two elements, and may be direct or made indirectly through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms according to the circumstances.
The present invention proposes a deadlock-free fault-tolerant adaptive routing method for a computer system, comprising the following steps:
S1: number the nodes in the computer system;
S2: judge whether each node is a locally safe node;
S3: define the initialization strategy for fault information, collect the fault information, and establish the fault information tables, which comprise an adjacent-network fault information table and a local fault information table;
S4: update and maintain the fault information tables.
Specifically, the deadlock-free fault-tolerant adaptive routing method comprises the following steps:
First step: number the nodes in the computer system. The computer system of the present invention comprises at least two subnetworks. In a preferred embodiment, the computer system comprises a first subnetwork and a second subnetwork. The first subnetwork comprises N nodes connected to one another, numbered 0, 1, ..., N-1; the second subnetwork comprises N nodes connected to one another, numbered N, N+1, ..., 2N-1. If two nodes in different subnetworks have numbers that are equal modulo N, there is a link between these two switches.
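The numbering rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_switch_topology` and the adjacency-set representation are not from the patent, and the two computing nodes attached to each switch are left out.

```python
def build_switch_topology(n):
    """Adjacency sets for 2*n switches: 0..n-1 form the first subnetwork, n..2n-1 the second."""
    adjacency = {v: set() for v in range(2 * n)}
    for u in range(2 * n):
        for v in range(u + 1, 2 * n):
            same_subnetwork = (u < n) == (v < n)
            # Cross-network link: the two numbers agree modulo n.
            cross_link = (not same_subnetwork) and (u % n == v % n)
            if same_subnetwork or cross_link:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency


topology = build_switch_topology(8)
print(sorted(topology[2]))  # switch-level neighbours of switch 2
```

With n = 8, every switch has seven neighbours inside its own subnetwork plus one cross-network neighbour, so switch 2 is linked to switch 10 and to no other switch of the second subnetwork.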
Second step: judge whether each node is a locally safe node.
First, a node is defined to be a fault-free node when the links between it and all of its adjacent nodes are fault-free.
The test for a locally safe node is as follows: if the links between the current node and all fault-free nodes in the subnetwork where the current node resides are fault-free, the current node is a locally safe node; otherwise, the current node is a locally unsafe node.
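As a minimal sketch of this test, assuming faults are given as a set of faulty node numbers and a set of faulty links (a representation chosen here for illustration, as is the name `is_locally_safe`), the check can be written as follows. The sample data is the fault pattern that the embodiment describes for Fig. 2.

```python
def is_locally_safe(node, subnetwork, faulty_nodes, faulty_links):
    """True if every link from `node` to a fault-free node of its own subnetwork is fault-free."""
    for other in subnetwork:
        if other == node or other in faulty_nodes:
            continue  # links to faulty nodes do not count against `node`
        if frozenset((node, other)) in faulty_links:
            return False
    return True


# Fault pattern of Fig. 2, with the 4-bit labels written as integers.
faulty_nodes = {0b0000, 0b0011, 0b0101, 0b1010}
faulty_links = {frozenset((0b0110, 0b1110)), frozenset((0b1100, 0b1011))}
second_subnetwork = range(8, 16)

unsafe = [v for v in second_subnetwork
          if v not in faulty_nodes
          and not is_locally_safe(v, second_subnetwork, faulty_nodes, faulty_links)]
print([format(v, "04b") for v in unsafe])
```

The cross-network link <0110-1110> does not affect the result, because only links inside the node's own subnetwork are examined.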
Third step: define the initialization strategy for fault information, collect the fault information, and establish the fault information tables, which comprise an adjacent-network fault information table and a local fault information table.
The present invention uses local fault information for fault-tolerant routing. The description below considers only a node in the first subnetwork with label v1, where 0 ≤ v1 ≤ N-1. Local fault information is defined as follows: if the links between node v1 and all other fault-free nodes in the first subnetwork are fault-free, the node is called locally safe; otherwise, if at least one link between node v1 and another fault-free node of the first subnetwork is faulty, the node is called locally unsafe. Faulty links here include only links within the same subnetwork; faulty links between the two subnetworks are not considered.
A fault-free node maintains two fault information tables: the cross-network neighbor information table (the adjacent-network fault information table) and the local fault information table. The cross-network neighbor information table stores the safety information of the nodes in the other subnetwork that correspond to all fault-free nodes in this subnetwork. The state of the cross-network neighbor of a fault-free node is represented by 0 or 1, meaning fault-free and faulty respectively. The local fault information table holds, for the fault-free nodes in the current network, the local fault information of their corresponding nodes in the other subnetwork: 0 if the corresponding node is locally safe, and 1 if it is locally unsafe.
Each fault-free node in the computer network stores local fault information for six micro-subnetworks of the other subnetwork. For example, each fault-free node in the first subnetwork SN0[0, N-1] stores the local fault information of the following micro-subnetworks, where k is a natural number:
first micro-subnetwork SN11 = {v | v = 2k, N ≤ 2k ≤ 2N-1};
second micro-subnetwork SN12 = {v | v = 2k+1, N ≤ 2k ≤ 2N-2};
third micro-subnetwork SN13 = {v | v = 4k or v = 4k+1, N ≤ 4k ≤ 2N-1 and N ≤ 4k+1 ≤ 2N-1};
fourth micro-subnetwork SN14 = {v | v = 4k+2 or v = 4k+3, N ≤ 4k+2 ≤ 2N-1 and N ≤ 4k+3 ≤ 2N-1};
fifth micro-subnetwork SN15 = {v | v = k+N, 0 ≤ 2k ≤ N-1};
sixth micro-subnetwork SN16 = {v | v = k+N, N-1 ≤ 2k ≤ 2N-2}.
Note that in the present invention the first subnetwork SN0[0, N-1] is the network formed by the nodes numbered 0, 1, ..., N-1, and the first micro-subnetwork SN11 = {v | v = 2k, N ≤ 2k ≤ 2N-1} is the network formed by the nodes numbered v. The other micro-subnetworks are defined in the same way and are not described again.
Correspondingly, each fault-free node in the second subnetwork SN1[N, 2N-1] stores the local fault information of the following micro-subnetworks:
seventh micro-subnetwork SN01 = {v | v = 2k, 0 ≤ 2k ≤ N-1};
eighth micro-subnetwork SN02 = {v | v = 2k+1, 0 ≤ 2k ≤ N-2};
ninth micro-subnetwork SN03 = {v | v = 4k or v = 4k+1, 0 ≤ 4k ≤ N-1 and 0 ≤ 4k+1 ≤ N-1};
tenth micro-subnetwork SN04 = {v | v = 4k+2 or v = 4k+3, 0 ≤ 4k+2 ≤ N-1 and 0 ≤ 4k+3 ≤ N-1};
eleventh micro-subnetwork SN05 = {v | v = k, 0 ≤ 2k ≤ N-1};
twelfth micro-subnetwork SN06 = {v | v = k, N-1 ≤ 2k ≤ 2N-2}.
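The set definitions above can be enumerated directly. The following Python sketch assumes a hypothetical helper `micro_subnetworks(n, base)`, where `base` is 0 for the first subnetwork and N for the second; the keys 1 to 6 correspond to SNi1 through SNi6.

```python
def micro_subnetworks(n, base):
    """Six micro-subnetworks of the subnetwork whose node numbers are base..base+n-1."""
    nodes = range(base, base + n)
    return {
        1: {v for v in nodes if v % 2 == 0},                    # v = 2k
        2: {v for v in nodes if v % 2 == 1},                    # v = 2k+1
        3: {v for v in nodes if v % 4 in (0, 1)},               # v = 4k or 4k+1
        4: {v for v in nodes if v % 4 in (2, 3)},               # v = 4k+2 or 4k+3
        5: {base + k for k in range(n) if 2 * k <= n - 1},      # 0 <= 2k <= N-1
        6: {base + k for k in range(n) if 2 * k >= n - 1},      # N-1 <= 2k <= 2N-2
    }


second = micro_subnetworks(8, 8)   # SN11..SN16 for N = 8
for index, members in second.items():
    print(index, [format(v, "04b") for v in sorted(members)])
```

For N = 8 each micro-subnetwork holds N/2 = 4 nodes, and each one fixes a single bit of the three low-order label bits.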
When a node is a locally unsafe node, the fault-free nodes in the subnetwork where this locally unsafe node resides also need to establish a local safety information table, which records the local safety information of this locally unsafe node in three micro-subnetworks.
When the local fault information tables in the first subnetwork SN0 and the second subnetwork SN1 cannot provide a deadlock-free shortest route or a three-span detour route, the local safety information that fault-free nodes in SN0 and SN1 store for the micro-subnetworks becomes particularly important: the detour path can be chosen using the information in the local safety information table.
In the computer system of the present invention, the local fault information of a node with respect to subnetwork SN0 or SN1 takes all nodes of that subnetwork into account. For the remaining 12 micro-subnetworks SNi1, SNi2, SNi3, SNi4, SNi5 and SNi6, only the N/2 nodes that make up the micro-subnetwork and lie in the same prefix network (that is, in SN0 or in SN1) are considered. For micro-subnetwork SNi1, for example, depending on the prefix network of the current node, only the local fault information within SN01 or SN11 is considered, where i is 0 or 1.
In the initial state of the computer system, a node holds no safety information about its adjacent nodes. A handshake procedure is therefore used first to collect the fault information of adjacent nodes according to the initialization strategy for fault information. The specific steps are:
S51: each node simultaneously sends a confirmation to its adjacent nodes, confirming that it is fault-free;
S52: each node sends the confirmations received in step S51 to all of its adjacent nodes, receives the information sent back by the adjacent nodes, and integrates it to obtain the fault status of its own subnetwork and the local fault information table;
S53: each node sends the fault status and local fault information of its own subnetwork obtained in step S52 to its adjacent node in the other subnetwork; at the same time, it forwards the confirmation received from that adjacent node in the other subnetwork to all of its adjacent nodes in its own subnetwork, thereby obtaining the adjacent-network fault information table.
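Step S51 can be simulated for one subnetwork as follows. This is a sketch under stated assumptions rather than the patented procedure itself: the function name `confirmation_round`, the bitmask encoding (bit set to 1 when no confirmation arrived) and the global view of faults used to decide delivery are all illustrative choices. Steps S52 and S53 would then forward the resulting masks.

```python
def confirmation_round(n, base, faulty_nodes, faulty_links):
    """S51 for one subnetwork.

    Returns {node: mask}, where bit (u - base) of the mask is 1 if the node
    received no confirmation from same-subnetwork node u.
    """
    nodes = range(base, base + n)
    missed = {}
    for v in nodes:
        if v in faulty_nodes:
            continue  # a faulty node neither sends nor records anything
        mask = 0
        for u in nodes:
            if u == v:
                continue
            delivered = (u not in faulty_nodes
                         and frozenset((u, v)) not in faulty_links)
            if not delivered:
                mask |= 1 << (u - base)
        missed[v] = mask
    return missed


# Fault pattern of Fig. 2, second subnetwork (nodes 1000..1111).
faulty_nodes = {0b0000, 0b0011, 0b0101, 0b1010}
faulty_links = {frozenset((0b0110, 0b1110)), frozenset((0b1100, 0b1011))}
masks = confirmation_round(8, 8, faulty_nodes, faulty_links)
for node, mask in masks.items():
    print(format(node, "04b"), format(mask, "08b"))
```

Node 1011, for instance, misses both the faulty node 1010 and node 1100 at the far end of the faulty link, while an unaffected node such as 1000 misses only 1010.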
The present invention collects local fault information in the following way, which allows a node to obtain, in parallel and within one cycle, its local fault information in every subnetwork and micro-subnetwork to which it belongs. All fault-free nodes apply the local fault information strategy in parallel. Suppose node v has label v1 and lies in subnetwork SNi, where i = 0 or 1. First, the node checks the other N-1 nodes in SNi: if any link from a node in SNi to v is faulty, v is a locally unsafe node; otherwise it is a locally safe node.
At the same time, the method determines the fault information of node v in its three micro-subnetworks from the links that connect to v within each of them. For example, if all such links in micro-subnetwork SNi1 are fault-free, node v is locally safe in that micro-subnetwork; otherwise it is locally unsafe in that micro-subnetwork.
To forward message packets, the present invention also requires each fault-free node in subnetwork SN0 or SN1 to store the state of its corresponding node in the other subnetwork SN1 or SN0, or of the link to it. A node can obtain the state of its corresponding node in the other subnetwork, or of the link toward it, within one cycle. In a further cycle, the node forwards the state information it has obtained to the other fault-free nodes in its own network, and obtains the information of those fault-free nodes together with the local fault information of their cross-network corresponding nodes. One more cycle is then used to transmit the cross-network local fault information.
In this embodiment, the initialization strategy for fault information further comprises transmitting the local fault information of all nodes over a low-bandwidth virtual network.
Fourth step: update and maintain the fault information tables. The specific steps are:
Each node transmits data messages and receives the signals sent by the network fault detection device;
When a node receives a flag signal indicating that the safety information in the fault-tolerant computer system has changed, it executes the local fault information update procedure, which comprises:
If the flag signal indicates that a faulty node in the network has recovered or that a node has failed, the entry for that node is modified in the adjacent-node fault information table of every fault-free node adjacent to it;
If the flag signal indicates that a faulty link <A-B> in the network has recovered or that the link has failed, then after end node A of the link receives the flag signal, the entry for the other end node B in node A's adjacent-node fault information table is modified;
Every fault-free node adjacent to the location of the fault change sends its modified adjacent-node fault information table to all of its adjacent nodes and integrates the information received from them;
Each fault-free node forwards the information received from its adjacent node in the other subnetwork to all of its adjacent nodes in its own subnetwork and, at the same time, sends the fault information and local fault information confirmed within its own subnetwork to its adjacent node in the other subnetwork.
The scheme of the present invention allows the system to update local fault information without stopping any task. If any node v in subnetwork SNi (i = 0 or 1) fails, the local fault information must be updated for the corresponding node v' in the other subnetwork. The local fault information of node v is stored in all fault-free nodes of network SNj (where j = 1-i), and in this case these nodes must all change the corresponding value to 1. When a faulty node is replaced by a fault-free node, the local fault information must be updated accordingly as well.
When a link fails in subnetwork SN0 or SN1, the local fault information held in SN1 or SN0 for both end nodes must be updated. Suppose nodes A and B are the two end nodes of the faulty link. The local fault information for A and B stored in subnetwork SN1 or SN0 must then be changed from 0 to 1. For the 6 micro-subnetworks, when the faulty link is contained in a micro-subnetwork, the local fault information of nodes A and B must be updated to 1. A link is said to be contained in a micro-subnetwork if and only if both of its end points are in that micro-subnetwork. When the node is in subnetwork SN0, it broadcasts the updated local fault information to all fault-free nodes of network SN1; likewise, if the node is in SN1, the information is broadcast to all nodes of SN0.
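The containment rule lends itself to a short sketch. The names `contains_link` and `micro_fault_flags` are illustrative, and the micro-subnetworks of the second subnetwork are written out by hand for N = 8, labelled with the wildcard notation that the embodiment's tables use. The example link <1000-1001> is the one that fails in Fig. 4.

```python
def contains_link(micro_nodes, a, b):
    """A link is contained in a micro-subnetwork iff both end points belong to it."""
    return a in micro_nodes and b in micro_nodes


def micro_fault_flags(micro_sets, a, b):
    """For each end point of failed link <a-b>: flag per micro-subnetwork it belongs to."""
    flags = {}
    for end in (a, b):
        flags[end] = {name: int(contains_link(nodes, a, b))
                      for name, nodes in micro_sets.items() if end in nodes}
    return flags


# Micro-subnetworks of the second subnetwork for N = 8 ("*" matches either bit).
micro_sets = {
    "10**": {8, 9, 10, 11}, "11**": {12, 13, 14, 15},
    "1*0*": {8, 9, 12, 13}, "1*1*": {10, 11, 14, 15},
    "1**0": {8, 10, 12, 14}, "1**1": {9, 11, 13, 15},
}
flags = micro_fault_flags(micro_sets, 0b1000, 0b1001)
print(flags[0b1001])
```

For node 1001 the link is contained in 10** and 1*0* but not in 1**1, which agrees with the values listed for that node in Table 8.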
Each node in subnetwork SN1 or SN0 uses 1 binary bit to represent the fault information of a node in the other, connected subnetwork SN0 or SN1. Overall, each fault-free node uses at most 8 bits to store the fault information of the cross-network links or corresponding nodes. Likewise, when a new fault occurs, all fault-free nodes related to it must update their fault information, and after a faulty node has been repaired, the corresponding entries in the tables that hold local fault information are modified accordingly.
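One bit per node means a whole table fits in a small integer. The sketch below shows the bit manipulation only; the helper name `set_fault_bit`, the choice of bit position and the sample positions are assumptions made for illustration and are not taken from the patent's tables.

```python
def set_fault_bit(table, position, faulty):
    """Return `table` with the bit at `position` set to 1 (faulty) or 0 (fault-free)."""
    mask = 1 << position
    return (table | mask) if faulty else (table & ~mask)


table = 0                                   # all eight entries fault-free
for position in (0, 3, 5):                  # hypothetical faulty entries
    table = set_fault_bit(table, position, True)
table = set_fault_bit(table, 5, False)      # the entry at position 5 is repaired
print(format(table, "08b"))
```

Setting a bit on a new fault and clearing it on repair mirrors the two update cases described above.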
In a preferred embodiment of the present invention, the computer network architecture is as shown in Fig. 1. This computer system contains 32 computing nodes, each of which provides substantial computing power. The system can be divided into two subnetworks, each containing 16 computing nodes. Each subnetwork contains 8 switches connected to one another, and each switch connects two computing nodes. Each switch is also connected to one switch in the other subnetwork. The degree of each switch is 10: it is connected to two local computing nodes and eight adjacent switches. In this case a node label can be written with 4 binary bits; for example, node number 2 is represented by the label 0010.
Fig. 2 shows a fault state that may occur in this computer network; the collection and maintenance of local fault information is explained for this 8-node structure. In the figure, a positive span denotes a path from a lower-level node to a higher-level node, and a negative span denotes a path from a higher-level node to a lower-level node. The present embodiment is a fault-tolerant routing strategy based on the negative-first routing scheme and the local fault information method. In this embodiment, nodes 0000, 0011, 0101 and 1010 are assumed to be faulty nodes, and links <0110-1110> and <1100-1011> are assumed to be faulty links; all other nodes and links are normal.
Virtual channels, which are often used in computer systems, reduce channel utilization and at the same time make switch design more difficult. In this embodiment no virtual channels are used; instead, three unicast cycles at initialization complete the collection and update of the fault information. The whole local fault information procedure is set out below for an arbitrary node A in the sample network (N = 8). In the initial state a node holds no safety information about its adjacent nodes, so a handshake procedure is first used to collect the fault information of the adjacent nodes.
First cycle: node A sends a message to all nodes directly connected to it, namely the other nodes in the first subnetwork and its corresponding node A' in the second subnetwork. At the same time, it receives the messages that all adjacent nodes send to it. If no message arrives from the corresponding node A', the position for that node in the adjacent-network fault information table is set to 1; otherwise it is set to 0. The information now stored in node A is the state information sent by all of its adjacent nodes.
Second cycle: A sends the information collected in the first cycle to all adjacent nodes and receives the information returned by all directly connected nodes. The information transmitted in this step is an 8-bit binary number indicating whether each adjacent node is reachable. The procedure is illustrated with the information that A receives from a node B in the same network. If some node C is unreachable in the state information sent by B but reachable in A's state information, then B and C are locally unsafe nodes, and B is also an unsafe node in every micro-subnetwork that contains both B and C. If some node C that is reachable in the state information sent by B is unreachable in A's state information, then node C is marked locally unsafe. If some node C is unreachable both in A's own state and in the states that A receives from all nodes of the same network, then C is marked as a faulty node; if no other node is unreachable from B, B remains a safe node. The entry in the adjacent-network fault information table is then set according to the fault information of B's corresponding node B' in the other subnetwork. From the information sent by the corresponding node A', A obtains the connection status between A' and the other nodes of the network of A', but at this stage it cannot yet judge whether A' is a locally safe node.
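The comparison of state vectors can be sketched as below. Several points are assumptions rather than statements of the patent: the function name `classify`, the convention that a 1 bit means "unreachable", and the slight generalization that any node which cannot reach a still-alive node C is marked locally unsafe together with C. The sample vectors are those that the second subnetwork would produce under the Fig. 2 fault pattern, indexed 0 to 7 within the subnetwork.

```python
def classify(n, views):
    """views: {node index: bitmask of same-subnetwork nodes that node could not reach}.

    Only nodes whose vector was actually received (plus the local node) appear in `views`.
    Returns (faulty nodes, locally unsafe nodes).
    """
    faulty, unsafe = set(), set()
    for c in range(n):
        observers = [x for x in views if x != c]
        missing = [x for x in observers if (views[x] >> c) & 1]
        if observers and len(missing) == len(observers):
            faulty.add(c)             # nobody can reach c: c itself has failed
        elif missing:
            unsafe.update(missing)    # c is alive, so a link to c is faulty
            unsafe.add(c)
    return faulty, unsafe


# Vectors seen by node index 0 (label 1000): its own plus those it received.
views = {0: 0b00000100, 1: 0b00000100, 3: 0b00010100, 4: 0b00001100,
         5: 0b00000100, 6: 0b00000100, 7: 0b00000100}
faulty, unsafe = classify(8, views)
print(sorted(faulty), sorted(unsafe))
```

Index 2 (node 1010) is reported unreachable by everyone and is classed as faulty, while indices 3 and 4 (nodes 1011 and 1100) miss only each other and are classed as locally unsafe.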
Third cycle: A passes the fault information and local fault information collected in the second cycle to A', and at the same time forwards the information received from A' in the second cycle to the other nodes of its own network. At this point node A uses the information sent by A' to judge whether A' is locally safe. Meanwhile, A receives from the other nodes the information about nodes in the corresponding network; combined with the data that A' passed to A, this lets A determine whether the corresponding nodes of the other nodes in its own network are locally safe nodes.
After these three cycles, the two tables of the local fault information strategy are obtained: the adjacent-network fault information table and the local fault information table. For nodes marked locally unsafe in the local fault information table, a supplementary table is used, namely the local safety information table, which records the local safety information of the corresponding node in three micro-subnetworks.
Under the fault condition shown in Fig. 2, the adjacent-network fault information table is as shown in Table 1, and the content stored in the local fault information table is as shown in Table 2. For the locally unsafe node 1100 recorded by node 0100 in Table 2, the fault-free nodes in the first subnetwork also need to maintain Table 3, which records the local safety information of node 1100 in three micro-subnetworks. Note that in the present invention the symbol "*" stands for a bit that may be either 0 or 1. In this embodiment only the local safety information of node 1100 in the three micro-subnetworks 11**, 1*0* and 1**0 needs to be recorded; however, because these micro-subnetworks are all part of subnetwork 1***, the table also contains an entry for subnetwork 1***.
Table 1: Adjacent-network fault information table
Table 2: Local fault information table
Table 3: Local safety information table of node 1100
Micro-subnetwork    1***    11**    1*0*    1**0
Value               1       0       0       0
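The wildcard notation can be made concrete with a small sketch. The helpers `micro_patterns` and `matches` are hypothetical names, and the rule "keep the leading subnetwork bit and fix one further bit per pattern" is an observation drawn from the 4-bit labels of this embodiment, not a general statement from the patent.

```python
def micro_patterns(label):
    """Wildcard patterns of the three micro-subnetworks that contain a 4-bit label."""
    patterns = []
    for fixed in range(1, len(label)):
        tail = "".join(label[j] if j == fixed else "*" for j in range(1, len(label)))
        patterns.append(label[0] + tail)
    return patterns


def matches(pattern, label):
    """True if `label` belongs to the micro-subnetwork written as `pattern`."""
    return all(p in ("*", bit) for p, bit in zip(pattern, label))


patterns = micro_patterns("1100")
print(patterns)
# Node 1011, the far end of faulty link <1100-1011>, lies in none of them.
print([matches(p, "1011") for p in patterns])
```

Because 1011 is outside all three micro-subnetworks of 1100, the faulty link is not contained in any of them, which is why the three micro-subnetwork entries in Table 3 are 0 while the entry for the whole subnetwork 1*** is 1.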
When the fault status in the system changes, for example when a faulty node is replaced, a faulty link is repaired, or a new fault occurs, the system updates the data dynamically in real time. As shown in Fig. 3, in a certain cycle the faulty node A (0101) and the faulty link <1100-1011> are repaired. First, the fault detection device in the network detects the change of fault status and sends an update signal to all fault-free nodes adjacent to the recovered node A, indicating that the fault of node A has been repaired. All fault-free nodes in the first subnetwork and node A' (1101) receive the update signal and change the entry for node A in their adjacent-node fault information tables to the fault-free state. At the same time, the nodes in the second subnetwork receive the signal that the link fault status in the network has changed, and the two end nodes B (1100) and C (1011) of the repaired link <1100-1011> modify the fault information for nodes C and B, respectively, in their tables. The network then enters the local fault information update phase. In the first cycle, each node that received the update signal sends the adjacent-node safety information it holds to all adjacent nodes and receives the information returned by them. Each fault-free node that receives node fault information then updates its local fault information table according to the network safety information tables sent by its adjacent nodes. In the second cycle, each node passes the fault information and local fault information collected in the previous cycle to its corresponding node in the other network, and forwards the information received from the corresponding node to the other nodes of its own network. This completes the update of the local fault information. The modified entries stored in the nodes after the update are shown in Table 4 and Table 5.
Table 4: Adjacent-network fault information table
Table 5: Local fault information table
As shown in Fig. 4, in a certain cycle a faulty node D (0100) and a faulty link <1000-1001> appear in the network. First, the fault detection device in the network detects the change of fault status and sends an update signal to all fault-free nodes adjacent to the failed node D, indicating that node D has failed. All fault-free nodes in the first subnetwork and node D' (1100) receive the update signal and change the entry for node D in their adjacent-node fault information tables to the faulty state. At the same time, the nodes in the second subnetwork receive the signal that the link fault status in the network has changed, and the two end nodes E (1000) and F (1001) of the faulty link <1000-1001> modify the fault information for nodes F and E, respectively, in their tables. The network then enters the local fault information update phase. In the first cycle, each node that received the update signal sends the adjacent-node safety information it holds to all adjacent nodes and receives the information returned by them. Each fault-free node that receives node fault information then updates its local fault information table according to the network safety information tables sent by its adjacent nodes. In the second cycle, each node passes the fault information and local fault information collected in the previous cycle to its corresponding node in the other network, and forwards the information received from the corresponding node to the other nodes of its own network. This completes the update of the local fault information. The modified entries stored in the nodes after the update are shown in Tables 6 to 8.
Table 6: Adjacent-network fault information table
Table 7: Local fault information table
Table 8: Local safety information table of node 1001
Micro-sub-network    1***    10**    1*0*    1**1
Value                1       1       1       0
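The column headers of Table 8 can be read as address patterns in which `*` matches either bit. The sketch below is a hedged illustration: it assumes, judging only from the headers shown for node 1001, that each micro-sub-network pattern keeps the leading sub-network bit plus one further bit of the node's address. The helper names `micro_subnet_patterns` and `members` are hypothetical, and the sketch does not attempt to reproduce how the table values are computed.

```python
from typing import List


def micro_subnet_patterns(addr: str) -> List[str]:
    """Patterns that fix the sub-network bit and one other bit of `addr`."""
    n = len(addr)
    patterns = []
    for i in range(1, n):
        p = ["*"] * n
        p[0] = addr[0]   # leading bit selects the sub-network
        p[i] = addr[i]   # one additional fixed bit
        patterns.append("".join(p))
    return patterns


def members(pattern: str) -> List[str]:
    """All node addresses that match a wildcard pattern such as '1**1'."""
    n = len(pattern)
    out = []
    for v in range(1 << n):
        bits = format(v, f"0{n}b")
        if all(p in ("*", b) for p, b in zip(pattern, bits)):
            out.append(bits)
    return out


print(micro_subnet_patterns("1001"))  # ['10**', '1*0*', '1**1']
print(members("1**1"))                # ['1001', '1011', '1101', '1111']
```

Under this reading, the first column `1***` is the whole second sub-network and the remaining three columns are the three micro-sub-networks that contain node 1001.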
In the present invention, fault conditions in the network are collected in real time. During fault information collection, nodes use a single data bit to confirm connection state with one another. Once the fault information tables in the network have been established, the routing nodes continuously exchange network state over a dedicated channel. When a new fault appears in the network or an existing fault is repaired, the information in the tables is updated using the same fault information collection procedure.
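As a rough illustration of the one-bit confirmation exchange, the sketch below models a single round inside one sub-network. It is a sketch under stated assumptions, not the method as claimed: the sub-network is assumed to be fully connected, a bit of 1 is assumed to mean "confirmation received", and the function name `ack_round` and its parameters are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def ack_round(nodes: Iterable[int],
              faulty_nodes: Set[int],
              faulty_links: Set[FrozenSet[int]]) -> Dict[int, Dict[int, int]]:
    """For each fault-free node, record one bit per peer:
    1 if a confirmation arrived, 0 if the peer or the link is faulty."""
    nodes = list(nodes)
    received: Dict[int, Dict[int, int]] = {}
    for n in nodes:
        if n in faulty_nodes:
            continue  # a faulty node collects nothing
        received[n] = {}
        for m in nodes:
            if m == n:
                continue
            link_ok = frozenset((n, m)) not in faulty_links
            received[n][m] = 1 if (m not in faulty_nodes and link_ok) else 0
    return received


# Second sub-network (nodes 1000..1111) with faulty link <1000-1001>.
second = ack_round(range(8, 16), set(), {frozenset((0b1000, 0b1001))})
# First sub-network (nodes 0000..0111) with faulty node D (0100).
first = ack_round(range(0, 8), {0b0100}, set())
```

In this model a missing confirmation does not by itself tell the receiver whether the peer node or the connecting link has failed; the source text relies on the subsequent exchange of tables between neighbours to integrate that information.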
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (6)

1. A deadlock-free fault-tolerant adaptive routing method for a computer system, characterized in that it comprises the following steps:
S1: numbering the nodes in the computer system;
S2: judging whether each node is a locally safe node, specifically: if the links between the current node and all fault-free nodes in the sub-network where the current node is located are fault-free links, the current node is a locally safe node; otherwise the current node is a locally unsafe node;
S3: defining an initialization strategy for fault information, collecting the fault information, and establishing fault information tables, the fault information tables comprising an adjacent-network fault information table and a local fault information table, specifically comprising:
S51: each node simultaneously sends confirmation information to its adjacent nodes, confirming that the node is fault-free;
S52: each node sends the confirmation information received in step S51 to all of its adjacent nodes, receives the information sent back by the adjacent nodes, and integrates it to obtain the fault condition of the sub-network where the node is located and the local fault information table, wherein the local fault information table maintains the local fault information of the node in the other sub-network that corresponds to the current fault-free node;
S53: each node sends the fault condition of its own sub-network and the local fault information obtained in step S52 to its adjacent node in the other sub-network; at the same time, the node sends the confirmation information received from its adjacent node in the other sub-network to all of its adjacent nodes in its own sub-network, thereby obtaining the adjacent-network fault information table, wherein the adjacent-network fault information table stores the safety information of the nodes in the other sub-network that correspond to all fault-free nodes in this sub-network;
S4: updating and maintaining the fault information tables.
2. The deadlock-free fault-tolerant adaptive routing method for a computer system according to claim 1, characterized in that the computer system comprises a first sub-network and a second sub-network; the first sub-network comprises N nodes, numbered 0, 1, ..., N-1, and the second sub-network comprises N nodes, numbered N, N+1, ..., 2N-1.
3. The deadlock-free fault-tolerant adaptive routing method for a computer system according to claim 1, characterized in that, when a node is a locally unsafe node, the fault-free nodes in the sub-network where the locally unsafe node is located also need to establish a local safety information table, the local safety information table recording the local safety information of the locally unsafe node in the three micro-sub-networks.
4. The deadlock-free fault-tolerant adaptive routing method for a computer system according to claim 1, characterized in that the fault information initialization strategy further comprises:
transmitting the local fault information of all nodes over a narrow-bandwidth virtual network.
5. The deadlock-free fault-tolerant adaptive routing method for a computer system according to claim 1, characterized in that step S4 comprises:
each node transmits data messages and receives the signals sent by the network fault detection device;
when a node receives a flag signal indicating that the safety information in the fault-tolerant computer system has changed, it executes the local fault information update flow.
6. The deadlock-free fault-tolerant adaptive routing method for a computer system according to claim 5, characterized in that the local fault information update flow comprises:
if the received flag signal indicates that a faulty node has been recovered or has arisen in the network, the entry representing that faulty node is modified in the adjacent-node fault information tables of all fault-free nodes adjacent to the faulty node;
if the received flag signal indicates that a faulty link <A-B> has been recovered or has arisen in the network, then after an end node of the faulty link receives the flag signal, it modifies the entry for the node at the other end of the faulty link in its adjacent-node fault information table;
all fault-free nodes adjacent to the node where the fault change occurred send their modified adjacent-node fault information tables to all of their adjacent nodes, and integrate the information received from the adjacent nodes;
each fault-free node sends the information received from its adjacent node in the other sub-network to all of its adjacent nodes in its own sub-network, and at the same time sends the fault information it has confirmed in its own sub-network, together with its local fault information, to its adjacent node in the other sub-network.
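The local safety test of step S2 and the numbering scheme of claim 2 can be illustrated with a short sketch. It is offered as one possible reading rather than the claimed implementation. The pairing of node i with node i + N is an assumption drawn from the description's examples (node 0101 paired with 1101 for N = 8), and the names `is_locally_safe`, `subnet_of`, and `counterpart` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def is_locally_safe(node: int,
                    subnet_nodes: Iterable[int],
                    faulty_nodes: Set[int],
                    faulty_links: Set[FrozenSet[int]]) -> bool:
    """S2: safe if every link from `node` to a fault-free node of its own
    sub-network is itself fault-free."""
    return all(frozenset((node, m)) not in faulty_links
               for m in subnet_nodes
               if m != node and m not in faulty_nodes)


def subnet_of(node: int, n: int) -> int:
    """Claim 2 numbering: 0..N-1 is sub-network 0, N..2N-1 is sub-network 1."""
    return 0 if node < n else 1


def counterpart(node: int, n: int) -> int:
    """Assumed pairing between the two sub-networks: i <-> i + N."""
    return node + n if node < n else node - n


N = 8
second_subnet = range(N, 2 * N)
links_down = {frozenset((0b1000, 0b1001))}  # faulty link from Figure 4

print(is_locally_safe(0b1001, second_subnet, set(), links_down))  # False
print(is_locally_safe(0b1010, second_subnet, set(), links_down))  # True
print(format(counterpart(0b0101, N), "04b"))                      # 1101
```

With the faulty link <1000-1001>, node 1001 fails the test, which is consistent with the description giving a local safety information table (Table 8) for that node.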
CN201210371076.6A 2012-09-28 2012-09-28 Deadlock free fault-tolerant self-adaptation routing method of computer system Active CN102904806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210371076.6A CN102904806B (en) 2012-09-28 2012-09-28 Deadlock free fault-tolerant self-adaptation routing method of computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210371076.6A CN102904806B (en) 2012-09-28 2012-09-28 Deadlock free fault-tolerant self-adaptation routing method of computer system

Publications (2)

Publication Number Publication Date
CN102904806A CN102904806A (en) 2013-01-30
CN102904806B true CN102904806B (en) 2015-04-15

Family

ID=47576849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210371076.6A Active CN102904806B (en) 2012-09-28 2012-09-28 Deadlock free fault-tolerant self-adaptation routing method of computer system

Country Status (1)

Country Link
CN (1) CN102904806B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant