CN100571183C - Barrier operation network system, device and method based on fat tree topology

Publication number
CN100571183C
Authority
CN
China
Prior art keywords
barrier
module
bag
reduction
sequence number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2007101207540A
Other languages
Chinese (zh)
Other versions
CN101127677A (en)
Inventor
曹政
刘新春
安学军
王达伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CNB2007101207540A
Publication of CN101127677A
Application granted
Publication of CN100571183C
Active (current legal status)
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a barrier operation network system, device and method based on fat tree topology. The system is contained in the data interconnection network of a multiprocessor system; it uses a switch node as the root of the barrier tree and processor nodes as the leaves. Each switch node includes a barrier module that handles the reduction and distribution processes. To guarantee the reliability of barrier operations, received barrier reduction packets are handled in acknowledgement mode, and reception of barrier distribution packets is handled in urge mode. A single-bit tag distinguishes consecutive barrier operations. Error-rate statistics yield the link error rate of the current switch, from which the timeout retransmission parameters are adjusted automatically. The invention guarantees reliable network transmission while improving transmission efficiency, that is, keeping latency low, so that both function and performance are assured.

Description

Barrier operation network system, device and method based on fat tree topology
Technical field
The present invention relates to the field of multiprocessor technology, and in particular to a barrier operation network system, device and method based on fat tree topology.
Background art
A multiprocessor system is one kind of parallel computer system (PCS). It consists of multiple independent computers, each able to execute its own program; the processors of these computers are interconnected to provide data exchange and synchronization between programs.
A multiprocessor system is a multiple-instruction multiple-data (MIMD) processor system: each processor is assigned a part of the parallel program, and the program blocks execute concurrently.
However, while the program blocks execute, a certain ordering relation must still be preserved between blocks in some cases, for example when a data dependence arises between blocks. Maintaining this ordering requires synchronization operations between the processors.
The barrier operation, widely used in multiprocessor systems, is one such synchronization operation. A barrier is a synchronization point of a parallel program: the program performs subsequent operations only after all participating processors have reached this point.
A barrier operation consists of two processes: the barrier arrival notification and the barrier completion notification.
In the barrier arrival notification, a processor notifies the system that it has reached the barrier synchronization point. In a tree-based barrier algorithm this process reduces level by level and is therefore also called the reduction (Reduce) process;
In the barrier completion notification, the system notifies the processors that the system as a whole has reached the barrier synchronization point. In a tree-based barrier algorithm this process distributes level by level and is therefore also called the distribution (Distribute) process.
The duration of the barrier arrival notification depends on the processor that reaches the synchronization point last, whereas the duration of the barrier completion notification depends on the specific barrier implementation.
A barrier operation is a serial part of a parallel program; shortening the barrier operation therefore improves the performance of both the barrier operation and the parallel program.
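The two processes of a tree-based barrier can be illustrated with a minimal sketch. The `Node` class and the function names are hypothetical and chosen only for illustration; the sketch models the logical reduce and distribute phases and ignores packets, timing and errors.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical barrier-tree node: a leaf is a processor, an inner node a switch."""
    name: str
    children: List["Node"] = field(default_factory=list)
    released: bool = False


def reduce_phase(node: Node, arrived: Set[str]) -> bool:
    """Reduction: report upward whether every processor below `node` has arrived."""
    if not node.children:
        return node.name in arrived
    return all(reduce_phase(child, arrived) for child in node.children)


def distribute_phase(node: Node) -> None:
    """Distribution: pass the completion notification from the root down to the leaves."""
    node.released = True
    for child in node.children:
        distribute_phase(child)


def barrier(root: Node, arrived: Set[str]) -> bool:
    """Release all nodes only once the reduction has seen every processor."""
    if not reduce_phase(root, arrived):
        return False
    distribute_phase(root)
    return True
```

In this sketch no processor is released until the last one has arrived, which is why the arrival phase is bounded by the slowest processor.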
Because synchronization involves multiple processors, barrier operations also depend on the interconnection between processors. The prior art offers two implementation techniques: one uses the inter-processor data interconnection network, the other a dedicated barrier interconnection network.
U.S. patent US006216174B1 discloses a system and method implementing a fast barrier circuit. It places a barrier bit in each processor and brings out two signal lines, one for the barrier arrival notification and one for the barrier completion notification. Dedicated hardware performs an AND operation over the global barrier bits, thereby completing the barrier operation.
However, a dedicated interconnection network for barrier operations requires extra network components and separate wiring, so its cost is high and it has not been widely adopted. Implementing barrier operations over the inter-processor data interconnection network, by adding barrier support to the data network components, is the more cost-effective technique.
U.S. patent US005365228A discloses a barrier implementation in a multistage network. It implements barrier operations by modifying the data interconnection network; through priority settings, only the barrier operation with the highest priority is executed in the network at any one time.
However, when barrier operations are implemented over the data interconnection network, an error in a barrier packet at best degrades barrier performance and at worst crashes the running program, so reliability must be guaranteed.
At the same time, barrier operations must remain efficient in any environment, so an implementation over the data interconnection network must also ensure efficiency, that is, low latency.
The prior art, while guaranteeing reliability, cannot also optimize efficiency (low latency), and so cannot assure both function and performance.
Summary of the invention
The object of the present invention is to provide a barrier operation network system, device and method based on fat tree topology that guarantees reliable network transmission while improving transmission efficiency, that is, keeping latency low, so that both function and performance are assured.
To achieve this object, the invention provides a barrier operation network system based on fat tree topology, contained in the data interconnection network of a multiprocessor system. The barrier operation network system comprises multiple barrier trees of fat tree topology, in which switch nodes serve as the roots of the barrier trees and processor nodes as their leaves. Each switch node comprises a barrier module, which configures the switch node according to the fat tree topology and, depending on the state of the switch node, applies acknowledgement mode to the reception of barrier reduction packets during the reduction process and urge mode to the reception of barrier distribution packets during the distribution process, thereby guaranteeing the reliability of barrier operations.
The barrier module also distinguishes two consecutive barrier operations by means of a single-bit tag.
The barrier module also obtains the link error rate of the current switch through error-rate statistics and automatically adjusts the timeout retransmission parameters.
The barrier module comprises a group configuration module, a barrier state machine, a barrier packet grouping module and a timeout counting module, wherein:
the group configuration module comprises a leaf configuration register and a parent port configuration register, and determines the interconnection between the nodes of a barrier group;
the barrier packet grouping module classifies packets by barrier group number and translates the barrier packets of each group into events;
the barrier state machine comprises multiple group state machines and a scheduling state machine;
the timeout counting module controls the generation of retransmission and urge events.
Each group state machine processes the barrier packets belonging to its group; the barrier packets produced by each group are scheduled by the scheduling state machine and then sent to the next-level node;
each group state machine comprises a barrier reduction state register, which records the state information arising during a barrier operation;
each group state machine also comprises a barrier completion status bit, which records the completion state of the barrier distribution operation.
The barrier module also comprises a link state monitoring module, which dynamically obtains link state information and, based on it, dynamically sets the retransmission and urge timeout thresholds used by the timeout counting module.
To achieve this object, the invention also provides a switch that serves as the root of a barrier tree of fat tree topology. The switch comprises a barrier module, which configures the switch node according to the fat tree topology and, depending on the state of the switch node, applies acknowledgement mode to the reception of barrier reduction packets during the reduction process and urge mode to the reception of barrier distribution packets during the distribution process, thereby guaranteeing the reliability of barrier operations.
The barrier module also distinguishes two consecutive barrier operations by means of a single-bit tag.
The barrier module also obtains the link error rate of the current switch through error-rate statistics and automatically adjusts the timeout retransmission parameters.
The barrier module comprises a group configuration module, a barrier state machine, a barrier packet grouping module and a timeout counting module, wherein:
the group configuration module comprises a leaf configuration register and a parent port configuration register, and determines the interconnection between the nodes of a barrier group;
the barrier packet grouping module classifies packets by barrier group number and translates the barrier packets of each group into events, including reduction, distribution, reduction acknowledgement and distribution urge events;
the barrier state machine comprises multiple group state machines and a scheduling state machine;
the timeout counting module controls the generation of retransmission and urge events.
The barrier module also comprises a link state monitoring module, which dynamically obtains link state information and, based on it, dynamically sets the retransmission and urge timeout thresholds used by the timeout counting module.
To achieve this object, the invention further provides a barrier operation method based on fat tree topology, comprising the following steps:
Step A: configure the switch nodes in the barrier tree of the barrier operation system according to the fat tree topology;
Step B: the switch node is idle; it responds to urge packets whose sequence number differs from the current sequence number and waits for barrier reduction packets whose sequence number matches the current sequence number;
Step C: after receiving a barrier reduction packet whose sequence number matches the current barrier sequence number, the switch node processes it, returns an acknowledgement packet for every barrier reduction packet, and performs the barrier reduction, using timeout retransmission;
Step D: when the reduction process is complete, the switch node responds to a barrier distribution packet whose sequence number matches the current sequence number, using timeout urging while waiting for that packet, distributes the received barrier distribution packet by multicast, and then returns to Step B.
Performing the barrier distribution in Step D further comprises the following step:
inverting the barrier completion status bit.
In Step C, processing the barrier reduction packets that match the current sequence number, returning acknowledgement packets for all barrier reduction packets, and performing the barrier reduction with timeout retransmission comprises the following steps:
Step C1: after a barrier reduction packet is received, check its barrier sequence number. If it matches the current sequence number, reset the corresponding status bit of the reduction state register and go to Step C2; otherwise discard the packet. In either case, reply to every barrier reduction packet with an acknowledgement packet whose sequence number is that of the barrier reduction packet;
Step C2: when the reduction state register is all zeros, the barrier module sends a barrier reduction packet, has the timeout counting module start the acknowledgement timeout count, and goes to Step C3; otherwise it returns to Step C2;
Step C3: when an acknowledgement packet is received and its sequence number matches the current sequence number, reset the acknowledgement timeout count, which completes this timeout action, and go to Step C4. If the acknowledgement timeout count reaches the threshold, retransmit the barrier reduction packet;
Step C4: wait for the barrier distribution packet.
The threshold is set by the link state monitoring module.
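Steps C1 to C4 can be sketched for a single barrier group as follows. This is a minimal illustration, not the patented hardware: the class name, the tuple layout of outgoing packets, the tick-based timer and the choice to ignore duplicate reduction packets after acknowledging them are all assumptions.

```python
class ReductionSketch:
    """Hypothetical model of one group's reduction handling at a switch node."""

    def __init__(self, leaf_mask: int, ack_threshold: int):
        self.pending = leaf_mask          # reduction state register: 1 = still waiting on that port
        self.seq = 0                      # current 1-bit sequence number
        self.ack_threshold = ack_threshold
        self.ack_timer = None             # None means the acknowledgement timer is stopped
        self.sent = []                    # outgoing packets as (kind, destination, seq)

    def on_reduce_packet(self, port: int, seq: int) -> None:
        # C1: every reduction packet is acknowledged, echoing its own sequence number.
        self.sent.append(("ack", port, seq))
        bit = 1 << port
        if seq != self.seq or not (self.pending & bit):
            return                        # stale or duplicate packet: nothing else to do
        self.pending &= ~bit              # reset the status bit of this leaf port
        if self.pending == 0:             # C2: register is all zeros
            self._send_reduce_to_parent()

    def _send_reduce_to_parent(self) -> None:
        self.sent.append(("reduce", "parent", self.seq))
        self.ack_timer = 0                # (re)start the acknowledgement timeout count

    def on_ack_packet(self, seq: int) -> None:
        # C3: a matching acknowledgement stops the timer; the node then waits (C4).
        if self.ack_timer is not None and seq == self.seq:
            self.ack_timer = None

    def tick(self) -> None:
        # C3: retransmit the reduction packet when the timeout count reaches the threshold.
        if self.ack_timer is None:
            return
        self.ack_timer += 1
        if self.ack_timer >= self.ack_threshold:
            self._send_reduce_to_parent()
```

A packet with the wrong sequence number is still acknowledged, so a sender that is retransmitting an already-handled reduction packet can stop its timer.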
The link state monitoring module performs link state testing and timeout threshold setting. It contains two counters: one records the number of packets with CRC errors and is initialized to 1, the other records the number of all packets. The ratio of the two counters is the error rate:
$r = C_{error} / C_{total}$
where $C_{error}$ is the number of packets with CRC errors, $C_{total}$ is the total number of packets received, and neither $C_{error}$ nor $C_{total}$ is ever 0;
the timeout threshold is $Ack\_Thres = K / r$, where $K$ is a constant.
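The two counters and the acknowledgement threshold can be sketched as follows. The class and method names are hypothetical, and starting the total counter at 1 is an assumption made here to keep it non-zero; the text only states that the error counter is initialized to 1.

```python
class LinkMonitorSketch:
    """Hypothetical model of the two counters in the link state monitoring module."""

    def __init__(self, k: float):
        self.k = k
        self.c_error = 1   # initialized to 1, as the text states, so r is never 0
        self.c_total = 1   # assumption: also starts at 1 so that it is never 0

    def record_packet(self, crc_ok: bool) -> None:
        self.c_total += 1
        if not crc_ok:
            self.c_error += 1

    def error_rate(self) -> float:
        return self.c_error / self.c_total

    def ack_threshold(self) -> float:
        # Ack_Thres = K / r, written as K * C_total / C_error to avoid a second division
        return self.k * self.c_total / self.c_error
```

With few errors the threshold grows, so a clean link retransmits rarely; as CRC errors accumulate the threshold shrinks and lost packets are retransmitted sooner.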
In Step D, responding to the barrier distribution packet that matches the current sequence number, inverting the barrier completion status bit, and performing the barrier distribution with timeout urging comprises the following steps:
Step D1: a switch node that has completed the reduction process starts the urge timeout count for the barrier distribution packet at the same time as it starts the acknowledgement timeout count. If the urge count reaches the threshold and the barrier distribution packet has still not been received, the node sends a distribution urge packet to its parent port, urging the parent node to send the barrier distribution packet;
Step D2: if the node is the root, then after completing the barrier reduction process its barrier module fills the barrier distribution packet with the current barrier sequence number bit and goes to Step D3. If the node is not the root, it goes to Step D3 when it receives a barrier distribution packet whose sequence number matches the current sequence number bit;
Step D3: the barrier module of the node sends the barrier distribution packet to the valid ports in the leaf configuration register, sets the barrier completion status bit to 1, and goes to Step D4;
Step D4: if a barrier urge packet is received while the barrier completion status bit is 1 and the sequence number bit of the urge packet differs from the current barrier sequence number, the barrier module of the node retransmits the barrier distribution packet, filling the retransmitted packet with the inverted value of the barrier sequence number bit.
The threshold is set by the link state monitoring module.
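Steps D1 to D4 can be sketched in the same style. Several points are assumptions made for illustration and are not stated in the text: the node flips its sequence bit right after distributing, a retransmission goes only to the port that sent the urge packet, and the timer is tick based. All names are hypothetical.

```python
class DistributionSketch:
    """Hypothetical model of one group's distribution handling at a switch node."""

    def __init__(self, leaf_ports, is_root: bool, urge_threshold: int):
        self.leaf_ports = list(leaf_ports)  # valid ports of the leaf configuration register
        self.is_root = is_root
        self.urge_threshold = urge_threshold
        self.seq = 0                        # current 1-bit sequence number
        self.done = 0                       # barrier completion status bit
        self.urge_timer = None              # None means the urge timer is stopped
        self.sent = []                      # outgoing packets as (kind, destination, seq)

    def reduction_finished(self) -> None:
        if self.is_root:
            self._distribute()              # D2, root: distribute with the current sequence bit
        else:
            self.urge_timer = 0             # D1: start the urge timeout count

    def tick(self) -> None:
        # D1: urge the parent when the distribution packet is overdue.
        if self.urge_timer is None:
            return
        self.urge_timer += 1
        if self.urge_timer >= self.urge_threshold:
            self.sent.append(("urge", "parent", self.seq))
            self.urge_timer = 0

    def on_distribute_packet(self, seq: int) -> None:
        # D2, non-root: accept only while waiting and only a matching sequence bit.
        if self.urge_timer is not None and seq == self.seq:
            self._distribute()

    def _distribute(self) -> None:
        # D3: send to every valid leaf port and set the completion status bit.
        for port in self.leaf_ports:
            self.sent.append(("distribute", port, self.seq))
        self.done = 1
        self.urge_timer = None
        self.seq ^= 1                       # assumption: the next barrier uses the flipped bit

    def on_urge_packet(self, port: int, seq: int) -> None:
        # D4: an urge for the previous barrier is answered with the inverted sequence bit.
        if self.done == 1 and seq != self.seq:
            self.sent.append(("distribute", port, self.seq ^ 1))
```

Under these assumptions, a child that missed the distribution packet still carries the old sequence bit in its urge packet, which is how the parent recognizes that the urge refers to the barrier it has already completed.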
The link state monitoring module performs link state testing and urge threshold setting. It contains two counters: one records the number of packets with CRC errors and is initialized to 1, the other records the number of all packets. The ratio of the two counters is the error rate:
$r = C_{error} / C_{total}$
where $C_{error}$ is the number of packets with CRC errors, $C_{total}$ is the total number of packets received, and neither $C_{error}$ nor $C_{total}$ is ever 0;
the urge threshold is $Dun\_Thres = (K' \cdot L) / r$, where $K'$ is a constant and $L$ is the layer number of the node in the barrier tree.
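The urge threshold formula can be written as a small function. The function and parameter names are hypothetical, and the counter values used with it would come from the link state monitoring module.

```python
def urge_threshold(k_prime: float, layer: int, c_error: int, c_total: int) -> float:
    """Compute Dun_Thres = (K' * L) / r with r = C_error / C_total."""
    if c_error <= 0 or c_total <= 0:
        raise ValueError("both counters must be non-zero")
    # Multiplying by C_total / C_error is the same as dividing by r.
    return k_prime * layer * c_total / c_error
```

For a fixed error rate the threshold scales linearly with the layer number L, and for a fixed layer it shrinks as the error rate rises.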
Step A may comprise the following steps:
Step A1: determine the interconnection between the nodes of the barrier group by configuring the leaf configuration register and the parent port configuration register in the group configuration module of the switch;
Step A2: set the initial value of the barrier state register from the value of the leaf configuration register in the group configuration module;
Step A3: reset the barrier completion status bit before the barrier operation is performed.
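Steps A1 to A3 amount to filling two configuration registers and deriving the initial group state from them. The sketch below uses hypothetical class and field names; the register widths and the software representation are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GroupConfigSketch:
    """Hypothetical view of the per-group configuration registers (Step A1)."""
    leaf_mask: int     # leaf configuration register: bit i set = port i is a valid leaf port
    parent_port: int   # parent port configuration register: port number of the parent
    is_root: bool      # root bit: whether this node is the root of the barrier group


@dataclass
class GroupStateSketch:
    """Hypothetical view of the per-group run-time state."""
    reduce_state: int  # barrier reduction state register
    seq: int           # 1-bit sequence number
    done: int          # barrier completion status bit


def configure_group(cfg: GroupConfigSketch) -> GroupStateSketch:
    # A2: the state register starts as a copy of the leaf configuration register.
    # A3: the completion status bit is reset before the first barrier operation.
    return GroupStateSketch(reduce_state=cfg.leaf_mask, seq=0, done=0)
```

For example, a node whose ports 0 to 2 are leaf ports and whose parent is port 3 would use `GroupConfigSketch(leaf_mask=0b0111, parent_port=3, is_root=False)`.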
The beneficial effects of the present invention are as follows. The barrier operation network system, device and method based on fat tree topology are applied in a multiprocessor system; they are tied to link state, rely on a timeout retransmission mechanism, are efficient and reliable, and require no intervention. The barrier operation network system adopts a tree topology with a configurable barrier tree structure, implements efficient and reliable barrier operations that take link state into account, and shares physical links with the data interconnection network. It shortens the communication path between processor nodes and the root of the barrier tree, reducing the transmission delay of barrier packets. It supports multiple concurrent barrier operations, with different operations corresponding to different barrier trees; the tree structure information is stored in registers located in the switches and processors and can be configured, allowing the tree structure to be adjusted dynamically. Its reliability mechanism is tied to link state: the timeout retransmission parameters are adjusted automatically according to the link error rate of the current switch, which reduces the impact of timeout retransmission packets on the network and improves the efficiency of error recovery.
Description of drawings
Fig. 1 is a structural diagram of the barrier operation network system based on fat tree topology according to the present invention;
Fig. 2 is a structural diagram of a switch node according to the present invention;
Fig. 3 is a structural diagram of the barrier module in Fig. 2;
Fig. 4A is a diagram of the leaf configuration register;
Fig. 4B is a diagram of the parent port configuration register;
Fig. 4C is a diagram of the barrier reduction state register;
Fig. 4D is a diagram of the barrier completion status bit;
Fig. 5 is a flow chart of the barrier operation method based on fat tree topology according to the present invention;
Fig. 6 is a flow chart of the barrier reduction process;
Fig. 7 is a flow chart of the barrier distribution process;
Fig. 8 is a diagram of the barrier link state monitoring module routine.
Embodiment
To make the purpose, technical solution and advantages of the present invention clearer, the barrier operation network system, device and method based on fat tree topology are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
The core of the barrier operation network system, device and method based on fat tree topology is to implement efficient and reliable barrier operations that take link state into account. The invention is based on the fat tree (Fat-Tree) topology and adopts a tree structure, using switches as the roots of barrier trees and processor nodes 12 as their leaves. It supports multiple concurrent barrier operations, with different operations corresponding to different barrier trees; the tree structure information is stored in registers located in the switches and processors and can be configured.
The fat tree (Fat-Tree) is a tree topology widely used for the data interconnection network of multiprocessor systems. In a fat tree, the processor nodes 12 are the leaves, and the root and intermediate nodes are switches.
In general, barrier operations face two classes of reliability problems: 1) loss of barrier packets; 2) mixing of two consecutive barrier operations.
Different reliability methods affect barrier performance differently, so barrier operations need an efficient reliability method.
The first class of problem is caused by communication link signal quality and thermal noise. For this class, the timing of error detection and correction is critical because it directly affects performance, and choosing that timing is a pressing problem. Moreover, the probability of errors differs from link to link, so links of different signal quality should be given different reliability settings; in the prior art the reliability mechanism is independent of link state and cannot maximize performance.
For the first class of reliability problem, the barrier network of the present invention adopts point-to-point timeout retransmission:
For the barrier reduction process it uses acknowledgement mode: a node that receives a barrier reduction packet returns a barrier acknowledgement packet, and a node that does not receive the barrier acknowledgement packet within the set time retransmits the barrier reduction packet;
For the barrier distribution phase it uses urge mode: a node that has sent a barrier reduction packet and does not receive the barrier distribution packet within the set time sends a barrier urge packet to prompt the arrival of the barrier distribution packet.
Preferably, barrier acknowledgement packets and barrier urge packets exist only between adjacent levels of nodes.
The second class of problem arises because processors complete a barrier asynchronously, so two consecutive barrier operations may be present in the system at the same time; a reliability mechanism that retransmits packets can aggravate this. How to distinguish two consecutive barrier operations effectively is therefore also a pressing problem.
For the second class of reliability problem, the barrier network of the present invention identifies operations with a single-bit tag. Because a barrier is blocking, an application can have at most two consecutive barrier operations in flight, so a single-bit tag is the preferable method and solves the second class of problem effectively.
More preferably, the reliability of the barrier network is also tied to link state: an error-rate statistics function is added to the switch, and the timeout retransmission parameters are adjusted automatically according to the link error rate of the current switch.
When the link state is good, the timeout retransmission parameter is increased, reducing the impact of timeout retransmission packets on the network; otherwise the timeout retransmission parameter is decreased, improving the efficiency of error recovery.
Fig. 1 is a schematic diagram of the barrier operation network system based on fat tree topology. The system is contained in the data interconnection network of a multiprocessor system and uses the fat tree (Fat-Tree) topology, with switch nodes 11 as the roots of barrier trees and processor nodes 12 as their leaves; that is, switches are the intermediate nodes of a tree and processor nodes 12 are its leaves. A barrier tree is the set of network components participating in the same barrier operation, and one barrier tree is one barrier group. Multiple barrier groups exist in the barrier operation network system at the same time.
The structural information of the barrier tree is stored in registers located in the switches and processors and can be configured.
Fig. 2 is a structural diagram of a switch in the barrier operation network system based on fat tree topology. Each input/output port of the switch consists of multiple virtual channels, comprising data virtual channels (VCFIFO) 23 and a barrier virtual channel (Barrier FIFO) 25. Packets transmitted in the data virtual channels 23 are switched and scheduled by the data crossbar switch (not shown); barrier packets in the barrier virtual channel 25 are processed by the barrier module 21.
Described barrier packet, shown in Fig. 4 E, wherein,
VC: barrier Virtual Channel number
Type: barrier bag type, 3, existing 4 kinds of bag types are respectively: 001-reduction bag; The 010-distributing packets; 100-reduction respond packet; Bag is urged in the 110-distribution.
Seq: the sequence number of barrier operating, 1bit, in order to distinguish double barrier operating, initial value is 0.
Barrier ID: the group (Barrier Group) of indicating current barrier operating place.
BPCRC: the CRC check sign indicating number of barrier bag.
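The field list can be sketched as a small data structure with a bit-packing helper. The type codes and the widths of Type (3 bits) and Seq (1 bit) come from the text; the widths chosen for VC and Barrier ID, the field order inside the word, and all names are assumptions, and the BPCRC field is left out because its polynomial and width are not given.

```python
from dataclasses import dataclass
from enum import IntEnum


class BarrierType(IntEnum):
    REDUCE = 0b001
    DISTRIBUTE = 0b010
    REDUCE_ACK = 0b100
    DISTRIBUTE_URGE = 0b110


@dataclass(frozen=True)
class BarrierPacket:
    vc: int            # barrier virtual channel number
    kind: BarrierType  # 3-bit packet type
    seq: int           # 1-bit sequence number
    barrier_id: int    # barrier group identifier


VC_BITS, TYPE_BITS, SEQ_BITS, ID_BITS = 4, 3, 1, 8  # VC_BITS and ID_BITS are assumed widths


def pack(p: BarrierPacket) -> int:
    """Pack the fields into one integer, VC in the most significant position."""
    word = p.vc
    word = (word << TYPE_BITS) | int(p.kind)
    word = (word << SEQ_BITS) | p.seq
    word = (word << ID_BITS) | p.barrier_id
    return word


def unpack(word: int) -> BarrierPacket:
    """Inverse of pack(); raises ValueError for an undefined type code."""
    barrier_id = word & ((1 << ID_BITS) - 1)
    word >>= ID_BITS
    seq = word & ((1 << SEQ_BITS) - 1)
    word >>= SEQ_BITS
    kind = BarrierType(word & ((1 << TYPE_BITS) - 1))
    word >>= TYPE_BITS
    vc = word & ((1 << VC_BITS) - 1)
    return BarrierPacket(vc, kind, seq, barrier_id)
```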
The port receiving module 22 first classifies packets by virtual channel number, and barrier packets are buffered in the barrier input buffer. The barrier module 21 processes the barrier packets in the barrier input buffer and produces new barrier packets, which are buffered in the barrier output buffer. The port sending module 24 schedules the data in each output buffer, and the barrier output buffer has the highest scheduling priority.
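The priority rule of the port sending module can be sketched as follows. Only the rule that the barrier output buffer is served first comes from the text; serving the data buffers in fixed order, and the function name, are assumptions.

```python
from collections import deque
from typing import Deque, List, Optional


def next_packet(barrier_out: Deque[str], data_out: List[Deque[str]]) -> Optional[str]:
    """Pick the next packet to send; the barrier output buffer always wins."""
    if barrier_out:
        return barrier_out.popleft()
    for queue in data_out:  # assumption: data buffers are served in fixed order
        if queue:
            return queue.popleft()
    return None             # nothing to send
```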
The barrier module 21 of the present invention handles the reduction and distribution processes: it applies acknowledgement mode to the barrier reduction packets it receives and urge mode to the barrier reduction packets it sends, guaranteeing the reliability of barrier operations.
Preferably, the barrier module 21 also distinguishes different barrier operations by means of a single-bit tag.
More preferably, the barrier module 21 also uses error-rate statistics to adjust the timeout retransmission parameters automatically according to the link error rate of the current switch.
The detailed structure of the barrier module 21 of Fig. 2 is shown in Fig. 3. It comprises a group configuration module 31, a barrier state machine 32, a barrier packet grouping module 33, a timeout counting module 34 and a link state monitoring module 35.
The group configuration module 31 comprises a leaf configuration register and a parent port configuration register, and determines the interconnection between the nodes of a barrier group.
Each port of the switch maps to one bit of the leaf configuration register in the group configuration module 31; a bit is valid when high and invalid when low. In the example of Fig. 4A, the value of the leaf configuration register indicates that ports 0 to 2 are the leaf ports of this node in barrier group 1.
The parent port configuration register in the group configuration module 31 identifies the parent port by its port number.
The parent port register also contains a root (Root) bit, which indicates whether the current node is the root of the whole barrier group, and a reset (Reset) bit, which resets the barrier operation.
In the example of Fig. 4B, the parent port configuration register indicates that port 3 is the parent port of this node in barrier group 0 and that this node is not the root of the whole barrier group.
The barrier packet grouping module 33 classifies packets by barrier group number and translates the barrier packets of each group into events, including reduction, distribution, retransmission and urge events.
The barrier state machine 32 comprises a first group state machine 321, a second group state machine 321, ..., an (N-1)th group state machine 321, and a scheduling state machine 322.
Each group state machine 321 processes the barrier packets belonging to its group. The barrier packets produced by each group are scheduled by the scheduling state machine 322 and then sent to the next-level node.
The group state machine 321 comprises a barrier reduction state register, which records the state information arising during a barrier operation.
Fig. 4C shows the barrier reduction state register; each of its bits represents the barrier reduction state of one leaf port. The register also contains a sequence number bit (Seq), used to distinguish two consecutive barrier operations.
The group state machine 321 also comprises a barrier completion status bit, shown in Fig. 4D, which records the completion state of the barrier distribution operation.
The timeout counting module 34 controls the generation of retransmission and urge events.
The link state monitoring module 35 dynamically obtains link state information and, based on it, dynamically sets the retransmission and urge timeout thresholds used by the timeout counting module 34.
Before a barrier operation is carried out, the barrier tree structure is first established through the group configuration module 31 for use by subsequent operations.
During operation, the barrier packet grouping module 33 reads barrier packets from the barrier input buffer, separates them by group number, and translates them into events (reduction, distribution, retransmission, and urge events) for the barrier group state machines 321.
For each event it receives, the barrier group state machine 321 produces the corresponding barrier packet in response.
The barrier packets produced by each group are sent by the dispatch state machine 322 to the corresponding barrier output buffer. The barrier group state machine 321 implements the retransmission-based reliability mechanism; the timeout counting module 34 and the link state monitoring module 35 cooperate with it so that the reliability mechanism performs as well as possible.
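The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BarrierPacket` fields, the `kind` tags, and the one-to-one mapping from packet kind to event are hypothetical and are not taken from the patent, which specifies hardware modules rather than software.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum, auto


class Event(Enum):
    REDUCTION = auto()
    DISTRIBUTION = auto()
    ACK = auto()
    URGE = auto()


@dataclass(frozen=True)
class BarrierPacket:
    group: int  # barrier group number
    kind: str   # hypothetical tag: "reduce", "distribute", "ack" or "urge"
    port: int   # input port the packet arrived on
    seq: int    # single-bit sequence number


# Assumed mapping from packet kind to the event handed to the group state machine.
KIND_TO_EVENT = {
    "reduce": Event.REDUCTION,
    "distribute": Event.DISTRIBUTION,
    "ack": Event.ACK,
    "urge": Event.URGE,
}


def group_packets(packets):
    """Separate packets by group number and translate each into an event tuple."""
    events = defaultdict(list)
    for p in packets:
        events[p.group].append((KIND_TO_EVENT[p.kind], p.port, p.seq))
    return dict(events)
```

Each group state machine would then consume only the event list stored under its own group number.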
In the barrier operating network system based on fat-tree topology of the present invention, the processor nodes 12 are the leaves of the tree, and the root and the intermediate nodes of the tree are switches. In a fat-tree (Fat-Tree) topology all processors are interconnected through switches, so the switches sit in the middle of the communication paths between processors. Selecting a switch as the root therefore gives the shortest communication path and overcomes the drawback of existing schemes that select a processor node 12 as the root of the barrier tree, which increases communication delay.
The present invention simplifies the barrier protocol by implementing the management and configuration functions over an Ethernet-based management network. This overcomes a drawback of existing fat-tree (Fat-Tree) interconnection networks that use a switch as the root of the barrier tree: the switch must implement more of the barrier protocol and additional complex management and configuration functions, which greatly increases the implementation complexity of the switch.
The barrier (Barrier) operating method based on fat-tree topology of the present invention is described in detail below. As shown in Fig. 5, it comprises the following steps:
Step S100, configuration process. First, the switch nodes 11 in the barrier tree of the barrier operating system are configured according to the fat-tree topology;
Step S110: the leaf configuration register and the parent port configuration register in the group configuration module 31 of the switch are configured to determine the interconnection between the nodes of a barrier group.
In the leaf configuration register of the group configuration module 31, each port of the switch is mapped to one bit; a bit at high level marks the port as valid and a bit at low level marks it as invalid. As shown in Fig. 4A, the value of the leaf configuration register indicates that ports 0 to 2 are the leaf ports of this node in barrier group 1.
The parent port configuration register in the group configuration module 31 identifies the parent port by its port number.
The parent port register also contains a Root bit, which indicates whether the current node is the root of the whole barrier group, and a Reset bit, which resets the barrier operation.
As shown in Fig. 4B, the parent port configuration register indicates that port 3 is the parent port of this node in barrier group 0 and that this node is not the root of the whole barrier group.
Step S120: the value of the leaf configuration register in the group configuration module 31 is used to set the initial value of the barrier status register;
Step S130: before the barrier operation is carried out, the barrier completion status bit is reset to 0.
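As a rough software analogy of steps S110 to S130, the registers can be modelled as integers and single bits. The class names, field names, and the initial sequence number of 0 are assumptions made for this sketch; the port values are borrowed from Fig. 4A and Fig. 4B purely as sample inputs.

```python
from dataclasses import dataclass


@dataclass
class GroupConfig:
    leaf_mask: int    # leaf configuration register: bit i = 1 means port i is a leaf port
    parent_port: int  # parent port configuration register: port number of the parent
    is_root: bool     # Root bit


@dataclass
class GroupState:
    reduce_status: int  # barrier reduction status register, one bit per leaf port
    seq: int            # single-bit sequence number (Seq)
    done: int           # barrier completion status bit


def configure_group(leaf_ports, parent_port, is_root):
    """S110: fill the configuration registers; S120/S130: derive the initial state."""
    leaf_mask = 0
    for port in leaf_ports:
        leaf_mask |= 1 << port
    config = GroupConfig(leaf_mask, parent_port, is_root)
    # S120: the status register starts as a copy of the leaf mask (all leaves pending).
    # S130: the completion bit starts at 0. Starting seq at 0 is an assumption.
    state = GroupState(reduce_status=leaf_mask, seq=0, done=0)
    return config, state
```

Copying the leaf mask into the status register means that "all bits 0" later signals that every configured leaf port has reported.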
Step S200: when the switch node 11 is idle, its barrier module 21 responds to urge packets whose sequence number differs from the current sequence number, waits for a barrier reduction packet whose sequence number equals the current sequence number, and discards all other packets;
Step S300: after the switch node 11 receives a barrier reduction packet carrying the current barrier sequence number, it responds to urge packets whose sequence number differs from the current one and to barrier reduction packets whose sequence number equals the current one, and discards other packets. It carries out the barrier reduction using timeout retransmission, returning an acknowledgement packet for every barrier reduction packet;
Step S400: when the value of the barrier reduction status register in the barrier module 21 is all 0, i.e. the reduction process is complete, the barrier module 21 of the node responds to the barrier distribution packet carrying the current sequence number, inverts the barrier completion status bit, and carries out the barrier distribution using timeout urging. It then returns to step S200 and enters the idle state.
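The cycle through steps S200, S300, and S400 can be pictured as a small state machine. The sketch below is a simplified reading for a non-root node; the state names, the event strings, and the `next_state` function are hypothetical, and the root node (which starts distribution itself instead of waiting for a distribution packet) is not modelled.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # step S200
    REDUCING = auto()      # step S300
    DISTRIBUTING = auto()  # step S400


def next_state(state, event, seq_matches, reduce_status):
    """Return the next state.

    event: hypothetical event name, "reduce" or "distribute".
    seq_matches: True if the packet's sequence bit equals the current one.
    reduce_status: value of the reduction status register after the event.
    """
    if state is State.IDLE and event == "reduce" and seq_matches:
        return State.REDUCING      # first matching reduction packet arrived
    if state is State.REDUCING and reduce_status == 0:
        return State.DISTRIBUTING  # all leaf ports have reported
    if state is State.DISTRIBUTING and event == "distribute" and seq_matches:
        return State.IDLE          # distribution handled, back to idle
    return state                   # everything else leaves the state unchanged
```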
Referring to Fig. 6, the following describes in detail the process in step S300 of the present invention in which barrier reduction packets carrying the current sequence number are handled, an acknowledgement packet is returned for every barrier reduction packet, and the barrier reduction is carried out using timeout retransmission.
In this embodiment the description is limited to the inside of a switch node 11 and to a single barrier group, but it applies equally to the whole barrier operating network.
The following reliability problems can arise in the reduction process:
1) an erroneous barrier reduction packet is received;
2) a barrier reduction packet that was sent is corrupted;
3) a barrier reduction packet from the previous barrier operation is received.
The present invention handles these errors with a configurable error retransmission mechanism and a sequence number bit identification method, ensuring that the barrier operation is carried out correctly. As shown in Fig. 5B, the details are as follows:
During operation, when a barrier reduction packet is received on port 0, the status bit for port 0 in the reduction status register is cleared. When the value of the reduction status register becomes all 0, the reduction-complete event of this node is triggered and a barrier reduction packet is sent to the parent port.
The sequence number bit in the reduction status register distinguishes two consecutive barrier operations; only a barrier reduction packet whose sequence number bit equals the current one may modify the reduction status register.
The reduction process also includes the following steps to guarantee the reliability of the reduction:
Step S310': after receiving a barrier reduction packet, the barrier module 21 checks the barrier sequence number in the packet. If it matches the current sequence number, the corresponding status bit of the reduction status register is cleared and the process goes to step S320'; otherwise the packet is discarded. In both cases the barrier module 21 replies to every barrier reduction packet with an acknowledgement (ACK) packet that carries the sequence number of that barrier reduction packet;
Step S320': when the reduction status register is all 0, the barrier module 21 sends a barrier reduction packet, has the timeout counting module 34 start the acknowledgement (ACK) timeout count, and goes to step S330'; otherwise the process remains at step S320';
Step S330': when an acknowledgement (ACK) packet is received whose sequence number matches the current sequence number, the barrier module 21 resets the ACK timeout count, which ends this timeout action, and goes to step S340'. If the ACK timeout count reaches the threshold, the barrier reduction packet is retransmitted;
Preferably, the timeout retransmission threshold used in step S330' is set by the link state monitoring module 35. As shown in Fig. 8, the functions of the link state monitoring module 35 include link state testing and timeout threshold setting. Two counters are provided in the link state monitoring module 35: one records the number of packets with CRC errors and is initialized to 1, and the other records the total number of packets. The ratio of the two counters is the error rate:
$r = C_{error} / C_{total}$
where the value of $C_{error}$ is never 0.
The timeout threshold is Ack_Thres = K / r, where K is a constant.
Therefore, the higher the error rate r, the smaller the threshold and the more frequent the retransmissions; conversely, the lower the error rate r, the larger the threshold and the fewer the retransmissions.
On a link with a high error rate, prompt retransmission improves system efficiency, but if retransmission is too frequent the retransmitted packets interfere with normal data transfer and barrier operations. On a link with a low error rate, retransmissions should be kept to a minimum to limit that interference, but an overly long timeout lowers barrier execution efficiency when an error does occur.
By setting the timeout threshold dynamically, the best moment for retransmission can be determined and error recovery can reach its best performance.
To balance prompt retransmission against retransmission interference, K is preferably set to the empirical value K = 2 × RTT, where RTT is the round-trip time (Round-Trip Time).
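The threshold rule can be illustrated with a short Python sketch. The class and method names are hypothetical. Starting the total counter at 1 is an assumption made here so that the ratio is defined before any packet arrives; the text only states that the error counter is initialized to 1. The threshold is computed as K × C_total / C_error, which is algebraically the same as K / r.

```python
class LinkMonitor:
    """Two counters in the spirit of link state monitoring module 35."""

    def __init__(self):
        self.c_error = 1  # packets with CRC errors; starts at 1 so r is never 0
        self.c_total = 1  # all packets; starting at 1 is an assumption of this sketch

    def record(self, crc_ok):
        """Count one received packet and whether its CRC check passed."""
        self.c_total += 1
        if not crc_ok:
            self.c_error += 1

    def ack_threshold(self, rtt):
        """Ack_Thres = K / r with K = 2 * RTT, in the same time unit as rtt."""
        k = 2 * rtt
        return k * self.c_total / self.c_error
```

For example, after 99 clean packets the counters read 1 and 100, so with an RTT of 10 time units the threshold is 2000; one further corrupted packet changes the counters to 2 and 101 and lowers the threshold to 1010.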
Step S340': wait for the barrier distribution packet.
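Steps S310' to S340' can be summarized as four small handlers. This is a behavioural sketch, not the hardware design: the `ReduceState` class, the tuple-shaped packets, the `sent` flag, and the tick-based counter are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReduceState:
    reduce_status: int                 # one bit per leaf port; 1 = still waiting
    seq: int                           # current single-bit sequence number
    sent: bool = False                 # reduction packet already sent to the parent
    ack_counter: Optional[int] = None  # None = ACK timer not running


def on_reduction_packet(st, port, pkt_seq):
    """S310': update the register only on a matching sequence number; always ACK."""
    if pkt_seq == st.seq:
        st.reduce_status &= ~(1 << port)  # clear the bit of the reporting leaf port
    return ("ack", port, pkt_seq)         # the ACK echoes the packet's own sequence number


def try_send_up(st):
    """S320': once the register is all 0, send upward and start the ACK timer."""
    if st.reduce_status == 0 and not st.sent:
        st.sent = True
        st.ack_counter = 0
        return ("reduce", st.seq)
    return None


def on_tick(st, ack_threshold):
    """S330', timeout branch: retransmit when the counter reaches the threshold."""
    if st.ack_counter is None:
        return None
    st.ack_counter += 1
    if st.ack_counter >= ack_threshold:
        st.ack_counter = 0
        return ("reduce", st.seq)
    return None


def on_ack(st, ack_seq):
    """S330', success branch: a matching ACK stops the timer; S340' follows."""
    if ack_seq == st.seq:
        st.ack_counter = None
        return True
    return False
```

A stale packet from the previous barrier operation is still acknowledged, so its sender stops retransmitting, but it leaves the status register untouched.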
Referring to Fig. 7, the following describes in further detail the process in step S400 in which the barrier module 21 of the node responds to the barrier distribution packet carrying the current sequence number, inverts the barrier completion status bit, and carries out the barrier distribution using timeout urging.
In this embodiment the distribution process is likewise described only for the inside of a single node, but it applies equally to the whole barrier network.
Step S410: if the node is the root node (Root), i.e. Root bit = 1, then upon completing the barrier reduction process the barrier module 21 of the root node sends barrier distribution packets to the valid ports in the leaf configuration register. The barrier distribution packets are marked with the current barrier sequence number bit, and the barrier completion status bit is set at the same time;
Step S420: if the node is not the root node (Root), then after a barrier distribution packet is received from the parent port, the barrier module 21 of the node sets the barrier completion status bit and forwards the barrier distribution packet to the leaf ports.
The distribution process also includes the following steps to guarantee the reliability of the distribution:
Step S410': in a switch node 11 that has completed the reduction process, at the same time as the acknowledgement (ACK) timeout count is started, the timeout counting module 34 in the barrier module 21 of the switch node 11 starts the urge timeout count for the barrier distribution packet. The urge count uses a timeout urge threshold that differs from the ACK timeout threshold of the reduction process;
When the urge count reaches the threshold and the barrier distribution packet has still not been received, a distribution urge packet is sent to the parent port to urge the parent node to send the barrier distribution packet;
Preferably, the timeout urge threshold in step S410' is set by the link state monitoring module 35. As shown in Fig. 8, the functions of the link state monitoring module 35 include link state testing and urge threshold setting. Two counters are provided in the link state monitoring module 35: one records the number of packets with CRC errors and is initialized to 1, and the other records the total number of packets. The ratio of the two counters is the error rate:
$r = C_{error} / C_{total}$
where the value of $C_{error}$ is never 0.
The urge threshold is Dun_Thres = (K' × L) / r, where K' is a constant and L is the layer number of the node in the barrier tree. The higher the error rate r, the smaller the threshold and the more frequent the urging; conversely, the lower the error rate r, the larger the threshold and the sparser the urging. Moving through the tree structure from the bottom up, the urge frequency gradually increases.
On a link with a high error rate, prompt urging improves system efficiency, but if urging is too frequent the urge packets interfere with normal data transfer and barrier operations. On a link with a low error rate, urging should be kept to a minimum to limit that interference, but an overly long timeout lowers barrier execution efficiency when an error does occur.
By setting the timeout threshold dynamically, the best moment for urging can be determined and error recovery can reach its best performance.
To balance prompt urging against urge interference, K' is preferably set to the empirical value K' = 4 × RTT.
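A worked example may help to show how the urge threshold scales. The function below is a minimal sketch with a hypothetical name and signature; it takes the two counter values directly and computes K' × L × C_total / C_error, which equals (K' × L) / r.

```python
def dun_threshold(rtt, level, c_error, c_total):
    """Dun_Thres = (K' * L) / r with K' = 4 * RTT and r = c_error / c_total."""
    k_prime = 4 * rtt
    return k_prime * level * c_total / c_error
```

With an RTT of 10 time units and counters reading 1 error in 100 packets, a node with L = 1 gets a threshold of 4000 and a node with L = 3 gets 12000, so nodes with a smaller L urge more often. Doubling the error count to 2 halves both values.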
Step S420': if the node is the root node (Root), then after completing the barrier reduction process the barrier module 21 of the node fills the barrier distribution packet with the current barrier sequence number bit and goes to step S430'. If the node is not the root node, it goes to step S430' when it receives a barrier distribution packet whose sequence number equals the current sequence number bit;
Step S430': the barrier module 21 of the node sends the barrier distribution packet to the valid ports in the leaf configuration register, sets the barrier completion status bit to 1, and goes to step S440';
Step S440': if a barrier urge packet is received while the barrier completion status bit is 1, and the sequence number bit of the urge packet differs from the current barrier sequence number, the barrier module 21 of the node retransmits the barrier distribution packet, filling the retransmitted packet with the inverted value of the barrier sequence number bit.
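The distribution-side rules in steps S410' to S440' can be sketched as three small functions. All names are hypothetical, state is kept in a plain dictionary, and one detail is an assumption: the text does not say which port the retransmitted distribution packet goes to, so this sketch sends it back to the port that delivered the urge packet.

```python
def leaf_ports(leaf_mask):
    """List the port numbers whose bit is set in the leaf configuration register."""
    return [i for i in range(leaf_mask.bit_length()) if (leaf_mask >> i) & 1]


def distribute(state, leaf_mask):
    """S430': send a distribution packet to every valid leaf port, set the completion bit."""
    state["done"] = 1
    return [(port, state["seq"]) for port in leaf_ports(leaf_mask)]


def on_urge_packet(state, port, urge_seq):
    """S440': retransmit if distribution is complete and the urge carries the other seq bit.

    Returns (port, seq) for the retransmitted packet, or None if nothing is sent.
    Replying on the urging port is an assumption of this sketch.
    """
    if state["done"] == 1 and urge_seq != state["seq"]:
        return (port, state["seq"] ^ 1)  # fill with the inverted sequence bit
    return None


def urge_due(ticks_waited, dun_threshold, distribution_received):
    """S410': an urge packet goes to the parent once the count reaches the threshold."""
    return (not distribution_received) and ticks_waited >= dun_threshold
```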
In the present invention, the reduction and distribution processes together implement the barrier operation. Only three states are needed, which greatly simplifies the logic design.
The present invention can be written in a hardware description language (Hardware Description Language, HDL) and, after synthesis, simulation, and debugging, downloaded to a field programmable gate array (Field Programmable Gate Array, FPGA) device or implemented in an application-specific integrated circuit to realize the required system-on-chip. Alternatively, each functional module can be implemented with an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC).
The specific embodiments of the present invention described and illustrated above should be regarded as exemplary and not as limiting the invention; the invention should be interpreted according to the appended claims.

Claims (20)

1. A barrier operating network system based on fat-tree topology, included in the data interconnection network of a multiprocessor system, the barrier operating network system comprising a plurality of barrier trees with a fat-tree topology, in which switch nodes serve as the roots of the barrier trees and processor nodes serve as the leaves of the barrier trees, characterized in that
the switch node comprises:
a barrier module, configured to configure the switch node according to the fat-tree topology and, according to the state of the switch node, to use an acknowledgement scheme for collecting barrier reduction packets in the reduction process and an urge scheme for collecting barrier distribution packets in the distribution process, thereby guaranteeing the reliability of the barrier operation.
2. The barrier operating network system according to claim 1, characterized in that the barrier module is further configured to distinguish two consecutive barrier operations by means of a single-bit flag.
3. The barrier operating network system according to claim 1 or 2, characterized in that the barrier module is further configured to obtain the link error rate of the current switch from error rate statistics and to adjust the timeout retransmission parameters automatically.
4. The barrier operating network system according to claim 1 or 2, characterized in that the barrier module comprises a group configuration module, a barrier state machine, a barrier packet grouping module, and a timeout counting module, wherein:
the group configuration module comprises a leaf configuration register and a parent port configuration register and is used to determine the interconnection between the nodes of a barrier group;
the barrier packet grouping module is used to classify packets by barrier group number and to translate the barrier packets of each group into events;
the barrier state machine comprises a plurality of group state machines and a dispatch state machine;
the timeout counting module is used to control the generation of retransmission events and urge events.
5. The barrier operating network system according to claim 4, characterized in that each group state machine processes the barrier packets that belong to its group, and the barrier packets produced by each group are scheduled by the dispatch state machine and then sent to the next-level node;
the group state machine comprises a barrier reduction status register, which records the state information of the barrier operation in progress;
the group state machine also comprises a barrier completion status bit, which records the completion status of the barrier distribution operation.
6. The barrier operating network system according to claim 4, characterized in that the barrier module further comprises a link state monitoring module, which dynamically obtains link state information and, based on it, dynamically sets the retransmission and urge timeout thresholds used by the timeout counting module.
7. A switch serving as the root of a barrier tree with a fat-tree topology, characterized in that it comprises a barrier module, configured to configure the switch node according to the fat-tree topology and, according to the state of the switch node, to use an acknowledgement scheme for collecting barrier reduction packets in the reduction process and an urge scheme for collecting barrier distribution packets in the distribution process, thereby guaranteeing the reliability of the barrier operation.
8. The switch according to claim 7, characterized in that the barrier module is further configured to distinguish two consecutive barrier operations by means of a single-bit flag.
9. The switch according to claim 8, characterized in that the barrier module is further configured to obtain the link error rate of the current switch from error rate statistics and to adjust the timeout retransmission parameters automatically.
10. The switch according to claim 7 or 8, characterized in that the barrier module comprises a group configuration module, a barrier state machine, a barrier packet grouping module, and a timeout counting module, wherein:
the group configuration module comprises a leaf configuration register and a parent port configuration register and is used to determine the interconnection between the nodes of a barrier group;
the barrier packet grouping module is used to classify packets by barrier group number and to translate the barrier packets of each group into events; the events comprise reduction, distribution, reduction acknowledgement, and distribution urge events;
the barrier state machine comprises a plurality of group state machines and a dispatch state machine;
the timeout counting module is used to control the generation of retransmission events and urge events.
11. The switch according to claim 10, in which the barrier module further comprises a link state monitoring module, which dynamically obtains link state information and, based on it, dynamically sets the retransmission and urge timeout thresholds used by the timeout counting module.
12. A barrier operating method based on fat-tree topology, characterized in that it comprises the following steps:
Step A: configuring the switch nodes in the barrier tree of the barrier operating system according to the fat-tree topology;
Step B: the switch node, in the idle state, responds to urge packets whose sequence number differs from the current sequence number and waits for a barrier reduction packet whose sequence number equals the current sequence number;
Step C: after the switch node receives a barrier reduction packet carrying the current barrier sequence number, it responds to barrier reduction packets whose sequence number equals the current sequence number and carries out the barrier reduction using timeout retransmission, returning an acknowledgement packet for every barrier reduction packet;
Step D: when the reduction process is complete, the switch node responds to the barrier distribution packet carrying the current sequence number, where the reception of the barrier distribution packet uses timeout urging, multicasts the received barrier distribution packet, and then returns to step B.
13. The barrier operating method based on fat-tree topology according to claim 12, characterized in that carrying out the barrier distribution in step D further comprises the following step:
inverting the barrier completion status bit.
14. The barrier operating method based on fat-tree topology according to claim 12 or 13, characterized in that, in step C, responding to barrier reduction packets whose sequence number equals the current sequence number and carrying out the barrier reduction using timeout retransmission, with an acknowledgement packet returned for every barrier reduction packet, comprises the following steps:
Step C1: after a barrier reduction packet is received, the barrier sequence number in the packet is checked; if it matches the current sequence number, the corresponding status bit of the reduction status register is cleared and the process goes to step C2; otherwise the packet is discarded; in both cases an acknowledgement packet is returned for every barrier reduction packet, and the acknowledgement packet carries the sequence number of that barrier reduction packet;
Step C2: when the reduction status register is all 0, the barrier module sends a barrier reduction packet, has the timeout counting module start the acknowledgement timeout count, and goes to step C3; otherwise the process remains at step C2;
Step C3: when an acknowledgement packet is received whose sequence number matches the current sequence number, the acknowledgement timeout count is reset, which ends this timeout action, and the process goes to step C4; if the acknowledgement timeout count reaches the threshold, the barrier reduction packet is retransmitted;
Step C4: waiting for the barrier distribution packet.
15. The barrier operating method based on fat-tree topology according to claim 14, characterized in that the threshold is set by a link state monitoring module.
16. The barrier operating method based on fat-tree topology according to claim 15, characterized in that the functions of the link state monitoring module comprise link state testing and timeout threshold setting; two counters are provided in the link state monitoring module, one of which records the number of packets with CRC errors and is initialized to 1, and the other of which records the total number of packets; the ratio of the two counters is the error rate:
$r = C_{error} / C_{total}$
where $C_{error}$ is the number of packets with CRC errors, $C_{total}$ is the total number of packets received, and neither $C_{error}$ nor $C_{total}$ is 0;
the timeout threshold is Ack_Thres = K / r, where K is a constant.
17. The barrier operating method based on fat-tree topology according to claim 13, characterized in that, in step D, responding to the barrier distribution packet carrying the current sequence number, inverting the barrier completion status bit, and carrying out the barrier distribution using timeout urging comprises the following steps:
Step D1: a switch node that has completed the reduction process starts the urge timeout count for the barrier distribution packet at the same time as it starts the acknowledgement timeout count; when the urge count reaches the threshold and the barrier distribution packet has still not been received, a distribution urge packet is sent to the parent port to urge the parent node to send the barrier distribution packet;
Step D2: if the node is the root node, then after completing the barrier reduction process the barrier module of the node fills the barrier distribution packet with the current barrier sequence number bit and goes to step D3; if the node is not the root node, it goes to step D3 when it receives a barrier distribution packet whose sequence number equals the current sequence number bit;
Step D3: the barrier module of the node sends the barrier distribution packet to the valid ports in the leaf configuration register, sets the barrier completion status bit to 1, and goes to step D4;
Step D4: if a barrier urge packet is received while the barrier completion status bit is 1, and the sequence number bit of the urge packet differs from the current barrier sequence number, the barrier module of the node retransmits the barrier distribution packet, filling the retransmitted packet with the inverted value of the barrier sequence number bit.
18. The barrier operating method based on fat-tree topology according to claim 17, characterized in that the threshold is set by a link state monitoring module.
19. The barrier operating method based on fat-tree topology according to claim 18, characterized in that the functions of the link state monitoring module comprise link state testing and urge threshold setting; two counters are provided in the link state monitoring module, one of which records the number of packets with CRC errors and is initialized to 1, and the other of which records the total number of packets; the ratio of the two counters is the error rate:
$r = C_{error} / C_{total}$
where $C_{error}$ is the number of packets with CRC errors, $C_{total}$ is the total number of packets received, and neither $C_{error}$ nor $C_{total}$ is 0;
the urge threshold is Dun_Thres = (K' × L) / r, where K' is a constant and L is the layer number of the node in the barrier tree.
20. The barrier operating method based on fat-tree topology according to claim 12 or 13, characterized in that step A comprises the following steps:
Step A1: configuring the leaf configuration register and the parent port configuration register in the group configuration module of the switch to determine the interconnection between the nodes of a barrier group;
Step A2: using the value of the leaf configuration register in the group configuration module to set the initial value of the barrier status register;
Step A3: resetting the barrier completion status bit before the barrier operation is carried out.
CNB2007101207540A 2007-08-24 2007-08-24 A kind of barrier operating network system, device and method based on fat tree topology Active CN100571183C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101207540A CN100571183C (en) 2007-08-24 2007-08-24 A kind of barrier operating network system, device and method based on fat tree topology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101207540A CN100571183C (en) 2007-08-24 2007-08-24 A kind of barrier operating network system, device and method based on fat tree topology

Publications (2)

Publication Number Publication Date
CN101127677A CN101127677A (en) 2008-02-20
CN100571183C true CN100571183C (en) 2009-12-16

Family

ID=39095607

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101207540A Active CN100571183C (en) 2007-08-24 2007-08-24 A kind of barrier operating network system, device and method based on fat tree topology

Country Status (1)

Country Link
CN (1) CN100571183C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945050B (en) * 2010-09-25 2014-03-26 中国科学院计算技术研究所 Dynamic fault tolerance method and system based on fat tree structure
JP5542787B2 (en) * 2011-12-08 2014-07-09 シャープ株式会社 Image forming apparatus
CN107066417A (en) * 2017-02-28 2017-08-18 郑州云海信息技术有限公司 A kind of method and apparatus of link parameter on-line tuning
CN109246030B (en) * 2018-08-28 2021-06-15 烽火通信科技股份有限公司 Method and system for realizing state machine in configuration editing process

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5365228A (en) * 1991-03-29 1994-11-15 International Business Machines Corporation SYNC-NET- a barrier synchronization apparatus for multi-stage networks
US5832261A (en) * 1991-11-28 1998-11-03 Fujitsu Limited Barrier synchronizing mechanism for a parallel data processing control system
US6216174B1 (en) * 1998-09-29 2001-04-10 Silicon Graphics, Inc. System and method for fast barrier synchronization
CN1514591A (en) * 2002-12-31 2004-07-21 Inspur Electronic Information Industry Co., Ltd. High-speed, cost-effective multi-branch fat-tree network topology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
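A parallel FFT algorithm based on a binary-tree fat-tree model. Wei Wenhong, Gao Dali. Computer Applications, Vol. 27, No. 4. 2007 *
Analysis and research on building InfiniBand cluster systems with fat-tree topology. Wang Wenyi, Chen Huihui. Computer Engineering and Applications, Vol. 43, No. 3. 2007 *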

Also Published As

Publication number Publication date
CN101127677A (en) 2008-02-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant