CN103973482A - Fault-tolerant on-chip network system with global communication service management capability and method - Google Patents

Fault-tolerant on-chip network system with global communication service management capability and method

Publication number
CN103973482A
Authority
CN
China
Prior art keywords
fault
network
link
node
path
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410164138.5A
Other languages
Chinese (zh)
Inventor
葛芬
吴宁
张颖
叶云飞
徐文涛
郑锦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201410164138.5A
Publication of CN103973482A

Abstract

The invention discloses a fault-tolerant on-chip network system with the global communication service management capability. In the fault-tolerant on-chip network system, a network monitor and fault-tolerant routers serve as cores and are connected mutually through special buses to form the fault-tolerant on-chip network system. The invention further discloses a route distribution method and a network data transmission method applied to the fault-tolerant on-chip network system. Through the fault-tolerant on-chip network system, the route distribution method and the network data transmission method, the global network state can be monitored, network congestion and fault links can be located and recognized, data retransmission can be conducted timely, transient faults can be avoided, congestion and permanent fault links can be avoided through rerouting, the fault-tolerant capability of an on-chip network is effectively improved, and meanwhile the network data transmission performance at a high communication load is optimized and improved.

Description

Fault-tolerant network-on-chip system with global communication transaction management capability, and method
Technical field
The invention belongs to the field of network-on-chip design, and specifically relates to a fault-tolerant network-on-chip system with global communication transaction management capability and to methods applied to that system.
Background art
A network-on-chip (NoC) builds a micro-network based on packet switching on a single chip and provides an efficient, reliable, and flexible communication architecture for interconnecting IP cores. It has become an effective solution to the global interconnect and communication problems of complex system-on-chip designs. In a typical network-on-chip system, shown in Fig. 1, each router (R) is connected to its four neighboring routers and, through a network interface (NI), to one functional IP core. Each interconnect channel between routers, and between a router and a functional IP core, consists of two unidirectional links.
As chip area and feature size continue to shrink, faults that may occur on the chip can no longer be ignored. Adopting a fault-tolerant strategy that keeps forwarding packets effectively when a communication fault occurs has therefore become a major issue in the reliability design of NoC systems. NoC communication faults fall into two classes: transient faults and permanent faults. A transient fault is an error in one or more bits during data transmission caused by crosstalk, electromagnetic interference, process instability, and similar effects; known remedies include fault-tolerant routing based on random communication and retransmission mechanisms based on error-detecting and error-correcting codes. A permanent fault is usually caused by physical damage to a module during manufacturing or chip operation; it cannot be repaired, so an alternative path must be found and packets rerouted. Current fault-tolerant routing is mainly divided into static and dynamic routing. Static fault-tolerant routing first determines the fault region from the known fault locations and then designs detour routes that avoid it, so it cannot bypass a newly faulty node when a new physical fault appears at run time. Dynamic fault-tolerant routing adaptively adjusts the routing path according to the current network state. However, existing dynamic fault-tolerant routing algorithms consider only the fault state of adjacent nodes or of a constructed region, and ignore both the global network congestion that a local fault may cause and the correlation among packets transmitted concurrently.
Therefore, designing network-on-chip reliability from the perspective of global communication transaction management can not only effectively improve the fault tolerance of the network-on-chip, but also relieve network congestion under high communication load, further improving the data throughput of the network-on-chip.
Summary of the invention
Addressing the lack of global communication transaction management in current network-on-chip systems, the object of the invention is to provide a fault-tolerant network-on-chip architecture with global communication transaction management capability. The invention uses a network monitor module to monitor the global network state, locates and identifies congested and faulty links in the network-on-chip, and distinguishes transient from permanent link faults. Transient faults are overcome by retransmission, and congested and permanently faulty links are bypassed by recomputing routes. This effectively improves the fault tolerance of the network-on-chip and improves network data transmission performance under high communication load.
The technical solution of the invention is as follows:
A fault-tolerant network-on-chip system with global communication transaction management capability comprises interconnected fault-tolerant routers and functional IP cores, and is characterized in that it further comprises a network monitor. The network monitor is connected to each fault-tolerant router in the network-on-chip through a dedicated bus, monitors the global network state, and determines the routing paths of all communication traces of the application task based on the real-time network state so as to bypass congested and faulty links. The network monitor comprises the following parts:
A) Network state acquisition module: connected to each router, it collects the real-time link state information of the whole network;
B) Path allocation algorithm module: it executes the path allocation algorithm on the communication tasks and the link state information to compute a routing path for every communication trace;
C) Routing path sending module: it sends the routing path data computed by the path allocation algorithm module to the connected routers;
D) Memory module: it consists of a communication task table, a link-state table, and a global routing table. The communication task table stores the source node number, destination node number, and communication bandwidth constraint of each communication trace of the application task; the link-state table stores the idle, congested, or faulty state of the interconnecting links between routers; and the global routing table stores the routing path data computed by the path allocation algorithm module.
The core function of the network monitor is to compute routing paths for all communication traces of the application task based on the real-time network state. The invention further provides a path allocation algorithm applied to the above fault-tolerant network-on-chip system, comprising the following steps:
(a) Sort the communication traces of the application task in descending order of communication bandwidth requirement, and generate a quadrant graph from each source/destination node pair and the positions of the faulty or congested links.
(b) Input the quadrant graph w and its adjacency list, the source node r, and the destination node t.
(c) Initialize the distance of every node in the quadrant graph w, where the distance d(i) is the sum of the link bandwidth occupied by the path from the source node r to node i. Set d(r) = 0 and, for every v ≠ r, d(v) = ∞.
(d) Let T be the node set of the quadrant graph w, and start the search with u = r.
(e) Compute the successor node set N(u) of u. For each v ∈ N(u), if v ∈ T and d(v) > d(u) + w(uv), a shorter path to v has been found, so update d(v) = d(u) + w(uv). Here the weight w(uv) of edge uv describes the communication trace bandwidth requirement given in the core communication graph (CCG) of the application task. Then remove the current node u from T (T = T − {u}) and let u be the node in T with the smallest distance.
(f) Repeat step (e) until the destination node t is reached, and output the distance d(t). If, during the iteration, a link fault or congestion means that no reachable node in the successor set N(u) of the current node u yields a shorter path, backtrack along the previously established shortest path to the predecessor of u, delete u from T, and repeat step (e) until a next node u that satisfies the condition is found. The resulting path thus bypasses faulty or congested links, while d(t) ensures that the allocated path occupies the least link bandwidth, which balances the link load.
The invention also provides a network data transmission method applied to the above fault-tolerant network-on-chip system, comprising the following steps:
(1) Initialize the routing table of each router. Once the network-on-chip application system starts operating, the network monitor broadcasts routing path information to all routers. Each router accepts only the communication paths whose source node is its own local IP core and saves them in its routing table.
(2) During normal packet transmission, the router connected to the source node looks up its routing table and records the path information in the header flit of the packet, which is then transmitted by source routing.
(3) During transmission, when a link suffers a permanent fault or congestion, each router connected to that link simultaneously notifies the network monitor through the dedicated bus that the network state has changed.
(4) After receiving the real-time global network state information, the network monitor recomputes the optimal alternative paths for all communication traces so as to avoid faulty nodes, balance the network link load, and avoid congestion, and sends the new path information to all routers. In the next transmission cycle, the source node repackages the packet according to the optimal routing path updated by the network monitor and sends it to the destination node.
In the invention, in addition to the data routing and forwarding performed by a router in a conventional network-on-chip, the fault-tolerant router must be able to obtain the status of adjacent nodes in order to locate congested and faulty links, to perform error detection and retransmission in order to distinguish transient from permanent faults, and to exchange information with the network monitor. Preferably, therefore, the fault-tolerant router comprises the following parts:
A) Buffers: an input buffer and a retransmission buffer, which hold the data flits received at the input port and the data flits that need to be transmitted again, respectively;
B) Encoder/decoder: an input decoder and an output encoder. The input decoder decodes the data flits arriving on the input channel and checks whether they are correct; if an error is detected in a flit, a transient fault is considered to have occurred during transmission. The output encoder encodes the data flits to be sent to the next router through the output port;
C) Port controller: it copies data flits from the input buffer into the retransmission buffer of the port, records the number of retransmission requests for data flits, and monitors the occupancy of the input buffer in its direction, and from these judges whether the interconnecting link has suffered a permanent fault or congestion;
D) Routing module: it parses the packet header flit stored in the input buffer, selects source routing or adaptive routing according to the route flag in the header flit, determines the routing direction, and selects the output port;
E) Arbitration module: when several input ports request the same output port, it arbitrates according to the congestion level of the input buffers in each direction and selects the data of the input port with the highest congestion level (highest buffer occupancy);
F) Crossbar switch: a fully connected switch structure in which the signal lines between every input port and every output port are connected directly, used to switch data from one port to another;
G) State controller: it receives the status feedback signals and retransmission request signals from the downstream routers connected to the output ports in each direction, controls the forwarding of the data held in the buffers, stores the received status information in the link-state table, and requests the network monitor to read the data from the link-state table;
H) Routing table: it stores the routing path information sent by the network monitor.
With the above technical solution, the invention has the following technical effects compared with the prior art:
The invention introduces a network-on-chip monitor that obtains real-time global network state information and executes the path allocation algorithm, yielding a network-on-chip system with global communication transaction management capability. The invention can detect and locate congested and faulty links in the network-on-chip and distinguish transient from permanent link faults; it overcomes transient faults by retransmission and bypasses congestion and permanent faults by recomputing routes. It is therefore of practical value for balancing the network-on-chip load and improving data throughput and reliability.
Brief description of the drawings
Fig. 1 is a structure diagram of a typical network-on-chip system;
Fig. 2 is a structure diagram of the fault-tolerant network-on-chip system with global communication transaction management capability;
Fig. 3 is a structure diagram of the triple modular redundancy circuit;
Fig. 4 is a circuit structure diagram of the network monitor;
Fig. 5A shows the network-on-chip quadrant graphs;
Fig. 5B shows an example of routing path allocation based on a quadrant graph;
Fig. 6 is a circuit structure diagram of the fault-tolerant router;
Fig. 7 is a structure diagram of the even parity encoding and decoding circuit;
Fig. 8 is a circuit block diagram of the port controller module of the fault-tolerant router;
Fig. 9 shows the header flit format;
Fig. 10A is a circuit block diagram of the routing module of the fault-tolerant router;
Fig. 10B is a flowchart of the routing algorithm;
Fig. 11 is a circuit structure diagram of the crossbar switch;
Fig. 12 is a circuit structure diagram of the state controller;
Fig. 13 is a flowchart of data transmission in the system of the invention.
Detailed description of the embodiments
The invention is described below through embodiments with reference to the accompanying drawings.
The typical 4 × 4 mesh network-on-chip shown in Fig. 1 is used as the example throughout.
Fig. 2 shows the fault-tolerant network-on-chip system with global communication transaction management capability built on a 4 × 4 mesh network-on-chip. As shown in Fig. 2, each router R is connected not only to its four neighboring routers and one local IP core c, but also to a network monitor through a dedicated bus. The dedicated bus carries status and routing path information and uses triple modular redundancy (TMR): three lines and one voter circuit ensure the reliability of information transmission. As shown in Fig. 3, the output data state f is determined by the three input data states a, b, and c.
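The voter behavior can be illustrated with a minimal Python sketch. The text does not give the voter's logic equation, so the bitwise majority function f = ab + bc + ca used here is an assumption based on the usual TMR voter, and the name tmr_vote is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies: f = ab + bc + ca (assumed voter logic)."""
    return (a & b) | (b & c) | (a & c)


# One of the three lines carries a corrupted copy (a single flipped bit).
sent = 0b1011
corrupted = sent ^ 0b0100
recovered = tmr_vote(sent, corrupted, sent)
print(bin(recovered))
```

A majority voter of this kind masks any error confined to one of the three lines; it cannot correct the same bit being wrong on two lines at once.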
The network monitor obtains the real-time global network load and the status of available links and executes the path allocation algorithm. It consists mainly of a network state acquisition module, a path allocation algorithm module, a routing path sending module, a memory module, and a built-in self-test module; its circuit structure is shown in Fig. 4. The function of each part is as follows:
(1) network state acquisition module
This module is connected to each router and collects the real-time link state information of the whole network. On receiving a request signal Req from any router, the acquisition module reads the link state information on the corresponding data line Link_state, stores it in the link-state table, and sends an Ack signal to that router; it then asserts the trigger signal routing_trigger so that the link state information is passed to the path allocation algorithm module.
(2) Path allocation algorithm module
Based on the current network state, this module executes the path allocation algorithm to compute a routing path new_rout for each communication trace of the given application task. When the computation finishes, it asserts the trigger signal send_trigger to request that the path sending module prepare to send the routing paths. To improve the computational efficiency and scalability of the path allocation algorithm, the invention attaches an ARM Cortex-M0 processor core and its data and program memories over an AHB-Lite bus, and the processor core runs the path allocation algorithm. The algorithm comprises the following six steps:
(a) Sort the communication traces of the application task in descending order of communication bandwidth requirement, and generate a quadrant graph from each source/destination node pair and the positions of the faulty or congested links.
Depending on the relative position of the source and destination nodes, eight types of quadrant graph can be distinguished in a NoC, as shown in Fig. 5A: ES, SW, WN, NE, WE, EW, NS, and SN. If the quadrant graph generated for a source/destination pair is WE, EW, NS, or SN and contains a faulty or congested link, the row or column containing the source and destination nodes is expanded outward by one node to generate a new quadrant graph. In Fig. 5B, source node S1 and destination node D1 generate quadrant graph SN, which contains a faulty or congested link (marked "×"); expanding the column of the source/destination nodes outward by one node generates the new quadrant graph WE+WN.
(b) Input the quadrant graph w and its adjacency list, the source node r, and the destination node t.
(c) Initialize the distance of every node in w, where the distance d(i) is the sum of the link bandwidth occupied by the path from the source node r to node i. Set d(r) = 0 and, for every v ≠ r, d(v) = ∞.
(d) Let T be the node set of w, and start the search with u = r.
(e) Compute the successor node set N(u) of u. For each v ∈ N(u), if v ∈ T and d(v) > d(u) + w(uv), a shorter path to v has been found, so update d(v) = d(u) + w(uv). Here the weight w(uv) of edge uv describes the communication trace bandwidth requirement given in the core communication graph (CCG) of the application task. Then remove the current node u from T (T = T − {u}) and let u be the node in T with the smallest distance.
(f) Repeat step (e) until the destination node t is reached, and output the distance d(t). If, during the iteration, a link fault or congestion means that no reachable node in the successor set N(u) of the current node u yields a shorter path, backtrack along the previously established shortest path to the predecessor of u, delete u from T, and repeat step (e) until a next node u that satisfies the condition is found. In Fig. 5B, when a path is allocated for source node S2 and destination node D2, the search cannot continue northward on reaching node E, so it backtracks to node N and bypasses the faulty or congested link; the dashed arrows indicate the generated routing path.
The resulting path thus bypasses faulty or congested links, while d(t) ensures that the allocated path occupies the least link bandwidth, which balances the link load.
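Steps (b) to (f) can be sketched in Python as follows. This is a simplified illustration and not the patented procedure itself: it uses a textbook Dijkstra search with a priority queue, in which simply skipping faulty or congested links takes the place of the explicit backtracking in step (f), and it takes the graph as given rather than building a quadrant graph. The function name allocate_path, the adjacency-dict layout, and the sample weights are all assumptions.

```python
import heapq


def allocate_path(adj, blocked, r, t):
    """Least-cost path from r to t that avoids blocked links.

    adj:     {u: {v: w_uv}} adjacency map with link weights (hypothetical layout)
    blocked: set of (u, v) links reported as faulty or congested
    Returns (path as a node list, d(t)), or (None, inf) if t is unreachable.
    """
    d = {r: 0}          # step (c): d(r) = 0, every other node is implicitly infinite
    prev = {}           # predecessor on the best path found so far
    done = set()        # nodes already removed from T
    heap = [(0, r)]     # step (d): start the search at u = r
    while heap:
        du, u = heapq.heappop(heap)   # u = remaining node with the smallest distance
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, w in adj.get(u, {}).items():      # step (e): relax successors of u
            if (u, v) in blocked or v in done:
                continue
            if du + w < d.get(v, float("inf")):
                d[v] = du + w
                prev[v] = u
                heapq.heappush(heap, (d[v], v))
    if t not in done:
        return None, float("inf")
    path = [t]
    while path[-1] != r:
        path.append(prev[path[-1]])
    return path[::-1], d[t]


# 2 x 2 mesh with made-up link weights; link 1 -> 3 is reported faulty.
mesh = {0: {1: 1, 2: 2}, 1: {0: 1, 3: 1}, 2: {0: 2, 3: 2}, 3: {1: 1, 2: 2}}
print(allocate_path(mesh, set(), 0, 3))
print(allocate_path(mesh, {(1, 3)}, 0, 3))
```

With no blocked links the search takes the cheaper route through node 1; once link 1 → 3 is blocked it falls back to the route through node 2 at a higher cost, which is the detour behavior the steps above describe.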
(3) routed path sending module
This module compares the current result new_rout computed by the path allocation algorithm module with the path information old_rout previously stored in the global routing table. Only path data that has changed is sent to the corresponding routers and written to the global routing table, which makes routing path updates more efficient. To send, the module issues the request signal RT_request to the corresponding router and at the same time drives the routing information onto the data line RT_output.
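The compare-then-send behavior amounts to a table diff, sketched below in Python. The dictionary keyed by (source, destination) and the direction strings are hypothetical stand-ins for the hardware table format; only the names old_rout and new_rout come from the text.

```python
def changed_paths(old_rout, new_rout):
    """Return only the entries of new_rout that differ from old_rout."""
    return {flow: path for flow, path in new_rout.items()
            if old_rout.get(flow) != path}


# Keys are (source node, destination node); values are per-hop directions (made up).
old_rout = {(0, 5): "ES", (0, 12): "SSS"}
new_rout = {(0, 5): "ES", (0, 12): "ESSSW"}

updates = changed_paths(old_rout, new_rout)   # only these would be sent to routers
old_rout.update(updates)                      # the global routing table is refreshed
print(updates)
```

Only the trace from node 0 to node 12 is retransmitted here; the unchanged trace generates no traffic on the dedicated bus.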
(4) memory module
The communication task table in this module stores the source node number, destination node number, and communication bandwidth requirement of each communication trace: the source node number Src_ID and the destination node number Dest_ID are each 4 bits, and the bandwidth requirement BW_req is 16 bits. In a 4 × 4 mesh topology, the IDs of the 16 network nodes are 0000 to 1111. The link-state table uses the 4-bit field Router_ID to identify the router, the 2-bit field Link_ID to identify the link attached to that router ("00", "01", "10", and "11" denote, respectively, the links in the four compass directions), and the 2-bit field Status Value to describe the link state ("00", "01", and "1X" denote idle, congested, and faulty, respectively). Each routing path in the global routing table is stored as the source node number, the destination node number, and the routing direction of each hop ("00", "01", "10", and "11" denote, respectively, the four compass routing directions). In a 4 × 4 mesh network the longest shortest path has 6 hops; because detour routes may be needed to avoid congested or faulty nodes, the routing direction field Forward Direction of a routing path is defined as 16 bits, enough for eight 2-bit hops.
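The field widths above can be made concrete with a short Python sketch that packs and unpacks table entries. The widths (4, 4, and 16 bits for the task table; 4, 2, and 2 bits for the link-state table) are from the text; the order of the fields inside each word and all function names are assumptions made for illustration.

```python
IDLE, CONGESTED, FAULT = 0b00, 0b01, 0b10   # Status Value codes; "1X" means fault


def pack_link_state(router_id, link_id, status):
    """8-bit link-state entry: Router_ID[7:4] | Link_ID[3:2] | Status[1:0] (assumed order)."""
    assert 0 <= router_id < 16 and 0 <= link_id < 4 and 0 <= status < 4
    return (router_id << 4) | (link_id << 2) | status


def unpack_link_state(entry):
    return (entry >> 4) & 0xF, (entry >> 2) & 0x3, entry & 0x3


def is_fault(status):
    return (status & 0b10) != 0             # matches both "10" and "11"


def pack_task(src_id, dest_id, bw_req):
    """24-bit task entry: Src_ID[23:20] | Dest_ID[19:16] | BW_req[15:0] (assumed order)."""
    assert 0 <= src_id < 16 and 0 <= dest_id < 16 and 0 <= bw_req < 1 << 16
    return (src_id << 20) | (dest_id << 16) | bw_req


entry = pack_link_state(router_id=5, link_id=2, status=FAULT)
print(hex(entry), unpack_link_state(entry), is_fault(entry & 0x3))
```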
(5) built-in self-test module
To verify the reliability of the network monitor itself, this module consists of a test generator, a response analyzer, and a test controller, and is implemented as a finite state machine.
Besides the network monitor, the other important component of the network-on-chip system of the invention is the fault-tolerant router. Its circuit structure, shown in Fig. 6, comprises buffers, an encoder/decoder, port controllers, a routing module, an arbitration module, a crossbar switch, a state controller, and a routing table. The function of each part is described below. The router uses wormhole switching: during transmission each packet is divided into multiple flits, and each flit is 32 bits long.
(1) Buffers
To reduce transmission delay, the fault-tolerant router of the invention places data buffers only at the input ports; output ports do not buffer data. The input buffer is a static RAM (SRAM) with synchronous write and asynchronous read; its capacity is set to 320K bits with a data width of 32 bits, and both can be changed as required. The invention also places a retransmission buffer at each input port as a backup of the data output from the input buffer: every flit that the input port sends from the input buffer is simultaneously stored in the retransmission buffer, so that if reception fails at the downstream router, the data that was not received successfully can be resent from the retransmission buffer. The retransmission buffer has a depth of 1 and a capacity of 32 bits.
(2) Encoder/decoder
The invention places a decoder at each router input port and an encoder at each output port. The input decoder decodes the data flits arriving on the input channel and checks whether they are correct. If an error is detected in a flit, a transient fault is considered to have occurred during transmission; the port controller sends a Nack signal asking the sender (the upstream router) to retransmit the flit, and the received data is discarded. If the received flit has no error, it is stored in the input buffer. The output encoder encodes the data flits to be sent to the next router through the output port. The invention uses an even parity code, which is simple to implement: 32 bits of data are encoded into 33 bits, the most significant bit being the even parity bit. The encoding and decoding circuit is shown in Fig. 7.
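The 32-to-33-bit even parity scheme can be sketched in Python as follows. The function names are hypothetical; the bit layout (parity in bit 32, data in bits 31 to 0) follows the statement that the most significant bit is the parity bit.

```python
DATA_MASK = 0xFFFFFFFF      # 32 data bits
WORD_MASK = 0x1FFFFFFFF     # 33-bit code word


def encode_even_parity(data32):
    """Prepend a parity bit so that the 33-bit word has an even number of ones."""
    data32 &= DATA_MASK
    parity = bin(data32).count("1") & 1
    return (parity << 32) | data32


def decode_even_parity(word33):
    """Return (data, error_flag); error_flag is True when the number of ones is odd."""
    word33 &= WORD_MASK
    error = (bin(word33).count("1") & 1) == 1
    return word33 & DATA_MASK, error


word = encode_even_parity(0x00000007)        # three ones, so the parity bit is 1
print(hex(word), decode_even_parity(word))
print(decode_even_parity(word ^ 0x1))        # one flipped bit raises the error flag
```

A single parity bit detects any odd number of flipped bits but misses an even number, which is consistent with using it only to trigger retransmission rather than correction.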
(3) port controller
Apart from the local input port controller, the port controllers of the other four directions record the number of retransmission requests for data flits and monitor the occupancy of the input buffer in their direction. If the number of retransmissions exceeds a set threshold, the port controller sends the status signal (Status Value = "10") to the upstream router to indicate that the interconnecting link has suffered a permanent fault. If the buffer occupancy reaches a set threshold, the port controller sends the status value "01" to indicate to the upstream router that the interconnecting link is congested. A status value of "00" means that the interconnecting link is neither congested nor faulty, and the upstream router should send data as quickly as possible to reduce latency.
The port controller can be further divided into four submodules: a transmission controller, a retransmission counter, a link state analyzer, and a buffer controller; its circuit block diagram is shown in Fig. 8. When the upstream router issues a data send request Req and the error signal Error_flag from the decoder is not asserted, the transmission controller returns the acknowledge signal Ack to the upstream router and directs the buffer controller to issue the write address Wr_addr[3:0] and the write enable Wr_n so that the data is stored in the input buffer. If instead Error_flag is asserted, the transmission controller sends a Nack signal asking the upstream router to retransmit the flit. The buffer controller sends data from the input buffer to the output port by issuing the chip select Cs_n and the read address Rd_addr[3:0]. The retransmission counter counts Error_flag assertions; if the number received consecutively exceeds the set threshold, the link state analyzer sets the corresponding bit of Stress_value to indicate that the link is faulty. The link state analyzer also judges whether the link is congested from the buffer occupancy that the buffer controller computes as the difference between the write and read addresses, and sets the corresponding bit of Stress_value.
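The status decision made by the link state analyzer can be sketched in Python. The status codes are from the text; the threshold values, the 16-entry depth inferred from the 4-bit addresses, the rule that a fault takes priority over congestion, and the function names are all assumptions.

```python
def buffer_occupancy(wr_addr, rd_addr, depth=16):
    """Occupancy from the write/read address difference (4-bit addresses assumed).

    A completely full buffer would wrap to 0 here; real hardware needs an extra
    full/empty indication, which this sketch leaves out.
    """
    return (wr_addr - rd_addr) % depth


def link_status(consecutive_errors, occupancy, error_threshold=3, occupancy_threshold=12):
    """Return the 2-bit Status Value: 0b10 fault, 0b01 congested, 0b00 idle."""
    if consecutive_errors > error_threshold:     # "exceeds" the retransmission threshold
        return 0b10
    if occupancy >= occupancy_threshold:         # "reaches" the occupancy threshold
        return 0b01
    return 0b00


occ = buffer_occupancy(wr_addr=2, rd_addr=14)    # write pointer has wrapped around
print(occ, format(link_status(0, occ), "02b"))
print(format(link_status(5, occ), "02b"))
```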
(4) routing module
The routing module parses the header flit stored in the input buffer, determines the routing direction, and selects the output port. The header flit is 32 bits long, with the format shown in Fig. 9. The fields Dest_ID and Src_ID hold the destination and source node addresses, Routing_path records the routing path information, Rt_flag is the route type flag, Packet_legnth is the packet length, and Hops is the hop count. The circuit block diagram of the routing module is shown in Fig. 10A, and the meaning of each signal and the processing flow are shown in Fig. 10B.
In Fig. 10A, the signal E_routing_en indicates that a header flit has arrived in the buffer of the east input direction. The data parser reads the header flit E_head_flit of that direction and extracts the destination node address Dest_ID and the route flag Rt_flag. The route computation module examines the route flag signal E_Rt_flag for that direction. If its value is 0, source routing is used: the module extracts the path information and hop count from the header flit and computes the position of the next hop (Next_X, Next_Y). It also uses the congestion/fault flag E_stress_flag fed back by the state controller to judge whether the next-hop link can carry traffic. If it cannot, the module recomputes the next-hop position with the partially adaptive XY routing algorithm from the destination node address E_Dest_ID and the current node address Local_ID, and sends the Modify_req signal to tell the output port to change Rt_flag in the header flit to 1. If E_Rt_flag is already 1, the partially adaptive XY routing algorithm is used to compute the next-hop position directly. The direction controller determines the output port E_dir for the data of this input port from the next-hop position (Next_X, Next_Y) and sends it to the arbitration module.
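The choice between source routing and the adaptive fallback can be sketched in Python. The text does not spell out the partially adaptive XY algorithm, so the variant below (prefer the X direction, fall back to the Y direction when the X link is flagged) is an assumption, as are the coordinate convention (x grows eastward, y grows southward) and the names xy_candidates and next_hop.

```python
def xy_candidates(local, dest):
    """Productive directions toward dest, X dimension first (assumed convention)."""
    (lx, ly), (dx, dy) = local, dest
    candidates = []
    if dx != lx:
        candidates.append("E" if dx > lx else "W")
    if dy != ly:
        candidates.append("S" if dy > ly else "N")
    return candidates


def next_hop(rt_flag, source_dir, local, dest, stressed):
    """Return (output direction, Rt_flag to write back into the header flit).

    rt_flag:    0 = source routing, 1 = adaptive routing
    source_dir: direction read from Routing_path for this hop
    stressed:   set of directions whose link is flagged congested or faulty
    """
    if local == dest:
        return "L", rt_flag                      # deliver to the local IP core
    if rt_flag == 0 and source_dir not in stressed:
        return source_dir, 0                     # follow the precomputed path
    for direction in xy_candidates(local, dest):
        if direction not in stressed:
            return direction, 1                  # switch to adaptive routing
    return None, 1                               # no usable productive link; wait


print(next_hop(0, "E", (1, 1), (3, 2), set()))
print(next_hop(0, "E", (1, 1), (3, 2), {"E"}))
```

In the second call the eastward link is flagged, so the sketch leaves the source route, picks the southward productive direction, and returns Rt_flag = 1, mirroring the Modify_req behavior described above.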
(5) arbitration modules
If several input ports request the same output port, the arbitration module learns the occupancy of each direction's input buffer from the Stress_value sent by the port controllers shown in Fig. 8 and gives priority to the data in the input buffer with the highest congestion level (highest buffer occupancy). If the input ports competing for the same output port have the same load, a round-robin algorithm is used.
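The arbitration rule (highest occupancy first, round-robin on ties) can be sketched in Python. The port order, the last_winner bookkeeping, and the function name arbitrate are assumptions; the text only states the two-level priority rule.

```python
PORT_ORDER = ("E", "S", "W", "N", "L")   # assumed round-robin order; L is the local port


def arbitrate(requests, last_winner=None, order=PORT_ORDER):
    """Pick the input port that wins one output port.

    requests:    {input_port: buffer occupancy} for the ports competing this cycle
    last_winner: port granted most recently, used only to break ties
    """
    if not requests:
        return None
    top = max(requests.values())
    tied = [port for port in order if requests.get(port) == top]
    if len(tied) == 1 or last_winner not in order:
        return tied[0]
    # Round-robin: start searching just after the previous winner.
    start = order.index(last_winner) + 1
    rotated = order[start:] + order[:start]
    return next(port for port in rotated if port in tied)


print(arbitrate({"E": 5, "W": 9}))
print(arbitrate({"E": 7, "N": 7}, last_winner="E"))
```

Serving the fullest buffer first drains the most congested direction soonest, while the round-robin tie-break keeps equally loaded ports from starving one another.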
(6) Crossbar switch
The invention uses a fast, simple, fully connected crossbar switch in which the signal lines between every input port and every output port are connected directly, as shown in Fig. 11, to maximize parallel transmission capability. When an input port forwards received data, the data is delivered through the crossbar switch to each of the other four output ports, and the arbiter controls which output port completes the data output.
(7) state controller
The state controller processes the link state values (Status Value) and the Nack signals fed back by the four input port controllers of the downstream routers.
This module can be further divided into three submodules: a state analyzer, a congestion/fault status transmitter, and a buffer selection controller; its circuit structure is shown in Fig. 12. When a Nack signal arrives from a given direction, the state analyzer tells the buffer selection controller to signal the multiplexer (MUX) of the input port for that direction, so that the MUX selects the retransmission buffer of that port as the data source. When X_status_value[1:0] is not 00 (X denotes one of the directions E, S, W, N), congestion or a fault has occurred on the link connected in that direction. The state analyzer determines the link condition, sets the corresponding status flag X_stress_flag, and sends it to the routing module. At the same time, the congestion/fault status transmitter writes the link state value Link_state to the link-state table in the required format and sends a request Req to the network monitor; after receiving the acknowledge signal Ack from the network monitor, it sends the next link state value.
(8) Routing table
The routing table consists of a register group and stores the routing path information sent by the network monitor; its format is the same as that of the global routing table in the network monitor.
In the fault-tolerant network-on-chip system built around the network monitor and fault-tolerant routers described above, the packet transmission process is shown in Fig. 13 and comprises the following four steps.
(1) Initialize the routing table of each router.
First, based on the determined network topology and on the communication bandwidth and delay information between IP core pairs, the network monitor calls the path allocation algorithm module to compute the shortest routing path for every communication trace. Once the network-on-chip application system starts working, the network monitor broadcasts the path information to all routers. Each router accepts only the communication paths whose source node is its own local IP core, and saves those paths in its routing table.
(2) During normal packet transmission, the router connected to the source node looks up its routing table, records the path information in the head flit of the packet, and transmits the packet by source routing.
(3) During transmission, when a permanent fault or congestion occurs on a link, each router connected to that link simultaneously notifies the network monitor over the dedicated bus that the network state has changed.
(4) After receiving the real-time global network state information, the network monitor recomputes the optimal alternative path for every communication trace, avoiding faulty nodes and balancing the load on the network links to avoid congestion, and sends the new path information to all routers. In the next transmission cycle, the source node reassembles its packets according to the optimal routing path updated by the network monitor and sends them to the destination node.
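The router side of these four steps can be illustrated with a small model. It is a sketch under stated assumptions: the names `Router`, `on_broadcast` and `build_packet`, the use of integer node IDs, and the dictionary packet layout are all hypothetical; the path computation performed by the network monitor is deliberately left out.

```python
from typing import Dict, List, Tuple

Path = List[int]        # router IDs from source to destination
Flow = Tuple[int, int]  # (source node, destination node)


class Router:
    """Keeps only the paths that start at the local IP core."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.routing_table: Dict[int, Path] = {}  # destination -> path

    def on_broadcast(self, paths: Dict[Flow, Path]) -> None:
        """Steps (1) and (4): accept a broadcast from the network monitor."""
        self.routing_table = {
            dst: path
            for (src, dst), path in paths.items()
            if src == self.node_id  # ignore paths of other source nodes
        }

    def build_packet(self, dst: int, payload: str) -> dict:
        """Step (2): source routing, the head flit carries the whole path."""
        return {"head": {"path": list(self.routing_table[dst])}, "body": payload}
```

When the monitor rebroadcasts after a link report (step 3), calling `on_broadcast` again replaces the stored paths, so packets built in the next transmission cycle carry the updated route.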
Applying the above method yields a fault-tolerant network-on-chip system with global communication transaction management capability. The system monitors the global network state, locates and distinguishes congested and faulty links in the network, retransmits data promptly to overcome transient faults, and bypasses congested and permanently faulty links through rerouting computation, thereby achieving high-performance, highly reliable communication.

Claims (6)

1. A fault-tolerant network-on-chip system with global communication transaction management capability, comprising interconnected fault-tolerant routers and functional IP cores, characterized in that it further comprises a network monitor; the network monitor is connected to each fault-tolerant router in the network-on-chip through a dedicated bus, and is used to monitor the global network state and to determine, based on the real-time network state, the routing paths of all communication traces of the application task so as to bypass congested and faulty links; the network monitor comprises the following parts:
a) a network state acquisition module, which is connected to each router and collects the real-time link state information of the whole network;
b) a path allocation algorithm module, which executes the path allocation algorithm according to the communication tasks and the link state information to compute a routing path for every communication trace;
c) a routing path sending module, which sends the routing path data computed by the path allocation algorithm module to the connected routers;
d) a memory module, which consists of a communication task table, a link state table and a global routing table, wherein the communication task table stores the source and destination node numbers and the communication bandwidth constraint of each communication trace of the application task, the link state table stores the idle, congested and faulty states of the interconnecting links between routers, and the global routing table stores the routing path data computed by the path allocation algorithm module.
2. The fault-tolerant network-on-chip system with global communication transaction management capability according to claim 1, characterized in that the network monitor further comprises a built-in self-test module for verifying the reliability of the network monitor itself.
3. The fault-tolerant network-on-chip system with global communication transaction management capability according to claim 1, characterized in that the fault-tolerant router comprises the following parts:
a) a buffer memory, comprising an input buffer and a retransmission buffer, which store respectively the data flits received at the input port and the data flits that need to be transmitted again;
b) an encoder/decoder, comprising an input decoder and an output encoder, wherein the input decoder decodes the data flits of the input channel and checks whether they are correct, a transient fault being considered to have occurred during transmission if an error is detected in a flit; the output encoder encodes the data flits to be sent to the next-stage router through the output port;
c) a port controller, which copies the data flits of the input buffer into the retransmission buffer of the port, and at the same time records the number of retransmission requests for data flits and monitors the occupancy of the input buffer in that direction, in order to judge whether a permanent fault or congestion has occurred on the interconnecting link;
d) a routing module, which parses the packet head flit stored in the input buffer, selects source routing or adaptive routing according to the routing flag in the head flit, determines the routing direction and selects the output port;
e) an arbitration module, which, when multiple input ports request the same output port, arbitrates according to the congestion level of the input buffer in each direction and selects the data of the input port with the higher congestion level for output;
f) a crossbar switch, which adopts a fully connected switch structure in which the signal lines between each input port and each output port are directly connected, and exchanges data from one port to another;
g) a state controller, which receives the state feedback signals and retransmission request signals of the next-stage routers connected to the output ports in each direction, controls the forwarding of the information stored in the buffer memory, stores the received state information in the link state table, and requests the network monitor to read the data from the link state table;
h) a routing table, which stores the routing path information sent by the network monitor.
4. The fault-tolerant network-on-chip system with global communication transaction management capability according to claim 1, characterized in that the dedicated bus adopts triple modular redundancy.
5. A path allocation method applied to the fault-tolerant network-on-chip system with global communication transaction management capability according to claim 1, characterized by comprising the following steps:
a) sort the communication traces of the application task in descending order of communication bandwidth requirement, and generate a quadrant graph w according to the source/destination node pair and the positions of faulty or congested links in the network-on-chip;
b) input the quadrant graph w and its adjacency list, the source node r and the destination node t;
c) initialize the distances of all nodes in the quadrant graph w, where the distance d(i) denotes the sum of the link bandwidth occupied by the path from source node r to node i; set d(r) = 0, and for v ≠ r, d(v) = ∞;
d) let T be the node set of the quadrant graph w, and start the search with u = r;
e) compute the successor node set N(u) of u; for each v ∈ N(u), if v ∈ T and d(v) > d(u) + w(uv), a shorter path to v has been found, so update the distance of v to d(v) = d(u) + w(uv); here the weight w(uv) of edge uv describes the bandwidth demand of the corresponding communication trace given in the core communication graph (CCG) of the application task; then delete the current node u from T, T = T - u, and let u be the node in T with the smallest distance;
f) repeat step (e) until the destination node t is reached, and output the distance d(t); during the iteration, if, because of a link fault or congestion, no shorter path can be found from the current node u to any reachable node in its successor set N(u), backtrack along the previously established shortest path to the node preceding u, delete the current node u from T, and repeat step (e) until a next node u that meets the requirements is found; the path finally generated thus bypasses faulty or congested links while ensuring that the distance d(t), that is, the link bandwidth occupied by the allocated path, is minimal, which balances the link load.
6. A network data transmission method applied to the fault-tolerant network-on-chip system with global communication transaction management capability according to claim 1, characterized by comprising the following steps:
(1) initialize the routing table of each router; once the network-on-chip application system starts working, the network monitor broadcasts the path information to all fault-tolerant routers; each fault-tolerant router accepts only the communication paths whose source node is its own local IP core, and saves those paths in its routing table;
(2) during normal packet transmission, the router connected to the source node looks up its routing table, records the path information in the head flit of the packet, and transmits the packet by source routing;
(3) during transmission, when a permanent fault or congestion occurs on a link, each router connected to that link simultaneously notifies the network monitor over the dedicated bus that the network state has changed;
(4) after receiving the real-time global network state information, the network monitor recomputes the optimal alternative path for every communication trace, avoiding faulty nodes and balancing the load on the network links to avoid congestion, and sends the new path information to all routers; in the next transmission cycle, the source node reassembles its packets according to the optimal routing path updated by the network monitor and sends them to the destination node.
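The path allocation method of claim 5 can be illustrated with the sketch below. It is not the claimed procedure itself: it substitutes a standard heap-based Dijkstra search, which handles dead ends implicitly instead of through the explicit backtracking of step (f), and it does not construct the quadrant graph. The function names, the representation of faulty or congested links as a set of directed `(u, v)` pairs, and the edge weight "bandwidth already allocated on the link plus the demand of the current trace" are all assumptions made for the example.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[int, int]  # directed link (u, v)


def allocate_path(adj: Dict[int, List[int]], load: Dict[Link, float],
                  bad_links: Set[Link], src: int, dst: int,
                  demand: float) -> Optional[List[int]]:
    """Least-occupied-bandwidth path from src to dst that avoids bad_links."""
    dist = {src: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, src)]
    done: Set[int] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)  # corresponds to removing u from T
        if u == dst:
            break
        for v in adj[u]:
            if (u, v) in bad_links or v in done:
                continue  # skip faulty/congested links and settled nodes
            nd = d + load.get((u, v), 0.0) + demand  # assumed weight w(uv)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return None  # no usable path exists
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def allocate_all(adj: Dict[int, List[int]], bad_links: Set[Link],
                 traces: List[Tuple[int, int, float]]) -> Dict[Link, Optional[List[int]]]:
    """Route traces (src, dst, bandwidth) in descending bandwidth order."""
    load: Dict[Link, float] = {}
    routes: Dict[Link, Optional[List[int]]] = {}
    for src, dst, bw in sorted(traces, key=lambda t: t[2], reverse=True):
        path = allocate_path(adj, load, bad_links, src, dst, bw)
        routes[(src, dst)] = path
        if path:
            for u, v in zip(path, path[1:]):
                load[(u, v)] = load.get((u, v), 0.0) + bw  # record occupied bandwidth
    return routes
```

Because each allocated trace adds its bandwidth to the links it uses, later traces tend to be steered onto less loaded links, which is one simple way to approximate the load balancing the method aims for.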
CN201410164138.5A 2014-04-22 2014-04-22 Fault-tolerant on-chip network system with global communication service management capability and method Pending CN103973482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410164138.5A CN103973482A (en) 2014-04-22 2014-04-22 Fault-tolerant on-chip network system with global communication service management capability and method

Publications (1)

Publication Number Publication Date
CN103973482A true CN103973482A (en) 2014-08-06

Family

ID=51242549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410164138.5A Pending CN103973482A (en) 2014-04-22 2014-04-22 Fault-tolerant on-chip network system with global communication service management capability and method

Country Status (1)

Country Link
CN (1) CN103973482A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102035723A (en) * 2009-09-28 2011-04-27 清华大学 On-chip network router and realization method
CN102148763A (en) * 2011-04-28 2011-08-10 南京航空航天大学 Dynamic path distribution method and system applicable to network on chip
CN102546406A (en) * 2011-12-28 2012-07-04 龙芯中科技术有限公司 Network-on-chip routing centralized control system and device and adaptive routing control method
CN102629912A (en) * 2012-03-27 2012-08-08 中国人民解放军国防科学技术大学 Fault-tolerant deflection routing method and device for bufferless network-on-chip
CN102868604A (en) * 2012-09-28 2013-01-09 中国航空无线电电子研究所 Two-dimension Mesh double buffering fault-tolerant route unit applied to network on chip

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
葛芬: "专用片上网络设计关键技术研究", 《中国博士论文全文数据库信息科技辑》 *
葛芬等: "基于网络监控器的专用片上网络动态容错路由", 《电子学报》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140806
