CN107276920A

CN107276920A - Distributed flow control system and mechanism applied to a hybrid three-dimensional network-on-chip

Info

Publication number: CN107276920A
Application number: CN201710628121.4A
Authority: CN
Inventors: 闫改珍; 吴宁; 葛芬; 周芳; 岳新新; 聂国明
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2017-07-28
Filing date: 2017-07-28
Publication date: 2017-10-20
Anticipated expiration: 2037-07-28
Also published as: CN107276920B

Abstract

The invention discloses a distributed flow control system and mechanism applied to bus-NoC hybrid three-dimensional networks-on-chip. Drawing on the application scenarios and architectural features of the hybrid three-dimensional network-on-chip, the flow control mechanism distributes the flow control components across the resource contention points, namely the routing nodes and bus interface units. Based on a convergence-flow service model, the individual flows passing through each routing node and bus interface unit are abstracted into multiple convergence flows, and the bandwidth demand of each convergence flow is learned dynamically. Each routing arbiter and bus controller determines the service priority of packets in real time from the difference between a convergence flow's bandwidth demand and its actual service state, and forwards data accordingly, so that communication bandwidth is allocated on demand. To realize these flow control functions, the invention introduces four hardware components into the routing nodes and bus interface units: a flow feature learning module, a flow behavior monitoring module, a two-level priority arbitration module, and a distributed bus control module.

Description

A distributed flow control system and mechanism applied to a hybrid three-dimensional network-on-chip

Technical field

The present invention relates to a distributed flow control system and mechanism applied to a hybrid three-dimensional network-on-chip, and more particularly to a distributed flow control system and mechanism applied to a bus-NoC hybrid three-dimensional network-on-chip. It belongs to the field of integrated circuits and on-chip interconnection network design.

Background

As the performance gains from instruction-level parallelism approach saturation, the semiconductor industry has been moving toward multi-core systems-on-chip. The continuing increase in the number of cores integrated on a chip makes interconnect communication bandwidth the performance bottleneck and main design challenge of multi-core systems. The network-on-chip (Network-on-Chip, NoC) interconnection architecture improves the parallelism of inter-core communication and is an effective way to relieve the interconnect bandwidth problem of multi-core systems. However, the NoC architecture cannot reduce chip-wide interconnect length. As multi-core systems-on-chip reach interconnection and communication at the thousand-core scale and beyond, a conventional two-dimensional NoC incurs very high communication power consumption and communication delay under highly parallel, communication-intensive applications. Three-dimensional integration technology distributes the active devices of a single chip across different physical layers and interconnects the layers with through-silicon vias (TSVs), providing a new dimension, the vertical direction, for multi-core interconnection. Among the many three-dimensional interconnection architectures, the bus-NoC hybrid three-dimensional network-on-chip retains the NoC interconnection structure within each two-dimensional plane and uses bus interconnection in the vertical direction to build single-hop vertical communication. This effectively exploits the high speed and low power consumption of the vertical TSV links, making the architecture better suited to large-scale on-chip multi-core interconnection.

Existing network-on-chip flow control mechanisms fall into two main classes: best-effort and guaranteed-service. In a hybrid three-dimensional network-on-chip, the in-plane NoC interconnection network and the vertical interconnection buses are communication resources shared by all nodes, and severe bandwidth contention arises when the many concurrent communication flows of thousands of IP cores compete for the same communication path. Best-effort flow control mechanisms focus on improving the overall performance of the interconnection architecture and ignore the communication requirements of individual flows. The distributed arbitration of the network-on-chip makes network behavior, and the performance of the applications running on the processor cores, unpredictable, so the bandwidth and latency requirements of a particular application cannot be guaranteed. Existing guaranteed-service flow control mechanisms mostly use virtual circuits or time-division multiplexing to reserve router buffers or service slots along the data transmission path, isolating the performance of data flows so that they form separate networks. This helps guarantee communication for individual flows, but it may require a separate service queue for each individual flow or leave the overall interconnect bandwidth underutilized.

To address these problems, prior research proposed the Globally-Synchronized Frame (GSF) mechanism, which moves the scheduling logic from the routing nodes to the source nodes to reduce the area and power overhead of the routing nodes. When the network is large, however, GSF needs a longer frame to offset the delay introduced by global synchronization, which enlarges the source buffers. Another well-performing guaranteed-service flow control mechanism proposed in the field is Preemptive Virtual Clock (PVC), which assigns each data flow a priority based on Virtual Clock and, when priority inversion occurs, guarantees bandwidth through a drop-and-retransmit mechanism. Although PVC avoids large source buffers, its guarantee strength and network throughput are at odds by design.

Most existing flow control mechanisms target network-on-chip interconnections of fewer than a hundred cores and use per-flow control to guarantee communication for individual data flows. Control complexity therefore grows with the number of communication flows in the multi-core system, which degrades scalability and may incur large area overhead and performance loss in large-scale interconnection scenarios at the hundred-core and thousand-core level. In addition, in a hybrid three-dimensional network-on-chip, vertical bus bandwidth is, alongside NoC interconnect bandwidth, a resource for which different data flows compete. A centralized bus flow control mechanism introduces a large number of extra TSVs, which brings considerable area overhead and lowers chip yield. It is therefore necessary to study, based on the architectural features and application scenarios of the hybrid three-dimensional network-on-chip, a low-overhead and highly scalable distributed flow control mechanism that guarantees the bandwidth demand of each individual flow.

Summary of the invention

To address the large area overhead and performance loss that the flow control mechanisms described in the background incur in large-scale multi-core interconnection, the present invention targets the bus-NoC hybrid three-dimensional network-on-chip interconnection architecture and proposes a distributed flow control system and mechanism based on convergence-flow control, which guarantees the communication bandwidth demand of individual flows without reducing the performance of the interconnection network as a whole. Based on a convergence-flow service model, the mechanism abstracts the individual flows passing through each routing node and bus into multiple convergence flows. Each routing arbiter and bus controller determines the service priority of packets in real time from the difference between a convergence flow's bandwidth demand and its actual service state, and forwards data accordingly, so that communication bandwidth is allocated on demand. While providing a bandwidth guarantee service for individual data flows in the interconnection network, the invention reduces the additional performance and area overhead of the interconnection network.

The present invention adopts the following technical solution to solve the above technical problem:

In one aspect, the present invention provides a distributed flow control system applied to a hybrid three-dimensional network-on-chip, comprising a flow feature learning module, a flow behavior monitoring module, a two-level priority arbitration module, and a distributed bus control module.

The flow feature learning module is used by the routing nodes and bus interface units to dynamically learn the data flows they forward and the bandwidth demands of those flows, and to solve for the bandwidth demand of the globally most congested link by distributed computation. From the bandwidth demand of each convergence flow and that of the most congested link, it then determines the number of flits each convergence flow is expected to have forwarded within a monitoring period.

The flow behavior monitoring module monitors the number of flits of each convergence flow that a routing node or bus interface unit actually forwards within a monitoring period, and computes the difference between that number and the number of flits the convergence flow is expected to have forwarded. This difference is the basis for allocating bandwidth at the resource contention points.

The two-level priority arbitration module arbitrates, when the crossbar switch resources inside a routing node are allocated, among multiple request signals issued to the same input port or output port, thereby allocating the bandwidth of the router output links.

The distributed bus control module uses arbitration modules distributed in the bus interface units to allocate the bandwidth of each vertical TSV bus in the hybrid three-dimensional network-on-chip among its bus interface units.

As a further technical solution of the present invention, the flow feature learning module comprises a number of flow feature learners distributed in all routing nodes and bus interfaces. A flow feature learner located in a routing node abstracts the individual flows between the same pair of input and output ports into one convergence flow; a flow feature learner located in a bus interface unit abstracts the individual flows between the same sending node and receiving node on the same bus into one convergence flow.

As a further technical solution of the present invention, the flow feature learning method of the flow feature learner is as follows:

a) Each data flow sends a bandwidth reservation request packet, which carries the bandwidth characteristics of the communication flow it generates from the source node to the destination node along the flow's transmission path;

b) Each routing node and bus interface unit that receives a bandwidth reservation request packet adds the communication resources the flow requires locally to the bandwidth demand of the corresponding convergence flow;

c) After the data flows have finished sending their bandwidth reservation request packets, all routing nodes and bus interface units are notified;

d) The routing nodes and bus interface units find the globally most congested link and its reserved bandwidth by distributed computation;

e) Based on the reserved bandwidth of the most congested link, each routing node and bus interface unit determines the number of flits each of its local convergence flows is expected to have forwarded within a monitoring period.

As a further technical solution of the present invention, the distributed computation that solves for the bandwidth demand of the globally most congested link in step d) proceeds as follows:

1) Each routing node first obtains the bandwidth demand of its communication links in the five directions east, south, west, north, and local, together with the bandwidth demand of the vertical bus link, and takes the maximum as the bandwidth demand of its locally most congested link;

2) The local most-congested-link bandwidth demands are compared and passed on in a distributed manner along the X, Y, and Z directions in turn. In each plane of the hybrid three-dimensional network-on-chip, a routing node on the left first compares its value with the local link bandwidth demand of its right-hand neighbor and passes the larger value along the X direction until the rightmost routing node is reached, yielding the bandwidth demand of the most congested link in the X direction; the rightmost routing nodes then compare and pass these X-direction values along the Y direction in the same way, yielding the bandwidth demand of the most congested link in the plane; the per-plane values are exchanged along the Z direction over the TSV bus and compared in one of the bus interface units, yielding the bandwidth demand of the globally most congested link in the whole network;

3) The bandwidth demand of the globally most congested link is broadcast along the Z, Y, and X directions to all routing nodes and bus interface units.

As a further technical solution of the present invention, the flow behavior monitoring module is distributed in all routing nodes and bus interface units, monitors the bandwidth usage of the convergence flows in real time, and updates their priorities. It configures a service state register for each local convergence flow, whose value reflects the difference between the flow's reserved bandwidth and the bandwidth it has actually obtained. In the data transmission phase, at every monitoring period the value of the service state register is increased by the number of flits the convergence flow is expected to have forwarded; each time a flit of the convergence flow is forwarded in the routing node or bus interface unit, the value is decremented by 1. The real-time value of the service state register is the arbitration basis for crossbar switch allocation in the routing node and for granting use of the bus: the larger the value, the more unused bandwidth the convergence flow still has, and the higher the priority it is given in router and bus arbitration.

As a further technical solution of the present invention, the two-level priority arbitration module is a cascade of a priority arbiter based on a logic coverage circuit and a Round-Robin arbiter based on the least-recently-used principle, and determines which convergence flow is finally served. When multiple convergence flows simultaneously request the same routing node resource or bus resource, first-level priority arbitration is performed with the current values of their service state registers as input; when more than one convergence flow has the highest priority, second-level arbitration is performed on the least-recently-used principle.

As a further technical solution of the present invention, the logic coverage circuit consists of a logic coverage drive circuit, a logic coverage bus, and a logic coverage comparison circuit. Convergence flows with a pending forwarding request drive the current values of their service state registers onto the logic coverage bus simultaneously through the logic coverage drive circuit, and a lower-priority service state value is covered by a higher-priority one. Each convergence flow determines whether it has the highest priority for forwarding by checking whether the current value of the logic coverage bus matches the service state value it drove onto the bus.

The logic coverage drive circuit is implemented as a precharge-evaluate dynamic CMOS circuit that divides one bit clock into two phases. It first precharges the load capacitance of the logic coverage bus through a pull-down NMOS transistor. When the input of one or more logic coverage drive circuits is logic 1, the pull-up PMOS transistor conducts and the logic coverage bus is held at logic 1; only when the inputs of all logic coverage drive circuits are logic 0 is the logic coverage bus at logic 0.

The logic coverage comparison circuit uses parallel coverage and comparison to reduce the logic coverage delay: all bits of a convergence flow's service state value are driven onto the logic coverage bus simultaneously, and all bits of the bus are read simultaneously during comparison.

As a further technical solution of the present invention, the convergence flow service state value uses a logic-continuous encoding:

1) a code consists of a run of consecutive logic 0s and a run of consecutive logic 1s;

2) the all-0 and all-1 binary strings are also valid codes;

3) any two codes in the code space differ in at least one bit.

As a further technical solution of the present invention, the distributed bus control module consists of arbitration modules distributed in the bus interface units attached to a vertical bus, which together form a distributed two-level priority arbitration module. It introduces two groups of TSV buses: an arbitration bus, used to perform the distributed logic coverage, and a status information bus, used to indicate which nodes won the first-level priority arbitration. In each bus interface unit, a data flow that needs the bus first sends its service state value to the TSV arbitration bus and then sets the state value of the shared status information bus according to the result of the logic coverage on the arbitration bus. The Round-Robin arbiters distributed in the bus interface units make the same decision synchronously from the shared state value and determine the unique bus user.

In another aspect, the present invention also provides a distributed flow control mechanism applied to a hybrid three-dimensional network-on-chip, used to guarantee the bandwidth of individual flows. The flow control mechanism comprises:

1. The flow control components are distributed across the resource contention points in the routing nodes and bus interface units of the bus-NoC hybrid three-dimensional network-on-chip. A routing node abstracts all individual flows between the same input and output ports into a convergence flow; a bus interface unit abstracts the data flows between the same sending node and receiving node into a convergence flow. The flow control components in the routing nodes and bus interface units allocate bandwidth at the resource contention points based on the bandwidth demands of the convergence flows;

2. The bandwidth demand of a convergence flow at each resource contention point is obtained by online learning. When the application running on the hybrid three-dimensional network-on-chip changes, each resource contention point first enters the flow feature learning phase, records the bandwidth demand of each individual flow, and then calculates the number of flits each convergence flow is expected to have forwarded within a given monitoring period;

3. At the resource contention points of the routing nodes and bus interface units, bandwidth is allocated by a priority-based scheduling mechanism, in which the priority with which data is forwarded is determined by the local resource usage state of each routing node and bus interface unit, that is, by the difference between the number of flits its convergence flow is expected to have forwarded within the monitoring period and the number actually forwarded.

Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:

(1) The invention uses a distributed flow control mechanism in which each routing node and bus interface unit determines the forwarding priority of individual flows from the service state of the convergence flows. Control complexity depends only on the number of router ports and the number of layers of the three-dimensional network-on-chip, not on the number of individual flows in the network, so the mechanism scales better and suits large-scale network-on-chip interconnection architectures;

(2) The invention uses a priority-based bandwidth guarantee mechanism. At each resource contention point, the order of service is determined solely by the bandwidth usage of the data flows, and no data flow is blocked, so link bandwidth is used more effectively, which helps improve the overall performance of the network-on-chip interconnection architecture;

(3) In the invention the priority of a data flow is determined by the usage state of local resources and is independent of resource contention at other nodes, so priority inversion does not occur. This avoids the drop-and-retransmit strategy used in mechanisms such as PVC, and the bandwidth guarantee strength for individual flows does not conflict with improving system performance;

(4) The priority arbiter in the invention is implemented with a logic coverage circuit and has lower area overhead than a conventional priority arbiter based on a binary comparison tree (Binary Comparison Tree) structure;

(5) The TSV arbitration bus and status information bus introduced by the distributed vertical bus controller are shared among the bus interface units and run from the top layer to the bottom layer of the whole three-dimensional integrated system. Compared with a centralized vertical bus controller, the extra TSV overhead is smaller and the heat dissipation of the three-dimensional integrated system can be improved.

Brief description of the drawings

Fig. 1 shows the bus-NoC hybrid three-dimensional network-on-chip interconnection architecture;

Fig. 2 shows the router architecture used to implement the distributed flow control mechanism and the hardware components it requires;

Fig. 3 shows the distributed computation of the most-congested-link bandwidth demand, where (a) is the in-router circuit that computes the congested-link bandwidth demand and (b) is the global sharing path of the congested-link bandwidth demand;

Fig. 4 shows the structure of the flow behavior monitoring module;

Fig. 5 shows the flow control scheme for switch allocation in a routing node;

Fig. 6 shows the structure of the two-level priority arbiter;

Fig. 7 shows the structure of the logic coverage drive circuit;

Fig. 8 shows the structure of the distributed bus controller.

Detailed description

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:

The present invention designs a distributed flow control mechanism applied to bus-NoC hybrid three-dimensional networks-on-chip. It introduces a distributed flow control system built from four additional hardware components: a flow feature learning module, a flow behavior monitoring module, a two-level priority arbitration module, and a distributed bus control module.

The flow feature learning module of the present invention is used by the routing nodes and bus interface units to dynamically learn the data flows they forward and the bandwidth demands of those flows, and to solve for the bandwidth demand of the globally most congested link by distributed computation. From the bandwidth demand of each convergence flow and that of the most congested link, it then determines the number of flits each convergence flow is expected to have forwarded within a monitoring period.

The flow feature learning module comprises a number of flow feature learners distributed in all routing nodes and bus interfaces. A flow feature learner located in a routing node abstracts the individual flows between the same pair of input and output ports into one convergence flow, so a routing node with N ports yields N × (N-1) convergence flows in total. A flow feature learner located in a bus interface unit abstracts the individual flows between the same sending node and receiving node on the same bus into one convergence flow, so a bus with M nodes yields M × (M-1) convergence flows in total. In the flow feature learner, each convergence flow f_ij (input port i, output port j) is configured with a bandwidth reservation register C_ij, which accumulates and stores the reserved bandwidth.

The flow feature learning method is as follows:

1.1) After the application on the multi-core system is updated, the bandwidth reservation registers are cleared to 0;

1.2) The task running on each resource node sends a bandwidth reservation request packet, which carries the bandwidth characteristics of the communication flow it generates from the source node to the destination node along the transmission path. To save hardware overhead, the flow bandwidth characteristic (BW) is represented as an integer normalized to the minimum bandwidth resolution;

1.3) A routing node that receives a bandwidth reservation request packet uses the source and destination node addresses carried in the packet to determine the data flow's input port i and output port j in that routing node, and hence the convergence flow f_ij to which the data flow belongs; it then adds the flow bandwidth characteristic carried in the packet to the bandwidth reservation register C_ij. A bus interface unit that receives a bandwidth reservation request broadcasts it on the bus so that the information is shared among all nodes on the bus. After receiving the bandwidth reservation request, each node uses the source and destination addresses carried in the packet to determine the data flow's sending node i and receiving node j on the bus, and hence the convergence flow f_ij to which the data flow belongs, and adds the bandwidth to the bandwidth reservation register C_ij;

1.4) After each resource node has finished sending its bandwidth reservation request packets, it notifies all routing nodes and bus interface units;

1.5) All routing nodes and bus interface units find the most congested link and its reserved bandwidth C_max by distributed computation, and calculate the number of data flits F_max that this link can forward within a given monitoring period W;

1.6) C_max and F_max are fed back to all routing nodes and bus interface units;

1.7) Each convergence flow in the routing nodes and bus interface units determines, from C_ij, C_max, and F_max, the number of data flits E_ij it is expected to have forwarded within the monitoring period W;

1.8) Flow feature learning ends.
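Steps 1.1) to 1.7) can be sketched in software for a single routing node as follows. This is only an illustrative model, not the patented hardware: the class and method names are hypothetical, and the rule E_ij = floor(C_ij × F_max / C_max) is an assumption, since the text states only that E_ij is determined from C_ij, C_max, and F_max (expressed in flits, the "microplates" of the text).

```python
from collections import defaultdict


class FlowFeatureLearner:
    """Hypothetical model of one flow feature learner in a routing node."""

    def __init__(self):
        # C_ij: one reservation register per (input port, output port) pair,
        # in integer units of the minimum bandwidth resolution.
        self.reserved = defaultdict(int)

    def reset(self):
        # Step 1.1: clear all reservation registers when the application changes.
        self.reserved.clear()

    def on_reservation_packet(self, in_port, out_port, bw):
        # Step 1.3: add the individual flow's bandwidth to its convergence flow.
        self.reserved[(in_port, out_port)] += bw

    def expected_flits(self, c_max, f_max):
        # Step 1.7 (assumed rule): scale each C_ij by F_max / C_max.
        return {flow: (c * f_max) // c_max for flow, c in self.reserved.items()}


learner = FlowFeatureLearner()
learner.on_reservation_packet("W", "E", 3)  # two individual flows share W->E
learner.on_reservation_packet("W", "E", 2)
learner.on_reservation_packet("N", "E", 1)
quota = learner.expected_flits(c_max=10, f_max=100)
```

With these sample values the two W-to-E flows are merged into one register holding 5, so the register count depends on the port pairs and not on the number of individual flows.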

The distributed procedure for solving the bandwidth demand of the globally most congested link is as follows:

a) Each routing node first obtains the bandwidth demand of its communication links in the five directions east, south, west, north, and local, together with the bandwidth demand of the vertical bus link, and takes the maximum as the bandwidth demand of its locally most congested link.

b) The local most-congested-link bandwidth demands are compared and passed on in a distributed manner along the X, Y, and Z directions in turn. In each plane of the hybrid three-dimensional network-on-chip, a routing node on the left first compares its value with the local link bandwidth demand of its right-hand neighbor and passes the larger value along the X direction until the rightmost routing node is reached, yielding the most congested link in the X direction. The rightmost routing nodes then compare and pass these values along the Y direction in the same way, yielding the most congested link in the plane. The per-plane most-congested-link bandwidth demands are exchanged along the Z direction over the TSV bus and compared in one of the bus interface units, yielding the bandwidth demand of the globally most congested link in the whole network.

c) The bandwidth demand of the globally most congested link is broadcast along the Z, Y, and X directions to all routing nodes and bus interface units.
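The X, then Y, then Z reduction in steps a) to c) can be modeled sequentially as below. The function name and the nested-list layout `local_max[z][y][x]` are assumptions made for illustration; in hardware each comparison is performed by a neighboring router or bus interface unit rather than by a loop.

```python
def global_max_demand(local_max):
    """Return the global most-congested-link demand.

    local_max[z][y][x] is the value from step a): the largest link bandwidth
    demand seen locally by the router at column x, row y, layer z.
    """
    plane_max = []
    for plane in local_max:
        row_max = []
        for row in plane:
            carried = row[0]
            for demand in row[1:]:
                # X direction: each router compares the value passed from
                # its left neighbor with its own and forwards the larger.
                carried = max(carried, demand)
            row_max.append(carried)  # held by the rightmost router of the row
        carried = row_max[0]
        for value in row_max[1:]:
            # Y direction: the rightmost routers repeat the comparison.
            carried = max(carried, value)
        plane_max.append(carried)
    # Z direction: per-plane values are compared over the TSV bus.
    return max(plane_max)


demands = [
    [[1, 2], [3, 4]],  # layer 0
    [[9, 0], [5, 6]],  # layer 1
]
c_max = global_max_demand(demands)
```

The result is then broadcast back in the reverse order (Z, Y, X), which this sketch omits because every node simply receives the same value.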

The flow behavior monitoring module of the present invention monitors the number of flits of each convergence flow that a routing node or bus interface unit actually forwards within a monitoring period, and computes the difference between that number and the number of flits the convergence flow is expected to have forwarded. This difference is the basis for allocating bandwidth at the resource contention points.

The flow behavior monitoring module sets a flow service state register for each local convergence flow, whose value reflects the difference between the flow's reserved bandwidth and the bandwidth it has actually obtained. A routing node has 25 flow service state registers in total; the number of flow service state registers in a bus interface unit depends on the number of layers of the hybrid three-dimensional network-on-chip. The value of a flow service state register is updated dynamically as data is forwarded at each resource contention point. At the start of each monitoring period, the value of each convergence flow's service state register is increased by a fixed share equal to the number of flits the convergence flow is expected to have forwarded within the monitoring period; each time a flit of the convergence flow is forwarded, the value of the corresponding service state register is decremented by 1.

The flow behavior monitoring module configures a service state register S_ij for each convergence flow f_ij, which is cleared to 0 during the flow feature learning phase. In the data transmission phase, at every monitoring period W the value of S_ij is increased by E_ij (the number of flits f_ij is expected to have forwarded within the monitoring period W); each time a flit of f_ij is forwarded in the routing node or bus interface unit, S_ij is decremented by 1. The real-time value of S_ij serves as the arbitration basis for crossbar switch allocation in the routing node and for granting use of the bus. A larger S_ij means that f_ij still has more unused bandwidth available and will be given a higher priority in router and bus arbitration.
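The register update rule can be illustrated with a small model. The class below follows the text (add E_ij at the start of a monitoring period, subtract 1 per forwarded flit, i.e. per "microplate"); the `simulate_period` driver is a hypothetical test harness that assumes one output link, one flit per cycle, and flows that always have data waiting.

```python
class ServiceStateRegister:
    """Model of one S_ij register for a convergence flow f_ij."""

    def __init__(self, expected_flits):
        self.expected_flits = expected_flits  # E_ij from the learning phase
        self.value = 0                        # S_ij, cleared during learning

    def start_period(self):
        # At every monitoring period W, S_ij grows by E_ij.
        self.value += self.expected_flits

    def flit_forwarded(self):
        # Each forwarded flit of f_ij decrements S_ij by 1.
        self.value -= 1


def simulate_period(quotas, cycles):
    """Hypothetical driver: serve the flow with the largest S_ij each cycle."""
    regs = {name: ServiceStateRegister(e) for name, e in quotas.items()}
    sent = {name: 0 for name in quotas}
    for reg in regs.values():
        reg.start_period()
    for _ in range(cycles):
        # Ties go to the first flow in insertion order in this sketch.
        winner = max(regs, key=lambda name: regs[name].value)
        regs[winner].flit_forwarded()
        sent[winner] += 1
    return sent


sent = simulate_period({"a": 6, "b": 2}, cycles=8)
```

In this run flow `a` forwards 6 flits and flow `b` forwards 2, matching their expected shares, because the flow that is furthest behind its quota always wins the link.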

The two-level priority arbitration module of the present invention is a cascade of a priority-based arbiter and a Round-Robin arbiter; on a vertical bus, the arbitration modules distributed in the attached bus interface units together form one distributed two-level priority arbitration module. When multiple data flows contend for router or bus bandwidth at the same time, the module determines which data flow is finally served. When multiple convergence flows simultaneously request the same routing node resource or bus resource, first-level priority arbitration is performed with the current values of their service state registers as input; when more than one convergence flow has the highest priority, second-level arbitration is performed on the least-recently-used principle.

The priority-based arbiter is built from a logic coverage circuit. All data flows f_ij requesting to be forwarded drive their service state values S_ij onto the arbitration bus simultaneously; a lower-priority S_ij is covered by a higher-priority S_ij, and the S_ij with the highest priority is retained on the arbitration bus. When two or more f_ij hold the highest-priority S_ij at the same time, the Round-Robin arbiter selects one of them, which ensures a unique arbitration result.
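The cascade of the two arbitration levels can be sketched behaviorally as follows. The class name is hypothetical, and the second level is modeled as a rotating pointer that starts searching just after the most recently served requester, which is one common way to approximate the least-recently-used rule named in the text.

```python
class TwoLevelArbiter:
    """Behavioral sketch: priority arbitration, then round-robin tie-break."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        self.last = num_requesters - 1  # most recently served index

    def grant(self, requests):
        """requests maps requester index -> current S_ij of its flow."""
        if not requests:
            return None
        # Level 1: only flows holding the largest service state value survive.
        top = max(requests.values())
        winners = {i for i, s in requests.items() if s == top}
        # Level 2: pick the first surviving requester after the last one served.
        for step in range(1, self.n + 1):
            candidate = (self.last + step) % self.n
            if candidate in winners:
                self.last = candidate
                return candidate
        return None


arbiter = TwoLevelArbiter(4)
first = arbiter.grant({0: 3, 1: 5, 2: 5})   # requesters 1 and 2 tie at 5
second = arbiter.grant({0: 3, 1: 5, 2: 5})  # the tie now goes to the other one
```

In hardware the first level is not a comparison but the logic coverage bus; `max` stands in for it here.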

The logic coverage circuit consists of a logic coverage drive circuit and a logic coverage comparison circuit. All data flows requesting to be forwarded send their service state values to the logic coverage bus at the same time, and low-priority service state values are covered by high-priority ones. Each convergence flow determines whether it has the highest priority by reading the logic coverage bus and checking whether the value matches its own service state value. The logic coverage drive circuit is implemented as a precharge-evaluate dynamic CMOS circuit. One clock cycle is divided into two phases. In the first phase, all nodes preset the load capacitance of the logic coverage bus through their pull-down NMOS transistors. In the second phase, when the input of one or more logic coverage circuits is logic 1, the pull-up PMOS transistor conducts and the logic coverage bus becomes logic 1; the bus is logic 0 only when the inputs of all logic coverage circuits are logic 0. The logic coverage comparison circuit uses a parallel cover-and-compare scheme to reduce the logic coverage delay: all bits of the convergence flow service state value are applied to the logic coverage bus simultaneously, and all bits on the bus are read simultaneously during comparison.

To prevent individual bits of different values from covering one another in the parallel cover-and-compare scheme, the convergence flow service state value uses a logic-continuous code:

1) a code consists of one run of consecutive logic 0s and one run of consecutive logic 1s;

2) the all-0 and all-1 binary strings are also legal codes;

3) any two codes in the code space differ in at least one bit.
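Why a logic-continuous code is needed can be illustrated with a small software model of the logic coverage bus, in which logic 1 covers logic 0 on every bit. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the code chosen places the 1s from the low-order bit upward.

```python
def thermometer(value: int, width: int) -> int:
    """Encode `value` (0..width) as `value` consecutive 1s starting at the low-order bit."""
    if not 0 <= value <= width:
        raise ValueError("value out of range for this code width")
    return (1 << value) - 1


def coverage_bus(words) -> int:
    """Model the logic coverage bus: on every bit, logic 1 covers logic 0 (bitwise OR)."""
    bus = 0
    for w in words:
        bus |= w
    return bus


# Plain binary values 5 (0b101) and 2 (0b010) cover each other bit by bit:
# the bus reads 0b111, which matches neither sender.
binary_bus = coverage_bus([5, 2])

# With the logic-continuous code the bus reads exactly the code of the larger value.
coded_bus = coverage_bus([thermometer(5, 8), thermometer(2, 8)])
```

Under this model, a sender that reads back its own code from the bus knows it holds the highest priority, which is the comparison step described above.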

The distributed bus control module of the present invention is distributed in the Bus Interface Units attached to the vertical bus and arbitrates the vertical TSV bus through a distributed two-level priority arbiter. Two groups of TSV buses are introduced: one is the arbitration bus, used to perform the distributed logic coverage; the other is the status information bus, used to indicate the nodes that obtained the right of use in the first-level priority arbitration and thereby to share the first-level arbitration result among the nodes. The logic coverage circuit is distributed across the Bus Interface Unit controllers, and each controller is equipped with a Round-Robin arbiter; these arbiters operate synchronously. In each Bus Interface Unit, a data flow that needs the bus first sends its S_ij to the TSV arbitration bus and then sets the value of the status information bus according to the logic coverage result on the arbitration bus. Taking the value of the TSV status information bus as input, the Round-Robin arbiters in all Bus Interface Units make the same decision synchronously on the least-recently-used principle and determine the unique bus user.

The technical solution of the present invention is described in further detail below through specific embodiments:

The distributed flow control mechanism of the present invention is applied to a bus-NoC hybrid three-dimensional network-on-chip, as shown in Figure 1. The resource nodes are regularly distributed over different layers and interconnected through routing nodes. A routing node has six interconnection ports: the east port (E), south port (S), west port (W), north port (N), local port (L) and vertical port (V). The east, south, west and north ports connect to adjacent routing nodes in the same plane; the local port connects to the resource node; the vertical port connects to the Bus Interface Unit. A data flow between a source node S and a destination node D is forwarded along an XYZ dimension-order path: the data flow is first transmitted within the plane of the source node and, after reaching the vertical position D(x, y) of the destination node, is forwarded through the vertical port of the router to the TSV bus and finally arrives at the destination node. The resource reservation request packets of the flow feature learning stage are forwarded along the same path.
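The XYZ dimension-order forwarding decision at a single routing node can be sketched as below. The coordinate convention (east as increasing x, north as increasing y) and the function name are assumptions made for illustration; the source does not fix them.

```python
def next_port(cur, dst):
    """Pick the output port at router `cur` for a packet destined for `dst`.

    `cur` and `dst` are (x, y, z) tuples. X is resolved first, then Y, and only
    then is the packet handed to the vertical port (the TSV bus).
    """
    cx, cy, cz = cur
    dx, dy, dz = dst
    if cx != dx:
        return "E" if dx > cx else "W"   # assumed: east means increasing x
    if cy != dy:
        return "N" if dy > cy else "S"   # assumed: north means increasing y
    if cz != dz:
        return "V"                       # same (x, y), different layer: use the bus
    return "L"                           # already at the destination: local port
```

Because the bus delivers a flit directly to the Bus Interface Unit of the destination layer, a single "V" decision covers the whole vertical part of the path in this model.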

The hardware components required by the distributed flow control mechanism of the present invention are distributed in each routing node and Bus Interface Unit, as shown in Figure 2. The routing node uses the classical input virtual channel structure and consists of a four-stage pipeline: route computation, virtual channel allocation, switch allocation and switch traversal. Under XYZ dimension-order routing, packets received from the bus are delivered directly to the resource node without further routing, so the routing node has five input ports and six output ports, and the traffic inside a routing node can be abstracted into 25 convergence flows. On top of this classical architecture, a flow feature learning module, a flow behavior monitoring module and a two-level priority arbitration module are added to guarantee the bandwidth of the convergence flows. The flow feature learning module learns the bandwidth demand of each convergence flow, determines the most congested link through distributed computation, and estimates the number of flits each convergence flow is expected to forward within the monitoring cycle W. The flow behavior monitoring module uses the flow feature learning result and the switch allocation result to calculate the difference between the reserved bandwidth and the actually obtained bandwidth of each convergence flow, and determines the service priority of each convergence flow. The two-level priority arbitration module, according to the priorities provided by the flow behavior monitoring module, allocates the switch preferentially to the convergence flow with the larger bandwidth surplus, thereby realizing bandwidth allocation.

The Bus Interface Unit contains one send buffer and one receive buffer; on this basis, a flow feature learning module, a flow behavior monitoring module and a distributed bus control module are added to allocate the bus bandwidth. The number of convergence flows on a bus is determined by the number of Bus Interface Units (that is, the number of layers of the bus-NoC three-dimensional network-on-chip). The flow feature learning module and the flow behavior monitoring module work on the same principle as the corresponding modules in the routing node. Because the nodes sharing the bus bandwidth are located in different layers, the present invention uses a distributed two-level priority arbiter to allocate the bus bandwidth. The Bus Interface Units send the priorities provided by their flow behavior monitoring modules to the arbitration bus at the same time to complete the logic coverage, and the result of the logic coverage is shared through the status information bus. The node that obtains the bus right of use places the flit to be forwarded on the data bus and the address of the target Bus Interface Unit on the address bus; the Bus Interface Unit whose address matches the one on the address bus reads the flit from the data bus and stores it in its receive buffer.

The flow feature learning module of the present invention determines the bandwidth demand of the most congested link in the interconnection system through distributed computation, as shown in Figure 3. Each routing node first obtains the bandwidth demands of the adjacent communication links in the five directions east (E), south (S), west (W), north (N) and local (L), together with the bandwidth demand of the adjacent vertical bus link, as shown in Figure 3(a). Comparator Max_1 first finds the maximum of these link bandwidth demands, that is, the local maximum link bandwidth demand. The global maximum is obtained from the local maximum link bandwidth demands by distributed compare-and-forward, as shown in Figure 3(b). In each plane of the three-dimensional network, the leftmost router on path 1 sends its local maximum link bandwidth demand to its east port; the adjacent routing node reads the value from its west port (Max_W), compares it with its own local maximum link bandwidth demand, and sends the larger value to its east port (Max_E). The rightmost routing node on path 1 thus obtains the maximum link bandwidth demand in the x direction and sends it to its south port (Max_S). The adjacent routing node on path 2 reads the value from its north port (Max_N), compares it through comparator Max_2 with the x-direction maximum link bandwidth demand it maintains locally, and sends the larger value to its south port (Max_S). The bottommost routing node on path 2 thus obtains the maximum link bandwidth demand of the plane; this value is passed along path 3 to the bottom-right node of the bottom layer, where comparator Max_3 yields the bandwidth demand of the most congested link in the whole interconnection network. This value is then broadcast to all routing nodes and Bus Interface Units along paths 3, 2 and 1 shown in Figure 3(b). The time complexity of this distributed computation is O(X+Y+Z), where X, Y and Z are the dimensions of the three-dimensional network-on-chip in the three directions. Compared with centralized computation, the distributed computation used in the present invention greatly reduces the computation time and the associated data communication.
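The compare-and-forward reduction can be modelled sequentially as in the following sketch. It is a simplified illustration: the function name and the nested-list layout are assumptions, and the step count assumes one compare-and-forward hop per step with all rows of all planes sweeping in parallel; the final broadcast is not modelled.

```python
def global_max_demand(local_max):
    """Reduce local maxima to the global maximum along paths 1, 2 and 3.

    local_max[z][y][x] is the local maximum link bandwidth demand of the router
    at (x, y, z). Returns (global maximum, number of sequential hop steps).
    """
    Z, Y, X = len(local_max), len(local_max[0]), len(local_max[0][0])
    plane_max = []
    for z in range(Z):
        # Path 1: each row sweeps from the leftmost to the rightmost router.
        row_max = []
        for y in range(Y):
            running = local_max[z][y][0]
            for x in range(1, X):
                running = max(running, local_max[z][y][x])
            row_max.append(running)
        # Path 2: the rightmost column sweeps through the row results.
        running = row_max[0]
        for y in range(1, Y):
            running = max(running, row_max[y])
        plane_max.append(running)
    # Path 3: the per-plane results are compared across the layers.
    result = plane_max[0]
    for z in range(1, Z):
        result = max(result, plane_max[z])
    steps = (X - 1) + (Y - 1) + (Z - 1)
    return result, steps
```

The hop count grows with X+Y+Z rather than with the number of nodes X·Y·Z, which is consistent with the O(X+Y+Z) complexity stated above.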

The flow behavior monitoring module of the present invention is distributed in each routing node and Bus Interface Unit and configures one flow behavior monitor for each convergence flow in the routing node or Bus Interface Unit, as shown in Figure 4. A routing node has 25 flow behavior monitors, and a Bus Interface Unit has Nz-1 flow behavior monitors (Nz is the number of layers of the hybrid three-dimensional network-on-chip). When the enable signal G_ij of the switch allocation stage is valid, that is, a flit is allowed to be transferred from port i to port j, the service state register S_ij of convergence flow f_ij is decreased by 1; when the timing signal Tw of the monitoring cycle W is valid, the service state register S_ij of convergence flow f_ij is increased by E_ij, the number of flits expected to be forwarded within the monitoring cycle W.

The two-level priority arbiter of the present invention implements the flow control mechanism in the switch allocation of the routing node, as shown in Figure 5. Switch allocation uses an output-port-first strategy: request signals that share an input port but target different output ports are arbitrated first and, based on that result, request signals that come from different input ports but target the same output port are arbitrated next. A routing node of the bus-NoC hybrid three-dimensional network-on-chip has 5 input ports and 6 output ports, so switch allocation requires 11 two-level priority arbiters in total.

The two-level priority arbiter of the present invention is formed by cascading a priority arbiter based on the logic coverage circuit with a least-recently-used Round-Robin arbiter, as shown in Figure 6. When the service request signal r_ij of convergence flow f_ij is valid, the current service state value S_ij is sent to the coverage bus; otherwise the lowest priority (0) is sent to the coverage bus. When the data detected on the bus equals the S_ij value that was sent, f_ij currently has the highest priority and p_ij is set; otherwise the current S_ij has been covered by a higher priority and p_ij is reset. The Round-Robin arbiter takes p_ij as input, assigns the lowest priority to the most recently served request, and finally produces a single valid enable signal among all request signals taking part in the arbitration. The Round-Robin arbiter ensures that bandwidth is allocated fairly when all convergence flows have the same priority.
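A behavioral sketch of this cascade is given below. It is a software approximation only: the class and method names are hypothetical, the coverage bus is modelled as the maximum of the driven values rather than a bitwise circuit, and the service state values are assumed to be non-negative.

```python
class TwoLevelArbiter:
    """Priority stage followed by a round-robin stage, for `n` requesters."""

    def __init__(self, n: int):
        self.n = n
        self.last_served = n - 1  # index 0 is therefore favoured on the first tie

    def arbitrate(self, requests, states):
        """requests: list of bool (r_ij); states: list of int (S_ij, assumed >= 0).

        Returns the index of the single granted requester, or None if nobody requests.
        """
        if not any(requests):
            return None
        # Level 1: requesters drive S_ij, idle inputs drive the lowest priority (0).
        driven = [s if r else 0 for r, s in zip(requests, states)]
        bus = max(driven)
        p = [r and d == bus for r, d in zip(requests, driven)]  # p_ij flags
        # Level 2: round-robin scan starting just after the most recently served index.
        for offset in range(1, self.n + 1):
            i = (self.last_served + offset) % self.n
            if p[i]:
                self.last_served = i
                return i
        return None
```

When two requesters keep tying at the highest priority, repeated calls alternate between them, which mirrors the fairness property attributed to the Round-Robin stage.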

The logic coverage drive circuit used by the two-level priority arbiter is shown in Figure 7. While the synchronous clock clk is 0, the PMOS transistor in each drive circuit is cut off and the NMOS transistor conducts, so the load capacitance C_L of the logic coverage bus is discharged through the NMOS transistors of all drive circuits. While clk is 1, the NMOS transistor in each drive circuit is cut off; if any input is logic 1, the PMOS transistor in that drive circuit conducts, C_L is charged to the high level, and the logic coverage bus is logic 1. The logic coverage bus is logic 0 only when all inputs are logic 0, so logic 1 covers logic 0; in other words, logic 1 has the higher priority.

The logic coverage comparison circuit uses a parallel cover-and-compare scheme to reduce the logic coverage delay. To prevent individual bits of different values from covering one another in this scheme, the convergence flow service state value uses a logic-continuous code that satisfies the following coding rules:

1) a code consists of one run of consecutive logic 0s and one run of consecutive logic 1s;

2) the all-0 and all-1 binary strings are also legal codes;

3) any two codes in the code space differ in at least one bit.

The code space of N-bit binary strings satisfying the above rules contains at most 2N legal codes and can be divided into two subsets, a low-order-first subset and a high-order-first subset, each containing N+1 codes. In the low-order-first subset the logic 1s of a code start from the low-order bit, whereas in the high-order-first subset they start from the high-order bit; the all-0 and all-1 codes belong to both subsets. Either subset is a feasible code for parallel arbitration, but the two must not be mixed.
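The counts stated above, and the reason the two subsets cannot be mixed, can be checked with a short enumeration. The function names are illustrative assumptions; codes are written as strings with the high-order bit on the left.

```python
def code_subsets(n: int):
    """Return (low_first, high_first): the two subsets of legal n-bit codes."""
    # 1s grow from the low-order (rightmost) bit.
    low_first = {"0" * (n - k) + "1" * k for k in range(n + 1)}
    # 1s grow from the high-order (leftmost) bit.
    high_first = {"1" * k + "0" * (n - k) for k in range(n + 1)}
    return low_first, high_first


def cover(a: str, b: str) -> str:
    """Bit-by-bit logic coverage of two codes: logic 1 covers logic 0."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


low, high = code_subsets(4)
same_subset = cover("0011", "0111")   # both codes from the low-order-first subset
mixed = cover("0011", "1100")         # one code from each subset
```

For N = 4 each subset holds 5 codes and their union holds 8. Covering two codes of the same subset always leaves one of the two on the bus, whereas the mixed pair above produces the all-1 code, which matches neither sender.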

The distributed bus control module of the present invention is implemented by a distributed two-level priority arbiter, see Figure 8. It differs from the two-level priority arbiter of Figure 6 in that the logic coverage circuit is distributed across the Bus Interface Unit controllers and each Bus Interface Unit controller is equipped with its own Round-Robin arbiter, all of which operate synchronously. In addition to the TSV bus that implements the logic coverage, a TSV status information bus is introduced to share the first-level arbitration result (P_j) among the nodes. Based on this value, the Round-Robin arbiter in each Bus Interface Unit determines the unique bus user on the least-recently-used principle.
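How the replicated Round-Robin arbiters can reach the same decision from the shared buses is sketched below. This is a behavioral model under stated assumptions: the function name is hypothetical, the service state values are assumed to be coded with 1s growing from the low-order bit, and every unit is assumed to hold the same record of the most recently served unit.

```python
def distributed_bus_arbitration(codes, requests, last_served):
    """Return the bus user chosen independently by every Bus Interface Unit.

    codes[k]    : coded service state value driven by unit k
    requests[k] : True if unit k has a flit to send
    last_served : index of the most recently served unit (identical in all units)
    """
    n = len(codes)
    # TSV arbitration bus: logic coverage of all driven codes (idle units drive 0).
    arb_bus = 0
    for code, req in zip(codes, requests):
        arb_bus |= code if req else 0
    # TSV status information bus: unit k flags itself if its own code survived.
    status_bus = [req and code == arb_bus for code, req in zip(codes, requests)]
    # Every unit runs the same round-robin scan on the shared status bus.
    decisions = []
    for _unit in range(n):
        winner = None
        for offset in range(1, n + 1):
            i = (last_served + offset) % n
            if status_bus[i]:
                winner = i
                break
        decisions.append(winner)
    return decisions
```

Since all units see the same status bus and hold the same round-robin state, the returned list contains the same index in every position, so no central bus controller is needed in this model.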

The present invention has been described with reference to the current embodiments; parts not described in detail belong to the common general knowledge of those skilled in the art. Those skilled in the art should understand that the above embodiments only illustrate the present invention and do not limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention is therefore defined by the claims.

Claims

1. A distributed flow control system applied to a hybrid three-dimensional network-on-chip, characterized by comprising a flow feature learning module, a flow behavior monitoring module, a two-level priority arbitration module and a distributed bus control module;

the flow feature learning module is used by the routing nodes and Bus Interface Units to dynamically learn the data flows forwarded through them and their bandwidth demands, and to solve for the bandwidth demand of the globally most congested link through distributed computation; further, the number of flits each convergence flow is expected to forward within the monitoring cycle is determined from the bandwidth demand of the convergence flow and the bandwidth demand of the most congested link;

the flow behavior monitoring module monitors the number of flits of each convergence flow actually forwarded by the routing nodes and Bus Interface Units within the monitoring cycle and calculates its difference from the number of flits the convergence flow is expected to forward, as the basis for bandwidth allocation at the resource contention points;

the two-level priority arbitration module, during allocation of the crossbar switch resources inside a routing node, arbitrates among multiple request signals issued to the same input port or output port and realizes the bandwidth allocation of the router output links;

the distributed bus control module realizes, through arbitration modules distributed in the Bus Interface Units, the allocation of the vertical TSV bus bandwidth of the hybrid three-dimensional network-on-chip among the Bus Interface Units.

2. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 1, characterized in that the flow feature learning module comprises a number of flow feature learners distributed in all routing nodes and Bus Interface Units, wherein a flow feature learner located in a routing node abstracts the individual flows between the same input and output ports into one convergence flow, and a flow feature learner located in a Bus Interface Unit abstracts the individual flows between the same sending node and receiving node on the same bus into one convergence flow.

3. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 2, characterized in that the flow feature learning method of the flow feature learner is as follows:

a) the data flow sends a bandwidth reservation request packet, which carries the bandwidth characteristics of the communication flow it generates, from the source node along the transmission path of the data flow to the destination node;

b) each routing node and Bus Interface Unit that receives the bandwidth reservation request packet accumulates the bandwidth demand of the corresponding convergence flow according to the local communication resources the data flow needs to use;

d) the routing nodes and Bus Interface Units find the globally most congested link and its reserved bandwidth through distributed computation;

e) based on the reserved bandwidth of the most congested link, each routing node and Bus Interface Unit determines the number of flits its local convergence flows are expected to forward within the monitoring cycle.

4. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 3, characterized in that the distributed computation of the bandwidth demand of the globally most congested link in step d proceeds as follows:

1) each routing node first obtains the bandwidth demands of the local communication links in the five directions east, south, west, north and local, together with the bandwidth demand of the vertical bus link, and takes their maximum as the bandwidth demand of the locally most congested link;

2) the bandwidth demands of the locally most congested links are compared and forwarded in a distributed manner along the X, Y and Z directions in turn; that is, in each plane of the hybrid three-dimensional network-on-chip, a routing node on the left first compares its value with the local link bandwidth demand of its right-hand neighbour, and the larger value is passed on along the X direction until the rightmost routing node obtains the bandwidth demand of the most congested link in the X direction; the rightmost routing nodes then, in the same manner, compare and forward the X-direction bandwidth demands along the Y direction to obtain the bandwidth demand of the most congested link in the plane; the per-plane bandwidth demands are exchanged in the Z direction over the TSV bus and compared in one of the Bus Interface Units, yielding the bandwidth demand of the globally most congested link in the whole network;

3) the bandwidth demand of the globally most congested link is broadcast along the Z, Y and X directions to all routing nodes and Bus Interface Units.

5. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 1, characterized in that the flow behavior monitoring module is distributed in all routing nodes and Bus Interface Units and is used to monitor the bandwidth usage of the convergence flows in real time and to update their priorities; the flow behavior monitoring module configures one service state register for each local convergence flow, whose value reflects the difference between the reserved bandwidth and the actually obtained bandwidth of the convergence flow; in the data transfer stage, once every monitoring cycle, the value of the service state register is increased by the number of flits the convergence flow is expected to forward; each time a convergence flow forwards a flit in the routing node or Bus Interface Unit, the value of its service state register is decreased by 1; the instantaneous value of the service state register serves as the arbitration basis for crossbar switch allocation in the routing node and for allocation of the bus right of use; the larger the instantaneous value, the more remaining bandwidth the convergence flow has available, and the higher the priority it is given in routing and bus arbitration.

6. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 1, characterized in that the two-level priority arbitration module is formed by cascading a priority arbiter based on a logic coverage circuit with a Round-Robin arbiter based on the least-recently-used principle and is used to determine the convergence flow that is finally served; when multiple convergence flows simultaneously request the same routing node resource or bus resource, first-level priority arbitration is carried out with the current values of the convergence flow service state registers as input; when several convergence flows share the highest priority, second-level arbitration is carried out on the least-recently-used principle.

7. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 6, characterized in that the logic coverage circuit consists of a logic coverage drive circuit, a logic coverage bus and a logic coverage comparison circuit; the convergence flows that have a data forwarding request send the current values of their service state registers to the logic coverage bus through the logic coverage drive circuit at the same time, and low-priority service state values are covered by high-priority service state values; each convergence flow determines whether its data to be forwarded has the highest priority by comparing the current value of the logic coverage bus with the service state value it sent to the logic coverage bus;

the logic coverage drive circuit is implemented as a precharge-evaluate dynamic CMOS circuit, and one clock cycle is divided into two phases; the logic coverage drive circuit first presets the load capacitance of the logic coverage bus through a pull-down NMOS transistor; when the input of one or more logic coverage drive circuits is logic 1, the pull-up PMOS transistor conducts and the logic coverage bus becomes logic 1; the logic coverage bus is logic 0 only when the inputs of all logic coverage drive circuits are logic 0;

the logic coverage comparison circuit uses a parallel cover-and-compare scheme to reduce the logic coverage delay: all bits of the convergence flow service state value are applied to the logic coverage bus simultaneously, and all bits on the logic coverage bus are read simultaneously during comparison.

8. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 7, characterized in that the convergence flow service state value uses a logic-continuous code:

1) a code consists of one run of consecutive logic 0s and one run of consecutive logic 1s;

2) the all-0 and all-1 binary strings are also legal codes;

3) any two codes in the code space differ in at least one bit.

9. The distributed flow control system applied to a hybrid three-dimensional network-on-chip according to claim 1, characterized in that the distributed bus control module is one distributed two-level priority arbitration module constituted by the arbitration modules distributed in the Bus Interface Units attached to the vertical bus, and introduces two groups of TSV buses: one is the arbitration bus, used to perform the distributed logic coverage; the other is the status information bus, used to indicate the nodes that obtained the right of use in the first-level priority arbitration; in each Bus Interface Unit, a data flow that needs the bus first sends its service state value to the TSV arbitration bus and then sets the value of the status information bus according to the logic coverage result on the arbitration bus; the Round-Robin arbiters distributed in the Bus Interface Units make their decision synchronously according to the shared status value and determine the unique bus user.

10. A distributed flow control mechanism applied to a hybrid three-dimensional network-on-chip, for guaranteeing the bandwidth of individual flows, characterized in that the flow control mechanism comprises:

1) flow control components are distributed at the resource contention points, namely the routing nodes and Bus Interface Units, of the bus-NoC hybrid three-dimensional network-on-chip; a routing node abstracts all individual flows between the same input and output ports into a convergence flow; a Bus Interface Unit abstracts the data flows between the same sending node and receiving node into a convergence flow; the flow control components in the routing nodes and Bus Interface Units allocate the bandwidth of the resource contention points based on the bandwidth demands of the convergence flows;

2) the bandwidth demand of a convergence flow at each resource contention point is obtained by online learning; when the application running on the hybrid three-dimensional network-on-chip changes, each resource contention point first enters the flow feature learning stage, records the bandwidth demand of each individual flow, and then calculates the number of flits each convergence flow is expected to forward within a given monitoring cycle;

3) at the resource contention points of the routing nodes and Bus Interface Units, bandwidth is allocated through a priority-based scheduling mechanism, in which the priority with which data is forwarded is determined by the local resource usage state of each routing node and Bus Interface Unit, that is, by the difference between the number of flits the convergence flow to which the data belongs is expected to forward and the number it has actually forwarded within the monitoring cycle.