CN111522775A - Network-on-chip routing device and control method thereof - Google Patents

Info

Publication number
CN111522775A
CN111522775A (application CN202010320744.7A)
Authority
CN
China
Prior art keywords
routing
routing node
learning
data transmission
network
Prior art date
Legal status
Granted
Application number
CN202010320744.7A
Other languages
Chinese (zh)
Other versions
CN111522775B (en)
Inventor
李桢旻
翁晓峰
王镜涵
沈烨钦
杜高明
王晓蕾
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202010320744.7A
Publication of CN111522775A
Application granted
Publication of CN111522775B
Active legal status (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 - Digital computers in general; Data processing equipment in general
    • G06F 15/76 - Architectures of general purpose stored program computers
    • G06F 15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 - System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/7825 - Globally asynchronous, locally synchronous, e.g. network on chip
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network-on-chip routing device and a control method thereof. The control method comprises: configuring a learning module for each router of the network-on-chip routing device, each router and one learning module forming a routing node; the learning module of each routing node acquiring the congestion state information of the learning modules of adjacent routing nodes and then performing parallel learning to obtain the optimal data transmission path to each destination routing node; and carrying out data transmission according to the optimal data transmission path. The network-on-chip routing device and the control method thereof accelerate network convergence, increase the degree of parallelism, and greatly improve the speed of path planning; at the same time, a suitable path can be found using only the position information of the destination routing node, which both reduces resource occupation and shortens path-planning time.

Description

Network-on-chip routing device and control method thereof
Technical Field
The invention relates to the technical field of network-on-chip design, and in particular to a network-on-chip routing device and a control method thereof.
Background
Routing algorithms are one of the main factors affecting the performance of a network-on-chip. Static network-on-chip architectures, such as the X-Y routing algorithm, generate "hot spots" during mass data transmission as power consumption and router temperature rise. Hot spots easily cause data blockage and data delay, making the chip unreliable and shortening its service life. The routing algorithm therefore needs to be improved so that the network-on-chip becomes a dynamic architecture, which: improves data transmission efficiency (by optimizing the transmission path of each data packet, the overall transmission time is reduced); reduces congestion (without timely path optimization, many data packets select the same link and congest one or more links); avoids hot spots; and reduces data transmission delay (the hop count from the source node to the destination routing node is determined directly by the actual path taken). This keeps the network-on-chip at a good temperature while communicating at high speed, thereby ensuring the safety and reliability of the chip.
Among current dynamic adaptive algorithms, the Q-learning algorithm from machine learning is strongly adaptive: it can interact with the environment autonomously to infer optimal path information, and once the learning result converges it yields a path of finite length, avoiding deadlock. However, the traditional algorithm has drawbacks: only one agent can interact with the environment at any one time, so the learning process is slow, and the random-number-based greedy strategy does not necessarily search the entire space, so a better path may be missed.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a network-on-chip routing device and a control method thereof, which are used to solve the technical problems that the traditional Q-learning routing algorithm of the prior art learns slowly with a single agent and cannot guarantee full coverage of the search space.
To achieve the above and other related objects, the present invention provides a method for controlling a network-on-chip routing apparatus, including:
configuring a learning module on each router of the network-on-chip routing device, wherein each router and one learning module form a routing node;
the learning module of each routing node respectively acquires the congestion state information of the learning modules of adjacent routing nodes and then performs parallel learning to acquire the optimal data transmission path of each destination routing node;
and carrying out data transmission according to the optimal data transmission path.
In an optional embodiment, before the learning module of each routing node acquires the congestion state information of the learning modules of adjacent routing nodes and performs parallel learning, the congestion state information of each routing node is written into the learning module of that routing node.
In an optional embodiment, the control method further comprises the steps of:
when the fault detection module of the routing node detects a fault, fault information is sent and written into the learning module of the routing node, and the learning module of the routing node recognizes the fault information and then carries out path planning again.
In an alternative embodiment, the congestion status information includes destination routing node information, path information, multi-level congestion status information, device edge information, and temporary damage information.
In an optional embodiment, the step of performing data transmission according to the optimal data transmission path includes:
packing the destination routing node information into a header flit of a data packet to be transmitted to form a first data packet;
and inputting the first data packet into a selected routing node, wherein the routing node carries out data transmission according to the optimal data transmission path corresponding to the destination routing node in the header flit of the first data packet.
In an optional embodiment, the step of performing parallel learning after the learning module of each routing node respectively acquires the congestion state information of the learning modules of adjacent routing nodes to acquire the optimal data transmission path of each destination routing node includes:
the learning module of each routing node simultaneously acquires the maximum reward value stored in the learning module of each adjacent routing node;
after a unit learning time elapses in each routing node, calculating weighted reward values according to a preset formula from the maximum reward value stored in the learning module of each adjacent routing node and the congestion state information of the learning module of each adjacent routing node, so as to obtain a plurality of weighted reward values;
taking the maximum of the weighted reward values as the maximum reward value of the local routing node;
repeating the preceding three steps until the longest path has been learned, so as to obtain the routing table of that destination routing node;
and repeating the preceding four steps until the routing table of every destination routing node has been prepared.
In an alternative embodiment, when each of the routing nodes has four data transmission directions, the preset formula includes
Q(cs, A) = Q(ns, A) - 1, if the channel in direction A is clear;
Q(cs, A) = γ1 * Q(ns, A), if the channel in direction A has first-level congestion;
Q(cs, A) = γ2 * Q(ns, A), if the channel in direction A has second-level congestion;
Q(cs, max) = max{Q(cs, A)} over the four directions A;
wherein Q(cs, A) represents the weighted reward value of direction A of the local routing node, Q(ns, A) represents the maximum reward value stored by the next-hop routing node in direction A, cs denotes the local routing node, ns denotes the next hop, A denotes the direction, γ1 denotes the first-level congestion coefficient, γ2 denotes the second-level congestion coefficient, γ1 and γ2 both lie between 0 and 1 with γ1 > γ2, and Q(cs, max) represents the maximum reward value stored by the local routing node.
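For illustration, the preset formula can be sketched in Python as follows; the piecewise weighting, including the one-hop penalty on a clear channel, follows the worked example given in the detailed description, and the coefficient values and helper names are assumptions rather than the patented implementation.

```python
# Minimal sketch (assumed names and values) of the per-direction weighted-reward
# update for a routing node with four data transmission directions.

GAMMA1 = 0.9   # first-level congestion coefficient (example value from the description)
GAMMA2 = 0.8   # second-level congestion coefficient; GAMMA1 > GAMMA2

def weighted_reward(next_hop_max: float, congestion_level: int) -> float:
    """Q(cs, A): weight the next hop's stored maximum reward by the link state."""
    if congestion_level == 0:               # clear channel: one-hop penalty
        return next_hop_max - 1
    if congestion_level == 1:               # first-level congestion
        return GAMMA1 * next_hop_max
    return GAMMA2 * next_hop_max            # second-level (or worse) congestion

def local_max_reward(neighbour_maxima, congestion_levels) -> float:
    """Q(cs, max): the largest weighted reward over the four directions."""
    return max(weighted_reward(q, c)
               for q, c in zip(neighbour_maxima, congestion_levels))

# Example: neighbours' stored maxima in four directions over links that are
# clear, first-level congested, second-level congested and clear respectively.
print(local_max_reward([999, 1000, 1000, 0], [0, 1, 2, 0]))   # -> 998
```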
To achieve the above and other related objects, the present invention also provides an on-chip network routing apparatus, comprising:
the system comprises a plurality of routing nodes connected according to a preset network topology structure, wherein each routing node comprises a router and a learning module which are connected with each other;
the learning module of each routing node is used for acquiring the congestion state information of the learning modules of adjacent routing nodes and then performing parallel learning to acquire the optimal data transmission path of each destination routing node; the router of each routing node is configured to perform data transmission according to the optimal data transmission path.
In an optional embodiment, the router includes a failure detection module, an input port, a crossbar matrix, and an output port, and the learning module is connected to the failure detection module and the crossbar matrix, respectively.
In an optional embodiment, the learning module comprises:
a routing algorithm storage unit for storing a computer program for implementing the function of the learning module;
a first matrix for storing congestion status information of each adjacent direction of the corresponding routing node;
the second matrix is used for storing the weighted reward values of all data transmission directions after the corresponding route nodes learn;
the direction selection matrix is used for storing the data transmission direction and the destination routing node information corresponding to the maximum value in the weighted reward values of all the data transmission directions in the second matrix;
and the routing table is used for storing the copied direction selection matrix.
The network-on-chip routing device and the control method thereof adopt a split parallel Q-learning network-on-chip fault-tolerant routing algorithm: the learning module of each routing node acquires the congestion state information of the learning modules of adjacent routing nodes and then performs parallel learning to obtain the optimal data transmission path to each destination routing node. This accelerates network convergence, increases the degree of parallelism, and greatly improves the speed of path planning; at the same time, a suitable path can be found using only the position information of the destination routing node, which reduces resource occupation and shortens path-planning time;
in the network-on-chip routing device and the control method thereof, the split parallel Q-learning fault-tolerant routing algorithm parallelizes the role of the random numbers: random direction selection is replaced by full direction selection, and every data transmission direction of every routing node can work in parallel, so a coverage rate of 100% is guaranteed;
in the network-on-chip routing device and the control method thereof, the split parallel Q-learning fault-tolerant routing algorithm does not use random numbers, so the convergence time is greatly shortened while coverage is guaranteed;
in the network-on-chip routing device and the control method thereof, the split parallel Q-learning fault-tolerant routing algorithm plans paths more flexibly when multi-level congestion occurs in the network-on-chip.
Drawings
Fig. 1 is a flowchart illustrating a method for controlling a network-on-chip routing device according to the present invention.
Fig. 2 is a schematic diagram illustrating a partial architecture of a network-on-chip routing apparatus according to the present invention.
Fig. 3 is a schematic diagram illustrating connection between a single routing node and an adjacent routing node in the network-on-chip routing apparatus according to the present invention.
Fig. 4 shows a novel router interconnection architecture with distributed Q modules in an embodiment of the invention.
Fig. 5 is a schematic structural diagram illustrating a network-on-chip congestion situation according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the final reward values of the Q matrix according to an embodiment of the invention.
Fig. 7 is a schematic diagram illustrating path planning information of the Q module according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a data packet from PE1 entering R1, accessing the routing table, and being gated to an output, according to an embodiment of the present invention.
FIG. 9 is a schematic diagram illustrating the location and operation of a fault detection module according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a packet format according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Please refer to fig. 1-10. It should be noted that the drawings provided in the present embodiment are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, an embodiment of the present invention introduces a method for controlling a network-on-chip routing device. The method is implemented by a split parallel Q-learning network-on-chip fault-tolerant routing algorithm, which accelerates network convergence, increases the degree of parallelism, and greatly improves the speed of path planning; a suitable path can be found using only the position information of the destination node, which reduces resource occupation and shortens path-planning time. Specifically, by collecting network congestion information and link usage, the split parallel Q-learning fault-tolerant routing algorithm of the invention lets each data packet always select a link toward its destination with better transmission efficiency. The algorithm uses each router 10 to search in all directions and obtains network-on-chip congestion information in parallel, uses Q-learning to rapidly compute the optimal path of a data packet in real time, turns the network-on-chip into a dynamic architecture that balances data flow globally, improves data transmission efficiency and performance, reduces congestion and data transmission delay, and improves the fault-tolerance rate and throughput. The router 10 also scales well and can be applied to networks-on-chip of different sizes.
The technical solution of the present invention will be specifically explained below with reference to the accompanying drawings.
Referring to fig. 1, the method for controlling the network-on-chip routing device includes the following steps:
step S10, configuring a learning module 20 for each router 10 of the network-on-chip routing device, where each router 10 and one learning module 20 form a routing node 100;
step S20, the learning module 20 of each routing node 100 respectively acquires the congestion state information of the learning modules 20 of adjacent routing nodes 100, and then performs parallel learning to acquire the optimal data transmission path of each destination routing node;
and step S30, transmitting data according to the optimal data transmission path.
In step S10, please refer to fig. 1, a network-on-chip routing device is configured, that is, a learning module 20 is configured for each router 10 of the network-on-chip routing device, and each router 10 and one learning module 20 form a routing node 100. Wherein, fig. 2 shows a partial architecture schematic diagram of the network-on-chip routing device of the present invention; fig. 3 shows a schematic connection diagram of a single routing node 100 and an adjacent routing node 100 in the network-on-chip routing apparatus of the present invention.
Referring to fig. 2 and 3, the network-on-chip routing apparatus includes a plurality of routing nodes 100 connected according to a preset network topology, and each routing node 100 includes a router 10 and a Q intelligent learning module 20 (hereinafter referred to as Q module 20 or learning module 20) connected to each other. In other words, the network-on-chip routing apparatus of the present invention comprises a network-on-chip consisting of a plurality of routers 10 and a learning network consisting of a plurality of learning modules 20; no data packets are transmitted between the router 10 and the Q module 20 of a routing node 100, while reward value information (short messages such as weight values, hop counts, and path lengths) is transmitted between the interconnected Q modules 20 of two adjacent routing nodes 100 for path planning. Specifically, as shown in fig. 2, a Q module 20 is deployed at each router node, and each router 10 together with the Q module 20 deployed on it serves as one routing node 100; as shown in fig. 3, in a network-on-chip every node learns in parallel at the same time, and the Q module 20 of each node has m (m being a positive integer) learning directions (corresponding to the m data transmission directions of each router 10) that can also learn in parallel. It should be noted that the Q module 20 of the present invention can scale with the number of instantiated routers 10 and adapts well.
Referring to fig. 2, in the present invention each router 10 includes at least four parts arranged in sequence: a fault detection module 11, input ports 13, a crossbar 14, and output ports 16. The number of input and output ports of the router 10 is determined by the scale of the network-on-chip. As an example, the number of input ports 13 is m + 1: m input ports 13 of a router 10 are connected to the m adjacent routers 10, and the remaining input port 13 serves as the port in the local direction (local input port 13), through which packet data from the terminal PE attached to the router 10 enters the network-on-chip (the terminal PE may also be replaced by a packet sender, i.e. an injector). Likewise, the number of output ports 16 is m + 1: m output ports 16 are connected to the m adjacent routers 10, and the remaining output port 16 is the port in the local direction (local output port 16), through which a packet enters the terminal PE after reaching the destination router 10.
It should be noted that the router of the present invention also provides fault tolerance: for soft errors (transient data errors during transmission), odd parity checking determines whether the packet data is correct, and once an error is found the router stops transmitting the data packet and waits for the terminal PE to resend it; for hard errors (a damaged router node or data channel, through which data cannot pass correctly), the fault detection module 11 detects the error, and when a hard error is detected the Q module 20 performs path planning again.
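The soft-error check can be pictured with the short sketch below; the word width and function name are assumptions used only to illustrate an odd-parity test over the packet data.

```python
def odd_parity_ok(data_bits: int, parity_bit: int, width: int = 32) -> bool:
    """Soft-error check: with odd parity, the data bits plus the parity bit must
    contain an odd number of ones; on a mismatch the router stops forwarding the
    packet and the terminal PE is asked to resend it."""
    ones = bin(data_bits & ((1 << width) - 1)).count("1") + parity_bit
    return ones % 2 == 1

print(odd_parity_ok(0b1011, parity_bit=0))   # 3 ones + 0 -> True (odd, accepted)
print(odd_parity_ok(0b1010, parity_bit=0))   # 2 ones + 0 -> False (packet stopped)
```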
Specifically, referring to fig. 2, in the present invention the fault detection module 11 is connected to the R matrix 21 of the Q module 20 described later. When a data packet enters the input port 13 it passes through the fault detection module 11, which detects hard errors of the router 10. The fault detection module 11 includes a flow counter and a low-level counter (also called a level-judgment state machine): the flow counter monitors traffic and provides the R matrix 21 with congestion state and device fault state, and the low-level counter accumulates the number of consecutive low levels on the transmission line and provides the fault state to the R matrix 21. The flow counter judges the congestion level from the number of packets sent within a time window; if it records no packets over a much longer period, the data channel in that direction is judged to be damaged. For the level judgment, a low-level counter is placed beside the flow counter: the data channel is high by default, and when the output port 16 sends data it first sends a start bit (for example 01) to indicate that transmission is beginning. Fault detection uses the default level of the data channel; if the channel in a direction is not high (i.e. high-impedance or low) for a continuous period of time, the data channel in that direction is judged to be damaged.
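The two detectors can be modeled behaviorally as follows; the window length, the thresholds X1 and X2 (borrowed from the quantization used in the embodiment below), the class name, and the returned state labels are all assumptions for illustration, not the patented counter hardware.

```python
# Software model of the flow counter and the low-level counter feeding the R matrix.

class FaultDetector:
    def __init__(self, x1: int, x2: int, low_limit: int, idle_limit: int):
        self.x1, self.x2 = x1, x2        # packet-count thresholds for congestion levels
        self.low_limit = low_limit       # consecutive non-high samples meaning "damaged"
        self.idle_limit = idle_limit     # consecutive empty windows meaning "damaged"
        self.packets = 0
        self.low_run = 0
        self.idle_windows = 0

    def on_packet(self) -> None:
        self.packets += 1

    def on_line_sample(self, is_high: bool) -> None:
        # The line idles high; a start bit such as 01 precedes real data.
        self.low_run = 0 if is_high else self.low_run + 1

    def end_of_window(self) -> str:
        """Return the state handed to the R matrix, then start a new window."""
        self.idle_windows = 0 if self.packets else self.idle_windows + 1
        if self.low_run >= self.low_limit or self.idle_windows >= self.idle_limit:
            state = "damaged"            # treated as device edge / fully blocked
        elif self.packets == 0:
            state = "clear"
        elif self.packets <= self.x1:
            state = "first-level congestion"
        elif self.packets <= self.x2:
            state = "second-level congestion"
        else:
            state = "higher-level congestion"
        self.packets = 0
        return state
```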
It should be noted that, in the present invention, please refer to fig. 9, when the learning module 20 of the routing node 100 identifies the fault information, path planning is performed again; specifically, after determining that a certain channel is damaged, the fault detection module 11 returns an error flag bit to the Q module 20, indicating that the line is damaged; the Q module 20 writes the state into the R matrix 21 according to the error position, and at this time, it can be regarded that the failure state is the device edge state and the full jam state, so that at the next learning, the agent updates the reward value to the Q matrix in the direction according to the weight of the full jam, and finally, the effect of bypassing the failure region is achieved.
Referring to fig. 2, a first-in first-out (FIFO) queue 12 is further disposed between each input port 13 of the router 10 and the fault detection module 11 for data buffering, to prevent data errors or increased delay caused by excessive transmission pressure; a FIFO queue 15 is likewise disposed between the crossbar 14 and each output port 16 of the router 10 for data buffering, for the same purpose.
In the present invention, referring to fig. 3, the Q module 20 includes a routing algorithm unit (not shown), an R matrix 21 (i.e. a first matrix), a Q matrix (i.e. a second matrix, not shown), a direction selection matrix (not shown), and a routing table 22. The routing algorithm unit stores the computer program that implements the Q-learning routing algorithm, which realizes the path-planning function of the Q module 20. The R matrix 21 stores the congestion state information of the m adjacent directions of the routing node 100, including destination routing node, path, multi-level congestion, device edge, and temporary damage information, and may be implemented by a mem-type memory or a reg register. The Q matrix stores the weighted reward values (defined below) of the learned data transmission directions of the routing node 100 and may likewise be implemented by a mem-type memory or a reg register; a weighted reward value in the Q matrix represents an expected value with respect to the destination node, and the greater the reward value, the faster the corresponding path reaches the destination. The direction selection matrix stores, for each destination routing node, the data transmission direction corresponding to the maximum of the weighted reward values of all data transmission directions in the Q matrix (which serves as the optimal data transmission path); it can be implemented by a mem-type memory (a reg register may also be used), with the destination node as the address and an m-bit one-hot code representing the m directions as the data. The routing table 22 stores a copy of the direction selection matrix; the copied routing table 22 can be used for the routing accesses of data packets arriving from different directions and can be implemented by a reg register (or a mem-type memory) to support high-speed access and response. In the present invention, the routing table 22 only stores the direction information of the local node, which greatly reduces its storage resources compared with a traditional routing algorithm.
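The storage just listed can be pictured with the Python stand-in below; the field layout, sizes and method names are illustrative assumptions for an m-direction node in a network with n_nodes destinations, not the mem/reg implementation itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QModuleStorage:
    m: int                                   # number of data transmission directions
    n_nodes: int                             # number of possible destination nodes
    r_matrix: List[int] = field(default_factory=list)          # congestion code per direction
    q_matrix: List[List[float]] = field(default_factory=list)  # reward per (destination, direction)
    dir_select: List[int] = field(default_factory=list)        # best one-hot direction per destination
    routing_table: List[int] = field(default_factory=list)     # copy of dir_select for fast lookups

    def __post_init__(self) -> None:
        self.r_matrix = [0] * self.m
        self.q_matrix = [[0.0] * self.m for _ in range(self.n_nodes)]
        self.dir_select = [0] * self.n_nodes
        self.routing_table = [0] * self.n_nodes

    def commit(self) -> None:
        """Copy the direction-selection matrix into the routing table for header-flit lookups."""
        self.routing_table = list(self.dir_select)

    def lookup(self, destination: int) -> int:
        """Return the one-hot output direction stored for a destination node."""
        return self.routing_table[destination]

q = QModuleStorage(m=4, n_nodes=9)           # one node of the 3 x 3 example network
```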
The operating principle of the routing algorithm of the invention is that, within one unit learning time, the Q matrix simultaneously reads the maximum reward values of the Q matrices of the adjacent routing nodes 100 in the m directions, and over several unit learning times the reward value of the destination routing node is passed on to the Q matrix of every routing node 100 according to the chain rule. The path planning converges faster than a common adaptive routing algorithm, and the planned path is better. In the invention, reading the reward value information of the routing nodes 100 in m directions simultaneously plays the role of the action-selection and exploration step of the traditional Q-learning algorithm; however, the routing algorithm of the invention does not use random numbers to select actions but adopts a full-coverage search, so the whole space is guaranteed to be searched and the path-planning time is greatly reduced. It should be noted that the routing algorithm unit of the present invention can be implemented by a GPU (Graphics Processing Unit), an ASIC (Application-Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array); considering cost and execution time, the routing algorithm unit of the present invention may be implemented, for example, by a semi-custom FPGA circuit.
When a data packet request is sent to a certain destination routing node, the routing workflow is carried out according to the following steps:
first, initialization is performed to allow each routing node 100 to prepare routing tables 22 of all destination routing nodes, that is, the learning module 20 of each routing node 100 obtains congestion state information of the learning modules 20 of adjacent routing nodes 100 respectively and then performs parallel learning to obtain an optimal data transmission path of each destination routing node (step S20), which includes the following sub-steps:
step S21, at the initial stage of the initialization stage, quantize the congestion status of the entire routing network to a specific value, write the value into the R matrix 21, and write the information of the destination routing node into the R matrix 21, that is, write the quantized congestion status and the information of the destination routing node (the quantized congestion status and the information of the destination routing node are defined as congestion status information) of each routing node 100 into the learning module 20 of the routing node 100;
step S22, in the initialization phase, the learning nodes (the learning modules 20 of the routing nodes 100) respectively read the next hop maximum reward values of the corresponding nodes in m directions, that is, the learning module 20 of each routing node 100 simultaneously obtains the maximum reward values stored in the learning modules 20 of the adjacent routing nodes 100, and simulates a search environment for selecting actions in a conventional Q-learning algorithm, and instead of selecting actions using random numbers, full coverage type search is adopted, so that all the spaces can be guaranteed to be searched, and the time for path planning is greatly reduced;
step S23, an initialization stage, in each routing node 100, after unit learning time, the Q matrix of m adjacent routing nodes 100 is calculated by a preset formula, m weighted reward values are obtained, the m weighted reward values are stored in regs (registers) of m directions of the route, meanwhile, the m weighted reward values are compared, and the maximum value is used as the reward maximum value of the current routing node 100 (or called as the local routing node 100) for the routing nodes 100 in other adjacent directions to call during learning;
step S24, repeating steps S22 to S23 x times until learning of the longest route is completed, so as to obtain the routing table 22 for that destination routing node, where x depends on the size of the routing network; when the network is n × n, x = n × n - 1, and at this point one round is completed;
step S25, each completed round means that, for a certain destination routing node, the routing lookup tables of the whole routing network under that condition are fully prepared; when the network conditions corresponding to all destination nodes have been learned (repeating steps S22 to S24 n × n times, where n × n is the size of the network), one full cycle is completed and each routing node 100 has prepared the routing tables 22 of all destination nodes. In other words, the routing table 22 of every destination routing node is fully prepared and the initialization stage ends, as sketched below.
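The following behavioral simulation summarizes steps S21 to S25; the destination reward of 1000 and the coefficient values follow the embodiment described later, the piecewise weighting follows formulas (1) and (2), and the function and argument names are assumptions rather than the hardware design.

```python
# Synchronous simulation of one 'round' (one destination) and one full cycle
# (all destinations) of the split parallel Q-learning initialization.

def learn_destination(dest, nodes, neighbours, congestion, gamma1=0.9, gamma2=0.8):
    """Propagate the destination's reward for len(nodes) - 1 unit learning times and
    record each node's best outgoing direction (its direction-selection entry)."""
    q_max = {node: (1000.0 if node == dest else 0.0) for node in nodes}
    best_dir = {}
    for _ in range(len(nodes) - 1):                      # x = n*n - 1 unit learning times
        updated = {}
        for node in nodes:                               # every node learns in parallel
            if node == dest:
                updated[node] = q_max[node]
                continue
            per_dir = {}
            for direction, nxt in neighbours[node].items():
                level = congestion[(node, nxt)]          # 0 clear, 1/2 congestion levels
                if level == 0:
                    per_dir[direction] = q_max[nxt] - 1
                elif level == 1:
                    per_dir[direction] = gamma1 * q_max[nxt]
                else:
                    per_dir[direction] = gamma2 * q_max[nxt]
            best_dir[node] = max(per_dir, key=per_dir.get)
            updated[node] = per_dir[best_dir[node]]
        q_max = updated                                  # all nodes commit together
    return q_max, best_dir

def initialize(nodes, neighbours, congestion):
    """One full cycle: a routing-table entry for every destination node."""
    return {dest: learn_destination(dest, nodes, neighbours, congestion)[1]
            for dest in nodes}
```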
Then, the data transmission is performed by using the prepared routing table 22, that is, the data transmission is performed according to the optimal data transmission path (S30), which includes the following substeps:
step S31, in the packet-sending stage, a packet sender in the terminal PE packs the destination routing node information into the header flit of the data to be transmitted to form a first data packet; the first data packet enters the router 10 through the input port 13 and then enters the crossbar 14 (crossbar module) of the router 10, and the crossbar 14 sends the destination routing node information in the header flit into the Q module 20;
step S32, the Q module 20 uses the destination routing node information carried by the incoming first data packet as the address of the routing table 22, finds the direction of the next hop of the first data packet (expressed as a one-hot code), and returns the direction to the crossbar 14;
step S33, the crossbar 14 performs direction gating according to the one-hot code and transfers the first data packet to the output port 16 in the corresponding direction according to the gating result, completing the transmission of the first data packet through one router 10;
and step S34, continuing the data packet transmission (repeating steps S31-S33) until the first data packet reaches the destination routing node; an end-to-end sketch of these steps is given below.
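The following Python walk-through of steps S31 to S34 uses a toy TableRouter class whose table contents are assumptions; the example path is the one derived for destination R9 in the embodiment described later.

```python
class TableRouter:
    """Toy stand-in for one routing node: its table maps a destination number to
    the neighbour reached through the gated output port."""
    def __init__(self, table):
        self.table = table

    def next_hop(self, dest):
        return self.table[dest]

def send_packet(source, destination, payload, routers):
    packet = {"header": {"dest": destination}, "payload": payload}   # S31: pack the header flit
    node, hops = source, [source]
    while node != destination:
        node = routers[node].next_hop(packet["header"]["dest"])      # S32-S33: lookup and gate
        hops.append(node)                                            # S34: the next router repeats
    return hops                                                      # delivered via the local port

# Tables abbreviated to destination node 9; the path matches fig. 6 and fig. 7.
routers = {n: TableRouter({9: nxt}) for n, nxt in
           {1: 2, 2: 5, 5: 4, 4: 7, 7: 8, 8: 9}.items()}
print(send_packet(1, 9, "data", routers))    # -> [1, 2, 5, 4, 7, 8, 9]
```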
It should be noted that, in the present invention, the transmission of each packet is parallel, and steps S31-S34 can be performed simultaneously. It should be noted that, in the present invention, during the data transmission process, the fault detection module 11 may update the congestion information in real time and perform path planning again. That is, when the data packet uses the copied routing table 22 for data transmission, the Q module 20 updates the congestion status information and the path plan, stores the path information into the direction selection matrix, and copies the path information into the routing table 22 after a certain time interval for the access of the packet header flit information. The path planning speed is very fast, and the dynamic path can meet the real-time requirement.
The invention greatly accelerates learning by exploiting the parallelism of FPGA hardware: the routing algorithm of the invention turns the originally serial Q-learning process into a parallel one. In the conventional Q-learning algorithm, only one agent exists in the whole network at any moment; the split parallel Q-learning routing algorithm splits the long process of an agent exploring from start point to end point into many small exploration behaviors that can run simultaneously, with every routing node 100 on a long path acting as a node performing a small exploration. Specifically, in the original learning process the agent searches one step forward and transmits a reward value once per time unit, so for an unbranched path of x nodes the agent needs x time units to finish one search. In the split routing algorithm of the present invention, in every time unit all nodes on the path transmit their reward values to an adjacent node, but only a node that already stores the reward information of the destination routing node is a currently effective transmitter; after x time units the reward value of the destination node is reflected in the routing table 22 of the start node. In other words, traditional Q-learning resembles a long-distance run, while the split algorithm resembles a relay race. For an unbranched path the two algorithms differ little, but when the path grows into a whole routing network, parallelization has the following advantages. Traditional Q-learning is hard to parallelize during exploration, one reason being that if two or more agents explore randomly and happen to reach the same node, storing reward values at the same time causes race (competition-hazard) conditions. The split algorithm allows a much higher degree of parallelism: in addition to the router nodes, every direction within each router node can work in parallel, so the split algorithm is theoretically m × n times more efficient in exploration than the traditional algorithm (m is the number of directions per node, and n is the total number of routers 10 in the network).
In general, reinforcement learning applications parallelize random numbers by letting multiple agents (each independently using its own random number) explore the environment; each agent's initial actions are random, and then, according to the feedback and the particular machine-learning algorithm, the randomness of the agents' actions gradually decreases until, after long iteration, they converge on an optimal path. A conventional routing algorithm that uses random numbers to control the direction is unstable during exploration, because even after a long convergence time some directions remain unexplored; controlling the direction with random numbers cannot guarantee that every state of every node in the network is explored, that is, the coverage rate is not 100%. The present invention parallelizes the role of the random numbers: random direction selection becomes full direction selection, and every direction of every node of the split routing algorithm works in parallel, so the split routing algorithm guarantees a coverage rate of 100%. In the problem of finding the optimal path, the low coverage of the traditional algorithm makes its result worse than that of the split algorithm; the split algorithm therefore gives a better (more accurate) learning result.
In addition, because the split algorithm of the invention does not use random numbers, the convergence time is greatly shortened while coverage is guaranteed: using random numbers revisits already-explored states and increases the convergence time, whereas the split algorithm ensures that every state is explored, and explored only once.
In addition, the split parallel Q-learning network-on-chip fault-tolerant routing algorithm plans paths more flexibly under multi-level congestion. A traditional routing algorithm can only choose directions that move closer to the destination node and therefore has little choice, so it cannot effectively relieve congestion; the routing algorithm of the invention can, at a certain congestion level, select directions that move away from the destination node (directions that are clear or less congested), thereby reducing the congestion level of the whole network-on-chip.
Referring to fig. 1-10, the technical solution of the present invention will be described with reference to a specific embodiment.
In this embodiment, the step of configuring the network-on-chip routing device is performed first (step S10). Referring to the drawings, the network-on-chip routing device of this embodiment adopts a 2D-mesh topology and may include, for example, 9 router nodes (hereinafter referred to as routers 10), each router 10 having 4 data transmission directions. As shown in fig. 3, in order to collect the information of each router 10 in the network, calculate weights, and plan paths, 9 Q intelligent learning modules 20, for example, need to be added to the network. The Q intelligent learning modules 20 (hereinafter referred to as Q modules 20 or learning modules 20) are connected one-to-one with the routers 10 of the 2D-mesh network and exchange congestion state information and packet path information without carrying data packets. After obtaining the congestion state information, each Q module 20 starts intelligent learning in parallel, plans the optimal data transmission path to each destination routing node, stores the path information in the direction selection matrix of the Q module 20, and copies it into 4 routing table 22 registers for the routing accesses of data packets arriving from different directions. It should be noted that, in an alternative example, a single routing table 22 register with 4 read ports and 1 write port may replace the function of the 4 routing table 22 registers for the routing accesses of packets arriving from different directions. It should also be noted that, in other embodiments, the network-on-chip routing device may adopt other topologies, and the numbers of routers 10 and Q modules 20 may be adjusted to actual needs.
In this embodiment, each router 10 at least includes a failure detection module 11, a first-in first-out queue 12, an input port 13(input port), a crossbar 14(crossbar), a first-in first-out queue 15, and an output port 16(output port) that are sequentially arranged, and detailed functional descriptions of each unit of the router 10 are described in detail in the relevant sections above.
In this embodiment, the Q module 20 includes a Q-learning routing algorithm unit, an R matrix 21 (first matrix), a Q matrix (second matrix), a direction selection matrix, and a routing table 22; the detailed functions of each unit of the Q module 20 are described in the relevant parts above and are not repeated here. Referring to fig. 4, in a 3 × 3 network-on-chip, every node learns in parallel at the same time, and the Q module 20 of each node has 4 learning directions that can also learn in parallel.
The working principle of the routing algorithm is as follows: in a unit learning time, the Q matrix simultaneously reads the maximum reward values of the adjacent Q matrixes from the routing nodes 100 in 4 directions, and the Q reward values of the destination routing nodes after a plurality of unit learning times are sequentially transmitted to the Q matrix of each routing node 100 according to a chain rule.
The network-on-chip congestion situation of this embodiment is shown in fig. 5: the link from R1 to R4 has first-level congestion, R3 to R6 has second-level congestion, R5 to R6 has second-level congestion, and R5 to R8 has first-level congestion. The flow of the routing algorithm will be explained below using as an example a packet request sent from routing node 100 No. 1 (R1) to routing node No. 9 (R9, the destination routing node):
first, initialization is performed (a data packet is not allowed to be sent to the router 10 network in the initialization phase), so that each routing node 100 prepares the routing tables 22 of all destination routing nodes, that is, the learning module 20 of each routing node 100 obtains the congestion status information of the learning modules 20 of the neighboring routing nodes 100 respectively and then performs parallel learning, so as to obtain the optimal data transmission path of each destination routing node (step S20), which specifically includes the following sub-steps:
step S21, in the initial part of the initialization stage, the congestion state of the entire routing network is quantized into specific values and written into the R matrix 21, together with the information of the destination routing node: the destination routing node is quantized as 3'b000, a clear channel as 3'b001, the first-level congestion state as 3'b010, the second-level congestion state as 3'b011, and so on for higher congestion levels; device edge information and temporary device-failure information are quantized as 3'b111. The specific quantization indices are: clear channel: the number of packets passing through the routing node 100 within a time window is 0; first-level congestion: the number of packets passing through the routing node 100 within the window is X1; second-level congestion: the number of packets within the window is X2; and so on. The fault-tolerance function of the router 10 is realized by means of the temporary device-failure signal. It should be noted that at this stage data packets are not allowed to be sent into the router 10 network;
step S22, in the initialization stage, the reward value of the destination routing node R9 is set to 1000 and the reward values of all other routing nodes 100 are initialized to 0; each Q module 20 reads the next-hop maximum reward values of the corresponding routing nodes 100 in the 4 adjacent directions. For the network-on-chip congestion situation shown in fig. 5, R6 and R8 obtain the reward value information of R9 in the first unit learning time; the reward values obtained by R3, R5 and R7 in the first unit learning time are 0; R3, R5 and R7 obtain the reward value information of R6 and R8 in the second unit learning time; and so on, until every routing node 100 finally obtains the reward value information of the destination routing node R9;
step S23, in the initialization stage, in each routing node 100, after a unit learning time the maximum Q-matrix reward values of the 4 adjacent routing nodes 100 are weighted using formulas (1) and (2) below, yielding 4 weighted reward values that are stored in the regs (registers) of the 4 directions of the local routing node 100;
the calculation formulas are:
(1) Q(cs, A) = Q(ns, A) - 1, if the channel in direction A is clear; Q(cs, A) = γ1 * Q(ns, A), if the channel in direction A has first-level congestion; Q(cs, A) = γ2 * Q(ns, A), if the channel in direction A has second-level congestion;
(2) Q(cs, max) = max{Q(cs, A)} over the 4 directions A;
wherein Q(cs, A) represents the weighted reward value of direction A of the local routing node 100, Q(ns, A) the maximum reward value stored by the next-hop routing node 100 in direction A, cs denotes the local node, ns the next hop, and A the direction; γ1 denotes the first-level congestion coefficient and γ2 the second-level congestion coefficient, both between 0 and 1, with γ1 > γ2; Q(cs, max) represents the maximum reward value stored by the local routing node 100. As an example: γ1 = 0.9, γ2 = 0.8;
Formula (2) compares the weighted reward values of the 4 directions and takes the maximum as the maximum reward value Q(cs, max) of the current node, to be read by the routing nodes 100 in the adjacent directions during their learning. The reward values of several routing nodes 100 are analyzed below as examples:
for R9, the local node's prize value of 1000 is the maximum prize value;
for R6, Q (6, max) ═ Q (6,9) ═ 1000;
for R5, Q (5,6) ═ γ2*Q(6,9)=γ2*1000;Q(5,8)=γ1*Q(8,9)=γ 11000; q (5,4) ═ Q (4,7) -1 ═ Q (7,8) -2 ═ Q (8,9) -3 ═ 997; q (5,2) is not considered; for R5, the maximum prize value Q (5, max) ═ Q (5,4) ═ 997;
step S24, repeating steps S22 to S23 x times until the congestion state information of the worst route, i.e. the longest path planning information, has been transmitted, where x depends on the size of the routing network; when the network is 3 × 3, x = 3 × 3 - 1 = 8, and at this point one round is completed. For the destination routing node R9, the final reward values in the Q modules 20 and the direction information of the routing tables 22 are as shown in fig. 6 and fig. 7;
step S25, each completed round means that, for a certain destination routing node (e.g. R9), the routing lookup tables of the routing network under that condition are fully prepared; when the network conditions corresponding to all destination routing nodes have been learned (repeating steps S22 to S24 a total of 3 × 3 = 9 times, one round per destination routing node, where 3 × 3 is the size of the network), one full cycle is completed and each routing node 100 has prepared the routing tables 22 of the 9 destination routing nodes, i.e. the destination routing node addresses in the table and the corresponding optimal data transmission directions. This routing table 22 can provide the optimal path information from a data packet arriving from any direction to any destination routing node, and the initialization stage is completed.
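As a quick numeric check of the step S23 analysis above, the reward values around R5 can be reproduced with the example coefficients; only the values stated in the description are used, and the variable names are illustrative.

```python
# Weighted rewards seen by R5 under the congestion map of fig. 5,
# with gamma1 = 0.9 and gamma2 = 0.8 as in the example.
gamma1, gamma2 = 0.9, 0.8
q_5_6 = gamma2 * 1000        # R5 -> R6 link: second-level congestion -> 800.0
q_5_8 = gamma1 * 1000        # R5 -> R8 link: first-level congestion  -> 900.0
q_5_4 = 1000 - 3             # R5 -> R4 -> R7 -> R8 -> R9: three clear hops -> 997
q_5_max = max(q_5_6, q_5_8, q_5_4)
print(q_5_max)               # 997: R5 forwards toward R4, as in fig. 6 and fig. 7
```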
Then, data transmission is performed using the prepared routing tables 22, that is, data transmission is performed according to the optimal data transmission path (step S30). For the destination routing node R9, the final reward values in the Q modules 20 and the direction information of the routing tables 22 are as shown in fig. 6 and fig. 7; from this reward value information the optimal path (also called the optimal data transmission path) is obtained, namely R1-R2-R5-R4-R7-R8-R9, and data transmission proceeds along this optimal path through the following sub-steps:
step S31, as shown in fig. 8, in the packet-sending stage, the R1 routing node 100 receives a data packet (the data to be transmitted) sent from PE1; the packet sender in PE1 encodes the destination routing node information (R9) as 4'b1001 and packs it into the header flit to form a first data packet. The first data packet then enters the router node through the PE1-R1 input port 13 and reaches the crossbar 14 (crossbar module) of R1, and the crossbar 14 transmits the destination routing node information 4'b1001 in the header flit to the Q module 20. As shown in fig. 10, the header flit format of the first data packet is as follows: each datum comprises a data packet ID, a destination number and data information;
step S32, the Q module 20 uses the destination information 4'b1001 carried by the incoming first data packet as the address of the routing table 22, finds that the best next-hop direction of R1 is east (E, one-hot code 4'b0001), and returns it to the crossbar 14;
step S33, the crossbar 14 gates the east direction according to the one-hot code 4'b0001 and transfers the data packet to the output port 16 in the east direction according to the gating result, completing the transmission of the data packet through the R1 router 10;
step S34, the data packet transmission continues (repeating steps S31-S33); according to the reward value information of the Q modules 20 the optimal path, i.e. R1-R2-R5-R4-R7-R8-R9, is obtained, and the data packet is transmitted to R9 along this optimal path, as sketched below.
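The lookup-and-gate step of this example can be pictured as follows; the E/S/W/N bit order of the one-hot code and the routing-table contents are illustrative assumptions, while the codes 4'b1001 and 4'b0001 come from the example above.

```python
DIRS = ("E", "S", "W", "N")      # assumed bit order of the 4-bit one-hot direction code

def gate_output(one_hot: int) -> str:
    """Crossbar gating: map the routing table's one-hot code to an output port."""
    assert one_hot in (0b0001, 0b0010, 0b0100, 0b1000)
    return DIRS[one_hot.bit_length() - 1]

# PE1 packs the destination number R9 = 4'b1001 into the header flit; R1's routing
# table (contents assumed) returns 4'b0001, so the packet is gated east toward R2.
header_flit = {"packet_id": 1, "dest": 0b1001, "payload": "..."}
r1_routing_table = {0b1001: 0b0001}
print(gate_output(r1_routing_table[header_flit["dest"]]))    # -> "E"
```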
It should be noted that, in this embodiment, the packet transmission is concurrent, and steps S31 to S34 may be performed simultaneously, that is, in addition to the packet sent by PE1, other PEs may also send out the packet and perform data transmission on the network on chip; meanwhile, in this embodiment, during the data transmission process, the fault detection module 11 may update the congestion information in real time and perform path planning again. That is, when the data packet uses the copied routing table 22 for data transmission, the Q module 20 updates the congestion status information and the path plan, stores the path information into the direction selection matrix, and copies the path information into the routing table 22 after a certain time interval for the access of the packet header flit information. The path planning speed is very fast, and the dynamic path can meet the real-time requirement.
It should be noted that, in this embodiment, referring to fig. 9, when the learning module 20 of the routing node 100 identifies the failure information, path planning is performed again; specifically, after determining that a certain channel is damaged, the fault detection module 11 returns an error flag bit to the Q module 20, indicating that the line is damaged; the Q module 20 writes the state into the R matrix 21 according to the error position, and at this time, it can be regarded that the failure state is the device edge state and the full jam state, so that at the next learning, the agent updates the reward value to the Q matrix in the direction according to the weight of the full jam, and finally, the effect of bypassing the failure region is achieved.
In the description herein, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of embodiments of the invention.
Reference throughout this specification to "one embodiment", "an embodiment", or "a specific embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment, and not necessarily all embodiments, of the present invention. Thus, respective appearances of the phrases "in one embodiment", "in an embodiment", or "in a specific embodiment" in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any specific embodiment of the present invention may be combined in any suitable manner with one or more other embodiments. It is to be understood that other variations and modifications of the embodiments of the invention described and illustrated herein are possible in light of the teachings herein and are to be considered as part of the spirit and scope of the present invention.
It will also be appreciated that one or more of the elements shown in the figures can be implemented in a more separated or more integrated manner, or even removed or rendered inoperative in certain circumstances, as may be useful in accordance with a particular application.
Additionally, any reference arrows in the drawings/figures should be considered only as exemplary, and not limiting, unless otherwise expressly specified. Further, as used herein, the term "or" is generally intended to mean "and/or" unless otherwise indicated. Combinations of components or steps will also be considered as noted where terminology makes it unclear whether the ability to separate or combine is foreseen.
As used in the description herein and throughout the claims that follow, "a", "an", and "the" include plural references unless otherwise indicated. Also, as used in the description herein and throughout the claims that follow, unless otherwise indicated, the meaning of "in" includes "in" and "on".
The above description of illustrated embodiments of the invention, including what is described in the abstract of the specification, is not intended to be exhaustive or to limit the invention to the precise forms disclosed herein. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes only, various equivalent modifications are possible within the spirit and scope of the present invention, as those skilled in the relevant art will recognize and appreciate. As indicated, these modifications may be made to the present invention in light of the foregoing description of illustrated embodiments of the present invention and are to be included within the spirit and scope of the present invention.
The systems and methods have been described herein in general terms as the details aid in understanding the invention. Furthermore, various specific details have been given to provide a general understanding of the embodiments of the invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, and/or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the invention.
Thus, although the present invention has been described herein with reference to particular embodiments thereof, a latitude of modification, various changes and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of the invention will be employed without a corresponding use of other features without departing from the scope and spirit of the invention as set forth. Thus, many modifications may be made to adapt a particular situation or material to the essential scope and spirit of the present invention. It is intended that the invention not be limited to the particular terms used in the following claims and/or to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include any and all embodiments and equivalents falling within the scope of the appended claims. Accordingly, the scope of the invention is to be determined solely by the appended claims.

Claims (10)

1. A control method of a network-on-chip routing device, the control method comprising:
configuring a learning module on each router of the network-on-chip routing device, wherein each router and one learning module form a routing node;
the learning module of each routing node respectively acquires the congestion state information of the learning modules of adjacent routing nodes and then performs parallel learning to acquire the optimal data transmission path of each destination routing node;
and carrying out data transmission according to the optimal data transmission path.
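Purely as a reading aid for claim 1, the following Python sketch mirrors the three claimed steps in software. It is an editorial illustration: the names RoutingNode, learn_in_parallel and transmit are assumptions, and the claimed subject matter is a hardware routing device, not this code.

# Hypothetical behavioural model of claim 1: one learning module per router,
# parallel learning from neighbour congestion state, then transmission along
# the learned optimal path. Names and data layout are assumed for illustration.

class RoutingNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.neighbors = {}        # direction ("N", "S", "E", "W") -> RoutingNode
        self.congestion = 0        # local congestion level written by the router
        self.routing_table = {}    # destination node id -> output direction

def learn_in_parallel(nodes, update_rule):
    # Every node first snapshots its neighbours' congestion state, so that the
    # subsequent updates behave as if they happened simultaneously (as they do
    # in the hardware learning modules), then applies the learning update.
    snapshot = {n.node_id: {d: nb.congestion for d, nb in n.neighbors.items()}
                for n in nodes}
    for n in nodes:
        n.routing_table = update_rule(n, snapshot[n.node_id])

def transmit(source, destination_id, payload):
    # Data transmission simply follows each hop's learned routing table.
    node, hops = source, []
    while node.node_id != destination_id:
        direction = node.routing_table[destination_id]
        hops.append(direction)
        node = node.neighbors[direction]
    return hops, payload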
2. The method according to claim 1, wherein the step of performing parallel learning after the learning module of each routing node acquires the congestion status information of the learning modules of the neighboring routing nodes further comprises writing the congestion status information of each routing node into the learning module of the routing node.
3. The method for controlling the network-on-chip routing device according to claim 1, wherein the method further comprises the steps of:
when a fault detection module of the routing node detects a fault, sending fault information and writing it into the learning module of the routing node, wherein the learning module of the routing node recognizes the fault information and then performs path planning again.
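A minimal sketch of the fault handling recited in claim 3, assuming the fault is recorded as an unusable congestion value; the FAULT constant, the function names, and the replan_paths hook are hypothetical and not drawn from the specification.

# Illustrative fault handling (claim 3): the fault detection module writes
# fault information into the learning module's congestion state, and the
# learning module then performs path planning again.

FAULT = float("-inf")   # assumed encoding: a faulty direction can never win

def report_fault(congestion_state, direction):
    # Record the faulty direction in the learning module's congestion state.
    congestion_state[direction] = FAULT

def handle_fault(congestion_state, direction, replan_paths):
    # After the fault information is recognised, path planning is redone so
    # the faulty direction is excluded from every optimal data transmission path.
    report_fault(congestion_state, direction)
    return replan_paths(congestion_state)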
4. The method of claim 1, wherein the congestion status information comprises destination routing node information, path information, multi-level congestion status information, device edge information, and temporary damage information.
5. The method for controlling the network-on-chip routing device according to claim 1, wherein the step of performing data transmission according to the optimal data transmission path comprises:
packing the destination routing node information into a head flit of a data packet to be transmitted, so as to form a first data packet;
and inputting the first data packet into a selected routing node, wherein the routing node carries out data transmission according to the optimal data transmission path corresponding to the destination routing node indicated in the head flit of the first data packet.
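The packet-injection step of claim 5 could look roughly as follows in a software model; the Packet dataclass, the field names, and the 16-byte flit size are assumptions made only to show the destination routing node information riding in the head flit.

# Illustrative packing of destination information into the head flit (claim 5).
# Only the idea that the head flit carries the destination routing node
# information comes from the claim; the concrete flit format is assumed.

from dataclasses import dataclass
from typing import List

@dataclass
class Packet:
    head_flit: dict          # carries the destination routing node information
    body_flits: List[bytes]  # payload split into body flits

def pack(destination_id, payload, flit_bytes=16):
    head = {"destination": destination_id}
    body = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    return Packet(head_flit=head, body_flits=body)

def next_direction(packet, routing_table):
    # The routing node that receives the first data packet reads the head flit
    # and follows the optimal path learned for that destination.
    return routing_table[packet.head_flit["destination"]]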
6. The method according to any one of claims 1 to 5, wherein the step of performing parallel learning after the learning module of each routing node acquires the congestion status information of the learning modules of adjacent routing nodes, respectively, to acquire the optimal data transmission path of each destination routing node comprises:
the learning module of each routing node simultaneously acquires the maximum reward value stored in the learning module of each adjacent routing node;
after each unit of learning time elapses, weighting, in each routing node, the maximum reward value stored in the learning module of each adjacent routing node according to a preset formula and the congestion status information of the learning module of each adjacent routing node, so as to obtain a plurality of weighted reward values;
taking the maximum value in the weighted reward values as the maximum reward value of the local routing node;
repeating the above three steps until the longest path has been learned, so as to obtain a routing table for the destination routing node;
and repeating the above four steps until the routing table for each destination routing node has been obtained.
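The parallel learning loop recited in claim 6 resembles a synchronous value-propagation sweep. The sketch below is a software analogue under assumed conditions (a width x height 2D mesh, a reward of 1.0 seeded at the destination, and a caller-supplied weighting function standing in for the preset formula); none of these constants come from the patent.

# Software analogue of claim 6: per unit learning time, every node reads the
# neighbours' stored maximum rewards, weights them by congestion, keeps the
# best direction, and repeats until the longest path has been covered.

def parallel_learn(width, height, destination, congestion, weight):
    # max_reward[node] = maximum reward stored at that node for `destination`
    max_reward = {(x, y): 0.0 for x in range(width) for y in range(height)}
    max_reward[destination] = 1.0           # assumed seed at the destination
    table = {}                              # node -> best output direction
    directions = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    for _ in range(width + height - 2):     # longest shortest path in the mesh
        snapshot = dict(max_reward)         # all nodes update simultaneously
        for (x, y) in max_reward:
            best_dir, best_val = None, snapshot[(x, y)]
            for d, (dx, dy) in directions.items():
                neighbor = (x + dx, y + dy)
                if neighbor in snapshot:
                    value = weight(snapshot[neighbor], congestion.get(neighbor, 0))
                    if value > best_val:
                        best_dir, best_val = d, value
            max_reward[(x, y)] = best_val
            if best_dir is not None:
                table[(x, y)] = best_dir
    return table, max_reward

Running this once per destination routing node (all learning modules perform these updates at the same time in hardware) yields the per-destination routing tables of the final step of claim 6.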
7. The method of claim 6, wherein, when each of the routing nodes has four data transmission directions, the preset formula comprises:
(the preset formula is published as an image, FDA0002461287640000021, and is not reproduced in this text)
wherein Q(cs, A) represents the weighted reward value in direction A of the local routing node, Q(ns, A) represents the maximum reward value stored by the next-hop routing node, cs denotes the local routing node, ns denotes the next hop, A denotes the direction, γ1 denotes the first-level congestion coefficient, γ2 denotes the second-level congestion coefficient, γ1 and γ2 both lie between 0 and 1 with γ1 > γ2, and Q(cs, max) represents the maximum reward value stored by the local routing node.
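Because the preset formula itself is published only as an image (FDA0002461287640000021), it cannot be reproduced here. The function below is merely one Q-routing-style reading consistent with the symbol definitions of claim 7 (two congestion coefficients between 0 and 1 with γ1 > γ2, applied to the next hop's stored maximum reward); the branch structure and the default coefficient values are assumptions, not the claimed formula.

# Assumed, illustrative weighting consistent with the symbols of claim 7:
# the next hop's stored maximum reward Q(ns, A) is discounted by gamma1 for
# first-level congestion or by gamma2 for second-level congestion, with
# 0 < gamma2 < gamma1 < 1. NOT the claimed formula, which is image-only.

def weighted_reward(q_next_max, congestion_level, gamma1=0.9, gamma2=0.5):
    if not 0 < gamma2 < gamma1 < 1:
        raise ValueError("coefficients must satisfy 0 < gamma2 < gamma1 < 1")
    if congestion_level <= 0:          # no congestion reported
        gamma = 1.0
    elif congestion_level == 1:        # first-level congestion
        gamma = gamma1
    else:                              # second-level (or heavier) congestion
        gamma = gamma2
    return gamma * q_next_max

A function of this shape could be passed as the weight argument of the learning sweep sketched after claim 6.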
8. A network-on-chip routing apparatus, comprising:
a plurality of routing nodes connected according to a preset network topology structure, wherein each routing node comprises a router and a learning module that are connected to each other;
the learning module of each routing node is used for acquiring the congestion state information of the learning modules of adjacent routing nodes and then performing parallel learning to acquire the optimal data transmission path of each destination routing node; the router of each routing node is configured to perform data transmission according to the optimal data transmission path.
9. The network-on-chip routing apparatus of claim 8, wherein the router comprises a fault detection module, an input port, a crossbar, and an output port, and the learning module is connected to the fault detection module and the crossbar, respectively.
10. The network-on-chip routing apparatus of claim 8, wherein the learning module comprises:
a routing algorithm unit for storing a computer program for implementing the learning module function;
a first matrix for storing congestion status information of each adjacent direction of the corresponding routing node;
a second matrix for storing the weighted reward values of all data transmission directions after the corresponding routing node has learned;
a direction selection matrix for storing the data transmission direction and the destination routing node information corresponding to the maximum of the weighted reward values of all data transmission directions in the second matrix;
and a routing table for storing a copy of the direction selection matrix.
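To make the four storage structures of claim 10 concrete, they can be pictured as the container below. The dataclass, the dictionary encodings, and the commit() copy step are assumptions for illustration; the claim specifies only what each structure stores, not how it is implemented.

# Assumed in-memory picture of the claim 10 learning module: a congestion
# matrix per adjacent direction, a weighted-reward matrix per destination and
# data transmission direction, a direction selection matrix with the winning
# direction per destination, and a routing table copied from it.

from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class LearningModuleState:
    congestion: Dict[str, int] = field(default_factory=dict)                     # first matrix
    weighted_reward: Dict[Tuple[int, str], float] = field(default_factory=dict)  # second matrix
    direction_select: Dict[int, str] = field(default_factory=dict)               # direction selection matrix
    routing_table: Dict[int, str] = field(default_factory=dict)                  # copy used by the router

    def commit(self):
        # The routing table stores a copy of the direction selection matrix,
        # so the router can keep forwarding while the next learning pass runs.
        self.routing_table = dict(self.direction_select)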
CN202010320744.7A 2020-04-22 2020-04-22 Network-on-chip routing device and control method thereof Active CN111522775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010320744.7A CN111522775B (en) 2020-04-22 2020-04-22 Network-on-chip routing device and control method thereof

Publications (2)

Publication Number Publication Date
CN111522775A true CN111522775A (en) 2020-08-11
CN111522775B CN111522775B (en) 2023-05-16

Family

ID=71904446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010320744.7A Active CN111522775B (en) 2020-04-22 2020-04-22 Network-on-chip routing device and control method thereof

Country Status (1)

Country Link
CN (1) CN111522775B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103973482A (en) * 2014-04-22 2014-08-06 南京航空航天大学 Fault-tolerant on-chip network system with global communication service management capability and method
CN104579951A (en) * 2014-12-29 2015-04-29 合肥工业大学 Fault-tolerance method in on-chip network under novel fault and congestion model
CN107395503A (en) * 2017-08-25 2017-11-24 东南大学 A kind of network-on-chip method for routing based on linear programming
WO2019154483A1 (en) * 2018-02-07 2019-08-15 Hochschule Anhalt Method of adaptive route selection in a node of a wireless mesh communication network corresponding apparatus for performing the method of adaptive route selection and corresponding computer program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李荥 et al., "A routing method for wireless sensor networks based on Q-learning", Computing Technology and Automation (《计算技术与自动化》) *
赵巍 et al., "A fault-tolerant routing algorithm for network-on-chip", Journal of Changchun University of Science and Technology (Natural Science Edition) (《长春理工大学学报(自然科学版)》) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697249A (en) * 2020-12-31 2022-07-01 Oppo广东移动通信有限公司 Chip, control method thereof, computer-readable storage medium, and electronic device
CN114697249B (en) * 2020-12-31 2023-07-04 Oppo广东移动通信有限公司 Chip, control method thereof, computer-readable storage medium, and electronic device
CN113079093A (en) * 2021-04-12 2021-07-06 合肥工业大学 Routing method based on hierarchical Q-routing planning
CN113079093B (en) * 2021-04-12 2022-03-15 合肥工业大学 Routing method based on hierarchical Q-routing planning
CN115022231A (en) * 2022-06-30 2022-09-06 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN115022231B (en) * 2022-06-30 2023-11-03 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
WO2024060730A1 (en) * 2022-09-20 2024-03-28 华为技术有限公司 Traffic control method and device

Also Published As

Publication number Publication date
CN111522775B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN111522775B (en) Network-on-chip routing device and control method thereof
US10496770B2 (en) System level simulation in Network on Chip architecture
JP4679522B2 (en) Highly parallel switching system using error correction
KR900006791B1 (en) Packet switched multiport memory nxm switch node and processing method
Kohler et al. Fault tolerant network on chip switching with graceful performance degradation
JP4763405B2 (en) Network-on-chip semi-automatic communication architecture for data flow applications
CN111683014B (en) Routing path tracking method and system of high-speed interconnection network
EP0200780A1 (en) PACKET SWITCHED MULTIPLE QUEUE NxM SWITCH NODE AND PROCESSING METHOD.
US10579469B2 (en) Interconnection network for integrated circuit
US12010042B2 (en) Efficient parallelized computation of a Benes network configuration
CN108768778B (en) Network delay calculation method, device, equipment and storage medium
CN116915708A (en) Method for routing data packets, processor and readable storage medium
JP2002158708A (en) Switching device, communication unit and communication system
CN114422453B (en) Method, device and storage medium for online planning of time-sensitive stream
KR102391802B1 (en) Topology Synthesis Method of Using Genetic Algorithm
US7111093B2 (en) Ping-pong buffer system having a buffer to store a subset of data from a data source
JP3900065B2 (en) Data transfer system, transfer control unit, and program
Omari Adaptive Algorithms for Wormhole-Routed Single-Port Mesh-Hypercube Network
Li et al. Mirrored K‐Ary N‐Tree and its efficiency of fault tolerance
US9007955B1 (en) Systems and methods for mapping network topologies
WO2024040959A1 (en) Packet forwarding processing method and apparatus, storage medium, and electronic device
Ge et al. A network monitor based dynamic routing scheme for Network on Chip
Tang et al. An advanced nop selection strategy for odd-even routing algorithm in network-on-chip
Torab Performance analysis of packet-switched networks with tree topology
Rangaiah Performance Evaluation of a Network on Chip Based on Ghz Throughput and Low Power for Streaming Data Transmission on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant