CN111522775B - Network-on-chip routing device and control method thereof - Google Patents


Info

Publication number
CN111522775B
CN111522775B (application CN202010320744.7A)
Authority
CN
China
Prior art keywords
routing node
routing
learning
learning module
data transmission
Prior art date
Legal status
Active
Application number
CN202010320744.7A
Other languages
Chinese (zh)
Other versions
CN111522775A (en)
Inventor
李桢旻
翁晓峰
王镜涵
沈烨钦
杜高明
王晓蕾
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202010320744.7A
Publication of CN111522775A
Application granted
Publication of CN111522775B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F 15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network-on-chip routing device and a control method thereof. The control method comprises: configuring a learning module on each router of the network-on-chip routing device, each router and its learning module forming a routing node; having the learning module of each routing node acquire the blocking state information of the learning modules of its adjacent routing nodes and then learn in parallel, so as to obtain the optimal data transmission path to each destination routing node; and transmitting data along the optimal data transmission path. The device and method accelerate network convergence, increase the degree of parallelism, and greatly speed up path planning; a suitable path can be found from the position information of the destination routing node alone, which both reduces resource occupation and shortens path planning time.

Description

Network-on-chip routing device and control method thereof
Technical Field
The invention relates to the technical field of network-on-chip design, and in particular to a network-on-chip routing device and a control method thereof.
Background
The routing algorithm is one of the main factors affecting network-on-chip performance. Static network-on-chip architectures, such as those using the X-Y routing algorithm, suffer increased power consumption and router temperature when large amounts of data are transmitted, creating "hot spots". Hot spots tend to cause data blocking and delay, make the chip unreliable, and shorten its service life. The algorithm therefore needs to be improved so that the network-on-chip becomes a dynamic architecture. This increases data transmission efficiency, because the routing algorithm can optimize the transmission path of each data packet and reduce the overall transmission time; it reduces congestion and hot spots, which arise when no routing algorithm is present or paths are not optimized in time, so that a large number of data packets select the same link and block one or more links; and it reduces data transmission delay, because the number of hops a message takes from the source node to the destination routing node is determined directly by the actual path traversed. A dynamic architecture thus keeps the network-on-chip at a good temperature while communicating at high speed, ensuring that the chip is safe and reliable.
Many dynamic self-adaptive algorithms exist at present. The Q-learning algorithm from machine learning is highly adaptive: it interacts autonomously with the environment to derive optimal path information, and once its learning result converges it yields a finite path, so deadlock is avoided. However, the conventional algorithm has drawbacks: only one agent can interact with the environment at a time, so the learning process is slow, and because of its random numbers and greedy strategy the algorithm cannot be guaranteed to explore the entire space and find a better path.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a network-on-chip routing device and a control method thereof, which are used for solving the technical problems of slow convergence and incomplete path exploration of conventional Q-learning-based adaptive routing in the prior art.
To achieve the above and other related objects, the present invention provides a control method of a network-on-chip routing device, the control method comprising:
configuring a learning module on each router of the network-on-chip routing device, wherein each router and the learning module form a routing node;
the learning module of each routing node respectively acquires the blocking state information of the learning module of each adjacent routing node and then carries out parallel learning so as to acquire the optimal data transmission path of each destination routing node;
and carrying out data transmission according to the optimal data transmission path.
In an optional embodiment, before the learning module of each routing node acquires the blocking state information of the learning modules of its adjacent routing nodes and performs parallel learning, the control method further comprises the step of writing the blocking state information of each routing node into the learning module of that routing node.
In an alternative embodiment, the control method further comprises the steps of:
when the fault detection module of a routing node detects a fault, fault information is sent and written into the learning module of that routing node, and the learning module performs path planning again after recognizing the fault information.
In an alternative embodiment, the blocking state information includes destination routing node information, path information, multi-level blocking status information, device edge information, and temporary damage information.
In an alternative embodiment, the step of transmitting data according to the optimal data transmission path includes:
packaging the destination routing node information into the head flit of a data packet to be transmitted, so as to form a first data packet;
and inputting the first data packet into a selected routing node, which transmits the data according to the optimal data transmission path corresponding to the destination routing node carried in the head flit of the first data packet.
In an optional embodiment, the step in which the learning module of each routing node acquires the blocking state information of the learning modules of its adjacent routing nodes and then learns in parallel to obtain the optimal data transmission path to each destination routing node comprises:
the learning module of each routing node simultaneously acquiring the maximum reward value stored in the learning module of each adjacent routing node;
after a unit learning time elapses, each routing node weighting the maximum reward value stored in the learning module of each adjacent routing node according to a preset formula and that neighbor's blocking state information, so as to obtain a plurality of weighted reward values;
taking the maximum of the weighted reward values as the maximum reward value of the local routing node;
repeating the above three steps until the longest path has been learned, so as to obtain the routing table for one destination routing node;
and repeating the above four steps until the routing table for every destination routing node is ready.
In an alternative embodiment, when each of the routing nodes has four data transmission directions, the preset formula (shown in the original as Figure BDA0002461287650000021) weights the maximum reward value stored by the next hop with the blocking coefficient of the direction, approximately
Q(cs, A) = γ1 · Q(ns, A) under primary blocking, and Q(cs, A) = γ2 · Q(ns, A) under secondary blocking,
wherein Q(cs, A) represents the weighted reward value of direction A of the local routing node, Q(ns, A) represents the maximum reward value stored by the next-hop routing node in direction A, cs denotes the local node, ns the next hop, and A the direction; γ1 is the primary blocking coefficient and γ2 the secondary blocking coefficient, both between 0 and 1 with γ1 > γ2; and Q(cs, max) represents the maximum reward value stored by the local routing node.
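The per-direction update can be sketched in software. The following is a minimal illustrative model that assumes the weighted reward of a direction is the next hop's stored maximum reward scaled by a blocking coefficient (γ1 for primary blocking, γ2 for secondary blocking); the numeric coefficient values and the three-level blocking encoding are assumptions, not values from the patent.

```python
# Hypothetical sketch of one learning step for a single routing node with
# four data transmission directions (N/E/S/W). GAMMA1 and GAMMA2 stand in
# for the primary and secondary blocking coefficients; their values are
# illustrative assumptions.

GAMMA1 = 0.9  # primary (light) blocking coefficient, assumed value
GAMMA2 = 0.5  # secondary (heavy) blocking coefficient, assumed value

def weighted_rewards(neighbor_max_q, blocking_level):
    """neighbor_max_q: dict direction -> max reward stored by the next hop.
    blocking_level:    dict direction -> 0 (free), 1 (primary), 2 (secondary)."""
    q = {}
    for direction, next_q in neighbor_max_q.items():
        level = blocking_level[direction]
        if level == 0:
            q[direction] = next_q           # unblocked: pass reward through
        elif level == 1:
            q[direction] = GAMMA1 * next_q  # primary blocking discount
        else:
            q[direction] = GAMMA2 * next_q  # secondary blocking discount
    return q

def local_max_reward(q):
    # The largest weighted reward becomes Q(cs, max), which adjacent nodes
    # read during the next unit learning time.
    return max(q.values())
```

A blocked direction thus propagates a smaller reward, steering later hops away from congested links.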
To achieve the above and other related objects, the present invention also provides a network-on-chip routing device, including:
a plurality of routing nodes connected according to a preset network topology structure, wherein each routing node comprises a router and a learning module which are connected with each other;
the learning module of each routing node is used for respectively acquiring the blocking state information of the learning module of each adjacent routing node and then carrying out parallel learning so as to acquire the optimal data transmission path of each destination routing node; the router of each routing node is used for transmitting data according to the optimal data transmission path.
In an alternative embodiment, the router includes a fault detection module, an input port, a crossbar matrix, and an output port, and the learning module is connected to the fault detection module and the crossbar matrix, respectively.
In an alternative embodiment, the learning module includes:
a routing algorithm storage unit for storing a computer program implementing the function of the learning module;
A first matrix for storing congestion status information of respective adjacent directions of the routing nodes;
the second matrix is used for storing the weighted rewards of each data transmission direction after the corresponding routing node learns;
the direction selection matrix is used for storing the data transmission direction and the destination routing node information corresponding to the maximum value in the weighted reward values of all the data transmission directions in the second matrix;
and the routing table is used for storing the copied direction selection matrix.
The network-on-chip routing device and its control method adopt a split parallel Q-learning network-on-chip fault-tolerant routing algorithm: the learning module of each routing node acquires the blocking state information of the learning modules of its adjacent routing nodes and then learns in parallel to obtain the optimal data transmission path to each destination routing node. This accelerates network convergence, increases the degree of parallelism, and greatly speeds up path planning; moreover, a suitable path can be found from the position information of the destination routing node alone, which both reduces resource occupation and shortens path planning time.
In the split parallel Q-learning fault-tolerant routing algorithm, the random number is parallelized: random direction selection is replaced by selection of all directions, and every data transmission direction of every routing node works in parallel, so a coverage rate of 100% is guaranteed.
Because no random numbers are used, the convergence time is also greatly shortened while the coverage rate is guaranteed.
In addition, path planning under multi-level blocking is more flexible.
Drawings
Fig. 1 is a flow chart of a control method of a network-on-chip routing device according to the present invention.
Fig. 2 is a schematic diagram of a partial architecture of a network-on-chip routing device according to the present invention.
Fig. 3 is a schematic diagram showing connection between a single routing node and an adjacent routing node in the network-on-chip routing device according to the present invention.
Fig. 4 shows a novel router interconnection architecture with distributed Q modules in accordance with an embodiment of the present invention.
Fig. 5 is a schematic diagram of a network on chip blocking situation in an embodiment of the present invention.
Fig. 6 is a diagram illustrating the final prize value of the Q matrix in accordance with an embodiment of the present invention.
Fig. 7 is a schematic diagram of path planning information of a Q module according to an embodiment of the invention.
FIG. 8 is a schematic diagram illustrating a data packet entering R1 from PE1, accessing the routing table, and gating the output according to an embodiment of the present invention.
FIG. 9 is a schematic diagram showing the location and operation of the fault detection module according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of a packet format in an embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the present invention.
Please refer to figs. 1-10. It should be noted that the illustrations provided in this embodiment merely convey the basic concept of the invention. The drawings show only the components related to the invention rather than the number, shape, and size of components in an actual implementation; in practice the form, quantity, and proportion of components may change arbitrarily, and their layout may be more complex.
Referring to fig. 1, an embodiment of the present invention introduces a control method of a network-on-chip routing device. The control method is implemented with a split parallel Q-learning network-on-chip fault-tolerant routing algorithm, which accelerates network convergence, increases the degree of parallelism, and greatly raises the rate of path planning; a suitable path can be found from the position information of the destination node alone, reducing both resource occupation and path planning time. Specifically, by collecting network congestion information and link usage, the algorithm lets a data packet always route over a link that leads to the destination with better transmission efficiency. The algorithm uses each router 10 to search in all directions, obtains network-on-chip blocking information in parallel, and uses Q-learning to rapidly compute the optimal path of a data packet in real time. The network-on-chip thus becomes a dynamic architecture in which data traffic is shared globally, improving data transmission efficiency and performance, reducing blocking and delay during transmission, and raising fault tolerance and throughput. The router 10 also scales well and can be applied to networks-on-chip of different sizes.
The technical scheme of the invention will be specifically described below with reference to the accompanying drawings.
Referring to fig. 1, the method for controlling the network-on-chip routing device includes the following steps:
step S10, configuring a learning module 20 on each router 10 of the network-on-chip routing device, wherein each router 10 and one learning module 20 form a routing node 100;
step S20, after the learning module 20 of each routing node 100 obtains the blocking state information of the learning module 20 of each adjacent routing node 100, parallel learning is performed to obtain the optimal data transmission path of each destination routing node;
and step S30, carrying out data transmission according to the optimal data transmission path.
In step S10, referring to fig. 1, the configuration of the network-on-chip routing device is performed, that is, a learning module 20 is configured for each router 10 of the network-on-chip routing device, and each router 10 and one learning module 20 form a routing node 100. Wherein fig. 2 shows a schematic diagram of a partial architecture of the network-on-chip routing device of the present invention; fig. 3 shows a schematic diagram of the connection of a single routing node 100 with a neighboring routing node 100 in a network-on-chip routing device according to the present invention.
Referring to figs. 2 and 3, the network-on-chip routing device includes a plurality of routing nodes 100 connected according to a preset network topology. Each routing node 100 includes a router 10 and a Q intelligent learning module 20 (hereinafter the Q module 20 or learning module 20) connected to each other; that is, the device comprises a network-on-chip composed of the routers 10 and a learning network composed of the learning modules 20. No packet transmission takes place between the router 10 and the Q module 20 of a routing node 100; reward value information (including weight values, hop counts, path length information, etc.) is transmitted between the Q modules 20 of adjacent routing nodes 100. Specifically, as shown in fig. 2, a Q module 20 is deployed at each router node, and each router 10 combined with its Q module 20 forms one routing node 100. As shown in fig. 3, all nodes in the network-on-chip learn in parallel at the same time, and the Q module 20 of each node has m learning directions (m a positive integer, corresponding to the m data transmission directions of each router 10) that likewise learn in parallel. The scale of the Q module 20 can be increased to match the instantiation of the router 10, so its adaptability is high.
Referring to fig. 2, each router 10 includes at least four parts arranged in sequence: a fault detection module 11, input ports 13, a crossbar matrix 14 (crossbar), and output ports 16. The number of input and output ports of the router 10 is decided by the scale of the network-on-chip. As an example, the number of input ports 13 is m+1: m input ports of a router 10 connect to its m adjacent routers 10, and the remaining input port serves as the local port, through which packet data from the terminal PE under this router enters the network-on-chip (the terminal PE may be replaced by, for example, a packet transmitter). Likewise, the number of output ports 16 is m+1: m output ports connect to the m adjacent routers 10, and the remaining output port is the local port, through which a packet that has arrived at its destination router 10 leaves for the terminal PE.
It should be noted that the router of the present invention is fault tolerant. For soft errors (transient data errors during transmission), packet data is checked with odd parity; once an error is found, the router stops transmitting the data packet and waits for the terminal PE to resend it. For hard errors (a damaged router node or data pipe, through which data cannot pass correctly), the fault detection module 11 detects the fault, and when a hard error is found the Q module 20 re-plans the path.
Specifically, referring to fig. 2, the fault detection module 11 is connected to the R matrix 21 of the Q module 20 described later. A packet entering an input port 13 passes through the fault detection module 11, which detects hard errors of the router 10. The fault detection module 11 includes a flow counter and a low-level counter (also called a level-judgment state machine). The flow counter monitors traffic to provide the R matrix 21 with blocking information and device fault status; the low-level counter accumulates the number of consecutive low levels on the transmission line and provides the R matrix 21 with fault status. Concretely, the flow counter judges the degree of blocking from the number of packets sent in a period of time, and if the counter does not count over a longer period, the data pipe in that direction is judged to be damaged. For level judgment, a low-level counter is placed beside the flow counter. The data pipe defaults to a high level, and when the output port 16 sends data it first sends a start bit (e.g. 01) to indicate that transmission is beginning. Fault detection exploits this default level: if the pipe stays non-high (high-impedance or low) for a period of time, the data pipe in that direction is judged to be damaged.
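The two detectors can be modelled as a small per-cycle state machine. The sketch below is illustrative only: the window and threshold values, the class name, and the per-cycle interface are assumptions, not details from the patent.

```python
# Minimal model of the flow counter and low-level counter described above.

class FaultDetector:
    def __init__(self, flow_window=1024, low_limit=64):
        self.flow_window = flow_window  # cycles with no packets -> pipe judged dead
        self.low_limit = low_limit      # consecutive non-high cycles -> pipe judged dead
        self.idle_cycles = 0            # flow counter: cycles since the last packet
        self.low_cycles = 0             # low-level counter
        self.faulty = False

    def on_cycle(self, packet_seen, line_is_high):
        # Flow counter: a long stretch with no packets marks the pipe damaged.
        self.idle_cycles = 0 if packet_seen else self.idle_cycles + 1
        # Low-level counter: the idle line defaults to high; a long run of
        # non-high levels (low or high-impedance) marks the pipe damaged.
        self.low_cycles = 0 if line_is_high else self.low_cycles + 1
        if self.idle_cycles >= self.flow_window or self.low_cycles >= self.low_limit:
            self.faulty = True  # error flag reported back toward the R matrix
        return self.faulty
```

In hardware both counters run continuously beside the input port; here a single `on_cycle` call stands in for one clock cycle.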
It should be noted that, referring to fig. 9, when the learning module 20 of a routing node 100 recognizes fault information, path planning is performed again. Specifically, after judging that a channel is damaged, the fault detection module 11 returns an error flag bit to the Q module 20 to indicate that the line is damaged. The Q module 20 writes this state into the R matrix 21 at the error position; it can be treated as fault state = device edge state = full blocking state, so that at the next learning pass the agent updates the reward value of that direction in the Q matrix with the fully-blocked weight, and the fault region is ultimately bypassed.
Referring to fig. 2, a FIFO (first-in first-out queue) 12 is arranged between each input port 13 of the router 10 and the fault detection module 11, and another FIFO 15 between the crossbar 14 and each output port 16. Both buffer data to prevent excessive transmission pressure from causing data errors or increased delay.
Referring to fig. 3, the Q module 20 includes a routing algorithm unit (not shown), an R matrix 21 (the first matrix), a Q matrix (the second matrix, not shown), a direction selection matrix (not shown), and a routing table 22. The routing algorithm unit stores a computer program implementing the Q-learning routing algorithm, which realizes the path planning function of the Q module 20; it may be implemented with a mem-type memory or reg registers. The R matrix 21 stores the blocking state information of the m adjacent directions of its routing node 100, including destination routing node, path, multi-level blocking, device edge, and temporary damage information; it too may be implemented with a mem-type memory or reg registers. The Q matrix stores the weighted reward value of each data transmission direction after its routing node 100 has learned (the weighted reward value is defined below) and may be implemented with a mem-type memory or reg registers. A weighted reward value in the Q matrix represents the expected return with respect to the destination node: the larger the value, the faster this path leads to the destination.
The direction selection matrix stores, for each destination routing node, the data transmission direction corresponding to the maximum of the weighted reward values of all directions in the Q matrix (this direction constitutes the optimal data transmission path). It may be implemented with a mem-type memory (reg registers may also be used), where the address is the destination node and the data is an m-bit one-hot code representing the m directions. The routing table 22 stores a copy of the direction selection matrix; the copied routing table can serve routing accesses from data packets travelling in different directions, and for fast access and response it may be implemented with reg registers (a mem-type memory may also be used). In the present invention the routing table 22 stores only the direction information of the local node, so its storage resources are greatly reduced compared with conventional routing algorithms.
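The storage layout of one Q module can be illustrated in software. All field and method names below are assumptions for exposition; the sketch fixes m = 4 directions and an n x n network.

```python
# Illustrative data layout for one Q module with four directions.

class QModule:
    DIRS = ("N", "E", "S", "W")

    def __init__(self, n):
        num_nodes = n * n
        # R matrix: quantified blocking state of each adjacent direction
        # (destination, path, multi-level blocking, device-edge and
        # temporary-damage information in the full design).
        self.r = {d: 0 for d in self.DIRS}
        # Q matrix: weighted reward value per direction after learning.
        self.q = {d: 0.0 for d in self.DIRS}
        # Direction selection matrix: per destination, a 4-bit one-hot code
        # for the direction with the largest weighted reward.
        self.dir_select = {dst: "0000" for dst in range(num_nodes)}
        # Routing table: copy of the direction selection matrix, duplicated
        # so packets from different ports can access it fast.
        self.routing_table = dict(self.dir_select)

    def commit(self, dst):
        # Encode the best direction for `dst` as a one-hot code, then copy
        # it into the routing table.
        best = max(self.DIRS, key=lambda d: self.q[d])
        onehot = "".join("1" if d == best else "0" for d in self.DIRS)
        self.dir_select[dst] = onehot
        self.routing_table[dst] = onehot
```

Because only the local best direction per destination is kept, the table holds one m-bit word per destination rather than a full path.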
The working principle of the routing algorithm is as follows: within one unit learning time, the Q matrix of each routing node 100 simultaneously reads the maximum reward values of the Q matrices of its adjacent routing nodes 100 in all m directions, and over several unit learning times the reward of the destination routing node is propagated in turn, by a chain rule, to the Q matrix of every routing node 100. The path planning of this routing algorithm converges faster than common adaptive routing algorithms, and the planned path is better. Reading the reward value information of the m neighboring nodes at once takes the place of the action-selection exploration of the conventional Q-learning algorithm; instead of selecting actions with random numbers, the algorithm explores with full coverage, which guarantees that the whole space is explored and greatly reduces path planning time. It should be noted that the routing algorithm unit may be implemented on a GPU (graphics processing unit), an ASIC (application-specific integrated circuit), or an FPGA (field-programmable gate array); considering cost and execution time, it may be implemented on an FPGA as a semi-custom circuit.
When a data packet is to be sent to a certain destination routing node, the routing workflow proceeds as follows.
First, initialization makes every routing node 100 prepare routing tables 22 for all destination routing nodes; that is, the learning module 20 of each routing node 100 acquires the blocking state information of the learning modules 20 of its adjacent routing nodes 100 and then learns in parallel to obtain the optimal data transmission path to each destination routing node (step S20). This comprises the following sub-steps:
step S21, at the start of the initialization phase, the blocking status of the whole routing network is quantified into specific values, which are written into the R matrix 21 together with the destination routing node information; that is, the quantified blocking status and destination routing node information (together defined as the blocking state information) of each routing node 100 are written into the learning module 20 of that routing node 100;
step S22, each learning node (the learning module 20 of a routing node 100) reads the maximum reward value of the next hop in each of the m directions simultaneously; this takes the place of the action-selection exploration of the conventional Q-learning algorithm, and because full-coverage exploration replaces random-number action selection, exploration of the whole space is guaranteed and path planning time is greatly reduced;
step S23, after a unit learning time, each routing node 100 applies the preset formula to the Q matrices of its m adjacent routing nodes 100 to obtain m weighted reward values, which are stored in the reg (register) locations of the node's m directions; the m weighted reward values are then compared, and the largest is taken as the maximum reward value of the current (local) routing node 100 for adjacent nodes to read during their next learning pass;
step S24, steps S22-S23 are repeated x times until the longest path has been learned, which yields the routing table 22 for the destination routing node; x depends on the size of the routing network: for an n x n network, x = n*n - 1. At this point, one node rotation is completed;
step S25, each completed node rotation means that, for one particular destination routing node, the routing lookup tables of the routing network under the current conditions are ready. After the network conditions corresponding to all destination nodes have been learned (steps S22 to S24 repeated once per destination routing node, i.e. n*n times for an n x n network), each routing node 100 holds the routing tables 22 of all destination nodes; in other words, the routing table 22 for every destination routing node is prepared, and the initialization stage ends.
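The initialization flow of steps S21-S25 can be modeled in software as follows. This is a minimal sketch, assuming a 3x3 mesh, a destination prize value of 1000, blocking coefficients of 0.9 and 0.8, and a decay of 1 per clear hop (matching the worked example later in the text); on the FPGA all nodes update simultaneously in hardware, which the lock-step loop here only emulates:

```python
# Minimal software model of the split parallel Q-learning initialization
# (steps S21-S25). Grid size, destination reward, blocking coefficients,
# and per-hop decay are illustrative assumptions; on the FPGA every node
# updates simultaneously in hardware rather than in a Python loop.

N = 3                                   # mesh is N x N
X = N * N - 1                           # unit learning times per node rotation
REWARD_DEST = 1000.0                    # assumed destination prize value
GAMMA = {1: 0.9, 2: 0.8}                # blocking level -> coefficient

def neighbors(node):
    """The (up to) 4 adjacent nodes of a mesh node."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < N and 0 <= y + dy < N:
            yield (x + dx, y + dy)

def weighted(next_max, level):
    # Clear link: the prize decays by 1 per hop (as in the R5 worked example);
    # blocked link: scale by the blocking coefficient GAMMA[level] instead.
    return next_max - 1 if level == 0 else GAMMA[level] * next_max

def learn(dest, link_level):
    """Steps S22-S24: every node updates in lock-step for X unit times."""
    nodes = [(x, y) for x in range(N) for y in range(N)]
    q_max = {n: (REWARD_DEST if n == dest else 0.0) for n in nodes}
    best = {}                            # node -> best next hop (routing table)
    for _ in range(X):
        snapshot = dict(q_max)           # all nodes read neighbors "at once"
        for n in nodes:
            if n == dest:
                continue
            q, hop = max((weighted(snapshot[m], link_level(n, m)), m)
                         for m in neighbors(n))
            if q > q_max[n]:
                q_max[n], best[n] = q, hop
    return q_max, best

# Unblocked 3x3 mesh: every link is clear (level 0)
q_max, table = learn(dest=(2, 2), link_level=lambda a, b: 0)
```

With no blockage, the prize value decays by 1 per hop away from the destination, and after X unit times every node holds a valid next hop, which is the node-rotation behavior described above.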
Then, data transmission is performed using the prepared routing tables 22, that is, according to the optimal data transmission path (step S30), through the following sub-steps:
step S31, in the packet sending stage, the packet sender in the terminal PE packs the destination routing node information into the header flit of the data to be transmitted to form a first data packet; the first data packet enters the router 10 through the input port 13 and is passed to the crossbar matrix 14 (crossbar module) of the router 10, which forwards the destination routing node information in the header flit to the Q module 20;
step S32, the Q module 20 uses the destination routing node information carried by the incoming first data packet as the address into the routing table 22, finds the direction of the next hop of the first data packet (represented as a one-hot code), and transmits the direction back to the crossbar matrix 14;
step S33, the crossbar matrix 14 performs direction gating according to the one-hot code and transmits the first data packet to the output port 16 of the corresponding direction according to the gating result, completing the transmission of the first data packet through one router 10;
step S34, the packet transmission is continued (steps S31 to S33 are repeated) until the first data packet reaches the destination routing node.
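The forwarding loop of steps S31-S34 can be sketched as follows; the one-hot encoding, table contents, and link map below are illustrative assumptions, not values from the patent:

```python
# A sketch of the per-hop forwarding in steps S31-S34: the header flit
# carries the destination number, each router looks it up in its routing
# table to get a one-hot direction code, and the crossbar gates the packet
# to that output port. Table and link map are hand-built toy values.

E, W, S, N = 0b0001, 0b0010, 0b0100, 0b1000   # assumed one-hot encoding

def forward(src, dest, tables, links):
    """Repeat S31-S33 hop by hop until the packet reaches dest (S34)."""
    node, path = src, [src]
    while node != dest:
        one_hot = tables[node][dest]      # S32: routing-table lookup
        node = links[node][one_hot]       # S33: crossbar direction gating
        path.append(node)
    return path

# Toy 1x3 row R1-R2-R3: both R1 and R2 forward east to reach R3
tables = {"R1": {"R3": E}, "R2": {"R3": E}}
links = {"R1": {E: "R2"}, "R2": {E: "R3"}}
path = forward("R1", "R3", tables, links)   # ["R1", "R2", "R3"]
```

Because each hop only consults the local table, every router can run this step concurrently for different packets, which is what makes steps S31-S34 parallel.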
In the present invention, the transmission of each data packet is parallel, and steps S31 to S34 may be performed simultaneously. It should be noted that, during data transmission, the fault detection module 11 may update the congestion information in real time so that path planning is performed again. That is, while data packets are transmitted using the copied routing table 22, the Q module 20 updates the congestion status information and the path plan, stores the path information in the direction selection matrix, and after a certain time interval copies it into the routing table 22 for access by the header flit information. Path planning is very fast, so the dynamically updated paths meet real-time requirements.
The invention greatly accelerates learning by exploiting the parallelism of FPGA hardware: the routing algorithm turns the originally serial Q-learning process into a parallel one. While a conventional Q-learning algorithm has only one agent in the entire network at a time, the split parallel Q-learning routing algorithm splits the agent's long exploration from start point to end point into many small exploration behaviors that can run simultaneously, each routing node 100 on a long path becoming the node of one small exploration behavior. Specifically, in the original learning process the agent advances one step and transmits one prize value per time unit, so for an unbranched path with a total length of x nodes the agent needs x time units to finish one exploration. In the split routing algorithm of the present invention, in every time unit all nodes on the path transmit their prize values to an adjacent node, although only the nodes already storing the prize value information of the destination routing node are effective transmitters at that moment; after x time units, the prize value of the destination node is reflected in the routing table 22 of the starting node. In short, the conventional Q-learning algorithm resembles a long-distance run, while the split algorithm resembles a relay race.
For an unbranched path the two algorithms do not differ much, but when scaled up to the whole routing network, parallelization has the following advantages. Conventional Q-learning is difficult to parallelize in the exploration phase; one reason is that if two or more randomly exploring agents happen to reach the same node, a race hazard can occur when they store prize value data at the same time. The split algorithm permits a much higher degree of parallelism: besides being parallel across router nodes, it is also parallel across the directions within each router node, so the split algorithm is in theory m x n times more efficient in exploration than the conventional algorithm (m being the number of directions per node and n the total number of routers 10 in the network).
In general, reinforcement learning in applied settings employs multiple agents (each using its own random numbers) to explore the environment, thereby parallelizing the random exploration. Each agent acts randomly at the beginning; then, according to the feedback and the particular machine learning algorithm, the randomness of its actions gradually decreases, and after long iteration it slowly converges to an optimal path. Conventional routing algorithms that use random numbers to control the direction are unstable during exploration, since some directions remain unexplored even after long convergence; random direction selection cannot guarantee that every state of every node in the network is explored, i.e. the coverage is below 100%. In the present invention the random selection of directions is replaced by full selection of all directions; since every direction of every node in the split routing algorithm can work in parallel, the split routing algorithm guarantees 100% coverage. In the optimal-path problem, the low coverage of the conventional algorithm makes it worse at finding the optimal path, so the split algorithm gives better learning results (accuracy).
In addition, because the split algorithm of the invention does not use random numbers, the convergence time is also greatly shortened while coverage is guaranteed: random numbers can lead to repeatedly visited states, which lengthens convergence, whereas the split algorithm ensures that each situation is explored exactly once.
Furthermore, the network-on-chip fault-tolerant routing algorithm based on split parallel Q-learning is more flexible when planning paths under multi-level blockage. A conventional routing algorithm can only choose directions that approach the destination node, and this limited choice cannot effectively reduce the degree of blockage; the routing algorithm of the invention can, under certain blockage levels, choose directions leading away from the destination node (directions with no or lower blockage), thereby reducing the blockage degree of the whole network-on-chip.
Referring to fig. 1-10, the following description will be made with reference to a specific embodiment.
In this specific embodiment, the step of configuring the network-on-chip routing device is performed first (step S10). The network-on-chip routing device uses a 2D-mesh topology and may include, for example, 9 router nodes (hereinafter referred to as routers 10), each router 10 having, for example, 4 data transmission directions. As shown in fig. 3, in order to collect the information of each router 10 in the network, calculate the weights, and plan the paths, 9 Q intelligent learning modules 20 are added to the network. The Q intelligent learning modules 20 (hereinafter referred to as Q modules 20 or learning modules 20) are connected one-to-one with the routers 10 of the 2D-mesh network; they exchange congestion status information and packet path information with each other but do not transmit data packets. After each Q module 20 obtains the congestion status information, intelligent learning begins rapidly and in parallel, the optimal data transmission path to each destination routing node is planned, the path information is stored in the direction selection matrix of the Q module 20, and it is copied into 4 routing table 22 registers for the routing lookups of data packets arriving from different directions. It should be noted that, in an alternative example, a single routing table 22 register with 4 read ports and 1 write port may replace the 4 routing table 22 registers for the routing lookups of data packets arriving from different directions. In other embodiments, the network-on-chip routing device may also adopt other topologies, and the numbers of routers 10 and Q modules 20 may be adjusted according to actual needs.
In this embodiment, each router 10 includes at least a fault detection module 11, an input first-in first-out queue 12, an input port 13, a crossbar matrix 14 (crossbar), an output first-in first-out queue 15, and an output port 16, arranged in sequence; the functional details of each unit of the router 10 are described in the relevant sections above.
In this embodiment, the Q module 20 includes a Q-learning routing algorithm unit, an R matrix 21 (first matrix), a Q matrix (second matrix), a direction selection matrix, and a routing table 22; detailed descriptions of each unit of the Q module 20 are given in the relevant sections above and are not repeated here. Referring to fig. 4, in a 3x3 network-on-chip all nodes learn in parallel at the same time, and the Q module 20 of each node learns in its 4 directions simultaneously.
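As a rough structural sketch, the storage held by one Q module 20 might be modeled as below; the field names and layout are inferred from the description and are not an exact register map:

```python
# Structural sketch of one Q module's storage. Field names are inferred
# from the description (R/first matrix, Q/second matrix, direction-selection
# matrix, copied routing table); the exact hardware layout is an assumption.

from dataclasses import dataclass, field

N_DIRS = 4      # E, W, S, N in the 2D-mesh embodiment
N_NODES = 9     # 3x3 network

@dataclass
class QModule:
    # 3-bit congestion code for each adjacent direction
    r_matrix: list = field(default_factory=lambda: [0] * N_DIRS)
    # learned weighted prize value per (destination, direction)
    q_matrix: list = field(default_factory=lambda: [[0.0] * N_DIRS
                                                    for _ in range(N_NODES)])
    # destination -> (best one-hot direction, max prize value)
    dir_select: dict = field(default_factory=dict)
    # copy of dir_select used for packet lookups
    routing_table: dict = field(default_factory=dict)

    def commit(self):
        """Copy the direction-selection result into the routing table."""
        self.routing_table = {dst: d for dst, (d, _) in self.dir_select.items()}
```

The separation between `dir_select` and `routing_table` mirrors the text's scheme of learning into the direction selection matrix and periodically copying the result into the routing table used by in-flight packets.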
The working principle of the routing algorithm is as follows: in each unit learning time, every Q matrix simultaneously reads the maximum prize values of the adjacent Q matrices of the routing nodes 100 in the 4 directions; after several unit learning times, the prize value of the destination routing node has been propagated, by this chain rule, into the Q matrix of every routing node 100.
The network-on-chip blockage situation of this embodiment is shown in fig. 5: the R1-R4 link has primary blockage, the R3-R6 link secondary blockage, the R5-R6 link secondary blockage, and the R5-R8 link primary blockage. The flow of the routing algorithm is explained below using a packet request sent from routing node 100 No. 1 (R1) to routing node 100 No. 9 (R9, the destination routing node) as a case:
First, initialization is performed (data packets are not allowed into the router 10 network during the initialization stage) so that each routing node 100 prepares the routing tables 22 of all destination routing nodes; that is, the learning module 20 of each routing node 100 acquires the congestion status information of the learning module 20 of each adjacent routing node 100 and then performs parallel learning to acquire the optimal data transmission path to each destination routing node (step S20). This specifically includes the following sub-steps:
step S21, at the beginning of the initialization stage, the congestion status of the entire routing network is quantized into specific values and written, together with the destination routing node information, into the R matrix 21: the destination routing node information is quantized as 3'b000, clear-path information as 3'b001, primary blockage status information as 3'b010, secondary blockage status information as 3'b011, and so on for multi-level blockage status information; device edge information and device temporary deactivation information are quantized as 3'b111. The specific quantization indexes are: clear path: 0 data packets pass through the routing node 100 within a period of time; primary blockage: X1 data packets pass through the routing node 100 within a period of time; secondary blockage: X2 data packets pass through the routing node 100 within a period of time; and so on. The device temporary deactivation signal is what enables the fault-tolerant function of the router 10. It should be noted that no data packet is allowed into the router 10 network during this stage;
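The 3-bit quantization of step S21 can be expressed as a small lookup. The thresholds X1 and X2 are left symbolic in the text, so the numeric values below are purely illustrative:

```python
# The 3-bit congestion quantization of step S21 as a lookup. The X1/X2
# packet-count thresholds are symbolic in the patent; values here are assumed.

R_CODE = {
    "destination":    0b000,   # 3'b000
    "clear_path":     0b001,   # 0 packets in the sample window
    "block_level_1":  0b010,   # primary blockage (X1 packets)
    "block_level_2":  0b011,   # secondary blockage (X2 packets)
    "edge_or_faulty": 0b111,   # device edge / temporary deactivation
}

X1, X2 = 4, 8   # assumed thresholds for the packet-count windows

def quantize(packets_seen, is_dest=False, is_edge_or_faulty=False):
    """Map a node's observed traffic to its 3-bit R-matrix code."""
    if is_edge_or_faulty:
        return R_CODE["edge_or_faulty"]
    if is_dest:
        return R_CODE["destination"]
    if packets_seen == 0:
        return R_CODE["clear_path"]
    return (R_CODE["block_level_2"] if packets_seen >= X2
            else R_CODE["block_level_1"])
```

This also shows why the fault-tolerance hook is cheap: a temporarily deactivated device reuses the same 3'b111 code as a device edge, so the learning pass needs no special case for faults.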
step S22, in the initialization stage, the prize value of the destination routing node R9 is set to 1000, and the prize values of all other routing nodes 100 are initialized to 0; each Q module 20 reads the next-hop maximum prize value of the corresponding routing node 100 in each of the 4 adjacent directions. For the network-on-chip blockage situation shown in fig. 5, R6 and R8 obtain the prize value information of R9 in the first unit learning time, while R3, R5, and R7 obtain a prize value of 0 in the first unit learning time; R3, R5, and R7 then obtain the prize value information of R6 and R8 in the second unit learning time; and so on, until all routing nodes 100 finally obtain the prize value information of the destination routing node R9;
step S23, in the initialization stage, after each unit learning time, every routing node 100 applies formulas (1) and (2) to the maximum prize values of the Q matrices of its 4 adjacent routing nodes 100, obtaining 4 weighted prize values that are stored in the registers (reg) of the 4 directions of the local routing node 100;
The calculation formulas are as follows:

Q(cs, A) = γ · Q(ns, A)    (1)

Q(cs, max) = max_A Q(cs, A)    (2)
where Q(cs, A) represents the weighted prize value in direction A of the local routing node 100, Q(ns, A) represents the maximum prize value stored by the next-hop routing node 100 in direction A, cs denotes the local node, ns the next hop, and A the direction; γ is the blocking coefficient of the link in direction A, equal to γ1 (the primary blocking coefficient) for primary blockage and γ2 (the secondary blocking coefficient) for secondary blockage, with γ1, γ2 ∈ (0, 1) and γ1 > γ2; Q(cs, max) represents the maximum prize value stored by the local routing node 100. As an example: γ1 = 0.9, γ2 = 0.8;
Using formula (2), the weighted prize values of the 4 directions are compared and the largest is taken as the maximum prize value Q(cs, max) of the current node, to be read by the routing nodes 100 in the adjacent directions during their learning. The prize values of several routing nodes 100 are analyzed below as examples:
for R9, the prize value 1000 for the local node is the maximum prize value;
for R6, Q(6, max) = Q(6, 9) = 1000;
for R5, Q(5, 6) = γ2 · Q(6, 9) = γ2 · 1000; Q(5, 8) = γ1 · Q(8, 9) = γ1 · 1000; Q(5, 4) = Q(4, 7) − 1 = Q(7, 8) − 2 = Q(8, 9) − 3 = 997; Q(5, 2) is not considered; thus for R5, the maximum prize value is Q(5, max) = Q(5, 4) = 997;
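The R5 arithmetic above can be checked numerically with the example coefficients γ1 = 0.9 and γ2 = 0.8 given in the text:

```python
# Numerical check of the R5 worked example, using the example coefficients
# gamma1 = 0.9 (primary blockage) and gamma2 = 0.8 (secondary blockage).

g1, g2 = 0.9, 0.8
Q_6_max = 1000.0     # R6 already stores the destination prize value
Q_8_max = 1000.0     # so does R8

Q_5_6 = g2 * Q_6_max          # R5-R6 link is a secondary blockage
Q_5_8 = g1 * Q_8_max          # R5-R8 link is a primary blockage
Q_5_4 = Q_8_max - 3           # clear detour R5-R4-R7-R8: minus 1 per hop

Q_5_max = max(Q_5_6, Q_5_8, Q_5_4)   # formula (2): keep the largest
```

The clear but longer detour (997) beats both blocked shortcuts (900 and 800), which is exactly the "choose directions away from the destination under blockage" behavior claimed for the algorithm.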
step S24, steps S22-S23 are repeated x times until even the worst-case routing congestion status information, that is, the longest path planning information, has been transferred; x depends on the size of the routing network, and for a 3x3 network x = 3*3 − 1 = 8. At this point one node rotation is completed; for the destination routing node R9, the final prize values in the Q modules 20 and the direction information of the routing tables 22 are shown in fig. 6 and fig. 7;
step S25, each completed node rotation means that, for one destination routing node (here R9), the routing lookup tables of the routing network under the current conditions are ready. After the network conditions corresponding to all destination routing nodes have been learned (steps S22-S24 repeated once per destination routing node, i.e. 3*3 = 9 times for this network size), each routing node 100 holds the routing tables 22 of the 9 destination routing nodes, each table containing the destination routing node addresses and the corresponding optimal data transmission directions. These routing tables 22 can provide the optimal path information from any direction to any destination routing node, and the initialization stage ends.
Then, data transmission is performed using the prepared routing tables 22, that is, according to the optimal data transmission path (step S30). For the destination routing node R9, the final prize values in the Q modules 20 and the direction information of the routing tables 22 are shown in fig. 6 and fig. 7; from the prize value information the optimal path (also called the optimal data transmission path) R1-R2-R5-R4-R7-R8-R9 is obtained, and data transmission proceeds along this optimal transmission path through the following sub-steps:
step S31, as shown in fig. 8, in the packet sending stage the routing node 100 R1 receives a data packet to be transmitted from PE1; the packet sender in PE1 encodes the destination routing node information (R9) as 4'b1001 and packs it into the header flit to form a first data packet, which enters the router node through the PE1-R1 input port 13 and then the crossbar matrix 14 (crossbar module) of R1; the crossbar matrix 14 forwards the destination routing node information 4'b1001 in the header flit to the Q module 20. As shown in fig. 10, the header flit format of the first data packet is as follows: each flit comprises a data packet ID, a destination number, and data information;
step S32, the Q module 20 uses the destination information 4'b1001 carried by the first data packet as the address into the routing table 22, finds that the optimal next-hop direction of R1 is east (E, 4'b0001), and transmits this direction back to the crossbar matrix 14;
step S33, the crossbar matrix 14 gates the east direction according to the one-hot code 4'b0001 and transmits the data packet to the output port 16 in the east direction according to the gating result, completing the transmission of the data packet through router 10 R1;
step S34, the data packet transmission continues (steps S31-S33 are repeated); following the optimal path R1-R2-R5-R4-R7-R8-R9 obtained from the prize value information of the Q modules 20, the data packet is transmitted to R9.
It should be noted that, in this embodiment, data packet transmission is concurrent, and steps S31 to S34 may be performed simultaneously; that is, besides the data packet sent by PE1, the other PEs may also send data packets and transmit them over the network-on-chip at the same time. Also, during data transmission the fault detection module 11 may update the congestion information in real time so that path planning is performed again. That is, while data packets are transmitted using the copied routing table 22, the Q module 20 updates the congestion status information and the path plan, stores the path information in the direction selection matrix, and after a certain time interval copies it into the routing table 22 for access by the header flit information. Path planning is very fast, so the dynamically updated paths meet real-time requirements.
It should also be noted that, in this embodiment, referring to fig. 9, when the learning module 20 of a routing node 100 recognizes fault information, path planning is performed again. Specifically, after judging that a channel is damaged, the fault detection module 11 returns an error flag bit to the Q module 20 indicating that the line is broken; the Q module 20 writes this state into the R matrix 21 at the error position, where it is treated as fault state = device edge state = full blocking state, so that at the next learning pass the agent updates the prize value of that direction in the Q matrix with the full-blocking weight, and traffic finally bypasses the faulty region.
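The fault-handling path can be sketched as follows, with an assumed per-direction error-flag interface; 3'b111 is the full-blocking code described in step S21:

```python
# Sketch of the fault-handling path of fig. 9 (interface names assumed):
# the fault detector returns one error flag per direction, and the Q module
# writes the full-blocking code into the R matrix at the error position so
# that the next learning pass routes around the failed link.

FAULT_CODE = 0b111   # fault state = device edge state = full blocking state

def on_fault_flags(r_matrix, error_flags):
    """Mark every faulted direction as fully blocked in the R matrix."""
    for direction, failed in enumerate(error_flags):
        if failed:
            r_matrix[direction] = FAULT_CODE
    return r_matrix
```

Because the fault reuses the ordinary blocking machinery, no separate fault-avoidance logic is needed in the learner: the full-blocking weight alone makes the faulty direction lose every max comparison in formula (2).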
In the description herein, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, components, methods, components, materials, parts, and so forth. In other instances, well-known structures, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the invention.
Reference throughout this specification to "one embodiment," "an embodiment," or "a specific embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment, and not necessarily in all embodiments, of the invention. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," or "in a specific embodiment" in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any specific embodiment of the present invention may be combined in any suitable manner with one or more other embodiments. It will be appreciated that other variations and modifications of the embodiments of the invention described and illustrated herein are possible in light of the teachings herein and are to be considered as part of the spirit and scope of the invention.
It will also be appreciated that one or more of the elements shown in the figures may also be implemented in a more separated or integrated manner, or even removed because of inoperability in certain circumstances or provided because it may be useful depending on the particular application.
In addition, any labeled arrows in the drawings/figures should be considered only as exemplary, and not limiting, unless otherwise specifically indicated. Furthermore, the term "or" as used herein is generally intended to mean "and/or" unless specified otherwise. Combinations of components or steps will also be considered as being noted, where terminology is foreseen as rendering the ability to separate or combine unclear.
As used in the description herein and throughout the claims that follow, unless otherwise indicated, "a," "an," and "the" include plural references. Also, as used in the description herein and throughout the claims that follow, unless otherwise indicated, the meaning of "in" includes "in" and "on".
The above description of illustrated embodiments of the invention, including what is described in the abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed herein. Although specific embodiments of, and examples for, the invention are described herein for illustrative purposes only, various equivalent modifications are possible within the spirit and scope of the present invention, as those skilled in the relevant art will recognize and appreciate. As noted, these modifications can be made to the present invention in light of the foregoing description of illustrated embodiments of the present invention and are to be included within the spirit and scope of the present invention.
The systems and methods have been described herein in general terms as being helpful in understanding the details of the present invention. Furthermore, various specific details have been set forth in order to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, and/or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the invention.
Thus, although the invention has been described herein with reference to particular embodiments thereof, a latitude of modification, various changes and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of the invention will be employed without a corresponding use of other features without departing from the scope and spirit of the invention as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit of the present invention. It is intended that the invention not be limited to the particular terms used in following claims and/or to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include any and all embodiments and equivalents falling within the scope of the appended claims. Accordingly, the scope of the invention should be determined only by the following claims.

Claims (8)

1. A method for controlling a network-on-chip routing device, the method comprising:
configuring a learning module on each router of the network-on-chip routing device, wherein each router and the learning module form a routing node;
the learning module of each routing node respectively acquires the blocking state information of the learning module of each adjacent routing node and then carries out parallel learning so as to acquire the optimal data transmission path of each destination routing node;
performing data transmission according to the optimal data transmission path;
the step of performing parallel learning after the learning module of each routing node obtains the blocking state information of the learning module of each adjacent routing node so as to obtain the optimal data transmission path of each destination routing node includes:
the learning module of each routing node simultaneously acquires the maximum rewards stored in the learning module of each adjacent routing node;
after unit learning time passes in each routing node, calculating the maximum rewarding value stored in the learning module of each adjacent routing node according to a preset formula according to the blocking state information of the learning module of each adjacent routing node so as to obtain a plurality of weighted rewarding values;
Taking the maximum value of the weighted reward values as the maximum reward value of the local routing node;
repeating the three steps until the longest path learning is completed, so as to obtain a routing table of the destination routing node;
repeating the four steps until the routing table of each destination routing node is prepared;
when each routing node has four data transmission directions, the preset formula comprises
Q(cs, A) = γ · Q(ns, A)    (1)

Q(cs, max) = max_A Q(cs, A)    (2)
wherein Q(cs, A) represents the weighted prize value in direction A of a local routing node, Q(ns, A) represents the maximum prize value stored by the next-hop routing node in direction A, cs denotes the local node, ns the next hop, and A the direction; γ is the blocking coefficient of direction A, equal to γ1, the primary blocking coefficient, for primary blockage and γ2, the secondary blocking coefficient, for secondary blockage; γ1 and γ2 are between 0 and 1, with γ1 > γ2; and Q(cs, max) represents the maximum prize value stored by the local routing node.
2. The method according to claim 1, wherein, before the step in which the learning module of each routing node acquires the congestion status information of the learning module of each adjacent routing node and then performs parallel learning, the congestion status information of each routing node is written into the learning module of that routing node.
3. The control method of a network-on-chip routing device according to claim 1, characterized in that the control method further comprises the step of:
when the fault detection module of the routing node detects a fault, the fault information is sent and written into the learning module of the routing node, and path planning is performed again after the learning module of the routing node recognizes the fault information.
4. The method of claim 1, wherein the congestion status information includes destination routing node information, path information, multi-level congestion status information, device edge information, and temporary damage information.
5. The method of controlling a network-on-chip routing device according to claim 1, wherein the step of transmitting data according to the optimal data transmission path comprises:
packaging the destination routing node information into a header microchip of a data packet to be transmitted so as to form a first data packet;
and inputting the first data packet into a selected routing node, and carrying out data transmission by the routing node according to an optimal data transmission path corresponding to a destination routing node in a first microchip of the first data packet.
6. A network-on-chip routing device, comprising:
a plurality of routing nodes connected according to a preset network topology structure, wherein each routing node comprises a router and a learning module which are connected with each other;
the learning module of each routing node is used for respectively acquiring the blocking state information of the learning module of each adjacent routing node and then carrying out parallel learning so as to acquire the optimal data transmission path of each destination routing node; the router of each routing node is used for carrying out data transmission according to the optimal data transmission path;
the step of performing parallel learning after the learning module of each routing node obtains the blocking state information of the learning module of each adjacent routing node so as to obtain the optimal data transmission path of each destination routing node includes:
the learning module of each routing node simultaneously acquires the maximum rewards stored in the learning module of each adjacent routing node;
after unit learning time passes in each routing node, calculating the maximum rewarding value stored in the learning module of each adjacent routing node according to a preset formula according to the blocking state information of the learning module of each adjacent routing node so as to obtain a plurality of weighted rewarding values;
Taking the maximum value of the weighted reward values as the maximum reward value of the local routing node;
repeating the three steps until the longest path learning is completed, so as to obtain a routing table of the destination routing node;
repeating the four steps until the routing table of each destination routing node is prepared;
when each routing node has four data transmission directions, the preset formula comprises
Q(cs, A) = γ · Q(ns, A)    (1)

Q(cs, max) = max_A Q(cs, A)    (2)
wherein Q(cs, A) represents the weighted prize value in direction A of a local routing node, Q(ns, A) represents the maximum prize value stored by the next-hop routing node in direction A, cs denotes the local node, ns the next hop, and A the direction; γ is the blocking coefficient of direction A, equal to γ1, the primary blocking coefficient, for primary blockage and γ2, the secondary blocking coefficient, for secondary blockage; γ1 and γ2 are between 0 and 1, with γ1 > γ2; and Q(cs, max) represents the maximum prize value stored by the local routing node.
7. The network-on-chip routing device of claim 6, wherein the router comprises a failure detection module, an input port, a crossbar matrix, and an output port, the learning module being coupled to the failure detection module and the crossbar matrix, respectively.
8. The network-on-chip routing device of claim 6, wherein the learning module comprises:
a routing algorithm unit for storing a computer program that implements the functions of the learning module;
a first matrix for storing the blocking state information of each adjacent direction of the routing node;
a second matrix for storing the weighted reward value of each data transmission direction after the corresponding routing node has learned;
a direction selection matrix for storing the data transmission direction and destination routing node information corresponding to the maximum among the weighted reward values of all data transmission directions in the second matrix;
and a routing table for storing a copy of the direction selection matrix.
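The storage elements recited in claim 8 can be pictured as a small container type. This is a sketch only: the field names and Python types are assumptions for illustration, not the patent's hardware layout, and the "routing algorithm unit" (a stored program) is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Node = Tuple[int, int]  # a routing node identified by its mesh coordinates

@dataclass
class LearningModule:
    """Illustrative container mirroring the claimed storage elements."""
    blocked: Dict[str, bool] = field(default_factory=dict)        # first matrix: blocking state per adjacent direction
    weighted: Dict[str, float] = field(default_factory=dict)      # second matrix: weighted reward per direction
    selection: Dict[Node, str] = field(default_factory=dict)      # direction-selection matrix: best direction per destination
    routing_table: Dict[Node, str] = field(default_factory=dict)  # routing table: a copy of the selection matrix

    def commit(self) -> None:
        """Copy the direction-selection matrix into the routing table once
        learning for all destinations has finished, so the router can keep
        forwarding from a stable table while the next learning round runs."""
        self.routing_table = dict(self.selection)
```

Keeping the routing table as a *copy* of the selection matrix, as the claim specifies, lets the router forward packets from a consistent snapshot while the learning module overwrites the selection matrix in the background.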
CN202010320744.7A 2020-04-22 2020-04-22 Network-on-chip routing device and control method thereof Active CN111522775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010320744.7A CN111522775B (en) 2020-04-22 2020-04-22 Network-on-chip routing device and control method thereof

Publications (2)

Publication Number Publication Date
CN111522775A CN111522775A (en) 2020-08-11
CN111522775B true CN111522775B (en) 2023-05-16

Family

ID=71904446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010320744.7A Active CN111522775B (en) 2020-04-22 2020-04-22 Network-on-chip routing device and control method thereof

Country Status (1)

Country Link
CN (1) CN111522775B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697249B (en) * 2020-12-31 2023-07-04 Oppo广东移动通信有限公司 Chip, control method thereof, computer-readable storage medium, and electronic device
CN113079093B (en) * 2021-04-12 2022-03-15 合肥工业大学 Routing method based on hierarchical Q-routing planning
CN115022231B (en) * 2022-06-30 2023-11-03 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN117793010A (en) * 2022-09-20 2024-03-29 华为技术有限公司 Flow control method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103973482A (en) * 2014-04-22 2014-08-06 南京航空航天大学 Fault-tolerant on-chip network system with global communication service management capability and method
CN104579951A (en) * 2014-12-29 2015-04-29 合肥工业大学 Fault-tolerance method in on-chip network under novel fault and congestion model
CN107395503A (en) * 2017-08-25 2017-11-24 东南大学 A kind of network-on-chip method for routing based on linear programming
WO2019154483A1 (en) * 2018-02-07 2019-08-15 Hochschule Anhalt Method of adaptive route selection in a node of a wireless mesh communication network corresponding apparatus for performing the method of adaptive route selection and corresponding computer program

Non-Patent Citations (2)

Title
A Q-learning-based routing method for wireless sensor networks; Li Ying et al.; Computing Technology and Automation (计算技术与自动化); 2017-06-15 (No. 02); full text *
A fault-tolerant routing algorithm for network-on-chip; Zhao Wei et al.; Journal of Changchun University of Science and Technology (Natural Science Edition) (长春理工大学学报(自然科学版)); 2015-12-15 (No. 06); full text *

Also Published As

Publication number Publication date
CN111522775A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN111522775B (en) Network-on-chip routing device and control method thereof
US10496770B2 (en) System level simulation in Network on Chip architecture
US9825844B2 (en) Network topology of hierarchical ring with recursive shortcuts
CN102780637B (en) Routing method for data transmission in space delay/disruption tolerant network
US10523599B2 (en) Buffer sizing of a NoC through machine learning
Grecu et al. Essential fault-tolerance metrics for NoC infrastructures
CN116915708A (en) Method for routing data packets, processor and readable storage medium
US9529774B2 (en) Network topology of hierarchical ring with gray coding shortcuts
CN109391547A (en) Network topology system and topology establishing method thereof
CN103856402A (en) Data center network structure and routing method thereof
KR102391802B1 (en) Topology Synthesis Method of Using Genetic Algorithm
Punhani et al. Routing for Center Concentrated Mesh.
Adamu et al. Review of deterministic routing algorithm for network-on-chip
US10084725B2 (en) Extracting features from a NoC for machine learning construction
Momeni et al. Improved-XY: A High Performance Wormhole-Switched Routing Algorithm for Irregular 2-D Mesh NoC
Gabis et al. Heuristic based routing algorithm for network on chip
Omari Adaptive Algorithms for Wormhole-Routed Single-Port Mesh-Hypercube Network
Madhubala et al. A competent performance analysis of king mesh topology
Patooghy et al. Analytical performance modelling of partially adaptive routing in wormhole hypercubes
KR102695964B1 (en) Routing prediction method for adaptive algorithm and lookahead scheme
CN115208821B (en) Cross-network route forwarding method and device based on BP neural network
Menon et al. Adaptive look ahead algorithm for 2-D mesh NoC
Tang et al. An advanced nop selection strategy for odd-even routing algorithm in network-on-chip
Momeni et al. A low latency routing algorithm for irregular mesh network-on-chip
CN117255053A (en) Self-adaptive routing method and system based on ant colony algorithm in data center network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant