CN113079093B

CN113079093B - Routing method based on hierarchical Q-routing planning

Info

Publication number: CN113079093B
Application number: CN202110389260.2A
Authority: CN
Inventors: 李桢旻; 翁晓峰; 王镜涵; 李天瑜; 马宇晴; 杜高明; 宋宇鲲
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2021-04-12
Filing date: 2021-04-12
Publication date: 2022-03-15
Anticipated expiration: 2041-04-12
Also published as: CN113079093A

Abstract

The invention discloses a routing method based on hierarchical Q-routing planning, which obtains an efficient data transmission link by sensing the congestion condition of the network and the usage of the interconnection links and by performing global hierarchical parallel planning. The algorithm of the invention is a lookup-table-based routing algorithm: it stores the planned direction in the routing table in the learning module of each router node, and a data packet obtains path information by accessing the routing table in the learning module of the router node. The invention builds a hierarchical design on top of split Q-routing and greatly reduces the convergence time of the algorithm by using multilayer congestion sensors and multilayer parallel learning, thereby improving network-on-chip data transmission efficiency, compressing the routing table and reducing hardware resource consumption.

Description

Routing method based on hierarchical Q-routing planning

Technical Field

The invention belongs to the technical field of communication of integrated circuit network-on-chip, and particularly relates to a network-on-chip routing method based on hierarchical Q-routing planning.

Background

As Moore's law gradually breaks down, progress in semiconductor process technology is slowing, and the operating frequency of single-core processors has hit a bottleneck and is difficult to raise quickly. A System on Chip (SoC) built on a traditional bus structure suffers from poor scalability, low parallelism and similar drawbacks, so a new approach beyond the traditional bus, namely Network on Chip (NoC) communication, is needed to improve the operating frequency of the whole chip. The NoC scales well, can process data from multiple IP cores in a chip in parallel, and effectively addresses problems of power consumption, performance, area and the like.

A NoC involves topology, routing algorithms, switching techniques and other aspects; this patent studies the routing algorithm. The routing algorithm provides the transmission direction for data packets in the NoC and is a crucial part of it. A good routing algorithm improves transmission efficiency and increases throughput through fast and reasonable path planning.

Split Q-routing is a reinforcement-learning-based network-on-chip routing algorithm that seeks the shortest routing path between the source router node and the target router node. It can effectively mitigate the data delay, increased power consumption, router temperature rise and similar problems caused by the NoC transmitting large amounts of data. However, as the scale of the network on chip keeps growing, network congestion becomes increasingly severe, and the path planning time of split Q-routing becomes too long, so that the planning loses timeliness and can hardly meet requirements.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a routing method based on hierarchical Q-routing planning, which aims to overcome the shortcomings of traditional Q-routing, further improve the transmission performance of the NoC, reduce power consumption and increase throughput; at the same time, the hardware circuit area can be reduced by compressing the routing table.

In order to achieve the purpose, the invention adopts the following technical scheme:

the invention relates to a routing method based on hierarchical Q-routing planning, which is applied to a network on chip consisting of w router nodes, w resource nodes and a plurality of interconnection channels, wherein each router node comprises input ports, output ports, a congestion sensor, a multi-way gate and an access routing table; the method is characterized in that a learning module is arranged in the router node; the learning module includes: a learning mode arbitrator, a hierarchical control module, a routing table selection module, 3 sub-learning modules and 3 routing tables; the routing method comprises the following steps:

step 1: dividing all router nodes into three-layer network structures according to the following rules, thereby forming a pyramid structure; the rule is as follows:

in the layer 1 network structure, the $w$ router nodes are divided into groups of $x^2$ nodes, thereby forming a layer 1 network structure consisting of $w/x^2$ virtual router groups;

in the layer 2 network structure, the $w/x^2$ virtual router groups are divided into groups of $y^2$, thereby forming a layer 2 network structure consisting of $w/(x^2 y^2)$ virtual router groups;

in the layer 3 network structure, the $w/(x^2 y^2)$ virtual router groups are divided into one group of $z^2$, thereby forming a layer 3 network structure consisting of 1 virtual router group;
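The grouping rule of step 1 can be illustrated with a short Python sketch. The function name `layer_group_counts` is hypothetical and not part of the patent; the check that w equals x² · y² · z² is an assumption that follows from the top layer holding exactly one virtual router group.

```python
def layer_group_counts(w, x, y, z):
    """Return the number of virtual router groups in layers 1, 2 and 3."""
    layer1, rem1 = divmod(w, x * x)        # layer 1: groups of x^2 router nodes
    layer2, rem2 = divmod(layer1, y * y)   # layer 2: groups of y^2 layer-1 groups
    layer3, rem3 = divmod(layer2, z * z)   # layer 3: groups of z^2 layer-2 groups
    if rem1 or rem2 or rem3 or layer3 != 1:
        # Assumption: a three-layer pyramid needs w == x^2 * y^2 * z^2.
        raise ValueError("w must equal x^2 * y^2 * z^2 for a three-layer pyramid")
    return layer1, layer2, layer3


# 64 router nodes grouped four at a time on every layer.
counts = layer_group_counts(64, 2, 2, 2)
```

For 64 router nodes with x = y = z = 2 this gives 16, 4 and 1 virtual router groups for layers 1, 2 and 3.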

the 3 sub-learning modules and the 3 routing tables correspond to the network structures of the respective layers; each sub-learning module includes an R matrix and a Q-value comparator;

let $L_1^{i,h}$ denote the h-th sub-learning module corresponding to the i-th router node in the layer 1 network structure, where $i < x^2$ and $h = 1, 2, 3$;

let $L_j^i$ denote the i-th virtual router group in the layer j network structure, where $j \neq 1$;

let $L_3^i : L_2^i$ denote the i-th virtual router group in the layer 2 network structure that lies within the i-th virtual router group in the layer 3 network structure;

let $L_3^i : L_2^i : L_1^{i,h}$ denote the h-th sub-learning module corresponding to the i-th router node in the layer 1 network structure, within the i-th virtual router group in the layer 2 network structure, within the i-th virtual router group in the layer 3 network structure;
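To make the hierarchical notation concrete, the following Python sketch maps a router position in a square mesh to its three indices (layer 3, layer 2, layer 1). The square-block layout, the row-major numbering inside each block and the name `hierarchical_address` are all assumptions made for illustration; the text above does not fix a particular numbering.

```python
def hierarchical_address(row, col, x=2, y=2, z=2):
    """Return (i3, i2, i1) for the router at (row, col) in a square mesh.

    Assumption: groups are square blocks and indices run row-major in a block.
    """
    i1 = (row % x) * x + (col % x)   # router node inside its layer-1 group
    r1, c1 = row // x, col // x      # coordinates of that layer-1 group
    i2 = (r1 % y) * y + (c1 % y)     # layer-1 group inside its layer-2 group
    r2, c2 = r1 // y, c1 // y        # coordinates of that layer-2 group
    i3 = (r2 % z) * z + (c2 % z)     # layer-2 group inside the layer-3 group
    return i3, i2, i1


# Every router of an 8 x 8 mesh gets a distinct three-part address.
addresses = {hierarchical_address(r, c) for r in range(8) for c in range(8)}
```

Under these assumptions, each of the three indices stays below 4 for the 8 × 8 case, so one routing table per layer needs only four entries.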

step 2: each router node senses the congestion degree and learns in parallel in each layer of network structure;

step 2.1: take the router node whose 1st sub-learning module is denoted by $L_3^i : L_2^i : L_1^{i,1}$ as the current router node; for the current router node in the layer 1 network structure, its congestion sensor counts the occupancy of its input ports and the traffic on its output ports, thereby obtaining the layer 1 congestion level on each path, which is stored in the R matrix of the 1st sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,1}$ in the layer 1 network structure;

in the layer 2 network structure, the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ performs average pooling on the congestion levels of all nodes in the i-th virtual router group in the layer 2 network structure to obtain the pooled congestion level, and stores it in the R matrix of the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ in the layer 2 network structure;

in the layer 3 network structure, the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$ performs average pooling on the congestion levels of all nodes in the i-th virtual router group in the layer 3 network structure to obtain the pooled congestion level, and stores it in the R matrix of the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$ in the topmost-layer network structure;
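The average pooling used for layers 2 and 3 can be sketched as follows. This is a minimal illustration, assuming congestion levels are plain numbers; how the pooled value is rounded back to a discrete level is not stated in the text, so the sketch returns the unrounded mean. The name `average_pool` is hypothetical.

```python
def average_pool(levels):
    """Return the mean congestion level of all nodes in one virtual router group."""
    levels = list(levels)
    if not levels:
        raise ValueError("a virtual router group must contain at least one node")
    # Assumption: no rounding is applied; the text does not specify quantisation.
    return sum(levels) / len(levels)


# Four nodes of one group with levels 0 (clear), 1, 2 and 1.
pooled = average_pool([0, 1, 2, 1])
```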

step 2.2: initializing i to 0;

step 2.3: initialize the reward values in the sub-learning modules denoted by $L_3^i : L_2^i : L_1^{i,h}$, $h = 1, 2, 3$, to the target reward value; initialize the reward values in the sub-learning modules of the other router nodes to 0;

step 2.4: in the layer 1 network structure, the 1st sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,1}$ reads the maximum reward values of the sub-learning modules on the adjacent router nodes, weights the read maximum reward values according to its R matrix to obtain weighted reward values, and selects the maximum weighted reward value to transmit to all adjacent router nodes;

in the layer 2 network structure, the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{0,2}$ reads the maximum reward value of the 2nd sub-learning module at the same position in each adjacent virtual router group and takes it as the maximum reward value of all nodes in the i-th virtual router group denoted by $L_3^i : L_2^i$; it weights the read maximum reward value according to its R matrix to obtain a weighted reward value, and selects the maximum weighted reward value to transmit to the 2nd sub-learning module at the same position in each adjacent virtual router group;

in the layer 3 network structure, the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{0,3}$ reads the maximum reward value of the 3rd sub-learning module at the same position in each adjacent virtual router group and takes it as the maximum reward value of all nodes in the i-th virtual router group $L_3^i$ in the layer 3 network structure; it weights the read maximum reward value according to its R matrix to obtain a weighted reward value, and selects the maximum weighted reward value to transmit to the 3rd sub-learning module at the same position in each adjacent virtual router group;
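One update of a single sub-learning module in step 2.4 can be sketched in Python as below. The dictionaries, the direction labels and the function name `best_next_hop` are hypothetical; clamping the weighted value at zero is an assumption, since the text does not say how values below zero are handled.

```python
def best_next_hop(neighbor_rewards, penalties):
    """Pick the direction with the largest weighted reward.

    neighbor_rewards: {direction: maximum reward value read from that neighbour}
    penalties: {direction: amount to subtract, or None if unreachable}
    """
    weighted = {}
    for direction, reward in neighbor_rewards.items():
        penalty = penalties[direction]
        if penalty is None:
            weighted[direction] = 0                      # unreachable: reward forced to zero
        else:
            weighted[direction] = max(reward - penalty, 0)  # assumption: clamp at zero
    best = max(weighted, key=weighted.get)
    return best, weighted[best]


direction, value = best_next_hop(
    {"N": 39, "E": 30, "S": 0, "W": 38},
    {"N": 4, "E": 1, "S": None, "W": 1},
)
```

Here the module would record west as its next hop and pass the value 37 on to its neighbours; the penalty table stands in for the R matrix.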

step 2.5: after the path information of the next hop of each sub-learning module of each router node is obtained, the path information of the next hop of each sub-learning module of each router node is transmitted to the hierarchical control module in parallel;

step 3: correcting the path information in the hierarchical control module, namely: correcting the path information from the higher-layer network structure to the lower-layer network structure, and then performing reverse transmission of the corrected path information from the lower-layer network structure to the higher-layer network structure, so that each sub-learning module of the router node obtains the corrected path information and stores it in the routing table of the corresponding layer network structure;

step 4: assigning i + 1 to i and returning to step 2.3 until i = max - 1, thereby completing the path planning with each router node as the destination node, where max denotes the maximum number of router nodes in a virtual router group in each layer of the network structure;

step 5: transmission of network-on-chip data packets:

the data packet in the network-on-chip accesses the routing table in each layer network structure, sequentially passes through the input port and the multi-way gate, and the multi-way gate performs access operation on the routing table selection module in the learning module;

the routing table selection module reads the position information of the destination router node in the data packet, accesses the routing table according to the access rule and takes out the path information stored in the routing table:

and according to the taken-out path information in the routing table, if the taken-out path information is return information, the destination router node transmits the data packet to a packet receiver of the destination router node, otherwise, the data packet is transmitted to a corresponding output port according to the taken-out path information, so that the transmission of the data packet is completed.

The routing algorithm based on the hierarchical Q-routing plan is also characterized in that the step 3 is carried out according to the following steps:

step 3.1, the learning mode arbitrator determines the source of the path information sent to the hierarchical control module; if the source is the 1st, 2nd or 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{0,h}$, $h = 1, 2, 3$, the path information comes from the sub-learning module located at the same position in an adjacent virtual router group; otherwise, the path information comes from the h-th sub-learning module corresponding to the 0th router node in the same virtual router group;

step 3.2, correcting the path information from the layer 3 network structure to the layer 1 network structure in sequence according to the correction rule;

step 3.3, judging whether the reverse transmission rule is met, if so, sequentially performing reverse transmission from the 1 st network structure to the 3 rd network structure according to the reverse transmission rule, and otherwise, sequentially performing reverse transmission from the 1 st network structure to the 3 rd network structure according to a strategy 1 or a strategy 2; wherein, the strategy 1 is to directly obtain the uncorrected path information of the current layer; strategy 2 is to obtain the uncorrected path information of the lower layer;

step 3.4, sending the corrected and reverse-transmitted path information to the routing table of the corresponding layer network structure.

The correction rule is as follows:

if the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ is located at a position adjacent to a neighbouring virtual router group in the layer 3 network structure, and the path information in the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$ needs to be transmitted across to the adjacent virtual router group, then the path information of the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ is changed to the path information of the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$, and a correction signal is generated; otherwise, the path information of the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ is retained;

if the 1st sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,1}$ is located at a position adjacent to a neighbouring virtual router group in the layer 2 network structure, and the path information in the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ needs to be transmitted across to the adjacent virtual router group, then the path information of the 1st sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,1}$ is changed to the path information of the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$, and a correction signal is generated; otherwise, the path information of the 1st sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,1}$ is retained;

the reverse transmission rule is as follows:

if the path information of the 1st or 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,h}$, $h = 1, 2$, has been corrected, that path information is reverse-transmitted to the 2nd or 3rd sub-learning module, respectively, denoted by $L_3^i : L_2^i : L_1^{i,(h+1)}$, $h = 1, 2$; otherwise, no reverse transmission is performed.

The target reward value is $(p - 1) \times q$, where $p$ denotes the number of router nodes in a virtual router group in the corresponding layer network structure, and $q$ denotes the weighting value corresponding to the maximum congestion level.
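As a worked instance of this formula, the short Python sketch below evaluates it for a group of 4 router nodes and a maximum-congestion weight of 13, the figures used in the embodiment. The function name `target_reward` is hypothetical.

```python
def target_reward(p, q):
    """Target reward value (p - 1) * q for a group of p nodes and weight q."""
    if p < 1:
        raise ValueError("a virtual router group needs at least one node")
    return (p - 1) * q


# p = 4 nodes per group, q = 13 as the weight of the maximum congestion level.
reward = target_reward(4, 13)
```

With these inputs the target reward value is 39.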

The rule of the weighting processing is as follows:

if the next hop is an unobstructed path, y is subtracted from the maximum reward value;

if the next hop is a primary blockage, 3 × y + 1 is subtracted from the maximum reward value;

if the next hop is a secondary blockage, 3 × (3 × y + 1) + 1 is subtracted from the maximum reward value;

if the next hop is a device edge or is temporarily deactivated, the maximum reward value is set to zero; wherein y represents a positive integer.
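The weighting rule can be written out as a small Python sketch. The enum `Hop` and the function `weighted_reward` are hypothetical names introduced for illustration; the penalties follow the expressions y, 3 × y + 1 and 3 × (3 × y + 1) + 1 stated above.

```python
from enum import Enum


class Hop(Enum):
    CLEAR = "unobstructed path"
    PRIMARY = "primary blockage"
    SECONDARY = "secondary blockage"
    UNREACHABLE = "device edge or temporarily deactivated"


def weighted_reward(max_reward, hop, y=1):
    """Apply the weighting rule to a maximum reward value read from a neighbour."""
    if y < 1:
        raise ValueError("y must be a positive integer")
    if hop is Hop.UNREACHABLE:
        return 0                                   # reward is set to zero
    penalty = {
        Hop.CLEAR: y,
        Hop.PRIMARY: 3 * y + 1,
        Hop.SECONDARY: 3 * (3 * y + 1) + 1,
    }[hop]
    return max_reward - penalty


# With y = 1 the three penalties are 1, 4 and 13.
values = [weighted_reward(39, hop) for hop in Hop]
```

For y = 1 and a maximum reward value of 39, the four cases give 38, 35, 26 and 0.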

The access rule is as follows:

step a, initializing i to 3;

step b, starting from the layer-i network structure, comparing the position information of the destination router node of the data packet; if the destination router node of the data packet lies within the group corresponding to the layer-i network structure, assigning i - 1 to i and returning to step b until i equals 1; otherwise, the data packet accesses the routing table of the layer-i network structure.
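One way to read the access rule is sketched below in Python. It assumes that every router and every destination carries a three-part address (layer 3 index, layer 2 index, layer 1 index), and that "within the group" means the corresponding address components match; both the address layout and the name `select_routing_table` are assumptions, not taken from the text.

```python
def select_routing_table(current, destination):
    """Return (layer, entry) of the routing table a packet should access.

    current, destination: (i3, i2, i1) addresses (assumed layout).
    """
    # Walk from layer 3 down to layer 2; stop at the first layer that differs.
    for layer, position in ((3, 0), (2, 1)):
        if current[position] != destination[position]:
            return layer, destination[position]
    # Same layer-3 and layer-2 position: i has reached 1, use the layer-1 table.
    return 1, destination[2]


far = select_routing_table((0, 1, 2), (3, 0, 0))    # different layer-3 position
near = select_routing_table((0, 1, 2), (0, 1, 3))   # same layer-1 group
```

Because the comparison starts at the top layer, a packet for a distant destination consults only the coarse layer 3 table until it enters the destination's region.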

Compared with the prior art, the invention has the beneficial effects that:

1. By reusing the learning module, the network-on-chip routing method based on hierarchical Q-routing planning of the invention can build NoC systems of various scales; each router node performs hierarchical parallel learning, which greatly reduces the path planning time and thus adapts better to real-time changes in congestion in the NoC network environment.

2. The invention partitions and compresses the routing table and merges the several routing tables into one complete routing table through mapping, which greatly reduces the resources occupied by the routing table; the reduction grows further as the NoC scales up. Taking an 8 × 8 routing network as an example, 8 × 8 × 4 = 256 bits (4 bits represent the four directions) are needed before layering is introduced, whereas three-layer Q-routing needs only 3 × 4 × 4 = 48 bits (4 nodes in each group), a reduction in area resources of roughly 80% (81.25% by bit count).
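The routing table arithmetic in this example can be checked with a few lines of Python. The function names are hypothetical; the figures (64 nodes, 3 layers, 4 entries per layer, 4 bits per entry) are those of the 8 × 8 example.

```python
def flat_table_bits(nodes, bits_per_entry=4):
    """Bits for one routing table entry per destination node, without layering."""
    return nodes * bits_per_entry


def layered_table_bits(layers, entries_per_layer, bits_per_entry=4):
    """Bits for one small routing table per layer."""
    return layers * entries_per_layer * bits_per_entry


flat = flat_table_bits(8 * 8)          # 8 x 8 x 4 bits
layered = layered_table_bits(3, 4)     # 3 x 4 x 4 bits
saving = 1 - layered / flat            # fraction of table bits saved
```

The flat table needs 256 bits and the layered tables 48 bits, a saving of 81.25%, consistent with the roughly 80% stated above.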

3. The invention reduces the number of learning layers for some nodes, so that only a small share of the nodes perform multi-layer learning while most nodes do not; this reduces the circuit area consumed by Q-routing and also eases the placement and routing of the whole system circuit.

Drawings

FIG. 1 is a diagram of a router node structure according to the present invention;

FIG. 2 is a schematic diagram of a learning module configuration according to the present invention;

FIG. 3 is a hierarchical diagram of a router node according to the present invention;

FIG. 4 is a block diagram of a hierarchical Q-routing implementation of the present invention;

FIG. 5 is an exemplary diagram of a correction rule and a reverse transmission rule according to the present invention;

FIG. 6 is a flow chart of a routing table read in accordance with the present invention;

FIG. 7 is a flow chart of an example NoC system of the present invention.

Detailed Description

In this embodiment, the routing method based on hierarchical Q-routing planning is applied to a network on chip comprising 64 router nodes, 64 resource nodes and a plurality of interconnection channels, configured with reference to the learning module configuration shown in FIG. 1 and FIG. 2, where each router node includes an input port, an output port, a congestion sensor, a multi-way gate and an access routing table; a learning module is arranged in the router node; referring to FIG. 4, the learning module includes: a learning mode arbitrator, a hierarchical control module, a routing table selection module, 3 sub-learning modules and 3 routing tables. The number of network layers, sub-learning modules and routing tables can be increased to suit networks with more router nodes; referring to FIG. 7, the routing method proceeds as follows:

step 1: referring to fig. 3, all router nodes are divided into three-layer network structures according to the following rules, so as to form a pyramid structure; the purpose of such layering is to enable parallel learning and obtain a transmission path with short time consumption and high transmission rate. The rule is:

in the layer 1 network structure, the 64 router nodes are divided into groups of 4 nodes, thereby forming a layer 1 network structure consisting of 16 virtual router groups;

in the layer 2 network structure, the 16 virtual router groups are divided into groups of 4, thereby forming a layer 2 network structure consisting of 4 virtual router groups;

in the layer 3 network structure, the 4 virtual router groups are divided into one group of $z^2$, thereby forming a layer 3 network structure consisting of 1 virtual router group;

the 3 sub-learning modules and the 3 routing tables correspond to the network structures of the respective layers; each sub-learning module includes an R matrix and a Q-value comparator;

let $L_1^{i,h}$ denote the h-th sub-learning module corresponding to the i-th router node in the layer 1 network structure, where $i < x^2$ and $h = 1, 2, 3$;

let $L_3^i : L_2^i : L_1^{i,h}$ denote the h-th sub-learning module corresponding to the i-th router node in the layer 1 network structure, within the i-th virtual router group in the layer 2 network structure, within the i-th virtual router group in the layer 3 network structure;

step 2.1: take the router node whose 1st sub-learning module is denoted by $L_3^i : L_2^i : L_1^{i,1}$ as the current router node; for the current router node in the layer 1 network structure, its congestion sensor counts the occupancy of its input ports and the traffic on its output ports, thereby obtaining the layer 1 congestion level on each path, which is stored in the R matrix of the 1st sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,1}$ in the layer 1 network structure; the congestion level R value 3'b000 represents a clear channel, 3'b001 a primary blockage, 3'b010 a secondary blockage, and 3'b111 an unreachable channel (fully blocked or temporarily deactivated).

in the layer 3 network structure, the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$ performs average pooling on the congestion levels of all nodes in the i-th virtual router group in the layer 3 network structure to obtain the pooled congestion level, and stores it in the R matrix of the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$ in the topmost-layer network structure;
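The 3-bit R values listed in step 2.1 can be captured in a small Python enumeration. The names `RValue` and `decode_r` are hypothetical, and the parser for Verilog-style literals is only a convenience for illustration.

```python
from enum import IntEnum


class RValue(IntEnum):
    CLEAR = 0b000                # clear channel
    PRIMARY_BLOCKAGE = 0b001     # primary blockage
    SECONDARY_BLOCKAGE = 0b010   # secondary blockage
    UNREACHABLE = 0b111          # fully blocked or temporarily deactivated


def decode_r(literal):
    """Decode a 3-bit binary literal such as 3'b010 into an RValue."""
    width, bits = literal.split("'b")
    if int(width) != 3 or len(bits) != 3:
        raise ValueError("expected a 3-bit binary literal")
    return RValue(int(bits, 2))   # raises ValueError for codes not listed above


level = decode_r("3'b010")
```

Codes other than the four listed ones are rejected, since the text assigns no meaning to them.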

step 2.2: initializing i to 0;

step 2.3: initialize the reward values in the sub-learning modules denoted by $L_3^i : L_2^i : L_1^{i,h}$, $h = 1, 2, 3$, to the target reward value; in this embodiment, the target reward value is 39, the number of router nodes in a virtual router group is 4, and the weighting value corresponding to the maximum congestion level is 13. Initialize the reward values in the sub-learning modules of the other router nodes to 0;

in the layer 2 network structure, the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{0,2}$ reads the maximum reward value of the 2nd sub-learning module at the same position in each adjacent virtual router group and takes it as the maximum reward value of all nodes in the i-th virtual router group denoted by $L_3^i : L_2^i$; it weights the read maximum reward value according to its R matrix to obtain a weighted reward value, and selects the maximum weighted reward value to transmit to the 2nd sub-learning module at the same position in each adjacent virtual router group;

in specific implementation, the rule of the weighting process is as follows:

if the next hop is an unobstructed path, 1 is subtracted from the maximum reward value;

if the next hop is a primary blockage, 4 is subtracted from the maximum reward value;

if the next hop is a secondary blockage, 13 is subtracted from the maximum reward value;

if the next hop is a device edge or is temporarily deactivated, the maximum reward value is set to zero;

step 3: referring to FIG. 5, the path information is corrected in the hierarchical control module, that is: the path information is corrected from the higher-layer network structure to the lower-layer network structure, and the corrected path information is then reverse-transmitted from the lower-layer network structure to the higher-layer network structure, so that each sub-learning module of the router node obtains the corrected path information and stores it in the routing table of the corresponding layer network structure;

step 3.1: the learning mode arbitrator determines the source of the path information sent to the hierarchical control module; if the source is the 1st, 2nd or 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{0,h}$, $h = 1, 2, 3$, the path information comes from the sub-learning module located at the same position in an adjacent virtual router group; otherwise, the path information comes from the h-th sub-learning module corresponding to the 0th router node in the same virtual router group;

step 3.2: according to the correction rule, path information is corrected from the 3 rd layer network structure to the 1 st layer network structure in sequence;

step 3.3: judging whether the reverse transmission rule is met; if so, performing reverse transmission from the layer 1 network structure to the layer 3 network structure in sequence according to the reverse transmission rule; otherwise, performing reverse transmission from the layer 1 network structure to the layer 3 network structure in sequence according to strategy 1 or strategy 2; wherein strategy 1 is to directly take the uncorrected path information of the current layer, and strategy 2 is to take the uncorrected path information of the lower layer;

step 3.4: and sending the corrected and reversely transmitted path information into a routing table of a corresponding layer network structure.

The correction rule is:

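the reverse transmission rule is as follows:

if the path information of the 1st or 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,h}$, $h = 1, 2$, has been corrected, that path information is reverse-transmitted to the 2nd or 3rd sub-learning module, respectively, denoted by $L_3^i : L_2^i : L_1^{i,(h+1)}$, $h = 1, 2$; otherwise, no reverse transmission is performed;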

step 4: assigning i + 1 to i and returning to step 2.3 until i = 3, thereby completing the path planning with each router node as the destination node.

step 5: referring to FIG. 5, the network-on-chip data packets are transmitted as follows:

the routing table selection module reads the position information of the destination router node in the data packet, including the information of the nodes in the layer 3, layer 2 and layer 1, and accesses the routing table and takes out the path information stored in the routing table according to the following access rules:

step a, initializing i to 3;

Claims

1. A routing method based on hierarchical Q-routing planning, applied to a network on chip consisting of w router nodes, w resource nodes and a plurality of interconnection channels, wherein each router node comprises input ports, output ports, a congestion sensor, a multi-way gate and an access routing table; the method is characterized in that a learning module is arranged in the router node; the learning module includes: a learning mode arbitrator, a hierarchical control module, a routing table selection module, 3 sub-learning modules and 3 routing tables; the routing method comprises the following steps:

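in the layer 1 network structure, the $w$ router nodes are divided into groups of $x^2$ nodes, thereby forming a layer 1 network structure consisting of $w/x^2$ virtual router groups;

in the layer 2 network structure, the $w/x^2$ virtual router groups are divided into groups of $y^2$, thereby forming a layer 2 network structure consisting of $w/(x^2 y^2)$ virtual router groups;

in the layer 3 network structure, the $w/(x^2 y^2)$ virtual router groups are divided into one group of $z^2$, thereby forming a layer 3 network structure consisting of 1 virtual router group;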

step 2.2: initializing i to 0;

step 2.3: initialize the reward values in the sub-learning modules denoted by $L_3^i : L_2^i : L_1^{i,h}$, $h = 1, 2, 3$, to the target reward value; initialize the reward values in the sub-learning modules of the other router nodes to 0;

step 5: transmission of network-on-chip data packets:

2. The routing method based on hierarchical Q-routing planning of claim 1, wherein the step 3 is performed as follows:

step 3.3, judging whether the reverse transmission rule is met; if so, performing reverse transmission from the layer 1 network structure to the layer 3 network structure in sequence according to the reverse transmission rule; otherwise, performing reverse transmission from the layer 1 network structure to the layer 3 network structure in sequence according to strategy 1 or strategy 2; wherein strategy 1 is to directly take the uncorrected path information of the current layer, and strategy 2 is to take the uncorrected path information of the lower layer;

3. The hierarchical Q-routing based routing method of claim 2,

the correction rule is as follows:

if the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ is located at a position adjacent to a neighbouring virtual router group in the layer 3 network structure, and the path information in the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$ needs to be transmitted across to the adjacent virtual router group, then the path information of the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ is changed to the path information of the 3rd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,3}$, and a correction signal is generated; otherwise, the path information of the 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,2}$ is retained;

the reverse transmission rule is as follows:

if the path information of the 1st or 2nd sub-learning module denoted by $L_3^i : L_2^i : L_1^{i,h}$, $h = 1, 2$, has been corrected, that path information is reverse-transmitted to the 2nd or 3rd sub-learning module, respectively, denoted by $L_3^i : L_2^i : L_1^{i,(h+1)}$, $h = 1, 2$; otherwise, no reverse transmission is performed.

4. The hierarchical Q-routing scheme-based routing method of claim 1,

5. The hierarchical Q-routing scheme-based routing method of claim 1,

the rule of the weighting processing is as follows:

if the next hop is an unobstructed path, y is subtracted from the maximum reward value;

6. The hierarchical Q-routing scheme-based routing method of claim 1,

the access rule is as follows:

step a, initializing i to 3;