CN104684040A

CN104684040A - Method for establishing a routing path through Q learning on-board network based on fuzzy reasoning

Info

Publication number: CN104684040A
Application number: CN201510103439.1A
Authority: CN
Inventors: 方敏; 郭祥; 彭垚森; 郑海红; 刘彦勋
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2015-03-09
Filing date: 2015-03-09
Publication date: 2015-06-03
Anticipated expiration: 2035-03-09
Also published as: CN104684040B

Abstract

The invention relates to a method for establishing a routing path in a Q-learning vehicle-mounted network based on fuzzy reasoning. The method comprises the following steps: (1) initializing the network; (2) broadcasting HELLO data packets; (3) the source network node starts to send a request message; (4) calculating the channel level of an intermediate network node; (5) updating the Q value in the routing request data packet; (6) judging whether the current network node s is the destination network node; if yes, executing step (7), otherwise executing step (4); (7) establishing forward routing information; (8) judging whether the routing reply data packet has reached the source network node; if yes, executing step (9), otherwise executing step (7); (9) sending the data packet. The method combines fuzzy reasoning with routing: the discount rate in the Q-learning method is calculated by fuzzy reasoning and is adjusted dynamically according to the network conditions of the vehicle-mounted network, which speeds up the establishment of routes in the vehicle-mounted network.

Description

Method for establishing routing path of Q learning vehicle-mounted network based on fuzzy inference

Technical Field

The invention belongs to the technical field of communication, and further relates to a method for establishing a routing path in a Q-learning vehicle-mounted network based on fuzzy inference, within the field of wireless sensor networks. The invention applies Q learning to evaluate the quality of node links in the vehicle-mounted network environment and calculates a fuzzy judgment value for the discount rate in each node's Q learning, yielding a routing method that adapts dynamically to the vehicle-mounted network. The invention can accelerate the establishment of a globally optimal routing path and can be applied in fields such as vehicle-mounted networks.

Background

At present, in the technical field of wireless sensor networks, methods that combine fuzzy reasoning with routing are widely used to establish vehicle-mounted network routing information and to accelerate the establishment of a globally optimal routing path.

The patent application "a vehicle-mounted network routing protocol using intersection static nodes to assist data forwarding" (application number: CN201310275846, publication number: CN103379575A) filed by South China University discloses a vehicle-mounted network routing protocol that uses static nodes at intersections to assist data forwarding. In this method, static nodes are placed at each intersection, and all nodes continuously collect road condition information and assist in data forwarding. However, the method has the following defect: each static node collects only the delay time and the road state between the current network node and its neighbor network nodes and establishes the routing path on that basis, so the routing path cannot take global road conditions into account and a globally optimal routing path is difficult to obtain quickly.

A self-learning network routing method for mobile ad hoc networks was proposed by Ke Wan, Wai-Choong Wong et al. in the article "A MANET Routing Protocol using Q-Learning Method Integrated with the Bayesian Network" (Proceedings of the 2012 IEEE ICCS, 2012, 1(3): 133-141). The method replaces the AODV routing method, whose broadcast route discovery mechanism depends only on hop count, and uses a Bayesian network to estimate the congestion level of neighbor nodes. However, the method has the following disadvantages: the learning rate and the discount rate of the Q-learning routing model both take fixed empirical values and cannot follow the dynamic changes of a specific network, so the algorithm converges slowly; and the Bayesian network used to estimate the link state of neighbor nodes needs periodic training to adapt to network dynamics, so if the training period is too long the current link state of the vehicle-mounted network is not accurately reflected and overall network performance suffers in a complex vehicle-mounted network environment.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method (FAODV) for establishing a routing path by a Q learning vehicle-mounted network based on fuzzy reasoning. The invention estimates the link quality of the vehicle-mounted network through fuzzy reasoning so as to calculate the discount rate in the Q learning routing method. Therefore, a more accurate routing reward value is determined, the convergence speed of Q learning is increased, and the performance of the vehicle-mounted network routing method is improved.

The technical scheme for realizing the invention is as follows: fuzzy logic is introduced to evaluate the channel level of each network node, and the discount rate in the Q learning method is corrected according to that channel level, which determines how much weight routing selection gives to future rewards. With the fuzzy-inference Q learning method, the reward for routing selection between nodes changes dynamically with the channel quality of the network nodes. This increases the convergence rate of the Q-learning vehicle-mounted network routing method and thereby improves its performance.

In order to achieve the above object, the present invention comprises the following main steps:

(1) network initialization:

setting a Q value table of each network node in the network to be empty initially, setting the learning rate to be 0.8 and setting a routing table to be empty;

(2) broadcast hello packets:

all network nodes in the network periodically broadcast HELLO data packets; after receiving a HELLO data packet sent by an adjacent network node, the receiving network node checks whether that adjacent network node already exists in its adjacent network node list; if yes, it continues to receive HELLO data packets from other network nodes, and if not, it adds the adjacent network node to the adjacent network node list;

(3) the source network node starts to send a request message:

the source network node checks the routing table information, if the next hop network node is not empty, the data are sent to the next hop network node according to the information in the routing table, otherwise, the routing request data packet is broadcast and sent to establish routing contact;

(4) calculating a channel level of the intermediate network node:

(4a) calculating the channel idle rate between adjacent network nodes;

(4b) calculating the signal strength between adjacent network nodes;

(4c) taking the channel idle rate between adjacent network nodes as the 1st channel quality index and the signal strength between adjacent network nodes as the 2nd channel quality index, and obtaining the channel level between the current network node and the adjacent network node by the fuzzy C-means method;

(5) updating the Q value in the Q value table of the network node according to the following formula:

Q_s(o, x) \leftarrow (1 - \alpha) Q_s(o, x) + \alpha \{ R + f(s, x) \max_{y \in N_s} Q_s(o, y) \}

wherein Q_s(o, x) represents the Q value for the current network node s selecting the adjacent network node x as the next-hop node to reach the destination network node o; α represents the learning rate, with α = 0.8; R represents the immediate reward: if the current network node s can reach the destination network node o by selecting the adjacent node x as the next-hop node, R is equal to 1, otherwise R is equal to 0; N_s represents the set of adjacent network nodes of the current network node s; and f(s, x) represents the discount rate, whose value is the channel level between the current network node s and the adjacent network node x;

(6) judging whether the current network node s is the destination network node; if so, executing step (7), otherwise executing step (4);

(7) establishing forward routing information:

starting from the destination network node, a route reply data packet is sent to the neighbor network nodes; at each hop of the route reply data packet, the adjacent network node with the maximum Q value is selected as the next-hop network node, the route reply data packet is transmitted to that next-hop network node, and forward routing information is established;

(8) judging whether the routing reply data packet reaches the source network node, if so, executing the step (9), otherwise, executing the step (7);

(9) sending the data packet:

when the routing reply data packet reaches the source network node, the establishment of a routing path from the source network node to the destination network node is completed, and the source network node starts to send the data packet according to the routing table information.

Compared with the prior art, the invention has the following advantages:

Firstly, the invention uses a fuzzy clustering method to grade the vehicle-mounted network channel. This overcomes the defect of the prior art, in which a Bayesian network used to estimate the channel level cannot accurately reflect the current link condition of the vehicle-mounted network and can degrade overall network performance in a complex vehicle-mounted network environment. The invention therefore reflects the current link condition accurately and improves the accuracy of channel level estimation.

Secondly, the invention uses the fuzzy clustering method to calculate the discount rate in the Q learning method according to the real-time state of the network environment, and updates the Q value in the Q value table of each network node accordingly. This overcomes the defect of the prior art, in which the Q value is updated with a fixed discount rate that cannot follow the dynamic changes of a specific network. The invention therefore updates the Q values adaptively according to the actual network environment of the vehicle-mounted network.

Thirdly, the invention obtains the routing path with the Q learning method, which evaluates the channel quality of the network nodes from a global perspective. This overcomes the defect of the prior art, which collects only the delay time and the road state between the current network node and its neighbor network nodes and therefore easily falls into a local optimum. The invention can thus establish a globally optimal routing path in a complex vehicle-mounted network environment.

Drawings

FIG. 1 is a flow chart of the present invention;

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to fig. 1, the specific implementation steps of the present invention are as follows:

Step 1, network initialization.

The Q value table of each network node in the network is initially set to empty, the learning rate is set to 0.8, and the routing table is set to empty.

Step 2, broadcasting HELLO data packets.

All network nodes in the network regularly broadcast and send HELLO data packets, after receiving HELLO packets sent by adjacent network nodes, the receiving network nodes inquire whether the adjacent network nodes exist in an adjacent network node list, if yes, the HELLO data packets of other network nodes are continuously received, and if not, the adjacent network nodes are added into the adjacent network node list.
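Steps 1 and 2 can be illustrated with a minimal Python sketch. The class name `Node`, its field names, and the method `on_hello` are hypothetical and do not come from the patent; the HELLO fields (source address, idle rate, signal strength) follow the attributes listed in claim 4. The sketch only adds unknown senders to the neighbor list, as the step describes, and leaves any refresh policy for known neighbors open.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical per-node state for steps 1 and 2 (names are illustrative)."""
    address: str
    learning_rate: float = 0.8                          # value given in step 1
    q_table: dict = field(default_factory=dict)         # (destination, next hop) -> Q value
    routing_table: dict = field(default_factory=dict)   # destination -> next hop
    neighbors: dict = field(default_factory=dict)       # address -> (idle rate, signal strength)

    def on_hello(self, source, idle_rate, signal_strength):
        """Handle one received HELLO packet; return True if the sender was newly added."""
        if source in self.neighbors:
            return False  # already listed: keep listening for other HELLO packets
        self.neighbors[source] = (idle_rate, signal_strength)
        return True
```

For example, a node that receives two HELLO packets from the same sender adds that sender to its neighbor list only once.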

Step 3, the source network node starts to send a request message.

The source network node checks its routing table; if the next-hop network node is not empty, the data is sent to the next-hop network node according to the information in the routing table; otherwise, a routing request data packet is broadcast to establish a route.

Step 4, calculating the channel level of the intermediate network node.

The specific steps of calculating the channel idle rate between adjacent network nodes are as follows:

Firstly, calculating the channel idle time obtained by monitoring the intermediate network node according to the following formula:

T1 = T \sum_{k=1}^{N} B_k

wherein T1 represents the channel idle time obtained by monitoring the intermediate network node, T represents the time taken to monitor the intermediate network node channel once, N represents the number of times the intermediate network node channel is monitored, and B_k represents the feedback of the k-th monitoring of the intermediate network node channel: B_k is equal to 0 when the channel is busy and equal to 1 when the channel is idle.

Secondly, calculating the idle rate of the intermediate network node channel according to the following formula:

R = \frac{T1}{T2}

wherein, R represents the idle rate of the intermediate network node channel, T1 represents the channel idle time obtained by monitoring the intermediate network node, T2 represents the time taken for monitoring the intermediate network node channel N times, and N represents the number of monitoring times of the intermediate network node channel.

Thirdly, calculating the channel idle rate between adjacent network nodes according to the following formula:

R_{sx} = \min\{R_s, R_x\}

wherein R_{sx} represents the channel idle rate between the current network node S and the adjacent network node x, R_s represents the channel idle rate of network node S, R_x represents the channel idle rate of network node x, and min{·} represents the minimum value operation.
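The three idle-rate steps above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the total monitoring time T2 equals N times the duration T of one monitoring, which is how the text describes T2 but is not given as an explicit formula.

```python
def idle_time(feedback, t):
    """T1: total idle time, where feedback holds B_k (1 = idle, 0 = busy)
    and t is the duration of one monitoring."""
    return t * sum(feedback)


def idle_rate(feedback, t):
    """R = T1 / T2, assuming T2 = N * t for N monitorings."""
    n = len(feedback)
    if n == 0:
        raise ValueError("at least one monitoring is required")
    return idle_time(feedback, t) / (n * t)


def link_idle_rate(r_s, r_x):
    """R_sx: idle rate of the link, the minimum of the two endpoint idle rates."""
    return min(r_s, r_x)
```

Taking the minimum makes the busier endpoint the bottleneck: a link is only as idle as its more heavily loaded node.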

The channel idle rate between adjacent network nodes is taken as the 1st channel quality index and the signal strength between adjacent network nodes as the 2nd channel quality index, and the channel level between the current network node and the adjacent network node is obtained by the fuzzy C-means method.

The specific steps of calculating the channel level between the current network node and the adjacent network node are as follows:

Firstly, the channel idle rate level L1 between the current network node S and the adjacent network node x is fuzzily divided into three levels {large, medium, small}, and the signal strength level L2 between the current network node S and the adjacent network node x is fuzzily divided into three levels {strong, medium, weak}; the initial value σ_ji(k) of the i-th index level center of the j-th channel quality index of the current network node S and the adjacent network node x is set to a value in [0, 1], where k represents the number of updates of the level center and the initial value of k is set to 0.

Secondly, calculating, according to the following formula, the degree of membership of the j-th channel quality index of the current network node S and the adjacent network node x to the k-th update value σ_ji(k) of its i-th index level center:

wherein μ_ji represents the degree of membership of the j-th channel quality index of the current network node S and the adjacent network node x to the k-th update value σ_ji(k) of the i-th index level center, σ_ji(k) represents the k-th update value of the i-th index level center of the j-th channel quality index of the current network node S and the adjacent network node x, z_j represents the j-th channel quality index between the current network node S and the adjacent network node x, and Σ(·) represents the summation operation.

Thirdly, correcting the index level center of the current network node according to the following formula:

wherein σ_ji(k+1) represents the (k+1)-th update value of the i-th index level center of the j-th channel quality index of the current network node S and the adjacent network node x, M represents the dimension of the channel quality sample space of the current network node S and the adjacent network node x, μ_ji represents the degree of membership of the j-th channel quality index of the current network node S and the adjacent network node x to the k-th update value σ_ji(k) of the i-th index level center, z_j represents the j-th channel quality index between the current network node S and the adjacent network node x, and Σ(·) represents the summation operation.

Fourthly, calculating the update error value according to the following formula:

wherein e represents the update error value, σ_ji(k+1) represents the (k+1)-th update value of the i-th index level center of the j-th channel quality index of the current network node S and the adjacent network node x, σ_ji(k) represents the k-th update value of that index level center, and Σ(·) represents the summation operation.

Fifthly, judging whether the update error value e is smaller than the threshold 0.00001; if so, executing the sixth step below, otherwise returning to the second step above.
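The second to fifth steps follow the usual fuzzy C-means loop of membership update, center update, and convergence check. The formulas themselves are not reproduced in this text, so the sketch below uses the textbook fuzzy C-means update for one-dimensional samples and should not be read as the patent's exact formulas. The function name `fcm_1d`, the fuzzifier `m = 2`, and the use of the summed absolute center change as the error e are all assumptions; only the threshold 0.00001 comes from the text.

```python
def fcm_1d(samples, centers, m=2.0, tol=1e-5, max_iter=1000):
    """Textbook fuzzy C-means on scalar samples (an assumed stand-in for the
    patent's update formulas). Returns (centers, memberships), where
    memberships[k][i] is the membership of sample k in center i, computed
    against the centers used in the final iteration."""
    centers = list(centers)
    memberships = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        memberships = []
        for z in samples:
            dist = [abs(z - c) for c in centers]
            if 0.0 in dist:
                # Sample coincides with one or more centers: share membership among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((di / dc) ** exponent for dc in dist) for di in dist]
            memberships.append(row)
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            # Weighted mean of the samples; keep the old center if no weight reaches it.
            new_centers.append(
                sum(w * z for w, z in zip(weights, samples)) / total if total > 0 else old
            )
        error = sum(abs(a - b) for a, b in zip(new_centers, centers))  # assumed form of e
        centers = new_centers
        if error < tol:
            break
    return centers, memberships
```

Starting from three initial centers in [0, 1], such as 0.0, 0.5 and 1.0, the loop yields one center per level (for example small, medium, large for the channel idle rate) together with the membership of each sample in each level.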

Sixthly, manually establishing a fuzzy rule base from the membership degrees of the channel quality index levels between the current network node S and the adjacent network node x and the corresponding channel levels.

The contents of the fuzzy rule base established are shown in table 1.

TABLE 1 fuzzy rule base

Rule	Signal strength	Channel idle rate	Channel level
Rule 1	Strong	Large	Good
Rule 2	Strong	Medium	Good
Rule 3	Strong	Small	Fair
Rule 4	Medium	Large	Good
Rule 5	Medium	Medium	Fair
Rule 6	Medium	Small	Poor
Rule 7	Weak	Large	Fair
Rule 8	Weak	Medium	Poor
Rule 9	Weak	Small	Poor

The channel level is fuzzily divided into three levels {poor, fair, good}, and the channel level centers of the three fuzzy divisions are w = {0.25, 0.5, 0.75}.
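The rule base of Table 1 can be held as a simple lookup table. The sketch below is one possible encoding in Python; the names `FUZZY_RULES`, `LEVEL_CENTERS` and `channel_level` are hypothetical, the level labels are English renderings of the table entries, and the centers 0.25, 0.5 and 0.75 are assumed to map to poor, fair and good in that order.

```python
# Table 1 as a lookup: (signal strength level, channel idle rate level) -> channel level
FUZZY_RULES = {
    ("strong", "large"): "good",
    ("strong", "medium"): "good",
    ("strong", "small"): "fair",
    ("medium", "large"): "good",
    ("medium", "medium"): "fair",
    ("medium", "small"): "poor",
    ("weak", "large"): "fair",
    ("weak", "medium"): "poor",
    ("weak", "small"): "poor",
}

# Channel level centers w, assumed to follow the order {poor, fair, good}.
LEVEL_CENTERS = {"poor": 0.25, "fair": 0.5, "good": 0.75}


def channel_level(signal_level, idle_level):
    """Return the channel level that Table 1 assigns to a pair of index levels."""
    return FUZZY_RULES[(signal_level, idle_level)]
```

For instance, a strong signal on a channel with a small idle rate matches Rule 3 and is rated fair.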

Calculating, according to the following formula, the membership degree of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node S and the adjacent network node x:

wherein L_{mn}^{l} represents the membership degree of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node S and the adjacent network node x, the channel level being obtained by matching against the rule base; μ_1m represents the membership degree of the m-th index level center of the 1st channel quality index of the current network node S and the adjacent network node x; and μ_2n represents the membership degree of the n-th index level center of the 2nd channel quality index of the current network node S and the adjacent network node x.

Seventhly, calculating the channel level fuzzy value between the current network node S and the adjacent network node x according to the following formula:

P_{mn}^{l} = \max \{ L_{mn}^{l} \}

wherein P_{mn}^{l} represents the fuzzy value of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node S and the adjacent network node x, and L_{mn}^{l} represents the membership degree of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node S and the adjacent network node x.

Eighthly, defuzzifying the fuzzy set of the channel level by the center average method to obtain a crisp value of the channel level:

wherein f(S, x) represents the channel level between the current network node S and the adjacent network node x, w(l) represents the l-th channel level center of the current network node S and the adjacent network node x, P_{mn}^{l} represents the membership degree of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index between the current network node S and the adjacent network node x, and Σ(·) represents the summation operation.
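The sixth to eighth steps can be sketched end to end in Python. Two points are assumptions rather than statements of the patent: the firing strength of a rule is taken here as the minimum of the two index memberships, because the formula for L is not reproduced in this text, and the center average is written in its textbook form, the membership-weighted mean of the level centers. The names `RULES`, `CENTERS` and `infer_channel_level` are hypothetical; the maximum aggregation per channel level follows the seventh step.

```python
# (signal strength level, idle rate level) -> channel level, as in Table 1
RULES = {
    ("strong", "large"): "good", ("strong", "medium"): "good", ("strong", "small"): "fair",
    ("medium", "large"): "good", ("medium", "medium"): "fair", ("medium", "small"): "poor",
    ("weak", "large"): "fair", ("weak", "medium"): "poor", ("weak", "small"): "poor",
}
CENTERS = {"poor": 0.25, "fair": 0.5, "good": 0.75}  # channel level centers w


def infer_channel_level(mu_signal, mu_idle):
    """Return a crisp channel level f(S, x) from the index-level memberships.

    mu_signal maps strong/medium/weak to memberships of the signal strength;
    mu_idle maps large/medium/small to memberships of the channel idle rate.
    """
    fuzzy = {level: 0.0 for level in CENTERS}
    for (signal, idle), level in RULES.items():
        strength = min(mu_signal[signal], mu_idle[idle])  # ASSUMPTION: min as rule strength
        fuzzy[level] = max(fuzzy[level], strength)        # seventh step: max per level
    total = sum(fuzzy.values())
    if total == 0.0:
        raise ValueError("no rule fired; memberships are all zero")
    # Eighth step (textbook center average): weighted mean of the level centers.
    return sum(CENTERS[level] * p for level, p in fuzzy.items()) / total
```

With signal memberships strong 0.7, medium 0.3, weak 0.0 and idle-rate memberships large 0.2, medium 0.8, small 0.0, the levels good and fair receive fuzzy values 0.7 and 0.3, giving a crisp channel level of (0.75 × 0.7 + 0.5 × 0.3) / 1.0 = 0.675. Because the centers lie between 0.25 and 0.75, the resulting discount rate stays inside that range.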

Step 5, updating the Q value in the Q value table of the network node according to the following formula:

Q_s(o, x) \leftarrow (1 - \alpha) Q_s(o, x) + \alpha \{ R + f(s, x) \max_{y \in N_s} Q_s(o, y) \}

wherein Q_s(o, x) represents the Q value for the current network node s selecting the adjacent network node x as the next-hop node to reach the destination network node o; α represents the learning rate, with α = 0.8; R represents the immediate reward: if the current network node s can reach the destination network node o by selecting the adjacent node x as the next-hop node, R is equal to 1, otherwise R is equal to 0; N_s represents the set of adjacent network nodes of the current network node s; and f(s, x) represents the discount rate, whose value is the channel level between the current network node s and the adjacent network node x.
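The update of step 5 can be written as a short Python function. The function name `update_q` and the dictionary layout keyed by (destination, next hop) are hypothetical; missing entries are treated as 0.0, in line with the initial Q value of 0.0 used in the simulation.

```python
def update_q(q_table, dest, next_hop, reward, discount, neighbors, alpha=0.8):
    """Apply Q_s(o, x) <- (1 - alpha) Q_s(o, x) + alpha {R + f(s, x) max_y Q_s(o, y)}.

    q_table maps (destination, next hop) to a Q value; discount is the channel
    level f(s, x); neighbors is the neighbor set N_s of the current node.
    """
    old = q_table.get((dest, next_hop), 0.0)
    best = max((q_table.get((dest, y), 0.0) for y in neighbors), default=0.0)
    new = (1.0 - alpha) * old + alpha * (reward + discount * best)
    q_table[(dest, next_hop)] = new
    return new
```

As a worked example, with Q_s(o, x) = 0.2, a best neighbor value of 0.5, R = 1 and f(s, x) = 0.675, the new value is 0.2 × 0.2 + 0.8 × (1 + 0.675 × 0.5) = 1.11. A better channel level raises the discount rate and so gives more weight to the estimated future value.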

Step 6, judging whether the current network node S is the destination network node; if so, executing step 7, otherwise executing step 4.

Step 7, establishing forward routing information.

Starting from the destination network node, a route reply data packet is sent to the neighbor network nodes; at each hop of the route reply data packet, the adjacent network node with the maximum Q value is selected as the next-hop network node, the route reply data packet is transmitted to that next-hop network node, and forward routing information is established.
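The per-hop choice in step 7 reduces to picking the neighbor with the largest Q value. A minimal Python sketch follows; the function name `max_q_neighbor` and the input layout are hypothetical, and ties are resolved in favor of the first neighbor encountered, a detail the text does not specify.

```python
def max_q_neighbor(q_values):
    """Return the neighbor with the maximum Q value, or None if there are no neighbors.

    q_values maps each adjacent network node to its Q value.
    """
    if not q_values:
        return None
    return max(q_values, key=q_values.get)
```

A node holding the Q values 0.4 for neighbor B and 1.11 for neighbor C would forward the route reply packet to C.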

Step 8, judging whether the routing reply data packet has reached the source network node; if so, executing step 9, otherwise executing step 7.

Step 9, sending the data packet.

The effect of the invention can be verified by the following simulation experiment:

1. Simulation conditions:

The experiment of the invention uses NS2 software as the simulation environment. The simulation parameters set in the software are shown in Table 2.

TABLE 2 NS2 simulation parameter settings

Parameter	Value
Physical communication channel	Channel/WirelessChannel
Wireless transmission model	Propagation/TwoRayGround
Network interface type	Phy/WirelessPhy
MAC layer protocol	Mac/802_11
Interface queue type	Queue/DropTail/PriQueue
Network interface queue size	50
Routing protocol	FAODV/AODV
Number of wireless nodes	40
Topology range length (m)	1000
Topology range width (m)	1000

2. Simulation content:

In the scenario of the simulation experiment, a vehicle at an intersection can go straight, turn left or turn right, with probabilities 0.5, 0.25 and 0.25 respectively. The simulation time is 160 seconds, and the simulation area is 1000 meters × 1000 meters. The number of nodes is set to 40, the minimum speed is 2 m/s, the average speed is 7 m/s, the probability of pausing is 0.3, and the maximum stop time is 10 s. The initial Q value in the experiment is 0.0 and the learning rate is set to 0.8.

The simulation experiment of the invention adopts the packet loss rate, the delay mean value and the throughput rate as the measurement standard, wherein the formula of the packet loss rate L is as follows:

L = \frac{NSP - NRP}{NSP}

wherein NSP represents the number of packets sent by the source node and NRP represents the number of packets received by the destination node. The formula of the delay time D(i) is:

D(i) = RT(i) - ST(i)

wherein D(i) represents the delay time of the i-th data packet received by the destination node, RT(i) represents the time at which the destination node receives the i-th data packet, and ST(i) represents the time at which the source node sends the i-th data packet. The formula of the throughput rate TH(i) is:

TH (i) = \frac{TB (i) - TB (i - 1)}{RT (i) - RT (i - 1)}

wherein TH(i) represents the throughput rate of the destination node at time i, TB(i) represents the total number of packets received up to time i, TB(i-1) represents the total number of packets received up to time i-1, RT(i) represents the time at time i, and RT(i-1) represents the time at time i-1.
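The three metrics can be computed directly from their definitions. The Python sketch below uses hypothetical function names and made-up sample numbers for illustration only; they are not simulation results.

```python
def packet_loss_rate(nsp, nrp):
    """L = (NSP - NRP) / NSP, for NSP packets sent and NRP packets received."""
    return (nsp - nrp) / nsp


def delay(rt_i, st_i):
    """D(i) = RT(i) - ST(i): receive time minus send time of the i-th packet."""
    return rt_i - st_i


def throughput(tb_i, tb_prev, rt_i, rt_prev):
    """TH(i) = (TB(i) - TB(i-1)) / (RT(i) - RT(i-1))."""
    return (tb_i - tb_prev) / (rt_i - rt_prev)
```

For example, 200 packets sent and 190 received give a packet loss rate of 0.05, and 80 additional units received over 2 seconds give a throughput rate of 40 units per second.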

3. Simulation result analysis:

The simulation experiment compares FAODV, the method of the invention for establishing a routing path in a Q-learning vehicle-mounted network based on fuzzy inference, with the prior-art distance vector routing method AODV. Each reported result is the average of 20 repeated experiments.

Table 3 shows the packet loss rates of the method of the invention and of the prior-art AODV method. In each group, the left value is for the fuzzy-reasoning Q-learning vehicle-mounted network routing method and the right value is for the AODV method. For link 0, and especially for link 1, the packet loss rate of the fuzzy-reasoning Q-learning method is far lower than that of the AODV method. For link 2 and link 3, the packet loss rate of the AODV method is zero, and that of the fuzzy-reasoning Q-learning method is close to it. On average, however, the packet loss rate of the fuzzy-reasoning Q-learning method is lower than that of the AODV method. Because the invention takes the channel quality between nodes into account, it selects better paths and achieves a low overall packet loss rate.

Table 3 packet loss ratio in two methods

Table 4 shows the mean delay of the four links for the method of the invention and for the prior-art AODV method. For link 0 and link 1, the delay of the invention is significantly lower than that of AODV; for link 2 and link 3, the delays are of the same order as those of AODV; on average, the invention is significantly better than the prior-art AODV method. This is because the channel idle rate and the signal strength are used to evaluate link quality during transmission and idle channels are selected to transmit data, which reduces the delay significantly.

TABLE 4 Mean delay of the two methods (s)

Link number	FAODV method	AODV method
0	0.0416	0.3370
1	0.0194	0.4527
2	0.0242	0.0185
3	0.0700	0.0265
Mean value	0.0388	0.2087

Table 5 shows the throughput rates of the method of the invention and of the prior-art AODV method. For link 0 and link 2, the throughput rate of the invention is higher than that of the AODV method. For link 1, the throughput rate of the fuzzy-reasoning Q-learning vehicle-mounted network routing method is comparable to that of the AODV method. The mean value is the average throughput rate over the 4 links, and on this measure the invention is slightly better than the AODV method overall. Because the channel idle rate is used to judge link quality and idle channels are selected to transmit data, the throughput rate is improved.

TABLE 5 Throughput rate of the two methods (Kb/s)

Link number	FAODV method	AODV method
0	82.09	80.75
1	80.76	81.74
2	76.61	73.60
3	67.58	69.51
Mean value	76.76	76.40
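The mean values reported in Tables 4 and 5 can be checked against the per-link figures with a few lines of Python. The variable and function names are hypothetical; the numbers are copied from the two tables.

```python
# Per-link values for links 0 to 3, copied from Tables 4 and 5.
delay_faodv = [0.0416, 0.0194, 0.0242, 0.0700]
delay_aodv = [0.3370, 0.4527, 0.0185, 0.0265]
throughput_faodv = [82.09, 80.76, 76.61, 67.58]
throughput_aodv = [80.75, 81.74, 73.60, 69.51]


def mean(values):
    """Arithmetic mean over the four links."""
    return sum(values) / len(values)


print(round(mean(delay_faodv), 4), round(mean(delay_aodv), 4))
print(round(mean(throughput_faodv), 2), round(mean(throughput_aodv), 2))
```

The computed means agree with the tabulated mean values to the stated precision. The per-link figures also show that the delay advantage of FAODV comes from links 0 and 1, while the throughput means differ by less than 0.5 Kb/s.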

Claims

1. A method for establishing a routing path of a Q learning vehicle-mounted network based on fuzzy inference comprises the following steps:

(1) network initialization:

(2) broadcast hello packets:

(3) the source network node starts to send a request message:

(4) calculating a channel level of the intermediate network node:

(4a) calculating the channel idle rate between adjacent network nodes;

(4b) calculating the signal strength between adjacent network nodes;

Q_s(o, x) \leftarrow (1 - \alpha) Q_s(o, x) + \alpha \{ R + f(s, x) \max_{y \in N_s} Q_s(o, y) \}

wherein Q_s(o, x) represents the Q value for the current network node s selecting the adjacent network node x as the next-hop node to reach the destination network node o; α represents the learning rate, with α = 0.8; R represents the immediate reward: if the current network node s can reach the destination network node o by selecting the adjacent node x as the next-hop node, R is equal to 1, otherwise R is equal to 0; N_s represents the set of adjacent network nodes of the current network node s; and f(s, x) represents the discount rate, whose value is the channel level between the current network node s and the adjacent network node x;

(7) establishing forward routing information:

(9) sending the data packet:

2. The method for establishing a routing path for a Q-learning vehicle-mounted network based on fuzzy inference as claimed in claim 1, wherein the attributes in the Q-value table of the network node in step (1) comprise:

Q-value table information = {destination network node, next hop network node, Q value}.

3. The method for establishing the routing path for the Q-learning vehicle-mounted network based on fuzzy inference as claimed in claim 1, wherein the attributes in the network node routing table in step (1) refer to:

routing information = {destination address, next hop network node, sequence number, timer}.

4. The method for establishing a routing path for a Q-learning vehicle-mounted network based on fuzzy inference as claimed in claim 1, wherein the attributes in the HELLO packet in step (2) refer to:

P_hello = {source network node address, network node idle time ratio, network node channel strength};

P_RREQ = {destination network node, sequence number, hop count, Q value}.
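The record layouts named in claims 2 to 4 can be sketched as plain Python dataclasses. The class names, field names, field types, and the reading of the timer as a remaining lifetime in seconds are all hypothetical; only the list of attributes per record comes from the claims.

```python
from dataclasses import dataclass


@dataclass
class QEntry:
    """One row of the Q-value table."""
    destination: str
    next_hop: str
    q_value: float = 0.0


@dataclass
class RouteEntry:
    """One row of the routing table."""
    destination: str
    next_hop: str
    sequence_number: int
    timer: float  # assumption: remaining lifetime in seconds


@dataclass
class HelloPacket:
    """Attributes of P_hello."""
    source: str
    idle_time_ratio: float
    channel_strength: float


@dataclass
class RreqPacket:
    """Attributes of P_RREQ."""
    destination: str
    sequence_number: int
    hop_count: int
    q_value: float
```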

5. The method for establishing a routing path for a Q-learning vehicle-mounted network based on fuzzy inference as claimed in claim 1, wherein the step (4a) of calculating the channel idle rate between adjacent network nodes comprises the following steps:

wherein T1 represents the channel idle time obtained by monitoring the intermediate network node, T represents the duration of a single monitoring of the intermediate network node channel, N represents the number of times the intermediate network node channel is monitored, and B_k represents the feedback of the k-th monitoring of the intermediate network node channel, with B_k = 0 when the channel is busy and B_k = 1 when the channel is idle;

R = \frac{T1}{T2}

wherein, R represents the idle rate of the intermediate network node channel, T1 represents the channel idle time obtained by monitoring the intermediate network node, T2 represents the time for monitoring the intermediate network node channel for N times, and N represents the monitoring times of the intermediate network node channel;

R_{sx} = \min\{R_s, R_x\}

wherein R_sx represents the channel idle rate between the current network node s and the adjacent network node x, R_s represents the channel idle rate of network node s, R_x represents the channel idle rate of network node x, and min{·} represents the minimum-value operation.
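A short Python sketch of the idle-rate calculation follows. It assumes that T1 is the monitoring duration T multiplied by the number of idle observations and that T2 equals N times T; these two expressions are inferred from the symbol definitions and are not quoted from the claim. The function names are hypothetical.

```python
def channel_idle_rate(feedback, period=1.0):
    """Idle rate R = T1 / T2 of one node's channel.

    feedback: the observations B_k (1 when idle, 0 when busy)
    period: duration T of a single monitoring
    """
    if not feedback:
        raise ValueError("at least one observation is required")
    t1 = period * sum(feedback)  # assumed: T1 = T * (B_1 + ... + B_N)
    t2 = period * len(feedback)  # assumed: T2 = N * T
    return t1 / t2


def link_idle_rate(r_s, r_x):
    """Idle rate R_sx of the link between nodes s and x."""
    return min(r_s, r_x)
```

Taking the minimum means the busier of the two endpoints limits the idle rate of the link.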

6. The method for establishing a routing path for a Q-learning vehicle-mounted network based on fuzzy inference as claimed in claim 1, wherein the calculation formula of the signal strength between adjacent network nodes in step (4b) is as follows:

D_{sx} = 1 - \frac{d_{sx}}{d1}

wherein D_sx represents the signal strength between the current network node s and the adjacent network node x, d_sx represents the Euclidean distance between the current network node s and the adjacent network node x, and d1 represents the signal communication range of a network node, with d1 = 250 m.
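The signal strength formula can be sketched in Python as follows. The function name and the representation of node positions as coordinate tuples in metres are assumptions for illustration; the 250 m range is the value given in the claim.

```python
import math

COMM_RANGE_M = 250.0  # communication range d1 from the claim


def signal_strength(pos_s, pos_x, comm_range=COMM_RANGE_M):
    """Return D_sx = 1 - d_sx / d1 for two node positions in metres."""
    d_sx = math.dist(pos_s, pos_x)  # Euclidean distance between the nodes
    return 1 - d_sx / comm_range
```

The value is 1 for co-located nodes and falls linearly to 0 at the edge of the communication range.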

7. The method for establishing a routing path for a Q-learning vehicle-mounted network based on fuzzy inference as claimed in claim 1, wherein the fuzzy C-means method of step (4c) comprises the following steps:

step one, fuzzily dividing the channel idle rate level L1 between the current network node s and the adjacent network node x into three levels {large, medium, small}, and fuzzily dividing the signal strength level L2 between the current network node s and the adjacent network node x into three levels {strong, good, weak}; setting the k-th update value σ_ji(k) of the i-th index level center of the j-th channel quality index between the current network node s and the adjacent network node x to a value in [0, 1], where k represents the number of updates of the level center and the initial value of k is set to 0;

wherein μ_ji represents the degree of membership of the j-th channel quality index between the current network node s and the adjacent network node x with respect to the k-th update value σ_ji(k) of the i-th index level center, σ_ji(k) represents the k-th update value of the i-th index level center of the j-th channel quality index between the current network node s and the adjacent network node x, z_j represents the j-th channel quality index between the current network node s and the adjacent network node x, and Σ(·) represents a summation operation;

wherein σ_ji(k+1) represents the (k+1)-th update value of the i-th index level center of the j-th channel quality index between the current network node s and the adjacent network node x, M represents the dimension of the channel quality sample space between the current network node s and the adjacent network node x, μ_ji represents the degree of membership of the j-th channel quality index between the current network node s and the adjacent network node x with respect to the k-th update value σ_ji(k) of the i-th index level center, z_j represents the j-th channel quality index between the current network node s and the adjacent network node x, and Σ(·) represents a summation operation;

wherein e represents the update error value, σ_ji(k+1) represents the (k+1)-th update value of the i-th index level center of the j-th channel quality index between the current network node s and the adjacent network node x, σ_ji(k) represents the k-th update value of the i-th index level center of the j-th channel quality index between the current network node s and the adjacent network node x, and Σ(·) represents a summation operation;

step five, judging whether the update error value e is smaller than the threshold 0.00001; if so, executing step six; otherwise, executing step two;
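The iteration of steps two to five can be illustrated with the textbook fuzzy C-means updates applied to one channel quality index at a time. This is a sketch under stated assumptions, not the claimed formulas: the fuzzifier value 2, the use of the largest center shift as the update error e, the iteration cap, and the function name `fuzzy_c_means_1d` are all choices made for illustration. Only the threshold 0.00001 and the loop structure come from the text.

```python
def fuzzy_c_means_1d(samples, centers, fuzzifier=2.0, tol=1e-5, max_iter=1000):
    """Textbook 1-D fuzzy C-means.

    Returns (centers, memberships); memberships[j][i] is the degree of
    membership of sample j in level i, taken from the last iteration.
    """
    centers = list(centers)
    power = 2.0 / (fuzzifier - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership of every sample in every level (textbook formula).
        memberships = []
        for z in samples:
            dists = [abs(z - c) for c in centers]
            if 0.0 in dists:
                # The sample sits exactly on a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((d_i / d_k) ** power for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: membership-weighted mean of the samples.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** fuzzifier for row in memberships]
            total = sum(weights)
            if total:
                weighted = sum(w * z for w, z in zip(weights, samples))
                new_centers.append(weighted / total)
            else:
                new_centers.append(centers[i])
        # Assumed update error e: the largest shift of any center.
        error = max(abs(n - c) for n, c in zip(new_centers, centers))
        centers = new_centers
        if error < tol:
            break
    return centers, memberships
```

For example, calling the function with idle-rate samples and the initial centers `[0.2, 0.5, 0.8]` yields three level centers that can play the role of the {small, medium, large} levels.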

step six, manually establishing a fuzzy rule base according to the degrees of membership of the channel quality index levels between the current network node s and the adjacent network node x and the corresponding channel levels; the channel level is fuzzily divided into three levels {poor, normal, good}, and the channel level centers w of the three fuzzy divisions are {0.25, 0.5, 0.75};

wherein L_{mn}^{l} represents the degree of membership of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node s and the adjacent network node x, the channel level between the current network node s and the adjacent network node x being obtained by matching the rule base; μ_1m represents the degree of membership of the m-th index level center of the 1st channel quality index between the current network node s and the adjacent network node x; and μ_2n represents the degree of membership of the n-th index level center of the 2nd channel quality index between the current network node s and the adjacent network node x;

P_{mn}^{l} = \max\{L_{mn}^{l}\}

wherein P_{mn}^{l} represents the degree of membership of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node s and the adjacent network node x, and L_{mn}^{l} represents the degree of membership of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node s and the adjacent network node x;

step eight, defuzzifying the channel level fuzzy set by the center-average method to obtain the crisp channel level value:

wherein f(s, x) represents the channel level between the current network node s and the adjacent network node x, w(l) represents the l-th channel level center between the current network node s and the adjacent network node x, P_{mn}^{l} represents the degree of membership of the l-th channel level corresponding to the m-th index level center of the 1st channel quality index and the n-th index level center of the 2nd channel quality index between the current network node s and the adjacent network node x, and Σ(·) represents a summation operation.
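Steps six to eight can be sketched in Python as rule matching followed by center-average defuzzification. Several parts are assumptions made for illustration: the nine rules in `RULES` are hypothetical (the claim only states that the rule base is established manually), the strength of a rule is assumed to be the minimum of the two input memberships, and rules mapping to the same channel level are assumed to be combined by taking the maximum. The level centers 0.25, 0.5 and 0.75 are the values given in step six.

```python
CHANNEL_LEVEL_CENTERS = {"poor": 0.25, "normal": 0.5, "good": 0.75}

# Hypothetical rule base: (idle-rate level, signal level) -> channel level.
RULES = {
    ("large", "strong"): "good",
    ("large", "good"): "good",
    ("large", "weak"): "normal",
    ("medium", "strong"): "good",
    ("medium", "good"): "normal",
    ("medium", "weak"): "poor",
    ("small", "strong"): "normal",
    ("small", "good"): "poor",
    ("small", "weak"): "poor",
}


def channel_level(idle_membership, signal_membership,
                  rules=RULES, centers=CHANNEL_LEVEL_CENTERS):
    """Return a crisp channel level f(s, x) in [0.25, 0.75].

    idle_membership: membership of the idle rate in each idle-rate level
    signal_membership: membership of the signal strength in each level
    """
    level_strength = {level: 0.0 for level in centers}
    for (idle_level, signal_level), level in rules.items():
        # Assumed rule strength: minimum of the two input memberships.
        strength = min(idle_membership[idle_level],
                       signal_membership[signal_level])
        # Assumed aggregation: maximum over rules with the same output level.
        level_strength[level] = max(level_strength[level], strength)
    total = sum(level_strength.values())
    if total == 0:
        raise ValueError("no rule fired for the given memberships")
    # Center-average defuzzification: weighted mean of the level centers.
    weighted = sum(centers[level] * p for level, p in level_strength.items())
    return weighted / total
```

The returned value is the quantity used as the discount rate in the Q-value update of claim 1.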